WO2018035311A1 - Compositions and methods for modulating gene expression using reading frame surveillance

Publication number
WO2018035311A1
Authority
WO
WIPO (PCT)
Prior art keywords
polynucleotide
cell
gene
exon
rna
Application number
PCT/US2017/047320
Other languages
French (fr)
Inventor
John T. Gray
Original Assignee
Gray John T
Application filed by Gray John T
Publication of WO2018035311A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/63 - Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/67 - General methods for enhancing the expression

Definitions

  • the invention relates to the field of nucleic acid biotechnology and provides, for instance, compositions and methods for the design and production of codon-optimized nucleic acids for enhancing gene expression in a target cell or tissue, as well as nucleic acids for inducing the expression or the silencing of a gene of interest, and nucleic acids capable of promoting alternative RNA splicing.
  • nucleic acids such as single-stranded polynucleotides and double-stranded nucleic acid duplexes.
  • vectors have been developed for the delivery of nucleic acids to target cells.
  • factors that have hindered the development of nucleic acids as a therapeutic paradigm are the difficulties associated with designing polynucleotides capable of selectively enhancing or repressing the expression of a gene of interest in a target cell, as well as the challenges that have been associated with modulating the alternative splicing of genes that contain multiple intronic sequences. There remains a need for a set of techniques that address these hindrances.
  • the invention provides compositions and methods for optimizing the nucleic acid sequence of a gene or RNA equivalent thereof encoding a protein of interest so as to achieve enhanced expression of the protein in a particular cell type.
  • Genes and RNA equivalents thereof optimized using the compositions and methods described herein can be synthesized by chemical synthesis techniques. Genes designed according to the methods described herein may be amplified, for instance, using prokaryotic or eukaryotic cells that have been transfected with the optimized gene.
  • the gene or RNA equivalent thereof may have a clinical benefit, as these constructs can be administered to a subject, such as a human subject, to treat a disease or condition characterized by a defect in, or a reduced expression of, the encoded protein.
  • Such diseases include heritable disorders, such as recessive genetic diseases, including X-linked myotubular myopathy (XLMTM), Pompe disease, recessive catecholaminergic polymorphic ventricular tachycardia (CPVT), and Crigler-Najjar syndrome, among others.
  • the invention features a method of preparing a codon-optimized gene or RNA equivalent thereof for expression of a protein in a cell, the method including: a) providing a gene expression profile for a plurality of genes expressed in the cell;
  • codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide, or RNA resulting from transcription thereof, to endogenous RNA molecules encoding a protein whose expression level is among the top 50% (e.g., among the top 1%,
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 1% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 2% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 3% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 4% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 5% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 6% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 7% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 8% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 9% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 10% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 11% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 12% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 13% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 14% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 15% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 20% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 25% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 30% of gene expression levels in the intended target cell.
  • step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 50% of gene expression levels in the intended target cell.
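The "top N%" embodiments above all depend on first ranking the genes in the expression profile of the target cell. A minimal Python sketch of that selection follows. The function name `top_expressed_genes`, the gene labels, and the expression values are hypothetical illustrations, not data or code from the application, and rounding the cutoff up to a whole gene is an assumption.

```python
import math


def top_expressed_genes(expression_profile, top_percent):
    """Return the gene labels ranked within the top `top_percent` of expression.

    expression_profile: dict mapping gene label -> expression level (any unit).
    top_percent: a value in (0, 100], e.g. 1, 10, or 50.
    Assumption: the cutoff is rounded up to a whole number of genes.
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Rank genes from highest to lowest expression level.
    ranked = sorted(expression_profile, key=expression_profile.get, reverse=True)
    n_keep = math.ceil(len(ranked) * top_percent / 100)
    return ranked[:n_keep]


# Hypothetical profile: labels and values are made up for illustration only.
profile = {
    "gene_01": 950.0, "gene_02": 12.0, "gene_03": 430.0, "gene_04": 88.0,
    "gene_05": 1500.0, "gene_06": 5.0, "gene_07": 260.0, "gene_08": 77.0,
    "gene_09": 640.0, "gene_10": 31.0,
}
highly_expressed = top_expressed_genes(profile, 20)
print(highly_expressed)
```

The RNA molecules transcribed from the returned genes would then serve as the endogenous reference set against which sequence identity is minimized.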
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • 9-nucleotide portions e.g., 9-nucleotide portions, 10-nucleotide portions, 11-nucleotide portions, 12-nucleotide portions, 13-nucleotide portions, 14-nucleotide portions, 15-nucleotide portions, 16-nucleotide portions, 17-nucleotide portions, 18-nucleotide portions, 19-nucleotide portions, 20-nucleotide portions, 21-nucleotide portions, 22-nucleotide portions, 23-nucleotide portions, 24-nucleotide portions, 25-nucleotide portions, 26-nucleotide portions,
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%
  • no more than 50% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%
  • no more than 50% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules.
  • no more than 20% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules.
  • no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less)
  • no more than 50% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules.
  • no more than 50% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules.
  • no more than 15% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules.
  • no more than 5% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules.
  • the polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity (e.g., no greater than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, or less) relative to endogenous RNA molecules encoding a protein whose expression level is among the top 50% (e.g., among the top 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%) of expression levels in the cell.
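The window-based identity limits above can be checked computationally. The following is a minimal sketch, not the method of the specification: it assumes that a "corresponding" portion means the best ungapped match among all equal-length windows of the endogenous RNAs, and the function names and default values are hypothetical.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length windows match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fraction_of_similar_windows(candidate, endogenous, k=19, threshold=0.75):
    """Fraction of k-nt windows of `candidate` whose best ungapped match
    against any k-nt window of the endogenous RNAs exceeds `threshold`.

    Assumption: "corresponding portion" is read as "best-matching window";
    the source text does not define the alignment procedure.
    """
    cand_windows = [candidate[i:i + k] for i in range(len(candidate) - k + 1)]
    if not cand_windows:
        return 0.0
    endo_windows = {rna[i:i + k] for rna in endogenous
                    for i in range(len(rna) - k + 1)}
    hits = sum(
        1 for w in cand_windows
        if any(window_identity(w, e) > threshold for e in endo_windows)
    )
    return hits / len(cand_windows)
```

A design would satisfy, for example, the "no more than 5% of 19-nucleotide portions above 75% identity" embodiment when `fraction_of_similar_windows(candidate, endogenous, k=19, threshold=0.75) <= 0.05`. This brute-force comparison is quadratic and is meant only to illustrate the criterion.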
  • step (c) includes minimizing the sequence identity of the protein-coding region of the polynucleotide or RNA resulting from transcription thereof relative to the protein-coding regions of the endogenous RNA molecules.
  • step (c) includes minimizing the sequence identity of one or more non-coding regions of the polynucleotide or RNA resulting from transcription thereof relative to the corresponding non-coding regions of the endogenous RNA molecules.
  • the non-coding region is an intron, a 5' untranslated region (UTR), or a 3'
  • the method includes increasing the GC content of the polynucleotide while preserving the amino acid sequence of the encoded protein. In some embodiments, the method includes increasing the GC content of the polynucleotide to a quantity sufficient to permit hybridization of the polynucleotide or RNA resulting from transcription thereof to a complementary RNA strand with a Gibbs free energy change of from about -10 to about -100 kcal/mol.
  • the GC content of the polynucleotide may be increased to a quantity sufficient to permit hybridization of the polynucleotide or RNA resulting from transcription thereof to a complementary RNA strand with a Gibbs free energy change of from about -20 to about -90 kcal/mol, from about -30 to about -80 kcal/mol, from about -40 to about -70 kcal/mol, or from about -50 to about -60 kcal/mol.
  • the complementary RNA strand is from 9 to 30 nucleotides in length.
  • the complementary strand may be from about 18 to 30 nucleotides in length.
  • the complementary strand is 9 nucleotides in length.
  • the complementary strand is 12 nucleotides in length.
  • the complementary strand is 15 nucleotides in length.
  • the complementary strand is 18 nucleotides in length.
  • the complementary strand is 21 nucleotides in length.
  • the complementary strand is 24 nucleotides in length.
  • the complementary strand is 27 nucleotides in length.
  • the complementary strand is 30 nucleotides in length.
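Raising GC content while preserving the encoded protein can be illustrated with synonymous codon substitution. This is a minimal sketch under stated assumptions: it uses the standard genetic code on a DNA-alphabet coding sequence, greedily picks the most GC-rich synonymous codon for each amino acid, and does not estimate the Gibbs free energy of hybridization mentioned above. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_count(seq: str) -> int:
    """Number of G or C bases in the sequence."""
    return sum(base in "GC" for base in seq)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return gc_count(seq) / len(seq)


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def raise_gc(cds: str) -> str:
    """Swap each codon for the most GC-rich synonymous codon."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or gc_count(codon) > gc_count(best[aa]):
            best[aa] = codon
    return "".join(best[CODON_TABLE[cds[i:i + 3]]] for i in range(0, len(cds), 3))
```

For instance, `raise_gc("ATGAAATTTTAA")` yields a sequence with the same translation and a higher `gc_content`. A practical design would balance this against the other constraints listed here, such as identity to endogenous RNAs and CpG content.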
  • the method includes minimizing the CpG content of the polynucleotide while preserving the amino acid sequence of the encoded protein. In some embodiments, the method includes minimizing the homopolymer content of the polynucleotide while preserving the amino acid sequence of the encoded protein.
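CpG and homopolymer content are simple sequence metrics that can be compared across synonymous encodings. The sketch below is illustrative only: the two example sequences and the function names are hypothetical, and the standard genetic code is assumed.

```python
import re
from itertools import product

# Standard genetic code, used only to confirm the two encodings are synonymous.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (C immediately followed by G)."""
    return seq.count("CG")


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


# Two hypothetical encodings of the dipeptide Ala-Arg.
cpg_rich = "GCGCGT"
cpg_poor = "GCCAGA"
```

Both example sequences translate to the same dipeptide, but `cpg_poor` contains no CpG dinucleotides, which is the kind of substitution this embodiment describes.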
  • the method is used to obtain a codon-optimized polynucleotide for selective expression of the encoded protein in the cell, such as a target cell.
  • the cell is a eukaryotic cell, such as a mammalian cell.
  • the cell is a human cell, such as a liver cell, muscle cell, cardiomyocyte, or other cell of interest.
  • the method includes incorporating codon substitutions into the polynucleotide so as to increase the sequence identity of the designed polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 1% (e.g., not among the top 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, or more) of expression levels in the target cell but is among the top 1% (or among the top 1% (or
  • the method may be used to obtain a gene or RNA equivalent thereof useful for tissue-specific expression of the encoded protein.
  • the method includes incorporating codon substitutions into the polynucleotide so as to increase the sequence identity of the designed polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 1% (e.g., not among the top 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%,
  • the target cell may be a liver cell and the cell of different tissue type may be a muscle cell so as to selectively express the gene or RNA equivalent thereof in a liver cell rather than a muscle cell.
  • the target cell may be a muscle cell and the cell of different tissue type may be a liver cell so as to selectively express the gene or RNA equivalent thereof in a muscle cell rather than a liver cell.
  • the method may be used to obtain a gene or RNA equivalent thereof useful for tissue-specific expression of the encoded protein.
  • the method includes preparing a vector containing the codon-optimized gene or RNA equivalent thereof.
  • the vector may be, for instance, a viral vector, such as an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, or a vaccinia virus, among others.
  • the vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
  • the vector may be a pseudotyped AAV, such as rAAV2/8 or rAAV2/9.
  • the method includes contacting the vector with a population of cells in vitro or in vivo, for instance, so as to promote the expression of the encoded protein in a target cell.
  • the RNA equivalent is delivered to the cell (e.g., a target cell) directly, such as by contacting the cell with a liposome, vesicle (e.g., a synthetic vesicle), exosome (e.g., a synthetic exosome), dendrimer, nanoparticle (e.g., a chitosan-based nanoparticle or other cationic nanoparticle), or other carrier so as to promote entry of the RNA equivalent into the target cell.
  • the method includes isolating the encoded protein from the population of cells.
  • the encoded protein may be purified from the population of cells and/or
  • the method may be used to obtain the encoded protein in high yield and in high purity.
  • the protein may be purified, for instance, to 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, 99.9%, 99.95%, 99.99% purity, or higher, as assessed, for instance, by a chromatographic procedure described herein or known in the art.
  • the invention provides a composition including a codon-optimized gene or RNA equivalent thereof designed or prepared according to any of the methods described above or herein.
  • the composition is a vector, such as a viral vector.
  • the viral vector is an AAV, adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, or a vaccinia virus, among others.
  • the vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
  • the vector is a pseudotyped AAV, such as rAAV2/8 or rAAV2/9.
  • the composition is a liposome, vesicle (e.g., a synthetic vesicle), exosome (e.g., a synthetic exosome), dendrimer, nanoparticle (e.g., a chitosan-based nanoparticle or other cationic nanoparticle), or other carrier.
  • the gene or RNA equivalent thereof encodes a therapeutic protein.
  • the therapeutic protein is a protein listed in Table 3, below.
  • the therapeutic protein is myotubularin 1 (MTM1).
  • the therapeutic protein is acid α-glucosidase (GAA).
  • the therapeutic protein is calsequestrin 2 (CASQ2).
  • the therapeutic protein is a uridine diphosphate (UDP) glucuronosyltransferase (UGT), such as UGT family 1 member A1 (UGT1A1).
  • the invention provides a method of treating a disease or condition in a patient characterized by a deficiency in a protein by administering to the patient a therapeutically effective amount of a codon-optimized gene or RNA equivalent thereof of any of the above aspects or
  • the disease or condition is X-linked myotubular myopathy (XLMTM) and the codon-optimized gene or RNA equivalent thereof encodes MTM1.
  • the disease or condition is a glycogen storage disease, such as Pompe Disease, and the codon-optimized gene or RNA equivalent thereof encodes a lysosomal enzyme, such as GAA.
  • the disease or condition is recessive CPVT and the codon-optimized gene or RNA equivalent thereof encodes CASQ2.
  • the disease or condition is Crigler-Najjar syndrome and the codon-optimized gene or RNA equivalent thereof encodes a UGT family member protein, such as UGT1A1.
  • the invention features compositions and methods to attenuate gene expression, such as compositions and methods designed on the basis of the Reading Frame Surveillance (RFS) model.
  • the RFS model posits a mechanism by which a ribosome can prevent +1 frameshifting during translation of an mRNA.
  • a complementary RNA (cRNA) is generated by an RNA-dependent RNA polymerase (RdRP).
  • short complementary RNAs (scRNAs)
  • Sensing of an increased distance between the ribosome and a scRNA terminus may trigger an alteration and/or abortion of the translation of an mRNA.
  • the RFS model suggests that sensing of +1 frameshifts during the translation process can lead to repeated cleavage of the mRNA transcript that may result in terminal 2-base pair 3' overhangs, which can facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells.
  • RISC is a multi-protein complex that can be programmed to target almost any gene for silencing, e.g., by a process called RNA interference (RNAi).
  • the invention features a method of attenuating expression of a gene (e.g., a wild type gene) in a cell (e.g., a mammalian cell, such as a human cell), such as a cell in vitro or in vivo.
  • the method can include the step of introducing into the cell a polynucleotide having at least one
  • the polynucleotide may include one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) nucleotide insertions or deletions causing the +1 frameshift mutation.
  • the +1 frameshift mutation occurs when five nucleotides are inserted with respect to the gene sequence.
  • a single nucleotide deletion may cause a +1 frameshift mutation.
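Both examples above (a five-nucleotide insertion and a single-nucleotide deletion) leave the downstream sequence in the same reading frame, which can be verified with modular arithmetic. This is a minimal sketch that assumes a "+1 frameshift" means a net change in length congruent to a loss of one nucleotide modulo 3; that reading is inferred from the two examples and is not stated as a definition in the text. The function names are hypothetical.

```python
def net_frame_offset(indels):
    """Net change in sequence length modulo 3.

    `indels` is a list of signed integers: positive values are insertions,
    negative values are deletions (in nucleotides).
    """
    return sum(indels) % 3


def is_plus_one_frameshift(indels):
    """True when the net length change is equivalent to deleting one
    nucleotide, i.e. congruent to -1 (which is 2) modulo 3.

    Assumption: this convention is inferred from the examples in the text.
    """
    return net_frame_offset(indels) == 2
```

Under this convention, `is_plus_one_frameshift([5])` and `is_plus_one_frameshift([-1])` both hold, while an in-frame change such as `[3]` does not shift the frame at all.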
  • the polynucleotide includes a wild type copy of the gene operably linked to the portion of the polynucleotide having a +1 frameshift mutation.
  • the polynucleotides may be DNA or RNA polynucleotides.
  • the polynucleotides may be introduced into a cell, such as a cell in vitro or in vivo, by contacting the cell with a vector including the polynucleotide.
  • the vector may be a DNA or RNA vector.
  • the vector may be a viral vector, such as a viral vector selected from the group consisting of an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus.
  • the viral vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
  • the viral vector may be a pseudotyped AAV, such as rAAV2/8 or rAAV2/9.
  • the polynucleotide may be introduced into a cell by electroporation, or by contacting said cell with a cationic polymer, a cationic lipid, or calcium phosphate.
  • the invention features a vector comprising a polynucleotide having a +1 frameshift mutation with respect to a wild type sequence of a gene.
  • the polynucleotide comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) nucleotide insertion or deletion causing the +1 frameshift mutation.
  • the +1 frameshift mutation occurs when five nucleotides are inserted with respect to the gene sequence.
  • a single nucleotide deletion may cause a +1 frameshift mutation.
  • the polynucleotide comprises a wild type copy of the gene operably linked to the portion of the polynucleotide having the +1 frameshift mutation.
  • the vector is a DNA or RNA vector.
  • the vector may be a viral vector, such as a viral vector selected from the group consisting of an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus.
  • the viral vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
  • the viral vector may be a pseudotyped AAV, such as rAAV2/8 or rAAV2/9.
  • aberrant expression can be characterized as the expression of a gene that encodes a mutant protein, e.g., a protein that contains at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) mutation, such as an insertion and/or a deletion.
  • An aberrant protein may exhibit structural and/or functional features that are different from those of the wild type protein, which may be associated with human disease.
  • the aberrantly expressed gene is associated with a genetic disease (e.g., dominant catecholaminergic polymorphic ventricular tachycardia (CPVT) and Long QT syndrome (LQTS)) and/or a proliferative disorder (e.g., cancer).
  • the invention provides compositions and methods for expressing a protein of interest, such as a therapeutic protein, in a target cell.
  • the nucleic acid constructs encoding the protein of interest can be delivered to a cell, such as a cell in vitro or in vivo.
  • the nucleic acid constructs can be prepared, e.g., using chemical synthesis techniques, such as through the use of solid phase nucleic acid synthesis protocols.
  • Nucleic acid constructs may be delivered to target cells using a variety of nucleic acid transfer technologies, including, for example, electroporation, lipid-mediated nucleic acid delivery (e.g., cationic lipid-mediated nucleic acid delivery), calcium phosphate-mediated nucleic acid delivery, and nanoparticle-mediated nucleic acid delivery, among others described herein.
  • the proteins encoded by these nucleic acid constructs can have a clinical benefit, as these constructs can be administered to a subject, such as a human subject, to treat a disease or condition characterized by a defect in, or a reduced expression of, the encoded protein.
  • Such diseases include heritable genetic disorders, such as recessive genetic diseases, including, without limitation, X-linked myotubular myopathy (XLMTM), Pompe disease, recessive catecholaminergic polymorphic ventricular tachycardia (CPVT), and Crigler-Najjar syndrome, among others.
  • the invention features a method of inducing expression of a protein in a cell, the method including introducing into the cell a duplex including an RNA polynucleotide encoding the protein, wherein the RNA polynucleotide is hybridized to a plurality of complementary nucleic acid strands (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more complementary nucleic acid strands).
  • the plurality of complementary nucleic acid strands comprises one or more RNA strands. In some embodiments, the plurality of complementary nucleic acid strands comprises one or more DNA strands.
  • the plurality of complementary nucleic acid strands together span from the
  • the plurality of complementary strands may together span from the start codon (e.g., AUG) at the 5' end of the RNA polynucleotide encoding the protein to the stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein.
  • the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein.
  • the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a regulatory element downstream of the stop codon, such as a portion of the 3' UTR of the RNA polynucleotide.
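The region that the complementary strands are meant to cover can be located programmatically. The sketch below is an illustration under stated assumptions: it scans for the first AUG (or for the Kozak sequence given above) and then reads in frame to the first stop codon. The function name and the example handling are hypothetical, and real transcripts may need more careful start-site selection.

```python
STOP_CODONS = {"UAG", "UAA", "UGA"}
KOZAK = "GCCGCCACCAUGG"  # Kozak consensus sequence quoted in the text


def coding_span(rna: str, include_kozak: bool = False):
    """Return (begin, end) indices of the span to be covered, or None.

    The span ends just after the first in-frame stop codon. With
    include_kozak=True the span begins at the Kozak sequence rather
    than at the start codon.
    """
    if include_kozak:
        k = rna.find(KOZAK)
        if k == -1:
            return None
        aug = k + KOZAK.index("AUG")  # start codon inside the Kozak motif
        begin = k
    else:
        aug = rna.find("AUG")
        if aug == -1:
            return None
        begin = aug
    for i in range(aug, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return begin, i + 3
    return None
```

Slicing the RNA with the returned indices, `rna[begin:end]`, gives the stretch from the start codon (or the Kozak sequence) through the stop codon.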
  • each nucleotide of the RNA polynucleotide encoding the protein is base paired to a nucleotide of the plurality of complementary nucleic acid strands, for instance, such that there are no single stranded regions of the duplex.
  • the duplex includes one or more single stranded regions.
  • the single stranded region consists of a multiple of 3 nucleotides in length (e.g., 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, or more, nucleotides in length).
  • each of the complementary nucleic acid strands is from 9 to 30 nucleotides in length.
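The duplex geometry described above (complementary strands of 9 to 30 nucleotides, with any single stranded gaps being a multiple of 3 nucleotides long) can be sketched as a simple tiling. This is a hypothetical illustration, not the procedure of the specification: the function names, the fixed strand length, and the uniform gap are assumptions, and the final strand may be shorter than the requested length if the RNA does not divide evenly.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def tile_complementary_strands(rna: str, strand_len: int = 18, gap: int = 0):
    """Return (start_index, strand) pairs tiling `rna` from its 5' end.

    `gap` is the number of unpaired nucleotides left between adjacent
    strands; it must be a multiple of 3, as in the text.
    """
    if not 9 <= strand_len <= 30:
        raise ValueError("strand length must be from 9 to 30 nucleotides")
    if gap < 0 or gap % 3:
        raise ValueError("single stranded gap must be a non-negative multiple of 3")
    strands = []
    pos = 0
    while pos < len(rna):
        segment = rna[pos:pos + strand_len]
        strands.append((pos, reverse_complement(segment)))
        pos += strand_len + gap
    return strands
```

With `gap=0`, every nucleotide of the RNA is base paired, matching the embodiment with no single stranded regions; a positive `gap` leaves in-frame single stranded stretches between strands.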
  • one or more of the plurality of complementary nucleic acid strands comprise one or more modified nucleotides.
  • the one or more modified nucleotides comprise a modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate.
  • the one or more modified nucleotides comprise a modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate.
  • the one or more modified nucleotides comprise a modified uridine, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate.
  • the one or more modified nucleotides comprise a modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
  • the duplex is introduced into the cell by electroporation, or by contacting the cell with a cationic lipid, a vesicle, an exosome, a liposome, a dendrimer, a synthetic cationic vesicle, a cationic polymer, or calcium phosphate.
  • the cell is a eukaryotic cell, such as a mammalian cell. In some embodiments, the cell is a human cell.
  • the protein is a protein listed in Table 3.
  • the protein is selected from the group consisting of myotubularin 1 (MTM1), acid α-glucosidase (GAA), calsequestrin 2 (CASQ2), and uridine diphosphate glucuronosyltransferase family 1 member A1 (UGT1A1).
  • the invention provides a synthetic duplex including an RNA polynucleotide encoding a protein, wherein the RNA polynucleotide is hybridized to a plurality of complementary nucleic acid strands.
  • the plurality of complementary nucleic acid strands comprise one or more RNA strands. In some embodiments, the plurality of complementary nucleic acid strands comprise one or more DNA strands.
  • the plurality of complementary nucleic acid strands together span from the 5' end to the 3' end of the RNA polynucleotide encoding the protein.
  • the plurality of complementary strands may together span from the start codon (e.g., AUG) at the 5' end of the RNA polynucleotide encoding the protein to the stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein.
  • the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein.
  • the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a regulatory element downstream of the stop codon, such as a portion of the 3' UTR of the RNA polynucleotide.
  • each nucleotide of the RNA polynucleotide encoding the protein is base-paired to a nucleotide of the plurality of complementary nucleic acid strands, for instance, such that there are no single stranded regions of the duplex.
  • the duplex includes one or more single stranded regions.
  • the single stranded region consists of a multiple of 3 nucleotides in length (e.g., 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, or more, nucleotides in length).
  • each of the complementary nucleic acid strands is from 9 to 30
  • one or more of the plurality of complementary nucleic acid strands comprise one or more modified nucleotides.
  • the one or more modified nucleotides comprise a modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate.
  • the one or more modified nucleotides comprise a modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate.
  • the one or more modified nucleotides comprise a modified uridine, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate.
  • the one or more modified nucleotides comprise a modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
  • the composition is a cationic lipid, a vesicle, an exosome, a liposome, a dendrimer, a synthetic cationic vesicle, or a cationic polymer.
  • the protein is a protein listed in Table 3, such as a protein selected from the group consisting of MTM1, GAA, CASQ2, and UGT1A1.
  • the invention provides a method of treating a disease or condition in a patient characterized by a deficiency in a protein by administering to the patient a therapeutically effective amount of a composition of the foregoing aspect, or any embodiments thereof.
  • the disease or condition is X-linked myotubular myopathy (XLMTM) and the protein is MTM1.
  • the disease or condition is a glycogen storage disease, such as Pompe Disease, and the protein is a lysosomal enzyme, such as GAA.
  • the disease or condition is recessive CPVT and the protein is CASQ2.
  • the disease or condition is Crigler-Najjar syndrome and the protein is UGT1A1.
  • the invention provides a method for inducing alternative splicing in a cell by providing the cell with a polynucleotide that includes the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to an intron (INTR1); and a third region corresponding to a second exon (EX2) from the gene, thereby inducing alternative splicing.
  • the wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise a region corresponding to the one or more intervening exons between EX1 and EX2.
  • induced alternative splicing results in exon skipping during processing of the endogenous mRNA of the gene.
  • the invention provides a method for inducing exon skipping in a cell by providing the cell with a polynucleotide that includes the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to an intron (INTR1); and a third region corresponding to a second exon (EX2) from the gene, thereby inducing exon skipping.
  • the wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise a region corresponding to the one or more intervening exons.
  • the polynucleotide further includes a fourth region corresponding to a second intron (INTR2) and a fifth region corresponding to a third exon (EX3) from the gene, operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2-INTR2-EX3.
  • a wild-type form of the gene includes one or more intervening exons between EX2 and EX3, and the polynucleotide does not include a region corresponding to the one or more intervening exons between EX2 and EX3.
  • Induced alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to EX2 and EX3.
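The layout of an exon-skipping construct (exon, intron, exon, with the intervening wild-type exons left out) can be sketched as a simple concatenation. This is a hypothetical illustration: the exon names, placeholder sequences, and function name are assumptions made for the example, and the exons here are keyed by their position in the wild-type gene rather than by their position in the construct.

```python
def build_skipping_construct(exons: dict, keep: list, introns: list) -> str:
    """Assemble kept exons in 5'-to-3' order, separated by the given introns.

    `exons` maps a wild-type exon name to its sequence, `keep` lists the
    exons to include (in order), and `introns` supplies one intron between
    each adjacent pair of kept exons. Exons not named in `keep` are the
    intervening exons omitted from the construct.
    """
    if len(introns) != len(keep) - 1:
        raise ValueError("need exactly one intron between each pair of kept exons")
    parts = []
    for i, name in enumerate(keep):
        parts.append(exons[name])
        if i < len(introns):
            parts.append(introns[i])
    return "".join(parts)


# Placeholder sequences for a hypothetical three-exon gene.
gene_exons = {"E1": "AAA", "E2": "CCC", "E3": "GGG"}
construct = build_skipping_construct(gene_exons, keep=["E1", "E3"], introns=["GTnnAG"])
```

In this example the wild-type exon `E2` is omitted, so the construct has the exon-intron-exon arrangement described for EX1-INTR1-EX2, where the construct's second exon corresponds to the wild-type `E3`.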
  • the polynucleotide includes additional regions corresponding to additional introns (e.g., regions corresponding to a third (INTR3), fourth (INTR4), fifth (INTR5), sixth (INTR6), seventh (INTR7), eighth (INTR8), ninth (INTR9), or tenth (INTR10) intron, or more) as well as additional intervening regions corresponding to additional exons from the gene (e.g., regions corresponding to a fourth (EX4), fifth (EX5), sixth (EX6), seventh (EX7), eighth (EX8), ninth (EX9), tenth (EX10), or eleventh (EX11) exon, or more), such that each region corresponding to an intron is flanked by regions corresponding to an exon from the gene.
  • a wild-type form of the gene includes one or more intervening exons between, for example, EX3 and EX4, EX4 and EX5, EX5 and EX6, EX6 and EX7, EX7 and EX8, EX8 and EX9, EX9 and EX10, and/or EX10 and EX11, and the polynucleotide does not include a region corresponding to the one or more intervening exons.
  • Alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to, for example, EX1-4 (e.g., EX1, EX2, EX3, and EX4), EX1-EX5, EX1-EX6, EX1-EX7, EX1-EX8, EX1-EX9, EX1-EX10, or EX1-EX11.
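The construct layout described above (exon regions alternating with intron regions, each intron flanked by exons) can be sketched as a small data model. This is only an illustration under stated assumptions: the names `Region`, `check_layout`, `construct_sequence`, and `spliced_sequence` are hypothetical, and the sequences are placeholders that are not taken from any real gene or from the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    name: str  # label such as "EX1" or "INTR1"
    kind: str  # "exon" or "intron"
    seq: str   # placeholder sequence written 5'-to-3'


def check_layout(regions: List[Region]) -> None:
    """Require exon-intron-...-exon order so every intron is flanked by exons."""
    if len(regions) < 3 or len(regions) % 2 == 0:
        raise ValueError("expected exon-intron-...-exon with at least three regions")
    for index, region in enumerate(regions):
        expected = "exon" if index % 2 == 0 else "intron"
        if region.kind != expected:
            raise ValueError(f"{region.name}: expected {expected}, got {region.kind}")


def construct_sequence(regions: List[Region]) -> str:
    """Concatenate all regions 5'-to-3', as in EX1-INTR1-EX2."""
    check_layout(regions)
    return "".join(region.seq for region in regions)


def spliced_sequence(regions: List[Region]) -> str:
    """Model the processed transcript: introns removed, exons joined directly."""
    check_layout(regions)
    return "".join(region.seq for region in regions if region.kind == "exon")


# Hypothetical placeholder sequences (assumption, for illustration only).
construct = [
    Region("EX1", "exon", "ATGGCCAAG"),
    Region("INTR1", "intron", "GTAAGTTTTCTAACCTTTTCAG"),
    Region("EX2", "exon", "GACGAGCTG"),
]
```

Calling `spliced_sequence(construct)` joins the EX1 and EX2 placeholders directly, mirroring the statement that the mature mRNA contains EX1 spliced to EX2. The model does not attempt to predict splice-site usage.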
  • the invention provides a method for inducing alternative splicing by way of exon inclusion in a cell by providing the cell with a polynucleotide that includes the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to a second exon (EX2); and a third region corresponding to a third exon (EX3) from the gene, thereby inducing inclusion of the second exon in an endogenous mRNA transcript.
  • EX1 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729,
  • EX2 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720
  • EX2 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729
  • EX3 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729
  • one or more of EX1, EX2, and EX3 does not include a full-length exon.
  • induced alternative splicing results in the inclusion of the second exon (e.g., the region in the gene corresponding to EX2) during processing of the endogenous mRNA of the gene.
  • the invention features a composition including a vector including a polynucleotide including the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to an intron (INTR1); and a third region corresponding to a second exon (EX2) from the gene.
  • the wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise a region corresponding to the one or more intervening exons.
  • the wild-type form of the gene includes a plurality of intervening exons between EX1 and EX2, and the polynucleotide does not contain a region corresponding to any of the intervening exons.
  • the polynucleotide further includes a fourth region corresponding to a second intron (INTR2) and a fifth region corresponding to a third exon (EX3) from the gene, operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2-INTR2-EX3.
  • a wild-type form of the gene includes one or more intervening exons between EX2 and EX3, and the polynucleotide does not include a region corresponding to the one or more intervening exons between EX2 and EX3.
  • Induced alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to EX2 and EX3.
  • the polynucleotide includes additional regions corresponding to additional introns (e.g., regions corresponding to a third (INTR3), fourth (INTR4), fifth (INTR5), sixth (INTR6), seventh (INTR7), eighth (INTR8), ninth (INTR9), or tenth (INTR10) intron, or more) as well as additional intervening regions corresponding to additional exons from the gene (e.g., regions corresponding to a fourth (EX4), fifth (EX5), sixth (EX6), seventh (EX7), eighth (EX8), ninth (EX9), tenth (EX10), or eleventh (EX11) exon, or more), such that each region corresponding to an intron is flanked by regions corresponding to an exon from the gene.
  • a wild-type form of the gene includes one or more intervening exons between, for example, EX3 and EX4, EX4 and EX5, EX5 and EX6, EX6 and EX7, EX7 and EX8, EX8 and EX9, EX9 and EX10, and/or EX10 and EX11, and the polynucleotide does not include a region corresponding to the one or more intervening exons.
  • Alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to, for example, EX1-4 (e.g., EX1, EX2, EX3, and EX4), EX1-EX5, EX1-EX6, EX1-EX7, EX1-EX8, EX1-EX9, EX1-EX10, or EX1-EX11.
  • the region corresponding to an intron (e.g., INTR1, INTR2, INTR3, INTR4, INTR5, INTR6, INTR7, INTR8, INTR9, or INTR10, etc.) corresponds to any polynucleotide region that includes both a splice donor and splice acceptor, such that when a polynucleotide including, for example, EX1-INTR1-EX2, operably linked 5'-to-3', is expressed, INTR1 allows for splicing of the pre-mRNA such that the mature mRNA contains EX1 directly spliced to EX2 in a 5'-to-3' orientation.
  • the region corresponding to an intron corresponds to a full-length intron from a gene.
  • the region corresponding to an intron corresponds to a truncated intron from a gene.
  • when INTR1 (or another region corresponding to an intron as described herein) is derived from a gene, INTR1 (or another region corresponding to an intron as described herein) may be from the same gene as EX1 and/or EX2 (or another region corresponding to an exon from the gene as described herein), or may be from a gene that does not contain EX1 and/or EX2 (or another region corresponding to an exon as described herein).
  • the region corresponding to an intron includes the following elements linked operably in the 5'-to-3' direction: a 5' region that includes a splice donor site; optionally, an intervening region; and a 3' region that includes a splice acceptor site.
  • the 5' region may have a length in nucleotides of, for example, 0-500, 500-1000, 1000-1500, or 1500-2000.
  • the 3' region may have a length in nucleotides of, for example, 0-500, 500-1000, 1000-1500, or 1500-2000.
  • the intervening region may have a length, in nucleotides, of 0-500, 500-1000, 1000-1500, or 1500-2000.
  • the polynucleotide includes a eukaryotic promoter (P Euk ), operably linked
  • the eukaryotic promoter may be, for example, a tissue-specific promoter, such as a promoter selected from the group consisting of desmin promoter, creatine kinase promoter, myogenin promoter, a myosin heavy chain promoter, human brain and natriuretic peptide promoter, albumin promoter, α-1-antitrypsin promoter, and hepatitis B virus core protein promoter; or a constitutive promoter, such as a promoter selected from the group consisting of CMV promoter, chicken β-actin promoter, HSV promoter, TK promoter, RSV promoter, SV40 promoter, MMTV promoter, and Adenovirus E1A promoter.
  • the polynucleotide includes a region of less than or equal to 2000 nucleotides linked 5' to the region corresponding to an exon from the gene (e.g., EX1) that is not from the gene including, e.g., EX1 and EX2. In some embodiments, the polynucleotide includes a region of less than or equal to 2000 nucleotides linked 3' to, e.g., EX2 that is not from the gene including EX1 and EX2.
  • the polynucleotide includes a region of less than or equal to 2000 nucleotides linked 5' to EX1 that is not from the gene including, e.g., EX1 and EX2 and a region of less than or equal to 2000 nucleotides linked 3' to EX2 that is not from the gene including, e.g., EX1 and EX2.
  • the polynucleotide contains a Kozak consensus sequence with a role in translation initiation by P Euk .
  • the length in nucleotides of the region corresponding to an exon from the gene is “x” or any multiple of "x", wherein "x” is 9-30 nucleotides.
  • the length in nucleotides of another region corresponding to an exon from the gene is independently “x” or any multiple of "x", wherein "x” is 9-30 nucleotides.
  • the length in nucleotides of a region corresponding to an exon from the gene is 9 nucleotides or any multiple of 9 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 10 nucleotides or any multiple of 10 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 1 1 nucleotides or any multiple of 1 1 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 12 nucleotides or any multiple of 12 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 13 nucleotides or any multiple of 13 nucleotides (e.g., 13, 26, 39, 52, 65, 78, 91, 104, 117, 130, 143, 156, 169, 182, 195, 208, 221, 234, 247, 260, 273, 286, 299, 312, 325, 338, 351, 364, 377, 390, 403, 416, 429, 442, 455, 468, 481, 494, 507, 520, 533, 546, 559, 572, 585, 598, 611, 624, 637, 650, 663, 676, 689, 702, 715, 728, 741, 754, 767, 780, 793, 806, 819, 832, 845, 858, 871, 884
  • the length in nucleotides of a region corresponding to an exon from the gene is 14 nucleotides or any multiple of 14 nucleotides (e.g., 14, 28, 42, 56, 70, 84, 98, 112, 126, 140, 154, 168, 182, 196, 210, 224, 238, 252, 266, 280, 294, 308,
  • the length in nucleotides of a region corresponding to an exon from the gene is 15 nucleotides or any multiple of 15 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 16 nucleotides or any multiple of 16 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 17 nucleotides or any multiple of 17 nucleotides (e.g., 17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187, 204, 221, 238, 255, 272, 289, 306, 323, 340, 357, 374, 391, 408, 425, 442, 459, 476, 493, 510, 527, 544, 561, 578, 595, 612, 629, 646, 663, 680, 697, 714, 731, 748, 765, 782, 799, 816, 833, 850, 867, 884, 901, 918, 935, 952, 969, 986, 1003), and the length in nucleotides of another region corresponding to an exon from the
  • the length in nucleotides of a region corresponding to an exon from the gene is 18 nucleotides or any multiple of 18 nucleotides (e.g., 18, 36, 54, 72, 90, 108, 126, 144, 162, 180, 198, 216, 234, 252, 270, 288, 306, 324, 342, 360, 378, 396, 414, 432, 450, 468, 486, 504, 522, 540, 558, 576, 594, 612, 630, 648, 666, 684, 702, 720, 738, 756, 774, 792, 810, 828, 846, 864, 882, 900, 918, 936, 954, 972, 990, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 18 nucleotides or any multiple of 18 nucleotides (e.g., 18, 36, 54, 72, 90, 108, 126,
  • the length in nucleotides of a region corresponding to an exon from the gene is 19 nucleotides or any multiple of 19 nucleotides (e.g., 19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247, 266, 285, 304, 323, 342, 361, 380, 399, 418, 437, 456, 475, 494, 513, 532, 551, 570, 589, 608, 627, 646, 665, 684, 703, 722, 741, 760, 779, 798, 817, 836, 855, 874, 893, 912, 931, 950, 969, 988, 1007), and the length in nucleotides of another region corresponding to an exon from the gene is independently 19 nucleotides or any multiple of 19 nucleotides (e.g., EX1, among others described herein) is 19 nucleot
  • the length in nucleotides of a region corresponding to an exon from the gene is 20 nucleotides or any multiple of 20 nucleotides (e.g.,
  • the length in nucleotides of another region corresponding to an exon from the gene is independently 20 nucleotides or any multiple of 20 nucleotides (e.g., 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600, 620, 640, 660, 680, 700, 720, 740, 760, 780, 800, 820, 840, 860, 880, 900, 920, 940, 960, 980, 1000), and the length in nucleotides of another region corresponding to an exon from the gene is independently 20 nucleotides or any multiple of 20 nucleotides (e.g., 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300
  • the length in nucleotides of a region corresponding to an exon from the gene is 21 nucleotides or any multiple of 21 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 22 nucleotides or any multiple of 22 nucleotides (e.g., 22, 44, 66, 88, 110, 132, 154, 176, 198, 220, 242, 264, 286, 308, 330, 352, 374, 396, 418, 440, 462, 484, 506, 528, 550, 572, 594, 616, 638, 660, 682, 704, 726, 748, 770, 792, 814, 836, 858, 880, 902, 924, 946, 968, 990, 1012), and the length in nucleotides of another region corresponding to an exon from the gene is independently 22 nucleotides or any multiple of 22 nucleotides (e.g., 22, 44, 66, 88, 110, 132, 154, 176,
  • the length in nucleotides of a region corresponding to an exon from the gene is 23 nucleotides or any multiple of 23 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 24 nucleotides or any multiple of 24 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 25 nucleotides or any multiple of 25 nucleotides (e.g.,
  • nucleotides of another region corresponding to an exon from the gene is independently 25 nucleotides or any multiple of 25 nucleotides (e.g., 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975, 1000), and the length in nucleotides of another region corresponding to an exon from the gene is independently 25 nucleotides or any multiple of 25 nucleotides (e.g., 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 6
  • the length in nucleotides of a region corresponding to an exon from the gene is 26 nucleotides or any multiple of 26 nucleotides (e.g.,
  • 26 nucleotides or any multiple of 26 nucleotides e.g., 26, 52, 78, 104, 130, 156, 182, 208, 234, 260, 286, 312, 338, 364, 390, 416, 442, 468, 494, 520, 546, 572, 598, 624, 650, 676, 702, 728, 754, 780, 806, 832, 858, 884, 910, 936, 962, 988, 1014), and the length in nucleotides of another region corresponding to an exon from the gene is independently 26 nucleotides or any multiple of 26 nucleotides (e.g., 26, 52, 78, 104, 130, 156, 182, 208, 234, 260, 286, 312, 338, 364, 390, 416, 442, 468, 494, 520, 546, 572, 598, 624, 650, 676, 702, 728, 754, 780, 806, 832,
  • the length in nucleotides of a region corresponding to an exon from the gene is 27 nucleotides or any multiple of 27 nucleotides (e.g.,
  • the length in nucleotides of a region corresponding to an exon from the gene is 28 nucleotides or any multiple of 28 nucleotides (e.g.,
  • nucleotides of another region corresponding to an exon from the gene is independently 28 nucleotides or any multiple of 28 nucleotides (e.g., 28, 56, 84, 112, 140, 168, 196, 224, 252, 280, 308, 336, 364, 392,
  • the length in nucleotides of a region corresponding to an exon from the gene is 29 nucleotides or any multiple of 29 nucleotides (e.g.,
  • nucleotides of another region corresponding to an exon from the gene is independently 29 nucleotides or any multiple of 29 nucleotides (e.g., 29, 58, 87, 116, 145, 174, 203, 232, 261, 290, 319, 348, 377, 406, 435, 464, 493, 522, 551, 580, 609, 638, 667, 696, 725, 754, 783, 812, 841, 870, 899, 928, 957, 986, 1015).
  • the length in nucleotides of a region corresponding to an exon from the gene is 30 nucleotides or any multiple of 30 nucleotides (e.g.,
  • nucleotides of another region corresponding to an exon from the gene is independently 30 nucleotides or any multiple of 30 nucleotides (e.g., 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360, 390, 420, 450, 480, 510, 540, 570, 600, 630, 660, 690, 720, 750, 780, 810, 840, 870, 900, 930, 960, 990, 1020).
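The enumerated lengths above follow a simple pattern: for a unit length "x" between 9 and 30 nucleotides, the examples list successive multiples of "x", and the complete lists stop at the first multiple that reaches or exceeds 1000. A minimal sketch of that arithmetic follows; the function name `exon_region_lengths` and the `limit` parameter are hypothetical and chosen only for illustration.

```python
from typing import List


def exon_region_lengths(x: int, limit: int = 1000) -> List[int]:
    """List multiples of x up to the first multiple that is >= limit.

    The 9-30 bound on x follows the range stated in the text; the stopping
    rule is inferred from the complete example lists and is an assumption.
    """
    if not 9 <= x <= 30:
        raise ValueError("x is expected to be 9-30 nucleotides")
    count = -(-limit // x)  # ceiling division
    return [x * k for k in range(1, count + 1)]
```

For example, `exon_region_lengths(29)` ends at 1015 and `exon_region_lengths(30)` ends at 1020, matching the last entries of the corresponding lists above.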
  • the polynucleotide is incorporated into a vector and delivered to a cell, wherein the cell may be a human cell.
  • the cell includes an endogenous gene including the following exons linked operably in a 5'-to-3' direction: EX1 , an exon having a mutation relative to the wild type form of the gene, and EX2.
  • the gene may also contain one or more additional exons 5' to EX1, 3' to EX2, or in the intervening sequence between EX1 and EX2.
  • the mutation is a point mutation resulting in a frameshift, or a point mutation that generates a PTC, resulting in a nonsense mutation. In some embodiments, the mutation is an insertion or deletion resulting in a frameshift. In some embodiments the mutation is a duplication or deletion of a region including a full-length exon, such that the duplication or deletion results in a frameshift.
  • the gene is the DMD gene, which codes for the protein dystrophin, and the mutation is a point mutation.
  • the exon having the mutation may be selected from the group consisting of exon 2, exons 6-8, exon 17, exons 19-20, exon 35, exons 43-59, and exons 69-70.
  • the exon bearing the mutation in the DMD gene is exon 2.
  • the exon bearing the mutation in the DMD gene is exon 6.
  • the exon bearing the mutation in the DMD gene is exon 7.
  • the exon bearing the mutation in the DMD gene is exon 8.
  • the exon bearing the mutation in the DMD gene is exon 17.
  • the exon bearing the mutation in the DMD gene is exon 19. In some embodiments, the exon bearing the mutation in the DMD gene is exon 20. In some embodiments, the exon bearing the mutation in the DMD gene is exon 35. In some embodiments, the exon bearing the mutation in the DMD gene is exon 43. In some embodiments, the exon bearing the mutation in the DMD gene is exon 44. In some embodiments, the exon bearing the mutation in the DMD gene is exon 45. In some embodiments, the exon bearing the mutation in the DMD gene is exon 46. In some embodiments, the exon bearing the mutation in the DMD gene is exon 47. In some embodiments, the exon bearing the mutation in the DMD gene is exon 48.
  • the exon bearing the mutation in the DMD gene is exon 49. In some embodiments, the exon bearing the mutation in the DMD gene is exon 50. In some embodiments, the exon bearing the mutation in the DMD gene is exon 51. In some embodiments, the exon bearing the mutation in the DMD gene is exon 52. In some embodiments, the exon bearing the mutation in the DMD gene is exon 53. In some embodiments, the exon bearing the mutation in the DMD gene is exon 54. In some embodiments, the exon bearing the mutation in the DMD gene is exon 55. In some embodiments, the exon bearing the mutation in the DMD gene is exon 56. In some embodiments, the exon bearing the mutation in the DMD gene is exon 57.
  • the exon bearing the mutation in the DMD gene is exon 58. In some embodiments, the exon bearing the mutation in the DMD gene is exon 59. In some embodiments, the exon bearing the mutation in the DMD gene is exon 69. In some embodiments, the exon bearing the mutation in the DMD gene is exon 70. In some embodiments, in the wild-type DMD gene, EX1 is 5' to the mutation-bearing exon and/or EX2 is 3' to the mutation-bearing exon. In some embodiments, the mutation-bearing exon is not present in the polynucleotide.
  • the gene is the DMD gene and the mutation is a duplication or deletion of a region including a full-length exon, such that the duplication or deletion results in a frameshift.
  • alternative splicing, for instance, by way of exon skipping, of one or more exons may restore the downstream reading frame.
  • the exon to be skipped is selected from the group consisting of exon 2, exons 6-8, exon 17, exons 19-20, exon 35, exons 43-59, and exons 69-70.
  • the exon to be skipped is exon 2.
  • the exon to be skipped is exon 6. In some embodiments, the exon to be skipped is exon 7. In some embodiments, the exon to be skipped is exon 8. In some embodiments, the exon to be skipped is exon 17. In some embodiments, the exon to be skipped is exon 19. In some embodiments, the exon to be skipped is exon 20. In some embodiments, the exon to be skipped is exon 35. In some embodiments, the exon to be skipped is exon 43. In some embodiments, the exon to be skipped is exon 44. In some embodiments, the exon to be skipped is exon 45. In some embodiments, the exon to be skipped is exon 46.
  • the exon to be skipped is exon 47. In some embodiments, the exon to be skipped is exon 48. In some embodiments, the exon to be skipped is exon 49. In some embodiments, the exon to be skipped is exon 50. In some embodiments, the exon to be skipped is exon 51. In some embodiments, the exon to be skipped is exon 52. In some embodiments, the exon to be skipped is exon 53. In some embodiments, the exon to be skipped is exon 54. In some embodiments, the exon to be skipped is exon 55. In some embodiments, the exon to be skipped is exon 56. In some embodiments, the exon to be skipped is exon 57.
  • the exon to be skipped is exon 58. In some embodiments, the exon to be skipped is exon 59. In some embodiments, the exon to be skipped is exon 69. In some embodiments, the exon to be skipped is exon 70.
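The statement that skipping one or more exons may restore the downstream reading frame comes down to arithmetic modulo 3: the frame downstream is preserved when the total number of nucleotides removed from the transcript is a multiple of 3. The sketch below illustrates only that arithmetic. The function names are hypothetical, the exon lengths are invented placeholders rather than DMD exon lengths, and it ignores whether a candidate exon is adjacent to the deleted one.

```python
from typing import Dict, Iterable, List, Set


def frame_preserved(removed_lengths: Iterable[int]) -> bool:
    """True when the removed nucleotides total a multiple of 3."""
    return sum(removed_lengths) % 3 == 0


def exons_restoring_frame(
    exon_lengths: Dict[str, int],
    deleted: Set[str],
    candidates: Iterable[str],
) -> List[str]:
    """Return candidate exons whose additional skipping restores the frame."""
    deleted_total = sum(exon_lengths[name] for name in deleted)
    return [
        name
        for name in candidates
        if name not in deleted and (deleted_total + exon_lengths[name]) % 3 == 0
    ]


# Hypothetical exon lengths in nucleotides (assumption, not real gene data).
lengths = {"exon_a": 120, "exon_b": 100, "exon_c": 131, "exon_d": 90}
restoring = exons_restoring_frame(lengths, {"exon_b"}, ["exon_a", "exon_c", "exon_d"])
```

With these placeholder values, deleting `exon_b` removes 100 nucleotides and shifts the frame; additionally skipping `exon_c` removes 231 nucleotides in total, which is a multiple of 3.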
  • the mutation of the dystrophin gene may be related to a pathological state.
  • the pathological state may be, for instance, Duchenne muscular dystrophy (DMD) or Becker muscular dystrophy (BMD).
  • the polynucleotide is flanked by one or more cloning sites. In some embodiments, the polynucleotide is incorporated into a vector.
  • the vector is a viral vector selected from the group consisting of adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and vaccinia virus.
  • the viral vector including the polynucleotide is introduced into a cell.
  • the viral vector is an AAV selected from the group consisting of AAV1 , AAV2.
  • the viral vector is a pseudotyped AAV selected from rAAV2/8 or rAAV2/9.
  • the polynucleotide is introduced into a cell by electroporation, or by contacting the cell with a cationic polymer, a cationic lipid, or calcium phosphate.
  • the term "about” refers to a value that is within 10% above or below the value being described.
  • affinity refers to the strength of a binding interaction between two molecules, such as the interaction between two polynucleotides.
  • Kd is intended to refer to the dissociation constant, which can be obtained, for example, from the ratio of the rate constant for the dissociation of the two molecules (kd) to the rate constant for the association of the two molecules (ka) and is expressed as a molar concentration (M). Kd values for the interaction between two molecules, such as the interaction between two polynucleotides, can be determined, for example, using methods established in the art.
  • Methods that can be used to determine the Kd of a receptor-ligand interaction include surface plasmon resonance, e.g., through the use of a biosensor system such as a BIACORE ® system, as well as calorimetry procedures, including isothermal titration calorimetry techniques described herein or known in the art.
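The ratio that defines Kd can be written as a one-line calculation. The sketch below assumes the dissociation rate constant is given in s⁻¹ and the association rate constant in M⁻¹ s⁻¹, so that the result is in molar units; the function and parameter names are hypothetical and the numeric values are illustrative only.

```python
import math


def dissociation_constant(k_dissociation: float, k_association: float) -> float:
    """Return Kd (M) as the dissociation rate constant over the association rate constant.

    Assumed units: k_dissociation in 1/s, k_association in 1/(M*s).
    """
    if k_association <= 0 or k_dissociation < 0:
        raise ValueError("rate constants must be non-negative, with k_association > 0")
    return k_dissociation / k_association


# Illustrative values only: 1e-3 per second and 1e5 per molar per second.
example_kd = dissociation_constant(1e-3, 1e5)
```

A smaller Kd corresponds to a stronger binding interaction, which is consistent with the definition of affinity given above.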
  • the terms "attenuated expression,” “attenuated expression level,” or “attenuated levels,” refer to a decreased expression or decreased level of a gene (e.g., a wild type gene) in a cell (e.g., a mammalian cell, e.g., a human cell) relative to a control cell, such as a cell that does not contain a polynucleotide having a +1 frameshift mutation with respect to the gene (e.g., the wild type gene).
  • the term "basal level" in the context of gene expression refers to an endogenous expression level of an encoded protein that is above the limit of detection of a conventional protein detection assay.
  • Conventional protein detection assays include, without limitation, ELISA-based assays, immunoblot assays (e.g., Western blot assays), mass spectrometry (such as, for instance, matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry and electrospray ionization (ESI) mass spectrometry), and spectroscopic methods, such as nuclear magnetic resonance (NMR), infrared (IR) spectroscopy, and UV-Vis spectroscopy, among others.
  • cell type refers to a group of cells sharing a phenotype that is statistically separable based on gene expression data. For instance, cells of a common cell type may share similar structural and/or functional characteristics, such as similar gene activation patterns and antigen presentation profiles. Cells of a common cell type may include those that are isolated from a common tissue (e.g., epithelial tissue, neural tissue, connective tissue, or muscle tissue) and/or those that are isolated from a common organ, tissue system, blood vessel, or other structure and/or region in an organism.
  • cRNA complementary RNA
  • CpG refers to a site within a nucleic acid molecule in which a cytosine nucleoside is bound to a guanosine nucleoside by way of a phosphodiester bond.
  • CpG content in the context of a polynucleotide refers to the relative quantity of CpG sites within a nucleic acid molecule.
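As a rough illustration of the CpG definitions above, the sketch below counts CpG sites as occurrences of a cytosine immediately followed by a guanine on the given strand. The text does not state how the "relative quantity" is normalized, so dividing by the number of dinucleotide positions is an assumption made here, and the function names are hypothetical.

```python
def cpg_count(seq: str) -> int:
    """Count 5'-CG-3' dinucleotides in the sequence as written."""
    return seq.upper().count("CG")


def cpg_content(seq: str) -> float:
    """CpG sites per dinucleotide position (normalization is an assumption)."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    return cpg_count(seq) / (len(seq) - 1)
```

Other normalizations, such as CpG sites per total nucleotides, would serve equally well for comparing two sequences of similar length.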
  • endogenous describes a molecule (e.g., a polypeptide, nucleic acid, or cofactor) that is found naturally in a particular organism (e.g., a human) or in a particular location within an organism (e.g., an organ, a tissue, or a cell, such as a human cell).
  • exogenous describes a molecule (e.g., a polypeptide, nucleic acid, or cofactor) that is not found naturally in a particular organism (e.g., a human) or in a particular location within an organism (e.g., an organ, a tissue, or a cell, such as a human cell).
  • Exogenous materials include those that are provided from an external source to an organism or to cultured matter extracted therefrom.
  • exon refers to a region within the coding region of a gene, the nucleotide sequence of which determines the amino acid sequence of the corresponding protein.
  • exon also refers to the corresponding region of the RNA transcribed from a gene.
  • a gene, as defined above, may contain, for instance, a minimum of three exons separated by intervening introns. Exons are transcribed into pre-mRNA, and may be included in the mature mRNA depending on the alternative splicing of the gene. Exons that are included in the mature mRNA following processing are translated into protein, wherein the sequence of the exon determines the amino acid composition of the protein.
  • exon skipping refers to a form of alternative splicing wherein one or more exons are excluded from the resulting mature mRNA. Exon skipping may occur as a result of endogenous regulation, or may be induced in a pre-determined manner by the presence of a polynucleotide.
  • a first region corresponding to a first exon from a gene and "EX1" are used interchangeably and refer to a region that corresponds to any region of an exon greater than 20 nucleotides in length directly upstream of the splice donor site, wherein the region may be from any exon of the gene.
  • a first region corresponding to a first exon from a gene may be any full-length exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.).
  • a first region corresponding to a first exon from a gene may be a truncated portion of any exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.).
  • As used herein, the terms "a second region corresponding to a second exon from a gene" and "EX2" are used interchangeably and refer to a region that corresponds to any region of an exon greater than 20 nucleotides in length directly downstream of the splice acceptor site, wherein the region may be from any exon of the gene.
  • a second region corresponding to a second exon from a gene may be any full-length exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.).
  • a second region corresponding to a second exon from a gene may be a truncated portion of any exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.).
  • a third region corresponding to a third exon from a gene refers to a region that corresponds to any region of an exon greater than 20 nucleotides in length directly downstream of the splice donor site, wherein the region may be from any exon of the gene, as described for EX1 and EX2 herein.
  • the term "intron” refers to a region within the coding region of a gene, the nucleotide sequence of which is not translated into the amino acid sequence of the corresponding protein.
  • the term intron also refers to the corresponding region of the RNA transcribed from a gene.
  • the term “gene” refers to a gene as defined below that contains at minimum two introns, each of which forms the intervening sequence between two exons. Introns are transcribed into pre-mRNA, but are removed during processing, and are not included in the mature mRNA.
  • a second region corresponding to an intron and "INTR1," as well as subsequent enumerations (e.g., "INTR2," "INTR3," "INTR4," "INTR5," "INTR6," "INTR7," "INTR8," "INTR9"), refer to any polynucleotide region that includes both a splice donor and splice acceptor, such that when a polynucleotide including EX1-INTR1-EX2, operably linked 5'-to-3', is expressed, INTR1 allows for splicing of the pre-mRNA such that the mature mRNA contains EX1 directly spliced to EX2 in a 5'-to-3' orientation.
  • INTR1 corresponds to a full-length intron from a gene.
  • INTR1 corresponds to a truncated intron from the gene.
  • INTR1 is derived from a gene
  • INTR1 may be from the same gene as EX1 and EX2, or may be from a gene that does not contain EX1 and EX2.
  • INTR1 is a synthetic intron, not isolated from any known gene.
  • INTR1 includes the following elements linked operably in the 5'-to-3' direction: a 5' region that includes a splice donor site; optionally, an intervening region; and a 3' region that includes a splice acceptor site.
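The EX1-INTR1-EX2 arrangement described above can be illustrated with a minimal sketch. The sequences below are hypothetical placeholders rather than sequences from any gene, and the `splice` helper is an assumption introduced only for illustration; it simply removes every segment labeled as an intron, which is the outcome the definitions describe for the mature mRNA.

```python
def splice(segments):
    """Return (pre_mrna, mature_mrna) for an ordered list of (kind, sequence) pairs.

    kind is "exon" or "intron"; sequences are written 5'-to-3'.
    """
    for kind, _ in segments:
        if kind not in ("exon", "intron"):
            raise ValueError(f"unknown segment kind: {kind}")
    # The pre-mRNA contains every segment in order.
    pre_mrna = "".join(seq for _, seq in segments)
    # The mature mRNA keeps only the exons, joined directly to one another.
    mature_mrna = "".join(seq for kind, seq in segments if kind == "exon")
    return pre_mrna, mature_mrna


# Hypothetical sequences, for illustration only.
EX1 = "AUGGCC"
INTR1 = "GUAAGUUUUCAG"
EX2 = "GAAUAA"

pre_mrna, mature_mrna = splice([("exon", EX1), ("intron", INTR1), ("exon", EX2)])
print(pre_mrna)     # EX1-INTR1-EX2
print(mature_mrna)  # EX1 spliced directly to EX2
```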
  • frameshift refers to a change in how a ribosome defines a codon within a gene, and therefore defines the reading frame of translation.
  • the identity of each amino acid in a protein is determined by a three nucleotide codon, which is defined by the ribosome through binding of the anticodon of a tRNA to its complementary sequence on an mRNA. Therefore, any occurrence of a frameshift may alter the amino acid sequence of the translated protein both at and downstream of the location of the frameshift.
  • GC content refers to the quantity of nucleosides in a particular nucleic acid molecule, such as a DNA or RNA polynucleotide, that are either guanosine (G) or cytidine (C) relative to the total quantity of nucleosides present in the nucleic acid molecule.
  • GC Content = (((Total quantity of guanosine nucleosides) + (Total quantity of cytidine nucleosides)) / (Total quantity of nucleosides)) x 100
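As a minimal sketch of the GC content calculation defined above, the following function counts G and C residues in a sequence string and expresses them as a percentage of all residues. The function name and the example sequence are assumptions chosen for illustration.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a DNA or RNA sequence as a percentage."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # (G count + C count) / total count, scaled to a percentage.
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Hypothetical six-nucleotide sequence: four of its six residues are G or C.
print(round(gc_content("ATGCGC"), 2))
```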
  • a gene refers to a region of DNA that encodes a protein.
  • a gene may include regulatory regions and a protein-coding region.
  • a gene includes two or more introns and three or more exons, wherein each intron forms an intervening sequence between two exons.
  • gene expression profile refers to a measurement of the level of one or more genes expressed by an individual cell, a cell type, or by a plurality of cells present in a
  • RNA transcripts such as mRNA
  • qRT- PCR quantitative reverse-transcription polymerase chain reaction
  • Additional techniques useful for determining RNA transcript levels include RNA sequencing assays (RNA-Seq) as described herein.
  • Gene expression levels can also be determined by measuring the quantity of protein produced from translation of such transcripts, for instance, using an enzyme-linked immunosorbent assay (ELISA) format in conjunction with one or more antibodies that specifically bind the protein of interest.
  • a gene expression profile can also include information regarding the relative expression levels of various genes within a cell, cell type, or plurality of cells present in a homogenous or heterogeneous mixture of cell types.
  • a gene expression profile may include a ranking of the expression levels of a plurality of genes in a particular cell or cell type. The genes may be ranked, for example, in order based on their relative expression levels, such that genes that are expressed in higher quantities are assigned a higher rank and genes that are expressed in lower quantities are assigned a lower rank.
  • a gene expression profile may also include numerical values describing the differences in the expression levels of various genes.
  • a gene expression profile may include one or more metrics (for instance, a gene expression value based on mRNA expression, such as a numerical value obtained from an RNA-Seq or qRT-PCR assay described herein) that indicate the fold differences between the expression levels of various genes in the cell, cell type, or population of cells of interest.
  • An exemplary gene expression profile may contain, for instance, a set of 3 or more genes (for instance, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, or 100 or more genes) ranked based on relative expression level in the cell, cell type, or population of cells of interest as well as a corresponding series of numerical values obtained from a gene expression assay described herein or known in the art. These values can reflect the relative differences between the expression levels of the genes.
  • these values can reflect that, e.g., the first ranked gene in the expression profile is expressed at a 10-fold higher level relative to the second, third, fourth, or fifth genes in the expression profile.
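A gene expression profile of the kind described above, with genes ranked by relative expression and accompanied by fold differences, could be represented as in the following sketch. The gene names and expression values are hypothetical, and the choice to report each gene's fold difference relative to the top-ranked gene is an assumption made for illustration.

```python
def expression_profile(levels):
    """Rank genes by expression level (rank 1 = highest expression).

    Returns a list of (rank, gene, level, fold_below_top) tuples, where
    fold_below_top is the top-ranked level divided by the gene's own level.
    """
    if not levels:
        raise ValueError("at least one gene is required")
    ranked = sorted(levels.items(), key=lambda item: item[1], reverse=True)
    top_level = ranked[0][1]
    profile = []
    for rank, (gene, level) in enumerate(ranked, start=1):
        fold_below_top = top_level / level if level > 0 else float("inf")
        profile.append((rank, gene, level, fold_below_top))
    return profile


# Hypothetical expression values (e.g., normalized read counts).
measured = {"geneB": 50.0, "geneA": 500.0, "geneC": 5.0}
for rank, gene, level, fold in expression_profile(measured):
    print(rank, gene, level, fold)
```

In this hypothetical profile, the first ranked gene is expressed at a 10-fold higher level than the second ranked gene.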
  • a detailed description of the gene expression assays that can be used to quantitatively describe gene expression is provided herein. Additional gene expression assays that generate quantitative measures of gene expression are known in the art.
  • the term "genetic frameshift” refers to any mutation that results in a frameshift.
  • a deletion or insertion within an exon, wherein the deletion or insertion is not a multiple of three nucleotides, may result in a frameshift.
  • a duplication or deletion of an entire exon, wherein the number of nucleotides in the exon is not a multiple of three may result in a frameshift.
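The multiple-of-three rule stated above can be checked with a short sketch. The function name and its parameters are assumptions introduced for illustration; the function only considers the net change in coding length and does not model any other consequence of a mutation.

```python
def causes_frameshift(inserted_nt: int = 0, deleted_nt: int = 0) -> bool:
    """Return True if the net change in coding length is not a multiple of three."""
    if inserted_nt < 0 or deleted_nt < 0:
        raise ValueError("nucleotide counts must be non-negative")
    net_change = inserted_nt - deleted_nt
    return net_change % 3 != 0


print(causes_frameshift(deleted_nt=1))     # single-nucleotide deletion
print(causes_frameshift(inserted_nt=3))    # in-frame insertion of one codon
print(causes_frameshift(deleted_nt=100))   # deletion of a hypothetical 100-nt exon
```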
  • homopolymer in the context of a polynucleotide refers to a nucleic acid containing three or more (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 20, or more) continuous repeats of a single nucleotide.
  • Exemplary homopolymers include, e.g., poly adenosine phosphate (e.g., AAA, AAAA, AAAAA, AAAAAA, and the like), poly guanosine phosphate (e.g., GGG, GGGG, GGGGG, GGGGGG, and the like), poly cytidine phosphate (e.g., CCC, CCCC, CCCCC, CCCCCC, and the like), and poly thymidine phosphate (e.g., TTT, TTTT, TTTTT, TTTTTT, and the like), among others.
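Homopolymers as defined above (three or more continuous repeats of a single nucleotide) can be located with a regular expression. The following is a minimal sketch; the function name and the example sequence are assumptions.

```python
import re


def find_homopolymers(sequence, min_length=3):
    """Return (start_index, run) pairs for each run of one repeated nucleotide."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One nucleotide followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGTU])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(0)) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence containing one poly-A run and one poly-G run.
print(find_homopolymers("ATGAAAACCGGGT"))
```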
  • IC50 refers to the concentration of a substance (antagonist) that reduces the efficacy of a reference agonist or the constitutive activity of a biological target by 50%, for instance, as measured in a competitive ligand binding assay or in a cell-based functional assay.
  • the term "intron” refers to a region within the coding region of a gene, the nucleotide sequence of which is not translated into the amino acid sequence of the corresponding protein.
  • the term intron also refers to the corresponding region of the RNA transcribed from a gene.
  • a gene for example, may contain at minimum two introns, each of which forms the intervening sequence between two exons. Introns are transcribed into pre-mRNA, but are removed during processing, and are not included in the mature mRNA.
  • the term "low abundance” refers to the level of gene expression that is relatively less, for example, than the average (e.g., mean or median or an approximation thereof) of detectable genes expressed in a library, for instance, using standard gene detection techniques known in the art, such as qPCR or RNA-Seq.
  • a gene expressed at “low abundance” is generally expressed at levels among the bottom half (e.g., bottom half, bottom third, bottom quarter, bottom 10%, bottom 5%, bottom 1%, bottom 0.1%, or lower) of the transcriptome.
  • the term "minimize” or “minimization” refers to a process by which the lowest attainable value of a quantity is computed, subject to one or more constraints.
  • the quantity may be, for example, the percent identity of a polynucleotide, such as an RNA polynucleotide, with respect to another polynucleotide that is naturally expressed in a particular cell or cell type.
  • sequence identity minimization the constraints in place may require that the sequence identity of one polynucleotide be minimized with respect to a plurality of other polynucleotides, such as the set of polynucleotides expressed naturally in a particular cell or cell type.
  • the constraints in place may also require, for instance, that the polynucleotide being optimized retain the ability to encode a particular protein or polypeptide of interest, as assessed, for example, by translation of the nucleic acid sequence of the polynucleotide.
  • modified nucleotide refers to a nucleotide or portion thereof (e.g., adenosine, guanosine, thymidine, cytidine, or uridine) that has been altered by one or more enzymatic or synthetic chemical transformations.
  • exemplary alterations observed in modified nucleotides described herein or known in the art include the introduction of chemical substituents, such as halo, thio, amino, azido, alkyl, acyl, or other functional groups at one or more positions (e.g., the 2', 3', and/or 5' position) of a 2-deoxyribonucleotide or a ribonucleotide.
  • As used herein, the terms “mRNA,” “mature mRNA,” and “processed mRNA” are used interchangeably and refer to an RNA transcript of a gene after processing, wherein processing includes any of the following: addition of a 5' cap, splicing, and addition of a poly(A)-tail.
  • the resulting mRNA, mature mRNA, or processed mRNA does not contain introns, and contains only the exons determined by the splicing pattern.
  • mutation refers to any change in the sequence of a gene, such that the sequence is not identical to that of the wild type gene.
  • a mutation may be selected from the group including a single nucleotide point mutation that results in an amino acid change, a single nucleotide point mutation that results in a premature termination codon, a single nucleotide insertion, a single nucleotide deletion, the insertion of two or more contiguous nucleotides, the deletion of two or more contiguous nucleotides, the duplication of a contiguous region within a gene, or the deletion of a contiguous region within a gene.
  • a mutated gene may include a single mutation, or multiple mutations. A mutation may occur in any region of the gene.
  • nucleic acid and “polynucleotide” are used interchangeably and refer to polymers of nucleotides of any length.
  • polynucleotides are DNA polynucleotides and RNA polynucleotides.
  • operably linked in the context of a nucleic acid refers to a nucleic acid that is placed into a structural or functional relationship with another nucleic acid.
  • one segment of DNA may be operably linked to another segment of DNA if they are positioned relative to one another on the same contiguous DNA molecule and have a structural or functional relationship, such as a promoter or enhancer that is positioned relative to a coding region so as to facilitate transcription of the coding region.
  • the operably linked nucleic acids are not contiguous, but are positioned in such a way that they have a functional relationship with each other as nucleic acids or as proteins that are expressed by them. Enhancers, for example, do not have to be contiguous. Linking may be accomplished by ligation at convenient restriction sites or by using synthetic oligonucleotide adaptors or linkers.
  • percent (%) sequence identity refers to the percentage of amino acid (or nucleic acid) residues of a candidate sequence that are identical to the amino acid (or nucleic acid) residues of a reference sequence after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity (e.g., gaps can be introduced in one or both of the candidate and reference sequences for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software, such as BLAST, ALIGN, or Megalign (DNASTAR) software.
  • a reference sequence aligned for comparison with a candidate sequence may show that the candidate sequence exhibits from 50% to 100% sequence identity across the full length of the candidate sequence or a selected portion of contiguous amino acid (or nucleic acid) residues of the candidate sequence.
  • the length of the candidate sequence aligned for comparison purposes may be, for example, at least 30% (e.g., 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100%) of the length of the reference sequence.
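Once two sequences have been aligned (for instance, with one of the software packages named above), percent sequence identity can be computed as in the following sketch. It assumes the alignment has already been performed and that gaps are written as "-"; using the number of candidate residues as the denominator is one reading of the definition above and is an assumption of this example.

```python
def percent_identity(aligned_candidate, aligned_reference):
    """Percent of candidate residues identical to the aligned reference residue.

    Both inputs must be pre-aligned strings of equal length, with "-" for gaps.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    candidate_residues = sum(1 for c in aligned_candidate if c != "-")
    if candidate_residues == 0:
        raise ValueError("candidate contains no residues")
    matches = sum(
        1
        for c, r in zip(aligned_candidate, aligned_reference)
        if c == r and c != "-"
    )
    return 100.0 * matches / candidate_residues


# Hypothetical pre-aligned sequences with one gap in each.
print(round(percent_identity("ATG-CCGA", "ATGACC-A"), 1))
```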
  • the term "pharmaceutical composition” refers to a mixture containing a therapeutic compound to be administered to a subject, such as a mammal, e.g., a human, in order to prevent, treat or control a particular disease or condition affecting or that may affect the subject.
  • pharmaceutically acceptable refers to those compounds, materials, compositions and/or dosage forms, which are suitable for contact with the tissues of a subject, such as a mammal (e.g., a human) without excessive toxicity, irritation, allergic response, or other problems or complications, commensurate with a reasonable benefit/risk ratio.
  • pioneer round and “pioneer translation” are used interchangeably and refer to the initial round of translation during which complementary RNA (cRNA) is synthesized, for instance, according to the Reading Frame Surveillance model.
  • pioneer translation is expected to be coupled to short complementary RNA (scRNA) synthesis such that each region of a cRNA that is complementary to an exon is cleaved to generate a series of scRNAs of uniform length (e.g., from 9-30 nucleotides in length), spanning that exon.
  • premature termination codon PTC
  • nonsense mutation refers to a point mutation which results in a stop codon being present in the mRNA upstream of a wild type stop codon.
  • pre-mRNA refers to an RNA transcript of a gene prior to processing, wherein processing includes any of the following: addition of a 5' cap, splicing, and addition of a poly(A)- tail. Pre-mRNA may contain both introns and exons.
  • promoter refers to a region within the regulatory region of a gene that enables initiation of the transcription of a gene into a messenger RNA, wherein transcription is initiated with the binding of an RNA polymerase on or nearby the promoter.
  • reading frame refers to how a ribosome defines a codon within a gene, as determined by binding of a tRNA to a three-nucleotide codon during translation of an mRNA.
  • when a reading frame is “restored,” this indicates that the reading frame had first been altered by a frameshift or nonsense mutation, and subsequently returned to the original reading frame by some means.
  • ribosomal frameshift refers to a frameshift that occurs when the ribosome shifts in the 5' or 3' direction, with respect to the mRNA, during translation, resulting in a frameshift that is not a result of a genetic mutation.
  • a “+1” or “plus 1” ribosomal frameshift refers to a frameshift in which the ribosome shifts 1 nucleotide position towards the 3' end of the mRNA, wherein a single nucleotide is not recognized by a tRNA.
  • a “-1” or “minus 1” ribosomal frameshift refers to a frameshift in which the ribosome shifts 1 nucleotide position towards the 5' end of the mRNA, wherein a single nucleotide is recognized twice by a tRNA.
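The effect of a +1 or -1 ribosomal frameshift on how codons are defined can be sketched as follows. The mRNA sequence is hypothetical, and the `read_codons` helper and its parameters are assumptions introduced for illustration; the helper models only the partition of the mRNA into codons, not tRNA binding.

```python
def read_codons(mrna, shift_after_codon=None, shift=0):
    """Split an mRNA into codons, optionally shifting the frame once.

    After `shift_after_codon` codons have been read, the reading position
    moves by `shift` nucleotides (+1 toward the 3' end, -1 toward the 5' end).
    """
    if shift not in (-1, 0, 1):
        raise ValueError("shift must be -1, 0, or +1")
    codons = []
    position = 0
    while position + 3 <= len(mrna):
        codons.append(mrna[position:position + 3])
        position += 3
        if shift_after_codon is not None and len(codons) == shift_after_codon:
            position += shift  # +1 skips a nucleotide; -1 re-reads one
    return codons


mrna = "AUGGCCAAAUUUGGG"  # hypothetical sequence
print(read_codons(mrna))                                  # original reading frame
print(read_codons(mrna, shift_after_codon=2, shift=+1))   # one nucleotide is skipped
print(read_codons(mrna, shift_after_codon=2, shift=-1))   # one nucleotide is read twice
```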
  • RNA equivalent of a gene refers to a RNA polynucleotide that corresponds to a DNA polynucleotide that encodes the gene, such as a RNA transcript obtainable by transcription of a DNA polynucleotide that contains the gene.
  • exemplary RNA equivalents include mRNA transcripts produced synthetically, such as by way of solid phase nucleic acid synthesis techniques known in the art and/or described herein, as well as by recombinant nucleic acid preparation methods.
  • RNA sequencing assay refers to an assay that can be used to determine the presence and/or quantity of one or more RNA transcripts (e.g., hnRNA or alternatively spliced mRNA) within a sample of RNA molecules.
  • RNA sequencing assays may include high-throughput sequencing methods as known in the art. Exemplary methods for conducting RNA-Seq assays are described herein.
  • sample refers to a specimen (e.g., blood, blood component (e.g., serum or plasma), urine, saliva, amniotic fluid, cerebrospinal fluid, tissue (e.g., placental or dermal), pancreatic fluid, chorionic villus sample, and cells) isolated from a subject.
  • the phrases “specifically binds” and “binds” refer to a binding reaction which is determinative of the presence of a particular molecule, such as a polynucleotide, in a heterogeneous population of polynucleotides and other biological molecules that is recognized, e.g., by a ligand, such as a complementary polynucleotide, with particularity.
  • a ligand that specifically binds to a protein will bind to the protein, e.g., with a KD of less than 100 nM.
  • a ligand that specifically binds to a protein may bind to the protein with a KD of up to 100 nM (e.g., between 1 pM and 100 nM).
  • a ligand that does not exhibit specific binding to another molecule or a domain thereof may exhibit a KD of greater than 100 nM (e.g., greater than 200 nM, 300 nM, 400 nM, 500 nM, 600 nM, 700 nM, 800 nM, 900 nM, 1 μM, 100 μM, 500 μM, or 1 mM) for that particular molecule or domain thereof.
  • a variety of assay formats may be used to determine the affinity of a ligand for a specific protein.
  • solid-phase ELISA assays are routinely used to identify ligands that specifically bind a target protein. See, e.g., Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988) and Harlow & Lane, Using Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1999), for a description of assay formats and conditions that can be used to determine specific protein binding.
  • the terms “subject” and “patient” are interchangeable and refer to an organism that receives treatment for a particular disease or condition as described herein.
  • scRNA short complementary RNA
  • the resulting scRNAs are uniform with respect to the length in nucleotides of the individual scRNA fragments, wherein the length of each scRNA is 9-30 nucleotides (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides).
  • the cleavage of cRNA to scRNA occurs during the pioneer round of translation.
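The uniform tiling of an exon by scRNA fragments postulated above can be sketched computationally. This is only an illustration of the hypothesized model: the function name, the default fragment length of 21 nucleotides (7 codons), and the decision to drop any trailing nucleotides that do not fill a whole fragment are all assumptions of the example.

```python
# Maps each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def tile_scrnas(exon_rna, fragment_length=21):
    """Return uniform-length fragments complementary to consecutive exon windows.

    Each fragment is the reverse complement of one window of the exon,
    written 5'-to-3'. Trailing nucleotides that do not fill a window are ignored.
    """
    if not 9 <= fragment_length <= 30:
        raise ValueError("fragment_length must be between 9 and 30 nucleotides")
    exon = exon_rna.upper()
    fragments = []
    for start in range(0, len(exon) - fragment_length + 1, fragment_length):
        window = exon[start:start + fragment_length]
        fragments.append(window.translate(COMPLEMENT)[::-1])
    return fragments


# Hypothetical 42-nucleotide exon tiled by two 21-nucleotide fragments.
exon = "AUGGCCAAAUUUGGGCCCAAA" * 2
for fragment in tile_scrnas(exon):
    print(fragment)
```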
  • Truncated polynucleotides may refer to the truncated form of any of the following: a gene, a pre-mRNA, an mRNA, an intron, or an exon.
  • a truncated polynucleotide may be missing a region at its 5' end, at its 3' end, an internal region, or some combination thereof.
  • a truncated protein may be missing a region at its N-terminus, at its C-terminus, an internal region, or some combination thereof.
  • vector refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
  • plasmid refers to a circular double stranded DNA loop into which additional DNA segments may be ligated.
  • Another type of vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome.
  • Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors).
  • Other vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome.
  • Certain vectors are capable of directing the expression of genes to which they are operably linked.
  • wild-type refers to a genotype with the highest frequency for a particular gene in a given organism.
  • Figure 1A is a schematic showing the relative positioning of the A site and P site tRNA anticodon stems when the tRNA sequences are properly aligned to the mRNA template.
  • Figure 1B is a schematic showing the relative positioning of the A site and P site tRNA anticodon stems when the tRNA sequences are improperly aligned such that the A site tRNA attempts to anneal to the mRNA template in a -1 frameshifted fashion. Steric clashes between the adjacent tRNA molecules in the A site and P site hinder the simultaneous binding of the two tRNA molecules to a single mRNA codon.
  • Figure 1C is a schematic showing the relative positioning of the A site and P site tRNA anticodon stems when the tRNA sequences are improperly aligned such that the A site tRNA attempts to anneal to the mRNA template in a +1 frameshifted fashion. Unlike the instance portrayed in Figure 1B, no steric clashes are present to hinder the binding of the A site tRNA to a downstream codon.
  • Figure 2A is a schematic portraying the formation of short complementary RNA (scRNA) fragments during translation of an mRNA template according to the Reading Frame Surveillance model. According to this model, a series of scRNA fragments are produced by an RNA-dependent RNA polymerase during pioneer translation. The scRNA fragments persist following the initial translation cycle and may anneal to daughter mRNA templates, providing an epigenetic memory of previous translations.
  • Figure 2B is a schematic showing a mechanism by which the ribosome senses the distance between the 3' terminal end of a scRNA fragment and the mRNA-bound tRNA molecules.
  • Figure 2C is a schematic showing the events that occur once a +1 frameshift is sensed by the ribosome during translation of an mRNA template.
  • the sensing of +1 frameshifts during the translation process can lead to repeated cleavage of the mRNA transcript that may result in terminal 2-base pair 3' overhangs, which can facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells.
  • Figure 3 is a schematic portraying the Reading Frame Surveillance model applied to expression of a eukaryotic gene, wherein a pre-mRNA transcript contains both introns and exons, and cRNA fragments define the intron-exon boundaries.
  • a cRNA complementary to the full-length pre-mRNA is produced in the nucleus. Cleavage of the cRNA to generate scRNA fragments occurs during the pioneer round of translation, also in the nucleus. Formation of scRNA fragments by cleavage of the cRNA occurs at set uniform intervals at the regions of the cRNA that are complementary to the exonic regions of the pre-mRNA.
  • Figure 4 is a map of an exemplary polynucleotide construct capable of inducing exon skipping by RFS.
  • the polynucleotide is designed to induce skipping of exons 52 and 53 of the dystrophin gene, and may be used to restore dystrophin function in the mdx4cv mouse, which bears a mutation in exon 53.
  • the EX1-INTR1-EX2 region of the polynucleotide is indicated in pink, where EX1 is Exon 51 or a 3' portion of Exon 51 of the dystrophin gene; INTR1 includes both a splice donor (SD) and a splice acceptor (SA) necessary to enable splicing of EX1 to EX2; and EX2 is Exon 54 or a 5' portion of Exon 54 of the dystrophin gene.
  • the polynucleotide comprising EX1-INTR1-EX2 may be operably linked to a promoter and incorporated into a vector.
  • EX1-INTR1-EX2 has been placed under the control of regulatory elements for the expression of human desmin protein (hDes). Additionally, a sequence coding for green fluorescent protein (GFP) has been fused to the C-terminal end of Exon 54, in order to indicate and quantify expression of the protein product resulting from the construct.
  • the resulting polynucleotide has then been incorporated into a vector, which may be delivered to a host, such as a DMD cell line or a DMD animal model, for example, the mdx4cv mouse.
  • compositions and methods described herein can be used to optimize the nucleic acid sequence of a gene or RNA equivalent thereof encoding a protein of interest so as to achieve, for instance, enhanced expression of the protein in a particular cell type.
  • genes and RNA equivalents thereof can be optimized for tissue-specific expression of an encoded protein.
  • Genes and RNA equivalents thereof optimized using the compositions and methods described herein can be synthesized by chemical synthesis techniques and may be amplified, for instance, using polymerase chain reaction (PCR)-based amplification methods or by transfection of the gene into a cell, such as a bacterial cell or mammalian cell capable of replicating exogenous nucleic acids.
  • RNA equivalents described herein can have important clinical utility.
  • diseases and conditions including heritable genetic disorders, are manifestations of a deficiency in a native protein.
  • target cells e.g., human cells.
  • the present invention is based in part on the hypothesis that, during translation of an
  • short complementary RNA (scRNA) fragments of from about 9 to about 30 nucleotides in length that anneal to the mRNA transcript are generated by an RNA-dependent RNA polymerase. It is hypothesized that these fragments guide the proper alignment of the ribosome to the mRNA template by preventing downstream (+1) frameshifts from occurring during the translation process. As shown in Figure 1B, upstream (-1) frameshifting of the ribosome during translation of an mRNA construct is sterically prohibited, as this would require two adjacent tRNA molecules to be bound to the same base on the mRNA target.
  • RFS Reading Frame Surveillance
  • a key element of this model is that during the pioneer round, translation of a uniform set of codons is associated with cleavage of the cRNA template at a set distance from the A-site, which results in the mRNA being decorated with scRNA oligonucleotides.
  • the 7-codon length of the scRNA fragments diagrammed in Figure 2 represents one possible length for these fragments.
  • the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosome active sites.
  • Figure 2B presents one possible mechanism by which this sensing event occurs.
  • the ribosome senses the 5' end of a scRNA and, upon mRNA translocation to restore complete scRNA pairing with its complementary sequence on the mRNA, confirms that translation of the previous segment occurred without frameshifting.
  • Figure 2C presents a schematic of a +1 frameshifted complex, which increases the distance between the ribosome and the scRNA 5' end; this increase could be sensed by the ribosome to trigger alteration or abortion of the translation of that mRNA.
  • the RFS model additionally suggests that scRNA molecules can dissociate from the mRNA template that was translated during their biogenesis. These scRNA molecules are free to bind to daughter RNA molecules of the same polarity, for instance, to facilitate proper translation of these templates. In this way, the RFS model indicates that the scRNA molecules function as a form of epigenetic memory of previous translations, and facilitate the detection of reading frame errors occurring during genome replication.
  • RNA molecule e.g., a +1 frame-shifting event
  • scRNA molecules derived from translation of an intact RNA decorate the new, mutated RNA
  • translation downstream of the deletion would be sensed as a +1 frameshift at each scRNA every time it is read by the ribosome. This may trigger a severe response, such as cleavage of the RNA message.
  • In eukaryotes, exons generally have elevated GC content relative to adjacent introns, and in higher eukaryotes this feature is even more pronounced in those regions of the genome with lower overall GC content and longer introns (Amit et al., Cell Reports 1:543-556 (2012) and Louie et al., Genome Res. 13:2594-2601 (2003)).
  • This property of coding exons is frequently revealed in analyses of codon usage bias, in particular the GC content of the third, wobble position of codons. Codon usage bias is often attributed to selective pressure to maintain optimal translation rates in the context of charged tRNA abundances (Ikemura, Mol. Biol. Evol.
  • codon usage in eukaryotes can be influenced by reading frame surveillance, and can provide selective pressure to maintain high GC content in the wobble position.
  • the severity of disease-causing mutations in the Factor IX gene has been correlated with the change in pairing free energy caused by each coding sequence mutation (Hamasaki-Katagiri et al., Haemophilia 18:933-940 (2012)).
  • the RFS model can thus be used to inform the design of codon-optimized genes for expression in a target cell, and may synergize with existing codon-optimization procedures to provide a single set of guidelines one can follow to design genes capable of inducing robust protein expression in a target cell of interest.
  • the principle of reading frame surveillance is based in part on the postulate that scRNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated.
  • the model indicates that these complementary RNA fragments can function in concert to promote the degradation of endogenous RNA that imperfectly aligns with the ribosome.
  • RNA molecules are capable of persisting within the cytoplasm and may anneal, for instance, to RNA molecules encoding other proteins due to chance complementarity
  • a gene encoding a protein intended for expression in a target cell can be codon- optimized by incorporating codon replacements into the gene that minimize the sequence identity between RNA resulting from transcription of the gene of interest and the endogenous RNA molecules present within the cell.
  • gene expression techniques described herein such as qPCR, RNA-Seq, and immunoblot assays to assess which genes are expressed in a particular target cell and the extent to which these genes are expressed so as to determine a panel of RNA molecules against which the sequence identity of the RNA transcript of the gene of interest can be compared.
  • RNA transcripts corresponding to those genes that are expressed in higher abundance than other genes in the cell e.g., endogenous RNA transcripts corresponding to those genes that are expressed 5-fold higher, 10-fold higher, 15-fold higher, 20-fold higher, 25-fold higher, 30-fold higher, 35-fold higher, 40-fold higher, 45-fold higher, 50-fold higher, 100-fold higher, 200-fold higher, 300-fold higher, 400-fold higher, 500-fold higher, 600-fold higher, 700-fold higher, 800-fold higher, 900-fold higher, 1,000-fold higher, or more, such as RNA transcripts encoding proteins that are among the top 1%, 2%,
  • genes that are expressed in higher quantities are expected to have higher quantities of corresponding scRNA fragments within the cell of interest. Since endogenous scRNA fragments exist only for those genes that are actively transcribed in a cell of interest, it is not necessary to minimize the sequence identity of the target gene relative to the entirety of the genome.
  • a gene's expression level is not among, for example, the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the cell, cell type, or population of cells of interest (e.g., the cell, cell type, or population of cells into which a codon-optimized nucleic acid is to be delivered to achieve enhanced gene expression), the sequence identity of the target gene need not be minimized relative to those genes that have lower gene expression (for instance, those genes whose expression levels are not among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%,
  • the sequence identity of the target gene need not be minimized relative to the unexpressed gene.
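  • As a minimal sketch of how such a comparison panel might be assembled, the following ranks genes by measured expression and keeps only those in a chosen top percentile. The function name, the gene labels, and the expression values are hypothetical, and the choice to compute the percentile over expressed genes only is an assumption made for illustration.

```python
from math import ceil


def top_expressed_genes(expression, top_percent):
    """Return the genes whose expression ranks within the top `top_percent`."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Assumption: unexpressed genes are dropped before ranking, since the
    # text states that identity need not be minimized against them.
    expressed = {gene: level for gene, level in expression.items() if level > 0}
    ranked = sorted(expressed, key=expressed.get, reverse=True)
    n_keep = ceil(len(ranked) * top_percent / 100)
    return ranked[:n_keep]


# Hypothetical expression values for an illustrative target cell type.
profile = {
    "GENE_A": 950.0,
    "GENE_B": 12.5,
    "GENE_C": 0.0,
    "GENE_D": 310.0,
    "GENE_E": 48.0,
}
panel = top_expressed_genes(profile, top_percent=50)
```

  • In this toy profile, four genes are expressed, so a 50% cutoff keeps the two most abundant ones as the panel against which the gene of interest would be compared.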
  • a codon-optimized gene for enhanced expression in a target cell e.g., a mammalian cell, such as a human cell
  • one can begin with a wild-type gene sequence or an already codon-optimized gene sequence encoding a protein of interest.
  • compositions and methods described herein can align the coding strand of the gene of interest, such as a gene encoding a therapeutic protein described herein, with the coding strands of genes expressed within the target cell, such as genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the cell.
  • annealing of the scRNA molecules can result in the formation of terminal 2-base pair 3' overhangs (e.g., as shown in Figure 2C), which facilitate loading of RNA fragments onto the RISC in eukaryotic cells, leading to abortion of translation and eventual degradation of the mRNA template.
  • Single-nucleotide mutations that preserve the amino acid sequence of the encoded protein can be informed, for instance, by the standard genetic code, represented in Table 1, below, compiled by the National Center for Biotechnology Information, Bethesda, Maryland, USA.
  • the codon-optimization process can be performed iteratively. For instance, one of skill in the art can begin with a wild-type gene sequence (e.g., excluding intronic DNA) and introduce substitutions into this sequence that reduce the sequence identity of the gene relative to the genes that are expressed within a target cell (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell).
  • This process can be repeated until all codons within the gene of interest have been evaluated for the opportunity to introduce single-nucleotide substitutions that can reduce sequence identity relative to the genes expressed at high levels within the target cell.
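  • The iterative procedure described above can be sketched as a single greedy pass over the codons of a coding sequence. This is an illustration under stated assumptions, not the method of the disclosure: sequence identity is approximated by counting shared k-mers rather than by a true alignment, the window size k = 8 is arbitrary, and all function names and sequences are hypothetical. The codon table is the standard genetic code (NCBI translation table 1).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seq, panel_kmers, k):
    """Count positions in seq whose k-mer also occurs in the panel (identity proxy)."""
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in panel_kmers)


def minimize_identity(cds, panel, k=8):
    """Evaluate each codon once; keep a synonymous single-nucleotide change if it lowers the score."""
    panel_kmers = set().union(*(kmers(p, k) for p in panel))
    seq = cds
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        best, best_score = seq, shared_kmers(seq, panel_kmers, k)
        for alt, aa in CODON_TABLE.items():
            if alt == codon or aa != CODON_TABLE[codon]:
                continue  # only synonymous alternatives preserve the protein
            if sum(a != b for a, b in zip(alt, codon)) != 1:
                continue  # restrict to single-nucleotide substitutions
            candidate = seq[:pos] + alt + seq[pos + 3:]
            score = shared_kmers(candidate, panel_kmers, k)
            if score < best_score:
                best, best_score = candidate, score
        seq = best
    return seq


# Hypothetical example: the panel contains a transcript identical to the gene of interest.
cds = "ATGGCTGCTGCTAAA"
panel = ["ATGGCTGCTGCTAAA"]
optimized = minimize_identity(cds, panel)
```

  • In this toy case, the optimized sequence encodes the same peptide as the input while sharing fewer 8-mers with the panel. A practical implementation would likely replace the k-mer proxy with an alignment-based identity measure.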
  • one can begin with a gene sequence that has previously been modified relative to the wild-type sequence of the gene, for instance, by incorporating codon substitutions that increase the GC content of the gene and/or that reduce CpG content of the gene relative to the wild-type sequence.
  • the sequence of the resulting gene can subsequently be aligned to the coding strands of the genes expressed in a desired target cell (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell), and iterative codon substitutions can be introduced throughout the gene in order to minimize the sequence identity of the previously-modified gene with respect to the genes expressed at high levels within the target cell.
  • sequence identity of a gene of interest with respect to genes expressed at elevated levels in the cell e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell
  • one of skill in the art can align the sequence of the coding strand of the gene of interest with the coding strands of genes expressed at elevated levels in the desired target cell and may introduce mutations, e.g., as described above, that minimize sequence identity of the gene of interest to the genes expressed at elevated levels within the target cell while preserving the encoded amino acid sequence.
  • Exemplary instances in which one may introduce conservative substitutions include instances in which the amino acid to be substituted is not present within the active site of an enzyme or a site on the protein required for non-covalent binding to another biological molecule.
  • the compositions and methods described herein can be used to induce tissue-specific gene expression. For instance, after completing the codon-optimization process described above, one can subsequently align the coding strand of the codon-optimized gene with the coding strands of genes that are not expressed in the target cell (e.g., genes whose expression levels are not among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell or genes that are not expressed above a basal level in the target cell) but that are expressed (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 1
  • translation of those genes expressed in non-target cells is expected to result in the production of scRNA molecules, which may anneal imperfectly with the mRNA transcript of a gene of interest following transcription. Because incomplete sequence complementarity leads to imperfect alignment, annealing of the scRNA molecules can result in the formation of terminal 2-base pair 3' overhangs (e.g., as shown in Figure 2C), which facilitate loading of RNA fragments onto the RISC in eukaryotic cells, leading to abortion of translation and eventual degradation of the mRNA template.
  • designed mutations that minimize the sequence identity of a gene of interest relative to the genes expressed in a target cell may synergize with designed single-nucleotide substitutions that increase the sequence identity of a gene of interest relative to genes expressed in cells aside from the target cell so as to promote tissue-specific expression of a desired gene.
  • one of skill in the art can design variants of the target gene that contain increased GC content and/or reduced CpG content relative to the wild-type gene so as to enhance the translation of the encoded protein. For instance, after enhancing the protein-encoding gene sequence by incorporating codon substitutions that minimize the sequence identity of the coding strand of the target gene relative to the coding strands of genes expressed within the target cell, one of skill in the art can subsequently modify the designed coding sequence so as to increase the GC content of the coding sequence while preserving the identity of the encoded amino acid sequence. According to the RFS model, the increase in GC content will lead to enhanced binding of the ensuing mRNA transcript with scRNA fragments that are formed during translation of the desired mRNA template.
  • the binding of the target gene-specific scRNA molecules promotes improved licensing of the mRNA transcript for nuclear export and enhanced translation due to fewer instances of improper alignment (e.g., fewer instances of +1 frameshifting) of the ribosome to the mRNA template.
  • the codon-optimization process described herein can be done iteratively. For instance, one may introduce mutations that increase GC content into the gene that has already undergone the sequence identity minimization process described above. Alternatively, one may begin by introducing mutations that increase the GC content of a target gene and subsequently introduce mutations that minimize the sequence identity of a target gene relative to the genes expressed within a target cell.
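  • The GC-enrichment step described above can be illustrated with a short sketch that measures GC content and, for each codon, selects the synonymous codon with the highest GC fraction. The function names and the example sequence are hypothetical, and the per-codon greedy choice is an assumption made for brevity rather than a procedure taken from the disclosure.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def raise_gc(cds):
    """Replace each codon with its most GC-rich synonym, keeping the original on ties."""
    out = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        # The tuple key prefers higher GC first, then the unchanged codon.
        out.append(max(synonyms, key=lambda c: (gc_content(c), c == codon)))
    return "".join(out)


# Hypothetical four-codon sequence encoding Met-Lys-Phe-Ala.
original = "ATGAAATTTGCA"
enriched = raise_gc(original)
```

  • Here the GC content rises from 25% to 50% without changing the encoded peptide. Note that this greedy choice can create CpG dinucleotides (in this example, at the TTC/GCC codon junction), so a practical design would need to weigh GC enrichment against the CpG reduction discussed elsewhere in this section.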
  • one of skill in the art can design variants of the target gene that contain greater quantities of high-frequency codons within the target organism of interest. For instance, after enhancing the protein-encoding gene sequence by incorporating codon substitutions that minimize the sequence identity of the coding strand of the target gene relative to the coding strands of genes expressed at high levels within the target cell, one of skill in the art can subsequently modify the designed coding sequence so as to increase the quantity of codons that frequently occur in endogenous genes within the target organism (e.g., a mammal, such as a human).
  • codons that have increased GC content tend to be employed more frequently in protein-coding genes, which may be a manifestation of the improved thermodynamic stability of mRNA:scRNA duplexes formed following the pioneer round of translation.
  • A summary of the estimated codon frequency across the human genome is provided in Table 2, below, compiled by GENSCRIPT®.
  • One of skill in the art can manipulate the protein-encoding gene sequence of a target gene by incorporating codon substitutions that diminish the CpG content and/or homopolymer content of the gene. For instance, one can begin with a wild-type gene sequence and introduce substitutions (e.g., single-nucleotide substitutions) that reduce the CpG content and/or homopolymer content of the gene while preserving the identity of the encoded protein sequence. One can then follow the sequence identity minimization process described above and in Example 1 in order to obtain a gene sequence that minimally resembles the genes encoded in a cell type of interest.
  • CpG sites and homopolymers can promote +1 frameshifts during the mRNA translation process.
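  • To locate the CpG sites and homopolymers that such substitutions would target, a simple scan of the coding sequence may suffice. The sketch below is illustrative only; the function names are hypothetical, and the minimum run length of four identical bases used to call a homopolymer is an assumption, since no threshold is specified here.

```python
import re


def count_cpg(seq):
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def homopolymer_runs(seq, min_len=4):
    """Return (start index, run) for each run of at least min_len identical bases."""
    pattern = rf"([ACGT])\1{{{min_len - 1},}}"
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq.upper())]


# Hypothetical sequence containing two CpG sites and two homopolymer runs.
example = "ATGCGAAAAACGTTTT"
cpg_sites = count_cpg(example)
runs = homopolymer_runs(example)
```

  • Each reported position marks a codon neighborhood in which a synonymous substitution (or, where protein function permits, a conservative substitution) could be evaluated.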
  • the homopolymer encodes amino acid residues that are not essential for protein function (for instance, if the encoded amino acids are not present within the active site of an encoded enzyme or within a site necessary for non-covalent binding to another biological molecule), one of skill in the art can incorporate codon substitutions that interrupt the homopolymer and that introduce a conservative substitution into the encoded protein at the site of the corresponding amino acid.
(v) Preparation of codon-optimized genes
  • the final codon-optimized gene can be prepared, for instance, by solid phase nucleic acid procedures known in the art.
  • a solid phase synthesis process using a phosphoramidite method can be employed.
  • a nucleic acid is generally synthesized by the following steps.
  • a 5'-OH-protected nucleoside that will occur at the 3' terminal end of the nucleic acid to be synthesized is esterified via the 3'-OH function to a solid support by appending the nucleoside to a cleavable linker. Then, the support for solid phase synthesis on which the nucleoside is immobilized can be placed in a reaction column which is then set on an automated nucleic acid synthesizer.
  • (4) a step of oxidizing the immobilized phosphite substituent (e.g., with aqueous iodine or the like).
  • the above process can be repeated to elongate the nucleic acid as needed in a 3'-to-5' direction, so that a nucleic acid having a desired sequence is synthesized.
  • the cleavable linker is hydrolyzed (e.g., with aqueous ammonia, methylamine solution, or the like) to cleave the synthesized nucleic acid from the solid phase support.
  • the prepared gene can be amplified, for instance, using PCR-based techniques described herein or known in the art, and/or by transformation of DH5α E. coli with a plasmid containing the designed gene.
  • the bacteria can subsequently be cultured so as to amplify the DNA therein, and the gene can be isolated by plasmid purification techniques known in the art, followed optionally by a restriction digest and/or sequencing of the plasmid to verify the identity of the codon-optimized gene.
  • Genes that can be codon-optimized according to the methods described herein include those encoding therapeutic proteins, such as those that can be transferred to a subject (e.g., a human patient) suffering from a disease or condition characterized by a deficiency in the protein.
  • genes that can be codon-optimized and delivered to a patient according to the methods described herein include genes encoding hormones and growth and differentiation factors including, without limitation, insulin, glucagon, growth hormone (GH), parathyroid hormone (PTH), calcitonin, growth hormone releasing factor (GRF), thyroid stimulating hormone (TSH), adrenocorticotropic hormone (ACTH), prolactin, melatonin, vasopressin, β-endorphin, met-enkephalin, leu-enkephalin, prolactin-releasing factor, prolactin-inhibiting factor, corticotropin-releasing hormone, thyrotropin-releasing hormone (TRH), follicle stimulating hormone (FSH), luteinizing hormone (LH), chorionic gonadotropin (CG), vascular endothelial growth factor (VEGF), angiopoietins, angiostatin, endostatin, granulocyte colony stimulating factor (GCSF), erythro
  • BDNF neurotrophins NT-3, NT-4/5 and NT-6
  • CNTF ciliary neurotrophic factor
  • GDNF glial cell line derived neurotrophic factor
  • neurturin, persephin
  • agrin any one of the family of semaphorins/collapsins, netrin-1 and netrin-2, hepatocyte growth factor (HGF), ephrins, noggin, sonic hedgehog and tyrosine hydroxylase.
  • genes that may be optimized and delivered according to the methods described herein include those that encode proteins that regulate the immune system including, without limitation, cytokines and lymphokines such as thrombopoietin (TPO), interleukins (IL) IL-1α, IL-1β, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14, IL-15, IL-16, and IL-17, monocyte
  • MCP-1 chemoattractant protein
  • LIF leukemia inhibitory factor
  • GM-CSF granulocyte-macrophage colony stimulating factor
  • G-CSF granulocyte colony stimulating factor
  • M-CSF monocyte colony stimulating factor
  • Fas ligand, tumor necrosis factors α and β (TNFα and TNFβ)
  • IFN interferons
  • include, without limitation, immunoglobulins IgG, IgM, IgA, IgD and IgE, chimeric immunoglobulins, humanized antibodies, single chain antibodies, T cell receptors, chimeric T cell receptors, single chain T cell receptors, class I and class II MHC molecules, as well as engineered MHC molecules including single chain MHC molecules.
  • Useful gene products also include complement regulatory proteins such as membrane cofactor protein (MCP), decay accelerating factor (DAF), CR1, CR2 and CD59.
  • genes that may be optimized and delivered according to the methods described herein include those that encode any one of the receptors for the hormones, growth factors, cytokines, lymphokines, regulatory proteins and immune system proteins.
  • receptors include flt-1, flk-1, TIE-2; the trk family of receptors such as TrkA, MuSK, Eph, PDGF receptor, EGF receptor, HER2, insulin receptor, IGF-1 receptor, the FGF family of receptors, the TGFβ receptors, the interleukin receptors, the interferon receptors, serotonin receptors, α-adrenergic receptors, β-adrenergic receptors, the GDNF receptor, p75 neurotrophin receptor, among others.
  • the invention encompasses receptors for extracellular matrix proteins, such as integrins, counter-receptors for transmembrane-bound proteins, such as intercellular adhesion molecules (ICAM-1, ICAM-2, ICAM-3 and ICAM-4), vascular cell adhesion molecules (VCAM), and selectins E-selectin, P-selectin and L-selectin.
  • the invention encompasses receptors for cholesterol regulation, including the LDL receptor, HDL receptor, VLDL receptor, and the scavenger receptor.
  • the invention encompasses the apolipoprotein ligands for these receptors, including ApoAI, ApoAIV and ApoE.
  • the invention also encompasses gene products such as steroid hormone receptor superfamily including glucocorticoid receptors and estrogen receptors, Vitamin D receptors and other nuclear receptors.
  • useful gene products include antimicrobial peptides such as defensins and magainins, transcription factors such as jun, fos, max, mad, serum response factor (SRF), AP-1, AP-2, myb, MRG1, CREM, Alx4, FREAC1, NF-κB, members of the leucine zipper family, C2H4 zinc finger proteins, including Zif268, EGR1, EGR2, C6 zinc finger proteins, including the glucocorticoid and estrogen receptors, POU domain proteins, exemplified by Pit 1, homeodomain proteins, including HOX-1, basic helix-loop-helix proteins, including myc, MyoD and myogenin, ETS-box containing proteins, TFE3, E2F, ATF1, ATF2, ATF3, A
  • genes that may be optimized and delivered according to the methods described include those that encode carbamoyl synthetase I, ornithine transcarbamylase, argininosuccinate synthetase, argininosuccinate lyase, arginase, fumarylacetoacetate hydrolase, phenylalanine hydroxylase, alpha-1 antitrypsin, glucose-6-phosphatase, porphobilinogen deaminase, factor VII, factor VIII, factor IX, factor II, factor V, factor X, factor XII, factor XI, von Willebrand factor, superoxide dismutase, glutathione peroxidase and reductase, heme oxygenase, angiotensin converting enzyme, endothelin-1, atrial natriuretic peptide, pro-urokinase, urokinase, plasminogen activator, heparin cofactor II, activated protein C
  • proteins include those involved in lysosomal storage disorders, including acid β-glucosidase, α-galactosidase A, α-L-iduronidase, iduronate sulfatase, lysosomal acid α-glucosidase, sphingomyelinase, hexosaminidase A, hexosaminidases A and B, arylsulfatase A, acid lipase, acid ceramidase, galactosylceramidase, α-fucosidase, α-, β-mannosidosis, aspartylglucosaminidase, neuraminidase, galactosylceramidase, heparan-N-sulfatase, N-acetyl-α-glucosaminidase, Acetyl-CoA: α-glucosaminide N-acet
  • transgenes that may be optimized and delivered to a patient using the methods described herein include those that encode non-naturally occurring polypeptides, such as chimeric or hybrid polypeptides or polypeptides having a non-naturally occurring amino acid sequence containing insertions, deletions or amino acid substitutions.
  • single-chain engineered immunoglobulins could be useful in certain immunocompromised patients.
  • Other useful proteins include truncated receptors which lack their transmembrane and cytoplasmic domain. These truncated receptors can be used to antagonize the function of their respective ligands by binding to them without concomitant signaling by the receptor.
  • Other types of non-naturally occurring gene sequences include sense and antisense molecules and catalytic nucleic acids, such as ribozymes, which could be used
  • the expression level of a gene expressed by a single cell or type of cell can be ascertained, for example, by evaluating the concentration or relative abundance of RNA transcripts (e.g., mRNA) derived from transcription of a gene of interest. Additionally or alternatively, gene expression can be determined by evaluating the concentration or relative abundance of protein produced by transcription and translation of a gene of interest. Protein concentrations can also be assessed using functional assays, such as enzymatic assays or gene transcription assays in the event the gene of interest encodes an enzyme or a modulator of transcription, respectively.
  • the sections that follow describe exemplary techniques that can be used to measure and rank the expression levels of genes in a cell, cell type, or population of cells of interest, for instance, at the level of a single cell or a population of cells.
  • Expression of genes in a sample can be analyzed by a number of methodologies, many of which are known in the art and understood by the skilled artisan, including, but not limited to, nucleic acid sequencing, microarray analysis, proteomics, in-situ hybridization (e.g., fluorescence in-situ hybridization (FISH)), amplification-based assays, fluorescence activated cell sorting (FACS), northern analysis and/or PCR analysis of mRNAs.
  • Nucleic acid-based datasets suitable for analysis of target cell-specific gene expression can have the form of a gene expression profile, which represents the identity of genes expressed in a cell of interest and the extent to which each gene is expressed, and which can be used to determine the ranked order of gene expression levels within a cell, cell type, or population of cells of interest.
  • Such profiles may include whole transcriptome sequencing data (e.g., RNA-Seq data), panels of mRNAs, noncoding RNAs, or any other nucleic acid sequence that may be expressed from genomic DNA.
  • Other nucleic acid datasets suitable for use with the methods described herein may include expression data collected by imaging-based techniques (e.g., Northern blotting or Southern blotting known in the art).
  • Northern blot analysis is a conventional technique well known in the art and is described, for example, in Molecular Cloning, a Laboratory Manual, second edition, 1989, Sambrook, Fritsch, Maniatis, Cold Spring Harbor Press, 10 Skyline Drive, Plainview, NY 11803-2500, the disclosure of which is incorporated herein by reference. Typical protocols for evaluating the status of genes and gene products are found, for example in Ausubel et al., eds., 1995, Current Protocols In Molecular Biology, Units 2 (Northern Blotting), 4
  • Gene expression profiles to be analyzed in conjunction with the methods described herein may include, for example, microarray data or nucleic acid sequencing data produced by a sequencing method known in the art (e.g., Sanger sequencing and next-generation sequencing methods, also known as high-throughput sequencing or deep sequencing).
  • exemplary next generation sequencing technologies include, without limitation, Illumina sequencing, Ion Torrent sequencing, 454 sequencing, SOLiD sequencing, and nanopore sequencing platforms. Additional methods of sequencing known in the art can also be used.
  • mRNA expression levels may be determined using RNA-Seq (e.g., as described in Mortazavi et al., Nat. Methods 5:621-628 (2008), the disclosure of which is incorporated herein by reference in its entirety).
  • RNA-Seq is a robust technology known in the art for monitoring expression by directly sequencing the RNA molecules in a sample. Briefly, this methodology may involve fragmentation of RNA to an average length of 200 nucleotides, conversion to cDNA by random priming, and synthesis of double-stranded cDNA (e.g., using the Just cDNA DoubleStranded cDNA Synthesis Kit from Agilent Technology). Then, the cDNA is converted into a molecular library for sequencing by addition of sequence adapters for each library (e.g., from Illumina®/Solexa), and the resulting 50-100 nucleotide reads are mapped onto the genome.
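  • Once reads have been mapped, read counts are commonly normalized before genes are ranked by expression. One widely used measure is reads per kilobase of transcript per million mapped reads (RPKM), introduced in the Mortazavi et al. reference cited above. The sketch below computes it; the function name and the numerical inputs are hypothetical.

```python
def rpkm(read_count, transcript_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (nucleotides per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000-nt transcript in a 10-million-read library.
value = rpkm(read_count=500, transcript_length_nt=2000, total_mapped_reads=10_000_000)
```

  • Normalizing by both transcript length and library size allows expression levels to be compared across genes within a sample, which is what a ranked expression profile requires.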
  • Gene expression levels may be determined using microarray-based platforms (e.g., single-nucleotide polymorphism (SNP) arrays), as microarray technology offers high resolution. Details of various microarray methods can be found in the literature. See, for example, US Patent No. 6,232,068 and Pollack et al., Nat. Genet. 23:41-46 (1999), the disclosures of each of which are incorporated herein by reference in their entirety.
  • the array can be configured, for example, such that the sequence and position of each member of the array is known. Hybridization of a labeled probe with a particular array member indicates that the sample from which the probe was derived expresses that gene. Expression level may be quantified according to the amount of signal detected from hybridized probe-sample complexes.
  • a typical microarray experiment involves the following steps: 1) preparation of fluorescently labeled target from RNA isolated from the sample, 2) hybridization of the labeled target to the microarray, 3) washing, staining, and scanning of the array, 4) analysis of the scanned image and 5) generation of gene expression profiles.
  • a microarray processor is the Affymetrix GENECHIP® system, which is commercially available and comprises arrays fabricated by direct synthesis of oligonucleotides on a glass surface.
  • Other systems may be used as known to one skilled in the art.
  • Amplification-based assays also can be used to measure the expression level of one or more markers (e.g., genes).
  • the nucleic acid sequences of the gene act as a template in an amplification reaction (for example, PCR, such as qPCR).
  • the amount of amplification product is proportional to the amount of template in the original sample.
  • Comparison to appropriate controls provides a measure of the expression level of the gene, corresponding to the specific probe used, according to the principles described herein.
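  • As one illustration of how comparison to controls can yield a relative expression level from qPCR data, the sketch below applies the comparative threshold-cycle calculation, often written as 2^(-ΔΔCt). This particular calculation is not specified in the text above and is offered as an assumption; it presumes roughly 100% amplification efficiency and a stably expressed reference gene. All names and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt)."""
    # Normalize the target gene to the reference gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Assumption: product doubles each cycle, so one cycle equals a 2-fold difference.
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in the sample.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
```

  • A lower threshold cycle indicates more starting template, so a three-cycle difference after normalization corresponds to an approximately eight-fold higher expression level under the stated assumptions.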
  • Methods of real-time qPCR using TaqMan probes are well known in the art. Detailed protocols for real-time qPCR are provided, for example, in Gibson et al., Genome Res.
  • Probes used for PCR may be labeled with a detectable marker, such as, for example, a radioisotope, fluorescent compound, bioluminescent compound, a chemiluminescent compound, metal chelator, or enzyme.
  • Gene expression can additionally be determined by measuring the concentration or relative abundance of a corresponding protein product encoded by a gene of interest. Protein levels can be assessed using standard detection techniques known in the art. Examples of protein expression analysis that generate data suitable for use with the methods described herein include, without limitation, proteomics approaches, immunohistochemical and/or western blot analysis, immunoprecipitation, molecular binding assays, ELISA, enzyme-linked immunofiltration assay (ELIFA), mass spectrometry, mass spectrometric immunoassay, and biochemical enzymatic activity assays. In particular, proteomics methods can be used to generate large-scale protein expression datasets in multiplex.
  • Proteomics methods may utilize mass spectrometry to detect and quantify polypeptides (e.g., proteins) and/or peptide microarrays utilizing capture reagents (e.g., antibodies) specific to a panel of target proteins to identify and measure expression levels of proteins expressed in a sample (e.g., a single cell sample or a multi-cell population).
  • Exemplary peptide microarrays have a substrate-bound plurality of polypeptides, the binding of an oligonucleotide, a peptide, or a protein to each of the plurality of bound polypeptides being separately detectable.
  • the peptide microarray may include a plurality of binders, including but not limited to monoclonal antibodies, polyclonal antibodies, phage display binders, yeast two-hybrid binders, and aptamers, which can specifically detect the binding of specific oligonucleotides, peptides, or proteins. Examples of peptide arrays may be found in US Patent Nos. 6,268,210, 5,766,960, and 5,143,854, the disclosures of each of which are incorporated herein by reference.
  • Mass spectrometry may be used in conjunction with the methods described herein to identify and characterize the gene expression profile of a single cell or multi-cell population. Any method of MS known in the art may be used to determine, detect, and/or measure a peptide or peptides of interest, e.g., LC-MS, ESI-MS, ESI-MS/MS, MALDI-TOF-MS, MALDI-TOF/TOF-MS, tandem MS, and the like.
  • Mass spectrometers generally contain an ion source and optics, mass analyzer, and data processing electronics.
  • Mass analyzers, including scanning and ion-beam mass spectrometers, such as time-of-flight (TOF) and quadrupole (Q), and trapping mass spectrometers, such as ion trap (IT), Orbitrap, and Fourier transform ion cyclotron resonance (FT-ICR), may be used in the methods described herein. Details of various MS methods can be found in the literature. See, for example, Yates et al., Annu. Rev. Biomed. Eng. 11:49-79 (2009), the disclosure of which is incorporated herein by reference.
  • proteins in a sample can be first digested into smaller peptides by chemical (e.g., via cyanogen bromide cleavage) or enzymatic (e.g., trypsin) digestion.
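  • The enzymatic digestion step can be modeled in silico to predict which peptides a protein would yield. The sketch below assumes the commonly used trypsin rule (cleavage on the C-terminal side of lysine or arginine, except when the next residue is proline) and complete digestion with no missed cleavages; neither assumption is taken from the text above. The function name and the example sequence are hypothetical, and splitting on an empty match requires Python 3.7 or later.

```python
import re


def trypsin_digest(protein):
    """Split a protein after each K or R that is not followed by P."""
    # The lookbehind finds a K/R boundary; the lookahead blocks cleavage before P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [piece for piece in pieces if piece]


# Hypothetical ten-residue sequence with one proline-blocked site (R-P).
peptides = trypsin_digest("MKRPAGKTLR")
```

  • The resulting peptide list is the kind of input that would be matched against observed masses in the mass spectrometry workflows described in this section.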
  • Complex peptide samples also benefit from the use of front-end separation techniques, e.g., 2D-PAGE, HPLC, RPLC, and affinity chromatography.
  • the digested, and optionally separated, sample is then ionized using an ion source to create charged molecules for further analysis.
  • Ionization of the sample may be performed, e.g., by electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), photoionization, electron ionization, fast atom bombardment (FAB)/liquid secondary ionization (LSIMS), matrix assisted laser desorption/ionization (MALDI), field ionization, field desorption, thermospray/plasmaspray ionization, and particle beam ionization. Additional information relating to the choice of ionization method is known to those of skill in the art.
  • Tandem MS, also known as MS/MS, may be particularly useful for methods described herein, allowing for ionization followed by fragmentation of a complex peptide sample, such as a sample obtained from a multi-cell population described herein.
  • Tandem MS involves multiple steps of MS selection, with some form of ion fragmentation occurring in between the stages, which may be accomplished with individual mass spectrometer elements separated in space or using a single mass spectrometer with the MS steps separated in time. In spatially separated tandem MS, the elements are physically separated and distinct, with a physical connection between the elements to maintain high vacuum.
  • a polynucleotide such as codon-optimized DNA or RNA into a mammalian cell
  • electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest.
  • Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids. Electroporation of mammalian cells is described in detail, e.g., in Chu et al., Nucleic Acids Research 15:1311 (1987), the disclosure of which is incorporated herein by reference.
  • Nucleofection™ utilizes an applied electric field in order to stimulate the uptake of exogenous polynucleotides into the nucleus of a eukaryotic cell.
  • Nucleofection™ and protocols useful for performing this technique are described in detail, e.g., in Distler et al., Experimental Dermatology 14:315 (2005), as well as in US 2010/0317114, the disclosures of each of which are incorporated herein by reference.
  • Additional techniques useful for the transfection of target cells include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al., Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
  • Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference.
  • Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex.
  • exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and
  • laserfection, a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al., Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference.
  • Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For instance, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence.
  • a genome-modifying protein such as a nuclease
  • vesicles also referred to as Gesicles
  • Gesicles for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al., Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract].
  • Methylation changes in early embryonic genes in cancer in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13,
  • Transposons are polynucleotides that encode transposase enzymes and contain a polynucleotide sequence or gene of interest flanked by 5' and 3' excision sites. Once a transposon has been delivered into a cell, expression of the transposase gene commences and results in active enzymes that cleave the gene of interest from the transposon. This activity is mediated by the site-specific recognition of transposon excision sites by the transposase.
  • these excision sites may be terminal repeats or inverted terminal repeats.
  • the gene of interest can be integrated into the genome of a mammalian cell by transposase-catalyzed cleavage of similar excision sites that exist within the nuclear genome of the cell. This allows the gene of interest to be inserted into the cleaved nuclear DNA at the complementary excision sites, and subsequent covalent ligation of the phosphodiester bonds that join the gene of interest to the DNA of the mammalian cell genome completes the incorporation process.
  • the transposon may be a retrotransposon, such that the target gene is first transcribed to an RNA product and then reverse-transcribed to DNA before incorporation in the mammalian cell genome.
  • exemplary transposon systems include the piggyBac transposon (described in detail in, e.g., WO 2010/085699) and the Sleeping Beauty transposon (described in detail in, e.g., US 2005/0112764), the disclosures of each of which are incorporated herein by reference as they pertain to transposons for use in gene delivery to a cell of interest.
  • CRISPR clustered regularly interspaced short palindromic repeats
  • Cas9 Cas9 nuclease
  • Polynucleotides containing these foreign sequences and the repeat-spacer elements of the CRISPR locus are in turn transcribed in a host cell to create a guide RNA, which can subsequently anneal to a target sequence and localize the Cas9 nuclease to this site.
  • highly site-specific Cas9-mediated DNA cleavage can be engendered in a foreign polynucleotide because the interaction that brings Cas9 within close proximity of the target DNA molecule is governed by RNA:DNA hybridization.
  • As a result, one can theoretically design a CRISPR/Cas system to cleave any target DNA molecule of interest.
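  • As an illustration of how sequence complementarity alone can define a cleavage site, the following Python sketch scans a DNA string for candidate target sites. It assumes a 20-nucleotide spacer followed by an NGG protospacer-adjacent motif, which is the convention for the commonly used Streptococcus pyogenes Cas9 and is not stated in this text. The function name and the example sequence are hypothetical, and only the forward strand is searched.

```python
import re


def find_protospacers(dna, spacer_len=20):
    """Return (start, spacer) pairs for sites followed by an NGG motif.

    Assumption for this sketch: a 20-nt spacer and an NGG motif
    (S. pyogenes Cas9 convention). Only the given strand is scanned.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna)]


# Hypothetical example: one 20-nt site followed by the motif "TGG".
example = "A" * 20 + "TGG"
sites = find_protospacers(example)
print(sites)
```

A guide RNA would then be designed to pair with the reported spacer. Scanning the reverse complement and screening for off-target matches are left out of this sketch.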
  • ZFNs zinc finger nucleases
  • TALENs transcription activator-like effector nucleases
  • ZFNs and TALENs in genome editing applications are described, e.g., in Urnov et al., Nature Reviews Genetics 11:636 (2010); and in Joung et al., Nature Reviews Molecular Cell Biology 14:49 (2013), the disclosures of each of which are incorporated herein by reference as they pertain to compositions and methods for genome editing.
  • Additional genome editing techniques that can be used to incorporate polynucleotides encoding target genes into the genome of a target cell include the use of ARCUS™ meganucleases that can be rationally designed so as to site-specifically cleave genomic DNA.
  • the use of these enzymes for the incorporation of genes encoding target genes into the genome of a mammalian cell is advantageous in view of the defined structure-activity relationships that have been established for such enzymes.
  • Single chain meganucleases can be modified at certain amino acid positions in order to create nucleases that selectively cleave DNA at desired locations, enabling the site-specific incorporation of a target gene into the nuclear DNA of a target cell.
  • Viral genomes provide a rich source of vectors that can be used for the efficient delivery of exogenous genes into the genome of a cell (e.g., a mammalian cell, such as a human cell). Viral genomes are particularly useful vectors for gene delivery because the polynucleotides contained within such genomes are typically incorporated into the genome of a target cell by generalized or specialized transduction. These processes occur as part of the natural viral replication cycle, and do not require added proteins or reagents in order to induce gene integration.
  • viral vectors examples include AAV, retrovirus, adenovirus (e.g., Ad5, Ad26, Ad34, Ad35, and Ad48), parvovirus (e.g., adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g.
  • RNA viruses such as picornavirus and alphavirus
  • double stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox and canarypox).
  • herpesvirus e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus
  • poxvirus e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox and canarypox
  • Other viruses useful for delivering polynucleotides into a cell include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example.
  • retroviruses examples include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental
  • nucleic acids of the compositions and methods described herein are incorporated into rAAV vectors and/or virions in order to facilitate their introduction into a cell.
  • rAAV vectors useful in the invention are recombinant nucleic acid constructs that include (1) a heterologous sequence to be expressed (e.g., a polynucleotide encoding a GAA protein) and (2) viral sequences that facilitate integration and expression of the heterologous genes.
  • the viral sequences may include those sequences of AAV that are required in cis for replication and packaging (e.g., functional ITRs) of the DNA into a virion.
  • the heterologous gene encodes GAA, which is useful for correcting a GAA-deficiency in a cell.
  • rAAV vectors may also contain marker or reporter genes.
  • Useful rAAV vectors have one or more of the AAV WT genes deleted in whole or in part, but retain functional flanking ITR sequences.
  • the AAV ITRs may be of any serotype (e.g., derived from serotype 2) suitable for a particular application. Methods for using rAAV vectors are described, for example, in Tal et al., J.
  • the nucleic acids and vectors described herein can be incorporated into a rAAV virion in order to facilitate introduction of the nucleic acid or vector into a cell.
  • the capsid proteins of AAV compose the exterior, non-nucleic acid portion of the virion and are encoded by the AAV cap gene.
  • the cap gene encodes three viral coat proteins, VP1, VP2 and VP3, which are required for virion assembly.
  • the construction of rAAV virions has been described, for instance, in US Patent Nos. 5,173,414; 5,139,941; 5,863,541; 5,869,305; 6,057,152; and 6,376,237; as well as in Rabinowitz et al., J. Virol. 76:791-801 (2002) and Bowles et al., J. Virol. 77:423-432 (2003), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
  • rAAV virions useful in conjunction with the compositions and methods described herein include those derived from a variety of AAV serotypes including AAV 1, 2, 3, 4, 5, 6, 7, 8 and 9.
  • rAAV virions that include at least one serotype 1 capsid protein may be particularly useful.
  • rAAV virions that include at least one serotype 6 capsid protein may also be particularly useful, as serotype 6 capsid proteins are structurally similar to serotype 1 capsid proteins, and thus are expected to also result in high expression of GAA in muscle cells.
  • rAAV serotype 9 has also been found to be an efficient transducer of muscle cells.
  • AAV vectors and AAV proteins of different serotypes are described, for instance, in Chao et al., Mol. Ther. 2:619-623 (2000); Davidson et al., Proc. Natl. Acad. Sci. USA 97:3428-3432 (2000); Xiao et al., J. Virol. 72:2224-2232 (1998); Halbert et al., J. Virol. 74:1524-1532 (2000); Halbert et al., J. Virol. 75:6615-6624 (2001); and Auricchio et al., Hum.
  • Pseudotyped vectors include AAV vectors of a given serotype (e.g., AAV9) pseudotyped with a capsid gene derived from a serotype other than the given serotype (e.g., AAV1, AAV2, AAV3,
  • a representative pseudotyped vector is an AAV8 or AAV9 vector encoding a therapeutic protein pseudotyped with a capsid gene derived from AAV serotype 2.
  • Techniques involving the construction and use of pseudotyped rAAV virions are known in the art and are described, for instance, in Duan et al., J. Virol. 75:7662-7671 (2001); Halbert et al., J. Virol. 74:1524-1532 (2000); Zolotukhin et al., Methods, 28:158-167 (2002); and Auricchio et al., Hum. Molec. Genet., 10:3075-3081 (2001).
  • AAV virions that have mutations within the virion capsid may be used to infect particular cell types more effectively than non-mutated capsid virions.
  • suitable AAV mutants may have ligand insertion mutations for the facilitation of targeting AAV to specific cell types.
  • the construction and characterization of AAV capsid mutants including insertion mutants, alanine screening mutants, and epitope tag mutants is described in Wu et al., J. Virol. 74:8635-45 (2000).
  • Other rAAV virions that can be used in conjunction with the compositions and methods described herein include those capsid hybrids that are generated by molecular breeding of viruses as well as by exon shuffling. See, e.g., Soong et al., Nat. Genet., 25:436-439 (2000) and Kolman and Stemmer, Nat. Biotechnol. 19:423-428 (2001).
  • compositions and methods described herein can also be used to alter the expression of a gene of interest in a cell, such as a cell in vitro or in vivo.
  • the invention provides methods and compositions for attenuating expression of a gene of interest in a cell, such as a cell in vitro or in vivo, using a polynucleotide having a +1 frameshift mutation.
  • the polynucleotides and vectors described herein can have important clinical utility. For instance, a variety of diseases and conditions, including genetic and proliferative disorders, are manifestations of aberrant and/or dysregulated gene expression and may be associated with, for example, an elevated (e.g., increased) expression of a particular gene.
  • the sections that follow describe polynucleotides having a +1 frameshift mutation and vectors containing such polynucleotides that can be used in methods to attenuate and/or maintain the expression level of a target gene, for example, to reduce target gene expression to or below a basal level.
  • the present invention is based in part on the Reading Frame Surveillance (RFS) model, which suggests a mechanism by which a ribosome can prevent +1 frameshifting (e.g., a +1 or -1 frameshift mutation) from occurring during the translation of an mRNA.
  • a frameshift mutation may occur whenever the ribosome shifts in the 5' or 3' direction during translation and may result in the translation of product that is not genetically encoded.
  • the +1 frameshift mutation occurs when five nucleotides are inserted with respect to the gene sequence, e.g., during RNA replication.
  • a single nucleotide deletion may cause a +1 frameshift mutation, e.g., during RNA replication.
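  • The two examples above (a five-nucleotide insertion and a single-nucleotide deletion) place downstream codons in the same shifted frame, which can be checked with modular arithmetic. The following Python sketch is illustrative only; the function name and the sign convention (a result of 1 denotes the +1 frame, a result of 2 the -1 frame) are assumptions chosen to agree with those two examples.

```python
def reading_frame_offset(inserted=0, deleted=0):
    """Return the frame (0, 1 or 2) read downstream of a net insertion/deletion.

    Convention assumed for this sketch: 1 means downstream codons start one
    nucleotide later in the original numbering (+1 frame); 2 corresponds to
    the -1 frame; 0 means the original frame is preserved.
    """
    return (deleted - inserted) % 3


# A five-nucleotide insertion and a one-nucleotide deletion both give +1.
print(reading_frame_offset(inserted=5))
print(reading_frame_offset(deleted=1))
# A three-nucleotide insertion leaves the frame unchanged.
print(reading_frame_offset(inserted=3))
```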
  • scRNA short complementary RNA
  • upstream (-1 ) frameshifting of the ribosome during translation of an mRNA construct is sterically prohibited, as this would require two adjacent tRNA molecules to be bound to the same base on the mRNA target, or that the mRNA contain a short homopolymeric sequence that allows slippage of the P-site tRNA.
  • +1 frameshifting of the ribosome could occur without such steric hindrance whenever a one-nucleotide translocation event allows for binding of the A-site tRNA in a +1 shifted position relative to the previous codon (Figure 1C).
  • the RFS model is based in part on the argument that primitive ribosomes were more error prone than modern ribosomes (Woese, Proc. Natl. Acad. Sci. U.S.A. 54:1546-1552 (1965)), thus increasing the challenge of limiting the occurrence of +1 frameshifting events.
  • genomic RNA replication could cause primordial translation templates to be double stranded (Zenkin, J. Mol. Evol. 74:249-256 (2012))
  • the RFS model hypothesizes that a primordial ribosome could have used complementary RNA (cRNA) strands to measure template RNA during translation to facilitate recognition and response to frameshifting events.
  • cRNA complementary RNA
  • a complementary RNA is generated by an RNA-dependent RNA polymerase (RdRP) (Fig. 2A).
  • RdRP RNA-dependent RNA polymerase
  • pioneer translation is expected to occur in a manner that is coupled to the synthesis of cRNA.
  • translation of every set number of codons within a region of the cRNA that is complementary to an exon results in cleavage of the cRNA at a set distance from the A-site, which produces a series of uniformly sized short complementary RNAs (scRNAs) that decorate the length of the exonic regions of the pre-mRNA.
  • scRNAs short complementary RNAs
  • the scRNAs are predicted to have a uniform size of approximately 9-30 (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30) nucleotides and are spaced uniformly along the mRNA.
  • Following splicing of the pre-mRNA to generate an mRNA, scRNAs remain complexed to the exonic regions of the mRNA, such that scRNAs span the length of the coding region of the mRNA.
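  • The "molecular ruler" picture of uniformly sized, uniformly spaced scRNAs can be sketched as a simple tiling of an exon. The Python sketch below is a toy model, not the patent's method: it assumes that scRNAs abut end to end with no gaps, and the function name and the example lengths (a 90-nucleotide exon, 21-nucleotide scRNAs) are hypothetical.

```python
def tile_scrnas(exon_length, scrna_length):
    """Return 0-based, half-open (start, end) intervals of scRNAs along an exon.

    Assumption for this sketch: scRNAs are contiguous and any remainder at
    the 3' end that is shorter than one scRNA is left undecorated.
    """
    if not 9 <= scrna_length <= 30:
        raise ValueError("scRNA length is expected to be about 9-30 nucleotides")
    last_start = exon_length - scrna_length
    return [(s, s + scrna_length) for s in range(0, last_start + 1, scrna_length)]


# Hypothetical 90-nt exon decorated with 21-nt scRNAs.
intervals = tile_scrnas(90, 21)
print(intervals)
```

Because 21 is a multiple of 3, every scRNA terminus in this example falls on a codon boundary of an in-frame message, which is the kind of regular landmark the model describes the ribosome as sensing.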
  • the RFS model suggests that these scRNAs serve as a molecular ruler that allows continuous monitoring for ribosomal frameshifts during subsequent rounds of translation by the ribosome.
  • the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosomal active site (Fig. 2B).
  • Sensing of +1 frameshifts during the translation process can lead to repeated cleavage of the mRNA transcript that may result in terminal 2-base pair 3' overhangs (Fig. 2C), which can facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells (Elbashir et al. Genes Dev. 15:188-200, 2001).
  • RISC RNA Induced Silencing Complex
  • the RFS model additionally suggests that scRNA molecules may dissociate from the mRNA template that was translated during their biogenesis. These scRNA molecules may bind to daughter RNA molecules of the same polarity, for instance, to facilitate the proper translation of these templates.
  • terminal 2-base pair 3' overhangs are descendants of primordial structures involved in regulation of RNA integrity, as duplexes containing this motif are competent substrates for the RNA Induced Silencing Complex (RISC).
  • For instance, translation downstream of a deletion in a synthetic gene construct would be sensed as a +1 frameshift at each scRNA every time it is read by the ribosome. In turn, this can trigger a severe response, such as cleavage of the RNA message. The ensuing cleavage can subsequently generate double stranded RNA molecules with terminal 2-base pair 3' overhangs (Figure 2C), which can facilitate loading of RNA fragments onto the RISC in eukaryotic cells (Elbashir et al., Genes Dev. 15:188-200 (2001), herein incorporated by reference in its entirety).
  • FIG. 3 is a schematic portraying the Reading Frame Surveillance model applied to modulation of a eukaryotic gene.
  • the RFS model postulates that the ability of free scRNA to bind to and monitor the translation of template mRNA is a concentration dependent process.
  • When the concentration of free scRNA is reduced, translation of an mRNA template may be attenuated due to the limited binding (e.g., incomplete decoration) of the mRNA template by scRNA.
  • the concentration of free scRNA may be reduced, e.g., by competitive inhibition.
  • a mutant mRNA with a +1 frameshift mutation relative to the wild type mRNA may bind to and sequester scRNA molecules, if the mutant mRNA is present at a sufficiently high concentration (e.g., a concentration higher than the level of the wild type mRNA).
  • the RFS model suggests that competitive inhibition would result in a reduced rate of translation (e.g., attenuation of gene expression) of the wild type mRNA. Additionally, the ribosome would sense the +1 frameshift mutation in the mutant mRNA transcript, e.g., by sensing the position of each scRNA, and terminate translation.
  • gene silencing can be applied in a targeted manner, such as would be useful in the treatment of genetic disorders and conditions characterized by aberrant and/or dysregulated gene expression (e.g., elevated gene expression). Under certain conditions, reduction of the expression level of a gene to a lower level of expression may be useful as a method of treating, preventing, and/or ameliorating the disease.
  • a polynucleotide and/or a vector encoding a polynucleotide having a +1 frameshift mutation relative to the sequence of the wild type mRNA may be introduced into a cell, e.g., at a sufficient concentration to induce competitive inhibition of scRNA binding to the wild type mRNA template.
  • a polynucleotide and/or a vector encoding a polynucleotide having a +1 frameshift mutation may reduce target gene expression by about 10% (e.g., 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90%) or more.
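  • A percent reduction of this kind is computed from a baseline (untreated) measurement and a treated measurement of the same RNA or protein. The Python sketch below shows the arithmetic only; the function name and the example values are hypothetical and do not represent measured data.

```python
def percent_reduction(baseline, treated):
    """Return the percent reduction of `treated` relative to `baseline`.

    Both values are expression measurements in the same units (for example,
    normalized mRNA abundance). A negative result indicates an increase.
    """
    if baseline <= 0:
        raise ValueError("baseline expression must be positive")
    return 100.0 * (baseline - treated) / baseline


# Hypothetical values: expression falls from 200 to 150 arbitrary units.
print(percent_reduction(200.0, 150.0))
```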
  • RNA and/or protein levels may be monitored by techniques known in the art for measuring RNA and/or protein levels. For example, a number of widely used procedures exist for detecting and determining the abundance of a particular mRNA in a total or poly(A) RNA sample, such as and without limitation, Northern blot analysis, nuclease protection assays (NPA), in situ hybridization, and reverse transcription-polymerase chain reaction (RT-PCR).
  • NPA nuclease protection assays
  • RT-PCR reverse transcription-polymerase chain reaction
  • techniques measuring the concentration or relative abundance of a corresponding protein product encoded by a targeted gene of interest may be used, such as and without limitation, proteomics approaches, immunohistochemical and/or western blot analysis, immunoprecipitation, molecular binding assays, ELISA, enzyme-linked immunofiltration assay (ELIFA), mass spectrometry, mass spectrometric immunoassay, and biochemical enzymatic activity assays.
  • ELIFA enzyme-linked immunofiltration assay
  • Polynucleotides useful for attenuating the expression of a gene as described herein include those that are designed to have at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) +1 frameshift mutation with respect to at least a portion of the sequence of a targeted gene.
  • the polynucleotide may also include at least a single (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) nucleotide insertion or deletion causing the +1 frameshift mutation.
  • the +1 frameshift mutation may be generated by the addition of five nucleotides with respect to the targeted gene sequence.
  • a single nucleotide deletion may be introduced to cause a +1 frameshift mutation.
  • the polynucleotide may be designed to be from about 9 to about 100 (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, or 100) nucleotides in length. In some instances, the polynucleotides will be between about 9 to about 50 (e.g., about 10 to about 25, about 20 to about 35, about 30 to about 45, about 35 to about 50) nucleotides in length.
  • the at least one +1 frameshift mutation may occur at any nucleotide position along the length of the polynucleotide that allows for sufficient recognition by and binding of scRNA fragments.
  • the polynucleotide may be designed to include a wild type copy of the target gene operably linked to the portion of said polynucleotide having a +1 frameshift mutation.
  • the overall length of the polynucleotide and the position of the +1 frameshift mutation along its length will determine the number of nucleotides that include the wild type sequence of the target gene.
  • the length of the wild type sequence of the gene operably linked to the portion of the polynucleotide having a +1 frameshift mutation is at least 20% (e.g., 20, 30, 40, 50, 60, 70, or 80%) of the total polynucleotide length.
  • a polynucleotide having a total length of 50 nucleotides may be designed such that nucleotides 1-30 include the wild type sequence and the nucleotides occurring after a +1 frameshift mutation at nucleotide 31 have a non-wild type sequence.
  • one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 20) or more polynucleotides having a +1 frameshift mutation may be designed. These polynucleotides may be of varying lengths and have +1 frameshift mutations located at different positions relative to the sequence of the target gene. In this manner, the polynucleotides can be designed to bind different sets of scRNA fragments when introduced into a cell, such as a cell in vitro or in vivo. The use of multiple polynucleotides may increase the efficacy of treatment (e.g., attenuation of gene expression).
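  • The 50-nucleotide example can be reproduced with a short Python sketch that deletes one nucleotide from a wild type sequence and reports how much of the result is still wild type. It is a sketch under stated assumptions: the function names and the 51-nucleotide input sequence are hypothetical, and a single-nucleotide deletion is used as the way of introducing the +1 frameshift.

```python
def delete_nucleotide(wild_type, position):
    """Delete the nucleotide at a 1-based position, causing a +1 frameshift."""
    if not 1 <= position <= len(wild_type):
        raise ValueError("position lies outside the sequence")
    return wild_type[:position - 1] + wild_type[position:]


def wild_type_fraction(position, total_length):
    """Fraction of the designed polynucleotide upstream of the mutation."""
    return (position - 1) / total_length


# Hypothetical 51-nt wild type segment: deleting nucleotide 31 leaves 50 nt.
wild_type = "ATG" + "GCT" * 16
design = delete_nucleotide(wild_type, 31)

print(len(design))                          # total length of the design
print(design[:30] == wild_type[:30])        # nucleotides 1-30 are wild type
print(wild_type_fraction(31, len(design)))  # wild type share of the design
```

In this example 30 of 50 nucleotides (60%) retain the wild type sequence, which satisfies the stated minimum of 20%. Varying the position argument gives a panel of designs with the mutation at different sites.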
  • polynucleotides described herein may be introduced into a cell, such as a mammalian cell (e.g., a human cell) in vivo and in vitro, by methods known in the art, including transfection techniques. Additionally or alternatively, polynucleotides described herein may be incorporated into a variety of expression vectors, such as viral vectors, for delivery to and expression in mammalian cells.
  • a mammalian cell e.g., a human cell
  • expression vectors such as viral vectors
  • the polynucleotides may be designed to target genes involved in diseases and conditions characterized by aberrant and/or dysregulated gene expression (e.g., elevated levels of gene
  • polynucleotides having a +1 frameshift mutation may find therapeutic utility for attenuating gene expression.
  • diseases and conditions for which polynucleotides having a +1 frameshift mutation may find therapeutic utility for attenuating gene expression include, but are not limited to, genetic disorders, e.g., dominant catecholaminergic polymorphic ventricular tachycardia (CPVT) and Long QT syndrome (LQTS), as well as proliferative disorders, e.g., cancer.
  • CPVT dominant catecholaminergic polymorphic ventricular tachycardia
  • LQTS Long QT syndrome
  • the polynucleotides can be prepared, for instance, by methods known in the art, such as solid phase nucleic acid synthesis and molecular cloning procedures.
  • a target cell e.g., a mammalian cell, such as a human cell
  • transfection techniques known in the art. For instance, electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest. Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids.
  • Additional techniques useful for the transfection of target cells with polynucleotides having a +1 frameshift mutation described herein include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al., Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
  • Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference.
  • RNA-containing aqueous core can have an anionic, cationic, or zwitterionic hydrophilic head group.
  • Some phospholipids are anionic, whereas others are zwitterionic and others are cationic.
  • Suitable classes of phospholipid include, but are not limited to, phosphatidylethanolamines, phosphatidylcholines, phosphatidylserines, and phosphatidylglycerols.
  • Useful cationic lipids include, but are not limited to, dioleoyl trimethylammonium propane (DOTAP), 1,2-distearyloxy-N,N-dimethyl-3-aminopropane (DSDMA), 1,2-dioleyloxy-N,N-dimethyl-3-aminopropane (DODMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), and 1,2-dilinolenyloxy-N,N-dimethyl-3-aminopropane (DLenDMA).
  • Zwitterionic lipids include, but are not limited to, acyl zwitterionic lipids and ether zwitterionic lipids.
  • Examples of useful zwitterionic lipids are DPPC, DOPC, DSPC, dodecylphosphocholine, 1,2-dioleoyl-sn-glycero-3-phosphatidylethanolamine (DOPE), and 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (DPyPE).
  • DOPE 1,2-dioleoyl-sn-glycero-3-phosphatidylethanolamine
  • DPyPE 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine
  • the lipids in the liposome can be saturated or unsaturated. If an unsaturated lipid has two tails, both tails can be unsaturated, or the lipid can have one saturated tail and one unsaturated tail.
  • a lipid can include a steroid group in one tail.
  • Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex.
  • exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and diethylaminoethyl (DEAE)-dextran, the use of which as a transfection agent is described in detail, for instance, in Gulick et al., Current Protocols in Molecular Biology 40:1:9.2:9.2.1 (1997), the disclosure of which is incorporated herein by reference.
  • Magnetic beads represent another tool that can be used to transfect target cells in a mild and efficient manner, as this methodology utilizes an applied magnetic field in order to direct the uptake of nucleic acids. This technology is described in detail, for instance, in US 2010/0227406, the disclosure of which is incorporated herein by reference.
  • laserfection a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al., Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference.
  • Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For instance, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence.
  • a genome-modifying protein such as a nuclease
  • vesicles also referred to as Gesicles
  • Gesicles for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al., Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract].
  • Methylation changes in early embryonic genes in cancer in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13,
  • exosomes such as those harvested naturally from a cell (e.g., a mammalian cell, such as a human cell) or those prepared synthetically. Exemplary techniques for the preparation and loading of exosomes with RNA substrates are described, for instance, in US 2015/0093433, the disclosure of which is incorporated herein by reference in its entirety.
  • Viral genomes provide a rich source of vectors that can be used for the efficient delivery of exogenous genes into the genome of a cell (e.g., a mammalian cell, such as a human cell). Viral genomes are particularly useful vectors for gene delivery because the polynucleotides contained within such genomes are typically incorporated into the genome of a target cell by generalized or specialized transduction. These processes occur as part of the natural viral replication cycle, and do not require added proteins or reagents in order to induce gene integration.
  • Examples of viral vectors include AAV, retrovirus, adenovirus (e.g., Ad5, Ad26, Ad34, Ad35, and Ad48), parvovirus (e.g., adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g.
  • RNA viruses such as picornavirus and alphavirus
  • double stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox and canarypox).
  • Other viruses useful for delivering polynucleotides described herein include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example.
  • Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).
  • examples include murine leukemia viruses, murine sarcoma viruses, mouse mammary tumor virus, bovine leukemia virus, feline leukemia virus, feline sarcoma virus, avian leukemia virus, human T-cell leukemia virus, baboon endogenous virus, Gibbon ape leukemia virus, Mason Pfizer monkey virus, simian immunodeficiency virus, simian sarcoma virus, Rous sarcoma virus and lentiviruses.
  • vectors are described, for example, in US Patent No. 5,801,030, the disclosure of which is incorporated herein by reference as it pertains to viral vectors for use in gene therapy.
  • polynucleotides of the compositions and methods described herein are incorporated into rAAV vectors and/or virions in order to facilitate their introduction into a cell.
  • rAAV vectors useful in the invention are recombinant nucleic acid constructs that include (1) a heterologous sequence to be expressed (e.g., a polynucleotide encoding a +1 frameshift mutation relative to a gene of interest) and (2) viral sequences that facilitate integration and expression of the heterologous genes.
  • the viral sequences may include those sequences of AAV that are required in cis for replication and packaging (e.g., functional ITRs) of the DNA into a virion.
  • the heterologous gene encodes GAA, which is useful for correcting a GAA-deficiency in a cell.
  • rAAV vectors may also contain marker or reporter genes.
  • Useful rAAV vectors have one or more of the AAV WT genes deleted in whole or in part, but retain functional flanking ITR sequences.
  • the AAV ITRs may be of any serotype (e.g., derived from serotype 2) suitable for a particular application. Methods for using rAAV vectors are described, for example, in Tal et al. J. Biomed. Sci. 7:279-291 (2000), and Monahan and Samulski, Gene Delivery 7:24-30 (2000), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
  • the polynucleotides and vectors described herein can be incorporated into a rAAV virion in order to facilitate introduction of the nucleic acid or vector into a cell.
  • the capsid proteins of AAV compose the exterior, non-nucleic acid portion of the virion and are encoded by the AAV cap gene.
  • the cap gene encodes three viral coat proteins, VP1, VP2 and VP3, which are required for virion assembly.
  • the construction of rAAV virions has been described, for instance, in US Patent Nos. 5,173,414; 5,139,941; 5,863,541; 5,869,305; 6,057,152; and 6,376,237; as well as in Rabinowitz et al. J. Virol. 76:791-801 (2002) and Bowles et al. J. Virol. 77:423-432 (2003), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene
  • rAAV virions useful in conjunction with the compositions and methods described herein include those derived from a variety of AAV serotypes including AAV 1, 2, 3, 4, 5, 6, 7, 8 and 9.
  • rAAV virions that include at least one serotype 1 capsid protein may be particularly useful.
  • rAAV virions that include at least one serotype 6 capsid protein may also be particularly useful, as serotype 6 capsid proteins are structurally similar to serotype 1 capsid proteins, and thus are expected to also result in high expression of GAA in muscle cells.
  • rAAV serotype 9 has also been found to be an efficient transducer of muscle cells.
  • AAV vectors and AAV proteins of different serotypes are described, for instance, in Chao et al. Mol. Ther. 2:619-623 (2000); Davidson et al. Proc. Natl. Acad. Sci. USA 97:3428-3432 (2000); Xiao et al. J. Virol. 72:2224-2232 (1998); Halbert et al. J. Virol. 74:1524-1532 (2000); Halbert et al. J. Virol. 75:6615-6624 (2001); and Auricchio et al. Hum. Molec. Genet.
  • Pseudotyped vectors include AAV vectors of a given serotype (e.g., AAV9) pseudotyped with a capsid gene derived from a serotype other than the given serotype (e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, etc.).
  • a representative pseudotyped vector is an AAV8 or AAV9 vector encoding a therapeutic protein pseudotyped with a capsid gene derived from AAV serotype 2.
  • Techniques involving the construction and use of pseudotyped rAAV virions are known in the art and are described, for instance, in Duan et al. J. Virol. 75:7662-7671 (2001); Halbert et al., J. Virol. 74:1524-1532 (2000); Zolotukhin et al. Methods, 28:158-167 (2002); and Auricchio et al., Hum. Molec. Genet., 10:3075-3081 (2001).
  • AAV virions that have mutations within the virion capsid may be used to infect particular cell types more effectively than non-mutated capsid virions.
  • suitable AAV mutants may have ligand insertion mutations for the facilitation of targeting AAV to specific cell types.
  • the construction and characterization of AAV capsid mutants including insertion mutants, alanine screening mutants, and epitope tag mutants is described in Wu et al., J. Virol. 74:8635-45 (2000).
  • Other rAAV virions that can be used in conjunction with the compositions and methods described herein include those capsid hybrids that are generated by molecular breeding of viruses as well as by exon shuffling. See, e.g., Soong et al., Nat. Genet., 25:436-439 (2000) and Kolman and Stemmer, Nat. Biotechnol. 19:423-428 (2001).
  • compositions and methods described herein can also be used to induce the expression of a protein in a cell, such as a therapeutic protein, by contacting the cell in vitro or in vivo (for instance, in a patient, such as a human patient) with a duplex containing a mRNA strand encoding the protein annealed to a plurality of complementary nucleic acid strands.
  • the plurality of nucleic acid strands may include one or more RNA strands or one or more DNA strands.
  • the plurality of nucleic acid strands includes both RNA strands and DNA strands.
  • the complementary nucleic acid strands may include one or more modified nucleotides, such as a nucleotide that has been chemically modified so as to improve the strength of a specific binding interaction between the RNA encoding the protein of interest and the short complementary RNA strand.
  • nucleic acid duplexes described herein can have important clinical utility.
  • Many diseases and conditions, including heritable genetic disorders, are manifestations of a deficiency in a native protein.
  • target cells e.g., human cells.
  • nucleic acid duplexes containing short complementary nucleic acids annealed to a mRNA strand encoding a protein of interest and the use of these duplexes for the treatment of a disease or condition in a patient.
  • the present invention is based in part on the hypothesis that, during translation of an
  • short complementary RNA (scRNA) fragments of from about 9 to about 30 nucleotides in length that anneal to the mRNA transcript are generated by an RNA-dependent RNA polymerase. It is hypothesized that these fragments guide the proper alignment of the ribosome to the mRNA template by preventing downstream (+1) frameshifts from occurring during the translation process. As shown in Figure 1B, upstream (-1) frameshifting of the ribosome during translation of an mRNA construct is sterically prohibited, as this would require two adjacent tRNA molecules to be bound to the same base on the mRNA target.
  • RFS Reading Frame Surveillance
  • a key element of this model is that during the pioneer round, translation of a uniform set of codons is associated with cleavage of the cRNA template at a set distance from the A-site, which results in the mRNA being decorated with scRNA oligonucleotides.
  • the 7-codon length of the scRNA fragments diagrammed in Figure 2 represents one possible length for these fragments.
  • the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosome active sites.
  • Figure 2B presents one possible mechanism by which this sensing event occurs.
  • the ribosome senses the 5' end of a scRNA and, upon mRNA translocation to restore complete scRNA pairing with its complementary sequence on the mRNA, confirms that translation of the previous segment occurred without frameshifting.
  • Figure 2C presents a schematic of a +1 frameshifted complex, in which the distance between the ribosome and the scRNA 5' end is increased; this increase could be sensed by the ribosome to trigger alteration or abortion of the translation of that mRNA.
  • the RFS model additionally suggests that scRNA molecules can dissociate from the mRNA template that was translated during their biogenesis. These scRNA molecules are free to bind to daughter RNA molecules of the same polarity, for instance, to facilitate proper translation of these templates. In this way, the RFS model indicates that the scRNA molecules function as a form of epigenetic memory of previous translations, and facilitate the detection of reading frame errors occurring during genome replication.
  • RNA molecule e.g., a +1 frame-shifting event
  • scRNA molecules derived from translation of an intact RNA decorate the new, mutated RNA
  • translation downstream of the deletion would be sensed as a +1 frameshift at each scRNA every time it is read by the ribosome. This may trigger a severe response, such as cleavage of the RNA message.
  • RISC RNA Induced Silencing Complex
  • In eukaryotes, exons generally have elevated GC content relative to adjacent introns, and in higher eukaryotes this feature is even more pronounced in those regions of the genome with lower overall GC content and longer introns (Amit et al., Cell Reports 1:543-556 (2012) and Louie et al., Genome Res. 13:2594-2601 (2003)).
  • This property of coding exons is frequently revealed in analyses of codon usage bias, in particular the GC content of the third, wobble position of codons. Codon usage bias is often attributed to selective pressure to maintain optimal translation rates in the context of charged tRNA abundances (Ikemura, Mol. Biol. Evol.
  • the RFS model indicates that GC-rich coding exons bind to scRNA molecules with increased affinity and thermodynamic stability, which enhances translational fidelity and processivity.
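The link between GC content and duplex stability can be illustrated with a rough calculation. The sketch below is not taken from the source: it uses the Wallace rule, a rule of thumb for short DNA oligonucleotides (2 °C per A/T pair, 4 °C per G/C pair), applied here to RNA only as a crude approximation. The function names and the example sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base in "GC") / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature (deg C) for a short oligonucleotide.

    Assumption: Wallace rule, Tm = 2*(A+T) + 4*(G+C). It was derived for
    short DNA oligos; U is treated as T here, which is only an approximation.
    """
    seq = seq.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Two hypothetical 12-nt fragments: GC-poor versus GC-rich.
for fragment in ("AUAUAUAUAUAU", "GCGCGCGCGCGC"):
    print(fragment, gc_fraction(fragment), wallace_tm(fragment))
```

Under this approximation, a GC-rich fragment of the same length has a higher estimated melting temperature, which is consistent with the direction of the effect the model describes. A nearest-neighbor thermodynamic model would be needed for quantitative work.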
  • codon usage in eukaryotes can be influenced by reading frame surveillance, and can provide selective pressure to maintain high GC content in the wobble position.
  • the severity of disease-causing mutations in the Factor IX gene has been correlated with the change in pairing free energy caused by each coding sequence mutation (Hamasaki-Katagiri et al., Haemophilia 18:933-940 (2012)).
  • the RFS model can thus be used to inform the design of polynucleotide duplexes engineered for robust expression in a target cell, and may synergize with codon-optimization procedures, such as those described herein, to provide a paradigm for augmenting gene expression in a cell of interest.
  • the principle of reading frame surveillance is based in part on the postulate that scRNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated.
  • synthetic duplexes containing a mRNA strand encoding a protein of interest annealed to a plurality of complementary oligonucleotides can be prepared ex vivo and delivered to a cell in vitro or in vivo so as to simulate the mRNA-scRNA duplexes that naturally occur.
  • mRNA of interest can be delivered to a cell in complex with a series of molecular rulers that will guide the high-fidelity translation of the encoded protein product.
  • Strands of mRNA encoding a protein of interest can be designed using a variety of techniques.
  • For instance, using the standard genetic code, one of skill in the art can design a cDNA strand encoding a protein of interest, which can then be converted to a RNA sequence.
  • the design of protein-encoding nucleic acids can be informed, for instance, using the genetic code represented in Table 1, above, compiled by the National Center for Biotechnology Information, Bethesda, Maryland, USA.
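As a minimal sketch of this design step, the following back-translates a protein sequence into a coding DNA sequence and converts it to RNA. The one-codon-per-residue table is an illustrative assumption (each entry is a valid codon under the standard genetic code, but the particular choices are arbitrary and are not the codon preferences described in the source). The function names are hypothetical.

```python
# One valid standard-code codon per amino acid; the specific choices are
# illustrative assumptions, not a recommended codon-usage table.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def back_translate(protein):
    """Return a coding DNA sequence for a one-letter protein sequence."""
    return "".join(PREFERRED_CODON[residue] for residue in protein.upper())


def dna_to_rna(coding_dna):
    """Convert a coding-strand DNA sequence to the corresponding RNA."""
    return coding_dna.upper().replace("T", "U")


coding_dna = back_translate("MKTW*")
print(coding_dna)
print(dna_to_rna(coding_dna))
```

In practice the codon chosen for each residue would be selected according to the optimization criteria discussed elsewhere in this document rather than from a fixed table.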
  • the complementary oligonucleotides can each independently be the length of a naturally occurring scRNA molecule, such as from 9-30 nucleotides in length.
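One way to picture a set of complementary oligonucleotides is to tile the mRNA with fixed-length windows and take the reverse complement of each. This is a sketch under stated assumptions: the default window of 21 nucleotides (7 codons) is borrowed from the length diagrammed in Figure 2, uniform tiling from position 0 is a simplification, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A<->U, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_complementary_oligos(mrna, length=21):
    """Return (start, oligo) pairs tiling the mRNA with complementary RNA.

    Assumptions: every oligo has the same length, tiling starts at position 0,
    and any trailing segment shorter than `length` is left unpaired.
    """
    if not 9 <= length <= 30:
        raise ValueError("length outside the 9-30 nucleotide range")
    mrna = mrna.upper()
    oligos = []
    for start in range(0, len(mrna) - length + 1, length):
        window = mrna[start:start + length]
        oligos.append((start, reverse_complement_rna(window)))
    return oligos


# Hypothetical 42-nt message: a start codon followed by 13 alanine codons.
message = "AUG" + "GCU" * 13
for start, oligo in tile_complementary_oligos(message):
    print(start, oligo)
```

Each oligo is written 5' to 3', so it pairs antiparallel with the mRNA window starting at the reported position.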
  • the complementary nucleic acids can be DNA or RNA, and may include one or more modified nucleic acids, for instance, as described below.
  • (ii) Modified nucleic acids for inclusion in mRNA duplexes
  • Complementary nucleic acids that can be incorporated into a mRNA-containing duplex described herein may include one or more chemically or enzymatically modified nucleic acids. Such nucleic acids can be useful, for instance, for enhancing the affinity of the complementary nucleic acids for the mRNA strand of interest.
  • modified nucleotides include modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate.
  • modified nucleotides include modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate.
  • Modified uridine nucleotides can be incorporated into the mRNA-containing duplexes described herein, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate.
  • modified nucleotides include modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
  • modified nucleotides are described, for instance, in WO 2012/031046 and US Patent Nos. 4,711,955 and 8,536,323, the disclosures of each of which are incorporated herein by reference as they pertain to chemically or enzymatically modified nucleotides.
  • nucleic acids that encode the protein of interest and that contain elevated GC content, for instance, relative to the wild-type gene while preserving the amino acid sequence of the encoded protein so as to enhance the thermodynamic stability of the duplex.
  • the increase in GC content will lead to higher-affinity binding of the mRNA transcript with short complementary nucleic acid fragments. This binding promotes improved licensing of the mRNA transcript, as the short complementary nucleic acids act as molecular rulers that do not permit improper alignment (e.g., +1 frameshifting) of the ribosome to the mRNA template.
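Raising GC content while preserving the encoded protein amounts to swapping each codon for a synonymous codon with more G or C bases. The sketch below is one simple way to do this, not the procedure claimed in the source. It builds the standard genetic code from the compact NCBI ordering (bases T, C, A, G), leaves stop codons unchanged, and among equally GC-rich synonyms prefers one without an internal CG dinucleotide. These tie-breaking choices and all function names are assumptions.

```python
from itertools import product

# Standard genetic code in NCBI order (first, second, third base over T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding DNA sequence; incomplete trailing codons are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def gc_count(seq):
    """Number of G and C bases in a sequence."""
    return seq.count("G") + seq.count("C")


def gc_enrich(cds):
    """Replace each sense codon with its most GC-rich synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            out.append(codon)  # stop codons are left unchanged
            continue
        synonyms = sorted(c for c, a in CODON_TABLE.items() if a == aa)
        # Highest GC count wins; ties go to codons without an internal CG.
        out.append(max(synonyms, key=lambda c: (gc_count(c), "CG" not in c)))
    return "".join(out)


original = "ATGAAATTTGCATAA"  # hypothetical sequence encoding M-K-F-A-stop
enriched = gc_enrich(original)
print(original, gc_count(original))
print(enriched, gc_count(enriched))
```

This sketch does not check CG dinucleotides created at codon junctions, nor homopolymer runs, both of which the surrounding text identifies as features to minimize.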
  • one of skill in the art can manipulate the protein-encoding gene sequence of a synthetic mRNA strand by incorporating codon substitutions that diminish the CpG content and/or homopolymer content of the mRNA. For instance, one can begin with a wild-type mRNA sequence and introduce substitutions (e.g., single-nucleotide substitutions) that reduce the CpG content and/or homopolymer content of the gene while preserving the identity of the encoded protein sequence. CpG sites and homopolymers can promote +1 frameshifts during the mRNA translation process.
  • the homopolymer encodes amino acid residues that are not essential for protein function (for instance, if the encoded amino acids are not present within the active site of an encoded enzyme or within a site necessary for non-covalent binding to another biological molecule), one of skill in the art can incorporate codon substitutions that interrupt the homopolymer and that introduce a conservative substitution into the encoded protein at the site of the corresponding amino acid, for instance, such that the encoded protein has an amino acid sequence that exhibits at least 85% sequence identity (e.g., 85%, 90%, 95%, 97%, 99%, or more) relative to the wild type amino acid sequence.
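The quantities discussed above (CpG content, homopolymer runs, and percent identity to the wild-type protein) are straightforward to compute. This is a minimal sketch with hypothetical function names. The minimum run length of 4 that defines a homopolymer here is an assumption, since the source does not state a threshold, and the identity calculation assumes ungapped sequences of equal length, as would result from substitutions alone.

```python
import re


def count_cpg(seq):
    """Number of CG dinucleotides in a nucleotide sequence."""
    return seq.upper().count("CG")


def homopolymer_runs(seq, min_len=4):
    """Return (start, run) for each run of one base at least min_len long.

    Assumption: min_len=4 is an illustrative threshold only.
    """
    pattern = r"(.)\1{%d,}" % (min_len - 1)
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq.upper())]


def percent_identity(seq_a, seq_b):
    """Percent identity of two ungapped sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100 * matches / len(seq_a)


print(count_cpg("ACGCGT"))
print(homopolymer_runs("ATGAAAAAGC"))
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))
```

A candidate design could then be accepted only if its encoded protein meets the chosen identity cutoff (for instance, the 85% figure mentioned above) relative to the wild-type sequence.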
  • Preparation of mRNA-containing duplexes
  • the mRNA strand and short complementary oligonucleotides can be prepared, for instance, by solid phase nucleic acid procedures known in the art.
  • a solid phase synthesis process using a phosphoramidite method can be employed.
  • a nucleic acid is generally synthesized by the following steps.
  • a 5'-OH-protected nucleoside that will occur at the 3' terminal end of the nucleic acid to be synthesized is esterified via the 3'-OH function to a solid support by appending the nucleoside to a cleavable linker. Then, the support for solid phase synthesis on which the nucleoside is immobilized can be placed in a reaction column which is then set on an automated nucleic acid synthesizer.
  • (2) a step of coupling a 5'-OH-protected nucleoside phosphoramidite with the deprotected 5'-OH group in the presence of an activator (e.g., tetrazole or the like);
  • the above process can be repeated to elongate the nucleic acid as needed in a 3'-to-5' direction until a nucleic acid having the desired sequence is synthesized.
  • the cleavable linker is hydrolyzed (e.g., with aqueous ammonia, methylamine solution, or the like) to cleave the synthesized nucleic acid from the solid phase support.
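Because the chain grows from its 3' end, the order in which residues are added is the reverse of the sequence as written 5' to 3', and the fraction of full-length product falls with each coupling cycle. The sketch below illustrates both points. The per-cycle coupling efficiency of 0.99 is a hypothetical value, not a figure from the source, and the function names are illustrative.

```python
def coupling_order(sequence_5_to_3):
    """Order in which residues are incorporated during 3'-to-5' synthesis.

    The first element is the 3'-terminal nucleoside attached to the support.
    """
    return list(reversed(sequence_5_to_3.upper()))


def full_length_yield(length, coupling_efficiency=0.99):
    """Expected fraction of full-length product after all couplings.

    Assumptions: a constant per-cycle efficiency (0.99 is hypothetical) and
    length - 1 coupling steps, since the first residue is pre-loaded.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)


print(coupling_order("AUGC"))
for n in (9, 21, 30):
    print(n, round(full_length_yield(n), 3))
```

Under these assumptions, oligonucleotides in the 9 to 30 nucleotide range retain a high full-length fraction, whereas a long mRNA strand made the same way would not, which is one reason long transcripts are often produced enzymatically instead.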
  • mRNA-containing duplexes that can be designed according to the methods described herein include those that encode therapeutic proteins, such that the duplexes can be transferred to a subject (e.g., a human patient) suffering from a disease or condition characterized by a deficiency in the protein.
  • mRNA-containing duplexes that can be designed and delivered to a patient according to the methods described herein include those that encode, for example, hormones and growth and differentiation factors including, without limitation, insulin, glucagon, growth hormone (GH), parathyroid hormone (PTH), calcitonin, growth hormone releasing factor (GRF), thyroid stimulating hormone (TSH), adrenocorticotropic hormone (ACTH), prolactin, melatonin, vasopressin, β-endorphin, met-enkephalin, leu-enkephalin, prolactin-releasing factor, prolactin-inhibiting factor, corticotropin-releasing hormone, thyrotropin-releasing hormone (TRH), follicle stimulating hormone (FSH), luteinizing hormone (LH), chorionic gonadotropin (CG), vascular endothelial growth factor (VEGF), angiopoietins, angiostatin, endostatin, granulocyte colony stimulating factor (
  • mRNA-containing duplexes that can be designed and delivered to a patient according to the methods described herein include those that encode proteins that regulate the immune system including, without limitation, cytokines and lymphokines such as thrombopoietin (TPO), interleukins (IL) IL-1α, IL-1β, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14, IL-15, IL-16, and IL-17, monocyte chemoattractant protein (MCP-1), leukemia inhibitory factor (LIF), granulocyte-macrophage colony stimulating factor (GM-CSF), granulocyte colony stimulating factor (G-CSF), monocyte colony stimulating factor (M-CSF), Fas ligand, tumor necrosis factors α and β (TNFα and TNFβ), interferons (
  • Gene products produced by the immune system are also encompassed by this invention. These include, without limitations, immunoglobulins IgG, IgM, IgA, IgD and IgE, chimeric immunoglobulins, humanized antibodies, single chain antibodies, T cell receptors, chimeric T cell receptors, single chain T cell receptors, class I and class II MHC molecules, as well as engineered MHC molecules including single chain MHC molecules.
  • Useful gene products also include complement regulatory proteins such as membrane cofactor protein (MCP), decay accelerating factor (DAF), CR1, CR2 and CD59.
  • Still other therapeutic proteins that can be encoded by a mRNA-containing duplex described herein include those that encode any one of the receptors for the hormones, growth factors, cytokines, lymphokines, regulatory proteins and immune system proteins.
  • receptors include flt-1, flk-1, TIE-2; the trk family of receptors such as TrkA, MuSK, Eph, PDGF receptor, EGF receptor, HER2, insulin receptor, IGF-1 receptor, the FGF family of receptors, the TGFβ receptors, the interleukin receptors, the interferon receptors, serotonin receptors, α-adrenergic receptors, β-adrenergic receptors, the GDNF receptor, p75 neurotrophin receptor, among others.
  • mRNA-containing duplexes may encode, for instance, receptors for extracellular matrix proteins, such as integrins, counter-receptors for transmembrane-bound proteins, such as intercellular adhesion molecules (ICAM-1, ICAM-2, ICAM-3 and ICAM-4), vascular cell adhesion molecules (VCAM), and selectins E-selectin, P-selectin and L-selectin.
  • mRNA-containing duplexes may encode, for instance, receptors for cholesterol regulation, including the LDL receptor, HDL receptor, VLDL receptor, and the scavenger receptor.
  • mRNA-containing duplexes may encode, for instance, apolipoprotein ligands for these receptors, including ApoAI, ApoAIV and ApoE. Additional examples of proteins that may be encoded by mRNA-containing duplexes described herein include, for instance, steroid hormone receptor superfamily including glucocorticoid receptors and estrogen receptors, Vitamin D receptors and other nuclear receptors.
  • useful gene products include antimicrobial peptides such as defensins and magainins, transcription factors such as jun, fos, max, mad, serum response factor (SRF), AP-1, AP-2, myb, MRG1, CREM, Alx4, FREAC1, NF-κB, members of the leucine zipper family, C2H2 zinc finger proteins, including Zif268, EGR1, EGR2, C6 zinc finger proteins, including the glucocorticoid and estrogen receptors, POU domain proteins, exemplified by Pit 1, homeodomain proteins, including HOX-1, basic helix-loop-helix proteins, including myc, MyoD and myogenin, ETS-box containing proteins, TFE3, E2F, ATF1, ATF2, ATF3, ATF4, ZF5, NFAT, CREB, HNF-4, C/EBP, SP1, CCAAT-box binding proteins, interferon regulation factor 1 (IRF), I
  • mRNA-containing duplexes of therapeutic utility include those that encode carbamoyl synthetase I, ornithine transcarbamylase, argininosuccinate synthetase, argininosuccinate lyase, arginase, fumarylacetoacetate hydrolase, phenylalanine hydroxylase, alpha-1 antitrypsin, glucose-6-phosphatase, porphobilinogen deaminase, factor VII, factor VIII, factor IX, factor II, factor V, factor X, factor XII, factor XI, von Willebrand factor, superoxide dismutase, glutathione peroxidase and reductase, heme oxygenase, angiotensin converting enzyme, endothelin-1, atrial natriuretic peptide, pro-urokinase, urokinase, plasminogen activator, heparin cofactor II, activated protein C (
  • dehydrogenase insulin, beta-glucosidase, pyruvate carboxylase, hepatic phosphorylase, phosphorylase kinase, glycine decarboxylase (also referred to as P-protein), H-protein, T-protein, Menkes disease protein, tumor suppressors (e.g., p53), cystic fibrosis transmembrane regulator (CFTR), the product of Wilson's disease gene PWD, Cu/Zn superoxide dismutase, aromatic amino acid decarboxylase, tyrosine hydroxylase, acetylcholine synthetase, prohormone convertases, protease inhibitors, lactase, lipase, trypsin, gastrointestinal enzymes including chymotrypsin, and pepsin, adenosine deaminase, α1 antitrypsin, tissue inhibitor of metalloproteinases (TIMP), GLUT-1,
  • proteins include those involved in lysosomal storage disorders, including acid β-glucosidase, α-galactosidase A, α-L-iduronidase, iduronate sulfatase, lysosomal acid α-glucosidase, sphingomyelinase, hexosaminidase A, hexosaminidases A and B, arylsulfatase A, acid lipase, acid ceramidase, galactosylceramidase, α-fucosidase, α-, β-mannosidosis,
  • mRNA-containing duplexes that can be designed and prepared using the compositions and methods described herein include those that encode non-naturally occurring polypeptides, such as chimeric or hybrid polypeptides or polypeptides having a non-naturally occurring amino acid sequence containing insertions, deletions or amino acid substitutions.
  • single-chain engineered immunoglobulins could be useful in certain immunocompromised patients.
  • Other useful proteins include truncated receptors which lack their transmembrane and cytoplasmic domain. These truncated receptors can be used to antagonize the function of their respective ligands by binding to them without concomitant signaling by the receptor.
  • Other types of non-naturally occurring gene sequences include sense and antisense molecules and catalytic nucleic acids, such as
  • Representative proteins that can be encoded by a duplex containing mRNA annealed to a plurality of short complementary nucleic acids as described herein include those listed in Table 3, above.
  • the expression level of a gene expressed by a single cell or type of cell can be ascertained, for example, by evaluating the concentration or relative abundance of RNA transcripts (e.g., mRNA) derived from transcription of a gene of interest. Additionally or alternatively, gene expression can be determined by evaluating the concentration or relative abundance of protein produced by transcription and translation of a gene of interest.
  • Protein concentrations can also be assessed using functional assays, such as enzymatic assays or gene transcription assays in the event the gene of interest encodes an enzyme or a modulator of transcription, respectively.
  • the sections that follow describe exemplary techniques that can be used to measure the expression levels of genes in a cell, cell type, or population of cells of interest, for instance, prior to and/or following the administration of a mRNA-containing duplex described herein to a patient or cell of interest. Expression of genes in a sample can be analyzed by a number of
  • nucleic acid sequencing, fluorescence in-situ hybridization (FISH), amplification-based assays, in situ hybridization, fluorescence activated cell sorting (FACS), northern analysis and/or PCR analysis of mRNAs.
  • Nucleic acid-based datasets suitable for analysis of cell-specific gene expression can have the form of a gene expression profile, which represents the identity of genes expressed in a cell of interest and the extent to which the gene is expressed, which can be used to determine the ranked order of gene expression levels within a cell, cell type, or population of cells of interest.
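The ranked order of gene expression mentioned above can be derived from any profile that maps gene identifiers to expression levels. The following is a minimal sketch. The gene names and values are hypothetical placeholders, and the function name is illustrative.

```python
def rank_genes(profile):
    """Return (rank, gene, level) tuples, highest expression first.

    Ties are broken alphabetically by gene name so the order is deterministic.
    """
    ordered = sorted(profile.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, gene, level) for rank, (gene, level) in enumerate(ordered, 1)]


# Hypothetical expression profile (arbitrary units).
profile = {"geneA": 12.5, "geneB": 250.0, "geneC": 0.0, "geneD": 87.3}
for rank, gene, level in rank_genes(profile):
    print(rank, gene, level)
```

The same ranking could be computed before and after administration of a duplex to see whether the position of the gene of interest changes within the profile.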
  • Such profiles may include whole transcriptome sequencing data (e.g., RNA-Seq data), panels of mRNAs, noncoding RNAs, or any other nucleic acid sequence that may be expressed from genomic DNA.
  • Other nucleic acid datasets suitable for use with the methods described herein may include expression data collected by imaging-based techniques (e.g., Northern blotting or Southern blotting known in the art).
  • Northern blot analysis is a conventional technique well known in the art and is described, for example, in Molecular Cloning, a Laboratory Manual, second edition, 1989, Sambrook, Fritsch, Maniatis, Cold Spring Harbor Press, 10 Skyline Drive, Plainview, NY 11803-2500, the disclosure of which is incorporated herein by reference.
  • Typical protocols for evaluating the status of genes and gene products are found, for example in Ausubel et al., eds., 1995, Current Protocols In Molecular Biology, Units 2 (Northern Blotting), 4 (Southern Blotting), 15 (Immunoblotting) and 18 (PCR Analysis), the disclosure of which is incorporated herein by reference.
  • Gene expression profiles to be analyzed in conjunction with the methods described herein may include, for example, microarray data or nucleic acid sequencing data produced by a sequencing method known in the art (e.g., Sanger sequencing and next-generation sequencing methods, also known as high-throughput sequencing or deep sequencing).
  • exemplary next generation sequencing technologies include, without limitation, Illumina sequencing, Ion Torrent sequencing, 454 sequencing, SOLiD sequencing, and nanopore sequencing platforms. Additional methods of sequencing known in the art can also be used.
  • mRNA expression levels may be determined using RNA-Seq (e.g., as described in Mortazavi et al., Nat. Methods 5:621-628 (2008), the disclosure of which is incorporated herein by reference in its entirety).
  • RNA-Seq is a robust technology known in the art for monitoring expression by direct sequencing the RNA molecules in a sample. Briefly, this methodology may involve fragmentation of RNA to an average length of 200 nucleotides, conversion to cDNA by random priming, and synthesis of double-stranded cDNA (e.g., using the Just cDNA DoubleStranded cDNA Synthesis Kit from Agilent Technology). Then, the cDNA is converted into a molecular library for sequencing by addition of sequence adapters for each library (e.g., from lllumina®/Solexa), and the resulting 50-100 nucleotide reads are mapped onto the genome.
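  • As a minimal sketch of how mapped read counts might be converted into expression levels and a ranked order of genes, the following Python snippet computes reads per kilobase of transcript per million mapped reads (RPKM), the normalization introduced by Mortazavi et al. The gene names, transcript lengths, and read counts are hypothetical values chosen only for illustration.

```python
def rpkm(read_count, gene_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_nt * total_mapped_reads)


def rank_genes(counts, lengths):
    """Return (gene, RPKM) pairs sorted from highest to lowest expression."""
    total = sum(counts.values())
    values = {gene: rpkm(counts[gene], lengths[gene], total) for gene in counts}
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


# Hypothetical read counts and transcript lengths (illustration only).
counts = {"geneA": 500, "geneB": 1500, "geneC": 3000}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 12000}

for gene, value in rank_genes(counts, lengths):
    print(gene, round(value, 1))
```

  • Note that the gene with the most reads (geneC in this toy input) is not necessarily the most highly expressed once transcript length is taken into account.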
  • Gene expression levels may be determined using microarray-based platforms (e.g., single-nucleotide polymorphism (SNP) arrays), as microarray technology offers high resolution. Details of various microarray methods can be found in the literature. See, for example, US Patent No. 6,232,068 and Pollack et al., Nat. Genet. 23:41-46 (1999), the disclosures of each of which are incorporated herein by reference in their entirety.
  • The array can be configured, for example, such that the sequence and position of each member of the array is known. Hybridization of a labeled probe with a particular array member indicates that the sample from which the probe was derived expresses that gene. Expression level may be quantified according to the amount of signal detected from hybridized probe-sample complexes.
  • A typical microarray experiment involves the following steps: 1) preparation of fluorescently labeled target from RNA isolated from the sample, 2) hybridization of the labeled target to the microarray, 3) washing, staining, and scanning of the array, 4) analysis of the scanned image, and 5) generation of gene expression profiles.
  • One example of a microarray processor is the Affymetrix GENECHIP® system, which is commercially available and comprises arrays fabricated by direct synthesis of oligonucleotides on a glass surface.
  • Other systems may be used as known to one skilled in the art.
  • Amplification-based assays also can be used to measure the expression level of one or more markers (e.g., genes).
  • The nucleic acid sequences of the gene act as a template in an amplification reaction (for example, PCR, such as qPCR).
  • The amount of amplification product is proportional to the amount of template in the original sample.
  • Comparison to appropriate controls provides a measure of the expression level of the gene, corresponding to the specific probe used, according to the principles described herein.
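  • One common way to make the comparison to controls concrete is the comparative threshold-cycle (2^-ΔΔCt) calculation. The sketch below is an illustration only: the source does not specify this method, and it assumes a reference (housekeeping) gene, an untreated control sample, and roughly 100% amplification efficiency, so that each cycle doubles the product. The Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt).

    Assumes ~100% PCR efficiency (product doubles every cycle); this is an
    assumption of the sketch, not a statement from the source text.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control           # compare to control sample
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# (relative to the reference gene) in the treated sample than in the control.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold_change)
```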
  • Methods of real-time qPCR using TaqMan probes are well known in the art. Detailed protocols for real-time qPCR are provided, for example, in Gibson et al., Genome Res.
  • Probes used for PCR may be labeled with a detectable marker, such as, for example, a radioisotope, fluorescent compound, bioluminescent compound, a chemiluminescent compound, metal chelator, or enzyme.
  • Gene expression can additionally be determined by measuring the concentration or relative abundance of a corresponding protein product encoded by a gene of interest. Protein levels can be assessed using standard detection techniques known in the art. Examples of protein expression analysis that generate data suitable for use with the methods described herein include, without limitation, proteomics approaches, immunohistochemical and/or western blot analysis, immunoprecipitation, molecular binding assays, ELISA, enzyme-linked immunofiltration assay (ELIFA), mass spectrometry, mass spectrometric immunoassay, and biochemical enzymatic activity assays. In particular, proteomics methods can be used to generate large-scale protein expression datasets in multiplex.
  • Proteomics methods may utilize mass spectrometry to detect and quantify polypeptides (e.g., proteins) and/or peptide microarrays utilizing capture reagents (e.g., antibodies) specific to a panel of target proteins to identify and measure expression levels of proteins expressed in a sample (e.g., a single cell sample or a multi-cell population).
  • Exemplary peptide microarrays have a substrate-bound plurality of polypeptides, the binding of an oligonucleotide, a peptide, or a protein to each of the plurality of bound polypeptides being separately detectable.
  • The peptide microarray may include a plurality of binders, including but not limited to monoclonal antibodies, polyclonal antibodies, phage display binders, yeast two-hybrid binders, and aptamers, which can specifically detect the binding of specific oligonucleotides, peptides, or proteins. Examples of peptide arrays may be found in US Patent Nos. 6,268,210, 5,766,960, and 5,143,854, the disclosures of each of which are incorporated herein by reference.
  • Mass spectrometry may be used in conjunction with the methods described herein to identify and characterize the gene expression profile of a single cell or multi-cell population. Any method of MS known in the art may be used to determine, detect, and/or measure a peptide or peptides of interest, e.g., LC-MS, ESI-MS, ESI-MS/MS, MALDI-TOF-MS, MALDI-TOF/TOF-MS, tandem MS, and the like.
  • Mass spectrometers generally contain an ion source and optics, mass analyzer, and data processing electronics.
  • Mass analyzers, including scanning and ion-beam mass spectrometers, such as time-of-flight (TOF) and quadrupole (Q) analyzers, and trapping mass spectrometers, such as ion trap (IT), Orbitrap, and Fourier transform ion cyclotron resonance (FT-ICR) analyzers, may be used in the methods described herein. Details of various MS methods can be found in the literature. See, for example, Yates et al., Annu. Rev. Biomed. Eng. 11:49-79 (2009), the disclosure of which is incorporated herein by reference.
  • Proteins in a sample can be first digested into smaller peptides by chemical (e.g., via cyanogen bromide cleavage) or enzymatic (e.g., trypsin) digestion.
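  • To illustrate what enzymatic digestion does to a protein sequence, the following sketch performs an in silico tryptic digest using the commonly cited rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R), except when the next residue is proline (P). Missed cleavages are ignored, and the example sequence is hypothetical.

```python
def trypsin_digest(protein):
    """Split a protein (one-letter codes) after K or R, unless followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


# Hypothetical sequence; the R at position 6 is followed by P and is not cleaved.
print(trypsin_digest("MKTAYRPGLKS"))
```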
  • Complex peptide samples also benefit from the use of front-end separation techniques, e.g., 2D-PAGE, HPLC, RPLC, and affinity chromatography.
  • The digested, and optionally separated, sample is then ionized using an ion source to create charged molecules for further analysis.
  • Ionization of the sample may be performed, e.g., by electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), photoionization, electron ionization, fast atom bombardment (FAB)/liquid secondary ionization (LSIMS), matrix assisted laser desorption/ionization (MALDI), field ionization, field desorption, thermospray/plasmaspray ionization, and particle beam ionization. Additional information relating to the choice of ionization method is known to those of skill in the art.
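  • Because a mass analyzer measures the mass-to-charge ratio (m/z) of the ions created by the ion source, it can help to see how m/z follows from a peptide sequence. The sketch below assumes positive-mode ionization by protonation (as is typical for ESI of peptides), unmodified residues, and standard monoisotopic residue masses; the peptide is a hypothetical example.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of a proton


def peptide_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def mass_to_charge(sequence, charge):
    """m/z of the [M + zH]z+ ion, assuming ionization by protonation."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


print(round(peptide_mass("PEPTIDE"), 3), round(mass_to_charge("PEPTIDE", 2), 3))
```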
  • Tandem MS, also known as MS/MS, may be particularly useful for the methods described herein, as it allows for ionization followed by fragmentation of a complex peptide sample, such as a sample obtained from a multi-cell population described herein.
  • Tandem MS involves multiple steps of MS selection, with some form of ion fragmentation occurring in between the stages, which may be accomplished with individual mass spectrometer elements separated in space or using a single mass spectrometer with the MS steps separated in time. In spatially separated tandem MS, the elements are physically separated and distinct, with a physical connection between the elements to maintain high vacuum.
  • Polynucleotides, such as the mRNA-containing duplexes described herein, can be introduced into a target cell (e.g., a mammalian cell, such as a human cell) using transfection techniques known in the art.
  • Electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest.
  • Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids.
  • Additional techniques useful for the transfection of target cells with mRNA-containing duplexes described herein include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al., Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
  • Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference.
  • Lipids that form a liposome around an RNA-containing aqueous core can have an anionic, cationic, or zwitterionic hydrophilic head group.
  • Some phospholipids are anionic, whereas others are zwitterionic and others are cationic.
  • Suitable classes of phospholipid include, but are not limited to, phosphatidylethanolamines, phosphatidylcholines, phosphatidylserines, and phosphatidylglycerols.
  • Useful cationic lipids include, but are not limited to, dioleoyl trimethylammonium propane (DOTAP), 1,2-distearyloxy-N,N-dimethyl-3-aminopropane (DSDMA), 1,2-dioleyloxy-N,N-dimethyl-3-aminopropane (DODMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), and 1,2-dilinolenyloxy-N,N-dimethyl-3-aminopropane (DLenDMA).
  • Zwitterionic lipids include, but are not limited to, acyl zwitterionic lipids and ether zwitterionic lipids.
  • Examples of useful zwitterionic lipids are DPPC, DOPC, DSPC, dodecylphosphocholine, 1,2-dioleoyl-sn-glycero-3-phosphatidylethanolamine (DOPE), and 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (DPyPE).
  • The lipids in the liposome can be saturated or unsaturated. If an unsaturated lipid has two tails, both tails can be unsaturated, or the lipid can have one saturated tail and one unsaturated tail.
  • A lipid can include a steroid group in one tail.
  • Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex.
  • Exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and diethylaminoethyl (DEAE)-dextran, the use of which as a transfection agent is described in detail, for instance, in Gulick et al., Current Protocols in Molecular Biology 40:1:9.2:9.2.1 (1997), the disclosure of which is incorporated herein by reference.
  • Magnetic beads represent another tool that can be used to transfect target cells in a mild and efficient manner, as this methodology utilizes an applied magnetic field in order to direct the uptake of nucleic acids. This technology is described in detail, for instance, in US 2010/0227406, the disclosure of which is incorporated herein by reference.
  • Laserfection is a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cell and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al., Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference.
  • Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For instance, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence.
  • The use of such vesicles, also referred to as Gesicles, for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al., Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract].
  • Methylation changes in early embryonic genes in cancer in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13,
  • Exosomes, such as those harvested naturally from a cell (e.g., a mammalian cell, such as a human cell) or those prepared synthetically, can also be used as delivery vehicles. Exemplary techniques for the preparation and loading of exosomes with RNA substrates are described, for instance, in US 2015/0093433, the disclosure of which is incorporated herein by reference in its entirety.
  • The invention additionally provides methods and compositions that induce alternative splicing, for instance, by way of exon skipping, in a gene by using the host's endogenous mechanism for monitoring +1 ribosomal frameshifts.
  • the Reading Frame Surveillance (RFS) model suggests a way by which a ribosome
  • A complementary RNA (cRNA) is generated by an RNA-dependent RNA polymerase (RdRP).
  • Pioneer translation is expected to occur in the nucleus following the synthesis of cRNA.
  • Translation of every set number of codons within a region of the cRNA that is complementary to an exon results in cleavage of the cRNA at set intervals, thus producing a series of uniformly sized short complementary RNAs (scRNAs) that span the length of the exonic regions of the pre-mRNA.
  • Following splicing of the pre-mRNA to generate an mRNA, scRNAs remain complexed to the exonic regions of the mRNA, such that scRNAs span the length of the coding region of the mRNA. These scRNAs have a uniform size of 9-30 nucleotides and uniform spacing along the exonic regions of the mRNA.
  • The RFS model suggests that these scRNAs serve as a molecular ruler that allows continuous monitoring for ribosomal frameshifts during subsequent rounds of translation by the ribosome. When a previously measured mRNA is subsequently translated, the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosomal active site.
  • Splicing occurs in the nucleus, generating mature mRNAs competent for translation. Export to the cytoplasm and translation by the ribosome are regulated by the removal of introns, resulting in contiguous, uninterrupted reading frames. scRNAs are generated during pioneer translation in the nucleus, prior to splicing or export. Therefore, splicing joins these scRNA-bound exonic regions, juxtaposing the bordering scRNAs and demarcating their junction with an exon junction complex (EJC).
  • The EJC is a protein complex formed on an mRNA at the junction of two exons which have been joined by splicing.
  • The EJC has a role in regulating both export of the processed mRNA from the nucleus and cytoplasmic translation by the ribosome.
  • The EJC may adjust the register of RFS monitoring during translation downstream of the junction. Otherwise put, if the length of an exon is not an exact multiple of the length of the scRNA, the uniformity of scRNA spacing will be disrupted at the splice junction. In this case, the presence of an EJC indicates that RFS may proceed downstream of the splice junction, monitoring for uniformity of scRNA spacing within the next exon.
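  • The arithmetic behind the "molecular ruler" can be sketched in a few lines. The snippet below tiles an exon with fragments of a fixed scRNA length and reports how many nucleotides are left over at the splice junction, which is the situation in which the model proposes that the EJC adjusts the register. It assumes that tiling starts at the first nucleotide of the exon; the exon lengths and the 21-nucleotide fragment length (seven codons, within the 9-30 nucleotide range stated above) are hypothetical.

```python
def tile_exon(exon_length, scrna_length):
    """Return 0-based, half-open (start, end) intervals tiled from the exon start."""
    return [(start, min(start + scrna_length, exon_length))
            for start in range(0, exon_length, scrna_length)]


def junction_remainder(exon_length, scrna_length):
    """Nucleotides left over at the exon's 3' end; 0 means spacing stays uniform."""
    return exon_length % scrna_length


# Hypothetical exons: 63 nt is an exact multiple of 21 nt, 50 nt is not.
for exon_length in (63, 50):
    print(exon_length, tile_exon(exon_length, 21), junction_remainder(exon_length, 21))
```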
  • The message may be exported to the cytoplasm and translated into protein by the ribosome.
  • Exons are selected by alternative splicing for inclusion into mRNAs.
  • Splice site recognition has long been known to involve complementary pairing of short spliceosomal RNAs to splice site sequences and, although this pairing could occur coordinately with the generation of scRNAs, the simple recognition of splice site consensus sequences is not always enough to define all splice sites.
  • When introns are long, as frequently occurs in higher eukaryotes, splice site selection is increasingly influenced by the poorly defined mechanism of exon recognition. The ability of a ribosome to translate an exon contributes to its inclusion into an mRNA.
  • Pre-mRNA transcripts are first copied into cRNA and then scanned by a nuclear pioneer translation apparatus that results in the production of scRNAs that decorate exonic regions of the pre-mRNA.
  • The scRNAs can be recognized by the spliceosome and accessory factors to select exons that are included in the mature mRNA following splicing.
  • scRNAs may dissociate from the mRNA template following pioneer translation and bind additional mRNA molecules having the same polarity and sequence, to facilitate RFS on those templates.
  • scRNAs may function as a form of epigenetic memory of previous translations.
  • The presence of scRNA fragments spanning a given exon may be able to direct spliceosome-mediated inclusion in the mature mRNA.
  • The absence of scRNA fragments spanning a given exon may be able to direct spliceosome-mediated exclusion from the mature mRNA.
  • scRNAs may establish an epigenetic signature that can be communicated between cells.
  • Duchenne muscular dystrophy (DMD) patients often possess mutations that disrupt the open reading frame of the DMD gene that codes for the dystrophin protein in muscle cells. This results in translation of C-terminally truncated proteins. Altered splicing of mutated DMD mRNA transcripts, whether induced by exogenous polynucleotides or occurring spontaneously in rare revertant fibers, can skip mutated exons and restore the translational reading frame such that a partially functional protein is generated.
  • Tissue-resident satellite stem cells pick up epigenetic cues from revertant fibers and use them during differentiation to replicate the specific alternative splicing pattern established in the original revertant fiber.
  • This model may be consistent with scRNA fragments transmitting epigenetic regulation between cells. In some embodiments, this may facilitate tissue development and repair without the need for high efficiency delivery to a large number of cells.
  • The present invention uses the host's endogenous RFS mechanism to induce targeted alternative splicing (for example, by way of exon skipping) during the splicing of a pre-mRNA to form an mRNA.
  • RFS will generate a series of scRNA fragments that may direct the desired splice pattern in endogenous mRNA transcripts.
  • The resulting protein may be truncated, such that the truncated protein lacks one or more exteins of the wild-type form of the protein.
  • The invention provides a method for inducing alternative splicing of an mRNA encoding a protein by providing a cell with a polynucleotide including at least the following elements operably linked in a 5'-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to a first intron (INTR1) from the gene; and a third region corresponding to a second exon (EX2) from the gene.
  • The wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise the one or more intervening exons.
  • The compositions and methods described herein are used to induce alternative splicing in the form of exon skipping.
  • The polynucleotide may be delivered to a cell by inclusion in a vector, wherein the vector may be a viral vector.
  • The invention also provides a composition including a vector including a polynucleotide including at least the following elements operably linked in a 5'-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to a first intron (INTR1) from the gene; and a third region corresponding to a second exon (EX2) from the gene.
  • The wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise the one or more intervening exons.
  • The polynucleotide further includes a eukaryotic promoter, allowing for transcription of the corresponding mRNA including EX1-INTR1-EX2, along with any additional elements of the polynucleotide.
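  • The EX1-INTR1-EX2 arrangement can be sketched as a simple string assembly. This is an illustration under stated assumptions, not the disclosed construct: the function name is hypothetical, INTR1 is assumed here to be the intron immediately downstream of EX1 (the text only requires an intron from the gene), the promoter and vector backbone are omitted, and the sequences are toy placeholders.

```python
def build_skip_construct(exons, introns, ex1_idx, ex2_idx):
    """Assemble EX1 + INTR1 + EX2 (5' to 3'), omitting the intervening exons.

    exons   -- exon sequences in gene order
    introns -- introns[i] lies between exons[i] and exons[i + 1]
    Assumption of this sketch: INTR1 is the intron directly 3' of EX1.
    """
    if len(introns) != len(exons) - 1:
        raise ValueError("expected one intron between each pair of adjacent exons")
    if ex2_idx - ex1_idx < 2:
        raise ValueError("EX1 and EX2 must flank at least one intervening exon")
    return exons[ex1_idx] + introns[ex1_idx] + exons[ex2_idx]


# Toy placeholder sequences (not real gene sequences).
exons = ["ATGGCC", "GGTTAA", "CCCGGG", "TTTAAA"]
introns = ["GTAAGTAG", "GTCCCCAG", "GTTTTTAG"]
print(build_skip_construct(exons, introns, 0, 2))  # exon index 1 is left out
```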
  • The resulting mRNA may be used as a template by an RdRP to generate the corresponding cRNA.
  • The cRNA may be cleaved by the pioneer round of translation at uniform intervals of 9-30 nucleotides in the regions of the cRNA that are complementary to EX1 and EX2.
  • The presence of cRNA fragments spanning the EX1 and EX2 exonic regions of the resulting mRNA, wherein uniform spacing of the cRNAs denotes a continuous reading frame, may enable translation of the EX1-EX2 mRNA into protein.
  • It may be necessary to determine the length, in nucleotides, of the scRNAs generated by RFS of a given gene.
  • So that the scRNA fragments generated on the polynucleotide comprising EX1-INTR1-EX2 match those on the endogenous transcript, it may be necessary to adjust the length of the first exon in the polynucleotide comprising EX1-INTR1-EX2.
  • The scRNA positioning on the endogenous transcript is expected to be determined by regular spacing of cleavage beginning from where pioneer translation initiates.
  • The polynucleotide comprising EX1-INTR1-EX2 includes an ATG sequence at the 5' end of EX1, wherein the ATG may be responsible for the initiation of pioneer translation of the polynucleotide. Therefore, in some embodiments, the experimentalist may test initiation at multiple positions of EX1 to determine which model transcript matches the translation initiation of the endogenous mRNA, and thereby generates scRNAs on the polynucleotide that match the position and spacing of scRNAs on the endogenous mRNA.
  • This may be performed by generating a series of otherwise identical polynucleotides, each of which contains a different 5' truncation of EX1, such that the initiation of pioneer translation at the ATG of the polynucleotide occurs at a different position for each member of the series of polynucleotides.
  • This series of polynucleotides may be tested to determine which position for the initiation of pioneer translation enables the generation of scRNA fragments that direct the desired splicing pattern in the endogenous gene.
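  • A series of 5' truncations of EX1 might be enumerated as follows. This is a hypothetical sketch: it assumes the EX1 sequence is supplied without its own initiation codon, that an ATG is placed at the 5' end of each truncated variant, and that truncating by 0 to max_truncation nucleotides is a reasonable scan (for instance, up to one scRNA length minus one, so that every possible register is sampled). The sequence shown is a toy placeholder.

```python
def truncation_series(ex1, max_truncation, start_codon="ATG"):
    """Return (nucleotides_removed, variant) pairs for 5' truncations of EX1.

    Each variant carries the start codon at its 5' end so that pioneer
    translation would initiate at a different EX1 position in each member.
    """
    if max_truncation >= len(ex1):
        raise ValueError("truncation would remove the entire exon")
    return [(removed, start_codon + ex1[removed:])
            for removed in range(max_truncation + 1)]


# Toy EX1 sequence, supplied without its own ATG (an assumption of this sketch).
for removed, variant in truncation_series("GCCAAATTT", 2):
    print(removed, variant)
```

  • Each variant could then be combined with INTR1 and EX2 and tested for its effect on splicing of the endogenous transcript, as described above.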
  • scRNAs produced from the transcription and pioneer translation of the polynucleotide that includes EX1-INTR1-EX2 may dissociate from the processed EX1-EX2 mRNA and bind to an endogenous pre-mRNA including EX1 and EX2.
  • Where an endogenous gene and its corresponding pre-mRNA comprise EX1, EX2, and one or more intervening exons between EX1 and EX2, scRNA fragments bound to EX1 and EX2 may induce alternative splicing by the spliceosome, such that the resulting mRNA includes EX1 and EX2 linked directly in a 5'-to-3' direction, and the intervening one or more exons are not included in the mature mRNA.
  • scRNAs generated as a result of delivery of a polynucleotide including EX1-INTR1-EX2 to a cell may induce epigenetic regulation in the cell that alters the splice pattern of endogenous mRNA.
  • The epigenetic memory that results as a product of scRNAs may be transferable between cells.
  • Small mutations may be considered as mutations that occur within an isolated exon. Examples of small mutations include point mutations, single-nucleotide insertions, single-nucleotide deletions, and insertions or deletions that occur within an isolated exon.
  • The endogenous gene targeted for alternative splicing includes one or more small mutations, as described above.
  • The endogenous gene targeted for alternative splicing includes one or more small mutations in an in-frame exon.
  • Delivery of a polynucleotide that induces skipping of the mutated in-frame exon by using the host's endogenous RFS mechanism may restore the wild-type reading frame of the downstream exons.
  • Alternative splicing is induced in the form of exon skipping.
  • Small mutations that occur in out-of-frame exons require that an additional one or more exons are also skipped in order to restore and maintain the reading frame.
  • The total number of nucleotides in the wild-type form of all the exons that are skipped must equal a multiple of three.
  • The endogenous gene targeted for alternative splicing includes one or more small mutations in an out-of-frame exon.
  • Delivery of a polynucleotide induces skipping of the mutated out-of-frame exon and one or more additional exons by using the host's endogenous RFS mechanism, such that the downstream reading frame is restored to wild type by alternative splicing.
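  • The multiple-of-three rule lends itself to a small search. The sketch below lists contiguous runs of internal exons that contain the mutated exon and whose wild-type lengths sum to a multiple of three, smallest run first. It assumes that only contiguous exons are skipped, that the first and last exons are retained (they flank the skipped region as EX1 and EX2), and that at most max_extra additional exons are considered; the function name and exon lengths are hypothetical.

```python
def frame_preserving_skips(exon_lengths, mutated_idx, max_extra=3):
    """List (first_skipped, last_skipped, total_nt) runs that keep the frame.

    Only contiguous runs of internal exons that include mutated_idx are
    considered; a run qualifies when its total wild-type length % 3 == 0.
    """
    n = len(exon_lengths)
    first_start = max(1, mutated_idx - max_extra)      # keep the first exon
    last_end = min(n - 2, mutated_idx + max_extra)     # keep the last exon
    hits = []
    for start in range(first_start, mutated_idx + 1):
        for end in range(mutated_idx, last_end + 1):
            if end - start > max_extra:
                continue
            total = sum(exon_lengths[start:end + 1])
            if total % 3 == 0:
                hits.append((start, end, total))
    # Prefer skipping as few exons, and as few nucleotides, as possible.
    return sorted(hits, key=lambda hit: (hit[1] - hit[0], hit[2]))


# Hypothetical exon lengths; exon index 1 (100 nt) is out of frame on its own.
print(frame_preserving_skips([120, 100, 80, 150, 90], mutated_idx=1))
# An in-frame exon (99 nt) can be skipped alone.
print(frame_preserving_skips([120, 99, 80, 150, 90], mutated_idx=1))
```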
  • Large mutations may be considered as either a deletion that removes a region of a gene wherein the region includes one or more entire exons, or a duplication that multiplies a region of a gene including one or more entire exons.
  • The reading frame will not be disrupted. This will result in an internally truncated protein which may be fully or partially functional.
  • Alternative splicing may be employed to remove duplicated exons.
  • Alternative splicing of the duplicated exon does not incur a frameshift mutation.
  • Where the mutated gene includes a deletion of one or more out-of-frame exons, in some embodiments, one or more exons are skipped using RFS such that the total number of nucleotides in the deleted exon(s) and the skipped exon(s) is a multiple of three, thereby restoring the downstream reading frame.
  • Where the mutated gene includes a duplication of one or more out-of-frame exons, in some embodiments, one or more exons are skipped using RFS such that the net change in nucleotide count (the nucleotides added by the duplicated exon(s) less the nucleotides removed by the skipped exon(s)) is a multiple of three, thereby restoring the downstream reading frame.
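  • For large mutations, the frame check can be written in terms of the net change in transcript length. The sketch below treats duplicated exons as adding nucleotides and deleted or skipped exons as removing them, and reports whether the net change is a multiple of three. For deletions this is equivalent to requiring that deleted plus skipped nucleotides sum to a multiple of three; treating duplications as offsetting skipped nucleotides is an assumption of this sketch. The function name and exon lengths are hypothetical.

```python
def restores_frame(deleted=(), duplicated=(), skipped=()):
    """True if the net length change leaves the downstream reading frame intact.

    deleted, duplicated, skipped -- iterables of exon lengths in nucleotides.
    Net change = nucleotides gained by duplication
                 - nucleotides lost by deletion and by exon skipping.
    """
    net_change = sum(duplicated) - sum(deleted) - sum(skipped)
    return net_change % 3 == 0


# Hypothetical cases:
print(restores_frame(deleted=[100]))                    # frameshift remains
print(restores_frame(deleted=[100], skipped=[80]))      # 180 nt removed in total
print(restores_frame(duplicated=[100], skipped=[100]))  # duplicate copy skipped
```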
  • Alternative splicing is induced in the form of exon skipping.
  • Reading Frame Surveillance may be used to restore complete or partial function to a protein, wherein the gene encoding the protein contains a mutation relative to the wild-type form of the gene.
  • A host may harbor a deleterious genetic mutation in an exon, wherein targeted skipping of the exon may allow for complete or partial restoration of the function of the protein.
  • The mutation is a point mutation, insertion, or deletion that results in either a premature termination codon or a frameshift. In these embodiments, induced alternative splicing of the exon containing the mutation could restore the reading frame and thereby restore the partial or complete function of the protein.
  • A host may harbor a deleterious genetic mutation in which a region including one or more exons, or a portion of one or more exons, has been duplicated or deleted, resulting in a frameshift.
  • Targeted alternative splicing of one or more additional exons, such that the removal of the exon from the mRNA restores the downstream reading frame, may allow for complete or partial restoration of the function of the protein.
  • The mutation is a deletion or duplication of a region of a gene including an entire exon, wherein the mutation results in a frameshift.
  • The mutation is a deletion or duplication of a region of a gene including a portion of an exon, wherein the mutation results in a frameshift.
  • Induced alternative splicing of one or more exons, such that alternative splicing results in restoration of the downstream reading frame, may restore partial or complete function of the protein.
  • Alternative splicing may be used in the treatment of Duchenne muscular dystrophy (DMD).
  • DMD arises from frameshifting or nonsense mutations in the DMD gene that codes for the dystrophin protein.
  • Alternative splicing induced by the compositions and methods described herein may restore the downstream reading frame in the mutated DMD gene, thus restoring partial function of the protein.
  • Alternative splicing may be used in the treatment of dystrophic epidermolysis bullosa.
  • Frameshifting mutations in the COL7A1 gene that encodes type VII collagen are associated with dystrophic epidermolysis bullosa.
  • In-frame exon skipping that results in restoration of the downstream reading frame has been shown, in some cases, to produce an internally-truncated protein with near-normal function.
  • Alternative splicing may be used to disrupt the reading frame in order to include a premature termination codon in the mRNA transcript and produce a C-terminally truncated protein.
  • Alternative splicing by RFS may be used to disrupt the downstream reading frame in order to include a premature termination codon in the resulting mRNA transcript.
  • Alternative splicing to disrupt the reading frame would require skipping of one or more exons, wherein the total number of nucleotides in the one or more exons being skipped does not add up to a multiple of three.
  • Design of the alternative splicing construct should consider the downstream sequence such that the alternative splicing results in a premature termination codon at the desired location.
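  • When designing a frame-disrupting skip, it can be useful to check where the first in-frame termination codon would fall in the spliced transcript. The sketch below joins the retained exons, reads codons from the first nucleotide, and returns the position of the first stop codon. It assumes the supplied exon sequences contain coding sequence only, starting at the initiation codon; the function name and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_after_skipping(exons, skipped):
    """0-based position of the first in-frame stop codon in the spliced
    transcript, or None if no in-frame stop codon is present.

    exons   -- coding exon sequences in order (frame starts at position 0)
    skipped -- set of exon indices excluded from the spliced transcript
    """
    mrna = "".join(seq for idx, seq in enumerate(exons) if idx not in skipped)
    for pos in range(0, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


# Toy exons: the middle exon is 4 nt long, so skipping it shifts the frame.
exons = ["ATGGCT", "GCAG", "AACTGACTCGTTAA"]
print(first_stop_after_skipping(exons, set()))  # normal stop at the 3' end
print(first_stop_after_skipping(exons, {1}))    # premature stop after the skip
```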
  • Up to 95% of multi-exon genes undergo alternative splicing, wherein multiple protein isoforms may be produced from a single gene by the selective inclusion or exclusion of exons from the processed mRNA.
  • Alternative splicing patterns can determine which isoform of a protein is expressed. This process may be regulated in a developmental and tissue-specific manner.
  • Alternative splicing may be used to regulate the relative expression of different isoforms of the human tau protein.
  • Tau protein is encoded by the microtubule-associated protein tau (MAPT) gene and contributes to microtubule stabilization.
  • The relative amount of tau protein that includes exon 10 (E10+) to tau protein that excludes exon 10 (E10-) is regulated in development.
  • The human embryonic brain expresses only the E10- isoform, whereas the human adult brain expresses both E10- and E10+ at equal levels.
  • Tau-related pathologies, such as frontotemporal dementia with Parkinsonism, are associated with an increase in the ratio of E10+:E10-, leading to tau deposition in the brain, and are implicated in movement disorders and memory problems.
  • Skipping of exon 10 of the MAPT gene may restore the balance of E10+:E10-.
  • alternative splicing that regulates alternative splicing may be used in the treatment of cancer.
  • the Wilms' tumor gene, WT1, is often overexpressed in leukemia and solid tumors.
  • the gene product of WT1 may interfere with normal signaling, leading to increased proliferation and inhibition of differentiation and apoptosis.
  • Some leukemic cells express high levels of WT1 mRNA containing the alternatively spliced exon 5. Skipping of exon 5 has been shown to decrease cell viability and survival in leukemic cell cultures. In some embodiments, skipping of exon 5 of the WT1 gene may be therapeutically relevant for the treatment of leukemia.
  • alternative splicing to induce isoform switching in the gene product of the folate hydrolase gene may be used in the treatment of prostate cancer.
  • One isoform of the prostate-specific membrane antigen encoded by the folate hydrolase gene is expressed at a level 140-fold higher in malignant prostate tissues.
  • the overexpressed isoform has an extracellular enzymatic domain that regulates folate uptake.
  • alternative splicing may be used to express a protein isoform with decreased enzymatic activity for the treatment of prostate cancer.
  • alternative splicing may be applied as an alternative way to knock down the expression of a gene.
  • Apolipoprotein B
  • the full-length ApoB100 protein is a ligand for the LDL receptor and is required for the assembly of VLDL, IDL, and LDL particles.
  • ApoB100 plays a central role in atherosclerosis.
  • the other isoform, ApoB48, is essential for chylomicron assembly and intestinal fat transport. ApoB48 arises from tissue-specific RNA editing that results in a premature termination codon being present in exon 26.
  • ApoB100 for the treatment of atherosclerosis without affecting the expression of ApoB48.
  • alternative splicing may be used to selectively knock down the expression of ApoB100 by skipping exon 27 and inducing a frameshift mutation. Because ApoB48 is truncated at exon 26, skipping of exon 27 would not interfere with its expression.
  • alternative splicing that induces a frameshift may be used to knock down the activity of an over-expressed or constitutively active protein, wherein the increased activity of the protein leads to a pathogenic state.
  • fibrodysplasia ossificans progressiva FOP
  • a mutation of the receptor encoded by the ALK2 gene leads to constitutive activation of the receptor and downstream signaling pathways.
  • Alternative splicing that disrupts the reading frame of the receptor encoded by ALK2, and thereby results in a non-functional protein has been proposed as a possible method of treating FOP.
  • Mutations can introduce new splice sites associated with a pseudo-exon, the inclusion of which may disturb protein production.
  • alternative splicing may be used to remove an aberrant pseudo-exon and restore production of a functional protein.
  • One such application is in the treatment of autosomal recessive congenital myopathy, which is due to a mutation in the type 1 ryanodine receptor (RyR1) encoded by the RYR1 gene.
  • RyR1 ryanodine receptor
  • inclusion of a small pseudo-exon in RYR1 resulting in a frameshift was observed.
  • Alternative splicing of this pseudo-exon may restore the reading frame of RYR1 and therefore restore production of the functional RyR1 proteins.
  • a polynucleotide such as codon-optimized DNA or RNA into a mammalian cell
  • electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest.
  • Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids. Electroporation of mammalian cells is described in detail, e.g., in Chu et al. Nucleic Acids Research 15:1311 (1987), the disclosure of which is incorporated herein by reference.
  • Nucleofection™ utilizes an applied electric field in order to stimulate the uptake of exogenous polynucleotides into the nucleus of a eukaryotic cell.
  • Nucleofection™ and protocols useful for performing this technique are described in detail, e.g., in Distler et al. Experimental Dermatology 14:315 (2005), as well as in US 2010/0317114, the disclosures of each of which are incorporated herein by reference.
  • Additional techniques useful for the transfection of target cells include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al. Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
  • Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference.
  • Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex.
  • exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and
  • Another useful tool for inducing the uptake of exogenous nucleic acids by target cells is laserfection, a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al. Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference. Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein.
  • microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence.
  • the use of such vesicles, also referred to as Gesicles, for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al. Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract]. In: Methylation changes in early embryonic genes in cancer [abstract], in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13,
  • Transposons are polynucleotides that encode transposase enzymes and contain a polynucleotide sequence or gene of interest flanked by 5' and 3' excision sites. Once a transposon has been delivered into a cell, expression of the transposase gene commences and results in active enzymes that cleave the gene of interest from the transposon. This activity is mediated by the site-specific recognition of transposon excision sites by the transposase.
  • these excision sites may be terminal repeats or inverted terminal repeats.
  • the gene of interest can be integrated into the genome of a mammalian cell by transposase-catalyzed cleavage of similar excision sites that exist within the nuclear genome of the cell. This allows the gene of interest to be inserted into the cleaved nuclear DNA at the complementary excision sites, and subsequent covalent ligation of the phosphodiester bonds that join the gene of interest to the DNA of the mammalian cell genome completes the incorporation process.
  • the transposon may be a retrotransposon, such that the polynucleotide encoding the target gene is first transcribed to an RNA product and then reverse-transcribed to DNA before incorporation in the mammalian cell genome.
  • exemplary transposon systems include the piggybac transposon (described in detail in, e.g., WO 2010/085699) and the sleeping beauty transposon (described in detail in, e.g., US 2005/0112764), the disclosures of each of which are incorporated herein by reference as they pertain to transposons for use in gene delivery to a cell of interest.
  • CRISPR clustered regularly interspaced short palindromic repeats
  • Cas9 Cas9 nuclease
  • Polynucleotides containing these foreign sequences and the repeat-spacer elements of the CRISPR locus are in turn transcribed in a host cell to create a guide RNA, which can subsequently anneal to a target sequence and localize the Cas9 nuclease to this site.
  • highly site-specific Cas9-mediated DNA cleavage can be engendered in a foreign polynucleotide because the interaction that brings Cas9 within close proximity of the target DNA molecule is governed by RNA:DNA hybridization.
  • As a result, one can theoretically design a CRISPR/Cas system to cleave any target DNA molecule of interest.
  • ZFNs zinc finger nucleases
  • TALENs transcription activator-like effector nucleases
  • Additional genome editing techniques that can be used to incorporate polynucleotides encoding target genes into the genome of a target cell include the use of ARCUS™ meganucleases that can be rationally designed so as to site-specifically cleave genomic DNA.
  • the use of these enzymes for the incorporation of polynucleotides encoding target genes into the genome of a mammalian cell is advantageous in view of the defined structure-activity relationships that have been established for such enzymes.
  • Single chain meganucleases can be modified at certain amino acid positions in order to create nucleases that selectively cleave DNA at desired locations, enabling the site-specific incorporation of a target gene into the nuclear DNA of a hematopoietic stem cell.
  • Viral vectors for nucleic acid delivery provide a rich source of vectors that can be used for the efficient delivery of exogenous genes into the genome of a cell (e.g., a mammalian cell, such as a human cell). Viral genomes are particularly useful vectors for gene delivery because the polynucleotides contained within such genomes are typically incorporated into the genome of a target cell by generalized or specialized transduction. These processes occur as part of the natural viral replication cycle, and do not require added proteins or reagents in order to induce gene integration.
  • Examples of viral vectors include AAV, retrovirus, adenovirus (e.g., Ad5, Ad26, Ad34, Ad35, and Ad48), parvovirus (e.g., adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g.
  • RNA viruses such as picornavirus and alphavirus
  • double stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox and canarypox).
  • Other viruses useful for delivering polynucleotides described herein include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example.
  • Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).
  • Other examples include murine leukemia viruses, murine sarcoma viruses, mouse mammary tumor virus, bovine leukemia virus, feline leukemia virus, feline sarcoma virus, avian leukemia virus, human T-cell leukemia virus, baboon endogenous virus, Gibbon ape leukemia virus, Mason Pfizer monkey virus, simian immunodeficiency virus, simian sarcoma virus, Rous sarcoma virus and lentiviruses.
  • vectors are described, for example, in US Patent No. 5,801,030, the disclosure of which is incorporated herein by reference as it pertains to viral vectors for use in gene therapy.
  • nucleic acids of the compositions and methods described herein are incorporated into rAAV vectors and/or virions in order to facilitate their introduction into a cell.
  • rAAV vectors useful in the invention are recombinant nucleic acid constructs that include (1) a heterologous sequence to be expressed (e.g., a polynucleotide encoding a GAA protein) and (2) viral sequences that facilitate integration and expression of the heterologous genes.
  • the viral sequences may include those sequences of AAV that are required in cis for replication and packaging (e.g., functional ITRs) of the DNA into a virion.
  • the heterologous gene encodes GAA, which is useful for correcting a GAA-deficiency in a cell.
  • rAAV vectors may also contain marker or reporter genes.
  • Useful rAAV vectors have one or more of the AAV WT genes deleted in whole or in part, but retain functional flanking ITR sequences.
  • the AAV ITRs may be of any serotype (e.g., derived from serotype 2) suitable for a particular application. Methods for using rAAV vectors are described, for example, in Tal et al. J. Biomed. Sci. 7:279-291 (2000), and Monahan and Samulski, Gene Delivery 7:24-30 (2000), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
  • the nucleic acids and vectors described herein can be incorporated into a rAAV virion in order to facilitate introduction of the nucleic acid or vector into a cell.
  • the capsid proteins of AAV compose the exterior, non-nucleic acid portion of the virion and are encoded by the AAV cap gene.
  • the cap gene encodes three viral coat proteins, VP1, VP2 and VP3, which are required for virion assembly.
  • the construction of rAAV virions has been described, for instance, in US Patent Nos. 5,173,414; 5,139,941; 5,863,541; 5,869,305; 6,057,152; and 6,376,237; as well as in Rabinowitz et al. J. Virol. 76:791-801 (2002) and Bowles et al. J. Virol. 77:423-432 (2003), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
  • rAAV virions useful in conjunction with the compositions and methods described herein include those derived from a variety of AAV serotypes including AAV 1, 2, 3, 4, 5, 6, 7, 8 and 9.
  • rAAV virions that include at least one serotype 1 capsid protein may be particularly useful.
  • rAAV virions that include at least one serotype 6 capsid protein may also be particularly useful, as serotype 6 capsid proteins are structurally similar to serotype 1 capsid proteins, and thus are expected to also result in high expression of GAA in muscle cells.
  • rAAV serotype 9 has also been found to be an efficient transducer of muscle cells.
  • AAV vectors and AAV proteins of different serotypes are described, for instance, in Chao et al. Mol. Ther. 2:619-623 (2000); Davidson et al. Proc. Natl. Acad. Sci. USA 97:3428-3432 (2000); Xiao et al. J. Virol. 72:2224-2232 (1998); Halbert et al. J.
  • Pseudotyped vectors include AAV vectors of a given serotype (e.g., AAV9) pseudotyped with a capsid gene derived from a serotype other than the given serotype (e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, etc.).
  • a representative pseudotyped vector is an AAV8 or AAV9 vector encoding a therapeutic protein pseudotyped with a capsid gene derived from AAV serotype 2.
  • AAV virions that have mutations within the virion capsid may be used to infect particular cell types more effectively than non-mutated capsid virions.
  • suitable AAV mutants may have ligand insertion mutations for the facilitation of targeting AAV to specific cell types.
  • the construction and characterization of AAV capsid mutants including insertion mutants, alanine screening mutants, and epitope tag mutants is described in Wu et al., J. Virol. 74:8635-45 (2000).
  • Other rAAV virions that can be used in conjunction with the compositions and methods described herein include those capsid hybrids that are generated by molecular breeding of viruses as well as by exon shuffling. See, e.g., Soong et al., Nat. Genet., 25:436-439 (2000) and Kolman and Stemmer, Nat. Biotechnol. 19:423-428 (2001).
  • This example demonstrates how the reading frame surveillance model can be applied to design a codon-optimized sequence of the GAA gene for enhanced expression in a target tissue within a human patient suffering from Pompe disease.
  • the principle of reading frame surveillance is based on the hypothesis that short complementary RNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated.
  • the model indicates that these complementary RNA fragments function in concert to promote degradation of endogenous RNA that imperfectly aligns with the ribosome.
  • RNA molecules are capable of persisting within the cytoplasm and may anneal, for instance, to RNA molecules encoding other proteins due to chance complementarity
  • a gene encoding a protein intended for expression in a target cell can be codon-optimized by incorporating codon replacements into the gene that minimize the sequence identity between RNA resulting from transcription of the gene of interest and the endogenous RNA molecules present within the cell.
  • RNA transcripts are generated from transcription of genes that are actively expressed within the target cell.
  • gene expression techniques described herein such as qPCR, RNA-Seq, and immunoblot assays to assess which genes are expressed in a particular target cell so as to ascertain the panel of RNA molecules against which the sequence identity of the RNA transcript of the gene of interest can be compared. Since the coding strand of the DNA gene corresponds directly to the ensuing mRNA template, codon substitutions can then be incorporated into the coding strand of the gene that minimize the sequence identity of the coding strand (and thus, the subsequent mRNA transcript) with endogenous RNA transcripts present in the cell.
  • a gene is not expressed above a threshold level in the target cell under investigation (e.g., if a gene's expression level is not among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell), the sequence identity of the target gene need not be minimized relative to the unexpressed gene.
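The expression threshold described above can be illustrated with a short filter that keeps only the genes ranked in the top N% by expression. This is a sketch under stated assumptions: the function name, the gene symbols, and the expression values are hypothetical placeholders, and the source does not prescribe a particular unit or ranking method.

```python
import math


def highly_expressed(expression, top_percent):
    """Return the gene names ranked within the top `top_percent` percent
    of expression values, highest first."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[:n_keep]


# Hypothetical expression values (arbitrary units) for a target cell.
expression = {"ACTB": 900.0, "GAPDH": 750.0, "MYH7": 300.0, "DES": 120.0, "ALB": 0.5}
reference_panel = highly_expressed(expression, top_percent=40)
```

Only the genes returned by the filter would then be used as the panel against which sequence identity is compared.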
  • a wild-type GAA gene sequence, excluding intronic DNA, is as follows:
  • the wild-type GAA amino acid sequence is as follows:
  • Analysis of SEQ ID NO: 1 reveals specific codon preferences for various amino acids throughout the gene. A summary of the codons used in the wild-type GAA gene to encode the amino acids of this protein is listed in Table 4, below.
  • Inspection of the codon frequencies outlined in Table 4 reveals that for certain amino acids, such as isoleucine, a particular codon is predominantly used (e.g., about 93% of the isoleucine residues in wild-type GAA are encoded by ATC) while other codons are used less frequently (e.g., about 7% of the isoleucine residues in wild-type GAA are encoded by ATT) or not at all (e.g., none of the isoleucine residues in wild-type GAA are encoded by ATA).
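A codon usage summary of the kind described above can be computed directly from a coding sequence. The sketch below assumes the standard genetic code and uses a short hypothetical sequence rather than SEQ ID NO: 1; the function name is illustrative only.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds):
    """Return {amino_acid: {codon: fraction}} for an in-frame coding sequence."""
    counts = defaultdict(Counter)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        counts[CODON_TABLE[codon]][codon] += 1
    return {
        aa: {codon: n / sum(ctr.values()) for codon, n in ctr.items()}
        for aa, ctr in counts.items()
    }


# Hypothetical toy sequence: Met, Ile x4, stop.
usage = codon_usage("ATGATCATCATCATTTAA")
```

In the toy sequence, three of four isoleucine codons are ATC and one is ATT, and ATA does not appear at all, which mirrors the kind of skew the text describes.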
  • one of skill in the art can use gene expression techniques known in the art to determine a gene expression profile for a target cell and subsequently align the coding strand of the wild-type GAA gene with the coding strands of genes determined to be expressed in the target cell.
  • a high level e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%
  • This process will reduce the sequence complementarity between the GAA mRNA transcript and the endogenous scRNA fragments present within the cell following the translation of endogenous mRNA molecules, and will therefore diminish the degradation of the target mRNA transcript due to improper alignment of the endogenous scRNA molecules with the GAA mRNA template.
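One way to picture the substitution process described above is a greedy pass over the codons that picks, at each position, the synonymous codon sharing the fewest short subsequences with the expressed transcripts. This is only a sketch: scoring identity by shared k-mers, the value of k, the greedy strategy, and all names and sequences are assumptions made for illustration, not the procedure of the disclosure.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS[_aa].append(_codon)


def translate(cds):
    """Translate an in-frame coding sequence (length a multiple of 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def kmer_set(seqs, k):
    """All k-mers present in a collection of coding-strand sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def shared_kmers(seq, background, k):
    """Number of k-mer positions in `seq` that also occur in `background`."""
    return sum(seq[i:i + k] in background for i in range(len(seq) - k + 1))


def minimize_identity(cds, endogenous, k=6):
    """Greedy synonymous substitution that never increases the shared k-mer count."""
    background = kmer_set(endogenous, k)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for idx, codon in enumerate(codons):
        best, best_score = codon, shared_kmers("".join(codons), background, k)
        for alt in SYNONYMS[CODON_TABLE[codon]]:
            trial = "".join(codons[:idx] + [alt] + codons[idx + 1:])
            score = shared_kmers(trial, background, k)
            if score < best_score:
                best, best_score = alt, score
        codons[idx] = best
    return "".join(codons)


# Hypothetical gene of interest and one hypothetical endogenous transcript.
cds = "ATGCTGGCCATCTAA"
endogenous = ["GGCTGGCCATCG"]
optimized = minimize_identity(cds, endogenous, k=6)
```

The encoded protein is unchanged because only synonymous codons are considered at each position.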
  • one of skill in the art can design variants of the GAA gene that contain increased GC content, reduced CpG content, and/or reduced homopolymer content so as to enhance the translation of the GAA protein. For instance, after enhancing the GAA-encoding gene sequence by incorporating codon substitutions that minimize the sequence identity of the coding strand of the wild-type GAA gene to the coding strands of genes expressed within the target cell, one of skill in the art can subsequently modify the designed coding sequence so as to increase the GC content of the coding sequence while preserving the identity of the encoded amino acid sequence (SEQ ID NO: 2).
  • the increase in GC content will lead to enhanced binding of the GAA mRNA transcript with GAA-specific scRNA fragments, which promotes improved licensing of the mRNA transcript for nuclear export and enhanced translation due to fewer instances of improper alignment of the ribosome to the GAA-encoding mRNA template.
  • one of skill in the art can further manipulate the GAA-encoding gene sequence by incorporating codon substitutions that diminish the CpG content and/or homopolymer content of the designed GAA gene.
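The three sequence properties discussed above (GC content, CpG content, and homopolymer content) are straightforward to measure, which makes it easy to compare candidate designs. The following sketch uses hypothetical function names and a toy sequence; it measures the properties only and does not itself modify the sequence.

```python
from itertools import groupby


def gc_fraction(seq):
    """Fraction of nucleotides that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides (C immediately followed by G)."""
    return seq.count("CG")


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated nucleotide."""
    return max(len(list(run)) for _, run in groupby(seq))


# Hypothetical toy sequence containing a GGGGGG run.
candidate = "ATGCGGGGGGCCGTAA"
metrics = (gc_fraction(candidate), cpg_count(candidate), longest_homopolymer(candidate))
```

Computing the same three numbers before and after each round of codon substitution gives a simple way to confirm that a change moved the design in the intended direction.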
  • homopolymer GGGGGG SEQ ID NO: 4
  • Homopolymers can be a site of frameshift mutations in the formation of an mRNA transcript and/or during the translation process.
  • this homopolymer sequence remains in the codon-optimized GAA gene even after minimizing the sequence identity of the gene relative to endogenously expressed genes in the target cell, one of skill in the art can incorporate further mutations that interrupt this homopolymer while preserving the identity of the encoded protein (SEQ ID NO: 2).
  • the homopolymer encodes amino acid residues that are not essential for protein function (for instance, if the encoded amino acids are not present within the active site of the GAA enzyme)
  • codon substitutions that interrupt the homopolymer and that introduce a conservative substitution into the encoded protein at the site of the corresponding amino acid.
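Interrupting a homopolymer while preserving the encoded protein can be sketched as a search over synonymous codons. The example below is illustrative only: the `max_run` cutoff, the greedy strategy, and all names are assumptions, and it covers only synonymous changes, not the conservative amino acid substitutions also mentioned above.

```python
from collections import defaultdict
from itertools import groupby, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS[_aa].append(_codon)


def translate(cds):
    """Translate an in-frame coding sequence (length a multiple of 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def longest_run(seq):
    """Length of the longest single-nucleotide run."""
    return max(len(list(run)) for _, run in groupby(seq))


def excess(seq, max_run):
    """Total number of nucleotides by which runs exceed `max_run`."""
    return sum(max(0, len(list(run)) - max_run) for _, run in groupby(seq))


def break_homopolymers(cds, max_run=5):
    """Greedy pass: swap in synonymous codons that shorten over-long runs."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for idx, codon in enumerate(codons):
        best_score = excess("".join(codons), max_run)
        if best_score == 0:
            break
        for alt in SYNONYMS[CODON_TABLE[codon]]:
            trial = "".join(codons[:idx] + [alt] + codons[idx + 1:])
            score = excess(trial, max_run)
            if score < best_score:
                codons[idx], best_score = alt, score
    return "".join(codons)


# Hypothetical toy sequence: Met-Gly-Gly-stop with a run of seven G nucleotides.
original = "ATGGGGGGGTAA"
fixed = break_homopolymers(original, max_run=5)
```

A single greedy pass may leave some runs in place when no synonymous codon shortens them, which is the situation where the text contemplates a conservative substitution instead.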
  • the final codon-optimized gene can be prepared, for instance, by solid phase nucleic acid synthesis procedures known in the art. Techniques for the solid phase synthesis of polynucleotides are known in the art and are described, for instance, in US Patent No. 5,541,307, the disclosure of which is incorporated herein by reference as it pertains to solid phase polynucleotide synthesis and purification. Additionally, the prepared gene can be amplified, for instance, using PCR-based techniques described herein or known in the art, and/or by transformation of DH5α E. coli with a plasmid containing the designed gene.
  • the bacteria can subsequently be cultured so as to amplify the DNA therein, and the gene can be isolated by plasmid purification techniques known in the art, followed optionally by a restriction digest and/or sequencing of the plasmid to verify the identity of the codon-optimized gene.
  • a gene encoding a therapeutic protein, such as GAA, can be codon-optimized using the procedures described herein (e.g., as described in Example 1, above).
  • the gene can subsequently be incorporated into a vector, such as a viral vector, and administered to a patient suffering from a disease associated with a deficiency in the gene.
  • a patient suffering from Pompe disease a lysosomal storage disorder characterized by a deficiency in GAA
  • a viral vector containing a codon-optimized GAA gene under the control of a suitable promoter for expression in a human cell, such as a human muscle cell.
  • an AAV vector such as a pseudotyped AAV2/9 vector
  • the gene may be placed under control of a tissue-specific promoter, such as the desmin promoter.
  • the AAV vector can be administered to the subject systemically.
  • the gene may be codon-optimized for muscle-specific expression, for instance, by introducing mutations into the codon-optimized GAA gene that favor degradation of the ensuing mRNA transcript in cells of other tissues based on sequence complementarity to scRNA molecules present within those tissues. In some instances, it may be desirable to achieve liver-specific expression of the encoded GAA protein.
  • codon substitutions can be incorporated into the optimized GAA gene sequence as described in Example 1, above, that increase the sequence identity of the coding strand of the GAA gene with respect to genes expressed in non-liver cells while incorporating codon substitutions that diminish the sequence identity of the GAA gene relative to genes expressed in hepatocytes.
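The tissue-selective design described above amounts to a two-sided objective: low identity to transcripts of the intended tissue and high identity to transcripts of other tissues. A hedged sketch of how synonymous candidates might be ranked against both panels follows; the k-mer score, the function names, and the toy sequences are all hypothetical.

```python
def kmer_set(seqs, k):
    """All k-mers present in a collection of coding-strand sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def shared_kmers(seq, background, k):
    """Number of k-mer positions in `seq` that also occur in `background`."""
    return sum(seq[i:i + k] in background for i in range(len(seq) - k + 1))


def rank_for_liver(candidates, hepatocyte, non_liver, k=6):
    """Sort candidates so that low identity to hepatocyte transcripts and
    high identity to non-liver transcripts come first."""
    hep = kmer_set(hepatocyte, k)
    other = kmer_set(non_liver, k)
    return sorted(
        candidates,
        key=lambda s: shared_kmers(s, hep, k) - shared_kmers(s, other, k),
    )


# Two hypothetical synonymous candidates (both encode Met-Leu-Ala-stop)
# and hypothetical transcript fragments for each tissue panel.
candidates = ["ATGCTGGCCTAA", "ATGTTAGCATAA"]
ranked = rank_for_liver(candidates, hepatocyte=["CTGGCCTA"], non_liver=["TGTTAGCA"])
```

The first entry in the ranked list is the candidate predicted, under this simple score, to be most selective for the liver.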
  • a practitioner of skill in the art can monitor the expression of the codon-optimized GAA gene by a variety of methods. For instance, one of skill in the art can transfect cultured hepatocytes, such as HepG2 cells, with the codon-optimized gene in order to model the expression of the codon-optimized gene in the liver of a patient. Expression of the encoded protein can subsequently be monitored using, for example, an expression assay described herein, such as qPCR, RNA-Seq, ELISA, or an immunoblot procedure.
  • Based on the data obtained from the gene expression assay, further iterations of the codon optimization procedure can be performed, for instance, so as to increase GC content and/or to diminish CpG content and homopolymer content in the mRNA transcript.
  • Candidate gene sequences with optimal expression patterns in vitro can subsequently be prepared for incorporation into a suitable vector and administration to a mammalian subject, such as an animal model of Pompe disease, or a human patient.
  • Example 3 Design and Delivery of polynucleotides having +1 frameshift mutations to human cells to reduce target gene expression
  • complementary RNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated.
  • the model indicates that scRNA may disengage and bind to RNA molecules of the same sequence, thereby functioning as a form of epigenetic memory of previous translation events. scRNA binding may be reduced to limit translation of a wild type mRNA transcript, by the introduction of a competitive RNA molecule that can bind to and sequester free scRNA fragments.
  • a therapeutic polynucleotide having the sequence of a target gene (e.g., an aberrantly expressed and/or mutant gene associated with a disease) mRNA, may be designed to contain a +1 frameshift mutation.
  • the +1 frameshift mutation may be generated by the addition of five nucleotides or by the inclusion of a single nucleotide deletion with respect to the targeted gene sequence.
  • the polynucleotide may be introduced, e.g., by transfection, into cultured human cells expressing the target gene. For example, the polynucleotide may be introduced into a series of cell cultures at increasing concentrations. Target gene expression may then be monitored, e.g., by RT-PCR and Western blot analysis, for dose-dependent attenuation (e.g., reduction) as compared to untreated control cells.
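The statement in this example that a +1 frameshift can come from either a five-nucleotide addition or a single-nucleotide deletion follows from arithmetic modulo 3. The sketch below assumes the convention that "+1" means the downstream codons are read one nucleotide ahead of the original frame; the function names and the toy sequence are hypothetical.

```python
def reading_frame_shift(inserted=0, deleted=0):
    """Shift of the downstream reading frame (0, 1, or 2) relative to the
    original sequence, where 1 corresponds to a +1 frameshift."""
    return (deleted - inserted) % 3


def codons(seq, start=0):
    """Complete codons of `seq` read from position `start`."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def delete_nucleotide(seq, index):
    """Return `seq` with the nucleotide at `index` removed."""
    return seq[:index] + seq[index + 1:]


# Hypothetical toy target sequence.
original = "ATGGCTGAACTGACC"
mutant = delete_nucleotide(original, 3)

# Codons downstream of the deletion match the original read in the +1 frame.
downstream_mutant = codons(mutant)[1:]
plus_one_original = codons(original, start=4)
```

Both a one-nucleotide deletion and a five-nucleotide insertion give a shift of 1 under this convention, whereas a three-nucleotide insertion leaves the frame unchanged.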
  • Example 4 Design and Delivery of polynucleotides having +1 frameshift mutations to human cells to reduce target gene expression by a viral vector
  • a therapeutic polynucleotide having the sequence of a target gene (e.g., an aberrantly expressed and/or mutant gene associated with a disease) mRNA, may be designed to contain a +1 frameshift mutation.
  • the polynucleotide may be incorporated into a viral vector, such as an adeno-associated virus (AAV), for delivery into human cells.
  • AAV adeno-associated virus
  • the AAV encoding the polynucleotide may be introduced into cultured human cells expressing the target gene.
  • the AAV encoding the polynucleotide may be introduced into a series of cell cultures at increasing concentrations.
  • Target gene expression may then be monitored, e.g., by RT-PCR and Western blot analysis, for dose-dependent attenuation (e.g., reduction) as compared to untreated control cells.
  • Example 5 Design and Delivery of polynucleotides having +1 frameshift mutations to human cardiac cells to reduce mutant RYR2 expression
  • CPVT Catecholaminergic polymorphic ventricular tachycardia
  • CPVT can result from the mutation of the gene encoding ryanodine receptor 2 (RYR2) (Human RYR2 mRNA sequence: NCBI Reference Sequence: NM_001035.2).
  • RYR2 ryanodine receptor 2
  • Attenuation of mutant RYR2 expression may be an effective treatment for CPVT.
  • This example demonstrates how the reading frame surveillance model can be applied to design a polynucleotide having a +1 frameshift mutation to promote attenuation of target gene expression in mammalian cells (e.g., human cells).
  • a therapeutic polynucleotide having the sequence of a CPVT-associated mutant RYR2 mRNA, may be designed to contain a +1 frameshift mutation.
  • the polynucleotide may be introduced, e.g., by transfection, into cultured human cardiac cells expressing the mutant RYR2 gene.
  • the polynucleotide may be introduced into a series of cell cultures at increasing concentrations.
  • RYR2 gene expression may then be monitored, e.g., by RT-PCR and
  • This example demonstrates how the reading frame surveillance model can be applied to design a mRNA-containing duplex for promoting GAA gene expression in a target tissue within a human patient suffering from Pompe disease.
  • the principle of reading frame surveillance is based on the hypothesis that short complementary RNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated.
  • the model indicates that these complementary RNA fragments function in concert to support the high-fidelity translation of functional protein product.
  • a synthetic duplex can be prepared ex vivo and delivered to a patient, such as a patient suffering from a disease associated with a deficiency in a particular protein, so as to induce or augment the expression of the protein in the subject.
  • a wild-type GAA gene sequence, excluding intronic DNA, is as follows:
  • RNA molecule encoding wild type GAA.
  • One of skill in the art can design a series of short (e.g., 9-30 nucleotide) complementary nucleic acids that will anneal to the GAA-encoding mRNA.
  • the short complementary nucleic acids may include DNA strands, RNA strands, or both, and may include one or more modified nucleic acids.
  • modified nucleotides include modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate.
  • modified nucleotides include modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate.
  • Modified uridine nucleotides can be incorporated into the mRNA-containing duplexes described herein, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate.
  • modified nucleotides include modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
  • the complementary nucleic acids can span, for example, the entire length of the mRNA strand, such as from the start codon to and including the stop codon.
  • the mRNA duplex may have one or more single stranded gaps, such as a single stranded region that is a multiple of 3 nucleotides in length (e.g.
  • the mRNA-containing duplex can be prepared, for instance, by solid phase nucleic acid procedures described herein or known in the art. Techniques for the solid phase synthesis of polynucleotides are known in the art and are described, for instance, in US Patent No. 5,541,307, the disclosure of which is incorporated herein by reference as it pertains to solid phase polynucleotide synthesis and purification.
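The duplex design described above (short complementary strands of 9-30 nucleotides, optionally separated by single-stranded gaps whose length is a multiple of 3) can be sketched as a simple tiling routine. This is a minimal illustration, not the procedure of the source: the function names, the default oligo and gap lengths, and the toy mRNA are hypothetical, and only unmodified RNA bases are handled.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'-to-3')."""
    return rna.translate(COMPLEMENT)[::-1]


def tile_oligos(mrna: str, oligo_len: int = 12, gap_len: int = 3) -> list[tuple[int, str]]:
    """Tile an mRNA with complementary oligos, leaving single-stranded gaps.

    Returns (start_index, oligo) pairs, where start_index is the 0-based
    position on the mRNA at which the oligo's target region begins.
    """
    if not 9 <= oligo_len <= 30:
        raise ValueError("oligo_len must be within the 9-30 nucleotide range")
    if gap_len < 0 or gap_len % 3 != 0:
        raise ValueError("gap_len must be a non-negative multiple of 3")
    oligos = []
    start = 0
    while start + oligo_len <= len(mrna):
        target = mrna[start:start + oligo_len]
        oligos.append((start, reverse_complement(target)))
        start += oligo_len + gap_len
    return oligos


# Hypothetical 30-nucleotide toy transcript, not the GAA mRNA.
toy_mrna = "AUGGCUGAAUUCGGAUACCGUAAGCUGUAA"
oligos = tile_oligos(toy_mrna)
for start, oligo in oligos:
    print(start, oligo)
```

Setting `gap_len=0` yields a contiguous tiling, which corresponds to complementary strands spanning the covered region without gaps.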
  • a mRNA duplex as prepared according to Example 6 can be incorporated into a variety of delivery vehicles, such as a liposome, cationic polymer, nanoparticle, vesicle, or exosome, and administered to a patient suffering from a disease associated with a deficiency in the gene.
  • a patient suffering from Pompe disease, a lysosomal storage disorder characterized by a deficiency in GAA
  • a liposome containing the mRNA duplex can be administered to the subject systemically.
  • the liposome can be administered in a tissue-specific fashion using administration techniques known in the art.
  • a practitioner of skill in the art can monitor the expression of the GAA gene by a variety of methods. For instance, expression of the encoded protein can be monitored using, for example, an expression assay described herein, such as qPCR, RNA-Seq, ELISA, or an immunoblot procedure. Based on the data obtained from the gene expression assay, the subject may be re-dosed with the mRNA duplex, for instance, so as to up-titrate or down-titrate the quantity of mRNA duplex administered to the subject.
  • Example 8 Induced alternative splicing by Reading Frame Surveillance of DMD exon 51 to restore the downstream reading frame in a mutant DMD gene
  • Duchenne muscular dystrophy is a severe, progressive, neuromuscular disorder caused by mutations in the DMD gene that result in C-terminally truncated, non-functional dystrophin proteins.
  • the mutations that give rise to DMD are heterogeneous in nature and typically include at least one of the following: a point mutation that produces a premature termination codon, a duplication, or a deletion.
  • Alternative splicing, for instance, by way of exon skipping, is a promising therapeutic approach for the treatment of DMD since internally truncated dystrophins can be partly functional. This is exemplified by the less severe phenotype of Becker muscular dystrophy (BMD) patients, who carry mutations in the same gene, wherein the mutations do not alter the reading frame of protein translation.
  • Alternative splicing using RFS requires knowledge of the genetic basis of the pathological state.
  • For a heterogeneous genetic disease such as DMD, it is necessary to determine the specific genetic variant that gives rise to the disease in the individual to be treated. Therefore, the first step in the design of a polynucleotide for targeted alternative splicing by RFS is to determine, by genetic sequencing, the identity of the genetic mutation that leads to the C-terminal truncation of dystrophin.
  • the wild type DMD gene consists of 79 exons and nearly all patients have unique mutations.
  • the alternative splicing approach is mutation specific because different mutations require skipping different exons to restore protein function.
  • two-thirds of all patients carry a deletion of one or more exons, of which 70% cluster between exons 45 and 55.
  • skipping certain exons is applicable to treatment of large groups of patients.
  • This example describes the process for designing and administering a polynucleotide that directs targeted exon skipping of exon 51, wherein targeted exon skipping uses the Reading Frame
  • exon skipping using RFS may be achieved by introducing a polynucleotide into a cell harboring the Δ48-50 DMD gene, wherein the polynucleotide includes EX1-INTR1-EX2 operably linked in a 5'-to-3' direction, wherein EX1 corresponds to exon 47 of DMD and EX2 corresponds to exon 52 of DMD.
  • INTR1 corresponds to any intron that enables the splicing of exon 47 to exon 52, wherein INTR1 may be isolated from DMD, isolated from another gene, a synthetic intron, or an intron that contains regions isolated from a gene and also includes a synthetic region.
  • This polynucleotide may be operably linked to a constitutive promoter, such as the
  • the vector including the polynucleotide may be delivered to a cell harboring the Δ48-50 DMD gene, such as the well-characterized DMD myoblast model carrying the deletion of exons 48 through 50 (available at the Telethon Neuromuscular Biobank).
  • the DMD myoblasts Δ48-50 can be induced to differentiate and samples of DMD mRNA and dystrophin protein collected after 7 days. Skipping of exon 51 can be evaluated by, e.g., RT-PCR analysis of the DMD mRNA and/or western blot of the dystrophin protein. Production of a dystrophin protein including an internal truncation of Δ48-51 is expected, thereby restoring the downstream reading frame and partial function of the protein.
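The reading-frame logic of Example 8 can be checked by summing the lengths of the exons absent from the transcript: the downstream frame is preserved only when that total is a multiple of 3. The sketch below is illustrative; the exon lengths are assumed values that are not given in the source and should be verified against a current DMD annotation, and the function names are hypothetical.

```python
# Assumed exon lengths in nucleotides (not stated in the source; verify
# against a current DMD transcript annotation before relying on them).
DMD_EXON_LENGTHS = {48: 186, 49: 102, 50: 109, 51: 233}


def removed_length(exons: list[int], lengths: dict[int, int]) -> int:
    """Total number of nucleotides missing from the transcript."""
    return sum(lengths[exon] for exon in exons)


def preserves_frame(exons: list[int], lengths: dict[int, int]) -> bool:
    """True if removing these exons leaves the downstream reading frame intact."""
    return removed_length(exons, lengths) % 3 == 0


deletion = [48, 49, 50]            # genomic deletion in the myoblast model
deletion_plus_skip = [48, 49, 50, 51]  # deletion plus induced skipping of exon 51

print(removed_length(deletion, DMD_EXON_LENGTHS), preserves_frame(deletion, DMD_EXON_LENGTHS))
print(removed_length(deletion_plus_skip, DMD_EXON_LENGTHS), preserves_frame(deletion_plus_skip, DMD_EXON_LENGTHS))
```

Under these assumed lengths, the deletion alone disrupts the frame while the deletion combined with exon 51 skipping restores it, which matches the outcome the example expects.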
  • Example 9 Induced alternative splicing by Reading Frame Surveillance of APOB exon 27 for the selective knockdown of ApoB100 activity
  • Apolipoprotein B has emerged as a key target for therapeutic interventions that lower LDL cholesterol.
  • Two natural protein isoforms are generated by the APOB gene, ApoB100 and ApoB48.
  • the full-length ApoB100 isoform is expressed in the liver and is the principal structural apolipoprotein in LDL particles.
  • ApoB48 is generated by tissue-specific RNA editing of a single nucleotide that produces a premature termination codon in exon 26 of the APOB mRNA transcript.
  • ApoB48 is C-terminally truncated as compared to ApoB100.
  • ApoB48 is assembled into chylomicrons, whose function is to transport dietary fat and fat-soluble vitamins from the intestine.
  • a treatment aimed at reducing cholesterol by interfering with ApoB should be selective for ApoB100 and not interfere with the function of ApoB48.
  • This example describes the process for designing and administering a polynucleotide that directs targeted exon skipping of exon 27 of the APOB gene, wherein the targeted exon skipping uses the Reading Frame Surveillance mechanism of the host cell.
  • the polynucleotide may be therapeutically relevant for the treatment of high cholesterol and atherosclerosis by generating a truncated form of ApoB100.
  • the wild type form of APOB exon 27 includes 115 nucleotides and a genetic deletion of exon 27 results in a frameshift mutation that leads to inclusion of a premature stop codon, and production of the C-terminally truncated ApoB87. Since ApoB48 includes a tissue-specific stop codon in exon 26, exon skipping of exon 27 does not interfere with its function.
  • exon skipping using RFS may be achieved by introducing a polynucleotide into a cell harboring the APOB gene, wherein the polynucleotide includes EX1-INTR1-EX2 operably linked in a 5'-to-3' direction, wherein EX1 corresponds to exon 26 of APOB and EX2 corresponds to exon 28 of APOB.
  • INTR1 corresponds to any intron that enables the splicing of exon 26 to exon 28 (e.g., INTR1 may be isolated from APOB, isolated from another gene, a synthetic intron, or an intron that contains regions isolated from a gene and also includes a synthetic region).
  • This polynucleotide may be operably linked to a constitutive promoter, such as the
  • the vector including the polynucleotide may be delivered to a cell such as the human hepatocellular carcinoma cell line, HepG2. HepG2 cells express ApoB100 and have been shown to support exon skipping and production of the ApoB87 isoform.
  • the vector including the polynucleotide can be delivered to the cultured HepG2 cells as described in the detailed description of the invention, and samples of APOB mRNA and ApoB protein collected after 48 hours. Skipping of exon 27 can be evaluated, e.g., by RT-PCR analysis of the APOB mRNA and/or production of the ApoB87 isoform evaluated, e.g., by western blot of the ApoB protein.
  • the vector containing the polynucleotide may be delivered to a cell such as the human epithelial colorectal carcinoma cell line, Caco-2.
  • Caco-2 cells have been used as a model for lipoprotein synthesis in the intestine and express both the ApoB100 and ApoB48 isoforms, once cellular confluence has been achieved.
  • the vector can be delivered to the cultured Caco-2 cells as described in the detailed description of the invention, and samples of APOB mRNA and ApoB protein would be collected after 48 hours.
  • Skipping of exon 27 can be evaluated, e.g., by RT-PCR analysis of the APOB mRNA, and production of the ApoB87 isoform can be evaluated, e.g., by western blot of the ApoB protein. No effect on the protein expression of the ApoB48 isoform is expected, thereby enabling selective gene knockdown by exon skipping using RFS.
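The frameshift that Example 9 attributes to removal of APOB exon 27 follows from the exon length stated there. As a worked check, using only the 115-nucleotide figure given in the example:

```latex
\[
115 = 3 \times 38 + 1
\quad\Longrightarrow\quad
115 \bmod 3 = 1 \neq 0
\]
```

Because the skipped length is not a multiple of 3, the codons downstream of the exon 26 to exon 28 junction are read in a shifted frame, which is consistent with the premature stop codon and truncated isoform described in the example.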

Abstract

The invention provides compositions and methods for stimulating or suppressing the expression of a gene of interest, as well as for inducing alternative RNA splicing of a target gene. The compositions and methods described herein can be used, for instance, to produce genes and RNA equivalents optimized for expression in a particular cell type. The invention additionally features nucleic acid duplexes capable of promoting gene expression upon delivery into a cell, such as a mammalian cell described herein. The compositions and methods described herein can additionally be used to silence gene expression, as well as to stimulate alternative splicing of a pre-mRNA transcript, for instance, by way of exon skipping. The invention additionally provides methods of treating a variety of genetic disorders. Exemplary diseases that can be treated using genes optimized using the compositions and methods described herein include recessive genetic disorders, such as X-linked myotubular myopathy, Pompe disease, recessive catecholaminergic polymorphic ventricular tachycardia, and Crigler-Najjar syndrome, among others, as well as disorders associated with deleterious frameshift mutations, such as Duchenne muscular dystrophy.

Description

COMPOSITIONS AND METHODS FOR MODULATING GENE EXPRESSION
USING READING FRAME SURVEILLANCE
Field of the Invention
The invention relates to the field of nucleic acid biotechnology and provides, for instance, compositions and methods for the design and production of codon-optimized nucleic acids for enhancing gene expression in a target cell or tissue, as well as nucleic acids for inducing the expression or the silencing of a gene of interest, and nucleic acids capable of promoting alternative RNA splicing.
Background of the Invention
Advances in chemical nucleic acid synthesis technologies and recombinant nucleic acid preparation methods have enabled the efficient production of nucleic acids, such as single-stranded polynucleotides and double-stranded nucleic acid duplexes. Additionally, a variety of vectors have been developed for the delivery of nucleic acids to target cells. Among the factors that have hindered the development of nucleic acids as a therapeutic paradigm are the difficulties associated with designing polynucleotides capable of selectively enhancing or repressing the expression of a gene of interest in a target cell, as well as the challenges that have been associated with modulating the alternative splicing of genes that contain multiple intronic sequences. There remains a need for a set of techniques that address these hindrances.
Summary of the Invention
In some aspects, the invention provides compositions and methods for optimizing the nucleic acid sequence of a gene or RNA equivalent thereof encoding a protein of interest so as to achieve enhanced expression of the protein in a particular cell type. Genes and RNA equivalents thereof optimized using the compositions and methods described herein can be synthesized by chemical synthesis techniques. Genes designed according to the methods described herein may be amplified, for instance, using prokaryotic or eukaryotic cells that have been transfected with the optimized gene. The gene or RNA equivalent thereof may have a clinical benefit, as these constructs can be administered to a subject, such as a human subject, to treat a disease or condition characterized by a defect in, or a reduced expression of, the encoded protein. Such diseases include heritable disorders, such as recessive genetic diseases, including X-linked myotubular myopathy (XLMTM), Pompe disease, recessive catecholaminergic polymorphic ventricular tachycardia (CPVT), and Crigler-Najjar syndrome, among others.
For instance, in a first aspect, the invention features a method of preparing a codon-optimized gene or RNA equivalent thereof for expression of a protein in a cell, the method including: a) providing a gene expression profile for a plurality of genes expressed in the cell;
b) providing a polynucleotide sequence including a portion encoding the protein; c) incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide, or RNA resulting from transcription thereof, to endogenous RNA molecules encoding a protein whose expression level is among the top 50% (e.g., among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, or 50%) or more of gene expression levels in the intended target cell; and d) synthesizing, expressing, and/or isolating the polynucleotide.
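Step (c) can be illustrated with a small brute-force search over synonymous codons. The sketch below is only one possible reading of "minimize the sequence identity": it uses the number of exact 9-nucleotide substrings shared with the endogenous RNAs as a stand-in score, which is an assumption and not a metric defined in the source. The function names, the partial codon table (standard genetic code entries for a few amino acids only), and the toy sequences are hypothetical, and exhaustive enumeration is practical only for very short peptides.

```python
from itertools import product

# Partial standard codon table, limited to the amino acids used in the toy example.
SYNONYMOUS = {
    "M": ["AUG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "E": ["GAA", "GAG"],
    "F": ["UUU", "UUC"],
}


def kmers(seq: str, k: int) -> set[str]:
    """All contiguous substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def least_similar_encoding(peptide: str, endogenous: list[str], k: int = 9) -> tuple[str, int]:
    """Return the synonymous encoding sharing the fewest k-mers with endogenous RNAs."""
    endogenous_kmers = set().union(*(kmers(rna, k) for rna in endogenous))
    best_seq, best_score = None, None
    for choice in product(*(SYNONYMOUS[aa] for aa in peptide)):
        candidate = "".join(choice)
        score = len(kmers(candidate, k) & endogenous_kmers)  # assumed proxy for identity
        if best_score is None or score < best_score:
            best_seq, best_score = candidate, score
    return best_seq, best_score


# Hypothetical highly expressed endogenous RNA fragment that also encodes M-A-E-F.
endogenous_rnas = ["AUGGCUGAAUUC"]
optimized, shared = least_similar_encoding("MAEF", endogenous_rnas)
print(optimized, shared)
```

A practical implementation would replace the exhaustive search with a heuristic and would draw the endogenous RNA set from the gene expression profile of step (a).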
In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 1 % of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 2% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 3% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 4% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 5% of gene expression levels in the intended target cell. 
In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 6% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 7% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 8% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 9% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 10% of gene expression levels in the intended target cell. 
In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 11% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 12% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 13% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 14% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 15% of gene expression levels in the intended target cell.
In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 20% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 25% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 30% of gene expression levels in the intended target cell. In some embodiments, step (c) includes incorporating codon substitutions into the polynucleotide that minimize the sequence identity of the polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 50% of gene expression levels in the intended target cell.
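The percentile thresholds in the embodiments above presuppose a way to pick the most highly expressed genes from the expression profile of step (a). A minimal sketch of that selection follows; the profile values, gene names, and function name are hypothetical, and ranking by a single expression value per gene is an assumption about how the profile is represented.

```python
import math


def top_expressed_genes(profile: dict[str, float], top_percent: float) -> list[str]:
    """Return gene names whose expression ranks within the top `top_percent` percent."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in the interval (0, 100]")
    ranked = sorted(profile, key=profile.get, reverse=True)
    # Round up so that small profiles still yield at least one gene.
    n_keep = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[:n_keep]


# Hypothetical expression values (arbitrary units) for ten placeholder genes.
expression_profile = {
    "GENE_A": 950.0, "GENE_B": 410.0, "GENE_C": 120.0, "GENE_D": 35.5, "GENE_E": 2.1,
    "GENE_F": 780.0, "GENE_G": 64.0, "GENE_H": 15.0, "GENE_I": 5.0, "GENE_J": 0.4,
}

print(top_expressed_genes(expression_profile, 20))
print(top_expressed_genes(expression_profile, 50))
```

The RNAs encoded by the returned genes would then serve as the endogenous reference set against which sequence identity is minimized.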
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 9-30-nucleotide portions (e.g., 9-nucleotide portions, 10-nucleotide portions, 11-nucleotide portions, 12-nucleotide portions, 13-nucleotide portions, 14-nucleotide portions, 15-nucleotide portions, 16-nucleotide portions, 17-nucleotide portions, 18-nucleotide portions, 19-nucleotide portions, 20-nucleotide portions, 21-nucleotide portions, 22-nucleotide portions, 23-nucleotide portions, 24-nucleotide portions, 25-nucleotide portions, 26-nucleotide portions, 27-nucleotide portions, 28-nucleotide portions, 29-nucleotide portions, or 30-nucleotide portions) of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity (e.g., no greater than 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 30% or less) relative to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 20% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 30-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of the endogenous RNA molecules.
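The window-based criteria above can be evaluated with a sliding-window comparison. The sketch below computes the fraction of continuous k-nucleotide portions of a candidate sequence that exceed an identity cutoff against the endogenous RNAs. It assumes that the "corresponding" endogenous portion is the best-matching ungapped window of the same length anywhere in the endogenous set; the source does not define the correspondence, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def windows(seq: str, k: int) -> list[str]:
    """All continuous k-nucleotide portions of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fraction_of_similar_windows(candidate: str, endogenous: list[str],
                                k: int = 30, identity_cutoff: float = 75.0) -> float:
    """Fraction of candidate windows exceeding the cutoff against any endogenous window."""
    endogenous_windows = {w for rna in endogenous for w in windows(rna, k)}
    candidate_windows = windows(candidate, k)
    if not candidate_windows:
        raise ValueError("candidate is shorter than the window length")
    similar = sum(
        any(percent_identity(w, e) > identity_cutoff for e in endogenous_windows)
        for w in candidate_windows
    )
    return similar / len(candidate_windows)


# Toy sequences with a short window so the example runs instantly.
fraction = fraction_of_similar_windows("AUGGCCGAGUUU", ["AUGGCUGAAUUC"], k=9)
print(fraction, fraction <= 0.75)
```

The comparison is quadratic in the number of windows, so a real transcriptome-scale check would need indexing or an alignment tool rather than this direct loop.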
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 5% of the continuous 29-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 29-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 5% of the continuous 28-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 28-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 27-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 27-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 26-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 26-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 25-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 25-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 24-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 24-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 23-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 23-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 22-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 22-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 21-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 21-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 20-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 20-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 19-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 19-nucleotide portion of the endogenous RNA molecules.
In some embodiments, no more than 75% (e.g., no more than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or less) of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 25% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 20% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 15% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 50% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 10% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules. In some embodiments, no more than 5% of the continuous 18-nucleotide portions of the polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 18-nucleotide portion of the endogenous RNA molecules.
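By way of non-limiting illustration, the window-based identity criteria set out above could be evaluated computationally. The Python sketch below reports the fraction of continuous k-nucleotide portions of a candidate sequence that exceed a chosen identity cutoff. It rests on two assumptions that the text does not state: that the "corresponding" portion of the endogenous RNA is the best ungapped match among all same-length windows of the endogenous sequences, and that sequence identity is the fraction of matching positions. The function and parameter names are hypothetical.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length windows match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fraction_of_similar_windows(query, endogenous, k=21, identity_cutoff=0.75):
    """Fraction of k-nt windows of `query` whose best ungapped match in any
    endogenous sequence has identity strictly greater than `identity_cutoff`.

    Assumption: "corresponding portion" means the best-matching window of the
    same length anywhere in the endogenous set (brute-force comparison).
    """
    # Compare everything as RNA so DNA and RNA inputs are interchangeable.
    query = query.upper().replace("T", "U")
    refs = [r.upper().replace("T", "U") for r in endogenous]
    ref_windows = {r[i:i + k] for r in refs for i in range(len(r) - k + 1)}

    n_windows = len(query) - k + 1
    if n_windows <= 0 or not ref_windows:
        return 0.0

    hits = 0
    for i in range(n_windows):
        window = query[i:i + k]
        if any(window_identity(window, rw) > identity_cutoff for rw in ref_windows):
            hits += 1
    return hits / n_windows
```

Under these assumptions, an embodiment such as "no more than 20% of the continuous 21-nucleotide portions exhibit greater than 75% sequence identity" would correspond to checking `fraction_of_similar_windows(candidate, transcripts, k=21, identity_cutoff=0.75) <= 0.20`. The brute-force comparison is suitable only for small inputs; a transcriptome-scale evaluation would need an indexed search.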
In some embodiments, the polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity (e.g., no greater than 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35%, 30%, or less) relative to endogenous RNA molecules encoding a protein whose expression level is among the top 50% (e.g., among the top 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%) of expression levels in the cell.
In some embodiments, step (c) includes minimizing the sequence identity of the protein-coding region of the polynucleotide or RNA resulting from transcription thereof relative to the protein-coding regions of the endogenous RNA molecules.
In some embodiments, step (c) includes minimizing the sequence identity of one or more non-coding regions of the polynucleotide or RNA resulting from transcription thereof relative to the corresponding non-coding regions of the endogenous RNA molecules.
In some embodiments, the non-coding region is an intron, a 5' untranslated region (UTR), or a 3' UTR.
In some embodiments, the method includes increasing the GC content of the polynucleotide while preserving the amino acid sequence of the encoded protein. In some embodiments, the method includes increasing the GC content of the polynucleotide to a quantity sufficient to permit hybridization of the polynucleotide or RNA resulting from transcription thereof to a complementary RNA strand with a Gibbs free energy change of from about -10 to about -100 kcal/mol. In some embodiments, the GC content of the polynucleotide may be increased to a quantity sufficient to permit hybridization of the polynucleotide or RNA resulting from transcription thereof to a complementary RNA strand with a Gibbs free energy change of from about -20 to about -90 kcal/mol, from about -30 to about -80 kcal/mol, from about -40 to about -70 kcal/mol, or from about -50 to about -60 kcal/mol.
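By way of non-limiting illustration, increasing GC content while preserving the encoded amino acid sequence can be sketched as a synonymous codon substitution. The Python example below uses the standard genetic code and, for each codon, selects the synonymous codon with the most G and C bases. This greedy choice is an assumption made for illustration: it ignores the other design criteria described herein (for instance, sequence identity to endogenous RNA, CpG content, and homopolymer content) and does not compute any Gibbs free energy of hybridization. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_count(seq: str) -> int:
    """Number of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq)


def translate(cds: str) -> str:
    """Translate a coding sequence (DNA alphabet, length divisible by 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def maximize_gc(cds: str) -> str:
    """Replace each codon with the synonymous codon that has the most G/C bases.

    Illustrative greedy choice only; ties resolve to the first codon in table order.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        recoded.append(max(synonyms, key=gc_count))
    return "".join(recoded)
```

For example, `maximize_gc("ATGAAATTTTAA")` returns a sequence that still translates to `MKF*` but contains more G and C bases than the input.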
In some embodiments, the complementary RNA strand is from 9 to 30 nucleotides in length. For instance, the complementary strand may be from about 18 to 30 nucleotides in length. In some embodiments, the complementary strand is 9 nucleotides in length. In some embodiments, the complementary strand is 12 nucleotides in length. In some embodiments, the complementary strand is 15 nucleotides in length. In some embodiments, the complementary strand is 18 nucleotides in length. In some embodiments, the complementary strand is 21 nucleotides in length. In some embodiments, the complementary strand is 24 nucleotides in length. In some embodiments, the complementary strand is 27 nucleotides in length. In some embodiments, the complementary strand is 30 nucleotides in length.
In some embodiments, the method includes minimizing the CpG content of the polynucleotide while preserving the amino acid sequence of the encoded protein. In some embodiments, the method includes minimizing the homopolymer content of the polynucleotide while preserving the amino acid sequence of the encoded protein.
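By way of non-limiting illustration, CpG content and homopolymer content can each be reduced to a simple score that is then compared across candidate sequences encoding the same protein. The Python sketch below assumes, for illustration only, that CpG content is the count of CG dinucleotides and that homopolymer content is the length of the longest run of a single base. The text does not define either metric, and the function names are hypothetical.

```python
from itertools import groupby


def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides (CpG sites) on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def pick_lowest_scoring(candidates):
    """Among candidate sequences assumed to encode the same protein, return the
    one with the fewest CpG sites, breaking ties by the shortest homopolymer run."""
    return min(candidates, key=lambda s: (cpg_count(s), longest_homopolymer(s)))
```

Ranking by CpG count first and homopolymer length second is an arbitrary ordering chosen for the sketch; a weighted combination would serve equally well.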
In some embodiments, the method is used to obtain a codon-optimized polynucleotide for selective expression of the encoded protein in the cell, such as a target cell. In some embodiments, the cell (e.g., the target cell) is a eukaryotic cell, such as a mammalian cell. In some embodiments, the cell is a human cell, such as a liver cell, muscle cell, cardiomyocyte, or other cell of interest. In some embodiments, the method includes incorporating codon substitutions into the polynucleotide so as to increase the sequence identity of the designed polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 1% (e.g., not among the top 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, or more) of expression levels in the target cell but is among the top 1% (or among the top 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, or more) of expression levels within a cell other than the target cell. For instance, the method may be used to obtain a gene or RNA equivalent thereof useful for tissue-specific expression of the encoded protein.
In some embodiments, the method includes incorporating codon substitutions into the polynucleotide so as to increase the sequence identity of the designed polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 1% (e.g., not among the top 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, or more) of expression levels in the target cell but is among the top 1% (or among the top 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, or more) of expression levels within a cell of different tissue type. For instance, the target cell may be a liver cell and the cell of different tissue type may be a muscle cell so as to selectively express the gene or RNA equivalent thereof in a liver cell rather than a muscle cell. Alternatively, the target cell may be a muscle cell and the cell of different tissue type may be a liver cell so as to selectively express the gene or RNA equivalent thereof in a muscle cell rather than a liver cell. In this way, the method may be used to obtain a gene or RNA equivalent thereof useful for tissue-specific expression of the encoded protein.
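By way of non-limiting illustration, the selection of endogenous RNA molecules described above can be sketched as a set difference over two expression profiles. The Python example below identifies transcripts that rank within a chosen top fraction of expression in a non-target cell but not in the target cell. The function names, the ranking by a single expression value per gene, and the use of a ceiling when sizing the top fraction are assumptions made for the sketch, and the expression values shown in the usage note are invented for illustration rather than measured data.

```python
import math


def top_fraction(expression: dict, fraction: float) -> set:
    """Gene names whose expression ranks within the top `fraction` of the profile."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * fraction))
    return set(ranked[:n_top])


def non_target_enriched(target_expr: dict, other_expr: dict, fraction: float = 0.10) -> set:
    """Genes in the top `fraction` of the non-target cell profile that are NOT in
    the top `fraction` of the target cell profile."""
    return top_fraction(other_expr, fraction) - top_fraction(target_expr, fraction)
```

As a usage example with made-up values, taking a liver cell as the target and a muscle cell as the non-target cell, `non_target_enriched({"ALB": 900, "APOA1": 800, "GAPDH": 500, "ACTA1": 5, "MYH7": 2}, {"ACTA1": 950, "MYH7": 700, "GAPDH": 600, "APOA1": 2, "ALB": 1}, fraction=0.4)` yields the two muscle-enriched entries, `ACTA1` and `MYH7`. The transcripts of the genes so identified would then serve as the reference sequences toward which sequence identity is increased.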
In some embodiments, the method includes preparing a vector containing the codon-optimized gene or RNA equivalent thereof. The vector may be, for instance, a viral vector, such as an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, or a vaccinia virus, among others. In some embodiments, the vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype. The vector may be a pseudotyped AAV, such as rAAV2/8 or rAAV2/9.
In some embodiments, the method includes contacting the vector with a population of cells in vitro or in vivo, for instance, so as to promote the expression of the encoded protein in a target cell. In some embodiments, the RNA equivalent is delivered to the cell (e.g., a target cell) directly, such as by contacting the cell with a liposome, vesicle (e.g., a synthetic vesicle), exosome (e.g., a synthetic exosome), dendrimer, nanoparticle (e.g., a chitosan-based nanoparticle or other cationic nanoparticle), or other carrier so as to promote entry of the RNA equivalent into the target cell.
In some embodiments, the method includes isolating the encoded protein from the population of cells. For instance, the encoded protein may be purified from the population of cells and/or accompanying cell culture media using chromatography (e.g., size-exclusion chromatography, ion-exchange chromatography, high pressure liquid chromatography, or affinity chromatography, among other chromatographic techniques), gel filtration, or preparative gel electrophoresis, among others. In this way, for instance, the method may be used to obtain the encoded protein in high yield and in high purity. The protein may be purified, for instance, to 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, 99.9%, 99.95%, 99.99% purity, or higher, as assessed, for instance, by a chromatographic procedure described herein or known in the art.
In another aspect, the invention provides a composition including a codon-optimized gene or RNA equivalent thereof designed or prepared according to any of the methods described above or herein.
In some embodiments, the composition is a vector, such as a viral vector. In some embodiments, the viral vector is an AAV, adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, or a vaccinia virus, among others. In some embodiments, the vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype. In some embodiments, the vector is a pseudotyped AAV, such as rAAV2/8 or rAAV2/9. In some embodiments, the composition is a liposome, vesicle (e.g., a synthetic vesicle), exosome (e.g., a synthetic exosome), dendrimer, nanoparticle (e.g., a chitosan-based nanoparticle or other cationic nanoparticle), or other carrier.
In some embodiments, the gene or RNA equivalent thereof encodes a therapeutic protein. In some embodiments, the therapeutic protein is a protein listed in Table 3, below. In some embodiments, the therapeutic protein is myotubularin 1 (MTM1). In some embodiments, the therapeutic protein is acid α-glucosidase (GAA). In some embodiments, the therapeutic protein is calsequestrin 2 (CASQ2). In some embodiments, the therapeutic protein is a uridine diphosphate (UDP) glucuronosyltransferase (UGT), such as UGT family 1 member A1 (UGT1A1).
In another aspect, the invention provides a method of treating a disease or condition in a patient characterized by a deficiency in a protein by administering to the patient a therapeutically effective amount of a codon-optimized gene or RNA equivalent thereof of any of the above aspects or embodiments of the invention. In some embodiments, the disease or condition is X-linked myotubular myopathy (XLMTM) and the codon-optimized gene or RNA equivalent thereof encodes MTM1. In some embodiments, the disease or condition is a glycogen storage disease, such as Pompe Disease, and the codon-optimized gene or RNA equivalent thereof encodes a lysosomal enzyme, such as GAA. In some embodiments, the disease or condition is recessive CPVT and the codon-optimized gene or RNA equivalent thereof encodes CASQ2. In some embodiments, the disease or condition is Crigler-Najjar syndrome and the codon-optimized gene or RNA equivalent thereof encodes a UGT family member protein, such as UGT1A1.
In addition to the above, in some aspects, the invention features compositions and methods to attenuate gene expression, such as compositions and methods designed on the basis of the Reading Frame Surveillance (RFS) model. The RFS model posits a mechanism by which a ribosome can prevent +1 frameshifting during translation of an mRNA. According to the RFS model, a complementary RNA (cRNA) is generated by an RNA-dependent RNA polymerase (RdRP). During the pioneer round of translation, translation of every set number of codons within a region of the cRNA that is complementary to an exon results in cleavage of the cRNA at a set distance from the ribosomal A-site, thus creating a series of uniformly-sized short complementary RNAs (scRNAs) that span the length of the exonic regions of the mRNA. The RFS model postulates that these scRNAs can serve as a molecular ruler that allows continuous monitoring for ribosomal frameshifts during subsequent rounds of translation, e.g., by the ribosome sensing scRNA termini with respect to the position of tRNAs bound in the ribosome active sites. Sensing of an increased distance between the ribosome and a scRNA terminus (e.g., between the ribosome and the scRNA 5' end, such as may occur when a +1 frameshifted complex is formed) may trigger an alteration and/or abortion of the translation of an mRNA. The RFS model suggests that sensing of +1 frameshifts during the translation process can lead to repeated cleavage of the mRNA transcript that may result in terminal 2-base pair 3' overhangs, which can facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells. RISC is a multi-protein complex that can be programmed to target almost any gene for silencing, e.g., by a process called RNA interference (RNAi).
For instance, in an aspect, the invention features a method of attenuating expression of a gene (e.g., a wild type gene) in a cell (e.g., a mammalian cell, such as a human cell), such as a cell in vitro or in vivo. The method can include the step of introducing into the cell a polynucleotide having at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) +1 frameshift mutation with respect to the gene (e.g., a wild type gene). The polynucleotide may include one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) nucleotide insertions or deletions causing the +1 frameshift mutation. In some embodiments, the +1 frameshift mutation occurs when five nucleotides are inserted with respect to the gene sequence. In some embodiments, a single nucleotide deletion may cause a +1 frameshift mutation.
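The frame arithmetic behind these two embodiments can be checked with a short sketch. It assumes, as an illustrative convention not stated in the text, that the frame offset of the downstream codons is determined by the net change in sequence length modulo 3, and that a net loss of one nucleotide places downstream codons in the +1 frame. The function name `frame_offset` is hypothetical.

```python
def frame_offset(inserted: int = 0, deleted: int = 0) -> int:
    """Return the downstream reading-frame offset caused by an indel.

    0 means the original frame is preserved, 1 means the +1 frame,
    and 2 means the +2 frame (equivalent to the -1 frame).
    Assumption: the offset depends only on the net length change mod 3.
    """
    if inserted < 0 or deleted < 0:
        raise ValueError("nucleotide counts must be non-negative")
    net_change = inserted - deleted
    # Losing one nucleotide pulls downstream sequence one position
    # forward relative to the original frame, hence the sign flip.
    return (-net_change) % 3


# Under this convention, a five-nucleotide insertion and a
# single-nucleotide deletion land in the same (+1) frame.
five_nt_insertion = frame_offset(inserted=5)
single_nt_deletion = frame_offset(deleted=1)
```

Under this convention, any insertion of 3n+2 nucleotides or deletion of 3n+1 nucleotides gives the same offset, which is consistent with the text treating a five-nucleotide insertion and a single-nucleotide deletion as alternative routes to a +1 frameshift.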
In some embodiments, the polynucleotide includes a wild type copy of the gene operably linked to the portion of the polynucleotide having a +1 frameshift mutation. The polynucleotides may be DNA or RNA polynucleotides. The polynucleotides may be introduced into a cell, such as a cell in vitro or in vivo, by contacting the cell with a vector including the polynucleotide. In some embodiments, the vector may be a DNA or RNA vector. For example, the vector may be a viral vector, such as a viral vector selected from the group consisting of an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus. In particular embodiments, the viral vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype. In other embodiments, the viral vector may be a pseudotyped AAV, such as rAAV2/8 or rAAV2/9. The polynucleotide may be introduced into a cell by electroporation, or by contacting said cell with a cationic polymer, a cationic lipid, or calcium phosphate.
In an additional aspect, the invention features a vector comprising a polynucleotide having a +1 frameshift mutation with respect to a wild type sequence of a gene. In some embodiments, the polynucleotide comprises at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) nucleotide insertion or deletion causing the +1 frameshift mutation. In some embodiments, the +1 frameshift mutation occurs when five nucleotides are inserted with respect to the gene sequence. In some embodiments, a single nucleotide deletion may cause a +1 frameshift mutation. In some embodiments, the polynucleotide comprises a wild type copy of the gene operably linked to the portion of the polynucleotide having the +1 frameshift mutation. In some embodiments, the vector is a DNA or RNA vector. The vector may be a viral vector, such as a viral vector selected from the group consisting of an adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus. In some embodiments, the viral vector is an AAV, such as an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype. In other embodiments, the viral vector may be a pseudotyped AAV, such as rAAV2/8 or rAAV2/9.
The polynucleotides and vectors of the previous two aspects can have a clinical benefit, as these constructs can be administered to a subject, such as a human subject, to treat a disease or condition characterized by aberrant or undesirable expression of a target gene. In some embodiments, aberrant expression can be characterized as the expression of a gene that encodes a mutant protein, e.g., a protein that contains at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) mutation, such as an insertion and/or a deletion. An aberrant protein may exhibit structural and/or functional features that are different from those of the wild type protein, which may be associated with human disease. In some embodiments, the aberrantly expressed gene is associated with a genetic disease (e.g., dominant catecholaminergic polymorphic ventricular tachycardia (CPVT) and Long QT syndrome (LQTS)) and/or a proliferative disorder (e.g., cancer).
In addition to the above, in some aspects, the invention provides compositions and methods for expressing a protein of interest, such as a therapeutic protein, in a target cell. The nucleic acid constructs encoding the protein of interest can be delivered to a cell, such as a cell in vitro or in vivo. The nucleic acid constructs can be prepared, e.g., using chemical synthesis techniques, such as through the use of solid phase nucleic acid synthesis protocols. Nucleic acid constructs may be delivered to target cells using a variety of nucleic acid transfer technologies, including, for example, electroporation, lipid-mediated nucleic acid delivery (e.g., cationic lipid-mediated nucleic acid delivery), calcium phosphate-mediated nucleic acid delivery, and nanoparticle-mediated nucleic acid delivery, among others described herein. The proteins encoded by these nucleic acid constructs can have a clinical benefit, as these constructs can be administered to a subject, such as a human subject, to treat a disease or condition characterized by a defect in, or a reduced expression of, the encoded protein. Such diseases include heritable genetic disorders, such as recessive genetic diseases, including, without limitation, X-linked myotubular myopathy (XLMTM), Pompe disease, recessive catecholaminergic polymorphic ventricular tachycardia (CPVT), and Crigler-Najjar syndrome, among others.
For instance, in an aspect, the invention features a method of inducing expression of a protein in a cell, the method including introducing into the cell a duplex including an RNA polynucleotide encoding the protein, wherein the RNA polynucleotide is hybridized to a plurality of complementary nucleic acid strands (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more complementary nucleic acid strands).
In some embodiments, the plurality of complementary nucleic acid strands comprises one or more RNA strands. In some embodiments, the plurality of complementary nucleic acid strands comprises one or more DNA strands.
In some embodiments, the plurality of complementary nucleic acid strands together span from the 5' end to the 3' end of the RNA polynucleotide encoding the protein. For instance, the plurality of complementary strands may together span from the start codon (e.g., AUG) at the 5' end of the RNA polynucleotide encoding the protein to the stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein. In some embodiments, the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein. In some embodiments, the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a regulatory element downstream of the stop codon, such as a portion of the 3' UTR of the RNA polynucleotide.
In some embodiments, each nucleotide of the RNA polynucleotide encoding the protein is base paired to a nucleotide of the plurality of complementary nucleic acid strands, for instance, such that there are no single stranded regions of the duplex. In some embodiments, the duplex includes one or more single stranded regions. In some embodiments, the single stranded region consists of a multiple of 3 nucleotides in length (e.g., 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, or more, nucleotides in length).
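As an illustration of complementary strands that together span from the start codon to the stop codon with no single stranded regions, the following minimal sketch tiles a coding region with fixed-length complementary RNA strands. The function names, the toy sequence, and the choice of a fixed strand length are assumptions made for illustration only; the final strand may be shorter than the chosen length, and the sketch does not enforce a minimum length for it.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
STOP_CODONS = {"UAG", "UAA", "UGA"}


def coding_region(mrna: str) -> tuple[int, int]:
    """Return (start, end) of the first AUG through its in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    raise ValueError("no in-frame stop codon found")


def reverse_complement(rna: str) -> str:
    """Return the complementary RNA strand written 5'-to-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def tile_complementary_strands(mrna: str, strand_len: int = 9) -> list[str]:
    """Tile the coding region with contiguous complementary strands.

    Strands are listed in the order of the mRNA segments they pair with,
    so every coding nucleotide is base paired and no gaps remain.
    """
    if not 9 <= strand_len <= 30:
        raise ValueError("strand_len is expected to be from 9 to 30")
    start, end = coding_region(mrna)
    cds = mrna[start:end]
    return [
        reverse_complement(cds[i:i + strand_len])
        for i in range(0, len(cds), strand_len)
    ]


# Hypothetical toy transcript: 5' leader, six codons, short 3' tail.
toy_mrna = "GCCGCCACC" + "AUGGCUGAAUUCGGAUAA" + "GGG"
strands = tile_complementary_strands(toy_mrna, strand_len=9)
```

A single stranded region could be modeled by skipping a segment whose length is a multiple of 3 before resuming the tiling; that variation is omitted here to keep the sketch short.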
In some embodiments, each of the complementary nucleic acid strands is from 9 to 30 nucleotides in length. For instance, in some embodiments, each of the complementary nucleic acid
Figure imgf000019_0001
In some embodiments, one or more of the plurality of complementary nucleic acid strands comprise one or more modified nucleotides. In some embodiments, the one or more modified nucleotides comprise a modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate. In some embodiments, the one or more modified nucleotides comprise a modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate. In some embodiments, the one or more modified nucleotides comprise a modified uridine, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate. In some embodiments, the one or more modified nucleotides comprise a modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
In some embodiments, the duplex is introduced into the cell by electroporation, or by contacting the cell with a cationic lipid, a vesicle, an exosome, a liposome, a dendrimer, a synthetic cationic vesicle, a cationic polymer, or calcium phosphate.
In some embodiments, the cell is a eukaryotic cell, such as a mammalian cell. In some embodiments, the cell is a human cell.
In some embodiments, the protein is a protein listed in Table 3. In some embodiments, for instance, the protein is selected from the group consisting of myotubularin 1 (MTM1), acid α-glucosidase (GAA), calsequestrin 2 (CASQ2), and uridine diphosphate glucuronosyltransferase family 1 member A1 (UGT1A1).
In an additional aspect, the invention provides a synthetic duplex including an RNA polynucleotide encoding a protein, wherein the RNA polynucleotide is hybridized to a plurality of complementary nucleic acid strands.
In some embodiments, the plurality of complementary nucleic acid strands comprise one or more RNA strands. In some embodiments, the plurality of complementary nucleic acid strands comprise one or more DNA strands.
In some embodiments, the plurality of complementary nucleic acid strands together span from the 5' end to the 3' end of the RNA polynucleotide encoding the protein. For instance, the plurality of complementary strands may together span from the start codon (e.g., AUG) at the 5' end of the RNA polynucleotide encoding the protein to the stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein. In some embodiments, the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a stop codon (e.g., UAG, UAA, or UGA) at the 3' end of the RNA polynucleotide encoding the protein. In some embodiments, the plurality of complementary strands may together span from a regulatory element upstream of the start codon, such as a portion of the 5' untranslated region (UTR) of the RNA polynucleotide, for example, from a Kozak consensus sequence (e.g., GCCGCCACCAUGG (SEQ ID NO: 3)) to, for instance, a regulatory element downstream of the stop codon, such as a portion of the 3' UTR of the RNA polynucleotide.
In some embodiments, each nucleotide of the RNA polynucleotide encoding the protein is base-paired to a nucleotide of the plurality of complementary nucleic acid strands, for instance, such that there are no single stranded regions of the duplex. In some embodiments, the duplex includes one or more single stranded regions. In some embodiments, the single stranded region consists of a multiple of 3 nucleotides in length (e.g., 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, or more, nucleotides in length).
In some embodiments, each of the complementary nucleic acid strands is from 9 to 30
Figure imgf000021_0001
In some embodiments, one or more of the plurality of complementary nucleic acid strands comprise one or more modified nucleotides. In some embodiments, the one or more modified nucleotides comprise a modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate. In some embodiments, the one or more modified nucleotides comprise a modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate. In some embodiments, the one or more modified nucleotides comprise a modified uridine, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate. In some embodiments, the one or more modified nucleotides comprise a modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
In some embodiments, the composition is a cationic lipid, a vesicle, an exosome, a liposome, a dendrimer, a synthetic cationic vesicle, or a cationic polymer.
In some embodiments, the protein is a protein listed in Table 3, such as a protein selected from the group consisting of MTM1, GAA, CASQ2, and UGT1A1.
In another aspect, the invention provides a method of treating a disease or condition in a patient characterized by a deficiency in a protein by administering to the patient a therapeutically effective amount of a composition of the foregoing aspect, or any embodiments thereof. In some embodiments, the disease or condition is X-linked myotubular myopathy (XLMTM) and the protein is MTM1. In some embodiments, the disease or condition is a glycogen storage disease, such as Pompe Disease, and the protein is a lysosomal enzyme, such as GAA. In some embodiments, the disease or condition is recessive CPVT and the protein is CASQ2. In some embodiments, the disease or condition is Crigler-Najjar syndrome and the protein is UGT1A1.
In another aspect, the invention provides a method for inducing alternative splicing in a cell by providing the cell with a polynucleotide that includes the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to an intron (INTR1); and a third region corresponding to a second exon (EX2) from the gene, thereby inducing alternative splicing. In some embodiments, the wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise a region corresponding to the one or more intervening exons between EX1 and EX2. In some embodiments, induced alternative splicing results in exon skipping during processing of the endogenous mRNA of the gene.
For example, the invention provides a method for inducing exon skipping in a cell by providing the cell with a polynucleotide that includes the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to an intron (INTR1); and a third region corresponding to a second exon (EX2) from the gene, thereby inducing exon skipping. In some embodiments, the wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise a region corresponding to the one or more intervening exons.
In some embodiments, the polynucleotide further includes a fourth region corresponding to a second intron (INTR2) and a fifth region corresponding to a third exon (EX3) from the gene, operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2-INTR2-EX3. In some embodiments, a wild-type form of the gene includes one or more intervening exons between EX2 and EX3, and the polynucleotide does not include a region corresponding to the one or more intervening exons between EX2 and EX3. Induced alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to EX2 and EX3.
In some embodiments, the polynucleotide includes additional regions corresponding to additional introns (e.g., regions corresponding to a third (INTR3), fourth (INTR4), fifth (INTR5), sixth (INTR6), seventh (INTR7), eighth (INTR8), ninth (INTR9), or tenth (INTR10) intron, or more) as well as additional intervening regions corresponding to additional exons from the gene (e.g., regions corresponding to a fourth (EX4), fifth (EX5), sixth (EX6), seventh (EX7), eighth (EX8), ninth (EX9), tenth (EX10), or eleventh (EX11) exon, or more), such that each region corresponding to an intron is flanked by regions corresponding to an exon from the gene. In some embodiments, a wild-type form of the gene includes one or more intervening exons between, for example, EX3 and EX4, EX4 and EX5, EX5 and EX6, EX6 and EX7, EX7 and EX8, EX8 and EX9, EX9 and EX10, and/or EX10 and EX11, and the polynucleotide does not include a region corresponding to the one or more intervening exons. Alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to, for example, EX1-EX4 (e.g., EX1, EX2, EX3, and EX4), EX1-EX5, EX1-EX6, EX1-EX7, EX1-EX8, EX1-EX9, EX1-EX10, or EX1-EX11.
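The exon-intron-exon layout described in this aspect can be sketched in a few lines. This is a bookkeeping illustration only: the function names, the region labels, and the five-exon toy gene are hypothetical, and the sketch tracks the order of regions rather than any actual sequence.

```python
def assemble_construct(exon_regions: list[str], intron_regions: list[str]) -> list[str]:
    """Interleave regions 5'-to-3' as EX1-INTR1-EX2-INTR2-EX3 and so on,
    so that every intron region is flanked by exon regions."""
    if len(exon_regions) < 2:
        raise ValueError("at least two exon regions are required")
    if len(intron_regions) != len(exon_regions) - 1:
        raise ValueError("one intron region is needed between each pair of exon regions")
    construct = [exon_regions[0]]
    for intron, exon in zip(intron_regions, exon_regions[1:]):
        construct.extend([intron, exon])
    return construct


def omitted_exons(gene_exons: list[str], retained: list[str]) -> list[str]:
    """List the intervening exons of the wild-type gene that the
    construct leaves out (the candidates for skipping)."""
    positions = [gene_exons.index(name) for name in retained]
    if positions != sorted(positions):
        raise ValueError("retained exons must follow the gene's 5'-to-3' order")
    window = gene_exons[positions[0]:positions[-1] + 1]
    return [name for name in window if name not in retained]


# Hypothetical gene with five exons; the construct keeps exons 1, 3 and 5.
gene = ["exon1", "exon2", "exon3", "exon4", "exon5"]
construct = assemble_construct(["EX1", "EX2", "EX3"], ["INTR1", "INTR2"])
skipped = omitted_exons(gene, ["exon1", "exon3", "exon5"])
```

In this toy case EX1, EX2, and EX3 of the construct correspond to the gene's first, third, and fifth exons, and the two omitted exons are the intervening exons the text refers to.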
In another aspect, the invention provides a method for inducing alternative splicing by way of exon inclusion in a cell by providing the cell with a polynucleotide that includes the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to a second exon (EX2); and a third region corresponding to a third exon (EX3) from the gene, thereby inducing inclusion of the second exon in an endogenous mRNA transcript. In some embodiments, EX1 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729, 738, 747, 756, 765, 774, 783, 792, 801, 810, 819, 828, 837, 846, 855, 864, 873, 882, 891, 900, 909, 918, 927, 936, 945, 954, 963, 972, 981, 990, or 999 nucleotides) corresponding to the 3' terminal region of the exon in the gene that flanks a neighboring intron.
In some embodiments, EX2 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729, 738, 747, 756, 765, 774, 783, 792, 801, 810, 819, 828, 837, 846, 855, 864, 873, 882, 891, 900, 909, 918, 927, 936, 945, 954, 963, 972, 981, 990, or 999 nucleotides) corresponding to the 5' terminal region of the exon in the gene that flanks a neighboring intron. In some embodiments, EX2 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729, 738, 747, 756, 765, 774, 783, 792, 801, 810, 819, 828, 837, 846, 855, 864, 873, 882, 891, 900, 909, 918, 927, 936, 945, 954, 963, 972, 981, 990, or 999 nucleotides) corresponding to the 3' terminal region of the exon in the gene that flanks a neighboring intron.
In some embodiments, EX3 includes 9 or more nucleotides (e.g., a multiple of 3, such as 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729, 738, 747, 756, 765, 774, 783, 792, 801, 810, 819, 828, 837, 846, 855, 864, 873, 882, 891, 900, 909, 918, 927, 936, 945, 954, 963, 972, 981, 990, or 999 nucleotides) corresponding to the 5' terminal region of the exon in the gene that flanks a neighboring intron. In some embodiments, one or more of EX1, EX2, and EX3 does not include a full-length exon. In some embodiments, induced alternative splicing results in the inclusion of the second exon (e.g., the region in the gene corresponding to EX2) during processing of the endogenous mRNA of the gene.
In another aspect, the invention features a composition including a vector including a polynucleotide including the following elements operably linked in a 5'-to-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to an intron (INTR1); and a third region corresponding to a second exon (EX2) from the gene. In some embodiments, the wild-type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise a region corresponding to the one or more intervening exons.
In some embodiments, the wild-type form of the gene includes a plurality of intervening exons between EX1 and EX2, and the polynucleotide does not contain a region corresponding to any of the intervening exons.
In some embodiments, the polynucleotide further includes a fourth region corresponding to a second intron (INTR2) and a fifth region corresponding to a third exon (EX3) from the gene, operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2-INTR2-EX3. In some embodiments, a wild-type form of the gene includes one or more intervening exons between EX2 and EX3, and the polynucleotide does not include a region corresponding to the one or more intervening exons between EX2 and EX3. Induced alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to EX2 and EX3.
In some embodiments, the polynucleotide includes additional regions corresponding to additional introns (e.g., regions corresponding to a third (INTR3), fourth (INTR4), fifth (INTR5), sixth (INTR6), seventh (INTR7), eighth (INTR8), ninth (INTR9), or tenth (INTR10) intron, or more) as well as additional intervening regions corresponding to additional exons from the gene (e.g., regions corresponding to a fourth (EX4), fifth (EX5), sixth (EX6), seventh (EX7), eighth (EX8), ninth (EX9), tenth (EX10), or eleventh (EX11) exon, or more), such that each region corresponding to an intron is flanked by regions corresponding to an exon from the gene. In some embodiments, a wild-type form of the gene includes one or more intervening exons between, for example, EX3 and EX4, EX4 and EX5, EX5 and EX6, EX6 and EX7, EX7 and EX8, EX8 and EX9, EX9 and EX10, and/or EX10 and EX11, and the polynucleotide does not include a region corresponding to the one or more intervening exons. Alternative splicing, for instance, by way of exon skipping, can induce the splicing together of endogenous exons corresponding to, for example, EX1-EX4 (e.g., EX1, EX2, EX3, and EX4), EX1-EX5, EX1-EX6, EX1-EX7, EX1-EX8, EX1-EX9, EX1-EX10, or EX1-EX11.
In some embodiments, the region corresponding to an intron (e.g., INTR1, INTR2, INTR3, INTR4, INTR5, INTR6, INTR7, INTR8, INTR9, or INTR10, etc.) corresponds to any polynucleotide region that includes both a splice donor and splice acceptor, such that when a polynucleotide including, for example, EX1-INTR1-EX2, operably linked 5'-to-3', is expressed, INTR1 allows for splicing of the pre-mRNA such that the mature mRNA contains EX1 directly spliced to EX2 in a 5'-to-3' orientation. In some embodiments, the region corresponding to an intron (e.g., INTR1, INTR2, INTR3, INTR4, INTR5, INTR6, INTR7, INTR8, INTR9, or INTR10, etc.) corresponds to a full-length intron from a gene. In some embodiments, the region corresponding to an intron corresponds to a truncated intron from a gene. For instance, when INTR1 (or another region corresponding to an intron as described herein) is derived from a gene, INTR1 (or another region corresponding to an intron as described herein) may be from the same gene as EX1 and/or EX2 (or another region corresponding to an exon from the gene as described herein), or may be from a gene that does not contain EX1 and/or EX2 (or another region corresponding to an exon as described herein). In some embodiments, the region corresponding to an intron (e.g., INTR1) is a synthetic intron, not isolated from any known gene. In some embodiments, the region corresponding to an intron (e.g., INTR1) includes the following elements linked operably in the 5'-to-3' direction: a 5' region that includes a splice donor site; optionally, an intervening region; and a 3' region that includes a splice acceptor site. The 5' region may have a length in nucleotides of, for example, 0-500, 500-1000, 1000-1500, or 1500-2000. The 3' region may have a length in nucleotides of, for example, 0-500, 500-1000, 1000-1500, or 1500-2000. The intervening region may have a length, in nucleotides, of 0-500, 500-1000, 1000-1500, or 1500-2000.
In some embodiments, the polynucleotide includes a eukaryotic promoter (PEuk), operably linked 5' to EX1. The eukaryotic promoter may be, for example, a tissue-specific promoter, such as a promoter selected from the group consisting of desmin promoter, creatine kinase promoter, myogenin promoter, a myosin heavy chain promoter, human brain and natriuretic peptide promoter, albumin promoter, α-1-antitrypsin promoter, and hepatitis B virus core protein promoter; or a constitutive promoter, such as a promoter selected from the group consisting of CMV promoter, chicken β-actin promoter, HSV promoter, TK promoter, RSV promoter, SV40 promoter, MMTV promoter, and Adenovirus E1A promoter.
In some embodiments, the polynucleotide includes a region of less than or equal to 2000 nucleotides linked 5' to the region corresponding to an exon from the gene (e.g., EX1) that is not from the gene including, e.g., EX1 and EX2. In some embodiments, the polynucleotide includes a region of less than or equal to 2000 nucleotides linked 3' to, e.g., EX2 that is not from the gene including EX1 and EX2. In some embodiments, the polynucleotide includes a region of less than or equal to 2000 nucleotides linked 5' to EX1 that is not from the gene including, e.g., EX1 and EX2 and a region of less than or equal to 2000 nucleotides linked 3' to EX2 that is not from the gene including, e.g., EX1 and EX2.
In some embodiments, the polynucleotide contains a Kozak consensus sequence with a role in translation initiation by PEuk.
In some embodiments, the length in nucleotides of the region corresponding to an exon from the gene (e.g., EX1) is "x" or any multiple of "x", wherein "x" is 9-30 nucleotides. Additionally, the length in nucleotides of another region corresponding to an exon from the gene (e.g., EX2) is independently "x" or any multiple of "x", wherein "x" is 9-30 nucleotides.
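The length rule above lends itself to a simple check. The following sketch, with the hypothetical function name `compatible_unit_lengths`, lists every value of "x" from 9 to 30 for which a given exon-region length is "x" or a whole multiple of "x"; it illustrates the arithmetic only and makes no claim about which value of "x" is preferred.

```python
def compatible_unit_lengths(region_length: int, x_min: int = 9, x_max: int = 30) -> list[int]:
    """Return every unit length x in [x_min, x_max] such that
    region_length equals x or a whole multiple of x."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return [x for x in range(x_min, x_max + 1) if region_length % x == 0]


# A 108-nucleotide exon region is a multiple of several unit lengths,
# whereas a 7-nucleotide region is compatible with none of them.
units_for_108 = compatible_unit_lengths(108)
units_for_7 = compatible_unit_lengths(7)
```

Because the lengths of different exon regions are chosen independently, the check would be applied to each region (e.g., EX1 and EX2) separately.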
In some embodiments of the above-described aspect, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1) is 9 nucleotides or any multiple of 9 nucleotides (e.g., 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729, 738, 747, 756, 765, 774, 783, 792, 801, 810, 819, 828, 837, 846, 855, 864, 873, 882, 891, 900, 909, 918, 927, 936, 945, 954, 963, 972, 981, 990, 999), and the length in nucleotides of another region corresponding to an exon from the gene (e.g., EX2, among others described herein) is independently 9 nucleotides or any multiple of 9 nucleotides (e.g., 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126, 135, 144, 153, 162, 171, 180, 189, 198, 207, 216, 225, 234, 243, 252, 261, 270, 279, 288, 297, 306, 315, 324, 333, 342, 351, 360, 369, 378, 387, 396, 405, 414, 423, 432, 441, 450, 459, 468, 477, 486, 495, 504, 513, 522, 531, 540, 549, 558, 567, 576, 585, 594, 603, 612, 621, 630, 639, 648, 657, 666, 675, 684, 693, 702, 711, 720, 729, 738, 747, 756, 765, 774, 783, 792, 801, 810, 819, 828, 837, 846, 855, 864, 873, 882, 891, 900, 909, 918, 927, 936, 945, 954, 963, 972, 981, 990, 999, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 10 nucleotides or any multiple of 10 nucleotides (e.g., 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000), and the length in nucleotides of another region corresponding to an exon from the gene (e.g., EX2, among others described herein) is independently 10 nucleotides or any multiple of 10 nucleotides (e.g., 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 11 nucleotides or any multiple of 11 nucleotides (e.g., 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176, 187, 198, 209, 220, 231, 242, 253, 264, 275, 286, 297, 308, 319, 330, 341, 352, 363, 374, 385, 396, 407, 418, 429, 440, 451, 462, 473, 484, 495, 506, 517, 528, 539, 550, 561, 572, 583, 594, 605, 616, 627, 638, 649, 660, 671, 682, 693, 704, 715, 726, 737, 748, 759, 770, 781, 792, 803, 814, 825, 836, 847, 858, 869, 880, 891, 902, 913, 924, 935, 946, 957, 968, 979, 990, 1001), and the length in nucleotides of another region corresponding to an exon from the gene (e.g., EX2, among others described herein) is independently 11 nucleotides or any multiple of 11 nucleotides (e.g., 11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132, 143, 154, 165, 176, 187, 198, 209, 220, 231, 242, 253, 264, 275, 286, 297, 308, 319, 330, 341, 352, 363, 374, 385, 396, 407, 418, 429, 440, 451, 462, 473, 484, 495, 506, 517, 528, 539, 550, 561, 572, 583, 594, 605, 616, 627, 638, 649, 660, 671, 682, 693, 704, 715, 726, 737, 748, 759, 770, 781, 792, 803, 814, 825, 836, 847, 858, 869, 880, 891, 902, 913, 924, 935, 946, 957, 968, 979, 990, 1001).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 12 nucleotides or any multiple of 12 nucleotides (e.g., 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144, 156, 168, 180, 192, 204, 216, 228, 240, 252, 264, 276, 288, 300, 312, 324, 336, 348, 360, 372, 384, 396, 408, 420, 432, 444, 456, 468, 480, 492, 504, 516, 528, 540, 552, 564, 576, 588, 600, 612, 624, 636, 648, 660, 672, 684, 696, 708, 720, 732, 744, 756, 768, 780, 792, 804, 816, 828, 840, 852, 864, 876, 888, 900, 912, 924, 936, 948, 960, 972, 984, 996, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 12 nucleotides or any multiple of 12 nucleotides (e.g., 12, 24, 36, 48, 60, 72, 84, 96, 108, 120, 132, 144, 156, 168, 180, 192, 204, 216, 228, 240, 252, 264, 276, 288, 300, 312, 324, 336, 348, 360, 372, 384, 396, 408, 420, 432, 444, 456, 468, 480, 492, 504, 516, 528, 540, 552, 564, 576, 588, 600, 612, 624, 636, 648, 660, 672, 684, 696, 708, 720, 732, 744, 756, 768, 780, 792, 804, 816, 828, 840, 852, 864, 876, 888, 900, 912, 924, 936, 948, 960, 972, 984, 996, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 13 nucleotides or any multiple of 13 nucleotides (e.g., 13, 26, 39, 52, 65, 78, 91, 104, 117, 130, 143, 156, 169, 182, 195, 208, 221, 234, 247, 260, 273, 286, 299, 312, 325, 338, 351, 364, 377, 390, 403, 416, 429, 442, 455, 468, 481, 494, 507, 520, 533, 546, 559, 572, 585, 598, 611, 624, 637, 650, 663, 676, 689, 702, 715, 728, 741, 754, 767, 780, 793, 806, 819, 832, 845, 858, 871, 884, 897, 910, 923, 936, 949, 962, 975, 988, 1001), and the length in nucleotides of another region corresponding to an exon from the gene is independently 13 nucleotides or any multiple of 13 nucleotides (e.g., 13, 26, 39, 52, 65, 78, 91, 104, 117, 130, 143, 156, 169, 182, 195, 208, 221, 234, 247, 260, 273, 286, 299, 312, 325, 338, 351, 364, 377, 390, 403, 416, 429, 442, 455, 468, 481, 494, 507, 520, 533, 546, 559, 572, 585, 598, 611, 624, 637, 650, 663, 676, 689, 702, 715, 728, 741, 754, 767, 780, 793, 806, 819, 832, 845, 858, 871, 884, 897, 910, 923, 936, 949, 962, 975, 988, 1001).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 14 nucleotides or any multiple of 14 nucleotides (e.g., 14, 28, 42, 56, 70, 84, 98, 112, 126, 140, 154, 168, 182, 196, 210, 224, 238, 252, 266, 280, 294, 308, 322, 336, 350, 364, 378, 392, 406, 420, 434, 448, 462, 476, 490, 504, 518, 532, 546, 560, 574, 588, 602, 616, 630, 644, 658, 672, 686, 700, 714, 728, 742, 756, 770, 784, 798, 812, 826, 840, 854, 868, 882, 896, 910, 924, 938, 952, 966, 980, 994, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 14 nucleotides or any multiple of 14 nucleotides (e.g., 14, 28, 42, 56, 70, 84, 98, 112, 126, 140, 154, 168, 182, 196, 210, 224, 238, 252, 266, 280, 294, 308, 322, 336, 350, 364, 378, 392, 406, 420, 434, 448, 462, 476, 490, 504, 518, 532, 546, 560, 574, 588, 602, 616, 630, 644, 658, 672, 686, 700, 714, 728, 742, 756, 770, 784, 798, 812, 826, 840, 854, 868, 882, 896, 910, 924, 938, 952, 966, 980, 994, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 15 nucleotides or any multiple of 15 nucleotides (e.g., 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855, 870, 885, 900, 915, 930, 945, 960, 975, 990, 1005), and the length in nucleotides of another region corresponding to an exon from the gene is independently 15 nucleotides or any multiple of 15 nucleotides (e.g., 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855, 870, 885, 900, 915, 930, 945, 960, 975, 990, 1005).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 16 nucleotides or any multiple of 16 nucleotides (e.g., 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512, 528, 544, 560, 576, 592, 608, 624, 640, 656, 672, 688, 704, 720, 736, 752, 768, 784, 800, 816, 832, 848, 864, 880, 896, 912, 928, 944, 960, 976, 992, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 16 nucleotides or any multiple of 16 nucleotides (e.g., 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512, 528, 544, 560, 576, 592, 608, 624, 640, 656, 672, 688, 704, 720, 736, 752, 768, 784, 800, 816, 832, 848, 864, 880, 896, 912, 928, 944, 960, 976, 992, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 17 nucleotides or any multiple of 17 nucleotides (e.g., 17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187, 204, 221, 238, 255, 272, 289, 306, 323, 340, 357, 374, 391, 408, 425, 442, 459, 476, 493, 510, 527, 544, 561, 578, 595, 612, 629, 646, 663, 680, 697, 714, 731, 748, 765, 782, 799, 816, 833, 850, 867, 884, 901, 918, 935, 952, 969, 986, 1003), and the length in nucleotides of another region corresponding to an exon from the gene is independently 17 nucleotides or any multiple of 17 nucleotides (e.g., 17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187, 204, 221, 238, 255, 272, 289, 306, 323, 340, 357, 374, 391, 408, 425, 442, 459, 476, 493, 510, 527, 544, 561, 578, 595, 612, 629, 646, 663, 680, 697, 714, 731, 748, 765, 782, 799, 816, 833, 850, 867, 884, 901, 918, 935, 952, 969, 986, 1003).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 18 nucleotides or any multiple of 18 nucleotides (e.g., 18, 36, 54, 72, 90, 108, 126, 144, 162, 180, 198, 216, 234, 252, 270, 288, 306, 324, 342, 360, 378, 396, 414, 432, 450, 468, 486, 504, 522, 540, 558, 576, 594, 612, 630, 648, 666, 684, 702, 720, 738, 756, 774, 792, 810, 828, 846, 864, 882, 900, 918, 936, 954, 972, 990, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 18 nucleotides or any multiple of 18 nucleotides (e.g., 18, 36, 54, 72, 90, 108, 126, 144, 162, 180, 198, 216, 234, 252, 270, 288, 306, 324, 342, 360, 378, 396, 414, 432, 450, 468, 486, 504, 522, 540, 558, 576, 594, 612, 630, 648, 666, 684, 702, 720, 738, 756, 774, 792, 810, 828, 846, 864, 882, 900, 918, 936, 954, 972, 990, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 19 nucleotides or any multiple of 19 nucleotides (e.g., 19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247, 266, 285, 304, 323, 342, 361, 380, 399, 418, 437, 456, 475, 494, 513, 532, 551, 570, 589, 608, 627, 646, 665, 684, 703, 722, 741, 760, 779, 798, 817, 836, 855, 874, 893, 912, 931, 950, 969, 988, 1007), and the length in nucleotides of another region corresponding to an exon from the gene is independently 19 nucleotides or any multiple of 19 nucleotides (e.g., 19, 38, 57, 76, 95, 114, 133, 152, 171, 190, 209, 228, 247, 266, 285, 304, 323, 342, 361, 380, 399, 418, 437, 456, 475, 494, 513, 532, 551, 570, 589, 608, 627, 646, 665, 684, 703, 722, 741, 760, 779, 798, 817, 836, 855, 874, 893, 912, 931, 950, 969, 988, 1007).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 20 nucleotides or any multiple of 20 nucleotides (e.g., 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600, 620, 640, 660, 680, 700, 720, 740, 760, 780, 800, 820, 840, 860, 880, 900, 920, 940, 960, 980, 1000), and the length in nucleotides of another region corresponding to an exon from the gene is independently 20 nucleotides or any multiple of 20 nucleotides (e.g., 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420, 440, 460, 480, 500, 520, 540, 560, 580, 600, 620, 640, 660, 680, 700, 720, 740, 760, 780, 800, 820, 840, 860, 880, 900, 920, 940, 960, 980, 1000).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 21 nucleotides or any multiple of 21 nucleotides (e.g., 21, 42, 63, 84, 105, 126, 147, 168, 189, 210, 231, 252, 273, 294, 315, 336, 357, 378, 399, 420, 441, 462, 483, 504, 525, 546, 567, 588, 609, 630, 651, 672, 693, 714, 735, 756, 777, 798, 819, 840, 861, 882, 903, 924, 945, 966, 987, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 21 nucleotides or any multiple of 21 nucleotides (e.g., 21, 42, 63, 84, 105, 126, 147, 168, 189, 210, 231, 252, 273, 294, 315, 336, 357, 378, 399, 420, 441, 462, 483, 504, 525, 546, 567, 588, 609, 630, 651, 672, 693, 714, 735, 756, 777, 798, 819, 840, 861, 882, 903, 924, 945, 966, 987, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 22 nucleotides or any multiple of 22 nucleotides (e.g., 22, 44, 66, 88, 110, 132, 154, 176, 198, 220, 242, 264, 286, 308, 330, 352, 374, 396, 418, 440, 462, 484, 506, 528, 550, 572, 594, 616, 638, 660, 682, 704, 726, 748, 770, 792, 814, 836, 858, 880, 902, 924, 946, 968, 990, 1012), and the length in nucleotides of another region corresponding to an exon from the gene is independently 22 nucleotides or any multiple of 22 nucleotides (e.g., 22, 44, 66, 88, 110, 132, 154, 176, 198, 220, 242, 264, 286, 308, 330, 352, 374, 396, 418, 440, 462, 484, 506, 528, 550, 572, 594, 616, 638, 660, 682, 704, 726, 748, 770, 792, 814, 836, 858, 880, 902, 924, 946, 968, 990, 1012).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 23 nucleotides or any multiple of 23 nucleotides (e.g., 23, 46, 69, 92, 115, 138, 161, 184, 207, 230, 253, 276, 299, 322, 345, 368, 391, 414, 437, 460, 483, 506, 529, 552, 575, 598, 621, 644, 667, 690, 713, 736, 759, 782, 805, 828, 851, 874, 897, 920, 943, 966, 989, 1012), and the length in nucleotides of another region corresponding to an exon from the gene is independently 23 nucleotides or any multiple of 23 nucleotides (e.g., 23, 46, 69, 92, 115, 138, 161, 184, 207, 230, 253, 276, 299, 322, 345, 368, 391, 414, 437, 460, 483, 506, 529, 552, 575, 598, 621, 644, 667, 690, 713, 736, 759, 782, 805, 828, 851, 874, 897, 920, 943, 966, 989, 1012).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 24 nucleotides or any multiple of 24 nucleotides (e.g.,
24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 264, 288, 312, 336, 360, 384, 408, 432, 456, 480, 504, 528, 552, 576, 600, 624, 648, 672, 696, 720, 744, 768, 792, 816, 840, 864, 888, 912, 936, 960, 984, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 24 nucleotides or any multiple of 24 nucleotides (e.g., 24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 264, 288, 312, 336, 360, 384, 408, 432, 456, 480, 504, 528, 552, 576, 600, 624, 648, 672, 696, 720, 744, 768, 792, 816, 840, 864, 888, 912, 936, 960, 984, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 25 nucleotides or any multiple of 25 nucleotides (e.g.,
25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975, 1000), and the length in nucleotides of another region corresponding to an exon from the gene is independently 25 nucleotides or any multiple of 25 nucleotides (e.g., 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975, 1000).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 26 nucleotides or any multiple of 26 nucleotides (e.g.,
26, 52, 78, 104, 130, 156, 182, 208, 234, 260, 286, 312, 338, 364, 390, 416, 442, 468, 494, 520, 546, 572, 598, 624, 650, 676, 702, 728, 754, 780, 806, 832, 858, 884, 910, 936, 962, 988, 1014), and the length in nucleotides of another region corresponding to an exon from the gene is independently 26 nucleotides or any multiple of 26 nucleotides (e.g., 26, 52, 78, 104, 130, 156, 182, 208, 234, 260, 286, 312, 338, 364, 390, 416, 442, 468, 494, 520, 546, 572, 598, 624, 650, 676, 702, 728, 754, 780, 806, 832, 858, 884, 910, 936, 962, 988, 1014).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 27 nucleotides or any multiple of 27 nucleotides (e.g., 27, 54, 81, 108, 135, 162, 189, 216, 243, 270, 297, 324, 351, 378, 405, 432, 459, 486, 513, 540, 567, 594, 621, 648, 675, 702, 729, 756, 783, 810, 837, 864, 891, 918, 945, 972, 999, 1026), and the length in nucleotides of another region corresponding to an exon from the gene is independently 27 nucleotides or any multiple of 27 nucleotides (e.g., 27, 54, 81, 108, 135, 162, 189, 216, 243, 270, 297, 324, 351, 378, 405, 432, 459, 486, 513, 540, 567, 594, 621, 648, 675, 702, 729, 756, 783, 810, 837, 864, 891, 918, 945, 972, 999, 1026).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 28 nucleotides or any multiple of 28 nucleotides (e.g., 28, 56, 84, 112, 140, 168, 196, 224, 252, 280, 308, 336, 364, 392, 420, 448, 476, 504, 532, 560, 588, 616, 644, 672, 700, 728, 756, 784, 812, 840, 868, 896, 924, 952, 980, 1008), and the length in nucleotides of another region corresponding to an exon from the gene is independently 28 nucleotides or any multiple of 28 nucleotides (e.g., 28, 56, 84, 112, 140, 168, 196, 224, 252, 280, 308, 336, 364, 392, 420, 448, 476, 504, 532, 560, 588, 616, 644, 672, 700, 728, 756, 784, 812, 840, 868, 896, 924, 952, 980, 1008).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 29 nucleotides or any multiple of 29 nucleotides (e.g., 29, 58, 87, 116, 145, 174, 203, 232, 261, 290, 319, 348, 377, 406, 435, 464, 493, 522, 551, 580, 609, 638, 667, 696, 725, 754, 783, 812, 841, 870, 899, 928, 957, 986, 1015), and the length in nucleotides of another region corresponding to an exon from the gene is independently 29 nucleotides or any multiple of 29 nucleotides (e.g., 29, 58, 87, 116, 145, 174, 203, 232, 261, 290, 319, 348, 377, 406, 435, 464, 493, 522, 551, 580, 609, 638, 667, 696, 725, 754, 783, 812, 841, 870, 899, 928, 957, 986, 1015).
In some embodiments, the length in nucleotides of a region corresponding to an exon from the gene (e.g., EX1, among others described herein) is 30 nucleotides or any multiple of 30 nucleotides (e.g., 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360, 390, 420, 450, 480, 510, 540, 570, 600, 630, 660, 690, 720, 750, 780, 810, 840, 870, 900, 930, 960, 990, 1020), and the length in nucleotides of another region corresponding to an exon from the gene is independently 30 nucleotides or any multiple of 30 nucleotides (e.g., 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360, 390, 420, 450, 480, 510, 540, 570, 600, 630, 660, 690, 720, 750, 780, 810, 840, 870, 900, 930, 960, 990, 1020).
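The enumerated lengths in the preceding embodiments all follow one rule: each exon-derived region has a length equal to a chosen unit or to a whole multiple of that unit. The following is a minimal Python sketch of that rule. The function names `multiples_up_to` and `is_allowed_length` are illustrative and do not appear in the specification, and the upper bound is an arbitrary parameter chosen by the caller.

```python
def multiples_up_to(unit: int, upper: int) -> list[int]:
    """Return unit, 2*unit, 3*unit, ... for every multiple not exceeding upper."""
    if unit <= 0:
        raise ValueError("unit must be a positive integer")
    return list(range(unit, upper + 1, unit))


def is_allowed_length(length: int, unit: int) -> bool:
    """True if length equals unit or a whole multiple of unit (unit assumed positive)."""
    return length >= unit and length % unit == 0


if __name__ == "__main__":
    print(multiples_up_to(30, 150))
    print(is_allowed_length(1008, 9))
```

Because the lengths of the two regions are chosen independently, the check would be applied to each region separately.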
In some embodiments, the polynucleotide is incorporated into a vector and delivered to a cell, wherein the cell may be a human cell.
In some embodiments, the cell includes an endogenous gene including the following exons linked operably in a 5'-to-3' direction: EX1, an exon having a mutation relative to the wild type form of the gene, and EX2. The gene may also contain one or more additional exons 5' to EX1, 3' to EX2, or in the intervening sequence between EX1 and EX2.
In some embodiments, the mutation is a point mutation resulting in a frameshift, or a point mutation that generates a PTC, resulting in a nonsense mutation. In some embodiments, the mutation is an insertion or deletion resulting in a frameshift. In some embodiments, the mutation is a duplication or deletion of a region including a full-length exon, such that the duplication or deletion results in a frameshift.
In some embodiments, the gene is the DMD gene, which codes for the protein dystrophin, and the mutation is a point mutation. In some embodiments, the exon having the mutation may be selected from the group consisting of exon 2, exons 6-8, exon 17, exons 19-20, exon 35, exons 43-59, and exons 69-70. For instance, in some embodiments, the exon bearing the mutation in the DMD gene is exon 2. In some embodiments, the exon bearing the mutation in the DMD gene is exon 6. In some embodiments, the exon bearing the mutation in the DMD gene is exon 7. In some embodiments, the exon bearing the mutation in the DMD gene is exon 8. In some embodiments, the exon bearing the mutation in the DMD gene is exon 17. In some embodiments, the exon bearing the mutation in the DMD gene is exon 19. In some embodiments, the exon bearing the mutation in the DMD gene is exon 20. In some embodiments, the exon bearing the mutation in the DMD gene is exon 35. In some embodiments, the exon bearing the mutation in the DMD gene is exon 43. In some embodiments, the exon bearing the mutation in the DMD gene is exon 44. In some embodiments, the exon bearing the mutation in the DMD gene is exon 45. In some embodiments, the exon bearing the mutation in the DMD gene is exon 46. In some embodiments, the exon bearing the mutation in the DMD gene is exon 47. In some embodiments, the exon bearing the mutation in the DMD gene is exon 48. In some embodiments, the exon bearing the mutation in the DMD gene is exon 49. In some embodiments, the exon bearing the mutation in the DMD gene is exon 50. In some embodiments, the exon bearing the mutation in the DMD gene is exon 51. In some embodiments, the exon bearing the mutation in the DMD gene is exon 52. In some embodiments, the exon bearing the mutation in the DMD gene is exon 53. In some embodiments, the exon bearing the mutation in the DMD gene is exon 54. In some embodiments, the exon bearing the mutation in the DMD gene is exon 55. In some embodiments, the exon bearing the mutation in the DMD gene is exon 56. In some embodiments, the exon bearing the mutation in the DMD gene is exon 57. In some embodiments, the exon bearing the mutation in the DMD gene is exon 58. In some embodiments, the exon bearing the mutation in the DMD gene is exon 59. In some embodiments, the exon bearing the mutation in the DMD gene is exon 69. In some embodiments, the exon bearing the mutation in the DMD gene is exon 70. In some embodiments, in the wild-type DMD gene, EX1 is 5' to the mutation-bearing exon and/or EX2 is 3' to the mutation-bearing exon. In some embodiments, the mutation-bearing exon is not present in the polynucleotide.
In some embodiments, the gene is the DMD gene and the mutation is a duplication or deletion of a region including a full-length exon, such that the duplication or deletion results in a frameshift. In some embodiments, alternative splicing, for instance, by way of exon skipping, of one or more exons may restore the downstream reading frame. When the gene is DMD, in some embodiments, the exon to be skipped, thereby restoring the downstream reading frame, is selected from the group consisting of exon 2, exons 6-8, exon 17, exons 19-20, exon 35, exons 43-59, and exons 69-70. For instance, in some embodiments, the exon to be skipped is exon 2. In some embodiments, the exon to be skipped is exon 6. In some embodiments, the exon to be skipped is exon 7. In some embodiments, the exon to be skipped is exon 8. In some embodiments, the exon to be skipped is exon 17. In some embodiments, the exon to be skipped is exon 19. In some embodiments, the exon to be skipped is exon 20. In some embodiments, the exon to be skipped is exon 35. In some embodiments, the exon to be skipped is exon 43. In some embodiments, the exon to be skipped is exon 44. In some embodiments, the exon to be skipped is exon 45. In some embodiments, the exon to be skipped is exon 46. In some embodiments, the exon to be skipped is exon 47. In some embodiments, the exon to be skipped is exon 48. In some embodiments, the exon to be skipped is exon 49. In some embodiments, the exon to be skipped is exon 50. In some embodiments, the exon to be skipped is exon 51. In some embodiments, the exon to be skipped is exon 52. In some embodiments, the exon to be skipped is exon 53. In some embodiments, the exon to be skipped is exon 54. In some embodiments, the exon to be skipped is exon 55. In some embodiments, the exon to be skipped is exon 56. In some embodiments, the exon to be skipped is exon 57. In some embodiments, the exon to be skipped is exon 58. In some embodiments, the exon to be skipped is exon 59. In some embodiments, the exon to be skipped is exon 69. In some embodiments, the exon to be skipped is exon 70.
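Whether skipping an exon restores the downstream reading frame can be illustrated with simple arithmetic: the frame is preserved when the net change in transcript length is a whole multiple of three nucleotides. The following Python sketch shows that check. The function names and the exon lengths are hypothetical placeholders chosen for illustration and are not taken from the specification.

```python
def frame_offset(length_changes: list[int]) -> int:
    """Net reading-frame offset (0, 1, or 2) after a set of length changes.

    Deleted or skipped exons contribute negative lengths; duplicated
    exons contribute positive lengths.
    """
    return sum(length_changes) % 3


def restores_reading_frame(length_changes: list[int]) -> bool:
    """True if the downstream reading frame is unchanged (offset of 0)."""
    return frame_offset(length_changes) == 0


# Hypothetical exon lengths in nucleotides (illustrative values only).
EXON_A_LEN = 109  # exon removed by the mutation
EXON_B_LEN = 233  # neighboring exon that is a candidate for skipping

if __name__ == "__main__":
    print(restores_reading_frame([-EXON_A_LEN]))
    print(restores_reading_frame([-EXON_A_LEN, -EXON_B_LEN]))
```

In this sketch, deleting the first exon alone leaves a nonzero offset, while additionally skipping the second exon removes 342 nucleotides in total, which is divisible by three.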
In some embodiments, the mutation of the dystrophin gene may be related to a pathological state. The pathological state may be, for instance, Duchenne muscular dystrophy (DMD) or Becker muscular dystrophy (BMD).
In some embodiments, the polynucleotide is flanked by one or more cloning sites. In some embodiments, the polynucleotide is incorporated into a vector. In some embodiments, the vector is a viral vector selected from the group consisting of adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and vaccinia virus. In some embodiments, the viral vector including the polynucleotide is introduced into a cell. In some embodiments, the viral vector is an AAV selected from the group consisting of the AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, and AAVrh74 serotypes. In some embodiments, the viral vector is a pseudotyped AAV selected from rAAV2/8 or rAAV2/9.
In some embodiments, the polynucleotide is introduced into a cell by electroporation, or by contacting the cell with a cationic polymer, a cationic lipid, or calcium phosphate.
Definitions
As used herein, the term "about" refers to a value that is within 10% above or below the value being described.
As used herein, the term "affinity" refers to the strength of a binding interaction between two molecules, such as the interaction between two polynucleotides. The term "Kd", as used herein, is intended to refer to the dissociation constant, which can be obtained, for example, from the ratio of the rate constant for the dissociation of the two molecules (kd) to the rate constant for the association of the two molecules (ka) and is expressed as a molar concentration (M). Kd values for the interaction between two molecules, such as the interaction between two polynucleotides, can be determined, for example, using methods established in the art. Methods that can be used to determine the Kd of a receptor-ligand interaction include surface plasmon resonance, e.g., through the use of a biosensor system such as a BIACORE® system, as well as calorimetry procedures, including isothermal titration calorimetry techniques described herein or known in the art.
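The ratio that defines Kd can be written out directly. The following Python sketch computes a dissociation constant from the two rate constants named above. The function name, parameter names, and numerical values are illustrative assumptions, not values from the specification.

```python
def dissociation_constant(k_dissociation: float, k_association: float) -> float:
    """Kd (in M) as the ratio kd / ka.

    k_dissociation: dissociation rate constant kd, in 1/s
    k_association:  association rate constant ka, in 1/(M*s)
    """
    if k_association <= 0:
        raise ValueError("association rate constant must be positive")
    return k_dissociation / k_association


if __name__ == "__main__":
    # Illustrative values only: kd = 1e-3 1/s, ka = 1e5 1/(M*s)
    print(dissociation_constant(1e-3, 1e5))
```

A smaller Kd corresponds to a stronger binding interaction, since dissociation is slow relative to association.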
As used herein, the terms "attenuated expression," "attenuated expression level," or "attenuated levels," refer to a decreased expression or decreased level of a gene (e.g., a wild type gene) in a cell (e.g., a mammalian cell, e.g., a human cell) relative to a control cell, such as a cell that does not contain a polynucleotide having a +1 frameshift mutation with respect to the gene (e.g., the wild type gene).
As used herein, the term "basal level" in the context of gene expression refers to an endogenous expression level of an encoded protein that is above the limit of detection of a conventional protein detection assay. Conventional protein detection assays include, without limitation, ELISA-based assays, immunoblot assays (e.g., Western blot assays), mass spectrometry (such as, for instance, matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry and electrospray ionization (ESI) mass spectrometry), and spectroscopic methods, such as nuclear magnetic resonance (NMR), infrared (IR) spectroscopy, and UV-Vis spectroscopy, among others.
As used herein, the term "cell type" refers to a group of cells sharing a phenotype that is statistically separable based on gene expression data. For instance, cells of a common cell type may share similar structural and/or functional characteristics, such as similar gene activation patterns and antigen presentation profiles. Cells of a common cell type may include those that are isolated from a common tissue (e.g., epithelial tissue, neural tissue, connective tissue, or muscle tissue) and/or those that are isolated from a common organ, tissue system, blood vessel, or other structure and/or region in an organism.
As used herein, the term "complementary RNA" (cRNA) refers to a complementary RNA strand produced by an RNA-dependent RNA polymerase from a pre-mRNA template.
As used herein, the term "CpG" refers to a site within a nucleic acid molecule in which a cytosine nucleoside is bound to a guanosine nucleoside by way of a phosphodiester bond. The term "CpG content" in the context of a polynucleotide refers to the relative quantity of CpG sites within a nucleic acid molecule.
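As an illustration of how CpG sites might be tallied, the following Python sketch counts occurrences of a C immediately followed by a G in a sequence written 5'-to-3'. The definition above does not fix a denominator for "relative quantity", so normalizing by the number of dinucleotide positions is an assumption made here, and the function names are hypothetical.

```python
def cpg_count(sequence: str) -> int:
    """Number of CpG sites (C immediately followed by G, read 5'-to-3')."""
    return sequence.upper().count("CG")


def cpg_content(sequence: str) -> float:
    """CpG sites per dinucleotide position (assumed normalization)."""
    positions = len(sequence) - 1
    if positions <= 0:
        return 0.0
    return cpg_count(sequence) / positions


if __name__ == "__main__":
    print(cpg_count("ACGCGT"))
    print(cpg_content("ACGCGT"))
```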
As used herein, the term "endogenous" describes a molecule (e.g., a polypeptide, nucleic acid, or cofactor) that is found naturally in a particular organism (e.g., a human) or in a particular location within an organism (e.g., an organ, a tissue, or a cell, such as a human cell).
As used herein, the term "exogenous" describes a molecule (e.g., a polypeptide, nucleic acid, or cofactor) that is not found naturally in a particular organism (e.g., a human) or in a particular location within an organism (e.g., an organ, a tissue, or a cell, such as a human cell). Exogenous materials include those that are provided from an external source to an organism or to cultured matter extracted therefrom.
As used herein, the term "exon" refers to a region within the coding region of a gene, the nucleotide sequence of which determines the amino acid sequence of the corresponding protein. The term exon also refers to the corresponding region of the RNA transcribed from a gene. A gene, as defined above, may contain, for instance, a minimum of three exons separated by intervening introns. Exons are transcribed into pre-mRNA, and may be included in the mature mRNA depending on the alternative splicing of the gene. Exons that are included in the mature mRNA following processing are translated into protein, wherein the sequence of the exon determines the amino acid composition of the protein.
As used herein, the term "exon skipping" refers to a form of alternative splicing wherein one or more exons are excluded from the resulting mature mRNA. Exon skipping may occur as a result of endogenous regulation, or may be induced in a pre-determined manner by the presence of a polynucleotide.
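Exon skipping as defined above can be modeled as joining the exons of a transcript while leaving out the skipped ones. The following Python sketch shows this under simplifying assumptions: exons are represented as ordered name and sequence pairs, introns are taken as already removed, and the names and sequences are hypothetical.

```python
def mature_mrna(exons: list[tuple[str, str]], skipped: frozenset = frozenset()) -> str:
    """Join exon sequences in 5'-to-3' order, excluding any skipped exons."""
    return "".join(seq for name, seq in exons if name not in skipped)


if __name__ == "__main__":
    # Hypothetical exon names and sequences, for illustration only.
    transcript = [("exon_a", "AUGGCC"), ("exon_b", "GGA"), ("exon_c", "UAA")]
    print(mature_mrna(transcript))
    print(mature_mrna(transcript, frozenset({"exon_b"})))
```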
As used herein, the terms "a first region corresponding to a first exon from a gene" and "EX1" are used interchangeably and refer to a region that corresponds to any region of an exon greater than 20 nucleotides in length directly upstream of the splice donor site, wherein the region may be from any exon of the gene. For example, a first region corresponding to a first exon from a gene may be any full-length exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.). Alternately, a first region corresponding to a first exon from a gene may be a truncated portion of any exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.).
As used herein, the terms "a second region corresponding to a second exon from a gene" and "EX2" are used interchangeably and refer to a region that corresponds to any region of an exon greater than 20 nucleotides in length directly downstream of the splice acceptor site, wherein the region may be from any exon of the gene. For example, a second region corresponding to a second exon from a gene may be any full-length exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.). Alternately, a second region corresponding to a second exon from a gene may be a truncated portion of any exon from a gene (e.g., exon 1, exon 2, exon 3, exon 4, or exon 5, etc.).
As used herein, the terms "a third region corresponding to a third exon from a gene" and subsequent enumerations (e.g., "EX3," "EX4," "EX5," "EX6," "EX7," "EX8," "EX9," "EX10," and "EX11," etc.) refer to a region that corresponds to any region of an exon greater than 20 nucleotides in length directly downstream of the splice donor site, wherein the region may be from any exon of the gene, as described for EX1 and EX2 herein.
As used herein, the term "intron" refers to a region within the coding region of a gene, the nucleotide sequence of which is not translated into the amino acid sequence of the corresponding protein. The term intron also refers to the corresponding region of the RNA transcribed from a gene. When used in the context of alternative RNA splicing, such as exon skipping, the term "gene" refers to a gene as defined below that contains at minimum two introns, each of which forms the intervening sequence between two exons. Introns are transcribed into pre-mRNA, but are removed during processing, and are not included in the mature mRNA.
As used herein, the terms "a second region corresponding to an intron" and "INTR1," as well as subsequent enumerations (e.g., "INTR2," "INTR3," "INTR4," "INTR5," "INTR6," "INTR7," "INTR8," "INTR9," and "INTR10," etc.), refer to any polynucleotide region that includes both a splice donor and splice acceptor, such that when a polynucleotide including EX1-INTR1-EX2, operably linked 5'-to-3', is expressed, INTR1 allows for splicing of the pre-mRNA such that the mature mRNA contains EX1 directly spliced to EX2 in a 5'-to-3' orientation. In some embodiments, INTR1 corresponds to a full-length intron from a gene. In some embodiments, INTR1 corresponds to a truncated intron from the gene. Where INTR1 is derived from a gene, INTR1 may be from the same gene as EX1 and EX2, or may be from a gene that does not contain EX1 and EX2. In some embodiments, INTR1 is a synthetic intron, not isolated from any known gene. In some embodiments, INTR1 includes the following elements linked operably in the 5'-to-3' direction: a 5' region that includes a splice donor site; optionally, an intervening region; and a 3' region that includes a splice acceptor site.
As used herein, the term "frameshift" refers to a change in how a ribosome defines a codon within a gene, and therefore defines the reading frame of translation. The identity of each amino acid in a protein is determined by a three-nucleotide codon, which is defined by the ribosome through binding of the anticodon of a tRNA to its complementary sequence on an mRNA. Therefore, any occurrence of a frameshift may alter the amino acid sequence of the translated protein both at and downstream of the location of the frameshift.
As used herein, the term "GC content" refers to the quantity of nucleosides in a particular nucleic acid molecule, such as a DNA or RNA polynucleotide, that are either guanosine (G) or cytidine (C) relative to the total quantity of nucleosides present in the nucleic acid molecule. GC content may be expressed as a percentage, for instance, according to the following formula:
GC Content = (((Total quantity of guanosine nucleosides) + (Total quantity of cytidine nucleosides)) / (Total quantity of nucleosides)) x 100
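The GC content formula can be illustrated with a minimal Python sketch. The function name `gc_content` and the example sequence are assumptions made for illustration only; the sketch simply counts G and C characters in a plain-text sequence and does not handle ambiguity codes.

```python
def gc_content(sequence: str) -> float:
    """Return GC content as a percentage of all nucleosides in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # (G count + C count) / total count, expressed as a percentage
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq) * 100


# Illustrative sequence: 4 of its 6 nucleosides are G or C.
print(round(gc_content("ATGGCC"), 2))
```

The same function applies to DNA or RNA sequences, since only G and C are counted in the numerator.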
As used herein, the term "gene" refers to a region of DNA that encodes a protein. A gene may include regulatory regions and a protein-coding region. In some embodiments, a gene includes two or more introns and three or more exons, wherein each intron forms an intervening sequence between two exons.
As used herein, the term "gene expression profile" refers to a measurement of the level of one or more genes expressed by an individual cell, a cell type, or by a plurality of cells present in a homogeneous or heterogeneous mixture of cell types. The expression level of a gene expressed by a single cell, type of cell, or plurality of cells can be ascertained, for example, by inferring the quantity or concentration of RNA transcripts, such as mRNA, present in a sample that correspond to the gene of interest. Determining the abundance of RNA transcripts in a sample can be performed using established techniques known in the art, including quantitative reverse-transcription polymerase chain reaction (qRT-PCR) assays as described herein. Additional techniques useful for determining RNA transcript levels include RNA sequencing assays (RNA-Seq) as described herein. Gene expression levels can also be determined by measuring the quantity of protein produced from translation of such transcripts, for instance, using an enzyme-linked immunosorbent assay (ELISA) format in conjunction with one or more antibodies that specifically bind the protein of interest. A gene expression profile can also include information regarding the relative expression levels of various genes within a cell, cell type, or plurality of cells present in a homogeneous or heterogeneous mixture of cell types. For instance, a gene expression profile may include a ranking of the expression levels of a plurality of genes in a particular cell or cell type. The genes may be ranked, for example, in order based on their relative expression levels, such that genes that are expressed in higher quantities are assigned a higher rank and genes that are expressed in lower quantities are assigned a lower rank. A gene expression profile may also include numerical values describing the differences in the expression levels of various genes. For instance, a gene expression profile may include one or more metrics (for instance, a gene expression value based on mRNA expression, such as a numerical value obtained from an RNA-Seq or qRT-PCR assay described herein) that indicate the fold differences between the expression levels of various genes in the cell, cell type, or population of cells of interest. An exemplary gene expression profile may contain, for instance, a set of 3 or more genes (for instance, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, or 100 or more genes) ranked based on relative expression level in the cell, cell type, or population of cells of interest as well as a corresponding series of numerical values obtained from a gene expression assay described herein or known in the art. These values can reflect the relative differences between the expression levels of the genes. For instance, these values can reflect that, e.g., the first ranked gene in the expression profile is expressed at a 10-fold higher level relative to the second, third, fourth, or fifth genes in the expression profile. A detailed description of the gene expression assays that can be used to quantitatively describe gene expression is provided herein. Additional gene expression assays that generate quantitative measures of gene expression are known in the art.
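A ranked gene expression profile with fold differences, as described above, might be assembled along the following lines. This is a minimal sketch: the function name `expression_profile`, the gene names, and the expression values are hypothetical and are not taken from any assay described herein.

```python
def expression_profile(levels: dict[str, float]) -> list[tuple[int, str, float, float]]:
    """Rank genes by expression level (rank 1 = highest) and report how many
    fold lower each gene is expressed relative to the top-ranked gene."""
    if not levels or any(value <= 0 for value in levels.values()):
        raise ValueError("expression levels must be a non-empty set of positive values")
    ranked = sorted(levels.items(), key=lambda item: item[1], reverse=True)
    top_level = ranked[0][1]
    return [
        (rank, gene, level, top_level / level)
        for rank, (gene, level) in enumerate(ranked, start=1)
    ]


# Hypothetical expression values (arbitrary units), invented for this example.
profile = expression_profile({"GENE_B": 50.0, "GENE_A": 500.0, "GENE_C": 25.0})
for rank, gene, level, fold in profile:
    print(rank, gene, level, fold)
```

In this toy profile the first-ranked gene is expressed at a 10-fold higher level than the second-ranked gene, mirroring the example given in the definition.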
As used herein, the term "genetic frameshift" refers to any mutation that results in a frameshift. A deletion or insertion within an exon, wherein the deletion or insertion is not a multiple of three nucleotides, may result in a frameshift. Additionally, a duplication or deletion of an entire exon, wherein the number of nucleotides in the exon is not a multiple of three, may result in a frameshift.
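The multiple-of-three rule in this definition can be expressed as a short check. The sketch below is illustrative only; the function name `shifts_reading_frame` and the example lengths are assumptions, and the check considers only the length of the inserted, deleted, or duplicated region.

```python
def shifts_reading_frame(length_change_nt: int) -> bool:
    """True if inserting, deleting, or duplicating this many nucleotides
    would change the downstream reading frame."""
    if length_change_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    # Only changes that are multiples of three preserve the frame.
    return length_change_nt % 3 != 0


# Hypothetical events, chosen only to illustrate the rule.
events = [("1-nt deletion", 1), ("3-nt insertion", 3), ("deleted exon of 212 nt", 212)]
for label, length in events:
    print(label, shifts_reading_frame(length))
```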
As used herein, the term "homopolymer" in the context of a polynucleotide refers to a nucleic acid containing three or more (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 20, or more) continuous repeats of a single nucleotide. Exemplary homopolymers include, e.g., poly adenosine phosphate (e.g., AAA, AAAA, AAAAA, AAAAAA, and the like), poly guanosine phosphate (e.g., GGG, GGGG, GGGGG, GGGGGG, and the like), poly cytidine phosphate (e.g., CCC, CCCC, CCCCC, CCCCCC, and the like), and poly thymidine phosphate (e.g., TTT, TTTT, TTTTT, TTTTTT, and the like), among others.
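Homopolymers of the kind defined above can be located in a sequence with a regular expression. The following is a minimal sketch; the function name `find_homopolymers` and the example sequence are assumptions for illustration.

```python
import re


def find_homopolymers(sequence: str, min_length: int = 3) -> list[tuple[int, str]]:
    """Return (0-based start, run) for each run of a single repeated
    nucleotide that is at least `min_length` long."""
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # One nucleotide followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGTU])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(0)) for m in pattern.finditer(sequence.upper())]


# The CC pair is too short to count; AAAA and GGG are reported.
print(find_homopolymers("ATGAAAACCGGGT"))
```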
As used herein, the term "IC50" refers to the concentration of a substance (antagonist) that reduces the efficacy of a reference agonist or the constitutive activity of a biological target by 50%, for instance, as measured in a competitive ligand binding assay or in a cell-based functional assay.
As used herein, the term "intron" refers to a region within the coding region of a gene, the nucleotide sequence of which is not translated into the amino acid sequence of the corresponding protein. The term intron also refers to the corresponding region of the RNA transcribed from a gene. In some embodiments, a gene, for example, may contain at minimum two introns, each of which forms the intervening sequence between two exons. Introns are transcribed into pre-mRNA, but are removed during processing, and are not included in the mature mRNA.
As used herein, the term "low abundance" refers to the level of gene expression that is relatively less, for example, than the average (e.g., mean or median or an approximation thereof) of detectable genes expressed in a library, for instance, using standard gene detection techniques known in the art, such as qPCR or RNA-Seq. In some instances, a gene expressed at "low abundance" is generally expressed at levels among the bottom half (e.g., bottom half, bottom third, bottom quarter, bottom 10%, bottom 5%, bottom 1%, bottom 0.1%, or lower) of the transcriptome.
As used herein, the term "minimize" or "minimization" refers to a process by which the lowest attainable value of a quantity is computed, subject to one or more constraints. The quantity may be, for example, the percent identity of a polynucleotide, such as an RNA polynucleotide, with respect to another polynucleotide that is naturally expressed in a particular cell or cell type. In the case of sequence identity minimization, the constraints in place may require that the sequence identity of one polynucleotide be minimized with respect to a plurality of other polynucleotides, such as the set of polynucleotides expressed naturally in a particular cell or cell type. The constraints in place may also require, for instance, that the polynucleotide being optimized retain the ability to encode a particular protein or polypeptide of interest, as assessed, for example, by translation of the nucleic acid sequence of the polynucleotide.
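Constrained minimization of sequence identity can be sketched by brute force for a very short peptide. Everything in the example below is an assumption made for illustration: the function names, the three-residue peptide, the two reference sequences, and the use of ungapped position-by-position identity. The codon table is a small subset of the standard genetic code, and exhaustive enumeration would not scale to full-length genes.

```python
from itertools import product

# Subset of the standard genetic code (RNA codons) covering the toy peptide only.
SYNONYMOUS_CODONS = {
    "M": ["AUG"],
    "K": ["AAA", "AAG"],
    "F": ["UUU", "UUC"],
}


def identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a) * 100


def minimize_identity(peptide: str, references: list[str]) -> tuple[str, float]:
    """Among all synonymous coding sequences for `peptide`, pick the one whose
    highest identity to any reference sequence is lowest."""
    candidates = ("".join(codons) for codons in
                  product(*(SYNONYMOUS_CODONS[aa] for aa in peptide)))
    scored = ((seq, max(identity(seq, ref) for ref in references))
              for seq in candidates)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical "endogenous" transcripts, invented for this example.
references = ["AUGAAAUUU", "AUGAAGUUU"]
best_seq, best_score = minimize_identity("MKF", references)
print(best_seq, round(best_score, 1))
```

Because every candidate is built from synonymous codons, the constraint that the polynucleotide still encodes the peptide of interest holds by construction.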
As used herein, the term "modified nucleotide" refers to a nucleotide or portion thereof (e.g., adenosine, guanosine, thymidine, cytidine, or uridine) that has been altered by one or more enzymatic or synthetic chemical transformations. Exemplary alterations observed in modified nucleotides described herein or known in the art include the introduction of chemical substituents, such as halo, thio, amino, azido, alkyl, acyl, or other functional groups at one or more positions (e.g., the 2', 3', and/or 5' position) of a 2-deoxyribonucleotide or a ribonucleotide.
As used herein, the terms "mRNA," "mature mRNA," and "processed mRNA" are used interchangeably and refer to an RNA transcript of a gene after processing, wherein processing includes any of the following: addition of a 5' cap, splicing, and addition of a poly(A)-tail. The resulting mRNA, mature mRNA, or processed mRNA does not contain introns, and contains only the exons determined by the splicing pattern.
As used herein, the term "mutation" refers to any change in the sequence of a gene, such that the sequence is not identical to that of the wild type gene. A mutation may be selected from the group including a single nucleotide point mutation that results in an amino acid change, a single nucleotide point mutation that results in a premature termination codon, a single nucleotide insertion, a single nucleotide deletion, the insertion of two or more contiguous nucleotides, the deletion of two or more contiguous nucleotides, the duplication of a contiguous region within a gene, or the deletion of a contiguous region within a gene. A mutated gene may include a single mutation, or multiple mutations. A mutation may occur in any region of the gene.
As used herein, the terms "nucleic acid" and "polynucleotide" are used interchangeably and refer to polymers of nucleotides of any length. Examples of polynucleotides are DNA polynucleotides and RNA polynucleotides.
As used herein, the term "operably linked" in the context of a nucleic acid refers to a nucleic acid that is placed into a structural or functional relationship with another nucleic acid. For example, one segment of DNA may be operably linked to another segment of DNA if they are positioned relative to one another on the same contiguous DNA molecule and have a structural or functional relationship, such as a promoter or enhancer that is positioned relative to a coding region so as to facilitate transcription of the coding region. In other examples, the operably linked nucleic acids are not contiguous, but are positioned in such a way that they have a functional relationship with each other as nucleic acids or as proteins that are expressed by them. Enhancers, for example, do not have to be contiguous. Linking may be accomplished by ligation at convenient restriction sites or by using synthetic oligonucleotide adaptors or linkers.
As used herein, the term "percent (%) sequence identity" refers to the percentage of amino acid (or nucleic acid) residues of a candidate sequence that are identical to the amino acid (or nucleic acid) residues of a reference sequence after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity (e.g., gaps can be introduced in one or both of the candidate and reference sequences for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software, such as BLAST, ALIGN, or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. For example, a reference sequence aligned for comparison with a candidate sequence may show that the candidate sequence exhibits from 50% to 100% sequence identity across the full length of the candidate sequence or a selected portion of contiguous amino acid (or nucleic acid) residues of the candidate sequence. The length of the candidate sequence aligned for comparison purposes may be, for example, at least 30% (e.g., 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100%) of the length of the reference sequence. When a position in the candidate sequence is occupied by the same amino acid residue as the corresponding position in the reference sequence, then the molecules are identical at that position.
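Once two sequences have been aligned, percent sequence identity can be computed as in the following minimal sketch. The function name `percent_identity`, the gap character, and the example alignment are assumptions; the alignment itself is presumed to have been produced beforehand by a tool such as those named above. The denominator here is the number of candidate residues, following the wording of the definition, although alignment programs may report identity over other denominators.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str, gap: str = "-") -> float:
    """Percentage of candidate residues identical to the aligned reference residue."""
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    candidate = aligned_candidate.upper()
    reference = aligned_reference.upper()
    # Gap positions in the candidate are not candidate residues.
    residues = sum(1 for c in candidate if c != gap)
    if residues == 0:
        raise ValueError("candidate contains no residues")
    matches = sum(1 for c, r in zip(candidate, reference) if c != gap and c == r)
    return matches / residues * 100


# Hypothetical pre-aligned pair: 5 of the 6 candidate residues match.
print(round(percent_identity("ATG-CCA", "ATGGCTA"), 2))
```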
As used herein, the term "pharmaceutical composition" refers to a mixture containing a therapeutic compound to be administered to a subject, such as a mammal, e.g., a human, in order to prevent, treat or control a particular disease or condition affecting or that may affect the subject.
As used herein, the term "pharmaceutically acceptable" refers to those compounds, materials, compositions and/or dosage forms, which are suitable for contact with the tissues of a subject, such as a mammal (e.g., a human) without excessive toxicity, irritation, allergic response, or other problems or complications, commensurate with a reasonable benefit/risk ratio.
As used herein, the terms "pioneer round" and "pioneer translation" are used interchangeably and refer to the initial round of translation during which complementary RNA (cRNA) is synthesized, for instance, according to the Reading Frame Surveillance model. In eukaryotes, pioneer translation is expected to be coupled to short complementary RNA (scRNA) synthesis such that each region of a cRNA that is complementary to an exon is cleaved to generate a series of scRNAs of uniform length (e.g., from 9-30 nucleotides in length), spanning that exon.
As used herein, the terms "premature termination codon" (PTC) and "nonsense mutation" refer to a point mutation which results in a stop codon being present in the mRNA upstream of a wild type stop codon. When a premature termination codon occurs, the resulting protein is truncated at its C-terminus.
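A premature termination codon can be located by scanning a coding sequence codon by codon. The sketch below assumes, for illustration, that the input begins at the start codon and that its final codon is the wild type stop codon; the function name `find_premature_stop` and both example sequences are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_premature_stop(coding_sequence: str):
    """Return the 0-based codon index of the first in-frame stop codon that
    lies upstream of the final codon, or None if there is none."""
    rna = coding_sequence.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore any trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is assumed to be the wild type stop, so it is excluded.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


print(find_premature_stop("AUGGCUCAAGGGUGA"))  # no premature stop
print(find_premature_stop("AUGGCUUAAGGGUGA"))  # CAA -> UAA point mutation
```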
The term "pre-mRNA," as used herein, refers to an RNA transcript of a gene prior to processing, wherein processing includes any of the following: addition of a 5' cap, splicing, and addition of a poly(A)-tail. Pre-mRNA may contain both introns and exons.
The term "promoter," as used herein, refers to a region within the regulatory region of a gene that enables initiation of the transcription of a gene into a messenger RNA, wherein transcription is initiated with the binding of an RNA polymerase on or nearby the promoter.
The term "reading frame" refers to how a ribosome defines a codon within a gene, as determined by binding of a tRNA to a three-nucleotide codon during translation of an mRNA. When a reading frame is "restored," this indicates that the reading frame had first been altered by a frameshift or nonsense mutation, and subsequently returned to the original reading frame by some means.
The term "ribosomal frameshift," as used herein, refers to a frameshift that occurs when the ribosome shifts in the 5' or 3' direction, with respect to the mRNA, during translation, resulting in a frameshift that is not a result of a genetic mutation. A "+1" or "plus 1" ribosomal frameshift refers to a frameshift in which the ribosome shifts 1 nucleotide position towards the 3' end of the mRNA, wherein a single nucleotide is not recognized by a tRNA. A "-1" or "minus 1" ribosomal frameshift refers to a frameshift in which the ribosome shifts 1 nucleotide position towards the 5' end of the mRNA, wherein a single nucleotide is recognized twice by a tRNA.
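The effect of a +1 or -1 ribosomal frameshift on the codons that are read can be shown with a toy model. The sketch below is not a model of the ribosome; the function name `read_codons`, its parameters, and the example mRNA are assumptions for illustration.

```python
def read_codons(mrna: str, shift_after: int = 0, shift: int = 0) -> list[str]:
    """Read an mRNA in triplets; after `shift_after` codons, move the reading
    position by `shift` nucleotides (+1 toward the 3' end, -1 toward the 5' end)."""
    if shift not in (-1, 0, 1):
        raise ValueError("shift must be -1, 0, or +1")
    codons = []
    position = 0
    while position + 3 <= len(mrna):
        codons.append(mrna[position:position + 3])
        position += 3
        if shift and len(codons) == shift_after:
            position += shift  # +1 skips a nucleotide, -1 re-reads one
    return codons


mrna = "AUGGCUAAAGGG"
print(read_codons(mrna))                           # in frame
print(read_codons(mrna, shift_after=2, shift=+1))  # one nucleotide is skipped
print(read_codons(mrna, shift_after=2, shift=-1))  # one nucleotide is read twice
```

In the -1 case of this toy sequence, the re-read nucleotide happens to create an in-frame UAA codon, which illustrates how a shifted frame can change the sequence at and downstream of the shift.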
As used herein, the term "RNA equivalent" of a gene refers to a RNA polynucleotide that corresponds to a DNA polynucleotide that encodes the gene, such as a RNA transcript obtainable by transcription of a DNA polynucleotide that contains the gene. Exemplary RNA equivalents include mRNA transcripts produced synthetically, such as by way of solid phase nucleic acid synthesis techniques known in the art and/or described herein, as well as by recombinant nucleic acid preparation methods.
As used herein, the term "RNA sequencing assay" or "RNA-Seq" refers to an assay that can be used to determine the presence and/or quantity of one or more RNA transcripts (e.g., hnRNA or alternatively spliced mRNA) within a sample of RNA molecules. Such RNA sequencing assays may include high-throughput sequencing methods as known in the art. Exemplary methods for conducting RNA-Seq assays are described herein.
As used herein, the term "sample" refers to a specimen (e.g., blood, blood component (e.g., serum or plasma), urine, saliva, amniotic fluid, cerebrospinal fluid, tissue (e.g., placental or dermal), pancreatic fluid, chorionic villus sample, and cells) isolated from a subject.
As used herein, the phrases "specifically binds" and "binds" refer to a binding reaction which is determinative of the presence of a particular molecule, such as a polynucleotide, in a heterogeneous population of polynucleotides and other biological molecules that is recognized, e.g., by a ligand, such as a complementary polynucleotide, with particularity. A ligand (e.g., a complementary polynucleotide) that specifically binds to a protein will bind to the protein, e.g., with a KD of less than 100 nM. For example, a ligand that specifically binds to a protein may bind to the protein with a KD of up to 100 nM (e.g., between 1 pM and 100 nM). A ligand that does not exhibit specific binding to another molecule or a domain thereof may exhibit a KD of greater than 100 nM (e.g., greater than 200 nM, 300 nM, 400 nM, 500 nM, 600 nM, 700 nM, 800 nM, 900 nM, 1 μM, 100 μM, 500 μM, or 1 mM) for that particular molecule or domain thereof. A variety of assay formats may be used to determine the affinity of a ligand for a specific protein. For example, solid-phase ELISA assays are routinely used to identify ligands that specifically bind a target protein. See, e.g., Harlow & Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1988) and Harlow & Lane, Using Antibodies, A Laboratory Manual, Cold Spring Harbor Press, New York (1999), for a description of assay formats and conditions that can be used to determine specific protein binding.
As used herein, the terms "subject" and "patient" are interchangeable and refer to an organism that receives treatment for a particular disease or condition as described herein.
As used herein, the term "short complementary RNA" (scRNA) refers to the RNA fragments generated by cleavage of the cRNA at set intervals. Cleavage occurs in the region or regions of the cRNA that are complementary to the exonic region of the pre-mRNA or mRNA. The resulting scRNAs are uniform with respect to the length in nucleotides of the individual scRNA fragments, wherein the length of each scRNA is 9-30 nucleotides (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides). According to the RFS model, the cleavage of cRNA to scRNA occurs during the pioneer round of translation.
As used herein, the term "truncated" refers to a shortened form of a polynucleotide or protein relative to the corresponding wild type polynucleotide or protein, respectively. Truncated polynucleotides may refer to the truncated form of any of the following: a gene, a pre-mRNA, an mRNA, an intron, or an exon. A truncated polynucleotide may be missing a region at its 5' end, at its 3' end, an internal region, or some combination thereof. A truncated protein may be missing a region at its N-terminus, at its C-terminus, an internal region, or some combination thereof.
As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a "plasmid," which refers to a circular double stranded DNA loop into which additional DNA segments may be ligated. Another type of vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Certain vectors are capable of directing the expression of genes to which they are operably linked.
As used herein, the term "wild-type" refers to a genotype with the highest frequency for a particular gene in a given organism.
Wherein a range of values is provided, it is considered to include both the upper and lower bound of the range.
Brief Description of the Figures
Figure 1A is a schematic showing the relative positioning of the A site and P site tRNA anticodon stems when the tRNA sequences are properly aligned to the mRNA template. Figure 1B is a schematic showing the relative positioning of the A site and P site tRNA anticodon stems when the tRNA sequences are improperly aligned such that the A site tRNA attempts to anneal to the mRNA template in a -1 frameshifted fashion. Steric clashes between the adjacent tRNA molecules in the A site and P site hinder the simultaneous binding of the two tRNA molecules to a single mRNA codon. Figure 1C is a schematic showing the relative positioning of the A site and P site tRNA anticodon stems when the tRNA sequences are improperly aligned such that the A site tRNA attempts to anneal to the mRNA template in a +1 frameshifted fashion. Unlike the instance portrayed in Figure 1B, no steric clashes are present to hinder the binding of the A site tRNA to a downstream codon.
Figure 2A is a schematic portraying the formation of short complementary RNA (scRNA) fragments during translation of an mRNA template according to the Reading Frame Surveillance model. According to this model, a series of scRNA fragments are produced by an RNA-dependent RNA polymerase during pioneer translation. The scRNA fragments persist following the initial translation cycle and may anneal to daughter mRNA templates, providing an epigenetic memory of previous translations. Figure 2B is a schematic showing a mechanism by which the ribosome senses the distance between the 3' terminal end of a scRNA fragment and the mRNA-bound tRNA molecules. Figure 2C is a schematic showing the events that occur once a +1 frameshift is sensed by the ribosome during translation of an mRNA template. The sensing of +1 frameshifts during the translation process can lead to repeated cleavage of the mRNA transcript that may result in terminal 2-base pair 3' overhangs, which can facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells.
Figure 3 is a schematic portraying the Reading Frame Surveillance model applied to expression of a eukaryotic gene, wherein a pre-mRNA transcript contains both introns and exons, and cRNA fragments define the intron-exon boundaries. A cRNA complementary to the full-length pre-mRNA is produced in the nucleus. Cleavage of the cRNA to generate scRNA fragments occurs during the pioneer round of translation, also in the nucleus. Formation of scRNA fragments by cleavage of the cRNA occurs at set uniform intervals at the regions of the cRNA that are complementary to the exonic regions of the pre-mRNA. Splicing then joins these scRNA-bound exonic regions, juxtaposing the bordering scRNAs and demarcating their junction with a signaling complex that can adjust the register of RFS monitoring during translation downstream of the junction. After all introns have been removed, and the message contains an acceptable contiguous reading frame that is uniformly decorated with scRNAs, it may be exported to the cytoplasm and translated into protein by the ribosome.
Figure 4 is a map of an exemplary polynucleotide construct capable of inducing exon skipping by RFS. In this example, the polynucleotide is designed to induce skipping of exons 52 and 53 of the dystrophin gene, and may be used to restore dystrophin function in the mdx4cv mouse, which bears a mutation in exon 53. The EX1-INTR1-EX2 region of the polynucleotide is indicated in pink, where EX1 is Exon 51 or a 3' portion of Exon 51 of the dystrophin gene; INTR1 includes both a splice donor (SD) and a splice acceptor (SA) necessary to enable splicing of EX1 to EX2; and EX2 is Exon 54 or a 5' portion of Exon 54 of the dystrophin gene. The polynucleotide comprising EX1-INTR1-EX2 may be operably linked to a promoter and incorporated into a vector. In this example, the expression of EX1-INTR1-EX2 has been placed under the control of regulatory elements for the expression of human desmin protein (hDes). Additionally, a sequence coding for green fluorescent protein (GFP) has been fused to the C-terminal end of Exon 54, in order to indicate and quantify expression of the protein product resulting from the construct. The resulting polynucleotide has then been incorporated into a vector, which may be delivered to a host, such as a DMD cell line or a DMD animal model, for example, the mdx4cv mouse.
Detailed Description
I. Codon Optimization
The compositions and methods described herein can be used to optimize the nucleic acid sequence of a gene or RNA equivalent thereof encoding a protein of interest so as to achieve, for instance, enhanced expression of the protein in a particular cell type. For example, using the compositions and methods described herein, genes and RNA equivalents thereof can be optimized for tissue-specific expression of an encoded protein. Genes and RNA equivalents thereof optimized using the compositions and methods described herein can be synthesized by chemical synthesis techniques and may be amplified, for instance, using polymerase chain reaction (PCR)-based amplification methods or by transfection of the gene into a cell, such as a bacterial cell or mammalian cell capable of replicating exogenous nucleic acids.
The genes and RNA equivalents described herein can have important clinical utility. A variety of diseases and conditions, including heritable genetic disorders, are manifestations of a deficiency in a native protein. With the advent of gene therapy, a wide array of vectors and gene delivery techniques have been developed for the introduction of exogenous protein-coding nucleic acids into target cells (e.g., human cells). However, there remains a need for a unified set of guidelines one can follow in order to optimize the sequence of the exogenous nucleic acid so as to achieve robust and stable expression of the encoded protein in the cell of interest.
A. Methods of Codon Optimization Using Reading Frame Surveillance
The present invention is based in part on the hypothesis that, during translation of an endogenous mRNA template, short complementary RNA (scRNA) fragments of from about 9 to about 30 nucleotides in length that anneal to the mRNA transcript are generated by an RNA-dependent RNA polymerase. It is hypothesized that these fragments guide the proper alignment of the ribosome to the mRNA template by preventing downstream (+1) frameshifts from occurring during the translation process. As shown in Figure 1B, upstream (-1) frameshifting of the ribosome during translation of an mRNA construct is sterically prohibited, as this would require two adjacent tRNA molecules to be bound to the same base on the mRNA target. In contrast, +1 frameshifting of the ribosome could occur without such steric hindrance whenever a one-nucleotide translocation allows for binding of the A-site tRNA in a +1 shifted position relative to the previous codon (Figure 1C).
Primitive ribosomes have been argued to have been more error prone (Woese, Proc. Natl. Acad. Sci. U.S.A. 54:1546-1552 (1965)), increasing the challenge of controlling mRNA positions and translocation to limit +1 frameshifting. Combined with the suggestion that genomic RNA replication could cause primordial translation templates to be double stranded (Zenkin, J. Mol. Evol. 74:249-256 (2012)), it is hypothesized that a primordial ribosome could have used the complementary RNA (cRNA) strand to measure the template RNA during translation to facilitate recognition and response to frameshifting events. The Reading Frame Surveillance (RFS) model on which the present invention is based in part postulates that measured cleavage of an annealed cRNA generated during the first round of translation creates a molecular ruler that, in subsequent rounds of translation, allows continuous monitoring of the progression of the mRNA through the ribosome with respect to the frame being read. An idealized schematic outlining this model is shown in Figure 2A, which demonstrates the primordial translation of a 42-codon mRNA, first in the pioneer round of translation and later when the same message is translated repetitively. A key element of this model is that during the pioneer round, translation of a uniform set of codons is associated with cleavage of the cRNA template at a set distance from the A-site, which results in the mRNA being decorated with scRNA oligonucleotides. The 7-codon length of the scRNA fragments diagrammed in Figure 2 represents one possible length for these fragments.
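The idealized case of a 42-codon mRNA decorated with 7-codon scRNA fragments can be mimicked with a toy calculation. The sketch below only illustrates the uniform tiling hypothesized by the RFS model; it does not model any enzymatic mechanism. The function name `tile_scrnas` and the example mRNA sequence are assumptions, and the mRNA length is required to be an exact multiple of the fragment length for simplicity.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def tile_scrnas(mrna: str, codons_per_fragment: int = 7) -> list[str]:
    """Cut the strand complementary to `mrna` at uniform intervals and return
    the fragments, each written 5'-to-3', in mRNA 5'-to-3' order."""
    step = 3 * codons_per_fragment
    if not mrna or len(mrna) % step:
        raise ValueError("mRNA length must be a non-zero multiple of the fragment length")
    fragments = []
    for start in range(0, len(mrna), step):
        segment = mrna[start:start + step]
        # Reverse complement of the mRNA segment that the fragment anneals to.
        fragments.append(segment.translate(COMPLEMENT)[::-1])
    return fragments


# Hypothetical 42-codon message: a start codon followed by 41 GCU codons.
mrna = "AUG" + "GCU" * 41
fragments = tile_scrnas(mrna)
print(len(fragments), {len(f) for f in fragments})
```

With these inputs the 126-nucleotide message is covered by six fragments of 21 nucleotides each, a length that falls within the 9 to 30 nucleotide range given for scRNAs.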
When a previously measured mRNA is subsequently translated, the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosome active sites. Figure 2B presents one possible mechanism by which this sensing event occurs. The ribosome senses the 5' end of a scRNA and, upon mRNA translocation to restore complete scRNA pairing with its complementary sequence on the mRNA, confirms that translation of the previous segment occurred without frameshifting. Figure 2C presents a schematic of a +1 frameshifted complex, which increases the distance between the ribosome and the scRNA 5' end; this increase could be sensed by the ribosome to trigger alteration or abortion of the translation of that mRNA.
The RFS model additionally suggests that scRNA molecules can dissociate from the mRNA template that was translated during their biogenesis. These scRNA molecules are free to bind to daughter RNA molecules of the same polarity, for instance, to facilitate proper translation of these templates. In this way, the RFS model indicates that the scRNA molecules function as a form of epigenetic memory of previous translations, and facilitate the detection of reading frame errors occurring during genome replication. If, for instance, a single base pair deletion is created during the replication of an RNA molecule (e.g., a +1 frame-shifting event), and scRNA molecules derived from translation of an intact RNA decorate the new, mutated RNA, then translation downstream of the deletion would be sensed as a +1 frameshift at each scRNA every time it is read by the ribosome. This may trigger a severe response, such as cleavage of the RNA message. The ensuing cleavage could subsequently generate double stranded RNA molecules with terminal 2-base pair 3' overhangs (Figure 2C), which facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells (Elbashir et al., Genes Dev. 15:188-200 (2001)).
In eukaryotes, exons generally have elevated GC content relative to adjacent introns, and in higher eukaryotes this feature is even more pronounced in those regions of the genome with lower overall GC content and longer introns (Amit et al., Cell Reports 1:543-556 (2012) and Louie et al., Genome Res. 13:2594-2601 (2003)). This property of coding exons is frequently revealed in analyses of codon usage bias, in particular the GC content of the third, wobble position of codons. Codon usage bias is often attributed to selective pressure to maintain optimal translation rates in the context of charged tRNA abundances (Ikemura, Mol. Biol. Evol. 2:13-34 (1985)), but there are many possible mechanistic causes for altered use of synonymous codons (Hershberg and Petrov, Annu. Rev. Genet. 42:287-299 (2008) and Plotkin and Kudla, Nat. Rev. Genet. 12:32-42 (2011)), and recent descriptions of physiological alteration of tRNA profiles in cells (Pavon-Eternod et al., Nucleic Acids Res. 37:7268-7280 (2009)) raise the possibility that codon usage bias in transcribed genes is a cause, as opposed to a consequence, of tRNA abundance. The RFS model indicates that GC-rich coding exons bind to scRNA molecules with increased thermodynamic stability, which enhances translational fidelity and processivity. Thus, codon usage in eukaryotes can be influenced by reading frame surveillance, and can provide selective pressure to maintain high GC content in the wobble position. In support of this model, the severity of disease-causing mutations in the Factor IX gene has been correlated with the change in pairing free energy caused by each coding sequence mutation (Hamasaki-Katagiri et al., Haemophilia 18:933-940 (2012)).
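By way of a non-limiting illustration, the GC content of the wobble position discussed above can be computed directly from a coding sequence. The following is a minimal sketch; the function names `gc_content` and `gc3_content` are hypothetical and are not part of the disclosure.

```python
def gc_content(seq: str) -> float:
    """Fraction of all bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of codons whose third (wobble) base is G or C.

    Assumes `cds` is an in-frame coding sequence with no introns.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    wobble_bases = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in wobble_bases) / len(wobble_bases)
```

Comparing `gc3_content` for an exon with `gc_content` for the flanking introns is one simple way to quantify the exon/intron difference described above.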
The RFS model can thus be used to inform the design of codon-optimized genes for expression in a target cell, and may synergize with existing codon-optimization procedures to provide a single set of guidelines one can follow to design genes capable of inducing robust protein expression in a target cell of interest.
(i) Minimizing sequence identity to endogenous mRNA
The principle of reading frame surveillance is based in part on the postulate that scRNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated. The model indicates that these complementary RNA fragments can function in concert to promote the degradation of endogenous RNA that imperfectly aligns with the ribosome. As these endogenous RNA molecules are capable of persisting within the cytoplasm and may anneal, for instance, to RNA molecules encoding other proteins due to chance complementarity, a gene encoding a protein intended for expression in a target cell can be codon-optimized by incorporating codon replacements into the gene that minimize the sequence identity between RNA resulting from transcription of the gene of interest and the endogenous RNA molecules present within the cell. Since endogenous RNA is generated from transcription of genes that are actively expressed within the target cell, one can use gene expression techniques described herein, such as qPCR, RNA-Seq, and immunoblot assays, to assess which genes are expressed in a particular target cell and the extent to which these genes are expressed, so as to determine a panel of RNA molecules against which the sequence identity of the RNA transcript of the gene of interest can be compared.
Since the coding strand of the DNA gene corresponds directly to the sequence of the ensuing mRNA template, codon substitutions can be incorporated into the coding strand of the gene that minimize the sequence identity of the coding strand (and thus, the subsequent mRNA transcript) with endogenous RNA transcripts present in the cell, such as endogenous RNA transcripts corresponding to those genes that are expressed in higher abundance than other genes in the cell (e.g., endogenous RNA transcripts corresponding to those genes that are expressed 5-fold higher, 10-fold higher, 15-fold higher, 20-fold higher, 25-fold higher, 30-fold higher, 35-fold higher, 40-fold higher, 45-fold higher, 50-fold higher, 100-fold higher, 200-fold higher, 300-fold higher, 400-fold higher, 500-fold higher, 600-fold higher, 700-fold higher, 800-fold higher, 900-fold higher, 1,000-fold higher, or more, such as RNA transcripts encoding proteins that are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%, or more, of all proteins expressed by the cell).
Due to their elevated expression levels, genes that are expressed in higher quantities are expected to have higher quantities of corresponding scRNA fragments within the cell of interest. Since endogenous scRNA fragments exist only for those genes that are actively transcribed in a cell of interest, it is not necessary to minimize the sequence identity of the target gene relative to the entirety of the genome. If a gene's expression level is not among, for example, the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the cell, cell type, or population of cells of interest (e.g., the cell, cell type, or population of cells into which a codon-optimized nucleic acid is to be delivered to achieve enhanced gene expression), the sequence identity of the target gene need not be minimized relative to those genes that have lower gene expression (for instance, those genes whose expression levels are not among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the cell, cell type, or population of cells of interest). Likewise, if a gene is not expressed above a basal level in the target cell under investigation (for instance, if expression of the gene cannot be detected using a gene expression technique described herein), the sequence identity of the target gene need not be minimized relative to the unexpressed gene.
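As a non-limiting sketch of how such a reference panel might be assembled, the following selects the genes whose measured expression falls within a chosen top percentile. The function name, the `min_level` detection threshold, and the choice to compute the percentile over detected genes only are illustrative assumptions rather than requirements of the methods described herein.

```python
import math


def top_expressed_genes(expression: dict[str, float],
                        top_percent: float,
                        min_level: float = 0.0) -> list[str]:
    """Return the genes in the top `top_percent` of expression, highest first.

    `expression` maps a gene identifier to an expression measurement
    (e.g., a normalized RNA-Seq value). Genes at or below `min_level`
    are treated as not expressed and are excluded before ranking.
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in the interval (0, 100]")
    detected = {gene: level for gene, level in expression.items() if level > min_level}
    ranked = sorted(detected, key=detected.get, reverse=True)
    n_keep = math.ceil(len(ranked) * top_percent / 100)
    return ranked[:n_keep]
```

For example, with four detected genes and `top_percent=50`, the two most highly expressed genes are returned, and only their coding strands would be used for the sequence identity comparison.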
To design a codon-optimized gene for enhanced expression in a target cell (e.g., a mammalian cell, such as a human cell), one can begin with a wild-type gene sequence or an already codon-optimized gene sequence encoding a protein of interest. Using the compositions and methods described herein, one of skill in the art can align the coding strand of the gene of interest, such as a gene encoding a therapeutic protein described herein, with the coding strands of genes expressed within the target cell, such as genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the cell. One can then introduce mutations into the gene of interest that minimize the sequence identity of the target gene relative to the genes expressed in the target cell, for instance, as described in Example 1, below. This can be done, for example, by incorporating codon substitutions into the gene that minimize the sequence identity of the target gene relative to other genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell while preserving the identity of the encoded protein. This can be done, for instance, by introducing single nucleotide substitutions at the third (wobble) position of one or more codons within the gene of interest that maintain the identity of the encoded protein by virtue of the degeneracy of the genetic code. That is, one selects only those wobble-position mutations for which the mutated codon encodes the same amino acid as the original codon.
Translation of those genes that are expressed at high levels (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell) is expected to result in the production of scRNA molecules, which may anneal imperfectly with the mRNA transcript of the gene of interest following transcription. Because incomplete sequence complementarity leads to imperfect alignment, annealing of the scRNA molecules can result in the formation of terminal 2-base pair 3' overhangs (e.g., as shown in Figure 2C), which facilitate loading of RNA fragments onto the RISC in eukaryotic cells, leading to abortion of translation and eventual degradation of the mRNA template. Thus, by minimizing the sequence identity of the target gene relative to the genes expressed at high levels in a particular target cell, the ability of mRNA resulting from transcription of the gene of interest to anneal to endogenous scRNA molecules is diminished, thereby reducing the likelihood of, for example, RISC-promoted mRNA degradation.
Single-nucleotide mutations that preserve the amino acid sequence of the encoded protein can be informed, for instance, by the standard genetic code, represented in Table 1, below, compiled by the National Center for Biotechnology Information, Bethesda, Maryland, USA.
[Table 1: standard genetic code; table image not reproduced]
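For illustration only, the standard genetic code can be represented programmatically and queried for substitutions that preserve the encoded amino acid. The sketch below builds the 64-codon table from the conventional T, C, A, G ordering; the names `CODON_TABLE`, `synonymous_codons`, and `wobble_synonyms` are hypothetical helpers introduced here and do not appear elsewhere in this disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" denotes a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon: str) -> list[str]:
    """All other codons that encode the same amino acid as `codon`."""
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon)


def wobble_synonyms(codon: str) -> list[str]:
    """Synonymous codons that differ from `codon` only at the third position."""
    codon = codon.upper()
    return [c for c in synonymous_codons(codon) if c[:2] == codon[:2]]
```

Codons for methionine (ATG) and tryptophan (TGG) have no synonymous alternatives, so positions encoding those residues cannot be altered without changing the protein.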
The codon-optimization process can be performed iteratively. For instance, one of skill in the art can begin with a wild-type gene sequence (e.g., excluding intronic DNA) and introduce substitutions into this sequence that reduce the sequence identity of the gene relative to the genes that are expressed within a target cell (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell). This process can be repeated until all codons within the gene of interest have been evaluated for the opportunity to introduce single-nucleotide substitutions that can reduce sequence identity relative to the genes expressed at high levels within the target cell. Alternatively, one can begin with a gene sequence that has previously been modified relative to the wild-type sequence of the gene, for instance, by incorporating codon substitutions that increase the GC content of the gene and/or that reduce CpG content of the gene relative to the wild-type sequence. The sequence of the resulting gene can subsequently be aligned to the coding strands of the genes expressed in a desired target cell (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell), and iterative codon substitutions can be introduced throughout the gene in order to minimize the sequence identity of the previously-modified gene with respect to the genes expressed at high levels within the target cell.
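The iterative procedure described above can be sketched, by way of example only, as a greedy search over synonymous codons. The sketch below makes several assumptions that are not drawn from this disclosure: sequence identity is approximated by counting k-mers of the candidate gene that also occur in the reference coding strands, the default k-mer length of 6 is arbitrary, and all function names are hypothetical. Other identity measures (e.g., alignment-based scores) could be substituted.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def kmers(seq: str, k: int) -> set[str]:
    """Set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(seq: str, reference_kmers: set[str], k: int) -> int:
    """Number of k-mer positions in `seq` whose k-mer occurs in the reference set."""
    return sum(seq[i:i + k] in reference_kmers for i in range(len(seq) - k + 1))


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using the standard genetic code."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def minimize_identity(cds: str, reference_cds: list[str],
                      k: int = 6, max_passes: int = 10) -> str:
    """Greedily swap synonymous codons to reduce k-mers shared with the references."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    reference_kmers: set[str] = set()
    for ref in reference_cds:
        reference_kmers |= kmers(ref.upper(), k)

    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    best = shared_kmer_count("".join(codons), reference_kmers, k)
    for _ in range(max_passes):
        improved = False
        for idx in range(len(codons)):
            amino_acid = CODON_TABLE[codons[idx]]
            alternatives = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
            for alt in alternatives:
                if alt == codons[idx]:
                    continue
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                score = shared_kmer_count("".join(trial), reference_kmers, k)
                if score < best:  # keep only strict improvements
                    codons, best, improved = trial, score, True
        if not improved:  # every codon evaluated with no further gain
            break
    return "".join(codons)
```

Because only synonymous codons are considered, the encoded amino acid sequence is unchanged at every step. The search stops once a full pass over all codons yields no further reduction.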
To further minimize sequence identity of a gene of interest with respect to genes expressed at elevated levels in the cell (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell), one of skill in the art can align the sequence of the coding strand of the gene of interest with the coding strands of genes expressed at elevated levels in the desired target cell and may introduce mutations, e.g., as described above, that minimize sequence identity of the gene of interest to the genes expressed at elevated levels within the target cell while preserving the encoded amino acid sequence. One can additionally introduce single nucleotide substitutions that result in conservative mutations, for instance, when these mutations do not alter protein function. Exemplary instances in which one may introduce conservative substitutions include instances in which the amino acid to be substituted is not present within the active site of an enzyme or a site on the protein required for non-covalent binding to another biological molecule.
As the RNA transcriptome varies among cells of different tissues, the compositions and methods described herein can be used to induce tissue-specific gene expression. For instance, after completing the codon-optimization process described above, one can subsequently align the coding strand of the codon-optimized gene with the coding strands of genes that are not expressed in the target cell (e.g., genes whose expression levels are not among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell or genes that are not expressed above a basal level in the target cell) but that are expressed (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more) in other cells within the same organism (e.g., a human patient suffering from a disease or condition characterized by a protein deficiency). One can then introduce single-nucleotide substitutions into the coding strand of the gene of interest that increase the sequence identity of the gene of interest relative to the genes expressed in non-target cells. According to the RFS model, translation of those genes expressed in non-target cells is expected to result in the production of scRNA molecules, which may anneal imperfectly with the mRNA transcript of a gene of interest following transcription. Because incomplete sequence complementarity leads to imperfect alignment, annealing of the scRNA molecules can result in the formation of terminal 2-base pair 3' overhangs (e.g., as shown in Figure 2C), which facilitate loading of RNA fragments onto the RISC in eukaryotic cells, leading to abortion of translation and eventual degradation of the mRNA template.
In this fashion, designed mutations that minimize the sequence identity of a gene of interest relative to the genes expressed in a target cell (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell) may synergize with designed single-nucleotide substitutions that increase the sequence identity of a gene of interest relative to genes expressed in cells aside from the target cell so as to promote tissue-specific expression of a desired gene.
(ii) Increasing GC content
In addition to the above process, one of skill in the art can design variants of the target gene that contain increased GC content and/or reduced CpG content relative to the wild-type gene so as to enhance the translation of the encoded protein. For instance, after enhancing the protein-encoding gene sequence by incorporating codon substitutions that minimize the sequence identity of the coding strand of the target gene relative to the coding strands of genes expressed within the target cell, one of skill in the art can subsequently modify the designed coding sequence so as to increase the GC content of the coding sequence while preserving the identity of the encoded amino acid sequence. According to the RFS model, the increase in GC content will lead to enhanced binding of the ensuing mRNA transcript with scRNA fragments that are formed during translation of the desired mRNA template. The binding of the target gene-specific scRNA molecules promotes improved licensing of the mRNA transcript for nuclear export and enhanced translation due to fewer instances of improper alignment (e.g., fewer instances of +1 frameshifting) of the ribosome to the mRNA template.
The codon-optimization process described herein can be done iteratively. For instance, one may introduce mutations that increase GC content into the gene that has already undergone the sequence identity minimization process described above. Alternatively, one may begin by introducing mutations that increase the GC content of a target gene and subsequently introduce mutations that minimize the sequence identity of a target gene relative to the genes expressed within a target cell.
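As one hedged illustration of a GC-increasing step, the sketch below replaces each codon with a synonymous codon of maximal GC count, retaining the original codon when it is already among the best. The function names are hypothetical, and this simplified example does not attempt to limit CpG content or to preserve a prior sequence identity minimization; in practice these objectives would be balanced against one another.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using the standard genetic code."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_count(seq: str) -> int:
    """Number of G or C bases in `seq`."""
    return sum(base in "GC" for base in seq)


def maximize_gc(cds: str) -> str:
    """Replace each codon with a synonymous codon having the highest GC count."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        candidates = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        # Prefer higher GC; on a tie, prefer leaving the original codon in place.
        optimized.append(max(candidates, key=lambda c: (gc_count(c), c == codon)))
    return "".join(optimized)
```

Running the GC step before or after the identity minimization step corresponds to the two orderings described above.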
(iii) Increasing quantity of high-frequency codons
In addition to the above processes, one of skill in the art can design variants of the target gene that contain greater quantities of high-frequency codons within the target organism of interest. For instance, after enhancing the protein-encoding gene sequence by incorporating codon substitutions that minimize the sequence identity of the coding strand of the target gene relative to the coding strands of genes expressed at high levels within the target cell, one of skill in the art can subsequently modify the designed coding sequence so as to increase the quantity of codons that frequently occur in endogenous genes within the target organism (e.g., a mammal, such as a human). As described above, codons that have increased GC content tend to be employed more frequently in protein-coding genes, which may be a manifestation of the improved thermodynamic stability of mRNA:scRNA duplexes formed following the pioneer round of translation. Using the methods described herein, one may introduce mutations that increase the content of high-frequency codons in a gene that has already undergone the sequence identity minimization process described above. Alternatively, one may begin by introducing mutations that increase the content of high-frequency codons in a target gene and subsequently introduce mutations that minimize the sequence identity of a target gene relative to the genes expressed within a target cell. A summary of the estimated codon frequency across the human genome is provided in Table 2, below, compiled by GENSCRIPT®.
[Table 2: estimated codon frequency across the human genome; table images not reproduced]
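By way of example, substitution toward high-frequency codons can be sketched as follows. Rather than reproducing the tabulated human frequencies, this illustration derives codon counts from a user-supplied set of reference coding sequences, which is an assumption made here for self-containment; a published frequency table could be supplied in the same `usage` mapping. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_usage(reference_cds: list[str]) -> Counter:
    """Count codon occurrences across in-frame reference coding sequences."""
    counts: Counter = Counter()
    for cds in reference_cds:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    return counts


def use_frequent_codons(cds: str, usage: Counter) -> str:
    """Replace each codon with the most frequently used synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        candidates = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        # Prefer higher usage; on a tie, prefer leaving the original codon in place.
        optimized.append(max(candidates, key=lambda c: (usage[c], c == codon)))
    return "".join(optimized)
```

A codon absent from the reference set has a count of zero, so it is replaced whenever any synonymous codon was observed at least once.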
(iv) Reducing CpG content and homopolymer content
One of skill in the art can manipulate the protein-encoding gene sequence of a target gene by incorporating codon substitutions that diminish the CpG content and/or homopolymer content of the gene. For instance, one can begin with a wild-type gene sequence and introduce substitutions (e.g., single-nucleotide substitutions) that reduce the CpG content and/or homopolymer content of the gene while preserving the identity of the encoded protein sequence. One can then follow the sequence identity minimization process described above and in Example 1 in order to obtain a gene sequence that minimally resembles the genes expressed in a cell type of interest. Alternatively, one can begin with a sequence that has been codon-optimized according to the sequence identity minimization process described above and subsequently introduce mutations (e.g., single-nucleotide substitutions) that reduce the CpG content and/or homopolymer content of the gene. CpG sites and homopolymers can promote +1 frameshifts during the mRNA translation process. Alternatively, if the homopolymer encodes amino acid residues that are not essential for protein function (for instance, if the encoded amino acids are not present within the active site of an encoded enzyme or within a site necessary for non-covalent binding to another biological molecule), one of skill in the art can incorporate codon substitutions that interrupt the homopolymer and that introduce a conservative substitution into the encoded protein at the site of the corresponding amino acid.
(v) Preparation of codon-optimized genes
Once designed, the final codon-optimized gene can be prepared, for instance, by solid phase nucleic acid procedures known in the art. For instance, to perform the chemical synthesis of nucleic acid molecules, such as DNA, RNA and the like, a solid phase synthesis process using a phosphoramidite method can be employed. According to this procedure, a nucleic acid is generally synthesized by the following steps.
First, a 5'-OH-protected nucleoside that will occur at the 3' terminal end of the nucleic acid to be synthesized is esterified via the 3'-OH function to a solid support by appending the nucleoside to a cleavable linker. Then, the support for solid phase synthesis on which the nucleoside is immobilized can be placed in a reaction column which is then set on an automated nucleic acid synthesizer.
Thereafter, an iterative synthetic process including the following steps can be performed in the reaction column according to a synthesis program of the automated nucleic acid synthesizer:
• (1) a step of deprotection of the 5'-OH moiety of the protected, immobilized nucleoside (e.g., with an acid such as trichloroacetic acid in dichloromethane solution or the like to remove acid-labile hydroxyl protecting groups);
• (2) a step of coupling a 5'-OH-protected nucleoside phosphoramidite with the deprotected 5'-OH group of the immobilized nucleoside in the presence of an activator (e.g., tetrazole or the like);
• (3) a step of capping the unreacted 5'-OH group of the 3'-terminal nucleoside (e.g., with acetic anhydride or the like); and
• (4) a step of oxidizing the immobilized phosphite substituent (e.g., with aqueous iodine or the like).
The above process can be repeated as needed, so that elongation of the nucleic acid in the 3'-to-5' direction is promoted and a nucleic acid having a desired sequence is synthesized.
Lastly, the cleavable linker is hydrolyzed (e.g., with aqueous ammonia, methylamine solution, or the like) to cleave the synthesized nucleic acid from the solid phase support. Procedures such as the foregoing for the chemical synthesis of nucleic acids are known in the art and are described, for instance, in US Patent No. 8,835,656, the disclosure of which is incorporated herein by reference as it pertains to protocols for the synthesis of nucleic acid molecules.
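The synthesis cycle described above can be outlined, purely for illustration, as a short program that lists the order in which nucleosides are added and estimates the fraction of full-length product. The per-step coupling efficiency is a hypothetical input that is assumed to be uniform across cycles; no particular value is stated in this disclosure, and the function names are likewise illustrative.

```python
CYCLE_STEPS = ("deprotect 5'-OH", "couple phosphoramidite", "cap unreacted 5'-OH", "oxidize phosphite")


def coupling_order(sequence_5_to_3: str) -> list[str]:
    """Order of nucleoside addition for 3'-to-5' synthesis.

    The first element is the 3'-terminal nucleoside immobilized on the
    support; each later element is added by one synthesis cycle.
    """
    return list(reversed(sequence_5_to_3.upper()))


def synthesis_program(sequence_5_to_3: str) -> list[tuple[str, str]]:
    """List (nucleoside, step) pairs for every cycle after immobilization."""
    order = coupling_order(sequence_5_to_3)
    return [(base, step) for base in order[1:] for step in CYCLE_STEPS]


def full_length_yield(length: int, coupling_efficiency: float) -> float:
    """Expected fraction of full-length product after (length - 1) couplings.

    Assumes the same coupling efficiency at every cycle.
    """
    if length < 1 or not 0 <= coupling_efficiency <= 1:
        raise ValueError("length must be >= 1 and efficiency within [0, 1]")
    return coupling_efficiency ** (length - 1)
```

Because the expected yield falls geometrically with length under this assumption, long codon-optimized genes are commonly assembled from shorter synthesized fragments rather than made in a single run.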
Additionally, the prepared gene can be amplified, for instance, using PCR-based techniques described herein or known in the art, and/or by transformation of DH5α E. coli with a plasmid containing the designed gene. The bacteria can subsequently be cultured so as to amplify the DNA therein, and the gene can be isolated by plasmid purification techniques known in the art, followed optionally by a restriction digest and/or sequencing of the plasmid to verify the identity of the codon-optimized gene.
B. Therapeutic Proteins
Genes that can be codon-optimized according to the methods described herein include genes encoding therapeutic proteins, such as those that can be transferred to a subject (e.g., a human patient) suffering from a disease or condition characterized by a deficiency in the protein. For instance, genes that can be codon-optimized and delivered to a patient according to the methods described herein include genes encoding hormones and growth and differentiation factors including, without limitation, insulin, glucagon, growth hormone (GH), parathyroid hormone (PTH), calcitonin, growth hormone releasing factor (GRF), thyroid stimulating hormone (TSH), adrenocorticotropic hormone (ACTH), prolactin, melatonin, vasopressin, β-endorphin, met-enkephalin, leu-enkephalin, prolactin-releasing factor, prolactin-inhibiting factor, corticotropin-releasing hormone, thyrotropin-releasing hormone (TRH), follicle stimulating hormone (FSH), luteinizing hormone (LH), chorionic gonadotropin (CG), vascular endothelial growth factor (VEGF), angiopoietins, angiostatin, endostatin, granulocyte colony stimulating factor (GCSF), erythropoietin (EPO), connective tissue growth factor (CTGF), basic fibroblast growth factor (bFGF), bFGF2, acidic fibroblast growth factor (aFGF), epidermal growth factor (EGF), transforming growth factor α (TGFα), platelet-derived growth factor (PDGF), insulin-like growth factors I and II (IGF-I and IGF-II), any one of the transforming growth factor β (TGFβ) superfamily comprising TGFβ, activins, inhibins, or any of the bone morphogenic proteins (BMP) BMPs 1-15, any one of the heregulin/neuregulin/ARIA/neu differentiation factor (NDF) family of growth factors, nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), neurotrophins NT-3, NT-4/5 and NT-6, ciliary neurotrophic factor (CNTF), glial cell line derived neurotrophic factor (GDNF), neurturin, persephin, agrin, any one of the family of semaphorins/collapsins, netrin-1 and netrin-2, hepatocyte growth factor (HGF), ephrins, noggin, sonic hedgehog and tyrosine hydroxylase.
Other useful genes that may be optimized and delivered according to the methods described herein include those that encode proteins that regulate the immune system including, without limitation, cytokines and lymphokines such as thrombopoietin (TPO), interleukins (IL) IL-1α, IL-1β, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14, IL-15, IL-16, and IL-17, monocyte chemoattractant protein (MCP-1), leukemia inhibitory factor (LIF), granulocyte-macrophage colony stimulating factor (GM-CSF), granulocyte colony stimulating factor (G-CSF), monocyte colony stimulating factor (M-CSF), Fas ligand, tumor necrosis factors α and β (TNFα and TNFβ), interferons (IFN) IFN-α, IFN-β, and IFN-γ, stem cell factor, and flk-2/flt3 ligand. Gene products produced by the immune system are also encompassed by this invention. These include, without limitation, immunoglobulins IgG, IgM, IgA, IgD and IgE, chimeric immunoglobulins, humanized antibodies, single chain antibodies, T cell receptors, chimeric T cell receptors, single chain T cell receptors, class I and class II MHC molecules, as well as engineered MHC molecules including single chain MHC molecules. Useful gene products also include complement regulatory proteins such as membrane cofactor protein (MCP), decay accelerating factor (DAF), CR1, CR2 and CD59.
Still other genes that may be optimized and delivered according to the methods described herein include those that encode any one of the receptors for the hormones, growth factors, cytokines, lymphokines, regulatory proteins and immune system proteins. Examples of such receptors include flt-1, flk-1, TIE-2; the trk family of receptors such as TrkA, MuSK, Eph, PDGF receptor, EGF receptor, HER2, insulin receptor, IGF-1 receptor, the FGF family of receptors, the TGFβ receptors, the interleukin receptors, the interferon receptors, serotonin receptors, α-adrenergic receptors, β-adrenergic receptors, the GDNF receptor, p75 neurotrophin receptor, among others. The invention encompasses receptors for extracellular matrix proteins, such as integrins, counter-receptors for transmembrane-bound proteins, such as intercellular adhesion molecules (ICAM-1, ICAM-2, ICAM-3 and ICAM-4), vascular cell adhesion molecules (VCAM), and selectins E-selectin, P-selectin and L-selectin. The invention encompasses receptors for cholesterol regulation, including the LDL receptor, HDL receptor, VLDL receptor, and the scavenger receptor. The invention encompasses the apolipoprotein ligands for these receptors, including ApoAI, ApoAIV and ApoE. The invention also encompasses gene products such as steroid hormone receptor superfamily including glucocorticoid receptors and estrogen receptors, Vitamin D receptors and other nuclear receptors.
In addition, useful gene products include antimicrobial peptides such as defensins and magainins, transcription factors such as jun, fos, max, mad, serum response factor (SRF), AP-1, AP-2, myb, MRG1, CREM, Alx4, FREAC1, NF-κB, members of the leucine zipper family, C2H4 zinc finger proteins, including Zif268, EGR1, EGR2, C6 zinc finger proteins, including the glucocorticoid and estrogen receptors, POU domain proteins, exemplified by Pit 1, homeodomain proteins, including HOX-1, basic helix-loop-helix proteins, including myc, MyoD and myogenin, ETS-box containing proteins, TFE3, E2F, ATF1, ATF2, ATF3, ATF4, ZF5, NFAT, CREB, HNF-4, C/EBP, SP1, CCAAT-box binding proteins, interferon regulation factor 1 (IRF-1), Wilms' tumor protein, ETS-binding protein, STAT, GATA-box binding proteins, e.g., GATA-3, and the forkhead family of winged helix proteins.
Other useful genes that may be optimized and delivered according to the methods described herein include those that encode carbamoyl synthetase I, ornithine transcarbamylase, arginosuccinate synthetase, arginosuccinate lyase, arginase, fumarylacetoacetate hydrolase, phenylalanine hydroxylase, alpha-1 antitrypsin, glucose-6-phosphatase, porphobilinogen deaminase, factor VII, factor VIII, factor IX, factor II, factor V, factor X, factor XII, factor XI, von Willebrand factor, superoxide dismutase, glutathione peroxidase and reductase, heme oxygenase, angiotensin converting enzyme, endothelin-1, atrial natriuretic peptide, pro-urokinase, urokinase, plasminogen activator, heparin cofactor II, activated protein C (Factor V Leiden), Protein C, antithrombin, cystathionine beta-synthase, branched chain ketoacid decarboxylase, albumin, isovaleryl-CoA dehydrogenase, propionyl CoA carboxylase, methyl malonyl CoA mutase, glutaryl CoA dehydrogenase, insulin, beta-glucosidase, pyruvate carboxylase, hepatic phosphorylase, phosphorylase kinase, glycine decarboxylase (also referred to as P-protein), H-protein, T-protein, Menkes disease protein, tumor suppressors (e.g., p53), cystic fibrosis transmembrane regulator (CFTR), the product of Wilson's disease gene PWD, Cu/Zn superoxide dismutase, aromatic amino acid decarboxylase, tyrosine hydroxylase, acetylcholine synthetase, prohormone convertases, protease inhibitors, lactase, lipase, trypsin, gastrointestinal enzymes including chymotrypsin, and pepsin, adenosine deaminase, α1 anti-trypsin, tissue inhibitor of metalloproteinases (TIMP), GLUT-1, GLUT-2, trehalose phosphate synthase, hexokinases I, II and III, glucokinase, any one or more of the individual chains or types of collagen, elastin, fibronectin, thrombospondin, vitronectin and tenascin, and suicide genes such as thymidine kinase and cytosine deaminase.
Other useful proteins include those involved in lysosomal storage disorders, including acid β-glucosidase, α-galactosidase A, α-L-iduronidase, iduronate sulfatase, lysosomal acid α-glucosidase, sphingomyelinase, hexosaminidase A, hexosaminidases A and B, arylsulfatase A, acid lipase, acid ceramidase, galactosylceramidase, α-fucosidase, α-, β-mannosidosis, aspartylglucosaminidase, neuraminidase, galactosylceramidase, heparan-N-sulfatase, N-acetyl-α-glucosaminidase, acetyl-CoA:α-glucosaminide N-acetyltransferase, N-acetylglucosamine-6-sulfate sulfatase, N-acetylgalactosamine-6-sulfate sulfatase, arylsulfatase B, β-glucuronidase and hexosaminidases A and B.
Other useful transgenes that may be optimized and delivered to a patient using the methods described herein include those that encode non-naturally occurring polypeptides, such as chimeric or hybrid polypeptides or polypeptides having a non-naturally occurring amino acid sequence containing insertions, deletions or amino acid substitutions. For example, single-chain engineered immunoglobulins could be useful in certain immunocompromised patients. Other useful proteins include truncated receptors which lack their transmembrane and cytoplasmic domain. These truncated receptors can be used to antagonize the function of their respective ligands by binding to them without concomitant signaling by the receptor. Other types of non-naturally occurring gene sequences include sense and antisense molecules and catalytic nucleic acids, such as ribozymes, which could be used to modulate expression of a gene.
Representative genes that can be codon-optimized and delivered to a patient according to the methods described herein include those listed in Table 3, below.
Table 3. Exemplary disorders associated with a gene deficiency
[Table 3 is reproduced as images in the original publication (imgf000057_0001 to imgf000060_0001).]
C. Methods of Measuring Gene Expression
The expression level of a gene expressed by a single cell or type of cell (e.g., a cell belonging to a certain tissue) can be ascertained, for example, by evaluating the concentration or relative abundance of RNA transcripts (e.g., mRNA) derived from transcription of a gene of interest. Additionally or alternatively, gene expression can be determined by evaluating the concentration or relative abundance of protein produced by transcription and translation of a gene of interest. Protein concentrations can also be assessed using functional assays, such as enzymatic assays or gene transcription assays in the event the gene of interest encodes an enzyme or a modulator of transcription, respectively. The sections that follow describe exemplary techniques that can be used to measure and rank the expression levels of genes in a cell, cell type, or population of cells of interest, for instance, at the level of a single cell or a population of cells. Expression of genes in a sample can be analyzed by a number of methodologies, many of which are known in the art and understood by the skilled artisan, including, but not limited to, nucleic acid sequencing, microarray analysis, proteomics, in situ hybridization (e.g., fluorescence in situ hybridization (FISH)), amplification-based assays, fluorescence activated cell sorting (FACS), northern analysis and/or PCR analysis of mRNAs.
(i) Nucleic acid detection
Nucleic acid-based datasets suitable for analysis of target cell-specific gene expression can have the form of a gene expression profile, which represents the identity of genes expressed in a cell of interest and the extent to which each gene is expressed, and which can be used to determine the ranked order of gene expression levels within a cell, cell type, or population of cells of interest. Such profiles may include whole transcriptome sequencing data (e.g., RNA-Seq data), panels of mRNAs, noncoding RNAs, or any other nucleic acid sequence that may be expressed from genomic DNA. Other nucleic acid datasets suitable for use with the methods described herein may include expression data collected by imaging-based techniques (e.g., Northern blotting or Southern blotting known in the art). Northern blot analysis is a conventional technique well known in the art and is described, for example, in Molecular Cloning, a Laboratory Manual, second edition, 1989, Sambrook, Fritsch, Maniatis, Cold Spring Harbor Press, 10 Skyline Drive, Plainview, NY 11803-2500, the disclosure of which is incorporated herein by reference. Typical protocols for evaluating the status of genes and gene products are found, for example, in Ausubel et al., eds., 1995, Current Protocols In Molecular Biology, Units 2 (Northern Blotting), 4 (Southern Blotting), 15 (Immunoblotting) and 18 (PCR Analysis), the disclosure of which is incorporated herein by reference.
Gene expression profiles to be analyzed in conjunction with the methods described herein may include, for example, microarray data or nucleic acid sequencing data produced by a sequencing method known in the art (e.g., Sanger sequencing and next-generation sequencing methods, also known as high-throughput sequencing or deep sequencing). Exemplary next generation sequencing technologies include, without limitation, Illumina sequencing, Ion Torrent sequencing, 454 sequencing, SOLiD sequencing, and nanopore sequencing platforms. Additional methods of sequencing known in the art can also be used. For instance, mRNA expression levels may be determined using RNA-Seq (e.g., as described in Mortazavi et al., Nat. Methods 5:621-628 (2008), the disclosure of which is incorporated herein by reference in its entirety). RNA-Seq is a robust technology known in the art for monitoring expression by directly sequencing the RNA molecules in a sample. Briefly, this methodology may involve fragmentation of RNA to an average length of 200 nucleotides, conversion to cDNA by random priming, and synthesis of double-stranded cDNA (e.g., using the Just cDNA DoubleStranded cDNA Synthesis Kit from Agilent Technology). Then, the cDNA is converted into a molecular library for sequencing by addition of sequence adapters for each library (e.g., from Illumina®/Solexa), and the resulting 50-100 nucleotide reads are mapped onto the genome.
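By way of illustration only, mapped read counts of the kind described above can be normalized and ranked as in the following minimal Python sketch. The sketch assumes the reads per kilobase per million mapped reads (RPKM) normalization; the gene names, read counts, and transcript lengths are hypothetical values chosen for the example, and the function names `rpkm` and `rank_genes` are not part of any existing software package.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Normalize raw mapped-read counts to RPKM.

    RPKM = reads * 1e9 / (total mapped reads * transcript length in bp).
    """
    total_reads = sum(read_counts.values())
    return {
        gene: count * 1e9 / (total_reads * gene_lengths_bp[gene])
        for gene, count in read_counts.items()
    }


def rank_genes(expression):
    """Return gene names ordered from highest to lowest expression."""
    return sorted(expression, key=expression.get, reverse=True)


# Hypothetical read counts and transcript lengths (illustrative only).
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths = {"geneA": 1000, "geneB": 1500, "geneC": 2000}

profile = rpkm(counts, lengths)
ranking = rank_genes(profile)
print(ranking)
```

The ranked list produced in this way is one possible form of the ranked order of gene expression levels referred to herein; other normalizations (e.g., transcripts per million) could be substituted.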
Gene expression levels may be determined using microarray-based platforms (e.g., single-nucleotide polymorphism (SNP) arrays), as microarray technology offers high resolution. Details of various microarray methods can be found in the literature. See, for example, US Patent No. 6,232,068 and Pollack et al., Nat. Genet. 23:41-46 (1999), the disclosures of each of which are incorporated herein by reference in their entirety. Using nucleic acid microarrays, mRNA samples are reverse transcribed and labeled to generate labeled cDNA probes. The probes can then hybridize to one or more complementary nucleic acids arrayed and immobilized on a solid support. The array can be configured, for example, such that the sequence and position of each member of the array is known. Hybridization of a labeled probe with a particular array member indicates that the sample from which the probe was derived expresses that gene. Expression level may be quantified according to the amount of signal detected from hybridized probe-sample complexes. A typical microarray experiment involves the following steps: 1) preparation of fluorescently labeled target from RNA isolated from the sample, 2) hybridization of the labeled target to the microarray, 3) washing, staining, and scanning of the array, 4) analysis of the scanned image and 5) generation of gene expression profiles. One example of a microarray processor is the Affymetrix GENECHIP® system, which is commercially available and comprises arrays fabricated by direct synthesis of oligonucleotides on a glass surface. Other systems may be used as known to one skilled in the art.
Amplification-based assays also can be used to measure the expression level of one or more markers (e.g., genes). In such assays, the nucleic acid sequences of the gene act as a template in an amplification reaction (for example, PCR, such as qPCR). In a quantitative amplification, the amount of amplification product is proportional to the amount of template in the original sample. Comparison to appropriate controls provides a measure of the expression level of the gene, corresponding to the specific probe used, according to the principles described herein. Methods of real-time qPCR using TaqMan probes are well known in the art. Detailed protocols for real-time qPCR are provided, for example, in Gibson et al., Genome Res. 6:995-1001 (1996) and in Heid et al., Genome Res. 6:986-994 (1996), the disclosures of each of which are incorporated herein by reference. Levels of gene expression as described herein can be determined by RT-PCR technology. Probes used for PCR may be labeled with a detectable marker, such as, for example, a radioisotope, fluorescent compound, bioluminescent compound, a chemiluminescent compound, metal chelator, or enzyme.
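As one non-limiting illustration of comparison to appropriate controls, relative expression may be estimated from qPCR threshold cycle (Ct) values using the widely used comparative Ct (2^-ΔΔCt) calculation. This calculation is an assumption introduced here for illustration and is not prescribed by the methods described herein; it further assumes an amplification efficiency near 100% (a doubling per cycle). The function name and the Ct values below are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control,
                        efficiency=2.0):
    """Fold change of a target gene in a sample relative to a control.

    Each Ct is first normalized to a reference (housekeeping) gene, and the
    normalized sample value is then compared with the normalized control.
    An efficiency of 2.0 corresponds to perfect doubling in every cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_sample - delta_control
    return efficiency ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in the sample than in the control after reference normalization.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold_change)  # 8.0
```

A fold change greater than 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression.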
(ii) Protein detection
Gene expression can additionally be determined by measuring the concentration or relative abundance of a corresponding protein product encoded by a gene of interest. Protein levels can be assessed using standard detection techniques known in the art. Examples of protein expression analysis that generate data suitable for use with the methods described herein include, without limitation, proteomics approaches, immunohistochemical and/or western blot analysis, immunoprecipitation, molecular binding assays, ELISA, enzyme-linked immunofiltration assay (ELIFA), mass spectrometry, mass spectrometric immunoassay, and biochemical enzymatic activity assays. In particular, proteomics methods can be used to generate large-scale protein expression datasets in multiplex. Proteomics methods may utilize mass spectrometry to detect and quantify polypeptides (e.g., proteins) and/or peptide microarrays utilizing capture reagents (e.g., antibodies) specific to a panel of target proteins to identify and measure expression levels of proteins expressed in a sample (e.g., a single cell sample or a multi-cell population).
Exemplary peptide microarrays have a substrate-bound plurality of polypeptides, the binding of an oligonucleotide, a peptide, or a protein to each of the plurality of bound polypeptides being separately detectable. Alternatively, the peptide microarray may include a plurality of binders, including but not limited to monoclonal antibodies, polyclonal antibodies, phage display binders, yeast two-hybrid binders, and aptamers, which can specifically detect the binding of specific oligonucleotides, peptides, or proteins. Examples of peptide arrays may be found in US Patent Nos. 6,268,210, 5,766,960, and 5,143,854, the disclosures of each of which are incorporated herein by reference.
Mass spectrometry (MS) may be used in conjunction with the methods described herein to identify and characterize the gene expression profile of a single cell or multi-cell population. Any method of MS known in the art may be used to determine, detect, and/or measure a peptide or peptides of interest, e.g., LC-MS, ESI-MS, ESI-MS/MS, MALDI-TOF-MS, MALDI-TOF/TOF-MS, tandem MS, and the like. Mass spectrometers generally contain an ion source and optics, mass analyzer, and data processing electronics. Mass analyzers that may be used in the methods described herein include scanning and ion-beam mass spectrometers, such as time-of-flight (TOF) and quadrupole (Q), and trapping mass spectrometers, such as ion trap (IT), Orbitrap, and Fourier transform ion cyclotron resonance (FT-ICR). Details of various MS methods can be found in the literature. See, for example, Yates et al., Annu. Rev. Biomed. Eng. 11:49-79 (2009), the disclosure of which is incorporated herein by reference. Prior to MS analysis, proteins in a sample can be first digested into smaller peptides by chemical (e.g., via cyanogen bromide cleavage) or enzymatic (e.g., trypsin) digestion. Complex peptide samples also benefit from the use of front-end separation techniques, e.g., 2D-PAGE, HPLC, RPLC, and affinity chromatography. The digested, and optionally separated, sample is then ionized using an ion source to create charged molecules for further analysis. Ionization of the sample may be performed, e.g., by electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), photoionization, electron ionization, fast atom bombardment (FAB)/liquid secondary ionization (LSIMS), matrix assisted laser desorption/ionization (MALDI), field ionization, field desorption, thermospray/plasmaspray ionization, and particle beam ionization. Additional information relating to the choice of ionization method is known to those of skill in the art.
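For illustration, the enzymatic digestion step described above can be modeled in silico. The following minimal Python sketch assumes the commonly cited trypsin specificity (cleavage on the C-terminal side of lysine or arginine, except when the next residue is proline); that rule, the function name `tryptic_digest`, and the example sequence are assumptions introduced for the example, and missed cleavages are not modeled.

```python
def tryptic_digest(protein):
    """Split a protein sequence into peptides at predicted trypsin sites.

    A cleavage is placed after every K or R unless the following residue
    is P. Missed cleavages are ignored in this simplified model.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal remainder when the protein does not end in K/R.
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence: the R at position 11 is followed by P and is skipped.
peptides = tryptic_digest("MKWVTFISLLRPGSAK")
print(peptides)  # ['MK', 'WVTFISLLRPGSAK']
```

Predicted peptide lists of this kind are what a database search compares against observed spectra after ionization and fragmentation.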
After ionization, digested peptides may then be fragmented to generate signature MS/MS spectra. Tandem MS, also known as MS/MS, may be particularly useful for methods described herein, allowing for ionization followed by fragmentation of a complex peptide sample, such as a sample obtained from a multi-cell population described herein. Tandem MS involves multiple steps of MS selection, with some form of ion fragmentation occurring in between the stages, which may be accomplished with individual mass spectrometer elements separated in space or using a single mass spectrometer with the MS steps separated in time. In spatially separated tandem MS, the elements are physically separated and distinct, with a physical connection between the elements to maintain high vacuum. In temporally separated tandem MS, separation is accomplished with ions trapped in the same place, with multiple separation steps taking place over time. Signature MS/MS spectra may then be compared against a peptide sequence database (e.g., using SEQUEST). Post-translational modifications to peptides may also be determined, for example, by searching spectra against a database while allowing for specific peptide modifications.
D. Methods for the Delivery of Exogenous Nucleic Acids to Target Cells
(i) Transfection techniques
Techniques that can be used to introduce a polynucleotide, such as codon-optimized DNA or RNA, into a mammalian cell are well known in the art. For instance, electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest. Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids. Electroporation of mammalian cells is described in detail, e.g., in Chu et al., Nucleic Acids Research 15:1311 (1987), the disclosure of which is incorporated herein by reference. A similar technique, Nucleofection™, utilizes an applied electric field in order to stimulate the uptake of exogenous polynucleotides into the nucleus of a eukaryotic cell. Nucleofection™ and protocols useful for performing this technique are described in detail, e.g., in Distler et al., Experimental Dermatology 14:315 (2005), as well as in US 2010/0317114, the disclosures of each of which are incorporated herein by reference.
Additional techniques useful for the transfection of target cells include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al., Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference. Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex. Exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and diethylaminoethyl (DEAE)-dextran, the use of which as a transfection agent is described in detail, for instance, in Gulick et al., Current Protocols in Molecular Biology 40:1:9.2:9.2.1 (1997), the disclosure of which is incorporated herein by reference. Magnetic beads are another tool that can be used to transfect target cells in a mild and efficient manner, as this methodology utilizes an applied magnetic field in order to direct the uptake of nucleic acids. This technology is described in detail, for instance, in US 2010/0227406, the disclosure of which is incorporated herein by reference.
Another useful tool for inducing the uptake of exogenous nucleic acids by target cells is laserfection, a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al., Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference.
Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For instance, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence. The use of such vesicles, also referred to as Gesicles, for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al., Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract]. In: Methylation changes in early embryonic genes in cancer [abstract], in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13, Abstract No. 122.
(ii) Incorporation of target genes by gene editing techniques
In addition to the above, a variety of tools have been developed that can be used for the incorporation of exogenous genes into target cells, such as a human cell. One such method that can be used for incorporating polynucleotides encoding target genes into target cells involves the use of transposons. Transposons are polynucleotides that encode transposase enzymes and contain a polynucleotide sequence or gene of interest flanked by 5' and 3' excision sites. Once a transposon has been delivered into a cell, expression of the transposase gene commences and results in active enzymes that cleave the gene of interest from the transposon. This activity is mediated by the site-specific recognition of transposon excision sites by the transposase. In some instances, these excision sites may be terminal repeats or inverted terminal repeats. Once excised from the transposon, the gene of interest can be integrated into the genome of a mammalian cell by transposase-catalyzed cleavage of similar excision sites that exist within the nuclear genome of the cell. This allows the gene of interest to be inserted into the cleaved nuclear DNA at the complementary excision sites, and subsequent covalent ligation of the phosphodiester bonds that join the gene of interest to the DNA of the mammalian cell genome completes the incorporation process. In certain cases, the transposon may be a retrotransposon, such that the target gene is first transcribed to an RNA product and then reverse-transcribed to DNA before incorporation in the mammalian cell genome. Exemplary transposon systems include the piggyBac transposon (described in detail in, e.g., WO 2010/085699) and the Sleeping Beauty transposon (described in detail in, e.g., US 2005/0112764), the disclosures of each of which are incorporated herein by reference as they pertain to transposons for use in gene delivery to a cell of interest.
Another tool for the integration of target genes into the genome of a target cell is the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system, a system that originally evolved as an adaptive defense mechanism in bacteria and archaea against viral infection. The CRISPR/Cas system includes palindromic repeat sequences within plasmid DNA and an associated Cas9 nuclease. This ensemble of DNA and protein directs site-specific DNA cleavage of a target sequence by first incorporating foreign DNA into CRISPR loci. Polynucleotides containing these foreign sequences and the repeat-spacer elements of the CRISPR locus are in turn transcribed in a host cell to create a guide RNA, which can subsequently anneal to a target sequence and localize the Cas9 nuclease to this site. In this manner, highly site-specific Cas9-mediated DNA cleavage can be engendered in a foreign polynucleotide because the interaction that brings Cas9 within close proximity of the target DNA molecule is governed by RNA:DNA hybridization. As a result, one can theoretically design a CRISPR/Cas system to cleave any target DNA molecule of interest. This technique has been exploited in order to edit eukaryotic genomes (Hwang et al., Nature Biotechnology 31:227 (2013)) and can be used as an efficient means of site-specifically editing target cell genomes in order to cleave DNA prior to the incorporation of a target gene. The use of CRISPR/Cas to modulate gene expression has been described in, for instance, US Patent No. 8,697,359, the disclosure of which is incorporated herein by reference as it pertains to the use of the CRISPR/Cas system for genome editing. Alternative methods for site-specifically cleaving genomic DNA prior to the incorporation of a gene of interest in a target cell include the use of zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). Unlike the CRISPR/Cas system, these enzymes do not contain a guiding polynucleotide to localize to a specific target sequence. Target specificity is instead controlled by DNA binding domains within these enzymes. The use of ZFNs and TALENs in genome editing applications is described, e.g., in Urnov et al., Nature Reviews Genetics 11:636 (2010); and in Joung et al., Nature Reviews Molecular Cell Biology 14:49 (2013), the disclosures of each of which are incorporated herein by reference as they pertain to compositions and methods for genome editing.
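Because Cas9 localization is governed by RNA:DNA hybridization, candidate cleavage sites can be enumerated directly from a target sequence. The following minimal Python sketch is illustrative only. It assumes a 20-nucleotide protospacer followed by a 5'-NGG-3' protospacer adjacent motif (PAM), as is commonly described for Streptococcus pyogenes Cas9; the PAM requirement, the function names, and the example sequence are assumptions not stated above, and only the forward strand is scanned.

```python
import re


def find_cas9_targets(dna, protospacer_len=20):
    """List (position, protospacer, PAM) for each NGG-adjacent site.

    A lookahead is used so that overlapping candidate sites are all found.
    Only the forward strand is scanned in this simplified sketch.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


def guide_rna_spacer(protospacer):
    """Spacer portion of a guide RNA matching the protospacer (T -> U)."""
    return protospacer.replace("T", "U")


# Hypothetical target: one 20-nt protospacer followed by the PAM 'AGG'.
target = "TT" + "ACGTACGTACGTACGTACGT" + "AGG" + "CC"
sites = find_cas9_targets(target)
for position, protospacer, pam in sites:
    print(position, guide_rna_spacer(protospacer), pam)
```

In practice, candidate sites would also be screened on the reverse strand and against the remainder of the genome for off-target matches; those steps are omitted here.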
Additional genome editing techniques that can be used to incorporate polynucleotides encoding target genes into the genome of a target cell include the use of ARCUS™ meganucleases that can be rationally designed so as to site-specifically cleave genomic DNA. The use of these enzymes for the incorporation of target genes into the genome of a mammalian cell is advantageous in view of the defined structure-activity relationships that have been established for such enzymes. Single chain meganucleases can be modified at certain amino acid positions in order to create nucleases that selectively cleave DNA at desired locations, enabling the site-specific incorporation of a target gene into the nuclear DNA of a target cell. These single-chain nucleases have been described extensively in, for example, US Patent Nos. 8,021,867 and 8,445,251, the disclosures of each of which are incorporated herein by reference as they pertain to compositions and methods for genome editing.
E. Vectors for Delivery of Exogenous Nucleic Acids to Target Cells
(i) Viral vectors for nucleic acid delivery
Viral genomes provide a rich source of vectors that can be used for the efficient delivery of exogenous genes into the genome of a cell (e.g., a mammalian cell, such as a human cell). Viral genomes are particularly useful vectors for gene delivery because the polynucleotides contained within such genomes are typically incorporated into the genome of a target cell by generalized or specialized transduction. These processes occur as part of the natural viral replication cycle, and do not require added proteins or reagents in order to induce gene integration. Examples of viral vectors include AAV, retrovirus, adenovirus (e.g., Ad5, Ad26, Ad34, Ad35, and Ad48), parvovirus (e.g., adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g., measles and Sendai), positive strand RNA viruses, such as picornavirus and alphavirus, and double stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox and canarypox). Other viruses useful for delivering polynucleotides into a cell include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example. Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996). Other examples include murine leukemia viruses, murine sarcoma viruses, mouse mammary tumor virus, bovine leukemia virus, feline leukemia virus, feline sarcoma virus, avian leukemia virus, human T-cell leukemia virus, baboon endogenous virus, Gibbon ape leukemia virus, Mason Pfizer monkey virus, simian immunodeficiency virus, simian sarcoma virus, Rous sarcoma virus and lentiviruses. Other examples of vectors are described, for example, in US Patent No. 5,801,030, the disclosure of which is incorporated herein by reference as it pertains to viral vectors for use in gene therapy.
(ii) AAV vectors for nucleic acid delivery
In some embodiments, nucleic acids of the compositions and methods described herein are incorporated into rAAV vectors and/or virions in order to facilitate their introduction into a cell. rAAV vectors useful in the invention are recombinant nucleic acid constructs that include (1) a heterologous sequence to be expressed (e.g., a polynucleotide encoding a GAA protein) and (2) viral sequences that facilitate integration and expression of the heterologous genes. The viral sequences may include those sequences of AAV that are required in cis for replication and packaging (e.g., functional ITRs) of the DNA into a virion. In typical applications, the heterologous gene encodes GAA, which is useful for correcting a GAA-deficiency in a cell. Such rAAV vectors may also contain marker or reporter genes. Useful rAAV vectors have one or more of the AAV WT genes deleted in whole or in part, but retain functional flanking ITR sequences. The AAV ITRs may be of any serotype (e.g., derived from serotype 2) suitable for a particular application. Methods for using rAAV vectors are described, for example, in Tal et al., J. Biomed. Sci. 7:279-291 (2000), and Monahan and Samulski, Gene Delivery 7:24-30 (2000), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
The nucleic acids and vectors described herein can be incorporated into a rAAV virion in order to facilitate introduction of the nucleic acid or vector into a cell. The capsid proteins of AAV compose the exterior, non-nucleic acid portion of the virion and are encoded by the AAV cap gene. The cap gene encodes three viral coat proteins, VP1, VP2 and VP3, which are required for virion assembly. The construction of rAAV virions has been described, for instance, in US Patent Nos. 5,173,414; 5,139,941; 5,863,541; 5,869,305; 6,057,152; and 6,376,237; as well as in Rabinowitz et al., J. Virol. 76:791-801 (2002) and Bowles et al., J. Virol. 77:423-432 (2003), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
rAAV virions useful in conjunction with the compositions and methods described herein include those derived from a variety of AAV serotypes including AAV 1, 2, 3, 4, 5, 6, 7, 8 and 9. For targeting muscle cells, rAAV virions that include at least one serotype 1 capsid protein may be particularly useful. rAAV virions that include at least one serotype 6 capsid protein may also be particularly useful, as serotype 6 capsid proteins are structurally similar to serotype 1 capsid proteins, and thus are expected to also result in high expression of GAA in muscle cells. rAAV serotype 9 has also been found to be an efficient transducer of muscle cells. Construction and use of AAV vectors and AAV proteins of different serotypes are described, for instance, in Chao et al., Mol. Ther. 2:619-623 (2000); Davidson et al., Proc. Natl. Acad. Sci. USA 97:3428-3432 (2000); Xiao et al., J. Virol. 72:2224-2232 (1998); Halbert et al., J. Virol. 74:1524-1532 (2000); Halbert et al., J. Virol. 75:6615-6624 (2001); and Auricchio et al., Hum. Molec. Genet. 10:3075-3081 (2001), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
Also useful in conjunction with the compositions and methods described herein are pseudotyped rAAV vectors. Pseudotyped vectors include AAV vectors of a given serotype (e.g., AAV9) pseudotyped with a capsid gene derived from a serotype other than the given serotype (e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, etc.). For example, a representative pseudotyped vector is an AAV8 or AAV9 vector encoding a therapeutic protein pseudotyped with a capsid gene derived from AAV serotype 2. Techniques involving the construction and use of pseudotyped rAAV virions are known in the art and are described, for instance, in Duan et al., J. Virol. 75:7662-7671 (2001); Halbert et al., J. Virol. 74:1524-1532 (2000); Zolotukhin et al., Methods, 28:158-167 (2002); and Auricchio et al., Hum. Molec. Genet., 10:3075-3081 (2001).
AAV virions that have mutations within the virion capsid may be used to infect particular cell types more effectively than non-mutated capsid virions. For example, suitable AAV mutants may have ligand insertion mutations for the facilitation of targeting AAV to specific cell types. The construction and characterization of AAV capsid mutants including insertion mutants, alanine screening mutants, and epitope tag mutants is described in Wu et al., J. Virol. 74:8635-45 (2000). Other rAAV virions that can be used in conjunction with the compositions and methods described herein include those capsid hybrids that are generated by molecular breeding of viruses as well as by exon shuffling. See, e.g., Soong et al., Nat. Genet., 25:436-439 (2000) and Kolkman and Stemmer, Nat. Biotechnol. 19:423-428 (2001).
II. Gene Silencing
The compositions and methods described herein can also be used to alter the expression of a gene of interest in a cell, such as a cell in vitro or in vivo. In particular, the invention provides methods and compositions for attenuating expression of a gene of interest in a cell, such as a cell in vitro or in vivo, using a polynucleotide having a +1 frameshift mutation.
The polynucleotides and vectors described herein can have important clinical utility. For instance, a variety of diseases and conditions, including genetic and proliferative disorders, are manifestations of aberrant and/or dysregulated gene expression and may be associated with, for example, an elevated (e.g., increased) expression of a particular gene. The sections that follow describe polynucleotides having a +1 frameshift mutation and vectors containing such polynucleotides that can be used in methods to attenuate and/or maintain the expression level of a target gene, for example, to reduce target gene expression to or below a basal level.
A. Methods of Attenuating Gene Expression Using Reading Frame Surveillance
The present invention is based in part on the Reading Frame Surveillance (RFS) model, which suggests a mechanism by which a ribosome can prevent +1 frameshifting (e.g., a +1 or -1 frameshift mutation) from occurring during the translation of an mRNA. A frameshift mutation may occur whenever the ribosome shifts in the 5' or 3' direction during translation and may result in the translation of a product that is not genetically encoded. In some instances, the +1 frameshift mutation occurs when five nucleotides are inserted with respect to the gene sequence, e.g., during RNA replication. In some embodiments, a single nucleotide deletion may cause a +1 frameshift mutation, e.g., during RNA replication. Translation of an mRNA by the ribosome entails direct binding of tRNA anti-codons to each codon of the message (Fig. 1A). In the RFS model, during translation of an endogenous mRNA template, short complementary RNA (scRNA) fragments of about 9 to about 30 (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30) nucleotides in length are generated by an RNA-dependent RNA polymerase. These scRNA fragments anneal to the mRNA transcript and guide the proper alignment of the ribosome to the mRNA template by preventing downstream (+1) frameshifts from occurring during the translation process. As shown in Figure 1B, upstream (-1) frameshifting of the ribosome during translation of an mRNA construct is sterically prohibited, as this would require two adjacent tRNA molecules to be bound to the same base on the mRNA target, or that the mRNA contain a short homopolymeric sequence that allows slippage of the P-site tRNA. In contrast, +1 frameshifting of the ribosome could occur without such steric hindrance whenever a one-nucleotide translocation event allows for binding of the A-site tRNA in a +1 shifted position relative to the previous codon (Figure 1C).
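The relationship between an insertion or deletion and the resulting reading frame can be illustrated with the following minimal Python sketch. The mapping used (a net deletion of one nucleotide, or a net insertion of two or five nucleotides, places downstream sequence in the +1 frame) is an assumption chosen to agree with the two examples given above; the function names and the example sequence are hypothetical.

```python
def reading_frame_shift(net_inserted_nt):
    """Classify the downstream frame after a net insertion (+) or deletion (-).

    Deleting one nucleotide and inserting five nucleotides both leave the
    downstream sequence one position out of register, labeled '+1' here.
    """
    shift = (-net_inserted_nt) % 3
    return {0: "in-frame", 1: "+1", 2: "-1"}[shift]


def codons(rna, offset=0):
    """Split an RNA string into complete codons starting at the given offset."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


message = "AUGGCCAAAUGA"  # hypothetical 12-nt message
in_frame = codons(message)             # ribosome in the encoded frame
plus_one = codons(message, offset=1)   # ribosome displaced by one nucleotide

print(reading_frame_shift(-1), reading_frame_shift(5))
print(in_frame)
print(plus_one)
```

Reading the same message from an offset of one nucleotide yields an entirely different codon series, which is why a product translated after a +1 shift is not the genetically encoded one.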
The RFS model is based in part on the argument that primitive ribosomes were more error prone than modern ribosomes (Woese, Proc. Natl. Acad. Sci. U.S.A. 54:1546-1552 (1965)), thus increasing the challenge of limiting the occurrence of +1 frameshifting events. Combined with the suggestion that genomic RNA replication could cause primordial translation templates to be double stranded (Zenkin, J. Mol. Evol. 74:249-256 (2012)), the RFS model hypothesizes that a primordial ribosome could have used complementary RNA (cRNA) strands to measure template RNA during translation to facilitate recognition and response to frameshifting events. According to the RFS model, following transcription of a gene, a complementary RNA (cRNA) is generated by an RNA-dependent RNA polymerase (RdRP) (Fig. 2A). In eukaryotes, pioneer translation is expected to occur in a manner that is coupled to the synthesis of cRNA. During the pioneer round, translation of every set number of codons within a region of the cRNA that is complementary to an exon results in cleavage of the cRNA at a set distance from the A-site, which produces a series of uniformly sized short complementary RNAs (scRNAs) that decorate the length of the exonic regions of the pre-mRNA. The scRNAs are predicted to have a uniform size of approximately 9-30 (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30) nucleotides and are spaced uniformly along the mRNA.
According to the RFS model, following splicing of the pre-mRNA to generate an mRNA, scRNAs remain complexed to the exonic regions of the mRNA, such that scRNAs span the length of the coding region of the mRNA. The RFS model suggests that these scRNAs serve as a molecular ruler that allows continuous monitoring for ribosomal frameshifts during subsequent rounds of translation by the ribosome. When a previously measured mRNA is subsequently translated, the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosomal active site (Fig. 2B). Sensing of +1 frameshifts during the translation process can lead to repeated cleavage of the mRNA transcript that may result in terminal 2-base pair 3' overhangs (Fig. 2C), which can facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells (Elbashir et al., Genes Dev. 15:188-200 (2001)).
The RFS model additionally suggests that scRNA molecules may dissociate from the mRNA template that was translated during their biogenesis. These scRNA molecules may bind to daughter RNA molecules of the same polarity, for instance, to facilitate the proper translation of these templates. For example, terminal 2-base pair 3' overhangs are descendants of primordial structures involved in regulation of RNA integrity, as duplexes containing this motif are competent substrates for the RNA Induced Silencing Complex (RISC). The instant invention is based in part on the discovery that targeted 2-base pair 3' overhangs in modern cells can be used as a modality for promoting gene silencing. For instance, translation downstream of a deletion in a synthetic gene construct would be sensed as a +1 frameshift at each scRNA every time it is read by the ribosome. In turn, this can trigger a severe response, such as cleavage of the RNA message. The ensuing cleavage can subsequently generate double stranded RNA molecules with terminal 2-base pair 3' overhangs (Figure 2C), which can facilitate loading of RNA fragments onto the RISC in eukaryotic cells (Elbashir et al., Genes Dev. 15:188-200 (2001), herein incorporated by reference in its entirety). The RFS model indicates that the scRNA molecules function as a form of epigenetic memory of previous translations and facilitate the detection of reading frame errors occurring during genome replication. The techniques described herein exploit this mechanism for the purposes of targeted gene silencing. Figure 3 is a schematic portraying the Reading Frame Surveillance model applied to modulation of a eukaryotic gene.
The RFS model postulates that the ability of free scRNA to bind to and monitor the translation of template mRNA is a concentration dependent process. In some instances, when the concentration of free scRNA is reduced, translation of an mRNA template may be attenuated due to the limited binding (e.g., incomplete decoration) of the mRNA template by scRNA. In particular instances, the concentration of free scRNA may be reduced, e.g., by competitive inhibition. For example, a mutant mRNA with a +1 frameshift mutation relative to the wild type mRNA may bind to and sequester scRNA molecules, if the mutant mRNA is present at a sufficiently high concentration (e.g., a concentration higher than the level of the wild type mRNA). The RFS model suggests that competitive inhibition would result in a reduced rate of translation (e.g., attenuation of gene expression) of the wild type mRNA. Additionally, the ribosome would sense the +1 frameshift mutation in the mutant mRNA transcript, e.g., by sensing the position of each scRNA, and terminate translation.
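The concentration dependence described above can be pictured with a deliberately simplified Python sketch. The sketch assumes, purely for illustration, that scRNA binds the wild type and mutant templates with equal affinity and partitions between them in proportion to their concentrations; this partitioning rule and the function name `wild_type_scrna_share` are hypothetical and are not taught by the RFS model itself.

```python
def wild_type_scrna_share(wild_type_conc: float, mutant_conc: float) -> float:
    """Fraction of the scRNA pool expected to reside on wild type mRNA.

    Toy assumption: equal affinity for both templates, so scRNA partitions
    in proportion to template concentration.
    """
    if wild_type_conc < 0 or mutant_conc < 0:
        raise ValueError("concentrations must be non-negative")
    total = wild_type_conc + mutant_conc
    if total == 0:
        raise ValueError("at least one template concentration must be positive")
    return wild_type_conc / total


# Increasing the mutant template relative to wild type (arbitrary units)
for mutant in (0.0, 1.0, 4.0):
    share = wild_type_scrna_share(1.0, mutant)
    print(f"mutant:wild type = {mutant:g}:1 -> wild type share {share:.2f}")
```

In this toy picture, a mutant template present at four times the wild type level would leave roughly one fifth of the scRNA pool available to the wild type mRNA, consistent with the qualitative expectation that a higher mutant concentration produces greater sequestration.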
Based on the teachings of the RFS model, gene silencing can be applied in a targeted manner, such as would be useful in the treatment of genetic disorders and conditions characterized by aberrant and/or dysregulated gene expression (e.g., elevated gene expression). Under certain conditions, reduction of the expression level of a gene to a lower level of expression may be useful as a method of treating, preventing, and/or ameliorating the disease. To attenuate the expression of a wild type gene (e.g., a disease-associated gene), a polynucleotide and/or a vector encoding a polynucleotide having a +1 frameshift mutation relative to the sequence of the wild type mRNA may be introduced into a cell, e.g., at a sufficient concentration to induce competitive inhibition of scRNA binding to the wild type mRNA template. In some instances, a polynucleotide and/or a vector encoding a polynucleotide having a +1 frameshift mutation may reduce target gene expression by about 10% (e.g., 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90%) or more.
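For clarity, the percent reduction referred to above can be computed from a baseline measurement and a post-treatment measurement of the same target, as in the following Python sketch. The function name `percent_reduction` and the numerical values are hypothetical placeholders chosen for illustration and do not represent experimental data.

```python
def percent_reduction(baseline: float, treated: float) -> float:
    """Percent decrease in expression relative to the untreated baseline."""
    if baseline <= 0:
        raise ValueError("baseline expression must be positive")
    return 100.0 * (baseline - treated) / baseline


# Hypothetical expression levels in arbitrary units (not measured data)
baseline_level = 200.0
treated_level = 150.0

reduction = percent_reduction(baseline_level, treated_level)
print(f"reduction: {reduction:.1f}%")
print("meets 10% threshold:", reduction >= 10.0)
```

The same calculation applies whether the underlying measurement reflects RNA abundance or protein abundance, provided that the baseline and treated values are obtained by the same assay.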
The expression level of genes targeted for attenuation by the polynucleotides described herein may be monitored by techniques known in the art for measuring RNA and/or protein levels. For example, a number of widely used procedures exist for detecting and determining the abundance of a particular mRNA in a total or poly(A) RNA sample, such as and without limitation, Northern blot analysis, nuclease protection assays (NPA), in situ hybridization, and reverse transcription-polymerase chain reaction (RT-PCR). Additionally, techniques measuring the concentration or relative abundance of a corresponding protein product encoded by a targeted gene of interest may be used, such as and without limitation, proteomics approaches, immunohistochemical and/or western blot analysis, immunoprecipitation, molecular binding assays, ELISA, enzyme-linked immunofiltration assay (ELIFA), mass spectrometry, mass spectrometric immunoassay, and biochemical enzymatic activity assays.
B. Therapeutic Polynucleotides having a +1 Frameshift Mutation
Polynucleotides useful for attenuating the expression of a gene as described herein include those that are designed to have at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) +1 frameshift mutation with respect to at least a portion of the sequence of a targeted gene. The polynucleotide may also include at least a single (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) nucleotide insertion or deletion causing the +1 frameshift mutation. In some instances, the +1 frameshift mutation may be generated by the addition of five nucleotides with respect to the targeted gene sequence. In some embodiments, a single nucleotide deletion may be introduced to cause a +1 frameshift mutation. The polynucleotide may be designed to be from about 9 to about 100 (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, or 100) nucleotides in length. In some instances, the polynucleotides will be between about 9 to about 50 (e.g., about 10 to about 25, about 20 to about 35, about 30 to about 45, about 35 to about 50) nucleotides in length. The at least one +1 frameshift mutation may occur at any nucleotide position along the length of the polynucleotide that allows for sufficient recognition by and binding of scRNA fragments.
The polynucleotide may be designed to include a wild type copy of the target gene operably linked to the portion of said polynucleotide having a +1 frameshift mutation. The overall length of the polynucleotide and the position of the +1 frameshift mutation along its length will determine the number of nucleotides that include the wild type sequence of the target gene. In some instances, the length of the wild type sequence of the gene operably linked to the portion of the polynucleotide having a +1 frameshift mutation is at least 20% (e.g., 20, 30, 40, 50, 60, 70, or 80%) of the total polynucleotide length. For example, a polynucleotide having a total length of 50 nucleotides may be designed such that nucleotides 1-30 include the wild type sequence and the nucleotides occurring after a +1 frameshift mutation at nucleotide 31 have a non-wild type sequence.
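The 50-nucleotide example above can be sketched in Python as follows. In this sketch the +1 frameshift mutation is modeled, as one assumed possibility, by omitting the nucleotide at position 31 of the target so that all downstream nucleotides are displaced by one position. The function names `design_frameshift_polynucleotide` and `wild_type_fraction`, and the placeholder target sequence, are hypothetical and introduced here for illustration only.

```python
def design_frameshift_polynucleotide(target: str, total_length: int,
                                     shift_position: int) -> str:
    """Copy `target`, omit the nucleotide at 1-based `shift_position`, and
    return the first `total_length` nucleotides of the result."""
    if not 1 <= shift_position <= total_length:
        raise ValueError("shift_position must lie within the polynucleotide")
    if len(target) < total_length + 1:
        raise ValueError("target must supply total_length + 1 nucleotides")
    upstream = target[:shift_position - 1]  # unchanged wild type portion
    downstream = target[shift_position:]    # resumes one nucleotide later
    return (upstream + downstream)[:total_length]


def wild_type_fraction(total_length: int, shift_position: int) -> float:
    """Fraction of the polynucleotide preceding the frameshift position."""
    return (shift_position - 1) / total_length


# Placeholder target sequence (51 nucleotides), not a real gene
target = ("AUGGCUGAAUUCCGAUUAGCC" * 3)[:51]
polynucleotide = design_frameshift_polynucleotide(target, 50, 31)

print(len(polynucleotide))
print(polynucleotide[:30] == target[:30])  # nucleotides 1-30 are wild type
print(wild_type_fraction(50, 31))          # share of wild type sequence
```

With the mutation at nucleotide 31 of a 50-nucleotide design, the wild type portion accounts for 30 of 50 nucleotides, which satisfies the at least 20% criterion stated above.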
To target an individual gene for attenuation of expression, one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 20) or more polynucleotides having a +1 frameshift mutation may be designed. These polynucleotides may be of varying lengths and have +1 frameshift mutations located at different positions relative to the sequence of the target gene. In this manner, the polynucleotides can be designed to bind different sets of scRNA fragments when introduced into a cell, such as a cell in vitro or in vivo. The use of multiple polynucleotides may increase the efficacy of treatment (e.g., attenuation of gene expression).
The polynucleotides described herein may be introduced into a cell, such as a mammalian cell (e.g., a human cell) in vivo and in vitro, by methods known in the art, including transfection techniques. Additionally or alternatively, polynucleotides described herein may be incorporated into a variety of expression vectors, such as viral vectors, for delivery to and expression in mammalian cells.
The polynucleotides may be designed to target genes involved in diseases and conditions characterized by aberrant and/or dysregulated gene expression (e.g., elevated levels of gene expression). Examples of diseases and conditions for which polynucleotides having a +1 frameshift mutation may find therapeutic utility for attenuating gene expression include, but are not limited to, genetic disorders, e.g., dominant catecholaminergic polymorphic ventricular tachycardia (CPVT) and Long QT syndrome (LQTS), as well as proliferative disorders, e.g., cancer.
Once designed, the polynucleotides can be prepared, for instance, by methods known in the art, such as solid phase nucleic acid synthesis and molecular cloning procedures.
C. Methods for the Delivery of Exogenous Nucleic Acids to Target Cells
(i) Transfection techniques
Techniques that can be used to introduce a polynucleotide having a +1 frameshift mutation described herein into a target cell (e.g., a mammalian cell, such as a human cell) include transfection techniques known in the art. For instance, electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest. Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids. Electroporation of mammalian cells is described in detail, e.g., in Chu et al., Nucleic Acids Research 15:1311 (1987), the disclosure of which is incorporated herein by reference. A similar technique, Nucleofection™, utilizes an applied electric field in order to stimulate the uptake of exogenous polynucleotides into the nucleus of a eukaryotic cell. Nucleofection™ and protocols useful for performing this technique are described in detail, e.g., in Distler et al., Experimental Dermatology 14:315 (2005), as well as in US 2010/0317114, the disclosures of each of which are incorporated herein by reference.
Additional techniques useful for the transfection of target cells with polynucleotides having a +1 frameshift mutation described herein include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al., Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference.
Various amphiphilic lipids can form bilayers in an aqueous environment to encapsulate an RNA-containing aqueous core as a liposome. These lipids can have an anionic, cationic, or zwitterionic hydrophilic head group. Some phospholipids are anionic whereas others are zwitterionic and others are cationic. Suitable classes of phospholipid include, but are not limited to, phosphatidylethanolamines, phosphatidylcholines, phosphatidylserines, and phosphatidylglycerols. Useful cationic lipids include, but are not limited to, dioleoyl trimethylammonium propane (DOTAP), 1,2-distearyloxy-N,N-dimethyl-3-aminopropane (DSDMA), 1,2-dioleyloxy-N,N-dimethyl-3-aminopropane (DODMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), and 1,2-dilinolenyloxy-N,N-dimethyl-3-aminopropane (DLenDMA). Zwitterionic lipids include, but are not limited to, acyl zwitterionic lipids and ether zwitterionic lipids. Examples of useful zwitterionic lipids are DPPC, DOPC, DSPC, dodecylphosphocholine, 1,2-dioleoyl-sn-glycero-3-phosphatidylethanolamine (DOPE), and 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (DPyPE). The lipids in the liposome can be saturated or unsaturated. If an unsaturated lipid has two tails, both tails can be unsaturated, or the lipid can have one saturated tail and one unsaturated tail. A lipid can include a steroid group in one tail.
Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex. Exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and diethylaminoethyl (DEAE)-dextran, the use of which as a transfection agent is described in detail, for instance, in Gulick et al., Current Protocols in Molecular Biology 40:1:9.2:9.2.1 (1997), the disclosure of which is incorporated herein by reference.
Magnetic beads represent another tool that can be used to transfect target cells in a mild and efficient manner, as this methodology utilizes an applied magnetic field in order to direct the uptake of nucleic acids. This technology is described in detail, for instance, in US 2010/0227406, the disclosure of which is incorporated herein by reference.
Another useful tool for inducing the uptake of exogenous nucleic acids by target cells is laserfection, a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al., Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference.
Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For instance, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence. The use of such vesicles, also referred to as Gesicles, for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al., Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract]. In: Methylation changes in early embryonic genes in cancer [abstract], in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13, Abstract No. 122.
Another tool for the delivery of polynucleotides having a +1 frameshift mutation to a target cell includes the use of exosomes, such as those harvested naturally from a cell (e.g., a mammalian cell, such as a human cell) or those prepared synthetically. Exemplary techniques for the preparation and loading of exosomes with RNA substrates are described, for instance, in US 2015/0093433, the disclosure of which is incorporated herein by reference in its entirety.
D. Vectors for Delivery of Polynucleotides Having a +1 Frameshift Mutation to Target Cells
(i) Viral vectors for polynucleotide delivery
Viral genomes provide a rich source of vectors that can be used for the efficient delivery of exogenous genes into the genome of a cell (e.g., a mammalian cell, such as a human cell). Viral genomes are particularly useful vectors for gene delivery because the polynucleotides contained within such genomes are typically incorporated into the genome of a target cell by generalized or specialized transduction. These processes occur as part of the natural viral replication cycle, and do not require added proteins or reagents in order to induce gene integration. Examples of viral vectors include AAV, retrovirus, adenovirus (e.g., Ad5, Ad26, Ad34, Ad35, and Ad48), parvovirus (e.g., adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g., measles and Sendai), positive strand RNA viruses, such as picornavirus and alphavirus, and double stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox and canarypox). Other viruses useful for delivering polynucleotides described herein include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example. Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).
Other examples include murine leukemia viruses, murine sarcoma viruses, mouse mammary tumor virus, bovine leukemia virus, feline leukemia virus, feline sarcoma virus, avian leukemia virus, human T-cell leukemia virus, baboon endogenous virus, Gibbon ape leukemia virus, Mason Pfizer monkey virus, simian immunodeficiency virus, simian sarcoma virus, Rous sarcoma virus and lentiviruses. Other examples of vectors are described, for example, in US Patent No. 5,801,030, the disclosure of which is incorporated herein by reference as it pertains to viral vectors for use in gene therapy.
(ii) AAV vectors for polynucleotide delivery
In some embodiments, polynucleotides of the compositions and methods described herein are incorporated into rAAV vectors and/or virions in order to facilitate their introduction into a cell. rAAV vectors useful in the invention are recombinant nucleic acid constructs that include (1) a heterologous sequence to be expressed (e.g., a polynucleotide encoding a +1 frameshift mutation relative to a gene of interest) and (2) viral sequences that facilitate integration and expression of the heterologous genes. The viral sequences may include those sequences of AAV that are required in cis for replication and packaging (e.g., functional ITRs) of the DNA into a virion. In typical applications, the heterologous gene encodes GAA, which is useful for correcting a GAA-deficiency in a cell. Such rAAV vectors may also contain marker or reporter genes. Useful rAAV vectors have one or more of the AAV WT genes deleted in whole or in part, but retain functional flanking ITR sequences. The AAV ITRs may be of any serotype (e.g., derived from serotype 2) suitable for a particular application. Methods for using rAAV vectors are described, for example, in Tal et al., J. Biomed. Sci. 7:279-291 (2000), and Monahan and Samulski, Gene Delivery 7:24-30 (2000), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
The polynucleotides and vectors described herein can be incorporated into a rAAV virion in order to facilitate introduction of the nucleic acid or vector into a cell. The capsid proteins of AAV compose the exterior, non-nucleic acid portion of the virion and are encoded by the AAV cap gene. The cap gene encodes three viral coat proteins, VP1, VP2 and VP3, which are required for virion assembly. The construction of rAAV virions has been described, for instance, in US Patent Nos. 5,173,414; 5,139,941; 5,863,541; 5,869,305; 6,057,152; and 6,376,237; as well as in Rabinowitz et al., J. Virol. 76:791-801 (2002) and Bowles et al., J. Virol. 77:423-432 (2003), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
rAAV virions useful in conjunction with the compositions and methods described herein include those derived from a variety of AAV serotypes including AAV 1, 2, 3, 4, 5, 6, 7, 8 and 9. For targeting muscle cells, rAAV virions that include at least one serotype 1 capsid protein may be particularly useful. rAAV virions that include at least one serotype 6 capsid protein may also be particularly useful, as serotype 6 capsid proteins are structurally similar to serotype 1 capsid proteins, and thus are expected to also result in high expression of GAA in muscle cells. rAAV serotype 9 has also been found to be an efficient transducer of muscle cells. Construction and use of AAV vectors and AAV proteins of different serotypes are described, for instance, in Chao et al., Mol. Ther. 2:619-623 (2000); Davidson et al., Proc. Natl. Acad. Sci. USA 97:3428-3432 (2000); Xiao et al., J. Virol. 72:2224-2232 (1998); Halbert et al., J. Virol. 74:1524-1532 (2000); Halbert et al., J. Virol. 75:6615-6624 (2001); and Auricchio et al., Hum. Molec. Genet. 10:3075-3081 (2001), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery. Also useful in conjunction with the compositions and methods described herein are pseudotyped rAAV vectors. Pseudotyped vectors include AAV vectors of a given serotype (e.g., AAV9) pseudotyped with a capsid gene derived from a serotype other than the given serotype (e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, etc.). For example, a representative pseudotyped vector is an AAV8 or AAV9 vector encoding a therapeutic protein pseudotyped with a capsid gene derived from AAV serotype 2. Techniques involving the construction and use of pseudotyped rAAV virions are known in the art and are described, for instance, in Duan et al., J. Virol. 75:7662-7671 (2001); Halbert et al., J. Virol. 74:1524-1532 (2000); Zolotukhin et al., Methods 28:158-167 (2002); and Auricchio et al., Hum. Molec. Genet. 10:3075-3081 (2001).
AAV virions that have mutations within the virion capsid may be used to infect particular cell types more effectively than non-mutated capsid virions. For example, suitable AAV mutants may have ligand insertion mutations for the facilitation of targeting AAV to specific cell types. The construction and characterization of AAV capsid mutants including insertion mutants, alanine screening mutants, and epitope tag mutants is described in Wu et al., J. Virol. 74:8635-45 (2000). Other rAAV virions that can be used in conjunction with the compositions and methods described herein include those capsid hybrids that are generated by molecular breeding of viruses as well as by exon shuffling. See, e.g., Soong et al., Nat. Genet. 25:436-439 (2000) and Kolman and Stemmer, Nat. Biotechnol. 19:423-428 (2001).
III. Inducing Gene Expression
The compositions and methods described herein can also be used to induce the expression of a protein in a cell, such as a therapeutic protein, by contacting the cell in vitro or in vivo (for instance, in a patient, such as a human patient) with a duplex containing an mRNA strand encoding the protein annealed to a plurality of complementary nucleic acid strands. The plurality of nucleic acid strands may include one or more RNA strands or one or more DNA strands. In some embodiments, the plurality of nucleic acid strands includes both RNA strands and DNA strands. The complementary nucleic acid strands may include one or more modified nucleotides, such as a nucleotide that has been chemically modified so as to improve the strength of a specific binding interaction between the RNA encoding the protein of interest and the short complementary RNA strand.
The nucleic acid duplexes described herein can have important clinical utility. A variety of diseases and conditions, including heritable genetic disorders, are manifestations of a deficiency in a native protein. With the advent of gene therapy, a wide array of vectors and gene delivery techniques have been developed for the introduction of exogenous protein-coding nucleic acids into target cells (e.g., human cells). However, there remains a need for a paradigm for promoting robust and stable translation of an mRNA strand encoding a protein of interest in a cell, such as a cell in a patient suffering from a heritable genetic disorder. The sections that follow describe the structure and function of nucleic acid duplexes containing short complementary nucleic acids annealed to an mRNA strand encoding a protein of interest and the use of these duplexes for the treatment of a disease or condition in a patient.
A. Methods of Inducing Gene Expression Using Reading Frame Surveillance
The present invention is based in part on the hypothesis that, during translation of an endogenous mRNA template, short complementary RNA (scRNA) fragments of from about 9 to about 30 nucleotides in length that anneal to the mRNA transcript are generated by an RNA-dependent RNA polymerase. It is hypothesized that these fragments guide the proper alignment of the ribosome to the mRNA template by preventing downstream (+1) frameshifts from occurring during the translation process. As shown in Figure 1B, upstream (-1) frameshifting of the ribosome during translation of an mRNA construct is sterically prohibited, as this would require two adjacent tRNA molecules to be bound to the same base on the mRNA target. In contrast, +1 frameshifting of the ribosome could occur without such steric hindrance whenever a one-nucleotide translocation allows for binding of the A-site tRNA in a +1 shifted position relative to the previous codon (Figure 1C).
Primitive ribosomes have been argued to have been more error prone (Woese, Proc. Natl. Acad. Sci. U.S.A. 54:1546-1552 (1965)), increasing the challenge of controlling mRNA positions and translocation to limit +1 frameshifting. Combined with the suggestion that genomic RNA replication could cause primordial translation templates to be double stranded (Zenkin, J. Mol. Evol. 74:249-256 (2012)), it is hypothesized that a primordial ribosome could have used the complementary RNA (cRNA) strand to measure the template RNA during translation to facilitate recognition and response to frameshifting events. The Reading Frame Surveillance (RFS) model on which the present invention is based in part postulates that measured cleavage of an annealed cRNA generated during the first round of translation creates a molecular ruler that, in subsequent rounds of translation, allows continuous monitoring of the progression of the mRNA through the ribosome with respect to the frame being read. An idealized schematic outlining this model is shown in Figure 2A, which demonstrates the primordial translation of a 42-codon mRNA, first in the pioneer round of translation and later when the same message is translated repetitively. A key element of this model is that during the pioneer round, translation of a uniform set of codons is associated with cleavage of the cRNA template at a set distance from the A-site, which results in the mRNA being decorated with scRNA oligonucleotides. The 7-codon length of the scRNA fragments diagrammed in Figure 2 represents one possible length for these fragments.
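The molecular ruler described above can be illustrated with an idealized Python sketch that tiles a 42-codon message with 7-codon (21-nucleotide) complementary fragments. The sketch assumes, for simplicity, that cleavage begins at the first nucleotide of the message and recurs at exact 21-nucleotide intervals; the actual distance of cleavage from the A-site is not specified here. The names `reverse_complement` and `tile_scrnas` and the placeholder sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5' to 3')."""
    return rna.translate(COMPLEMENT)[::-1]


def tile_scrnas(mrna: str, codons_per_fragment: int = 7) -> list[tuple[int, str]]:
    """Return (start_index, scRNA) pairs that tile `mrna` end to end.

    Each scRNA is the reverse complement of one window of the message.
    A trailing window shorter than a full fragment is left undecorated.
    """
    step = 3 * codons_per_fragment
    return [
        (start, reverse_complement(mrna[start:start + step]))
        for start in range(0, len(mrna) - step + 1, step)
    ]


# Placeholder 42-codon message: start codon, 40 alanine codons, stop codon
mrna = "AUG" + "GCU" * 40 + "UAA"
fragments = tile_scrnas(mrna)

print(len(mrna) // 3, "codons,", len(fragments), "scRNA fragments")
for start, scrna in fragments[:2]:
    print(start, scrna)
```

A 21-nucleotide fragment falls within the range of about 9 to about 30 nucleotides stated herein, and other fragment lengths can be explored by changing `codons_per_fragment`.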
When a previously measured mRNA is subsequently translated, the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosome active sites. Figure 2B presents one possible mechanism by which this sensing event occurs. The ribosome senses the 5' end of a scRNA and, upon mRNA translocation to restore complete scRNA pairing with its complementary sequence on the mRNA, confirms that translation of the previous segment occurred without frameshifting. Figure 2C presents a schematic of a +1 frameshifted complex, in which the distance between the ribosome and the scRNA 5' end is increased; this increase could be sensed by the ribosome to trigger alteration or abortion of the translation of that mRNA.
The RFS model additionally suggests that scRNA molecules can dissociate from the mRNA template that was translated during their biogenesis. These scRNA molecules are free to bind to daughter RNA molecules of the same polarity, for instance, to facilitate proper translation of these templates. In this way, the RFS model indicates that the scRNA molecules function as a form of epigenetic memory of previous translations, and facilitate the detection of reading frame errors occurring during genome replication. If, for instance, a single base pair deletion is created during the replication of an RNA molecule (e.g., a +1 frame-shifting event), and scRNA molecules derived from translation of an intact RNA decorate the new, mutated RNA, then translation downstream of the deletion would be sensed as a +1 frameshift at each scRNA every time it is read by the ribosome. This may trigger a severe response, such as cleavage of the RNA message. The ensuing cleavage could subsequently generate double stranded RNA molecules with terminal 2-base pair 3' overhangs (Figure 2C), which facilitate loading of RNA fragments onto the RNA Induced Silencing Complex (RISC) in eukaryotic cells (Elbashir et al., Genes Dev. 15:188-200 (2001)).
In eukaryotes, exons generally have elevated GC content relative to adjacent introns, and in higher eukaryotes this feature is even more pronounced in those regions of the genome with lower overall GC content and longer introns (Amit et al., Cell Reports 1:543-556 (2012) and Louie et al., Genome Res. 13:2594-2601 (2003)). This property of coding exons is frequently revealed in analyses of codon usage bias, in particular the GC content of the third, wobble position of codons. Codon usage bias is often attributed to selective pressure to maintain optimal translation rates in the context of charged tRNA abundances (Ikemura, Mol. Biol. Evol. 2:13-34 (1985)), but there are many possible mechanistic causes for altered use of synonymous codons (Hershberg and Petrov, Annu. Rev. Genet. 42:287-299 (2008) and Plotkin and Kudla, Nat. Rev. Genet. 12:32-42 (2011)), and recent descriptions of physiological alteration of tRNA profiles in cells (Pavon-Eternod et al., Nucleic Acids Res. 37:7268-7280 (2009)) raise the possibility that codon usage bias in transcribed genes is a cause, as opposed to a consequence, of tRNA abundance. The RFS model indicates that GC-rich coding exons bind to scRNA molecules with increased affinity and thermodynamic stability, which enhances translational fidelity and processivity. Thus, codon usage in eukaryotes can be influenced by reading frame surveillance, and can provide selective pressure to maintain high GC content in the wobble position. In support of this model, the severity of disease-causing mutations in the Factor IX gene has been correlated with the change in pairing free energy caused by each coding sequence mutation (Hamasaki-Katagiri et al., Haemophilia 18:933-940 (2012)).
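By way of illustration only, the GC content of the wobble position discussed above can be estimated directly from a coding sequence. The following Python sketch assumes an in-frame coding sequence given 5' to 3'; the function names `gc_content` and `gc3_content` are hypothetical and are introduced here solely for the example.

```python
def gc_content(seq: str) -> float:
    """Fraction of all bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of codons whose third (wobble) base is G or C.

    Assumes `cds` is an in-frame coding sequence (DNA or RNA alphabet).
    """
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    wobble_bases = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in wobble_bases) / len(wobble_bases)


if __name__ == "__main__":
    example = "ATGGCCAAGTAA"  # Met-Ala-Lys-stop, illustrative only
    print(gc_content(example), gc3_content(example))
```

Comparing the two values for an exon and for its flanking introns is one simple way to observe the bias described above.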
The RFS model can thus be used to inform the design of polynucleotide duplexes engineered for robust expression in a target cell, and may synergize with codon-optimization procedures, such as those described herein, to provide a paradigm for augmenting gene expression in a cell of interest.
(i) Synthetic decorated mRNA-containing duplexes
The principle of reading frame surveillance is based in part on the postulate that scRNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated. Based on this hypothesis, synthetic duplexes containing a mRNA strand encoding a protein of interest annealed to a plurality of complementary oligonucleotides, such as a plurality of complementary oligonucleotides of 9-30 nucleotides in length, can be prepared ex vivo and delivered to a cell in vitro or in vivo so as to simulate the mRNA-scRNA duplexes that naturally occur. In this way, mRNA of interest can be delivered to a cell in complex with a series of molecular rulers that will guide the high-fidelity translation of the encoded protein product.
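The design of a set of complementary oligonucleotides that decorate an mRNA strand can be sketched computationally. The Python example below is a minimal illustration under stated assumptions: the oligonucleotides are RNA, are tiled end to end along the mRNA with an optional gap, and share a single length within the 9-30 nucleotide range. The function names, the default length of 21, and the tiling scheme are hypothetical choices made for the example and are not prescribed by the text.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence (5' to 3')."""
    seq = seq.upper()
    if set(seq) - set("ACGU"):
        raise ValueError("expected an RNA sequence containing only A, C, G, U")
    return seq.translate(_COMPLEMENT)[::-1]


def tile_complementary_oligos(mrna: str, oligo_len: int = 21, gap: int = 0):
    """Tile the mRNA with complementary oligonucleotides of a fixed length.

    Returns a list of (start_index_on_mrna, oligo_sequence_5_to_3) tuples.
    The 9-30 nt bound follows the range given in the text; the default
    length and the gap parameter are illustrative assumptions.
    """
    if not 9 <= oligo_len <= 30:
        raise ValueError("oligo_len must be between 9 and 30 nucleotides")
    if gap < 0:
        raise ValueError("gap must be non-negative")
    mrna = mrna.upper()
    oligos = []
    step = oligo_len + gap
    for start in range(0, len(mrna) - oligo_len + 1, step):
        target = mrna[start:start + oligo_len]
        oligos.append((start, reverse_complement_rna(target)))
    return oligos


if __name__ == "__main__":
    for start, oligo in tile_complementary_oligos("AUGGCCAAGUAAAUGGCC", oligo_len=9):
        print(start, oligo)
```

Any 3' remainder of the mRNA shorter than `oligo_len` is left undecorated in this sketch; a practical design could vary the oligonucleotide lengths independently, as the text permits.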
Strands of mRNA encoding a protein of interest can be designed using a variety of techniques.
For instance, using the standard genetic code, one of skill in the art can design a cDNA strand encoding a protein of interest, which can then be converted to an RNA sequence. The design of protein-encoding nucleic acids can be informed, for instance, using the genetic code represented in Table 1, above, compiled by the National Center for Biotechnology Information, Bethesda, Maryland, USA.
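As a minimal sketch of this design step, a protein sequence can be back-translated into a cDNA coding strand and then converted to RNA. In the Python example below, the choice of one codon per amino acid is an arbitrary assumption made for illustration (each codon is a valid assignment in the standard genetic code, but no particular codon preference is implied by the text), and the names `PREFERRED_CODON`, `back_translate`, and `cdna_to_mrna` are hypothetical.

```python
# One codon per amino acid from the standard genetic code.
# The specific choices are illustrative assumptions, not a recommendation.
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGG",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}
STOP_CODON = "TGA"


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Return a cDNA coding strand (5' to 3') encoding the given protein."""
    try:
        codons = [PREFERRED_CODON[aa] for aa in protein.upper()]
    except KeyError as exc:
        raise ValueError(f"unsupported amino acid: {exc.args[0]}") from None
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)


def cdna_to_mrna(cdna: str) -> str:
    """Convert a DNA coding strand to the corresponding RNA sequence (T -> U)."""
    return cdna.upper().replace("T", "U")


if __name__ == "__main__":
    cdna = back_translate("MAK")
    print(cdna, cdna_to_mrna(cdna))
```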
Using standard Watson-Crick base pairing guidelines, one of skill in the art can subsequently design a series of short complementary oligonucleotides that can be annealed to the protein-encoding mRNA strand. The complementary oligonucleotides can each independently be the length of a naturally occurring scRNA molecule, such as from 9-30 nucleotides in length. The complementary nucleic acids can be DNA or RNA, and may include one or more modified nucleic acids, for instance, as described below.
(ii) Modified nucleic acids for inclusion in mRNA duplexes
Complementary nucleic acids that can be incorporated into a mRNA-containing duplex described herein may include one or more chemically or enzymatically modified nucleic acids. Such nucleic acids can be useful, for instance, for enhancing the affinity of the complementary nucleic acids for the mRNA strand of interest. Exemplary modified nucleotides include modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate. Additional examples of modified nucleotides include modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate. Modified uridine nucleotides can be incorporated into the mRNA-containing duplexes described herein, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate. Additional modified nucleotides include modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
Additional examples of modified nucleotides are described, for instance, in WO 2012/031046 and US Patent Nos. 4,711,955 and 8,536,323, the disclosures of each of which are incorporated herein by reference as they pertain to chemically or enzymatically modified nucleotides.
(iii) GC content
In addition to modified nucleotides, in designing protein-encoding mRNA and complementary nucleic acids to be included in a duplex for promoting gene expression, one of skill in the art can design nucleic acids that encode the protein of interest and that contain elevated GC content, for instance, relative to the wild-type gene while preserving the amino acid sequence of the encoded protein so as to enhance the thermodynamic stability of the duplex. According to the RFS model, the increase in GC content will lead to higher-affinity binding of the mRNA transcript with short complementary nucleic acid fragments. This binding promotes improved licensing of the mRNA transcript, as the short complementary nucleic acids act as molecular rulers that do not permit improper alignment (e.g., +1 frameshifting) of the ribosome to the mRNA template.
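One simple way to raise GC content while preserving the encoded amino acid sequence is to replace each codon with a synonymous codon having more G or C bases. The Python sketch below illustrates this under stated assumptions: it uses the standard genetic code, leaves stop codons untouched, and greedily selects the most GC-rich synonym for each codon. The greedy strategy and the names `raise_gc_content`, `translate`, and `gc_count` are hypothetical and shown only as an example.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_count(codon: str) -> int:
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def raise_gc_content(cds: str) -> str:
    """Swap each sense codon for a synonymous codon with the most G/C bases."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # leave stop codons unchanged
            recoded.append(codon)
            continue
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        best = max(synonyms, key=gc_count)
        # Keep the original codon when it is already as GC-rich as possible.
        recoded.append(codon if gc_count(codon) >= gc_count(best) else best)
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGGCTAAATAA"
    recoded = raise_gc_content(original)
    print(recoded, translate(original) == translate(recoded))
```

A greedy substitution of this kind can introduce CpG dinucleotides, so in practice it would be balanced against the CpG and homopolymer considerations described in the following section.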
(iv) Reducing CpG content and homopolymer content
In addition to modulating GC content, one of skill in the art can manipulate the protein-encoding gene sequence of a synthetic mRNA strand by incorporating codon substitutions that diminish the CpG content and/or homopolymer content of the mRNA. For instance, one can begin with a wild-type mRNA sequence and introduce substitutions (e.g., single-nucleotide substitutions) that reduce the CpG content and/or homopolymer content of the gene while preserving the identity of the encoded protein sequence. CpG sites and homopolymers can promote +1 frameshifts during the mRNA translation process.
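To identify candidate sites for such substitutions, CpG dinucleotides and homopolymer runs can be located in a sequence before and after recoding. The Python sketch below is illustrative only; the minimum run length of four bases used to call a homopolymer is an assumed threshold that the text does not specify, and the function names are hypothetical.

```python
import re


def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G, read 5' to 3')."""
    return seq.upper().count("CG")


def homopolymer_runs(seq: str, min_len: int = 4):
    """Return (start_index, run) for each run of one base repeated >= min_len times.

    The default threshold of 4 is an assumption made for illustration.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One base followed by at least (min_len - 1) repeats of the same base.
    pattern = r"([ACGTU])\1{" + str(min_len - 1) + r",}"
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq.upper())]


if __name__ == "__main__":
    example = "ATGCGAAAAACGT"
    print(count_cpg(example), homopolymer_runs(example))
```

Comparing these counts for a wild-type sequence and a recoded sequence gives a quick check that a set of synonymous substitutions has reduced, rather than increased, the features in question.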
Alternatively, if the homopolymer encodes amino acid residues that are not essential for protein function (for instance, if the encoded amino acids are not present within the active site of an encoded enzyme or within a site necessary for non-covalent binding to another biological molecule), one of skill in the art can incorporate codon substitutions that interrupt the homopolymer and that introduce a conservative substitution into the encoded protein at the site of the corresponding amino acid, for instance, such that the encoded protein has an amino acid sequence that exhibits at least 85% sequence identity (e.g., 85%, 90%, 95%, 97%, 99%, or more) relative to the wild type amino acid sequence.
(v) Preparation of mRNA-containing duplexes
Once designed, the mRNA strand and short complementary oligonucleotides can be prepared, for instance, by solid phase nucleic acid procedures known in the art. For instance, to perform the chemical synthesis of nucleic acid molecules, such as DNA, RNA and the like, a solid phase synthesis process using a phosphoramidite method can be employed. According to this procedure, a nucleic acid is generally synthesized by the following steps.
First, a 5'-OH-protected nucleoside that will occur at the 3' terminal end of the nucleic acid to be synthesized is esterified via the 3'-OH function to a solid support by appending the nucleoside to a cleavable linker. Then, the support for solid phase synthesis on which the nucleoside is immobilized can be placed in a reaction column which is then set on an automated nucleic acid synthesizer.
Thereafter, an iterative synthetic process including the following steps can be performed in the reaction column according to a synthesis program of the automated nucleic acid synthesizer:
• (1) a step of deprotection of the 5'-OH moiety of the protected, immobilized nucleoside (e.g., with an acid such as trichloroacetic acid in dichloromethane solution or the like);
• (2) a step of coupling a 5'-OH-protected nucleoside phosphoramidite with the deprotected 5'-OH group in the presence of an activator (e.g., tetrazole or the like);
• (3) a step of capping the unreacted 5'-OH group of the 3'-terminal nucleoside (e.g., with acetic anhydride or the like); and
• (4) a step of oxidizing the immobilized phosphite substituent (e.g., with aqueous iodine or the like).
The above process can be repeated to elongate the nucleic acid as needed in a 3'-to-5' direction. By repeating this cycle, elongation toward the 5' terminus is promoted, and a nucleic acid having a desired sequence is synthesized.
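Because chain elongation proceeds in the 3'-to-5' direction, the order in which nucleosides are added is the reverse of the sequence as conventionally written. The Python sketch below illustrates this ordering, together with the commonly used estimate that the theoretical fraction of full-length product equals the per-cycle coupling efficiency raised to the number of coupling steps. The estimate assumes a uniform coupling efficiency for every cycle, and the function names are hypothetical; neither is taken from the text.

```python
def coupling_order(sequence_5_to_3: str):
    """Return nucleosides in order of addition for 3'-to-5' solid phase synthesis.

    The first element is the 3'-terminal nucleoside immobilized on the support;
    each later element is added by one deprotect/couple/cap/oxidize cycle.
    """
    return list(reversed(sequence_5_to_3.upper()))


def full_length_yield(length: int, coupling_efficiency: float) -> float:
    """Theoretical fraction of full-length product.

    Assumes the same coupling efficiency for every cycle; a nucleic acid of
    `length` residues requires (length - 1) coupling steps.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    print(coupling_order("ACGU"))
    print(full_length_yield(21, 0.99))
```

The rapid decline of this estimate with length is one reason why short complementary oligonucleotides are well suited to chemical synthesis.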
Lastly, the cleavable linker is hydrolyzed (e.g., with aqueous ammonia, methylamine solution, or the like) to cleave the synthesized nucleic acid from the solid phase support. Procedures such as the foregoing for the chemical synthesis of nucleic acids are known in the art and are described, for instance, in US Patent No. 8,835,656, the disclosure of which is incorporated herein by reference as it pertains to protocols for the synthesis of nucleic acid molecules.
B. Therapeutic Proteins
mRNA-containing duplexes that can be designed according to the methods described herein include those that encode therapeutic proteins, such that the duplexes can be transferred to a subject (e.g., a human patient) suffering from a disease or condition characterized by a deficiency in the protein. For instance, mRNA-containing duplexes that can be designed and delivered to a patient according to the methods described herein include those that encode, for example, hormones and growth and differentiation factors including, without limitation, insulin, glucagon, growth hormone (GH), parathyroid hormone (PTH), calcitonin, growth hormone releasing factor (GRF), thyroid stimulating hormone (TSH), adrenocorticotropic hormone (ACTH), prolactin, melatonin, vasopressin, β-endorphin, met-enkephalin, leu-enkephalin, prolactin-releasing factor, prolactin-inhibiting factor, corticotropin-releasing hormone, thyrotropin-releasing hormone (TRH), follicle stimulating hormone (FSH), luteinizing hormone (LH), chorionic gonadotropin (CG), vascular endothelial growth factor (VEGF), angiopoietins, angiostatin, endostatin, granulocyte colony stimulating factor (GCSF), erythropoietin (EPO), connective tissue growth factor (CTGF), basic fibroblast growth factor (bFGF), bFGF2, acidic fibroblast growth factor (aFGF), epidermal growth factor (EGF), transforming growth factor α (TGFα), platelet-derived growth factor (PDGF), insulin-like growth factors I and II (IGF-I and IGF-II), any one of the transforming growth factor β (TGFβ) superfamily comprising TGFβ, activins, inhibins, or any of the bone morphogenic proteins (BMP) BMPs 1-15, any one of the heregulin/neuregulin/ARIA/neu differentiation factor (NDF) family of growth factors, nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), neurotrophins NT-3, NT-4/5 and NT-6, ciliary neurotrophic factor (CNTF), glial cell line derived neurotrophic factor (GDNF), neurturin, persephin, agrin, any one of the family of semaphorins/collapsins, netrin-1 and netrin-2, hepatocyte growth factor (HGF), ephrins, noggin, sonic hedgehog and tyrosine hydroxylase.
Other therapeutically useful mRNA-containing duplexes that can be designed and delivered to a patient according to the methods described herein include those that encode proteins that regulate the immune system including, without limitation, cytokines and lymphokines such as thrombopoietin (TPO), interleukins (IL) IL-1α, IL-1β, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14, IL-15, IL-16, and IL-17, monocyte chemoattractant protein (MCP-1), leukemia inhibitory factor (LIF), granulocyte-macrophage colony stimulating factor (GM-CSF), granulocyte colony stimulating factor (G-CSF), monocyte colony stimulating factor (M-CSF), Fas ligand, tumor necrosis factors α and β (TNFα and TNFβ), interferons (IFN) IFN-α, IFN-β, and IFN-γ, stem cell factor, and flk-2/flt3 ligand. Gene products produced by the immune system are also encompassed by this invention. These include, without limitation, immunoglobulins IgG, IgM, IgA, IgD and IgE, chimeric immunoglobulins, humanized antibodies, single chain antibodies, T cell receptors, chimeric T cell receptors, single chain T cell receptors, class I and class II MHC molecules, as well as engineered MHC molecules including single chain MHC molecules. Useful gene products also include complement regulatory proteins such as membrane cofactor protein (MCP), decay accelerating factor (DAF), CR1, CR2 and CD59.
Still other therapeutic proteins that can be encoded by a mRNA-containing duplex described herein include any one of the receptors for the hormones, growth factors, cytokines, lymphokines, regulatory proteins and immune system proteins. Examples of such receptors include flt-1, flk-1, TIE-2; the trk family of receptors such as TrkA, MuSK, Eph, PDGF receptor, EGF receptor, HER2, insulin receptor, IGF-1 receptor, the FGF family of receptors, the TGFβ receptors, the interleukin receptors, the interferon receptors, serotonin receptors, α-adrenergic receptors, β-adrenergic receptors, the GDNF receptor, p75 neurotrophin receptor, among others. mRNA-containing duplexes may encode, for instance, receptors for extracellular matrix proteins, such as integrins, counter-receptors for transmembrane-bound proteins, such as intercellular adhesion molecules (ICAM-1, ICAM-2, ICAM-3 and ICAM-4), vascular cell adhesion molecules (VCAM), and selectins E-selectin, P-selectin and L-selectin. mRNA-containing duplexes may encode, for instance, receptors for cholesterol regulation, including the LDL receptor, HDL receptor, VLDL receptor, and the scavenger receptor. mRNA-containing duplexes may encode, for instance, apolipoprotein ligands for these receptors, including ApoAI, ApoAIV and ApoE. Additional examples of proteins that may be encoded by mRNA-containing duplexes described herein include, for instance, the steroid hormone receptor superfamily including glucocorticoid receptors and estrogen receptors, Vitamin D receptors and other nuclear receptors.
In addition, useful gene products include antimicrobial peptides such as defensins and magainins, transcription factors such as jun, fos, max, mad, serum response factor (SRF), AP-1, AP-2, myb, MRG1, CREM, Alx4, FREAC1, NF-κB, members of the leucine zipper family, C2H4 zinc finger proteins, including Zif268, EGR1, EGR2, C6 zinc finger proteins, including the glucocorticoid and estrogen receptors, POU domain proteins, exemplified by Pit 1, homeodomain proteins, including HOX-1, basic helix-loop-helix proteins, including myc, MyoD and myogenin, ETS-box containing proteins, TFE3, E2F, ATF1, ATF2, ATF3, ATF4, ZF5, NFAT, CREB, HNF-4, C/EBP, SP1, CCAAT-box binding proteins, interferon regulation factor 1 (IRF-1), Wilms' tumor protein, ETS-binding protein, STAT, GATA-box binding proteins, e.g., GATA-3, and the forkhead family of winged helix proteins.
Other mRNA-containing duplexes of therapeutic utility include those that encode carbamoyl synthetase I, ornithine transcarbamylase, argininosuccinate synthetase, argininosuccinate lyase, arginase, fumarylacetoacetate hydrolase, phenylalanine hydroxylase, alpha-1 antitrypsin, glucose-6-phosphatase, porphobilinogen deaminase, factor VII, factor VIII, factor IX, factor II, factor V, factor X, factor XII, factor XI, von Willebrand factor, superoxide dismutase, glutathione peroxidase and reductase, heme oxygenase, angiotensin converting enzyme, endothelin-1, atrial natriuretic peptide, pro-urokinase, urokinase, plasminogen activator, heparin cofactor II, activated protein C (Factor V Leiden), Protein C, antithrombin, cystathionine beta-synthase, branched chain ketoacid decarboxylase, albumin, isovaleryl-CoA dehydrogenase, propionyl CoA carboxylase, methyl malonyl CoA mutase, glutaryl CoA dehydrogenase, insulin, beta-glucosidase, pyruvate carboxylase, hepatic phosphorylase, phosphorylase kinase, glycine decarboxylase (also referred to as P-protein), H-protein, T-protein, Menkes disease protein, tumor suppressors (e.g., p53), cystic fibrosis transmembrane regulator (CFTR), the product of Wilson's disease gene PWD, Cu/Zn superoxide dismutase, aromatic amino acid decarboxylase, tyrosine hydroxylase, acetylcholine synthetase, prohormone convertases, protease inhibitors, lactase, lipase, trypsin, gastrointestinal enzymes including chymotrypsin and pepsin, adenosine deaminase, α1 antitrypsin, tissue inhibitor of metalloproteinases (TIMP), GLUT-1, GLUT-2, trehalose phosphate synthase, hexokinases I, II and III, glucokinase, any one or more of the individual chains or types of collagen, elastin, fibronectin, thrombospondin, vitronectin and tenascin, and suicide genes such as thymidine kinase and cytosine deaminase. Other useful proteins include those involved in lysosomal storage disorders, including acid β-glucosidase, α-galactosidase A, α-L-iduronidase, iduronate sulfatase, lysosomal acid α-glucosidase, sphingomyelinase, hexosaminidase A, hexosaminidases A and B, arylsulfatase A, acid lipase, acid ceramidase, galactosylceramidase, α-fucosidase, α-, β-mannosidosis, aspartylglucosaminidase, neuraminidase, galactosylceramidase, heparan-N-sulfatase, N-acetyl-α-glucosaminidase, Acetyl-CoA:α-glucosaminide N-acetyltransferase, N-acetylglucosamine-6-sulfate sulfatase, N-acetylgalactosamine-6-sulfate sulfatase, arylsulfatase B, β-glucuronidase and hexosaminidases A and B.
Other mRNA-containing duplexes that can be designed and prepared using the compositions and methods described herein include those that encode non-naturally occurring polypeptides, such as chimeric or hybrid polypeptides or polypeptides having a non-naturally occurring amino acid sequence containing insertions, deletions or amino acid substitutions. For example, single-chain engineered immunoglobulins could be useful in certain immunocompromised patients. Other useful proteins include truncated receptors which lack their transmembrane and cytoplasmic domain. These truncated receptors can be used to antagonize the function of their respective ligands by binding to them without concomitant signaling by the receptor. Other types of non-naturally occurring gene sequences include sense and antisense molecules and catalytic nucleic acids, such as ribozymes, which could be used to modulate expression of a gene.
Representative proteins that can be encoded by a duplex containing mRNA annealed to a plurality of short complementary nucleic acids as described herein include those listed in Table 3, above.
C. Methods of Measuring Gene Expression
Prior to and/or following administration of a mRNA-containing duplex described herein, the expression level of a gene expressed by a single cell or type of cell (e.g., a cell that has been contacted or is to be contacted with a mRNA-containing duplex described herein) can be ascertained, for example, by evaluating the concentration or relative abundance of RNA transcripts (e.g., mRNA) derived from transcription of a gene of interest. Additionally or alternatively, gene expression can be determined by evaluating the concentration or relative abundance of protein produced by transcription and translation of a gene of interest. Protein concentrations can also be assessed using functional assays, such as enzymatic assays or gene transcription assays in the event the gene of interest encodes an enzyme or a modulator of transcription, respectively. The sections that follow describe exemplary techniques that can be used to measure the expression levels of genes in a cell, cell type, or population of cells of interest, for instance, prior to and/or following the administration of a mRNA-containing duplex described herein to a patient or cell of interest. Expression of genes in a sample can be analyzed by a number of methodologies, many of which are known in the art and understood by the skilled artisan, including, but not limited to, nucleic acid sequencing, microarray analysis, proteomics, in situ hybridization (e.g., fluorescence in situ hybridization (FISH)), amplification-based assays, fluorescence activated cell sorting (FACS), northern analysis and/or PCR analysis of mRNAs.
(i) Nucleic acid detection
Nucleic acid-based datasets suitable for analysis of cell-specific gene expression can have the form of a gene expression profile, which represents the identity of genes expressed in a cell of interest and the extent to which each gene is expressed, and which can be used to determine the ranked order of gene expression levels within a cell, cell type, or population of cells of interest. Such profiles may include whole transcriptome sequencing data (e.g., RNA-Seq data), panels of mRNAs, noncoding RNAs, or any other nucleic acid sequence that may be expressed from genomic DNA. Other nucleic acid datasets suitable for use with the methods described herein may include expression data collected by imaging-based techniques (e.g., Northern blotting or Southern blotting known in the art). Northern blot analysis is a conventional technique well known in the art and is described, for example, in Molecular Cloning, a Laboratory Manual, second edition, 1989, Sambrook, Fritsch, Maniatis, Cold Spring Harbor Press, 10 Skyline Drive, Plainview, NY 11803-2500, the disclosure of which is incorporated herein by reference. Typical protocols for evaluating the status of genes and gene products are found, for example in Ausubel et al., eds., 1995, Current Protocols In Molecular Biology, Units 2 (Northern Blotting), 4 (Southern Blotting), 15 (Immunoblotting) and 18 (PCR Analysis), the disclosure of which is incorporated herein by reference.
Gene expression profiles to be analyzed in conjunction with the methods described herein may include, for example, microarray data or nucleic acid sequencing data produced by a sequencing method known in the art (e.g., Sanger sequencing and next-generation sequencing methods, also known as high-throughput sequencing or deep sequencing). Exemplary next generation sequencing technologies include, without limitation, Illumina sequencing, Ion Torrent sequencing, 454 sequencing, SOLiD sequencing, and nanopore sequencing platforms. Additional methods of sequencing known in the art can also be used. For instance, mRNA expression levels may be determined using RNA-Seq (e.g., as described in Mortazavi et al., Nat. Methods 5:621-628 (2008), the disclosure of which is incorporated herein by reference in its entirety). RNA-Seq is a robust technology known in the art for monitoring expression by directly sequencing the RNA molecules in a sample. Briefly, this methodology may involve fragmentation of RNA to an average length of 200 nucleotides, conversion to cDNA by random priming, and synthesis of double-stranded cDNA (e.g., using the Just cDNA DoubleStranded cDNA Synthesis Kit from Agilent Technology). Then, the cDNA is converted into a molecular library for sequencing by addition of sequence adapters for each library (e.g., from Illumina®/Solexa), and the resulting 50-100 nucleotide reads are mapped onto the genome.
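Once reads have been mapped, read counts are typically normalized for transcript length and sequencing depth before expression levels are compared or ranked. The Python sketch below uses the reads per kilobase of transcript per million mapped reads (RPKM) measure as one common normalization; the text does not specify a normalization, so this choice, the function names, and the gene names and counts in the example are all hypothetical.

```python
def rpkm(read_count: int, transcript_length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)


def rank_genes(counts: dict, lengths: dict, total_mapped_reads: int):
    """Return (gene, RPKM) pairs sorted from highest to lowest expression."""
    values = {
        gene: rpkm(count, lengths[gene], total_mapped_reads)
        for gene, count in counts.items()
    }
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical gene names, read counts, and transcript lengths.
    counts = {"geneA": 500, "geneB": 1200, "geneC": 50}
    lengths = {"geneA": 2000, "geneB": 6000, "geneC": 500}
    for gene, value in rank_genes(counts, lengths, total_mapped_reads=10_000_000):
        print(gene, round(value, 2))
```

The resulting ranked list is one concrete form of the ranked order of gene expression levels referred to above.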
Gene expression levels may be determined using microarray-based platforms (e.g., single-nucleotide polymorphism (SNP) arrays), as microarray technology offers high resolution. Details of various microarray methods can be found in the literature. See, for example, US Patent No. 6,232,068 and Pollack et al., Nat. Genet. 23:41-46 (1999), the disclosures of each of which are incorporated herein by reference in their entirety. Using nucleic acid microarrays, mRNA samples are reverse transcribed and labeled to generate cDNA probes. The probes can then hybridize to one or more complementary nucleic acids arrayed and immobilized on a solid support. The array can be configured, for example, such that the sequence and position of each member of the array is known. Hybridization of a labeled probe with a particular array member indicates that the sample from which the probe was derived expresses that gene. Expression level may be quantified according to the amount of signal detected from hybridized probe-sample complexes. A typical microarray experiment involves the following steps: 1) preparation of fluorescently labeled target from RNA isolated from the sample, 2) hybridization of the labeled target to the microarray, 3) washing, staining, and scanning of the array, 4) analysis of the scanned image and 5) generation of gene expression profiles. One example of a microarray processor is the Affymetrix GENECHIP® system, which is commercially available and comprises arrays fabricated by direct synthesis of oligonucleotides on a glass surface. Other systems may be used as known to one skilled in the art.
Amplification-based assays also can be used to measure the expression level of one or more markers (e.g., genes). In such assays, the nucleic acid sequences of the gene act as a template in an amplification reaction (for example, PCR, such as qPCR). In a quantitative amplification, the amount of amplification product is proportional to the amount of template in the original sample. Comparison to appropriate controls provides a measure of the expression level of the gene, corresponding to the specific probe used, according to the principles described herein. Methods of real-time qPCR using TaqMan probes are well known in the art. Detailed protocols for real-time qPCR are provided, for example, in Gibson et al., Genome Res. 6:995-1001 (1996) and in Heid et al., Genome Res. 6:986-994 (1996), the disclosures of each of which are incorporated herein by reference. Levels of gene expression as described herein can be determined by RT-PCR technology. Probes used for PCR may be labeled with a detectable marker, such as, for example, a radioisotope, fluorescent compound, bioluminescent compound, a chemiluminescent compound, metal chelator, or enzyme.
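The comparison to controls described above is often carried out with the comparative threshold cycle (2^-ΔΔCt) calculation. The Python sketch below illustrates that calculation; it is a widely used approach rather than one named in the text, it assumes that the target and reference amplicons amplify with approximately 100% efficiency, and the function name and the Ct values in the example are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target gene in a sample relative to a control (2^-ddCt).

    Each Ct is first normalized to a reference (housekeeping) gene measured in
    the same condition. Assumes roughly 100% amplification efficiency.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: cells treated with a duplex vs. untreated cells.
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

A value above 1 indicates higher expression of the target in the sample than in the control, and a value below 1 indicates lower expression.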
(ii) Protein detection
Gene expression can additionally be determined by measuring the concentration or relative abundance of a corresponding protein product encoded by a gene of interest. Protein levels can be assessed using standard detection techniques known in the art. Examples of protein expression analysis that generate data suitable for use with the methods described herein include, without limitation, proteomics approaches, immunohistochemical and/or western blot analysis, immunoprecipitation, molecular binding assays, ELISA, enzyme-linked immunofiltration assay (ELIFA), mass spectrometry, mass spectrometric immunoassay, and biochemical enzymatic activity assays. In particular, proteomics methods can be used to generate large-scale protein expression datasets in multiplex. Proteomics methods may utilize mass spectrometry to detect and quantify polypeptides (e.g., proteins) and/or peptide microarrays utilizing capture reagents (e.g., antibodies) specific to a panel of target proteins to identify and measure expression levels of proteins expressed in a sample (e.g., a single cell sample or a multi-cell population).
Exemplary peptide microarrays have a substrate-bound plurality of polypeptides, the binding of an oligonucleotide, a peptide, or a protein to each of the plurality of bound polypeptides being separately detectable. Alternatively, the peptide microarray may include a plurality of binders, including but not limited to monoclonal antibodies, polyclonal antibodies, phage display binders, yeast two-hybrid binders, and aptamers, which can specifically detect the binding of specific oligonucleotides, peptides, or proteins. Examples of peptide arrays may be found in US Patent Nos. 6,268,210, 5,766,960, and 5,143,854, the disclosures of each of which are incorporated herein by reference.
Mass spectrometry (MS) may be used in conjunction with the methods described herein to identify and characterize the gene expression profile of a single cell or multi-cell population. Any method of MS known in the art may be used to determine, detect, and/or measure a peptide or peptides of interest, e.g., LC-MS, ESI-MS, ESI-MS/MS, MALDI-TOF-MS, MALDI-TOF/TOF-MS, tandem MS, and the like. Mass spectrometers generally contain an ion source and optics, mass analyzer, and data processing electronics. Mass analyzers, including scanning and ion-beam mass spectrometers, such as time-of-flight (TOF) and quadrupole (Q) analyzers, and trapping mass spectrometers, such as ion trap (IT), Orbitrap, and Fourier transform ion cyclotron resonance (FT-ICR) analyzers, may be used in the methods described herein. Details of various MS methods can be found in the literature. See, for example, Yates et al., Annu. Rev. Biomed. Eng. 11:49-79 (2009), the disclosure of which is incorporated herein by reference.
Prior to MS analysis, proteins in a sample can be first digested into smaller peptides by chemical (e.g., via cyanogen bromide cleavage) or enzymatic (e.g., trypsin) digestion. Complex peptide samples also benefit from the use of front-end separation techniques, e.g., 2D-PAGE, HPLC, RPLC, and affinity chromatography. The digested, and optionally separated, sample is then ionized using an ion source to create charged molecules for further analysis. Ionization of the sample may be performed, e.g., by electrospray ionization (ESI), atmospheric pressure chemical ionization (APCI), photoionization, electron ionization, fast atom bombardment (FAB)/liquid secondary ionization (LSIMS), matrix assisted laser desorption/ionization (MALDI), field ionization, field desorption, thermospray/plasmaspray ionization, and particle beam ionization. Additional information relating to the choice of ionization method is known to those of skill in the art.
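The enzymatic digestion step can be modeled in silico to predict which peptides a protein is expected to yield. The Python sketch below applies the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) except when the next residue is proline (P); this rule is a simplification that ignores missed cleavages, and the function name, the `min_length` parameter, and the example sequence are hypothetical.

```python
def tryptic_digest(protein: str, min_length: int = 1):
    """Return peptides from an idealized, complete tryptic digestion.

    Cleaves after K or R unless the following residue is P. Peptides shorter
    than `min_length` are discarded (an illustrative filter, not from the text).
    """
    protein = protein.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide without a trailing K/R
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    print(tryptic_digest("MKRPAKG"))
```

Predicted peptide lists of this kind are the starting point for the database comparison of MS/MS spectra described below.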
After ionization, digested peptides may then be fragmented to generate signature MS/MS spectra. Tandem MS, also known as MS/MS, may be particularly useful for methods described herein allowing for ionization followed by fragmentation of a complex peptide sample, such as a sample obtained from a multi-cell population described herein. Tandem MS involves multiple steps of MS selection, with some form of ion fragmentation occurring in between the stages, which may be accomplished with individual mass spectrometer elements separated in space or using a single mass spectrometer with the MS steps separated in time. In spatially separated tandem MS, the elements are physically separated and distinct, with a physical connection between the elements to maintain high vacuum. In temporally separated tandem MS, separation is accomplished with ions trapped in the same place, with multiple separation steps taking place over time. Signature MS/MS spectra may then be compared against a peptide sequence database (e.g., SEQUEST). Post-translational modifications to peptides may also be determined, for example, by searching spectra against a database while allowing for specific peptide modifications.
D. Methods for the Delivery of mRNA Duplexes to Target Cells
Techniques that can be used to introduce an mRNA-containing duplex described herein into a target cell (e.g., a mammalian cell, such as a human cell) include transfection techniques known in the art. For instance, electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest. Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids. Electroporation of mammalian cells is described in detail, e.g., in Chu et al., Nucleic Acids Research 15:1311 (1987), the disclosure of which is incorporated herein by reference. A similar technique, Nucleofection™, utilizes an applied electric field in order to stimulate the uptake of exogenous polynucleotides into the nucleus of a eukaryotic cell. Nucleofection™ and protocols useful for performing this technique are described in detail, e.g., in Distler et al., Experimental Dermatology 14:315 (2005), as well as in US 2010/0317114, the disclosures of each of which are incorporated herein by reference.
Additional techniques useful for the transfection of target cells with mRNA-containing duplexes described herein include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al., Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference.
Various amphiphilic lipids can form bilayers in an aqueous environment to encapsulate an RNA-containing aqueous core as a liposome. These lipids can have an anionic, cationic, or zwitterionic hydrophilic head group. Some phospholipids are anionic, whereas others are zwitterionic and others are cationic. Suitable classes of phospholipid include, but are not limited to, phosphatidylethanolamines, phosphatidylcholines, phosphatidylserines, and phosphatidylglycerols. Useful cationic lipids include, but are not limited to, dioleoyl trimethylammonium propane (DOTAP), 1,2-distearyloxy-N,N-dimethyl-3-aminopropane (DSDMA), 1,2-dioleyloxy-N,N-dimethyl-3-aminopropane (DODMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), and 1,2-dilinolenyloxy-N,N-dimethyl-3-aminopropane (DLenDMA). Zwitterionic lipids include, but are not limited to, acyl zwitterionic lipids and ether zwitterionic lipids. Examples of useful zwitterionic lipids are DPPC, DOPC, DSPC, dodecylphosphocholine, 1,2-dioleoyl-sn-glycero-3-phosphatidylethanolamine (DOPE), and 1,2-diphytanoyl-sn-glycero-3-phosphoethanolamine (DPyPE). The lipids in the liposome can be saturated or unsaturated. If an unsaturated lipid has two tails, both tails can be unsaturated, or the lipid can have one saturated tail and one unsaturated tail. A lipid can include a steroid group in one tail.
Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex. Exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and diethylaminoethyl (DEAE)-dextran, the use of which as a transfection agent is described in detail, for instance, in Gulick et al., Current Protocols in Molecular Biology 40:1:9.2:9.2.1 (1997), the disclosure of which is incorporated herein by reference.
Magnetic beads represent another tool that can be used to transfect target cells in a mild and efficient manner, as this methodology utilizes an applied magnetic field in order to direct the uptake of nucleic acids. This technology is described in detail, for instance, in US 2010/0227406, the disclosure of which is incorporated herein by reference.
Another useful tool for inducing the uptake of exogenous nucleic acids by target cells is laserfection, a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al., Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference.
Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For instance, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence. The use of such vesicles, also referred to as Gesicles, for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al., Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract]. In: Methylation changes in early embryonic genes in cancer [abstract], in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13, Abstract No. 122.
Another tool for the delivery of mRNA-containing duplexes to a target cell is the use of exosomes, such as those harvested naturally from a cell (e.g., a mammalian cell, such as a human cell) or those prepared synthetically. Exemplary techniques for the preparation and loading of exosomes with RNA substrates are described, for instance, in US 2015/0093433, the disclosure of which is incorporated herein by reference in its entirety.
IV. Alternative RNA Splicing
The invention additionally provides methods and compositions that induce alternative splicing, for instance, by way of exon skipping, in a gene by using the host's endogenous mechanism for monitoring +1 ribosomal frameshifts.
A. The Reading Frame Surveillance Model and Alternative Splicing
There is a strong evolutionary imperative to limit ribosomal frameshift errors, whereby the ribosome shifts in the 5' or 3' direction during translation to produce a frameshift that is not genetically encoded. Reading of the mRNA by the ribosome should limit -1 frameshifting, as that event would require that either two tRNAs be bound to the same base on the mRNA, or that the mRNA contain a short homopolymeric sequence that allows slippage of the P-site tRNA. Plus 1 frameshifting, on the other hand, could occur without such steric inhibition whenever a single-nucleotide translocation allows binding of the A-site tRNA in a +1 shifted position relative to the previous codon.
The Reading Frame Surveillance (RFS) model suggests a way by which a ribosome mechanistically prevents +1 frameshifting when translating an mRNA. Translation of an mRNA by the ribosome entails direct binding of tRNA anti-codons to each codon of the message. Therefore, steric clashes between adjacent tRNAs should limit -1 frameshifting since such an event requires that two tRNAs be bound to the same base on the mRNA, or that the mRNA contain a short homopolymeric sequence that allows for slippage of the P-site tRNA. Plus 1 frameshifting, on the other hand, could occur without such steric inhibition whenever a single nucleotide translocation event allows for binding of the A-site tRNA in a +1 shifted position relative to the previous codon.
According to the RFS model, following transcription of a gene, a complementary RNA (cRNA) is generated by an RNA-dependent RNA polymerase (RdRP). In eukaryotes, pioneer translation is expected to occur in the nucleus following the synthesis of cRNA. During the pioneer round, translation of every set number of codons within a region of the cRNA that is complementary to an exon results in cleavage of the cRNA at set intervals, thus producing a series of uniformly sized short complementary RNAs (scRNAs) that span the length of the exonic regions of the pre-mRNA. Following splicing of the pre-mRNA to generate an mRNA, scRNAs remain complexed to the exonic regions of the mRNA, such that scRNAs span the length of the coding region of the mRNA. These scRNAs have a uniform size of 9-30 nucleotides and uniform spacing along the exonic regions of the mRNA. The RFS model suggests that these scRNAs serve as a molecular ruler that allows continuous monitoring for ribosomal frameshifts during subsequent rounds of translation by the ribosome. When a previously measured mRNA is subsequently translated, the RFS model postulates that the translational reading frame is monitored by sensing of scRNA termini with respect to the position of the tRNAs bound in the ribosomal active site.
In eukaryotes, splicing occurs in the nucleus, generating mature mRNAs competent for translation. Export to the cytoplasm and translation by the ribosome are regulated by the removal of introns, resulting in contiguous, uninterrupted reading frames. scRNAs are generated during pioneer translation in the nucleus, prior to splicing or export. Therefore, splicing joins these scRNA-bound exonic regions, juxtaposing the bordering scRNAs and demarcating their junction with an exon junction complex (EJC). The EJC is a protein complex formed on an mRNA at the junction of two exons which have been joined by splicing. The EJC has a role in regulating both export of the processed mRNA from the nucleus and cytoplasmic translation by the ribosome. When a splicing event occurs in which the uniform spacing of the scRNA fragments at the splice junction is not maintained, the EJC may adjust the register of RFS monitoring during translation downstream of the junction. Otherwise put, if the length of an exon is not an exact multiple of the length of the scRNA, the uniformity of scRNA spacing will be disrupted at the splice junction. In this case, the presence of an EJC indicates that RFS may proceed downstream of the splice junction, monitoring for uniformity of scRNA spacing within the next exon.
Therefore, after all introns have been removed and the message contains an acceptable contiguous reading frame that is bound to scRNAs that span the length of the mRNA coding region, it may be exported to the cytoplasm and translated into protein by the ribosome.
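The ruler arithmetic described above can be illustrated with a short sketch. It is only a toy model of the RFS hypothesis: it assumes that scRNAs of a single fixed length are laid end to end from the first nucleotide of an exon, and the function names `scrna_boundaries` and `junction_remainder` are hypothetical, introduced here for illustration.

```python
from __future__ import annotations


def scrna_boundaries(exon_length: int, scrna_length: int) -> list[int]:
    """Return the 3' end offsets (counted from the exon start) of each full
    scRNA when fixed-size scRNAs tile the exon end to end (an assumption)."""
    if not 9 <= scrna_length <= 30:
        raise ValueError("the text describes scRNAs of 9-30 nucleotides")
    return list(range(scrna_length, exon_length + 1, scrna_length))


def junction_remainder(exon_length: int, scrna_length: int) -> int:
    """Nucleotides left over after the last full scRNA in the exon.
    Zero means the uniform spacing carries across the splice junction."""
    return exon_length % scrna_length


if __name__ == "__main__":
    for exon_len in (120, 131):
        rem = junction_remainder(exon_len, 24)
        note = "spacing maintained" if rem == 0 else f"{rem} nt offset at junction"
        print(exon_len, scrna_boundaries(exon_len, 24), note)
```

Under these assumptions, a nonzero remainder corresponds to the case in which the EJC is proposed to adjust the register of RFS monitoring downstream of the junction.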
The mechanisms by which exons are selected by alternative splicing for inclusion into mRNAs are highly regulated. Splice site recognition has long been known to involve complementary pairing of short spliceosomal RNAs to splice site sequences and, although this pairing could occur coordinately with the generation of scRNAs, the simple recognition of splice site consensus sequences is not always enough to define all splice sites. When introns are long, as frequently occurs in higher eukaryotes, splice site selection is increasingly influenced by the poorly defined mechanism of exon recognition. The ability of a ribosome to translate an exon contributes to its inclusion into an mRNA. This is consistent with the RFS model, whereby pre-mRNA transcripts are first copied into cRNA and then scanned by a nuclear pioneer translation apparatus that results in the production of scRNAs that decorate exonic regions of the pre-mRNA. The scRNAs can be recognized by the spliceosome and accessory factors to select exons that are included in the mature mRNA following splicing.
In some embodiments of the RFS model, scRNAs may dissociate from the mRNA template following pioneer translation and bind additional mRNA molecules, including those of the same polarity and sequence, to facilitate RFS on those templates. In this way, scRNAs may function as a form of epigenetic memory of previous translations. In the case of alternative splicing, the presence of scRNA fragments spanning a given exon may be able to direct spliceosome-mediated inclusion in the mature mRNA. Conversely, the absence of scRNA fragments spanning a given exon may be able to direct spliceosome-mediated exclusion from the mature mRNA.
In one embodiment of the RFS model, scRNAs may establish an epigenetic signature that can be communicated between cells. Consider the observation of revertant muscle fibers in patients with Duchenne muscular dystrophy (DMD). DMD patients often possess mutations that disrupt the open reading frame of the DMD gene that codes for the dystrophin protein in muscle cells. This results in translation of C-terminally truncated proteins. Altered splicing of mutated DMD mRNA transcripts, whether induced by exogenous polynucleotides or occurring spontaneously in rare revertant fibers, can skip mutated exons and restore the translational reading frame such that a partially functional protein is generated. When visualized via dystrophin domain-specific immunohistochemistry, natural revertant fibers reside in what appear to be clonal clusters that grow by gaining additional, similarly spliced variants of the partially functional dystrophin protein produced by exon skipping. It has been suggested that tissue resident satellite stem cells pick up epigenetic cues from revertant fibers and use them during differentiation to replicate the specific alternative splicing pattern established in the original revertant fiber. This model may be consistent with scRNA fragments transmitting epigenetic regulation between cells. In some embodiments, this may facilitate tissue development and repair without the need for high efficiency delivery to a large number of cells.
B. Inducing Alternative Splicing
The present invention uses the host's endogenous RFS mechanism to induce targeted alternative splicing (for example, by way of exon skipping) during the splicing of a pre-mRNA to form an mRNA. By providing a multi-exon polynucleotide with the desired splice pattern, RFS will generate a series of scRNA fragments that may direct the desired splice pattern in endogenous mRNA transcripts. If the desired splice pattern omits one or more exons endogenous to a given host gene, when the host gene mRNA is translated by the ribosome, the resulting protein may be truncated, such that the truncated protein lacks one or more exteins of the wild-type form of the protein.
In some embodiments, the invention provides a method for inducing alternative splicing of an mRNA encoding a protein by providing a cell with a polynucleotide including at least the following elements operably linked in a 5'-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to a first intron (INTR1) from the gene; and a third region corresponding to a second exon (EX2) from the gene. In one embodiment, the wild type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise the one or more intervening exons. In some embodiments, the compositions and methods described herein are used to induce alternative splicing in the form of exon skipping.
In some embodiments, the polynucleotide may be delivered to a cell by inclusion in a vector, wherein the vector may be a viral vector. As such, the invention also provides a composition including a vector including a polynucleotide including at least the following elements operably linked in a 5'-3' direction: a first region corresponding to a first exon from a gene (EX1); a second region corresponding to a first intron (INTR1) from the gene; and a third region corresponding to a second exon (EX2) from the gene. In the preferred embodiment, the wild type form of the gene includes one or more intervening exons between EX1 and EX2, and the polynucleotide does not comprise the one or more intervening exons.
In some embodiments, the polynucleotide further includes a eukaryotic promoter, allowing for transcription of the corresponding mRNA including EX1-INTR1-EX2, along with any additional elements of the polynucleotide. The resulting mRNA may be used as a template by an RdRP to generate the corresponding cRNA. The cRNA may be cleaved by the pioneer round of translation at uniform intervals of 9-30 nucleotides in the regions of the cRNA that are complementary to EX1 and EX2. The presence of cRNA fragments spanning the EX1 and EX2 exonic regions of the resulting mRNA, wherein uniform spacing of the fragments denotes a continuous reading frame, may enable translation of the EX1-EX2 mRNA into protein.
In some embodiments, it may be necessary to determine the length, in nucleotides, of the scRNAs generated by RFS of a given gene. In order for the scRNA fragments generated on the polynucleotide comprising EX1-INTR1-EX2 to match the position and spacing of the scRNAs generated on the corresponding exon sequence of the endogenous transcript, it may be necessary to adjust the length of the first exon in the polynucleotide comprising EX1-INTR1-EX2. The scRNA positioning on the endogenous transcript is expected to be determined by regular spacing of cleavage beginning from where pioneer translation initiates. In some embodiments, the polynucleotide comprising EX1-INTR1-EX2 includes an ATG sequence at the 5' end of EX1, wherein the ATG may be responsible for the initiation of pioneer translation of the polynucleotide. Therefore, in some embodiments, the experimentalist may test initiation at multiple positions of EX1 to determine which model transcript matches the translation initiation of the endogenous mRNA, and thereby generates scRNAs on the polynucleotide that match the position and spacing of scRNAs on the endogenous mRNA. In some embodiments, this may be performed by generating a series of otherwise identical polynucleotides, each of which contains a different 5' truncation of EX1, such that the initiation of pioneer translation at the ATG of the polynucleotide occurs at a different position for each member of the series of polynucleotides. In some embodiments, this series of polynucleotides may be tested to determine which position for the initiation of pioneer translation enables the generation of scRNA fragments that direct the desired splicing pattern in the endogenous gene.
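As a minimal sketch of how such a truncation series might be enumerated in silico, the following assumes that each member is formed by placing an ATG directly upstream of EX1 after removing k nucleotides from its 5' end, and that one member per candidate register is wanted. The function name `truncated_constructs`, the parameter `n_offsets`, and the toy sequences are hypothetical and are not taken from the specification.

```python
from __future__ import annotations

START_CODON = "ATG"


def truncated_constructs(ex1: str, intr1: str, ex2: str, n_offsets: int) -> dict[int, str]:
    """Map each 5' truncation length k (0 .. n_offsets - 1) to the sequence
    ATG + EX1[k:] + INTR1 + EX2. Placing the ATG immediately upstream of the
    truncated exon is an illustrative assumption."""
    if not 0 < n_offsets <= len(ex1):
        raise ValueError("n_offsets must be between 1 and the length of EX1")
    ex1, intr1, ex2 = ex1.upper(), intr1.upper(), ex2.upper()
    return {k: START_CODON + ex1[k:] + intr1 + ex2 for k in range(n_offsets)}


if __name__ == "__main__":
    # Toy sequences for illustration only; real exons and introns are far longer.
    series = truncated_constructs("GATTACAGATTACA", "GTAAGTTTTCAG", "CCCGGG", 3)
    for k, seq in series.items():
        print(k, seq)
```

If the scRNA length for the gene were known, setting `n_offsets` equal to that length would cover every possible register once; that choice is an assumption of this sketch rather than a requirement stated in the text.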
In some embodiments, scRNAs produced from the transcription and pioneer translation of the polynucleotide that includes EX1-INTR1-EX2 may dissociate from the processed EX1-EX2 mRNA and bind to an endogenous pre-mRNA including EX1 and EX2. When an endogenous gene, and its corresponding pre-mRNA, comprise EX1, EX2, and one or more intervening exons between EX1 and EX2, then scRNA fragments bound to EX1 and EX2 may induce alternative splicing by the spliceosome, such that the resulting mRNA includes EX1 and EX2 linked directly in a 5'-to-3' direction, and the intervening one or more exons are not included in the mature mRNA. Therefore, in some embodiments, scRNAs generated as a result of delivery of a polynucleotide including EX1-INTR1-EX2 to a cell may induce epigenetic regulation in the cell that alters the splice pattern of endogenous mRNA. In some embodiments, the epigenetic memory that results as a product of scRNAs may be transferrable between cells.
C. Design Considerations for Polynucleotides that Induce Alternative Splicing
(i) Alternative splicing to bypass small mutations within an exon
Small mutations may be considered as mutations that occur within an isolated exon. Examples of small mutations include point mutations, single-nucleotide insertions, single-nucleotide deletions, and insertions or deletions that occur within an isolated exon. In some embodiments, the endogenous gene targeted for alternative splicing includes one or more small mutations, as described above.
Small mutations that occur in in-frame exons, wherein the length in nucleotides of the wild-type form of the exon is a multiple of three, can be bypassed by skipping the mutated exon alone. This will result in an internally truncated protein, which may restore partial or complete function of the protein by either removal of a premature termination codon or restoration of the downstream reading frame. In some embodiments, the endogenous gene targeted for alternative splicing includes one or more small mutations in an in-frame exon. When a mutation is present in an in-frame exon, in some embodiments, delivery of a polynucleotide that induces skipping of the mutated in-frame exon by using the host's endogenous RFS mechanism, may restore the wild-type reading frame of the downstream exons. In some embodiments, alternative splicing is induced in the form of exon skipping.
Small mutations that occur in out-of-frame exons, e.g., when the length in nucleotides of the wild-type form of the exon is not a multiple of three, require that an additional one or more exons are also skipped in order to restore and maintain the reading frame. In the case that multiple exons are skipped in order to restore or maintain the reading frame, the total number of nucleotides in the wild-type form of all the exons that are skipped must equal a multiple of three. In some embodiments, the endogenous gene targeted for alternative splicing includes one or more small mutations in an out-of-frame exon. When a mutation is present in an out-of-frame exon, in some embodiments, delivery of a polynucleotide induces skipping of the mutated out-of-frame exon and one or more additional exons by using the host's endogenous RFS mechanism, such that the downstream reading frame is restored to wild type by alternative splicing.
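The multiple-of-three rule for in-frame and out-of-frame exons can be sketched as a search over contiguous blocks of exons. The sketch below assumes that only wild-type exon lengths matter, that the skipped exons are contiguous, and that the first and last exons of the gene are never skipped because they flank the skipped block; the function name `frame_preserving_skips` and the example lengths are hypothetical.

```python
from __future__ import annotations


def frame_preserving_skips(exon_lengths: list[int], mutated_index: int,
                           max_exons: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) windows of contiguous exons (0-based, inclusive)
    that contain the mutated exon and whose combined wild-type length is a
    multiple of three, smallest windows first."""
    n = len(exon_lengths)
    if not 1 <= mutated_index <= n - 2:
        raise ValueError("mutated exon must be an internal exon")
    windows = []
    for start in range(max(1, mutated_index - max_exons + 1), mutated_index + 1):
        for end in range(mutated_index, min(n - 1, start + max_exons)):
            if sum(exon_lengths[start:end + 1]) % 3 == 0:
                windows.append((start, end))
    # Prefer skipping as few exons as possible, then the most 5' window.
    windows.sort(key=lambda w: (w[1] - w[0], w[0]))
    return windows


if __name__ == "__main__":
    lengths = [150, 88, 95, 120, 61, 200]  # illustrative lengths only
    print(frame_preserving_skips(lengths, mutated_index=2))
```

For an in-frame exon, the single-exon window is returned first, which corresponds to skipping the mutated exon alone; for an out-of-frame exon, only windows that include additional exons appear.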
(ii) Alternative splicing to bypass large deletions or duplications
Large mutations may be considered as either a deletion that removes a region of a gene wherein the region includes one or more entire exons, or a duplication that multiplies a region of a gene including one or more entire exons.
If the total number of nucleotides in the exon or exons that are deleted or duplicated is a multiple of three, the reading frame will not be disrupted. This will result in a protein that is internally truncated (in the case of a deletion) or internally extended (in the case of a duplication), and that may be fully or partially functional. In this case, alternative splicing (e.g., exon skipping) is not required to restore the reading frame. However, where an in-frame duplication is related to a pathological state, in some embodiments, alternative splicing may be employed to remove duplicated exons. In the preferred embodiment, alternative splicing of the duplicated exon does not incur a frameshift mutation.
By contrast, when the total number of nucleotides in the exon or exons that have been deleted or duplicated is not a multiple of 3, the reading frame is shifted, leading to the incorporation of aberrant amino acids into the protein downstream of the mutation during translation. Often, an incorrect reading frame contains premature termination codons, leading to premature termination of translation and production of a C-terminally truncated protein. In this case, alternative splicing (e.g., by way of exon skipping or exon inclusion) can be used to restore the reading frame by skipping one or more adjacent exons such that the reading frame is restored. Wherein the mutated gene includes a deletion of one or more out-of-frame exons, in some embodiments, one or more exons are skipped using RFS such that the total number of nucleotides in the deleted exon(s) and the skipped exon(s) is a multiple of three, thereby restoring the downstream reading frame. Wherein the mutated gene includes a duplication of one or more out-of-frame exons, in some embodiments, one or more exons are skipped using RFS such that the total number of nucleotides in the duplicated exon(s) and the skipped exon(s) is a multiple of three, thereby restoring the downstream reading frame.
C. Therapeutic applications of alternative splicing using RFS
(i) Alternative splicing for reading frame restoration
The ability to induce targeted alternative splicing resulting in the production of a selectively truncated protein has well-documented therapeutic value. In some embodiments, alternative splicing is induced in the form of exon skipping. In some embodiments, alternative splicing (e.g., exon skipping) by Reading Frame Surveillance may be used to restore complete or partial function to a protein, wherein the gene encoding the protein contains a mutation relative to the wild-type form of the gene.
In some embodiments, a host may harbor a deleterious genetic mutation in an exon, wherein targeted skipping of the exon may allow for complete or partial restoration of the function of the protein. In some embodiments, the mutation is a point mutation, insertion, or deletion that results in either a premature termination codon or a frameshift. In these embodiments, induced alternative splicing of the exon containing the mutation could restore the reading frame and thereby restore partial or complete function of the protein.
In another example, a host may harbor a deleterious genetic mutation in which a region including one or more exons, or a portion of one or more exons, has been duplicated or deleted, resulting in a frameshift. Targeted alternative splicing of one or more additional exons, such that the removal of the exon or exons from the mRNA restores the downstream reading frame, may allow for complete or partial restoration of the function of the protein. In some embodiments, the mutation is a deletion or duplication of a region of a gene including an entire exon, wherein the mutation results in a frameshift. In other related embodiments, the mutation is a deletion or duplication of a region of a gene including a portion of an exon, wherein the mutation results in a frameshift. In these embodiments, induced alternative splicing of one or more exons, such that alternative splicing results in restoration of the downstream reading frame, may restore partial or complete function of the protein.
In some embodiments, alternative splicing (e.g., by way of exon skipping) may be used in the treatment of Duchenne muscular dystrophy (DMD). As discussed previously, DMD arises from frameshifting or nonsense mutations in the DMD gene that codes for the dystrophin protein. Alternative splicing induced by the compositions and methods described herein may restore the downstream reading frame in the mutated DMD gene, thus restoring partial function of the protein.
In some embodiments, alternative splicing, for instance, by way of exon skipping, may be used in the treatment of dystrophic epidermolysis bullosa. Frameshifting mutations in the COL7A1 gene that encodes type VII collagen are associated with dystrophic epidermolysis bullosa. In-frame exon skipping that results in restoration of the downstream reading frame has been shown, in some cases, to produce an internally truncated protein with near-normal function.
(ii) Alternative splicing for reading frame disruption
In some embodiments, it may be therapeutically useful to produce a C-terminally truncated protein; for example, if a protein is overexpressed, if a protein contains a dominant negative mutation, or if a chromosomal aberration results in a fusion protein associated with disease. In such cases, alternative splicing may be used to disrupt the reading frame in order to include a premature termination codon in the mRNA transcript and produce a C-terminally truncated protein.
In some embodiments, alternative splicing by RFS may be used to disrupt the downstream reading frame in order to include a premature termination codon in the resulting mRNA transcript. Alternative splicing to disrupt the reading frame would require skipping of one or more exons, wherein the total number of nucleotides in the one or more exons being skipped does not add up to a multiple of 3. Moreover, design of the alternative splicing construct should consider the downstream sequence such that the alternative splicing results in a premature termination codon at the desired location.
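The two design checks described above (that the skipped length is not a multiple of three, and where the resulting premature termination codon falls) can be sketched as follows. The sketch assumes that the exon list is a coding sequence beginning at the start codon in frame 0 and that DNA-alphabet stop codons are used; the function names and the toy exons are hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> int | None:
    """Return the 0-based index of the first in-frame stop codon, reading
    from position 0 in steps of three, or None if no stop codon is found."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def disruption_report(exons: list[str], first: int, last: int) -> dict:
    """Skip exons first..last (inclusive) and report whether the frame is
    disrupted and where the first stop codon then falls in the spliced CDS."""
    skipped_length = sum(len(e) for e in exons[first:last + 1])
    spliced = "".join(exons[:first] + exons[last + 1:])
    return {
        "frame_disrupted": skipped_length % 3 != 0,
        "stop_index": first_stop_codon(spliced),
    }


if __name__ == "__main__":
    exons = ["ATGGCC", "GAAC", "TGACCGGTTAA"]  # toy coding exons, illustration only
    print(first_stop_codon("".join(exons)))    # stop position in the unskipped CDS
    print(disruption_report(exons, 1, 1))      # effect of skipping the middle exon
```

Comparing the reported stop index against the intended truncation point is one way a candidate construct could be screened before it is built; the screening criterion itself is left to the designer.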
(iii) Alternative splicing for isoform switching
Up to 95% of multi-exon genes undergo alternative splicing, wherein multiple protein isoforms may be produced from a single gene by the selective inclusion or exclusion of exons from the processed mRNA. In these genes, alternative splicing patterns can determine which isoform of a protein is expressed. This process may be regulated in a developmental and tissue specific manner. In some embodiments, alternative splicing (e.g., by way of exon skipping) may be used to preferentially express a given protein isoform by inducing alternative splicing, wherein doing so has therapeutic value in the treatment of a disease state.
In some embodiments, alternative splicing may be used to regulate the relative expression of different isoforms of the human tau protein. Tau protein is encoded by the microtubule-associated protein tau (MAPT) gene and contributes to microtubule stabilization. There are six protein isoforms expressed in the brain that result from alternative splicing. The ratio of tau protein that includes exon 10 (E10+) to tau protein that excludes exon 10 (E10-) is regulated during development. The human embryonic brain expresses only the E10- isoform, whereas the human adult brain expresses E10- and E10+ at equal levels. Tau-related pathologies, such as frontotemporal dementia with Parkinsonism, are associated with an increase in the ratio of E10+:E10-, leading to tau deposition in the brain, and are implicated in movement disorders and memory problems. For example, skipping of exon 10 of the MAPT gene may restore the balance of E10+:E10-.
In some embodiments, induced alternative splicing that regulates isoform expression may be used in the treatment of cancer. The Wilms' tumor gene, WT1, is often overexpressed in leukemia and solid tumors. The gene product of WT1 may interfere with normal signaling, leading to increased proliferation and inhibition of differentiation and apoptosis. Some leukemic cells express high levels of WT1 mRNA containing the alternatively spliced exon 5. Skipping of exon 5 has been shown to decrease cell viability and survival in leukemic cell cultures. In some embodiments, skipping of exon 5 of the WT1 gene may be therapeutically relevant for the treatment of leukemia.
In some embodiments, alternative splicing to induce isoform switching in the gene product of the folate hydrolase gene may be used in the treatment of prostate cancer. One isoform of the prostate-specific membrane antigen encoded by the folate hydrolase gene is expressed at a level 140-fold higher in malignant prostate tissues. The overexpressed isoform has an extracellular enzymatic domain that regulates folate uptake. In some embodiments, alternative splicing may be used to express a protein isoform with decreased enzymatic activity for the treatment of prostate cancer.
(iv) Alternative splicing for gene knockdown
In some embodiments, alternative splicing (e.g., by way of exon skipping) may be applied as an alternative way to knock down the expression of a gene. For example, there are two natural isoforms of Apolipoprotein B (ApoB). The full-length ApoB100 protein is a ligand for the LDL receptor and is required for the assembly of VLDL, IDL, and LDL particles. ApoB100 plays a central role in atherosclerosis. The other isoform, ApoB48, is essential for chylomicron assembly and intestinal fat transport. ApoB48 arises from tissue-specific RNA editing that results in a premature termination codon being present in exon 26. It is desirable to knock down the expression of ApoB100 for the treatment of atherosclerosis without affecting the expression of ApoB48. In some embodiments, alternative splicing may be used to selectively knock down the expression of ApoB100 by skipping exon 27 and inducing a frameshift mutation. Because ApoB48 is truncated at exon 26, skipping of exon 27 would not interfere with its expression.
In some embodiments, alternative splicing that induces a frameshift may be used to knock down the activity of an over-expressed or constitutively active protein, wherein the increased activity of the protein leads to a pathogenic state. In fibrodysplasia ossificans progressiva (FOP), a mutation of the receptor encoded by the ALK2 gene leads to constitutive activation of the receptor and downstream signaling pathways. Alternative splicing that disrupts the reading frame of the receptor encoded by ALK2, and thereby results in a non-functional protein, has been proposed as a possible method of treating FOP.
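By way of illustration only, the effect of exon skipping on the reading frame can be sketched as follows. Skipping an exon whose length is not a multiple of three displaces every downstream codon, which is the frameshift relied upon in the embodiments above. The exon sequences and function names below are hypothetical and are chosen only for illustration; they are not the actual exons of ApoB or ALK2.

```python
def skipping_shifts_frame(exon_length):
    """Return True if removing an exon of this length disrupts the reading frame."""
    return exon_length % 3 != 0


def skip_exon(exons, skip_index):
    """Join all exons except the one at skip_index into a spliced transcript."""
    return "".join(exon for i, exon in enumerate(exons) if i != skip_index)


# Hypothetical exons (not real gene sequences) of 6, 4, and 6 nucleotides.
exons = ["ATGGCC", "GATT", "AAACCC"]
full_length = "".join(exons)
skipped = skip_exon(exons, skip_index=1)

print(full_length, skipped)
print(skipping_shifts_frame(len(exons[1])))  # the 4-nucleotide exon
print(skipping_shifts_frame(len(exons[0])))  # the 6-nucleotide exon
```

A check of this kind may be used to identify candidate exons whose skipping is expected to disrupt, or alternatively to preserve, the downstream reading frame.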
(v) Alternative splicing for cryptic splice sites
Mutations can introduce new splice sites associated with a pseudo-exon, the inclusion of which may disturb protein production. In some embodiments, alternative splicing may be used to remove an aberrant pseudo-exon and restore production of a functional protein. One such application is in the treatment of autosomal recessive congenital myopathy, which is due to a mutation in the type 1 ryanodine receptor (RyR1) encoded by the RYR1 gene. In at least one case, inclusion of a small pseudo-exon in RYR1 resulting in a frameshift was observed. Alternative splicing of this pseudo-exon may restore the reading frame of RYR1 and therefore restore production of the functional RyR1 protein.
D. Methods for the Delivery of Exogenous Nucleic Acids to Target Cells
(i) Transfection techniques
Techniques that can be used to introduce a polynucleotide, such as codon-optimized DNA or RNA, into a mammalian cell are well known in the art. For instance, electroporation can be used to permeabilize mammalian cells (e.g., human target cells) by the application of an electrostatic potential to the cell of interest. Mammalian cells, such as human cells, subjected to an external electric field in this manner are subsequently predisposed to the uptake of exogenous nucleic acids. Electroporation of mammalian cells is described in detail, e.g., in Chu et al. Nucleic Acids Research 15:1311 (1987), the disclosure of which is incorporated herein by reference. A similar technique, Nucleofection™, utilizes an applied electric field in order to stimulate the uptake of exogenous polynucleotides into the nucleus of a eukaryotic cell. Nucleofection™ and protocols useful for performing this technique are described in detail, e.g., in Distler et al. Experimental Dermatology 14:315 (2005), as well as in US 2010/0317114, the disclosures of each of which are incorporated herein by reference.
Additional techniques useful for the transfection of target cells include the squeeze-poration methodology. This technique induces the rapid mechanical deformation of cells in order to stimulate the uptake of exogenous DNA through membranous pores that form in response to the applied stress. This technology is advantageous in that a vector is not required for delivery of nucleic acids into a cell, such as a human target cell. Squeeze-poration is described in detail, e.g., in Sharei et al. Journal of Visualized Experiments 81:e50980 (2013), the disclosure of which is incorporated herein by reference.
Lipofection represents another technique useful for transfection of target cells. This method involves the loading of nucleic acids into a liposome, which often presents cationic functional groups, such as quaternary or protonated amines, towards the liposome exterior. This promotes electrostatic interactions between the liposome and a cell due to the anionic nature of the cell membrane, which ultimately leads to uptake of the exogenous nucleic acids, for instance, by direct fusion of the liposome with the cell membrane or by endocytosis of the complex. Lipofection is described in detail, for instance, in US Patent No. 7,442,386, the disclosure of which is incorporated herein by reference. Similar techniques that exploit ionic interactions with the cell membrane to provoke the uptake of foreign nucleic acids include contacting a cell with a cationic polymer-nucleic acid complex. Exemplary cationic molecules that associate with polynucleotides so as to impart a positive charge favorable for interaction with the cell membrane include activated dendrimers (described, e.g., in Dennig, Topics in Current Chemistry 228:227 (2003), the disclosure of which is incorporated herein by reference) and diethylaminoethyl (DEAE)-dextran, the use of which as a transfection agent is described in detail, for instance, in Gulick et al. Current Protocols in Molecular Biology 40:1:9.2:9.2.1 (1997), the disclosure of which is incorporated herein by reference. Magnetic beads are another tool that can be used to transfect target cells in a mild and efficient manner, as this methodology utilizes an applied magnetic field in order to direct the uptake of nucleic acids. This technology is described in detail, for instance, in US 2010/0227406, the disclosure of which is incorporated herein by reference.
Another useful tool for inducing the uptake of exogenous nucleic acids by target cells is laserfection, a technique that involves exposing a cell to electromagnetic radiation of a particular wavelength in order to gently permeabilize the cells and allow polynucleotides to penetrate the cell membrane. This technique is described in detail, e.g., in Rhodes et al. Methods in Cell Biology 82:309 (2007), the disclosure of which is incorporated herein by reference. Microvesicles represent another potential vehicle that can be used to modify the genome of a target cell according to the methods described herein. For instance, microvesicles that have been induced by the co-overexpression of the glycoprotein VSV-G with, e.g., a genome-modifying protein, such as a nuclease, can be used to efficiently deliver proteins into a cell that subsequently catalyze the site-specific cleavage of an endogenous polynucleotide sequence so as to prepare the genome of the cell for the covalent incorporation of a polynucleotide of interest, such as a gene or regulatory sequence. The use of such vesicles, also referred to as Gesicles, for the genetic modification of eukaryotic cells is described in detail, e.g., in Quinn et al. Genetic Modification of Target Cells by Direct Delivery of Active Protein [abstract]. In: Methylation changes in early embryonic genes in cancer [abstract], in: Proceedings of the 18th Annual Meeting of the American Society of Gene and Cell Therapy; 2015 May 13, Abstract No. 122.
(ii) Incorporation of target genes by gene editing techniques
In addition to viral vectors, a variety of tools have been developed that can be used for the incorporation of exogenous genes into target cells, such as a human cell. One such method that can be used for incorporating polynucleotides encoding target genes into target cells involves the use of transposons. Transposons are polynucleotides that encode transposase enzymes and contain a polynucleotide sequence or gene of interest flanked by 5' and 3' excision sites. Once a transposon has been delivered into a cell, expression of the transposase gene commences and results in active enzymes that cleave the gene of interest from the transposon. This activity is mediated by the site-specific recognition of transposon excision sites by the transposase. In some instances, these excision sites may be terminal repeats or inverted terminal repeats. Once excised from the transposon, the gene of interest can be integrated into the genome of a mammalian cell by transposase-catalyzed cleavage of similar excision sites that exist within the nuclear genome of the cell. This allows the gene of interest to be inserted into the cleaved nuclear DNA at the complementary excision sites, and subsequent covalent ligation of the phosphodiester bonds that join the gene of interest to the DNA of the mammalian cell genome completes the incorporation process. In certain cases, the transposon may be a retrotransposon, such that the gene of interest is first transcribed to an RNA product and then reverse-transcribed to DNA before incorporation in the mammalian cell genome. Exemplary transposon systems include the piggyBac transposon (described in detail in, e.g., WO 2010/085699) and the Sleeping Beauty transposon (described in detail in, e.g., US 2005/0112764), the disclosures of each of which are incorporated herein by reference as they pertain to transposons for use in gene delivery to a cell of interest.
Another tool for the integration of target genes into the genome of a target cell is the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system, a system that originally evolved as an adaptive defense mechanism in bacteria and archaea against viral infection. The CRISPR/Cas system includes palindromic repeat sequences within plasmid DNA and an associated Cas9 nuclease. This ensemble of DNA and protein directs site-specific DNA cleavage of a target sequence by first incorporating foreign DNA into CRISPR loci. Polynucleotides containing these foreign sequences and the repeat-spacer elements of the CRISPR locus are in turn transcribed in a host cell to create a guide RNA, which can subsequently anneal to a target sequence and localize the Cas9 nuclease to this site. In this manner, highly site-specific Cas9-mediated DNA cleavage can be engendered in a foreign polynucleotide because the interaction that brings Cas9 within close proximity of the target DNA molecule is governed by RNA:DNA hybridization. As a result, one can theoretically design a CRISPR/Cas system to cleave any target DNA molecule of interest. This technique has been exploited in order to edit eukaryotic genomes (Hwang et al. Nature Biotechnology 31:227 (2013)) and can be used as an efficient means of site-specifically editing hematopoietic stem cell genomes in order to cleave DNA prior to the incorporation of a target gene. The use of CRISPR/Cas to modulate gene expression has been described in, for instance, US Patent No. 8,697,359, the disclosure of which is incorporated herein by reference as it pertains to the use of the CRISPR/Cas system for genome editing. Alternative methods for site-specifically cleaving genomic DNA prior to the incorporation of a gene of interest in a target cell include the use of zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs).
Unlike the CRISPR/Cas system, these enzymes do not contain a guiding polynucleotide to localize to a specific target sequence. Target specificity is instead controlled by DNA binding domains within these enzymes. The use of ZFNs and TALENs in genome editing applications is described, e.g., in Urnov et al. Nature Reviews Genetics 11:636 (2010); and in Joung et al. Nature Reviews Molecular Cell Biology 14:49 (2013), the disclosures of each of which are incorporated herein by reference as they pertain to compositions and methods for genome editing.
Additional genome editing techniques that can be used to incorporate polynucleotides encoding target genes into the genome of a target cell include the use of ARCUS™ meganucleases that can be rationally designed so as to site-specifically cleave genomic DNA. The use of these enzymes for the incorporation of target genes into the genome of a mammalian cell is advantageous in view of the defined structure-activity relationships that have been established for such enzymes. Single chain meganucleases can be modified at certain amino acid positions in order to create nucleases that selectively cleave DNA at desired locations, enabling the site-specific incorporation of a target gene into the nuclear DNA of a hematopoietic stem cell. These single-chain nucleases have been described extensively in, for example, US Patent Nos. 8,021,867 and 8,445,251, the disclosures of each of which are incorporated herein by reference as they pertain to compositions and methods for genome editing.
E. Vectors for Delivery of Exogenous Nucleic Acids to Target Cells
(i) Viral vectors for nucleic acid delivery
Viral genomes provide a rich source of vectors that can be used for the efficient delivery of exogenous genes into the genome of a cell (e.g., a mammalian cell, such as a human cell). Viral genomes are particularly useful vectors for gene delivery because the polynucleotides contained within such genomes are typically incorporated into the genome of a target cell by generalized or specialized transduction. These processes occur as part of the natural viral replication cycle, and do not require added proteins or reagents in order to induce gene integration. Examples of viral vectors include AAV, retrovirus, adenovirus (e.g., Ad5, Ad26, Ad34, Ad35, and Ad48), parvovirus (e.g., adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g., measles and Sendai), positive strand RNA viruses, such as picornavirus and alphavirus, and double stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, modified vaccinia Ankara (MVA), fowlpox and canarypox). Other viruses useful for delivering polynucleotides described herein include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example. Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D-type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).
Other examples include murine leukemia viruses, murine sarcoma viruses, mouse mammary tumor virus, bovine leukemia virus, feline leukemia virus, feline sarcoma virus, avian leukemia virus, human T-cell leukemia virus, baboon endogenous virus, Gibbon ape leukemia virus, Mason Pfizer monkey virus, simian immunodeficiency virus, simian sarcoma virus, Rous sarcoma virus and lentiviruses. Other examples of vectors are described, for example, in US Patent No. 5,801,030, the disclosure of which is incorporated herein by reference as it pertains to viral vectors for use in gene therapy.
(ii) AAV vectors for nucleic acid delivery
In some embodiments, nucleic acids of the compositions and methods described herein are incorporated into rAAV vectors and/or virions in order to facilitate their introduction into a cell. rAAV vectors useful in the invention are recombinant nucleic acid constructs that include (1) a heterologous sequence to be expressed (e.g., a polynucleotide encoding a GAA protein) and (2) viral sequences that facilitate integration and expression of the heterologous genes. The viral sequences may include those sequences of AAV that are required in cis for replication and packaging (e.g., functional ITRs) of the DNA into a virion. In typical applications, the heterologous gene encodes GAA, which is useful for correcting a GAA-deficiency in a cell. Such rAAV vectors may also contain marker or reporter genes. Useful rAAV vectors have one or more of the AAV WT genes deleted in whole or in part, but retain functional flanking ITR sequences. The AAV ITRs may be of any serotype (e.g., derived from serotype 2) suitable for a particular application. Methods for using rAAV vectors are described, for example, in Tal et al. J. Biomed. Sci. 7:279-291 (2000), and Monahan and Samulski, Gene Delivery 7:24-30 (2000), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
The nucleic acids and vectors described herein can be incorporated into a rAAV virion in order to facilitate introduction of the nucleic acid or vector into a cell. The capsid proteins of AAV compose the exterior, non-nucleic acid portion of the virion and are encoded by the AAV cap gene. The cap gene encodes three viral coat proteins, VP1, VP2 and VP3, which are required for virion assembly. The construction of rAAV virions has been described, for instance, in US Patent Nos. 5,173,414; 5,139,941; 5,863,541; 5,869,305; 6,057,152; and 6,376,237; as well as in Rabinowitz et al. J. Virol. 76:791-801 (2002) and Bowles et al. J. Virol. 77:423-432 (2003), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
rAAV virions useful in conjunction with the compositions and methods described herein include those derived from a variety of AAV serotypes including AAV 1, 2, 3, 4, 5, 6, 7, 8 and 9. For targeting muscle cells, rAAV virions that include at least one serotype 1 capsid protein may be particularly useful. rAAV virions that include at least one serotype 6 capsid protein may also be particularly useful, as serotype 6 capsid proteins are structurally similar to serotype 1 capsid proteins, and thus are expected to also result in high expression of GAA in muscle cells. rAAV serotype 9 has also been found to be an efficient transducer of muscle cells. Construction and use of AAV vectors and AAV proteins of different serotypes are described, for instance, in Chao et al. Mol. Ther. 2:619-623 (2000); Davidson et al. Proc. Natl. Acad. Sci. USA 97:3428-3432 (2000); Xiao et al. J. Virol. 72:2224-2232 (1998); Halbert et al. J. Virol. 74:1524-1532 (2000); Halbert et al. J. Virol. 75:6615-6624 (2001); and Auricchio et al. Hum. Molec. Genet. 10:3075-3081 (2001), the disclosures of each of which are incorporated herein by reference as they pertain to AAV vectors for gene delivery.
Also useful in conjunction with the compositions and methods described herein are pseudotyped rAAV vectors. Pseudotyped vectors include AAV vectors of a given serotype (e.g., AAV9) pseudotyped with a capsid gene derived from a serotype other than the given serotype (e.g., AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, etc.). For example, a representative pseudotyped vector is an AAV8 or AAV9 vector encoding a therapeutic protein pseudotyped with a capsid gene derived from AAV serotype 2. Techniques involving the construction and use of pseudotyped rAAV virions are known in the art and are described, for instance, in Duan et al. J. Virol. 75:7662-7671 (2001); Halbert et al., J. Virol. 74:1524-1532 (2000); Zolotukhin et al. Methods, 28:158-167 (2002); and Auricchio et al., Hum. Molec. Genet., 10:3075-3081 (2001).
AAV virions that have mutations within the virion capsid may be used to infect particular cell types more effectively than non-mutated capsid virions. For example, suitable AAV mutants may have ligand insertion mutations for the facilitation of targeting AAV to specific cell types. The construction and characterization of AAV capsid mutants including insertion mutants, alanine screening mutants, and epitope tag mutants is described in Wu et al., J. Virol. 74:8635-45 (2000). Other rAAV virions that can be used in conjunction with the compositions and methods described herein include those capsid hybrids that are generated by molecular breeding of viruses as well as by exon shuffling. See, e.g., Soong et al., Nat. Genet., 25:436-439 (2000) and Kolman and Stemmer, Nat. Biotechnol. 19:423-428 (2001).
Examples
The following examples are put forth so as to provide those of ordinary skill in the art with a description of how the compositions and methods described herein may be used, made, and evaluated, and are intended to be purely exemplary of the invention and are not intended to limit the scope of what the inventors regard as their invention.
Example 1. Codon optimization of GAA for tissue-specific gene expression
This example demonstrates how the reading frame surveillance model can be applied to design a codon-optimized sequence of the GAA gene for enhanced expression in a target tissue within a human patient suffering from Pompe disease. The principle of reading frame surveillance is based on the hypothesis that short complementary RNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated. The model indicates that these complementary RNA fragments function in concert to promote degradation of endogenous RNA that imperfectly aligns with the ribosome. As these endogenous RNA molecules are capable of persisting within the cytoplasm and may anneal, for instance, to RNA molecules encoding other proteins due to chance complementarity, a gene encoding a protein intended for expression in a target cell can be codon-optimized by incorporating codon replacements into the gene that minimize the sequence identity between RNA resulting from transcription of the gene of interest and the endogenous RNA molecules present within the cell. Since endogenous RNA is generated from transcription of genes that are actively expressed within the target cell, one can use gene expression techniques described herein, such as qPCR, RNA-Seq, and immunoblot assays to assess which genes are expressed in a particular target cell so as to ascertain the panel of RNA molecules against which the sequence identity of the RNA transcript of the gene of interest can be compared. 
Since the coding strand of the DNA gene corresponds directly to the ensuing mRNA template, codon substitutions can then be incorporated into the coding strand of the gene that minimize the sequence identity of the coding strand (and thus, the subsequent mRNA transcript) with endogenous RNA transcripts present in the cell.
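By way of illustration only, the codon substitution step described above can be sketched as a greedy procedure that, at each codon, selects the synonymous codon sharing the fewest short subsequences (k-mers) with the coding strands of genes expressed in the target cell. The use of k-mer overlap as a proxy for sequence identity, the greedy selection, the default k-mer length, and all function names are assumptions made for this sketch and are not taken from the procedures described herein.

```python
from itertools import product

# Standard genetic code in TCAG order (coding-strand DNA codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def shared_kmers(seq, background, k):
    """Count the k-mers of seq (with multiplicity) that occur in background."""
    return sum(seq[i:i + k] in background for i in range(len(seq) - k + 1))


def minimize_identity(cds, expressed_transcripts, k=8):
    """Greedily pick synonymous codons that share few k-mers with expressed genes."""
    background = set()
    for transcript in expressed_transcripts:
        background.update(transcript[i:i + k] for i in range(len(transcript) - k + 1))
    out = ""
    for i in range(0, len(cds), 3):
        original = cds[i:i + 3]

        def cost(codon):
            # Only the k-mers ending inside the newly added codon depend on this choice.
            tail = (out + codon)[-(k + 2):]
            # Ties are broken in favor of keeping the original codon.
            return (shared_kmers(tail, background, k), codon != original)

        out += min(sorted(SYNONYMS[CODON_TO_AA[original]]), key=cost)
    return out


# Hypothetical inputs, not real gene sequences.
gene = "ATGGCCGCC"
optimized = minimize_identity(gene, ["ATGGCCGCC"], k=6)
print(optimized, translate(gene) == translate(optimized))
```

A greedy pass of this kind does not guarantee a global minimum; it is shown only because it preserves the encoded amino acid sequence by construction.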
Since endogenous scRNA fragments exist only for those genes that are actively transcribed, it is not necessary to minimize the sequence identity of the target gene relative to the entirety of the genome. If a gene is not expressed above a threshold level in the target cell under investigation (e.g., if a gene's expression level is not among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more, of gene expression levels in the intended target cell), the sequence identity of the target gene need not be minimized relative to the unexpressed gene.
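By way of illustration only, the expression threshold described above can be sketched as a ranking of genes by measured expression level followed by retention of a chosen top percentage. The gene names, expression values, and function name below are hypothetical placeholders; in practice the values would come from an expression assay such as RNA-Seq.

```python
import math


def highly_expressed(expression, top_percent=10):
    """Return the genes whose expression ranks within the top `top_percent` percent."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be greater than 0 and at most 100")
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_keep = math.ceil(len(ranked) * top_percent / 100)
    return ranked[:n_keep]


# Hypothetical expression levels (arbitrary units) for five placeholder genes.
levels = {"gene_a": 500.0, "gene_b": 120.0, "gene_c": 30.0, "gene_d": 2.0, "gene_e": 0.0}
print(highly_expressed(levels, top_percent=40))
```

Only the coding strands of the genes returned by such a filter would then be used as the comparison panel for the target gene.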
A wild-type GAA gene sequence, excluding intronic DNA, is as follows:
[Nucleotide sequence provided as an image in the original publication (imgf000107_0001)] (SEQ ID NO: 1).
The wild-type GAA amino acid sequence is as follows:
[Amino acid sequence provided as an image in the original publication (imgf000108_0001)]
Analysis of SEQ ID NO: 1 reveals specific codon preferences for various amino acids throughout the gene. The codons used in the wild-type GAA gene to encode the amino acids of this protein are summarized in Table 4, below.
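[Table 4, summarizing the codons used in the wild-type GAA gene, is provided as images in the original publication (imgf000109_0001 and imgf000110_0001)]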
Inspection of the codon frequencies outlined in Table 4 reveals that for certain amino acids, such as isoleucine, a particular codon is predominantly used (e.g., about 93% of the isoleucine residues in wild-type GAA are encoded by ATC) while other codons are used less frequently (e.g., about 7% of the isoleucine residues in wild-type GAA are encoded by ATT) or not at all (e.g., none of the isoleucine residues in wild-type GAA are encoded by ATA). According to methods described herein, one of skill in the art can use gene expression techniques known in the art to determine a gene expression profile for a target cell and subsequently align the coding strand of the wild-type GAA gene with the coding strands of genes determined to be expressed in the target cell. Based on the genetic code, one can subsequently introduce mutations into the GAA gene sequence that minimize the sequence identity of the coding strand of the gene (and hence, the ensuing mRNA template) with the coding strands of the genes determined to be expressed at a high level (e.g., genes whose expression levels are among the top 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more of gene expression levels) within the cell. This process will reduce the sequence complementarity between the GAA mRNA transcript and the endogenous scRNA fragments present within the cell following the translation of endogenous mRNA molecules, and will therefore diminish the degradation of the target mRNA transcript due to improper alignment of the endogenous scRNA molecules with the GAA mRNA template.
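By way of illustration only, a codon usage summary of the kind presented in Table 4 can be computed from a coding sequence by counting each codon and normalizing the counts within each amino acid. The function name and the short input sequence below are hypothetical; the sequence is not SEQ ID NO: 1, and the printed fractions are not the values of Table 4.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (coding-strand DNA codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds):
    """Return {amino acid: {codon: fraction of that amino acid's codons}}."""
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_aa = defaultdict(dict)
    for codon, n in counts.items():
        per_aa[CODON_TO_AA[codon]][codon] = n
    return {
        aa: {codon: n / sum(codons.values()) for codon, n in codons.items()}
        for aa, codons in per_aa.items()
    }


# Hypothetical sequence encoding four isoleucine residues.
usage = codon_usage("ATCATCATCATT")
print(usage["I"])
```

Codons that are absent from the input, such as ATA in this hypothetical sequence, simply do not appear in the returned mapping.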
In addition to the above process, one of skill in the art can design variants of the GAA gene that contain increased GC content, reduced CpG content, and/or reduced homopolymer content so as to enhance the translation of the GAA protein. For instance, after enhancing the GAA-encoding gene sequence by incorporating codon substitutions that minimize the sequence identity of the coding strand of the wild-type GAA gene to the coding strands of genes expressed within the target cell, one of skill in the art can subsequently modify the designed coding sequence so as to increase the GC content of the coding sequence while preserving the identity of the encoded amino acid sequence (SEQ ID NO: 2). The increase in GC content will lead to enhanced binding of the GAA mRNA transcript with GAA-specific scRNA fragments, which promotes improved licensing of the mRNA transcript for nuclear export and enhanced translation due to fewer instances of improper alignment of the ribosome to the GAA-encoding mRNA template.
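By way of illustration only, GC content and CpG content can be measured for candidate coding sequences so that alternative synonymous designs can be compared. The function names and the two short candidate sequences below are hypothetical; both candidates encode the same two residues (alanine followed by arginine) under the standard genetic code.

```python
def gc_content(seq):
    """Fraction of nucleotides in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_count(seq):
    """Number of CpG dinucleotides (a C immediately followed by a G) in seq."""
    seq = seq.upper()
    return sum(seq[i:i + 2] == "CG" for i in range(len(seq) - 1))


# Two hypothetical synonymous encodings of Ala-Arg.
for candidate in ("GCGCGT", "GCCAGA"):
    print(candidate, round(gc_content(candidate), 2), cpg_count(candidate))
```

Because raising GC content can also introduce CpG dinucleotides, the two measures may be evaluated together when ranking candidate sequences.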
Additionally, one of skill in the art can further manipulate the GAA-encoding gene sequence by incorporating codon substitutions that diminish the CpG content and/or homopolymer content of the designed GAA gene. For instance, in the wild-type GAA gene sequence above, there are three instances of the homopolymer GGGGGG (SEQ ID NO: 4). Homopolymers can be a site of frameshift mutations in the formation of an mRNA transcript and/or during the translation process. If this homopolymer sequence remains in the codon-optimized GAA gene even after minimizing the sequence identity of the gene relative to endogenously expressed genes in the target cell, one of skill in the art can incorporate further mutations that interrupt this homopolymer while preserving the identity of the encoded protein (SEQ ID NO: 2). Alternatively, if the homopolymer encodes amino acid residues that are not essential for protein function (for instance, if the encoded amino acids are not present within the active site of the GAA enzyme), one of skill in the art can incorporate codon substitutions that interrupt the homopolymer and that introduce a conservative substitution into the encoded protein at the site of the corresponding amino acid.
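By way of illustration only, the interruption of a homopolymer by synonymous codon substitution can be sketched as follows. The sketch locates runs of six or more identical nucleotides and tries synonymous replacements for the codons overlapping the first run until the number of such runs decreases. The run-length threshold, the search order, and the function names are assumptions made for this sketch, and the input sequence is hypothetical rather than taken from SEQ ID NO: 1.

```python
import re
from itertools import product

# Standard genetic code in TCAG order (coding-strand DNA codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def find_homopolymers(seq, min_len=6):
    """Return (start, end) spans of runs of at least min_len identical bases."""
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return [m.span() for m in re.finditer(pattern, seq)]


def break_homopolymers(cds, min_len=6):
    """Interrupt homopolymer runs using synonymous codons where possible."""
    seq = cds
    while True:
        runs = find_homopolymers(seq, min_len)
        if not runs:
            return seq
        start, end = runs[0]
        for i in range(start - start % 3, end, 3):  # codons overlapping the run
            codon = seq[i:i + 3]
            alternatives = sorted(c for c in SYNONYMS[CODON_TO_AA[codon]] if c != codon)
            candidates = [seq[:i] + alt + seq[i + 3:] for alt in alternatives]
            better = [c for c in candidates if len(find_homopolymers(c, min_len)) < len(runs)]
            if better:
                seq = better[0]
                break
        else:
            return seq  # no synonymous fix found; remaining runs need manual review


# Hypothetical sequence containing a run of seven G nucleotides.
print(break_homopolymers("ATGGGGGGGTAA"))
```

Runs that cannot be interrupted synonymously are returned unchanged in this sketch, which corresponds to the case above in which a conservative amino acid substitution may be considered instead.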
Once designed, the final codon-optimized gene can be prepared, for instance, by solid phase nucleic acid procedures known in the art. Techniques for the solid phase synthesis of polynucleotides are known in the art and are described, for instance, in US Patent No. 5,541,307, the disclosure of which is incorporated herein by reference as it pertains to solid phase polynucleotide synthesis and purification. Additionally, the prepared gene can be amplified, for instance, using PCR-based techniques described herein or known in the art, and/or by transformation of DH5α E. coli with a plasmid containing the designed gene. The bacteria can subsequently be cultured so as to amplify the DNA therein, and the gene can be isolated by plasmid purification techniques known in the art, followed optionally by a restriction digest and/or sequencing of the plasmid to verify the identity of the codon-optimized gene.
Example 2. Use of a codon-optimized GAA gene for the treatment of Pompe disease
A gene encoding a therapeutic protein, such as GAA, can be codon-optimized using the procedures described herein (e.g., as described in Example 1, above). The gene can subsequently be incorporated into a vector, such as a viral vector, and administered to a patient suffering from a disease associated with a deficiency in the gene. For instance, a patient suffering from Pompe disease, a lysosomal storage disorder characterized by a deficiency in GAA, can be administered a viral vector containing a codon-optimized GAA gene under the control of a suitable promoter for expression in a human cell, such as a human muscle cell. For instance, an AAV vector, such as a pseudotyped AAV2/9 vector, can be generated that incorporates the codon-optimized GAA gene between the 5' and 3' inverted terminal repeats of the vector, and the gene may be placed under control of a tissue-specific promoter, such as the desmin promoter. The AAV vector can be administered to the subject systemically. The gene may be codon-optimized for muscle-specific expression, for instance, by introducing mutations into the codon-optimized GAA gene that favor degradation of the ensuing mRNA transcript in cells of other tissues based on sequence complementarity to scRNA molecules present within those tissues. In some instances, it may be desirable to achieve liver-specific expression of the encoded GAA protein. To induce liver-specific expression, codon substitutions can be incorporated into the optimized GAA gene sequence as described in Example 1, above, that increase the sequence identity of the coding strand of the GAA gene with respect to genes expressed in non-liver cells while incorporating codon substitutions that diminish the sequence identity of the GAA gene relative to genes expressed in hepatocytes.
A practitioner of skill in the art can monitor the expression of the codon-optimized GAA gene by a variety of methods. For instance, one of skill in the art can transfect cultured hepatocytes, such as HepG2 cells, with the codon-optimized gene in order to model the expression of the codon-optimized gene in the liver of a patient. Expression of the encoded protein can subsequently be monitored using, for example, an expression assay described herein, such as qPCR, RNA-Seq, ELISA, or an immunoblot procedure. Based on the data obtained from the gene expression assay, further iterations of the codon optimization procedure can be performed, for instance, so as to increase GC content and/or to diminish CpG content and homopolymer content in the mRNA transcript. Candidate gene sequences with optimal expression patterns in vitro can subsequently be prepared for incorporation into a suitable vector and administration to a mammalian subject, such as an animal model of Pompe disease, or a human patient.
Example 3. Design and Delivery of polynucleotides having +1 frameshift mutations to human cells to reduce target gene expression
The principle of reading frame surveillance is based on the hypothesis that short complementary RNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated. The model indicates that scRNA may disengage and bind to RNA molecules of the same sequence, thereby functioning as a form of epigenetic memory of previous translation events. scRNA binding may be reduced to limit translation of a wild type mRNA transcript by the introduction of a competitive RNA molecule that can bind to and sequester free scRNA fragments.
Based on the RFS model, a therapeutic polynucleotide, having the sequence of a target gene (e.g., an aberrantly expressed and/or mutant gene associated with a disease) mRNA, may be designed to contain a +1 frameshift mutation. The +1 frameshift mutation may be generated by the addition of five nucleotides or by the inclusion of a single nucleotide deletion with respect to the targeted gene sequence. The polynucleotide may be introduced, e.g., by transfection, into cultured human cells expressing the target gene. For example, the polynucleotide may be introduced into a series of cell cultures at increasing concentrations. Target gene expression may then be monitored, e.g., by RT-PCR and Western blot analysis, for dose-dependent attenuation (e.g., reduction) as compared to untreated control cells.
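As an illustrative aside, the two alterations named above (addition of five nucleotides, or deletion of a single nucleotide) displace the downstream reading frame identically, because a net length change of +5 and a net length change of -1 are congruent modulo 3. The following Python sketch shows this on a toy sequence. The sequence, the inserted bases, and the function names are hypothetical, and the convention used for the offset (the position, relative to the original frame, at which downstream codons are read) is an assumption made for this illustration.

```python
"""Illustrative sketch only: reading-frame displacement caused by an
insertion or deletion. The toy sequence and all names are hypothetical."""


def frame_offset(net_length_change: int) -> int:
    """Offset (0, 1, or 2) at which downstream codons are read relative to the
    original frame, for a net gain (+) or loss (-) of nucleotides."""
    return (-net_length_change) % 3


def codons(seq: str) -> list:
    """Split a sequence into complete codons, discarding any trailing remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAATTTGGGCCC"                       # toy open reading frame
deletion = original[:3] + original[4:]                # one nucleotide removed
insertion = original[:3] + "TTTTT" + original[3:]     # five nucleotides added

if __name__ == "__main__":
    print(frame_offset(-1), frame_offset(5))  # both alterations give the same offset
    print(codons(original))
    print(codons(deletion))
    print(codons(insertion))
```

In this toy case, the codons read downstream of either alteration are the same, and they differ from the codons of the unaltered sequence.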
Example 4. Design and Delivery of polynucleotides having +1 frameshift mutations to human cells to reduce target gene expression by a viral vector
Based on the RFS model, a therapeutic polynucleotide, having the sequence of a target gene (e.g., an aberrantly expressed and/or mutant gene associated with a disease) mRNA, may be designed to contain a +1 frameshift mutation. The polynucleotide may be incorporated into a viral vector, such as an adeno-associated virus (AAV), for delivery into human cells. The AAV encoding the polynucleotide may be introduced into cultured human cells expressing the target gene. For example, the AAV encoding the polynucleotide may be introduced into a series of cell cultures at increasing concentrations. Target gene expression may then be monitored, e.g., by RT-PCR and Western blot analysis, for dose-dependent attenuation (e.g., reduction) as compared to untreated control cells.
Example 5. Design and Delivery of polynucleotides having +1 frameshift mutations to human cardiac cells to reduce mutant RYR2 expression
Catecholaminergic polymorphic ventricular tachycardia (CPVT) is a condition characterized by an abnormal heart rhythm (arrhythmia). CPVT can result from the mutation of the gene encoding ryanodine receptor 2 (RYR2) (Human RYR2 mRNA sequence: NCBI Reference Sequence: NM_001035.2).
Attenuation of mutant RYR2 expression may be an effective treatment for CPVT. This example demonstrates how the reading frame surveillance model can be applied to design a polynucleotide having a +1 frameshift mutation to promote attenuation of target gene expression in mammalian cells (e.g., human cells).
Based on the RFS model, a therapeutic polynucleotide, having the sequence of a CPVT-associated mutant RYR2 mRNA, may be designed to contain a +1 frameshift mutation. The polynucleotide may be introduced, e.g., by transfection, into cultured human cardiac cells expressing the mutant RYR2 gene. For example, the polynucleotide may be introduced into a series of cell cultures at increasing concentrations. RYR2 gene expression may then be monitored, e.g., by RT-PCR and Western blot analysis, for dose-dependent attenuation (e.g., reduction) as compared to untreated control cells.
Example 6. Design and delivery of duplex containing mRNA encoding GAA for expression in Pompe disease patient
This example demonstrates how the reading frame surveillance model can be applied to design a mRNA-containing duplex for promoting GAA gene expression in a target tissue within a human patient suffering from Pompe disease. The principle of reading frame surveillance is based on the hypothesis that short complementary RNA fragments are produced following initial translation of an endogenous RNA transcript, and that these complementary RNA strands act as molecular rulers that prevent the ribosome from aligning imperfectly with the mRNA template, for instance, by preventing the binding of the ribosome to the template mRNA one or more nucleotides downstream of the next codon to be translated. The model indicates that these complementary RNA fragments function in concert to promote the high-fidelity translation of functional protein product.
Based on the RFS model, a synthetic duplex can be prepared ex vivo and delivered to a patient, such as a patient suffering from a disease associated with a deficiency in a particular protein, so as to induce or augment the expression of the protein in the subject.
A wild-type GAA gene sequence, excluding intronic DNA, is as follows:
[Sequence reproduced as images in the original publication: Figure imgf000114_0001 and Figure imgf000115_0001]
Using the sequences set forth above, one of skill in the art can prepare (for instance, using chemical synthesis techniques described herein or known in the art) a mRNA molecule encoding wild type GAA. One of skill in the art can design a series of short (e.g., 9-30 nucleotide) complementary nucleic acids that will anneal to the GAA-encoding mRNA. The short complementary nucleic acids may include DNA strands, RNA strands, or both, and may include one or more modified nucleic acids.
Exemplary modified nucleotides include modified adenosine, such as N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, or 2'-fluoro-2'-deoxyadenosine 5'-triphosphate. Additional examples of modified nucleotides include modified guanosine, such as N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, or 2'-fluoro-2'-deoxyguanosine 5'-triphosphate. Modified uridine nucleotides can be incorporated into the mRNA-containing duplexes described herein, such as 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, or 2'-fluoro-2'-deoxyuridine 5'-triphosphate. Additional modified nucleotides include modified cytidine, such as 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, or 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
The complementary nucleic acids can span, for example, the entire length of the mRNA strand, such as from the start codon to and including the stop codon. Alternatively, the mRNA duplex may have one or more single stranded gaps, such as a single stranded region that is a multiple of 3 nucleotides in length (e.g., 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, or more, nucleotides in length).
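For illustration, the layout of short complementary strands along an mRNA, with single stranded gaps that are a multiple of 3 nucleotides in length, can be sketched as follows. This Python example is a simplified assumption and not the disclosed design procedure: it uses a uniform strand length and a uniform gap length, considers only unmodified RNA bases, and ignores thermodynamic considerations. The function names and the toy sequence are hypothetical.

```python
"""Illustrative sketch only: tiling short complementary RNA strands along an
mRNA with single stranded gaps that are multiples of 3. Names are hypothetical."""

# Watson-Crick pairing for unmodified RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_oligos(mrna: str, oligo_len: int = 21, gap_len: int = 3) -> list:
    """Return (start, end, oligo) tuples, where `oligo` anneals to mrna[start:end]
    and consecutive oligos are separated by `gap_len` unpaired nucleotides."""
    if not 9 <= oligo_len <= 30:
        raise ValueError("oligo length is expected to be 9-30 nucleotides")
    if gap_len < 0 or gap_len % 3 != 0:
        raise ValueError("gap length must be a non-negative multiple of 3")
    oligos = []
    pos = 0
    while pos + oligo_len <= len(mrna):
        end = pos + oligo_len
        oligos.append((pos, end, reverse_complement_rna(mrna[pos:end])))
        pos = end + gap_len
    return oligos


if __name__ == "__main__":
    toy_mrna = "AUG" + "GCU" * 10 + "UAA"  # toy coding sequence, not GAA
    for start, end, oligo in tile_oligos(toy_mrna, oligo_len=9, gap_len=3):
        print(start, end, oligo)
```

Setting `gap_len=0` in this sketch corresponds to complementary strands that span the mRNA without single stranded gaps.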
Once designed, the mRNA-containing duplex can be prepared, for instance, by solid phase nucleic acid procedures described herein or known in the art. Techniques for the solid phase synthesis of polynucleotides are known in the art and are described, for instance, in US Patent No. 5,541,307, the disclosure of which is incorporated herein by reference as it pertains to solid phase polynucleotide synthesis and purification.
Example 7. Use of GAA-encoding mRNA duplex for the treatment of Pompe disease
A mRNA duplex as prepared according to Example 6 can be incorporated into a variety of delivery vehicles, such as a liposome, cationic polymer, nanoparticle, vesicle, or exosome, and administered to a patient suffering from a disease associated with a deficiency in the gene. For instance, a patient suffering from Pompe disease, a lysosomal storage disorder characterized by a deficiency in GAA, can be administered a liposome containing a mRNA duplex prepared according to the procedure described in Example 6. A liposome containing the mRNA duplex can be administered to the subject systemically. Alternatively, the liposome can be administered in a tissue-specific fashion using administration techniques known in the art.
A practitioner of skill in the art can monitor the expression of the GAA gene by a variety of methods. For instance, expression of the encoded protein can be monitored using, for example, an expression assay described herein, such as qPCR, RNA-Seq, ELISA, or an immunoblot procedure. Based on the data obtained from the gene expression assay, the subject may be re-dosed with the mRNA duplex, for instance, so as to up-titrate or down-titrate the quantity of mRNA duplex administered to the subject.
Example 8. Induced alternative splicing by Reading Frame Surveillance of DMD exon 51 to restore the downstream reading frame in a mutant DMD gene
Duchenne muscular dystrophy (DMD) is a severe, progressive, neuromuscular disorder caused by mutations in the DMD gene that result in C-terminally truncated, non-functional dystrophin proteins. The mutations that give rise to DMD are heterogeneous in nature and typically include at least one of the following: a point mutation that produces a premature termination codon, a duplication, or a deletion. Alternative splicing, for instance, by way of exon skipping, is a promising therapeutic approach for the treatment of DMD since internally truncated dystrophins can be partly functional. This is exemplified by the less severe phenotype of Becker muscular dystrophy (BMD) patients, who carry mutations in the same gene, wherein the mutations do not alter the reading frame of protein translation.
Alternative splicing using RFS requires knowledge of the genetic basis of the pathological state. In the case of a heterogeneous genetic disease, such as DMD, it is necessary to determine the specific genetic variant that gives rise to the disease in the individual to be treated. Therefore, the first step in the design of a polynucleotide for targeted alternative splicing by RFS is to determine, by genetic sequencing, the identity of the genetic mutation that leads to the C-terminal truncation of dystrophin.
The wild type DMD gene consists of 79 exons and nearly all patients have unique mutations. The alternative splicing approach is mutation specific because different mutations require skipping different exons to restore protein function. However, two-thirds of all patients carry a deletion of one or more exons, of which 70% cluster between exons 45 and 55. Thus, skipping certain exons is applicable to treatment of large groups of patients. One estimate, described in Aartsma-Rus et al., Human Mutation 30(3):293 (2009), indicates that 13.0% of all mutations that give rise to DMD could be corrected by skipping of DMD exon 51 (hereafter referred to as exon 51), wherein correction involves restoring the reading frame for the production of an internally truncated protein.
This example describes the process for designing and administering a polynucleotide that directs targeted exon skipping of exon 51, wherein targeted exon skipping uses the Reading Frame Surveillance mechanism of the host cell. Since the wild type form of exons 48 through 50 of the DMD gene (Δ48-50) includes 397 nucleotides, which is not a multiple of 3, a genetic deletion of these exons would result in a frameshift. Since the combined number of nucleotides in exons 48 through 51 (Δ48-51) is 630, a multiple of 3, targeted skipping of exon 51 restores the downstream reading frame of the mutated Δ48-50 DMD gene.
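The reading frame arithmetic described above can be checked with a short calculation. The following Python sketch is illustrative only. It uses the two nucleotide counts stated in this example (397 and 630), while the length attributed to exon 51 is derived here as their difference rather than quoted from the specification. The function and variable names are hypothetical.

```python
"""Illustrative sketch only: does removing a given number of nucleotides from
a transcript preserve the downstream reading frame? Names are hypothetical."""


def preserves_reading_frame(removed_nucleotides: int) -> bool:
    """A deletion keeps the downstream frame only if its length is a multiple of 3."""
    return removed_nucleotides % 3 == 0


DELTA_48_50 = 397                      # exons 48 through 50, as stated in the text
DELTA_48_51 = 630                      # exons 48 through 51, as stated in the text
EXON_51 = DELTA_48_51 - DELTA_48_50    # derived by subtraction (an inference)

if __name__ == "__main__":
    for label, length in [("deletion of exons 48-50", DELTA_48_50),
                          ("exon 51 alone", EXON_51),
                          ("exons 48-51 absent from mRNA", DELTA_48_51)]:
        print(label, length, "remainder", length % 3,
              "in frame" if preserves_reading_frame(length) else "frameshift")
```

Neither the 397-nucleotide deletion nor exon 51 alone is a multiple of 3 in length, but their combined absence removes 630 nucleotides, so the codons downstream of exon 51 are read in their original frame.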
In this example, exon skipping using RFS may be achieved by introducing a polynucleotide into a cell harboring the Δ48-50 DMD gene, wherein the polynucleotide includes EX1-INTR1-EX2 operably linked in a 5'-to-3' direction, wherein EX1 corresponds to exon 47 of DMD and EX2 corresponds to exon 52 of DMD. INTR1 corresponds to any intron that enables the splicing of exon 47 to exon 52, wherein INTR1 may be isolated from DMD, isolated from another gene, a synthetic intron, or an intron that contains regions isolated from a gene and also includes a synthetic region.
This polynucleotide may be operably linked to a constitutive promoter, such as the cytomegalovirus (CMV) promoter, and incorporated into a viral vector, such as an adenoviral vector. The vector including the polynucleotide may be delivered to a cell harboring the Δ48-50 DMD gene, such as the well-characterized DMD myoblast model carrying the deletion of exons 48 through 50 (available at the Telethon Neuromuscular Biobank). Following delivery of the vector including the polynucleotide, the Δ48-50 DMD myoblasts can be induced to differentiate and samples of DMD mRNA and dystrophin protein collected after 7 days. Skipping of exon 51 can be evaluated by, e.g., RT-PCR analysis of the DMD mRNA and/or western blot of the dystrophin protein. Production of a dystrophin protein including an internal truncation of Δ48-51 is expected, thereby restoring the downstream reading frame and partial function of the protein.
Example 9. Induced alternative splicing by Reading Frame Surveillance of APOB exon 27 for the selective knockdown of ApoB100 activity
Apolipoprotein B (ApoB) has emerged as a key target for therapeutic interventions that lower LDL cholesterol. Two natural protein isoforms are generated by the APOB gene, ApoB100 and ApoB48. The full-length ApoB100 isoform is expressed in the liver and is the principal structural apolipoprotein in LDL particles. ApoB48 is generated by tissue-specific RNA editing of a single nucleotide that produces a premature termination codon in exon 26 of the APOB mRNA transcript. Thus, ApoB48 is C-terminally truncated as compared to ApoB100. ApoB48 is assembled into chylomicrons, whose function is to transport dietary fat and fat-soluble vitamins from the intestine. In the preferred embodiment, a treatment aimed at reducing cholesterol by interfering with ApoB should be selective for ApoB100 and not interfere with the function of ApoB48.
This example describes the process for designing and administering a polynucleotide that directs targeted exon skipping of exon 27 of the APOB gene, wherein the targeted exon skipping uses the Reading Frame Surveillance mechanism of the host cell. The polynucleotide may be therapeutically relevant for the treatment of high cholesterol and atherosclerosis by generating a truncated form of ApoB100. The wild type form of APOB exon 27 includes 115 nucleotides, which is not a multiple of 3, and a genetic deletion of exon 27 results in a frameshift mutation that leads to inclusion of a premature stop codon, and production of the C-terminally truncated ApoB87. Since ApoB48 includes a tissue-specific stop codon in exon 26, exon skipping of exon 27 does not interfere with its function.
In this example, exon skipping using RFS may be achieved by introducing a polynucleotide into a cell harboring the APOB gene, wherein the polynucleotide includes EX1-INTR1-EX2 operably linked in a 5'-to-3' direction, wherein EX1 corresponds to exon 26 of APOB and EX2 corresponds to exon 28 of APOB. INTR1 corresponds to any intron that enables the splicing of exon 26 to exon 28 (e.g., INTR1 may be isolated from APOB, isolated from another gene, a synthetic intron, or an intron that contains regions isolated from a gene and also includes a synthetic region).
This polynucleotide may be operably linked to a constitutive promoter, such as the cytomegalovirus (CMV) promoter, and incorporated into a viral vector, such as an adenoviral vector. The vector including the polynucleotide may be delivered to a cell such as the human hepatocellular carcinoma cell line, HepG2. HepG2 cells express ApoB100 and have been shown to support exon skipping and production of the ApoB87 isoform. The vector including the polynucleotide can be delivered to the cultured HepG2 cells as described in the detailed description of the invention, and samples of APOB mRNA and ApoB protein collected after 48 hours. Skipping of exon 27 can be evaluated, e.g., by RT-PCR analysis of the APOB mRNA and/or production of the ApoB87 isoform evaluated, e.g., by western blot of the ApoB protein.
To evaluate the specificity of gene knockdown, the vector containing the polynucleotide may be delivered to a cell such as the human epithelial colorectal carcinoma cell line, Caco-2. Caco-2 cells have been used as a model for lipoprotein synthesis in the intestine and express both the ApoB100 and ApoB48 isoforms once cellular confluence has been achieved. The vector can be delivered to the cultured Caco-2 cells as described in the detailed description of the invention, and samples of APOB mRNA and ApoB protein collected after 48 hours. Skipping of exon 27 can be evaluated, e.g., by RT-PCR analysis of the APOB mRNA, and production of the ApoB87 isoform can be evaluated, e.g., by western blot of the ApoB protein. No effect on the protein expression of the ApoB48 isoform is expected, thereby enabling selective gene knockdown by exon skipping using RFS.
Other Embodiments
All publications, patents, and patent applications mentioned in this specification are incorporated herein by reference to the same extent as if each independent publication or patent application was specifically and individually indicated to be incorporated by reference.
While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the invention that come within known or customary practice within the art to which the invention pertains and may be applied to the essential features hereinbefore set forth, and follows in the scope of the claims.
Other embodiments are within the claims.

Claims

1. A method of preparing a codon-optimized gene or RNA equivalent thereof for expression of a protein in a cell, said method comprising:
a) providing a gene expression profile for a plurality of genes expressed in said cell;
b) providing a polynucleotide sequence comprising a portion encoding said protein;
c) incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 1% of gene expression levels in the intended target cell; and
d) synthesizing, expressing, and/or isolating said polynucleotide.
2. The method of claim 1, wherein step (c) comprises incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 5% of gene expression levels in the intended target cell.
3. The method of claim 2, wherein step (c) comprises incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 10% of gene expression levels in the intended target cell.
4. The method of claim 3, wherein step (c) comprises incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 15% of gene expression levels in the intended target cell.
5. The method of claim 4, wherein step (c) comprises incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 20% of gene expression levels in the intended target cell.
6. The method of claim 5, wherein step (c) comprises incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 25% of gene expression levels in the intended target cell.
7. The method of claim 6, wherein step (c) comprises incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 30% of gene expression levels in the intended target cell.
8. The method of claim 7, wherein step (c) comprises incorporating codon substitutions into said polynucleotide that minimize the sequence identity of said polynucleotide or RNA resulting from transcription thereof to endogenous RNA molecules encoding a protein whose expression level is among the top 50% of gene expression levels in the intended target cell.
9. The method of any one of claims 1-8, wherein no more than 75% of the continuous 30-nucleotide portions of said polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of said endogenous RNA molecules.
10. The method of claim 9, wherein no more than 50% of the continuous 30-nucleotide portions of said polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of said endogenous RNA molecules.
11. The method of claim 10, wherein no more than 25% of the continuous 30-nucleotide portions of said polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of said endogenous RNA molecules.
12. The method of claim 11, wherein no more than 20% of the continuous 30-nucleotide portions of said polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of said endogenous RNA molecules.
13. The method of claim 12, wherein no more than 15% of the continuous 30-nucleotide portions of said polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of said endogenous RNA molecules.
14. The method of claim 13, wherein no more than 10% of the continuous 30-nucleotide portions of said polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of said endogenous RNA molecules.
15. The method of claim 14, wherein no more than 5% of the continuous 30-nucleotide portions of said polynucleotide or RNA resulting from transcription thereof exhibit greater than 75% sequence identity to a corresponding continuous 30-nucleotide portion of said endogenous RNA molecules.
16. The method of any one of claims 1-15, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 1% of expression levels in said cell.
17. The method of claim 16, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 5% of expression levels in said cell.
18. The method of claim 17, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 10% of expression levels in said cell.
19. The method of claim 18, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 15% of expression levels in said cell.
20. The method of claim 19, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 20% of expression levels in said cell.
21. The method of claim 20, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 25% of expression levels in said cell.
22. The method of claim 21, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 30% of expression levels in said cell.
23. The method of claim 22, wherein said polynucleotide or RNA resulting from transcription thereof exhibits no greater than 75% sequence identity to endogenous RNA molecules encoding a protein whose expression level is among the top 50% of expression levels in said cell.
24. The method of any one of claims 1-23, wherein step (c) comprises minimizing the sequence identity of the protein-coding region of said polynucleotide or RNA resulting from transcription thereof relative to the protein-coding regions of said endogenous RNA molecules.
25. The method of any one of claims 1-24, wherein step (c) comprises minimizing the sequence identity of one or more non-coding regions of said polynucleotide or RNA resulting from transcription thereof relative to the corresponding non-coding regions of said endogenous RNA molecules.
26. The method of claim 25, wherein said non-coding region is an intron, a 5' untranslated region (UTR), or a 3' UTR.
27. The method of any one of claims 1-26, wherein said method further comprises increasing the GC content of the polynucleotide while preserving the amino acid sequence of the encoded protein.
28. The method of claim 27, wherein the GC content of said polynucleotide is increased to a quantity sufficient to permit hybridization of said polynucleotide or RNA resulting from transcription thereof to a complementary RNA strand with a Gibbs free energy change of from about -10 to about -100 kcal/mol.
29. The method of claim 28, wherein said complementary RNA strand is from 9 to 30 nucleotides in length.
30. The method of any one of claims 1-29, wherein said method further comprises minimizing the CpG content of the polynucleotide while preserving the amino acid sequence of the encoded protein.
31. The method of any one of claims 1-30, wherein said method further comprises minimizing the homopolymer content of the polynucleotide while preserving the amino acid sequence of the encoded protein.
32. The method of any one of claims 1-31, wherein said cell is a eukaryotic cell.
33. The method of claim 32, wherein said eukaryotic cell is a mammalian cell.
34. The method of claim 33, wherein said mammalian cell is a human cell.
35. The method of any one of claims 1-34, wherein said method further comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 1% of expression levels in said cell but is among the top 1% of expression levels in a cell of a different type.
36. The method of claim 35, wherein said method comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 5% of expression levels in said cell but is among the top 5% of expression levels in a cell of a different type.
37. The method of claim 36, wherein said method comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 10% of expression levels in said cell but is among the top 10% of expression levels in a cell of a different type.
38. The method of claim 37, wherein said method comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 15% of expression levels in said cell but is among the top 15% of expression levels in a cell of a different type.
39. The method of claim 38, wherein said method comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 20% of expression levels in said cell but is among the top 20% of expression levels in a cell of a different type.
40. The method of claim 39, wherein said method comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 25% of expression levels in said cell but is among the top 25% of expression levels in a cell of a different type.
41. The method of claim 40, wherein said method comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 30% of expression levels in said cell but is among the top 30% of expression levels in a cell of a different type.
42. The method of claim 41, wherein said method comprises incorporating one or more codon substitutions into the polynucleotide that increase the sequence identity of said polynucleotide or RNA resulting from transcription thereof to one or more endogenous RNA molecules encoding a protein whose expression level is not among the top 50% of expression levels in said cell but is among the top 50% of expression levels in a cell of a different type.
43. The method of any one of claims 35-42, wherein said cell of a different type is a cell of a different tissue.
44. The method of any one of claims 1-43, wherein said method further comprises preparing a vector comprising said codon-optimized gene or RNA equivalent thereof.
45. The method of claim 44, wherein said vector is a viral vector.
46. The method of claim 45, wherein said viral vector is selected from the group consisting of adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus.
47. The method of claim 46, wherein said viral vector is an AAV.
48. The method of claim 47, wherein said AAV is an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
49. The method of claim 46, wherein said viral vector is a pseudotyped AAV.
50. The method of claim 49, wherein said pseudotyped AAV is rAAV2/8 or rAAV2/9.
51. The method of any one of claims 1-43, wherein said method further comprises preparing a liposome, vesicle, synthetic vesicle, exosome, synthetic exosome, dendrimer, or nanoparticle comprising said codon-optimized gene or RNA equivalent thereof.
52. The method of any one of claims 44-51, wherein said method further comprises contacting said vector, liposome, vesicle, synthetic vesicle, exosome, synthetic exosome, dendrimer, or nanoparticle with a population of cells in vitro or in vivo.
53. The method of claim 52, wherein said method comprises isolating said protein from said population of cells.
54. The method of claim 53, wherein said isolating comprises purifying said protein from a plurality of other compounds.
55. The method of claim 54, wherein said purifying comprises size-exclusion chromatography, ion-exchange chromatography, high pressure liquid chromatography, affinity chromatography, gel filtration, or preparative gel electrophoresis.
56. The method of any one of claims 1-55, wherein said gene or RNA equivalent thereof encodes a protein listed in Table 3.
57. The method of claim 56, wherein said protein is selected from the group consisting of myotubularin 1 (MTM1), acid α-glucosidase (GAA), calsequestrin 2 (CASQ2), and uridine diphosphate glucuronosyltransferase family 1 member A1 (UGT1A1).
58. A composition comprising a codon-optimized gene or RNA equivalent thereof prepared by the method of any one of claims 1-57.
59. The composition of claim 58, wherein said composition is a vector.
60. The composition of claim 59, wherein said vector is a viral vector.
61. The composition of claim 60, wherein said viral vector is selected from the group consisting of AAV, adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and vaccinia virus.
62. The composition of claim 61, wherein said viral vector is an AAV.
63. The composition of claim 62, wherein said AAV is an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
64. The composition of claim 61, wherein said viral vector is a pseudotyped AAV.
65. The composition of claim 64, wherein said pseudotyped AAV is rAAV2/8 or rAAV2/9.
66. The composition of claim 58, wherein said composition is a liposome, vesicle, synthetic vesicle, exosome, synthetic exosome, dendrimer, or nanoparticle.
67. The composition of any one of claims 58-66, wherein said gene or RNA equivalent thereof encodes a protein listed in Table 3.
68. The composition of claim 67, wherein said protein is selected from the group consisting of MTM1, GAA, CASQ2, and UGT1A1.
69. A method of treating a disease or condition in a patient characterized by a deficiency in a protein, said method comprising administering to the patient a therapeutically effective amount of the composition of any one of claims 58-68.
70. The method of claim 69, wherein said disease or condition is X-linked myotubular myopathy and said protein is MTM1.
71. The method of claim 69, wherein said disease or condition is Pompe disease and said protein is GAA.
72. The method of claim 69, wherein said disease or condition is recessive catecholaminergic polymorphic ventricular tachycardia and said protein is CASQ2.
73. The method of claim 69, wherein said disease or condition is Crigler-Najjar syndrome and said protein is UGT1A1.
74. A method of attenuating expression of a gene in a cell, said method comprising introducing into said cell a polynucleotide having a +1 frameshift mutation with respect to said gene.
75. The method of claim 74, wherein said polynucleotide comprises a single nucleotide deletion causing said +1 frameshift mutation.
76. The method of claim 74 or 75, wherein said polynucleotide comprises a wild type copy of said gene operably linked to the portion of said polynucleotide having said +1 frameshift mutation.
77. The method of any one of claims 74-76, wherein said polynucleotide is a DNA polynucleotide.
78. The method of any one of claims 74-76, wherein the polynucleotide is an RNA polynucleotide.
79. The method of any one of claims 74-78, wherein said polynucleotide is introduced into said cell by contacting said cell with a vector comprising said polynucleotide.
80. The method of claim 79, wherein the vector is a DNA vector.
81. The method of claim 79, wherein the vector is an RNA vector.
82. The method of any one of claims 79-81, wherein said vector is a viral vector.
83. The method of claim 82, wherein said viral vector is selected from the group consisting of adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus.
84. The method of claim 83, wherein said viral vector is an AAV.
85. The method of claim 84, wherein said AAV is an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
86. The method of claim 83, wherein said viral vector is a pseudotyped AAV.
87. The method of claim 86, wherein said pseudotyped AAV is rAAV2/8 or rAAV2/9.
88. The method of any one of claims 74-87, wherein said polynucleotide is introduced into said cell by electroporation, or by contacting said cell with a cationic polymer, a cationic lipid, or calcium phosphate.
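By way of illustration only, the effect on the reading frame of a single nucleotide deletion such as that recited in claims 75 and 90 can be sketched in Python. The 15-nucleotide coding sequence, the deletion position, and the function names `translate` and `delete_nucleotide` are hypothetical and are not taken from the specification; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate from position 0 until a stop codon or the last complete codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_nucleotide(dna: str, index: int) -> str:
    """Return a copy of dna lacking the nucleotide at the given 0-based index."""
    if not 0 <= index < len(dna):
        raise IndexError("deletion index outside sequence")
    return dna[:index] + dna[index + 1:]


# Hypothetical open reading frame: ATG GCT GAA ACC TGA
wild_type = "ATGGCTGAAACCTGA"
# Hypothetical deletion: remove the C of the second codon.
frameshifted = delete_nucleotide(wild_type, 4)

print(translate(wild_type))     # MAET
print(translate(frameshifted))  # MVKP
```

In this toy sequence every codon downstream of the deletion is read in a shifted frame, so the residues after the initial methionine change and the original stop codon is no longer in frame.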
89. A vector comprising a polynucleotide having a +1 frameshift mutation with respect to a wild type sequence of a gene.
90. The vector of claim 89, wherein said polynucleotide comprises a single nucleotide deletion causing said +1 frameshift mutation.
91. The vector of claim 89 or 90, wherein said polynucleotide comprises a wild type copy of said gene operably linked to the portion of said polynucleotide having said +1 frameshift mutation.
92. The vector of any one of claims 89-91, wherein the vector is a DNA vector.
93. The vector of any one of claims 89-91, wherein the vector is an RNA vector.
94. The vector of any one of claims 89-93, wherein said vector is a viral vector.
95. The vector of claim 94, wherein said viral vector is selected from the group consisting of adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus.
96. The vector of claim 95, wherein said viral vector is an AAV.
97. The vector of claim 96, wherein said AAV is an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
98. The vector of claim 95, wherein said viral vector is a pseudotyped AAV.
99. The vector of claim 98, wherein said pseudotyped AAV is rAAV2/8 or rAAV2/9.
100. A method of inducing expression of a protein in a cell, said method comprising introducing into said cell a duplex comprising a RNA polynucleotide encoding said protein, wherein said RNA polynucleotide is hybridized to a plurality of complementary nucleic acid strands.
101. The method of claim 100, wherein said plurality of complementary nucleic acid strands comprise one or more RNA strands.
102. The method of claim 100 or 101, wherein said plurality of complementary nucleic acid strands comprise one or more DNA strands.
103. The method of any one of claims 100-102, wherein said plurality of complementary nucleic acid strands together span from the 5' end to the 3' end of said RNA polynucleotide encoding said protein.
104. The method of claim 103, wherein each nucleotide of said RNA polynucleotide encoding said protein is base-paired to a nucleotide of said plurality of complementary nucleic acid strands.
105. The method of any one of claims 99-104, wherein each of said complementary nucleic acid strands is from 9 to 30 nucleotides in length.
106. The method of any one of claims 99-105, wherein one or more of said plurality of complementary nucleic acid strands comprise one or more modified nucleotides.
107. The method of claim 106, wherein said one or more modified nucleotides comprise a modified adenosine.
108. The method of claim 107, wherein said modified adenosine is selected from the group consisting of N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, and 2'-fluoro-2'-deoxyadenosine 5'-triphosphate.
109. The method of claim 106, wherein said one or more modified nucleotides comprise a modified guanosine.
110. The method of claim 109, wherein said modified guanosine is selected from the group consisting of N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, and 2'-fluoro-2'-deoxyguanosine 5'-triphosphate.
111. The method of claim 106, wherein said one or more modified nucleotides comprise a modified uridine.
112. The method of claim 111, wherein said modified uridine is selected from the group consisting of 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, and 2'-fluoro-2'-deoxyuridine 5'-triphosphate.
113. The method of claim 106, wherein said one or more modified nucleotides comprise a modified cytidine.
114. The method of claim 113, wherein said modified cytidine is selected from the group consisting of 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, and 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
115. The method of any one of claims 99-114, wherein said duplex is introduced into said cell by electroporation, or by contacting said cell with a cationic lipid, a vesicle, an exosome, a liposome, a dendrimer, a synthetic cationic vesicle, a cationic polymer, or calcium phosphate.
116. The method of any one of claims 99-115, wherein said cell is a eukaryotic cell.
117. The method of claim 116, wherein said eukaryotic cell is a mammalian cell.
118. The method of claim 117, wherein said mammalian cell is a human cell.
119. The method of any one of claims 99-118, wherein said protein is a protein listed in Table 3.
120. The method of claim 119, wherein said protein is selected from the group consisting of myotubularin 1 (MTM1), acid α-glucosidase (GAA), calsequestrin 2 (CASQ2), and uridine diphosphate glucuronosyltransferase family 1 member A1 (UGT1A1).
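By way of illustration only, the arrangement recited in claims 103-105, in which complementary strands of 9 to 30 nucleotides together span the RNA polynucleotide so that each nucleotide is base-paired, can be sketched in Python. The function names, the even-split tiling strategy, and the 75-nucleotide test sequence are hypothetical and are not taken from the specification; the sketch uses unmodified RNA strands and Watson-Crick pairing only.

```python
import math

# Watson-Crick complement for unmodified RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def tile_complementary_strands(mrna: str, min_len: int = 9, max_len: int = 30):
    """Split mrna into contiguous windows and return (start, end, strand) tuples.

    Window lengths are as even as possible and lie within [min_len, max_len];
    each strand is the reverse complement of mrna[start:end].
    """
    total = len(mrna)
    if total < min_len:
        raise ValueError("sequence is shorter than the minimum strand length")
    n_windows = math.ceil(total / max_len)  # fewest windows that respect max_len
    base, extra = divmod(total, n_windows)
    if base < min_len:
        raise ValueError("cannot tile within the requested length range")
    lengths = [base + 1] * extra + [base] * (n_windows - extra)

    tiles, start = [], 0
    for length in lengths:
        end = start + length
        tiles.append((start, end, reverse_complement(mrna[start:end])))
        start = end
    return tiles


# Hypothetical 75-nt RNA used only to exercise the function.
mrna = "AUGGCUGAAACC" * 6 + "UGA"
tiles = tile_complementary_strands(mrna)
for start, end, strand in tiles:
    print(start, end, strand)
```

Because the windows are contiguous and non-overlapping, every position of the RNA falls within exactly one window; other tilings, including ones with strands of unequal length, would satisfy the same length range.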
121. A composition comprising a synthetic duplex comprising a RNA polynucleotide encoding a protein, wherein said RNA polynucleotide is hybridized to a plurality of complementary nucleic acid strands.
122. The composition of claim 121, wherein said plurality of complementary nucleic acid strands comprise one or more RNA strands.
123. The composition of claim 121 or 122, wherein said plurality of complementary nucleic acid strands comprise one or more DNA strands.
124. The composition of any one of claims 121-123, wherein said plurality of complementary nucleic acid strands together span from the 5' end to the 3' end of said RNA polynucleotide encoding said protein.
125. The composition of any one of claims 121-123, wherein each nucleotide of said RNA polynucleotide encoding said protein is base-paired to a nucleotide of said plurality of complementary nucleic acid strands.
126. The composition of any one of claims 121-125, wherein each of said complementary nucleic acid strands is from 9 to 30 nucleotides in length.
127. The composition of any one of claims 121-126, wherein one or more of said plurality of complementary nucleic acid strands comprise one or more modified nucleotides.
128. The composition of claim 127, wherein said one or more modified nucleotides comprise a modified adenosine.
129. The composition of claim 128, wherein said modified adenosine is selected from the group consisting of N6-methyladenosine 5'-triphosphate, N1-methyladenosine 5'-triphosphate, 2'-O-methyladenosine 5'-triphosphate, 2'-amino-2'-deoxyadenosine 5'-triphosphate, 2'-azido-2'-deoxyadenosine 5'-triphosphate, and 2'-fluoro-2'-deoxyadenosine 5'-triphosphate.
130. The composition of claim 127, wherein said one or more modified nucleotides comprise a modified guanosine.
131. The composition of claim 130, wherein said modified guanosine is selected from the group consisting of N1-methylguanosine 5'-triphosphate, 2'-O-methylguanosine 5'-triphosphate, 2'-amino-2'-deoxyguanosine 5'-triphosphate, 2'-azido-2'-deoxyguanosine 5'-triphosphate, and 2'-fluoro-2'-deoxyguanosine 5'-triphosphate.
132. The composition of claim 127, wherein said one or more modified nucleotides comprise a modified uridine.
133. The composition of claim 132, wherein said modified uridine is selected from the group consisting of 5-methyluridine 5'-triphosphate, 5-iodouridine 5'-triphosphate, 5-bromouridine 5'-triphosphate, 2-thiouridine 5'-triphosphate, 4-thiouridine 5'-triphosphate, 2'-methyl-2'-deoxyuridine 5'-triphosphate, 2'-amino-2'-deoxyuridine 5'-triphosphate, 2'-azido-2'-deoxyuridine 5'-triphosphate, and 2'-fluoro-2'-deoxyuridine 5'-triphosphate.
134. The composition of claim 127, wherein said one or more modified nucleotides comprise a modified cytidine.
135. The composition of claim 134, wherein said modified cytidine is selected from the group consisting of 5-methylcytidine 5'-triphosphate, 5-iodocytidine 5'-triphosphate, 5-bromocytidine 5'-triphosphate, 2-thiocytidine 5'-triphosphate, 2'-methyl-2'-deoxycytidine 5'-triphosphate, 2'-amino-2'-deoxycytidine 5'-triphosphate, 2'-azido-2'-deoxycytidine 5'-triphosphate, and 2'-fluoro-2'-deoxycytidine 5'-triphosphate.
136. The composition of any one of claims 121-135, wherein said composition is a cationic lipid, a vesicle, an exosome, a liposome, a dendrimer, a synthetic cationic vesicle, or a cationic polymer.
137. The composition of any one of claims 121-136, wherein said protein is a protein listed in Table 3.
138. The composition of claim 137, wherein said protein is selected from the group consisting of MTM1, GAA, CASQ2, and UGT1A1.
139. A method of treating a disease or condition in a patient characterized by a deficiency in a protein, said method comprising administering to the patient a therapeutically effective amount of the composition of any one of claims 121-138.
140. The method of claim 139, wherein said disease or condition is X-linked myotubular myopathy and said protein is MTM1.
141. The method of claim 139, wherein said disease or condition is Pompe disease and said protein is GAA.
142. The method of claim 139, wherein said disease or condition is recessive catecholaminergic polymorphic ventricular tachycardia and said protein is CASQ2.
143. The method of claim 139, wherein said disease or condition is Crigler-Najjar syndrome and said protein is UGT1A1.
144. A method of inducing exon skipping in a cell, said method comprising introducing into said cell a polynucleotide comprising:
a) a first region corresponding to a first exon from a gene (EX1);
b) a second region corresponding to a first intron (INTR1); and
c) a third region corresponding to a second exon (EX2) from said gene,
operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2, wherein a wild-type form of said gene comprises one or more intervening exons between said EX1 and said EX2, and wherein said polynucleotide does not comprise a region corresponding to said one or more intervening exons, thereby inducing skipping of said exons between said EX1 and said EX2.
145. The method of claim 144, wherein said polynucleotide further comprises:
d) a fourth region corresponding to a second intron (INTR2); and
e) a fifth region corresponding to a third exon (EX3) from said gene,
operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2-INTR2-EX3, wherein a wild-type form of said gene comprises one or more intervening exons between said EX2 and said EX3, and wherein said polynucleotide does not comprise a region corresponding to said one or more intervening exons, thereby inducing skipping of said exons between said EX2 and said EX3.
146. The method of claim 145, wherein EX3 comprises a full-length exon.
147. The method of claim 145, wherein EX3 comprises a truncated exon.
148. The method of any of claims 144-147, wherein EX1 comprises a full-length exon.
149. The method of any of claims 144-147, wherein EX1 comprises a truncated exon.
150. The method of any one of claims 144-149, wherein EX2 comprises a full-length exon.
151. The method of any one of claims 144-149, wherein said EX2 comprises a truncated exon.
152. The method of any one of claims 144-151, wherein said one or more intervening exons comprises an exon having a mutation relative to said wild type form of said gene.
153. The method of claim 152, wherein said mutation is a frameshift mutation.
154. The method of claim 153, wherein said one or more intervening exons comprise a single nucleotide deletion causing said frameshift mutation.
155. The method of claim 153, wherein said one or more intervening exons comprises a single nucleotide insertion causing said frameshift mutation.
156. The method of claim 152, wherein said mutation is a nonsense mutation.
157. The method of claim 156, wherein said one or more intervening exons comprises a single nucleotide substitution causing said nonsense mutation.
158. The method of any one of claims 144-151, wherein said cell comprises a gene comprising a duplication of a region of said gene, wherein said region comprises an entire exon or a portion of an exon.
159. The method of any one of claims 144-151, wherein said cell comprises a gene comprising a deletion of a region of said gene, wherein said region comprises an entire exon or a portion of an exon.
160. The method of any one of claims 144-159, wherein said polynucleotide further comprises a eukaryotic promoter (PEuk), wherein said eukaryotic promoter and said EX1 are operably linked to each other in a 5'-to-3' direction as: PEuk-EX1.
161. The method of claim 160, wherein said PEuk is a tissue specific promoter or a constitutive promoter.
162. The method of claim 161, wherein said PEuk is a tissue specific promoter selected from the group consisting of desmin promoter, creatine kinase promoter, myogenin promoter, a myosin heavy chain promoter, human brain natriuretic peptide promoter, albumin promoter, α-1-antitrypsin promoter, and hepatitis B virus core protein promoter.
163. The method of claim 161, wherein said PEuk is a constitutive promoter selected from the group consisting of cytomegalovirus (CMV) promoter, chicken β-actin promoter, Herpes Simplex virus (HSV) promoter, thymidine kinase (TK) promoter, Rous Sarcoma Virus (RSV) promoter, Simian Virus 40 (SV40) promoter, Mouse Mammary Tumor Virus (MMTV) promoter, and Adenovirus E1A promoter.
164. The method of any one of claims 144-163, wherein said gene encodes dystrophin.
165. The method of claim 164, wherein the exon having said mutation is selected from the group consisting of exon 2, exon 6, exon 7, exon 8, exon 17, exon 19, exon 20, exon 35, any one of exons 43-59, exon 69, and exon 70.
166. The method of any one of claims 144-165, wherein said cell is a eukaryotic cell.
167. The method of claim 166, wherein said eukaryotic cell is a mammalian cell.
168. The method of claim 167, wherein said mammalian cell is a human cell.
169. The method of any one of claims 144-168, wherein said polynucleotide is introduced into said cell by contacting said cell with a vector comprising said polynucleotide.
170. The method of claim 169, wherein said vector is a viral vector.
171. The method of claim 170, wherein said viral vector is selected from the group consisting of adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus.
172. The method of claim 171, wherein said viral vector is an AAV.
173. The method of claim 172, wherein said AAV is an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
174. The method of claim 170, wherein said viral vector is a pseudotyped AAV.
175. The method of claim 174, wherein said pseudotyped AAV is rAAV2/8 or rAAV2/9.
176. The method of any one of claims 144-175, wherein said polynucleotide is introduced into said cell by electroporation, or by contacting said cell with a cationic polymer, a cationic lipid, or calcium phosphate.
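By way of illustration only, the EX1-INTR1-EX2 arrangement recited in claim 144, optionally preceded by a promoter as in claim 160, can be sketched in Python as a simple sequence assembly. The function name `build_skipping_construct`, the three-exon gene model, and all sequences are hypothetical and are not taken from the specification; the check for GT...AG intron boundaries is an assumption of this sketch, not a limitation of the claims.

```python
def build_skipping_construct(exons, ex1, ex2, intron, promoter=""):
    """Assemble promoter-EX1-INTR1-EX2 and report which exons are left out.

    exons    : dict mapping exon number -> sequence (hypothetical gene model)
    ex1, ex2 : numbers of the flanking exons; at least one exon must lie between
    intron   : sequence placed between EX1 and EX2
    promoter : optional sequence placed 5' of EX1
    """
    if ex2 - ex1 < 2:
        raise ValueError("EX1 and EX2 must flank at least one intervening exon")
    # Assumption of this sketch: the intron carries canonical GT...AG boundaries.
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron lacks canonical GT...AG boundaries")
    skipped = [n for n in sorted(exons) if ex1 < n < ex2]
    construct = promoter + exons[ex1] + intron + exons[ex2]
    return construct, skipped


# Hypothetical three-exon gene in which exon 2 is the intervening exon.
exons = {1: "ATGGCC", 2: "GATTACA", 3: "CCGGTTAA"}
intron = "GTAAGTCTAACTTTTTTTTCAG"

construct, skipped = build_skipping_construct(exons, 1, 3, intron)
print(construct)
print("intervening exons omitted:", skipped)
```

The returned construct contains the two flanking exons joined by the intron and no region corresponding to the intervening exon; a second intron and third exon, as in claim 145, could be appended in the same manner.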
177. A composition comprising a vector comprising a polynucleotide comprising:
a) a first region corresponding to a first exon from a gene (EX1);
b) a second region corresponding to a first intron (INTR1); and
c) a third region corresponding to a second exon (EX2) from said gene,
operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2, wherein a wild-type form of said gene comprises one or more intervening exons between said EX1 and said EX2, and wherein said polynucleotide does not comprise a region corresponding to said one or more intervening exons.
178. The composition of claim 177, wherein said polynucleotide further comprises:
d) a fourth region corresponding to a second intron (INTR2); and
e) a fifth region corresponding to a third exon (EX3) from said gene,
operably linked to each other in a 5'-to-3' direction as EX1-INTR1-EX2-INTR2-EX3, wherein a wild-type form of said gene comprises one or more intervening exons between said EX2 and said EX3, and wherein said polynucleotide does not comprise a region corresponding to said one or more intervening exons.
179. The composition of claim 178, wherein EX3 comprises a full-length exon.
180. The composition of claim 178, wherein EX3 comprises a truncated exon.
181. The composition of any one of claims 177-180, wherein EX1 comprises a full-length exon.
182. The composition of any one of claims 177-180, wherein EX1 comprises a truncated exon.
183. The composition of any one of claims 177-182, wherein EX2 comprises a full-length exon.
184. The composition of any one of claims 177-182, wherein EX2 comprises a truncated exon.
185. The composition of any one of claims 177-184, wherein said composition further comprises a cell comprising a gene comprising said one or more intervening exons comprising a mutation relative to said wild type form of said gene.
186. The composition of claim 185, wherein said mutation is a frameshift mutation.
187. The composition of claim 186, wherein said one or more intervening exons comprises a single nucleotide deletion causing said frameshift mutation.
188. The composition of claim 185, wherein said mutation is a nonsense mutation.
189. The composition of claim 188, wherein said one or more intervening exons comprises a single nucleotide substitution causing said nonsense mutation.
190. The composition of any one of claims 177-184, wherein said cell comprises a gene comprising a duplication of a region of said gene, wherein said region comprises an entire exon or a portion of an exon.
191. The composition of claim 190, wherein said cell comprises a gene comprising a deletion of a region of said gene, wherein said region comprises an entire exon or a portion of an exon.
192. The composition of any one of claims 177-191, wherein said polynucleotide further comprises a eukaryotic promoter (PEuk), wherein said eukaryotic promoter and said EX1 are operably linked to each other in a 5'-to-3' direction as: PEuk-EX1.
193. The composition of claim 192, wherein said PEuk is a tissue specific promoter or a constitutive promoter.
194. The composition of claim 193, wherein said PEuk is a tissue specific promoter selected from the group consisting of desmin promoter, creatine kinase promoter, myogenin promoter, a myosin heavy chain promoter, human brain natriuretic peptide promoter, albumin promoter, α-1-antitrypsin promoter, and hepatitis B virus core protein promoter.
195. The composition of claim 193, wherein said PEuk is a constitutive promoter selected from the group consisting of CMV promoter, chicken β-actin promoter, HSV promoter, TK promoter, RSV promoter, SV40 promoter, MMTV promoter, and Adenovirus E1A promoter.
196. The composition of any one of claims 177-195, wherein said gene encodes dystrophin.
197. The composition of claim 196, wherein the exon having said mutation is selected from the group consisting of exon 2, exon 6, exon 7, exon 8, exon 17, exon 19, exon 20, exon 35, any one of exons 43-59, exon 69, and exon 70.
198. The composition of any one of claims 185-197, wherein said cell is a eukaryotic cell.
199. The composition of claim 198, wherein said eukaryotic cell is a mammalian cell.
200. The composition of claim 199, wherein said mammalian cell is a human cell.
201. The composition of any one of claims 177-200, wherein said vector is a viral vector.
202. The composition of claim 201, wherein said viral vector is selected from the group consisting of adeno-associated virus (AAV), adenovirus, lentivirus, retrovirus, poxvirus, baculovirus, herpes simplex virus, and a vaccinia virus.
203. The composition of claim 202, wherein said viral vector is an AAV.
204. The composition of claim 203, wherein said AAV is an AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, or AAVrh74 serotype.
205. The composition of claim 201, wherein said viral vector is a pseudotyped AAV.
206. The composition of claim 205, wherein said pseudotyped AAV is rAAV2/8 or rAAV2/9.
207. A method of inducing exon inclusion in a cell, said method comprising introducing into said cell a polynucleotide comprising:
a) a first region corresponding to a first exon from a gene (EX1);
b) a second region corresponding to a second exon (EX2); and
c) a third region corresponding to a third exon (EX3) from said gene,
operably linked to each other in a 5'-to-3' direction as EX1-EX2-EX3, wherein one or more of EX1, EX2, and EX3 does not comprise a full-length exon, thereby inducing inclusion of said second exon between said first exon and said third exon during splicing of an RNA transcript of said gene in said cell.
208. The method of claim 207, wherein said EX1 comprises a full-length exon.
209. The method of claim 207, wherein said EX1 comprises a truncated exon.
210. The method of any one of claims 207-209, wherein said EX2 comprises a full-length exon.
211. The method of any one of claims 207-209, wherein said EX2 comprises a truncated exon.
212. The method of any one of claims 207-211, wherein said EX3 comprises a full-length exon.
213. The method of any one of claims 207-211, wherein said EX3 comprises a truncated exon.
PCT/US2017/047320 2016-08-19 2017-08-17 Compositions and methods for modulating gene expression using reading frame surveillance WO2018035311A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201662377328P 2016-08-19 2016-08-19
US201662377449P 2016-08-19 2016-08-19
US201662377331P 2016-08-19 2016-08-19
US201662377499P 2016-08-19 2016-08-19
US62/377,331 2016-08-19
US62/377,449 2016-08-19
US62/377,328 2016-08-19
US62/377,499 2016-08-19

Publications (1)

Publication Number Publication Date
WO2018035311A1 true WO2018035311A1 (en) 2018-02-22

Family

ID=61197101

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/047320 WO2018035311A1 (en) 2016-08-19 2017-08-17 Compositions and methods for modulating gene expression using reading frame surveillance

Country Status (1)

Country Link
WO (1) WO2018035311A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021025725A1 (en) * 2019-08-07 2021-02-11 Northwestern University Materials and methods for gene delivery in the heart
WO2022062440A1 (en) * 2020-09-22 2022-03-31 广州瑞风生物科技有限公司 Grna targeting ctgf gene and use thereof
WO2023221942A1 (en) * 2022-05-16 2023-11-23 Shanghai Vitalgen Biopharma Co., Ltd. Recombinant aav vectors for treating glutaric aciduria type i

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001000784A2 (en) * 1999-06-28 2001-01-04 Institut Gustave Roussy Compound derived from an offset orf of the ice gene
US20010016199A1 (en) * 1995-05-23 2001-08-23 Johnston Robert E. Alphavirus RNA replicon systems
US20080300842A1 (en) * 2004-05-04 2008-12-04 Sridhar Govindarajan Design, Synthesis and Assembly of Synthetic Nucleic Acids

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BOOPATHI ET AL.: "Revealing natural antisense transcripts from Plasmodium vivax isolates: evidence of genome regulation in complicated malaria", INFECT GENET EVOL, vol. 20, 11 October 2013 (2013-10-11), pages 428 - 443, XP055466916 *
MATSUFUJI ET AL.: "Reading two bases twice: mammalian antizyme frameshifting in yeast", EMBO J, vol. 15, no. 6, 15 March 1996 (1996-03-15), pages 1360 - 1370, XP055466914 *
POPPLEWELL ET AL.: "Gene correction of a duchenne muscular dystrophy mutation by meganuclease-enhanced exon knock-in", HUM GENE THER, vol. 24, no. 7, 1 July 2013 (2013-07-01), pages 692 - 701, XP002713983 *
XIE ET AL.: "Knockout of one acetylcholinesterase allele in the mouse", CHEM BIOL INTERACT, vol. 14, no. 119-120, 1 May 1999 (1999-05-01), pages 289 - 299, XP027424496 *

Similar Documents

Publication Publication Date Title
US20210163948A1 (en) RNA TARGETING OF MUTATIONS VIA SUPPESSOR tRNAs AND DEAMINASES
JP7472121B2 (en) Compositions and methods for transgene expression from the albumin locus
EP3289081B1 (en) Compositions and methods for the treatment of nucleotide repeat expansion disorders
US20200407750A1 (en) Novel adeno-associated virus (aav) vectors, aav vectors having reduced capsid deamidation and uses therefor
US11713471B2 (en) Class II, type V CRISPR systems
AU2019360269A1 (en) Nucleic acid constructs and methods of use
CN116209770A (en) Methods and compositions for modulating genomic improvement
CA3116739A1 (en) Compositions and methods for treating alpha-1 antitrypsin deficiencey
US20210269825A1 (en) Compositions and methods for reducing spliceopathy and treating rna dominance disorders
WO2018035311A1 (en) Compositions and methods for modulating gene expression using reading frame surveillance
TW202233844A (en) Aav capsids and compositions containing same
Luo et al. AAVS1-targeted plasmid integration in AAV producer cell lines
US11597947B2 (en) Gene editing method using virus
CN112805012A (en) Genetic modification of mitochondrial genome
WO2022221581A1 (en) Programmable nucleases and methods of use
WO2022187278A1 (en) Nucleic acid detection and analysis systems
WO2023172926A1 (en) Precise excisions of portions of exons for treatment of duchenne muscular dystrophy
WO2023044424A1 (en) Frataxin gene therapy
WO2023230579A2 (en) Supplementation of liver enzyme expression
WO2024044717A1 (en) Programmable rna writing using crispr effectors and trans-splicing templates
AU2022366984A1 (en) Compositions and methods for treating alpha-1 antitrypsin deficiency
CN118139981A (en) FRATAXIN Gene therapy
CN115992123A (en) Cytosine base editor based on APOBEC3A after cynomolgus monkey transformation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17842110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17842110

Country of ref document: EP

Kind code of ref document: A1