WO2020214973A1 - Triple helix terminator for efficient RNA trans-splicing

Info

Publication number: WO2020214973A1
Application number: PCT/US2020/028797
Authority: WO
Inventors: Krishna J. Fisher; Jean Bennett
Original assignee: The Trustees Of The University Of Pennsylvania
Priority date: 2019-04-17
Filing date: 2020-04-17
Publication date: 2020-10-22
Also published as: BR112021020539A2; KR20220002910A; CA3133555A1; EP3956442A1; IL287243A; MX2021012702A; EP3956442A4; CN114040974A; AU2020260154A1; US20220204989A1; JP2022529065A

Abstract

A nucleic acid trans-splicing molecule is provided that can replace an exon in a targeted mammalian ocular gene carrying a defect or mutation causing an ocular disease with an exon having the naturally-occurring sequence without the defect or mutation. The trans-splicing molecule includes a 3' transcription terminator domain which enhances the efficiency of trans-splicing. The 3' TTD comprises a triple helix domain and a tRNA-like domain.

Description

TRIPLE HELIX TERMINATOR FOR EFFICIENT RNA TRANS-SPLICING

BACKGROUND

A number of inherited retinal diseases are caused by mutations, generally multiple mutations, located throughout portions of large ocular genes. As one example, Stargardt disease, also known as Stargardt 1 (STGD1), is an autosomal recessive form of retinal dystrophy that is usually characterized by a progressive loss of central vision. Similar retinal diseases are caused by defects in other large ocular genes, including CEP290 (7440 nucleotides), defects or mutations in which cause Leber’s congenital amaurosis, among other ocular disorders, and MYO7A (7465 nucleotides), defects or mutations in which cause Usher’s disease.

The occurrences and locations of multiple mutations in such large ocular, and other, genes have made strategies for repairing the mutations very challenging. Despite the great promise of trans-splicing technology, spanning over two decades, to meet this challenge, it has yet to emerge as a meaningful approach for gene therapy. This is due primarily, if not exclusively, to the poor efficiency of the trans-splicing reaction. It is important to recognize that trans-splicing is unusual in higher eukaryotes, including humans. While there are a handful of rare examples of endogenous trans-splicing, cis-splicing clearly dominates by a large margin. Simply stated, trans-splicing in humans appears to be a novel class of alternative splicing that utilizes the same cellular factors and mechanisms that mediate the traditional cis-splicing pathway.

There remains a need for effective compositions and therapeutic methods for treating such disorders.

SUMMARY

Provided herein are RNA trans-splicing molecules (RTM) useful in treatment of diseases caused by defects in one or more exons of the coding sequence. Also provided are methods and compositions utilizing these RTM.

In one aspect, the invention includes a nucleic acid trans-splicing molecule (e.g., RTM) comprising a 3’ transcription terminator domain (TTD), which comprises a triple helix. In some embodiments, the triple helix comprises at least five consecutive A-U Hoogsteen base pairs (e.g., four to 20 consecutive A-U Hoogsteen base pairs, four to 18 consecutive A-U Hoogsteen base pairs, four to 15 consecutive A-U Hoogsteen base pairs, four to 12 consecutive A-U Hoogsteen base pairs, four to 11 consecutive A-U Hoogsteen base pairs, or four to 10 consecutive A-U Hoogsteen base pairs, e.g., six to eight consecutive A-U Hoogsteen base pairs, eight to 10 consecutive A-U Hoogsteen base pairs, 10 to 12 consecutive A-U Hoogsteen base pairs, 12 to 14 consecutive A-U Hoogsteen base pairs, 14 to 16 consecutive A-U Hoogsteen base pairs, 16 to 18 consecutive A-U Hoogsteen base pairs, or 18 to 20 consecutive A-U Hoogsteen base pairs).

In some embodiments, the triple helix comprises an A-rich tract of 5-30 nucleic acids (e.g., 5-10 nucleic acids, 10-20 nucleic acids, or 20-30 nucleic acids). In some embodiments, the A-rich tract is at the 3’ end of the TTD (e.g., at or within a poly-A tail).

In some embodiments, the triple helix comprises a strand of 10 consecutive nucleotides, wherein 9 of the 10 consecutive nucleotides are paired via Hoogsteen base pairing. In some embodiments, the TTD comprises a stem-loop motif.

In some embodiments, the 3’ TTD comprises, operatively linked in a 5’-to-3’ direction, a 5’ U-rich motif, a stem-loop motif, a 3’ U-rich motif, and an A-rich tract.

In some embodiments, the 3’ TTD is at least 95% homologous with SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, or SEQ ID NO: 23 (e.g., at least 96% homologous with SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, or SEQ ID NO: 23; at least 97% homologous with SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, or SEQ ID NO: 23; at least 98% homologous with SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, or SEQ ID NO: 23; at least 99% homologous with SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, or SEQ ID NO: 23; or 100% homologous with SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, or SEQ ID NO: 23).

In some embodiments, the 3’ TTD is at least 95% homologous (e.g., at least 96%, at least 97%, at least 98%, or at least 99% homologous) with SEQ ID NO: 13, and wherein the triple helix comprises Hoogsteen base pairing of U7-U11 of SEQ ID NO: 13 with an A-rich tract. In some embodiments, the 3’ TTD is the PAN ENE+A.

In some embodiments, the 3’ TTD is at least 95% homologous (e.g., at least 96%, at least 97%, at least 98%, or at least 99% homologous) with SEQ ID NO: 15, and wherein the triple helix comprises Hoogsteen base pairing of U6-10, C11, and U12-15 of SEQ ID NO: 15 with an A-rich tract. In some embodiments, the 3’ TTD is the MALAT1 ENE+A.

In some embodiments, the 3’ TTD is at least 95% homologous (e.g., at least 96%, at least 97%, at least 98%, or at least 99% homologous) with SEQ ID NO: 17, and wherein the triple helix comprises Hoogsteen base pairing of U6-10, C11, and U12-15 of SEQ ID NO: 17 with an A-rich tract. In some embodiments, the 3’ TTD is the MALAT1 core ENE+A.

In some embodiments, the 3’ TTD is at least 95% homologous with SEQ ID NO: 23, and wherein the triple helix comprises Hoogsteen base pairing of U8-10, C11, and U12-15 of SEQ ID NO: 23 with an A-rich tract. In some embodiments, the 3’ TTD is the MENb ENE+A.
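The percent-homology thresholds recited in the embodiments above (e.g., at least 95% with a reference SEQ ID NO) can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned and padded with '-' for gaps. The function name `percent_identity` and the convention of counting identical columns over the full alignment length are illustrative assumptions, not a method defined in this disclosure, and the toy sequences are not taken from any SEQ ID NO.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the inputs are pre-aligned, equal-length strings with '-' for gaps.
    Columns containing a gap count as mismatches (one possible convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 20-nt reference and a variant with one substitution at the 3' end.
reference = "UUUUUCUUUUGGCAAAAAAA"
variant = "UUUUUCUUUUGGCAAAAAAG"
identity = percent_identity(reference, variant)
meets_95 = identity >= 95.0
```

Other conventions (for example, excluding gapped columns from the denominator) give different percentages for the same alignment, so a threshold comparison is only meaningful once the convention is fixed.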

In one aspect, a nucleic acid trans-splicing molecule is provided. The RTM includes the following, operatively linked in a 5’-to-3’ direction:

(a) a coding sequence domain (CDS) comprising one or more functional exon(s) of a selected gene;

(b) a linker sequence of varying length and/or composition that acts as a structural connection between the coding domain and the binding domain, and may contain motifs that function as splicing enhancers, or have the capacity to fold into complex secondary structures that act to minimize the translation of the coding region before the trans-splicing event occurs, or encode a degradation peptide in the event of premature RTM maturation;

(c) a spliceosome recognition motif (Splice Donor, SD, also called the 5’ Splice Site (5’ SS)) configured to initiate spliceosome-mediated trans-splicing;

(d) a binding domain (BD) of varying length and sequence designed to hybridize to a target intron of the selected gene, wherein said gene has at least one defect or mutation in an exon 5’ to the target intron; and

(e) a 3’ transcription terminator domain (TTD),

wherein the nucleic acid trans-splicing molecule is configured to trans-splice the coding domain to an endogenous exon of the selected gene adjacent to the target intron, thereby replacing the endogenous defective or mutated exon with the functional exon and correcting a mutation in the selected gene. In one embodiment, the binding domain hybridizes to the target intron of the selected gene 3’ to the mutation and the coding domain comprises one or more exon(s) 5’ to the target intron.

In another aspect, the RTM includes the following, operatively linked in a 5’-to-3’ direction:

(a) a binding domain (BD) of varying length and sequence designed to hybridize to a target intron of the selected gene, wherein said gene has at least one defect or mutation in an exon 3’ to the targeted intron;

(b) a linker sequence of varying length and composition that acts as a structural connection between the binding domain and the coding region, and contains motifs that function as splicing enhancers or fold into complex secondary structures that impede translation of the coding region as a competitive event for trans-splicing, or encode a degradation peptide in the event of premature RTM maturation;

(c) a 3’ spliceosome recognition motif (Splice Acceptor, SA, also called the 3’ Splice Site (3’ SS)) configured to mediate trans-splicing;

(d) a coding sequence domain (CDS) comprising one or more functional exon(s) of the selected gene; and

(e) a 3’ transcription terminator domain (TTD),

wherein the nucleic acid trans-splicing molecule is configured to trans-splice the coding domain to an endogenous exon of the selected gene adjacent to the target intron, thereby replacing the endogenous defective or mutated exon with the functional exon and correcting a mutation in the selected gene. In one embodiment, the binding domain binds to the target intron of the selected gene 3’ to the mutation and the coding domain comprises one or more exon(s) 5’ to the target intron.
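The two 5’-to-3’ domain orders listed in the aspects above can be summarized in a short sketch. The constant names, the `assemble_rtm` helper, and the placeholder sequences are hypothetical and are used only to show how the order of the domains differs between the two arrangements.

```python
# Domain orders, 5'-to-3', as listed in the two aspects above.
ORDER_MUTATION_5PRIME = ("CDS", "linker", "SD", "BD", "TTD")  # defect in an exon 5' to the target intron
ORDER_MUTATION_3PRIME = ("BD", "linker", "SA", "CDS", "TTD")  # defect in an exon 3' to the target intron


def assemble_rtm(domains: dict[str, str], order: tuple[str, ...]) -> str:
    """Concatenate domain sequences in the given 5'-to-3' order."""
    missing = [name for name in order if name not in domains]
    if missing:
        raise KeyError(f"missing domain(s): {', '.join(missing)}")
    return "".join(domains[name] for name in order)


# Placeholder sequences for illustration only; not real domain sequences.
toy_domains = {
    "CDS": "AUGGCC",
    "linker": "NNNN",
    "SD": "GUAAGU",
    "BD": "CCUUAA",
    "TTD": "UUUUUAAAAA",
}
rtm = assemble_rtm(toy_domains, ORDER_MUTATION_5PRIME)
```

In both arrangements the 3’ TTD is the last element, which is why the assembled toy sequence always ends with the TTD placeholder.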

In one embodiment, the 3’ transcription terminator domain is a sequence from one or more long non-coding RNAs (lncRNA) or other nuclear RNA molecules that contain a 3’ transcription terminator that condenses into a triple helix 3’ blunt-ended cap.

In another aspect, a recombinant adeno-associated virus (rAAV) is provided, which includes any of the RTM described herein.

In another aspect, a method of treating a disease caused by a defect or mutation in a target gene is provided. The method includes administering to the cells of a subject having the disease a composition comprising a recombinant AAV comprising a nucleic acid trans-splicing molecule as described herein.

In yet another aspect, a pharmaceutical preparation is provided, comprising a physiologically acceptable carrier and the rAAV or RTM as described herein.

Other aspects and embodiments are described in the following detailed description.

BRIEF DESCRIPTION OF THE FIGURES

FIGs. 1A-1E show a map and partial sequence of RTM Luciferase reporter constructs that target Intron26 from human CEP290. They encode the 5’ half of the Luciferase coding sequence (CDS) along with different transcription terminator sequences: poly(A) - polyadenylation signal from SV40, which creates a 3’ terminal end following cleavage at the poly(A) signal and addition of an untemplated poly(A) tail (FIG. 1A); hhRz - hammerhead Ribozyme, which self-cleaves to create a 3’ terminal end of the RTM (FIG. 1B); Comp14 - a truncated MALAT1 triple helix terminator structure, which creates a 3’ terminal end of the RTM following RNase P cleavage (two versions - FIGs. 1C, 1D); and a hybrid in which the mascRNA domain of Comp14 is replaced by hhRz, which creates a 3’ terminal end of the RTM following ribozyme self-cleavage (FIG. 1E). For FIG. 1A (391.poly(A)), SEQ ID NO: 31 nt 2081-2600 are shown. For FIG. 1B (391.hhRz), SEQ ID NO: 32 nt 2081-2447 are shown. For FIG. 1C (391.Comp14-v1), SEQ ID NO: 33 nt 2081-2470 are shown. For FIG. 1D (391.Comp14-v2), SEQ ID NO: 34 nt 2081-2470 are shown. For FIG. 1E (391.Comp14.hhRz), SEQ ID NO: 35 nt 2081-2470 are shown.

FIG. 1F shows a map and a sequence of a minigene that contains Intron26 from human CEP290 fused to the 3’ half of the luciferase CDS. For FIG. 1F (pcDNA_FRT.In26 target.3’Luc), SEQ ID NO: 36 nt 6761-7280 are shown.

FIGs. 2A and 2B show luciferase levels that were measured for the constructs described in FIGs. 1A-1D, as discussed in Example 1. The RTM is delivered to a cell line that expresses a minigene that contains Intron26 from human CEP290 fused to the 3’ half of the luciferase CDS shown in FIG. 1F.

FIGs. 3A-3C show a map and partial sequence of RTM constructs that target Intron23 of human ABCA4. They include one of several terminator sequences that were tested for ABCA4 trans-splicing activity: hhz - hammerhead Ribozyme, which self-cleaves to create the 3’ terminal end of the RTM (FIG. 3A); C14 or Comp14 - a truncated derivative of the MALAT1 triple helix structure, which creates the 3’ terminal end of the RTM following RNase P cleavage (FIG. 3B); and wt - native MALAT1 triple helix terminator, which creates the 3’ terminal end of the RTM following RNase P cleavage (FIG. 3C). FIG. 3A shows a portion of the sequence shown in SEQ ID NO: 28, with the 5’ SS (also called SD or splicing domain) beginning at nt 4311, and the insulator ending at nt 4591. FIG. 3B shows a portion of the sequence shown in SEQ ID NO: 29, with the 5’ SS (also called SD or splicing domain) beginning at nt 4311, and the mascRNA ending at nt 4620. FIG. 3C shows a portion of the sequence shown in SEQ ID NO: 30, with the 5’ SS (also called SD or splicing domain) beginning at nt 4311, and the mascRNA ending at nt 4654.

FIGs. 4A and 4B are Western blots, and quantitation thereof, showing ABCA4 protein generated by RTM-mediated trans-splicing. RTMs of FIG. 3 that were tested include binding domains for ABCA4 intron23 (motifs 27 and 81) and intron22 (motifs 117 and 118). NB is a negative control Non-Binding motif.

FIG. 5A shows Western blot analysis of RTMs containing different triple helix terminators from lncRNAs. They include the wild-type sequence from MALAT1 and NEAT1 (MENb), as well as chimeric forms where the triple helix domain from MALAT1 was fused to the tRNA-like motif from NEAT1 (called menRNA) and one where the triple helix domain from NEAT1 was fused to the mascRNA motif from MALAT1. The data suggest trans-splicing activity is highest when an RTM contains the wild-type MALAT1 terminator.

FIG. 5B shows the predicted base-pairing for triple helix terminators from three different lncRNAs, including MALAT1, MENb (NEAT1), and PAN RNA (produced from the Kaposi’s sarcoma-associated herpesvirus, KSHV). The structural similarity across distinct lncRNAs suggests a common evolutionary strategy for protecting the 3’ end of the lncRNA following transcription termination. However, X-ray crystallography of the MALAT1 triple helix domain revealed it contains 10 major groove and 2 minor groove triples, the most of any known naturally occurring triple helical structure (Brown, J.A. et al. 2014). This intricate design likely confers a level of structural stability that is greater than that of either NEAT1 or PAN, and could explain why the MALAT1 terminator appears to better support trans-splicing, namely by protecting the RTM from degradation in the nucleus. Importantly, the blunt-ended triple helix of MALAT1 has been shown to inhibit rapid nuclear RNA decay in in vivo decay assays (Brown, J.A. et al. 2014).

FIG. 6A shows the highly conserved mascRNA sequence of MALAT1 from several species and its predicted folded conformation. A single G-to-A point mutation, indicated by the red arrow, was introduced into the mascRNA sequence to test the importance of this domain for trans-splicing activity. As shown in the Western blot (FIG. 6B), the point mutation ablated trans-splicing activity of a validated RTM that targets ABCA4, possibly due to the inability of the mutated sequence to assume the correct conformation required for RNase P recognition and cleavage.

FIG. 7 shows a vector map of a vector which includes codon-optimized ABCA4 coding sequence and hammerhead ribozyme (hhRz). The sequence is shown in SEQ ID NO: 28.

FIG. 8 shows a vector map of a vector which includes codon-optimized ABCA4 coding sequence, MALAT1, for codons 1-23 and the truncated MALAT1 Comp14 3’ TTD sequences. The sequence is shown in SEQ ID NO: 29.

FIG. 9 shows a vector map of a vector which includes codon-optimized ABCA4 coding sequence, MALAT1, for codons 1-23 and the wt MALAT1 3’ TTD sequences. The sequence is shown in SEQ ID NO: 30.

FIG. 10 shows a map and sequence of the triple helix region from the human MALAT1 lncRNA. The sequence of MALAT1 is shown in SEQ ID NO: 7. The triple helical region begins at 8287 of SEQ ID NO: 7 and the mascRNA ends at 8437 of SEQ ID NO: 7.

DETAILED DESCRIPTION

Many experimental trans-splicing studies that are reported in the literature often fall short of therapeutically meaningful endpoints. This is not to suggest these studies are not significant, as they invariably demonstrate the essential role of the RTM binding domain and splice site signals. And while these basic elements are indeed important, the complexities of RNA splicing involve an array of additional cis- and trans-acting factors for template recognition, spliceosome assembly, not to mention other non-splicing mechanisms that can directly impact the turn-over or localization of RTM molecules. Because trans-splicing is at a competitive disadvantage relative to cis-splicing, it is essential that the technical design of RNA trans-splicing molecules (RTM) includes features that increase the odds in favor of an RTM. One way to achieve that is by increasing the effective concentration of the RTM in the nucleus or by making the RTM a more attractive target to the spliceosome (via cis-acting elements or localization).

At the center of the present disclosure are RNA trans-splicing molecules (RTM) that are designed to specifically target a gene of interest and deliver its genetic payload via a trans-splicing reaction. Structurally, RTMs are organized into three core domains: 1) a protein coding region; 2) a binding domain that hybridizes to an intron within a target gene RNA transcript; and 3) a linker sequence with splicing signals (5’ SS or 3’ SS) that connects the coding region to the binding domain. It is important to emphasize that each of these three regions also has a functional role. Although modifications to any of these regions could theoretically impact RTM activity, the binding domain has attracted the most attention. Indeed, most reports in the literature include some degree of screening to identify the optimal binding sequence. Both the location of the target sequence and its length have been shown to influence RTM activity. However, there has been no evidence of sequence-specific features that might constitute consensus motifs or aid the development of binding domain design rules that might be applicable across different gene targets. As a result, binding domains are invariably determined by trial and error.

It remains unclear why some binding domains work better than others. A likely explanation involves RNA folding, and how this might influence the availability of a given target sequence for hybridization of an RTM. RNA folding can also influence the RTM binding domain itself; i.e., if the binding domain assumes a complex secondary structure, it will not be available for hybridization with the target intron. Even when an optimal binding domain is identified, an RTM remains subject to the same rules as other RNAs in the nucleus, and this could influence RTM activity independent of the binding reaction. Mechanistically, RTMs must have a half-life in the nucleus that is sufficiently long to allow the binding reaction to occur. If the RTM is transported out of the nucleus, or degraded by ubiquitous nuclear ribonucleases, two events that would markedly reduce the effective RTM concentration, trans-splicing efficiency will decline.

The biology of long non-coding RNAs (lncRNAs) has just recently become a topic of great interest in biomedical research and medicine. This is due largely to the observation that some have been shown to be up-regulated in certain cancers. While the relationship does not appear to be causative, understanding the role of these enigmatic RNAs could shed light on their possible role in gene regulation. Like RTMs, lncRNAs are transcribed by RNA polymerase II, and both face the same problem: 3’ end processing to ensure precise polymerase termination and functionality of the mature transcript. For an RTM, most literature reports use a polyadenylation signal for 3’ end processing. However, this approach directs the RTM to the cytoplasm, effectively reducing the nuclear copy number and allowing the RTM to express a truncated protein. RTM expression, sometimes referred to as RTM maturation, that generates a truncated protein is an undesirable outcome/off-target effect with unknown biological consequences. In contrast, many lncRNAs lack a polyadenylation signal and instead rely on noncanonical 3’ end processing for Pol II termination. Some of these assume simple stem-loop structures at the 3’ end that are believed to help stabilize the mature transcript (e.g., histone mRNA), while others employ significantly more complex secondary structures.

lncRNAs have evolved a blueprint for nuclear localization that appears to include at least two features: 1) a nuclear localization signal, and 2) a mechanism for noncanonical 3’ end processing to evade degradation by ribonucleases, thereby increasing their stability. A prototype lncRNA that has been shown to include both of these features is called MALAT1 (metastasis-associated lung adenocarcinoma transcript 1).

Interestingly, the 3’ end of MALAT1 is highly conserved across species and has been shown to condense into a triple helical structure following recognition and cleavage of a tRNA-like structure by RNase P (Wilusz et al. 2012. Genes and Development 26:2392-2407). It is believed that this triple helix aids in stabilizing the MALAT1 transcript in the nucleus.

As described herein, the 3’ terminal triple helix from human MALAT1 was added to investigational RTMs that target the primary RNA transcript encoded by a CEP290-Luciferase reporter or the primary RNA transcript encoded by the endogenous ABCA4 gene. In all instances, the presence of the 3’ triple helix terminator markedly enhanced trans-splicing activity. This was initially demonstrated with a 117 bp truncated version of the 3’ terminal triple helix (called Comp14, described in Wilusz et al. 2012) and later with the 151 bp native sequence (NCBI REFSEQ: NR_002819).

In one aspect, the compositions and methods described herein employ gene therapy using adeno-associated virus (AAV) as a means for treating heritable genetic disorders. More specifically, the methods and compositions described herein employ the use of pre-mRNA trans-splicing as a gene therapy, both ex vivo and in vivo, for the treatment of diseases caused by defects in large genes. In one embodiment, these compositions and methods overcome the problem caused by the packaging limit for nucleic acids into AAV of 4700 nucleotides. When including sequences necessary for producing an effective rAAV therapeutic and expressing the RNA trans-splicing molecule (RTM), the effective size constraint for the RTM containing the ocular gene sequences is about 4000 nucleotides. These methods and compositions are particularly desirable for treatment of disorders caused by defects in genes exceeding the size limit for incorporation and expression in an AAV, such as ABCA4, CEP290 and MYO7A, among other genes.
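The size constraint described above can be made concrete with a small arithmetic sketch. It uses only the figures stated in this document (the 4700-nucleotide packaging limit, the approximate 4000-nucleotide effective limit for the RTM, and the gene lengths quoted in the Background); the constant and function names are illustrative assumptions.

```python
AAV_PACKAGING_LIMIT_NT = 4700  # packaging limit stated in the text
RTM_EFFECTIVE_LIMIT_NT = 4000  # approximate effective limit for the RTM stated in the text

# Gene lengths quoted in the Background section.
GENE_LENGTH_NT = {"CEP290": 7440, "MYO7A": 7465}


def fits_in_single_vector(length_nt: int, limit_nt: int = RTM_EFFECTIVE_LIMIT_NT) -> bool:
    """True if a sequence of this length is within the effective limit."""
    return length_nt <= limit_nt


# Number of nucleotides by which each full-length sequence exceeds the effective limit.
excess = {gene: n - RTM_EFFECTIVE_LIMIT_NT for gene, n in GENE_LENGTH_NT.items()}
```

Under these figures, neither full-length sequence fits within the effective limit, which is the motivation for delivering only a portion of the coding sequence on the RTM.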

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs and by reference to published texts, which provide one skilled in the art with a general guide to many of the terms used in the present application. The definitions used herein are provided for clarity only and are not intended to limit the claimed invention.

As used herein, a “3’ transcription terminator domain” or “3’ TTD” refers to a long noncoding RNA (lncRNA) positioned at a 3’ terminus of a trans-splicing molecule. In some instances, a 3’ TTD increases trans-splicing efficiency. In some instances, the transcription terminator domain includes an expression and nuclear retention element (ENE), which, when aligned with an A-rich tract (e.g., a poly-A tail), can form an ENE+A.

As used herein, a “long non-coding RNA” or “lncRNA” refers to a non-protein coding RNA transcript longer than 200 nucleotides (e.g., longer than 300 nucleotides, longer than 400 nucleotides, or longer than 500 nucleotides). In some embodiments, the lncRNA is from 200 to 300 nucleotides, from 300 to 400 nucleotides, from 400 to 500 nucleotides, or more than 500 nucleotides.

As used herein, the term “trans-splicing efficiency” refers to the number of trans-spliced RNA transcripts produced per trans-splicing molecule administered to a cell. Thus, trans-splicing efficiency reflects the stability and nuclear localization and retention of a trans-splicing molecule.

As used herein, the terms “triple helix,” “triple helical structure,” and “triplex,” and grammatical derivations thereof, are used interchangeably and refer to a region of polynucleotide (e.g., RNA) characterized by a stacked major groove triple formed by Hoogsteen base pairing. In some instances, a triple helix includes multiple (e.g., four or more) consecutive nucleotides that pair via Hoogsteen base pairing. In some embodiments, the triple helix includes four or more consecutive adenosine nucleotides, wherein each of the consecutive adenines is paired to a uracil via Hoogsteen base pairing (e.g., a poly-A tract aligns with a U-rich motif, e.g., in a stacked major groove triple).
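The notion of "consecutive" paired nucleotides in this definition can be illustrated with a toy sketch that counts the longest uninterrupted run of U-to-A pairings between two strands. It assumes the strands are supplied already aligned position by position; it does not model strand orientation, Hoogsteen geometry, or folding, and the function name and toy sequences are hypothetical.

```python
def longest_au_run(u_strand: str, a_strand: str) -> int:
    """Length of the longest run of consecutive positions pairing U with A.

    Assumes the two strands are supplied already aligned position by position;
    strand orientation and three-dimensional geometry are not modeled.
    """
    best = current = 0
    for u, a in zip(u_strand.upper(), a_strand.upper()):
        if u == "U" and a == "A":
            current += 1
            best = max(best, current)
        else:
            current = 0  # any non A-U position interrupts the run
    return best


# Toy strands: positions 1-5 and 7-10 pair U with A; position 6 is not an A-U pair.
u_motif = "UUUUUCUUUU"
a_tract = "AAAAAGAAAA"
run = longest_au_run(u_motif, a_tract)
```

A check such as `run >= 4` would then express the "four or more consecutive" condition for these toy strands.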

As used herein, the term “A-rich tract” refers to a strand of consecutive nucleic acids in which at least 80% of the consecutive nucleic acids are adenine (A).

As used herein, the term “U-rich motif” refers to a strand of consecutive nucleic acids in which at least 80% of the consecutive nucleic acids are uracil (U).
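The 80% thresholds in the two definitions above translate directly into a simple composition check. The sketch below is illustrative only: the function names are assumptions, the test sequences are invented, and integer arithmetic is used so that a sequence sitting exactly at 80% is accepted without floating-point rounding concerns.

```python
def is_rich_in(seq: str, base: str, threshold_percent: int = 80) -> bool:
    """True if at least threshold_percent of the nucleotides in seq are base."""
    seq = seq.upper()
    if not seq:
        return False
    # Integer comparison: count / len >= threshold_percent / 100.
    return 100 * seq.count(base.upper()) >= threshold_percent * len(seq)


def is_a_rich_tract(seq: str) -> bool:
    """A-rich tract as defined above: at least 80% adenine."""
    return is_rich_in(seq, "A")


def is_u_rich_motif(seq: str) -> bool:
    """U-rich motif as defined above: at least 80% uracil."""
    return is_rich_in(seq, "U")


a_rich = is_a_rich_tract("AAAAGAAAAA")      # 9 of 10 are A
not_a_rich = is_a_rich_tract("AAGACAGAAA")  # 7 of 10 are A
u_rich = is_u_rich_motif("UUUUCUUUU")       # 8 of 9 are U
```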

A “nucleic acid trans-splicing molecule” or “trans-splicing molecule” has three main elements: (a) a binding domain that confers specificity by tethering the trans-splicing molecule to its target gene (e.g., pre-mRNA); (b) a splicing domain (e.g., a splicing domain having a 3’ or 5’ splice site); and (c) a coding sequence configured to be trans-spliced onto the target gene, which can replace one or more exons in the target gene (e.g., one or more mutated exons). A “pre-mRNA trans-splicing molecule” or “RTM” refers to a nucleic acid trans-splicing molecule that targets pre-mRNA. In some embodiments, a trans-splicing molecule, such as an RTM, can include cDNA, e.g., as part of a functional exon for replacement or correction of a mutated exon.

A nucleic acid is “operably linked” when it is placed into a structural or functional relationship with another nucleic acid sequence. For example, one nucleic acid sequence may be operably linked to another nucleic acid sequence if they are positioned relative to one another on the same contiguous polynucleotide and have a structural or functional relationship, such as formation of a triple helix (e.g., through Hoogsteen base pairing). In some instances, operably linked nucleic acid sequences are directly linked (i.e., the nucleic acid sequence is directly, covalently linked to another nucleic acid sequence, without intervening nucleotides). In other instances, operably linked nucleic acid sequences are not directly linked. In instances in which operably linked nucleic acid sequences are not directly linked, they can be operatively linked (indirectly) through a linker sequence. In some instances, the linker sequence can be 1-1,000 bases in length (e.g., 1-900, 1-800, 1-700, 1-600, 1-500, 1-400, 1-300, 1-250, 1-200, 1-150, 1-100, 1-90, 1-80, 1-70, 1-60, 1-50, 1-40, 1-30, 1-20, 1-10, 1-8, 1-6, 1-5, 1-4, or 1-3 bases in length, e.g., 1-10, 10-15, 15-20, 20-30, 30-40, 40-50, 50-100, 100-150, 150-200, or 200-500 bases in length). In some instances, an A-rich tract is operatively linked 3’ to a U-rich motif through a linker sequence.

As used herein, the term “mammalian subject” or “subject” includes any mammal in need of these methods of treatment or prophylaxis, including particularly humans. Other mammals in need of such treatment or prophylaxis include dogs, cats, or other domesticated animals, horses, livestock, laboratory animals, including non-human primates, etc. The subject may be male or female.

In one embodiment, the subject has, or is at risk of developing, a disorder caused by a genetic mutation. In one embodiment, the subject has, or is at risk of developing, an ocular disorder. In another embodiment, the subject has shown clinical signs of an ocular disorder, particularly a disorder related to a defect or mutation in the genes ABCA4, CEP290, or MYO7A.

The term “ocular disorder” includes, without limitation, Stargardt disease (autosomal dominant or autosomal recessive), retinitis pigmentosa, rod-cone dystrophy, Leber's congenital amaurosis, Usher's syndrome, Bardet-Biedl Syndrome, Best disease, retinoschisis, untreated retinal detachment, pattern dystrophy, cone-rod dystrophy, achromatopsia, ocular albinism, enhanced S cone syndrome, diabetic retinopathy, age-related macular degeneration, retinopathy of prematurity, sickle cell retinopathy, Congenital Stationary Night Blindness, glaucoma, or retinal vein occlusion. In another embodiment, the subject has, or is at risk of developing, glaucoma, Leber’s hereditary optic neuropathy, lysosomal storage disorder, or peroxisomal disorder. Clinical signs of ocular disease include, but are not limited to, decreased peripheral vision, decreased central (reading) vision, decreased night vision, loss of color perception, reduction in visual acuity, decreased photoreceptor function, and pigmentary changes. In another embodiment, the subject has been diagnosed with STGD1. In another embodiment, the subject has been diagnosed with a juvenile onset macular degeneration, fundus flavimaculatus. In another embodiment, the subject has been diagnosed with cone-rod dystrophy. In another embodiment, the subject has been diagnosed with retinitis pigmentosa. In another embodiment, the subject has been diagnosed with age-related macular degeneration (AMD). In another embodiment, the subject has been diagnosed with LCA10. In yet another embodiment, the subject has not yet shown clinical signs of these ocular pathologies.

As used herein, the term “treatment” or “treating” is defined as one or more of reducing onset or progression of an ocular disease, preventing disease, reducing the severity of the disease symptoms, or retarding their progression, removing the disease symptoms, delaying onset of disease or monitoring progression of disease or efficacy of therapy in a given subject.

As used herein, the term “selected cells” refers to any cell or cell type to which the RTM is delivered (i.e., targets of interest for modification using the compositions and methods provided herein). In certain embodiments, the selected cell is a prokaryotic cell. In other embodiments, the selected cell is a eukaryotic cell, non-limiting examples of which include plant cells and tissues, animal cells and tissues, and human cells and tissues. Cells may be from established cell lines or they may be primary cells, where “primary cells”, “primary cell lines”, and “primary cultures” are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages of the culture. Without limitation, selected cells may for instance be cancerous. In certain embodiments, the selected cell is manipulated ex vivo and then administered to the subject. In yet other embodiments, the selected cells are targeted in vivo, e.g., by delivery of an rAAV, to a subject. In some embodiments, the term “selected cells” refers to ocular cells, which are any cell associated with the function of the eye, such as photoreceptor cells. In some embodiments, the term refers to rods, cones, photosensitive ganglion cells, retinal pigment epithelium (RPE) cells, Mueller cells, bipolar cells, horizontal cells, or amacrine cells. Some gene targets are expressed in the eye as well as in other organs. For example, CEP290 is expressed in kidney epithelium and in the central nervous system and MYO7A is expressed in cochlear hair cells. Thus, selected cells may also include these extra-ocular cells. In certain embodiments, the selected cell is a skeletal muscle cell, e.g., a red (slow) skeletal muscle cell, a white (fast) skeletal muscle cell, or an intermediate skeletal muscle cell. In certain embodiments, the selected cell is a cardiac muscle cell, e.g., a cardiomyocyte or a nodal cardiac muscle cell. In certain embodiments, the selected cell is a smooth muscle cell. In certain embodiments, the selected cell is a muscle satellite cell or muscle stem cell.

As used herein, the term “host cell” may refer to the packaging cell line in which the rAAV is produced from the plasmid. In the alternative, the term “host cell” may refer to the target cell in which expression of the transgene is desired.

Codon optimization refers to modifying a nucleic acid sequence to change individual nucleic acids without any resulting change in the encoded amino acid. This process may be performed on any of the sequences described in this specification to enhance expression or stability. Codon optimization may be performed in a manner such as that described in, e.g., US Patent Nos. 7,561,972; 7,561,973; and 7,888,112, incorporated herein by reference, and conversion of the sequence surrounding the translational start site to a consensus Kozak sequence. See, Kozak et al, Nucleic Acids Res. 15 (20): 8125-8148, incorporated herein by reference. In one embodiment, the coding sequences are codon optimized.
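As a minimal sketch of the idea, assuming the standard genetic code, the following Python snippet swaps codons for synonymous ones and confirms that the encoded amino acids are unchanged. The `PREFERRED` mapping, the helper names, and the input sequence are hypothetical placeholders chosen for illustration; they are not taken from the cited patents, and an actual optimization would draw on codon-usage data for the intended host.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preference table (illustrative only, not real usage data).
PREFERRED = {"L": "CTG", "S": "AGC"}


def codons(dna):
    """Split a DNA coding sequence into codons."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def recode(dna, preferred=PREFERRED):
    """Replace codons with preferred synonymous codons where one is listed."""
    recoded = "".join(preferred.get(CODON_TABLE[c], c) for c in codons(dna))
    # Guard against a preference table that is not truly synonymous.
    if translate(recoded) != translate(dna):
        raise ValueError("preference table changed the encoded protein")
    return recoded


original = "ATGTTATCATAA"      # hypothetical toy coding sequence
optimized = recode(original)
```

The check inside `recode` reflects the definition above: individual nucleotides change, while the encoded amino acid sequence does not.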

The term “homologous” refers to the degree of identity between two nucleic acid sequences. The homology of homologous sequences is determined by comparing two sequences aligned under optimal conditions over the sequences to be compared. The sequences to be compared herein may have an addition or deletion (for example, gap and the like) in the optimum alignment of the two sequences. Such a sequence homology can be calculated by creating an alignment using, for example, the ClustalW algorithm (Nucleic Acid Res., 22(22): 4673-4680 (1994)). Commonly available sequence analysis software, more specifically, Vector NTI, GENETYX, BLAST or analysis tools provided by public databases may also be used.

The term “pharmaceutically acceptable” means approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopeia or other generally recognized pharmacopeia for use in animals, and more particularly in humans.

The term “carrier” refers to a diluent, adjuvant, excipient, or vehicle with which the synthetic is administered. Examples of suitable pharmaceutical carriers are described in “Remington's Pharmaceutical Sciences” by E. W. Martin.

The terms “a” or “an” refer to one or more; for example, “a gene” is understood to represent one or more such genes. As such, the terms “a” (or “an”), “one or more,” and “at least one” are used interchangeably herein.

As used herein, the term “about” means a variability of ± 0.1 to 10% from the reference given, unless otherwise specified.

With regard to the following description, it is intended that each of the compositions herein described is useful, in another embodiment, in the methods of treatment described herein. In addition, it is also intended that each of the compositions herein described as useful in the methods is itself an embodiment. While various embodiments in the specification are presented using “comprising” language, which is inclusive of other components or steps, under other circumstances, a related embodiment is also intended to be interpreted and described using “consisting of” or “consisting essentially of” language, which is exclusive of all or any components or steps which significantly change the embodiment.

Pre-mRNA Trans-Splicing Methods and Molecules

Within a cell, a pre-mRNA intermediate exists that includes non-coding nucleic acid sequences, i.e., introns, and nucleic acid sequences that encode the amino acids forming the gene product. The introns are interspersed between the exons of a gene in the pre-mRNA, and are ultimately excised from the pre-mRNA molecule, when the exons are joined together by a protein complex known as the spliceosome. Using spliceosome activity, one may introduce an alternative exon via the introduction of a second nucleic acid. Spliceosome mediated RNA trans-splicing (SMaRT) has been described as employing an engineered pre-mRNA trans-splicing molecule (RTM) that binds specifically to target pre-mRNA in the nucleus and triggers trans-splicing in a process mediated by the spliceosome. This methodology is described in, for example, Puttaraju M, et al 1999 Nat Biotechnol, 17:246-252; Gruber C et al, 2013 Dec, Mol. Oncol. 7(6): 1056; Avale ME, 2013 Jul, Hum. Mol. Genet., 22(13):2603-11; Rindt H et al, 2012 Dec, Cell Mol. Life Sci., 69(24):4191; US Patent Application Publication Nos. 2006/0246422 and 20130059901, and U.S. Patent Nos. 6,083,702; 6,013,487; 6,280,978; 7,399,753; and 8,053,232. These documents are incorporated herein by reference.

The nucleic acid trans-splicing molecules disclosed herein can include any of the structural or functional characteristics of nucleic acid trans-splicing molecules and related methods known in the art, for example, those described in WO 2017/087900 and WO 2019/2045114, each of which is incorporated herein by reference in its entirety.

In some embodiments, an RNA trans-splicing molecule (RTM) as described herein has five main elements. In one embodiment, the elements include, operatively linked in a 5’-to-3’ direction:

(a) a coding domain (CD) comprising one or more functional exon(s) of a selected gene;

(b) a linker domain (LD) of varying length and sequence that acts as a structural connection between the coding domain and the binding domain, and may contain motifs that function as splicing enhancers, or have the capacity to fold into complex secondary structures that act to minimize the translation of the coding region before the trans-splicing event occurs, or encode a degradation peptide in the event of premature RTM maturation;

(c) a spliceosome recognition motif (Splice Donor, SD) configured to initiate spliceosome-mediated trans-splicing;

(d) a binding domain (BD) of varying length and sequence configured to hybridize to a target intron of the selected gene, wherein said gene has at least one defect or mutation in an exon 5’ to the target intron; and

(e) a 3’ transcription terminator domain (TTD) that increases the efficiency of trans-splicing.

The nucleic acid trans-splicing molecule is configured to trans-splice the coding domain to an endogenous exon of the selected gene adjacent to the target intron, thereby replacing the endogenous defective or mutated exon with the functional exon and correcting a mutation in the selected gene.

In another embodiment the elements include, operatively linked in a 5’ to 3’ direction:

(a) a binding domain (BD) configured to bind a target intron of a selected gene, wherein said gene has at least one defect or mutation in an exon 3’ to the targeted intron;

(c) a 3’ spliceosome recognition motif (Splice Acceptor, SA) configured to mediate trans-splicing;

(d) a coding domain (CD) comprising one or more functional exon(s) of the selected gene; and

Coding Domain Sequence (CDS)

The coding domain of the RTMs described herein includes part of the wild-type coding sequence to be trans-spliced to the target pre-mRNA. By “wild-type coding sequence” it is meant a sequence which, when translated and assembled, provides a functional protein. The expression or function need not be to the same level as the wild-type protein. In one embodiment, the wild-type coding sequence is modified, e.g., via codon optimization.

The pre-mRNA trans-splicing molecule (RTM) is configured to trans-splice the coding domain to an endogenous exon of the selected gene adjacent to the target intron, thereby replacing the endogenous defective or mutated exon with the functional exon and correcting a mutation in the selected gene. The CDS may provide some or all of the exons of the selected gene 3’ or 5’ to the binding domain, depending on the configuration of the RTM. For example, for 5’ trans-splicing reactions, all or some of the exons 5’ to the BD are replaced. For 3’ trans-splicing reactions, all or some of the exons 3’ to the BD are replaced. The design of the RTM permits replacement of the defective or mutated portion of the pre-mRNA exon(s) with a nucleic acid sequence, i.e., the exon(s) having a normal sequence without the defect or mutation. The “normal” sequence can be a wild-type naturally-occurring sequence or a corrected sequence with some other modification, e.g., codon-modified, that is not disease-causing.

In one embodiment, the coding domain is a single exon of the target gene, which contains the normal wildtype sequence lacking the disease-causing mutations, e.g., Exon 22 of ABCA4. In another embodiment, the coding domain comprises multiple exons which contain multiple mutations causing disease, e.g., Exons 1-22 of ABCA4. Depending upon the location of the exon to be corrected, the RTM may contain multiple exons located at the 5' or 3' end of the target gene, or the RTM may be designed to replace an exon in the middle of the gene. For use and delivery in the rAAV, the entire coding sequence of the ocular gene is not useful as the coding domain of RTM, unless this technique is directed to a small ocular gene less than 3000 nucleotides in length. As described herein, to replace an entire large gene, two RTMs, a 3' and a 5' RTM can be employed in different rAAV particles.

RTMs described herein can comprise coding domains encoding for one or more exons identified herein and characterized by containing a gene mutation or defect relating to the associated disease, e.g., Exon 27 of ABCA4 may be the coding domain for an RTM designed for the treatment of Stargardt's disease. In TABLES 1 to 3 herein, the names of the targeted genes and the exons containing likely mutations causing disease are identified.

In one embodiment, the coding domain of a 5' RTM is designed to replace the exons in the 5' portion of the targeted gene. In another embodiment, the coding domain of a 3' RTM is designed to replace the exons in the 3' portion of a gene. In another embodiment, the coding domain is one or multiple exons located internally in the gene and the coding domain is located in a double trans-splicing RTM.

Thus, for example, three possible types of RTMs are useful for treatment of disease caused by defects in, e.g., ABCA4: a 5' trans-splicing RTM, which includes a 5' splice site and, after trans-splicing, will have changed the 5' region of the target mRNA; a 3' RTM, which includes a 3' splice site that is used to trans-splice and replace the 3' region of the target mRNA; and a double trans-splicing RTM, which carries multiple binding domains along with a 3' and a 5' splice site and, after trans-splicing, replaces an internal exon in the processed target mRNA. In other embodiments, the coding domain can include an exon that comprises naturally occurring or artificially introduced stop-codons in order to reduce gene expression; or the RTM can contain other sequences which produce an RNAi-like effect.

For use in treating Stargardt's disease, suitable coding regions of ABCA4 are Exons 1-22 or 27-50, in separate RTMs. For use in treating LCA10, suitable coding regions of CEP290 are Exons 1-26 or exons 27-54 in separate RTMs. For use in treating Usher Syndrome, suitable coding regions of MYO7A are Exons 1-18 or 33-49, in separate RTMs.

Still other coding domains can be constructed by one of skill in the art to replace the entirety of the genes in fragments provided by a 5' RTM and 3' RTM, and/or a double splicing RTM, given the teachings provided herein.

Linker Domain (LD)

The RTM described herein includes, in some embodiments, a linker domain (LD) of varying length and sequence that acts as a structural connection between the coding domain and the binding domain. In one embodiment, the LD contains one or more motifs that function as splicing enhancers. In one embodiment the LD provides one or more motifs that have the capacity to fold into complex secondary structures that act to minimize the translation of the coding region before the trans-splicing event occurs.

In one embodiment, the linker sequence is SEQ ID NO: 37:

ccgaatacgacacgtagcaagatct.

Spliceosome Recognition Motif (Splice Donor (SD) and Splice Acceptor (SA))

Depending on the RTM (5’- or 3’) directionality, the RTM includes a spliceosome recognition motif, which is either a splice donor (SD), splice acceptor (SA) or both.

Introns always have two distinct nucleotides at either end. At the 5' end the DNA nucleotides are GT [GU in the premessenger RNA (pre-mRNA)]; at the 3' end they are AG. These nucleotides are part of the splicing sites. The SD is the splicing site at the beginning of an intron, intron 5' left end, and is sometimes referred to as the 5’ splice site or 5’SS. The SA is the splicing site at the end of an intron, intron 3' right end, and is sometimes referred to as the 3’ splice site, or 3’SS.

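[Diagram: the donor splice site (GT) lies at the 5' end of the intron, immediately after the 5' exon; the acceptor splice site (AG) lies at the 3' end of the intron, immediately before the 3' exon.]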

Briefly, the splicing domain provides essential consensus motifs that are recognized by the spliceosome. The use of a branch point (BP) and a polypyrimidine tract (PPT) follows consensus sequences required for performance of the two phosphoryl transfer reactions involved in cis-splicing and, presumably, also in trans-splicing. In one embodiment a branch point consensus sequence in mammals is YNYURAC (Y=pyrimidine; R=purine; N=any nucleotide). The A in this consensus is the site of branch formation. A polypyrimidine tract is located between the branch point and the splice site acceptor and is important for different branch point utilization and 3' splice site recognition. Consensus sequences for the 5' splice donor site and the 3' splice region used in RNA splicing are well known in the art. In addition, modified consensus sequences that maintain the ability to function as 5' donor splice sites and 3' splice regions may be used. Briefly, in one embodiment, the 5' splice site consensus sequence is the nucleic acid sequence AG/GURAGU (where / indicates the splice site). In another embodiment the endogenous splice sites that correspond to the exon proximal to the splice site can be employed to maintain any splicing regulatory signals. In one embodiment, the ABCA4 5'RTM containing as a coding region the sequence encoding exons 1-22 with a binding domain complementary to a region in intron 22 uses the endogenous intron 22 5' splice site. In another embodiment, the ABCA4 3'RTM encoding exons 27-50 with a binding domain complementary to intron 26 uses the endogenous intron 26 3' splice site. In one embodiment a suitable 5’ splice site with spacer is: 5’- GTA AGA GAG CTC GTT GCG ATA TTA T -3’ (SEQ ID NO: 1). In one embodiment a suitable 5’ splice site is AGGT.

In one embodiment, a suitable 3’ RTM BP is 5’-TACTAAC-3’ (SEQ ID NO: 2).

In one embodiment, a suitable 3’ splice site is: 5’- TAC TAA CTG GTA CCT CTT CTT TTT TTT CTG CAG -3’ (SEQ ID NO: 2) or 5’-CAGGT-3’ (SEQ ID NO: 4). In one embodiment, a suitable 3’RTM PPT is 5’-TGG TAC CTC TTC TTT TTT TTC TG-3’ (SEQ ID NO: 5).
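The relationship between the consensus motifs and the example sequences above can be checked with a short script. The following is a minimal sketch, assuming the usual IUPAC meanings of Y, R and N; the helper names are illustrative and are not part of the specification. It converts the YNYURAC branch point consensus into a regular expression and applies it to the 3’ splice site sequence quoted above.

```python
import re

# IUPAC codes used by the consensus sequences quoted above, written as RNA.
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U",
             "Y": "[CU]", "R": "[AG]", "N": "[ACGU]"}


def to_rna(seq):
    """Uppercase, drop spacing, and convert DNA notation (T) to RNA (U)."""
    return seq.upper().replace(" ", "").replace("T", "U")


def consensus_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC_RNA[ch] for ch in consensus))


def pyrimidine_fraction(seq):
    """Fraction of C and U residues in an RNA sequence."""
    return sum(base in "CU" for base in seq) / len(seq)


BRANCH_POINT = consensus_regex("YNYURAC")

# Sequences as given in the text above.
BP = to_rna("TACTAAC")
PPT = to_rna("TGG TAC CTC TTC TTT TTT TTC TG")
THREE_PRIME_SS = to_rna("TAC TAA CTG GTA CCT CTT CTT TTT TTT CTG CAG")

bp_hit = BRANCH_POINT.search(THREE_PRIME_SS)
```

Under these assumptions, the quoted 3’ splice site sequence reads as the branch point, followed by the polypyrimidine tract, followed by a terminal CAG that supplies the AG acceptor dinucleotide.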

Binding Domain (BD)

The RTM includes a binding domain (BD) of varying length and sequence configured to hybridize to a target intron of the selected gene. In one embodiment, the binding domain is a nucleic acid sequence complementary to a sequence of the target pre-mRNA to suppress endogenous target cis-splicing while enhancing trans-splicing between the trans-splicing molecule and the target pre-mRNA, e.g., to create a chimeric molecule having a portion of endogenous mRNA and the coding domain having one or more functional exons. In some embodiments, the binding domain is in an antisense orientation to a sequence of the target intron.

A 5’ trans-splicing molecule will generally bind the target intron 3’ to the mutation, while a 3’ trans-splicing molecule will generally bind the target intron 5’ to the mutation. In one embodiment, the binding domain comprises a part of a sequence complementary to the target intron. In one embodiment herein, the binding domain is a nucleic acid sequence complementary to the intron closest to (i.e., adjacent to) the exon sequence that is being corrected.
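The choice of the adjacent intron can be illustrated with a small helper. This is a sketch only: it assumes the numbering convention implied by the ABCA4 examples above (intron N lies immediately 3’ of exon N), and the function name and arguments are hypothetical.

```python
def adjacent_target_intron(rtm_type, first_exon, last_exon):
    """Return the intron adjacent to the block of exons carried by the RTM.

    Assumes intron N immediately follows exon N (an illustrative convention).
    rtm_type is "5'" for a 5' trans-splicing molecule or "3'" for a 3' one.
    """
    if first_exon < 1 or last_exon < first_exon:
        raise ValueError("exon range must satisfy 1 <= first_exon <= last_exon")
    if rtm_type == "5'":
        # Replaced exons lie 5' of the binding site: bind the intron after them.
        return last_exon
    if rtm_type == "3'":
        # Replaced exons lie 3' of the binding site: bind the intron before them.
        if first_exon == 1:
            raise ValueError("no intron precedes exon 1")
        return first_exon - 1
    raise ValueError("rtm_type must be \"5'\" or \"3'\"")


abca4_5_prime = adjacent_target_intron("5'", 1, 22)
abca4_3_prime = adjacent_target_intron("3'", 27, 50)
```

With this convention the helper reproduces the ABCA4 examples given earlier: a 5’ RTM carrying exons 1-22 targets intron 22, and a 3’ RTM carrying exons 27-50 targets intron 26.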

In another embodiment, the binding domain is targeted to an intron sequence in close proximity to the 3’ or 5’ splice signals of a target intron. In still another embodiment, a binding domain sequence can bind to the target intron in addition to part of an adjacent exon.

Thus, in some instances, the binding domain binds specifically to the mutated endogenous target pre-mRNA to anchor the coding domain of the trans-splicing molecule to the pre-mRNA to permit trans-splicing to occur at the correct position in the target gene. The spliceosome processing machinery of the nucleus may then mediate successful trans-splicing of the corrected exon for the mutated exon causing the disease.

In certain embodiments, the trans-splicing molecules feature binding domains that contain sequences which bind to the target pre-mRNA in more than one place. The binding domain may contain any number of nucleotides necessary to stably bind to the target pre-mRNA to permit trans-splicing to occur with the coding domain. In one embodiment, the binding domains are selected using mFOLD structural analysis for accessible loops (Zuker, Nucleic Acids Res. 2003, 31(13): 3406-3415).

Suitable target binding domains can be from 10 to 500 nucleotides in length. In some embodiments, the binding domain is from 20 to 400 nucleotides in length. In some embodiments, the binding domain is from 50 to 300 nucleotides in length. In some embodiments, the binding domain is from 100 to 200 nucleotides in length. In some embodiments, the binding domain is from 10-20 nucleotides in length (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length), 20-30 nucleotides in length (e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length), 30-40 nucleotides in length (e.g., 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 nucleotides in length), 40-50 nucleotides in length (e.g., 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length), 50-60 nucleotides in length (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 nucleotides in length), 60-70 nucleotides in length (e.g., 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides in length), 70-80 nucleotides in length (e.g., 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, or 80 nucleotides in length), 80-90 nucleotides in length (e.g., 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 nucleotides in length), 90-100 nucleotides in length (e.g., 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides in length), 100-110 nucleotides in length (e.g., 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, or 110 nucleotides in length), 110-120 nucleotides in length (e.g., 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, or 120 nucleotides in length), 120-130 nucleotides in length (e.g., 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, or 130 nucleotides in length), 130-140 nucleotides in length (e.g., 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, or 140 nucleotides in length), 140-150 nucleotides in length (e.g., 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, or 150 nucleotides in length), 150-160 nucleotides in length (e.g., 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, or 160 nucleotides in length), 160-170 nucleotides in length (e.g., 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, or 170 nucleotides in length), 170-180 nucleotides in length (e.g., 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, or 180 nucleotides in length), 180-190 nucleotides in length (e.g., 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, or 190 nucleotides in length), 190-200 nucleotides in length (e.g., 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, or 200 nucleotides in length), 200-210 nucleotides in length, 210-220 nucleotides in length, 220-230 nucleotides in length, 230-240 nucleotides in length, 240-250 nucleotides in length, 250-260 nucleotides in length, 260-270 nucleotides in length, 270-280 nucleotides in length, 280-290 nucleotides in length, 290-300 nucleotides in length, 300-350 nucleotides in length, 350-400 nucleotides in length, 400-450 nucleotides in length, or 450-500 nucleotides in length. In some embodiments, the binding domain is about 150 nucleotides in length. In another embodiment, the target binding domains may include a nucleic acid sequence up to 750 nucleotides in length. In another embodiment, the target binding domains may include a nucleic acid sequence up to 1000 nucleotides in length. In another embodiment, the target binding domains may include a nucleic acid sequence up to 2000 nucleotides or more in length.

In some embodiments, the specificity of the trans-splicing molecule may be increased by increasing the length of the target binding domain. Other lengths may be used depending upon the lengths of the other components of the trans-splicing molecule.

The binding domain may be from 80% to 100% complementary to the target intron to be able to hybridize stably with the target intron. For example, in some embodiments, the binding domain is 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% complementary to the target intron. The degree of complementarity is selected by one of skill in the art based on the need to keep the trans-splicing molecule and the nucleic acid construct containing the necessary sequences for expression and for inclusion in the rAAV within a 3,000 or up to 4,000 nucleotide base limit. The selection of this sequence and strength of hybridization depends on the complementarity and the length of the nucleic acid.
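One simple way to express the percent complementarity discussed above is sketched below. It assumes an ungapped, antiparallel alignment of equal-length sequences and counts only Watson-Crick pairs, which is a simplification (G-U wobble pairs and gaps are ignored). The target site is a hypothetical placeholder, not a sequence from this specification.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Uppercase, drop spacing, and convert DNA notation (T) to RNA (U)."""
    return seq.upper().replace(" ", "").replace("T", "U")


def reverse_complement(seq):
    """Antisense (reverse-complement) RNA of the given sequence."""
    return "".join(WATSON_CRICK[b] for b in reversed(to_rna(seq)))


def percent_complementarity(binding_domain, target_site):
    """Percent of positions that pair when the two strands are antiparallel.

    Both sequences are written 5'->3' and must have the same length.
    """
    bd, site = to_rna(binding_domain), to_rna(target_site)
    if not bd or len(bd) != len(site):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(WATSON_CRICK.get(b) == t for b, t in zip(bd, reversed(site)))
    return 100.0 * paired / len(bd)


target_site = "GUAAGUCCAG"                        # hypothetical intron fragment
perfect_bd = reverse_complement(target_site)      # fully antisense binding domain
mismatched_bd = "A" + perfect_bd[1:-1] + "G"      # two terminal mismatches
```

In this toy case the fully antisense binding domain scores 100% and the variant with two mismatches in ten positions scores 80%, the lower bound of the range stated above.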

In one embodiment, the BD targets intron 23, motif 81 of ABCA4. In one embodiment, the sequence is: SEQ ID NO: 6:

TCACTGTTTAATCTGTTAATTCATCTGAGCATTTTGAGGGTGTAGTCGCTTGATTTTATCCTAGAGAGTGTGTGAGTCACACACAGAGAGGAGCAGAACCTCCAAGGGTCCCTTTGGCTTGTCATCAATTATGTGGCAGCTGTAGGTTCT.

3’ Transcription Terminator Domain (TTD)

The RTM as described herein contains a 3’ transcription terminator domain (TTD), e.g., a 3’ TTD that increases the efficiency of trans-splicing. The TTD, in one embodiment, comprises one or more of the following sequences: a sequence that is involved in the formation of a triplex (also referred to herein as the “triple helix” or “triple helical structure”), an RNase P cleavage site, the tRNA-like structure that serves as a template for RNase P cleavage (also referred to herein as the tRNA-like domain, structure or sequence), and any flanking sequence that might facilitate folding of these domains, independently or collectively. Such flanking sequence may be an artificial linker, a linker derived from another sequence, or flanking sequence from the native lncRNA. In one embodiment, the 3’ transcription terminator domain forms a triple helical structure that effectively caps the 3’ end or protects the 3’ end from nuclease degradation. As discussed herein, the tRNA-like domain may also include the RNase P cleavage site.

Long non-coding RNAs (lncRNAs) serve as important regulatory mediators in gene expression. Some lncRNAs have been shown to have 3’ ends produced by non-canonical recognition and cleavage of a tRNA-like structure by RNase P. In some instances, it has been shown that some lncRNAs are protected from 3’-5’ exonucleases by highly conserved triple helical structures. As provided herein, sequences of the 3’ terminal ends of certain lncRNAs are able to be incorporated in an RTM as a terminal domain (TTD) which is able to increase the efficiency of trans-splicing. In one embodiment, the TTD is a sequence from one or more long non-coding RNAs (lncRNA) or other nuclear RNA molecules that contain a 3’ transcription terminator that condenses into a triple helix 3’ end cap. In one embodiment, the TTD sequences are from the human long non-coding RNA MALAT1. In another embodiment, the TTD sequences are from the human lncRNA MENβ. In one embodiment, the TTD includes nucleotides 8287-8437 of human MALAT1 (SEQ ID NO: 7). In another embodiment, the TTD includes, in order from 5’ to 3’, a triplex forming sequence that comprises nucleotides 8287-8379 of SEQ ID NO: 7, an RNase P cleavage site that comprises nucleotides 8379-8380 of SEQ ID NO: 7, and a tRNA-like sequence that comprises nucleotides 8380-8437 of SEQ ID NO: 7.
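The coordinates above are 1-based and inclusive, so extracting the corresponding segments from a transcript held as a string requires an offset of one on the start position. The sketch below illustrates this with a filler transcript; the filler is explicitly NOT the MALAT1 sequence (SEQ ID NO: 7 is not reproduced here), and reading the cleavage site as the junction between nucleotides 8379 and 8380 is an interpretive assumption.

```python
# 1-based, inclusive coordinates quoted in the text for human MALAT1.
TTD_RANGE = (8287, 8437)
TRIPLEX_RANGE = (8287, 8379)
TRNA_LIKE_RANGE = (8380, 8437)


def segment(transcript, start, end):
    """Return nucleotides start..end (1-based, inclusive) of a transcript."""
    if not 1 <= start <= end <= len(transcript):
        raise ValueError("coordinates fall outside the transcript")
    return transcript[start - 1:end]


# Placeholder transcript of sufficient length; NOT the MALAT1 sequence.
mock_transcript = ("ACGU" * 2110)[:8437]

ttd = segment(mock_transcript, *TTD_RANGE)
triplex = segment(mock_transcript, *TRIPLEX_RANGE)
trna_like = segment(mock_transcript, *TRNA_LIKE_RANGE)

# Assumed reading: RNase P cuts between 8379 and 8380, i.e. after this many
# nucleotides of the TTD.
cleavage_offset = TRIPLEX_RANGE[1] - TTD_RANGE[0] + 1
```

On this reading, the 151-nucleotide TTD divides into a 93-nucleotide triplex-forming segment followed by a 58-nucleotide tRNA-like segment.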

In some embodiments, the 3’ TTD comprises, in a 5’-to-3’ direction (linked directly or indirectly), a 5’ U-rich motif, a stem-loop motif, a 3’ U-rich motif, and an A-rich tract (e.g., a poly-A tail). In some instances, the A-rich tract is capable of Hoogsteen base pairing with the 5’ U-rich motif. In some embodiments, one or both stem strands is about 8-20 base pairs in length (e.g., from 9-16, 10-14, or 11-23 base pairs in length). In some embodiments, the 5’ U-rich motif and the 3’ U-rich motif each comprise at least five consecutive uracils. In some embodiments, the 5’ U-rich motif and the 3’ U-rich motif are each 5-15 base pairs in length.

In some embodiments, the 3’ TTD comprises, in a 5’ to 3’ direction, a 5’ U-rich motif comprising five consecutive uracils, a stem-loop motif in which at least one stem strand has a length of about 16 base pairs, a 3’ U-rich motif comprising five consecutive uracils, and an A-rich tract comprising at least 18 adenines. In some embodiments, the 3’ TTD comprises SEQ ID NO: 14. In some embodiments, the 3’ TTD comprises SEQ ID NO: 13.
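The compositional criteria in the preceding paragraph lend themselves to a simple check. The sketch below is illustrative only: the four parts are hypothetical toy sequences (not SEQ ID NO: 13 or 14), the stem is scored by Watson-Crick pairs alone, and the function names are assumptions.

```python
import re

WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def has_run(seq, base, n):
    """True if seq contains at least n consecutive copies of base."""
    return re.search(f"{base}{{{n},}}", seq) is not None


def stem_length(stem_loop):
    """Count Watson-Crick pairs formed from the two ends inward."""
    i, j, pairs = 0, len(stem_loop) - 1, 0
    while i < j and WATSON_CRICK.get(stem_loop[i]) == stem_loop[j]:
        pairs, i, j = pairs + 1, i + 1, j - 1
    return pairs


def check_ttd_parts(u_rich_5, stem_loop, u_rich_3, a_tract):
    """Evaluate the criteria listed above for one candidate 3' TTD."""
    return {
        "5' U-rich motif has five consecutive U": has_run(u_rich_5, "U", 5),
        "stem strand of about 16 base pairs": stem_length(stem_loop) == 16,
        "3' U-rich motif has five consecutive U": has_run(u_rich_3, "U", 5),
        "A-rich tract has at least 18 A": a_tract.count("A") >= 18,
    }


# Hypothetical toy parts, for illustration only.
stem_5 = "GCGGAUCCGAUCGGCA"
stem_3 = "".join(WATSON_CRICK[b] for b in reversed(stem_5))
toy_stem_loop = stem_5 + "GAAA" + stem_3

report = check_ttd_parts("UUUUUCUUUU", toy_stem_loop, "UUUUUGUUUU", "A" * 18)
```

A candidate that fails any entry in `report` would not match this particular embodiment, although it could still fall within the broader ranges given in the preceding paragraph.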

In some embodiments, the 3’ TTD comprises, in a 5’ to 3’ direction, a 5’ U-rich motif comprising SEQ ID NO: 18, a stem-loop motif in which at least one stem strand has a length of about 13 nucleotides, a 3’ U-rich motif comprising SEQ ID NO: 19, and an A-rich tract comprising SEQ ID NO: 20. In some embodiments, the 3’ TTD comprises SEQ ID NO: 16. In some embodiments, the 3’ TTD comprises SEQ ID NO: 15.

In some embodiments, the 3’ TTD comprises, in a 5’ to 3’ direction, SEQ ID NO: 18, SEQ ID NO: 19, and SEQ ID NO: 20. In some embodiments, the 3’ TTD comprises SEQ ID NO: 17.

In some embodiments, the 3’ TTD comprises, in a 5’ to 3’ direction, a 5’ U-rich motif comprising SEQ ID NO: 23, a stem-loop motif in which at least one stem strand has a length of about 13 nucleotides, a 3’ U-rich motif comprising SEQ ID NO: 24, and an A-rich tract comprising SEQ ID NO: 25. In some embodiments, the 3’ TTD comprises SEQ ID NO: 24. In some embodiments, the 3’ TTD comprises SEQ ID NO: 23. In some embodiments, the 3’ TTD is between 200 and 1000 nucleotides in length (e.g., from 200 to 900, from 200 to 800, from 200 to 700, from 200 to 600, from 200 to 500, from 200 to 400, or from 200 to 300 nucleotides in length).

Triplex-forming structure

The triple helix structure is, in one embodiment, formed from an A-rich motif (e.g., an A-rich tract), along with two upstream (e.g., 5’) U-rich motifs and a stem-loop structure. As exemplified herein, these sequences are highly conserved evolutionarily in metastasis-associated lung adenocarcinoma transcript 1 (MALAT1), a lncRNA associated with certain cancers. Similar highly conserved A- and U-rich motifs are present at the 3’ end of the MENβ long nuclear-retained noncoding RNA, also known as NEAT1_2, which is also processed at its 3’ end by RNase P. It has been shown that these highly conserved A- and U-rich motifs form a triple-helical structure critical for protecting the 3’ end of MALAT1 from 3’-5’ exonucleases.

A number of triple-helices are useful in engineering any of the constructs described herein. Such triple-helices include ENE+A, riboswitch, and telomerase triple helices (see, e.g., Brown et al. Nature Structural and Molecular Biology, 21, 633-642, 2014, which is incorporated herein by reference). For example, ENE+A triple helices are described for human MALAT1 (Brown et al. Nat. Struct. Mol. Biol., 7, 633-40, 2014), KSHV PAN (Mitton-Fry et al. Science, 330, 1244-7, 2010), human MENβ (Brown et al. Proc. Natl. Acad. Sci. USA, 109, 19202-7, 2012), Acanthamoeba polyphaga mimivirus (Tycowski et al. Cell Rep., 2, 26-32, 2012), Cotesia congregata bracovirus (Tycowski et al. Cell Rep., 2, 26-32, 2012), Cotesia sesamiae bracovirus (Tycowski et al. Cell Rep., 2, 26-32, 2012), Equine herpesvirus 2 (EHV2) (Tycowski et al. Cell Rep., 2, 26-32, 2012), Plautia stali intestine virus (PSIV) (Tycowski et al. Cell Rep., 2, 26-32, 2012), and Rhesus rhadinovirus PAN (RRV) (Tycowski et al. Cell Rep., 2, 26-32, 2012). Other exemplary triple helices include riboswitch triple helices which are described for the PreQ1-II Riboswitch from Lactobacillales rhamnosus (Liberman et al. Nat. Chem. Biol., 9, 353-5, 2013) and the SAM-II Riboswitch found in the Sargasso Sea metagenome (Gilbert et al. Nat. Struct. Mol. Biol., 15, 177-82, 2008). In yet another example, telomerase triple helices are described for humans (Theimer et al. Mol Cell, 17, 671-82, 2005) and for Kluyveromyces lactis (Cash et al. Proc. Natl. Acad. Sci USA, 110, 10970-5, 2013).

In one embodiment, the RTM contains a triplex forming sequence comprised of a U-rich motif 1 (e.g., a 5’ U-rich motif), a conserved stem-loop, a U-rich motif 2 (e.g., a 3’ U-rich motif), and an A-rich tract (e.g., as part of a poly-A tail), wherein the A-rich tract and the U-rich motif 2 form a Watson-Crick stem duplex, and the U-rich motif 1 aligns with the A-rich tract to form Hoogsteen base pairs (Buske et al. 2012; Beal and Dervan, 1991, which are incorporated herein by reference). In one embodiment, the sequences are from human MALAT1. Thus, in one embodiment, the RTM contains a triplex forming sequence comprised of a U-rich motif 1 (8292-8301 of human MALAT1), a conserved stem-loop (8302-8333 of human MALAT1), a U-rich motif 2 (8334-8343 of human MALAT1), and an A-rich tract (8369-8379 of human MALAT1), wherein the A-rich tract and the U-rich motif 2 form a Watson-Crick stem duplex, and the U-rich motif 1 aligns with the A-rich tract to form Hoogsteen base pairs.

In another embodiment, the 3’ TTD described herein is of novel design, derived from theoretical modeling and/or by extension of naturally occurring sequences. In one embodiment, the TTD comprises, in order from 5’ to 3’, a triplex forming sequence of varying length and composition, an RNase P cleavage site, and a tRNA-like sequence of varying length and composition. In one embodiment, the triplex forming sequence conforms to one of three known basic “motifs”, which are referred to by the base composition of the third strand of the triple helix: pyrimidine motif (T,C), purine motif (G,A), and purine-pyrimidine motif (G,T) (Buske FA, Bauer DC, Mattick JS, Bailey TL. 2012. Triplexator: Detecting nucleic acid triple helices in genomic and transcriptomic data. Genome Res. 22: 1372-1382; Beal PA, Dervan PB. 1991. Second structural motif for recognition of DNA by oligonucleotide-directed triple-helix formation. Science. 251: 1360-1363, which are both incorporated herein by reference).

In another embodiment, the TTD is a truncated version of the human MALAT1 triple helix. In one embodiment the TTD contains a triplex forming sequence comprised of a U-rich motif 1 (8292-8301 of human MALAT1), a conserved stem-loop (8302-8310 and 8325-8333 of human MALAT1), a U-rich motif 2 (8334-8343 of human MALAT1), an A-rich tract (8369-8379 of human MALAT1), and a deletion spanning nucleotide 8345-8364 of human MALAT1 of the intervening sequence between U-rich motif 2 and the A-rich tract, wherein the A-rich tract and the U-rich motif 2 form a Watson-Crick stem duplex, and the U-rich motif 1 aligns with the A-rich tract to form Hoogsteen base pairs.
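The layout of the truncated triple helix can be sanity-checked from the coordinates alone. The sketch below uses only the 1-based, inclusive positions quoted in the preceding paragraph; the segment labels and helper names are illustrative assumptions, and no MALAT1 sequence is involved.

```python
# (label, start, end) in 1-based inclusive human MALAT1 coordinates, 5' to 3'.
MOTIFS = [
    ("U-rich motif 1", 8292, 8301),
    ("stem-loop segment 1", 8302, 8310),
    ("stem-loop segment 2", 8325, 8333),
    ("U-rich motif 2", 8334, 8343),
    ("A-rich tract", 8369, 8379),
]
DELETION = (8345, 8364)


def span_length(start, end):
    """Number of nucleotides in an inclusive coordinate range."""
    return end - start + 1


def ordered_without_overlap(motifs):
    """True if each motif ends before the next one starts."""
    return all(a[2] < b[1] for a, b in zip(motifs, motifs[1:]))


def deletion_within_gap(deletion, upstream, downstream):
    """True if the deletion lies strictly between two motifs."""
    return upstream[2] < deletion[0] <= deletion[1] < downstream[1]


u_rich_2, a_tract = MOTIFS[3], MOTIFS[4]
gap_length = a_tract[1] - u_rich_2[2] - 1            # nucleotides 8344-8368
remaining_gap = gap_length - span_length(*DELETION)  # left after the deletion
```

From these numbers, the deletion removes 20 of the 25 nucleotides that separate U-rich motif 2 from the A-rich tract, leaving both motifs themselves intact.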

In one embodiment, the triple helix structure is derived from a lncRNA. In one embodiment, the triple helix structure is derived from MALAT1. As the MALAT1 sequences are highly conserved evolutionarily, the MALAT1 sequence can be from any species. In one embodiment, the MALAT1 sequence is from a human. In another embodiment, the MALAT1 sequence is from a mouse. In another embodiment, the MALAT1 sequence is from a non-human primate. In another embodiment, the MALAT1 sequence is from a dog. In another embodiment, the MALAT1 sequence is from an elephant. In another embodiment, the MALAT1 sequence is from an opossum. In another embodiment, the MALAT1 sequence is from fish. Such sequences are known in the art and can be found, e.g., in GenBank. In one embodiment, the MALAT1 sequence is SEQ ID NO: 7.

In another embodiment, the triple helix sequence is provided as a truncated or modified version of the native sequence, so long as the sequence retains the ability to fold into the required triple helix structure.

In one embodiment, the triple helix structure is derived from MENb. The MENb sequence can be from any species. In one embodiment, the MENb sequence is from a human. In another embodiment, the MENb sequence is from a mouse. In another embodiment, the MENb sequence is from a non-human primate. In another embodiment, the MENb sequence is from a dog. In another embodiment, the MENb sequence is from an elephant. In another embodiment, the MENb sequence is from an opossum. In another embodiment, the MENb sequence is from fish. Such sequences are known in the art and can be found, e.g., in GenBank.

In another embodiment, the triple helix sequence is provided as a truncated or modified version of the native sequence, so long as the sequence retains the ability to fold into the required triple helix structure. In one embodiment, the MENb sequence is SEQ ID NO: 8.

In some embodiments, the triple helix includes four to 100 consecutive adenosines paired via Hoogsteen base pairing (e.g., four to 80 consecutive adenosines paired via Hoogsteen base pairing, four to 60 consecutive adenosines paired via Hoogsteen base pairing, four to 50 consecutive adenosines paired via Hoogsteen base pairing, four to 40 consecutive adenosines paired via Hoogsteen base pairing, four to 30 consecutive adenosines paired via Hoogsteen base pairing, four to 20 consecutive adenosines paired via Hoogsteen base pairing, four to 18 consecutive adenosines paired via Hoogsteen base pairing, four to 15 consecutive adenosines paired via Hoogsteen base pairing, four to 12 consecutive adenosines paired via Hoogsteen base pairing, four to 11 consecutive adenosines paired via Hoogsteen base pairing, four to 10 consecutive adenosines paired via Hoogsteen base pairing, four to nine consecutive adenosines paired via Hoogsteen base pairing, four to eight consecutive adenosines paired via Hoogsteen base pairing, four to seven consecutive adenosines paired via Hoogsteen base pairing, or four to six consecutive adenosines paired via Hoogsteen base pairing, e.g., five to 50 consecutive adenosines paired via Hoogsteen base pairing, five to 40 consecutive adenosines paired via Hoogsteen base pairing, five to 30 consecutive adenosines paired via Hoogsteen base pairing, five to 20 consecutive adenosines paired via Hoogsteen base pairing, five to 18 consecutive adenosines paired via Hoogsteen base pairing, five to 15 consecutive adenosines paired via Hoogsteen base pairing, five to 12 consecutive adenosines paired via Hoogsteen base pairing, five to 10 consecutive adenosines paired via Hoogsteen base pairing, five to nine consecutive adenosines paired via Hoogsteen base pairing, five to eight consecutive adenosines paired via Hoogsteen base pairing, five to seven consecutive adenosines paired via Hoogsteen base pairing, or five to six consecutive adenosines paired via Hoogsteen base pairing, e.g., six to eight consecutive adenosines paired via Hoogsteen base pairing, eight to 10 consecutive adenosines paired via Hoogsteen base pairing, 10 to 12 consecutive adenosines paired via Hoogsteen base pairing, 12 to 14 consecutive adenosines paired via Hoogsteen base pairing, 14 to 16 consecutive adenosines paired via Hoogsteen base pairing, 16 to 18 consecutive adenosines paired via Hoogsteen base pairing, 18 to 20 consecutive adenosines paired via Hoogsteen base pairing, 20 to 30 consecutive adenosines paired via Hoogsteen base pairing, 30 to 40 consecutive adenosines paired via Hoogsteen base pairing, or 40 to 50 consecutive adenosines paired via Hoogsteen base pairing). In some embodiments, the triple helix includes a strand of consecutive nucleotides in which at least 90% of the nucleotides are paired via Hoogsteen base pairing (e.g., at least 90% of the nucleotides are paired via Hoogsteen base pairing, at least 91% of the nucleotides are paired via Hoogsteen base pairing, at least 92% of the nucleotides are paired via Hoogsteen base pairing, at least 93% of the nucleotides are paired via Hoogsteen base pairing, at least 94% of the nucleotides are paired via Hoogsteen base pairing, at least 95% of the nucleotides are paired via Hoogsteen base pairing, at least 96% of the nucleotides are paired via Hoogsteen base pairing, at least 97% of the nucleotides are paired via Hoogsteen base pairing, at least 98% of the nucleotides are paired via Hoogsteen base pairing, at least 99% of the nucleotides are paired via Hoogsteen base pairing, or 100% of the nucleotides are paired via Hoogsteen base pairing).
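As a rough, sequence-level screen for the adenosine ranges recited above, one can measure the longest uninterrupted run of A in a candidate A-rich tract. This is only a sketch under a strong assumption: a primary sequence cannot show which adenosines are actually Hoogsteen paired, so the run length is at best an upper bound on the recited count. The function names are hypothetical.

```python
def longest_run(sequence: str, base: str = "A") -> int:
    """Length of the longest uninterrupted run of `base` in `sequence`."""
    best = current = 0
    for nt in sequence.upper():
        current = current + 1 if nt == base else 0
        best = max(best, current)
    return best


def within_recited_range(sequence: str, low: int = 4, high: int = 100) -> bool:
    """True if the longest A run falls in the broadest recited range (4 to 100).

    Sequence-only proxy; structural data would be needed to confirm pairing.
    """
    return low <= longest_run(sequence) <= high
```

Narrower recited ranges (for example, five to 20) can be screened by passing different `low` and `high` values.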

Domain 2 - tRNA-like structure

The tRNA-like structures described herein are sequences that form a tRNA-like cloverleaf secondary structure, allowing them to be recognized by one or more of RNase P, RNase Z, and the CCA-adding enzyme.

The tRNA-like structure of MALAT1 is termed mascRNA (MALAT1-associated small cytoplasmic RNA). This sequence is 61 nt long and is shown in SEQ ID NO: 9. The tRNA-like structure of mascRNA has been preserved through evolution, as the four mismatches between the mouse and human orthologs maintain the cloverleaf secondary structure. Although similar in structure to a tRNA and containing a well-conserved B-box, the 61-nt mascRNA transcript is smaller than most tRNAs (~76-nt) and has a small, relatively poorly conserved anticodon loop. Wilusz et al, Cell. 2008 Nov 28; 135(5): 919-932, incorporated by reference herein. The tRNA-like structure of MENb is termed menRNA. Zhang et al., 2017, Cell Reports 19, 1723-1738, which is incorporated herein by reference.

In one embodiment, the tRNA-like structure is derived from a lncRNA. In one embodiment, the tRNA-like structure is derived from MALAT1. As the MALAT1 sequences are highly conserved evolutionarily, the MALAT1 sequence can be from any species. In one embodiment, the MALAT1 sequence is from a human. In another embodiment, the MALAT1 sequence is from a mouse. In another embodiment, the MALAT1 sequence is from a non-human primate. In another embodiment, the MALAT1 sequence is from a dog. In another embodiment, the MALAT1 sequence is from an elephant. In another embodiment, the MALAT1 sequence is from an opossum. In another embodiment, the MALAT1 sequence is from fish. Such sequences are known in the art and can be found, e.g., in GenBank.

In another embodiment, the tRNA-like sequence is provided as a truncated or modified version of the native sequence, so long as the sequence retains the ability to fold into the required tRNA-like structure.

In one embodiment, the tRNA-like structure is derived from MENb. The MENb sequence can be from any species. In one embodiment, the MENb sequence is from a human. In another embodiment, the MENb sequence is from a mouse. In another embodiment, the MENb sequence is from a non-human primate. In another embodiment, the MENb sequence is from a dog. In another embodiment, the MENb sequence is from an elephant. In another embodiment, the MENb sequence is from an opossum. In another embodiment, the MENb sequence is from fish. Such sequences are known in the art and can be found, e.g., in GenBank.

The components of the TTD can originate from the same or different lncRNA, including lncRNA homologs from different species. For example, the triple helix domain and the tRNA-like domain may originate from the same long non-coding RNA or different combinations of long non-coding RNA domains derived from human or any other species. In one embodiment, the triple helix domain and the tRNA-like domain are from MALAT1 or NEAT1/MENb.

Targeted Genes

The targeted gene is one that contains one or multiple defects or mutations that cause an ocular disease. In one embodiment described herein, the targeted gene is a mammalian gene with defects known to cause a disease or disorder. The wildtype sequences of the genes and encoded proteins and/or the genomic and chromosomal sequences are available from publicly available databases and their accession numbers are provided herein. In addition to these published sequences, all corrections later obtained, and naturally occurring conservative and non-disease-causing variant sequences that occur in the human or other mammalian population, are also included. Additionally, conservative nucleotide replacements or those causing codon optimizations are also included. The sequences as provided by the database accession numbers may also be used to search for homologous sequences in the same or another mammalian organism.

It is anticipated that the target ocular nucleic acid sequences and the resulting protein truncates or amino acid fragments identified herein may tolerate certain minor modifications at the nucleic acid level to include, for example, modifications to the nucleotide bases which are silent, e.g., preference codons. In other embodiments, nucleic acid base modifications which change the amino acids, e.g. to improve expression of the resulting peptide/protein are anticipated. Also included as likely modification of fragments are allelic variations, caused by the natural degeneracy of the genetic code.

Also included as modification of the selected genes are analogs, or modified versions, of the encoded protein fragments provided herein. Typically, such analogs differ from the specifically identified proteins by only one to four codon changes. Conservative replacements are those that take place within a family of amino acids that are related in their side chains and chemical properties.

The nucleic acid sequence encoding a normal gene may be derived from any mammal which natively expresses that gene, or homolog thereof. In another embodiment, the gene sequence is derived from the same mammal that the composition is intended to treat. In another embodiment, the gene sequence is derived from a human. In other embodiments, certain modifications are made to the gene sequence in order to enhance the expression in the target cell. Such modifications include codon optimization.

In one embodiment, the gene is ABCA4, which is implicated in Stargardt’s disease. The genomic sequence of the DNA for this gene can be found in the NCBI Reference Sequence for Chromosome 1 (135313 bp) at NG_009073.1. The mRNA for the gene as well as the locations of the exons are indicated in the NCBI report. The DNA sequence of ABCA4 is provided as NCBI Reference Sequence: NM_000350.2. The amino acid sequence is provided as NCBI Reference Sequence: NP_000341.2.

In another embodiment, the gene is CEP290. Leber congenital amaurosis comprises a group of early-onset childhood retinal dystrophies characterized by vision loss, nystagmus, and severe retinal dysfunction. Patients usually present at birth with profound vision loss and pendular nystagmus. Electroretinogram (ERG) responses are usually nonrecordable. Other clinical findings may include high hypermetropia, photodysphoria, oculodigital sign, keratoconus, cataracts, and a variable appearance to the fundus. LCA10 is caused by mutation in the CEP290 gene on chromosome 12q21 and may account for as many as 21% of cases of LCA. Mutations in CEP290 can also result in extra-ocular findings, including kidney and CNS abnormalities, and thus can result in syndromes (Senior Loken syndrome, Joubert syndrome, Bardet-Biedl).

The genomic sequence of the DNA for this gene can be found in the NCBI Reference Sequence for Chromosome 12 from nt. 88049013-88142216 (93,204 bp) at NC_000012.12. The mRNA and the exons are identified in the NCBI report. The DNA sequence of CEP290 is provided as NCBI Reference Sequence: NM_025114.3. The amino acid sequence is provided as NCBI Reference Sequence: NP_079390.3. The mRNA contains 54 exons and 59 introns (due to alternative splicing). Many mutations of CEP290 and their locations in the nucleotide sequence are known.

In another embodiment, the gene is MYO7A. Mutations in this gene are related to Usher syndrome. Usher syndrome is a condition characterized by hearing loss and progressive vision loss. The loss of vision is caused by an eye disease called retinitis pigmentosa (RP), which affects the light-sensitive layer of the eye, the retina. Vision loss occurs as the light-sensing cells of the retina gradually deteriorate. Over time, blind spots enlarge and merge to produce tunnel vision. In some cases of Usher syndrome, vision is further impaired by clouding of the lens of the eye (cataracts). Many people with retinitis pigmentosa retain some central vision throughout their lives, however. The loss of hearing is caused by disease in cochlear hair cells, which also gradually deteriorate. Usher syndrome type I can result from mutations in the CDH23, MYO7A, PCDH15, USH1C, or USH1G gene. More than 250 mutations in the MYO7A gene have been identified in people with Usher syndrome type IB. Many of these genetic changes alter a single protein building block (amino acid) in critical regions of the myosin VIIA protein. Other mutations introduce a premature stop signal in the instructions for the myosin VIIA protein. As a result, an abnormally small version of this protein is made. Some mutations insert or delete small amounts of DNA in the MYO7A gene, which alters the protein. All of these changes cause the production of a nonfunctional myosin VIIA protein that adversely affects the development and function of cells in the inner ear and retina, resulting in Usher syndrome.

The genomic sequence of the DNA for this gene can be found in the NCBI Reference Sequence for Chromosome 11 from nt. 77,128,255 to 77,215,240 (86,986 bp) at NC_000011.9. The DNA sequence of MYO7A is provided as NCBI Reference Sequence: NM_000260.3. The amino acid sequence is provided as NCBI Reference Sequence: NP_000251.1. The DNA sequence, amino acid sequence, exon sequences and intron sequences are provided for MYO7A online at https://grenada.lumc.nl/LOVD2/Usher_montpellier/refseq/MY07A_codingDNA.html, last modified February 17, 2010. The mRNA contains 49 exons and 61 introns. Many mutations of MYO7A may be found on the CCHMC Molecular Genetics Laboratory Mutation Database, LOVD v.2.0.

RTM Target Gene Coding Sequence

In one embodiment, the coding domain is a single exon of the target gene, which contains the normal wild-type sequence lacking the disease-causing mutations, e.g., Exon 27 of ABCA4. In another embodiment, the coding domain comprises multiple exons which contain multiple mutations causing disease, e.g., Exons 1-22 of ABCA4. Depending upon the location of the exon to be corrected, the RTM may contain multiple exons located at the 5’ or 3’ end of the target gene, or the RTM may be designed to replace an exon in the middle of the gene. For use and delivery in the rAAV, the entire coding sequence of the gene is not useful as the coding domain of the RTM, unless this technique is directed to a small gene less than 3000 nucleotides in length. As described herein, to replace an entire large gene, two RTMs, a 3’ and a 5’ RTM, can be employed in different rAAV particles. In one embodiment, the coding domain of a 5’ RTM is designed to replace the exons in the 5’ portion of the targeted gene. In another embodiment, the coding domain of a 3’ RTM is designed to replace the exons in the 3’ portion of a gene. In another embodiment, the coding domain is one or multiple exons located internally in the gene and the coding domain is located in a double trans-splicing RTM.

Thus, for example, three possible types of RTMs are useful for treatment of disease caused by defects in, e.g., ABCA4: a 5' trans-splicing RTM, which includes a 5' splice site and, after trans-splicing, will have changed the 5' region of the target mRNA; a 3' RTM, which includes a 3' splice site that is used to trans-splice and replace the 3' region of the target mRNA; and a double trans-splicing RTM, which carries multiple binding domains along with a 3' and a 5' splice site and, after trans-splicing, replaces an internal exon in the processed target mRNA. In other embodiments, the coding domain can include an exon that comprises naturally occurring or artificially introduced stop-codons in order to reduce gene expression; or the RTM can contain other sequences which produce an RNAi-like effect.
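The choice among the three RTM types follows from where the replaced exons sit in the gene, which can be sketched as a simple decision rule. This is an illustrative sketch rather than a design tool; the names `RTMType`, `SPLICE_SITES` and `rtm_type_for` are hypothetical, and the total exon count is supplied by the caller (50 is assumed for ABCA4 in the usage note).

```python
from enum import Enum


class RTMType(Enum):
    FIVE_PRIME = "5' RTM"
    THREE_PRIME = "3' RTM"
    DOUBLE = "double trans-splicing RTM"


# Splice sites carried by each RTM type, as described in the text above.
SPLICE_SITES = {
    RTMType.FIVE_PRIME: ("5' splice site",),
    RTMType.THREE_PRIME: ("3' splice site",),
    RTMType.DOUBLE: ("3' splice site", "5' splice site"),
}


def rtm_type_for(first_exon: int, last_exon: int, total_exons: int) -> RTMType:
    """Pick the RTM type for replacing exons first_exon..last_exon (inclusive)."""
    if not 1 <= first_exon <= last_exon <= total_exons:
        raise ValueError("exon range lies outside the gene")
    if first_exon == 1 and last_exon == total_exons:
        # The text handles a whole large gene with separate 5' and 3' RTMs.
        raise ValueError("whole gene: split between a 5' RTM and a 3' RTM")
    if first_exon == 1:
        return RTMType.FIVE_PRIME
    if last_exon == total_exons:
        return RTMType.THREE_PRIME
    return RTMType.DOUBLE
```

For example, assuming 50 exons in ABCA4, `rtm_type_for(1, 22, 50)` selects the 5' RTM, while replacing only exon 27 with `rtm_type_for(27, 27, 50)` selects the double trans-splicing RTM.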

For use in treating Stargardt’s disease, suitable coding regions of ABCA4 are Exons 1-22 or 27-50, in separate RTMs. For use in treating LCA10, suitable coding regions of CEP290 are Exons 1-26 or Exons 27-54, in separate RTMs. For use in treating Usher Syndrome, suitable coding regions of MYO7A are Exons 1-18 or 33-49, in separate RTMs.
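The exon ranges listed above can be held in a small lookup table, which also makes it easy to see which exons fall between the 5' and 3' coding regions for each gene. The table and the helper `uncovered_exons` are a hypothetical sketch built only from the ranges stated in this paragraph.

```python
# (5' RTM exons, 3' RTM exons) as inclusive ranges, taken from the text above.
CODING_REGIONS = {
    "ABCA4": ((1, 22), (27, 50)),
    "CEP290": ((1, 26), (27, 54)),
    "MYO7A": ((1, 18), (33, 49)),
}


def uncovered_exons(gene: str) -> list:
    """Exons lying between the 5' and 3' coding regions for a gene."""
    (_, five_end), (three_start, _) = CODING_REGIONS[gene]
    return list(range(five_end + 1, three_start))
```

With these ranges, the two CEP290 regions are contiguous, whereas exons 23-26 of ABCA4 and exons 19-32 of MYO7A are in neither region.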

Optional Components or Modifications of the RTM

An optional spacer region may be used to separate the splicing domain from the target binding domain in the RTM. The spacer region may be designed to include features such as (i) stop codons which would function to block translation of any unspliced RTM and/or (ii) sequences that enhance trans-splicing to the target pre-mRNA. The spacer may be from 3 to 25 nucleotides or more depending upon the lengths of the other components of the RTM and the rAAV limitations. In one embodiment a suitable 5’ RTM spacer is AGA TCT CGT TGC GAT ATT AT SEQ ID NO: 10. In one embodiment a suitable 3’ spacer is: 5’- GAG AAC ATT ATT ATA GCG TTG CTC GAG -3’ SEQ ID NO: 11. Still other optional components of the RTMs include mini introns, and intronic or exonic enhancers or silencers that would regulate the trans-splicing (See, e.g., the descriptions in the RTM technology publications cited herein.)
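Because a spacer may be designed to carry stop codons, a candidate spacer can be scanned for them in all three reading frames. The sketch below applies such a scan to the two spacer sequences given above (SEQ ID NO: 10 and 11). The helper names are hypothetical, and the scan reports DNA-level triplets only; it does not predict whether translation would actually be blocked.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Spacer sequences as written in the text above (SEQ ID NO: 10 and 11).
SPACER_5P = "AGA TCT CGT TGC GAT ATT AT"
SPACER_3P = "GAG AAC ATT ATT ATA GCG TTG CTC GAG"


def clean(sequence: str) -> str:
    """Remove whitespace and upper-case a sequence."""
    return "".join(sequence.split()).upper()


def find_stop_codons(sequence: str) -> list:
    """Return (0-based position, codon) for every stop triplet in any frame."""
    seq = clean(sequence)
    return [
        (i, seq[i:i + 3])
        for i in range(len(seq) - 2)
        if seq[i:i + 3] in STOP_CODONS
    ]
```

The reading frame of each hit is its position modulo 3, so hits can be filtered to a frame of interest once the upstream context of the RTM is fixed.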

In another embodiment, the RTM further comprises at least one safety sequence incorporated into the spacer, binding domain, or elsewhere in the RTM to prevent non-specific trans-splicing. This is a region of the RTM that covers elements of the 3' and/or 5' splice site of the RTM by relatively weak complementarity, preventing non-specific trans-splicing. The RTM is designed in such a way that upon hybridization of the binding/targeting portion(s) of the RTM, the 3' and/or 5' splice site is uncovered and becomes fully active. Such “safety” sequences comprise a complementary stretch of cis-sequence (or could be a second, separate, strand of nucleic acid) which binds to one or both sides of the RTM branch point, pyrimidine tract, 3' splice site and/or 5' splice site (splicing elements), or could bind to parts of the splicing elements themselves. The binding of the “safety” may be disrupted by the binding of the target binding region of the RTM to the target pre-mRNA, thus exposing and activating the RTM splicing elements (making them available to trans-splice into the target pre-mRNA). In another embodiment, the RTM has 3'UTR sequences or ribozyme sequences added to the 3' or 5' end.

In an embodiment, splicing enhancers such as, for example, sequences referred to as exonic splicing enhancers may also be included in the structure of the synthetic RTMs. Additional features can be added to the RTM molecule, such as polyadenylation signals to modify RNA expression/stability, or 5' splice sequences to enhance splicing, additional binding regions, “safety” self-complementary regions, additional splice sites, or protective groups to modulate the stability of the molecule and prevent degradation. In addition, stop codons may be included in the RTM structure to prevent translation of unspliced RTMs. Further elements such as a 3' hairpin structure, circularized RNA, nucleotide base modification, or synthetic analogs can be incorporated into RTMs to promote or facilitate nuclear localization and spliceosomal incorporation, and intra-cellular stability.

The binding of the RTM nucleic acid molecule to the target pre-mRNA is mediated by complementarity (i.e. based on base-pairing characteristics of nucleic acids), triple helix formation or protein-nucleic acid interaction (as described in documents cited herein). In one embodiment, the RTM nucleic acid molecules consist of DNA, RNA or DNA/RNA hybrid molecules, wherein the DNA or RNA is either single or double stranded. Also comprised are RNAs or DNAs, which hybridize to one of the aforementioned RNAs or DNAs preferably under stringent conditions like, for example, hybridization at 60°C in 2.5x SSC buffer and several washes at 37°C at a lower buffer concentration like, for example, 0.5x SSC buffer and which encode proteins exhibiting lipid phosphate phosphatase activity and/or association with plasma membranes. When RTMs are synthesized in vitro (synthetic RTMs), such RTMs can be modified at the base moiety, sugar moiety, or phosphate backbone, for example, to improve stability of the molecule, hybridization to the target mRNA, transport into the cell, stability in the cells to enzymatic cleavage, etc. For example, modification of a RTM to reduce the overall charge can enhance the cellular uptake of the molecule. In addition, modifications can be made to reduce susceptibility to nuclease or chemical degradation. The nucleic acid molecules may be synthesized in such a way as to be conjugated to another molecule, e.g., a peptide, hybridization triggered cross-linking agent, transport agent, hybridization-triggered cleavage agent, etc.

Various other well-known modifications to the nucleic acid molecules can be introduced as a means of increasing intracellular stability and half-life (see also above for oligonucleotides). Possible modifications are known in the art (see documents cited herein). Modifications that may be made to the structure of the synthetic RTMs include, but are not limited to, backbone modifications such as described in the cited RTM technology documents.

Recombinant AAV Molecules

A variety of known nucleic acid vectors may be used in these methods to design and assemble the components of the RTM and the recombinant adeno-associated virus (AAV), intended to deliver the RTM to the target cells. A wealth of publications known to those of skill in the art discusses the use of a variety of such vectors for delivery of genes (see, e.g., Ausubel et al, Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1989; Kay, M. A. et al, 2001 Nat. Medic., 7(1):33-40; and Walther W. and Stein U., 2000 Drugs, 60(2):249-71). In one embodiment described herein the vector is a recombinant AAV carrying a RTM and driven by a promoter that expresses the RTM in selected target cells of the affected subject. Methods for assembly of the recombinant vectors are well-known (see, e.g., International Patent Publication No. WO 00/15822, published March 23, 2000 and other references cited herein).

In certain embodiments described herein, the RTM(s) carrying the selected gene binding and coding sequences is delivered to the target cells, e.g., photoreceptor cells, in need of treatment by means of an adeno-associated virus vector. Many naturally occurring serotypes of AAV are available. Many natural variants in the AAV capsid exist, allowing identification and use of an AAV with properties specifically suited for ocular cells. AAV viruses may be engineered by conventional molecular biology techniques, making it possible to optimize these particles for cell specific delivery of the RTM nucleic acid sequences, for minimizing immunogenicity, for tuning stability and particle lifetime, for efficient degradation, for accurate delivery to the nucleus, etc.

The expression of the RTMs described herein can be achieved in the selected cells through delivery by recombinantly engineered AAVs or artificial AAV’s that contain sequences encoding the desired RTM. The use of AAVs is a common mode of exogenous delivery of DNA as it is relatively non-toxic, provides efficient gene transfer, and can be easily optimized for specific purposes. Among the serotypes of AAVs isolated from human or non-human primates (NHP) and well characterized, human serotype 2 has been widely used for efficient gene transfer experiments in different target tissues and animal models. Other AAV serotypes include, but are not limited to, AAV1, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8 and AAV9. Unless otherwise specified, the AAV ITRs, and other selected AAV components described herein, may be readily selected from among any AAV serotype, including, without limitation, AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAVrh.10, AAV8bp, AAV7m8 or other known and unknown AAV serotypes. These ITRs or other AAV components may be readily isolated using techniques available to those of skill in the art from an AAV serotype. Such AAV may be isolated or obtained from academic, commercial, or public sources (e.g., the American Type Culture Collection, Manassas, VA). Alternatively, the AAV sequences may be obtained through synthetic or other suitable means by reference to published sequences such as are available in the literature or in databases such as, e.g., GenBank, PubMed, or the like. See, e.g., WO 2005/033321 or WO2014/124282 for a discussion of various AAV serotypes, which is incorporated herein by reference.

Desirable AAV fragments for assembly into vectors include the cap proteins, including the vp1, vp2, vp3 and hypervariable regions, the rep proteins, including rep 78, rep 68, rep 52, and rep 40, and the sequences encoding these proteins. These fragments may be readily utilized in a variety of vector systems and host cells. Such fragments may be used alone, in combination with other AAV serotype sequences or fragments, or in combination with elements from other AAV or non-AAV viral sequences. As used herein, artificial AAV serotypes include, without limitation, AAV with a non-naturally occurring capsid protein. Such an artificial capsid may be generated by any suitable technique, using a selected AAV sequence (e.g., a fragment of a vp1 capsid protein) in combination with heterologous sequences which may be obtained from a different selected AAV serotype, non-contiguous portions of the same AAV serotype, from a non-AAV viral source, or from a non-viral source. An artificial AAV serotype may be, without limitation, a pseudotyped AAV, a chimeric AAV capsid, a recombinant AAV capsid, or a “humanized” AAV capsid. Pseudotyped vectors, wherein the capsid of one AAV is replaced with a heterologous capsid protein, are useful in the invention. In one embodiment, AAV2/5 is a useful pseudotyped vector. In another embodiment, the AAV is AAV2/8.
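Names such as AAV2/5 and AAV2/8 are commonly read as "genome (ITR) serotype / capsid serotype". The short parser below illustrates that reading. It is a sketch that assumes this naming convention, which the text above does not spell out, and the function name `parse_pseudotype` is hypothetical.

```python
import re


def parse_pseudotype(name: str) -> dict:
    """Split a pseudotype name such as 'AAV2/8' into its two serotypes.

    Assumes the common 'AAV<ITR serotype>/<capsid serotype>' convention.
    """
    match = re.fullmatch(r"AAV([\w.]+)/([\w.]+)", name.strip())
    if match is None:
        raise ValueError(f"not a pseudotype name: {name!r}")
    return {
        "itr_serotype": "AAV" + match.group(1),
        "capsid_serotype": "AAV" + match.group(2),
    }
```

Under this convention, `parse_pseudotype("AAV2/8")` describes AAV2 ITRs packaged in an AAV8 capsid.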

In one embodiment, the vectors useful in preparing the compositions and methods described herein contain, at a minimum, sequences encoding a selected AAV serotype capsid, e.g., an AAV2 capsid, or a fragment thereof. In another embodiment, useful vectors contain, at a minimum, sequences encoding a selected AAV serotype rep protein, e.g., AAV2 rep protein, or a fragment thereof. Optionally, such vectors may contain both AAV cap and rep proteins. In vectors in which both AAV rep and cap are provided, the AAV rep and AAV cap sequences can both be of one serotype origin, e.g., all AAV2 origin. Alternatively, vectors may be used in which the rep sequences are from an AAV serotype which differs from that which is providing the cap sequences. In one embodiment, the rep and cap sequences are expressed from separate sources (e.g., separate vectors, or a host cell and a vector). In another embodiment, these rep sequences are fused in frame to cap sequences of a different AAV serotype to form a chimeric AAV vector, such as AAV2/8 described in US Patent No. 7,282,199, which is incorporated by reference herein.

A suitable recombinant adeno-associated virus (AAV) is generated by culturing a host cell which contains a nucleic acid sequence encoding an adeno-associated virus (AAV) serotype capsid protein, or fragment thereof, as defined herein; a functional rep gene; a minigene composed of, at a minimum, AAV inverted terminal repeats (ITRs) and the RTM nucleic acid sequence; and sufficient helper functions to permit packaging of the minigene into the AAV capsid protein. The components required to be cultured in the host cell to package an AAV minigene in an AAV capsid may be provided to the host cell in trans. Alternatively, any one or more of the required components (e.g., minigene, rep sequences, cap sequences, and/or helper functions) may be provided by a stable host cell which has been engineered to contain one or more of the required components using methods known to those of skill in the art.

In one embodiment, the rAAV comprises a promoter (or a functional fragment of a promoter). The selection of the promoter to be employed in the rAAV may be made from among a wide number of constitutive or inducible promoters that can express the selected transgene in the desired target cell. See, e.g., the list of promoters identified in International Patent Publication No. WO2014/124282, published August 14, 2014, incorporated by reference herein. In one embodiment, the promoter is “cell specific”. The term “cell-specific” means that the particular promoter selected for the recombinant vector can direct expression of the selected transgene in a particular cell or ocular cell type. In one embodiment, the promoter is specific for expression of the transgene in photoreceptor cells. In another embodiment, the promoter is specific for expression in the rods and/or cones. In another embodiment, the promoter is specific for expression of the transgene in RPE cells. In another embodiment, the promoter is specific for expression of the transgene in ganglion cells. In another embodiment, the promoter is specific for expression of the transgene in Mueller cells. In another embodiment, the promoter is specific for expression of the transgene in bipolar cells. In another embodiment, the transgene is expressed in any of the above noted ocular cells.

In another embodiment, the promoter is the native promoter for the target ocular gene to be expressed. Useful promoters include, without limitation, the rod opsin promoter, the red-green opsin promoter, the blue opsin promoter, the cGMP-β-phosphodiesterase promoter, the mouse opsin promoter (Beltran et al 2010 cited above), the rhodopsin promoter (Mussolino et al, Gene Ther, July 2011, 18(7):637-45); the alpha-subunit of cone transducin (Morrissey et al, BMC Dev. Biol., Jan 2011, 11:3); beta phosphodiesterase (PDE) promoter; the retinitis pigmentosa (RP1) promoter (Nicoud et al, J. Gene Med, Dec 2007, 9(12):1015-23); the NXNL2/NXNL1 promoter (Lambard et al, PLoS One, Oct. 2010, 5(10):e13025), the RPE65 promoter; the retinal degeneration slow/peripherin 2 (Rds/perph2) promoter (Cai et al, Exp Eye Res. 2010 Aug;91(2):186-94); and the VMD2 promoter (Kachi et al, Human Gene Therapy, 2009 (20:31-9)). Each of these documents is incorporated by reference herein.

Other conventional regulatory sequences contained in the mini-gene or rAAV are also disclosed in documents such as WO2014/124282 and others cited and incorporated by reference herein. One of skill in the art may make a selection among these, and other, expression control sequences without departing from the scope described herein.

The desired AAV minigene is composed of, at a minimum, the RTM described herein and its regulatory sequences, and 5’ and 3’ AAV inverted terminal repeats (ITRs). In one embodiment, the ITRs of AAV serotype 2 are used. In another embodiment, the ITRs of AAV serotype 5 or 8 are used. However, ITRs from other suitable serotypes may be selected. It is this minigene which is packaged into the AAV capsid and delivered to a selected host cell.
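The minigene layout described above (5' ITR, regulatory sequences, RTM, 3' ITR) can be modeled as an ordered list of elements whose summed length is compared with a packaging limit. The sketch below is illustrative only: the element lengths are invented placeholders, and the 4,700 nt capacity and 145 nt ITR length are assumptions introduced here, not values stated in this document.

```python
from dataclasses import dataclass

# Assumption for illustration: approximate AAV packaging limit in nucleotides.
ASSUMED_CAPACITY_NT = 4700


@dataclass(frozen=True)
class Element:
    name: str
    length_nt: int


def minigene_length(elements) -> int:
    """Total length of the minigene, ITR to ITR."""
    return sum(e.length_nt for e in elements)


def fits_capsid(elements, capacity_nt: int = ASSUMED_CAPACITY_NT) -> bool:
    """True if the minigene does not exceed the assumed packaging limit."""
    return minigene_length(elements) <= capacity_nt


# Hypothetical minigene; every length below is a placeholder.
EXAMPLE_MINIGENE = [
    Element("5' ITR", 145),
    Element("promoter", 500),
    Element("RTM (binding domain, spacer, splice site, coding domain, TTD)", 3500),
    Element("3' ITR", 145),
]
```

A check of this kind shows why a large coding sequence is divided between a 5' RTM and a 3' RTM carried in separate rAAV particles rather than packaged whole.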

The minigene, rep sequences, cap sequences, and helper functions required for producing the rAAV may be delivered to the packaging host cell in the form of any genetic element which transfers the sequences carried thereon. The selected genetic element may be delivered by any suitable method, including those described herein. The methods used to construct any embodiment described herein are known to those with skill in nucleic acid manipulation and include genetic engineering, recombinant engineering, and synthetic techniques. See, e.g., Sambrook et al, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, NY. Similarly, methods of generating rAAV virions are well known and the selection of a suitable method is not a limitation on the present invention. See, e.g., K. Fisher et al, 1993 J. Virol., 70:520-532 and US Patent 5,478,745, among others. These publications are incorporated by reference herein.

Suitable production cell lines are readily selected by one of skill in the art. For example, a suitable host cell can be selected from any biological organism, including prokaryotic (e.g., bacterial) cells, and eukaryotic cells, including insect cells, yeast cells and mammalian cells. Briefly, the AAV production plasmid carrying the minigene is transfected into a selected packaging cell, where it may exist transiently. Alternatively, the minigene or gene expression cassette with its flanking ITRs is stably integrated into the genome of the host cell, either chromosomally or as an episome. Suitable transfection techniques are known and may readily be utilized to deliver the recombinant AAV genome to the host cell. Typically, the production plasmids are cultured in the host cells which express the cap and/or rep proteins. In the host cells, the minigene consisting of the RTM with flanking AAV ITRs is rescued and packaged into the capsid protein or envelope protein to form an infectious viral particle. Thus a recombinant AAV infectious particle is produced by culturing a packaging cell carrying the proviral plasmid in the presence of sufficient viral sequences to permit packaging of the gene expression cassette viral genome into an infectious AAV envelope or capsid.

The Pharmaceutical Carrier and Pharmaceutical Compositions

The compositions described herein containing the recombinant viral vector, e.g., AAV, containing the desired RTM minigene for use in the selected target cells, e.g., photoreceptor cells for treatment of Stargardt Disease, as detailed above, are preferably assessed for contamination by conventional methods and then formulated into a pharmaceutical composition intended for a suitable route of administration. Still other compositions containing the RTM, e.g., naked DNA or as protein, may be formulated similarly with a suitable carrier. Such formulation involves the use of a pharmaceutically and/or physiologically acceptable vehicle or carrier, particularly directed for administration to the target cell. In one embodiment, carriers suitable for administration to the cells of the eye include buffered saline, an isotonic sodium chloride solution, or other buffers, e.g., HEPES, to maintain pH at appropriate physiological levels, and, optionally, other medicinal agents, pharmaceutical agents, stabilizing agents, buffers, carriers, adjuvants, diluents, etc. For injection, the carrier will typically be a liquid. Exemplary physiologically acceptable carriers include sterile, pyrogen-free water and sterile, pyrogen-free, phosphate buffered saline. A variety of such known carriers are provided in US Patent No. 7,629,322, incorporated herein by reference. In one embodiment, the carrier is an isotonic sodium chloride solution. In another embodiment, the carrier is balanced salt solution. In one embodiment, the carrier includes Tween. If the virus is to be stored long-term, it may be frozen in the presence of glycerol or Tween20.

In other embodiments, compositions containing the RTMs described herein include a surfactant. Useful surfactants, such as Pluronic F68 (Poloxamer 188, also known as Lutrol® F68), may be included as they prevent AAV from sticking to inert surfaces and thus ensure delivery of the desired dose.

As an example, one illustrative composition designed for the treatment of the ocular diseases described herein comprises a recombinant adeno-associated vector carrying a nucleic acid sequence encoding a 3’ RTM as described herein, under the control of regulatory sequences which express the RTM in an ocular cell of a mammalian subject, and a pharmaceutically acceptable carrier. The carrier is isotonic sodium chloride solution and includes a surfactant Pluronic F68. In one embodiment, the RTM is that described in the examples. In another embodiment, the RTM contains the binding and coding regions for CEP290 or MYO7A.

In yet another exemplary embodiment, the composition comprises a recombinant AAV2/5 pseudotyped adeno-associated virus carrying a 3’ RTM, a 5’ RTM, or an RTM for internal gene replacement, the nucleic acid sequence under the control of a promoter which directs expression of the RTM in the target cells, wherein the composition is formulated with a carrier and additional components suitable for injection.

In still another embodiment, the composition or components for production or assembly of this composition, including carriers, rAAV particles, surfactants, and/or the components for generating the rAAV, as well as suitable laboratory hardware to prepare the composition, may be incorporated into a kit.

Methods of Treating Disorders

The compositions described above are thus useful in methods of treating one or more of the diseases associated with a selected gene. In one embodiment, the disease is an ocular disease (e.g., Stargardt Disease, Leber’s Congenital Amaurosis, cone rod dystrophy, fundus flavimaculatus, retinitis pigmentosa, age-related macular degeneration, Senior Loken syndrome, Joubert syndrome, or Usher Syndrome, among others). Treatment, in one embodiment, includes delaying or ameliorating symptoms associated with the ocular diseases described herein. Such methods involve contacting a target pre-mRNA (e.g., ABCA4, CEP290, MYO7A) with one or more of a 3’ RTM, 5’ RTM, both 3’ and 5’ RTM or a double trans-splicing RTM as described herein, under conditions in which a portion of the RTM is spliced to the target pre-mRNA to replace all or a part of the targeted gene carrying one or more defects or mutations, with a “healthy”, or normal or wildtype or corrected mRNA of the targeted gene, in order to correct expression of that gene in the target cell. Alternatively, a pre-miRNA (see the RTM documents cited herein) can be formed, which is designed to reduce the expression of a target mRNA. Thus, the methods and compositions are used to treat the ocular diseases/pathologies associated with the specific mutations and/or gene expression.

In one embodiment, the contacting involves direct administration to the affected subject; in another embodiment, the contacting may occur ex vivo to the cultured cell and the treated cell reimplanted in the subject. In one embodiment, the method involves administering a rAAV particle carrying a 3’ RTM. In another embodiment, the method involves administering a rAAV particle carrying a 5’ RTM. In another embodiment, the method involves administering a rAAV particle carrying a double trans-splicing RTM. In still another embodiment, the method involves administering a mixture of an rAAV particle carrying a 3’ RTM and an rAAV particle carrying a 5’ RTM. In still another embodiment, the method involves administering a mixture of an rAAV particle carrying a 3’ RTM and an rAAV particle carrying a double trans-splicing RTM. In still another embodiment, the method involves administering a mixture of an rAAV particle carrying a 5’ RTM and an rAAV particle carrying a double trans-splicing RTM. In still another embodiment, the method involves administering a mixture of an rAAV particle carrying a 3’ RTM, with an rAAV particle carrying a 5’ RTM and an rAAV particle carrying a double trans-splicing RTM. These methods comprise administering to a subject in need thereof an effective concentration of a composition of any of those described herein.
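By way of illustration only, the single-particle and mixture embodiments listed above correspond to every non-empty combination of the three RTM types. The following minimal Python sketch enumerates them; the names `RTM_TYPES` and `rtm_regimens` are hypothetical labels introduced for this illustration and do not appear elsewhere in this specification.

```python
from itertools import combinations

# Hypothetical labels for the three RTM types named in the text.
RTM_TYPES = ("3' RTM", "5' RTM", "double trans-splicing RTM")


def rtm_regimens(rtm_types=RTM_TYPES):
    """Return every non-empty combination of RTM types, single particles first."""
    return [
        combo
        for size in range(1, len(rtm_types) + 1)
        for combo in combinations(rtm_types, size)
    ]


if __name__ == "__main__":
    # Print one line per regimen, e.g. a single rAAV or a mixture of rAAV particles.
    for regimen in rtm_regimens():
        print(" + ".join(regimen))
```

The enumeration yields three single-particle regimens, three two-particle mixtures, and one three-particle mixture, matching the embodiments recited above.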

In one illustrative embodiment, such a method is provided for preventing, arresting progression of or ameliorating vision loss associated with Stargardt Disease in a subject, said method comprising administering to an ocular cell of a mammalian subject in need thereof an effective concentration of a composition comprising a recombinant adeno-associated virus (AAV) carrying a 3’ RTM such as described above and in the examples, under the control of regulatory sequences which permit the RTM to function and cause trans-splicing of the defective targeted gene in an ocular cell, e.g., photoreceptor cell, of a mammalian subject. In still another embodiment, the method involves administering two rAAV particles, one carrying a 5’ RTM and the other carrying the 3’ RTM, such as those RTMs described in the examples to replace large portions of large genes.

By “administering” as used in the methods means delivering the composition to the selected target cell which is characterized by the disease caused by a mutation or defect in the targeted gene. For example, in one embodiment, the method involves delivering the composition by subretinal injection to the photoreceptor cells or other ocular cells. In another embodiment, intravitreal injection to ocular cells or injection via the palpebral vein to ocular cells may be employed. In another embodiment, the method involves delivering the composition by direct injection to the organ indicated, e.g., liver. In yet another embodiment, the method involves delivering the composition by intravenous injection. Still other methods of administration may be selected by one of skill in the art given this disclosure.

Furthermore, in certain embodiments, it is desirable to perform non-invasive retinal imaging and functional studies to identify areas of retained photoreceptors to be targeted for therapy. In these embodiments, clinical diagnostic tests are employed to determine the precise location(s) for one or more subretinal injection(s). These tests may include electroretinography (ERG), perimetry, topographical mapping of the layers of the retina and measurement of the thickness of its layers by means of confocal scanning laser ophthalmoscopy (cSLO) and optical coherence tomography (OCT), topographical mapping of cone density via adaptive optics (AO), functional eye exam, etc. In view of the imaging and functional studies, in some embodiments one or more injections are performed in the same eye in order to target different areas of retained photoreceptors.

For use in these methods, the volume and viral titer of each injection are determined individually, as further described below, and may be the same as or different from other injections performed in the same subject. In another embodiment, a single, larger volume injection is made in order to treat the entire eye. The dosages, administrations and regimens may be determined by the attending physician given the teachings of this specification.

In one embodiment, the volume and concentration of the rAAV composition are selected so that only certain regions of photoreceptors or other ocular cells are impacted. In another embodiment, the volume and/or concentration of the rAAV composition is a greater amount, in order to reach larger portions of the eye. Similarly, dosages are adjusted for administration to other organs.

An effective concentration of a recombinant adeno-associated virus carrying a RTM as described herein ranges between about 10⁸ and 10¹³ vector genomes per milliliter (vg/mL). The rAAV infectious units are measured as described in S.K. McLaughlin et al, 1988 J. Virol., 62: 1963. In another embodiment, the concentration ranges between 10⁹ and 10¹³ vector genomes per milliliter (vg/mL). In another embodiment, the effective concentration is about 1.5 x 10¹¹ vg/mL. In one embodiment, the effective concentration is about 1.5 x 10¹⁰ vg/mL. In another embodiment, the effective concentration is about 2.8 x 10¹¹ vg/mL. In yet another embodiment, the effective concentration is about 1.5 x 10¹² vg/mL. In another embodiment, the effective concentration is about 1.5 x 10¹³ vg/mL. It is desirable that the lowest effective concentration of virus be utilized in order to reduce the risk of undesirable effects, such as toxicity, and other issues related to administration to the eye, e.g., retinal dysplasia and detachment. Still other dosages in these ranges or in other units may be selected by the attending physician, taking into account the physical state of the subject, preferably human, being treated, including the age of the subject; the composition being administered and the particular disorder; the targeted cell and the degree to which the disorder, if progressive, has developed.

The composition may be delivered in a volume of from about 50 µL to about 1 mL, including all numbers within the range, depending on the size of the area to be treated, the viral titer used, the route of administration, and the desired effect of the method. In one embodiment, the volume is about 50 µL. In another embodiment, the volume is about 70 µL. In another embodiment, the volume is about 100 µL. In another embodiment, the volume is about 125 µL. In another embodiment, the volume is about 150 µL. In another embodiment, the volume is about 175 µL. In yet another embodiment, the volume is about 200 µL. In another embodiment, the volume is about 250 µL. In another embodiment, the volume is about 300 µL. In another embodiment, the volume is about 450 µL. In another embodiment, the volume is about 500 µL. In another embodiment, the volume is about 600 µL. In another embodiment, the volume is about 750 µL. In another embodiment, the volume is about 850 µL. In another embodiment, the volume is about 1000 µL.
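By way of illustration only, the total number of vector genomes delivered in one injection is the product of the concentration (vg/mL) and the injected volume. The minimal Python sketch below performs that arithmetic, assuming the volume is expressed in microliters; the function name `total_vector_genomes` is hypothetical, and the example values are simply one concentration and one volume recited above, not a recommended dose.

```python
def total_vector_genomes(concentration_vg_per_ml: float, volume_ul: float) -> float:
    """Return the vector genomes delivered by one injection.

    concentration_vg_per_ml: viral titer in vector genomes per milliliter.
    volume_ul: injected volume in microliters (assumed unit).
    """
    if concentration_vg_per_ml <= 0 or volume_ul <= 0:
        raise ValueError("concentration and volume must be positive")
    # 1 mL = 1000 microliters, so convert the volume before multiplying out.
    return concentration_vg_per_ml * volume_ul / 1000.0


if __name__ == "__main__":
    # Example: about 1.5 x 10^11 vg/mL delivered in about 100 microliters.
    dose = total_vector_genomes(1.5e11, 100.0)
    print(f"{dose:.2e} vg per injection")
```

Because the dose scales linearly with both terms, the same total number of vector genomes can be reached with a smaller volume at a higher titer, which is relevant when the volume is chosen to limit the treated area.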

The examples that follow do not limit the scope of the embodiments described herein. One skilled in the art will appreciate that modifications can be made in the following examples which are intended to be encompassed by the spirit and scope of the invention.

EXAMPLE 1: Splicing Dependent Reporter RTM

The RTMs shown in FIGS. 1A-1D were delivered to a cell line that expresses a minigene (FIG. 1F) that contains Intron26 from CEP290 fused to the 3’ half of the luciferase ORF. The RTM binds (via the binding domain) to the target sequence in Intron26, bringing the 5’ splice site (5’ SS) in the RTM into proximity with the 3’ splice site (3’ SS) of the CEP290 minigene. Spliceosome-mediated splicing occurs, yielding luciferase expression as a direct measure of trans-splicing activity (FIG. 2A). Two reference RTMs that contain either a polyadenylation signal (poly A) or hammerhead ribozyme (hhRz) constitute prior art for transcription termination elements, and serve here to establish a baseline of activity. The data suggest the Comp14 derivative of the MALAT1 transcription terminator enhances trans-splicing relative to the reference RTM that contains a hhRz for transcription termination. Furthermore, this activity appears to be dependent on the mascRNA domain and its associated RNaseP cleavage, as evidenced by a loss of activity when the mascRNA domain is replaced with the hhRz. In FIG. 2B the experiment was designed to measure luciferase RNA and protein by TaqMan and Western blotting, respectively. N=4 experimental replicates were tested for each construct, revealing an increase in luciferase protein when the hhRz was replaced with the Comp14 MALAT1 derivative, consistent with luciferase activity shown in FIG. 2A. TaqMan analysis of RNA extracted from treated cells showed a similar increase in trans-spliced luciferase RNA when the RTM contained the Comp14 derivative of the MALAT1 terminator, according to two different primer-probe sets (S2 and S4). Because the RTM in these studies used a binding domain that targets Intron26 of the CEP290 gene, it was also possible to measure RTM trans-splicing activity against the endogenous CEP290 transcript. As shown in FIG. 2B, the RTM that carries the Comp14 derivative of the MALAT1 terminator generated higher levels of the chimeric Luc-CEP290 RNA compared to an RTM with the hhRz terminator, according to two different TaqMan primer-probe sets (S2 and S3).

EXAMPLE 2: Comparison of 3’ terminator sequences

RTM constructs were made in which several terminator sequences were tested for ABCA4 expression: hhz - hammerhead ribozyme, which self-cleaves to create the 3’ terminal end of the RTM (FIG. 3A); C14 or Comp14 - a truncated MALAT1 triple helix structure (SEQ ID NO: 12), which creates the 3’ terminal end of the RTM following RNase P cleavage (FIG. 3B); and wt - native MALAT1 triple helix, which creates the 3’ terminal end of the RTM following RNase P cleavage (FIG. 3C).

FIGs. 4A and 4B are Western blots, and quantitation thereof, showing ABCA4 protein generated by RTM-mediated trans-splicing. RTMs of FIG. 3 that were tested include binding domains for ABCA4 intron23 (motifs 27 and 81) and intron22 (motifs 117 and 118). NB is a negative control Non-Binding motif. The data in FIG. 4A show a marked increase in ABCA4 protein when the hhRz terminator was replaced with the Comp14 derivative. In FIG. 4B the Comp14 derivative was compared to the wild-type MALAT1 triple helix terminator, revealing an even greater increase in trans-splicing activity with the latter, ranging from 5- to 10-fold depending on the binding domain. In FIG. 4C the predicted base-pairing of the wild-type MALAT1 triple helix terminator and the Comp14 derivative is shown. In their design of the Comp14 derivative, Wilusz et al. suggested it should have the same base-pairing characteristics between the A-rich and U-rich domains as the wild-type MALAT1 sequence, yet with truncated flanking stem-loop domains. However, this assumption ignores the possible role of the flanking stem-loops for proper base-pairing, and could explain the lower ENE activity of Comp14 compared to the wild-type MALAT1 triple helix terminator. The higher levels of trans-splicing activity seen with the wild-type MALAT1 sequence compared to the Comp14 derivative demonstrate an important characteristic of the triple helix terminator structure and ENE function.

FIG. 5A shows Western blot analysis of RTMs containing different triple helix terminators from lncRNAs. They include the wild-type sequence from MALAT1 and NEAT1 (MENb), as well as chimeric forms where the triple helix domain from MALAT1 was fused to the tRNA-like motif from NEAT1 (called menRNA) and one where the triple helix domain from NEAT1 was fused to the mascRNA motif from MALAT1. The data suggest trans-splicing activity is highest when an RTM contains the wild-type MALAT1 terminator.

FIG. 5B shows the predicted base-pairing for triple helix terminators from three different lncRNAs, including MALAT1, MENb (NEAT1), and PAN RNA (produced from the Kaposi’s sarcoma-associated herpesvirus, KSHV). The structural similarity across distinct lncRNAs suggests a common evolutionary strategy for protecting the 3’ end of the lncRNA following transcription termination. However, X-ray crystallography of the MALAT1 triple helix domain revealed it contains 10 major groove and 2 minor groove triples, the most of any known naturally occurring triple helical structure (Brown, J.A. et al. 2014). This intricate design likely confers a level of structural stability that is greater than either NEAT1 or PAN, and could explain why the MALAT1 terminator appears to better support trans-splicing, by way of protecting the RTM from degradation in the nucleus. Importantly, the blunt-ended triple helix of MALAT1 has been shown to inhibit rapid nuclear RNA decay as shown by in vivo decay assays (Brown, J.A. et al. 2014).

FIG. 6A shows the highly conserved mascRNA sequence of MALAT1 from several species and its predicted folded conformation. A single G-to-A point mutation, indicated by the red arrow, was inserted into the mascRNA sequence to test the importance of this domain for trans-splicing activity. As shown in the Western blot (FIG. 6B), the point mutation ablated trans-splicing activity of a validated RTM that targets ABCA4, possibly due to the inability of the mutated sequence to assume the correct conformation required for RNaseP recognition and cleavage.

The following additional numerated paragraphs further define some embodiments of the invention described herein.

1. A nucleic acid trans-splicing molecule comprising a 3’ transcription terminator domain (TTD), which comprises a triple helix.

2. The nucleic acid trans-splicing molecule of claim 1, wherein the triple helix comprises at least five consecutive A-U Hoogsteen base pairs.

3. The nucleic acid trans-splicing molecule of claim 1 or 2, wherein the triple helix comprises an A-rich tract of 5-30 nucleic acids.

4. The nucleic acid trans-splicing molecule of claim 3, wherein the A-rich tract is at the 3’ end of the TTD.

5. The nucleic acid trans-splicing molecule of any one of claims 1-4, wherein the triple helix comprises a strand of 10 consecutive nucleotides, wherein 9 of the 10 consecutive nucleotides are paired via Hoogsteen base pairing.

6. The nucleic acid trans-splicing molecule of any one of claims 1-5, wherein the TTD comprises a stem-loop motif.

7. The nucleic acid trans-splicing molecule of any one of claims 1-6, wherein the 3’ TTD comprises, operatively linked in a 5’-to-3’ direction, a 5’ U-rich motif, a stem-loop motif, a 3’ U-rich motif, and an A-rich tract.

8. The nucleic acid trans-splicing molecule of any one of claims 1-4, wherein the 3’ TTD is at least 95% homologous with SEQ ID NO: 13, SEQ ID NO: 15, SEQ ID NO: 17, or SEQ ID NO: 23.

9. The nucleic acid trans-splicing molecule of claim 8, wherein the 3’ TTD is at least 95% homologous with SEQ ID NO: 13, and wherein the triple helix comprises Hoogsteen base pairing of U7-U11 of SEQ ID NO: 13 with an A-rich tract.

10. The nucleic acid of claim 9, wherein the 3’ TTD is the PAN ENE+A.

11. The nucleic acid trans-splicing molecule of any one of claims 1-8, wherein the 3’ TTD is at least 95% homologous with SEQ ID NO: 15, and wherein the triple helix comprises Hoogsteen base pairing of U6-10, C11, and U12-15 of SEQ ID NO: 15 with an A-rich tract.

12. The nucleic acid of claim 11, wherein the 3’ TTD is the MALAT1 ENE+A.

13. The nucleic acid trans-splicing molecule of claim 8, wherein the 3’ TTD is at least 95% homologous with SEQ ID NO: 17, and wherein the triple helix comprises Hoogsteen base pairing of U6-10, C11, and U12-15 of SEQ ID NO: 17 with an A-rich tract.

14. The nucleic acid of claim 13, wherein the 3’ TTD is the MALAT1 core ENE+A.

15. The nucleic acid trans-splicing molecule of claim 8, wherein the 3’ TTD is at least 95% homologous with SEQ ID NO: 23, and wherein the triple helix comprises Hoogsteen base pairing of U8-10, C11, and U12-15 of SEQ ID NO: 23 with an A-rich tract.

16. The nucleic acid trans-splicing molecule of claim 15, wherein the 3’ TTD is the MENb ENE+A.

17. A nucleic acid trans-splicing molecule comprising, operatively linked in a 5’-to-3’ direction:

(a) a coding domain sequence (CDS) comprising one or more functional exon(s) of a selected gene;

(b) a linker domain sequence (LDS) of varying length that acts as a structural connection between the coding domain and the binding domain,

(c) a spliceosome recognition motif (5’ Splice Site) configured to initiate spliceosome-mediated trans-splicing;

(e) a 3’ transcription terminator domain (TTD) that increases the efficiency of trans-splicing,

wherein the nucleic acid trans-splicing molecule is configured to trans-splice the coding domain to an endogenous exon of the selected gene adjacent to the target intron, thereby replacing the endogenous defective or mutated exon with the functional exon and correcting a mutation in the selected gene.

18. The nucleic acid trans-splicing molecule of claim 17, wherein the binding domain hybridizes to the target intron of the selected gene 3’ to the mutation and the coding domain comprises one or more exon(s) 5’ to the target intron.

19. A nucleic acid trans-splicing molecule comprising, operatively linked in a 5’-to-3’ direction:

(b) a linker sequence of varying length and composition that acts as a structural connection between the binding domain and the coding region;

(c) a 3’ spliceosome recognition motif (3’ Splice Site) configured to mediate trans-splicing;

(d) a coding domain sequence (CDS) comprising one or more functional exon(s) of the selected gene; and

20. The nucleic acid trans-splicing molecule of claim 19, wherein the binding domain binds to the target intron of the selected gene 3’ to the mutation and the coding domain comprises one or more exon 5’ to the target intron.

21. The nucleic acid trans-splicing molecule of any of claims 17 to 20, wherein the 3’ transcription terminator domain forms a triple helical structure that effectively caps the 3’ end.

22. The nucleic acid trans-splicing molecule of any preceding claim, wherein the 3’ transcription terminator domain is a sequence from one or more long non-coding RNAs (lncRNA) or other nuclear RNA molecules that contain a 3’ transcription terminator that condenses into a triple helix blunt-ended structure.

23. The nucleic acid trans-splicing molecule of any one of claims 17-22, wherein the 3’ transcription terminator domain is from the human long non-coding RNA MALAT1.

24. The nucleic acid trans-splicing molecule of claim 23, wherein the 3’ transcription terminator domain comprises nucleotides 8287-8437 of human MALAT1.

25. The nucleic acid trans-splicing molecule of claim 23, wherein the 3’ transcription terminator domain comprises, in order from 5’ to 3’, a triplex forming sequence that comprises nucleotides 8287-8379, an RNaseP cleavage site that comprises nucleotides 8379-8380, and a tRNA-like sequence that comprises nucleotides 8380-8437.

26. The nucleic acid trans-splicing molecule of claim 23, wherein the 3’ transcription terminator domain contains a triplex forming sequence comprised of a U-rich motif 1 (8292-8301), a conserved stem-loop (8302-8333), a U-rich motif 2 (8334-8343), and an A-rich tract (8369-8379), wherein the A-rich tract and the U-rich motif 2 form a Watson-Crick stem duplex, and the U-rich motif 1 aligns with the A-rich tract to form Hoogsteen base pairs.

27. The nucleic acid trans-splicing molecule of claim 23, wherein the 3’ transcription terminator domain is a truncated version of the human MALAT1 triple helix.

28. The nucleic acid trans-splicing molecule of claim 27, wherein the 3’ transcription terminator domain contains a triplex forming sequence comprised of a U-rich motif 1 (8292-8301), a conserved stem-loop (8302-8310 and 8325-8333), a U-rich motif 2 (8334-8343), an A-rich tract (8369-8379), and a deletion spanning nucleotide 8345-8364 of the intervening sequence between U-rich motif 2 and the A-rich tract, wherein the A-rich tract and the U-rich motif 2 form a Watson-Crick stem duplex, and the U-rich motif 1 aligns with the A-rich tract to form Hoogsteen base pairs.

29. The nucleic acid trans-splicing molecule of claim 27, wherein the 3’ transcription terminator domain comprises, in order from 5’ to 3’, a triplex forming sequence of varying length and composition, an RNaseP cleavage site, and a tRNA-like sequence of varying length and composition.

30. The nucleic acid trans-splicing molecule of claim 27, wherein the 3’ transcription terminator domain contains a triplex forming sequence that conforms to one of three known basic “motifs”, which are referred to by the base composition of the third strand of the triple helix: pyrimidine motif (T,C), purine motif (G,A), and purine-pyrimidine motif (G,T).

31. The nucleic acid trans-splicing molecule of claim 22, wherein the 3’ transcription terminator domain comprises a triple helix domain and a tRNA-like domain.

32. The nucleic acid trans-splicing molecule of claim 31, wherein the triple helix domain and the tRNA-like domain originate from the same long non-coding RNA or different combinations of long non-coding RNA domains derived from human or any other species.

33. The nucleic acid trans-splicing molecule of claim 31, wherein the triple helix domain and the tRNA-like domain are from MALAT1 or NEAT1/MENb.

34. The nucleic acid trans-splicing molecule according to any preceding claim 17, wherein the targeted mammalian gene is ABCA4, CEP290, or MYO7A.

35. The nucleic acid trans-splicing molecule according to any preceding claim, wherein the gene is ABCA4 and the defect or mutation is in any of Exons 1-23.

36. The nucleic acid trans-splicing molecule according to any preceding claim, further comprising one or more linker sequences.

37. The nucleic acid trans-splicing molecule according to claim 26, comprising a linker between the splicing domain and binding domain.

38. The nucleic acid trans-splicing molecule according to claim 36 or 37, comprising a linker between the binding domain and 3’ terminal domain.

39. A recombinant adeno-associated virus (rAAV) comprising the nucleic acid molecule of any one of claims 1-38.

40. The rAAV of claim 39, wherein the AAV preferentially targets a photoreceptor cell.

41. The rAAV of claim 39 or 40, wherein the AAV comprises an AAV5 capsid protein, an AAV8 capsid protein, an AAV8(b) capsid protein, or an AAV9 capsid protein.

42. A method of treating a disease caused by a defect or mutation in a target gene comprising: administering to the cells of a subject having the disease a composition comprising a recombinant AAV comprising a nucleic acid trans-splicing molecule of any of claims 1 to 38.

43. A method of treating an ocular disease caused by a defect or mutation in a target gene comprising: administering to the ocular cells of a subject having an ocular disease a composition comprising a recombinant AAV comprising a nucleic acid trans-splicing molecule of any of claims 1 to 38.

44. The method according to claim 43, wherein the disease is Stargardt Disease, Leber Congenital Amaurosis (LCA), cone rod dystrophy, fundus flavimaculatus, retinitis pigmentosa, age-related macular degeneration, or Usher Syndrome.

45. The method according to claim 43 or 44, wherein the composition is administered by subretinal injection.

46. The method according to claim 43, wherein the disease is Stargardt’s Disease, the cells are photoreceptor cells, the ocular gene is ABCA4 and the corrected exon sequence is Exons 1-19, Exons 1-22, Exons 1-23 or Exons 1-24.

47. A pharmaceutical preparation, comprising a physiologically acceptable carrier and the rAAV of any of claims 39-41.

All publications cited in this specification are incorporated herein by reference in their entireties. In addition, US Provisional Patent Application No. 62/835,164, filed April 17, 2019, is incorporated herein by reference in its entirety. Similarly, the SEQ ID NOs which are referenced herein and which appear in the appended Sequence Listing are incorporated by reference. While the invention has been described with reference to particular embodiments, it will be appreciated that modifications can be made without departing from the spirit of the invention. Such modifications are intended to fall within the scope of the appended claims.
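By way of illustration only, the human MALAT1 nucleotide coordinates recited in the numbered paragraphs above can be tabulated and checked for internal consistency. The minimal Python sketch below records each range as an inclusive (start, end) pair, computes its length, and converts it to a 1-based position within the 8287-8437 terminator domain. All identifiers (`MALAT1_TTD`, `DOMAINS`, `TRIPLEX_MOTIFS`, `span_length`, `contains`, `to_local`) are hypothetical names introduced for this illustration, and treating the ranges as inclusive is an assumption.

```python
# Inclusive nucleotide ranges of human MALAT1, copied from the numbered paragraphs.
MALAT1_TTD = (8287, 8437)

DOMAINS = {
    "triplex_forming_sequence": (8287, 8379),
    "rnase_p_cleavage_site": (8379, 8380),
    "trna_like_sequence": (8380, 8437),
}

TRIPLEX_MOTIFS = {
    "u_rich_motif_1": (8292, 8301),
    "conserved_stem_loop": (8302, 8333),
    "u_rich_motif_2": (8334, 8343),
    "a_rich_tract": (8369, 8379),
}


def span_length(span):
    """Length in nucleotides of an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


def contains(outer, inner):
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def to_local(span, origin=MALAT1_TTD[0]):
    """Convert MALAT1 coordinates to 1-based positions within the terminator domain."""
    return (span[0] - origin + 1, span[1] - origin + 1)


if __name__ == "__main__":
    print("TTD length:", span_length(MALAT1_TTD), "nt")
    for name, span in {**DOMAINS, **TRIPLEX_MOTIFS}.items():
        print(name, span, "length", span_length(span), "local", to_local(span))
```

Under these assumptions, each recited motif falls inside the triplex forming sequence, and the A-rich tract ends at the same nucleotide (8379) at which the RNaseP cleavage site begins.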

Claims

CLAIMS:

1. A nucleic acid trans-splicing molecule comprising, operatively linked in a 5’-to-3’ direction:

(b) a linker domain sequence (LDS) of varying length and sequence that acts as a structural connection between the coding domain and the binding domain, and may contain motifs that function as splicing enhancers, or have the capacity to fold into complex secondary structures that act to minimize the translation of the coding region before the trans-splicing event occurs.

(c) a spliceosome recognition motif (5’ Splice Site, Splice Donor, SD) configured to initiate spliceosome-mediated trans-splicing;

(e) a 3’ transcription terminator domain (TTD),

2. The nucleic acid trans-splicing molecule of claim 1, wherein the binding domain hybridizes to the target intron of the selected gene 3’ to the mutation and the coding domain comprises one or more exon(s) 5’ to the target intron.

3. A nucleic acid trans-splicing molecule comprising, operatively linked in a 5’-to-3’ direction:

(a) a binding domain (BD) configured to bind a target intron of a selected gene, wherein said gene has at least one defect or mutation in an exon 3’ to the targeted intron;

(b) a linker sequence of varying length and composition that acts as a structural connection between the binding domain and the coding region, and contains motifs that function as splicing enhancers or fold into complex secondary structures that impede translation of the coding region as a competitive event for trans-splicing;

(c) a 3’ spliceosome recognition motif (3’ Splice Site, Splice Acceptor, SA) configured to mediate trans-splicing;

(d) a coding domain sequence (CDS) comprising one or more functional exon(s) of the selected gene; and

(e) a 3’ transcription terminator domain (TTD),

4. The nucleic acid trans-splicing molecule of claim 3, wherein the binding domain binds to the target intron of the selected gene 3’ to the mutation and the coding domain comprises one or more exon(s) 5’ to the target intron.

5. The nucleic acid trans-splicing molecule of any of claims 1 to 4, wherein the 3’ transcription terminator domain forms a triple helical structure that effectively caps the 3’ end.

6. The nucleic acid trans-splicing molecule of any preceding claim, wherein the 3’ transcription terminator domain is a sequence from one or more long non-coding RNAs (lncRNA) or other nuclear RNA molecules that contain a 3’ transcription terminator that condenses into a blunt-ended triple helix 3’ end cap structure.

7. The nucleic acid trans-splicing molecule of any one of claims 1 to 6, wherein the 3’ transcription terminator domain is from the human long non-coding RNA MALAT1.

8. The nucleic acid trans-splicing molecule of claim 7, wherein the 3’ transcription terminator domain comprises nucleotides 8287-8437 of human MALAT1.

9. The nucleic acid trans-splicing molecule of claim 7, wherein the 3’ transcription terminator domain comprises, in order from 5’ to 3’, a triplex forming sequence that comprises nucleotides 8287-8379, an RNaseP cleavage site that comprises nucleotides 8379-8380, and a tRNA-like sequence that comprises nucleotides 8380-8437.

10. The nucleic acid trans-splicing molecule of claim 7, wherein the 3’ transcription terminator domain contains a triplex forming sequence comprised of a U-rich motif 1 (8292-8301), a conserved stem-loop (8302-8333), a U-rich motif 2 (8334-8343), and an A-rich tract (8369-8379), wherein the A-rich tract and the U-rich motif 2 form a Watson-Crick stem duplex, and the U-rich motif 1 aligns with the A-rich tract to form Hoogsteen base pairs.

11. The nucleic acid trans-splicing molecule of claim 7, wherein the 3’ transcription terminator domain is a truncated version of the human MALAT1 triple helix.

12. The nucleic acid trans-splicing molecule of claim 11, wherein the 3’ transcription terminator domain contains a triplex forming sequence comprised of a U-rich motif 1 (8292-8301), a conserved stem-loop (8302-8310 and 8325-8333), a U-rich motif 2 (8334-8343), an A-rich tract (8369-8379), and a deletion spanning nucleotides 8345-8364 of the intervening sequence between U-rich motif 2 and the A-rich tract, wherein the A-rich tract and the U-rich motif 2 form a Watson-Crick stem duplex, and the U-rich motif 1 aligns with the A-rich tract to form Hoogsteen base pairs.
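As an illustrative aid only, and not part of the claims, the nucleotide positions recited in claims 8 to 10 and 12 can be laid out as a small coordinate map. The following is a minimal Python sketch; the `Region` class and all variable names are hypothetical, and it assumes the recited positions are 1-based and inclusive at both ends, which the claims do not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A span of nucleotide positions, assumed inclusive at both ends."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive interval: positions start..end both count.
        return self.end - self.start + 1

    def contains(self, other: "Region") -> bool:
        return self.start <= other.start and other.end <= self.end

    def overlaps(self, other: "Region") -> bool:
        return self.start <= other.end and other.start <= self.end


# Positions copied from claims 8-10 and 12 (human MALAT1 numbering as recited).
TTD = Region("3' transcription terminator domain", 8287, 8437)
TRIPLEX = Region("triplex forming sequence", 8287, 8379)
TRNA_LIKE = Region("tRNA-like sequence", 8380, 8437)
MOTIFS = [
    Region("U-rich motif 1", 8292, 8301),
    Region("conserved stem-loop", 8302, 8333),
    Region("U-rich motif 2", 8334, 8343),
    Region("A-rich tract", 8369, 8379),
]
# Deletion recited in claim 12 (truncated triple helix).
DELETION = Region("claim 12 deletion", 8345, 8364)

if __name__ == "__main__":
    for region in [TTD, TRIPLEX, TRNA_LIKE, *MOTIFS, DELETION]:
        print(f"{region.name}: {region.start}-{region.end} ({region.length} nt)")
    # The deletion should leave every named triplex motif intact.
    print(all(not DELETION.overlaps(m) for m in MOTIFS))
```

Under the inclusive-interval assumption, the domain of claim 8 spans 151 positions and the deletion recited in claim 12 removes 20 of them without touching U-rich motif 1, the stem-loop, U-rich motif 2, or the A-rich tract.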

13. The nucleic acid trans-splicing molecule of claim 11, wherein the 3’ transcription terminator domain comprises, in order from 5’ to 3’, a triplex forming sequence of varying length and composition, an RNaseP cleavage site, and a tRNA-like sequence of varying length and composition.

14. The nucleic acid trans-splicing molecule of claim 11, wherein the 3’ transcription terminator domain contains a triplex forming sequence that conforms to one of three known basic “motifs”, which are referred to by the base composition of the third strand of the triple helix: pyrimidine motif (T,C), purine motif (G,A), and purine-pyrimidine motif (G,T).
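For illustration only, the three third-strand motifs named in claim 14 can be expressed as a base-composition check. The sketch below is a hypothetical helper, not a method described in this specification: it assumes that a third strand is compatible with a motif when every base in it belongs to that motif's two-letter set, and that U in an RNA sequence is treated as T. A strand made of a single base type can therefore be compatible with two motifs.

```python
# Base sets taken from claim 14; the motif names are those recited there.
MOTIF_BASES = {
    "pyrimidine motif": {"T", "C"},
    "purine motif": {"G", "A"},
    "purine-pyrimidine motif": {"G", "T"},
}


def compatible_motifs(third_strand: str) -> list[str]:
    """Return the claim 14 motifs whose base set covers the given strand.

    Assumption: U (RNA) is mapped to T before comparison.
    """
    bases = set(third_strand.upper().replace("U", "T"))
    if not bases or not bases <= set("ACGT"):
        raise ValueError("expected a non-empty sequence of A, C, G, T or U")
    return [name for name, allowed in MOTIF_BASES.items() if bases <= allowed]


if __name__ == "__main__":
    for example in ["UUUUUCUUUU", "GGAGGA", "GTGTTG", "ACGT"]:
        print(example, compatible_motifs(example))
```

A strand containing bases from more than one motif set, such as one with all four bases, returns an empty list under this check.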

15. The nucleic acid trans-splicing molecule of claim 6, wherein the 3’ transcription terminator domain comprises a triple helix domain and a tRNA-like domain.

16. The nucleic acid trans-splicing molecule of claim 15, wherein the triple helix domain and the tRNA-like domain originate from the same long non-coding RNA or different combinations of long non-coding RNA domains derived from human or any other species.

17. The nucleic acid trans-splicing molecule of claim 15, wherein the triple helix domain and the tRNA-like domain are from MALAT1 or NEAT1/MENβ.

18. The nucleic acid trans-splicing molecule according to any preceding claim, wherein the targeted mammalian gene is ABCA4, CEP290, or MYO7A.

19. The nucleic acid trans-splicing molecule according to any preceding claim, wherein the gene is ABCA4 and the defect or mutation is in any of Exons 1-23.

20. The nucleic acid trans-splicing molecule according to any preceding claim, further comprising one or more linker sequences.

21. The nucleic acid trans-splicing molecule according to claim 20, comprising a linker between the splicing domain and binding domain.

22. The nucleic acid trans-splicing molecule according to claim 20 or 21, comprising a linker between the binding domain and 3’ terminal domain.

23. A recombinant adeno-associated virus (rAAV) comprising the nucleic acid molecule of any one of claims 1-22.

24. The rAAV of claim 23, wherein the AAV preferentially targets a photoreceptor cell.

25. The rAAV of claim 23 or 24, wherein the AAV comprises an AAV5 capsid protein, an AAV8 capsid protein, an AAV8(b) capsid protein, or an AAV9 capsid protein.

26. A method of treating a disease caused by a defect or mutation in a target gene comprising: administering to the cells of a subject having the disease a composition comprising a recombinant AAV comprising a nucleic acid trans-splicing molecule of any of claims 1 to 22.

27. A method of treating an ocular disease caused by a defect or mutation in a target gene comprising: administering to the ocular cells of a subject having an ocular disease a composition comprising a recombinant AAV comprising a nucleic acid trans-splicing molecule of any of claims 1 to 22.

28. The method according to claim 27, wherein the disease is Stargardt Disease, Leber Congenital Amaurosis (LCA), cone rod dystrophy, fundus flavimaculatus, retinitis pigmentosa, age-related macular degeneration, or Usher Syndrome.

29. The method according to claim 27 or 28, wherein the composition is administered by subretinal injection.

30. The method according to claim 27, wherein the disease is Stargardt’s Disease, the cells are photoreceptor cells, the ocular gene is ABCA4 and the corrected exon sequence is Exons 1-19, Exons 1-22, Exons 1-23 or Exons 1-24.

31. A pharmaceutical preparation, comprising a physiologically acceptable carrier and the rAAV of any of claims 23-25.