WO2023062359A2 - Novel viral regulatory elements - Google Patents

Novel viral regulatory elements

Info

Publication number
WO2023062359A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
seq
nucleotide sequence
sequences
expression cassette
Prior art date
Application number
PCT/GB2022/052577
Other languages
French (fr)
Other versions
WO2023062359A3 (en)
Inventor
Daniel Farley
Jordan Wright
Cristina NOGUEIRA
Original Assignee
Oxford Biomedica (Uk) Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oxford Biomedica (Uk) Limited
Publication of WO2023062359A2
Publication of WO2023062359A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/86 Viral vectors
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2710/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA dsDNA viruses
    • C12N2710/00011 Details
    • C12N2710/16011 Herpesviridae
    • C12N2710/16111 Cytomegalovirus, e.g. human herpesvirus 5
    • C12N2710/16122 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2730/00 Reverse transcribing DNA viruses
    • C12N2730/00011 Details
    • C12N2730/10011 Hepadnaviridae
    • C12N2730/10111 Orthohepadnavirus, e.g. hepatitis B virus
    • C12N2730/10122 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2740/00 Reverse transcribing RNA viruses
    • C12N2740/00011 Details
    • C12N2740/10011 Retroviridae
    • C12N2740/16011 Human Immunodeficiency Virus, HIV
    • C12N2740/16041 Use of virus, viral particle or viral elements as a vector
    • C12N2740/16043 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2750/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssDNA viruses
    • C12N2750/00011 Details
    • C12N2750/14011 Parvoviridae
    • C12N2750/14111 Dependovirus, e.g. adenoassociated viruses
    • C12N2750/14141 Use of virus, viral particle or viral elements as a vector
    • C12N2750/14143 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2830/00 Vector systems having a special element relevant for transcription
    • C12N2830/30 Vector systems having a special element relevant for transcription being an enhancer not forming part of the promoter region
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2830/00 Vector systems having a special element relevant for transcription
    • C12N2830/48 Vector systems having a special element relevant for transcription regulating transport or export of RNA, e.g. RRE, PRE, WPRE, CTE
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2830/00 Vector systems having a special element relevant for transcription
    • C12N2830/60 Vector systems having a special element relevant for transcription from viruses

Definitions

  • the present invention relates to novel nucleotide sequences, and to viral vectors or cells comprising such nucleotide sequences.
  • the invention also relates to viral vector production systems, and methods for producing viral vectors using the nucleotide sequences, viral vectors, or cells described herein. Methods for identifying sequences that improve transgene expression in a target cell are also provided herein.
  • the use of lentiviral and AAV vectors as part of licensed medicinal products is now a reality, and is anticipated to increase over the coming decades.
  • RRE rev-response element
  • PREs post-transcriptional regulatory elements
  • Several aspects of lentiviral vectorology are likely to contribute to the limit: [1] the steady-state pool of vector genomic RNA (vRNA) in the production cell, [2] the efficiency of conversion of vRNA to dsDNA by reverse transcriptase, and [3] the efficiency of nuclear import and/or integration into host DNA.
  • vRNA vector genomic RNA
  • the desire to minimize lentiviral vector backbone sequences has recently been highlighted in the literature, leading to attempts to alter the arrangement of existing cis-elements (Sertkaya et al., 2021; Vink et al., 2017) as well as to generate novel genome configurations that minimize the RRE and the packaging signal (WO/2021/181108).
  • the known packaging limit for AAV vectors is ~5kb, and for 'self-complementary' scAAV vectors this is halved, at ~2.5kb.
  • PREs such as the wPRE are often utilized within AAV vectors to increase transgene expression in target cells. Due to the premium for space within these smaller viral vectors, there is pressure to minimize all sequences such as the promoter, introns, UTRs, polyA sequence and even the protein product - all whilst retaining biological output of the delivered cassette. The necessity of the presence of these elements (to achieve high gene expression) has greatly restricted the utility of AAV vectors to deliver more complicated, larger payloads.
  • transgene cassettes will become more complex, requiring the delivery of more functions, for example in delivering more genes, transgene control or suicide switches. Improved viral vectors for larger payloads are urgently needed.
  • the inventors have generated viral vectors with novel short cis-acting sequences in the 3' UTR of a transgene expression cassette. They have identified two novel short cis-acting sequences that can be introduced into the 3' UTR of a transgene expression cassette, either alone, or in combination. These novel short nucleotide sequences (and combinations thereof) can either be used in addition to traditional post-transcriptional regulatory elements (PREs e.g. from woodchuck hepatitis virus; wPRE) to boost transgene expression in target cells or to replace these longer PREs entirely, enabling increased transgene capacity whilst maintaining high levels of transgene expression in target cells.
  • CAR sequences previously identified to function within 5'UTR sequences of heterologous mRNA provide enhanced gene expression when incorporated into the 3'UTR of a viral vector transgene expression cassette.
  • the initially reported 160bp CAR sequence (composed of 16x repeats of a 10bp core sequence) could be further minimized to fewer than 16 repeats without loss of the benefit to transgene expression.
  • these CAR sequences are shown to enhance the transgene expression from transgene cassettes utilizing introns, as well as boosting expression from cassettes already containing a full length wPRE.
  • the inventors have also identified a minimal ZCCHC14 protein-binding sequence that can be incorporated into the 3' UTR of a transgene expression cassette to improve transgene expression.
  • the inventors have shown that these minimal ZCCHC14 protein-binding sequences can be combined with the CAR sequences described herein to further enhance transgene expression.
  • novel cis-acting sequences described herein can be used to minimize the size of functional cis-acting sequences of all viral vectors such that payloads can be increased and/or titres of vectors containing larger payloads can be improved, whilst maintaining transgene expression levels in target cells.
  • the invention may therefore be employed [1] within viral vector genomes where 'cargo' space is not limiting, such that the novel cis-acting sequences further enhance expression of a transgene cassette containing another 3' UTR element, such as the wPRE, or [2] within viral vector genomes where cargo space is limiting (i.e.
  • novel cis-acting sequences may be used instead of a larger 3'UTR element, such as the wPRE, thus reducing vector genome size, whilst also imparting an increase to transgene expression in target cells compared to a vector genome lacking any 3'UTR cis-acting element.
  • the invention provides a nucleotide sequence comprising a transgene expression cassette, wherein the 3' UTR of the transgene expression cassette comprises at least one cis-acting sequence selected from: a) a cis-acting Cytoplasmic Accumulation Region (CAR) sequence, comprising at least one CAR element (CARe) sequence; and/or b) a cis-acting ZCCHC14 protein-binding sequence, comprising at least one CNGGN-type pentaloop sequence, wherein the cis-acting ZCCHC14 protein-binding sequence does not comprise a full length post-transcriptional regulatory element (PRE) α element and does not comprise a full length PRE γ element.
  • CAR Cytoplasmic Accumulation Region
  • CARe CAR element
  • ZCCHC14 protein-binding sequence comprising at least one CNGGN-type pentaloop sequence
  • PRE post-transcriptional regulatory element
  • the CAR sequence may comprise a plurality of CARe sequences.
  • the plurality of CARe sequences may be in tandem.
  • the CAR sequence may comprise at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty CARe sequences, optionally wherein the CARe sequences are in tandem.
  • the CAR sequence may comprise at least six CARe sequences in tandem, or at least ten CARe sequences in tandem.
  • the CAR sequence may comprise at least eight CARe sequences in tandem, at least twelve CARe sequences in tandem, or at least sixteen CARe sequences in tandem.
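As an illustration of what "CARe sequences in tandem" means at the sequence level, the following is a minimal sketch that concatenates copies of a 10bp tile. It assumes the tile CCAGTTCCTG (SEQ ID NO:29, listed among the CARe sequences in this document) and assumes that the "antisense" control described in the figure legends corresponds to the reverse complement; the helper names are hypothetical and are not taken from the patent.

```python
# Sketch only: helper names are hypothetical, not part of the patent.
CONSENSUS_TILE = "CCAGTTCCTG"  # SEQ ID NO:29 as listed in this document

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tandem_car(tile: str, copies: int) -> str:
    """Return `copies` head-to-tail repeats of a CARe tile."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return tile.upper() * copies


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (assumed model of the 'inv' control)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# 16 tandem tiles of 10bp give a 160bp CAR sequence.
car_16t = tandem_car(CONSENSUS_TILE, 16)
car_inv16t = reverse_complement(car_16t)
```

The length of the resulting CAR sequence scales linearly with tile number, which is why reducing the number of tiles directly frees space in the vector genome.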
  • the CARe nucleotide sequence may be BMWGHWSSWS (SEQ ID NO:24) or BMWRHWSSWS (SEQ ID NO:55).
  • the CARe nucleotide sequence may be CMAGHWSSTG (SEQ ID NO:28).
  • the CARe nucleotide sequence may be selected from the group consisting of: CCAGTTCCTG (SEQ ID NO:29), CCAGATCCTG (SEQ ID NO:30), CCAGTTCCTC (SEQ ID NO:31), TCAGATCCTG (SEQ ID NO:32), CCAGATGGTG (SEQ ID NO:33), CCAGTTCCAG (SEQ ID NO:34), CCAGCAGCTG (SEQ ID NO:35), CAAGCTCCTG (SEQ ID NO:36), CAAGATCCTG (SEQ ID NO:37), CCTGAACCTG (SEQ ID NO:38), CAAGAACGTG (SEQ ID NO:39), TCAGTTCCTG (SEQ ID NO:56), GCAGTTCCTG (SEQ ID NO:57), CAAGTTCCTG (SEQ ID NO:58), CCTGTTCCTG (SEQ ID NO:59), CCTGCTCCTG (SEQ ID NO:60), CCTGTACCTG (SEQ ID NO: 29
  • the CARe nucleotide sequence may be selected from the group consisting of: CCAGTTCCTG (SEQ ID NO:29), CCTGTTCCTG (SEQ ID NO: 59), CCTGTACCTG (SEQ ID NO:61), CCTGTTCCAG (SEQ ID NO: 64), CCAATTCCTG (SEQ ID NO: 66), CCTGAACCTG (SEQ ID NO:38), CCAGTTCCTC (SEQ ID NO: 31) and CCAGTTCCAG (SEQ ID NO:34).
  • the CARe nucleotide sequence may be CCAGTTCCTG (SEQ ID NO:29). This sequence is also referred to as a "consensus" tile herein. It is the CARe sequence that is used to exemplify the invention in Examples 1 to 4 below.
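The degenerate CARe definitions above (BMWGHWSSWS, BMWRHWSSWS, CMAGHWSSTG) use standard IUPAC nucleotide ambiguity codes. A minimal sketch of how a candidate 10-mer could be checked against such a consensus is given below; the function names are hypothetical, and the approach (expanding each code into a regular-expression character class) is one simple option rather than a method described in the patent.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Expand an IUPAC consensus into a compiled regular expression."""
    return re.compile("".join(f"[{IUPAC[code]}]" for code in consensus.upper()))


def matches_consensus(seq: str, consensus: str) -> bool:
    """True if `seq` matches the whole consensus, position by position."""
    return consensus_to_regex(consensus).fullmatch(seq.upper()) is not None


# The consensus tile (SEQ ID NO:29) satisfies all three degenerate definitions.
tile_ok = all(
    matches_consensus("CCAGTTCCTG", c)
    for c in ("BMWGHWSSWS", "BMWRHWSSWS", "CMAGHWSSTG")
)
```

Note that CCAATTCCTG (SEQ ID NO:66) has an A at the fourth position, so it fits BMWRHWSSWS (R = A or G) but not BMWGHWSSWS.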
  • the 3' UTR of the transgene expression cassette may not comprise additional post-transcriptional regulatory elements (PREs).
  • the 3' UTR of the transgene expression cassette may comprise at least one additional post-transcriptional regulatory element (PRE).
  • the additional PRE may be a Woodchuck hepatitis virus PRE (wPRE).
  • WPRE Woodchuck hepatitis virus PRE
  • the cis-acting ZCCHC14 protein-binding sequence may be a PRE α element fragment.
  • the cis-acting ZCCHC14 protein-binding sequence may be: a) a fragment of a HBV PRE α element; or b) a fragment of HCMV RNA 2.7; or c) a fragment of a wPRE α element.
  • the PRE α element fragment may be no more than 200 nucleotides in length.
  • the PRE α element fragment may be no more than 90 nucleotides in length.
  • the CNGGN-type pentaloop sequence may be comprised within a stem-loop structure having a sequence selected from the group consisting of:
  • the CNGGN-type pentaloop sequence may be comprised within a heterologous stem-loop structure.
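To make the "CNGGN-type pentaloop" motif concrete, here is a minimal sketch that lists every position in a sequence where the five bases C-N-G-G-N occur (N being any nucleotide, with both T and U accepted). This is a string-level scan only: it does not check that the motif sits in the loop of a stem-loop structure, which would need secondary-structure prediction. The function name is hypothetical.

```python
import re

# Lookahead so that overlapping candidates are all reported.
# N is taken as any of A, C, G, T or U.
_CNGGN = re.compile(r"(?=(C[ACGTU]GG[ACGTU]))")


def find_cnggn(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every CNGGN occurrence in `seq`."""
    return [(m.start(), m.group(1)) for m in _CNGGN.finditer(seq.upper())]
```

A hit from this scan is only a candidate; whether it forms a ZCCHC14-binding pentaloop depends on the surrounding stem.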
  • the ZCCHC14 protein-binding sequence may comprise a sequence selected from the group consisting of:
  • the 3' UTR of the transgene expression cassette may comprise a cis-acting CAR sequence and a cis-acting ZCCHC14 protein-binding sequence, optionally wherein the ZCCHC14 protein-binding sequence is located 3' to the CAR sequence.
  • the 3' UTR of the transgene expression cassette may comprise at least two spatially distinct cis-acting CAR sequences and/or at least two spatially distinct cis-acting ZCCHC14 protein-binding sequences.
  • the cis-acting sequences in the 3' UTR of a transgene expression cassette may comprise the sequence of SEQ ID NO: 69, SEQ ID NO:70 or SEQ ID NO:71.
  • the 3' UTR of the transgene expression cassette may further comprise a polyA sequence located 3' to the cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequence.
  • the transgene expression cassette may further comprise a promoter operably linked to the transgene.
  • the promoter may lack its native intron, optionally wherein the promoter is selected from the group consisting of: an EFS promoter, a PGK promoter, and a UBCs promoter.
  • the promoter may comprise an intron, optionally wherein the promoter is selected from the group consisting of: an EF1a promoter and a UBC promoter.
  • the transgene expression cassette may be a viral vector transgene expression cassette.
  • the viral vector transgene expression cassette may be selected from the group consisting of: a retroviral vector transgene expression cassette, an adenoviral vector transgene expression cassette, an adeno-associated viral vector transgene expression cassette, a herpes simplex viral vector transgene expression cassette, and a vaccinia viral vector transgene expression cassette.
  • the viral vector transgene expression cassette may be a retroviral vector transgene expression cassette or an adeno-associated viral vector transgene expression cassette.
  • the retroviral vector transgene expression cassette may be a lentiviral vector transgene expression cassette.
  • a nucleotide sequence is also provided comprising a viral vector genome expression cassette, wherein the viral vector genome expression cassette comprises the nucleotide sequence of the invention.
  • the viral vector genome expression cassette may be selected from the group consisting of: a retroviral vector genome expression cassette, an adenoviral vector genome expression cassette, an adeno-associated viral vector genome expression cassette, a herpes simplex viral vector genome expression cassette, and a vaccinia viral vector genome expression cassette.
  • the viral vector genome expression cassette may be a retroviral vector genome expression cassette or an adeno-associated viral vector transgene expression cassette.
  • the 3' UTR of the retroviral vector genome expression cassette may further comprise a 3' polypurine tract (3'ppt) that is located 5' to a DNA attachment (att) site, wherein, when the transgene expression cassette is in the forward orientation with respect to the genome expression cassette, the cis-acting sequence(s) are located 5' to the 3'ppt and/or 3' to the att site.
  • 3'ppt 3' polypurine tract
  • the retroviral vector genome expression cassette may be a lentiviral vector genome expression cassette.
  • the major splice donor site in the lentiviral vector genome expression cassette may be inactivated, optionally wherein the cryptic splice donor site 3' to the major splice donor site is also inactivated.
  • the inactivated major splice donor site may have the sequence of GGGGAAGGCAACAGATAAATATGCCTTAAAAT (SEQ ID NO:4).
  • the nucleotide sequence may further comprise a nucleotide sequence encoding a modified U1 snRNA, wherein the modified U1 snRNA has been modified to bind to a nucleotide sequence within the packaging region of the lentiviral vector genome.
  • the viral vector genome expression cassette may be operably linked to the nucleotide sequence encoding the modified U1 snRNA.
  • a viral vector is also provided comprising a viral vector genome encoded by the nucleotide sequence of the invention.
  • a viral vector production system comprising a nucleotide sequence according to the invention and one or more additional nucleotide sequence(s) encoding viral vector components is also provided.
  • the one or more additional nucleotide sequence(s) may encode gag-pol and env, and optionally rev.
  • a cell is also provided comprising a nucleotide sequence according to the invention, a viral vector according to the invention, or a viral vector production system according to the invention.
  • a method for producing a viral vector comprising the steps of:
  • a viral vector produced by the method of the invention is also provided.
  • nucleotide sequence of the invention is also provided.
  • a method for identifying one or more cis-acting sequence(s) that improve transgene expression in a target cell comprising the steps of:
  • step (c) may comprise performing RT PCR and optionally sequencing the transgene mRNA.
  • the method may be performed using a plurality of viral vectors with:
  • FIG. 1 provides an overview of the positional use of CARe cis-acting elements for use alone or in combination with ZCCHC14 stem loop(s) and/or a PRE within lentiviral vector genomes.
  • the schematic shows the generalized structure of a lentiviral vector genome containing the RRE or deleted for RRE (containing an undisclosed feature) and internal transgene expression cassette encoding a gene of interest (GOI); such genomes typically utilize a PRE (such as wPRE) within the transgene 3' UTR.
  • the PRE may optionally be entirely replaced with minimal CARe sequences alone or in combination with ZCCHC14 stem-loops (ZC'14 SL) upstream or downstream of the 3'ppt in order to reduce the size of the transgene cassette.
  • ZC'14 SL ZCCHC14 stem-loops
  • the same cis-acting element options can be employed in the transgene 3'UTR, except there is no 3'ppt to consider.
  • Figure 2 provides a detailed view of the CARe and ZCCHC14 stem loop sequences and their incorporation into the 3'UTR region of transgene cassettes within viral vectors.
  • A. The consensus sequence for the 10bp CARe core sequence (or 'tile', as referred to herein).
  • B. Two non-limiting examples of ZCCHC14-binding stem loops found within HCMV (RNA2.7) and WHV (wPRE). ZCCHC14 recruitment leads to formation of a complex with Tent4, which promotes mixed tailing in polyA tails of mRNAs, stabilizing them.
  • C. Incorporation of CARe sequences into the 3' UTR of a viral vector transgene cassette (DNA at top, RNA shown as curvy line below DNA), optionally together with ZCCHC14 stem loops, taking care in retro/lentiviral vectors not to disrupt the 3'ppt or att ('Δ' [i.e. ΔU3]) sequences required for reverse transcription and integration respectively.
  • CARe sequences and optionally ZCCHC14 stem loops can be designed rationally or by library design, and screened empirically in target cells. For example, screening can be done by viral vector transduction followed by selection of high-expressing cells (e.g.
  • Figure 3 shows production titres in suspension (serum-free) HEK293T cells of lentiviral vectors harbouring different transgene promoters combined with 3' UTR cis-acting elements.
  • A and B present data from two independent experiments for LV-RRE-EFS-GFP vectors containing different 3' UTR cis-acting elements: wPRE, ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t) and/or a single copy of the ZCCHC14 stem loop from HCMV RNA2.7 (HCMV.ZSL1).
  • C shows data for output titres of LV-RRE-EF1a-GFP (EF1a contains an intron) and LV-RRE-huPGK-GFP.
  • Titres were measured by transduction of adherent HEK293T cells followed by flow cytometry based assay after 3 days (GFP TU/mL) or qPCR to LV DNA after 10 days (Integrating TU/mL).
  • the data show that integrating titres of LVs are comparable irrespective of the presence/absence of any of the cis-acting elements but that GFP titres vary, reflecting the expression levels in transduced adherent HEK293T cells.
  • the 16x 10bp CARe tile (only) in the sense orientation boosted the GFP TU/mL titres of LVs lacking the wPRE.
  • Figure 4 shows that the 16x 10bp CARe tile boosts transgene expression from lentiviral vectors lacking wPRE in a T-cell line.
  • LV-RRE-EFS-GFP and LV-RRE-EF1a-GFP vector stocks produced in suspension (serum-free) HEK293Ts were used to transduce Jurkat cells at matched multiplicity of infection (MOI): MOI 1 [A], MOI 0.25 [B] and MOI 0.1 [C].
  • MOI multiplicity of infection
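Transducing at a "matched MOI" means adding the same number of transducing units (TU) per target cell for every vector stock, so the volume added depends on each stock's titre. The sketch below shows the standard arithmetic (volume = MOI x cells / titre); the function name and the example numbers are illustrative assumptions and are not taken from the patent.

```python
def vector_volume_ml(moi: float, n_cells: float, titre_tu_per_ml: float) -> float:
    """Volume of vector stock (mL) needed to reach a target MOI.

    MOI is transducing units per cell, so the required TU is moi * n_cells,
    and dividing by the titre (TU/mL) gives the volume.
    """
    if titre_tu_per_ml <= 0:
        raise ValueError("titre must be positive")
    return moi * n_cells / titre_tu_per_ml


# Illustrative placeholder values only: 1e6 cells and a 1e8 TU/mL stock.
volumes = {moi: vector_volume_ml(moi, 1e6, 1e8) for moi in (1.0, 0.25, 0.1)}
```

Because each stock has its own integrating titre, two vectors transduced at the same MOI will generally be added in different volumes.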
  • Transduced cells were analysed by flow cytometry to obtain % GFP-positive values and median fluorescence intensity values (Arbitrary units).
  • D displays data from normalized RT-PCR of extracted mRNA from the transduced cells, where the 100% level is set for each EFS-GFP or EF1a-GFP cassette containing the wPRE in each case.
  • the data show that the 16x 10bp CARe tile (only) in the sense orientation restores transgene expression levels to those observed with wPRE-only, and for EF1a-GFP surprisingly boosts transgene expression levels higher than wPRE-only.
  • the 16x 10bp CARe tile (only) in the sense orientation increases the levels of transgene mRNA above wPRE-only in all conditions.
  • Figure 5 provides production titres in suspension (serum-free) HEK293T cells of 'MSD-2KO'/'U1-dependent' lentiviral vectors harbouring different transgene promoters combined with 3' UTR cis-acting elements.
  • LV-RRE-Pro-GFP vectors containing mutations in the SL2 loop of the packaging signal were produced +/- 256U1, a modified U1 snRNA that binds to the vector genomic RNA to restore titres.
  • the 3' UTR cis-acting elements tested were: wPRE, ΔwPRE (wPRE deleted), and 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t).
  • Figure 6 provides production titres in suspension (serum-free) HEK293T cells of 'MSD-2KO'/ΔRRE lentiviral vectors harbouring different 3' UTR cis-acting elements and transgene expression in target cells.
  • LV-ΔRRE-EFS-GFP vectors containing mutations in the SL2 loop of the packaging signal were produced.
  • the 3' UTR cis-acting elements tested were: wPRE, ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t), and/or a single copy of the ZCCHC14 stem loop from either HCMV RNA2.7 (HCMV.ZSL1) or WHV wPRE (WPRE.ZSL1).
  • A. Titres were measured by transduction of adherent HEK293T cells followed by flow cytometry based assay after 3 days (GFP TU/mL) or qPCR to LV DNA after 10 days (Integrating TU/mL).
  • B. Transgene expression in transduced adherent HEK293T or HEPG2 cells was measured by flow cytometry three days post-transduction at matched MOI.
  • Figure 7 shows transgene expression levels in primary cells transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements.
  • LV-RRE-EFS-GFP [A] or LV-RRE-EF1a-GFP [B] vector stocks produced in suspension (serum-free) HEK293Ts were used to transduce primary cells (92BR) at matched multiplicity of infection (MOI): MOI 2, 1 or 0.5.
  • the 3' UTR cis-acting elements tested were: wPRE, wPRE3 (shortened wPRE), ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t), and/or a single copy of the ZCCHC14 stem loop from either HCMV RNA2.7 (HCMV.ZSL1) or WHV wPRE (WPRE.ZSL1).
  • Transgene expression in transduced adherent 92BR cells was measured by flow cytometry three days post-transduction at matched MOI.
  • Figure 8 shows transgene expression levels in adherent HEK293T cells transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOIs.
  • LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (TU/mL).
  • Vector stocks were used to transduce fresh adherent HEK293T cells at matched multiplicity of infection (MOI): MOI 2, 1 or 0.5.
  • Transgene (GFP) expression in transduced adherent HEK293T cells was measured by flow cytometry three days post-transduction and median fluorescence intensities (Arbitrary units) normalised to that achieved with the standard wPRE-containing LV (set to 100%).
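The normalisation described here (median fluorescence intensity expressed relative to the wPRE-containing vector, which is set to 100%) can be sketched as follows. The function name, the dictionary keys and the numbers are placeholders chosen for illustration; they are not data from the patent.

```python
def normalise_to_control(mfi_by_vector: dict[str, float], control: str) -> dict[str, float]:
    """Express each median fluorescence intensity as a percentage of the control."""
    reference = mfi_by_vector[control]
    if reference <= 0:
        raise ValueError("control MFI must be positive")
    return {name: 100.0 * mfi / reference for name, mfi in mfi_by_vector.items()}


# Placeholder MFI values (arbitrary units), NOT measurements from the patent.
example_mfi = {"wPRE": 2000.0, "dwPRE": 500.0, "CARe.16t": 1800.0}
example_percent = normalise_to_control(example_mfi, control="wPRE")
```

Setting the wPRE vector to 100% makes it easy to read whether a shorter 3' UTR element restores, falls short of, or exceeds the expression seen with the full-length wPRE.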
  • Figure 9 shows transgene expression levels in suspension Jurkat cells (T-cell line) transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOIs (diagonal lines).
  • LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (open bars; TU/mL).
  • Vector stocks were used to transduce fresh Jurkat cells at matched multiplicity of infection (MOI): MOI 1 or 0.5.
  • vectors contained wPRE, 16x 10bp CARe tiles, a single copy of the ZCCHC14 stem loop (HCMV.ZSL1), or no element (ΔwPRE).
  • variants deleted for wPRE but containing a single copy of the ZCCHC14 stem loop (at position 2) were also paired with increasing numbers of CARe tiles, from 1x to 20x 10bp copies (at position 1, i.e. upstream of position 2).
  • Transgene (GFP) expression in transduced Jurkat cells was measured by flow cytometry three days post-transduction and median fluorescence intensities (Arbitrary units) normalised to that achieved with the standard wPRE-containing LV (set to 100%).
  • Figure 10 shows transgene expression levels in suspension Jurkat cells (T-cell line) transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOI.
  • LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (not shown).
  • Vector stocks were used to transduce fresh Jurkat cells at matched multiplicity of infection of 1.
  • vectors contained wPRE (black bar), 16x 10bp CARe tiles (CARe.16t; dark grey bar), HCMV.ZSL1 (striped light grey bar), or ΔwPRE (white bar).
  • variants deleted for wPRE but containing a single copy of the ZCCHC14 stem loop were also paired with 16x 10bp CARe tiles that contained synthetic variant sequences of the consensus (CARe.16t_vX; at position 1 i.e. upstream of position 2) as shown (grey bars).
  • Transgene (GFP) expression in transduced Jurkat cells was measured by flow cytometry ten days post-transduction and median fluorescence intensities (Arbitrary units) normalised to vector copy-number (VCN), which was measured by qPCR against HIV-Psi on extracted host cell DNA.
  • VCN vector copy-number
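Normalising fluorescence to vector copy-number gives an expression-per-integrated-copy figure, which separates the effect of the 3' UTR element from differences in how many vector copies each cell population carries. A minimal sketch, with a hypothetical function name and placeholder numbers that are not data from the patent:

```python
def mfi_per_copy(mfi: float, vcn: float) -> float:
    """Median fluorescence intensity divided by vector copy-number (VCN)."""
    if vcn <= 0:
        raise ValueError("VCN must be positive")
    return mfi / vcn


# Placeholder values only: MFI of 3000 arbitrary units at 1.5 copies per cell.
example_per_copy = mfi_per_copy(3000.0, 1.5)
```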
  • Figure 11 shows transgene expression levels in suspension Jurkat cells (T-cell line) transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOI.
  • LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (not shown).
  • Vector stocks were used to transduce fresh Jurkat cells at matched multiplicity of infection of 1.
  • vectors contained wPRE (black bar), 16x 10bp CARe tiles (CARe.16t; dark grey bar), HCMV.ZSL1 (striped light grey bar), or ΔwPRE (striped light grey bar).
  • variants deleted for wPRE but containing a single copy of the ZCCHC14 stem loop were also paired with 16x 10bp CARe tiles that contained native variant sequences of the consensus (CARe.16t_vX; at position 1 i.e. upstream of position 2) as shown (grey bars).
  • Transgene (GFP) expression in transduced Jurkat cells was measured by flow cytometry ten days post-transduction and median fluorescence intensities (Arbitrary units) normalised to vector copy-number (VCN), which was measured by qPCR against HIV-Psi on extracted host cell DNA.
  • the solid horizontal line indicates expression level achieved by the larger wPRE element, and the dotted horizontal line indicates expression levels without any 3'UTR element.
  • Figure 12 shows example rAAV vector genomes containing either no 3'UTR element (empty) or 3'UTR elements to enhance transgene expression in target cells.
  • the 'CAZL' element (a composite of tandem CARe 10bp consensus tiles [CARe.xt] and the ZCCHC14 stem loop [ZSL1]) is ~140-260 nts in length (depending on use of 4x to 16x CARe tiles).
  • the wPRE is ~590 nts in length, and therefore occupies more of the rAAV vector genome, size being a critical limitation for rAAVs.
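The size figures quoted for Figure 12 can be turned into a simple space budget. The sketch below assumes that the non-tile part of the CAZL element is about 100 nts, which is inferred from the stated range (about 140 nts with 4 tiles and about 260 nts with 16 tiles of 10bp each) rather than stated directly; all lengths are approximate and the function names are hypothetical.

```python
TILE_LEN = 10          # bp per CARe tile, as stated in this document
NON_TILE_LEN = 100     # ASSUMPTION: inferred from 140 - 4*10 and 260 - 16*10
WPRE_LEN = 590         # approximate wPRE length quoted in this document


def cazl_length(n_tiles: int) -> int:
    """Approximate CAZL element length (nts) for a given number of CARe tiles."""
    if n_tiles < 1:
        raise ValueError("n_tiles must be at least 1")
    return NON_TILE_LEN + n_tiles * TILE_LEN


def space_saved_vs_wpre(n_tiles: int) -> int:
    """Approximate nts freed by using a CAZL element instead of the wPRE."""
    return WPRE_LEN - cazl_length(n_tiles)
```

Under these assumptions, replacing the wPRE with a 16-tile CAZL element frees roughly 330 nts, and a 4-tile version roughly 450 nts, for other payload.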
  • Figure 13 shows the results of an experiment wherein rAAV vectors containing either CMV- or EFS-promoter driven GFP were paired with either variant CARe/ZSL1 ('CAZL') elements of ~140-to-260 nts in length or with the wPRE (~590 nts) at position 'x' (i.e. in the 3'UTR).
  • Controls for the CARe/ZSL1 were inverted elements ('inv') to control for potential effects of different genome sizes, which ranged from 2.0-to-2.6 kb (CMV) and 1.6-to-2.2 kb (EFS) across all genomes.
  • An empty rAAV vector was used as a negative control.
  • rAAVs were made by cotransfection of HEK293T suspension cells with pGenome/pRepCap/pHelper plasmids at a 1:1:1 ratio, and harvested 72 hours post-transfection.
  • Vector harvest material was titrated by qPCR against the GFP sequence to generate vg/mL physical titre values.
  • HEPG2 cells were transduced at the denoted MOIs, and 72 hours post-transduction the cells were analysed by flow cytometry and GFP Expression scores (%GFP+ x MFI; ArbUs) generated.
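The two read-outs used in the figure descriptions above (median fluorescence intensity normalised to vector copy-number, and the GFP Expression score defined as %GFP+ x MFI) are simple arithmetic. The following is a minimal sketch of that arithmetic; the function names and the numeric inputs are hypothetical and are not data from the experiments.

```python
def mfi_per_copy(mfi: float, vcn: float) -> float:
    """Median fluorescence intensity normalised to vector copy-number (VCN)."""
    if vcn <= 0:
        raise ValueError("VCN must be positive")
    return mfi / vcn


def expression_score(percent_gfp_positive: float, mfi: float) -> float:
    """GFP Expression score, defined in the text as %GFP+ x MFI (arbitrary units)."""
    if not 0.0 <= percent_gfp_positive <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return percent_gfp_positive * mfi


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    print(mfi_per_copy(3000.0, 1.5))
    print(expression_score(50.0, 200.0))
```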
  • the present invention provides novel nucleotide sequences, and viral vectors or cells comprising such nucleotide sequences.
  • a nucleotide sequence comprising a transgene expression cassette wherein the 3' UTR of the transgene expression cassette comprises at least one cis-acting sequence selected from (a) a cis-acting Cytoplasmic Accumulation Region (CAR) sequence; and/or (b) a cis-acting ZCCHC14 protein-binding sequence.
  • CAR Cytoplasmic Accumulation Region
  • nucleotide sequence is synonymous with the term “polynucleotide” and/or the term “nucleic acid sequence”.
  • the “nucleotide sequence” can be a double stranded or single stranded molecule and includes genomic DNA, cDNA, synthetic DNA, RNA and a chimeric DNA/RNA molecule.
  • Polynucleotides may be produced recombinantly, synthetically or by any means available to those of skill in the art. They may also be cloned by standard techniques.
  • the nucleotide sequence comprises a transgene expression cassette.
  • An expression cassette is a distinct component of a vector, comprising a gene (in this case a transgene) and regulatory sequence(s) to be expressed by a transfected, transduced or infected cell.
  • transgene refers to a segment of DNA or RNA that contains a gene sequence that has been isolated from one organism and is introduced into a different organism, is a non-native segment of DNA or RNA, or is a recombinant sequence that has been made using genetic engineering techniques.
  • the terms “transgene”, “transgene construct”, “GOI” (gene of interest) and “NOI” (nucleotide of interest) are used interchangeably herein.
  • transgene expression cassettes described herein are preferably viral vector transgene expression cassettes. Suitable viral vector transgene expression cassettes are described in more detail elsewhere herein.
  • the 3' UTR of the transgene expression cassettes described herein comprises at least one of the novel cis-acting sequences described herein.
  • Cis-acting sequences affect the expression of genes that are encoded in the same nucleotide sequence (i.e. the one in which the cis-acting sequence is also present).
  • cis-acting sequences include the typical post-transcriptional regulatory elements (PREs) such as that from the woodchuck hepatitis virus (wPRE).
  • PREs post-transcriptional regulatory elements
  • wPRE woodchuck hepatitis virus
  • the 3' UTR of the transgene expression cassettes described herein comprises at least one cis-acting sequence selected from (a) a cis-acting Cytoplasmic Accumulation Region (CAR) sequence; and/or (b) a cis-acting ZCCHC14 protein-binding sequence.
  • a “Cytoplasmic Accumulation Region (CAR) sequence” is a nucleotide sequence that is transcribed into mRNA and increases the stability and/or export of the mRNA to the cytoplasm and accumulation of the mRNA in the cytoplasm of a cell by sequence-dependent recruitment of the mRNA export machinery.
  • CAR sequences have been described previously, see for example Lei et al., 2013, which describes that insertion of a CAR sequence upstream (i.e. at the 5' end) of a naturally intronless gene can promote the cytoplasmic accumulation of the mRNA transcript.
  • CAR sequences are shown to enhance the transgene expression from transgene cassettes utilizing introns as well as from transgene cassettes that are intronless, as well as boosting expression from cassettes already containing a full length wPRE.
  • Suitable CAR sequences for insertion into the 3' UTR of a transgene expression cassette described herein may be readily identifiable by a person of skill in the art, based on the disclosure provided herein, together with their common general knowledge (see e.g. the disclosure in Lei et al., 2013, which is incorporated herein in its entirety).
  • CARe CAR element
  • a CARe sequence is a core sequence that is present within a CAR sequence (and typically, wherein the CARe sequence is repeated a number of times within the CAR sequence). Examples of CARe sequences are shown in Figure 2 and are described in Lei et al., 2013.
  • CARe sequence may be a sequence that is represented by BMWGHWSSWS (SEQ ID NO: 24) or BMWRHWSSWS (SEQ ID NO: 55), wherein:
  • CARe sequences that are encompassed by BMWGHWSSWS (SEQ ID NO: 24) or BMWRHWSSWS (SEQ ID NO: 55) include those with a sequence represented by CMAGHWSSTG (using the nomenclature of Table 1; SEQ ID NO: 28).
  • Such CARe sequences include the CARe sequences identified previously in Lei et al., 2013 (where the CARe sequences were identified in the 5' region of HSPB3, c-Jun, IFNα1 and IFNβ1 genes).
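The degenerate consensus notation above can be checked mechanically. The sketch below assumes that the nomenclature referred to as Table 1 follows the standard IUPAC nucleotide ambiguity codes (for example B = C/G/T, M = A/C, W = A/T, H = A/C/T, S = C/G, R = A/G); that assumption, and the helper names, are not taken from the source.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (ASSUMED to match Table 1).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Translate a degenerate consensus into a regular expression."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in consensus.upper()))


def matches(consensus: str, sequence: str) -> bool:
    """Return True if the whole sequence is encompassed by the consensus."""
    return consensus_to_regex(consensus).fullmatch(sequence.upper()) is not None


if __name__ == "__main__":
    for seq in ("CCAGTTCCTG", "CCAGATCCTG", "CCAATTCCTG"):
        print(seq, matches("BMWGHWSSWS", seq), matches("BMWRHWSSWS", seq))
```

Under this reading, CCAGTTCCTG (SEQ ID NO: 29) is encompassed by both consensus sequences, whereas CCAATTCCTG (SEQ ID NO: 66) is encompassed only by BMWRHWSSWS, because its fourth position is A rather than G.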
  • the CARe sequence may be selected from the group consisting of: CCAGTTCCTG (SEQ ID NO: 29), CCAGATCCTG (SEQ ID NO: 30), CCAGTTCCTC (SEQ ID NO: 31), TCAGATCCTG (SEQ ID NO:32), CCAGATGGTG (SEQ ID NO: 33), CCAGTTCCAG (SEQ ID NO:34), CCAGCAGCTG (SEQ ID NO:35), CAAGCTCCTG (SEQ ID NO:36), CAAGATCCTG (SEQ ID NO:37), CCTGAACCTG (SEQ ID NO:38), CAAGAACGTG (SEQ ID NO:39), TCAGTTCCTG (SEQ ID NO: 56), GCAGTTCCTG (SEQ ID NO: 57), CAAGTTCCTG (SEQ ID NO: 58), CCTGTTCCTG (SEQ ID NO: 59), CCTGCTCCTG (SEQ ID NO: 60), CCTGTACCTG (SEQ ID NO:
  • the CARe nucleotide sequence is selected from the group consisting of: CCAGTTCCTG (SEQ ID NO:29), CCTGTTCCTG (SEQ ID NO: 59), CCTGTACCTG (SEQ ID NO:61), CCTGTTCCAG (SEQ ID NO: 64), CCAATTCCTG (SEQ ID NO: 66), CCTGAACCTG (SEQ ID NO:38), CCAGTTCCTC (SEQ ID NO: 31) and CCAGTTCCAG (SEQ ID NO:34).
  • the CARe sequence may be CCAGTTCCTG (SEQ ID NO: 29). This is the sequence that is used to exemplify the invention in examples 1 to 4 below.
  • the CARe sequence may be CCAGATCCTG (SEQ ID NO: 30). This is the consensus sequence identified in Figure 2A.
  • the CARe sequence may be CCAGTTCCTC (SEQ ID NO: 31). This sequence is also referred to as HSPB3 v2 herein (see e.g. Figure 11).
  • the CARe sequence may be TCAGATCCTG (SEQ ID NO: 32).
  • the CARe sequence may be CCAGATGGTG (SEQ ID NO: 33). This sequence is also referred to as HSPB3 v3 herein (see e.g. Figure 11).
  • the CARe sequence may be CCAGTTCCAG (SEQ ID NO: 34). This sequence is also referred to as IFNa1 v1 herein (see e.g. Figure 11).
  • the CARe sequence may be CCAGCAGCTG (SEQ ID NO: 35).
  • the CARe sequence may be CAAGCTCCTG (SEQ ID NO: 36).
  • the CARe sequence may be CAAGATCCTG (SEQ ID NO: 37).
  • the CARe sequence may be CCTGAACCTG (SEQ ID NO: 38). This sequence is also referred to as c-Jun v2 herein (see e.g. Figure 11).
  • the CARe sequence may be CAAGAACGTG (SEQ ID NO: 39). This sequence is also referred to as c-Jun v4 herein (see e.g. Figure 11).
  • the CARe sequence may be TCAGTTCCTG (SEQ ID NO: 56). This sequence is also referred to as variant 1 in Figure 10.
  • the CARe sequence may be GCAGTTCCTG (SEQ ID NO: 57). This sequence is also referred to as variant 2 in Figure 10.
  • the CARe sequence may be CAAGTTCCTG (SEQ ID NO: 58). This sequence is also referred to as variant 3 in Figure 10.
  • the CARe sequence may be CCTGTTCCTG (SEQ ID NO: 59). This sequence is also referred to as variant 4 in Figure 10.
  • the CARe sequence may be CCTGCTCCTG (SEQ ID NO: 60). This sequence is also referred to as variant 6 in Figure 10.
  • the CARe sequence may be CCTGTACCTG (SEQ ID NO:61). This sequence is also referred to as variant 7 in Figure 10.
  • the CARe sequence may be CCTGTTGCTG (SEQ ID NO: 62). This sequence is also referred to as variant 8 in Figure 10.
  • the CARe sequence may be CCTGTTCGTG (SEQ ID NO: 63). This sequence is also referred to as variant 9 in Figure 10.
  • the CARe sequence may be CCTGTTCCAG (SEQ ID NO: 64). This sequence is also referred to as variant 10 in Figure 10.
  • the CARe sequence may be CCTGTTCCTG (SEQ ID NO: 65). This sequence is also referred to as variant 11 in Figure 10.
  • the CARe sequence may be CCAATTCCTG (SEQ ID NO: 66). This sequence is also referred to as variant 12 in Figure 10.
  • the CARe sequence may be GAAGCTCCTG (SEQ ID NO: 67). This sequence is also referred to as IFNb1 v1 in Figure 11.
  • a transgene expression cassette described herein may comprise a cis-acting Cytoplasmic Accumulation Region (CAR) sequence, comprising at least one of the CAR element (CARe) sequences described above.
  • CAR sequences typically comprise a plurality of CARe sequences.
  • the CAR sequences described herein may include a plurality of CARe sequences, e.g. at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, or at least twenty CARe sequences.
  • the CAR sequence described herein comprises at least two CARe sequences.
  • the CAR sequence described herein comprises at least four CARe sequences.
  • the CAR sequence described herein comprises at least six CARe sequences.
  • the CAR sequence described herein comprises at least eight CARe sequences.
  • the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
  • the CAR sequence described herein comprises at least ten CARe sequences.
  • the CAR sequence described herein comprises at least twelve CARe sequences.
  • the CAR sequence described herein comprises at least fourteen CARe sequences.
  • the CAR sequence described herein comprises at least sixteen CARe sequences.
  • the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
  • the CAR sequence described herein comprises at least eighteen CARe sequences.
  • the CAR sequence described herein comprises at least twenty CARe sequences.
  • the CAR sequences described herein may include a plurality of CARe sequences, e.g. no more than two, no more than three, no more than four, no more than five, no more than six, no more than seven, no more than eight, no more than nine, no more than ten, no more than eleven, no more than twelve, no more than thirteen, no more than fourteen, no more than fifteen, no more than sixteen, no more than seventeen, no more than eighteen, no more than nineteen, or no more than twenty CARe sequences.
  • the CAR sequence described herein has no more than two CARe sequences.
  • the CAR sequence described herein has no more than four CARe sequences.
  • the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
  • the CAR sequence described herein has no more than six CARe sequences.
  • the CAR sequence described herein has no more than eight CARe sequences.
  • the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
  • the CAR sequence described herein has no more than ten CARe sequences.
  • the CAR sequence described herein has no more than twelve CARe sequences.
  • the CAR sequence described herein has no more than fourteen CARe sequences.
  • the CAR sequence described herein has no more than sixteen CARe sequences.
  • the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
  • the CAR sequence described herein has no more than eighteen CARe sequences.
  • the CAR sequence described herein has no more than twenty CARe sequences.
  • the CAR sequence described herein has at least four, but no more than twenty, CARe sequences.
  • the CAR sequence described herein has at least eight, but no more than twenty, CARe sequences.
  • the CAR sequence described herein has at least twelve, but no more than twenty, CARe sequences.
  • the CAR sequence described herein has at least sixteen, but no more than twenty, CARe sequences.
  • the CAR sequence described herein has at least four, but no more than sixteen, CARe sequences.
  • the CAR sequence described herein has at least eight, but no more than sixteen, CARe sequences.
  • the CAR sequence described herein has at least twelve, but no more than sixteen, CARe sequences.
  • a CAR sequence described herein may consist of two CARe sequences, or consist of four CARe sequences, or consist of six CARe sequences, or consist of eight CARe sequences, or consist of ten CARe sequences, or consist of twelve CARe sequences, or consist of fourteen CARe sequences, or consist of sixteen CARe sequences, or consist of eighteen CARe sequences, or consist of twenty CARe sequences.
  • a CAR sequence comprising sixteen CARe sequences is therefore particularly contemplated herein.
  • a CAR sequence having no more than sixteen CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG) is therefore also particularly contemplated herein. From the foregoing, it will be appreciated that a CAR sequence having a total of sixteen CARe sequences (e.g. with at least one (or all) of the CARe sequence(s) being CCAGTTCCTG) is contemplated herein.
  • enhanced expression may also be achieved with fewer than sixteen CARe sequences (e.g. with one or more CARe sequences - see Figure 8). This is advantageous as it provides greater transgene capacity in the expression cassette.
  • a CAR sequence comprising at least eight CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is particularly contemplated herein.
  • a CAR sequence having a total of eight CARe sequences (e.g. with at least one (or all) of the CARe sequence(s) being CCAGTTCCTG) is contemplated herein.
  • a CAR sequence having no more than eight CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore also particularly contemplated herein.
  • CAR sequences comprising six or ten CARe sequences at the 5' end of intronless mRNA have previously been shown to be functional.
  • a CAR sequence comprising at least six CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore particularly contemplated herein.
  • a CAR sequence comprising at least ten CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore also particularly contemplated herein.
  • a CAR sequence having no more than six CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore particularly contemplated herein.
  • Such sequences are particularly contemplated in the case that the CAR sequence is inserted in the 3'UTR of a transgene expression cassette.
  • a CAR sequence having no more than ten CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore also particularly contemplated herein.
  • such sequences are particularly contemplated in the case that the CAR sequence is inserted in the 3'UTR of a transgene expression cassette.
  • the CAR sequence may comprise a plurality of CARe sequences that are the same (i.e. the CAR sequence may comprise a number of repeats of the same CARe sequence).
  • the CAR sequence may comprise at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty of the same CARe sequence.
  • the CAR sequence may have no more than two, no more than four, no more than six, no more than eight, no more than ten, no more than twelve, no more than fourteen, no more than sixteen, no more than eighteen, or no more than twenty of the same CARe sequence.
  • the CAR sequence may comprise at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty CARe sequences, wherein at least two of the CARe sequences are different.
  • the CAR sequence may have no more than two, no more than four, no more than six, no more than eight, no more than ten, no more than twelve, no more than fourteen, no more than sixteen, no more than eighteen, or no more than twenty CARe sequences, wherein at least two of the CARe sequences are different.
  • the CARe sequences each may be selected independently from the group consisting of: SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66 and SEQ ID NO: 67.
  • the CARe sequences each may be selected independently from the group consisting of: SEQ ID NO: 29, SEQ ID NO: 59, SEQ ID NO: 61, SEQ ID NO: 64, SEQ ID NO: 66, SEQ ID NO: 38, SEQ ID NO: 31, and SEQ ID NO: 34.
  • CARe sequences may be identified by a person of skill in the art.
  • the plurality of CARe sequences within the CAR sequence may be in tandem (i.e. they may be referred to as "tandem CARe sequences").
  • the at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty CARe sequences within the CAR sequence may be in tandem.
  • the no more than two, no more than four, no more than six, no more than eight, no more than ten, no more than twelve, no more than fourteen, no more than sixteen, no more than eighteen, or no more than twenty CARe sequences within the CAR sequence may be in tandem.
  • Tandem CARe sequences are located directly adjacent to each other in the nucleotide sequence.
  • where the CAR sequence comprises two CARe sequences (each having the sequence CCAGTTCCTG (SEQ ID NO: 29)) in tandem, it would comprise the sequence CCAGTTCCTGCCAGTTCCTG (SEQ ID NO: 48).
  • where the CAR sequence comprises three CARe sequences (each having the sequence CCAGTTCCTG (SEQ ID NO: 29)) in tandem, it would comprise the sequence CCAGTTCCTGCCAGTTCCTGCCAGTTCCTG (SEQ ID NO: 49) etc.
  • the tandem sequences may each have the same CARe sequence.
  • the CAR sequence may be described by its size. For example, where a CAR sequence comprises a plurality of CARe sequences in tandem, its size will be reflective of the number of CARe sequences that are present. For example, using the CARe sequences specifically recited herein (which all have a sequence of 10 nucleotides) the CAR sequence may be at least 20 nucleotides when it comprises two CARe sequences in tandem, at least 30 nucleotides when it comprises three CARe sequences in tandem, at least 40 nucleotides when it comprises four CARe sequences in tandem etc.
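The relationship between the number of tandem CARe sequences and the size of the resulting CAR sequence can be illustrated as follows. This is a minimal sketch; the function names are hypothetical, and only the 10-nucleotide CARe sequences recited above are used.

```python
CARE_SEQ_29 = "CCAGTTCCTG"  # SEQ ID NO: 29
CARE_SEQ_59 = "CCTGTTCCTG"  # SEQ ID NO: 59


def tandem_repeat(care: str, copies: int) -> str:
    """CAR sequence made of `copies` directly adjacent repeats of one CARe."""
    if copies < 1:
        raise ValueError("at least one CARe sequence is required")
    return care * copies


def tandem_mixed(care_units: list) -> str:
    """CAR sequence made of directly adjacent, possibly different, CARe sequences."""
    if not care_units:
        raise ValueError("at least one CARe sequence is required")
    return "".join(care_units)


if __name__ == "__main__":
    print(tandem_repeat(CARE_SEQ_29, 2))        # two copies in tandem
    print(len(tandem_repeat(CARE_SEQ_29, 16)))  # length of a 16-tile CAR sequence
    print(tandem_mixed([CARE_SEQ_29, CARE_SEQ_59]))
```

Because tandem CARe sequences are directly adjacent, the CAR length in this sketch is simply ten nucleotides multiplied by the number of CARe sequences.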
  • the CAR sequence may be at least 10 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides etc.
  • It may be at least 60 nucleotides (e.g. with at least six CARe sequences in tandem).
  • It may be at least 80 nucleotides (e.g. with at least eight CARe sequences in tandem). It may be at least 100 nucleotides (e.g. with at least ten CARe sequences in tandem).
  • the CAR sequence may be no more than 10 nucleotides, no more than 20 nucleotides, no more than 30 nucleotides, no more than 40 nucleotides, no more than 50 nucleotides etc.
  • the plurality of CARe sequences within the CAR sequence may be spatially separated by intervening sequences (i.e. one or more nucleotides may be present between neighbouring CARe sequences within the CAR sequence).
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than twenty nucleotides e.g.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than two nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than four nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than six nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than eight nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than ten nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than twelve nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than fourteen nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than sixteen nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than eighteen nucleotides.
  • the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than twenty nucleotides.
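The spaced arrangement described above, in which neighbouring CARe sequences are separated by short intervening sequences, can be sketched as below. The spacer sequences used in the example are hypothetical placeholders (the text does not specify any), and the function name and the default 20-nucleotide limit parameter are illustrative choices.

```python
from itertools import zip_longest


def build_spaced_car(care_units, spacers, max_spacer_nt=20):
    """Join CARe sequences with one intervening sequence between each neighbouring pair.

    Raises ValueError if the number of spacers is wrong or if any spacer is
    longer than `max_spacer_nt` nucleotides.
    """
    if not care_units:
        raise ValueError("at least one CARe sequence is required")
    if len(spacers) != len(care_units) - 1:
        raise ValueError("need exactly one spacer between each neighbouring pair")
    if any(len(spacer) > max_spacer_nt for spacer in spacers):
        raise ValueError("intervening sequence exceeds the permitted length")
    parts = []
    # The last CARe unit has no following spacer, so it is padded with "".
    for unit, spacer in zip_longest(care_units, spacers, fillvalue=""):
        parts.append(unit)
        parts.append(spacer)
    return "".join(parts)


if __name__ == "__main__":
    care = "CCAGTTCCTG"  # SEQ ID NO: 29
    # "AT" is a HYPOTHETICAL two-nucleotide spacer, for illustration only.
    print(build_spaced_car([care, care], ["AT"], max_spacer_nt=2))
```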
  • the transgene expression cassette may comprise at least one cis-acting ZCCHC14 protein-binding sequence (as an alternative to the CARe sequence(s) described herein, or in addition to the CARe sequence(s) described herein).
  • a ZCCHC14 protein-binding sequence is a nucleotide sequence that is capable of interacting with/being bound by a ZCCHC14 protein.
  • ZCCHC14 refers to human Zinc finger CCHC domain-containing protein 14 (also referred to as BDG-29; UniProtKB identifier: Q8WYQ9; NCBI Gene ID: 23174, updated on 4-Jul-2021).
  • Methods for determining whether a specific nucleotide sequence is capable of being bound by a ZCCHC14 protein are well known in the art; see for example the method of 'Systematic evolution of ligands by exponential enrichment' (SELEX) or the method of RNA electrophoretic mobility shift assay or the method RNA pull-down (cross-linking/immunoprecipitation) or a combination of these methods (Mol Biol (Mosk). May-Jun 2015;49(3):472-81). Routine methods for detecting nucleotide:protein interactions may also be used e.g. nucleotide pull down assays, ELISA assays, reporter assays etc. Appropriate ZCCHC14 protein-binding sequences can therefore readily be identified by a person of skill in the art.
  • the cis-acting ZCCHC14 protein-binding sequences described herein comprise at least one CNGGN-type pentaloop sequence.
  • ZCCHC14 protein-binding sequences comprising a CNGGN-type pentaloop sequence are known in the art.
  • the PRE of HBV and HCMV RNA 2.7, as well as wPRE are known to comprise a ZCCHC14 protein-binding sequence with a CNGGN-type pentaloop sequence.
  • the pentaloop adopts the GNGG(N) family loop conformation with a single bulged G residue, flanked by A-helical regions (see for example Kim et al., 2020), where N can be any nucleotide.
  • the cis-acting ZCCHC14 protein-binding sequences described herein may comprise any appropriate CNGGN-type pentaloop sequence.
  • they may comprise a CTGGT pentaloop sequence (as is seen in the stem loop found in HCMV RNA2.7; also known as the SLα of HCMV RNA2.7).
  • they may comprise a CTGGA pentaloop sequence (as is seen in the stem loop found in the α element of wPRE; also known as the SLα of wPRE).
  • they may comprise a CAGGT pentaloop sequence (as is seen in the stem loop found in the PRE α element of HBV; also known as the SLα of HBV).
  • the CNGGN-type pentaloop sequences found within the SLα of HCMV RNA 2.7, wPRE and HBV are typically part of a stem-loop structure, which facilitates TENT4-dependent tail regulation.
  • the cis-acting ZCCHC14 protein-binding sequences described herein may also comprise the CNGGN-type pentaloop sequence as part of a stem loop sequence.
  • where a stem loop sequence is used, any appropriate stem loop sequence may be used. Suitable stem loop sequences can readily be identified by a person of skill in the art, as the required level of complementarity needed for a stem loop sequence is known.
  • different stem lengths may also be used.
  • stems of 7 or 8 nucleotides have been used, however, longer stems with up to an additional 9 nucleotides (18 nt added in total, 9 on each side) have also been shown to work (data not shown).
  • Non-limiting examples of stem loop sequences are provided below.
  • a cis-acting ZCCHC14 protein-binding sequence described herein may comprise the stem loop sequence TCCTCGTAGGCTGGTCCTGGGGA (SEQ ID NO: 40; which includes the pentaloop sequence CTGGT, and corresponds to the sequence of SLα of HCMV RNA2.7).
  • a cis-acting ZCCHC14 protein-binding sequence described herein may comprise the stem loop sequence GCCCGCTGCTGGACAGGGGC (SEQ ID NO: 41; which includes the pentaloop sequence CTGGA, and corresponds to the sequence of SLα of wPRE).
  • they may comprise the stem loop sequence TTGCTCGCAGCAGGTCTGGAGCAA (SEQ ID NO: 52; which includes the pentaloop sequence CAGGT, and corresponds to the sequence of SLα of HBV).
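The position of the named pentaloop within each of the stem loop sequences above can be located with a simple motif scan. The sketch below reports every CNGGN-pattern match, including overlapping ones; it is a text search only and makes no attempt to predict RNA secondary structure, so a match does not by itself show that the nucleotides form the loop of a stem loop. The function name is hypothetical.

```python
import re

# Zero-width lookahead so that overlapping CNGGN matches are all reported.
CNGGN = re.compile(r"(?=(C[ACGT]GG[ACGT]))")

STEM_LOOPS = {
    "SEQ ID NO: 40": "TCCTCGTAGGCTGGTCCTGGGGA",
    "SEQ ID NO: 41": "GCCCGCTGCTGGACAGGGGC",
    "SEQ ID NO: 52": "TTGCTCGCAGCAGGTCTGGAGCAA",
}


def find_cnggn(sequence: str):
    """Return (0-based start, motif) for every CNGGN-pattern match."""
    return [(m.start(), m.group(1)) for m in CNGGN.finditer(sequence.upper())]


if __name__ == "__main__":
    for name, seq in STEM_LOOPS.items():
        print(name, find_cnggn(seq))
```

More than one match may be reported for a given sequence; the pentaloop named in the text is the one flanked by the two stem arms.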
  • the cis-acting ZCCHC14 protein-binding sequences described herein may comprise the CNGGN-type pentaloop sequence as part of a heterologous stem loop sequence.
  • Appropriate heterologous stem loop sequences may readily be identified by a person of skill in the art.
  • the cis-acting ZCCHC14 protein-binding sequences described herein may comprise the CNGGN-type pentaloop sequence as part of a stem loop sequence, within a longer sequence (i.e. wherein the cis-acting ZCCHC14 protein-binding sequence comprises additional sequences that flank the stem loop sequence itself).
  • examples of flanking sequences are given in SEQ ID NO: 42 (which shows the sequence of a stem-loop structure as set forth in SEQ ID NO: 40, with additional flanking sequences) and SEQ ID NO: 43 (which shows the sequence of a stem-loop structure as set forth in SEQ ID NO: 41, with additional flanking sequences). Further examples of flanking sequences are given in SEQ ID NO: 54 (which shows the sequence of a stem-loop structure as set forth in SEQ ID NO: 52, with additional flanking sequences).
  • flanking sequences may be nucleotides that are naturally present at these positions in the corresponding PRE (e.g. for SEQ ID NO:43, the flanking sequences are those that are naturally present around the SLα sequence of wPRE). Alternatively, heterologous flanking sequences may be used.
  • the flanking sequences provided herein are merely by way of example and alternative flanking sequences and flanking sequences with different lengths may also be used.
  • a ZCCHC14 protein-binding sequence as described herein may comprise at least one CNGGN-type pentaloop sequence and a stem loop sequence, but does not comprise a flanking sequence.
  • a ZCCHC14 protein-binding sequence as described herein comprises at least one CNGGN-type pentaloop sequence, but does not comprise a stem loop sequence or a flanking sequence.
  • the cis-acting ZCCHC14 protein-binding sequences described herein do not comprise a full length post-transcriptional regulatory element (PRE) α element. In addition, they do not comprise a full length PRE γ element (for example that found in wPRE or wPRE3). As such, the cis-acting ZCCHC14 protein-binding sequences described herein have neither a full length post-transcriptional regulatory element (PRE) α element nor a full length PRE γ element. The reason for this is that the invention aims to minimise the size of the cis-acting sequences in the 3' UTR of the transgene expression cassette as much as possible, to provide more transgene capacity. The inventors have surprisingly found that the full length sequence of a PRE α element and the full length sequence of a PRE γ element are not needed in order to obtain the effects observed herein.
  • the ZCCHC14 protein-binding sequence described herein therefore is not wPRE or wPRE3.
  • Full length PRE α element and γ element sequences are readily identifiable by a person of skill in the art.
  • full length PRE α element and γ element sequences for wPRE are provided herein as SEQ ID NO: 46 and SEQ ID NO: 45 respectively.
  • a full length PRE α element sequence for HBV is also provided as SEQ ID NO: 53.
  • the cis-acting ZCCHC14 protein-binding sequences described herein therefore do not comprise any of the following sequences: SEQ ID NO: 46, SEQ ID NO: 45, SEQ ID NO:53, and/or SEQ ID NO: 27.
  • a cis-acting ZCCHC14 protein-binding sequence provided herein may have a sequence that corresponds to a PRE α element fragment (in other words, it may have a sequence that is identical to a portion of a PRE α element, but does not include all of (i.e. is shorter than) the full length PRE α element sequence).
  • the cis-acting ZCCHC14 protein-binding sequence provided herein may be a truncated nucleotide sequence that constitutes a part of a PRE α element.
  • the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) a HBV PRE α element. In this context, it may be described as a HBV PRE α element fragment. It may also be described as a truncated nucleotide sequence that constitutes a part of a HBV PRE α element.
  • the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) SEQ ID NO: 53. In other words, it may be a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 53.
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO:54.
  • the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) a HCMV RNA 2.7 sequence.
  • it may be a fragment (a portion of) the sequence shown in SEQ ID NO: 27.
  • It may be described as a HCMV RNA 2.7 fragment (for example, a fragment of the sequence shown in SEQ ID NO:27).
  • It may also be described as a truncated nucleotide sequence that constitutes a part of a HCMV RNA 2.7 sequence.
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42.
  • the ZCCHC14 protein-binding sequence may be a fragment of the sequence shown in SEQ ID NO:27 and comprise the sequence of SEQ ID NO: 40, or 42.
  • the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) a wPRE α element. In this context, it may be described as a wPRE α element fragment. It may also be described as a truncated nucleotide sequence that constitutes a part of a wPRE α element.
  • the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) SEQ ID NO: 46. In other words, it may be a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 46.
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41, or 43.
  • the inventors have exemplified the invention by using ZCCHC14 protein-binding sequences that are derived from known PREs (specifically, the ZCCHC14 protein-binding sequence that is present in HCMV RNA2.7; and/or the ZCCHC14 protein-binding sequence that is present in the PRE of the woodchuck hepatitis virus (wPRE)). Although these ZCCHC14 protein-binding sequences are particularly contemplated herein, other appropriate ZCCHC14 protein-binding sequences (e.g. from other PREs) may alternatively (or additionally) be used.
  • the ZCCHC14 protein-binding sequences that are described herein are used to enhance transgene expression, whilst minimising the 'backbone' sequences of viral vectors such that titres of vectors containing larger payloads can be maintained or increased.
  • the ZCCHC14 protein-binding sequence is therefore typically small.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 240 nucleotides.
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 200 nucleotides (i.e. no more than 200 nucleotides in length).
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 150 nucleotides (i.e. no more than 150 nucleotides in length).
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 100 nucleotides (i.e. no more than 100 nucleotides in length).
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 90 nucleotides (i.e. no more than 90 nucleotides in length).
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
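The length ceilings listed above (240, 200, 150, 100 and 90 nucleotides) can be illustrated with a minimal sketch. The function name `tightest_ceiling` and the placeholder sequence are hypothetical and are not taken from the sequence listing; the sketch checks length only and says nothing about whether a sequence is bound by ZCCHC14.

```python
# Length ceilings (in nucleotides) quoted in the text for the inserted sequence.
LENGTH_CEILINGS = (240, 200, 150, 100, 90)


def tightest_ceiling(sequence):
    """Return the smallest quoted ceiling the sequence satisfies, or None."""
    length = len(sequence)
    satisfied = [ceiling for ceiling in LENGTH_CEILINGS if length <= ceiling]
    return min(satisfied) if satisfied else None


# Hypothetical 72-nucleotide placeholder (assumption, not a real SEQ ID NO).
candidate = "ACGU" * 18
print(len(candidate), tightest_ceiling(candidate))
```

A candidate longer than 240 nucleotides returns `None`, which in this sketch simply means it falls outside every ceiling quoted above.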
  • the cis-acting ZCCHC14 protein-binding sequence is a fragment (a portion of) an HBV PRE α element (i.e. a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 53).
  • the cis-acting ZCCHC14 protein-binding sequence may be up to 240 nucleotides of SEQ ID NO: 53.
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO: 54.
  • the cis-acting ZCCHC14 protein-binding sequence is a fragment (a portion of) a HCMV RNA 2.7 sequence (e.g. a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 27)
  • the cis-acting ZCCHC14 protein-binding sequence may be up to 240 nucleotides of SEQ ID NO:27.
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42.
  • in a further example, the cis-acting ZCCHC14 protein-binding sequence may be up to 100 nucleotides of SEQ ID NO:27. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40 or 42.
  • the cis-acting ZCCHC14 protein-binding sequence is a fragment (a portion of) a wPRE α element (i.e. a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 46).
  • the cis-acting ZCCHC14 protein-binding sequence may be up to 240 nucleotides of SEQ ID NO: 46.
  • the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41 or 43.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein is up to 90 nucleotides (i.e. a maximum of 90 nucleotides long). See, for example, SEQ ID NO: 43, which provides the sequence for the ZCCHC14 stem loop from wPRE and is 90 nucleotides in length.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein is up to 72 nucleotides (i.e. a maximum of 72 nucleotides long). See, for example, SEQ ID NO: 42, which provides the sequence for the ZCCHC14 stem loop from HCMV RNA2.7 and is 72 nucleotides in length. These examples demonstrate that relatively short sequences can be functional.
  • the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be the minimal sequence that is capable of being bound by a ZCCHC14 protein (i.e. it may be a minimal ZCCHC14 protein-binding sequence). In other words, it may be the smallest sequence that still provides the desired functionality.
  • the ZCCHC14 protein-binding sequence may therefore be the minimal PRE sequence that is capable of being bound by a ZCCHC14 protein. Methods for determining such minimal sequences are known in the art (e.g. nucleotide pull-down assays, ELISA assays, reporter assays, etc. may be used).
  • the ZCCHC14 protein-binding sequence may comprise or consist of the sequence of SEQ ID NO: 40 or 41.
  • the ZCCHC14 protein-binding sequence may comprise or consist of a fragment of SEQ ID NO: 40 or 41 that is capable of binding ZCCHC14.
  • the ZCCHC14 protein-binding sequence may comprise or consist of the sequence of SEQ ID NO: 42 or 43.
  • the ZCCHC14 protein-binding sequence may comprise or consist of a fragment of SEQ ID NO: 42 or 43 that is capable of binding ZCCHC14.
  • the ZCCHC14 protein-binding sequence may comprise or consist of the sequence of SEQ ID NO: 52 or 54.
  • the ZCCHC14 protein-binding sequence may comprise or consist of a fragment of SEQ ID NO: 52 or 54 that is capable of binding ZCCHC14.
  • a ZCCHC14 protein-binding sequence as described herein does not comprise a PRE β element.
  • the ZCCHC14 protein-binding sequence does not include any sequences that are specific to the PRE β element of SEQ ID NO:47. In other words, it does not include any of the PRE β element of wPRE. In addition, it may not include any sequences that are specific to the PRE γ element of SEQ ID NO:45. In other words, it may not include any of the PRE γ element of wPRE and also may not include any of the PRE β element of wPRE.
  • any aspect of the cis-acting ZCCHC14 protein-binding sequences described herein may be combined with any aspect of the cis-acting CAR sequences described herein to provide a transgene expression cassette wherein the 3' UTR comprises both a cis-acting CAR sequence and a cis-acting ZCCHC14 protein-binding sequence.
  • the ZCCHC14 protein-binding sequence may be located 3' to the CAR sequence.
  • the ZCCHC14 protein-binding sequence may be located 5' to the CAR sequence within the 3'UTR of the transgene expression cassette.
  • Suitable positions for the novel cis-acting sequences described herein may be identified by a person of skill in the art, taking into account Figure 1, for example.
  • the 3' UTR of the transgene expression cassette comprises at least two spatially distinct cis-acting CAR sequences and/or at least two spatially distinct cis-acting ZCCHC14 protein-binding sequences.
  • the 3' UTR of the transgene expression cassette further comprises a polyA sequence located 3' to the cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequence. Details of polyA sequences are provided elsewhere herein.
  • the 3'UTR of the transgene expression cassette may comprise additional PRE sequences (in addition to the novel cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequences described herein).
  • the 3'UTR may have a full length PRE sequence, such as wPRE.
  • The Woodchuck Hepatitis Virus (WHV) Posttranscriptional Regulatory Element (wPRE) is a nucleotide sequence that, when transcribed, creates a tertiary structure that enhances expression. The sequence is commonly used in molecular biology to increase expression of genes delivered by viral vectors.
  • wPRE is a tripartite regulatory element with γ, α, and β components (also referred to as elements herein), in the given order.
  • wPRE facilitates nucleocytoplasmic transport of RNA mediated by several alternative pathways that may be cooperative.
  • the wPRE has been shown to act on additional posttranscriptional mechanisms to stimulate expression of heterologous cDNAs. Further details relating to wPRE are provided elsewhere herein.
  • transgene expression cassettes the 3' UTR of which comprises both a novel cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequence and an additional PRE, such as wPRE, are able to achieve markedly elevated transgene expression in cells.
  • the 3'UTR of the transgene expression cassette does not comprise additional PRE sequences (in such examples, the novel cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequences described herein are considered to enhance transgene expression sufficiently, and, for example, the additional transgene capacity achieved by omitting additional PRE sequences (such as wPRE sequences) is desirable).
  • transgene expression cassettes described herein may further comprise a promoter operably linked to the transgene.
  • the promoter operably linked to the transgene may be any suitable promoter.
  • the promoter may be one that lacks its native intron (such as a promoter selected from the group consisting of: an EFS promoter, a PGK promoter, and a UBCs promoter).
  • the promoter may comprise an intron (for example, the promoter may be selected from the group consisting of: an EF1α promoter and a UBC promoter). These promoters are discussed in detail elsewhere herein.
  • the nucleotide sequences described herein comprise a transgene expression cassette (also referred to as transgene cassettes herein).
  • the invention has particular utility when the novel cis-acting sequences described herein are present within the 3'UTR of a viral vector transgene expression cassette. Accordingly, any discussion of a transgene expression cassette is particularly relevant to viral vector transgene expression cassettes.
  • the viral vector transgene expression cassette may be any suitable viral vector transgene expression cassette.
  • the viral vector transgene expression cassette may be selected from the group consisting of: a retroviral vector transgene expression cassette, an adenoviral vector transgene expression cassette, an adeno-associated viral vector transgene expression cassette, a herpes simplex viral vector transgene expression cassette, and a vaccinia viral vector transgene expression cassette.
  • the viral vector transgene expression cassette is an adeno-associated viral (AAV) vector transgene expression cassette.
  • the viral vector transgene expression cassette is a retroviral vector transgene expression cassette.
  • the retroviral vector transgene expression cassette may be a lentiviral vector transgene expression cassette. Appropriate viral vectors are discussed in detail elsewhere herein.
  • the invention has particular utility when the novel cis-acting sequences described herein are present within the 3'UTR of a retroviral vector transgene expression cassette (e.g. a lentiviral vector transgene expression cassette as used in the examples herein). Accordingly, any discussion of a transgene expression cassette is particularly relevant to retroviral vector transgene expression cassettes (e.g. lentiviral vector transgene expression cassettes as used in the examples herein).
  • the nucleotide sequences provided herein may be part of a viral vector genome.
  • the transgene expression cassette provided herein may be part of a larger nucleotide sequence, which further comprises additional elements that are required to make up a viral vector genome.
  • This may include, for example in the context of lentiviral vector genomes, a typical packaging sequence and rev-response element (RRE).
  • these may further comprise additional post-transcriptional regulatory elements (PREs) such as that from the woodchuck hepatitis virus (wPRE), as considered above.
  • a nucleotide sequence comprising a viral vector genome expression cassette is also provided herein, wherein the viral vector genome expression cassette comprises the transgene expression cassette described elsewhere herein.
  • the viral vector genome expression cassette may be any suitable cassette, for example it may be selected from the group consisting of: a retroviral vector genome expression cassette, an adenoviral vector genome expression cassette, an adeno-associated viral vector genome expression cassette, a herpes simplex viral vector genome expression cassette, and a vaccinia viral vector genome expression cassette.
  • the viral vector genome expression cassette is an adeno-associated viral vector genome expression cassette.
  • the viral vector genome expression cassette is a retroviral vector genome expression cassette.
  • the invention has particular utility when the novel cis-acting sequences described herein are present within the 3'UTR of a retroviral vector transgene expression cassette (e.g. a lentiviral vector transgene expression cassette as used in the examples herein). Accordingly, any discussion of a transgene expression cassette being part of a viral vector genome expression cassette is particularly relevant to retroviral vector genome expression cassettes (e.g. lentiviral vector genome expression cassettes as used in the examples herein).
  • the transgene expression cassette is in the forward orientation with respect to the viral vector genome expression cassette (such that the transgene expression cassette is encoded in the sense orientation).
  • the transgene expression cassette may be inverted with respect to the vector genome expression cassette i.e. the internal transgene promoter and gene sequences oppose the vector genome cassette promoter.
  • the 3' UTR of the retroviral vector genome expression cassette typically further comprises a 3' polypurine tract (3'ppt) and a DNA attachment (att) site.
  • the 3'ppt is located 5' to the att site within the 3'UTR of the retroviral vector genome expression cassette.
  • the core sequence that comprises both the 3'ppt and the att site may have a sequence of SEQ ID NO:25 (wherein 3'ppt is in bold, and att is underlined):
  • the sequence of SEQ ID NO: 26 may be used to provide the 3'ppt and att site (e.g. of a lentiviral vector genome expression cassette as described herein), (wherein 3'ppt is in bold, and att is underlined):
  • when the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, it is preferable if the sequence above (of SEQ ID NO:26) is not disrupted by the novel cis-acting sequences described herein.
  • when the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, cis-acting elements within the 3'UTR of the transgene cassette may be positioned upstream and/or downstream of the above uninterrupted sequences.
  • Figure 1 also indicates how the CARe and/or ZCCHC14 binding loop(s) may be variably positioned within LV genome expression cassettes with or without a PRE, with transgene sequences in forward or reverse orientation.
  • the cis-acting sequence(s) described herein may be located 5' to the 3'ppt and/or 3' to the att site.
  • the cis-acting sequence(s) described herein are located 5' to the sequence of SEQ ID NO:25 or SEQ ID NO:26.
  • the cis-acting sequence(s) described herein may be located 3' to the sequence of SEQ ID NO:25 or SEQ ID NO:26.
  • the sequence of SEQ ID NO:25 or SEQ ID NO:26 is not disrupted.
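The placement rule described above (a cis-acting sequence located 5' or 3' to the core 3'ppt/att sequence, with the core itself left uninterrupted) can be sketched as a simple string operation. This is an illustration under stated assumptions: `place_element`, `core`, `utr` and `element` are hypothetical names, and the short strings are placeholders rather than SEQ ID NO:25, SEQ ID NO:26 or any sequence from the sequence listing.

```python
def place_element(utr, core, element, side):
    """Insert `element` immediately 5' or 3' of `core` within `utr`.

    The core is never split, so it remains one contiguous block.
    """
    start = utr.find(core)
    if start == -1:
        raise ValueError("core sequence not found in the 3' UTR")
    if side == "5prime":
        position = start                # directly upstream of the core
    elif side == "3prime":
        position = start + len(core)    # directly downstream of the core
    else:
        raise ValueError("side must be '5prime' or '3prime'")
    return utr[:position] + element + utr[position:]


# Hypothetical placeholders (assumptions, not real sequences).
core = "AAAAGGGG"           # stands in for the 3'ppt/att core
utr = "CCC" + core + "TTT"  # stands in for a 3' UTR
element = "acgu"            # stands in for a cis-acting sequence

for side in ("5prime", "3prime"):
    placed = place_element(utr, core, element, side)
    print(side, placed, core in placed)
```

Inserting only at the boundaries of the core is the design choice that keeps it undisrupted; any position strictly inside the core would split it.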
  • nucleotide sequences described herein may include additional features that are described in more detail elsewhere herein.
  • the nucleotide sequence comprises a lentiviral vector genome expression cassette
  • the major splice donor site in the lentiviral vector genome expression cassette may be inactivated.
  • the cryptic splice donor site 3' to the major splice donor site may also be inactivated.
  • the inactivated major splice donor site may have the sequence of GGGGAAGGCAACAGATAAATATGCCTTAAAAT (SEQ ID NO:4; MSD-2KOm5).
  • the nucleotide sequence may further comprise a nucleotide sequence encoding a modified U1 snRNA, wherein the modified U1 snRNA has been modified to bind to a nucleotide sequence within the packaging region of the lentiviral vector genome.
  • the viral vector genome expression cassette may be operably linked to the nucleotide sequence encoding the modified U1 snRNA. This feature is described in more detail elsewhere herein.
  • a viral vector comprising a viral vector genome encoded by the nucleotide sequences described herein is also provided herein.
  • the viral vector may be a viral particle or a virion.
  • a viral vector production system is also provided herein, comprising a nucleotide sequence described herein and one or more additional nucleotide sequence(s) encoding viral vector components.
  • the nucleotide sequences encoding viral vector components comprise nucleotide sequences encoding gag-pol and env.
  • the viral vector production system further comprises a nucleotide sequence encoding rev. Such components are described in more detail elsewhere herein.
  • Cells are also provided herein, wherein the cells comprise a nucleotide sequence described herein, a viral vector described herein, or a viral vector production system described herein.
  • a method for producing a viral vector comprising the steps of:
  • a method for identifying one or more cis-acting sequence(s) that improve transgene expression in a target cell comprising the steps of:
  • This method may be used to determine which one or more of the specific cis-acting sequences described herein improves transgene expression in a specific target cell.
  • the method therefore provides a mechanism for tailoring the selection of specific cis-acting sequences described herein for a specific combination of viral vector, transgene, and target cell.
  • the method can therefore advantageously be performed using a plurality of viral vectors with: (i) different cis-acting sequences in the 3'UTR of the transgene expression cassette; and
  • step (c) of the method comprises performing RT-PCR and optionally sequencing the transgene mRNA.
  • appropriate viral vectors may be selected from the group consisting of: a retroviral vector, an adenoviral vector, an adeno-associated viral vector, a herpes simplex viral vector and a vaccinia viral vector. Brief details of relevant viral vectors are provided below, followed by a more detailed review of lentiviral vectors specifically.
  • An adenovirus is a double-stranded, linear DNA virus that does not replicate through an RNA intermediate.
  • Adenoviruses are double-stranded DNA non-enveloped viruses that are capable of in vivo, ex vivo and in vitro transduction of a broad range of cell types of human and non-human origin. These cells include respiratory airway epithelial cells, hepatocytes, muscle cells, cardiac myocytes, synoviocytes, primary mammary epithelial cells and post-mitotically terminally differentiated cells such as neurons.
  • Adenoviral vectors are also capable of transducing non-dividing cells. This is very important for diseases, such as cystic fibrosis, in which the affected cells in the lung epithelium have a slow turnover rate. In fact, several trials are underway utilising adenovirus-mediated transfer of cystic fibrosis transporter (CFTR) into the lungs of afflicted adult cystic fibrosis patients.
  • Adenoviruses have been used as vectors for gene therapy and for expression of heterologous genes. The large (36 kb) genome can accommodate up to 8 kb of foreign insert DNA and is able to replicate efficiently in complementing cell lines to produce very high titres of up to 10^12 transducing units per ml. Adenovirus is thus one of the best systems to study the expression of genes in primary non-replicative cells.
  • Adenoviral vectors enter cells by receptor mediated endocytosis. Once inside the cell, adenovirus vectors rarely integrate into the host chromosome. Instead, they function episomally (independently from the host genome) as a linear genome in the host nucleus.
  • AAV-based vectors are produced in mammalian cell lines (e.g. HEK293-based) or through use of the baculovirus/Sf9 insect cell system.
  • AAV vectors can be produced by transient transfection of vector component encoding DNAs, typically together with helper functions from Adenovirus or Herpes Simplex virus (HSV), or by use of cell lines stably expressing AAV vector components.
  • Adenoviral vectors are typically produced in mammalian cell lines that stably express Adenovirus E1 functions (e.g. HEK293-based).
  • Adenoviral vectors are also typically 'amplified' via helper-function-dependent replication through serial rounds of 'infection' using the production cell line.
  • An adenoviral vector and production system thereof comprises a polynucleotide comprising all or a portion of an adenovirus genome. The adenovirus may be, without limitation, an adenovirus derived from Ad2, Ad5, Ad12 or Ad40.
  • An adenoviral vector is typically in the form of DNA encapsulated in an adenovirus coat or adenoviral DNA packaged in another viral or viral-like form (such as herpes simplex, and AAV).
  • An AAV vector is commonly understood to be a vector derived from an adeno-associated virus serotype, including, without limitation, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAV-6, AAV-7 and AAV-8.
  • AAV vectors can have one or more of the AAV wild-type genes deleted in whole or part, preferably the rep and/or cap genes, but retain functional flanking ITR sequences. Functional ITR sequences are necessary for the rescue, replication and packaging of the AAV virion.
  • an AAV vector is defined herein to include at least those sequences required in cis for replication and packaging (e.g., functional ITRs) of the virus.
  • the ITRs need not be the wild-type nucleotide sequences, and may be altered, e.g., by the insertion, deletion or substitution of nucleotides, so long as the sequences provide for functional rescue, replication and packaging.
  • An 'AAV vector' also refers to its protein shell or capsid, which provides an efficient vehicle for delivery of vector nucleic acid to the nucleus of target cells.
  • AAV production systems require helper functions, which typically refer to AAV-derived coding sequences that can be expressed to provide AAV gene products that, in turn, function in trans for productive AAV replication.
  • AAV helper functions include both of the major AAV open reading frames (ORFs), rep and cap.
  • the Rep expression products have been shown to possess many functions, including, among others: recognition, binding and nicking of the AAV origin of DNA replication; DNA helicase activity; and modulation of transcription from AAV (or other heterologous) promoters.
  • the Cap expression products supply necessary packaging functions.
  • AAV helper functions are used herein to complement AAV functions in trans that are missing from AAV vectors. It is understood that an AAV helper construct refers generally to a nucleic acid molecule that includes nucleotide sequences providing AAV functions deleted from an AAV vector which is to be used to produce a transducing vector for delivery of a nucleotide sequence of interest.
  • AAV helper constructs are commonly used to provide transient expression of AAV rep and/or cap genes to complement missing AAV functions that are necessary for AAV replication; however, helper constructs lack AAV ITRs and can neither replicate nor package themselves.
  • AAV helper constructs can be in the form of a plasmid, phage, transposon, cosmid, virus, or virion.
  • a number of AAV helper constructs have been described, such as the commonly used plasmids pAAV/Ad and pIM29+45 which encode both Rep and Cap expression products. See, e.g., Samulski et al. (1989) J. Virol. 63:3822-3828; and McCarty et al. (1991) J. Virol.
  • a number of other vectors have been described which encode Rep and/or Cap expression products. See, e.g., U.S. Pat. Nos. 5,139,941 and 6,376,237.
  • the term "accessory functions" refers to non-AAV derived viral and/or cellular functions upon which AAV is dependent for its replication.
  • the term captures proteins and RNAs that are required in AAV replication, including those moieties involved in activation of AAV gene transcription, stage specific AAV mRNA splicing, AAV DNA replication, synthesis of Cap expression products and AAV capsid assembly.
  • Viral-based accessory functions can be derived from any of the known helper viruses such as adenovirus, herpesvirus (other than herpes simplex virus type-1) and vaccinia virus.
  • Herpes simplex virus is an enveloped double-stranded DNA virus that naturally infects neurons. It can accommodate large sections of foreign DNA, which makes it attractive as a vector system, and has been employed as a vector for gene delivery to neurons (Manservigi et al., Open Virol J. (2010) 4:123-156).
  • the use of HSV in therapeutic procedures requires the strains to be attenuated so that they cannot establish a lytic cycle.
  • the polynucleotide should preferably be inserted into an essential gene. This is because if a viral vector encounters a wild-type virus, transfer of a heterologous gene to the wild-type virus could occur by recombination. However, as long as the polynucleotide is inserted into an essential gene, this recombinational transfer would also delete the essential gene in the recipient virus and prevent "escape" of the heterologous gene into the replication competent wild-type virus population.
  • Vaccinia virus vectors include MVA or NYVAC.
  • Alternatives to vaccinia vectors include avipox vectors such as fowlpox or canarypox known as ALVAC and strains derived therefrom which can infect and express recombinant proteins in human cells but are unable to replicate.
  • Retroviral vectors may be derived from or may be derivable from any suitable retrovirus.
  • retroviruses include: murine leukemia virus (MLV), human T-cell leukemia virus (HTLV), mouse mammary tumour virus (MMTV), Rous sarcoma virus (RSV), Fujinami sarcoma virus (FuSV), Moloney murine leukemia virus (Mo MLV), FBR murine osteosarcoma virus (FBR MSV), Moloney murine sarcoma virus (Mo-MSV), Abelson murine leukemia virus (A-MLV), Avian myelocytomatosis virus-29 (MC29) and Avian erythroblastosis virus (AEV).
  • Retroviruses may be broadly divided into two categories, namely "simple" and "complex". Retroviruses may even be further divided into seven groups. Five of these groups represent retroviruses with oncogenic potential. The remaining two groups are the lentiviruses and the spumaviruses. A review of these retroviruses is presented in Coffin et al (1997) ibid.
  • retroviral and lentiviral genomes share many common features such as a 5' LTR and a 3' LTR, between or within which are located a packaging signal to enable the genome to be packaged, a primer binding site, integration sites to enable integration into a target cell genome and gag/pol and env genes encoding the packaging components - these are polypeptides required for the assembly of viral particles.
  • Lentiviruses have additional features, such as the rev gene and RRE sequences in HIV, which enable the efficient export of RNA transcripts of the integrated provirus from the nucleus to the cytoplasm of an infected target cell.
  • the LTRs are responsible for proviral integration, and transcription. LTRs also serve as enhancer-promoter sequences and can control the expression of the viral genes.
  • the LTRs themselves are identical sequences that can be divided into three elements, which are called U3, R and U5.
  • U3 is derived from the sequence unique to the 3' end of the RNA, R is derived from a sequence repeated at both ends of the RNA, and U5 is derived from the sequence unique to the 5' end of the RNA.
  • the sizes of the three elements can vary considerably among different retroviruses.
  • At least part of one or more protein coding regions essential for replication may be removed from the virus; for example, gag/pol and env may be absent or not functional. This makes the viral vector replication-defective.
  • polyadenylation is part of the maturation of mRNA for translation and involves the addition of a polyadenine (poly(A)) tail to an mRNA transcript.
  • the poly(A) tail comprises multiple adenosine monophosphates and is important for the nuclear export, translation and stability of mRNA.
  • the process of polyadenylation begins as the transcription of a gene terminates.
  • a set of cellular proteins binds to the polyA sequence elements such that the 3' segment of the transcribed pre-mRNA is first cleaved followed by synthesis of the poly(A) tail at the 3' end of the mRNA.
  • a poly(A) tail is added at one of several possible sites, producing multiple transcripts from a single gene.
  • Native retroviral vector genomes are typically flanked by 3' and 5' long terminal repeats (LTRs).
  • Native retrovirus LTRs comprise a U3 region (containing the enhancer/promoter activities necessary for transcription), and an R-U5 region that comprises important cis-acting sequences regulating a number of functions, including packaging, splicing, polyadenylation and translation.
  • Retrovirus polyadenylation (polyA) sequences required for efficient transcriptional termination also reside within native retrovirus LTRs.
  • the polyA sequence elements referred to herein are abbreviated as follows: PAS, core polyadenylation signal; DSE, downstream GU-rich enhancer; USE, upstream enhancer.
  • the present invention may be combined with major splice donor (MSD) site knock out lentiviral vector genomes.
  • the invention may employ lentiviral vector genomes in which the major splice donor site, and optionally the cryptic splice donor site 3' to the major splice donor site, are inactivated.
  • the major splice donor site in the genome of the lentiviral vector, and optionally the cryptic splice donor site 3' to the major splice donor site in the genome of the lentiviral vector are inactivated.
  • the inactivated major splice donor site has the sequence set forth in SEQ ID NO: 4.
  • Suitable inactivated splice sites for use according to the present invention are described in WO 2021/160993 and incorporated herein by reference.
  • RNA splicing is catalysed by a large RNA-protein complex called the spliceosome, which is comprised of five small nuclear ribonucleoproteins (snRNPs).
  • the borders between introns and exons are marked by specific nucleotide sequences within a pre-mRNA, which delineate where splicing will occur. Such boundaries are referred to as "splice sites".
  • the term "splice site" refers to polynucleotides that are capable of being recognized by the splicing machinery of a eukaryotic cell as suitable for being cut and/or ligated to another splice site.
  • Splice sites allow for the excision of introns present in a pre-mRNA transcript.
  • the 5' splice boundary is referred to as the "splice donor site" or the "5' splice site".
  • the 3' splice boundary is referred to as the "splice acceptor site" or the "3' splice site".
  • Splice sites include, for example, naturally occurring splice sites, engineered or synthetic splice sites, canonical or consensus splice sites, and/or non-canonical splice sites, for example, cryptic splice sites.
  • Splice acceptor sites generally consist of three separate sequence elements: the branch point or branch site, a polypyrimidine tract and the acceptor consensus sequence.
  • the branch point consensus sequence in eukaryotes is YNYTRAC (where Y is a pyrimidine, N is any nucleotide, and R is a purine).
  • the 3' acceptor splice site consensus sequence is YAG (where Y is a pyrimidine) (see, e.g., Griffiths et al., eds., Modern Genetic Analysis, 2nd edition, W.H. Freeman and Company, New York (2002)).
  • the 3' splice acceptor site typically is located at the 3' end of an intron.
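  • The branch point and acceptor consensus sequences above are written in IUPAC notation, so they can be searched for with a simple pattern translation. The following is a minimal sketch only: the helper names and the toy intron fragment are hypothetical and are not taken from the present disclosure.

```python
import re

# IUPAC codes used in the text: Y = pyrimidine, R = purine, N = any nucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}


def iupac_to_regex(motif):
    """Translate an IUPAC DNA motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif)


BRANCH_POINT = iupac_to_regex("YNYTRAC")  # branch point consensus
ACCEPTOR = iupac_to_regex("YAG")          # 3' acceptor consensus


def find_motif(pattern, seq):
    """Return 0-based start positions of all matches, including overlapping ones."""
    lookahead = re.compile("(?=" + pattern + ")")
    return [m.start() for m in lookahead.finditer(seq.upper())]


# Hypothetical intron fragment, made up for illustration only:
# a branch point, a polypyrimidine tract, then an acceptor.
toy = "GGTACTAAC" + "T" * 10 + "CAGGT"
print(find_motif(BRANCH_POINT, toy))
print(find_motif(ACCEPTOR, toy))
```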
  • canonical splice site or “consensus splice site” may be used interchangeably and refer to splice sites that are conserved across species.
  • Consensus sequences for the 5' donor splice site and the 3' acceptor splice site used in eukaryotic RNA splicing are well known in the art. These consensus sequences include nearly invariant dinucleotides at each end of the intron: GT at the 5' end of the intron, and AG at the 3' end of an intron.
  • the canonical splice donor site consensus sequence may be (for DNA) AG/GTRAGT (where A is adenosine, T is thymine, G is guanine, C is cytosine, R is a purine and "/" indicates the cleavage site).
  • a splice donor may deviate from this consensus, especially in viral genomes where other constraints bear on the same sequence, such as secondary structure for example within a vRNA packaging region.
  • Non-canonical splice sites are also well known in the art, albeit they occur rarely compared to the canonical splice donor consensus sequence.
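  • Because a splice donor may deviate from the consensus, candidate sites can be scored by how many positions agree with AG/GTRAGT rather than by exact matching. The sketch below is illustrative only: the function names, the requirement for an intact GT dinucleotide, and the score threshold are assumptions, not parameters of the invention.

```python
CONSENSUS = "AGGTRAGT"  # AG/GTRAGT with the "/" removed
CLEAVAGE_OFFSET = 2     # the cleavage site follows the first two bases


def base_matches(base, code):
    """True if a base satisfies a consensus code (R = purine, i.e. A or G)."""
    return base in {"R": "AG"}.get(code, code)


def scan_donors(seq, min_score=6):
    """List (cleavage_index, window, score) for candidate splice donor sites.

    cleavage_index is the 0-based index of the first base 3' of the
    cleavage site (the G of the GT dinucleotide). min_score is an
    arbitrary, hypothetical threshold.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(CONSENSUS) + 1):
        window = seq[i:i + len(CONSENSUS)]
        if window[2:4] != "GT":  # require the nearly invariant GT
            continue
        score = sum(base_matches(b, c) for b, c in zip(window, CONSENSUS))
        if score >= min_score:
            hits.append((i + CLEAVAGE_OFFSET, window, score))
    return hits


print(scan_donors("CCAGGTAAGTCC"))  # perfect match to AG/GTRAGT
print(scan_donors("TGGTGAGT"))      # an instance of TG/GTRAGT, one mismatch
```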
  • major splice donor site is meant the first (dominant) splice donor site in the viral vector genome, encoded and embedded within the native viral RNA packaging sequence typically located in the 5' region of the viral vector nucleotide sequence.
  • the lentiviral vector genome does not contain an active major splice donor site, that is splicing does not occur from the major splice donor site, and splicing activity from the major splice donor site is ablated.
  • the major splice donor site is located in the 5' packaging region of a lentiviral genome.
  • the major splice donor consensus sequence is (for DNA) TG/GTRAGT (where A is adenosine, T is thymine, G is guanine, C is cytosine, R is a purine and "/" indicates the cleavage site).
  • the splice donor region i.e. the region of the vector genome which comprises the major splice donor site prior to mutation, may have the following sequence:
  • the mutated splice donor region may comprise the sequence:
  • the mutated splice donor region may comprise the sequence: GGGGCGGCGAGTGGAGACTACGCCAAAAAT (SEQ ID NO: 3 - MSD-2KOv2)
  • the mutated splice donor region may comprise the sequence:
  • the splice donor region may comprise the sequence:
  • This sequence is also referred to herein as the "stem loop 2" region (SL2).
  • This sequence may form a stem loop structure in the splice donor region of the vector genome.
  • this sequence (SL2) may have been deleted from the nucleotide sequence according to the invention as described herein.
  • the invention encompasses the use of a lentiviral vector genome that does not comprise SL2.
  • the invention encompasses the use of a lentiviral vector genome that does not comprise a sequence according to SEQ ID NO:5.
  • the major splice donor site may have the following consensus sequence, wherein R is a purine and "/" is the cleavage site:
  • R may be guanine (G).
  • the major splice donor and cryptic splice donor region may have the following core sequence, wherein "/" are the cleavage sites at the major splice donor and cryptic splice donor sites:
  • the MSD-mutated vector genome may have at least two mutations in the major splice donor and cryptic splice donor 'region' (/GTGA/GTA), wherein the first and second 'GT' dinucleotides are immediately 3' of the major splice donor and cryptic splice donor cleavage sites respectively.
  • the major splice donor consensus sequence is CTGGT.
  • the major splice donor site may contain the sequence CTGGT.
  • the nucleotide sequence encoding the lentiviral vector genome, prior to inactivation of the splice sites, comprises the sequence as set forth in any of SEQ ID NOs: 1, 5 and/or the sequence TG/GTRAGT, CTGGT, TGAGT and/or /GTGA/GTA.
  • the nucleotide sequence comprises an inactivated major splice donor site which would otherwise have a cleavage site between nucleotides corresponding to nucleotides 13 and 14 of SEQ ID NO:1.
  • the nucleotide sequence encoding the lentiviral vector genome also contains an inactive cryptic splice donor site.
  • the nucleotide sequence does not contain an active cryptic splice donor site adjacent to (3' of) the major splice donor site, that is to say that splicing does not occur from the adjacent cryptic splice donor site, and splicing from the cryptic splice donor site is ablated.
  • the term "cryptic splice donor site” refers to a nucleic acid sequence which does not normally function as a splice donor site or is utilised less efficiently as a splice donor site due to the adjacent sequence context (e.g. the presence of a nearby 'preferred' splice donor), but can be activated to become a more efficient functioning splice donor site by mutation of the adjacent sequence (e.g. mutation of the nearby 'preferred' splice donor).
  • the cryptic splice donor site is the first cryptic splice donor site 3' of the major splice donor.
  • the cryptic splice donor site is within 6 nucleotides of the major splice donor site on the 3' side of the major splice donor site.
  • the cryptic splice donor site is within 4 or 5, preferably 4, nucleotides of the major splice donor cleavage site.
  • the cryptic splice donor site has the consensus sequence TGAGT.
  • the nucleotide sequence comprises an inactivated cryptic splice donor site which would otherwise have a cleavage site between nucleotides corresponding to nucleotides 17 and 18 of SEQ ID NO:1.
  • the major splice donor site and/or adjacent cryptic splice donor site contain a "GT” motif.
  • both the major splice donor site and adjacent cryptic splice donor site contain a "GT” motif which is mutated.
  • the mutated GT motifs may inactivate splice activity from both the major splice donor site and adjacent cryptic splice donor site.
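  • Mutation of both GT motifs in the core region /GTGA/GTA can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the replacement dinucleotides ("CA") are arbitrary placeholders, not the substitutions used in any SEQ ID NO of this disclosure.

```python
CORE = "GTGAGTA"     # /GTGA/GTA with the "/" markers removed
GT_OFFSETS = (0, 4)  # first GT: major splice donor; second GT: cryptic splice donor


def mutate_gt_motifs(region, core_start, replacements=("CA", "CA")):
    """Replace both GT dinucleotides of the core region within `region`.

    core_start is the 0-based index at which the core sequence begins.
    The default replacements are hypothetical placeholders.
    """
    region = region.upper()
    bases = list(region)
    for offset, new in zip(GT_OFFSETS, replacements):
        pos = core_start + offset
        if region[pos:pos + 2] != "GT":
            raise ValueError(f"expected GT at position {pos}")
        if len(new) != 2 or new.upper() == "GT":
            raise ValueError("replacement must be a dinucleotide other than GT")
        bases[pos:pos + 2] = new.upper()
    return "".join(bases)


print(mutate_gt_motifs(CORE, 0))
print(mutate_gt_motifs("CTG" + CORE, 3))
```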
  • An example of such a mutation is referred to herein as "MSD-2KO".
  • the splice donor region may comprise the following sequence:
  • the mutated splice donor region may comprise the following sequence:
  • the mutated splice donor region may comprise the following sequence:
  • the mutated splice donor region may comprise the following sequence:
  • GGCGAGTGGAGACTACGCC (SEQ ID NO: 7)
  • the mutated splice donor region may comprise the following sequence:
  • the stem loop 2 region as described above may be deleted from the splice donor region, resulting in inactivation of both the major splice donor site and the adjacent cryptic splice donor site.
  • Such a deletion is referred to herein as "ΔSL2".
  • a variety of different types of mutations can be introduced into the nucleic acid sequence in order to inactivate the major and adjacent cryptic splice donor sites.
  • the mutation is a functional mutation to ablate or suppress splicing activity in the splice region.
  • the nucleotide sequence may contain a mutation or deletion in any of the nucleotides in any of SEQ ID NOs: 1, 5 and/or the sequence TG/GTRAGT, CTGGT, TGAGT and/or /GTGA/GTA. Suitable mutations will be known to one skilled in the art, and are described herein.
  • a point mutation can be introduced into the nucleic acid sequence.
  • a "nonsense” mutation produces a stop codon.
  • a "missense” mutation produces a codon that encodes a different amino acid.
  • a “silent” mutation produces a codon that encodes either the same amino acid or a different amino acid that does not alter the function of the protein.
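  • The three classes of point mutation can be distinguished by translating the codon before and after the change. The sketch below uses the standard genetic code; the function name is hypothetical, and "silent" is reported here only for synonymous changes, because functional neutrality of an amino acid substitution cannot be determined from sequence alone.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-base substitution within a codon.

    position is 0, 1 or 2. Returns "silent" (same amino acid),
    "nonsense" (new stop codon) or "missense" (different amino acid).
    """
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify_point_mutation("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop)
print(classify_point_mutation("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_point_mutation("GAA", 0, "A"))  # GAA (Glu) -> AAA (Lys)
```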
  • One or more point mutations can be introduced into the nucleic acid sequence comprising the cryptic splice donor site.
  • the nucleic acid sequence comprising the cryptic splice site can be mutated by introducing two or more point mutations therein.
  • At least two point mutations can be introduced in several locations within the nucleic acid sequence comprising the major splice donor and cryptic splice donor sites to achieve attenuation of splicing from the splice donor region.
  • the mutations may be within the four nucleotides at the splice donor cleavage site; in the canonical splice donor consensus sequence this is A₁G₂/G₃T₄, wherein "/" is the cleavage site. It is well known in the art that a splice donor cleavage site may deviate from this consensus, especially in viral genomes where other constraints bear on the same sequence, such as secondary structure for example within a vRNA packaging region.
  • the G₃T₄ dinucleotide is generally the least variable sequence within the canonical splice donor consensus sequence, and mutations to the G₃ and/or T₄ will most likely achieve the greatest attenuating effect.
  • for the major splice donor site in HIV-1 viral vector genomes, this can be T₁G₂/G₃T₄, wherein "/" is the cleavage site.
  • for the cryptic splice donor site in HIV-1 viral vector genomes, this can be G₁A₂/G₃T₄, wherein "/" is the cleavage site.
  • the point mutation(s) can be introduced adjacent to a splice donor site.
  • the point mutation can be introduced upstream or downstream of a splice donor site.
  • where the nucleic acid sequence comprising a major and/or cryptic splice donor site is mutated by introducing multiple point mutations therein, the point mutations can be introduced upstream and/or downstream of the cryptic splice donor site.
  • the nucleotide sequence encoding the RNA genome of the lentiviral vector for use according to the invention may optionally further comprise a mutation in a cryptic splice donor site within the SL4 loop of the packaging sequence.
  • a GT dinucleotide in said cryptic splice donor site within the SL4 loop of the packaging sequence is mutated to GC.
  • the nucleotide sequence encoding the lentiviral vector genome may be suitable for use in a lentiviral vector in a U3 or tat-independent system for vector production.
  • 3rd generation lentiviral vectors are U3/tat-independent, and the nucleotide sequences according to the present invention may be used in the context of a 3rd generation lentiviral vector.
  • tat is not provided in the lentiviral vector production system, for example tat is not provided in trans.
  • the cell or vector or vector production system as described herein does not comprise the tat protein.
  • HIV-1 U3 is not present in the lentiviral vector production system, for example HIV-1 U3 is not provided in cis to drive transcription of vector genome expression cassette.
  • the major splice donor site in the lentiviral vector genome is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence is for use in a tat-independent lentiviral vector.
  • the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence is produced in the absence of tat.
  • the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence has been transcribed independently of tat.
  • the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence is for use in a U3-independent lentiviral vector.
  • the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence has been transcribed independently of the U3 promoter.
  • the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence has been transcribed by a heterologous promoter.
  • transcription of the nucleotide sequence as described herein is not dependent on the presence of U3.
  • the nucleotide sequence may be derived from a U3-independent transcription event.
  • the nucleotide sequence may be derived from a heterologous promoter.
  • a nucleotide sequence as described herein may not comprise a native U3 promoter.
  • Splice site mutants of the present invention may be constructed using a variety of techniques. For example, mutations may be introduced at particular loci by synthesising oligonucleotides containing a mutant sequence, flanked by restriction sites enabling ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence comprises a derivative having the desired nucleotide insertion, substitution, or deletion.
  • splice site mutants may be constructed as described in WO 2021/160993 (which is incorporated herein by reference in its entirety).
  • oligonucleotide-directed site-specific (or segment specific) mutagenesis procedures may be employed to provide an altered sequence according to the substitution, deletion, or insertion required.
  • Deletion or truncation derivatives of splice site mutants may also be constructed by utilising convenient restriction endonuclease sites adjacent to the desired deletion. Subsequent to restriction, overhangs may be filled in, and the DNA religated. Exemplary methods of making the alterations set forth above are disclosed by Sambrook et al. (Molecular cloning: A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, 1989).
  • Splice site mutants may also be constructed utilising techniques of PCR mutagenesis, chemical mutagenesis (Drinkwater and Klinedinst, 1986), forced nucleotide misincorporation (e.g., Liao and Wise, 1990), or by use of randomly mutagenised oligonucleotides (Horwitz et al., 1989).
  • the present invention also provides a method for producing a lentiviral vector nucleotide sequence, comprising the steps of: providing a nucleotide sequence encoding the RNA genome of a lentiviral vector as described herein; and mutating the major splice donor site and cryptic splice donor site as described herein in said nucleotide sequence.
  • MSD-mutated lentiviral vectors are preferable to current standard lentiviral vectors for use as gene therapy vectors due to their reduced capacity to partake in aberrant splicing events both during LV production and in target cells.
  • the production of MSD-mutated vectors has either relied upon supply of the HIV-1 tat protein (1st and 2nd generation U3-dependent lentiviral vectors), has been of lower efficiency due to the destabilising effect of mutating the MSD on vector RNA levels (in 3rd generation vectors), or, as discovered by the present inventors, is improved by co-expression of modified U1 snRNA.
  • the present inventors have previously found that MSD-mutated, 3rd generation (i.e. U3/tat-independent) LVs could be produced to high titre by co-expression of a modified U1 snRNA directed to bind to the 5' packaging region of the vector genome RNA during production (see WO 2021/014157 and WO 2021/160993, incorporated herein by reference).
  • vRNA produced from so-called MSD-mutated (or MSD-2KO) lentiviral vector genomes is typically substantially reduced, leading to lower vector titres. It is theorized that an 'early' interaction with the MSD and U1 snRNA (prior to splicing decisions) is important for transcription elongation from the external promoter. The inventors previously found that one solution to this problem was to provide a modified U1 snRNA in trans during LV production to stabilize the vRNA (see WO 2021/014157 and WO 2021/160993).
  • modified U1 snRNAs can enhance the production titres of MSD-mutated LVs in a manner that is independent of the presence of the 5' polyA signal within the 5' R region, indicating a novel mechanism over others' use of modified U1 snRNAs to suppress polyadenylation (so-called U1-interference [U1i]).
  • the present inventors previously showed that the output titres of lentiviral vectors can be enhanced by co-expressing non-coding RNAs based on U1 snRNAs, which have been modified so that they no longer target the endogenous sequence (a splice donor site) but now target a sequence within the vRNA molecule.
  • the relative enhancement by said modified U1 snRNAs in output titres of lentiviral vectors harbouring attenuating mutations within the major splice donor region (containing the major splice donor and cryptic splice donor sites) is greater than that for standard lentiviral vectors containing a non-mutated major splice donor region.
  • Vector genomes harbouring a broad range of mutation types within the major splice donor region (point mutations, region deletion, and sequence replacement) that lead to reduced titres may be used in combination with a modified U1 snRNA.
  • the approach may comprise co-expression of modified U1 snRNAs together with the other vector components during vector production.
  • the modified U1 snRNAs are designed such that binding to the consensus splice donor site has been ablated by replacing it with a heterologous sequence that is complementary to a target sequence within the vector genome vRNA.
  • the nucleotide sequence of the invention is used in combination with a modified U1 snRNA.
  • the nucleotide sequence of the invention further comprises a nucleotide sequence encoding a modified U1 snRNA.
  • the nucleotide sequence encoding the lentiviral vector genome further encodes a modified U1 snRNA.
  • the nucleotide sequence encoding the lentiviral vector genome is operably linked to the nucleotide sequence encoding the modified U1 snRNA.
  • the nucleotide sequence encoding a modified U1 snRNA may be provided on a different nucleotide sequence, for example on a different plasmid.
  • the nucleotide sequence encoding a modified U1 snRNA may be provided in trans during production of a lentiviral vector as described herein.
  • Splicing and polyadenylation are key processes for mRNA maturation, particularly in higher eukaryotes where most protein-coding transcripts contain multiple introns.
  • the elements within a pre-mRNA that are required for splicing include the 5' splice donor signal, the sequence surrounding the branch point and the 3' splice acceptor signal. Interacting with these three elements is the spliceosome, which is formed by five small nuclear RNAs (snRNAs), including U1 snRNA, and associated nuclear proteins (snRNP).
  • U1 snRNA is expressed by a polymerase II promoter and is present in most eukaryotic cells (Lund et al., 1984, J. Biol. Chem., 259:2013-2021).
  • U1 snRNA contains a short sequence at its 5'-end that is broadly complementary to the 5' splice donor sites at exon-intron junctions.
  • U1 snRNA participates in splice-site selection and spliceosome assembly by base pairing to the 5' splice donor site.
  • a known function for U1 snRNA outside splicing is in the regulation of 3'-end mRNA processing: it suppresses premature polyadenylation (polyA) at early polyA signals (particularly within introns).
  • the endogenous non-coding RNA, U1 snRNA, binds to the consensus 5' splice donor site (e.g. 5'-MAGGURR-3', wherein M is A or C and R is A or G) via the native splice donor annealing sequence (e.g. 5'-ACUUACCUG-3') during early steps of intron splicing.
  • Stem loop I binds to U1A-70K protein that has been shown to be important for polyA suppression.
  • the modified U1 snRNA for use according to the present invention is modified to introduce a heterologous sequence that is complementary to a target sequence within the vector genome vRNA molecule at the site of the native splice donor targeting/annealing sequence.
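  • The base-pairing relationship described above can be illustrated by taking reverse complements. In this minimal sketch the function names and the example target site are hypothetical; the length and position of a real targeting sequence are design choices that are not specified here.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Native splice donor annealing sequence of U1 snRNA, as given in the text.
NATIVE_ANNEALING = "ACUUACCUG"


def reverse_complement(rna):
    """Return the reverse complement of an RNA (or DNA) sequence, as RNA."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def retarget_annealing_sequence(target_site):
    """Heterologous annealing sequence fully complementary to a vRNA target site."""
    return reverse_complement(target_site)


# The native sequence pairs with a site fitting 5'-MAGGURR-3' (M = C, R = A).
print(reverse_complement(NATIVE_ANNEALING))

# Hypothetical 9-nucleotide target within a vector genome vRNA.
print(retarget_annealing_sequence("GGAGCUAGA"))
```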
  • Suitable modified U1 snRNAs for use according to the present invention are described in WO 2021/014157 and WO 2021/160993 and are incorporated herein by reference.
  • the modified U1 snRNAs can be manufactured according to methods generally known in the art.
  • the modified U1 snRNAs can be manufactured by chemical synthesis or recombinant DNA/RNA technology.
  • the modified U1 snRNAs as described herein can be manufactured as described in WO 2021/014157 and WO 2021/160993.
  • the present invention may be combined with the 'TRIP' system.
  • WO2015/092440 and WO2021/094752 which are incorporated in their entirety herein by reference, disclose the use of a heterologous translation control system in eukaryotic cell cultures to repress the translation of the NOI (repress transgene expression) during viral vector production and thus repress or prevent expression of the protein encoded by the NOI.
  • This system is referred to as the Transgene Repression In vector Production cell system or TRIP system.
  • the TRIP system utilises the bacterial trp operon regulation protein, tryptophan RNA-binding attenuation protein (TRAP), and the TRAP binding site/sequence (tbs) to mediate transgene repression.
  • binding site is to be understood as a nucleic acid sequence that is capable of interacting with a certain protein.
  • Such an interaction with an RNA-binding protein such as TRAP results in the repression or prevention of translation of a NOI to which the nucleic acid binding site (e.g. the tbs or portion thereof) is operably linked.
  • a consensus TRAP binding site sequence that is capable of binding TRAP is [KAGNN] repeated multiple times (e.g. 6, 7, 8, 9, 10, 11, 12 or more times); such sequence is found in the native trp operon. In the native context, occasionally AAGNN is tolerated and occasionally additional "spacing" N nucleotides result in a functional sequence. In vitro experiments have demonstrated that at least 6 or more consensus repeats are required for TRAP-RNA binding (Babitzke P, Y. J., Campanelli D. (1996) Journal of Bacteriology 178(17): 5159-5163).
  • "K" may be T or G in DNA and U or G in RNA, and "N" is to be understood as specifying any nucleotide at that position in the sequence (for example, "N" could be G, A, T, C or U).
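  • A strict [KAGNN] repeat run can be located with a regular expression. This is a sketch under stated assumptions: the function name is hypothetical, and the pattern does not model the occasionally tolerated AAGNN repeats or additional spacing nucleotides mentioned above.

```python
import re

KAGNN = "[GT]AG[ACGT]{2}"  # K = G or T; N = any nucleotide (DNA alphabet)


def find_tbs(seq, min_repeats=6):
    """Return (start, repeat_count) for each run of consecutive KAGNN repeats.

    RNA input is accepted and converted to the DNA alphabet first.
    min_repeats defaults to 6, the minimum stated for TRAP-RNA binding.
    """
    dna = seq.upper().replace("U", "T")
    pattern = re.compile("(?:%s){%d,}" % (KAGNN, min_repeats))
    return [(m.start(), (m.end() - m.start()) // 5) for m in pattern.finditer(dna)]


# Hypothetical sequences, made up for illustration only.
print(find_tbs("CC" + "GAGTT" * 8 + "CC"))  # eight repeats
print(find_tbs("TAGAA" * 5))                # five repeats: below the minimum
```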
  • the lentiviral vector genome further comprises a tbs.
  • the nucleotide sequence of the invention further comprises a TRAP binding site (tbs).
  • a nucleotide sequence encoding TRAP is present during production of the lentiviral vector as described herein.
  • the nucleotide sequence may further comprise a tbs, and also may comprise a Kozak sequence, wherein said tbs overlaps the Kozak sequence, or wherein said Kozak sequence comprises a portion of a tbs.
  • the nucleotide sequence may further comprise a multiple cloning site and a Kozak sequence, wherein said multiple cloning site is overlapping with or located downstream to the 3' KAGN2-3 repeat of the tbs and upstream of the Kozak sequence.
  • a "multiple cloning site” is to be understood as a DNA region which contains several restriction enzyme recognition sites (restriction enzyme sites) very close to each other.
  • the RE sites may be overlapping in the MCS for use in the invention.
  • a "restriction enzyme site” or “restriction enzyme recognition site” is a location on a DNA molecule containing specific sequences of nucleotides, 4-8 nucleotides in length, which are recognised by restriction enzymes.
  • a restriction enzyme recognises a specific RE site (i.e. a specific sequence) and cleaves the DNA molecule within, or nearby, the RE site.
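  • Locating restriction enzyme recognition sites in a candidate multiple cloning site amounts to a substring search. The sketch below uses the well-known recognition sequences of EcoRI, BamHI and HindIII purely as examples; the function name and the example sequence are hypothetical, and no particular enzymes are prescribed by the present disclosure.

```python
# Recognition sequences of three commonly used restriction enzymes.
RE_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def map_re_sites(dna):
    """Return a sorted list of (start, enzyme) for every recognition site found."""
    dna = dna.upper()
    hits = []
    for enzyme, site in RE_SITES.items():
        start = dna.find(site)
        while start != -1:
            hits.append((start, enzyme))
            start = dna.find(site, start + 1)  # also finds overlapping sites
    return sorted(hits)


# Hypothetical MCS with three adjacent sites.
print(map_re_sites("TTGAATTCGGATCCAAGCTTAA"))
```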
  • the nucleotide of interest is operably linked to the tbs.
  • the nucleotide of interest is translated in a target cell which expresses TRAP.
  • the nucleotide of interest is translated in a target cell which lacks TRAP.
  • the tbs or portion thereof for use in the invention operably linked to a NOI is positioned in such a way that translation of the NOI is modified when TRAP binds to the tbs or portion thereof.
  • the tbs may be capable of interacting with TRAP such that translation of the nucleotide of interest is repressed or prevented in a viral vector production cell.
  • ORFs present in the vector backbone delivered in transduced (e.g. patient) cells could be transcribed, for example, when read-through transcription from upstream cellular promoters occurs (lentiviral vectors target active transcription sites), leading to potential aberrant transcription of genetic material located in the vector backbone in patient cells. This potential aberrant transcription of genetic material located in the vector backbone following read-through transcription could also occur during lentiviral vector production in production cells.
  • the viral cis-acting sequence present within lentiviral vector genomes may contain multiple internal ORFs. These internal ORFs may be found between an internal ATG sequence of the viral cis-acting sequence and the stop codon immediately 3' to the ATG sequence.
  • Modifications in a viral cis-acting sequence to disrupt at least one internal ORF for example by mutating the ATG sequence which denotes the start of the at least one internal ORF, are tolerated.
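  • Internal ORFs of the kind described above can be enumerated by pairing each ATG with the next stop codon. The following is a minimal sketch: the function name and example sequence are hypothetical, the stop codon is assumed to be the first one in the same reading frame as the ATG, and only the given strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_orfs(dna):
    """Return (start, end) for each ATG that has a downstream in-frame stop codon.

    start is the 0-based index of the A of the ATG; end is the index just
    past the stop codon, so dna[start:end] is the full ORF.
    """
    dna = dna.upper()
    orfs = []
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon from the codon after the ATG.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                orfs.append((start, i + 3))
                break
        start = dna.find("ATG", start + 1)
    return orfs


# Hypothetical sequence: ATG AAA TAG embedded in flanking bases.
print(internal_orfs("CCATGAAATAGCC"))
```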
  • the modified viral cis-acting sequence described herein retains its function.
  • the lentiviral vector genome comprises at least one modified viral cis-acting sequence, wherein at least one internal open reading frame (ORF) in the viral cis-acting sequence is disrupted (see PCT/GB2021/050620, incorporated herein by reference in its entirety).
  • the at least one internal ORF may be disrupted by mutating at least one ATG sequence (ATG sequences may function as translation initiation codons).
  • the lentiviral vector genome comprises a modified nucleotide sequence encoding gag, wherein at least one internal ORF in the modified nucleotide sequence encoding gag is disrupted (see PCT/GB2021/050620, incorporated herein by reference in its entirety).
  • the at least one internal ORF in the modified nucleotide sequence encoding gag may be disrupted by mutating at least one ATG sequence as described herein.
  • the lentiviral vector genome comprises at least two (suitably at least three, at least four, at least five, at least six, at least seven) modified viral cis-acting sequences.
  • At least two (suitably at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen or at least twenty) internal ORFs in the at least one viral cis-acting sequence and/or in the nucleotide sequence encoding gag may be disrupted. In some embodiments, at least three internal ORFs in the at least one viral cis-acting sequence and/or in the nucleotide sequence encoding gag may be disrupted.
  • one (suitably, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen or twenty) internal ORFs in the at least one viral cis-acting sequence and/or the nucleotide sequence encoding gag may be disrupted.
  • the at least one internal ORF may be disrupted such that the internal ORF is not expressed. In some embodiments, the at least one internal ORF may be disrupted such that the internal ORF is not translated. In some embodiments, the at least one internal ORF may be disrupted such that no protein is expressed from the internal ORF. In some embodiments, the at least one internal ORF may be disrupted such that no protein is translated from the internal ORF.
  • the at least one internal ORF present in the modified viral cis-acting sequence and/or in the modified nucleotide sequence encoding gag in the vector backbone delivered in transduced cells may be disrupted such that aberrant transcription of the internal ORF is prevented when there is read-through transcription from upstream cellular promoters.
  • the at least one internal ORF may be disrupted by mutating at least one ATG sequence.
  • a "mutation" of an ATG sequence may comprise one or more nucleotide deletions, additions, or substitutions.
  • the at least one ATG sequence may be mutated in the modified viral cis-acting sequence and/or in the modified nucleotide sequence encoding gag to a sequence selected from the group consisting of: a) an ATTG sequence; b) an ACG sequence; c) an A-G sequence; d) an AAG sequence; e) a TTG sequence; and/or f) an ATT sequence.
  • the at least one ATG sequence may be mutated to an ATTG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag.
  • the at least one ATG sequence may be mutated to an ACG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag.
  • the at least one ATG sequence may be mutated to an A-G sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag.
  • the at least one ATG sequence may be mutated to an AAG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag.
  • the at least one ATG sequence may be mutated to a TTG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag.
  • the at least one ATG sequence may be mutated to an ATT sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag.
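  • The six ATG replacement options listed above can be applied programmatically as shown in this sketch. The function name and the example sequence are hypothetical; "A-G" is interpreted here as deletion of the central T, and the check that the edit does not itself create a new ATG in the surrounding sequence is left to the caller.

```python
# Replacement options from the text; "A-G" denotes deletion of the central T.
ATG_REPLACEMENTS = {
    "ATTG": "ATTG",  # single-base insertion
    "ACG": "ACG",
    "A-G": "AG",     # single-base deletion
    "AAG": "AAG",
    "TTG": "TTG",
    "ATT": "ATT",
}


def disrupt_atg(dna, position, option="ACG"):
    """Replace the ATG starting at `position` (0-based) with the chosen option."""
    dna = dna.upper()
    if dna[position:position + 3] != "ATG":
        raise ValueError(f"no ATG at position {position}")
    return dna[:position] + ATG_REPLACEMENTS[option] + dna[position + 3:]


# Hypothetical sequence containing a single internal ATG at index 2.
example = "CCATGAAATAGCC"
for option in ATG_REPLACEMENTS:
    print(option, disrupt_atg(example, 2, option))
```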
  • the at least one modified viral cis-acting element and/or the modified nucleotide sequence encoding gag may lack ATG sequences.
  • all ATG sequences within viral cis-acting sequences and/or the nucleotide sequence encoding gag in the lentiviral vector genome are mutated.
  • Lentiviral vectors typically comprise multiple viral cis-acting sequences.
  • Example viral cis-acting sequences include gag-p17, Rev response element (RRE), central polypurine tract (cppt) and Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE).
  • the at least one viral cis-acting sequence may be at least one lentiviral cis-acting sequence.
  • Example lentiviral cis-acting sequences include the RRE and cppt.
  • the at least one viral cis-acting sequence may be at least one non-lentiviral cis-acting sequence.
  • the at least one viral cis-acting sequence may be at least one lentiviral cis-acting sequence and at least one non-lentiviral cis-acting sequence.
  • the at least one viral cis-acting sequence is: a) gag-p17; and/or b) a Rev response element (RRE); and/or c) a Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE).
  • the at least one viral cis-acting sequence is a RRE.
  • the at least one viral cis-acting sequence is a WPRE.
  • the lentiviral vector genome comprises at least two (suitably, at least 3, at least 4, at least 5) modified viral cis-acting sequences.
  • the lentiviral vector genome comprises a modified RRE as described herein and a modified WPRE as described herein.
  • the lentiviral vector genome comprises a modified RRE as described herein, a modified WPRE as described herein and a modified nucleotide sequence encoding gag as described herein.
  • the lentiviral vector genome as described herein lacks ATG sequences in the backbone of the vector genome. In one embodiment, the lentiviral vector genome as described herein lacks ATG sequences except in the NOI (i.e. transgene).
  • the lentiviral vector genome comprises at least one modified viral cis-acting sequence and/or a modified nucleotide sequence encoding gag, wherein at least one internal open reading frame (ORF) in the viral cis-acting sequence or in the nucleotide sequence encoding gag is ablated.
  • the lentiviral vector genome comprises at least one modified viral cis-acting sequence and/or a modified nucleotide sequence encoding gag, wherein at least one internal open reading frame (ORF) in the viral cis-acting sequence or in the nucleotide sequence encoding gag is silenced.
  • a further preferred but optional feature of the invention is the minimization of gag sequences included within the packaging sequences used in combination with the aforementioned features.
  • the amount of gag typically included within HIV-1 lentiviral vector packaging sequences can be reduced by at least 270 nucleotides, but may be reduced by up to the entire gag sequence.
  • the deleted gag nucleotide sequence may be that of the gag-p17 instability sequence. Deletion of the gag-p17 instability sequence typically results in reduced vector titers unless the first ATG codon of the remaining gag sequence is mutated.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 80 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 70 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 60 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 50 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 40 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 30 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 20 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein only the first 10 nucleotides of gag remain.
  • the reduced packaging sequences comprise deleted gag sequences wherein no nucleotides of gag remain.
  • nucleotide sequences of the invention comprise ablated gag sequences wherein the gag sequences comprise only up to the first 10, up to the first 20, up to the first 30, up to the first 40, up to the first 50, up to the first 60, up to the first 70, or up to the first 80 nucleotides of gag.
  • the nucleotide sequence encoding gag may be a truncated nucleotide sequence encoding a part of gag.
  • the nucleotide sequence encoding gag may be a minimal truncated nucleotide sequence encoding a part of gag.
  • the part of gag may be a contiguous sequence.
  • the truncated nucleotide sequence or minimal truncated nucleotide sequence encoding a part of gag may also contain at least one frameshift mutation.
  • An example truncated nucleotide sequence encoding a part of gag and which contains a frameshift mutation at position 45-46 is as follows:
  • the nucleotide sequence encoding gag may, for example, comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 9; or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 10.
  • the modified nucleotide sequence encoding gag may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 9; or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 10.
  • the modified nucleotide sequence encoding gag may comprise the sequence as set forth in SEQ ID NO: 9 or SEQ ID NO: 10, or a sequence having at least 80% identity (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) thereto, wherein at least one ATG sequence selected from (a) to (c) is mutated: a) ATG corresponding to positions 1-3 of SEQ ID NO: 9; b) ATG corresponding to positions 47-49 of SEQ ID NO: 9; and/or c) ATG corresponding to positions 107-109 of SEQ ID NO: 9.
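  • Mutating an ATG at a stated position, as in options (a) to (c) above, can be sketched as follows. The function name `mutate_atg_at` and the toy sequence are hypothetical; the real coordinates refer to SEQ ID NO: 9, which is not reproduced here. Positions are treated as 1-based, matching the numbering used in the text.

```python
# Illustrative sketch only: the toy sequence stands in for a real reference
# sequence, and the function name is an assumption.
from typing import Iterable


def mutate_atg_at(seq: str, starts: Iterable[int], replacement: str = "AAG") -> str:
    """Mutate the ATGs whose first base sits at the given 1-based positions."""
    if len(replacement) != 3:
        raise ValueError("replacement must be three nucleotides long")
    bases = list(seq.upper())
    for start in starts:
        i = start - 1  # convert 1-based position to 0-based index
        if "".join(bases[i:i + 3]) != "ATG":
            raise ValueError(f"no ATG at positions {start}-{start + 2}")
        bases[i:i + 3] = list(replacement)
    return "".join(bases)


toy = "ATGCCCATGTTT"  # hypothetical: ATGs at positions 1-3 and 7-9
print(mutate_atg_at(toy, [1]))
print(mutate_atg_at(toy, [1, 7], "TTG"))
```

  • The check that an ATG is actually present at each requested position guards against applying coordinates from one reference sequence to a different sequence.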
  • An example modified minimal truncated nucleotide sequence encoding a part of gag and which contains a frameshift mutation is as follows:
  • the modified nucleotide sequence encoding gag may comprise the sequence as set forth in SEQ ID NO: 11, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto.
  • the sequence may comprise less than three (suitably less than two or less than one) ATG sequences.
  • the modified nucleotide sequence encoding gag may comprise the sequence as set forth in SEQ ID NO: 12, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto.
  • the sequence may comprise less than two (suitably less than one) ATG sequences.
  • the modified nucleotide sequence encoding gag may comprise less than three ATG sequences.
  • the modified nucleotide sequence encoding gag may comprise less than two or less than one ATG sequence(s).
  • the modified nucleotide sequence encoding gag may lack an ATG sequence. Lentiviral vector genomes lacking a nucleotide sequence encoding Gag-p17 or a fragment thereof are described in PCT/GB2021/050620, incorporated herein by reference in its entirety.
  • the lentiviral vector genome as described herein may lack a nucleotide sequence encoding Gag-p17 or a fragment thereof.
  • the lentiviral vector genome may, for example, not express Gag-p17 or a fragment thereof.
  • the lentiviral vector genome may lack the sequence as set forth in SEQ ID NO: 13.
  • the viral protein Gag-p17 surrounds the capsid of the lentiviral vector particle, and is in turn surrounded by the envelope protein.
  • a nucleotide sequence encoding Gag-p17 has historically been included in lentiviral vector genomes for the production of therapeutic lentiviral vectors.
  • the nucleotide sequence encoding Gag-p17 present within lentiviral vector genomes is typically embedded within the packaging region containing highly structured RNA towards the 5' region of the RNA (the 5' UTR).
  • the nucleotide sequence encoding Gag-p17 typically comprises an RNA instability sequence (INS), herein referred to as p17-INS.
  • the lentiviral vector genome lacking a nucleotide sequence encoding Gag-p17 or p17-INS is of a smaller size compared to a lentiviral vector genome comprising a nucleotide sequence encoding Gag-p17 or p17-INS.
  • the amount of viral DNA contained within the viral vector backbone delivered in transduced cells is reduced when a lentiviral vector genome lacking a nucleotide sequence encoding Gag-p17 or p17-INS is used.
  • the lentiviral vector genome lacking a nucleotide sequence encoding Gag-p17 or p17-INS may be used to deliver a transgene of larger size than the transgenes which can be delivered using a lentiviral vector genome containing a nucleotide sequence encoding Gag-p17 or p17-INS. Therefore, there are several reasons why it may be desirable to delete the nucleotide sequence encoding Gag-p17 or p17-INS within the vector backbone. Deletion of gag sequences in order to reduce the size of lentiviral vector genome sequences has been reported (Sertkaya, H., et al., Sci Rep 11:12067 (2021)).
  • the lentiviral vector genome lacks either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17.
  • the lentiviral vector genome lacks a nucleotide sequence encoding p17-INS or a fragment thereof.
  • An example p17-INS is as follows:
  • the lentiviral vector genome may lack the sequence as set forth in SEQ ID NO: 13.
  • the fragment of a nucleotide sequence encoding Gag-p17 is a part of a full-length nucleotide sequence encoding Gag-p17.
  • the fragment comprises or consists of at least about 10 nucleotides (suitably at least about 20, at least about 30, at least about 40, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350 nucleotides).
  • the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is between 1% and 99% of a full-length nucleotide sequence encoding Gag-p17.
  • the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is at least about 10% (suitably at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) of a full-length nucleotide sequence encoding Gag-p17, such as a native nucleotide sequence encoding Gag-p17.
  • the fragment may
  • the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is between 1% and 99% of a full-length nucleotide sequence encoding p17-INS.
  • the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is at least about 10% (suitably at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) of a full-length nucleotide sequence encoding p17-INS, such as a native nucleotide sequence encoding p17-INS (e.
  • the fragment of a nucleotide sequence encoding Gag-p17 comprises or consists of the INS located in the nucleotide sequence encoding Gag-p17.
  • the lentiviral vector genome lacking either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17 comprises at least one modified viral cis-acting sequence as described herein.
  • the lentiviral vector genome lacking either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17 may comprise a modified RRE, a modified WPRE and/or a modified nucleotide sequence encoding gag as described herein.
  • the lentiviral vector genome lacking either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17 may comprise a modified RRE as described herein, a modified WPRE as described herein and a modified nucleotide sequence encoding gag as described herein.
  • the lentiviral vector genome comprises a modified Rev response element (RRE), wherein at least one internal open reading frame (ORF) in the RRE is disrupted as described herein.
  • the RRE is an essential viral RNA element that is well conserved across lentiviral vectors and across different wild-type HIV-1 isolates.
  • the RRE present within lentiviral vector genomes may contain multiple internal ORFs. These internal ORFs may be found between an internal ATG sequence of the RRE and the stop codon immediately 3' to the ATG sequence.
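  • The notion of an internal ORF described above, running from an internal ATG to the stop codon 3' of it, can be sketched in code. This is a simplified illustration under stated assumptions: the stop codon is taken to be the first in-frame TAA, TAG or TGA, only the forward strand is scanned, no minimum ORF length is applied, and the function name and toy sequence are hypothetical.

```python
# Illustrative sketch only: a naive forward-strand ORF scan on a toy sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_internal_orfs(seq: str) -> list:
    """Return (start, end) 0-based half-open spans from each ATG to the
    first in-frame stop codon; ATGs with no in-frame stop are skipped."""
    seq = seq.upper()
    orfs = []
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
        start = seq.find("ATG", start + 1)
    return orfs


toy = "CCATGAAATTTTAGGG"           # hypothetical fragment with one internal ORF
print(find_internal_orfs(toy))      # ORF spanning the ATG through the TAG
print(find_internal_orfs(toy.replace("ATG", "AAG")))  # ORF removed by mutation
```

  • In the toy example, mutating the ATG removes the only ORF start, which mirrors the ORF disruption by ATG mutation described in this section.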
  • the RRE present within lentiviral vector genomes is typically embedded within the packaging region containing highly structured RNA towards the 5' region of the RNA (the 5' UTR).
  • the 5' UTR structure consists of a series of stem-loop structures connected by small linkers. These stem-loops include the RRE.
  • the RRE itself has a complex secondary structure, involving complementary base-pairing, to which Rev binds. Modifications in the RRE to disrupt at least one internal ORF, for example by mutating the ATG sequence which denotes the start of the at least one internal ORF, are tolerated.
  • the modified RREs described herein retain Rev binding capacity.
  • the modified RRE may comprise less than eight ATG sequences.
  • the lentiviral vector genome comprises a modified Rev response element (RRE), wherein the modified RRE comprises less than eight ATG sequences.
  • the modified RRE may comprise less than seven, less than six, less than five, less than four, less than three, less than two or less than one ATG sequence(s).
  • the modified RRE may lack an ATG sequence.
  • the RRE may be a minimal functional RRE.
  • An example minimal functional RRE is as follows:
  • the RRE may be the core RRE.
  • An example core RRE is as follows:
  • By “minimal functional RRE” or “minimal RRE” is meant a truncated RRE sequence which retains the function of the full-length RRE. Thus, the minimal functional RRE retains Rev binding capacity.
  • the RRE may be a full-length RRE.
  • An example full-length RRE is as follows:
  • the RRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 14; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 15.
  • the modified RRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 14; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 15.
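  • The percentage identity thresholds recited above can be illustrated with a deliberately simplified calculation. This sketch assumes two sequences that are already aligned and of equal length, and counts matching positions only; in practice identity is normally determined with a sequence alignment tool that handles gaps. The function name and toy sequences are hypothetical.

```python
# Illustrative sketch only: ungapped, position-by-position identity between
# two pre-aligned sequences of equal length (a simplifying assumption).
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


reference = "ACGTACGTAC"  # hypothetical reference
variant = "ACGTACGTAA"    # hypothetical variant with one substitution
identity = percent_identity(reference, variant)
print(identity, identity >= 80.0)
```

  • A single ATG-to-AAG substitution changes one position, so for a reference of several hundred nucleotides the resulting identity under this simplified measure stays far above the 80% lower bound.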
  • the modified RRE may comprise the sequence as set forth in SEQ ID NO: 14 or SEQ ID NO: 15, or a sequence having at least 80% identity (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) thereto, wherein at least one ATG sequence selected from the group (a)-(h) is mutated: a) ATG corresponding to positions 27-29 of SEQ ID NO: 15; b) ATG corresponding to positions 192-194 of SEQ ID NO: 15; c) ATG corresponding to positions 207-209 of SEQ ID NO: 15; d) ATG corresponding to positions 436-438 of SEQ ID NO: 15; e) ATG corresponding to positions 489-491 of SEQ ID NO: 15; f) ATG corresponding to positions 571-573 of SEQ ID NO: 15; g) ATG corresponding to positions 599-60
  • An example modified RRE sequence is as follows:
  • a further example modified RRE sequence is as follows:
  • a further example modified RRE sequence is as follows:
  • An example of a modified RRE sequence lacking an ATG sequence is as follows:
  • the modified RRE may comprise the sequence as set forth in SEQ ID NO: 16 or SEQ ID NO: 17 or SEQ ID NO: 18, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto.
  • the sequence may comprise less than eight (suitably less than seven, less than six, less than five, less than four, less than three, less than two or less than one) ATG sequences.
  • WPRE Woodchuck hepatitis virus post-transcriptional regulatory element
  • the lentiviral vector genome comprises a modified Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE), wherein at least one internal open reading frame (ORF) in the WPRE is disrupted as described herein.
  • the WPRE can enhance expression from a number of different vector types including lentiviral vectors (U.S. Patent Nos. 6,136,597; 6,287,814; Zufferey, R., et al. (1999) J. Virol. 73: 2886-92).
  • this enhancement is thought to be due to improved RNA processing at the post-transcriptional level, resulting in increased levels of nuclear transcripts.
  • a two-fold increase in mRNA stability also contributes to this enhancement (Zufferey, R., et al. ibid).
  • the level of enhancement of protein expression from transcripts containing the WPRE versus those without the WPRE has been reported to be around 2-to-5 fold, and correlates well with the increase in transcript levels. This has been demonstrated with a number of different transgenes (Zufferey, R., et al. ibid).
  • the WPRE contains three cis-acting sequences important for its function in enhancing expression levels. In addition, it contains a fragment of approximately 180 bp comprising the 5'-end of the WHV X protein ORF (full length ORF is 425bp), together with its associated promoter.
  • the full-length X protein has been implicated in tumorigenesis (Flajolet, M. et al, (1998) J. Virol.
  • US 2005/0002907 discloses that mutation of a region of the WPRE corresponding to the X protein ORF ablates the tumorigenic activity of the X protein, thereby allowing the WPRE to be used safely in retroviral and lentiviral expression vectors to enhance expression levels of heterologous genes or nucleotides of interest.
  • the "X region" of the WPRE is defined as comprising at least the first 60 amino acids of the X protein ORF, including the translation initiation codon, and its associated promoter.
  • a "functional" X protein is defined herein as a truncated X protein that is capable of promoting tumorigenesis, or a transformed phenotype, when expressed in cells of interest.
  • a "non-functional" X protein in the context of this application is defined as an X protein that is incapable of promoting tumorigenesis in cells of interest.
  • the modified WPREs described herein retain the capacity to enhance expression from the lentiviral vector.
  • the modified WPRE may comprise less than seven ATG sequences.
  • the modified WPRE may comprise less than six ATG sequences.
  • the lentiviral vector genome comprises a modified Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE), wherein the modified WPRE comprises less than seven ATG sequences, preferably less than six ATG sequences.
  • the modified WPRE may comprise less than seven, less than six, less than five, less than four, less than three, less than two or less than one ATG sequence(s).
  • the modified WPRE may lack ATG sequences.
  • at least one ATG sequence in the X region of the WPRE is mutated, whereby expression of a functional X protein is prevented.
  • the mutation is in the translation initiation codon of the X region. As a result of the mutation of the at least one ATG sequence, the X protein may not be expressed.
  • the modified WPRE does not comprise a mutation in an ATG sequence in the X region of the WPRE.
  • the WPRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 19; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 20.
  • the modified WPRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 19; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 20.
  • the modified WPRE may comprise the sequence as set forth in SEQ ID NO: 19 or SEQ ID NO: 20, or a sequence having at least 80% identity (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) thereto, wherein at least one ATG sequence selected from the group (a)-(g) is mutated: a) ATG corresponding to positions 53-55 of SEQ ID NO: 19; b) ATG corresponding to positions 72-74 of SEQ ID NO: 19; c) ATG corresponding to positions 91-93 of SEQ ID NO: 19; d) ATG corresponding to positions 104-106 of SEQ ID NO: 19; e) ATG corresponding to positions 121-123 of SEQ ID NO: 19; f) ATG corresponding to positions 170-172 of SEQ ID NO: 19; and/or g) ATG corresponding to
  • the WPRE typically contains a retained Pol ORF.
  • An example retained Pol ORF sequence is as follows: In one embodiment, at least one (suitably at least two or at least three) ATG sequence within the retained Pol ORF sequence in the WPRE is mutated. In one embodiment, all ATG sequences within the retained Pol ORF sequence in the WPRE are mutated.
  • the modified WPRE comprises less than three (suitably less than two or less than one) ATG sequences in the retained Pol ORF sequence in the WPRE. In one embodiment, the modified WPRE lacks an ATG sequence in the retained Pol ORF sequence in the WPRE.
  • the modified WPRE may comprise the sequence as set forth in SEQ ID NO: 22 or SEQ ID NO: 23, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto.
  • the sequence may comprise less than six (suitably less than five, less than four, less than three, less than two or less than one) ATG sequences.
  • a vector is a tool that allows or facilitates the transfer of an entity from one environment to another.
  • some vectors used in recombinant nucleic acid techniques allow entities, such as a segment of nucleic acid (e.g. a heterologous DNA segment, such as a heterologous cDNA segment), to be transferred into and expressed by a target cell.
  • the vector may facilitate the integration of the nucleotide sequence encoding a viral vector component to maintain the nucleotide sequence encoding the viral vector component and its expression within the target cell.
  • the vector may be or may include an expression cassette (also termed an expression construct).
  • Expression cassettes as described herein comprise regions of nucleic acid containing sequences capable of being transcribed. Thus, sequences encoding mRNA, tRNA and rRNA are included within this definition.
  • the vector may contain one or more selectable marker genes (e.g. a neomycin resistance gene) and/or traceable marker gene(s) (e.g. a gene encoding green fluorescent protein (GFP)).
  • Vectors may be used, for example, to infect and/or transduce a target cell.
  • the vector may further comprise a nucleotide sequence enabling the vector to replicate in the host cell in question, such as a conditionally replicating oncolytic vector.
  • The term “cassette” - which is synonymous with terms such as “conjugate”, “construct” and “hybrid” - includes a polynucleotide sequence directly or indirectly attached to a promoter.
  • the cassette comprises at least a polynucleotide sequence operably linked to a promoter.
  • expression cassettes for use in the invention may comprise a promoter for the expression of the nucleotide sequence encoding a viral vector component and optionally a regulator of the nucleotide sequence encoding the viral vector component.
  • the choice of expression cassette, e.g. plasmid, cosmid, virus or phage vector, will often depend on the host cell into which it is to be introduced.
  • the expression cassette can be a DNA plasmid (supercoiled, nicked or linearised), minicircle DNA (linear or supercoiled), plasmid DNA containing just the regions of interest by removal of the plasmid backbone by restriction enzyme digestion and purification, DNA generated using an enzymatic DNA amplification platform e.g. doggybone DNA (dbDNA™) where the final DNA used is in a closed ligated form or where it has been prepared (e.g. restriction enzyme digestion) to have open cut ends.
  • the invention provides a viral vector production system comprising a set of nucleotide sequences, wherein the nucleotide sequences comprise nucleotide sequences encoding vector components including gag-pol, env, optionally rev, and a nucleotide sequence of the invention.
  • the invention provides a cell comprising the nucleotide sequence of the invention, the expression cassette of the invention, or the vector production system of the invention.
  • the invention provides a cell for producing lentiviral vectors comprising:
  • nucleotide sequences encoding vector components including gag-pol and env, and optionally rev, and the nucleotide sequence of the invention or the expression cassette of the invention;
  • the splicing activity from the major splice donor site and/or splice donor region of the RNA genome of the lentiviral vector is suppressed or ablated.
  • the splicing activity from the major splice donor site and/or splice donor region of the RNA genome of the lentiviral vector is suppressed or ablated during lentiviral vector production.
  • the invention provides a method for producing a lentiviral vector, comprising the steps of:
  • nucleotide sequences encoding vector components including gag-pol and env, and optionally rev, and the nucleotide sequence of the invention or the expression cassette of the invention;
  • the invention provides a lentiviral vector produced by the method of the invention.
  • the lentiviral vector comprises the RNA genome of the lentiviral vector as described herein.
  • the lentiviral vector genome comprises a modified 3' LTR and/or a modified 5' LTR as described herein.
  • the invention provides the use of the nucleotide sequence of the invention, the expression cassette of the invention, the viral vector production system of the invention, or the cell of the invention, for producing a lentiviral vector.
  • a lentiviral vector production system comprises a set of nucleotide sequences encoding the components required for production of the lentiviral vector. Accordingly, a vector production system comprises a set of nucleotide sequences which encode the viral vector components necessary to generate lentiviral vector particles.
  • “Viral vector production system” or “vector production system” or “production system” is to be understood as a system comprising the necessary components for viral vector production.
  • the viral vector production system comprises nucleotide sequences encoding Gag and Gag/Pol proteins, and Env protein and the vector genome sequence.
  • the production system may optionally comprise a nucleotide sequence encoding the Rev protein, or functional substitute thereof.
  • the viral vector production system comprises modular nucleic acid constructs (modular constructs).
  • a modular construct is a DNA expression construct comprising two or more nucleic acids used in the production of lentiviral vectors.
  • a modular construct can be a DNA plasmid comprising two or more nucleic acids used in the production of lentiviral vectors.
  • the plasmid may be a bacterial plasmid.
  • the nucleic acids can encode, for example, gag-pol, rev, env and the vector genome.
  • modular constructs designed for generation of packaging and producer cell lines may additionally need to encode transcriptional regulatory proteins (e.g. TetR, CymR) and/or translational repression proteins (e.g. TRAP) and selectable markers (e.g.
  • Suitable modular constructs for use in the present invention are described in EP 3502260, which is hereby incorporated by reference in its entirety.
  • the safety profile of these modular constructs has been considered and additional safety features directly engineered into the constructs. These features include the use of insulators for multiple open reading frames of retroviral vector components and/or the specific orientation and arrangement of the retroviral genes in the modular constructs. It is believed that by using these features the direct read-through to generate replication-competent viral particles will be prevented.
  • the nucleic acid sequences encoding the viral vector components may be in reverse and/or alternating transcriptional orientations in the modular construct.
  • the nucleic acid sequences encoding the viral vector components are not presented in the same 5' to 3' orientation, such that the viral vector components cannot be produced from the same mRNA molecule.
  • the reverse orientation may mean that at least two coding sequences for different vector components are presented in the 'head-to-head' and 'tail-to-tail' transcriptional orientations. This may be achieved by providing the coding sequence for one vector component, e.g. env, on one strand and the coding sequence for another vector component, e.g. rev, on the opposing strand of the modular construct.
  • each component may be orientated such that it is present in the opposite 5' to 3' orientation to all of the adjacent coding sequence(s) for other vector components to which it is adjacent, i.e. alternating 5' to 3' (or transcriptional) orientations for each coding sequence may be employed.
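  • The reverse and alternating transcriptional orientations described above can be sketched as a simple check on an ordered list of components. The representation (a name paired with a "+" or "-" strand), the function names and the example layouts are hypothetical. The labelling of pairs follows the conventional reading that divergent transcription is head-to-head and convergent transcription is tail-to-tail, which is an assumption about how the terms are used here.

```python
# Illustrative sketch only: components are modelled as (name, strand) pairs
# in their 5'-to-3' order along the construct; layouts are hypothetical.
def pair_arrangement(first_strand: str, second_strand: str) -> str:
    """Classify two adjacent coding sequences by relative orientation."""
    if first_strand == second_strand:
        return "tandem"
    # "-" then "+" transcribes outwards from a shared middle (divergent).
    return "head-to-head" if first_strand == "-" else "tail-to-tail"


def is_alternating(layout: list) -> bool:
    """True if every coding sequence is opposite to each of its neighbours."""
    return all(a[1] != b[1] for a, b in zip(layout, layout[1:]))


layout = [("gag-pol", "+"), ("rev", "-"), ("env", "+")]
print(is_alternating(layout))
print([pair_arrangement(a[1], b[1]) for a, b in zip(layout, layout[1:])])
```

  • A layout that passes `is_alternating` contains no tandem pair, which corresponds to the stated aim that adjacent vector components cannot be produced from the same mRNA molecule.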
  • the modular construct for use according to the present invention may comprise nucleic acid sequences encoding two or more of the following vector components: gag-pol, rev, env, vector genome.
  • the modular construct may comprise nucleic acid sequences encoding any combination of the vector components.
  • the modular construct may comprise nucleic acid sequences encoding: i) the RNA genome of the retroviral vector and rev, or a functional substitute thereof; ii) the RNA genome of the retroviral vector and gag-pol; iii) the RNA genome of the retroviral vector and env; iv) gag-pol and rev, or a functional substitute thereof; v) gag-pol and env; vi) env and rev, or a functional substitute thereof; vii) the RNA genome of the retroviral vector, rev, or a functional substitute thereof, and gag-pol; viii) the RNA genome of the retroviral vector, rev, or a functional substitute thereof, and env; ix) the RNA genome of the retroviral vector, gag-pol and env; or x) gag-pol, rev, or a functional substitute thereof, and env, wherein the nucleic acid sequences are in reverse and/or alternating orientations.
  • a cell for producing lentiviral vectors may comprise nucleic acid sequences encoding any one of the combinations i) to x) above, wherein the nucleic acid sequences are located at the same genetic locus and are in reverse and/or alternating orientations.
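Combinations i) to x) above correspond to every choice of two or three of the four vector components. The following is a minimal illustrative sketch in Python, not part of the specification: the component labels, the function names and the "+"/"-" strand notation are assumptions introduced here for illustration only.

```python
from itertools import combinations

# Hypothetical labels for the four vector components named in the text.
COMPONENTS = ["genome", "rev", "gag-pol", "env"]


def component_combinations(components=COMPONENTS):
    """Return every combination of two or three vector components."""
    combos = []
    for size in (2, 3):
        combos.extend(combinations(components, size))
    return combos


def alternating_orientations(combo, first="+"):
    """Place adjacent coding sequences on alternating strands.

    "+" and "-" are an assumed shorthand for the two strands of the
    modular construct; neighbouring components never share a strand.
    """
    flip = {"+": "-", "-": "+"}
    strand = first
    layout = []
    for name in combo:
        layout.append((name, strand))
        strand = flip[strand]
    return layout


if __name__ == "__main__":
    for combo in component_combinations():
        print(alternating_orientations(combo))
```

Under these assumptions the enumeration yields six pairs and four triples, matching the ten listed combinations.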
  • the same genetic locus may refer to a single extrachromosomal locus in the cell, e.g. a single plasmid, or a single locus (i.e. a single insertion site) in the genome of the cell.
  • the cell may be a stable or transient cell for producing lentiviral vectors. In one aspect the cell does not comprise tat.
  • the DNA expression construct can be a DNA plasmid (supercoiled, nicked or linearised), minicircle DNA (linear or supercoiled), plasmid DNA containing just the regions of interest by removal of the plasmid backbone by restriction enzyme digestion and purification, DNA generated using an enzymatic DNA amplification platform e.g. doggybone DNA (dbDNA™) where the final DNA used is in a closed ligated form or where it has been prepared (e.g. restriction enzyme digestion) to have open cut ends.
  • dbDNA™ doggybone DNA
  • the lentiviral vector is derived from HIV-1, HIV-2, SIV, FIV, BIV, EIAV, CAEV or Visna lentivirus.
  • a "viral vector production cell", "vector production cell", or "production cell" is to be understood as a cell that is capable of producing a lentiviral vector or lentiviral vector particle.
  • Lentiviral vector production cells may be "producer cells" or "packaging cells".
  • One or more DNA constructs of the viral vector system may be either stably integrated or episomally maintained within the viral vector production cell. Alternatively, all the DNA components of the viral vector system may be transiently transfected into the viral vector production cell. In yet another alternative, a production cell stably expressing some of the components may be transiently transfected with the remaining components required for vector production.
  • packaging cell refers to a cell which contains the elements necessary for production of lentiviral vector particles but which lacks the vector genome.
  • packaging cells contain one or more expression cassettes which are capable of expressing viral structural proteins (such as gag, gag/pol and env).
  • Producer cells/packaging cells can be of any suitable cell type. Producer cells are generally mammalian cells but can be derived from other organisms, e.g. insect cells.
  • the term "producer cell" or "vector producing/producer cell" refers to a cell which contains all the elements necessary for production of lentiviral vector particles.
  • the producer cell may be either a stable producer cell line or derived transiently or may be a stable packaging cell wherein the lentiviral genome is transiently expressed.
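The distinction between packaging cells and producer cells can be sketched as a simple classification. This is a minimal illustration in Python under stated assumptions: the component labels, the function name and the default set of packaging functions are hypothetical, and the required set would differ between vector systems (for example, rev may also be needed).

```python
# Hypothetical labels; which packaging functions are required is an
# assumption and depends on the vector system in use.
PACKAGING_FUNCTIONS = frozenset({"gag-pol", "env"})
VECTOR_GENOME = "genome"


def classify_production_cell(components, packaging_functions=PACKAGING_FUNCTIONS):
    """Classify a cell by the vector components it contains.

    A packaging cell has the packaging functions but lacks the vector
    genome; a producer cell has all elements needed to make vector.
    """
    present = set(components)
    if not set(packaging_functions) <= present:
        return "incomplete"
    if VECTOR_GENOME in present:
        return "producer cell"
    return "packaging cell"


if __name__ == "__main__":
    print(classify_production_cell({"gag-pol", "env"}))
    print(classify_production_cell({"gag-pol", "env", "genome"}))
```

Adding the vector genome to a packaging cell, stably or transiently, moves it into the producer category in this model.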
  • the vector components may include gag, env, rev and/or the RNA genome of the lentiviral vector when the viral vector is a lentiviral vector.
  • the nucleotide sequences encoding vector components may be introduced into the cell either simultaneously or sequentially in any order.
  • the vector production cells may be cells cultured in vitro such as a tissue culture cell line.
  • suitable production cells or cells for producing a lentiviral vector are those cells which are capable of producing viral vectors or viral vector particles when cultured under appropriate conditions.
  • the cells typically comprise nucleotide sequences encoding vector components, which may include gag, env, rev and the RNA genome of the lentiviral vector.
  • Suitable cell lines include, but are not limited to, mammalian cells such as murine fibroblast derived cell lines or human cell lines.
  • the vector production cells are derived from a human cell line. Accordingly, such suitable production cells may be employed in any of the methods or uses of the present invention.
  • nucleotide sequences are well known in the art and have been described previously.
  • introduction into a cell of nucleotide sequences encoding vector components including gag, env, rev and the RNA genome of the lentiviral vector is within the capabilities of a person skilled in the art.
  • Stable production cells may be packaging or producer cells.
  • the vector genome DNA construct may be introduced stably or transiently.
  • Packaging/producer cells can be generated by transducing a suitable cell line with a retroviral vector which expresses one of the components of the vector, i.e. a genome, the gag-pol components and an envelope as described in WO 2004/022761.
  • the nucleotide sequence can be transfected into cells and then integration into the production cell genome occurs infrequently and randomly.
  • the transfection methods may be performed using methods well known in the art. For example, a stable transfection process may employ constructs which have been engineered to aid concatemerisation.
  • the transfection process may be performed using calcium phosphate or commercially available formulations such as Lipofectamine™ 2000CD (Invitrogen, CA), FuGENE® HD or polyethylenimine (PEI).
  • nucleotide sequences may be introduced into the production cell via electroporation.
  • the skilled person will be aware of methods to encourage integration of the nucleotide sequences into production cells. For example, linearising a nucleic acid construct can help if it is naturally circular. Less random integration methodologies may involve the nucleic acid construct comprising areas of shared homology with the endogenous chromosomes of the mammalian host cell to guide integration to a selected site within the endogenous genome.
  • the nucleic acid construct may contain a loxP site which allows for targeted integration when combined with Cre recombinase (i.e. using the Cre/lox system derived from P1 bacteriophage).
  • the recombination site is an att site (e.g. from λ phage), wherein the att site permits site-directed integration in the presence of a lambda integrase. This would allow the lentiviral genes to be targeted to a locus within the host cellular genome which allows for high and/or stable expression.
  • DSB double strand break
  • NHEJ non-homologous end joining
  • Cleavage can occur through the use of specific nucleases such as engineered zinc finger nucleases (ZFN), transcription-activator like effector nucleases (TALENs), using CRISPR/Cas9 systems with an engineered crRNA/tracr RNA ('single guide RNA') to guide specific cleavage, and/or using nucleases based on the Argonaute system (e.g., from T. thermophilus).
  • ZFN zinc finger nucleases
  • TALENs transcription-activator like effector nucleases
  • Packaging/producer cell lines can be generated by integration of nucleotide sequences using methods of just lentiviral transduction or just nucleic acid transfection, or a combination of both can be used. Methods for generating retroviral vectors from production cells and in particular the processing of retroviral vectors are described in WO 2009/153563.
  • the production cell may comprise the RNA-binding protein (e.g. tryptophan RNA-binding attenuation protein, TRAP) and/or the Tet Repressor (TetR) protein or alternative regulatory proteins (e.g. CymR).
  • TetR Tet Repressor
  • Production of lentiviral vector from production cells can be via transfection methods, via production from stable cell lines, which can include induction steps (e.g. doxycycline induction), or via a combination of both.
  • the transfection methods may be performed using methods well known in the art, and examples have been described previously.
  • Production cells, either packaging or producer cell lines or those transiently transfected with the lentiviral vector encoding components, are cultured to increase cell and virus numbers and/or virus titres.
  • Culturing a cell is performed to enable it to metabolize, and/or grow and/or divide and/or produce viral vectors of interest according to the invention. This can be accomplished by methods well known to persons skilled in the art, and includes but is not limited to providing nutrients for the cell, for instance in the appropriate culture media. The methods may comprise growth adhering to surfaces, growth in suspension, or combinations thereof.
  • Culturing can be done for instance in tissue culture flasks, tissue culture multiwell plates, dishes, roller bottles, wave bags or in bioreactors, using batch, fed-batch, continuous systems and the like.
  • cells are initially 'bulked up' in tissue culture flasks or bioreactors and subsequently grown in multi-layered culture vessels or large bioreactors (greater than 50L) to generate the vector producing cells for use in the present invention.
  • cells are grown in a suspension mode to generate the vector producing cells for use in the present invention.
  • Lentiviruses are part of a larger group of retroviruses. A detailed list of lentiviruses may be found in Coffin et al (1997) "Retroviruses" Cold Spring Harbor Laboratory Press (Eds: JM Coffin, SM Hughes, HE Varmus), pp 758-763. In brief, lentiviruses can be divided into primate and non-primate groups. Examples of primate lentiviruses include but are not limited to: the human immunodeficiency virus (HIV), the causative agent of acquired immunodeficiency syndrome (AIDS), and the simian immunodeficiency virus (SIV).
  • HIV human immunodeficiency virus
  • AIDS acquired immunodeficiency syndrome
  • SIV simian immunodeficiency virus
  • the non-primate lentiviral group includes the prototype "slow virus" visna/maedi virus (VMV), as well as the related caprine arthritis-encephalitis virus (CAEV), equine infectious anaemia virus (EIAV), feline immunodeficiency virus (FIV), Maedi visna virus (MVV) and bovine immunodeficiency virus (BIV).
  • VMV visna/maedi virus
  • CAEV caprine arthritis-encephalitis virus
  • EIAV equine infectious anaemia virus
  • FIV feline immunodeficiency virus
  • MVV Maedi visna virus
  • BIV bovine immunodeficiency virus
  • the lentiviral vector is derived from HIV-1, HIV-2, SIV, FIV, BIV, EIAV, CAEV or Visna lentivirus.
  • the lentivirus family differs from other retroviruses in that lentiviruses have the capability to infect both dividing and non-dividing cells (Lewis et al (1992) EMBO J 11 (8): 3053-3058 and Lewis and Emerman (1994) J Virol 68 (1):510-516).
  • in contrast, other retroviruses, such as MLV, are unable to infect non-dividing or slowly dividing cells such as those that make up, for example, muscle, brain, lung and liver tissue.
  • a lentiviral vector is a vector which comprises at least one component part derivable from a lentivirus.
  • that component part is involved in the biological mechanisms by which the vector infects or transduces target cells and expresses a nucleotide of interest (NOI), or nucleotides of interest.
  • NOI nucleotide of interest
  • the lentiviral vector may be used to replicate the NOI in a compatible target cell in vitro.
  • a method of making proteins in vitro by introducing a vector of the invention into a compatible target cell in vitro and growing the target cell under conditions which result in expression of the NOI. Protein and NOI may be recovered from the target cell by methods well known in the art.
  • Suitable target cells include mammalian cell lines and other eukaryotic cell lines.
  • the vectors may have "insulators" - genetic sequences that block the interaction between promoters and enhancers, and act as a barrier reducing read-through from an adjacent gene.
  • the insulator is present between one or more of the lentiviral nucleic acid sequences to prevent promoter interference and read-through from adjacent genes. If the insulators are present in the vector between one or more of the lentiviral nucleic acid sequences, then each of these insulated genes may be arranged as individual expression units.
  • the basic structure of retroviral and lentiviral genomes shares many common features such as a 5' LTR and a 3' LTR, between or within which are located a packaging signal to enable the genome to be packaged, a primer binding site, integration sites to enable integration into a target cell genome and gag/pol and env genes encoding the packaging components - these are polypeptides required for the assembly of viral particles.
  • Lentiviruses have additional features, such as the rev gene and RRE sequences in HIV, which enable the efficient export of RNA transcripts of the integrated provirus from the nucleus to the cytoplasm of an infected target cell.
  • LTRs long terminal repeats
  • the LTRs are responsible for proviral integration, and transcription. LTRs also serve as enhancer-promoter sequences and can control the expression of the viral genes.
  • the LTRs themselves are identical sequences that can be divided into three elements, which are called U3, R and U5.
  • U3 is derived from the sequence unique to the 3' end of the RNA.
  • R is derived from a sequence repeated at both ends of the RNA and
  • U5 is derived from the sequence unique to the 5' end of the RNA.
  • the sizes of the three elements can vary considerably among different retroviruses.
  • At least part of one or more protein coding regions essential for replication may be removed from the virus; for example, gag/pol and env may be absent or not functional. This makes the viral vector replication-defective.
  • the lentiviral vector may be derived from either a primate lentivirus (e.g. HIV-1) or a nonprimate lentivirus (e.g. EIAV).
  • a typical retroviral vector production system involves the separation of the viral genome from the essential viral packaging functions. These viral vector components are normally provided to the production cells on separate DNA expression cassettes (alternatively known as plasmids, expression plasmids, DNA constructs or expression constructs).
  • the vector genome comprises the NOI.
  • Vector genomes typically require a packaging signal (ψ), the internal expression cassette harbouring the NOI, (optionally) a post-transcriptional element (PRE), typically a central polypurine tract (cppt), the 3'-ppu and a self-inactivating (SIN) LTR.
  • PRE post-transcriptional element
  • cppt central polypurine tract
  • SIN self-inactivating
  • the R-U5 regions are required for correct polyadenylation of both the vector genome RNA and NOI mRNA, as well as the process of reverse transcription.
  • the vector genome may optionally include an open reading frame, as described in WO 2003/064665, which allows for vector production in the absence of rev.
  • the packaging functions include the gag/pol and env genes. These are required for the production of vector particles by the production cell. Providing these functions in trans to the genome facilitates the production of replication-defective viral vectors.
  • Production systems for gamma-retroviral vectors are typically 3-component systems requiring genome, gag/pol and env expression constructs.
  • Production systems for HIV-1-based lentiviral vectors may additionally require the accessory gene rev to be provided and for the vector genome to include the rev-responsive element (RRE).
  • RRE rev-responsive element
  • EIAV-based lentiviral vectors do not require rev to be provided in trans if an open-reading frame (ORF) is present within the genome (see WO 2003/064665).
  • both the "external" promoter (which drives the vector genome cassette) and “internal” promoter (which drives the NOI cassette) encoded within the vector genome cassette are strong eukaryotic or virus promoters, as are those driving the other vector system components.
  • promoters include CMV, EF1a, PGK, CAG, TK, SV40 and Ubiquitin promoters.
  • Strong 'synthetic' promoters, such as those generated by DNA libraries e.g. JeT promoter may also be used to drive transcription.
  • tissue-specific promoters such as rhodopsin (Rho), rhodopsin kinase (RhoK), cone-rod homeobox containing gene (CRX), neural retina-specific leucine zipper protein (NRL), Vitelliform Macular Dystrophy 2 (VMD2), Tyrosine hydroxylase, neuron-specific enolase (NSE) promoter, astrocyte-specific glial fibrillary acidic protein (GFAP) promoter, human α1-antitrypsin (hAAT) promoter, phosphoenolpyruvate carboxykinase (PEPCK), liver fatty acid binding protein promoter, Flt-1 promoter, IFN-β promoter, Mb promoter, SP-B promoter, SYN1 promoter, WASP promoter, SV40/hAlb promoter, SV40/CD43, SV40/CD45, NSE/RU5' promoter, ICAM-2 promoter, GPII
  • Production of retroviral vectors involves either the transient co-transfection of the production cells with these DNA components or use of stable production cell lines wherein all the components are stably integrated within the production cell genome (e.g. Stewart HJ, Fong-Wong L, Strickland I, Chipchase D, Kelleher M, Stevenson L, Thoree V, McCarthy J, Ralph GS, Mitrophanous KA and Radcliffe PA. (2011). Hum Gene Ther. Mar; 22 (3):357-69).
  • An alternative approach is to use a stable packaging cell (into which the packaging components are stably integrated) and then transiently transfect in the vector genome plasmid as required (e.g. Stewart, H. J., M. A. Leroux-Carlucci, C.
  • packaging cell lines could be generated (just one or two packaging components are stably integrated into the cell lines) and to generate vector the missing components are transiently transfected.
  • the production cell may also express regulatory proteins such as a member of the tet repressor (TetR) protein group of transcription regulators (e.g. T-Rex, Tet-On, and Tet-Off), a member of the cumate inducible switch system group of transcription regulators (e.g. cumate repressor (CymR) protein), or an RNA-binding protein (e.g. TRAP - tryptophan-activated RNA-binding protein).
  • TetR tet repressor
  • CymR cumate repressor
  • RNA-binding protein e.g. TRAP - tryptophan-activated RNA-binding protein
  • the viral vector is derived from EIAV.
  • EIAV has the simplest genomic structure of the lentiviruses and is particularly preferred for use in the present invention.
  • EIAV encodes three other genes: tat, rev, and S2.
  • Tat acts as a transcriptional activator of the viral LTR (Derse and Newbold (1993) Virology 194(2): 530-536 and Maury et al (1994) Virology 200(2):632-642) and rev regulates and coordinates the expression of viral genes through rev-response elements (RRE) (Martarano et al. (1994) J Virol 68(5):3102-3111).
  • RRE rev-response elements
  • the viral vector is derived from HIV: HIV differs from EIAV in that it does not encode S2 but unlike EIAV it encodes vif, vpr, vpu and nef.
  • RRV retroviral or lentiviral vector
  • RRV refers to a vector with sufficient retroviral genetic information to allow packaging of an RNA genome, in the presence of packaging components, into a viral particle capable of transducing a target cell. Transduction of the target cell may include reverse transcription and integration into the target cell genome.
  • the RRV carries non-viral coding sequences which are to be delivered by the vector to the target cell.
  • a RRV is incapable of independent replication to produce infectious retroviral particles within the target cell.
  • the RRV lacks a functional gag/pol and/or env gene, and/or other genes essential for replication.
  • the RRV vector of the present invention has a minimal viral genome.
  • minimal viral genome means that the viral vector has been manipulated so as to remove the non-essential elements whilst retaining the elements essential to provide the required functionality to infect, transduce and deliver a NOI to a target cell. Further details of this strategy can be found in WO 1998/17815 and WO 99/32646.
  • a minimal EIAV vector lacks tat, rev and S2 genes and neither are these genes provided in trans in the production system.
  • a minimal HIV vector lacks vif, vpr, vpu, tat and nef.
  • the expression plasmid used to produce the vector genome within a production cell may include transcriptional regulatory control sequences operably linked to the retroviral genome to direct transcription of the genome in a production cell/packaging cell. All 3rd generation lentiviral vectors are deleted in the 5' U3 enhancer-promoter region, and transcription of the vector genome RNA is driven by a heterologous promoter such as another viral promoter, for example the CMV promoter, as discussed below. This feature enables vector production independently of tat. Some lentiviral vector genomes require additional sequences for efficient virus production. For example, particularly in the case of HIV, RRE sequences may be included. However, the requirement for RRE on the (separate) GagPol cassette (and dependence on rev which is provided in trans) may be reduced or eliminated by codon optimisation of the GagPol ORF. Further details of this strategy can be found in WO 2001/79518.
  • a functional analogue of the rev/RRE system is found in the Mason-Pfizer monkey virus. This is known as the constitutive transport element (CTE) and comprises an RRE-type sequence in the genome which is believed to interact with a factor in the infected cell. The cellular factor can be thought of as a rev analogue. Thus, CTE may be used as an alternative to the rev/RRE system.
  • CTE constitutive transport element
  • Any other functional equivalents of the Rev protein which are known or become available may be relevant to the invention.
  • the Rex protein of HTLV-I can functionally replace the Rev protein of HIV-1.
  • Rev and RRE may be absent or non-functional in the vector for use in the methods of the present invention; in the alternative, rev and RRE, or a functionally equivalent system, may be present.
  • 'rev' may refer to a sequence encoding the HIV-1 Rev protein or a sequence encoding any functional equivalent thereof.
  • the invention provides a viral vector production system and/or a cell comprising a set of nucleotide sequences, wherein the nucleotide sequences encode vector components including gag-pol, env, optionally rev, and the nucleotide sequences of any of the preceding claims.
  • the lentiviral vectors as described herein may be used in a self-inactivating (SIN) configuration in which the viral enhancer and promoter sequences have been deleted.
  • SIN vectors can be generated and transduce non-dividing target cells in vivo, ex vivo or in vitro with an efficacy similar to that of non-SIN vectors.
  • the transcriptional inactivation of the long terminal repeat (LTR) in the SIN provirus should prevent mobilisation of vRNA, and is a feature that further diminishes the likelihood of formation of replication-competent virus. This should also enable the regulated expression of genes from internal promoters by eliminating any cis-acting effects of the LTR.
  • LTR long terminal repeat
  • self-inactivating retroviral vector systems have been constructed by deleting the transcriptional enhancers or the enhancers and promoter in the U3 region of the 3' LTR. After a round of vector reverse transcription and integration, these changes are copied into both the 5' and the 3' LTRs producing a transcriptionally inactive provirus. However, any promoter(s) internal to the LTRs in such vectors will still be transcriptionally active.
  • This strategy has been employed to eliminate effects of the enhancers and promoters in the viral LTRs on transcription from internally placed genes. Such effects include increased transcription or suppression of transcription. This strategy can also be used to eliminate downstream transcription from the 3' LTR into genomic DNA.
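The way a U3 deletion made in the 3' LTR ends up in both proviral LTRs can be sketched as follows. This is a deliberately simplified model in Python, offered as an illustration only: the function names and the string labels for the U3 state are assumptions, and the model captures only the statement that each proviral LTR is U3-R-U5, with U3 derived from the 3' end of the RNA and U5 from the 5' end.

```python
# Assumed labels for the state of the U3 region in the vector genome RNA.
WILD_TYPE_U3 = "enhancer-promoter"
SIN_U3 = "deleted"


def proviral_ltrs(rna_u3, rna_r="R", rna_u5="U5"):
    """Build both proviral LTRs from the elements of the genome RNA.

    After reverse transcription and integration each LTR is U3-R-U5,
    so the single U3 carried at the 3' end of the RNA is copied into
    both the 5' and the 3' LTR of the provirus.
    """
    ltr = (rna_u3, rna_r, rna_u5)
    return {"5' LTR": ltr, "3' LTR": ltr}


def ltr_transcriptionally_active(ltr):
    """An LTR drives transcription only if its U3 is intact."""
    u3, _r, _u5 = ltr
    return u3 == WILD_TYPE_U3


if __name__ == "__main__":
    provirus = proviral_ltrs(SIN_U3)
    for name, ltr in provirus.items():
        print(name, ltr, ltr_transcriptionally_active(ltr))
```

In this model a SIN genome gives a provirus in which neither LTR is transcriptionally active, while any internal promoter is unaffected because it is not part of the LTR.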
  • gag/pol and/or env may be mutated and/or not functional.
  • In a typical lentiviral vector as described herein, at least part of one or more coding regions for proteins essential for virus replication may be removed from the vector. This makes the viral vector replication-defective. Portions of the viral genome may also be replaced by a NOI in order to generate a vector comprising an NOI which is capable of transducing a non-dividing target cell and/or integrating its genome into the target cell genome.
  • the lentiviral vectors are non-integrating vectors as described in WO 2006/010834 and WO 2007/071994.
  • the vectors have the ability to deliver a sequence which is devoid of or lacking viral RNA.
  • a heterologous binding domain (heterologous to gag) located on the RNA to be delivered and a cognate binding domain on Gag or GagPol can be used to ensure packaging of the RNA to be delivered. Both of these vectors are described in WO 2007/072056.
  • Polynucleotides of the invention may comprise DNA or RNA. They may be single-stranded or double-stranded. A nucleotide, or nucleotides, of interest is/are commonly referred to as NOI. It will be understood by a skilled person that numerous different polynucleotides can encode the same polypeptide as a result of the degeneracy of the genetic code. In addition, it is to be understood that skilled persons may, using routine techniques, make nucleotide substitutions that do not affect the polypeptide sequence encoded by the polynucleotides of the invention to reflect the codon usage of any particular host organism in which the polypeptides of the invention are to be expressed.
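The degeneracy of the genetic code can be made concrete by counting how many distinct coding sequences encode a given polypeptide. The sketch below, in Python, uses the number of codons per amino acid in the standard genetic code; the function and table names are illustrative assumptions, and stop codons are left out of the count.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons in total).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(peptide):
    """Count the coding sequences that encode the given peptide.

    The count is the product of the codon choices at each position;
    an unknown residue letter raises KeyError.
    """
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide.upper())


if __name__ == "__main__":
    for peptide in ("MW", "ML", "LSR"):
        print(peptide, count_synonymous_sequences(peptide))
```

Even a three-residue peptide such as Leu-Ser-Arg has 216 possible coding sequences, which is the freedom that codon optimisation for a particular host organism relies on.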
  • polynucleotides may be modified by any method available in the art. Such modifications may be carried out in order to enhance the in vivo activity or lifespan of the polynucleotides of the invention.
  • Polynucleotides such as DNA polynucleotides may be produced recombinantly, synthetically or by any means available to those of skill in the art. They may also be cloned by standard techniques.
  • Longer polynucleotides will generally be produced using recombinant means, for example using polymerase chain reaction (PCR) cloning techniques. This will involve making a pair of primers (e.g. of about 15 to 30 nucleotides) flanking the target sequence which it is desired to clone, bringing the primers into contact with mRNA or cDNA obtained from an animal or human cell, performing PCR under conditions which bring about amplification of the desired region, isolating the amplified fragment (e.g. by purifying the reaction mixture with an agarose gel) and recovering the amplified DNA.
  • the primers may be designed to contain suitable restriction enzyme recognition sites so that the amplified DNA can be cloned into a suitable vector.
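The primer design step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a primer design tool: the function names are hypothetical, the 15 to 30 nucleotide window is enforced only because the text gives it as an approximate range, and melting temperature, GC content and the extra 5' bases often placed ahead of a restriction site are ignored. GAATTC (EcoRI) and GGATCC (BamHI) are used purely as example recognition sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(target, length=20, fwd_site="", rev_site=""):
    """Return (forward, reverse) primers flanking the target sequence.

    The forward primer copies the first `length` bases of the target;
    the reverse primer is the reverse complement of the last `length`
    bases. Optional restriction sites are added at the 5' ends so the
    amplified DNA can be cloned into a suitable vector.
    """
    if not 15 <= length <= 30:
        raise ValueError("primer length should be about 15 to 30 nucleotides")
    target = target.upper()
    if len(target) < 2 * length:
        raise ValueError("target is too short for two non-overlapping primers")
    forward = fwd_site + target[:length]
    reverse = rev_site + reverse_complement(target[-length:])
    return forward, reverse


if __name__ == "__main__":
    example_target = "ATGC" * 10  # placeholder sequence, not a real gene
    fwd, rev = design_primers(example_target, length=15,
                              fwd_site="GAATTC", rev_site="GGATCC")
    print(fwd)
    print(rev)
```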
  • Expression of a NOI and polynucleotide may be controlled using control sequences for example transcription regulation elements or translation repression elements, which include promoters, enhancers and other expression regulation signals (e.g. tet repressor (TetR) system) or the Transgene Repression In vector Production cell system (TRiP) or other regulators of NOIs described herein.
  • Prokaryotic promoters and promoters functional in eukaryotic cells may be used. Tissue-specific or stimuli-specific promoters may be used. Chimeric promoters may also be used comprising sequence elements from two or more different promoters.
  • Suitable promoting sequences are strong promoters including those derived from the genomes of viruses, such as polyoma virus, adenovirus, fowlpox virus, bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), retrovirus and Simian Virus 40 (SV40), or from heterologous mammalian promoters, such as the actin promoter, EF1a, CAG, TK, SV40, ubiquitin, PGK or ribosomal protein promoter.
  • tissue-specific promoters such as rhodopsin (Rho), rhodopsin kinase (RhoK), cone-rod homeobox containing gene (CRX), neural retina-specific leucine zipper protein (NRL), Vitelliform Macular Dystrophy 2 (VMD2), Tyrosine hydroxylase, neuron-specific enolase (NSE) promoter, astrocyte-specific glial fibrillary acidic protein (GFAP) promoter, human α1-antitrypsin (hAAT) promoter, phosphoenolpyruvate carboxykinase (PEPCK), liver fatty acid binding protein promoter, Flt-1 promoter, IFN-β promoter, Mb promoter, SP-B promoter, SYN1 promoter, WASP promoter, SV40/hAlb promoter, SV40/CD43, SV40/CD45, NSE/RU5' promoter, ICAM-2 promoter, GPII
  • Enhancers are relatively orientation- and position-independent; however, one may employ an enhancer from a eukaryotic cell virus, such as the SV40 enhancer and the CMV early promoter enhancer.
  • the enhancer may be spliced into the vector at a position 5' or 3' to the promoter, but is preferably located at a site 5' from the promoter.
  • the promoter can additionally include features to ensure or to increase expression in a suitable target cell.
  • the features can be conserved regions e.g. a Pribnow Box or a TATA box.
  • the promoter may contain other sequences to affect (such as to maintain, enhance or decrease) the levels of expression of a nucleotide sequence. Suitable other sequences include the Sh1-intron or an ADH intron. Other sequences include inducible elements, such as temperature, chemical, light or stress inducible elements. Also, suitable elements to enhance transcription or translation may be present.
Regulators of NOIs
  • A complicating factor in the generation of retroviral packaging/producer cell lines and retroviral vector production is that constitutive expression of certain retroviral vector components and NOIs is cytotoxic, leading to death of cells expressing these components and therefore inability to produce vector. Therefore, the expression of these components (e.g. gag-pol and envelope proteins such as VSV-G) can be regulated. The expression of other non-cytotoxic vector components, e.g. rev, can also be regulated to minimise the metabolic burden on the cell.
  • the modular constructs and/or cells as described herein may comprise cytotoxic and/or non-cytotoxic vector components associated with at least one regulatory element.
  • regulatory element refers to any element capable of affecting, either increasing or decreasing, the expression of an associated gene or protein.
  • a regulatory element includes a gene switch system, transcription regulation element and translation repression element.
  • a number of prokaryotic regulator systems have been adapted to generate gene switches in mammalian cells.
  • Many retroviral packaging and producer cell lines have been controlled using gene switch systems (e.g. tetracycline and cumate inducible switch systems) thus enabling expression of one or more of the retroviral vector components to be switched on at the time of vector production.
  • Gene switch systems include those of the tetracycline repressor (TetR) protein group of transcription regulators (e.g. T-Rex, Tet-On, and Tet-Off), those of the cumate inducible switch system group of transcription regulators (e.g. CymR protein) and those involving an RNA-binding protein (e.g. TRAP).
  • TetR tetracycline repressor
  • TetO2 tetracycline operators
  • hCMVp human cytomegalovirus major immediate early promoter
  • Tetracycline repressor rather than the tetR-mammalian cell transcription factor fusion derivatives, regulates inducible gene expression in mammalian cells. 1998. Hum Gene Ther, 9: 1939-1950).
  • the expression of the NOI can be controlled by a CMV promoter into which two copies of the TetO2 sequence have been inserted in tandem.
  • TetR homodimers, in the absence of an inducing agent (tetracycline or its analogue doxycycline [dox]), bind to the TetO2 sequences and physically block transcription from the upstream CMV promoter.
  • the inducing agent binds to the TetR homodimers, causing allosteric changes such that they can no longer bind to the TetO2 sequences, resulting in gene expression.
  • the TetR gene may be codon optimised as this may improve translation efficiency resulting in tighter control of TetO2 controlled gene expression.
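The TetR/TetO2 switch logic described above can be summarised as a simple truth table. The following Python sketch is illustrative only: the function name `teto2_cassette_expressed` and its boolean parameters are hypothetical, and the model is idealised (it ignores leaky expression, dose response and kinetics).

```python
def teto2_cassette_expressed(tetr_present: bool, inducer_present: bool) -> bool:
    """Idealised on/off model of a TetO2-controlled CMV promoter (hypothetical helper)."""
    # TetR homodimers bind the TetO2 sequences only when no inducing agent
    # (tetracycline or doxycycline) is bound to them.
    tetr_bound_to_teto2 = tetr_present and not inducer_present
    # Bound TetR physically blocks transcription; otherwise the promoter is active.
    return not tetr_bound_to_teto2


if __name__ == "__main__":
    for tetr in (False, True):
        for dox in (False, True):
            state = teto2_cassette_expressed(tetr, dox)
            print(f"TetR={tetr!s:5} inducer={dox!s:5} -> expressed={state}")
```

In this simplified view, the only repressed state is the one in which TetR is present and the inducing agent is absent, which corresponds to the packaging/producer cell before induction.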
  • the TRiP system is described in WO 2015/092440 and provides another way of repressing expression of the NOI in the production cells during vector production.
  • the TRAP-binding sequence e.g.
  • the TRAP-tbs interaction forms the basis for a transgene protein repression system for the production of retroviral vectors, when a constitutive and/or strong promoter, including a tissue-specific promoter, driving the transgene is desirable and particularly when expression of the transgene protein in production cells leads to reduction in vector titres and/or elicits an immune response in vivo due to viral vector delivery of transgene-derived protein (Maunder et al, Nat Commun. (2017) Mar 27; 8).
  • the TRAP-tbs interaction forms a translational block, repressing translation of the transgene protein (Maunder et al, Nat Commun. (2017) Mar 27; 8).
  • the translational block is only effective in production cells and as such does not impede the DNA- or RNA-based vector systems.
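The cell-type dependence of the TRAP-tbs translational block can be sketched as follows. This is a minimal, idealised Python model under stated assumptions: the function name and parameters are hypothetical, repression is treated as all-or-nothing, and the target cell is assumed to lack TRAP.

```python
def transgene_protein_translated(trap_present: bool, tbs_in_transcript: bool) -> bool:
    """Idealised model: translation is blocked only when TRAP can bind a tbs."""
    # The translational block needs both the RNA-binding protein (TRAP)
    # and its binding sequence (tbs) on the transgene mRNA.
    return not (trap_present and tbs_in_transcript)


# Production cell (assumed to express TRAP) versus target cell (assumed to lack TRAP).
production_cell = transgene_protein_translated(trap_present=True, tbs_in_transcript=True)
target_cell = transgene_protein_translated(trap_present=False, tbs_in_transcript=True)
print("production cell translates transgene:", production_cell)
print("target cell translates transgene:", target_cell)
```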
  • the TRiP system is able to repress translation when the transgene protein is expressed from a constitutive and/or strong promoter, including a tissue-specific promoter, from single- or multi-cistronic mRNA. It has been demonstrated that unregulated expression of transgene protein can reduce vector titres and affect vector product quality.
  • Repression of transgene protein for both transient and stable PaCL/PCL vector production systems is beneficial for production cells to prevent a reduction in vector titres: where toxicity or molecular burden issues may lead to cellular stress; where transgene protein elicits an immune response in vivo due to viral vector delivery of transgene-derived protein; where the use of gene-editing transgenes may result in on/off target effects; where the transgene protein may affect vector and/or envelope glycoprotein exclusion.
  • the lentiviral vector as described herein has been pseudotyped.
  • pseudotyping can confer one or more advantages.
  • the env gene product of the HIV based vectors would restrict these vectors to infecting only cells that express a protein called CD4. But if the env gene in these vectors has been substituted with env sequences from other enveloped viruses, then they may have a broader infectious spectrum (Verma and Somia (1997) Nature 389(6648):239-242).
  • workers have pseudotyped an HIV based vector with the glycoprotein from VSV (Verma and Somia (1997) Nature 389(6648):239-242). Accordingly, alternative sequences which perform the equivalent function as the env gene product of HIV based vectors are also known.
  • the Env protein may be a modified Env protein such as a mutant or engineered Env protein. Modifications may be made or selected to introduce targeting ability or to reduce toxicity or for another purpose (Valsesia-Wittman et al 1996 J Virol 70: 2056-64; Nilson et al (1996) Gene Ther 3(4):280-286; and Fielding et al (1998) Blood 91(5):1802-1809 and references cited therein).
  • the vector may be pseudotyped with any molecule of choice.
  • env shall mean an endogenous lentiviral envelope or a heterologous envelope, as described herein.
  • env may be Env of HIV based vectors or a functional substitute thereof.
  • the envelope glycoprotein (G) of Vesicular stomatitis virus (VSV), a rhabdovirus, is an envelope protein that has been shown to be capable of pseudotyping certain enveloped viruses and viral vector virions.
  • VSV-G pseudotyped vectors have been shown to infect not only mammalian cells, but also cell lines derived from fish, reptiles and insects (Burns et al. (1993) ibid). They have also been shown to be more efficient than traditional amphotropic envelopes for a variety of cell lines (Yee et al., (1994) Proc. Natl. Acad. Sci. USA 91:9564-9568, Emi et al. (1991) Journal of Virology 65:1202-1207). VSV-G protein can be used to pseudotype certain retroviruses because its cytoplasmic tail is capable of interacting with the retroviral cores.
  • The provision of a non-retroviral pseudotyping envelope such as VSV-G protein gives the advantage that vector particles can be concentrated to a high titre without loss of infectivity (Akkina et al. (1996) J. Virol. 70:2581-5). Retrovirus envelope proteins are apparently unable to withstand the shearing forces during ultracentrifugation, probably because they consist of two non-covalently linked subunits. The interaction between the subunits may be disrupted by the centrifugation. In comparison, the VSV glycoprotein is composed of a single unit. VSV-G protein pseudotyping can therefore offer potential advantages for both efficient target cell infection/transduction and during manufacturing processes.
  • WO 2000/52188 describes the generation of pseudotyped retroviral vectors, from stable producer cell lines, having vesicular stomatitis virus-G protein (VSV-G) as the membrane-associated viral envelope protein, and provides a gene sequence for the VSV-G protein.
  • VSV-G vesicular stomatitis virus-G protein
  • the Ross River viral envelope has been used to pseudotype a non-primate lentiviral vector (FIV) and following systemic administration predominantly transduced the liver (Kang et al., 2002, J. Virol., 76:9378-9388). Efficiency was reported to be 20-fold greater than obtained with VSV-G pseudotyped vector, and caused less cytotoxicity as measured by serum levels of liver enzymes suggestive of hepatotoxicity.
  • FIV non-primate lentiviral vector
  • the baculovirus GP64 protein has been shown to be an alternative to VSV-G for viral vectors used in the large-scale production of high-titre virus required for clinical and commercial applications (Kumar M, Bradow BP, Zimmerberg J (2003) Hum Gene Ther. 14(1):67-77). Compared with VSV-G-pseudotyped vectors, GP64-pseudotyped vectors have a similar broad tropism and similar native titres. Because GP64 expression does not kill cells, HEK293T-based cell lines constitutively expressing GP64 can be generated.
  • envelopes which give reasonable titre when used to pseudotype EIAV include Mokola, Rabies, Ebola and LCMV (lymphocytic choriomeningitis virus). Intravenous infusion into mice of lentivirus pseudotyped with 4070A led to maximal gene expression in the liver.
  • the term "packaging signal", which is referred to interchangeably as "packaging sequence" or "psi", is used in reference to the noncoding, cis-acting sequence required for encapsidation of retroviral RNA strands during viral particle formation.
  • this sequence has been mapped to loci extending from upstream of the major splice donor site (SD) to at least the gag start codon (some or all of the 5' sequence of gag to nucleotide 688 may be included).
  • the packaging signal comprises the R region into the 5' coding region of Gag.
  • extended packaging signal or extended packaging sequence refers to the use of sequences around the psi sequence with further extension into the gag gene. The inclusion of these additional packaging sequences may increase the efficiency of insertion of vector RNA into viral particles.
  • RNA encapsidation determinants have been shown to be discrete and non-continuous, comprising one region at the 5' end of the genomic mRNA (R-U5) and another region that mapped within the proximal 311 nt of gag (Kaye et al., J Virol. Oct;69(10):6588-92 (1995)).
  • Insertion of IRES elements allows expression of multiple coding regions from a single promoter (Adam et al (as above); Koo et al (1992) Virology 186:669-675; Chen et al 1993 J. Virol 67:2142-2148). IRES elements were first found in the non-translated 5' ends of picornaviruses where they promote cap-independent translation of viral proteins (Jang et al (1990) Enzyme 44: 292-309). When located between open reading frames in an RNA, IRES elements allow efficient translation of the downstream open reading frame by promoting entry of the ribosome at the IRES element followed by downstream initiation of translation.
  • EMCV encephalomyocarditis virus
  • IRES elements from PV, EMCV and swine vesicular disease virus have previously been used in retroviral vectors (Coffin et al, as above).
  • IRES includes any sequence or combination of sequences which work as or improve the function of an IRES.
  • the IRES(s) may be of viral origin (such as EMCV IRES, PV IRES, or FMDV 2A-like sequences) or cellular origin (such as FGF2 IRES, NRF IRES, Notch 2 IRES or EIF4 IRES).
  • In order for the IRES to be capable of initiating translation of each polynucleotide it should be located between or prior to the polynucleotides in the modular construct.
  • the nucleotide sequences utilised for development of stable cell lines require the addition of selectable markers for selection of cells where stable integration has occurred. These selectable markers can be expressed as a single transcription unit within the nucleotide sequence or it may be preferable to use IRES elements to initiate translation of the selectable marker in a polycistronic message (Adam et al 1991 J.Virol. 65, 4985).
  • genes can have relative orientations with respect to one another when part of the same nucleic acid construct.
  • At least two nucleic acid sequences present at the same locus in the cell or construct can be in reverse and/or alternating orientations.
  • the pair of sequential genes will not have the same orientation. This can help prevent both transcriptional and translational read-through when the region is expressed within the same physical location of the host cell.
  • Having the alternating orientations benefits retroviral vector production when the nucleic acids required for vector production are based at the same genetic locus within the cell. This in turn can also improve the safety of the resulting constructs in preventing the generation of replication-competent retroviral vectors.
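The requirement that sequential genes do not share the same orientation can be checked mechanically. The sketch below is a hypothetical illustration in Python: the function name, the `'+'`/`'-'` strand notation and the example layout are assumptions made for this example and are not taken from the constructs described herein.

```python
from typing import List, Tuple


def sequential_pairs_alternate(cassettes: List[Tuple[str, str]]) -> bool:
    """Return True if no two adjacent cassettes share the same orientation.

    Each cassette is a (name, orientation) pair, with orientation '+' or '-'.
    """
    for name, orientation in cassettes:
        if orientation not in ("+", "-"):
            raise ValueError(f"unknown orientation for {name!r}: {orientation!r}")
    # Compare every cassette with its immediate neighbour at the same locus.
    return all(a[1] != b[1] for a, b in zip(cassettes, cassettes[1:]))


# Hypothetical layout of vector components at a single locus.
layout = [("gag-pol", "+"), ("env", "-"), ("rev", "+")]
print(sequential_pairs_alternate(layout))
```

A layout in which two neighbouring cassettes point the same way fails the check, which flags a pair where read-through from one cassette into the next would be possible.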
  • when nucleic acid sequences are in reverse and/or alternating orientations, the use of insulators can prevent inappropriate expression or silencing of a NOI from its genetic surroundings.
  • insulator refers to a class of nucleotide, e.g. DNA, sequence elements that when bound to insulator-binding proteins possess an ability to protect genes from surrounding regulator signals.
  • There are two types of insulator function: an enhancer-blocking function and a chromatin barrier function.
  • when an insulator is situated between a promoter and an enhancer, the enhancer-blocking function of the insulator shields the promoter from the transcription-enhancing influence of the enhancer (Geyer and Corces 1992; Kellum and Schedl 1992).
  • the chromatin barrier insulators function by preventing the advance of nearby condensed chromatin which would lead to a transcriptionally active chromatin region turning into a transcriptionally inactive chromatin region and resulting in silencing of gene expression.
  • Insulators which inhibit the spread of heterochromatin, and thus gene silencing, recruit enzymes involved in histone modifications to prevent this process (Yang J, Corces VG. 2011;110:43-76; Huang, Li et al. 2007; Dhillon, Raab et al. 2009).
  • An insulator can have one or both of these functions and the chicken β-globin insulator (cHS4) is one such example.
  • cHS4 chicken β-globin insulator
  • This insulator is the most extensively studied vertebrate insulator, is highly rich in G+C and has both enhancer-blocking and heterochromatic barrier functions (Chung J H, Whitely M, Felsenfeld G. Cell. 1993;74:505-514).
  • insulators with enhancer-blocking functions include, but are not limited to, the following: human β-globin insulator 5 (HS5), human β-globin insulator 1 (HS1), and chicken β-globin insulator (cHS3) (Farrell CM, West AG, Felsenfeld G., Mol Cell Biol. 2002 Jun;22(11):3820-31; J Ellis et al. EMBO J. 1996 Feb 1;15(3):562-568).
  • the insulators also help to prevent promoter interference (i.e. where the promoter from one transcription unit impairs expression of an adjacent transcription unit) between adjacent retroviral nucleic acid sequences. If the insulators are used between each of the retroviral vector nucleic acid sequences, then the reduction of direct read-through will help prevent the formation of replication-competent retroviral vector particles.
  • the insulator may be present between each of the retroviral nucleic acid sequences.
  • the use of insulators prevents promoter-enhancer interactions from one NOI expression cassette interacting with another NOI expression cassette in a nucleotide sequence encoding vector components.
  • An insulator may be present between the vector genome and gag-pol sequences. This therefore limits the likelihood of the production of a replication-competent retroviral vector and 'wild-type' like RNA transcripts, improving the safety profile of the construct.
  • the use of insulator elements to improve the expression of stably integrated multigene vectors is cited in Moriarity et al, Nucleic Acids Res. 2013 Apr;41(8):e92.
  • Titre is often described as transducing units/mL (TU/mL). Titre may be increased by increasing the number of vector particles and by increasing the specific activity of a vector preparation.
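The two routes to a higher titre named above can be made concrete with a small calculation. The Python sketch below is illustrative only and rests on an assumption not stated in the text: that specific activity is expressed as transducing units per physical vector particle, so that titre is the product of particle concentration and specific activity. The function name and the example figures are hypothetical.

```python
def titre_tu_per_ml(particles_per_ml: float, tu_per_particle: float) -> float:
    """Titre (TU/mL) as particle concentration multiplied by specific activity.

    Assumes specific activity is given as transducing units per particle.
    """
    if particles_per_ml < 0 or tu_per_particle < 0:
        raise ValueError("inputs must be non-negative")
    return particles_per_ml * tu_per_particle


# Hypothetical preparation: 1e10 particles/mL at 1 TU per 1000 particles.
baseline = titre_tu_per_ml(1e10, 1e-3)
more_particles = titre_tu_per_ml(2e10, 1e-3)   # doubling particle number
more_active = titre_tu_per_ml(1e10, 2e-3)      # doubling specific activity
print(f"{baseline:.2e} {more_particles:.2e} {more_active:.2e}")
```

Under this assumption, doubling either the particle number or the specific activity doubles the titre.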
  • the lentiviral vector as described herein or a cell or tissue transduced with the lentiviral vector as described herein may be used in medicine.
  • the lentiviral vector as described herein, a production cell of the invention or a cell or tissue transduced with the lentiviral vector as described herein may be used for the preparation of a medicament to deliver a nucleotide of interest to a target site in need of the same.
  • Such uses of the lentiviral vector or transduced cell of the invention may be for therapeutic or diagnostic purposes, as described previously.
  • a "cell transduced by a viral vector particle" is to be understood as a cell, in particular a target cell, into which the nucleic acid carried by the viral vector particle has been transferred.
  • the nucleotide of interest i.e. transgene
  • a target cell which lacks TRAP.
  • “Target cell” is to be understood as a cell in which it is desired to express the NOI.
  • the NOI may be introduced into the target cell using a viral vector of the present invention. Delivery to the target cell may be performed in vivo, ex vivo or in vitro.
  • the nucleotide of interest gives rise to a therapeutic effect.
  • the NOI may have a therapeutic or diagnostic application.
  • Suitable NOIs include, but are not limited to, sequences encoding enzymes, co-factors, cytokines, chemokines, hormones, antibodies, anti-oxidant molecules, engineered immunoglobulin-like molecules, single chain antibodies, fusion proteins, immune co-stimulatory molecules, immunomodulatory molecules, chimeric antigen receptors, a transdomain negative mutant of a target protein, toxins, conditional toxins, antigens, transcription factors, structural proteins, reporter proteins, subcellular localization signals, tumour suppressor proteins, growth factors, membrane proteins, receptors, vasoactive proteins and peptides, anti-viral proteins and ribozymes, and derivatives thereof (such as derivatives with an associated reporter group).
  • the NOIs may also encode micro-RNA.
  • the NOI may be useful in the treatment of a neurodegenerative disorder.
  • the NOI may be useful in the treatment of Parkinson's disease.
  • the NOI may encode an enzyme or enzymes involved in dopamine synthesis.
  • the enzyme may be one or more of the following: tyrosine hydroxylase, GTP-cyclohydrolase I and/or aromatic amino acid dopa decarboxylase. The sequences of all three genes are available (GenBank® Accession Nos. X05290, U19523 and M76180, respectively).
  • the NOI may encode the vesicular monoamine transporter 2 (VMAT2).
  • the viral genome may comprise a NOI encoding aromatic amino acid dopa decarboxylase and a NOI encoding VMAT2. Such a genome may be used in the treatment of Parkinson's disease, in particular in conjunction with peripheral administration of L-DOPA.
  • the NOI may encode a therapeutic protein or combination of therapeutic proteins.
  • the NOI may encode a protein or proteins selected from the group consisting of glial cell derived neurotrophic factor (GDNF), brain derived neurotrophic factor (BDNF), ciliary neurotrophic factor (CNTF), neurotrophin-3 (NT-3), acidic fibroblast growth factor (aFGF), basic fibroblast growth factor (bFGF), interleukin-1 beta (IL-1β), tumor necrosis factor alpha (TNF-α), insulin growth factor-2, VEGF-A, VEGF-B, VEGF-C/VEGF-2, VEGF-D, VEGF-E, PDGF-A, PDGF-B, hetero- and homo-dimers of PDGF-A and PDGF-B.
  • GDNF glial cell derived neurotrophic factor
  • BDNF brain derived neurotrophic factor
  • CNTF ciliary neurotrophic factor
  • NT-3 neurotrophin-3
  • aFGF acidic fibroblast growth factor
  • bFGF basic fibroblast growth factor
  • the NOI may encode an anti-angiogenic protein or anti-angiogenic proteins selected from the group consisting of angiostatin, endostatin, platelet factor 4, pigment epithelium derived factor (PEDF), placental growth factor, restin, interferon-α, interferon-inducible protein, gro-beta and tubedown-1, interleukin(IL)-1, IL-12, retinoic acid, anti-VEGF antibodies or fragments/variants thereof such as aflibercept, thrombospondin, VEGF receptor proteins such as those described in US 5,952,199 and US 6,100,071, and anti-VEGF receptor antibodies.
  • PEDF pigment epithelium derived factor
  • the NOI may encode anti-inflammatory proteins, antibodies or fragment/variants of proteins or antibodies selected from the group consisting of NF-kB inhibitors, IL-1 beta inhibitors, TGFbeta inhibitors, IL-6 inhibitors, IL-23 inhibitors, IL-18 inhibitors, Tumour necrosis factor alpha and Tumour necrosis factor beta, Lymphotoxin alpha and Lymphotoxin beta, LIGHT inhibitors, alpha synuclein inhibitors, Tau inhibitors, beta amyloid inhibitors, and IL-17 inhibitors.
  • the NOI may encode cystic fibrosis transmembrane conductance regulator (CFTR).
  • CFTR cystic fibrosis transmembrane conductance regulator
  • the NOI may encode a protein normally expressed in an ocular cell.
  • the NOI may encode a protein normally expressed in a photoreceptor cell and/or retinal pigment epithelium cell.
  • the NOI may encode a protein selected from the group comprising RPE65, arylhydrocarbon-interacting receptor protein like 1 (AIPL1), CRB1, lecithin retinal acetyltransferase (LRAT), photoreceptor-specific homeo box (CRX), retinal guanylate cyclase (GUCY2D), RPGR interacting protein 1 (RPGRIP1), LCA2, LCA3, LCA5, dystrophin, PRPH2, CNTF, ABCR/ABCA4, EMP1, TIMP3, MERTK, ELOVL4, MYO7A, USH2A, VMD2, RLBP1, COX-2, FPR, harmonin, Rab escort protein 1, CNGB2, CNGA3, CEP 290, RPGR, RS1, RP1, PRELP, glutathione pathway enzymes and opticin.
  • AIPL1 arylhydrocarbon-interacting receptor protein like 1
  • LRAT lecithin retinal acetyltransferase
  • the NOI may encode the human clotting Factor VIII or Factor IX.
  • the NOI may encode protein or proteins involved in metabolism selected from the group comprising phenylalanine hydroxylase (PAH), Methylmalonyl CoA mutase, Propionyl CoA carboxylase, Isovaleryl CoA dehydrogenase, Branched chain ketoacid dehydrogenase complex, Glutaryl CoA dehydrogenase, Acetyl CoA carboxylase, propionyl CoA carboxylase, 3 methyl crotonyl CoA carboxylase, pyruvate carboxylase, carbamoyl-phosphate synthase ammonia, ornithine transcarbamylase, glucosylceramidase beta, alpha galactosidase A, glucosylceramidase beta, cystinosin, glucosamine(N-acetyl)-6-sulfatase, N-acetyl-alpha-glucosaminidase, N-
  • PAH
  • the NOI may encode a chimeric antigen receptor (CAR) or a T cell receptor (TCR).
  • the CAR is an anti-5T4 CAR.
  • the NOI may encode B-cell maturation antigen (BCMA), CD19, CD22, CD20, CD138, CD30, CD33, CD123, CD70, prostate specific membrane antigen (PSMA), Lewis Y antigen (LeY), Tyrosine-protein kinase transmembrane receptor (ROR1), Mucin 1, cell surface associated (Muc1), Epithelial cell adhesion molecule (EpCAM), endothelial growth factor receptor (EGFR), insulin, protein tyrosine phosphatase, non-receptor type 22, interleukin 2 receptor, alpha, interferon induced with helicase C domain 1, human epidermal growth factor receptor (HER2), glypican 3 (GPC3), disialoganglioside (GD2), mes
  • B-cell maturation antigen
  • the NOI may encode a chimeric antigen receptor (CAR) against NKG2D ligands selected from the group comprising ULBP1, 2 and 3, H60, Rae-1 a, b, g, d, MICA, MICB.
  • CAR chimeric antigen receptor
  • the NOI may encode SGSH, SUMF1, GAA, the common gamma chain (CD132), adenosine deaminase, WAS protein, globins, alpha galactosidase A, δ-aminolevulinate (ALA) synthase, δ-aminolevulinate dehydratase (ALAD), Hydroxymethylbilane (HMB) synthase, Uroporphyrinogen (URO) synthase, Uroporphyrinogen (URO) decarboxylase, Coproporphyrinogen (COPRO) oxidase, Protoporphyrinogen (PROTO) oxidase, Ferrochelatase, α-L-iduronidase, Iduronate sulfatase, Heparan sulfamidase, N-acetylglucosaminidase, Heparan-α-glucosaminide N-acetyltransfer
  • the vector may also comprise or encode a siRNA, shRNA, or regulated shRNA (Dickins et al. (2005) Nature Genetics 37: 1289-1295, Silva et al. (2005) Nature Genetics 37:1281-1288).
  • the vectors, including retroviral and AAV vectors, according to the present invention may be used to deliver one or more NOI(s) useful in the treatment of the disorders listed in WO 1998/05635, WO 1998/07859, WO 1998/09985.
  • the nucleotide of interest may be DNA or RNA. Examples of such diseases are given below:
  • a disorder which responds to cytokine and cell proliferation/differentiation activity immunosuppressant or immunostimulant activity (e.g. for treating immune deficiency, including infection with human immunodeficiency virus, regulation of lymphocyte growth; treating cancer and many autoimmune diseases, and to prevent transplant rejection or induce tumour immunity); regulation of haematopoiesis (e.g. treatment of myeloid or lymphoid diseases); promoting growth of bone, cartilage, tendon, ligament and nerve tissue (e.g. for healing wounds, treatment of burns, ulcers and periodontal disease and neurodegeneration); inhibition or activation of follicle-stimulating hormone (modulation of fertility); chemotactic/chemokinetic activity (e.g.
  • haemostatic and thrombolytic activity e.g. for treating haemophilia and stroke
  • macrophage inhibitory and/or T cell inhibitory activity and thus, anti-inflammatory activity for treating, for example, septic shock or Crohn's disease
  • Malignancy disorders including cancer, leukaemia, benign and malignant tumour growth, invasion and spread, angiogenesis, metastases, ascites and malignant pleural effusion.
  • Autoimmune diseases including arthritis, including rheumatoid arthritis, hypersensitivity, allergic reactions, asthma, systemic lupus erythematosus, collagen diseases and other diseases.
  • Vascular diseases including arteriosclerosis, atherosclerotic heart disease, reperfusion injury, cardiac arrest, myocardial infarction, vascular inflammatory disorders, respiratory distress syndrome, cardiovascular effects, peripheral vascular disease, migraine and aspirin-dependent anti-thrombosis, stroke, cerebral ischaemia, ischaemic heart disease or other diseases.
  • Hepatic diseases including hepatic fibrosis, liver cirrhosis.
  • Inherited metabolic disorders including phenylketonuria PKU, Wilson disease, organic acidemias, urea cycle disorders, cholestasis, and other diseases.
  • Renal and urologic diseases including thyroiditis or other glandular diseases, glomerulonephritis or other diseases.
  • Ear, nose and throat disorders including otitis or other oto-rhino-laryngological diseases, dermatitis or other dermal diseases.
  • Dental and oral disorders including periodontal diseases, periodontitis, gingivitis or other dental/oral diseases.
  • Testicular diseases including orchitis or epididymo-orchitis, infertility, orchidal trauma or other testicular diseases.
  • Gynaecological diseases including placental dysfunction, placental insufficiency, habitual abortion, eclampsia, pre-eclampsia, endometriosis and other gynaecological diseases.
  • Ophthalmologic disorders such as Leber Congenital Amaurosis (LCA) including LCA10, posterior uveitis, intermediate uveitis, anterior uveitis, conjunctivitis, chorioretinitis, uveoretinitis, optic neuritis, glaucoma, including open angle glaucoma and juvenile congenital glaucoma, intraocular inflammation, e.g. retinitis or cystoid macular oedema, sympathetic ophthalmia, scleritis, retinitis pigmentosa, macular degeneration including age related macular degeneration (AMD) and juvenile macular degeneration including Best Disease, Best vitelliform macular degeneration, Stargardt's Disease, Usher's syndrome, Doyne's honeycomb retinal dystrophy, Sorsby's Macular Dystrophy, Juvenile retinoschisis, Cone-Rod Dystrophy, Corneal Dystrophy, Fuchs' Dystrophy, Leber's congenital amaurosis, Leber's hereditary optic neuropathy (LHON), Adie syndrome, Oguchi disease, degenerative fundus disease, ocular trauma, ocular inflammation caused by infection, proliferative vitreo-retinopathies, acute ischaemic optic neuropathy, excessive scarring, e.g. glaucoma filtration operation, reaction against ocular implants, corneal transplant graft rejection, and other ophthalmic diseases, such as diabetic macular oedema, retinal vein occlusion, RLBP1-associated retinal dystrophy, choroideremia and achromatopsia.
  • Neurological and neurodegenerative disorders including Parkinson's disease, complication and/or side effects from treatment of Parkinson's disease, AIDS-related dementia complex, HIV-related encephalopathy, Devic's disease, Sydenham chorea, Alzheimer's disease and other degenerative diseases, conditions or disorders of the CNS, strokes, post-polio syndrome, psychiatric disorders, myelitis, encephalitis, subacute sclerosing pan-encephalitis, encephalomyelitis, acute neuropathy, subacute neuropathy, chronic neuropathy, Fabry disease, Gaucher disease, Cystinosis, Pompe disease, metachromatic leukodystrophy, Wiskott-Aldrich Syndrome, adrenoleukodystrophy, beta-thalassemia, sickle cell disease, Guillain-Barré syndrome, Sydenham chorea, myasthenia gravis, pseudo-tumour cerebri, Down's Syndrome, Huntington's disease, CNS compression or CNS trauma or infections of the CNS, muscular atrophies
  • cystic fibrosis, mucopolysaccharidosis including Sanfilippo syndrome A, Sanfilippo syndrome B, Sanfilippo syndrome C, Sanfilippo syndrome D, Hunter syndrome, Hurler-Scheie syndrome, Morquio syndrome, ADA-SCID, X-linked SCID, X-linked chronic granulomatous disease, porphyria, haemophilia A, haemophilia B, post-traumatic inflammation, haemorrhage, coagulation and acute phase response, cachexia, anorexia, acute infection, septic shock, infectious diseases, diabetes mellitus, complications or side effects of surgery, bone marrow transplantation or other transplantation complications and/or side effects, complications and side effects of gene therapy, e.g. due to infection with a viral carrier, or AIDS, to suppress or inhibit a humoral and/or cellular immune response, for the prevention and/or treatment of graft rejection in cases of transplantation of natural or artificial cells, tissue and organs such as cornea, bone marrow, organs, lenses, pacemakers, natural or artificial skin tissue.
  • the NOI comprises a micro-RNA.
  • Micro-RNAs are a very large group of small RNAs produced naturally in organisms, at least some of which regulate the expression of target genes. Founding members of the micro-RNA family are let-7 and lin-4.
  • the let-7 gene encodes a small, highly conserved RNA species that regulates the expression of endogenous protein-coding genes during worm development.
  • the active RNA species is transcribed initially as an ~70 nt precursor, which is post-transcriptionally processed into a mature ~21 nt form.
  • Both let-7 and lin-4 are transcribed as hairpin RNA precursors which are processed to their mature forms by Dicer enzyme.
  • the vector may also comprise or encode a siRNA, shRNA, or regulated shRNA (Dickins et al. (2005) Nature Genetics 37: 1289-1295, Silva et al. (2005) Nature Genetics 37:1281-1288).
  • RNAi RNA interference
  • siRNAs small interfering or silencing RNAs
  • dsRNA double-stranded RNA
  • dsRNA >30 bp has been found to activate the interferon response leading to shut-down of protein synthesis and non-specific mRNA degradation (Stark et al., Annu Rev Biochem 67:227-64 (1998)).
  • this response can be bypassed by using 21 nt siRNA duplexes (Elbashir et al., EMBO J. Dec 3;20(23):6877-88 (2001), Hutvagner et al., Science. Aug 3; 293(5531):834-8. Epub Jul 12 (2001)) allowing gene function to be analysed in cultured mammalian cells.
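The length dependence described above lends itself to a simple screening rule. The Python sketch below is a hypothetical illustration: it encodes only the >30 bp threshold stated in the text, the function name is an assumption, and it makes no claim about sequence-specific or cell-type-specific effects.

```python
INTERFERON_THRESHOLD_BP = 30  # from the text: dsRNA >30 bp activates the interferon response


def exceeds_interferon_threshold(duplex_length_bp: int) -> bool:
    """Return True if a dsRNA duplex is longer than the 30 bp threshold."""
    if duplex_length_bp <= 0:
        raise ValueError("duplex length must be positive")
    return duplex_length_bp > INTERFERON_THRESHOLD_BP


# A 21 nt siRNA duplex falls below the threshold; a 500 bp dsRNA does not.
for length in (21, 30, 31, 500):
    print(length, exceeds_interferon_threshold(length))
```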
  • the present disclosure provides a pharmaceutical composition comprising the lentiviral vector as described herein or a cell or tissue transduced with the viral vector as described herein, in combination with a pharmaceutically acceptable carrier, diluent or excipient.
  • the present disclosure provides a pharmaceutical composition for treating an individual by gene therapy, wherein the composition comprises a therapeutically effective amount of a lentiviral vector.
  • the pharmaceutical composition may be for human or animal usage.
  • the composition may comprise a pharmaceutically acceptable carrier, diluent, excipient or adjuvant.
  • the choice of pharmaceutical carrier, excipient or diluent can be made with regard to the intended route of administration and standard pharmaceutical practice.
  • the pharmaceutical compositions may comprise as, or in addition to, the carrier, excipient or diluent any suitable binder(s), lubricant(s), suspending agent(s), coating agent(s), solubilising agent(s) and other carrier agents that may aid or increase vector entry into the target site (such as for example a lipid delivery system).
  • the composition can be administered by any one or more of inhalation; in the form of a suppository or pessary; topically in the form of a lotion, solution, cream, ointment or dusting powder; by use of a skin patch; orally in the form of tablets containing excipients such as starch or lactose, or in capsules or ovules either alone or in admixture with excipients, or in the form of elixirs, solutions or suspensions containing flavouring or colouring agents; or they can be injected parenterally, for example intracavernosally, intravenously, intramuscularly, intracranially, intraocularly, intraperitoneally, or subcutaneously.
  • compositions may be best used in the form of a sterile aqueous solution which may contain other substances, for example enough salts or monosaccharides to make the solution isotonic with blood.
  • compositions may be administered in the form of tablets or lozenges which can be formulated in a conventional manner.
  • the lentiviral vector as described herein may also be used to transduce target cells or target tissue ex vivo prior to transfer of said target cell or tissue into a patient in need of the same.
  • An example of such cell may be autologous T cells and an example of such tissue may be a donor cornea.
  • the present invention also encompasses the use of variants, derivatives, analogues, homologues and fragments thereof.
  • a variant of any given sequence is a sequence in which the specific sequence of residues (whether amino acid or nucleic acid residues) has been modified in such a manner that the polypeptide or polynucleotide in question retains at least one of its endogenous functions.
  • a variant sequence can be obtained by addition, deletion, substitution, modification, replacement and/or variation of at least one residue present in the naturally-occurring protein.
  • derivative in relation to proteins or polypeptides of the present invention includes any substitution of, variation of, modification of, replacement of, deletion of and/or addition of one (or more) amino acid residues from or to the sequence providing that the resultant protein or polypeptide retains at least one of its endogenous functions.
  • analogue in relation to polypeptides or polynucleotides includes any mimetic, that is, a chemical compound that possesses at least one of the endogenous functions of the polypeptides or polynucleotides which it mimics.
  • amino acid substitutions may be made, for example from 1, 2 or 3 to 10 or 20 substitutions, provided that the modified sequence retains the required activity or ability.
  • Amino acid substitutions may include the use of non-naturally occurring analogues.
  • Proteins used in the present invention may also have deletions, insertions or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent protein.
  • Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity and/or the amphipathic nature of the residues as long as the endogenous function is retained.
  • negatively charged amino acids include aspartic acid and glutamic acid
  • positively charged amino acids include lysine and arginine
  • amino acids with uncharged polar head groups having similar hydrophilicity values include asparagine, glutamine, serine, threonine and tyrosine.
  • Conservative substitutions may be made, for example according to the table below. Amino acids in the same block in the second column and preferably in the same line in the third column may be substituted for each other:
  • homologue means an entity having a certain homology with the wild type amino acid sequence and the wild type nucleotide sequence.
  • homology can be equated with "identity".
  • a homologous sequence is taken to include an amino acid sequence which may be at least 50%, 55%, 65%, 75%, 85% or 90% identical, preferably at least 95%, 97% or 99% identical to the subject sequence.
  • the homologues will comprise the same active sites etc. as the subject amino acid sequence.
  • although homology can also be considered in terms of similarity (i.e. amino acid residues having similar chemical properties/functions), in the context of the present invention it is preferred to express homology in terms of sequence identity.
  • a homologous sequence is taken to include a nucleotide sequence which may be at least 50%, 55%, 65%, 75%, 85% or 90% identical, preferably at least 95%, 97%, 98% or 99% identical to the subject sequence.
  • although homology can also be considered in terms of similarity, in the context of the present invention it is preferred to express homology in terms of sequence identity.
  • Homology comparisons can be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs can calculate percentage homology or identity between two or more sequences.
  • Percentage homology may be calculated over contiguous sequences, i.e. one sequence is aligned with the other sequence and each amino acid in one sequence is directly compared with the corresponding amino acid in the other sequence, one residue at a time. This is called an "ungapped" alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues.
  • the alignment process itself is typically not based on an all-or-nothing pair comparison. Instead, a scaled similarity score matrix is generally used that assigns scores to each pairwise comparison based on chemical similarity or evolutionary distance.
  • An example of such a matrix commonly used is the BLOSUM62 matrix - the default matrix for the BLAST suite of programs.
  • GCG Wisconsin programs generally use either the public default values or a custom symbol comparison table if supplied (see user manual for further details). For some applications, it is preferred to use the public default values for the GCG package, or in the case of other software, the default matrix, such as BLOSUM62.
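  • As a minimal sketch of the "ungapped" comparison described above, percentage identity between two pre-aligned sequences of equal length can be computed as follows. The function names and the default 95% threshold argument are illustrative assumptions; no gap handling or similarity matrix (such as BLOSUM62) is attempted.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percentage identity: residues are compared one position at a time."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(subject: str, candidate: str,
                             threshold: float = 95.0) -> bool:
    """True if the candidate is at least `threshold` percent identical to the subject."""
    return percent_identity(subject, candidate) >= threshold
```

  • For example, the two 10 nt sequences CCAGATCCTG and CCAGTTCCTG differ at a single position, so `percent_identity` reports 90% for that pair.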
  • “Fragments” are also variants and the term typically refers to a selected region of the polypeptide or polynucleotide that is of interest either functionally or, for example, in an assay. “Fragment” thus refers to an amino acid or nucleic acid sequence that is a portion of a full-length polypeptide or polynucleotide.
  • Such variants may be prepared using standard recombinant DNA techniques such as site-directed mutagenesis. Where insertions are to be made, synthetic DNA encoding the insertion together with 5' and 3' flanking regions corresponding to the naturally-occurring sequence either side of the insertion site may be made. The flanking regions will contain convenient restriction sites corresponding to sites in the naturally-occurring sequence so that the sequence may be cut with the appropriate enzyme(s) and the synthetic DNA ligated into the break. The DNA is then expressed in accordance with the invention to make the encoded protein. These methods are only illustrative of the numerous standard techniques known in the art for manipulation of DNA sequences and other known techniques may also be used.
  • All variants, fragments or homologues of the regulatory protein suitable for use in the cells and/or modular constructs of the invention will retain the ability to bind the cognate binding site of the NOI such that translation of the NOI is repressed or prevented in a viral vector production cell.
  • the polynucleotides used in the present invention may be codon-optimised. Codon optimisation has previously been described in WO 1999/41397 and WO 2001/79518. Different cells differ in their usage of particular codons. This codon bias corresponds to a bias in the relative abundance of particular tRNAs in the cell type. By altering the codons in the sequence so that they are tailored to match with the relative abundance of corresponding tRNAs, it is possible to increase expression. By the same token, it is possible to decrease expression by deliberately choosing codons for which the corresponding tRNAs are known to be rare in the particular cell type. Thus, an additional degree of translational control is available.
  • viruses, including retroviruses, use a large number of rare codons and, by changing these to correspond to commonly used mammalian codons, increased expression of a gene of interest, e.g. a NOI or packaging components in mammalian production cells, can be achieved.
  • Codon usage tables are known in the art for mammalian cells, as well as for a variety of other organisms.
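  • The codon substitution principle described above can be sketched as below: each codon is replaced by a preferred synonymous codon so that the encoded amino acid sequence is unchanged. The `PREFERRED_CODON` table is a hypothetical three-entry stand-in, not a real codon usage table, and the function names are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical preferred codons for illustration only; a real table would be
# derived from codon usage data for the production cell type.
PREFERRED_CODON = {"L": "CTG", "K": "AAG", "E": "GAG"}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimise(cds: str, preferred: dict = PREFERRED_CODON) -> str:
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimised = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        optimised.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(optimised)
```

  • Because only synonymous codons are exchanged, `translate` gives the same protein for the input and for the output of `codon_optimise`.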
  • Codon optimisation of viral vector packaging components has a number of other advantages.
  • the nucleotide sequences encoding the packaging components of the viral particles required for assembly of viral particles in the producer cells/packaging cells have RNA instability sequences (INS) eliminated from them.
  • the amino acid coding sequence for the packaging components is retained so that the viral components encoded by the sequences remain the same, or at least sufficiently similar that the function of the packaging components is not compromised.
  • codon optimisation also overcomes the Rev/RRE requirement for export, rendering optimised sequences Rev-independent.
  • Codon optimisation also reduces homologous recombination between different constructs within the vector system (for example between the regions of overlap in the gag-pol and env open reading frames). The overall effect of codon optimisation is therefore a notable increase in viral titre and improved safety.
  • codons relating to INS are codon optimised.
  • the sequences are codon optimised in their entirety, with some exceptions, for example the sequence encompassing the frameshift site of gag-pol (see below).
  • the gag-pol gene of lentiviral vectors comprises two overlapping reading frames encoding the gag-pol proteins. The expression of both proteins depends on a frameshift during translation. This frameshift occurs as a result of ribosome "slippage" during translation. This slippage is thought to be caused at least in part by ribosome-stalling RNA secondary structures. Such secondary structures exist downstream of the frameshift site in the gag-pol gene.
  • the region of overlap extends from nucleotide 1222 downstream of the beginning of gag (wherein nucleotide 1 is the A of the gag ATG) to the end of gag (nt 1503). Consequently, a 281 bp fragment spanning the frameshift site and the overlapping region of the two reading frames is preferably not codon optimised. Retaining this fragment will enable more efficient expression of the Gag-Pol proteins.
  • the beginning of the overlap has been taken to be nt 1262 (where nucleotide 1 is the A of the gag ATG) and the end of the overlap to be nt 1461. In order to ensure that the frameshift site and the gag-pol overlap are preserved, the wild type sequence has been retained from nt 1156 to 1465.
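  • Retaining a wild type window inside an otherwise codon-optimised sequence, as described above, can be sketched as a simple splice. This is an illustrative assumption about how such a step might be scripted; the function name is hypothetical, and coordinates are 1-based and inclusive with nucleotide 1 being the A of the gag ATG, as in the text.

```python
def retain_wild_type(wild_type: str, optimised: str, start: int, end: int) -> str:
    """Return `optimised` with positions start..end (1-based, inclusive)
    restored to the wild type sequence.

    Both sequences must have the same length, which holds when codon
    optimisation only swaps synonymous codons.
    """
    if len(wild_type) != len(optimised):
        raise ValueError("sequences must be the same length")
    if not 1 <= start <= end <= len(wild_type):
        raise ValueError("window must lie within the sequence")
    return optimised[:start - 1] + wild_type[start - 1:end] + optimised[end:]
```

  • With the coordinates quoted above, the call would be `retain_wild_type(wt, opt, 1156, 1465)`. Note that a window boundary falling inside a codon can create a hybrid codon, so the translated product should be re-checked after splicing.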
  • Deviations from optimal codon usage may be made, for example, in order to accommodate convenient restriction sites, and conservative amino acid changes may be introduced into the Gag-Pol proteins.
  • codon optimisation is based on lightly expressed mammalian genes.
  • the third and sometimes the second and third base may be changed.
  • gag-pol sequences can be achieved by a skilled worker.
  • there are retroviral variants described which can be used as a starting point for generating a codon-optimised gag-pol sequence.
  • Lentiviral genomes can be quite variable. For example there are many quasispecies of HIV-1 which are still functional. This is also the case for EIAV. These variants may be used to enhance particular parts of the transduction process. Examples of HIV-1 variants may be found at the HIV Databases operated by Los Alamos National Security, LLC at http://hiv-web.lanl.gov. Details of EIAV clones may be found at the National Center for Biotechnology Information (NCBI) database located at http://www.ncbi.nlm.nih.gov.
  • the strategy for codon-optimised gag-pol sequences can be used in relation to any retrovirus. This would apply to all lentiviruses, including EIAV, FIV, BIV, CAEV, VMR, SIV, HIV-1 and HIV-2. In addition this method could be used to increase expression of genes from HTLV-1, HTLV-2, HFV, HSRV and human endogenous retroviruses (HERV), MLV and other retroviruses. Codon optimisation can render gag-pol expression Rev-independent. In order to enable the use of anti-rev or RRE factors in the lentiviral vector, however, it would be necessary to render the viral vector generation system totally Rev/RRE-independent. Thus, the genome also needs to be modified. This is achieved by optimising vector genome components. Advantageously, these modifications also lead to the production of a safer system absent of all additional proteins both in the producer and in the transduced cell.
  • the invention describes use of a plurality of 'Cytoplasmic Accumulation Region' (CAR) elements (CARe) or 'tiles' based on the consensus sequence BMWGHWSSWS (SEQ ID NO: 24), for example CCAGATCCTG (SEQ ID NO:30), within the 3'UTR of a viral vector transgene cassette either alone or in combination with other cis-acting elements such as post-transcriptional regulatory elements (PREs) and/or ZCCHC14 binding loops.
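  • The consensus BMWGHWSSWS is written in IUPAC ambiguity codes (B = C/G/T, M = A/C, W = A/T, H = A/C/T, S = C/G). A minimal sketch of checking whether a 10 nt tile conforms to the consensus is given below; the function and variable names are assumptions made for illustration.

```python
import re

# IUPAC nucleotide codes needed for the CARe consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "[CGT]", "M": "[AC]", "W": "[AT]", "H": "[ACT]", "S": "[CG]",
}

CARE_CONSENSUS = "BMWGHWSSWS"  # SEQ ID NO: 24


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Expand an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in consensus.upper()))


CARE_PATTERN = consensus_to_regex(CARE_CONSENSUS)


def matches_care_consensus(tile: str) -> bool:
    """True if the whole tile conforms to the CARe consensus."""
    return CARE_PATTERN.fullmatch(tile.upper()) is not None
```

  • Under this check the example tile CCAGATCCTG (SEQ ID NO:30) conforms to the consensus.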
  • the invention can be used in two main contexts: [1] within viral vector genomes where 'cargo' space is not limiting, and therefore in target cells the CAR elements further enhance expression of a transgene cassette containing another 3'UTR element, such as wPRE, or [2] within viral vector genomes where cargo space is limiting (i.e. at or above or substantially above the packaging 'limit' of the viral vector system employed), and therefore the CAR elements may be used instead of a larger 3'UTR element, such as the wPRE, thus reducing vector genome size, whilst also imparting an increase to transgene expression in target cells compared to a vector genome lacking any 3'UTR cis-acting element.
  • assessing the net benefit essentially means determining the impact of total vector genome size on the maximum practical vector titres that can be achieved with or without the above stated combinations of cis-acting elements, whilst also achieving desirable levels of transgene expression in target cells to mediate therapeutic effect. For example, if a lentiviral vector is being employed to deliver a large transgene resulting in a vector genome size of over ~9.5kb (the size of wild type HIV-1) - for example 10-to-13kb - then the viral vector titres are likely to be severely reduced (Sweeney and Vink, 2021).
  • Output titres may be orders of magnitude lower than titres of vector genomes of ≤9.5kb. In such cases, it may be more advantageous to reduce the size of the vector genome by employing smaller cis-acting elements such as the CARe and/or ZCCHC14 binding loop(s) rather than larger elements such as the wPRE, even if transgene expression in the target cells is slightly or even modestly lower compared to use of wPRE (hence, the net benefit balances vector titre with transgene expression in target cells).
  • for AAV vectors, the threshold of vector genome packaging is much more stringent, with the optimum size for AAVs in the 4.7-5.3kb range (and in some cases ~6kb) (Wu et al., 2010).
  • for AdVs, the maximum size of the vector genome is considered to be 34-37kb, and the available space for transgene sequences depends on whether 'gutted' or 1st/2nd generation vectors are being employed (Ricobaraza et al., 2020).
  • Example 1 Positioning of novel cis-acting sequences in the 3'UTR of transgene cassettes within retroviral vector genomes (e.g. lentiviral vector genomes)
  • the 3'UTR of a transgene cassette encoded within a retroviral vector genome - such as a lentiviral vector genome - also harbours elements required for reverse transcription and integration; namely, the 3' polypurine tract (3'ppt) and the DNA attachment (att) site, respectively.
  • the 3'ppt is generated from viral RNA by the RNase H activity of RT during reverse transcription, resulting in a 15 nucleotide primer for plus-strand DNA synthesis.
  • the 3'ppt is highly conserved in most retroviruses and has been shown to be selectively used as the site of plus-strand initiation (Rausch and Grice, 2004).
  • the att site is defined as those end sequences important for integration.
  • the att site is comprised of U3 sequences at the terminus of the 5' LTR, and terminal U5 sequences at the end of the 3' LTR (Brown et al., 1999).
  • the positioning of additional elements in the 3' UTR of the transgene cassette should be considered relative to these components when a retroviral vector is being used.
  • the transgene 3'UTR will necessarily contain the 3'ppt and U3 att sequences, since both the transgene cassette and vector genome cassette will utilise the same polyadenylation (transcriptional terminator) sequence within the 3'LTR.
  • transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette
  • novel cis-acting sequences described herein e.g. the CAR sequence, and/or the ZCCHC14 protein-binding sequence
  • Suitable positions are shown in Figure 1. Other suitable positions may also be identified by a person of skill in the art, based on the disclosure provided herein
  • the core sequence that comprises both the 3'ppt and the att site may have a sequence of SEQ ID NO:25 (wherein 3'ppt is in bold, and att is underlined):
  • where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, it is preferable if the sequence above (of SEQ ID NO:25) is not disrupted by the novel cis-acting sequences described herein.
  • the sequence of SEQ ID NO: 26 may be used to provide the 3'ppt and att site (e.g. of a lentiviral vector genome expression cassette as described herein) (wherein 3'ppt is in bold, and att is underlined):
  • where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, it is preferable if the sequence above (of SEQ ID NO:26) is not disrupted by the novel cis-acting sequences described herein.
  • where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, cis-acting elements within the 3'UTR of the transgene cassette may be positioned upstream and/or downstream of the above uninterrupted sequences.
  • Figure 1 also indicates how the CARe and/or ZCCHC14 binding loop(s) may be variably positioned within LV genome expression cassettes with or without a PRE, with transgene sequences in forward or reverse orientation.
  • Example 2 General positioning of CAR sequences with or without other cis-acting elements in the 3'UTR of transgene cassettes for empirical testing
  • Figure 2 provides greater detail on how CAR sequences may be used alone or in combination with ZCCHC14 protein-binding sequence(s), with or without a PRE within the 3'UTR of a transgene cassette of a viral vector.
  • Figure 2A shows the CAR element consensus sequence (adapted from Lei et al., 2013).
  • the nucleotide sequences and structures for ZCCHC14-binding loops from HCMV RNA2.7 and WHV wPRE are shown in Figure 2B.
  • Recruitment of ZCCHC14 to the 3' region of the transgene mRNA results in a complex of ZCCHC14-Tent4, which enables mixed tailing within polyA tails of the polyadenylated mRNA.
  • a plurality of CARe sequences, for example 6 or 8 or 10 or 16 CARe sequences (repeated tandemly), may be positioned throughout the 3'UTR and optionally mixed with one or more ZCCHC14 binding loop(s) upstream of the polyA sequence to generate a composite sequence that increases the expression of the transgene protein. This can be done by rational design, for example testing different transgene cassettes containing increasing numbers of CARe sequences, and optionally in combination with one or more ZCCHC14 protein-binding sequences at different positions.
  • a (semi-)random library of transgene cassettes may be produced to generate thousands to tens of millions or more variants, for example using Golden Gate cloning in combination with nucleic acid barcoding.
  • Such cassettes may be inserted directly into the viral vector of choice and transgene expression levels evaluated in target cells.
  • Figure 2C gives an overview of how such variants may be tested, screened and selected for high transgene expression in target cells to identify candidate composite cis-acting elements.
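  • The tandem arrangement of CARe tiles described in this example can be sketched as follows. The tile is SEQ ID NO:30 from the text; the function names are hypothetical, and the ZCCHC14 protein-binding sequence is left as a caller-supplied argument because its sequence is not reproduced here. Treating the inverted control as the reverse complement is an assumption.

```python
CARE_TILE = "CCAGATCCTG"  # example CARe tile (SEQ ID NO:30)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tandem_care(n_tiles: int, tile: str = CARE_TILE) -> str:
    """Concatenate n_tiles copies of the tile head to tail."""
    if n_tiles < 1:
        raise ValueError("at least one tile is required")
    return tile * n_tiles


def composite_element(n_tiles: int, zcchc14_site: str = "",
                      inverted: bool = False) -> str:
    """Tandem CARe tiles, optionally inverted, followed by an optional
    ZCCHC14 protein-binding sequence supplied by the caller."""
    car = tandem_care(n_tiles)
    if inverted:
        car = reverse_complement(car)
    return car + zcchc14_site
```

  • For example, `tandem_care(16)` gives a 160 nt sequence, and rational designs with increasing tile numbers can be generated by varying `n_tiles`.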
  • Example 3 Use of novel CARe-containing 3'UTRs within lentiviral vector transgene cassettes to increase transgene expression in different target cells
  • Lentiviral vector genomes were designed to contain GFP expression cassettes driven by a variety of different promoters; namely, EF1a (containing its own intron), EFS (EF1a lacking the intron), human phosphoglycerate kinase (huPGK), Ubiquitin (UBC; containing its own intron) and UBCs (UBC lacking the intron).
  • Transgene cassette 3'UTR variants were generally made by deleting the wPRE and adding a CAR sequence (160bp in size) composed of tandem 16x CARe sequences (CARe.16t) in this position ('Pos1'), and optionally inserting a ZCCHC14 protein-binding sequence (from HCMV RNA2.7 or WHV wPRE) downstream of position 1 ('Pos2'). Controls were made with wPRE-only, ΔwPRE or where the 160bp CAR sequence was inverted (CARe.inv16t).
  • LVs were produced in suspension (serum-free) HEK293T cells by transient co-transfection, with packaging plasmids (gagpol, VSVG and rev for RRE/rev-dependent genomes).
  • Titres of these 'MSD-2KO' genomes can be recovered by use of co-expression of a modified U1 snRNA (example '256U1' used herein) that anneals to SL1 of the packaging signal (see WO 2021/014157 and WO 2021/160993, incorporated herein by reference).
  • Clarified crude harvest material from LV productions was titrated on adherent HEK293T cells by flow cytometry of transduced cells (biological titre, GFP TU/mL) and/or by integration assay (Integrating TU/mL).
  • alternative cells such as Jurkat (T-cell line), HEPG2 (Human hepatocyte carcinoma cell line) and equine primary cells '92BR' (testis fibroblasts) were transduced at matched multiplicities of infection (MOI), ranging from MOI 0.1-to-2.
  • Transgene expression levels in these transduced cells were measured by flow cytometry and (GFP) transgene Expression Scores (ES) generated by multiplying %GFP-positive cells by the median fluorescence intensities (arbitrary units).
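  • The Expression Score defined above is a simple product, which can be written as below. The function name is an assumption, and the numbers in the usage note are made-up inputs, not data from the experiments.

```python
def expression_score(percent_gfp_positive: float,
                     median_fluorescence: float) -> float:
    """Expression Score (ES) = %GFP-positive cells x median fluorescence
    intensity (arbitrary units)."""
    if not 0.0 <= percent_gfp_positive <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if median_fluorescence < 0:
        raise ValueError("fluorescence intensity cannot be negative")
    return percent_gfp_positive * median_fluorescence
```

  • With hypothetical inputs of 50% GFP-positive cells and a median intensity of 200 arbitrary units, the score is 10,000.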
  • Tables 3 and 4 present the initial data for these experiments, and Figures 3-7 display some of these data graphically.
  • Tables 3 and 4. A summary of lentiviral vectors produced and transgene expression within selected transduced target cells.
  • the tables display the types of lentiviral vectors produced: standard RRE/rev-dependent (STD RRE-LV), U1/RRE-dependent MSD-mutated LVs (MSD-2KOm5-RRE-LV [+256U1]) or MSD-mutated, RRE/rev-independent (ΔRRE) LVs (MSD-2KOm5-ΔRRE-LV); the transgene promoters; and the stated 3'UTR cis-acting elements, positioned either upstream (Pos1) or downstream (Pos2) of each other when two cis-acting elements are employed.
  • Production titres from suspension (serum-free) HEK293T cells are stated as biological (GFP TU/mL) or integrating (Integrating TU/mL) titres (crude harvest).
  • Median fluorescence intensities (Arb units) are displayed at indicated matched MOIs from transduction of the stated target cells (92BR are donkey primary cells).
  • the novel elements described herein boost expression compared to the ΔwPRE control. ND - not done, NT - not tested.
  • wPRE, wPRE3 (shortened wPRE), ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t), and/or single copy of the ZCCHC14 stem loop from either HCMV RNA2.7 (HCMV.ZSL1) or WHV wPRE (WPRE.ZSL1).
  • Figure 4 presents data in transduced Jurkat cells at three different MOIs, and also reports on transgene mRNA levels in the same experiment. These data show that the CARe.16t sequence was able to boost expression from LVs delivering an EFS-GFP cassette lacking the wPRE to the same levels observed with a cassette containing the wPRE. Surprisingly, the CARe.16t sequence provided a boost to expression greater than cassettes containing the wPRE when employing the EF1a promoter (i.e. containing an intron). This was not expected, given that CAR elements were generally expected to act on intronless transcripts in their natural context. Figure 4D confirmed that some aspect of all these effects was due to increased mRNA steady-state pools in transduced cells.
  • Example 4 Pairing a ZCCHC14 protein-binding sequence with multiple numbers of CARe tiles leads to increased transgene expression in transduced cells
  • Control LVs containing wPRE, no cis-acting element (ΔwPRE), the 16 x 10bp CARe variant (CARe.16t) or a single ZCCHC14 protein-binding sequence were also produced. Clarified LV supernatants were titrated to generate integrating titres (TU/mL) on adherent HEK293T cells. Following this, fresh adherent HEK293T cells were transduced at matched MOIs of 0.5, 1 or 2, and cultures incubated for 3 days. GFP expression in transduced cells was measured by flow cytometry, and median fluorescence intensity values generated (arbitrary units). These were normalised to the wPRE-containing control (set to 100%).
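  • Normalisation to the wPRE-containing control (set to 100%) can be sketched as below. The function name, the dictionary layout and the construct labels used as keys are assumptions, and the numbers in the usage note are hypothetical rather than measured values.

```python
def normalise_to_control(mfi_by_construct: dict, control: str = "wPRE") -> dict:
    """Express each median fluorescence intensity as a percentage of the
    control construct, which is therefore reported as 100%."""
    reference = mfi_by_construct[control]
    if reference <= 0:
        raise ValueError("control intensity must be positive")
    return {name: 100.0 * value / reference
            for name, value in mfi_by_construct.items()}
```

  • With hypothetical intensities of 200 for the wPRE control and 300 for a test construct, the test construct is reported as 150% of control.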
  • Example 5 Transgene 3'UTRs bearing multiple numbers of CARe tiles optionally paired with a ZCCHC14 protein-binding sequence lead to increased transgene expression in transduced Jurkat cells.
  • ZSL1 ZCCHC14 protein-binding sequence
  • SEQ ID NO:29 10bp CARe sequence CCAGTTCCTG
  • Control LVs containing wPRE, no cis-acting element (ΔwPRE), the 16 x 10bp CARe variant (CARe.16t) or a single ZCCHC14 protein-binding sequence were also produced. Clarified LV supernatants were titrated to generate integrating titres (TU/mL) on adherent HEK293T cells (Figure 9). Following this, fresh Jurkat cells were transduced at matched MOIs of 0.5 or 1, and cultures incubated for 3 days. GFP expression in transduced cells was measured by flow cytometry, and median fluorescence intensity values generated (arbitrary units). These were normalised to the wPRE-containing control (set to 100%).
  • the highest transgene expression was observed with as few as 16x CARe tiles, and comparable expression to wPRE was achieved when ZSL1 was paired with just 8x CARe tiles.
  • the CARe.8t or CARe.16t elements in combination with the single ZCCHC14 protein-binding sequence were ~150bp and ~230bp respectively, generating transgene expression equal to or 160% of that of the ~590bp wPRE control, respectively.
  • Example 6 Assessment of CARe consensus sequence variants as 16x tiles in combination with a ZCCHC14 protein-binding sequence on transgene expression in transduced Jurkat cells.
  • the CARe consensus sequence generated by Lei et al., 2013 - and tested in the present invention within transgene 3'UTRs - was engineered to reflect the variance within the consensus to generate alternative 'synthetic' tiles. These are shown in Figure 10, indicating how each variant differed within this semi-degenerate consensus sequence. Since it is shown in the present invention that 8x-to-16x copies of the CARe consensus tile produced increased transgene expression in a variety of different cell types/lines, these synthetic variants were tandemised as 16x tiles and tested in combination with the HCMV.ZSL1 loop. Another variant was tested in which the CARe consensus 16x tile fragment was mutated in every other tile (i.e.
  • Figure 10 displays transgene expression normalised to vector copy number (VCN).
  • the variant containing alternate mutated CARe tiles performed similarly to the CARe.16t consensus control without the ZSL1 loop (which produced slightly higher levels of expression than wPRE).
  • these data indicate that the CARe consensus tile is the optimal 'version', and that each tile contributes to expression activity in a cumulative manner, since if only some consensus tiles were 'active' whilst others were acting as 'spacers', then the cons/mut5+ZSL1 variant would be expected to be as active as the CARe.16t+ZSL1 control.
  • the cons/mut5 variant boosted expression to 2/3rds that of the CARe.16t+ZSL1 control, which is more in line with the activity of the CARe.8t+ZSL1 variant in Figure 9.
  • CARe consensus variants were generated based on 'native' sequences as per Lei et al., 2013, namely, those found in c-Jun, HSPB3, IFNalpha and IFNbeta. These are displayed in Figure 11. These variants were tandemised into 16x CARe elements and paired with the HCMV.ZSL1 loop. These were inserted into an EFS-GFP expression cassette within a standard RRE/rev-dependent LV genome cassette and used to produce LVs in suspension (serum-free) HEK293T cells. The same set of experiments in Jurkat cells as described above was carried out, and data reported in Figure 11. In this case, variant 2 from c-Jun and variant 1 of IFNal provided similar levels of expression compared to wPRE.
  • Example 7 Use of CARe/ZSL1 variant as 3'UTR element within rAAV vectors.
  • rAAV vector genomes were engineered to contain example CAZL elements, so that they could be compared with 'empty' cassettes or cassettes using the wPRE, which is ~590 nts in length.
  • the use of CAZL elements may be especially useful in rAAVs where the genome packaging size limit is a 'hard' one, at ~5kb.
  • Figure 12 shows a schematic of these example rAAV vectors and of those generated for exemplification, with the CAZL elements tested ranging in size: 141 nt (CARe.4t/ZSL1), 181 nt (CARe.8t/ZSL1) and 261 nt (CARe.16t/ZSL1). Note that these variants contain 'stuffer' sequence between the tandemised CARe tiles and the ZSL1 sequence due to cloning sites, and thus may in practice be reduced further in size. Additional controls included the inverted CAZL element for each type to control both for rAAV genome size and for CAZL functionality, since the inverted elements are not expected to be functional.
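  • The element sizes quoted above can be checked with simple arithmetic: with 10 nt per CARe tile, each of the three CAZL variants leaves the same 101 nt for the non-tile portion (inferred here to be the ZSL1 sequence plus stuffer). A minimal sketch, with hypothetical names and the approximate 590 nt wPRE length taken from the text:

```python
TILE_NT = 10    # length of one CARe tile
WPRE_NT = 590   # approximate wPRE length quoted in the text

# Total CAZL element lengths quoted in the text, keyed by number of tiles.
CAZL_TOTAL_NT = {4: 141, 8: 181, 16: 261}


def non_tile_nt(n_tiles: int, total_nt: int) -> int:
    """Length left over once the tandem CARe tiles are subtracted."""
    return total_nt - n_tiles * TILE_NT


def saving_vs_wpre(total_nt: int) -> int:
    """Nucleotides of cargo space freed relative to the wPRE."""
    return WPRE_NT - total_nt
```

  • On these figures the largest variant (261 nt) frees roughly 329 nt relative to the wPRE.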
  • Two sets of the rAAVs were generated, wherein the GFP transgene was driven by the CMV or EFS promoters.
  • rAAVs were produced by transient transfection of HEK293T cells, followed by qPCR titration (see details in Figure 13 and Materials/Methods).
  • HEPG2 cells were transduced with the different rAAV vector stocks at matched MOIs (250 and 500; see details in Figure 13 and Materials/Methods).
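  • Transduction at a matched MOI implies adjusting the volume of each vector stock to its titre. A minimal sketch under the usual definition of MOI (vector units per target cell) is given below; the function name is an assumption and the example numbers are hypothetical.

```python
def vector_volume_ml(moi: float, n_cells: float, titre_per_ml: float) -> float:
    """Volume of vector stock (mL) needed to reach the requested MOI.

    MOI is taken as vector units per cell, so the units required are
    moi * n_cells and the volume is that number divided by the titre.
    """
    if titre_per_ml <= 0:
        raise ValueError("titre must be positive")
    if moi < 0 or n_cells < 0:
        raise ValueError("MOI and cell number cannot be negative")
    return moi * n_cells / titre_per_ml
```

  • For example, a hypothetical stock at 1e6 units/mL used at MOI 0.5 on 1e5 cells would require 0.05 mL; stocks with different titres receive proportionally different volumes so that the MOI is matched.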
  • Transgene expression data are presented in Figure 13 and demonstrate that the CAZL elements were able to increase transgene expression in HEPG2 cells compared to the empty control, and for the CMV-GFP cassette, the CARe.8t/ZSL1 and CARe.16t/ZSL1 CAZL variants enabled ~2-fold greater expression than the wPRE, despite being less than half the size of wPRE. Some variance was observed when employing different promoters, with the CARe.4t/ZSL1 increasing expression when using the EFS promoter to similar levels as the other two CAZL variants.
  • the inverted CAZL variants expressed transgene at similar levels as the empty control in all cases, demonstrating that orientation of the CAZL elements in the forward sense is necessary (i.e. these are functional sequences) and that the observed improved expression could not be explained by merely the increase in vector genome size or transgene cassette size relative to the empty control.
  • CARe.4t/ZL1 (SEQ ID NO:69):
  • CARe.8t/ZL1 (SEQ ID NO:70):
  • CARe.16t/ZL1 (SEQ ID NO:71):
  • HEK293T suspension cells were grown in Freestyle™ 293 Expression Medium (Gibco) supplemented with 0.1% Cholesterol Lipid Concentrate (Gibco) and incubated at 37 °C in 5% CO2, in a shaking incubator (25 mm orbit set at 190 RPM).
  • HEK293T adherent cells were maintained in complete media (Dulbecco's Modified Eagle Medium (DMEM) (Sigma) supplemented with 10% heat-inactivated fetal bovine serum (Gibco), 2 mM L-glutamine (Sigma) and 1% non-essential amino acids (NEAA) (Sigma)), at 37 °C in 5% CO2.
  • All vector production was carried out in HEK293Ts, in 24-well plates (1 mL volumes, on a shaking platform) or 30 mL shake flasks.
  • HEK293T cells were seeded at 8 x 10^5 cells per mL in serum-free media and were incubated at 37 °C in 5% CO2, shaking, throughout vector production.
  • Approximately 24 hours after seeding, the cells were transfected using the following mass ratios of plasmids per effective final volume of culture at transfection: 0.95 μg/mL Genome, 0.1 μg/mL Gag-Pol, 0.06 μg/mL Rev, 0.07 μg/mL VSV-G.
  • Transfection was mediated by mixing DNA with Lipofectamine 2000CD in Opti-MEM as per manufacturer's protocol (Life Technologies). Sodium butyrate (Sigma) was added ~18 hrs later to 10 mM final concentration. Typically, vector supernatant was harvested 20-24 hours later, and then filtered (0.22 μm) and frozen at -80 °C. Vector used for transduction of Jurkat cells was either produced in the absence of sodium butyrate or centrifuged at 20,000 rpm for 1 h 30 min and the vector pellet resuspended in TSSM (Tromethamine, Sodium Chloride, Sucrose, Mannitol buffer).
  • HEK293T cells were seeded at 1.2 x 10^4 cells/well in 96-well plates.
  • GFP-encoding viral vectors were used to transduce the cells in complete media containing 8 mg/ml polybrene and 1 x Penicillin Streptomycin for approximately 5-6 hours, after which fresh media was added. The transduced cells were incubated for 2 days at 37 °C in 5% CO2. Cultures were then prepared for flow cytometry using an Attune-NxT® (Thermofisher). Percent GFP expression was measured and vector titres were calculated using a predicted cell count of 2 x 10^4 cells at the time of transduction (based on typical growth rate), the dilution factor of the vector sample, the percentage positive GFP population and total volume at transduction.
  • Lentiviral vector titration by integration assay: 0.5 mL volumes of neat to 1:5 diluted vector supernatants were used to transduce 1 x 10^5 HEK293T cells at 12-well scale in the presence of 8 μg/mL polybrene. Cultures were passaged for 10 days (1:5 splits every 2-3 days) before host DNA was extracted from 1 x 10^6 cell pellets. Duplex quantitative PCR was carried out using a FAM primer/probe set to the HIV packaging signal (ψ) and to RRP1, and vector titres (TU/mL) calculated using the following factors: transduction volume, vector dilution, RRP1-normalised HIV-1 ψ copies detected per reaction.
  • HEK293T cells were seeded at 9E4 cells per well in 12-well plates.
  • Jurkat cells were seeded at 2E5 cells per well on a 12 well plate. The following day, cells were recounted and MOI calculated based on integration titre and number of cells per well.
  • Cells were passaged for 10 days with splits every 2-3 days. Flow cytometry, using an Attune-NxT® (Thermofisher), was performed at each split to determine the transgene (GFP) mean fluorescence intensity (MFI).
  • RNA extraction from the cytosolic fraction was performed using RNeasy kit (Qiagen), followed by DNAse treatment (ezDNase™, Invitrogen) and reverse transcription (SuperScript™ IV VILO™, Invitrogen). Quantification of cytosolic transgene mRNA was done by qPCR with primers and probes targeting GFP, with GAPDH as a normalization control. Analysis was done by comparative quantification using the delta CT method.
  • rAAV vector production: AAV vector was produced with the AAV-MAX Helper-Free AAV Production System (ThermoFisher). The manufacturer's protocol was followed with the exception that the production cells used were 1.65s cells. Cells were seeded at 3E+06 live cells/mL in Freestyle + 0.1% Cholesterol (hereafter FS + 0.1% CLC) in a total volume of 20 mL and incubated in a Multitron set to 37 °C with 5% CO2 and 200 rpm until transfection mixes were ready. A total of 1.5 μg/mL of DNA was transfected into cells at a molar ratio of 1:1:1 (Transfer: AAV2Rep/Cap: Helper).
  • Transfer plasmids used were pscAAV2-CMV-GFP and pscAAV2-EFS-GFP. After transfection, cells were returned to the Multitron and incubated for 72 hours before harvest with AAV-MAX lysis buffer (Thermo Fisher) following manufacturer's protocol.
  • rAAV vector titration: The number of genome-containing particles present in the AAV preparation was determined by TaqMan Real-time PCR assay. Prior to the qPCR, vector was treated with DNAse and exonuclease in order to remove residual DNA. These enzymes were then heat-inactivated with a 95 °C step, which simultaneously lyses AAV capsids to release vector genomes to allow analysis by PCR. TaqMan primers and probe were designed to target the GFP.
  • rAAV transduction at matched multiplicity of infection (MOI): HepG2 cells were seeded at 2E5 cells/well in MEM supplemented with 10% FBS, L-glut (2 mM) and 1% NEAA in 12-well plates. The following day, cells were recounted, and MOI calculated based on the genome copy number/mL determined by qPCR and the number of cells. HepG2 cells were transduced at MOI 250 and 500. Approximately 72 hours after infection, flow cytometry, using an Attune-NxT® (Thermofisher), was performed to determine the percentage of GFP positive cells and the transgene (GFP) mean fluorescence intensity (MFI).
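The flow cytometry titration and matched-MOI transduction steps described in the methods above can be sketched numerically as follows. This is only an illustrative sketch: the text names the factors that enter the titre calculation (predicted cell count, percentage of GFP-positive cells, dilution factor and transduction volume) but does not give an explicit equation, so the formula below is the conventional way of combining them and is an assumption. The function names and the example values are hypothetical.

```python
def gfp_titre_tu_per_ml(cells_at_transduction, percent_gfp_positive,
                        dilution_factor, volume_ml):
    """Estimate transducing units (TU) per mL of undiluted vector.

    Assumed formula: (cells x fraction GFP-positive x dilution) / volume.
    """
    transduced_cells = cells_at_transduction * percent_gfp_positive / 100.0
    return transduced_cells * dilution_factor / volume_ml


def volume_for_moi_ml(target_moi, cells_per_well, titre_per_ml):
    """Volume of vector stock (mL) needed to reach a target MOI.

    MOI is taken as vector units (TU or genome copies) per cell.
    """
    return target_moi * cells_per_well / titre_per_ml


# Illustrative values only (not data from the text):
# 2 x 10^4 cells, 10% GFP-positive, 1:100 dilution, 0.5 mL at transduction.
example_titre = gfp_titre_tu_per_ml(2e4, 10.0, 100, 0.5)

# Volume of that stock needed to transduce 2 x 10^5 cells at MOI 1.
example_volume = volume_for_moi_ml(1, 2e5, example_titre)
```

For AAV, the same MOI arithmetic would use genome copies per mL from qPCR in place of TU/mL.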

Abstract

The present invention relates to novel nucleotide sequences, and to viral vectors or cells comprising such nucleotide sequences. The invention also relates to viral vector production systems, and methods for producing viral vectors using the nucleotide sequences, viral vectors, or cells described herein. Methods for identifying sequences that improve transgene expression in a target cell are also provided herein.

Description

Novel viral regulatory elements
The present invention relates to novel nucleotide sequences, and to viral vectors or cells comprising such nucleotide sequences. The invention also relates to viral vector production systems, and methods for producing viral vectors using the nucleotide sequences, viral vectors, or cells described herein. Methods for identifying sequences that improve transgene expression in a target cell are also provided herein.
The use of lentiviral (LV) and AAV vectors as part of licensed medicinal products is now a reality, and is anticipated to increase over the coming decades. As the underlying causes of many genetic diseases are being revealed, it is clear that the delivery of more functionality to the genetic payload (rather than a single gene) within vector genomes is becoming extremely desirable. The current 'limits' of lentiviral vector capacity have not changed significantly over the last 20 years, and remain in the region of ~7kb of transgene space when employing standard genome cis-acting sequences such as the typical packaging sequence, rev-response element (RRE) and post-transcriptional regulatory elements (PREs) such as that from the woodchuck hepatitis virus (wPRE). For retroviral vectors (including lentiviral vectors), use of PREs, such as wPRE, has been shown to substantially benefit transgene expression in target cells, whereas the benefit in increasing vector titres is less evident (Zufferey et al., 1999). Intrinsically, some aspect of this restriction is defined by the size of the wild type HIV-1 genome of ~9.5kb from which these vector systems are derived. Generally, the specific titres of lentiviral vectors diminish substantially in proportion to their payload size over-and-above this 'limit'. Several aspects of lentiviral vectorology are likely to contribute to the limit: [1] steady-state pool of vector genomic RNA (vRNA) in the production cell, [2] efficiency of conversion of vRNA to dsDNA by reverse transcriptase, and [3] efficiency of nuclear import and/or integration into host DNA. Indeed, the desire to minimize lentiviral vector backbone sequences has recently been highlighted in the literature, leading to attempts to alter the arrangement of existing cis-elements (Sertkaya et al., 2021; Vink et al., 2017) as well as to generate novel genome configurations to minimize RRE and the packaging signal (WO/2021/181108).
The known packaging limit for AAV vectors is ~5kb, and for 'self-complementary' scAAV vectors this is halved at ~2.5kb. PREs such as the wPRE are often utilized within AAV vectors to increase transgene expression in target cells. Due to the premium for space within these smaller viral vectors, there is pressure to minimize all sequences such as the promoter, introns, UTRs, polyA sequence and even the protein product - all whilst retaining biological output of the delivered cassette. The necessity of the presence of these elements (to achieve high gene expression) has greatly restricted the utility of AAV vectors to deliver more complicated, larger payloads. This has led to the development of radical approaches, which require co-delivery of at least two different AAV vector types (so-called 'dual AAVs') that then allow for some degree of in-cell 'fusion' of the transgene product, by DNA cassette recombination, by pre-mRNA trans-splicing or by protein-protein interactions; see Tornabene and Trapani (2020) and references therein. In addition to the complication of manufacturing multiple AAV vector products (and optimizing their co-administration), these approaches suffer from inefficiencies of recapitulation of the full length transgene product, as well as safety concerns regarding production of truncated products and high vector doses.
An attempt has also been made previously to minimize the size of the wPRE, generating 'WPRE3', which contains minimal gamma and alpha elements and is approximately 41.2% of the full length wPRE (Choi et al., 2014). This was tested in AAV vectors delivered to hippocampal neuron cultures, and produced transgene protein up to 80% of that measured for the full length wPRE. WPRE3 has been used in lentiviral vectors encoding dCas9 but the titres were reported to be very low, and no information regarding dCas9 expression in target cells was reported.
As the success of gene therapies continues, there is expectation that transgene cassettes will become more complex, requiring the delivery of more functions, for example in delivering more genes, transgene control or suicide switches. Improved viral vectors for larger payloads are urgently needed.
Brief summary of the disclosure
The inventors have generated viral vectors with novel short cis-acting sequences in the 3' UTR of a transgene expression cassette. They have identified two novel short cis-acting sequences that can be introduced into the 3' UTR of a transgene expression cassette, either alone, or in combination. These novel short nucleotide sequences (and combinations thereof) can either be used in addition to traditional post-transcriptional regulatory elements (PREs e.g. from woodchuck hepatitis virus; wPRE) to boost transgene expression in target cells or to replace these longer PREs entirely, enabling increased transgene capacity whilst maintaining high levels of transgene expression in target cells.
The inventors have surprisingly found that CAR sequences previously identified to function within 5'UTR sequences of heterologous mRNA provide enhanced gene expression when incorporated into the 3'UTR of a viral vector transgene expression cassette. Moreover, when located within the transgene expression cassette 3' UTR, the initially reported 160bp CAR sequence (composed of 16x repeats of a 10bp core sequence) could be further minimized to fewer than 16 repeats without loss of the benefit to transgene expression. Surprisingly, these CAR sequences are shown to enhance the transgene expression from transgene cassettes utilizing introns, as well as boosting expression from cassettes already containing a full length wPRE.
The inventors have also identified a minimal ZCCHC14 protein-binding sequence that can be incorporated into the 3' UTR of a transgene expression cassette to improve transgene expression. The inventors have shown that these minimal ZCCHC14 protein-binding sequences can be combined with the CAR sequences described herein to further enhance transgene expression.
The novel cis-acting sequences described herein can be used to minimize the size of functional cis-acting sequences of all viral vectors such that payloads can be increased and/or titres of vectors containing larger payloads can be improved, whilst maintaining transgene expression levels in target cells. The invention may therefore be employed [1] within viral vector genomes where 'cargo' space is not limiting, such that the novel cis-acting sequences further enhance expression of a transgene cassette containing another 3' UTR element, such as the wPRE, or [2] within viral vector genomes where cargo space is limiting (i.e. at or above or substantially above the packaging 'limit' of the viral vector system employed), where the novel cis-acting sequences may be used instead of a larger 3'UTR element, such as the wPRE, thus reducing vector genome size, whilst also imparting an increase to transgene expression in target cells compared to a vector genome lacking any 3'UTR cis-acting element.
Accordingly, the invention provides a nucleotide sequence comprising a transgene expression cassette, wherein the 3' UTR of the transgene expression cassette comprises at least one cis-acting sequence selected from: a) a cis-acting Cytoplasmic Accumulation Region (CAR) sequence, comprising at least one CAR element (CARe) sequence; and/or b) a cis-acting ZCCHC14 protein-binding sequence, comprising at least one CNGGN-type pentaloop sequence, wherein the cis-acting ZCCHC14 protein-binding sequence does not comprise a full length post-transcriptional regulatory element (PRE) α element and does not comprise a full length PRE γ element.
Suitably, the CAR sequence may comprise a plurality of CARe sequences.
Suitably, the plurality of CARe sequences may be in tandem. Suitably, the CAR sequence may comprise at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty CARe sequences, optionally wherein the CARe sequences are in tandem.
Suitably, the CAR sequence may comprise at least six CARe sequences in tandem, or at least ten CARe sequences in tandem.
Suitably, the CAR sequence may comprise at least eight CARe sequences in tandem, at least twelve CARe sequences in tandem, or at least sixteen CARe sequences in tandem.
Suitably, the CARe nucleotide sequence may be BMWGHWSSWS (SEQ ID NO: 24) or BMWRHWSSWS (SEQ ID NO: 55).
Suitably, the CARe nucleotide sequence may be CMAGHWSSTG (SEQ ID NO:28).
Suitably, the CARe nucleotide sequence may be selected from the group consisting of: CCAGTTCCTG (SEQ ID NO:29), CCAGATCCTG (SEQ ID NO:30), CCAGTTCCTC (SEQ ID NO:31), TCAGATCCTG (SEQ ID NO:32), CCAGATGGTG (SEQ ID NO:33), CCAGTTCCAG (SEQ ID NO:34), CCAGCAGCTG (SEQ ID NO:35), CAAGCTCCTG (SEQ ID NO:36), CAAGATCCTG (SEQ ID NO:37), CCTGAACCTG (SEQ ID NO:38), CAAGAACGTG (SEQ ID NO:39), TCAGTTCCTG (SEQ ID NO: 56), GCAGTTCCTG (SEQ ID NO: 57), CAAGTTCCTG (SEQ ID NO: 58), CCTGTTCCTG (SEQ ID NO: 59), CCTGCTCCTG (SEQ ID NO: 60), CCTGTACCTG (SEQ ID NO:61), CCTGTTGCTG (SEQ ID NO: 62), CCTGTTCGTG (SEQ ID NO: 63), CCTGTTCCAG (SEQ ID NO: 64), CCTGTTCCTG (SEQ ID NO: 65), CCAATTCCTG (SEQ ID NO: 66) and GAAGCTCCTG (SEQ ID NO: 67).
Suitably, the CARe nucleotide sequence may be selected from the group consisting of: CCAGTTCCTG (SEQ ID NO:29), CCTGTTCCTG (SEQ ID NO: 59), CCTGTACCTG (SEQ ID NO:61), CCTGTTCCAG (SEQ ID NO: 64), CCAATTCCTG (SEQ ID NO: 66), CCTGAACCTG (SEQ ID NO:38), CCAGTTCCTC (SEQ ID NO: 31) and CCAGTTCCAG (SEQ ID NO:34).
Suitably, the CARe nucleotide sequence may be CCAGTTCCTG (SEQ ID NO:29). This sequence is also referred to as a "consensus" tile herein. It is the CARe sequence that is used to exemplify the invention in examples 1 to 4 below.
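The degenerate CARe definitions above use standard IUPAC nucleotide codes (for example B = C/G/T, M = A/C, W = A/T, H = A/C/T, S = C/G, R = A/G). The following minimal sketch, which is illustrative and not part of the described method, shows how a candidate 10-nt tile could be checked against such a consensus and how the number of tiles in tandem could be counted within a longer sequence. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus (e.g. BMWGHWSSWS) into a regex string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def matches_consensus(tile, consensus):
    """True if the whole tile fits the degenerate consensus."""
    return re.fullmatch(consensus_to_regex(consensus), tile.upper()) is not None


def max_tandem_tiles(sequence, consensus):
    """Largest number of back-to-back tiles matching the consensus."""
    tile_re = re.compile(consensus_to_regex(consensus))
    k = len(consensus)
    seq = sequence.upper()
    best = 0
    for start in range(len(seq) - k + 1):
        count, pos = 0, start
        # Walk forward one tile at a time while each window still matches.
        while tile_re.fullmatch(seq, pos, pos + k):
            count += 1
            pos += k
        best = max(best, count)
    return best


CONSENSUS_TILE = "CCAGTTCCTG"  # SEQ ID NO:29
car_16t = CONSENSUS_TILE * 16  # 16 tiles in tandem, 160 nt
```

For example, `matches_consensus("CCAATTCCTG", "BMWRHWSSWS")` is true while the same tile does not fit `BMWGHWSSWS`, because position 4 is A rather than G.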
Suitably, the 3' UTR of the transgene expression cassette may not comprise additional post-transcriptional regulatory elements (PREs). Suitably, the 3' UTR of the transgene expression cassette may comprise at least one additional post-transcriptional regulatory element (PRE).
Suitably, the additional PRE may be a Woodchuck hepatitis virus PRE (wPRE).
Suitably, the cis-acting ZCCHC14 protein-binding sequence may be a PRE α element fragment.
Suitably, the cis-acting ZCCHC14 protein-binding sequence may be: a) a fragment of a HBV PRE α element; or b) a fragment of HCMV RNA 2.7; or c) a fragment of a wPRE α element.
Suitably, the PRE α element fragment may be no more than 200 nucleotides in length.
Suitably, the PRE α element fragment may be no more than 90 nucleotides in length.
Suitably, the CNGGN-type pentaloop sequence may be comprised within a stem-loop structure having a sequence selected from the group consisting of:
(i) TCCTCGTAGGCTGGTCCTGGGGA (SEQ ID NO:40); and
(ii) GCCCGCTGCTGGACAGGGGC (SEQ ID NO:41).
Suitably, the CNGGN-type pentaloop sequence may be comprised within a heterologous stem-loop structure.
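As a simple illustration of the CNGGN-type motif, the sketch below scans a DNA (or RNA) sequence for every pentamer of the form C-N-G-G-N, including overlapping occurrences. It is an assumption-laden sketch: it only locates candidate pentamers, and does not predict whether a given pentamer actually forms the loop of a stem-loop, which would require secondary-structure analysis that is not attempted here. The function name is hypothetical.

```python
import re

# Lookahead so that overlapping CNGGN pentamers are all reported.
CNGGN = re.compile(r"(?=(C[ACGTU]GG[ACGTU]))")


def find_cnggn(sequence):
    """Return (0-based position, pentamer) for each CNGGN occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CNGGN.finditer(seq)]


SEQ_ID_40 = "TCCTCGTAGGCTGGTCCTGGGGA"
SEQ_ID_41 = "GCCCGCTGCTGGACAGGGGC"

hits_40 = find_cnggn(SEQ_ID_40)
hits_41 = find_cnggn(SEQ_ID_41)
```

Each of the two stem-loop sequences contains more than one pentamer fitting the pattern, so a motif scan of this kind is best treated as a first filter before structural inspection.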
Suitably, the ZCCHC14 protein-binding sequence may comprise a sequence selected from the group consisting of:
(i) TGCCGTCGCCACCGCGTTATCCGTTCCTCGTAGGCTGGTCCTGGGGAACGGGTCGGCGGCCGGTCGGCTTCT (SEQ ID NO: 42); and
(ii) CTATTGCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGGGGCTCGGCTGTTGGGCACTGACAATTCCGTGGTGTTGT (SEQ ID NO: 43).
Suitably, the 3' UTR of the transgene expression cassette may comprise a cis-acting CAR sequence and a cis-acting ZCCHC14 protein-binding sequence, optionally wherein the ZCCHC14 protein-binding sequence is located 3' to the CAR sequence. Suitably, the 3' UTR of the transgene expression cassette may comprise at least two spatially distinct cis-acting CAR sequences and/or at least two spatially distinct cis-acting ZCCHC14 protein-binding sequences.
Suitably, the cis-acting sequences in the 3' UTR of a transgene expression cassette may comprise the sequence of SEQ ID NO: 69, SEQ ID NO:70 or SEQ ID NO:71.
Suitably, the 3' UTR of the transgene expression cassette may further comprise a polyA sequence located 3' to the cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequence.
Suitably, the transgene expression cassette may further comprise a promoter operably linked to the transgene.
Suitably, the promoter may lack its native intron, optionally wherein the promoter is selected from the group consisting of: an EFS promoter, a PGK promoter, and a UBCs promoter.
Suitably, the promoter may comprise an intron, optionally wherein the promoter is selected from the group consisting of: an EF1a promoter and a UBC promoter.
Suitably, the transgene expression cassette may be a viral vector transgene expression cassette.
Suitably, the viral vector transgene expression cassette may be selected from the group consisting of: a retroviral vector transgene expression cassette, an adenoviral vector transgene expression cassette, an adeno-associated viral vector transgene expression cassette, a herpes simplex viral vector transgene expression cassette, and a vaccinia viral vector transgene expression cassette.
Suitably, the viral vector transgene expression cassette may be a retroviral vector transgene expression cassette or an adeno-associated viral vector transgene expression cassette.
Suitably, the retroviral vector transgene expression cassette may be a lentiviral vector transgene expression cassette.
A nucleotide sequence is also provided comprising a viral vector genome expression cassette, wherein the viral vector genome expression cassette comprises the nucleotide sequence of the invention.
Suitably, the viral vector genome expression cassette may be selected from the group consisting of: a retroviral vector genome expression cassette, an adenoviral vector genome expression cassette, an adeno-associated viral vector genome expression cassette, a herpes simplex viral vector genome expression cassette, and a vaccinia viral vector genome expression cassette.
Suitably, the viral vector genome expression cassette may be a retroviral vector genome expression cassette or an adeno-associated viral vector transgene expression cassette.
Suitably, the 3' UTR of the retroviral vector genome expression cassette may further comprise a 3' polypurine tract (3'ppt) that is located 5' to a DNA attachment (att) site, wherein, when the transgene expression cassette is in the forward orientation with respect to the genome expression cassette, the cis-acting sequence(s) are located 5' to the 3'ppt and/or 3' to the att site.
Suitably, the retroviral vector genome expression cassette may be a lentiviral vector genome expression cassette.
Suitably, the major splice donor site in the lentiviral vector genome expression cassette may be inactivated, optionally wherein the cryptic splice donor site 3' to the major splice donor site is also inactivated.
Suitably, the inactivated major splice donor site may have the sequence of GGGGAAGGCAACAGATAAATATGCCTTAAAAT (SEQ ID NO:4).
Suitably, the nucleotide sequence may further comprise a nucleotide sequence encoding a modified U1 snRNA, wherein the modified U1 snRNA has been modified to bind to a nucleotide sequence within the packaging region of the lentiviral vector genome.
Suitably, the viral vector genome expression cassette may be operably linked to the nucleotide sequence encoding the modified U1 snRNA.
A viral vector is also provided comprising a viral vector genome encoded by the nucleotide sequence of the invention.
A viral vector production system comprising a nucleotide sequence according to the invention and one or more additional nucleotide sequence(s) encoding viral vector components is also provided.
Suitably, the one or more additional nucleotide sequence(s) may encode gag-pol and env, and optionally rev.
A cell is also provided comprising a nucleotide sequence according to the invention, a viral vector according to the invention, or a viral vector production system according to the invention.
A method for producing a viral vector is also provided, comprising the steps of:
(a) introducing a viral vector production system of the invention into a cell; and
(b) culturing the cell under conditions suitable for the production of the viral vector.
A viral vector produced by the method of the invention is also provided.
Use of the nucleotide sequence of the invention, the viral vector production system of the invention, or the cell of the invention, for producing a viral vector is also provided.
A method for identifying one or more cis-acting sequence(s) that improve transgene expression in a target cell is also provided, the method comprising the steps of:
(a) transducing target cells with a viral vector of the invention;
(b) identifying target cells with a high level transgene expression; and
(c) optionally identifying the one or more cis-acting sequence(s) located within the 3' UTR of the transgene mRNA present within these target cells.
Suitably, step (c) may comprise performing RT PCR and optionally sequencing the transgene mRNA.
Suitably, the method may be performed using a plurality of viral vectors with:
(i) different cis-acting sequences in the 3'UTR of the transgene expression cassette; and/or
(ii) different cis-acting sequence locations within the 3'UTR of the transgene expression cassette; and/or
(iii) different cis-acting sequence combinations in the 3'UTR of the transgene expression cassette;
to identify one or more cis-acting sequence(s) that improve transgene expression in the target cell.
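One way the sequencing readout of such a screen might be analysed is sketched below. It is a hypothetical illustration rather than the method of the invention: it assumes that read counts per 3' UTR variant are available both for the transduced population before selection and for the selected high-expressing cells, and it ranks variants by log2 enrichment. The function names, the pseudocount and the example counts are all assumptions.

```python
import math


def variant_frequencies(counts, pseudocount=1.0):
    """Convert read counts per variant into frequencies.

    A pseudocount > 0 avoids zero frequencies for variants with no reads.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {v: (c + pseudocount) / total for v, c in counts.items()}


def log2_enrichment(presort_counts, sorted_counts, pseudocount=1.0):
    """log2(frequency after selection / frequency before selection)."""
    variants = set(presort_counts) | set(sorted_counts)
    pre = variant_frequencies(
        {v: presort_counts.get(v, 0) for v in variants}, pseudocount)
    post = variant_frequencies(
        {v: sorted_counts.get(v, 0) for v in variants}, pseudocount)
    return {v: math.log2(post[v] / pre[v]) for v in variants}


def rank_variants(presort_counts, sorted_counts, pseudocount=1.0):
    """Variants ordered from most to least enriched."""
    scores = log2_enrichment(presort_counts, sorted_counts, pseudocount)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up read counts for two placeholder variants (not data from the text).
presort = {"variant_A": 50, "variant_B": 50}
selected = {"variant_A": 75, "variant_B": 25}
ranking = rank_variants(presort, selected, pseudocount=0.0)
```

Repeating the selection and re-scoring over several rounds would correspond to the enrichment of the best variants described for the screening approach.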
Various aspects of the invention are described in further detail below.
Brief description of the drawings
Embodiments of the invention are further described hereinafter with reference to the accompanying drawings, in which:
Figure 1 provides an overview of the positional use of CARe cis-acting elements for use alone or in combination with ZCCHC14 stem loop(s) and/or a PRE within lentiviral vector genomes. The schematic shows the generalized structure of a lentiviral vector genome containing the RRE or deleted for RRE (containing an undisclosed feature) and internal transgene expression cassette encoding a gene of interest (GOI); such genomes typically utilize a PRE (such as wPRE) within the transgene 3' UTR. The PRE may optionally be entirely replaced with minimal CARe sequences alone or in combination with ZCCHC14 stem-loops (ZC'14 SL) up or downstream of the 3'ppt in order to reduce the size of the transgene cassette. For transgene cassettes inverted with respect to the forward directionality of the vector genomic RNA, the same cis-acting element options can be employed in the transgene 3'UTR, except there is no 3'ppt to consider.
Figure 2 provides a detailed view of the CARe and ZCCHC14 stem loop sequences and their incorporation into the 3'UTR region of transgene cassettes within viral vectors. A. The consensus sequence for the 10bp CARe core sequence (referred to herein as a 'tile'). B. Two non-limiting examples of ZCCHC14 binding stem loops found within HCMV (RNA2.7) and WHV (wPRE). ZCCHC14 recruitment leads to formation of a complex with Tent4, which promotes mixed tailing in polyA tails of mRNAs, stabilizing them. C. The concept of insertion of CARe sequences into the 3' UTR of a viral vector transgene cassette (DNA at top, RNA shown as curvy line below DNA), optionally together with ZCCHC14 stem loops, taking care in retro/lentiviral vectors not to disrupt 3'ppt or att ('Δ'[i.e. ΔU3]) integration sequences required for reverse transcription and integration respectively. CARe sequences and optionally ZCCHC14 stem loops can be designed rationally or by library design, and screened empirically in target cells. For example, screening can be done by viral vector transduction followed by selection of high-expressing cells (e.g. GFP FACS), followed by RT-PCR and sequencing of target mRNA to identify transcripts containing combinations of the cis-acting elements that lead to greater transgene expression and mRNA steady-state pools. This process can be repeated to enrich the best variants, whilst also optionally including error-prone RT-PCR to fine-tune sequences.
Figure 3 shows production titres in suspension (serum-free) HEK293T cells of lentiviral vectors harbouring different transgene promoters combined with 3' UTR cis-acting elements. A and B present data from two independent experiments for LV-RRE-EFS-GFP vectors containing different 3' UTR cis-acting elements: wPRE, ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t) and/or single copy of the ZCCHC14 stem loop from HCMV RNA2.7 (HCMV.ZSL1). C shows data for output titres of LV-RRE-EF1a-GFP (EF1a contains an intron) and LV-RRE-huPGK-GFP. Titres were measured by transduction of adherent HEK293T cells followed by flow cytometry based assay after 3 days (GFP TU/mL) or qPCR to LV DNA after 10 days (Integrating TU/mL). The data show that integrating titres of LVs are comparable irrespective of the presence/absence of any of the cis-acting elements but that GFP titres vary, reflecting the expression levels in transduced adherent HEK293T cells. The 16x 10bp CARe tile (only) in the sense orientation provided a boost to the GFP TU/mL titres of LVs lacking the wPRE.
Figure 4 shows that the 16x 10bp CARe tile boosts transgene expression from lentiviral vectors lacking wPRE in a T-cell line. LV-RRE-EFS-GFP and LV-RRE-EF1a-GFP vector stocks produced in suspension (serum-free) HEK293Ts were used to transduce Jurkat cells at matched multiplicity of infection (MOI): MOI 1 [A], MOI 0.25 [B] and MOI 0.1 [C]. Transduced cells were analysed by flow cytometry to obtain % GFP-positive values and median fluorescence intensity values (Arbitrary units). D displays data from normalized RT-PCR of extracted mRNA from the transduced cells, where the 100% level is set for each EFS-GFP or EF1a-GFP cassette containing the wPRE in each case. The data show that the 16x 10bp CARe tile (only) in the sense orientation restores transgene expression levels to those observed with wPRE-only, and for EF1a-GFP surprisingly boosts transgene expression levels higher than wPRE-only. The 16x 10bp CARe tile (only) in the sense orientation increases the levels of transgene mRNA above wPRE-only in all conditions.
Figure 5 provides production titres in suspension (serum-free) HEK293T cells of 'MSD-2KO'/'U1-dependent' lentiviral vectors harbouring different transgene promoters combined with 3' UTR cis-acting elements. LV-RRE-Pro-GFP vectors containing mutations in the SL2 loop of the packaging signal (thus ablating aberrant splicing from this region) were produced +/- 256U1, a modified U1 snRNA that binds to the vector genomic RNA to restore titres. Three different transgene promoters (EFS, EF1a, huPGK) and different 3' UTR cis-acting elements were employed: wPRE, ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t). Titres were measured by transduction of adherent HEK293T cells followed by flow cytometry based assay after 3 days (GFP TU/mL) or qPCR to LV DNA after 10 days (Integrating TU/mL).
Figure 6 provides production titres in suspension (serum-free) HEK293T cells of 'MSD-2KO'/ΔRRE lentiviral vectors harbouring different 3' UTR cis-acting elements and transgene expression in target cells. LV-ΔRRE-EFS-GFP vectors containing mutations in the SL2 loop of the packaging signal (thus ablating aberrant splicing from this region) were produced. Different 3' UTR cis-acting elements were employed: wPRE, ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t), and/or single copy of the ZCCHC14 stem loop from either HCMV RNA2.7 (HCMV.ZSL1) or WHV wPRE (WPRE.ZSL1). A. Titres were measured by transduction of adherent HEK293T cells followed by flow cytometry based assay after 3 days (GFP TU/mL) or qPCR to LV DNA after 10 days (Integrating TU/mL). B. Transgene expression in transduced adherent HEK293T or HEPG2 cells was measured by flow cytometry three days post-transduction at matched MOI.
Figure 7 shows transgene expression levels in primary cells transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements. LV-RRE-EFS-GFP [A] or LV-RRE-EF1a-GFP [B] vector stocks produced in suspension (serum-free) HEK293Ts were used to transduce primary cells (92BR) at matched multiplicity of infection (MOI): MOI 2, 1 or 0.5. Different 3' UTR cis-acting elements were employed: wPRE, wPRE3 (shortened wPRE), ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t), and/or single copy of the ZCCHC14 stem loop from either HCMV RNA2.7 (HCMV.ZSL1) or WHV wPRE (WPRE.ZSL1). Transgene expression in transduced adherent 92BR cells was measured by flow cytometry three days post-transduction at matched MOI.
Figure 8 shows transgene expression levels in adherent HEK293T cells transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOIs. LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (TU/mL). Vector stocks were used to transduce fresh adherent HEK293T cells at matched multiplicity of infection (MOI): MOI 2, 1 or 0.5. Different 3' UTR cis-acting elements were employed as 'stand-alone' elements: wPRE, 16x 10bp CARe tiles (CARe.16t) or a single copy of the ZCCHC14 stem loop (HCMV.ZSL1), compared to no element (ΔwPRE). Additionally, variants deleted for wPRE but containing a single copy of the ZCCHC14 stem loop were also paired with increasing numbers of CARe tile, from 1x to 20x 10bp copies. Transgene (GFP) expression in transduced adherent HEK293T cells was measured by flow cytometry three days post-transduction and median fluorescence intensities (Arbitrary units) normalised to that achieved with the standard wPRE-containing LV (set to 100%).
Figure 9 shows transgene expression levels in suspension Jurkat cells (T-cell line) transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOIs (diagonal lines). LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (open bars; TU/mL). Vector stocks were used to transduce fresh Jurkat cells at matched multiplicity of infection (MOI): MOI 1 or 0.5. Different 3' UTR cis-acting elements were employed as 'stand-alone' elements: wPRE, 16x 10bp CARe tiles (CARe.16t) or a single copy of the ZCCHC14 stem loop (HCMV.ZSL1), compared to no element (ΔwPRE). Additionally, variants deleted for wPRE but containing a single copy of the ZCCHC14 stem loop (at position 2) were also paired with increasing numbers of CARe tiles, from 1x to 20x 10bp copies (at position 1 i.e. upstream of position 2). Transgene (GFP) expression in transduced Jurkat cells was measured by flow cytometry three days post-transduction and median fluorescence intensities (Arbitrary units) normalised to that achieved with the standard wPRE-containing LV (set to 100%).
Figure 10 shows transgene expression levels in suspension Jurkat cells (T-cell line) transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOI. LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (not shown). Vector stocks were used to transduce fresh Jurkat cells at matched multiplicity of infection of 1. Different 3' UTR cis-acting elements were employed as 'stand-alone' elements: wPRE (black bar), 16x 10bp CARe tiles (CARe.16t; dark grey bar) or a single copy of the ZCCHC14 stem loop (HCMV.ZSL1; striped light grey bar), compared to no element (ΔwPRE; white bar). Additionally, variants deleted for wPRE but containing a single copy of the ZCCHC14 stem loop (at position 2) were also paired with 16x 10bp CARe tiles that contained synthetic variant sequences of the consensus (CARe.16t_vX; at position 1 i.e. upstream of position 2) as shown (grey bars). Transgene (GFP) expression in transduced Jurkat cells was measured by flow cytometry ten days post-transduction and median fluorescence intensities (Arbitrary units) normalised to vector copy-number (VCN), which was measured by qPCR against HIV-Psi on extracted host cell DNA. The solid horizontal line indicates expression level achieved by the larger wPRE element, and the dotted horizontal line indicates expression levels without any 3'UTR element.
Figure 11 shows transgene expression levels in suspension Jurkat cells (T-cell line) transduced with RRE/rev-dependent lentiviral vectors harbouring different 3' UTR cis-acting elements at matched MOI. LV-RRE-EFS-GFP vector stocks produced in suspension (serum-free) HEK293Ts were initially titrated on adherent HEK293T cells to generate integrating titres (not shown). Vector stocks were used to transduce fresh Jurkat cells at matched multiplicity of infection of 1. Different 3' UTR cis-acting elements were employed as 'stand-alone' elements: wPRE (black bar), 16x 10bp CARe tiles (CARe.16t; dark grey bar) or a single copy of the ZCCHC14 stem loop (HCMV.ZSL1; striped light grey bar), compared to no element (ΔwPRE; white bar). Additionally, variants deleted for wPRE but containing a single copy of the ZCCHC14 stem loop (at position 2) were also paired with 16x 10bp CARe tiles that contained native variant sequences of the consensus (CARe.16t_vX; at position 1 i.e. upstream of position 2) as shown (grey bars). These were from c-Jun, HSPB3, IFN-alpha and IFN-beta mRNAs. Transgene (GFP) expression in transduced Jurkat cells was measured by flow cytometry ten days post-transduction and median fluorescence intensities (Arbitrary units) normalised to vector copy-number (VCN), which was measured by qPCR against HIV-Psi on extracted host cell DNA. The solid horizontal line indicates expression level achieved by the larger wPRE element, and the dotted horizontal line indicates expression levels without any 3'UTR element.
Figure 12 shows example rAAV vector genomes containing either no 3'UTR element (empty) or 3'UTR elements to enhance transgene expression in target cells. The 'CAZL' element (a composite of tandem CARe 10bp consensus tiles [CARe.xt] and the ZCCHC14 stem loop [ZSL1]) is ~140-260 nts in length (depending on use of 4x to 16x CARe tiles). The wPRE is ~590 nts in length, and therefore occupies more of the rAAV vector genome, size being a critical limitation for rAAVs. Key: Inverted terminal repeat (ITR), Promoter (Pro), Gene of interest (GOI), polyadenylation signal (polyA).
Figure 13 shows the results of an experiment wherein rAAV vectors contained either CMV- or EFS-promoter driven GFP, paired with either variant CARe/ZSL1 ('CAZL') elements from ~140-to-260 nts in length or with the wPRE (~590 nts) at position 'x' (i.e. in the 3'UTR). Controls for the CARe/ZSL1 were inverted elements ('inv') to control for potential effects of different genome sizes, which ranged from 2.0-to-2.6 kb (CMV) and 1.6-to-2.2 kb (EFS) across all genomes. An empty rAAV vector was used as a negative control. rAAVs were made by cotransfection of HEK293T suspension cells with pGenome/pRepCap/pHelper plasmids at a 1:1:1 ratio, and harvested 72 hours post-transfection. Vector harvest material was titrated by qPCR against the GFP sequence to generate vg/mL physical titre values. HEPG2 cells were transduced at the denoted MOIs, and then 72 hours post-transduction cells were analysed by flow cytometry, and GFP Expression scores (%GFP+ x MFI; arbitrary units) generated.
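The read-outs named in the figure descriptions above (the GFP Expression score, normalisation to the wPRE-containing vector set to 100%, and MFI normalised to vector copy-number) reduce to simple arithmetic. The following Python sketch illustrates them; the function names are hypothetical and the numeric inputs are placeholder values chosen only for illustration, not data from any figure.

```python
def expression_score(percent_gfp_positive: float, mfi: float) -> float:
    """GFP Expression score as described for Figure 13: %GFP+ x MFI."""
    return percent_gfp_positive * mfi


def normalise_to_reference(value: float, reference: float) -> float:
    """Express a value as a percentage of a reference (reference = 100%)."""
    return 100.0 * value / reference


def mfi_per_copy(mfi: float, vcn: float) -> float:
    """Median fluorescence intensity normalised to vector copy-number."""
    return mfi / vcn


# Placeholder inputs (hypothetical, for illustration only).
score = expression_score(percent_gfp_positive=50.0, mfi=200.0)
relative = normalise_to_reference(value=50.0, reference=200.0)
per_copy = mfi_per_copy(mfi=300.0, vcn=1.5)
print(score, relative, per_copy)
```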
Detailed Description
The present invention provides novel nucleotide sequences, and viral vectors or cells comprising such nucleotide sequences.
A nucleotide sequence is provided, comprising a transgene expression cassette wherein the 3' UTR of the transgene expression cassette comprises at least one cis-acting sequence selected from (a) a cis-acting Cytoplasmic Accumulation Region (CAR) sequence; and/or (b) a cis-acting ZCCHC14 protein-binding sequence.
The term "nucleotide sequence" is synonymous with the term "polynucleotide" and/or the term "nucleic acid sequence". The "nucleotide sequence" can be a double stranded or single stranded molecule and includes genomic DNA, cDNA, synthetic DNA, RNA and a chimeric DNA/RNA molecule. Polynucleotides may be produced recombinantly, synthetically or by any means available to those of skill in the art. They may also be cloned by standard techniques.
The nucleotide sequence comprises a transgene expression cassette. An expression cassette is a distinct component of a vector, comprising a gene (in this case a transgene) and regulatory sequence(s) to be expressed by a transfected, transduced or infected cell. As used herein, "transgene" refers to a segment of DNA or RNA that contains a gene sequence that has been isolated from one organism and is introduced into a different organism, is a non-native segment of DNA or RNA, or is a recombinant sequence that has been made using genetic engineering techniques. The terms "transgene", "transgene construct", "GOI" (gene of interest) and "NOI" (nucleotide of interest) are used interchangeably herein.
The transgene expression cassettes described herein are preferably viral vector transgene expression cassettes. Suitable viral vector transgene expression cassettes are described in more detail elsewhere herein.
The 3' UTR of the transgene expression cassettes described herein comprises at least one of the novel cis-acting sequences described herein. Cis-acting sequences affect the expression of genes that are encoded in the same nucleotide sequence (i.e. the one in which the cis-acting sequence is also present). In the context of viral vectors, cis-acting sequences include the typical post-transcriptional regulatory elements (PREs) such as that from the woodchuck hepatitis virus (wPRE). General examples of cis-acting sequences are provided elsewhere herein. The terms "cis-acting element" and "cis-acting sequence" are used interchangeably herein.
The 3' UTR of the transgene expression cassettes described herein comprises at least one cis-acting sequence selected from (a) a cis-acting Cytoplasmic Accumulation Region (CAR) sequence; and/or (b) a cis-acting ZCCHC14 protein-binding sequence.
As used herein, a "Cytoplasmic Accumulation Region (CAR) sequence" is a nucleotide sequence that is transcribed into mRNA and increases the stability and/or export of the mRNA to the cytoplasm and accumulation of the mRNA in the cytoplasm of a cell by sequence-dependent recruitment of the mRNA export machinery. CAR sequences have been described previously, see for example Lei et al., 2013, which describes that insertion of a CAR sequence upstream (i.e. at the 5' end) of a naturally intronless gene can promote the cytoplasmic accumulation of the mRNA transcript.
The inventors have now surprisingly found that insertion of a CAR sequence into the 3' UTR of a transgene expression cassette enhances gene expression. Surprisingly, these CAR sequences are shown to enhance the transgene expression from transgene cassettes utilizing introns as well as from transgene cassettes that are intronless, as well as boosting expression from cassettes already containing a full length wPRE.
Suitable CAR sequences for insertion into the 3' UTR of a transgene expression cassette described herein may be readily identifiable by a person of skill in the art, based on the disclosure provided herein, together with their common general knowledge (see e.g. the disclosure in Lei et al., 2013, which is incorporated herein in its entirety).
The CAR sequences described herein comprise at least one CAR element (CARe) sequence. As would be understood by a person of skill in the art, a CARe sequence is a core sequence that is present within a CAR sequence (and typically, wherein the CARe sequence is repeated a number of times within the CAR sequence). Examples of CARe sequences are shown in Figure 2 and are described in Lei et al., 2013.
For example, the CARe sequence may be a sequence that is represented by BMWGHWSSWS (SEQ ID NO: 24) or BMWRHWSSWS (SEQ ID NO: 55), wherein:
[Table 1: nucleotide symbols (provided as an image in the original publication)]
Exemplary CARe sequences that are encompassed by BMWGHWSSWS (SEQ ID NO: 24) or BMWRHWSSWS (SEQ ID NO: 55) include those with a sequence represented by CMAGHWSSTG (using the nomenclature of Table 1; SEQ ID NO: 28). Such CARe sequences include the CARe sequences identified previously in Lei et al., 2013 (where the CARe sequences were identified in the 5' region of HSPB3, c-Jun, IFNα1 and IFNβ1 genes).
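Whether a candidate 10-nucleotide sequence is encompassed by one of the degenerate consensus sequences above can be checked mechanically. The sketch below assumes that the symbols of Table 1 follow the standard IUPAC degenerate nucleotide code (an assumption, since the table itself is not reproduced in this text); the helper names are hypothetical.

```python
import re

# Standard IUPAC degenerate nucleotide codes (DNA alphabet), limited to the
# symbols that appear in the consensus sequences quoted above.
# Assumption: Table 1 uses these standard meanings.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "M": "AC", "W": "AT", "S": "CG",
    "H": "ACT", "B": "CGT",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(f"[{IUPAC[symbol]}]" for symbol in consensus))


def matches(consensus: str, sequence: str) -> bool:
    """Return True if the whole sequence is encompassed by the consensus."""
    return consensus_to_regex(consensus).fullmatch(sequence) is not None


# SEQ ID NO: 29 checked against SEQ ID NO: 24 and SEQ ID NO: 28.
print(matches("BMWGHWSSWS", "CCAGTTCCTG"))
print(matches("CMAGHWSSTG", "CCAGTTCCTG"))
# SEQ ID NO: 66 has A at position 4, so it needs the R (A or G) of SEQ ID NO: 55.
print(matches("BMWGHWSSWS", "CCAATTCCTG"))
print(matches("BMWRHWSSWS", "CCAATTCCTG"))
```

Under the assumed code, the only difference between the two broader consensus sequences is position 4 (G versus R), which is why a sequence such as CCAATTCCTG falls under SEQ ID NO: 55 but not SEQ ID NO: 24.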
The CARe sequence may be selected from the group consisting of: CCAGTTCCTG (SEQ ID NO: 29), CCAGATCCTG (SEQ ID NO: 30), CCAGTTCCTC (SEQ ID NO: 31), TCAGATCCTG (SEQ ID NO: 32), CCAGATGGTG (SEQ ID NO: 33), CCAGTTCCAG (SEQ ID NO: 34), CCAGCAGCTG (SEQ ID NO: 35), CAAGCTCCTG (SEQ ID NO: 36), CAAGATCCTG (SEQ ID NO: 37), CCTGAACCTG (SEQ ID NO: 38), CAAGAACGTG (SEQ ID NO: 39), TCAGTTCCTG (SEQ ID NO: 56), GCAGTTCCTG (SEQ ID NO: 57), CAAGTTCCTG (SEQ ID NO: 58), CCTGTTCCTG (SEQ ID NO: 59), CCTGCTCCTG (SEQ ID NO: 60), CCTGTACCTG (SEQ ID NO: 61), CCTGTTGCTG (SEQ ID NO: 62), CCTGTTCGTG (SEQ ID NO: 63), CCTGTTCCAG (SEQ ID NO: 64), CCTGTTCCTG (SEQ ID NO: 65), CCAATTCCTG (SEQ ID NO: 66) and GAAGCTCCTG (SEQ ID NO: 67). In a particular example, the CARe nucleotide sequence is selected from the group consisting of: CCAGTTCCTG (SEQ ID NO: 29), CCTGTTCCTG (SEQ ID NO: 59), CCTGTACCTG (SEQ ID NO: 61), CCTGTTCCAG (SEQ ID NO: 64), CCAATTCCTG (SEQ ID NO: 66), CCTGAACCTG (SEQ ID NO: 38), CCAGTTCCTC (SEQ ID NO: 31) and CCAGTTCCAG (SEQ ID NO: 34).
In a particular example, the CARe sequence may be CCAGTTCCTG (SEQ ID NO: 29). This is the sequence that is used to exemplify the invention in examples 1 to 4 below.
In one example, the CARe sequence may be CCAGATCCTG (SEQ ID NO: 30). This is the consensus sequence identified in Figure 2A.
In one example, the CARe sequence may be CCAGTTCCTC (SEQ ID NO: 31). This sequence is also referred to as HSPB3 v2 herein (see e.g. Figure 11).
For example, the CARe sequence may be TCAGATCCTG (SEQ ID NO: 32).
In one example, the CARe sequence may be CCAGATGGTG (SEQ ID NO: 33). This sequence is also referred to as HSPB3 v3 herein (see e.g. Figure 11).
In one example, the CARe sequence may be CCAGTTCCAG (SEQ ID NO: 34). This sequence is also referred to as IFNa1 v1 herein (see e.g. Figure 11).
For example, the CARe sequence may be CCAGCAGCTG (SEQ ID NO: 35).
In one example, the CARe sequence may be CAAGCTCCTG (SEQ ID NO: 36).
In one example, the CARe sequence may be CAAGATCCTG (SEQ ID NO: 37).
For example, the CARe sequence may be CCTGAACCTG (SEQ ID NO: 38). This sequence is also referred to as c-Jun v2 herein (see e.g. Figure 11).
In one example, the CARe sequence may be CAAGAACGTG (SEQ ID NO: 39). This sequence is also referred to as c-Jun v4 herein (see e.g. Figure 11).
In one example, the CARe sequence may be TCAGTTCCTG (SEQ ID NO: 56). This sequence is also referred to as variant 1 in Figure 10.
In one example, the CARe sequence may be GCAGTTCCTG (SEQ ID NO: 57). This sequence is also referred to as variant 2 in Figure 10.
In one example, the CARe sequence may be CAAGTTCCTG (SEQ ID NO: 58). This sequence is also referred to as variant 3 in Figure 10.
In one example, the CARe sequence may be CCTGTTCCTG (SEQ ID NO: 59). This sequence is also referred to as variant 4 in Figure 10.
In one example, the CARe sequence may be CCTGCTCCTG (SEQ ID NO: 60). This sequence is also referred to as variant 6 in Figure 10.
In one example, the CARe sequence may be CCTGTACCTG (SEQ ID NO:61). This sequence is also referred to as variant 7 in Figure 10.
In one example, the CARe sequence may be CCTGTTGCTG (SEQ ID NO: 62). This sequence is also referred to as variant 8 in Figure 10.
In one example, the CARe sequence may be CCTGTTCGTG (SEQ ID NO: 63). This sequence is also referred to as variant 9 in Figure 10.
In one example, the CARe sequence may be CCTGTTCCAG (SEQ ID NO: 64). This sequence is also referred to as variant 10 in Figure 10.
In one example, the CARe sequence may be CCTGTTCCTG (SEQ ID NO: 65). This sequence is also referred to as variant 11 in Figure 10.
In one example, the CARe sequence may be CCAATTCCTG (SEQ ID NO: 66). This sequence is also referred to as variant 12 in Figure 10.
In one example, the CARe sequence may be GAAGCTCCTG (SEQ ID NO: 67). This sequence is also referred to as IFNb1 v1 in Figure 11.
In other words, a transgene expression cassette described herein (typically a viral vector transgene expression cassette) may comprise a cis-acting Cytoplasmic Accumulation Region (CAR) sequence, comprising at least one of the CAR element (CARe) sequences described above.
CAR sequences typically comprise a plurality of CARe sequences. For example, the CAR sequences described herein may include a plurality of CARe sequences, e.g. at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, or at least twenty CARe sequences.
In one example, the CAR sequence described herein comprises at least two CARe sequences.
In one example, the CAR sequence described herein comprises at least four CARe sequences.
In one example, the CAR sequence described herein comprises at least six CARe sequences.
In one example, the CAR sequence described herein comprises at least eight CARe sequences. In this context particularly, the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
In one example, the CAR sequence described herein comprises at least ten CARe sequences.
In one example, the CAR sequence described herein comprises at least twelve CARe sequences.
In one example, the CAR sequence described herein comprises at least fourteen CARe sequences.
In one example, the CAR sequence described herein comprises at least sixteen CARe sequences. In this context particularly, the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
In one example, the CAR sequence described herein comprises at least eighteen CARe sequences.
In one example, the CAR sequence described herein comprises at least twenty CARe sequences.
There may be a desire to use a CAR sequence that is as short as possible. Accordingly, in one example, the CAR sequences described herein may include a plurality of CARe sequences, e.g. no more than two, no more than three, no more than four, no more than five, no more than six, no more than seven, no more than eight, no more than nine, no more than ten, no more than eleven, no more than twelve, no more than thirteen, no more than fourteen, no more than fifteen, no more than sixteen, no more than seventeen, no more than eighteen, no more than nineteen, or no more than twenty CARe sequences.
In one example, the CAR sequence described herein has no more than two CARe sequences.
In one example, the CAR sequence described herein has no more than four CARe sequences. In this context particularly, the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
In one example, the CAR sequence described herein has no more than six CARe sequences.
In one example, the CAR sequence described herein has no more than eight CARe sequences. In this context particularly, the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
In one example, the CAR sequence described herein has no more than ten CARe sequences.
In one example, the CAR sequence described herein has no more than twelve CARe sequences.
In one example, the CAR sequence described herein has no more than fourteen CARe sequences.
In one example, the CAR sequence described herein has no more than sixteen CARe sequences. In this context particularly, the transgene expression cassette may also comprise at least one cis-acting ZCCHC14 protein-binding sequence provided herein.
In one example, the CAR sequence described herein has no more than eighteen CARe sequences.
In one example, the CAR sequence described herein has no more than twenty CARe sequences.
In one example, the CAR sequence described herein has at least four, but no more than twenty, CARe sequences.
In one example, the CAR sequence described herein has at least eight, but no more than twenty, CARe sequences.
In one example, the CAR sequence described herein has at least twelve, but no more than twenty, CARe sequences.
In one example, the CAR sequence described herein has at least sixteen, but no more than twenty, CARe sequences.
In one example, the CAR sequence described herein has at least four, but no more than sixteen, CARe sequences.
In one example, the CAR sequence described herein has at least eight, but no more than sixteen, CARe sequences.
In one example, the CAR sequence described herein has at least twelve, but no more than sixteen, CARe sequences.
Suitably, a CAR sequence described herein may consist of two CARe sequences, or consist of four CARe sequences, or consist of six CARe sequences, or consist of eight CARe sequences, or consist of ten CARe sequences, or consist of twelve CARe sequences, or consist of fourteen CARe sequences, or consist of sixteen CARe sequences, or consist of eighteen CARe sequences, or consist of twenty CARe sequences.
The inventors have identified herein that inserting a CAR sequence comprising sixteen CARe sequences into the 3' UTR of a transgene expression cassette (typically viral vector transgene expression cassettes herein) enhances transgene expression. A CAR sequence comprising at least sixteen CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG) is therefore particularly contemplated herein.
A CAR sequence having no more than sixteen CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG) is therefore also particularly contemplated herein. From the foregoing, it will be appreciated that a CAR sequence having a total of sixteen CARe sequences (e.g. with at least one (or all) of the CARe sequence(s) being CCAGTTCCTG) is contemplated herein.
In addition, the inventors have shown that enhanced expression may also be achieved with fewer than sixteen CARe sequences (e.g. with one or more CARe sequences - see Figure 8). This is advantageous as it provides greater transgene capacity in the expression cassette.
A CAR sequence comprising at least eight CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is particularly contemplated herein.
Accordingly, a CAR sequence having a total of eight CARe sequences (e.g. with at least one (or all) of the CARe sequence(s) being CCAGTTCCTG) is contemplated herein.
A CAR sequence having no more than eight CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore also particularly contemplated herein.
Furthermore, in the context of inserting CARe sequences at the 5' end of intronless mRNA, CAR sequences comprising six or ten CARe sequences have previously been shown to be functional. Accordingly, in the context of the invention, wherein the CAR sequence is inserted in the 3'UTR of a transgene expression cassette, a CAR sequence comprising at least six CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore particularly contemplated herein.
Similarly, in the context of the invention, wherein the CAR sequence is inserted in the 3'UTR of a transgene expression cassette, a CAR sequence comprising at least ten CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore also particularly contemplated herein.
A CAR sequence having no more than six CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore particularly contemplated herein. Such sequences are particularly contemplated in the case that the CAR sequence is inserted in the 3'UTR of a transgene expression cassette. A CAR sequence having no more than ten CARe sequences (e.g. with at least one (or all) CARe sequence(s) being CCAGTTCCTG (SEQ ID NO: 29)) is therefore also particularly contemplated herein. As above, such sequences are particularly contemplated in the case that the CAR sequence is inserted in the 3'UTR of a transgene expression cassette.
The CAR sequence may comprise a plurality of CARe sequences that are the same (i.e. the CAR sequence may comprise a number of repeats of the same CARe sequence). For example, the CAR sequence may comprise at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty of the same CARe sequence.
In one example, the CAR sequence may have no more than two, no more than four, no more than six, no more than eight, no more than ten, no more than twelve, no more than fourteen, no more than sixteen, no more than eighteen, or no more than twenty of the same CARe sequence.
Alternatively, the CAR sequence may comprise at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty CARe sequences, wherein at least two of the CARe sequences are different.
In one example, the CAR sequence may have no more than two, no more than four, no more than six, no more than eight, no more than ten, no more than twelve, no more than fourteen, no more than sixteen, no more than eighteen, or no more than twenty CARe sequences, wherein at least two of the CARe sequences are different.
Indeed, in each of the embodiments described herein where a CAR sequence comprises a plurality of CARe sequences, the CARe sequences each may be selected independently from the group consisting of: SEQ ID NO: 29, SEQ ID NO: 30, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37, SEQ ID NO: 38, SEQ ID NO: 39, SEQ ID NO: 56, SEQ ID NO: 57, SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66 and SEQ ID NO: 67. In a specific example, where a CAR sequence comprises a plurality of CARe sequences, the CARe sequences each may be selected independently from the group consisting of: SEQ ID NO: 29, SEQ ID NO: 59, SEQ ID NO: 61, SEQ ID NO: 64, SEQ ID NO: 66, SEQ ID NO: 38, SEQ ID NO: 31, and SEQ ID NO: 34.
Appropriate combinations of CARe sequences may be identified by a person of skill in the art. The plurality of CARe sequences within the CAR sequence may be in tandem (i.e. they may be referred to as "tandem CARe sequences").
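As an illustration of a CAR sequence built from independently selected CARe sequences arranged in tandem, the following Python sketch splits a candidate CAR sequence into 10-nucleotide tiles and reports which recited CARe each tile corresponds to. The lookup covers only the eight sequences of the specific group recited above; the function name and the particular mixed combination are hypothetical and chosen purely for illustration.

```python
# CARe sequences from the specific group recited above, keyed by SEQ ID NO.
CARE_BY_SEQ_ID = {
    29: "CCAGTTCCTG", 59: "CCTGTTCCTG", 61: "CCTGTACCTG", 64: "CCTGTTCCAG",
    66: "CCAATTCCTG", 38: "CCTGAACCTG", 31: "CCAGTTCCTC", 34: "CCAGTTCCAG",
}
SEQ_ID_BY_CARE = {seq: seq_id for seq_id, seq in CARE_BY_SEQ_ID.items()}
TILE_LENGTH = 10  # every CARe recited herein is 10 nucleotides long


def identify_tiles(car: str) -> list[int]:
    """Split a tandem CAR sequence into 10-nt tiles and return their SEQ ID NOs.

    Raises ValueError if the length is not a multiple of 10 or if a tile is
    not one of the CARe sequences in the lookup above.
    """
    if len(car) % TILE_LENGTH != 0:
        raise ValueError("CAR length is not a whole number of 10-nt tiles")
    seq_ids = []
    for start in range(0, len(car), TILE_LENGTH):
        tile = car[start:start + TILE_LENGTH]
        if tile not in SEQ_ID_BY_CARE:
            raise ValueError(f"unrecognised tile {tile!r} at offset {start}")
        seq_ids.append(SEQ_ID_BY_CARE[tile])
    return seq_ids


# Hypothetical mixed CAR: four tiles, at least two of which differ.
mixed_car = (CARE_BY_SEQ_ID[29] + CARE_BY_SEQ_ID[59]
             + CARE_BY_SEQ_ID[29] + CARE_BY_SEQ_ID[34])
tile_ids = identify_tiles(mixed_car)
print(tile_ids)
print(len(set(tile_ids)) >= 2)  # True when at least two CARe sequences differ
```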
For example, the at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty CARe sequences within the CAR sequence may be in tandem.
In one example, the no more than two, no more than four, no more than six, no more than eight, no more than ten, no more than twelve, no more than fourteen, no more than sixteen, no more than eighteen, or no more than twenty CARe sequences within the CAR sequence may be in tandem.
Tandem CARe sequences are located directly adjacent to each other in the nucleotide sequence. By way of example, if the CAR sequence comprises two CARe sequences (each having the sequence CCAGTTCCTG (SEQ ID NO: 29)) in tandem, it would comprise the sequence CCAGTTCCTGCCAGTTCCTG (SEQ ID NO: 48). Similarly, if the CAR sequence comprises three CARe sequences (each having the sequence CCAGTTCCTG (SEQ ID NO: 29)) in tandem, it would comprise the sequence CCAGTTCCTGCCAGTTCCTGCCAGTTCCTG (SEQ ID NO: 49) etc.
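The tandem arrangement described above amounts to direct concatenation of CARe copies with no intervening nucleotides. A minimal Python sketch (the function name is hypothetical) that reproduces the two-copy and three-copy examples:

```python
CARE = "CCAGTTCCTG"  # SEQ ID NO: 29


def tandem(care: str, copies: int) -> str:
    """Return `copies` directly adjacent repeats of a CARe sequence."""
    if copies < 1:
        raise ValueError("at least one CARe copy is required")
    return care * copies


two_copies = tandem(CARE, 2)
three_copies = tandem(CARE, 3)
print(two_copies)    # compare with SEQ ID NO: 48
print(three_copies)  # compare with SEQ ID NO: 49
```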
In embodiments in which the CAR sequence comprises tandem CARe sequences, the tandem sequences may each have the same CARe sequence.
The CAR sequence may be described by its size. For example, where a CAR sequence comprises a plurality of CARe sequences in tandem, its size will be reflective of the number of CARe sequences that are present. For example, using the CARe sequences specifically recited herein (which all have a sequence of 10 nucleotides) the CAR sequence may be at least 20 nucleotides when it comprises two CARe sequences in tandem, at least 30 nucleotides when it comprises three CARe sequences in tandem, at least 40 nucleotides when it comprises four CARe sequences in tandem etc.
Accordingly, the CAR sequence may be at least 10 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides etc.
It may be at least 60 nucleotides (e.g. with at least six CARe sequences in tandem).
It may be at least 80 nucleotides (e.g. with at least eight CARe sequences in tandem).
It may be at least 100 nucleotides (e.g. with at least ten CARe sequences in tandem).
It may be at least 160 nucleotides (e.g. with at least sixteen CARe sequences in tandem).
The CAR sequence may be no more than 10 nucleotides, no more than 20 nucleotides, no more than 30 nucleotides, no more than 40 nucleotides, no more than 50 nucleotides etc.
It may be no more than 60 nucleotides (e.g. with up to six CARe sequences in tandem).
It may be no more than 80 nucleotides (e.g. with up to eight CARe sequences in tandem).
It may be no more than 100 nucleotides (e.g. with up to ten CARe sequences in tandem).
It may be no more than 160 nucleotides (e.g. with up to sixteen CARe sequences in tandem).
It may be no more than 200 nucleotides (e.g. with up to twenty CARe sequences in tandem).
It may be no more than 240 nucleotides (e.g. with up to twenty four CARe sequences in tandem).
It may be no more than 290 nucleotides (e.g. with up to twenty nine CARe sequences in tandem).
It may be no more than 300 nucleotides (e.g. with up to thirty CARe sequences in tandem).
It may be no more than 350 nucleotides (e.g. with up to thirty five CARe sequences in tandem).
It may be no more than 400 nucleotides (e.g. with up to forty CARe sequences in tandem).
It may be no more than 410 nucleotides (e.g. with up to forty one CARe sequences in tandem).
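Because each CARe sequence recited herein is 10 nucleotides long, the relationship between the size of a tandem CAR sequence and the number of CARe copies is a matter of multiplication or integer division. The sketch below (hypothetical function names) reproduces the pairings listed above, assuming purely tandem 10-nucleotide CARe sequences with no intervening nucleotides.

```python
CARE_LENGTH = 10  # nucleotides per CARe sequence recited herein


def tandem_length(copies: int) -> int:
    """Length in nucleotides of a CAR made only of tandem CARe copies."""
    return copies * CARE_LENGTH


def max_tandem_copies(max_length: int) -> int:
    """Largest number of whole tandem CARe copies that fit within max_length."""
    return max_length // CARE_LENGTH


# Length caps quoted in the text above.
for cap in (60, 80, 100, 160, 200, 240, 290, 300, 350, 400, 410):
    print(f"no more than {cap} nt: up to {max_tandem_copies(cap)} CARe copies")
```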
Alternatively, the plurality of CARe sequences within the CAR sequence may be spatially separated by intervening sequences (i.e. one or more nucleotides may be present between neighbouring CARe sequences within the CAR sequence). In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than twenty nucleotides e.g. there are no more than two, no more than three, no more than four, no more than five, no more than six, no more than seven, no more than eight, no more than nine, no more than ten, no more than eleven, no more than twelve, no more than thirteen, no more than fourteen, no more than fifteen, no more than sixteen, no more than seventeen, no more than eighteen, no more than nineteen, or no more than twenty intervening nucleotides between neighbouring CARe sequences within the CAR sequence.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than two nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than four nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than six nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than eight nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than ten nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than twelve nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than fourteen nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than sixteen nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than eighteen nucleotides.
In one example, the intervening sequence between neighbouring CARe sequences within the CAR sequence is no more than twenty nucleotides.
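The spacing limits above can be checked by locating each CARe copy in a candidate CAR sequence and counting the nucleotides between neighbouring copies. The following Python sketch does this for a single CARe sequence (SEQ ID NO: 29); the function names and the short spacer sequences are hypothetical and used only for illustration.

```python
CARE = "CCAGTTCCTG"  # SEQ ID NO: 29


def care_gaps(car: str, care: str = CARE) -> list[int]:
    """Return the number of intervening nucleotides between neighbouring
    non-overlapping copies of `care`, scanning `car` from left to right."""
    starts = []
    position = car.find(care)
    while position != -1:
        starts.append(position)
        position = car.find(care, position + len(care))
    return [later - (earlier + len(care))
            for earlier, later in zip(starts, starts[1:])]


def spacing_ok(car: str, max_gap: int = 20) -> bool:
    """True if no pair of neighbouring CARe copies is more than max_gap nt apart."""
    return all(gap <= max_gap for gap in care_gaps(car))


# Hypothetical spacers of 2 and 4 nucleotides between three CARe copies.
spaced_car = CARE + "AT" + CARE + "ATAT" + CARE
print(care_gaps(spaced_car))
print(spacing_ok(spaced_car, max_gap=20))
print(spacing_ok(spaced_car, max_gap=2))
```

A purely tandem CAR sequence gives a gap of zero between every pair of neighbouring copies, so it satisfies every limit listed above.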
The transgene expression cassette may comprise at least one cis-acting ZCCHC14 protein-binding sequence (as an alternative to the CARe sequence(s) described herein, or in addition to the CARe sequence(s) described herein). As used herein, a ZCCHC14 protein-binding sequence is a nucleotide sequence that is capable of interacting with/being bound by a ZCCHC14 protein. "ZCCHC14" refers to human Zinc finger CCHC domain-containing protein 14 (also referred to as BDG-29; UniProtKB identifier: Q8WYQ9; NCBI Gene ID: 23174, updated on 4-Jul-2021). Recruitment of ZCCHC14 to the 3' region of the transgene mRNA results in a complex of ZCCHC14-Tent4, which enables mixed tailing within polyA tails of the polyadenylated mRNA. Mixed tailing increases the stability of the transgene mRNA in target cells.
Methods for determining whether a specific nucleotide sequence is capable of being bound by a ZCCHC14 protein are well known in the art; see for example the method of 'Systematic evolution of ligands by exponential enrichment' (SELEX) or the method of RNA electrophoretic mobility shift assay or the method of RNA pull-down (cross-linking/immunoprecipitation) or a combination of these methods (Mol Biol (Mosk). May-Jun 2015;49(3):472-81). Routine methods for detecting nucleotide:protein interactions may also be used e.g. nucleotide pull down assays, ELISA assays, reporter assays etc. Appropriate ZCCHC14 protein-binding sequences can therefore readily be identified by a person of skill in the art.
The cis-acting ZCCHC14 protein-binding sequences described herein comprise at least one CNGGN-type pentaloop sequence. Several ZCCHC14 protein-binding sequences comprising a CNGGN-type pentaloop sequence are known in the art. For example, the PRE of HBV and HCMV RNA 2.7, as well as wPRE, are known to comprise a ZCCHC14 protein-binding sequence with a CNGGN-type pentaloop sequence. The pentaloop adopts the GNGG(N) family loop conformation with a single bulged G residue, flanked by A-helical regions (see for example Kim et al., 2020), where N can be any nucleotide.
The cis-acting ZCCHC14 protein-binding sequences described herein may comprise any appropriate CNGGN-type pentaloop sequence.
For example, they may comprise a CTGGT pentaloop sequence (as is seen in the stem loop found in HCMV RNA2.7; also known as the SLa of HCMV RNA2.7).
Alternatively, they may comprise a CTGGA pentaloop sequence (as is seen in the stem loop found in the a element of wPRE; also known as the SLa of wPRE).
Further alternatively, they may comprise a CAGGT pentaloop sequence (as is seen in the stem loop found in the PRE a element of HBV; also known as the SLa of HBV). The CNGGN-type pentaloop sequences found within the SLa of HCMV RNA 2.7, wPRE and HBV are typically part of a stem-loop structure, which facilitates TENT4-dependent tail regulation.
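The CNGGN consensus can be read as an IUPAC-style pattern in which N stands for any of the four nucleotides. As a minimal sketch only (the helper names `iupac_to_regex` and `is_cnggn_pentaloop` are illustrative and form no part of the disclosure), the following Python checks that the three pentaloops named above conform to that pattern. Sequences are written as DNA, as in the text.

```python
import re

# Only the symbols needed for CNGGN are mapped here (an assumption made
# for brevity); N stands for any of the four nucleotides.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def iupac_to_regex(motif: str) -> str:
    """Translate a motif such as 'CNGGN' into a regular expression string."""
    return "".join(IUPAC[symbol] for symbol in motif.upper())


CNGGN = re.compile(iupac_to_regex("CNGGN"))


def is_cnggn_pentaloop(loop: str) -> bool:
    """Return True if the loop is exactly five nucleotides matching CNGGN."""
    return CNGGN.fullmatch(loop.upper()) is not None


# Pentaloops named in the text for HCMV RNA2.7, wPRE and HBV respectively.
for loop in ("CTGGT", "CTGGA", "CAGGT"):
    print(loop, is_cnggn_pentaloop(loop))
```

A match against this pattern indicates only that a pentamer has the CNGGN form; whether it is presented as a loop, and whether it binds ZCCHC14, would still need to be established by the assays referred to above.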
Accordingly, the cis-acting ZCCHC14 protein-binding sequences described herein may also comprise the CNGGN-type pentaloop sequence as part of a stem loop sequence. As would be clear to a person of skill in the art, any appropriate stem loop sequence may be used. Suitable stem loop sequences can readily be identified by a person of skill in the art, as the required level of complementarity needed for a stem loop sequence is known.
Differing stem lengths may also be used. In the examples herein, stems of 7 or 8 nucleotides have been used; however, longer stems with up to an additional 9 nucleotides (18 nt added in total, 9 on each side) have also been shown to work (data not shown).
Non-limiting examples of stem loop sequences are provided below.
For example, a cis-acting ZCCHC14 protein-binding sequence described herein may comprise the stem loop sequence TCCTCGTAGGCTGGTCCTGGGGA (SEQ ID NO: 40; which includes the pentaloop sequence CTGGT, and corresponds to the sequence of SLa of HCMV RNA2.7).
Alternatively, a cis-acting ZCCHC14 protein-binding sequence described herein may comprise the stem loop sequence GCCCGCTGCTGGACAGGGGC (SEQ ID NO: 41; which includes the pentaloop sequence CTGGA, and corresponds to the sequence of SLa of wPRE).
Further alternatively, they may comprise the stem loop sequence TTGCTCGCAGCAGGTCTGGAGCAA (SEQ ID NO: 52; which includes the pentaloop sequence CAGGT, and corresponds to the sequence of SLa of HBV).
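As an illustrative check on the three stem loop sequences above, the sketch below scans each sequence for CNGGN-type pentamers and confirms that the stated pentaloop is among them. The function name `cnggn_hits` and the use of an overlapping regular-expression scan are assumptions made for illustration. A pattern match identifies only candidate pentamers: each of these three sequences also contains a second pentamer of the CNGGN form, so the position of the loop still has to be inferred from the stem pairing rather than from the motif alone.

```python
import re

# A lookahead is used so that overlapping candidates would also be reported.
CNGGN_SCAN = re.compile(r"(?=(C[ACGT]GG[ACGT]))")

# Stem loop sequences and stated pentaloops, copied from the text above.
STEM_LOOPS = {
    "SEQ ID NO: 40": ("TCCTCGTAGGCTGGTCCTGGGGA", "CTGGT"),
    "SEQ ID NO: 41": ("GCCCGCTGCTGGACAGGGGC", "CTGGA"),
    "SEQ ID NO: 52": ("TTGCTCGCAGCAGGTCTGGAGCAA", "CAGGT"),
}


def cnggn_hits(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, pentamer) for every CNGGN-type pentamer in seq."""
    return [(m.start(), m.group(1)) for m in CNGGN_SCAN.finditer(seq.upper())]


for name, (seq, stated) in STEM_LOOPS.items():
    hits = cnggn_hits(seq)
    found = stated in [pentamer for _, pentamer in hits]
    print(f"{name}: length {len(seq)} nt, hits {hits}, stated loop found: {found}")
```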
Alternatively, the cis-acting ZCCHC14 protein-binding sequences described herein may comprise the CNGGN-type pentaloop sequence as part of a heterologous stem loop sequence. Appropriate heterologous stem loop sequences may readily be identified by a person of skill in the art.
The cis-acting ZCCHC14 protein-binding sequences described herein may comprise the CNGGN-type pentaloop sequence as part of a stem loop sequence, within a longer sequence (i.e. wherein the cis-acting ZCCHC14 protein-binding sequence comprises additional sequences that flank the stem loop sequence itself).
Examples of flanking sequences are given in SEQ ID NO: 42 (which shows the sequence of a stem-loop structure as set forth in SEQ ID NO: 40, with additional flanking sequences) and SEQ ID NO: 43 (which shows the sequence of a stem-loop structure as set forth in SEQ ID NO: 41, with additional flanking sequences). Further examples of flanking sequences are given in SEQ ID NO: 54 (which shows the sequence of a stem-loop structure as set forth in SEQ ID NO: 52, with additional flanking sequences).
The flanking sequences may be nucleotides that are naturally present at these positions in the corresponding PRE (e.g. for SEQ ID NO:43, the flanking sequences are those that are naturally present around the SLa sequence of wPRE). Alternatively, heterologous flanking sequences may be used. The flanking sequences provided herein are merely by way of example and alternative flanking sequences and flanking sequences with different lengths may also be used.
In one example, a ZCCHC14 protein-binding sequence as described herein may comprise at least one CNGGN-type pentaloop sequence and a stem loop sequence, but does not comprise a flanking sequence.
In one example, a ZCCHC14 protein-binding sequence as described herein comprises at least one CNGGN-type pentaloop sequence, but does not comprise a stem loop sequence or a flanking sequence.
The cis-acting ZCCHC14 protein-binding sequences described herein do not comprise a full length post-transcriptional regulatory element (PRE) a element. In addition, they do not comprise a full length PRE y element (for example that found in wPRE or wPRE3). As such, the cis-acting ZCCHC14 protein-binding sequences described herein have neither a full length post-transcriptional regulatory element (PRE) a element nor a full length PRE y element. The reason for this is that the invention aims to minimise the size of the cis-acting sequences in the 3' UTR of the transgene expression cassette as much as possible, to provide more transgene capacity. The inventors have surprisingly found that the full length sequence of a PRE a element and the full length sequence of a PRE y element are not needed in order to obtain the effects observed herein.
The ZCCHC14 protein-binding sequence described herein therefore is not wPRE or wPRE3. Full length PRE a element and y element sequences are readily identifiable by a person of skill in the art. By way of example, full length PRE a element and y element sequences for wPRE are provided herein as SEQ ID NO: 46 and SEQ ID NO: 45 respectively. Furthermore, a full length PRE a element sequence for HBV is also provided as SEQ ID NO: 53. Although a PRE a element as such has not been identified within HCMV RNA 2.7, a "minimal element" having equivalent function has been found (see SEQ ID NO: 27 herein).
The cis-acting ZCCHC14 protein-binding sequences described herein therefore do not comprise any of the following sequences: SEQ ID NO: 46, SEQ ID NO: 45, SEQ ID NO:53, and/or SEQ ID NO: 27.
In one example, a cis-acting ZCCHC14 protein-binding sequence provided herein may have a sequence that corresponds to a PRE a element fragment (in other words, it may have a sequence that is identical to a portion of a PRE a element, but does not include all of (i.e. is shorter than) the full length PRE a element sequence). In other words, the cis-acting ZCCHC14 protein-binding sequence provided herein may be a truncated nucleotide sequence that constitutes a part of a PRE a element.
For example, the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) a HBV PRE a element. In this context, it may be described as a HBV PRE a element fragment. It may also be described as a truncated nucleotide sequence that constitutes a part of a HBV PRE a element. In this context, the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) SEQ ID NO: 53. In other words, it may be a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 53. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO:54.
For example, the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) a HCMV RNA 2.7 sequence. For example, it may be a fragment (a portion of) the sequence shown in SEQ ID NO: 27. It may be described as a HCMV RNA 2.7 fragment (for example, a fragment of the sequence shown in SEQ ID NO:27). It may also be described as a truncated nucleotide sequence that constitutes a part of a HCMV RNA 2.7 sequence. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42. For example, the ZCCHC14 protein-binding sequence may be a fragment of the sequence shown in SEQ ID NO:27 and comprise the sequence of SEQ ID NO: 40, or 42.
For example, the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) a wPRE a element. In this context, it may be described as a wPRE a element fragment. It may also be described as a truncated nucleotide sequence that constitutes a part of a wPRE a element. In this context, the cis-acting ZCCHC14 protein-binding sequence may be a fragment (a portion of) SEQ ID NO: 46. In other words, it may be a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 46. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41, or 43.
The inventors have exemplified the invention by using ZCCHC14 protein-binding sequences that are derived from known PREs (specifically, the ZCCHC14 protein-binding sequence that is present in HCMV RNA2.7; and/or the ZCCHC14 protein-binding sequence that is present in the PRE of the woodchuck hepatitis virus (wPRE)). Although these ZCCHC14 protein-binding sequences are particularly contemplated herein, other appropriate ZCCHC14 protein-binding sequences (e.g. from other PREs) may alternatively (or additionally) be used.
The ZCCHC14 protein-binding sequences that are described herein are used to enhance transgene expression, whilst minimising the 'backbone' sequences of viral vectors such that titres of vectors containing larger payloads can be maintained or increased. The ZCCHC14 protein-binding sequence is therefore typically small.
For example, the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 240 nucleotides (i.e. no more than 240 nucleotides in length). In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
In one example, the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 200 nucleotides (i.e. no more than 200 nucleotides in length). In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
In a further example, the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 150 nucleotides (i.e. no more than 150 nucleotides in length). In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
In one example, the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 100 nucleotides (i.e. no more than 100 nucleotides in length). In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54. For example, the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be up to 90 nucleotides (i.e. no more than 90 nucleotides in length). In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, 41, 42, 43, 52 or 54.
In an example where the cis-acting ZCCHC14 protein-binding sequence is a fragment (a portion of) an HBV PRE a element (i.e. a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 53), the cis-acting ZCCHC14 protein-binding sequence may be up to 240 nucleotides of SEQ ID NO: 53. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO: 54.
In one example it may be up to 200 nucleotides of SEQ ID NO: 53. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO: 54.
For example, it may be up to 150 nucleotides of SEQ ID NO: 53. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO:54.
In a further example it may be up to 100 nucleotides of SEQ ID NO: 53. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO:54.
For example, it may be up to 90 nucleotides of SEQ ID NO: 53. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 52, or SEQ ID NO:54.
In an example where the cis-acting ZCCHC14 protein-binding sequence is a fragment (a portion of) a HCMV RNA 2.7 sequence (e.g. a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 27), the cis-acting ZCCHC14 protein-binding sequence may be up to 240 nucleotides of SEQ ID NO:27. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42.
In one example it may be up to 200 nucleotides of SEQ ID NO:27. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42.
For example, it may be up to 150 nucleotides of SEQ ID NO:27. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42. In a further example it may be up to 100 nucleotides of SEQ ID NO:27. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42.
In a further example it may be up to 90 nucleotides of SEQ ID NO:27. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 40, or 42.
In an example where the cis-acting ZCCHC14 protein-binding sequence is a fragment (a portion of) a wPRE a element (i.e. a truncated nucleotide sequence that constitutes a part of SEQ ID NO: 46), the cis-acting ZCCHC14 protein-binding sequence may be up to 240 nucleotides of SEQ ID NO: 46. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41, or 43.
In one example it may be up to 200 nucleotides of SEQ ID NO: 46. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41, or 43.
For example, it may be up to 150 nucleotides of SEQ ID NO: 46. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41, or 43.
In a further example it may be up to 100 nucleotides of SEQ ID NO: 46. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41, or 43.
In a further example it may be up to 90 nucleotides of SEQ ID NO: 46. In this example, the ZCCHC14 protein-binding sequence may comprise the sequence of SEQ ID NO: 41, or 43.
In one example provided herein, the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein is up to 90 nucleotides (i.e. a maximum of 90 nucleotides long). See for example SEQ ID NO: 43, which provides the sequence for the ZCCHC14 stem loop from wPRE and is 90 nucleotides in length.
In one example provided herein, the ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein is up to 72 nucleotides (i.e. a maximum of 72 nucleotides long). See for example SEQ ID NO: 42, which provides the sequence for the ZCCHC14 stem loop from HCMV RNA2.7 and is 72 nucleotides in length. These examples demonstrate that relatively short sequences can be functional.
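The size limits and sequence features set out above can be summarised for a candidate sequence with a short script. The following is a sketch under stated assumptions: the function name `describe_candidate` is hypothetical, the flanking nucleotides in the usage example are arbitrary placeholders rather than sequences from this disclosure, and the full length sequences to be excluded (e.g. SEQ ID NO: 27, 45, 46 and 53) are not reproduced here and must be supplied by the caller.

```python
# Length caps mentioned in the text, in nucleotides.
LENGTH_CAPS = (240, 200, 150, 100, 90, 72)

# Stem loop sequences copied from the text.
STEM_LOOPS = {
    "SEQ ID NO: 40": "TCCTCGTAGGCTGGTCCTGGGGA",
    "SEQ ID NO: 41": "GCCCGCTGCTGGACAGGGGC",
    "SEQ ID NO: 52": "TTGCTCGCAGCAGGTCTGGAGCAA",
}


def describe_candidate(candidate: str, excluded: tuple[str, ...] = ()) -> dict:
    """Summarise a candidate against the length caps and listed stem loops.

    `excluded` holds full length sequences that the candidate must not
    comprise; they are caller-supplied because they are not given here.
    """
    candidate = candidate.upper()
    return {
        "length": len(candidate),
        "caps_met": [cap for cap in LENGTH_CAPS if len(candidate) <= cap],
        "stem_loops": [n for n, s in STEM_LOOPS.items() if s in candidate],
        "contains_excluded": any(x.upper() in candidate for x in excluded),
    }


# Usage: the wPRE stem loop with arbitrary 10-nt placeholder flanks.
example = "A" * 10 + STEM_LOOPS["SEQ ID NO: 41"] + "T" * 10
print(describe_candidate(example))
```

Exact substring matching is used for simplicity; it says nothing about whether a candidate actually binds ZCCHC14, which is a matter for the binding assays described herein.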
The ZCCHC14 protein-binding sequence that is inserted into the 3' UTR of the transgene expression cassette described herein may be the minimal sequence that is capable of being bound by a ZCCHC14 protein (i.e. it may be a minimal ZCCHC14 protein-binding sequence). In other words, it may be the smallest sequence that still provides the desired functionality. In the context of ZCCHC14 protein-binding sequences that are naturally found in PREs, the ZCCHC14 protein-binding sequence may therefore be the minimal PRE sequence that is capable of being bound by a ZCCHC14 protein. Methods for determining such minimal sequences are known in the art (e.g. nucleotide pull down assays, ELISA assays, reporter assays etc may be used).
In an example, the ZCCHC14 protein-binding sequence may comprise or consist of the sequence of SEQ ID NO: 40 or 41. The ZCCHC14 protein-binding sequence may comprise or consist of a fragment of SEQ ID NO: 40 or 41 that is capable of binding ZCCHC14.
In one example, the ZCCHC14 protein-binding sequence may comprise or consist of the sequence of SEQ ID NO: 42 or 43. The ZCCHC14 protein-binding sequence may comprise or consist of a fragment of SEQ ID NO: 42 or 43 that is capable of binding ZCCHC14.
In one example, the ZCCHC14 protein-binding sequence may comprise or consist of the sequence of SEQ ID NO: 52 or 54. The ZCCHC14 protein-binding sequence may comprise or consist of a fragment of SEQ ID NO: 52 or 54 that is capable of binding ZCCHC14.
In the case of embodiments directed to a fragment of a specified sequence that is capable of binding ZCCHC14, the skilled person will readily be able to determine whether or not a given fragment of interest retains this protein-binding activity (for example by means of the assays described elsewhere in this disclosure), and so will be able to assess whether or not this requirement is met without undue burden or need for excessive experimentation.
Suitably, a ZCCHC14 protein-binding sequence as described herein does not comprise a PRE β element. In some examples, the ZCCHC14 protein-binding sequence does not include any sequences that are specific to the PRE β element of SEQ ID NO:47. In other words, it does not include any of the PRE β element of wPRE. In addition, it may not include any sequences that are specific to the PRE y element of SEQ ID NO:45. In other words, it may not include any of the PRE y element of wPRE and also may not include any of the PRE β element of wPRE.
Thus far, the cis-acting ZCCHC14 protein-binding sequences and cis-acting CAR sequences provided herein have been discussed separately. However, as is shown in the examples section below (see also Figure 1), both of these sequences may be present within a 3' UTR of a transgene expression cassette. Accordingly, any aspect of the cis-acting ZCCHC14 protein-binding sequences described herein may be combined with any aspect of the cis-acting CAR sequences described herein to provide a transgene expression cassette wherein the 3' UTR comprises both a cis-acting CAR sequence and a cis-acting ZCCHC14 protein-binding sequence.
In such examples, the ZCCHC14 protein-binding sequence may be located 3' to the CAR sequence.
Alternatively, the ZCCHC14 protein-binding sequence may be located 5' to the CAR sequence within the 3'UTR of the transgene expression cassette.
Suitable positions for the novel cis-acting sequences described herein may be identified by a person of skill in the art, taking into account Figure 1, for example.
In some examples, the 3' UTR of the transgene expression cassette comprises at least two spatially distinct cis-acting CAR sequences and/or at least two spatially distinct cis-acting ZCCHC14 protein-binding sequences.
In some examples, the 3' UTR of the transgene expression cassette further comprises a polyA sequence located 3' to the cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequence. Details of polyA sequences are provided elsewhere herein.
The 3'UTR of the transgene expression cassette may comprise additional PRE sequences (in addition to the novel cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequences described herein). For example, the 3'UTR may have a full length PRE sequence, such as wPRE.
Woodchuck Hepatitis Virus (WHV) Posttranscriptional Regulatory Element (wPRE) is a nucleotide sequence that, when transcribed, creates a tertiary structure enhancing expression. The sequence is commonly used in molecular biology to increase expression of genes delivered by viral vectors. wPRE is a tripartite regulatory element with γ, α, and β components (also referred to as elements herein), in the given order. wPRE facilitates nucleocytoplasmic transport of RNA mediated by several alternative pathways that may be cooperative. In addition, the wPRE has been shown to act on additional posttranscriptional mechanisms to stimulate expression of heterologous cDNAs. Further details relating to wPRE are provided elsewhere herein. The inventors have demonstrated that transgene expression cassettes the 3' UTR of which comprises both a novel cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequence and an additional PRE, such as wPRE, are able to achieve markedly elevated transgene expression in cells.
In other examples, the 3'UTR of the transgene expression cassette does not comprise additional PRE sequences (in such examples, the novel cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequences described herein are considered to enhance transgene expression sufficiently, and, for example, the additional transgene capacity achieved by omitting additional PRE sequences (such as wPRE sequences) is desirable).
The transgene expression cassettes described herein may further comprise a promoter operably linked to the transgene. Several appropriate promoters are discussed in detail elsewhere herein.
For example, the promoter may be one that lacks its native intron (such as a promoter selected from the group consisting of: an EFS promoter, a PGK promoter, and a UBCs promoter).
Alternatively, the promoter may comprise an intron (for example, the promoter may be selected from the group consisting of: an EF1a promoter and a UBC promoter). These promoters are discussed in detail elsewhere herein.
The nucleotide sequences described herein comprise a transgene expression cassette (also referred to as transgene cassettes herein). The invention has particular utility when the novel cis-acting sequences described herein are present within the 3'UTR of a viral vector transgene expression cassette. Accordingly, any discussion of a transgene expression cassette is particularly relevant to viral vector transgene expression cassettes.
The viral vector transgene expression cassette may be any suitable viral vector transgene expression cassette. For example, the viral vector transgene expression cassette may be selected from the group consisting of: a retroviral vector transgene expression cassette, an adenoviral vector transgene expression cassette, an adeno-associated viral vector transgene expression cassette, a herpes simplex viral vector transgene expression cassette, and a vaccinia viral vector transgene expression cassette.
In a particular example, the viral vector transgene expression cassette is an adeno-associated viral (AAV) vector transgene expression cassette. In a particular example, the viral vector transgene expression cassette is a retroviral vector transgene expression cassette. For example, the retroviral vector transgene expression cassette may be a lentiviral vector transgene expression cassette. Appropriate viral vectors are discussed in detail elsewhere herein.
The invention has particular utility when the novel cis-acting sequences described herein are present within the 3'UTR of a retroviral vector transgene expression cassette (e.g. a lentiviral vector transgene expression cassette as used in the examples herein). Accordingly, any discussion of a transgene expression cassette is particularly relevant to retroviral vector transgene expression cassettes (e.g. lentiviral vector transgene expression cassettes as used in the examples herein).
The nucleotide sequences provided herein may be part of a viral vector genome. In other words, the transgene expression cassette provided herein may be part of a larger nucleotide sequence, which further comprises additional elements that are required to make up a viral vector genome. This may include, for example in the context of lentiviral vector genomes, a typical packaging sequence and rev-response element (RRE). In suitable embodiments, these may further comprise additional post-transcriptional regulatory elements (PREs) such as that from the woodchuck hepatitis virus (wPRE), as considered above. Each of these elements is discussed in more detail elsewhere herein.
Accordingly, a nucleotide sequence comprising a viral vector genome expression cassette is also provided herein, wherein the viral vector genome expression cassette comprises the transgene expression cassette described elsewhere herein.
The viral vector genome expression cassette may be any suitable cassette, for example it may be selected from the group consisting of: a retroviral vector genome expression cassette, an adenoviral vector genome expression cassette, an adeno-associated viral vector genome expression cassette, a herpes simplex viral vector genome expression cassette, and a vaccinia viral vector genome expression cassette.
In a particular example, the viral vector genome expression cassette is an adeno-associated viral vector genome expression cassette.
In a specific example, the viral vector genome expression cassette is a retroviral vector genome expression cassette. As described elsewhere herein, the invention has particular utility when the novel cis-acting sequences described herein are present within the 3'UTR of a retroviral vector transgene expression cassette (e.g. a lentiviral vector transgene expression cassette as used in the examples herein). Accordingly, any discussion of a transgene expression cassette being part of a viral vector genome expression cassette is particularly relevant to retroviral vector genome expression cassettes (e.g. lentiviral vector genome expression cassettes as used in the examples herein).
In one example, the transgene expression cassette is in the forward orientation with respect to the viral vector genome expression cassette (such that the transgene expression cassette is encoded in the sense orientation).
Alternatively, the transgene expression cassette may be inverted with respect to the vector genome expression cassette i.e. the internal transgene promoter and gene sequences oppose the vector genome cassette promoter.
As would be known by a person of skill in the art, the 3' UTR of the retroviral vector genome expression cassette typically further comprises a 3' polypurine tract (3'ppt) and a DNA attachment (att) site. Typically, the 3'ppt is located 5' to the att site within the 3'UTR of the retroviral vector genome expression cassette. When the invention is contemplated in the context of a transgene expression cassette that is in the forward orientation (sense orientation) with respect to the genome expression cassette, the positioning of the novel cis-acting sequences provided herein relative to the 3'ppt and att site may need to be considered. Further details of this are provided in Example 1.
For example the core sequence that comprises both the 3'ppt and the att site (e.g. of a lentiviral vector genome expression cassette as described herein) may have a sequence of SEQ ID NO:25 (wherein 3'ppt is in bold, and att is underlined):
5'-AAAAGAAAAGGGGGGACTGGAAGGGCTAATTCAC-3' (SEQ ID NO:25)
Accordingly, where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, it is preferable if the sequence above (of SEQ ID NO:25) is not disrupted by the novel cis-acting sequences described herein. In one example, the sequence of SEQ ID NO: 26 may be used to provide the 3'ppt and att site (e.g. of a lentiviral vector genome expression cassette as described herein), (wherein 3'ppt is in bold, and att is underlined):
5'-CCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAA-3' (SEQ ID NO:26)
Accordingly, where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, it is preferable if the sequence above (of SEQ ID NO:26) is not disrupted by the novel cis-acting sequences described herein.
Preferably, where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, cis-acting elements within the 3'UTR of the transgene cassette may be positioned upstream and/or downstream of the above uninterrupted sequences. Figure 1 also indicates how the CARe and/or ZCCHC14 binding loop(s) may be variably positioned within LV genome expression cassettes with or without a PRE, with transgene sequences in forward or reverse orientation.
For example, when the transgene expression cassette is in the forward orientation with respect to the genome expression cassette, the cis-acting sequence(s) described herein may be located 5' to the 3'ppt and/or 3' to the att site. Preferably, in this example, the cis-acting sequence(s) described herein are located 5' to the sequence of SEQ ID NO:25 or SEQ ID NO:26.
Alternatively, the cis-acting sequence(s) described herein may be located 3' to the sequence of SEQ ID NO:25 or SEQ ID NO:26.
In either example, the sequence of SEQ ID NO:25 or SEQ ID NO:26 is not disrupted.
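The positional requirement described above (a cis-acting sequence lying wholly 5' or wholly 3' of an undisrupted 3'ppt/att core) can be checked on a candidate 3' UTR by simple string searching. The following is a minimal sketch only: the function name `element_position` and the toy constructs in the usage example are hypothetical, and the wPRE stem loop of SEQ ID NO: 41 is used merely as a convenient example element.

```python
# Sequences copied from the text.
PPT_ATT_CORE = "AAAAGAAAAGGGGGGACTGGAAGGGCTAATTCAC"  # SEQ ID NO:25
PPT_ATT_EXTENDED = (  # SEQ ID NO:26
    "CCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGAC"
    "TGGAAGGGCTAATTCACTCCCAA"
)
SLA_WPRE = "GCCCGCTGCTGGACAGGGGC"  # SEQ ID NO: 41, used as an example element


def element_position(utr: str, element: str, core: str = PPT_ATT_CORE) -> str:
    """Return "5'" or "3'" for the element's position relative to the core.

    Raises ValueError if the core is absent (e.g. disrupted by an insertion),
    if the element is absent, or if the two overlap. Only the first
    occurrence of each sequence is considered.
    """
    utr, element, core = utr.upper(), element.upper(), core.upper()
    core_at = utr.find(core)
    if core_at == -1:
        raise ValueError("3'ppt/att core sequence is absent or disrupted")
    element_at = utr.find(element)
    if element_at == -1:
        raise ValueError("cis-acting element not found")
    if element_at + len(element) <= core_at:
        return "5'"
    if element_at >= core_at + len(core):
        return "3'"
    raise ValueError("cis-acting element overlaps the core sequence")


# SEQ ID NO:25 is contained, uninterrupted, within SEQ ID NO:26.
print(PPT_ATT_CORE in PPT_ATT_EXTENDED)

# Toy constructs: element placed upstream or downstream of SEQ ID NO:26.
print(element_position(SLA_WPRE + PPT_ATT_EXTENDED, SLA_WPRE))
print(element_position(PPT_ATT_EXTENDED + SLA_WPRE, SLA_WPRE))
```

An element inserted within the core would split SEQ ID NO:25, so the core search fails and the function reports the disruption rather than a position.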
The nucleotide sequences described herein may include additional features that are described in more detail elsewhere herein. For example, when the nucleotide sequence comprises a lentiviral vector genome expression cassette, the major splice donor site in the lentiviral vector genome expression cassette may be inactivated. Furthermore, the cryptic splice donor site 3' to the major splice donor site may also be inactivated. In one example that is described in more detail elsewhere herein, the inactivated major splice donor site may have the sequence of GGGGAAGGCAACAGATAAATATGCCTTAAAAT (SEQ ID NO:4; MSD-2KOm5). When the nucleotide sequence comprises a lentiviral vector genome expression cassette, the nucleotide sequence may further comprise a nucleotide sequence encoding a modified U1 snRNA, wherein the modified U1 snRNA has been modified to bind to a nucleotide sequence within the packaging region of the lentiviral vector genome. For example, the viral vector genome expression cassette may be operably linked to the nucleotide sequence encoding the modified U1 snRNA. This feature is described in more detail elsewhere herein.
A viral vector comprising a viral vector genome encoded by the nucleotide sequences described herein is also provided herein. The viral vector may be a viral particle or a virion.
A viral vector production system is also provided, comprising a nucleotide sequence described herein and one or more additional nucleotide sequence(s) encoding viral vector components. Optionally, nucleotide sequences encoding viral vector components comprise nucleotide sequences encoding gag-pol and env. Optionally, the viral vector production system further comprises a nucleotide sequence encoding rev. Such components are described in more detail elsewhere herein.
Cells are also provided herein, wherein the cells comprise a nucleotide sequence described herein, a viral vector described herein, or a viral vector production system described herein.
A method for producing a viral vector is also provided, comprising the steps of:
(a) introducing a viral vector production system described herein into a cell; and
(b) culturing the cell under conditions suitable for the production of the viral vector.
A method for identifying one or more cis-acting sequence(s) that improve transgene expression in a target cell is also provided, the method comprising the steps of:
(a) transducing target cells with a viral vector described herein;
(b) identifying target cells with a high level of transgene expression; and
(c) optionally identifying the one or more cis-acting sequence(s) located within the 3' UTR of the transgene mRNA present within these target cells.
This method may be used to determine which one or more of the specific cis-acting sequences described herein improves transgene expression in a specific target cell. The method therefore provides a mechanism for tailoring the selection of specific cis-acting sequences described herein for a specific combination of viral vector, transgene, and target cell.
The method can therefore advantageously be performed using a plurality of viral vectors with:
(i) different cis-acting sequences in the 3'UTR of the transgene expression cassette; and
(ii) different cis-acting sequence locations within the 3'UTR of the transgene expression cassette; and/or
(iii) different cis-acting sequence combinations in the 3'UTR of the transgene expression cassette;
to identify one or more cis-acting sequence(s) that improve transgene expression in the target cell.
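By way of illustration only, a plurality of vector designs of this kind could be enumerated programmatically before construction. The following is a minimal sketch, assuming that the order of elements in a tuple stands in for their relative location within the 3'UTR. The element names (cisA, cisB, cisC) and the function name are hypothetical placeholders and are not sequences disclosed herein.

```python
from itertools import permutations

def enumerate_utr_designs(elements, max_elements=2):
    """List candidate 3'UTR designs as ordered tuples of element names.

    Each tuple is one design; tuple order is used as a stand-in for the
    relative location of each element within the 3'UTR (an assumption
    of this sketch).
    """
    designs = []
    for size in range(1, max_elements + 1):
        # permutations() covers both different combinations and
        # different orderings of the same combination.
        designs.extend(permutations(elements, size))
    return designs

if __name__ == "__main__":
    for design in enumerate_utr_designs(["cisA", "cisB", "cisC"]):
        print(" - ".join(design))
```

Each design would correspond to one viral vector in the plurality used in the method.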
Any appropriate means for identifying the one or more cis-acting sequence(s) located within the 3' UTR of the transgene mRNA present within these target cells may be used within the method. In one example, step (c) of the method comprises performing RT-PCR and optionally sequencing the transgene mRNA.
Viral vectors
As would be clear to a person of skill in the art, the invention applies to any appropriate viral vector. For example, appropriate viral vectors may be selected from the group consisting of: a retroviral vector, an adenoviral vector, an adeno-associated viral vector, a herpes simplex viral vector and a vaccinia viral vector. Brief details of relevant viral vectors are provided below, followed by a more detailed review of lentiviral vectors specifically.
Adenoviral and Adeno-Associated Viral Vectors
An adenovirus is a double-stranded, linear DNA virus that does not replicate through an RNA intermediate. There are over 50 different human serotypes of adenovirus divided into 6 subgroups based on their genetic sequence.
Adenoviruses are double-stranded DNA non-enveloped viruses that are capable of in vivo, ex vivo and in vitro transduction of a broad range of cell types of human and non-human origin. These cells include respiratory airway epithelial cells, hepatocytes, muscle cells, cardiac myocytes, synoviocytes, primary mammary epithelial cells and post-mitotically terminally differentiated cells such as neurons.
Adenoviral vectors are also capable of transducing non-dividing cells. This is very important for diseases, such as cystic fibrosis, in which the affected cells in the lung epithelium have a slow turnover rate. In fact, several trials are underway utilising adenovirus-mediated transfer of the cystic fibrosis transmembrane conductance regulator (CFTR) into the lungs of afflicted adult cystic fibrosis patients. Adenoviruses have been used as vectors for gene therapy and for expression of heterologous genes. The large (36 kb) genome can accommodate up to 8 kb of foreign insert DNA and is able to replicate efficiently in complementing cell lines to produce very high titres of up to 10^12 transducing units per ml. Adenovirus is thus one of the best systems to study the expression of genes in primary non-replicative cells.
The expression of viral or foreign genes from the adenovirus genome does not require a replicating cell. Adenoviral vectors enter cells by receptor mediated endocytosis. Once inside the cell, adenovirus vectors rarely integrate into the host chromosome. Instead, they function episomally (independently from the host genome) as a linear genome in the host nucleus.
The use of recombinant adeno-associated viral (AAV) and Adenovirus based viral vectors for gene therapy is widespread, and manufacture of the same has been well documented. Typically, AAV-based vectors are produced in mammalian cell lines (e.g. HEK293-based) or through use of the baculovirus/Sf9 insect cell system. AAV vectors can be produced by transient transfection of vector component encoding DNAs, typically together with helper functions from Adenovirus or Herpes Simplex virus (HSV), or by use of cell lines stably expressing AAV vector components. Adenoviral vectors are typically produced in mammalian cell lines that stably express Adenovirus E1 functions (e.g. HEK293-based).
Adenoviral vectors are also typically 'amplified' via helper-function-dependent replication through serial rounds of 'infection' using the production cell line. An adenoviral vector and production system thereof comprises a polynucleotide comprising all or a portion of an adenovirus genome. It is well known that an adenovirus is, without limitation, an adenovirus derived from Ad2, Ad5, Ad12, and Ad40. An adenoviral vector is typically in the form of DNA encapsulated in an adenovirus coat or adenoviral DNA packaged in another viral or viral-like form (such as herpes simplex, and AAV).
An AAV vector is commonly understood to be a vector derived from an adeno-associated virus serotype, including without limitation, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAV-6, AAV-7 and AAV-8. AAV vectors can have one or more of the AAV wild-type genes deleted in whole or part, preferably the rep and/or cap genes, but retain functional flanking ITR sequences. Functional ITR sequences are necessary for the rescue, replication and packaging of the AAV virion. Thus, an AAV vector is defined herein to include at least those sequences required in cis for replication and packaging (e.g., functional ITRs) of the virus. The ITRs need not be the wild-type nucleotide sequences, and may be altered, e.g., by the insertion, deletion or substitution of nucleotides, so long as the sequences provide for functional rescue, replication and packaging. An 'AAV vector' also refers to its protein shell or capsid, which provides an efficient vehicle for delivery of vector nucleic acid to the nucleus of target cells. AAV production systems require helper functions, which typically refer to AAV-derived coding sequences which can be expressed to provide AAV gene products that, in turn, function in trans for productive AAV replication. As such, AAV helper functions include both of the major AAV open reading frames (ORFs), rep and cap. The Rep expression products have been shown to possess many functions, including, among others: recognition, binding and nicking of the AAV origin of DNA replication; DNA helicase activity; and modulation of transcription from AAV (or other heterologous) promoters. The Cap expression products supply necessary packaging functions. AAV helper functions are used herein to complement AAV functions in trans that are missing from AAV vectors.
It is understood that an AAV helper construct refers generally to a nucleic acid molecule that includes nucleotide sequences providing AAV functions deleted from an AAV vector which is to be used to produce a transducing vector for delivery of a nucleotide sequence of interest. AAV helper constructs are commonly used to provide transient expression of AAV rep and/or cap genes to complement missing AAV functions that are necessary for AAV replication; however, helper constructs lack AAV ITRs and can neither replicate nor package themselves. AAV helper constructs can be in the form of a plasmid, phage, transposon, cosmid, virus, or virion. A number of AAV helper constructs have been described, such as the commonly used plasmids pAAV/Ad and pIM29+45 which encode both Rep and Cap expression products. See, e.g., Samulski et al. (1989) J. Virol. 63:3822-3828; and McCarty et al. (1991) J. Virol. 65:2936-2945. A number of other vectors have been described which encode Rep and/or Cap expression products. See, e.g., U.S. Pat. Nos. 5,139,941 and 6,376,237. In addition, it is common knowledge that the term "accessory functions" refers to non-AAV derived viral and/or cellular functions upon which AAV is dependent for its replication. Thus, the term captures proteins and RNAs that are required in AAV replication, including those moieties involved in activation of AAV gene transcription, stage specific AAV mRNA splicing, AAV DNA replication, synthesis of Cap expression products and AAV capsid assembly. Viral-based accessory functions can be derived from any of the known helper viruses such as adenovirus, herpesvirus (other than herpes simplex virus type-1) and vaccinia virus.
Herpes simplex virus vectors
Herpes simplex virus (HSV) is an enveloped double-stranded DNA virus that naturally infects neurons. It can accommodate large sections of foreign DNA, which makes it attractive as a vector system, and has been employed as a vector for gene delivery to neurons (Manservigi et al. (2010) Open Virol J. 4:123-156). The use of HSV in therapeutic procedures requires the strains to be attenuated so that they cannot establish a lytic cycle. In particular, if HSV vectors are used for gene therapy in humans, the polynucleotide should preferably be inserted into an essential gene. This is because if a viral vector encounters a wild-type virus, transfer of a heterologous gene to the wild-type virus could occur by recombination. However, as long as the polynucleotide is inserted into an essential gene, this recombinational transfer would also delete the essential gene in the recipient virus and prevent "escape" of the heterologous gene into the replication competent wild-type virus population.
Vaccinia virus vectors
Vaccinia virus vectors include MVA or NYVAC. Alternatives to vaccinia vectors include avipox vectors such as fowlpox or canarypox known as ALVAC and strains derived therefrom which can infect and express recombinant proteins in human cells but are unable to replicate.
Retroviral vectors
Retroviral vectors may be derived from or may be derivable from any suitable retrovirus. A large number of different retroviruses have been identified. Examples include: murine leukemia virus (MLV), human T-cell leukemia virus (HTLV), mouse mammary tumour virus (MMTV), Rous sarcoma virus (RSV), Fujinami sarcoma virus (FuSV), Moloney murine leukemia virus (Mo-MLV), FBR murine osteosarcoma virus (FBR MSV), Moloney murine sarcoma virus (Mo-MSV), Abelson murine leukemia virus (A-MLV), Avian myelocytomatosis virus-29 (MC29) and Avian erythroblastosis virus (AEV). A detailed list of retroviruses may be found in Coffin et al. (1997) "Retroviruses", Cold Spring Harbor Laboratory Press Eds: JM Coffin, SM Hughes, HE Varmus pp 758-763.
Retroviruses may be broadly divided into two categories, namely "simple" and "complex". Retroviruses may even be further divided into seven groups. Five of these groups represent retroviruses with oncogenic potential. The remaining two groups are the lentiviruses and the spumaviruses. A review of these retroviruses is presented in Coffin et al (1997) ibid.
The basic structures of retroviral and lentiviral genomes share many common features such as a 5' LTR and a 3' LTR, between or within which are located a packaging signal to enable the genome to be packaged, a primer binding site, integration sites to enable integration into a target cell genome and gag/pol and env genes encoding the packaging components, which are polypeptides required for the assembly of viral particles. Lentiviruses have additional features, such as the rev gene and RRE sequences in HIV, which enable the efficient export of RNA transcripts of the integrated provirus from the nucleus to the cytoplasm of an infected target cell.
In the provirus, these genes are flanked at both ends by regions called long terminal repeats (LTRs). The LTRs are responsible for proviral integration, and transcription. LTRs also serve as enhancer-promoter sequences and can control the expression of the viral genes.
The LTRs themselves are identical sequences that can be divided into three elements, which are called U3, R and U5. U3 is derived from the sequence unique to the 3' end of the RNA. R is derived from a sequence repeated at both ends of the RNA and U5 is derived from the sequence unique to the 5' end of the RNA. The sizes of the three elements can vary considerably among different retroviruses.
In a typical retroviral vector, at least part of one or more protein coding regions essential for replication may be removed from the virus; for example, gag/pol and env may be absent or not functional. This makes the viral vector replication-defective.
Polyadenylation
In eukaryotes, polyadenylation is part of the maturation of mRNA for translation and involves the addition of a polyadenine (poly(A)) tail to an mRNA transcript. The poly(A) tail comprises multiple adenosine monophosphates and is important for the nuclear export, translation and stability of mRNA. The process of polyadenylation begins as the transcription of a gene terminates. A set of cellular proteins binds to the polyA sequence elements such that the 3' segment of the transcribed pre-mRNA is first cleaved followed by synthesis of the poly(A) tail at the 3' end of the mRNA. In alternative polyadenylation, a poly(A) tail is added at one of several possible sites, producing multiple transcripts from a single gene.
Native retroviral vector genomes are typically flanked by 3' and 5' long terminal repeats (LTRs). Native retrovirus LTRs comprise a U3 region (containing the enhancer/promoter activities necessary for transcription), and an R-U5 region that comprises important cis-acting sequences regulating a number of functions, including packaging, splicing, polyadenylation and translation. Retrovirus polyadenylation (polyA) sequences required for efficient transcriptional termination also reside within native retrovirus LTRs.
The typical structure and spacing of functional elements of polyadenylation sequences for terminating pol-II transcription have been well characterized (Proudfoot (2011), Genes & Dev. 25: 1770-1782), and can be simply summarized as having: [1] a core polyadenylation signal (PAS; canonical sequence AAUAAA), [2] a cleavage site typically 15-30 nucleotides downstream of the PAS, [3] a GU-rich downstream enhancer (DSE), broadly within ~100 nucleotides of the cleavage site (typically within 20 nucleotides for strong polyadenylation sequences, and often a 'CA' motif), and [4] an upstream enhancer (USE), broadly within ~60 nucleotides of the PAS.
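The spacing between the PAS and the cleavage site summarised above can be illustrated with a short script. This is a minimal sketch only: it assumes that the 15-30 nucleotide distance is measured from the 3' end of the PAS hexamer, it searches only for the canonical hexamer, and the function name and parameters are hypothetical.

```python
import re

PAS = "AAUAAA"  # canonical core polyadenylation signal (RNA alphabet)

def predicted_cleavage_windows(seq, min_gap=15, max_gap=30):
    """Locate canonical PAS hexamers and the expected cleavage window.

    Returns a list of (pas_start, window_start, window_end) tuples using
    0-based indices. The window spans min_gap..max_gap nucleotides
    downstream of the PAS 3' end (an assumption of this sketch) and is
    clipped to the sequence length.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    windows = []
    # The lookahead lets overlapping hexamers be reported as well.
    for match in re.finditer("(?=" + PAS + ")", rna):
        pas_end = match.start() + len(PAS)
        window_start = min(pas_end + min_gap, len(rna))
        window_end = min(pas_end + max_gap, len(rna))
        windows.append((match.start(), window_start, window_end))
    return windows
```

The sketch does not score the DSE or USE; those elements would need separate, sequence-specific criteria.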
RNA Splicing
The present invention, as disclosed herein, may be combined with major splice donor (MSD) site knock out lentiviral vector genomes. The invention may employ lentiviral vector genomes in which the major splice donor site, and optionally the cryptic splice donor site 3' to the major splice donor site, are inactivated.
Thus, in some embodiments, the major splice donor site in the genome of the lentiviral vector, and optionally the cryptic splice donor site 3' to the major splice donor site in the genome of the lentiviral vector are inactivated.
In some embodiments, the inactivated major splice donor site has the sequence set forth in SEQ ID NO: 4.
Suitable inactivated splice sites for use according to the present invention are described in WO 2021/160993 and incorporated herein by reference.
RNA splicing is catalysed by a large RNA-protein complex called the spliceosome, which is comprised of five small nuclear ribonucleoproteins (snRNPs). The borders between introns and exons are marked by specific nucleotide sequences within a pre-mRNA, which delineate where splicing will occur. Such boundaries are referred to as "splice sites". The term "splice site" refers to polynucleotides that are capable of being recognized by the splicing machinery of a eukaryotic cell as suitable for being cut and/or ligated to another splice site.
Splice sites allow for the excision of introns present in a pre-mRNA transcript. Typically, the 5' splice boundary is referred to as the "splice donor site" or the "5' splice site," and the 3' splice boundary is referred to as the "splice acceptor site" or the "3' splice site". Splice sites include, for example, naturally occurring splice sites, engineered or synthetic splice sites, canonical or consensus splice sites, and/or non-canonical splice sites, for example, cryptic splice sites.
Splice acceptor sites generally consist of three separate sequence elements: the branch point or branch site, a polypyrimidine tract and the acceptor consensus sequence. The branch point consensus sequence in eukaryotes is YNYTRAC (where Y is a pyrimidine, N is any nucleotide, and R is a purine). The 3' acceptor splice site consensus sequence is YAG (where Y is a pyrimidine) (see, e.g., Griffiths et al., eds., Modern Genetic Analysis, 2nd edition, W.H. Freeman and Company, New York (2002)). The 3' splice acceptor site typically is located at the 3' end of an intron.
The terms "canonical splice site" or "consensus splice site" may be used interchangeably and refer to splice sites that are conserved across species.
Consensus sequences for the 5' donor splice site and the 3' acceptor splice site used in eukaryotic RNA splicing are well known in the art. These consensus sequences include nearly invariant dinucleotides at each end of the intron: GT at the 5' end of the intron, and AG at the 3' end of an intron.
The canonical splice donor site consensus sequence may be (for DNA) AG/GTRAGT (where A is adenosine, T is thymine, G is guanine, C is cytosine, R is a purine and "/" indicates the cleavage site). This conforms to the more general splice donor consensus sequence MAGGURR described herein. It is well known in the art that a splice donor may deviate from this consensus, especially in viral genomes where other constraints bear on the same sequence, such as secondary structure, for example within a vRNA packaging region. Non-canonical splice sites are also well known in the art, albeit they occur rarely compared to the canonical splice donor consensus sequence.
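A splice donor consensus of this kind can be matched against a DNA sequence by expanding its ambiguity codes. The following is a minimal sketch, assuming the standard IUPAC meanings of R, Y, M and N. The function names and the short test sequences are hypothetical and chosen only for illustration.

```python
import re

# IUPAC codes used by the consensus sequences in the text, in DNA alphabet.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
         "R": "[AG]", "Y": "[CT]", "M": "[AC]", "N": "[ACGT]"}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus into a regex; the '/' mark is dropped."""
    return "".join(IUPAC[base] for base in consensus.upper() if base != "/")

def find_donor_sites(dna, consensus="AG/GTRAGT"):
    """Return 0-based indices of the first nucleotide 3' of each cleavage site."""
    offset = consensus.index("/")  # nucleotides 5' of the cleavage mark
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() + offset for m in pattern.finditer(dna.upper())]

if __name__ == "__main__":
    print(find_donor_sites("CCAGGTAAGTCC"))  # R = A satisfies the consensus
    print(find_donor_sites("CCAGGTCAGTCC"))  # C is not a purine, so no match
```

An exact consensus match is only a first filter, since functional splice donors may deviate from the consensus as noted above.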
By "major splice donor site" is meant the first (dominant) splice donor site in the viral vector genome, encoded and embedded within the native viral RNA packaging sequence typically located in the 5' region of the viral vector nucleotide sequence.
In one embodiment, the lentiviral vector genome does not contain an active major splice donor site, that is splicing does not occur from the major splice donor site, and splicing activity from the major splice donor site is ablated.
The major splice donor site is located in the 5' packaging region of a lentiviral genome.
In the case of the HIV-1 virus, the major splice donor consensus sequence is (for DNA) TG/GTRAGT (where A is adenosine, T is thymine, G is guanine, C is cytosine, R is a purine and "/" indicates the cleavage site).
In one embodiment, the splice donor region, i.e. the region of the vector genome which comprises the major splice donor site prior to mutation, may have the following sequence:
GGGGCGGCGACTGGTGAGTACGCCAAAAAT (SEQ ID NO: 1)
In one embodiment, the mutated splice donor region may comprise the sequence:
GGGGCGGCGACTGCAGACAACGCCAAAAAT (SEQ ID NO: 2 - MSD-2KO)
In one embodiment, the mutated splice donor region may comprise the sequence:
GGGGCGGCGAGTGGAGACTACGCCAAAAAT (SEQ ID NO: 3 - MSD-2KOv2)
In one embodiment, the mutated splice donor region may comprise the sequence:
GGGGAAGGCAACAGATAAATATGCCTTAAAAT (SEQ ID NO: 4 - MSD-2KOm5)
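The nucleotide changes in the two equal-length variants above can be listed by direct comparison with SEQ ID NO: 1. The following is a minimal sketch; the function name is hypothetical, positions are 1-based, and SEQ ID NO: 4 is omitted because it differs in length from SEQ ID NO: 1 and would require an alignment rather than a position-by-position comparison.

```python
SEQ_ID_NO_1 = "GGGGCGGCGACTGGTGAGTACGCCAAAAAT"  # unmodified splice donor region
SEQ_ID_NO_2 = "GGGGCGGCGACTGCAGACAACGCCAAAAAT"  # MSD-2KO
SEQ_ID_NO_3 = "GGGGCGGCGAGTGGAGACTACGCCAAAAAT"  # MSD-2KOv2

def substitutions(wild_type, mutant):
    """Return (position, wild-type base, mutant base) for each mismatch.

    Positions are 1-based. Only equal-length sequences are supported,
    because no alignment is attempted.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences differ in length; an alignment is needed")
    return [(i + 1, w, m)
            for i, (w, m) in enumerate(zip(wild_type, mutant))
            if w != m]

if __name__ == "__main__":
    for name, seq in (("MSD-2KO", SEQ_ID_NO_2), ("MSD-2KOv2", SEQ_ID_NO_3)):
        print(name, substitutions(SEQ_ID_NO_1, seq))
```

On the sequences as written above, MSD-2KO differs from SEQ ID NO: 1 at positions 14, 15, 18 and 19, and MSD-2KOv2 differs at positions 11, 15 and 18.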
In one embodiment, prior to modification the splice donor region may comprise the sequence:
GGCGACTGGTGAGTACGCC (SEQ ID NO: 5)
This sequence is also referred to herein as the "stem loop 2" region (SL2). This sequence may form a stem loop structure in the splice donor region of the vector genome. In one aspect of the invention this sequence (SL2) may have been deleted from the nucleotide sequence according to the invention as described herein.
As such, the invention encompasses the use of a lentiviral vector genome that does not comprise SL2. The invention encompasses the use of a lentiviral vector genome that does not comprise a sequence according to SEQ ID NO:5.
In one aspect of the invention the major splice donor site may have the following consensus sequence, wherein R is a purine and "/" is the cleavage site:
TG/GTRAGT
In one aspect, R may be guanine (G).
In one aspect of the invention, the major splice donor and cryptic splice donor region may have the following core sequence, wherein "/" are the cleavage sites at the major splice donor and cryptic splice donor sites:
/GTGA/GTA.
In one aspect of the invention the MSD-mutated vector genome may have at least two mutations in the major splice donor and cryptic splice donor 'region' (/GTGA/GTA), wherein the first and second 'GT' nucleotides are immediately 3' of the major splice donor and cryptic splice donor cleavage sites, respectively.
In one aspect of the invention the major splice donor consensus sequence is CTGGT. The major splice donor site may contain the sequence CTGGT.
In one aspect the nucleotide sequence encoding the lentiviral vector genome, prior to inactivation of the splice sites, comprises the sequence as set forth in any of SEQ ID NOs: 1, 5 and/or the sequence TG/GTRAGT, CTGGT, TGAGT and/or /GTGA/GTA. In one aspect the nucleotide sequence comprises an inactivated major splice donor site which would otherwise have a cleavage site between nucleotides corresponding to nucleotides 13 and 14 of SEQ ID NO:1.
According to the invention as described herein, the nucleotide sequence encoding the lentiviral vector genome also contains an inactive cryptic splice donor site. In one aspect the nucleotide sequence does not contain an active cryptic splice donor site adjacent to (3' of) the major splice donor site, that is to say that splicing does not occur from the adjacent cryptic splice donor site, and splicing from the cryptic splice donor site is ablated.
The term "cryptic splice donor site" refers to a nucleic acid sequence which does not normally function as a splice donor site or is utilised less efficiently as a splice donor site due to the adjacent sequence context (e.g. the presence of a nearby 'preferred' splice donor), but can be activated to become a more efficient functioning splice donor site by mutation of the adjacent sequence (e.g. mutation of the nearby 'preferred' splice donor).
In one aspect the cryptic splice donor site is the first cryptic splice donor site 3' of the major splice donor.
In one aspect the cryptic splice donor site is within 6 nucleotides of the major splice donor site on the 3' side of the major splice donor site. Preferably the cryptic splice donor site is within 4 or 5, preferably 4, nucleotides of the major splice donor cleavage site.
In one aspect of the invention the cryptic splice donor site has the consensus sequence TGAGT.
In one aspect the nucleotide sequence comprises an inactivated cryptic splice donor site which would otherwise have a cleavage site between nucleotides corresponding to nucleotides 17 and 18 of SEQ ID NO:1.
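The positions of the major and cryptic cleavage sites can be checked against SEQ ID NO: 1 by locating the core sequence /GTGA/GTA. The following is a minimal sketch; the function name and the layout of the returned dictionary are hypothetical, and numbering is 1-based to match the text.

```python
SEQ_ID_NO_1 = "GGGGCGGCGACTGGTGAGTACGCCAAAAAT"
CORE = "/GTGA/GTA"  # '/' marks the major and then the cryptic cleavage site

def cleavage_sites(region=SEQ_ID_NO_1, core=CORE):
    """Return the 1-based nucleotide pairs flanking each cleavage site."""
    bare_core = core.replace("/", "")
    start = region.find(bare_core)  # 0-based index of the first core base
    if start < 0:
        raise ValueError("core sequence not found in region")
    sites = []
    offset = 0  # number of core nucleotides already passed
    for symbol in core:
        if symbol == "/":
            # 0-based index start + offset is the base 3' of the cut, so the
            # flanking 1-based positions are (start + offset, start + offset + 1).
            sites.append((start + offset, start + offset + 1))
        else:
            offset += 1
    return {"major": sites[0], "cryptic": sites[1]}

if __name__ == "__main__":
    print(cleavage_sites())
```

For SEQ ID NO: 1 this places the major cleavage site between nucleotides 13 and 14 and the cryptic cleavage site between nucleotides 17 and 18, four nucleotides apart.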
In one aspect of the invention the major splice donor site and/or adjacent cryptic splice donor site contain a "GT" motif. In one aspect of the invention both the major splice donor site and adjacent cryptic splice donor site contain a "GT" motif which is mutated. The mutated GT motifs may inactivate splice activity from both the major splice donor site and adjacent cryptic splice donor site. An example of such a mutation is referred to herein as "MSD-2KO".
In one aspect the splice donor region may comprise the following sequence:
CAGACA
For example, in one aspect the mutated splice donor region may comprise the following sequence:
GGCGACTGCAGACAACGCC (SEQ ID NO: 6)
A further example of an inactivating mutation is referred to herein as "MSD-2KOv2".
In one aspect the mutated splice donor region may comprise the following sequence:
GTGGAGACT
For example, in one aspect the mutated splice donor region may comprise the following sequence:
GGCGAGTGGAGACTACGCC (SEQ ID NO: 7)
For example, in one aspect the mutated splice donor region may comprise the following sequence:
AAGGCAACAGATAAATATGCCTT (SEQ ID NO: 8)
In one aspect the stem loop 2 region as described above may be deleted from the splice donor region, resulting in inactivation of both the major splice donor site and the adjacent cryptic splice donor site. Such a deletion is referred to herein as "ΔSL2".
A variety of different types of mutations can be introduced into the nucleic acid sequence in order to inactivate the major and adjacent cryptic splice donor sites.
In one aspect the mutation is a functional mutation to ablate or suppress splicing activity in the splice region. The nucleotide sequence may contain a mutation or deletion in any of the nucleotides in any of SEQ ID NOs: 1, 5 and/or the sequence TG/GTRAGT, CTGGT, TGAGT and/or /GTGA/GTA. Suitable mutations will be known to one skilled in the art, and are described herein.
For example, a point mutation can be introduced into the nucleic acid sequence. The term "point mutation," as used herein, refers to any change to a single nucleotide. Point mutations include, for example, deletions, transitions, and transversions; these can be classified as nonsense mutations, missense mutations, or silent mutations when present within protein coding sequence. A "nonsense" mutation produces a stop codon. A "missense" mutation produces a codon that encodes a different amino acid. A "silent" mutation produces a codon that encodes either the same amino acid or a different amino acid that does not alter the function of the protein. One or more point mutations can be introduced into the nucleic acid sequence comprising the cryptic splice donor site. For example, the nucleic acid sequence comprising the cryptic splice site can be mutated by introducing two or more point mutations therein.
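The classifications defined above can be illustrated for single-base substitutions within a codon. The following is a minimal sketch, assuming the standard genetic code and upper-case DNA input. The function names are hypothetical, and "silent" here covers only the synonymous case (same amino acid), because a script cannot judge whether a different amino acid alters protein function.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}

def substitution_type(old_base, new_base):
    """Classify a substitution as a transition or a transversion."""
    if old_base == new_base:
        raise ValueError("bases are identical; no substitution")
    same_class = (old_base in PURINES) == (new_base in PURINES)
    return "transition" if same_class else "transversion"

def coding_effect(codon, position, new_base):
    """Classify a substitution at a 0-based codon position.

    Returns 'nonsense', 'missense' or 'silent'. The original codon is
    assumed to encode an amino acid rather than a stop.
    """
    original = CODON_TABLE[codon]
    if original == "*":
        raise ValueError("sketch assumes the original codon is not a stop codon")
    mutated = CODON_TABLE[codon[:position] + new_base + codon[position + 1:]]
    if mutated == "*":
        return "nonsense"
    return "silent" if mutated == original else "missense"
```

For example, under these assumptions coding_effect("TGG", 2, "A") is classified as nonsense because TGA is a stop codon.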
At least two point mutations can be introduced in several locations within the nucleic acid sequence comprising the major splice donor and cryptic splice donor sites to achieve attenuation of splicing from the splice donor region. In one aspect the mutations may be within the four nucleotides at the splice donor cleavage site; in the canonical splice donor consensus sequence this is A1G2/G3T4, wherein "/" is the cleavage site. It is well known in the art that a splice donor cleavage site may deviate from this consensus, especially in viral genomes where other constraints bear on the same sequence, such as secondary structure, for example within a vRNA packaging region. It is well known that the G3T4 dinucleotide is generally the least variable sequence within the canonical splice donor consensus sequence, and mutations to the G3 and/or T4 will most likely achieve the greatest attenuating effect. For example, for the major splice donor site in HIV-1 viral vector genomes this can be T1G2/G3T4, wherein "/" is the cleavage site. For example, for the cryptic splice donor site in HIV-1 viral vector genomes this can be G1A2/G3T4, wherein "/" is the cleavage site. Additionally, the point mutation(s) can be introduced adjacent to a splice donor site. For example, the point mutation can be introduced upstream or downstream of a splice donor site. In aspects where the nucleic acid sequence comprising a major and/or cryptic splice donor site is mutated by introducing multiple point mutations therein, the point mutations can be introduced upstream and/or downstream of the cryptic splice donor site.
As described herein, the nucleotide sequence encoding the RNA genome of the lentiviral vector for use according to the invention may optionally further comprise a mutation in a cryptic splice donor site within the SL4 loop of the packaging sequence. In one aspect a GT dinucleotide in said cryptic splice donor site within the SL4 loop of the packaging sequence is mutated to GC.
In one aspect, the nucleotide sequence encoding the lentiviral vector genome may be suitable for use in a lentiviral vector in a U3 or tat-independent system for vector production. As described herein, 3rd generation lentiviral vectors are U3/tat-independent, and the nucleotide sequences according to the present invention may be used in the context of a 3rd generation lentiviral vector. In one aspect of the invention tat is not provided in the lentiviral vector production system, for example tat is not provided in trans. In one aspect the cell or vector or vector production system as described herein does not comprise the tat protein. In one aspect of the invention HIV-1 U3 is not present in the lentiviral vector production system, for example HIV-1 U3 is not provided in cis to drive transcription of vector genome expression cassette.
In one aspect the major splice donor site in the lentiviral vector genome is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence is for use in a tat-independent lentiviral vector. In one aspect the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence is produced in the absence of tat.
In one aspect the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence has been transcribed independently of tat.
In one aspect the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence is for use in a U3-independent lentiviral vector.
In one aspect the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence has been transcribed independently of the U3 promoter.
In one aspect the major splice donor site in the RNA genome of the lentiviral vector is inactivated and the cryptic splice donor site 3' to the major splice donor site is inactivated, and said nucleotide sequence has been transcribed by a heterologous promoter.
In one aspect, transcription of the nucleotide sequence as described herein is not dependent on the presence of U3. The nucleotide sequence may be derived from a U3-independent transcription event. The nucleotide sequence may be derived from a heterologous promoter. A nucleotide sequence as described herein may not comprise a native U3 promoter.
Construction of Splice Site Mutants
Splice site mutants of the present invention may be constructed using a variety of techniques. For example, mutations may be introduced at particular loci by synthesising oligonucleotides containing a mutant sequence, flanked by restriction sites enabling ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence comprises a derivative having the desired nucleotide insertion, substitution, or deletion. By way of further example, splice site mutants may be constructed as described in WO 2021/160993 (which is incorporated herein by reference in its entirety).
Other known techniques allowing alterations of DNA sequence include recombination approaches such as Gibson assembly, Golden Gate cloning and In-Fusion.
Alternatively, oligonucleotide-directed site-specific (or segment specific) mutagenesis procedures may be employed to provide an altered sequence according to the substitution, deletion, or insertion required. Deletion or truncation derivatives of splice site mutants may also be constructed by utilising convenient restriction endonuclease sites adjacent to the desired deletion. Subsequent to restriction, overhangs may be filled in, and the DNA religated. Exemplary methods of making the alterations set forth above are disclosed by Sambrook et al. (Molecular cloning: A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, 1989).
Splice site mutants may also be constructed utilising techniques of PCR mutagenesis, chemical mutagenesis (Drinkwater and Klinedinst, 1986), forced nucleotide misincorporation (e.g., Liao and Wise, 1990), or by use of randomly mutagenised oligonucleotides (Horwitz et al., 1989).
The present invention also provides a method for producing a lentiviral vector nucleotide sequence, comprising the steps of: providing a nucleotide sequence encoding the RNA genome of a lentiviral vector as described herein; and mutating the major splice donor site and cryptic splice donor site as described herein in said nucleotide sequence.
Combination with modified U1 snRNA to improve vector titre
MSD-mutated lentiviral vectors are preferable to current standard lentiviral vectors for use as gene therapy vectors due to their reduced capacity to partake in aberrant splicing events both during LV production and in target cells. However, the production of MSD-mutated vectors has either relied upon supply of the HIV-1 tat protein (1st and 2nd generation U3-dependent lentiviral vectors), has been of lower efficiency due to the destabilising effect of mutating the MSD on vector RNA levels (in 3rd generation vectors), or, as discovered by the present inventors, is improved by co-expression of modified U1 snRNA. The present inventors have previously found that MSD-mutated, 3rd generation (i.e. U3/tat-independent) LVs could be produced to high titre by co-expression of a modified U1 snRNA directed to bind to the 5' packaging region of the vector genome RNA during production (see WO 2021/014157 and WO 2021/160993, incorporated herein by reference).
The amount of vRNA produced from so-called MSD-mutated (or MSD-2KO) lentiviral vector genomes is typically substantially reduced, leading to lower vector titres. It is theorized that an 'early' interaction between the MSD and U1 snRNA (prior to splicing decisions) is important for transcription elongation from the external promoter. The inventors previously found that one solution to this problem was to provide a modified U1 snRNA in trans during LV production to stabilize the vRNA (see WO 2021/014157 and WO 2021/160993). These modified U1 snRNAs can enhance the production titres of MSD-mutated LVs in a manner that is independent of the presence of the 5' polyA signal within the 5' R region, indicating a novel mechanism over others' use of modified U1 snRNAs to suppress polyadenylation (so-called U1-interference [U1i]). Targeting the modified U1 snRNAs to critical sequences of the packaging region produced the greatest enhancement in MSD-mutated LV titres.
The present inventors previously showed that the output titres of lentiviral vectors can be enhanced by co-expressing non-coding RNAs based on U1 snRNAs, which have been modified so that they no longer target the endogenous sequence (a splice donor site) but now target a sequence within the vRNA molecule. As demonstrated in WO 2021/014157 and WO 2021/160993, the relative enhancement by said modified U1 snRNAs in output titres of lentiviral vectors harbouring attenuating mutations within the major splice donor region (containing the major splice donor and cryptic splice donor sites) is greater than that seen for standard lentiviral vectors containing a non-mutated major splice donor region.
Vector genomes harbouring a broad range of mutation types within the major splice donor region (point mutations, region deletion, and sequence replacement) that lead to reduced titres may be used in combination with a modified U1 snRNA. The approach may comprise coexpression of modified U1 snRNAs together with the other vector components during vector production. The modified U1 snRNAs are designed such that binding to the consensus splice donor site has been ablated by replacing it with a heterologous sequence that is complementary to a target sequence within the vector genome vRNA.
In some embodiments, the nucleotide sequence of the invention is used in combination with a modified U1 snRNA.
In some embodiments, the nucleotide sequence of the invention further comprises a nucleotide sequence encoding a modified U1 snRNA.
In some embodiments, the nucleotide sequence encoding the lentiviral vector genome further encodes a modified U1 snRNA.
In some embodiments, the nucleotide sequence encoding the lentiviral vector genome is operably linked to the nucleotide sequence encoding the modified U1 snRNA.
In some embodiments, said modified U1 snRNA has been modified to bind to a nucleotide sequence within the packaging region of the lentiviral vector genome. In one embodiment, the nucleotide sequence encoding a modified U1 snRNA may be provided on a different nucleotide sequence, for example on a different plasmid. In other words, the nucleotide sequence encoding a modified U1 snRNA may be provided in trans during production of a lentiviral vector as described herein.
Splicing and polyadenylation are key processes for mRNA maturation, particularly in higher eukaryotes where most protein-coding transcripts contain multiple introns. The elements within a pre-mRNA that are required for splicing include the 5' splice donor signal, the sequence surrounding the branch point and the 3' splice acceptor signal. Interacting with these three elements is the spliceosome, which is formed by five small nuclear RNAs (snRNAs), including U1 snRNA, and associated nuclear proteins (snRNP). U1 snRNA is expressed by a polymerase II promoter and is present in most eukaryotic cells (Lund et al., 1984, J. Biol. Chem., 259:2013-2021). Human U1 snRNA (small nuclear RNA) is 164 nt long with a well-defined structure consisting of four stem-loops (West, S., 2012, Biochemical Society Transactions, 40:846-849). U1 snRNA contains a short sequence at its 5'-end that is broadly complementary to the 5' splice donor sites at exon-intron junctions. U1 snRNA participates in splice-site selection and spliceosome assembly by base pairing to the 5' splice donor site. A known function for U1 snRNA outside splicing is in the regulation of 3'-end mRNA processing: it suppresses premature polyadenylation (polyA) at early polyA signals (particularly within introns).
Human U1 snRNA (small nuclear RNA) is 164 nt long with a well-defined structure consisting of four stem-loops. The endogenous non-coding RNA, U1 snRNA, binds to the consensus 5' splice donor site (e.g. 5'-MAGGURR-3', wherein M is A or C and R is A or G) via the native splice donor annealing sequence (e.g. 5'-ACUUACCUG-3') during early steps of intron splicing. Stem loop I binds to U1A-70K protein that has been shown to be important for polyA suppression. Stem loop II binds to U1A protein, and the 5'-AUUUGUGG-3' sequence binds to Sm proteins, which together with Stem loop IV, is important for U1 snRNA processing. As defined herein, the modified U1 snRNA for use according to the present invention is modified to introduce a heterologous sequence that is complementary to a target sequence within the vector genome vRNA molecule at the site of the native splice donor targeting/annealing sequence.
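The retargeting principle described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the heterologous sequence is the exact reverse complement of the chosen target and that only Watson-Crick pairing is considered. The function names and the example target sequence are hypothetical and are not taken from the cited applications.

```python
import re

# Watson-Crick complements for RNA (G-U wobble pairing is ignored in this sketch).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Native splice donor annealing sequence of U1 snRNA, as given in the text.
NATIVE_ANNEALING_SEQUENCE = "ACUUACCUG"

# Consensus 5' splice donor site 5'-MAGGURR-3' (M = A or C; R = A or G).
SPLICE_DONOR_CONSENSUS = re.compile(r"[AC]AGGU[AG][AG]")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def design_retargeting_sequence(target: str) -> str:
    """Hypothetical helper: a heterologous sequence fully complementary to `target`."""
    return reverse_complement_rna(target)


# The site bound by the native annealing sequence fits the splice donor consensus.
native_target = reverse_complement_rna(NATIVE_ANNEALING_SEQUENCE)
print(native_target, bool(SPLICE_DONOR_CONSENSUS.search(native_target)))

# Made-up 15 nt target standing in for a sequence within the vRNA packaging region.
example_target = "GGAGAGAGAUGGGUG"
print(design_retargeting_sequence(example_target))
```

In this sketch the native annealing sequence is simply swapped for the output of `design_retargeting_sequence`. The choice of target length and position is left open here and would follow the cited applications.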
Suitable modified U1 snRNAs for use according to the present invention are described in WO 2021/014157 and WO 2021/160993 and are incorporated herein by reference.
The modified U1 snRNAs can be manufactured according to methods generally known in the art. For example, the modified U1 snRNAs can be manufactured by chemical synthesis or recombinant DNA/RNA technology. By way of further example, the modified U1 snRNAs as described herein can be manufactured as described in WO 2021/014157 and WO 2021/160993.
The introduction of a nucleotide sequence encoding a modified U1 snRNA as described herein into a cell using conventional molecular and cell biology techniques is within the capabilities of a person of ordinary skill in the art.
TRIP System
The present invention, as disclosed herein, may be combined with the 'TRIP' system.
WO2015/092440 and WO2021/094752, which are incorporated in their entirety herein by reference, disclose the use of a heterologous translation control system in eukaryotic cell cultures to repress the translation of the NOI (repress transgene expression) during viral vector production and thus repress or prevent expression of the protein encoded by the NOI. This system is referred to as the Transgene Repression In vector Production cell system or TRIP system.
In one form, the TRIP system utilises the bacterial trp operon regulation protein, tryptophan RNA-binding attenuation protein (TRAP), and the TRAP binding site/sequence (tbs) to mediate transgene repression. The use of this system does not impede the production of packageable vector genome molecules nor the activity of vector virions, and does not interfere with the long-term expression of the NOI in the target cell.
The term "binding site" is to be understood as a nucleic acid sequence that is capable of interacting with a certain protein.
By "capable of interacting" it is to be understood that the nucleic acid binding site (e.g. tbs or portion thereof) is capable of binding to a protein, for example TRAP, under the conditions that are encountered in a cell, for example a eukaryotic viral vector production cell. Such an interaction with an RNA-binding protein such as TRAP results in the repression or prevention of translation of a NOI to which the nucleic acid binding site (e.g. the tbs or portion thereof) is operably linked.
A consensus TRAP binding site sequence that is capable of binding TRAP is [KAGNN] repeated multiple times (e.g. 6, 7, 8, 9, 10, 11, 12 or more times); such sequence is found in the native trp operon. In the native context, occasionally AAGNN is tolerated and occasionally additional "spacing" N nucleotides result in a functional sequence. In vitro experiments have demonstrated that at least 6 consensus repeats are required for TRAP-RNA binding (Babitzke P, Y. J., Campanelli D. (1996) Journal of Bacteriology 178(17): 5159-5163). Therefore, preferably in one aspect there are 6 or more continuous [KAGN≥2] sequences present within the tbs, wherein "K" may be T or G in DNA and U or G in RNA and "N" is to be understood as specifying any nucleotide at that position in the sequence (for example, "N" could be G, A, T, C or U).
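As an illustration of the repeat rule just described, the following Python sketch scans a sequence for six or more contiguous KAGN repeats. It is only a pattern check and says nothing about actual TRAP binding. Limiting the spacer to two or three N nucleotides is an assumption made for this sketch, and the function name is hypothetical.

```python
import re

# K = G or T (U in RNA is converted to T first); N = any nucleotide.
# Each repeat is KAG followed by 2-3 spacer nucleotides; the upper bound of 3
# is an assumption of this sketch, not a limit stated for every embodiment.
TBS_REPEATS = re.compile(r"(?:[GT]AG[ACGT]{2,3}){6,}")


def has_candidate_tbs(seq: str) -> bool:
    """Return True if `seq` contains 6 or more contiguous KAGN(2-3) repeats."""
    dna = seq.upper().replace("U", "T")  # accept RNA input as well as DNA
    return TBS_REPEATS.search(dna) is not None


print(has_candidate_tbs("GAGTT" * 6))   # six consensus repeats
print(has_candidate_tbs("GAGTT" * 5))   # only five repeats
```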
In some embodiments, the lentiviral vector genome further comprises a tbs.
In some embodiments, the nucleotide sequence of the invention further comprises a TRAP binding site (tbs).
In some embodiments, a nucleotide sequence encoding TRAP is present during production of the lentiviral vector as described herein.
Suitable tbs are described in WO2015/092440 and WO2021/094752 and are incorporated herein by reference. Suitably, the nucleotide sequence may further comprise a tbs, and also may comprise a Kozak sequence, wherein said tbs overlaps the Kozak sequence, or wherein said Kozak sequence comprises a portion of a tbs. Suitably, the nucleotide sequence may further comprise a multiple cloning site and a Kozak sequence, wherein said multiple cloning site is overlapping with or located downstream to the 3' KAGN2-3 repeat of the tbs and upstream of the Kozak sequence.
As used herein, a "multiple cloning site" is to be understood as a DNA region which contains several restriction enzyme recognition sites (restriction enzyme sites) very close to each other. In one aspect, the RE sites may be overlapping in the MCS for use in the invention.
As used herein, a "restriction enzyme site" or "restriction enzyme recognition site" is a location on a DNA molecule containing specific sequences of nucleotides, 4-8 nucleotides in length, which are recognised by restriction enzymes. A restriction enzyme recognises a specific RE site (i.e. a specific sequence) and cleaves the DNA molecule within, or nearby, the RE site.
In some aspects of the present invention, the nucleotide of interest (i.e. transgene) is operably linked to the tbs. In some aspects, the nucleotide of interest is translated in a target cell which expresses TRAP. In some aspects, the nucleotide of interest is translated in a target cell which lacks TRAP.
By "operably linked" it is to be understood that the components described are in a relationship permitting them to function in their intended manner. Therefore a tbs or portion thereof for use in the invention operably linked to a NOI is positioned in such a way that translation of the NOI is modified when TRAP binds to the tbs or portion thereof. The tbs may be capable of interacting with TRAP such that translation of the nucleotide of interest is repressed or prevented in a viral vector production cell.
Viral cis-acting sequences
ORFs present in the vector backbone delivered in transduced (e.g. patient) cells could be transcribed, for example, when read-through transcription from upstream cellular promoters occurs (lentiviral vectors target active transcription sites), leading to potential aberrant transcription of genetic material located in the vector backbone in patient cells. This potential aberrant transcription of genetic material located in the vector backbone following read-through transcription could also occur during lentiviral vector production in production cells.
The viral cis-acting sequence present within lentiviral vector genomes may contain multiple internal ORFs. These internal ORFs may be found between an internal ATG sequence of the viral cis-acting sequence and the stop codon immediately 3' to the ATG sequence.
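The definition of an internal ORF given above can be sketched in Python as follows. The sketch assumes that "the stop codon immediately 3' to the ATG sequence" means the first in-frame stop codon (TAA, TAG or TGA), and it scans one strand only. The function name and the short test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_internal_orfs(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs (0-based, end exclusive) for each ATG that is
    followed by an in-frame stop codon; each span includes the stop codon."""
    dna = seq.upper()
    orfs = []
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon from the ATG to the first in-frame stop codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
        start = dna.find("ATG", start + 1)
    return orfs


# Made-up sequence containing one short ORF (ATG AAA TAG).
print(find_internal_orfs("CCATGAAATAGGG"))
```

An ATG with no in-frame stop codon downstream is not reported by this sketch, which may or may not match how open-ended ORFs are treated in practice.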
Modifications in a viral cis-acting sequence to disrupt at least one internal ORF, for example by mutating the ATG sequence which denotes the start of the at least one internal ORF, are tolerated. Thus, the modified viral cis-acting sequence described herein retains its function.
Accordingly, in some embodiments of the present invention, the lentiviral vector genome comprises at least one modified viral cis-acting sequence, wherein at least one internal open reading frame (ORF) in the viral cis-acting sequence is disrupted (see PCT/GB2021/050620, incorporated herein by reference in its entirety). The at least one internal ORF may be disrupted by mutating at least one ATG sequence (ATG sequences may function as translation initiation codons).
In some embodiments of the present invention, the lentiviral vector genome comprises a modified nucleotide sequence encoding gag, wherein at least one internal ORF in the modified nucleotide sequence encoding gag is disrupted (see PCT/GB2021/050620, incorporated herein by reference in its entirety). The at least one internal ORF in the modified nucleotide sequence encoding gag may be disrupted by mutating at least one ATG sequence as described herein.
Suitable modified viral cis-acting sequences and modified nucleotide sequences encoding gag for use according to the present invention are described in PCT/GB2021/050620 and are incorporated herein by reference. In some embodiments, the lentiviral vector genome comprises at least two (suitably at least three, at least four, at least five, at least six, at least seven) modified viral cis-acting sequences.
In some embodiments, at least two (suitably at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen or at least twenty) internal ORFs in the at least one viral cis-acting sequence and/or in the nucleotide sequence encoding gag may be disrupted. In some embodiments, at least three internal ORFs in the at least one viral cis-acting sequence and/or in the nucleotide sequence encoding gag may be disrupted.
In some embodiments, one (suitably, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen or twenty) internal ORFs in the at least one viral cis-acting sequence and/or the nucleotide sequence encoding gag may be disrupted.
In some embodiments, the at least one internal ORF may be disrupted such that the internal ORF is not expressed. In some embodiments, the at least one internal ORF may be disrupted such that the internal ORF is not translated. In some embodiments, the at least one internal ORF may be disrupted such that no protein is expressed from the internal ORF. In some embodiments, the at least one internal ORF may be disrupted such that no protein is translated from the internal ORF. Thus, the at least one internal ORF present in the modified viral cis-acting sequence and/or in the modified nucleotide sequence encoding gag in the vector backbone delivered in transduced cells may be disrupted such that aberrant transcription of the internal ORF is prevented when there is read-through transcription from upstream cellular promoters.
In one embodiment, the at least one internal ORF may be disrupted by mutating at least one ATG sequence. A "mutation" of an ATG sequence may comprise one or more nucleotide deletions, additions, or substitutions.
In one embodiment, the at least one ATG sequence may be mutated in the modified viral cis-acting sequence and/or in the modified nucleotide sequence encoding gag to a sequence selected from the group consisting of: a) an ATTG sequence; b) an ACG sequence; c) an A-G sequence; d) an AAG sequence; e) a TTG sequence; and/or f) an ATT sequence.
The at least one ATG sequence may be mutated to an ATTG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag. The at least one ATG sequence may be mutated to an ACG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag. The at least one ATG sequence may be mutated to an A-G sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag. The at least one ATG sequence may be mutated to an AAG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag. The at least one ATG sequence may be mutated to a TTG sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag. The at least one ATG sequence may be mutated to an ATT sequence in the modified viral cis-acting sequence and/or the modified nucleotide sequence encoding gag.
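The six replacement options can be illustrated with a short Python sketch. It reads "A-G" as deletion of the central T, which is an interpretation rather than something stated in this passage. It also mutates every ATG in the input, whereas the embodiments only require that at least one be mutated. The function name and test sequence are hypothetical.

```python
# Replacement options listed in the text. "A-G" is read here as deletion of the
# central T (an assumption of this sketch), so it maps to the dinucleotide "AG".
ATG_REPLACEMENTS = {
    "ATTG": "ATTG",  # single-nucleotide insertion
    "ACG": "ACG",    # substitution
    "A-G": "AG",     # single-nucleotide deletion
    "AAG": "AAG",    # substitution
    "TTG": "TTG",    # substitution
    "ATT": "ATT",    # substitution
}


def mutate_atg(seq: str, option: str = "ACG") -> str:
    """Replace every ATG in `seq` with the replacement chosen by `option`."""
    return seq.upper().replace("ATG", ATG_REPLACEMENTS[option])


example = "CCATGAAATGTT"  # made-up sequence with two ATG sequences
for option in ATG_REPLACEMENTS:
    print(option, mutate_atg(example, option))
```

The ATTG and A-G options change the sequence length by one nucleotide, whereas the other four are length-preserving substitutions; which is preferable for a structured element such as the RRE is not addressed by this sketch.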
In one embodiment, the at least one modified viral cis-acting element and/or the modified nucleotide sequence encoding gag may lack ATG sequences.
In some embodiments, all ATG sequences within viral cis-acting sequences and/or the nucleotide sequence encoding gag in the lentiviral vector genome are mutated.
Lentiviral vectors typically comprise multiple viral cis-acting sequences. Example viral cis-acting sequences include gag-p17, Rev response element (RRE), central polypurine tract (cppt) and Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE).
In some embodiments, the at least one viral cis-acting sequence may be at least one lentiviral cis-acting sequence. Example lentiviral cis-acting sequences include the RRE and cppt.
In some embodiments, the at least one viral cis-acting sequence may be at least one non-lentiviral cis-acting sequence.
In some embodiments, the at least one viral cis-acting sequence may be at least one lentiviral cis-acting sequence and at least one non-lentiviral cis-acting sequence.
In some embodiments, the at least one viral cis-acting sequence is: a) gag-p17; and/or b) a Rev response element (RRE); and/or c) a Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE).
In some embodiments, the at least one viral cis-acting sequence is a RRE.
In some embodiments, the at least one viral cis-acting sequence is a WPRE.
In some embodiments, the lentiviral vector genome comprises at least two (suitably, at least 3, at least 4, at least 5) modified viral cis-acting sequences.
In some embodiments, the lentiviral vector genome comprises a modified RRE as described herein and a modified WPRE as described herein.
In some embodiments, the lentiviral vector genome comprises a modified RRE as described herein, a modified WPRE as described herein and a modified nucleotide sequence encoding gag as described herein.
In one embodiment, the lentiviral vector genome as described herein lacks ATG sequences in the backbone of the vector genome. In one embodiment, the lentiviral vector genome as described herein lacks ATG sequences except in the NOI (i.e. transgene).
In one embodiment, the lentiviral vector genome comprises at least one modified viral cis-acting sequence and/or a modified nucleotide sequence encoding gag, wherein at least one internal open reading frame (ORF) in the viral cis-acting sequence or in the nucleotide sequence encoding gag is ablated.
In one embodiment, the lentiviral vector genome comprises at least one modified viral cis-acting sequence and/or a modified nucleotide sequence encoding gag, wherein at least one internal open reading frame (ORF) in the viral cis-acting sequence or in the nucleotide sequence encoding gag is silenced.
Modified gag and viral Gag-p17 protein
A further preferred but optional feature of the invention is the minimization of gag sequences included within the packaging sequences used in combination with the aforementioned features. The amount of gag typically included within HIV-1 lentiviral vector packaging sequences can be reduced by at least 270 nucleotides, but may be reduced by up to the entire gag sequence. The deleted gag nucleotide sequence may be that of the gag-p17 instability sequence. Deletion of the gag-p17 instability sequence typically results in reduced vector titers unless the first ATG codon of the remaining gag sequence is mutated. In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 80 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 70 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 60 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 50 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 40 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 30 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 20 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein only the first 10 nucleotides of gag remain.
In some aspects the reduced packaging sequences comprise deleted gag sequences wherein no nucleotides of gag remain.
In some aspects, nucleotide sequences of the invention comprise ablated gag sequences wherein the gag sequences comprise only up to the first 10, up to the first 20, up to the first 30, up to the first 40, up to the first 50, up to the first 60, up to the first 70, or up to the first 80 nucleotides of gag.
The nucleotide sequence encoding gag may be a truncated nucleotide sequence encoding a part of gag. The nucleotide sequence encoding gag may be a minimal truncated nucleotide sequence encoding a part of gag. The part of gag may be a contiguous sequence. The truncated nucleotide sequence or minimal truncated nucleotide sequence encoding a part of gag may also contain at least one frameshift mutation. An example truncated nucleotide sequence encoding a part of gag and which contains a frameshift mutation at position 45-46 is as follows:
[Sequence presented as an image in the original publication: Figure imgf000065_0001]
An example minimal truncated nucleotide sequence encoding a part of gag and which contains a frameshift mutation at position 45-46 is as follows:
[Sequence presented as an image in the original publication: Figure imgf000065_0002]
The nucleotide sequence encoding gag may, for example, comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 9; or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 10.
The modified nucleotide sequence encoding gag may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 9; or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 10.
The modified nucleotide sequence encoding gag may comprise the sequence as set forth in SEQ ID NO: 9 or SEQ ID NO: 10, or a sequence having at least 80% identity (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) thereto, wherein at least one ATG sequence selected from (a) to (c) is mutated: a) ATG corresponding to positions 1-3 of SEQ ID NO: 9; b) ATG corresponding to positions 47-49 of SEQ ID NO: 9; and/or c) ATG corresponding to positions 107-109 of SEQ ID NO: 9.
An example modified truncated nucleotide sequence encoding part of gag and which contains a frameshift mutation is as follows:
[Sequence presented as an image in the original publication: Figure imgf000066_0001]
An example modified minimal truncated nucleotide sequence encoding a part of gag and which contains a frameshift mutation is as follows:
[Sequence presented as an image in the original publication: Figure imgf000066_0002]
The modified nucleotide sequence encoding gag may comprise the sequence as set forth in SEQ ID NO: 11, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto. The sequence may comprise less than three (suitably less than two or less than one) ATG sequences.
The modified nucleotide sequence encoding gag may comprise the sequence as set forth in SEQ ID NO: 12, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto. The sequence may comprise less than two (suitably less than one) ATG sequences.
The modified nucleotide sequence encoding gag may comprise less than three ATG sequences. Suitably, the modified nucleotide sequence encoding gag may comprise less than two or less than one ATG sequence(s). The modified nucleotide sequence encoding gag may lack an ATG sequence. Lentiviral vector genomes lacking a nucleotide sequence encoding Gag-p17 or a fragment thereof are described in PCT/GB2021/050620, incorporated herein by reference in its entirety.
The lentiviral vector genome as described herein may lack a nucleotide sequence encoding Gag-p17 or a fragment thereof. The lentiviral vector genome may, for example, not express Gag-p17 or a fragment thereof. In one embodiment, the lentiviral vector genome may lack the sequence as set forth in SEQ ID NO: 13.
The viral protein Gag-p17 surrounds the capsid of the lentiviral vector particle, and is in turn surrounded by the envelope protein. A nucleotide sequence encoding Gag-p17 has historically been included in lentiviral vector genomes for the production of therapeutic lentiviral vectors. The nucleotide sequence encoding Gag-p17 present within lentiviral vector genomes is typically embedded within the packaging region containing highly structured RNA towards the 5' region of the RNA (the 5' UTR). The nucleotide sequence encoding Gag-p17 typically comprises an RNA instability sequence (INS), herein referred to as p17-INS.
Deletion of p17-INS from the backbone of the lentiviral vector genome does not significantly impact vector titres during lentiviral vector production.
The lentiviral vector genome lacking a nucleotide sequence encoding Gag-p17 or p17-INS is of a smaller size compared to a lentiviral vector genome comprising a nucleotide sequence encoding Gag-p17 or p17-INS. Thus, the amount of viral DNA contained within the viral vector backbone delivered in transduced cells is reduced when a lentiviral vector genome lacking a nucleotide sequence encoding Gag-p17 or p17-INS is used. Further, the lentiviral vector genome lacking a nucleotide sequence encoding Gag-p17 or p17-INS may be used to deliver a transgene of larger size than the transgenes which can be delivered using a lentiviral vector genome containing a nucleotide sequence encoding Gag-p17 or p17-INS. Therefore, there are several reasons why it may be desirable to delete the nucleotide sequence encoding Gag-p17 or p17-INS within the vector backbone. Deletion of gag sequences in order to reduce the size of lentiviral vector genome sequences has been reported (Sertkaya, H., et al., Sci Rep 11:12067 (2021)).
In some embodiments, the lentiviral vector genome lacks either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17.
In some embodiments, the lentiviral vector genome lacks a nucleotide sequence encoding p17-INS or a fragment thereof. An example p17-INS is as follows:
[Sequence presented as an image in the original publication: Figure imgf000068_0001]
In one embodiment, the lentiviral vector genome may lack the sequence as set forth in SEQ ID NO: 13.
In one embodiment, the fragment of a nucleotide sequence encoding Gag-p17 is a part of a full-length nucleotide sequence encoding Gag-p17. In one embodiment, the fragment comprises or consists of at least about 10 nucleotides (suitably at least about 20, at least about 30, at least about 40, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350 nucleotides).
In one embodiment, the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is between 1% and 99% of full-length nucleotide sequence encoding Gag-p17. Suitably, the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is at least about 10% (suitably at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) of a full-length nucleotide sequence encoding Gag-p17, such as a native nucleotide sequence encoding Gag-p17. The fragment may be a contiguous region of a full-length nucleotide sequence encoding Gag-p17, such as a native nucleotide sequence encoding Gag-p17.
In one embodiment, the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is between 1% and 99% of full-length nucleotide sequence encoding p17-INS. Suitably, the fragment of a nucleotide sequence encoding Gag-p17 may have a length which is at least about 10% (suitably at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%) of a full-length nucleotide sequence encoding p17-INS, such as a native nucleotide sequence encoding p17-INS (e.g. SEQ ID NO: 13). The fragment may be a contiguous region of a full-length nucleotide sequence encoding p17-INS, such as a native nucleotide sequence encoding p17-INS (e.g. SEQ ID NO: 13).
In one embodiment, the fragment of a nucleotide sequence encoding Gag-p17 comprises or consists of the INS located in the nucleotide sequence encoding Gag-p17.
In one embodiment, the lentiviral vector genome lacking either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17 comprises at least one modified viral cis-acting sequence as described herein.
In one embodiment, the lentiviral vector genome lacking either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17 may comprise a modified RRE, a modified WPRE and/or a modified nucleotide sequence encoding gag as described herein.
In one embodiment, the lentiviral vector genome lacking either (i) a nucleotide sequence encoding Gag-p17 or (ii) a fragment of a nucleotide sequence encoding Gag-p17 may comprise a modified RRE as described herein, a modified WPRE as described herein and a modified nucleotide sequence encoding gag as described herein.
Modified Rev response element (RRE)
In some embodiments, the lentiviral vector genome comprises a modified Rev response element (RRE), wherein at least one internal open reading frame (ORF) in the RRE is disrupted as described herein.
The RRE is an essential viral RNA element that is well conserved across lentiviral vectors and across different wild-type HIV-1 isolates. The RRE present within lentiviral vector genomes may contain multiple internal ORFs. These internal ORFs may be found between an internal ATG sequence of the RRE and the stop codon immediately 3' to the ATG sequence.
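By way of illustration only, the internal ORFs described above can be located computationally. The following is a minimal sketch, assuming that the "stop codon immediately 3' to the ATG sequence" is read as the first in-frame stop codon downstream of the ATG, and that the sequence is supplied as a DNA string. The function name `find_internal_orfs` and the toy sequence are hypothetical and do not correspond to any SEQ ID NO of this application.

```python
# Illustrative sketch only: names and the toy sequence are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_internal_orfs(seq):
    """Return (start, end) spans (0-based, end-exclusive) of internal ORFs.

    Assumption: an internal ORF runs from an ATG to the first in-frame
    stop codon 3' of that ATG, inclusive of the stop codon. ATGs with no
    in-frame stop codon downstream are not reported.
    """
    seq = seq.upper()
    orfs = []
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                orfs.append((start, i + 3))
                break
        start = seq.find("ATG", start + 1)
    return orfs


toy = "CCATGAAATAGGG"          # hypothetical sequence, not an RRE
print(find_internal_orfs(toy))  # one ORF: ATG AAA TAG
```

Mutating the ATG of the toy sequence (for example to ATC) removes the reported ORF, which mirrors the disruption strategy described herein.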
The RRE present within lentiviral vector genomes is typically embedded within the packaging region containing highly structured RNA towards the 5' region of the RNA (the 5' UTR). The 5' UTR structure consists of a series of stem-loop structures connected by small linkers. These stem-loops include the RRE. Thus, the RRE itself has a complex secondary structure, involving complementary base-pairing, to which Rev binds. Modifications in the RRE to disrupt at least one internal ORF, for example by mutating the ATG sequence which denotes the start of the at least one internal ORF, are tolerated. Thus, the modified RREs described herein retain Rev binding capacity.
The modified RRE may comprise less than eight ATG sequences.
Accordingly, in some embodiments, the lentiviral vector genome comprises a modified Rev response element (RRE), wherein the modified RRE comprises less than eight ATG sequences.
Suitably, the modified RRE may comprise less than seven, less than six, less than five, less than four, less than three, less than two or less than one ATG sequence(s). The modified RRE may lack an ATG sequence.
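For illustration, the ATG thresholds recited above can be checked with a simple count. This is a minimal sketch, assuming the element is supplied as a DNA string and that every occurrence of the trinucleotide ATG is counted regardless of reading frame. The function names and the toy sequence are hypothetical.

```python
# Illustrative sketch only: function names and the toy sequence are hypothetical.
def count_atg(seq):
    """Count occurrences of the trinucleotide ATG (case-insensitive)."""
    # ATG cannot overlap with itself, so a plain substring count suffices.
    return seq.upper().count("ATG")


def satisfies_atg_limit(seq, limit=8):
    """Return True if the sequence comprises less than `limit` ATG sequences."""
    return count_atg(seq) < limit


toy = "ccATGttatgA"  # hypothetical sequence containing two ATGs
print(count_atg(toy), satisfies_atg_limit(toy, limit=8))
```

A sequence for which `count_atg` returns zero would correspond to a modified element that lacks an ATG sequence.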
The RRE may be a minimal functional RRE. An example minimal functional RRE is as follows:
Figure imgf000070_0001
The RRE may be the core RRE. An example core RRE is as follows:
Figure imgf000070_0002
By "minimal functional RRE" or "minimal RRE" is meant a truncated RRE sequence which retains the function of the full-length RRE. Thus, the minimal functional RRE retains Rev binding capacity.
The RRE may be a full-length RRE. An example full-length RRE is as follows:
Figure imgf000070_0003
Figure imgf000071_0001
The RRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 14; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 15.
The modified RRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 14; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 15.
The modified RRE may comprise the sequence as set forth in SEQ ID NO: 14 or SEQ ID NO: 15, or a sequence having at least 80% identity (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) thereto, wherein at least one ATG sequence selected from the group (a)-(h) is mutated: a) ATG corresponding to positions 27-29 of SEQ ID NO: 15; b) ATG corresponding to positions 192-194 of SEQ ID NO: 15; c) ATG corresponding to positions 207-209 of SEQ ID NO: 15; d) ATG corresponding to positions 436-438 of SEQ ID NO: 15; e) ATG corresponding to positions 489-491 of SEQ ID NO: 15; f) ATG corresponding to positions 571-573 of SEQ ID NO: 15; g) ATG corresponding to positions 599-601 of SEQ ID NO: 15; and/or h) ATG corresponding to positions 663-665 of SEQ ID NO: 15.
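As a sketch of how ATG sequences at defined positions might be mutated in silico, the following example takes 1-based start positions (in the style of "positions 27-29") and replaces each ATG found there. The replacement triplet ATC is an arbitrary choice made for illustration and is not taken from this application. The function name and the toy sequence are hypothetical; the positions used in the example refer to the toy sequence, not to SEQ ID NO: 15.

```python
# Illustrative sketch only: the replacement triplet and all names are assumptions.
def mutate_atg_sites(seq, starts_1based, replacement="ATC"):
    """Replace the ATG beginning at each 1-based position with `replacement`.

    Raises ValueError if a listed position does not hold an ATG, so that a
    mis-numbered position is not silently mutated.
    """
    if len(replacement) != 3 or replacement.upper() == "ATG":
        raise ValueError("replacement must be a 3-nt sequence other than ATG")
    seq = seq.upper()
    bases = list(seq)
    for start in starts_1based:
        i = start - 1  # convert to a 0-based index
        if seq[i:i + 3] != "ATG":
            raise ValueError(f"no ATG at positions {start}-{start + 2}")
        bases[i:i + 3] = replacement.upper()
    return "".join(bases)


toy = "GGATGCCCATGTT"  # hypothetical sequence with ATGs at positions 3-5 and 9-11
print(mutate_atg_sites(toy, [3]))
print(mutate_atg_sites(toy, [3, 9]))
```

In practice the choice of replacement would need to be assessed against retention of function, e.g. Rev binding capacity for an RRE.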
An example modified RRE sequence is as follows:
Figure imgf000072_0001
A further example modified RRE sequence is as follows:
Figure imgf000072_0002
A further example modified RRE sequence is as follows:
Figure imgf000072_0003
An example of a modified RRE sequence lacking an ATG sequence is as follows:
Figure imgf000073_0001
The modified RRE may comprise the sequence as set forth in SEQ ID NO: 16 or SEQ ID NO: 17 or SEQ ID NO: 18, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto. The sequence may comprise less than eight (suitably less than seven, less than six, less than five, less than four, less than three, less than two or less than one) ATG sequences.
Modified Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE)
In some embodiments, the lentiviral vector genome comprises a modified Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE), wherein at least one internal open reading frame (ORF) in the WPRE is disrupted as described herein.
The WPRE can enhance expression from a number of different vector types including lentiviral vectors (U.S. Patent Nos. 6,136,597; 6,287,814; Zufferey, R., et al. (1999) J. Virol. 73: 2886-92). Without wanting to be bound by theory, this enhancement is thought to be due to improved RNA processing at the post-transcriptional level, resulting in increased levels of nuclear transcripts. A two-fold increase in mRNA stability also contributes to this enhancement (Zufferey, R., et al. ibid). The level of enhancement of protein expression from transcripts containing the WPRE versus those without the WPRE has been reported to be around 2- to 5-fold, and correlates well with the increase in transcript levels. This has been demonstrated with a number of different transgenes (Zufferey, R., et al. ibid). The WPRE contains three cis-acting sequences important for its function in enhancing expression levels. In addition, it contains a fragment of approximately 180 bp comprising the 5'-end of the WHV X protein ORF (full length ORF is 425bp), together with its associated promoter. The full-length X protein has been implicated in tumorigenesis (Flajolet, M. et al, (1998) J. Virol. 72: 6175-6180). Translation from transcripts initiated from the X promoter results in formation of a protein representing the NH2-terminal 60 amino acids of the X protein. This truncated X protein can promote tumorigenesis, particularly if the truncated X protein sequence is integrated into the host cell genome at specific loci (Balsano, C. et al, (1991) Biochem. Biophys. Res. Commun. 176: 985-92; Flajolet, M. et al, (1998) J. Virol. 72: 6175-80; Zheng, Y.W., et al, (1994) J. Biol. Chem. 269: 22593-8; Runkel, L., et al, (1993) Virology 197: 529-36). Therefore, expression of the truncated X protein could promote tumorigenesis if delivered to cells of interest, precluding safe use of wild-type WPRE sequences.
US 2005/0002907 discloses that mutation of a region of the WPRE corresponding to the X protein ORF ablates the tumorigenic activity of the X protein, thereby allowing the WPRE to be used safely in retroviral and lentiviral expression vectors to enhance expression levels of heterologous genes or nucleotides of interest.
As used herein, the "X region" of the WPRE is defined as comprising at least the first 60-amino acids of the X protein ORF, including the translation initiation codon, and its associated promoter. A "functional" X protein is defined herein as a truncated X protein that is capable of promoting tumorigenesis, or a transformed phenotype, when expressed in cells of interest. A "non-functional" X protein in the context of this application is defined as an X protein that is incapable of promoting tumorigenesis in cells of interest.
The modified WPREs described herein retain the capacity to enhance expression from the lentiviral vector.
The modified WPRE may comprise less than seven ATG sequences. The modified WPRE may comprise less than six ATG sequences.
Accordingly, in some embodiments, the lentiviral vector genome comprises a modified Woodchuck hepatitis virus (WHV) post-transcriptional regulatory element (WPRE), wherein the modified WPRE comprises less than seven ATG sequences, preferably less than six ATG sequences.
Suitably, the modified WPRE may comprise less than seven, less than six, less than five, less than four, less than three, less than two or less than one ATG sequence(s). The modified WPRE may lack ATG sequences. In some embodiments, at least one ATG sequence in the X region of the WPRE is mutated, whereby expression of a functional X protein is prevented. In preferred embodiments, the mutation is in the translation initiation codon of the X region. As a result of the mutation of the at least one ATG sequence, the X protein may not be expressed.
In some embodiments, the modified WPRE does not comprise a mutation in an ATG sequence in the X region of the WPRE.
An example WPRE sequence is as follows:
Figure imgf000075_0001
An example WPRE sequence which contains a disrupted X-protein ORF is as follows:
Figure imgf000075_0002
The WPRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 19; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 20.
The modified WPRE may comprise: a) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 19; and/or b) a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity to SEQ ID NO: 20.
The modified WPRE may comprise the sequence as set forth in SEQ ID NO: 19 or SEQ ID NO: 20, or a sequence having at least 80% identity (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) thereto, wherein at least one ATG sequence selected from the group (a)-(g) is mutated: a) ATG corresponding to positions 53-55 of SEQ ID NO: 19; b) ATG corresponding to positions 72-74 of SEQ ID NO: 19; c) ATG corresponding to positions 91-93 of SEQ ID NO: 19; d) ATG corresponding to positions 104-106 of SEQ ID NO: 19; e) ATG corresponding to positions 121-123 of SEQ ID NO: 19; f) ATG corresponding to positions 170-172 of SEQ ID NO: 19; and/or g) ATG corresponding to positions 411-413 of SEQ ID NO: 19.
The WPRE typically contains a retained Pol ORF. An example retained Pol ORF sequence is as follows:
Figure imgf000076_0001
In one embodiment, at least one (suitably at least two or at least three) ATG sequence within the retained Pol ORF sequence in the WPRE is mutated. In one embodiment, all ATG sequences within the retained Pol ORF sequence in the WPRE are mutated.
In one embodiment, the modified WPRE comprises less than three (suitably less than two or less than one) ATG sequences in the retained Pol ORF sequence in the WPRE. In one embodiment, the modified WPRE lacks an ATG sequence in the retained Pol ORF sequence in the WPRE.
An example modified WPRE sequence in which all ATG codons within the retained Pol ORF are mutated is as follows:
Figure imgf000077_0001
An example of a modified WPRE sequence lacking an ATG sequence is as follows:
Figure imgf000077_0002
The modified WPRE may comprise the sequence as set forth in SEQ ID NO: 22 or SEQ ID NO: 23, or a sequence having at least 80% (suitably at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%) identity thereto. The sequence may comprise less than six (suitably less than five, less than four, less than three, less than two or less than one) ATG sequences.
Vector / Expression Cassette
A vector is a tool that allows or facilitates the transfer of an entity from one environment to another. In accordance with the present invention, and by way of example, some vectors used in recombinant nucleic acid techniques allow entities, such as a segment of nucleic acid (e.g. a heterologous DNA segment, such as a heterologous cDNA segment), to be transferred into and expressed by a target cell. The vector may facilitate the integration of the nucleotide sequence encoding a viral vector component to maintain the nucleotide sequence encoding the viral vector component and its expression within the target cell.
The vector may be or may include an expression cassette (also termed an expression construct). Expression cassettes as described herein comprise regions of nucleic acid containing sequences capable of being transcribed. Thus, sequences encoding mRNA, tRNA and rRNA are included within this definition.
The vector may contain one or more selectable marker genes (e.g. a neomycin resistance gene) and/or traceable marker gene(s) (e.g. a gene encoding green fluorescent protein (GFP)). Vectors may be used, for example, to infect and/or transduce a target cell. The vector may further comprise a nucleotide sequence enabling the vector to replicate in the host cell in question, such as a conditionally replicating oncolytic vector.
The term "cassette" - which is synonymous with terms such as "conjugate", "construct" and "hybrid" - includes a polynucleotide sequence directly or indirectly attached to a promoter. Preferably the cassette comprises at least a polynucleotide sequence operably linked to a promoter. For example, expression cassettes for use in the invention may comprise a promoter for the expression of the nucleotide sequence encoding a viral vector component and optionally a regulator of the nucleotide sequence encoding the viral vector component.
The choice of expression cassette, e.g. plasmid, cosmid, virus or phage vector, will often depend on the host cell into which it is to be introduced. The expression cassette can be a DNA plasmid (supercoiled, nicked or linearised), minicircle DNA (linear or supercoiled), plasmid DNA containing just the regions of interest by removal of the plasmid backbone by restriction enzyme digestion and purification, DNA generated using an enzymatic DNA amplification platform e.g. doggybone DNA (dbDNA™) where the final DNA used is in a closed ligated form or where it has been prepared (e.g. restriction enzyme digestion) to have open cut ends.
Lentiviral Vector Production Systems and Cells
In a further aspect, the invention provides a viral vector production system comprising a set of nucleotide sequences, wherein the nucleotide sequences comprise nucleotide sequences encoding vector components including gag-pol, env, optionally rev, and a nucleotide sequence of the invention.
In a further aspect, the invention provides a cell comprising the nucleotide sequence of the invention, the expression cassette of the invention, or the vector production system of the invention.
In a further aspect, the invention provides a cell for producing lentiviral vectors comprising:
(a)
(i) nucleotide sequences encoding vector components including gag-pol and env, and optionally rev, and the nucleotide sequence of the invention or the expression cassette of the invention; or
(ii) the viral vector production system of the invention; and
(b) optionally, a nucleotide sequence encoding a modified U1 snRNA and/or optionally a nucleotide sequence encoding TRAP.
In some embodiments, the splicing activity from the major splice donor site and/or splice donor region of the RNA genome of the lentiviral vector is suppressed or ablated.
In some embodiments, the splicing activity from the major splice donor site and/or splice donor region of the RNA genome of the lentiviral vector is suppressed or ablated during lentiviral vector production.
In a further aspect, the invention provides a method for producing a lentiviral vector, comprising the steps of:
(a) introducing:
(i) nucleotide sequences encoding vector components including gag-pol and env, and optionally rev, and the nucleotide sequence of the invention or the expression cassette of the invention; or
(ii) the viral vector production system of the invention, into a cell; and
(b) optionally selecting for a cell that comprises nucleotide sequences encoding vector components and the RNA genome of the lentiviral vector; and
(c) culturing the cell under conditions suitable for the production of the lentiviral vector.
In a further aspect, the invention provides a lentiviral vector produced by the method of the invention.
In some embodiments, the lentiviral vector comprises the RNA genome of the lentiviral vector as described herein. Suitably, the lentiviral vector genome comprises a modified 3' LTR and/or a modified 5' LTR as described herein.
In a further aspect, the invention provides the use of the nucleotide sequence of the invention, the expression cassette of the invention, the viral vector production system of the invention, or the cell of the invention, for producing a lentiviral vector.
A lentiviral vector production system comprises a set of nucleotide sequences encoding the components required for production of the lentiviral vector. Accordingly, a vector production system comprises a set of nucleotide sequences which encode the viral vector components necessary to generate lentiviral vector particles. "Viral vector production system" or "vector production system" or "production system" is to be understood as a system comprising the necessary components for viral vector production.
In an aspect, the viral vector production system comprises nucleotide sequences encoding Gag and Gag/Pol proteins, and Env protein and the vector genome sequence. The production system may optionally comprise a nucleotide sequence encoding the Rev protein, or functional substitute thereof.
In an aspect, the viral vector production system comprises modular nucleic acid constructs (modular constructs). A modular construct is a DNA expression construct comprising two or more nucleic acids used in the production of lentiviral vectors. A modular construct can be a DNA plasmid comprising two or more nucleic acids used in the production of lentiviral vectors. The plasmid may be a bacterial plasmid. The nucleic acids can encode, for example, gag-pol, rev, env and the vector genome. In addition, modular constructs designed for generation of packaging and producer cell lines may additionally need to encode transcriptional regulatory proteins (e.g. TetR, CymR) and/or translational repression proteins (e.g. TRAP) and selectable markers (e.g. Zeocin™, hygromycin, blasticidin, puromycin, neomycin resistance genes). Suitable modular constructs for use in the present invention are described in EP 3502260, which is hereby incorporated by reference in its entirety. As the modular constructs for use in accordance with the present invention contain nucleic acid sequences encoding two or more of the retroviral components on one construct, the safety profile of these modular constructs has been considered and additional safety features directly engineered into the constructs. These features include the use of insulators for multiple open reading frames of retroviral vector components and/or the specific orientation and arrangement of the retroviral genes in the modular constructs. It is believed that by using these features the direct read-through to generate replication-competent viral particles will be prevented.
The nucleic acid sequences encoding the viral vector components may be in reverse and/or alternating transcriptional orientations in the modular construct. Thus, the nucleic acid sequences encoding the viral vector components are not presented in the same 5' to 3' orientation, such that the viral vector components cannot be produced from the same mRNA molecule. The reverse orientation may mean that at least two coding sequences for different vector components are presented in the 'head-to-head' and 'tail-to-tail' transcriptional orientations. This may be achieved by providing the coding sequence for one vector component, e.g. env, on one strand and the coding sequence for another vector component, e.g. rev, on the opposing strand of the modular construct. Preferably, when coding sequences for more than two vector components are present in the modular construct, at least two of the coding sequences are present in the reverse transcriptional orientation. Accordingly, when coding sequences for more than two vector components are present in the modular construct, each component may be orientated such that it is present in the opposite 5' to 3' orientation to the adjacent coding sequence(s) for other vector components, i.e. alternating 5' to 3' (or transcriptional) orientations for each coding sequence may be employed.
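The reverse and alternating arrangements described above can be expressed as simple checks on an ordered list of coding sequences. The following is a minimal sketch, assuming each coding sequence is annotated with a strand symbol ('+' or '-') in the order in which it appears on the construct. The data layout and function names are hypothetical and are not taken from EP 3502260.

```python
# Illustrative sketch only: the (name, strand) layout and names are hypothetical.
def is_alternating(cassettes):
    """True if every coding sequence is in the opposite orientation to its neighbours."""
    strands = [strand for _, strand in cassettes]
    return all(a != b for a, b in zip(strands, strands[1:]))


def has_reverse_pair(cassettes):
    """True if at least two coding sequences are in opposing orientations."""
    return len({strand for _, strand in cassettes}) > 1


alternating = [("env", "+"), ("rev", "-"), ("gag-pol", "+")]
partly_reversed = [("env", "+"), ("rev", "+"), ("gag-pol", "-")]
same_direction = [("env", "+"), ("rev", "+"), ("gag-pol", "+")]

for layout in (alternating, partly_reversed, same_direction):
    print(is_alternating(layout), has_reverse_pair(layout))
```

The second layout illustrates the case in which at least two coding sequences are in the reverse orientation without every adjacent pair alternating.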
The modular construct for use according to the present invention may comprise nucleic acid sequences encoding two or more of the following vector components: gag-pol, rev, env, vector genome. The modular construct may comprise nucleic acid sequences encoding any combination of the vector components. In an aspect, the modular construct may comprise nucleic acid sequences encoding: i) the RNA genome of the retroviral vector and rev, or a functional substitute thereof; ii) the RNA genome of the retroviral vector and gag-pol; iii) the RNA genome of the retroviral vector and env; iv) gag-pol and rev, or a functional substitute thereof; v) gag-pol and env; vi) env and rev, or a functional substitute thereof; vii) the RNA genome of the retroviral vector, rev, or a functional substitute thereof, and gag-pol; viii) the RNA genome of the retroviral vector, rev, or a functional substitute thereof, and env; ix) the RNA genome of the retroviral vector, gag-pol and env; or x) gag-pol, rev, or a functional substitute thereof, and env, wherein the nucleic acid sequences are in reverse and/or alternating orientations.
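Combinations i) to x) above correspond to every selection of two or three of the four vector components. As an illustrative sketch, they can be enumerated as follows. The component labels are shorthand chosen for this example, and "rev" here stands for rev or a functional substitute thereof.

```python
# Illustrative sketch only: component labels are shorthand for this example.
from itertools import combinations

COMPONENTS = ("vector genome", "rev", "gag-pol", "env")


def modular_combinations():
    """Enumerate all pairs and triples of the four vector components."""
    return [combo for size in (2, 3) for combo in combinations(COMPONENTS, size)]


for number, combo in enumerate(modular_combinations(), start=1):
    print(number, " + ".join(combo))
```

The enumeration yields six pairs and four triples. The order in which they are printed follows `itertools.combinations` and does not match the numbering i) to x) used above.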
In one aspect, a cell for producing lentiviral vectors may comprise nucleic acid sequences encoding any one of the combinations i) to x) above, wherein the nucleic acid sequences are located at the same genetic locus and are in reverse and/or alternating orientations. The same genetic locus may refer to a single extrachromosomal locus in the cell, e.g. a single plasmid, or a single locus (i.e. a single insertion site) in the genome of the cell. The cell may be a stable or transient cell for producing lentiviral vectors. In one aspect the cell does not comprise tat.
The DNA expression construct can be a DNA plasmid (supercoiled, nicked or linearised), minicircle DNA (linear or supercoiled), plasmid DNA containing just the regions of interest by removal of the plasmid backbone by restriction enzyme digestion and purification, DNA generated using an enzymatic DNA amplification platform e.g. doggybone DNA (dbDNA™) where the final DNA used is in a closed ligated form or where it has been prepared (e.g. restriction enzyme digestion) to have open cut ends.
In one aspect, the lentiviral vector is derived from HIV-1, HIV-2, SIV, FIV, BIV, EIAV, CAEV or Visna lentivirus.
A "viral vector production cell", "vector production cell", or "production cell" is to be understood as a cell that is capable of producing a lentiviral vector or lentiviral vector particle. Lentiviral vector production cells may be "producer cells" or "packaging cells". One or more DNA constructs of the viral vector system may be either stably integrated or episomally maintained within the viral vector production cell. Alternatively, all the DNA components of the viral vector system may be transiently transfected into the viral vector production cell. In yet another alternative, a production cell stably expressing some of the components may be transiently transfected with the remaining components required for vector production.
As used herein, the term "packaging cell" refers to a cell which contains the elements necessary for production of lentiviral vector particles but which lacks the vector genome. Optionally, such packaging cells contain one or more expression cassettes which are capable of expressing viral structural proteins (such as gag, gag/pol and env). Producer cells/packaging cells can be of any suitable cell type. Producer cells are generally mammalian cells but can be derived from other organisms, e.g. insect cells.
As used herein, the term "producer cell" or "vector producing/producer cell" refers to a cell which contains all the elements necessary for production of lentiviral vector particles. The producer cell may be either a stable producer cell line or derived transiently or may be a stable packaging cell wherein the lentiviral genome is transiently expressed.
In the methods of the invention, the vector components may include gag, env, rev and/or the RNA genome of the lentiviral vector when the viral vector is a lentiviral vector. The nucleotide sequences encoding vector components may be introduced into the cell either simultaneously or sequentially in any order.
The vector production cells may be cells cultured in vitro such as a tissue culture cell line. In some aspects of the methods and uses of the invention, suitable production cells or cells for producing a lentiviral vector are those cells which are capable of producing viral vectors or viral vector particles when cultured under appropriate conditions. Thus, the cells typically comprise nucleotide sequences encoding vector components, which may include gag, env, rev and the RNA genome of the lentiviral vector. Suitable cell lines include, but are not limited to, mammalian cells such as murine fibroblast derived cell lines or human cell lines. They are generally mammalian, including human cells, for example HEK293T, HEK293, CAP, CAP-T or CHO cells, but can be, for example, insect cells such as SF9 cells. Preferably, the vector production cells are derived from a human cell line. Accordingly, such suitable production cells may be employed in any of the methods or uses of the present invention.
Methods for introducing nucleotide sequences into cells are well known in the art and have been described previously. Thus, the introduction into a cell of nucleotide sequences encoding vector components including gag, env, rev and the RNA genome of the lentiviral vector, using conventional techniques in molecular and cell biology is within the capabilities of a person skilled in the art.
Stable production cells may be packaging or producer cells. To generate producer cells from packaging cells the vector genome DNA construct may be introduced stably or transiently. Packaging/producer cells can be generated by transducing a suitable cell line with a retroviral vector which expresses one of the components of the vector, i.e. a genome, the gag-pol components and an envelope as described in WO 2004/022761. Alternatively, the nucleotide sequence can be transfected into cells and then integration into the production cell genome occurs infrequently and randomly. The transfection methods may be performed using methods well known in the art. For example, a stable transfection process may employ constructs which have been engineered to aid concatemerisation. In another example, the transfection process may be performed using calcium phosphate or commercially available formulations such as Lipofectamine™ 2000CD (Invitrogen, CA), FuGENE® HD or polyethylenimine (PEI). Alternatively, nucleotide sequences may be introduced into the production cell via electroporation. The skilled person will be aware of methods to encourage integration of the nucleotide sequences into production cells. For example, linearising a nucleic acid construct can help if it is naturally circular. Less random integration methodologies may involve the nucleic acid construct comprising areas of shared homology with the endogenous chromosomes of the mammalian host cell to guide integration to a selected site within the endogenous genome. Furthermore, if recombination sites are present on the construct then these can be used for targeted recombination. For example, the nucleic acid construct may contain a loxP site which allows for targeted integration when combined with Cre recombinase (i.e. using the Cre/lox system derived from P1 bacteriophage). Alternatively or additionally, the recombination site is an att site (e.g. from λ phage), wherein the att site permits site-directed integration in the presence of a lambda integrase. This would allow the lentiviral genes to be targeted to a locus within the host cellular genome which allows for high and/or stable expression.
Other methods of targeted integration are well known in the art. For example, methods of inducing targeted cleavage of genomic DNA can be used to encourage targeted recombination at a selected chromosomal locus. These methods often involve the use of methods or systems to induce a double strand break (DSB) e.g. a nick in the endogenous genome to induce repair of the break by physiological mechanisms such as non-homologous end joining (NHEJ). Cleavage can occur through the use of specific nucleases such as engineered zinc finger nucleases (ZFN), transcription-activator like effector nucleases (TALENs), using CRISPR/Cas9 systems with an engineered crRNA/tracr RNA ('single guide RNA') to guide specific cleavage, and/or using nucleases based on the Argonaute system (e.g., from T. thermophilus).
Packaging/producer cell lines can be generated by integration of nucleotide sequences using methods of just lentiviral transduction or just nucleic acid transfection, or a combination of both can be used. Methods for generating retroviral vectors from production cells and in particular the processing of retroviral vectors are described in WO 2009/153563.
In one aspect, the production cell may comprise the RNA-binding protein (e.g. tryptophan RNA-binding attenuation protein, TRAP) and/or the Tet Repressor (TetR) protein or alternative regulatory proteins (e.g. CymR).
Production of lentiviral vector from production cells can be via transfection methods, via production from stable cell lines, which can include induction steps (e.g. doxycycline induction), or via a combination of both. The transfection methods may be performed using methods well known in the art, and examples have been described previously.
Production cells, either packaging or producer cell lines or those transiently transfected with the lentiviral vector encoding components, are cultured to increase cell and virus numbers and/or virus titres. Culturing a cell is performed to enable it to metabolize, and/or grow and/or divide and/or produce viral vectors of interest according to the invention. This can be accomplished by methods well known to persons skilled in the art, and includes but is not limited to providing nutrients for the cell, for instance in the appropriate culture media. The methods may comprise growth adhering to surfaces, growth in suspension, or combinations thereof. Culturing can be done for instance in tissue culture flasks, tissue culture multiwell plates, dishes, roller bottles, wave bags or in bioreactors, using batch, fed-batch, continuous systems and the like. In order to achieve large scale production of viral vector through cell culture it is preferred in the art to have cells capable of growing in suspension. Suitable conditions for culturing cells are known (see e.g. Tissue Culture, Academic Press, Kruse and Paterson, editors (1973), and R.I. Freshney, Culture of animal cells: A manual of basic technique, fourth edition (Wiley-Liss Inc., 2000, ISBN 0-471-34889-9)).
Preferably cells are initially 'bulked up' in tissue culture flasks or bioreactors and subsequently grown in multi-layered culture vessels or large bioreactors (greater than 50L) to generate the vector producing cells for use in the present invention.
Preferably cells are grown in a suspension mode to generate the vector producing cells for use in the present invention.
Lentiviral Vectors
Lentiviruses are part of a larger group of retroviruses. A detailed list of lentiviruses may be found in Coffin et al (1997) "Retroviruses" Cold Spring Harbour Laboratory Press Eds: JM Coffin, SM Hughes, HE Varmus pp 758-763. In brief, lentiviruses can be divided into primate and non-primate groups. Examples of primate lentiviruses include but are not limited to: the human immunodeficiency virus (HIV), the causative agent of human acquired immunodeficiency syndrome (AIDS), and the simian immunodeficiency virus (SIV). The non-primate lentiviral group includes the prototype "slow virus" visna/maedi virus (VMV), as well as the related caprine arthritis-encephalitis virus (CAEV), equine infectious anaemia virus (EIAV), feline immunodeficiency virus (FIV), Maedi visna virus (MVV) and bovine immunodeficiency virus (BIV). In one aspect, the lentiviral vector is derived from HIV-1, HIV-2, SIV, FIV, BIV, EIAV, CAEV or Visna lentivirus.
The lentivirus family differs from other retroviruses in that lentiviruses have the capability to infect both dividing and non-dividing cells (Lewis et al (1992) EMBO J 11(8): 3053-3058 and Lewis and Emerman (1994) J Virol 68(1): 510-516). In contrast, other retroviruses, such as MLV, are unable to infect non-dividing or slowly dividing cells such as those that make up, for example, muscle, brain, lung and liver tissue.
A lentiviral vector, as used herein, is a vector which comprises at least one component part derivable from a lentivirus. Preferably, that component part is involved in the biological mechanisms by which the vector infects or transduces target cells and expresses a nucleotide of interest (NOI), or nucleotides of interest.
The lentiviral vector may be used to replicate the NOI in a compatible target cell in vitro. Thus, described herein is a method of making proteins in vitro by introducing a vector of the invention into a compatible target cell in vitro and growing the target cell under conditions which result in expression of the NOI. Protein and NOI may be recovered from the target cell by methods well known in the art. Suitable target cells include mammalian cell lines and other eukaryotic cell lines.
In some aspects the vectors may have "insulators" - genetic sequences that block the interaction between promoters and enhancers, and act as a barrier reducing read-through from an adjacent gene.
In one aspect the insulator is present between one or more of the lentiviral nucleic acid sequences to prevent promoter interference and read-through from adjacent genes. If the insulators are present in the vector between one or more of the lentiviral nucleic acid sequences, then each of these insulated genes may be arranged as individual expression units. The basic structure of retroviral and lentiviral genomes share many common features such as a 5' LTR and a 3' LTR, between or within which are located a packaging signal to enable the genome to be packaged, a primer binding site, integration sites to enable integration into a target cell genome and gag/pol and env genes encoding the packaging components - these are polypeptides required for the assembly of viral particles. Lentiviruses have additional features, such as the rev gene and RRE sequences in HIV, which enable the efficient export of RNA transcripts of the integrated provirus from the nucleus to the cytoplasm of an infected target cell.
In the provirus, these genes are flanked at both ends by regions called long terminal repeats (LTRs). The LTRs are responsible for proviral integration, and transcription. LTRs also serve as enhancer-promoter sequences and can control the expression of the viral genes.
The LTRs themselves are identical sequences that can be divided into three elements, which are called U3, R and U5. U3 is derived from the sequence unique to the 3' end of the RNA. R is derived from a sequence repeated at both ends of the RNA and U5 is derived from the sequence unique to the 5' end of the RNA. The sizes of the three elements can vary considerably among different retroviruses.
In a typical retroviral vector as described herein, at least part of one or more protein coding regions essential for replication may be removed from the virus; for example, gag/pol and env may be absent or not functional. This makes the viral vector replication-defective.
The lentiviral vector may be derived from either a primate lentivirus (e.g. HIV-1) or a non-primate lentivirus (e.g. EIAV).
In general terms, a typical retroviral vector production system involves the separation of the viral genome from the essential viral packaging functions. These viral vector components are normally provided to the production cells on separate DNA expression cassettes (alternatively known as plasmids, expression plasmids, DNA constructs or expression constructs).
The vector genome comprises the NOI. Vector genomes typically require a packaging signal (ψ), the internal expression cassette harbouring the NOI, (optionally) a post-transcriptional element (PRE), typically a central polypurine tract (cppt), the 3'-ppu and a self-inactivating (SIN) LTR. The R-U5 regions are required for correct polyadenylation of both the vector genome RNA and NOI mRNA, as well as the process of reverse transcription. The vector genome may optionally include an open reading frame, as described in WO 2003/064665, which allows for vector production in the absence of rev. The packaging functions include the gag/pol and env genes. These are required for the production of vector particles by the production cell. Providing these functions in trans to the genome facilitates the production of replication-defective viral vectors.
Production systems for gamma-retroviral vectors are typically 3-component systems requiring genome, gag/pol and env expression constructs. Production systems for HIV-1-based lentiviral vectors may additionally require the accessory gene rev to be provided and for the vector genome to include the rev-responsive element (RRE). EIAV-based lentiviral vectors do not require rev to be provided in trans if an open-reading frame (ORF) is present within the genome (see WO 2003/064665).
Usually both the "external" promoter (which drives the vector genome cassette) and "internal" promoter (which drives the NOI cassette) encoded within the vector genome cassette are strong eukaryotic or virus promoters, as are those driving the other vector system components. Examples of such promoters include CMV, EF1a, PGK, CAG, TK, SV40 and Ubiquitin promoters. Strong 'synthetic' promoters, such as those generated by DNA libraries (e.g. JeT promoter) may also be used to drive transcription. Alternatively, tissue-specific promoters such as rhodopsin (Rho), rhodopsin kinase (RhoK), cone-rod homeobox containing gene (CRX), neural retina-specific leucine zipper protein (NRL), Vitelliform Macular Dystrophy 2 (VMD2), Tyrosine hydroxylase, neuron-specific enolase (NSE) promoter, astrocyte-specific glial fibrillary acidic protein (GFAP) promoter, human α1-antitrypsin (hAAT) promoter, phosphoenolpyruvate carboxykinase (PEPCK), liver fatty acid binding protein promoter, Flt-1 promoter, IFN-β promoter, Mb promoter, SP-B promoter, SYN1 promoter, WASP promoter, SV40/hAlb promoter, SV40/CD43, SV40/CD45, NSE/RU5' promoter, ICAM-2 promoter, GPIIb promoter, GFAP promoter, Fibronectin promoter, Endoglin promoter, Elastase-1 promoter, Desmin promoter, CD68 promoter, CD14 promoter and B29 promoter may be used to drive transcription.
Production of retroviral vectors involves either the transient co-transfection of the production cells with these DNA components or use of stable production cell lines wherein all the components are stably integrated within the production cell genome (e.g. Stewart HJ, Fong-Wong L, Strickland I, Chipchase D, Kelleher M, Stevenson L, Thoree V, McCarthy J, Ralph GS, Mitrophanous KA and Radcliffe PA. (2011). Hum Gene Ther. Mar; 22 (3):357-69). An alternative approach is to use a stable packaging cell (into which the packaging components are stably integrated) and then transiently transfect in the vector genome plasmid as required (e.g. Stewart, H. J., M. A. Leroux-Carlucci, C. J. Sion, K. A. Mitrophanous and P. A. Radcliffe (2009). Gene Ther. Jun; 16 (6):805-14). It is also feasible that alternative, not complete, packaging cell lines could be generated (just one or two packaging components are stably integrated into the cell lines) and to generate vector the missing components are transiently transfected. The production cell may also express regulatory proteins such as a member of the tet repressor (TetR) protein group of transcription regulators (e.g. T-REx, Tet-On, and Tet-Off), a member of the cumate inducible switch system group of transcription regulators (e.g. cumate repressor (CymR) protein), or an RNA-binding protein (e.g. TRAP - tryptophan-activated RNA-binding protein).
In one aspect of the present invention, the viral vector is derived from EIAV. EIAV has the simplest genomic structure of the lentiviruses and is particularly preferred for use in the present invention. In addition to the gag/pol and env genes, EIAV encodes three other genes: tat, rev, and S2. Tat acts as a transcriptional activator of the viral LTR (Derse and Newbold (1993) Virology 194(2): 530-536 and Maury et al (1994) Virology 200(2):632-642) and rev regulates and coordinates the expression of viral genes through rev-response elements (RRE) (Martarano et al. (1994) J Virol 68(5):3102-3111). The mechanisms of action of these two proteins are thought to be broadly similar to the analogous mechanisms in the primate viruses (Martarano et al. (1994) J Virol 68(5):3102-3111). The function of S2 is unknown. In addition, an EIAV protein, Ttm, has been identified that is encoded by the first exon of tat spliced to the env coding sequence at the start of the transmembrane protein. In an alternative aspect of the present invention the viral vector is derived from HIV. HIV differs from EIAV in that it does not encode S2 but, unlike EIAV, it encodes vif, vpr, vpu and nef.
The term "recombinant retroviral or lentiviral vector" (RRV) refers to a vector with sufficient retroviral genetic information to allow packaging of an RNA genome, in the presence of packaging components, into a viral particle capable of transducing a target cell. Transduction of the target cell may include reverse transcription and integration into the target cell genome. The RRV carries non-viral coding sequences which are to be delivered by the vector to the target cell. An RRV is incapable of independent replication to produce infectious retroviral particles within the target cell. Usually the RRV lacks a functional gag/pol and/or env gene, and/or other genes essential for replication.
Preferably the RRV vector of the present invention has a minimal viral genome.
As used herein, the term "minimal viral genome" means that the viral vector has been manipulated so as to remove the non-essential elements whilst retaining the elements essential to provide the required functionality to infect, transduce and deliver a NOI to a target cell. Further details of this strategy can be found in WO 1998/17815 and WO 99/32646. A minimal EIAV vector lacks tat, rev and S2 genes and neither are these genes provided in trans in the production system. A minimal HIV vector lacks vif, vpr, vpu, tat and nef.
The expression plasmid used to produce the vector genome within a production cell may include transcriptional regulatory control sequences operably linked to the retroviral genome to direct transcription of the genome in a production cell/packaging cell. All 3rd generation lentiviral vectors are deleted in the 5' U3 enhancer-promoter region, and transcription of the vector genome RNA is driven by a heterologous promoter such as another viral promoter, for example the CMV promoter, as discussed below. This feature enables vector production independently of tat. Some lentiviral vector genomes require additional sequences for efficient virus production. For example, particularly in the case of HIV, RRE sequences may be included. However, the requirement for RRE on the (separate) GagPol cassette (and dependence on rev which is provided in trans) may be reduced or eliminated by codon optimisation of the GagPol ORF. Further details of this strategy can be found in WO 2001/79518.
Alternative sequences which perform the same function as the rev/RRE system are also known. For example, a functional analogue of the rev/RRE system is found in the Mason-Pfizer monkey virus. This is known as the constitutive transport element (CTE) and comprises an RRE-type sequence in the genome which is believed to interact with a factor in the infected cell. The cellular factor can be thought of as a rev analogue. Thus, CTE may be used as an alternative to the rev/RRE system. Any other functional equivalents of the Rev protein which are known or become available may be relevant to the invention. For example, it is also known that the Rex protein of HTLV-I can functionally replace the Rev protein of HIV-1. Rev and RRE may be absent or non-functional in the vector for use in the methods of the present invention; in the alternative rev and RRE, or functionally equivalent system, may be present.
It is therefore understood that 'rev' may refer to a sequence encoding the HIV-1 Rev protein or a sequence encoding any functional equivalent thereof. Thus, in an aspect, the invention provides a viral vector production system and/or a cell comprising a set of nucleotide sequences, wherein the nucleotide sequences encode vector components including gag-pol, env, optionally rev, and the nucleotide sequences of any of the preceding claims.
As used herein, the term "functional substitute" means a protein or sequence having an alternative sequence which performs the same function as another protein or sequence. The term "functional substitute" is used interchangeably with "functional equivalent" and "functional analogue" herein with the same meaning.
SIN Vectors
The lentiviral vectors as described herein may be used in a self-inactivating (SIN) configuration in which the viral enhancer and promoter sequences have been deleted. SIN vectors can be generated and transduce non-dividing target cells in vivo, ex vivo or in vitro with an efficacy similar to that of non-SIN vectors. The transcriptional inactivation of the long terminal repeat (LTR) in the SIN provirus should prevent mobilisation of vRNA, and is a feature that further diminishes the likelihood of formation of replication-competent virus. This should also enable the regulated expression of genes from internal promoters by eliminating any cis-acting effects of the LTR.
By way of example, self-inactivating retroviral vector systems have been constructed by deleting the transcriptional enhancers or the enhancers and promoter in the U3 region of the 3' LTR. After a round of vector reverse transcription and integration, these changes are copied into both the 5' and the 3' LTRs producing a transcriptionally inactive provirus. However, any promoter(s) internal to the LTRs in such vectors will still be transcriptionally active. This strategy has been employed to eliminate effects of the enhancers and promoters in the viral LTRs on transcription from internally placed genes. Such effects include increased transcription or suppression of transcription. This strategy can also be used to eliminate downstream transcription from the 3' LTR into genomic DNA. This is of particular concern in human gene therapy where it is important to prevent the adventitious activation of any endogenous oncogene. Yu et al., (1986) PNAS 83: 3194-98; Marty et al., (1990) Biochimie 72: 885-7; Naviaux et al., (1996) J. Virol. 70: 5701-5; Iwakuma et al., (1999) Virol. 261: 120-32; Deglon et al., (2000) Human Gene Therapy 11: 179-90. SIN lentiviral vectors are described in US 6,924,123 and US 7,056,699.
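The copying of the 3' U3 deletion into both proviral LTRs can be illustrated with a minimal sketch. This is a toy model only: the class and function names (LTR, Provirus, integrate) are hypothetical, and treating an LTR as transcriptionally active only when both its U3 enhancer and U3 promoter are intact is a simplifying assumption rather than a statement about any particular vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LTR:
    """Toy LTR: only the state of the U3 enhancer/promoter is tracked."""
    u3_enhancer: bool
    u3_promoter: bool

    def is_active(self) -> bool:
        # Simplifying assumption: active only if both elements are intact.
        return self.u3_enhancer and self.u3_promoter


@dataclass(frozen=True)
class Provirus:
    five_prime_ltr: LTR
    three_prime_ltr: LTR
    internal_promoter_active: bool


def integrate(vector_3prime_ltr: LTR, has_internal_promoter: bool = True) -> Provirus:
    """Model one round of reverse transcription and integration.

    The U3 of the 3' LTR in the vector is the template for U3 in BOTH
    proviral LTRs, so a deletion made there appears at both ends.
    Internal promoters are unaffected.
    """
    return Provirus(
        five_prime_ltr=vector_3prime_ltr,
        three_prime_ltr=vector_3prime_ltr,
        internal_promoter_active=has_internal_promoter,
    )


# Hypothetical example vectors.
wild_type_3ltr = LTR(u3_enhancer=True, u3_promoter=True)
sin_3ltr = LTR(u3_enhancer=False, u3_promoter=False)

non_sin_provirus = integrate(wild_type_3ltr)
sin_provirus = integrate(sin_3ltr)
```

In this sketch the SIN provirus has two inactive LTRs while its internal promoter remains active, which mirrors the behaviour described in the paragraph above.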
Replication-Defective Lentiviral Vectors
In the genome of a replication-defective lentiviral vector the sequences of gag/pol and/or env may be mutated and/or not functional.
In a typical lentiviral vector as described herein, at least part of one or more coding regions for proteins essential for virus replication may be removed from the vector. This makes the viral vector replication-defective. Portions of the viral genome may also be replaced by a NOI in order to generate a vector comprising an NOI which is capable of transducing a non-dividing target cell and/or integrating its genome into the target cell genome.
In one aspect the lentiviral vectors are non-integrating vectors as described in WO 2006/010834 and WO 2007/071994. In a further aspect the vectors have the ability to deliver a sequence which is devoid of or lacking viral RNA. In a further aspect a heterologous binding domain (heterologous to gag) located on the RNA to be delivered and a cognate binding domain on Gag or GagPol can be used to ensure packaging of the RNA to be delivered. Both of these vectors are described in WO 2007/072056.
NOI and Polynucleotides
Polynucleotides of the invention may comprise DNA or RNA. They may be single-stranded or double-stranded. A nucleotide, or nucleotides, of interest is/are commonly referred to as NOI. It will be understood by a skilled person that numerous different polynucleotides can encode the same polypeptide as a result of the degeneracy of the genetic code. In addition, it is to be understood that skilled persons may, using routine techniques, make nucleotide substitutions that do not affect the polypeptide sequence encoded by the polynucleotides of the invention to reflect the codon usage of any particular host organism in which the polypeptides of the invention are to be expressed.
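The degeneracy of the genetic code described above can be illustrated with a short sketch that swaps codons for synonymous ones and confirms that the encoded polypeptide is unchanged. The standard genetic code table is used; the function names, the example sequence and the PREFERRED codon choices are hypothetical and do not represent the codon usage of any real host organism.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(dna: str) -> list:
    """Split a coding sequence into complete codons (any trailing partial codon is dropped)."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna: str) -> str:
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def recode(dna: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    out = []
    for codon in codons(dna):
        aa = CODON_TABLE[codon]
        new_codon = preferred.get(aa, codon)
        if CODON_TABLE[new_codon] != aa:
            raise ValueError(f"{new_codon} does not encode {aa}")
        out.append(new_codon)
    return "".join(out)


# Hypothetical preference table and example sequence (illustration only).
PREFERRED = {"L": "CTG", "K": "AAG", "S": "AGC"}
original = "ATGAAATTATCTTAA"            # encodes M K L S stop
recoded = recode(original, PREFERRED)   # different nucleotides, same polypeptide
```

Here the recoded sequence differs from the original at the nucleotide level, yet both translate to the same polypeptide.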
The polynucleotides may be modified by any method available in the art. Such modifications may be carried out in order to enhance the in vivo activity or lifespan of the polynucleotides of the invention.
Polynucleotides such as DNA polynucleotides may be produced recombinantly, synthetically or by any means available to those of skill in the art. They may also be cloned by standard techniques.
Longer polynucleotides will generally be produced using recombinant means, for example using polymerase chain reaction (PCR) cloning techniques. This will involve making a pair of primers (e.g. of about 15 to 30 nucleotides) flanking the target sequence which it is desired to clone, bringing the primers into contact with mRNA or cDNA obtained from an animal or human cell, performing PCR under conditions which bring about amplification of the desired region, isolating the amplified fragment (e.g. by purifying the reaction mixture with an agarose gel) and recovering the amplified DNA. The primers may be designed to contain suitable restriction enzyme recognition sites so that the amplified DNA can be cloned into a suitable vector.
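The primer design step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the template sequence are hypothetical, primers are simply taken from the two ends of the target region, and no melting temperature or specificity checks are made. The recognition sequences shown for EcoRI (GAATTC) and BamHI (GGATCC) are the commonly known ones.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(template: str, start: int, end: int, length: int = 20,
                   fwd_site: str = "", rev_site: str = "") -> tuple:
    """Return (forward, reverse) primers for template[start:end].

    The forward primer matches the first `length` bases of the target; the
    reverse primer is the reverse complement of the last `length` bases.
    Optional restriction sites are prepended to the 5' end of each primer.
    """
    if not 15 <= length <= 30:
        raise ValueError("primer length should be about 15 to 30 nucleotides")
    target = template[start:end].upper()
    if len(target) < 2 * length:
        raise ValueError("target region is too short for two primers")
    forward = fwd_site + target[:length]
    reverse = rev_site + reverse_complement(target[-length:])
    return forward, reverse


ECORI = "GAATTC"
BAMHI = "GGATCC"

# Hypothetical template: 6 nt of flank, a 60 nt target, 6 nt of flank.
template = ("TTGACC"
            "ATGGCTAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGTAA"
            "GGCATT")
fwd, rev = design_primers(template, 6, len(template) - 6, length=18,
                          fwd_site=ECORI, rev_site=BAMHI)
```

In practice a few additional bases are often added 5' of each restriction site to help the enzyme cut near the end of the amplified fragment; that refinement is omitted here.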
Common Retroviral Vector Elements
Promoters and Enhancers
Expression of a NOI and polynucleotide may be controlled using control sequences, for example transcription regulation elements or translation repression elements, which include promoters, enhancers and other expression regulation signals (e.g. the tet repressor (TetR) system or the Transgene Repression In vector Production cell system (TRiP)) or other regulators of NOIs described herein.
Prokaryotic promoters and promoters functional in eukaryotic cells may be used. Tissue-specific or stimuli-specific promoters may be used. Chimeric promoters may also be used comprising sequence elements from two or more different promoters.
Suitable promoting sequences are strong promoters including those derived from the genomes of viruses, such as polyoma virus, adenovirus, fowlpox virus, bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), retrovirus and Simian Virus 40 (SV40), or from heterologous mammalian promoters, such as the actin promoter, EF1a, CAG, TK, SV40, ubiquitin, PGK or ribosomal protein promoter. Alternatively, tissue-specific promoters such as rhodopsin (Rho), rhodopsin kinase (RhoK), cone-rod homeobox containing gene (CRX), neural retina-specific leucine zipper protein (NRL), Vitelliform Macular Dystrophy 2 (VMD2), Tyrosine hydroxylase, neuron-specific enolase (NSE) promoter, astrocyte-specific glial fibrillary acidic protein (GFAP) promoter, human α1-antitrypsin (hAAT) promoter, phosphoenolpyruvate carboxykinase (PEPCK), liver fatty acid binding protein promoter, Flt-1 promoter, IFN-β promoter, Mb promoter, SP-B promoter, SYN1 promoter, WASP promoter, SV40/hAlb promoter, SV40/CD43, SV40/CD45, NSE/RU5' promoter, ICAM-2 promoter, GPIIb promoter, GFAP promoter, Fibronectin promoter, Endoglin promoter, Elastase-1 promoter, Desmin promoter, CD68 promoter, CD14 promoter and B29 promoter may be used to drive transcription.
Transcription of a NOI may be increased further by inserting an enhancer sequence into the vector. Enhancers are relatively orientation- and position-independent; however, one may employ an enhancer from a eukaryotic cell virus, such as the SV40 enhancer and the CMV early promoter enhancer. The enhancer may be spliced into the vector at a position 5' or 3' to the promoter, but is preferably located at a site 5' from the promoter.
The promoter can additionally include features to ensure or to increase expression in a suitable target cell. For example, the features can be conserved regions e.g. a Pribnow Box or a TATA box. The promoter may contain other sequences to affect (such as to maintain, enhance or decrease) the levels of expression of a nucleotide sequence. Suitable other sequences include the Sh1-intron or an ADH intron. Other sequences include inducible elements, such as temperature, chemical, light or stress inducible elements. Also, suitable elements to enhance transcription or translation may be present.
Regulators of NOIs
A complicating factor in the generation of retroviral packaging/producer cell lines and retroviral vector production is that constitutive expression of certain retroviral vector components and NOIs is cytotoxic, leading to death of cells expressing these components and therefore inability to produce vector. Therefore, the expression of these components (e.g. gag-pol and envelope proteins such as VSV-G) can be regulated. The expression of other non-cytotoxic vector components, e.g. rev, can also be regulated to minimise the metabolic burden on the cell. The modular constructs and/or cells as described herein may comprise cytotoxic and/or non-cytotoxic vector components associated with at least one regulatory element. As used herein, the term "regulatory element" refers to any element capable of affecting, either increasing or decreasing, the expression of an associated gene or protein. A regulatory element includes a gene switch system, transcription regulation element and translation repression element.
A number of prokaryotic regulator systems have been adapted to generate gene switches in mammalian cells. Many retroviral packaging and producer cell lines have been controlled using gene switch systems (e.g. tetracycline and cumate inducible switch systems) thus enabling expression of one or more of the retroviral vector components to be switched on at the time of vector production. Gene switch systems include those of the tet repressor (TetR) protein group of transcription regulators (e.g. T-REx, Tet-On, and Tet-Off), those of the cumate inducible switch system group of transcription regulators (e.g. CymR protein) and those involving an RNA-binding protein (e.g. TRAP).
One such tetracycline-inducible system is the tetracycline repressor (TetR) system based on the T-REx™ system. By way of example, in such a system, if tetracycline operators (TetO2) are placed in a position such that the first nucleotide is 10bp from the 3' end of the last nucleotide of the TATATAA element of the human cytomegalovirus major immediate early promoter (hCMVp), then TetR alone is capable of acting as a repressor (Yao F, Svensjo T, Winkler T, Lu M, Eriksson C, Eriksson E. Tetracycline repressor, tetR, rather than the tetR-mammalian cell transcription factor fusion derivatives, regulates inducible gene expression in mammalian cells. 1998. Hum Gene Ther, 9: 1939-1950). In such a system the expression of the NOI can be controlled by a CMV promoter into which two copies of the TetO2 sequence have been inserted in tandem. TetR homodimers, in the absence of an inducing agent (tetracycline or its analogue doxycycline [dox]), bind to the TetO2 sequences and physically block transcription from the upstream CMV promoter. When present, the inducing agent binds to the TetR homodimers, causing allosteric changes such that they can no longer bind to the TetO2 sequences, resulting in gene expression. The TetR gene may be codon optimised as this may improve translation efficiency resulting in tighter control of TetO2 controlled gene expression.
The TRiP system is described in WO 2015/092440 and provides another way of repressing expression of the NOI in the production cells during vector production. The TRAP-binding sequence (e.g. TRAP-tbs) interaction forms the basis for a transgene protein repression system for the production of retroviral vectors, when a constitutive and/or strong promoter, including a tissue-specific promoter, driving the transgene is desirable and particularly when expression of the transgene protein in production cells leads to reduction in vector titres and/or elicits an immune response in vivo due to viral vector delivery of transgene-derived protein (Maunder et al, Nat Commun. (2017) Mar 27; 8).
Briefly, the TRAP-tbs interaction forms a translational block, repressing translation of the transgene protein (Maunder et al, Nat Commun. (2017) Mar 27; 8). The translational block is only effective in production cells and as such does not impede the DNA- or RNA-based vector systems. The TRiP system is able to repress translation when the transgene protein is expressed from a constitutive and/or strong promoter, including a tissue-specific promoter from single- or multicistronic mRNA. It has been demonstrated that unregulated expression of transgene protein can reduce vector titres and affect vector product quality. Repression of transgene protein for both transient and stable PaCL/PCL vector production systems is beneficial for production cells to prevent a reduction in vector titres: where toxicity or molecular burden issues may lead to cellular stress; where transgene protein elicits an immune response in vivo due to viral vector delivery of transgene-derived protein; where the use of gene-editing transgenes may result in on/off target effects; where the transgene protein may affect vector and/or envelope glycoprotein exclusion.
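The two levels of control described above (a TetR/TetO2 block on transcription that is relieved by an inducer, and a TRAP-tbs block on translation that operates only where TRAP is present) can be summarised as a simple boolean model. This is an illustrative sketch only: the class, field and function names are hypothetical, and real systems show graded rather than all-or-nothing repression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Regulatory proteins and inducer present in a cell (toy model)."""
    tetr_present: bool
    inducer_present: bool   # tetracycline or doxycycline
    trap_present: bool


@dataclass(frozen=True)
class Cassette:
    """Regulatory sequences carried by an expression cassette (toy model)."""
    has_teto2: bool   # TetO2 operators placed downstream of the promoter
    has_tbs: bool     # TRAP-binding sequence in the mRNA


def is_transcribed(cassette: Cassette, cell: Cell) -> bool:
    # TetR binds TetO2 and blocks transcription unless the inducer is present.
    blocked = cassette.has_teto2 and cell.tetr_present and not cell.inducer_present
    return not blocked


def is_translated(cassette: Cassette, cell: Cell) -> bool:
    # TRAP bound to the tbs blocks translation of an otherwise transcribed mRNA.
    if not is_transcribed(cassette, cell):
        return False
    blocked = cassette.has_tbs and cell.trap_present
    return not blocked


# Hypothetical cassettes and cell states.
packaging_cassette = Cassette(has_teto2=True, has_tbs=False)
transgene_cassette = Cassette(has_teto2=False, has_tbs=True)

uninduced_producer = Cell(tetr_present=True, inducer_present=False, trap_present=True)
induced_producer = Cell(tetr_present=True, inducer_present=True, trap_present=True)
target_cell = Cell(tetr_present=False, inducer_present=False, trap_present=False)
```

In this model the packaging component is switched on only after induction, while the transgene mRNA is made in the production cell but is translated only in the target cell, where TRAP is absent.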
Envelope and Pseudotyping
In one preferred aspect, the lentiviral vector as described herein has been pseudotyped. In this regard, pseudotyping can confer one or more advantages. For example, the env gene product of the HIV based vectors would restrict these vectors to infecting only cells that express a protein called CD4. But if the env gene in these vectors has been substituted with env sequences from other enveloped viruses, then they may have a broader infectious spectrum (Verma and Somia (1997) Nature 389(6648):239-242). By way of example, workers have pseudotyped an HIV based vector with the glycoprotein from VSV (Verma and Somia (1997) Nature 389(6648):239-242). Accordingly, alternative sequences which perform the equivalent function as the env gene product of HIV based vectors are also known.
In another alternative, the Env protein may be a modified Env protein such as a mutant or engineered Env protein. Modifications may be made or selected to introduce targeting ability or to reduce toxicity or for another purpose (Valsesia-Wittman et al 1996 J Virol 70: 2056-64; Nilson et al (1996) Gene Ther 3(4):280-286; and Fielding et al (1998) Blood 91(5):1802-1809 and references cited therein).
The vector may be pseudotyped with any molecule of choice.
As used herein, "env" shall mean an endogenous lentiviral envelope or a heterologous envelope, as described herein. Suitably, env may be Env of HIV based vectors or a functional substitute thereof.
VSV-G
The envelope glycoprotein (G) of Vesicular stomatitis virus (VSV), a rhabdovirus, is an envelope protein that has been shown to be capable of pseudotyping certain enveloped viruses and viral vector virions.
Its ability to pseudotype MoMLV-based retroviral vectors in the absence of any retroviral envelope proteins was first shown by Emi et al. (1991) Journal of Virology 65:1202-1207. WO 1994/294440 teaches that retroviral vectors may be successfully pseudotyped with VSV-G. These pseudotyped VSV-G vectors may be used to transduce a wide range of mammalian cells. More recently, Abe et al. (1998) J Virol 72(8) 6356-6361 teach that non-infectious retroviral particles can be made infectious by the addition of VSV-G.
Burns et al. (1993) Proc. Natl. Acad. Sci. USA 90:8033-7 successfully pseudotyped the retrovirus MLV with VSV-G and this resulted in a vector having an altered host range compared to MLV in its native form. VSV-G pseudotyped vectors have been shown to infect not only mammalian cells, but also cell lines derived from fish, reptiles and insects (Burns et al. (1993) ibid). They have also been shown to be more efficient than traditional amphotropic envelopes for a variety of cell lines (Yee et al., (1994) Proc. Natl. Acad. Sci. USA 91:9564-9568, Emi et al. (1991) Journal of Virology 65:1202-1207). VSV-G protein can be used to pseudotype certain retroviruses because its cytoplasmic tail is capable of interacting with the retroviral cores.
The provision of a non-retroviral pseudotyping envelope such as VSV-G protein gives the advantage that vector particles can be concentrated to a high titre without loss of infectivity (Akkina et al. (1996) J. Virol. 70:2581-5). Retrovirus envelope proteins are apparently unable to withstand the shearing forces during ultracentrifugation, probably because they consist of two non-covalently linked subunits. The interaction between the subunits may be disrupted by the centrifugation. In comparison the VSV glycoprotein is composed of a single unit. VSV-G protein pseudotyping can therefore offer potential advantages for both efficient target cell infection/transduction and during manufacturing processes.
WO 2000/52188 describes the generation of pseudotyped retroviral vectors, from stable producer cell lines, having vesicular stomatitis virus-G protein (VSV-G) as the membrane-associated viral envelope protein, and provides a gene sequence for the VSV-G protein.
Ross River Virus
The Ross River viral envelope has been used to pseudotype a non-primate lentiviral vector (FIV) and following systemic administration predominantly transduced the liver (Kang et al., 2002, J. Virol., 76:9378-9388). Efficiency was reported to be 20-fold greater than obtained with VSV-G pseudotyped vector, and caused less cytotoxicity as measured by serum levels of liver enzymes suggestive of hepatotoxicity.
Baculovirus GP64
The baculovirus GP64 protein has been shown to be an alternative to VSV-G for viral vectors used in the large-scale production of high-titre virus required for clinical and commercial applications (Kumar M, Bradow BP, Zimmerberg J (2003) Hum Gene Ther. 14(1):67-77). Compared with VSV-G-pseudotyped vectors, GP64-pseudotyped vectors have a similar broad tropism and similar native titres. Because GP64 expression does not kill cells, HEK293T-based cell lines constitutively expressing GP64 can be generated.
Alternative Envelopes
Other envelopes which give reasonable titre when used to pseudotype EIAV include Mokola, Rabies, Ebola and LCMV (lymphocytic choriomeningitis virus). Intravenous infusion into mice of lentivirus pseudotyped with 4070A led to maximal gene expression in the liver.
Packaging Sequence
As utilized within the context of the present invention the term "packaging signal", which is referred to interchangeably as "packaging sequence" or "psi", is used in reference to the noncoding, cis-acting sequence required for encapsidation of retroviral RNA strands during viral particle formation. In HIV-1, this sequence has been mapped to loci extending from upstream of the major splice donor site (SD) to at least the gag start codon (some or all of the 5' sequence of gag to nucleotide 688 may be included). In EIAV the packaging signal comprises the R region into the 5' coding region of Gag. As used herein, the term "extended packaging signal" or "extended packaging sequence" refers to the use of sequences around the psi sequence with further extension into the gag gene. The inclusion of these additional packaging sequences may increase the efficiency of insertion of vector RNA into viral particles.
Feline immunodeficiency virus (FIV) RNA encapsidation determinants have been shown to be discrete and non-continuous, comprising one region at the 5' end of the genomic mRNA (R-U5) and another region that mapped within the proximal 311 nt of gag (Kaye et al., J Virol. Oct;69(10):6588-92 (1995)).
Internal Ribosome Entry Site (IRES)
Insertion of IRES elements allows expression of multiple coding regions from a single promoter (Adam et al (as above); Koo et al (1992) Virology 186:669-675; Chen et al 1993 J. Virol 67:2142-2148). IRES elements were first found in the non-translated 5' ends of picornaviruses where they promote cap-independent translation of viral proteins (Jang et al (1990) Enzyme 44: 292-309). When located between open reading frames in an RNA, IRES elements allow efficient translation of the downstream open reading frame by promoting entry of the ribosome at the IRES element followed by downstream initiation of translation.
A review on IRES is presented by Mountford and Smith (TIG May 1995 vol 11, No 5:179-184). A number of different IRES sequences are known including those from encephalomyocarditis virus (EMCV) [Ghattas, I.R., et al., Mol. Cell. Biol., 11:5848-5859 (1991)]; BiP protein [Macejak and Sarnow, Nature 353:91 (1991)]; the Antennapedia gene of Drosophila (exons d and e) [Oh, et al., Genes & Development, 6:1643-1653 (1992)] as well as those in polio virus (PV) [Pelletier and Sonenberg, Nature 334:320-325 (1988); see also Mountford and Smith, TIG 11, 179-184 (1995)].
IRES elements from PV, EMCV and swine vesicular disease virus have previously been used in retroviral vectors (Coffin et al, as above).
The term "IRES" includes any sequence or combination of sequences which work as or improve the function of an IRES. The IRES(s) may be of viral origin (such as EMCV IRES, PV IRES, or FMDV 2A-like sequences) or cellular origin (such as FGF2 IRES, NRF IRES, Notch 2 IRES or EIF4 IRES).
In order for the IRES to be capable of initiating translation of each polynucleotide it should be located between or prior to the polynucleotides in the modular construct. The nucleotide sequences utilised for development of stable cell lines require the addition of selectable markers for selection of cells where stable integration has occurred. These selectable markers can be expressed as a single transcription unit within the nucleotide sequence or it may be preferable to use IRES elements to initiate translation of the selectable marker in a polycistronic message (Adam et al 1991 J.Virol. 65, 4985).
Genetic Orientation and Insulators
It is well known that nucleic acids are directional and this ultimately affects mechanisms such as transcription and replication in the cell. Thus genes can have relative orientations with respect to one another when part of the same nucleic acid construct.
In certain aspects of the present invention, at least two nucleic acid sequences present at the same locus in the cell or construct can be in reverse and/or alternating orientations. In other words, in certain aspects of the invention at this particular locus, the pair of sequential genes will not have the same orientation. This can help prevent both transcriptional and translational read-through when the region is expressed within the same physical location of the host cell.
Having the alternating orientations benefits retroviral vector production when the nucleic acids required for vector production are based at the same genetic locus within the cell. This in turn can also improve the safety of the resulting constructs in preventing the generation of replication-competent retroviral vectors.
When nucleic acid sequences are in reverse and/or alternating orientations the use of insulators can prevent inappropriate expression or silencing of a NOI from its genetic surroundings.
The term "insulator" refers to a class of nucleotide, e.g. DNA, sequence elements that when bound to insulator-binding proteins possess an ability to protect genes from surrounding regulator signals. Insulators have two types of function: an enhancer-blocking function and a chromatin barrier function. When an insulator is situated between a promoter and an enhancer, the enhancer-blocking function of the insulator shields the promoter from the transcription-enhancing influence of the enhancer (Geyer and Corces 1992; Kellum and Schedl 1992). The chromatin barrier insulators function by preventing the advance of nearby condensed chromatin which would lead to a transcriptionally active chromatin region turning into a transcriptionally inactive chromatin region and resulting in silencing of gene expression. Insulators which inhibit the spread of heterochromatin, and thus gene silencing, recruit enzymes involved in histone modifications to prevent this process (Yang J, Corces VG. 2011;110:43-76; Huang, Li et al. 2007; Dhillon, Raab et al. 2009). An insulator can have one or both of these functions and the chicken β-globin insulator (cHS4) is one such example. This insulator is the most extensively studied vertebrate insulator, is highly rich in G+C and has both enhancer-blocking and heterochromatic barrier functions (Chung J H, Whitely M, Felsenfeld G. Cell. 1993;74:505-514). Other such insulators with enhancer blocking functions include, but are not limited to, the following: human β-globin insulator 5 (HS5), human β-globin insulator 1 (HS1), and chicken β-globin insulator (cHS3) (Farrell CM, West AG, Felsenfeld G., Mol Cell Biol. 2002 Jun;22(11):3820-31; J Ellis et al. EMBO J. 1996 Feb 1;15(3):562-568). In addition to reducing unwanted distal interactions the insulators also help to prevent promoter interference (i.e. where the promoter from one transcription unit impairs expression of an adjacent transcription unit) between adjacent retroviral nucleic acid sequences.
If the insulators are used between each of the retroviral vector nucleic acid sequences, then the reduction of direct read-through will help prevent the formation of replication-competent retroviral vector particles.
The insulator may be present between each of the retroviral nucleic acid sequences. In one aspect, the use of insulators prevents promoter-enhancer interactions from one NOI expression cassette interacting with another NOI expression cassette in a nucleotide sequence encoding vector components.
An insulator may be present between the vector genome and gag-pol sequences. This therefore limits the likelihood of the production of a replication-competent retroviral vector and 'wild-type' like RNA transcripts, improving the safety profile of the construct. The use of insulator elements to improve the expression of stably integrated multigene vectors is cited in Moriarity et al, Nucleic Acids Res. 2013 Apr;41(8):e92.
Vector Titre
The skilled person will understand that there are a number of different methods of determining the titre of lentiviral vectors. Titre is often described as transducing units/mL (TU/mL). Titre may be increased by increasing the number of vector particles and by increasing the specific activity of a vector preparation.
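As an illustration of how a titre in TU/mL may be arrived at, the following minimal sketch uses one commonly used convention for a functional (transduction-based) titre: the number of cells at transduction multiplied by the fraction of cells scored as transduced and by the dilution factor, divided by the inoculum volume. This formula, the function name and the example numbers are assumptions made for illustration only; they are not taken from the present disclosure, and other titration methods exist.

```python
def functional_titre(cells_at_transduction, fraction_transduced,
                     dilution_factor, inoculum_volume_ml):
    """Return a titre in TU/mL (hypothetical helper, illustrative only)."""
    if not 0.0 <= fraction_transduced <= 1.0:
        raise ValueError("fraction_transduced must lie between 0 and 1")
    if dilution_factor <= 0 or inoculum_volume_ml <= 0:
        raise ValueError("dilution_factor and inoculum_volume_ml must be positive")
    # Number of cells scored as transduced in the well.
    transduced_cells = cells_at_transduction * fraction_transduced
    # Scale back to the undiluted preparation and normalise per mL of inoculum.
    return transduced_cells * dilution_factor / inoculum_volume_ml


# Illustrative numbers only: 1e5 cells, 20% transduced,
# vector diluted 100-fold, 0.5 mL of diluted vector applied.
titre = functional_titre(1e5, 0.20, 100, 0.5)
print(f"{titre:.2e} TU/mL")
```

Under this convention, a preparation with more vector particles or a higher specific activity transduces a larger fraction of cells at a given dilution and therefore yields a higher calculated titre.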
Therapeutic Use
The lentiviral vector as described herein or a cell or tissue transduced with the lentiviral vector as described herein may be used in medicine. In addition, the lentiviral vector as described herein, a production cell of the invention or a cell or tissue transduced with the lentiviral vector as described herein may be used for the preparation of a medicament to deliver a nucleotide of interest to a target site in need of the same. Such uses of the lentiviral vector or transduced cell of the invention may be for therapeutic or diagnostic purposes, as described previously.
Accordingly, there is provided a cell transduced by the lentiviral vector as described herein.
A "cell transduced by a viral vector particle" is to be understood as a cell, in particular a target cell, into which the nucleic acid carried by the viral vector particle has been transferred.
Nucleotide of interest
In one embodiment of the invention, the nucleotide of interest (i.e. transgene) is translated in a target cell which lacks TRAP. "Target cell" is to be understood as a cell in which it is desired to express the NOI. The NOI may be introduced into the target cell using a viral vector of the present invention. Delivery to the target cell may be performed in vivo, ex vivo or in vitro.
In a preferred embodiment, the nucleotide of interest gives rise to a therapeutic effect.
The NOI may have a therapeutic or diagnostic application. Suitable NOIs include, but are not limited to sequences encoding enzymes, co-factors, cytokines, chemokines, hormones, antibodies, anti-oxidant molecules, engineered immunoglobulin-like molecules, single chain antibodies, fusion proteins, immune co-stimulatory molecules, immunomodulatory molecules, chimeric antigen receptors, a transdomain negative mutant of a target protein, toxins, conditional toxins, antigens, transcription factors, structural proteins, reporter proteins, subcellular localization signals, tumour suppressor proteins, growth factors, membrane proteins, receptors, vasoactive proteins and peptides, anti-viral proteins and ribozymes, and derivatives thereof (such as derivatives with an associated reporter group). The NOIs may also encode micro-RNA.
In one embodiment, the NOI may be useful in the treatment of a neurodegenerative disorder.
In another embodiment, the NOI may be useful in the treatment of Parkinson's disease. In another embodiment, the NOI may encode an enzyme or enzymes involved in dopamine synthesis. For example, the enzyme may be one or more of the following: tyrosine hydroxylase, GTP-cyclohydrolase I and/or aromatic amino acid dopa decarboxylase. The sequences of all three genes are available (GenBank® Accession Nos. X05290, U19523 and M76180, respectively).
In another embodiment, the NOI may encode the vesicular monoamine transporter 2 (VMAT2). In an alternative embodiment, the viral genome may comprise a NOI encoding aromatic amino acid dopa decarboxylase and a NOI encoding VMAT2. Such a genome may be used in the treatment of Parkinson's disease, in particular in conjunction with peripheral administration of L-DOPA.
In another embodiment, the NOI may encode a therapeutic protein or combination of therapeutic proteins.
In another embodiment, the NOI may encode a protein or proteins selected from the group consisting of glial cell derived neurotrophic factor (GDNF), brain derived neurotrophic factor (BDNF), ciliary neurotrophic factor (CNTF), neurotrophin-3 (NT-3), acidic fibroblast growth factor (aFGF), basic fibroblast growth factor (bFGF), interleukin-1 beta (IL-1β), tumor necrosis factor alpha (TNF-α), insulin growth factor-2, VEGF-A, VEGF-B, VEGF-C/VEGF-2, VEGF-D, VEGF-E, PDGF-A, PDGF-B, hetero- and homo-dimers of PDGF-A and PDGF-B.
In another embodiment, the NOI may encode an anti-angiogenic protein or anti-angiogenic proteins selected from the group consisting of angiostatin, endostatin, platelet factor 4, pigment epithelium derived factor (PEDF), placental growth factor, restin, interferon-α, interferon-inducible protein, gro-beta and tubedown-1, interleukin (IL)-1, IL-12, retinoic acid, anti-VEGF antibodies or fragments/variants thereof such as aflibercept, thrombospondin, VEGF receptor proteins such as those described in US 5,952,199 and US 6,100,071, and anti-VEGF receptor antibodies.
In another embodiment, the NOI may encode anti-inflammatory proteins, antibodies or fragment/variants of proteins or antibodies selected from the group consisting of NF-κB inhibitors, IL-1 beta inhibitors, TGFbeta inhibitors, IL-6 inhibitors, IL-23 inhibitors, IL-18 inhibitors, Tumour necrosis factor alpha and Tumour necrosis factor beta, Lymphotoxin alpha and Lymphotoxin beta, LIGHT inhibitors, alpha synuclein inhibitors, Tau inhibitors, beta amyloid inhibitors, and IL-17 inhibitors. In another embodiment the NOI may encode cystic fibrosis transmembrane conductance regulator (CFTR).
In another embodiment the NOI may encode a protein normally expressed in an ocular cell.
In another embodiment, the NOI may encode a protein normally expressed in a photoreceptor cell and/or retinal pigment epithelium cell.
In another embodiment, the NOI may encode a protein selected from the group comprising RPE65, arylhydrocarbon-interacting receptor protein like 1 (AIPL1), CRB1, lecithin retinol acyltransferase (LRAT), photoreceptor-specific homeo box (CRX), retinal guanylate cyclase (GUCY2D), RPGR interacting protein 1 (RPGRIP1), LCA2, LCA3, LCA5, dystrophin, PRPH2, CNTF, ABCR/ABCA4, EMP1, TIMP3, MERTK, ELOVL4, MYO7A, USH2A, VMD2, RLBP1, COX-2, FPR, harmonin, Rab escort protein 1, CNGB2, CNGA3, CEP 290, RPGR, RS1, RP1, PRELP, glutathione pathway enzymes and opticin.
In other embodiments, the NOI may encode the human clotting Factor VIII or Factor IX.
In other embodiments, the NOI may encode protein or proteins involved in metabolism selected from the group comprising phenylalanine hydroxylase (PAH), Methylmalonyl CoA mutase, Propionyl CoA carboxylase, Isovaleryl CoA dehydrogenase, Branched chain ketoacid dehydrogenase complex, Glutaryl CoA dehydrogenase, Acetyl CoA carboxylase, propionyl CoA carboxylase, 3 methyl crotonyl CoA carboxylase, pyruvate carboxylase, carbamoyl-phosphate synthase ammonia, ornithine transcarbamylase, glucosylceramidase beta, alpha galactosidase A, glucosylceramidase beta, cystinosin, glucosamine(N-acetyl)-6-sulfatase, N-acetyl-alpha-glucosaminidase, N-sulfoglucosamine sulfohydrolase, Galactosamine-6 sulfatase, arylsulfatase A, cytochrome B-245 beta, ABCD1, ornithine carbamoyltransferase, argininosuccinate synthase, argininosuccinate lyase, arginase 1, alanine glyoxylate aminotransferase, ATP-binding cassette, sub-family B members.
In other embodiments, the NOI may encode a chimeric antigen receptor (CAR) or a T cell receptor (TCR). In one embodiment, the CAR is an anti-5T4 CAR. In other embodiments, the NOI may encode B-cell maturation antigen (BCMA), CD19, CD22, CD20, CD138, CD30, CD33, CD123, CD70, prostate specific membrane antigen (PSMA), Lewis Y antigen (LeY), Tyrosine-protein kinase transmembrane receptor (ROR1), Mucin 1, cell surface associated (Muc1), Epithelial cell adhesion molecule (EpCAM), epidermal growth factor receptor (EGFR), insulin, protein tyrosine phosphatase, non-receptor type 22, interleukin 2 receptor, alpha, interferon induced with helicase C domain 1, human epidermal growth factor receptor (HER2), glypican 3 (GPC3), disialoganglioside (GD2), mesothelin, vascular endothelial growth factor receptor 2 (VEGFR2).
In other embodiments, the NOI may encode a chimeric antigen receptor (CAR) against NKG2D ligands selected from the group comprising ULBP1, 2 and 3, H60, Rae-1 α, β, γ, δ, MICA, MICB.
In further embodiments the NOI may encode SGSH, SUMF1, GAA, the common gamma chain (CD132), adenosine deaminase, WAS protein, globins, alpha galactosidase A, δ-aminolevulinate (ALA) synthase, δ-aminolevulinate dehydratase (ALAD), Hydroxymethylbilane (HMB) synthase, Uroporphyrinogen (URO) synthase, Uroporphyrinogen (URO) decarboxylase, Coproporphyrinogen (COPRO) oxidase, Protoporphyrinogen (PROTO) oxidase, Ferrochelatase, α-L-iduronidase, Iduronate sulfatase, Heparan sulfamidase, N-acetylglucosaminidase, Heparan-α-glucosaminide N-acetyltransferase, 3 N-acetylglucosamine 6-sulfatase, Galactose-6-sulfate sulfatase, β-galactosidase, N-acetylgalactosamine-4-sulfatase, β-glucuronidase and Hyaluronidase.
In addition to the NOI the vector may also comprise or encode a siRNA, shRNA, or regulated shRNA (Dickins et al. (2005) Nature Genetics 37: 1289-1295, Silva et al. (2005) Nature Genetics 37:1281-1288).
Indications
The vectors, including retroviral and AAV vectors, according to the present invention may be used to deliver one or more NOI(s) useful in the treatment of the disorders listed in WO 1998/05635, WO 1998/07859, WO 1998/09985. The nucleotide of interest may be DNA or RNA. Examples of such diseases are given below:
A disorder which responds to cytokine and cell proliferation/differentiation activity; immunosuppressant or immunostimulant activity (e.g. for treating immune deficiency, including infection with human immunodeficiency virus, regulation of lymphocyte growth; treating cancer and many autoimmune diseases, and to prevent transplant rejection or induce tumour immunity); regulation of haematopoiesis (e.g. treatment of myeloid or lymphoid diseases); promoting growth of bone, cartilage, tendon, ligament and nerve tissue (e.g. for healing wounds, treatment of burns, ulcers and periodontal disease and neurodegeneration); inhibition or activation of follicle-stimulating hormone (modulation of fertility); chemotactic/chemokinetic activity (e.g. for mobilising specific cell types to sites of injury or infection); haemostatic and thrombolytic activity (e.g. for treating haemophilia and stroke); anti-inflammatory activity (for treating, for example, septic shock or Crohn's disease); macrophage inhibitory and/or T cell inhibitory activity and thus, anti-inflammatory activity; anti-immune activity (i.e. inhibitory effects against a cellular and/or humoral immune response, including a response not associated with inflammation); inhibition of the ability of macrophages and T cells to adhere to extracellular matrix components and fibronectin, as well as up-regulated fas receptor expression in T cells.
Malignancy disorders, including cancer, leukaemia, benign and malignant tumour growth, invasion and spread, angiogenesis, metastases, ascites and malignant pleural effusion.
Autoimmune diseases including arthritis, including rheumatoid arthritis, hypersensitivity, allergic reactions, asthma, systemic lupus erythematosus, collagen diseases and other diseases.
Vascular diseases including arteriosclerosis, atherosclerotic heart disease, reperfusion injury, cardiac arrest, myocardial infarction, vascular inflammatory disorders, respiratory distress syndrome, cardiovascular effects, peripheral vascular disease, migraine and aspirin-dependent anti-thrombosis, stroke, cerebral ischaemia, ischaemic heart disease or other diseases.
Diseases of the gastrointestinal tract including peptic ulcer, ulcerative colitis, Crohn's disease and other diseases.
Hepatic diseases including hepatic fibrosis, liver cirrhosis.
Inherited metabolic disorders including phenylketonuria (PKU), Wilson disease, organic acidemias, urea cycle disorders, cholestasis, and other diseases.
Renal and urologic diseases including thyroiditis or other glandular diseases, glomerulonephritis or other diseases.
Ear, nose and throat disorders including otitis or other oto-rhino-laryngological diseases, dermatitis or other dermal diseases.
Dental and oral disorders including periodontal diseases, periodontitis, gingivitis or other dental/oral diseases. Testicular diseases including orchitis or epididymo-orchitis, infertility, orchidal trauma or other testicular diseases.
Gynaecological diseases including placental dysfunction, placental insufficiency, habitual abortion, eclampsia, pre-eclampsia, endometriosis and other gynaecological diseases.
Ophthalmologic disorders such as Leber Congenital Amaurosis (LCA) including LCA10, posterior uveitis, intermediate uveitis, anterior uveitis, conjunctivitis, chorioretinitis, uveoretinitis, optic neuritis, glaucoma, including open angle glaucoma and juvenile congenital glaucoma, intraocular inflammation, e.g. retinitis or cystoid macular oedema, sympathetic ophthalmia, scleritis, retinitis pigmentosa, macular degeneration including age related macular degeneration (AMD) and juvenile macular degeneration including Best Disease, Best vitelliform macular degeneration, Stargardt's Disease, Usher's syndrome, Doyne's honeycomb retinal dystrophy, Sorsby's Macular Dystrophy, Juvenile retinoschisis, Cone-Rod Dystrophy, Corneal Dystrophy, Fuchs' Dystrophy, Leber's congenital amaurosis, Leber's hereditary optic neuropathy (LHON), Adie syndrome, Oguchi disease, degenerative fundus disease, ocular trauma, ocular inflammation caused by infection, proliferative vitreo-retinopathies, acute ischaemic optic neuropathy, excessive scarring, e.g. following glaucoma filtration operation, reaction against ocular implants, corneal transplant graft rejection, and other ophthalmic diseases, such as diabetic macular oedema, retinal vein occlusion, RLBP1-associated retinal dystrophy, choroideremia and achromatopsia.
Neurological and neurodegenerative disorders including Parkinson's disease, complication and/or side effects from treatment of Parkinson's disease, AIDS-related dementia complex, HIV-related encephalopathy, Devic's disease, Sydenham chorea, Alzheimer's disease and other degenerative diseases, conditions or disorders of the CNS, strokes, post-polio syndrome, psychiatric disorders, myelitis, encephalitis, subacute sclerosing pan-encephalitis, encephalomyelitis, acute neuropathy, subacute neuropathy, chronic neuropathy, Fabry disease, Gaucher disease, Cystinosis, Pompe disease, metachromatic leukodystrophy, Wiskott-Aldrich Syndrome, adrenoleukodystrophy, beta-thalassemia, sickle cell disease, Guillain-Barré syndrome, myasthenia gravis, pseudo-tumour cerebri, Down's Syndrome, Huntington's disease, CNS compression or CNS trauma or infections of the CNS, muscular atrophies and dystrophies, diseases, conditions or disorders of the central and peripheral nervous systems, motor neuron disease including amyotrophic lateral sclerosis, spinal muscular atrophy, spinal cord and avulsion injury.
Other diseases and conditions such as cystic fibrosis, mucopolysaccharidosis including Sanfilippo syndrome A, Sanfilippo syndrome B, Sanfilippo syndrome C, Sanfilippo syndrome D, Hunter syndrome, Hurler-Scheie syndrome, Morquio syndrome, ADA-SCID, X-linked SCID, X-linked chronic granulomatous disease, porphyria, haemophilia A, haemophilia B, post-traumatic inflammation, haemorrhage, coagulation and acute phase response, cachexia, anorexia, acute infection, septic shock, infectious diseases, diabetes mellitus, complications or side effects of surgery, bone marrow transplantation or other transplantation complications and/or side effects, complications and side effects of gene therapy, e.g. due to infection with a viral carrier, or AIDS, to suppress or inhibit a humoral and/or cellular immune response, for the prevention and/or treatment of graft rejection in cases of transplantation of natural or artificial cells, tissue and organs such as cornea, bone marrow, organs, lenses, pacemakers, natural or artificial skin tissue.
siRNA, micro-RNA and shRNA
In certain other embodiments, the NOI comprises a micro-RNA. Micro-RNAs are a very large group of small RNAs produced naturally in organisms, at least some of which regulate the expression of target genes. Founding members of the micro-RNA family are let-7 and lin-4. The let-7 gene encodes a small, highly conserved RNA species that regulates the expression of endogenous protein-coding genes during worm development. The active RNA species is transcribed initially as an ~70 nt precursor, which is post-transcriptionally processed into a mature ~21 nt form. Both let-7 and lin-4 are transcribed as hairpin RNA precursors which are processed to their mature forms by Dicer enzyme.
In addition to the NOI the vector may also comprise or encode a siRNA, shRNA, or regulated shRNA (Dickins et al. (2005) Nature Genetics 37: 1289-1295, Silva et al. (2005) Nature Genetics 37:1281-1288).
Post-transcriptional gene silencing (PTGS) mediated by double-stranded RNA (dsRNA) is a conserved cellular defence mechanism for controlling the expression of foreign genes. It is thought that the random integration of elements such as transposons or viruses causes the expression of dsRNA which activates sequence-specific degradation of homologous single-stranded mRNA or viral genomic RNA. The silencing effect is known as RNA interference (RNAi) (Ralph et al. (2005) Nature Medicine 11:429-433). The mechanism of RNAi involves the processing of long dsRNAs into duplexes of about 21-25 nucleotide (nt) RNAs. These products are called small interfering or silencing RNAs (siRNAs) which are the sequence-specific mediators of mRNA degradation. In differentiated mammalian cells, dsRNA >30 bp has been found to activate the interferon response leading to shut-down of protein synthesis and non-specific mRNA degradation (Stark et al., Annu Rev Biochem 67:227-64 (1998)). However this response can be bypassed by using 21 nt siRNA duplexes (Elbashir et al., EMBO J. Dec 3;20(23):6877-88 (2001), Hutvagner et al., Science. Aug 3;293(5531):834-8. Epub Jul 12 (2001)) allowing gene function to be analysed in cultured mammalian cells.
Pharmaceutical Compositions
The present disclosure provides a pharmaceutical composition comprising the lentiviral vector as described herein or a cell or tissue transduced with the viral vector as described herein, in combination with a pharmaceutically acceptable carrier, diluent or excipient.
The present disclosure provides a pharmaceutical composition for treating an individual by gene therapy, wherein the composition comprises a therapeutically effective amount of a lentiviral vector. The pharmaceutical composition may be for human or animal usage.
The composition may comprise a pharmaceutically acceptable carrier, diluent, excipient or adjuvant. The choice of pharmaceutical carrier, excipient or diluent can be made with regard to the intended route of administration and standard pharmaceutical practice. The pharmaceutical compositions may comprise as, or in addition to, the carrier, excipient or diluent, any suitable binder(s), lubricant(s), suspending agent(s), coating agent(s), solubilising agent(s) and other carrier agents that may aid or increase vector entry into the target site (such as for example a lipid delivery system).
Where appropriate, the composition can be administered by any one or more of inhalation; in the form of a suppository or pessary; topically in the form of a lotion, solution, cream, ointment or dusting powder; by use of a skin patch; orally in the form of tablets containing excipients such as starch or lactose, or in capsules or ovules either alone or in admixture with excipients, or in the form of elixirs, solutions or suspensions containing flavouring or colouring agents; or they can be injected parenterally, for example intracavernosally, intravenously, intramuscularly, intracranially, intraocularly, intraperitoneally, or subcutaneously. For parenteral administration, the compositions may be best used in the form of a sterile aqueous solution which may contain other substances, for example enough salts or monosaccharides to make the solution isotonic with blood. For buccal or sublingual administration, the compositions may be administered in the form of tablets or lozenges which can be formulated in a conventional manner.
The lentiviral vector as described herein may also be used to transduce target cells or target tissue ex vivo prior to transfer of said target cell or tissue into a patient in need of the same. An example of such cell may be autologous T cells and an example of such tissue may be a donor cornea.
Variants, Derivatives, Analogues, Homologues and Fragments
In addition to the specific proteins and nucleotides mentioned herein, the present invention also encompasses the use of variants, derivatives, analogues, homologues and fragments thereof.
In the context of the present invention, a variant of any given sequence is a sequence in which the specific sequence of residues (whether amino acid or nucleic acid residues) has been modified in such a manner that the polypeptide or polynucleotide in question retains at least one of its endogenous functions. A variant sequence can be obtained by addition, deletion, substitution, modification, replacement and/or variation of at least one residue present in the naturally-occurring protein.
The term "derivative" as used herein, in relation to proteins or polypeptides of the present invention includes any substitution of, variation of, modification of, replacement of, deletion of and/or addition of one (or more) amino acid residues from or to the sequence providing that the resultant protein or polypeptide retains at least one of its endogenous functions.
The term "analogue" as used herein, in relation to polypeptides or polynucleotides includes any mimetic, that is, a chemical compound that possesses at least one of the endogenous functions of the polypeptides or polynucleotides which it mimics.
Typically, amino acid substitutions may be made, for example from 1, 2 or 3 to 10 or 20 substitutions provided that the modified sequence retains the required activity or ability. Amino acid substitutions may include the use of non-naturally occurring analogues.
Proteins used in the present invention may also have deletions, insertions or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent protein. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity and/or the amphipathic nature of the residues as long as the endogenous function is retained. For example, negatively charged amino acids include aspartic acid and glutamic acid; positively charged amino acids include lysine and arginine; and amino acids with uncharged polar head groups having similar hydrophilicity values include asparagine, glutamine, serine, threonine and tyrosine. Conservative substitutions may be made, for example according to the table below. Amino acids in the same block in the second column and preferably in the same line in the third column may be substituted for each other:
[Table of conservative amino acid substitutions: image imgf000110_0001, not reproduced in this text]
The term "homologue" means an entity having a certain homology with the wild type amino acid sequence and the wild type nucleotide sequence. The term "homology" can be equated with "identity".
In the present context, a homologous sequence is taken to include an amino acid sequence which may be at least 50%, 55%, 65%, 75%, 85% or 90% identical, preferably at least 95%, 97% or 99% identical to the subject sequence. Typically, the homologues will comprise the same active sites etc. as the subject amino acid sequence. Although homology can also be considered in terms of similarity (i.e. amino acid residues having similar chemical properties/functions), in the context of the present invention it is preferred to express homology in terms of sequence identity.
In the present context, a homologous sequence is taken to include a nucleotide sequence which may be at least 50%, 55%, 65%, 75%, 85% or 90% identical, preferably at least 95%, 97%, 98% or 99% identical to the subject sequence. Although homology can also be considered in terms of similarity, in the context of the present invention it is preferred to express homology in terms of sequence identity.
Homology comparisons can be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs can calculate percentage homology or identity between two or more sequences.
Percentage homology may be calculated over contiguous sequences, i.e. one sequence is aligned with the other sequence and each amino acid in one sequence is directly compared with the corresponding amino acid in the other sequence, one residue at a time. This is called an "ungapped" alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues.
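By way of illustration only, the residue-by-residue comparison described above may be sketched as follows. The function name and the example sequences are hypothetical and are not taken from any particular software package; the sketch assumes two sequences of equal length that are already in register.

```python
def ungapped_percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity from a position-by-position (ungapped) comparison."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions at which both sequences carry the same residue.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical 8-residue peptides differing at a single position.
print(ungapped_percent_identity("MKTAYIAK", "MKTAYLAK"))
```

Because no gaps are allowed, a single insertion or deletion in one sequence would shift every following position out of register, which is why this approach is typically restricted to short regions.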
Although this is a very simple and consistent method, it fails to take into consideration that, for example, in an otherwise identical pair of sequences, one insertion or deletion in the nucleotide sequence may cause the following codons to be put out of alignment, thus potentially resulting in a large reduction in percent homology when a global alignment is performed. Consequently, most sequence comparison methods are designed to produce optimal alignments that take into consideration possible insertions and deletions without penalising unduly the overall homology score. This is achieved by inserting "gaps" in the sequence alignment to try to maximise local homology.
However, these more complex methods assign "gap penalties" to each gap that occurs in the alignment so that, for the same number of identical amino acids, a sequence alignment with as few gaps as possible, reflecting higher relatedness between the two compared sequences, will achieve a higher score than one with many gaps. "Affine gap costs" are typically used that charge a relatively high cost for the existence of a gap and a smaller penalty for each subsequent residue in the gap. This is the most commonly used gap scoring system. High gap penalties will of course produce optimised alignments with fewer gaps. Most alignment programs allow the gap penalties to be modified. However, it is preferred to use the default values when using such software for sequence comparisons. For example when using the GCG Wisconsin Bestfit package the default gap penalty for amino acid sequences is -12 for a gap and -4 for each extension.
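As a minimal sketch of affine gap costs, the following applies the Bestfit default values quoted above (-12 for a gap, -4 for each extension) to the gaps of an existing pairwise alignment. The function names and example alignments are hypothetical, and the convention that the opening charge covers the first gapped position is an assumption made here for illustration; individual programs may define the extension term differently.

```python
GAP_OPEN = -12    # charge for the existence of a gap (value quoted in the text)
GAP_EXTEND = -4   # charge for each extension of a gap (value quoted in the text)


def affine_gap_penalty(gap_length: int) -> int:
    """Assumed convention: open charge for position 1, extend charge thereafter."""
    if gap_length < 1:
        raise ValueError("gap length must be at least 1")
    return GAP_OPEN + GAP_EXTEND * (gap_length - 1)


def total_gap_penalty(aligned_a: str, aligned_b: str) -> int:
    """Sum the affine penalties of all gap runs in an alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    run_length = 0
    run_row = None  # which row (0 or 1) holds the gap run currently being counted
    for a, b in zip(aligned_a, aligned_b):
        row = 0 if a == "-" else 1 if b == "-" else None
        if row is not None and row == run_row:
            run_length += 1
            continue
        if run_length:
            total += affine_gap_penalty(run_length)
        run_length = 1 if row is not None else 0
        run_row = row
    if run_length:
        total += affine_gap_penalty(run_length)
    return total


# Both alignments have six identical columns and three gapped positions.
one_long_gap = total_gap_penalty("ACD---GHK", "ACDEFWGHK")
three_short_gaps = total_gap_penalty("A-D-F-GHK", "ACDEFWGHK")
print(one_long_gap, three_short_gaps)
```

For the same number of identical residues and gapped positions, the alignment with a single contiguous gap is penalised less than the alignment with three separate gaps, which is the behaviour the affine scheme is intended to produce.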
Calculation of maximum percentage homology therefore firstly requires the production of an optimal alignment, taking into consideration gap penalties. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A.; Devereux et al. (1984) Nucleic Acids Research 12:387). Examples of other software that can perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al. (1999) ibid - Ch. 18), FASTA (Altschul et al. (1990) J. Mol. Biol. 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al. (1999) ibid, pages 7-58 to 7-60). However, for some applications, it is preferred to use the GCG Bestfit program. Another tool, called BLAST 2 Sequences, is also available for comparing protein and nucleotide sequences (see FEMS Microbiol Lett (1999) 174(2):247-50; FEMS Microbiol Lett (1999) 177(1):187-8).
Although the final percentage homology can be measured in terms of identity, the alignment process itself is typically not based on an all-or-nothing pair comparison. Instead, a scaled similarity score matrix is generally used that assigns scores to each pairwise comparison based on chemical similarity or evolutionary distance. An example of such a matrix commonly used is the BLOSUM62 matrix - the default matrix for the BLAST suite of programs. GCG Wisconsin programs generally use either the public default values or a custom symbol comparison table if supplied (see user manual for further details). For some applications, it is preferred to use the public default values for the GCG package, or in the case of other software, the default matrix, such as BLOSUM62.
Once the software has produced an optimal alignment, it is possible to calculate percentage homology, preferably percentage sequence identity. The software usually does this as part of the sequence comparison and generates a numerical result. "Fragments" are also variants and the term typically refers to a selected region of the polypeptide or polynucleotide that is of interest either functionally or, for example, in an assay. "Fragment" thus refers to an amino acid or nucleic acid sequence that is a portion of a full-length polypeptide or polynucleotide.
Such variants may be prepared using standard recombinant DNA techniques such as site-directed mutagenesis. Where insertions are to be made, synthetic DNA encoding the insertion together with 5' and 3' flanking regions corresponding to the naturally-occurring sequence either side of the insertion site may be made. The flanking regions will contain convenient restriction sites corresponding to sites in the naturally-occurring sequence so that the sequence may be cut with the appropriate enzyme(s) and the synthetic DNA ligated into the break. The DNA is then expressed in accordance with the invention to make the encoded protein. These methods are only illustrative of the numerous standard techniques known in the art for manipulation of DNA sequences and other known techniques may also be used.
All variants, fragments or homologues of the regulatory protein suitable for use in the cells and/or modular constructs of the invention will retain the ability to bind the cognate binding site of the NOI such that translation of the NOI is repressed or prevented in a viral vector production cell.
All variants, fragments or homologues of the binding site will retain the ability to bind the cognate RNA-binding protein, such that translation of the NOI is repressed or prevented in a viral vector production cell.
Codon Optimisation
The polynucleotides used in the present invention (including the NOI and/or components of the vector production system) may be codon-optimised. Codon optimisation has previously been described in WO 1999/41397 and WO 2001/79518. Different cells differ in their usage of particular codons. This codon bias corresponds to a bias in the relative abundance of particular tRNAs in the cell type. By altering the codons in the sequence so that they are tailored to match with the relative abundance of corresponding tRNAs, it is possible to increase expression. By the same token, it is possible to decrease expression by deliberately choosing codons for which the corresponding tRNAs are known to be rare in the particular cell type. Thus, an additional degree of translational control is available.
Many viruses, including retroviruses, use a large number of rare codons. By changing these to correspond to commonly used mammalian codons, increased expression of a gene of interest, e.g. a NOI or packaging components, can be achieved in mammalian production cells. Codon usage tables are known in the art for mammalian cells, as well as for a variety of other organisms.
Codon optimisation of viral vector packaging components has a number of other advantages. By virtue of alterations in their sequences, the nucleotide sequences encoding the packaging components of the viral particles required for assembly of viral particles in the producer cells/packaging cells have RNA instability sequences (INS) eliminated from them. At the same time, the amino acid coding sequence for the packaging components is retained so that the viral components encoded by the sequences remain the same, or at least sufficiently similar that the function of the packaging components is not compromised. In lentiviral vector gag/pol expression cassettes codon optimisation also overcomes the Rev/RRE requirement for export, rendering optimised sequences Rev-independent. Codon optimisation also reduces homologous recombination between different constructs within the vector system (for example between the regions of overlap in the gag-pol and env open reading frames). The overall effect of codon optimisation is therefore a notable increase in viral titre and improved safety.
In one aspect only codons relating to INS are codon optimised. However, in a much more preferred and practical aspect, the sequences are codon optimised in their entirety, with some exceptions, for example the sequence encompassing the frameshift site of gag-pol (see below). The gag-pol gene of lentiviral vectors comprises two overlapping reading frames encoding the gag-pol proteins. The expression of both proteins depends on a frameshift during translation. This frameshift occurs as a result of ribosome "slippage" during translation. This slippage is thought to be caused at least in part by ribosome-stalling RNA secondary structures. Such secondary structures exist downstream of the frameshift site in the gag-pol gene. For HIV, the region of overlap extends from nucleotide 1222 downstream of the beginning of gag (wherein nucleotide 1 is the A of the gag ATG) to the end of gag (nt 1503). Consequently, a 281 bp fragment spanning the frameshift site and the overlapping region of the two reading frames is preferably not codon optimised. Retaining this fragment will enable more efficient expression of the Gag-Pol proteins. For EIAV the beginning of the overlap has been taken to be nt 1262 (where nucleotide 1 is the A of the gag ATG) and the end of the overlap to be nt 1461. In order to ensure that the frameshift site and the gag-pol overlap are preserved, the wild type sequence has been retained from nt 1156 to 1465.
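The following is a minimal sketch, under stated assumptions, of synonymous codon replacement in which a defined nucleotide window (such as a frameshift/overlap region) is left as wild type. The function names, the toy coding sequence and the two-entry preferred-codon dictionary are hypothetical placeholders rather than a real codon usage table; for an HIV gag-pol sequence numbered from the A of the gag ATG, the window described above would be passed as protected=(1222, 1503).

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO_ACIDS.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimise(cds: str, preferred: dict, protected=None) -> str:
    """Swap codons for preferred synonymous codons, except in a protected window.

    `protected` is an inclusive (start, end) pair of 1-based nucleotide
    coordinates; any codon overlapping the window is kept as wild type.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of three")
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        first_nt, last_nt = i + 1, i + 3  # 1-based coordinates of this codon
        overlaps = (protected is not None
                    and first_nt <= protected[1] and last_nt >= protected[0])
        out.append(codon if overlaps else preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


# Hypothetical preferred codons and toy sequence, for illustration only.
ILLUSTRATIVE_PREFERRED = {"K": "AAG", "L": "CTG"}
toy_cds = "ATGAAATTAAAATTATAA"  # M K L K L stop
optimised = codon_optimise(toy_cds, ILLUSTRATIVE_PREFERRED, protected=(7, 9))
print(optimised, translate(optimised) == translate(toy_cds))
```

In this sketch the encoded amino acid sequence is unchanged, while the codon lying within the protected window retains its wild type nucleotides.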
Deviations from optimal codon usage may be made, for example, in order to accommodate convenient restriction sites, and conservative amino acid changes may be introduced into the Gag-Pol proteins.
In one aspect, codon optimisation is based on lightly expressed mammalian genes. The third and sometimes the second and third base may be changed.
Due to the degenerate nature of the genetic code, it will be appreciated that numerous gag-pol sequences can be achieved by a skilled worker. Also there are many retroviral variants described which can be used as a starting point for generating a codon-optimised gag-pol sequence. Lentiviral genomes can be quite variable. For example there are many quasispecies of HIV-1 which are still functional. This is also the case for EIAV. These variants may be used to enhance particular parts of the transduction process. Examples of HIV-1 variants may be found at the HIV Databases operated by Los Alamos National Security, LLC at http://hiv-web.lanl.gov. Details of EIAV clones may be found at the National Center for Biotechnology Information (NCBI) database located at http://www.ncbi.nlm.nih.gov.
The strategy for codon-optimised gag-pol sequences can be used in relation to any retrovirus. This would apply to all lentiviruses, including EIAV, FIV, BIV, CAEV, VMR, SIV, HIV-1 and HIV-2. In addition this method could be used to increase expression of genes from HTLV-1, HTLV-2, HFV, HSRV and human endogenous retroviruses (HERV), MLV and other retroviruses. Codon optimisation can render gag-pol expression Rev-independent. In order to enable the use of anti-rev or RRE factors in the lentiviral vector, however, it would be necessary to render the viral vector generation system totally Rev/RRE-independent. Thus, the genome also needs to be modified. This is achieved by optimising vector genome components. Advantageously, these modifications also lead to the production of a safer system absent of all additional proteins both in the producer and in the transduced cell.
It is to be understood that features disclosed herein may be used in combination with one another. Furthermore, it is to be understood that such features may possess different functionalities by virtue of the nucleotide sequence comprising them.
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology and immunology, which are within the capabilities of a person of ordinary skill in the art. Such techniques are explained in the literature. See, for example, J. Sambrook, E. F. Fritsch, and T. Maniatis (1989) Molecular Cloning: A Laboratory Manual, Second Edition, Books 1-3, Cold Spring Harbor Laboratory Press; Ausubel, F. M. et al. (1995 and periodic supplements) Current Protocols in Molecular Biology, Ch. 9, 13, and 16, John Wiley & Sons, New York, NY; B. Roe, J. Crabtree, and A. Kahn (1996) DNA Isolation and Sequencing: Essential Techniques, John Wiley & Sons; J. M. Polak and James O'D. McGee (1990) In Situ Hybridization: Principles and Practice; Oxford University Press; M. J. Gait (ed.) (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press; and, D. M. J. Lilley and J. E. Dahlberg (1992) Methods of Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA Methods in Enzymology, Academic Press. Each of these general texts is herein incorporated by reference.
This disclosure is not limited by the exemplary methods and materials disclosed herein, and any methods and materials similar or equivalent to those described herein can be used in the practice or testing of aspects of this disclosure. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, any nucleic acid sequences are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within this disclosure. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within this disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in this disclosure.
It must be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise.
The terms "comprising", "comprises" and "comprised of" as used herein are synonymous with "including", "includes" or "containing", "contains", and are inclusive or open-ended and do not exclude additional, non-recited members, elements or method steps. The terms "comprising", "comprises" and "comprised of" also include the term "consisting of".
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that such publications constitute prior art to the claims appended hereto.
The invention will now be further described by way of Examples, which are meant to serve to assist one of ordinary skill in the art in carrying out the invention and are not intended in any way to limit the scope of the invention.
EXAMPLES
The invention describes use of a plurality of 'Cytoplasmic Accumulation Region' (CAR) elements (CARe) or 'tiles' based on the consensus sequence BMWGHWSSWS (SEQ ID NO: 24), for example CCAGATCCTG (SEQ ID NO:30), within the 3'UTR of a viral vector transgene cassette either alone or in combination with other cis-acting elements such as post-transcriptional regulatory elements (PREs) and/or ZCCHC14 binding loops. The invention can be used in two main contexts: [1] within viral vector genomes where 'cargo' space is not limiting, and therefore in target cells the CAR elements further enhance expression of a transgene cassette containing another 3'UTR element, such as wPRE, or [2] within viral vector genomes where cargo space is limiting (i.e. at or above or substantially above the packaging 'limit' of the viral vector system employed), and therefore the CAR elements may be used instead of a larger 3'UTR element, such as the wPRE, thus reducing vector genome size, whilst also imparting an increase to transgene expression in target cells compared to a vector genome lacking any 3'UTR cis-acting element. Simplistically, a person skilled in the art will empirically determine the 'net' benefit in pursuing either of these two options. The net benefit is essentially determined by the impact of total vector genome size on the maximum practical vector titres that can be achieved with or without the above stated combinations of cis-acting elements, whilst also leading to desirable levels of transgene expression in target cells to mediate therapeutic effect. For example, if a lentiviral vector is being employed to deliver a large transgene resulting in a vector genome size of over ~9.5kb (the size of wild type HIV-1) - for example 10-to-13kb - then the viral vector titres are likely to be severely reduced (Sweeney and Vink, 2021). Output titres may be orders of magnitude lower than titres of vector genomes of <9.5kb.
In such cases, it may be more advantageous to reduce the size of the vector genome by employing smaller cis-acting elements such as the CARe and/or ZCCHC14 binding loop(s) rather than larger elements such as the wPRE, even if transgene expression in the target cells is slightly or even modestly lower compared to use of wPRE (hence, the net benefit balances vector titre with transgene expression in target cells). For viral vector systems such as AAV and AdV, the threshold of vector genome packaging is much more stringent, with the optimum size for AAVs in the 4.7-5.3kb range (and in some cases ~6kb) (Wu et al., 2010). For AdVs, the maximum size of the vector genome is considered to be 34-37kb, and the available space for transgene sequences depends on whether 'gutted' or 1st/2nd generation vectors are being employed (Ricobaraza et al., 2020).
Example 1 : Positioning of novel cis-acting sequences in the 3'UTR of transgene cassettes within retroviral vector genomes (e.g. lentiviral vector genomes)
Generally the 3'UTR of a transgene cassette encoded within a retroviral vector genome - such as a lentiviral vector genome - also harbours elements required for reverse transcription and integration; namely, the 3' polypurine tract (3'ppt) and the DNA attachment (att) site, respectively. The 3'ppt is generated from viral RNA by the RNase H activity of RT plus-strand DNA during reverse transcription, resulting in a 15 nucleotide primer. The 3'ppt is highly conserved in most retroviruses and has been shown to be selectively used as the site of plus-strand initiation (Rausch and Grice, 2004). The att site is defined as those end sequences important for integration. The att site is comprised of U3 sequences at the terminus of the 5' LTR, and terminal U5 sequences at the end of the 3' LTR (Brown et al., 1999). In certain circumstances, the positioning of additional elements in the 3' UTR of the transgene cassette should be considered relative to these components when a retroviral vector is being used. For example, for integration-proficient retro/lentiviral vector genomes containing transgene cassettes in the same orientation as the vector genome, the transgene 3'UTR will necessarily contain the 3'ppt and U3 att sequences, since both the transgene cassette and vector genome cassette will utilise the same polyadenylation (transcriptional terminator) sequence within the 3'LTR.
Accordingly, where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, special care should be taken to ensure that the novel cis-acting sequences described herein (e.g. the CAR sequence, and/or the ZCCHC14 protein-binding sequence) are positioned within the 3'UTR of the transgene cassette such that the 3'ppt and att site are not disrupted. Suitable positions are shown in Figure 1. Other suitable positions may also be identified by a person of skill in the art, based on the disclosure provided herein.
For example the core sequence that comprises both the 3'ppt and the att site (e.g. of a lentiviral vector genome expression cassette as described herein) may have a sequence of SEQ ID NO:25 (wherein 3'ppt is in bold, and att is underlined):
5'-AAAAGAAAAGGGGGGACTGGAAGGGCTAATTCAC-3' (SEQ ID NO:25)
Accordingly, where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, it is preferable if the sequence above (of SEQ ID NO:25) is not disrupted by the novel cis-acting sequences described herein.
In one example, the sequence of SEQ ID NO: 26 may be used to provide the 3'ppt and att site (e.g. of a lentiviral vector genome expression cassette as described herein), (wherein 3'ppt is in bold, and att is underlined):
5'-CCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAA-3' (SEQ ID NO:26)
Accordingly, where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, it is preferable if the sequence above (of SEQ ID NO:26) is not disrupted by the novel cis-acting sequences described herein.
Preferably, where the transgene cassette is in a forward orientation with respect to the retroviral vector genome expression cassette, cis-acting elements within the 3'UTR of the transgene cassette may be positioned upstream and/or downstream of the above uninterrupted sequences. Figure 1 also indicates how the CARe and/or ZCCHC14 binding loop(s) may be variably positioned within LV genome expression cassettes with or without a PRE, with transgene sequences in forward or reverse orientation.
The above considerations regarding the positioning of the novel cis-acting sequences relative to the 3'ppt and att site of the viral genome expression cassette only apply when the transgene cassette is in the forward orientation with respect to the retroviral vector genome expression cassette. When the transgene cassette employing such novel cis-acting elements is inverted with respect to the retro/lentiviral vector genome cassette (see Figure 1, with "inverted" transgene cassette shown), the 3' UTR of the transgene is spatially separated from the 3'ppt and att site of the retroviral vector genome expression cassette, such that the novel cis-acting sequences cannot disrupt 3'ppt and att sites when placed at any position within the transgene 3'UTR.
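As an illustrative sketch only, a candidate insertion position for a forward-orientation cassette may be checked against the core 3'ppt/att sequence as follows. The sequences are SEQ ID NO:25, SEQ ID NO:26 and SEQ ID NO:30 as recited herein; the function names and the use of SEQ ID NO:26 as a stand-in for a complete 3'UTR are assumptions made for the purposes of the example.

```python
CORE_3PPT_ATT = "AAAAGAAAAGGGGGGACTGGAAGGGCTAATTCAC"  # SEQ ID NO:25
UTR_EXAMPLE = ("CCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGAC"
               "TGGAAGGGCTAATTCACTCCCAA")                # SEQ ID NO:26
CARE_TILE = "CCAGATCCTG"                                 # SEQ ID NO:30


def insertion_preserves_core(utr: str, position: int, core: str = CORE_3PPT_ATT) -> bool:
    """True if inserting at 0-based `position` leaves the core sequence intact."""
    start = utr.find(core)
    if start < 0:
        raise ValueError("core 3'ppt/att sequence not found in the 3'UTR")
    if not 0 <= position <= len(utr):
        raise ValueError("insertion position lies outside the 3'UTR")
    end = start + len(core)
    # Safe positions are at or before the first core base, or at or after its end.
    return position <= start or position >= end


def insert_element(utr: str, element: str, position: int, core: str = CORE_3PPT_ATT) -> str:
    """Insert a cis-acting element, refusing positions that would split the core."""
    if not insertion_preserves_core(utr, position, core):
        raise ValueError("insertion would disrupt the 3'ppt/att core sequence")
    return utr[:position] + element + utr[position:]


core_start = UTR_EXAMPLE.find(CORE_3PPT_ATT)
upstream = insert_element(UTR_EXAMPLE, CARE_TILE * 16, 0)
print(CORE_3PPT_ATT in upstream, insertion_preserves_core(UTR_EXAMPLE, core_start + 5))
```

The check is purely positional: it confirms that the core sequence remains uninterrupted and says nothing about whether a given position is otherwise functionally suitable, which remains a matter for empirical testing.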
Example 2: General positioning of CAR sequences with or without other cis-acting elements in the 3'UTR of transgene cassettes for empirical testing
Figure 2 provides greater detail on how CAR sequences may be used alone or in combination with ZCCHC14 protein-binding sequence(s), with or without a PRE within the 3'UTR of a transgene cassette of a viral vector. Figure 2A shows the CAR element consensus sequence (adapted from Lei et al., 2013). The nucleotide sequences and structures for ZCCHC14-binding loops from HCMV RNA2.7 and WHV wPRE are shown in Figure 2B. Recruitment of ZCCHC14 to the 3' region of the transgene mRNA results in a complex of ZCCHC14-Tent4, which enables mixed tailing within polyA tails of the polyadenylated mRNA. Mixed tailing increases the stability of the transgene mRNA in target cells. A plurality of CARe sequences, for example 6 or 8 or 10 or 16 CARe sequences (repeated tandemly), may be positioned throughout the 3'UTR and optionally mixed with one or more ZCCHC14 binding loop(s) upstream of the polyA sequence to generate a composite sequence that increases the expression of the transgene protein. This can be done by rational design, for example testing different transgene cassettes containing increasing numbers of CARe sequences, and optionally in combination with one or more ZCCHC14 protein-binding sequences at different positions. Alternatively, a (semi-)random library of transgene cassettes may be produced to generate thousands to tens of millions or more variants, for example using golden gate cloning in combination with nucleic acid barcoding. Such cassettes may be inserted directly into the viral vector of choice and transgene expression levels evaluated in target cells. Figure 2C gives an overview of how such variants may be tested, screened and selected for high transgene expression in target cells to identify candidate composite cis-acting elements.
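By way of a hedged example, candidate 10bp tiles may be checked against the CARe consensus (SEQ ID NO:24) and assembled into tandem arrays as sketched below. The IUPAC ambiguity codes are standard; the function names are hypothetical, and treating an inverted (antisense) array as the reverse complement of the sense array is an assumption made here for illustration.

```python
import re

# Standard IUPAC nucleotide codes needed for the consensus, as regex classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "B": "[CGT]", "M": "[AC]", "W": "[AT]", "H": "[ACT]", "S": "[CG]"}
CARE_CONSENSUS = "BMWGHWSSWS"  # SEQ ID NO:24
CARE_TILE = "CCAGATCCTG"       # SEQ ID NO:30


def matches_consensus(tile: str, consensus: str = CARE_CONSENSUS) -> bool:
    """True if every base of the tile is allowed by the consensus at that position."""
    pattern = re.compile("".join(IUPAC[symbol] for symbol in consensus))
    return pattern.fullmatch(tile.upper()) is not None


def tandem_array(tile: str, copies: int) -> str:
    """Head-to-tail repeat of a single tile."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return tile * copies


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


care_16t = tandem_array(CARE_TILE, 16)
print(matches_consensus(CARE_TILE), len(care_16t))
print(matches_consensus(reverse_complement(CARE_TILE)))
```

A 16-copy array of a 10bp tile is 160bp in length, and under the stated assumption the reverse complement of the exemplary tile no longer conforms to the consensus. A rational-design series may be generated by calling tandem_array with increasing copy numbers.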
Example 3: Use of novel CARe-containing 3'UTRs within lentiviral vector transgene cassettes to increase transgene expression in different target cells
Lentiviral vector genomes were designed to contain GFP expression cassettes driven by a variety of different promoters; namely, EF1a (containing its own intron), EFS (EF1a lacking the intron), human phosphoglycerate kinase (huPGK), Ubiquitin (UBC; containing its own intron) and UBCs (UBC lacking the intron). Transgene cassette 3'UTR variants were made by generally deleting the wPRE and adding a CAR sequence (160bp in size) composed of tandem 16x CARe sequences (CARe.16t) in this position ('Pos1'), and optionally inserting a ZCCHC14 protein-binding sequence (from HCMV RNA2.7 and WHV wPRE) downstream of position 1 ('Pos2'). Controls were made with wPRE-only, ΔwPRE or where the 160bp CAR sequence was inverted (CARe.inv16t). LVs were produced in suspension (serum-free) HEK293T cells by transient co-transfection, with packaging plasmids (gagpol, VSVG and rev for RRE/rev-dependent genomes). Some studies used standard LV genomes, whereas others used LV genomes wherein the major splice donor (MSD) site and adjacent cryptic splice donor (crSD) site within stem loop 2 (SL2) of the packaging signal have been mutated. This feature abolishes aberrant splicing from the SL2 region but results in a reduction in LV titres. Titres of these 'MSD-2KO' genomes can be recovered by use of co-expression of a modified U1 snRNA (example '256U1' used herein) that anneals to SL1 of the packaging signal (see WO 2021/014157 and WO 2021/160993, incorporated herein by reference).
Clarified crude harvest material from LV productions was titrated on adherent HEK293T cells by either flow cytometry of transduced cells ('Biological titre' GFP TU/mL) and/or by Integration assay (Integrating TU/mL). Upon attaining the integrating TU/mL of LV preps, alternative cells such as Jurkat (T-cell line), HEPG2 (Human hepatocyte carcinoma cell line) and equine primary cells '92BR' (testis fibroblasts) were transduced at matched multiplicities of infection (MOI), ranging from MOI 0.1-to-2. Transgene expression levels in these transduced cells were measured by flow cytometry and (GFP) transgene Expression Scores (ES) generated by multiplying %GFP-positive cells by the median fluorescence intensities (arbitrary units).
Tables 3 and 4 present the initial data for these experiments, and Figures 3-7 display some of these data graphically.
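The Expression Score calculation described above may be sketched as follows. The function name and the input values are hypothetical placeholders and are not data from the experiments reported herein.

```python
def expression_score(percent_gfp_positive: float, median_fluorescence: float) -> float:
    """Expression Score (ES) = %GFP-positive cells x median fluorescence intensity."""
    if not 0.0 <= percent_gfp_positive <= 100.0:
        raise ValueError("percentage of GFP-positive cells must lie between 0 and 100")
    if median_fluorescence < 0:
        raise ValueError("median fluorescence intensity must not be negative")
    return percent_gfp_positive * median_fluorescence


# Placeholder values: 40% GFP-positive cells, median fluorescence of 1500 arbitrary units.
print(expression_score(40.0, 1500.0))
```

Because the score combines the fraction of expressing cells with the intensity per cell, comparisons between constructs are most meaningful at matched MOI, as used in the experiments described.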
Figure imgf000121_0001
Figure imgf000122_0001
Table 3
Figure imgf000122_0002
Figure imgf000123_0001
Table 4
Tables 3 and 4. A summary of lentiviral vectors produced and transgene expression within selected transduced target cells. The tables display the types of lentiviral vectors produced: standard RRE/rev-dependent (STD RRE-LV), U1/RRE-dependent MSD-mutated LVs (MSD-2KOm5-RRE-LV [+256U1]) or MSD-mutated, RRE/rev-independent (ΔRRE) LVs (MSD-2KOm5-ΔRRE-LV); transgene promoters; and the stated 3'UTR cis-acting elements, positioned either upstream (Pos1) or downstream (Pos2) of each other when two cis-acting elements are employed. Production titres from suspension (serum-free) HEK293T cells are stated as biological (GFP TU/mL) or integrating (Integrating TU/mL) titres (crude harvest). Median fluorescence intensities (Arb units) are displayed at indicated matched MOIs from transduction of the stated target cells (92BR are donkey primary cells). As can be seen from the data, the novel elements described herein boost expression compared to the ΔwPRE control. ND - not done, NT - not tested. Different 3' UTR cis-acting elements were employed: wPRE, wPRE3 (shortened wPRE), ΔwPRE (wPRE deleted), 16x 10bp CARe sequences in sense (CARe.16t) or antisense (CARe.inv16t), and/or single copy of the ZCCHC14 stem loop from either HCMV RNA2.7 (HCMV.ZSL1) or WHV wPRE (WPRE.ZSL1).
A number of general trends can be observed from the data presented in Tables 3 and 4 and Figures 3-7. Firstly, the measure of LV titres by GFP ('biological' TU/mL) in adherent HEK293T cells tended to reveal the largest impacts of deleting the wPRE, and the partial 'rescue' of expression by the cis-acting sequences. This is to be expected, as loss of GFP transgene expression results in fewer cells being counted as transduced i.e. actual transduced cells appeared in the non-gated population. This is supported by the integration titre values, which are independent of transgene expression; here, LV titres were similar, although there were examples of 'rescued' integration titres when using CARe.16t with or without a ZCCHC14 protein-binding sequence (see Figures 3A, 3B, 6A). This trend generally held for intron-less promoters, since integrating titres for EF1a- or UBC- (both harbouring introns) containing LVs were actually slightly stimulated by deletion of wPRE. These effects were specific to the CAR sequences being employed in the sense orientation, since the inverted CAR sequences did not mediate the effect.
The benefit of the use of the CAR sequences with or without a ZCCHC14 protein-binding sequence in the context of wPRE-deleted LVs was tested in target cells different to adherent HEK293T cells, since the major purpose of viral vectors is to deliver transgenes to more relevant cell types rather than vector production cells. Initial experiments used Jurkat cells (to model T-cells i.e. for CAR-T therapy), HEPG2 cells (to model a liver indication) and a primary cell with limited cell doublings (92BR; a donkey testis cell). Interestingly, the largest effects of the CARe-based cis-acting elements were observed in these cell types. Figure 4 presents data in transduced Jurkat cells at three different MOIs, and also reports on transgene mRNA levels in the same experiment. These data show that the CARe.16t sequence was able to boost expression from LVs delivering an EFS-GFP cassette lacking the wPRE to the same levels observed with a cassette containing the wPRE. Surprisingly, the CARe.16t sequence provided a boost to expression greater than cassettes containing the wPRE when employing the EF1a promoter (i.e. containing an intron). This was not expected, given that CAR elements were generally expected to act on intronless transcripts in their natural context. Figure 4D confirmed that some aspect of all these effects was due to increased mRNA steady-state pools in transduced cells.
In Figure 6 the use of CARe.16t or a ZCCHC14 protein-binding sequence alone provided a substantial increase in expression in HEPG2 cells when using a wPRE-deleted LV. The inverted control did not provide this effect. Interestingly, the CARe.16t + ZCCHC14 protein-binding sequence cassette provided the highest level of expression in HEPG2 cells, which was greater than the wPRE-only LV control.
In Figure 6 the use of CARe.16t or ZCCHC14 protein-binding sequence alone provided an increase in expression in primary cells (92BR) when using a wPRE-deleted, EFS-GFP containing LV. The boost to expression was similar to that observed with the truncated wPRE element 'WPRE3'. For the EF1a-GFP containing LV genome, the boost to expression was greatest when both CARe.16t and ZCCHC14 protein-binding sequence were used together.
These elements could be used equally well when employing standard RRE/rev-dependent LV genomes, or when using the MSD-2KO vector genomes - either U1/rev-dependent (Figure 5) or RRE/rev-independent (ΔRRE) (Figure 6) versions (undisclosed).
Example 4: Pairing a ZCCHC14 protein-binding sequence with multiple numbers of CARe tiles leads to increased transgene expression in transduced cells
To further exemplify the pairing of the ZCCHC14 protein-binding sequence with multiple CARe tiles to generate novel minimal transgene expression enhancer sequences in the 3' UTR, several variants were constructed wherein a single (downstream) ZCCHC14 protein-binding sequence was paired with tandem arrays of the 10bp CARe sequence, ranging from a single copy up to 20 copies. These were inserted into an EFS-GFP expression cassette within a standard RRE/rev-dependent LV genome cassette and used to produce LVs in suspension (serum-free) HEK293T cells. Control LVs containing wPRE, no cis-acting element (ΔwPRE), the 16 x 10bp CARe variant (CARe.16t) or a single ZCCHC14 protein-binding sequence were also produced. Clarified LV supernatants were titrated on adherent HEK293T cells to generate integrating titres (TU/mL). Following this, fresh adherent HEK293T cells were transduced at matched MOIs of 0.5, 1 or 2, and cultures incubated for 3 days. GFP expression in transduced cells was measured by flow cytometry, and median fluorescence intensity values generated (arbitrary units). These were normalised to the wPRE-containing control (set to 100%). These data are displayed in Figure 8, and show that both the single ZCCHC14 protein-binding sequence and CARe.16t elements boosted expression on their own. Pairing of the single ZCCHC14 protein-binding sequence with multiple CARe tiles produced a further increase in transgene expression in a generally correlative manner (i.e. increasing CARe tile numbers produced a greater increase in transgene expression), although perhaps the greatest increase in expression was realised at 8 copies of the CARe tiles. The CARe.8t element in combination with the single ZCCHC14 protein-binding sequence was ~152bp, generating transgene expression of ~60% of that of the wPRE control, despite being only 20% of the size of the wPRE.
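The normalisation step described above (each median fluorescence intensity expressed relative to the wPRE-containing control, set to 100%) can be sketched as follows. This is a minimal illustration only: the function name `normalise_to_control` and all numerical values are hypothetical and are not data from Figure 8.

```python
def normalise_to_control(mfi_by_vector, control="wPRE"):
    """Express each median fluorescence intensity (arbitrary units)
    as a percentage of the control vector's value."""
    control_mfi = mfi_by_vector[control]
    if control_mfi <= 0:
        raise ValueError("control MFI must be positive")
    return {name: 100.0 * mfi / control_mfi
            for name, mfi in mfi_by_vector.items()}


# Hypothetical arbitrary-unit values, for illustration only
# (assumed numbers, not measurements from the patent).
example_mfi = {"wPRE": 2000.0, "ΔwPRE": 500.0, "CARe.8t+ZCCHC14": 1200.0}
percent_of_wpre = normalise_to_control(example_mfi)
```

Expressing every vector relative to a control measured in the same experiment makes results comparable across MOIs and instrument settings, which is presumably why the wPRE control is set to 100% in each figure.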
Example 5: Transgene 3'UTRs bearing multiple numbers of CARe tiles optionally paired with a ZCCHC14 protein-binding sequence leads to increased transgene expression in transduced Jurkat cells.
To further exemplify the pairing of the ZCCHC14 protein-binding sequence (ZSL1) with multiple CARe tiles to generate novel minimal transgene expression enhancer sequences in the 3' UTR, several variants were constructed wherein a single (downstream) ZCCHC14 protein-binding sequence was paired with tandem arrays of the 10bp CARe sequence CCAGTTCCTG (SEQ ID NO:29), ranging from a single copy up to 20 copies. These were inserted into an EFS-GFP expression cassette within a standard RRE/rev-dependent LV genome cassette and used to produce LVs in suspension (serum-free) HEK293T cells. Control LVs containing wPRE, no cis-acting element (ΔwPRE), the 16 x 10bp CARe variant (CARe.16t) or a single ZCCHC14 protein-binding sequence were also produced. Clarified LV supernatants were titrated on adherent HEK293T cells to generate integrating titres (TU/mL) (Figure 9). Following this, fresh Jurkat cells were transduced at matched MOIs of 0.5 or 1, and cultures incubated for 3 days. GFP expression in transduced cells was measured by flow cytometry, and median fluorescence intensity values generated (arbitrary units). These were normalised to the wPRE-containing control (set to 100%). These data are displayed in Figure 9, and show a similar pattern of expression in the T-cell line compared to that of the HEK293T cells in Figure 8. Having no 3'UTR element (ΔwPRE) had less impact on transgene expression in Jurkat cells, with transgene expression dropping to 40% compared to wPRE. However, similarly to HEK293T cells, there was an increase in transgene expression as more CARe tiles were used, particularly from 8x to 20x. Moreover, when at least 12 tiles were used, the level of transgene expression was greater than that of the wPRE control. In addition, when paired with the ZSL1 sequence, the highest transgene expression was observed with as few as 16x CARe tiles, and comparable expression to wPRE was achieved when ZSL1 was paired with just 8x CARe tiles.
Thus, the CARe.8t or CARe.16t elements in combination with the single ZCCHC14 protein-binding sequence were ~150bp and ~230bp respectively, generating transgene expression equal to, or 160% of, that of the ~590bp wPRE control, respectively.
Example 6: Assessment of CARe consensus sequence variants as 16x tiles in combination with a ZCCHC14 protein-binding sequence on transgene expression in transduced Jurkat cells.
The CARe consensus sequence generated by Lei et al., 2013 - and tested in the present invention within transgene 3'UTRs - was engineered to reflect the variance within the consensus to generate alternative 'synthetic' tiles. These are shown in Figure 10, indicating how each variant differed within this semi-degenerate consensus sequence. Since it is shown in the present invention that 8x-to-16x copies of the CARe consensus tile produced increased transgene expression in a variety of different cell types/lines, these synthetic variants were tandemised as 16x tiles and tested in combination with the HCMV.ZSL1 loop. Another variant was tested in which the CARe consensus 16x tile fragment was mutated in every other tile (i.e. alternate) to assess the impact of mutation across the CARe.16t fragment (Cons/mut5). These were inserted into an EFS-GFP expression cassette within a standard RRE/rev-dependent LV genome cassette and used to produce LVs in suspension (serum-free) HEK293T cells. Control LVs containing wPRE, no cis-acting element (ΔwPRE), the 16 x 10bp CARe variant (CARe.16t) or a single ZCCHC14 protein-binding sequence were also produced. Clarified LV supernatants were titrated on adherent HEK293T cells to generate integrating titres (TU/mL) (data not shown). Following this, fresh Jurkat cells were transduced at a matched MOI of 1, and cultures incubated for 10 days. GFP expression in transduced cells was measured by flow cytometry, and median fluorescence intensity values generated (arbitrary units). Host cell DNA was extracted and qPCR to HIV-1 Psi performed to generate vector-copy-number. Thus, Figure 10 displays transgene expression normalised to vector-copy-number (VCN). These data identify variants 4, 7, 10, and 12 as being able to boost transgene expression in the T-cell line, compared to no 3'UTR element, with variant 4 approaching similar expression levels compared to wPRE.
The variant containing alternate mutated CARe tiles (cons/mut5) performed similarly to the CARe.16t consensus control without the ZSL1 loop (which produced slightly higher levels of expression than wPRE). Together, these data indicate that the CARe consensus tile is the optimal 'version', and that each tile contributes to expression activity in a cumulative manner, since if only some consensus tiles were 'active' whilst others were acting as 'spacers', then the cons/mut5+ZSL1 variant would be expected to be as active as the CARe.16t+ZSL1 control. However, the cons/mut5 variant boosted expression to 2/3rds that of the CARe.16t+ZSL1 control, which is more in line with the activity of the CARe.8t+ZSL1 variant in Figure 9. In other words, only the 8 copies of the CARe consensus within the cons/mut5 variant were fully active. This also provides strong evidence that spacing nucleotides of up to 8 in length may be inserted between tandem CARe consensus tiles without affecting the total cumulative effect of the element on transgene expression. For this reason, and given that one aspect of the invention is to minimise sequence length for use in gene therapy vector transgene cassettes, the inventors did not pursue further variants containing spacers between CARe consensus tiles, as this would unnecessarily increase total element size.
Further CARe consensus variants were generated based on 'native' sequences as per Lei et al., 2013, namely, those found in c-Jun, HSPB3, IFNalpha and IFNbeta. These are displayed in Figure 11. These variants were tandemised into 16x CARe elements and paired with the HCMV/ZSL1 loop. These were inserted into an EFS-GFP expression cassette within a standard RRE/rev-dependent LV genome cassette and used to produce LVs in suspension (serum-free) HEK293T cells. The same set of experiments in Jurkat cells as described above was carried out, and data reported in Figure 11. In this case, variant 2 from c-Jun and variant 1 of IFNal provided similar levels of expression compared to wPRE.
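As an illustration of how a 10 nt tile can be checked against the semi-degenerate CARe consensus, the following minimal sketch expands the standard IUPAC nucleotide ambiguity codes and tests tiles position by position against BMWGHWSSWS (SEQ ID NO: 24) and BMWRHWSSWS (SEQ ID NO: 55). The function and variable names are hypothetical; sequence membership of a consensus says nothing about expression activity, which was determined experimentally as described above.

```python
# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

CONSENSUS_SEQ24 = "BMWGHWSSWS"  # SEQ ID NO: 24
CONSENSUS_SEQ55 = "BMWRHWSSWS"  # SEQ ID NO: 55


def matches_consensus(tile, consensus):
    """Return True if every base of `tile` is permitted by the IUPAC
    code at the same position of `consensus`."""
    if len(tile) != len(consensus):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(tile.upper(), consensus.upper()))


# Tiles taken from the sequence listing of this document.
tiles = ["CCAGTTCCTG", "CCTGTTCCTG", "CCAATTCCTG"]
report = {t: (matches_consensus(t, CONSENSUS_SEQ24),
              matches_consensus(t, CONSENSUS_SEQ55)) for t in tiles}
```

In this sketch CCAATTCCTG (SEQ ID NO: 66) fits only the broader consensus of SEQ ID NO: 55, because R permits A or G at the fourth position whereas SEQ ID NO: 24 requires G.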
Example 7: Use of CARe/ZSL1 variant as 3'UTR element within rAAV vectors.
To show the utility of the novel CARe/ZSL1 ('CAZL') composite element(s) in other viral vector systems, rAAV vector genomes were engineered to contain example CAZL elements, so that they could be compared with 'empty' cassettes or cassettes using the wPRE, which is ~590 nts in length. The use of CAZL elements may be especially useful in rAAVs where the genome packaging size limit is a 'hard' one, at ~5kb. Figure 12 shows a schematic of these example rAAV vectors and of those generated for exemplification, with the CAZL elements tested ranging in size: 141 nt (CARe.4t/ZSL1), 181 nt (CARe.8t/ZSL1) and 261 nt (CARe.16t/ZSL1). Note that these variants contain 'stuffer' sequence between the tandemised CARe tiles and the ZSL1 sequence due to cloning sites, and thus may in practice be reduced further in size. Additional controls included the inverted CAZL element for each type to control both for rAAV genome size and for CAZL functionality, since the inverted elements are not expected to be functional. Two sets of the rAAVs were generated, wherein the GFP transgene was driven by the CMV or EFS promoters. rAAVs were produced by transient transfection of HEK293T cells, followed by qPCR titration (see details in Figure 13 and Materials/Methods). HEPG2 cells were transduced with the different rAAV vector stocks at matched MOIs (250 and 500; see details in Figure 13 and Materials/Methods). Transgene expression data are presented in Figure 13 and demonstrate that the CAZL elements were able to increase transgene expression in HEPG2 cells compared to the empty control, and for the CMV-GFP cassette, the CARe.8t/ZSL1 and CARe.16t/ZSL1 CAZL variants enabled ~2-fold greater expression than the wPRE, despite being less than half the size of wPRE. Some variance was observed when employing different promoters, with the CARe.4t/ZSL1 increasing expression when using the EFS promoter to similar levels as the other two CAZL variants.
Importantly, the inverted CAZL variants expressed transgene at similar levels as the empty control in all cases, demonstrating that orientation of the CAZL elements in the forward sense is necessary (i.e. these are functional sequences) and that the observed improved expression could not be explained by merely the increase in vector genome size or transgene cassette size relative to the empty control.
These data demonstrate that these novel, short CARe-ZSL1 composite elements function as 3'UTR transgene expression enhancers within rAAV vectors where cargo space is limited, and that expression may be boosted to similar or greater levels as observed for wPRE-containing rAAVs.
Sequences tested in rAAVs. CARe 10nt consensus tandemised tiles in bold. ZSL1 sequence underlined. These are non-limiting examples, and it is recognised that intervening sequence between tandemised CARe tiles and the ZSL1 may be deleted, or reduced in length to accommodate different restriction enzyme sites.
CARe.4t/ZL1 (SEQ ID NO:69):
CCAGTTCCTGCCAGTTCCTGCCAGTTCCTGCCAGTTCCTGACTAGGTACCTCGAGCGGATCCCATCGATTGCCGTCGCCACCGCGTTATCCGTTCCTCGTAGGCTGGTCCTGGGGAACGGGTCGGCGGCCGGTCGGCTTCT
CARe.8t/ZL1 (SEQ ID NO:70):
CCAGTTCCTGCCAGTTCCTGCCAGTTCCTGCCAGTTCCTGCCAGTTCCTGCCAGTTCCTGCCAGTTCCTGCCAGTTCCTGACTAGGTACCTCGAGCGGATCCCATCGATTGCCGTCGCCACCGCGTTATCCGTTCCTCGTAGGCTGGTCCTGGGGAACGGGTCGGCGGCCGGTCGGCTTCT
CARe.16t/ZL1 (SEQ ID NO:71):
Figure imgf000130_0001
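The modular layout of the sequences listed above (tandem CARe tiles, then cloning-site 'stuffer', then the ZSL1 sequence) can be illustrated with the following minimal sketch, which reassembles the elements and reports their lengths. Several points are assumptions: the boundary between stuffer and ZSL1 is inferred by taking the ZSL1 portion to be SEQ ID NO:42, since the bold and underline markings are not available here; the 16-tile output is constructed by analogy because SEQ ID NO:71 is shown only as an image; and the name `build_cazl` is hypothetical.

```python
CARE_TILE = "CCAGTTCCTG"  # 10 nt CARe consensus tile (SEQ ID NO:29)

# Intervening cloning-site sequence, inferred from SEQ ID NO:69 and 70
# (assumption: everything between the last tile and SEQ ID NO:42).
STUFFER = "ACTAGGTACCTCGAGCGGATCCCATCGAT"

# ZCCHC14 protein-binding sequence as recited for SEQ ID NO:42.
ZSL1 = ("TGCCGTCGCCACCGCGTTATCCGTTCCTCGTAGGCTGGTCCTGGGGAACGGGTCGG"
        "CGGCCGGTCGGCTTCT")


def build_cazl(n_tiles, stuffer=STUFFER):
    """Concatenate n tandem CARe tiles, an optional stuffer and ZSL1."""
    if n_tiles < 1:
        raise ValueError("at least one CARe tile is required")
    return CARE_TILE * n_tiles + stuffer + ZSL1


# Element lengths in nucleotides for 4, 8 and 16 tiles.
lengths = {n: len(build_cazl(n)) for n in (4, 8, 16)}

# Length of an 8-tile element with the stuffer removed entirely.
minimal_8t_length = len(build_cazl(8, stuffer=""))
```

Under these assumptions the computed lengths agree with the 141 nt, 181 nt and 261 nt sizes stated in Example 7, and removing the stuffer from the 8-tile element gives 152 nt, consistent with the ~152bp figure given in Example 4.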
MATERIALS/METHODS
Cell culture conditions
HEK293T suspension cells (HEK293Ts) were grown in Freestyle™ 293 Expression Medium (Gibco) supplemented with 0.1% Cholesterol Lipid Concentrate (Gibco) and incubated at 37 °C in 5% CO2, in a shaking incubator (25 mm orbit set at 190 RPM). HEK293T adherent cells were maintained in complete media (Dulbecco's Modified Eagle Medium (DMEM) (Sigma) supplemented with 10% heat-inactivated fetal bovine serum (Gibco), 2 mM L-glutamine (Sigma) and 1% non-essential amino acids (NEAA) (Sigma)), at 37 °C in 5% CO2. Jurkat cells were grown in RPMI media supplemented with 10% heat-inactivated fetal bovine serum at 37 °C in 5% CO2.
Suspension cell culture, transfection and lentiviral vector production
All vector production was carried out in HEK293Ts, in 24-well plates (1 mL volumes, on a shaking platform) or 30mL shake flasks. HEK293Ts cells were seeded at 8 x 10^5 cells per mL in serum-free media and were incubated at 37 °C in 5% CO2, shaking, throughout vector production. Approximately 24 hours after seeding the cells were transfected using the following mass ratios of plasmids per effective final volume of culture at transfection: 0.95 μg/mL Genome, 0.1 μg/mL Gag-Pol, 0.06 μg/mL Rev, 0.07 μg/mL VSV-G. Transfection was mediated by mixing DNA with Lipofectamine 2000CD in Opti-MEM as per manufacturer's protocol (Life Technologies). Sodium butyrate (Sigma) was added ~18 hrs later to 10 mM final concentration. Typically, vector supernatant was harvested 20-24 hours later, and then filtered (0.22 μm) and frozen at -80 °C. Vector used for transduction of Jurkat cells was either produced in the absence of sodium butyrate or centrifuged at 20,000 rpm for 1 h 30 min and the vector pellet resuspended in TSSM (Tromethamine, Sodium Chloride, Sucrose, Mannitol buffer).
Lentiviral vector titration assay
For lentiviral vector titration by GFP marker-containing cassette, HEK293T cells were seeded at 1.2 x 10^4 cells/well in 96-well plates. GFP-encoding viral vectors were used to transduce the cells in complete media containing 8 μg/mL polybrene and 1 x Penicillin Streptomycin for approximately 5-6 hours, after which fresh media was added. The transduced cells were incubated for 2 days at 37 °C in 5% CO2. Cultures were then prepared for flow cytometry using an Attune-NxT® (Thermofisher). Percent GFP expression was measured and vector titres were calculated using a predicted cell count of 2 x 10^4 cells at the time of transduction (based on typical growth rate), the dilution factor of the vector sample, the percentage positive GFP population and total volume at transduction.
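The text above lists the factors used for the GFP-based titre but does not state the equation. A commonly used form, given here as a hedged sketch rather than as the exact calculation used by the inventors, multiplies the number of GFP-positive cells by the dilution factor and divides by the transduction volume. The function name and the example inputs are hypothetical.

```python
def gfp_titre_tu_per_ml(percent_gfp_positive, dilution_factor, volume_ml,
                        cells_at_transduction=2e4):
    """Estimate a functional titre (TU/mL) from flow cytometry data.

    cells_at_transduction defaults to the predicted count of 2 x 10^4
    cells per well stated in the text; the formula itself is an
    assumption (a widely used convention), not quoted from the source.
    """
    if not 0.0 <= percent_gfp_positive <= 100.0:
        raise ValueError("percent_gfp_positive must be between 0 and 100")
    if dilution_factor <= 0 or volume_ml <= 0:
        raise ValueError("dilution_factor and volume_ml must be positive")
    transduced_cells = cells_at_transduction * percent_gfp_positive / 100.0
    return transduced_cells * dilution_factor / volume_ml


# Hypothetical well: 10% GFP-positive cells, 1:100 dilution, 0.1 mL volume.
example_titre = gfp_titre_tu_per_ml(10.0, 100.0, 0.1)
```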
For lentiviral vector titration by integration assay, 0.5mL volumes of neat to 1:5 diluted vector supernatants were used to transduce 1 x 10^5 HEK293T cells at 12-well scale in the presence of 8μg/mL polybrene. Cultures were passaged for 10 days (1:5 splits every 2-3 days) before host DNA was extracted from 1 x 10^6 cell pellets. Duplex quantitative PCR was carried out using a FAM primer/probe set to the HIV packaging signal (Ψ) and to RRP1, and vector titres (TU/mL) calculated using the following factors: transduction volume, vector dilution, RRP1-normalised HIV-1 Ψ copies detected per reaction.
Transduction at matched multiplicity of infection (MOI)
Calculation of the vector volume (μL) required to achieve MOIs ranging from 2 to 0.1 was based on the integration titre calculated in HEK293T cells. HEK293T cells were seeded at 9E4 cells per well in 12-well plates. Jurkat cells were seeded at 2E5 cells per well in 12-well plates. The following day, cells were recounted and MOI calculated based on integration titre and number of cells per well. Cells were passaged for 10 days with splits every 2-3 days. Flow cytometry, using an Attune-NxT® (Thermofisher), was performed at each split to determine the transgene (GFP) mean fluorescence intensity (MFI). At the end of 10 days, a duplex quantitative PCR (see above) was performed to confirm the HIV-1 Ψ copies per cell. If necessary, the MFI values were normalised to the number of HIV-1 Ψ copies per cell.
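The two calculations described above, the vector volume needed for a target MOI and the optional normalisation of MFI to vector copies per cell, can be sketched as follows. MOI is taken here as transducing units per cell; the function names and the example titre are hypothetical and are not values from the experiments.

```python
def vector_volume_ul(target_moi, cells_per_well, titre_tu_per_ml):
    """Volume of vector stock (in microlitres) that delivers
    target_moi transducing units per cell to one well."""
    if titre_tu_per_ml <= 0:
        raise ValueError("titre must be positive")
    tu_needed = target_moi * cells_per_well
    return 1000.0 * tu_needed / titre_tu_per_ml


def mfi_per_vector_copy(mfi, psi_copies_per_cell):
    """Normalise a fluorescence intensity to HIV-1 Psi copies per cell."""
    if psi_copies_per_cell <= 0:
        raise ValueError("copies per cell must be positive")
    return mfi / psi_copies_per_cell


# Hypothetical stock of 1 x 10^7 TU/mL (assumed value), applied to the
# seeding densities given in the text.
jurkat_volume = vector_volume_ul(1.0, 2e5, 1e7)   # MOI 1, 2E5 Jurkat cells
hek_volume = vector_volume_ul(0.5, 9e4, 1e7)      # MOI 0.5, 9E4 HEK293T cells
```

Because the cells are recounted on the day of transduction, in practice the recounted number rather than the seeding density would be passed as `cells_per_well`.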
Cytosolic mRNA quantification
Sub-cellular fractionation into nuclear and cytosolic fractions was performed by lysing the cells with a hypotonic buffer (20mM Tris-HCl + 150mM NaCl) supplemented with Igepal at a final concentration of 0.2%. RNA extraction from the cytosolic fraction was performed using the RNeasy kit (Qiagen), followed by DNAse treatment (ezDNase™, Invitrogen) and reverse transcription (SuperScript™ IV VILO™, Invitrogen). Quantification of cytosolic transgene mRNA was done by qPCR with primers and probes targeting GFP, and GAPDH as a normalisation control. Analysis was done by comparative quantification using the delta CT method.
rAAV vector production
AAV vector was produced with the AAV-MAX Helper-Free AAV Production System (ThermoFisher). The manufacturer's protocol was followed with the exception that the production cells used were 1.65s cells. Cells were seeded at 3E+06 live cells/mL in Freestyle + 0.1% Cholesterol (hereafter FS + 0.1% CLC) in a total volume of 20 mL and incubated in a Multitron set to 37°C with 5% CO2 and 200 rpm until transfection mixes were ready. A total of 1.5 μg/mL of DNA was transfected into cells at a molar ratio of 1:1:1 (Transfer: AAV2Rep/Cap: Helper). Transfer plasmids used were pscAAV2-CMV-GFP and pscAAV2-EFS-GFP. After transfection, cells were returned to the Multitron and incubated for 72 hours before harvest with AAV-MAX lysis buffer (Thermo Fisher) following manufacturer's protocol.
rAAV vector titration
The number of genome-containing particles present in the AAV preparation was determined by TaqMan Real-time PCR assay. Prior to the qPCR, vector was treated with DNAse and exonuclease in order to remove residual DNA. These enzymes were then heat-inactivated with a 95°C step, which simultaneously lyses AAV capsids to release vector genomes to allow analysis by PCR. TaqMan primers and probe were designed to target GFP.
rAAV transduction at matched multiplicity of infection (MOI)
HepG2 cells were seeded at 2E5 cells/well in MEM supplemented with 10% FBS, L-glutamine (2mM) and 1% NEAA in 12-well plates. The following day, cells were recounted, and MOI calculated based on the genome copy number/mL determined by qPCR and the number of cells. HepG2 cells were transduced at MOIs of 250 and 500. Approximately 72 hours after infection, flow cytometry using an Attune-NxT® (Thermofisher) was performed to determine the percentage of GFP-positive cells and the transgene (GFP) mean fluorescence intensity (MFI).
Sequences
Figure imgf000132_0001
Figure imgf000133_0001
Figure imgf000134_0001
Figure imgf000135_0001
Figure imgf000136_0001
BMWGHWSSWS (SEQ ID NO: 24)
Figure imgf000136_0002
HCMV-RNA2.7 PRE α "like" element (fragment IE in Kim et al., 2020) :
Figure imgf000136_0003
CMAGHWSSTG (SEQ ID NO: 28)
CCAGTTCCTG (SEQ ID NO: 29)
CCAGATCCTG (SEQ ID NO: 30)
CCAGTTCCTG (SEQ ID NO: 31)
TCAGATCCTG (SEQ ID NO: 32)
CCAGATGGTG (SEQ ID NO: 33)
CCAGTTCCAG (SEQ ID NO: 34)
CCAGCAGCTG (SEQ ID NO: 35)
CAAGCTCCTG (SEQ ID NO: 36)
Figure imgf000137_0001
Figure imgf000138_0001
BMWRHWSSWS (SEQ ID NO: 55)
TCAGTTCCTG (SEQ ID NO: 56) - variant 1
GCAGTTCCTG (SEQ ID NO: 57) - variant 2
CAAGTTCCTG (SEQ ID NO: 58) - variant 3
CCTGTTCCTG (SEQ ID NO: 59) - variant 4
CCTGCTCCTG (SEQ ID NO: 60) - variant 6
CCTGTACCTG (SEQ ID NO:61) - variant 7
Figure imgf000139_0001
References
Zufferey et al., J Virol. 1999 Apr;73(4):2886-2892.
Sertkaya et al., Sci Rep. 2021 Jun 8;11(1):12067.
Vink et al., Mol Ther. 2017 Aug 2;25(8):1790-1804.
Tornabene and Trapani (2020) Human Gene Therapy Vol. 31, No. 1-2. Reviews.
Choi et al., Mol Brain. 2014;7:17.
Lei et al., Proc Natl Acad Sci USA. 2011 Nov 1;108(44):17985-90.
Lei et al., Nucleic Acids Res. 2013 Feb 1;41(4):2517-25.
Kim et al., Nat Struct Mol Biol. 2020 Jun;27(6):581-588.
Sweeney and Vink, Mol Ther Methods Clin Dev. 2021 Apr 16;21:574-584.
Wu et al., Mol Ther. 2010 Jan;18(1):80-6.
Ricobaraza et al., Int J Mol Sci. 2020 May;21(10):3643.
Rausch and Grice, Int J Biochem Cell Biol. 2004 Sep;36(9):1752-66.
Brown et al., J Virol. 1999 Nov;73(11):9011-9020.

Claims

1. A nucleotide sequence comprising a transgene expression cassette, wherein the 3' UTR of the transgene expression cassette comprises at least one cis-acting sequence selected from: a) a cis-acting Cytoplasmic Accumulation Region (CAR) sequence, comprising at least one CAR element (CARe) sequence; and/or b) a cis-acting ZCCHC14 protein-binding sequence, comprising at least one CNGGN-type pentaloop sequence, wherein the cis-acting ZCCHC14 protein-binding sequence does not comprise a full length post-transcriptional regulatory element (PRE) α element and does not comprise a full length PRE γ element.
2. The nucleotide sequence of claim 1, wherein the CAR sequence comprises a plurality of CARe sequences.
3. The nucleotide sequence of claim 1 or claim 2, wherein the plurality of CARe sequences are in tandem.
4. The nucleotide sequence of claim 2 or 3, wherein the CAR sequence comprises at least two, at least four, at least six, at least eight, at least ten, at least twelve, at least fourteen, at least sixteen, at least eighteen, or at least twenty CARe sequences, optionally wherein the CARe sequences are in tandem.
5. The nucleotide sequence of claim 4, wherein the CAR sequence comprises at least six CARe sequences in tandem, or at least ten CARe sequences in tandem.
6. The nucleotide sequence of claim 5, wherein the CAR sequence comprises at least eight CARe sequences in tandem, at least twelve CARe sequences in tandem, or at least sixteen CARe sequences in tandem.
7. The nucleotide sequence of any one of the preceding claims, wherein the CARe nucleotide sequence is BMWGHWSSWS (SEQ ID NO: 24) or BMWRHWSSWS (SEQ ID NO: 55).
8. The nucleotide sequence of claim 7, wherein the CARe nucleotide sequence is CMAGHWSSTG (SEQ ID NO:28).
9. The nucleotide sequence of claim 7, wherein the CARe nucleotide sequence is selected from the group consisting of: CCAGTTCCTG (SEQ ID NO:29), CCAGATCCTG (SEQ ID NO:30), CCAGTTCCTG (SEQ ID NO:31), TCAGATCCTG (SEQ ID NO:32), CCAGATGGTG (SEQ ID NO:33), CCAGTTCCAG (SEQ ID NO:34), CCAGCAGCTG (SEQ ID NO:35), CAAGCTCCTG (SEQ ID NO:36), CAAGATCCTG (SEQ ID NO:37), CCTGAACCTG (SEQ ID NO:38), CAAGAACGTG (SEQ ID NO:39), TCAGTTCCTG (SEQ ID NO: 56), GCAGTTCCTG (SEQ ID NO: 57), CAAGTTCCTG (SEQ ID NO: 58), CCTGTTCCTG (SEQ ID NO: 59), CCTGCTCCTG (SEQ ID NO: 60), CCTGTACCTG (SEQ ID NO:61), CCTGTTGCTG (SEQ ID NO: 62), CCTGTTCGTG (SEQ ID NO: 63), CCTGTTCCAG (SEQ ID NO: 64), CCTGTTCCTG (SEQ ID NO: 65), CCAATTCCTG (SEQ ID NO: 66) and GAAGCTCCTG (SEQ ID NO: 67); optionally wherein the CARe nucleotide sequence is selected from the group consisting of: CCAGTTCCTG (SEQ ID NO:29), CCTGTTCCTG (SEQ ID NO: 59), CCTGTACCTG (SEQ ID NO:61), CCTGTTCCAG (SEQ ID NO: 64), CCAATTCCTG (SEQ ID NO: 66), CCTGAACCTG (SEQ ID NO:38), CCAGTTCCTG (SEQ ID NO: 31) and CCAGTTCCAG (SEQ ID NO:34).
10. The nucleotide sequence of claim 9, wherein the CARe nucleotide sequence is CCAGTTCCTG (SEQ ID NO:29).
11. The nucleotide sequence of any one of the preceding claims, wherein the 3' UTR of the transgene expression cassette does not comprise additional post-transcriptional regulatory elements (PREs).
12. The nucleotide sequence of any one of claims 1 to 10, wherein the 3' UTR of the transgene expression cassette comprises at least one additional post-transcriptional regulatory element (PRE).
13. The nucleotide sequence of claim 12, wherein the additional PRE is a Woodchuck hepatitis virus PRE (wPRE).
14. The nucleotide sequence of any one of the preceding claims, wherein the cis-acting ZCCHC14 protein-binding sequence is a PRE α element fragment.
15. The nucleotide sequence of claim 14, wherein the cis-acting ZCCHC14 protein-binding sequence is: a) a fragment of a HBV PRE α element; or b) a fragment of HCMV RNA 2.7; or c) a fragment of a wPRE α element.
16. The nucleotide sequence of claim 14 or 15, wherein the PRE α element fragment is no more than 200 nucleotides in length.
17. The nucleotide sequence of claim 16, wherein the PRE α element fragment is no more than 90 nucleotides in length.
18. The nucleotide sequence of any one of the preceding claims, wherein the CNGGN- type pentaloop sequence is comprised within a stem-loop structure having a sequence selected from the group consisting of:
(i) TCCTCGTAGGCTGGTCCTGGGGA (SEQ ID NO:40); and
(ii) GCCCGCTGCTGGACAGGGGC (SEQ ID NO:41).
19. The nucleotide sequence of any one of the preceding claims, wherein the CNGGN- type pentaloop sequence is comprised within a heterologous stem-loop structure.
20. The nucleotide sequence of claim 18 or 19, wherein the ZCCHC14 protein-binding sequence comprises a sequence selected from the group consisting of:
(i) TGCCGTCGCCACCGCGTTATCCGTTCCTCGTAGGCTGGTCCTGGGGAACGGGTCGGCGGCCGGTCGGCTTCT (SEQ ID NO: 42); and
(ii) CTATTGCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGGGGCTCGGCTGTTGGGCACTGACAATTCCGTGGTGTTGT (SEQ ID NO: 43).
21. The nucleotide sequence of any one of the preceding claims, wherein the 3' UTR of the transgene expression cassette comprises a cis-acting CAR sequence and a cis-acting ZCCHC14 protein-binding sequence, optionally wherein the ZCCHC14 protein-binding sequence is located 3' to the CAR sequence.
22. The nucleotide sequence of any one of the preceding claims, wherein the 3' UTR of the transgene expression cassette comprises at least two spatially distinct cis-acting CAR sequences and/or at least two spatially distinct cis-acting ZCCHC14 protein-binding sequences.
23. The nucleotide sequence of any one of the preceding claims, wherein the 3' UTR of the transgene expression cassette further comprises a polyA sequence located 3' to the cis-acting CAR sequence and/or cis-acting ZCCHC14 protein-binding sequence.
24. The nucleotide sequence of any one of the preceding claims, wherein the transgene expression cassette further comprises a promoter operably linked to the transgene.
25. The nucleotide sequence of claim 24, wherein the promoter lacks its native intron, optionally wherein the promoter is selected from the group consisting of: an EFS promoter, a PGK promoter, and a UBCs promoter.
26. The nucleotide sequence of claim 24, wherein the promoter comprises an intron, optionally wherein the promoter is selected from the group consisting of: an EF1a promoter and a UBC promoter.
27. The nucleotide sequence of any one of the preceding claims, wherein the transgene expression cassette is a viral vector transgene expression cassette.
28. The nucleotide sequence of claim 27, wherein the viral vector transgene expression cassette is selected from the group consisting of: a retroviral vector transgene expression cassette, an adenoviral vector transgene expression cassette, an adeno-associated viral vector transgene expression cassette, a herpes simplex viral vector transgene expression cassette, and a vaccinia viral vector transgene expression cassette.
29. The nucleotide sequence of claim 28, wherein the viral vector transgene expression cassette is a retroviral vector transgene expression cassette or an adeno-associated viral vector transgene expression cassette.
30. The nucleotide sequence of claim 29, wherein the retroviral vector transgene expression cassette is a lentiviral vector transgene expression cassette.
31. A nucleotide sequence comprising a viral vector genome expression cassette, wherein the viral vector genome expression cassette comprises the nucleotide sequence of any one of claims 1 to 30.
32. The nucleotide sequence of claim 31, wherein the viral vector genome expression cassette is selected from the group consisting of: a retroviral vector genome expression cassette, an adenoviral vector genome expression cassette, an adeno-associated viral vector genome expression cassette, a herpes simplex viral vector genome expression cassette, and a vaccinia viral vector genome expression cassette.
33. The nucleotide sequence of claim 32, wherein the viral vector genome expression cassette is a retroviral vector genome expression cassette or an adeno-associated viral vector transgene expression cassette.
34. The nucleotide sequence of claim 33, wherein the 3' UTR of the retroviral vector genome expression cassette further comprises a 3' polypurine tract (3'ppt) that is located 5' to a DNA attachment (att) site, wherein, when the transgene expression cassette is in the forward orientation with respect to the genome expression cassette, the cis-acting sequence(s) are located 5' to the 3'ppt and/or 3' to the att site.
35. The nucleotide sequence of claim 33 or claim 34, wherein the retroviral vector genome expression cassette is a lentiviral vector genome expression cassette.
36. The nucleotide sequence of claim 35, wherein the major splice donor site in the lentiviral vector genome expression cassette is inactivated, optionally wherein the cryptic splice donor site 3' to the major splice donor site is also inactivated.
37. The nucleotide sequence of claim 36, wherein the inactivated major splice donor site has the sequence of GGGGAAGGCAACAGATAAATATGCCTTAAAAT (SEQ ID NO:4).
38. The nucleotide sequence of any one of claims 35 to 37, wherein the nucleotide sequence further comprises a nucleotide sequence encoding a modified U1 snRNA, wherein the modified U1 snRNA has been modified to bind to a nucleotide sequence within the packaging region of the lentiviral vector genome.
39. The nucleotide sequence of claim 38, wherein the viral vector genome expression cassette is operably linked to the nucleotide sequence encoding the modified U1 snRNA.
40. A viral vector comprising a viral vector genome encoded by the nucleotide sequence of any one of claims 31 to 39.
41. A viral vector production system comprising a nucleotide sequence according to any one of claims 1 to 39 and one or more additional nucleotide sequence(s) encoding viral vector components.
42. The viral vector production system of claim 41, wherein the one or more additional nucleotide sequence(s) encode gag-pol and env, and optionally rev.
43. A cell comprising a nucleotide sequence according to any one of claims 1 to 39, a viral vector according to claim 40, or a viral vector production system according to claim 41 or 42.
44. A method for producing a viral vector, comprising the steps of:
(a) introducing a viral vector production system of claim 41 or 42 into a cell; and
(b) culturing the cell under conditions suitable for the production of the viral vector.
45. A viral vector produced by the method of claim 44.
46. Use of the nucleotide sequence of any of claims 1 to 39, the viral vector production system of claim 41 or 42, or the cell of claim 43, for producing a viral vector.
47. A method for identifying one or more cis-acting sequence(s) that improve transgene expression in a target cell, the method comprising the steps of:
(a) transducing target cells with a viral vector of claim 40 or 45;
(b) identifying target cells with a high level of transgene expression; and
(c) optionally identifying the one or more cis-acting sequence(s) located within the 3' UTR of the transgene mRNA present within these target cells.
48. The method of claim 47, wherein step (c) comprises performing RT PCR and optionally sequencing the transgene mRNA.
49. The method of claim 47 or 48, wherein the method is performed using a plurality of viral vectors with:
(i) different cis-acting sequences in the 3'UTR of the transgene expression cassette; and/or
(ii) different cis-acting sequence locations within the 3'UTR of the transgene expression cassette; and/or
(iii) different cis-acting sequence combinations in the 3'UTR of the transgene expression cassette; to identify one or more cis-acting sequence(s) that improve transgene expression in the target cell.
PCT/GB2022/052577 2021-10-12 2022-10-12 Novel viral regulatory elements WO2023062359A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2114534.7 2021-10-12
GBGB2114534.7A GB202114534D0 (en) 2021-10-12 2021-10-12 Novel viral regulatory elements

Publications (2)

Publication Number Publication Date
WO2023062359A2 (en) 2023-04-20
WO2023062359A3 (en) 2023-06-15

Family

ID=78595158

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2022/052577 WO2023062359A2 (en) 2021-10-12 2022-10-12 Novel viral regulatory elements

Country Status (2)

Country Link
GB (1) GB202114534D0 (en)
WO (1) WO2023062359A2 (en)

Also Published As

Publication number Publication date
WO2023062359A3 (en) 2023-06-15
GB202114534D0 (en) 2021-11-24

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22793794; Country of ref document: EP; Kind code of ref document: A2