WO2021209574A1 - Constructs comprising inteins - Google Patents

Constructs comprising inteins

Info

Publication number
WO2021209574A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
intein
aag
sequence
cat
Prior art date
Application number
PCT/EP2021/059841
Other languages
French (fr)
Inventor
Alberto Auricchio
Hristiana LYUBENOVA
Pasquale PICCOLO
Marcello MONTI
Agnese PADULA
Federica Esposito
Original Assignee
Fondazione Telethon
Priority date
Filing date
Publication date
Application filed by Fondazione Telethon
Publication of WO2021209574A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/86 Viral vectors
    • C12N2750/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssDNA viruses
    • C12N2750/00011 Details
    • C12N2750/14011 Parvoviridae
    • C12N2750/14111 Dependovirus, e.g. adenoassociated viruses
    • C12N2750/14141 Use of virus, viral particle or viral elements as a vector
    • C12N2750/14143 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/22 Vectors comprising a coding region that has been codon optimised for expression in a respective host
    • C12N2800/40 Systems of functionally co-operating vectors
    • C12N2830/00 Vector systems having a special element relevant for transcription
    • C12N2830/008 Vector systems having a special element relevant for transcription cell type or tissue specific enhancer/promoter combination
    • C12N2830/42 Vector systems having a special element relevant for transcription being an intron or intervening sequence for splicing and/or stability of RNA
    • C12N2830/48 Vector systems having a special element relevant for transcription regulating transport or export of RNA, e.g. RRE, PRE, WPRE, CTE
    • C12N2830/50 Vector systems having a special element relevant for transcription regulating RNA stability, not being an intron, e.g. poly A signal
    • C12N2840/00 Vectors comprising a special translation-regulating system
    • C12N2840/44 Vectors comprising a special translation-regulating system being a specific part of the splice mechanism, e.g. donor, acceptor
    • C12N2840/445 Vectors comprising a special translation-regulating system being a specific part of the splice mechanism, e.g. donor, acceptor for trans-splicing, e.g. polypyrimidine tract, branch point splicing

Definitions

  • the present invention relates to constructs, vectors, relative host cells and pharmaceutical compositions which allow an effective gene therapy, in particular for diseases due to mutations in genes with a coding sequence (CDS) larger than 5 kb.
  • CDS coding sequence
  • AAV-based gene therapy is safe and effective in humans.
  • AAV-based gene therapy products have been approved in recent years both in the USA and Europe for inherited metabolic and blinding diseases, whilst clinical trials of AAV-based gene therapy approaches for diseases in different therapeutic areas, ranging from ophthalmology to hematology to musculoskeletal and metabolic disorders, are ever increasing.
  • The limited cargo capacity of AAV vectors prevents the development of AAV-based therapies for diseases due to mutations in genes with a coding sequence (CDS) larger than 5 kb (herein also referred to as large genes).
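The size criterion above can be written as a one-line check. The following is a minimal sketch that assumes only the 5 kb CDS threshold stated in the text; the function name and the example lengths are hypothetical and are not taken from the application.

```python
LARGE_GENE_THRESHOLD_BP = 5_000  # "larger than 5 kb", as stated in the text


def is_large_gene(cds_length_bp: int, threshold_bp: int = LARGE_GENE_THRESHOLD_BP) -> bool:
    """Return True when a CDS exceeds the threshold and so counts as a 'large gene'."""
    if cds_length_bp <= 0:
        raise ValueError("CDS length must be positive")
    return cds_length_bp > threshold_bp


# Hypothetical lengths, for illustration only.
for name, length in [("example_small_cds", 1_500), ("example_large_cds", 6_300)]:
    label = "large gene" if is_large_gene(length) else "not a large gene"
    print(f"{name}: {length} bp -> {label}")
```

A CDS of exactly 5,000 bp is not classified as large here, because the text says "larger than".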
  • Dual and triple AAV vectors exploit concatemerization and recombination of AAV genomes to reconstitute the full-length genomes in cells co-infected by multiple AAV vectors.
  • the efficiency of transgene expression achieved with either dual or triple AAV vectors in photoreceptors, which are the main therapeutic targets for most inherited retinal diseases, is lower than that achieved with single AAV vectors. This might be due to the various limiting steps required for efficient transduction, including proper DNA concatemer formation, stability of the heterogeneous mRNA and splicing efficiency across the junctions of the vectors.
  • the inventors have found that delivery of multiple AAV vectors each encoding one of the fragments of either reporter or large therapeutic proteins flanked by short split-inteins results in protein trans-splicing and full-length protein reconstitution both in vitro and in vivo.
  • Inteins are genetic elements transcribed and translated within a host protein from which they self-excise similarly to a protein intron, without leaving amino acid modifications in the final protein product, in the absence of energy supply, exogenous host-specific proteases or cofactors (O. Novikova, N. Topilina, M. Belfort, J Biol Chem 289, 14490-14497 (2014); K. V. Mills, M. A. Johnson, F. B. Perler, J Biol Chem 289, 14498-14505 (2014); H. Iwai, S. Züger, J. Jin, P. H. Tam, FEBS Lett 580, 1853-1858 (2006); J. Zettler, V. Schutz, H. D.
  • N- and C-exteins certain peptide sequences surrounding their ligation junction (called N- and C-exteins) that are required for efficient trans-splicing to occur, of which the most important is an amino acid containing a thiol or hydroxyl group (i.e., Cys, Ser or Thr) as first residue in the C-extein (N. H. Shah, et al., J Am Chem Soc 135, 5839-5847 (2013)).
  • Split-inteins are a subset of inteins that are expressed as two separate polypeptides at the ends of two host proteins, and catalyze their trans-splicing, resulting in the generation of a single larger polypeptide (Y. Li, Biotechnol Lett 37, 2121-2137 (2015)). Inteins, including split-inteins, are widely used in biotechnological applications that include protein purification and labeling steps, as well as the reconstitution of the widely used CRISPR/Cas9 genome editing nuclease.
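The bookkeeping of protein trans-splicing described above can be sketched as a toy model: each half carries an extein fused to one split-intein half, the intein halves are excised, and the two exteins are joined. This is an illustration only and does not model the chemistry; the class, function and sequences are hypothetical placeholders, and the only rule taken from the text is that the first C-extein residue should be Cys, Ser or Thr.

```python
from dataclasses import dataclass

NUCLEOPHILES = {"C", "S", "T"}  # Cys, Ser, Thr as first residue of the C-extein


@dataclass(frozen=True)
class Precursor:
    """One half of a split protein, fused to its split-intein half."""
    extein: str  # fragment of the host protein (one-letter amino acids)
    intein: str  # N-intein or C-intein half (placeholder text here)


def trans_splice(n_half: Precursor, c_half: Precursor) -> str:
    """Toy model: drop both intein halves and ligate the two exteins."""
    if not n_half.extein or not c_half.extein:
        raise ValueError("both exteins must be non-empty")
    if c_half.extein[0] not in NUCLEOPHILES:
        raise ValueError("first C-extein residue must be Cys, Ser or Thr")
    return n_half.extein + c_half.extein


# Placeholder sequences; not real intein or protein sequences.
n_half = Precursor(extein="MKTAYIAK", intein="<N-intein>")
c_half = Precursor(extein="SQRGDLVE", intein="<C-intein>")
print(trans_splice(n_half, c_half))  # MKTAYIAKSQRGDLVE
```

The ligated product contains no intein residues, which mirrors the statement that inteins self-excise without leaving modifications in the final protein.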
  • dystrophin gene a highly functional form of the dystrophin gene was expressed in vitro and in vivo, wherein the 6.3-kb Becker dystrophin gene was split onto two AAV vectors and each half was fused to split inteins obtained from the Synechocystis sp. PCC 6803 (Ssp) DnaB intein or the Rhodothermus marinus (Rma) DnaB intein.
  • split-intein namely N. punctiforme DnaE split inteins
  • US 6,544,786 further reports the use of split inteins to deliver a dystrophin minigene.
  • the present inventors took advantage of the intrinsic ability of split-inteins to mediate protein trans-splicing to reconstitute large full-length proteins following their fragmentation into split-intein-flanked polypeptides whose coding sequences each fit into a single AAV vector.
  • the present inventors further reported successful AAV-mediated protein trans-splicing (PTS) in the retina, resulting in therapeutic levels of protein reconstitution which in some instances match those achieved by single AAV vectors (Tornabene, P. et al. Intein-mediated protein trans-splicing expands adeno-associated virus transfer capacity in the retina. Sci. Transl. Med. (2019); WO 2020/079034, the disclosure of which is herein incorporated by reference in its entirety).
  • Haemophilia A is a severe bleeding disorder caused by the partial or complete deficiency of coagulation factor VIII (F8). With a prevalence of 1 in 5000 male live births, it is the most common inherited X-linked recessive coagulation disorder. F8 activity levels inversely relate to the bleeding risk, with severely affected patients (about 50% of all cases) having circulating protein levels of less than 1% (Bowen, D. J. Haemophilia A and haemophilia B: molecular insights. Mol Pathol. 2002 Feb;55(1):1-18 ; 11836440; Antonarakis, S.
  • Adeno-associated viral (AAV) vectors are emerging as a promising tool for in vivo gene therapy of haemophilia A (HemA), because of their excellent safety profile and ability to direct long-term transgene expression from post-mitotic tissues such as the liver (Nathwani, A. C., Davidoff, A. M. & Tuddenham, E. G. D. Prospects for gene therapy of haemophilia. Haemophilia (2004); Nathwani, A. C. et al., Advances in Gene Therapy for Hemophilia. Hum. Gene Ther. (2017)).
  • Wilson disease (WD, OMIM #277900) is a rare autosomal recessive disorder of copper metabolism with an estimated prevalence ranging from one in 30,000 in most populations to one in 10,000 in isolated populations.
  • WD is caused by mutations in the ATP7B gene (Bull PC, et al., The Wilson disease gene is a putative copper transporting P-type ATPase similar to the Menkes gene. Nat Genet 1993;5:327-337; Tanzi RE et al. The Wilson disease gene is a copper transporting ATPase with homology to the Menkes disease gene. Nat Genet 1993;5:344-350), which encodes a P-type copper-transporting ATPase that is highly expressed in hepatocytes and is critical for regulation of copper levels.
  • WD is characterized by toxic copper deposits in the liver and, to a lesser extent, in the brain, eyes and kidneys.
  • Clinical symptoms in WD include cirrhosis and chronic hepatitis that end in liver failure, and psychiatric and neurological deficits including Parkinsonism and seizures.
  • the age of presentation of WD, the prevalence of hepatic and central nervous system (CNS) involvement, and their severity are highly variable (Rosencrantz R, Schilsky M. Wilson disease: pathogenesis and clinical considerations in diagnosis and treatment. Semin Liver Dis 2011;31:245-259).
  • AAV-mediated delivery of full-length human ATP7B resulted in sustained correction of copper metabolism in young male Atp7b-/- mice but resulted in poor production yield due to the oversized genome and failed to ameliorate the disease phenotype in female and old male Atp7b-/- mice (Murillo O, Luqui DM, Gazquez C, Martinez-Espartosa D, Navarro-Blasco I, Monreal JI, Guembe L, et al. Long-term metabolic correction of Wilson's disease in a murine model by gene therapy. J Hepatol 2016;64:419-426).
  • mini-ATP7B, in which four out of six metal binding domains (MBDs) had been deleted, restored copper homeostasis in male and female Atp7b-/- mice (Murillo O, Moreno D, Gazquez C, Barberia M, Cenzano I, Navarro I, Uriarte I, et al. Liver Expression of a MiniATP7B Gene Results in Long-Term Restoration of Copper Homeostasis in a Wilson Disease Model in Mice. Hepatology 2019;70:108-126). Nevertheless, mini-ATP7B lacks important phosphorylation sites that regulate protein stability and trafficking and may be less effective than the full-length protein in vivo (Pilankatta R, Lewis D, Inesi G. Involvement of protein kinase D in expression and trafficking of ATP7B (copper ATPase). J Biol Chem 2011;286:7389-7396).
  • the present inventors have used split inteins, in particular Npu DnaE inteins, to reconstitute variants of the F8 and ATP7B genes for the treatment of hemophilia and Wilson disease.
  • The inventors applied liver gene therapy with AAV intein vectors to reconstitute and improve a highly active variant of the F8 gene, named N6 variant (5 kb) (Miao, H. Z. et al. Bioengineering of coagulation factor VIII for improved secretion. Blood (2004); Ward, N. J. et al. Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood (2011)), showing supraphysiological levels of F8 activity.
  • the inventors showed that their approach allowed successful reconstitution both in vitro and in vivo, thus allowing generation of well-defined AAV vectors within their normal packaging capacity. Furthermore, the inventors showed that their approach achieves levels of F8 comparable with those obtained with a previously described oversize single AAV-F8 variant. Surprisingly, and unlike the single AAV vector, the inventors' intein-based approach does not elicit F8 neutralizing antibodies, overcoming certain existing limitations of hemophilia A gene therapy.
  • The inventors also applied gene therapy with AAV intein vectors to reconstitute the ATP7B gene for the treatment of Wilson disease.
  • the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
  • the first vector comprises a first portion of the coding sequence (CDS1) and a first intein nucleotide sequence that encodes a N-Intein, wherein the first intein nucleotide sequence is at the 3' end of CDS1; and
  • the second vector comprises a second portion of the coding sequence (CDS2) and a second intein nucleotide sequence that encodes a C-Intein, wherein the second intein nucleotide sequence is at the 5' end of CDS2; wherein the coding sequence encodes Factor VIII (F8) or ATP7B, or a variant thereof.
  • CDS2 second portion of the coding sequence
  • F8 Factor VIII
  • ATP7B a variant thereof.
  • the protein product of the coding sequence is produced by protein splicing.
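The two-vector layout described above (N-intein coding sequence at the 3' end of CDS1, C-intein coding sequence at the 5' end of CDS2) can be sketched as simple string assembly. This is a minimal illustration under stated assumptions: the function and class names are hypothetical, the sequences are placeholders, the in-frame check is an added convenience, and other vector elements are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitVectorPair:
    first_insert: str   # CDS1 followed by the N-intein coding sequence
    second_insert: str  # C-intein coding sequence followed by CDS2


def build_vector_pair(cds: str, split_at: int, n_intein_nt: str, c_intein_nt: str) -> SplitVectorPair:
    """Split a coding sequence at a nucleotide offset and attach the intein halves."""
    if not 0 < split_at < len(cds):
        raise ValueError("split_at must fall strictly inside the coding sequence")
    if split_at % 3 != 0:
        raise ValueError("split_at must fall on a codon boundary")  # assumption: keep both halves in frame
    cds1, cds2 = cds[:split_at], cds[split_at:]
    return SplitVectorPair(first_insert=cds1 + n_intein_nt,
                           second_insert=c_intein_nt + cds2)


# Placeholder CDS (ATG GCT AAA | TCT GGT TAA) and placeholder intein markers.
pair = build_vector_pair("ATGGCTAAATCTGGTTAA", 9, "[N-INTEIN]", "[C-INTEIN]")
print(pair.first_insert)   # ATGGCTAAA[N-INTEIN]
print(pair.second_insert)  # [C-INTEIN]TCTGGTTAA
```

In the placeholder, CDS2 begins with the codon TCT (Ser), in line with the nucleophile requirement for the first C-extein residue mentioned earlier.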
  • the Factor VIII variant is N6 or SQ-N6.
  • the present invention provides a vector system to express a coding sequence in a cell, said coding sequence consisting of a first portion (CDS1) and a second portion (CDS2) said vector system comprising: a) a first vector comprising:
  • said first portion of said coding sequence (CDS1), - a first intein nucleotide sequence coding for a N-Intein, said first intein nucleotide sequence having at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the sequence:
  • AAT (Seq ID No. 16), or said N-intein has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID No. 1 or a variant thereof or a fragment thereof or a homolog thereof; and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and b) a second vector comprising:
  • a second intein nucleotide sequence coding for a C-Intein, said second intein nucleotide sequence having at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the sequence:
  • said second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%,
  • said C-intein has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID No. 2 or a variant thereof or a fragment thereof or a homolog thereof; and wherein said second intein nucleotide sequence is located at the 5' end of CDS2; wherein when the first vector and the second vector are inserted in a cell, the protein product of the coding sequence is produced by protein splicing.
  • the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 15.
  • the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 16.
  • the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 1.
  • the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 17.
  • the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 18.
  • the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 2.
  • the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 1, 3, 5, 7, 9, 11 or 13.
  • the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 2, 4, 6, 8, 10, 12 or 14.
  • the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the N-intein sequence shown in SEQ ID NO: 26, 28, 30, 32, 34 or 38.
  • the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the C-intein sequence shown in SEQ ID NO: 27, 29, 31, 33, 35 or 39.
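The identity thresholds above can be illustrated with a simple percent-identity calculation. The sketch below assumes two pre-aligned sequences of equal length and counts matching positions; the application does not specify this method here, real comparisons normally rely on a sequence alignment tool, and the example sequences are made up.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """True when the identity is at least the threshold (80% by default, as in the text)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Made-up 20-nt sequences differing at two positions.
reference = "ATGGCTAAATCTGGTTAAGC"
candidate = "ATGGCTAAGTCTGGATAAGC"
print(f"{percent_identity(reference, candidate):.1f}%")   # 90.0%
print(meets_identity_threshold(reference, candidate))        # True
print(meets_identity_threshold(reference, candidate, 95.0))  # False
```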
  • the intein nucleotide sequence may be codon optimized.
  • the inteins are capable of trans-splicing reactions.
  • the vector system is a vector combination comprising the first vector and the second vector.
  • the first and/or second vector comprises a 5'-ITR.
  • the 5'-ITR may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 40.
  • the first and/or second vector comprises a 3'-ITR.
  • the 3'-ITR may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 41.
  • the first and/or second vector comprises an intron, preferably an SV40 intron.
  • the intron may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 42. aagtatcaiaggttacaagacagg.tttaaggag.accaatagaaactgggcttgtQg.ag.acagagaagactcttgc ⁇ ttt.c igata ⁇ gcacctattggtcttactgacatccactttgcctttctctccacag
  • the first and/or second vector comprises a promoter.
  • the promoter is operably linked to the first or second portion of the coding sequence.
  • the promoter may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 43 or 44.
  • GGTTTAGTGAACCGTCAGATCA SEQ ID NO: 43; CMV promoter
  • the first and/or second vector comprises a polyadenylation sequence.
  • the polyadenylation sequence is operably linked to the first or second portion of the coding sequence.
  • the polyadenylation sequence may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 45 or 46.
  • the first and/or second vector comprises an enhancer.
  • the enhancer is a WPRE.
  • the enhancer may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 47.
  • the first vector comprises a nucleotide sequence that encodes a signal peptide.
  • the signal peptide is operably linked to the protein encoded by the coding sequence.
  • the nucleotide sequence encoding the signal peptide may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 48 or 49.
  • the first vector and the second vector further comprise a promoter sequence operably linked to the 5' end portion of said first portion of the coding sequence (CDS1) or of said second portion of the coding sequence (CDS2), preferably said promoter is a liver specific promoter, preferably the promoter is HLP, thyroxine binding globulin (TBG), HCB promoter or F8 promoter. In some embodiments, the promoter is a HLP promoter.
  • the first vector comprises a promoter, preferably operably linked to the first portion of the coding sequence (CDS1).
  • the second vector comprises a promoter, preferably operably linked to the second portion of the coding sequence (CDS2).
  • the first vector and the second vector further comprise a 5'-terminal repeat (5'-TR) nucleotide sequence and a 3'-terminal repeat (3'-TR) nucleotide sequence, preferably the 5'-TR is a 5'-inverted terminal repeat (5'-ITR) nucleotide sequence and the 3'-TR is a 3'-inverted terminal repeat (3'-ITR) nucleotide sequence.
  • the first and/or second vector comprises an AAV2 5'-ITR and an AAV2 3'-ITR.
  • the first and/or second vector comprises an AAV8 5'-ITR and an AAV8 3'-ITR.
  • the first vector and the second vector further comprise a poly-adenylation signal nucleotide sequence and/or wherein at least one of the first vector or the second vector further comprises a nucleotide sequence coding for a degradation signal.
  • the degradation signal is selected from the group consisting of CL1, PB29, SMN, CIITA, ODc, ecDHFR or a fragment thereof.
  • the coding sequence encodes a protein able to correct hemophilia or Wilson disease.
  • the coding sequence is the coding sequence of a gene selected from the group consisting of: F8 or ATP7B or variant thereof.
  • the variant is N6 or N6-SQ.
  • the coding sequence is an F8 coding sequence or variant thereof. In some embodiments, the coding sequence is an ATP7B coding sequence. In preferred embodiments, the coding sequence encodes a B-domain deleted (BDD) F8.
  • BDD B-domain deleted
  • the coding sequence comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 22 or 23.
  • the coding sequence comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 24 or 25.
  • the coding sequence comprises or consists of a nucleotide sequence that encodes a protein with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 19 or 20, preferably SEQ ID NO: 19.
  • the coding sequence comprises or consists of a nucleotide sequence that encodes a protein with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 21.
  • the first portion of the coding sequence is the portion disclosed in SEQ ID NO: 26, 28, 30, 32 or 38 (or a sequence with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity thereto).
  • the second portion of the coding sequence is the portion disclosed in SEQ ID NO: 27, 29, 31, 33 or 39 (or a sequence with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity thereto).
  • the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
  • the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 26; and
  • the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 27; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
  • the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein: (a) the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 28; and
  • the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 29; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
  • the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
  • the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 30; and
  • the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 31; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
  • the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
  • the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 32; and
  • the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 33; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
  • the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein: (a) the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 38; and
  • the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 39; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
  • the coding sequence is split into the first portion or the second portion at a position consisting of a nucleophile amino acid which does not fall within a structural domain or a functional domain of the encoded protein product, wherein the nucleophile amino acid is selected from serine, threonine, or cysteine.
  • the coding sequence is split into the first portion or the second portion at a position that substantially does not affect expression and/or activity (e.g. procoagulant activity) of the protein encoded by the coding sequence.
  • the F8 coding sequence is split at a position corresponding to Ser962 or Ser883, preferably Ser962 (with Ser962 or Ser883 being the first amino acid of the second portion).
  • the coding sequence split site is defined with respect to a numbering convention based on SEQ ID NO: 20, in which the first Met amino acid is position 1.
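The split-site criteria above (a Ser, Thr or Cys residue outside structural or functional domains, numbered from the first Met as position 1) can be sketched as a simple scan. This is an illustrative sketch only: the function name, the toy protein and the excluded domain range are hypothetical, and real domain boundaries would have to come from annotation of the actual protein.

```python
from typing import Iterable, List, Tuple

NUCLEOPHILES = {"S", "T", "C"}  # serine, threonine, cysteine


def candidate_split_sites(protein: str,
                          excluded_domains: Iterable[Tuple[int, int]] = ()) -> List[int]:
    """Return 1-based positions that could become the first residue of the second portion.

    A position qualifies when its residue is Ser, Thr or Cys and it lies outside
    every excluded (start, end) range, both ends inclusive and 1-based.
    """
    domains = list(excluded_domains)
    sites = []
    for pos, residue in enumerate(protein, start=1):
        if pos == 1:
            continue  # splitting before residue 1 would leave an empty first portion
        if residue not in NUCLEOPHILES:
            continue
        if any(start <= pos <= end for start, end in domains):
            continue
        sites.append(pos)
    return sites


# Toy protein and a hypothetical domain spanning positions 6-10.
protein = "MAKSGLTPCDES"
sites = candidate_split_sites(protein, excluded_domains=[(6, 10)])
print(sites)  # [4, 12]

pos = sites[0]
first_portion, second_portion = protein[:pos - 1], protein[pos - 1:]
print(first_portion, second_portion)  # MAK SGLTPCDES
```

With this convention a split "at Ser962" would mean that residue 962 is the first amino acid of the second portion, as stated in the text.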
  • the first portion of the coding sequence encodes an amino acid sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with any one of SEQ ID NO: 50-54.
  • the second portion of the coding sequence encodes an amino acid sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with any one of SEQ ID NO: 55-59.
  • the coding sequence is codon optimized.
  • the coding sequence is not codon optimised.
  • Preferably the coding sequence has at least 80% (for example, at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with a sequence selected from the group consisting of: a) ATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTGCCACCAGAAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTGATCTCGGTGAGCTGCCTGTGGACGCAAGATTTCCTCCTAGAGTGCCAAAATCTTTTCCATTCAACACCTCAGTCGTGTACAAAAAGACTCTGTTTGTAGAATTCACGGATCACCTTTTCAACATCGCTAAGCCAAGGCCACCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTACACTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTG
  • AAGAAGCGGAAGACTATGATGATGATCTTACTGATTCTGAAATGGATGTGGTCAGGTTTGATG
  • AAAAGTCAATATTTGAACAATGGCCCTCAGCGGATTGGTAGGAAGTACAAAA
  • CTGGTACATTCTAAGCATTGGAGCACAGACTGACTTCCTTTCTGTCTTCTTCTCTGGATATACCT
  • AAACACAAAATGGTCTATGAAGACACACTCACCCTATTCCCATTCTCAGGAGAAACTGTCTT
  • CTCAGTTCAAGAAAGTTGTTTTCCAGGAATTTACTGATGGCTCCTTTACTCAGCCCTTATACCGT
  • AAACTTACTTTTGGAAAGTGCAACATCATATGGCACCCACTAAAGATGAGTTTGACTGCAAAG
  • a first intein nucleotide sequence coding for a N-lntein said N-intein having at least 80 % identity with SEQ ID No 3, 5, 7, 9, 11, 13 or a variant thereof or a fragment thereof or an homolog thereof and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and b) a second vector comprising: - said second portion of said coding sequence (CDS2),
  • a second intein nucleotide sequence coding for a C-intein, said C-intein having at least 80% identity with SEQ ID No. 4, 6, 8, 10, 12, 14 or a variant thereof or a fragment thereof or a homolog thereof and wherein said second intein nucleotide sequence is located at the 5' end of CDS2; wherein said coding sequence encodes a sequence selected from the group of: i) MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFNTSVVYK
  • At least one of the first vector and the second vector further comprises at least one enhancer or regulatory nucleotide sequence, operably linked to the coding sequence.
  • the vector system comprises: a) a first vector comprising in a 5'-3' direction:
  • CDS1 a 5' end portion of a coding sequence
  • CDS2 3'end portion of the coding sequence
  • said first and second vector are independently a viral vector, preferably an adenoviral vector or adeno-associated viral (AAV) vector, preferably said first and second adeno-associated viral (AAV) vectors are selected from the same or different AAV serotypes, preferably the serotype is selected from the serotype 2, the serotype 8, the serotype 5, the serotype 7 or the serotype 9, serotype 7m8, serotype sh10; serotype 2(quad Y-F).
  • AAV adeno-associated viral
  • the first and/or second vector is an AAV2 vector. In some embodiments, the first and/or second vector is an AAV8 vector.
  • the first and/or second vector is a viral vector particle.
  • the first and/or second vector is an AAV2/8 vector.
  • the invention provides a host cell transformed with the vector system as defined above.
  • the vector system or the host cell are for medical use, preferably for use in gene therapy, preferably for use in the treatment and/or prevention of hemophilia or Wilson disease, preferably hemophilia is hemophilia A.
  • the invention provides a vector, wherein the vector is the first vector as disclosed herein.
  • the invention provides a vector, wherein the vector is the second vector as disclosed herein.
  • the invention provides a cell comprising the first vector and the second vector as disclosed herein.
  • the invention provides a cell transduced or transfected with the first vector and the second vector as disclosed herein.
  • the cell is a mammalian cell, preferably a human cell, such as a liver cell.
  • the invention provides a kit comprising the first vector as disclosed herein and the second vector as disclosed herein.
  • the invention provides a composition comprising the first vector as disclosed herein and the second vector as disclosed herein.
  • the composition is a pharmaceutical composition comprising a pharmaceutically-acceptable carrier, diluent or excipient.
  • the invention provides the vector system, vector, kit or composition of the invention for use in therapy.
  • the invention provides the vector system, vector, kit or composition of the invention for use in treatment of hemophilia.
  • hemophilia is hemophilia A.
  • plasma F8 activity is increased, preferably is substantially normalised, after the treatment. In some embodiments, increased, preferably substantially normalised, plasma F8 activity is substantially maintained for at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 weeks following the treatment.
  • the vector system, vector, kit or composition of the invention is administered at a dose that substantially does not result in the generation of anti-F8 antibodies in a subject (preferably human subject).
  • the vector system, vector, kit or composition of the invention is administered at a dose of at least 1.2x10^13 genome copies per kg.
  • the vector system, vector, kit or composition of the invention is administered at a dose of up to 4x10^14 genome copies per kg.
  • the vector system, vector, kit or composition of the invention is administered at a dose of between 1.2x10^13 genome copies per kg and 4x10^14 genome copies per kg.
  • the invention provides the vector system, vector, kit or composition of the invention for use in treatment of Wilson disease.
  • the treated subject does not develop liver pathology.
  • the treatment substantially prevents increase in circulating alanine transaminase (ALT) and/or aspartate transaminase (AST) levels.
  • the vector system, vector, kit or composition of the invention is administered systemically. In some embodiments, the vector system, vector, kit or composition of the invention is administered intravenously.
  • the invention provides the first vector as disclosed herein for use in therapy, wherein the first vector is administered simultaneously, sequentially or separately in combination with the second vector as disclosed herein.
  • the invention provides the first vector as disclosed herein for use in treatment of hemophilia, wherein the first vector is administered simultaneously, sequentially or separately in combination with the second vector as disclosed herein.
  • the hemophilia is hemophilia A.
  • the invention provides the first vector as disclosed herein for use in treatment of Wilson disease, wherein the first vector is administered simultaneously, sequentially or separately in combination with the second vector as disclosed herein.
  • the invention provides the second vector as disclosed herein for use in therapy, wherein the second vector is administered simultaneously, sequentially or separately in combination with the first vector as disclosed herein.
  • the invention provides the second vector as disclosed herein for use in treatment of hemophilia, wherein the second vector is administered simultaneously, sequentially or separately in combination with the first vector as disclosed herein.
  • the hemophilia is hemophilia A.
  • the invention provides the second vector as disclosed herein for use in treatment of Wilson disease, wherein the second vector is administered simultaneously, sequentially or separately in combination with the first vector as disclosed herein.
  • the invention provides a method of treating or preventing hemophilia comprising administering an effective amount of the vector system, vector, kit or composition of the invention to a subject in need thereof.
  • the hemophilia is hemophilia A.
  • the invention provides a method of treating or preventing Wilson disease comprising administering an effective amount of the vector system, vector, kit or composition of the invention to a subject in need thereof.
  • the invention provides a pharmaceutical composition comprising the vector system or the host cell as defined above and pharmaceutically acceptable vehicle.
  • Split inteins of the invention may be 100% identical, 98%, 80%, 75%, 70%, 65%, 60%, 55%, 50% identical to naturally occurring inteins or to SEQ ID No. 1 to 14 (homologs), wherein said inteins retain the ability to undergo trans-splicing reactions.
  • fragments or variants of naturally occurring or modified inteins which retain trans-splicing activity.
  • Preferred promoters are ubiquitous, artificial, or tissue specific promoters, including fragments and variants thereof retaining a transcription promoter activity.
  • Particularly preferred promoters are liver-specific promoters including Factor 8 promoter, thyroxine binding globulin (TBG), hybrid liver-specific promoter (HLP) (McIntosh J (2013). Blood 20 Feb 2013, 121(17):3335-3344), HCB promoter (Brown HC, Zakas PM, George SN, Parker ET, Spencer HT, Doering CB. Target-Cell-Directed Bioengineering Approaches for Gene Therapy of Hemophilia A. Mol Ther Methods Clin Dev. 2018;9:57-69).
  • liver endothelial cell promoters including the Tie 2 promoter (Benten D, et al., Hepatic targeting of transplanted liver sinusoidal endothelial cells in intact mice. Hepatology. 2005;42(1):140-148).
  • Ubiquitous promoters according to the present invention are for instance the ubiquitous cytomegalovirus (CMV) (32) and short CMV (33) promoters.
  • Illustrative polyadenylation signals include, without limitations, the bovine growth hormone polyadenylation signal (bGHpA), the human beta globin polyadenylation signal or a short synthetic version (Levitt N, (1989). Genes Dev. 1989 Jul;3(7):1019-25), the SV40 polyadenylation signal, or other naturally occurring or artificial polyadenylation signal.
  • bGHpA bovine growth hormone polyadenylation signal
  • FIG. 1 Comparison of F8 variants in vitro. (a) Schematic representation of the different F8 variants that were cloned into an AAV backbone plasmid: N6 variant (variant in which the B domain is exchanged with a shorter SQ-N6 linker, wherein 11 amino acids of the modified SQ amino acid linker (SQm) are added in front of the N6 peptide). ITR, inverted terminal repeats; CMV, cytomegalovirus promoter; star symbol, 3xflag tag; PolyA, polyadenylation signal; Ntds, nucleotides; SP, signal peptide. (b) Western blot (WB) analysis of lysates of HEK293 cells 72 hours post transfection (hpt) with the different F8 variants. Neg, non-transfected cells. (c) Chromogenic assay on medium of transfected cells showing F8 activity. Significant differences between groups were assessed using Kruskal-Wallis rank sum test followed by Nemenyi multiple pair
  • Figure 2 Characterisation of N6 inteins in vitro. (a) Schematic representation of AAV N6 inteins with indicated splitting points. ITR, inverted terminal repeats; Prom., promoter; SP, signal peptide; star symbol, 3xflag tag; PolyA, polyadenylation signal. (b) WB of lysates of HEK293 cells 72 hpt with either single N6 plasmid or N6 AAV intein plasmids. I + II, N6 AAV intein plasmids; I, 5' F8-N6-N-intein plasmid; II, C-intein-3' F8-N6 plasmid.
  • Circular maps of plasmids Circular maps of pAAV2.1 HLP Npu DnaE N intein ATP7B (A) and pAAV2.1 HLP Npu DnaE C intein ATP7B (B).
  • ITR inverted terminal repeats
  • SV40 Simian virus 40 intron
  • HLP Hybrid Liver Promoter
  • bGHpA bovine Growth Hormone polyadenylation signal
  • FIG. 5 Intein-mediated in vitro reconstitution of full-length human ATP7B.
  • Expected molecular weights are: 160 kDa for full-length ATP7B-3XFLAG, 110 kDa for C-intein-c-term ATP7B half-3XFLAG, 56 kDa for N-term ATP7B half-N-intein-3XFLAG, and 17 kDa for excised inteins. GAPDH was used as loading control.
  • Figure 6 Intein-mediated in vivo reconstitution of full-length human ATP7B.
  • FIG. 7 Intein-mediated in vivo reconstitution of full-length human ATP7B in Atp7b-/- mice.
  • Atp7b-/- mice were injected with AAV2/8 HLP 5'ATP7B+N-intein and AAV2/8 HLP C-intein-3'ATP7B, at a dose of 1x10^13 gc/kg each, or AAV2/8 TBG eGFP vector at a dose of 2x10^13 gc/kg as control vector.
  • Expected molecular weights are: 160 kDa for full-length ATP7B-3XFLAG, 110 kDa for C-intein-c-term ATP7B half-3XFLAG, 56 kDa for N-term ATP7B half-N-intein-3XFLAG, and 15 kDa for excised inteins. GAPDH was used as loading control.
  • AAV2/8 TBG eGFP AAV2/8 HLP 5'ATP7B+N-intein
  • AAV2/8 HLP C-intein-3'ATP7B AAV-int-ATP7B
  • FIG. 10 Comparison of human F8 variants in vitro.
  • A Schematic representation of the four different F8 variants that were cloned into an AAV plasmid: wild-type F8; N6 containing 11 amino acids of the modified SQ amino acid linker (SQm) followed by the human N6 B domain; SQ containing the SQ amino acid linker; V3 containing the V3 peptide in the middle of the SQ linker.
  • ITR inverted terminal repeats
  • CMV cytomegalovirus promoter
  • star symbol 3xflag tag
  • PolyA short synthetic polyadenylation signal
  • Ntds nucleotides
  • SP signal peptide.
  • FIG. 11 In vitro F8-N6 (N6) intein expression and activity.
  • N6 F8-N6 intein expression and activity.
  • ITR inverted terminal repeats
  • Prom promoter
  • SP signal peptide
  • 5'N6-F8 5'CDS of N6 variant n-intein, NDnaE or NDnaB intein
  • star symbol 3xflag tag
  • PolyA short synthetic polyadenylation signal
  • c-intein, CDnaE intein 3'N6-F8, 3'CDS of N6 variant.
  • C WB of medium of the transfected cells showing the secreted proteins.
  • Codon optimisation of the N6 intein improves F8 activity levels.
  • A WB of protein lysates of HEK293 cells 72 hpt with the AAV-N6 intein plasmids and with the codon-optimised set. I + II, N6 intein proteins; I, 5' N6 CDS-NDnaE protein; II, CDnaE-3'N6 CDS protein. The arrows indicate the full-length N6 protein and the excised intein. Codop, codon-optimised.
  • B WB of medium from the transfected cells showing increased secretion of the codop N6 full-length protein compared to the non-codon-optimised.
  • Figure 13 Analysis of CodopV3 and intein AAV genomes integrity. Southern blot analysis of vector genome DNA isolated directly from AAV virions and run on an alkaline agarose gel. AAV DNA was labelled with a probe specific for the HLP promoter. Neg, AAV DNA treated with DNase I; CodopV3, AAVCodopV3; I, AAV-5' N6-Nintein; II, AAV-C-intein-3' N6.
  • FIG. 14 AAV-N6 intein administration results in F8 activity levels comparable to wild-type and single AAV-Codop-V3 injected animals. Chromogenic assay performed on plasma samples to detect F8 activity in 3 different groups over time; F8 activity levels are reported as International Units/deciliter (IU/dl). Each dot within different groups represents a single mouse. Plasma samples were analysed at different time points: baseline, 4, 8, 12 and 16 weeks post injection (w.p.i.). The baseline includes all the knock-out mice before the treatment.
  • FIG. 15 AAV-Codop N6 intein leads to development of anti-F8 antibodies.
  • N6 full-length protein is visible and indicated with an arrow in the upper part of the blot.
  • both the 5'N6 CDS-NDnaE protein and the CDnaE-3'N6 CDS protein are visible.
  • FIG. 16 AAV-N6 intein administration results in therapeutic F8 levels without eliciting anti-F8 antibodies.
  • F8 activity levels (IU/dl) obtained from chromogenic assay performed on plasma samples.
  • AU/ml the corresponding amount of anti-F8 antibodies.
  • FIG. 17 Low-dose AAV-CodopN6 intein administration results in therapeutic levels of F8 in the absence of anti-F8 antibodies.
  • F8 activity levels (IU/dl) obtained from chromogenic assay performed on plasma samples.
  • AU/ml the corresponding amount of anti-F8 antibodies.
  • Each number plot on the x axis represents a single mouse.
  • Total number of analysed mice injected with AAV-CodopN6 intein at 4 w.p.i. N = 5.
  • Significant differences within the same mouse for F8 levels and anti-F8 antibodies were assessed using the paired sample t-test: **P ≤ 0.05.
  • AAV-N6 intein and low-dose AAV-CodopN6 intein administration reduces the time for blood clotting formation.
  • Activated partial thromboplastin time (aPTT) assay performed on plasma samples both at baseline and at the last time-point of the analysis in 4 different groups: CodopV3 and N6 inteins at 16 w.p.i.; CodopN6 inteins at 8 w.p.i.; low-dose CodopN6 inteins at 4 w.p.i.; wild-type animals at 12 weeks of age.
  • Coagulation time is reported in seconds; significant differences were assessed using the non-parametric Kruskal-Wallis test.
  • AAV vector plasmids. The plasmids used for AAV vector production were derived from the pTigem AAV plasmid that contains the ITRs of AAV serotype 2.
  • the AAV plasmids were designed as detailed in Figure 1A and 2A.
  • the F8-N6 protein was split at the amino acid (a. a.) S962 (Set 1) or a. a. S883 (Set 2).
  • Inteins included in the plasmids were from the split intein of DnaE from Nostoc punctiforme (Npu) (27).
  • the plasmids used in the study were under the control of either the ubiquitous cytomegalovirus (CMV) (L. P.
  • CMV ubiquitous cytomegalovirus
  • polyA polyadenylation signal used in all plasmids was the short synthetic polyA (Levitt, N., Briggs, D., Gil, A. & Proudfoot, N. J. Definition of an efficient synthetic poly(A) site. Genes Dev. (1989)).
  • AAV vector production and characterisation AAV vectors were produced by the TIGEM AAV Vector Core by triple transfection of HEK293 cells as already described (A. Maddalena et al., Mol Ther 26, 524-541 (2016); M. Doria, A. Ferrara, A. Auricchio, Hum Gene Ther Methods 24, 392-398 (2013)). No differences in vector yields were observed between AAV vectors including or not intein sequences.
  • HEK293 cells were maintained and transfected using the calcium phosphate method (1 µg of each plasmid/well in 6-well plate format) as already described (A. Maddalena et al., Mol Ther 26, 524-541 (2016)). The total amount of DNA transfected in each well was kept equal by addition of a scramble plasmid where needed.
  • Samples (HEK293 cells) were lysed in RIPA buffer to extract F8 protein. Lysis buffers were supplemented with protease inhibitors (Complete Protease inhibitor cocktail tablets; Roche, Basel, Switzerland) and 1 mM phenylmethylsulfonyl.
  • protease inhibitors Complete Protease inhibitor cocktail tablets; Roche, Basel, Switzerland
  • For medium samples, cells were kept in Opti-MEM medium (Gibco, ThermoFisher Scientific, Germany), and upon cell harvesting at 72 hours post transfection unprocessed medium samples were mixed with 1X Laemmli sample buffer. All samples were denatured at 99°C for 5 minutes in 1X Laemmli sample buffer.
  • Lysates and medium samples were separated by either 12% (for excised intein detection) or 6% (for F8 protein detection) SDS-polyacrylamide gel electrophoresis (SDS-PAGE).
  • the antibodies used for immuno-blotting are as follows: anti-3xflag (1:1000, A8592; Sigma-Aldrich, Saint Louis, MO, USA) to detect the F8 protein; anti-β-Actin (1:1000, NB600-501; Novus Biological LLC, Littleton, CO, USA) to detect β-Actin proteins which were used as loading controls for the 12% SDS-PAGE; anti-Calnexin (1:1000, ADI-SPA-860; Enzo Life Sciences Inc, New York, NY, USA) to detect Calnexin, used as loading control for the 6% SDS-PAGE.
  • the quantification of F8 bands detected by Western blot was performed using ImageJ software (free download is available at http://rsbweb.nih.gov/ij/).
  • Kruskal-Wallis rank sum test (non-parametric test) was performed to determine if there were statistically significant differences between more than two groups of an independent variable. As the tests were significant a multiple pairwise-comparison was further applied to determine if the differences between specific pairs of a group were statistically significant. Nemenyi's non- parametric all-pairs comparison test for Kruskal-type ranked data was used. To determine the statistical significance of two groups, unpaired Student's t test was used.
  • the plasmids used for AAV vector production were derived from the pAAV2.1 plasmid that contains the ITRs of AAV serotype 2.
  • the AAV intein-ATP7B plasmids were designed as detailed in Figure 4. Codon-optimized human ATP7B cDNA was split at the nucleotide 1467.
  • Inteins included in the plasmids were the intein of DnaE from Nostoc punctiforme (Npu).
  • Simian virus 40 (SV40) intron, Woodchuck hepatitis virus Post-transcriptional Regulatory Element (WPRE), bovine Growth Hormone polyadenylation signal (bGHpA), and 3xFLAG sequences were also included. Cells transfection
  • HepG2 ATP7B knock-out cells (Chandhok G, Schmitt N, Sauer V, Aggarwal A, Bhatt M, Schmidt HH. The effect of zinc and D-penicillamine in a stable human hepatoma ATP7B knockout cell line. PLoS One 2014;9:e98809) were maintained in RPMI 1640 (Euroclone) supplemented with 10% fetal bovine serum (FBS, Euroclone) plus 1% penicillin/streptomycin solution and 1% L-glutamine (Euroclone).
  • Cells were plated in 6-well plates and transfected using LipoD293 (SignaGen Laboratories) with a total of 2 µg of plasmid DNA per well. 72 hours after transfection, cells were washed in PBS and lysed in RIPA buffer supplemented with protease and phosphatase inhibitors (Roche).
  • AAV vectors were produced and titered by the TIGEM AAV Vector Core as already described (Maddalena A, et al. High-Throughput Screening Identifies Kinase Inhibitors That Increase Dual Adeno-Associated Viral Vector Transduction In Vitro and in Mouse Retina. Hum Gene Ther 2018;29:886-901).
  • Male 7-week-old C57BL/6 mice (Charles River Laboratories) were administered by intravenous injection with 200 µl of vector solution in 0.9% NaCl. At sacrifice, animals were perfused with PBS and livers were harvested and lysed in RIPA buffer using TissueLyser (QIAGEN).
  • the DnaE split intein may be derived from the DnaE gene (e.g. DNA polymerase III subunit alpha) from cyanobacteria including Nostoc punctiforme (Npu), Synechocystis sp. PCC6803 (Ssp), Fischerella sp.
  • DnaE gene eg DNA polymerase III subunit alpha
  • Npu Nostoc punctiforme
  • Ssp Synechocystis sp. PCC6803
  • Fischerella sp Fischerella sp.
  • the DnaB split intein may be derived from the DnaB gene from cyanobacteria including R. marinus (Rma), Synechocystis sp. PCC6803 (Ssp), Porphyra purpurea chloroplast (Ppu), which are described for instance in (59).
  • split inteins of the invention may be 100% identical, 98%, 80%, 75%, 70%, 65%, 50% identical to naturally occurring inteins, wherein said inteins retain the ability to undergo trans-splicing reactions.
  • fragments of naturally occurring or modified inteins which retain trans-splicing activity.
  • inteins have conserved functional features that guarantee their splicing activity.
  • four intein motifs have been identified (see below for their consensus sequence): Blocks A-H (Pietrokovski 1994 and Perler 1997) and Blocks N2 and N4 (Pietrokovski 1998).
  • Intein Blocks A, N2, B, N4, F, and G are involved in protein splicing.
  • Blocks C, D, E, H are in the endonuclease domain, which is absent from split inteins.
  • split inteins retain conserved motifs that are essential to the trans-splicing activity. (Intein database, disclosed in [Perler, F. B. (2002). InBase, the Intein Database. Nucleic Acids Res. 30, 383-384.])
  • the present inventors have used intein-mediated protein trans-splicing in order to reconstitute large proteins in vivo.
  • Split inteins encoded by intein gene sequences are produced as precursor polypeptides, which through their structural complementation can reassemble and catalyze a protein trans-splicing reaction.
  • the N-intein gene is fused in frame with the sequence coding for the N-terminal portion of the protein of interest; the C-intein gene is fused in frame with the sequence coding for the C-terminal portion of the sequence of interest.
  • the inteins undergo autocatalytic excision and form a ligated extein, e.g. the reconstituted protein of interest.
  • reconstitution of a protein of interest requires splitting said protein into two fragments, whose coding sequences are cloned separately into AAV vectors, each fused to an N- or C-intein and under the control of a promoter.
  • Splitting points for each protein are selected taking into account the amino acid requirement at the junction point (e.g. presence of an amino acid containing a nucleophilic thiol or hydroxyl group (i.e. Cys, Ser or Thr) as first residue in the C-extein), as well as preservation of the integrity of critical protein domains in order to favor proper protein folding and stability of each intein-polypeptide precursor and the resulting reconstituted protein.
  • the present inventors have selected junction points within two proteins of interest: the protein F8-N6 and ATP7B.
  • said coding sequence is split at a nucleotide corresponding to aa Ser962 or Ser883, or said coding sequence is split at nucleotide 2884 or at nucleotide 2647.
  • said coding sequence is split at nucleotide 1467 of ATP7B cDNA or said coding sequence is split at a nucleotide corresponding to Lys 489 of ATP7B
  • N6 AAV intein set 1 (CMV promoter) p1278_pTIGEM_CMV_5' SQ-N6 F8 (split Ser962) + N-intein DnaE_3xFlag_(Synthetic-polyA)
  • AGGTCACACGACACCCAGGCJTAGCCCAGGCGGCCJCAGTCAGCGAGCGAGCGCGCAG (SEQ ID NO: 27)
  • N6 AAV intein set 1 (HLP promoter) p1417_pTIGEM_HLP_5' SQ-N6 F8 (split Ser962) + N-intein DnaE_3xFlag_(Synthetic-polyA)
  • CAACCTGCCCAACGA CTA CAAA GA CCA TGA CGGTGA TTA TAAA GA TCA TGA GA TCGA CTA CAA GGA rGACGATCACAAGTCAAAGCTTGATATCATCGAATTCAATAAAAGATCTTTATTTTCATTAG
  • CTCT_CTGC ICGCT( CTCGC_TCACT_GAGG_CCGG_G_CGAC_CAAAGGTCGC_CCGACGCCC_G_GGC
  • GACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCAG (SEQ ID NO: 29)
  • N6 AAV intein set 2 p1276_pTIGEM_CMV_5' SQ-N6 F8 (split Ser883) + N-intein DnaE_3xFlag_(Synthetic-polyA)
  • CTGTGACTGTG GAGGATGGCCCCACCAAGTCTGACCCCAGGTGCCTGACCAGATACTACAGCAG
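As an illustrative cross-check of the split positions listed above (nucleotide 2884 or 2647 for F8-N6, nucleotide 1467 for ATP7B), the following Python sketch maps a nucleotide position in a coding sequence to the codon that contains it. It assumes, which the text does not state explicitly, that nucleotides are numbered from 1 at the first base of the start codon and that codon numbers equal amino acid numbers including the signal peptide. The function names are hypothetical.

```python
def codon_number(nucleotide_position: int) -> int:
    """Return the 1-based codon number that contains a 1-based CDS nucleotide."""
    if nucleotide_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nucleotide_position - 1) // 3 + 1


def position_in_codon(nucleotide_position: int) -> int:
    """Return 1, 2 or 3: the place of the nucleotide inside its codon."""
    if nucleotide_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nucleotide_position - 1) % 3 + 1


# Split positions quoted in the text; the dictionary keys are labels chosen here.
SPLIT_NUCLEOTIDES = {"F8-N6 set 1": 2884, "F8-N6 set 2": 2647, "ATP7B": 1467}

for label, nucleotide in SPLIT_NUCLEOTIDES.items():
    print(label, nucleotide, codon_number(nucleotide), position_in_codon(nucleotide))
```

Under this numbering assumption, nucleotides 2884 and 2647 are the first bases of codons 962 and 883, consistent with Ser962 and Ser883, and nucleotide 1467 is the last base of codon 489, consistent with Lys489.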

Abstract

The present invention relates to constructs, vectors, relative host cells and pharmaceutical compositions which allow an effective gene therapy, in particular for diseases due to mutations in genes with a coding sequence (CDS) larger than 5 kb.

Description

Constructs comprising inteins
TECHNICAL FIELD
The present invention relates to constructs, vectors, relative host cells and pharmaceutical compositions which allow an effective gene therapy, in particular for diseases due to mutations in genes with a coding sequence (CDS) larger than 5 kb.
BACKGROUND OF THE INVENTION
Gene therapy with adeno-associated viral (AAV) vectors is safe and effective in humans. AAV-based gene therapy products have been approved in recent years in both the USA and Europe for inherited metabolic and blinding diseases, whilst clinical trials of AAV-based gene therapy approaches for diseases in therapeutic areas ranging from ophthalmology to hematology to musculoskeletal and metabolic disorders are ever increasing.
However, the limited cargo capacity of AAV vectors prevents development of AAV-based therapies for diseases due to mutations in genes with a coding sequence (CDS) larger than 5 kb (herein also referred to as large genes).
The inventors and others have shown that this limitation can be overcome by using either dual (up to 9 kb) or triple (up to 14 kb) AAV vectors, each containing fragments of the coding sequence (CDS) of the large transgene expression cassette. Dual and triple AAV vectors exploit concatemerization and recombination of AAV genomes to reconstitute the full-length genomes in cells co-infected by multiple AAV vectors. However, the efficiency of transgene expression achieved with either dual or triple AAV vectors in photoreceptors, which are the main therapeutic targets for most inherited retinal diseases, is lower than that achieved with single AAV vectors. This might be due to the various limiting steps required for efficient transduction, including proper DNA concatemer formation, stability of the heterogeneous mRNA and splicing efficiency across the junctions of the vectors.
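The size thresholds quoted above can be summarised in a small Python sketch. It is only an illustration: it treats the 5 kb, 9 kb and 14 kb figures from the text as limits on one and the same size measure, which is a simplification, and the function and constant names are hypothetical.

```python
# Size limits taken from the text above, in kilobases.
SINGLE_AAV_MAX_KB = 5.0   # CDS size above which a gene is called "large"
DUAL_AAV_MAX_KB = 9.0     # "dual (up to 9 kb)"
TRIPLE_AAV_MAX_KB = 14.0  # "triple (up to 14 kb)"


def vectors_needed(size_kb: float) -> int:
    """Return how many AAV vectors (1, 2 or 3) the quoted limits would imply."""
    if size_kb <= 0:
        raise ValueError("size must be positive")
    if size_kb <= SINGLE_AAV_MAX_KB:
        return 1
    if size_kb <= DUAL_AAV_MAX_KB:
        return 2
    if size_kb <= TRIPLE_AAV_MAX_KB:
        return 3
    raise ValueError("size exceeds the triple AAV limit quoted in the text")


print(vectors_needed(7.0))
```

The sketch says nothing about expression efficiency, which the paragraph above identifies as the actual weakness of dual and triple vectors.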
The inventors have found that delivery of multiple AAV vectors each encoding one of the fragments of either reporter or large therapeutic proteins flanked by short split-inteins results in protein trans-splicing and full-length protein reconstitution both in vitro and in vivo.
Inteins are genetic elements transcribed and translated within a host protein from which they self-excise similarly to a protein intron, without leaving amino acid modifications in the final protein product, in the absence of energy supply, exogenous host-specific proteases or cofactors (O. Novikova, N. Topilina, M. Belfort, J Biol Chem 289, 14490-14497 (2014); K. V. Mills, M. A. Johnson, F. B. Perler, J Biol Chem 289, 14498-14505 (2014); H. Iwai, S. Zuger, J. Jin, P. H. Tam, FEBS Lett 580, 1853-1858 (2006); J. Zettler, V. Schutz, H. D. Mootz, FEBS Lett 583, 909-914 (2009)). Intein activity is context-dependent, with certain peptide sequences surrounding their ligation junction (called N- and C-exteins) that are required for efficient trans-splicing to occur, of which the most important is an amino acid containing a thiol or hydroxyl group (i.e., Cys, Ser or Thr) as first residue in the C-extein (N. H. Shah, et al., J Am Chem Soc 135, 5839-5847 (2013)). Split-inteins are a subset of inteins that are expressed as two separate polypeptides at the ends of two host proteins, and catalyze their trans-splicing resulting in the generation of a single larger polypeptide (Y. Li, Biotechnol Lett 37, 2121-2137 (2015)). Inteins, including split-inteins, are widely used in biotechnological applications that include protein purification and labeling steps, as well as the reconstitution of the widely used CRISPR/Cas9 genome editing nuclease. Several attempts have been made at exploiting intein-based protein splicing to reconstitute expression of therapeutic genes including the Factor VIII gene, wherein the Synechocystis sp. (Ssp) DnaB intein-fused heavy and light chain genes of Factor VIII were demonstrated to lead to reconstitution of Factor VIII in cell culture and in animal models (L. Villiger et al., Nat Med 24, 1519-1525 (2018); F. Zhu et al, Sci China Life, 2010; F. Zhu et al Sci China Life, 2013).
Similarly, a highly functional form of the dystrophin gene was expressed in vitro and in vivo, wherein the 6.3-kb Becker dystrophin gene was split onto two AAV vectors and each half was fused to split inteins obtained from the Synechocystis sp. PCC 6803 (Ssp) DnaB intein or the Rhodothermus marinus (Rma) DnaB intein. Further, a split-intein (namely N. punctiforme DnaE split inteins)-mediated protein trans-splicing strategy was reported to reconstitute the large pore-forming subunit of L-type calcium channels from two separate fragments in heart cells. US 6,544,786 further reports the use of split inteins to deliver a dystrophin minigene.
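The junction requirement described above (Cys, Ser or Thr as the first residue of the C-extein) can be illustrated with a minimal Python sketch. It only screens a protein sequence for residues that satisfy this single criterion; it does not model domain integrity, folding or splicing efficiency. The function names and the short example peptide are chosen here for illustration.

```python
from typing import List, Tuple

# Residues with a nucleophilic thiol or hydroxyl side chain: Cys, Ser, Thr.
NUCLEOPHILIC_RESIDUES = frozenset("CST")


def candidate_split_sites(protein: str) -> List[int]:
    """Return 1-based positions that could be the first residue of a C-extein.

    Position 1 is skipped because the N-terminal fragment would be empty.
    """
    sequence = protein.upper()
    return [
        position
        for position, residue in enumerate(sequence, start=1)
        if position > 1 and residue in NUCLEOPHILIC_RESIDUES
    ]


def split_at(protein: str, site: int) -> Tuple[str, str]:
    """Split so that the residue at `site` becomes the first C-extein residue."""
    if site not in candidate_split_sites(protein):
        raise ValueError("site does not carry Cys, Ser or Thr")
    sequence = protein.upper()
    return sequence[: site - 1], sequence[site - 1:]


example_peptide = "MQIELSTCFF"  # short illustrative peptide
print(candidate_split_sites(example_peptide))
print(split_at(example_peptide, 6))
```

In practice a candidate site would also have to be judged against the other considerations named in the text, so the list returned here is a starting point rather than a selection.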
The present inventors took advantage of the intrinsic ability of split-inteins to mediate protein trans-splicing to reconstitute large full-length proteins following their fragmentation into two split-intein-flanked polypeptides, whose coding sequences fit into single AAV vectors. The present inventors further reported successful AAV-mediated protein trans-splicing (PTS) in the retina resulting in therapeutic levels of protein reconstitution which in some instances match those achieved by single AAV vectors (Tornabene, P. et al. Intein-mediated protein trans-splicing expands adeno-associated virus transfer capacity in the retina. Sci. Transl. Med. (2019); WO 2020/079034, the disclosure of which is herein incorporated by reference in its entirety).
Haemophilia A (HemA) is a severe bleeding disorder caused by the partial or complete deficiency of coagulation factor VIII (F8). With a prevalence of 1 in 5000 male live births, it is the most common inherited X-linked recessive coagulation disorder. F8 activity levels inversely relate to the bleeding risk, with severely affected patients (about 50% of all cases) having circulating protein levels of less than 1% (Bowen, D. J. Haemophilia A and haemophilia B: molecular insights. Mol Pathol. 2002 Feb;55(1):1-18; PMID: 11836440; Antonarakis, S. E., et al., Molecular etiology of factor VIII deficiency in hemophilia A. Hum. Mutat. (1995); White, G. C. et al. Definitions in Hemophilia. Thromb. Haemost. (2001)). On the other hand, levels of F8 activity between 1 and 5% result in a moderate phenotype, levels between 5 and 50% in a mild phenotype, and levels above 50% are associated with normal haemostasis.
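The activity ranges given above can be written as a small lookup, sketched here in Python. The text does not say to which category the exact boundary values (1%, 5%, 50%) belong, so their assignment below is an assumption made for illustration, and the function name is hypothetical.

```python
def classify_f8_activity(percent_of_normal: float) -> str:
    """Map F8 activity (percent of normal) to the phenotype named in the text.

    Boundary handling is an assumption: 1% and 5% count as moderate,
    50% counts as mild.
    """
    if percent_of_normal < 0:
        raise ValueError("activity cannot be negative")
    if percent_of_normal < 1:
        return "severe"
    if percent_of_normal <= 5:
        return "moderate"
    if percent_of_normal <= 50:
        return "mild"
    return "normal"


for activity in (0.5, 3, 20, 80):
    print(activity, classify_f8_activity(activity))
```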
A wide spectrum of heterogeneous mutations impairs the F8 gene causing different levels of severity of the disease. In the most affected patients, there is a high risk of spontaneous bleeding into joints, muscles and other soft tissues as well as lowered or no ability to achieve haemostasis following trauma (Bolton-Maggs, P. H. B. & Pasi, K. J. Haemophilias A and B. Lancet (2003)).
In the last two decades, gene therapy for HemA has been under extensive investigation after it was observed that even modest improvements in the F8 levels (by 1-2%) can significantly reduce the risk of spontaneous bleeding events and the need for F8 replacement infusions (Manco-Johnson, M. J. et al. Prophylaxis versus episodic treatment to prevent joint disease in boys with severe hemophilia. N. Engl. J. Med. (2007)).
Adeno-associated viral (AAV) vectors are emerging as a promising in vivo gene therapy of HemA, because of the vectors' excellent safety profile and ability to direct long-term transgene expression from post-mitotic tissues such as the liver (Nathwani, A. C., Davidoff, A. M. & Tuddenham, E. G. D. Prospects for gene therapy of haemophilia. Haemophilia (2004); Nathwani, A. C. et al., Advances in Gene Therapy for Hemophilia. Hum. Gene Ther. (2017)). However, HemA poses a great challenge to AAV gene therapy because the size of the F8 gene coding sequence (CDS) to be transferred (7 kb) exceeds the canonical AAV cargo capacity of 4.7 kb. Because of this, all of the AAV-based products under clinical investigation consist of B-domain deleted (BDD) versions of the F8 transgene which are ~4.4 kb in size (Makris, M. Gene therapy 1.0 in haemophilia: effective and safe, but with many uncertainties. The Lancet Haematology (2020) doi:10.1016/S2352-3026(20)30035-1). Still, such a large transgene leaves limited space in the vector for the needed regulatory elements, thus restricting the choice of promoters and polyA signals. Moreover, all these vector genomes are on the verge of AAV's normal cargo capacity and at risk of being improperly packaged as a library of heterogeneous truncated genomes. In spite of the ability of such oversize vectors to successfully express large proteins, their long-term efficiency and safety are still to be confirmed (Grieger, J. C. et al., Packaging Capacity of Adeno-Associated Virus Serotypes: Impact of Larger Genomes on Infectivity and Postentry Steps. J. Virol. (2005); Dong, B. et al., Characterization of genome integrity for oversized recombinant AAV vector. Mol. Ther. (2010); Hirsch, M. et al., Little vector, big gene transduction: Fragmented genome reassembly of adeno-associated virus. Molecular Therapy (2010); Wu, Z. et al., Effect of genome size on AAV vector packaging. Mol. Ther. (2010)).
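The size constraint described above is simple arithmetic, sketched below using only the figures quoted in the text (7 kb full-length CDS, ~4.4 kb BDD transgene, 4.7 kb capacity). The function name is hypothetical, and the sketch assumes that everything other than the CDS (promoter, polyA signal and so on) must fit in whatever space remains.

```python
AAV_CAPACITY_KB = 4.7  # canonical AAV cargo capacity quoted in the text


def space_left_kb(cds_kb: float, capacity_kb: float = AAV_CAPACITY_KB) -> float:
    """Return capacity minus CDS size in kb.

    A negative value means the CDS alone already exceeds the capacity;
    a small positive value is what remains for regulatory elements.
    """
    return capacity_kb - cds_kb


for name, size_kb in [("full-length F8 CDS", 7.0), ("BDD F8 transgene", 4.4)]:
    left = space_left_kb(size_kb)
    status = "oversized" if left < 0 else "fits"
    print(f"{name}: {size_kb} kb, {left:+.1f} kb left ({status})")
```

The small positive margin for the BDD transgene illustrates why the choice of promoters and polyA signals is restricted.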
As an alternative, different groups have explored strategies based on co-delivery of dual AAV vectors to reconstitute F8. However, the main drawback of this approach is the apparent chain imbalance in which the heavy chain is less efficiently secreted than the light one, resulting in the production of higher amounts of inactive protein compared to full-length F8 (Burton, M. et al. Coexpression of factor VIII heavy and light chain adeno-associated viral vectors produces biologically active protein. Proc. Natl. Acad. Sci. U. S. A. (1999); Scallan, C. D. et al. Phenotypic correction of a mouse model of hemophilia A using AAV2 vectors encoding the heavy and light chains of FVIII. Blood (2003); Chen, L. et al. Enhanced factor VIII heavy chain for gene therapy of Hemophilia A. Mol. Ther. (2009); Zhu, F. X., et al., Enhanced plasma factor VIII activity in mice via cysteine mutation using dual vectors. Sci. China Life Sci. (2012)).
Wilson disease (WD, OMIM #277900) is a rare autosomal recessive disorder of copper metabolism with an estimated prevalence ranging from one in 30,000 in most populations to one in 10,000 in isolated populations. WD is caused by mutations in the ATP7B gene (Bull PC, et al., The Wilson disease gene is a putative copper transporting P-type ATPase similar to the Menkes gene. Nat Genet 1993;5:327-337; Tanzi RE et al. The Wilson disease gene is a copper transporting ATPase with homology to the Menkes disease gene. Nat Genet 1993;5:344-350), encoding a P-type copper transporting ATPase highly expressed in hepatocytes which is critical for regulation of copper levels. WD is characterized by toxic copper deposits in the liver and, to a lesser extent, in the brain, eyes and kidneys. Clinical symptoms in WD include cirrhosis and chronic hepatitis that end in liver failure, and psychiatric and neurological deficits including Parkinsonism and seizures. The age of presentation of WD, the prevalence of hepatic and central nervous system (CNS) involvement, and their severity are highly variable (Rosencrantz R, Schilsky M. Wilson disease: pathogenesis and clinical considerations in diagnosis and treatment. Semin Liver Dis 2011;31:245-259).
Current therapies for WD are based on removal of copper deposits by chelating agents (D-penicillamine, trientine, tetrathiomolybdate) and reduction of intestinal copper absorption by zinc salts through stimulation of endogenous chelators such as metallothioneins. Therapy is effective in most but not all WD patients, and non-responders usually develop progressive liver failure and require liver transplantation. In patients responding to therapy, compliance with treatment is often an issue given the drug side effects and duration of the treatment. Therefore, alternative therapies for WD are highly needed and gene therapy has the potential to provide a definitive cure for this severe, life-threatening disease.
AAV-mediated delivery of full-length human ATP7B resulted in sustained correction of copper metabolism in young male Atp7b-/- mice but resulted in poor production yield due to the oversized genome and failed to ameliorate the disease phenotype in female and old male Atp7b-/- mice (Murillo O, Luqui DM, Gazquez C, Martinez-Espartosa D, Navarro-Blasco I, Monreal JI, Guembe L, et al. Long-term metabolic correction of Wilson's disease in a murine model by gene therapy. J Hepatol 2016;64:419-426). More recently, delivery of a mini-ATP7B, in which four out of six metal binding domains (MBDs) had been deleted, restored copper homeostasis in male and female Atp7b-/- mice (Murillo O, Moreno D, Gazquez C, Barberia M, Cenzano I, Navarro I, Uriarte I, et al. Liver Expression of a MiniATP7B Gene Results in Long-Term Restoration of Copper Homeostasis in a Wilson Disease Model in Mice. Hepatology 2019;70:108-126). Nevertheless, mini-ATP7B lacks important phosphorylation sites that regulate protein stability and trafficking and may be less effective than the full-length protein in vivo (Pilankatta R, Lewis D, Inesi G. Involvement of protein kinase D in expression and trafficking of ATP7B (copper ATPase). J Biol Chem 2011;286:7389-7396).
Therefore, there is still a need for constructs and vectors that can be exploited to reconstitute the expression of large genes for an effective gene therapy.
SUMMARY OF THE INVENTION
The present inventors have used split inteins, in particular Npu DnaE inteins, to reconstitute variants of the F8 and ATP7B genes for the treatment of hemophilia and Wilson disease. The inventors applied liver gene therapy with AAV intein vectors to reconstitute and improve a highly active variant of the F8 gene, named N6 variant (5 kb) (Miao, H. Z. et al. Bioengineering of coagulation factor VIII for improved secretion. Blood (2004); Ward, N. J. et al. Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood (2011)), showing supraphysiological levels of F8 activity. The inventors showed that their approach allowed successful reconstitution both in vitro and in vivo, thus allowing generation of well-defined AAV vectors within their normal packaging capacity. Furthermore, the inventors showed that their approach achieves levels of F8 comparable with those obtained with a previously described oversize single AAV-F8 variant. Surprisingly, however, and unlike the single AAV vector, the inventors' intein-based approach does not elicit F8 neutralizing antibodies, thereby overcoming certain existing limitations of hemophilia A gene therapy.
The inventors also applied gene therapy with AAV intein vectors to reconstitute the ATP7B gene for the treatment of Wilson disease.
In one aspect, the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
(a) the first vector comprises a first portion of the coding sequence (CDS1) and a first intein nucleotide sequence that encodes an N-intein, wherein the first intein nucleotide sequence is at the 3' end of CDS1; and
(b) the second vector comprises a second portion of the coding sequence (CDS2) and a second intein nucleotide sequence that encodes a C-intein, wherein the second intein nucleotide sequence is at the 5' end of CDS2; wherein the coding sequence encodes Factor VIII (F8) or ATP7B, or a variant thereof.
In preferred embodiments, the first portion of the coding sequence (CDS1) and the second portion of the coding sequence (CDS2) together constitute the coding sequence.
In preferred embodiments, when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
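The arrangement of the two vectors described above can be sketched at the sequence level as follows. This is a minimal illustration, not the inventors' cloning procedure: the function name, the toy coding sequence and the two intein placeholder strings are hypothetical, and regulatory elements (promoter, ITRs, polyA signal) are deliberately left out.

```python
def build_split_inserts(cds: str, split_nt: int, n_intein: str, c_intein: str):
    """Split a coding sequence in frame and attach the intein halves.

    The N-intein sequence is placed at the 3' end of CDS1 and the
    C-intein sequence at the 5' end of CDS2.
    """
    if split_nt % 3 != 0:
        raise ValueError("split point must fall on a codon boundary")
    if not 0 < split_nt < len(cds):
        raise ValueError("split point must lie inside the coding sequence")
    cds1, cds2 = cds[:split_nt], cds[split_nt:]
    first_insert = cds1 + n_intein    # first vector: CDS1 followed by N-intein
    second_insert = c_intein + cds2   # second vector: C-intein followed by CDS2
    return first_insert, second_insert


# Hypothetical toy inputs, for illustration only (not sequences from the text).
TOY_CDS = "ATGGCTAAGTCTGGTGGTCTGTAA"
N_INTEIN_PLACEHOLDER = "GGGAAACCC"
C_INTEIN_PLACEHOLDER = "CCCAAAGGG"

first, second = build_split_inserts(TOY_CDS, 9, N_INTEIN_PLACEHOLDER, C_INTEIN_PLACEHOLDER)
print(first)
print(second)
```

In this sketch CDS1 and CDS2 together constitute the full coding sequence, so removing the two intein placeholders and joining the remainders gives back the original input.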
In some embodiments, the Factor VIII variant is N6 or SQ-N6.
In another aspect, the present invention provides a vector system to express a coding sequence in a cell, said coding sequence consisting of a first portion (CDS1) and a second portion (CDS2), said vector system comprising:
a) a first vector comprising:
- said first portion of said coding sequence (CDS1),
- a first intein nucleotide sequence coding for an N-intein, said first intein nucleotide sequence having at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the sequence:
TGCCTGAGCTACGAGACCGAGATCCTGACCGTGGAGTACGGCCTGCTGCCCATCGGCAAGATCG
TGGAGAAGCGGATCGAGTGCACCGTGTACAGCGTGGACAACAACGGCAACATCTACACCCAGCC
CGTGGCCCAGTGGCACGACCGGGGCGAGCAGGAGGTGTTCGAGTACTGCCTGGAGGACGGCAGC
CTGATCCGGGCCACCAAGGACCACAAGTTCATGACCGTGGACGGCCAGATGCTGCCCATCGACG
or said first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the sequence:
TGCCTGTCCTATGAGACAGAGATCCTGACAGTGGAGTACGGCCTGCTGCCTATCGGCAAGA
TCGTGGAGAAGAGGATCGAGTGTACCGTGTATAGCGTGGACAACAATGGCAATATCTACA
CACAGCCAGTGGCACAGTGGCACGACAGGGGAGAGCAGGAGGTGTTTGAGTATTGTCTGG
AGGATGGCAGCCTGATCCGGGCCACCAAGGATCACAAGTTCATGACAGTGGACGGCCAGAT
GCTGCCAATCGATGAGATCTTTGAGCGCGAGCTGGACCTGATGCGGGTGGATAACCTGCCC
AAT (SEQ ID No. 16), or said N-intein has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID No. 1 or a variant thereof or a fragment thereof or a homolog thereof; and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and
b) a second vector comprising:
- said second portion of said coding sequence (CDS2),
- a second intein nucleotide sequence coding for a C-intein, said second intein nucleotide sequence having at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the sequence:
ATCAAGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGGAGCGGG
ACCACAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAAT (SEQ ID No. 17), or said second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the sequence:
ATGATCAAGATCGCCACACGGAAGTACCTGGGCAAGCAGAACGTGTATGATATCGGCGTGGAGCGGGACCACAACTTCGCCCTGAAGAATGGCTTTATCGCCAGCAAT (SEQ ID No. 18), or said C-intein has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID No. 2 or a variant thereof or a fragment thereof or a homolog thereof; and wherein said second intein nucleotide sequence is located at the 5' end of CDS2; wherein when the first vector and the second vector are inserted in a cell, the protein product of the coding sequence is produced by protein splicing.
In some embodiments, the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 15.
In some embodiments, the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 16.
In some embodiments, the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 1.
In some embodiments, the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 17.
In some embodiments, the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 18.
In some embodiments, the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 2.
In some embodiments, the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 1, 3, 5, 7, 9, 11 or 13.
In some embodiments, the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 2, 4, 6, 8, 10, 12 or 14.
In some embodiments, the first intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the N-intein sequence shown in SEQ ID NO: 26, 28, 30, 32, 34 or 38.
In some embodiments, the second intein nucleotide sequence has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with the C-intein sequence shown in SEQ ID NO: 27, 29, 31, 33, 35 or 39.
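The identity thresholds recited above can be illustrated with a simple calculation. The sketch below assumes the simplest possible case, two sequences of equal length compared position by position without gaps; this passage does not specify an alignment method, and real comparisons would normally use a pairwise alignment tool. The function names and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_identity(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """True if the two sequences share at least `threshold` percent identity."""
    return percent_identity(seq_a, seq_b) >= threshold


# Toy 10-nt example with a single mismatch at position 8.
reference = "ATGCATGCAT"
candidate = "ATGCATGGAT"
print(percent_identity(reference, candidate))        # 90.0
print(meets_identity(reference, candidate, 95.0))    # False
```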
The intein nucleotide sequence may be codon optimized.
Preferably, the inteins are capable of trans-splicing reactions.
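At the protein level, the outcome of a trans-splicing reaction can be modelled as string manipulation: the two intein halves are excised and the flanking polypeptides are joined. The sketch below is only a conceptual model and says nothing about reaction chemistry or efficiency; the function name, the bracketed intein markers and the toy peptides are hypothetical.

```python
def trans_splice(precursor_n: str, precursor_c: str, n_intein: str, c_intein: str) -> str:
    """Model protein trans-splicing of two precursor polypeptides.

    precursor_n is expected to end with the N-intein and precursor_c to
    start with the C-intein; the spliced product is the two flanking
    polypeptides joined directly.
    """
    if not precursor_n.endswith(n_intein):
        raise ValueError("first precursor must carry the N-intein at its C-terminus")
    if not precursor_c.startswith(c_intein):
        raise ValueError("second precursor must carry the C-intein at its N-terminus")
    n_part = precursor_n[: len(precursor_n) - len(n_intein)]
    c_part = precursor_c[len(c_intein):]
    return n_part + c_part


# Hypothetical markers standing in for the intein halves.
N_INT = "[N-intein]"
C_INT = "[C-intein]"

product = trans_splice("MKTAY" + N_INT, C_INT + "SGGPL", N_INT, C_INT)
print(product)  # MKTAYSGGPL
```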
In some embodiments, the vector system is a vector combination comprising the first vector and the second vector. In some embodiments, the first and/or second vector comprises a 5'-ITR. The 5'-ITR may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 40.
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTT
GGTCGCCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCAC
TAGGGGTTCCT
(SEQ ID NO: 40)
In some embodiments, the first and/or second vector comprises a 3'-ITR. The 3'-ITR may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 41.
AGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAG
GCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGA
GCGAGCGCGCAG
(SEQ ID NO: 41)
In some embodiments, the first and/or second vector comprises an intron, preferably an SV40 intron. The intron may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 42.
aagtatcaiaggttacaagacagg.tttaaggag.accaatagaaactgggcttgtQg.ag.acagagaagactcttgc^ttt.c igata^gcacctattggtcttactgacatccacttgcctttctctccacag
(SEQ ID NO: 42)
In some embodiments, the first and/or second vector comprises a promoter. Preferably the promoter is operably linked to the first or second portion of the coding sequence. The promoter may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 43 or 44.
GATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCG
TTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGT
CAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGG
AGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCC
TATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGA
CTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGC
AGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTG
ACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACT
CCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCT
GGTTTAGTGAACCGTCAGATCA (SEQ ID NO: 43; CMV promoter)
TGTTTGCTGCTTGCAATGTTTGCCCATTTTAGGGTGGACACAGGACGCTGTGGTTTCTGAGCCA GGGGGCGACTCAGATCCCAGCCAGTGGACTTAGCCCCTGTTTGCTCCTCCGATAACTGGGGTG ACCTTGGTTAATATTCACCAGCAGCCTCCCCCGTTGCCCCTCTGGATCCACTGCTTAAATACGG ACGAGGACAGGGCCCTGTCTCCTCAGCTTCAGGCACCACCACTGACCTGGGACAGTGAAT (SEQ ID NO: 44; HLP promoter)
In some embodiments, the first and/or second vector comprises a polyadenylation sequence. Preferably the polyadenylation sequence is operably linked to the first or second portion of the coding sequence. The polyadenylation sequence may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 45 or 46.
AATTCAATAAAAGATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTTGTGTGCGGCC (SEQ ID NO: 45)
gcctcgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactccca
ctgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcag
gacagcaagggggaggattgggaagacaatagcaggcatgctgggga (SEQ ID NO: 46)
In some embodiments, the first and/or second vector comprises an enhancer. Preferably the enhancer is a WPRE. The enhancer may have a nucleotide sequence that has at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 47.
aatcaacctctggattacaaaatttgtgaaagattgactggtattcttaactatgttgctccttttacgctatgtggatacgctg
ctttaatgcctttgtatcatgctattgcttcccgtatggctttcattttctcctccttgtataaatcctggttgctgtctctttatgag
gagttgtggcccgttgtcaggcaacgtggcgtggtgtgcactgtgtttgctgacgcaacccccactggttggggcattgccac
cacctgtcagctcctttccgggactttcgctttccccctccctattgccacggcggaactcatcgccgcctgccttgcccgctgc
tggacaggggctcggctgttgggcactgacaattccgtggtgttgtcggggaagctgacgtcctttccatggctgctcgcctg
tgttgccacctggattctgcgcgggacgtccttctgctacgtcccttcggccctcaatccagcggaccttccttcccgcggcctg
ctgccggctctgcggcctcttccgcgtcttcg
(SEQ ID NO: 47)
In some embodiments, the first vector comprises a nucleotide sequence that encodes a signal peptide. Preferably, the signal peptide is operably linked to the protein encoded by the coding sequence. The nucleotide sequence encoding the signal peptide may have at least 80 % (e.g. at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 48 or 49.
ATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGT (SEQ ID NO: 48; F8 signal peptide)
ATGCAGATTGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGAGGTTCTGCTTCTCT (SEQ ID NO: 49; codon optimized F8 signal peptide)
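Codon optimization changes the nucleotide sequence without changing the encoded peptide. As an illustrative check, the sketch below translates the two signal peptide sequences given above (SEQ ID NO: 48 and 49) with the standard genetic code and compares the results. It assumes that the spaces appearing inside SEQ ID NO: 48 in some renderings of the text are extraction artifacts and removes them; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    seq = nucleotides.replace(" ", "").upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


F8_SIGNAL_NATIVE = "ATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGT"     # SEQ ID NO: 48
F8_SIGNAL_OPTIMIZED = "ATGCAGATTGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGAGGTTCTGCTTCTCT"  # SEQ ID NO: 49

native_peptide = translate(F8_SIGNAL_NATIVE)
optimized_peptide = translate(F8_SIGNAL_OPTIMIZED)
print(native_peptide)
print(native_peptide == optimized_peptide)
```

Both sequences are 57 nucleotides long and differ at several codons while encoding the same 19-residue peptide.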
Preferably the first vector and the second vector further comprise a promoter sequence operably linked to the 5' end portion of said first portion of the coding sequence (CDS1) or of said second portion of the coding sequence (CDS2). Preferably said promoter is a liver-specific promoter, preferably an HLP, thyroxine binding globulin (TBG), HCB or F8 promoter. In some embodiments, the promoter is an HLP promoter.
In some embodiments, the first vector comprises a promoter, preferably operably linked to the first portion of the coding sequence (CDS1). In some embodiments, the second vector comprises a promoter, preferably operably linked to the second portion of the coding sequence (CDS2).
Preferably the first vector and the second vector further comprise a 5'-terminal repeat (5'-TR) nucleotide sequence and a 3'-terminal repeat (3'-TR) nucleotide sequence, preferably the 5'-TR is a 5'-inverted terminal repeat (5'-ITR) nucleotide sequence and the 3'-TR is a 3'-inverted terminal repeat (3'-ITR) nucleotide sequence. In some embodiments, the first and/or second vector comprises an AAV2 5'-ITR and an AAV2 3'-ITR. In some embodiments, the first and/or second vector comprises an AAV8 5'-ITR and an AAV8 3'-ITR.
Preferably the first vector and the second vector further comprise a poly-adenylation signal nucleotide sequence and/or wherein at least one of the first vector or the second vector further comprises a nucleotide sequence coding for a degradation signal.
Preferably the degradation signal is selected from the group consisting of CL1, PB29, SMN, CIITA, ODc, ecDHFR or a fragment thereof.
Preferably the coding sequence encodes a protein able to correct hemophilia or Wilson disease. Preferably the coding sequence is the coding sequence of a gene selected from the group consisting of: F8 or ATP7B or variant thereof. Preferably the variant is N6 or N6-SQ.
In some embodiments, the coding sequence is an F8 coding sequence or variant thereof. In some embodiments, the coding sequence is an ATP7B coding sequence. In preferred embodiments, the coding sequence encodes a B-domain deleted (BDD) F8.
In some embodiments, the coding sequence comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 22 or 23.
In some embodiments, the coding sequence comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 24 or 25.
In some embodiments, the coding sequence comprises or consists of a nucleotide sequence that encodes a protein with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 19 or 20, preferably SEQ ID NO: 19.
In some embodiments, the coding sequence comprises or consists of a nucleotide sequence that encodes a protein with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with SEQ ID NO: 21.
In some embodiments, the first portion of the coding sequence (CDS1) is the portion disclosed in SEQ ID NO: 26, 28, 30, 32 or 38 (or a sequence with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity thereto).
In some embodiments, the second portion of the coding sequence (CDS2) is the portion disclosed in SEQ ID NO: 27, 29, 31, 33 or 39 (or a sequence with at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity thereto).
In another aspect, the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
(a) the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 26; and
(b) the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 27; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
In another aspect, the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
(a) the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 28; and
(b) the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 29; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
In another aspect, the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
(a) the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 30; and
(b) the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 31; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
In another aspect, the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
(a) the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 32; and
(b) the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 33; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
In another aspect, the invention provides a vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
(a) the first vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 38; and
(b) the second vector comprises or consists of a nucleotide sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with SEQ ID NO: 39; preferably wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing.
Preferably the coding sequence is split into the first portion and the second portion at a position consisting of a nucleophilic amino acid which does not fall within a structural domain or a functional domain of the encoded protein product, wherein the nucleophilic amino acid is selected from serine, threonine, or cysteine.
In some embodiments, the coding sequence is split into the first portion and the second portion at a position that substantially does not affect expression and/or activity (e.g. procoagulant activity) of the protein encoded by the coding sequence.
In some embodiments, the F8 coding sequence is split at a position corresponding to Ser962 or Ser883, preferably Ser962 (with Ser962 or Ser883 being the first amino acid of the second portion). Preferably, the coding sequence split site is defined using a numbering convention based on SEQ ID NO: 20, in which the first Met amino acid is position 1.
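The split-site criteria above (a serine, threonine or cysteine lying outside structural or functional domains, with the chosen residue becoming the first amino acid of the second portion) can be sketched as follows. The function names, the toy protein and the domain coordinates are hypothetical and chosen only to illustrate the 1-based numbering convention; the sketch does not assess the effect of a split on expression or activity.

```python
NUCLEOPHILIC_RESIDUES = {"S", "T", "C"}  # serine, threonine, cysteine


def candidate_split_sites(protein: str, domains: list) -> list:
    """Return 1-based positions of Ser/Thr/Cys residues that lie outside every domain.

    `domains` is a list of inclusive (start, end) pairs in 1-based numbering,
    with the first Met counted as position 1.
    """
    sites = []
    for position, residue in enumerate(protein, start=1):
        inside_domain = any(start <= position <= end for start, end in domains)
        if residue in NUCLEOPHILIC_RESIDUES and not inside_domain:
            sites.append(position)
    return sites


def split_at(protein: str, position: int):
    """Split so that the residue at 1-based `position` is the first residue of the second portion."""
    if not 1 < position <= len(protein):
        raise ValueError("split position must leave two non-empty portions")
    return protein[: position - 1], protein[position - 1:]


# Hypothetical toy protein with two annotated domains (residues 1-4 and 9-12).
toy_protein = "MAAKSGGLTPEC"
toy_domains = [(1, 4), (9, 12)]

sites = candidate_split_sites(toy_protein, toy_domains)
first_portion, second_portion = split_at(toy_protein, sites[0])
print(sites, first_portion, second_portion)
```

Under this convention, a split at position 883 would give a first portion of 882 residues and a second portion beginning with the residue at position 883.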
In some embodiments the first portion of the coding sequence encodes an amino acid sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with any one of SEQ ID NO: 50-54.
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN TSW YKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTW ITLKNMASHPVSLHAV GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD AASARAWPKMHTVNGYVNRSLPGL IGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE EAEDYDDDLTDSEMDW RFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA PDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTL LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEI FKYKWTVTVEDGP TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHS INGYVFDSLQLSVCLHEVAYWYILS IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRG MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPE
(SEQ ID NO: 50)
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN TSW YKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTW ITLKNMASHPVSLHAV GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD AASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTT PEVHSIFLEGHTFLVRNH RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE EAEDYDDDLTDSEMDW RFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA PDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTL LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEI FKYKWTVTVEDGP TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHS INGYVFDSLQLSVCLHEVAYWYILS IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVEMSMENPGLWILGCHNSDFRNRG MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT E
(SEQ ID NO: 51)
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN TSW YKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTW ITLKNMASHPVSLHAV GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD AASARAWPKMHTVNGYVNRSLPGL IGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE EAEDYDDDLTDSEMDW RFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA PDDRSYKSQYLNNGPQRIGRKYKKVREMAYTDETFKTREAIQHESGILGPLLYGEVGDTL LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGE IFKYKWTVTVEDGP TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHS INGYVFDSLQLSVCLHEVAYWYILS IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVEMSMENPGLWILGCHNSDFRNRG MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPE
(SEQ ID NO: 52)
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN TSW YKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTW ITLKNMASHPVSLHAV GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD AASARAWPKMHTVNGYVNRSLPGL IGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE EAEDYDDDLTDSEMDW RFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA PDDRSYKSQYLNNGPQRIGRKYKKVREMAYTDETFKTREAIQHESGILGPLLYGEVGDTL LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGE IFKYKWTVTVEDGP TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHS INGYVFDSLQLSVCLHEVAYWYILS IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRG MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT E
(SEQ ID NO: 53)
MPEQERQITAREGASRKILSKLSLPTRAWEPAMKKSFAFDNVGYEGGLDGLGPSSQVATSTVRILGMTCQS
CVKSIEDRISNLKGIISMKVSLEQGSATVKYVPSVVCLQQVCHQIGDMGFEASIAEGKAASWPSRSLPAQEA
VVKLRVEGMTCQSCVSSIEGKVRKLQGVVRVKVSLSNQEAVITYQPYLIQPEDLRDHVNDMGFEAAIKSKV
APLSLGPIDIERLQSTNPKRPLSSANQNFNNSETLGHQGSHVVTLQLRIDGMHCKSCVLNIEENIGQLLGVQ
SIQVSLENKTAQVKYDPSCTSPVALQRAIEALPPGNFKVSLPDGAEGSGTDHRSSSSHSPGSPPRNQVQGTC
STTLIAIAGMTCASCVHSIEGMISQLEGVQQISVSLAEGTATVLYNPSVISPEELRAAIEDMGFEASVVSESCS
TNPLGNHSAGNSMVQTTDGTPTSVQEVAPHTGRLPANHAPDILAKSPQSTRAVAPQK
(SEQ ID NO: 54)
In some embodiments the second portion of the coding sequence encodes an amino acid sequence that has at least 75% (e.g. at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%, preferably 100%) identity with any one of SEQ ID NO: 55-59.
SGLQLRLNEKLGTTAATELKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKS
SPLTESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTREITRTTLQSDQEEIDYDDTISVEMKKEDFDI
YDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSVPQFKKVVFQEFTDGSFTQPLYR
GELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYSSLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQH
HMAPTKDEFDCKAWAYFSDVDLEKDVHSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFT
ENMERNCRAPCNIQMEDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGH
VFTVRKKEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGMASGHI
RDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQGARQKFSSLYISQFIIMY
SLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIARYIRLHPTHYSIRSTLRMELMGCDLNSCSMP
LGMESKAISDAQITASSYFTNMFATWSPSKARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTT
QGVKSLLTSMYVKEFLISSSQDGHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVH
QIALRMEVLGCEAQDLY
(SEQ ID NO: 55)
SGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTREITRTTLQSDQEEIDYDDT
ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV
PQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS
SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV
HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQM
EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRK
KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQ
GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIAR
YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSK
ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD
GHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE
AQDLY
(SEQ ID NO: 56)
SGLQLRLNEKLGTTAATELKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKS
SPLTESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTRHQREITRTTLQSDQEEIDYDDTISVEMKKE
DFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSVPQFKKVVFQEFTDGSFTQP
LYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYSSLISYEEDQRQGAEPRKNFVKPNETKTYFWK
VQHHMAPTKDEFDCKAWAYFSDVDLEKDVHSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKS
WYFTENMERNCRAPCNIQMEDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIH
FSGHVFTVRKKEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQGARQKFSSLYIS
QFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIARYIRLHPTHYSIRSTLRMELMGCDLN
SCSMPLGMESKAISDAQITASSYFTNMFATWSPSKARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKV
TGVTTQGVKSLLTSMYVKEFLISSSQDGHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQ
SWVHQIALRMEVLGCEAQDLY
(SEQ ID NO: 57)
SGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTRHQREITRTTLQSDQEEIDYDDT
ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV
PQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS
SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV
HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQM
EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRK
KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQ
GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIAR
YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSK
ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD
GHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE
AQDLY
(SEQ ID NO: 58)
CFLQIKGMTCASCVSNIERNLQKEAGVLSVLVALMAGKAEIKYDPEVIQPLEIAQFIQDLGFEAAVMEDYAG
SDGNIELTITGMTCASCVHNIESKLTRTNGITYASVALATSKALVKFDPEIIGPRDIIKIIEEIGFHASLAQRNPN
AHHLDHKMEIKQWKKSFLCSLVFGIPVMALMIYMLIPSNEPHQSMVLDHNIIPGLSILNLIFFILCTFVQLLG
GWYFYVQAYKSLRHRSANMDVLIVLATSIAYVYSLVILVVAVAEKAERSPVTFFDTPPMLFVFIALGRWLEHL
AKSKTSEALAKLMSLQATEATVVTLGEDNLIIREEQVPMELVQRGDIVKVVPGGKFPVDGKVLEGNTMADE
SLITGEAMPVTKKPGSTVIAGSINAHGSVLIKATHVGNDTTLAQIVKLVEEAQMSKAPIQQLADRFSGYFVPF
IIIMSTLTLVVWIVIGFIDFGVVQRYFPNPNKHISQTEVIIRFAFQTSITVLCIACPCSLGLATPTAVMVGTGVA
AQNGILIKGGKPLEMAHKIKTVMFDKTGTITHGVPRVMRVLLLGDVATLPLRKVLAVVGTAEASSEHPLGV
AVTKYCKEELGTETLGYCTDFQAVPGCGIGCKVSNVEGILAHSERPLSAPASHLNEAGSLPAEKDAVPQTFSV
LIGNREWLRRNGLTISSDVSDAMTDHEMKGQTAILVAIDGVLCGMIAIADAVKQEAALAVHTLQSMGVDV
VLITGDNRKTARAIATQVGINKVFAEVLPSHKVAKVQELQNKGKKVAMVGDGVNDSPALAQADMGVAIG
TGTDVAIEAADVVLIRNDLLDVVASIHLSKRTVRRIRINLVLALIYNLVGIPIAAGVFMPIGIVLQPWMGSAA
MAASSVSVVLSSLQLKCYKKPDLERYEAQAHGHMKPLTASQVSVHIGMDDRWRDSPRATPWDQVSYVS
QVSLSSLTSDKPSRHSAAADDDGDKWSLLLNGRDEEQYI
(SEQ ID NO: 59)
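The percentage identity thresholds recited above can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned (gaps written as "-") and that identity is counted as identical columns divided by total alignment columns; this is one common convention and an assumption here, not a definition taken from this passage. The function names are hypothetical. The example strings are the short region in which SEQ ID NO: 55 and SEQ ID NO: 57 differ (the latter carries an additional "HQR").

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: identity = identical non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both residues are equal and not a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Region where SEQ ID NO: 55 and SEQ ID NO: 57 differ, aligned by hand.
fragment_55 = "SSTR---EITRTT"
fragment_57 = "SSTRHQREITRTT"

print(round(percent_identity(fragment_55, fragment_57), 1))
print(meets_threshold(fragment_55, fragment_57, threshold=75.0))
```

Ten of the thirteen alignment columns match in this fragment, so it passes a 75% threshold but not an 80% one. Over the full-length sequences the three-residue difference would have a far smaller effect.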
Preferably the coding sequence is codon optimised.
In some embodiments, the coding sequence is not codon optimised.
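As an illustration of what codon optimisation leaves unchanged, the sketch below translates the first 21 nucleotides of nucleotide sequences a) and b) listed in the next paragraph (SEQ ID NO: 22 and SEQ ID NO: 23), with extraction spaces removed. It assumes the standard genetic code; the helper name `translate` is hypothetical and is not part of the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, stopping at the first stop codon.

    Whitespace is ignored so that sequences copied with stray spaces still work.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First seven codons of SEQ ID NO: 22 and SEQ ID NO: 23.
start_seq_22 = "ATGCAAATAGAGCTCTCCACC"
start_seq_23 = "ATGCAGATTGAGCTGAGCACC"

print(translate(start_seq_22))
print(translate(start_seq_23))
```

The two fragments differ at several nucleotide positions yet both translate to MQIELST, the first seven residues of SEQ ID NO: 53.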
Preferably coding sequence has at least 80% (for example, at least 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100%) identity with a sequence selected from the group consisting of: a) ATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTGCCACCAG AAG AT ACT ACCT G GGTG CAGTG G AACTGTC ATGG G ACTATATG CAAAGTG ATCTCG GTG AG C T G CCT GT G G ACGC AAG ATTTCCT CCT AG AGT GCCAAAAT CTTTTCCATT C AAC ACCT CAGTCGT GTACAAAAAG ACT CT GTTT GT AG AATT CACGG AT C ACCTTTT C AACATCG CT AAG CC AAGG CCA CCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTACA CTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTACTGGAAAGCTT CTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGATAAAGTCTTC CCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATGGCCTCTGAC CCACT GTG CCTT ACCT ACT CAT AT CTTT CT CAT GTG G ACCTG GT AAAAG ACTT G AATT C AGG CCT
CATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACACAGACCTTGC AC AAATTT AT ACT ACTTTTT G CT GTATTT GAT G AAG GG AAAAGTT GG CACT C AG AAAC AAAG A
ACTCCTT GATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCT A A A AT GCACACAGTCA
ATGGTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCTATTGGC
ATGTG ATT G G A AT GGGCACCACTCCTGAAGTGCACT C A AT ATT CCTCG A AG GT C AC AC ATTT CT
T GTG AGG AACCATCG CC AGG CGTCCTT G G AAAT CTCG CC AAT AACTTT CCTT ACT G CT CAAAC A
CTCTTGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATGATGGCAT
GGAAGCTTATGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAAATAATG
AAG AAG CG G AAG ACT ATG ATG ATG ATCTTACTG ATTCTG AAATG G ATGTG GTCAG GTTTGATG
ATG ACAACT CTCCTTCCTTT ATCC AAATTCGCT C AGTT G CC AAG AAG CAT CCT AAAACTT G GGT
ACATTACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCCGATGA
C AG AAGTTAT AAAAGTC AAT ATTT G AAC AAT G GCCCT C AG CG G ATTGGTAG G AAGTAC AAAA
AAGTCCGATTTATGGCATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCATGAAT
CAGGAATCTTGGGACCTTTACTTTATGGGGAAGTTGGAGACACACTGTTGATTATATTTAAGA
ATCAAGCAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCCGTCCTTTGTATTC
AAGG AG ATT ACCAAAAG GT GT AAAAC ATTT G AAG G ATTTT CC AATT CT GCCAG GAG AAAT ATT
CAAATATAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCCTGAC
CCG CT ATT ACT CT AGTTT CGTT AAT AT G GAG AG AG AT CT AGCTT CAG G ACT C ATT GG CCCTCTC
CT CAT CTGCT AC AAAG AAT CT GT AG AT C AAAG AG G AAACCAG AT AAT GTCAG AC AAG AG GAA
TGTCATCCTGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATACAACGC
TTTCTCCCCAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCATGC
ACAGCATCAATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGCATA
CT G GTAC ATT CT AAG C ATT GGAGCACAGACT G ACTT CCTTT CTGT CTT CTT CT CTG G ATAT ACCT
T C AAACAC AAAAT G GT CT AT G AAG ACAC ACT CACCCT ATTCCC ATT CT C AG G AG AAACT GTCTT
CATGTCGATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAACAG
AGGCATGACCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAGGA
CAGTTATGAAGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGTTTT
TCACAGAATCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGC
AAAAACAGTTT AATGCAACCACAATACCTG AAAATG AT ATAG AG AAAACCG AT CCCTGGTT CG
C ACACCG AACCCCC AT G CC AAAAATT C AAAACGTCTCC AGTT CCG AT CTT CT CAT G CT CTT GCG
CCAGTC ACCC ACACCAC AT G GT CT CT CCCT CAG CG ACCT GC AAG AG GCG AAAT AT G AAAC ATT
TTCAGATGACCCTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTT
CGG CCG CAG CTGCATC ATTCTG GTG AT ATG GT ATT C ACCCCG G AAT CAG GCCT CC AACTT AG A CTTAACGAGAAACTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCC
AGTACCAGCAACAACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAAT
ACATCATCACTTGGGCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTT
TGGTAAGAAGTCATCCCCACTCACCGAAAGCGGTGGACCTTTGTCTCTCTCTGAGGAGAATAA
TGACTCCAAGCTGCTTGAGTCAGGGTTGATGAACAGCCAAGAATCCTCATGGGGAAAAAACG
TTTCCT CC ACCAG GG AAAT AACT CGTACT ACT CTT C AGTC AG AT C AAG AG G AAATT G ACT ATG A
TG AT ACC AT ATC AGTT G A A AT G A AG A AG G A AG ATTTT G AC ATTT ATG ATG AG G ATG A A AAT C A
GAGCCCCCGCAGCTTTCAAAAGAAAACACGACACTATTTTATTGCTGCAGTGGAGAGGCTCTG
GGATTATGGGATGAGTAGCTCCCCACATGTTCTAAGAAACAGGGCTCAGAGTGGCAGTGTCC
CT C AGTT CAAG AAAGTT GTTTT CC AGG AATTT ACT GAT G GCTCCTTT ACT C AG CCCTT AT ACCGT
GGAGAACTAAATGAACATTTGGGACTCCTGGGGCCATATATAAGAGCAGAAGTTGAAGATAA
T AT CAT GGTAACTTT CAG AAAT CAG GCCT CTCGT CCCT ATTCCTT CT ATT CT AGCCTT ATTT CTT A
TGAGGAAGATCAGAGGCAAGGAGCAGAACCTAGAAAAAACTTTGTCAAGCCTAATGAAACCA
AAACTT ACTTTT G G AAAGT GC AACAT CAT AT GG C ACCCACT AAAG AT G AGTTT G ACT G CAAAG
CCTGGGCTTATTTCTCTGATGTTGACCTGGAAAAAGATGTGCACTCAGGCCTGATTGGACCCCT
TCTGGTCTGCCACACTAACACACTGAACCCTGCTCATGGGAGACAAGTGACAGTACAGGAATT
TG CTCT GTTTTT C ACC AT CTTT GATGAGACCAAAAGCTGGT ACTT C ACT G A A AAT ATG G A A AG A
AACTGCAGGGCTCCCTGCAATATCCAGATGGAAGATCCCACTTTTAAAGAGAATTATCGCTTCC
AT G C A AT C A AT G G CT AC AT AAT G G ATAC ACT ACCTG G CTT AGTA AT G G CTC AG G AT C A A AG G A
TTCGATGGTATCTGCTCAGCATGGGCAG C AAT G A A A AC ATCC ATT CT ATT C ATTT CAGTGGACA
TGT GTTC ACT GTACG AAAAAAAG AG G AGTAT AAA AT G GC ACT GT ACAAT CT CT AT CC AG GTGT
TTTTGAGACAGTGGAAATGTTACCATCCAAAGCTGGAATTTGGCGGGTGGAATGCCTTATTGG
CG AG CAT CT AC AT G CTGG G ATG AG CAC ACTTTTT CTGGT GTAC AGC AAT AAGT GT CAG ACT CC
CCTGGGAATGGCTTCTGGACACATTAGAGATTTTCAGATTACAGCTTCAGGACAATATGGACA
GTGGGCCCCAAAGCTGGCCAGACTTCATTATTCCGGATCAATCAATGCCTGGAGCACCAAGGA
GCCCTTTTCTTGGATCAAGGTGGATCTGTTGGCACCAATGATTATTCACGGCATCAAGACCCAG
G GT G CCCGT CAG AAGTT CTCCAG CCT CT AC AT CT CT C AGTTT AT CAT CAT GT AT AGT CTT G ATG
GGAAGAAGTGGCAGACTTATCGAGGAAATTCCACTGGAACCTTAATGGTCTTCTTTGGCAATG
TGGATTCATCTGGGATAAAACACAATATTTTTAACCCTCCAATTATTGCTCGATACATCCGTTTG
C ACCC A ACT C ATT ATAG C ATT CG C AG C ACT CTT CGCATGGAGTTGATGGGCTGTG ATTT AAAT A
GTTGCAGCATGCCATTGGGAATGGAGAGTAAAGCAATATCAGATGCACAGATTACTGCTTCAT
CCT ACTTT ACCAAT AT GTTT G CCACCT GGTCTCCTT C AAAAG CT CG ACTT C ACCTCC AAGG G AG GAGTAATGCCTGGAGACCTCAGGTGAATAATCCAAAAGAGTGGCTGCAAGTGGACTTCCAGA
AG AC A AT G A A AGT CACAGGAGTAACTACTCAGGGAGT AAA AT CT CT G CTT ACC AG C ATGTATG TGAAGGAGTTCCTCATCTCCAGCAGTCAAGATGGCCATCAGTGGACTCTCTTTTTTCAGAATGG C AAAGTAAAG GTTTTT CAG GG AAAT C AAG ACT CCTT CAC ACCT GTG GT G AACT CT CT AG ACCC A CCGTTACTGACTCGCTACCTTCGAATTCACCCCCAGAGTTGGGTGCACCAGATTGCCCTGAGG ATG G AG GTT CTGG GCT G CG AG GC ACAG G ACCT CT ACT G A (SEQ ID NO: 22); b) ATG CAG ATT GAGCTGAGCACCTG CTT CTTCCT GTGCCTGCTGAGGTTCTG CTT CTCTGCCACCA GGAGATACTACCTGGGGGCTGTGGAGCTGAGCTGGGACTACATGCAGTCTGACCTGGGGGA GCTGCCTGTGGATGCCAGGTTCCCCCCCAGAGTGCCCAAGAGCTTCCCCTTCAACACCTCTGTG GT GT ACAAG AAG ACCCT GTTT GT GG AGTTCACTG ACCACCT GTT CAACATT GCCAAGCCCAGG CCCCCCTGGATGGGCCTGCTGGGCCCCACCATCCAGGCTGAGGTGTATGACACTGTGGTGATC ACCCTGAAGAACATGGCCAGCCACCCTGTGAGCCTGCATGCTGTGGGGGTGAGCTACTGGAA GGCCTCTGAGGGGGCTGAGTATGATGACCAGACCAGCCAGAGGGAGAAGGAGGATGACAAG GTGTTCCCTGGGGGCAGCCACACCTATGTGTGGCAGGTGCTGAAGGAGAATGGCCCCATGGC CTCTGACCCCCTGTGCCTGACCTACAGCTACCTGAGCCATGTGGACCTGGTGAAGGACCTGAA CTCTGGCCTGATTGGGGCCCTGCTGGTGTGCAGGGAGGGCAGCCTGGCCAAGGAGAAGACC C AG ACCCTG CAC AAGTT CAT CCT G CTGTTTG CTGTGTTTG ATG AG GG CAAG AG CTGG C ACTCT GAAACCAAGAACAGCCTGATGCAGGACAGGGATGCTGCCTCTGCCAGGGCCTGGCCCAAGAT GCACACTGTGAATGGCTATGTGAACAGGAGCCTGCCTGGCCTGATTGGCTGCCACAGGAAGT CTGTGTACTGG CAT GTG ATT G G CAT GGGCACCACCCCTGAGGTGCACAG CAT CTTCCT G G AG G GCCACACCTTCCTGGTCAGGAACCACAGGCAGGCCAGCCTGGAGATCAGCCCCATCACCTTCC TGACTGCCCAGACCCTGCTGATGGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCC ACCAGCATGATGGCATGGAGGCCTATGTGAAGGTGGACAGCTGCCCTGAGGAGCCCCAGCTG AGGATGAAGAACAATGAGGAGGCTGAGGACTATGATGATGACCTGACTGACTCTGAGATGG ATGTG GTG AGGTTTG ATG ATG AC AACAG CCCC AG CTTCATCC AG ATCAG GTCTGTG G CC AAG A AGCACCCCAAGACCTGGGTGCACTACATTGCTGCTGAGGAGGAGGACTGGGACTATGCCCCC CTGGTGCTGGCCCCTGATGACAGGAGCTACAAGAGCCAGTACCTGAACAATGGCCCCCAGAG GATTGGCAGGAAGTACAAGAAGGTCAGGTTCATGGCCTACACTGATGAAACCTTCAAGACCA GGGAGGCCATCCAGCATGAGTCTGGCATCCTGGGCCCCCTGCTGTATGGGGAGGTGGGGGA CACCCTGCTGATCATCTTCAAGAACCAGGCCAGCAGGCCCTACAACATCTACCCCCATGGCATC 
ACTGATGTGAGGCCCCTGTACAGCAGGAGGCTGCCCAAGGGGGTGAAGCACCTGAAGGACTT
CCCC ATCCTG CCTG G GG AG ATCTTCAAGTACAAGTG G ACTGTG ACTGTG G AG G ATGG CCCCAC CAAGTCTGACCCCAGGTGCCTGACCAGATACTACAGCAGCTTTGTGAACATGGAGAGGGACCT
GGCCTCTGGCCTGATTGGCCCCCTGCTGATCTGCTACAAGGAGTCTGTGGACCAGAGGGGCA
ACCAGATCATGTCTGACAAGAGGAATGTGATCCTGTTCTCTGTGTTTGATGAGAACAGGAGCT
GGTACCTGACTGAGAACATCCAGAGGTTCCTGCCCAACCCTGCTGGGGTGCAGCTGGAGGAC
CCTG AGTTCCAG GCC AGC AACATCATG CACAG CAT CAAT G GCTAT GT GTTT G AC AG CCTG CAG
CTGTCTGTGTGCCTGCATGAGGTGGCCTACTGGTACATCCTGAGCATTGGGGCCCAGACTGAC
TT CCT GTCTGT GTT CTT CT CTG GCT AC ACCTT CAAG CAC AAG AT GGTGT ATG AGG AC ACCCT G A
CCCTGTTCCCCTTCTCTGGGGAGACTGTGTTCATGAGCATGGAGAACCCTGGCCTGTGGATTCT
GGGCTGCCACAACTCTGACTTCAGGAACAGGGGCATGACTGCCCTGCTGAAAGTCTCCAGCTG
T G AC A AG A AC ACTG G G G ACTACTATG AG G AC AG CTATG AG G AC AT CTCTG CCT ACCTG CTG AG
CAAGAACAATGCCATTGAGCCCAGGAGTTTTTCACAGAATCCACCTGTATTGACGCGGAGTTT
C AGT C AG AACTCC AGG CACCCCT CT ACT AG GC AAAAACAGTTT AATGC AACCAC AAT ACCTG A
AAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAACCCCCATGCCAAAAATTCAAAA
CGT CTCC AGTT CCG AT CTT CT CAT GCT CTT G CGCCAGTC ACCCAC ACC AC ATGGTCT CTCCCT C A
GCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACCCTAGCCCCGGCGCTATTGATA
GTAACAACT CT CT C AGT G AAATG ACT CACTTTCGG CCG CAG CT GC AT C ATT CTG GTG ATATGGT
ATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGAGAAACTGGGCACGACCGCCGCCAC
CG AGTTG AAG AAACT CG ACTT C AAGGTTT CC AGT ACC AGC AACAACCTT AT CAG CACT ATCCCA
TCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACTTGGGCCACCCTCTATGCCGGTC
CACT AT GATT CCC AGTT GG ACAC AACT CTTTTT G GT AAG AAGT C ATCCCC ACT CACCG AAAGCG
GTGGACCTTTGTCTCTCTCTGAGGAGAATAATGACTCCAAGCTGCTTGAGTCAGGGTTGATGA
ACAGCCAAGAATCCTCATGGGGAAAAAACGTTTCCTCCACCAGGGAGATCACCAGGACCACCC
TGCAGTCTGACCAGGAGGAGATTGACTATGATGACACCATCTCTGTGGAGATGAAGAAGGAG
GACTTTGACATCTACGACGAGGACGAGAACCAGAGCCCCAGGAGCTTCCAGAAGAAGACCAG
GCACTACTTCATTGCTGCTGTGGAGAGGCTGTGGGACTATGGCATGAGCAGCAGCCCCCATGT
GCTGAGGAACAGGGCCCAGTCTGGCTCTGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGT
TCACTGATGGCAGCTTCACCCAGCCCCTGTACAGAGGGGAGCTGAATGAGCACCTGGGCCTG
CTGGGCCCCTACATCAGGGCTGAGGTGGAGGACAACATCATGGTGACCTTCAGGAACCAGGC
CAGCAGGCCCTACAGCTTCTACAGCAGCCTGATCAGCTATGAGGAGGACCAGAGGCAGGGGG
CT G AGCCCAGG AAG AACTTT GTG AAGCCCAATG AAACCAAG ACCT ACTTCTGG AAGGTGCAG
CACCACATGGCCCCCACCAAGGATGAGTTTGACTGCAAGGCCTGGGCCTACTTCTCTGATGTG
GACCTGGAGAAGGATGTGCACTCTGGCCTGATTGGCCCCCTGCTGGTGTGCCACACCAACACC CTGAACCCTGCCCATGGCAGGCAGGTGACTGTGCAGGAGTTTGCCCTGTTCTTCACCATCTTTG
ATGAAACCAAGAGCTGGTACTTCACTGAGAACATGGAGAGGAACTGCAGGGCCCCCTGCAAC ATCC AG ATG G AG G ACCCC ACCTTCAAG G AG AACTACAG GTTCCATG CCATCAATG GCTAC ATC ATGGACACCCTGCCTGGCCTGGTGATGGCCCAGGACCAGAGGATCAGGTGGTACCTGCTGAG CATGGGCAGCAATGAGAACATCCACAGCATCCACTTCTCTGGCCATGTGTTCACTGTGAGGAA GAAGGAGGAGTACAAGATGGCCCTGTACAACCTGTACCCTGGGGTGTTTGAGACTGTGGAGA TGCTGCCCAGCAAGGCTGGCATCTGGAGGGTGGAGTGCCTGATTGGGGAGCACCTGCATGCT G GC ATG AG C ACCCTGTTCCTG GTGTAC AGC AACAAGTG CCAGACCCCCCTGGGCATGGCCTCT GGCCACATCAGGGACTTCCAGATCACTGCCTCTGGCCAGTATGGCCAGTGGGCCCCCAAGCTG GCCAGGCTGCACTACTCTGGCAG CAT C A AT GCCTGGAGCACCAAGG AG CCCTT C AG CTG G ATC AAGGTGGACCTGCTGGCCCCCATGATCATCCATGGCATCAAGACCCAGGGGGCCAGGCAGAA GTTC AGC AGCCT GTAC AT CAG CCAGTT CAT CAT CAT GT ACAG CCTG G ATG G C A AG A AGTG G CA GACCTACAGGGGCAACAGCACTGGCACCCTGATGGTGTTCTTTGGCAATGTGGACAGCTCTG G CAT C AAGC AC AAC AT CTT CAACCCCCCCAT CATTGCC AG AT AC AT C AGG CT GC ACCCCACCC A CTACAGCATCAGGAGCACCCTGAGGATGGAGCTGATGGGCTGTGACCTGAACAGCTGCAGCA TGCCCCTGGGCATGGAGAGCAAGGCCATCTCTGATGCCCAGATCACTGCCAGCAGCTACTTCA CCAACATGTTTGCCACCTGGAGCCCCAGCAAGGCCAGGCTGCACCTGCAGGGCAGGAGCAAT GCCTGGAGGCCCCAGGTCAACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAGAAGACCAT GAAGGTGACTGGGGTGACCACCCAGGGGGTGAAGAGCCTGCTGACCAGCATGTATGTGAAG GAGTTCCTGATCAGCAGCAGCCAGGATGGCCACCAGTGGACCCTGTTCTTCCAGAATGGCAA GGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCTGTGGTGAACAGCCTGGACCCCC CCCTG CTG ACCAG ATACCTG AG G ATTCACCCCC AG AG CTG G GTG C ACC AG ATTGCCCTG AG G A TGGAGGTGCTGGGCTGTGAGGCCCAGGACCTGTACTGA (SEQ ID NO: 23); c) AT G C C T GAG CAG GAG AG AC AG AT C AC AG C CAG AG AAG G G G C CAG T C G G AAAA TCTTATCTAAGCTTTCTTTGCCTACCCGTGCCTGGGAACCAGCAATGAAGAA GAGTTTTGCTTTTGACAATGTTGGCTATGAAGGTGGTCTGGATGGCCTGGGC CCTTCTTCTCAGGTGGCCACCAGCACAGTCAGGATCTTGGGCATGACTTGCC AG TCATGTGT G AAG T C C AT T GAG G AC AG GAT T T C C AAT T T G AAAG G CAT CAT CAGCATGAAGGTTTCCCTGGAACAAGGCAGTGCCACTGTGAAATATGTGCCA TCGGTTGTGTGCCTGCAACAGGTTTGCCATCAAATTGGGGACATGGGCTTCG AGGCCAGCATTGCAGAAGGAAAGGCAGCCTCCTGGCCCTCAAGGTCCTTGCC
TGCCCAGGAGGCTGTGGTCAAGCTCCGGGTGGAGGGCATGACCTGCCAGTCC TGTGTCAGCTCCATTGAAGGCAAGGTCCGGAAACTGCAAGGAGTAGTGAGAG
T C AAAG T C T C AC T C AG C AAC C AAG AG G C C G T CAT C AC T T AT C AG C C T TAT C T C AT T C AG C C C G AAG AC C T C AG G G AC CAT G T AAAT G AC AT G G GAT T T G AAG C T GCCATCAAGAGCAAAGTGGCTCCCTTAAGCCTGGGACCAATTGATATTGAGC G G T T AC AAAG C AC T AAC C C AAAGAG AC C T T TAT C T T C T GC T AAC CAGAAT T T TAATAATTCTGAGACCTTGGGGCACCAAGGAAGCCATGTGGTCACCCTCCAA C T GAG AAT AG AT G GAAT G C AT T G T AAG T C T T GC G T C T T GAAT AT T G AAGAAA ATATTGGCCAGCTCCTAGGGGTTCAAAGTATTCAAGTGTCCTTGGAGAACAA AACTGCCCAAGTAAAGTATGACCCTTCTTGTACCAGCCCAGTGGCTCTGCAG AGGGCTATCGAGGCACTTCCACCTGGGAATTTTAAAGTTTCTCTTCCTGATG GAGCCGAAGGGAGTGGGACAGATCACAGGTCTTCCAGTTCTCATTCCCCTGG CTCCCCACCGAGAAACCAGGTCCAGGGCACATGCAGTACCACTCTGATTGCC ATTGCCGGCATGACCTGTGCATCCTGTGTCCATTCCATTGAAGGCATGATCT CCCAACTGGAAGGGGTGCAGCAAATATCGGTGTCTTTGGCCGAAGGGACTGC AAC AG T T C T T TAT AAT C C C T C T G T AAT TAG C C C AG AAG AAC T C AG AG C T G C T ATAGAAGACATGGGATTTGAGGCTTCAGTCGTTTCTGAAAGCTGTTCTACTA ACCCTCTTGGAAACCACAGTGCTGGGAATTCCATGGTGCAAACTACAGATGG TACACCTACATCTGTGCAGGAAGTGGCTCCCCACACTGGGAGGCTCCCTGCA AACCATGCCCCGGACATCTTGGCAAAGTCCCCACAATCAACCAGAGCAGTGG CACCGCAGAAGTGCTTCTTACAGATCAAAGGCATGACCTGTGCATCCTGTGT GTCTAACATAGAAAGGAATCTGCAGAAAGAAGCTGGTGTTCTCTCCGTGTTG GTTGCCTT GAT G G C AG G AAAG G C AG AG AT C AAG TAT G AC C C AG AG G T CAT C C AGCCCCTCGAGATAGCTCAGTTCATCCAGGACCTGGGTTTTGAGGCAGCAGT C AT G GAG G AC T AC G C AG G C T C C GAT G G C AAC AT T GAG C T G AC AAT C AC AG G G ATGACCTGCGCGTCCTGTGTCCACAACATAGAGTCCAAACTCACGAGGACAA ATGGCATCACTTATGCCTCCGTTGCCCTTGCCACCAGCAAAGCCCTTGTTAA G T T T G AC C C G G AAAT TATCGGTCCACGG GAT AT TAT C AAAAT TAT T GAG G AA ATTGGCTTTCATGCTTCCCTGGCCCAGAGAAACCCCAACGCTCATCACTTGG ACCACAAGATGGAAATAAAGCAGTGGAAGAAGTCTTTCCTGTGCAGCCTGGT GTTTGGCATCCCTGTCATGGCCTTAATGATCTATATGCTGATACCCAGCAAC
GAG C C C C AC C AG T C CAT G G T C C T G G AC C AC AAC AT C AT T C C AG G AC T G T C C A TTCTAAATCTCATCTTCTTTATCTTGTGTACCTTTGTCCAGCTCCTCGGTGG
G T GG T AC T T C T AC G T T C AGG C C T AC AAAT C T C T GAG AC AC AG G T C AGC C AAC ATGGACGTGCTCATCGTCCTGGCCACAAGCATTGCTTATGTTTATTCTCTGG TCATCCTGGTGGTTGCTGTGGCTGAGAAGGCGGAGAGGAGCCCTGTGACATT CTTCGACACGCCCCCCATGCTCTTTGTGTTCATTGCCCTGGGCCGGTGGCTG GAACACTTGGCAAAGAGCAAAACCTCAGAAGCCCTGGCTAAACTCATGTCTC TCCAAGCCACAGAAGCCACCGTTGTGACCCTTGGTGAGGACAATTTAATCAT CAGGGAGGAGCAAGTCCCCATGGAGCTGGTGCAGCGGGGCGATATCGTCAAG GTGGTCCCTGGGGGAAAGTTTCCAGTGGATGGGAAAGTCCTGGAAGGCAATA C CAT G G C T GAT GAG TCCCTCAT C AC AG G AG AAG C CAT G C C AG T C AC T AAG AA ACCCGGAAGCACTGTAATTGCGGGGTCTATAAATGCACATGGCTCTGTGCTC AT T AAAG C T AC C C AC G T G G G C AAT G AC AC C AC T T T G G C T C AG AT T G T G AAAC TGGTGGAAGAGGCTCAGATGTCAAAGGCACCCATTCAGCAGCTGGCTGACCG GTTTAGTGGATATTTTGTCCCATTTATCATCATCATGTCAACTTTGACGTTG GTGGTATGGATTGTAATCGGTTTTATCGATTTTGGTGTTGTTCAGAGATACT TTCCTAACCCCAACAAGCACATCTCCCAGACAGAGGTGATCATCCGGTTTGC TTTCCAGACGTCCATCACGGTGCTGTGCATTGCCTGCCCCTGCTCCCTGGGG CTGGCCACGCCCACGGCTGTCATGGTGGGCACCGGGGTGGCCGCGCAGAACG G CAT C C T C AT C AAG G GAG G C AAG C C C C T G GAG AT G G C G C AC AAG AT AAAG AC TGTGATGTTTGACAAGACTGGCACCATTACCCATGGCGTCCCCAGGGTCATG CGGGTGCTCCTGCTGGGGGATGTGGCCACACTGCCCCTCAGGAAGGTTCTGG CTGTGGTGGGGACTGCGGAGGCCAGCAGTGAACACCCCTTGGGCGTGGCAGT C AC C AAAT AC T G T AAAG AG G AAC T T G G AAC AG AG AC C T T G G GAT AC T G C AC G GACTTCCAGGCAGTGCCAGGCTGTGGAATTGGGTGCAAAGTCAGCAACGTGG AAGGCATCCTGGCCCACAGTGAGCGCCCTTTGAGTGCACCGGCCAGTCACCT GAATGAGGCTGGCAGCCTTCCCGCAGAAAAAGATGCAGTCCCCCAGACCTTC TCTGTGCTGATTGGAAACCGTGAGTGGCTGAGGCGCAACGGTTTAACCATTT C TAG C GAT G T C AG T GAC G C TAT GACAGAC C AC G AGAT GAAAG GAC AGAC AGC CATCCTGGTGGCTATTGACGGTGTGCTCTGTGGGATGATCGCAATCGCAGAC GCTGTCAAGCAGGAGGCTGCCCTGGCTGTGCACACGCTGCAGAGCATGGGTG
T G GAC GTGGTTCT GAT C AC G G G G GAC AAC C G G AAG AC AG C C AG AG C TAT T G C CACCCAGGTTGGCATCAACAAAGTCTTTGCAGAGGTGCTGCCTTCGCACAAG GTGGCCAAGGTCCAGGAGCTCCAGAATAAAGGGAAGAAAGTCGCCATGGTGG GGGATGGGGTCAATGACTCCCCGGCCTTGGCCCAGGCAGACATGGGTGTGGC CATTGGCACCGGCACGGATGTGGCCATCGAGGCAGCCGACGTCGTCCTTATC AGAAATGATTTGCTGGATGTGGTGGCTAGCATTCACCTTTCCAAGAGGACTG TCCGAAGGATACGCATCAACCTGGTCCTGGCACTGATTTATAACCTGGTTGG GATACCCATTGCAGCAGGTGTCTTCATGCCCATCGGCATTGTGCTGCAGCCC TGGATGGGCTCAGCGGCCATGGCAGCCTCCTCTGTGTCTGTGGTGCTCTCAT CCCTGCAG CTCAAG TGCTATAAG AAGCCT GACCTGGAG AGGTAT GAGGCACA GGCGCATGGCCACATGAAGCCCCTGACGGCATCCCAGGTCAGTGTGCACATA GGCATGGATGACAGGTGGCGGGACTCCCCCAGGGCCACACCATGGGACCAGG TCAGCTATGTCAGCCAGGTGTCGCTGTCCTCCCTGACGTCCGACAAGCCATC TCGGCACAGCGCTGCAGCAGACGATGATGGGGACAAGTGGTCTCTGCTCCTG AAT GGCAG GGATGAG GAGCAGTAC ATCTGA (SEQ ID NO: 24); d) ATGCCAGAGCAGGAGAGGCAGATCACCGCAAGAGAGGGAGCATCCAGGAAGATCCTGT CCAAGCTGTCTCTGCCAACAAGGGCATGGGAGCCTGCAATGAAGAAGTCTTTCGCCTTT GACAACGTGGGATATGAGGGAGGCCTGGATGGCCTGGGACCTAGCTCCCAGGTGGCCAC CAGCACAGTGAGAATCCTGGGCATGACCTGCCAGTCTTGCGTGAAGAGCATCGAGGACA GGATCTCCAATCTGAAGGGCATCATCTCCATGAAGGTGTCTCTGGAGCAGGGCTCTGCC ACAGTGAAGTACGTGCCCAGCGTGGTGTGCCTGCAGCAGGTGTGCCACCAGATCGGCGA TATGGGCTTCGAGGCATCCATCGCAGAGGGCAAGGCAGCATCTTGGCCATCCAGATCTC TGCCTGCCCAGGAGGCCGTGGTGAAGCTGAGGGTGGAAGGAATGACCTGCCAGTCCTGC GTGAGCAGCATCGAGGGCAAGGTGAGAAAGCTGCAGGGCGTGGTGAGGGTGAAGGTGA GCCTGTCCAACCAGGAGGCCGTGATCACATACCAGCCATATCTGATCCAGCCCGAGGAC CTGCGGGATCACGTGAATGACATGGGCTTCGAGGCCGCCATCAAGAGCAAGGTGGCACC TCTGTCCCTGGGACCAATCGATATCGAGCGCCTGCAGTCCACCAACCCTAAGCGGCCAC TGTCCTCTGCCAACCAGAACTTCAACAATAGCGAGACACTGGGACACCAGGGCTCCCAC GTGGTGACACTGCAGCTGCGCATCGACGGCATGCACTGCAAGAGCTGCGTGCTGAACAT CGAGGAGAATATCGGCCAGCTGCTGGGCGTGCAGAGCATCCAGGTGTCCCTGGAGAACA AGACCGCCCAGGTGAAGTATGATCCCAGCTGCACATCCCCTGTGGCCCTGCAGAGGGCA ATCGAGGCCCTGCCCCCTGGCAATTTCAAGGTGTCTCTGCCAGACGGAGCAGAGGGCAG
CGGAACCGATCACCGCAGCTCCTCTAGCCACTCTCCTGGCAGCCCACCAAGGAACCAGGT
GCAGGGAACCTGTTCTACCACACTGATCGCCATCGCCGGCATGACATGCGCCTCTTGCG
TGCACAGCATCGAGGGCATGATCAGCCAGCTGGAGGGCGTGCAGCAGATCTCTGTGAGC
CTGGCAGAGGGAACCGCAACAGTGCTGTACAATCCATCCGTGATCTCTCCCGAGGAGCT
GAGAGCCGCCATCGAGGACATGGGCTTTGAGGCCTCCGTGGTGTCCGAGTCTTGCAGCA
CCAACCCCCTGGGCAATCACTCCGCCGGCAACTCTATGGTGCAGACCACAGACGGCACCC
CAACAAGCGTGCAGGAGGTGGCACCACACACCGGCAGACTGCCTGCCAATCACGCCCCA
GATATCCTGGCCAAGAGCCCTCAGTCCACAAGGGCAGTGGCACCACAGAAGTGCTTCCT
GCAGATCAAGGGCATGACCTGCGCCTCCTGCGTGAGCAACATCGAGAGGAATCTGCAGA
AGGAGGCAGGCGTGCTGTCCGTGCTGGTGGCCCTGATGGCAGGCAAGGCCGAGATCAAG
TACGATCCTGAAGTGATCCAGCCACTGGAGATCGCCCAGTTTATCCAGGACCTGGGCTT
CGAGGCCGCCGTGATGGAGGATTATGCCGGCAGCGACGGCAACATCGAGCTGACCATCA
CAGGCATGACCTGCGCCTCTTGCGTGCACAACATCGAGAGCAAGCTGACCCGCACAAAT
GGCATCACATACGCATCTGTGGCCCTGGCCACCAGCAAGGCCCTGGTGAAGTTTGATCC
CGAGATCATCGGCCCTCGGGACATCATCAAGATCATCGAGGAGATCGGCTTCCACGCCA
GCCTGGCCCAGAGAAACCCCAATGCCCACCACCTGGATCACAAGATGGAGATCAAGCAG
TGGAAGAAGAGCTTTCTGTGCTCCCTGGTGTTCGGCATCCCTGTGATGGCCCTGATGAT
CTACATGCTGATCCCTTCCAACGAGCCACACCAGTCTATGGTGCTGGACCACAACATCA
TCCCAGGCCTGTCCATCCTGAATCTGATCTTCTTTATCCTGTGCACATTTGTGCAGCTGC
TGGGCGGCTGGTACTTCTATGTGCAGGCCTATAAGAGCCTGCGGCACAGATCCGCCAAT
ATGGATGTGCTGATCGTGCTGGCCACCAGCATCGCCTACGTGTATTCCCTGGTCATCCT
GGTGGTGGCAGTGGCAGAGAAGGCAGAGCGGAGCCCCGTGACCTTCTTTGACACACCCC
CTATGCTGTTCGTGTTTATCGCCCTGGGCAGATGGCTGGAGCACCTGGCCAAGAGCAAG
ACCTCCGAGGCCCTGGCCAAGCTGATGAGCCTGCAGGCCACAGAGGCCACCGTGGTGAC
ACTGGGCGAGGATAACCTGATCATCAGGGAGGAGCAGGTGCCAATGGAGCTGGTGCAG
CGCGGCGACATCGTGAAGGTGGTGCCAGGCGGCAAGTTTCCCGTGGATGGCAAGGTGCT
GGAGGGCAATACAATGGCAGACGAGTCCCTGATCACCGGAGAGGCCATGCCTGTGACCA
AGAAGCCAGGCTCTACAGTGATCGCAGGCAGCATCAACGCACACGGCTCCGTGCTGATC
AAGGCCACACACGTGGGCAATGATACCACACTGGCCCAGATCGTGAAGCTGGTGGAGGA
GGCCCAGATGAGCAAGGCACCAATCCAGCAGCTGGCAGACCGGTTTTCTGGCTACTTCG
TGCCTTTTATCATCATCATGAGCACCCTGACACTGGTGGTGTGGATCGTGATCGGCTTC
ATCGACTTTGGCGTGGTGCAGAGGTATTTCCCAAACCCCAATAAGCACATCTCCCAGAC
CGAAGTGATCATCCGCTTCGCCTTTCAGACCTCCATCACCGTGCTGTGCATCGCCTGCCC
TTGTTCTCTGGGCCTGGCCACCCCAACAGCCGTGATGGTGGGAACAGGAGTGGCAGCAC
AGAACGGCATCCTGATCAAGGGCGGCAAGCCCCTGGAGATGGCCCACAAGATCAAGACC
GTGATGTTCGATAAGACCGGCACAATCACCCACGGCGTGCCAAGAGTGATGAGAGTGCT
GCTGCTGGGCGACGTGGCCACACTGCCACTGAGAAAGGTGCTGGCAGTGGTGGGAACCG
CAGAGGCCAGCTCCGAGCACCCCCTGGGCGTGGCCGTGACAAAGTACTGCAAGGAGGAG
CTGGGCACAGAGACACTGGGCTATTGTACCGACTTTCAGGCAGTGCCTGGATGCGGAAT
CGGCTGTAAGGTGTCCAACGTGGAGGGCATCCTGGCACACTCTGAGCGGCCCCTGTCTG
CCCCTGCAAGCCACCTGAATGAGGCAGGCAGCCTGCCAGCAGAGAAGGATGCAGTGCCT
CAGACATTCTCCGTGCTGATCGGCAACAGAGAGTGGCTGCGGAGAAATGGCCTGACCAT
CTCTAGCGACGTGAGCGACGCCATGACAGACCACGAGATGAAGGGCCAGACCGCCATCC
TGGTGGCCATCGATGGCGTGCTGTGCGGCATGATCGCCATCGCAGACGCAGTGAAGCAG
GAGGCCGCCCTGGCAGTGCACACCCTGCAGTCTATGGGCGTGGATGTGGTGCTGATCAC
CGGCGACAACAGGAAGACAGCAAGGGCAATCGCAACCCAAGTGGGCATCAATAAGGTG
TTTGCCGAGGTGCTGCCATCCCACAAGGTGGCCAAGGTGCAGGAGCTGCAGAACAAGGG
CAAGAAGGTGGCCATGGTGGGCGATGGCGTGAATGACTCTCCCGCCCTGGCACAGGCAG
ATATGGGAGTGGCAATCGGCACAGGAACCGATGTGGCAATCGAGGCAGCAGACGTGGT
GCTGATCCGGAACGATCTGCTGGACGTGGTGGCCTCCATCCACCTGTCTAAGCGGACCG
TGAGGCGCATCAGAATCAACCTGGTGCTGGCCCTGATCTACAATCTGGTGGGCATCCCT
ATCGCAGCAGGCGTGTTCATGCCAATCGGCATCGTGCTGCAGCCATGGATGGGCAGCGC
CGCAATGGCAGCATCCAGCGTGAGCGTGGTGCTGAGCTCCCTGCAGCTGAAGTGTTACA
AGAAGCCTGACCTGGAGAGGTATGAGGCCCAGGCCCACGGCCACATGAAGCCACTGACC
GCCTCTCAGGTGAGCGTGCACATCGGCATGGACGATAGGTGGAGGGATAGCCCAAGGGC
AACACCATGGGACCAGGTGTCCTACGTGTCTCAGGTGAGCCTGTCTAGCCTGACCTCTG
ATAAGCCATCCAGGCACAGCGCCGCCGCCGACGATGACGGCGACAAGTGGAGCCTGCTG
CTGAATGGCCGCGATGAGGAGCAGTACATC (SEQ ID NO: 25).
The present invention also provides a vector system to express a coding sequence in a cell, said coding sequence consisting of a first portion (CDS1) and a second portion (CDS2), said vector system comprising:
a) a first vector comprising:
- said first portion of said coding sequence (CDS1),
- a first intein nucleotide sequence coding for an N-intein, said N-intein having at least 80% identity with SEQ ID NO: 3, 5, 7, 9, 11, 13 or a variant thereof or a fragment thereof or a homolog thereof, and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and
b) a second vector comprising:
- said second portion of said coding sequence (CDS2),
- a second intein nucleotide sequence coding for a C-intein, said C-intein having at least 80% identity with SEQ ID NO: 4, 6, 8, 10, 12, 14 or a variant thereof or a fragment thereof or a homolog thereof, and wherein said second intein nucleotide sequence is located at the 5' end of CDS2;
wherein said coding sequence encodes a sequence selected from the group of:
i)
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFNTSVVYK
KTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAVGVSYWKASEG
AEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSHVDLVKDLNSGLIGA
LLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRDAASARAWPKMHTVNGYV
NRSLPGLIGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNHRQASLEISPITFLTAQTLLMDLG
QFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNEEAEDYDDDLTDSEMDVVRFDDDNSP
SFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLAPDDRSYKSQYLNNGPQRIGRKYKKVRFMA
YTDETFKTREAIQHESGILGPLLYGEVGDTLLIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHL
KDFPILPGEIFKYKWTVTVEDGPTKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGN
QIMSDKRNVILFSVFDENRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHSINGYVFDSLQLSV
CLHEVAYWYILSIGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHN
SDFRNRGMTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHP
STRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQEAKYE
TFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATELKKLDFKVS
STSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLTESGGPLSLSEENNDSK
LLESGLMNSQESSWGKNVSSTREITRTTLQSDQEEIDYDDTISVEMKKEDFDIYDEDENQSPRSF
QKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSVPQFKKVVFQEFTDGSFTQPLYRGELNEH
LGLLGPYIRAEVEDNIMVTFRNQASRPYSFYSSLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQ
HHMAPTKDEFDCKAWAYFSDVDLEKDVHSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFD
ETKSWYFTENMERNCRAPCNIQMEDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLS
MGSNENIHSIHFSGHVFTVRKKEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGM
STLFLVYSNKCQTPLGMASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVD
LLAPMIIHGIKTQGARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIF
NPPIIARYIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPS
KARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQDG
HQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCEAQDL
Y
or ii)
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFNTSVVYKKTLFVEF
TDHLFNIAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAVGVSYWKASEGAEYDDQTSQREKE
DDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSHVDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFI
LLFAVFDEGKSWHSETKNSLMQDRDAASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTTP
EVHSIFLEGHTFLVRNHRQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRM
KNNEEAEDYDDDLTDSEMDVVRFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLAPDDRSY
KSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTLLIIFKNQASRPYNIYPHGIT
DVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGPTKSDPRCLTRYYSSFVNMERDLASGLIGPLLICY
KESVDQRGNQIMSDKRNVILFSVFDENRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHSINGYVFDSLQ
LSVCLHEVAYWYILSIGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFR
NRGMTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPSTRQKQFNAT
TIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQEAKYETFSDDPSPGAIDSNNSLS
EMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATELKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSL
GPPSMPVHYDSQLDTTLFGKKSSPLTESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTRHQREITRT
TLQSDQEEIDYDDTISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQS
GSVPQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYSSLISYEEDQR
QGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDVHSGLIGPLLVCHTNTLNPA
HGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQMEDPTFKENYRFHAINGYIMDTLPGLVMAQ
DQRIRWYLLSMGSNENIHSIHFSGHVFTVRKKEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHA
GMSTLFLVYSNKCQTPLGMASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAP
MIIHGIKTQGARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIARYIRLH
PTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSKARLHLQGRSNAWRPQ
VNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQDGHQWTLFFQNGKVKVFQGNQDSF
TPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCEAQDLY
or iii) SEQ ID NO: 21,
wherein, when the first vector and the second vector are inserted in a cell, the protein product of the coding sequence is produced by protein splicing.
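The protein splicing step recited above can be pictured with a toy, string-level model. This is a sketch only: the intein strings are placeholders rather than any of the intein sequences of the invention, the function names are hypothetical, and real trans-splicing depends on intein folding and junction chemistry that a string operation does not capture. The extein fragments are the last residues of SEQ ID NO: 53 and the first residues of SEQ ID NO: 56.

```python
# Placeholder intein labels; NOT real intein sequences.
N_INTEIN = "<N-INTEIN>"
C_INTEIN = "<C-INTEIN>"


def n_precursor(cds1_protein: str, n_intein: str) -> str:
    """Product of the first vector: CDS1 polypeptide followed by the N-intein."""
    return cds1_protein + n_intein


def c_precursor(c_intein: str, cds2_protein: str) -> str:
    """Product of the second vector: C-intein followed by the CDS2 polypeptide."""
    return c_intein + cds2_protein


def trans_splice(n_prec: str, c_prec: str, n_intein: str, c_intein: str) -> str:
    """Excise both intein halves and join the flanking polypeptides."""
    if not n_prec.endswith(n_intein):
        raise ValueError("N-intein must sit at the C-terminus of the first precursor")
    if not c_prec.startswith(c_intein):
        raise ValueError("C-intein must sit at the N-terminus of the second precursor")
    n_extein = n_prec[: len(n_prec) - len(n_intein)]
    c_extein = c_prec[len(c_intein):]
    return n_extein + c_extein


cds1_tail = "GKKSSPLTE"    # C-terminal residues of SEQ ID NO: 53
cds2_head = "SGGPLSLSEEN"  # N-terminal residues of SEQ ID NO: 56

product_fragment = trans_splice(
    n_precursor(cds1_tail, N_INTEIN),
    c_precursor(C_INTEIN, cds2_head),
    N_INTEIN,
    C_INTEIN,
)
print(product_fragment)
```

The joined fragment corresponds to the junction region found in full-length sequence i) above, where the two portions are contiguous.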
In the vector systems of the invention preferably at least one of the first vector and the second vector further comprises at least one enhancer or regulatory nucleotide sequence, operably linked to the coding sequence.
Preferably the vector system comprises:
a) a first vector comprising in a 5'-3' direction:
- a 5'-inverted terminal repeat (5'-ITR) sequence;
- a promoter sequence;
- a 5' end portion of a coding sequence (CDS1), said 5' end portion being operably linked to and under control of said promoter;
- a first intein nucleotide sequence coding for an N-intein; and
- a 3'-inverted terminal repeat (3'-ITR) sequence; and
b) a second vector comprising in a 5'-3' direction:
- a 5'-inverted terminal repeat (5'-ITR) sequence;
- a promoter sequence;
- a second intein nucleotide sequence coding for a C-intein;
- a 3' end portion of the coding sequence (CDS2); and
- a 3'-inverted terminal repeat (3'-ITR) sequence.
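The two 5'-3' layouts listed above can be written down as ordered element lists and checked for the intein placement they require. This is a minimal sketch; the element labels and the helper name are hypothetical and serve only to illustrate the ordering.

```python
# Ordered 5'->3' element lists mirroring the two preferred vector layouts.
FIRST_VECTOR = ["5'-ITR", "promoter", "CDS1", "N-intein", "3'-ITR"]
SECOND_VECTOR = ["5'-ITR", "promoter", "C-intein", "CDS2", "3'-ITR"]


def directly_upstream(layout: list[str], upstream: str, downstream: str) -> bool:
    """True if `upstream` is immediately 5' of `downstream` in the layout."""
    return layout.index(upstream) + 1 == layout.index(downstream)


# The N-intein coding sequence sits at the 3' end of CDS1 ...
print(directly_upstream(FIRST_VECTOR, "CDS1", "N-intein"))
# ... and the C-intein coding sequence sits at the 5' end of CDS2.
print(directly_upstream(SECOND_VECTOR, "C-intein", "CDS2"))
```

Both vectors are flanked by ITRs and carry their own promoter, so each half is expressed independently before the polypeptides are joined.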
Preferably said first and second vectors are independently a viral vector, preferably an adenoviral vector or an adeno-associated viral (AAV) vector. Preferably said first and second adeno-associated viral (AAV) vectors are selected from the same or different AAV serotypes; preferably the serotype is selected from serotype 2, serotype 8, serotype 5, serotype 7, serotype 9, serotype 7m8, serotype sh10 and serotype 2(quad Y-F).
In some embodiments, the first and/or second vector is an AAV2 vector. In some embodiments, the first and/or second vector is an AAV8 vector.
In some embodiments, the first and/or second vector is a viral vector particle.
In some embodiments, the first and/or second vector is an AAV2/8 vector.
The invention provides a host cell transformed with the vector system as defined above. Preferably the vector system or the host cell is for medical use, preferably for use in gene therapy, preferably for use in the treatment and/or prevention of hemophilia or Wilson disease; preferably the hemophilia is hemophilia A.
In another aspect the invention provides a vector, wherein the vector is the first vector as disclosed herein.
In another aspect the invention provides a vector, wherein the vector is the second vector as disclosed herein.
In another aspect the invention provides a cell comprising the first vector and the second vector as disclosed herein.
In another aspect the invention provides a cell transduced or transfected with the first vector and the second vector as disclosed herein.
In some embodiments, the cell is a mammalian cell, preferably a human cell, such as a liver cell.
In another aspect the invention provides a kit comprising the first vector as disclosed herein and the second vector as disclosed herein.
In another aspect the invention provides a composition comprising the first vector as disclosed herein and the second vector as disclosed herein.
In preferred embodiments, the composition is a pharmaceutical composition comprising a pharmaceutically-acceptable carrier, diluent or excipient.
In another aspect the invention provides the vector system, vector, kit or composition of the invention for use in therapy.
In another aspect the invention provides the vector system, vector, kit or composition of the invention for use in treatment of hemophilia. Preferably the hemophilia is hemophilia A.
In some embodiments, plasma F8 activity is increased, preferably is substantially normalised, after the treatment. In some embodiments, increased, preferably substantially normalised, plasma F8 activity is substantially maintained for at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 weeks following the treatment.
In some embodiments, the vector system, vector, kit or composition of the invention is administered at a dose that substantially does not result in the generation of anti-F8 antibodies in a subject (preferably human subject).
In some embodiments, the vector system, vector, kit or composition of the invention is administered at a dose of at least 1.2x10^13 genome copies per kg.
In some embodiments, the vector system, vector, kit or composition of the invention is administered at a dose of up to 4x10^14 genome copies per kg.
In some embodiments, the vector system, vector, kit or composition of the invention is administered at a dose of between 1.2x10^13 genome copies per kg and 4x10^14 genome copies per kg.
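As a purely illustrative sketch of the arithmetic behind a per-kilogram dose, the following Python snippet converts a dose expressed in genome copies (GC) per kg into a total number of genome copies for a given body weight, and checks a dose against the range stated above. The function names and the 25 g body weight in the usage note are assumptions made for illustration and are not taken from the present disclosure.

```python
# Dose bounds stated in the text, in genome copies (GC) per kg.
MIN_DOSE_GC_PER_KG = 1.2e13
MAX_DOSE_GC_PER_KG = 4e14


def total_genome_copies(dose_gc_per_kg: float, body_weight_kg: float) -> float:
    """Return the total genome copies for a per-kg dose and a body weight in kg."""
    if dose_gc_per_kg <= 0 or body_weight_kg <= 0:
        raise ValueError("dose and body weight must be positive")
    return dose_gc_per_kg * body_weight_kg


def dose_in_stated_range(dose_gc_per_kg: float) -> bool:
    """Check whether a per-kg dose lies within the range given in the text."""
    return MIN_DOSE_GC_PER_KG <= dose_gc_per_kg <= MAX_DOSE_GC_PER_KG
```

For example, under the assumption of a 25 g (0.025 kg) animal, `total_genome_copies(1.2e13, 0.025)` gives about 3x10^11 genome copies in total.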
In another aspect the invention provides the vector system, vector, kit or composition of the invention for use in treatment of Wilson disease.
In some embodiments, the treated subject does not develop liver pathology.
In some embodiments, the treatment substantially prevents increase in circulating alanine transaminase (ALT) and/or aspartate transaminase (AST) levels.
In some embodiments, the vector system, vector, kit or composition of the invention is administered systemically. In some embodiments, the vector system, vector, kit or composition of the invention is administered intravenously.
In another aspect the invention provides the first vector as disclosed herein for use in therapy, wherein the first vector is administered simultaneously, sequentially or separately in combination with the second vector as disclosed herein.
In another aspect the invention provides the first vector as disclosed herein for use in treatment of hemophilia, wherein the first vector is administered simultaneously, sequentially or separately in combination with the second vector as disclosed herein. Preferably the hemophilia is hemophilia A.
In another aspect the invention provides the first vector as disclosed herein for use in treatment of Wilson disease, wherein the first vector is administered simultaneously, sequentially or separately in combination with the second vector as disclosed herein.
In another aspect the invention provides the second vector as disclosed herein for use in therapy, wherein the second vector is administered simultaneously, sequentially or separately in combination with the first vector as disclosed herein.
In another aspect the invention provides the second vector as disclosed herein for use in treatment of hemophilia, wherein the second vector is administered simultaneously, sequentially or separately in combination with the first vector as disclosed herein. Preferably the hemophilia is hemophilia A.
In another aspect the invention provides the second vector as disclosed herein for use in treatment of Wilson disease, wherein the second vector is administered simultaneously, sequentially or separately in combination with the first vector as disclosed herein.
In another aspect the invention provides a method of treating or preventing hemophilia comprising administering an effective amount of the vector system, vector, kit or composition of the invention to a subject in need thereof. Preferably the hemophilia is hemophilia A.
In another aspect the invention provides a method of treating or preventing Wilson disease comprising administering an effective amount of the vector system, vector, kit or composition of the invention to a subject in need thereof.
The invention provides a pharmaceutical composition comprising the vector system or the host cell as defined above and a pharmaceutically acceptable vehicle.
Split inteins of the invention may be 100%, 98%, 80%, 75%, 70%, 65%, 60%, 55% or 50% identical to naturally occurring inteins or to SEQ ID Nos. 1 to 14 (homologs), provided that said inteins retain the ability to undergo trans-splicing reactions. Also within the scope of the present invention are fragments or variants of naturally occurring or modified inteins which retain trans-splicing activity.
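To illustrate what a percentage identity figure means in practice, the following minimal Python sketch computes the identity between two sequences that have already been aligned to the same length. It assumes one common convention (identical columns divided by the number of alignment columns, ignoring columns where both sequences carry a gap); other conventions exist, and the present disclosure does not specify one. The function name and the toy sequences in the usage note are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("no comparable alignment columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)
```

With the toy inputs `"ACDE"` and `"ACDF"`, three of four columns match and the function returns 75.0. Whether a homolog at a given identity level retains trans-splicing activity still has to be determined experimentally.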
Preferred promoters are ubiquitous, artificial, or tissue-specific promoters, including fragments and variants thereof retaining transcription promoter activity. Particularly preferred promoters are liver-specific promoters, including the Factor 8 promoter, the thyroxine binding globulin (TBG) promoter, the hybrid liver-specific promoter (HLP) (McIntosh J (2013). Blood 20 Feb 2013, 121(17):3335-3344) and the HCB promoter (Brown HC, Zakas PM, George SN, Parker ET, Spencer HT, Doering CB. Target-Cell-Directed Bioengineering Approaches for Gene Therapy of Hemophilia A. Mol Ther Methods Clin Dev. 2018;9:57-69. Published 2018 Jan 31.), and liver endothelial cell promoters, including the Tie 2 promoter (Benten D, et al., Hepatic targeting of transplanted liver sinusoidal endothelial cells in intact mice. Hepatology. 2005;42(1):140-148.). Ubiquitous promoters according to the present invention are, for instance, the cytomegalovirus (CMV) (32) and short CMV (33) promoters.
Illustrative polyadenylation signals include, without limitation, the bovine growth hormone polyadenylation signal (bGHpA), the human beta globin polyadenylation signal or a short synthetic version (Levitt N, (1989). Genes Dev. 1989 Jul;3(7):1019-25), the SV40 polyadenylation signal, or other naturally occurring or artificial polyadenylation signals.
The present invention will be described by means of non-limiting examples in reference to the following figures.
Figure 1. Comparison of F8 variants in vitro. (a) Schematic representation of the different F8 variants that were cloned into an AAV backbone plasmid: N6 variant (variant in which the B domain is exchanged with a shorter SQ-N6 linker, wherein 11 amino acids of the modified SQ amino acid linker (SQm) are added in front of the N6 peptide). ITR, inverted terminal repeats; CMV, cytomegalovirus promoter; star symbol, 3xflag tag; PolyA, polyadenylation signal; Ntds, nucleotides; SP, signal peptide. (b) Western blot (WB) analysis of lysates of HEK293 cells 72 hours post transfection (hpt) with the different F8 variants. Neg, non-transfected cells. (c) Chromogenic assay on medium of transfected cells showing F8 activity. Significant differences between groups were assessed using Kruskal-Wallis rank sum test followed by Nemenyi multiple pairwise comparison. **P < 0.01.
Figure 2. Characterisation of N6 inteins in vitro. (a) Schematic representation of AAV N6 inteins with indicated splitting points. ITR, inverted terminal repeats; Prom., promoter; SP, signal peptide; star symbol, 3xflag tag; PolyA, polyadenylation signal. (b) WB of lysates of HEK293 cells 72 hpt with either single N6 plasmid or N6 AAV intein plasmids. I + II, N6 AAV intein plasmids; I, 5' F8-N6-N-intein plasmid; II, C-intein-3' F8-N6 plasmid. The arrows indicate the full-length F8-N6 protein and the excised intein. (c) Chromogenic assay on medium to detect F8 activity. Significant differences between groups were assessed using Kruskal-Wallis rank sum test followed by Nemenyi multiple pairwise comparison. *P < 0.05.
Figure 3. Codon optimisation of N6 intein set improves F8 levels. (a) WB of lysates of HEK293 cells 72 hpt with either single N6 plasmid or N6 AAV intein plasmids. I + II, N6 AAV intein plasmids; I, 5' F8-N6-N-intein plasmid; II, C-intein-3' F8-N6 plasmid; codop, codon-optimised. The arrows indicate the full-length F8-N6 protein and the excised intein, demonstrating the increase in expression of the codop intein set. (b) WB of medium of the transfected cells showing increased secretion of the codop intein set compared to the non-codop. (c) WB quantification of F8 protein expression in cell lysates and medium (N=2). Significant differences were assessed using unpaired Student's t test. **P < 0.01. (d) Chromogenic assay on medium to detect F8 activity levels. Significant differences between groups were assessed using Kruskal-Wallis rank sum test followed by Nemenyi multiple pairwise comparison. *P < 0.05.
Figure 4. Circular maps of plasmids. Circular maps of pAAV2.1 HLP Npu DnaE N intein ATP7B (A) and pAAV2.1 HLP Npu DnaE C intein ATP7B (B). Abbreviations: ITR, inverted terminal repeats; SV40, Simian virus 40 intron; HLP, Hybrid Liver Promoter; WPRE, Woodchuck hepatitis virus Post-transcriptional Regulatory Element; bGHpA, bovine Growth Hormone polyadenylation signal.
Figure 5. Intein-mediated in vitro reconstitution of full-length human ATP7B. Western blot analysis with anti-FLAG antibody of whole cell lysates (40 µg) from HepG2 ATP7B knock-out cells transfected with pAAV2.1 HLP Npu DnaE N intein ATP7B and pAAV2.1 HLP Npu DnaE C intein ATP7B or with pAAV2.1 TBG GFP as control. Expected molecular weights are: 160 kDa for full-length ATP7B-3XFLAG, 110 kDa for C-intein-C-term ATP7B half-3XFLAG, 56 kDa for N-term ATP7B half-N-intein-3XFLAG, and 17 kDa for excised inteins. GAPDH was used as loading control.
Figure 6. Intein-mediated in vivo reconstitution of full-length human ATP7B. Western blot analysis with anti-FLAG antibody of whole liver lysates (40 µg) from mice injected with AAV2/8 Npu DnaE intein ATP7B vectors at the dose of 5x10^12 GC/kg each or with 1x10^13 GC/kg of AAV2/8 TBG EGFP as control vector. Expected molecular weights are: 160 kDa for full-length ATP7B-3XFLAG, 110 kDa for C-intein-C-term ATP7B half-3XFLAG, 56 kDa for N-term ATP7B half-N-intein-3XFLAG, and 17 kDa for excised inteins. GAPDH was used as loading control.
Figure 7. Intein-mediated in vivo reconstitution of full-length human ATP7B in Atp7b-/- mice. Atp7b-/- mice were injected with AAV2/8 HLP 5'ATP7B+N-intein and AAV2/8 HLP C-intein-3'ATP7B, at a dose of 1x10^13 GC/kg each, or AAV2/8 TBG eGFP vector at a dose of 2x10^13 GC/kg as control vector. A) Western blot analysis using anti-FLAG antibody of whole liver lysates. Expected molecular weights are: 160 kDa for full-length ATP7B-3XFLAG, 110 kDa for C-intein-C-term ATP7B half-3XFLAG, 56 kDa for N-term ATP7B half-N-intein-3XFLAG, and 15 kDa for excised inteins. GAPDH was used as loading control. B) Representative images from immunohistochemistry using anti-ATP7B antibody.
Figure 8. Intein-mediated gene therapy prevents liver disease progression in Atp7b-/- mice. A) Representative hematoxylin/eosin (left panel) and Sirius Red (right panel) staining from Atp7b+/- mice and from Atp7b-/- mice injected with AAV2/8 TBG eGFP (GFP) or AAV2/8 HLP 5'ATP7B+N-intein and AAV2/8 HLP C-intein-3'ATP7B (int-ATP7B). B) Quantitative morphometry of Sirius Red (SR) staining. Data are expressed as percentage of total field area. ANOVA plus Tukey's post-hoc: ***p<0.005.
Figure 9. Intein-mediated gene therapy prevents hepatocellular damage in Atp7b-/- mice. A) Serum alanine and B) aspartate aminotransferase (ALT/AST) levels at 4, 8 and 12 weeks after treatment in Atp7b+/- mice and in Atp7b-/- mice injected with AAV2/8 TBG eGFP (AAV-GFP) or AAV2/8 HLP 5'ATP7B+N-intein and AAV2/8 HLP C-intein-3'ATP7B (AAV-int-ATP7B). ANOVA plus Tukey's post-hoc: *p<0.05 versus AAV-GFP.
Figure 10. Comparison of human F8 variants in vitro. (A) Schematic representation of the four different F8 variants that were cloned into an AAV plasmid: wild-type F8; N6 containing 11 amino acids of the modified SQ amino acid linker (SQm) followed by the human N6 B domain; SQ containing the SQ amino acid linker; V3 containing the V3 peptide in the middle of the SQ linker. ITR, inverted terminal repeats; CMV, cytomegalovirus promoter; star symbol, 3xflag tag; PolyA, short synthetic polyadenylation signal; Ntds, nucleotides; SP, signal peptide. (B) Western blot (WB) analysis of lysates of HEK293 cells 72 hpt with the various F8 variants. Neg, non-transfected cells. (C) Chromogenic assay of F8 activity in the medium of transfected cells reported as International Units/deciliter (IU/dl). Significant differences between groups were assessed using Kruskal-Wallis rank sum test followed by Nemenyi multiple pairwise comparison. *P < 0.05, and **P < 0.01.
Figure 11. In vitro F8-N6 (N6) intein expression and activity. (A) Schematic representation of the split N6 intein constructs and of the PTS mechanism. ITR, inverted terminal repeats; Prom, promoter; SP, signal peptide; 5'N6-F8, 5'CDS of N6 variant; n-intein, NDnaE or NDnaB intein; star symbol, 3xflag tag; PolyA, short synthetic polyadenylation signal; c-intein, CDnaE intein; 3'N6-F8, 3'CDS of N6 variant. (B) WB of protein lysates of HEK293 cells 72 hpt with either Npu intein or heterologous (N-intein DnaB + C-intein DnaE) intein. I + II, N6 intein proteins; I + II (Het.), heterologous split intein proteins; I, 5'N6 CDS-NDnaE protein; I (N-int B), 5'N6 CDS-NDnaB protein; II, CDnaE-3'N6 CDS protein. Excised inteins are present, in the lower part of the blot, only when I + II (Npu intein) are provided. (C) WB of medium of the transfected cells showing the secreted proteins. The arrows indicate the full-length N6 protein. (D) Chromogenic assay performed on the medium of transfected cells to detect F8 activity levels reported as International Units/deciliter (IU/dl). Significant differences between groups were assessed using Kruskal-Wallis rank sum test. *P < 0.05, and ***P < 0.0005.
Figure 12. Codon optimisation of the N6 intein improves F8 activity levels. (A) WB of protein lysates of HEK293 cells 72 hpt with the AAV-N6 intein plasmids and with the codon-optimised set. I + II, N6 intein proteins; I, 5' N6 CDS-NDnaE protein; II, CDnaE-3'N6 CDS protein; Codop, codon-optimised. The arrows indicate the full-length N6 protein and the excised intein. (B) WB of medium from the transfected cells showing increased secretion of the codop N6 full-length protein compared to the non-codon-optimised. (C) Chromogenic assay performed on the medium from transfected cells to measure F8 activity levels reported as International Units/deciliter (IU/dl). Significant differences between groups were assessed using Kruskal-Wallis test. *P < 0.05 and **P < 0.005. (D) Peptide sequences obtained by LC-MS analysis which include the N6 splitting point, which is correctly reconstituted; S: Ser 962.
Figure 13. Analysis of CodopV3 and intein AAV genome integrity. Southern blot analysis of vector genome DNA isolated directly from AAV virions and run on an alkaline agarose gel. AAV DNA was labelled with a probe specific for the HLP promoter. Neg, AAV DNA treated with DNase I; CodopV3, AAV-CodopV3; I, AAV-5' N6-N-intein; II, AAV-C-intein-3' N6.
Figure 14. AAV-N6 intein administration results in F8 activity levels comparable to wild-type and single AAV-Codop-V3 injected animals. Chromogenic assay performed on plasma samples to detect F8 activity in 3 different groups over time; F8 activity levels are reported as International Units/deciliter (IU/dl). Each dot within different groups represents a single mouse. Plasma samples were analysed at different time points: baseline, 4, 8, 12 and 16 weeks post injection (w.p.i.). The baseline includes all the knock-out mice before the treatment. The wild-type group includes non-affected mice (N=11); CodopV3, single AAV-codop F8 V3 injected mice (N=8); N6 intein, AAV-N6 intein injected group (N=5); CodopN6 intein, AAV-codopN6 intein injected group (N=10).
Figure 15. AAV-Codop N6 intein leads to development of anti-F8 antibodies. (A) WB of liver lysates (600 µg) from CodopN6 intein-treated (N=6) and untreated haemophilic (N=2) mice, and of HEK293 cells 72 hours post-transfection with N6 full-length plasmid used as positive control (100 µg). In 4 out of 6 CodopN6 intein samples the N6 full-length protein is visible and indicated with an arrow in the upper part of the blot. In the middle part of the blot both the 5'N6 CDS-NDnaE protein and the CDnaE-3'N6 CDS protein are visible. (B) On the left y axis, F8 activity levels (IU/dl) obtained from chromogenic assay performed on plasma samples of AAV-Codop N6 intein injected mice at 1 and at 4 w.p.i. The same mice were also analysed for anti-F8 antibodies at 1 and 4 w.p.i. The amount of anti-F8 antibodies, reported in Arbitrary Units/milliliter (AU/ml), is plotted on the right y axis. Each numbered bar plotted on the x axis represents a single mouse. Total number of analysed mice N=6.
Figure 16. AAV-N6 intein administration results in therapeutic F8 levels without eliciting anti-F8 antibodies. On the left y axis, F8 activity levels (IU/dl) obtained from chromogenic assay performed on plasma samples. On the right y axis, the corresponding amount (AU/ml) of anti-F8 antibodies. Each numbered bar represents a single mouse. Codop V3, AAV-Codop V3 analysed mice (N=8); N6 intein, AAV-N6 intein analysed mice (N=5); CodopN6 intein, AAV-CodopN6 intein analysed mice (N=8).
Figure 17. Low-dose AAV-CodopN6 intein administration results in therapeutic levels of F8 in the absence of anti-F8 antibodies. On the left y axis, F8 activity levels (IU/dl) obtained from chromogenic assay performed on plasma samples. On the right y axis, the corresponding amount (AU/ml) of anti-F8 antibodies. Each number plotted on the x axis represents a single mouse. Total number of analysed mice injected with AAV-CodopN6 intein at 4 w.p.i. N=5. Significant differences within the same mouse for F8 levels and anti-F8 antibodies were assessed using the paired sample t-test: **P < 0.05.
Figure 18. AAV-N6 intein and low-dose AAV-CodopN6 intein administration reduces the time for blood clot formation. Activated partial thromboplastin time (aPTT) assay performed on plasma samples both at baseline and at the last time-point of the analysis in 4 different groups: CodopV3 and N6 inteins at 16 w.p.i.; CodopN6 inteins at 8 w.p.i.; low-dose CodopN6 inteins at 4 w.p.i.; wild-type animals at 12 weeks of age. Coagulation time is reported in seconds; significant differences were assessed using the non-parametric Kruskal-Wallis test. Significant differences were found as follows: baseline and CodopN6 low dose *P = 0.03; CodopN6 intein and CodopN6 intein low dose *P = 0.025; CodopN6 and wild-type mice **P = 0.0012.
DETAILED DESCRIPTION OF THE INVENTION
MATERIALS AND METHODS
Generation of AAV vector plasmids
The plasmids used for AAV vector production were derived from the pTigem AAV plasmid, which contains the ITRs of AAV serotype 2. The AAV plasmids were designed as detailed in Figures 1A and 2A. The F8-N6 protein was split at amino acid (aa) S962 (Set 1) or aa S883 (Set 2). Inteins included in the plasmids were from the split intein of DnaE from Nostoc punctiforme (Npu) (27). The plasmids used in the study were under the control of either the ubiquitous cytomegalovirus (CMV) promoter (L. P. Pellissier et al., Mol Ther Methods Clin Dev 1, 14009 (2014).) or the liver-specific hybrid liver promoter (HLP) (McIntosh, J. et al. Therapeutic levels of FVIII following a single peripheral vein administration of rAAV vector encoding a novel human factor VIII variant. Blood (2013)). The polyadenylation signal (polyA) used in all plasmids was the short synthetic polyA (Levitt, N., Briggs, D., Gil, A. & Proudfoot, N. J. Definition of an efficient synthetic poly(A) site. Genes Dev. (1989)).
AAV vector production and characterisation
AAV vectors were produced by the TIGEM AAV Vector Core by triple transfection of HEK293 cells as already described (A. Maddalena et al., Mol Ther 26, 524-541 (2018); M. Doria, A. Ferrara, A. Auricchio, Hum Gene Ther Methods 24, 392-398 (2013)). No differences in vector yields were observed between AAV vectors with and without intein sequences.
Transfection of cells
HEK293 cells were maintained and transfected using the calcium phosphate method (1 µg of each plasmid/well in 6-well plate format) as already described (A. Maddalena et al., Mol Ther 26, 524-541 (2018)). The total amount of DNA transfected in each well was kept equal by addition of a scramble plasmid where needed.
Western blot analysis
Samples (HEK293 cells) were lysed in RIPA buffer to extract F8 protein. Lysis buffers were supplemented with protease inhibitors (Complete Protease inhibitor cocktail tablets; Roche, Basel, Switzerland) and 1 mM phenylmethylsulfonyl fluoride. For medium samples, cells were kept in Opti-MEM medium (Gibco, ThermoFisher Scientific, Germany), and upon cell harvesting at 72 hours post transfection unprocessed medium samples were mixed with 1X Laemmli sample buffer. All samples were denatured at 99°C for 5 minutes in 1X Laemmli sample buffer. Lysates and medium samples were separated by either 12% (for excised intein detection) or 6% (for F8 protein detection) SDS-polyacrylamide gel electrophoresis (SDS-PAGE). The antibodies used for immunoblotting were as follows: anti-3xflag (1:1000, A8592; Sigma-Aldrich, Saint Louis, MO, USA) to detect the F8 protein; anti-β-Actin (1:1000, NB600-501; Novus Biological LLC, Littleton, CO, USA) to detect β-Actin, used as loading control for the 12% SDS-PAGE; anti-Calnexin (1:1000, ADI-SPA-860; Enzo Life Sciences Inc, New York, NY, USA) to detect Calnexin, used as loading control for the 6% SDS-PAGE. The quantification of F8 bands detected by Western blot was performed using ImageJ software (free download is available at http://rsbweb.nih.gov/ij/).
Chromogenic Assay
To evaluate F8 activity, a chromogenic assay was performed using a Coatest® SP4 FVIII-kit (Chromogenix, Werfen, Milan, Italy). Standard curves were generated by serial dilution of commercial human F8 (Refacto, Pfizer). Results are expressed as international units (IU) per deciliter (dl).
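The conversion of an assay readout into an activity value via a standard curve can be sketched as follows. This is a minimal illustration only, assuming that absorbance is linear in F8 activity over the range of the standards; the kit's own instructions may prescribe a different curve model. The function names and the numerical standards in the usage note are hypothetical and are not data from the present disclosure.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standards must span more than one activity value")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def activity_from_absorbance(absorbance, slope, intercept):
    """Invert the fitted standard curve to read activity (IU/dl) from absorbance."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (absorbance - intercept) / slope
```

For example, with hypothetical standards of 0, 25, 50 and 100 IU/dl giving absorbances of 0.10, 0.35, 0.60 and 1.10, the fitted slope is 0.01 and the intercept 0.10, so a sample absorbance of 0.85 reads as about 75 IU/dl. Readings outside the range of the standards should not be extrapolated.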
Statistical analyses (F8 experiments)
The Kruskal-Wallis rank sum test (a non-parametric test) was performed to determine whether there were statistically significant differences between more than two groups of an independent variable. Where the test was significant, a multiple pairwise comparison was further applied to determine whether the differences between specific pairs of groups were statistically significant. Nemenyi's non-parametric all-pairs comparison test for Kruskal-type ranked data was used. To determine the statistical significance of differences between two groups, unpaired Student's t test was used.
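For readers unfamiliar with the test, the following minimal Python sketch computes the tie-corrected Kruskal-Wallis H statistic from first principles. It is an illustration of the statistic only and is not the software used in the study; it returns no p-value and does not implement the Nemenyi post-hoc comparison. The function name and the toy data in the usage note are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        # Sorted positions i..j-1 hold ranks i+1..j; their mean is (i + 1 + j) / 2.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_term = sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)

    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction
```

With the toy groups `[1, 2, 3]`, `[4, 5, 6]` and `[7, 8, 9]`, H is 7.2. For three groups, H is usually compared against a chi-square distribution with two degrees of freedom, whose 5% critical value is about 5.99; with group sizes this small the chi-square approximation is rough, so exact tables or established statistical software would be preferable in practice.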
Generation of intein-ATP7B constructs
The plasmids used for AAV vector production were derived from the pAAV2.1 plasmid, which contains the ITRs of AAV serotype 2. The AAV intein-ATP7B plasmids were designed as detailed in Figure 4. Codon-optimized human ATP7B cDNA was split at nucleotide 1467. Inteins included in the plasmids were the intein of DnaE from Nostoc punctiforme (Npu). Simian virus 40 (SV40) intron, Woodchuck hepatitis virus Post-transcriptional Regulatory Element (WPRE), bovine Growth Hormone polyadenylation signal (bGHpA), and 3xFLAG sequences were also included.
Cell transfection
HepG2 ATP7B knock-out cells (Chandhok G, Schmitt N, Sauer V, Aggarwal A, Bhatt M, Schmidt HH. The effect of zinc and D-penicillamine in a stable human hepatoma ATP7B knockout cell line. PLoS One 2014;9:e98809.) were maintained in RPMI 1640 (Euroclone) supplemented with 10% fetal bovine serum (FBS, Euroclone) plus 1% penicillin/streptomycin solution and 1% L-glutamine (Euroclone). Cells were plated in 6-well plates and transfected using LipoD293 (SignaGen Laboratories) with a total of 2 µg of plasmid DNA per well. 72 hours after transfection, cells were washed in PBS and lysed in RIPA buffer supplemented with protease and phosphatase inhibitors (Roche).
Animal study
AAV vectors were produced and titered by the TIGEM AAV Vector Core as already described (Maddalena A, et al. High-Throughput Screening Identifies Kinase Inhibitors That Increase Dual Adeno-Associated Viral Vector Transduction In Vitro and in Mouse Retina. Hum Gene Ther 2018;29:886-901). Male 7-week-old C57BL/6 mice (Charles River Laboratories) were injected intravenously with 200 µl of vector solution in 0.9% NaCl. At sacrifice, animals were perfused with PBS and livers were harvested and lysed in RIPA buffer using a TissueLyser (QIAGEN).
Western blot analysis
Lysates were incubated on ice for 30 minutes and centrifuged. Supernatant was collected and protein content was determined by Bradford assay (Bio-Rad). Samples were denatured in 1X Laemmli sample buffer plus 1 M DTT (Roche). Protein samples were separated by SDS-PAGE using 4-12% polyacrylamide gels (Bio-Rad). The primary antibody was a mouse monoclonal peroxidase-conjugated anti-FLAG (1:1000, Sigma-Aldrich). Peroxidase substrate was provided by the ECL LiteAblot Plus Enhanced Chemiluminescent Substrate (Euroclone).
Sequences
Examples of split inteins of the present invention
As described herein, within the scope of the present invention are inteins originating from the same gene in different organisms and retaining trans-splicing activity. As a non-limiting example, the DnaE split intein may be derived from the DnaE gene (e.g. DNA polymerase III subunit alpha) of cyanobacteria including Nostoc punctiforme (Npu), Synechocystis sp. PCC6803 (Ssp), Fischerella sp. PCC 9605, Scytonema tolypothrichoides, Cyanobacteria bacterium SW_9_47_5, Nodularia spumigena, Nostoc flagelliforme, Crocosphaera watsonii WH 8502, Chroococcidiopsis cubana CCALA 043 and Trichodesmium erythraeum. As a further example, the DnaB split intein may be derived from the DnaB gene of cyanobacteria including R. marinus (Rma), Synechocystis sp. PCC6803 (Ssp) and Porphyra purpurea chloroplast (Ppu), which are described for instance in (59).
Hence, split inteins of the invention may be 100%, 98%, 80%, 75%, 70%, 65% or 50% identical to naturally occurring inteins, provided that said inteins retain the ability to undergo trans-splicing reactions. Within the scope of the present invention are fragments of naturally occurring or modified inteins which retain trans-splicing activity.
Hence, within the scope of the present invention are also split intein variants and fragments of the inteins of the invention retaining trans-splicing activity.
Interestingly, it has been reported that inteins have conserved functional features that guarantee their splicing activity. In particular, four intein motifs have been identified (see below for their consensus sequence): Blocks A-H (Pietrokovski 1994 and Perler 1997) and Blocks N2 and N4 (Pietrokovski 1998). Intein Blocks A, N2, B, N4, F, and G are involved in protein splicing. Blocks C, D, E, H are in the endonuclease domain, which is absent from split inteins. Thus, split inteins retain conserved motifs that are essential to the trans-splicing activity. (Intein database, disclosed in [Perler, F. B. (2002). InBase, the Intein Database. Nucleic Acids Res. 30, 383-384.])
Although no single residue is invariant, the Ser and Cys in Block A, the His in Block B, and the His, Asn and Ser/Cys/Thr in Block G are the most conserved residues in the splicing motifs.
The present inventors have used intein-mediated protein trans-splicing in order to reconstitute large proteins in vivo. Split inteins encoded by intein gene sequences are produced as precursor polypeptides, which through their structural complementation can reassemble and catalyze a protein trans-splicing reaction.
In the context of protein trans-splicing, the N-intein gene is fused in frame with the sequence coding for the N-terminal portion of the protein of interest; the C-intein gene is fused in frame with the sequence coding for the C-terminal portion of the protein of interest. Upon expression of the two precursor fusion proteins, the inteins undergo autocatalytic excision and form a ligated extein, e.g. the reconstituted protein of interest. Hence, reconstitution of a protein of interest requires splitting said protein into two fragments, whose coding sequences are cloned separately into AAV vectors, each fused to an N- or C-intein and under the control of a promoter. Splitting points for each protein are selected taking into account the amino acid requirement at the junction point (e.g. presence of an amino acid containing a nucleophilic thiol or hydroxyl group (i.e. Cys, Ser or Thr) as first residue in the C-extein), as well as preservation of the integrity of critical protein domains in order to favor proper protein folding and stability of each intein-polypeptide precursor and of the resulting reconstituted protein.
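The residue requirement at the junction point can be illustrated with a minimal Python sketch that lists the positions in a protein sequence at which a Cys, Ser or Thr could serve as the first residue of the C-extein, and splits the sequence at a chosen position. It checks only this residue requirement; preservation of protein domains, folding and stability are not modelled. The function names and the toy sequence in the usage note are hypothetical.

```python
# Residues carrying a nucleophilic thiol or hydroxyl group (Cys, Ser, Thr).
NUCLEOPHILIC_RESIDUES = {"C", "S", "T"}


def candidate_split_sites(protein: str):
    """Return 1-based positions that could be the first residue of the C-extein.

    Position 1 is excluded because the N-terminal fragment would be empty.
    """
    seq = protein.upper()
    return [
        i + 1
        for i, aa in enumerate(seq)
        if i > 0 and aa in NUCLEOPHILIC_RESIDUES
    ]


def split_at(protein: str, position: int):
    """Split so that the residue at 1-based `position` starts the C-terminal fragment."""
    if not 2 <= position <= len(protein):
        raise ValueError("position must leave a non-empty N-terminal fragment")
    if protein[position - 1].upper() not in NUCLEOPHILIC_RESIDUES:
        raise ValueError("first C-extein residue must be Cys, Ser or Thr")
    return protein[: position - 1], protein[position - 1 :]
```

For the toy sequence `"MKASLLCGT"`, the candidate positions are 4, 7 and 9, and `split_at("MKASLLCGT", 4)` returns the fragments `"MKA"` and `"SLLCGT"`.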
Of particular note, the present inventors have selected junction points within two proteins of interest: the protein F8-N6 and ATP7B.
For the F8 gene, preferably, said coding sequence is split at a nucleotide corresponding to aa Ser962 or Ser883, or said coding sequence is split at nucleotide 2884 or at nucleotide 2647.
For the coding sequence that encodes ATP7B, preferably, said coding sequence is split at nucleotide 1467 of the ATP7B cDNA, or said coding sequence is split at a nucleotide corresponding to Lys 489 of ATP7B.
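The relationship between the nucleotide positions and the residue numbers given above can be checked with simple codon arithmetic. The following Python sketch assumes that nucleotide 1 is the first nucleotide of the start codon and that residue 1 is the initiator methionine; this numbering convention is an assumption, as it is not stated explicitly in the text. The function names are hypothetical.

```python
def first_nucleotide_of_codon(residue_number: int) -> int:
    """1-based position of the first nucleotide of the codon for a residue."""
    if residue_number < 1:
        raise ValueError("residue numbering starts at 1")
    return 3 * (residue_number - 1) + 1


def last_nucleotide_of_codon(residue_number: int) -> int:
    """1-based position of the last nucleotide of the codon for a residue."""
    if residue_number < 1:
        raise ValueError("residue numbering starts at 1")
    return 3 * residue_number


def residue_for_nucleotide(nucleotide_position: int) -> int:
    """1-based residue number whose codon contains the given nucleotide."""
    if nucleotide_position < 1:
        raise ValueError("nucleotide numbering starts at 1")
    return (nucleotide_position - 1) // 3 + 1
```

Under this assumption, nucleotides 2884 and 2647 are the first nucleotides of the codons for residues 962 and 883 respectively, consistent with Ser962 or Ser883 being the first residue of the C-terminal fragment, while nucleotide 1467 is the last nucleotide of the codon for residue 489, consistent with Lys 489 being the last residue of the N-terminal fragment.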
N6 aa sequence
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN
TSW YKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTW ITLKNMASHPVSLHAV
GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH
VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD
AASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTT PEVHSIFLEGHTFLVRNH
RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE
EAEDYDDDLTDSEMDW RFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA
PDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTL
LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEI FKYKWTVTVEDGP
TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE
NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHS INGYVFDSLQLSVCLHEVAYWYILS
IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVEMSMENPGLWILGCHNSDFRNRG
MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS
TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE
AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE
LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT
ESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTREITRTTLQSDQEEIDYDDT
ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV
PQFKKW FQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS
SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV
HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTI FDETKSWYFTENMERNCRAPCNIQM
EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHS IHFSGHVFTVRK
KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGS INAWSTKEPFSWIKVDLLAPMIIHGIKTQ GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNI FNPPIIAR YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQ ITASSYFTNMFATWSPSK ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD GHQWTLFFQNGKVKVFQGNQDSFTPW NSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE AQDLY
(SEQ ID NO: 19)
SQ-N6 aa sequence
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN TSW YKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTW ITLKNMASHPVSLHAV GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD AASARAWPKMHTVNGYVNRSLPGL IGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE EAEDYDDDLTDSEMDW RFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA PDDRSYKSQYLNNGPQRIGRKYKKVREMAYTDETFKTREAIQHESGILGPLLYGEVGDTL LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEI FKYKWTVTVEDGP TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHS INGYVFDSLQLSVCLHEVAYWYILS IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRG MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT ESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTRHQREITRTTLQSDQEEIDYDDT ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV PQFKKW FQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTI FDETKSWYFTENMERNCRAPCNIQM EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHS IHFSGHVFTVRK KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM ASGHIRDFQITASGQYGQWAPKLARLHYSGS INAWSTKEPFSWIKVDLLAPMIIHGIKTQ GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNI FNPPIIAR YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSK ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD GHQWTLFFQNGKVKVFQGNQDSFTPW NSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE AQDLY
(SEQ ID NO: 20)
ATP7B aa sequence
MPEQERQITAREGASRKILSKLSLPTRAWEPAMKKSFAFDNVGYEGGLDGLGPSSQVATS TVRILGMTCQSCVKSIEDRISNLKGIISMKVSLEQGSATVKYVPSW CLQQVCHQIGDMG FEASIAEGKAASWPSRSLPAQEAW KLRVEGMTCQSCVSSIEGKVRKLQGW RVKVSLSN QEAVITYQPYLIQPEDLRDHVNDMGFEAAIKSKVAPLSLGPIDIERLQSTNPKRPLSSAN QNFNNSETLGHQGSHW TLQLRIDGMHCKSCVLNIEENIGQLLGVQSIQVSLENKTAQVK YDPSCTSPVALQRAIEALPPGNFKVSLPDGAEGSGTDHRSSSSHSPGSPPRNQVQGTCST TLIAIAGMTCASCVHSIEGMISQLEGVQQISVSLAEGTATVLYNPSVISPEELRAAIEDM GFEASW SESCSTNPLGNHSAGNSMVQTTDGTPTSVQEVAPHTGRLPANHAPDILAKSPQ STRAVAPQKCFLQIKGMTCASCVSN IERNLQKEAGVLSVLVALMAGKAEIKYDPEVIQPL EIAQFIQDLGFEAAVMEDYAGSDGNIELTITGMTCASCVHNIESKLTRTNGITYASVALA TSKALVKFDPEIIGPRDIIKIIEEIGFHASLAQRNPNAHHLDHKMEIKQWKKSFLCSLVF GIPVMALMIYMLIPSNEPHQSMVLDHNI IPGLSILNLIFFILCTFVQLLGGWYFYVQAYK SLRHRSANMDVLIVLATSIAYVYSLVILW AVAEKAERSPVTFFDTPPMLFVFIALGRWL EHLAKSKTSEALAKLMSLQATEATW TLGEDNLIIREEQVPMELVQRGDIVKW PGGKFP VDGKVLEGNTMADESLITGEAMPVTKKPGSTVIAGS INAHGSVLIKATHVGNDTTLAQIV KLVEEAQMSKAPIQQLADRFSGYFVPFI IIMSTLTLW WIVIGFIDFGW QRYFPNPNKH ISQTEVIIRFAFQTSITVLCIACPCSLGLATPTAVMVGTGVAAQNGILIKGGKPLEMAHK IKTVMFDKTGTITHGVPRVMRVLLLGDVATLPLRKVLAW GTAEASSEHPLGVAVTKYCK EELGTETLGYCTDFQAVPGCGIGCKVSNVEGILAHSERPLSAPASHLNEAGSLPAEKDAV PQTFSVLIGNREWLRRNGLTISSDVSDAMTDHEMKGQTAI LVAIDGVLCGMIAIADAVKQ EAALAVHTLQSMGVDW LITGDNRKTARAIATQVGINKVFAEVLPSHKVAKVQELQNKGK KVAMVGDGVNDSPALAQADMGVA IGTGTDVAIEAADW LIRNDLLDW ASIHLSKRTVRR IRINLVLALIYNLVGIPIAAGVFMPIGIVLQPWMGSAAMAASSVSVVLSSLQLKCYKKPD LERYEAQAHGHMKPLTASQVSVHIGMDDRWRDSPRATPWDQVSYVSQVSLSSLTSDKPSR HSAAADDDGDKWSLLLNGRDEEQYI
(SEQ ID NO: 21)
Construct Legend:
5' ITR: dashed underline (seq A at 5' beginning of the sequence)
CMV promoter: bold (seq B)
N-intein Npu: double underline (seq C)
3xflag: italic (seq D)
Synthetic PolyA: bold underline (seq E)
3' ITR: dashed underline (seq F at 3' end of the sequence)
HLP promoter: bold italic (seq G)
C-intein Npu DnaE: underline (seq H)
F8 signal peptide: wavy underline (seq I)
Codop F8 signal peptide: dotted underline (seq J)
5’ N6 AAV intein set 1 (CMV promoter)
pl278_pTIGEM_CMV_5' SQ-N6 F8 (split Ser962) + N-intein DnaE_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA
ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC
ACCATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTGCCAC
CAGAAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTGATCTCGGTGA
GCTGCCTGTGGACGCAAGATTTCCTCCTAGAGTGCCAAAATCTTTTCCATTCAACACCTCAGTC
GTGTACAAAAAGACTCTGTTTGTAGAATTCACGGATCACCTTTTCAACATCGCTAAGCCAAGGC
CACCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTAC
ACTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTACTGGAAAGCT
TCTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGATAAAGTCTTC
CCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATGGCCTCTGACC
CACTGTGCCTTACCTACTCATATCTTTCTCATGTGGACCTGGTAAAAGACTTGAATTCAGGCCT
CATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACACAGACCTTGCAC
AAATTTATACTACTTTTTGCTGTATTTGATGAAGGGAAAAGTTGGCACTCAGAAACAAAGAAC
TCCTTGATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCTAAAATGCACACAGTCAATG
GTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCTATTGGCATGT
GATTGGAATGGGCACCACTCCTGAAGTGCACTCAATATTCCTCGAAGGTCACACATTTCTTGTG
AGGAACCATCGCCAGGCGTCCTTGGAAATCTCGCCAATAACTTTCCTTACTGCTCAAACACTCT
TGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATGATGGCATGGA
AGCTTATGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAAATAATGAAGA
AGCGGAAGACTATGATGATGATCTTACTGATTCTGAAATGGATGTGGTCAGGTTTGATGATGA
CAACTCTCCTTCCTTTATCCAAATTCGCTCAGTTGCCAAGAAGCATCCTAAAACTTGGGTACAT
TACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCCGATGACAGAA
GTTATAAAAGTCAATATTTGAACAATGGCCCTCAGCGGATTGGTAGGAAGTACAAAAAAGTCC
GATTTATGGCATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCATGAATCAGGAA
TCTTGGGACCTTTACTTTATGGGGAAGTTGGAGACACACTGTTGATTATATTTAAGAATCAAG
CAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCCGTCCTTTGTATTCAAGGAG
ATTACCAAAAGGTGTAAAACATTTGAAGGATTTTCCAATTCTGCCAGGAGAAATATTCAAATA
TAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCCTGACCCGCTAT
TACTCTAGTTTCGTTAATATGGAGAGAGATCTAGCTTCAGGACTCATTGGCCCTCTCCTCATCT GCTACAAAGAATCTGTAGATCAAAGAGGAAACCAGATAATGTCAGACAAGAGGAATGTCATCC
TGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATACAACGCTTTCTCCC
CAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCATGCACAGCATC
AATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGCATACTGGTACA
TTCTAAGCATTGGAGCACAGACTGACTTCCTTTCTGTCTTCTTCTCTGGATATACCTTCAAACA
CAAAATGGTCTATGAAGACACACTCACCCTATTCCCATTCTCAGGAGAAACTGTCTTCATGTCG
ATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAACAGAGGCATGA
CCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAGGACAGTTATGA
AGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGTTTTTCACAGAA
TCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGCAAAAACAG
TTTAATGCAACCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAA
CCCCCATGCCAAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTGCGCCAGTCACCC
ACACCACATGGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACC
CTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGCAGCT
GCATCATTCTGGTGATATGGTATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGAGAAA
CTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCCAGTACCAGCAACA
ACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACTTGG
GCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAAGTCA
TCCCCACTCACCGAATGCCTGAGCTACGAGACCGAGATCCTGACCGTGGAGTACGGCCTGCTGC
CCATCGGCAAGATCGTGGAGAAGCGGATCGAGTGCACCGTGTACAGCGTGGACAACAACGGCAA
CATCTACACCCAGCCCGTGGCCCAGTGGCACGACCGGGGCGAGCAGGAGGTGTTCGAGTACTGC
CTGGAGGACGGCAGCCTGATCCGGGCCACCAAGGACCACAAGTTCATGACCGTGGACGGCCAGA
TGCTGCCCATCGACGAGATCTTCGAGCGGGAGCTGGACCTGATGCGGGTGGACAACCTGCCCAA
CGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGATGACGATGACAA
GTCAAAGCTTGATATCATCGAATTCAATAAAAGATCTTTATTTTCATTAGATCTGTGTGTT
GGTTTTTTGTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGC
GCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCG
GCCTCAGTGAGCGAGCGAGCGCGCAG (SEQ ID NO: 26)
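The legend places one ITR at the 5' beginning of each construct and one at the 3' end. As a minimal sketch (not part of the listing), the Python below checks that the last 30 bases of the 5' ITR in the construct above, read without the underline marks, are the reverse complement of the 30 bases that open the 3' ITR region. The function name `reverse_complement` and the two excerpt constants are illustrative assumptions, not names from the source.

```python
# Complement map for unambiguous DNA bases only (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, ignoring whitespace."""
    cleaned = "".join(dna.split()).upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Excerpt from the end of the 5' ITR (seq A), underline marks removed.
FIVE_PRIME_ITR_END = "GGAGTGGCCAACTCCATCACTAGGGGTTCC"
# Excerpt from the start of the 3' ITR region (seq F).
THREE_PRIME_ITR_START = "GGAACCCCTAGTGATGGAGTTGGCCACTCC"

if __name__ == "__main__":
    print(reverse_complement(FIVE_PRIME_ITR_END) == THREE_PRIME_ITR_START)
```

The same helper can be applied to longer stretches of the two ITRs; only the short excerpts shown here were checked for this sketch.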
3’ N6 AAV intein set 1 (CMV promoter)
pl412_pTIGEM_CMV_SP_C-intein DnaE + 3' SQ-N6-F8 (split Ser962)_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG
ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA
ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC
ACCATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTATCA
AGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGGAGCGGGACCA
CAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAATAGCGGTGGACCTTTGTCTCTCTCTGAG
GAGAATAATGACTCCAAGCTGCTTGAGTCAGGGTTGATGAACAGCCAAGAATCCTCATGGGGA
AAAAACGTTTCCTCCACCAGGGAAATAACTCGTACTACTCTTCAGTCAGATCAAGAGGAAATTG
ACTATGATGATACCATATCAGTTGAAATGAAGAAGGAAGATTTTGACATTTATGATGAGGATG
AAAATCAGAGCCCCCGCAGCTTTCAAAAGAAAACACGACACTATTTTATTGCTGCAGTGGAGAG
GCTCTGGGATTATGGGATGAGTAGCTCCCCACATGTTCTAAGAAACAGGGCTCAGAGTGGCAGT
GTCCCTCAGTTCAAGAAAGTTGTTTTCCAGGAATTTACTGATGGCTCCTTTACTCAGCCCTTAT
ACCGTGGAGAACTAAATGAACATTTGGGACTCCTGGGGCCATATATAAGAGCAGAAGTTGAAG
ATAATATCATGGTAACTTTCAGAAATCAGGCCTCTCGTCCCTATTCCTTCTATTCTAGCCTTAT
TTCTTATGAGGAAGATCAGAGGCAAGGAGCAGAACCTAGAAAAAACTTTGTCAAGCCTAATGA
AACCAAAACTTACTTTTGGAAAGTGCAACATCATATGGCACCCACTAAAGATGAGTTTGACTGC
AAAGCCTGGGCTTATTTCTCTGATGTTGACCTGGAAAAAGATGTGCACTCAGGCCTGATTGGAC
CCCTTCTGGTCTGCCACACTAACACACTGAACCCTGCTCATGGGAGACAAGTGACAGTACAGGA
ATTTGCTCTGTTTTTCACCATCTTTGATGAGACCAAAAGCTGGTACTTCACTGAAAATATGGAA
AGAAACTGCAGGGCTCCCTGCAATATCCAGATGGAAGATCCCACTTTTAAAGAGAATTATCGCT
TCCATGCAATCAATGGCTACATAATGGATACACTACCTGGCTTAGTAATGGCTCAGGATCAAAG
GATTCGATGGTATCTGCTCAGCATGGGCAGCAATGAAAACATCCATTCTATTCATTTCAGTGGA
CATGTGTTCACTGTACGAAAAAAAGAGGAGTATAAAATGGCACTGTACAATCTCTATCCAGGT
GTTTTTGAGACAGTGGAAATGTTACCATCCAAAGCTGGAATTTGGCGGGTGGAATGCCTTATT
GGCGAGCATCTACATGCTGGGATGAGCACACTTTTTCTGGTGTACAGCAATAAGTGTCAGACTC
CCCTGGGAATGGCTTCTGGACACATTAGAGATTTTCAGATTACAGCTTCAGGACAATATGGACA
GTGGGCCCCAAAGCTGGCCAGACTTCATTATTCCGGATCAATCAATGCCTGGAGCACCAAGGAG
CCCTTTTCTTGGATCAAGGTGGATCTGTTGGCACCAATGATTATTCACGGCATCAAGACCCAGG
GTGCCCGTCAGAAGTTCTCCAGCCTCTACATCTCTCAGTTTATCATCATGTATAGTCTTGATGG
GAAGAAGTGGCAGACTTATCGAGGAAATTCCACTGGAACCTTAATGGTCTTCTTTGGCAATGTG
GATTCATCTGGGATAAAACACAATATTTTTAACCCTCCAATTATTGCTCGATACATCCGTTTGC
ACCCAACTCATTATAGCATTCGCAGCACTCTTCGCATGGAGTTGATGGGCTGTGATTTAAATAG
TTGCAGCATGCCATTGGGAATGGAGAGTAAAGCAATATCAGATGCACAGATTACTGCTTCATCC
TACTTTACCAATATGTTTGCCACCTGGTCTCCTTCAAAAGCTCGACTTCACCTCCAAGGGAGGA
GTAATGCCTGGAGACCTCAGGTGAATAATCCAAAAGAGTGGCTGCAAGTGGACTTCCAGAAGA
CAATGAAAGTCACAGGAGTAACTACTCAGGGAGTAAAATCTCTGCTTACCAGCATGTATGTGA
AGGAGTTCCTCATCTCCAGCAGTCAAGATGGCCATCAGTGGACTCTCTTTTTTCAGAATGGCAA
AGTAAAGGTTTTTCAGGGAAATCAAGACTCCTTCACACCTGTGGTGAACTCTCTAGACCCACCG
TTACTGACTCGCTACCTTCGAATTCACCCCCAGAGTTGGGTGCACCAGATTGCCCTGAGGATGG
AGGTTCTGGGCTGCGAGGCACAGGACCTCTACGACTACAAAGACCATGACGGTGATTATAAAGAT
CATGACATCGACTACAAGGATGACGATGACAAGTGAGATATCAAAGCTTTCGAATTCAATAAAA
GATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTTGTGTGCGGCCCAATTGAGGAACCC
CTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAA
AGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG (SEQ ID NO: 27)
5’ N6 AAV intein set 1 (HLP promoter)
pl417_pTIGEM_HLP_5' SQ-N6 F8 (split Ser962) + N-intein DnaE_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGTGTTTGCTGCTTGCAATGTTTGCCCATTTTAGGGTGGACACAGGACGCTGTGGTTT
CTGAGCCAGGGGGCGACTCAGATCCCAGCCAGTGGACTTAGCCCCTGTTTGCTCCTCCGATAACT
GGGGTGACCTTGGTTAATATTCACCAGCAGCCTCCCCCGTTGCCCCTCTGGATCCACTGCTTAAA
TACGGACGAGGACAGGGCCCTGTCTCCTCAGCTTCAGGCACCACCACTGACCTGGGACAGTGAA
TCGCGGCCGCCACCATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGC
TTTAGTGCCACCAGAAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTG
ATCTCGGTGAGCTGCCTGTGGACGCAAGATTTCCTCCTAGAGTGCCAAAATCTTTTCCATTCAA
CACCTCAGTCGTGTACAAAAAGACTCTGTTTGTAGAATTCACGGATCACCTTTTCAACATCGCT
AAGCCAAGGCCACCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAG
TGGTCATTACACTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTA
CTGGAAAGCTTCTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGA
TAAAGTCTTCCCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATG
GCCTCTGACCCACTGTGCCTTACCTACTCATATCTTTCTCATGTGGACCTGGTAAAAGACTTGA
ATTCAGGCCTCATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACAC
AGACCTTGCACAAATTTATACTACTTTTTGCTGTATTTGATGAAGGGAAAAGTTGGCACTCAGA
AACAAAGAACTCCTTGATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCTAAAATGCAC
ACAGTCAATGGTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCT
ATTGGCATGTGATTGGAATGGGCACCACTCCTGAAGTGCACTCAATATTCCTCGAAGGTCACAC
ATTTCTTGTGAGGAACCATCGCCAGGCGTCCTTGGAAATCTCGCCAATAACTTTCCTTACTGCT
CAAACACTCTTGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATG
ATGGCATGGAAGCTTATGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAA
ATAATGAAGAAGCGGAAGACTATGATGATGATCTTACTGATTCTGAAATGGATGTGGTCAGGT
TTGATGATGACAACTCTCCTTCCTTTATCCAAATTCGCTCAGTTGCCAAGAAGCATCCTAAAAC
TTGGGTACATTACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCC
GATGACAGAAGTTATAAAAGTCAATATTTGAACAATGGCCCTCAGCGGATTGGTAGGAAGTAC
AAAAAAGTCCGATTTATGGCATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCAT
GAATCAGGAATCTTGGGACCTTTACTTTATGGGGAAGTTGGAGACACACTGTTGATTATATTT
AAGAATCAAGCAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCCGTCCTTTGT
ATTCAAGGAGATTACCAAAAGGTGTAAAACATTTGAAGGATTTTCCAATTCTGCCAGGAGAAA
TATTCAAATATAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCC
TGACCCGCTATTACTCTAGTTTCGTTAATATGGAGAGAGATCTAGCTTCAGGACTCATTGGCCC
TCTCCTCATCTGCTACAAAGAATCTGTAGATCAAAGAGGAAACCAGATAATGTCAGACAAGAG
GAATGTCATCCTGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATACAA
CGCTTTCTCCCCAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCA
TGCACAGCATCAATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGC
ATACTGGTACATTCTAAGCATTGGAGCACAGACTGACTTCCTTTCTGTCTTCTTCTCTGGATAT
ACCTTCAAACACAAAATGGTCTATGAAGACACACTCACCCTATTCCCATTCTCAGGAGAAACTG
TCTTCATGTCGATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAA CAGAGGCATGACCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAG
GACAGTTATGAAGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGT
TTTTCACAGAATCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTA
GGCAAAAACAGTTTAATGCAACCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGT
TCGCACACCGAACCCCCATGCCAAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTG
CGCCAGTCACCCACACCACATGGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACAT
TTTCAGATGACCCTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTT
TCGGCCGCAGCTGCATCATTCTGGTGATATGGTATTCACCCCGGAATCAGGCCTCCAACTTAGA
CTTAACGAGAAACTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCCA
GTACCAGCAACAACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAATAC
ATCATCACTTGGGCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTTT
GGTAAGAAGTCATCCCCACTCACCGAATGCCTGAGCTACGAGACCGAGATCCTGACCGTGGAGT
ACGGCCTGCTGCCCATCGGCAAGATCGTGGAGAAGCGGATCGAGTGCACCGTGTACAGCGTGGA
CAACAACGGCAACATCTACACCCAGCCCGTGGCCCAGTGGCACGACCGGGGCGAGCAGGAGGTG
TTCGAGTACTGCCTGGAGGACGGCAGCCTGATCCGGGCCACCAAGGACCACAAGTTCATGACCG
TGGACGGCCAGATGCTGCCCATCGACGAGATCTTCGAGCGGGAGCTGGACCTGATGCGGGTGGA
CAACCTGCCCAACGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGA
TGACGATGACAAGTCAAAGCTTGATATCATCGAATTCAATAAAAGATCTTTATTTTCATTAG
ATCTGTGTGTTGGTTTTTTGTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGTTGGCCA
CTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGC
TTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG (SEQ ID NO: 28)
3’ N6 AAV intein set 1 (HLP promoter)
pl418_pTIGEM_HLP_SP_C-intein DnaE + 3' SQ-N6-F8 (split Ser962)_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGTGTTTGCTGCTTGCAATGTTTGCCCATTTTAGGGTGGACACAGGACGCTGTGGTTT
CTGAGCCAGGGGGCGACTCAGATCCCAGCCAGTGGACTTAGCCCCTGTTTGCTCCTCCGATAACT
GGGGTGACCTTGGTTAATATTCACCAGCAGCCTCCCCCGTTGCCCCTCTGGATCCACTGCTTAAA
TACGGACGAGGACAGGGCCCTGTCTCCTCAGCTTCAGGCACCACCACTGACCTGGGACAGTGAA
TCGCGGCCGCCACCATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGC
TTTAGTATCAAGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGG
AGCGGGACCACAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAATAGCGGTGGACCTTTGTC
TCTCTCTGAGGAGAATAATGACTCCAAGCTGCTTGAGTCAGGGTTGATGAACAGCCAAGAATCC
TCATGGGGAAAAAACGTTTCCTCCACCAGGGAAATAACTCGTACTACTCTTCAGTCAGATCAAG
AGGAAATTGACTATGATGATACCATATCAGTTGAAATGAAGAAGGAAGATTTTGACATTTATG
ATGAGGATGAAAATCAGAGCCCCCGCAGCTTTCAAAAGAAAACACGACACTATTTTATTGCTGC
AGTGGAGAGGCTCTGGGATTATGGGATGAGTAGCTCCCCACATGTTCTAAGAAACAGGGCTCAG
AGTGGCAGTGTCCCTCAGTTCAAGAAAGTTGTTTTCCAGGAATTTACTGATGGCTCCTTTACTC
AGCCCTTATACCGTGGAGAACTAAATGAACATTTGGGACTCCTGGGGCCATATATAAGAGCAGA
AGTTGAAGATAATATCATGGTAACTTTCAGAAATCAGGCCTCTCGTCCCTATTCCTTCTATTCT
AGCCTTATTTCTTATGAGGAAGATCAGAGGCAAGGAGCAGAACCTAGAAAAAACTTTGTCAAG
CCTAATGAAACCAAAACTTACTTTTGGAAAGTGCAACATCATATGGCACCCACTAAAGATGAGT
TTGACTGCAAAGCCTGGGCTTATTTCTCTGATGTTGACCTGGAAAAAGATGTGCACTCAGGCCT
GATTGGACCCCTTCTGGTCTGCCACACTAACACACTGAACCCTGCTCATGGGAGACAAGTGACA
GTACAGGAATTTGCTCTGTTTTTCACCATCTTTGATGAGACCAAAAGCTGGTACTTCACTGAAA
ATATGGAAAGAAACTGCAGGGCTCCCTGCAATATCCAGATGGAAGATCCCACTTTTAAAGAGA ATTATCGCTTCCATGCAATCAATGGCTACATAATGGATACACTACCTGGCTTAGTAATGGCTCA
GGATCAAAGGATTCGATGGTATCTGCTCAGCATGGGCAGCAATGAAAACATCCATTCTATTCAT
TTCAGTGGACATGTGTTCACTGTACGAAAAAAAGAGGAGTATAAAATGGCACTGTACAATCTC
TATCCAGGTGTTTTTGAGACAGTGGAAATGTTACCATCCAAAGCTGGAATTTGGCGGGTGGAA
TGCCTTATTGGCGAGCATCTACATGCTGGGATGAGCACACTTTTTCTGGTGTACAGCAATAAGT
GTCAGACTCCCCTGGGAATGGCTTCTGGACACATTAGAGATTTTCAGATTACAGCTTCAGGACA
ATATGGACAGTGGGCCCCAAAGCTGGCCAGACTTCATTATTCCGGATCAATCAATGCCTGGAGC
ACCAAGGAGCCCTTTTCTTGGATCAAGGTGGATCTGTTGGCACCAATGATTATTCACGGCATCA
AGACCCAGGGTGCCCGTCAGAAGTTCTCCAGCCTCTACATCTCTCAGTTTATCATCATGTATAG
TCTTGATGGGAAGAAGTGGCAGACTTATCGAGGAAATTCCACTGGAACCTTAATGGTCTTCTTT
GGCAATGTGGATTCATCTGGGATAAAACACAATATTTTTAACCCTCCAATTATTGCTCGATACA
TCCGTTTGCACCCAACTCATTATAGCATTCGCAGCACTCTTCGCATGGAGTTGATGGGCTGTGA
TTTAAATAGTTGCAGCATGCCATTGGGAATGGAGAGTAAAGCAATATCAGATGCACAGATTAC
TGCTTCATCCTACTTTACCAATATGTTTGCCACCTGGTCTCCTTCAAAAGCTCGACTTCACCTCC
AAGGGAGGAGTAATGCCTGGAGACCTCAGGTGAATAATCCAAAAGAGTGGCTGCAAGTGGACT
TCCAGAAGACAATGAAAGTCACAGGAGTAACTACTCAGGGAGTAAAATCTCTGCTTACCAGCAT
GTATGTGAAGGAGTTCCTCATCTCCAGCAGTCAAGATGGCCATCAGTGGACTCTCTTTTTTCAG
AATGGCAAAGTAAAGGTTTTTCAGGGAAATCAAGACTCCTTCACACCTGTGGTGAACTCTCTAG
ACCCACCGTTACTGACTCGCTACCTTCGAATTCACCCCCAGAGTTGGGTGCACCAGATTGCCCTG
AGGATGGAGGTTCTGGGCTGCGAGGCACAGGACCTCTACGACTACAAAGACCATGACGGTGATTA
TAAAGATCATGACATCGACTACAAGGATGACGATGACAAGTGAGATATCAAAGCTTTCGAATTCA
ATAAAAGATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTTGTGTGCGGCCCAATTGA
GGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGC
GACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG (SEQ ID NO: 29)
5’ N6 AAV intein set 2 (CMV promoter)
pl276_pTIGEM_CMV_5' SQ-N6 F8 (split Ser883) + N-intein DnaE_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG
ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA
ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC
ACCATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTGCCAC CAGAAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTGATCTCGGTGA
GCTGCCTGTGGACGCAAGATTTCCTCCTAGAGTGCCAAAATCTTTTCCATTCAACACCTCAGTC
GTGTACAAAAAGACTCTGTTTGTAGAATTCACGGATCACCTTTTCAACATCGCTAAGCCAAGGC
CACCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTAC
ACTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTACTGGAAAGCT
TCTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGATAAAGTCTTC
CCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATGGCCTCTGACC
CACTGTGCCTTACCTACTCATATCTTTCTCATGTGGACCTGGTAAAAGACTTGAATTCAGGCCT
CATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACACAGACCTTGCAC
AAATTTATACTACTTTTTGCTGTATTTGATGAAGGGAAAAGTTGGCACTCAGAAACAAAGAAC
TCCTTGATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCTAAAATGCACACAGTCAATG
GTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCTATTGGCATGT
GATTGGAATGGGCACCACTCCTGAAGTGCACTCAATATTCCTCGAAGGTCACACATTTCTTGTG
AGGAACCATCGCCAGGCGTCCTTGGAAATCTCGCCAATAACTTTCCTTACTGCTCAAACACTCT
TGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATGATGGCATGGA
AGCTTATGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAAATAATGAAGA
AGCGGAAGACTATGATGATGATCTTACTGATTCTGAAATGGATGTGGTCAGGTTTGATGATGA
CAACTCTCCTTCCTTTATCCAAATTCGCTCAGTTGCCAAGAAGCATCCTAAAACTTGGGTACAT
TACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCCGATGACAGAA
GTTATAAAAGTCAATATTTGAACAATGGCCCTCAGCGGATTGGTAGGAAGTACAAAAAAGTCC
GATTTATGGCATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCATGAATCAGGAA
TCTTGGGACCTTTACTTTATGGGGAAGTTGGAGACACACTGTTGATTATATTTAAGAATCAAG
CAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCCGTCCTTTGTATTCAAGGAG
ATTACCAAAAGGTGTAAAACATTTGAAGGATTTTCCAATTCTGCCAGGAGAAATATTCAAATA
TAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCCTGACCCGCTAT
TACTCTAGTTTCGTTAATATGGAGAGAGATCTAGCTTCAGGACTCATTGGCCCTCTCCTCATCT
GCTACAAAGAATCTGTAGATCAAAGAGGAAACCAGATAATGTCAGACAAGAGGAATGTCATCC
TGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATACAACGCTTTCTCCC
CAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCATGCACAGCATC
AATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGCATACTGGTACA
TTCTAAGCATTGGAGCACAGACTGACTTCCTTTCTGTCTTCTTCTCTGGATATACCTTCAAACA
CAAAATGGTCTATGAAGACACACTCACCCTATTCCCATTCTCAGGAGAAACTGTCTTCATGTCG
ATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAACAGAGGCATGA
CCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAGGACAGTTATGA AGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGTTTTTCACAGAA
TCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGCAAAAACAG
TTTAATGCAACCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAA
CCCCCATGCCAAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTGCGCCAGTCACCC
ACACCACATGGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACC
CTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGCAGCT
GCATCATTCTGGTGATATGGTATTCACCCCGGAATGCCTGAGCTACGAGACCGAGATCCTGACC
GTGGAGTACGGCCTGCTGCCCATCGGCAAGATCGTGGAGAAGCGGATCGAGTGCACCGTGTACA
GCGTGGACAACAACGGCAACATCTACACCCAGCCCGTGGCCCAGTGGCACGACCGGGGCGAGCA
GGAGGTGTTCGAGTACTGCCTGGAGGACGGCAGCCTGATCCGGGCCACCAAGGACCACAAGTTC
ATGACCGTGGACGGCCAGATGCTGCCCATCGACGAGATCTTCGAGCGGGAGCTGGACCTGATGC
GGGTGGACAACCTGCCCAACGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGACT
ACAAGGATGACGATGACAAGTCAAAGCTTGATATCATCGAATTCAATAAAAGATCTTTATTTT
CATTAGATCTGTGTGTTGGTTTTTTGTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGT
TGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGC
CCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG (SEQ ID NO: 30)
3’ N6 AAV intein set 2 (CMV promoter)
pl411_pTIGEM_CMV_SP_C-intein DnaE + 3' SQ-N6-F8 (split Ser883)_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG
ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA
ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC
ACCATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTATCA AGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGGAGCGGGACCA
CAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAATTCAGGCCTCCAACTTAGACTTAACGAG
AAACTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCCAGTACCAGCA
ACAACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACT
TGGGCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAAG
TCATCCCCACTCACCGAAAGCGGTGGACCTTTGTCTCTCTCTGAGGAGAATAATGACTCCAAGC
TGCTTGAGTCAGGGTTGATGAACAGCCAAGAATCCTCATGGGGAAAAAACGTTTCCTCCACCAG
GGAAATAACTCGTACTACTCTTCAGTCAGATCAAGAGGAAATTGACTATGATGATACCATATC
AGTTGAAATGAAGAAGGAAGATTTTGACATTTATGATGAGGATGAAAATCAGAGCCCCCGCAG
CTTTCAAAAGAAAACACGACACTATTTTATTGCTGCAGTGGAGAGGCTCTGGGATTATGGGAT
GAGTAGCTCCCCACATGTTCTAAGAAACAGGGCTCAGAGTGGCAGTGTCCCTCAGTTCAAGAAA
GTTGTTTTCCAGGAATTTACTGATGGCTCCTTTACTCAGCCCTTATACCGTGGAGAACTAAATG
AACATTTGGGACTCCTGGGGCCATATATAAGAGCAGAAGTTGAAGATAATATCATGGTAACTT
TCAGAAATCAGGCCTCTCGTCCCTATTCCTTCTATTCTAGCCTTATTTCTTATGAGGAAGATCA
GAGGCAAGGAGCAGAACCTAGAAAAAACTTTGTCAAGCCTAATGAAACCAAAACTTACTTTTG
GAAAGTGCAACATCATATGGCACCCACTAAAGATGAGTTTGACTGCAAAGCCTGGGCTTATTTC
TCTGATGTTGACCTGGAAAAAGATGTGCACTCAGGCCTGATTGGACCCCTTCTGGTCTGCCACA
CTAACACACTGAACCCTGCTCATGGGAGACAAGTGACAGTACAGGAATTTGCTCTGTTTTTCAC
CATCTTTGATGAGACCAAAAGCTGGTACTTCACTGAAAATATGGAAAGAAACTGCAGGGCTCCC
TGCAATATCCAGATGGAAGATCCCACTTTTAAAGAGAATTATCGCTTCCATGCAATCAATGGCT
ACATAATGGATACACTACCTGGCTTAGTAATGGCTCAGGATCAAAGGATTCGATGGTATCTGCT
CAGCATGGGCAGCAATGAAAACATCCATTCTATTCATTTCAGTGGACATGTGTTCACTGTACGA
AAAAAAGAGGAGTATAAAATGGCACTGTACAATCTCTATCCAGGTGTTTTTGAGACAGTGGAA
ATGTTACCATCCAAAGCTGGAATTTGGCGGGTGGAATGCCTTATTGGCGAGCATCTACATGCTG
GGATGAGCACACTTTTTCTGGTGTACAGCAATAAGTGTCAGACTCCCCTGGGAATGGCTTCTGG
ACACATTAGAGATTTTCAGATTACAGCTTCAGGACAATATGGACAGTGGGCCCCAAAGCTGGCC
AGACTTCATTATTCCGGATCAATCAATGCCTGGAGCACCAAGGAGCCCTTTTCTTGGATCAAGG
TGGATCTGTTGGCACCAATGATTATTCACGGCATCAAGACCCAGGGTGCCCGTCAGAAGTTCTC
CAGCCTCTACATCTCTCAGTTTATCATCATGTATAGTCTTGATGGGAAGAAGTGGCAGACTTAT
CGAGGAAATTCCACTGGAACCTTAATGGTCTTCTTTGGCAATGTGGATTCATCTGGGATAAAAC
ACAATATTTTTAACCCTCCAATTATTGCTCGATACATCCGTTTGCACCCAACTCATTATAGCAT
TCGCAGCACTCTTCGCATGGAGTTGATGGGCTGTGATTTAAATAGTTGCAGCATGCCATTGGGA
ATGGAGAGTAAAGCAATATCAGATGCACAGATTACTGCTTCATCCTACTTTACCAATATGTTTG
CCACCTGGTCTCCTTCAAAAGCTCGACTTCACCTCCAAGGGAGGAGTAATGCCTGGAGACCTCA GGTGAATAATCCAAAAGAGTGGCTGCAAGTGGACTTCCAGAAGACAATGAAAGTCACAGGAGT
AACTACTCAGGGAGTAAAATCTCTGCTTACCAGCATGTATGTGAAGGAGTTCCTCATCTCCAGC
AGTCAAGATGGCCATCAGTGGACTCTCTTTTTTCAGAATGGCAAAGTAAAGGTTTTTCAGGGA
AATCAAGACTCCTTCACACCTGTGGTGAACTCTCTAGACCCACCGTTACTGACTCGCTACCTTCG
AATTCACCCCCAGAGTTGGGTGCACCAGATTGCCCTGAGGATGGAGGTTCTGGGCTGCGAGGCA
CAGGACCTCTACGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGAT
GACGATGACAAGTGAGATATCAAAGCTTTCGAATTCAATAAAAGATCTTTATTTTCATTAGA
TCTGTGTGTTGGTTTTTTGTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGTTGGCCAC
TCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCT
TTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG (SEQ ID NO: 31)
5’ codop N6 AAV intein set 1 (CMV promoter)
pl430_pTIGEM_CMV_5' codop SQ-N6 F8 (Ser962)_N-int DnaE_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG
ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA
ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC
ACCATGCAGATTGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGAGGTTCTGCTTCTCTGCCAC
CAGGAGATACTACCTGGGGGCTGTGGAGCTGAGCTGGGACTACATGCAGTCTGACCTGGGGGAG
CTGCCTGTGGATGCCAGGTTCCCCCCCAGAGTGCCCAAGAGCTTCCCCTTCAACACCTCTGTGGT
GTACAAGAAGACCCTGTTTGTGGAGTTCACTGACCACCTGTTCAACATTGCCAAGCCCAGGCCC
CCCTGGATGGGCCTGCTGGGCCCCACCATCCAGGCTGAGGTGTATGACACTGTGGTGATCACCC
TGAAGAACATGGCCAGCCACCCTGTGAGCCTGCATGCTGTGGGGGTGAGCTACTGGAAGGCCTC
TGAGGGGGCTGAGTATGATGACCAGACCAGCCAGAGGGAGAAGGAGGATGACAAGGTGTTCCC TGGGGGCAGCCACACCTATGTGTGGCAGGTGCTGAAGGAGAATGGCCCCATGGCCTCTGACCCC
CTGTGCCTGACCTACAGCTACCTGAGCCATGTGGACCTGGTGAAGGACCTGAACTCTGGCCTGA
TTGGGGCCCTGCTGGTGTGCAGGGAGGGCAGCCTGGCCAAGGAGAAGACCCAGACCCTGCACAA
GTTCATCCTGCTGTTTGCTGTGTTTGATGAGGGCAAGAGCTGGCACTCTGAAACCAAGAACAGC
CTGATGCAGGACAGGGATGCTGCCTCTGCCAGGGCCTGGCCCAAGATGCACACTGTGAATGGCT
ATGTGAACAGGAGCCTGCCTGGCCTGATTGGCTGCCACAGGAAGTCTGTGTACTGGCATGTGAT
TGGCATGGGCACCACCCCTGAGGTGCACAGCATCTTCCTGGAGGGCCACACCTTCCTGGTCAGG
AACCACAGGCAGGCCAGCCTGGAGATCAGCCCCATCACCTTCCTGACTGCCCAGACCCTGCTGAT
GGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCCACCAGCATGATGGCATGGAGGCC
TATGTGAAGGTGGACAGCTGCCCTGAGGAGCCCCAGCTGAGGATGAAGAACAATGAGGAGGCT
GAGGACTATGATGATGACCTGACTGACTCTGAGATGGATGTGGTGAGGTTTGATGATGACAAC
AGCCCCAGCTTCATCCAGATCAGGTCTGTGGCCAAGAAGCACCCCAAGACCTGGGTGCACTACA
TTGCTGCTGAGGAGGAGGACTGGGACTATGCCCCCCTGGTGCTGGCCCCTGATGACAGGAGCTA
CAAGAGCCAGTACCTGAACAATGGCCCCCAGAGGATTGGCAGGAAGTACAAGAAGGTCAGGTTC
ATGGCCTACACTGATGAAACCTTCAAGACCAGGGAGGCCATCCAGCATGAGTCTGGCATCCTGG
GCCCCCTGCTGTATGGGGAGGTGGGGGACACCCTGCTGATCATCTTCAAGAACCAGGCCAGCAG
GCCCTACAACATCTACCCCCATGGCATCACTGATGTGAGGCCCCTGTACAGCAGGAGGCTGCCC
AAGGGGGTGAAGCACCTGAAGGACTTCCCCATCCTGCCTGGGGAGATCTTCAAGTACAAGTGGA
CTGTGACTGTGGAGGATGGCCCCACCAAGTCTGACCCCAGGTGCCTGACCAGATACTACAGCAG
CTTTGTGAACATGGAGAGGGACCTGGCCTCTGGCCTGATTGGCCCCCTGCTGATCTGCTACAAG
GAGTCTGTGGACCAGAGGGGCAACCAGATCATGTCTGACAAGAGGAATGTGATCCTGTTCTCTG
TGTTTGATGAGAACAGGAGCTGGTACCTGACTGAGAACATCCAGAGGTTCCTGCCCAACCCTGC
TGGGGTGCAGCTGGAGGACCCTGAGTTCCAGGCCAGCAACATCATGCACAGCATCAATGGCTAT
GTGTTTGACAGCCTGCAGCTGTCTGTGTGCCTGCATGAGGTGGCCTACTGGTACATCCTGAGCA
TTGGGGCCCAGACTGACTTCCTGTCTGTGTTCTTCTCTGGCTACACCTTCAAGCACAAGATGGT
GTATGAGGACACCCTGACCCTGTTCCCCTTCTCTGGGGAGACTGTGTTCATGAGCATGGAGAAC
CCTGGCCTGTGGATTCTGGGCTGCCACAACTCTGACTTCAGGAACAGGGGCATGACTGCCCTGC
TGAAAGTCTCCAGCTGTGACAAGAACACTGGGGACTACTATGAGGACAGCTATGAGGACATCTC
TGCCTACCTGCTGAGCAAGAACAATGCCATTGAGCCCAGGAGTTTTTCACAGAATCCACCTGTA
TTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGCAAAAACAGTTTAATGCAA
CCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAACCCCCATGCC
AAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTGCGCCAGTCACCCACACCACAT
GGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACCCTAGCCCCG
GCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGCAGCTGCATCATTC TGGTGATATGGTATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGAGAAACTGGGCACG
ACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCCAGTACCAGCAACAACCTTATCA
GCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACTTGGGCCACCCTC
TATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAAGTCATCCCCACTC
ACCGAATGCCTGAGCTACGAGACCGAGATCCTGACCGTGGAGTACGGCCTGCTGCCCATCGGCA
AGATCGTGGAGAAGCGGATCGAGTGCACCGTGTACAGCGTGGACAACAACGGCAACATCTACAC
CCAGCCCGTGGCCCAGTGGCACGACCGGGGCGAGCAGGAGGTGTTCGAGTACTGCCTGGAGGAC
GGCAGCCTGATCCGGGCCACCAAGGACCACAAGTTCATGACCGTGGACGGCCAGATGCTGCCCA
TCGACGAGATCTTCGAGCGGGAGCTGGACCTGATGCGGGTGGACAACCTGCCCAACGACTACAA
AGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGATGACGATGACAAGTGAAAGCT
TGATATCATCGAATTCAATAAAAGATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTT
GTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCT
CGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTG
AGCGAGCGAGCGCGCAG (SEQ ID NO: 32)
3’ codop N6 AAV intein set 1 (CMV promoter)
pl431_pTIGEM_CMV_SP_C-int DnaE_3' codop SQ-N6-F8 (Ser962)_3xFlag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG
ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA
ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC
ACCATGCAGATTGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGAGGTTCTGCTTCTCTATCA
AGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGGAGCGGGACCA
CAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAATAGCGGTGGACCTTTGTCTCTCTCTGAG GAGAATAATGACTCCAAGCTGCTTGAGTCAGGGTTGATGAACAGCCAAGAATCCTCATGGGGA
AAAAACGTTTCCTCCACCAGGGAGATCACCAGGACCACCCTGCAGTCTGACCAGGAGGAGATTG
ACTATGATGACACCATCTCTGTGGAGATGAAGAAGGAGGACTTTGACATCTACGACGAGGACG
AGAACCAGAGCCCCAGGAGCTTCCAGAAGAAGACCAGGCACTACTTCATTGCTGCTGTGGAGAG
GCTGTGGGACTATGGCATGAGCAGCAGCCCCCATGTGCTGAGGAACAGGGCCCAGTCTGGCTCT
GTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGTTCACTGATGGCAGCTTCACCCAGCCCCTGT
ACAGAGGGGAGCTGAATGAGCACCTGGGCCTGCTGGGCCCCTACATCAGGGCTGAGGTGGAGGA
CAACATCATGGTGACCTTCAGGAACCAGGCCAGCAGGCCCTACAGCTTCTACAGCAGCCTGATC
AGCTATGAGGAGGACCAGAGGCAGGGGGCTGAGCCCAGGAAGAACTTTGTGAAGCCCAATGAA
ACCAAGACCTACTTCTGGAAGGTGCAGCACCACATGGCCCCCACCAAGGATGAGTTTGACTGCA
AGGCCTGGGCCTACTTCTCTGATGTGGACCTGGAGAAGGATGTGCACTCTGGCCTGATTGGCCC
CCTGCTGGTGTGCCACACCAACACCCTGAACCCTGCCCATGGCAGGCAGGTGACTGTGCAGGAG
TTTGCCCTGTTCTTCACCATCTTTGATGAAACCAAGAGCTGGTACTTCACTGAGAACATGGAGA
GGAACTGCAGGGCCCCCTGCAACATCCAGATGGAGGACCCCACCTTCAAGGAGAACTACAGGTT
CCATGCCATCAATGGCTACATCATGGACACCCTGCCTGGCCTGGTGATGGCCCAGGACCAGAGG
ATCAGGTGGTACCTGCTGAGCATGGGCAGCAATGAGAACATCCACAGCATCCACTTCTCTGGCC
ATGTGTTCACTGTGAGGAAGAAGGAGGAGTACAAGATGGCCCTGTACAACCTGTACCCTGGGGT
GTTTGAGACTGTGGAGATGCTGCCCAGCAAGGCTGGCATCTGGAGGGTGGAGTGCCTGATTGGG
GAGCACCTGCATGCTGGCATGAGCACCCTGTTCCTGGTGTACAGCAACAAGTGCCAGACCCCCC
TGGGCATGGCCTCTGGCCACATCAGGGACTTCCAGATCACTGCCTCTGGCCAGTATGGCCAGTG
GGCCCCCAAGCTGGCCAGGCTGCACTACTCTGGCAGCATCAATGCCTGGAGCACCAAGGAGCCC
TTCAGCTGGATCAAGGTGGACCTGCTGGCCCCCATGATCATCCATGGCATCAAGACCCAGGGGG
CCAGGCAGAAGTTCAGCAGCCTGTACATCAGCCAGTTCATCATCATGTACAGCCTGGATGGCAA
GAAGTGGCAGACCTACAGGGGCAACAGCACTGGCACCCTGATGGTGTTCTTTGGCAATGTGGAC
AGCTCTGGCATCAAGCACAACATCTTCAACCCCCCCATCATTGCCAGATACATCAGGCTGCACCC
CACCCACTACAGCATCAGGAGCACCCTGAGGATGGAGCTGATGGGCTGTGACCTGAACAGCTGC
AGCATGCCCCTGGGCATGGAGAGCAAGGCCATCTCTGATGCCCAGATCACTGCCAGCAGCTACT
TCACCAACATGTTTGCCACCTGGAGCCCCAGCAAGGCCAGGCTGCACCTGCAGGGCAGGAGCAA
TGCCTGGAGGCCCCAGGTCAACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAGAAGACCATG
AAGGTGACTGGGGTGACCACCCAGGGGGTGAAGAGCCTGCTGACCAGCATGTATGTGAAGGAG
TTCCTGATCAGCAGCAGCCAGGATGGCCACCAGTGGACCCTGTTCTTCCAGAATGGCAAGGTGA
AGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCTGTGGTGAACAGCCTGGACCCCCCCCTGCT
GACCAGATACCTGAGGATTCACCCCCAGAGCTGGGTGCACCAGATTGCCCTGAGGATGGAGGTG
CTGGGCTGTGAGGCCCAGGACCTGTACGACTACAAAGACCATGACGGTGATTATAAAGATCATGA CATCGACTACAAGGATGACGATGACAAGTGACATATCAAACCTTTCCAATTCAATAAAAGATCT
TTATTTTCATTAGATCTGTGTGTTGGTTTTTTGTGTGCGGCCCAATTGAGGAACCCCTAGTGA
TGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTC
GCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG
(SEQ ID NO: 33)
5’ codop N6 AAV intein set 1 (HLP promoter) pl440_pTIGEM_HLP_5' codop SQ-N6 F8 (Ser962) _N-int DnaE_3xFlag_(Synthetic- polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCTGTTTGCTGCTTGCAATGTTTGCCCATTTTAGGGTGGACACAGGACGCTGTGGTTTC
TGAGCCAGGGGGCGACTCAGATCCCAGCCAGTGGACTTAGCCCCTGTTTGCTCCTCCGATAACT
GGGGTGACCTTGGTTAATATTCACCAGCAGCCTCCCCCGTTGCCCCTCTGGATCCACTGCTTAAA
TACGGACGAGGACAGGGCCCTGTCTCCTCAGCTTCAGGCACCACCACTGACCTGGGACAGTGAA
TCGCGGCCGCCACCATGCAGATTGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGAGGTTCTGC
TTCTCTGCCACCAGGAGATACTACCTGGGGGCTGTGGAGCTGAGCTGGGACTACATGCAGTCTG
ACCTGGGGGAGCTGCCTGTGGATGCCAGGTTCCCCCCCAGAGTGCCCAAGAGCTTCCCCTTCAAC
ACCTCTGTGGTGTACAAGAAGACCCTGTTTGTGGAGTTCACTGACCACCTGTTCAACATTGCCA
AGCCCAGGCCCCCCTGGATGGGCCTGCTGGGCCCCACCATCCAGGCTGAGGTGTATGACACTGT
GGTGATCACCCTGAAGAACATGGCCAGCCACCCTGTGAGCCTGCATGCTGTGGGGGTGAGCTAC
TGGAAGGCCTCTGAGGGGGCTGAGTATGATGACCAGACCAGCCAGAGGGAGAAGGAGGATGAC
AAGGTGTTCCCTGGGGGCAGCCACACCTATGTGTGGCAGGTGCTGAAGGAGAATGGCCCCATGG
CCTCTGACCCCCTGTGCCTGACCTACAGCTACCTGAGCCATGTGGACCTGGTGAAGGACCTGAA
CTCTGGCCTGATTGGGGCCCTGCTGGTGTGCAGGGAGGGCAGCCTGGCCAAGGAGAAGACCCAG
ACCCTGCACAAGTTCATCCTGCTGTTTGCTGTGTTTGATGAGGGCAAGAGCTGGCACTCTGAAA
CCAAGAACAGCCTGATGCAGGACAGGGATGCTGCCTCTGCCAGGGCCTGGCCCAAGATGCACAC
TGTGAATGGCTATGTGAACAGGAGCCTGCCTGGCCTGATTGGCTGCCACAGGAAGTCTGTGTAC
TGGCATGTGATTGGCATGGGCACCACCCCTGAGGTGCACAGCATCTTCCTGGAGGGCCACACCT
TCCTGGTCAGGAACCACAGGCAGGCCAGCCTGGAGATCAGCCCCATCACCTTCCTGACTGCCCA
GACCCTGCTGATGGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCCACCAGCATGAT
GGCATGGAGGCCTATGTGAAGGTGGACAGCTGCCCTGAGGAGCCCCAGCTGAGGATGAAGAACA
ATGAGGAGGCTGAGGACTATGATGATGACCTGACTGACTCTGAGATGGATGTGGTGAGGTTTG
ATGATGACAACAGCCCCAGCTTCATCCAGATCAGGTCTGTGGCCAAGAAGCACCCCAAGACCTG GGTGCACTACATTGCTGCTGAGGAGGAGGACTGGGACTATGCCCCCCTGGTGCTGGCCCCTGAT
GACAGGAGCTACAAGAGCCAGTACCTGAACAATGGCCCCCAGAGGATTGGCAGGAAGTACAAG
AAGGTCAGGTTCATGGCCTACACTGATGAAACCTTCAAGACCAGGGAGGCCATCCAGCATGAGT
CTGGCATCCTGGGCCCCCTGCTGTATGGGGAGGTGGGGGACACCCTGCTGATCATCTTCAAGAA
CCAGGCCAGCAGGCCCTACAACATCTACCCCCATGGCATCACTGATGTGAGGCCCCTGTACAGC
AGGAGGCTGCCCAAGGGGGTGAAGCACCTGAAGGACTTCCCCATCCTGCCTGGGGAGATCTTCA
AGTACAAGTGGACTGTGACTGTGGAGGATGGCCCCACCAAGTCTGACCCCAGGTGCCTGACCAG
ATACTACAGCAGCTTTGTGAACATGGAGAGGGACCTGGCCTCTGGCCTGATTGGCCCCCTGCTG
ATCTGCTACAAGGAGTCTGTGGACCAGAGGGGCAACCAGATCATGTCTGACAAGAGGAATGTG
ATCCTGTTCTCTGTGTTTGATGAGAACAGGAGCTGGTACCTGACTGAGAACATCCAGAGGTTCC
TGCCCAACCCTGCTGGGGTGCAGCTGGAGGACCCTGAGTTCCAGGCCAGCAACATCATGCACAG
CATCAATGGCTATGTGTTTGACAGCCTGCAGCTGTCTGTGTGCCTGCATGAGGTGGCCTACTGG
TACATCCTGAGCATTGGGGCCCAGACTGACTTCCTGTCTGTGTTCTTCTCTGGCTACACCTTCA
AGCACAAGATGGTGTATGAGGACACCCTGACCCTGTTCCCCTTCTCTGGGGAGACTGTGTTCAT
GAGCATGGAGAACCCTGGCCTGTGGATTCTGGGCTGCCACAACTCTGACTTCAGGAACAGGGGC
ATGACTGCCCTGCTGAAAGTCTCCAGCTGTGACAAGAACACTGGGGACTACTATGAGGACAGCT
ATGAGGACATCTCTGCCTACCTGCTGAGCAAGAACAATGCCATTGAGCCCAGGAGTTTTTCACA
GAATCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGCAAAAA
CAGTTTAATGCAACCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGTTCGCACACC
GAACCCCCATGCCAAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTGCGCCAGTC
ACCCACACCACATGGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGAT
GACCCTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGC
AGCTGCATCATTCTGGTGATATGGTATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGA
GAAACTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCCAGTACCAGC
AACAACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAATACATCATCAC
TTGGGCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAA
GTCATCCCCACTCACCGAATGCCTGAGCTACGAGACCGAGATCCTGACCGTGGAGTACGGCCTG
CTGCCCATCGGCAAGATCGTGGAGAAGCGGATCGAGTGCACCGTGTACAGCGTGGACAACAACG
GCAACATCTACACCCAGCCCGTGGCCCAGTGGCACGACCGGGGCGAGCAGGAGGTGTTCGAGTA
CTGCCTGGAGGACGGCAGCCTGATCCGGGCCACCAAGGACCACAAGTTCATGACCGTGGACGGC
CAGATGCTGCCCATCGACGAGATCTTCGAGCGGGAGCTGGACCTGATGCGGGTGGACAACCTGC
CCAACGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGATGACGATG
ACAAGTCAAAGCTTGATATCATCGAATTCAATAAAAGATCTTTATTTTCATTAGATCTGTGT
GTTGGTTTTTTGTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTC
TGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGG
GCGGCCTCAGTGAGCGAGCGAGCGCGCAG
(SEQ ID NO: 34)
3’ codop N6 AAV intein set 1 (HLP promoter) pl441_pTIGEM_HLP_SP_C-int DnaE_3' codop SQ-N6-F8 (Ser962)_3xFlag_(Synthetic- polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCTGTTTGCTGCTTGCAATGTTTGCCCATTTTAGGGTGGACACAGGACGCTGTGGTTTC
TGAGCCAGGGGGCGACTCAGATCCCAGCCAGTGGACTTAGCCCCTGTTTGCTCCTCCGATAACT
GGGGTGACCTTGGTTAATATTCACCAGCAGCCTCCCCCGTTGCCCCTCTGGATCCACTGCTTAAA
TACGGACGAGGACAGGGCCCTGTCTCCTCAGCTTCAGGCACCACCACTGACCTGGGACAGTGAA
TCGCGGCCGCCACCATGCAGATTGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGAGGTTCTGC
TTCTCTATCAAGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGG
AGCGGGACCACAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAATAGCGGTGGACCTTTGTC
TCTCTCTGAGGAGAATAATGACTCCAAGCTGCTTGAGTCAGGGTTGATGAACAGCCAAGAATCC
TCATGGGGAAAAAACGTTTCCTCCACCAGGGAGATCACCAGGACCACCCTGCAGTCTGACCAGG
AGGAGATTGACTATGATGACACCATCTCTGTGGAGATGAAGAAGGAGGACTTTGACATCTACG
ACGAGGACGAGAACCAGAGCCCCAGGAGCTTCCAGAAGAAGACCAGGCACTACTTCATTGCTGC
TGTGGAGAGGCTGTGGGACTATGGCATGAGCAGCAGCCCCCATGTGCTGAGGAACAGGGCCCAG
TCTGGCTCTGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGTTCACTGATGGCAGCTTCACCC
AGCCCCTGTACAGAGGGGAGCTGAATGAGCACCTGGGCCTGCTGGGCCCCTACATCAGGGCTGA
GGTGGAGGACAACATCATGGTGACCTTCAGGAACCAGGCCAGCAGGCCCTACAGCTTCTACAGC
AGCCTGATCAGCTATGAGGAGGACCAGAGGCAGGGGGCTGAGCCCAGGAAGAACTTTGTGAAG
CCCAATGAAACCAAGACCTACTTCTGGAAGGTGCAGCACCACATGGCCCCCACCAAGGATGAGT
TTGACTGCAAGGCCTGGGCCTACTTCTCTGATGTGGACCTGGAGAAGGATGTGCACTCTGGCCT
GATTGGCCCCCTGCTGGTGTGCCACACCAACACCCTGAACCCTGCCCATGGCAGGCAGGTGACT
GTGCAGGAGTTTGCCCTGTTCTTCACCATCTTTGATGAAACCAAGAGCTGGTACTTCACTGAGA
ACATGGAGAGGAACTGCAGGGCCCCCTGCAACATCCAGATGGAGGACCCCACCTTCAAGGAGAA
CTACAGGTTCCATGCCATCAATGGCTACATCATGGACACCCTGCCTGGCCTGGTGATGGCCCAG
GACCAGAGGATCAGGTGGTACCTGCTGAGCATGGGCAGCAATGAGAACATCCACAGCATCCACT
TCTCTGGCCATGTGTTCACTGTGAGGAAGAAGGAGGAGTACAAGATGGCCCTGTACAACCTGTA
CCCTGGGGTGTTTGAGACTGTGGAGATGCTGCCCAGCAAGGCTGGCATCTGGAGGGTGGAGTGC CTGATTGGGGAGCACCTGCATGCTGGCATGAGCACCCTGTTCCTGGTGTACAGCAACAAGTGCC
AGACCCCCCTGGGCATGGCCTCTGGCCACATCAGGGACTTCCAGATCACTGCCTCTGGCCAGTAT
GGCCAGTGGGCCCCCAAGCTGGCCAGGCTGCACTACTCTGGCAGCATCAATGCCTGGAGCACCA
AGGAGCCCTTCAGCTGGATCAAGGTGGACCTGCTGGCCCCCATGATCATCCATGGCATCAAGAC
CCAGGGGGCCAGGCAGAAGTTCAGCAGCCTGTACATCAGCCAGTTCATCATCATGTACAGCCTG
GATGGCAAGAAGTGGCAGACCTACAGGGGCAACAGCACTGGCACCCTGATGGTGTTCTTTGGCA
ATGTGGACAGCTCTGGCATCAAGCACAACATCTTCAACCCCCCCATCATTGCCAGATACATCAG
GCTGCACCCCACCCACTACAGCATCAGGAGCACCCTGAGGATGGAGCTGATGGGCTGTGACCTG
AACAGCTGCAGCATGCCCCTGGGCATGGAGAGCAAGGCCATCTCTGATGCCCAGATCACTGCCA
GCAGCTACTTCACCAACATGTTTGCCACCTGGAGCCCCAGCAAGGCCAGGCTGCACCTGCAGGG
CAGGAGCAATGCCTGGAGGCCCCAGGTCAACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAG
AAGACCATGAAGGTGACTGGGGTGACCACCCAGGGGGTGAAGAGCCTGCTGACCAGCATGTATG
TGAAGGAGTTCCTGATCAGCAGCAGCCAGGATGGCCACCAGTGGACCCTGTTCTTCCAGAATGG
CAAGGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCTGTGGTGAACAGCCTGGACCCC
CCCCTGCTGACCAGATACCTGAGGATTCACCCCCAGAGCTGGGTGCACCAGATTGCCCTGAGGA
TGGAGGTGCTGGGCTGTGAGGCCCAGGACCTGTACGACTACAAAGACCATGACGGTGATTATAAA
GATCATGACATCGACTACAAGGATGACGATGACAAGTGAGATATCAAAGCTTTCGAATTCAATAA
AAGATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTTGTGTGCGGCCCAATTGAGGAAC
CCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCA
AAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG
(SEQ ID NO: 35)
F8 N6 pl263_pTIGEM_CMV_hF8-SQ-N6_3xflag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG
ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC
ACCATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTGCCAC
CAGAAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTGATCTCGGTGA
GCTGCCTGTGGACGCAAGATTTCCTCCTAGAGTGCCAAAATCTTTTCCATTCAACACCTCAGTC
GTGTACAAAAAGACTCTGTTTGTAGAATTCACGGATCACCTTTTCAACATCGCTAAGCCAAGGC
CACCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTAC
ACTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTACTGGAAAGCT
TCTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGATAAAGTCTTC
CCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATGGCCTCTGACC
CACTGTGCCTTACCTACTCATATCTTTCTCATGTGGACCTGGTAAAAGACTTGAATTCAGGCCT
CATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACACAGACCTTGCAC
AAATTTATACTACTTTTTGCTGTATTTGATGAAGGGAAAAGTTGGCACTCAGAAACAAAGAAC
TCCTTGATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCTAAAATGCACACAGTCAATG
GTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCTATTGGCATGT
GATTGGAATGGGCACCACTCCTGAAGTGCACTCAATATTCCTCGAAGGTCACACATTTCTTGTG
AGGAACCATCGCCAGGCGTCCTTGGAAATCTCGCCAATAACTTTCCTTACTGCTCAAACACTCT
TGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATGATGGCATGGA
AGCTTATGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAAATAATGAAGA
AGCGGAAGACTATGATGATGATCTTACTGATTCTGAAATGGATGTGGTCAGGTTTGATGATGA
CAACTCTCCTTCCTTTATCCAAATTCGCTCAGTTGCCAAGAAGCATCCTAAAACTTGGGTACAT
TACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCCGATGACAGAA
GTTATAAAAGTCAATATTTGAACAATGGCCCTCAGCGGATTGGTAGGAAGTACAAAAAAGTCC
GATTTATGGCATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCATGAATCAGGAA
TCTTGGGACCTTTACTTTATGGGGAAGTTGGAGACACACTGTTGATTATATTTAAGAATCAAG
CAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCCGTCCTTTGTATTCAAGGAG
ATTACCAAAAGGTGTAAAACATTTGAAGGATTTTCCAATTCTGCCAGGAGAAATATTCAAATA
TAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCCTGACCCGCTAT
TACTCTAGTTTCGTTAATATGGAGAGAGATCTAGCTTCAGGACTCATTGGCCCTCTCCTCATCT
GCTACAAAGAATCTGTAGATCAAAGAGGAAACCAGATAATGTCAGACAAGAGGAATGTCATCC
TGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATACAACGCTTTCTCCC
CAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCATGCACAGCATC
AATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGCATACTGGTACA
TTCTAAGCATTGGAGCACAGACTGACTTCCTTTCTGTCTTCTTCTCTGGATATACCTTCAAACA CAAAATGGTCTATGAAGACACACTCACCCTATTCCCATTCTCAGGAGAAACTGTCTTCATGTCG
ATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAACAGAGGCATGA
CCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAGGACAGTTATGA
AGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGTTTTTCACAGAA
TCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGCAAAAACAG
TTTAATGCAACCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAA
CCCCCATGCCAAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTGCGCCAGTCACCC
ACACCACATGGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACC
CTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGCAGCT
GCATCATTCTGGTGATATGGTATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGAGAAA
CTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCCAGTACCAGCAACA
ACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACTTGG
GCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAAGTCA
TCCCCACTCACCGAAAGCGGTGGACCTTTGTCTCTCTCTGAGGAGAATAATGACTCCAAGCTGC
TTGAGTCAGGGTTGATGAACAGCCAAGAATCCTCATGGGGAAAAAACGTTTCCTCCACCAGGGA
AATAACTCGTACTACTCTTCAGTCAGATCAAGAGGAAATTGACTATGATGATACCATATCAGTT
GAAATGAAGAAGGAAGATTTTGACATTTATGATGAGGATGAAAATCAGAGCCCCCGCAGCTTT
CAAAAGAAAACACGACACTATTTTATTGCTGCAGTGGAGAGGCTCTGGGATTATGGGATGAGT
AGCTCCCCACATGTTCTAAGAAACAGGGCTCAGAGTGGCAGTGTCCCTCAGTTCAAGAAAGTTG
TTTTCCAGGAATTTACTGATGGCTCCTTTACTCAGCCCTTATACCGTGGAGAACTAAATGAACA
TTTGGGACTCCTGGGGCCATATATAAGAGCAGAAGTTGAAGATAATATCATGGTAACTTTCAG
AAATCAGGCCTCTCGTCCCTATTCCTTCTATTCTAGCCTTATTTCTTATGAGGAAGATCAGAGG
CAAGGAGCAGAACCTAGAAAAAACTTTGTCAAGCCTAATGAAACCAAAACTTACTTTTGGAAA
GTGCAACATCATATGGCACCCACTAAAGATGAGTTTGACTGCAAAGCCTGGGCTTATTTCTCTG
ATGTTGACCTGGAAAAAGATGTGCACTCAGGCCTGATTGGACCCCTTCTGGTCTGCCACACTAA
CACACTGAACCCTGCTCATGGGAGACAAGTGACAGTACAGGAATTTGCTCTGTTTTTCACCATC
TTTGATGAGACCAAAAGCTGGTACTTCACTGAAAATATGGAAAGAAACTGCAGGGCTCCCTGC
AATATCCAGATGGAAGATCCCACTTTTAAAGAGAATTATCGCTTCCATGCAATCAATGGCTACA
TAATGGATACACTACCTGGCTTAGTAATGGCTCAGGATCAAAGGATTCGATGGTATCTGCTCAG
CATGGGCAGCAATGAAAACATCCATTCTATTCATTTCAGTGGACATGTGTTCACTGTACGAAAA
AAAGAGGAGTATAAAATGGCACTGTACAATCTCTATCCAGGTGTTTTTGAGACAGTGGAAATG
TTACCATCCAAAGCTGGAATTTGGCGGGTGGAATGCCTTATTGGCGAGCATCTACATGCTGGGA
TGAGCACACTTTTTCTGGTGTACAGCAATAAGTGTCAGACTCCCCTGGGAATGGCTTCTGGACA
CATTAGAGATTTTCAGATTACAGCTTCAGGACAATATGGACAGTGGGCCCCAAAGCTGGCCAGA CTTCATTATTCCGGATCAATCAATGCCTGGAGCACCAAGGAGCCCTTTTCTTGGATCAAGGTGG
ATCTGTTGGCACCAATGATTATTCACGGCATCAAGACCCAGGGTGCCCGTCAGAAGTTCTCCAG
CCTCTACATCTCTCAGTTTATCATCATGTATAGTCTTGATGGGAAGAAGTGGCAGACTTATCGA
GGAAATTCCACTGGAACCTTAATGGTCTTCTTTGGCAATGTGGATTCATCTGGGATAAAACACA
ATATTTTTAACCCTCCAATTATTGCTCGATACATCCGTTTGCACCCAACTCATTATAGCATTCG
CAGCACTCTTCGCATGGAGTTGATGGGCTGTGATTTAAATAGTTGCAGCATGCCATTGGGAATG
GAGAGTAAAGCAATATCAGATGCACAGATTACTGCTTCATCCTACTTTACCAATATGTTTGCCA
CCTGGTCTCCTTCAAAAGCTCGACTTCACCTCCAAGGGAGGAGTAATGCCTGGAGACCTCAGGT
GAATAATCCAAAAGAGTGGCTGCAAGTGGACTTCCAGAAGACAATGAAAGTCACAGGAGTAAC
TACTCAGGGAGTAAAATCTCTGCTTACCAGCATGTATGTGAAGGAGTTCCTCATCTCCAGCAGT
CAAGATGGCCATCAGTGGACTCTCTTTTTTCAGAATGGCAAAGTAAAGGTTTTTCAGGGAAATC
AAGACTCCTTCACACCTGTGGTGAACTCTCTAGACCCACCGTTACTGACTCGCTACCTTCGAAT
TCACCCCCAGAGTTGGGTGCACCAGATTGCCCTGAGGATGGAGGTTCTGGGCTGCGAGGCACAG
GACCTCTACGACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGATGAC
GATGACAAGTGAGATATCAAAGCTTTCGAATTCAATAAAAGATCTTTATTTTCATTAGATCT
GTGTGTTGGTTTTTTGTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGTTGGCCACTCC
CTCTCTGCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTG
CCCGGGCGGCCTCAGTGAGCGAGCGAGCGCGCAG
(SEQ ID NO: 36)
codop F8 N6 pl443_pTIGEM_CMV_codop-hF8-SQ-N6_3xflag_(Synthetic-polyA)
CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGGCGACCTTTGGTCG
CCCGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCC
TGCTAGCGATAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATAT
GGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTG
ACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCA
TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCC
CAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTA
TTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCAC
GGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCA
ACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGT
GTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAGTGAACCGTCAGATCAGCGGCCGCC ACCATGCAGATTGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGAGGTTCTGCTTCTCTGCCAC
CAGGAGATACTACCTGGGGGCTGTGGAGCTGAGCTGGGACTACATGCAGTCTGACCTGGGGGAG
CTGCCTGTGGATGCCAGGTTCCCCCCCAGAGTGCCCAAGAGCTTCCCCTTCAACACCTCTGTGGT
GTACAAGAAGACCCTGTTTGTGGAGTTCACTGACCACCTGTTCAACATTGCCAAGCCCAGGCCC
CCCTGGATGGGCCTGCTGGGCCCCACCATCCAGGCTGAGGTGTATGACACTGTGGTGATCACCC
TGAAGAACATGGCCAGCCACCCTGTGAGCCTGCATGCTGTGGGGGTGAGCTACTGGAAGGCCTC
TGAGGGGGCTGAGTATGATGACCAGACCAGCCAGAGGGAGAAGGAGGATGACAAGGTGTTCCC
TGGGGGCAGCCACACCTATGTGTGGCAGGTGCTGAAGGAGAATGGCCCCATGGCCTCTGACCCC
CTGTGCCTGACCTACAGCTACCTGAGCCATGTGGACCTGGTGAAGGACCTGAACTCTGGCCTGA
TTGGGGCCCTGCTGGTGTGCAGGGAGGGCAGCCTGGCCAAGGAGAAGACCCAGACCCTGCACAA
GTTCATCCTGCTGTTTGCTGTGTTTGATGAGGGCAAGAGCTGGCACTCTGAAACCAAGAACAGC
CTGATGCAGGACAGGGATGCTGCCTCTGCCAGGGCCTGGCCCAAGATGCACACTGTGAATGGCT
ATGTGAACAGGAGCCTGCCTGGCCTGATTGGCTGCCACAGGAAGTCTGTGTACTGGCATGTGAT
TGGCATGGGCACCACCCCTGAGGTGCACAGCATCTTCCTGGAGGGCCACACCTTCCTGGTCAGG
AACCACAGGCAGGCCAGCCTGGAGATCAGCCCCATCACCTTCCTGACTGCCCAGACCCTGCTGAT
GGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCCACCAGCATGATGGCATGGAGGCC
TATGTGAAGGTGGACAGCTGCCCTGAGGAGCCCCAGCTGAGGATGAAGAACAATGAGGAGGCT
GAGGACTATGATGATGACCTGACTGACTCTGAGATGGATGTGGTGAGGTTTGATGATGACAAC
AGCCCCAGCTTCATCCAGATCAGGTCTGTGGCCAAGAAGCACCCCAAGACCTGGGTGCACTACA
TTGCTGCTGAGGAGGAGGACTGGGACTATGCCCCCCTGGTGCTGGCCCCTGATGACAGGAGCTA
CAAGAGCCAGTACCTGAACAATGGCCCCCAGAGGATTGGCAGGAAGTACAAGAAGGTCAGGTTC
ATGGCCTACACTGATGAAACCTTCAAGACCAGGGAGGCCATCCAGCATGAGTCTGGCATCCTGG
GCCCCCTGCTGTATGGGGAGGTGGGGGACACCCTGCTGATCATCTTCAAGAACCAGGCCAGCAG
GCCCTACAACATCTACCCCCATGGCATCACTGATGTGAGGCCCCTGTACAGCAGGAGGCTGCCC
AAGGGGGTGAAGCACCTGAAGGACTTCCCCATCCTGCCTGGGGAGATCTTCAAGTACAAGTGGA
CTGTGACTGTGGAGGATGGCCCCACCAAGTCTGACCCCAGGTGCCTGACCAGATACTACAGCAG
CTTTGTGAACATGGAGAGGGACCTGGCCTCTGGCCTGATTGGCCCCCTGCTGATCTGCTACAAG
GAGTCTGTGGACCAGAGGGGCAACCAGATCATGTCTGACAAGAGGAATGTGATCCTGTTCTCTG
TGTTTGATGAGAACAGGAGCTGGTACCTGACTGAGAACATCCAGAGGTTCCTGCCCAACCCTGC
TGGGGTGCAGCTGGAGGACCCTGAGTTCCAGGCCAGCAACATCATGCACAGCATCAATGGCTAT
GTGTTTGACAGCCTGCAGCTGTCTGTGTGCCTGCATGAGGTGGCCTACTGGTACATCCTGAGCA
TTGGGGCCCAGACTGACTTCCTGTCTGTGTTCTTCTCTGGCTACACCTTCAAGCACAAGATGGT
GTATGAGGACACCCTGACCCTGTTCCCCTTCTCTGGGGAGACTGTGTTCATGAGCATGGAGAAC
CCTGGCCTGTGGATTCTGGGCTGCCACAACTCTGACTTCAGGAACAGGGGCATGACTGCCCTGC TGAAAGTCTCCAGCTGTGACAAGAACACTGGGGACTACTATGAGGACAGCTATGAGGACATCTC
TGCCTACCTGCTGAGCAAGAACAATGCCATTGAGCCCAGGAGTTTTTCACAGAATCCACCTGTA
TTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGCAAAAACAGTTTAATGCAA
CCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAACCCCCATGCC
AAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTGCGCCAGTCACCCACACCACAT
GGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACCCTAGCCCCG
GCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGCAGCTGCATCATTC
TGGTGATATGGTATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGAGAAACTGGGCACG
ACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCCAGTACCAGCAACAACCTTATCA
GCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACTTGGGCCACCCTC
TATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAAGTCATCCCCACTC
ACCGAAAGCGGTGGACCTTTGTCTCTCTCTGAGGAGAATAATGACTCCAAGCTGCTTGAGTCAG
GGTTGATGAACAGCCAAGAATCCTCATGGGGAAAAAACGTTTCCTCCACCAGGGAGATCACCAG
GACCACCCTGCAGTCTGACCAGGAGGAGATTGACTATGATGACACCATCTCTGTGGAGATGAAG
AAGGAGGACTTTGACATCTACGACGAGGACGAGAACCAGAGCCCCAGGAGCTTCCAGAAGAAG
ACCAGGCACTACTTCATTGCTGCTGTGGAGAGGCTGTGGGACTATGGCATGAGCAGCAGCCCCC
ATGTGCTGAGGAACAGGGCCCAGTCTGGCTCTGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGA
GTTCACTGATGGCAGCTTCACCCAGCCCCTGTACAGAGGGGAGCTGAATGAGCACCTGGGCCTG
CTGGGCCCCTACATCAGGGCTGAGGTGGAGGACAACATCATGGTGACCTTCAGGAACCAGGCCA
GCAGGCCCTACAGCTTCTACAGCAGCCTGATCAGCTATGAGGAGGACCAGAGGCAGGGGGCTGA
GCCCAGGAAGAACTTTGTGAAGCCCAATGAAACCAAGACCTACTTCTGGAAGGTGCAGCACCAC
ATGGCCCCCACCAAGGATGAGTTTGACTGCAAGGCCTGGGCCTACTTCTCTGATGTGGACCTGG
AGAAGGATGTGCACTCTGGCCTGATTGGCCCCCTGCTGGTGTGCCACACCAACACCCTGAACCC
TGCCCATGGCAGGCAGGTGACTGTGCAGGAGTTTGCCCTGTTCTTCACCATCTTTGATGAAACC
AAGAGCTGGTACTTCACTGAGAACATGGAGAGGAACTGCAGGGCCCCCTGCAACATCCAGATGG
AGGACCCCACCTTCAAGGAGAACTACAGGTTCCATGCCATCAATGGCTACATCATGGACACCCT
GCCTGGCCTGGTGATGGCCCAGGACCAGAGGATCAGGTGGTACCTGCTGAGCATGGGCAGCAAT
GAGAACATCCACAGCATCCACTTCTCTGGCCATGTGTTCACTGTGAGGAAGAAGGAGGAGTACA
AGATGGCCCTGTACAACCTGTACCCTGGGGTGTTTGAGACTGTGGAGATGCTGCCCAGCAAGGC
TGGCATCTGGAGGGTGGAGTGCCTGATTGGGGAGCACCTGCATGCTGGCATGAGCACCCTGTTC
CTGGTGTACAGCAACAAGTGCCAGACCCCCCTGGGCATGGCCTCTGGCCACATCAGGGACTTCC
AGATCACTGCCTCTGGCCAGTATGGCCAGTGGGCCCCCAAGCTGGCCAGGCTGCACTACTCTGG
CAGCATCAATGCCTGGAGCACCAAGGAGCCCTTCAGCTGGATCAAGGTGGACCTGCTGGCCCCC
ATGATCATCCATGGCATCAAGACCCAGGGGGCCAGGCAGAAGTTCAGCAGCCTGTACATCAGCC AGTTCATCATCATGTACAGCCTGGATGGCAAGAAGTGGCAGACCTACAGGGGCAACAGCACTGG
CACCCTGATGGTGTTCTTTGGCAATGTGGACAGCTCTGGCATCAAGCACAACATCTTCAACCCC
CCCATCATTGCCAGATACATCAGGCTGCACCCCACCCACTACAGCATCAGGAGCACCCTGAGGA
TGGAGCTGATGGGCTGTGACCTGAACAGCTGCAGCATGCCCCTGGGCATGGAGAGCAAGGCCAT
CTCTGATGCCCAGATCACTGCCAGCAGCTACTTCACCAACATGTTTGCCACCTGGAGCCCCAGCA
AGGCCAGGCTGCACCTGCAGGGCAGGAGCAATGCCTGGAGGCCCCAGGTCAACAACCCCAAGGA
GTGGCTGCAGGTGGACTTCCAGAAGACCATGAAGGTGACTGGGGTGACCACCCAGGGGGTGAAG
AGCCTGCTGACCAGCATGTATGTGAAGGAGTTCCTGATCAGCAGCAGCCAGGATGGCCACCAGT
GGACCCTGTTCTTCCAGAATGGCAAGGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCC
TGTGGTGAACAGCCTGGACCCCCCCCTGCTGACCAGATACCTGAGGATTCACCCCCAGAGCTGG
GTGCACCAGATTGCCCTGAGGATGGAGGTGCTGGGCTGTGAGGCCCAGGACCTGTACGACTACA
AAGACCATGACGGTGATTATAAAGATCATGACATCGACTACAAGGATGACGATGACAAGTGAGATA
TCAAAGCTTTCGAATTCAATAAAAGATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTT
GTGTGCGGCCCAATTGAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCT
CGCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGCTTTGCCCGGGCGGCCTCAGTG
AGCGAGCGAGCGCGCAG
(SEQ ID NO: 37)
1) pAAV2.1 HLP Npu DnaE N intein ATP7B
5' AAV2 ITR: underline (seq A at 5' beginning of the sequence)
Additional AAV2 sequences: double underline (seq B)
HLP promoter: bold (seq C)
SV40 misc intron: dashed underline (seq D)
CODON OPTIMIZED HUMAN N-TERMINAL ATP7B UNTIL 1467BP: UPPERCASE UNDERLINE (seq E)
CODON OPTIMIZED N-INTEIN NPU DNAE: BOLD UPPERCASE UNDERLINE (seq F)
CODON OPTIMIZED 3XFLAG: ITALIC UPPERCASE (seq G)
WPRE: italic underline (seq H)
Bgh PolyA: wavy underline (seq I)
Additional AAV2 sequences: underline (seq J)
3' AAV2 ITR: bold (seq K at 3' end of the sequence)
In the following: the SV40 misc intron may not be present. The wild type sequence may be used instead of the CODON OPTIMIZED HUMAN N-TERMINAL ATP7B. The CODON OPTIMIZED 3XFLAG is used as a marker and may not be present. The WPRE may not be present. Bgh PolyA: poly A sequences from other sources are possible.
ctgcgcgctcgctcgctcactgaggccgcccgggcaaagcccgggcgtcgggcgacctttggtcgcccggcctcagtgagcgagc
gagcgcgcagagagggagtggccaactccatcactaggggttccttgtagttaatgattaacccgccatgctacttatctacgtagc
tgctctaggaagatcggaattcgcccttaagtgtttgctgcttgcaatgtttgcccattttagggtggacacaggacgctgtg
gtttctgagccagggggcgactcagatcccagccagtggacttagcccctgtttgctcctccgataactggggtgaccttg
gttaatattcaccagcagcctcccccgttgcccctctggatccactgcttaaatacggacgaggacagggccctgtctcctc
agcttcaggcaccaccactgacctgggacagtgaatctgcagaagttggtcgtgaggcactgggcaggtaagtatcaaggtta
caagacaggtttaaggagaccaatagaaactgggcttgtcgagacagagaagactcttgcgtttctgataggcacctattggtctt
actgacatccactttgcctttctctccacaggtgtccaggcggccggccgcATGCCAGAGCAGGAGAGGCAGATCACC
GCAAGAGAGGGAGCATCCAGGAAGATCCTGTCCAAGCTGTCTCTGCCAACAAGGGCATGGGAGC
CTGCAATGAAGAAGTCTTTCGCCTTTGACAACGTGGGATATGAGGGAGGCCTGGATGGCCTGGG
ACCTAGCTCCCAGGTGGCCACCAGCACAGTGAGAATCCTGGGCATGACCTGCCAGTCTTGCGTG
AAGAGCATCGAGGACAGGATCTCCAATCTGAAGGGCATCATCTCCATGAAGGTGTCTCTGGAGC
AGGGCTCTGCCACAGTGAAGTACGTGCCCAGCGTGGTGTGCCTGCAGCAGGTGTGCCACCAGAT
CGGCGATATGGGCTTCGAGGCATCCATCGCAGAGGGCAAGGCAGCATCTTGGCCATCCAGATCT
CTGCCTGCCCAGGAGGCCGTGGTGAAGCTGAGGGTGGAAGGAATGACCTGCCAGTCCTGCGTGA
GCAGCATCGAGGGCAAGGTGAGAAAGCTGCAGGGCGTGGTGAGGGTGAAGGTGAGCCTGTCCA
ACCAGGAGGCCGTGATCACATACCAGCCATATCTGATCCAGCCCGAGGACCTGCGGGATCACGT
GAATGACATGGGCTTCGAGGCCGCCATCAAGAGCAAGGTGGCACCTCTGTCCCTGGGACCAATC
GATATCGAGCGCCTGCAGTCCACCAACCCTAAGCGGCCACTGTCCTCTGCCAACCAGAACTTCA
ACAATAGCGAGACACTGGGACACCAGGGCTCCCACGTGGTGACACTGCAGCTGCGCATCGACGG
CATGCACTGCAAGAGCTGCGTGCTGAACATCGAGGAGAATATCGGCCAGCTGCTGGGCGTGCAG
AGCATCCAGGTGTCCCTGGAGAACAAGACCGCCCAGGTGAAGTATGATCCCAGCTGCACATCCC
CTGTGGCCCTGCAGAGGGCAATCGAGGCCCTGCCCCCTGGCAATTTCAAGGTGTCTCTGCCAGA
CGGAGCAGAGGGCAGCGGAACCGATCACCGCAGCTCCTCTAGCCACTCTCCTGGCAGCCCACCA
AGGAACCAGGTGCAGGGAACCTGTTCTACCACACTGATCGCCATCGCCGGCATGACATGCGCCT
CTTGCGTGCACAGCATCGAGGGCATGATCAGCCAGCTGGAGGGCGTGCAGCAGATCTCTGTGAG
CCTGGCAGAGGGAACCGCAACAGTGCTGTACAATCCATCCGTGATCTCTCCCGAGGAGCTGAGA
GCCGCCATCGAGGACATGGGCTTTGAGGCCTCCGTGGTGTCCGAGTCTTGCAGCACCAACCCCC
TGGGCAATCACTCCGCCGGCAACTCTATGGTGCAGACCACAGACGGCACCCCAACAAGCGTGCA GGAGGTGGCACCACACACCGGCAGACTGCCTGCCAATCACGCCCCAGATATCCTGGCCAAGAGC
CCTCAGTCCACAAGGGCAGTGGCACCACAGAAGTGCCTGTCCTATGAGACAGAGATCCTGAC
AGTGGAGTACGGCCTGCTGCCTATCGGCAAGATCGTGGAGAAGAGGATCGAGTGTACCGT
GTATAGCGTGGACAACAATGGCAATATCTACACACAGCCAGTGGCACAGTGGCACGACAGG
GGAGAGCAGGAGGTGTTTGAGTATTGTCTGGAGGATGGCAGCCTGATCCGGGCCACCAAG
GATCACAAGTTCATGACAGTGGACGGCCAGATGCTGCCAATCGATGAGATCTTTGAGCGCG
AGCTGGACCTGATGCGGGTGGATAACCTGCCCAATGACTACAAGGACCACGATGGCGACTATA
AGGATCACGACATCGATTACAAGGACGATGACGATAAGTGAggatcagcttggatccaatcaacctctggatta
caaaatttgtgaaagattgactggtattcttaactatgttgctccttttacgctatgtggatacgctgctttaatgcctttgtatcat
Figure imgf000072_0001
actttcgctttccccctccctattgccacggcggaactcatcgccgcctgccttgcccgctgctggacaggggctcggctgttggg
cactgacaattccgtggtgttgtcggggaagctgacgtcctttccatggctgctcgcctgtgttgccacctggattctgcgcggg
acgtccttctgctacgtcccttcggccctcaatccagcggaccttccttcccgcggcctgctgccggctctgcggcctcttccgcgt
cttcgagatctgcctcgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgccttccttgaccctggaaggtgcca
ctcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggca
ggacagcaagggggaggattgggaagacaatagcaggcatgctggggactcgagttaagggcgaattcccgattaggatcttcc
tagagcatggctacgtagataagtagcatggcgggttaatcattaactacaaggaacccctagtgatggagttggccactccct
ctctgcgcgctcgctcgctcactgaggccgggcgaccaaaggtcgcccgacgcccgggctttgcccgggcggcctcagtg
agcgagcgagcgcgcag
(SEQ ID NO: 38)
2) pAAV2.1 HLP Npu DnaE C intein ATP7B
5' AAV2 ITR: underline (seq A at 5' beginning of the sequence)
Additional AAV2 sequences: double underline (seq B)
HLP promoter: bold (seq C)
SV40 misc intron: dashed underline (seq D)
CODON OPTIMIZED C-INTEIN NPU DNAE: BOLD UPPERCASE (seq E)
CODON OPTIMIZED HUMAN C-TERMINAL ATP7B FROM 1468BP TO 3’: UPPERCASE UNDERLINE (seq F)
CODON OPTIMIZED 3XFLAG: ITALIC UPPERCASE (seq G)
WPRE: underline (seq H)
Bgh PolyA: thick underline (seq I)
Additional AAV2 sequences: italic (seq J)
3’ AAV2 ITR: bold (seq K at 3' end of the sequence)
ctgcgcgctcgctcgctcactgaggccgcccgggcaaagcccgggcgtcgggcgacctttggtcgcccggcctcagtgagcgagc
gagcgcgcagagagggagtggccaactccatcactaggggttccttgtagttaatgattaacccgccatgctacttatctacgtagc
tgctctaggaagatcggaattcgcccttaagtgtttgctgcttgcaatgtttgcccattttagggtggacacaggacgctgtg
gtttctgagccagggggcgactcagatcccagccagtggacttagcccctgtttgctcctccgataactggggtgaccttg
gttaatattcaccagcagcctcccccgttgcccctctggatccactgcttaaatacggacgaggacagggccctgtctcctc
agcttcaggcaccaccactgacctgggacagtgaatctgcagaagttggtcgtgaggcactgggcaggtaagtatcaaggtta
caagacaggtttaaggagaccaatagaaactgggcttgtcgagacagagaagactcttgcgtttctgataggcacctattggtctt
actgacatccactttgcctttctctccacaggtgtccaggcggccggccgcATGATCAAGATCGCCACACGGAAGT
ACCTGGGCAAGCAGAACGTGTATGATATCGGCGTGGAGCGGGACCACAACTTCGCCCTGAA
GAATGGCTTTATCGCCAGCAATTGCTTCCTGCAGATCAAGGGCATGACCTGCGCCTCCTGCGT
GAGCAACATCGAGAGGAATCTGCAGAAGGAGGCAGGCGTGCTGTCCGTGCTGGTGGCCCTGATG
GCAGGCAAGGCCGAGATCAAGTACGATCCTGAAGTGATCCAGCCACTGGAGATCGCCCAGTTTA
TCCAGGACCTGGGCTTCGAGGCCGCCGTGATGGAGGATTATGCCGGCAGCGACGGCAACATCGA
GCTGACCATCACAGGCATGACCTGCGCCTCTTGCGTGCACAACATCGAGAGCAAGCTGACCCGC
ACAAATGGCATCACATACGCATCTGTGGCCCTGGCCACCAGCAAGGCCCTGGTGAAGTTTGATC
CCGAGATCATCGGCCCTCGGGACATCATCAAGATCATCGAGGAGATCGGCTTCCACGCCAGCCT
GGCCCAGAGAAACCCCAATGCCCACCACCTGGATCACAAGATGGAGATCAAGCAGTGGAAGAAG
AGCTTTCTGTGCTCCCTGGTGTTCGGCATCCCTGTGATGGCCCTGATGATCTACATGCTGATCCC
TTCCAACGAGCCACACCAGTCTATGGTGCTGGACCACAACATCATCCCAGGCCTGTCCATCCTG
AATCTGATCTTCTTTATCCTGTGCACATTTGTGCAGCTGCTGGGCGGCTGGTACTTCTATGTGC
AGGCCTATAAGAGCCTGCGGCACAGATCCGCCAATATGGATGTGCTGATCGTGCTGGCCACCAG
CATCGCCTACGTGTATTCCCTGGTCATCCTGGTGGTGGCAGTGGCAGAGAAGGCAGAGCGGAGC
CCCGTGACCTTCTTTGACACACCCCCTATGCTGTTCGTGTTTATCGCCCTGGGCAGATGGCTGGA
GCACCTGGCCAAGAGCAAGACCTCCGAGGCCCTGGCCAAGCTGATGAGCCTGCAGGCCACAGAG
GCCACCGTGGTGACACTGGGCGAGGATAACCTGATCATCAGGGAGGAGCAGGTGCCAATGGAGC
TGGTGCAGCGCGGCGACATCGTGAAGGTGGTGCCAGGCGGCAAGTTTCCCGTGGATGGCAAGGT
GCTGGAGGGCAATACAATGGCAGACGAGTCCCTGATCACCGGAGAGGCCATGCCTGTGACCAAG
AAGCCAGGCTCTACAGTGATCGCAGGCAGCATCAACGCACACGGCTCCGTGCTGATCAAGGCCA
CACACGTGGGCAATGATACCACACTGGCCCAGATCGTGAAGCTGGTGGAGGAGGCCCAGATGAG
CAAGGCACCAATCCAGCAGCTGGCAGACCGGTTTTCTGGCTACTTCGTGCCTTTTATCATCATC ATGAGCACCCTGACACTGGTGGTGTGGATCGTGATCGGCTTCATCGACTTTGGCGTGGTGCAGA
GGTATTTCCCAAACCCCAATAAGCACATCTCCCAGACCGAAGTGATCATCCGCTTCGCCTTTCA
GACCTCCATCACCGTGCTGTGCATCGCCTGCCCTTGTTCTCTGGGCCTGGCCACCCCAACAGCCG
TGATGGTGGGAACAGGAGTGGCAGCACAGAACGGCATCCTGATCAAGGGCGGCAAGCCCCTGGA
GATGGCCCACAAGATCAAGACCGTGATGTTCGATAAGACCGGCACAATCACCCACGGCGTGCCA
AGAGTGATGAGAGTGCTGCTGCTGGGCGACGTGGCCACACTGCCACTGAGAAAGGTGCTGGCAG
TGGTGGGAACCGCAGAGGCCAGCTCCGAGCACCCCCTGGGCGTGGCCGTGACAAAGTACTGCAA
GGAGGAGCTGGGCACAGAGACACTGGGCTATTGTACCGACTTTCAGGCAGTGCCTGGATGCGGA
ATCGGCTGTAAGGTGTCCAACGTGGAGGGCATCCTGGCACACTCTGAGCGGCCCCTGTCTGCCC
CTGCAAGCCACCTGAATGAGGCAGGCAGCCTGCCAGCAGAGAAGGATGCAGTGCCTCAGACATT
CTCCGTGCTGATCGGCAACAGAGAGTGGCTGCGGAGAAATGGCCTGACCATCTCTAGCGACGTG
AGCGACGCCATGACAGACCACGAGATGAAGGGCCAGACCGCCATCCTGGTGGCCATCGATGGCG
TGCTGTGCGGCATGATCGCCATCGCAGACGCAGTGAAGCAGGAGGCCGCCCTGGCAGTGCACAC
CCTGCAGTCTATGGGCGTGGATGTGGTGCTGATCACCGGCGACAACAGGAAGACAGCAAGGGCA
ATCGCAACCCAAGTGGGCATCAATAAGGTGTTTGCCGAGGTGCTGCCATCCCACAAGGTGGCCA
AGGTGCAGGAGCTGCAGAACAAGGGCAAGAAGGTGGCCATGGTGGGCGATGGCGTGAATGACT
CTCCCGCCCTGGCACAGGCAGATATGGGAGTGGCAATCGGCACAGGAACCGATGTGGCAATCGA
GGCAGCAGACGTGGTGCTGATCCGGAACGATCTGCTGGACGTGGTGGCCTCCATCCACCTGTCT
AAGCGGACCGTGAGGCGCATCAGAATCAACCTGGTGCTGGCCCTGATCTACAATCTGGTGGGCA
TCCCTATCGCAGCAGGCGTGTTCATGCCAATCGGCATCGTGCTGCAGCCATGGATGGGCAGCGC
CGCAATGGCAGCATCCAGCGTGAGCGTGGTGCTGAGCTCCCTGCAGCTGAAGTGTTACAAGAAG
CCTGACCTGGAGAGGTATGAGGCCCAGGCCCACGGCCACATGAAGCCACTGACCGCCTCTCAGG
TGAGCGTGCACATCGGCATGGACGATAGGTGGAGGGATAGCCCAAGGGCAACACCATGGGACCA
GGTGTCCTACGTGTCTCAGGTGAGCCTGTCTAGCCTGACCTCTGATAAGCCATCCAGGCACAGC
GCCGCCGCCGACGATGACGGCGACAAGTGGAGCCTGCTGCTGAATGGCCGCGATGAGGAGCAGT
ACATCGACTATAAGGATCACGACGGCGATTACAAGGACCACGATATCGACTATAAGGATGACGATG
ACAAGTGAgagcttggatccaatcaacctctggattacaaaatttgtgaaagattgactggtattcttaactatgttgctcctttta
cgctatgtggatacgctgctttaatgcctttgtatcatgctattgcttcccgtatggctttcattttctcctccttgtataaatcctggttg
ctgtctctttatgaggagttgtggcccgttgtcaggcaacgtggcgtggtgtgcactgtgtttgctgacgcaacccccactggttggg
gcattgccaccacctgtcagctcctttccgggactttcgctttccccctccctattgccacggcggaactcatcgccgcctgccttgcc
cgctgctggacaggggctcggctgttgggcactgacaattccgtggtgttgtcggggaagctgacgtcctttccatggctgctcgcc
tgtgttgccacctggattctgcgcgggacgtccttctgctacgtcccttcggccctcaatccagcggaccttccttcccgcggcctgct
gccggctctgcggcctcttccgcgtcttcgagatctgcctcgactgtgccttctagttgccagccatctgttgtttgcccctcccccgtg
ccttccttgaccctggaaggtgccactcccactgtcctttcctaataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattc
ggcgaattcccgattaggatcttcctagagcatggctacgtagataagtagcatggcgggttaatcattaactacaaggaaccc
ctagtgatggagttggccactccctctctgcgcgctcgctcgctcactgaggccgggcgaccaaaggtcgcccgacgccc
gggctttgcccgggcggcctcagtgagcgagcgagcgcgcag
(SEQ ID NO: 39)
EXAMPLES
Example 1: N6 is the most active F8 variant
To determine which F8 variant was more biologically active, the inventors compared the following F8 coding sequence (CDS) versions (Fig. 1A): wild type F8 and a modified version of B-domain-deleted (BDD) F8-N6 (N6). In N6, amino acids 740 to 1649 (the B domain) of the WT F8 protein were deleted.
In place of the B domain, the N6 construct carries a codon-optimised linker designed to promote more efficient F8 secretion by mimicking some of the post-translational modifications that normally occur, as described in Miao, H. Z. et al. Bioengineering of coagulation factor VIII for improved secretion. Blood (2004).
N6 was subsequently modified by Ward et al. (Ward, N. J. et al. Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood (2011)) to encode the N6 human B domain spacer involving 6 N-linked glycosylation sites inserted into a modified SQ activation peptide (SQm), leaving 11 amino acids from the SQm on the N-terminal side and 3 amino acids on the C-terminal side of the spacer. The inventors further modified the N6 variant by deleting the 3 C-terminal amino acids and further codon-optimising the coding sequence using the online software: https://cool.syncti.org/.
Both variants were cloned into an AAV backbone plasmid under control of the CMV promoter, including a short synthetic polyadenylation signal and a triple flag tag (3xflag) to allow for easy detection of the proteins. The constructs were tested by transient transfection into the human embryonic kidney cell line 293 (HEK293). Western blot (WB) analysis of the cell lysates 72 hours post transfection (hpt) revealed bands of the expected size (Fig. 1B). To detect the biological activity of each variant, cells were cultured for 12 hours following transfection, after which they were kept in serum-free medium until 72 hpt, when F8 activity was measured by chromogenic assay. Both variants produced detectable F8 activity (Fig. 1C). Despite the variability of the assay, there was a significant difference in potency between the constructs, as determined by the Kruskal-Wallis rank sum test (P < 0.001). Full-length F8 had a fairly low mean activity of 8.4 IU/dl, compared to a mean of 71.6 IU/dl for N6.
As the N6 variant was shown to be more active than wild type F8 but is still too large to be properly packaged into AAV virions, split inteins were next used to divide the CDS between two AAV intein vectors that could subsequently be evaluated as an alternative to traditional single AAV replacement gene therapy.
Example 2: N6 is efficiently reconstituted by AAV intein-mediated protein trans-splicing in vitro
To test the efficiency of intein-mediated protein trans-splicing for N6, the large CDS was split into two fragments, i.e. the 5' and 3' halves, fused respectively to the N- and C-terminal halves of the DnaE split intein from Nostoc punctiforme (Npu) (Iwai, H., Zuger, S., Jin, J. & Tam, P. H. Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme. FEBS Lett. (2006)). The two CDS halves were cloned into two separate AAV plasmids which included the same regulatory elements as in Figure 1 together with a 3xflag to detect both F8 halves as well as the full-length protein and excised intein (Fig. 2A). Two different splitting points were selected exclusively within the N6 linker, aiming to preserve the integrity of the other more critical protein domains. Further split considerations were taken into account based on the intrinsic amino acid residue requirements for efficient protein trans-splicing with the Npu intein. In particular, the main prerequisite is the presence of an amino acid containing either a thiol or hydroxyl group (Cys, Ser or Thr) as the first residue in the 3' half of the coding sequence (Shah, N. H., et al., Extein residues play an intimate role in the rate-limiting step of protein trans-splicing. J. Am. Chem. Soc. (2013); Cheriyan, M., et al., Traceless splicing enabled by substrate-induced activation of the Nostoc punctiforme Npu DnaE intein after mutation of a catalytic cysteine to serine. J. Mol. Biol. (2014)).
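The residue requirement described above can be illustrated with a short sketch. It scans a chosen region of a protein sequence (for example, a linker) for positions whose residue is Cys, Ser or Thr and could therefore start the 3' half. The function names and the toy sequence are hypothetical and are not taken from the N6 sequence or from the inventors' workflow; only the Cys/Ser/Thr rule comes from the text.

```python
# Residues carrying a thiol (Cys) or hydroxyl (Ser, Thr) side chain,
# i.e. the residues the text names as acceptable first residues of the 3' half.
NUCLEOPHILIC = {"C", "S", "T"}


def candidate_split_points(protein, region_start, region_end):
    """Return (position, residue) pairs, 1-based, inside the inclusive region
    whose residue could be the first residue of the C-terminal (3') half."""
    hits = []
    for pos in range(region_start, region_end + 1):
        residue = protein[pos - 1]  # 1-based position -> 0-based index
        if residue in NUCLEOPHILIC:
            hits.append((pos, residue))
    return hits


def split_at(protein, pos):
    """Split so that the residue at 1-based `pos` starts the C-terminal half."""
    return protein[:pos - 1], protein[pos - 1:]


# Hypothetical toy sequence for illustration only (NOT the N6 protein).
toy = "MKLAGSPPNTEQCVA"
candidates = candidate_split_points(toy, 5, 13)
n_half, c_half = split_at(toy, candidates[0][0])
print(candidates)
print(n_half, c_half)
```

In this sketch the restriction to a region mirrors the choice to split only within the N6 linker; any further ranking of candidates (for example by neighbouring extein residues) is left out.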
Two alternative intein sets were designed: Set 1, where the split point is at Ser962, and Set 2, where the split point is at Ser883 (both counting the signal peptide). Both sets were tested and compared to the single N6 plasmid from Figure 1 by transfection into HEK293 cells. Seventy-two hours post transfection, cell lysates and medium were harvested, and F8 expression was evaluated by WB (Fig. 2B). Both full-length N6 protein of the expected size (~200 kDa) and the excised DnaE intein (~18 kDa) were detected. The activity levels of the secreted F8 in the medium were found to be similar to the levels of the single N6 plasmid, with a mean of ~100 IU/dl for both intein sets (Fig. 2C). The single halves alone exhibited little to no activity.
Example 3: Codon optimisation of N6 AAV intein achieves supraphysiological levels of F8 activity in vitro
To further improve the efficiency of the N6 AAV intein, we first codon optimised the CDS of set 1 using the previously reported codon optimised (codop) nucleotide sequence for domains A1, A2, A3, C1 and C2. A 5-fold increase in protein expression and secretion was observed by WB compared to the non-codop intein set (Fig. 3A-C). Moreover, cells expressing the codop N6 set produced more robust mean levels of F8 activity of 180 IU/dl compared to the equivalent non-codop N6 set (101 IU/dl) as well as the N6 single plasmid (97 IU/dl), as assessed by chromogenic assay (Fig. 3D).
To determine the efficiency of codop N6 AAV intein set 1 in the liver, we exchanged the ubiquitous CMV promoter for the smaller hybrid liver promoter (HLP) of 251 base pairs and produced the vectors as AAV2/8. This greatly reduced the size of the expression cassettes to 3-3.9 kb, making them easily packageable into AAV virions. The N6 AAV intein vectors are under in vivo evaluation in the HemA mouse model B6;129S-F8tm1Kaz. Adult mice (6 to 11 weeks of age) were injected systemically at a dose of 5 x 10^11 GC of each vector per animal. Blood plasma samples are to be collected every 4 weeks for a total of 16 weeks, and F8 activity levels will be monitored using aPTT (activated partial thromboplastin time) and functional chromogenic assay.
N6 AAV intein set 2 will also be codon optimised and adapted for in vivo gene therapy by exchanging the promoter.
Example 4: Wilson disease
The present inventors designed a dual-AAV vector strategy using split intein technology for gene therapy of Wilson disease. The intein of DnaE from Nostoc punctiforme (Npu) (Zettler J, et al. The naturally split Npu DnaE intein exhibits an extraordinarily high rate in the protein trans-splicing reaction. FEBS Lett 2009;583:909-914; Iwai H, et al. Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme. FEBS Lett 2006;580:1853-1858), which recognizes a specific tripeptide present in the original human ATP7B CDS (NM_000053), was selected. Two constructs were made: one bearing the 5' half of a codon optimized human ATP7B cDNA followed by the N-terminal Npu DnaE intein (pAAV2.1 HLP Npu DnaE N intein ATP7B, Fig. 4A), and one bearing the C-terminal Npu DnaE intein followed by the 3' half of the codon optimized human ATP7B cDNA (pAAV2.1 HLP Npu DnaE C intein ATP7B, Fig. 4B). A 3XFLAG tag was placed at the 3' end of both constructs as a marker. Transgene expression was driven by the hybrid liver promoter (HLP) (McIntosh J, Lenting PJ, Rosales C, Lee D, Rabbanian S, Raj D, Patel N, et al. Therapeutic levels of FVIII following a single peripheral vein administration of rAAV vector encoding a novel human factor VIII variant. Blood 2013;121:3335-3344).
To test the efficacy of reconstitution of the full-length ATP7B, intein-ATP7B constructs were co-transfected in the human hepatoma cell line HepG2 knocked-out for ATP7B (Chandhok G, Schmitt N, Sauer V, Aggarwal A, Bhatt M, Schmidt HH. The effect of zinc and D-penicillamine in a stable human hepatoma ATP7B knockout cell line. PLoS One 2014;9:e98809). Cells were also transfected with a plasmid expressing GFP as negative control. Western blot analysis using anti-FLAG antibody showed bands at the expected sizes for full-length ATP7B protein and for excised inteins (17 kDa) (Fig. 5). Unspliced N- and C-intein-ATP7B proteins (56 and 110 kDa, respectively) were also detected (Fig. 5).
Next, the intein-ATP7B constructs were used to generate two AAV2/8 vectors, AAV2/8 HLP 5'ATP7B+N-intein and AAV2/8 HLP C-intein-3'ATP7B. These vectors were co-injected intravenously in C57BL/6 mice at a dose of 5 x 10^12 GC/kg. An AAV2/8 expressing GFP was injected as negative control at a dose of 1 x 10^13 GC/kg. Western blot analysis using anti-FLAG antibody on livers harvested two weeks post-injection showed reconstituted full-length ATP7B, together with excised inteins and unspliced intein-ATP7B halves (Fig. 6). These data demonstrate the ability of split-intein technology to drive the reconstitution of full-length human ATP7B and pave the way for the development of a gene therapy approach for Wilson disease using split-inteins.
Example 5: Gene therapy for preventing disease progression in Wilson disease
Materials and methods
Animal studies
AAV vectors were produced and titered by the TIGEM AAV Vector Core as previously described (Maddalena A, et al. High-Throughput Screening Identifies Kinase Inhibitors That Increase Dual Adeno-Associated Viral Vector Transduction In Vitro and in Mouse Retina. Hum Gene Ther 2018;29:886-901). Male 6-7-week-old C57BL/6 (Charles River Laboratories) and Atp7b-/- (Buiakova OI, et al. Null mutation of the murine ATP7B (Wilson disease) gene results in intracellular copper accumulation and late-onset hepatic nodular transformation. Hum Mol Genet 1999 Sep;8(9):1665-71) mice received an intravenous injection of 200 µl of vector solution in 0.9% NaCl. At sacrifice, animals were perfused with PBS and livers were harvested and lysed in RIPA buffer using a TissueLyser (QIAGEN). Serum ALT/AST levels were measured with a scil VitroVet analyzer (Scil vet).
Liver staining
Hematoxylin and eosin staining was performed on 5-µm liver sections. Briefly, sections were rehydrated and stained in Harris hematoxylin solution for 4 minutes. After two washes in tap water for 5 minutes, sections were incubated in a solution of 0.1% ammonia water (1 ml ammonium hydroxide in 1 L distilled water) for 1 minute, washed again in tap water for 5 minutes and counterstained in eosin Y solution for 30 seconds. Finally, sections were dehydrated, cleared in xylene, and mounted in a resinous medium.
Sirius Red staining was performed on 5-µm liver sections which were rehydrated and stained for 1 hour in picrosirius red solution (0.1% Sirius red in saturated aqueous solution of picric acid). After two changes of acidified water (0.5% acetic acid in water), sections were dehydrated, cleared in xylene, and mounted in a resinous medium. Images were captured with an Axio Scan.Z1 microscope (Zeiss) and analyzed with ImageJ for quantification of Sirius Red positive area. Five images for each mouse were analyzed.
For immunohistochemistry, 5-µm thick sections were rehydrated and permeabilized in PBS/0.2% Triton (Sigma) for 20 minutes. Antigen unmasking was performed in 0.01 M citrate buffer in a microwave oven. Next, sections underwent blocking of endogenous peroxidase activity in methanol/1.5% H2O2 (Sigma) for 30 minutes and incubation with blocking solution [3% BSA (Sigma), 5% donkey serum (Millipore), 1.5% horse serum (Vector Laboratories), 20 mM MgCl2, 0.3% Triton (Sigma) in PBS] for 1 hour. Sections were incubated with primary antibody overnight at 4°C and with universal biotinylated horse anti-mouse/rabbit IgG secondary antibody (Vector Laboratories) for 1 hour. Biotin/avidin-HRP signal amplification was achieved using the ABC Elite Kit (Vector Laboratories) according to the manufacturer's instructions. 3,3'-diaminobenzidine (DAB, Vector Laboratories) was used as peroxidase substrate. Mayer's hematoxylin (Bio-Optica) was used as counter-staining. Sections were dehydrated and mounted in Vectashield (Vector Laboratories). Image capture was performed using an Axio Scan.Z1 microscope (Zeiss).
Results
The Atp7b-/- mouse recapitulates several features of Wilson disease and has been extensively used to study disease pathogenesis and test novel therapies. In this mouse, hepatic copper accumulation starts in the first weeks of life and the liver develops steatosis and mild inflammation by 6 weeks of age, progressing to hepatitis, necroinflammation and fibrosis by 18-20 weeks of age (Huster et al., Consequences of copper accumulation in the livers of the Atp7b-/- (Wilson disease gene) knockout mice. Am J Pathol 2006; 168, 423-434). We injected AAV2/8 HLP 5'ATP7B+N-intein and AAV2/8 HLP C-intein-3'ATP7B (AAV-int-ATP7B) or AAV2/8 TBG GFP (AAV-GFP) vector at a total dose of 2 x 10^13 GC/kg in Atp7b-/- mice at 6-7 weeks of age and sacrificed them at 12 weeks post-injection. Western blot analysis using anti-FLAG antibody on livers showed reconstituted full-length ATP7B, together with excised inteins and unspliced intein-ATP7B halves, in mice injected with AAV-int-ATP7B vectors (Fig. 7A). Consistently, in these mice immunohistochemistry using anti-ATP7B antibody showed ATP7B expression by hepatocytes, while no signal was detected in sections from AAV-GFP-treated mice (Fig. 7B). As previously described, liver histological analysis revealed necro-inflammation and fibrosis in Atp7b-/- mice treated with AAV-GFP vector (Fig. 8). Conversely, livers from AAV-int-ATP7B injected animals showed no sign of liver pathology and were similar to those of Atp7b+/- healthy controls (Fig. 8). Moreover, circulating alanine and aspartate transaminase (ALT and AST) levels progressively increased in Atp7b-/- mice treated with AAV-GFP compared to Atp7b+/- mice, while no significant differences were found in AAV-int-ATP7B-treated animals compared to Atp7b+/- mice (Fig. 9A,B). Taken together, these findings demonstrate the efficacy of intein-mediated gene therapy in preventing disease progression in a Wilson disease mouse model.
Example 6: AAV intein-mediated F8 trans-splicing corrects mouse hemophilia A without eliciting anti-F8 antibodies
RESULTS
N6 is the most active F8 variant
To determine the most biologically active F8 variant, we compared the wild type F8 coding sequence (CDS) to 3 commonly used B-domain-deleted (BDD, which lack F8 amino acids from 740 to 1649) versions (Fig. 10A and Supplemental Fig. 1 for exact amino acid differences). Specifically, the 3 BDD constructs carry different codon-optimised linkers in the place of the B domain which are designed to promote efficient F8 secretion by mimicking some of the natural F8 post-translational modifications: N6 (F8-N6) contains 11 amino acids from the modified SQ activation peptide (SQm) from Ward et al. (N. J. Ward, S. M. K. Buckley, S. N. Waddington, T. VandenDriessche, M. K. L. Chuah, A. C. Nathwani, J. McIntosh, E. G. D. Tuddenham, C. Kinnon, A. J. Thrasher, J. H. McVey, Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood (2011)), followed by the N6 human B domain spacer involving 6 N-linked glycosylation sites (H. Z. Miao, N. Sirachainan, L. Palmer, P. Kucab, M. A. Cunningham, R. J. Kaufman, S. W. Pipe, Bioengineering of coagulation factor VIII for improved secretion. Blood (2004)); SQ (F8-SQ) contains the original SQ linker described by Sandberg et al. (H. Sandberg, A. Almstedt, J. Brandt, E. Gray, L. Holmquist, U. Oswaldsson, S. Sebring, M. Mikaelsson, Structural and functional characteristics of the B-domain-deleted recombinant factor VIII protein, r-VIII SQ. Thromb. Haemost. (2001)). This variant is available for clinical use as a replacement recombinant F8 product (ReFacto, Wyeth Pharma), and is also under investigation in more than one AAV gene therapy clinical trial. The V3 variant (F8-V3) consists of a small 17-aa peptide, which contains the original 6 N-linked glycosylation triplets from the N6 inserted into and flanked by the SQ linker (J. McIntosh, P. J. Lenting, C. Rosales, D. Lee, S. Rabbanian, D. Raj, N. Patel, E. G. D. Tuddenham, O. D. Christophe, J. H. McVey, S. Waddington, A. W. Nienhuis, J. T. Gray, P. Fagone, F. Mingozzi, S. Z. Zhou, K. A. High, M. Cancio, C. Y. C. Ng, J. Zhou, C. L. Morton, A. M. Davidoff, A. C. Nathwani, Therapeutic levels of FVIII following a single peripheral vein administration of rAAV vector encoding a novel human factor VIII variant. Blood (2013)). This variant has been previously described as another small version of BDD F8, able to achieve high levels of F8 activity in mice and non-human primates as well as in human subjects.
All four variants were independently cloned into an AAV backbone plasmid under the control of the CMV promoter, including both a short synthetic polyadenylation signal (N. Levitt, D. Briggs, A. Gil, N. J. Proudfoot, Definition of an efficient synthetic poly(A) site. Genes Dev. (1989)) and a triple flag tag (3xflag) to allow for easy detection of the proteins. The constructs were tested by transient transfection in the human embryonic kidney cell line 293 (HEK293). Western blot (WB) analysis of the cell lysates 72 hours post-transfection (hpt) revealed bands of the expected size (Fig. 10B). To detect the biological activity of each variant, cells were cultured for 12 hours following transfection, after which they were kept in serum-free medium until 72 hpt, when F8 activity was measured by chromogenic assay. All variants produced detectable F8 activity (Fig. 10C). There was a significant difference in potency between the wild type F8 and both the N6 and V3 constructs, as determined by the Kruskal-Wallis rank sum test; this difference was more significant for the N6 (**P < 0.001) than for the V3 variant (*P < 0.05). Wild type F8 had a fairly low mean activity of 8.4 International Units per deciliter (IU/dl), followed by SQ with 27.4 IU/dl, V3 with 57.6 IU/dl, and N6 with the highest mean level of 71.6 IU/dl. Although no significant difference was determined between the N6, SQ and V3 variants, due to the variability of the assay, we found that the N6 variant was the best performing (Fig. 10C). Since the N6 variant cannot be properly packaged into a single AAV virion, we used split inteins to divide the F8-N6 CDS into two AAV intein vectors that were evaluated in comparison to one of the traditional single AAV replacement gene therapies which is under clinical investigation (NCT03001830).
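As a quick worked check of the ranking reported above, the mean activities can be expressed as fold differences over wild type F8. The sketch below uses only the four mean values stated in the text; the function and variable names are illustrative, and the ratios say nothing about statistical significance, which the text assesses separately with the Kruskal-Wallis test.

```python
# Mean F8 activity (IU/dl) per variant, as reported in the text.
mean_activity_iu_dl = {"WT": 8.4, "SQ": 27.4, "V3": 57.6, "N6": 71.6}


def fold_over_reference(means, reference="WT"):
    """Return each variant's mean activity divided by the reference mean."""
    ref = means[reference]
    return {name: value / ref for name, value in means.items()}


fold = fold_over_reference(mean_activity_iu_dl)
best = max(fold, key=fold.get)

# Print variants from lowest to highest fold difference.
for name, ratio in sorted(fold.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.1f}-fold of WT")
```

On these means, N6 comes out at roughly 8.5-fold of wild type, which is consistent with it being described as the best performing variant.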
AAV-intein-mediated protein trans-splicing efficiently reconstitutes N6 in vitro
To test the efficiency of intein-mediated N6 protein trans-splicing, we split the large CDS into two fragments, i.e. the 5' and 3' half, fused respectively to the N- and C-terminal halves of the DnaE split inteins from Nostoc punctiforme (Npu) (H. Iwai, S. Zuger, J. Jin, P. H. Tam, Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme. FEBS Lett. (2006)). The split CDS were cloned into two separate AAV plasmids which included the same regulatory elements as above, together with a 3xflag to detect both N6 halves as well as the full-length protein and excised intein (Fig. 11A). The splitting point was selected within the B domain, which is known to be dispensable for F8 expression and procoagulant activity, thus aiming to preserve the integrity of the other more critical protein domains. To optimise the chosen splitting position, the intrinsic amino acid residue requirements for efficient protein trans-splicing with the Npu intein were also considered. Specifically, the main prerequisite is the presence of an amino acid containing either a thiol or hydroxyl group (Cys, Ser or Thr) as the first residue in the 3' half of the coding sequence. The intein set was designed within the N6 linker (Ser962, considering the signal peptide) of the N6 variant. Moreover, to assess whether F8 activity in the medium of transfected cells was specifically due to the reconstitution of the full-length N6 after PTS, a set of N6 flanked by heterologous split inteins was also designed. In this set, the N-terminus of the 5' half of N6 was fused to N-intein DnaB from Rhodothermus marinus (Rma) while the C-terminus of the 3' half was fused to the C-intein DnaE. Both intein sets were tested by transient transfection into HEK293 cells. At 72 hpt, cell lysates and medium were harvested, and N6 expression was evaluated by WB (Fig. 11B-C).
Both full-length N6 protein of the expected size (~190 kDa in cell lysate and ~170 kDa in the medium) and the excised DnaE intein (~18 kDa) were detected only when the Npu intein set was used. The activity levels of the secreted F8 in the medium were found to be ~60 IU/dl on average, whereas the single halves as well as the heterologous intein set exhibited little to no activity (Fig. 11D).
N6 codon optimisation increases F8 activity levels in vitro
To further improve the efficiency of the N6 intein, we codon optimised the N6 CDS (Codop N6), as this has been previously reported to improve F8 levels. A 4-fold increase in Codop-N6 protein expression and secretion was observed by WB compared to the non-codon-optimised N6 intein (Fig. 12A-B). Moreover, cells expressing Codop N6 had higher F8 activity levels (~200 IU/dl) than the corresponding non-codop N6 (~70 IU/dl) as assessed by chromogenic assay (Fig. 12C). In addition, to demonstrate that PTS results in precise Codop-N6 reconstitution, we transfected HEK293 cells with the AAV-Codop N6 intein plasmids and immunopurified the resultant full-length N6 protein. Liquid chromatography-mass spectrometry (LC-MS) analysis showed reconstituted Codop N6 peptides with sequences at the splitting point which were identical to full-length Codop N6 encoded by a single plasmid. This was confirmed across 5 out of 6 independent experiments and a total number of 211 individual peptides (Fig. 12D).
Systemic administration of AAV-N6 intein results in therapeutic levels of F8 in HemA mice
To determine the efficiency of liver gene therapy following systemic administration of either AAV-N6 intein (N6 intein), Codop AAV-N6 intein (CodopN6 intein) or the highly active, single AAV-Codon optimised F8-V3 (CodopV3) (A. C. Nathwani, E. Tuddenham, P. Chowdary, J. McIntosh, D. Lee, C. Rosales, M. Phillips, J. Pie, Z. Junfang, M. M. Meagher, U. Reiss, A. M. Davidoff, C. L. Morton, A. Riddell, GO-8: Preliminary Results of a Phase I/II Dose Escalation Trial of Gene Therapy for Haemophilia a Using a Novel Human Factor VIII Variant. Blood (2018)) used as a standard, we generated AAV8 vectors, which efficiently target the liver, in combination with the hybrid liver promoter (HLP). The resulting size of both the N6 intein and Codop N6 intein AAV genomes (3 kb for the 5' half and 3.9 kb for the 3' half) fell well within the AAV packaging capacity, unlike the genome of the single AAV-CodopV3 (5.2 kb), which exceeds the capacity. The integrity of the packaged AAV-N6 intein genomes was confirmed by alkaline Southern blot hybridisation of purified vector DNA with a probe specific for the HLP promoter (Fig. 13). The lanes corresponding to AAV intein vectors showed discrete bands of the expected molecular weight, while the lane corresponding to the AAV8-Codop-V3 vector consisted of a heterogeneous population of truncated genomes of different sizes.
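The size comparison above can be written out as a simple check. The genome sizes are those given in the text; the packaging limit of about 4.7 kb is an assumption introduced here for illustration (the text refers to the AAV packaging capacity without stating a number), and the names are hypothetical.

```python
# ASSUMPTION: approximate AAV packaging limit in kb; this value is not stated
# in the text and is used here only to make the comparison concrete.
ASSUMED_CAPACITY_KB = 4.7

# Vector genome sizes (kb) as reported in the text.
genomes_kb = {
    "N6 intein 5' half": 3.0,
    "N6 intein 3' half": 3.9,
    "single CodopV3": 5.2,
}


def fits(size_kb, capacity_kb=ASSUMED_CAPACITY_KB):
    """True if a genome of `size_kb` is within the assumed packaging limit."""
    return size_kb <= capacity_kb


report = {name: fits(size) for name, size in genomes_kb.items()}
for name, ok in report.items():
    status = "within" if ok else "exceeds"
    print(f"{name}: {genomes_kb[name]} kb {status} the assumed limit")
```

Under that assumed limit, both intein halves fit and the single CodopV3 genome does not, matching the qualitative statement in the text and the truncated genomes seen on the Southern blot.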
The different sets of AAV intein vectors were injected retro-orbitally in 7-11 week-old adult HemA B6;129S-F8tm1Kaz mice, at a dose of 5 x 10^11 genome copies (GC) of each vector per animal. Blood plasma samples were collected every 4 weeks for 16 weeks after vector administration. F8 activity levels were monitored using both the functional chromogenic assay and the activated partial thromboplastin time (aPTT).
After AAV-N6 intein administration, plasma F8 activity reached wild-type levels (~150 IU/dl) and remained stable up to 16 weeks post-injection (w.p.i.). These levels were similar to those from animals which received the single positive control AAV-CodopV3 (Fig. 14), which however showed a significant decrease or loss of F8 activity over time in the majority of the treated animals (5 out of 8, Fig. 14).
Surprisingly, mice injected with AAV-CodopN6 intein (N=10) had little to no plasma F8 activity and were therefore sacrificed between 8 and 12 w.p.i. (Fig. 14).
Western blot analysis performed on liver lysate samples of the AAV-CodopN6 intein injected group showed expression of the N6 full-length protein and of both the 5' and 3' halves, indicating that transduced hepatocytes had not been eliminated (Fig. 15A).
AAV-CodopN6 intein and AAV-CodopV3 induce circulating anti-F8 antibodies
To understand why AAV-Codop N6 intein administration results in F8 expression that declines shortly after treatment, adult haemophilic mice were systemically injected with AAV-CodopN6 intein (N=6) and monitored every week for 4 weeks. Chromogenic assay performed on blood plasma samples showed that all injected mice achieved levels of F8 activity in the supraphysiological range of ~250 IU/dl, which declined at 4 w.p.i. (Fig. 15B). Since anti-F8 antibodies are routinely observed from 15 days after exposure to recombinant F8 (F. Peyvandi, P. M. Mannucci, I. Garagiola, A. El-Beshlawy, M. Elalfy, V. Ramanan, P. Eshghi, S. Hanagavadi, R. Varadarajan, M. Karimi, M. V. Manglani, C. Ross, G. Young, T. Seth, S. Apte, D. M. Nayak, E. Santagostino, M. E. Mancuso, A. C. Sandoval Gonzalez, J. N. Mahlangu, S. Bonanad Boix, M. Cerqueira, N. P. Ewing, C. Male, T. Owaidah, V. Soto Arellano, N. L. Kobrinsky, S. Majumdar, R. Perez Garrido, A. Sachdeva, M. Simpson, M. Thomas, E. Zanon, B. Antmen, K. Kavakli, M. J. Manco-Johnson, M. Martinez, E. Marzouka, M. G. Mazzucconi, D. Neme, A. Palomo Bravo, R. Paredes Aguilera, A. Prezotti, K. Schmitt, B. M. Wicklund, B. Zulfikar, F. R. Rosendaal, A Randomized Trial of Factor VIII and Neutralizing Antibodies in Hemophilia A. N. Engl. J. Med. 374, 2054-2064 (2016)), we speculated that AAV-Codop N6 intein might induce circulating anti-F8 antibodies that interfere with F8 activity. We measured anti-F8 antibodies by an indirect enzyme-linked immunosorbent assay (ELISA) at both 1 w.p.i. and 4 w.p.i. and found that the loss of F8 activity at 4 w.p.i. is significantly correlated with an increase in anti-F8 antibodies (**P < 0.001) (Fig. 15B).
We next extended the analysis of anti-F8 antibodies to the animals previously injected with: single AAV-CodopV3 (N=8) and AAV-N6 intein (N=5), which were analysed at 16 w.p.i.; and AAV-CodopN6 intein (N=8), which were analysed at 8 w.p.i. The anti-F8 antibody levels (Fig. 16) correlate inversely with F8 activity. Specifically: of the N=8 animals injected with the single AAV-CodopV3, N=5 had high levels of anti-F8 antibodies and no detectable F8 activity (*P < 0.05), while the N=3 mice exhibiting high F8 activity levels had no anti-F8 antibodies (***P < 0.0008); all mice injected with AAV-N6 intein (N=5) had high F8 activity levels and no anti-F8 antibodies (*P < 0.05); all the mice injected with AAV-CodopN6 intein exhibited high levels of anti-F8 antibodies and no F8 activity (**P < 0.005).
Administration of AAV-CodopN6 intein at low doses results in F8 activity without eliciting anti-F8 antibodies
We then hypothesized that the anti-F8 antibodies elicited by AAV-CodopN6 intein administration could be avoided by lowering the vector dose, thus combining therapeutic efficacy with higher safety. Adult haemophilic mice were injected systemically with AAV-CodopN6 intein at a dose of 1.5 x 10^11 GC of each vector per animal (a dose half a log lower than the one previously used) and monitored for 4 weeks. Blood plasma samples were collected every week; F8 activity levels were assessed at 4 w.p.i. by chromogenic assay and anti-F8 antibodies by indirect ELISA (Fig. 17).
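The "half a log lower" relationship between the two doses can be verified with a short calculation. The sketch below takes the two per-vector doses from the text (5 x 10^11 GC and 1.5 x 10^11 GC) and shows that an exact half-log reduction gives about 1.58 x 10^11 GC, so the stated low dose is a rounded half-log step. The function name is illustrative.

```python
import math

HIGH_DOSE_GC = 5e11    # per-vector dose used in the earlier experiments
LOW_DOSE_GC = 1.5e11   # per-vector dose used in the low-dose experiment


def shift_dose_by_log(dose_gc, log10_shift):
    """Scale a dose by 10**log10_shift (e.g. -0.5 for half a log lower)."""
    return dose_gc * 10 ** log10_shift


exact_half_log = shift_dose_by_log(HIGH_DOSE_GC, -0.5)
actual_log_gap = math.log10(HIGH_DOSE_GC / LOW_DOSE_GC)

print(f"exact half-log dose: {exact_half_log:.3e} GC")
print(f"log10 gap between stated doses: {actual_log_gap:.2f}")
```

The gap between the stated doses is about 0.52 log10 units, i.e. a roughly 3.3-fold reduction.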
Systemic administration of AAV-N6 or codopN6 intein at low dose restores blood coagulation activity
To confirm that the levels of F8 activity obtained result in improved blood clotting activity, we measured at various time-points after AAV vector administration the activated partial thromboplastin time (aPTT) in mouse plasma. We found that this was significantly decreased to almost normal levels in animals which received AAV-N6 or -CodopN6 intein at low dose but not in those which received AAV-CodopV3 or -CodopN6 intein at high dose (Fig. 18) which had low/no F8 activity levels and high anti-F8 circulating antibodies (Fig. 16).
DISCUSSION
After decades of extensive research on therapies for hemophilia A (HemA), numerous new technologies have emerged offering a broader range of disease management options for HemA patients. Yet, the problem of curing the disease still remains unsolved. Liver gene therapy with single AAV vectors has the potential to fill this gap and is currently under evaluation in multiple clinical trials (J. S. S. Butterfield, K. M. Hege, R. W. Herzog, R. Kaczmarek, A Molecular Revolution in the Treatment of Hemophilia. Mol. Ther. (2019); M. Makris, Gene therapy 1-0 in haemophilia: effective and safe, but with many uncertainties. Lancet Haematol. (2020)). However, the high F8 protein levels initially observed in some of the trials require high doses of viral vectors to be achieved. In addition, the durability of vector expression has recently been questioned because of the F8 declining levels observed in the gene therapy trial NCT02576795.
Here we show that dual AAV vectors armed with Npu DnaE split inteins efficiently and precisely reconstitute the large and highly active F8-N6 (N6) variant in the mouse liver resulting in stable therapeutic levels of F8.
The N6 variant has previously been delivered in mice in the context of a single AAV, but its size (5 kb) greatly exceeds the normal AAV cargo capacity. This construct was shown to achieve high levels of F8 expression and secretion, but clinical translatability is limited by the poor characterization of oversize genomes upon truncated genome re-assembly. In this study, we overcome this limitation by effectively delivering N6 using two separate AAV vectors, each well within the AAV packaging capacity.
The levels of F8 activity achieved in vivo by a single systemic administration of AAV-N6 intein are comparable to those obtained with the single packageable AAV-Codon optimised F8-V3 (CodopV3) (A. C. Nathwani, E. Tuddenham, P. Chowdary, J. McIntosh, D. Lee, C. Rosales, M. Phillips, J. Pie, Z. Junfang, M. M. Meagher, U. Reiss, A. M. Davidoff, C. L. Morton, A. Riddell, GO-8: Preliminary Results of a Phase I/II Dose Escalation Trial of Gene Therapy for Haemophilia a Using a Novel Human Factor VIII Variant. Blood (2018)) which is in clinical development (NCT03001830) and expresses one of the most promising B-domain deleted (BDD) versions of F8. Yet, the size of the AAV-F8-V3 genome is over the canonical vector cargo capacity and results in truncated genomes.
Importantly, we provide evidence that our strategy does not lead to development of anti-F8 antibodies in HemA mice whereas this occurred in ~60 percent of AAV-CodopV3 treated animals resulting in low to no F8 activity in these animals.
Additionally, we show that N6 codon optimization achieves F8 activity levels, in the absence of anti-F8 antibodies, at doses which are substantially lower than those of AAV-CodopV3 or the non-codon optimized version of AAV-N6 inteins. This lowers the safety and vector manufacturing burdens in view of potential future clinical translation.
In conclusion, our results support liver gene therapy with a single intravenous administration of AAV-N6 intein as a potential therapeutic strategy for HemA.
MATERIALS AND METHODS
Study design
This study was designed to define the efficiency of AAV intein-mediated protein trans-splicing in reconstituting the full-length F8-N6 protein in mouse liver. This was defined in vitro by assessing the expression and the activity of the reconstituted protein achieved via protein trans-splicing, and in vivo by evaluating the impact of a retro-orbital administration of the AAV-N6 intein in an adult mouse model of hemophilia A in comparison to the single AAV-codon optimised BDD-F8-V3 used as a standard. Only males were used in all in vivo studies (given that hemophilia A is inherited as an X-linked recessive trait); within the same litter, animals were randomly assigned to each treatment group, and all the mice before treatment were used as negative control (baseline). To evaluate the efficacy of the treatment over time, different time-points were selected for the analysis as indicated in the results section. The experimenters in the efficacy studies were blind to the treatment of the animals. Sample sizes were determined on the basis of previous experience and technical feasibility; at least three biological replicates in in vitro studies or five animals per group were used in all the experiments, as indicated in the results section and figure legends.
Generation of AAV vector plasmids
The plasmids used for AAV vector production were derived from the pTigem AAV plasmid, which contains the ITRs of AAV serotype 2. The F8-N6 protein was split at Serine 962 (Ser962, counting the signal peptide), exclusively within the N6 linker (in place of the B domain), aiming to preserve the integrity of the other more critical protein domains. Further split considerations were taken into account based on the intrinsic amino acid residue requirements for efficient protein trans-splicing with the Npu intein. In particular, the main prerequisite is the presence of an amino acid containing either a thiol or hydroxyl group (Cys, Ser or Thr) as the first residue in the 3' half of the coding sequence.
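The residue requirement stated above (Cys, Ser or Thr as the first residue of the 3' half) can be expressed as a simple scan for candidate split positions within a permitted region such as a linker. This is only a sketch: the function names, the 1-based region bounds and the toy sequence are hypothetical, and it does not reproduce the authors' selection procedure.

```python
NUCLEOPHILES = {"C", "S", "T"}  # Cys, Ser, Thr (thiol or hydroxyl side chain)


def candidate_split_sites(protein, region_start, region_end):
    """Return 1-based positions inside [region_start, region_end] whose residue
    is Cys, Ser or Thr, i.e. could be the first residue of the 3' half."""
    if not (1 <= region_start <= region_end <= len(protein)):
        raise ValueError("region must lie within the protein")
    return [
        pos
        for pos in range(region_start, region_end + 1)
        if protein[pos - 1] in NUCLEOPHILES
    ]


def split_at(protein, pos):
    """Split so that residue `pos` (1-based) starts the C-terminal half."""
    return protein[: pos - 1], protein[pos - 1:]


toy = "MKAGSLLPTEWCR"  # made-up sequence, not F8
sites = candidate_split_sites(toy, 4, 12)
n_half, c_half = split_at(toy, sites[0])
```

In practice each candidate would still have to be checked against the domain structure of the protein, as the text notes for the N6 linker.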
Split-inteins included in the plasmids were the split-inteins of DnaE from Nostoc punctiforme (Npu) (H. Iwai, S. Zuger, J. Jin, P. H. Tam, Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme. FEBS Lett. (2006)). The plasmids used in the study were under the control of either the ubiquitous cytomegalovirus (CMV) promoter (P. Tornabene, I. Trapani, R. Minopoli, M. Centrulo, M. Lupo, S. De Simone, P. Tiberi, F. Dell'Aquila, E. Marrocco, C. Iodice, A. Iuliano, C. Gesualdo, S. Rossi, L. Giaquinto, S. Albert, C. B. Hoyng, E. Polishchuk, F. P. M. Cremers, E. M. Surace, F. Simonelli, M. A. De Matteis, R. Polishchuk, A. Auricchio, Intein-mediated protein trans-splicing expands adeno-associated virus transfer capacity in the retina. Sci. Transl. Med. (2019)) or the liver-specific hybrid liver promoter (HLP) (J. McIntosh, P. J. Lenting, C. Rosales, D. Lee, S. Rabbanian, D. Raj, N. Patel, E. G. D. Tuddenham, O. D. Christophe, J. H. McVey, S. Waddington, A. W. Nienhuis, J. T. Gray, P. Fagone, F. Mingozzi, S. Z. Zhou, K. A. High, M. Cancio, C. Y. C. Ng, J. Zhou, C. L. Morton, A. M. Davidoff, A. C. Nathwani, Therapeutic levels of FVIII following a single peripheral vein administration of rAAV vector encoding a novel human factor VIII variant. Blood (2013)). The polyadenylation signal (polyA) used in all plasmids was the short synthetic polyA (N. Levitt, D. Briggs, A. Gil, N. J. Proudfoot, Definition of an efficient synthetic poly(A) site. Genes Dev. (1989)). For the generation of the heterologous split-inteins, the same splitting point (Ser962) was used. The N-split intein flanking the 5' half plasmid was the N-intein of DnaB from Rhodothermus marinus (Rma) (P. Tornabene, I. Trapani, R. Minopoli, M. Centrulo, M. Lupo, S. De Simone, P. Tiberi, F. Dell'Aquila, E. Marrocco, C. Iodice, A. Iuliano, C. Gesualdo, S. Rossi, L. Giaquinto, S. Albert, C. B. Hoyng, E. Polishchuk, F. P. M. Cremers, E. M. Surace, F. Simonelli, M. A. De Matteis, R. Polishchuk, A. Auricchio, Intein-mediated protein trans-splicing expands adeno-associated virus transfer capacity in the retina. Sci. Transl. Med. (2019); F. X. Zhu, Z. L. Liu, X. L. Wang, J. Miao, H. G. Qu, X. Y. Chi, Inter-chain disulfide bond improved protein trans-splicing increases plasma coagulation activity in C57BL/6 mice following portal vein FVIII gene delivery by dual vectors. Sci. China Life Sci. (2013)), while the C-split intein flanking the 3' half plasmid was the C-intein of the DnaE. Heterologous split-intein plasmids were produced under the control of the ubiquitous cytomegalovirus (CMV) promoter since they were only used for in vitro purposes. The codon optimised BDD-F8-V3 plasmid used as a standard is as disclosed in McIntosh et al. (J. McIntosh, P. J. Lenting, C. Rosales, D. Lee, S. Rabbanian, D. Raj, N. Patel, E. G. D. Tuddenham, O. D. Christophe, J. H. McVey, S. Waddington, A. W. Nienhuis, J. T. Gray, P. Fagone, F. Mingozzi, S. Z. Zhou, K. A. High, M. Cancio, C. Y. C. Ng, J. Zhou, C. L. Morton, A. M. Davidoff, A. C. Nathwani, Therapeutic levels of FVIII following a single peripheral vein administration of rAAV vector encoding a novel human factor VIII variant. Blood (2013)).
AAV vector production and characterization
AAV vectors were produced by triple transfection of HEK293 cells. No differences in vector yields were observed between AAV vectors with and without the split-intein sequences.
Southern blot analyses of AAV vector DNA
DNA was extracted from 6 x 10^10 viral particles, measured as genome copies (GC). To digest unpackaged genomes, the vector solution was incubated with 30 µl of DNase I (04536282001 Roche, Italy) in a total volume of 300 µl, containing 50 mM Tris pH 7.5 and 1 mM MgCl2, for 2 hours at 37°C. The DNase was then inactivated with 50 mM EDTA, followed by incubation at 50°C for 1 hour with proteinase K and 2.5% N-lauryl-sarcosyl solution to lyse the capsids. The DNA was extracted twice with phenol-chloroform and precipitated with 2 volumes of 100% ethanol, 10% sodium acetate (3 M) and 1 µl of glycogen (20 µg), as previously described (Sambrook, J., and Russell, D.W. 2001. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press. Cold Spring Harbor, New York, USA. 999 pp). Single-stranded DNA was quantified with the Qubit® ssDNA Kit (Q10212 ThermoFisher Scientific, Germany). A probe specific for the HLP promoter was used; 1.4 x 10^10 GC of the single vector AAV-Codop V3 and of both the 5' and the 3' AAV-N6 intein vectors were loaded on an alkaline agarose gel.
Transfection of HEK293 cells
HEK293 cells were maintained and transfected using the calcium phosphate method (1 µg of each plasmid/well in 6-well plate format). The total amount of DNA transfected in each well was kept equal by addition of a scramble plasmid when necessary. At 12 hpt the medium was switched to Opti-MEM reduced serum medium (31985062, Gibco, ThermoFisher Scientific, Germany), 1 ml/well, until the 72 hpt time point, when cells were harvested and medium was collected.
Western blot analysis
Samples (HEK293 cells or liver lysates) were lysed in RIPA buffer to extract F8 protein. Lysis buffer was supplemented with protease inhibitors (Complete Protease inhibitor cocktail tablets; Roche, Basel, Switzerland) and 1 mM phenylmethylsulfonyl. For medium samples, upon cell harvesting at 72 hpt, medium samples were centrifuged at 4°C for 15 minutes to remove cell debris. Purified medium was collected; 30 µl of medium was mixed with 1X Laemmli sample buffer. All samples were denatured at 99°C for 5 minutes in 1X Laemmli sample buffer. Lysates and medium samples were separated by either 12% (for excised intein detection) or 6% (for full-length F8 protein detection) SDS-polyacrylamide gel electrophoresis (SDS-PAGE). The antibodies used for immuno-blotting are as follows: anti-3xflag (1:2000, A8592; Sigma-Aldrich, Saint Louis, MO, USA) to detect the full-length F8-N6 protein and both the 5' and the 3' halves; anti-β-Actin (1:1000, NB600-501; Novus Biological LLC, Littleton, CO, USA) to detect β-Actin, used as loading control for the 12% SDS-PAGE; anti-Calnexin (1:2000, ADI-SPA-860; Enzo Life Sciences Inc, New York, NY, USA) to detect Calnexin, used as loading control for the 6% SDS-PAGE. The quantification of the full-length F8-N6 bands detected by Western blot was performed using ImageJ software.
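The text names the loading controls and the quantification software but not the arithmetic applied to the band intensities. One common approach, shown here purely as an assumption, is to divide each F8-N6 band intensity by the loading-control intensity of the same lane and then express all lanes relative to a reference lane. The function name, lane labels and intensity values are hypothetical.

```python
def normalize_bands(target, loading, reference_lane):
    """Divide each target band intensity by its loading-control intensity,
    then express every lane relative to `reference_lane`."""
    if target.keys() != loading.keys():
        raise ValueError("target and loading must cover the same lanes")
    ratios = {lane: target[lane] / loading[lane] for lane in target}
    ref = ratios[reference_lane]
    return {lane: ratio / ref for lane, ratio in ratios.items()}


# Made-up densitometry readings (arbitrary units), for illustration only.
f8_band = {"single": 1200.0, "intein": 900.0, "neg": 0.0}
calnexin = {"single": 600.0, "intein": 600.0, "neg": 500.0}
relative = normalize_bands(f8_band, calnexin, reference_lane="single")
```

With these invented readings the intein lane comes out at 0.75 of the single-plasmid lane; the numbers carry no experimental meaning.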
Immunoprecipitation and Mass Spectrometry analysis
Cells were plated in 100 mm plates (5x10s cells/plates) and transfected with either the single Codop N6 plasmid or Codop N6 intein plasmids using the calcium phosphate method (20 µg of each plasmid/plate). Cells were harvested 72 hpt and both the single Codop N6 and the Codop N6 intein proteins were immunoprecipitated using anti-flag M2 magnetic beads (M8823; Sigma-Aldrich), according to the manufacturer's instructions. Proteins were eluted from the beads by incubation for 15 minutes in sample buffer supplemented with 4 M urea at 37°C and 10 minutes at 99°C. Samples were then loaded on a gradient 4-10% SDS-polyacrylamide gel. In total, 8 protein bands (from HEK293 cells transfected 6 times independently with CodopN6 intein plasmids) were cut after staining with Instant Blue (ISB1L; Sigma-Aldrich) and were used for protein sequencing. Briefly, the 8 gel slices were digested with the following enzymes: Lysin and Trypsin. The resulting peptides were identified using nanoscale Liquid Chromatography coupled to tandem Mass Spectrometry (nano LC-MS/MS) analysis. Data obtained were processed using MaxQuant and the implemented Andromeda search engine.
Animal model
Animals were housed at the TIGEM animal facility (Pozzuoli, Italy). The hemophilic mouse model (B6;129S-F8tm1Kaz/J) was imported from The Jackson Laboratory (JAX stock). Mice were maintained by crossing knockout homozygous females with knockout hemizygous males.
Retro-orbital injection of AAV vectors in mice
All procedures on mice were approved by the Italian Ministry of Health, Department of Public Health, Animal Health, Nutrition and Food Safety (num 379/2019-PR). Adult knockout males (between 7 and 11 weeks of age) were retro-orbitally injected with AAV8 vectors, either AAV-N6 intein, AAV-CodopN6 intein or the single AAV-CodopV3 as a positive control, at a dose of 5 x 10^11 GC of each vector per animal.
AAV-CodopN6 intein vectors were also retro-orbitally injected at a low dose of 1.5 x 10 GC of each vector per animal.
Plasma collection and F8 assays
Briefly, nine parts of blood were collected by retro-orbital withdrawal into one part of buffered trisodium citrate 0.109 M (BD, Franklin Lakes, NJ, USA). Blood plasma was collected after centrifugation of the samples at 3000 rpm at 4°C for 15 minutes.
To evaluate F8 activity, a chromogenic assay was performed on plasma samples using a Coatest® SP4 FVIII-kit (Chromogenix, Werfen, Milan, Italy) according to the manufacturer's instructions. The standard curve was generated by serial dilution of commercial human F8 (Refacto, Pfizer). Results are expressed as International Units (IU) per deciliter (dl).
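Reading activity off a standard curve built from a serial dilution can be sketched as below. The linear relation between absorbance and activity, the two-fold dilution series, the absorbance readings and the function names are all assumptions for illustration; the kit manufacturer's instructions govern the actual calculation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matched points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def activity_from_absorbance(absorbance, slope, intercept, dilution=1.0):
    """Invert the standard curve and correct for the sample dilution."""
    return (absorbance - intercept) / slope * dilution


# Made-up two-fold serial dilution of a standard (IU/dl) and absorbances.
standard_iu_dl = [100.0, 50.0, 25.0, 12.5, 6.25]
standard_abs = [1.05, 0.55, 0.30, 0.175, 0.1125]
slope, intercept = fit_line(standard_iu_dl, standard_abs)
sample_iu_dl = activity_from_absorbance(0.45, slope, intercept, dilution=2.0)
```

The invented standards lie exactly on a line, so a sample reading of 0.45 at a two-fold dilution maps back to about 80 IU/dl; real curves would need a goodness-of-fit check and may not be linear over the full range.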
Activated partial thromboplastin time (aPTT) was measured on plasma samples with a Coatron M4 (Teco, Bunde, Germany) using the aPTT program, following the manufacturer's manual.
Indirect enzyme-linked immunosorbent assay (ELISA) to detect anti-F8 antibodies
To evaluate the presence of anti-F8 antibodies, an indirect enzyme-linked immunosorbent assay (ELISA) was performed using ZYMUTEST™ Anti-VIII Monostrip IgG (HYPHEN BioMed, France) according to the manufacturer's instructions. The secondary antibody used to detect mouse IgG was the goat anti-mouse IgG (H+L) HRP conjugate (1:3000, AP308P Sigma-Aldrich, Saint Louis, MO, USA).
Statistical analysis
Data are presented as median. p values < 0.05 were considered significant. The normality assumption was verified using the Shapiro-Wilk test. Levene's test was applied to check the homogeneity of variances. Data were analyzed by Student's t-test or, when data were not normally distributed (Shapiro-Wilk test p value < 0.05), by either the Kruskal-Wallis rank sum test or the Wilcoxon rank sum test (non-parametric tests). Specific statistical values are as follows:
Figure 10C
The Kruskal-Wallis test was followed by post-hoc analysis with Nemenyi's All-Pairs Rank Comparison Test. The Kruskal-Wallis test p value = 2.88e-05. Post-hoc p values are as follows: wild-type F8 vs neg p value = 0.8; N6 vs neg p value = 0.00034878; SQ vs neg p value = 0.09; V3 vs neg p value = 0.002; N6 vs wild type p value = 0.003; SQ vs wild type p value = 0.46; V3 vs wild type p value = 0.02; SQ vs N6 p value = 0.3; V3 vs N6 p value = 0.9; V3 vs SQ p value = 0.6.
Figure 11D
The Kruskal-Wallis test p value = 0.013. P values between different groups are as follows: I vs II p value = 0.5; I vs I N-int B p value = 0.6; II vs I+II (Het) p value = 0.7; II vs N6 intein p value = 0.09; II vs Neg p value = 0.06; I vs I (N int-B) p value = 0.3; II vs I+II (Het) p value = 0.4; II vs I+II p value = 0.02; II vs Neg p value = 0.2; I (N int-B) vs I+II (Het) p value = 0.8; I (N int-B) vs I+II p value = 0.2; I (N int-B) vs neg p value = 0.02; I+II (Het) vs I+II p value = 0.1; I+II (Het) vs Neg p value = 0.03; I+II vs neg p value = 0.0003.
Figure 12C
The Kruskal-Wallis test has been used; p value is 0.027. P values between different groups are as follows: I vs II p value = 0.5; I vs Codop II p value = 0.9; I vs Codop I p value = 0.9; II vs Codop I+II p value = 0.007; II vs I+II p value = 0.03; II vs neg p value = 0.4; I vs Codop II p value = 0.6; I vs Codop I p value = 0.5; I vs Codop I+II p value = 0.03; I vs I+II p value = 0.1; I vs Neg p value = 0.8; II Codop vs I Codop p value = 0.8; II Codop vs Codop I+II p value = 0.008; II Codop vs I+II p value = 0.04; II Codop vs Neg p value = 0.4; Codop I vs Codop I+II p value = 0.005; Codop I vs I+II p value = 0.03; Codop I vs neg p value = 0.4; Codop I+II vs I+II p value = 0.5; Codop I+II vs Neg p value = 0.056.
Figure 14
The Kruskal-Wallis test has been used; p values between different groups at different time points are as follows: at 4 w.p.i. p < 0.0001; Codop N6 vs Codop V3 p value = 0.00008; Codop N6 vs N6 intein p value = 0.01; Codop V3 vs N6 intein p value = 1.
Figure 15B
The Wilcoxon test has been used: at 1 w.p.i. p value = 0.016; at 4 w.p.i. p value = 0.009.
Figure 16
The paired t-test has been used for Codop V3 (p value = 0.00072) and the Wilcoxon test for Codop V3 (p value = 0.031); the paired t-test has been used for N6 intein (p value = 0.033); the Wilcoxon test has been used for Codop N6 (p value = 0.0039).
Figure 17
The paired t-test has been used for Codop N6 intein low dose (p value = 0.002).
Figure 18
The Kruskal-Wallis test has been used; p values were adjusted with the Bonferroni correction. P values between different groups are as follows: baseline vs CodopN6 p value = 1; baseline vs CodopN6 low dose p value = 0.03; baseline vs CodopV3 p value = 1; baseline vs N6 intein p value = 0.2; baseline vs wild-type p value = 0.0009; CodopN6 vs CodopN6 low dose p value = 0.025; CodopN6 vs CodopV3 p value = 1; CodopN6 vs N6 p value = 0.15; CodopN6 vs wild-type p value = 1; CodopN6 low dose vs CodopV3 p value = 0.8; CodopN6 low dose vs N6 p value = 1; CodopN6 low dose vs wild-type p value = 1; CodopV3 vs N6 p value = 1; CodopV3 vs wild-type p value = 0.17; N6 vs wild-type p value = 1.
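The Bonferroni adjustment mentioned for Figure 18 multiplies each raw pairwise p value by the number of comparisons and caps the result at 1, which is consistent with several of the adjusted values listed above being equal to 1. The sketch below uses made-up raw p values and hypothetical comparison labels; it does not reproduce the study's data or the software that was used.

```python
def bonferroni(p_values):
    """Multiply each raw p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Made-up raw p values for three pairwise comparisons (not the study's data).
raw = {"A vs B": 0.004, "A vs C": 0.03, "B vs C": 0.5}
adjusted = bonferroni(raw)

# Comparisons that remain below the 0.05 threshold after adjustment.
significant = [name for name, p in adjusted.items() if p < 0.05]
```

Here only the first comparison stays below 0.05 after adjustment, which shows how the correction becomes more conservative as the number of pairwise comparisons grows.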
Further aspects of the invention are disclosed in the following numbered paragraphs:
1- A vector system to express a coding sequence in a cell, said coding sequence consisting of a first portion (CDS1) and a second portion (CDS2), said vector system comprising: c) a first vector comprising:
- said first portion of said coding sequence (CDS1),
- a first intein nucleotide sequence coding for an N-Intein, said first intein nucleotide sequence having at least 80 % identity with the sequence:
TGCCTGAGCTACGAGACCGAGATCCTGACCGTGGAGTACGGCCTGCTGCCCATCGGCAAGATCG
TGGAGAAGCGGATCGAGTGCACCGTGTACAGCGTGGACAACAACGGCAACATCTACACCCAGCC
CGTGGCCCAGTGGCACGACCGGGGCGAGCAGGAGGTGTTCGAGTACTGCCTGGAGGACGGCAGC
CTGATCCGGGCCACCAAGGACCACAAGTTCATGACCGTGGACGGCCAGATGCTGCCCATCGACG
AGATCTTCGAGCGGGAGCTGGACCTGATGCGGGTGGACAACCTGCCCAAC (SEQ ID No. 15), or said first intein nucleotide sequence has at least 80 % identity with the sequence: TGCCTGTCCTATGAGACAGAGATCCTGACAGTGGAGTACGGCCTGCTGCCTATCGGCAAGA
TCGTGGAGAAGAGGATCGAGTGTACCGTGTATAGCGTGGACAACAATGGCAATATCTACA
CACAGCCAGTGGCACAGTGGCACGACAGGGGAGAGCAGGAGGTGTTTGAGTATTGTCTGG
AGGATGGCAGCCTGATCCGGGCCACCAAGGATCACAAGTTCATGACAGTGGACGGCCAGAT
GCTGCCAATCGATGAGATCTTTGAGCGCGAGCTGGACCTGATGCGGGTGGATAACCTGCCC
AAT (SEQ ID No. 16), or said N-intein has at least 80 % identity with SEQ ID No. 1 or a variant thereof or a fragment thereof or a homolog thereof; and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and d) a second vector comprising:
- said second portion of said coding sequence (CDS2),
- a second intein nucleotide sequence coding for a C-Intein, said second intein nucleotide sequence having at least 80 % identity with the sequence:
ATCAAGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGGAGCGGG
ACCACAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAAT (SEQ ID No. 17), or said second intein nucleotide sequence has at least 80 % identity with the sequence:
ATGATCAAGATCGCCACACGGAAGTACCTGGGCAAGCAGAACGTGTATGATATCGGCGTGGAGCGGGACCACAACTTCGCCCTGAAGAATGGCTTTATCGCCAGCAAT (SEQ ID No. 18), or said C-intein has at least 80 % identity with SEQ ID No. 2; and wherein said second intein nucleotide sequence is located at the 5' end of CDS2; wherein when the first vector and the second vector are inserted in a cell, the protein product of the coding sequence is produced by protein splicing.
2- The vector system according to paragraph 1, wherein the first vector and the second vector further comprise a promoter sequence operably linked to the 5' end portion of said first portion of the coding sequence (CDS1) or of said second portion of the coding sequence (CDS2), preferably said promoter is a liver specific promoter, preferably the promoter is HLP, thyroxine binding globulin (TBG) promoter, HCB promoter or F8 promoter.
3- The vector system according to any one of previous paragraphs, wherein the first vector and the second vector further comprise a 5'-terminal repeat (5'-TR) nucleotide sequence and a 3'-terminal repeat (3'-TR) nucleotide sequence, preferably the 5'-TR is a 5'-inverted terminal repeat (5'-ITR) nucleotide sequence and the 3'-TR is a 3'-inverted terminal repeat (3'-ITR) nucleotide sequence.
4- The vector system according to any one of previous paragraphs, wherein the first vector and the second vector further comprise a poly-adenylation signal nucleotide sequence and/or wherein at least one of the first vector or the second vector further comprises a nucleotide sequence coding for a degradation signal.
5- The vector system according to paragraph 4 wherein the degradation signal is selected from the group consisting of CL1, PB29, SMN, CIITA, ODc, ecDHFR or a fragment thereof.
6- The vector system according to any one of previous paragraphs, wherein the coding sequence encodes a protein able to correct hemophilia or Wilson disease.
7- The vector system according to any one of previous paragraphs, wherein the coding sequence is the coding sequence of a gene selected from the group consisting of: F8, ATP7B or a variant thereof.
8- The vector system according to any one of previous paragraphs, wherein the coding sequence is split into the first portion or the second portion at a position consisting of a nucleophile amino acid which does not fall within a structural domain or a functional domain of the encoded protein product, wherein the nucleophile amino acid is selected from serine, threonine, or cysteine.
9- The vector system according to anyone of previous paragraphs, wherein the coding sequence is codon optimized. 10- The vector system according to any one of previous paragraphs, wherein the coding sequence has at least 80% identity with a sequence selected from the group consisting of: a) ATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTGCCACCAG AAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTGATCTCGGTGAGC T G CCT GT G G ACGC AAG ATTTCCT CCT AG AGT GCCAAAAT CTTTTCCATT C AAC ACCT CAGTCGT GTACAAAAAG ACTCTGTTTGTAG AATTC ACG G ATC ACCTTTTCAAC ATCG CTAAGCCAAGGCCA CCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTACA CTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTACTGGAAAGCTT CTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGATAAAGTCTTC CCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATGGCCTCTGAC CCACT GTG CCTT ACCT ACT CAT AT CTTT CT CAT GTG G ACCTG GT AAAAG ACTT G AATT C AGG CCT CATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACACAGACCTTGC AC A AATTT AT ACT ACTTTTT G CT GTATTT G ATG A AG G G A A A AGTT G G C ACT C AG A A AC A A AG A ACTCCTTGATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCTAAAATGCACACAGTCA ATGGTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCTATTGGC ATGTG ATT G G A AT GGGCACCACTCCTGAAGTGCACT C A AT ATT CCTCG A AG GT C AC AC ATTT CT TGTGAGGAACCATCGCCAGGCGTCCTTGGAAATCTCGCCAATAACTTTCCTTACTGCTCAAACA CTCTTGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATGATGGCAT GGAAGCTTATGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAAATAATG AAG AAG CG G AAG ACT ATG ATG ATG ATCTTACTG ATTCTG AAATG G ATGTG GTCAG GTTTG ATG ATG ACAACT CTCCTTCCTTT AT CC AAATTCG CT C AGTT G CC AAG AAG CAT CCT A A A ACTT G G GT ACATTACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCCGATGA C AG AAGTTAT AAAAGTC AAT ATTT G AAC AAT G GCCCT C AG CG G ATTGGTAG G AAGTAC AAAA AAGTCCGATTTATGGCATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCATGAAT C AG G AAT CTT G G G ACCTTT ACTTT ATGGGGAAGTTGGAGACACACTGTTG ATT AT ATTT A AG A 
ATCAAGCAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCCGTCCTTTGTATTC AAGG AG ATT ACCAAAAG GT GT AAAAC ATTT G AAG G ATTTT CC AATT CT GCCAG G AG AAAT ATT CAAATATAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCCTGAC CCG CT ATT ACT CT AGTTT CGTT AAT AT G GAG AG AG AT CT AGCTT CAG G ACT C ATT GG CCCTCTC CT CAT CTGCT AC AAAG AAT CT GT AG AT C AAAG AG G AAACC AG AT AAT GT CAG AC AAG AG G AA TGTCATCCTGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATACAACGC TTTCTCCCCAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCATGC ACAGCATCAATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGCATA
CT G GTAC ATT CT A AG C ATT GGAGCACAGACT G ACTT CCTTT CTGT CTT CTT CTCTG G AT ATACCT
T C AAACAC AAAAT G GT CT AT G AAG ACAC ACT CACCCT ATTCCC ATT CT C AG G AG AAACT GTCTT
CATGTCGATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAACAG
AGGCATGACCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAGGA
CAGTTATGAAGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGTTTT
TCACAGAATCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGC
AAAAACAGTTT AATGCAACCACAATACCTG AAAATG AT ATAG AG AAAACCG AT CCCTGGTT CG
C ACACCG AACCCCC AT G CC AAAAATT C AAAACGTCT CC AGTTCCG AT CTT CT CAT G CT CTTGCG
CCAGTCACCCACACCACATGGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATT
TTCAGATGACCCTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTT
CGGCCGCAG CTGC ATC ATTCTG GTG ATATG GTATTC ACCCCG G AATCAG GCCTCC AACTTAG A
CTTAACGAGAAACTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCC
AGTACCAGCAACAACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAAT
ACATCATCACTTGGGCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTT
TGGTAAGAAGTCATCCCCACTCACCGAAAGCGGTGGACCTTTGTCTCTCTCTGAGGAGAATAA
TG ACTCC A AG CTG CTT GAGTCAGGGTTGATGAACAG CCA AG A AT CCTC ATG G G G A A A A A ACG
TTTCCT CC ACC AG GG AAAT AACT CGTACT ACT CTT C AGT CAG AT C AAG AG G AAATT G ACT ATG A
TG AT ACC AT ATC AGTT G A A AT G A AG AAG G A AG ATTTT G AC ATTT ATG ATG AG G ATG AAAAT C A
GAGCCCCCGCAGCTTTCAAAAGAAAACACGACACTATTTTATTGCTGCAGTGGAGAGGCTCTG
GGATTATGGGATGAGTAGCTCCCCACATGTTCTAAGAAACAGGGCTCAGAGTGGCAGTGTCC
CT C AGTT CAAG AAAGTT GTTTT CC AGG AATTT ACT GAT G GCTCCTTT ACT CAG CCCTT AT ACCGT
GGAGAACTAAATGAACATTTGGGACTCCTGGGGCCATATATAAGAGCAGAAGTTGAAGATAA
T AT CAT GGTAACTTT CAG AAAT CAG GCCT CTCGT CCCT ATTCCTT CT ATT CT AGCCTT ATTT CTT A
TGAGGAAGATCAGAGGCAAGGAGCAGAACCTAGAAAAAACTTTGTCAAGCCTAATGAAACCA
AAACTT ACTTTT G G AAAGT GC AACAT CAT AT GG C ACCCACT AAAG AT G AGTTT G ACT G CAAAG
CCTGGGCTTATTTCTCTGATGTTGACCTGGAAAAAGATGTGCACTCAGGCCTGATTGGACCCCT
TCTGGTCTGCCACACTAACACACTGAACCCTGCTCATGGGAGACAAGTGACAGTACAGGAATT
TG CTCT GTTTTT C ACC AT CTTT G ATG AG ACC AAAAG CT GGTACTT C ACT G AAAAT AT GG AAAG A
AACTGCAGGGCTCCCTGCAATATCCAGATGGAAGATCCCACTTTTAAAGAGAATTATCGCTTCC
ATG C A ATC A ATG G CTAC ATA ATG G ATAC ACTACCTG G CTT AGT A ATG G CTC AG G ATC A A AG G A
TTCGATGGTATCTGCTCAGCATGGGCAG C AAT G A A A AC ATCC ATT CT ATT C ATTT CAGTGGACA TGTGTTCACTGTACG A A A AAA AG AG G AGT AT A A A AT G G C ACTGT AC A AT CTCTATCC AG GTGT
TTTTGAGACAGTGGAAATGTTACCATCCAAAGCTGGAATTTGGCGGGTGGAATGCCTTATTGG CGAGCATCTACATGCTGGGATGAGCACACTTTTTCTGGTGTACAGCAATAAGTGTCAGACTCC CCTGGGAATGGCTTCTGGACACATTAGAGATTTTCAGATTACAGCTTCAGGACAATATGGACA GTGGGCCCCAAAGCTGGCCAGACTTCATTATTCCGGATCAATCAATGCCTGGAGCACCAAGGA GCCCTTTTCTTGGATCAAGGTGGATCTGTTGGCACCAATGATTATTCACGGCATCAAGACCCAG GGTG CCCGTC AG AAGTT CT CC AGCCT CT AC AT CT CT C AGTTT AT CAT CAT GTAT AGTCTT G ATG GGAAGAAGTGGCAGACTTATCGAGGAAATTCCACTGGAACCTTAATGGTCTTCTTTGGCAATG TGGATTCATCTGGGATAAAACACAATATTTTTAACCCTCCAATTATTGCTCGATACATCCGTTTG C ACCC A ACT C ATT ATAG C ATT CG C AG C ACT CTT CGCATGGAGTTGATGGGCTGTG ATTT A A AT A GTTG C AG CAT G CC ATT G G G A AT G GAG AGT A A AG C A AT AT C AG ATG C AC AG ATT ACTG CTT CAT CCT ACTTT ACCAAT AT GTTT G CCACCT GGTCTCCTT C AAAAG CT CG ACTT C ACCTCC AAGG G AG GAGTAATGCCTGGAGACCTCAGGTGAATAATCCAAAAGAGTGGCTGCAAGTGGACTTCCAGA AG AC A AT G A A AGT CACAGGAGTAACTACTCAGGGAGT AAA AT CT CT G CTT ACC AG C ATGTATG TGAAGGAGTTCCTCATCTCCAGCAGTCAAGATGGCCATCAGTGGACTCTCTTTTTTCAGAATGG C AAAGTAAAG GTTTTT CAG GG AAAT C AAG ACT CCTT CAC ACCT GTG GT G AACT CT CT AG ACCC A CCGTTACTGACTCGCTACCTTCGAATTCACCCCCAGAGTTGGGTGCACCAGATTGCCCTGAGG ATG GAG GTT CTGG GCT G CG AG GC ACAG G ACCT CT ACT G A; b) ATG CAG ATT GAGCTGAGCACCTG CTT CTT CCTGTGCCTGCTGAGGTTCTG CTT CTCTGCCACCA GGAGATACTACCTGGGGGCTGTGGAGCTGAGCTGGGACTACATGCAGTCTGACCTGGGGGA GCTGCCTGTGGATGCCAGGTTCCCCCCCAGAGTGCCCAAGAGCTTCCCCTTCAACACCTCTGTG GT GT ACAAG AAG ACCCT GTTT GT GG AGTTCACTG ACCACCT GTT CAACATT GCCAAGCCCAGG CCCCCCTGGATGGGCCTGCTGGGCCCCACCATCCAGGCTGAGGTGTATGACACTGTGGTGATC ACCCTGAAGAACATGGCCAGCCACCCTGTGAGCCTGCATGCTGTGGGGGTGAGCTACTGGAA GGCCTCTGAGGGGGCTGAGTATGATGACCAGACCAGCCAGAGGGAGAAGGAGGATGACAAG GTGTTCCCTGGGGGCAGCCACACCTATGTGTGGCAGGTGCTGAAGGAGAATGGCCCCATGGC CTCTGACCCCCTGTGCCTGACCTACAGCTACCTGAGCCATGTGGACCTGGTGAAGGACCTGAA CTCTGGCCTGATTGGGGCCCTGCTGGTGTGCAGGGAGGGCAGCCTGGCCAAGGAGAAGACC C AG ACCCTG CAC AAGTT CAT CCT G CTGTTTG CTGTGTTTG ATG AG GG CAAG AG CTGG C ACTCT 
GAAACCAAGAACAGCCTGATGCAGGACAGGGATGCTGCCTCTGCCAGGGCCTGGCCCAAGAT GCACACTGTGAATGGCTATGTGAACAGGAGCCTGCCTGGCCTGATTGGCTGCCACAGGAAGT
CTGTGTACTGG CAT GTG ATT G G CAT GGGCACCACCCCTGAGGTGCACAG CAT CTTCCT G G AG G GCCACACCTTCCTGGTCAGGAACCACAGGCAGGCCAGCCTGGAGATCAGCCCCATCACCTTCC
TGACTGCCCAGACCCTGCTGATGGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCC
ACCAGCATGATGGCATGGAGGCCTATGTGAAGGTGGACAGCTGCCCTGAGGAGCCCCAGCTG
AGGATGAAGAACAATGAGGAGGCTGAGGACTATGATGATGACCTGACTGACTCTGAGATGG
ATGTG GTG AGGTTTG ATG ATG AC AACAG CCCC AG CTTCATCC AG ATCAG GTCTGTG G CC AAG A
AGCACCCCAAGACCTGGGTGCACTACATTGCTGCTGAGGAGGAGGACTGGGACTATGCCCCC
CTGGTGCTGGCCCCTGATGACAGGAGCTACAAGAGCCAGTACCTGAACAATGGCCCCCAGAG
GATTGGCAGGAAGTACAAGAAGGTCAGGTTCATGGCCTACACTGATGAAACCTTCAAGACCA
GGGAGGCCATCCAGCATGAGTCTGGCATCCTGGGCCCCCTGCTGTATGGGGAGGTGGGGGA
CACCCTGCTGATCATCTTCAAGAACCAGGCCAGCAGGCCCTACAACATCTACCCCCATGGCATC
ACTG ATGTG AG GCCCCTGTACAG CAG G AG GCTG CCC AAGG GG GTG AAG CACCTG AAG G ACTT
CCCC ATCCTG CCTG G G G AG ATCTTC AAGTACAAGTGG ACTGTG ACTGTG G AG G ATGG CCCCAC
CAAGTCTGACCCCAGGTGCCTGACCAGATACTACAGCAGCTTTGTGAACATGGAGAGGGACCT
GGCCTCTGGCCTGATTGGCCCCCTGCTGATCTGCTACAAGGAGTCTGTGGACCAGAGGGGCA
ACCAGATCATGTCTGACAAGAGGAATGTGATCCTGTTCTCTGTGTTTGATGAGAACAGGAGCT
GGTACCTGACTGAGAACATCCAGAGGTTCCTGCCCAACCCTGCTGGGGTGCAGCTGGAGGAC
CCTG AGTTCCAG GCC AGC AACATCATG CACAG CAT CAAT G GCTAT GT GTTT G AC AG CCTG CAG
CTGTCTGTGTGCCTGCATGAGGTGGCCTACTGGTACATCCTGAGCATTGGGGCCCAGACTGAC
TTCCTGTCTGTGTTCTTCTCTGGCTACACCTTCAAGCACAAGATGGTGTATGAGGACACCCTGA
CCCTGTTCCCCTTCTCTGGGGAGACTGTGTTCATGAGCATGGAGAACCCTGGCCTGTGGATTCT
GGGCTGCCACAACTCTGACTTCAGGAACAGGGGCATGACTGCCCTGCTGAAAGTCTCCAGCTG
T G AC A AG A AC ACTG G G G ACTACTATG AG G AC AG CTATG AG G AC AT CTCTG CCT ACCTG CTG AG
CAAGAACAATGCCATTGAGCCCAGGAGTTTTTCACAGAATCCACCTGTATTGACGCGGAGTTT
C AGTC AG AACTCCAG GC ACCCCT CT ACT AG GC AAAAAC AGTTT AAT G C AACC AC AAT ACCTG A
AAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAACCCCCATGCCAAAAATTCAAAA
CGT CTCC AGTT CCG AT CTT CT CAT GCT CTT G CGCCAGTC ACCCAC ACC AC ATGGTCT CTCCCT C A
GCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACCCTAGCCCCGGCGCTATTGATA
GTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGCAGCTGCATCATTCTGGTGATATGGT
ATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGAGAAACTGGGCACGACCGCCGCCAC
CG AGTTG AAG AAACT CG ACTT C AAGGTTT CC AGT ACC AGC AACAACCTT AT CAG CACT ATCCCA
TCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACTTGGGCCACCCTCTATGCCGGTC
CACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAAGTCATCCCCACTCACCGAAAGCG GTGGACCTTTGTCTCTCTCTGAGGAGAATAATGACTCCAAGCTGCTTGAGTCAGGGTTGATGA
ACAGCCAAGAATCCTCATGGGGAAAAAACGTTTCCTCCACCAGGGAGATCACCAGGACCACCC
TGCAGTCTGACCAGGAGGAGATTGACTATGATGACACCATCTCTGTGGAGATGAAGAAGGAG
GACTTTGACATCTACGACGAGGACGAGAACCAGAGCCCCAGGAGCTTCCAGAAGAAGACCAG
GCACTACTTCATTGCTGCTGTGGAGAGGCTGTGGGACTATGGCATGAGCAGCAGCCCCCATGT
GCTGAGGAACAGGGCCCAGTCTGGCTCTGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGT
TCACTGATGGCAGCTTCACCCAGCCCCTGTACAGAGGGGAGCTGAATGAGCACCTGGGCCTG
CTGGGCCCCTACATCAGGGCTGAGGTGGAGGACAACATCATGGTGACCTTCAGGAACCAGGC
CAGCAGGCCCTACAGCTTCTACAGCAGCCTGATCAGCTATGAGGAGGACCAGAGGCAGGGGG
CT G AGCCCAGG AAG AACTTT GTG AAGCCCAATG AAACCAAG ACCT ACTTCTGG AAGGTGCAG
CACCACATGGCCCCCACCAAGGATGAGTTTGACTGCAAGGCCTGGGCCTACTTCTCTGATGTG
GACCTGGAGAAGGATGTGCACTCTGGCCTGATTGGCCCCCTGCTGGTGTGCCACACCAACACC
CTGAACCCTGCCCATGGCAGGCAGGTGACTGTGCAGGAGTTTGCCCTGTTCTTCACCATCTTTG
ATGAAACCAAGAGCTGGTACTTCACTGAGAACATGGAGAGGAACTGCAGGGCCCCCTGCAAC
ATCC AG ATG G AG G ACCCC ACCTTCAAG G AG AACTACAG GTTCCATG CCATCAATG GCTAC ATC
ATG G AC ACCCTG CCTG G CCTG GTG ATG G CCCAG G ACC AG AG G ATCAG GTGGTACCTG CTG AG
CATGGGCAGCAATGAGAACATCCACAGCATCCACTTCTCTGGCCATGTGTTCACTGTGAGGAA
GAAGGAGGAGTACAAGATGGCCCTGTACAACCTGTACCCTGGGGTGTTTGAGACTGTGGAGA
TGCTGCCCAGCAAGGCTGGCATCTGGAGGGTGGAGTGCCTGATTGGGGAGCACCTGCATGCT
GGCATGAGCACCCTGTTCCTGGTGTACAGCAACAAGTGCCAGACCCCCCTGGGCATGGCCTCT
GGCCACATCAGGGACTTCCAGATCACTGCCTCTGGCCAGTATGGCCAGTGGGCCCCCAAGCTG
GCCAGGCTGCACTACTCTGGCAG CAT C A AT GCCTGGAGCACCAAGG AG CCCTT C AG CTG G ATC
AAGGTGGACCTGCTGGCCCCCATGATCATCCATGGCATCAAGACCCAGGGGGCCAGGCAGAA
GTTCAGCAGCCTGTACATCAGCCAGTTCATCATCATGTACAGCCTGGATGGCAAGAAGTGGCA
GACCTACAGGGGCAACAGCACTGGCACCCTGATGGTGTTCTTTGGCAATGTGGACAGCTCTG
G CAT C AAGC AC AAC AT CTT CAACCCCCCCAT CATTGCC AG AT AC AT C AGG CT GC ACCCCACCC A
CTACAGCATCAGGAGCACCCTGAGGATGGAGCTGATGGGCTGTGACCTGAACAGCTGCAGCA
TGCCCCTGGGCATGGAGAGCAAGGCCATCTCTGATGCCCAGATCACTGCCAGCAGCTACTTCA
CCAACATGTTTGCCACCTGGAGCCCCAGCAAGGCCAGGCTGCACCTGCAGGGCAGGAGCAAT
GCCTGGAGGCCCCAGGTCAACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAGAAGACCAT
GAAGGTGACTGGGGTGACCACCCAGGGGGTGAAGAGCCTGCTGACCAGCATGTATGTGAAG
GAGTTCCTGATCAGCAGCAGCCAGGATGGCCACCAGTGGACCCTGTTCTTCCAGAATGGCAA GGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCTGTGGTGAACAGCCTGGACCCCC
CCCTG CTG ACCAG ATACCTG AG G ATTCACCCCC AG AG CTG G GTG C ACC AG ATTGCCCTG AG G A TG G AG GTGCTG GG CTGTG AG GCCCAG G ACCTGTACTG A; c) AT G C T GAG C AG GAG AG AC AG AT C AC AG C C AG AG AAG G G G C C AG T C G G AAAAT CTTATCTAAGCTTTCTTTGCCTACCCGTGCCTGGGAACCAGCAATGAAGAAG AGTTTTGCTTTTGACAATGTTGGCTATGAAGGTGGTCTGGATGGCCTGGGCC CTTCTTCTCAGGTGGCCACCAGCACAGTCAGGATCTTGGGCATGACTTGCCA G T CAT G T G T G AAG T C C AT T GAG G AC AG GAT T T C C AAT T T G AAAG G CAT CATC AGCATGAAGGTTTCCCTGGAACAAGGCAGTGCCACTGTGAAATATGTGCCAT CGGTTGTGTGCCTGCAACAGGTTTGCCATCAAATTGGGGACATGGGCTTCGA GGCCAGCATTGCAGAAGGAAAGGCAGCCTCCTGGCCCTCAAGGTCCTTGCCT GCCCAGGAGGCTGTGGTCAAGCTCCGGGTGGAGGGCATGACCTGCCAGTCCT GTGTCAGCTCCATTGAAGGCAAGGTCCGGAAACTGCAAGGAGTAGTGAGAGT C AAAG T C T C AC T C AG C AAC C AAG AG G C C G T CAT C AC T TAT C AG C C T TAT C T C ATT C AG C C C G AAG AC C T C AG G G AC C AT G T AAAT G AC AT G G GAT T T G AAG CTG CCATCAAGAGCAAAGTGGCTCCCTTAAGCCTGGGACCAATTGATATTGAGCG G T T AC AAAG C AC T AAC C C AAAG AG AC C T T TAT C T T C T G C T AAC CAGAAT T T T AATAATTCTGAGACCTTGGGGCACCAAGGAAGCCATGTGGTCACCCTCCAAC T GAG AAT AG AT G G AAT G CAT T G T AAG TCTTGCGTCTT GAAT AT T G AAG AAAA TATTGGCCAGCTCCTAGGGGTTCAAAGTATTCAAGTGTCCTTGGAGAACAAA ACTGCCCAAGTAAAGTATGACCCTTCTTGTACCAGCCCAGTGGCTCTGCAGA GGGCTATCGAGGCACTTCCACCTGGGAATTTTAAAGTTTCTCTTCCTGATGG AGCCGAAGGGAGTGGGACAGATCACAGGTCTTCCAGTTCTCATTCCCCTGGC TCCCCACCGAGAAACCAGGTCCAGGGCACATGCAGTACCACTCTGATTGCCA TTGCCGGCATGACCTGTGCATCCTGTGTCCATTCCATTGAAGGCATGATCTC CCAACTGGAAGGGGTGCAGCAAATATCGGTGTCTTTGGCCGAAGGGACTGCA ACAGTTCTTTATAATCCCTCTGTAATTAGCCCAGAAGAACTCAGAGCTGCTA TAGAAGACATGGGATTTGAGGCTTCAGTCGTTTCTGAAAGCTGTTCTACTAA CCCTCTTGGAAACCACAGTGCTGGGAATTCCATGGTGCAAACTACAGATGGT ACACCTACATCTGTGCAGGAAGTGGCTCCCCACACTGGGAGGCTCCCTGCAA
ACCATGCCCCGGACATCTTGGCAAAGTCCCCACAATCAACCAGAGCAGTGGC ACCGCAGAAGTGCTTCTTACAGATCAAAGGCATGACCTGTGCATCCTGTGTG
TCTAACATAGAAAGGAATCTGCAGAAAGAAGCTGGTGTTCTCTCCGTGTTGG T T G C C T T GAT G G C AG G AAAG G C AG AG AT C AAG TAT G AC C C AG AG G T CAT C C A GCCCCTCGAGATAGCTCAGTTCATCCAGGACCTGGGTTTTGAGGCAGCAGTC AT G GAG G AC T AC G C AG G C T C C GAT G G C AAC AT T GAG C T G AC AAT C AC AG G G A TGACCTGCGCGTCCTGTGTCCACAACATAGAGTCCAAACTCACGAGGACAAA TGGCATCACTTATGCCTCCGTTGCCCTTGCCACCAGCAAAGCCCTTGTTAAG T T T G AC C C G G AAAT TAT C G G T C C AC G G GAT AT TAT C AAAAT TAT T GAG G AAA TTGGCTTTCATGCTTCCCTGGCCCAGAGAAACCCCAACGCTCATCACTTGGA CCACAAGATGGAAATAAAGCAGTGGAAGAAGTCTTTCCTGTGCAGCCTGGTG TTTGGCATCCCTGTCATGGCCTTAATGATCTATATGCTGATACCCAGCAACG AGCCCCACCAGTCCATGGTCCTGGACCACAACATCATTCCAGGACTGTCCAT TCTAAATCTCATCTTCTTTATCTTGTGTACCTTTGTCCAGCTCCTCGGTGGG T G G T AC T T C T AC G T T C AG G C C T AC AAAT C T C T GAG AC AC AG G T C AG C C AAC A TGGACGTGCTCATCGTCCTGGCCACAAGCATTGCTTATGTTTATTCTCTGGT CATCCTGGTGGTTGCTGTGGCTGAGAAGGCGGAGAGGAGCCCTGTGACATTC TTCGACACGCCCCCCATGCTCTTTGTGTTCATTGCCCTGGGCCGGTGGCTGG AACACTTGGCAAAGAGCAAAACCTCAGAAGCCCTGGCTAAACTCATGTCTCT CCAAGCCACAGAAGCCACCGTTGTGACCCTTGGTGAGGACAATTTAATCATC AGGGAGGAGCAAGTCCCCATGGAGCTGGTGCAGCGGGGCGATATCGTCAAGG TGGTCCCTGGGGGAAAGTTTCCAGTGGATGGGAAAGTCCTGGAAGGCAATAC C AT G G C T GAT GAG T C C C T CAT C AC AG GAG AAG C CAT G C C AG T C AC T AAG AAA CCCGGAAGCACTGTAATTGCGGGGTCTATAAATGCACATGGCTCTGTGCTCA T T AAAG C T AC C C AC G T G GG C AAT G AC AC C AC T T T GG C T C AG AT T G T GAAAC T GGTGGAAGAGGCTCAGATGTCAAAGGCACCCATTCAGCAGCTGGCTGACCGG TTTAGTGGATATTTTGTCCCATTTATCATCATCATGTCAACTTTGACGTTGG TGGTATGGATTGTAATCGGTTTTATCGATTTTGGTGTTGTTCAGAGATACTT TCCTAACCCCAACAAGCACATCTCCCAGACAGAGGTGATCATCCGGTTTGCT TTCCAGACGTCCATCACGGTGCTGTGCATTGCCTGCCCCTGCTCCCTGGGGC TGGCCACGCCCACGGCTGTCATGGTGGGCACCGGGGTGGCCGCGCAGAACGG
C AT C C T CAT C AAG G GAG G C AAG C C C C T G GAG AT G G C G C AC AAG AT AAAG AC T GTGATGTTTGACAAGACTGGCACCATTACCCATGGCGTCCCCAGGGTCATGC GGGTGCTCCTGCTGGGGGATGTGGCCACACTGCCCCTCAGGAAGGTTCTGGC TGTGGTGGGGACTGCGGAGGCCAGCAGTGAACACCCCTTGGGCGTGGCAGTC AC CAAATAC TGTAAAG AGGAAC TTGGAACAG AGACC TTGGGATAC TGCACG G ACTTCCAGGCAGTGCCAGGCTGTGGAATTGGGTGCAAAGTCAGCAACGTGGA AGGCATCCTGGCCCACAGTGAGCGCCCTTTGAGTGCACCGGCCAGTCACCTG AATGAGGCTGGCAGCCTTCCCGCAGAAAAAGATGCAGTCCCCCAGACCTTCT CTGTGCTGATTGGAAACCGTGAGTGGCTGAGGCGCAACGGTTTAACCATTTC TAGCGAT GTCAGTGAC GCTATGAC AGACCAC GAGATGAAAG GACAG ACAGC C ATCCTGGTGGCTATTGACGGTGTGCTCTGTGGGATGATCGCAATCGCAGACG CTGTCAAGCAGGAGGCTGCCCTGGCTGTGCACACGCTGCAGAGCATGGGTGT GGACGTGGTTCTGATCACGGGGGACAACCGGAAGACAGCCAGAGCTATTGCC ACCCAGGTTGGCATCAACAAAGTCTTTGCAGAGGTGCTGCCTTCGCACAAGG TGGCCAAGGTCCAGGAGCTCCAGAATAAAGGGAAGAAAGTCGCCATGGTGGG GGATGGGGTCAATGACTCCCCGGCCTTGGCCCAGGCAGACATGGGTGTGGCC ATTGGCACCGGCACGGATGTGGCCATCGAGGCAGCCGACGTCGTCCTTATCA GAAATGATTTGCTGGATGTGGTGGCTAGCATTCACCTTTCCAAGAGGACTGT CCGAAGGATACGCATCAACCTGGTCCTGGCACTGATTTATAACCTGGTTGGG ATACCCATTGCAGCAGGTGTCTTCATGCCCATCGGCATTGTGCTGCAGCCCT GGATGGGCTCAGCGGCCATGGCAGCCTCCTCTGTGTCTGTGGTGCTCTCATC CCTGCAG CTCAAGTG CTATAAGAAG CCTGAC CTGGAGAG GTATGAG GCACAG GCGCATGGCCACATGAAGCCCCTGACGGCATCCCAGGTCAGTGTGCACATAG GCATGGATGACAGGTGGCGGGACTCCCCCAGGGCCACACCATGGGACCAGGT CAGCTATGTCAGCCAGGTGTCGCTGTCCTCCCTGACGTCCGACAAGCCATCT CGGCACAGCGCTGCAGCAGACGATGATGGGGACAAGTGGTCTCTGCTCCTGA AT GGCAG GGATGAGGAG CAGTAC ATCTGA ; d) ATGCCAGAGCAGGAGAGGCAGATCACCGCAAGAGAGGGAGCATCCAGGAAGATCCTGT CCAAGCTGTCTCTGCCAACAAGGGCATGGGAGCCTGCAATGAAGAAGTCTTTCGCCTTT GACAACGTGGGATATGAGGGAGGCCTGGATGGCCTGGGACCTAGCTCCCAGGTGGCCAC CAGCACAGTGAGAATCCTGGGCATGACCTGCCAGTCTTGCGTGAAGAGCATCGAGGACA
GGATCTCCAATCTGAAGGGCATCATCTCCATGAAGGTGTCTCTGGAGCAGGGCTCTGCC ACAGTGAAGTACGTGCCCAGCGTGGTGTGCCTGCAGCAGGTGTGCCACCAGATCGGCGA
TATGGGCTTCGAGGCATCCATCGCAGAGGGCAAGGCAGCATCTTGGCCATCCAGATCTC
TGCCTGCCCAGGAGGCCGTGGTGAAGCTGAGGGTGGAAGGAATGACCTGCCAGTCCTGC
GTGAGCAGCATCGAGGGCAAGGTGAGAAAGCTGCAGGGCGTGGTGAGGGTGAAGGTGA
GCCTGTCCAACCAGGAGGCCGTGATCACATACCAGCCATATCTGATCCAGCCCGAGGAC
CTGCGGGATCACGTGAATGACATGGGCTTCGAGGCCGCCATCAAGAGCAAGGTGGCACC
TCTGTCCCTGGGACCAATCGATATCGAGCGCCTGCAGTCCACCAACCCTAAGCGGCCAC
TGTCCTCTGCCAACCAGAACTTCAACAATAGCGAGACACTGGGACACCAGGGCTCCCAC
GTGGTGACACTGCAGCTGCGCATCGACGGCATGCACTGCAAGAGCTGCGTGCTGAACAT
CGAGGAGAATATCGGCCAGCTGCTGGGCGTGCAGAGCATCCAGGTGTCCCTGGAGAACA
AGACCGCCCAGGTGAAGTATGATCCCAGCTGCACATCCCCTGTGGCCCTGCAGAGGGCA
ATCGAGGCCCTGCCCCCTGGCAATTTCAAGGTGTCTCTGCCAGACGGAGCAGAGGGCAG
CGGAACCGATCACCGCAGCTCCTCTAGCCACTCTCCTGGCAGCCCACCAAGGAACCAGGT
GCAGGGAACCTGTTCTACCACACTGATCGCCATCGCCGGCATGACATGCGCCTCTTGCG
TGCACAGCATCGAGGGCATGATCAGCCAGCTGGAGGGCGTGCAGCAGATCTCTGTGAGC
CTGGCAGAGGGAACCGCAACAGTGCTGTACAATCCATCCGTGATCTCTCCCGAGGAGCT
GAGAGCCGCCATCGAGGACATGGGCTTTGAGGCCTCCGTGGTGTCCGAGTCTTGCAGCA
CCAACCCCCTGGGCAATCACTCCGCCGGCAACTCTATGGTGCAGACCACAGACGGCACCC
CAACAAGCGTGCAGGAGGTGGCACCACACACCGGCAGACTGCCTGCCAATCACGCCCCA
GATATCCTGGCCAAGAGCCCTCAGTCCACAAGGGCAGTGGCACCACAGAAGTGCTTCCT
GCAGATCAAGGGCATGACCTGCGCCTCCTGCGTGAGCAACATCGAGAGGAATCTGCAGA
AGGAGGCAGGCGTGCTGTCCGTGCTGGTGGCCCTGATGGCAGGCAAGGCCGAGATCAAG
TACGATCCTGAAGTGATCCAGCCACTGGAGATCGCCCAGTTTATCCAGGACCTGGGCTT
CGAGGCCGCCGTGATGGAGGATTATGCCGGCAGCGACGGCAACATCGAGCTGACCATCA
CAGGCATGACCTGCGCCTCTTGCGTGCACAACATCGAGAGCAAGCTGACCCGCACAAAT
GGCATCACATACGCATCTGTGGCCCTGGCCACCAGCAAGGCCCTGGTGAAGTTTGATCC
CGAGATCATCGGCCCTCGGGACATCATCAAGATCATCGAGGAGATCGGCTTCCACGCCA
GCCTGGCCCAGAGAAACCCCAATGCCCACCACCTGGATCACAAGATGGAGATCAAGCAG
TGGAAGAAGAGCTTTCTGTGCTCCCTGGTGTTCGGCATCCCTGTGATGGCCCTGATGAT
CTACATGCTGATCCCTTCCAACGAGCCACACCAGTCTATGGTGCTGGACCACAACATCA
TCCCAGGCCTGTCCATCCTGAATCTGATCTTCTTTATCCTGTGCACATTTGTGCAGCTGC
TGGGCGGCTGGTACTTCTATGTGCAGGCCTATAAGAGCCTGCGGCACAGATCCGCCAAT
ATGGATGTGCTGATCGTGCTGGCCACCAGCATCGCCTACGTGTATTCCCTGGTCATCCT
GGTGGTGGCAGTGGCAGAGAAGGCAGAGCGGAGCCCCGTGACCTTCTTTGACACACCCC CTATGCTGTTCGTGTTTATCGCCCTGGGCAGATGGCTGGAGCACCTGGCCAAGAGCAAG
ACCTCCGAGGCCCTGGCCAAGCTGATGAGCCTGCAGGCCACAGAGGCCACCGTGGTGAC
ACTGGGCGAGGATAACCTGATCATCAGGGAGGAGCAGGTGCCAATGGAGCTGGTGCAG
CGCGGCGACATCGTGAAGGTGGTGCCAGGCGGCAAGTTTCCCGTGGATGGCAAGGTGCT
GGAGGGCAATACAATGGCAGACGAGTCCCTGATCACCGGAGAGGCCATGCCTGTGACCA
AGAAGCCAGGCTCTACAGTGATCGCAGGCAGCATCAACGCACACGGCTCCGTGCTGATC
AAGGCCACACACGTGGGCAATGATACCACACTGGCCCAGATCGTGAAGCTGGTGGAGGA
GGCCCAGATGAGCAAGGCACCAATCCAGCAGCTGGCAGACCGGTTTTCTGGCTACTTCG
TGCCTTTTATCATCATCATGAGCACCCTGACACTGGTGGTGTGGATCGTGATCGGCTTC
ATCGACTTTGGCGTGGTGCAGAGGTATTTCCCAAACCCCAATAAGCACATCTCCCAGAC
CGAAGTGATCATCCGCTTCGCCTTTCAGACCTCCATCACCGTGCTGTGCATCGCCTGCCC
TTGTTCTCTGGGCCTGGCCACCCCAACAGCCGTGATGGTGGGAACAGGAGTGGCAGCAC
AGAACGGCATCCTGATCAAGGGCGGCAAGCCCCTGGAGATGGCCCACAAGATCAAGACC
GTGATGTTCGATAAGACCGGCACAATCACCCACGGCGTGCCAAGAGTGATGAGAGTGCT
GCTGCTGGGCGACGTGGCCACACTGCCACTGAGAAAGGTGCTGGCAGTGGTGGGAACCG
CAGAGGCCAGCTCCGAGCACCCCCTGGGCGTGGCCGTGACAAAGTACTGCAAGGAGGAG
CTGGGCACAGAGACACTGGGCTATTGTACCGACTTTCAGGCAGTGCCTGGATGCGGAAT
CGGCTGTAAGGTGTCCAACGTGGAGGGCATCCTGGCACACTCTGAGCGGCCCCTGTCTG
CCCCTGCAAGCCACCTGAATGAGGCAGGCAGCCTGCCAGCAGAGAAGGATGCAGTGCCT
CAGACATTCTCCGTGCTGATCGGCAACAGAGAGTGGCTGCGGAGAAATGGCCTGACCAT
CTCTAGCGACGTGAGCGACGCCATGACAGACCACGAGATGAAGGGCCAGACCGCCATCC
TGGTGGCCATCGATGGCGTGCTGTGCGGCATGATCGCCATCGCAGACGCAGTGAAGCAG
GAGGCCGCCCTGGCAGTGCACACCCTGCAGTCTATGGGCGTGGATGTGGTGCTGATCAC
CGGCGACAACAGGAAGACAGCAAGGGCAATCGCAACCCAAGTGGGCATCAATAAGGTG
TTTGCCGAGGTGCTGCCATCCCACAAGGTGGCCAAGGTGCAGGAGCTGCAGAACAAGGG
CAAGAAGGTGGCCATGGTGGGCGATGGCGTGAATGACTCTCCCGCCCTGGCACAGGCAG
ATATGGGAGTGGCAATCGGCACAGGAACCGATGTGGCAATCGAGGCAGCAGACGTGGT
GCTGATCCGGAACGATCTGCTGGACGTGGTGGCCTCCATCCACCTGTCTAAGCGGACCG
TGAGGCGCATCAGAATCAACCTGGTGCTGGCCCTGATCTACAATCTGGTGGGCATCCCT
ATCGCAGCAGGCGTGTTCATGCCAATCGGCATCGTGCTGCAGCCATGGATGGGCAGCGC
CGCAATGGCAGCATCCAGCGTGAGCGTGGTGCTGAGCTCCCTGCAGCTGAAGTGTTACA
AGAAGCCTGACCTGGAGAGGTATGAGGCCCAGGCCCACGGCCACATGAAGCCACTGACC
GCCTCTCAGGTGAGCGTGCACATCGGCATGGACGATAGGTGGAGGGATAGCCCAAGGGC
AACACCATGGGACCAGGTGTCCTACGTGTCTCAGGTGAGCCTGTCTAGCCTGACCTCTG ATAAGCCATCCAGGCACAGCGCCGCCGCCGACGATGACGGCGACAAGTGGAGCCTGCTG
CTGAATGGCCGCGATGAGGAGCAGTACATC
11- A vector system to express a coding sequence in a cell, said coding sequence consisting of a first portion (CDS1) and a second portion (CDS2), said vector system comprising: c) a first vector comprising:
- said first portion of said coding sequence (CDS1),
- a first intein nucleotide sequence coding for an N-intein, said N-intein having at least 80% identity with SEQ ID No. 3, 5, 7, 9, 11, 13 or a variant thereof or a fragment thereof or a homolog thereof, and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and d) a second vector comprising:
- said second portion of said coding sequence (CDS2),
- a second intein nucleotide sequence coding for a C-intein, said C-intein having at least 80% identity with SEQ ID No. 4, 6, 8, 10, 12, 14 or a variant thereof or a fragment thereof or a homolog thereof, and wherein said second intein nucleotide sequence is located at the 5' end of CDS2; wherein said coding sequence encodes a sequence selected from the group of: i)
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN
TSVVYKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAV
GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH
VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD
AASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH
RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE
EAEDYDDDLTDSEMDVVRFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA
PDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTL
LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGP
TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE
NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHSINGYVFDSLQLSVCLHEVAYWYILS
IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRG
MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS
TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE
AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE
LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT
ESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTREITRTTLQSDQEEIDYDDT
ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV
PQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS
SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV
HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQM
EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRK
KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQ
GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIAR
YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSK
ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD
GHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE
AQDLY or ii) MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN
TSVVYKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAV
GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH
VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD
AASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH
RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE
EAEDYDDDLTDSEMDVVRFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA
PDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTL
LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGP
TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE
NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHSINGYVFDSLQLSVCLHEVAYWYILS
IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRG
MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS
TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE
AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE
LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT
ESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTRHQREITRTTLQSDQEEIDYDDT
ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV
PQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS
SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV
HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQM
EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRK
KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQ
GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIAR
YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSK
ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD
GHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE
AQDLY wherein when the first vector and the second vector are inserted in a cell, the protein product of the coding sequence is produced by protein splicing.
12- The vector system according to any one of previous paragraphs, wherein at least one of the first vector and the second vector further comprises at least one enhancer or regulatory nucleotide sequence, operably linked to the coding sequence.
13- The vector system according to any one of previous paragraphs comprising: c) a first vector comprising in a 5'-3' direction:
- a 5'-inverted terminal repeat (5'-ITR) sequence;
- a promoter sequence;
- a 5' end portion of a coding sequence (CDS1), said 5' end portion being operably linked to and under control of said promoter;
- a first intein nucleotide sequence coding for an N-intein; and
- a 3'-inverted terminal repeat (3'-ITR) sequence; and
d) a second vector comprising in a 5'-3' direction:
- a 5'-inverted terminal repeat (5'-ITR) sequence;
- a promoter sequence;
- a second intein nucleotide sequence coding for a C-intein;
- a 3' end portion of the coding sequence (CDS2); and
- a 3'-inverted terminal repeat (3'-ITR) sequence.
14- The vector system according to any one of previous paragraphs wherein said first and second vector are independently a viral vector, preferably an adenoviral vector or adeno-associated viral (AAV) vector, preferably said first and second adeno-associated viral (AAV) vectors are selected from the same or different AAV serotypes, preferably the serotype is selected from serotype 2, serotype 8, serotype 5, serotype 7, serotype 9, serotype 7m8, serotype sh10 or serotype 2(quad Y-F).
15- A host cell transformed with the vector system according to any one of previous paragraphs.
16- The vector system according to any one of paragraphs 1 to 14 or the host cell according to paragraph 15 for medical use.
17- The vector system according to any one of paragraphs 1 to 14 or the host cell according to paragraph 15 for use in gene therapy, preferably for use in the treatment and/or prevention of hemophilia or Wilson disease.
18- A pharmaceutical composition comprising the vector system according to any one of paragraphs 1 to 14 or the host cell according to paragraph 15 and a pharmaceutically acceptable vehicle.
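
The two-vector arrangement of paragraphs 11 and 13 can be illustrated at the protein level with a minimal sketch. It assumes that the second portion begins with a serine, threonine or cysteine residue, and it treats protein splicing as simple removal of the two intein halves followed by joining of the flanking portions. The function names and the intein strings `nint` and `cint` are hypothetical placeholders, not sequences or methods taken from this document; the toy protein is the first 16 residues of the amino acid sequences listed above.

```python
"""Toy model of a two-part split-intein construct (illustrative only)."""

# Residues accepted here as the first amino acid of the second portion.
NUCLEOPHILES = {"S", "T", "C"}


def split_protein(protein: str, pos: int) -> tuple[str, str]:
    """Split so that the second portion starts at 0-based index `pos`."""
    if not 0 < pos < len(protein):
        raise ValueError("split position must lie inside the protein")
    if protein[pos] not in NUCLEOPHILES:
        raise ValueError("second portion must start with S, T or C")
    return protein[:pos], protein[pos:]


def build_precursors(protein: str, pos: int,
                     n_intein: str, c_intein: str) -> tuple[str, str]:
    """Return (first portion + N-intein, C-intein + second portion)."""
    first, second = split_protein(protein, pos)
    return first + n_intein, c_intein + second


def splice(pre_n: str, pre_c: str, n_intein: str, c_intein: str) -> str:
    """Remove both intein halves and join the two flanking portions."""
    if not pre_n.endswith(n_intein):
        raise ValueError("first precursor does not end with the N-intein")
    if not pre_c.startswith(c_intein):
        raise ValueError("second precursor does not start with the C-intein")
    return pre_n[:len(pre_n) - len(n_intein)] + pre_c[len(c_intein):]


if __name__ == "__main__":
    toy_protein = "MQIELSTCFFLCLLRF"   # first 16 residues of the listed sequence
    n_half, c_half = build_precursors(toy_protein, 5, "nint", "cint")
    print(n_half, c_half)
    print(splice(n_half, c_half, "nint", "cint") == toy_protein)
```

The sketch only checks the ordering of the elements (N-intein at the end of the first portion, C-intein at the start of the second portion); it does not model vectors, promoters, or the chemistry of the splicing reaction.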

Claims

1. A vector system for expressing a coding sequence in a cell, wherein the vector system comprises a first vector and a second vector, wherein:
(a) the first vector comprises a first portion of the coding sequence (CDS1) and a first intein nucleotide sequence that encodes an N-intein, wherein the first intein nucleotide sequence is at the 3' end of CDS1; and
(b) the second vector comprises a second portion of the coding sequence (CDS2) and a second intein nucleotide sequence that encodes a C-intein, wherein the second intein nucleotide sequence is at the 5' end of CDS2; wherein when the first vector and the second vector are introduced into the cell, the protein product of the coding sequence is produced by protein splicing, wherein the coding sequence encodes Factor VIII (F8) or ATP7B, or a variant thereof.
2. The vector system of claim 1, wherein the Factor VIII variant is N6 or SQ-N6.
3. A vector system to express a coding sequence in a cell, said coding sequence consisting of a first portion (CDS1) and a second portion (CDS2), said vector system comprising: c) a first vector comprising:
- said first portion of said coding sequence (CDS1),
- a first intein nucleotide sequence coding for an N-intein, said first intein nucleotide sequence having at least 80% identity with the sequence:
TGCCTGAGCTACGAGACCGAGATCCTGACCGTGGAGTACGGCCTGCTGCCCATCGGCAAGATCG
TGGAGAAGCGGATCGAGTGCACCGTGTACAGCGTGGACAACAACGGCAACATCTACACCCAGCC
CGTGGCCCAGTGGCACGACCGGGGCGAGCAGGAGGTGTTCGAGTACTGCCTGGAGGACGGCAGC
CTGATCCGGGCCACCAAGGACCACAAGTTCATGACCGTGGACGGCCAGATGCTGCCCATCGACG or said first intein nucleotide sequence has at least 80 % identity with the sequence:
TGCCTGTCCTATGAGACAGAGATCCTGACAGTGGAGTACGGCCTGCTGCCTATCGGCAAGA
TCGTGGAGAAGAGGATCGAGTGTACCGTGTATAGCGTGGACAACAATGGCAATATCTACA
CACAGCCAGTGGCACAGTGGCACGACAGGGGAGAGCAGGAGGTGTTTGAGTATTGTCTGG
AGGATGGCAGCCTGATCCGGGCCACCAAGGATCACAAGTTCATGACAGTGGACGGCCAGAT
GCTGCCAATCGATGAGATCTTTGAGCGCGAGCTGGACCTGATGCGGGTGGATAACCTGCCC
AAT (SEQ ID No. 16), or said N-intein has at least 80% identity with SEQ ID No. 1 or a variant thereof or a fragment thereof or a homolog thereof; and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and d) a second vector comprising:
- said second portion of said coding sequence (CDS2),
- a second intein nucleotide sequence coding for a C-intein, said second intein nucleotide sequence having at least 80% identity with the sequence:
ATCAAGATCGCCACCCGGAAGTACCTGGGCAAGCAGAACGTGTACGACATCGGCGTGGAGCGGG
ACCACAACTTCGCCCTGAAGAACGGCTTCATCGCCAGCAAT (SEQ ID No. 17), or said second intein nucleotide sequence has at least 80% identity with the sequence:
ATGATCAAGATCGCCACACGGAAGTACCTGGGCAAGCAGAACGTGTATGATATCGGCGTG
GAGCGGGACCACAACTTCGCCCTGAAGAATGGCTTTATCGCCAGCAAT (SEQ ID No. 18), or said C-intein has at least 80% identity with SEQ ID No. 2; and wherein said second intein nucleotide sequence is located at the 5' end of CDS2; wherein when the first vector and the second vector are inserted in a cell, the protein product of the coding sequence is produced by protein splicing.
4. The vector system according to any preceding claim, wherein the first vector and the second vector further comprise a promoter sequence operably linked to the first portion of the coding sequence (CDS1) or the second portion of the coding sequence (CDS2), preferably wherein said promoter is a liver specific promoter, preferably wherein the promoter is HLP, thyroxine binding globulin (TBG), HCB promoter or F8 promoter.
5. The vector system according to any preceding claim, wherein the first vector and the second vector further comprise a 5'-terminal repeat (5'-TR) nucleotide sequence and a 3'-terminal repeat (3'-TR) nucleotide sequence, preferably wherein the 5'-TR is a 5'-inverted terminal repeat (5'-ITR) nucleotide sequence and the 3'-TR is a 3'-inverted terminal repeat (3'-ITR) nucleotide sequence.
6. The vector system according to any preceding claim, wherein the first vector and the second vector further comprise a poly-adenylation signal nucleotide sequence and/or wherein at least one of the first vector or the second vector further comprises a nucleotide sequence coding for a degradation signal.
7. The vector system according to claim 6, wherein the degradation signal is selected from the group consisting of CL1, PB29, SMN, CIITA, ODc, ecDHFR or a fragment thereof.
8. The vector system according to any one of claims 3-7, wherein the coding sequence encodes a protein able to correct hemophilia or Wilson disease.
9. The vector system according to any one of claims 3-8, wherein the coding sequence is the coding sequence of a gene selected from the group consisting of F8, ATP7B, and variants thereof.
10. The vector system according to any preceding claim, wherein the coding sequence is split into the first portion and the second portion at a position consisting of a nucleophile amino acid which does not fall within a structural domain or a functional domain of the encoded protein product, wherein the nucleophile amino acid is selected from serine, threonine, or cysteine.
11. The vector system according to any preceding claim, wherein the coding sequence is codon optimized.
12. The vector system according to any preceding claim, wherein the coding sequence has at least 80% identity with a sequence selected from the group consisting of: a) ATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGCTTTAGTGCCACCAG AAGATACTACCTGGGTGCAGTGGAACTGTCATGGGACTATATGCAAAGTGATCTCGGTGAGC T G CCT GT G G ACGC AAG ATTTCCT CCT AG AGT GCCAAAAT CTTTTCCATT C AAC ACCT CAGTCGT GTACAAAAAG ACT CT GTTT GT AG AATT CACGG AT C ACCTTTT C AACATCG CT AAG CC AAGG CCA CCCTGGATGGGTCTGCTAGGTCCTACCATCCAGGCTGAGGTTTATGATACAGTGGTCATTACA CTTAAGAACATGGCTTCCCATCCTGTCAGTCTTCATGCTGTTGGTGTATCCTACTGGAAAGCTT CTGAGGGAGCTGAATATGATGATCAGACCAGTCAAAGGGAGAAAGAAGATGATAAAGTCTTC CCTGGTGGAAGCCATACATATGTCTGGCAGGTCCTGAAAGAGAATGGTCCAATGGCCTCTGAC CCACT GTG CCTT ACCT ACT CAT AT CTTT CT CAT GTG G ACCTG GT AAAAG ACTT G AATT C AGG CCT CATTGGAGCCCTACTAGTATGTAGAGAAGGGAGTCTGGCCAAGGAAAAGACACAGACCTTGC AC A AATTT AT ACT ACTTTTT G CT GTATTT G ATG A AG G G A A A AGTT G G C ACT C AG A A AC A A AG A ACTCCTT GATGCAGGATAGGGATGCTGCATCTGCTCGGGCCTGGCCT A A A AT GCACACAGTCA ATGGTTATGTAAACAGGTCTCTGCCAGGTCTGATTGGATGCCACAGGAAATCAGTCTATTGGC ATGTG ATT G G A AT GGGCACCACTCCTGAAGTG C ACT C A AT ATT CCTCG A AG GT C AC AC ATTT CT TGTGAGGAACCATCGCCAGGCGTCCTTGGAAATCTCGCCAATAACTTTCCTTACTGCTCAAACA CTCTTGATGGACCTTGGACAGTTTCTACTGTTTTGTCATATCTCTTCCCACCAACATGATGGCAT GGAAGCTTATGTCAAAGTAGACAGCTGTCCAGAGGAACCCCAACTACGAATGAAAAATAATG
AAGAAGCGGAAGACTATGATGATGATCTTACTGATTCTGAAATGGATGTGGTCAGGTTTGATG
ATGACAACTCTCCTTCCTTTATCCAAATTCGCTCAGTTGCCAAGAAGCATCCTAAAACTTGGGT
ACATTACATTGCTGCTGAAGAGGAGGACTGGGACTATGCTCCCTTAGTCCTCGCCCCCGATGA
CAGAAGTTATAAAAGTCAATATTTGAACAATGGCCCTCAGCGGATTGGTAGGAAGTACAAAA
AAGTCCGATTTATGGCATACACAGATGAAACCTTTAAGACTCGTGAAGCTATTCAGCATGAAT
CAGGAATCTTGGGACCTTTACTTTATGGGGAAGTTGGAGACACACTGTTGATTATATTTAAGA
ATCAAGCAAGCAGACCATATAACATCTACCCTCACGGAATCACTGATGTCCGTCCTTTGTATTC
AAGGAGATTACCAAAAGGTGTAAAACATTTGAAGGATTTTCCAATTCTGCCAGGAGAAATATT
CAAATATAAATGGACAGTGACTGTAGAAGATGGGCCAACTAAATCAGATCCTCGGTGCCTGAC
CCGCTATTACTCTAGTTTCGTTAATATGGAGAGAGATCTAGCTTCAGGACTCATTGGCCCTCTC
CTCATCTGCTACAAAGAATCTGTAGATCAAAGAGGAAACCAGATAATGTCAGACAAGAGGAA
TGTCATCCTGTTTTCTGTATTTGATGAGAACCGAAGCTGGTACCTCACAGAGAATATACAACGC
TTTCTCCCCAATCCAGCTGGAGTGCAGCTTGAGGATCCAGAGTTCCAAGCCTCCAACATCATGC
ACAGCATCAATGGCTATGTTTTTGATAGTTTGCAGTTGTCAGTTTGTTTGCATGAGGTGGCATA
CTGGTACATTCTAAGCATTGGAGCACAGACTGACTTCCTTTCTGTCTTCTTCTCTGGATATACCT
TCAAACACAAAATGGTCTATGAAGACACACTCACCCTATTCCCATTCTCAGGAGAAACTGTCTT
CATGTCGATGGAAAACCCAGGTCTATGGATTCTGGGGTGCCACAACTCAGACTTTCGGAACAG
AGGCATGACCGCCTTACTGAAGGTTTCTAGTTGTGACAAGAACACTGGTGATTATTACGAGGA
CAGTTATGAAGATATTTCAGCATACTTGCTGAGTAAAAACAATGCCATTGAACCAAGAAGTTTT
TCACAGAATCCACCTGTATTGACGCGGAGTTTCAGTCAGAACTCCAGGCACCCCTCTACTAGGC
AAAAACAGTTTAATGCAACCACAATACCTGAAAATGATATAGAGAAAACCGATCCCTGGTTCG
CACACCGAACCCCCATGCCAAAAATTCAAAACGTCTCCAGTTCCGATCTTCTCATGCTCTTGCG
CCAGTCACCCACACCACATGGTCTCTCCCTCAGCGACCTGCAAGAGGCGAAATATGAAACATT
TTCAGATGACCCTAGCCCCGGCGCTATTGATAGTAACAACTCTCTCAGTGAAATGACTCACTTT
CGGCCGCAGCTGCATCATTCTGGTGATATGGTATTCACCCCGGAATCAGGCCTCCAACTTAGA
CTTAACGAGAAACTGGGCACGACCGCCGCCACCGAGTTGAAGAAACTCGACTTCAAGGTTTCC
AGTACCAGCAACAACCTTATCAGCACTATCCCATCCGATAATCTCGCGGCCGGGACAGATAAT
ACATCATCACTTGGGCCACCCTCTATGCCGGTCCACTATGATTCCCAGTTGGACACAACTCTTTT
TGGTAAGAAGTCATCCCCACTCACCGAAAGCGGTGGACCTTTGTCTCTCTCTGAGGAGAATAA
TGACTCCAAGCTGCTTGAGTCAGGGTTGATGAACAGCCAAGAATCCTCATGGGGAAAAAACG
TTTCCTCCACCAGGGAAATAACTCGTACTACTCTTCAGTCAGATCAAGAGGAAATTGACTATGA
TGATACCATATCAGTTGAAATGAAGAAGGAAGATTTTGACATTTATGATGAGGATGAAAATCA
GAGCCCCCGCAGCTTTCAAAAGAAAACACGACACTATTTTATTGCTGCAGTGGAGAGGCTCTG GGATTATGGGATGAGTAGCTCCCCACATGTTCTAAGAAACAGGGCTCAGAGTGGCAGTGTCC
CT C AGTT CAAG AAAGTT GTTTT CC AGG AATTT ACT GAT G GCTCCTTT ACT C AG CCCTT AT ACCGT GGAGAACTAAATGAACATTTGGGACTCCTGGGGCCATATATAAGAGCAGAAGTTGAAGATAA T AT CAT G GTAACTTT C AG AAAT CAG GCCT CTCGTCCCT ATT CCTT CT ATT CT AG CCTT ATTT CTT A TGAGGAAGATCAGAGGCAAGGAGCAGAACCTAGAAAAAACTTTGTCAAGCCTAATGAAACCA AAACTT ACTTTT G G AAAGT GC AACAT CAT AT GG C ACCCACT AAAG AT G AGTTT G ACT G CAAAG CCTGGGCTTATTTCTCTGATGTTGACCTGGAAAAAGATGTGCACTCAGGCCTGATTGGACCCCT TCTGGTCTGCCACACTAACACACTGAACCCTGCTCATGGGAGACAAGTGACAGTACAGGAATT TG CTCT GTTTTT C ACC AT CTTT G ATG AG ACC AAAAG CT GGTACTT C ACT G AAAAT AT GG AAAG A AACTGCAGGGCTCCCTGCAATATCCAGATGGAAGATCCCACTTTTAAAGAGAATTATCGCTTCC AT G C A AT C A AT G G CT AC AT A AT G G ATAC ACT ACCTG G CTT AGTA AT G G CTC AG G AT C A A AG G A TTCGATGGTATCTGCTCAGCATGGGCAG C A AT G A A A AC ATCC ATT CT ATT C ATTT CAGTGGACA TGTGTTCACTGTACG A A A A AAAG AG G AGT AT AAAAT G G C ACTGT AC A AT CTCTATCC AG GTGT TTTTGAGACAGTGGAAATGTTACCATCCAAAGCTGGAATTTGGCGGGTGGAATGCCTTATTGG CG AG CAT CT AC AT G CTGG G ATG AG CAC ACTTTTT CTGGT GTAC AGC AAT AAGT GT CAG ACT CC CCTGGGAATGGCTTCTGGACACATTAGAGATTTTCAGATTACAGCTTCAGGACAATATGGACA GTGGGCCCCAAAGCTGGCCAGACTTCATTATTCCGGATCAATCAATGCCTGGAGCACCAAGGA GCCCTTTTCTTGGATCAAGGTGGATCTGTTGGCACCAATGATTATTCACGGCATCAAGACCCAG G GT G CCCGT CAG AAGTT CTCCAG CCT CT AC AT CT CT C AGTTT AT CAT CAT GT AT AGT CTT G ATG GGAAGAAGTGGCAGACTTATCGAGGAAATTCCACTGGAACCTTAATGGTCTTCTTTGGCAATG TGGATTCATCTGGGATAAAACACAATATTTTTAACCCTCCAATTATTGCTCGATACATCCGTTTG C ACCC A ACT C ATT ATAG C ATT CG C AG C ACT CTT CGCATGGAGTTGATGGGCTGTG ATTT AAAT A GTTGCAGCATGCCATTGGGAATGGAGAGTAAAGCAATATCAGATGCACAGATTACTGCTTCAT CCTACTTTACCAATATGTTTGCCACCTGGTCTCCTTCAAAAGCTCGACTTCACCTCCAAGGGAG GAGTAATGCCTGGAGACCTCAGGTGAATAATCCAAAAGAGTGGCTGCAAGTGGACTTCCAGA AG AC A AT G AAAGT CACAGGAGTAACTACTCAGGGAGT AAAAT CT CT G CTT ACC AG C ATGTATG TGAAGGAGTTCCTCATCTCCAGCAGTCAAGATGGCCATCAGTGGACTCTCTTTTTTCAGAATGG C A AAGTA A AG GTTTTT C AG G G AAAT CAAG ACT CCTT CACACCTGTGGTG A ACT CTCT AG ACCC A 
CCGTTACTGACTCGCTACCTTCGAATTCACCCCCAGAGTTGGGTGCACCAGATTGCCCTGAGG ATG GAG GTTCTGG GCTG CG AG GC ACAG G ACCTCTACTG A; b) ATG CAG ATT GAGCTGAGCACCTG CTT CTTCCT GTGCCTGCTGAGGTTCTG CTT CTCTGCCACCA GGAGATACTACCTGGGGGCTGTGGAGCTGAGCTGGGACTACATGCAGTCTGACCTGGGGGA G CTG CCTGTG G ATG CC AG GTTCCCCCCCAG AGTGCCC AAG AG CTTCCCCTTC AAC ACCTCTGTG
GTGTACAAGAAGACCCTGTTTGTGGAGTTCACTGACCACCTGTTCAACATTGCCAAGCCCAGG
CCCCCCTGGATGGGCCTGCTGGGCCCCACCATCCAGGCTGAGGTGTATGACACTGTGGTGATC
ACCCTGAAGAACATGGCCAGCCACCCTGTGAGCCTGCATGCTGTGGGGGTGAGCTACTGGAA
GGCCTCTGAGGGGGCTGAGTATGATGACCAGACCAGCCAGAGGGAGAAGGAGGATGACAAG
GTGTTCCCTGGGGGCAGCCACACCTATGTGTGGCAGGTGCTGAAGGAGAATGGCCCCATGGC
CTCTGACCCCCTGTGCCTGACCTACAGCTACCTGAGCCATGTGGACCTGGTGAAGGACCTGAA
CTCTGGCCTGATTGGGGCCCTGCTGGTGTGCAGGGAGGGCAGCCTGGCCAAGGAGAAGACC
CAGACCCTGCACAAGTTCATCCTGCTGTTTGCTGTGTTTGATGAGGGCAAGAGCTGGCACTCT
GAAACCAAGAACAGCCTGATGCAGGACAGGGATGCTGCCTCTGCCAGGGCCTGGCCCAAGAT
GCACACTGTGAATGGCTATGTGAACAGGAGCCTGCCTGGCCTGATTGGCTGCCACAGGAAGT
CTGTGTACTGGCATGTGATTGGCATGGGCACCACCCCTGAGGTGCACAGCATCTTCCTGGAGG
GCCACACCTTCCTGGTCAGGAACCACAGGCAGGCCAGCCTGGAGATCAGCCCCATCACCTTCC
TGACTGCCCAGACCCTGCTGATGGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCC
ACCAGCATGATGGCATGGAGGCCTATGTGAAGGTGGACAGCTGCCCTGAGGAGCCCCAGCTG
AGGATGAAGAACAATGAGGAGGCTGAGGACTATGATGATGACCTGACTGACTCTGAGATGG
ATGTGGTGAGGTTTGATGATGACAACAGCCCCAGCTTCATCCAGATCAGGTCTGTGGCCAAGA
AGCACCCCAAGACCTGGGTGCACTACATTGCTGCTGAGGAGGAGGACTGGGACTATGCCCCC
CTGGTGCTGGCCCCTGATGACAGGAGCTACAAGAGCCAGTACCTGAACAATGGCCCCCAGAG
GATTGGCAGGAAGTACAAGAAGGTCAGGTTCATGGCCTACACTGATGAAACCTTCAAGACCA
GGGAGGCCATCCAGCATGAGTCTGGCATCCTGGGCCCCCTGCTGTATGGGGAGGTGGGGGA
CACCCTGCTGATCATCTTCAAGAACCAGGCCAGCAGGCCCTACAACATCTACCCCCATGGCATC
ACTGATGTGAGGCCCCTGTACAGCAGGAGGCTGCCCAAGGGGGTGAAGCACCTGAAGGACTT
CCCCATCCTGCCTGGGGAGATCTTCAAGTACAAGTGGACTGTGACTGTGGAGGATGGCCCCAC
CAAGTCTGACCCCAGGTGCCTGACCAGATACTACAGCAGCTTTGTGAACATGGAGAGGGACCT
GGCCTCTGGCCTGATTGGCCCCCTGCTGATCTGCTACAAGGAGTCTGTGGACCAGAGGGGCA
ACCAGATCATGTCTGACAAGAGGAATGTGATCCTGTTCTCTGTGTTTGATGAGAACAGGAGCT
GGTACCTGACTGAGAACATCCAGAGGTTCCTGCCCAACCCTGCTGGGGTGCAGCTGGAGGAC
CCTGAGTTCCAGGCCAGCAACATCATGCACAGCATCAATGGCTATGTGTTTGACAGCCTGCAG
CTGTCTGTGTGCCTGCATGAGGTGGCCTACTGGTACATCCTGAGCATTGGGGCCCAGACTGAC
TTCCTGTCTGTGTTCTTCTCTGGCTACACCTTCAAGCACAAGATGGTGTATGAGGACACCCTGA
CCCTGTTCCCCTTCTCTGGGGAGACTGTGTTCATGAGCATGGAGAACCCTGGCCTGTGGATTCT GGGCTGCCACAACTCTGACTTCAGGAACAGGGGCATGACTGCCCTGCTGAAAGTCTCCAGCTG
TGACAAGAACACTGGGGACTACTATGAGGACAGCTATGAGGACATCTCTGCCTACCTGCTGAG
CAAGAACAATGCCATTGAGCCCAGGAGTTTTTCACAGAATCCACCTGTATTGACGCGGAGTTT
CAGTCAGAACTCCAGGCACCCCTCTACTAGGCAAAAACAGTTTAATGCAACCACAATACCTGA
AAATGATATAGAGAAAACCGATCCCTGGTTCGCACACCGAACCCCCATGCCAAAAATTCAAAA
CGTCTCCAGTTCCGATCTTCTCATGCTCTTGCGCCAGTCACCCACACCACATGGTCTCTCCCTCA
GCGACCTGCAAGAGGCGAAATATGAAACATTTTCAGATGACCCTAGCCCCGGCGCTATTGATA
GTAACAACTCTCTCAGTGAAATGACTCACTTTCGGCCGCAGCTGCATCATTCTGGTGATATGGT
ATTCACCCCGGAATCAGGCCTCCAACTTAGACTTAACGAGAAACTGGGCACGACCGCCGCCAC
CGAGTTGAAGAAACTCGACTTCAAGGTTTCCAGTACCAGCAACAACCTTATCAGCACTATCCCA
TCCGATAATCTCGCGGCCGGGACAGATAATACATCATCACTTGGGCCACCCTCTATGCCGGTC
CACTATGATTCCCAGTTGGACACAACTCTTTTTGGTAAGAAGTCATCCCCACTCACCGAAAGCG
GTGGACCTTTGTCTCTCTCTGAGGAGAATAATGACTCCAAGCTGCTTGAGTCAGGGTTGATGA
ACAGCCAAGAATCCTCATGGGGAAAAAACGTTTCCTCCACCAGGGAGATCACCAGGACCACCC
TGCAGTCTGACCAGGAGGAGATTGACTATGATGACACCATCTCTGTGGAGATGAAGAAGGAG
GACTTTGACATCTACGACGAGGACGAGAACCAGAGCCCCAGGAGCTTCCAGAAGAAGACCAG
GCACTACTTCATTGCTGCTGTGGAGAGGCTGTGGGACTATGGCATGAGCAGCAGCCCCCATGT
GCTGAGGAACAGGGCCCAGTCTGGCTCTGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGT
TCACTGATGGCAGCTTCACCCAGCCCCTGTACAGAGGGGAGCTGAATGAGCACCTGGGCCTG
CTGGGCCCCTACATCAGGGCTGAGGTGGAGGACAACATCATGGTGACCTTCAGGAACCAGGC
CAGCAGGCCCTACAGCTTCTACAGCAGCCTGATCAGCTATGAGGAGGACCAGAGGCAGGGGG
CTGAGCCCAGGAAGAACTTTGTGAAGCCCAATGAAACCAAGACCTACTTCTGGAAGGTGCAG
CACCACATGGCCCCCACCAAGGATGAGTTTGACTGCAAGGCCTGGGCCTACTTCTCTGATGTG
GACCTGGAGAAGGATGTGCACTCTGGCCTGATTGGCCCCCTGCTGGTGTGCCACACCAACACC
CTGAACCCTGCCCATGGCAGGCAGGTGACTGTGCAGGAGTTTGCCCTGTTCTTCACCATCTTTG
ATGAAACCAAGAGCTGGTACTTCACTGAGAACATGGAGAGGAACTGCAGGGCCCCCTGCAAC
ATCCAGATGGAGGACCCCACCTTCAAGGAGAACTACAGGTTCCATGCCATCAATGGCTACATC
ATGGACACCCTGCCTGGCCTGGTGATGGCCCAGGACCAGAGGATCAGGTGGTACCTGCTGAG
CATGGGCAGCAATGAGAACATCCACAGCATCCACTTCTCTGGCCATGTGTTCACTGTGAGGAA
GAAGGAGGAGTACAAGATGGCCCTGTACAACCTGTACCCTGGGGTGTTTGAGACTGTGGAGA
TGCTGCCCAGCAAGGCTGGCATCTGGAGGGTGGAGTGCCTGATTGGGGAGCACCTGCATGCT
GGCATGAGCACCCTGTTCCTGGTGTACAGCAACAAGTGCCAGACCCCCCTGGGCATGGCCTCT GGCCACATCAGGGACTTCCAGATCACTGCCTCTGGCCAGTATGGCCAGTGGGCCCCCAAGCTG
GCCAGGCTGCACTACTCTGGCAGCATCAATGCCTGGAGCACCAAGGAGCCCTTCAGCTGGATC AAGGTGGACCTGCTGGCCCCCATGATCATCCATGGCATCAAGACCCAGGGGGCCAGGCAGAA GTTCAGCAGCCTGTACATCAGCCAGTTCATCATCATGTACAGCCTGGATGGCAAGAAGTGGCA GACCTACAGGGGCAACAGCACTGGCACCCTGATGGTGTTCTTTGGCAATGTGGACAGCTCTG G CAT C AAGC AC AAC AT CTT CAACCCCCCCAT CATTGCC AG AT ACAT C AGG CTGC ACCCCACCC A CTACAGCATCAGGAGCACCCTGAGGATGGAGCTGATGGGCTGTGACCTGAACAGCTGCAGCA TGCCCCTGGGCATGGAGAGCAAGGCCATCTCTGATGCCCAGATCACTGCCAGCAGCTACTTCA CCAACATGTTTGCCACCTGGAGCCCCAGCAAGGCCAGGCTGCACCTGCAGGGCAGGAGCAAT GCCTGGAGGCCCCAGGTCAACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAGAAGACCAT GAAGGTGACTGGGGTGACCACCCAGGGGGTGAAGAGCCTGCTGACCAGCATGTATGTGAAG GAGTTCCTGATCAGCAGCAGCCAGGATGGCCACCAGTGGACCCTGTTCTTCCAGAATGGCAA GGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCTGTGGTGAACAGCCTGGACCCCC CCCTG CTG ACCAG ATACCTG AG G ATTCACCCCC AG AG CTG G GTG C ACC AG ATTG CCCTG AG G A TG G AG GTGCTG GG CTGTG AG GCCCAG G ACCTGTACTG A; c) AT G C T GAG C AG GAG AG AC AG AT C AC AG C C AG AG AAG G G G C C AG T C G G AAAAT CTTATCTAAGCTTTCTTTGCCTACCCGTGCCTGGGAACCAGCAATGAAGAAG AGTTTTGCTTTTGACAATGTTGGCTATGAAGGTGGTCTGGATGGCCTGGGCC CTTCTTCTCAGGTGGCCACCAGCACAGTCAGGATCTTGGGCATGACTTGCCA G T CAT G T G T G AAG T C C AT T GAG G AC AG GAT T T C C AAT T T G AAAG G CAT CATC AGCATGAAGGTTTCCCTGGAACAAGGCAGTGCCACTGTGAAATATGTGCCAT CGGTTGTGTGCCTGCAACAGGTTTGCCATCAAATTGGGGACATGGGCTTCGA GGCCAGCATTGCAGAAGGAAAGGCAGCCTCCTGGCCCTCAAGGTCCTTGCCT GCCCAGGAGGCTGTGGTCAAGCTCCGGGTGGAGGGCATGACCTGCCAGTCCT G T GT C AGC T C CAT T G AAG G C AAG G T C CGGAAAC TGCAAGGAG TAG T G AGAG T C AAAG T C T C AC T C AG C AAC C AAG AG G C C G T CAT C AC T TAT C AG C C T TAT C T C ATT C AG C C C G AAG AC C T C AG G G AC C AT G T AAAT G AC AT G G GAT T T G AAG CTG CCATCAAGAGCAAAGTGGCTCCCTTAAGCCTGGGACCAATTGATATTGAGCG G T T AC AAAG C AC T AAC C C AAAG AG AC CTT TAT C T T C T G C T AAC CAGAAT T T T AATAATTCTGAGACCTTGGGGCACCAAGGAAGCCATGTGGTCACCCTCCAAC T GAG AAT AG AT G G AAT G CAT T G T AAG TCTTGCGTCTT GAAT AT T G AAG AAAA
TATTGGCCAGCTCCTAGGGGTTCAAAGTATTCAAGTGTCCTTGGAGAACAAA ACTGCCCAAGTAAAGTATGACCCTTCTTGTACCAGCCCAGTGGCTCTGCAGA
GGGCTATCGAGGCACTTCCACCTGGGAATTTTAAAGTTTCTCTTCCTGATGG AGCCGAAGGGAGTGGGACAGATCACAGGTCTTCCAGTTCTCATTCCCCTGGC TCCCCACCGAGAAACCAGGTCCAGGGCACATGCAGTACCACTCTGATTGCCA TTGCCGGCATGACCTGTGCATCCTGTGTCCATTCCATTGAAGGCATGATCTC CCAACTGGAAGGGGTGCAGCAAATATCGGTGTCTTTGGCCGAAGGGACTGCA ACAGTTCTTTATAATCCCTCTGTAATTAGCCCAGAAGAACTCAGAGCTGCTA TAGAAGACATGGGATTTGAGGCTTCAGTCGTTTCTGAAAGCTGTTCTACTAA CCCTCTTGGAAACCACAGTGCTGGGAATTCCATGGTGCAAACTACAGATGGT ACACCTACATCTGTGCAGGAAGTGGCTCCCCACACTGGGAGGCTCCCTGCAA ACCATGCCCCGGACATCTTGGCAAAGTCCCCACAATCAACCAGAGCAGTGGC ACCGCAGAAGTGCTTCTTACAGATCAAAGGCATGACCTGTGCATCCTGTGTG TCTAACATAGAAAGGAATCTGCAGAAAGAAGCTGGTGTTCTCTCCGTGTTGG T T G C C T T GAT G G C AG G AAAG G C AG AG AT C AAG TAT G AC C C AG AG G T CAT C C A GCCCCTCGAGATAGCTCAGTTCATCCAGGACCTGGGTTTTGAGGCAGCAGTC AT G GAG G AC T AC G C AG G C T C C GAT G G C AAC AT T GAG C T G AC AAT C AC AG G G A TGACCTGCGCGTCCTGTGTCCACAACATAGAGTCCAAACTCACGAGGACAAA TGGCATCACTTATGCCTCCGTTGCCCTTGCCACCAGCAAAGCCCTTGTTAAG T T T G AC C C G G AAAT TAT C G G T C C AC G G GAT AT TAT C AAAAT TAT T GAG G AAA TTGGCTTTCATGCTTCCCTGGCCCAGAGAAACCCCAACGCTCATCACTTGGA CCACAAGATGGAAATAAAGCAGTGGAAGAAGTCTTTCCTGTGCAGCCTGGTG TTTGGCATCCCTGTCATGGCCTTAATGATCTATATGCTGATACCCAGCAACG AGCCCCACCAGTCCATGGTCCTGGACCACAACATCATTCCAGGACTGTCCAT TCTAAATCTCATCTTCTTTATCTTGTGTACCTTTGTCCAGCTCCTCGGTGGG T G G T AC T T C T AC G T T C AG G C C T AC AAAT C T C T GAG AC AC AG G T C AG C C AAC A TGGACGTGCTCATCGTCCTGGCCACAAGCATTGCTTATGTTTATTCTCTGGT CATCCTGGTGGTTGCTGTGGCTGAGAAGGCGGAGAGGAGCCCTGTGACATTC TTCGACACGCCCCCCATGCTCTTTGTGTTCATTGCCCTGGGCCGGTGGCTGG AACACTTGGCAAAGAGCAAAACCTCAGAAGCCCTGGCTAAACTCATGTCTCT CCAAGCCACAGAAGCCACCGTTGTGACCCTTGGTGAGGACAATTTAATCATC
AGGGAGGAGCAAGTCCCCATGGAGCTGGTGCAGCGGGGCGATATCGTCAAGG TGGTCCCTGGGGGAAAGTTTCCAGTGGATGGGAAAGTCCTGGAAGGCAATAC
C AT G G C T GAT GAG TCCCTCAT C AC AG G AG AAG C CAT G C C AG T C AC T AAG AAA CCCGGAAGCACTGTAATTGCGGGGTCTATAAATGCACATGGCTCTGTGCTCA TTAAAGCTACCCACGTGGGCAATGACACCACTTTGGCTCAGATTGTGAAACT GGTGGAAGAGGCTCAGATGTCAAAGGCACCCATTCAGCAGCTGGCTGACCGG TTTAGTGGATATTTTGTCCCATTTATCATCATCATGTCAACTTTGACGTTGG TGGTATGGATTGTAATCGGTTTTATCGATTTTGGTGTTGTTCAGAGATACTT TCCTAACCCCAACAAGCACATCTCCCAGACAGAGGTGATCATCCGGTTTGCT TTCCAGACGTCCATCACGGTGCTGTGCATTGCCTGCCCCTGCTCCCTGGGGC TGGCCACGCCCACGGCTGTCATGGTGGGCACCGGGGTGGCCGCGCAGAACGG C AT C C T CAT C AAG G GAG G C AAG C C C C T G GAG AT G G C G C AC AAG AT AAAG AC T GTGATGTTTGACAAGACTGGCACCATTACCCATGGCGTCCCCAGGGTCATGC GGGTGCTCCTGCTGGGGGATGTGGCCACACTGCCCCTCAGGAAGGTTCTGGC TGTGGTGGGGACTGCGGAGGCCAGCAGTGAACACCCCTTGGGCGTGGCAGTC AC C AAAT AC T G T AAAG AG G AAC T T G G AAC AG AG AC C T T G G GAT AC T G C AC G G ACTTCCAGGCAGTGCCAGGCTGTGGAATTGGGTGCAAAGTCAGCAACGTGGA AGGCATCCTGGCCCACAGTGAGCGCCCTTTGAGTGCACCGGCCAGTCACCTG AATGAGGCTGGCAGCCTTCCCGCAGAAAAAGATGCAGTCCCCCAGACCTTCT CTGTGCTGATTGGAAACCGTGAGTGGCTGAGGCGCAACGGTTTAACCATTTC TAG C GAT G T C AG T G AC G C TAT G AC AG AC C AC GAG AT G AAAG G AC AG AC AG C C ATCCTGGTGGCTATTGACGGTGTGCTCTGTGGGATGATCGCAATCGCAGACG CTGTCAAGCAGGAGGCTGCCCTGGCTGTGCACACGCTGCAGAGCATGGGTGT GGACGTGGTTCTGATCACGGGGGACAACCGGAAGACAGCCAGAGCTATTGCC ACCCAGGTTGGCATCAACAAAGTCTTTGCAGAGGTGCTGCCTTCGCACAAGG TGGCCAAGGTCCAGGAGCTCCAGAATAAAGGGAAGAAAGTCGCCATGGTGGG GGATGGGGTCAATGACTCCCCGGCCTTGGCCCAGGCAGACATGGGTGTGGCC ATTGGCACCGGCACGGATGTGGCCATCGAGGCAGCCGACGTCGTCCTTATCA GAAATGATTTGCTGGATGTGGTGGCTAGCATTCACCTTTCCAAGAGGACTGT CCGAAGGATACGCATCAACCTGGTCCTGGCACTGATTTATAACCTGGTTGGG ATACCCATTGCAGCAGGTGTCTTCATGCCCATCGGCATTGTGCTGCAGCCCT
GGATGGGCTCAGCGGCCATGGCAGCCTCCTCTGTGTCTGTGGTGCTCTCATC CCTGCAGCTCAAGTGCTATAAGAAGCCTGACCTGGAGAGGTATGAGGCACAG GCGCATGGCCACATGAAGCCCCTGACGGCATCCCAGGTCAGTGTGCACATAG GCATGGATGACAGGTGGCGGGACTCCCCCAGGGCCACACCATGGGACCAGGT CAGCTATGTCAGCCAGGTGTCGCTGTCCTCCCTGACGTCCGACAAGCCATCT CGGCACAGCGCTGCAGCAGACGATGATGGGGACAAGTGGTCTCTGCTCCTGA AT G G C AG G GAT GAG GAG C AG T AC AT C T G A ; d) ATGCCAGAGCAGGAGAGGCAGATCACCGCAAGAGAGGGAGCATCCAGGAAGATCCTGT CCAAGCTGTCTCTGCCAACAAGGGCATGGGAGCCTGCAATGAAGAAGTCTTTCGCCTTT GACAACGTGGGATATGAGGGAGGCCTGGATGGCCTGGGACCTAGCTCCCAGGTGGCCAC CAGCACAGTGAGAATCCTGGGCATGACCTGCCAGTCTTGCGTGAAGAGCATCGAGGACA GGATCTCCAATCTGAAGGGCATCATCTCCATGAAGGTGTCTCTGGAGCAGGGCTCTGCC ACAGTGAAGTACGTGCCCAGCGTGGTGTGCCTGCAGCAGGTGTGCCACCAGATCGGCGA TATGGGCTTCGAGGCATCCATCGCAGAGGGCAAGGCAGCATCTTGGCCATCCAGATCTC TGCCTGCCCAGGAGGCCGTGGTGAAGCTGAGGGTGGAAGGAATGACCTGCCAGTCCTGC GTGAGCAGCATCGAGGGCAAGGTGAGAAAGCTGCAGGGCGTGGTGAGGGTGAAGGTGA GCCTGTCCAACCAGGAGGCCGTGATCACATACCAGCCATATCTGATCCAGCCCGAGGAC CTGCGGGATCACGTGAATGACATGGGCTTCGAGGCCGCCATCAAGAGCAAGGTGGCACC TCTGTCCCTGGGACCAATCGATATCGAGCGCCTGCAGTCCACCAACCCTAAGCGGCCAC TGTCCTCTGCCAACCAGAACTTCAACAATAGCGAGACACTGGGACACCAGGGCTCCCAC GTGGTGACACTGCAGCTGCGCATCGACGGCATGCACTGCAAGAGCTGCGTGCTGAACAT CGAGGAGAATATCGGCCAGCTGCTGGGCGTGCAGAGCATCCAGGTGTCCCTGGAGAACA AGACCGCCCAGGTGAAGTATGATCCCAGCTGCACATCCCCTGTGGCCCTGCAGAGGGCA ATCGAGGCCCTGCCCCCTGGCAATTTCAAGGTGTCTCTGCCAGACGGAGCAGAGGGCAG CGGAACCGATCACCGCAGCTCCTCTAGCCACTCTCCTGGCAGCCCACCAAGGAACCAGGT GCAGGGAACCTGTTCTACCACACTGATCGCCATCGCCGGCATGACATGCGCCTCTTGCG TGCACAGCATCGAGGGCATGATCAGCCAGCTGGAGGGCGTGCAGCAGATCTCTGTGAGC CTGGCAGAGGGAACCGCAACAGTGCTGTACAATCCATCCGTGATCTCTCCCGAGGAGCT GAGAGCCGCCATCGAGGACATGGGCTTTGAGGCCTCCGTGGTGTCCGAGTCTTGCAGCA CCAACCCCCTGGGCAATCACTCCGCCGGCAACTCTATGGTGCAGACCACAGACGGCACCC CAACAAGCGTGCAGGAGGTGGCACCACACACCGGCAGACTGCCTGCCAATCACGCCCCA GATATCCTGGCCAAGAGCCCTCAGTCCACAAGGGCAGTGGCACCACAGAAGTGCTTCCT GCAGATCAAGGGCATGACCTGCGCCTCCTGCGTGAGCAACATCGAGAGGAATCTGCAGA
AGGAGGCAGGCGTGCTGTCCGTGCTGGTGGCCCTGATGGCAGGCAAGGCCGAGATCAAG TACGATCCTGAAGTGATCCAGCCACTGGAGATCGCCCAGTTTATCCAGGACCTGGGCTT
CGAGGCCGCCGTGATGGAGGATTATGCCGGCAGCGACGGCAACATCGAGCTGACCATCA
CAGGCATGACCTGCGCCTCTTGCGTGCACAACATCGAGAGCAAGCTGACCCGCACAAAT
GGCATCACATACGCATCTGTGGCCCTGGCCACCAGCAAGGCCCTGGTGAAGTTTGATCC
CGAGATCATCGGCCCTCGGGACATCATCAAGATCATCGAGGAGATCGGCTTCCACGCCA
GCCTGGCCCAGAGAAACCCCAATGCCCACCACCTGGATCACAAGATGGAGATCAAGCAG
TGGAAGAAGAGCTTTCTGTGCTCCCTGGTGTTCGGCATCCCTGTGATGGCCCTGATGAT
CTACATGCTGATCCCTTCCAACGAGCCACACCAGTCTATGGTGCTGGACCACAACATCA
TCCCAGGCCTGTCCATCCTGAATCTGATCTTCTTTATCCTGTGCACATTTGTGCAGCTGC
TGGGCGGCTGGTACTTCTATGTGCAGGCCTATAAGAGCCTGCGGCACAGATCCGCCAAT
ATGGATGTGCTGATCGTGCTGGCCACCAGCATCGCCTACGTGTATTCCCTGGTCATCCT
GGTGGTGGCAGTGGCAGAGAAGGCAGAGCGGAGCCCCGTGACCTTCTTTGACACACCCC
CTATGCTGTTCGTGTTTATCGCCCTGGGCAGATGGCTGGAGCACCTGGCCAAGAGCAAG
ACCTCCGAGGCCCTGGCCAAGCTGATGAGCCTGCAGGCCACAGAGGCCACCGTGGTGAC
ACTGGGCGAGGATAACCTGATCATCAGGGAGGAGCAGGTGCCAATGGAGCTGGTGCAG
CGCGGCGACATCGTGAAGGTGGTGCCAGGCGGCAAGTTTCCCGTGGATGGCAAGGTGCT
GGAGGGCAATACAATGGCAGACGAGTCCCTGATCACCGGAGAGGCCATGCCTGTGACCA
AGAAGCCAGGCTCTACAGTGATCGCAGGCAGCATCAACGCACACGGCTCCGTGCTGATC
AAGGCCACACACGTGGGCAATGATACCACACTGGCCCAGATCGTGAAGCTGGTGGAGGA
GGCCCAGATGAGCAAGGCACCAATCCAGCAGCTGGCAGACCGGTTTTCTGGCTACTTCG
TGCCTTTTATCATCATCATGAGCACCCTGACACTGGTGGTGTGGATCGTGATCGGCTTC
ATCGACTTTGGCGTGGTGCAGAGGTATTTCCCAAACCCCAATAAGCACATCTCCCAGAC
CGAAGTGATCATCCGCTTCGCCTTTCAGACCTCCATCACCGTGCTGTGCATCGCCTGCCC
TTGTTCTCTGGGCCTGGCCACCCCAACAGCCGTGATGGTGGGAACAGGAGTGGCAGCAC
AGAACGGCATCCTGATCAAGGGCGGCAAGCCCCTGGAGATGGCCCACAAGATCAAGACC
GTGATGTTCGATAAGACCGGCACAATCACCCACGGCGTGCCAAGAGTGATGAGAGTGCT
GCTGCTGGGCGACGTGGCCACACTGCCACTGAGAAAGGTGCTGGCAGTGGTGGGAACCG
CAGAGGCCAGCTCCGAGCACCCCCTGGGCGTGGCCGTGACAAAGTACTGCAAGGAGGAG
CTGGGCACAGAGACACTGGGCTATTGTACCGACTTTCAGGCAGTGCCTGGATGCGGAAT
CGGCTGTAAGGTGTCCAACGTGGAGGGCATCCTGGCACACTCTGAGCGGCCCCTGTCTG
CCCCTGCAAGCCACCTGAATGAGGCAGGCAGCCTGCCAGCAGAGAAGGATGCAGTGCCT
CAGACATTCTCCGTGCTGATCGGCAACAGAGAGTGGCTGCGGAGAAATGGCCTGACCAT
CTCTAGCGACGTGAGCGACGCCATGACAGACCACGAGATGAAGGGCCAGACCGCCATCC
TGGTGGCCATCGATGGCGTGCTGTGCGGCATGATCGCCATCGCAGACGCAGTGAAGCAG GAGGCCGCCCTGGCAGTGCACACCCTGCAGTCTATGGGCGTGGATGTGGTGCTGATCAC
CGGCGACAACAGGAAGACAGCAAGGGCAATCGCAACCCAAGTGGGCATCAATAAGGTG
TTTGCCGAGGTGCTGCCATCCCACAAGGTGGCCAAGGTGCAGGAGCTGCAGAACAAGGG
CAAGAAGGTGGCCATGGTGGGCGATGGCGTGAATGACTCTCCCGCCCTGGCACAGGCAG
ATATGGGAGTGGCAATCGGCACAGGAACCGATGTGGCAATCGAGGCAGCAGACGTGGT
GCTGATCCGGAACGATCTGCTGGACGTGGTGGCCTCCATCCACCTGTCTAAGCGGACCG
TGAGGCGCATCAGAATCAACCTGGTGCTGGCCCTGATCTACAATCTGGTGGGCATCCCT
ATCGCAGCAGGCGTGTTCATGCCAATCGGCATCGTGCTGCAGCCATGGATGGGCAGCGC
CGCAATGGCAGCATCCAGCGTGAGCGTGGTGCTGAGCTCCCTGCAGCTGAAGTGTTACA
AGAAGCCTGACCTGGAGAGGTATGAGGCCCAGGCCCACGGCCACATGAAGCCACTGACC
GCCTCTCAGGTGAGCGTGCACATCGGCATGGACGATAGGTGGAGGGATAGCCCAAGGGC
AACACCATGGGACCAGGTGTCCTACGTGTCTCAGGTGAGCCTGTCTAGCCTGACCTCTG
ATAAGCCATCCAGGCACAGCGCCGCCGCCGACGATGACGGCGACAAGTGGAGCCTGCTG
CTGAATGGCCGCGATGAGGAGCAGTACATC
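As an illustration of how a nucleotide listing such as sequence d) above relates to the protein it encodes, the following minimal Python sketch strips whitespace from a listing and translates it with the standard genetic code. The function names `normalize` and `translate` are assumptions made for this example and do not appear in the application.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def normalize(listing: str) -> str:
    """Remove all whitespace from a nucleotide listing and upper-case it."""
    seq = "".join(listing.split()).upper()
    unexpected = set(seq) - set(BASES)
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def translate(seq: str) -> str:
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # First nine codons of sequence d), with spaces added for illustration.
    print(translate(normalize("ATG CCAGAG CAGGAG AGGCAGATCACC")))  # MPEQERQIT
```

Translating two listings and comparing the results is one simple way to check that a codon-optimized sequence still encodes the same protein as its unoptimized counterpart.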
13. A vector system to express a coding sequence in a cell, said coding sequence consisting of a first portion (CDS1) and a second portion (CDS2), said vector system comprising: c) a first vector comprising:
- said first portion of said coding sequence (CDS1),
- a first intein nucleotide sequence coding for an N-intein, said N-intein having at least 80 % identity with SEQ ID No 3, 5, 7, 9, 11, 13 or a variant thereof or a fragment thereof or a homolog thereof and wherein said first intein nucleotide sequence is located at the 3' end of CDS1; and d) a second vector comprising:
- said second portion of said coding sequence (CDS2),
- a second intein nucleotide sequence coding for a C-intein, said C-intein having at least 80 % identity with SEQ ID No. 4, 6, 8, 10, 12, 14 or a variant thereof or a fragment thereof or a homolog thereof and wherein said second intein nucleotide sequence is located at the 5' end of CDS2; wherein said coding sequence encodes a sequence selected from the group of: i)
MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN
TSVVYKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAV
GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH
VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD
AASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH
RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE
EAEDYDDDLTDSEMDVVRFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA
PDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTL
LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGP
TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE
NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHSINGYVFDSLQLSVCLHEVAYWYILS
IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRG
MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS
TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE
AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE
LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT
ESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTREITRTTLQSDQEEIDYDDT
ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV
PQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS
SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV
HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQM
EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRK
KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQ
GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIAR
YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSK
ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD
GHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE
AQDLY or ii) MQIELSTCFFLCLLRFCFSATRRYYLGAVELSWDYMQSDLGELPVDARFPPRVPKSFPFN
TSVVYKKTLFVEFTDHLFNIAKPRPPWMGLLGPTIQAEVYDTVVITLKNMASHPVSLHAV
GVSYWKASEGAEYDDQTSQREKEDDKVFPGGSHTYVWQVLKENGPMASDPLCLTYSYLSH
VDLVKDLNSGLIGALLVCREGSLAKEKTQTLHKFILLFAVFDEGKSWHSETKNSLMQDRD
AASARAWPKMHTVNGYVNRSLPGLIGCHRKSVYWHVIGMGTTPEVHSIFLEGHTFLVRNH
RQASLEISPITFLTAQTLLMDLGQFLLFCHISSHQHDGMEAYVKVDSCPEEPQLRMKNNE
EAEDYDDDLTDSEMDVVRFDDDNSPSFIQIRSVAKKHPKTWVHYIAAEEEDWDYAPLVLA
PDDRSYKSQYLNNGPQRIGRKYKKVRFMAYTDETFKTREAIQHESGILGPLLYGEVGDTL
LIIFKNQASRPYNIYPHGITDVRPLYSRRLPKGVKHLKDFPILPGEIFKYKWTVTVEDGP
TKSDPRCLTRYYSSFVNMERDLASGLIGPLLICYKESVDQRGNQIMSDKRNVILFSVFDE
NRSWYLTENIQRFLPNPAGVQLEDPEFQASNIMHSINGYVFDSLQLSVCLHEVAYWYILS
IGAQTDFLSVFFSGYTFKHKMVYEDTLTLFPFSGETVFMSMENPGLWILGCHNSDFRNRG
MTALLKVSSCDKNTGDYYEDSYEDISAYLLSKNNAIEPRSFSQNPPVLTRSFSQNSRHPS
TRQKQFNATTIPENDIEKTDPWFAHRTPMPKIQNVSSSDLLMLLRQSPTPHGLSLSDLQE
AKYETFSDDPSPGAIDSNNSLSEMTHFRPQLHHSGDMVFTPESGLQLRLNEKLGTTAATE
LKKLDFKVSSTSNNLISTIPSDNLAAGTDNTSSLGPPSMPVHYDSQLDTTLFGKKSSPLT
ESGGPLSLSEENNDSKLLESGLMNSQESSWGKNVSSTRHQREITRTTLQSDQEEIDYDDT
ISVEMKKEDFDIYDEDENQSPRSFQKKTRHYFIAAVERLWDYGMSSSPHVLRNRAQSGSV
PQFKKVVFQEFTDGSFTQPLYRGELNEHLGLLGPYIRAEVEDNIMVTFRNQASRPYSFYS
SLISYEEDQRQGAEPRKNFVKPNETKTYFWKVQHHMAPTKDEFDCKAWAYFSDVDLEKDV
HSGLIGPLLVCHTNTLNPAHGRQVTVQEFALFFTIFDETKSWYFTENMERNCRAPCNIQM
EDPTFKENYRFHAINGYIMDTLPGLVMAQDQRIRWYLLSMGSNENIHSIHFSGHVFTVRK
KEEYKMALYNLYPGVFETVEMLPSKAGIWRVECLIGEHLHAGMSTLFLVYSNKCQTPLGM
ASGHIRDFQITASGQYGQWAPKLARLHYSGSINAWSTKEPFSWIKVDLLAPMIIHGIKTQ
GARQKFSSLYISQFIIMYSLDGKKWQTYRGNSTGTLMVFFGNVDSSGIKHNIFNPPIIAR
YIRLHPTHYSIRSTLRMELMGCDLNSCSMPLGMESKAISDAQITASSYFTNMFATWSPSK
ARLHLQGRSNAWRPQVNNPKEWLQVDFQKTMKVTGVTTQGVKSLLTSMYVKEFLISSSQD
GHQWTLFFQNGKVKVFQGNQDSFTPVVNSLDPPLLTRYLRIHPQSWVHQIALRMEVLGCE
AQDLY or iii) SEQ ID NO: 21, wherein, when the first vector and the second vector are inserted in a cell, the protein product of the coding sequence is produced by protein splicing.
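The arrangement recited in claim 13 can be modelled at the protein level with a minimal Python sketch: the first vector yields the CDS1 product fused to an N-intein at its C-terminus, the second yields a C-intein fused to the N-terminus of the CDS2 product, and splicing joins the two flanking parts while removing the intein halves. The intein strings below are hypothetical placeholders, not the sequences of SEQ ID No 3 to 14, and all function names are assumptions made for illustration. The check that the first CDS2-encoded residue is Cys, Ser or Thr reflects general intein biochemistry rather than a requirement stated in the claim.

```python
# Hypothetical placeholder intein halves (NOT the sequences of SEQ ID No 3-14).
N_INTEIN = "CINTEINN"
C_INTEIN = "MINTEINCN"


def n_precursor(cds1_protein: str, n_intein: str = N_INTEIN) -> str:
    """Product of the first vector: CDS1 protein followed by the N-intein."""
    return cds1_protein + n_intein


def c_precursor(cds2_protein: str, c_intein: str = C_INTEIN) -> str:
    """Product of the second vector: C-intein followed by the CDS2 protein."""
    return c_intein + cds2_protein


def trans_splice(prec_n: str, prec_c: str,
                 n_intein: str = N_INTEIN, c_intein: str = C_INTEIN) -> str:
    """Join the two flanking protein parts and drop both intein halves."""
    if not n_intein or not c_intein:
        raise ValueError("both intein halves are required")
    if not prec_n.endswith(n_intein):
        raise ValueError("first precursor does not end with the N-intein")
    if not prec_c.startswith(c_intein):
        raise ValueError("second precursor does not start with the C-intein")
    n_part = prec_n[:len(prec_n) - len(n_intein)]
    c_part = prec_c[len(c_intein):]
    # Assumption from general intein biochemistry: the residue right after
    # the C-intein acts as the nucleophile and is Cys, Ser or Thr.
    if not c_part or c_part[0] not in "CST":
        raise ValueError("first residue after the C-intein must be C, S or T")
    return n_part + c_part


if __name__ == "__main__":
    full_length = "MQIELSTCFFLCLLRF"      # first 16 residues of sequence i)
    cds1, cds2 = full_length[:7], full_length[7:]   # split before the Cys
    spliced = trans_splice(n_precursor(cds1), c_precursor(cds2))
    print(spliced == full_length)
```

The split point in the usage example is chosen only so that the second part begins with a cysteine; it is not a split position disclosed in the application.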
14. The vector system according to any preceding claim, wherein at least one of the first vector and the second vector further comprises at least one enhancer or regulatory nucleotide sequence, operably linked to the coding sequence.
15. The vector system according to any preceding claim comprising: c) a first vector comprising in a 5'-3' direction:
- a 5'-inverted terminal repeat (5'-ITR) sequence;
- a promoter sequence;
- a 5' end portion of a coding sequence (CDS1), said 5' end portion being operably linked to and under control of said promoter;
- a first intein nucleotide sequence coding for an N-intein; and
- a 3'-inverted terminal repeat (3'-ITR) sequence; and d) a second vector comprising in a 5'-3' direction:
- a 5'-inverted terminal repeat (5'-ITR) sequence;
- a promoter sequence;
- a second intein nucleotide sequence coding for a C-intein;
- a 3' end portion of the coding sequence (CDS2); and
- a 3'-inverted terminal repeat (3'-ITR) sequence.
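The element order recited in claim 15 can be written down as data and checked mechanically. The following minimal Python sketch assumes a vector is described as a list of element labels read in the 5'-3' direction; the labels and the function name `matches_layout` are assumptions made for illustration. Because the claim uses "comprising", the check allows further elements (for example an enhancer or a polyadenylation signal) between the recited ones.

```python
# Required elements of each vector, in 5'-3' order, as recited in claim 15.
FIRST_VECTOR_LAYOUT = ("5'-ITR", "promoter", "CDS1", "N-intein", "3'-ITR")
SECOND_VECTOR_LAYOUT = ("5'-ITR", "promoter", "C-intein", "CDS2", "3'-ITR")


def matches_layout(elements, required) -> bool:
    """Return True if every required element occurs in `elements` in the
    given relative order; additional elements in between are allowed."""
    remaining = iter(elements)
    # Each `any` call consumes the iterator up to the next match, so the
    # required elements must appear one after another in order.
    return all(any(e == r for e in remaining) for r in required)


if __name__ == "__main__":
    first = ["5'-ITR", "enhancer", "promoter", "CDS1", "N-intein",
             "polyA", "3'-ITR"]
    second = ["5'-ITR", "promoter", "C-intein", "CDS2", "polyA", "3'-ITR"]
    print(matches_layout(first, FIRST_VECTOR_LAYOUT))
    print(matches_layout(second, SECOND_VECTOR_LAYOUT))
```

A vector in which the N-intein precedes CDS1 fails the first check, which mirrors the requirement that the N-intein sequence follows the 5' end portion of the coding sequence.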
16. The vector system according to any preceding claim, wherein said first and second vector are independently a viral vector, preferably an adenoviral vector or adeno-associated viral (AAV) vector, preferably wherein said first and second adeno-associated viral (AAV) vectors are selected from the same or different AAV serotypes, preferably wherein the serotype is selected from serotype 2, serotype 8, serotype 5, serotype 7, serotype 9, serotype 7m8, serotype sh10 or serotype 2(quad Y-F).
17. A host cell transformed, transfected or transduced with the vector system according to any preceding claim.
18. The vector system according to any one of claims 1-16 or the host cell according to claim 17 for medical use.
19. The vector system according to any one of claims 1-16 or the host cell according to claim 17 for use in gene therapy, preferably for use in the treatment and/or prevention of hemophilia or Wilson disease.
20. A pharmaceutical composition comprising the vector system according to any one of claims 1-16 or the host cell according to claim 17 and a pharmaceutically acceptable vehicle.
PCT/EP2021/059841 2020-04-15 2021-04-15 Constructs comprising inteins WO2021209574A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP20169712.5 2020-04-15
EP20169712 2020-04-15

Publications (1)

Publication Number Publication Date
WO2021209574A1 true WO2021209574A1 (en) 2021-10-21

Family

ID=70292772

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/059841 WO2021209574A1 (en) 2020-04-15 2021-04-15 Constructs comprising inteins

Country Status (1)

Country Link
WO (1) WO2021209574A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023221530A1 (en) * 2022-05-17 2023-11-23 复旦大学附属眼耳鼻喉科医院 Dual-carrier system for treating hearing damage, and use thereof
WO2023178337A3 (en) * 2022-03-18 2023-11-23 University Of Florida Research Foundation, Incorporated Methods and compositions for treating rbm20 related cardiomyopathy with a viral vector
WO2023239267A1 (en) * 2022-06-10 2023-12-14 Joint Stock Company "Biocad" Nucleic acid having promoter activity and use thereof
RU2818112C2 (en) * 2022-06-10 2024-04-24 Акционерное общество "БИОКАД" Nucleic acid having promoter activity and its use

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6544786B1 (en) 1999-10-15 2003-04-08 University Of Pittsburgh Of The Commonwealth Of Higher Education Method and vector for producing and transferring trans-spliced peptides
WO2018035503A1 (en) * 2016-08-18 2018-02-22 The Regents Of The University Of California Crispr-cas genome engineering via a modular aav delivery system
WO2020079034A2 (en) 2018-10-15 2020-04-23 Fondazione Telethon Intein proteins and uses thereof

Non-Patent Citations (59)

* Cited by examiner, † Cited by third party
Title
A. C. NATHWANIE. TUDDENHAMP. CHOWDARYJ. MCINTOSHD. LEEC. ROSALESM. PHILLIPSJ. PIEZ. JUNFANGM. M. MEAGHER: "GO-8: Preliminary Results of a Phase / Dose Escalation Trial of Gene Therapy for Haemophilia a Using a Novel Human Factor VIII Variant", BLOOD, 2018
A. MADDALENA ET AL., MOL THER, vol. 26, 2018, pages 524 - 541
A.C., N. ET AL.: "Advances in Gene Therapy for Hemophilia", HUM. GENE THER., 2017
ANTONARAKIS, S. E. ET AL.: "Molecular etiology of factor VIII deficiency in hemophilia A", HUM. MUTAT., 1995
BENTEN D ET AL.: "Hepatic targeting of transplanted liver sinusoidal endothelial cells in intact mice", HEPATOLOGY, vol. 42, no. 1, 2005, pages 140 - 148, XP055108910, DOI: 10.1002/hep.20746
BOLTON-MAGGS, P. H. B.PASI, K. J.: "Haemophilias A and B.", LANCET, 2003
BOWEN, D. J: "Haemophilia A and haemophilia B: molecular insights", MOL PATHOL., vol. 55, no. 1, February 2002 (2002-02-01), pages 1 - 18
BROWN HCZAKAS PMGEORGE SNPARKER ETSPENCER HTDOERING CB: "Target-Cell-Directed Bioengineering Approaches for Gene Therapy of Hemophilia A", MOL THER METHODS CLIN DEV, vol. 9, 31 January 2018 (2018-01-31), pages 57 - 69, XP055675736, DOI: 10.1016/j.omtm.2018.01.004
BUIAKOVA OI ET AL.: "Null mutation of the murine ATP7B (Wilson disease) gene results in intracellular copper accumulation and late-onset hepatic nodular transformation", HUM MOL GENET, vol. 8, no. 9, September 1999 (1999-09-01), pages 1665 - 71
BULL PC ET AL.: "The Wilson disease gene is a putative copper transporting P-type ATPase similar to the Menkes gene", NAT GENET, vol. 5, 1993, pages 327 - 337, XP000197785, DOI: 10.1038/ng1293-327
BURTON, M. ET AL.: "Coexpression of factor VIII heavy and light chain adeno-associated viral vectors produces biologically active protein", PROC. NATL. ACAD. SCI. U. S. A., 1999
CHEN, L. ET AL.: "Enhanced factor VIII heavy chain for gene therapy of Hemophilia A", MOL. THER., 2009
CHERIYAN, M. ET AL.: "Traceless splicing enabled by substrate-induced activation of the Nostoc punctiforme Npu DnaE intein after mutation of a catalytic cysteine to serine", J. MOL. BIOL., 2014
DONG, B. ET AL.: "Characterization of genome integrity for oversized recombinant AAV vector", MOL. THER., 2010
F. PEYVANDIP. M. MANNUCCII. GARAGIOLAA. EL-BESHLAWYM. ELALFYV. RAMANANP. ESHGHIS. HANAGAVADIR. VARADARAJANM. KARIMI: "Trial of Factor VIII and Neutralizing Antibodies in Hemophilia A", N. ENGL. J. MED., vol. 374, 2016, pages 2054 - 2064
F. X. ZHUZ. L. LIUX. L. WANGJ. MIAOH. G. QUX. Y. CHI: "Inter-chain disulfide bond improved protein trans-splicing increases plasma coagulation activity in C57BL/6 mice following portal vein FVIII gene delivery by dual vectors", SCI. CHINA LIFE SCI., 2013
F. ZHU ET AL., SCI CHINA LIFE, 2010
F. ZHU ET AL., SCI CHINA LIFE, 2013
GRIEGER, J. C. ET AL.: "Packaging Capacity of Adeno-Associated Virus Serotypes: Impact of Larger Genomes on Infectivity and Postentry Steps", J. VIROL., 2005
H. SANDBERGA. ALMSTEDTJ. BRANDTE. GRAYL. HOLMQUISTU. OSWALDSSONS. SEBRINGM. MIKAELSSON: "Structural and functional characteristics of the B-domain-deleted recombinant factor VIII protein, r-VIII SQ", THROMB. HAEMOST, 2001
H. Z. MIAON. SIRACHAINANL. PALMERP. KUCABM. A. CUNNINGHAMR. J. KAUFMANS. W. PIPE: "Bioengineering of coagulation factor VIII for improved secretion", BLOOD
HIRSCH, M. ET AL.: "Little vector, big gene transduction: Fragmented genome reassembly of adeno-associated virus", MOLECULAR THERAPY, 2010
HUSTER ET AL.: "Consequences of copper accumulation in the livers of the Atp7b-/- (Wilson disease gene) knockout mice", AM J PATHOL, vol. 168, 2006, pages 423 - 434
IWAI, H.ZUGER, S.JIN, J.TAM, P. H.: "Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme", FEBS LETT, vol. 580, 2006, pages 1853 - 1858, XP028030313, DOI: 10.1016/j.febslet.2006.02.045
J. S. S. BUTTERFIELDK. M. HEGER. W. HERZOGR. KACZMAREK: "A Molecular Revolution in the Treatment of Hemophilia", MOL. THER., 2019
J. ZETTLERV. SCHUTZH. D. MOOTZ: "The naturally split Npu DnaE intein exhibits an extraordinarily high rate in the protein trans-splicing reaction", FEBS LETT, vol. 583, 2009, pages 909 - 914, XP025992861, DOI: 10.1016/j.febslet.2009.02.003
K. V. MILLSM. A. JOHNSONF. B. PERLER, J BIOL CHEM, vol. 289, 2014, pages 14498 - 14505
L. P. PELLISSIER ET AL., MOL THER METHODS CLIN DEV, vol. 1, 2014, pages 14009
L. VILLIGER ET AL., NAT MED, vol. 24, 2018, pages 1519 - 1525
LEVITT N, GENES DEV, vol. 3, no. 7, July 1989 (1989-07-01), pages 1019 - 25
LEVITT, N.BRIGGS, D.GIL, A.PROUDFOOT, N. J: "Definition of an efficient synthetic poly(A) site", GENES DEV, 1989
M. DORIAA. FERRARAA. AURICCHIO, HUM GENE THER METHODS, vol. 24, 2013, pages 392 - 398
M. MAKRIS: "Gene therapy 1.0 in haemophilia: effective and safe, but with many uncertainties", LANCET HAEMATOL, 2020
MADDALENA A ET AL.: "High-Throughput Screening Identifies Kinase Inhibitors That Increase Dual Adeno-Associated Viral Vector Transduction In Vitro and in Mouse Retina", HUM GENE THER, vol. 29, 2018, pages 886 - 901
MAKRIS, M.: "Gene therapy 1.0 in haemophilia: effective and safe, but with many uncertainties", THE LANCET HAEMATOLOGY, 2020
MANCO-JOHNSON, M. J. ET AL.: "Prophylaxis versus episodic treatment to prevent joint disease in boys with severe hemophilia", N. ENGL. J. MED., 2007
MCINTOSH J, BLOOD, vol. 121, no. 17, 20 February 2013 (2013-02-20), pages 3335 - 3344
MIAO, H. Z. ET AL.: "Bioengineering of coagulation factor VIII for improved secretion", BLOOD, 2004
MURILLO OLUQUI DMGAZQUEZ CMARTINEZ-ESPARTOSA DNAVARRO-BLASCO IMONREAL JIGUEMBE L ET AL.: "Long-term metabolic correction of Wilson's disease in a murine model by gene therapy", J HEPATOL, vol. 64, 2016, pages 419 - 426, XP029389254, DOI: 10.1016/j.jhep.2015.09.014
MURILLO OMORENO DGAZQUEZ CBARBERIA MCENZANO INAVARRO IURIARTE I ET AL.: "Liver Expression of a MiniATP7B Gene Results in Long-Term Restoration of Copper Homeostasis in a Wilson Disease Model in Mice", HEPATOLOGY, vol. 70, 2019, pages 108 - 126, XP055774699, DOI: 10.1002/hep.30535/suppinfo
N. H. SHAH ET AL., J AM CHEM SOC, vol. 135, 2013, pages 5839 - 5847
N. J. WARDS. M. K. BUCKLEYS. N. WADDINGTONT. VANDENDRIESSCHEM. K. L. CHUAHA. C. NATHWANIJ. MCINTOSHE. G. D. TUDDENHAMC. KINNONA. J: "Codon optimization of human factor VIII cDNAs leads to high-level expression", BLOOD, 2011
N. LEVITTD. BRIGGSA. GILN. J. PROUDFOOT: "Definition of an efficient synthetic poly(A) site", GENES DEV., 1989
NATHWANI, A. C.DAVIDOFF, A. M.TUDDENHAM, E. G. D.: "Prospects for gene therapy of haemophilia", HAEMOPHILIA, 2004
P. J. LENTINGC. ROSALESD. LEES. RABBANIAND. RAJN. PATELE. G. D. TUDDENHAMO. D. CHRISTOPHEJ. H. MCVEYS. WADDINGTON: "Therapeutic levels of FVIII following a single peripheral vein administration of rAAV vector encoding a novel human factor VIII variant", BLOOD, vol. 121, 2013, pages 3335 - 3344
P. TORNABENEI. TRAPANIR. MINOPOLIM. CENTRULOM. LUPOS. DE SIMONEP. TIBERIF. DELL'AQUILAE. MARROCCOC. LODICE: "Intein-mediated protein trans-splicing expands adeno-associated virus transfer capacity in the retina", SCI. TRANSL. MED, 2019
P. TORNABENEI. TRAPANIR. MINOPOLIM. CENTRULOM. LUPOS. DE SIMONEP. TIBERIF. DELL'AQUILAE. MARROCCOC. LODICE: "Intein-mediated protein trans-splicing expands adeno-associated virus transfer capacity in the retina", SCI. TRANSL. MED., 2019
PERLER, F. B.: "InBase, the Intein Database", NUCLEIC ACIDS RES., vol. 30, 2002, pages 383 - 384, XP002972669, DOI: 10.1093/nar/30.1.383
PILANKATTA RLEWIS DINESI G: "Involvement of protein kinase D in expression and trafficking of ATP7B (copper ATPase)", J BIOL CHEM, vol. 286, 2011, pages 7389 - 7396
ROSENCRANTZ RSCHILSKY M: "Wilson disease: pathogenesis and clinical considerations in diagnosis and treatment", SEMIN LIVER DIS, vol. 31, 2011, pages 245 - 259
SAMBROOK, J.RUSSELL, D.W.: "Molecular cloning: a laboratory manual", 2001, COLD SPRING HARBOR LABORATORY PRESS, pages: 999
SCALLAN, C. D ET AL.: "Phenotypic correction of a mouse model of hemophilia A using AAV2 vectors encoding the heavy and light chains of FVIII", BLOOD, 2003
SHAH, N. H: "Extein residues play an intimate role in the rate-limiting step of protein trans -splicing", J. AM. CHEM. SOC, 2013
TANZI RE ET AL.: "The Wilson disease gene is a copper transporting ATPase with homology to the Menkes disease gene", NAT GENET, vol. 5, 1993, pages 344 - 350, XP000619625, DOI: 10.1038/ng1293-344
WHITE, G. C. ET AL.: "Definitions in Hemophilia", THROMB. HAEMOST., 2001
WU, Z. ET AL.: "Effect of genome size on AAV vector packaging", MOL. THER., 2010
Y. LI, BIOTECHNOL LETT, vol. 37, 2015, pages 2121 - 2137
ZHU FUXIANG ET AL: "Protein trans-splicing based dual-vector delivery of the coagulation factor VIII gene", SCIENCE CHINA LIFE SCIENCES, ZHONGGUO KEXUE ZAZHISHE, CHINA, vol. 53, no. 6, 3 July 2010 (2010-07-03), pages 683 - 689, XP035977795, ISSN: 1674-7305, [retrieved on 20100703], DOI: 10.1007/S11427-010-4011-7 *
ZHU, F. X. ET AL.: "Enhanced plasma factor VIII activity in mice via cysteine mutation using dual vectors", SCI. CHINA LIFE SCI., 2012

Similar Documents

Publication Publication Date Title
WO2021209574A1 (en) Constructs comprising inteins
US20230022390A1 (en) Vectors for Liver-Directed Gene Therapy of Hemophilia and Methods and Use Thereof
US11110153B2 (en) Modified factor IX, and compositions, methods and uses for gene transfer to cells, organs, and tissues
US11939590B1 (en) AAV vector compositions and methods for gene transfer to cells, organs and tissues
US11419920B2 (en) Factor VIII sequences
US11007280B2 (en) Optimized liver-specific expression systems for FVIII and FIX
JP6495273B2 (en) Compositions, methods and uses for mutant AAV and gene transfer into cells, organs and tissues
US10398787B2 (en) Vectors for liver-directed gene therapy of hemophilia and methods and use thereof
Ohmori et al. New approaches to gene and cell therapy for hemophilia
KR20210021310A (en) Codon-optimized acid alpha-glucosidase expression cassette and methods of using the same
WO2014063753A1 (en) Hyper-active factor ix vectors for liver-directed gene therapy of hemophilia 'b' and methods and use thereof
JP2022524434A (en) Non-viral DNA vector and its use for expressing FVIII therapeutic agents
WO2023135273A2 (en) Compositions of dna molecules encoding factor viii, methods of making thereof, and methods of use thereof
TW202237850A (en) Novel compositions with tissue-specific targeting motifs and compositions containing same
EP3728599A1 (en) Ectopically expressed transcription factors and uses thereof
US20230149563A1 (en) Compositions and methods for expressing factor ix for hemophilia b therapy
US20240082429A1 (en) Pah-modulating compositions and methods
WO2023250492A2 (en) Fah-modulating compositions and methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21718886

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21718886

Country of ref document: EP

Kind code of ref document: A1