WO2023141582A1 - Engineered promoters - Google Patents

Info

Publication number
WO2023141582A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
control element
transcription control
nucleotide sequence
cell
Prior art date
Application number
PCT/US2023/061014
Other languages
French (fr)
Inventor
David C. James
Yusuf JOHARI
Original Assignee
Regenxbio Inc.
Application filed by Regenxbio Inc.
Publication of WO2023141582A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1086 Preparation or screening of expression libraries, e.g. reporter assays
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6897 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids involving reporter genes operably linked to promoters

Landscapes

  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biochemistry (AREA)
  • Plant Pathology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

Provided herein are compact CMV-derived transcription control elements that exhibit varied transcriptional efficiency (activity per unit DNA sequence) or total activity. Also provided are methods of using the CMV-derived transcription control elements in the recombinant expression of polypeptides. In some embodiments, the CMV-derived transcription control elements exhibit a significantly higher transcriptional efficiency (activity per unit DNA sequence) or increase in total activity compared to the parental CMV promoter. In some embodiments, the CMV-derived transcription control elements are used to express a recombinant polypeptide in mammalian cells, e.g. HEK293 cells or HEK293-derived cells.

Description

ENGINEERED PROMOTERS
TECHNICAL FIELD
[0001] The present disclosure relates to transcription control elements and methods of use thereof.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0002] This application claims the benefit of U.S. application no. 63/301,504, filed January 21, 2022, which is incorporated herein by reference in its entirety.
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0003] The content of the electronically submitted sequence listing (Name: 6728_1701_Sequence_Listing.xml; Size: 89,152 bytes; and Date of Creation: January 20, 2023) filed with the application is incorporated herein by reference in its entirety.
BACKGROUND
[0004] The hCMV-IE promoter (henceforth referred to as the CMV promoter) is a highly complex element comprising binding sites (transcription factor regulatory elements [TFREs]) for numerous ubiquitously expressed transcription factors (TFs). [1] This is not surprising considering that the promoter has evolved to function in a broad cell tropism. [2, 3] However, promoter activity in any given host is regulated by a system-specific combination of interactions between the promoter's constituent TFREs and the cell's repertoire of endogenous TFs. [4] Therefore, transcriptional activity of the CMV promoter is highly context-specific, and cell type-dependent expression has been observed both in vivo [5, 6] and in vitro [7, 8]. With respect to the latter, it has been demonstrated that CMV-driven transient gene expression in Chinese hamster ovary (CHO) cells was largely a function of transactivation mediated through just two discrete TFREs (NF-κB and CREB). [9] Further, the CMV promoter comprises binding sites for several transcriptional repressors such as YY1, conferring on cytomegalovirus the ability to establish latent infection. [3, 9-11] Accordingly, it is likely that the CMV promoter is fundamentally sub-optimal for use in unnatural, specific processes such as recombinant gene expression in, e.g., CHO cells or HEK293 cells.
[0005] There is a need in the art for highly compact novel promoters with varied transcriptional activity.
BRIEF SUMMARY
[0006] In one aspect, the disclosure provides a transcription control element comprising (a) a distal cis-regulatory module (CRM), (b) a proximal CRM, and (c) a core promoter, wherein (i) the distal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 4, wherein the distal CRM does not comprise the nucleotide sequence of SEQ ID NO: 8 or SEQ ID NO: 10, and (ii) the proximal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 6, wherein the proximal CRM does not comprise the nucleotide sequence of SEQ ID NO: 12. In some embodiments, the transcription control element is capable of mediating transcription of a polynucleotide encoding a polypeptide of interest operably linked to the transcription control element. In some embodiments, the transcription control element is capable of mediating transcription of a GFP reporter construct comprising the nucleotide sequence of SEQ ID NO: 44 in a HEK293 cell. In some embodiments, the transcription control element is less than 550 nucleotides in length. In some embodiments, the transcription control element comprises between 400 and 550 nucleotides. In some embodiments, the core promoter comprises a TATA-box and an Inr element. In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16. In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2. In some embodiments, the distal CRM comprises one or more of (a) the nucleotide sequence of SEQ ID NO: 9, and (b) the nucleotide sequence of SEQ ID NO: 11. In some embodiments, the distal CRM comprises the nucleotide sequence of SEQ ID NO: 5. In some embodiments, the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 13. In some embodiments, the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 7. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 3.
[0007] In a further aspect, the disclosure provides a transcription control element comprising a cis-regulatory module (CRM) and a core promoter, wherein the CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 17-35. In some embodiments, the transcription control element comprises less than 550 nucleotides. In some embodiments, the transcription control element comprises between 190 and 550 nucleotides. In some embodiments, the transcription control element is capable of mediating transcription of a polynucleotide encoding a polypeptide of interest operably linked to the transcription control element. In some embodiments, the transcription control element is capable of mediating transcription of a GFP reporter construct comprising the nucleotide sequence of SEQ ID NO: 44 in a HEK293 cell. In some embodiments, the core promoter comprises a TATA-box and an Inr element. In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16. In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 36-43.
[0008] In a further aspect, the disclosure provides an isolated polynucleotide comprising a transcription control element described herein. In some embodiments, the isolated polynucleotide further comprises an enhancer, splice acceptor, splice donor or intron operably linked to the transcription control element.
[0009] In a further aspect, the disclosure provides a vector comprising a transcription control element described herein. In some embodiments, the vector is a viral vector. In some embodiments, the vector further comprises a polynucleotide encoding a polypeptide of interest operably linked to the transcription control element. In some embodiments, the polypeptide of interest is an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme. In some embodiments, the polypeptide of interest is a viral protein. In some embodiments, the polypeptide of interest is a viral capsid protein. In some embodiments, the polypeptide of interest is a viral Cap or Rep protein.
[0010] In a further aspect, the disclosure provides a host cell comprising an isolated polynucleotide described herein or a vector described herein. In some embodiments, the host cell comprises a HEK293 cell, HEK293 derived cell, CHO cell, CHO derived cell, HeLa cell, SF-9 cell, BHK cell, Vero cell, or PerC6 cell. In some embodiments, the host cell comprises a HEK293 cell or HEK293 derived cell.
[0011] In a further aspect, the disclosure provides a method of expressing a polypeptide of interest in a host cell comprising culturing a host cell described herein under suitable conditions to produce the polypeptide of interest. In some embodiments, the polypeptide of interest is an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme.
[0012] In some embodiments, the disclosure provides:
[1.] A transcription control element comprising a) a distal cis-regulatory module (CRM), b) a proximal CRM, and c) a core promoter, wherein i. the transcription control element is less than 550 nucleotides long, ii. the distal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 4, wherein the distal CRM does not comprise the nucleotide sequence of SEQ ID NO: 8 or SEQ ID NO: 10, and iii. the proximal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 6, wherein the proximal CRM does not comprise the nucleotide sequence of SEQ ID NO: 12.
[2.] The transcription control element of [1], which is between 400 and 550 nucleotides long.
[3.] The transcription control element of [1] or [2], which is capable of mediating transcription of a heterologous polynucleotide encoding a GFP polypeptide comprising the amino acid sequence of SEQ ID NO: 45 in a HEK293 cell.
[4.] The transcription control element of any one of [1] to [3], wherein the core promoter comprises a TATA-box and an Inr element.
[5.] The transcription control element of any one of [1] to [3], wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16.
[6.] The transcription control element of any one of [1] to [3], wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2.
[7.] The transcription control element of any one of [1] to [6], wherein the distal CRM comprises one or more of a) the nucleotide sequence of SEQ ID NO: 9, and b) the nucleotide sequence of SEQ ID NO: 11.
[8.] The transcription control element of [7], wherein the distal CRM comprises the nucleotide sequence of SEQ ID NO: 5.
[9.] The transcription control element of any one of [1] to [8], wherein the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 13.
[10.] The transcription control element of [9], wherein the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 7.
[11.] The transcription control element of any one of [1] to [10] comprising the nucleotide sequence of SEQ ID NO: 3.
[12.] A transcription control element comprising a) a cis-regulatory module (CRM), and b) a core promoter, wherein i. the CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 17-35, and ii. the transcription control element is less than 550 nucleotides long.
[13.] The transcription control element of [12], which is between 190 and 550 nucleotides long.
[14.] The transcription control element of [12] or [13], which is capable of mediating transcription of a heterologous polynucleotide encoding a GFP polypeptide comprising the amino acid sequence of SEQ ID NO: 45 in a HEK293 cell.
[15.] The transcription control element of any one of [12] to [14], wherein the core promoter comprises a TATA-box and an Inr element.
[16.] The transcription control element of any one of [12] to [14], wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16.
[17.] The transcription control element of any one of [12] to [14], wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2.
[18.] The transcription control element of any one of [12] to [17] comprising the nucleotide sequence of SEQ ID NO: 38-43.
[19.] A transcription control element i) comprising a nucleotide sequence of SEQ ID NO: 36 or SEQ ID NO: 37, or ii) comprising a nucleotide sequence with at least 99% sequence identity to SEQ ID NO: 36 or 37.
[20.] The transcription control element of [19], which is capable of mediating transcription of a heterologous polynucleotide encoding a GFP polypeptide comprising the amino acid sequence of SEQ ID NO: 45 in a HEK293 cell.
[21.] An isolated polynucleotide comprising the transcription control element of any one of [1] to [20].
[22.] The isolated polynucleotide of [21] further comprising an enhancer, splice acceptor, splice donor or intron operably linked to the transcription control element.
[23.] A vector comprising the transcription control element of any one of [1] to [21].
[24.] The vector of [23], which is a viral vector.
[25.] The vector of [23] or [24] further comprising a polynucleotide encoding a polypeptide of interest operably linked to the transcription control element.
[26.] The vector of [25] wherein the polypeptide of interest is an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme.
[27.] The vector of [25], wherein the polypeptide of interest is an antibody.
[28.] The vector of [25], wherein the polypeptide of interest is a viral protein.
[29.] The vector of [25], wherein the polypeptide of interest is a viral capsid protein.
[30.] The vector of [25], wherein the polypeptide of interest is a viral Rep protein.
[31.] A host cell comprising the isolated polynucleotide of [21] or [22] or the vector of any one of [23] to [30].
[32.] The host cell of [31], which comprises a HEK293 cell, HEK293 derived cell, CHO cell, CHO derived cell, HeLa cell, SF-9 cell, BHK cell, Vero cell, or PerC6 cell.
[33.] The host cell of [31], which comprises a HEK293 cell or HEK293 derived cell.
[34.] A method of expressing a polypeptide of interest in a host cell comprising culturing the host cell of any one of [31] to [33] under suitable conditions to produce the polypeptide of interest.
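The structural limits recited in embodiments [1] and [2] (overall length, and sequences that the distal and proximal CRMs must not contain) can be illustrated with a minimal sketch. The sequences below are toy placeholders, not any SEQ ID NO of this disclosure; the class and function names are hypothetical; and the sketch assumes, for illustration only, that the element is the simple 5' to 3' concatenation of distal CRM, proximal CRM and core promoter. The percent-identity limitation is not checked here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ControlElement:
    """Hypothetical container for the three parts named in embodiment [1]."""
    distal_crm: str
    proximal_crm: str
    core_promoter: str

    @property
    def sequence(self) -> str:
        # Assumption: parts are joined directly, distal CRM first.
        return self.distal_crm + self.proximal_crm + self.core_promoter


def satisfies_constraints(
    element: ControlElement,
    excluded_in_distal: Sequence[str],
    excluded_in_proximal: Sequence[str],
    min_len: int = 400,
    max_len: int = 550,
) -> bool:
    """Check length (min_len <= length < max_len) and excluded subsequences."""
    length = len(element.sequence)
    if not (min_len <= length < max_len):
        return False
    if any(motif in element.distal_crm for motif in excluded_in_distal):
        return False
    if any(motif in element.proximal_crm for motif in excluded_in_proximal):
        return False
    return True


# Toy placeholder sequences (NOT sequences from the sequence listing).
toy = ControlElement(
    distal_crm="ACGT" * 50,              # 200 nt
    proximal_crm="GGCC" * 40,            # 160 nt
    core_promoter="TATAAA" + "C" * 60,   # 66 nt
)
toy_ok = satisfies_constraints(toy, ["TTTTTTTT"], ["AAAAAAAA"])
```

In practice the excluded sequences would be those identified in the sequence listing, and a separate alignment step would be needed to evaluate the sequence identity thresholds.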
[0013] Still other features and advantages of the compositions and methods described herein will become more apparent from the following detailed description when read in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1. In silico identification of potential transcriptional regulators of CMV promoter activity. The CMV promoter (-550 to +48 relative to the transcription start site; TSS) was surveyed for the presence of putative transcription factor regulatory elements (TFREs) using Genomatix software. Discrete TFREs (108) identified in the CMV promoter were subsequently analyzed for the presence of their cognate transcription factors (TFs) in HEK293 cells. (A) RNA-seq analysis of the HEK293 cell transcriptome determined the relative gene expression level of TFs. Points represent the expression level (transcripts per million; TPM) of each TF sampled at exponential and stationary phases of culture. Genes with more than two transcripts per million (log2 TPM > 1) were considered actively transcribed. (B) CMV promoter sequence (SEQ ID NO: 1) with selected TFREs for in vitro analysis. The TSS is indicated with an arrow.
[0015] Figure 2. Identification of active transcription factor regulatory elements (TFREs). (A) TFRE sequence derived from the CMV promoter (black bars) or its consensus sequence (gray bars) was cloned in series (7x copies) upstream of a minimal CMV core promoter in GFP-reporter vectors. HEK293 cells were transfected with each homotypic TFRE-reporter using polyethylenimine (PEI) and cultured in tube-spin bioreactors at 37°C. GFP expression was quantified 48 h post-transfection. (B) NF-κB p65 consensus sequence was cloned in series (7x copies) upstream of a minimal CMV core promoter in GFP-reporter vectors and transfected into HEK293 and CHO-S cells alongside NF-κB and CREB/E4F constructs from A. Cells were cultured in tube-spin bioreactors at 37°C and GFP expression was quantified 48 h post-transfection. Data are expressed as a percentage with respect to the GFP expression of a vector containing the CMV promoter. Data shown are the mean value ± standard deviation of two independent experiments each performed in duplicate.
[0016] Figure 3. Relative transcriptional activity exhibited by CMV promoter structural elements. (A) The CMV promoter contains the proximal and distal enhancers and clusters of TFREs (cis-regulatory modules; CRMs). Each element was cloned upstream of a minimal CMV core promoter in GFP reporter plasmids while the CBh promoter (793 bp) was inserted directly upstream of the GFP open reading frame. (B) Reporter plasmids were transfected into HEK293 cells using PEI and cultured in tube-spin bioreactors at 37°C. GFP expression was quantified 48 h post-transfection. Data are expressed as a percentage with respect to the GFP expression of a vector containing the CMV promoter. Data shown are the mean value ± standard deviation of two independent experiments each performed in duplicate.
[0017] Figure 4. A proximal CMV promoter devoid of two Sp1 sites near the TATA box drives inefficient transcription activation in HEK293 cells. (A) Wild-type (WT: residues 444-506 of SEQ ID NO: 1) and mutated proximal CMV promoters (-300 to +48 relative to the TSS) with specific TFREs knocked-out (KO: SEQ ID NO: 46) were synthesized and cloned into GFP reporter vectors. Selective mutation was performed on a specific TFRE to disrupt the binding site without perturbing overlapping TFREs or introducing new TFREs. (B) The relative activity of each proximal CMV promoter construct was determined in HEK293 and Expi293F cells. Reporter plasmids were transfected into HEK293 cells using PEI and cultured in tube-spin bioreactors at 37°C. GFP expression was quantified 48 h post-transfection. Data are expressed as a percentage with respect to the GFP expression of a vector containing the full-length CMV promoter. Data shown are the mean value ± standard deviation of two independent experiments each performed in duplicate.
[0018] Figure 5. Removal of transrepression mediated by YY1 and RBP-Jκ and redundant sequences enhances CMV promoter activity. (A) Selective mutation was performed on a specific TFRE to disrupt the binding site without perturbing overlapping TFREs or introducing new TFREs. (WT: residues 9-448 of SEQ ID NO: 1; KO: residues 9-448 of SEQ ID NO: 29) (B) Wild-type (WT) CMV promoters (-550 to +48 relative to the TSS) and mutated CMV variants with specific TFREs knocked-out (KO) were synthesized and cloned into GFP reporter vectors. The locations of the repressor elements in the CMV promoter are underlined. Numbers denote the corresponding TFRE knock-out in A. A fourth putative YY1 binding site excluded by the TFRE analysis in this study is shown in brackets. (C) The relative activity of each promoter construct was determined in HEK293 and Expi293F cells. Reporter plasmids were transfected into HEK293 cells using PEI and cultured in tube-spin bioreactors at 37°C. GFP expression was quantified 48 h post-transfection. Data are expressed as a percentage with respect to the GFP expression of a vector containing the wild-type CMV promoter. Data shown are the mean value ± standard deviation of two independent experiments each performed in duplicate.
[0019] Figure 6. GFP fluorescence intensity in HEK293 cells is logarithmically proportional to GFP mRNA levels post-transfection. Suspension HEK293 cells at 1 × 10^6 viable cells/mL were transfected with up to 0.8 µg plasmid encoding GFP and cultured in tube-spin bioreactors at 37°C. A promoterless vector was used to equalize total DNA load. (A) Measurement of GFP fluorescence at 48 h post-transfection, and (B) the corresponding mRNA levels by qRT-PCR quantified as previously described. [12] Data shown are the mean value ± standard deviation of two independent experiments each performed in duplicate.
[0020] Figure 7. Repeat motifs within the distal enhancer of CMV promoter (SEQ ID NO: 1). The 21 bp repeat motifs (highlighted in gray), each overlapping a YY1 and an Sp1 binding site, have been shown to bind transcriptional repressor ERF for the negative regulation of CMV promoter activity. [11]
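Two calculations recur in the figure legends above: the expression cutoff of Figure 1(A) (log2 TPM > 1, i.e., more than two transcripts per million) and the normalization of GFP readings to a CMV-promoter control, reported as mean ± standard deviation. The following is a minimal sketch of both. The function names and the transcription factor labels are hypothetical, no values are taken from the disclosure, and pooling all replicate readings into one sample standard deviation is an assumption, since the legends do not state how the standard deviation was computed.

```python
import math
from statistics import mean, stdev


def actively_transcribed(tpm_by_tf, log2_cutoff=1.0):
    """Return TF names whose log2(TPM) exceeds the cutoff (Figure 1A rule)."""
    return sorted(
        tf for tf, tpm in tpm_by_tf.items()
        if tpm > 0 and math.log2(tpm) > log2_cutoff
    )


def percent_of_control(sample_gfp, control_gfp):
    """Express a GFP reading as a percentage of the CMV-promoter control."""
    return 100.0 * sample_gfp / control_gfp


def summarize(percent_values):
    """Mean and sample standard deviation across pooled replicates (assumption)."""
    return mean(percent_values), stdev(percent_values)


# Illustrative inputs only; labels and numbers are made up for the example.
example_tpm = {"TF_A": 8.0, "TF_B": 2.0, "TF_C": 0.0, "TF_D": 2.5}
expressed = actively_transcribed(example_tpm)   # TF_B sits exactly at the cutoff and is excluded
relative = percent_of_control(50.0, 200.0)
avg, sd = summarize([90.0, 110.0, 100.0, 100.0])
```

Because the cutoff is a strict inequality, a transcription factor at exactly 2 TPM is not counted as actively transcribed in this sketch.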
DETAILED DESCRIPTION
[0021] In one aspect, provided herein are highly compact novel transcription control elements with varied transcriptional activity. In some embodiments, the transcription control elements described herein comprise elements derived from the CMV promoter. In some embodiments, the transcription control elements described herein comprise elements derived from the CMV promoter and from one or more other promoter(s). The novel transcription control elements described herein offer differential transcriptional activities to suit various circumstances in recombinant vector technology. In some embodiments, a transcription control element described herein has increased transcriptional activity, for example, increased transcriptional activity compared to the CMV promoter. In some embodiments, a transcription control element described herein has decreased transcriptional activity, for example, decreased transcriptional activity compared to the CMV promoter. Transcription control elements with decreased transcriptional activity can be useful for expressing a polypeptide of interest in tissues or cells in which expression of the polypeptide of interest can be toxic to the cells or tissue. The compact size of the transcription control elements described herein is beneficial when the maximum base pair size of an expression cassette is restrictive. For example, the compact size of the transcription control elements described herein is beneficial when used in a viral expression vector, such as an AAV vector. In some embodiments, the transcription control elements described herein are suitable for transgene expression in HEK293 cells, HEK293 derived cells, CHO cells, and CHO derived cells.
[0022] As discussed in the Examples below, regulators of CMV-mediated transient gene expression in HEK293 cells were identified through mechanistic dissection of the CMV promoter. Extensive bioinformatic analysis was performed on the promoter's transcription factor regulatory element composition, coupled with a detailed in vitro comparative analysis of the relative influence of CMV component parts on gene expression to identify functional elements (transcription factor regulatory element sequences and cis-regulatory modules) that critically control promoter activity in HEK293 cells. It was demonstrated, for the first time, that the wild-type CMV promoter can be re-engineered for HEK293 cells to derive highly compact and transcriptionally efficient novel transcription control elements with increased transcriptional activity.
Definitions
[0023] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure is related. To facilitate an understanding of the disclosed methods, a number of terms and phrases are defined below.
[0024] A "core promoter" refers to a nucleotide sequence that is the minimal portion of the promoter required to initiate transcription. Core promoter sequences can be derived from prokaryotic or eukaryotic genes, including, e.g., the CMV immediate early gene promoter or SV40. In some embodiments, the core promoter is derived from a CMV promoter. In some embodiments, the core promoter is derived from the CMV promoter of SEQ ID NO: 1. In some embodiments, the core promoter comprises the sequence of SEQ ID NO: 2. In some embodiments, the core promoter comprises the sequence of SEQ ID NO: 16.
[0025] In some embodiments, the core promoter comprises one or more of a TATA box, an initiator (Inr), downstream promoter element (DPE), and motif ten element (MTE). In some embodiments, the core promoter comprises a TATA box. In some embodiments, the core promoter comprises an initiator (Inr). In some embodiments, the core promoter comprises a TATA box and an initiator (Inr).
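As an illustration of locating a TATA box and an initiator in a candidate core promoter, the sketch below scans a sequence for IUPAC consensus motifs. The consensus strings used (TATAWAAR for the TATA box and YYANWYY for the Inr) are commonly cited in the literature and are an assumption here; they are not defined by this disclosure. The function name and the toy sequence are hypothetical and unrelated to SEQ ID NO: 2 or SEQ ID NO: 16.

```python
import re

# IUPAC nucleotide codes needed for the two consensus motifs below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# Assumed literature consensus motifs (not taken from this disclosure).
TATA_CONSENSUS = "TATAWAAR"
INR_CONSENSUS = "YYANWYY"


def find_motif(consensus, sequence):
    """Return 0-based start positions of every (possibly overlapping) match."""
    body = "".join(IUPAC[base] for base in consensus)
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Toy sequence built for the example: a TATA-like motif followed by an Inr-like motif.
toy_core = "GG" + "TATAAAAG" + "GGGGGG" + "TCAGTCT" + "GG"
tata_hits = find_motif(TATA_CONSENSUS, toy_core)
inr_hits = find_motif(INR_CONSENSUS, toy_core)
```

A motif match of this kind only flags a candidate site; whether the element is functional in a given host cell would still need to be shown experimentally, for example with a reporter assay.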
[0026] The term "expression vector" as used herein includes an isolated polynucleotide which upon transfer, e.g., by transfection, into an appropriate host cell provides for expression of a recombinant gene product, e.g., recombinant polypeptide, within the host cell. In addition to the polynucleotide sequence coding for the recombinant gene product, e.g., recombinant polypeptide, the expression vector can comprise regulatory sequences that mediate transcription of the coding sequence into RNA and/or mediate translation of the RNA into proteins in the host cell.
[0027] The term "expression cassette" as used herein includes a polynucleotide sequence encoding a polypeptide to be expressed and sequences controlling its expression such as a promoter, e.g., a transcription control element described herein, and optionally an enhancer sequence, including any combination of cis-acting transcriptional control elements. In some embodiments, an expression cassette also contains a downstream 3'-untranslated region comprising a polyadenylation site. A transcription control element is either directly linked to the polynucleotide sequence encoding the polypeptide to be expressed, or is separated therefrom by intervening DNA such as, for example, the 5'-untranslated region of a heterologous gene.
[0028] The terms "host cell" or "host cell line" as used herein include any cells which are capable of growing in culture and either expressing a desired recombinant product protein or reproducing a polynucleotide, e.g., a vector, described herein. In some embodiments, the host cell is a prokaryotic cell, e.g., E. coli, capable of reproducing a polynucleotide, e.g., a vector, described herein. In some embodiments, the host cell is a eukaryotic cell, e.g., a mammalian cell, capable of expressing a desired recombinant polypeptide. In some embodiments, the host cell is a HEK293 cell, HEK293 derived cell, CHO cell, CHO derived cell, HeLa cell, SF-9 cell, BHK cell, Vero cell, or PerC6 cell. In some embodiments, the host cell is a HEK293 cell. In some embodiments, the host cell is a HEK293 derived cell. In some embodiments, a derived cell is a cell derived from a parental cell line through limiting dilution. For example, in some embodiments, a HEK293 derived cell is a cell derived from the HEK293 cell line through limiting dilution. A skilled artisan understands that a derived cell, e.g., HEK293 derived cell, can be expanded into a population of cells through standard culture methods.
[0029] A polypeptide, antibody, polynucleotide, vector, cell, or composition, which is "isolated" is a polypeptide, antibody, polynucleotide, vector, cell, or composition, which is in a form not found in nature. Isolated polypeptides, antibodies, polynucleotides, vectors, cell or compositions include those which have been purified to a degree that they are no longer in a form in which they are found in nature. In some embodiments, an antibody, polynucleotide, vector, cell, or composition, which is isolated is substantially pure.
[0030] The terms "polypeptide," "peptide," and "protein" are used interchangeably herein to refer to polymers of amino acids of any length. The polymer can be linear or branched, it can comprise modified amino acids, and it can be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), as well as other modifications known in the art. It is understood that, because the polypeptides described herein are based upon antibodies, in certain embodiments, the polypeptides can occur as single chains or associated chains.
[0031] The terms "identical" or percent "identity" in the context of two or more nucleic acids or polypeptides, refer to two or more sequences or subsequences that are the same or have a specified percentage of nucleotides or amino acid residues that are the same, when compared and aligned (introducing gaps, if necessary) for maximum correspondence, not considering any conservative amino acid substitutions as part of the sequence identity. The percent identity can be measured using sequence comparison software or algorithms or by visual inspection. Various algorithms and software are known in the art that can be used to obtain alignments of amino acid or nucleotide sequences. One such non-limiting example of a sequence alignment algorithm is the algorithm described in Karlin S., et al., Proc. Natl. Acad. Sci., 87:2264-2268 (1990), as modified in Karlin S., et al., Proc. Natl. Acad. Sci., 90:5873-5877 (1993), and incorporated into the NBLAST and XBLAST programs (Altschul SF, et al., Nucleic Acids Res., 25:3389-3402 (1997)). In certain embodiments, Gapped BLAST can be used as described in Altschul SF, et al., Nucleic Acids Res. 25:3389-3402 (1997). BLAST-2, WU-BLAST-2 (Altschul SF, et al., Methods in Enzymology, 266:460-480 (1996)), ALIGN, ALIGN-2 (Genentech, South San Francisco, California) or Megalign (DNASTAR) are additional publicly available software programs that can be used to align sequences. In certain embodiments, the percent identity between two nucleotide sequences is determined using the GAP program in GCG software (e.g., using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 90 and a length weight of 1, 2, 3, 4, 5, or 6). In certain alternative embodiments, the GAP program in the GCG software package, which incorporates the algorithm of Needleman and Wunsch (J. Mol. Biol. 48:444-453 (1970)) can be used to determine the percent identity between two amino acid sequences (e.g., using either a BLOSUM62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, or 5). Alternatively, in certain embodiments, the percent identity between nucleotide or amino acid sequences is determined using the algorithm of Myers and Miller (CABIOS, 4:11-17 (1989)). For example, the percent identity can be determined using the ALIGN program (version 2.0) and using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4. Appropriate parameters for maximal alignment by particular alignment software can be determined by one skilled in the art. In certain embodiments, the default parameters of the alignment software are used. In certain embodiments, the percentage identity "X" of a first amino acid sequence to a second amino acid sequence is calculated as 100 x (Y/Z), where Y is the number of amino acid residues scored as identical matches in the alignment of the first and second sequences (as aligned by visual inspection or a particular sequence alignment program) and Z is the total number of residues in the second sequence. If the length of a first sequence is longer than the second sequence, the percent identity of the first sequence to the second sequence will be higher than the percent identity of the second sequence to the first sequence.
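As a non-limiting illustration, the calculation X = 100 x (Y/Z) described above can be sketched in Python. The function name `percent_identity` and the example sequences are hypothetical and are not part of the disclosure; the alignment itself is assumed to have been produced already, with gaps written as `-`.

```python
def percent_identity(aligned_first: str, aligned_second: str) -> float:
    """Return X = 100 * (Y / Z) for two pre-aligned sequences.

    Y is the number of aligned positions holding the same residue in both
    sequences. Z is the number of residues (gaps excluded) in the second
    sequence, as in the definition above.
    """
    if len(aligned_first) != len(aligned_second):
        raise ValueError("aligned sequences must have equal length")
    # Count identical residues; a gap aligned to a gap is not a match.
    y = sum(
        1
        for a, b in zip(aligned_first, aligned_second)
        if a == b and a != "-"
    )
    z = sum(1 for b in aligned_second if b != "-")
    if z == 0:
        raise ValueError("second sequence contains no residues")
    return 100.0 * y / z


# Hypothetical alignment: the first sequence has 10 residues, the second has 8.
first = "MKTAYIAKQR"
second = "MKTA--AKQR"

x_first_to_second = percent_identity(first, second)  # Z = 8
x_second_to_first = percent_identity(second, first)  # Z = 10
```

Because Z is taken from the second sequence, swapping the arguments changes the result whenever the two sequences differ in length, which is the asymmetry noted at the end of the paragraph.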
[0032] As a non-limiting example, whether any particular polynucleotide has a certain percentage sequence identity (e.g., is at least 80% identical, at least 85% identical, at least 90% identical, and in some embodiments, at least 95%, 96%, 97%, 98%, or 99% identical) to a reference sequence can, in certain embodiments, be determined using the Bestfit program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, 575 Science Drive, Madison, WI 53711). Bestfit uses the local homology algorithm of Smith and Waterman (Advances in Applied Mathematics 2:482-489 (1981)) to find the best segment of homology between two sequences. When using Bestfit or any other sequence alignment program to determine whether a particular sequence is, for instance, 95% identical to a reference sequence described herein, the parameters are set such that the percentage of identity is calculated over the full length of the reference nucleotide sequence and that gaps in identity of up to 5% of the total number of nucleotides in the reference sequence are allowed.
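For orientation only, the local homology algorithm of Smith and Waterman referenced above can be sketched as a short dynamic-programming routine. This is a minimal illustration and is not the Bestfit implementation; the function name `smith_waterman_score` and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for the example.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not Bestfit defaults.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap       # gap introduced in b
            left = curr[j - 1] + gap  # gap introduced in a
            # Flooring at zero lets a local alignment restart anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences sharing the local segment "ACG".
score = smith_waterman_score("TTACGTTT", "GGACGGG")
```

The zero floor in each cell is what distinguishes a local alignment from a global one: unrelated flanking sequence does not penalize the best matching segment.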
[0033] In some embodiments, two nucleic acids or polypeptides described herein are substantially identical, meaning they have at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, and in some embodiments at least 95%, 96%, 97%, 98%, 99% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence, as measured using a sequence comparison algorithm or by visual inspection. Identity can exist over a region of the sequences that is at least about 10, about 20, about 40-60 residues in length or any integral value there between, and can be over a longer region than 60-80 residues, for example, at least about 90-100 residues, and in some embodiments, the sequences are substantially identical over the full length of the sequences being compared, such as the coding region of a nucleotide sequence for example.
[0034] "AAV" is an abbreviation for adeno-associated virus, and may be used to refer to the virus itself or modifications, derivatives, or pseudotypes thereof. The term covers all subtypes and both naturally occurring and recombinant forms, except where required otherwise. The abbreviation "rAAV" refers to recombinant adeno-associated virus. The term "AAV" includes AAV type 1 (AAV-1), AAV type 2 (AAV-2), AAV type 3 (AAV-3), AAV type 4 (AAV-4), AAV type 5 (AAV-5), AAV type 6 (AAV-6), AAV type 7 (AAV-7), AAV type 8 (AAV-8), AAV type 9 (AAV-9), avian AAV, bovine AAV, canine AAV, equine AAV, primate AAV, non-primate AAV, and ovine AAV, and modifications, derivatives, or pseudotypes thereof.
[0035] "Recombinant," as applied to an AAV particle, means that the AAV particle is the product of one or more procedures that result in an AAV particle construct that is distinct from an AAV particle in nature.
[0036] A recombinant adeno-associated virus particle ("rAAV particle") refers to a viral particle composed of at least one AAV capsid protein and an encapsidated polynucleotide rAAV vector genome comprising a heterologous polynucleotide (i.e., a polynucleotide other than a wild-type AAV genome such as a transgene to be delivered to a mammalian cell, e.g., a transgene encoding a microdystrophin comprising the amino acid sequence of SEQ ID NO: 27). The rAAV particle may be of any AAV serotype, including any modification, derivative or pseudotype (e.g., AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAV-6, AAV-7, AAV-8, AAV-9, or AAV-10, or derivatives/modifications/pseudotypes thereof). Such AAV serotypes and derivatives/modifications/pseudotypes, and methods of producing such serotypes/derivatives/modifications/pseudotypes are known in the art (see, e.g., Asokan et al., Mol. Ther. 20(4):699-708 (2012)). Recombinant AAV particles comprising a transgene encoding a microdystrophin are disclosed in Int'l. Appl. Pub. No. WO 2021108755, which is incorporated herein by reference for all purposes.
[0037] As used in the present disclosure and claims, the singular forms "a", "an" and "the" include plural forms unless the context clearly dictates otherwise.
[0038] It is understood that wherever embodiments are described herein with the language "comprising" otherwise analogous embodiments described in terms of "consisting of" and/or "consisting essentially of" are also provided. It is also understood that wherever embodiments are described herein with the language "consisting essentially of" otherwise analogous embodiments described in terms of "consisting of" are also provided.
[0039] The term "and/or" as used in a phrase such as "A and/or B" herein is intended to include both A and B; A or B; A (alone); and B (alone). Likewise, the term "and/or" as used in a phrase such as "A, B, and/or C" is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
[0040] Where embodiments of the disclosure are described in terms of a Markush group or other grouping of alternatives, the disclosed method encompasses not only the entire group listed as a whole, but also each member of the group individually and all possible subgroups of the main group, and also the main group absent one or more of the group members. The disclosed methods also envisage the explicit exclusion of one or more of any of the group members in the disclosed methods.
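By way of illustration only, the members and subgroups encompassed by a grouping of alternatives can be enumerated programmatically. The sketch below assumes a hypothetical three-member group; the function name `markush_subgroups` is not part of the disclosure.

```python
from itertools import combinations


def markush_subgroups(members):
    """Yield every non-empty subgroup of a group of alternatives.

    Subgroups of size 1 are the individual members, the largest subgroup
    is the entire group, and the sizes in between are the main group
    absent one or more members.
    """
    items = list(members)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield combo


# Hypothetical group of alternatives drawn from the host cells named herein.
host_cells = ["HEK293", "CHO", "HeLa"]
subgroups = list(markush_subgroups(host_cells))
```

For a group of n members this yields 2^n - 1 subgroups, so the three-member example produces seven.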
Transcription control elements
[0041] In certain aspects, provided herein are efficient novel transcription control elements with increased transcriptional activity. In some embodiments, the transcription control elements described herein comprise one or more cis-regulatory modules (CRM) derived from the CMV promoter and a core promoter.
[0042] In some embodiments, the core promoter is derived from the CMV promoter. In some embodiments, the core promoter is not derived from the CMV promoter. In some embodiments, the CRM is directly linked to the core promoter. In some embodiments, the CRM is separated from the core promoter by a linker sequence. In some embodiments, the linker sequence comprises between 1 and 200 nucleotides. In some embodiments, the linker sequence comprises between 1 and 100 nucleotides. In some embodiments, the linker sequence comprises between 1 and 50 nucleotides. In some embodiments, the linker sequence comprises between 1 and 30 nucleotides. In some embodiments, the linker sequence comprises between 1 and 20 nucleotides. In some embodiments, the linker sequence comprises between 1 and 10 nucleotides.
[0043] In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence having at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 4-7 or 17-35. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence having at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 4. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence having at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 5. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence having at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 6. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence having at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 7. In some embodiments, the CRM comprises a nucleotide sequence derived from the corresponding region of SEQ ID NO: 1, wherein the CRM comprises one or more mutations disrupting one or more transcription factor regulatory elements (TFREs) listed in Table 2. In some embodiments, the CRM comprises one or more mutations disrupting one or more YY1, RBP-Jk and/or Gfi-1 TFREs.
[0044] In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence of SEQ ID NO: 4-7 or 17-35 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence of SEQ ID NO: 4 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence of SEQ ID NO: 5 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence of SEQ ID NO: 6 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, a cis-regulatory module (CRM) comprises a nucleotide sequence of SEQ ID NO: 7 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the CRM comprises a nucleotide sequence derived from the corresponding region of SEQ ID NO: 1, wherein the CRM comprises one or more mutations disrupting one or more transcription factor regulatory elements (TFREs) listed in Table 2. In some embodiments, the CRM comprises one or more mutations disrupting one or more YY1, RBP-Jk and/or Gfi-1 TFREs.
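As a non-limiting illustration, whether a candidate sequence falls within a stated substitution limit relative to a reference sequence can be checked by counting mismatched positions. The sketch below assumes equal-length sequences (substitutions only, no insertions or deletions); the function names and the short placeholder sequences are hypothetical and do not correspond to any SEQ ID NO.

```python
def count_substitutions(reference: str, candidate: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(reference) != len(candidate):
        raise ValueError("substitution counting requires equal-length sequences")
    return sum(
        1 for r, c in zip(reference.upper(), candidate.upper()) if r != c
    )


def within_substitution_limit(reference: str, candidate: str, limit: int) -> bool:
    """Return True if the candidate has no more than `limit` substitutions."""
    return count_substitutions(reference, candidate) <= limit


# Placeholder sequences for illustration only.
reference_crm = "ACGTACGTACGTACGTACGT"
candidate_crm = "ACGTACCTACGTACGAACGT"  # differs at two positions

n_substitutions = count_substitutions(reference_crm, candidate_crm)
ok_at_3 = within_substitution_limit(reference_crm, candidate_crm, 3)
ok_at_0 = within_substitution_limit(reference_crm, candidate_crm, 0)
```

A limit of 0 accepts only a sequence identical to the reference, which corresponds to the 100% identity embodiments.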
[0045] A transcription control element described herein can comprise any core promoter known to a skilled artisan capable of initiating transcription by RNA polymerase II. In some embodiments, the core promoter comprises one or more of a TATA box, an initiator (Inr), downstream promoter element (DPE), and motif ten element (MTE). In some embodiments, the core promoter comprises a TATA box. In some embodiments, the core promoter comprises an initiator (Inr). In some embodiments, the core promoter comprises a TATA box and an initiator (Inr).
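For illustration only, the presence of a TATA box or an initiator (Inr) in a candidate core promoter can be screened with a simple consensus scan. The consensus strings used below (TATAWAAR for the TATA box and YYANWYY for the mammalian Inr, in IUPAC notation) are commonly cited forms and are assumptions of this sketch, not definitions from the disclosure; the function name and the example sequence are likewise hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes needed for the consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def find_motif(sequence: str, consensus: str) -> list:
    """Return 0-based start positions of every match to an IUPAC consensus."""
    pattern = "".join(IUPAC[ch] for ch in consensus.upper())
    # A lookahead group reports overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# Assumed consensus strings (not taken from the disclosure).
TATA_BOX = "TATAWAAR"
INR = "YYANWYY"

# Hypothetical core promoter: a TATA box, a GC spacer, then an Inr.
core_promoter = "GGGC" + "TATAAAAG" + "GC" * 10 + "TCAGTCT" + "GG"

tata_positions = find_motif(core_promoter, TATA_BOX)
inr_positions = find_motif(core_promoter, INR)
```

A consensus match is only a first-pass screen; it does not establish that an element is functional in a given cell.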
[0046] In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16. In some embodiments, the core promoter comprises the nucleotide sequence of SEQ ID NO: 16.
[0047] In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2. In some embodiments, the core promoter comprises the nucleotide sequence of SEQ ID NO: 2.
[0048] In some embodiments, a transcription control element described herein comprises (a) a distal cis-regulatory module (CRM), (b) a proximal CRM, and (c) a core promoter wherein (i) the distal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 4, wherein the distal CRM does not comprise the nucleotide sequence of SEQ ID NO: 8 or SEQ ID NO: 10, and (ii) the proximal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 6, wherein the proximal CRM does not comprise the nucleotide sequence of SEQ ID NO: 12. In some embodiments, the distal CRM comprises a nucleotide sequence of SEQ ID NO: 4 comprising no more than 0, 3, 5, 10, or 15 substitutions, and the proximal CRM comprises a nucleotide sequence of SEQ ID NO: 6 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element is capable of mediating transcription of a heterologous polynucleotide encoding a polypeptide of interest operably linked to the transcription control element. In some embodiments, the transcription control element is capable of mediating transcription of a heterologous polynucleotide encoding a GFP reporter construct comprising the nucleotide sequence of SEQ ID NO: 44 in a HEK293 cell. In some embodiments, the transcription control element comprises less than 550 nucleotides. In some embodiments, the transcription control element comprises less than 500 nucleotides. In some embodiments, the transcription control element comprises less than 450 nucleotides. In some embodiments, the transcription control element comprises between 400 and 550 nucleotides. In some embodiments, the transcription control element comprises between 400 and 450 nucleotides. In some embodiments, the transcription control element comprises between 450 and 500 nucleotides. 
In some embodiments, the transcription control element comprises between 500 and 550 nucleotides. In some embodiments, the core promoter comprises a TATA-box and an Inr element. In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16. In some embodiments, the core promoter comprises the nucleotide sequence of SEQ ID NO: 16. In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2. In some embodiments, the core promoter comprises the nucleotide sequence of SEQ ID NO: 2. In some embodiments, the distal CRM comprises one or more of (a) the nucleotide sequence of SEQ ID NO: 9, and (b) the nucleotide sequence of SEQ ID NO: 11. In some embodiments, the distal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 5. In some embodiments, the distal CRM comprises a nucleotide sequence of SEQ ID NO: 5 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the distal CRM comprises the nucleotide sequence of SEQ ID NO: 5. In some embodiments, the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 13. In some embodiments, the proximal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 7. In some embodiments, the proximal CRM comprises a nucleotide sequence of SEQ ID NO: 7 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 7. In some embodiments, the distal CRM comprises the nucleotide sequence of SEQ ID NO: 5, and the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 7. 
In some embodiments, the distal CRM is directly linked to the proximal CRM. In some embodiments, the distal CRM is separated from the proximal CRM by a linker sequence. In some embodiments, the CRM is directly linked to the core promoter. In some embodiments, the CRM is separated from the core promoter by a linker sequence. In some embodiments, the linker sequence or linker sequences independently comprise between 1 and 200 nucleotides. In some embodiments, the linker sequence or linker sequences independently comprise between 1 and 100 nucleotides. In some embodiments, the linker sequence or linker sequences independently comprise between 1 and 50 nucleotides. In some embodiments, the linker sequence or linker sequences independently comprise between 1 and 30 nucleotides. In some embodiments, the linker sequence or linker sequences independently comprise between 1 and 20 nucleotides. In some embodiments, the linker sequence or linker sequences independently comprise between 1 and 10 nucleotides.
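As a non-limiting illustration, the arrangement described above (a distal CRM, a proximal CRM, and a core promoter, each either directly linked or separated by a linker sequence) can be modeled as a simple data structure with checks on linker and total length. The class name, field names, and placeholder sequences below are hypothetical and do not correspond to any SEQ ID NO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionControlElement:
    distal_crm: str
    proximal_crm: str
    core_promoter: str
    crm_linker: str = ""       # empty string models "directly linked"
    promoter_linker: str = ""  # empty string models "directly linked"

    def sequence(self) -> str:
        """Concatenate the parts in 5' to 3' order."""
        return (
            self.distal_crm
            + self.crm_linker
            + self.proximal_crm
            + self.promoter_linker
            + self.core_promoter
        )

    def linkers_within(self, max_linker: int = 200) -> bool:
        """Check each linker independently against a maximum length."""
        return all(
            len(linker) <= max_linker
            for linker in (self.crm_linker, self.promoter_linker)
        )

    def shorter_than(self, max_length: int = 550) -> bool:
        """Check that the element comprises less than max_length nucleotides."""
        return len(self.sequence()) < max_length


# Placeholder sequences for illustration only.
element = TranscriptionControlElement(
    distal_crm="ACGT" * 50,          # 200 nt
    proximal_crm="TGCA" * 40,        # 160 nt
    core_promoter="TATAAAAGGC" * 8,  # 80 nt
    crm_linker="GG" * 5,             # 10 nt linker between the CRMs
)
total_length = len(element.sequence())
```

Treating an empty linker as direct linkage keeps both embodiments in one structure, and checking each linker independently mirrors the "independently comprise" language of the paragraph.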
[0049] In some embodiments, a transcription control element described herein comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 3 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 3.
[0050] In some embodiments, a transcription control element described herein comprises one or more cis-regulatory modules (CRM), and a core promoter, wherein the CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 17-35. In some embodiments, the transcription control element comprises one cis-regulatory module (CRM), and a core promoter. In some embodiments, the CRM comprises a nucleotide sequence of SEQ ID NO: 17-35 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the CRM comprises the nucleotide sequence of SEQ ID NO: 17-35. In some embodiments, the transcription control element comprises less than 550 nucleotides. In some embodiments, the transcription control element comprises less than 500 nucleotides. In some embodiments, the transcription control element comprises less than 450 nucleotides. In some embodiments, the transcription control element comprises less than 400 nucleotides. In some embodiments, the transcription control element comprises less than 350 nucleotides. In some embodiments, the transcription control element comprises less than 300 nucleotides. In some embodiments, the transcription control element comprises less than 250 nucleotides. In some embodiments, the transcription control element comprises less than 200 nucleotides. In some embodiments, the transcription control element comprises between 190 and 550 nucleotides. In some embodiments, the transcription control element comprises between 190 and 500 nucleotides. In some embodiments, the transcription control element comprises between 190 and 450 nucleotides. In some embodiments, the transcription control element comprises between 190 and 400 nucleotides. In some embodiments, the transcription control element comprises between 190 and 350 nucleotides. In some embodiments, the transcription control element comprises between 190 and 300 nucleotides. In some embodiments, the transcription control element is capable of mediating transcription of a heterologous polynucleotide encoding a polypeptide of interest operably linked to the transcription control element. In some embodiments, the transcription control element is capable of mediating transcription of a heterologous polynucleotide encoding a GFP reporter construct comprising the nucleotide sequence of SEQ ID NO: 44 in a HEK293 cell. In some embodiments, the core promoter comprises a TATA-box and an Inr element.
In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16. In some embodiments, the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 36-43. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 36-43 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 36-43. In some embodiments, the CRM is directly linked to the core promoter. In some embodiments, the CRM is separated from the core promoter by a linker sequence. In some embodiments, the linker sequence comprises between 1 and 200 nucleotides. In some embodiments, the linker sequence comprises between 1 and 100 nucleotides. In some embodiments, the linker sequence comprises between 1 and 50 nucleotides. In some embodiments, the linker sequence comprises between 1 and 30 nucleotides. In some embodiments, the linker sequence comprises between 1 and 20 nucleotides. In some embodiments, the linker sequence comprises between 1 and 10 nucleotides.
Polynucleotides
[0100] In certain aspects, provided herein are isolated polynucleotides comprising a transcription control element described herein. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 3 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 36-43. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 36-43 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 36-43.
[0101] In some embodiments, an isolated polynucleotide described herein comprises an expression cassette comprising a transcription control element described herein and a polynucleotide encoding a polypeptide of interest operably linked to the transcription control element. In some embodiments, the expression cassette comprises additional elements capable of controlling the expression of the polypeptide of interest. In some embodiments, the expression cassette comprises an enhancer operably linked to the transcription control element. In some embodiments, the expression cassette comprises a polyadenylation signal operably linked to the transcription control element. In some embodiments, the polynucleotide encoding the polypeptide of interest comprises a splice acceptor, splice donor or intron.
In some embodiments, the polynucleotide encoding the polypeptide of interest comprises a 3' untranslated region or a 5' untranslated region. In some embodiments, the 3' untranslated region comprises a polyadenylation signal. In some embodiments, the polypeptide of interest is an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme. In some embodiments, the polypeptide of interest is an antibody. In some embodiments, the polypeptide of interest is a viral protein. In some embodiments, the polypeptide of interest is a viral capsid protein. In some embodiments, the polypeptide of interest is a viral Rep protein. In some embodiments, the polypeptide of interest is an AAV capsid protein. In some embodiments, the polypeptide of interest is an AAV Rep protein. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 3 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 36-43. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 36-43 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 36-43.
[0102] In certain aspects, provided herein are vectors comprising a transcription control element described herein. In some embodiments, the vector comprises an expression cassette described herein. In some embodiments, the vector is an expression vector. In some embodiments, the vector is a viral vector. In some embodiments, the vector is an AAV vector. In some embodiments, the vector is suitable for transient expression of a polypeptide of interest in a host cell, e.g., a HEK293 host cell or a HEK293 derived host cell. In some embodiments, the vector is suitable for expression of a polypeptide of interest in a host cell or host organism. In some embodiments, the vector is capable of integrating into host cell chromosomal DNA. In some embodiments, the vector comprises a replication origin capable of enabling the vector to replicate autonomously in a prokaryotic host, e.g., E. coli, and a bacterial selectable marker gene. Bacterial origins of replication include but are not limited to the origins of replication of plasmids pBR322, pUC19, pSC101, pACYC177, and pACYC184 permitting replication in E. coli. Bacterial selectable marker genes include but are not limited to genes that confer antibiotic resistance, e.g., resistance to ampicillin, kanamycin, erythromycin, chloramphenicol or tetracycline. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 3 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 36-43.
In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 36-43 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 36-43.
[0103] In some embodiments, a vector comprising a transcription control element described herein is an antibody expression vector. In some embodiments, the vector comprises a polynucleotide encoding an antibody. In some embodiments, the vector comprises a polynucleotide encoding an antibody heavy chain variable region and an antibody light chain variable region. In some embodiments, the vector comprises a polynucleotide encoding an antibody heavy chain variable region. In some embodiments, the vector comprises a polynucleotide encoding an antibody light chain variable region.
[0104] In certain aspects, provided herein are host cells comprising a transcription control element described herein. In some embodiments, the host cell comprises an isolated polynucleotide described herein. In some embodiments, the host cell comprises an expression cassette described herein. In some embodiments, the host cell comprises a vector described herein. In some embodiments, the host cell is a prokaryotic cell, e.g., E. coli, suitable for producing the isolated polynucleotide, expression cassette or vector. In some embodiments, the host cell is a HEK293 cell, HEK293 derived cell, CHO cell, CHO derived cell, HeLa cell, SF-9 cell, BHK cell, Vero cell, or PerC6 cell. In some embodiments, the host cell is a HEK293 cell. In some embodiments, the host cell is a HEK293 derived cell. In some embodiments, the host cell expresses transcriptional repressors YY1, RBP-Jk, Gfi-1 and/or ERF. The improved promoters disclosed herein drive higher protein expression in a transfected host cell than is obtained in a host cell transfected with a wild-type CMV promoter-driven polynucleotide, wherein the host cell expresses transcriptional repressors YY1, RBP-Jk, Gfi-1 and/or ERF, especially high levels of such repressors. In some embodiments, the host cell is capable of expressing a polypeptide of interest. In some embodiments, the polypeptide of interest is an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme. In some embodiments, the polypeptide of interest is an antibody or a viral protein. In some embodiments, the host cell is capable of producing a recombinant AAV particle. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 3.
In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 3 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 3. In some embodiments, the transcription control element comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 36-43. In some embodiments, the transcription control element comprises a nucleotide sequence of SEQ ID NO: 36-43 comprising no more than 0, 3, 5, 10, or 15 substitutions. In some embodiments, the transcription control element comprises the nucleotide sequence of SEQ ID NO: 36-43.
[0105] In some embodiments, a polynucleotide disclosed herein encodes a polypeptide of interest selected from the group of an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme. In some embodiments, the polypeptide of interest is an antibody or antigen-binding fragment thereof. In some embodiments, the polypeptide of interest is panitumumab, omalizumab, abagovomab, abciximab, actoxumab, adalimumab, adecatumumab, afelimomab, afutuzumab, alacizumab, alacizumab, alemtuzumab, alirocumab, altumomab, amatuximab, amatuximab, anatumomab, anrukinzumab, apolizumab, arcitumomab, atinumab, tocilizumab, basiliximab, bectumomab, belimumab, bevacizumab, besilesomab, bezlotoxumab, biciromab, blinatumomab, canakinumab, certolizumab, cetuximab, cixutumumab, daclizumab, denosumab, eculizumab, edrecolomab, efalizumab, efungumab, epratuzumab, ertumaxomab, etaracizumab, figitumumab, golimumab, ibritumomab tiuxetan, igovomab, imgatuzumab, infliximab, inolimomab, inotuzumab, labetuzumab, lebrikizumab, moxetumomab, natalizumab, nivolumab, obinutuzumab, oregovomab, palivizumab, panitumumab, pertuzumab, ramucirumab, ranibizumab, rituximab, secukinumab, tocilizumab, tositumomab, tralokinumab, tucotuzumab, trastuzumab, ustekinumab, vedolizumab, veltuzumab, zalutumumab, or zatuximab. In some embodiments, the polypeptide of interest is an enzyme. In some embodiments, the polypeptide of interest is alpha-galactosidase, myozyme, or cerezyme. In some embodiments, the polypeptide of interest is human erythropoietin, tumor necrosis factor (TNF), or an interferon alpha or beta. In some embodiments, the polypeptide of interest is alglucosidase alfa, laronidase, abatacept, galsulfase, lutropin alfa, antihemophilic factor, agalsidase beta, interferon beta-1a, darbepoetin alfa, tenecteplase, etanercept, coagulation factor IX, follicle stimulating hormone, interferon beta-1a, imiglucerase, dornase alfa, epoetin alfa, insulin or insulin analogs, mecasermin, factor VIII, factor VIIa, anti-thrombin III, protein C, human albumin, erythropoietin, granulocyte colony stimulating factor, granulocyte macrophage colony stimulating factor, interleukin-11, laronidase, idursuphase, galsulphase, alpha-1-proteinase inhibitor, lactase, adenosine deaminase, tissue plasminogen activator, thyrotropin alpha, acid, betagalactosidase, neuraminidase, hexosaminidase A, or hexosaminidase B.
Methods of use
[0106] In certain aspects, provided herein are methods for expressing a polypeptide of interest in a host cell comprising culturing a host cell described herein under suitable conditions to produce the polypeptide of interest.
[0107] In some embodiments, the polypeptide of interest is an AAV capsid protein. In some embodiments, the polypeptide of interest is an AAV Rep protein.
[0108] In some embodiments, the host cell is a HEK293 cell, HEK293 derived cell, CHO cell, CHO derived cell, HeLa cell, SF-9 cell, BHK cell, Vero cell, or PerC6 cell. In some embodiments, the host cell is a HEK293 cell. In some embodiments, the host cell is a HEK293 derived cell. In some embodiments, the host cell is capable of producing a recombinant AAV particle.
[0109] In some embodiments, the polypeptide of interest is an antibody. In some embodiments, the polypeptide of interest is a viral protein. In some embodiments, the polypeptide of interest is a viral capsid protein. In some embodiments, the polypeptide of interest is a viral Rep protein. In some embodiments, the polypeptide of interest is an AAV capsid protein. In some embodiments, the polypeptide of interest is an AAV Rep protein.
[0110] In some embodiments, a polynucleotide disclosed herein encodes a polypeptide of interest selected from the group of an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme. In some embodiments, the polypeptide of interest is an antibody or antigen-binding fragment thereof. In some embodiments, the polypeptide of interest is panitumumab, omalizumab, abagovomab, abciximab, actoxumab, adalimumab, adecatumumab, afelimomab, afutuzumab, alacizumab, alacizumab, alemtuzumab, alirocumab, altumomab, amatuximab, amatuximab, anatumomab, anrukinzumab, apolizumab, arcitumomab, atinumab, tocilizumab, basiliximab, bectumomab, belimumab, bevacizumab, besilesomab, bezlotoxumab, biciromab, blinatumomab, canakinumab, certolizumab, cetuximab, cixutumumab, daclizumab, denosumab, eculizumab, edrecolomab, efalizumab, efungumab, epratuzumab, ertumaxomab, etaracizumab, figitumumab, golimumab, ibritumomab tiuxetan, igovomab, imgatuzumab, infliximab, inolimomab, inotuzumab, labetuzumab, lebrikizumab, moxetumomab, natalizumab, nivolumab, obinutuzumab, oregovomab, palivizumab, panitumumab, pertuzumab, ramucirumab, ranibizumab, rituximab, secukinumab, tocilizumab, tositumomab, tralokinumab, tucotuzumab, trastuzumab, ustekinumab, vedolizumab, veltuzumab, zalutumumab, or zatuximab. In some embodiments, the polypeptide of interest is an enzyme. In some embodiments, the polypeptide of interest is alpha-galactosidase, myozyme, or cerezyme. In some embodiments, the polypeptide of interest is human erythropoietin, tumor necrosis factor (TNF), or an interferon alpha or beta. In some embodiments, the polypeptide of interest is alglucosidase alfa, laronidase, abatacept, galsulfase, lutropin alfa, antihemophilic factor, agalsidase beta, interferon beta-1a, darbepoetin alfa, tenecteplase, etanercept, coagulation factor IX, follicle stimulating hormone, interferon beta-1a, imiglucerase, dornase alfa, epoetin alfa, insulin or insulin analogs, mecasermin, factor VIII, factor VIIa, anti-thrombin III, protein C, human albumin, erythropoietin, granulocyte colony stimulating factor, granulocyte macrophage colony stimulating factor, interleukin-11, laronidase, idursuphase, galsulphase, alpha-1-proteinase inhibitor, lactase, adenosine deaminase, tissue plasminogen activator, thyrotropin alpha, acid, betagalactosidase, neuraminidase, hexosaminidase A, or hexosaminidase B.
EXAMPLES
Example 1 - Engineering of the CMV promoter for controlled expression of recombinant genes in HEK293 cells.
[0111] Expression of recombinant genes in HEK293 cells is frequently utilized for production of recombinant proteins and viral vectors. These systems frequently employ the cytomegalovirus (CMV) promoter to drive recombinant gene transcription. However, the mechanistic basis of CMV-mediated transcriptional activation is unknown and consequently there are no strategies to engineer CMV for controlled expression of recombinant genes. Extensive bioinformatic analyses of transcription factor regulatory elements (TFREs) within the human CMV sequence and transcription factor mRNAs within the HEK293 transcriptome revealed 80 possible regulatory interactions. In vitro functional testing using reporter constructs harboring discrete TFREs or CMV deletion variants identified key TFRE components and clusters of TFREs (cis-regulatory modules) within the CMV sequence. The data revealed that CMV activity in HEK293 cells is a cooperative function of the promoter's various constituent TFREs including AhR:ARNT, CREB, E4F, Sp1, ZBED1, JunB, c-Rel and NF-κB. Also identified were critical Sp1-dependent upstream activator elements near the transcriptional start site that were required for efficient transcription, and YY1 and RBP-Jκ binding sites that mediate transrepression. This study shows for the first time that novel, compact CMV-derived promoters can be engineered that exhibit a significantly higher transcriptional efficiency (activity per unit DNA sequence) or increase in total activity.
[0112] HEK and CHO cell cultures: Suspension-adapted HEK293 cells were cultured in supplemented Dynamis™ medium (Thermo Fisher Scientific). Expi293F™ cells (Thermo Fisher Scientific) were cultured in Expi293™ Expression medium (Thermo Fisher Scientific). CHO-S cells (Thermo Fisher Scientific) were cultured in supplemented CD CHO medium (Thermo Fisher Scientific). Cells were maintained in Erlenmeyer flasks (Corning) at 37°C, 140 rpm under 5% CO2, 85% humidity and were sub-cultured every 3-4 days by seeding at 3×10^5 viable cells/mL. Cell viability and viable cell density (VCD) were measured using a Vi-CELL® XR (Beckman Coulter).
[0113] Vector construction: pmaxGFP™ vector (Lonza) was utilized as a backbone. The CMV promoter and chimeric intron of pmaxGFP™ were deleted by digestion with BsrGI and KpnI, and replaced with a short DNA fragment containing EcoRI and HindIII cloning sites. A full-length hCMV-IE promoter (-500 to +48 relative to the TSS) was synthesized (Eurofins Genomics) and inserted directly upstream of the green fluorescent protein (GFP) open reading frame (ORF) of the promoterless vector backbone. A minimal CMV core promoter (-36 to +48 relative to the TSS) was also synthesized and inserted directly upstream of the GFP ORF. To create TFRE reporter plasmids, synthetic oligonucleotides containing 7x repeat copies of the TFRE sequences in Table 2 were synthesized, PCR amplified (Q5 high-fidelity 2x master mix; NEB), and purified (QIAquick® PCR Purification kit; Qiagen). The PCR products were then digested, gel extracted (QIAquick® Gel Extraction kit; Qiagen) and inserted into the cloning sites upstream of the CMV core promoter. Discrete regions of the CMV promoter sequence were PCR amplified and inserted upstream of the CMV core promoter. Mutated promoter constructs were synthesized and inserted upstream of the CMV core promoter. The CBh promoter was excised from pSpCas9(BB)-2A-GFP plasmid (Addgene) by digestion with KpnI and AgeI and inserted directly upstream of the GFP ORF. Clonally derived plasmids were purified using a QIAGEN Plasmid Plus kit (Qiagen). The sequence of all plasmid constructs was confirmed by restriction enzyme analysis and DNA sequencing.
[0114] PEI-mediated transient transfection: One day before transfection, cells were sub-cultured in an Erlenmeyer flask, grown to 1×10^6 cells/mL and aliquots of 10 mL were added to each TubeSpin bioreactor tube (TPP). 8 μg of DNA and 24 μL of PEI MAX (1 mg/mL; Polysciences) were each pre-diluted in 150 μL of NaCl (150 mM; Polyplus-transfection), combined and incubated at room temperature for 4 min before being added into culture. Transfected cells were cultured for 48 h at 37°C, 230 rpm under 5% CO2, 85% humidity.
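By way of non-limiting illustration, the reagent amounts stated above can be converted into a PEI:DNA mass ratio and a DNA dose per cell. The following Python sketch uses only the quantities given in the preceding paragraph; the function name and the returned field names are hypothetical and are not part of the disclosure.

```python
# Reagent amounts per transfection, taken from the paragraph above.
DNA_UG = 8.0               # plasmid DNA per tube (ug)
PEI_VOLUME_UL = 24.0       # PEI MAX volume per tube (uL)
PEI_STOCK_MG_PER_ML = 1.0  # PEI MAX stock concentration (mg/mL)
CULTURE_ML = 10.0          # culture volume per TubeSpin tube (mL)
CELLS_PER_ML = 1e6         # cell density at transfection (cells/mL)


def transfection_summary(dna_ug, pei_volume_ul, pei_stock_mg_per_ml,
                         culture_ml, cells_per_ml):
    """Derive ratios from the stated amounts (hypothetical helper)."""
    # 1 mg/mL is the same concentration as 1 ug/uL.
    pei_ug = pei_volume_ul * pei_stock_mg_per_ml
    million_cells = culture_ml * cells_per_ml / 1e6
    return {
        "pei_ug": pei_ug,
        "pei_to_dna_w_w": pei_ug / dna_ug,
        "dna_ug_per_ml_culture": dna_ug / culture_ml,
        "dna_ug_per_million_cells": dna_ug / million_cells,
    }


summary = transfection_summary(DNA_UG, PEI_VOLUME_UL, PEI_STOCK_MG_PER_ML,
                               CULTURE_ML, CELLS_PER_ML)
for name, value in summary.items():
    print(f"{name}: {value:g}")
```

Under these stated amounts the PEI:DNA ratio is 3:1 (w/w) and the DNA dose is 0.8 μg per 10^6 cells.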
[0115] Measurement of recombinant GFP expression in vitro: GFP expression was quantified using a SpectraMax iD5 microplate reader (Molecular Devices) 48 h post-transfection. Prior to fluorescence read (excitation: 485 nm, emission: 535 nm), culture medium was removed by centrifugation at 200 g for 5 min. 1.5×10^6 viable cells were resuspended in 750 μL Dulbecco’s phosphate-buffered saline (DPBS; Sigma) and then transferred to a 96-well microplate at 3×10^5 cells (150 μL) per well. To measure transfection efficiency, cells were analyzed using an Attune™ Acoustic Focusing Cytometer (Thermo Fisher Scientific). Background fluorescence/absorbance was determined in cells transfected with a promoterless vector.
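As a minimal sketch of how such plate-reader data might be processed, the Python example below subtracts the promoterless-vector background and expresses a test promoter's signal as a percentage of a reference promoter (for example, wild-type CMV). The normalization scheme, the function names and all fluorescence readings are assumptions made for illustration; they are not taken from the disclosure. The sketch also checks the cell-number arithmetic stated above.

```python
from statistics import mean


def cells_per_well(total_cells, resuspension_ul, well_ul):
    """Cells delivered to one well after resuspension (hypothetical helper)."""
    return total_cells * well_ul / resuspension_ul


def relative_activity(sample_reads, background_reads, reference_reads):
    """Background-subtracted signal as a percentage of the reference promoter."""
    background = mean(background_reads)
    sample = mean(sample_reads) - background
    reference = mean(reference_reads) - background
    if reference <= 0:
        raise ValueError("reference signal must exceed background")
    return 100.0 * sample / reference


# 1.5e6 cells in 750 uL, dispensed at 150 uL per well.
print(cells_per_well(1.5e6, 750, 150))

# Illustrative readings only (arbitrary fluorescence units).
test_promoter = [560, 540]
promoterless = [50, 50]
wild_type_cmv = [1050, 1050]
print(relative_activity(test_promoter, promoterless, wild_type_cmv))
```

With these made-up readings the test promoter scores 50% of the reference, and the per-well cell number agrees with the 3×10^5 cells stated in the protocol.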
[0116] In silico analysis of transcription factor regulatory elements: Genomatix Gene Regulation software (MatInspector Release 8.4 and MatBase Version 11.2; Precigen Bioinformatics Germany) was used to analyze the CMV promoter to find putative human TFREs. The cognate TF of each TFRE matrix was obtained from previously published studies as listed in MatBase.
[0117] Analysis of HEK293 transcription factor expression: HEK293 cells were seeded at 1×10^6 viable cells/mL and cultured as described above. From Day 3, cells were fed daily (1% v/v) with feed medium containing 130 g/L glucose, 29.23 g/L L-glutamine, 25 g/L arginine and 20 g/L serine. Total RNA was extracted from duplicate cultures during exponential (~5×10^6 cells/mL) and stationary phases (~1.6×10^7 cells/mL) of growth. For each sample, 3×10^6 viable cells were collected by centrifugation at 200 g for 5 min. Cell pellets were immediately resuspended in 300 μL of RNAprotect® Cell Reagent and stored at -80°C. RNA-seq libraries were prepared and sequenced by GENEWIZ using an Illumina NovaSeq™ (Illumina). Galaxy (usegalaxy.org) and R software were used to analyze the RNA-seq data using the Salmon alignment tool and human GRCh38 GTF and FASTA files from www.ensembl.org. A curated database of ~1,600 human TFs was obtained from Lambert et al. [13].
Example 2 - In silico and in vitro identification of regulators of CMV promoter transcriptional activity.
[0118] In order to identify potential regulatory elements in CMV capable of recombinant gene transactivation in HEK293 cells, a bioinformatic survey was performed of (i) putative TFREs (binding sites) in the promoter, and (ii) the TF repertoire of HEK293 cells based on RNA-seq datasets. With regard to the latter, although gene expression analysis does not permit precise quantification of active TF levels, it provides useful information on the general TF expression profile, where genes with more than two transcripts per million (TPM) may be considered active. [14] Using the Genomatix search tool, 108 discrete TFREs from 74 TF families were identified in the CMV promoter at copy numbers ranging from one to six. However, the gene expression analysis indicated that 22% (24/108) of the TFREs’ cognate TFs were not expressed in HEK293 cells (exponential phase log2 TPM < 1; Figure 1A). Further, in order to identify key regulatory elements, the search was focused on TFs that exhibit gene expression activities in both exponential and stationary phases of culture (i.e. “context-specific” expression can extend beyond cell-type), thus eliminating an additional four TFs that were not expressed in the latter phase of culture and yielding 80 potential TFREs. A skilled artisan understands that two TFREs may have identical or overlapping sequences within the CMV promoter (e.g., NF-κB and NFAT5). Table 1 lists the identified TFREs and their cognate TFs while Figure 1B shows the map of select TFREs in the CMV promoter.
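The expression filter described above (a TF is treated as expressed when its abundance is at least two TPM, i.e. log2 TPM of at least 1, in both growth phases) can be sketched in Python as follows. The TF names and TPM values in the example are hypothetical placeholders rather than data from Table 1, and the function name is an assumption introduced for illustration only.

```python
import math

LOG2_TPM_CUTOFF = 1.0  # log2(2 TPM); values below this are treated as not expressed


def expressed_in_both_phases(tpm_by_tf, cutoff=LOG2_TPM_CUTOFF):
    """Return the TFs whose log2 TPM meets the cutoff in both phases.

    tpm_by_tf maps a TF name to (exponential_tpm, stationary_tpm).
    """
    kept = []
    for tf, phase_tpms in tpm_by_tf.items():
        if all(tpm > 0 and math.log2(tpm) >= cutoff for tpm in phase_tpms):
            kept.append(tf)
    return sorted(kept)


# Hypothetical TPM values for three placeholder TFs.
example = {
    "TF_A": (45.0, 40.0),  # expressed in both phases
    "TF_B": (0.5, 0.4),    # not expressed in either phase
    "TF_C": (3.0, 1.2),    # lost in stationary phase
}
print(expressed_in_both_phases(example))

# Bookkeeping from the text: 108 TFREs, 24 with non-expressed cognate TFs,
# and a further 4 removed for lack of stationary-phase expression.
remaining_tfres = 108 - 24 - 4
print(remaining_tfres)
```

Only the placeholder TF that passes the cutoff in both phases is retained, and the bookkeeping reproduces the 80 potential TFREs reported above.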
[0119] Table 1. Details of discrete TFREs identified in CMV promoter by bioinformatic analysis using Genomatix software, their cognate TFs and the corresponding gene expression levels (transcripts per million; TPM) in HEK293 cells during exponential and stationary phases of culture. TFREs are grouped and arranged according to their TF family. A: not identified in CMV promoter. B: binds to the 21 bp repeat motifs within the distal enhancer.
[Table 1 is provided as images (imgf000031_0001 to imgf000034_0001) in the original publication.]
[0120] To minimize the TFRE pool for functional testing, TFREs with substantially overlapping binding sites were filtered out and two TFREs from each TF family were selected, yielding a subset of 25 TFREs. To measure the relative ability of TFREs to activate transcription of recombinant genes in HEK293 cells, a set of GFP reporter constructs was created that contained seven repeat copies of a specific TFRE in series, upstream of a minimal CMV core promoter (-36 to +48 relative to the TSS, containing a TATA box and an Inr motif) as previously described. [15] Optimized PEI-mediated transient transfection of plasmid DNA into suspension HEK293 cells yielded a transfection efficiency of ~94% with a cell viability of ~90% at 48 h post-transfection (measured using a vector harboring a CMV promoter). Additionally, preliminary experiments confirmed that GFP fluorescence intensity in the HEK293 cell host is directly proportional to GFP mRNA levels post-transfection (Figures 6A and 6B). [16] Measurement of GFP expression after transient transfection of HEK293 cells with each TFRE reporter plasmid is shown in Figure 2A. This analysis revealed eight TFREs with significantly increased expression (> 10-fold, p < 0.01) over basal expression from the minimal core promoter, i.e. AhR:ARNT, CREB/ATF1, CREB/E4F, Sp1, ZBED1, JunB, c-Rel and NF-κB. In some instances, TFRE sequences with competing (overlapping) binding sites may be resolved by utilizing their consensus sequence (e.g. CREB and E4F; Figure 2A). Other TFRE reporter constructs displayed no obvious increase in GFP above core control level, suggesting alternative mechanisms of TF-mediated transcriptional activation or suboptimal TF binding sequences. To elucidate the latter, the consensus sequences of MYBL1, Oct and E2F (selected based on data not shown) were tested. This analysis revealed that the consensus sequences exhibited a 10- to 53-fold increase in expression over the core promoter (Figure 2A), indicating that the TFREs were essentially able to mediate activation of recombinant gene transcription in HEK293 cells using available TF activity.
[0121] In order to both confirm and further demonstrate the distinctive transcriptional landscape of HEK293 cells that influences CMV-mediated TGE, the NF-κB and CREB sequences were tested in CHO cells, as well as an NF-κB p65 subunit sequence that is not present in the CMV promoter (Table 2). This analysis (Figure 2B) indicated that NF-κB and CREB were highly active in CHO cells (133% and 62% of CMV activity respectively), in line with our previous study that identified these two elements as key positive regulators of CHO cell-specific CMV promoter activity. [9] While the NF-κB p65 subunit was five times more active than NF-κB in HEK293 cells, the activity was only one-fifth of that observed in CHO cells. We therefore deduced that HEK293 cell-specific regulation of CMV promoter activity, in contrast to CHO, was a function of cooperative interactions amongst a broader range of TFREs. A skilled person would understand that the findings using HEK293 cells are directly applicable to host cells with a TFRE profile similar to that of HEK293 cells.
Table 2. DNA sequences of selected TFREs identified by bioinformatic survey of CMV promoter. The relative ability of each TFRE sequence to activate transcription of recombinant GFP genes in HEK293 cells is shown in Figure 2.
[Table 2 is provided as images (imgf000036_0001 and imgf000037_0001) in the original publication.]
Example 3 - CMV promoter-mediated gene expression in HEK cells is regulated by proximal elements.
[0122] In its natural context the CMV promoter can be divided into two modular components, the proximal and distal enhancers (Figure 3A). [1] Furthermore, TFREs often occur together in clusters as cis-regulatory modules (CRMs) where some elements may require interactions with adjacent or nearby TFRE partners in order to drive transcription. [17] To identify DNA sequence regions that are required for regulating gene expression in HEK293 cells, seven ~150-bp CRMs were inserted upstream of the CMV core in GFP reporter vectors (CRMs 1-7; Figure 3A). Figure 3B shows transient GFP reporter production from each CRM. CRMs from within the proximal enhancer sequence were generally more active than those from the distal, with CRM 1 alone yielding 67% of CMV’s transcriptional activity. Analysis of the TFRE composition indicated that all positive regulators identified in the functional screen (Figure 2A) occurred in CRM 1, with one copy each of AhR:ARNT, CREB/ATF1, CREB/E4F, ZBED1, JunB, c-Rel and NF-κB and two copies of Sp1. Moreover, multiple copies of CREB/E4F and Sp1 were present in CRMs 6 and 7, yielding 32-42% of CMV’s activity. Conversely, CRMs from the middle of the CMV promoter (i.e. CRMs 4 and 5) did not display observable activity (< 7% of CMV). This was not unexpected considering that the constituent TFREs of these CRMs were mostly inactive in the functional screen.
[0123] Assembly of CRMs and comparison of their relative activity provided further analysis of individual CRM functions (Figure 3B). Combining CRM 1 and CRM 7 (67% and 42% of CMV activity respectively) yielded a promoter with only 90% CMV activity (CRM 1+7), suggesting a partially redundant function of the distal enhancer and/or spatial effects. On the other hand, adding inactive CRM 4 (7% CMV) onto CRM 1 significantly enhanced the transcriptional activity to 86% CMV (CRM 1+4). These data imply a synergistic interaction of specific TFREs within the proximal enhancer. To explore this observation further, an extended CRM 1 reporter vector (CRM 1+2) incorporating the NFATc1, NF1 and LEF1 binding sites was constructed. Even though these TFRE sequences were not active on their own (Figure 2A), the extended promoter displayed a 15% increase in activity (Figure 3B), possibly via NFATc1-c-Rel interaction (-117 and -57 relative to the TSS respectively). [18] Critically, the data in Figure 3B reveal that CRM 2 exhibited 64% lower activity than CRM 1 despite a significant sequence overlap (Figure 3A), suggesting that either additional TFREs within the 5’ region of CRM 2 functioned to negatively regulate transcription, or essential regulators of CMV-mediated TGE in HEK293 were located in the 3’ region of CRM 1. With regard to the former, the apparent increase in GFP activity of CRM 1+2 compared to CRM 1 (see above) discounted the possibility of a specific transrepression effect of CRM 2. To substantiate the latter, a reporter vector utilizing a CMV enhancer/chicken β-actin hybrid (CBh) promoter was constructed (Figure 3A). [19] The promoter, comprising a practically complete CMV enhancer apart from CRM 1, exhibited only 41% of CMV’s activity. Combining all observations made above, it was inferred that (i) TFREs within the proximal enhancer functioned synergistically to drive transcription, and (ii) critical regulators of CMV promoter activity in HEK293 were located in the 3’ region of the proximal enhancer sequence (i.e. approximately -90 to -42 relative to the TSS).
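One simple way to read the CRM combination data is to compare each observed activity with the sum of the activities of its component CRMs. The Python sketch below does this using only the percentages quoted in the preceding paragraph. The additive null model is an assumption introduced here for illustration and is not a model asserted by the disclosure; the dictionary and function names are likewise hypothetical.

```python
# Activities as a percentage of wild-type CMV, as quoted in the text above.
single_crm = {"CRM 1": 67.0, "CRM 4": 7.0, "CRM 7": 42.0}
combined_crm = {
    ("CRM 1", "CRM 7"): 90.0,
    ("CRM 1", "CRM 4"): 86.0,
}


def observed_over_additive(parts, observed, singles):
    """Ratio of observed activity to the sum of the parts (assumed null model)."""
    expected = sum(singles[name] for name in parts)
    return observed / expected


ratios = {
    parts: observed_over_additive(parts, observed, single_crm)
    for parts, observed in combined_crm.items()
}
for parts, ratio in ratios.items():
    label = "more than additive" if ratio > 1 else "less than additive"
    print(" + ".join(parts), f"{ratio:.2f}", label)
```

Under this assumed model, CRM 1+7 reaches about 0.83 of the additive expectation (consistent with partial redundancy), whereas CRM 1+4 reaches about 1.16 (consistent with a synergistic contribution from an otherwise inactive module).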
Example 4 - Spl binding sites near the TATA-box are essential for efficient CMV promoter-mediated gene expression in HEK293 cells.
[0124] In order to specifically determine the key regulators of CMV-mediated gene expression in HEK293 cells, CMV promoter variants with specific TFREs within -107 to -45 relative to the TSS ‘knocked-out’ were created. Proximal CMV (-300 to +48 relative to the TSS, ~84% of CMV activity) rather than full-length CMV promoter was utilized for maximal impact of a single TFRE knock-out (i.e. minimal potential “noise” by other elements). Selective mutation was performed on the core sequence of a specific TFRE in order to disrupt the binding site without perturbing overlapping TFREs or introducing new ones (Figure 4A). Further, given the complexity of the CMV promoter, it was hypothesized that different HEK293 hosts may vary in their TF repertoires in ways that could significantly influence CMV promoter regulation. In order to evaluate this, the activity of the synthetic proximal CMV constructs in our standard HEK293 cell line as well as the commercially available Expi293F cell line was determined. Measurement of GFP production after transient transfection of HEK293 and Expi293F cells with the knocked-out proximal CMV promoters is shown in Figure 4B. Relative promoter activities were very similar in both cell lines, invalidating the hypothesis.
[0125] The data (Figure 4B) also show that removal of NF-κB and ZBED1 binding sites, either individually or simultaneously, did not reduce GFP expression. This result is in line with the above finding (Figure 2A) that NF-κB had very minimal activity in HEK293 cells, but was not fully anticipated for the relatively active ZBED1. Utilizing the TFRE identification tool at a lower stringency, the in silico analysis identified a weak ZBED1 binding site at the mutated sequence (matrix similarity 0.734, optimal matrix threshold 0.76), suggesting that the ZBED1 mutation did not fully knock out the TFRE. Removal of Sp1, CREB/ATF1 and Sp4 binding sites individually reduced promoter activity to ~62%, ~74% and ~44% of that deriving from wild-type proximal CMV, respectively. Additionally, removal of the Sp1 site with CREB/ATF1 or Sp4 (mutations 3+4 and 3+6) led to a further decrease in promoter activities. Critically, when the two Sp1 sites were simultaneously removed (mutations 3+5), GFP expression was reduced to the lowest level, i.e. ~25% compared to the wild-type proximal CMV. No further reduction in promoter activity was observed when the Sp4 site was mutated in conjunction with the two Sp1 sites, indicating the vital regulatory function of Sp1. However, considering the relatively weak activity of the Sp1 homotypic promoter in Figure 2A, the data do not support the conclusion that Sp1 blocks (or any other TFRE) could support high transcriptional activity alone. It was therefore deduced that the two Sp1 sites act as an upstream activator element [20] for CMV promoter-mediated transcription in HEK293 cells.
Example 5 - Knock-out of repressor elements results in increased gene expression in CMV promoter variants.
[0126] The above in silico analysis of regulation of CMV promoter activity by sequence elements (Fig. 1) also identified two TFRE components that have previously been shown to negatively regulate transcription from the murine CMV-IE promoter in cytomegalovirus-infected mouse kidneys, YY1 and RBP-Jκ, [10] as well as Gfi1, whose overexpression has been shown to repress hCMV-IE promoter activity in mouse fibroblast cells. [21] It was hypothesized that the CMV promoter could be optimized for TGE by disrupting transrepression mediated by these TFREs. To evaluate the functional activity of these TFREs as regulators of CMV-mediated TGE in HEK293 cells, CMV(-derived) promoters with repressor elements knocked-out were synthesized and inserted into GFP reporter vectors (Figure 5A,B). The YY1 (three binding sites) and RBP-Jκ (one binding site) elements are located in the distal enhancer while the Gfi1 element (one binding site) is located in the proximal enhancer. Additionally, previous studies suggested a fourth YY1 binding site at -343 to -353 relative to the TSS [1] which was identified as a weak binding sequence in this study (matrix similarity 0.889, optimal matrix threshold 0.94). GFP expression levels in HEK293 and Expi293F cells were measured 48 h post-transfection.
[0127] Removal of the repressor elements in full length CMV (promoters 1.01 and 1.02; Figure 5C) did not result in increased GFP expression compared to the wild-type control, suggesting that the TFREs were not critical regulators affecting transcriptional activity under the conditions employed. This is possibly due to positive TF-TFRE interactions within the proximal enhancer decreasing the influence of distal enhancer-mediated processes (see above, Figures 3A and B). To further investigate the impact of the proximal enhancer, the 5’ region of the enhancer was truncated (promoter 2.01), which resulted in a ~13% decrease in transcriptional activity compared to the wild-type CMV. In this promoter construct, removal of YY1 and RBP-Jκ binding sites increased the promoter activity by 12% (promoter 2.02), indicating that the TFREs can act as negative regulators of CMV promoter in HEK293 cells. No additional increase was observed with further removal of Gfi1 (promoter 2.03); this was not entirely unexpected considering that the Gfi1 gene was lowly expressed (log2 TPM = 2.23) whereas YY1 and RBP-Jκ genes exhibited expression levels above the 90th percentile (log2 TPM = 5.51 and 5.53 respectively; Figure 1A).
[0128] To confirm the ability of YY1 and RBP-Jκ to mediate transrepression of recombinant gene transcription in HEK293 cells, CMV promoter variants were constructed with minimal proximal and distal enhancers containing only one site each of YY1, RBP-Jκ and Gfi1 (promoters 3.01-3.03). The CRM for promoter 3.01 (SEQ ID NO: 33) is identical to CRM 1+7 (SEQ ID NO: 26) discussed above and shown in Figure 3. As anticipated, removal of the YY1 and RBP-Jκ binding sites increased the promoter activity by 11% compared to its non-mutated counterpart while no significant change was observed with further removal of Gfi1 (we note that removal of the YY1 binding site alone increased the promoter activity by 7%; data not shown). Moreover, these shorter promoter sequences displayed similar activities to promoters 2.01-2.03. In this regard, it was understood that the deletion of the distal enhancer’s 3’ region effectively removed transrepression mediated by the fourth YY1 repressor motif (excluded in our in silico survey of CMV-constituent TFREs). Based on the above observations, promoter 4.01 was constructed that was devoid of repressor elements while retaining the active regions. This engineered CMV promoter displayed a ~14% increase in expression while being 25% smaller in size compared to the wild-type CMV. To illustrate the enhanced capability of the promoters in driving transcription, a “transcriptional efficiency” was calculated for each promoter as a function of transcriptional output per promoter length. This analysis indicates that promoters 3.03 and 4.01 were ~50% more transcriptionally efficient compared to the wild-type CMV promoter (Table 3). We conclude that the CMV promoter can be engineered for improved TGE in HEK293 via disruption of transrepression mediated by YY1 and RBP-Jκ and removal of redundant sequences. A skilled person would understand that the findings using HEK293 cells are directly applicable to host cells with a TFRE profile similar to that of HEK293 cells.
Table 3. Transcriptional efficiency (transcriptional output per base pair length) and number of CpG dinucleotide occurrences in the wild-type and engineered CMV promoters.
[Table 3 is provided as an image (imgf000041_0001) in the original publication.]
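The transcriptional efficiency metric (transcriptional output per unit promoter length) can be checked against the figures quoted in the text for promoter 4.01, which showed approximately 14% higher expression while being 25% smaller than wild-type CMV. The Python sketch below works in units relative to wild-type CMV (activity 1.0, length 1.0); the function name is hypothetical, and the values of Table 3 itself are not reproduced here.

```python
def relative_efficiency(relative_activity, relative_length):
    """Output per unit promoter length, relative to wild-type CMV (= 1.0)."""
    if relative_length <= 0:
        raise ValueError("relative_length must be positive")
    return relative_activity / relative_length


# Promoter 4.01 as described in the text: ~14% more output, 25% shorter.
efficiency_401 = relative_efficiency(1.14, 0.75)
print(f"{efficiency_401:.2f}")
```

The result, about 1.52, agrees with the statement that promoter 4.01 is approximately 50% more transcriptionally efficient than the wild-type CMV promoter.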
[0129] The vast majority of current HEK293 cell TGE systems utilize the CMV promoter for high-yield production of therapeutic proteins [35, 36] and improved lentiviral vectors and AAV expression cassettes. [37, 22, 23] The comparative transient expression analyses described herein revealed that the CMV promoter activity in HEK293 cells was a function of the promoter’s various constituent TFREs including AhR:ARNT, CREB, E4F, Sp1, ZBED1, JunB, c-Rel and NF-κB. This is a very significant and useful finding, as these TFREs can form the basis for engineering promoters that contain enhanced binding sites, [24] or can be directly utilized as modular building blocks to construct synthetic promoters de novo. [12, 15] Further identified were several sub-optimal TF binding sequences (MYBL1, Oct, E2F), which suggests an immense opportunity for maximizing the CMV promoter’s transcriptional output. Hundreds of TFRE motif sequence variants can be characterized simultaneously via in vitro use of high-throughput parallel screening methods, allowing determination of their optimal binding affinity. The major challenge with such functional tests is the difficulty in identifying TFREs underpinning the more complex regulation governing synergistic transactivation [17, 18] which would require intricate screens of TFRE motif pairs with position-sensitive function. However, this limitation can be circumvented by using the TF decoy technology described in [25] to inhibit specific TFRE(s) within the CMV promoter architecture, obviating the need to characterize spatial effects between two TFRE motif pairs. Furthermore, the work presented herein provides a novel library of promoter sequences (Figure 2) to control gene expression over a wide range. These highly compact promoters could be utilized in multigene vectors to give predictable stoichiometries (e.g. optimization of monoclonal antibody heavy chain to light chain ratio), [26] especially with non-overlapping sequences to avoid homologous recombination-mediated silencing.
[0130] Bioinformatic analysis of the CMV promoter sequence indicated that the Sp1 family is predominant in the promoter, in line with the notion that the element is essential for prevention of de novo methylation of CpG islands. [27] Importantly, the results show that each of the two Sp1 binding sites near the TATA box contributes to full activation of the CMV promoter in HEK293 cells, resembling the previous report in which mutation of these Sp1 binding sites caused inefficient CMV promoter transcription and cytomegalovirus replication in human fibroblast cells. [28] A similar transcription activation mechanism has been reported for the simian virus 40 (SV40) promoter, in which Sp1 binding to its cognate sequences upstream of the TATA box enhanced the activity of RNA polymerase II. [29, 30] Indeed, analyses of synthetic core promoters indicated that Sp1 binding sites, when placed upstream of an Inr and/or TATA box, acted as an upstream activator element for efficient transcription initiation in vitro [20] and in HEK293 cells. [31] Nevertheless, previous studies [9, 12, 25] as well as data in Figure 2B showed that CHO cells were able to drive efficient recombinant gene transcription in the absence of such upstream activator elements, illustrating that engineering strategies to improve CMV promoter activity have to be cell-type specific for maximum efficacy. It is further conjectured that an Sp1-dependent upstream activator element is a design prerequisite for construction of strong synthetic/hybrid promoters for HEK293 cells.
[0131] Another important outcome of this study is the identification of negatively acting cellular TFs, and the finding that a substantial proportion of the CMV sequence can be functionally redundant for recombinant gene expression in HEK293 cells. Specifically, the results showed that YY1 and RBP-Jκ-mediated transrepression of the CMV promoter could be removed by designing engineered CMV constructs with inactive cognate binding sites. Previous studies have also shown that ERF (Ets-2 Repressor Factor) was able to repress the CMV promoter by binding to the 21 bp repeat motifs overlapping YY1 and Sp1 within the distal enhancer (see Figure 7), [11] and that the ERF gene was highly expressed in HEK293 cells (log2 TPM = 4.91). Without being bound by any specific theory, it is postulated that deletion of the 3’ region of the distal enhancer (promoters 3.03 and 4.01) effectively removed the YY1 binding site as well as an ERF binding site, permitting more defined, improved regulation of recombinant transcriptional activity with a relatively small promoter size. The engineered promoters can further confer additional advantages in dynamic bioprocess conditions in response to changes in the cellular transcriptional landscape. For example, differential gene expression analysis of the HEK293 transcriptomic data showed that RBP-Jκ was (slightly) upregulated from the mid-exponential to early-stationary phase (log2 fold-change = 0.362, p-adj = 0.0012), suggesting that the positive impact of RBP-Jκ knock-out would be more pronounced in long-term, fed-batch production processes.
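The log2-scale values quoted in paragraph [0131] can be converted back to linear scale to make their magnitude easier to judge. The following is a minimal illustrative sketch and not part of the disclosed methods; the function name is hypothetical, and it assumes the reported log2 TPM is a plain base-2 logarithm of TPM with no pseudocount added.

```python
def log2_to_linear(log2_value: float) -> float:
    """Convert a log2-scale value (log2 TPM or log2 fold-change) to linear scale."""
    return 2.0 ** log2_value


# Values quoted in paragraph [0131]
erf_tpm = log2_to_linear(4.91)             # ERF expression: roughly 30 TPM
rbpj_fold_change = log2_to_linear(0.362)   # RBP-J kappa: roughly 1.3-fold

print(f"ERF expression: ~{erf_tpm:.1f} TPM")
print(f"RBP-J kappa fold-change: ~{rbpj_fold_change:.2f}x")
```

On this reading, a log2 fold-change of 0.362 corresponds to an increase of roughly 30%, which is consistent with the text describing the upregulation as slight.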
[0132] Lastly, the data presented herein offer benefits to systems beyond TGE. For instance, long-term stable expression can be compromised by the occurrence of sequence features such as repeat elements (homologous recombination-mediated silencing) [32] and CpG islands (methylation-mediated silencing). [33] With regard to the former, the CMV promoter contains two copies of a 21 bp repeat motif in the distal enhancer as mentioned above. Promoters 3.03 and 4.01 indirectly removed one of these repeat elements, therefore avoiding potential homologous recombination events associated with gene deletion. With regard to the latter, it is possible to reduce the number of CpG dinucleotides within the CMV promoter by mutating TFREs with no/low activities, thus minimizing methylation-mediated epigenetic silencing linked to production instability. A minimal number of CpG dinucleotides is also a desirable feature for gene therapy vectors, in which CpG motifs have immunostimulatory effects (e.g. promoter 3.03 contains 20% fewer CpG dinucleotides compared to the wild-type CMV promoter; Table 3). [34] Accordingly, it is now possible to rationally design synthetic CMV promoter variants in order to equip HEK293 cells with new transcriptional machinery optimally suited for a specific intended purpose. This work enables a skilled artisan to use similar approaches to deconstruct and reconstruct other promoters for optimal functionalities in particular cell types.
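The CpG comparison described in paragraph [0132] amounts to counting 5'-CG-3' dinucleotides in each promoter sequence and comparing the totals. The following is a minimal sketch of such a count, offered for illustration only; the function names are hypothetical and this is not necessarily how the values in Table 3 were obtained. Because CG is its own reverse complement, counting on one strand gives the same total as counting on the other.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (5'-CG-3') in a nucleotide sequence."""
    # Remove whitespace so that sequences wrapped across lines are handled.
    cleaned = "".join(seq.upper().split())
    return cleaned.count("CG")


def percent_reduction(reference_count: int, variant_count: int) -> float:
    """Percentage reduction of a variant count relative to a reference count."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return 100.0 * (reference_count - variant_count) / reference_count


# Short motifs taken from the sequence listing, used here only as inputs.
print(count_cpg("TTCCGCGTTA"))   # SEQ ID NO: 53
print(count_cpg("ACGGTAAATGG"))  # SEQ ID NO: 8
```

Applied to a full-length wild-type promoter and an engineered variant, the two counts can be passed to `percent_reduction` to express the difference as a percentage.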
REFERENCES
[1] M. F. Stinski, H. Isomura, Med. Microbiol. Immunol. 2008, 197, 223.
[2] C. Sinzger, M. Digel, G. Jahn, Curr. Top. Microbiol. Immunol. 2008, 325, 63.
[3] E. Forte, Z. Zhang, E. B. Thorp, M. Hummel, Front. Cell. Infect. Microbiol. 2020, 10, 130.
[4] A. Coulon, C. C. Chow, R. H. Singer, D. R. Larson, Nat. Rev. Genet. 2013, 14, 572.
[5] V. Melia-Alvarado, A. Gautier, F. Le Gac, J.-J. Lareyre, Gene Expr. Patterns 2013, 13, 91.
[6] D. Vasey, S. Lillico, H. Sang, T. King, C. Whitelaw, Transgenic Res. 2009, 18, 309.
[7] J. Y. Qin, L. Zhang, K. L. Clift, I. Hulur, A. P. Xiang, B. Z. Ren, B. T. Lahn, PLoS One 2010, 5, e10611.
[8] W. Xia, P. Bringmann, J. McClary, P. P. Jones, W. Manzana, Y. Zhu, S. Wang, Y. Liu, S. Harvey, M. R. Madlansacay, K. McLean, M. P. Rosser, J. MacRobbie, C. L. Olsen, R. R. Cobb, Protein Expression Purif. 2006, 45, 115.
[9] A. J. Brown, B. Sweeney, D. O. Mainwaring, D. C. James, Biotechnol. J. 2015, 10, 1019.
[10] X. F. Liu, S. Yan, M. Abecassis, M. Hummel, J. Virol. 2008, 82, 10922.
[11] M. Bain, M. Mendelson, J. Sinclair, J. Gen. Virol. 2003, 84, 41.
[12] Y. B. Johari, A. J. Brown, C. S. Alves, Y. Zhou, C. M. Wright, S. D. Estes, R. Kshirsagar, D. C. James, J. Biotechnol. 2019, 294, 1.
[13] S. A. Lambert, A. Jolma, L. F. Campitelli, P. K. Das, Y. Yin, M. Albu, X. Chen, J. Taipale, T. R. Hughes, M. T. Weirauch, Cell 2018, 172, 650.
[14] G. P. Wagner, K. Kin, V. J. Lynch, Theory Biosci. 2013, 132, 159.
[15] Y. B. Johari, A. C. Mercer, Y. Liu, A. J. Brown, D. C. James, Biotechnol. Bioeng. 2021, 118, 2001.
[16] M. R. Soboleski, J. Oaks, W. P. Halford, FASEB J. 2005, 19, 440.
[17] R. Hardison, J. Taylor, Nat. Rev. Genet. 2012, 13, 469.
[18] L. V. Pham, A. T. Tamayo, L. C. Yoshimura, Y. C. Lin-Lee, R. J. Ford, Blood 2005, 106, 3940.
[19] S. J. Gray, S. B. Foti, J. W. Schwartz, L. Bachaboina, B. Taylor-Blake, J. Coleman, M. D. Ehlers, M. J. Zylka, T. J. McCown, R. J. Samulski, Hum. Gene Ther. 2011, 22, 1143.
[20] S. T. Smale, M. C. Schmidt, A. J. Berk, D. Baltimore, Proc. Natl. Acad. Sci. U. S. A. 1990, 87, 4509.
[21] P. A. Zweidler-McKay, H. L. Grimes, M. M. Flubacher, P. N. Tsichlis, Mol. Cell. Biol. 1996, 16, 4024.
[22] C. A. Vink, J. R. Counsell, D. P. Perocheau, R. Karda, S. M. K. Buckley, M. H. Brugman, M. Galla, A. Schambach, T. R. McKay, S. N. Waddington, S. J. Howe, Mol. Ther. 2017, 25, 1790.
[23] Z. Wang, F. Cheng, J. F. Engelhardt, Z. Yan, J. Qiu, Mol. Ther. Methods Clin. Dev. 2018, 11, 40.
[24] L. Nong, Y. Zhang, Y. Duan, S. Hu, Y. Lin, S. Liang, Biotechnol. Lett. 2020, 42, 2703.
[25] A. J. Brown, D. O. Mainwaring, B. Sweeney, D. C. James, Anal. Biochem. 2013, 443, 205.
[26] Y. D. Patel, A. J. Brown, J. Zhu, G. Rosignoli, S. J. Gibson, D. Hatton, D. C. James, ACS Synth. Biol. 2021, 10, 1155.
[27] M. Brandeis, D. Frank, I. Keshet, Z. Siegfried, M. Mendelsohn, A. Nemes, V. Temper, A. Razin, H. Cedar, Nature 1994, 371, 435.
[28] H. Isomura, M. F. Stinski, A. Kudoh, T. Daikoku, N. Shirata, T. Tsurumi, J. Virol. 2005, 79, 9597.
[29] W. S. Dynan, R. Tjian, Cell 1983, 35, 79.
[30] M. Vigneron, H. A. Barrera-Saldana, D. Baty, R. E. Everett, P. Chambon, EMBO J. 1984, 3, 2373.
[31] K. H. Emami, W. W. Navarre, S. T. Smale, Mol. Cell. Biol. 1995, 15, 5906.
[32] M. Jasin, R. Rothstein, Cold Spring Harb. Perspect. Biol. 2013, 5, a012740.
[33] M. Kim, P.M. O’Callaghan, K. A. Droms, D. C. James, Biotechnol. Bioeng. 2011, 108, 2434.
[34] N. Bessis, F. J. Garcia-Cozar, M. C. Boissier, Gene Ther. 2004, 11, S10.
[35] G. Backliwal, M. Hildinger, S. Chenuet, S. Wulhfard, M. De Jesus, F. M. Wurm, Nucleic Acids Res. 2008, 36, e96.
[36] K. Swiech, A. Kamen, S. Ansorge, Y. Durocher, V. Picanço-Castro, E. M. Russo-Carbolante, M. S. Neto, D. T. Covas, BMC Biotechnol. 2011, 11, 114.
[37] J. M. Allen, C. L. Halbert, A. D. Miller, Mol. Ther. 2000, 1, 88.
[0133] While the disclosed methods have been described in connection with what are presently considered to be the most practical and preferred embodiments, it is to be understood that the methods encompassed by the disclosure are not to be limited to the disclosed embodiments, but on the contrary, are intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
[0134] All publications, patents, patent applications, internet sites, and accession numbers/database sequences including both polynucleotide and polypeptide sequences cited herein are hereby incorporated by reference herein in their entirety for all purposes to the same extent as if each individual publication, patent, patent application, internet site, or accession number/database sequence were specifically and individually indicated to be so incorporated by reference.
SEQUENCES
SEQ ID NO : 1
GTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACC
GCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGA
CTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTG
TATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGC
CCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTA
CCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTT
CCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCC
AAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCT
ATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGAC CTCCATAGAAGAC
SEQ ID NO : 2
AGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGT TTTGAC CTCCATAGAAGAC
SEQ ID NO : 3
ATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGACCCCCG
CCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC
AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCTATTAGTCATCGCTATTACCATGGTG
ATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCT
CCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAACCAACGGGACTTTCCAAAATGTC
GTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGAAGCTTAGGTCTATATAAG CAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGTTTTGACCTCCATA GAAGAC
SEQ ID NO : 4
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACT
SEQ ID NO : 5
ATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACT
SEQ ID NO : 6
TATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGG
TTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCA
AAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGC GTGTACG
SEQ ID NO : 7
TATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGG
TTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCA
AAACCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGC GTGTACG
SEQ ID NO : 8
ACGGTAAATGG
SEQ ID NO : 9
ACGGTAAGAGG
SEQ ID NO : 10
GTTCCCATA
SEQ ID NO : 11
GTTTCCATA
SEQ ID NO : 12
AAATCAACGG
SEQ ID NO : 13
AAACCAACGG
SEQ ID NO : 14
TATATAAG
SEQ ID NO : 15
TCAGAT
SEQ ID NO : 16
TATATAAGCAGAGCTCGTTTAGTGAACCGTCAGAT
SEQ ID NO : 17
CAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATG
GGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTG
ACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 18
AGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTG ACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAAT CAACGGGACTTTCCAAAATG
SEQ ID NO : 19
ATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCT ATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGG ATTTCCAAGTCTCCACCCCATTGAC
SEQ ID NO : 20
ATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAG TACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCAT GGTGATGCGGTTTTGGCAGTACATCAATGGGC
SEQ ID NO : 21
CAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAG TACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCT TATGGGACTTTCCTACTTGGCAGTACATCTACGTATTA
SEQ ID NO : 22
ATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTT ACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACG TCAATGACGGTAAATGGCC
SEQ ID NO : 23
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACT
SEQ ID NO : 24
AGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTG
ACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAAT CAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGT ACG
SEQ ID NO : 25
ATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAG TACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCAT GGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAA
GTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAA
TGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 26
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCG
CCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC
AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCAATGGGCGTGGATAGCGGTTTGACTC
ACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAAC
GGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 27
CGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGT
CAATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCC
CACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAA
ATGGCCCGCCTGGCATTGTGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCT
ACGTATTAGTCATCGCTATTACCATGG
SEQ ID NO : 28
GTTCATAGCCCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACC
GCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGA
CTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTG
TATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGCATTATGC
CCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTA
CCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTT
CCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCC
AAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGG
SEQ ID NO : 29
GTTCATAGCCCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACC
GCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGA
CTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTG
TATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGCATTATGC
CCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTA
CCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTT
CCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAACCAACGGGACTTTCC
AAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGG
SEQ ID NO : 30
CCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGA
CCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATT
GACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATG
CCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGAATTCTATTAGTCATCG CTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGG
GGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGA
CTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 31
CCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGA CCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATT GACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATG CCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGAATTCTATTAGTCATCG CTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGG
GGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGA CTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 32
CCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGA
CCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATT
GACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATG CCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGAATTCTATTAGTCATCG CTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGG GGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAACCAACGGGA CTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 33
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCG
CCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCAATGGGCGTGGATAGCGGTTTGACTC ACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAAC GGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 34
ATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCAATGGGCGTGGATAGCGGTTTGACTC ACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAAC GGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 35
ATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCAATGGGCGTGGATAGCGGTTTGACTC ACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAACCAAC GGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACG
SEQ ID NO : 36
GTTCATAGCCCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACC
GCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGA
CTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTG
TATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGCATTATGC
CCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTA
CCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTT
CCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCC
AAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAAGCTT
AGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGT TTTGACCTCCATAGAAGAC
SEQ ID NO : 37
GTTCATAGCCCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACC
GCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGA
CTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTG
TATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGCATTATGC
CCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTA
CCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTT
CCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAACCAACGGGACTTTCC
AAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAAGCTT
AGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTGT TTTGACCTCCATAGAAGAC
SEQ ID NO : 38
CCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGA
CCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATT
GACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATG
CCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGAATTCTATTAGTCATCG
CTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGG
GGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGA
CTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGAAGCT
TAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTG
TTTTGACCTCCATAGAAGAC
SEQ ID NO : 39
CCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGA
CCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATT
GACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATG
CCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGAATTCTATTAGTCATCG
CTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGG
GGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGA
CTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGAAGCT
TAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTG
TTTTGACCTCCATAGAAGAC
SEQ ID NO : 40
CCTTATATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGA CCCCCGCCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATT GACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATG CCAAGTACGCCCCCTATTGACGTCAATGACGGTAAGAGGCCCGCCTGGAATTCTATTAGTCATCG CTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGG GGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAACCAACGGGA CTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGAAGCT TAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCACGCTG TTTTGACCTCCATAGAAGAC
SEQ ID NO : 41
ATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCAATGGGCGTGGATAGCGGTTTGACTC ACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAAC GGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGA AGCTTAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCAC GCTGTTTTGACCTCCATAGAAGAC
SEQ ID NO : 42
ATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCAATGGGCGTGGATAGCGGTTTGACTC ACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAAC GGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGA AGCTTAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCAC GCTGTTTTGACCTCCATAGAAGAC
SEQ ID NO : 43
ATGGAGTTCCGCGTTACATAACTTACGGTAAGAGGCCCGCCTGGCTGACCGCCCAACGACCCCCG CCCATTGACGTCAATAATGACGTATGTTTCCATAGTAACGCCAATAGGGACTTTCCATTGACGTC AATGGGTGGAGTATTTACGGTAAACTGCCCACTGAATTCAATGGGCGTGGATAGCGGTTTGACTC ACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAACCAAC GGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGA AGCTTAGGTCTATATAAGCAGAGCTCGTTTAGTGAACCGTCAGATCGCCTGGAGACGCCATCCAC GCTGTTTTGACCTCCATAGAAGAC
SEQ ID NO : 44
ATGCCCGCCATGAAGATCGAGTGCCGCATCACCGGCaCCcTGAACGGCGtgGAGTTCGAGCTGGT GGGCGGCGGAGAGGGCACCCCCGAGCAGGGCCGCATGACCAACAAGATGAAGAGCACCAAAGGCG cCCTGACCTTCAGCCCCTACCTGCTGAGCCACGTGATGGGCTACGGCTTCTACCACTTCGGCACC TACCCCAGCGGCTACGAGAACCCCTTCCTGCACGCCATCAACAACGGCGGCTACACCAACACCCG CATCGAGAAGTACGAGGACGGCGGCGTGCTGCACGTGAGCTTCAGCTACCGCTACGAGGCCGGCC GCGTGATCGGCGACTTCAAGGTGGTGGGCACCGGCTTCCCCGAGGACAGCGTGATCTTCACCGAC AAGATCATCCGCAGCAACGCCACCGTGGAGCACCTGCACCCCATGGGCGATAACGTGCTGGTGGG CAGCTTCGCCCGCACCTTCAGCCTGCGCGACGGCGGCTACTACAGCTTCGTGGTGGACAGCCACA TGCACTTCAAGAGCGCCATCCACCCCAGCATCCTGCAGAACGGGGGCCCCATGTTCGCCTTCCGC CGCGTGGAGGAGCTGCACAGCAACACCGAGCTGGGCATCGTGGAGTACCAGCACGCCTTCAAGAC
CCCCATCGCCTTCGCCAGATCTCGAGCTCgatga
SEQ ID NO : 45
MPAMKIECRITGTLNGVEFELVGGGEGTPEQGRMTNKMKSTKGALTFSPYLLSHVMGYGFYHFGTYPSGYENPFLHAINNGGYTNTRIEKYEDGGVLHVSFSYRYEAGRVIGDFKVVGTGFPEDSVIFTDKIIRSNATVEHLHPMGDNVLVGSFARTFSLRDGGYYSFVVDSHMHFKSAIHPSILQNGGPMFAFRRVEELHSNTELGIVEYQHAFKTPIAFARSRAR
SEQ ID NO : 46
ACGCAGCTTTCCAAAATGTCTTGTCAACTCCAGTCCATACTCGCAAATGACTGGTAGATCTGT
SEQ ID NO : 47
TGGGCGTGGATA
SEQ ID NO : 48
TGTCGTAACA
SEQ ID NO : 49
CCATTGACGCAA
SEQ ID NO : 50
ATTGACGTCAAT
SEQ ID NO : 51
GGGGATTTCCA
SEQ ID NO : 52
GGCACCAAA
SEQ ID NO : 53
TTCCGCGTTA
SEQ ID NO : 54
TTCCATTGAC
SEQ ID NO : 55
GTCAATGGGT
SEQ ID NO: 56
CGTGAGTCAAA
SEQ ID NO: 57
AGTACATCAATGGGCG
SEQ ID NO: 58
AGTAACGCCA
SEQ ID NO: 59
TGACGGTA
SEQ ID NO: 60
TTGGCAGTACATCAA
SEQ ID NO: 61
TTACCATGGTGA
SEQ ID NO: 62
CGGGACTTTCCA
SEQ ID NO: 63
TCAAGTG
SEQ ID NO: 64
ATATGCCAAGTAC
SEQ ID NO: 65
TAACTTACGGTAAAT
SEQ ID NO: 66
AGTGTATCA
SEQ ID NO: 67
ATGGGTGGAGT
SEQ ID NO: 68
TGGGGCGGAGT
SEQ ID NO: 69
CCATATATGGA
SEQ ID NO: 70
TGCCCAGTACATGACCT
SEQ ID NO: 71
CGGTAAATGGC
SEQ ID NO: 72
AGGGCGCGAAA
SEQ ID NO: 73
TTGACGTAGAAT
SEQ ID NO: 74
TAACGGTT
SEQ ID NO: 75
GGGAATTTCC
SEQ ID NO: 76
ATATGATAATGAG
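An amino acid listing can be checked against its coding sequence by translating the nucleotides with the standard genetic code. The sketch below is illustrative only and is not part of the disclosure; the names `CODON_TABLE` and `translate` are hypothetical. The input is the first 30 nucleotides of SEQ ID NO: 44, and the translation can be compared with the start of SEQ ID NO: 45.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    # Upper-case the input and drop whitespace left over from line wrapping.
    cleaned = "".join(cds.upper().split())
    protein = []
    for pos in range(0, len(cleaned) - 2, 3):
        residue = CODON_TABLE[cleaned[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 30 nucleotides of SEQ ID NO: 44
print(translate("ATGCCCGCCATGAAGATCGAGTGCCGCATC"))  # prints MPAMKIECRI
```

Passing the complete SEQ ID NO: 44 to `translate` gives the full polypeptide for a residue-by-residue comparison with SEQ ID NO: 45.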

Claims

CLAIMS What is claimed is:
1. A transcription control element comprising a) a distal cis-regulatory module (CRM) b) a proximal CRM, and c) a core promoter, wherein i. the transcription control element is less than 550 nucleotides long, ii. the distal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 4, wherein the distal CRM does not comprise the nucleotide sequence of SEQ ID NO: 8 or SEQ ID NO: 10, and iii. the proximal CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 6, wherein the proximal CRM does not comprise the nucleotide sequence of SEQ ID NO: 12.
2. The transcription control element of claim 1, which is between 400 and 550 nucleotides long.
3. The transcription control element of claim 1 or claim 2, which is capable of mediating transcription of a heterologous polynucleotide encoding a GFP polypeptide comprising the amino acid sequence of SEQ ID NO: 45 in a HEK293 cell.
4. The transcription control element of any one of claims 1 to 3, wherein the core promoter comprises a TATA-box and an Inr element.
5. The transcription control element of any one of claims 1 to 3, wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16.
6. The transcription control element of any one of claims 1 to 3, wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2.
7. The transcription control element of any one of claims 1 to 6, wherein the distal CRM comprises one or more of a) the nucleotide sequence of SEQ ID NO: 9, and b) the nucleotide sequence of SEQ ID NO: 11.
8. The transcription control element of claim 7, wherein the distal CRM comprises the nucleotide sequence of SEQ ID NO: 5.
9. The transcription control element of any one of claims 1 to 8, wherein the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 13.
10. The transcription control element of claim 9, wherein the proximal CRM comprises the nucleotide sequence of SEQ ID NO: 9.
11. The transcription control element of any one of claims 1 to 10 comprising the nucleotide sequence of SEQ ID NO: 3.
12. A transcription control element comprising a) a cis-regulatory module (CRM), and b) a core promoter, wherein i. the CRM comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 17-35, and ii. the transcription control element is less than 550 nucleotides long.
13. The transcription control element of claim 12, which is between 190 and 550 nucleotides long.
14. The transcription control element of claim 12 or claim 13, which is capable of mediating transcription of a heterologous polynucleotide encoding a GFP polypeptide comprising the amino acid sequence of SEQ ID NO: 45 in a HEK293 cell.
15. The transcription control element of any one of claims 12 to 14, wherein the core promoter comprises a TATA-box and an Inr element.
16. The transcription control element of any one of claims 12 to 14, wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 16.
17. The transcription control element of any one of claims 12 to 14, wherein the core promoter comprises a nucleotide sequence with at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100% sequence identity to SEQ ID NO: 2.
18. The transcription control element of any one of claims 12 to 17 comprising the nucleotide sequence of SEQ ID NO: 38-43.
19. A transcription control element i) comprising a nucleotide sequence of SEQ ID NO: 36 or SEQ ID NO: 37, or ii) comprising a nucleotide sequence with at least 99% sequence identity to SEQ ID NO: 36 or 37.
20. The transcription control element of claim 19, which is capable of mediating transcription of a heterologous polynucleotide encoding a GFP polypeptide comprising the amino acid sequence of SEQ ID NO: 45 in a HEK293 cell.
21. An isolated polynucleotide comprising the transcription control element of any one of claims 1 to 20.
22. The isolated polynucleotide of claim 15 further comprising an enhancer, splice acceptor, splice donor or intron operably linked to the transcription control element.
23. A vector comprising the transcription control element of any one of claims 1 to 21.
24. The vector of claim 23, which is a viral vector.
25. The vector of claim 23 or claim 24 further comprising a polynucleotide encoding a polypeptide of interest operably linked to the transcription control element.
26. The vector of claim 25, wherein the polypeptide of interest is an antibody, or antigen-binding fragment thereof, fusion protein, Fc-fusion polypeptide, immunoadhesin, immunoglobulin, engineered protein, protein fragment or enzyme.
27. The vector of claim 25, wherein the polypeptide of interest is an antibody.
28. The vector of claim 25, wherein the polypeptide of interest is a viral protein.
29. The vector of claim 25, wherein the polypeptide of interest is a viral capsid protein.
30. The vector of claim 25, wherein the polypeptide of interest is a viral Rep protein.
31. A host cell comprising the isolated polynucleotide of claim 21 or claim 22 or the vector of any one of claims 23 to 30.
32. The host cell of claim 31, which comprises a HEK293 cell, HEK293 derived cell, CHO cell, CHO derived cell, HeLa cell, SF-9 cell, BHK cell, Vero cell, or PerC6 cell.
33. The host cell of claim 31, which comprises a HEK293 cell or HEK293 derived cell.
34. A method of expressing a polypeptide of interest in a host cell comprising culturing the host cell of any one of claims 25 to 33 under suitable conditions to produce the polypeptide of interest.
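The sequence features recited in the claims (overall length, absence of a particular motif, and percent identity to a reference) can be screened computationally. The following sketch is illustrative only and has no bearing on how the claims are construed; all function names are hypothetical. It makes two simplifying assumptions: identity is computed position by position without gaps for sequences of equal length, whereas identity to a full-length reference would normally be determined with an alignment tool, and motifs are searched on the given strand only.

```python
from typing import Iterable


def clean(seq: str) -> str:
    """Upper-case a sequence and remove whitespace left by line wrapping."""
    return "".join(seq.upper().split())


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences, compared without gaps."""
    a, b = clean(a), clean(b)
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def lacks_motifs(seq: str, motifs: Iterable[str]) -> bool:
    """Return True if none of the motifs occurs in the sequence (given strand only)."""
    s = clean(seq)
    return all(clean(m) not in s for m in motifs)


def shorter_than(seq: str, max_length: int = 550) -> bool:
    """Return True if the sequence is shorter than max_length nucleotides."""
    return len(clean(seq)) < max_length


# Motif pairs from the sequence listing
SEQ_ID_8, SEQ_ID_9 = "ACGGTAAATGG", "ACGGTAAGAGG"
SEQ_ID_10, SEQ_ID_11 = "GTTCCCATA", "GTTTCCATA"

print(f"{ungapped_identity(SEQ_ID_8, SEQ_ID_9):.1f}")    # 9 of 11 positions match
print(f"{ungapped_identity(SEQ_ID_10, SEQ_ID_11):.1f}")  # 8 of 9 positions match
print(lacks_motifs(SEQ_ID_9, [SEQ_ID_8, SEQ_ID_10]))
```

For a candidate distal CRM, `lacks_motifs(candidate, [SEQ_ID_8, SEQ_ID_10])` tests the exclusion recited in claim 1, and `shorter_than` tests the length limit on the complete transcription control element.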
PCT/US2023/061014 2022-01-21 2023-01-20 Engineered promoters WO2023141582A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263301504P 2022-01-21 2022-01-21
US63/301,504 2022-01-21

Publications (1)

Publication Number Publication Date
WO2023141582A1 true WO2023141582A1 (en) 2023-07-27

Family

ID=85384569

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/061014 WO2023141582A1 (en) 2022-01-21 2023-01-20 Engineered promoters

Country Status (1)

Country Link
WO (1) WO2023141582A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018193072A1 (en) * 2017-04-19 2018-10-25 Medimmune Limited In silico design of mammalian promoters with user-defined functionality
WO2021108755A2 (en) 2019-11-28 2021-06-03 Regenxbio Inc. Microdystrophin gene therapy constructs and uses thereof

Non-Patent Citations (50)

* Cited by examiner, † Cited by third party
Title
A. COULONC. C. CHOWR. H. SINGERD. R. LARSON, NAT. REV. GENET., vol. 14, 2013, pages 572
A. J. BROWNB. SWEENEYD. O. MAINWARINGD. C. JAMES, BIOTECHNOL. J., vol. 10, 2015, pages 1019
A. J. BROWND. O. MAINWARINGB. SWEENEYD. C. JAMES, ANAL. BIOCHEM., vol. 443, 2013, pages 205
ADAM J BROWN ET AL: "Synthetic promoters for CHO cell engineering", BIOTECHNOLOGY AND BIOENGINEERING, JOHN WILEY, HOBOKEN, USA, vol. 111, no. 8, 1 May 2014 (2014-05-01), pages 1638 - 1647, XP071116068, ISSN: 0006-3592, DOI: 10.1002/BIT.25227 *
ADAM J. BROWN ET AL: "NF-κB, CRE and YY1 elements are key functional regulators of CMV promoter driven-transient gene expression in CHO cells", BIOTECHNOLOGY JOURNAL, 1 January 2015 (2015-01-01), pages n/a - n/a, XP055169456, ISSN: 1860-6768, DOI: 10.1002/biot.201400744 *
ALTSCHUL SF ET AL., METHODS IN ENZYMOLOGY, vol. 266, 1996, pages 460 - 480
ALTSCHUL SF ET AL., NUCLEIC ACIDS RES, vol. 25, 1997, pages 3389 - 3402
ALTSCHUL SF ET AL., NUCLEIC ACIDS RES., vol. 25, 1991, pages 3389 - 3402
ASOKAN ET AL., MOL. THER., vol. 20, no. 4, 2012, pages 699 - 708
C. A. VINKJ. R. COUNSELLD. P. PEROCHEAUR. KARDAS. M. K. BUCKLEYM. H. BRUGMANM. GALLAA. SCHAMBACHT. R. MCKAYS. N. WADDINGTON, MOL. THER., vol. 25, 2017, pages 1790
C. SINZGERM. DIGELG. JAHN, CURR. TOP. MICROBIOL. IMMUNOL., vol. 325, 2008, pages 63
D. VASEYS. LILLICOH. SANGT. KINGC. WHITELAW, TRANSGENIC RES, vol. 18, 2009, pages 309
E. FORTEZ. ZHANGE. B. THORPM. HUMMEL, FRONT. CELL. INFECT. MICROBIOL., vol. 10, 2020, pages 130
G. BACKLIWALM. HILDINGERS. CHENUETS. WULHFARDM. DE JESUSF. M. WURM, NUCLEIC ACIDS RES., vol. 36, 2008, pages e96
G. P. WAGNERK. KINV. J. LYNCH, THEORY BIOSCI, vol. 132, 2013, pages 159
H. ISOMURAM. F. STINSKIA. KUDOHT. DAIKOKUN. SHIRATAT. TSURUMI, J. VIROL., vol. 79, 2005, pages 9597
J. M. ALLENC. L. HALBERTA. D. MILLER, MOL. THER., vol. 7, 2000, pages 88
J. Y. QINL. ZHANGK. L. CLIFTI. HULURA. P. XIANGB. Z. RENB. T. LAHN, PLOS ONE, vol. 5, 2010, pages el0611
JOHARI YUSUF B. ET AL: "Engineering of the CMV promoter for controlled expression of recombinant genes in HEK293 cells", BIOTECHNOLOGY JOURNAL, vol. 17, no. 8, 1 August 2022 (2022-08-01), DE, pages 2200062, XP093041967, ISSN: 1860-6768, Retrieved from the Internet <URL:https://onlinelibrary.wiley.com/doi/full-xml/10.1002/biot.202200062> DOI: 10.1002/biot.202200062 *
K. H. EMAMIW. W. NAVARRES. T. SMALE, MOL. CELL. BIOL., vol. 15, 1995, pages 5906
K. SWIECHA. KAMENS. ANSORGEY. DUROCHERV. PICANCO-CASTROE. M. RUSSO-CARBOLANTEM. S. NETOD. T. COVAS, BMC BIOTECHNOL, vol. 11, 2011, pages 114
KARLIN S. ET AL., PROC. NATL. ACAD. SCI., vol. 87, 1990, pages 2264 - 2268
KARLIN S. ET AL., PROC. NATL. ACAD. SCI., vol. 90, 1993, pages 5873 - 5877
L. NONG, Y. ZHANG, Y. DUAN, S. HU, Y. LIN, S. LIANG, BIOTECHNOL. LETT., vol. 42, 2020, pages 2703
L. V. PHAM, A. T. TAMAYO, L. C. YOSHIMURA, Y. C. LIN-LEE, R. J. FORD, BLOOD, vol. 106, 2005, pages 3940
M. BAIN, M. MENDELSON, J. SINCLAIR, J. GEN. VIROL., vol. 84, 2003, pages 41
M. BRANDEIS, D. FRANK, I. KESHET, Z. SIEGFRIED, M. MENDELSOHN, A. NAMES, V. TEMPER, A. RAZIN, H. CEDAR, NATURE, vol. 371, 1994, pages 435
M. F. STINSKI, H. ISOMURA, MED. MICROBIOL. IMMUNOL., vol. 197, 2008, pages 223
M. JASIN, R. ROTHSTEIN, COLD SPRING HARB. PERSPECT. BIOL., vol. 5, 2013, pages a012740
M. KIM, P. M. O'CALLAGHAN, K. A. DROMS, D. C. JAMES, BIOTECHNOL. BIOENG., vol. 108, 2011, pages 2434
M. R. SOBOLESKI, J. OAKS, W. P. HALFORD, FASEB J, vol. 19, 2005, pages 440
M. VIGNERON, H. A. BARRERA-SALDANA, D. BATY, R. E. EVERETT, P. CHAMBON, EMBO J, vol. 3, 1984, pages 2373
MARK F STINSKI ET AL: "Role of the cytomegalovirus major immediate early enhancer in acute infection and reactivation from latency", MEDICAL MICROBIOLOGY AND IMMUNOLOGY, SPRINGER, BERLIN, DE, vol. 197, no. 2, 19 December 2007 (2007-12-19), pages 223 - 231, XP019630549, ISSN: 1432-1831 *
MYERS, MILLER, CABIOS, vol. 4, 1989, pages 11 - 17
N. BESSIS, F. J. GARCIACOZAR, M. C. BOISSIER, GENE THER, vol. 11, 2004, pages S10
NEEDLEMAN, WUNSCH, J. MOL. BIOL., no. 48, 1970, pages 444 - 453
P. A. ZWEIDLER-MCKAY, H. L. GRIMES, M. M. FLUBACHER, P. N. TSICHLIS, MOL. CELL. BIOL., vol. 16, 1996, pages 4024
R. HARDISON, J. TAYLOR, NAT. REV. GENET., vol. 13, 2012, pages 469
S. A. LAMBERT, A. JOLMA, L. F. CAMPITELLI, P. K. DAS, Y. YIN, M. ALBU, X. CHEN, J. TAIPALE, T. R. HUGHES, M. T. WEIRAUCH, CELL, vol. 172, 2018, pages 650
S. J. GRAY, S. B. FOTI, J. W. SCHWARTZ, L. BACHABOINA, B. TAYLOR-BLAKE, J. COLEMAN, M. D. EHLERS, M. J. ZYLKA, T. J. MCCOWN, R. J. SAMULSKI, HUM. GENE THER., vol. 22, 2011, pages 1143
S. T. SMALE, M. C. SCHMIDT, A. J. BERK, D. BALTIMORE, PROC. NATL. ACAD. SCI. U. S. A., vol. 87, 1990, pages 4509
SMITH, WATERMAN, ADVANCES IN APPLIED MATHEMATICS, vol. 2, 1981, pages 482 - 489
V. MELLA-ALVARADO, A. GAUTIER, F. LE GAC, J.-J. LAREYRE, GENE EXPR. PATTERNS, vol. 13, 2013, pages 91
W. S. DYNAN, R. TJIAN, CELL, vol. 35, 1983, pages 79
W. XIA, P. BRINGMANN, J. MCCLARY, P. P. JONES, W. MANZANA, Y. ZHU, S. WANG, Y. LIU, S. HARVEY, M. R. MADLANSACAY, PROTEIN EXPRESSION PURIF, vol. 45, 2006, pages 115
X. F. LIU, S. YAN, M. ABECASSIS, M. HUMMEL, J. VIROL., vol. 82, 2008, pages 10922
Y. B. JOHARI, A. C. MERCER, Y. LIU, A. J. BROWN, D. C. JAMES, BIOTECHNOL. BIOENG., vol. 118, 2021, pages 2001
Y. B. JOHARI, A. J. BROWN, C. S. ALVES, Y. ZHOU, C. M. WRIGHT, S. D. ESTES, R. KSHIRSAGAR, D. C. JAMES, J. BIOTECHNOL., vol. 294, 2019, pages 1
Y. D. PATEL, A. J. BROWN, J. ZHU, G. ROSIGNOLI, S. J. GIBSON, D. HATTON, D. C. JAMES, ACS SYNTH. BIOL., vol. 10, 2021, pages 1155
Z. WANG, F. CHENG, J. F. ENGELHARDT, Z. YAN, J. QIU, MOL. THER. METHODS CLIN. DEV., vol. 11, 2018, pages 40


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23707840

Country of ref document: EP

Kind code of ref document: A1