WO2009154454A1 - Regulation of the expression of a protein in a mammalian cell - Google Patents

Regulation of the expression of a protein in a mammalian cell

Publication number
WO2009154454A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
nucleotide sequence
protein
cell
nucleic acid
Prior art date
Application number
PCT/NL2009/050354
Other languages
French (fr)
Inventor
Pieter Victor Schut
Raymond Michael Dimphena Verhaert
Original Assignee
R1 B3 Holding B.V.
Priority date
Filing date
Publication date
Application filed by R1 B3 Holding B.V.
Priority to EP09766872A (EP2288712A1)
Publication of WO2009154454A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/8509 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells for producing genetically modified animals, e.g. transgenic
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01K ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
    • A01K2267/00 Animals characterised by purpose
    • A01K2267/01 Animal expressing industrially exogenous proteins
    • C12N2830/00 Vector systems having a special element relevant for transcription

Definitions

  • the invention relates to the regulation of the expression of a protein in a mammalian cell using a specific nucleic acid sequence.
  • the translation process in eukaryotes is a complex series of steps that involve a wide array of protein translation factors. These factors function in conjunction with the ribosome and tRNAs to decode an mRNA, thereby generating the encoded polypeptide chain.
  • the translation process can be divided into three distinct stages: (i) initiation: the assembly of the ribosomal subunits at the initiation (AUG) codon of an mRNA; (ii) elongation: tRNA-dependent decoding of the mRNA to form a polypeptide chain; (iii) termination: a stop codon (UAA, UAG or UGA) signals the release of the polypeptide chain from the ribosome and subsequently the ribosomal subunits dissociate from the mRNA.
  • Each of these stages requires a specific class of translation factors: eukaryotic initiation, elongation and termination factors (eIF, eEF and eRF, respectively).
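The three stages above can be sketched in a few lines of Python. This is a minimal illustration and not part of the patent: the function name `scan_orf` is hypothetical, and the sketch assumes that translation starts at the first AUG encountered, ignoring sequence context, secondary structure and the translation factors.

```python
# Stop codons named in the text: UAA, UAG and UGA.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_orf(mrna):
    """Return the codons read from the first AUG up to (not including) a stop codon.

    Returns None if there is no AUG or no in-frame stop codon.
    Assumption: the first AUG is used as the initiation codon.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    start = mrna.find("AUG")               # (i) initiation at the AUG codon
    if start == -1:
        return None
    codons = []
    for pos in range(start, len(mrna) - 2, 3):  # (ii) elongation, codon by codon
        codon = mrna[pos:pos + 3]
        if codon in STOP_CODONS:                # (iii) termination at a stop codon
            return codons
        codons.append(codon)
    return None


print(scan_orf("GGAUGGCUUAAGG"))
```

The sketch returns only the codons that would be decoded; mapping them to amino acids would additionally need a codon table.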
  • Translation initiation is an important step in both global and mRNA-specific gene regulation and therefore constitutes the primary target for translational control.
  • Global regulation of protein synthesis is generally achieved by the modification of eukaryotic initiation factors (eIFs), several of which are phosphoproteins, e.g., eIF4E and eIF2.
  • Translational control of individual mRNAs often depends upon the structural properties of the transcript itself. These may include properties in the 5' untranslated region of the mRNA that can affect initiation either directly, for example by impeding 40S subunit binding or scanning, or indirectly by acting as receptors for a regulatory RNA-binding protein. The role of that sequence in the control of translation has been reviewed by Day and Tuite (1998, J. of Endocrinology 157:361-371).
  • RNA secondary structure positioned between the cap structure and the AUG codon may typically be inhibitory to translation initiation (Kozak 1989, Molecular and Cellular Biology 9:5134-5142).
  • the inhibition can be by steric hindrance preventing the binding of the 43S preinitiation complex to the cap structure.
  • RNA structural elements can provide sites for the binding of regulatory proteins and RNA molecules and, by forming a stable structure, they typically impede binding or scanning of the 40S ribosomal subunit (Goossen et al. 1990, Goossen & Hentze 1992).
  • operably linked refers to two or more nucleic acid sequence elements that are physically linked and are in a functional relationship with each other.
  • a promoter is operably linked to a coding sequence if the promoter is able to initiate or regulate the transcription or expression of a coding sequence, in which case the coding sequence should be understood as being "under the control of" the promoter.
  • when two nucleic acid sequences are operably linked, they will be in the same orientation and usually also in the same reading frame. They usually will be essentially contiguous, although this may not be required.
  • promoter refers to a nucleic acid fragment that functions to control the transcription of one or more genes, located upstream with respect to the direction of transcription of the transcription initiation site of the gene, and is related to the binding site identified by the presence of a binding site for DNA-dependent RNA polymerase, transcription initiation sites and any other DNA sequences, including, but not limited to transcription factor binding sites, repressor and activator protein binding sites, and any other sequences of nucleotides known to one skilled in the art to act directly or indirectly to regulate the amount of transcription from the promoter.
  • a promoter preferably ends at nucleotide -1 of the transcription start site (TSS).
  • Any nucleotide molecule capable of hybridising to a nucleotide molecule represented by SEQ ID NO:1 is defined as being part of the UN1 (SEQ ID NO:2) of the invention. Any nucleotide molecule capable of hybridising to SEQ ID NO:2 or to regions of SEQ ID NO:1 as later identified herein or to SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9 is also encompassed by the present invention. Hybridisation conditions are preferably stringent.
  • Stringent hybridisation conditions are herein defined as conditions that allow a nucleic acid sequence of at least 25, preferably 50, 75 or 100, and most preferably 150 or more nucleotides, to hybridise at a temperature of about 65 °C in a solution comprising about 1 M salt, preferably 6 x SSC or any other solution having a comparable ionic strength, and washing at 65 °C in a solution comprising about 0.1 M salt, or less, preferably 0.2 x SSC or any other solution having a comparable ionic strength.
  • the hybridisation is performed overnight, i.e. at least for 10 hours and preferably washing is performed for at least one hour with at least two changes of the washing solution.
  • Moderate hybridization conditions are herein defined as conditions that allow a nucleic acid sequence of at least 50, preferably 150 or more nucleotides, to hybridise at a temperature of about 45 °C in a solution comprising about 1 M salt, preferably 6 x SSC or any other solution having a comparable ionic strength, and washing at room temperature in a solution comprising about 1 M salt, preferably 6 x SSC or any other solution having a comparable ionic strength.
  • the hybridisation is performed overnight, i.e. at least for 10 hours, and preferably washing is performed for at least one hour with at least two changes of the washing solution.
  • These conditions will usually allow the specific hybridisation of sequences having up to 50% sequence identity. The person skilled in the art will be able to modify these hybridisation conditions in order to specifically identify sequences varying in identity between 50% and 90%.
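The stringent and moderate conditions defined above can be written down as data, and the "about 1 M salt" and "about 0.1 M salt, or less" figures can be checked against the SSC dilutions. The sketch below is illustrative only. It assumes the common laboratory recipe for 1 x SSC (0.15 M NaCl plus 0.015 M trisodium citrate), which the text does not state, and all names (`Hybridisation`, `ssc_sodium_molar`, `STRINGENT`, `MODERATE`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption (not from the text): 1 x SSC = 0.15 M NaCl + 0.015 M trisodium citrate.
NACL_1X_MOLAR = 0.15
CITRATE_1X_MOLAR = 0.015


def ssc_sodium_molar(fold):
    """Approximate Na+ molarity of an SSC dilution (one Na+ per NaCl, three per citrate)."""
    return fold * (NACL_1X_MOLAR + 3 * CITRATE_1X_MOLAR)


@dataclass(frozen=True)
class Hybridisation:
    min_probe_nt: int              # minimum probe length in nucleotides
    hyb_temp_c: float              # hybridisation temperature, degrees Celsius
    hyb_ssc: float                 # hybridisation buffer, fold SSC
    wash_temp_c: Optional[float]   # None stands for room temperature
    wash_ssc: float                # wash buffer, fold SSC
    min_hyb_hours: int = 10        # "overnight, i.e. at least for 10 hours"


# Values taken from the two definitions above.
STRINGENT = Hybridisation(25, 65.0, 6.0, 65.0, 0.2)
MODERATE = Hybridisation(50, 45.0, 6.0, None, 6.0)

print(round(ssc_sodium_molar(STRINGENT.hyb_ssc), 3))   # Na+ in 6 x SSC
print(round(ssc_sodium_molar(STRINGENT.wash_ssc), 3))  # Na+ in 0.2 x SSC
```

Under the assumed recipe, 6 x SSC comes out slightly above 1 M sodium and 0.2 x SSC well below 0.1 M, consistent with the approximate figures in the definitions.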
  • homologous, when used to indicate the relation between a given (recombinant) nucleic acid or polypeptide molecule and a given host organism or host cell, is understood to mean that in nature the nucleic acid or polypeptide molecule is produced by a host cell or organism of the same species, preferably of the same variety or strain. If homologous to a host cell, a nucleic acid sequence encoding a polypeptide will typically be operably linked to another promoter sequence or, if applicable, another secretory signal sequence and/or terminator sequence than in its natural environment.
  • the term "homologous" means that one single-stranded nucleic acid sequence may hybridise to a complementary single-stranded nucleic acid sequence.
  • the degree of hybridisation may depend on a number of factors including the extent of identity between the sequences and the hybridisation conditions such as temperature and salt concentration as discussed later.
  • the region of identity is greater than 5 bp, more preferably the region of identity is greater than 10 bp.
  • heterologous when used with respect to a nucleic acid or polypeptide molecule refers to a nucleic acid or polypeptide from a foreign cell which does not occur naturally as part of the organism, cell, genome or DNA or RNA sequence in which it is present, or which is found in a cell or location or locations in the genome or DNA or RNA sequence that differ from that in which it is found in nature.
  • Heterologous nucleic acids or proteins are not endogenous to the cell into which they are introduced, but have been obtained from another cell or synthetically or recombinantly produced.
  • nucleic acids encode proteins that are not normally produced by the cell in which the DNA is transcribed or expressed
  • similarly exogenous RNA codes for proteins not normally expressed in the cell in which the exogenous RNA is present.
  • a heterologous protein or polypeptide can be composed of homologous elements arranged in an order and/or orientation not normally found in the host organism, tissue or cell thereof in which it is transferred, i.e. the nucleotide sequence encoding said protein or polypeptide originates from the same species but is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
  • Heterologous nucleic acids and proteins may also be referred to as foreign nucleic acids or proteins.
  • heterologous nucleic acid or protein Any nucleic acid or protein that one of skill in the art would recognise as heterologous or foreign to the cell in which it is expressed is herein encompassed by the term heterologous nucleic acid or protein.
  • heterologous also applies to non-natural combinations of nucleic acid or amino acid sequences, i.e. combinations where at least two of the combined sequences are foreign with respect to each other.
  • endogenous when used with respect to a nucleic acid or polypeptide molecule refers to a nucleic acid or polypeptide as natively expressed in a cell, preferably a mammalian cell as defined in the invention.
  • Sequence identity is a relationship between two or more amino acid (polypeptide or protein) sequences or two or more nucleic acid (polynucleotide or nucleotide) sequences, as determined by comparing the sequences.
  • the percentage of “identity” indicates the degree of sequence relatedness between amino acid or nucleic acid sequences as determined by the match between strings of such sequences.
  • the percentage of identity is determined by comparing the whole SEQ ID NO as identified herein. However, part of a sequence may also be used. Two amino acid sequences are considered “similar” if the polypeptides only differ in conserved amino acid substitutions.
  • amino acids having similar side chains are glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulphur-containing side chains is cysteine and methionine.
  • Preferred conservative amino acids substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
  • Substitutional variants of the amino acid sequence disclosed herein are those in which at least one residue in the disclosed sequences has been removed and a different residue inserted in its place.
  • the amino acid change is conservative.
  • Preferred conservative substitutions for each of the naturally occurring amino acids are as follows: Ala to ser; Arg to lys; Asn to gln or his; Asp to glu; Cys to ser or ala; Gln to asn; Glu to asp; Gly to pro; His to asn or gln; Ile to leu or val; Leu to ile or val; Lys to arg; Asn to gln or glu; Met to leu or ile; Phe to met, leu or tyr; Ser to thr; Thr to ser; Trp to tyr; Tyr to trp or phe; and, Val to ile or leu.
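As an illustration of how the preferred conservative substitution groups listed above (valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, asparagine-glutamine) could be applied, the sketch below tests whether a one-letter substitution stays within one of those groups. The function name `is_conservative` is hypothetical, and the sketch covers only those five groups, not the longer per-residue list.

```python
# Preferred conservative substitution groups, in one-letter code:
# V-L-I, F-Y, K-R, A-V, N-Q.
PREFERRED_GROUPS = [set("VLI"), set("FY"), set("KR"), set("AV"), set("NQ")]


def is_conservative(original, replacement):
    """True if both residues differ and share at least one preferred group."""
    a, b = original.upper(), replacement.upper()
    return a != b and any(a in group and b in group for group in PREFERRED_GROUPS)


print(is_conservative("V", "L"))  # same group (V-L-I)
print(is_conservative("A", "L"))  # alanine is grouped only with valine
```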
  • Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Preferred computer program methods to determine identity and similarity between two sequences include e.g. the GCG program package (Devereux, J., et al., Nucleic Acids Research 12 (1):387 (1984)), BestFit and FASTA (Altschul, S. F. et al., J. Mol. Biol. 215:403-410 (1990)).
  • the BLAST 2.0 family of programs which can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against nucleotide database sequences; and TBLASTX for nucleotide query sequences against nucleotide database sequences.
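The program list above amounts to a lookup from query type and database type to a program name. A small sketch of that lookup follows. The table contents are taken from the list, while the names `BLAST_PROGRAMS` and `programs_for` are hypothetical and nothing here calls BLAST itself.

```python
# (query type, database type) for each program, as listed in the text.
BLAST_PROGRAMS = {
    "BLASTN": ("nucleotide", "nucleotide"),
    "BLASTX": ("nucleotide", "protein"),
    "BLASTP": ("protein", "protein"),
    "TBLASTN": ("protein", "nucleotide"),
    "TBLASTX": ("nucleotide", "nucleotide"),
}


def programs_for(query_type, database_type):
    """Return the listed programs that accept this query/database combination."""
    return [name for name, kinds in BLAST_PROGRAMS.items()
            if kinds == (query_type, database_type)]


print(programs_for("nucleotide", "nucleotide"))
print(programs_for("protein", "nucleotide"))
```

Two programs share the nucleotide-against-nucleotide combination, so the lookup returns a list rather than a single name.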
  • the BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, MD 20894; Altschul, S., et al., J. Mol. Biol. 215:403-410 (1990)).
  • the well-known Smith-Waterman algorithm may also be used to determine identity.
  • Preferred parameters for polypeptide sequence comparison include the following: Algorithm: Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970); Comparison matrix: BLOSUM62 from Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA. 89:10915-10919 (1992); Gap Penalty: 12; and Gap Length Penalty: 4.
  • a program useful with these parameters is publicly available as the "gap" program from Genetics Computer Group, located in Madison, WI.
  • the aforementioned parameters are the default parameters for amino acid comparisons (along with no penalty for end gaps).
  • Preferred parameters for nucleic acid comparison include the following:
  • Gap_penalty is 10.0 and Extend_penalty is 0.5.
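To make the role of the two gap parameters concrete, the following is a minimal sketch of a Needleman-Wunsch global alignment with affine gap costs (gap opening 10.0, gap extension 0.5), from which a percentage identity is derived. Several details are assumptions and not taken from the text: the match and mismatch scores (5 and -4), the penalising of end gaps, and the definition of identity as matching columns divided by alignment length. The function name `needle_identity` is hypothetical, and established alignment software may differ in these details.

```python
import math


def needle_identity(a, b, match=5.0, mismatch=-4.0, gap_open=10.0, gap_extend=0.5):
    """Percent identity from a global alignment with affine gaps (Gotoh recurrences).

    A gap of length n costs gap_open + (n - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    neg = -math.inf
    # Score layers: M = residue aligned to residue, X = gap in b, Y = gap in a.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    # Back-pointers: the layer the best predecessor came from.
    PM = [["M"] * (m + 1) for _ in range(n + 1)]
    PX = [["X"] * (m + 1) for _ in range(n + 1)]
    PY = [["Y"] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
        PX[i][0] = "X" if i > 1 else "M"
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
        PY[0][j] = "Y" if j > 1 else "M"
    score = lambda cand: cand[0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            best, PM[i][j] = max((M[i-1][j-1], "M"), (X[i-1][j-1], "X"),
                                 (Y[i-1][j-1], "Y"), key=score)
            M[i][j] = s + best
            X[i][j], PX[i][j] = max((M[i-1][j] - gap_open, "M"),
                                    (X[i-1][j] - gap_extend, "X"),
                                    (Y[i-1][j] - gap_open, "Y"), key=score)
            Y[i][j], PY[i][j] = max((M[i][j-1] - gap_open, "M"),
                                    (Y[i][j-1] - gap_extend, "Y"),
                                    (X[i][j-1] - gap_open, "X"), key=score)
    # Traceback from the best-scoring layer at the end of both sequences.
    layers = {"M": M, "X": X, "Y": Y}
    state = max(layers, key=lambda k: layers[k][n][m])
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        if state == "M":
            matches += a[i - 1] == b[j - 1]
            state, i, j = PM[i][j], i - 1, j - 1
        elif state == "X":
            state, i = PX[i][j], i - 1
        else:
            state, j = PY[i][j], j - 1
    return 100.0 * matches / columns if columns else 0.0


print(needle_identity("ACGTACGT", "ACGTCGT"))
```

In the example, one residue of the longer sequence is aligned to a gap, so seven of eight alignment columns match.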
  • nucleic acid refers to a naturally occurring or synthetic oligonucleotide or polynucleotide, whether DNA or RNA or DNA-RNA hybrid, single-stranded or double-stranded, sense or antisense, which is capable of hybridization to a complementary nucleic acid by Watson-Crick base-pairing.
  • Nucleic acids of the invention can also include nucleotide analogs (e.g., BrdU), and non-phosphodiester internucleoside linkages (e.g., peptide nucleic acid (PNA) or thiodiester linkages).
  • nucleic acids can include, without limitation, DNA, RNA, cDNA, gDNA, ssDNA, dsDNA, ssRNA, dsRNA, non coding RNAs or any combination thereof.
  • Primers are a subset of probes which are capable of supporting some type of enzymatic manipulation and which can hybridize with a target nucleic acid such that the enzymatic manipulation can occur.
  • a primer can be made from any combination of nucleotides or nucleotide derivatives or analogs available in the art which do not interfere with the enzymatic manipulation.
  • Probes are molecules capable of interacting with a target nucleic acid, typically in a sequence specific manner, for example through hybridization. The hybridization of nucleic acids is well understood in the art and discussed herein. Typically a probe can be made from any combination of nucleotides or nucleotide derivatives or analogs available in the art.
  • mRNA essential RNA
  • mRNA RNA of the antisense strand (anticoding strand or template) of protein coding DNA.
  • pre-mRNA also called primary transcript or hnRNA
  • the mature mRNA is then transported into the cytoplasm where it is translated into protein on the ribosome.
  • an mRNA generally comprises on both the 5' and the 3' side sequences flanking the region that specifies the protein sequence. These regions may be untranslated.
  • pre-mRNA a relatively long primary transcript (also called hnRNA) which is then processed, still within the nucleus, to remove introns. Further post-transcriptional modifications can also occur.
  • Antisense nucleic acid refers to a RNA, DNA or PNA molecule that is complementary to all or part of a target primary transcript or mRNA and that blocks the translation of a target nucleotide sequence.
  • Polypeptide as used herein refers to any peptide, oligopeptide, polypeptide, gene product, expression product, or protein. A polypeptide is comprised of consecutive amino acids. The term “polypeptide” encompasses naturally occurring or synthetic molecules.
  • polypeptide refers to amino acids joined to each other by peptide bonds or modified peptide bonds, e.g., peptide isosteres, etc. and may contain modified amino acids other than the 20 gene-encoded amino acids.
  • the polypeptides can be modified by either natural processes, such as post-translational processing, or by chemical modification techniques which are well known in the art. Modifications can occur anywhere in the polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. The same type of modification can be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide can have many types of modifications.
  • Modifications include, without limitation, acetylation, acylation, ADP-ribosylation, amidation, covalent cross-linking or cyclization, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of a phosphatidylinositol, disulfide bond formation, demethylation, formation of cysteine or pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, and transfer-RNA mediated addition of amino acids to protein such as arginylation.
  • amino acid sequence refers to a list of abbreviations, letters, characters or words representing amino acid residues.
  • the amino acid abbreviations used herein are conventional one letter codes for the amino acids and are expressed as follows: A, alanine; N, asparagine; C, cysteine; D, aspartic acid; E, glutamate, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine; Z, glutamine or glutamic acid.
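The one-letter codes listed above can be held in a lookup table. The sketch below expands a one-letter amino acid sequence into residue names. The table is copied from the list above, and the names `ONE_LETTER` and `spell_out` are hypothetical.

```python
# One-letter amino acid codes as listed in the text (Z is the ambiguity code).
ONE_LETTER = {
    "A": "alanine", "N": "asparagine", "C": "cysteine", "D": "aspartic acid",
    "E": "glutamic acid", "F": "phenylalanine", "G": "glycine", "H": "histidine",
    "I": "isoleucine", "K": "lysine", "L": "leucine", "M": "methionine",
    "P": "proline", "Q": "glutamine", "R": "arginine", "S": "serine",
    "T": "threonine", "V": "valine", "W": "tryptophan", "Y": "tyrosine",
    "Z": "glutamine or glutamic acid",
}


def spell_out(sequence):
    """Expand a one-letter amino acid sequence into a list of residue names."""
    names = []
    for letter in sequence.upper():
        if letter not in ONE_LETTER:
            raise ValueError(f"unknown amino acid code: {letter!r}")
        names.append(ONE_LETTER[letter])
    return names


print(spell_out("MKZ"))
```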
  • Vector or plasmid refers to a nucleic acid sequence capable of transporting into a cell another nucleic acid to which the vector sequence has been linked.
  • expression vector includes any vector, (e.g., a plasmid, cosmid or phage chromosome) containing a gene construct in a form suitable for expression by a cell (e.g., linked to a transcriptional control element).
  • Plasmid and vector are used interchangeably, as a plasmid is a commonly used form of vector.
  • the invention is intended to include other vectors which serve equivalent functions.
  • sequence of interest or “gene of interest” can mean a nucleic acid sequence (e.g., a therapeutic gene), that is partly or entirely heterologous, i.e., foreign, to a cell into which it is introduced.
  • polypeptide of interest e.g., a therapeutic peptide
  • protein of interest can mean a peptide sequence (e.g., a therapeutic peptide), that is partly or entirely heterologous, i.e., foreign, to a cell into which it is introduced.
  • sequence of interest or “gene of interest” can also mean a nucleic acid sequence, that is partly or entirely homologous to an endogenous gene of the cell into which it is introduced, but which is designed to be inserted into the genome of the cell in such a way as to alter the genome (e.g., it is inserted at a location which differs from that of the natural gene or its insertion results in “a knockout”).
  • a sequence of interest can be cDNA, DNA, or mRNA.
  • sequence of interest or “gene of interest” can also mean a nucleic acid sequence, that is partly or entirely complementary to an endogenous gene of the cell into which it is introduced.
  • sequence of interest can be micro RNA, shRNA, or siRNA.
  • sequence of interest or “gene of interest” can also include one or more transcriptional regulatory sequences and any other nucleic acid, such as introns, that may be necessary for optimal expression of a selected nucleic acid.
  • polypeptide of interest can also mean a peptide sequence that is partly or entirely homologous to an endogenous peptide of the cell into which it is introduced.
  • polypeptide of interest can also mean a peptide or polypeptide sequence (e.g., a therapeutic protein), that is expressed from a sequence of interest or gene of interest.
  • Transformation/transfection mean the introduction of a nucleic acid, e.g., an expression vector, into a recipient cell possibly including introduction of a nucleic acid to the chromosomal DNA of said cell.
  • Isolated polypeptide/purified polypeptide By “isolated polypeptide” or “purified polypeptide” is meant a polypeptide (or a fragment thereof) that is substantially free from the materials with which the polypeptide is normally associated in nature.
  • the polypeptides of the invention, or fragments thereof can be obtained, for example, by extraction from a natural source (for example, a mammalian cell), by expression of a recombinant nucleic acid encoding the polypeptide (for example, in a cell or in a cell-free translation system), or by chemically synthesizing the polypeptide.
  • polypeptide fragments may be obtained by any of these methods, or by cleaving full length polypeptides.
  • By “isolated nucleic acid” or “purified nucleic acid” is meant DNA that is free of the genes that, in the naturally-occurring genome of the organism from which the DNA of the invention is derived, flank the gene.
  • the term therefore includes, for example, a recombinant DNA which is incorporated into a vector, such as an autonomously replicating plasmid or virus; or incorporated into the genomic DNA of a prokaryote or eukaryote (e.g., a transgene); or which exists as a separate molecule (for example, a cDNA or a genomic or cDNA fragment produced by PCR, restriction endonuclease digestion, or chemical or in vitro synthesis).
  • isolated nucleic acid also refers to RNA, e.g., an mRNA molecule that is encoded by an isolated DNA molecule, or that is chemically synthesized, or that is separated or substantially free from at least some cellular components, for example, other types of RNA molecules or polypeptide molecules.
  • the invention relates to a method for expressing a protein or polypeptide of interest in a mammalian cell comprising the steps of: a) providing a nucleic acid construct comprising a first nucleotide sequence that has at least 34 % nucleotide sequence identity with the nucleotide sequence of SEQ ID No.
  • a nucleic acid construct comprises a first nucleotide sequence that has at least 34 % nucleotide sequence identity to the nucleotide sequence of SEQ ID No. 1 (using the Needleman-Wunsch algorithm of Needle; gap penalties: existence 10, extension 0.5).
  • identity is calculated over the whole length of SEQ ID NO:1 or SEQ ID NO:2.
  • identity is calculated by comparison to nucleotides 4-76 or 27-50 or 100-151 or 104- 151 of SEQ ID NO: 1.
  • a nucleic acid construct according to the invention preferably comprises a first nucleotide sequence that has at least 34%, 35%, 36 %, more preferably at least 40 %, 45 %, 50 %, 55 %, 60 %, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99 % nucleotide sequence identity to the whole length of the nucleotide sequence of SEQ ID No. 1 or to each of the specified regions within SEQ ID NO:1 as identified above.
  • a nucleic acid construct comprises or consists of a first nucleotide sequence that has 100% nucleotide sequence identity to the nucleotide sequence of SEQ ID No. 1.
  • the invention relates to a method for expressing a protein of interest in a mammalian cell comprising the steps of: a) providing a nucleic acid construct comprising a first nucleotide sequence comprising a nucleotide sequence that has at least 46% nucleotide sequence identity to nucleotides 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has at least 51% nucleotide sequence identity to nucleotides 4 - 76 of the nucleotide sequence of SEQ ID No.
  • said first nucleotide sequence comprises a nucleotide sequence that has at least 48%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% and particularly 100% nucleotide sequence identity to nucleotides 100-151 or 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has at least 55%, more preferably at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% and particularly 100% nucleotide sequence identity to nucleotides 4 - 76 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof.
  • a first nucleotide sequence comprises a nucleotide sequence that has at least 46%, 48%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% and particularly 100% nucleotide sequence identity to nucleotides 100-151 or 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has 100% nucleotide sequence identity to nucleotides 27 - 50 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof.
  • the nucleotides 27 - 50 of the nucleotide sequence of SEQ ID No. 1 consist of a repeat of 8 GAA-units.
  • said first nucleotide sequence comprises a nucleotide sequence that consists of 7, 6, 5, 4 or 3 GAA units.
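The regions referred to above (nucleotides 4-76, 27-50, 100-151 and 104-151 of SEQ ID NO:1) are given as 1-based, inclusive positions. The sketch below shows how such coordinates map onto string slices and how a run of GAA units can be counted. SEQ ID NO:1 itself is not reproduced in this section, so the code uses a clearly artificial placeholder of the same overall length (151 nt, implied by the region 104-151) with eight GAA units at positions 27-50. The names `region`, `count_gaa_units` and `PLACEHOLDER` are hypothetical.

```python
import re


def region(seq, start, end):
    """Return nucleotides start..end of seq, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region lies outside the sequence")
    return seq[start - 1:end]


def count_gaa_units(seq):
    """Number of GAA units if seq consists solely of GAA repeats, otherwise 0."""
    if re.fullmatch(r"(?:GAA)+", seq.upper()):
        return len(seq) // 3
    return 0


# Placeholder only, NOT the real SEQ ID NO:1: 26 N, eight GAA units, 101 N.
PLACEHOLDER = "N" * 26 + "GAA" * 8 + "N" * 101

gaa_region = region(PLACEHOLDER, 27, 50)
print(len(gaa_region), count_gaa_units(gaa_region))
print(len(region(PLACEHOLDER, 4, 76)), len(region(PLACEHOLDER, 104, 151)))
```

The region 27-50 spans 24 nucleotides, which matches the eight GAA units stated in the text.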
  • a first nucleotide sequence comprises or consists of or has at least 34% nucleotide sequence identity to the SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9.
  • the invention also encompasses a host cell, preferably a mammalian cell comprising said first nucleotide sequence. Accordingly, the invention also encompasses a nucleic acid construct comprising one of these sequences. Accordingly, the invention also encompasses each of these nucleotide sequences.
  • the invention encompasses a nucleotide sequence defined by identity by comparison to SEQ ID NO:1, or SEQ ID NO:2 or specific regions of SEQ ID NO:1 as identified herein.
  • the skilled person will understand that any sequence derived from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, or from SEQ ID NO:9 and having the required identity with SEQ ID NO:1, SEQ ID NO:2 or with a specific region of SEQ ID NO:1 as identified herein is considered to be encompassed by the present invention.
  • a sequence is derived from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, or from SEQ ID NO:9 by substituting, deleting and/or adding one, two, three, four or more nucleotides as present in the original sequence.
  • the functionality of any sequence is checked using a control expression system as identified in example 1.
  • a sequence is said to be functional when the expression of a given protein of interest in a system as identified in example 1 is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 300% or more after a given period of time, by comparison to a control cell not having the sequence in question.
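The functionality criterion above is a relative increase in expression over a control. A minimal sketch of that arithmetic follows, assuming expression is measured as a single positive number per cell population in arbitrary units. The names `percent_increase` and `is_functional` are hypothetical, and the default threshold of 5% is the lowest value in the list above.

```python
def percent_increase(test_expression, control_expression):
    """Relative increase of test over control, in percent."""
    if control_expression <= 0:
        raise ValueError("control expression must be positive")
    return 100.0 * (test_expression - control_expression) / control_expression


def is_functional(test_expression, control_expression, threshold_pct=5.0):
    """True if expression rose by at least threshold_pct relative to the control."""
    return percent_increase(test_expression, control_expression) >= threshold_pct


print(percent_increase(120.0, 100.0))
print(is_functional(103.0, 100.0))
```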
  • a nucleotide sequence according to the invention can be present in the form of RNA or in the form of DNA including genomic DNA i.e. DNA including the introns, cDNA or synthetic DNA.
  • the DNA may be double-stranded or single-stranded and if single-stranded may be the coding strand or non-coding (anti-sense) strand.
  • DNA or RNA with a backbone modified for stability or for other reasons are a further part of the invention.
  • DNA or RNA comprising unusual bases, such as inosine, or modified bases, such as tritylated bases are also a part of the invention.
  • a nucleotide sequence may also be an allelic variant of the nucleotide sequence according to the invention.
  • nucleotide sequence can be prepared or altered synthetically so the known codon preferences of the intended expression host can advantageously be used. It has been shown for instance that the codon preferences and GC content preferences of monocotyledons and dicotyledons differ (Murray et al, Nucl. Acids Res. 17: 477-498 (1989)).
  • a nucleic acid construct comprises a second nucleotide sequence encoding a protein or polypeptide of interest that is operably linked to any one of the first nucleotide sequences as defined above.
  • a protein or polypeptide of interest can be a homologous, an endogenous or a heterologous protein or polypeptide. Therefore it is to be understood that the invention protects the production of homologous, heterologous or endogenous protein.
  • the invention is not limited to a specific kind of protein to be produced. Any protein is preferably produced using a method of the invention.
  • a second nucleotide sequence encoding an homologous, endogenous or heterologous protein or polypeptide may be derived in whole or in part from any source known to the art, including a bacterial or viral genome or episome, eukaryotic nuclear or plasmid DNA, cDNA or chemically synthesised DNA. Endogenous, homologous and heterologous are preferably defined by reference to the cell or host cell used.
  • a second nucleotide sequence may constitute an uninterrupted coding region or it may include one or more introns bounded by appropriate splice junctions, it can further be composed of segments derived from different sources, naturally occurring or synthetic.
  • a second nucleotide sequence encoding a protein or polypeptide of interest according to a method of the invention is preferably a full-length nucleotide sequence, but can also be a functionally active part or other part of said full-length nucleotide sequence.
  • a protein or polypeptide of interest may be a protein or polypeptide conferring, for instance, disease resistance, immunity, an improved intake of nutrients, minerals, or a modified metabolism in a mammalian cell.
  • a mammalian cell is used for overproduction of the protein or polypeptide of interest.
  • a second nucleotide sequence encoding a protein or polypeptide of interest may also comprise signal sequences directing the protein or polypeptide of interest when expressed to a specific location in a cell or tissue.
  • signal sequences include, but are not limited to, sequences directing the protein or polypeptide of interest to organelles within a mammalian cell or outside of a mammalian cell.
  • a second nucleotide sequence encoding a protein or polypeptide of interest can also comprise sequences which facilitate protein purification and protein detection by for instance Western blotting and ELISA (e.g. c-myc or polyhistidine sequences).
  • a protein or polypeptide of interest may have industrial or medicinal (pharmaceutical) applications.
  • proteins or polypeptides with industrial applications include enzymes such as e.g. lipases (e.g. used in the detergent industry), proteases (used inter alia in the detergent industry, in brewing and the like), cell wall degrading enzymes (such as cellulases, pectinases, beta-1,3/4- and beta-1,6-glucanases, rhamnogalacturonases, mannanases, xylanases, pullulanases, galactanases, esterases and the like, used in fruit processing, wine making and the like or in feed), phytases, phospholipases, glycosidases (such as amylases, beta-glucosidases, arabinofuranosidases, rhamnosidases, apiosidases and the like) and dairy enzymes (e.g. chymosin).
  • Mammalian, and preferably human, proteins or polypeptides and/or enzymes with therapeutic, cosmetic or diagnostic applications include, but are not limited to, insulin, human serum albumin (HSA), lactoferrin, hemoglobin α and β, tissue plasminogen activator (tPA), erythropoietin (EPO), tumor necrosis factors (TNF), BMP (Bone Morphogenic Protein), growth factors (G-CSF, GM-CSF, M-CSF, PDGF, EGF, and the like), peptide hormones (e.g.
  • bacterial and viral antigens e.g. for use as vaccines, including e.g. heat-labile toxin B-subunit, cholera toxin B-subunit, envelope surface protein Hepatitis B virus, capsid protein Norwalk virus, glycoprotein B Human cytomegalovirus, glycoprotein S, interferon, and transmissible gastroenteritis coronavirus receptors and the like. Further included are genes coding for mutants or analogues of the said proteins.
  • a nucleic acid construct further comprises a promoter for control and initiation of transcription of a second nucleotide sequence.
  • a promoter preferably is capable of causing expression of a second nucleotide sequence in a host cell of choice. Said promoter, e.g. homologous or heterologous for a mammalian cell and/or for a nucleotide sequence, is operably linked to any one of the nucleotide sequences mentioned above.
  • a promoter is a promoter capable of initiating transcription in a mammalian cell. More preferably, such a promoter is a mammalian promoter.
  • a mammalian promoter as used herein includes tissue-specific, tissue-preferred, cell-type-specific, inducible and constitutive promoters.
  • Tissue-specific promoters are promoters which initiate transcription only in certain tissues and refer to a sequence of DNA that provides recognition signals for RNA polymerase and/or other factors required for transcription to begin, and/or for controlling expression of the coding sequence precisely within certain tissues or within certain cells of that tissue. Expression in a tissue-specific manner may be only in individual tissues or in combinations of tissues.
  • promoters as used herein can include but are not limited to promoters that originate from the host cell that the constructs are introduced to.
  • Promoters that may be used in a mammalian cell can include promoters such as metallothionein HA promoter (mouse), EF1 alpha promoter (human), Cytomegalovirus (CMV), Rous sarcoma virus (RSV), simian virus 40 (SV40), Moloney murine leukemia, Tk promoter Herpes simplex virus (HSV).
  • a cell-type-specific promoter is a promoter that primarily drives expression in a certain cell type.
  • An inducible promoter is a promoter that is capable of activating transcription of one or more DNA sequences or genes in response to an inducer. The DNA sequences or genes will not be transcribed when the inducer is absent.
  • Inducers known in the art include high salt concentrations, cold, heat or toxic elements and include pathogens or disease agents such as viruses.
  • Inducers can be chemical agents such as proteins, growth regulators, metabolites or phenolic compounds.
  • An inducer can also be an illumination agent such as darkness and light at various modalities including wavelength, intensity, fluence, direction and duration. Activation of an inducible promoter is established by application of the inducer.
  • the group of generally inducible promoters includes, but is not limited to, the hsp70 heat shock promoter of Drosophila melanogaster, a cold inducible promoter from Brassica napus and an alcohol dehydrogenase promoter which is induced by ethanol.
  • Other inducible promoters include, but are not limited to, the glaA promoter which is starch-inducible, the metallothionein HA promoter, and the tetracycline inducible promoter.
  • a constitutive promoter is a promoter that is active under many environmental conditions and in many different tissue types.
  • Constitutive mammalian promoters include, but are not limited to, the EF1 alpha promoter (human), Cytomegalovirus (CMV), Rous sarcoma virus (RSV), simian virus 40 (SV40), Moloney murine leukemia, Tk promoter Herpes simplex virus (HSV).
  • a nucleic acid construct as disclosed herein can also comprise one or more regulating elements.
  • the one or more regulating elements can be operably linked to one or more of the nucleic acid sequences within the nucleic acid construct.
  • regulating elements can refer to enhancers or other segments of a nucleic acid sequence that are involved in controlling gene expression.
  • Enhancer generally refers to a sequence of DNA that functions at no fixed distance from the transcription start site and can be either 5' (Laimins, L. et al., Proc. Natl. Acad. Sci. 78: 993 (1981)), within, or 3' (Lusky, M.L., et al., Mol. Cell Bio. 3: 1108 (1983)) to the transcription unit.
  • a transcription unit is that part of the DNA that will be transcribed into RNA.
  • enhancers can be within an intron (Banerji, J.L.
  • Enhancers function to increase transcription from nearby promoters. Enhancers and promoters can also contain response elements that mediate the regulation of transcription. An enhancer often determines the regulation of expression of a gene.
  • Although enhancer sequences are now known from mammalian genes (globin, elastase, albumin, α-fetoprotein and insulin), one will preferably use an enhancer from a eukaryotic cell-infecting virus for general expression.
  • enhancers include, but are not limited to the SV40 enhancer on the late side of the replication origin, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.
  • regulating elements include, but are not limited to, elements present in the non-coding and coding nucleotide sequences of homologous and/or heterologous nucleotide sequences, including the Iron Responsive Element (IRE), Translational cis-Regulatory Element (TLRE) or uORFs in 5' untranslated sequences and poly(U) stretches at the 3' end.
  • a regulating element can also contain sequences necessary for the termination of transcription which may affect mRNA expression. These regions are transcribed as polyadenylated segments in the untranslated portion of the mRNA encoding tissue factor protein. The 3' untranslated regions usually also include transcription termination sites. For protein coding sequences it is preferred that the transcription unit also contains a polyadenylation region. One benefit of this region is that it increases the likelihood that the transcribed unit will be processed and transported like mRNA. The identification and use of polyadenylation signals in expression constructs is well established. It is preferred that homologous polyadenylation signals be used in the transgene constructs. In certain transcription units, the polyadenylation region is derived from the SV40 early polyadenylation signal and consists of about 400 bases.
  • a nucleic acid construct according to the invention is preferably a vector, in particular a plasmid, cosmid or phage or nucleotide sequence, linear or circular, of a single or double stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing any one of the nucleotide sequences of the invention in sense or antisense orientation into a mammalian cell.
  • the choice of vector is dependent on the recombinant procedures followed and the host cell used.
  • a vector may be an autonomously replicating vector or may replicate together with the chromosome into which it has been integrated.
  • General techniques for integration into the host genome include, for example, systems designed to promote homologous recombination with the host genome. These systems typically rely on sequence flanking the nucleic acid to be expressed that has enough homology with a target sequence within the host cell genome that recombination between the vector nucleic acid and the target nucleic acid takes place, causing the delivered nucleic acid to be integrated into the host genome. These systems and the methods necessary to promote homologous recombination are known to those of skill in the art.
  • Suitable vectors which can be delivered using the presently known procedures include, but are not limited to, herpes simplex virus vectors, adenovirus vectors, papovavirus vectors (such as human papillomavirus vectors, polyomavirus vectors, SV40 vectors), adeno-associated virus vectors, retroviral vectors, pseudorabies virus, alpha-herpes virus vectors, Herpes virus, Vaccinia virus, Polio virus, AIDS virus, neuronal trophic virus, Sindbis and other RNA viruses, including these viruses with the HIV backbone and the like.
  • Retroviruses include Moloney Murine Leukemia Virus, MMLV, and retroviruses that express the desirable properties of MMLV as a vector.
  • Retroviral vectors are able to carry a larger genetic payload, i.e., a transgene or marker gene, than other viral vectors, and for this reason are a commonly used vector. However, they are not as useful in non-proliferating cells.
  • Adenovirus vectors are relatively stable and easy to work with, have high titers, and can be delivered in aerosol formulation, and can transfect non-dividing cells.
  • Pox viral vectors are large and have several sites for inserting genes, they are thermostable and can be stored at room temperature.
  • Viral vectors can have higher transduction abilities (i.e., ability to introduce genes) than chemical or physical methods of introducing genes into cells.
  • viral vectors contain nonstructural early genes, structural late genes, an RNA polymerase III transcript, inverted terminal repeats necessary for replication and encapsidation, and promoters to control the transcription and replication of the viral genome.
  • When engineered as vectors, viruses typically have one or more of the early genes removed and a gene or gene/promoter cassette is inserted into the viral genome in place of the removed viral DNA. Constructs of this type can carry up to about 8 kb of foreign genetic material.
  • the necessary functions of the removed early genes are typically supplied by cell lines which have been engineered to express the gene products of the early genes in trans.
  • Retroviral vectors, in general, are described by Verma, I.M., Retroviral vectors for gene transfer. In Microbiology-1985, American Society for Microbiology, pp. 229-232, Washington, (1985), which is incorporated by reference herein. Examples of methods for using retroviral vectors for gene therapy are described in U.S. Patent Nos. 4,868,116 and 4,980,286; PCT applications WO 90/02806 and WO 89/07136; and Mulligan, (Science 260:926-932 (1993)); the teachings of which are incorporated herein by reference in their entirety for their teaching of methods for using retroviral vectors for gene therapy.
  • a retrovirus is essentially a package which has packed into it nucleic acid cargo.
  • the nucleic acid cargo carries with it a packaging signal, which ensures that the replicated daughter molecules will be efficiently packaged within the package coat.
  • In addition to the packaging signal, there are a number of molecules which are needed in cis for the replication and packaging of the replicated virus.
  • a retroviral genome contains the gag, pol, and env genes which are involved in the making of the protein coat. It is the gag, pol, and env genes which are typically replaced by the foreign DNA that is to be transferred to the target cell.
  • Retrovirus vectors typically contain a packaging signal for incorporation into the package coat, a sequence which signals the start of the gag transcription unit, elements necessary for reverse transcription, including a primer binding site to bind the tRNA primer of reverse transcription, terminal repeat sequences that guide the switch of RNA strands during DNA synthesis, a purine rich sequence 5' to the 3' LTR that serves as the priming site for the synthesis of the second strand of DNA, and specific sequences near the ends of the LTRs that enable the DNA form of the retrovirus to insert into the host genome.
  • This amount of nucleic acid is sufficient for the delivery of one to many genes depending on the size of each transcript. It is preferable to include either positive or negative selectable markers along with other genes in the insert.
  • a packaging cell line is a cell line which has been transfected or transformed with a retrovirus that contains the replication and packaging machinery but lacks any packaging signal.
  • when the vector carrying the DNA of choice is transfected into these cell lines, the vector containing the gene of interest is replicated and packaged into new retroviral particles by the machinery provided in cis by the helper cell. The genomes for the machinery are not packaged because they lack the necessary signals.
  • viruses have been shown to achieve high efficiency gene transfer after direct, in vivo delivery to airway epithelium, hepatocytes, vascular endothelium, CNS parenchyma and a number of other tissue sites (Morsy, J. Clin. Invest. 92:1580-1586 (1993); Kirshenbaum, J. Clin. Invest. 92:381-387 (1993); Roessler, J. Clin. Invest.
  • adenoviruses achieve gene transduction by binding to specific cell surface receptors, after which the virus is internalized by receptor-mediated endocytosis, in the same manner as wild type or replication-defective adenovirus (Chardonnet and Dales, Virology 40:462-477 (1970); Brown and Burlingham, J. Virology 12:386-396 (1973); Svensson and Persson, J. Virology 55:442-449 (1985); Seth, et al., J. Virol.
  • a viral vector can be one based on an adenovirus which has had the E1 gene removed, and these virions are generated in a cell line such as the human 293 cell line.
  • both the E1 and E3 genes are removed from the adenovirus genome.
  • AAV adeno-associated virus
  • This defective parvovirus is a preferred vector because it can infect many cell types and is nonpathogenic to humans.
  • AAV type vectors can transport about 4 to 5 kb and wild type AAV is known to stably insert into chromosome 19. Vectors which contain this site specific integration property are preferred.
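  • The payload figures quoted above (up to about 8 kb for viral vectors with early genes removed, about 4 to 5 kb for AAV type vectors) suggest a simple size check when choosing a vector. The sketch below is illustrative only: the dictionary keys and function name are hypothetical, and the conservative lower bound of 4 kb for AAV is an assumption made for the example.

```python
# Approximate payload limits in base pairs, taken from the figures in the text.
APPROX_CAPACITY_BP = {
    "early_gene_deleted_viral_vector": 8_000,  # "up to about 8 kb"
    "aav": 4_000,  # lower bound of "about 4 to 5 kb" (assumed conservative choice)
}


def fits_in_vector(insert_length_bp: int, vector: str) -> bool:
    """True when the insert does not exceed the approximate payload limit."""
    if insert_length_bp <= 0:
        raise ValueError("insert length must be positive")
    return insert_length_bp <= APPROX_CAPACITY_BP[vector]
```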
  • An especially preferred embodiment of this type of vector is the P4.1 C vector produced by Avigen, San Francisco, CA, which can contain the herpes simplex virus thymidine kinase gene, HSV-tk, or a marker gene, such as the gene encoding the green fluorescent protein, GFP.
  • the AAV contains a pair of inverted terminal repeats (ITRs) which flank at least one cassette containing a promoter which directs cell-specific expression operably linked to a heterologous gene.
  • Heterologous in this context refers to any nucleotide sequence or gene which is not native to the AAV or
  • AAV and B19 coding regions have been deleted, resulting in a safe, noncytotoxic vector.
  • the AAV ITRs, or modifications thereof, confer infectivity and site-specific integration, but not cytotoxicity, and the promoter directs cell-specific expression.
  • United States Patent No. 6,261,834 is herein incorporated by reference in its entirety for material related to the AAV vector.
  • a nucleic acid construct as used herein contains a selection marker.
  • Useful markers are dependent on the host cell of choice and are well known to persons skilled in the art.
  • molecules encoded within the viral vector e.g., by a cDNA contained in the viral vector, are expressed efficiently in cells which have taken up viral vector nucleic acid.
  • Preferred selection marker genes are extensively presented later on herein.
  • a recombinant host cell such as a mammalian cell, preferably a human cell, containing one or more copies of a nucleic acid construct according to the invention is an additional aspect of the invention.
  • by host cell or recombinant host cell is meant a cell which contains a nucleic acid construct such as a vector and supports the replication and/or expression of the nucleic acid construct.
  • a suitable expression system uses any mammalian cells such as CHO, COS, CPK (porcine kidney), MDCK, BHK, and Vero cells.
  • a suitable human cell or human cell line is an astrocyte, adipocyte, chondrocyte, endothelial, epithelial, fibroblast, hair, keratinocyte, melanocyte, osteoblast, skeletal muscle, smooth muscle, stem, synoviocyte cell or cell line.
  • suitable human cell lines also include HEK 293 (human embryonic kidney), HeLa, PER.C6, and Bowes melanoma cells.
  • a human cell is not an embryonic stem cell. Therefore, another aspect of the invention relates to a mammalian cell that is genetically modified, preferably by a method of the invention, in that a mammalian cell comprises a nucleic acid construct as herein defined above.
  • a nucleic acid construct preferably is a construct containing nucleic acid sequences that are manipulated or modified in vitro.
  • a nucleic acid construct preferably provides a mammalian cell with a combination of nucleic acid sequences which is not found in nature.
  • a nucleic acid construct preferably is stably maintained, either as an autonomously replicating element or, more preferably, integrated into the mammalian cell's genome, in which case the construct is usually integrated at random positions in the mammalian cell's genome, for instance by non-homologous recombination.
  • Stably transformed mammalian cells are produced by known methods. The term stable transformation refers to exposing cells to methods to transfer and incorporate foreign DNA into their genome.
  • a mammalian tissue can be regenerated from said transformed cell in a suitable medium, which optionally may contain antibiotics or biocides known in the art for the selection of transformed cells.
  • Resulting transformed mammalian tissues are preferably identified by means of selection using a selection marker gene as present on a nucleic acid construct as defined herein.
  • a nucleic acid construct according to the invention therefore preferably also comprises a marker gene which can provide selection or screening capability in a treated mammalian cell. Selectable markers are generally preferred for mammalian transformation events, but are not available for all mammalian species.
  • a nucleic acid construct disclosed herein can also include a nucleic acid sequence encoding a marker product.
  • a marker product can be used to determine if the construct or portion thereof has been delivered to the cell and once delivered is being expressed. Examples of marker genes include, but are not limited to, the E. coli lacZ gene, which encodes β-galactosidase, and a gene encoding the green fluorescent protein.
  • a marker may be a selectable marker.
  • suitable selectable markers for mammalian cells include, but are not limited to, dihydrofolate reductase (DHFR), thymidine kinase, neomycin, neomycin analog G418, hygromycin, and puromycin.
  • Other suitable selectable markers include, but are not limited to antibiotic, metabolic, auxotrophic or herbicide resistant genes which, when inserted in a host cell in culture, would confer on those cells the ability to withstand exposure to an antibiotic. Metabolic or auxotrophic marker genes enable transformed cells to synthesize an essential component, usually an amino acid, which allows the cells to grow on media that lack this component.
  • Another type of marker gene is one that can be screened by histochemical or biochemical assay, even though the gene cannot be selected for.
  • a suitable marker gene found useful in such host cell transformation experiments is a luciferase gene. Luciferase catalyzes the oxidation of luciferin, resulting in the production of oxyluciferin and light.
  • a luciferase gene provides a convenient assay for the detection of the expression of introduced DNA in host cells by histochemical analysis of the cells.
  • a nucleic acid sequence sought to be expressed in a host cell could be coupled in tandem with the luciferase gene. The tandem construct could be transformed into host cells, and the resulting host cells could be analyzed for expression of the luciferase enzyme.
  • An advantage of this marker is the non-destructive procedure of application of the substrate and the subsequent detection.
  • the transformed mammalian host cell can survive if placed under selective pressure.
  • There are two widely used distinct categories of selective regimes. The first category is based on a cell's metabolism and the use of a mutant cell line which lacks the ability to grow independent of a supplemented medium. Two examples are CHO DHFR- cells and mouse LTK- cells. These cells lack the ability to grow without the addition of such nutrients as thymidine or hypoxanthine. Because these cells lack certain genes necessary for a complete nucleotide synthesis pathway, they cannot survive unless the missing nucleotides are provided in a supplemented medium.
  • An alternative to supplementing the media is to introduce an intact DHFR or TK gene into cells lacking the respective genes, thus altering their growth requirements. Individual cells which were not transformed with the DHFR or TK gene will not be capable of survival in non-supplemented media.
  • the second category is dominant selection, which refers to a selection scheme used in any cell type and does not require the use of a mutant cell line. These schemes typically use a drug to arrest growth of a host cell. Those cells which have a novel gene would express a protein conveying drug resistance and would survive the selection. Examples of such dominant selection use the drugs neomycin (Southern P. and Berg, P., J. Molec. Appl. Genet. 1: 327 (1982)), mycophenolic acid (Mulligan, R.C. and Berg, P., Science 209: 1422 (1980)) or hygromycin (Sugden, B. et al., Mol. Cell. Biol. 5: 410-413 (1985)).
  • the three examples employ bacterial genes under eukaryotic control to convey resistance to the appropriate drug G418 or neomycin (geneticin), xgpt (mycophenolic acid) or hygromycin, respectively. Others include the neomycin analog G418 and puromycin. Other useful markers are dependent on the host cell of choice and are well known to persons skilled in the art.
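  • The two categories of selective regimes described above reduce to two simple survival rules, sketched here as a truth-table model. The function and parameter names are hypothetical and the model deliberately ignores dosage, timing and partial resistance.

```python
def survives_metabolic_selection(has_functional_gene: bool,
                                 medium_supplemented: bool) -> bool:
    """Mutant line (e.g. DHFR- or TK-): a cell survives if the missing
    nutrient is supplied or the intact gene has been introduced."""
    return has_functional_gene or medium_supplemented


def survives_dominant_selection(has_resistance_gene: bool,
                                drug_present: bool) -> bool:
    """Any cell type: a cell survives unless the drug is present and the
    cell lacks the resistance gene."""
    return has_resistance_gene or not drug_present
```

  • In both regimes, selection acts only when the pressure is applied: non-supplemented medium in the first case, presence of the drug in the second.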
  • a transformed mammalian cell is subjected to conditions leading to expression of a protein or polypeptide of interest, and optionally recovering said protein or polypeptide. Recovering steps depend on the expressed protein or polypeptide and the host cell used but can comprise isolation of the protein or polypeptide.
  • the term "isolation" indicates that the protein is found in a condition other than its native environment.
  • an isolated protein is substantially free of other proteins, particularly other homologous proteins. It is preferred to provide the protein in a greater than 40% pure form, more preferably greater than 60% pure form. Even more preferably it is preferred to provide the protein in a highly purified form, i.e., greater than 80% pure, more preferably greater than 95% pure, and even more preferably greater than 99% pure, as determined by SDS-PAGE.
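  • The purity levels listed above can be illustrated with a short calculation. The sketch assumes, purely for illustration, that purity is estimated as the fraction of the total lane intensity on an SDS-PAGE gel that belongs to the band of the protein of interest; the function names are hypothetical.

```python
# Purity thresholds named in the text, from most to least stringent.
PURITY_TIERS = (99.0, 95.0, 80.0, 60.0, 40.0)


def purity_percent(target_band_intensity: float, total_lane_intensity: float) -> float:
    """Share of the lane signal contributed by the protein of interest."""
    if total_lane_intensity <= 0:
        raise ValueError("total lane intensity must be positive")
    if not 0 <= target_band_intensity <= total_lane_intensity:
        raise ValueError("target band must lie between 0 and the lane total")
    return target_band_intensity * 100.0 / total_lane_intensity


def highest_tier_met(purity: float):
    """Most stringent threshold strictly exceeded, or None if below all."""
    for tier in PURITY_TIERS:
        if purity > tier:
            return tier
    return None
```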
  • a second nucleotide sequence may be ligated to a heterologous nucleotide sequence to encode a fusion protein to facilitate protein purification and protein detection on for instance Western blot and in an ELISA.
  • Suitable heterologous sequences include, but are not limited to, the nucleotide sequences encoding for proteins such as for instance glutathione-S-transferase, maltose binding protein, metal- binding polyhistidine, green fluorescent protein, luciferase and beta-galactosidase.
  • the protein may also be coupled to non-peptide carriers, tags or labels that facilitate tracing of the protein, both in vivo and in vitro, and allow for the identification and quantification of binding of the protein to substrates.
  • labels, tags or carriers are well-known in the art and include, but are not limited to, biotin, radioactive labels and fluorescent labels.
  • the verb "to comprise” and its conjugations is used in its non-limiting sense to mean that items following the word are included, but items not specifically mentioned are not excluded.
  • the verb "to consist" may be replaced by "to consist essentially of", meaning that a vector or a nucleic acid construct or a nucleotide molecule, a host cell or a method as defined herein may comprise additional component(s) or additional step(s), respectively, beyond the ones specifically identified, said additional component(s) or additional step(s) not altering the unique characteristic of the invention.
  • reference to an element by the indefinite article “a” or “an” does not exclude the possibility that more than one of the element is present, unless the context clearly requires that there be one and only one of the elements.
  • the indefinite article “a” or “an” thus usually means “at least one".
  • Example 1 CHO cell lines transgenic for SEAP containing the UN1 sequence (SEQ ID NO:2) at a specific genomic location
  • SEAP human placental alkaline phosphatase
  • the secreted form of human placental alkaline phosphatase is a very stable reporter enzyme which is easily detectable in the cell medium of mammalian expression systems.
  • SEAP production was used to study the yield effect of introducing the UNic™ technology in a controlled high expression CHO cell line.
  • the Flp-In™ system was used to obtain transgenic CHO lines with the SEAP constructs integrated at a single specific genomic location by means of site-specific recombination.
  • Two polyclonal stable lines were generated that differed only in the presence of the untranslated sequence of ntp303 (UN1 sequence).
  • the integration site in the CHO Flp-In strain is known to give a high and stable mRNA expression level.
  • the pPNIC004 and pPNIC005 vectors used to transfect CHO Flp-In cells were constructed based on the expression vector pEF5/FRT/V5-dest (Invitrogen). Both vectors were constructed by insertion of the SEAP cDNA sequence, amplified by PCR using pSEAP2 (Invitrogen) as a template, between the constitutive EF1-alpha promoter and the bovine growth hormone poly-adenylation site.
  • pPNIC004 contains an additional UN1 sequence fused to the SEAP coding sequence by means of fusion PCR. The constructs were analyzed by sequencing before transfection.
  • the CHO-pPNIC004 and CHO-pPNIC005 lines were generated using the pPNIC004 and pPNIC005 plasmids and CHO Flp-In cell line, according to the recommendations provided by the manufacturer of the Flp-In system (Invitrogen). Cells were maintained by transferring a 1:10 dilution in fresh medium.
  • transgenic CHO lines were analyzed as follows. The cells were seeded at a density of 10⁵ cells per ml, using 2 ml per well in 6-well plates. Hygromycin was omitted from these plates to give similar growth for transgenic cells and the empty CHO Flp-In cell line that was used as a negative control. The three cell lines (CHO Flp-In, CHO-pPNIC004, and CHO-pPNIC005) were seeded. Two wells were used for measuring SEAP production in the supernatant and the same wells were used to determine the cell numbers.
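  • The seeding step above implies a fixed number of cells per well, which can be checked with a one-line calculation. The function name is hypothetical; the density and volume are the values stated in the text.

```python
def cells_seeded_per_well(density_cells_per_ml: float, volume_ml: float) -> float:
    """Number of cells placed in one well at the given density and volume."""
    if density_cells_per_ml < 0 or volume_ml < 0:
        raise ValueError("density and volume must be non-negative")
    return density_cells_per_ml * volume_ml


# Values from the text: 10^5 cells per ml, 2 ml per well.
seeded = cells_seeded_per_well(1e5, 2.0)
```

  • With these values each well starts with 2 x 10^5 cells.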
  • the SEAP concentration was determined using the Phospha-Light System (Applied Biosystems), using the manual provided with the kit. Luminescence was measured using a Victor3 plate reader (Perkin-Elmer).
  • the growth curves show that both transgenic cell lines grow with a rate similar to the CHO Flp-In cell line.
  • the presence of the SEAP gene or UN1 sequence did not have a significant effect on cell growth, indicating that differences in protein production are not a result of differences in biomass.
  • All cell lines reached a maximum cell density of 1.2 to 1.5 million cells per well of 10 cm² around two days after seeding. After one day the SEAP concentration still increased linearly in time.
  • two independent experiments showed a two-fold increased SEAP production for the pPNIC004 line (+UN1) compared to the pPNIC005 line (-UN1) at 48 hours after seeding.
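  • The comparison between the two lines can be sketched as a background-corrected fold change of the reporter signal, together with the number of population doublings implied by the cell counts. This is an illustrative calculation only: the function names are hypothetical, and any luminescence readings passed in would be the user's own measurements, since no raw readings are given in the text.

```python
import math


def population_doublings(cells_start: float, cells_end: float) -> float:
    """Number of doublings needed to grow from cells_start to cells_end."""
    if cells_start <= 0 or cells_end <= 0:
        raise ValueError("cell numbers must be positive")
    return math.log2(cells_end / cells_start)


def fold_change(test_signal: float, control_signal: float,
                background: float = 0.0) -> float:
    """Ratio of background-corrected reporter signals (test over control)."""
    corrected_control = control_signal - background
    if corrected_control <= 0:
        raise ValueError("control signal must exceed the background")
    return (test_signal - background) / corrected_control
```

  • For example, growth from 2 x 10^5 seeded cells to 1.2 million cells per well corresponds to about 2.6 doublings, and a fold change of 2.0 corresponds to the two-fold increase reported for the +UN1 line.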
  • Example 2 Increase of human protein production in transgenic human cell line
  • SEAP human placental alkaline phosphatase
  • the integration site in the HEK Flp-In strain is known to give a high and stable mRNA expression level. Clonal variation and variation related to different insertion sites and copy number are eliminated by the use of polyclonal isogenic lines. This enables a proper comparison between the constructs with SEQ ID NO:2 and SEQ ID NO:3 sequence.
  • the two SEAP expression lines were grown simultaneously and analyzed at different time points for cell growth and protein expression.
  • the pPNIC136 and pPNIC147 vectors used to transfect CHO Flp-In cells were constructed based on the expression vector pEF5/FRT/V5-dest (Invitrogen). Both vectors were constructed by insertion of the SEAP cDNA sequence, amplified by PCR using pSEAP2 (Invitrogen) as a template, between the constitutive human cytomegalovirus (CMV) promoter and the bovine growth hormone poly-adenylation site.
  • CMV human cytomegalo virus
  • the SEQ ID NO: 1 and SEQ ID NO: 3 sequences were purchased as synthetic DNA and inserted as a fusion with the SEAP coding sequence resulting in the pPNIC135 and pPNIC147 plasmids, respectively. The constructs were analyzed by sequencing before transfection.
  • the HEK-pPNIC136 and HEK-pPNIC147 lines were generated using the pPNIC136 and pPNIC147 plasmids and HEK Flp-In cell line, according to the recommendations provided by the manufacturer of the Flp-In system (Invitrogen).
  • Cells were maintained by transferring a 1:10 dilution in fresh medium. Briefly, cells were washed with PBS, detached with trypsin/EDTA solution, diluted in fresh selective DMEM medium (DMEM medium containing 10% fetal bovine serum and 50 microgram/ml Hygromycin B), and diluted to a final dilution of 1:10 in selective DMEM medium.
  • the transgenic HEK lines were analyzed as follows. The cells were seeded at a density of 10E5 cells per ml, using 2 ml per well in 6-well plates. Hygromycin was omitted from these plates to give similar growth for transgenic cells and the empty HEK Flp-In cell line that was used as a negative control. The three cell lines (HEK Flp-In, HEK-pPNIC136, and HEK-pPNIC147) were seeded. Two wells were used for measuring
  • SEAP activity assay The SEAP concentration was determined using the Phospha-Light System (Applied Biosystems), using the manual provided with the kit. Luminescence was measured using a Victor3 plate reader (Perkin-Elmer). Results
  • the supernatants of the pPNIC136 lines contained at least 50 percent more SEAP than the pPNIC147 lines at these time points, which is shown in Figure 2.
  • Example 3 Improvement of transient human IL-4 expression in human cell line by UN1.52 (SEQ ID NO:4)
  • the human IL-4 (hIL-4) protein is a cytokine with anti-inflammatory properties and a key regulator in humoral and adaptive immunity. Cells that express the protein secrete it into the culture medium. It is easily detected by means of a commercial ELISA kit.
  • hIL-4 production was used to study the yield effect of introducing the UN1.52 after transfection of HEK cells. Expression constructs were generated that differed only in the expression of UN1.52 messenger RNA. The two constructs were transfected in parallel to enable a proper comparison between the constructs with and without UN1.52 sequence.
  • the pPNIC144 and pPNIC145 expression vectors used to transfect HEK cells were constructed based on the expression vector pCMV6-neo (OriGene). Both vectors were constructed by insertion of the hIL-4 cDNA sequence, derived from the pCMV6-XL5mod_IL4_NM_000589 plasmid (OriGene).
  • the neomycin resistance cassette of the pCMV6-neo plasmid was replaced by the blasticidin resistance cassette derived from the pUB/Bsd plasmid (Invitrogen).
  • the resulting hIL-4 expression plasmid is pPNIC144.
  • the sequence for UN1.52 was purchased as synthetic DNA and inserted as a fusion with the hIL-4 coding sequence resulting in the pPNIC145 plasmid.
  • the constructs were analyzed by sequencing before transfection. Plasmid concentrations and purity were checked using a Nanodrop spectrophotometer (Thermo Scientific).
  • HEK cells were grown in DMEM/F12 medium containing 10% FBS at 37°C, 5% CO2. Cells were seeded in 6-well plates to reach a density of half a million cells per well at the day of transfection. Transfections were performed in triplicate reactions using Fugene-6 reagent (Roche) according to the manufacturer's instructions. Medium was replaced by fresh medium at 24, 48, and 72 hours post transfection and samples were collected for protein analysis. At each time point the cells of one well of each transfection were detached using trypsin/EDTA and used to count the number of cells by using a CASY Cell Counter as described by the manufacturer. hIL-4 yield
  • ELISA kit (eBioscience) was used to determine the concentration of hIL-4 in the medium samples. Each sample was assayed in triplicate wells and a dilution series of rhIL-4 (eBioscience) was used as a standard. The colorimetric ELISA assay was measured using a Victor3 plate reader (Perkin-Elmer).
  • HEK cells transfected with plasmids pPNIC144 and pPNIC145 secreted an easily detectable amount of hIL-4 into the culture medium.
  • the density of cells transfected with either pPNIC144 or pPNIC145 was not significantly different.
  • the amount of hIL-4 produced per cell increased between 24 and 48 hours after transfection, but remained constant between 48 and 72 hours after transfection.
  • the supernatants of the HEK cells transfected with pPNIC145 contained 6 to 7 times more hIL-4 than the cells transfected with pPNIC144 at all time points, which is shown in Figure 3 for 48 hours after transfection.
  • FIG. 3 HEK cells transiently expressing hIL-4 after transfection. Expression with or without UN1.52. The concentration of hIL-4 in the cell supernatant was determined at 48 hours after transfection.
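
The seeding arithmetic and the per-cell yield comparison described in the examples above can be sketched as follows. This is a minimal illustration in Python and not the analysis actually used for the examples; the function names are hypothetical, and any concentrations passed to them would have to come from the assay read-out.

```python
def cells_per_well(density_per_ml: float, volume_ml: float) -> float:
    """Number of cells seeded in one well (density x volume)."""
    return density_per_ml * volume_ml


def yield_per_cell(concentration_per_ml: float, volume_ml: float,
                   cell_count: float) -> float:
    """Secreted protein per cell: total amount in the well / number of cells."""
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return concentration_per_ml * volume_ml / cell_count


def fold_change(test_value: float, reference_value: float) -> float:
    """Ratio of a test construct to a reference construct."""
    if reference_value <= 0:
        raise ValueError("reference_value must be positive")
    return test_value / reference_value


# Seeding conditions stated in Example 1: 10^5 cells per ml, 2 ml per well.
seeded = cells_per_well(1e5, 2.0)
```

Normalising to the cell count before taking the ratio reflects the point made in the examples that differences in protein production should not be a result of differences in biomass.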


Abstract

The present invention pertains to a method of expressing a protein of interest, preferably a heterologous protein, in a mammalian cell, preferably a human cell. Furthermore, the invention relates to said cell with or without nucleic acid constructs according to the invention and to said nucleic acid construct.

Description

Regulation of the expression of a protein in a mammalian cell
Field of the invention
The invention relates to the regulation of the expression of a protein in a mammalian cell using a specific nucleic acid sequence.
Background of the invention
The translation process in eukaryotes is a complex series of steps that involve a wide array of protein translation factors. These factors function in conjunction with the ribosome and tRNAs to decode an mRNA, thereby generating the encoded polypeptide chain. The translation process can be divided into three distinct stages: (i) initiation: the assembly of the ribosomal subunits at the initiation (AUG) codon of an mRNA; (ii) elongation: tRNA-dependent decoding of the mRNA to form a polypeptide chain; (iii) termination: a stop codon (UAA, UAG or UGA) signals the release of the polypeptide chain from the ribosome and subsequently the ribosomal subunits dissociate from the mRNA. Each of these stages requires a specific class of translation factors: eukaryotic initiation, elongation and termination factors (eIF, eEF and eRF, respectively).
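The three stages can be illustrated with a toy scan of an mRNA string. This is only a sketch in Python under a simplifying assumption: initiation is modelled as the first AUG in the sequence, which ignores the sequence context and structural effects discussed in this background. The function name `scan_orf` is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_orf(mrna: str):
    """Return (codons, stop_codon) for the reading frame set by the first AUG.

    codons starts with AUG and excludes the stop codon; stop_codon is None
    when no in-frame stop codon is found. DNA input (T) is read as RNA (U).
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")          # initiation: locate the AUG codon
    if start == -1:
        return [], None
    codons = []
    for i in range(start, len(mrna) - 2, 3):   # elongation: codon by codon
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:               # termination: stop codon
            return codons, codon
        codons.append(codon)
    return codons, None
```

For example, `scan_orf("GGAUGGCCUAAGG")` returns `(["AUG", "GCC"], "UAA")`.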
Translation initiation is an important step in both global and mRNA-specific gene regulation and therefore constitutes the primary target for translational control. Global regulation of protein synthesis is generally achieved by the modification of eukaryotic initiation factors (eIFs), several of which are phosphoproteins, e.g., eIF4E and eIF2. Translational control of individual mRNAs often depends upon the structural properties of the transcript itself. These may include properties in the 5' untranslated region of the mRNA that can affect initiation either directly, for example by impeding 40S subunit binding or scanning, or indirectly by acting as receptors for a regulatory RNA-binding protein. The role of that sequence in the control of translation has been reviewed by Day and Tuite (1998, J. of Endocrinology 157:361-371).
Structural properties of the 5' end of an mRNA transcript that may affect translation initiation of that transcript include, but are not limited to, the presence of an m7G cap structure, the primary sequence context of the initiation codon, the presence of upstream AUGs, the stability and position of secondary structures, and/or the length of the 5' leader. The RNA secondary structure positioned between the cap structure and the AUG codon may typically be inhibitory to translation initiation (Kozak 1989, Molecular and Cellular Biology 9:5134-5142). For example, the inhibition can be by steric hindrance preventing the binding of the 43S preinitiation complex to the cap structure. In contrast, increasing the length of the 5' sequence may lead to a proportional increase in translational efficiency (Kozak 1991, Gene Expression 1:117-125), which may simply be due to increased loading of the 40S subunits on the longer 5' sequences. In addition, mRNA secondary structures can also have a regulatory role. For example, RNA structural elements can provide sites for the binding of regulatory proteins and RNA molecules and, by forming a stable structure, they typically impede binding or scanning of the 40S ribosomal subunit (Goossen et al. 1990, Goossen & Hentze 1992).
Several expression systems have so far been developed in attempts to improve the efficiency of existing expression systems in mammalian cells. There is still a need for alternative and preferably improved methods for regulating the expression of a protein of interest in mammalian cells.
Description of the invention
Definitions
Here below follow definitions of terms as used in the invention.
Operably linked
As used herein, the term "operably linked" refers to two or more nucleic acid sequence elements that are physically linked and are in a functional relationship with each other. For instance, a promoter is operably linked to a coding sequence if the promoter is able to initiate or regulate the transcription or expression of a coding sequence, in which case the coding sequence should be understood as being "under the control of" the promoter. Generally, when two nucleic acid sequences are operably linked, they will be in the same orientation and usually also in the same reading frame. They usually will be essentially contiguous, although this may not be required.
Promoter
As used herein, the term "promoter" refers to a nucleic acid fragment that functions to control the transcription of one or more genes, located upstream with respect to the direction of transcription of the transcription initiation site of the gene, and is related to the binding site identified by the presence of a binding site for DNA-dependent RNA polymerase, transcription initiation sites and any other DNA sequences, including, but not limited to transcription factor binding sites, repressor and activator protein binding sites, and any other sequences of nucleotides known to one skilled in the art to act directly or indirectly to regulate the amount of transcription from the promoter. Within the context of the invention, a promoter preferably ends at nucleotide -1 of the transcription start site (TSS).
Hybridising nucleic acid orthologs and hybridization
Any nucleotide molecule capable of hybridising to a nucleotide molecule represented by SEQ ID NO. 1 is defined as being part of the UN1 (SEQ ID NO:2) of the invention. Any nucleotide molecule capable of hybridising to SEQ ID NO:2 or to regions of SEQ ID NO:1 as later identified herein or to SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9 is also encompassed by the present invention. Hybridisation conditions are preferably stringent. Stringent hybridisation conditions are herein defined as conditions that allow a nucleic acid sequence of at least 25, preferably 50, 75 or 100, and most preferably 150 or more nucleotides, to hybridise at a temperature of about 65°C in a solution comprising about 1 M salt, preferably 6 x SSC or any other solution having a comparable ionic strength, and washing at 65°C in a solution comprising about 0.1 M salt, or less, preferably 0.2 x SSC or any other solution having a comparable ionic strength. Preferably, the hybridisation is performed overnight, i.e. at least for 10 hours and preferably washing is performed for at least one hour with at least two changes of the washing solution. These conditions will usually allow the specific hybridisation of sequences having about 90% or more sequence identity. Moderate hybridization conditions are herein defined as conditions that allow a nucleic acid sequence of at least 50, preferably 150 or more nucleotides, to hybridise at a temperature of about 45°C in a solution comprising about 1 M salt, preferably 6 x SSC or any other solution having a comparable ionic strength, and washing at room temperature in a solution comprising about 1 M salt, preferably 6 x SSC or any other solution having a comparable ionic strength. Preferably, the hybridisation is performed overnight, i.e. at least for 10 hours, and preferably washing is performed for at least one hour with at least two changes of the washing solution. These conditions will usually allow the specific hybridisation of sequences having up to 50% sequence identity. The person skilled in the art will be able to modify these hybridisation conditions in order to specifically identify sequences varying in identity between 50% and 90%.
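The correspondence between the SSC strengths and the salt concentrations quoted for these conditions can be checked with a short calculation. The sketch below is in Python and assumes the standard SSC recipe, in which 1 x SSC is 0.150 M sodium chloride plus 0.015 M trisodium citrate; that recipe and the names used are assumptions of this illustration and are not stated in the text.

```python
# Standard 1 x SSC composition (assumption, see lead-in).
NACL_M_PER_X = 0.150      # mol/l sodium chloride in 1 x SSC
CITRATE_M_PER_X = 0.015   # mol/l trisodium citrate in 1 x SSC


def sodium_molarity(ssc_strength: float) -> float:
    """Approximate Na+ concentration (mol/l) of an n x SSC solution.

    Each trisodium citrate contributes three sodium ions.
    """
    return ssc_strength * (NACL_M_PER_X + 3 * CITRATE_M_PER_X)


# The two regimes defined in the text, as plain data.
CONDITIONS = {
    "stringent": {"hyb_temp_c": 65, "hyb_ssc": 6.0, "wash_ssc": 0.2},
    "moderate": {"hyb_temp_c": 45, "hyb_ssc": 6.0, "wash_ssc": 6.0},
}

hyb_salt = sodium_molarity(CONDITIONS["stringent"]["hyb_ssc"])    # about 1.17 M
wash_salt = sodium_molarity(CONDITIONS["stringent"]["wash_ssc"])  # about 0.04 M
```

Under this assumption, 6 x SSC comes out at roughly 1.2 M sodium, consistent with "about 1 M salt", and 0.2 x SSC at roughly 0.04 M, consistent with "about 0.1 M salt, or less".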
Homologous
The term "homologous" when used to indicate the relation between a given (recombinant) nucleic acid or polypeptide molecule and a given host organism or host cell, is understood to mean that in nature the nucleic acid or polypeptide molecule is produced by a host cell or organisms of the same species, preferably of the same variety or strain. If homologous to a host cell, a nucleic acid sequence encoding a polypeptide will typically be operably linked to another promoter sequence or, if applicable, another secretory signal sequence and/or terminator sequence than in its natural environment.
When used to indicate the relatedness of two nucleic acid sequences the term "homologous" means that one single-stranded nucleic acid sequence may hybridise to a complementary single-stranded nucleic acid sequence. The degree of hybridisation may depend on a number of factors including the extent of identity between the sequences and the hybridisation conditions such as temperature and salt concentration as discussed later. Preferably the region of identity is greater than 5 bp, more preferably the region of identity is greater than 10 bp.
Heterologous
The term "heterologous" when used with respect to a nucleic acid or polypeptide molecule refers to a nucleic acid or polypeptide from a foreign cell which does not occur naturally as part of the organism, cell, genome or DNA or RNA sequence in which it is present, or which is found in a cell or location or locations in the genome or DNA or RNA sequence that differ from that in which it is found in nature. Heterologous nucleic acids or proteins are not endogenous to the cell into which they are introduced, but have been obtained from another cell or synthetically or recombinantly produced. Generally, though not necessarily, such nucleic acids encode proteins that are not normally produced by the cell in which the DNA is transcribed or expressed, similarly exogenous RNA codes for proteins not normally expressed in the cell in which the exogenous RNA is present. Furthermore, it is known that a heterologous protein or polypeptide can be composed of homologous elements arranged in an order and/or orientation not normally found in the host organism, tissue or cell thereof in which it is transferred, i.e. the nucleotide sequence encoding said protein or polypeptide originates from the same species but is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. Heterologous nucleic acids and proteins may also be referred to as foreign nucleic acids or proteins. Any nucleic acid or protein that one of skill in the art would recognise as heterologous or foreign to the cell in which it is expressed is herein encompassed by the term heterologous nucleic acid or protein. The term heterologous also applies to non-natural combinations of nucleic acid or amino acid sequences, i.e. combinations where at least two of the combined sequences are foreign with respect to each other.
Endogenous
The term "endogenous" when used with respect to a nucleic acid or polypeptide molecule refers to a nucleic acid or polypeptide as natively expressed in a cell, preferably a mammalian cell as defined in the invention.
Sequence identity
"Sequence identity", as known in the art, is a relationship between two or more amino acid (polypeptide or protein) sequences or two or more nucleic acid (polynucleotide or nucleotide) sequences, as determined by comparing the sequences. In the art, the percentage of "identity" indicates the degree of sequence relatedness between amino acid or nucleic acid sequences as determined by the match between strings of such sequences. Preferably, the percentage of identity is determined by comparing the whole SEQ ID NO as identified herein. However, part of a sequence may also be used. Two amino acid sequences are considered "similar" if the polypeptides only differ in conserved amino acid substitutions. In determining the degree of amino acid similarity, the skilled person takes into account "conservative" amino acid substitutions. Conservative amino acid substitutions refer to the interchange of amino acids having similar side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulphur-containing side chains is cysteine and methionine. Preferred conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine. Substitutional variants of the amino acid sequence disclosed herein are those in which at least one residue in the disclosed sequences has been removed and a different residue inserted in its place. Preferably, the amino acid change is conservative.
Preferred conservative substitutions for each of the naturally occurring amino acids are as follows: Ala to ser; Arg to lys; Asn to gln or his; Asp to glu; Cys to ser or ala; Gln to asn; Glu to asp; Gly to pro; His to asn or gln; Ile to leu or val; Leu to ile or val; Lys to arg; Asn to gln or glu; Met to leu or ile; Phe to met, leu or tyr; Ser to thr; Thr to ser; Trp to tyr; Tyr to trp or phe; and, Val to ile or leu. "Identity" and "similarity" can be readily calculated by known methods, including but not limited to those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48:1073 (1988).
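The side-chain groups listed above lend themselves to a simple lookup. The following Python sketch treats a substitution as conservative when both residues fall in the same listed group, and computes a percent similarity for two sequences that are assumed to be already aligned, of equal length and without gaps. The function names are hypothetical, and this is an illustration of the definition rather than one of the cited methods.

```python
# Side-chain groups as listed in the definition (one-letter codes).
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulphur-containing": set("CM"),
}


def is_conservative(a: str, b: str) -> bool:
    """True if a -> b is a substitution within one side-chain group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in g and b in g for g in SIDE_CHAIN_GROUPS.values())


def percent_similarity(seq1: str, seq2: str) -> float:
    """Percentage of positions that are identical or conservatively substituted.

    Assumes pre-aligned, ungapped sequences of equal, non-zero length.
    """
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = zip(seq1.upper(), seq2.upper())
    similar = sum(x == y or is_conservative(x, y) for x, y in pairs)
    return 100.0 * similar / len(seq1)
```

For instance, `percent_similarity("AKST", "VRTD")` counts three conservative positions out of four and gives 75.0.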
Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Preferred computer program methods to determine identity and similarity between two sequences include e.g. the GCG program package (Devereux, J., et al, Nucleic Acids Research 12 (1):387 (1984)), BestFit and FASTA (Altschul, S. F. et al., J. Mol. Biol. 215:403-410 (1990)). The BLAST 2.0 family of programs which can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against nucleotide database sequences; and TBLASTX for nucleotide query sequences against nucleotide database sequences. The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, MD 20894; Altschul, S., et al., J. Mol. Biol. 215:403-410 (1990)). The well-known Smith Waterman algorithm may also be used to determine identity.
Preferred parameters for polypeptide sequence comparison include the following: Algorithm: Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970); Comparison matrix: BLOSUM62 from Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA. 89:10915-10919 (1992); Gap Penalty: 12; and Gap Length Penalty: 4. A program useful with these parameters is publicly available as the "Gap" program from Genetics Computer Group, located in Madison, WI. The aforementioned parameters are the default parameters for amino acid comparisons (along with no penalty for end gaps).
Preferred parameters for nucleic acid comparison include the following: Algorithm: Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970); Comparison matrix: matches=+10, mismatch=0; Gap Penalty: 50; Gap Length Penalty: 3. Available as the Gap program from Genetics Computer Group, located in Madison, Wis. Given above are the default parameters for nucleic acid comparisons. Another preferred method to determine sequence similarity and identity is by using the Needleman-Wunsch algorithm (Needleman, S. B. and Wunsch, C. D. (1970) J. Mol. Biol. 48, 443-453; Kruskal, J. B. (1983) An overview of sequence comparison. In D. Sankoff and J. B. Kruskal (ed.), Time warps, string edits and macromolecules: the theory and practice of sequence comparison, pp. 1-44, Addison Wesley). The following website could be used: http://www.ebi.ac.uk/Tools/emboss/align/index.html
Definitions of the parameters used in this algorithm are found at the following website: http://emboss.sourceforge.net/docs/themes/AlignFormats.html#id. Preferably, using this algorithm, Gap_penalty is 10.0 and Extend_penalty is 0.5.
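As an illustration of how a global alignment leads to a percentage of identity, the Python sketch below implements a basic Needleman-Wunsch alignment. It is deliberately simplified: it uses the match=+10, mismatch=0 scores given above for nucleic acids, but a single linear gap penalty in place of the separate gap and gap-extension penalties, and the default value of -5 is a hypothetical choice for this example. It is not a substitute for the Gap or EMBOSS programs named above, and the identity is computed over the full alignment length including gap columns, which is one of several possible conventions.

```python
def needleman_wunsch(a: str, b: str, match: int = 10, mismatch: int = 0,
                     gap: int = -5):
    """Global alignment with a linear gap penalty; returns two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner, preferring diagonal moves.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    top, bottom = needleman_wunsch(a, b)
    if not top:
        raise ValueError("cannot compare two empty sequences")
    matches = sum(x == y and x != "-" for x, y in zip(top, bottom))
    return 100.0 * matches / len(top)
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `("ACGT", "A-GT")`, and `percent_identity("ACGT", "AGT")` is 75.0. A threshold such as a minimum percentage of identity with a reference sequence could then be tested by comparing this value with the required percentage.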
Nucleic acid
The phrase "nucleic acid" as used herein refers to a naturally occurring or synthetic oligonucleotide or polynucleotide, whether DNA or RNA or DNA-RNA hybrid, single-stranded or double-stranded, sense or antisense, which is capable of hybridization to a complementary nucleic acid by Watson-Crick base-pairing. Nucleic acids of the invention can also include nucleotide analogs (e.g., BrdU), and non-phosphodiester internucleoside linkages (e.g., peptide nucleic acid (PNA) or thiodiester linkages). In particular, nucleic acids can include, without limitation, DNA, RNA, cDNA, gDNA, ssDNA, dsDNA, ssRNA, dsRNA, non coding RNAs or any combination thereof.
Primer
"Primers" are a subset of probes which are capable of supporting some type of enzymatic manipulation and which can hybridize with a target nucleic acid such that the enzymatic manipulation can occur. A primer can be made from any combination of nucleotides or nucleotide derivatives or analogs available in the art which do not interfere with the enzymatic manipulation.
Probe
"Probes" are molecules capable of interacting with a target nucleic acid, typically in a sequence specific manner, for example through hybridization. The hybridization of nucleic acids is well understood in the art and discussed herein. Typically a probe can be made from any combination of nucleotides or nucleotide derivatives or analogs available in the art.
Messenger RNA
"Messenger RNA (mRNA)" as used herein refers to a temporary complementary copy of RNA of the antisense strand (anticoding strand or template) of protein coding DNA. In eukaryotes, especially mammals, it is usually transcribed as a relatively long pre-mRNA (also called primary transcript or hnRNA) which is then processed, still within the nucleus, to remove introns. Further post-transcriptional modifications can also occur. The mature mRNA is then transported into the cytoplasm where it is translated into protein on the ribosome. Furthermore, an mRNA generally comprises on both the 5' and the 3' side sequences flanking the region that specifies the protein sequence. These regions may be untranslated.
Antisense nucleic acid
"Antisense nucleic acid" as used herein refers to a RNA, DNA or PNA molecule that is complementary to all or part of a target primary transcript or mRNA and that blocks the translation of a target nucleotide sequence.
Polypeptide
"Polypeptide" as used herein refers to any peptide, oligopeptide, polypeptide, gene product, expression product, or protein. A polypeptide is comprised of consecutive amino acids. The term "polypeptide" encompasses naturally occurring or synthetic molecules.
In addition, as used herein, the term "polypeptide" refers to amino acids joined to each other by peptide bonds or modified peptide bonds, e.g., peptide isosteres, etc. and may contain modified amino acids other than the 20 gene-encoded amino acids. The polypeptides can be modified by either natural processes, such as post-translational processing, or by chemical modification techniques which are well known in the art. Modifications can occur anywhere in the polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. The same type of modification can be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide can have many types of modifications. Modifications include, without limitation, acetylation, acylation, ADP-ribosylation, amidation, covalent cross-linking or cyclization, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of a phosphatidylinositol, disulfide bond formation, demethylation, formation of cysteine or pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, and transfer-RNA mediated addition of amino acids to protein such as arginylation. (See Proteins - Structure and Molecular Properties 2nd Ed., T.E. Creighton, W.H. Freeman and Company, New York (1993); Posttranslational Covalent Modification of Proteins, B.C. Johnson, Ed., Academic Press, New York, pp. 1-12 (1983)).
Amino acid sequence
As used herein, the term "amino acid sequence" refers to a list of abbreviations, letters, characters or words representing amino acid residues. The amino acid abbreviations used herein are conventional one letter codes for the amino acids and are expressed as follows: A, alanine; N, asparagine; C, cysteine; D, aspartic acid; E, glutamate, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine; Z, glutamine or glutamic acid.
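The one-letter codes listed above can be held in a simple lookup table. The Python sketch below contains exactly the codes given in this definition; the function name `expand_sequence` is hypothetical and the example is only meant to show how such a sequence string is read.

```python
# One-letter codes exactly as listed in the definition above.
ONE_LETTER_CODES = {
    "A": "alanine", "N": "asparagine", "C": "cysteine", "D": "aspartic acid",
    "E": "glutamic acid", "F": "phenylalanine", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "K": "lysine", "L": "leucine",
    "M": "methionine", "P": "proline", "Q": "glutamine", "R": "arginine",
    "S": "serine", "T": "threonine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamine or glutamic acid",
}


def expand_sequence(sequence: str) -> list:
    """Translate a one-letter amino acid sequence into residue names."""
    names = []
    for letter in sequence.upper():
        if letter not in ONE_LETTER_CODES:
            raise ValueError(f"code not in the listed abbreviations: {letter!r}")
        names.append(ONE_LETTER_CODES[letter])
    return names
```

For example, `expand_sequence("MKZ")` gives `["methionine", "lysine", "glutamine or glutamic acid"]`.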
Or
The word "or" as used herein means any one member of a particular list and also includes any combination of members of that list.
Vector or plasmid
The term "vector" or "plasmid" refers to a nucleic acid sequence capable of transporting into a cell another nucleic acid to which the vector sequence has been linked. The term "expression vector" includes any vector, (e.g., a plasmid, cosmid or phage chromosome) containing a gene construct in a form suitable for expression by a cell (e.g., linked to a transcriptional control element). "Plasmid" and "vector" are used interchangeably, as a plasmid is a commonly used form of vector. Moreover, the invention is intended to include other vectors which serve equivalent functions.
Sequence of interest, gene of interest
The term "sequence of interest" or "gene of interest" can mean a nucleic acid sequence (e.g., a therapeutic gene), that is partly or entirely heterologous, i.e., foreign, to a cell into which it is introduced. The term "polypeptide of interest", "peptide of interest", or "protein of interest" can mean a peptide sequence (e.g., a therapeutic peptide), that is partly or entirely heterologous, i.e., foreign, to a cell into which it is introduced. The term "sequence of interest" or "gene of interest" can also mean a nucleic acid sequence, that is partly or entirely homologous to an endogenous gene of the cell into which it is introduced, but which is designed to be inserted into the genome of the cell in such a way as to alter the genome (e.g., it is inserted at a location which differs from that of the natural gene or its insertion results in "a knockout"). For example, a sequence of interest can be cDNA, DNA, or mRNA.
The term "sequence of interest" or "gene of interest" can also mean a nucleic acid sequence, that is partly or entirely complementary to an endogenous gene of the cell into which it is introduced. For example, the sequence of interest can be micro RNA, shRNA, or siRNA.
A "sequence of interest" or "gene of interest" can also include one or more transcriptional regulatory sequences and any other nucleic acid, such as introns, that may be necessary for optimal expression of a selected nucleic acid.
Polypeptide of interest
The term "polypeptide of interest", "peptide of interest, or "protein of interest" can also mean a peptide sequence that is partly or entirely homologous to an endogenous peptide of the cell into which it is introduced. The term "polypeptide of interest", "peptide of interest, or "protein of interest" can also mean a peptide or polypeptide sequence (e.g., a therapeutic protein), that is expressed from a sequence of interest or gene of interest.
Transformation/transfection The terms "transformation" and "transfection" mean the introduction of a nucleic acid, e.g., an expression vector, into a recipient cell possibly including introduction of a nucleic acid to the chromosomal DNA of said cell.
Isolated polypeptide/purified polypeptide
By "isolated polypeptide" or "purified polypeptide" is meant a polypeptide (or a fragment thereof) that is substantially free from the materials with which the polypeptide is normally associated in nature. The polypeptides of the invention, or fragments thereof, can be obtained, for example, by extraction from a natural source (for example, a mammalian cell), by expression of a recombinant nucleic acid encoding the polypeptide (for example, in a cell or in a cell-free translation system), or by chemically synthesizing the polypeptide. In addition, polypeptide fragments may be obtained by any of these methods, or by cleaving full length polypeptides.
Isolated nucleic acid/purified nucleic acid
By "isolated nucleic acid" or "purified nucleic acid" is meant DNA that is free of the genes that, in the naturally-occurring genome of the organism from which the DNA of the invention is derived, flank the gene. The term therefore includes, for example, a recombinant DNA which is incorporated into a vector, such as an autonomously replicating plasmid or virus; or incorporated into the genomic DNA of a prokaryote or eukaryote (e.g., a transgene); or which exists as a separate molecule (for example, a cDNA or a genomic or cDNA fragment produced by PCR, restriction endonuclease digestion, or chemical or in vitro synthesis). It also includes a recombinant DNA which is part of a hybrid gene encoding additional polypeptide sequence. The term "isolated nucleic acid" also refers to RNA, e.g., an mRNA molecule that is encoded by an isolated DNA molecule, or that is chemically synthesized, or that is separated or substantially free from at least some cellular components, for example, other types of RNA molecules or polypeptide molecules.
Detailed description of the invention
As a first aspect, the invention relates to a method for expressing a protein or polypeptide of interest in a mammalian cell comprising the steps of: a) providing a nucleic acid construct comprising a first nucleotide sequence that has at least 34% nucleotide sequence identity with the nucleotide sequence of SEQ ID NO:1, operably linked to a second nucleotide sequence encoding a protein or polypeptide of interest and further operably linked to a heterologous promoter, b) contacting a mammalian cell with said nucleic acid construct to obtain a transformed mammalian cell, and c) subjecting said transformed mammalian cell to conditions leading to expression of the protein or polypeptide of interest, and optionally recovering said protein or polypeptide.
According to the invention a nucleic acid construct comprises a first nucleotide sequence that has at least 34% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:1 (using the Needleman-Wunsch algorithm of Needle; gap penalties: existence 10, extension 0.5). In a first preferred embodiment, identity is calculated over the whole length of SEQ ID NO:1 or SEQ ID NO:2. In a second preferred embodiment, identity is calculated by comparison to nucleotides 4-76 or 27-50 or 100-151 or 104-151 of SEQ ID NO:1. Therefore, throughout the application text, when identity is defined by comparison to SEQ ID NO:1, SEQ ID NO:1 may be replaced by SEQ ID NO:2 or by a region of SEQ ID NO:1 as identified herein. A nucleic acid construct according to the invention preferably comprises a first nucleotide sequence that has at least 34%, 35%, 36%, more preferably at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99% nucleotide sequence identity to the whole length of the nucleotide sequence of SEQ ID NO:1 or to each of the specified regions within SEQ ID NO:1 as identified above. In a particularly preferred embodiment of the invention a nucleic acid construct comprises or consists of a first nucleotide sequence that has 100% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:1.
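By way of illustration only, an identity percentage of the kind referred to above could be computed along the following lines. This is a minimal sketch of a Needleman-Wunsch global alignment with affine gap penalties (existence 10, extension 0.5). The match and mismatch scores (+5 and -4), the penalising of end gaps like internal gaps, and the convention that identity equals identical positions divided by alignment length are assumptions of this sketch and are not taken from the application. The function names are hypothetical and the sequences used with it are toy sequences, not SEQ ID NO:1, so values reported by a particular alignment program may differ.

```python
NEG = float("-inf")

def global_align(a, b, match=5.0, mismatch=-4.0, gap_open=10.0, gap_extend=0.5):
    """Global alignment with affine gaps; a gap of length k costs open + (k-1)*extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)

    def best(candidates):
        # Name of the state with the highest score.
        return max(candidates, key=lambda kv: kv[1])[0]

    # Traceback from the bottom-right corner, re-deriving each predecessor state.
    out_a, out_b = [], []
    i, j = n, m
    state = best([("M", M[i][j]), ("X", X[i][j]), ("Y", Y[i][j])])
    while i > 0 or j > 0:
        if state == "M":
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            state = best([("M", M[i][j]), ("X", X[i][j]), ("Y", Y[i][j])])
        elif state == "X":
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
            state = best([("M", M[i][j] - gap_open), ("X", X[i][j] - gap_extend),
                          ("Y", Y[i][j] - gap_open)])
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
            state = best([("M", M[i][j] - gap_open), ("Y", Y[i][j] - gap_extend),
                          ("X", X[i][j] - gap_open)])
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(a, b, **scoring):
    """Identical aligned positions as a percentage of the alignment length."""
    al_a, al_b = global_align(a, b, **scoring)
    if not al_a:
        return 0.0
    same = sum(1 for x, y in zip(al_a, al_b) if x == y and x != "-")
    return 100.0 * same / len(al_a)
```

Under these assumptions, a candidate first nucleotide sequence would be compared with the reference sequence (or with one of the specified regions) and the returned percentage checked against the relevant minimum, for example 34%.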
In a further embodiment the invention relates to a method for expressing a protein of interest in a mammalian cell comprising the steps of: a) providing a nucleic acid construct comprising a first nucleotide sequence comprising a nucleotide sequence that has at least 46% nucleotide sequence identity to nucleotides 104-151 of the nucleotide sequence of SEQ ID NO:1 or a nucleotide sequence that has at least 51% nucleotide sequence identity to nucleotides 4-76 of the nucleotide sequence of SEQ ID NO:1 or a combination thereof, operably linked to a second nucleotide sequence encoding a protein or polypeptide of interest and further operably linked to a heterologous promoter, b) contacting a mammalian cell with said nucleic acid construct to obtain a transformed mammalian cell, and c) subjecting said transformed mammalian cell to conditions leading to expression of the protein or polypeptide of interest, and optionally recovering said protein or polypeptide.
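The region-based thresholds of this embodiment can be made concrete with a short sketch. It assumes that the nucleotide ranges are 1-based and inclusive, so that nucleotides 104-151 span 48 positions and nucleotides 4-76 span 73 positions. The helper names and the placeholder sequence are hypothetical, and the identity percentages are taken as inputs computed elsewhere.

```python
def region(seq, start, end):
    """Return nucleotides start..end of seq (1-based, inclusive coordinates)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates outside sequence")
    return seq[start - 1:end]

def region_length(start, end):
    """Number of nucleotides in a 1-based, inclusive range."""
    return end - start + 1

def meets_thresholds(identity_104_151=None, identity_4_76=None):
    """True if at least one region identity (in percent) reaches its minimum.

    Minimums as stated in the text: 46% for nucleotides 104-151 and
    51% for nucleotides 4-76; either one, or both, may be satisfied.
    """
    ok_104_151 = identity_104_151 is not None and identity_104_151 >= 46.0
    ok_4_76 = identity_4_76 is not None and identity_4_76 >= 51.0
    return ok_104_151 or ok_4_76
```

For example, `region_length(104, 151)` gives 48 and `meets_thresholds(identity_104_151=46.0)` is true, whereas identities of 45.9% and 50.0% for the two regions would not satisfy either minimum.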
Preferably, said first nucleotide sequence comprises a nucleotide sequence that has at least 48%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% and particularly 100% nucleotide sequence identity to nucleotides 100-151 or 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has at least 55%, more preferably at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% and particularly 100% nucleotide sequence identity to nucleotides 4 - 76 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof. In another embodiment of the above mentioned method a first nucleotide sequence comprises a nucleotide sequence that has at least 46%, 48%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% and particularly 100% nucleotide sequence identity to nucleotides 100-151 or 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has 100% nucleotide sequence identity to nucleotides 27 - 50 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof. The nucleotides 27 - 50 of the nucleotide sequence of SEQ ID No. 1 consist of a repeat of 8 GAA-units. Preferably, said first nucleotide sequence comprises a nucleotide sequence that consists of 7, 6, 5, 4 or 3 GAA units.
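The GAA-unit counts mentioned above can be checked mechanically. The following is a minimal sketch with hypothetical function names. It relies only on the statement that nucleotides 27-50 (24 positions) consist of 8 GAA units, and it uses toy input strings rather than any sequence of the sequence listing.

```python
import re

def longest_gaa_run(seq):
    """Number of GAA units in the longest uninterrupted GAA repeat in seq."""
    runs = re.findall(r"(?:GAA)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def is_gaa_repeat(seq, units):
    """True if seq consists of exactly the given number of GAA units and nothing else."""
    return seq.upper() == "GAA" * units
```

A sequence consisting of 7, 6, 5, 4 or 3 GAA units would satisfy `is_gaa_repeat(seq, units)` for the corresponding value of `units`.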
In another embodiment of the above mentioned method a first nucleotide sequence comprises or consists of or has at least 34% nucleotide sequence identity to the SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9. Accordingly, the invention also encompasses a host cell, preferably a mammalian cell comprising said first nucleotide sequence. Accordingly, the invention also encompasses a nucleic acid construct comprising one of these sequences. Accordingly, the invention also encompasses each of these nucleotide sequences.
We clearly demonstrated that at least the new sequences SEQ ID NO:3 and SEQ ID NO:4 (see examples) were quite attractive for use in comparison with SEQ ID NO:1 or SEQ ID NO:2. We expect that each of these new sequences SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, or sequences comprising one of them or having at least 34% identity to one of them, will also be attractive for use in a method of the invention.
Throughout the application, the invention encompasses a nucleotide sequence defined by identity by comparison to SEQ ID NO:1, or SEQ ID NO:2 or specific regions of SEQ ID NO:1 as identified herein. The skilled person will understand that any sequence derived from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, or from SEQ ID NO:9 and having the required identity with SEQ ID NO:1, SEQ ID NO:2 or with a specific region of SEQ ID NO:1 as identified herein is considered to be encompassed by the present invention. In a preferred embodiment, a sequence is derived from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, or from SEQ ID NO:9 by substituting, deleting and/or adding one, two, three, four or more nucleotides as present in the original sequence. Preferably, the functionality of any sequence is checked using a control expression system as identified in example 1. A sequence is said to be functional when the expression of a given protein of interest in a system as identified in example 1 is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 150%, 200%, 300% or more after a given period of time and by comparison to a control cell not having the sequence in question.
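The functionality criterion can be expressed as a simple calculation. The sketch below assumes that expression is available as a single positive measurement for the test cell and for the control cell, in the same arbitrary units and at the same time point; the function names and the default 5% minimum (the lowest value listed above) are illustrative choices.

```python
def percent_increase(test_expression, control_expression):
    """Increase of test over control, as a percentage of the control value."""
    if control_expression <= 0:
        raise ValueError("control expression must be positive")
    return 100.0 * (test_expression - control_expression) / control_expression

def is_functional(test_expression, control_expression, min_increase=5.0):
    """True if expression is increased by at least min_increase percent over the control."""
    return percent_increase(test_expression, control_expression) >= min_increase
```

For instance, a test measurement of 150 against a control of 100 corresponds to an increase of 50%, which would satisfy any of the listed minimums up to and including 50%.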
A nucleotide sequence according to the invention can be present in the form of RNA or in the form of DNA including genomic DNA, i.e. DNA including the introns, cDNA or synthetic DNA. The DNA may be double-stranded or single-stranded and if single-stranded may be the coding strand or non-coding (anti-sense) strand. DNA or RNA with a backbone modified for stability or for other reasons are a further part of the invention. Moreover, DNA or RNA comprising unusual bases, such as inosine, or modified bases, such as tritylated bases are also a part of the invention. A nucleotide sequence may also be an allelic variant of the nucleotide sequence according to the invention. If desired, a nucleotide sequence can be prepared or altered synthetically so the known codon preferences of the intended expression host can advantageously be used. It has been shown for instance that the codon preferences and GC content preferences of monocotyledons and dicotyledons differ (Murray et al., Nucl. Acids Res. 17: 477-498 (1989)).
In a preferred embodiment of a method according to the invention, a nucleic acid construct comprises a second nucleotide sequence encoding a protein or polypeptide of interest that is operably linked to any one of the first nucleotide sequences as defined above. A protein or polypeptide of interest can be a homologous, an endogenous or a heterologous protein or polypeptide. Therefore it is to be understood that the invention protects the production of homologous, heterologous or endogenous protein. The invention is not limited to a specific kind of protein to be produced. Any protein is preferably produced using a method of the invention. A second nucleotide sequence encoding a homologous, endogenous or heterologous protein or polypeptide may be derived in whole or in part from any source known to the art, including a bacterial or viral genome or episome, eukaryotic nuclear or plasmid DNA, cDNA or chemically synthesised DNA. Endogenous, homologous and heterologous are preferably defined by reference to the cell or host cell used. A second nucleotide sequence may constitute an uninterrupted coding region or it may include one or more introns bounded by appropriate splice junctions; it can further be composed of segments derived from different sources, naturally occurring or synthetic. A second nucleotide sequence encoding a protein or polypeptide of interest according to a method of the invention is preferably a full-length nucleotide sequence, but can also be a functionally active part or other part of said full-length nucleotide sequence. A protein or polypeptide of interest may be a protein or polypeptide conferring, for instance, disease resistance, immunity, an improved intake of nutrients, minerals, or a modified metabolism in a mammalian cell. In another embodiment, a mammalian cell is used for overproduction of the protein or polypeptide of interest.
A second nucleotide sequence encoding a protein or polypeptide of interest may also comprise signal sequences directing the protein or polypeptide of interest when expressed to a specific location in a cell or tissue. Such signal sequences include, but are not limited to, sequences directing the protein or polypeptide of interest to organelles within a mammalian cell or outside of a mammalian cell. Furthermore, a second nucleotide sequence encoding a protein or polypeptide of interest can also comprise sequences which facilitate protein purification and protein detection by for instance Western blotting and ELISA (e.g. c-myc or polyhistidine sequences).
A protein or polypeptide of interest may have industrial or medicinal (pharmaceutical) applications. Examples of proteins or polypeptides with industrial applications include enzymes such as e.g. lipases (e.g. used in the detergent industry), proteases (used inter alia in the detergent industry, in brewing and the like), cell wall degrading enzymes (such as cellulases, pectinases, β-1,3/4- and β-1,6-glucanases, rhamnogalacturonases, mannanases, xylanases, pullulanases, galactanases, esterases and the like, used in fruit processing, wine making and the like or in feed), phytases, phospholipases, glycosidases (such as amylases, β-glucosidases, arabinofuranosidases, rhamnosidases, apiosidases and the like), dairy enzymes (e.g. chymosin). Mammalian, and preferably human, proteins or polypeptides and/or enzymes with therapeutic, cosmetic or diagnostic applications include, but are not limited to, insulin, human serum albumin (HSA), lactoferrin, hemoglobin α and β, tissue plasminogen activator (tPA), erythropoietin (EPO), tumor necrosis factors (TNF), BMP (Bone Morphogenic Protein), growth factors (G-CSF, GM-CSF, M-CSF, PDGF, EGF, and the like), peptide hormones (e.g. calcitonin, somatomedin, somatotropin, growth hormones, follicle stimulating hormone (FSH)), interleukins (IL-x), preferably IL-4, and interferons (IFN-γ). Also included are bacterial and viral antigens, e.g. for use as vaccines, including e.g. heat-labile toxin B-subunit, cholera toxin B-subunit, envelope surface protein Hepatitis B virus, capsid protein Norwalk virus, glycoprotein B Human cytomegalovirus, glycoprotein S, interferon, and transmissible gastroenteritis corona virus receptors and the like. Further included are genes coding for mutants or analogues of the said proteins. In an embodiment of the invention, a nucleic acid construct further comprises a promoter for control and initiation of transcription of a second nucleotide sequence.
A promoter preferably is capable of causing expression of a second nucleotide sequence in a host cell of choice. Said promoter, e.g. homologous or heterologous for a mammalian cell and/or for a nucleotide sequence, is operably linked to any one of the nucleotide sequences mentioned above. In a preferred embodiment of the invention, a promoter is a promoter capable of initiating transcription in a mammalian cell. More preferably, such a promoter is a mammalian promoter. A mammalian promoter as used herein includes tissue-specific, tissue-preferred, cell-type-specific, inducible and constitutive promoters. Tissue-specific promoters are promoters which initiate transcription only in certain tissues and refer to a sequence of DNA that provides recognition signals for RNA polymerase and/or other factors required for transcription to begin, and/or for controlling expression of the coding sequence precisely within certain tissues or within certain cells of that tissue. Expression in a tissue-specific manner may be only in individual tissues or in combinations of tissues. Furthermore, promoters as used herein can include but are not limited to promoters that originate from the host cell that the constructs are introduced to. Promoters that may be used in a mammalian cell can include promoters such as metallothionein IIA promoter (mouse), EF1 alpha promoter (human), Cytomegalovirus (CMV), Rous sarcoma virus (RSV), simian virus 40 (SV40), Moloney murine leukemia, Tk promoter Herpes simplex virus (HSV).
A cell-type-specific promoter is a promoter that primarily drives expression in a certain cell type. An inducible promoter is a promoter that is capable of activating transcription of one or more DNA sequences or genes in response to an inducer. The DNA sequences or genes will not be transcribed when the inducer is absent. Inducers known in the art include high salt concentrations, cold, heat or toxic elements and include pathogens or disease agents such as viruses. Inducers can be chemical agents such as proteins, growth regulators, metabolites or phenolic compounds. An inducer can also be an illumination agent such as darkness and light at various modalities including wavelength, intensity, fluence, direction and duration. Activation of an inducible promoter is established by application of the inducer. The group of generally inducible promoters includes, but is not limited to, the hsp70 heat shock promoter of Drosophila melanogaster, a cold inducible promoter from Brassica napus and an alcohol dehydrogenase promoter which is induced by ethanol. Other inducible promoters include, but are not limited to, the glaA promoter which is starch-inducible, the metallothionein IIA promoter, and the tetracycline inducible promoter.
A constitutive promoter is a promoter that is active under many environmental conditions and in many different tissue types. Constitutive mammalian promoters include, but are not limited to, the EF1 alpha promoter (human), Cytomegalovirus (CMV), Rous sarcoma virus (RSV), simian virus 40 (SV40), Moloney murine leukemia, Tk promoter Herpes simplex virus (HSV).
A nucleic acid construct as disclosed herein can also comprise one or more regulating elements. The one or more regulating elements can be operably linked to one or more of the nucleic acid sequences within the nucleic acid construct.
As used herein the term "regulating elements" can refer to enhancers or other segments of a nucleic acid sequence that are involved in controlling gene expression. "Enhancer" generally refers to a sequence of DNA that functions at no fixed distance from the transcription start site and can be either 5' (Laimins, L. et al., Proc. Natl. Acad. Sci. 78: 993 (1981)), within, or 3' (Lusky, M.L., et al., Mol. Cell Bio. 3: 1108 (1983)) to the transcription unit. A transcription unit is that part of the DNA that will be transcribed into RNA. Furthermore, enhancers can be within an intron (Banerji, J.L. et al., Cell 33: 729 (1983)) as well as within the coding sequence itself (Osborne, T.F., et al., Mol. Cell Bio. 4: 1293 (1984); each of the cited references is incorporated herein by reference in its entirety for enhancers taught therein). They are usually between 10 and 300 bp in length, and they function in cis. Enhancers function to increase transcription from nearby promoters. Enhancers and promoters can also contain response elements that mediate the regulation of transcription. An enhancer often determines the regulation of expression of a gene. While many enhancer sequences are now known from mammalian genes (globin, elastase, albumin, α-fetoprotein and insulin), one will preferably use an enhancer from a eukaryotic cell-infecting virus for general expression. Examples of enhancers include, but are not limited to, the SV40 enhancer on the late side of the replication origin, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.
Additional examples of regulating elements include, but are not limited to, elements present in the non-coding and coding nucleotide sequences of homologous and/or heterologous nucleotide sequences, including the Iron Responsive Element (IRE), Translational cis-Regulatory Element (TLRE) or uORFs in 5' untranslated sequences and poly(U) stretches at the 3' end. As described above, a regulating element can be operably linked to a nucleotide sequence and promoter according to the invention.
A regulating element can also contain sequences necessary for the termination of transcription which may affect mRNA expression. These regions are transcribed as polyadenylated segments in the untranslated portion of the mRNA encoding tissue factor protein. The 3' untranslated regions usually also include transcription termination sites. For protein coding sequences it is preferred that the transcription unit also contains a polyadenylation region. One benefit of this region is that it increases the likelihood that the transcribed unit will be processed and transported like mRNA. The identification and use of polyadenylation signals in expression constructs is well established. It is preferred that homologous polyadenylation signals be used in the transgene constructs. In certain transcription units, the polyadenylation region is derived from the SV40 early polyadenylation signal and consists of about 400 bases.
A nucleic acid construct according to the invention is preferably a vector, in particular a plasmid, cosmid or phage or nucleotide sequence, linear or circular, of a single or double stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing any one of the nucleotide sequences of the invention in sense or antisense orientation into a mammalian cell. The choice of vector is dependent on the recombinant procedures followed and the host cell used. A vector may be an autonomously replicating vector or may replicate together with the chromosome into which it has been integrated. General techniques for integration into the host genome include, for example, systems designed to promote homologous recombination with the host genome. These systems typically rely on sequence flanking the nucleic acid to be expressed that has enough homology with a target sequence within the host cell genome that recombination between the vector nucleic acid and the target nucleic acid takes place, causing the delivered nucleic acid to be integrated into the host genome. These systems and the methods necessary to promote homologous recombination are known to those of skill in the art.
Suitable vectors which can be delivered using the presently known procedures include, but are not limited to, herpes simplex virus vectors, adenovirus vectors, papovavirus vectors (such as human papillomavirus vectors, polyomavirus vectors, SV40 vectors), adeno-associated virus vectors, retroviral vectors, pseudorabies virus, alpha-herpes virus vectors, Herpes virus, Vaccinia virus, Polio virus, AIDS virus, neuronal trophic virus, Sindbis and other RNA viruses, including these viruses with the HIV backbone and the like. A thorough review of viral vectors, particularly viral vectors suitable for modifying nonreplicating cells, and how to use such vectors in conjunction with the expression of polynucleotides of interest can be found in the book VIRAL VECTORS: GENE THERAPY AND NEUROSCIENCE APPLICATIONS (Ed. Kaplitt and Loewy, 1995). Also preferred are any viral families which share the properties of these viruses which make them suitable for use as vectors. Retroviruses include Moloney Murine Leukemia Virus, MMLV, and retroviruses that express the desirable properties of MMLV as a vector. Retroviral vectors are able to carry a larger genetic payload, i.e., a transgene or marker gene, than other viral vectors, and for this reason are a commonly used vector. However, they are not as useful in non-proliferating cells. Adenovirus vectors are relatively stable and easy to work with, have high titers, can be delivered in aerosol formulation, and can transfect non-dividing cells. Pox viral vectors are large and have several sites for inserting genes; they are thermostable and can be stored at room temperature.
Viral vectors can have higher transduction abilities (i.e., ability to introduce genes) than chemical or physical methods of introducing genes into cells. Typically, viral vectors contain nonstructural early genes, structural late genes, an RNA polymerase III transcript, inverted terminal repeats necessary for replication and encapsidation, and promoters to control the transcription and replication of the viral genome. When engineered as vectors, viruses typically have one or more of the early genes removed and a gene or gene/promoter cassette is inserted into the viral genome in place of the removed viral DNA. Constructs of this type can carry up to about 8 kb of foreign genetic material. The necessary functions of the removed early genes are typically supplied by cell lines which have been engineered to express the gene products of the early genes in trans.
Retroviral vectors, in general, are described by Verma, I.M., Retroviral vectors for gene transfer. In Microbiology-1985, American Society for Microbiology, pp. 229-232, Washington, (1985), which is incorporated by reference herein. Examples of methods for using retroviral vectors for gene therapy are described in U.S. Patent Nos. 4,868,116 and 4,980,286; PCT applications WO 90/02806 and WO 89/07136; and Mulligan, (Science 260:926-932 (1993)); the teachings of which are incorporated herein by reference in their entirety for their teaching of methods for using retroviral vectors for gene therapy.
A retrovirus is essentially a package which has packed into it nucleic acid cargo. The nucleic acid cargo carries with it a packaging signal, which ensures that the replicated daughter molecules will be efficiently packaged within the package coat. In addition to the package signal, there are a number of molecules which are needed in cis for the replication and packaging of the replicated virus. Typically a retroviral genome contains the gag, pol, and env genes which are involved in the making of the protein coat. It is the gag, pol, and env genes which are typically replaced by the foreign DNA that is to be transferred to the target cell. Retrovirus vectors typically contain a packaging signal for incorporation into the package coat, a sequence which signals the start of the gag transcription unit, elements necessary for reverse transcription, including a primer binding site to bind the tRNA primer of reverse transcription, terminal repeat sequences that guide the switch of RNA strands during DNA synthesis, a purine rich sequence 5' to the 3' LTR that serves as the priming site for the synthesis of the second strand of DNA, and specific sequences near the ends of the LTRs that enable the DNA state of the retrovirus to insert into the host genome. This amount of nucleic acid is sufficient for the delivery of one to many genes depending on the size of each transcript. It is preferable to include either positive or negative selectable markers along with other genes in the insert.
Since the replication machinery and packaging proteins in most retroviral vectors have been removed (gag, pol, and env), the vectors are typically generated by placing them into a packaging cell line. A packaging cell line is a cell line which has been transfected or transformed with a retrovirus that contains the replication and packaging machinery but lacks any packaging signal. When the vector carrying the DNA of choice is transfected into these cell lines, the vector containing the gene of interest is replicated and packaged into new retroviral particles by the machinery provided in trans by the helper cell. The genomes for the machinery are not packaged because they lack the necessary signals.
The construction of replication-defective adenoviruses has been described (Berkner et al., J. Virology 61:1213-1220 (1987); Massie et al., Mol. Cell. Biol. 6:2872-2883 (1986); Haj-Ahmad et al., J. Virology 57:267-274 (1986); Davidson et al., J. Virology 61:1226-1239 (1987); Zhang "Generation and identification of recombinant adenovirus by liposome-mediated transfection and PCR analysis" BioTechniques 15:868-872 (1993)). The benefit of the use of these viruses as vectors is that they are limited in the extent to which they can spread to other cell types, since they can replicate within an initial infected cell but are unable to form new infectious viral particles. Recombinant adenoviruses have been shown to achieve high efficiency gene transfer after direct, in vivo delivery to airway epithelium, hepatocytes, vascular endothelium, CNS parenchyma and a number of other tissue sites (Morsy, J. Clin. Invest. 92:1580-1586 (1993); Kirshenbaum, J. Clin. Invest. 92:381-387 (1993); Roessler, J. Clin. Invest. 92:1085-1092 (1993); Moullier, Nature Genetics 4:154-159 (1993); La Salle, Science 259:988-990 (1993); Gomez-Foix, J. Biol. Chem. 267:25129-25134 (1992); Rich, Human Gene Therapy 4:461-476 (1993); Zabner, Nature Genetics 6:75-83 (1994); Guzman, Circulation Research 73:1201-1207 (1993); Bout, Human Gene Therapy 5:3-10 (1994); Zabner, Cell 75:207-216 (1993); Caillaud, Eur. J. Neuroscience 5:1287-1291 (1993); and Ragot, J. Gen. Virology 74:501-507 (1993)) the teachings of which are incorporated herein by reference in their entirety for their teaching of methods for using retroviral vectors for gene therapy. Recombinant adenoviruses achieve gene transduction by binding to specific cell surface receptors, after which the virus is internalized by receptor-mediated endocytosis, in the same manner as wild type or replication-defective adenovirus (Chardonnet and Dales, Virology 40:462-477 (1970); Brown and Burlingham, J. Virology 12:386-396 (1973); Svensson and Persson, J. Virology 55:442-449 (1985); Seth, et al., J. Virol. 51:650-655 (1984); Seth, et al., Mol. Cell. Biol. 4:1528-1533 (1984); Varga et al., J. Virology 65:6061-6070 (1991); Wickham et al., Cell 73:309-319 (1993)).
A viral vector can be one based on an adenovirus which has had the E1 gene removed, and these virions are generated in a cell line such as the human 293 cell line. Optionally, both the E1 and E3 genes are removed from the adenovirus genome.
Another type of viral vector that can be used to introduce the polynucleotides of the invention into a cell is based on an adeno-associated virus (AAV). This defective parvovirus is a preferred vector because it can infect many cell types and is nonpathogenic to humans. AAV type vectors can transport about 4 to 5 kb and wild type AAV is known to stably insert into chromosome 19. Vectors which contain this site specific integration property are preferred. An especially preferred embodiment of this type of vector is the P4.1 C vector produced by Avigen, San Francisco, CA, which can contain the herpes simplex virus thymidine kinase gene, HSV-tk, or a marker gene, such as the gene encoding the green fluorescent protein, GFP.
In another type of AAV virus, the AAV contains a pair of inverted terminal repeats (ITRs) which flank at least one cassette containing a promoter which directs cell-specific expression operably linked to a heterologous gene. Heterologous in this context refers to any nucleotide sequence or gene which is not native to the AAV or B19 parvovirus.
Typically the AAV and B19 coding regions have been deleted, resulting in a safe, noncytotoxic vector. The AAV ITRs, or modifications thereof, confer infectivity and site-specific integration, but not cytotoxicity, and the promoter directs cell-specific expression. United States Patent No. 6,261,834 is herein incorporated by reference in its entirety for material related to the AAV vector.
Preferably, a nucleic acid construct as used herein contains a selection marker. Useful markers are dependent on the host cell of choice and are well known to persons skilled in the art. Additionally, molecules encoded within the viral vector, e.g., by a cDNA contained in the viral vector, are expressed efficiently in cells which have taken up viral vector nucleic acid. Preferred selection marker genes are extensively presented later on herein.
A recombinant host cell, such as a mammalian cell, preferably a human cell, containing one or more copies of a nucleic acid construct according to the invention is an additional aspect of the invention. By host cell or recombinant host cell is meant a cell which contains a nucleic acid construct such as a vector and supports the replication and/or expression of the nucleic acid construct. A suitable expression system uses any mammalian cells such as CHO, Cos, CPK (porcine kidney), MDCK, BHK, and Vera cells. A suitable human cell or human cell line is an astrocyte, adipocyte, chondrocyte, endothelial, epithelial, fibroblast, hair, keratinocyte, melanocyte, osteoblast, skeletal muscle, smooth muscle, stem, synoviocyte cell or cell line. Examples of suitable human cell lines also include HEK 293 (human embryonic kidney), HeLa, Per.Cβ, and Bowes melanoma cells. Following EP patentability rules, in a preferred embodiment, a human cell is not an embryonic stem cell. Therefore, in another aspect of the invention relates to a mammalian cell that is genetically modified, preferably by a method of the invention, in that a mammalian cell comprises a nucleic acid construct as herein defined above. A nucleic acid construct preferably is a construct containing nucleic acid sequences that are manipulated or modified in vitro. As such, a nucleic acid construct preferably provides a mammalian cell with a combination of nucleic acid sequences which is not found in nature. A nucleic acid construct preferably is stably maintained, either as a autonomously replicating element, or, more preferably, the nucleic acid construct is integrated into the mammalian cell's genome, in which case the construct is usually integrated at random positions in the mammalian cell's genome, for instance by non-homologuous recombination. Stably transformed mammalian cells are produced by known methods. 
The term stable transformation refers to exposing cells to methods to transfer and incorporate foreign DNA into their genome. These methods include, but are not limited to, transfer of purified DNA via microparticle bombardment, electroporation of protoplasts and microinjection or use of silicon fibers to facilitate penetration and transfer of DNA into the mammalian cell. We demonstrated herein (see example 2) that in a stable expression system, a method of the invention is quite attractive. An alternative method to express a protein or polypeptide of interest in a mammalian cell relies on transient expression from virus-based vectors. We demonstrated herein (see examples 1 and 3) that in a transient expression system, a method of the invention is quite attractive. When a transformed mammalian cell is obtained with a method according to the invention, a mammalian tissue can be regenerated from said transformed cell in a suitable medium, which optionally may contain antibiotics or biocides known in the art for the selection of transformed cells.
Resulting transformed mammalian tissues are preferably identified by means of selection using a selection marker gene as present on a nucleic acid construct as defined herein. A nucleic acid construct according to the invention therefore preferably also comprises a marker gene which can provide selection or screening capability in a treated mammalian cell. Selectable markers are generally preferred for mammalian transformation events, but are not available for all mammalian species. A nucleic acid construct disclosed herein can also include a nucleic acid sequence encoding a marker product. A marker product can be used to determine if the construct or portion thereof has been delivered to the cell and once delivered is being expressed. Examples of marker genes include, but are not limited to the E. coli lacZ gene, which encodes β-galactosidase, and a gene encoding the green fluorescent protein.
Optionally, a marker may be a selectable marker. Examples of suitable selectable markers for mammalian cells include, but are not limited to, dihydrofolate reductase (DHFR), thymidine kinase, neomycin, neomycin analog G418, hygromycin, and puromycin. Other suitable selectable markers include, but are not limited to, antibiotic, metabolic, auxotrophic or herbicide resistant genes which, when inserted in a host cell in culture, would confer on those cells the ability to withstand exposure to an antibiotic. Metabolic or auxotrophic marker genes enable transformed cells to synthesize an essential component, usually an amino acid, which allows the cells to grow on media that lack this component. Another type of marker gene is one that can be screened by histochemical or biochemical assay, even though the gene cannot be selected for. A suitable marker gene found useful in such host cell transformation experience is a luciferase gene. Luciferase catalyzes the oxidation of luciferin, resulting in the production of oxyluciferin and light. Thus, the use of a luciferase gene provides a convenient assay for the detection of the expression of introduced DNA in host cells by histochemical analysis of the cells. In an example of a transformation process, a nucleic acid sequence sought to be expressed in a host cell could be coupled in tandem with the luciferase gene. The tandem construct could be transformed into host cells, and the resulting host cells could be analyzed for expression of the luciferase enzyme. An advantage of this marker is the non-destructive procedure of application of the substrate and the subsequent detection.
When such selectable markers are successfully transferred into a mammalian host cell, the transformed mammalian host cell can survive if placed under selective pressure. There are two widely used distinct categories of selective regimes. The first category is based on a cell's metabolism and the use of a mutant cell line which lacks the ability to grow independent of a supplemented medium. Two examples are CHO DHFR- cells and mouse LTK- cells. These cells lack the ability to grow without the addition of such nutrients as thymidine or hypoxanthine. Because these cells lack certain genes necessary for a complete nucleotide synthesis pathway, they cannot survive unless the missing nucleotides are provided in a supplemented medium. An alternative to supplementing the medium is to introduce an intact DHFR or TK gene into cells lacking the respective genes, thus altering their growth requirements. Individual cells which were not transformed with the DHFR or TK gene will not be capable of survival in non-supplemented medium.
The second category is dominant selection which refers to a selection scheme used in any cell type and does not require the use of a mutant cell line. These schemes typically use a drug to arrest growth of a host cell. Those cells which have a novel gene would express a protein conveying drug resistance and would survive the selection. Examples of such dominant selection use the drugs neomycin (Southern, P. and Berg, P., J. Molec. Appl. Genet. 1: 327 (1982)), mycophenolic acid (Mulligan, R.C. and Berg, P., Science 209: 1422 (1980)) or hygromycin (Sugden, B. et al., Mol. Cell. Biol. 5: 410-413 (1985)). The three examples employ bacterial genes under eukaryotic control to convey resistance to the appropriate drug G418 or neomycin (geneticin), xgpt (mycophenolic acid) or hygromycin, respectively. Others include the neomycin analog G418 and puromycin. Other useful markers are dependent on the host cell of choice and are well known to persons skilled in the art. In a next step a transformed mammalian cell is subjected to conditions leading to expression of a protein or polypeptide of interest, and optionally recovering said protein or polypeptide. Recovering steps depend on the expressed protein or polypeptide and the host cell used but can comprise isolation of the protein or polypeptide. When applied to a protein/polypeptide, the term "isolation" indicates that the protein is found in a condition other than its native environment. In a preferred form, an isolated protein is substantially free of other proteins, particularly other homologous proteins. It is preferred to provide the protein in a greater than 40% pure form, more preferably greater than 60% pure form. Even more preferably it is preferred to provide the protein in a highly purified form, i.e., greater than 80% pure, more preferably greater than 95% pure, and even more preferably greater than 99% pure, as determined by SDS-PAGE.
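By way of illustration only, the purity levels named above can be expressed as a small calculation. The sketch below assumes that purity is estimated from densitometry of an SDS-PAGE lane, i.e. the intensity of the target band divided by the summed intensity of all bands in that lane; this read-out method, the function names and the band intensities are assumptions made for the example and are not taken from the specification.

```python
# Purity levels named in the text, from most to least stringent (percent).
PURITY_LEVELS = (99, 95, 80, 60, 40)


def purity_percent(target_band, other_bands):
    """Percent purity from band intensities of a single gel lane (hypothetical read-out)."""
    total = target_band + sum(other_bands)
    if total <= 0:
        raise ValueError("lane contains no measurable signal")
    return 100.0 * target_band / total


def highest_level_met(purity):
    """Return the most stringent 'greater than X%' level that the purity exceeds, or None."""
    for level in PURITY_LEVELS:
        if purity > level:
            return level
    return None


# Hypothetical lane: one target band and two contaminant bands (arbitrary units).
purity = purity_percent(900.0, [60.0, 40.0])
print(purity, highest_level_met(purity))
```

With these illustrative intensities the target band accounts for 90% of the lane signal, which exceeds the 80% level but not the 95% level.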
If desired, a second nucleotide sequence may be ligated to a heterologous nucleotide sequence to encode a fusion protein to facilitate protein purification and protein detection on for instance Western blot and in an ELISA. Suitable heterologous sequences include, but are not limited to, the nucleotide sequences encoding for proteins such as for instance glutathione-S-transferase, maltose binding protein, metal-binding polyhistidine, green fluorescent protein, luciferase and beta-galactosidase. The protein may also be coupled to non-peptide carriers, tags or labels that facilitate tracing of the protein, both in vivo and in vitro, and allow for the identification and quantification of binding of the protein to substrates. Such labels, tags or carriers are well-known in the art and include, but are not limited to, biotin, radioactive labels and fluorescent labels.
In this document and in its claims, the verb "to comprise" and its conjugations is used in its non-limiting sense to mean that items following the word are included, but items not specifically mentioned are not excluded. In addition the verb "to consist" may be replaced by "to consist essentially of", meaning that a vector or a nucleic acid construct or a nucleotide molecule, a host cell respectively a method as defined herein may comprise additional component(s) respectively additional step(s) than the ones specifically identified, said additional component(s) respectively additional step(s) not altering the unique characteristic of the invention. In addition, reference to an element by the indefinite article "a" or "an" does not exclude the possibility that more than one of the element is present, unless the context clearly requires that there be one and only one of the elements. The indefinite article "a" or "an" thus usually means "at least one". Each embodiment as described herein may be combined with other embodiments as described herein unless otherwise indicated.
All patent and literature references cited in the present specification are hereby incorporated by reference in their entirety.
The following examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way.
Example 1: CHO cell lines transgenic for SEAP containing the UN1 sequence (SEQ ID NO:2) at a specific genomic location
We have generated two stable CHO cell lines expressing a secreted form of human placental alkaline phosphatase (SEAP) which has been integrated at a specific genomic location. Genetically, the cell lines differ only by the presence of the UN1 sequence. This UN1 sequence increased SEAP protein production by almost a factor of 2. This result, obtained over more than 30 generations, confirms the stability of the cell lines.
Introduction
The secreted form of human placental alkaline phosphatase (SEAP) is a very stable reporter enzyme which is easily detectable in the cell medium of mammalian expression systems. Here we use SEAP production to study the yield effect of introducing the UNic™ technology in a controlled high expression CHO cell line. The Flp-In™ system was used to obtain transgenic CHO lines with the SEAP constructs integrated at a single specific genomic location by means of site-specific recombination. Two polyclonal stable lines were generated that differed only in the presence of the untranslated sequence of ntp303 (UN1 sequence). The integration site in the CHO Flp-In strain is known to give a high and stable mRNA expression level. Clonal variation and variation related to different insertion sites and copy number are eliminated by the use of polyclonal isogenic lines. This enables a proper comparison between the constructs with and without UN1 sequence. The two SEAP expression lines were grown simultaneously for some time and analyzed at different time points for cell growth and protein expression.
Experimental
Construction of SEAP expression cassettes in Flp-In vectors
The pPNIC004 and pPNIC005 vectors used to transfect CHO Flp-In cells were constructed based on the expression vector pEF5/FRT/V5-dest (Invitrogen). Both vectors were constructed by insertion of the SEAP cDNA sequence, amplified by PCR using pSEAP2 (Invitrogen) as a template, between the constitutive EF1-alpha promoter and the bovine growth hormone poly-adenylation site. pPNIC004 contains an additional UN1 sequence fused to the SEAP coding sequence by means of fusion PCR. The constructs were analyzed by sequencing before transfection.
Generation of CHO lines
The CHO-pPNIC004 and CHO-pPNIC005 lines were generated using the pPNIC004 and pPNIC005 plasmids and CHO Flp-In cell line, according to the recommendations provided by the manufacturer of the Flp-In system (Invitrogen). Cells were maintained by transferring a 1:10 dilution in fresh medium. Briefly, cells were washed with PBS, detached with trypsin/EDTA solution, diluted in fresh selective HAM's medium (HAM's F12 medium containing 10% fetal bovine serum, 10 mM L-glutamine and 50 mg/ml Hygromycin B), and diluted to a final dilution of 1:10 in selective HAM's medium in a new T75 flask.
Analysis of transgenic CHO lines
The transgenic CHO lines were analyzed as follows. The cells were seeded at a density of 10^5 cells per ml, using 2 ml per well in 6-well plates. Hygromycin was omitted from these plates to give similar growth for transgenic cells and the empty CHO Flp-In cell line that was used as a negative control. The three cell lines (CHO Flp-In, CHO-pPNIC004, and CHO-pPNIC005) were seeded. Two wells were used for measuring SEAP production in the supernatant and the same wells were used to determine the cell numbers.
Counting cells
Cells were detached from the culture plate by trypsin/EDTA treatment. The cell concentration was determined using a CASY Cell Counter after dilution in CASY-ton solution, as described by the manufacturer.
SEAP activity assay
The SEAP concentration was determined using the Phospha-Light System (Applied Biosystems), using the manual provided with the kit. Luminescence was measured using a Victor3 plate reader (Perkin-Elmer).
Results
Transfection of CHO Flp-In or normal CHO cell line with pPNIC004 or pPNIC005 plasmid resulted in secretion of SEAP into the culture medium. The polyclonal transgenic cell lines also secreted an easily detectable amount of SEAP. The cells used for the time series were seeded at the tenth passage after starting growth under selective pressure, which represents approximately 35 generations of transgenic cells. Cell counts at 1, 2, and 3 days after seeding are shown in Figure 1A. The corresponding SEAP levels in the cell supernatant are shown in Figure 1B, which includes an extra time point at 6 hours after seeding (t=0.25 days). The average values of the sample duplicates are shown. The growth curves show that both transgenic cell lines grow with a rate similar to the CHO Flp-In cell line. The presence of the SEAP gene or UN1 sequence did not have a significant effect on cell growth, indicating that differences in protein production are not a result of differences in biomass. All cells have reached a maximum cell density around two days after seeding at a density of 1.2 to 1.5 million cells per well of 10 cm². After one day the SEAP concentration still increased linearly in time. Earlier, two independent experiments showed a two-fold increased SEAP production for the pPNIC004 line (+UN1) compared to the pPNIC005 line (-UN1) at 48 hours after seeding.
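The generation count and cell densities reported above can be checked with a short calculation. The sketch below assumes that the culture regrows to its pre-split density after every 1:10 passage, so that each passage corresponds to log2(10) population doublings; that assumption and the function names are illustrative and are not stated in the example.

```python
import math


def generations_per_passage(dilution_factor):
    """Population doublings needed to regrow after a 1:dilution_factor split."""
    return math.log2(dilution_factor)


def total_generations(passages, dilution_factor):
    """Cumulative doublings over a number of identical passages."""
    return passages * generations_per_passage(dilution_factor)


def doublings(start_cells, end_cells):
    """Doublings needed to go from start_cells to end_cells."""
    return math.log2(end_cells / start_cells)


# Ten passages at 1:10, as described for the time series.
print(round(total_generations(10, 10), 1))  # 33.2

# Seeding: 10^5 cells per ml in 2 ml per well gives 2 x 10^5 cells per well.
seeded = 1e5 * 2.0
# Doublings from seeding to the reported plateau of 1.2 to 1.5 million cells per well.
print(round(doublings(seeded, 1.2e6), 2), round(doublings(seeded, 1.5e6), 2))  # 2.58 2.91
```

Under this assumption ten passages correspond to about 33 doublings, which is in the same range as the approximately 35 generations mentioned in the text.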
These data demonstrate that the presence of the UN1 sequence results in a higher production of target protein. Since the copy number and the promoter sequence of both SEAP constructs are identical, this means that the increase is caused by the presence of the UN1 transcript.
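The comparison between the two lines can be written as a fold increase computed from replicate wells. The following is a minimal sketch that assumes the signal of the empty negative control line is subtracted as background before the ratio is taken; the background subtraction, the function names and all numerical readings are hypothetical and are not data from this example.

```python
from statistics import mean


def net_signal(sample_wells, control_wells):
    """Mean reporter signal of replicate wells minus the mean negative-control signal."""
    return mean(sample_wells) - mean(control_wells)


def fold_increase(test_wells, reference_wells, control_wells):
    """Ratio of background-corrected signals (test line over reference line)."""
    test = net_signal(test_wells, control_wells)
    reference = net_signal(reference_wells, control_wells)
    if reference <= 0:
        raise ValueError("reference signal does not exceed background")
    return test / reference


# Hypothetical duplicate-well luminescence readings in arbitrary units.
ratio = fold_increase(
    test_wells=[410.0, 390.0],       # line with the additional 5' sequence
    reference_wells=[210.0, 190.0],  # line without it
    control_wells=[10.0, 10.0],      # empty host line
)
print(round(ratio, 2))  # 2.05
```

Because the cell counts of the lines were reported to be similar, the sketch compares supernatant signals directly; where growth differs, the signals would first have to be divided by the cell number per well.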
Conclusion
Linking the UN1 sequence to a SEAP construct led to an almost twofold increase in SEAP production in a transgenic, high producing, polyclonal CHO system using isogenic single copy insert cell lines. The increase in protein yield varies with growth phase. The current data are in line with earlier findings that UN1-containing RNAs give increased protein production and boost protein production in plants and fungi.
Example 2: Increase of human protein production in transgenic human cell line
We have generated two stable HEK cell lines expressing a secreted form of human placental alkaline phosphatase (SEAP) which has been integrated at a specific genomic location. Genetically, the cell lines differ only by the base sequence of UN1 or UN1.51 (SEQ ID NO:3). Replacement of the SEQ ID NO:2 sequence by the SEQ ID NO:3 sequence increased protein production by a factor of 1.5. This result is evidence of a further improvement by a new sequence (UN1.51 or SEQ ID NO:3).
Introduction
The secreted form of human placental alkaline phosphatase (SEAP) is a very stable reporter enzyme which is easily detectable in the cell medium of mammalian expression systems. Here we use SEAP production to study the yield effect of introducing the UNic™ technology in a controlled high expression human (HEK) cell line. The Flp-In™ system was used to obtain transgenic HEK lines with the SEAP constructs integrated at a single specific genomic location by means of site-specific recombination. Two polyclonal stable lines were generated that differed only in the sequence located 5' of the start codon (SEQ ID NO:2 or SEQ ID NO:3). The integration site in the HEK Flp-In strain is known to give a high and stable mRNA expression level. Clonal variation and variation related to different insertion sites and copy number are eliminated by the use of polyclonal isogenic lines. This enables a proper comparison between the constructs with SEQ ID NO:2 and SEQ ID NO:3 sequence.
The two SEAP expression lines were grown simultaneously and analyzed at different time points for cell growth and protein expression.
Experimental
Construction of SEAP expression cassettes in Flp-In vectors
The pPNIC136 and pPNIC147 vectors used to transfect CHO Flp-In cells were constructed based on the expression vector pEF5/FRT/V5-dest (Invitrogen). Both vectors were constructed by insertion of the SEAP cDNA sequence, amplified by PCR using pSEAP2 (Invitrogen) as a template, between the constitutive human cytomegalovirus (CMV) promoter and the bovine growth hormone poly-adenylation site. The SEQ ID NO: 1 and SEQ ID NO: 3 sequences were purchased as synthetic DNA and inserted as a fusion with the SEAP coding sequence resulting in the pPNIC135 and pPNIC147 plasmids, respectively. The constructs were analyzed by sequencing before transfection.
Generation of HEK lines
The HEK-pPNIC136 and HEK-pPNIC147 lines were generated using the pPNIC136 and pPNIC147 plasmids and HEK Flp-In cell line, according to the recommendations provided by the manufacturer of the Flp-In system (Invitrogen).
Cells were maintained by transferring a 1:10 dilution in fresh medium. Briefly, cells were washed with PBS, detached with trypsin/EDTA solution, diluted in fresh selective DMEM medium (DMEM medium containing 10% fetal bovine serum and 50 microgram/ml Hygromycin B), and diluted to a final dilution of 1:10 in selective DMEM medium in a new T75 flask.
Analysis of transgenic HEK lines
The transgenic HEK lines were analyzed as follows. The cells were seeded at a density of 10^5 cells per ml, using 2 ml per well in 6-well plates. Hygromycin was omitted from these plates to give similar growth for transgenic cells and the empty HEK Flp-In cell line that was used as a negative control. The three cell lines (HEK Flp-In, HEK-pPNIC136, and HEK-pPNIC147) were seeded. Two wells were used for measuring SEAP production in the supernatant and the same wells were used to determine the cell numbers.
SEAP activity assay
The SEAP concentration was determined using the Phospha-Light System (Applied Biosystems), using the manual provided with the kit. Luminescence was measured using a Victor3 plate reader (Perkin-Elmer).
Results
The polyclonal Flp-In HEK cell lines stably transformed with the pPNIC136 or pPNIC147 plasmid secreted an easily detectable amount of SEAP into the culture medium. The cell density increased up to two days after seeding when complete confluency was obtained with 2 to 2.5 million cells per well of 10 cm². Cell supernatants were assayed for SEAP activity on days 1, 3, and 7 after seeding. The supernatants of the pPNIC136 lines contained at least 50 percent more SEAP than the pPNIC147 lines at these time points, which is shown in Figure 2.
These data demonstrate that the presence of the SEQ ID NO:3 sequence results in a higher production of target protein when compared to the SEQ ID NO:2 sequence. Since the copy number and the promoter sequence of both SEAP constructs are identical, this means that the increase is solely caused by the difference in the sequences of the transcripts of SEQ ID NO:2 versus SEQ ID NO:3.
Conclusion
Use of the UN1.51 sequence instead of the UN1 sequence results in a 50 percent increase in SEAP production in a transgenic, high producing, polyclonal HEK system using isogenic single copy insert cell lines.
Example 3: Improvement of transient human IL-4 expression in human cell line by UN1.52 (SEQ ID NO:4)
We have expressed human IL-4 in parallel in two human cell lines (HEK) by transfection with two different plasmid constructs. The expressed messenger RNAs differ by the presence of the UN1.52 (SEQ ID NO:4) sequence. The plasmid containing the UN1.52 sequence increased human IL-4 protein production by almost a factor of 6 as compared to the reference construct. This result, obtained in repeated experiments, shows that this new sequence can also enhance protein production during transient expression.
Introduction
The human IL-4 (hIL-4) protein is a cytokine with anti-inflammatory properties and a key regulator in humoral and adaptive immunity. Cells that express the protein secrete it into the culture medium. It is easily detected by means of a commercial ELISA kit. Here we use hIL-4 production to study the yield effect of introducing the UN1.52 sequence after transfection of HEK cells. Expression constructs were generated that differed only in the expression of UN1.52 messenger RNA. The two constructs were transfected in parallel to enable a proper comparison between the constructs with and without UN1.52 sequence.
Experimental
Construction of hIL-4 cassettes in expression vectors
The pPNIC144 and pPNIC145 expression vectors used to transfect HEK cells were constructed based on the expression vector pCMV6-neo (OriGene). Both vectors were constructed by insertion of the hIL-4 cDNA sequence, derived from the pCMV6-XL5mod_IL4_NM_000589 plasmid (OriGene). The neomycin resistance cassette of the pCMV6-neo plasmid was replaced by the blasticidin resistance cassette derived from the pUB/Bsd plasmid (Invitrogen). The resulting hIL-4 expression plasmid is pPNIC144. The sequence for UN1.52 was purchased as synthetic DNA and inserted as a fusion with the hIL-4 coding sequence resulting in the pPNIC145 plasmid. The constructs were analyzed by sequencing before transfection. Plasmid concentrations and purity were checked using a Nanodrop spectrophotometer (Thermo Scientific).
Generation of HEK lines
HEK cells were grown in DMEM/F12 medium containing 10% FBS at 37°C, 5% CO2. Cells were seeded in 6-well plates to reach a density of half a million cells per well at the day of transfection. Transfections were performed in triplicate reactions using Fugene-6 reagent (Roche) according to the manufacturer's instructions. Medium was replaced by fresh medium at 24, 48, and 72 hours post transfection and samples were collected for protein analysis. At each time point the cells of one well of each transfection were detached using trypsin/EDTA and used to count the number of cells by using a CASY Cell Counter as described by the manufacturer.
hIL-4 yield
An ELISA kit (eBioscience) was used to determine the concentration of hIL-4 in the medium samples. Each sample was assayed in triplicate wells and a dilution series of rhIL-4 (eBioscience) was used as a standard. The colorimetric ELISA assay was measured using a Victor3 plate reader (Perkin-Elmer).
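Reading a sample concentration from a standard dilution series can be sketched as follows. The example assumes simple piecewise linear interpolation between neighbouring standards, whereas ELISA data are often fitted with a four-parameter logistic curve instead; the interpolation method, the function names, the standard values and the sample dilution are all assumptions made for illustration and are not taken from the kit manual or from this example.

```python
from bisect import bisect_left


def interpolate_concentration(absorbance, standards):
    """Interpolate a concentration from (concentration, absorbance) standards.

    The standards must be sorted by strictly increasing absorbance.
    """
    concentrations = [c for c, _ in standards]
    absorbances = [a for _, a in standards]
    if not absorbances[0] <= absorbance <= absorbances[-1]:
        raise ValueError("absorbance outside the standard curve; re-assay at another dilution")
    i = bisect_left(absorbances, absorbance)
    if absorbances[i] == absorbance:
        return concentrations[i]
    a0, a1 = absorbances[i - 1], absorbances[i]
    c0, c1 = concentrations[i - 1], concentrations[i]
    return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)


def sample_concentration(replicate_absorbances, standards, dilution_factor=1.0):
    """Average the replicate wells, interpolate, and correct for sample dilution."""
    mean_absorbance = sum(replicate_absorbances) / len(replicate_absorbances)
    return interpolate_concentration(mean_absorbance, standards) * dilution_factor


# Hypothetical standard series: (pg/ml, absorbance).
standards = [(0.0, 0.05), (50.0, 0.25), (100.0, 0.45), (200.0, 0.85)]
# Hypothetical triplicate wells of a sample diluted tenfold before the assay.
print(round(sample_concentration([0.34, 0.35, 0.36], standards, dilution_factor=10.0), 1))
```

With these illustrative numbers the mean absorbance of 0.35 falls halfway between the 50 and 100 pg/ml standards, so the undiluted sample works out to about 750 pg/ml.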
Results
HEK cells transfected with plasmids pPNIC144 and pPNIC145 secreted an easily detectable amount of hIL-4 into the culture medium. The density of cells transfected with either pPNIC144 or pPNIC145 was not significantly different. The amount of hIL-4 produced per cell increased between 24 and 48 hours after transfection, but remained constant between 48 and 72 hours after transfection. The supernatants of the HEK cells transfected with pPNIC145 contained 6 to 7 times more hIL-4 than the cells transfected with pPNIC144 at all time points, which is shown in Figure 3 for 48 hours after transfection.
These data demonstrate that the presence of the UN1.52 sequence results in a higher production of target protein. Since the concentration of transfected plasmids and the promoter sequence of both hIL-4 constructs are identical, this means that the increase is caused by the presence of UN1.52 in the expressing cells.
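The amount produced per cell follows from the measured concentration, the medium volume and the cell count of the same well. The sketch below is illustrative only: the medium volume of 2 ml per well is an assumption borrowed from Examples 1 and 2 (Example 3 does not state it), and the concentrations and cell count are hypothetical values. Because the medium was replaced at every time point, each sample is taken to represent the protein secreted during the preceding interval.

```python
def yield_per_cell(concentration_pg_per_ml, medium_volume_ml, cell_count):
    """Amount of secreted protein per cell (pg) for one collection interval."""
    if cell_count <= 0:
        raise ValueError("cell count must be positive")
    return concentration_pg_per_ml * medium_volume_ml / cell_count


# Hypothetical wells with the same cell count, with and without the 5' sequence.
with_sequence = yield_per_cell(3600.0, 2.0, 1.0e6)
without_sequence = yield_per_cell(600.0, 2.0, 1.0e6)
print(with_sequence, without_sequence, round(with_sequence / without_sequence, 1))
```

When the cell counts of the two transfections are equal, as in this illustration, the per-cell ratio is the same as the ratio of the supernatant concentrations.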
Conclusion
Using UN1.52 for hIL-4 expression led to an almost sixfold increase in hIL-4 production after transfection in a human HEK cell line.
Description of the figures
Figure 1: Transgenic CHO lines expressing SEAP with (pPNIC004) or without UN1 sequence (pPNIC005) or without SEAP gene (CHO Flp-In) were seeded at t=0 days at a density of 2*10^5 cells per well. The number of cells per well (A) and concentration of SEAP in the cell supernatant (B) were determined at different time points for 3 days after seeding.
Figure 2: Transgenic human cell line (HEK) with SEQ ID NO:2 (pPNIC136) or with SEQ ID NO:3 sequence (pPNIC147). Cells were seeded at t=0 days at a density of 2*10^5 cells per well. SEAP expression was determined 7 days after seeding.
Figure 3: HEK cells transiently expressing hIL-4 after transfection. Expression with or without UN1.52. The concentration of hIL-4 in the cell supernatant was determined at 48 hours after transfection.

Claims
1. A method for expressing a protein of interest in a mammalian cell comprising the steps of: a) providing a nucleic acid construct comprising a first nucleotide sequence that has at least 34% nucleotide sequence identity with the nucleotide sequence of SEQ ID No. 1, operably linked to a second nucleotide sequence encoding a protein or polypeptide of interest and further operably linked to a heterologous promoter, b) contacting a mammalian cell with said nucleic acid construct to obtain a transformed mammalian cell, and c) subjecting said transformed mammalian cell to conditions leading to expression of the protein or polypeptide of interest, and optionally recovering said protein or polypeptide.
2. A method for expressing a protein of interest in a mammalian cell comprising the steps of: a) providing a nucleic acid construct comprising a first nucleotide sequence comprising a nucleotide sequence that has at least 46% nucleotide sequence identity to 100-151 or 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has at least 51% nucleotide sequence identity to nucleotides 4 - 76 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof, operably linked to a second nucleotide sequence encoding a protein or polypeptide of interest and further operably linked to a heterologous promoter, b) contacting a mammalian cell with said nucleic acid construct to obtain a transformed mammalian cell, and c) subjecting said transformed mammalian cell to conditions leading to expression of the protein or polypeptide of interest, and optionally recovering said protein or polypeptide.
3. A method according to claim 2, wherein the first nucleotide sequence comprises a nucleotide sequence that has at least 46% nucleotide sequence identity to nucleotides 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has 100% nucleotide sequence identity to nucleotides 27 - 50 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof.
4. A method according to any one of claims 1 to 3, wherein the first nucleotide sequence has at least 34, 40, 50, 60, 70, 80, 90, 95, 99% identity with SEQ ID NO:1.
5. A method according to claim 4, wherein the first nucleotide sequence comprises or consists of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, or SEQ ID NO:9.
6. A method according to any one of the claims 1-5, wherein the second nucleotide sequence encodes a heterologous protein or polypeptide.
7. A method according to any one of the claims 1-6, wherein the mammalian cell is a human cell.
8. A mammalian cell, wherein said cell comprises a nucleic acid construct comprising a first nucleotide sequence that has at least 34% nucleotide sequence identity to the nucleotide sequence of SEQ ID No. 1, operably linked to a second nucleotide sequence encoding a protein or polypeptide of interest and further operably linked to a heterologous promoter.
9. A mammalian cell according to claim 8, wherein the first nucleotide sequence comprises a nucleotide sequence that has at least 46% nucleotide sequence identity to nucleotides 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has at least 51% nucleotide sequence identity to nucleotides 4 - 76 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof.
10. A mammalian cell according to claim 9, wherein the first nucleotide sequence comprises a nucleotide sequence that has at least 46% nucleotide sequence identity to nucleotides 104 - 151 of the nucleotide sequence of SEQ ID No. 1 or a nucleotide sequence that has 100% nucleotide sequence identity to nucleotides 27 - 50 of the nucleotide sequence of SEQ ID No. 1 or a combination thereof.
11. A mammalian cell according to any one of the claims 8-10, wherein the second nucleotide sequence encodes a heterologous protein or polypeptide.
12. A mammalian cell according to any one of the claims 8-11, wherein the mammalian cell is a human cell.
13. A mammalian cell according to any one of claims 8 to 12, wherein the first nucleotide sequence comprises or consists of or having at least 34% nucleotide sequence identity to the SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, or SEQ ID NO:9.
14. A nucleotide molecule being represented by a sequence comprising or consisting of or having at least 34% nucleotide sequence identity to the SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, or SEQ ID NO:9.
15. A nucleic acid construct comprising a nucleotide molecule as defined in claim
PCT/NL2009/050354 2008-06-18 2009-06-18 Regulation of the expression of a protein in a mammalian cell WO2009154454A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP09766872A EP2288712A1 (en) 2008-06-18 2009-06-18 Regulation of the expression of a protein in a mammalian cell

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP08158460.9 2008-06-18
EP08158460 2008-06-18

Publications (1)

Publication Number Publication Date
WO2009154454A1 (en) 2009-12-23

Family

ID=40902092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2009/050354 WO2009154454A1 (en) 2008-06-18 2009-06-18 Regulation of the expression of a protein in a mammalian cell

Country Status (2)

Country Link
EP (1) EP2288712A1 (en)
WO (1) WO2009154454A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003031613A2 (en) * 2001-10-05 2003-04-17 Katholieke Universiteit Nijmegen Regulation of translation of heterologously expressed genes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HULZINK R J M ET AL: "The 5'-untranslated region of the ntp303 gene strongly enhances translation during pollen tube growth, but not during pollen maturation", PLANT PHYSIOLOGY, AMERICAN SOCIETY OF PLANT PHYSIOLOGISTS, ROCKVILLE, MD, US, vol. 129, no. 1, 1 May 2002 (2002-05-01), pages 342 - 353, XP002240205, ISSN: 0032-0889 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012044171A1 (en) 2010-10-01 2012-04-05 R1 B3 Holding B.V. Regulation of translation of expressed genes
US10876113B2 (en) 2010-10-01 2020-12-29 Proteonic Biotechnology IP B.V Regulation of translation of expressed genes
US11667921B2 (en) 2010-10-01 2023-06-06 Proteonic Biotechnology Ip B.V. Regulation of translation of expressed genes

Also Published As

Publication number Publication date
EP2288712A1 (en) 2011-03-02

Scarpulla et al. Cumulative Contents

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 09766872; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 2009766872; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE