AU2008202251A1 - A method for simultaneous production of multiple proteins; vectors and cells for use therein - Google Patents

A method for simultaneous production of multiple proteins; vectors and cells for use therein


Publication number
AU2008202251A1
Authority
AU
Australia
Prior art keywords
star
sequence
gene
expression
polypeptide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU2008202251A
Other versions
AU2008202251B2 (en
Inventor
Arthur Leo Kruckeberg
Arie Pieter Otte
Richard George Antonius Bernardus Sewalt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chromagenics BV
Original Assignee
Chromagenics BV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2003238719A external-priority patent/AU2003238719B2/en
Application filed by Chromagenics BV filed Critical Chromagenics BV
Priority to AU2008202251A priority Critical patent/AU2008202251B2/en
Publication of AU2008202251A1 publication Critical patent/AU2008202251A1/en
Priority to AU2011218621A priority patent/AU2011218621B2/en
Application granted granted Critical
Publication of AU2008202251B2 publication Critical patent/AU2008202251B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Landscapes

  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Description

P/00/01 I Regulation 3.2
AUSTRALIA
Patents Act 1990 COMPLETE SPECIFICATION STANDARD PATENT
(ORIGINAL)
Name of Applicant(s): Chromagenics of Plantage Muidergracht 12, Amsterdam NL-1018 TV, The Netherlands
Actual Inventor(s): Arie Pieter OTTE; Arthur Leo KRUCKEBERG; Richard George Antonius Bernardus SEWALT
Address for Service: DAVIES COLLISON CAVE, Patent Trademark Attorneys, of 1 Nicholson Street, Melbourne, 3000, Victoria, Australia Ph: 03 9254 2777 Fax: 03 9254 2770 Attorney Code: DM
Invention Title: "A method for simultaneous production of multiple proteins; vectors and cells for use therein"
The following statement is a full description of this invention, including the best method of performing it known to us:-
Title: A method for simultaneous production of multiple proteins; vectors and cells for use therein
This application is a divisional application of Australian Application No. 2003238719, the specification and drawings of which as originally filed are incorporated herein in their entirety by reference.
The invention relates to the fields of biochemistry, molecular biology, pharmacology and diagnosis. More specifically, the present invention relates to the production of proteins in a host cell. Even more specifically, the invention relates to a method for improving expression of two or more proteins in a (host) cell. The method is suited for production of, for example, recombinant antibodies that can be used in a pharmaceutical preparation or as a diagnostic tool.
Proteins are produced in systems for a wide range of applications in biology and biotechnology. These include research into cellular and molecular function, production of proteins as biopharmaceuticals or diagnostic reagents, and modification of the traits or phenotypes of livestock and crops. Biopharmaceuticals are usually proteins that have an extracellular function, such as antibodies for immunotherapy or hormones or cytokines for eliciting a cellular response. Proteins with extracellular functions exit the cell via the secretory pathway, and undergo post-translational modifications during secretion. The modifications (primarily glycosylation and disulfide bond formation) do not occur in bacteria. Moreover, the specific oligosaccharides attached to proteins by glycosylating enzymes are species and cell-type specific. These considerations often limit the choice of host cells for heterologous protein production to eukaryotic cells (Kaufman, 2000). For expression of human therapeutic proteins, host cells such as bacteria, yeast, or plants may be inappropriate. Even the subtle differences in protein glycosylation between rodents and humans, for example, can be sufficient to render proteins produced in rodent cells unacceptable for therapeutic use (Sheeley et al., 1997). The consequences of improper (i.e. non-human) glycosylation include immunogenicity, reduced functional half-life, and loss of activity. This limits the choice of host cells further, to human cell lines or to cell lines such as Chinese Hamster Ovary (CHO) cells, which may produce glycoproteins with human-like carbohydrate structures (Liu, 1992).
Some proteins of biotechnological interest are functional as multimers, i.e. they consist of two or more, possibly different, polypeptide chains in their biologically and/or biotechnologically active form. Examples include antibodies (Wright & Morrison, 1997), bone morphogenetic proteins (Groeneveld & Burger, 2000), nuclear hormone receptors (Aranda & Pascual, 2001), heterodimeric cell surface receptors (e.g. T cell receptors (Chan & Mak, 1989)), integrins (Hynes, 1999), and the glycoprotein hormone family (chorionic gonadotrophin, pituitary luteinizing hormone, follicle-stimulating hormone, and thyroid-stimulating hormone (Thotakura & Blithe, 1995)).
Production of such multimeric proteins in heterologous systems is technically difficult due to a number of limitations of current expression systems. These limitations include (1) difficulties in isolating recombinant cells/cell lines that produce the monomer polypeptides at high levels (predictability and yield), (2) difficulties in attaining production of the monomeric polypeptides in stoichiometrically balanced proportions (Kaufman, 2000), and (3) declines in the levels of expression during the industrial production cycle of the proteins (stability). These problems are described in more detail below.
Recombinant proteins such as antibodies that are used as therapeutic compounds need to be produced in large quantities. The host cells used for recombinant protein production must be compatible with the scale of the industrial processes that are employed. Specifically, the transgene (or the gene encoding a protein of interest, the two terms are used interchangeably herein) expression system used for the heterologous protein needs to be retained by the host cells in a stable and active form during the growth phases of scale-up and production. This is achieved by integration of the transgene into the genome of the host cell. However, creation of recombinant cell lines by conventional means is a costly and inefficient process due to the unpredictability of transgene expression among the recombinant host cells.
The unpredictability stems from the high likelihood that the transgene will become inactive due to gene silencing (McBurney et al., 2002). Using conventional technologies, the proportion of recombinant host cells that produce one polypeptide at high levels ranges from 1 to 2%. In order to construct a cell line that produces two polypeptides at high levels, the two transgenes are generally integrated independently. If the two transgenes are transfected simultaneously on two separate plasmids, the proportion of cells that will produce both polypeptides at high levels will be the arithmetic product of the proportions for single transgenes. Therefore the proportion of such recombinant cell lines ranges from one in 2,500 to one in 10,000. For multimeric proteins with three or more subunits, the proportions decline further. These high-producing cell lines must subsequently be identified and isolated from the rest of the population. The methods required to screen for these rare high-expressing cell lines are time-consuming and expensive.
An alternative to simultaneous transfection of two transgene-bearing plasmids is sequential transfection. In this case the proportion of high-yielding clones will be the sum of the proportions for single transgenes, i.e. 2-4%.
Sequential transfection, however, has major drawbacks, including high costs and poor stability. The high costs result from various factors: in particular, the time and resources required for screening for high-expressing cell lines are doubled, since high expression of each subunit must be screened for separately.
The poor overall stability of host cells expressing two polypeptides is a consequence of the inherent instability of each of the two transgenes.
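The proportions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a single-transgene success rate of 1 to 2% (the range implied by the stated figures of one in 2,500 to one in 10,000 and 2-4%); the function names are illustrative and are not taken from the specification.

```python
from fractions import Fraction


def co_transfection(p_single: Fraction) -> Fraction:
    """Two independently integrated transgenes transfected together:
    the text takes the arithmetic product of the single proportions."""
    return p_single * p_single


def sequential_transfection(p_single: Fraction) -> Fraction:
    """Sequential transfection: the text states the proportion as the
    sum of the single-transgene proportions."""
    return p_single + p_single


# Assumed single-transgene success rates of 1% and 2%.
for p in (Fraction(1, 100), Fraction(2, 100)):
    both = co_transfection(p)
    seq = sequential_transfection(p)
    print(f"single = {float(p):.0%}: co-transfection = 1 in {both.denominator}, "
          f"sequential = {float(seq):.0%}")
```

Exact fractions are used so that the "one in N" figures come out as whole numbers rather than rounded floats.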
Production of multimeric proteins requires balanced levels of transcriptional and translational expression of each of the polypeptide monomers. Imbalanced expression of the monomers is wasteful of the costly resources used in cell cultivation. Moreover, the imbalanced expression of one monomer can have deleterious effects on the cell. These effects include (a) sequestration of cellular factors required for secretion of the recombinant proteins (e.g. chaperones in the endoplasmic reticulum (Chevet et al., 2001)), and (b) induction of stress responses that result in reduced rates of growth and protein translation, or even in apoptosis (programmed cell death) (Pahl & Baeuerle, 1997, Patil & Walter, 2001). These deleterious effects lead to losses in productivity and yield and to higher overhead costs.
Silencing of transgene expression during prolonged host cell cultivation is a commonly observed phenomenon. In vertebrate cells it can be caused by formation of heterochromatin at the transgene locus, which prevents transcription of the transgene. Transgene silencing is stochastic; it can occur shortly after integration of the transgene into the genome, or only after a number of cell divisions. This results in heterogeneous cell populations after prolonged cultivation, in which some cells continue to express high levels of recombinant protein while others express low or undetectable levels of the protein (Martin & Whitelaw, 1996, McBurney et al., 2002). A cell line that is used for heterologous protein production is derived from a single cell, yet is often scaled up to, and maintained for long periods at, cell densities in excess of ten million cells per millilitre in cultivators of 1,000 litres or more. These large cell populations (10^14 to 10^16 cells) are prone to serious declines in productivity due to transgene silencing (Migliaccio et al., 2000, Strutzenberger et al., 1999).
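To give a feel for why stochastic silencing matters at production scale, the following toy model may help. It is only an illustrative sketch under assumptions that are not in the specification: every division is a synchronous doubling, each expressing lineage is silenced with a fixed, independent probability per division, and silencing is heritable and irreversible. The per-division probabilities are hypothetical values chosen for illustration.

```python
def divisions_to_reach(target_cells: int) -> int:
    """Smallest number of synchronous doublings that takes a single
    founder cell to at least target_cells cells."""
    return (target_cells - 1).bit_length()


def expressing_fraction(p_silence: float, divisions: int) -> float:
    """Expected fraction of cells still expressing the transgene after
    the given number of divisions, in the toy model described above."""
    return (1.0 - p_silence) ** divisions


# Lower end of the population range quoted in the text.
n_div = divisions_to_reach(10**14)
print(f"doublings needed from one cell: {n_div}")

# Hypothetical per-division silencing probabilities.
for p in (0.001, 0.01, 0.05):
    frac = expressing_fraction(p, n_div)
    print(f"p_silence = {p}: expected expressing fraction = {frac:.3f}")
```

Even in this simplified picture, a small per-division silencing probability compounds over the dozens of divisions that separate a single founder cell from a production-scale culture.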
The instability of expression of recombinant host cells is particularly severe when transgene copy numbers are amplified in an attempt to increase yields. Transgene amplification is achieved by including a selectable marker gene such as dihydrofolate reductase (DHFR) with the transgene during integration. Increased concentrations of the selection agent (in the case of DHFR, the drug methotrexate) select for cells that have amplified the number of DHFR genes in the chromosome. Since the transgene and DHFR are colocalized in the chromosome, the transgene copy number increases too. This is correlated with an increase in the yield of the heterologous protein (Kaufman, 1990). However, the tandem repeats of transgenes that result from amplification are highly susceptible to silencing (Garrick et al., 1998, Kaufman, 1990, McBurney et al., 2002). Silencing is often due to a decline in transgene copy number after the selection agent is removed (Kaufman, 1990).
Removal of the selection agent, however, is routine during industrial biopharmaceutical production, for two reasons. First, cultivation of cells at industrial scales in the presence of selection agents is not economically feasible, as the agents are expensive compounds. Second, and more importantly, concerns for product purity and safety preclude maintaining selection during a production cycle. Purifying a recombinant protein and removing all traces of the selection agent is necessary if the protein is intended for pharmaceutical use. However, it is technically difficult and prohibitively expensive to do so, and demonstrating that this has been achieved is also difficult and expensive. Therefore amplification-based transgenic systems that require the continual presence of selection agents are disadvantageous.
Alternatively, silencing can be due to epigenetic effects on the transgene tandem repeats, a phenomenon known as Repeat Induced Gene Silencing (RIGS) (Whitelaw et al., 2001). In these cases the copy number of the transgene is stable, and silencing occurs due to changes in the chromatin structure of the transgenes (McBurney et al., 2002). The presence of a selection agent during cell cultivation may be unable to prevent silencing of the transgene transcription unit because transgene expression is independent of expression of the selectable marker. The lack of a means to prevent RIGS in conventional transgenic systems thus results in costly losses in productivity.
The problems associated with conventional transgene expression technologies for protein production, and more specifically for multimeric protein production, clearly demonstrate a need in the art for a system that overcomes these problems. The present invention relates to a novel system for creating (host) cells/cell lines that efficiently express two or more proteins, for example two or more polypeptide monomers, and optionally produce functional multimeric proteins from them. Important examples of heterologous multimeric proteins are recombinant antibodies. In one embodiment the invention takes advantage of proprietary DNA elements that protect transgenes from silencing, termed STabilizing Anti-Repressor (STAR or STAR™; the terms will be used interchangeably herein) elements, for the production of two or more proteins.
The invention also discloses a novel configuration of transcriptional and translational elements and selectable marker genes. In one embodiment, the invention uses antibiotic resistance genes and protein translation initiation sites with reduced translation efficiency (for example an Internal Ribosome Binding Site, IRES) in novel ways that improve heterologous protein expression. The combination of the STAR elements and these other elements results in a system for obtaining a cell which expresses two or more proteins that predictably produces a high proportion of recombinant cell lines with high yields of heterologous proteins, exhibits balanced and proportional expression of two or more polypeptide monomers which are constituents of a multimeric protein, and creates recombinant cell lines with stable productivity characteristics.
Therefore, the invention provides in one embodiment, a method for obtaining a cell which expresses two or more proteins comprising providing said cell with two or more protein expression units encoding said two or more proteins, characterised in that at least two of said protein expression units comprise at least one STAR sequence.
The terms "cell"/"host cell" and "cell line"/"host cell line" are respectively typically defined as a eukaryotic cell and homogeneous populations thereof that are maintained in cell culture by methods known in the art, and that have the ability to express heterologous proteins.
The term "expression" is typically used to refer to the production of a specific RNA product or products, or a specific protein or proteins, in a cell. In the case of RNA products, it refers to the process of transcription. In the case of protein products, it refers to the processes of transcription, translation and optionally post-translational modifications. In the case of secreted proteins, it refers to the processes of transcription, translation, and optionally post-translational modification (e.g. glycosylation, disulfide bond formation, etc.), followed by secretion. In the case of multimeric proteins, it includes assembly of the multimeric structure from the polypeptide monomers. The corresponding verbs of the noun "expression" have an analogous meaning as said noun.
A protein is herein defined as being either (i) a product obtained by the processes of transcription and translation, and possibly but not necessarily said product is part of a multimeric protein (for example a subunit), and/or (ii) a product obtained by the processes of transcription, translation and post-translational modification. The term "multimer" or "multimeric protein" is typically defined as a protein that comprises two or more, possibly non-identical, polypeptide chains ("monomers"). The different monomers in a multimeric protein can be present in stoichiometrically equal or unequal numbers. In either case, the proportion of the monomers is usually fixed by the functional structure of the multimeric protein.
The term "protein expression unit" is herein defined as a unit capable of providing protein expression and typically comprises a functional promoter, an open reading frame encoding a protein of interest and a functional terminator, all in operable configuration. A functional promoter is a promoter that is capable of initiating transcription in a particular cell. Suitable promoters for obtaining expression in eukaryotic cells are the CMV promoter, a mammalian EF1-alpha promoter, a mammalian ubiquitin promoter, or an SV40 promoter. A functional terminator is a terminator that is capable of providing transcription termination. One example of a suitable terminator is an SV40 terminator. The term "an open reading frame encoding a protein of interest (or a transgene)" is typically defined as a fragment of DNA which codes for a specific RNA product or products or a specific protein or proteins, and which is optionally capable of becoming integrated into the genome of a host cell. It includes DNA elements required for proper transcription and translation of the coding region(s) of the transgene. Said DNA encoding said protein of interest/transgene can either be a DNA encoding a product obtained by the processes of transcription and translation (and possibly but not necessarily said product is part of a multimeric protein, for example a subunit) or a product obtained by the processes of transcription, translation and post-translational modification.
The terms "recombinant cell/host cell" and "recombinant cell line/host cell line" are respectively typically defined as a host cell and homogeneous populations thereof into which a transgene has been introduced for the purpose of producing a heterologous protein or proteins.
A STAR (STabilizing Anti-Repressor) sequence (or STAR element; the terms will be used interchangeably herein) is a naturally occurring DNA element that we have isolated from eukaryotic genomes on the basis of its ability to block transgene repression. Preferably, the STAR elements are recovered from the human genome. A STAR sequence comprises the capacity to influence transcription of genes in cis and/or provide a stabilizing and/or an enhancing effect. It has been demonstrated that when STAR elements flank transgenes, the transgene expression level of randomly selected recombinant cell lines can be increased to levels approaching the maximum potential expression of the transgene's promoter. Moreover, the expression level of the transgene is stable over many cell generations, and does not manifest stochastic silencing. Therefore, STAR sequences confer a degree of position-independent expression on transgenes that is not possible with conventional transgenic systems. The position independence means that transgenes that are integrated in genomic locations that would result in transgene silencing are, with the protection of STAR elements, maintained in a transcriptionally active state.
STAR sequences can be identified (as disclosed for example in example 1 of EP 01202581.3) using a method of detecting, and optionally selecting, a DNA sequence with a gene transcription-modulating quality, comprising providing a transcription system with a variety of fragment-comprising vectors, said vectors comprising i) an element with a gene-transcription repressing quality, and ii) a promoter directing transcription of a reporter gene, the method further comprising performing a selection step in said transcription system in order to identify said DNA sequence with said gene transcription modulating quality. Preferably, said fragments are located between i) said element with a gene-transcription repressing quality, and ii) said promoter directing transcription of said reporter gene. RNA polymerase initiates the transcription process after binding to a specific sequence, called the promoter, that signals where RNA synthesis should begin. A modulating quality can enhance transcription from said promoter in cis, in a given cell type and/or a given promoter. The same DNA sequence can comprise an enhancing quality in one type of cell or with one type of promoter, whereas it can comprise another or no gene transcription modulating quality in another cell or with another type of promoter. Transcription can be influenced through a direct effect of the regulatory element (or the protein(s) binding to it) on the transcription of a particular promoter. Transcription can, however, also be influenced by an indirect effect, for instance because the regulatory element affects the function of one or more other regulatory elements. A gene transcription modulating quality can also comprise a stable gene transcription quality. By stable is meant that the observed transcription level is not significantly changed over at least 30 cell divisions. A stable quality is useful in situations wherein expression characteristics should be predictable over many cell divisions.
Typical examples are cell lines transfected with foreign genes. Other examples are transgenic animals and plants and gene therapies.
Very often, introduced expression cassettes function differently after increasing numbers of cell divisions or plant or animal generations. Preferably, a stable quality comprises a capacity to maintain gene transcription in subsequent generations of a transgenic plant or animal. Of course, in case expression is inducible, said quality comprises the quality to maintain inducibility of expression in subsequent generations of a transgenic plant or animal. Frequently, expression levels drop dramatically with increasing numbers of cell divisions. With the herein described method for identification of a DNA sequence with a gene transcription modulating quality, it is possible to detect and optionally select a DNA sequence that is capable of at least in part preventing the dramatic drop in transcription levels with increasing numbers of cell divisions. Preferably, said gene transcription modulating quality comprises a stable gene transcription quality. Strikingly, fragments comprising a DNA sequence with said stable gene transcription quality can be detected and optionally selected with the method for identification of a DNA sequence with a gene transcription modulating quality, in spite of the fact that said method does not necessarily measure long term stability of transcription.
Preferably, said gene transcription modulating quality comprises a stable gene transcription enhancing quality. It has been observed that incorporation of a DNA sequence with a gene transcription modulating quality in an expression vector with a gene of interest results in a higher level of transcription of said gene of interest upon integration of the expression vector in the genome of a cell, and moreover that said higher gene expression level is also more stable than in the absence of said DNA sequence with a gene transcription modulating quality.
In experiments designed to introduce a gene of interest into the genome of a cell and to obtain expression of said gene of interest, the following has been observed. If together with said gene of interest also a DNA sequence with a gene transcription modulating quality was introduced, more clones could be detected that expressed more than a certain amount of gene product of said gene of interest, than when said DNA sequence was not introduced together with said gene of interest. Thus an identified DNA sequence with gene transcription modulating quality also provides a method for increasing the number of cells expressing more than a certain level of a gene product of a gene of interest upon providing said gene of interest to the genome of said cells, comprising providing said cell with a DNA sequence comprising a gene transcription modulating quality together with said gene of interest.
The chances of detecting a fragment with a gene transcription-modulating quality vary with the source from which the fragments are derived. Typically, there is no prior knowledge of the presence or absence of fragments with said quality. In those situations many fragments will not comprise a DNA sequence with a gene transcription-modulating quality. In these situations a formal selection step for DNA sequences with said quality is introduced. This is done by selecting vectors comprising said sequence on the basis of a feature of a product of said reporter gene that can be selected for or against. For instance, said gene product may induce fluorescence or a color deposit (e.g. green fluorescent protein and derivatives, luciferase, or alkaline phosphatase) or confer antibiotic resistance or induce apoptosis and cell death.
A method for the identification of a DNA sequence with a gene transcription modulating quality is particularly suited for detecting and optionally selecting a DNA sequence comprising a gene transcription-enhancing quality. It has been observed that at least some of the selected DNA sequences, when incorporated into an expression vector comprising a gene of interest, can dramatically increase gene transcription of said gene of interest in a host cell even when the vector does not comprise an element with a gene-transcription repressing quality. This gene transcription enhancing quality is very useful in cell lines transfected with foreign genes or in transgenic animals and plants.
Said transcription system can be a cell free in vitro transcription system. With the current expertise in automation such cell free systems can be accurate and quick. However, said transcription system preferably comprises host cells. Using host cells warrants that fragments are detected and optionally selected with activity in cells.
An element with a gene transcription repressing quality will repress transcription from a promoter in the transcription system used. Said repression does not have to lead to undetectable expression levels. What is important is that the difference in expression levels in the absence or presence of repression is detectable and optionally selectable. Preferably, said gene-transcription repression in said vectors results in gene-transcription repressing chromatin. Preferably, DNA sequences can be detected, and optionally selected, that are capable of at least in part counteracting the formation of gene-transcription repressing chromatin. In one aspect a DNA sequence capable of at least in part counteracting the formation of gene-transcription repressing chromatin comprises a stable gene transcription quality. Preferably, the DNA sequence involved in gene-transcription repression is a DNA sequence that is recognized by a protein complex and wherein said transcription system comprises said complex. Preferably said complex comprises a heterochromatin-binding protein comprising HP1, a Polycomb-group (Pc-G) protein, a histone deacetylase activity or MeCP2 (methyl-CpG-binding protein). Many organisms comprise one or more of these proteins. These proteins frequently exhibit activity in other species as well.
Said complex can thus also comprise proteins from two or more species. The mentioned set of known chromatin-associated protein complexes is able to convey long-range repression over many base pairs. The complexes are also involved in stably transferring the repressed status of genes to daughter cells upon cell division. Sequences selected in this way are able to convey long-range anti-repression over many base pairs (van der Vlag et al., 2000).
The vector used can be any vector that is suitable for cloning DNA and that can be used in a transcription system. When host cells are used it is preferred that the vector is an episomally replicating vector. In this way, effects due to different sites of integration of the vector are avoided. DNA elements flanking the vector at the site of integration can have effects on the level of transcription of the promoter and thereby mimic effects of fragments comprising DNA sequences with a gene transcription modulating quality. In a preferred embodiment said vector comprises a replication origin from the Epstein-Barr virus (EBV), OriP, and a nuclear antigen (EBNA-1). Such vectors are capable of replicating in many types of eukaryotic cells and assemble into chromatin under appropriate conditions.
DNA sequences with gene transcription modulating quality can be obtained from different sources, for example from a plant or vertebrate, or derivatives thereof, or a synthetic DNA sequence or one constructed by means of genetic engineering. Preferably said DNA sequence comprises a sequence as depicted in Table 3 and/or Figure 6 and/or a functional equivalent and/or a functional fragment thereof.
Several methods are available in the art to extract sequence identifiers from a family of DNA sequences sharing a certain common feature. Such sequence identifiers can subsequently be used to identify sequences that share one or more identifiers. Sequences sharing such one or more identifiers are likely to be members of the same family of sequences, i.e. are likely to share the common feature of the family. Herein, a large number of sequences comprising STAR activity (so-called STAR sequences or STAR elements) were used to obtain sequence identifiers (patterns) which are characteristic for sequences comprising STAR activity. These patterns can be used to determine whether a test sequence is likely to contain STAR activity. A method for detecting the presence of a STAR sequence within a nucleic acid sequence of about 50-5000 base pairs is thus herein provided, comprising determining the frequency of occurrence in said sequence of at least one sequence pattern and determining that said frequency of occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least one sequence comprising a STAR sequence. In principle any method is suited for determining whether a sequence pattern is representative of a STAR sequence.
Many different methods are available in the art. Preferably, the step of determining that said occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least one sequence comprising a STAR sequence comprises determining that the frequency of occurrence of said at least one sequence pattern significantly differs between said at least one STAR sequence and at least one control sequence. In principle any significant difference is discriminative for the presence of a STAR sequence. However, in a particularly preferred embodiment the frequency of occurrence of said at least one sequence pattern is significantly higher in said at least one sequence comprising a STAR sequence compared to said at least one control sequence.
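The notion of "frequency of occurrence" of a sequence pattern can be made concrete with a short sketch. The code below is only an illustration under stated assumptions: patterns are treated as literal oligonucleotide strings, overlapping matches are counted, and frequency is normalised per 1,000 bases. The specification does not prescribe any of these choices, and the function names are hypothetical.

```python
def count_occurrences(sequence: str, pattern: str) -> int:
    """Count occurrences of pattern in sequence, including overlapping
    ones, ignoring case."""
    seq, pat = sequence.upper(), pattern.upper()
    count = 0
    start = seq.find(pat)
    while start != -1:
        count += 1
        # Advance by one base so that overlapping matches are found.
        start = seq.find(pat, start + 1)
    return count


def frequency_per_kb(sequence: str, pattern: str) -> float:
    """Occurrences of pattern per 1,000 bases of sequence."""
    if not sequence:
        return 0.0
    return 1000.0 * count_occurrences(sequence, pattern) / len(sequence)


# Made-up sequences for illustration only; not real STAR elements.
test_seq = "ACGT" * 250
control_seq = "AATT" * 250
print(frequency_per_kb(test_seq, "CG"), frequency_per_kb(control_seq, "CG"))
```

Whether a difference between two such frequencies is significant still has to be established with a suitable statistical test over many sequences, as the text goes on to describe.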
00 As described above, a considerable number of sequences comprising a i STAR sequence have been identified herein. It is possible to use these sequences to test how efficient a pattern is in discriminating between a control sequence and a sequence comprising a STAR sequence. Using so-called discriminant analysis it is possible to determine on the basis of any set of STAR sequences in a species, the most optimal discriminative sequence patters or combination thereof. Thus, preferably, at least one of said patterns is selected on the basis of optimal discrimination between said at least one sequence comprising a STAR sequence and a control sequence.
Preferably, the frequency of occurrence of a sequence pattern in a test nucleic acid is compared with the frequency of occurrence in a sequence known to contain a STAR sequence. In this case a pattern is considered representative for a sequence comprising a STAR sequence if the frequencies of occurrence are similar. Even more preferably, another criterion is used. The frequency of occurrence of a pattern in a sequence comprising a STAR sequence is compared to the frequency of occurrence of said pattern in a control sequence.
By comparing the two frequencies it is possible to determine for each pattern thus analysed, whether the frequency in the sequence comprising the STAR sequence is significantly different from the frequency in the control sequence.
Then a sequence pattern is considered to be representative of a sequence comprising a STAR sequence, if the frequency of occurrence of the pattern in at least one sequence comprising a STAR sequence is significantly different from the frequency of occurrence of the same pattern in a control sequence. By using larger numbers of sequences comprising a STAR sequence the number of patterns for which a statistical difference can be established increases, thus enlarging the number of patterns for which the frequency of occurrence is representative for a sequence comprising a STAR sequence. Preferably said frequency of occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least 2 sequences comprising a STAR sequence, more preferably in at least 5 sequences comprising a STAR sequence, more preferably in at least 10 sequences comprising a STAR sequence. More preferably, said frequency of occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least 20 sequences comprising a STAR sequence. Particularly preferred, said frequency of occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least 50 sequences comprising a STAR sequence.
The patterns that are indicative for a sequence comprising a STAR sequence are also dependent on the type of control nucleic acid used. The type of control sequence used is preferably selected on the basis of the sequence in which the presence of a STAR sequence is to be detected. Preferably, said control sequence comprises a random sequence comprising a similar AT/CG content as said at least one sequence comprising a STAR sequence. Even more preferably, the control sequence is derived from the same species as said sequence comprising said STAR sequence.
For instance, if a test sequence is scrutinized for the presence of a STAR sequence, active in a plant cell, then preferably the control sequence is also derived from a plant cell. Similarly, for testing for STAR activity in a human cell, the control nucleic acid is preferably also derived from a human genome. Preferably, the control sequence comprises between 50% and 150% of the bases of said at least one sequence comprising a STAR sequence. Particularly preferred, said control sequence comprises between 90% and 110% of the bases of said at least one sequence comprising a STAR sequence. More preferably, between 95% and 105%.
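One of the control options named above, a random sequence with a similar AT/CG content and a comparable length, can be sketched as follows. This is only an illustration: the function names, the `length_factor` parameter and the fixed seed are assumptions made for the example, and a control derived from the genome of the same species remains the preferred choice.

```python
import random

def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def random_control(star_seq, length_factor=1.0, seed=0):
    """Random sequence whose expected GC content matches star_seq.

    length_factor scales the control length relative to star_seq;
    values between 0.9 and 1.1 fall in the particularly preferred range.
    (Hypothetical helper, not part of the disclosed method.)
    """
    rng = random.Random(seed)
    gc = gc_fraction(star_seq)
    n = round(len(star_seq) * length_factor)
    # Each base is G or C with probability gc, otherwise A or T.
    return "".join(
        rng.choice("GC") if rng.random() < gc else rng.choice("AT")
        for _ in range(n)
    )
```

Because the bases are drawn independently, the GC content of a short control only approximates that of the STAR sequence; longer controls match it more closely.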
A pattern can comprise any number of bases larger than two.
Preferably, at least one sequence pattern comprises at least 5, more preferably at least 6 bases. Even more preferably, at least one sequence pattern comprises at least 8 bases. Preferably, said at least one sequence pattern comprises a pattern listed in table 4 and/or table 5. A pattern may consist of a consecutive list of bases. However, the pattern may also comprise bases that are interrupted one or more times by a number of bases that are not or only partly discriminative. A partly discriminative base is for instance indicated as a purine.
Preferably, the presence of STAR activity is verified using a functional assay. Several methods are presented herein to determine whether a sequence comprises STAR activity. STAR activity is confirmed if the sequence is capable of performing at least one of the following functions: (i) at least in part inhibiting the effect of a sequence comprising a gene transcription repressing element of the invention, (ii) at least in part blocking chromatin-associated repression, (iii) at least in part blocking activity of an enhancer, (iv) conferring upon an operably linked nucleic acid encoding a transcription unit, compared to the same nucleic acid alone, (iv-a) a higher predictability of transcription, (iv-b) a higher transcription, and/or (iv-c) a higher stability of transcription over time.
The large number of sequences comprising STAR activity identified herein opens up a wide variety of possibilities to generate and identify sequences comprising the same activity in kind, not necessarily in amount. For instance, it is well within the reach of a skilled person to alter the sequences identified herein and test the altered sequence for STAR activity. Such altered sequences are therefore also included herein and can be used in a method for obtaining a cell which expresses two or more proteins or in a method for identifying a cell wherein expression of two or more proteins is in a predetermined ratio. Alteration can include deletion, insertion and mutation of one or more bases in the sequences.
Sequences comprising STAR activity were identified in stretches of 400 bases. However, it is expected that not all of these 400 bases are required to retain STAR activity. Methods to delimit the sequences that confer a certain property to a fragment of between 400 and 5000 bases are well known. The minimal sequence length of a fragment comprising STAR activity is estimated to be about 50 bases.
Table 4 and table 5 list patterns of 6 bases that have been found to be over-represented in nucleic acid molecules comprising STAR activity. This over-representation is considered to be representative for a STAR sequence. The tables were generated for a family of 65 STAR sequences. Similar tables can be generated starting from a different set of STAR sequences, or from a smaller or larger set of STAR sequences. A pattern is representative for a STAR sequence if it is over-represented in said STAR sequence compared to a sequence not comprising a STAR element. This can be a random sequence. However, to exclude a non-relevant bias, the sequence comprising a STAR sequence is preferably compared to a genome or a significant part thereof. Preferably a genome of a vertebrate or plant, more preferably a human genome. A significant part of a genome is for instance a chromosome. Preferably the sequence comprising a STAR sequence and said control sequence are derived from nucleic acid of the same species.
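A table of over-represented 6-base patterns of the kind described here could, for illustration, be derived along the following lines. The sketch counts every overlapping window of `k` bases and reports the ratio of the frequency in the STAR set to the frequency in the control set. The function names and the `pseudocount` (used only to avoid division by zero for patterns absent from the control) are assumptions of this example and do not describe how tables 4 and 5 were actually produced.

```python
from collections import Counter

def count_patterns(sequences, k=6):
    """Count every overlapping k-base pattern across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def over_representation(star_seqs, control_seqs, k=6, pseudocount=1.0):
    """Ratio of pattern frequency in STAR sequences to that in controls.

    A ratio above 1 means the pattern is over-represented in the STAR set.
    (Hypothetical scoring rule chosen for this sketch.)
    """
    star = count_patterns(star_seqs, k)
    ctrl = count_patterns(control_seqs, k)
    star_total = sum(star.values())
    ctrl_total = sum(ctrl.values())
    ratios = {}
    for pattern, n in star.items():
        f_star = n / star_total
        f_ctrl = (ctrl[pattern] + pseudocount) / (ctrl_total + pseudocount)
        ratios[pattern] = f_star / f_ctrl
    return ratios
```

Sorting the returned dictionary by ratio gives a ranked list of candidate patterns; whether a given ratio is meaningful still has to be judged statistically against a suitable control such as a chromosome of the same species.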
The more STAR sequences are used for the determination of the frequency of occurrence of sequence patterns, the more representative for STARs the patterns are that are over- or under-represented. Considering that many of the functional features that can be displayed by nucleic acids are mediated by proteinaceous molecules binding to them, it is preferred that the representative pattern is over-represented in the STAR sequences. Such an over-represented pattern can be part of a binding site for such a proteinaceous molecule. Preferably said frequency of occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least 2 sequences comprising a STAR sequence, more preferably in at least 5 sequences comprising a STAR sequence, more preferably in at least 10 sequences comprising a STAR sequence. More preferably, said frequency of occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least 20 sequences comprising a STAR sequence. Particularly preferred, said frequency of occurrence is representative of the frequency of occurrence of said at least one sequence pattern in at least 50 sequences comprising a STAR. Preferably, said sequences comprising a STAR sequence comprise at least one of the sequences depicted in figure 6.
STAR activity is a feature shared by the sequences listed in figure 6. However, this does not mean that they must all share the same identifier sequence. It is very well possible that different identifiers exist. Identifiers may confer this common feature onto a fragment containing it, though this is not necessarily so.
By using more sequences comprising STAR activity for determining the frequency of occurrence of a sequence pattern or patterns, it is possible to select patterns that are more often than others present or absent in such a STAR sequence. In this way it is possible to find patterns that are very frequently over- or under-represented in STAR sequences. Frequently over- or under-represented patterns are more likely to identify candidate STAR sequences in test sets. Another way of using a set of over- or under-represented patterns is to determine which pattern or combination of patterns is best suited to identify a STAR in a sequence. Using so-called discriminant statistics we have identified a set of patterns which performs best in identifying a sequence comprising a STAR element. Preferably, at least one of said sequence patterns for detecting a STAR sequence comprises a sequence pattern GGACCC, CCCTGC, AAGCCC, CCCCCA and/or AGCACC. Preferably, at least one of said sequence patterns for detecting a STAR sequence comprises a sequence pattern CCCN{16}AGC, GGCN{9}GAC, CACN{13}AGG, and/or CTGN{4}GCC.
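The two kinds of pattern listed above, contiguous hexamers and pairs of trimers separated by a fixed gap, can both be searched for with a regular expression. The sketch below assumes that `N{16}` denotes exactly 16 arbitrary bases, which is the usual reading of this notation; the helper names are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Translate a pattern such as 'CCCN{16}AGC' into a compiled regex.

    N is taken to mean any base (an assumption of this sketch); the
    lookahead wrapper lets overlapping occurrences be counted.
    """
    body = pattern.upper().replace("N", "[ACGT]")
    return re.compile("(?=" + body + ")")

def count_occurrences(pattern, seq):
    """Number of (possibly overlapping) occurrences of pattern in seq."""
    return sum(1 for _ in pattern_to_regex(pattern).finditer(seq.upper()))

# Patterns named in the text above.
HEXAMERS = ["GGACCC", "CCCTGC", "AAGCCC", "CCCCCA", "AGCACC"]
DYADS = ["CCCN{16}AGC", "GGCN{9}GAC", "CACN{13}AGG", "CTGN{4}GCC"]
```

For a test sequence `s`, `{p: count_occurrences(p, s) for p in HEXAMERS + DYADS}` gives the raw counts, which can then be divided by the sequence length to obtain frequencies comparable between sequences.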
A list of STAR sequences can also be used to determine one or more consensus sequences therein. A consensus sequence for a STAR element is therefore also provided herein. This consensus sequence can of course be used to identify candidate STAR elements in a test sequence.
Moreover, once a sequence comprising a STAR element has been identified in a vertebrate it can be used by means of sequence homology to identify sequences comprising a STAR element in other vertebrate species. Preferably a mammalian STAR sequence is used to screen for STAR sequences in other mammalian species. Similarly, once a STAR sequence has been identified in a plant species it can be used to screen for homologous sequences with similar function in other plant species. STAR sequences obtainable by a method as described herein are thus provided.
Further provided is a collection of STAR sequences. Preferably said STAR sequence is a vertebrate or plant STAR sequence. More preferably, said STAR sequence is a mammalian STAR sequence or an angiosperm STAR sequence (from a monocot, such as rice, or a dicot, such as Arabidopsis). More preferably, said STAR sequence is a primate and/or human STAR sequence.
A list of sequences comprising STAR activity can be used to determine whether a test sequence comprises a STAR element. There are, as mentioned above, many different methods for using such a list for this purpose.
Preferably, a method is provided for determining whether a nucleic acid sequence of about 50-5000 base pairs comprises a STAR sequence, said method comprising generating a first table of sequence patterns comprising the frequency of occurrence of said patterns in a collection of STAR sequences of the invention, generating a second table of said patterns comprising the frequency of occurrence of said patterns in at least one reference sequence, selecting at least one pattern of which said frequency of occurrence differs between the two tables, determining, within said nucleic acid sequence of about 50-5000 base pairs, the frequency of occurrence of at least one of said selected patterns, and determining whether the occurrence in said test nucleic acid is representative of the occurrence of said selected pattern in said collection of STAR sequences. Alternatively, said determining comprises determining whether the frequency of occurrence in said test nucleic acid is representative of the frequency of occurrence of said selected pattern in said collection of STAR sequences. Preferably said method further comprises determining whether said candidate STAR comprises a gene transcription modulating quality using a method described herein. Preferably, said collection of STARs comprises a sequence as depicted in figure 6.
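The steps of this method (two frequency tables, selection of patterns that differ, scoring of a test sequence) can be sketched as follows. Several choices here are assumptions made only for illustration: the two-proportion z statistic as the test of difference, the threshold `z_min`, the restriction to patterns that are more frequent in the STAR table, and the majority vote used to decide whether the test sequence is "representative". None of these is prescribed by the text.

```python
import math
from collections import Counter

def pattern_table(sequences, k=6):
    """Return (pattern counts, total number of windows) for k-base patterns."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts, sum(counts.values())

def z_score(x1, n1, x2, n2):
    """Two-proportion z statistic for x1/n1 versus x2/n2."""
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (x1 / n1 - x2 / n2) / se

def select_patterns(star_seqs, reference_seqs, k=6, z_min=3.0):
    """Patterns significantly more frequent in the STAR table (first table)
    than in the reference table (second table)."""
    star, n_star = pattern_table(star_seqs, k)
    ref, n_ref = pattern_table(reference_seqs, k)
    if n_star == 0 or n_ref == 0:
        return {}
    return {
        pat: (star[pat] / n_star, ref[pat] / n_ref)
        for pat in star
        if z_score(star[pat], n_star, ref[pat], n_ref) >= z_min
    }

def looks_like_star(test_seq, selected, k=6):
    """True if, for most selected patterns, the test frequency lies closer
    to the STAR frequency than to the reference frequency."""
    test, n_test = pattern_table([test_seq], k)
    if not selected or n_test == 0:
        return False
    votes = sum(
        1
        for pat, (f_star, f_ref) in selected.items()
        if abs(test[pat] / n_test - f_star) < abs(test[pat] / n_test - f_ref)
    )
    return votes > len(selected) / 2
```

A sequence flagged by `looks_like_star` is only a candidate; as stated above, STAR activity is preferably confirmed with a functional assay.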
Now that multiple methods are disclosed for obtaining a STAR sequence, it is clear that we also provide an isolated and/or recombinant nucleic acid sequence comprising a STAR sequence obtainable by a method as described herein.
A STAR sequence can exert its activity in a directional way, i.e. more to one side of the fragment containing it than to the other. Moreover, STAR activity can be amplified in amount by multiplying the number of STAR elements. The latter suggests that a STAR element may comprise one or more elements comprising STAR activity. Another way of identifying a sequence capable of conferring STAR activity on a fragment containing it comprises selecting from a vertebrate or plant sequence a sequence comprising STAR activity and identifying whether sequences flanking the selected sequence are conserved in another species. Such conserved flanking sequences are likely to be functional sequences. Such a method for identifying a sequence comprising a STAR element comprises selecting a sequence of about 50 to 5000 base pairs from a vertebrate or plant species comprising a STAR element and identifying whether sequences flanking said selected sequence in said species are conserved in at least one other species. We further provide a method for detecting the presence of a STAR sequence within a nucleic acid sequence of about 50-5000 base pairs, comprising identifying a sequence comprising a STAR sequence in a part of a chromosome of a cell of a species and detecting significant homology between said sequence and a sequence of a chromosome of a different species. The STAR in said different species is thus identified.
Preferably, said species comprises a plant or vertebrate species, preferably a mammalian species. We also provide a method for detecting the presence of a STAR element within a nucleic acid sequence of about 50-5000 base pairs of a vertebrate or plant species, comprising identifying whether a flanking sequence of said nucleic acid sequence is conserved in at least one other species.
It is important to note that methods as disclosed herein for detecting the presence of a sequence comprising a STAR sequence using bioinformatical information are iterative in nature. The more sequences comprising a STAR sequence are identified with a method as described herein, the more patterns are found to be discriminative between a sequence comprising a STAR sequence and a control sequence. Using these newly found discriminative patterns more sequences comprising a STAR sequence can be identified, which in turn enlarge the set of patterns that can discriminate, and so on. This iterative aspect is an important aspect of methods provided herein.
The term quality in relation to a sequence refers to an activity of said sequence. The term STAR, STAR sequence or STAR element, as used herein, refers to a DNA sequence comprising one or more of the mentioned gene transcription modulating qualities. The term "DNA sequence" as used herein does, unless otherwise specified, not refer to a listing of specific ordering of bases but rather to a physical piece of DNA. A transcription quality with reference to a DNA sequence refers to an effect that said DNA sequence has on transcription of a gene of interest. "Quality" as used herein refers to detectable properties or attributes of a nucleic acid or protein in a transcription system.
The present invention provides, amongst others, a method for obtaining a cell which expresses two or more proteins, a method for identifying a cell wherein expression of two or more proteins is in a predetermined ratio and a protein expression unit. It is clear that in all these embodiments the above-described obtainable STAR sequences can be used. For example, a STAR sequence from figure 6, table 3, table 4, table 5 or combinations thereof. More preferably, said STAR sequence is a vertebrate STAR sequence or a plant STAR sequence. Even more preferably, said vertebrate STAR sequence is a human STAR sequence. It is furthermore preferred to use a STAR sequence from a species from which a gene of interest is expressed. For example, when one would like to express two or more proteins and one of the proteins is a human protein, one preferably includes a human STAR sequence for the expression of said human protein.
As outlined above the STAR elements flanking an expression unit are the basis of the stable expression of the monomer transgenes over many cell generations. We have demonstrated that STAR elements can protect individual transgenes from silencing. In the present invention that capability is extended to more than one expression unit introduced (preferentially) independently in a recombinant host cell. Expression units that are not flanked by STAR elements can undergo significant silencing after only 5-10 culture passages, during which time silencing of the STAR protected units is negligible.
The advantages of a method for obtaining a cell which expresses two or more proteins comprising providing said cell with two or more protein expression units encoding said two or more proteins, characterised in that at least two of said protein expression units comprise at least one STAR sequence, are multifold.
The present invention uses STAR sequences for the production of two or more proteins and thereby the invention provides an increased predictability in the creation of recombinant cell lines that efficiently produce the heterologous multimeric proteins of interest, an increased yield of the heterologous multimeric proteins, stable expression of the heterologous multimeric proteins, even during prolonged cultivation in the absence of selection agent, and the invention also provides favorable transgene expression characteristics without amplification of the transgene. The increased yield of heterologous proteins provided by the invention may be obtained at low transgene copy numbers, without selective co-amplification using, for example, the DHFR/methotrexate system. This results in greater stability, since the transgene copy number is low and is not susceptible to decrease due to recombination (McBurney et al., 2002) or repeat-induced gene silencing (Garrick et al., 1998). Fifth, the broad applicability of the method of the invention includes its utility in a wide range of host cell lines. This is for example useful/desirable when a particular multimeric protein is preferably expressed by a particular host cell line (e.g. expression of antibodies from lymphocyte-derived host cell lines).
A method according to the invention therefore provides an improvement of expression of two or more proteins in a (host) cell.
In another embodiment the invention provides a method for identifying a cell wherein expression of two or more proteins is in a predetermined ratio comprising providing a collection of cells with two or more protein expression units encoding said two or more proteins, selecting cells which express said two or more proteins, and identifying from the obtained selection, cells that express said two or more proteins in said predetermined ratio, characterised in that at least two of said protein expression units comprise at least one STAR sequence.
The selection of cells which express said two or more proteins may for example be obtained by performing an SDS-PAGE analysis, a Western blot analysis or an ELISA, which are all techniques which are known by a person skilled in the art and therefore need no further elaboration. The identification of cells that express said two or more proteins in said predetermined ratio can also be performed by these techniques.
The presence of a STAR sequence in at least two of said protein expression units, again, provides the desired predictability, yield, stability and stoichiometrically balanced availability of the two or more proteins.
Especially when polypeptides of a multimeric protein are produced according to a method of the invention it is desirable to provide the required monomers/subunits in a ratio that is relevant for the formation of said multimeric protein. Hence, preferably said monomers/subunits are produced in a biologically relevant balanced ratio. If, for example, a multimeric protein consists of two subunits A and one subunit B, it is desired to produce two subunits A for every subunit of B that is produced. Hence, a predetermined ratio is herein defined as the naturally occurring ratio (stoichiometry) of the different subunits/monomers/polypeptides which comprise a multimeric protein.
In a more preferred embodiment a cell obtainable according to a method of the invention expresses two proteins. For example, two proteins which together provide a therapeutically advantageous effect. In an even more preferred embodiment the predetermined ratio of the two expressed proteins is 1:1. This is for example useful in the production of multimeric proteins in which the monomers are in a 1:1 ratio. Typical examples are antibodies that comprise two heavy chains and two light chains.
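Identifying clones that express two proteins in a predetermined ratio amounts to comparing a measured ratio with the target stoichiometry. The sketch below assumes that molar amounts of the two proteins are available per clone (for instance derived from an ELISA); the function name, the data layout and the 20% relative tolerance are hypothetical choices for the example.

```python
def clones_in_ratio(measurements, target=(1, 1), tolerance=0.2):
    """Return clone names whose measured A:B ratio lies within a relative
    tolerance of the target ratio.

    measurements maps clone name -> (amount of protein A, amount of protein B).
    The tolerance is an illustrative assumption, not a value from the text.
    """
    wanted = target[0] / target[1]
    selected = []
    for clone, (a, b) in measurements.items():
        if a <= 0 or b <= 0:
            continue  # clone does not express both proteins
        ratio = a / b
        if abs(ratio - wanted) <= tolerance * wanted:
            selected.append(clone)
    return selected
```

With `target=(1, 1)` this corresponds to the antibody case of equal numbers of heavy and light chains; with `target=(2, 1)` it corresponds to the example of two subunits A for every subunit B.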
Preferably, the invention provides a method, wherein said two or more protein expression units further encode at least two different selection markers, and wherein said method further comprises a two-step selection marker screening on said cell, wherein said cell is selected in a first step on the presence of a first selection marker and in a second step on the presence of a second selection marker.
In this embodiment of the invention a two-stage antibiotic selection is used which regime results in a high proportion of isolates that express for example transgenes 1 and 2 at high levels; the first stage of selection eliminates cells that do not contain the expression unit or units, and the second stage of selection eliminates colonies that do not transcribe both bicistronic mRNAs at high levels. This regime is one of the aspects for the increased frequency of multimer-expressing recombinant cell lines achieved by the invention compared to conventional methods. As described herein, it results in an increase in the frequency of expressor lines by more than ten-fold.
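The logic of the two-stage regime can be illustrated as two successive filters over a population of cells. In this sketch each cell is a dictionary of marker expression levels; the marker names, the numeric levels and the `threshold` are hypothetical, and real selection is of course performed with antibiotics in culture rather than on recorded values.

```python
def two_step_selection(cells, first_marker, second_markers, threshold):
    """Apply the two selection stages in order and return both survivor lists.

    cells: list of dicts mapping marker name -> expression level (0 if absent).
    Stage 1 keeps cells with any expression of first_marker, i.e. cells
    that contain the expression unit or units.
    Stage 2 keeps cells whose level of every marker in second_markers
    reaches threshold, i.e. cells transcribing all bicistronic mRNAs highly.
    """
    stage1 = [c for c in cells if c.get(first_marker, 0) > 0]
    stage2 = [
        c for c in stage1
        if all(c.get(m, 0) >= threshold for m in second_markers)
    ]
    return stage1, stage2
```

The second filter uses `all(...)` because a colony must express both bicistronic genes at high level to survive; a cell that is high for one marker and low for the other is removed.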
In another embodiment the invention provides a method wherein at least one of said protein expression units comprises a monocistronic gene comprising an open reading frame encoding a protein of interest and wherein said monocistronic gene is under control of a functional promoter.
In yet another embodiment the invention provides a method according to the invention, wherein at least one of said protein expression units comprises a bicistronic gene comprising an open reading frame encoding a protein of interest, a protein translation initiation site with a reduced translation efficiency, a selection marker and wherein said bicistronic gene is under control of a functional promoter.
In a more preferred embodiment the invention provides a method according to the invention, wherein at least one of said protein expression units comprises a bicistronic gene comprising an open reading frame encoding a protein of interest, a protein translation initiation site with a reduced translation efficiency, a selection marker and wherein said bicistronic gene is under control of a functional promoter, which protein expression unit further comprises a monocistronic gene comprising an open reading frame encoding a second selection marker and wherein said monocistronic gene is under control of a functional promoter.
The term "bicistronic gene" is typically defined as a gene capable of providing an RNA molecule that encodes two proteins/polypeptides.
The term "monocistronic gene" is typically defined as a gene capable of providing an RNA molecule that encodes one protein/polypeptide.
The term "selection marker or selectable marker" is typically used to refer to a gene and/or protein whose presence can be detected directly or indirectly in a cell, for example a gene and/or a protein that inactivates a selection agent and protects the host cell from the agent's lethal or growth-inhibitory effects (e.g. an antibiotic resistance gene and/or protein). Another possibility is that said selection marker induces fluorescence or a color deposit (e.g. green fluorescent protein and derivatives, luciferase, or alkaline phosphatase).
The term "selection agent" is typically defined as a chemical compound that is able to kill or retard the growth of host cells (e.g. an antibiotic).
The term "selection" is typically defined as the process of using a selection marker/selectable marker and a selection agent to identify host cells with specific genetic properties (e.g. that the host cell contains a transgene integrated into its genome).
The nouns "clone" and "isolate" typically refer to a recombinant host cell line that has been identified and isolated by means of selection.
The improvements provided by a method according to the invention have three integrated aspects. With existing systems, recombinant cell lines that simultaneously express acceptable quantities of the monomers of multimeric proteins can be created only at very low frequencies; the present invention increases the predictability of creating high-yielding recombinant host cell lines by a factor of ten or more. Existing systems do not provide stoichiometrically balanced and proportional amounts of the subunits of multimeric proteins; the present invention ensures that the expression levels of the subunits will be balanced and proportional. Existing systems do not provide a means of protecting the transgenes that encode the protein subunits from transgene silencing.
FIG 1 provides a non-limiting schematic representation of one of the embodiments of this part of the invention. FIG 1A and FIG 1B show two separate protein expression units. This is the configuration of the DNA elements of the expression units in the plasmid as well as after integration into the genome. Expression unit one is shown in FIG 1A. It contains an open reading frame for a transgene (a reporter gene or subunit 1 of a multimeric protein (TG S1, transgene subunit 1)). This is upstream of the attenuated EMCV IRES, and of the open reading frame encoding the zeocin resistance selectable marker protein (zeo). This bicistronic transgene is transcribed at high levels from the CMV promoter. Next to this is the neomycin resistance selectable marker (neo; this confers resistance to the antibiotic G418 as well), transcribed as a monocistronic mRNA from the SV40 promoter. These two genes are flanked by STAR elements. In FIG 1B a similar expression unit is depicted. It consists of a second transgene (a second reporter gene or the open reading frame for subunit 2 of a heterodimeric protein (TG S2)) upstream of the attenuated EMCV IRES and the blasticidin selectable marker open reading frame (bsd). This bicistronic transgene is transcribed at high levels from the CMV promoter. Next to this is the neo selectable marker, transcribed as a monocistronic mRNA from the SV40 promoter. The two genes in the second expression unit are flanked by STAR elements as well.
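The arrangement of elements described for FIG 1A and FIG 1B can be written down as a small data structure, which makes the order of promoter, open reading frames, IRES and flanking STAR elements explicit. The class names and the text layout produced by `layout()` are inventions of this sketch; only the element names and their order are taken from the description above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gene:
    promoter: str
    orfs: List[str]   # open reading frames in 5' to 3' order
    ires: str = ""    # element between the two ORFs of a bicistronic gene

    @property
    def is_bicistronic(self):
        return len(self.orfs) == 2

@dataclass
class ExpressionUnit:
    name: str
    genes: List[Gene]
    star_flanked: bool = True

    def layout(self):
        """Linear text description of the unit, STAR elements included."""
        parts = []
        for g in self.genes:
            joiner = " - " + g.ires + " - " if g.ires else " - "
            parts.append(g.promoter + " > " + joiner.join(g.orfs))
        core = " | ".join(parts)
        return "STAR | " + core + " | STAR" if self.star_flanked else core

# The two units as described for FIG 1A and FIG 1B.
unit_a = ExpressionUnit("FIG 1A", [
    Gene("CMV", ["TG S1", "zeo"], ires="attenuated EMCV IRES"),
    Gene("SV40", ["neo"]),
])
unit_b = ExpressionUnit("FIG 1B", [
    Gene("CMV", ["TG S2", "bsd"], ires="attenuated EMCV IRES"),
    Gene("SV40", ["neo"]),
])
```

Calling `unit_a.layout()` returns a single line listing the elements from one flanking STAR element to the other, which is a convenient way to compare alternative configurations such as a unit without the monocistronic gene.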
It is clear to a person skilled in the art that the possible combinations of selection markers are numerous. Examples of possible antibiotic combinations are provided above. The one antibiotic that is particularly advantageous is zeocin, because the zeocin-resistance protein (zeocin-R) acts by binding the drug and rendering it harmless. Therefore it is easy to titrate the amount of drug that kills cells with low levels of zeocin-R expression, while allowing the high-expressors to survive. All other antibiotic-resistance proteins in common use are enzymes, and thus act catalytically (not 1:1 with the drug).
When a two-step selection is performed it is therefore advantageous to use an antibiotic resistance protein with this 1:1 binding mode of action. Hence, the antibiotic zeocin is a preferred selection marker. For convenience the zeocin antibiotic is in a two-step selection method combined with puromycin-R or blasticidin-R in the second bicistronic gene, and neomycin-R or hygromycin-R in the monocistronic gene.
It is furthermore clear that it is also possible to combine an antibiotic selection marker with a selection marker which provides induction of fluorescence or which provides a color deposit.
It is also clear to a person skilled in the art that different promoters can be used as long as they are functional in the used cell. The CMV promoter is considered the strongest available, so it is preferably chosen for the bicistronic gene in order to obtain the highest possible product yield. Other examples of suitable promoters are e.g. mammalian promoters for EF1-alpha or ubiquitin.
The good expression and stability of the SV40 promoter makes it well suited for expression of the monocistronic gene; enough selection marker protein (for example the antibiotic resistance protein neomycin-R in the example cited herein) is made to confer high expression of said selection marker. Hence, said promoter is preferentially used as a promoter driving the expression of the selection marker.
In a preferred embodiment the invention provides a method wherein at least one of said protein expression units comprises at least two STAR sequences. In an even more preferred embodiment the invention provides a method wherein said protein expression unit comprising at least two STAR sequences is arranged such that said protein expression unit is flanked on either side by at least one STAR sequence. In yet an even more preferred embodiment said at least two STAR sequences are essentially identical.
Essentially identical STAR sequences are defined herein as STAR sequences which are identical in their important domains (the domains that confer the transcription stabilizing or enhancing quality), but which may vary within their less important domains, for example by a point mutation, deletion or insertion at a less important position within the STAR sequence.
Preferentially said essentially identical STAR sequences provide equal amounts of transcription stabilizing or enhancing activity.
The use of STAR sequences to flank at least one protein expression unit is one of the aspects of the balanced and proportional levels of expression of two or more proteins and more specifically for the expression of the monomers of multimeric proteins. The STAR sequences create chromatin domains of definite and stable transcriptional potential. As a result, promoters that drive transcription of each bicistronic mRNA will function at definite, stable levels.
A recombinant host cell line created by the method of the invention is readily identified in which these levels result in appropriate proportions of each monomer of the multimeric protein of interest being expressed at high yields.
In another embodiment the protein expression unit contains only the bicistronic gene flanked by STAR elements. The advantages of omitting the monocistronic antibiotic resistance gene are twofold. First, selection of high-expressing recombinant host cells requires the use of only two antibiotics.
Second, it prevents repression of the bicistronic and/or monocistronic genes by the phenomena of promoter suppression and transcriptional interference.
These phenomena are common problems in conventional transgenic systems in which two or more transcription units are located near each other. Repression by an upstream unit of a downstream unit is termed transcriptional interference, and repression by a downstream unit of an upstream unit is termed promoter suppression (Villemure et al., 2001). Transcriptional interference can result in suppression of adjacent transgenes in all possible arrangements (tandem, divergent, and convergent) (Eszterhas et al., 2002).
These phenomena can reduce the efficiency of selection of the IRES-dependent and/or monocistronic antibiotic resistance genes, and reduce the yield of the transgene. Therefore the embodiment of the invention comprising only a bicistronic gene flanked by STAR elements provides an alternative configuration of the components.
In a preferred embodiment the method according to the invention uses a STAR sequence wherein said STAR sequence is depicted in Table 3 and/or figure 6 and/or a functional equivalent and/or a functional fragment thereof.
We have isolated and characterized an extensive collection of STAR sequences using proprietary technology. The strength of these sequences ranges widely. This is manifested by the varying degrees of improvement of transgene expression in recombinant host cells conferred by the STAR elements; some STAR elements provide full protection from silencing, while others only provide partial protection. The range in strength of the STAR elements is also manifested in their varying capacities to improve the predictability of isolating recombinant cell lines that efficiently produce the heterologous proteins of interest. For the present invention we have preferably employed STAR elements that have strong predictability characteristics, in order to have high numbers of efficiently-expressing recombinant cell lines.
The STAR elements employed have moderate to strong anti-repressor activity, in order to be able to modulate the levels of recombinant protein production to match the requirements of the product (e.g. balanced and proportional expression of polypeptide monomers). The selected STAR elements also confer significant increases on the stability of expression of the transgenes.
Some STAR elements also display promoter and host cell-type specificity. These characteristics are exploited to create novel transgenic systems to optimize the production of heterologous proteins that require a specific host cell (for example, to achieve a high yield or a pharmaceutically advantageous glycosylation pattern) or a specific mode of expression (for example, the use of an inducible promoter or a constitutive promoter; the use of a promoter with moderate strength or high strength, etc.). Therefore the use of different STAR elements results in different embodiments of the invention that pertain to these types of applications.
A functional equivalent and/or a functional fragment of a sequence depicted in Table 3 and/or figure 6 is defined herein as follows. A functional equivalent of a sequence as depicted in Table 3 and/or figure 6 is a sequence derived with the information given in Table 3 and/or figure 6. For instance, a sequence that can be derived from a sequence in Table 3 and/or figure 6 by deleting, modifying and/or inserting bases in or from a sequence listed in Table 3 and/or figure 6, wherein said derived sequence comprises the same activity in kind, not necessarily in amount, of a sequence as depicted in Table 3 and/or figure 6. A functional equivalent is further a sequence comprising a part from two or more sequences depicted in Table 3 and/or figure 6. A functional equivalent can also be a synthetic DNA sequence which is a sequence that is not derived directly or indirectly from a sequence present in an organism. For instance a sequence comprising a Drosophila scs or scs' sequence is not a synthetic sequence, even when the scs or scs' sequence was artificially generated.
Functional sequences of STAR elements can be delineated by various methods known in the art. In one embodiment deletions and/or substitutions are made in STAR sequences. DNA that is modified in such a way is for example tested for activity by using a single modified nucleic acid or by generating a collection of test nucleic acids comprising said modified nucleic acid. Elucidation of functional sequences within STAR sequences enables the elucidation of consensus sequences for elements with a gene transcription modulating and/or a gene transcription repressing quality.
A functional fragment of a STAR sequence as depicted in Table 3 and/or figure 6 can for example be obtained by deletions from the 5' end or the 3' end or from the inside of said sequences or any combination thereof, wherein said derived sequence comprises the same activity in kind, not necessarily in amount.
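The deletion approach described above can be illustrated with a minimal Python sketch that enumerates candidate fragments produced by trimming the 5' end, the 3' end, or both. The function name, the step size and the minimum length are hypothetical choices made for illustration only; the sketch merely generates candidates, and whether any given fragment retains activity would still have to be determined experimentally.

```python
from typing import List, Tuple


def deletion_fragments(sequence: str, step: int, min_length: int) -> List[Tuple[int, int, str]]:
    """Enumerate fragments obtained by deleting bases from the 5' and/or 3' end.

    Returns (bases removed at 5' end, bases removed at 3' end, fragment) tuples.
    Deletions are made in multiples of `step`; fragments shorter than
    `min_length` are discarded. All parameters are illustrative assumptions.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    fragments = []
    n = len(sequence)
    for trim5 in range(0, n, step):
        for trim3 in range(0, n - trim5, step):
            fragment = sequence[trim5:n - trim3]
            if len(fragment) >= min_length:
                fragments.append((trim5, trim3, fragment))
    return fragments


if __name__ == "__main__":
    # Toy 10-base sequence, not a real STAR sequence.
    for trim5, trim3, frag in deletion_fragments("ACGTACGTAC", step=2, min_length=6):
        print(trim5, trim3, frag)
```

Each candidate would then be tested for the same activity in kind, as described above.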
In a more preferred embodiment said STAR sequence as depicted in Table 3 and/or figure 6 is STAR18 and/or a functional equivalent and/or a functional fragment thereof.
Yet another preferred feature of a method according to the invention is the introduction of a (weak) Internal Ribosome Binding Site (IRES) as an example of a protein translation initiation site with a reduced translation efficiency, between the open reading frame of the protein of interest and the selection marker open reading frame. In combination with for example the STAR sequence, this component of the present invention comprises a marked improvement in transgenic systems for the expression of two or more proteins.
Internal ribosome binding site (IRES) elements are known from viral and mammalian genes (Martinez-Salas, 1999), and have also been identified in screens of small synthetic oligonucleotides (Venkatesan & Dasgupta, 2001).
The IRES from the encephalomyocarditis virus has been analyzed in detail (Mizuguchi et al., 2000). An IRES is an element encoded in DNA that results in a structure in the transcribed RNA at which eukaryotic ribosomes can bind and initiate translation. An IRES permits two or more proteins to be produced from a single RNA molecule (the first protein is translated by ribosomes that bind the RNA at the cap structure of its 5' terminus (Martinez-Salas, 1999)).
Translation of proteins from IRES elements is less efficient than cap-dependent translation: the amount of protein from IRES-dependent open reading frames (ORFs) ranges from less than 20% to 50% of the amount from the first ORF (Mizuguchi et al., 2000). This renders IRES elements undesirable for production of all subunits of a multimeric protein from one messenger RNA (mRNA), since it is not possible to achieve balanced and proportional expression of two or more protein monomers from a bicistronic or multicistronic mRNA. However, the reduced efficiency of IRES-dependent translation provides an advantage that is exploited by the current invention.
Furthermore, mutation of IRES elements can attenuate their activity, and lower the expression from the IRES-dependent ORFs to below 10% of the first ORF (Lopez de Quinto & Martinez-Salas, 1998; Rees et al., 1996). The advantage exploited by the invention is as follows: when the IRES-dependent ORF encodes a selectable marker protein, its low relative level of translation means that high absolute levels of transcription must occur in order for the recombinant host cell to be selected. Therefore, selected recombinant host cell isolates will by necessity express high amounts of the transgene mRNA. Since the recombinant protein is translated from the cap-dependent ORF, it can be produced in abundance resulting in high product yields.
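The selection argument above can be made concrete with a deliberately simplified model, sketched here in Python. It assumes, purely for illustration, that marker protein output is proportional to the mRNA level multiplied by the relative IRES translation efficiency, and that a cell survives selection only when marker output reaches a fixed threshold. The function name, the threshold and the arbitrary units are hypothetical and do not come from the specification.

```python
def min_transcript_level(marker_threshold: float, ires_efficiency: float) -> float:
    """Minimum mRNA level (arbitrary units) needed to survive selection.

    Assumed linear model: marker protein = mRNA level * ires_efficiency,
    where ires_efficiency is the IRES-dependent translation relative to
    cap-dependent translation (0 < efficiency <= 1).
    """
    if not 0 < ires_efficiency <= 1:
        raise ValueError("ires_efficiency must be in the interval (0, 1]")
    if marker_threshold < 0:
        raise ValueError("marker_threshold must be non-negative")
    return marker_threshold / ires_efficiency


if __name__ == "__main__":
    # Relative efficiencies taken from the ranges quoted in the text:
    # 50%, 20%, and an attenuated IRES at 10% of the first ORF.
    for efficiency in (0.5, 0.2, 0.1):
        print(efficiency, min_transcript_level(1.0, efficiency))
```

Under these assumptions, a weaker IRES raises the transcript level that surviving cells must reach, and because the protein of interest is translated cap-dependently from the same mRNA, its yield rises with it.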
It is clear to a person skilled in the art that changes to the IRES can be made without altering the essence of the function of the IRES (hence, providing a protein translation initiation site with a reduced translation efficiency), resulting in a modified IRES. Use of a modified IRES which is still capable of providing a small percentage of translation (compared to a 5' cap translation) is therefore also included in this invention.
In yet another embodiment the invention provides a method for obtaining a cell which expresses two or more proteins or a method for identifying a cell wherein expression of two or more proteins is in a predetermined ratio, wherein each of said protein expression units resides on a separate DNA-carrier. The present invention preferentially makes use of a separate transcription unit for each protein and/or monomer of a multimeric protein. In each transcription unit the monomer ORF is produced by efficient cap-dependent translation. This feature of the invention contributes to the isolation of recombinant host cells which have high yields of each monomer, at levels that are balanced and proportionate to the stoichiometry of the multimeric protein. The increased predictability at which such recombinant host cells are isolated results in an improvement in the efficiency of screening for such isolates by a factor of ten or more. In a preferred embodiment said DNA-carrier is a vector (or plasmid; the terms are used interchangeably herein). In another embodiment said vector is a viral vector and in a more preferred embodiment said viral vector is an adenoviral vector or a retroviral vector. It is clear to a person skilled in the art that other viral vectors can also be used in a method according to the invention.
Conventional expression systems are DNA molecules in the form of a recombinant plasmid or a recombinant viral genome. The plasmid or the viral genome is introduced into (mammalian host) cells and integrated into their genomes by methods known in the art. The present invention also uses these types of DNA molecules to deliver its improved transgene expression system. A preferred embodiment of the invention is the use of plasmid DNA for delivery of the expression system. A plasmid contains a number of components: conventional components, known in the art, are an origin of replication and a selectable marker for propagation of the plasmid in bacterial cells; a selectable marker that functions in eukaryotic cells to identify and isolate host cells that carry an integrated transgene expression system; the protein of interest, whose high-level transcription is brought about by a promoter that is functional in eukaryotic cells (e.g. the human cytomegalovirus major immediate early promoter/enhancer, pCMV (Boshart et al., 1985)); and viral transcriptional terminators for the transgene of interest and the selectable marker (e.g. the SV40 polyadenylation site (Kaufman & Sharp, 1982)).
The vector used can be any vector that is suitable for cloning DNA and that can be used in a transcription system. When host cells are used it is preferred that the vector is an episomally replicating vector. In this way, effects due to different sites of integration of the vector are avoided. DNA elements flanking the vector at the site of integration can have effects on the level of transcription of the promoter and thereby mimic effects of fragments comprising DNA sequences with a gene transcription modulating quality. In a preferred embodiment said vector comprises a replication origin from the Epstein-Barr virus (EBV), OriP, and a nuclear antigen (EBNA-1). Such vectors are capable of replicating in many types of eukaryotic cells and assemble into chromatin under appropriate conditions.
In a preferred embodiment the invention provides a method for obtaining a cell which expresses two or more proteins or a method for obtaining a cell wherein expression of two or more proteins is in a predetermined ratio comprising providing two or more protein expression units wherein one of the said protein expression units or said protein(s) of interest encodes an immunoglobulin heavy chain and/or wherein another of the said protein expression units or said protein(s) of interest encodes an immunoglobulin light chain. According to this embodiment a multimeric protein, an antibody, is obtained. It is clear to a person skilled in the art that it is possible to provide a cell which expresses an immunoglobulin heavy chain from one protein expression unit and an immunoglobulin light chain from another protein expression unit with a third protein expression unit encoding a secretory component or a joining chain. In this way the production of for example sIgA and pentameric IgM is provided.
Preferably, the host cell used secretes the produced multimer. In this way the product is easily isolated from the medium surrounding said host cell.
More preferably, the invention results in the production of a functional multimer. The functionality of the produced multimer is determined with standard procedures. For example, a produced multi subunit enzyme is tested in a corresponding enzymatic assay or binding to an antigen, for example in an ELISA, is used to test the functionality of a produced antibody.
Hence, the selection of a final suitable host cell expressing a multimer involves multiple steps amongst which are the selection for a cell that expresses all the desired subunits of a multimer, followed by a functional analysis of said multimer.
With regard to a multimeric protein, high expression levels of the subunits are desired as well as the formation of a functional multimeric protein from said subunits. Surprisingly, the use of a STAR sequence for the production of the subunits of a multimeric protein results in a high number of cells that express the subunits, as compared to control vectors without a STAR sequence.
Moreover, the amount of functional multimeric protein is relatively higher when compared to the control.
Production of subunits and the formation of functional multimeric protein from these subunits is in particular of importance for the production of antibodies. When the heavy chain and light chain expression cassette are flanked by a STAR sequence this results in a higher production of functional antibody, as compared to control vectors without a STAR sequence. Hence, the presence of a STAR sequence results in a higher degree of predictability of functional antibody expression. Preferably, each expression unit comprises at least two STAR sequences which sequences are arranged such that said expression unit is flanked on either side by at least one STAR sequence.
In yet another embodiment a method according to the invention is provided, wherein said protein expression units are introduced simultaneously into said cell.
Preferably, a functional promoter is a human cytomegalovirus (CMV) promoter, a simian virus (SV40) promoter, a human ubiquitin C promoter or a human elongation factor alpha (EF-1α) promoter.
As disclosed herein within the experimental part, a STAR sequence can confer copy number-dependence on a transgene expression unit, making transgene expression independent of other transgene copies in tandem arrays, and independent of gene-silencing influences at the site of integration. Hence, the invention also provides a method for obtaining a cell which expresses two or more proteins or a method for identifying a cell wherein expression of two or more proteins is in a predetermined ratio in which multiple copies of a protein expression unit encoding a protein of interest are integrated into the genome of said cell (i.e. in which cell an amplification of the gene of interest is present).
According to this part of the invention, the protein expression units are introduced simultaneously into said (host) cell or collection of cells by methods known in the art. Recombinant host cells are selected by treatment with an appropriate antibiotic, for example G418, using methods known in the art.
After formation of individual antibiotic-resistant colonies, another antibiotic or a combination of antibiotics, for example a combination of zeocin and blasticidin, is/are applied, and antibiotic-resistant colonies are identified and isolated. These are tested for the level of expression of transgenes.
In another embodiment the invention provides a protein expression unit comprising: a bicistronic gene comprising an open reading frame encoding a protein of interest, a protein translation initiation site with a reduced translation efficiency, a selection marker and wherein said bicistronic gene is under control of a functional promoter; and at least one STAR sequence.
In a more preferred embodiment said protein expression unit further comprises a monocistronic gene comprising an open reading frame encoding a second selection marker and wherein said monocistronic gene is under control of a functional promoter.
In an even more preferred embodiment said protein expression unit comprises at least two STAR sequences which are preferentially arranged such that said protein expression unit is flanked on either side by at least one STAR sequence. Examples of such a protein expression unit are provided within the experimental part of this patent application (for example Figures 1 and In another embodiment the protein expression unit according to the invention comprises STAR sequences, wherein said STAR sequences are essentially identical.
In a preferred embodiment the invention provides a protein expression unit comprising a bicistronic gene comprising an open reading frame encoding a protein of interest, a protein translation initiation site with a reduced translation efficiency, a selection marker and wherein said bicistronic gene is under control of a functional promoter; and at least one STAR sequence, and is optionally provided with a monocistronic gene cassette, wherein said STAR sequence is depicted in Table 3 and/or figure 6 and/or a functional equivalent and/or a functional fragment thereof and even more preferred wherein said STAR sequence is STAR18.
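The arrangement of components in such a protein expression unit can be pictured as an ordered list of labelled elements. The following Python sketch is only an illustrative data structure: the class name, the element labels and both checks are hypothetical and are not part of the invention as described. It models a unit as a 5'-to-3' list and tests whether it is flanked by STAR sequences and whether promoter, ORF, IRES and selection marker occur in that relative order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExpressionUnit:
    """Ordered 5'->3' element labels of an expression unit (illustrative only)."""
    elements: List[str]

    def is_star_flanked(self) -> bool:
        # True when the first and last elements are both STAR sequences.
        return (
            len(self.elements) >= 3
            and self.elements[0].startswith("STAR")
            and self.elements[-1].startswith("STAR")
        )

    def has_bicistronic_order(self) -> bool:
        # True when promoter, ORF, IRES and marker appear in this relative order.
        required = ["promoter", "ORF", "IRES", "marker"]
        position = 0
        for label in self.elements:
            if position < len(required) and label == required[position]:
                position += 1
        return position == len(required)


if __name__ == "__main__":
    unit = ExpressionUnit(["STAR18", "promoter", "ORF", "IRES", "marker", "STAR18"])
    print(unit.is_star_flanked(), unit.has_bicistronic_order())
```

A representation of this kind is convenient for enumerating alternative orders and orientations of the elements before any construct is built.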
In another embodiment a protein expression unit according to the invention is provided wherein said protein translation initiation site with a reduced translation efficiency comprises an Internal Ribosome Entry Site (IRES). More preferably a modified, e.g. weaker, IRES is used.
In yet another embodiment a protein expression unit according to the invention is provided wherein said protein expression unit is a vector. In a preferred embodiment said DNA-carrier is a vector (or plasmid; the terms are used interchangeably herein). In another embodiment said vector is a viral vector and in a more preferred embodiment said viral vector is an adenoviral vector or a retroviral vector. It is clear to a person skilled in the art that other viral vectors can also be used in a method according to the invention.
In a preferred embodiment a protein expression unit according to the invention is provided, wherein said protein of interest is an immunoglobulin heavy chain. In yet another preferred embodiment a protein expression unit according to the invention is provided, wherein said protein of interest is an immunoglobulin light chain. When these two protein expression units are present within the same (host) cell a multimeric protein and more specifically an antibody is assembled.
The invention includes a cell provided with a protein expression unit comprising a STAR.
The invention also includes a (host) cell comprising at least one protein expression unit according to the invention. Such a (host) cell is then for example used for large-scale production processes.
The invention also includes a cell obtainable according to any one of the methods as described herein. The invention furthermore includes a protein obtainable from said cell (for example, via the process of protein purification). Preferably, said protein is a multimeric protein and even more preferably said multimeric protein is an antibody. Such an antibody can be used in pharmaceutical and/or diagnostic applications.
In a preferred embodiment the invention provides a cell comprising two polypeptide expression units each encoding at least one polypeptide of interest, characterized in that said polypeptide expression units each comprise at least one sequence having the capacity to at least in part block chromatin-associated repression, wherein said sequence having the capacity to at least in part block chromatin-associated repression for one of the expression units is chosen from the group consisting of: (i) SEQ ID: 44 of Figure 6; (ii) a functional equivalent of SEQ ID: 44 of Figure 6; and (iii) a functional fragment of SEQ ID: 44 of Figure 6, and wherein said sequence having the capacity to at least in part block chromatin-associated repression for the other one of the expression units is chosen from the group consisting of: any one of SEQ ID: 1 through SEQ ID: 65 of Figure 6; a functional equivalent of any one of SEQ ID: 1 through SEQ ID: 65 of Figure 6; and a functional fragment of any one of SEQ ID: 1 through SEQ ID: 65 of Figure 6.
In a further embodiment the invention provides a polypeptide expression unit comprising: a bicistronic gene comprising in the following order: (i) an open reading frame encoding a polypeptide of interest, (ii) an Internal Ribosome Entry Site (IRES), and (iii) a selection marker, and wherein said bicistronic gene is under control of a functional promoter; and at least one sequence having the capacity to at least in part block chromatin-associated repression, wherein said sequence having the capacity to at least in part block chromatin-associated repression is chosen from the group consisting of: SEQ ID: 44 of Figure 6; a functional equivalent of SEQ ID: 44 of Figure 6; and a functional fragment of SEQ ID: 44 of Figure 6.
In still a further embodiment the invention provides a method for obtaining a host cell expressing two polypeptides of interest, the method comprising: a) providing host cells comprising: (i) a first polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a first polypeptide of interest and a first selectable marker gene, and (ii) a second polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a second polypeptide of interest and a second selectable marker gene, wherein said second selectable marker gene is different from said first selectable marker gene, and wherein said first polypeptide expression unit, or said second polypeptide expression unit, or each of said first and said second polypeptide expression units comprise at least one sequence having the capacity to at least in part block chromatin-associated repression, wherein said sequence having the capacity to at least in part block chromatin-associated repression is chosen from the group consisting of: SEQ ID: 44 of Figure 6; a functional equivalent of SEQ ID: 44 of Figure 6; and a functional fragment of SEQ ID: 44 of Figure 6; and b) selecting a host cell by selecting for expression of said first and second selectable marker genes.
A further embodiment of the invention provides a set of two polypeptide expression units, said set comprising: (i) a first polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a first polypeptide of interest and a first selectable marker gene, and (ii) a second polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a second polypeptide of interest and a second selectable marker gene, wherein said second selectable marker gene is different from said first selectable marker gene, and wherein said first polypeptide expression unit, or said second polypeptide expression unit, or both said first and said second polypeptide expression units comprise at least one sequence having the capacity to at least in part block chromatin-associated repression, wherein said sequence having the capacity to at least in part block chromatin-associated repression is chosen from the group consisting of: SEQ ID: 44 of Figure 6; a functional equivalent of SEQ ID: 44 of Figure 6; and a functional fragment of SEQ ID: 44 of Figure 6.
The foregoing discussion and the following examples are provided for illustrative purposes, and they are not intended to limit the scope of the invention as claimed herein.
They simply provide some of the preferred embodiments of the invention. Modifications and variations, which may occur to one of ordinary skill in the art, are within the intended scope of this invention. Various other embodiments apply to the present invention, including: other selectable marker genes; other IRES elements or means of attenuating IRES activity; other elements affecting transcription including promoters, enhancers, introns, terminators, and polyadenylation sites; other orders and/or orientations of the monocistronic and bicistronic genes; other anti-repressor elements or parts, derivations, and/or analogues thereof; other vector systems for delivery of the inventive DNA molecules into eukaryotic host cells; and applications of the inventive method to other transgenic systems.
EXAMPLES
Example 1: STAR elements and two-step selection improve the predictability of transgene expression
One object of this invention is to improve transgene expression for heterologous protein production by using a two-step antibiotic selection procedure. The two-step procedure increases the predictability of finding recombinant host cell lines that express the transgene to high levels, thus increasing the yield of the heterologous protein.
Materials and Methods
Plasmid construction
The pSDH-SIB/Z and pSDH-GIB/Z families of plasmids were constructed as follows: The zeocin selectable marker was recovered by polymerase chain reaction amplification (PCR) from plasmid pEM7/zeo (Invitrogen V500-20) using primers E99 and E100 (all PCR primers and mutagenic oligonucleotide sequences are listed in Table and cloned directionally into the XbaI and NotI sites of multiple cloning site (MCS) B of pIRES (Clontech 6028-1) to create pIRES-zeo. The blasticidin selectable marker was recovered by PCR from plasmid pCMV/bsd (Invitrogen V510-20) using primers E84 and E85, and cloned directionally into the XbaI and NotI sites of MCS-B of pIRES to create pIRES-bsd. The SEAP (secreted alkaline phosphatase) reporter gene was recovered by PCR from plasmid pSEAP2-basic (Clontech 6049-1) using primers F11 and E87, and cloned directionally into MCS-A of pIRES-zeo and pIRES-bsd to create plasmids pIRES-SEAP-zeo and pIRES-SEAP-bsd. The GFP reporter gene was recovered from plasmid phr-GFP-1 (Stratagene 240059) by restriction digestion with NheI and EcoRI, and ligated directionally into MCS-A of pIRES-zeo and pIRES-bsd to create plasmids pIRES-GFP-zeo and pIRES-GFP-bsd. A linker was inserted at the non-methylated ClaI site of each of these plasmids (downstream of the neomycin resistance marker) to introduce an AgeI site using oligonucleotides F34 and The pSDH-Tet vector was constructed by PCR of the luciferase open reading frame from plasmid pREP4-HSF-Luc (van der Vlag et al., 2000) using primers C67 and C68, and insertion of the SacII/BamHI fragment into SacII/BamHI-digested pUHD10-3 (Gossen & Bujard, 1992). The luciferase expression unit was re-amplified with primers C65 and C66, and re-inserted into pUHD10-3 in order to flank it with multiple cloning sites (MCSI and MCSII). An AscI site was then introduced into MCSI by digestion with EcoRI and insertion of a linker (comprised of annealed oligonucleotides D93 and D94). The CMV promoter was amplified from plasmid pCMV-Bsd with primers and D91, and used to replace the Tet-Off promoter in pSDH-Tet by SalI/SacII digestion and ligation to create vector pSDH-CMV. The luciferase open reading frame in this vector was replaced by SEAP as follows: vector pSDH-CMV was digested with SacII and BamHI and made blunt; the SEAP open reading frame was isolated from pSEAP-basic by EcoRI/SalI digestion, made blunt and ligated into pSDH-CMV to create vector pSDH-CS. The puromycin resistance gene under control of the SV40 promoter was isolated from plasmid pBabe-Puro (Morgenstern & Land, 1990) by PCR, using primers C81 and C82. This was ligated into vector pGL3-control (BamHI site removed) (Promega E1741) digested with NcoI/XbaI, to create pGL3-puro. pGL3-puro was digested with BglII/SalI to isolate the SV40-puro resistance gene, which was made blunt and ligated into NheI digested, blunt-ended pSDH-CS. The resulting vector, pSDH-CSP, is shown in FIG 2. STAR18 was inserted into MCSI and MCSII in two steps, by digestion of the STAR element and the pSDH-CSP vector with an appropriate restriction enzyme, followed by ligation.
The orientation of the STAR element was determined by restriction mapping.
The identity and orientation of the inserts were verified by DNA sequence analysis. Sequencing was performed by the dideoxy method (Sanger et al., 1977) using a Beckman CEQ2000 automated DNA sequencer, according to the manufacturer's instructions. Briefly, DNA was purified from E. coli using QIAprep Spin Miniprep and Plasmid Midi Kits (QIAGEN 27106 and 12145, respectively). Cycle sequencing was carried out using custom oligonucleotides E25, and E42 (Table in the presence of dye terminators (CEQ Dye Terminator Cycle Sequencing Kit, Beckman 608000).
pSDH-CSP plasmids containing STAR elements were modified as follows: for receiving SEAP-IRES-zeo/bsd cassettes, an AgeI site was introduced at the BglII site by insertion of a linker, using oligonucleotides F32 and F33; for receiving GFP-IRES-zeo/bsd cassettes, an AgeI site was introduced at the Bsu36I site by insertion of a linker, using oligonucleotides F44 and F45. The SEAP-IRES-zeo/bsd cassettes were inserted into the pSDH-CSP-STAR18 plasmid by replacement of the Bsu36I/AgeI fragment with the corresponding fragments from the pIRES-SEAP-zeo/bsd plasmids. The GFP-IRES-zeo/bsd cassettes were inserted into pSDH-CSP-STAR plasmids by replacement of the BglII/AgeI fragment with the corresponding fragments from the pIRES-GFP-zeo/bsd plasmids. The resulting plasmid families, pSDH-SIB/Z and pSDH-GIB/Z, are shown in FIG 3.
All cloning steps were carried out following the instructions provided by the manufacturers of the reagents used, according to methods known in the art (Sambrook et al., 1989).
Transfection and culture of CHO cells
The Chinese Hamster Ovary cell line CHO-K1 (ATCC CCL-61) was cultured in HAMS-F12 medium + 10% Fetal Calf Serum containing 2 mM glutamine, 100 U/ml penicillin, and 100 micrograms/ml streptomycin at 37 °C/CO2. Cells were transfected with the pSDH-SIZ plasmids using SuperFect (QIAGEN) as described by the manufacturer. Briefly, cells were seeded to culture vessels and grown overnight to 70-90% confluence. SuperFect reagent was combined with plasmid DNA at a ratio of 6 microliters per microgram (e.g. for a 10 cm Petri dish, 20 micrograms DNA and 120 microliters SuperFect) and added to the cells. After overnight incubation the transfection mixture was replaced with fresh medium, and the transfected cells were incubated further. After overnight cultivation, cells were seeded into fresh culture vessels and 500 micrograms/ml neomycin was added. Neomycin selection was complete within 3-4 days. Fresh medium containing zeocin (100 µg/ml) was then added and the cells were cultured further. Individual clones were isolated after 4-5 days and cultured further. Expression of the reporter gene was assessed by measuring SEAP activity approximately 3 weeks after transfection.
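The reagent-to-DNA ratio stated in the transfection protocol above (6 microliters of SuperFect per microgram of plasmid DNA) lends itself to a small worked calculation. The Python sketch below scales the reagent volume to the amount of DNA; the constant and function names are hypothetical, and the manufacturer's instructions remain the authoritative source for other vessel formats.

```python
# Ratio stated in the protocol: 6 microliters SuperFect per microgram DNA.
SUPERFECT_UL_PER_UG_DNA = 6.0


def superfect_volume_ul(dna_ug: float) -> float:
    """Return the SuperFect volume (microliters) for a given DNA mass (micrograms)."""
    if dna_ug < 0:
        raise ValueError("DNA amount must be non-negative")
    return dna_ug * SUPERFECT_UL_PER_UG_DNA


if __name__ == "__main__":
    # The 10 cm Petri dish case from the protocol: 20 micrograms of DNA.
    print(superfect_volume_ul(20.0))
```

For the 10 cm dish case given in the protocol, 20 micrograms of DNA corresponds to 120 microliters of reagent, in agreement with the stated example.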
Secreted Alkaline Phosphatase (SEAP) assay
SEAP activity (Berger et al., 1988, Henthorn et al., 1988, Kain, 1997, Yang et al., 1997) in the culture media of the clones was determined as described by the manufacturer (Clontech Great EscAPe kit #K2041). Briefly, an aliquot of medium was heat inactivated at 65 °C, then combined with assay buffer and CSPD chemiluminescent substrate and incubated at room temperature for 10 minutes. The rate of substrate conversion was then determined in a luminometer (Turner 20/20TD). Cell density was determined by counting trypsinized cells in a Coulter ACT10 cell counter.
Results
Transfection of the pSDH-SIZ-STAR18 expression vector consistently results in ~10-fold more colonies than transfection of the empty pSDH-SIZ vector, presumably due to the increased proportion of primary transfectants that are able to express the neomycin resistance gene. The outcome of a typical experiment is shown in Table 2, in which transfection of the empty vector yielded ~100 G418-resistant colonies, and transfection of the STAR18 vector yielded ~1000 colonies.
The expression of the SEAP reporter transgene was compared between the empty pSDH-SIZ vector (hence, without a STAR sequence) and the STAR18 vector (FIG [...]). The populations of G418-resistant isolates were divided into two sets. The first set was cultured with G418 only (one-step selection). For this set, the inclusion of STAR18 to protect the transgene from silencing resulted in a higher yield of reporter protein: the maximal level of expression among the 20 clones analyzed was 2-3-fold higher than the maximal expression level of clones without the STAR element. The inclusion of STAR18 also led to increased predictability: more than 25% of the STAR18 clones had expression levels greater than or equal to the maximum expression level observed in the STARless clones. In this population of STAR18 clones, [...] had expression above the background level, while only 50% of the STARless clones had expression above the background level.
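The two measures used in this comparison, yield (ratio of maximal expression levels) and predictability (proportion of STAR clones at or above the best STARless clone), can be sketched as follows. The function names and all expression values are hypothetical and serve only to illustrate the calculation; they are not data from Table 2 or the figures.

```python
# Illustrative sketch of the yield and predictability measures.
# All names and numbers below are assumptions for demonstration only.

def yield_ratio(star_clones, control_clones):
    """Maximal expression among STAR clones relative to the best STARless clone."""
    return max(star_clones) / max(control_clones)


def predictability(star_clones, control_clones):
    """Fraction of STAR clones expressing at or above the best STARless clone."""
    ceiling = max(control_clones)
    hits = sum(1 for level in star_clones if level >= ceiling)
    return hits / len(star_clones)


def fraction_above_background(clones, background):
    """Fraction of clones whose expression exceeds the background level."""
    return sum(1 for level in clones if level > background) / len(clones)


# Made-up relative SEAP activities per clone (arbitrary units).
control = [0.0, 0.5, 1.0, 2.0]
star = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

print(yield_ratio(star, control))            # 3.0
print(predictability(star, control))         # 0.5
print(fraction_above_background(star, 0.0))  # 0.875
```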
The performance of STAR18 was even better when used in a two-step selection. The second set of G418-resistant isolates was treated with zeocin.
Clones that survived the two-step selection regime were assayed for expression of the SEAP reporter transgene. In this case too, the STAR18 element increased the yield compared to the STARless clones by approximately threefold. The predictability was also increased by inclusion of STAR18: ~80% of the population had expression levels greater than the highest-expressing STARless clone.
When the one-step selection is compared with the two-step selection, it can be seen that the latter is superior in terms of both yield and predictability.
In fact, with two-step selection, no clones appear with background levels of expression. This is due to the requirement imposed on clones that survive zeocin selection that they have high levels of transcription of the bicistronic SEAP-zeocin gene. As indicated in Table 2, the elimination of low-producing clones by the second antibiotic selection step increases the predictability of finding high-producing clones; when STAR18 is included in the expression unit, this increased predictability is improved from three-fold to thirty-fold. In summary, when STAR elements are used in combination with two-step antibiotic selection, the predictability of finding clones with high yields of a transgene is dramatically improved. Application of this increased predictability to two or more transgenes simultaneously will significantly increase the likelihood of finding clones that have high yields of multimeric proteins.
Example 2: Simultaneous expression of two proteins is improved by two-step selection and STAR elements
A second object of this invention is to improve the expression of heterologous multimeric proteins such as antibodies. This example demonstrates that the combination of STAR elements and two-step antibiotic selection improves the predictability of establishing recombinant host cell lines that express balanced and proportional amounts of two heterologous polypeptides at high yields. This method of the invention is applicable in practice to multimeric proteins such as antibodies. It is demonstrated in this example using two reporter proteins, secreted alkaline phosphatase (SEAP) and green fluorescent protein (GFP).
Materials and Methods
Plasmids
The pSDH-SIB/Z and pSDH-GIB/Z families of plasmids described in Example 1 are used. Cloning of STAR elements x and y, transfection and culture of host cells, and SEAP assay are described in Example 1. The assay for GFP is performed according to the manufacturer's instructions.
Results
Results show an increased number of clones wherein the two reporter proteins are both expressed. Moreover, expression was balanced in many of such clones.
Example 3: General-purpose vectors for simultaneous expression of multiple polypeptides
The expression system tested and validated in Example 1 has been modified to facilitate its application to any polypeptide that is preferably coexpressed with another polypeptide or polypeptides in a host cell, for example the heavy and light chains of recombinant antibodies. It is designed for easy and rapid construction of the expression units. This improved system is described in this example.
Materials and Methods
Plasmids
The construction of the plasmids PP1 to PP5 is described below, and their map is shown in FIG 5. Plasmid pd2EGFP (Clontech 6010-1) was modified by insertion of a linker at the BsiWI site to yield pd2EGFP-link. The linker (made by annealing oligonucleotides F25 and F26) introduces sites for the PacI, BglII, and EcoRV restriction endonucleases. This creates the multiple cloning site MCSII for insertion of STAR elements. Then primers F23 and F24 were used to amplify a region of 0.37 kb from pd2EGFP, which was inserted into the BglII site of pIRES (Clontech 6028-1) to yield pIRES-stuf. This introduces sites for the AscI and SwaI restriction endonucleases at MCSI, and acts as a "stuffer fragment" to avoid potential interference between STAR elements and adjacent promoters. pIRES-stuf was digested with BglI and FspI to liberate a DNA fragment composed of the stuffer fragment, the CMV promoter, the IRES element (flanked by multiple cloning sites MCS A and MCS B), and the SV40 polyadenylation signal. This fragment was ligated with the vector backbone of pd2EGFP-link produced by digestion with BamHI and StuI, to yield pd2IRES-link.
The open reading frames of the zeocin-, neomycin-, or puromycin-resistance genes were inserted into the BamHI/NotI sites of MCS B in pd2IRES-link as follows: the zeocin-resistance ORF was amplified by PCR with primers F18 and E100 from plasmid pEM7/zeo, digested with BamHI and NotI, and ligated with BamHI/NotI-digested pd2IRES-link to yield pd2IRES-link-zeo. The neomycin-resistance ORF was amplified by PCR with primers F19 and F20 from pIRES, digested with BamHI and NotI, and ligated with BamHI/NotI-digested pd2IRES-link to yield pd2IRES-link-neo. The puromycin-resistance ORF was amplified by PCR with primers F21 and F22 from plasmid pBabe-Puro (Morgenstern and Land, 1990), digested with BamHI and NotI, and ligated with BamHI/NotI-digested pd2IRES-link to yield pd2IRES-link-puro.
The GFP reporter ORF was introduced into pd2IRES-link-puro by amplification of phr-GFP-1 with primers F16 and F17, and insertion of the EcoRI-digested GFP cassette into the EcoRI site in MCS A of the pd2IRES-link-puro plasmid, to yield plasmid PP1 (FIG 5A). Correct orientation was verified by restriction mapping. The SEAP reporter ORF was introduced into pd2IRES-link-zeo and pd2IRES-link-neo by PCR amplification of pSEAP2-basic with primers F14 and F15, and insertion of the EcoRI-digested SEAP cassette into the EcoRI sites in MCS A of the plasmids pd2IRES-link-zeo (to yield plasmid PP2, FIG 5B) and pd2IRES-link-neo (to yield plasmid PP3, FIG 5). Correct orientation was verified by restriction mapping.
Plasmids PP1, PP2 and PP3 contain a bicistronic gene for expression of a reporter protein and an antibiotic resistance marker. In order to carry out two-step antibiotic selection with separate antibiotics, a monocistronic resistance marker was introduced as follows: pIRES-stuf was digested with ClaI, made blunt with Klenow enzyme, and digested further with BglI. This liberated a DNA fragment composed of the stuffer fragment, the CMV promoter, the IRES element (flanked by multiple cloning sites MCS A and MCS B), the SV40 polyadenylation signal, and the neomycin resistance marker under control of the SV40 promoter. This fragment was ligated with the vector backbone of pd2EGFP-link produced by digestion with BamHI and StuI, to yield pd2IRES-link-neo. Then, as described above, the GFP and puro cassettes were introduced to yield PP4 (FIG 5D), and the SEAP and zeo cassettes were introduced to yield PP5 (FIG 5).
Example 4: Predictability and yield are improved by application of STAR elements in expression systems
STAR elements function to block the effect of transcriptional repression influences on transgene expression units. These repression influences can be due to heterochromatin ("position effects", (Boivin and Dura, 1998)) or to adjacent copies of the transgene ("repeat-induced gene silencing", (Garrick et al., 1998)). Two of the benefits of STAR elements for protein production are increased predictability of finding high-expressing primary recombinant host cells, and increased yield during production cycles. These benefits are illustrated in this example.
Materials and Methods
Construction of the pSDH vectors and STAR-containing derivatives
The pSDH-Tet vector was constructed by polymerase chain reaction amplification (PCR) of the luciferase open reading frame from plasmid pREP4-HSF-Luc (van der Vlag et al., 2000) using primers C67 and C68 (all PCR primers and mutagenic oligonucleotides are listed in Table [...]), and insertion of the SacII/BamHI fragment into SacII/BamHI-digested pUHD10-3 (Gossen and Bujard, 1992). The luciferase expression unit was re-amplified with primers [...] and C66, and re-inserted into pUHD10-3 in order to flank it with two multiple cloning sites (MCSI and MCSII). An AscI site was then introduced into MCSI by digestion with EcoRI and insertion of a linker (comprised of annealed oligonucleotides D93 and D94). The CMV promoter was amplified from plasmid pCMV-Bsd (Invitrogen K510-01) with primers D90 and D91, and used to replace the Tet-Off promoter in pSDH-Tet by SalI/SacII digestion and ligation to create vector pSDH-CMV. The luciferase open reading frame in this vector was replaced by SEAP (Secreted Alkaline Phosphatase) as follows: vector pSDH-CMV was digested with SacII and BamHI and made blunt; the SEAP open reading frame was isolated from pSEAP-basic (Clontech 6037-1) by EcoRI/SalI digestion, made blunt and ligated into pSDH-CMV to create vector pSDH-CS. The puromycin resistance gene under control of the SV40 promoter was isolated from plasmid pBabe-Puro (Morgenstern and Land, 1990) by PCR, using primers C81 and C82. This was ligated into vector pGL3-control (BamHI site removed) (Promega E1741) digested with NcoI/XbaI, to create pGL3-puro. pGL3-puro was digested with BglII/SalI to isolate the SV40-puro resistance gene, which was made blunt and ligated into NheI digested, blunt-ended pSDH-CS. The resulting vector, pSDH-CSP, is shown in FIG 7. All cloning steps were carried out following the instructions provided by the manufacturers of the reagents, according to methods known in the art (Sambrook et al., 1989).
STAR elements were inserted into MCSI and MCSII in two steps, by digestion of the STAR element and the pSDH-CSP vector with an appropriate restriction enzyme, followed by ligation. The orientation of STAR elements in recombinant pSDH vectors was determined by restriction mapping. The identity and orientation of the inserts were verified by DNA sequence analysis.
Sequencing was performed by the dideoxy method (Sanger et al., 1977) using a Beckman CEQ2000 automated DNA sequencer, according to the manufacturer's instructions. Briefly, DNA was purified from E. coli using QIAprep Spin Miniprep and Plasmid Midi Kits (QIAGEN 27106 and 12145, respectively). Cycle sequencing was carried out using custom oligonucleotides [...], E25, and E42 (Table [...]) in the presence of dye terminators (CEQ Dye Terminator Cycle Sequencing Kit, Beckman 608000).
Transfection and culture of CHO cells with pSDH plasmids
The Chinese Hamster Ovary cell line CHO-K1 (ATCC CCL-61) was cultured in HAMS-F12 medium + 10% Fetal Calf Serum containing 2 mM glutamine, 100 U/ml penicillin, and 100 micrograms/ml streptomycin at 37 °C/5% CO2. Cells were transfected with the pSDH-CSP vector, and its derivatives containing STAR6 or STAR49 in MCSI and MCSII, using SuperFect (QIAGEN) as described by the manufacturer. Briefly, cells were seeded to culture vessels and grown overnight to 70-90% confluence. SuperFect reagent was combined with plasmid DNA (linearized in this example by digestion with PvuI) at a ratio of 6 microliters per microgram (e.g. for a 10 cm Petri dish, 20 micrograms DNA and 120 microliters SuperFect) and added to the cells. After overnight incubation the transfection mixture was replaced with fresh medium, and the transfected cells were incubated further. After overnight cultivation, 5 micrograms/ml puromycin was added. Puromycin selection was complete in 2 weeks, after which time individual puromycin resistant CHO/pSDH-CSP clones were isolated at random and cultured further.
Secreted Alkaline Phosphatase (SEAP) assay
SEAP activity (Berger et al., 1988, Henthorn et al., 1988, Kain, 1997, Yang et al., 1997) in the culture medium of CHO/pSDH-CSP clones was determined as described by the manufacturer (Clontech Great EscAPe kit #K2041). Briefly, an aliquot of medium was heat inactivated at 65 °C, then combined with assay buffer and CSPD chemiluminescent substrate and incubated at room temperature for 10 minutes. The rate of substrate conversion was then determined in a luminometer (Turner 20/20TD). Cell density was determined by counting trypsinized cells in a Coulter ACT10 cell counter.
Transfection and culture of U-2 OS cells with pSDH plasmids
The human osteosarcoma U-2 OS cell line (ATCC #HTB-96) was cultured in Dulbecco's Modified Eagle Medium + 10% Fetal Calf Serum containing glutamine, penicillin, and streptomycin (supra) at 37 °C/5% CO2. Cells were co-transfected with the pSDH-CMV vector, and its derivatives containing STAR6 or STAR8 in MCSI and MCSII, (along with plasmid pBabe-Puro) using SuperFect (supra). Puromycin selection was complete in 2 weeks, after which time individual puromycin resistant U-2 OS/pSDH-CMV clones were isolated at random and cultured further.
Luciferase assay
Luciferase activity (Himes and Shannon, 2000) was assayed in resuspended cells according to the instructions of the assay kit manufacturer (Roche 1669893), using a luminometer (Turner 20/20TD). Total cellular protein concentration was determined by the bicinchoninic acid method according to the manufacturer's instructions (Sigma B-9643), and used to normalize the luciferase data.
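The normalization step can be sketched as a simple division of the luminometer reading by the total protein in the assayed sample. The function name, units, and example values below are assumptions made for illustration; the text does not specify the units in which the normalized data are reported.

```python
# Illustrative sketch: normalize a luciferase reading to total cellular protein.
# Names, units and values are hypothetical.

def normalize_luciferase(relative_light_units: float, protein_mg: float) -> float:
    """Return luciferase activity per milligram of total cellular protein."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return relative_light_units / protein_mg


# Made-up reading: 5000 relative light units from a lysate holding 0.25 mg protein.
print(normalize_luciferase(5000.0, 0.25))  # 20000.0
```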
Results
Recombinant CHO cell clones containing the pSDH-CSP vector, or pSDH-CSP plasmids containing STAR6 or STAR49 (Table [...]), were cultured for 3 weeks. The SEAP activity in the culture supernatants was then determined, and is expressed on the basis of cell number (FIG 8). As can be seen, clones with STAR elements in the expression units were isolated that express 2-3 fold higher SEAP activity than clones whose expression units do not include STAR elements. Furthermore, the number of STAR-containing clones that express SEAP activity at or above the maximal activity of the STAR-less clones is quite high: 25% to 40% of the STAR clone populations exceed the highest SEAP expression of the pSDH-CSP clones.
Recombinant U-2 OS cell clones containing the pSDH-CMV vector, or pSDH-CMV plasmids containing STAR6 or STAR8 (Table [...]), were cultured for 3 weeks. The luciferase activity in the host cells was then determined, and is expressed as relative luciferase units (FIG 9), normalized to total cell protein.
The recombinant U-2 OS clones with STAR elements flanking the expression units had higher yields than the STAR-less clones: the highest expression observed from STAR8 clones was 2-3 fold higher than the expression from STAR-less clones. STAR6 clones had maximal expression levels 5 fold higher than the STAR-less clones. The STAR elements conferred greater predictability as well: for both STAR elements, 15 to 20% of the clones displayed luciferase expression at levels comparable to or greater than the STAR-less clone with the highest expression level.
These results demonstrate that, when used with the strong CMV promoter, STAR elements increase the yield of heterologous proteins (luciferase and SEAP). All three of the STAR elements introduced in this example provide elevated yields. The increased predictability conferred by the STAR elements is manifested by the large proportion of the clones with yields equal to or greater than the highest yields displayed by the STAR-less clones.
Example 5: STAR elements improve the stability of transgene expression
During cultivation of recombinant host cells, it is common practice to maintain antibiotic selection. This is intended to prevent transcriptional silencing of the transgene, or loss of the transgene from the genome by processes such as recombination. However, maintaining selection is undesirable for production of proteins, for a number of reasons. First, the antibiotics that are used are quite expensive, and contribute significantly to the unit cost of the product. Second, for biopharmaceutical use, the protein must be demonstrably pure, with no traces of the antibiotic in the product. One advantage of STAR elements for heterologous protein production is that they confer stable expression on transgenes during prolonged cultivation, even in the absence of antibiotic selection; this property is demonstrated in this example.
Materials and Methods
The U-2 OS cell line was transfected with the plasmid pSDH-Tet-STAR6 and cultivated as described in Example 4. Individual puromycin-resistant clones were isolated and cultivated further in the absence of doxycycline. At weekly intervals the cells were transferred to fresh culture vessels at a dilution of 1:20. Luciferase activity was measured at periodic intervals as described in Example 4. After 15 weeks the cultures were divided into two replicates; one replicate continued to receive puromycin, while the other replicate received no antibiotic for the remainder of the experiment (25 weeks total).
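To give a sense of the duration of cultivation in terms of cell generations, the weekly 1:20 transfer can be converted to population doublings. The sketch below assumes that the culture regrows to its starting density between transfers; that assumption, and the function names, are not taken from the text.

```python
import math

# Illustrative sketch, assuming the culture returns to the same density
# before each weekly 1:20 transfer (an assumption, not stated in the text).

def doublings_per_passage(dilution_factor: float) -> float:
    """Population doublings needed to regrow after a 1:dilution_factor split."""
    return math.log2(dilution_factor)


def cumulative_doublings(dilution_factor: float, passages: int) -> float:
    """Total population doublings over a number of equal passages."""
    return passages * doublings_per_passage(dilution_factor)


# 25 weekly passages in total; the last 10 weeks follow the split into replicates.
print(f"{cumulative_doublings(20, 25):.0f}")  # 108
print(f"{cumulative_doublings(20, 10):.0f}")  # 43
```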
Results
Table 7 presents the data on luciferase expression by an expression unit flanked with STAR6 during prolonged growth with or without antibiotic. As can be seen, the expression of the reporter transgene, luciferase, remains stable in the U-2 OS host cells for the duration of the experiment. After the cultures were divided into two treatments (plus antibiotic and without antibiotic) the expression of luciferase was essentially stable in the absence of antibiotic selection. This demonstrates the ability of STAR elements to protect transgenes from silencing or loss during prolonged cultivation. It also demonstrates that this property is independent of antibiotic selection.
Therefore production of proteins is possible without incurring the costs of the antibiotic or of difficult downstream processing.
Example 6: Minimal essential sequences of STAR elements
STAR elements are isolated from the genetic screen as described herein. The screen uses libraries constructed with human genomic DNA that was size-fractionated to approximately 0.5-2 kilobases (supra). The STAR elements range from 500 to 2361 base pairs (Table [...]). It is likely that, for many of the STAR elements that have been isolated, STAR activity is conferred by a smaller DNA fragment than the initially isolated clone. It is useful to determine these minimum fragment sizes that are essential for STAR activity, for two reasons. First, smaller functional STAR elements would be advantageous in the design of compact expression vectors, since smaller vectors transfect host cells with higher efficiency. Second, determining minimum essential STAR sequences permits the modification of those sequences for enhanced functionality. Two STAR elements have been fine-mapped to determine their minimal essential sequences.
Materials and Methods:
STAR10 (1167 base pairs) and STAR27 (1520 base pairs) have been fine-mapped. They have been amplified by PCR to yield sub-fragments of approximately equal length (FIG 10 legend). For initial testing, these have been cloned into the pSelect vector at the BamHI site, and transfected into U-2 OS/Tet-Off/LexA-HP1 cells. The construction of the host strains has been described (van der Vlag et al., 2000). Briefly, they are based on the U-2 OS human osteosarcoma cell line (American Type Culture Collection HTB-96). U-2 OS is stably transfected with the pTet-Off plasmid (Clontech K1620-A), encoding a protein chimera consisting of the Tet-repressor DNA binding domain and the VP16 transactivation domain. The cell line is subsequently stably transfected with fusion protein genes containing the LexA DNA binding domain, and the coding regions of either HP1 or HPC2 (two Drosophila Polycomb group proteins that repress gene expression when tethered to DNA).
The LexA-repressor genes are under control of the Tet-Off transcriptional regulatory system (Gossen and Bujard, 1992). After selection for hygromycin resistance, LexA-HP1 was induced by lowering the doxycycline concentration.
Transfected cells were then incubated with zeocin to test the ability of the STAR fragments to protect the SV40-Zeo expression unit from repression due to LexA-HP1 binding.
Results
In this experiment STAR10 and STAR27 confer good protection against gene silencing, as expected (FIG 10). This is manifested by robust growth in the presence of zeocin.
Of the 3 STAR10 sub-fragments, 10A (~400 base pairs) confers on transfected cells vigorous growth in the presence of zeocin, exceeding that of the full-length STAR element. Cells transfected with pSelect constructs containing the other 2 sub-fragments do not grow in the presence of zeocin.
These results identify the ~400 base pair 10A fragment as encompassing the DNA sequence responsible for the anti-repression activity of STAR10.
STAR27 confers moderate growth in zeocin to transfected cells in this experiment (FIG 10). One of the sub-fragments of this STAR, 27B (~500 base pairs), permits weak growth of the host cells in zeocin-containing medium.
This suggests that the anti-repression activity of this STAR is partially localized on sub-fragment 27B, but full activity requires sequences from 27A and/or 27C (each ~500 base pairs) as well.
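The division of a STAR element into sub-fragments of approximately equal length can be sketched as below. This is an idealized, non-overlapping split computed from the element lengths given above; the actual sub-fragments were PCR products whose boundaries are set by the primers described in the FIG 10 legend, so the coordinates produced here are hypothetical and need not match fragments 10A-10C or 27A-27C.

```python
# Illustrative sketch: split a sequence of a given length into n contiguous
# sub-fragments of near-equal size. Coordinates are 0-based, half-open.
# The real fragment boundaries depend on primer placement and may differ.

def equal_subfragments(length_bp: int, n: int):
    """Return a list of (start, end) intervals covering length_bp in n parts."""
    if n <= 0 or length_bp < n:
        raise ValueError("need at least one base pair per fragment")
    base, extra = divmod(length_bp, n)
    bounds = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        bounds.append((start, start + size))
        start += size
    return bounds


print(equal_subfragments(1167, 3))  # [(0, 389), (389, 778), (778, 1167)]
print(equal_subfragments(1520, 3))  # [(0, 507), (507, 1014), (1014, 1520)]
```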
Example 7: STAR elements function in diverse strains of cultured mammalian cells
The choice of host cell line for (heterologous) protein expression is a critical parameter for the quality, yield, and unit cost of the protein.
Considerations such as post-translational modifications, secretory pathway capacity, and cell line immortality dictate the appropriate cell line for a particular biopharmaceutical production system. For this reason, the advantages provided by STAR elements in terms of yield, predictability, and stability should be obtainable in diverse cell lines. This was tested by comparing the function of STAR6 in the human U-2 OS cell line in which it was originally cloned, and the CHO cell line which is widely applied in biotechnology.
Materials and Methods:
The experiments of Example 4 are referred to.
Results
The expression of the SEAP reporter gene in CHO cells is presented in FIG 8; the expression of the luciferase reporter gene in U-2 OS cells is presented in FIG 9. By comparison of the results of these two experiments, it is apparent that the STAR6 element is functional in both cell lines: reporter gene expression was more predictable in both of them, and clones of each cell line displayed higher yields, when the reporter gene was shielded from position effects by STAR6. These two cell lines are derived from different species (human and hamster) and different tissue types (bone and ovary), reflecting the broad range of host cells in which this STAR element can be utilized in improving heterologous protein expression.
Example 8: STAR elements function in the context of various transcriptional promoters
Transgene transcription is achieved by placing the transgene open reading frame under control of an exogenous promoter. The choice of promoter is influenced by the nature of the (heterologous) protein and the production system. In most cases, strong constitutive promoters are preferred because of the high yields they can provide. Some viral promoters have these properties; the promoter/enhancer of the cytomegalovirus immediate early gene ("CMV promoter") is generally regarded as the strongest promoter in common biotechnological use (Boshart et al., 1985, Doll et al., 1996, Foecking and Hofstetter, 1986). The simian virus SV40 promoter is also moderately strong (Boshart et al., 1985, Foecking and Hofstetter, 1986) and is frequently used for ectopic expression in mammalian cell vectors. The Tet-Off promoter is inducible: the promoter is repressed in the presence of tetracycline or related antibiotics (doxycycline is commonly used) in cell lines which express the tTA plasmid (Clontech K1620-A), and removal of the antibiotic results in transcriptional induction (Deuschle et al., 1995, Gossen and Bujard, 1992, Izumi and Gilbert, 1999, Umana et al., 1999).
Materials and Methods:
The construction of the pSDH-Tet and pSDH-CMV vectors is described in Example 4. pSDH-SV40 is, amongst others, derived from pSelect-SV40-zeo. The selection vector for STAR elements, pSelect-SV40-zeo, is constructed as follows: the pREP4 vector (Invitrogen V004-50) is used as the plasmid backbone. It provides the Epstein Barr oriP origin of replication and EBNA-1 nuclear antigen for high-copy episomal replication in primate cell lines; the hygromycin resistance gene with the thymidine kinase promoter and polyadenylation site, for selection in mammalian cells; and the ampicillin resistance gene and colE1 origin of replication for maintenance in Escherichia coli. The vector contains four consecutive LexA operator sites between XbaI and NheI restriction sites (Bunker and Kingston, 1994). Embedded between the LexA operators and the NheI site is a polylinker consisting of the following restriction sites: HindIII-AscI-BamHI-AscI-HindIII. Between the NheI site and a SalI site is the zeocin resistance gene with the SV40 promoter and polyadenylation site, derived from pSV40/Zeo (Invitrogen V502-20); this is the selectable marker for the STAR screen.
pSDH-SV40 was constructed by PCR amplification of the SV40 promoter (primers D41 and D42) from plasmid pSelect-SV40-Zeo, followed by digestion of the PCR product with SacII and SalI. The pSDH-CMV vector was digested with SacII and SalI to remove the CMV promoter, and the vector and fragment were ligated together to create pSDH-SV40. STAR6 was cloned into MCSI and MCSII as described in Example 4. The plasmids pSDH-Tet, pSDH-Tet-STAR6, pSDH-Tet-STAR7, pSDH-SV40 and pSDH-SV40-STAR6 were co-transfected with pBabe-Puro into U-2 OS using SuperFect as described by the manufacturer. Cell cultivation, puromycin selection, and luciferase assays were carried out as described in Example 4.
Results
FIGS 9, 11, and 12 compare the expression of the luciferase reporter gene from 3 different promoters: two strong and constitutive viral promoters (CMV and SV40), and the inducible Tet-Off promoter. All three promoters were tested in the context of the STAR6 element in U-2 OS cells. The results demonstrate that the yield and predictability from all 3 promoters are increased by STAR6. As described in Examples 4 and 7, STAR6 is beneficial in the context of the CMV promoter (FIG 9). Similar improvements are seen in the context of the SV40 promoter (FIG 11): the yield from the highest-expressing STAR6 clone is 2-3 fold greater than the best pSDH-SV40 clones, and 6 STAR clones (20% of the population) have yields higher than the best STAR-less clones. In the context of the Tet-Off promoter under inducing (low doxycycline) concentrations, STAR6 also improves the yield and predictability of transgene expression (FIG 12): the highest-expressing STAR6 clone has a higher yield than the best pSDH-Tet clone, and 9 STAR6 clones (35% of the population) have yields higher than the best STAR-less clone. It is concluded that this STAR element is versatile in its transgene-protecting properties, since it functions in the context of various biotechnologically useful promoters of transcription.
Example 9: STAR element function can be directional
While short nucleic acid sequences can be symmetrical (e.g. palindromic), longer naturally-occurring sequences are typically asymmetrical. As a result, the information content of nucleic acid sequences is directional, and the sequences themselves can be described with respect to their 5' and 3' ends. The directionality of nucleic acid sequence information affects the arrangement in which recombinant DNA molecules are assembled using standard cloning techniques known in the art (Sambrook et al., 1989). STAR elements are long, asymmetrical DNA sequences, and have a directionality based on the orientation in which they were originally cloned in the pSelect vector. In the examples given above, using two STAR elements in pSDH vectors, this directionality was preserved. This orientation is described as the native or [...] orientation, relative to the zeocin resistance gene (see FIG 13). In this example the importance of directionality for STAR function is tested in the pSDH-Tet vector. Since the reporter genes in the pSDH vectors are flanked on both sides by copies of the STAR element of interest, the orientation of each STAR copy must be considered. This example compares the native orientation with the opposite orientation (FIG 13).
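The distinction between symmetrical (palindromic) and asymmetrical sequences can be illustrated with a reverse-complement check. The sketch below is a generic illustration; the function names are hypothetical and the example sequences are not STAR sequences. A sequence is palindromic in the molecular-biology sense when it equals its own reverse complement, so that it reads identically in either orientation.

```python
# Illustrative sketch: reverse complement and palindrome test for DNA strings.
# Function names and example sequences are assumptions for demonstration.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence is identical to its reverse complement."""
    return seq.upper() == reverse_complement(seq).upper()


print(reverse_complement("GATTACA"))  # TGTAATC
print(is_palindromic("GGATCC"))       # True  (the BamHI recognition site)
print(is_palindromic("GATTACA"))      # False (orientation matters)
```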
Materials and Methods:
The STAR66 element was cloned into pSDH-Tet as described in Example 4. U-2 OS cells were co-transfected with plasmids pSDH-Tet-STAR66-native and pSDH-Tet-STAR66-opposite, and cultivated as described in Example 4. Individual clones were isolated and cultivated; the level of luciferase expression was determined as described (supra).
Results
The results of the comparison of STAR66 activity in the native orientation and the opposite orientation are shown in FIG 14. When STAR66 is in the opposite orientation, the yield of only one clone is reasonably high ([...] luciferase units). In contrast, the yield of the highest-expressing clone when STAR66 is in the native orientation is considerably higher (100 luciferase units), and the predictability is much higher as well: 7 clones of the native-orientation population express luciferase above the level of the highest-expressing clone from the opposite-orientation population, and 15 of the clones in the native-orientation population express luciferase above 10 relative luciferase units.
Therefore it is demonstrated that STAR66 function is directional.
Example 10: Transgene expression in the context of STAR elements is copy number-dependent
Transgene expression units for (heterologous) protein expression are generally integrated into the genome of the host cell to ensure stable retention during cell division. Integration can result in one or multiple copies of the expression unit being inserted into the genome; multiple copies may or may not be present as tandem arrays. The increased yield demonstrated for transgenes protected by STAR elements (supra) suggests that STAR elements are able to permit the transgene expression units to function independently of influences on transcription associated with the site of integration in the genome (independence from position effects (Boivin and Dura, 1998)). It suggests further that the STAR elements permit each expression unit to function independently of neighboring copies of the expression unit when they are integrated as a tandem array (independence from repeat-induced gene silencing (Garrick et al., 1998)). Copy number-dependence is determined from the relationship between transgene expression levels and copy number, as described in the example below.
Materials and Methods:
U-2 OS cells were co-transfected with pSDH-Tet-STAR10 and cultivated under puromycin selection as described (supra). Eight individual clones were isolated and cultivated further. Then cells were harvested, and one portion was assayed for luciferase activity as described (supra). The remaining cells were lysed and the genomic DNA purified using the DNeasy Tissue Kit (QIAGEN 69504) as described by the manufacturer. DNA samples were quantitated by UV spectrophotometry. Three micrograms of each genomic DNA sample were digested with PvuII and XhoI overnight as described by the manufacturer (New England Biolabs), and resolved by agarose gel electrophoresis. DNA fragments were transferred to a nylon membrane as described (Sambrook et al., 1989), and hybridized with a radioactively labelled probe to the luciferase gene (isolated from BamHI/SacII-digested pSDH-Tet). The blot was washed as described (Sambrook et al., 1989) and exposed to a phosphorimager screen (Personal F/X, BioRad). The resulting autoradiogram (FIG 15) was analyzed by densitometry to determine the relative strength of the luciferase DNA bands, which represents the transgene copy number.
Results
The enzyme activities and copy numbers (DNA band intensities) of luciferase in the clones from the pSDH-Tet-STAR10 clone population are shown in FIG 16. The transgene copy number is highly correlated with the level of luciferase expression in these pSDH-Tet-STAR10 clones (r = 0.86). This suggests that STAR10 confers copy number-dependence on the transgene expression units, making transgene expression independent of other transgene copies in tandem arrays, and independent of gene-silencing influences at the site of integration.
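By way of illustration only, a correlation coefficient of the kind reported above can be computed as in the following Python sketch. The function name and the clone values are hypothetical and are not the data of FIG 16; the sketch assumes that the densitometric band intensity is used directly as the measure of transgene copy number.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for eight clones (illustrative only, not measured data):
# relative luciferase band intensity (copy number) and luciferase activity.
band_intensity = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1, 7.0, 8.3]
luciferase_units = [12, 18, 35, 40, 61, 58, 80, 85]

r = pearson_r(band_intensity, luciferase_units)
print(f"r = {r:.2f}")
```

A value of r close to 1 indicates that expression rises in proportion to copy number, which is the behaviour described for the pSDH-Tet-STAR10 clones.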
Example 11: STAR elements function as enhancer blockers but not enhancers
Gene promoters are subject to both positive and negative influences on their ability to initiate transcription. An important class of elements that exert positive influences are enhancers. Enhancers are characteristically able to affect promoters even when they are located far away (many kilobase pairs) from the promoter. Negative influences that act by heterochromatin formation (e.g. Polycomb group proteins) have been described above, and these are the target of STAR activity. The biochemical basis for enhancer function and for heterochromatin formation is fundamentally similar, since they both involve binding of proteins to DNA. Therefore it is important to determine whether STAR elements are able to block positive influences as well as negative influences, in other words, to shield transgenes from genomic enhancers in the vicinity of the site of integration. The ability to shield transgenes from enhancer activity ensures stable and predictable performance of transgenes in biotechnological applications. This example examines the performance of STAR elements in an enhancer-blocking assay.
Another feature of STAR activity that is important to their function is the increased yield they confer on transgenes (Example ...). STARs are isolated on the basis of their ability to maintain high levels of zeocin expression when heterochromatin-forming proteins are bound adjacent to the candidate STAR elements. High expression is predicted to occur because STARs are anticipated to block the spread of heterochromatin into the zeocin expression unit. However, a second scenario is that the DNA fragments in zeocin-resistant clones contain enhancers. Enhancers have been demonstrated to have the ability to overcome the repressive effects of Polycomb-group proteins such as those used in the method of the STAR screen (Zink & Paro, 1995). Enhancers isolated by this phenomenon would be considered false positives, since enhancers do not have the properties claimed here for STARs. In order to demonstrate that STAR elements are not enhancers, they have been tested in an enhancer assay.
The enhancer-blocking assay and the enhancer assay are methodologically and conceptually similar. The assays are shown schematically in FIG 17. The ability of STAR elements to block enhancers is tested using the E47/E-box enhancer system. The E47 protein is able to activate transcription by promoters when it is bound to an E-box DNA sequence located in the vicinity of those promoters (Quong et al., 2002). E47 is normally involved in regulation of B and T lymphocyte differentiation (Quong et al., 2002), but it is able to function in diverse cell types when expressed ectopically (Petersson et al., 2002). The E-box is a palindromic DNA sequence, CANNTG (Knofler et al., 2002). In the enhancer-blocking assay, an E-box is placed upstream of a luciferase reporter gene (including a minimal promoter) in an expression vector. A cloning site for STAR elements is placed between the E-box and the promoter. The E47 protein is encoded on a second plasmid.
The assay is performed by transfecting both the E47 plasmid and the luciferase expression vector into cells; the E47 protein is expressed and binds to the E-box, and the E47/E-box complex is able to act as an enhancer. When the luciferase expression vector does not contain a STAR element, the E47/E-box complex enhances luciferase expression (FIG 17A, situation 1). When STAR elements are inserted between the E-box and the promoter, their ability to block the enhancer is demonstrated by reduced expression of luciferase activity (FIG 17A, situation 2); if STARs cannot block enhancers, luciferase expression is activated (FIG 17A, situation 3).
The ability of STAR elements to act as enhancers is tested using the same luciferase expression vector. In the absence of E47, the E-box itself does not affect transcription. Instead, enhancer behaviour by STAR elements will result in activation of luciferase transcription. The assay is performed by transfecting the luciferase expression vector without the E47 plasmid. When the expression vector does not contain STAR elements, luciferase expression is low (FIG 17B, situation 1). If STAR elements do not have enhancer properties, luciferase expression is low when a STAR element is present in the vector (FIG 17B, situation 2). If STAR elements do have enhancer properties, luciferase expression will be activated in the STAR-containing vectors (FIG 17B, situation 3).
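By way of illustration only, occurrences of the E-box consensus CANNTG referred to above can be located in a DNA sequence with a short Python sketch such as the following. The function name and the test sequences are hypothetical; only the given strand is scanned, and N is assumed to stand for any of A, C, G, or T.

```python
import re

# CANNTG with N = any nucleotide. The zero-width lookahead lets
# overlapping sites be reported as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(seq):
    """Return (0-based position, matched site) for every E-box in seq."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]

print(find_eboxes("ttCAGCTGaa"))
```

A scan of this kind could be used, for example, to confirm that a reporter construct carries an E-box upstream of the cloning site.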
Materials and Methods
The luciferase expression vector was constructed by inserting the E-box and a human alkaline phosphatase minimal promoter from plasmid mu-E5+E2x6-cat(x) (Ruezinsky et al., 1991) upstream of the luciferase gene in plasmid pGL3-basic (Promega E1751), to create pGL3-E-box-luciferase (gift of W. Romanow). The E47 expression plasmid contains the E47 open reading frame under control of a beta-actin promoter in the pHBAPr-1-neo plasmid; E47 is constitutively expressed from this plasmid (gift of W. Romanow).
STAR elements 1, 2, 3, 6, 10, 11, 18, and 27 have been cloned into the luciferase expression vector. Clones containing the Drosophila scs element and the chicken beta-globin HS4-6x core ("HS4") element have been included as positive controls (they are known to block enhancers, and to have no intrinsic enhancer properties (Chung et al., 1993, Kellum & Schedl, 1992)), and the empty luciferase expression vector has been included as a negative control. All assays were performed using the U-2 OS cell line. In the enhancer-blocking assay, the E47 plasmid was co-transfected with the luciferase expression vectors (empty vector, or containing STAR or positive-control elements). In the enhancer assay, the E47 plasmid was co-transfected with STARless luciferase expression vector as a positive control for enhancer activity; all other samples received a mock plasmid during co-transfection. The transiently transfected cells were assayed for luciferase activity 48 hours after plasmid transfection (supra). The luciferase activity expressed from a plasmid containing no E-box or STAR/control elements was subtracted, and the luciferase activities were normalized to protein content as described (supra).
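By way of illustration only, the background subtraction and normalization described above might be carried out as in the following Python sketch. The function name, the sample names, and all numerical values are hypothetical. The sketch assumes that each luciferase reading is first divided by its protein content, that the similarly normalized background is then subtracted, and that the result is expressed relative to a reference sample set to 100; the order of these operations is an assumption and is not specified in the text.

```python
def normalize_luciferase(samples, background, reference="vector"):
    """Scale luciferase readings so that the reference sample equals 100.

    samples:    name -> (raw luciferase units, protein content)
    background: (raw luciferase units, protein content) of the plasmid
                lacking E-box and STAR/control elements
    """
    bg = background[0] / background[1]
    per_protein = {
        name: raw / protein - bg for name, (raw, protein) in samples.items()
    }
    ref = per_protein[reference]
    return {name: 100.0 * value / ref for name, value in per_protein.items()}

# Hypothetical readings (illustrative only, not measured data).
readings = {"vector": (1100.0, 1.0), "STAR": (300.0, 2.0)}
relative = normalize_luciferase(readings, background=(100.0, 1.0))
print(relative)
```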
Results
FIG 18 shows the results of the enhancer-blocking assay. In the absence of STAR elements (or the known enhancer-blocking elements scs and HS4), the E47/E-box enhancer complex activates expression of luciferase ("vector"); this enhanced level of expression has been normalized to 100. Enhancer activity is blocked by all STAR elements tested. Enhancer activity is also blocked by the HS4 and scs elements, as expected (Bell et al., 2001, Gerasimova & Corces, 2001). These results demonstrate that in addition to their ability to block the spreading of transcriptional silencing (negative influences), STAR elements are able to block the action of enhancers (positive influences).
FIG 19 shows the results of the enhancer assay. The level of luciferase expression due to enhancement by the E47/E-box complex is set at 100. By comparison, none of the STAR elements bring about significant activation of luciferase expression. As expected, the scs and HS4 elements also do not bring about activation of the reporter gene. Therefore it is concluded that at least the tested STAR elements do not possess enhancer properties.
Example 12: STAR elements are conserved between mouse and human
BLAT analysis of the STAR DNA sequence against the human genome database (http://genome.ucsc.edu/cgi-bin/hgGateway) reveals that some of these sequences have high sequence conservation with other regions of the human genome. These duplicated regions are candidate STAR elements; if they do show STAR activity, they would be considered paralogs of the cloned STARs (two genes or genetic elements are said to be paralogous if they are derived from a duplication event (Li, 1997)).
BLAST analysis of the human STARs against the mouse genome (http://www.ensembl.org/Musmusculusblastview) also reveals regions of high sequence conservation between mouse and human. This sequence conservation has been shown for fragments of 15 out of the 65 human STAR elements. The conservation ranges from 64% to 89%, over lengths of 141 base pairs to 909 base pairs (Table 8). These degrees of sequence conservation are remarkable and suggest that these DNA sequences may confer STAR activity within the mouse genome as well. Some of the sequences from the mouse and human genomes in Table 8 could be strictly defined as orthologs (two genes or genetic elements are said to be orthologous if they are derived from a speciation event (Li, 1997)). For example, STAR6 is between the SLC8A1 and HAAO genes in both the human and mouse genomes. In other cases, a cloned human STAR has a paralog within the human genome, and its ortholog has been identified in the mouse genome. For example, STAR3a is a fragment of the 15q11.2 region of human chromosome 15. This region is 96.9% identical (paralogous) with a DNA fragment at 5q33.3 on human chromosome 5, which is near the IL12B interleukin gene. These human DNAs share approximately ... identity with a fragment of the 11B2 region on mouse chromosome 11. The 11B2 fragment is also near the (mouse) IL12B interleukin gene. Therefore STAR3a and the mouse 11B2 fragment can be strictly defined as paralogs.
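By way of illustration only, a percentage identity of the kind quoted above can be computed from a pairwise alignment as in the following Python sketch. The function name and the sequences are hypothetical. The sketch assumes that the two sequences have already been aligned (gaps written as "-") and that alignment columns containing a gap are excluded from the calculation; alignment programs differ in how they treat gaps, so the figures reported by BLAT or BLAST may be defined differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical bases over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Hypothetical aligned fragments (illustrative only).
print(percent_identity("AC-GT", "ACTGA"))
```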
In order to test the hypothesis that STAR activity is shared between regions of high sequence conservation in the mouse and human genome, one of the human STARs with a conserved sequence in mouse, STAR18, has been analyzed in greater detail. The sequence conservation in the mouse genome detected with the original STAR18 clone extends leftward on human chromosome 2 for about 500 base pairs (FIG 20; left and right relate to the standard description of the arms of chromosome 2). In this example we examine whether the region of sequence conservation defines a "naturally occurring" STAR element in human that is more extensive in length than the original clone. We also examine whether the STAR function of this STAR element is conserved between mouse and human.
Materials and Methods
The region of mouse/human sequence conservation around STAR18 was recovered from human BAC clone RP11-387A1 by PCR amplification, in three fragments: the entire region (primers E93 and E94), the leftward half (primers E93 and E92), and the rightward half (primers E57 and E94). The corresponding fragments from the homologous mouse region were recovered from BAC clone RP23-400H17 in the same fashion (primers E95 and E98, and E96, and E97 and E98, respectively). All fragments were cloned into the pSelect vector and transfected into a U-2 OS/Tet-Off/LexA-HP1 cell line (supra). Following transfection, hygromycin selection was carried out to select for transfected cells. The LexA-HP1 protein was induced by lowering the doxycycline concentration, and the ability of the transfected cells to withstand the antibiotic zeocin (a measure of STAR activity) was assessed by monitoring cell growth.
The original STAR18 clone was isolated from Sau3AI digested human DNA ligated into the pSelect vector on the basis of its ability to prevent silencing of a zeocin resistance gene. Alignment of the human STAR18 clone (497 base pairs) with the mouse genome revealed high sequence similarity between the orthologous human and mouse STAR18 regions. It also uncovered high similarity in the region extending for 488 base pairs immediately leftwards of the Sau3AI site that defines the left end of the cloned region (FIG 22). Outside these regions the sequence similarity between human and mouse DNA drops below ...
As indicated in FIG 20, both the human and the mouse STAR18 elements confer survival on zeocin to host cells expressing the LexA-HP1 repressor protein. The original 497 base pair STAR18 clone and its mouse ortholog both confer the ability to grow (FIG 20, a and ...). The adjacent 488 base pair regions of high similarity from both genomes also confer the ability to grow, and in fact their growth phenotype is more vigorous than that of the original STAR18 clone (FIG 20, b and ...). When the entire region of sequence similarity was tested, these DNAs from both mouse and human confer growth, and the growth phenotype is more vigorous than the two sub-fragments (FIG 20, c and ...). These results demonstrate that the STAR activity of human STAR18 is conserved in its ortholog from mouse. The high sequence conservation between these orthologous regions is particularly noteworthy because they are not protein-coding sequences, leading to the conclusion that they have some regulatory function that has prevented their evolutionary divergence through mutation.
This analysis demonstrates that cloned STAR elements identified by the original screening program may in some cases represent partial STAR elements, and that analysis of the genomic DNA in which they are embedded can identify sequences with stronger STAR activity.
Example 13: STAR elements contain characteristic DNA sequence motifs
STAR elements are isolated on the basis of their anti-repression phenotype with respect to transgene expression. This anti-repression phenotype reflects underlying biochemical processes that regulate chromatin formation which are associated with the STAR elements. These processes are typically sequence-specific and result from protein binding or DNA structure. This suggests that STAR elements will share DNA sequence similarity.
Identification of sequence similarity among STAR elements will provide sequence motifs that are characteristic of the elements that have already been identified by functional screens and tests. The sequence motifs will also be useful to recognize and claim new STAR elements whose functions conform to the claims of this patent. The functions include improved yield and stability of transgenes expressed in eukaryotic host cells.
Other benefits of identifying sequence motifs that characterize STAR elements include: provision of search motifs for prediction and identification of new STAR elements in genome databases, provision of a rationale for modification of the elements, and provision of information for functional analysis of STAR activity. Using bio-informatics, sequence similarities among STAR elements have been identified; the results are presented in this example.
Bio-informatic and Statistical Background. Regulatory DNA elements typically function via interaction with sequence-specific DNA-binding proteins. Bio-informatic analysis of DNA elements such as STAR elements whose regulatory properties have been identified, but whose interacting proteins are unknown, requires a statistical approach for identification of sequence motifs. This can be achieved by a method that detects short DNA sequence patterns that are over-represented in a set of regulatory DNA elements (e.g. the STAR elements) compared to a reference sequence (e.g. the complete human genome). The method determines the number of observed and expected occurrences of the patterns in each regulatory element. The number of expected occurrences is calculated from the number of observed occurrences of each pattern in the reference sequence.
The DNA sequence patterns can be oligonucleotides of a given length, e.g. six base pairs. In the simplest analysis, for a 6 base pair oligonucleotide (hexamer) composed of the four nucleotides (A, C, G, and T) there are 4^6 = 4096 distinct oligonucleotides (all combinations from AAAAAA to TTTTTT). If the regulatory and reference sequences were completely random and had equal proportions of the A, C, G, and T nucleotides, then the expected frequency of each hexamer would be 1/4096 (~0.00024). However, the actual frequency of each hexamer in the reference sequence typically differs from this due to biases in the content of G:C base pairs, etc. Therefore the frequency of each oligonucleotide in the reference sequence is determined empirically by counting, to create a "frequency table" for the patterns.
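By way of illustration only, a frequency table of the kind described above can be built by counting hexamers in a sliding window, as in the following Python sketch. The function names and the test sequence are hypothetical; the sketch counts on the given strand only and skips windows containing characters other than A, C, G, or T, which are assumptions rather than a description of the software actually used.

```python
from collections import Counter

def kmer_counts(seq, k=6):
    """Count every k-mer (overlapping windows) on the given strand."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[kmer] += 1
    return counts

def frequency_table(seq, k=6):
    """Relative frequency of each observed k-mer in the reference sequence."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

print(4 ** 6)  # number of distinct hexamers: 4096
print(kmer_counts("ACGTGAACGTGA")["ACGTGA"])
```

The expected number of occurrences of a pattern in a set of regulatory elements is then its frequency in the reference table multiplied by the number of hexamer positions in that set.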
The pattern frequency table of the reference sequence is then used to calculate the expected frequency of occurrence of each pattern in the regulatory element set. The expected frequencies are compared with the observed frequencies of occurrence of the patterns. Patterns that are "over-represented" in the set are identified; for example, if the hexamer ACGTGA is expected to occur 5 times in 20 kilobase pairs of sequence, but is observed to occur 15 times, then it is three-fold over-represented. Ten of the 15 occurrences of that hexameric sequence pattern would not be expected in the regulatory elements if the elements had the same hexamer composition as the entire genome. Once the over-represented patterns are identified, a statistical test is applied to determine whether their over-representation is significant, or may be due to chance. For this test, a significance index, "sig", is calculated for each pattern. The significance index is derived from the probability of occurrence of each pattern, which is estimated by a binomial distribution. The probability takes into account the number of possible patterns (4096 for hexamers). The highest sig values correspond to the most over-represented oligonucleotides (van Helden et al., 1998). In practical terms, oligonucleotides with sig >= 0 are considered as over-represented. A pattern with sig >= 0 is likely to be over-represented due to chance once (= 10^0) in the set of regulatory element sequences. However, at sig >= 1 a pattern is expected to be over-represented once in ten (= 10^1) sequence sets, at sig >= 2 once in 100 (= 10^2) sequence sets, etc.
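By way of illustration only, a significance index of this kind can be sketched in Python as follows. The sketch assumes the definition sig = -log10(P x D), where P is the binomial probability of observing at least the observed number of occurrences and D is the number of possible patterns (4096 for hexamers); this is our reading of van Helden et al. (1998) and of the description above, and the function names are hypothetical. The number of trials is taken to be approximately the number of hexamer positions in the sequence set.

```python
import math

def log_binomial_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binomial_upper_tail(observed, n, p):
    """P(X >= observed) for X ~ Binomial(n, p), summed term by term."""
    total = 0.0
    for k in range(observed, n + 1):
        term = math.exp(log_binomial_pmf(k, n, p))
        total += term
        # Beyond the mean the terms keep shrinking; stop once negligible.
        if k > n * p and term < total * 1e-15:
            break
    return total

def significance_index(observed, n_positions, pattern_frequency, n_patterns=4096):
    """sig = -log10(P-value x number of possible patterns) (assumed definition)."""
    p_value = binomial_upper_tail(observed, n_positions, pattern_frequency)
    return -math.log10(p_value * n_patterns)

# The ACGTGA example from the text: expected 5, observed 15 in about 20 kb.
n_positions = 20000
expected, observed = 5, 15
fold = observed / expected  # three-fold over-representation
sig = significance_index(observed, n_positions, expected / n_positions)
print(fold, sig)
```

Because the probability is multiplied by the number of possible patterns, a pattern must be strongly over-represented before its sig value rises much above zero.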
The patterns that are significantly over-represented in the regulatory element set are used to develop a model for classification and prediction of regulatory element sequences. This employs Discriminant Analysis, a so-called "supervised" method of statistical classification known to one of ordinary skill in the art (Huberty, 1994). In Discriminant Analysis, sets of known or classified items (e.g. STAR elements) are used to "train" a model to recognize those items on the basis of specific variables (e.g. sequence patterns such as hexamers). The trained model is then used to predict whether other items should be classified as belonging to the set of known items (e.g. is a DNA sequence a STAR element). In this example, the known items in the training set are STAR elements (positive training set). They are contrasted with sequences that are randomly selected from the genome (negative training set) which have the same length as the STAR elements. Discriminant Analysis establishes criteria for discriminating positives from negatives based on a set of variables that distinguish the positives; in this example, the variables are the significantly over-represented patterns (e.g. hexamers).
When the number of over-represented patterns is high compared to the size of the training set, the model could become biased due to over-training. Over-training is circumvented by applying a forward stepwise selection of variables (Huberty, 1994). The goal of Stepwise Discriminant Analysis is to select the minimum number of variables that provides maximum discrimination between the positives and negatives. The model is trained by evaluating variables one-by-one for their ability to properly classify the items in the positive and negative training sets. This is done until addition of new variables to the model does not significantly increase the model's predictive power (i.e. until the classification error rate is minimized). This optimized model is then used for testing, in order to predict whether "new" items are positives or negatives (Huberty, 1994).
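By way of illustration only, the forward stepwise selection of variables described above can be sketched in Python as follows. For brevity the sketch uses a nearest-centroid rule in place of the linear discriminant function, so it is not Linear Discriminant Analysis as such; only the stepwise selection logic is illustrated. The function names and the pattern counts are hypothetical.

```python
def centroid(rows, features):
    """Mean value of each selected feature over a set of items."""
    return [sum(r[f] for r in rows) / len(rows) for f in features]

def sq_dist(row, center, features):
    """Squared Euclidean distance from an item to a centroid."""
    return sum((row[f] - c) ** 2 for f, c in zip(features, center))

def error_rate(pos, neg, features):
    """Fraction of training items misclassified by the nearest-centroid rule."""
    c_pos, c_neg = centroid(pos, features), centroid(neg, features)
    errors = sum(sq_dist(r, c_pos, features) >= sq_dist(r, c_neg, features) for r in pos)
    errors += sum(sq_dist(r, c_neg, features) > sq_dist(r, c_pos, features) for r in neg)
    return errors / (len(pos) + len(neg))

def forward_stepwise(pos, neg, n_features, min_gain=1e-9):
    """Add one variable at a time while the training error keeps decreasing."""
    selected, best = [], 1.0  # worst-case error before any variable is chosen
    while len(selected) < n_features:
        trials = [(error_rate(pos, neg, selected + [f]), f)
                  for f in range(n_features) if f not in selected]
        err, f = min(trials)
        if best - err <= min_gain:
            break  # no further improvement: stop adding variables
        selected.append(f)
        best = err
    return selected, best

# Hypothetical counts of three patterns per sequence (illustrative only).
positives = [[5, 1, 0], [6, 0, 1], [4, 1, 1]]
negatives = [[1, 1, 0], [0, 0, 1], [1, 1, 1]]
print(forward_stepwise(positives, negatives, n_features=3))
```

In this toy data only the first pattern separates the two sets, so the selection stops after one variable, mirroring the goal of retaining the minimum number of variables that gives maximum discrimination.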
It is inherent in classification statistics that for complex items such as DNA sequences, some elements of the positive training set will be classified as negatives (false negatives), and some members of the negative training set will be classified as positives (false positives). When a trained model is applied to testing new items, the same types of misclassifications are expected to occur.
In the bio-informatic method described here, the first step, pattern frequency analysis, reduces a large set of sequence patterns (e.g. all 4096 hexamers) to a smaller set of significantly over-represented patterns (e.g. 100 hexamers); in the second step, Stepwise Discriminant Analysis reduces the set of over-represented patterns to the subset of those patterns that have maximal discriminative power (e.g. 5-10 hexamers). Therefore this approach provides simple and robust criteria for identifying regulatory DNA elements such as STAR elements.
DNA-binding proteins can be distinguished on the basis of the type of binding site they occupy. Some recognize contiguous sequences; for this type of protein, patterns that are oligonucleotides of length 6 base pairs (hexamers) are fruitful for bio-informatic analysis (van Helden et al., 1998). Other proteins bind to sequence dyads: contact is made between pairs of highly conserved trinucleotides separated by a non-conserved region of fixed width (van Helden et al., 2000). In order to identify sequences in STAR elements that may be bound by dyad-binding proteins, frequency analysis was also conducted for this type of pattern, where the spacing between the two trinucleotides was varied from 0 to 20 (i.e. XXXN{0-20}XXX where X's are specific nucleotides composing the trinucleotides, and N's are random nucleotides from 0 to 20 base pairs in length). The results of dyad frequency analysis are also used for Linear Discriminant Analysis as described above.
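By way of illustration only, occurrences of a dyad pattern of the form XXXN{s}XXX can be counted as in the following Python sketch. The function names and the test sequences are hypothetical; the sketch scans the given strand only and allows overlapping occurrences, which are assumptions rather than a description of the software actually used.

```python
import re

def count_dyad(seq, left, right, spacing):
    """Count occurrences of left + N{spacing} + right, overlaps allowed."""
    pattern = re.compile(f"(?={left}[ACGT]{{{spacing}}}{right})")
    return sum(1 for _ in pattern.finditer(seq.upper()))

def dyad_profile(seq, left, right, max_spacing=20):
    """Occurrences of the dyad for every spacing from 0 to max_spacing."""
    return {s: count_dyad(seq, left, right, s) for s in range(max_spacing + 1)}

# CCCN{2}CGG: the trinucleotides CCC and CGG separated by two arbitrary bases.
print(count_dyad("CCCATCGG", "CCC", "CGG", 2))
```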
Materials and Methods
Using the genetic screen described herein and in EP 01202581.3, sixty-six (66) STAR elements were initially isolated from human genomic DNA and characterized in detail (Table 6). The screen was performed on gene libraries constructed by Sau3AI digestion of human genomic DNA, either purified from placenta (Clontech 6550-1) or carried in bacterial/P1 (BAC/PAC) artificial chromosomes. The BAC/PAC clones contain genomic DNA from regions of chromosome 1 (clones RP1154H19 and RP3328E19), from the HOX cluster of homeotic genes (clones RP1167F23, RP1170019, and RP11-387A1), or from human chromosome 22 (Research Genetics 96010-22). The DNAs were size-fractionated, and the 0.5-2 kb size fraction was ligated into BamHI-digested pSelect vector, by standard techniques (Sambrook et al., 1989). pSelect plasmids containing human genomic DNA that conferred resistance to zeocin at low doxycycline concentrations were isolated and propagated in Escherichia coli. The screens that yielded the STAR elements of Table 6 have assayed approximately 1-2% of the human genome.
The human genomic DNA inserts in these 66 plasmids were sequenced by the dideoxy method (Sanger et al., 1977) using a Beckman CEQ2000 automated DNA sequencer, following the manufacturer's instructions. Briefly, DNA was purified from E. coli using QIAprep Spin Miniprep and Plasmid Midi Kits (QIAGEN 27106 and 12145, respectively). Cycle sequencing was carried out using custom oligonucleotides corresponding to the pSelect vector (primers D89 and D95, Table ...) in the presence of dye terminators (CEQ Dye Terminator Cycle Sequencing Kit, Beckman 608000). Assembled STAR DNA sequences were located in the human genome (database builds August and December 2001) using BLAT (BLAST-Like Alignment Tool (Kent, 2002); http://genome.ucsc.edu/cgi-bin/hgGateway; Table ...). In aggregate, the combined STAR sequences comprise 85.6 kilobase pairs, with an average length of 1.3 kilobase pairs.
Sequence motifs that distinguish STAR elements within human genomic DNA were identified by bio-informatic analysis using a two-step procedure, as follows (see FIG 21 for a schematic diagram). The analysis has two input datasets: the DNA sequences of the STAR elements (STAR1 ... were used; Table ...); and the DNA sequence of the human genome (except for chromosome 1, which was not feasible to include due to its large size; for dyad analysis a random subset of human genomic DNA sequence (~27 Mb) was used).
Pattern Frequency Analysis. The first step in the analysis uses RSA-Tools software (Regulatory Sequence Analysis Tools; http://www.ucmb.ulb.ac.be/bioinformatics/rsa-tools/; references (van Helden et al., 1998, van Helden et al., 2000, van Helden et al., 2000)) to determine the following information: the frequencies of all dyads and hexameric oligonucleotides in the human genome; the frequencies of the oligonucleotides and dyads in the 65 STAR elements; and the significance indices of those oligonucleotides and dyads that are over-represented in the STAR elements compared to the genome. A control analysis was done with sequences that were selected at random from the human genome (i.e. from 2689 x 10^3 kilobase pairs) that match the length of the STAR elements of Table 6.
Discriminant Analysis. The over-represented oligonucleotides and dyads were used to train models for prediction of STAR elements by Linear Discriminant Analysis (Huberty, 1994). A pre-selection of variables was performed by selecting the 50 patterns with the highest individual discriminatory power from the over-represented oligos or dyads of the frequency analyses. These pre-selected variables were then used for model training in a Stepwise Linear Discriminant Analysis to select the most discriminant combination of variables (Huberty, 1994). Variable selection was based on minimizing the classification error rate (percentage of false negative classifications). In addition, the expected error rate was estimated by applying the same discriminant approach to the control set of random sequences (minimizing the percentage of false positive classifications).
The predictive models from the training phase of Discriminant Analysis were tested in two ways. First, the STAR elements and random sequences that were used to generate the model (the training sets) were classified. Second, sequences in a collection of 19 candidate STAR elements (recently cloned by zeocin selection as described above) were classified. These candidate STAR elements are listed in Table 9 (SEQ ID:67-84).
Results
Pattern frequency analysis was performed with RSA-Tools on 65 STAR elements, using the human genome as the reference sequence. One hundred sixty-six (166) hexameric oligonucleotides were found to be over-represented in the set of STAR elements (sig >= 0) compared to the entire genome (Table 4). The most significantly over-represented oligonucleotide, CCCCAC, occurs 107 times among the 65 STAR elements, but is expected to occur only 49 times. It has a significance coefficient of 8.76; in other words, the probability that its over-representation is due to random chance is 1/10^8.76, i.e. less than one in 500 million.
Ninety-five of the oligonucleotides have a significance coefficient greater than 1, and are therefore highly over-represented in the STAR elements. Among the over-represented oligonucleotides, their observed and expected occurrences, respectively, range from 6 and 1 (for oligo 163, CGCGAA, sig = 0.02) to 133 and 95 (for oligo 120, CCCAGG, sig = 0.49). The differences in expected occurrences reflect factors such as the G:C content of the human genome. Therefore the differences among the oligonucleotides in their number of occurrences are less important than their over-representation; for example, oligo 2 (CAGCGG) is 36/9 = 4-fold over-represented, which has a probability of being due to random chance of one in fifty million (sig = 7.75).
Table 4 also presents the number of STAR elements in which each over-represented oligonucleotide is found. For example, the most significant oligonucleotide, oligo 1 (CCCCAC), occurs 107 times, but is found in only 51 STARs, i.e. on average it occurs as two copies per STAR. The least abundant oligonucleotide, number 166 (AATCGG), occurs on average as a single copy per STAR (thirteen occurrences on eleven STARs); single-copy oligonucleotides occur frequently, especially for the lower-abundance oligos. At the other extreme, oligo 4 (CAGCCC) occurs on average 3 times in those STARs in which it is found (37 STARs). The most widespread oligonucleotide is number 120 (CCCAGG), which occurs on 58 STARs (on average twice per STAR), and the least widespread oligonucleotide is number 114 (CGTCGC), which occurs on only 6 STARs (and on average only once per STAR).
Results of dyad frequency analysis are given in Table 5. Seven hundred thirty (730) dyads were found to be over-represented in the set of STAR elements (sig >= 0) compared to the reference sequence. The most significantly over-represented dyad, CCCN{2}CGG, occurs 36 times among the 65 STAR elements, but is expected to occur only 7 times. It has a significance coefficient of 9.31; in other words, the probability that its over-representation is due to chance is 1/10^9.31, i.e. less than one in 2 billion.
Three hundred ninety-seven of the dyads have a significance coefficient greater than 1, and are therefore highly over-represented in the STAR elements. Among the over-represented dyads, their observed and expected occurrences, respectively, range from 9 and 1 (for five dyads (numbers 380, 435, 493, 640, and 665)) to 118 and 63 (for number 30 (AGGN{2}GGG), sig = 4.44).
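By way of illustration only, the significance coefficients quoted above can be converted into odds of chance occurrence with the following Python sketch, which assumes the relationship used in the text (a coefficient of s corresponds to a probability of 1/10^s). The function name is hypothetical.

```python
def chance_odds(sig):
    """Return N such that over-representation by chance is expected about once in N."""
    return 10 ** sig

# Significance coefficients quoted in this example.
for pattern, sig in [("CCCCAC", 8.76), ("CAGCGG", 7.75), ("CCCN{2}CGG", 9.31)]:
    print(f"{pattern}: about one in {chance_odds(sig):.3g}")
```

The three results are consistent with the statements of "less than one in 500 million", "one in fifty million", and "less than one in 2 billion", respectively.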
The oligonucleotides and dyads found to be over-represented in STAR elements by pattern frequency analysis were tested for their discriminative power by Linear Discriminant Analysis. Discriminant models were trained by step-wise selection of the best combination among the 50 most discriminant oligonucleotide (Table 4) or dyad (Table 5) patterns. The models achieved optimal error rates after incorporation of 4 (dyad) or 5 (oligonucleotide) variables. The discriminative variables from oligo analysis are numbers 11, 30, 94, 122, and 160 (Table 4); those from dyad analysis are numbers 73, 194, 419, and 497 (Table 5).
The discriminant models were then used to classify the 65 STAR elements in the training set and their associated random sequences. The model using oligonucleotide variables classifies 46 of the 65 STAR elements as STAR elements (true positives); the dyad model classifies 49 of the STAR elements as true positives. In combination, the models classify 59 of the 65 STAR elements as STAR elements (FIG 22). The false positive rates (random sequences classified as STARs) were 7 for the dyad model, 8 for the oligonucleotide model, and 13 for the combined predictions of the two models. The STAR elements of Table 6 that were not classified as STARs by LDA are STARs 7, 22, 35, 44, 46, and 65. These elements display stabilizing anti-repressor activity in functional assays, so the fact that they are not classified as STARs by LDA suggests that they represent another class (or classes) of STAR elements.
The models were then used to classify the 19 candidate STAR elements in the testing set listed in Table 9. The dyad model classifies 12 of these candidate STARs as STAR elements, and the oligonucleotide model classifies 14 as STARs. The combined number of the candidates that are classified as STAR elements is 15. This is a lower rate of classification than obtained with the training set of 65 STARs; this is expected for two reasons. First, the discriminant models were trained with the 65 STARs of Table 6, and discriminative variables based on this training set may be less well represented in the testing set. Second, the candidate STAR sequences in the testing set have not yet been fully characterized in terms of in vivo function, and may include elements with only weak anti-repression properties.
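By way of illustration only, the classification counts reported above can be expressed as rates with the following Python sketch; the counts are those given in the text, and the function and variable names are hypothetical.

```python
def rate(hits, total):
    """Percentage of items classified as STAR elements."""
    return 100.0 * hits / total

# Counts of sequences classified as STARs, taken from the text.
training = {"oligonucleotide": 46, "dyad": 49, "combined": 59}  # of 65 STARs
testing = {"dyad": 12, "oligonucleotide": 14, "combined": 15}   # of 19 candidates

for model, hits in training.items():
    print(f"training, {model}: {rate(hits, 65):.1f}%")
for model, hits in testing.items():
    print(f"testing, {model}: {rate(hits, 19):.1f}%")
```

The combined rate for the testing set (15 of 19) is lower than that for the training set (59 of 65), in line with the two reasons given above.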
This analysis demonstrates the power of a statistical approach to bioinformatic classification of STAR elements. The STAR sequences contain a number of dyad and hexameric oligonucleotide patterns that are significantly over-represented in comparison with the human genome as a whole. These patterns may represent binding sites for proteins that confer STAR activity; in any case they form a set of sequence motifs that can be used to recognize STAR element sequences.
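To illustrate how hexamer and dyad patterns of the kind listed in Tables 4 and 5 can be matched against a sequence, the sketch below counts pattern occurrences in Python. It is not the RSA-Tools implementation: it scans one strand only, computes no expected frequencies or significance coefficients, and all function names are hypothetical.

```python
import re

# Dyad notation as used in Table 5, e.g. "CCCN{16}GGG":
# two trinucleotides separated by a fixed-length spacer of any bases.
_DYAD = re.compile(r"([ACGT]+)N\{(\d+)\}([ACGT]+)")


def pattern_to_regex(pattern):
    """Compile an oligonucleotide ('CCCCAC') or dyad ('CCCN{16}GGG') pattern."""
    dyad = _DYAD.fullmatch(pattern)
    if dyad:
        left, spacer, right = dyad.group(1), int(dyad.group(2)), dyad.group(3)
        body = f"{left}[ACGTN]{{{spacer}}}{right}"
    else:
        body = re.escape(pattern)
    # A zero-width lookahead lets overlapping occurrences be counted.
    return re.compile(f"(?={body})")


def count_occurrences(sequence, pattern):
    """Count (possibly overlapping) occurrences of a pattern on one strand."""
    regex = pattern_to_regex(pattern)
    return sum(1 for _ in regex.finditer(sequence.upper()))


def summarize(sequences, pattern):
    """Return (total occurrences, number of sequences with at least one match)."""
    counts = [count_occurrences(seq, pattern) for seq in sequences]
    return sum(counts), sum(1 for c in counts if c > 0)
```

Applied to a set of STAR sequences, `summarize` yields the two observed quantities tabulated per pattern (occurrences and number of matching elements); the comparison against a genomic background would have to be added separately.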
Using these patterns to recognize STAR elements by Discriminant Analysis, a high proportion of the elements obtained by the genetic screen of the invention are in fact classified as STARs. This reflects underlying sequence and functional similarities among these elements. An important aspect of the method described here (pattern frequency analysis followed by Discriminant Analysis) is that it can be reiterated; for example, by including the 19 candidate STAR elements of Table 9 with the 66 STAR elements of Table 6 into one training set, an improved discriminant model can be trained. This improved model can then be used to classify other candidate regulatory elements as STARs. Large-scale in vivo screening of genomic sequences using the method of the invention, combined with reiteration of the bio-informatic analysis, will provide a means of discriminating STAR elements that asymptotically approaches 100% recognition and prediction of elements as the genome is screened in its entirety. These stringent and comprehensive predictions of STAR function will ensure that all human STAR elements are recognized, and are available for use in improving transgene expression.
Example 14: Cloning and characterization of STAR elements from Arabidopsis thaliana
Transgene silencing occurs in transgenic plants at both the transcriptional and post-transcriptional levels (Meyer, 2000, Vance & Vaucheret, 2001). In either case, the desired result of transgene expression can be compromised by silencing; the low expression and instability of the transgene results in poor expression of desirable traits (e.g. pest resistance) or low yields of recombinant proteins. It also results in poor predictability: the proportion of transgenic plants that express the transgene at biotechnologically useful levels is low, which necessitates laborious and expensive screening of transformed individuals for those with beneficial expression characteristics. This example describes the isolation of STAR elements from the genome of the dicot plant Arabidopsis thaliana for use in preventing transcriptional transgene silencing in transgenic plants.
Arabidopsis was chosen for this example because it is a well-studied model organism: it has a compact genome, it is amenable to genetic and recombinant DNA manipulations, and its genome has been sequenced (Bevan et al., 2001, Initiative, 2000, Meinke et al., 1998).
Materials and Methods: Genomic DNA was isolated from Arabidopsis thaliana ecotype Columbia as described (Stam et al., 1998) and partially digested with MboI. The digested DNA was size-fractionated to 0.5-2 kilobase pairs by agarose gel electrophoresis and purification from the gel (QIAquick Gel Extraction Kit, QIAGEN 28706), followed by ligation into the pSelect vector (supra).
Transfection into the U-2 OS/Tet-Off/LexA-HP1 cell line and selection for zeocin resistance at low doxycycline concentration was performed as described (supra). Plasmids were isolated from zeocin-resistant colonies and re-transfected into the U-2 OS/Tet-Off/LexA-HP1 cell line.
Sequencing of Arabidopsis genomic DNA fragments that conferred zeocin resistance upon re-transfection was performed as described (supra). The DNA sequences were compared to the sequence of the Arabidopsis genome by BLAST analysis ((Altschul et al., 1990); URL http://www.ncbi.nlm.nih.gov/blast/Blast).
STAR activity was tested further by measuring mRNA levels for the hygromycin- and zeocin-resistance genes in recombinant host cells by reverse transcription PCR (RT-PCR). Cells of the U-2 OS/Tet-Off/lexA-HP1 cell line were transfected with pSelect plasmids containing Arabidopsis STAR elements, the Drosophila scs element, or containing no insert (supra). These were cultivated on hygromycin for 2 weeks at high doxycycline concentration, then the doxycycline concentration was lowered to 0.1 ng/ml to induce the lexA-HP1 repressor protein. After 10 days, total RNA was isolated by the RNeasy mini kit (QIAGEN 74104) as described by the manufacturer. First-strand cDNA synthesis was carried out using the RevertAid First Strand cDNA Synthesis kit (MBI Fermentas 1622) using oligo(dT)18 primer as described by the manufacturer. An aliquot of the cDNA was used as the template in a PCR reaction using primers D58 and D80 (for the zeocin marker), and D70 and D71 (for the hygromycin marker), and Taq DNA polymerase (Promega M2661). The reaction conditions were 15-20 cycles of 94°C for 1 minute, 54°C for 1 minute, and 72°C for 90 seconds. These conditions result in a linear relationship between input RNA and PCR product DNA. The PCR products were resolved by agarose gel electrophoresis, and the zeocin and hygromycin bands were detected by Southern blotting as described (Sambrook et al., 1989), using PCR products produced as above with purified pSelect plasmid as template. The ratio of the zeocin and hygromycin signals corresponds to the normalized expression level of the zeocin gene.
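The normalization described above is a simple ratio of band signals. The sketch below shows that calculation in Python, assuming band intensities have already been quantified in arbitrary units; the sample names and intensity values are hypothetical placeholders, not measured data from this example.

```python
def normalized_expression(zeocin_signal, hygromycin_signal):
    """Zeocin band signal divided by the hygromycin band signal of the same sample."""
    if hygromycin_signal <= 0:
        raise ValueError("hygromycin signal must be positive")
    return zeocin_signal / hygromycin_signal


def relative_to_control(samples, control):
    """Express each sample's normalized zeocin level as a multiple of the control."""
    reference = normalized_expression(*samples[control])
    return {
        name: normalized_expression(zeo, hyg) / reference
        for name, (zeo, hyg) in samples.items()
    }


# Hypothetical (zeocin, hygromycin) band intensities in arbitrary units.
EXAMPLE_SAMPLES = {
    "scs": (400.0, 1000.0),
    "STAR-X": (900.0, 1125.0),
    "empty vector": (100.0, 1000.0),
}

if __name__ == "__main__":
    for name, fold in relative_to_control(EXAMPLE_SAMPLES, "scs").items():
        print(f"{name}: {fold:.2f}-fold relative to scs")
```

Dividing by the hygromycin signal corrects for differences in RNA input between samples, so that samples can be compared on the zeocin signal alone.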
Results
The library of Arabidopsis genomic DNA in the pSelect vector comprised 69,000 primary clones in E. coli, 80% of which carried inserts. The average insert size was approximately 1000 base pairs; the library therefore represents approximately 40% of the Arabidopsis genome.
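The library-size estimate can be reproduced as follows. This is a rough sketch in Python: the clone count, insert frequency, and mean insert size come from the text, whereas the genome size of 135 million base pairs is an assumption chosen for illustration (the text does not state the value used), and the calculation ignores redundancy among clones.

```python
def cloned_dna_bp(primary_clones, percent_with_insert, mean_insert_bp):
    """Total cloned genomic DNA in base pairs (integer arithmetic)."""
    return primary_clones * percent_with_insert // 100 * mean_insert_bp


def genome_fraction(cloned_bp, genome_size_bp):
    """Cloned DNA as a fraction of the genome, ignoring clone redundancy."""
    return cloned_bp / genome_size_bp


# Assumption: approximate Arabidopsis genome size; not stated in the text.
ASSUMED_GENOME_BP = 135_000_000

if __name__ == "__main__":
    cloned = cloned_dna_bp(69_000, 80, 1000)
    print(f"cloned DNA: {cloned:,} bp")
    print(f"genome fraction: {genome_fraction(cloned, ASSUMED_GENOME_BP):.0%}")
```

With these inputs the library holds about 55 million base pairs of insert DNA, in line with the approximately 40% stated above.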
A portion of this library (representing approximately 16% of the Arabidopsis genome) was transfected into the U-2 OS/Tet-Off/LexA-HP1 cell line. Hygromycin selection was imposed to isolate transfectants, which resulted in 27,000 surviving colonies. These were then subjected to zeocin selection at low doxycycline concentration. Putative STAR-containing plasmids from 56 zeocin-resistant colonies were rescued into E. coli and re-transfected into U-2 OS/Tet-Off/LexA-HP1 cells. Forty-four of these plasmids (79% of the plasmids tested) conferred zeocin resistance on the host cells at low doxycycline concentrations, demonstrating that the plasmids carried STAR elements. This indicates that the pSelect screen in human U-2 OS cells is highly efficient at detection of STAR elements from plant genomic DNA.
The DNA sequences of these 44 candidate STAR elements were determined. Thirty-five of them were identified as single loci in the database of Arabidopsis nuclear genomic sequence (Table 10; SEQ ID:85 - SEQ ID:119). Four others were identified as coming from the chloroplast genome, four were chimeras of DNA fragments from two loci, and one was not found in the Arabidopsis genome database.
The strength of the cloned Arabidopsis STAR elements was tested by assessing their ability to prevent transcriptional repression of the zeocin-resistance gene, using an RT-PCR assay. As a control for RNA input among the samples, the transcript levels of the hygromycin-resistance gene for each STAR transfection were assessed too. This analysis has been performed for 12 of the Arabidopsis STAR elements. The results (FIG 23) demonstrate that the Arabidopsis STAR elements are superior to the Drosophila scs element (positive control) and the empty vector ("SV40"; negative control) in their ability to protect the zeocin-resistance gene from transcriptional repression. In particular, STAR-A28 and STAR-A30 enable 2-fold higher levels of zeocin-resistance gene expression than the scs element (normalized to the internal control of hygromycin-resistance gene mRNA) when the lexA-HP1 repressor is expressed.
These results demonstrate that the method of the invention can be successfully applied to recovery of STAR elements from genomes of species other than human. Its successful application to STAR elements from a plant genome is particularly significant because it demonstrates the wide taxonomic range over which the method of the invention is applicable, and because plants are an important target of biotechnological development.
Example 15: STAR-shielded genes that reside on multiple vectors are expressed simultaneously in CHO cells
STAR elements function to block the effect of transcriptional repression influences on transgene expression units. One of the benefits of STAR elements for heterologous protein production is the increased predictability of finding high-expressing primary recombinant host cells. This feature allows for the simultaneous expression of different genes that reside on multiple, distinct vectors. In this example we use two different STAR7-shielded genes, GFP and RED, which are located on two different vectors. When these two vectors are transfected simultaneously to Chinese hamster ovary (CHO) cells, both are expressed, whereas the corresponding but unprotected GFP and RED genes show hardly any such simultaneous expression.
Materials and Methods
The STAR7 element is tested in the ppGIZ-STAR7 and ppRIP-STAR7 vectors (FIG 24). The construction of the pPlug&Play (ppGIZ and ppRIP) vectors is described below. Plasmid pGFP (Clontech 6010-1) is modified by insertion of a linker at the BsiWI site to yield pGFP-link. The linker (made by annealing oligonucleotides 5'GTACGGATATCAGATCTTTAATTAAG3' and 5'GTACCTTAATTAAAGATCTGATATCC3') introduces sites for the PacI, BglII, and EcoRV restriction endonucleases. This creates the multiple cloning site MCSII for insertion of STAR elements. Then primers CGG3') and 5'AGGCGGATCCGAATGTATTTAGAAAAATAAACAAATAGGGG3' are used to amplify a region of 0.37 kb from pGFP, which is inserted into the BglII site of pIRES (Clontech 6028-1) to yield pIRES-stuf. This introduces sites for the AscI and SwaI restriction endonucleases at MCSI, and acts as a "stuffer fragment" to avoid potential interference between STAR elements and adjacent promoters. pIRES-stuf is digested with BglII and FspI to liberate a DNA fragment composed of the stuffer fragment, the CMV promoter, the IRES element (flanked by multiple cloning sites MCS A and MCS B), and the polyadenylation signal. This fragment is ligated with the vector backbone of pGFP-link produced by digestion with BamHI and StuI, to yield pIRES-link.
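The design of the linker described above can be checked in silico. The sketch below, in Python, anneals the two linker oligonucleotides given in the text and looks for the recognition sequences of PacI (TTAATTAA), BglII (AGATCT), and EcoRV (GATATC). The helper names are illustrative, and the assumption that each oligonucleotide leaves a 4-nucleotide 5' GTAC overhang (compatible with BsiWI-cut ends) is an interpretation of the sequences rather than a statement from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Linker oligonucleotides as given in the text (5' to 3').
TOP = "GTACGGATATCAGATCTTTAATTAAG"
BOTTOM = "GTACCTTAATTAAAGATCTGATATCC"

# Recognition sequences of the three enzymes named in the text.
SITES = {"PacI": "TTAATTAA", "BglII": "AGATCT", "EcoRV": "GATATC"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_core(top, bottom, overhang=4):
    """Return the double-stranded core if the oligos anneal with 5' overhangs.

    Each oligo is assumed to contribute `overhang` unpaired 5' nucleotides;
    None is returned when the remaining bases are not complementary.
    """
    core = top[overhang:]
    if reverse_complement(bottom[overhang:]) != core:
        return None
    return core


def site_positions(core):
    """Map each enzyme name to the index of its site in the core (-1 if absent)."""
    return {name: core.find(site) for name, site in SITES.items()}


if __name__ == "__main__":
    core = annealed_core(TOP, BOTTOM)
    print("duplex core:", core)
    print("5' overhangs:", TOP[:4], BOTTOM[:4])
    print("sites:", site_positions(core))
```

Running the check shows a 22 base pair duplex flanked by GTAC overhangs, with all three recognition sites present in the double-stranded region.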
The open reading frame of the zeocin-resistance gene is inserted into the BamHI/NotI sites of MCS B in pIRES-link as follows: the zeocin-resistance ORF is amplified by PCR with primers 3 and 3 from plasmid pEM7/zeo, digested with BamHI and NotI, and ligated with BamHI/NotI-digested pIRES-link to yield pIRES-link-zeo. The GFP reporter ORF is introduced into pIRES-link-zeo by amplification of phr-GFP-1 with primers 3 andI AGG and insertion of the EcoRI-digested GFP cassette into the EcoRI site in MCS A of the pIRES-link-zeo plasmid. This creates ppGIZ (for ppGFP-IRES-zeo). 5' STAR7 is cloned into the SalI site and 3' STAR7 is cloned into the PacI site.
The puromycin-resistance ORF is amplified by PCR with primers 3 and 3 from plasmid pBabe-Puro (Morgenstern & Land, 1990), digested with BamHI and NotI, and ligated with BamHI/NotI-digested pIRES-link to yield pIRES-link-puro. The RED gene is amplified by PCR with primers 3 and 3 from plasmid pDsRed2 (Clontech 6943-1), digested with XbaI and MluI and ligated to NheI/MluI-digested pIRES-link-puro to yield ppRIP (for ppRED-IRES-puro). 5' STAR7 is cloned into the SalI site and 3' STAR7 is cloned into the PacI site.
Transfection and culture of CHO cells
The Chinese Hamster Ovary cell line CHO-K1 (ATCC CCL-61) is cultured in HAMS-F12 medium + 10% Fetal Calf Serum containing 2 mM glutamine, 100 U/ml penicillin, and 100 micrograms/ml streptomycin at 37°C/5% CO2. Cells are transfected with the plasmids using Lipofectamine 2000 (Invitrogen) as described by the manufacturer. Briefly, cells are seeded to culture vessels and grown overnight to 70-90% confluence. Lipofectamine reagent is combined with plasmid DNA at a ratio of 7.5 microliters per 3 microgram (e.g. for a 10 cm Petri dish, 20 micrograms DNA and 120 microliters Lipofectamine) and added to the cells after a 30 minute incubation at 25°C. After a 6 hour incubation the transfection mixture is replaced with fresh medium, and the transfected cells are incubated further. After overnight cultivation, cells are trypsinized and seeded into fresh Petri dishes with fresh medium with zeocin added to a concentration of 100 micrograms/ml, and the cells are cultured further. When individual colonies become visible (approximately ten days after transfection) medium is removed and replaced with fresh medium (puromycin). Individual colonies are isolated and transferred to 24-well plates in medium with zeocin. Expression of the GFP and RED reporter genes is assessed approximately 3 weeks after transfection.
One tested construct consists of a bicistronic gene with the GFP gene, an IRES and the zeocin resistance gene under control of the CMV promoter, either with or without STAR7 elements flanking the entire construct (FIG 24). The other construct consists of a bicistronic gene with the RED gene, an IRES and the puromycin resistance gene under control of the CMV promoter, either with or without STAR7 elements flanking the entire construct (FIG 24).
The constructs are transfected to CHO-K1 cells. Stable colonies that are resistant to both zeocin and puromycin are expanded before the GFP and RED signals are determined on an XL-MCL Beckman Coulter flow cytometer. The percentage of cells in one colony that are double positive for both GFP and RED signals is taken as a measure of simultaneous expression of both proteins, and this is plotted in FIG 24.
Results
FIG 24 shows that simultaneous expression in independent zeocin and puromycin resistant CHO colonies of GFP and RED reporter genes that are flanked by a STAR element results in a higher number of cells that express both GFP and RED proteins, as compared to the control vectors without STAR7 element. The STAR7 element therefore conveys a higher degree of predictability of transgene expression in CHO cells. In the STAR-less colonies at most 9 out of 20 colonies contain double GFP/RED positive cells. The percentage of double positive cells ranges between 10 and 40%. The remaining 11 out of 20 colonies have less than 10% GFP/RED positive cells. In contrast, in 19 out of 20 colonies that contain the STAR-shielded GFP and RED genes, the percentage of GFP/RED double positive cells ranges between 25 and 75%. In 15 out of these 19 double positive colonies the percentage of GFP/RED double positive cells is higher than 40%. This result shows that it is more likely that simultaneous expression of two genes is achieved when these genes are flanked with STAR elements.
Example 16: Expression of a functional antibody from two separate plasmids is more easily obtained when STAR elements flank the genes encoding the heavy and light chains.
Due to the ability of STAR elements to convey higher predictability to protein expression, two genes can be expressed simultaneously from distinct vectors. This is shown in Example 15 for two reporter genes, GFP and RED. Now the simultaneous expression of a light and a heavy antibody chain is tested. In Example 16, STAR7-shielded light and heavy antibody cDNAs that reside on distinct vectors are simultaneously transfected to Chinese hamster ovary cells. This results in the production of functional antibody, indicating that both heavy and light chains are expressed simultaneously. In contrast, the simultaneous transfection of unprotected light and heavy antibody cDNAs shows hardly any expression of functional antibody.
Materials and Methods
The tested constructs are the same as described in Example 15, except that the GFP gene is replaced by the gene encoding the light chain of the RING1 antibody (Hamer et al., 2002) and the RED gene is replaced by the gene encoding the heavy chain of the RING1 antibody. The light chain is amplified from the RING1 hybridoma (Hamer et al., 2002) by RT-PCR using the primers 5'CAAGAATTCAATGGATTTTCAAGTGCAG3' and CCGCTTTGTCTCTAACACTCATTCC3'. The PCR product is cloned into pcDNA3 after restriction digestion with EcoRI and NotI and sequenced to detect potential frame shifts in the sequence. The cDNA is excised with EcoRI and NotI, blunted and cloned in ppGIZ plasmid. The heavy chain is amplified from the RING1 hybridoma (Hamer et al., 2002) by RT-PCR using the primers 5'ACAGAATTCTTACCATGGATTTTGGGCTG3' and 5'ACAGCGGCCGCTCATTTACCAGGAGAGTGGG3'. The PCR product is cloned into pcDNA3 after restriction digestion with EcoRI and NotI and sequenced to detect potential frame shifts in the sequence. The cDNA is excised with EcoRI and NotI, blunted and cloned in ppRIP plasmid.
Results
CHO colonies are simultaneously transfected with the RING1 Light Chain (LC) and RING1 Heavy Chain (HC) cDNAs that reside on two distinct vectors. The Light Chain is coupled to the zeocin resistance gene through an IRES, the Heavy Chain is coupled to the puromycin resistance gene through an IRES. FIG 25 shows that simultaneous transfection to CHO cells of the heavy and light chain encoding cDNAs results in the establishment of independent zeocin and puromycin resistant colonies. When the constructs are flanked by the STAR7 element this results in a higher production of functional RING1 antibody, as compared to the control vectors without STAR7 element.
The STAR7 element therefore conveys a higher degree of predictability of antibody expression in CHO cells.
In the STAR-less colonies only 1 out of 12 colonies expresses detectable antibody. In contrast, 7 out of 12 colonies that contain the STAR-shielded Light and Heavy Chain genes produce functional RING1 antibody that detects the RING1 antigen in an ELISA assay. Significantly, all these 7 colonies produce higher levels of RING1 antibody than the highest control colony (arbitrarily set at 100%). This result shows that it is more likely that simultaneous expression of two genes encoding two antibody chains is achieved when these genes are flanked with STAR elements.
Table 1. Oligonucleotides used for polymerase chain reactions (PCR primers) or DNA mutagenesis
Number Sequence
AACAGCTTGATATCAGATTGCTAGCTTGGTGACTTACTTCCC
C66 AAACTCGA
GCGGCGCGTCGTCGACTTACACTAAGTAA
ciC67 AAACGCGCATGGAAGACGC L CATAAGG C68 TATGGATCCTAATTAACGCGATCTTTCC 00C81- AAAcCATGGCCGAGTACAAGCCCACGTGCGCC C82 AAATCTAGATCAGGCAC0GrCTGCGGT
AGC
CATTTCCCCGAAAAGTGCCACC
F D30F TCACTGCTAGCGAGTGTACTC D42 GAGCCGCGGTTTAGTTCCTCACCTTGTCG D51. C-TGGAAGc'ITTGCTGAAGAAAC D58
CCAAGTTGACCAGTGCC
TACAAGCCAACCACGGCT
D71 CGGATGCITGACATTGG D180 GTTCGTGGACACGACCTCCG D89
GGGCAAGATGTCGTAGTCAGG
AGCCGGATCCAATGTACG
E D13
CTAAACAGCTTTAGC
D914 AAGTTGG
CCCTTG
E 15 TTTGAAGATCTACCAAATGG E16 GTTCGGGATCCACCTGGCCG E 17 TAGGCAAGATCTEGGC
CCTC
AGA
CTTCATGTCTGCGAGGC
E92
AGGOOGCTAGCACGCGTTCTACTCICCTACTCTG
E93
GATCAAGCTACGCGTCTAAGGCAMATATAG
E94
AGCGCTAGCACCGTTCAGAGTTAGTGTCCAGG
GTCAGcfACGCGTCAGTAAGGTTCGTATGG E96 AGGCGTA 3CACGCGTCTATCTTCArACTCTG E98
CAAGGCCGCAGGTACACATGTTC
CIE99
GATCACTAGTATGGCCAAG-TT~CCAGTGC
E100
AGGCGCGGCCGCAATTCTCAGTCCTCTCCTC
Fl 1 GATCGCTAGCAATCGCGACTTCGCCCACCATGC 00 F14 GATCGAATTCTOGCGACTTCGCCCACCATGC VAGGATA~-lMT
C
F16
GATCAATCTCGCGAATGGTGCAAGTCCTGAAG
F17 AGGCGATTCACCGGTGTTTACTACACCCACCGTGCAGGO'CCgCAGG F 18 GATCGGATCCTTC AAATGGCCAGTxCCAGTGC F19
GACGTCTGATATGAAGTGTG
AGGCGCG
CCGCCAGACGTCAAGAGGCG
F21 GATCGGAT'CTC,"zAAATGACC 3GTACAAGCCCACG F22
AGGCGCGGCCGTCAGGCACCGTGCGGGTC
F23
GACGTTGGGCTT.ACTTGGGTCGGTAG
F24
AGGCGGATCGAATGTATTTAGATACAATAGGGG
GTACGGATATAGrTTATTAAG F 26 GTACCTTAATTAAAGATCTGATATCC F32
GATCGAGGTACCGGTGTGT
F33
GATCACACACOGGTACCTC
F34
CGGAGGTACCGGTGTGT
CGAOACACOGGTACCTC
F4 TGAGAGGTACCGGTGTGT
TCAACACACCGGTACOTO
Table 2. STAR elements and two-step selection increase the predictability of transgene expression

                                 without STAR          with STAR             fold improvement
(carry out first antibiotic selection)
Number colonies¹                 ~100                  ~1000                 10-fold
High producers, percent          5%                    15%                   3-fold
High producers, number           5                     150
(characterize 20 colonies)       (20% of population)   (2% of population)
High producers                   1                     3                     3-fold²
Low producers                    19                    17
(carry out second antibiotic selection, killing low producers)
Survivors to characterize        5                     150                   30-fold³

¹ Colonies per microgram plasmid DNA.
² Manifesting the three-fold improvement due to the presence of STARs in the percent of high producers in the original population of colonies resistant to the first antibiotic.
³ Manifesting the arithmetic product of the fold improvement in the number of colonies and the increased percentage of high producers due to the presence of STARs.
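The fold improvements in Table 2 follow from simple arithmetic on the colony numbers. The sketch below works through it in Python with exact fractions; the colony counts are the approximate values given in the table, and the variable and function names are illustrative only.

```python
from fractions import Fraction

# Approximate values from Table 2 (colonies per microgram plasmid DNA).
COLONIES = {"without STAR": 100, "with STAR": 1000}
HIGH_PRODUCERS = {"without STAR": 5, "with STAR": 150}


def high_producer_fraction(condition):
    """Fraction of first-selection colonies that are high producers."""
    return Fraction(HIGH_PRODUCERS[condition], COLONIES[condition])


def fold_improvements():
    """Return fold improvement in colony number, high-producer fraction, and survivors."""
    fold_colonies = Fraction(COLONIES["with STAR"], COLONIES["without STAR"])
    fold_fraction = (
        high_producer_fraction("with STAR") / high_producer_fraction("without STAR")
    )
    # Survivors of the second selection are the high producers, so the overall
    # gain is the product of the two separate improvements.
    fold_survivors = fold_colonies * fold_fraction
    return fold_colonies, fold_fraction, fold_survivors


if __name__ == "__main__":
    colonies, fraction, survivors = fold_improvements()
    print(f"colonies: {colonies}-fold, fraction: {fraction}-fold, survivors: {survivors}-fold")
```

The product of the 10-fold gain in colony number and the 3-fold gain in the proportion of high producers gives the 30-fold gain in survivors, which equals the direct ratio of 150 to 5 high-producing colonies.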
Table 3. Sequences of various STAR elements
STAR3 forward
ACGTNCTAAGNAAACCATTATTATCATGACTACCTATAAATAGGC
GTATCACGAGGCCCTTTCGTCTTCACTCGAGCGGCCAG
CTTGGATCTCGA
N~ GTACTGAAATAGGAGTAAATCTGAAGAGCAATAGATGAGCCAGAAC 0 CATGAAAAGAACAGGGACTAC CAGTTGATTC CACAAGGACATTCC
CAAGG
00 00 TGAGAAGGCCATATACCTCCACTACCTGAC
CAATTCTCTGTATGCAGATT
TAGCAAGGTTATAAGGTAGCAAAAGATTAGACCCGTAGAGAACT
TC CAATC CAGTAAAAATCATAGCAAATTTATTGATGATAACAM'IGTCTCC
AAGGAACCAGGCAGAGTCGTGCTAGCAGAGGAGCACGTGAGCTGA
ACAGCCAATCTGCTIG1TTTCATGACACAGGAGCATAGTACACACCA
CCAACTGACCTATTAAGGCTGTGGTAAACCGATTCATAGAGAGAGGTTCT
AATACATTGGTCCCTCATAGGCAACCGCAGTTCACTCCGAACGTAGTC
CCTGGAAATTTGATGTCCAGNATAGAAAG
CANAGCAGNCNNNNNNTAT
AATNNNGNTGANCCANATGNTNNCTGNC
STAR3 reverse
GAGCTAGCGGCGCGCCAAGCTTGGATCCCGCCCCGCCCCCTCCGCCCTCG
AGCCCCGCCCCTITGCCCTAGAGGCCCTGCCGAGGGGCGGGGCCTGTCCC
TCCTCCCCTTTCCC
CCGCCCCCTACCGTCACGCTCAGGGGCAGGCTGACC
CCGAGCGGCCCCGCGGTGACCCTCGCGCAGAGGCCTGTGGGAGGGGCGT
CGCAAGCCCCTGAATCCCCCCCCGTCTGTTCCCCCCTCCCGCCCAGTCTC
CTCCCCCTGGGAACGCGCGGGGTGGGTGACAGACCTGGCTGCGCGCCAC
CGCCACCGCGCCTGCCGGGGGCGCTGCCGCTGCCTGAGACTGCGGCT
GCCGCCTGGAGGAGGTGCCGTCGCCTCCGCCACCGCTGCCGCCGCCGCC
AGGGGTAGGAGCTAAGCCGCCGCCATTTTGTGTCCCCCTGTTGTTGTCGT
TGACATGAATC
CGACATGACACTGATTACAGCCCAATGGAGTCTCATA
ACCCGAGTCGCGGTCCCGC
CCCGCCGCTGCTCCATTGG.AGGAGACCAAAG
ACACTTAAGGCCAC
CCGTTGGCCTACGGGTCTGTCTGTCACCCACTCACT
AACCACTCTGCAGCCCATTGGGGCAGGTTCCTGCCGGTCATNTCGCTTICC
AATAAACACACCCCTTCGACCCCATNATTCCCCCCCTTCGGGAACCACCC
CCGGGGGAGGGGTCCACTGGNCAATACCAATTNAANAGAAC CGCTNGGG
TCCGCCTNTTTNCGGGCNCCCTATTGGGTT
STAR4 forward
GGGGAGGATrC1TTPTGGCTGCTGAG'ITGAGATTAGGTTGAGGGTAGTGAA
GGTAAAGGCAGTGAGACCACGTAGGGGTCATTGCAGTAATCCAGGCTGGA
GATGATGGTGGTTCAGTTGGAATAGCAGTGCATGTGCTGTAACAACCTCA
GCTGGGAAGCAGTATATGTGGCGTTATGACCTCAGCTGGAACAGCAATGC
ATGTGGTGGTGTAATGACCCCAGCTGGGTAGGGTGCATGTGATGGAACAA
CCTCAGCTGGGTAGCAGTGTACTTGATAAAATGTTGGCATACTCTACATTT
GTTATGAGGGTAGTGCCATTAAATTTCTCCACAAATTGGTTGTCACGTATG
AGTGAAAAGAGGAAGTGATGGAAGACTTCAGTGCTT= GGCCTGAATAAA
TAGAAGACGTCATTTVTCAGTAATGGAGACAGGGAAGACTAANGNAGGGT
GGATTCAGTAGAGCAGGTGTTCAGTTTTGAATATGATGAACTCTGAGAGA
GGAAAAAC'ITTTTCTACCTCTTAGTTTTTGNGNCTGGACTTAANATTAAAG
GACATANGACNGAGANCAGACCAAATNTG CGANGTI7TTTATATTTTACTT GCNGAGGGAATTTNCAAGAAAAAGAAGAC CCAANANCCATTGGTCAAAA CTATNTGC CTTTTAANAAAAAGM'NAATTACAATGGANANANAAGTGTTGN
CTNGGCAAAAATTGGG
STAR4 reverse
GGATTNGAGCTAGCGGCGCGCCAAGCTTGGATCTTAGAAGGACAGAGTG
GGGCATGGAAATGCACCACCAGGGCAGTG CAGCTTGGTCACTGCCAGCTC
CNCTCATGGGCAGAGGGCTGGCCTCTTGCAGCCGACCAGGCACTGAGCG
CCATCCCAGGGCCCTCGCCAGCCCTCAGCAGGGCCAGGACACACAAGCCT
TTGACTTCCTCCTGTCACTGCTGCTGCCATTCCTGTTTTGTGGTCATCACT
CCTTCCCTGTCCTCAGACTGCCCAGCACTCAAGGATGTCCTGTGOTGGCA
TCAGACCATATGCCCCTGAANAGGAGTGAGTTGGTGTITTTLTGCCGCGCC
CANAGAGCTGCTGTCCCCTGAAAGATGCAAGTGGGAATGATGATGNTCAC
CATCNTCTGACAC CAAGCC CTTTGGATAGAGGCCCCAACAGTGAGGATGG GGCTG CACTGCATTGCCAAGGCAACTCTGTNNTGACTGCTACANGACANT CC CAGGAC CTGNGAAGNNCTATANATNTGATGCNAGGCACCT
STAR6 forward
OCACCACAGACATO CCCTCTGGCCTCCTGAGTGGTTTCTTCAGCACAGCTT
CCAGAGCCAAATTAAACGTTCACTCTATGTCTATAGACAAAAAGGGT'JTG ACTAAACTCTGTGTTTTAGAGAGGGAGTTAAATG
CTGTTAACTTTTTAGGG
GTGGGCGAGAGGAATGACAAATAACAACTTGTCTGAATGTTNTTACNFI'TCT
CC CCACTGC CTCAAGAAGGTTCACAACGAGGTCATCCATGATAAGGAGTA
AGACCTCCCAGCCGGACTGTCCCTCGGCCCCCAGAGGACACTCCACAGAG
ATATGCTAACTGGACTTGGAGACTGGCTCACACTCCAGAGAAAAGCATGG
AGCACGAGCGCACAGAGCANGGGCCAAGGTCCCAGGGACNGAATGTCTA
GGAGGGAGATTGGGGTGAGGGTANTCTGATGCAATTACTGNGCAG
CTCA
ACATTCAAGGGAGGGGAAGAAAGAAACNGTCCCTGTAAGTAAGTTGTNCA
NCAGAGATGGTAAGCTCCAAATTTNAACTTTGGCTGCTGGAAAGTTTNNG
GGCCNANANAANAAACANAAANATTTGAGGThTANAC
CCACTAACCCNT
ATNANTANTTATTATACCCCTAATTANACCTTGGATANC
CTAAAATATC
NTNTNAAACGGAAC CCTCNTTC CCNTH'NNAAATNNNAAAGGCCATTNN GNNCNAGTAAAAATCTNNNTTAAGNNNTGGGCC
CNAACAAACNTNTTCC
NAGACACNTTTTTTNTCCNGGNATTTNTAATTTAT~ITCTAANCC
STAR6 reverse
ATCGTGTCCTTTCCAGGGACATGGATGAAGCTGGAAGCCATCATCCTCAG
CAAACTAACACAGGAACAGAAAACCAAATACCACATGTTCTCACTCATAA
GTGGGAGCTGAACAGTGAGAACACATGGACACAGGGAGGGGAACATCAC
ACACCAAGGCCTGTCTGGTGTGGGGAGGGGAGGGAGAGCAT
CAGGACAA
ATAGCTAATGCATGTGGGGCTTAAAC
CTAGATGACGGGTTGATAGGTGCA
GCAATCCACTATGGACACATATACrATGTAACACCCNACCTTNIGAC ATGTATCC
CAGAACTTAAAGGAAAATAAAAATTAAAATCCCTGG
AATAAAAAAGAGTGTGGACTTTGGTGAGATN
STAR8 forward
GGATCACCTCGAAGAGAGTCTAACGTCCGTAGGAACGCTCTCGGGTTCAC
AAGGATTGACCGAACCCCAGGATACGTCGCTCTCCATCTGAGGCTTGNTC
CAAATGGCCCTCCACTATTCCAGGCACGTGGGTGTCTCCCCTACCTCC
CTGTCTCCTGAGCCCATGCTGCCTATCACCCATCGGTGCAGGTCCTCT
GANGTGGGATTTCTCCCTCTCCAAAA
GCACTCAGCCCAGGAATCNTCCTCTT
NA.AAGTTNGCC CAGGTG117CNTAACAGGTTFAGGGAGAGANCC
CCCAGG
TTTNAGTTNCAAGGCATAGGACGCTGGCTTGAACACACACACACNCTC
STAR8 reverse
GGATCC COACTCTGCACCGCAAACTCTACGGCGCCCTGCAGGACGGCGGC
CTCCTGCCGCTTGGACGCCAGNCAGGAGCTCCCCGGCAGCAGCAGAGCA
GAAAGAAGGATGGCCCCGCCCCACTTCGCCTCCCGGCGGTCTCCCTCCCG
CCGGCTCACGGACATAGATGGCTGCCTAGCTCCGGAGCCTAGCTCTTGT
TCCGGGCATCCTAAGGAAGACACGGTTTCCTCCCGGGGCCTCACCACA
TCTGGGACTTTGACGACTCGGACTCTCTCCATTGATGGTGCGCGTTCT
CTGGGAAAG
STAR18 forward
TGGATC CTGCCG CTCGCGTCTTAGTGTTCTCCCTCAGACTTCCTITG
TTTTGTTGTCTTGTGCAGTATTTTACAGCCCCTCTTGTGTTTC'TTAT
CTCGTACACACACGCAGTTTAAGGGTGATGTGTGTATATTAAGGAC
CCTTGGCCCATACTTTCCTATTCTTAGGGACTGGGATGGGTTTGACTG
AAATATGTTTTGGTGGGGATGGGACGGTGGACTTCCATTCTCCCTAACT
GGAGTTTTGGTCGGTAATCAAAACTAAAAGAACCTCTGGAGACTGGA
ACCTGATTGGAGCACTGAGGAACAAGGGAATGAAAGGCAGACTCTCTGA
ACGTTTGATGAAATGGACTCTTGTGAAAATACAGTGATATTCACTGT
GCACTGTACGAAGTCTCTGAAATGTAATTAAGTTTTATTGAGCCCCCG
AGCTTTGGCTTGCGCGTATTTTTCCGGTCGCGGACATC
CCACCGCGCAGA
GCCTCGCCTCCCCGCTGNCCTCAGCTCCGATGACTTCCCCGCCCCCGCCC
C~1 TGCTCGGTGACAGACGTTCTACTGCTTCCAATCGGAGGCAC
CCTTCGCGG
STAR18 reverse
TTTrTGTTGTCTTGTGCAGTATTrACAGCCC
CTCTTGTGTTTTTCTTTATTT
CT CGTACACACACG CAG~rTTAAGGGTGATGTGTGTATATTAAGGAC
CCTTGGCCCATACTTTCCTAAIITCTTTAGGGACTGGGATTGGGTTTGACTG
AAATATGTTTTGGTGGGGATGGGACGGTGGAC'TCCATTCTCCCTACT
GGAGTTTTGGTCGGTAATCAAAACTAAAGAAACCTCTGGGAGACTGGIAA
AC CTGATTGGAGCACTGAGGAACAAGGGATGAAGGCAGACTCTCTGA
ACGTTTGATGAAATGGACTCTTGTGAAAATTACAGTGATATTCACTGTT
GCACTGTACGAAGTCTCTGAAATGTTTAGPITATTGAG CCCC CG
AGCTTTGGC
Table 4. Oligonucleotide patterns (6 base pairs) over-represented in STAR elements.
The patterns are ranked according to significance coefficient. These were determined using RSA-Tools with the sequence of the human genome as reference. Patterns that comprise the most discriminant variables in Linear Discriminant Analysis are indicated with an asterisk.
Number  Oligonucleotide sequence  Observed occurrences  Expected occurrences  Significance coefficient  Number of matching STARs
1  CCCCAC  107  49  8.76  51
2  CAGCGG  36  9  7.75  23
3  GGCCCC  74  31  7.21  34
4  CAGCCC  103  50  7.18  37
5  GCCCCC  70  29  6.97  34
6  CGGGGC  40  12  6.95  18
7  CCCCGC  43  13  6.79  22
8  CGGCAG  35  9  6.64  18
9  AGCCCC  83  38  6.54
10  CCAGGG  107  54  6.52  43
11  GGACCC*  58  23  6.04
12  GCGGAC  20  3  5.94  14
13  CCAGCG  34  10  5.9  24
14  GCAGCC  92  45  5.84  43
15  CCGGCA  28  7  5.61  16
16  AGCGGC  27  7  5.45  17
17  CAGGGG  86  43  5.09  43
18  CCGCCC  43  15  5.02  18
19  CCCCCG  35  11  4.91
20  GCCGCC  34  10  4.88  18
21  GCCGGC  22  5  4.7  16
22  CGGACC  19  4  4.68  14
23  CGCCCC  35  11  4.64  19
24  CGCCAG  28  8  4.31  19
25  CGCAGC  29  8  4.29
26  CAGCCG  32  10  4  24
27  CCCACG  33  11  3.97  26
28  GCTGCC  78  40  3.9  43
29  CCCTCC  106  60  3.87  48
30  CCCTGC*  92  50  3.83  42
0- 31 CACCCC 77 40 32 GCGCCA 30 10 33 AGGGGC 70 35 34 GAGGGC 66 32 GCGAAC 14 2 36 CCGGCG 17 4 37 AGCCGG 34 12 38 GGAGCC 67 34 39 CCCCAG 103 60 40 41 42 43
CCGCTC
CCCCTC
CACCGC
CTGCCC
24 81 33 96 44 12 55 35 3.75 3.58 3.55 3.5 3.37 3.33 3.29 3.27 3.23 3.19 3.19 3.14 3.01 2.99 2.88 2.77 2.73 2.56 2.41 2.34 2.31 2.22 2.2 2.18 2.15 23 34 13 12 51 19 43 22 42 39 22 19 19 9 17 17 38 33 18 42 AA GGC CA 68 .1 T 46
CGCTGC
CAGCGC
28 25 28 25 9 8 10
I
CGGCCC 281 48 CCGCCG 19 49 CCCCGG 30 1: AGCCGC 23 51 GCACCC 55 2' 52 AGGACC 54 2' 5 1 7 7 7 53 54 56 57 58 59
AGGGCG
CAGGGC
24 81 5 147 47
CCCGCC
GCCAGC
AGCGCC
AGGCCC
CCCACC
45 21 66 21 101 36 6 34 62 CGCTCA 21 6 61 AACGCG 9 1 2.09 2.09 2.08 2.05 2.03 1.96 1.92 1.87 1.78 39 18 32 54 17 9 14 36 14 62 63
GCGGCA
AGGTCC
21 49 24 24 tI rlctA~~ I iI
65  CAGAGG  107  68  1.77  47
66  CCCGAG  33  14  1.77  22
67  CCGAGG  36  16  1.76
68  CGCGGA  11  2  1.75  8
69  CCACCC  87  53  1.71
70  CCTCGC  23  8  1.71
71  CAAGCC  59  32  1.69
72  TCCGCA  18  5  1.68  17
73  CGCCGC  18  5  1.67  9
74  GGGAAC  55  29  1.63  39
75  CCAGAG  93  58  1.57  49
76  CGTTCC  19  6  1.53  16
77  CGAGGA  23  8  1.5  19
78  GGGACC  48  24  1.48  31
79  CCGCGA  10  2  1.48  8
80  CCTGCG  24  9  1.45  17
81  CTGCGC  23  8  1.32  14
82  GACCCC  47  24  1.31  33
83  GCTCCA  66  38  1.25  39
84  CGCCAC  33  15  1.19  21
85  GCGGGA  23  9  1.17  18
86  CTGCGA  18  6  1.15
87  CTGCTC  80  49  1.14
88  CAGACG  23  9  1.13  19
89  CGAGAG  21  8  1.09  17
90  CGGTGC  18  6  1.06  16
91  CTCCCC  84  53  1.05  47
92  GCGGCC  22  8  1.04  14
93  CGGCGC  14  4  1.04  13
94  AAGCCC*  60  34  1.03  42
95  CCGCAG  24  9  1.03  17
96  GCCCAC  59  34  0.95
97  CACCCA  92  60  0.93  49
98  GCGCCC  27  11  0.93  18
99  ACCGGC  15  4  0.92  13
100  CTCGCA  16  5  0.89  14
101  ACGCTC  16  5  0.88  12
102  CTGGAC  58  33  0.88  32
103  GCCCCA  67  40  0.87  38
104  ACCGTC  15  4  0.86  11
105  CCCTCG  21  8  0.8  18
106  AGCCCG  22  8  0.79  14
107  ACCCGA  16  5  0.78  13
108  AGCAGC  79  50  0.75  41
109  ACCGCG  14  4  0.69  7
110  CGAGGC  29  13  0.69  24
111  AGCTGC  70  43  0.64  36
112  GGGGAC  49  27  0.64  34
113  CCGCAA  16  5  0.64  12
114  CGTCGC  8  1  0.62  6
115  CGTGAC  17  6  0.57
116  CGCCCA  33  16  0.56  22
117  CTCTGC  97  65  0.54  47
118  AGCGGG  21  8  0.52  17
119  ACCGCT  15  5  0.5  11
120  CCCAGG  133  95  0.49  58
121  CCCTCA  71  45  0.49  39
122  CCCCCA*  77  49  0.49  42
123  GGCGAA  16  5  0.48  14
124  CGGCTC  29  13  0.47  19
125  CTCGCC  20  8  0.46  17
126  CGGAGA  20  8  0.45  14
127  TCCCCA  95  64  0.43  52
128  GACACC  44  24  0.42  33
129  CTCCGA  17  6  0.42  13
130  CTCGTC  17  6  0.42  14
131  CGACCA  13  4  0.39  11
132  ATGACG  17  6  0.37  12
133  CCATCG  17  6  0.37  13
134  AGGGGA  78  51  0.36  44
135  GCTGCA  77  50  0.35  43
136  ACCCCA  76  49  0.33
137  CGGAGC  21  9  0.33  16
138  CCTCCG  28  13  0.32  19
139  CGGGAC  16  6  0.3
140  CCTGGA  88  59  0.3
141  AGGCGA  18  7  0.29  17
142  ACCCCT  54  32  0.28  36
143  GCTCCC  56  34  0.27  36
144  CGTCAC  16  6  0.27
145  AGCGCA  16  6  0.26  11
146  GAAGCC  62  38  0.25  39
147  GAGGCC  79  52  0.22  42
148  ACCCTC  54  32  0.22  33
149  CCCGGC  37  20  0.21  21
150  CGAGAA  20  8  0.2  17
151  CCACCG  29  14  0.18
152  ACTTCG  16  6  0.17  14
153  GATGAC  48  28  0.17
154  ACGAGG  23  10  0.16  18
155  CCGGAG  20  8  0.15  18
156  ACCCAC  60  37  0.12  41
157  CTGGGC  105  74  0.11
158  CCACGG  23  10  0.09  19
159  CGGTCC  13  4  0.09  12
160  AGCACC*  54  33  0.09
161  ACACCC  53  32  0.08  38
162  AGGGCC  54  33  0.08
163  CGCGAA  6  1  0.02  6
164  GAGCCC  58  36  0.02  36
165  CTGAGC  71  46  0.02
166  AATCGG  13  4  0.02  11

Table 5. Dyad patterns over-represented in STAR elements.
The patterns are ranked according to significance coefficient. These were determined using RSA-Tools with the random sequence from the human genome as reference. Patterns that comprise the most discriminant variables in Linear Discriminant Analysis are indicated with an asterisk.
O Number Dyad sequence Observ Expected Significan c0 ed occurrenc ce occurre es coefficient nces C. o r7 q ?t1 31 CACN{5}GCG 22 4 4.42 32 CGCN(17}CCA 27 6- 4.39 33 CCCN{GGC 69 30 4.38 34 CCTN{5)GCG 28 7 4.37 GCGN{0}GAC 19 3 4.32 36 GCCN0}GGC 40 7 4.28 37 GCGN{2)CCC 26 6 4.27 38 CCGN(11)CCC 32 9 4.17 39 CCCN8)TCG 23 5 4.12 CCGN{17}GCC 30 8 4.12 41 GGGN{5)GGA 101 52 4.11 42 GGCN(6)GGA 71 32 4.1 43 CCAN{4)CCC 96 48 4.1 44 CCTN{141CCG 32 9 4.09 GACN{12}GGC 45 16 4.07 46 CGCN{13)CCC 30 8 4.04 47 CAGN{16}CCC 92 46 4.02 48 AGCN{10}GGG 75 35 3.94 49 CGGN{13GGC 30 8 3.93 CGGN(1)GCC 30 8 3.92 51 AGCNfOGGC 26 6 3.9 52 CCCN{16GGC 64 28 3.89 53 GCTN{19)CCC 67 29 3.87 54 CCCN{16)GGG 88 31 3.81 CCCN{9)CGG 30 8 3.77 56 CCCN{10CGG 30 8 3.76 57 CCAN{0}GCG 32 9 3.75 58 GCCN{17)CGC 26 6 3.74 59 CCTN{61CGC 27 7 3.73 GGAN{1)CCC 63 27 3.71 61 CGCN(18)CAC 24 5 3.7 62 CGCN{20)CCG 21 4 3.69 63 CCGN{0}GCA 26 6 3.69 64 CGCN{201CCC 28 7 3.69 AGCN{15)CCC 67 30 3.65 66 CCTN{7}GGC 69 31 3.63 67 GCCN{5}CGC 32 9 3.61 68 GCCN{14}CGC 28 7 3.59 69 CAGN{11}CCC 89 45 3.58 GGGN{16)GAC 53 21 3.57 71 CCCN{15}GCG 25 6 3.57 72 CCCN{0}CGC 37 12 3.54 73 74 76 77 .78 79 81 82 .83 84 86 87 .88 89 91 92 93 94 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 112 113 1114 CCCN 16AGC AGONf9jGGG CGCN 12 CTC CACN181CGC -COAN{7
CCG
CGG'N{1}GCA CGCN{14CC AGCNf0 CCC CGCN{13)GTC GCGN{3)GCA OGGN{0
GGC
GOON{14}CCC ACON{7)CGC AGGN 7 CGG CCN{161CGA CGCN(6)CAG CAGN{11)GCG CCGN{12)CCG OGON 18)CAG CAGN{1}GGG GOGN{18)GC CG 'Nl 15GGC CCNI151AGG AGGN{20
GOG
COON 5 OTO TOON{171OGA GCGN(4000C CCGN(131GAG OTON{6)OGO OGON{4)GAG GCG'N(5 GGA CGCN{111CCG GOGN{19)CCC OGON{18}GAA COAN{1)CGG AGGN{10}
CCC
67 96 28 23 31 29 76 18 26 34 68 21 33 22 31 29 19 27 80 2 2E 2 r 2'e 2( 1 2 2 7 2 2 2 30 3.54 50 3.52 7 3.46 5 3.43 9 3.42 6 3.41.
8 3.4 36- 3.4 3 3.37 7 3.35 11 3.35 31 3.33 4 3.32 10 3.31 5 3.3 9 3.29 I8 3.29 4 3.26 7 3.24 I39 3.21 10 3.2 7 3.18 7 3.15 34 3.14 37 3.14 3 7* 3.13 3 5 3.12 )9 3.08 0 9 3.07 8 8 3.06 7 7 3.05 8 8 3.04_ 1 5 3.03 4 6 3.03 7 7 3.01 8 3 2.99 6 7 2.98 1 5 2.98 8 39 2.95 4 6 2.94 6 2.94 4 431 2.92 124 CCCN(IG 15 ACG 23 1125 AGCN{8}CCC 66 31 2.73 126 CCCNfl }GC 60 27 2.71 12 GN{G 19 102.
128 CCGN(14)GGC 276 8 2.75 129 CCGN 0 CCG 19 4 2.74 130 CGCNf8}AGC 23 6 2.69 131 CGCN{ 19AC 1 4 2.8 1 2 G G N{ 17 GAG5 172 3 2. 66 133 AGCN1GCG 24 6 2.63 124___ CCCN{15}G 231 10 2.63 12CGCN{17CCC 6 71 2.51 16CCN{1}ACA 23 67 2.1 12CGGN{CC 317 30 12 CGN1)GCC 24 6. 2.49 129 CGNSGAC 17 3 2.49 11CGCN{15}GGc 84 32 2.44 13 CGGN{116GC 7 8 2.44 13CGCN{16 CA 23 6 2.42 13 GCCN{3}CC 173 36 2.4 156 CGN41GGG 94 51 26 00 157 CCCN61GCG 23 6 2.38 158 CCGN(16)CGC 17 3 2.38 159 CCCN{17)GCA 61 28 2.37 160 CGCN{13)TCC 24 6 2.37 161 GCCN{1}CGC 29 9 2.36 162 CCGN191GAG 26 7 2.35 163 GGGN(10}GGA 89 48 2.35 164 CAGN{51CCG 32 11 2.35 165 CGCN{3)AGA 19 4 2.32 166 GCCN{0}GCC 29 9 2.32 167 CCCN{8)GGC 61 28 2.31 168 CCTN(6)GCG 22 6 2.29 169 GACN{6}CCC 48 20 2.29 170 CGGN1}CCC 26 8 2.27 171 CCCN{15}CCG 30 10 2.27 172 CAGN{9)CCC 84 44 2.26 173 CGGN{10}GGC 27 8 2.26 174 CGAN{10}ACG 10 1 2.26 175 GCGN3)TCC 21 5 2.26 176 CCCN{3}GCC 75 38 2.24 177 GCGNf1}ACC 17 3 2.24 178 CCGN(9)AGG 27 8 2.23 179 CGCN{16)CAG 26 8 2.23 180 GGCNf0}CCC 62 29 2.22 181 AGGN{12)CCG 26 8 2.19 182 CCGN{0}GCG 16 3 2.19 183 CCGN{2lGCC 30 10 2.18 184 CCGN{111GTC 19 4 2.17 185 CAGN{0}CCC 88 47 2.17 186 CCCN5 CCG 32 11 2.17 187 GCCN{20)CCC 66 32 2.15 188 GACN{2}CGC 18 4 2.14 189 CGCN{6)CAC 23 6 2.13 190 AGGN{14}GCG 25 7 2.1 191 GACN{51CGC 17 3 2.1 192 CCTN{19}CCG 29 9 2.1 193 CCGN{12)GGA 24 7 2.08 194 GGCN{9}GAC 44 18. 2.08 195 AGGN{10}GG. 
94 52 2.07 196 CCGN{101GAG 25 7 2.07 197 CGCN6}GGA 20 5 2.06 198 CGCN7}AGC 23 6 2.04 199 CCAN{13}CGG 26 8 2.03 200 CGGN{6)GGA 25 7 2.03 201 CGCN(191GCC 24 7 2.03 202 CCAN{12)CGC 24 7 2.02 203 CGGN{1}GGC 41 16 2.02 204 GCGN{3 CCA 25 7 2.01 205 AGGN{1}CGC 21 5 2 206 CTCN{5)CGC 24 7 1.98 207 CCCN{0}ACG 30 10* 1.97 208 CAGN[17)CCG 29 9 1.96 209 GGCN{4}CCC 62 30 1.96 210 AGGN{8lGCG 26 8 1.96 211 CTGN{1)CCC 88 48 1.94 212 CCCNf16CAG 85 46 1.94 213 CGCN{9)GAC 16 3 1.93 214 CAGN{6)CCG 29 9 1.92 215 CGTNf12CGC 11 1 1.92 216 CTCN7)GCC 69 35 1.92 217 CGCN{19TCC 22 6 1.92 218 CCCN(7}GCC 67 33 1.91 219 CAGN(13 CGG 30 10 1.9 220 CGCN{1}GCC 27 8 1.9 221 CGCNI17)CCG 17 4 1.89 222 AGGN[4)CCC 63 31 1.89 223 AGCN{10}CGC 21 5 1.89 224 CCCN{11}CGG 30 10 1.88 225 CCCN{8}GCC 75 39 1.86 226 CCGN{11CGG 22 3 1.86 227 CCCN{1}ACC 71 36 1.85 228 CGCN{0}CAG 25 7 1.85 229 CCGN{19}TGC 23 6 1.82 230 GCGN{4}CGA 12 2 1.82 231 CCGN{19 GCC 30 10 1.82 232 CCAN{10}CCC 85 46 1.81 233 CAGN{13}GGG 91 51 1.81 234 AGCN{18}CGG 23 6 1.81 235 CGAN{81CGC 11 1 1.81 236 AGCN{4)CCC 63 31 1.8 237 GGAN{6}CCC 61 30 1.8 238 CGGN{13)AAG 23 6 1.8 239 ACCN{11 CGC 19 5 1.79 240 CCGN(12}CAG 28 9 1.78 241 CCCN(12)GGG 76 29 1.77 242 CACN{17}ACG 22 6 1.76 243 CAGN{18)CCC 82 44 1.76 244 CGTN{10}GTC 19 5 1.75 245 CCCN{13)GCG 23 6 1.75 246 GCAN{11CGC 20 5 1.73 247 AGAN{41CCG 24 7 1.73 248 GCGN{10}AGC 22 6 1.72 249 CGCN{OIGGA 12 2 1.72 250 CGGN{4)GAC 17 4 1.69 251 CCCN{12)CGC 26 8 1.68 252 GCCNf15CCC 65 33 1.68 253 GCGN{6}TCC 20 5 1.66 254 CGGN{3)CAG 33 12 1.65 255 CCCN{3)CCA 88 49 1.65 256 AGCN{31CCC 59 28 1.65 257 GGGN(16)GCA 65 33 1.65 258 AGGN{8}CCG 28 9 1.64 259 CCCN{0}CCG 29 10 1.64 260 GCGN51GAC 16 3 1.64 261 CCCN{9}ACC 60 29 1.64 262 CTGN{5}CGC 25 8 1.64 263 CGCN{14CTC 23 7 1.64 264 CGGN{14)GCA 23 7 1.63 265 CCGN{8}GCC 26 8 1.62 266 CCGN{7}CAC 23 7 1.62 267 AGCN{8}GCG 21 6 1.61 268 CGGN{16)GGA 29 10 1.61 269 CCAN{12}CCG 26 8 1.61 270 CGGN{2}CCC 26 8 1.6 271 CCAN(13}GGG 71 37 1.6 272 CGGN{15}GCA 21 6 1.6 273 CGCN9GCA 20 5 
1.58 274 CGGN{19)CCA 26 8 1.58 275 GGGN{15CGA 20 5 1.57 276 CCCN(10}CGC 26 8 1.57 277 CTCN{14}CGC 26 8 1.55 278 CACN 11 GCG 20 5 1.55 279 CCGN{2}GGC 24 7 1.55 280 CTGN{18}CCC 85 47 1.54 281 GGGN{13)CAC 58 28 1.54 282 CCTN{1l5GGC 62 31 1.54 283 CCCN20)CGA 20 5 1.54 284 CCCN8)CGA 20 5 1.53 285 GAGN{7)CCC 61 30. 1.53 286 CGCN(21CCG 22 6 1.53 287 CCCNf0}TCC 98 57 1.52 288 AGCN(0}GCC 21 6 1.52 289 CCCN21TCC 82 45 1.52 290 CCGNf5lCCC 30 10 1.52 291 CGCN{131CGC 16 3 1.51 292 CCCN{1}CGC 28 9 1.51 293 GCCN(16}GCA 53 25 1.51 294 CCCN{16 CCA 84 46 295 CCGNJ13)CGC 19 5 296 CCGN{17}CAG 28 9 1.49 297 CGGN(181GGC 26 8 1.49 298 CCGN(14)AGG 23 7 1.49 299 CCCN(5}CGG 26 8 1.49 300 CCCNt6}GGA 58 28 1.49 301 ACGN{2}CCC 20 5 1.49 302 CCAN{9}CCG 27 9 1.48 303 CCCN{191CCA 78 42 1.48 304 CAGN{0}GGG 77 41 1.48 305 AGCN(1}CCC 58 28 1.47 306 GCGN 7)TCC 27 9 1.46 307 ACGN{18}CCA 25 8 1.46 308 GCTN{14}CCC 61 30 1.46 309 GCGN{14)CCC 23 7 1.46 310 GCGN{19)AGC 20 5 1.45 311 CCGNf8)CAG 29 10 1.45 312 GCGN{6}GCC 22 6 1.45 313 GCGN{101GCA 20 5 1.44 314 CCTN{71GCC 69 36 1.44 315 GCCN(13)GCC 54 26 1.42 316 CCCN{14)GCC 63 32 1.42 317 CCCN{151CGG 26 8 1.42 318 CCAN(13)CGC 23 7 1.42 319 AGCN{11}GGG 67 35 1.41 320 GGANf0}GCC 64 32 1.4 321 GCCN{3}TCC 61 30* 1.4 322 CCTN(5}GCC 69 36 1.39 323 CGGN{18CCC 25 8 1.39 324 CCTN{3}GGC 59 29 1.38 325 CCGN(0}CTC 22 6 1.38 326 AGCNfl7)GCG 19 5 1.37 327 ACGN{14)GGG 20 5 1.37 328 CGAN(12)GGC 19 5 1.37 329 CCCN{20)CGC 24 7 1.37 330 ACGN{12)CTG 24 7 1.36 331 CCGN{0}CCC 36 14 1.36 332 CCGN{10)GGA 23 7 1.36 333 CCCN{3}GCG 21 6 1.36 334 GCGN{14}CGC 22 3 1.35 335 CCGN{8}CGC 16 4 1.35_ 336 CGCN{10}ACA 22 6 1.34 337 CCCN{19)CCG 28 10 1.33 338 CACN{14)CGC 20 5 1.32 339 GACNf3}GGC, 46 21 1.32 340 GAAN{7}CGC 19 5 1.32 341 CGCN{16}GGC 21 6 1.31 342 GGCN{9)CCC%- 64 33 1.31 343 CCCN{9}GCC 64 33 1.31 344 CGCN{0}TGC 26 9 1.3 345 CCTN{81GGC 67 35 1.3 346 CCAN{8jCCC 82 46 1.29 347 GACN{2}CCC 42 18 1.28 348 GGCN{1)CCC 54 26 1.27 349 CGCNfO}AGC 24 7 1.26 350 AGGN{4}GCG 28 10 1.26 351 
CGGN{6}TCC 22 6. 1.25 352 -ACGN{19}GGC 20 5 1.25 353 CCCN{81ACG 21 6 1.24 354 CCCN{18}GCC 62 31 1.24 355u GCCN{21CGA 19 5 1.24 356 CCCN{8)GCG 28 10 1.23 357 CCCN{0}CTC 76 41 1.23 358 GCCN{11)CGC 27 9 1.22 359I AGCN{9)CCC 59 29 1.22 360 GCTN{0}GCC 71 38 1.21 361 CGCN{3)CCC 26 9 1.21 362 CCCN{2)CCC 117 72 1.19 363 GCCN{9)CGC 23 7 1.19 364 GCAN{19)CGC 19 5 1.19 365 CAGN{4CGG 32 12 1.18 [366 CAGN{}G 80 44 1.17 00 367 GCCN(16}CCC 67 35 368 GAGN{5)CCC 60 30 369 CCTN{16}TCG 20 6 370 CCCN GGC 62 32 371 GCGN{13GGA 24 8 372 GCCN{17}GGC 66 25 373 CCCN{1GGC 58 29 374 AGGN(31CCG 311 12 375 q3r.
CACN{0}CGC flC~Nt1 8WCA 32 28 10 10 1.16 1.16 1.16 1.15 1.15 1.15 1.14 1.14 1.14 1.14 1.13 1.13 1.11 1.11 1.09 1.09 1.08 1.07 377 AGCN(1}GCC 57 28 378 CGCN{18}GGC 23 7 379 CCCN(5}AGG 64 I 3go AACNf0lGCG 380t 381 CCCN(10}CCA sS 381 389 1CGCNf13)GAG 1 50 6 8 10 t 383 CGCN{7GCC
I
qsAL ('NfCCG 385 CGCNf16CCC 24 8 386 GAAN(13)CGC 18 5 387 GGCN{3}CCC 49 23 388 TCCN{11}CCA 87 50 389 CACN{0}CCC 70 38 390 CGCN{16}CCG 15 3 391 CGGN{15AGC 21 6 1.05 1.05 1.03 1.03 1.02 1.02 1.02 1.02 1.01 1.01 1.01 ~92 CCCN(12GCG tj 393 CCCNf9jGAG 591__ 30 394 CCGN{20}TCC 241 8 QF C.C~C1N(n~C~CrC, 396 ATGN(7}CGG 20 6 1 397 GGGN{20)GCA 59 30 1 398 CGGN{4)GGC 26 9 0.99 399 CGGN{16)AGC 22 7. 0.99 400 CGGN{5}GGC 25 8 0.99 401 GCGN{0}GGA 25 8 0.98 402 GGCN{20}CAC r 52 25 0.98 403 CCCN(9}CCC 97 58 0.97 404 ACCN{17}GGC 44 20 0.97 405 CCCN{6}CGA 18. 5 0.96 406 AAGMI1OICGG t 1407 CGCN{17)CAC 6 8
I
U.'db 0.95 0.94 Ihn CCCNf16CGG 14Q CCC-,flIlCGG 00 409 GACN{18}GGC 39 17 0.94 410 GGGN{15)GAC 47 22 0.92 411 GCCN{4}TCC 66 35 0.92 412 GGCN{15)CCC 56 28' 0.92 413 CAGN{12}CGC 24 8 0.92 414 CCAN{3)GCG 22 7 0.91 415 CCGN{16}GAG 22 7 0.9 416 AGCN{2}CGC 24 8 0.89 417 GAGN{4}CCC 54 27 0.89 418 AGGN{3}CGC 23 7 0.88 419 CACN{13}AGG* 67 36 0.88 420 CCCN{4}CAG 88 51 0.88 421 CCCN{2}GAA 63 33 0.87 422 CGCN{19}GAG 21 6 0.87 423 ACGN18lGGG 21 6 0.87 424 CCCN{4)GGC 62 32 0.87 425 CGGN{9)GAG 28 10 0.86 426 CCCN{3)GGG 66 26 0.86 427 GAGN4GGC 66 35 0.85 428 CGCN5)GAG 18 5 0.84 429 CCGN(20}AGG 24 8 0.84 430 CCCN{15}CCC 88 51 0.83 431 AGGN{17}CCG 25 8 0.82 432 AGGN(61GGG 89 52 0.82 433 GGCN{20}CCC 57 29 0.82 434 GCAN(17}CGC 19 5 0.82 435 CGAN{11}ACG 9 1 0.81 436 CGCNf2}GGA 19 5 0.81 437 CTGN{5}CCC 79 45 0.8 438 TCCN{20}CCA 77 43 0.8 439 CCAN{21GGG 59 30 0.8 440 CCGN{15}GCG 14 3 0.8 441 CCAN{5}GGG 69 38 0.79 442 CGGN{1}TGC 24 8 0.79 443 CCCN{14}GCG 21 6 0.79 444 CAGN{0}CCG 27 10 0.79 445 GCCN{9}TCC 60 31 0.78 446 AGGN{20}CGC 22 7 0.78 447 CCCN {6GAC 42 19 0.77 448 CGGN{11}CCA 23 7 0.76 449 GGGN{14}CAC 57 29 0.75 450 GCAN{15}CGC 19 5 0.74 451 CGCN{2)ACA 20 6 0.74 452 ACCN{9}CCC 57 29 0.73 453 GCGN{9}CGC 20 3 0.73 454 CAGN15jGCG 23 7 0.73 455 CCCN18)GTC 45 21 0.72 456 GCGN{3}CCC 24 8 0.72 457 CGGN(11)GCC 23 8 0.72 458 CCCNf1}CGG 24 8 0.71 459 GCCN14}CCA 70 38 0.71 460 CCCNf4)CCG 30 12 0.7 461 CGTN{2)GCA 21 6 0.7 462 AGCN{7}TCG 18 5 0.69 463 CCGN{15)GAA 20 6 0.69 464 ACCN{5)CCC 62 33 0.69 465 CGCN{14jGAG 19 5 0.68 466 CCCN{7 CGC 30 12 0.68 467 GAGN(12)CGC 21 6 0.68 468 GGCNf17CCC 58 30 0.67 469 ACGN{11)CTC 21 7 0.65 470 ACAN{9}CGG 24 8 0.65 471 CTGN7}CCC 82 47 0.65 472 CCCN2lGCC 72 40 0.65 473 CGGN{2)GCA 24 8 0.64 474 CCCN{0}TGC 83 48 0.64 475 CGCN{7}ACC 18 5 0.63 476 GCANf2)GCC 54 27 0.63 477 GCGN8}CCA 20 6 0.63 478 AGCN 0}CGC 22 7 0.63 479 GCGN{2)GCA 18 5 0.63 480 CCGN{2}GTC 18 5 0.62 481 CCGN{3}ACA 21 7 0.62 482 ACGN{13)TGG 21 7 0.62 483 CCAN{8}CGC 23 8 0.62 484 CCGN{9)GGC 23 8 0.61 
485 CCAN{5}CCG 25 9 0.61 486 AGGN{3)GGG 97 59 0.61 487 CAGN(2)GGC 78 45 0.61 488 CCCN8 CAG 81 47 0.61 489 AGCN{5}CAG 80 46 0.6 490 CGGN{16}GCC 22 7 0.6 491 GCGN{15 CCC 23 8 0.6 492 CCCN(11)GCC 59 31 0.59 493 CGAN{2)ACG 9 1 0.59 494 CGGN{4)GCC 22 7 0.59 495 CACN{6}CGC 19 6 0.59 496 CGGN{5}ACG 11 2 0.59 497 CTGN{4}GCC *66 36 0.59 498 GGGN{18)CGA 18 5 0.59 499 CCTN{8}CGC 22 7 0.59 500 GCCN{4)CCC-x. 67 37 0.58 501 CGGN{10}GCC 22 7 0.58 502 GCCN{5)GGA 54 27 0.57 503 ACCN{7}GCG 15 4 0.57 504 CCCN{8}CGC 24 8 0.57 505 CAGN{5)CCC 77 44 0.56 506 CACN{14}GGA 63 34 0.56 507 CCCN{ 11GCC% 94 57 0.55 508 CCCN{5)AGC 67 37. 0.55 509 GGCN{5}GGA 59 31 0.55 510 CGAN{17)GAG 19 6 0.55 511 CGCN{7}ACA 18 5 0.54 512 CCAN{13}CCC 87 52 0.54 513 CGGN{201GGC 24 8 0.54 514 CCCN{17)GCC 58 30 0.53 515 CCTN{10}CCG 30 12 0.53 516 CCCN{8lCCG 27 10 0.53 517 CGCN{3}GAG 18 5 0.52 518 CGCN{7)AAG 17 5 0.51 519 CGGN{11}GGA 23 8 0.51 520 CCGN{15)CCG 15 4 0.51 521 CCCN{3)GCA 57 30 0.511 522 CGGN{2)CAG 24 8 523 AGGN{2} COG 24 8 524 CCCN{4}CAC 69 38 525 GGAN{191CCC 56 29 0.49 526 CCCN{8}CAC 68 38' 0.49 527 ACCN{6)CCG 18 5 0.49 528 CCCN{6}GGC 54 28 0.49_ 529 CCCN{6)CCG 29 11 0.48 530 CGCN{14)GCC 26 9 0.47 531 CCGN15}TCC 25 9 0.46 532 GCCN{6}GCC 55 28 0.46 533 CGGNM7GGA 24 8 0.45_ 534 GGGN(6 GGA 87 52 0.44 535 GCCN{12ITCC 60 32 0.44 536 AGTN{16}CCG 17 5 0.44 537 GGCN(19}GCC 68 29 0.44 538 CCGN3)CCG 22 7 0.44 539 CCCN8}ACC 58 31 0.44 540 CAGN{15)GCC 77 44 0.44 541 CCCN{17)CGG 24 8 0.44 542 GCGN{1}CCA 22 7 0.44 543 CCCN{14}CAG 79 46 0.44 544 CCCN{81CCC 89 53 0.44 545 ACAN{12}GCG 23 8 0.43 546 AGGN{41CCG 23 8 0.43 547 CGCN{131GCC 23 8 0.43 548 GAGN{2}CGC 23 8 0.42 549 CCCN9lGCG 21 7 0.42 550 CGCN{171ACA 17 5 0.42 551 GCGN{1 )CCA 23 8 0.42 552 AAGN{18)CCG 20 6 0.42 553 CGCN{1}GGA 18 5 0.41 554 CCAN{1}CCC 90 54 0.41 555 CGTN{18)TGC 20 6 0.41 556 TCCN{14}CGA 17 5. 
0.41 557 CACN{5}GGG 56 29 0.4 558 CCGN{12GCA 21 7 0.4 559 CTGN{6}CCC 77 44 0.4 560 CGGN8lGGC 32 13 0.4 561 CCAN{11}GGG 68 38 0.4 562 ACGN{19}CAA 21 7 0.39 563 GGGN{20}CCC 72 31 0.39 564 CGCN3) CAG 23 8 0.39 565 AGCNf17GGG 58 31 0.37 566 CACN{20)CCG 21 7 0.37 567 ACGN{17}CAG 24 8 0.37 568 AGGN{1}CCC 60 32 0.37 569 CGTN{12}CAC 20 6 0.37 570 CGGN{9}GGC 23 8 0.37 571 CGCN{10}GCG 18 3 0.37 572 CCCN{6}CTC 80 47 0.36 573 CCGN{10}AGG 23 8 0.36 574 CcCN 18}CAG 79 46 0.3 575 AGCN{17}CCG 21 7 0.36 576 AGCN{9)GCG 18 _0.36 r, 7 7 A NI~IC~C~C1 0.36 ui7 r-Amf~l 0.3 578 CCCN{11}GGC 57 30 0.35 579 ACGN{5)GCA 23 8 0.35 580 CCCN n1}CGG 23 8 0.35 581 CCCN{5)CCA 91 55 0.35 582 CCGN{11AGG 22 7 0.34 583 GGGN{101GAC 45 22 0.34 584 CGCN115}CCA 20 6 0.34 585 CCTN{19)CGC 22 7 0.34 586 CGTN{3}CGC 10 2 0.33 587 AGCN{14}CCG 21 7 0.33 588 GGCN{2)CGA 17 5 0.33 589 CAGN 8}CCC 79 46 0.33 590 CCGN{21GAC 16 4 0.33 591 AGCN{19AGG 70 40 0.32 592 CCTN{41GGC 64 35 0.32 593 CCGN{1}AGC 22 7 0.32 594 CACN{4}CGC 18 5 0.32 595 CCGN(1)CCC 30 12 0.31 596 CTGN{13}GGC 73 42 0.31 597 CGCN{16)ACC 15 4 0.31 598 CACN{18}CAG 79 46 0.31 599 GGCN8}GCC 68 29 0.29 600 GGGN{15 GGA 78 46 0.29 601 CCGN{16}GCC 22 7 0.29 602 CCGN120} CC 18 5 0.29 603 CGAN 7CCC 17 5 0.28 604 CCGN{6 CTC 23 8 0.28 605 CGGN{101CTC 22 7 0.28 606 CAGN{16)CGC 23 8 0.28 607 CCAN{3}AGG 77 45 0.27 608 GCCN{18}GCC 52 27 0.27 609 CGCN{18}GGA 19 6 0.26 610 CCGN{20)GGC 22 7 0.26 611 ACAN{10)GCG 17 5 0.26 612 CGGN{5}CCC 25 9 0.25 613 CCCN{7)TCC 75 43 0.25 614 ACGN(10}CGC 10 2 0.25 615 CCCN{3}TCC 81 48 0.25 616 CCGN{8}CGG 20 3 0.24 617 CCAN{15)CGG 22 7 0.24 618 CCGN{6)CCG 17 5 0.24 619 CAGN(3}GCG 25 9 0.24 620 GAGN{1}CCC 62 34 0.24 621 CCGN{18}TGC 22 7 0.23.
622 CCCN{7}CCA,. 85 51 0.23 623 CGGN(3)CCA 24 9 0.23 624 ACGN{1}CCC 18 5 0.23 625 CGGN(13}TGA 21 7 0.22 626 CTCN(6}GGC 53 28 0.22 627 GCGN{2}GAC 15 4 0.22 628 GGGN{11}ACC 49 25 0.22 629 CGCN(4}GGA 17 5 0.22 630 CCCN{11}CCG 27 10 0.22 631 CCGN{19}GCA 20 6 0.22 632 GCGN[0}GCA 20 6 0.21 633 AGAN{7}CCC 61 33 0.21 634 CGGN{2)CCA 21 7 0.21 635 CCCN17)CCC 89 54' 0.21 636 ACCN{4}GCG 15 4 0.2- 637 CCTN(15}CGC 20 6 0.2 638 AGCNf9}GTC 44 21 0.2 639 CCCN{18}CTC 74 43 0.2 640 -CGCN(181CGA 9 1 0.19 641 CCCN{15}GCC 62 34 0.18 642 -ACCN{11)GGC 45 22 0.18 643 AGGN{15}CGC 29 12 0.18 644 GCGN{0}CCA 27 10 0.18 645 GCGN{91AGC 18 5 0.17 646 GGGN{18}GCA 59 32 0.17 647 CCCNfl7}CAG 77 45 0.17 648 CCAN{8)CGG 22 8 0.16 649 CCGN{10}GGC 21 7 0.16 650 GCAN{0}GCC 76 44 0.16 651 CAGNf2)CGC 20 6 0.16 652 CGCN{8}GGC 19 6 0.16 653 CTGN{17}GGC 65 36 0.16 654 GGGNfl4}ACC 46 23 0.16 655 CCGNf1}TGC 20 6 0.16 656 CAGN{8}CGC 22 8 0.15 657 AAGN{11}CGC 17 5 0.15 658 CCGN{6}TCC 22 8 0.14 659 CCAN{18 CCC 72 42 0.14 660 CCAN(0}CCC 841 51 0.141 661 GAGN 6}CCC 53 28 0.14 662 AGCN20}GGC 52 27 0.14 663 CAGN0}CGC 21 7 0.14 664 CCGN{12)CTC 22 8 0.14 665 CGCN{15)ACG 9 1 0.13 666 GGCN{17)CGA 15 4 0.13 667 CCGN{16)AAG 19 6 0.13 668 CGCNf14TCC 19 6 0.12 669 AGGN{7)CGC 20 7 0.12 670 CGGN{7)CCC 22 8 0.12 671 CGCN{41GCC 34 15 0.12 672 CGAN{6}CCC 17 5 0.12 673 CCCN{19)GGA 60 33 0.11 674 CCCN{16)GCG 28 11 0.11 675 CCANf7)CGC 20 7 0.11 676 CCCN{6)GCC 80 48 0.11 677 GCCN{14}TCC 55 29 0.11 678 AGGN{14}GCC 64 36 0.1 679 CGCN{11}GCC 20 7 0.1 680 TCCNf0}GCA 17 5 0.09 681 GCGN8lCCC 27 11 0.09 682 CCAN{11}GCG 19 6 0.09 683 CACN{4)GGG 51 26* 0.09 684 CGGN7)TCC 20 7 0.09 685 GCGN{5)GCC 20 7 0.09 686 ACGN{12}CAG 26 10 0.09 687 CCGN{19)CGC 14 4 0.08 688 CGGN{8)TGC 18 5 0.08 689 CCCN{1}GAG 65 37 0.07 690 GCGN{19}TGA 18 6 0.07 691 GGCN{15)GCC 70 31 0.07 692 CCGN{7)CCC 27 11 0.07 693 ACAN{19)CCC 63 35 0.07 694 ACCN{16)GGG 47 24 0.07 695 AGAN{1)GGC 64 36 0.07 696 GGGN{17}TGA 64 36 0.06 697 CAGN{5}GGG 83 50 0.06 698 GCCN{1 GC 22 8 0.06 699 
GCGN{7}GGA 19 6 0.06
700 CAGN{14}CCA 94 58 0.06
701 CCGN{4}GTC 16 4 0.06
702 CCCN{13}CGC 22 8 0.06

Table 6. STAR elements, including genomic location and length

STAR   Location¹                          Length²
1      2q31.1                             750
2      7p15.2                             916
3³     15q11.2 and 10q22.2                2132
4      1p31.1 and 14q24.1                 1625
5⁴     20q13.32                           1571
6      2p21                               1173
7      1q34                               2101
8      9q32                               1839
9⁴     10p15.3                            1936
10     Xp11.3                             1167
11     2p25.1                             1377
12     5q35.3                             1051
13⁴    9q34.3                             1291
14⁴    22q11.22                           732
15     1p36.31                            1881
16     ?p21.2                             1282
17     2q31.1                             793
18     2q31.3                             497
19     6p22.1                             1840
20     8p13.3                             780
21     6q24.2                             620
22     2q12.2                             1380
23     6p22.1                             1246
24     1q21.2                             948
25⁵    1q21.3                             1067
26     1q21.1                             540
27     1q23.1                             1520
28     22q11.23                           961
29     2q13.31                            2253
30     22q12.3                            1851
31     9q34.11 and 22q11.21               1165
32     21q22.2                            771
33     21q22.2                            1368
34     9q34.14                            755
35     7q22.3                             1211
36     21q22.2                            1712
37     22q11.23                           1331
38     22q11.1 and 22q11.1                ~1000
39     22q12.3                            2331
40     22q11.21                           1071
41     22q11.21                           1144
42     22q11.1                            735
43     14q24.3                            1231
44     22q11.1                            1591
45     22q11.21                           1991
46     22q11.23                           1871
47     22q11.21                           1082
48     22q11.22                           1242
49     Chr 12 random clone, and 3q26.32   1015
50     6p21.31                            2361
51     5q21.3                             2289
52     7p15.2                             1200
53     Xp11.3                             1431
54     4q21.1                             981
55     15q13.1                            501
56     includes 3p25.3                    741
57     4q35.2                             1371
58     21q11.2                            1401
59     17 random clone                    872
60     4p16.1 and 6q27                    2068
61     7p14.3 and 11q25                   1482
62     14q24.3                            1011
63     22q13.3                            1421
64     17q11.2                            1414
65     7q21.11=28.4                       1310
66     20q13.33 and 6q14.1                ~2800

1 Chromosomal location is determined by BLAST search of DNA sequence data from the STAR elements against the human genome database. The location is given according to standard nomenclature referring to the cytogenetic ideogram of each chromosome; e.g. 1p2.3 is the third cytogenetic sub-band of the second cytogenetic band of the short arm of chromosome 1 (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Genetics/chrombanding.html). In cases where the forward and reverse sequencing reaction identified DNAs from different genomic loci, both loci are shown.
2 Precise lengths are determined by DNA sequence analysis; approximate lengths are determined by restriction mapping.
3 Sequence and location of STAR3 has been refined since assembly of Tables 2 and 4 of EP 01202581.3.
4 The STARs with these numbers in Tables 2 and 4 of EP 01202581.3 have been set aside (hereafter referred to as "oldSTAR5" etc.) and their numbers assigned to the STAR elements shown in the DNA sequence appendix. In the case of oldSTAR5, oldSTAR14, and oldSTAR16, the cloned DNAs were chimeras from more than two chromosomal locations; in the case of oldSTAR9 and oldSTAR13, the cloned DNAs were identical to STAR4.
5 Identical to Table 4 "STAR18" of EP 01202581.3.
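The cytogenetic nomenclature described in footnote 1 can be illustrated with a small parser. This is only a sketch: the helper name `parse_band` and the returned field names are hypothetical, and it handles single-locus labels of the form used in Table 6 (for example 2q31.1 or Xp11.3), not entries such as "random clone" or two loci joined by "and".

```python
import re

# Hypothetical helper (not part of the specification): split a cytogenetic
# label into chromosome, arm, the digits before the dot, and the digits after it.
BAND_RE = re.compile(
    r"^(?P<chrom>[0-9]{1,2}|X|Y)(?P<arm>[pq])(?P<band>[0-9]+)(?:\.(?P<sub>[0-9]+))?$"
)

def parse_band(label):
    """Parse a label such as '22q11.21' into its components."""
    match = BAND_RE.match(label.strip())
    if match is None:
        raise ValueError("not a single cytogenetic location: %r" % label)
    return {
        "chromosome": match.group("chrom"),
        # 'p' denotes the short arm, 'q' the long arm
        "arm": "short" if match.group("arm") == "p" else "long",
        "band": match.group("band"),
        "sub_band": match.group("sub"),  # None when no sub-band is given
    }

if __name__ == "__main__":
    for label in ("2q31.1", "Xp11.3", "1q34"):
        print(label, parse_band(label))
```

Labels that do not fit the pattern raise a ValueError rather than being guessed at, which seems the safer choice for a table that also contains free-text entries.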
Table 7. STAR elements convey stability over time on transgene expression¹

                             Cell Divisions²   Luciferase Expression³
STAR6 plus puromycin         42                18,000
                             (illegible)       23,000
                             84                20,000
                             108               16,000
STAR6 without puromycin⁴     84                12,000
                             108               15,000
                             144               12,000

1 Plasmid pSDH-Tet-STAR6 was transfected into U-2 OS cells, and clones were isolated and cultivated in doxycycline-free medium. Cells were transferred to fresh culture vessels weekly at a dilution of 1:20.
2 The number of cell divisions is based on the estimation that in one week the culture reaches cell confluence, which represents ~6 cell divisions.
3 Luciferase was assayed as described in Example 4.
4 After 60 cell divisions the cells were transferred to two culture vessels; one was supplied with culture medium that contained puromycin, as for the first 60 cell divisions, and the second was supplied with culture medium lacking antibiotic.
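The conversion between culture time and cell divisions used in footnote 2 is simple arithmetic, sketched below. The function names are hypothetical; the only figure taken from the text is the estimate of about 6 divisions per weekly passage.

```python
# Estimate from footnote 2: one week of culture corresponds to ~6 cell divisions.
DIVISIONS_PER_WEEK = 6

def divisions_after(weeks, per_week=DIVISIONS_PER_WEEK):
    """Approximate number of cell divisions after a given number of weekly passages."""
    return weeks * per_week

def weeks_for(divisions, per_week=DIVISIONS_PER_WEEK):
    """Approximate culture time in weeks corresponding to a number of divisions."""
    return divisions / per_week

if __name__ == "__main__":
    # Division counts that are legible in Table 7
    for divisions in (42, 84, 108, 144):
        print(divisions, "divisions ~", weeks_for(divisions), "weeks")
```

Under this estimate, the 144 divisions reported for the culture without puromycin correspond to roughly 24 weeks of continuous passage.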
Table 8. Human STAR elements and their putative mouse orthologs and paralogs

SEQ:ID   STAR   Human¹    Mouse²    Similarity³
1        1      2q31.1    2D        600 bp 69%
2        2      7p15.2    6B33      909 bp 89%
3        3a     5q33.3    11B32     248 bp 83%
4        3b     10q22.2   14B3      1. 363 bp 89%
5        6      2p21      17E4      437 bp 78%
6        12     5q35.3    11B1.3    796 bp 66%
7        13     9q34.3    2A3       753 bp 77%
8        18     2q31.3    2E1       497 bp 72%
9        36     21q22.2   16C4      166 bp 79%
10       40     22q11.1   6F1       1. 270 bp; 2. 309 bp
11       50     6p21.31   17B31     1. 451 bp 72%; 2. 188 bp; 142 bp 64%
12       52     7p15.2    6B3       1. 846 bp 74%; 195 bp 71%
13       53     Xp11.3    XA2       364 bp 64%
14       54     4q21.1    5E3       1. 174 bp; 2. 240 bp 73%; 3. 141 bp 67%; 4. 144 bp 68%
15       61a    7p14.3    6133      188 bp 68%

1 Cytogenetic location of STAR element in the human genome.
2 Cytogenetic location of STAR element ortholog in the mouse genome.
3 Length of region(s) displaying high sequence similarity, and percentage similarity. In some cases more than one block of high similarity occurs; in those cases, each block is described separately. Similarity <60% is not considered significant.
Table 9. Candidate STAR elements tested by Linear Discriminant Analysis

Candidate STAR   Location¹   Length
T2 F             20q13.33    ~2800
T2 R             6q14.1      ~2800
T3 F             15q12       ~2900
T3 R             7q31.2      ~2900
T? F             9q34.13     ND²
T? R             9q34.13     ND
T7               22q12.3     ~1200
T9 F             21q22.2     ~1600
T9 R             22q11.22    ~1600
T10 F            7q22.2      ~1300
T10 R            6q14.1      ~1300
T11 F            17q23.3     ~2000
T11 R            16q23.1     ~2000
T12              4p15.1      ~2100
T13 F            20p13       ~1700
T13 R            1p13.3      ~1700
T14 R            11q25       ~1500
T17              2q31.3      ND
T18              2q31.1      ND

1 Chromosomal location is determined by BLAT search of DNA sequence data from the STAR elements against the human genome database. The location is given according to standard nomenclature referring to the cytogenetic ideogram of each chromosome; e.g. 1p2.3 is the third cytogenetic sub-band of the second cytogenetic band of the short arm of chromosome 1 (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Genetics/chrombanding.html). F, forward sequencing reaction result; R, reverse sequencing reaction result. When the forward and reverse sequencing results mapped to different genomic locations, each sequence was extended to the full length of the original clone (as determined by restriction mapping) based on sequence information from the human genome database.
2 ND: Not Determined.
Table 10. Arabidopsis STAR elements of the invention, including chromosome location and length
STAR
Al A2 A3 A4 A6 Chromosome
I
I
I
I
I
I
A7 A8 A9 All A12 A13 A14 A16 A17 A18 A19 AQA9n
II
II
II
II
II
II
II
II
II
II
II
III
III
Length, kb 1.2 0.9 0.9 0.8 1.3 1.4 1.2 0.8 0.9 1.7 1.9 1.4 1.2 2.1 1.4 0.7 0.7 A21 IV 1.8 A22 IV 0.8 A23 IV 0.6 A24 IV V 0.9 A26 V 1.9 A27 V 1.1 A28 V 1.6 A29 V 0.9 V A31 V A32 V 1.3 A33 V0.9 A34 I0.9 00
DESCRIPTION OF FIGURES
The drawings show representative versions of the DNA molecules of the invention. These portions of DNA, referred to as protein expression unit(s), is/are created and manipulated in vectors such as recombinant plasmid molecules and/or recombinant viral genomes. The protein expression units are integrated into host cell genomes as part of the method of the invention, and the schematic drawings represent the configuration of the DNA elements in the expression units in both the vector molecules and the host cell genome.
FIG 1. Schematic diagram of the invention.
FIG 1A shows the first expression unit. It is flanked by STAR elements, and comprises a bicistronic gene containing (from 5' to 3') a transgene (encoding for example a reporter gene or one subunit of a multimeric protein; TG S1, "transgene subunit 1"), an IRES, and a selectable marker (zeo, conferring zeocin resistance) under control of the CMV promoter. A monocistronic selectable marker (neo, conferring G418 resistance) under control of the SV40 promoter is included. Both genes have the SV40 transcriptional terminator at their 3' ends.
FIG 1B shows the second expression unit. It is flanked by STAR elements, and contains a bicistronic gene containing (from 5' to 3') a transgene (encoding for example a different reporter gene or another subunit of a multimeric protein; TG S2), an IRES, and a selectable marker (bsd, conferring blasticidin resistance) under control of the CMV promoter. A monocistronic selectable marker (neo, conferring G418 resistance) under control of the SV40 promoter is included. Both genes have the SV40 transcriptional terminator at their 3' ends.
FIG 2. The pSDH-CSP plasmid.
The Secreted Alkaline Phosphatase (SEAP) reporter gene is under control of the CMV promoter, and the puromycin resistance selectable marker (puro) is under control of the SV40 promoter. Flanking these two genes are multiple cloning sites into which STAR elements can be cloned. The plasmid also has an origin of replication (ori) and ampicillin resistance gene (ampR) for propagation in Escherichia coli.
FIG 3. The pSDH-SIB/Z and pSDH-GIB/Z families of plasmids.
These plasmids are derived from the pSDH-CSP plasmid (FIG 2) by replacement of the monocistronic SEAP and puro genes with a bicistronic gene under control of the CMV promoter and a monocistronic neomycin resistance selectable marker gene (neo) under control of the SV40 promoter.
Panel A, pSDH-SIB/Z in which the bicistronic gene encodes secreted alkaline phosphatase (SEAP) in the 5' position and blasticidin (bsd) or zeocin (zeo) resistance selectable markers in the 3' position, relative to the internal ribosome binding site (IRES).
Panel B, pSDH-GIB/Z in which the bicistronic gene encodes green fluorescent protein (GFP) in the 5' position and blasticidin (bsd) or zeocin (zeo) resistance selectable markers in the 3' position, relative to the internal ribosome binding site (IRES).
FIG 4. Comparison of the consequences of one-step and two-step antibiotic selection on the predictability of transgene expression.
Recombinant CHO cell isolates containing plasmid pSDH-SIZ or plasmid pSDH-SIZ-STAR18 were selected on G418 (panel A) or sequentially on G418 and zeocin (panel B) and assayed for SEAP activity.
FIG 5. The PP (Plug and Play) family of plasmids.
These plasmids contain a bicistronic expression unit (containing an internal ribosome binding site, IRES) between multiple cloning sites (MCS) for insertion of STAR elements. MCSI, SbfI-SalI-XbaI-AscI-SwaI; MCSII, BsiWI-EcoRV-BglII-PacI.
Panel A, the bicistronic gene encodes green fluorescent protein (GFP) and the puromycin resistance marker (puro).
Panel B, the bicistronic gene encodes secreted alkaline phosphatase (SEAP) and the zeocin resistance marker (zeo).
Panel C, the bicistronic gene encodes SEAP and the neomycin resistance marker (neo).
Panel D, the bicistronic gene encodes GFP and puro, and an adjacent monocistronic gene encodes neo.
Panel E, the bicistronic gene encodes SEAP and zeo, and an adjacent monocistronic gene encodes neo.
Bicistronic genes are under control of the CMV promoter (pCMV) and the monocistronic gene is under control of the SV40 promoter (pSV40). A stuffer fragment of 0.37 kb (St) separates MCSI from pCMV. Both the bicistronic and monocistronic genes have the SV40 polyadenylation site at their 3' ends.
FIG 6. STAR sequences.
Sequences comprising STAR1-STAR65 (SEQ ID:1-65); sequences comprising STAR66 and testing set (SEQ ID:66-84); sequences comprising Arabidopsis STAR A1-A35 (SEQ ID:85-119).
FIG 7. The pSDH-CSP plasmid used for testing STAR activity.
The Secreted Alkaline Phosphatase (SEAP) reporter gene is under control of the CMV promoter, and the puromycin resistance selectable marker (puro) is under control of the SV40 promoter. Flanking these two genes are multiple cloning sites into which STAR elements can be cloned. The plasmid also has an origin of replication (ori) and ampicillin resistance gene (ampR) for propagation in Escherichia coli.
FIG 8. STAR6 and STAR49 improve predictability and yield of transgene expression.
Expression of SEAP from the CMV promoter by CHO cells transfected with pSDH-CSP, pSDH-CSP-STAR6, or pSDH-CSP-STAR49 was determined. The STAR-containing constructs confer greater predictability and elevated yield relative to the pSDH-CSP construct alone.
FIG 9. STAR6 and STAR8 improve predictability and yield of transgene expression.
Expression of luciferase from the CMV promoter by U-2 OS cells transfected with pSDH-CMV, pSDH-CMV-STAR6, or pSDH-CMV-STAR8 was determined. The STAR-containing constructs confer greater predictability and elevated yield relative to the pSDH-CMV construct alone.
FIG 10. Minimal essential sequences of STAR10 and STAR27.
Portions of the STAR elements were amplified by PCR: STAR10 was amplified with primers E23 and E12 to yield fragment 10A, E13 and E14 to yield fragment 10B, and E15 and E16 to yield fragment 10C. STAR27 was amplified with primers E17 and E18 to yield fragment 27A, E19 and E20 to yield fragment 27B, and E21 and E22 to yield fragment 27C. These subfragments were cloned into the pSelect vector. After transfection into U-2 OS/Tet-Off/LexA-HP1 cells, the growth of the cultures in the presence of zeocin was monitored. Growth rates varied from vigorous to poor, while some cultures failed to survive zeocin treatment due to absence of STAR activity in the DNA fragment tested.
FIG 11. STAR element function in the context of the SV40 promoter.
pSDH-SV40 and pSDH-SV40-STAR6 were transfected into the human osteosarcoma U-2 OS cell line, and expression of luciferase was assayed with or without protection from gene silencing by STAR6 in puromycin-resistant clones.
FIG 12. STAR element function in the context of the Tet-Off promoter.
pSDH-Tet and pSDH-Tet-STAR6 were transfected into the human osteosarcoma U-2 OS cell line, and expression of luciferase was assayed with or without protection from gene silencing by STAR6 in puromycin-resistant clones.
FIG 13. STAR element orientation.
Schematic diagram of the orientation of STAR elements as they are cloned in the pSelect vector (panel A), as they are cloned into pSDH vectors to preserve their native orientation (panel B), and as they are cloned into pSDH vector in the opposite orientation (panel C).
FIG 14. Directionality of STAR66 function.
The STAR66 element was cloned into pSDH-Tet in either the native (STAR66 native) or the opposite orientation (STAR66 opposite), and transfected into U-2 OS cells. Luciferase activity was assayed in puromycin resistant clones.
FIG 15. Copy number-dependence of STAR function.
Southern blot of luciferase expression units in pSDH-Tet-STAR10, integrated into U-2 OS genomic DNA. Radioactive luciferase DNA probe was used to detect the amount of transgene DNA in the genome of each clone, which was then quantified with a phosphorimager.
FIG 16. Copy number-dependence of STAR function.
The copy number of pSDH-Tet-STAR10 expression units in each clone was determined by phosphorimagery, and compared with the activity of the luciferase reporter enzyme expressed by each clone.
FIG 17. Enhancer-blocking and enhancer assays.
The luciferase expression vectors used for testing STARs for enhancer-blocking and enhancer activity are shown schematically. The E-box binding site for the E47 enhancer protein is upstream of a cloning site for STAR elements. Downstream of the STAR cloning site is the luciferase gene under control of a human alkaline phosphatase minimal promoter. The histograms indicate the expected outcomes for the three possible experimental situations (see text). Panel A: Enhancer-blocking assay. Panel B: Enhancer assay.
FIG 18. Enhancer-blocking assay.
Luciferase expression from a minimal promoter is activated by the E47/E-box enhancer in the empty vector (vector). Insertion of enhancer-blockers (scs, HS4) or STAR elements (STAR elements 1, 2, 3, 6, 10, 11, 18, and 27) blocks luciferase activation by the E47/E-box enhancer.
FIG 19. Enhancer assay.
Luciferase expression from a minimal promoter is activated by the E47/E-box enhancer in the empty vector (E47). Insertion of the scs and HS4 elements or various STAR elements (STARs 1, 2, 3, 6, 10, 11, 18, and 27) does not activate transcription of the reporter gene.
FIG 20. STAR18 sequence conservation between mouse and human.
The region of the human genome containing the 497 base pair STAR18 is shown (black boxes); the element occurs between the HOXD8 and HOXD4 homeobox genes on human chromosome 2. It is aligned with a region in mouse chromosome 2 that shares 72% sequence identity. The region of human chromosome 2 immediately to the left of STAR18 is also highly conserved with mouse chromosome 2 (73% identity; gray boxes); beyond these regions, the identity drops below 60%. The ability of these regions from human and mouse, either separately or in combination, to confer growth on zeocin is indicated: no growth; moderate growth; vigorous growth; rapid growth.
FIG 21. Schematic diagram of bio-informatic analysis workflow.
For details, see text.
FIG 22. Results of discriminant analysis on classification of the training set of 65 STAR elements.
STAR elements that are correctly classified as STARs by Stepwise Linear Discriminant Analysis (LDA) are shown in a Venn diagram. The variables for LDA were selected from frequency analysis results for hexameric oligonucleotides ("oligos") and for dyads. The diagram indicates the concordance of the two sets of variables in correctly classifying STARs.
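The two kinds of sequence features used as LDA variables, hexameric oligonucleotides and dyads of the form CACN{5}GCG, can be counted with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, it counts overlapping occurrences on a single strand, and it does not reproduce the RSA-Tools expected-occurrence or significance calculations reported in Tables 4 and 5.

```python
import re
from collections import Counter

def count_hexamers(seq):
    """Count every overlapping 6-base window in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))

def count_dyad(seq, left, spacer, right):
    """Count overlapping occurrences of a dyad: two trinucleotides
    separated by exactly `spacer` arbitrary bases, e.g. CACN{5}GCG."""
    # A zero-width lookahead lets overlapping matches be counted.
    pattern = re.compile("(?=%s[ACGT]{%d}%s)" % (left, spacer, right))
    return sum(1 for _ in pattern.finditer(seq.upper()))

if __name__ == "__main__":
    # Made-up sequence, used only to exercise the functions
    demo = "CACAAAAAGCGCACTTTTTGCG"
    print(count_dyad(demo, "CAC", 5, "GCG"))
    print(count_hexamers("CCCAGGCCCAGG")["CCCAGG"])
```

Per-element counts of this kind form the feature vectors on which a discriminant analysis could then be trained; whether both strands should be counted is a choice the sketch leaves open.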
FIG 23. RT-PCR assay of Arabidopsis STAR strength.
U-2 OS/Tet-Off/lexA-HP1 cells were transfected with candidate Arabidopsis STAR elements and cultivated at low doxycycline concentrations.
Total RNA was isolated and subjected to RT-PCR; the bands corresponding to the zeocin and hygromycin resistance mRNAs were detected by Southern blotting and quantified with a phosphorimager. The ratio of the zeocin to hygromycin signals is shown for transfectants containing zeocin expression units flanked by 12 different Arabidopsis STAR elements, the Drosophila scs element, or no flanking element.
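The zeocin-to-hygromycin ratio described above can be sketched as follows. The function name and the input layout (a mapping from construct name to a pair of quantified band signals) are hypothetical, and any values passed in are the user's own measurements rather than data from this specification.

```python
def zeo_hygro_ratios(signals: dict) -> dict:
    """Ratio of zeocin to hygromycin band signal for each construct.

    signals maps a construct name to (zeocin_signal, hygromycin_signal),
    both in the same arbitrary phosphorimager units (assumed layout).
    """
    ratios = {}
    for name, (zeo, hygro) in signals.items():
        if hygro <= 0:
            raise ValueError(f"hygromycin signal must be positive for {name}")
        ratios[name] = zeo / hygro
    return ratios
```

Dividing by the hygromycin signal uses the hygromycin resistance mRNA as a per-transfectant reference, so that constructs can be compared on the ratio rather than on raw band intensity.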
FIG. 24. STAR elements allow efficient and simultaneous expression of two genes from two distinct vectors.
The ppGIZ, ppGIZ-STAR7, ppRIP and ppRIP-STAR7 vectors used for testing simultaneous expression of GFP and RED, respectively, are shown. The expression unit comprises (from 5' to 3') genes encoding the GFP or RED proteins, an IRES, and a selectable marker (zeo, conferring zeocin resistance, or respectively puro, the puromycin resistance gene) under control of the CMV promoter. The expression unit has the SV40 transcriptional terminator at its 3' end. The cassettes with the GFP and RED expression units are either flanked by STAR7 elements (STAR7-shielded) or not (Control). The two control constructs or the two STAR7-shielded vectors are simultaneously transfected into CHO-K1 cells. Stable colonies that are resistant to both zeocin and puromycin are expanded and the GFP and RED signals are determined on an XL-MCL Beckman Coulter flow cytometer. The percentage of cells in one colony that are double positive for both GFP and RED signals is taken as a measure for simultaneous expression of both proteins and this is plotted in FIG 24.
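The double-positive percentage used as the readout can be sketched as below. The function name, the per-event tuple layout, and the use of simple fixed thresholds as gates are assumptions for illustration; the specification does not describe how the gates were set.

```python
def percent_double_positive(events, gfp_threshold: float, red_threshold: float) -> float:
    """Percentage of events whose GFP and RED signals both exceed their thresholds.

    events is a sequence of (gfp_signal, red_signal) pairs, one per cell
    (assumed layout); the thresholds stand in for the gates, which in practice
    would be set from untransfected control cells.
    """
    events = list(events)
    if not events:
        raise ValueError("no events to analyse")
    double_positive = sum(
        1 for gfp, red in events if gfp > gfp_threshold and red > red_threshold
    )
    return 100.0 * double_positive / len(events)
```

A colony in which every cell expresses both markers would score 100%, whereas a colony in which the two expression units are silenced independently would score lower even if each marker alone is present in many cells.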
FIG. 25. STAR elements improve expression of a functional antibody in CHO cells.
The different vectors containing the Light and Heavy Chain of the RING1 antibody are shown in FIG 25. The constructs are simultaneously transfected to CHO cells. Stable colonies that are resistant to both zeocin and puromycin are expanded. The cell culture medium of these colonies is tested for the detection of functional RING1 antibody in an ELISA with RING1 protein as antigen. The values are divided by the number of cells in the colony. The highest value detected in the STAR-less control is arbitrarily set at 100%.
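The normalization described above (ELISA value per cell, scaled so that the best STAR-less control equals 100%) can be sketched as follows. The function and argument names are hypothetical, and the sketch assumes one ELISA value and one cell count per colony.

```python
def normalize_elisa(signals: dict, cell_counts: dict, control_names: list) -> dict:
    """Per-cell ELISA values expressed relative to the highest control colony (= 100%).

    signals and cell_counts map colony name to ELISA value and to cell number;
    control_names lists the STAR-less control colonies (assumed inputs).
    """
    per_cell = {name: signals[name] / cell_counts[name] for name in signals}
    reference = max(per_cell[name] for name in control_names)
    if reference <= 0:
        raise ValueError("highest control value must be positive")
    return {name: 100.0 * value / reference for name, value in per_cell.items()}
```

With this scaling, any colony scoring above 100% produces more functional antibody per cell than the best control colony.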
Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
REFERENCES
Aranda, A, and Pascual, A. (2001) Nuclear hormone receptors and gene expression Physiol Rev 81, 1269-304.
Berger, J, Hauber, J, Hauber, R, Geiger, R, and Cullen, BR. (1988) Secreted placental alkaline phosphatase: a powerful new quantitative indicator of gene expression in eukaryotic cells Gene 66, 1-10.
Bell, AC, West, AG, and Felsenfeld, G. (2001) Insulators and boundaries: versatile regulatory elements in the eukaryotic genome Science 291, 447-50.
Bevan, M, Mayer, K, White, O, Eisen, JA, Preuss, D, Bureau, T, Salzberg, SL, and Mewes, HW. (2001) Sequence and analysis of the Arabidopsis genome Curr Opin Plant Biol 4, 105-10.
Boivin, A, and Dura, JM. (1998) In vivo chromatin accessibility correlates with gene silencing in Drosophila Genetics 150, 1539-49.
Boshart, M, Weber, F, Jahn, G, Dorsch-Hasler, K, Fleckenstein, B, and Schaffner, W. (1985) A very strong enhancer is located upstream of an immediate early gene of human cytomegalovirus Cell 41, 521-30.
Bunker, C.A. and Kingston, R.E. (1994) Transcriptional repression by Drosophila and mammalian Polycomb group proteins in transfected mammalian cells. Mol Cell Biol, 14, 1721-1732.
Chan, A, and Mak, TW. (1989) Genomic organization of the T cell receptor Cancer Detect Prev 14, 261-7.
Chung, JH, Whiteley, M, and Felsenfeld, G. (1993) A 5' element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila Cell 74, 505-14.
Chevet, E, Cameron, PH, Pelletier, MF, Thomas, DY, and Bergeron, JJ. (2001) The endoplasmic reticulum: integration of protein folding, quality control, signaling and degradation Curr Opin Struct Biol 11, 120-4.
Das, GC, Niyogi, SK, and Salzman, NP. (1985) SV40 promoters and their regulation Prog Nucleic Acid Res Mol Biol 32, 217-36.
Deuschle, U, Meyer, WK, and Thiesen, HJ. (1995) Tetracycline-reversible silencing of eukaryotic promoters Mol Cell Biol 15, 1907-14.
Doll, Crandall, Dyer, Aucoin, J.M. and Smith, F.I. (1996) Comparison of promoter strengths on gene delivery into mammalian brain cells using AAV vectors. Gene Ther, 3, 437-447.
Eszterhas, SK, Bouhassira, EE, Martin, DI, and Fiering, S. (2002) Transcriptional interference by independently regulated genes occurs in any relative arrangement of the genes and is influenced by chromosomal integration position Mol Cell Biol 22, 469-79.
European patent application 01202581.3
Foecking, MK, and Hofstetter, H. (1986) Powerful and versatile enhancer-promoter unit for mammalian expression vectors Gene 45, 101-5.
Garrick, D, Fiering, S, Martin, DI, and Whitelaw, E. (1998) Repeat-induced gene silencing in mammals Nat Genet 18, 56-9.
Gerasimova, TI, and Corces, VG. (2001) Chromatin insulators and boundaries: effects on transcription and nuclear organization Annu Rev Genet 35, 193-208.
Gill, DR, Smyth, SE, Goddard, CA, Pringle, IA, Higgins, CF, Colledge, WH, and Hyde, SC. (2001) Increased persistence of lung gene expression using plasmids containing the ubiquitin C or elongation factor 1alpha promoter Gene Ther 8, 1539-46.
Gossen, M, and Bujard, H. (1992) Tight control of gene expression in mammalian cells by tetracycline-responsive promoters Proc Natl Acad Sci U S A 89, 5547-51.
Groeneveld, EH, and Burger, EH. (2000) Bone morphogenetic proteins in human bone regeneration Eur J Endocrinol 142, 9-21.
Hamer, CM, Sewalt, RGAB, Den Blaauwen, JL, Hendrix, M, Satijn, DPE, and Otte, AP. (2002). A panel of monoclonal antibodies against human Polycomb group proteins. Hybridoma and Hybridomics 21, 245-52.
Henthorn, P, Zervos, P, Raducha, M, Harris, H, and Kadesch, T. (1988) Expression of a human placental alkaline phosphatase gene in transfected cells: use as a reporter for studies of gene expression Proc Natl Acad Sci U S A 6342-6.
Himes, S.R. and Shannon, M.F. (2000) Assays for transcriptional activity based on the luciferase reporter gene. Methods Mol Biol, 130, 165-174.
Huberty, CJ (1994) Applied discriminant analysis, Wiley and Sons, New York.
Hynes, RO. (1999) Cell adhesion: old and new questions Trends Cell Biol 9, M33-7.
Initiative, AG. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana Nature 408, 796-815.
Izumi, M, and Gilbert, DM. (1999) Homogeneous tetracycline-regulatable gene expression in mammalian fibroblasts J Cell Biochem 76, 280-9.
Kain, SR. (1997) Use of secreted alkaline phosphatase as a reporter of gene expression in mammalian cells Methods Mol Biol 63, 49-60.
Kaufman, RJ. (2000) Overview of vector design for mammalian gene expression Mol Biotechnol 16, 151-60.
Kaufman, RJ. (1990) Selection and coamplification of heterologous genes in mammalian cells Methods in Enzymology 185, 536-566.
Kaufman, RJ, and Sharp, PA. (1982) Construction of a modular dihydrofolate reductase cDNA gene: analysis of signals utilized for efficient expression Mol Cell Biol 2, 1304-19.
Kellum, R. and Schedl, P. (1992) A group of scs elements function as domain boundaries in an enhancer-blocking assay. Mol Cell Biol, 12, 2424-2431.
Kent, WJ. (2002) BLAT--the BLAST-like alignment tool Genome Res 12, 656-64.
Knofler, M, Meinhardt, G, Bauer, S, Loregger, T, Vasicek, R, Bloor, DJ, Kimber, SJ, and Husslein, P. (2002) Human Hand1 basic helix-loop-helix (bHLH) protein: extra-embryonic expression pattern, interaction partners and identification of its transcriptional repressor domains Biochem J 361, 641-51.
Liu, DT. (1992) Glycoprotein pharmaceuticals: scientific and regulatory considerations, and the US Orphan Drug Act Trends Biotechnol 10, 114-20.
Lopez de Quinto, S, and Martinez-Salas, E. (1998) Parameters influencing translational efficiency in aphthovirus IRES-based bicistronic expression vectors Gene 217, 51-6.
Martin, DI, and Whitelaw, E. (1996) The vagaries of variegating transgenes Bioessays 18, 919-23.
Martinez-Salas, E. (1999) Internal ribosome entry site biology and its use in expression vectors Curr Opin Biotechnol 10, 458-64.
McBurney, MW, Mai, T, Yang, X, and Jardine, K. (2002) Evidence for repeat-induced gene silencing in cultured Mammalian cells: inactivation of tandem repeats of transfected genes Exp Cell Res 274, 1-8.
Meyer, P. (2000) Transcriptional transgene silencing and chromatin components Plant Mol Biol 43, 221-34.
Migliaccio, AR, Bengra, C, Ling, J, Pi, W, Li, C, Zeng, S, Keskintepe, M, Whitney, B, Sanchez, M, Migliaccio, G, and Tuan, D. (2000) Stable and unstable transgene integration sites in the human genome: extinction of the Green Fluorescent Protein transgene in K562 cells Gene 256, 197-214.
Mizuguchi, H, Xu, Z, Ishii-Watabe, A, Uchida, E, and Hayakawa, T. (2000) IRES-dependent second gene expression is significantly lower than cap-dependent first gene expression in a bicistronic vector Mol Ther 1, 376-82.
Morgenstern, JP, and Land, H. (1990) Advanced mammalian gene transfer: high titre retroviral vectors with multiple drug selection markers and a complementary helper-free packaging cell line Nucleic Acids Res 18, 3587-96.
Pahl, HL, and Baeuerle, PA. (1997) The ER-overload response: activation of NF-kappa B Trends Biochem Sci 22, 63-7.
Patil, C, and Walter, P. (2001) Intracellular signaling from the endoplasmic reticulum to the nucleus: the unfolded protein response in yeast and mammals Curr Opin Cell Biol 13, 349-55.
Petersson, K, Ivars, F, and Sigvardsson, M. (2002) The pT alpha promoter and enhancer are direct targets for transactivation by E box-binding proteins Eur J Immunol 32, 911-20.
Quong, MW, Romanow, WJ, and Murre, C. (2002) E protein function in lymphocyte development Annu Rev Immunol 20, 301-22.
Rees, S, Coote, J, Stables, J, Goodson, S, Harris, S, and Lee, MG. (1996) Bicistronic vector for the creation of stable mammalian cell lines that predisposes all antibiotic-resistant cells to express recombinant protein Biotechniques 20, 102-4, 106, 108-10.
Ruezinsky, D, Beckmann, H, and Kadesch, T. (1991) Modulation of the IgH enhancer's cell type specificity through a genetic switch Genes Dev 5, 29-37.
Sambrook, J, Fritsch, EF, and Maniatis, T (1989) Molecular Cloning: A Laboratory Manual, Second ed., Cold Spring Harbor Laboratory Press, Plainview NY.
Sanger, F, Nicklen, S, and Coulson, AR. (1977) DNA sequencing with chain-terminating inhibitors Proc Natl Acad Sci U S A 74, 5463-7.
Schorpp, M, Jager, R, Schellander, K, Schenkel, J, Wagner, EF, Weiher, H, and Angel, P. (1996) The human ubiquitin C promoter directs high ubiquitous expression of transgenes in mice Nucleic Acids Res 24, 1787-8.
Sheeley, DM, Merrill, BM, and Taylor, LC. (1997) Characterization of monoclonal antibody glycosylation: comparison of expression systems and identification of terminal alpha-linked galactose Anal Biochem 247, 102-10.
Stam, M, Viterbo, A, Mol, JN, and Kooter, JM. (1998) Position-dependent methylation and transcriptional silencing of transgenes in inverted T-DNA repeats: implications for posttranscriptional silencing of homologous host genes in plants Mol Cell Biol 18, 6165-77.
Strutzenberger, K, Borth, N, Kunert, R, Steinfellner, W, and Katinger, H. (1999) Changes during subclone development and ageing of human antibody-producing recombinant CHO cells J Biotechnol 69, 215-26.
Thotakura, NR, and Blithe, DL. (1995) Glycoprotein hormones: glycobiology of gonadotrophins, thyrotrophin and free alpha subunit Glycobiology 5, 3-10.
Umana, P, Jean-Mairet, J, and Bailey, JE. (1999) Tetracycline-regulated overexpression of glycosyltransferases in Chinese hamster ovary cells Biotechnol Bioeng 65, 542-9.
van der Vlag, J, den Blaauwen, JL, Sewalt, RG, van Driel, R, and Otte, AP. (2000) Transcriptional repression mediated by polycomb group proteins and other chromatin-associated repressors is selectively blocked by insulators J Biol Chem 275, 697-704.
van Helden, J, Andre, B, and Collado-Vides, J. (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies J Mol Biol 281, 827-42.
van Helden, J, Andre, B, and Collado-Vides, J. (2000) A web site for the computational analysis of yeast regulatory sequences Yeast 16, 177-87.
van Helden, J, Rios, AF, and Collado-Vides, J. (2000) Discovering regulatory elements in non-coding sequences by analysis of spaced dyads Nucleic Acids Res 28, 1808-18.
Vance, V, and Vaucheret, H. (2001) RNA silencing in plants--defense and counterdefense Science 292, 2277-80.
Venkatesan, A, and Dasgupta, A. (2001) Novel fluorescence-based screen to identify small synthetic internal ribosome entry site elements Mol Cell Biol 21, 2826-37.
Villemure, JF, Savard, N, and Belmaaza, A. (2001) Promoter Suppression in Cultured Mammalian Cells can be Blocked by the Chicken beta-Globin Chromatin Insulator 5'HS4 and Matrix/Scaffold Attachment Regions J Mol Biol 312, 963-74.
Whitelaw, E, Sutherland, H, Kearns, M, Morgan, H, Weaving, L, and Garrick, D. (2001) Epigenetic effects on transgene expression Methods Mol Biol 158, 351-68.
Wright, A, and Morrison, SL. (1997) Effect of glycosylation on antibody function: implications for genetic engineering Trends Biotechnol 15, 26-32.
Yang, TT, Sinai, P, Kitts, PA, and Kain, SR. (1997) Quantification of gene expression with a secreted alkaline phosphatase reporter system Biotechniques 23, 1110-4.
Zink, D, and Paro, R. (1995) Drosophila Polycomb-group regulated chromatin inhibits the accessibility of a trans-activator to its target DNA Embo J 14, 5660-71.

Claims (16)

  2. A cell according to claim 1, wherein said two polypeptide expression units each further encode a different selection marker.
  3. A cell according to claim 1 or 2, wherein at least one of said polypeptide expression units comprises a monocistronic gene comprising an open reading frame encoding a polypeptide of interest and wherein said monocistronic gene is under control of a functional promoter.
  4. A cell according to claim 1 or 2, wherein at least one of said polypeptide expression units comprises a bicistronic gene comprising in the following order: (i) an open reading frame encoding a polypeptide of interest, (ii) an Internal Ribosome Entry Site (IRES), and (iii) a selection marker, and wherein said bicistronic gene is under control of a functional promoter.
  5. A cell according to any one of the preceding claims, wherein at least one of said polypeptide expression units comprises at least two of said sequences having the capacity to at least in part block chromatin-associated repression, arranged such that said polypeptide expression unit is flanked on both sides by at least one of said sequences having the capacity to at least in part block chromatin-associated repression.
  6. A cell according to claim 5, wherein said at least two sequences having the capacity to at least in part block chromatin-associated repression are essentially identical.
  7. A cell according to any one of the preceding claims, wherein at least one polypeptide of interest comprises an immunoglobulin heavy chain, or an immunoglobulin light chain, and preferably wherein at least one polypeptide of interest comprises an immunoglobulin heavy chain and the other polypeptide of interest comprises an immunoglobulin light chain, wherein said heavy and light chain can form a functional antibody.
  8. A method for expressing at least two polypeptides of interest in a cell, said method comprising culturing a cell according to any one of the preceding claims under conditions wherein said polypeptide expression units are expressed.
  9. A polypeptide expression unit comprising: a bicistronic gene comprising in the following order: (i) an open reading frame encoding a polypeptide of interest, (ii) an Internal Ribosome Entry Site (IRES), and (iii) a selection marker, and wherein said bicistronic gene is under control of a functional promoter; and at least one sequence having the capacity to at least in part block chromatin-associated repression, wherein said sequence having the capacity to at least in part block chromatin-associated repression is chosen from the group consisting of: SEQ ID: 44 of Figure 6; a functional equivalent of SEQ ID: 44 of Figure 6; and a functional fragment of SEQ ID: 44 of Figure 6.
  10. A polypeptide expression unit according to claim 9, comprising at least two of said sequences having the capacity to at least in part block chromatin-associated repression, arranged such that said polypeptide expression unit is flanked on both sides by at least one of said sequences having the capacity to at least in part block chromatin-associated repression.
  11. A polypeptide expression unit according to claim 9 or 10, wherein said polypeptide of interest comprises an immunoglobulin heavy chain or an immunoglobulin light chain.
  12. A polypeptide expression unit according to any one of claims 9-11, wherein said selection marker encodes the zeocin-resistance protein.
  13. A method for obtaining a host cell expressing two polypeptides of interest, the method comprising: a) providing host cells comprising: (i) a first polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a first polypeptide of interest and a first selectable marker gene, and (ii) a second polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a second polypeptide of interest and a second selectable marker gene, wherein said second selectable marker gene is different from said first selectable marker gene, and wherein said first polypeptide expression unit, or said second polypeptide expression unit, or each of said first and said second polypeptide expression units comprise at least one sequence having the capacity to at least in part block chromatin-associated repression, wherein said sequence having the capacity to at least in part block chromatin-associated repression is chosen from the group consisting of: SEQ ID: 44 of Figure 6; a functional equivalent of SEQ ID: 44 of Figure 6; and a functional fragment of SEQ ID: 44 of Figure 6; and b) selecting a host cell by selecting for expression of said first and second selectable marker genes.
  14. A method for expressing two polypeptides of interest, the method comprising culturing a host cell selected by a method according to claim 13, to express said first and second polypeptides.
  15. A method according to claim 14, wherein the two polypeptides of interest form part of a multimeric protein.
  16. A method according to claim 14 or 15, wherein at least one of said two polypeptides of interest comprises an immunoglobulin heavy chain, or an immunoglobulin light chain, and preferably wherein at least one of said two polypeptides of interest comprises an immunoglobulin heavy chain and the other polypeptide of interest comprises an immunoglobulin light chain, wherein said heavy and light chain can form a functional antibody.
  17. A method according to any one of claims 13-16, wherein at least one of said bicistronic genes comprises an internal ribosome entry site (IRES).
  18. A set of two polypeptide expression units, said set comprising: (i) a first polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a first polypeptide of interest and a first selectable marker gene, and (ii) a second polypeptide expression unit comprising a bicistronic gene comprising a promoter functionally linked to a sequence encoding a second polypeptide of interest and a second selectable marker gene, wherein said second selectable marker gene is different from said first selectable marker gene, and wherein said first polypeptide expression unit, or said second polypeptide expression unit, or both said first and said second polypeptide expression units comprise at least one sequence having the capacity to at least in part block chromatin-associated repression, wherein said sequence having the capacity to at least in part block chromatin-associated repression is chosen from the group consisting of: SEQ ID: 44 of Figure 6; a functional equivalent of SEQ ID: 44 of Figure 6; and a functional fragment of SEQ ID: 44 of Figure 6.
  19. A cell according to any one of claims 1 to 7, or a method according to claim 8 or 14 to 17, or a polypeptide according to any one of claims 10 to 12, or a set of two or more polypeptide expression units according to claim 18 substantially as hereinbefore described with reference to the Figures and/or Examples.
AU2008202251A 2002-06-14 2008-05-21 A method for simultaneous production of multiple proteins; vectors and cells for use therein Ceased AU2008202251B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2008202251A AU2008202251B2 (en) 2002-06-14 2008-05-21 A method for simultaneous production of multiple proteins; vectors and cells for use therein
AU2011218621A AU2011218621B2 (en) 2002-06-14 2011-08-26 A method for simultaneous production of multiple proteins; vectors and cells for use therein

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP02077350.3 2002-06-14
AU2003238719A AU2003238719B2 (en) 2002-06-14 2003-06-13 A method for simultaneous production of multiple proteins, vectors and cells for use therein
AU2008202251A AU2008202251B2 (en) 2002-06-14 2008-05-21 A method for simultaneous production of multiple proteins; vectors and cells for use therein

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU2003238719A Division AU2003238719B2 (en) 2002-06-14 2003-06-13 A method for simultaneous production of multiple proteins, vectors and cells for use therein

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2011218621A Division AU2011218621B2 (en) 2002-06-14 2011-08-26 A method for simultaneous production of multiple proteins; vectors and cells for use therein

Publications (2)

Publication Number Publication Date
AU2008202251A1 true AU2008202251A1 (en) 2008-06-12
AU2008202251B2 AU2008202251B2 (en) 2011-09-01

Family

ID=39537807

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2008202251A Ceased AU2008202251B2 (en) 2002-06-14 2008-05-21 A method for simultaneous production of multiple proteins; vectors and cells for use therein

Country Status (1)

Country Link
AU (1) AU2008202251B2 (en)

Also Published As

Publication number Publication date
AU2008202251B2 (en) 2011-09-01

Similar Documents

Publication Publication Date Title
US7960143B2 (en) Method for simultaneous production of multiple proteins; vectors and cells for use therein
AU2002314629B2 (en) Method of selecting DNA sequence with transcription modulating activity using a vector comprising an element with a gene transcription repressing activity
US7794977B2 (en) Means and methods for regulating gene expression
CA2812821C (en) Dna sequences comprising gene transcription regulatory qualities and methods for detecting and using such dna sequences
AU2011218621B2 (en) A method for simultaneous production of multiple proteins; vectors and cells for use therein
AU2008202251B2 (en) A method for simultaneous production of multiple proteins; vectors and cells for use therein
Class et al. Patent application title: Method for simultaneous production of multiple proteins; vectors and cells for use therein Inventors: Arie Pieter Otte (Amersfoort, NL) Arie Pieter Otte (Amersfoort, NL) Arthur Leo Kruckeberg (Shoreline, WA, US) Richard George Antonius Bernardus Sewalt (Arnhem, NL) Assignees: Crucell Holland BV

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired