EP0996715A1

EP0996715A1 - Heterologous expression of proteins by "rescued" vector comprising an intron

Info

Publication number: EP0996715A1
Application number: EP98935143A
Authority: EP
Inventors: Alan Colman; Ian Garner; Michael Alexander Dalrymple
Original assignee: PPL Therapeutics Scotland Ltd
Current assignee: PPL Therapeutics Scotland Ltd
Priority date: 1997-07-17
Filing date: 1998-07-17
Publication date: 2000-05-03
Also published as: WO1999003981A1; WO1999003981A8; GB9715064D0; AU8450298A

Abstract

A nucleic acid expression construct comprising: (a) a promoter; (b) an intron whose natural position is within the 5'-untranslated region of a gene from which it is derived; (c) a coding sequence; and (d) a 3'-flanking sequence wherein the intron (b) is not derived from the same gene as that from which either the promoter (a) or the protein-coding sequence (c) is derived and processes, vectors, hosts and uses involving such a construct to obtain inter alia an increase in the level of expression of the coding sequence.

Description

HETEROLOGOUS EXPRESSION OF PROTEINS BY "RESCUED" VECTOR COMPRISING AN INTRON

This invention relates to the expression of proteins in heterologous host systems, particularly in, but not limited to, the mammary gland of transgenic animals.

It has been shown, using regulatory DNA elements from milk protein genes, that it is possible to express heterologous proteins in the milk of transgenic livestock. One such gene, that for ovine β-lactoglobulin (BLG), has been cloned and characterised (Ali and Clark, J. Mol. Biol, 199 145-426(1988)). The authors subsequently demonstrated consistent, high level expression of ovine BLG in the milk of mice transgenic for the entire gene (Simons et al, Nature, 328 530-532(1987); Harris et al, Developmental Genetics, 12 299-307(1991)). Further experiments demonstrated that the BLG promoter region can direct high levels of expression of a heterologous human protein to the milk of transgenic mice (Archibald et al, Proc. Natl. Acad. Sci. USA, 87 5178-5182(1990)). The generation of sheep expressing human proteins in their milk using BLG regulatory elements indicated that this technology was applicable to transgenic livestock (Simons et al, Bio/Technology, 6 179-183(1988); Clark et al, Bio/Technology, 7 487-492(1989)). The commercial feasibility of this technology, as a means of producing recombinant therapeutics in livestock milk, has been confirmed by the demonstration of high level expression of human α1-antitrypsin in the milk of transgenic sheep (Wright et al, Bio/Technology, 9 830-834(1991); Carver et al, Cytotechnology, 9 77-84(1992); Carver et al, Bio/Technology, 11 1263-1270(1993); Cooper and Dalrymple, The Japanese Journal of Experimental Medicine, Developmental Biotechnology supplement, 12(2) 124-132(1994)).

This high level of expression of a heterologous protein in livestock milk was the result of using a fusion of the BLG promoter region to human genomic sequences (Wright et al, Bio/Technology, 9 830-834(1991)). Analogous cDNA based constructs were poorly expressed in transgenic mice (Whitelaw et al, Transgenic Res., 1 3-13(1991)). Despite some notable exceptions in the field as a whole (Ebert et al, Bio/Technology, 9 835-838(1991); Velander et al, Proc. Natl. Acad. Sci. USA, 89 12003-12007(1992)), the generally inefficient expression of cDNA based constructs is well documented (Brinster et al, Proc. Natl. Acad. Sci. USA, 85 836-840(1988); Palmiter et al, Proc. Natl. Acad. Sci. USA, 88 478-482(1991); Whitelaw et al, Biochem. J., 286 31-39(1992)). Observed problems include the influence of chromosomal position effects and distinct spatial and/or temporal expression in lines transgenic for the same construct. Such constructs can be improved by the addition of some natural or heterologous introns. However, expression levels from such constructs rarely match levels attained with constructs containing some or all natural introns in the region encoding a heterologous protein. The successful use of less than a full complement of introns is the subject of WO-A-9005188. In spite of that useful advance in the art, however, the genetic material encoding many potential target human proteins which may be produced by the transgenic mammary gland is very often limited to cDNAs, owing to the immediate non-availability or the size of the natural gene. As such, a technique giving more consistent expression from transgene constructs containing intronless cDNA sequences is highly desirable.

A further advance in the expression of cDNAs is the so-called "rescue" technology, an approach developed by Clark and co-workers (Clark et al, Bio/Technology, 10 1450-1454(1992); WO-A-9211358) to overcome cDNA-related expression problems. It makes use of the observation that co-injection of an actively expressed transgene, such as the entire ovine BLG gene, together with an intronless construct results in the expression of the second construct where no expression is achieved when it is injected alone. Clark and colleagues have demonstrated the expression of up to 800μg/ml of human α1-antitrypsin (AAT) in the milk of mice transgenic for both BLG and an intronless human AAT construct. In mice transgenic for the latter construct alone, only one out of eight mice expressed, and this at a level of only 3.9μg/ml. Similarly, using this technology, an expression level of 108μg/ml from a wild type human protein C cDNA construct was achieved (WO-A-9211358). This represents approximately 20% of the expression level obtained with an equivalent genomic based construct. The cDNA construct alone gave no expression in 11 lines of transgenic mice.
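The figures quoted above can be put side by side with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, the 800μg/ml value is the stated maximum ("up to"), and the genomic protein C level is not given in the text but merely implied by the "approximately 20%" statement.

```python
# Figures quoted in the text, all in micrograms per ml of milk.
AAT_RESCUED_MAX = 800.0      # intronless AAT construct co-injected with BLG ("up to")
AAT_ALONE = 3.9              # the single expressing mouse out of eight
PROTEIN_C_RESCUED = 108.0    # protein C cDNA construct expressed by rescue
PROTEIN_C_FRACTION = 0.20    # "approximately 20%" of the genomic construct level


def fold_increase(rescued: float, alone: float) -> float:
    """Ratio of the rescued expression level to the level without rescue."""
    return rescued / alone


def implied_genomic_level(cdna_level: float, fraction: float) -> float:
    """Genomic-construct level implied by a cDNA level and its stated fraction."""
    return cdna_level / fraction


aat_fold = fold_increase(AAT_RESCUED_MAX, AAT_ALONE)
pc_genomic = implied_genomic_level(PROTEIN_C_RESCUED, PROTEIN_C_FRACTION)

print(f"AAT: roughly {aat_fold:.0f}-fold higher with rescue (best case)")
print(f"Protein C: implied genomic level of about {pc_genomic:.0f} ug/ml")
```

Both numbers are order-of-magnitude guides only, since they compare a best-case rescued animal with a single expressing animal and rely on an approximate percentage.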

The "rescue" phenomenon has been rationalised as follows. Strongly expressing genes have an innate ability to 'dominate' their chromosomal environment such that they are able to initiate and maintain a high expressing state. Intronless genes are deficient in some, as yet unidentified, feature which provides them with this capability. However, the dominant effect of the strong gene extends some way 5' and 3' to the gene itself and therefore, by linking a 'weak' and a 'strong' gene, some of the properties of the high expressing gene are conferred on the intronless gene. Clark and colleagues propose that this probably results in an open chromatin conformation associated with the actively expressing gene which encompasses adjacent intronless genes. The actively expressing gene may thus create a permissive domain allowing access to the intronless genes by the transcriptional machinery of the cell. In the absence of adjacent actively expressing genes, the intronless construct may be inaccessible, probably residing in condensed chromatin. Other possible explanations for this phenomenon are that enhancer-like sequences, present in the actively expressing gene but absent from the intronless construct, interact positively with the latter, or simply that the actively expressing gene insulates the intronless gene from the negative effects of adjacent chromatin.

To take advantage of "rescue" technology, we have constructed a vector, pMAD, from the ovine BLG gene for the cloning of cDNAs (Figures 1 and 2). This vector contains the same 5' and 3' flanking sequences present in the BLG gene which itself always gives rise to high level expression in transgenic mice. However, it lacks all coding sequences and introns of the intact gene. Cloning of cDNAs in the unique EcoRV site between 5' and 3' flanking sequences results in constructs suitable for expression by the "rescue" approach. However, the co-injection of two covalently unlinked genes is not without its difficulties. There is always the risk that one gene or the other is not represented in the final transgenic lines. Additionally, the two different genes may be present but not at the same locus, in which case they may subsequently segregate upon breeding. Finally, the physical structure of a BLG/pMAD array is not determined prior to injection and there is no control over it. The relative copy numbers of the two genes may vary, especially if the DNA concentrations of the two constructs are not tightly controlled.

cDNAs have been successfully expressed at high levels, in a limited number of cases. It is not clear from the literature why this should be the case. However, the fact is that a cDNA has never (to our knowledge) been expressed at high levels from a BLG construct other than by rescue.

We had noted the work of Brinster and Palmiter (ibid) and others and we sought to incorporate a BLG intron into our cDNA constructs. To this end the vector pMAD6 was constructed, containing almost all the BLG sequence 3' to the natural BLG stop codon, i.e. a portion of exon 6, intron 6, all of exon 7 and those available sequences downstream of the polyadenylation site (see Figures 1 and 2). A protein C cDNA in this vector (pCORP3) expresses at detectable levels but not nearly as well as "rescued" intronless pCORP2 (see table 2). Thus we can conclude that the mere presence of a BLG intron is insufficient to achieve high level expression.

Noting that certain genes have an intron in the 5'-untranslated region (5'-UTR), we engineered the natural BLG first intron into the 5'-UTR of the BLG sequences in pMAD (to give pMAD1) and into pMAD6 (to give pMAD16). When protein C cDNAs were put into these vectors, there was no detectable expression of protein C in the milk of lactating female transgenics (see table 2; pCORP6). This indicates that the mere presence of intronic sequences in the 5'-UTR of a gene is in general insufficient to allow expression of a cDNA. We have now found, however, that if, instead of the BLG first intron, an intron whose natural position is within the 5'-untranslated region of its gene is used, good expression results.

According to a first aspect of the invention, there is provided an expression construct comprising:

(a) a promoter;

(b) an intron whose natural position is within the 5'-untranslated region of a gene from which it is derived;

(c) a coding sequence; and

(d) a 3'-flanking sequence,

wherein the intron (b) is not derived from the same gene as that from which either the promoter (a) or the coding sequence (c) is derived, and, in particular, wherein the promoter (a) drives expression of the coding sequence (c) at a level which is elevated by virtue of the presence of the intron (b).
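The structural conditions of the first aspect can be written down as a small data model. The following is a minimal sketch, not part of the disclosure: the class names, the `source_gene` and `natural_region` fields and the example labels are all hypothetical, and the functional requirement (elevated expression) is not modelled because it can only be established experimentally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str          # descriptive label, e.g. "BLG promoter"
    source_gene: str   # gene from which the element is derived


@dataclass(frozen=True)
class Intron(Element):
    natural_region: str = "5'-UTR"  # where the intron sits in its own gene


@dataclass(frozen=True)
class Construct:
    promoter: Element       # element (a)
    intron: Intron          # element (b)
    coding: Element         # element (c)
    flank_3prime: Element   # element (d)


def satisfies_first_aspect(c: Construct) -> bool:
    """Check only the structural conditions on element (b)."""
    in_5utr = c.intron.natural_region == "5'-UTR"
    heterologous = c.intron.source_gene not in (
        c.promoter.source_gene,
        c.coding.source_gene,
    )
    return in_5utr and heterologous


blg_promoter = Element("BLG promoter", "ovine BLG")
protein_c = Element("protein C cDNA", "human protein C")
blg_flank = Element("BLG 3' sequence", "ovine BLG")

# Heterologous 5'-UTR intron: meets the structural conditions.
casein_construct = Construct(
    blg_promoter,
    Intron("beta-casein intron 1", "bovine beta-casein"),
    protein_c,
    blg_flank,
)

# BLG first intron moved into the 5'-UTR: same gene as the promoter.
blg_intron_construct = Construct(
    blg_promoter,
    Intron("BLG intron 1", "ovine BLG", natural_region="not 5'-UTR"),
    protein_c,
    blg_flank,
)

print(satisfies_first_aspect(casein_construct))      # True
print(satisfies_first_aspect(blg_intron_construct))  # False
```

Note that the 3'-flanking sequence may share its source gene with the promoter; the exclusion in the definition applies only to the intron.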

Elevated levels of expression include expression where previously none was measurable (or obtained). An elevated level is optionally defined as a level higher than that obtained with the construct lacking the intron (and optionally the 3' flanking sequence) described above.

Preferably the expression construct is a DNA expression construct. Preferably the coding sequence is a protein-coding sequence, although it may code for non-protein substances such as ribozymes. The construct is effective for two particular reasons: firstly, the promoter (a) drives expression of the coding sequence (c) at a level which is elevated by virtue of the presence of the intron (b) and/or, secondly, the coding sequence (c) is more likely to be expressed in a transgenic host, by virtue of the presence of the intron (b). The second effect is particularly important when taking into account the length of time and the efforts required to produce transgenic animals useful as bioreactors for the production of useful proteins, etc. It is also important in laboratory scale trials to determine and obtain transgenic hosts. Use of constructs as described in the claims has shown that an increased number of transgenic hosts express the coding sequence, compared with use of the constructs without the specific intron described herein (e.g. see the number of expressing founders in Table 3). The elevated level of expression of the coding sequence and/or the expression of the coding sequence may be by virtue of the presence of the intron (b) and the 3'-flanking sequence (d).

The DNA expression construct may be useful for expression in any suitable host system such as, for example, prokaryotes (e.g. E. coli), fungi, plant and animal (including mammalian) cell lines and transgenic plants and animals (including mammals). However, it is in transgenic animal hosts that the expression constructs of the invention are most useful. In principle, the invention is applicable to all animals, including birds such as domestic fowl, amphibian species and fish species. The protein may be harvested from body fluids (such as milk, blood or urine) or other body products (such as eggs, where appropriate). In practice, it will be to (non-human) mammals, particularly placental mammals, that the greatest commercially useful applicability is presently envisaged. This is because expression in the mammary gland, with subsequent optional recovery of the expression product from the milk, is a proven and preferred technology. It is with ungulates, particularly economically important ungulates such as cattle, sheep, goats, water buffalo, camels and pigs, that the invention is likely to be most useful. The generation and usefulness of such mammalian transgenic mammary expression systems is both generally, and in certain instances specifically, disclosed in WO-A-8800239 and WO-A-9005188. In this text, the meaning of a sequence being derived from a gene does not require that the sequence has actually been obtained from the gene in question. Rather, all and any copies, as well as the original sequence, are meant. Further, any modification to the sequence which does not remove the desired end result can be used.

In addition to being useful in transgenic animal expression for non-therapeutic purposes (as far as the host is concerned), constructs of the invention may also be useful in genetic therapy in humans or other animals.

The promoter can be any suitable promoter chosen from a gene different from the source of the intron (b). Within that constraint, it will be chosen having regard to its desired properties in the construct of the expression system to be used and its ability to drive expression of heterologous sequences in cell culture or in a transgenic organism. A promoter is any sequence which drives expression of a coding sequence. For example, the BLG promoter does not express particularly highly in cells which do not respond to prolactin (such as COS cells). A 'cell' promoter according to the invention is the HCMV (human cytomegalovirus) IE gene promoter. Other promoters of the invention include endothelial promoters such as vascular cell adhesion molecule (VCAM), platelet endothelial cell adhesion molecule-1 (PECAM) and inter-cellular adhesion molecule-2 (ICAM), and smooth muscle promoters such as Desmin E and Desmin P. A preferred promoter is one which drives expression of the protein coding sequence in mammalian cells. In relation to expression in transgenic animal hosts, the preferred expression system involves expression in the mammary gland of transgenic placental mammals. For this purpose, milk protein promoters will generally be used, preferably but not necessarily derived from the species chosen as an expression host. The promoter may be a casein promoter (such as an α-, β- or κ-casein promoter), but it is preferred that it be a non-casein promoter, such as the human Bile Salt Stimulated Lipase (BSSL) promoter, more preferably a whey protein promoter, such as that of whey acidic protein (WAP), α-lactalbumin or, most preferred of all, β-lactoglobulin. Figure 3 is a schematic representation of the cloning of pCASLAC and obtaining transgene constructs therefrom. pCASLAC corresponds to pCASMAD6 (see Fig. 2, 4 and 7), only using the more tightly regulated α-lac promoter.
Of course, the present invention covers promoters, in the constructs described, which have not yet been isolated or characterized. One general way for isolating specific promoters (such as mammalian promoters) for use in the present invention is to isolate specific cDNAs by differential display or from subtractive cDNA libraries. These, in turn, are used to screen genomic libraries for the cognate promoters.

In addition, the present invention encompasses the use of a naturally occurring, low expressing promoter which has been modified in vitro to give an increased level of expression (e.g. by addition of an enhancer), or the use of a promoter which gives a higher level of expression in another species (e.g. the human α-lactalbumin promoter expresses better in mice than the endogenous mouse promoter).

A promoter according to the invention may also be a viral or modified cellular promoter or a completely artificial promoter having the properties of high level expression (preferably in mammalian species). Details of suitable promoters can be found in Houdibine, J-M., J. Biotech., 34: 269-287 (1994); Garner, I. & Dalrymple, M., in "Encyclopedia of Molecular Biology: Fundamentals and Applications", Robert A. Myers (Ed.), Weinheim, NY.

Element (b) of a construct of the invention is an intron whose natural position is within the 5'-untranslated region (5'-UTR) of its natural gene (i.e. the gene with which it is naturally associated). The whole intron is not necessarily required. Fragments or portions may be sufficient. The requirement for the present invention is that the level of protein expression, from any construct according to the invention, is elevated by virtue of the presence of the intron, or parts thereof. It has been shown that the first and third portions of an intron (which has been divided into three fairly equal parts), recombined, are often effective. Generally speaking, the intron for inclusion in a construct according to the invention will be the first intron of such a gene. Examples of genes with such known introns include human and rat aldolase A, human type II IL-1 receptor, human UDP-N-acetylglucosaminyl transferase, mouse involucrin and mouse adenosine deaminase. Some genes have more than one intron whose natural position is within the 5' untranslated region of its natural gene. The present invention recognises this and covers, within element (b), one or more of such introns (for example, in a gene with two introns naturally positioned in the 5' UTR, they may be included separately, together, or as parts of each conjoined, in a construct according to the invention). These and other, as yet unidentified, introns whose natural position is within the 5' untranslated region of their natural gene may be used according to the present invention. Also included are the introns of several gene families, including the actin family (two skeletal muscle actins, alpha cardiac and alpha skeletal; two smooth muscle actins, alpha smooth and gamma smooth; and two non-muscle actins, beta and gamma cytoplasmic actin), the troponin family (cardiac, skeletal and foetal troponins) and the casein family (αs1, αs2, β and κ). Preferably the intron is the first intron of the family.
In the case of transgenic mammary specific expression, the most preferred gene family from which the intron may come is the casein family or the actin family. The intron may preferably be from the same source of organism as the promoter and/or the expression system which it is in (e.g. mammalian, bovine, ovine, etc.).
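The statement that the first and third portions of an intron, recombined, are often effective can be illustrated on a sequence string. This is a minimal sketch assuming the intron is cut at one third and two thirds of its length; the text only says "three fairly equal parts", so the exact breakpoints, the function name and the toy sequence are hypothetical.

```python
def first_and_third_portions(intron: str) -> str:
    """Join the first and last thirds of an intron, dropping the middle third.

    Breakpoints are placed at len // 3 and 2 * len // 3, which is one
    possible reading of "three fairly equal parts".
    """
    n = len(intron)
    first_break = n // 3
    second_break = 2 * n // 3
    return intron[:first_break] + intron[second_break:]


# Toy sequence: three blocks of three bases each.
print(first_and_third_portions("AAACCCGGG"))  # AAAGGG
```

In practice the breakpoints would be chosen at convenient restriction sites rather than at exact thirds.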

DNA expression constructs of the present invention are different from that of Barash et al. (Nucl. Acids Res. 24(4) 602-610 (1996)), in that Barash et al.'s constructs include β-lactoglobulin intragenic sequences which are not within the 5'-untranslated region. Barash et al. do not refer to the possibility of using an intron whose natural position is within the 5'-untranslated region of its natural gene. Caseins, whose genes represent a preferred source of the 5'-UTR introns useful in the present invention, are the major mammalian milk proteins and are encoded by a small gene family, which in cows and sheep consists of four members, αs1, β, αs2 and κ, and in mice and rats five, α, β, γ, ε and κ (Yu-Lee & Rosen, J. Biol. Chem., 258 10794-10804(1983); Jones et al, J. Biol. Chem., 206 7042-7048(1985); Thompson et al, DNA, 4 263-271(1985); reviewed by Mercier & Vilotte, J. Dairy Sci., 76 3079-3098(1993)). The evolution of the calcium sensitive caseins (α and β) is believed to have occurred by recruitment of exons encoding discrete functional domains, followed by intragenic and intergenic duplication to create the present number of similar exons within a given gene, and of genes within a family (Jones et al, J. Biol. Chem., 206 7042-7048(1985); Groenen et al, Gene, 123 187(1993); reviewed by Mercier & Vilotte, J. Dairy Sci., 76 3079-3098(1993)). There is no evidence that κ-casein is evolutionarily related to the other caseins. Both in sequence homology and protein function it appears to be related to γ fibrinogen (Jolles et al, Biochim. Biophys. Acta., 365 335(1974); Thompson et al, DNA, 4 263-271(1985); Alexander et al, Eur. J. Biochem, 178 395-401(1988)), which performs a cleavage-induced clotting function in blood similar to the clotting function of κ-casein in the stomach. The caseins all map to a single chromosome in rodents, sheep, cows, humans and pigs (reviewed by Mercier & Vilotte, J. Dairy Sci., 76 3079-3098(1993)); all four bovine caseins have been mapped to a single 250 Kb locus (Ferretti et al, Nucleic Acids Res., 18 6829-6833(1990); Threadgill & Womack, Nucleic Acids Res., 18 6935-6942(1990)) and all five mouse caseins to a 400 Kbp region (Tomlinson et al, Mammalian Genome, 1 542-544).

The first intron of the calcium sensitive casein genes is naturally positioned in the 5'-UTR, upstream of the start of translation. The position of this intron is conserved across species barriers, indicating that there may be some critical function for an intron in this position. The intron may be obtained by PCR amplification from genomic DNA. The resulting DNA fragment may be cloned into a suitable site of an appropriate vector, such as the pMAD6 vector described above.

Constructs of the invention also contain a coding sequence (c), whose expression is driven by the promoter (a) under the beneficial influence of the intron (b). The protein-coding sequence may code for any (natural or modified) protein of interest, particularly those which may be advantageously produced in the preferred mammary gland expression systems. Examples of classes of such proteins, and specific instances within those classes, are as follows: blood proteins involved in haemostasis including factors V, VII, VIII, IX, X, XIII, PAI-1, PAI-2, TFPI, protein C (details of protein C according to the present invention can be found for example in EP-A-191606 and WO97/20043), protein S, alpha 1-antitrypsin (AAT) (details of which can be found in general from Perlino et al, EMBO Journal, 6, 2767-2771, 1987 and WO90/05188), tPA, fibrinogen (details for which may be found in WO95/23868 and the references cited therein); other protease inhibitors such as serpins, Kazal/Kunitz inhibitors, kininogens, stefins, cystatins or tissue inhibitors of metalloproteinases; growth factors; protein hormones; structural proteins such as collagens (details of which may be found in WO93/07889, WO94/16570, WO97/08311 and the references cited in these publications) and keratins; enzymes such as lipases, other proteases and transferases; and antibodies. While the protein-coding sequence may in principle be any suitable sequence, such as either the full natural genomic structure, a minigene sequence consisting of some, but not all, of the introns naturally present in the gene, or a cDNA (containing no introns), it will generally be with cDNA sequences that the invention is most useful. This is because the invention may conveniently enable the expression of protein from cDNAs which may otherwise only be achievable using minigenes or full genomic sequences. Furthermore, some proteins may be expressed in nature from intronless genes (e.g. bacterial or yeast genes, human thrombomodulin) or have natural intron structures incompatible with the chosen host (e.g. invertebrate or plant genes in a mammalian cell). In these cases the 'cDNA' route is the only one available. The intron (b) is preferably positioned upstream of the translation start site for the protein-coding sequence (c), by analogy with its position in its natural environment.

Particularly preferred constructs according to the present invention are the BLG promoter with either (i) the first intron from bovine β-casein or (ii) the first intron from muscle cardiac actin or (iii) the first intron from ovine β-casein. More preferably, the 3' flanking sequence is from BLG as described below for preferred 3' sequences under (i). Particularly preferred constructs of the present invention include the following:

(i) BLG promoter + bovine β-casein intron 1 + BLG 3' sequence (particularly the 3' sequence beginning immediately 3' to the natural β-lactoglobulin stop codon and continuing to at least about 30 bases 3' of the poly-A site), optionally including ovine β-lactoglobulin intron 6 (preferred position 5' to the flanking sequence and 3' to any coding sequence); in particular the construct pCASMAD6 as described in Fig. 2, 4 or 7;

(ii) BLG promoter + muscle cardiac actin intron 1 + BLG 3' sequence (particularly the 3' sequence beginning immediately 3' to the natural β-lactoglobulin stop codon and continuing to at least about 30 bases 3' of the poly-A site), optionally including ovine β-lactoglobulin intron 6 (preferred position 5' to the flanking sequence and 3' to any coding sequence); in particular the construct pACTMAD6 as described in Fig. 2, 5 or 7;

(iii) BLG promoter + ovine β-casein intron 1 + ovine β-casein 3' flanking sequence; in particular the construct pBOB as described in Fig. 2, 6 or 7.

Preferably, following the coding sequence will be such 3'-sequences as may be necessary or appropriate. In the invention at its broadest, it is not thought that the nature of such 3'-sequences is particularly limited. The 3'-flanking sequence may or may not include its natural intron. Suitable 3' flanking sequences preferably comprise functional elements which are able to direct correct transcription termination and 3' end processing. These can be determined, without undue burden, by the person skilled in the art. However, certain 3'-flanking sequences have been found to be particularly useful. These include, but are not restricted to: (i) a poly-A site (poly A addition site), (ii) a β-lactoglobulin gene 3'-sequence beginning immediately 3' to the natural β-lactoglobulin stop codon and continuing to at least about 30 bases 3' of the poly-A site (as found in pMAD6, pCASMAD6 and pACTMAD6), or (iii) β-casein 3' sequences including a poly A signal. These sequences (as used in pBOB) consist of 6.5Kbp of DNA incorporating ovine β-casein exons 7 to 9, introns 7 and 8, and approximately 4.8Kbp of 3' sequence.

The presence of such 3'-sequences in the construct adds stability to it. It is believed that the relative orientation of the first and last intron may contribute to this stability.

The β-lactoglobulin gene 3'-sequence may be cloned from a β-lactoglobulin gene or amplified by PCR and cloned from genomic DNA, which may be of ovine origin. As mentioned above, it begins with the natural β-lactoglobulin gene sequence immediately 3' to the stop codon, which is a TAG codon occurring in exon VI. It extends to at least about 30 bases 3' of the poly-A site, which is in exon VII. Exons VI and VII bracket intron 6, which is present in its entirety. The preferred minimum length of the β-lactoglobulin-derived 3'-sequences is about 2.3 Kb. For additional preference, at least about 50 bases 3' to the poly A site are present. Similarly, the β-casein 3' sequences may be cloned from the β-casein gene or amplified by PCR and cloned from genomic DNA, which may be of ovine origin.
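The boundaries of the β-lactoglobulin 3'-sequence described above can be expressed as a simple slice of a gene sequence. The sketch below is illustrative only: the function name, the 0-based coordinate convention and the toy sequence are hypothetical, and real coordinates would come from the annotated BLG gene.

```python
def three_prime_sequence(
    gene: str,
    stop_codon_start: int,
    polya_site: int,
    tail: int = 30,
) -> str:
    """Return the region from just 3' of the stop codon to `tail` bases
    3' of the poly-A site.

    stop_codon_start: 0-based index of the first base of the TAG stop codon.
    polya_site: 0-based index of the first base 3' of the poly-A site.
    tail: bases to keep 3' of the poly-A site (about 30, or 50 for
          additional preference).
    """
    start = stop_codon_start + 3
    end = polya_site + tail
    if gene[stop_codon_start:start] != "TAG":
        raise ValueError("expected a TAG stop codon at stop_codon_start")
    if end > len(gene):
        raise ValueError("gene sequence is too short for the requested tail")
    return gene[start:end]


# Toy gene: 3 bases, the stop codon, 10 bases up to the poly-A site,
# then 40 bases of downstream sequence.
toy_gene = "CCC" + "TAG" + "A" * 10 + "T" * 40
flank = three_prime_sequence(toy_gene, stop_codon_start=3, polya_site=16)
print(len(flank))  # 40
```

Passing `tail=50` gives the longer variant mentioned as an additional preference.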

Appropriate signal and/or secretory sequences, operably linked to the construct may be present if necessary or desirable.

In other aspects, the invention is directed to:

• a process for the preparation of a construct according to any feature of the first aspect. The process comprises linking together selected nucleotide bases and/or nucleotide sequences;

• a vector comprising a construct according to the first aspect of the invention. The vector may be plasmid, phage, cosmid or other vector type, for example derived from yeast. The vector may be an expression vector;

• a process for the preparation of a vector described above, comprising introduction of a construct according to the first aspect of the invention into a vector construct;

• a process for the preparation of a host (preferably an expression host), the process comprising introducing a DNA expression construct (or a vector), as described above, into a suitable organism; the process, in particular provides a host which expresses elevated levels of the coding sequence in the construct;

• a host organism (preferably an expression host organism) incorporating a DNA expression construct (or a vector) as described above (and preferably capable of giving rise to expression of protein encoded by the construct, although non-expressing hosts such as Escherichia coli and other prokaryotes may be useful as cloning hosts). The host may be a eukaryotic or prokaryotic cell/organism, such as bacteria, insect or yeast cells, as well as animal tissues (cells in culture) and animals themselves. Such animals are transgenic and preferred transgenic animals include mammals, in particular non-human placental mammals such as pigs, sheep, cattle and goats. Preferably the host (e.g. transgenic animal) according to the invention has the construct (of the first aspect of the invention) integrated into its genome. It is particularly preferred that the transgenic animal transmits the construct to its progeny, thereby enabling the production of at least one subsequent generation of producer animals. Such a host organism, in particular, expresses elevated levels of the coding sequence in the construct;

• a process of preparing a protein, the process comprising allowing an expression host to express a DNA expression construct as described above, and optionally subsequently purifying the protein;

• a protein when prepared by such a process. The protein may be a fusion protein;

• the use of a nucleic acid expression construct comprising a promoter, an intron whose natural position is within the 5'-untranslated region of a gene from which it is derived, a coding sequence and a 3' flanking sequence to obtain a transgenic host, preferably with elevated levels of the expressed coding sequence;

• the use of a nucleic acid construct comprising a promoter, an intron whose natural position is within the 5' untranslated region of a gene from which it is derived, a coding sequence and a 3' flanking sequence to increase the likelihood of expression of the coding sequence from a transgenic host which incorporates the nucleic acid construct;

• a process for improving whether an individual or a number of transgenic hosts express a transgene coding sequence, the process comprising introducing into a host a nucleic acid construct comprising a promoter, an intron whose natural position is within the 5' untranslated region of a gene from which it is derived, a coding sequence and a 3' flanking sequence.

In addition to the construct according to the first aspect of the invention, there is provided an empty "cassette", including all features in claim 1, without the coding sequence. Such a "cassette" provides an easy means by which to provide a high expressing vector for other parties to use by simply introducing coding sequences of interest by restriction endonuclease cutting of the empty cassette and religation (according to standard techniques). The empty "cassette" is for use with an incorporated coding sequence of interest.
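Loading a coding sequence into an empty cassette can be sketched in silico as cutting at a unique site and joining the insert between the two halves. This is a minimal illustration assuming a single blunt-cutting EcoRV site (recognition sequence GATATC, cut between GAT and ATC), as used for cDNA cloning in pMAD; the function name and the toy sequences are hypothetical.

```python
ECORV_SITE = "GATATC"   # EcoRV recognition sequence
ECORV_CUT_OFFSET = 3    # blunt cut between GAT and ATC


def insert_coding_sequence(cassette: str, coding: str) -> str:
    """Cut the cassette at its unique EcoRV site and ligate in the insert."""
    if cassette.count(ECORV_SITE) != 1:
        raise ValueError("the cloning site must occur exactly once in the cassette")
    cut = cassette.index(ECORV_SITE) + ECORV_CUT_OFFSET
    return cassette[:cut] + coding + cassette[cut:]


# Toy cassette: upstream elements, the cloning site, downstream elements.
empty_cassette = "AAA" + ECORV_SITE + "TTT"
loaded = insert_coding_sequence(empty_cassette, "ATGCCCTAA")
print(loaded)  # AAAGATATGCCCTAAATCTTT
```

The uniqueness check reflects why a unique site is wanted: a second site elsewhere in the cassette would give more than one cut and an ambiguous product. A blunt insert can also ligate in either orientation, which this sketch does not model.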

Preferred features for each aspect of the invention are as for each other aspect mutatis mutandis.

The present invention also provides, as a separate aspect, the novel expression of collagen cDNA (natural procollagen chains or modified collagen). Preferably the collagen cDNA is expressed via a construct according to the first aspect of the invention. Preferred features of all the different aspects of the invention described hereinabove in relation to the construct also apply to the expression of collagen cDNA. Particular preferred details in relation to collagen are described above under the discussion of the protein-coding sequences, including references thereto. For example, for the expression of all collagen DNA (cDNA or otherwise), expression hosts may co-express prolyl 4-hydroxylase, which is a post-translational enzyme important in the natural biosynthesis of procollagen.

The invention will now be illustrated by the following examples. The examples refer to the drawings, in which:

FIGURE 1 shows the β-lactoglobulin (BLG) sequences and the plasmids pMAD and pMAD6.

FIGURE 2 shows the origin of sequences present in plasmids pMAD, pMAD6, pCASMAD6 and pACTMAD6.

FIGURE 3 shows the construction of pCASLAC.

FIGURE 4 shows the construction of pCASMAD6.

FIGURE 5 shows the construction of pACTMAD6.

FIGURE 6 shows the construction of pBOB.

FIGURE 7 shows details of pMAD, pMAD6, pCASMAD6, pACTMAD6 and pBOB.

Preferred embodiments of the invention are based on the use of the BLG promoter, and are designed to express cDNAs from the BLG gene. The structure of pMAD6 is indicated in Figures 1, 2 and 7. This vector contains the same 5' and 3' flanking sequences present in the ovine BLG gene which itself always gives rise to high level expression in transgenic mice.

However, it lacks all protein coding sequences and introns 1 to 5 of the intact gene. The 3' non-coding exons of the gene remain in this vector together with the final intron of the BLG gene. Cloning of cDNAs in the unique EcoRV site between 5' and 3' flanking sequences results in constructs suitable for expression of cDNAs. Incorporation of the BLG 3' sequences is not essential for the invention. Such BLG 3' sequences can be substituted by any competent 3' flanking sequences with or without an intron situated downstream of the last (stop) codon of such a gene.

In outline, the first intron of the bovine β-casein gene was amplified by PCR from genomic DNA. The resulting DNA fragment, of approximately 2 Kbp, was cloned and subsequently subcloned into the EcoRV site of pMAD6 in such a way that the original EcoRV site was destroyed and reformed on the 3' side of the intron. This gave the vector pCASMAD6 (Figures 1, 2, 4 and 7). A cDNA encoding human protein C was inserted into the unique EcoRV site of pCASMAD6 and the new construct called pCORI69. The cDNA utilised encodes a mutant form of the natural protein C (PC962): the mutation was designed to allow more efficient processing of the mature protein (Foster et al, Biochemistry, 29:347-354 (1990)).

This mutant form of the human protein C cDNA has been incorporated into a construct, pCORP9, exactly analogous to pCORP2 (see WO-A-9211358). In "rescue" experiments pCORP9 expressed particularly poorly, the highest expressing line giving 3μg/ml compared to 108μg/ml for the wild type cDNA. This indicates that this mutant cDNA is particularly difficult to express at high levels and therefore is a very exacting test of any cDNA expression system.

All references to the DNA sequence of the β-lactoglobulin gene utilise the numbering of the sequence allocated EMBL Accession No. X12817 (Harris et al, NAR 16: 10379-80 (1988)).

EXAMPLES

General

Where not specifically detailed, recombinant DNA and molecular biological procedures were after Maniatis et al ("Molecular Cloning", Cold Spring Harbor (1982)); "Recombinant DNA", Methods in Enzymology Volume 68 (edited by R. Wu), Academic Press (1979); "Recombinant DNA part B", Methods in Enzymology Volume 100 (Wu, Grossman and Moldave, Eds), Academic Press (1983); "Recombinant DNA part C", Methods in Enzymology Volume 101 (Wu, Grossman and Moldave, Eds), Academic Press (1983); and "Guide to Molecular Cloning Techniques", Methods in Enzymology Volume 152 (edited by S.L. Berger & A.R. Kimmel), Academic Press (1987). Unless specifically stated, all chemicals were purchased from BDH Chemicals Ltd, Poole, Dorset, England or the Sigma Chemical Company, Poole, Dorset, England. Unless specifically stated, all DNA modifying enzymes and restriction endonucleases were purchased from BCL, Boehringer Mannheim House, Bell Lane, Lewes, East Sussex BN7 1LG, UK.

[Abbreviations: bp = base pairs; kb = kilobase pairs; AAT = alpha1-antitrypsin; BLG = beta-lactoglobulin; FIX = factor IX; E. coli = Escherichia coli; dNTPs = deoxyribonucleotide triphosphates; restriction endonucleases are abbreviated thus, e.g. BamHI; the addition of -O after a site for a restriction endonuclease, e.g. PvuII-O, indicates that the recognition site has been destroyed].

Construction of Plasmids

Vectors

Plasmid pUCPM

The multiple cloning site of the vector pUC18 (Yanisch-Perron et al., (1985) Gene 33: 103-119) was removed and replaced with a synthetic, double-stranded oligonucleotide containing the new restriction sites PvuI/MluI/SalI/EcoRV/XbaI/PvuI/MluI and flanked by 5'-overhangs compatible with the restriction sites EcoRI and HindIII. pUC18 DNA was cleaved with both EcoRI and HindIII and the new linker DNA was ligated into pUC18. The DNA sequence across the new multiple cloning site was confirmed. This new vector was called pUCPM.

Plasmid pUCXS

The β-lactoglobulin gene sequences from plasmid pSS1tgXS (see WO-A-9201358) were excised on a SalI-XbaI fragment and recloned into the vector pUCPM, cut with SalI and XbaI, to give plasmid pUCXS.

Plasmid pUCXS/RV

The plasmid pSS1tgSE (see WO-A-8800239) contains β-lactoglobulin gene sequences from the SphI site at position 754 to the EcoRI site at 2050, a region spanning a unique NotI site at position 1148. This insert contains a single PvuII site (832) which lies in the 5'-untranslated region of the β-lactoglobulin mRNA. Into this site was blunt-end ligated a double-stranded, 8bp DNA linker encoding the recognition site for the enzyme EcoRV, to give the plasmid pSS1tgSE/RV. The DNA sequences bounded by SphI and NotI were then excised and used to replace the equivalent fragment in the plasmid pUCXS, thus effectively introducing a unique EcoRV site into the β-lactoglobulin gene, placed in such a way as to allow the insertion of any additional DNA sequences under the control of the β-lactoglobulin gene promoter and 3' to the initiation of transcription. The resulting plasmid was called pUCXS/RV.

Plasmid pUCSV

A derivative of pUCXS/RV, containing only the 4.3 Kbp of the β-lactoglobulin gene which lie 5' to the transcription initiation site (the promoter), was constructed by subcloning the SalI-EcoRV fragment into pUCPM; this plasmid is called pUCSV.

Plasmid pBLAC100

A fragment of the 3' flanking sequence of the β-lactoglobulin gene was subcloned in such a way as to eliminate all introns. Plasmid DNA of pUCXS/RV was partially digested with SmaI by performing an enzyme titration with lower and lower concentrations of enzyme at a fixed DNA concentration. The SmaI protein was removed by phenol-chloroform extraction and ethanol precipitation and the DNA resuspended in water. This DNA was subsequently digested to completion with the enzyme XbaI. DNA cut once at the SmaI site at position 5286 and then cleaved with XbaI gave a characteristic band of size 2.1 Kbp. This band was purified from an agarose gel slice and ligated into SmaI and XbaI cut pBSIISK+ (Stratagene Ltd., Cambridge Science Park, Cambridge, UK) to give the plasmid pBLAC100.

Plasmid pMAD

The β-lactoglobulin cloning vector pMAD was constructed to allow rapid insertion of cDNAs under the control of the β-lactoglobulin gene promoter and 3'-flanking sequences. Such constructs contain no introns. The plasmid pBLAC100 was opened by digestion with both EcoRV and SalI, and the vector fragment was gel purified. Into this was ligated the 4.3 Kbp promoter fragment from the plasmid pUCSV as a SalI-EcoRV fragment. This construct is termed pST1 and constitutes a β-lactoglobulin mini-gene encoding the 4.3 Kbp promoter and 2.1 Kbp of 3' flanking sequences. A unique EcoRV site is present to allow blunt-end cloning of any additional DNA sequences. In order to allow excision of novel β-lactoglobulin gene constructs with the enzyme MluI, the entire mini-gene from pST1 was excised on a XhoI-NotI fragment, the DNA termini made flush with Klenow polymerase under standard conditions, and blunt-end cloned into the EcoRV site of pUCPM to give pMAD.

Plasmid pMAD6

Previously described in WO 95/23868, and shown in Figures 1 and 2.

Plasmid pMAD1

Two primers, complementary to sequences at the 5' and 3' boundaries of the first intron of the ovine BLG gene, were used to amplify a ~650bp fragment encompassing the entire sequence of intron 1 of the BLG gene from a pUCXS/RV template. The primers introduce a 5' SmaI site and a 3' EcoRV site at the ends of the PCR fragment. This fragment was cloned in EcoRV digested pBluescriptSK to which single 3' dATP overhangs were added, using Taq polymerase. This construct was named pSTI1. The orientation of the insert with respect to the multiple cloning site in pSTI1 was determined by restriction digestion.

The intron 1 sequence was excised from pSTI1 on a 5' SmaI-3' HindIII fragment, the recessed 3' terminus generated at the HindIII end was repaired using Klenow, and the resulting blunt-ended fragment was ligated with EcoRV digested pMAD to make pMAD1. The correct orientation of the intron fragment with respect to the remainder of the BLG sequences was determined by DNA sequencing. This step effectively moves the EcoRV site to the 3' end of the BLG intron.

Plasmid pMAD16

This was constructed using essentially the same strategy as that described for pMAD1, except that in the final cloning step the BLG intron was ligated with EcoRV cleaved pMAD6 (instead of pMAD) to construct pMAD16.

Expression Constructs

Plasmid pCORP2 (see WO-A-9211358)

A 1450bp cDNA of the human protein C gene, flanked by KpnI sites, was obtained in the form of plasmid pWAPC2. The cDNA was excised as a KpnI fragment, the 3' overhangs made flush by treatment with T4 DNA polymerase, and the fragment gel purified and blunt-end cloned into the EcoRV site of pMAD. Orientation was determined by restriction digest and confirmed by DNA sequencing. This construct is plasmid pCORP2 and contains the human protein C cDNA under the transcriptional control of the β-lactoglobulin gene 5' and 3' flanking sequences. There are no introns.

Plasmid pCORP5

The 1450bp protein C cDNA fragment used in the construction of pCORP2 was placed into pMAD16 to make pCORP5.

Plasmid pCORP9

To facilitate the cloning of the protein C cDNA, PC962 (Foster et al., ibid), into pMAD, the plasmid was modified to incorporate EcoRV sites at the extremities of the protein C cDNA insert. A 769 bp SstII-PstI fragment encompassing the 3' end of PC962 was cloned between the SstII and PstI sites of pBluescript II SK+ (Stratagene, La Jolla, CA). The fragment was excised with SstII/EcoRV and purified. The 5' portion of PC962 was modified by PCR. The sense oligonucleotide primer for this reaction covered the 5' ATG region of the cDNA and provided an EcoRV site upstream of this in the product. The antisense oligonucleotide primer covered the SstII site used to generate the SstII-EcoRV fragment. The resulting PCR product was digested with EcoRV and SstII and ligated with the SstII-EcoRV 3' fragment and EcoRV digested pMAD. The resulting plasmid, designated pCORP9, effectively contained the PC962 cDNA flanked by EcoRV sites in an intronless fusion driven by the β-lactoglobulin promoter.

Plasmid pCORP14

A genomic DNA construct, containing exons I through VIII of the human protein C gene, was made. This genomic construct, designated GPC 10-1, changed the sequence 16 base pairs upstream of the ATG from the native protein C sequence to the β-lactoglobulin sequence and introduced mutations in the propeptide cleavage site located in exon 2, and the two-chain cleavage site located in exon 6, as described below. The construct was assembled using four fragments designated A, B, C and D and encompassed the protein C gene sequence from the ATG to a BamHI site in exon VIII, immediately upstream of the stop codon. The fragments were generated from a human genomic library in Charon 4A phage which was screened with a radiolabeled cDNA probe for human protein C. The screening of the λ library produced three clones that together mapped the entire protein C gene (Foster et al., 1985, Proc. Natl. Acad. Sci. USA, 82: 4673-4677). These clones were designated PC λ1, PC λ6 and PC λ8.

Fragment A was a NotI to EcoRI fragment that contained exons I and II of the genomic sequence and was 1698 bp. A subclone of PC λ6 contained an EcoRI to EcoRI fragment and was designated pHCR4.4-1. Using pHCR4.4-1 as a template and oligonucleotides ZC6303 (5'-ATT TGC GGC CGC CTG CAG CCA TGT GGC AGC TCA CAA GCC TCC TGC-3') and ZC6337 (5'-CAG GAA GGA GTT GGC GCG CTT GCG CCG TTG CAG CAC CTG CTG GGC-3'), a DNA fragment was generated by polymerase chain reaction (PCR). Oligonucleotide ZC6303 changed the sequence 16 base pairs 5' to the ATG sequence from the native protein C sequence to the equivalent sequence from the β-lactoglobulin gene and introduced a NotI site.

Oligonucleotide ZC6337 changed the propeptide cleavage site from Arg-Ile-Arg-Lys-Arg to Gln-Arg-Arg-Lys-Arg. The resulting PCR-generated fragment was digested with NotI and BssHII, and a 1402 base pair fragment was gel purified and designated A1. A second fragment was prepared using a λgt11 clone of PC λ1 as a template with oligonucleotides ZC6306 (5'-CTT CTT CCT GAA TTC TGT TTC TTG C-3') and ZC6338 (5'-CGG ATC CGC AAG CGC GCC AAC TCC TTC C-3') in a polymerase chain reaction. The resulting DNA fragment, designated A3, was digested with BssHII and EcoRI and gel purified, resulting in a 296 base pair fragment.
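The oligonucleotide design described above can be checked with a short script. The following is a minimal sketch, not part of the original disclosure: the helper names `reverse_complement` and `translate` are hypothetical, the standard genetic code is assumed, and the reading frame is assumed to begin at the first base of the reverse complement of ZC6337 (which is an antisense primer).

```python
# Illustrative check only; helper names are hypothetical and not from the patent.
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons of a sense-strand DNA sequence, starting at base 0."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Primer sequences as given in the text, with spaces removed.
ZC6303 = "ATTTGCGGCCGCCTGCAGCCATGTGGCAGCTCACAAGCCTCCTGC"
ZC6337 = "CAGGAAGGAGTTGGCGCGCTTGCGCCGTTGCAGCACCTGCTGGGC"

sense = reverse_complement(ZC6337)
peptide = translate(sense)

print("ZC6303 carries a NotI site (GCGGCCGC):", "GCGGCCGC" in ZC6303)
print("ZC6337 carries a BssHII site (GCGCGC):", "GCGCGC" in ZC6337)
print("Peptide encoded by the sense strand of ZC6337:", peptide)
print("Encodes Gln-Arg-Arg-Lys-Arg (QRRKR):", "QRRKR" in peptide)
```

Under these assumptions the translated sense strand contains the one-letter motif QRRKR, in agreement with the modified propeptide cleavage site stated in the text.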

Fragments A1 and A3 were ligated into the Bluescript II KS+ phagemid vector (Stratagene, La Jolla, CA). The resulting plasmid, designated GPC 2-2, was digested with NotI and EcoRI and gel purified, and the NotI-EcoRI DNA fragment was designated Fragment A.

pCR 2-14 is a subclone which contains an EcoRI to EcoRI DNA fragment of PC λ8 (Foster et al., 1985, ibid.). The plasmid was digested with EcoRI and SstI and gel purified. The resulting fragment was designated Fragment B.

Plasmid pCR 2-14 was used as a template DNA with oligonucleotides ZC6373 (5'-AAA GTA AAA AAA GAT CTA AAA ATT TAA C-3') and ZC6305 (5'-GTG TCT CGT TTT CTT AAG TGA CTG CGC-3'), which introduced an AflII site and the RRKR mutation of the native (KR) two-chain cleavage site, in a polymerase chain reaction. The resulting PCR-generated fragment was digested with BglII and AflII and gel purified, resulting in a 1441 base pair fragment, designated E1. Fragment E1 was used in a ligation reaction with oligonucleotides ZC6302 (5'-TTA AGA AGA AAA CGA GAC ACA GAA GAC CAA GAA GAC CAA GTA GAT CCG C-3') and ZC6304 (5'-GGA TCT ACT TGG TCT TCT TGG TCT GTG TCT CGT TTT CTT C-3'). These oligonucleotides form AflII and SstII restriction sites when annealed and were ligated to the 3' end of fragment E1, resulting in a fragment with a 5' BglII site and a 3' SstII site. This fragment was used in a ligation reaction with a BamHI-SstII digested Bluescript II KS+ phagemid vector (Stratagene). The resulting plasmid was designated GPC 8-5 and digested with SstI and SstII, generating a 626 base pair fragment, designated Fragment C.

A fourth fragment was generated by digestion of a genomic subclone (pHCB7-1) of PC λ8. pHCB7-1 contained a BglII to BglII fragment that encompassed exons VI through VIII. pHCB7-1 was digested with SstII and BamHI and a 2702 base pair fragment was gel purified. The fragment was designated Fragment D.

A five-part ligation reaction was prepared using NotI and BamHI digested and linearized Bluescript II KS+ phagemid vector (Stratagene) with Fragment A (5' NotI to 3' EcoRI) that contained exons I and II, Fragment B (5' EcoRI to 3' SstI) that contained exons III, IV and V, Fragment C (5' SstI to 3' SstII) that contained the 5' portion of exon VI, and Fragment D (5' SstII to BamHI) that contained the remaining 3' portion of exon VI and exons VII and VIII.

The resulting DNA was 8950 base pairs and designated GPC 10-1.

GPC 10-1 was originally generated with BLG sequences and a NotI site upstream of the ATG initiator codon and modifications to both cleavage sites. A clone, designated pPC12/BS, was generated to ensure that the 5' NotI site of GPC 10-1 would not introduce secondary structure into mRNA molecules that could hinder translation. pPC12/BS was generated using PCR amplification of a 1 kb NotI-ScaI fragment that covered the 5' region of the protein C gene and contained the wild-type ATG codon environment. This introduced an EcoRV site immediately downstream of the NotI site, adjacent to the ATG codon, and a BamHI site was incorporated 3' of the ScaI site to facilitate cloning. Following a NotI/BamHI digestion, the PCR product was cloned into NotI/BamHI digested Bluescript II KS+ phagemid vector (Stratagene). The NotI-EcoRV-ScaI fragment present in pPC12/BS was excised, purified and ligated to GPC 10-1, which had been linearized with NotI and partially digested with ScaI (the pUC ampicillin gene has an internal ScaI site). The resulting clone was designated GPC 10-2 and possesses an EcoRV site immediately upstream of the ATG initiator codon.

GPC 10-1 and GPC 10-2 both terminated at the final BamHI site in exon VIII of the protein C gene. To reconstitute the 56 bp of sequence, ending at the termination codon, two oligonucleotides were synthesized with flanking BamHI (5') and BglII (3') restriction sites. Following annealing of the oligonucleotides, the product was cloned into BamHI digested pBST+ to generate plasmid pPC3'. pBST+ is a derivative of pBS (Stratagene) with a new polylinker. The addition of the polylinker added BglII, XhoI, NarI and ClaI restriction sites from the vector polylinker downstream of the destroyed BglII site of the oligonucleotide construct.

The NotI-BamHI fragment of GPC 10-1 was subcloned into NotI/BamHI digested pPC3' to add 3' coding sequences of protein C, the TAG termination codon followed by BglII-XhoI-NarI-ClaI. The 3' region of the protein C gene beginning with the EcoRV site in intron V was excised from this plasmid on an EcoRV-ClaI fragment.

The EcoRV-EcoRV fragment from GPC 10-2, covering the 5' portion of the protein C gene, and the above EcoRV-ClaI fragment covering the 3' portion of the protein C gene were combined between the EcoRV and ClaI sites of pMAD6 to generate pCORP13. This effectively placed a genomic portion of the protein C gene with modified propeptide and two-chain cleavage sites under the control of the β-lactoglobulin promoter.

A further genomic construct was generated from pCORP13 which contained only the modified two-chain cleavage site. This was achieved using PCR amplification to modify two fragments, which resulted in restoration of the coding capability of exon 2 from the mutant Gln-Arg-Arg-Lys-Arg to the wild-type Arg-Ile-Arg-Lys-Arg. pCORP13 was used as template for these reactions. The first fragment was 1.3kb, which encompassed the 5' end of the protein C gene up to the BamHI site in exon 2. For this reaction, the sense primer was designed to add a HindIII site 5' to the EcoRV site proximal to the ATG initiation codon. The antisense primer was designed to restore the wild-type sequences in exon 2, which included a restored BamHI site. A second fragment of 0.2kb, from the BamHI site in exon 2 to the XhoI site in intron 2, was amplified. The two fragments were combined in pGEMII (Promega, Madison, WI) to generate pGEMPC1.5. A 7.5kb XhoI fragment from pCORP13 was ligated to XhoI digested pGEMPC1.5 to generate a complete protein C genomic sequence covering exons 1-8 with a wild-type propeptide cleavage site and a modified two-chain cleavage site. The plasmid was designated pGEMPC14. The sequence was excised from pGEMPC14 as a HindIII/SalI fragment. The DNA termini were repaired using a Klenow reaction and the fragment was blunt-end ligated into EcoRV digested pMAD6 to generate pCORP14.

Plasmid pCORP16

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9 (see above) as an EcoRV fragment and ligated with EcoRV digested pMAD6. The resulting construct has been named pCORP16.

The Vector pCASMAD6

Plasmid pE'10

The bovine β-casein intron 1 (BBCI 1; approximately 2 Kbp) was PCR amplified from dairy cow DNA using the primers BOVCAS1 (5'-AGG CCT ATT CAG CTC CTC CTT CAC TTC TT-3') and BOVCAS2 (5'-GAT ATC GGC TCT CAA TTC CTG GGA ATG GG-3'). The 5' primer incorporates a StuI site and the 3' primer incorporates an EcoRV site. The purified 2 Kb fragment was cloned into the pGEM-T vector (Promega) to give construct pE'10.
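As a simple illustration of how the engineered sites sit at the 5' ends of these primers, the sketch below searches each primer for the recognition sequences of StuI (AGGCCT) and EcoRV (GATATC). The function name `engineered_sites` and the dictionary layout are assumptions made for illustration only.

```python
# Illustrative sketch; names are hypothetical and not part of the original text.
SITES = {"StuI": "AGGCCT", "EcoRV": "GATATC"}

# Primer sequences as given in the text, with spaces removed.
PRIMERS = {
    "BOVCAS1": "AGGCCTATTCAGCTCCTCCTTCACTTCTT",
    "BOVCAS2": "GATATCGGCTCTCAATTCCTGGGAATGGG",
}


def engineered_sites(primer: str) -> dict:
    """Map each enzyme whose site occurs in the primer to its 0-based start position."""
    found = {}
    for enzyme, site in SITES.items():
        position = primer.find(site)
        if position != -1:
            found[enzyme] = position
    return found


for name, sequence in PRIMERS.items():
    print(name, len(sequence), "nt", engineered_sites(sequence))
```

A start position of 0 means the site forms the extreme 5' end of the primer, with the remainder of the oligonucleotide available to anneal to the template.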

Plasmid pCASMAD6

pMAD6 was modified by inserting a linker, containing SpeI/NotI/SacII sites, into the EcoRV site. Both orientations of the linker were obtained and thus two new cloning vectors were obtained. These were called pMAD6/STOPS (5' SacII/NotI/SpeI 3') and pMAD6/SPOTS (5' SpeI/NotI/SacII 3').

BBCI 1 was excised from pE'10 on a SacII and SpeI partial digest (due to an internal SpeI site in the β-casein intron) and cloned into SacII/SpeI digested pMAD6/SPOTS. The new vector was called pCASMAD6.

Plasmid pCORI69

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9 (see above) as an EcoRV fragment and ligated with EcoRV digested pCASMAD6. This places the AUG translation start downstream with respect to the β-casein intron sequence. The resulting construct was named pCORI69.

The Vector pACTMAD6

Plasmid pGEM-AI

Two primers (ACTP1: 5'-AGG CCT AGT GCC TGC CAC CAG CGC CAG CC-3'; ACTP2: 5'-GAT ATC CCT GGC ACA GCT TTG TGT GGT TC-3'), complementary to the opposing strands of the 3' end of the first exon and the 5' end of the second exon of the murine cardiac actin gene respectively, were used in a PCR reaction to amplify a 0.8 Kb fragment encompassing the intervening sequences from a template of mouse genomic DNA. The two primers introduced a 5' SnaBI and a 3' EcoRV restriction site at the ends of the PCR product. This DNA fragment was cloned in pGEM-T to give a construct which was named pGEM-AI. DNA sequence analysis confirmed that the sequence of the amplified product beyond the primers matched that published for the murine beta actin gene.

Plasmid pACTMAD6

The actin intron 1 sequence was excised from pGEM-AI on a 5' SnaBI-3' EcoRV fragment which was then ligated with EcoRV digested pMAD6 to give vector pACTMAD6. This cloning step effectively moves the EcoRV site from the 3' end of the BLG promoter downstream to the 3' end of the actin gene intron segment.

Plasmid pCORI70

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9 as an EcoRV fragment and ligated with EcoRV digested pACTMAD6. This places the AUG translation start downstream with respect to the actin intron sequence. The resulting construct was named pCORI70.

The Vector pBOB

Plasmid pBOB

PCR primers were designed to amplify the region of the ovine β-casein gene from exon 1 to exon 2 (BOB1: 5'-CGG GAT CCG TCG ACC ATT CAG CTT CTC CTT CAC TTC TTC TC-3'; BOB2: 5'-CGG GAT CCG GGT CCC TAC GTA GGC TCT CGA TTC CTG TGA ATG GGA-3'). The product is 2.1 Kbp in size and has, engineered into it, the sites BamHI/SalI at the 5' end and BamHI/PpuMI/SnaBI at the 3' end.
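Because the PpuMI recognition sequence is degenerate (RGGWCCY in IUPAC notation), locating the engineered sites in BOB1 and BOB2 needs pattern matching rather than a plain substring search. The following minimal sketch converts each recognition sequence into a regular expression; the names `site_regex` and `map_sites` are hypothetical, and only the IUPAC codes needed here are included.

```python
import re

# Illustrative sketch; names are hypothetical and not part of the original text.
# Subset of IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}

# Recognition sequences of the enzymes named in the text.
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC", "PpuMI": "RGGWCCY", "SnaBI": "TACGTA"}

# Primer sequences as given in the text, with spaces removed.
BOB1 = "CGGGATCCGTCGACCATTCAGCTTCTCCTTCACTTCTTCTC"
BOB2 = "CGGGATCCGGGTCCCTACGTAGGCTCTCGATTCCTGTGAATGGGA"


def site_regex(site: str) -> "re.Pattern[str]":
    """Compile a (possibly degenerate) recognition sequence into a regex."""
    return re.compile("".join(IUPAC[base] for base in site))


def map_sites(primer: str) -> dict:
    """Return the 0-based start of the first occurrence of each site present."""
    positions = {}
    for enzyme, site in SITES.items():
        match = site_regex(site).search(primer)
        if match is not None:
            positions[enzyme] = match.start()
    return positions


print("BOB1:", map_sites(BOB1))
print("BOB2:", map_sites(BOB2))
```

Run against the sequences as printed, this locates BamHI and SalI in BOB1 and BamHI, PpuMI and SnaBI in BOB2, matching the order of sites stated for the two ends of the product.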

The construction of pBOB took place in three steps: the 2.1 Kbp PCR fragment (above) was blunt-end cloned into the EcoRV site of pBSIISK+ (Stratagene) to give the plasmid pOβ-casEx1/2. The 6.5 Kbp PpuMI fragment from the ovine β-casein gene, containing exons 7 to 9 and 3' flanking sequences, was cloned into the now unique PpuMI site of pOβ-casEx1/2. A clone was obtained with the 6.5 Kbp fragment in its natural orientation with respect to the first intron and this clone was named pBOBΔprom.

Previously, a XhoI linker had been cloned into the EcoRV site of pMAD6 and the modified plasmid named pMAD6X. Finally, the ovine BLG promoter from pMAD6X was cloned into the SalI site of pBOBΔprom as a SalI-XhoI fragment, giving rise to pBOB.

Plasmid pCORB71

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9 as an EcoRV fragment and ligated with EcoRV digested pBOB. This places the AUG translation start downstream with respect to the β-casein intron sequence. The resulting construct was named pCORB71.

Example 1

Analysis of constructs of the present invention in the expression of protein C in transgenic animals. Results are shown in Tables 1 and 2.

Generation of transgenic animals

Transgenic mice were generated as described in Prunkard et al (Nature Biotechnology, 14:867-871, 1996).

Protein C Assays

Human protein C in the milk of transgenic animals was assayed according to the following procedure:

Protein C Standard

Purified human Protein C, stored at 50μg/ml in Phosphate Buffered Saline (PBS)/1% Bovine Serum Albumin (BSA) at 20°C. Dilute to 500ng/ml in blocking buffer for use. The standard curve range of the ELISA is 3.9-125ng/ml.
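The dilution arithmetic implied by these figures can be sketched as follows. The eight-step doubling series is an assumption (one step per row of a 96 well plate) and is not stated in the text; the function name `doubling_series` is likewise illustrative.

```python
# Illustrative arithmetic only; the eight-step series is an assumption.
STOCK_NG_PER_ML = 50 * 1000      # 50 ug/ml stock expressed in ng/ml
WORKING_NG_PER_ML = 500          # working standard in blocking buffer
CURVE_RANGE = (3.9, 125.0)       # quoted standard curve range in ng/ml


def doubling_series(top_ng_per_ml: float, steps: int) -> list:
    """Concentrations obtained by repeated 1:2 dilution, starting at the top standard."""
    return [top_ng_per_ml / 2 ** i for i in range(steps)]


fold_dilution = STOCK_NG_PER_ML / WORKING_NG_PER_ML
series = doubling_series(WORKING_NG_PER_ML, 8)
within_range = [c for c in series if CURVE_RANGE[0] <= c <= CURVE_RANGE[1]]

print(f"Stock to working standard: 1 in {fold_dilution:.0f}")
print("Doubling series (ng/ml):", series)
print("Points inside the quoted curve range:", within_range)
```

On this assumption the stock is diluted 1 in 100, and the doubling series from 500ng/ml ends at about 3.9ng/ml, with the six lowest points falling inside the quoted range.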

Blocking buffer

1X PBS, 5% milk powder, 0.01% Tween 20

Wash buffer

1X PBS, 0.05% Tween 20

Coating Antibody

Dako Rabbit Anti-human Protein C antibody diluted to 10μg/ml in PBS.

Detection Antibody

Dako Rabbit Anti-human Protein C antibody Peroxidase conjugate. Dilute 1/5000 in blocking buffer.

Substrate

TMB 1 Component Peroxidase substrate

Stop Solution

0.2M Sulphuric acid

Method

1. Coat Costar High Binding capacity 96 well plate with 150μl/well of coating antibody. Incubate in a damp box overnight in fridge.

2. Wash wells with wash buffer (3X 200μl/well).

3. Load 100μl blocking buffer into wells.

4. Dilute samples appropriately in blocking buffer. Reference human plasma is used as a positive control at 1/40 dilution.

5. Load standard and samples into plate by columns with doubling dilution (100μl per well). Standard is loaded in rows 1 and 12 in duplicate.

6. Incubate for 2 hours in damp box in fridge.

7. Wash plate with wash buffer (3X 200μl/well).

8. Load 100μl/well peroxidase conjugate. Incubate in damp box for 2 hours in fridge.

9. Wash plate with wash buffer (3X 200μl/well). Drain plate.

10. Add 100μl/well of substrate and leave for 5 minutes. Stop reaction by addition of 100μl/well of stop solution.

11. Read plate on plate reader with 650nm filter.

12. Plot standard curve using mean of duplicates (O.D. v. log concentration PC) and calculate regression line equation. Use equation to calculate sample values. Data handling can be performed by PC assay programme on Dynex Revelation software.
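The curve-fitting step of the method can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Dynex Revelation routine: the O.D. readings below are invented placeholder values, the fit is an ordinary least-squares line of mean O.D. against log10 concentration, and the function names `fit_line` and `concentration_from_od` are hypothetical.

```python
import math

# Illustrative sketch; readings are invented placeholders, not data from the text.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def concentration_from_od(od, intercept, slope, dilution=1.0):
    """Invert O.D. = intercept + slope * log10(conc) and correct for sample dilution."""
    return 10 ** ((od - intercept) / slope) * dilution


# Standards in ng/ml (doubling dilution) with hypothetical duplicate O.D. readings.
standards = [125.0, 62.5, 31.25, 15.625, 7.8125, 3.90625]
duplicate_a = [1.21, 0.98, 0.74, 0.52, 0.29, 0.07]
duplicate_b = [1.19, 1.00, 0.76, 0.50, 0.31, 0.09]

mean_od = [(a + b) / 2 for a, b in zip(duplicate_a, duplicate_b)]
intercept, slope = fit_line([math.log10(c) for c in standards], mean_od)

# A hypothetical milk sample read at O.D. 0.62 after a 1 in 1000 dilution.
sample_ng_per_ml = concentration_from_od(0.62, intercept, slope, dilution=1000)
print(f"O.D. = {intercept:.3f} + {slope:.3f} * log10(conc)")
print(f"Estimated sample concentration: {sample_ng_per_ml / 1000:.1f} ug/ml")
```

Only sample readings that fall within the O.D. span of the standards should be back-calculated in this way; readings outside it call for a different sample dilution.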

Example 2

Expression of AAT from cDNA in constructs according to the present invention.

Transgenic mice were prepared as in Example 1. Analysis of AAT in the milk of transgenic mice was according to standard procedures, for example as described in Wright, G., Carver, A., Cottom, D., Reeves, D., Scott, A., Simons, J.P., Wilmut, I., Garner, I., and Colman, A., 1991. High level expression of active human α1-antitrypsin in the milk of transgenic sheep. Bio/Technology 9: 830-834.

Results are shown in Table 3.

Example 3.

Expression of antibody fragment from constructs according to the present invention.

Constructs pMAD6 and pCASMAD6 were prepared incorporating DNA encoding an antibody binding fragment to give constructs pMAD6-AB and pCASMAD6-AB. The constructs were used to obtain transgenic mice according to Example 1. Expression of the antibody fragment was determined by standard protocols.

No expression of the antibody fragment was found in the transgenic mice with pMAD6-AB. Levels ranging from 0 to 129 μg/ml were found in pCASMAD6-AB mice.

The results are given in Table 4.

Example 4

Expression of IgG from constructs according to the present invention.

Constructs pMAD6 and pCASMAD6 were prepared incorporating DNA encoding IgG to give constructs pMAD6-IgG and pCASMAD6-IgG. The constructs were used to obtain transgenic mice according to Example 1. Expression of IgG in the mouse milk was determined by standard ELISA protocol.

Results are given in Table 5.

Example 5

Expression of adhesion molecule (soluble) from constructs according to the present invention.

Constructs pMAD6 and pCASMAD6 were prepared incorporating DNA encoding a soluble adhesion molecule (SAM) to give constructs pMAD6-SAM and pCASMAD6-SAM. Transgenic mice were prepared according to Example 1.

Expression levels in pCASMAD6-SAM transgenic mice ranged up to 500μg/ml. The maximum level detected in the pMAD6-SAM transgenic mice was 80μg/ml.

Example 6

Expression of collagen cDNA from constructs according to the present invention.

The pCASMAD6 vector was used. Collagen cDNA (human truncated procollagen α2(I) homotrimer) was inserted as the DNA for the protein of interest. This was coinjected with two transgenes expressing the α and β subunits of prolyl 4-hydroxylase, an enzyme for the post-translational modification of procollagen. Transgenic animals were obtained as in Example 1. Collagen expression in mouse milk was determined according to standard protocols, as described in WO97/08311.

Eleven transgenic lines bearing the collagen cDNA construct and the two prolyl 4-hydroxylase transgenes were analysed. Three lines were found to express procollagen α2(I) homotrimer protein. The amount of collagen present was estimated by measurement of hydroxyproline content and Western analysis in comparison with a bovine collagen standard. Three independently derived mouse lines were found to express detectable amounts of collagen. The levels present in the milk of these three lines were estimated as 10μg/ml, 30μg/ml and 120-240μg/ml. Collagen protein was absent from the milk of non-transgenic mice. Milk from the highest expressing line was analysed further and the procollagen present was found to have formed a correctly aligned triple helical molecule.

These results demonstrate secretion of relatively high levels of recombinant procollagen in the milk of transgenic mice by expression of cDNA under the control of the β-lactoglobulin promoter.

From the foregoing it will be appreciated that although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited. All documents and papers cited or mentioned herein are fully incorporated by reference.

TABLE 1

Data supporting the improved expression from constructs according to the invention

*refers to different protein C cDNAs

Table 2: Protein C Expression data

TABLE 3

Expression of cDNA Constructs

TABLE 4

pMAD6-AB

pCASMAD6-AB

TABLE 5

Expression Levels of IgG in Transgenic Mouse Milk

Claims

1. A nucleic acid expression construct comprising:

(a) a promoter;

(b) an intron whose natural position is within the 5'-untranslated region of a gene from which it is derived;

(c) a coding sequence; and

(d) a 3'-flanking sequence,

wherein the intron (b) is not derived from the same gene as that from which either the promoter (a) or the protein-coding sequence (c) is derived.

2. An expression construct as claimed in claim 1 wherein the promoter is a gene promoter which drives expression of the coding sequence (c) in mammalian cells, in particular, a milk protein promoter.

3. An expression construct as claimed in claim 1 or claim 2 wherein the intron (b) is the first intron from a gene where the intron is naturally located entirely within the 5' untranslated region of the gene.

4. An expression construct as claimed in claim 1, 2 or 3, wherein the intron (b) is the first intron from a gene which is a member of a family of genes where the intron is naturally located entirely within the 5' untranslated region of the gene.

5. An expression construct as claimed in any one of claims 1 to 4 wherein the intron is from the casein gene family.

6. An expression construct as claimed in any one of claims 1 to 4 wherein the intron is from the actin gene family.

7. An expression construct as claimed in any one of claims 1 to 6 wherein the 3' flanking sequence is any sequence which supports the correct transcription termination, mRNA 3' end processing, mRNA stabilisation, mRNA transport from the nucleus to cytoplasm and mRNA translation.

8. An expression construct as claimed in any one of claims 1 to 7 wherein the 3 'flanking sequence is a poly-A site or a ╬▓-lactoglobulin gene 3 '-sequence beginning 3' to the namral ╬▓-lactoglobulin stop codon and continuing to at least about 50 bases 3' of the poly-A site.

9. An expression construct as claimed in any one of claims 1 to 7 wherein the 3'-flanking sequence is a β-casein gene 3'-sequence beginning 3' to the natural β-casein stop codon and continuing to at least about 50 bases 3' of the poly-A site.

10. A process for the preparation of a host organism the process comprising introducing an expression construct, as claimed in any one of claims 1 to 9, into a suitable organism.

11. A process as claimed in claim 10 wherein the suitable organism is a prokaryote, a fungi, a plant, an animal, or a eukaryotic cell.

12. A process as claimed in claim 11 wherein the animal is a non-human mammal.

13. A host organism incorporating a DNA expression construct as claimed in any one of claims 1 to 9.

14. A host organism as claimed in claim 13 which is a procaryote (e.g. E. coli), a fungi, a plant, an animal or a eukaryotic cell.

15. A host organism, as claimed in claim 14, wherein the animal is a non-human mammal.

16. A process of preparing a protein, the process comprising allowing an expression host to express a DNA expression construct as claimed in any one of claims 1 to 9.

17. A process as claimed in claim 16, further including a process of purifying the protein.

18. A process as claimed in claim 15 or claim 16 wherein the protein is protein C, fibrinogen, AAT or collagen.

19. A process as claimed in claim 16 or claim 17 wherein the expression host is a prokaryote, a fungi, a plant, an animal or a eukaryotic cell.

20. A process, as claimed in claim 19, wherein the animal is a non-human mammal.

21. A protein prepared by a process as claimed in any one of claims 16 to 20.

22. The use of a nucleic acid expression construct comprising a promoter, an intron whose natural position is within the 5' untranslated region of a gene from which it is derived, a coding sequence and a 3' flanking sequence to obtain a transgenic host.

23. The use of a nucleic acid construct comprising a promoter, an intron whose natural position is within the 5' untranslated region of a gene from which it is derived, a coding sequence and a 3' flanking sequence to increase the likelihood of expression of the coding sequence from a transgenic host which incorporates the nucleic acid construct.

24. A process for improving the number of transgenic hosts which express a transgene coding sequence, the process comprising introducing into the host a nucleic acid construct comprising a promoter, an intron whose natural position is within the 5' untranslated region of a gene from which it is derived, a coding sequence and a 3' flanking sequence.

25. The use as claimed in claim 22 or claim 23, or a process as claimed in claim 24, wherein the construct is as set out in any one of claims 1 to 9.