RIBOSOMAL FRAMESHIFT SIGNALS AND USES
BACKGROUND OF THE INVENTION
This invention relates to ribosomal frameshift signals and their uses.
Work from a number of viruses has recently indicated a novel form of translational control of gene expression through a mechanism known as ribosomal frameshifting. The essence of this is that a ribosome can, at certain specific points in a virus mRNA, slip back by one nucleotide, resulting in the decoding of the subsequent downstream RNA sequence in a different reading frame. The effect of this frameshift may be to avoid a termination codon, which would otherwise have been encountered by the ribosome, and instead create a protein with extra amino acid sequences at its C-terminal end.
This mechanism was first described for the retrovirus Rous Sarcoma Virus (RSV) (Jacks, T. and Varmus, H.E. (1985); Expression of the Rous Sarcoma Virus pol gene by ribosomal frameshifting; Science 230, 1237-1242) but has more recently been described for Human Immunodeficiency Virus (Jacks, T., Power, M.D., Masiarz, F.R., Luciw, P.A., Barr, P.J. and Varmus, H.E. (1988a); Characterization of ribosomal frameshifting in HIV-1 gag-pol expression; Nature 331, 280-283. Wilson, W., Braddock, M., Adams, S.E., Rathjen, P.D., Kingsman, S.M. and Kingsman, A.J. (1988); HIV expression strategies; ribosomal frameshifting is directed by a short sequence in both mammalian and yeast systems; Cell 55, 1159-1169.), Mouse Mammary Tumour Virus (MMTV) (Moore, R., Dixon, M., Smith, R., Peters, G. and Dickson, C. (1987); Complete nucleotide sequence of a milk-transmitted mouse mammary tumour virus: two frameshift suppression events required for translation of gag and pol; J. Virol. 61, 480-490) and by the inventors for the Coronavirus IBV (Brierley, I., Boursnell, M.E.G., Binns, M.M., Bilimoria, B., Blok, V.C., Brown, T.D.K. and Inglis, S.C. (1987); An efficient ribosomal frameshifting signal in the polymerase-encoding region of the coronavirus IBV; EMBO J. 6, 3779-3785.).
In ribosomal frameshifting, a directed change of translational reading frame allows the synthesis of a single protein from two (or more) overlapping genes (see Roth, J.R. (1981); Frameshift suppression; Cell 24, 601-602; and Craigen, W.J. and Caskey, C.T. (1987); Translational frameshifting: where will it stop?; Cell 50, 1-2; for reviews). So far, almost all the examples of this kind of control in higher eukaryotes have come from retroviruses, where frameshifting appears to be a mechanism for the regulation of expression of the viral RNA-dependent DNA polymerase. One termination codon in RSV and HIV-1 and two in MMTV are suppressed by (-1) ribosomal frameshifts into alternative overlapping open reading frames (ORFs), generating the gag-pol polyproteins from which the viral polymerases are subsequently derived. In addition, frameshifting appears to be necessary for the expression of the reverse transcriptase enzymes of a number of retrotransposons, such as yeast Ty1 (Mellor, J., Fulton, S.M., Dobson, M.J., Wilson, W., Kingsman, S.M. and Kingsman, A.J. (1985); A retrovirus-like strategy for expression of a fusion protein encoded by yeast transposon Ty1; Nature 313, 243-246. Wilson, W., Malim, M.H., Mellor, J., Kingsman, A.J. and Kingsman, S.M. (1986); Expression strategies of the yeast retrotransposon Ty: a short sequence directs ribosomal frameshifting; Nucl. Acids Res. 14, 7001-7015. Clare, J.J., Belcourt, M. and Farabaugh, P.J. (1988); Efficient translational frameshifting occurs within a conserved sequence of the overlap between the two genes of a yeast Ty1 transposon; Proc. Natl. Acad. Sci. USA 85, 6816-6820.) and Ty912 (Clare, J.J. and Farabaugh, P.J. (1985); Nucleotide sequence of a yeast Ty element: evidence for an unusual mechanism of gene expression; Proc. Natl. Acad. Sci. USA 82, 2829-2833.) and the Drosophila 17.6 (Saigo, K., Kugimiya, W., Matsuo, Y., Inouye, S., Yoshioka, K. and Yuki, S. (1984); Identification of the coding sequence for a reverse transcriptase-like enzyme in a transposable genetic element in Drosophila melanogaster; Nature 312, 659-661.) and gypsy (Marlor, R.L., Parkhurst, S.M. and Corces, V.G. (1986); The Drosophila melanogaster gypsy transposable element encodes putative gene products homologous to retroviral proteins; Mol. Cell Biol. 6, 1129-1134.) elements.
Recently, however, the applicants described the first non-retroviral, higher eukaryotic example of the phenomenon, in the avian coronavirus infectious bronchitis virus (IBV) (Brierley et al., 1987). There is considerable interest in the precise mechanism by which ribosomal frameshifting operates, and work from several groups has shown that the specificity for the event resides in the nucleotide sequence of the RNA around the site at which frameshifting occurs, since it has proved possible to induce frameshifting in a heterologous context by inserting cloned DNA corresponding to the frameshift site into unrelated genes (Brierley et al., 1987; Jacks et al., 1988). For RSV and HIV, the site at which frameshifting occurs has been identified by a combination of site-directed mutagenesis and amino-acid sequence analysis (Jacks et al., 1988a. Jacks, T., Madhani, H.D., Masiarz, F.R. and Varmus, H.E. (1988b); Signals for ribosomal frameshifting in the Rous sarcoma virus gag-pol region; Cell 55, 447-458.). These authors, from a comparative analysis of a large number of retroviruses thought to utilise frameshifting as a means of controlling gene expression, have suggested that certain heptanucleotide RNA sequences, initiating with 2 homopolymeric triplets, can allow tRNA slippage during translation leading to a (-1) frameshift. In addition to these 'slippery' sequences, potential RNA stem-loop structures located downstream of most retroviral shift-sites have been proposed to make contributions to the frameshifting process (Rice, N.R., Stephens, R.M., Burny, A. and Gilden, R.V. (1985); The gag and pol genes of bovine leukemia virus: nucleotide sequence and analysis; Virology 142, 357-377. Moore et al., 1987. Jacks, T., Townsley, K., Varmus, H.E. and Majors, J. (1987); Two efficient ribosomal frameshifting events are required for synthesis of mouse mammary tumour virus gag-related polyproteins; Proc. Natl. Acad. Sci. USA 84, 4298-4302. and Jacks et al., 1988a), and indeed, the presence of a stem-loop downstream of the RSV frameshift site has been shown to be essential (Jacks et al., 1988b). How the stem-loop influences frameshifting is not known, but it has been suggested (Jacks et al., 1988b) that ribosomes may slow or stall at the stem-loop, increasing the likelihood of a tRNA slippage event.
The present applicants' recent work on IBV has aimed at identifying precisely the sequences in IBV RNA which signal the frameshifting, and they have now narrowed this down to an 83 nucleotide portion of the RNA, which is sufficient to direct highly efficient (30%) frameshifting in a variety of different heterologous contexts (Fig. 2). The applicants have identified the point at which the ribosome actually slips back as a UUUAAAC sequence which occurs right at the 5' end of the frameshift signal, and this is consistent with the results reported by others (Jacks et al., 1988a,b). However, the applicants have also made the new discovery that, downstream of this 'slippery' sequence, the RNA in the frameshifting signal folds to form a tertiary structure which has been termed a pseudoknot (Pleij, C.W.A., Rietveld, K. and Bosch, L. (1985). A new principle of RNA folding based on pseudoknotting. Nucl. Acids Res. 13, 1717-1731).
The applicants' experiments indicate that the precise primary nucleotide sequence of this folded region is less important than its ability to form the right kind of structure. Hence the applicants have been able to change fairly radically the nucleotide sequence at the frameshift signal, and still preserve its function. The applicants have also been able to predict that certain regions of the existing signal are redundant, in particular nucleotides 12396 to 12419 marked in Figure 2, and indeed have been able to delete them without affecting the function of the signal. Thus the applicants have the capacity to design a minimal sequence of about 60 nucleotides which will, when placed in any foreign gene, direct high efficiency frameshifting.
In this specification, the applicants describe analysis of the RNA sequences responsible for frameshifting in IBV, and give information about the F1/F2 overlap which is sufficient to induce frameshifting at high efficiency. The upstream limit of this region has the characteristic of a 'slippery' sequence as defined previously (Jacks et al., 1988b) and mutational analysis of this sequence is entirely consistent with frameshifting occurring at this site. The remainder of the region consists of about eighty nucleotides immediately downstream of the 'slippery' sequence and the applicants present evidence that efficient frameshifting depends on the formation, by these sequences, of a tertiary RNA structure in the form of a "pseudoknot" (Studnicka, G.M., Rahn, G.M., Cummings, I.W. and Salser, W.A. (1978). Computer method for predicting the secondary structure of single-stranded RNA. Nucl. Acids Res. 5, 3365-3387; Pleij et al., 1985).
The applicants also describe herein examples of the practical use of these frameshifting signals.
SUMMARY OF THE INVENTION
The present invention provides nucleotide constructs which comprise ribosomal frameshift signals as described herein, and functional variants thereof, together with their production, identification and uses.
The present invention provides a nucleotide construct which comprises a ribosomal frameshift signal which has a nucleotide sequence which, as RNA, forms part or all of a pseudoknot substantially as shown in Figure 7 hereof, or a functional variant thereof, and which causes a ribosome arriving at said ribosomal frameshift signal during translation to slip back by at least one nucleotide, resulting in the translation of the subsequent RNA sequence in a different reading frame. The present invention provides nucleotide constructs wherein the ribosomal frameshift signal comprises part or all of the 83 nucleotide sequence shown in Figure 2 hereof or a functional variant thereof and which causes a ribosome arriving at said ribosomal frameshift signal during translation to slip back by one or more nucleotides, resulting in the translation of the subsequent RNA sequence in a different reading frame. The nucleotide construct may comprise a ribosomal frameshift signal which comprises said 83 nucleotide sequence less one or more of the 24 nucleotides between positions 12396 and 12419 (inclusive). The ribosomal frameshift signal may comprise as little as 59 nucleotides of the 83 nucleotide sequence shown in Figure 2, eg 83 nucleotides less the 24 nucleotides between positions 12396 and 12419 (inclusive).
The ribosomal frameshift signal may comprise the sequence UUUAAAC or a functionally equivalent 'slippery' sequence. Some functionally equivalent 'slippery' sequences are UUUUUUC, UUUAAAU, UUUAAAA and UUUGGGC. Genes are provided which contain part or all of a said ribosomal frameshift signal or said 'slippery' sequences to direct ribosomal frameshifting.
The present invention also provides a nucleotide construct which contains a sequence encoding a first polypeptide followed by a sequence encoding a second polypeptide with said ribosomal frameshift signal located between said sequences, whereby translation of the second polypeptide sequence depends on whether or not frameshifting has occurred. This second polypeptide may be a reporter molecule. The second polypeptide may comprise a first member of a binding pair so as to be detectable by the other member of the binding pair. The second polypeptide can thus act as a tag, allowing detection of the frameshifted product and hence proof that the primary sequence is being translated. The second polypeptide may be a protein or peptide sequence known to be recognised by a specific antibody. In another example, the second polypeptide may be a reporter molecule whose expression level may be monitored quantitatively. The reporter molecule may be an enzyme eg luciferase. Thus cells expressing high levels of the desired target protein may be selected on the basis of expression of the second polypeptide.
In a third example, the second polypeptide may be a membrane-anchor peptide sequence. If the primary polypeptide is destined for secretion from the cell (eg soluble immunoglobulin), addition of the secondary polypeptide in the form of a membrane anchor sequence will result in the protein remaining attached to the cell surface. Thus, depending on the efficiency of the frameshift signal, a proportion of both secreted and membrane-bound protein may be produced.
Also provided are recombinant cloning vectors and replicable expression vectors which contain the nucleotide constructs.
Also provided are recombinant microorganisms and cell cultures containing said recombinant cloning and replicable expression vectors; protein expression products expressed from the replicable expression vectors as herein provided; and nucleotide sequences comprising the nucleotide sequences for said primary and secondary polypeptides with the ribosomal frameshift signal located therebetween.
The present invention also provides a method of monitoring the production of a first polypeptide in a recombinant system, in which a nucleotide construct as herein provided is expressed and the second polypeptide (ie the reporter protein) is detected. In this case, a preparation of the expression product may carry the reporter molecule.
In such a method, the second polypeptide may be detected by use of an antibody, or by use of an enzyme substrate.
The invention further provides a method for the production of both non-membrane bound and membrane bound forms of polypeptide in a recombinant system which comprises expressing a nucleotide construct as herein provided.
The present invention also provides a method which comprises culturing a microorganism or cell culture as hereinbefore described, to express said nucleotide construct and which also comprises separating the different resulting polypeptides from each other.
BRIEF DESCRIPTION OF THE DRAWINGS
In order that the present invention may be more readily understood, embodiments by way of example only, will now be described with reference to the figures wherein:
Figure 1 is a diagram of plasmid pFS7 showing the predicted sizes of protein products which would be expected following ribosomal frameshifting within the F1/F2 overlap during translation of a mRNA derived from SmaI-digested pFS7 template DNA;
Figure 2 shows the nucleotide sequence of the smallest fragment from the F1/F2 junction region of IBV genomic RNA able to direct high efficiency frameshifting in a heterologous context (defined by deletion analysis);
Figure 3 shows the definition of the ribosomal slip site within the IBV frameshifting signal. The mutations shown in panel A were constructed in pFS7 (pFS7.1 etc) or pFS8 (pFS8.1 etc) and analysed by in vitro transcription and translation in the rabbit reticulocyte lysate system (panel B) exactly as described in Brierley et al., 1989 (plasmid pFS8 is a derivative of pFS7 in which a promoter for the phage T7 RNA polymerase has been inserted just upstream of the influenza PB1 gene. The coding sequences and frameshift signal are however identical);
Figure 4 shows the proposed mechanism by which ribosomal slippage occurs (Jacks et al., Cell 55, 447-458, 1988);
Figure 5 shows an analysis of the Slippery Site. Nucleotide changes were constructed in the 7 nucleotide slip site defined in Figure 3 as shown, by site-directed mutagenesis of pFS8. Mutant signals were analysed by in vitro transcription and translation as before;
Figure 6 shows protein coding consequences of appending a minimal frameshift signal to the end of a desired target gene;
Figure 7 shows the proposed structure of the pseudoknot which forms an essential part of the IBV frameshift signal. Base-pairing between nucleotides in the loop of the stem-loop structure and a region downstream (A) results in the formation of an extended double helix, shown schematically (B). The double helical regions S1 and S2 are connected by single-stranded loops L1 and L2. In this structure, S2 is stacked upon S1 such that a right-handed, quasi-continuous double-helix of 16 base-pairs and one mismatched pair is formed. An artist's impression of the three-dimensional organisation of this structure is shown in (C), assuming that 10 base-pairs and one mismatch form in S1, 6 base-pairs form in S2 and one turn of the helix contains 11 base-pairs (Arnott, S; Hukins, D.W.L. and Dover, S.D. (1972); Optimised parameters for RNA double-helices. Biochem. Biophys. Res. Commun. 48, 1392-1399.). In the resulting pseudoknot, L1 (2 nucleotides in length) crosses the deep groove and L2 (32 nucleotides in length) the shallow groove. The 'fold' program of Jacobson et al. (Jacobson, A.B., Good, L; Simonetti, J; and Zucker, M. (1984); Some simple computational methods to improve the folding of large RNA's; Nucl. Acids Res. 12, 54-62) did not predict any significant RNA secondary structures within L2. These diagrams are based on those presented by Rietveld et al. (Rietveld, K; Pleij, C.W.A. and Bosch, L. (1983); Three-dimensional models of the tRNA-like 3'-termini of some plant viral RNA's; EMBO J. 2, 1079-1085.) and Pleij et al. (1985). The precise nucleotides proposed to be part of the stems and loops of the pseudoknot are shown in D;
Figure 8 shows the effect on frameshifting efficiency of changing the relative position of the slip site with respect to the RNA pseudoknot. A. Diagram of the mutations created in pFS7. B. Translation products from mRNAs bearing the altered frameshift signals;
Figure 9 shows a summary of mutational analysis of the loops within the proposed pseudoknot. Mutations were created, as indicated, in pFS8 by site-directed mutagenesis, and analysed as before for frameshifting. Wild type efficiency (20-30%) is indicated by ++, intermediate efficiency (5-20%) by + and low efficiency (1-5%) by +/-;
Figure 10 shows a summary of mutational analysis of the stems within the proposed pseudoknot. Mutations were created, as indicated, in pFS7 and pFS8 by site-directed mutagenesis, and analysed as before for frameshifting. Frameshifting efficiencies are defined by the symbols as for Figure 9;
Figure 11 shows a summary of compensatory mutations made within the pseudoknot stems by mutagenesis of pFS7 and pFS8. Boxed regions corresponding to different sections of the proposed stems, were altered to create destabilising mutations on each strand of the helix, and further changed to restabilise the helix by introducing double mutations. Frameshifting efficiencies are defined by the symbols as for Figure 9;
Figure 12 shows a summary of all mutational changes created within the pseudoknot, and of their effect on frameshift efficiency;
Figure 13 shows the construction of pFScass5. The nucleotide sequence shown in A (unboxed) was introduced into the BglII site (position 483 from the 5' end) of the influenza A/PR8/34 PB2 gene as part of the plasmid pFS1 (Brierley et al., 1987) such that ribosomal slippage into each reading frame results in a characteristically sized product. This sequence contains a slip site in conjunction with a downstream region which should fold to form a pseudoknot as shown in B. Two derivatives of this plasmid, pFScass6 and 7, were constructed by respectively introducing or deleting a single nucleotide in the downstream portion of PB2. The translation products of RNA containing these artificially-designed frameshift signals are shown in C;
Figure 14 shows the testing of the IBV frameshift signal in vaccinia virus infected CV1 cells. Cells were mock-infected, or were infected with WT vaccinia virus or VACFS, which contains the influenza virus PB2 gene interrupted by a frameshift signal. Infected cell proteins were labelled with 35S-methionine, and analysed either directly by gel electrophoresis (total) or after immunoprecipitation with antiserum directed against the C-terminal portion of the PB2 gene (antiPB2) or against the influenza virus PA protein (control). Immunoprecipitation was carried out as described in Brierley et al., 1987. The expected size for the frameshifted product was indicated by in vitro translation of RNA transcribed from the plasmid pFS1;
Figure 15 shows the introduction of a BamHI site into pFS8 (a derivative of pFS7 with a T7 phage RNA polymerase promoter replacing the SP6 promoter) just downstream of the IBV frameshift signal;
Figure 16 shows the plasmid resulting from the genetic manipulation shown in Figure 15;
Figure 17 shows the plasmid resulting from the replacement of the smaller BamHI fragment shown in Figure 16 with a DNA cassette containing the luciferase gene;
Figure 18 shows the sequence at the point of fusion between the IBV frameshift signal and the luciferase gene;
Figure 19 shows the sequence of RNA segment 4 coding for HA. The N-terminal signal sequence is shown by the large box. Glycosylation sites are shown by small boxes and the cleavage site of the signal peptide and the site of cleavage of HA into HA1 and HA2 are shown by arrows; and
Figure 20 shows the sequence of an artificial frameshift signal designed for membrane anchoring.
DESCRIPTION OF EMBODIMENTS
Definition of the IBV Frameshift Signal
The details of this work are presented in two papers: Brierley et al., EMBO J. 6, 3779, 1987; and Brierley et al., Cell 57, 537, 1989. In summary, a fragment of cloned DNA, corresponding to the junction between the F1 and F2 open reading frames (ORFs) on the IBV genome (Boursnell et al., J. Gen. Virol. 68, 57, 1987) was cloned into a suitable reporter gene (the influenza virus PB2 gene) such that an artificial messenger RNA containing the putative frameshift signal could be produced by in vitro transcription of cloned DNA. The recombinant plasmid was constructed in such a way that ribosomal frameshifting on the mRNA could be monitored by translation in a cell-free system (from rabbit reticulocytes); successful frameshifting results in a read-through product of defined size, whose identity could be confirmed through its reactivity with antisera directed against the downstream portion of the reporter gene. Using this approach, the applicants showed that the F1/F2 junction sequence did indeed contain a highly efficient frameshift signal, and that this signal was recognised by eukaryotic ribosomes not only as part of the IBV mRNA, but also in a novel genetic context. The applicants have further shown that the signal is equally efficient when placed in two completely different reporter genes, the influenza virus PB2 gene (Brierley et al., 1987) and the influenza virus PB1 gene (Brierley et al., 1989), suggesting that it may function irrespective of its genetic location.
The precise composition of the frameshift signal was then investigated by site-directed mutagenesis. This was facilitated by the use of a plasmid vector, pFS7 (Figure 1) which carries, in addition to a frameshift signal interrupting a reporter gene, a sequence which allows replication and packaging of the plasmid as a single-stranded DNA. The single-stranded form of the plasmid can then be used directly as a template for site-directed mutagenesis.
Initially the applicants sought to define the extent of the frameshift signal by progressively deleting information from the ends of the 220 nucleotide IBV-derived sequence in pFS7. Using this approach the applicants were able to delete all but 83 nucleotides without affecting the efficiency of frameshifting. The 83 nucleotide "minimal" frameshift signal is shown in Figure 2.
Identification of the 'Slippery Site'
Comparison of this sequence with those known to promote frameshifting in other systems (Jacks et al., 1988b) suggested that a 7 nucleotide sequence, UUUAAAC, located at the very beginning of the signal, very likely represents the 'slippery site' at which ribosomes actually slip back by 1 nucleotide. This has been confirmed by an experiment in which termination codons were introduced on either side of the proposed 'slippery site', such that this 7 nucleotide sequence becomes the only region of overlap between the upstream and downstream ORFs; in this case, a "read-through" product could only be synthesised if the ribosome changed frame within the 7 nucleotide window. Figure 3 shows that frameshifting does indeed still occur efficiently on mRNA produced from this mutant frameshift signal, and so the applicants are confident that the UUUAAAC sequence is indeed the point of slippage.
The mechanism by which the ribosomal slippage occurs on the UUUAAAC sequence is believed to involve firstly recognition of the two codons UUA and AAC (at the peptidyl and acceptor sites respectively on the ribosome) by their cognate tRNAs (Figure 4). Subsequently both tRNAs can slip back by a single nucleotide on the mRNA; the arrangement of sequences in the slip site is such that 2 out of 3 base pair contacts can still be maintained between the tRNAs and the mRNA after slippage. This kind of model suggests that other kinds of 7 nucleotide sequence, which also show similar or greater degrees of pairing in the slipped position, might function equally well as slip sites. The applicants have tested several such sequences (Figure 5) and find that indeed some do allow efficient frameshifting (eg UUUUUUC, UUUAAAU, UUUAAAA). Others however (eg UUUGGGC) are much less efficient, though still functional. Thus from these studies, the applicants are able to make predictions about the potential "slipperiness" of a variety of sequences.
Based on the slippage model, a -2 slippage may also be possible in certain situations where such a shift still allows "2 out of 3" pairing by the slipped tRNAs (eg with sequences of the form UUUUUUN, where N is any nucleotide). In this case however, ribosomes would shift into both the -1 and -2 frames, and so the lack of specificity may render the signal less practically useful.
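By way of illustration only, the "2 out of 3" pairing rule of the slippage model may be sketched in Python as follows. The function name slipped_pairing and its arguments are hypothetical. Simple identity between the codon read before slippage and the codon occupied after slippage is used here as a crude stand-in for codon-anticodon pairing; that simplification is an assumption of this sketch and takes no account of wobble pairing or of the measured efficiencies shown in Figure 5.

```python
def slipped_pairing(mrna, p_site_start, shift):
    """Count, for the P-site and A-site codons, how many of the three
    positions carry the same base after both tRNAs slip back by `shift`
    nucleotides (hypothetical helper; identity approximates pairing)."""
    if p_site_start < shift or p_site_start + 6 > len(mrna):
        raise ValueError("not enough sequence context for this slip")
    counts = []
    for start in (p_site_start, p_site_start + 3):  # P site, then A site
        before = mrna[start:start + 3]                  # codon originally decoded
        after = mrna[start - shift:start - shift + 3]   # codon occupied after the slip
        counts.append(sum(a == b for a, b in zip(before, after)))
    return tuple(counts)


# In the heptanucleotide U UUA AAC the zero-frame P-site codon starts at index 1.
print(slipped_pairing("UUUAAAC", 1, 1))   # (2, 2)
print(slipped_pairing("UUUUUUC", 1, 1))   # (3, 2)
# A -2 slip needs one extra upstream nucleotide; the leading A is arbitrary context.
print(slipped_pairing("AUUUAAAC", 2, 2))  # (1, 1)
print(slipped_pairing("AUUUUUUC", 2, 2))  # (2, 2)
```

Under this approximation UUUAAAC retains two of three positions at both sites after a -1 slip but only one after a -2 slip, whereas a run of six U residues retains at least two at both sites for either slip.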
These data have two important practical implications. The first is that a minimal frameshift signal placed in a novel genetic context will direct ribosomes to change frame almost immediately after encounter with the signal. Second, those ribosomes that fail to change frame at the 'slippery' sequence can be made to terminate directly beyond the slip site (through incorporation of a termination codon in the primary reading frame). Thus, if a minimal frameshift signal is introduced exactly adjacent to the last codon of a particular gene (Figure 6) two different gene products can be generated: a frameshifted product consisting of the original gene with additional sequences fused on at the C terminus, and a "non-frameshifted" version which differs from the wild type gene product only by the presence at its C terminus of an additional 2 or 3 amino acids. Furthermore, since a variety of different 'slippery' sequences may be used in the signal, these additional amino acids can be, to some extent, chosen to minimise their impact on the structure of the native protein.
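The two products described above may be illustrated by a minimal Python sketch. The mRNA used is an invented toy sequence (a two-codon "gene", a UUUAAAC slip site, a zero-frame UGA terminator and a short -1 frame tail); it is not the sequence of Figure 6, and the names translate and frameshift_products are hypothetical. The sketch assumes the simultaneous-slippage model, in which the P-site and A-site codons are decoded in the zero frame before the -1 slip, and uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, start=0):
    """Translate triplets from `start` until a stop codon or the end."""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def frameshift_products(mrna, slip_site="UUUAAAC"):
    """Return (non-frameshifted, -1 frameshifted) peptides.

    Assumes the slip site sits so that its last six nucleotides are the
    zero-frame P-site and A-site codons (X XXY YYZ)."""
    site = mrna.index(slip_site)
    if (site + 1) % 3 != 0:
        raise ValueError("slip site is not in the expected register")
    a_site_end = site + 7
    unshifted = translate(mrna)
    # Zero-frame decoding up to and including the A-site codon, then
    # reading resumes one nucleotide back, at the last base of the site.
    shifted = translate(mrna[:a_site_end]) + translate(mrna, start=a_site_end - 1)
    return unshifted, shifted


# Toy message: AUG GCA | AAU UUA AAC | UGA, followed by a -1 frame tail.
toy_mrna = "AUGGCAAAUUUAAACUGACUCACCAUCAUUAA"
print(frameshift_products(toy_mrna))  # ('MANLN', 'MANLNLTHHH')
```

In this toy case the non-frameshifted product carries three extra residues (NLN) beyond the two-residue "gene", while the frameshifted product continues into the downstream sequence.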
Requirement for an RNA Pseudoknot as Part of the Frameshift Signal
The deletion analysis referred to above indicated, however, that the 'slippery' sequence itself is insufficient to promote efficient ribosomal slippage. The applicants noted that the additional sequences required had the potential to form two sets of base-paired interactions (RNA helices), and the applicants investigated this possibility through construction of mutant frameshift signals, in which the predicted interactions would be de-stabilised by complementary mutations, and re-stabilised by compensatory changes on the opposite strand. This analysis indicated clearly that efficient frameshifting does indeed require formation of both helices. These results imply that the RNA sequences just downstream of the 'slippery site' fold into a kind of RNA tertiary structure known as an RNA pseudoknot (Studnicka et al., Nucl. Acids Res. 5, 3365, 1978; Pleij et al., Nucl. Acids Res. 13, 1717, 1985) (Figure 7), and that this structure is essential for efficient frameshifting.
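The logic of the destabilising and compensatory mutations described above may be sketched in Python as follows. The two stem strands are invented for illustration and are not the IBV stem sequences of Figure 7; the helper names paired_positions and complement_block are hypothetical, and counting Watson-Crick and G-U pairs is only a rough proxy for helix stability.

```python
# Pairs treated as stabilising in this sketch (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def paired_positions(upstream, downstream):
    """Count base pairs when two equal-length strands, both written
    5' to 3', are aligned antiparallel."""
    return sum((a, b) in PAIRS for a, b in zip(upstream, reversed(downstream)))


def complement_block(strand, start, end):
    """Replace strand[start:end] with its complementary bases."""
    block = "".join(COMPLEMENT[base] for base in strand[start:end])
    return strand[:start] + block + strand[end:]


# Hypothetical 8 base-pair stem: the downstream strand is the reverse
# complement of the upstream strand.
up, down = "GCGGCACU", "AGUGCCGC"
up_mut = complement_block(up, 2, 5)      # destabilising change on one strand
down_mut = complement_block(down, 3, 6)  # the opposing positions on the other strand

print(paired_positions(up, down))          # 8  wild type
print(paired_positions(up_mut, down))      # 5  destabilised
print(paired_positions(up, down_mut))      # 5  destabilised
print(paired_positions(up_mut, down_mut))  # 8  restored by the double mutation
```

Each single mutant loses three pairs, while the double mutant regains a full helix with a different primary sequence, which is the pattern the mutagenesis was designed to detect.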
The most likely explanation for this requirement is that the presence of the folded RNA structure arrests the progress of the ribosome during translation, such that it pauses at the 'slippery site', promoting slippage of the bound tRNAs. This explanation is supported by the observation that the distance between the 'slippery site' and the pseudoknot is critical; eg insertion or deletion of just 3 nucleotides between the two elements severely inhibits frameshifting (Figure 8).
Sequence Constraints on the RNA Pseudoknot
The data described above suggest that the primary structure (nucleotide sequence) of the downstream portion of the frameshift signal is much less important than its ability to fold into the correct kind of tertiary structure. This has been borne out by extensive mutagenic studies. Analysis of the sequences predicted to form the two single-stranded loops of the knot (loops 1 and 2, Figure 9) indicates that they may be altered radically without affecting frameshift efficiency. Both of the nucleotides predicted to be part of loop 1 can be changed to the complementary sequence, and either one or three extra nucleotides can be inserted without effect. Loop 2 can either be lengthened (by 6 nucleotides), or shortened from 32 nucleotides to 8 nucleotides. Furthermore, these 8 nucleotides can be changed to a complementary sequence, again with no effect. The only constraint appears to be that the loops must be of a certain minimum length; shortening loop 2 beyond 8 nucleotides proved inhibitory to frameshifting. The applicants presume this reflects the minimum length required to span the top and bottom of the appropriate helix.
Mutation of the nucleotides predicted to be involved with helix formation, as expected, generally proved inhibitory to frameshifting (Figure 10), though changes made at the ends of the helices had a less dramatic effect. However in all cases the inhibitory effects of these changes could be abolished by introduction of additional compensatory base changes on the opposite strand (Figure 11). This strongly supports the proposed model for formation of the pseudoknot. A summary of all the point mutations made throughout the pseudoknot region is presented in Figure 12. From this, it is evident that, apart from the 'slippery' sequence, each nucleotide position within the frameshift signal can be changed without effect, as long as the overall structure is maintained. It is therefore possible to design an RNA sequence with the capacity to form the correct structure for frameshifting to occur, but also with desirable codons. An example of such a signal is shown in Figure 13. Here the signal was designed in such a way that no termination codons were present in any reading frame throughout the length of the RNA sequence, apart from that present just downstream of the 'slippery' sequence (terminating the primary open reading frame). The required sequence was synthesised chemically as a pair of complementary DNA molecules with 5' overhanging ends (sequence GATC) which were annealed and introduced into the appropriate genetic location (a BglII site at position 483 on the influenza PB2 gene; Fields and Winter, Cell 28, 303, 1982) by ligation of compatible ends.
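The design constraint on termination codons mentioned above lends itself to a simple check, sketched here in Python. The function name stop_codons_by_frame is hypothetical and the test sequence is an arbitrary string chosen for illustration; it is not the designed signal of Figure 13.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def stop_codons_by_frame(rna):
    """Map each reading frame (0, 1, 2, counted from the first nucleotide
    of `rna`) to the start indices of any termination codons it contains."""
    return {
        frame: [i for i in range(frame, len(rna) - 2, 3) if rna[i:i + 3] in STOP_CODONS]
        for frame in range(3)
    }


# Arbitrary example: frames 0 and 1 are open, frame 2 contains UGA and UAG.
print(stop_codons_by_frame("GCUGAGCCUAGC"))  # {0: [], 1: [], 2: [2, 8]}
```

A candidate design could be screened by confirming that the only index reported is that of the intended terminator of the primary reading frame.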
The basic plasmid resulting from this cloning procedure was designated pFScass5, and in this case, ribosomal slippage into the '-1' frame should lead to the production of a 19K 'stopped' primary product. Figure 13B shows that this is indeed the case, and that the efficiency of frameshifting is rather greater even than with the wild type signal. The construction of the plasmid is such that ribosomal slippage into the '-2' frame, if it were to occur, should generate an 85K protein. The results (Figure 13B) show however that
pFScass5-generated mRNA directs the synthesis of only a very small amount of an 85K protein, indicating that, as expected for this particular slippery site, efficient frameshifting is confined to the '-1' frame. Two further plasmids were constructed from pFScass5, by deleting
(pFScass6) or inserting (pFScass7) a single nucleotide downstream of the frameshift signal. Messenger RNA transcribed from this plasmid should produce a 28K protein through '-1' frameshifting, and indeed, such a protein is produced very efficiently (Figure 12B).
However, in addition, this mRNA also directed the synthesis of a small amount of an 85K protein. This was surprising, since a protein of this length should only be produced if the ribosome translating through the frameshift signal does not change frame, but ignores, or
'suppresses' the termination codon at the end of the upstream reading frame. Positioning of the IBV pseudoknot some three nucleotides downstream of a terminator may therefore be capable of causing the ribosome occasionally to insert an amino acid at the termination codon, and to read through, in the same frame, into the downstream sequence.
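The difference between the 'stopped' product and the '-1' frameshifted product can be illustrated with a short decoding sketch. This is an illustration under stated assumptions: the example RNA is hypothetical and is not the IBV signal, the slip position is supplied by hand, and the function names are invented for this example.

```python
# Illustrative sketch; the RNA below is hypothetical, not the IBV signal.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in UCAG order; '*' marks a termination codon.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def decode(rna, start, end=None):
    """Translate codons from `start` up to `end`; return (peptide, hit_stop)."""
    end = len(rna) if end is None else min(end, len(rna))
    peptide = []
    for i in range(start, end - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            return "".join(peptide), True
        peptide.append(amino_acid)
    return "".join(peptide), False


def minus_one_product(rna, slip_after):
    """Decode in frame 0 up to index `slip_after` (a codon boundary),
    then step back one nucleotide and continue decoding."""
    head, stopped = decode(rna, 0, slip_after)
    if stopped:
        return head
    tail, _ = decode(rna, slip_after - 1)
    return head + tail


if __name__ == "__main__":
    rna = "AUGUUUAAAUGAGGCCAUUAA"
    print(decode(rna, 0)[0])          # product without a frame change
    print(minus_one_product(rna, 9))  # product after a -1 slip following AAA
```

In this toy sequence the unshifted ribosome stops at the UGA immediately after AAA, whereas the shifted ribosome avoids that terminator and extends the product, which is the behaviour the plasmid series above was built to measure.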
If it proves possible to reproduce this effect in other genetic contexts, a pseudoknot-forming sequence could be inserted just downstream of any gene such that its protein product may be expressed in a truly native form (ie with no additional amino acids whatsoever) and also as a fusion protein with useful downstream sequences attached. In situations where a high proportion of fusion protein expression is not required, this approach might prove preferable to the insertion of a complete frameshift signal.
To date no practical uses of these ribosomal frameshift sequences have been envisaged or proposed by workers in this field.
The applicants have now identified two different types of use.
a) Attachment of a potentially useful reporter molecule to the end of a specific target gene product in order to allow simple monitoring of its production. For this, the frameshift signal is arranged in such a way that ribosomes translating the mRNA produced from the engineered gene produce the desired specific gene product, but on a proportion of occasions, shift frame, resulting in the production of the gene product fused to the reporter.
b) Attachment of a membrane anchor or other form of cellular 'targeting' signal to the end of a specific gene product in order to direct a proportion of that gene product to particular sites within the cell.
Examples of these uses will now be described in more detail.
a) Attachment of an Immunological Marker Sequence to the End of a Protein Whose Expression in Eukaryotic Cells is Difficult to Monitor
There are many instances in which a gene is identified initially by nucleotide sequencing in the absence of any information about its encoded gene product. It is often therefore important to express the gene in a eukaryotic cell as a means of investigating the function of its encoded protein. Several kinds of expression system can be used for this purpose, but one of the major problems with such an approach is that a suitable reagent (such as specific antibody) for detection of the gene product is not usually available. Unless high level expression is achieved (which may, in any case, prove toxic to the cell) it can often therefore be difficult to establish whether or not the desired protein is being produced, particularly if the gene product is subject to post-translational modification and
so is not of the expected molecular weight. One approach to this is to express the target sequence in a heterologous system, for example as a bacterial fusion protein, and to use this protein as an immunogen to raise a specific antiserum. This, however, is a laborious procedure, taking several months, with no guarantee that the resulting serum will indeed recognise the desired target sequence.
It would therefore be of great value, where expression of an uncharacterised gene product in eukaryotic cells is sought, to be sure that the desired polypeptide was indeed being translated in the cell, and to obtain some estimate of its likely expression level. This may be done by expressing the protein as a fusion with some immunologically identifiable protein sequence, but of course there is a high probability that such a protein will no longer retain its correct function. However, by inserting a frameshift signal between the coding sequence for the target protein and that for the immunological marker, in such a way that ribosomes translating the hybrid mRNA normally stop at the end of the target gene (producing the native protein) but occasionally shift frame leading to production of a fusion protein, then one achieves both objectives simultaneously, ie the bulk of the protein produced from the hybrid gene is the native target protein, but the proportion of frameshifted protein still allows monitoring of expression levels immunologically.
If the tagged version of the protein is detected, then one can be confident that the primary protein product is also being translated, and the strength of the detection signal provides a measure of its expression level.
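The arithmetic behind this inference can be sketched as follows. It assumes, purely for illustration, that the frameshift efficiency in the system is known, that every ribosome either terminates (native product) or shifts (tagged product), and that both products are equally stable; none of these figures come from the experiments described here, and the function name is hypothetical.

```python
# Illustrative sketch; the efficiency and amounts are hypothetical values.
def estimate_native_amount(tagged_amount, frameshift_efficiency):
    """Estimate the amount of native (non-frameshifted) protein.

    Assumes each ribosome either terminates (fraction 1 - f) or shifts
    frame (fraction f), and that both products are equally stable.
    """
    f = frameshift_efficiency
    if not 0 < f < 1:
        raise ValueError("frameshift efficiency must lie strictly between 0 and 1")
    return tagged_amount * (1 - f) / f


if __name__ == "__main__":
    # With 25% frameshifting, 10 units of tagged protein imply 30 units of native protein.
    print(estimate_native_amount(10.0, 0.25))
```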
Plasmid pFS1 contains the influenza virus PB2 gene (Brierley et al., 1987, ibid) into which has been cloned
the IBV ribosomal frameshift signal, such that expression of the downstream half of the gene is dependent on ribosomal frameshifting. This plasmid was digested with BamHI to release the complete recombinant gene as a single DNA fragment, and this fragment was cloned into the BamHI site of plasmid pGS20 (Mackett, M., Smith, G.L. and Moss, B. (1982); Vaccinia virus: a selectable eukaryotic cloning and expression vector; Proc. Natl. Acad. Sci. USA 79, 7415-7419). The resulting plasmid (pFSVAC) contains the recombinant gene adjacent to the vaccinia virus 7.5k promoter in the correct orientation to allow its transcription by the vaccinia virus RNA polymerase into messenger RNA. The gene is also flanked by sequences from the vaccinia virus thymidine kinase gene, and so it can be introduced into the genome of vaccinia virus by standard procedures (Smith, G.L., Mackett, M. and Moss, B. (1983); Infectious vaccinia virus recombinants that express hepatitis B virus surface antigen; Nature, London, 302, 490-495). In brief, the plasmid was introduced into CV1 cells by transfection, the cells were superinfected with vaccinia virus, and the progeny virus from this infection were grown in selective medium in order to select virus which had lost thymidine kinase activity, and which was therefore likely to have acquired the influenza virus gene by recombination. A virus which had acquired the correct DNA sequence was identified by Southern Blotting of virus DNA using a radiolabelled probe specific for the influenza virus PB2 gene, and stocks of this virus were prepared. CV1 cells were infected with this virus, and at 10 h post infection, total cell lysates were prepared for analysis, immediately after labelling with 35S-methionine for 2 h. The lysates were then examined by gel electrophoresis and autoradiography (Figure 15). In this experiment it was not possible to detect PB2-specific protein products
expressed from the recombinant virus by direct comparison with lysates of wild type virus infected cells, as is the case for many such experiments using vaccinia virus and other eukaryotic vectors as expression systems (presumably owing to low expression levels). However, immunoprecipitation of the radiolabelled lysates with an antiserum directed against the C-terminal region of the PB1 gene (which should only be expressed through ribosomal frameshifting) clearly precipitated a protein of the molecular weight expected for a frameshifted product. Thus from this experiment, it can be concluded that the N-terminal portion of the PB1 gene is being expressed in the infected cell, even though it cannot be detected directly.
In the example described above, the immunological marker sequence is a relatively large portion of a virus protein, and is detected by immunoprecipitation. It is equally possible, however, to use Western Blotting as a method for detecting the immunological tag, and any protein or peptide sequence which is readily detectable with an antibody could be appended to the target protein in accordance with the protocol described herein.
b) Attachment of the Enzyme Luciferase to the End of a Target Gene Product in Order to Maximise its Expression
Luciferase is a protein of approximately 65K which catalyses the dehydrogenation of the substrate luciferin in the presence of oxygen, ATP and magnesium ions. During this process, 96% of the energy released appears as visible (mostly blue) light. Individual cells expressing luciferase may be detected simply by microscopy, and furthermore the level of expression of the enzyme is indicated by the intensity of the light emitted. Thus, attachment of a luciferase gene to the end of a desired target gene (for example, an influenza virus
gene) in a way that allows cellular production of a hybrid protein (with influenza sequences at the N terminus and luciferase at the C terminus) provides a way of monitoring easily the level of production of the target protein sequences simply by looking for luciferase activity. Of course, this is of limited value since it is usually desirable to express a target protein in its native form, with little or no extraneous sequences. However, by inserting a ribosomal frameshift signal between the target gene and the luciferase, as described above, it is possible to produce, simultaneously, the primary protein sequence alone, as well as the 'luciferase-tagged' version. Since in such a system the production of the luciferase fusion protein is proportional to the production of the primary protein, and since luciferase activity can be detected in living cells, it is possible to use the luciferase detection assay as a means of identifying individual 'high expressor' cells from a cell-line expressing the target gene. Selection can be done by picking suitable cell colonies after microscopic examination for luciferase activity, or through use of a fluorescence activated cell sorting (FACS) machine. Selected cells would still be viable, and so could be amplified in culture to produce a new, more useful, expressing cell line.
Plasmid pFS8 contains the influenza virus PB1 gene inserted next to a promoter for the T7 phage RNA polymerase, with the IBV ribosomal frameshift signal (12,296-12,509) from the IBV genome RNA sequence (Boursnell, M.E.G., Brown, T.D.K., Foulds, I.J., Green, P.F., Tomley, F.M. and Binns, M.M. (1987); Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus; J. Gen. Virol. 68, 57-77) inserted at 1,139 nucleotides from the 5' end of the PB1 coding sequence. This plasmid is
altered by site directed mutagenesis to introduce a new restriction site for BamHI just downstream of the frameshift signal (Figure 16), to create a plasmid with two BamHI restriction sites (Figure 17). This new plasmid is then digested with BamHI, and a DNA 'cassette' consisting of the luciferase gene bounded by BamHI 'sticky' ends is inserted (Figure 18) such that the reading frame is preserved from the IBV sequence into the luciferase sequence at the point of fusion (Figure 19). The luciferase gene cassette is produced by BamHI digestion of plasmid pSK8 (gift from S.M. Kerr, Department of Pathology, University of Cambridge). This plasmid was derived from plasmid pDO432 (Ow et al., Science, 234, 856, 1986), through introduction of a BamHI site 23 nucleotides upstream of the luciferase initiation codon.
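The requirement that the reading frame be preserved across the point of fusion can be checked with a short sketch. It rests on assumptions made only for illustration: the upstream sequence is given starting at a codon boundary of the frame that should continue into the reporter, the 'linker' is whatever lies between that sequence and the reporter's initiation codon, and the sequences used are hypothetical rather than those of Figures 16 to 19.

```python
# Illustrative sketch; sequences and function name are hypothetical.
DNA_STOPS = {"TAA", "TAG", "TGA"}


def junction_in_frame(upstream, linker):
    """Check that upstream + linker keeps the reporter in frame.

    `upstream` must begin at a codon boundary of the frame that is meant
    to continue into the reporter; `linker` is the sequence between it and
    the reporter's first codon. The junction is acceptable if the combined
    length is a multiple of three and no stop codon occurs in that frame.
    """
    joined = (upstream + linker).upper()
    if len(joined) % 3 != 0:
        return False
    codons = (joined[i:i + 3] for i in range(0, len(joined), 3))
    return not any(codon in DNA_STOPS for codon in codons)


if __name__ == "__main__":
    print(junction_in_frame("ATGGCC", "GGATCC"))  # in frame, no stop codon
    print(junction_in_frame("ATGGCC", "GGATC"))   # one nucleotide short
    print(junction_in_frame("ATGGCC", "TGATCC"))  # stop codon in the linker
```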
This hybrid gene is then introduced into eukaryotic cells and expressed by:
(i) synthesis of an artificial mRNA in vitro from the hybrid gene using the T7 RNA polymerase, and microinjection of that mRNA into Xenopus oocytes - this indicates whether luciferase can be expressed in an active form when fused to a foreign protein; and
(ii) subcloning of the hybrid gene into a recognised eukaryotic expression vector, and transfection of cells with the resulting plasmid, to generate stable cell lines expressing the hybrid gene. Potential 'high expressors' are selected on the basis of luciferase activity, and the level of PB1 expression achieved is confirmed by Western Blotting using specific antisera.
c) Attachment of a Membrane Anchor to the End of an 'Exported' Protein Via a Ribosomal Frameshift Signal
Many useful and important molecules synthesised within the eukaryotic cells are destined for export across the cell membrane. Some of these proteins are
simply secreted by the cell, such as hormones and growth factors, but others, having been transferred through the cell membrane, remain embedded in the membrane usually through a C terminal hydrophobic sequence of amino acids, sometimes called a membrane anchor. Examples of proteins in this category include important cell surface 'marker proteins', receptors, and also a large number of virus antigens. There are also proteins which fall into both categories, such as the immunoglobulins, which may be synthesised as secreted or membrane-bound forms, depending on the nature of the cell producing them.
There are situations where it might be useful to generate eukaryotic cells that expressed both a membrane-bound and a secreted form of an exported protein. For example, if a cell designated to express a particular kind of immunoglobulin (Ig) were to produce a membrane bound form of the protein, as well as a secreted form, then one could directly select for cells expressing high levels of the molecule by fluorescence activated cell sorting (FACS). Here, cells with high levels of the desired protein on their surface would be 'tagged' with a reagent recognising specifically the Ig, and 'sorted' by the machine into a separate container.
An alternative use might be in situations where a good immune response was required against a particular virus-specific membrane protein. It is not entirely clear what features of virus proteins give high immunogenicity, but it seems possible that it might be advantageous for the immune system to encounter both a membrane-bound and a secreted form of the protein.
For example, an influenza virus gene, the virus haemagglutinin gene (HA) encodes a well characterised protein which occurs naturally as a spike-like projection on the surface of virus particles and infected cells, and is embedded in the membrane via a classical hydrophobic
membrane anchor sequence. A portion of this membrane anchor sequence may be replaced with a ribosomal frameshift signal, in such a way that ribosomes translating the new HA sequence will usually terminate before the hydrophobic sequence is encountered, leading to the production of a secreted form of the HA. However, occasionally the ribosome will shift frame, translating through into the membrane anchor sequence and leading to the production of an anchored form of the protein. A major problem with the above is that the frameshift signal which is to replace part of the membrane anchor is, in its natural form, a sequence which encodes non-hydrophobic amino acids, and therefore the essential feature of the membrane anchor would be destroyed. However, based on the applicants' discovery that the primary sequence of the frameshift signal can be radically altered as long as the secondary and tertiary structure is preserved, it is possible to design a frameshift signal which encodes hydrophobic amino acids, and which therefore preserves the integrity of the membrane anchor.
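Whether a redesigned signal encodes predominantly hydrophobic amino acids can be estimated by translating it in the relevant frame and averaging a hydropathy score. The sketch below uses the Kyte-Doolittle scale as one possible measure; that choice, the function names and the test sequences are all assumptions for illustration, and the sequences are not those of Figure 21.

```python
# Illustrative sketch; the scale choice and sequences are assumptions.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in UCAG order; '*' marks a termination codon.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encoded_peptide(seq, frame=0):
    """Translate `seq` in the given frame, stopping at the first stop codon."""
    rna = seq.upper().replace("T", "U")
    peptide = []
    for i in range(frame, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def mean_hydropathy(seq, frame=0):
    """Average Kyte-Doolittle score of the peptide encoded in `frame`."""
    peptide = encoded_peptide(seq, frame)
    if not peptide:
        raise ValueError("no amino acids encoded in this frame")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


if __name__ == "__main__":
    print(encoded_peptide("CUGGUGAUU"), mean_hydropathy("CUGGUGAUU"))
    print(encoded_peptide("AAAGAUGAA"), mean_hydropathy("AAAGAUGAA"))
```

A candidate design would be compared in the frame that reads through into the anchor; a clearly positive mean suggests that the replaced stretch remains hydrophobic, although such a score is only a rough guide to membrane anchoring.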
The influenza A/PR8/34 haemagglutinin gene (Figure 20) is cloned into a plasmid such that it is placed under control of a phage T7 RNA polymerase promoter, in a similar fashion to pFS7, and can also be converted to single stranded DNA. Using site-directed mutagenesis, an artificially designed frameshift signal (Figure 21) is engineered into the HA gene, replacing the nucleotides highlighted in Figure 22 (1632-1692). This hybrid gene is transcribed into artificial mRNA, and its capacity for frameshifting assessed by in vitro translation or microinjection (in Xenopus oocytes) of the RNA. The hybrid gene is then subcloned into a plasmid (pGS20) which is then used to introduce the hybrid gene into a recombinant vaccinia virus genome. This recombinant virus is then assessed for its ability to produce the secreted and membrane anchored form of the HA protein in infected cells by immunoprecipitation, Western Blotting, and immunofluorescence with specific anti-HA antibodies.