IE84405B1 - Surface expression libraries of randomized peptides - Google Patents

Surface expression libraries of randomized peptides

Info

Publication number
IE84405B1
Authority
IE
Ireland
Prior art keywords
oligonucleotides
sequence
seq
vector
population
Prior art date
Application number
IE1991/3424A
Other versions
IE913424A1 (en
Original Assignee
Applied Molecular Evolution
Filing date
Application filed by Applied Molecular Evolution
Publication of IE913424A1
Publication of IE84405B1

Classifications

    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K14/00Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/005Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from viruses
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K14/00Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/665Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans derived from pro-opiomelanocortin, pro-enkephalin or pro-dynorphin
    • C07K14/675Beta-endorphins
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • C07K2319/01Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02Fusion polypeptide containing a localisation/targetting motif containing a signal sequence
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • C07K2319/70Fusion polypeptide containing domain for protein-protein interaction
    • C07K2319/735Fusion polypeptide containing domain for protein-protein interaction containing a domain for self-assembly, e.g. a viral coat protein (includes phage display)
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • C07K2319/70Fusion polypeptide containing domain for protein-protein interaction
    • C07K2319/74Fusion polypeptide containing domain for protein-protein interaction containing a fusion for binding to a cell surface receptor
    • C07K2319/75Fusion polypeptide containing domain for protein-protein interaction containing a fusion for binding to a cell surface receptor containing a fusion for activation of a cell surface receptor, e.g. thrombopoeitin, NPY and other peptide hormones
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1037Screening libraries presented on the surface of microorganisms, e.g. phage display, E. coli display
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/70Vectors or expression systems specially adapted for E. coli
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2795/00Bacteriophages
    • C12N2795/00011Details
    • C12N2795/10011Details dsDNA Bacteriophages
    • C12N2795/10022New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes
    • CCHEMISTRY; METALLURGY
    • C40COMBINATORIAL TECHNOLOGY
    • C40BCOMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00Libraries per se, e.g. arrays, mixtures
    • C40B40/02Libraries contained in or displayed by microorganisms, e.g. bacteria or animal cells; Libraries contained in or displayed by vectors, e.g. plasmids; Libraries containing only microorganisms or vectors

Description

PATENTS ACT, 1992 SURFACE EXPRESSION LIBRARIES OF RANDOMIZED PEPTIDES IXSYS, INC.
SURFACE EXPRESSION LIBRARIES OF RANDOMIZED PEPTIDES
BACKGROUND OF THE INVENTION
This invention relates generally to methods for synthesizing and expressing oligonucleotides and, more particularly, to methods for expressing oligonucleotides having random codon sequences.
Oligonucleotide synthesis proceeds via linear coupling of individual monomers in a stepwise reaction. The reactions are generally performed on a solid phase support by first coupling the 3' end of the first monomer to the support. The second monomer is added to the 5' end of the first monomer in a condensation reaction to yield a dinucleotide coupled to the solid support. At the end of each coupling reaction, the by-products and unreacted, free monomers are washed away so that the starting material for the next round of synthesis is the pure oligonucleotide attached to the support. In this reaction scheme, the stepwise addition of individual monomers to a single, growing end of an oligonucleotide ensures accurate synthesis of the desired sequence. Moreover, unwanted side reactions are eliminated, such as the condensation of two oligonucleotides, resulting in high product yields.
In some instances, it is desired that synthetic oligonucleotides have random nucleotide sequences. This result can be accomplished by adding equal proportions of all four nucleotides in the monomer coupling reactions, leading to the random incorporation of all nucleotides and yielding a population of oligonucleotides with random sequences. Since all possible combinations of nucleotide sequences are represented within the population, all possible codon triplets will also be represented. If the objective is ultimately to generate random peptide products, this approach has a severe limitation because the random codons synthesized will bias the amino acids incorporated during translation of the DNA by the cell into polypeptides.
The bias is due to the redundancy of the genetic code.
There are four nucleotide monomers, which leads to sixty-four possible triplet codons. With only twenty amino acids to specify, many of the amino acids are encoded by multiple codons. Therefore, a population of oligonucleotides synthesized by sequential addition of monomers from a random population will not encode peptides whose amino acid sequence represents all possible combinations of the twenty different amino acids in equal proportions. That is, the frequency of amino acids incorporated into polypeptides will be biased toward those amino acids which are specified by multiple codons.
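The redundancy described above can be checked with a short sketch. It assumes the standard genetic code, written here as a compact 64-letter string in T, C, A, G base order; the names `CODON_TABLE` and `codon_usage` are illustrative and do not come from the patent.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumption: standard table, bases ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codon_usage():
    """Count how many of the 64 triplets specify each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE.values())

usage = codon_usage()
total = sum(usage.values())  # 64 equally likely triplets under random monomer coupling
for aa, n in sorted(usage.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{aa}: {n} codons, expected frequency {n / total:.3f}")
```

Under equal monomer proportions, an amino acid with six codons (such as leucine) is expected six times as often as one with a single codon (such as methionine or tryptophan).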
To alleviate amino acid bias due to the redundancy of the genetic code, the oligonucleotides can be synthesized from nucleotide triplets. Here, a triplet coding for each of the twenty amino acids is synthesized from individual monomers. Once synthesized, the triplets are used in the coupling reactions instead of individual monomers. By mixing equal proportions of the triplets, synthesis of oligonucleotides with random codons can be accomplished.
However, the cost of synthesis from such triplets far exceeds that of synthesis from individual monomers because triplets are not commercially available.
Amino acid bias can be reduced, however, by synthesizing the degenerate codon sequence NNK where N is a mixture of all four nucleotides and K is a mixture of guanine and thymine nucleotides. Each position within an oligonucleotide having this codon sequence will contain a total of 32 codons (12 encoding amino acids being represented once, 5 represented twice, 3 represented three times and one codon being a stop codon). Oligonucleotides expressed with such degenerate codon sequences will produce peptide products whose sequences are biased toward those amino acids being represented more than once. Thus, populations of peptides whose sequences are completely random cannot be obtained from oligonucleotides synthesized from degenerate sequences.
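The NNK tally quoted above (12 amino acids once, 5 twice, 3 three times, one stop) can be reproduced with a minimal sketch, again assuming the standard genetic code. The helper names `nnk_codons` and `multiplicity_histogram` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumption), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def nnk_codons():
    """Enumerate NNK: N = any of the four bases, K = G or T."""
    return ["".join(c) for c in product(BASES, BASES, "GT")]

def multiplicity_histogram(codons):
    """Return ({times represented: number of amino acids}, number of stop codons)."""
    per_amino_acid = Counter(CODON_TABLE[c] for c in codons)
    stops = per_amino_acid.pop("*", 0)
    return Counter(per_amino_acid.values()), stops

hist, stops = multiplicity_histogram(nnk_codons())
print(len(nnk_codons()), dict(hist), stops)
```

The amino acids represented three times are those with six codons in the full code (Leu, Ser, Arg), which is why NNK reduces but does not remove the bias.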
There thus exists a need for a method to express oligonucleotides having a fully random or desirably biased sequence which alleviates genetic redundancy. The present invention satisfies these needs and provides additional advantages as well.
SUMMARY OF THE INVENTION
The invention provides a method of constructing a diverse population of vectors containing expressible oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues, comprising operationally linking a diverse population of oligonucleotides encoding completely random codon sequences to expression elements.
The invention also provides a method of constructing a diverse population of vectors having a combined first and second oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues capable of expressing said combined oligonucleotides as said random polypeptides, comprising the steps of: (a) operationally linking sequences from a diverse population of first oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues to a first vector; (b) operationally linking sequences from a diverse population of second oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues to a second vector; and (c) combining the vector products of steps (a) and (b) under conditions where said populations of first and second oligonucleotides are joined together into a population of combined vectors capable of being expressed.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic drawing for synthesizing oligonucleotides from nucleotide monomers with random tuplets at each position using twenty reaction vessels.
Figure 2 is a schematic drawing for synthesizing oligonucleotides from nucleotide monomers with random tuplets at each position using ten reaction vessels.
Figure 3 is a schematic diagram of the two vectors used for sublibrary and library production from precursor oligonucleotide portions. M13IX22 (Figure 3A) is the vector used to clone the anti-sense precursor portions (hatched box). The single-headed arrow represents the Lac p/o expression sequences and the double-headed arrow represents the portion of M13IX22 which is to be combined with M13IX42. The amber stop codon for biological selection and relevant restriction sites are also shown.
M13IX42 (Figure 3B) is the vector used to clone the sense precursor portions (open box). Thick lines represent the pseudo-wild type (ΨgVIII) and wild type (gVIII) gene VIII sequences. The double-headed arrow represents the portion of M13IX42 which is to be combined with M13IX22. The two amber stop codons and relevant restriction sites are also shown. Figure 3C shows the joining of vector populations from sublibraries to form the functional surface expression vector M13IX. Figure 3D shows the generation of a surface expression library in a non-suppressor strain and the production of phage. The phage are used to infect a suppressor strain (Figure 3E) for surface expression and screening of the library.
Figure 4 is a schematic diagram of the vector used for generation of surface expression libraries from random oligonucleotide populations (M13IX30). The symbols are as described for Figure 3.
Figure 5 is the nucleotide sequence of M13IX42 (SEQ ID NO: 1).
Figure 6 is the nucleotide sequence of M13IX22 (SEQ ID NO: 2).
Figure 7 is the nucleotide sequence of M13IX30 (SEQ ID NO: 3).
Figure 8 is the nucleotide sequence of M13ED03 (SEQ ID NO: 4).
Figure 9 is the nucleotide sequence of M13IX421 (SEQ ID NO: 5).
Figure 10 is the nucleotide sequence of M13ED04 (SEQ ID NO: 6).
DETAILED DESCRIPTION OF THE INVENTION
This invention is directed to a simple and inexpensive method for synthesizing and expressing oligonucleotides encoding completely random amino acid residues using individual monomers. The method is advantageous in that individual monomers are used instead of triplets and, by synthesizing only a non-degenerate subset of all triplets, codon redundancy is alleviated. Thus, the oligonucleotides synthesized represent a large proportion of possible random triplet sequences which can be obtained. The oligonucleotides can be expressed, for example, on the surface of filamentous bacteriophage in a form which does not alter phage viability or impose biological selections against certain peptide sequences.
The oligonucleotides produced are therefore useful for generating an unlimited number of pharmacological and research products.
We also describe herein the sequential coupling of monomers to produce oligonucleotides with a desirable bias of random codons. The coupling reactions for the randomization of twenty codons which specify the amino acids of the genetic code are performed in ten different reaction vessels. Each reaction vessel contains a support on which the monomers for two different codons are coupled in three sequential reactions. One of the reactions couples an equal mixture of two monomers such that the final product has two different codon sequences. The codons are randomized by removing the supports from the reaction vessels and mixing them to produce a single batch of supports containing all twenty codons at a particular position. Synthesis at the next codon position proceeds by equally dividing the mixed batch of supports into ten reaction vessels as before and sequentially coupling the monomers for each pair of codons. The supports are again mixed to randomize the codons at the position just synthesized. The cycle of coupling, mixing and dividing continues until the desired number of codon positions have been randomized. After the last position has been randomized, the oligonucleotides with random codons are cleaved from the support. The random oligonucleotides can then be expressed, for example, on the surface of filamentous bacteriophage as gene VIII-peptide fusion proteins. Alternative genes can be used as well.
In its broadest form, the invention provides a method of constructing a diverse population of vectors containing expressible oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues, comprising operationally linking a diverse population of oligonucleotides encoding completely random codon sequences to expression elements. The populations of oligonucleotides can be expressed as fusion products in combination with surface proteins of filamentous bacteriophage, such as M13, as with gene VIII. The vectors can be transfected into a plurality of cells, such as the procaryote E. coli.
The diverse population of oligonucleotides can be formed by randomly combining first and second precursor populations, each precursor population having a desirable bias of random codon sequences. Methods of synthesizing and expressing the diverse population of expressible oligonucleotides are also provided.
In a preferred embodiment, two populations of random oligonucleotides are synthesized. The oligonucleotides within each population encode a portion of the final oligonucleotide which is to be expressed. Oligonucleotides within one population encode the carboxy terminal portion of the expressed oligonucleotides. These oligonucleotides are cloned in frame with a gene VIII (gVIII) sequence so that translation of the sequence produces peptide fusion proteins. The second population of oligonucleotides is cloned into a separate vector. Each oligonucleotide within this population encodes the anti-sense of the amino terminal portion of the expressed oligonucleotides. This vector also contains the elements necessary for expression.
The two vectors containing the random oligonucleotides are combined such that the two precursor oligonucleotide portions are joined together at random to form a population of larger oligonucleotides derived from two smaller portions. The vectors contain selectable markers to ensure maximum efficiency in joining together the two oligonucleotide populations. A mechanism also exists to control the expression of gVIII-peptide fusion proteins during library construction and screening.
As used herein, the term "monomer" or "nucleotide monomer" refers to individual nucleotides used in the chemical synthesis of oligonucleotides. Monomers that can be used include both the ribo- and deoxyribo- forms of each of the five standard nucleotides (derived from the bases adenine (A or dA, respectively), guanine (G or dG), cytosine (C or dC), thymine (T) and uracil (U)).
Derivatives and precursors of bases such as inosine which are capable of supporting polypeptide biosynthesis are also included as monomers. Also included are chemically modified nucleotides, for example, one having a reversible blocking agent attached to any of the positions on the purine or pyrimidine bases, the ribose or deoxyribose sugar or the phosphate or hydroxyl moieties of the monomer. Such blocking groups include, for example, dimethoxytrityl, benzoyl, isobutyryl, beta-cyanoethyl and diisopropylamine groups, and are used to protect hydroxyls, exocyclic amines and phosphate moieties. Other blocking agents can also be used and are known to one skilled in the art.
As used herein, the term "tuplet" refers to a group of elements of a definable size. The elements of a tuplet as used herein are nucleotide monomers. For example, a tuplet can be a dinucleotide, a trinucleotide or can also be four or more nucleotides.
As used herein, the term "codon" or "triplet" refers to a tuplet consisting of three adjacent nucleotide monomers which specify one of the twenty naturally occurring amino acids found in polypeptide biosynthesis. The term also includes nonsense, or stop, codons which do not specify any amino acid.
"Random codons" or "randomized codons," as used herein, refers to more than one codon at a position within a collection of oligonucleotides. The number of different codons can be from two to twenty at any particular position. "Randomized oligonucleotides," as used herein, refers to a collection of oligonucleotides with random codons at one or more positions. "Random codon sequences" as used herein means that more than one codon position within a randomized oligonucleotide contains random codons.
For example, if randomized oligonucleotides are six nucleotides in length (i.e., two codons) and both the first and second codon positions are randomized to encode all twenty amino acids, then a population of oligonucleotides having random sequences with every possible combination of the twenty triplets in the first and second position makes up the above population of randomized oligonucleotides. The number of possible codon combinations is 20^2. Likewise, if randomized oligonucleotides of fifteen nucleotides in length (i.e., five codons) are synthesized which have random codon sequences at all positions encoding all twenty amino acids, then all triplets coding for each of the twenty amino acids will be found in equal proportions at every position. The population constituting the randomized oligonucleotides will contain 20^5 different possible species of oligonucleotides. "Random tuplets," or "randomized tuplets" are defined analogously.
As used herein, the term "bias" refers to a preference. It is understood that there can be degrees of preference or bias toward codon sequences which encode particular amino acids. For example, an oligonucleotide whose codon sequences do not preferably encode particular amino acids is unbiased and therefore completely random. The oligonucleotide codon sequences can also be biased toward predetermined codon sequences or codon frequencies and, while still diverse and random, will exhibit codon sequences biased toward a defined, or preferred, sequence. "A desirable bias of random codon sequences" as used herein, refers to the predetermined degree of bias which can be selected from totally random to essentially, but not totally, defined (or preferred). There must be at least one codon position which is variable, however.
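The counting in the examples above follows from one rule: with c codons allowed at each of n randomized positions, there are c^n possible species. The sketch below is illustrative only. The twenty codons are an assumed choice of one codon per amino acid, and `count_species` is a hypothetical helper name.

```python
from itertools import product

def count_species(n_codons, n_positions):
    """Number of distinct oligonucleotides with n_codons choices at each position."""
    return n_codons ** n_positions

# Illustrative choice of one codon per amino acid (an assumption, any twenty would do).
CODONS = ["TTT", "GTT", "TCT", "CCT", "TAT", "CAT", "TGT", "CGT", "CTG", "ATG",
          "CAG", "GAG", "ACT", "GCT", "AAT", "GAT", "TGG", "GGG", "ATA", "AAA"]

# Enumerate every six-nucleotide (two-codon) species explicitly.
two_codon_species = ["".join(pair) for pair in product(CODONS, repeat=2)]

print(len(two_codon_species))   # two codon positions
print(count_species(20, 5))     # fifteen nucleotides = five codon positions
```

Explicit enumeration is only practical for very short oligonucleotides; for longer ones the closed-form count is the useful quantity.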
As used herein, the term "support" refers to a solid phase material for attaching monomers for chemical synthesis. Such support is usually composed of materials such as beads of controlled pore glass but can be other materials known to one skilled in the art. The term is also meant to include one or more monomers coupled to the support for additional oligonucleotide synthesis reactions.
As used herein, the terms "coupling" or "condensing" refer to the chemical reactions for attaching one monomer to a second monomer or to a solid support. Such reactions are known to one skilled in the art and are typically performed on an automated DNA synthesizer such as a MilliGen/Biosearch Cyclone Plus Synthesizer using procedures recommended by the manufacturer. "Sequentially coupling" as used herein, refers to the stepwise addition of monomers.
A method of synthesizing oligonucleotides having random tuplets using individual monomers is described. The method consists of several steps, the first being synthesis of a nucleotide tuplet for each tuplet to be randomized.
As described here and below, a nucleotide triplet (i.e., a codon) will be used as a specific example of a tuplet. Any size tuplet will work using the methods disclosed herein, and one skilled in the art would know how to use the methods to randomize tuplets of any size.
If the randomization of codons specifying all twenty amino acids is desired at a position, then twenty different codons are synthesized. Likewise, if randomization of only ten codons at a particular position is desired then those ten codons are synthesized. Randomization of codons from two to sixty-four can be accomplished by synthesizing each desired triplet. Preferably, randomization of from two to twenty codons is used for any one position because of the redundancy of the genetic code. The codons selected at one position do not have to be the same codons selected at the next position. Additionally, the sense or anti-sense sequence oligonucleotide can be synthesized. The process therefore provides for randomization of any desired codon position with any number of codons.
Codons to be randomized are synthesized sequentially by coupling the first monomer of each codon to separate supports. The supports for the synthesis of each codon can, for example, be contained in different reaction vessels such that one reaction vessel corresponds to the monomer coupling reactions for one codon. As will be used here and below, if twenty codons are to be randomized, then twenty reaction vessels can be used in independent coupling reactions for the first monomer of each of the twenty codons.
Synthesis proceeds by sequentially coupling the second monomer of each codon to the first monomer to produce a dimer, followed by coupling the third monomer for each codon to each of the above-synthesized dimers to produce a trimer (Figure 1, step 1, where M1, M2 and M3 represent the first, second and third monomer, respectively, for each codon to be randomized).
Following synthesis of the first codons from individual monomers, the randomization is achieved by mixing the supports from all twenty reaction vessels which contain the individual codons to be randomized. The solid phase support can be removed from its vessel and mixed to achieve a random distribution of all codon species within the population (Figure 1, step 2). The mixed population of supports, constituting all codon species, is then redistributed into twenty independent reaction vessels (Figure 1, step 3). The resultant vessels are all identical and contain equal portions of all twenty codons coupled to a solid phase support.
For randomization of the second position codon, synthesis of twenty additional codons is performed in each of the twenty reaction vessels produced in step 3 as the condensing substrates of step 1 (Figure 1, step 4). Steps 1 and 4 are therefore equivalent except that step 4 uses the supports produced by the previous synthesis cycle (steps 1 through 3) for codon synthesis whereas step 1 is the initial synthesis of the first codon in the oligonucleotide. The supports resulting from step 4 will each have two codons (i.e., a hexanucleotide) attached to them with the codon at the first position being any one of twenty possible codons (i.e., random) and the codon at the second position being one of the twenty possible codons.
For randomization of the codon at the second position and synthesis of the third position codon, steps 2 through 4 are again repeated. This process yields in each vessel a three codon oligonucleotide (i.e., 9 nucleotides) with codon positions 1 and 2 randomized and position three containing one of the twenty possible codons. Steps 2 through 4 are repeated to randomize the third position codon and synthesize the codon at the next position. The process is continued until an oligonucleotide of the desired length is achieved. After the final randomization step, the oligonucleotide can be cleaved from the supports and isolated by methods known to one skilled in the art.
Alternatively, the oligonucleotides can remain on the supports for use in methods employing probe hybridization.
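The couple, mix and divide cycle of steps 1 through 4 can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the synthesis protocol itself: each bead is modeled as a single string (all oligonucleotides on one bead are identical), strings are written in order of coupling without regard to 3'/5' orientation, and the twenty codons and the function name `split_and_mix` are illustrative choices.

```python
import random

# Illustrative choice of one codon per amino acid (an assumption).
CODONS = ["TTT", "GTT", "TCT", "CCT", "TAT", "CAT", "TGT", "CGT", "CTG", "ATG",
          "CAG", "GAG", "ACT", "GCT", "AAT", "GAT", "TGG", "GGG", "ATA", "AAA"]

def split_and_mix(n_beads, n_positions, codons=CODONS, seed=0):
    """Simulate one reaction vessel per codon with mixing between positions."""
    rng = random.Random(seed)
    beads = [""] * n_beads  # sequence grown so far on each bead
    for _ in range(n_positions):
        rng.shuffle(beads)  # step 2: pool and mix all supports
        # Step 3: divide the pool into equal portions, one per vessel.
        vessels = [beads[i::len(codons)] for i in range(len(codons))]
        # Steps 1/4: each vessel couples the three monomers of its own codon.
        beads = [oligo + codon
                 for vessel, codon in zip(vessels, codons)
                 for oligo in vessel]
    return beads

library = split_and_mix(n_beads=2000, n_positions=3)
print(len(set(library)), "distinct sequences on", len(library), "beads")
```

Because each bead carries a single sequence in this scheme, the number of distinct sequences can never exceed the number of beads, which is the physical limit on diversity in the twenty-vessel format.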
The diversity of codon sequences, i.e., the number of different possible oligonucleotides, which can be obtained using the methods of the present invention, is extremely large and only limited by the physical characteristics of available materials. For example, a support composed of beads of about 100 µm in diameter will be limited to about 10,000 beads/reaction vessel using a 1 µM reaction vessel containing 25 mg of beads. This size bead can support about 1 x 10^7 oligonucleotides per bead. Synthesis using separate reaction vessels for each of the twenty amino acids will produce beads in which all the oligonucleotides attached to an individual bead are identical. The diversity which can be obtained under these conditions is approximately 10^7 copies of 10,000 x 20 or 200,000 different random oligonucleotides. The diversity can be increased, however, in several ways without departing from the basic methods disclosed herein. For example, the number of possible sequences can be increased by decreasing the size of the individual beads which make up the support. A bead of about 30 µm in diameter will increase the number of beads per reaction vessel and therefore the number of oligonucleotides synthesized.
Another way to increase the diversity of oligonucleotides with random codons is to increase the volume of the reaction vessel. For example, using the same size bead, a larger volume can contain a greater number of beads than a smaller vessel and therefore support synthesis of a greater number of oligonucleotides. Increasing the number of codons coupled to a support in a single reaction vessel also increases the diversity of the random oligonucleotides. The total diversity will be the number of codons coupled per vessel raised to the number of codon positions synthesized. For example, using ten reaction vessels, each synthesizing two codons to randomize a total of twenty codons, the number of different oligonucleotides of ten codons in length per 100 µm bead can be increased where each bead will contain about 2^10 or 1 x 10^3 different sequences instead of one. One skilled in the art will know how to modify such parameters to increase the diversity of oligonucleotides with random codons.
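The two diversity figures in this passage are simple products and powers, worked through below. The function names are hypothetical, and the inputs (10,000 beads per vessel, twenty vessels, two codons per vessel, ten codon positions) are the values used in the text's examples.

```python
def pool_diversity(beads_per_vessel, n_vessels):
    """Distinct sequences when every bead carries exactly one sequence."""
    return beads_per_vessel * n_vessels

def sequences_per_bead(codons_per_vessel, n_positions):
    """Distinct sequences on one bead when each vessel couples several codons."""
    return codons_per_vessel ** n_positions

# Twenty vessels, one codon per vessel: 10,000 x 20 different oligonucleotides.
print(pool_diversity(10_000, 20))

# Ten vessels, two codons per vessel, ten codon positions: 2^10 per bead.
print(sequences_per_bead(2, 10))
```

The second figure is why the text rounds to about 1 x 10^3: coupling a pair of codons at each of ten positions gives 1,024 sequences on a bead that would otherwise carry one.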
A method of synthesizing oligonucleotides having random codons at each position using individual monomers wherein the number of reaction vessels is less than the number of codons to be randomized is also described. For example, if twenty codons are to be randomized at each position within an oligonucleotide population, then ten reaction vessels can be used. The use of a smaller number of reaction vessels than the number of codons to be randomized at each position is preferred because the smaller number of reaction vessels is easier to manipulate and results in a greater number of possible oligonucleotides synthesized.
The use of a smaller number of reaction vessels for random synthesis of twenty codons at a desired position within an oligonucleotide is similar to that described above using twenty reaction vessels except that each reaction vessel can contain the synthesis products of more than one codon. For example, step one synthesis using ten reaction vessels proceeds by coupling about two different codons on supports contained in each of ten reaction vessels. This is shown in Figure 2 where each of the two codons coupled to a different support can consist of the following sequences: (1) (T/G)TT for Phe and Val; (2) (T/C)CT for Ser and Pro; (3) (T/C)AT for Tyr and His; (4) (T/C)GT for Cys and Arg; (5) (C/A)TG for Leu and Met; (6) (C/G)AG for Gln and Glu; (7) (A/G)CT for Thr and Ala; (8) (A/G)AT for Asn and Asp; (9) (T/G)GG for Trp and Gly and (10) A(T/A)A for Ile and Lys. The slash (/) signifies that a mixture of the monomers indicated on each side of the slash is used as if it were a single monomer in the indicated coupling step. The antisense sequence for each of the above codons can be generated by synthesizing the complementary sequence. For example, the antisense for Phe and Val can be AA(C/A). The amino acids encoded by each of the above pairs of sequences are given in the standard three letter nomenclature.
Coupling of the monomers in this fashion will yield codons specifying all twenty of the naturally occurring amino acids attached to supports in ten reaction vessels.
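The ten degenerate sequences listed above can be expanded and translated to confirm that they cover all twenty amino acids without a stop codon. The sketch assumes the standard genetic code and reads vessel (10), A(T/A)A, as the codons ATA and AAA, which the standard code assigns to Ile and Lys. `VESSELS`, `expand` and `antisense` are illustrative names.

```python
from itertools import product

# Standard genetic code (assumption), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# One tuple per reaction vessel; each string is the monomer mixture for one coupling.
VESSELS = [
    ("TG", "T", "T"), ("TC", "C", "T"), ("TC", "A", "T"), ("TC", "G", "T"),
    ("CA", "T", "G"), ("CG", "A", "G"), ("AG", "C", "T"), ("AG", "A", "T"),
    ("TG", "G", "G"), ("A", "TA", "A"),
]

def expand(vessel):
    """List the individual codons produced by one vessel's monomer mixtures."""
    return ["".join(c) for c in product(*vessel)]

def antisense(vessel):
    """Reverse complement of a degenerate codon, keeping mixtures as mixtures."""
    return tuple("".join(COMPLEMENT[b] for b in mix) for mix in reversed(vessel))

for number, vessel in enumerate(VESSELS, start=1):
    codons = expand(vessel)
    print(number, codons, [CODON_TABLE[c] for c in codons])
```

For the first vessel, `antisense(("TG", "T", "T"))` gives `("A", "A", "AC")`, the same mixture the text writes as AA(C/A).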
However, the number of individual reaction vessels to be used will depend on the number of codons to be randomized at the desired position and can be determined by one skilled in the art. For example, if ten codons are to be randomized, then five reaction vessels can be used for coupling. The codon sequences given above can be used for this synthesis as well. The sequences of the codons can also be changed to incorporate or be replaced by any of the additional forty-four codons which constitute the genetic code.
The remaining steps of synthesis of oligonucleotides with random codons using a smaller number of reaction vessels are as outlined above for synthesis with twenty reaction vessels except that the mixing and dividing steps are performed with supports from about half the number of reaction vessels. These remaining steps are shown in Figure 2 (steps 2 through 4).
Oligonucleotides having at least one specified tuplet at a predetermined position and the remaining positions having random tuplets can also be synthesized using the methods described herein. The synthesis steps are similar to those outlined above using twenty or fewer reaction vessels except that prior to synthesis of the specified codon position, the dividing of the supports into separate reaction vessels for synthesis of different codons is omitted. For example, if the codon at the second position of the oligonucleotide is to be specified, then following synthesis of random codons at the first position and mixing of the supports, the mixed supports are not divided into new reaction vessels but, instead, can be contained in a single reaction vessel to synthesize the specified codon. The specified codon is synthesized sequentially from individual monomers as described above. Thus, the number of reaction vessels can be increased or decreased at each step to allow for the synthesis of a specified codon or a desired number of random codons.
Following codon synthesis, the mixed supports are divided into individual reaction vessels for synthesis of the next codon to be randomized (Figure 1, step 3) or can be used without separation for synthesis of a consecutive specified codon. The rounds of synthesis can be repeated for each codon to be added until the desired number of positions with predetermined or randomized codons are obtained.
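The divide, couple and mix cycle with an optional specified position can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the four-codon set, the bead count and the function names are hypothetical, and chains are grown left to right as plain strings even though chemical synthesis proceeds 3' to 5'.

```python
import random

def couple(vessel, codon):
    # Extend every chain in one vessel by the same codon.
    return [chain + codon for chain in vessel]

def random_round(supports, codons, rng):
    # Divide the mixed supports among one vessel per codon,
    # couple a different codon in each vessel, then mix again.
    rng.shuffle(supports)
    n = len(codons)
    vessels = [supports[i::n] for i in range(n)]
    mixed = []
    for vessel, codon in zip(vessels, codons):
        mixed.extend(couple(vessel, codon))
    return mixed

def specified_round(supports, codon):
    # No dividing step: all supports stay in a single vessel.
    return couple(supports, codon)

rng = random.Random(0)
codons = ["TTT", "TCT", "TAT", "TGT"]            # hypothetical 4-codon set
supports = [""] * 400                             # 400 hypothetical beads
supports = random_round(supports, codons, rng)    # position 1: random
supports = specified_round(supports, "ATG")       # position 2: specified
supports = random_round(supports, codons, rng)    # position 3: random
species = set(supports)
```

Every chain carries the specified codon at the second position, while the first and third positions vary independently, giving at most 4 x 4 distinct species in this toy example.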
Oligonucleotides in which the first position codon is specified can also be synthesized using the above method. In this case, the first position codon is synthesized from the appropriate monomers. The supports are divided into the required number of reaction vessels needed for synthesis of random codons at the second position and the rounds of synthesis, mixing and dividing are performed as described above.
A method of synthesizing oligonucleotides having tuplets which are diverse but biased toward a predetermined sequence is also described herein. This method employs two reaction vessels, one vessel for the synthesis of a predetermined sequence and the second vessel for the synthesis of a random sequence. This method is advantageous to use when a significant number of codon positions, for example, are to be of a specified sequence since it alleviates the use of multiple reaction vessels.
Instead, a mixture of four different monomers such as adenine, guanine, cytosine and thymine nucleotides is used for the first and second monomers in the codon. The codon is completed by coupling a mixture of a pair of monomers of either guanine and thymine or cytosine and adenine nucleotides at the third monomer position. In the second vessel, nucleotide monomers are coupled sequentially to yield the predetermined codon sequence. Mixing of the two supports yields a population of oligonucleotides containing both the predetermined codon and the random codons at the desired position. Synthesis can proceed by using this mixture of supports in a single reaction vessel, for example, for coupling additional predetermined codons or, further dividing the mixture into two reaction vessels for synthesis of additional random codons.
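To see what the random vessel produces, the sketch below enumerates codons with all four monomers at the first and second positions and guanine or thymine at the third, then translates them with the standard genetic code. Only the guanine/thymine case is shown, and the variable names are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code as codon -> one-letter amino acid ('*' = stop).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Positions 1 and 2: any of the four monomers; position 3: G or T only.
random_codons = ["".join(p) for p in product("ACGT", "ACGT", "GT")]
encoded = {CODE[c] for c in random_codons}
stops = [c for c in random_codons if CODE[c] == "*"]
```

The 32 codons obtained this way encode all twenty amino acids and include a single stop codon, the amber codon TAG.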
The two reaction vessel method can be used for codon synthesis within an oligonucleotide with a predetermined tuplet sequence by dividing the support mixture into two portions at the desired codon position to be randomized.
Additionally, this method allows for the extent of randomization to be adjusted. For example, unequal mixing or dividing of the two supports will change the fraction of codons with predetermined sequences compared to those with random codons at the desired position. Unequal mixing and dividing of supports can be useful when there is a need to synthesize random codons at a significant number of positions within an oligonucleotide of a longer or shorter length.
The extent of randomization can also be adjusted by using unequal mixtures of monomers in the first, second and third monomer coupling steps of the random codon position.
The unequal mixtures can be in any or all of the coupling steps to yield a population of codons enriched in sequences reflective of the monomer proportions.
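The effect of unequal monomer mixtures and unequal dividing of supports can be estimated with simple probabilities. The sketch below is illustrative only: it assumes that every monomer couples with equal efficiency, so that incorporation follows the mixture proportions, and the example proportions and function names are hypothetical.

```python
from itertools import product

def codon_probabilities(p1, p2, p3):
    """Probability of each codon given monomer fractions at the three
    coupling steps (assumes equal coupling efficiency for all monomers)."""
    probs = {}
    for (b1, f1), (b2, f2), (b3, f3) in product(p1.items(), p2.items(), p3.items()):
        probs[b1 + b2 + b3] = f1 * f2 * f3
    return probs

def fraction_predetermined(split, random_probs, fixed_codon):
    """Fraction of supports carrying the predetermined codon when a fraction
    'split' of the supports goes to the predetermined-sequence vessel."""
    return split + (1 - split) * random_probs.get(fixed_codon, 0.0)

equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
third = {"G": 0.5, "T": 0.5}
# Hypothetical bias: enrich the first monomer position for guanine.
biased_first = {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}

unbiased = codon_probabilities(equal, equal, third)
biased = codon_probabilities(biased_first, equal, third)
half_split = fraction_predetermined(0.5, unbiased, "GCT")
```

With equal mixtures each of the 32 codons is expected at a frequency of 1/32, whereas the biased first position raises the expected frequency of codons beginning with G in proportion to the monomer fraction.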
Synthesis of randomized oligonucleotides is performed using methods well known to one skilled in the art. Linear coupling of monomers can, for example, be accomplished using phosphoramidite chemistry with a MilliGen/Biosearch Cyclone Plus automated synthesizer as described by the manufacturer (Millipore, Burlington, MA). Other chemistries and automated synthesizers can be employed as well and are known to one skilled in the art. Synthesis of multiple codons can be performed without modification to the synthesizer by separately synthesizing the codons in individual sets of reactions. Alternatively, modification of an automated DNA synthesizer can be performed for the simultaneous synthesis of codons in multiple reaction vessels.
In one embodiment, the invention provides a plurality of procaryotic cells containing a diverse population of expressible oligonucleotides operationally linked to expression elements, the expressible oligonucleotides having a desirable bias of random codon sequences produced from diverse combinations of first and second oligonucleotides having a desirable bias of random sequences. The invention provides for a method for constructing such a plurality of procaryotic cells as well.
The oligonucleotides synthesized by the above methods can be used to express a plurality of random peptides which are unbiased, which are diverse but biased toward a predetermined sequence, or which contain at least one specified codon at a predetermined position. The need will determine which type of oligonucleotide is to be expressed to give the resultant population of random peptides and is known to one skilled in the art. Expression can be performed in any compatible vector/host system. Such systems include, for example, plasmids or phagemids in procaryotes such as E. coli, yeast systems, and other eucaryotic systems such as mammalian cells, but will be described herein in context with its presently preferred embodiment, i.e. expression on the surface of filamentous bacteriophage. Filamentous bacteriophage can be, for example, M13, f1 and fd. Such phage have circular single-stranded genomes and double strand replicative DNA forms. Additionally, the peptides can also be expressed in soluble or secreted form depending on the need and the vector/host system employed.
Expression of random peptides on the surface of M13 can be accomplished, for example, using the vector system shown in Figure 3. Construction of the vectors, enabling one of ordinary skill to make them, is explicitly set out in Examples I and II. The complete nucleotide sequences are given in Figures 5, 6 and 7 (SEQ ID NOS: 1, 2 and 3, respectively). This system produces random oligonucleotides functionally linked to expression elements and to gVIII by combining two smaller oligonucleotide portions contained in separate vectors into a single vector. The diversity of oligonucleotide species obtained by this system or others described herein can be 5 x 10^7 or greater. Diversity of less than 5 x 10^7 can also be obtained and will be determined by the need and type of random peptides to be expressed. The random combination of two precursor portions into a larger oligonucleotide increases the diversity of the population several fold and has the added advantage of producing oligonucleotides larger than what can be synthesized by standard methods.
Additionally, although the correlation is not known, when the number of possible paths an oligonucleotide can take during synthesis such as described herein is greater than the number of beads, then there will be a correlation between the synthesis path and the sequences obtained. By combining oligonucleotide populations which are synthesized separately, this correlation will be destroyed. Therefore, any bias which may be inherent in the synthesis procedures will be alleviated by joining two precursor portions into a contiguous random oligonucleotide.
Populations of precursor oligonucleotides to be combined into an expressible form are each cloned into separate vectors. The two precursor portions which make up the combined oligonucleotide correspond to the carboxy and amino terminal portions of the expressed peptide. Each precursor oligonucleotide can encode either the sense or anti-sense strand, and the choice will depend on the orientation of the expression elements and the gene encoding the fusion portion of the protein as well as the mechanism used to join the two precursor oligonucleotides. For the vectors shown in Figure 3, precursor oligonucleotides corresponding to the carboxy terminal portion of the peptide encode the sense strand. Those corresponding to the amino terminal portion encode the anti-sense strand. Oligonucleotide populations are inserted between the Eco RI and Sac I restriction enzyme sites in M13IX22 and M13IX42 (Figure 3A and B). M13IX42 (SEQ ID NO: 1) is the vector used for sense strand precursor oligonucleotide portions and M13IX22 (SEQ ID NO: 2) is used for anti-sense precursor portions.
The populations of randomized oligonucleotides inserted into the vectors are synthesized with Eco RI and Sac I recognition sequences flanking opposite ends of the random codon sequences. The sites allow annealing and ligation of these single strand oligonucleotides into a double stranded vector restricted with Eco RI and Sac I.
Alternatively, the oligonucleotides can be inserted into the vector by standard mutagenesis methods. In this latter method, single stranded vector DNA is isolated from the phage and annealed with random oligonucleotides having known sequences complementary to vector sequences. The oligonucleotides are extended with DNA polymerase to produce double stranded vectors containing the randomized oligonucleotides.
The vector used for sense strand oligonucleotide portions, M13IX42 (Figure 3B), contains down-stream and in frame with the Eco RI and Sac I restriction sites a sequence encoding the pseudo-wild type gVIII product. This gene encodes the wild type M13 gVIII amino acid sequence but has been changed at the nucleotide level to reduce homologous recombination with the wild type gVIII contained on the same vector. The wild type gVIII is present to ensure that at least some functional, non-fusion coat protein will be produced. The inclusion of a wild type gVIII therefore reduces the possibility of non-viable phage production and biological selection against certain peptide fusion proteins. Differential regulation of the two genes can also be used to control the relative ratio of the pseudo and wild type proteins.
Also contained downstream and in frame with the Eco RI and Sac I restriction sites is an amber stop codon. The mutation is located six codons downstream from Sac I and therefore lies between the inserted oligonucleotides and the gVIII sequence. As was the function of the wild type gVIII, the amber stop codon also reduces biological selection when combining precursor portions to produce expressible oligonucleotides. This is accomplished by using a non-suppressor (sup 0) host strain because non-suppressor strains will terminate expression after the oligonucleotide sequences but before the pseudo gVIII sequences. Therefore, the pseudo gVIII will never be expressed on the phage surface under these circumstances. Instead, only soluble peptides will be produced. Expression in a non-suppressor strain can be advantageously utilized when one wishes to produce large populations of soluble peptides. Stop codons other than amber, such as opal and ochre, or molecular switches, such as inducible repressor elements, can also be used to unlink peptide expression from surface expression. Additional controls exist as well and are described below.
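The role of the amber codon as a switch between soluble and fused expression can be illustrated by translating a toy construct in the two kinds of host. This is a minimal sketch: the sequences are hypothetical placeholders rather than the actual vector sequences, the residue inserted at the amber codon depends on the suppressor tRNA (glutamine, as in supE strains, is assumed here), and real suppression is only partially efficient.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code as codon -> one-letter amino acid ('*' = stop).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def translate(dna, suppress_amber=False, suppressor_residue="Q"):
    """Translate in frame; TAG terminates unless an amber suppressor reads it.
    The inserted residue ('Q') is an assumption made for illustration."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = CODE[codon]
        if codon == "TAG" and suppress_amber:
            aa = suppressor_residue
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

peptide = "TTTCCTGGG"        # hypothetical random codons: Phe-Pro-Gly
fusion_partner = "GCTGAG"    # placeholder standing in for pseudo gVIII
construct = peptide + "TAG" + fusion_partner

soluble = translate(construct)                      # non-suppressor host
fused = translate(construct, suppress_amber=True)   # amber suppressor host
```

In the non-suppressor case translation stops after the peptide, while in the suppressor case it reads through into the fusion partner.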
The vector used for anti-sense strand oligonucleotide portions, M13IX22 (Figure 3A), contains the expression elements for the peptide fusion proteins. Upstream and in frame with the Sac I and Eco RI sites in this vector is a leader sequence for surface expression. A ribosome binding site and Lac Z promoter/operator elements are present for transcription and translation of the peptide fusion proteins.
Both vectors contain a pair of Fok I restriction enzyme sites (Figure 3 A and B) for joining together two precursor oligonucleotide portions and their vector sequences. One site is located at the ends of each precursor oligonucleotide which is to be joined. The second Fok I site within the vectors is located at the end of the vector sequences which are to be joined. The 5' overhang of this second Fok I site has been altered to encode a sequence which is not found in the overhangs produced at the first Fok I site within the oligonucleotide portions. The two sites allow the cleavage of each circular vector into two portions and subsequent ligation of essential components within each vector into a single circular vector where the two oligonucleotide precursor portions form a contiguous sequence (Figure 3C). Non-compatible overhangs produced at the two Fok I sites allow optimal conditions to be selected for performing concatemerization or circularization reactions for joining the two vector portions. Such selection of conditions can be used to govern the reaction order and therefore increase the efficiency of joining.
Fok I is a restriction enzyme whose recognition sequence is distal to the point of cleavage. Distal placement of the recognition sequence relative to the cleavage point is important since if the two were superimposed within the oligonucleotide portions to be combined, it would lead to an invariant codon sequence at the juncture. To alleviate the formation of invariant codons at the juncture, Fok I recognition sequences can be placed outside of the random codon sequence and still be used to restrict within the random sequence. Subsequent annealing of the single-strand overhangs produced by Fok I and ligation of the two oligonucleotide precursor portions allows the juncture to be formed. A variety of restriction enzymes restrict DNA by this mechanism and can be used instead of Fok I to join precursor oligonucleotides without creating invariant codon sequences. Such enzymes include, for example, Alw I, Bbv I, Bsp MI, Hga I, Hph I, Mbo II, Mnl I, Ple I and Sfa NI. One skilled in the art knows how to substitute Fok I recognition sequences for alternative enzyme recognition sequences such as those above, and use the appropriate enzyme for joining precursor oligonucleotide portions.
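The distal cleavage can be made concrete with a short sketch. Fok I recognizes GGATG and cuts 9 bases downstream on that strand and 13 bases downstream on the complementary strand, leaving a four-base 5' overhang. The function name and the example sequence are hypothetical, and only a site in the forward orientation is handled.

```python
SITE = "GGATG"                       # Fok I recognition sequence (top strand)
TOP_OFFSET, BOTTOM_OFFSET = 9, 13    # cut positions downstream of the site

def fok1_cut(top):
    """Split the top strand at the first forward Fok I site into
    (upstream part, four-base overhang region, downstream part)."""
    start = top.find(SITE)
    if start < 0:
        raise ValueError("no forward Fok I site found")
    end = start + len(SITE)
    top_cut = end + TOP_OFFSET
    bottom_cut = end + BOTTOM_OFFSET
    if bottom_cut > len(top):
        raise ValueError("sequence too short downstream of the site")
    return top[:top_cut], top[top_cut:bottom_cut], top[bottom_cut:]

# Hypothetical example: site, nine spacer bases, then the region that is cut.
example = "GGATG" + "AAAAAAAAA" + "TTCG" + "CCC"
upstream, overhang, downstream = fok1_cut(example)
```

Because the overhang lies nine bases away from the recognition sequence, its four bases can fall inside a random codon region while the invariant GGATG stays outside it.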
Although the sequences of the precursor oligonucleotides are random and will invariably have oligonucleotides within the two precursor populations whose sequences are sufficiently complementary to anneal after cleavage, the efficiency of annealing can be increased by ensuring that the single-strand overhangs within one precursor population will have a complementary sequence within the second precursor population. This can be accomplished by synthesizing a non-degenerate series of known sequences at the Fok I cleavage site coding for each of the twenty amino acids. Since the Fok I cleavage site contains a four base overhang, forty different sequences are needed to randomly encode all twenty amino acids. For example, if two precursor populations of ten codons in length are to be combined, then after the ninth codon position is synthesized, the mixed population of supports is divided into forty reaction vessels for each of the populations and complementary sequences for each of the corresponding reaction vessels between populations are independently synthesized. The sequences are shown in Tables III and VI of Example I where the oligonucleotides on columns 1R through 40R form complementary overhangs with the oligonucleotides on the corresponding columns 1L through 40L once cleaved.
The degenerate X positions in Table VI are necessary to maintain the reading frame once the precursor oligonucleotide portions are joined. However, restriction enzymes which produce a blunt end, such as Mnl I, can alternatively be used in place of Fok I to alleviate the degeneracy introduced in maintaining the reading frame.
The last feature exhibited by each of the vectors is an amber stop codon located in an essential coding sequence within the vector portion lost during combining (Figure 3C). The amber stop codon is present to select for viable phage produced from only the proper combination of precursor oligonucleotides and their vector sequences into a single vector species. Other non-sense mutations or selectable markers can work as well.
The combining step randomly brings together different precursor oligonucleotides within the two populations into a single vector (Figure 3C: M13IX). The vector sequences donated from each independent vector, M13IX22 and M13IX42, are necessary for production of viable phage. Also, since the expression elements are contained in M13IX22 and the gVIII sequences are contained in M13IX42, expression of functional gVIII-peptide fusion proteins cannot be accomplished until the sequences are linked as shown in M13IX.
The combining step is performed by restricting each population of vectors containing randomized oligonucleotides with Fok I, mixing and ligating (Figure 3C). Any vectors generated which contain an amber stop codon will not produce viable phage when introduced into a non-suppressor strain (Figure 3D). Therefore, only the sequences which do not contain an amber stop codon will make up the final population of vectors contained in the library. These vector sequences are the sequences required for surface expression of randomized peptides. By analogous methodology, more than two vector portions can be combined into a single vector which expresses random peptides.
The invention provides for a method of selecting peptides capable of being bound by a ligand binding protein from a population of random peptides by (a) operationally linking a diverse population of first oligonucleotides having a desirable bias of random codon sequences to a first vector; (b) operationally linking a diverse population of second oligonucleotides having a desirable bias of random codon sequences to a second vector; (c) combining the vector products of steps (a) and (b) under conditions where said populations of first and second oligonucleotides are joined together into a population of combined vectors; (d) introducing said population of combined vectors into a compatible host under conditions sufficient for expressing said population of random peptides; and (e) determining the peptides which bind to said binding protein. The invention also provides for determining the encoding nucleic acid sequence of such peptides as well.
Surface expression of the random peptide library is performed in an amber suppressor strain. As described above, the amber stop codon between the random codon sequence and the gVIII sequence unlinks the two components in a non-suppressor strain. Isolating the phage produced from the non-suppressor strain and infecting a suppressor strain will link the random codon sequences to the gVIII sequence during expression (Figure 3E). Culturing the suppressor strain after infection allows the expression of all peptide species within the library as gVIII-peptide fusion proteins. Alternatively, the DNA can be isolated from the non-suppressor strain and then introduced into a suppressor strain to accomplish the same effect.
The level of expression of gVIII-peptide fusion proteins can additionally be controlled at the transcriptional level. The gVIII-peptide fusion proteins are under the inducible control of the Lac Z promoter/operator system. Other inducible promoters can work as well and are known by one skilled in the art. For high levels of surface expression, the suppressor library is cultured in an inducer of the Lac Z promoter such as isopropylthio-β-galactoside (IPTG). Inducible control is beneficial because biological selection against non-functional gVIII-peptide fusion proteins can be minimized by culturing the library under non-expressing conditions.
Expression can then be induced only at the time of screening to ensure that the entire population of oligonucleotides within the library is accurately represented on the phage surface. Also, this can be used to control the valency of the peptide on the phage surface.
The surface expression library is screened for specific peptides which bind ligand binding proteins by standard affinity isolation procedures. Such methods include, for example, panning, affinity chromatography and solid phase blotting procedures. Panning as described by Parmley and Smith, Gene 73:305-318 (1988), which is incorporated herein by reference, is preferred because high titers of phage can be screened easily, quickly and in small volumes. Furthermore, this procedure can select minor peptide species within the population, which otherwise would have been undetectable, and amplify them to substantially homogenous populations. The selected peptide sequences can be determined by sequencing the nucleic acid encoding such peptides after amplification of the phage population.
The invention provides a plurality of procaryotic cells containing a diverse population of oligonucleotides having a desirable bias of random codon sequences that are operationally linked to expression sequences. The invention provides for methods of constructing such populations of cells as well.
Random oligonucleotides synthesized by any of the methods described previously can also be expressed on the surface of filamentous bacteriophage, such as M13, for example, without the joining together of precursor oligonucleotides. A vector such as that shown in Figure 4, M13IX30, can be used. This vector exhibits all the functional features of the combined vector shown in Figure 3C for surface expression of gVIII-peptide fusion proteins.
The complete nucleotide sequence for M13IX30 (SEQ ID NO: 3) is shown in Figure 7.
M13IX30 contains a wild type gVIII for phage viability and a pseudo gVIII sequence for peptide fusions. The vector also contains in frame restriction sites for cloning random peptides. The cloning sites in this vector are Xho I, Stu I and Spe I. Oligonucleotides should therefore be synthesized with the appropriate complementary ends for annealing and ligation or insertional mutagenesis.
Alternatively, the appropriate termini can be generated by PCR technology. Between the restriction sites and the pseudo gVIII sequence is an in-frame amber stop codon, again, ensuring complete viability of phage in constructing and manipulating the library. Expression and screening is performed as described above for the surface expression library of oligonucleotides generated from precursor portions.
Thus, the invention provides a method of selecting peptides capable of being bound by a ligand binding protein from a population of random peptides by (a) operationally linking a diverse population of oligonucleotides having a desirable bias of random codon sequences to expression elements; (b) introducing said population of vectors into a compatible host under conditions sufficient for expressing said population of random peptides; and (c) determining the peptides which bind to said binding protein. Also provided is a method for determining the encoding nucleic acid sequence of such selected peptides.
The following examples are intended to illustrate, but not limit the invention.
EXAMPLE I
Isolation and Characterization of Peptide Ligands Generated From Right and Left Half Random Oligonucleotides
This example shows the synthesis of random oligonucleotides and the construction and expression of surface expression libraries of the encoded randomized peptides. The random peptides of this example derive from the mixing and joining together of two random oligonucleotides. Also demonstrated is the isolation and characterization of peptide ligands and their corresponding nucleotide sequence for specific binding proteins.
Synthesis of Random Oligonucleotides
The synthesis of two randomized oligonucleotides which correspond to smaller portions of a larger randomized oligonucleotide is shown below. Each of the two smaller portions makes up one-half of the larger oligonucleotide.
The populations of randomized oligonucleotides constituting each half are designated the right and left half. Each population of right and left halves is ten codons in length with twenty random codons at each position. The right half corresponds to the sense sequence of the randomized oligonucleotides and encodes the carboxy terminal half of the expressed peptides. The left half corresponds to the anti-sense sequence of the randomized oligonucleotides and encodes the amino terminal half of the expressed peptides. The right and left halves of the randomized oligonucleotide populations are cloned into separate vector species and then mixed and joined so that the right and left halves come together in random combination to produce a single expression vector species which contains a population of randomized oligonucleotides twenty codons in length. Electroporation of the vector population into an appropriate host produces filamentous phage which express the random peptides on their surface.
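The size of the theoretical sequence space for this design follows directly from the stated numbers: twenty codons at each of ten positions per half, and two halves joined in random combination. The short sketch below simply carries out that arithmetic; the variable names are illustrative.

```python
codons_per_position = 20
half_length = 10          # codons in each of the right and left halves

# Number of distinct sequences possible within one half.
half_space = codons_per_position ** half_length

# Number of distinct twenty-codon sequences after random joining of halves.
joined_space = half_space * half_space
```

Each half spans about 1 x 10^13 possible sequences and the joined oligonucleotide about 1 x 10^26, so any library actually constructed samples only a small fraction of the theoretical space.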
The reaction vessels for oligonucleotide synthesis were obtained from the manufacturer of the automated synthesizer (Millipore, Burlington, MA; supplier of MilliGen/Biosearch Cyclone Plus Synthesizer). The vessels were supplied as packages containing empty reaction columns (1 µmole), frits, crimps and plugs (MilliGen/Biosearch catalog # GEN 860458). Derivatized and underivatized controlled pore glass, phosphoramidite nucleotides, and synthesis reagents were also obtained from MilliGen/Biosearch. Crimper and decrimper tools were obtained from Fisher Scientific Co., Pittsburgh, PA (Catalog numbers 0620 and 0625A, respectively).
Ten reaction columns were used for right half synthesis of random oligonucleotides ten codons in length. The oligonucleotides have 5 monomers at their 3' end of the sequence 5'GAGCT3' and 8 monomers at their 5' end of the sequence 5'AATTCCAT3'. The synthesizer was fitted with a column derivatized with a thymine nucleotide (T-column, MilliGen/Biosearch # 0615.50) and was programmed to synthesize the sequences shown in Table I for each of ten columns in independent reaction sets. The sequence of the last three monomers (from right to left since synthesis proceeds 3' to 5') encodes the indicated amino acids:

Table I
Column        Sequence (5' to 3')    Amino Acids
column 1R     (T/G)TTGAGCT           Phe and Val
column 2R     (T/C)CTGAGCT           Ser and Pro
column 3R     (T/C)ATGAGCT           Tyr and His
column 4R     (T/C)GTGAGCT           Cys and Arg
column 5R     (C/A)TGGAGCT           Leu and Met
column 6R     (C/G)AGGAGCT           Gln and Glu
column 7R     (A/G)CTGAGCT           Thr and Ala
column 8R     (A/G)ATGAGCT           Asn and Asp
column 9R     (T/G)GGGAGCT           Trp and Gly
column 10R    A(T/A)AGAGCT           Ile and Lys

where the two monomers in parentheses denote a single monomer position within the codon and indicate that an equal mixture of each monomer was added to the reaction for coupling. The monomer coupling reactions for each of the columns were performed as recommended by the manufacturer (amidite version 81.06, # 8400-050990, scale 1 µM). After the last coupling reaction, the columns were washed with acetonitrile and lyophilized to dryness.
Following synthesis, the plugs were removed from each column using a decrimper and the reaction products were poured into a single weigh boat. Initially the bead mass increases, due to the weight of the monomers; however, at later rounds of synthesis material is lost. In either case, the material was equalized with underivatized controlled pore glass and mixed thoroughly to obtain a random distribution of all twenty codon species. The reaction products were then aliquotted into 10 new reaction columns by removing 25 mg of material at a time and placing it into separate reaction columns.
Alternatively, the reaction products can be aliquotted by suspending the beads in a liquid that is dense enough for the beads to remain dispersed, preferably a liquid that is equal in density to the beads, and then aliquotting equal volumes of the suspension into separate reaction columns. The lip on the inside of the columns where the frits rest was cleared of material using vacuum suction with a syringe and 25 G needle. New frits were placed onto the lips, the plugs were fitted into the columns and were crimped into place using a crimper.
Synthesis of the second codon position was achieved using the above 10 columns containing the random mixture of reaction products from the first codon synthesis. The monomer coupling reactions for the second codon position are shown in Table II. An A in the first position means that any monomer can be programmed into the synthesizer.
At that position, the first monomer position is not coupled by the synthesizer since the software assumes that the monomer is already attached to the column. An A also denotes that the columns from the previous codon synthesis should be placed on the synthesizer for use in the present synthesis round. Reactions were again sequentially repeated for each column as shown in Table II and the reaction products washed and dried as described above.
Randomization of the second codon position was achieved by removing the reaction products from each of the columns and thoroughly mixing the material. The material was again divided into new reaction columns and prepared for monomer coupling reactions as described above.
Random synthesis of the next seven codons (positions 3 through 9) proceeded identically to the cycle described above for the second codon position and again used the monomer sequences of Table II. Each of the newly repacked columns containing the random mixture of reaction products from synthesis of the previous codon position was used for the synthesis of the subsequent codon position. After synthesis of the codon at position nine and mixing of the reaction products, the material was divided and repacked into 40 different columns and the monomer sequences shown in Table III were coupled to each of the 40 columns in independent reactions. The oligonucleotides from each of the 40 columns were mixed once more and cleaved from the controlled pore glass as recommended by the manufacturer.
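The forty entries of Table III appear to follow a regular pattern: the sequence AATTC, then T or G, then one codon from a degenerate pair of Table I, then a trailing A. The sketch below regenerates the forty sequences from that observed pattern. The pattern is an inference from the listed entries rather than something stated in the text, and the function names are hypothetical.

```python
import re
from itertools import product

# The ten degenerate codon pairs used for the right half (Table I).
PAIRS = ["(T/G)TT", "(T/C)CT", "(T/C)AT", "(T/C)GT", "(C/A)TG",
         "(C/G)AG", "(A/G)CT", "(A/G)AT", "(T/G)GG", "A(T/A)A"]

def expand(pattern):
    """Expand a degenerate triplet, e.g. '(T/G)TT' -> ['TTT', 'GTT']."""
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", pattern)
    choices = [mix.split("/") if mix else [base] for mix, base in tokens]
    return ["".join(p) for p in product(*choices)]

def table3_sequences():
    # Observed pattern: AATTC + (T or G) + codon + A, four entries per pair.
    seqs = []
    for pair in PAIRS:
        for lead, codon in product("TG", expand(pair)):
            seqs.append("AATTC" + lead + codon + "A")
    return seqs

sequences = table3_sequences()   # index 0 is column 1R, index 39 is column 40R
```

Four sequences per codon pair across ten pairs gives the forty columns 1R through 40R.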
Table III
Column        Sequence (5' to 3')
column 1R     AATTCTTTTA
column 2R     AATTCTGTTA
column 3R     AATTCGTTTA
column 4R     AATTCGGTTA
column 5R     AATTCTTCTA
column 6R     AATTCTCCTA
column 7R     AATTCGTCTA
column 8R     AATTCGCCTA
column 9R     AATTCTTATA
column 10R    AATTCTCATA
column 11R    AATTCGTATA
column 12R    AATTCGCATA
column 13R    AATTCTTGTA
column 14R    AATTCTCGTA
column 15R    AATTCGTGTA
column 16R    AATTCGCGTA
column 17R    AATTCTCTGA
column 18R    AATTCTATGA
column 19R    AATTCGCTGA
column 20R    AATTCGATGA
column 21R    AATTCTCAGA
column 22R    AATTCTGAGA
column 23R    AATTCGCAGA
column 24R    AATTCGGAGA
column 25R    AATTCTACTA
column 26R    AATTCTGCTA
column 27R    AATTCGACTA
column 28R    AATTCGGCTA
column 29R    AATTCTAATA
column 30R    AATTCTGATA
column 31R    AATTCGAATA
column 32R    AATTCGGATA
column 33R    AATTCTTGGA
column 34R    AATTCTGGGA
column 35R    AATTCGTGGA
column 36R    AATTCGGGGA
column 37R    AATTCTATAA
column 38R    AATTCTAAAA
column 39R    AATTCGATAA
column 40R    AATTCGAAAA

Left half synthesis of random oligonucleotides proceeded similarly to the right half synthesis. This half of the oligonucleotide corresponds to the anti-sense sequence of the encoded randomized peptides. Thus, the complementary sequence of the codons in Tables I through III is synthesized. The left half oligonucleotides also have 5 monomers at their 3' end of the sequence 5'GAGCT3' and 8 monomers at their 5' end of the sequence 5'AATTCCAT3'. The rounds of synthesis, washing, drying, mixing, and dividing are as described above.

For the first codon position, the synthesizer was fitted with a T-column and programmed to synthesize the sequences shown in Table IV for each of ten columns in independent reaction sets. As with right half synthesis, the sequence of the last three monomers (from right to left) encode the indicated amino acids:

Table IV
Column        Sequence (5' to 3')    Amino Acids
column 1L     AA(A/C)GAGCT           Phe and Val
column 2L     AG(A/G)GAGCT           Ser and Pro
column 3L     AT(A/G)GAGCT           Tyr and His
column 4L     AC(A/G)GAGCT           Cys and Arg
column 5L     CA(G/T)GAGCT           Leu and Met
column 6L     CT(G/C)GAGCT           Gln and Glu
column 7L     AG(T/C)GAGCT           Thr and Ala
column 8L     AT(T/C)GAGCT           Asn and Asp
column 9L     CC(A/C)GAGCT           Trp and Gly
column 10L    T(A/T)TGAGCT           Ile and Cys

Following washing and drying, the plugs for each column were removed, mixed and aliquotted into ten new reaction columns as described above. Synthesis of the second codon position was achieved using these ten columns containing the random mixture of reaction products from the first codon synthesis. The monomer coupling reactions for the second codon position are shown in Table V.

Table V
Column        Sequence (5' to 3')    Amino Acids
column 1L     AA(A/C)A               Phe and Val
column 2L     AG(A/G)A               Ser and Pro
column 3L     AT(A/G)A               Tyr and His
column 4L     AC(A/G)A               Cys and Arg
column 5L     CA(G/T)A               Leu and Met
column 6L     CT(G/C)A               Gln and Glu
column 7L     AG(T/C)A               Thr and Ala
column 8L     AT(T/C)A               Asn and Asp
column 9L     CC(A/C)A               Trp and Gly
column 10L    T(A/T)TA               Ile and Cys

Again, randomization of the second codon position was achieved by removing the reaction products from each of the columns and thoroughly mixing the beads. The beads were repacked into ten new reaction columns.
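Because the left half is synthesized as the anti-sense strand, the amino acids listed for each column are read from the reverse complement of the coupled monomers. The following is a minimal sketch of that relationship, assuming the standard genetic code and treating only the three codon monomers (flanking sequence omitted); the helper names `expand`, `reverse_complement` and `encoded_amino_acids` are illustrative and are not part of the described method.

```python
import re
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(degenerate):
    """Expand notation such as 'AA(A/C)' into ['AAA', 'AAC']."""
    tokens = re.findall(r"\(([ACGT])/([ACGT])\)|([ACGT])", degenerate)
    choices = [(a, b) if a else (single,) for a, b, single in tokens]
    return ["".join(p) for p in product(*choices)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def encoded_amino_acids(antisense_monomers):
    """One-letter amino acids encoded by the sense strand of each variant."""
    return [CODON_TABLE[reverse_complement(c)] for c in expand(antisense_monomers)]


# Columns 1L, 5L and 9L of Table IV (codon monomers only).
for column, monomers in [("1L", "AA(A/C)"), ("5L", "CA(G/T)"), ("9L", "CC(A/C)")]:
    print(column, expand(monomers), encoded_amino_acids(monomers))
```

For column 1L, for example, the anti-sense monomers AAA and AAC correspond to the sense codons TTT and GTT, that is, Phe and Val.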
Random synthesis of the next seven codon positions proceeded identically to the cycle described above for the second codon position and again used the monomer sequences of Table V. After synthesis of the codon at position nine and mixing of the reaction products, the material was divided and repacked into 40 different columns and the monomer sequences shown in Table VI were coupled to each of the 40 columns in independent reactions.

Table VI
Column        Sequence (5' to 3')
column 1L     AATTCCATAAAAXXA
column 2L     AATTCCATAAACXXA
column 3L     AATTCCATAACAXXA
column 4L     AATTCCATAACCXXA
column 5L     AATTCCATAGAAXXA
column 6L     AATTCCATAGACXXA
column 7L     AATTCCATAGGAXXA
column 8L     AATTCCATAGGCXXA
column 9L     AATTCCATATAAXXA
column 10L    AATTCCATATACXXA
column 11L    AATTCCATATGAXXA
column 12L    AATTCCATATGCXXA
column 13L    AATTCCATACAAXXA
column 14L    AATTCCATACACXXA
column 15L    AATTCCATACGAXXA
column 16L    AATTCCATACGCXXA
column 17L    AATTCCATCAGAXXA
column 18L    AATTCCATCAGCXXA
column 19L    AATTCCATCATAXXA
column 20L    AATTCCATCATCXXA
column 21L    AATTCCATCTGAXXA
column 22L    AATTCCATCTGCXXA
column 23L    AATTCCATCTCAXXA
column 24L    AATTCCATCTCCXXA
column 25L    AATTCCATAGTAXXA
column 26L    AATTCCATAGTCXXA
column 27L    AATTCCATAGCAXXA
column 28L    AATTCCATAGCCXXA
column 29L    AATTCCATATTAXXA
column 30L    AATTCCATATTCXXA
column 31L    AATTCCATATCAXXA
column 32L    AATTCCATATCCXXA
column 33L    AATTCCATCCAAXXA
column 34L    AATTCCATCCACXXA
column 35L    AATTCCATCCCAXXA
column 36L    AATTCCATCCCCXXA
column 37L    AATTCCATTATAXXA
column 38L    AATTCCATTATCXXA
column 39L    AATTCCATTTTAXXA
column 40L    AATTCCATTTTCXXA

The first two monomers denoted by an "X" represent an equal mixture of all four nucleotides at that position. This is necessary to retain a relatively unbiased codon sequence at the junction between right and left half oligonucleotides.
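The effect of the two "X" positions can be illustrated by enumerating the sequences that a single column entry stands for. This is only a sketch: `expand_x` is a hypothetical helper, and the count assumes each "X" is an equal mixture of A, C, G and T as stated above.

```python
from itertools import product


def expand_x(seq):
    """Enumerate every sequence represented by a string in which 'X' = A/C/G/T."""
    slots = ["ACGT" if ch == "X" else ch for ch in seq]
    return ["".join(p) for p in product(*slots)]


variants = expand_x("AATTCCATAAAAXXA")  # column 1L of Table VI
print(len(variants))       # 16 sequences per column (4 x 4)
print(40 * len(variants))  # up to 640 distinct end sequences over 40 columns
```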
The above right and left half random oligonucleotides were cleaved and purified from the supports and used in constructing the surface expression libraries below.
Vector Construction
Two M13-based vectors, M13IX42 (SEQ ID NO: 1) and M13IX22 (SEQ ID NO: 2), were constructed for the cloning and propagation of right and left half populations of random oligonucleotides, respectively. The vectors were specially constructed to facilitate the random joining and subsequent expression of right and left half oligonucleotide populations. Each vector within the population contains one right and one left half oligonucleotide from the population joined together to form a single contiguous oligonucleotide with random codons which is twenty-two codons in length. The resultant population of vectors is used to construct a surface expression library.
M13IX42, or the right-half vector, was constructed to harbor the right half populations of randomized oligonucleotides. M13mp18 (Pharmacia, Piscataway, NJ) was the starting vector. This vector was genetically modified to contain, in addition to the encoded wild type M13 gene VIII already present in the vector: (1) a pseudo-wild type M13 gene VIII sequence with a stop codon (amber) placed between it and an Eco RI-Sac I cloning site for randomized oligonucleotides; (2) a pair of Fok I sites to be used for joining with M13IX22, the left-half vector; (3) a second amber stop codon placed on the opposite side of the vector from the portion being combined with the left-half vector; and (4) various other mutations to remove redundant restriction sites and the amino terminal portion of Lac Z.
The pseudo-wild type M13 gene VIII was used for surface expression of random peptides. The pseudo-wild type gene encodes the identical amino acid sequence as that of the wild type gene; however, the nucleotide sequence has been altered so that only 63% identity exists between this gene and the encoded wild type gene VIII. Modification of the gene VIII nucleotide sequence used for surface expression reduces the possibility of recombination with the wild type gene VIII contained on the same vector. Additionally, the wild type M13 gene VIII was retained in the vector system to ensure that at least some functional, non-fusion coat protein would be produced. The inclusion of wild type gene VIII therefore reduces the possibility of non-viable phage production from the random homologous peptide fusion genes.
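The identity figure quoted for the pseudo-wild type gene is a position-by-position comparison of two sequences that encode the same protein. A minimal sketch of such a calculation follows; the two nine-nucleotide sequences are hypothetical illustrations (both encode Glu-Gly-Asp under the standard genetic code) and are not taken from gene VIII.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical synonymous sequences: the third base of each codon differs.
wild_type_like = "GAAGGTGAC"
pseudo_like = "GAGGGCGAT"
print(round(percent_identity(wild_type_like, pseudo_like), 1))
```

Changing only synonymous codon positions in this way lowers nucleotide identity while leaving the encoded amino acid sequence unchanged, which is the property the pseudo-wild type gene relies on.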
The pseudo-wild type gene VIII was constructed by chemically synthesizing a series of oligonucleotides which encode both strands of the gene. The oligonucleotides are presented in Table VII (SEQ ID NOS: 7 through 16).
T ACG AGC AAG GCT TCT TA

Bottom Strand Oligonucleotides
VIII 08    AGC TTA AGA AGC CTT GCT CGT AAA CTT TTT GAA TAA TTT
VIII 09    AAT CCC TAT GGT AGC ACC AAC TAT AAC TAC TAC CAT
VIII 10    AGC CCA AGC GTA GCC AAT GTA CTC AGT AGC ACT TG
VIII 11    C CTG TAA ACT ATT GAA TGC AGC CTT AGC AGG GTC
VIII 12    ATC GCC TTC AGC CTA G

Except for the terminal oligonucleotides VIII 03 (SEQ ID NO: 7) and VIII 08 (SEQ ID NO: 12), the above oligonucleotides (oligonucleotides VIII 04-VIII 07 and 09-12 (SEQ ID NOS: 8 through 11 and 13 through 16)) were mixed at 200 ng each in 10 µl final volume and phosphorylated with T4 polynucleotide kinase (Pharmacia, Piscataway, NJ) with 1 mM ATP at 37°C for 1 hour. The reaction was stopped at 65°C for 5 minutes. Terminal oligonucleotides were added to the mixture and annealed into double-stranded form by heating to 65°C for 5 minutes, followed by cooling to room temperature over a period of 30 minutes. The annealed oligonucleotides were ligated together with 1.0 U of T4 DNA ligase (BRL). The annealed and ligated oligonucleotides yield a double-stranded DNA flanked by a Bam HI site at its 5' end and by a Hind III site at its 3' end. A translational stop codon (amber) immediately follows the Bam HI site. The gene VIII sequence begins with the codon GAA (Glu) two codons 3' to the stop codon. The double-stranded insert was phosphorylated using T4 DNA kinase (Pharmacia, Piscataway, NJ) and ATP (10 mM Tris-HCl, pH 7.5, 10 mM MgCl2) and cloned in frame with the Eco RI and Sac I sites within the M13 polylinker. To do so, M13mp18 was digested with Bam HI (New England Biolabs, Beverly, MA) and Hind III (New England Biolabs) and combined at a molar ratio of 1:10 with the double-stranded insert. The ligations were performed at 16°C overnight in 1X ligase buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (New England Biolabs). The ligation mixture was transformed into a host and screened for positive clones using standard procedures in the art.
Several mutations were generated within the right-half vector to yield functional M13IX42. The mutations were generated using the method of Kunkel et al., Meth. Enzymol. 154:367-382 (1987), which is incorporated herein by reference, for site-directed mutagenesis. The reagents, strains and protocols were obtained from a Bio Rad Mutagenesis kit (Bio Rad, Richmond, CA) and mutagenesis was performed as recommended by the manufacturer.
A Fok I site used for joining the right and left halves was generated 8 nucleotides 5' to the unique Eco RI site using the oligonucleotide 5'-CTCGAATTCGTACATCCTGGTCATAGC-3' (SEQ ID NO: 17). The second Fok I site retained in the vector is naturally encoded at position 3547; however, the sequence within the overhang was changed to encode CTTC. Two Fok I sites were removed from the vector at positions 239 and 7244 of M13mp18 as well as the Hind III site at the end of the pseudo gene VIII sequence using the mutant oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 18) and 5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 19), respectively. New Hind III and Mlu I sites were also introduced at position 3919 and 3951 of M13IX42. The oligonucleotides used for this mutagenesis had the sequences 5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 20) and 5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 21), respectively. The amino terminal portion of Lac Z was deleted by oligonucleotide-directed mutagenesis using the mutant oligonucleotide 5'-GCGGGCCTCTTCGCTATTGCTTAAGAAGCCTTGCT-3' (SEQ ID NO: 22).
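The restriction sites carried by a mutagenic oligonucleotide can be checked by scanning both strands for the enzyme recognition sequences. The sketch below does this for the SEQ ID NO: 17 oligonucleotide, assuming the commonly cited recognition sequences GAATTC (Eco RI) and GGATG (Fok I); the function names are illustrative only, and positions are 0-based offsets within the oligonucleotide, not vector coordinates.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return sorted (offset, strand) hits for a recognition site on either strand."""
    hits = [(m.start(), "+") for m in re.finditer(f"(?={site})", seq)]
    rc = reverse_complement(site)
    if rc != site:  # palindromic sites read the same on both strands
        hits += [(m.start(), "-") for m in re.finditer(f"(?={rc})", seq)]
    return sorted(hits)


SEQ_ID_17 = "CTCGAATTCGTACATCCTGGTCATAGC"
print("Eco RI:", find_sites(SEQ_ID_17, "GAATTC"))
print("Fok I: ", find_sites(SEQ_ID_17, "GGATG"))
```

The Fok I recognition sequence is asymmetric, so in this oligonucleotide it appears as CATCC, the reverse complement of GGATG.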
This deletion also removed a third M13mp18 derived Fok I site. The distance between the Eco RI and Sac I sites was increased to ensure complete double digestion by inserting a spacer sequence. The spacer sequence was inserted using the oligonucleotide 5'-TTCAGCCTAGGATCCGCCGAGCTCTCCTACCTGCGAATTCGTACATCC-3' (SEQ ID NO: 23). Finally, an amber stop codon was placed at position 4492 using the mutant oligonucleotide 5'-TGGATTATACTTCTAAATAATGGA-3' (SEQ ID NO: 24). The amber stop codon is used as a biological selection to ensure the proper recombination of vector sequences to bring together right and left halves of the randomized oligonucleotides.
In constructing the above mutations, all changes made in a M13 coding region were performed such that the amino acid sequence remained unaltered. It should be noted that several mutations within M13mp18 were found which differed from the published sequence. Where known, these sequence differences are recorded herein as found and therefore may not correspond exactly to the published sequence of M13mp18.
The sequence of the resultant vector, M13IX42, is shown in Figure 5 (SEQ ID NO: 1). Figure 3A also shows M13IX42 where each of the elements necessary for producing a surface expression library between right and left half randomized oligonucleotides is marked. The sequence between the two Fok I sites shown by the arrow is the portion of M13IX42 which is to be combined with a portion of the left-half vector to produce random oligonucleotides as fusion proteins of gene VIII.
M13IX22, or the left-half vector, was constructed to harbor the left half populations of randomized oligonucleotides. This vector was constructed from M13mp19 (Pharmacia, Piscataway, NJ) and contains: (1) two Fok I sites for mixing with M13IX42 to bring together the left and right halves of the randomized oligonucleotides; (2) sequences necessary for expression such as a promoter and signal sequence and translation initiation signals; (3) an Eco RI-Sac I cloning site for the randomized oligonucleotides; and (4) an amber stop codon for biological selection in bringing together right and left half oligonucleotides.
Of the two Fok I sites used for mixing M13IX22 with M13IX42, one is naturally encoded in M13mp18 and M13mp19 (at position 3547). As with M13IX42, the overhang within this naturally occurring Fok I site was changed to CTTC.
The other Fok I site was introduced after construction of the translation initiation signals by site-directed mutagenesis using the oligonucleotide 5'-TAACACTCATTCCGGATGGAATTCTGGAGTTGGGT-3' (SEQ ID NO: 25).
The translation initiation signals were constructed by annealing of overlapping oligonucleotides as described above to produce a double-stranded insert containing a 5' Eco RI site and a 3' Hind III site. The overlapping oligonucleotides are shown in Table VIII (SEQ ID NOS: 26 through 34) and were ligated as a double-stranded insert between the Eco RI and Hind III sites of M13mp18 as described for the pseudo gene VIII insert. The ribosome binding site (AGGAGAC) is located in oligonucleotide 015 (SEQ ID NO: 26) and the translation initiation codon (ATG) is the first three nucleotides of oligonucleotide 016 (SEQ

AATT C GCC AAG GAG ACA GTC AT AATG AAA TAC CTA TTG CCT ACG GCA GCC GCT GGA TTG TT ATTA CTC GCT GCC CAA CCA GCC ATG GCC GAG CTC GTG AT GACC CAG ACT CCA GATATC CAA CAG GAA TGA GTG TTA AT TCT AGA ACG CGT C ACGT G ACG CGT TCT AGA AT TAA CACTCA TTC CTG T TG GAT ATC TGG AGT CTG GGT CAT CAC GAG CTC GGC CAT G GC TGG TTG GGC AGC GAG TAA TAA CAA TCC AGC GGC TGC C GT AGG CAA TAG GTA TTT CAT TAT GAC TGT CCT TGG CG

Oligonucleotide 017 (SEQ ID NO: 27) contained a Sac I restriction site 67 nucleotides downstream from the ATG codon. The naturally occurring Eco RI site was removed and a new site introduced 25 nucleotides downstream from the Sac I. Oligonucleotides 5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTA-3' (SEQ ID NO: 35) and 5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 36) were used to generate each of the mutations, respectively. An amber stop codon was also introduced at position 3263 of M13mp18 using the oligonucleotide 5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO: 37).
In addition to the above mutations, a variety of other modifications were made to remove certain sequences and redundant restriction sites. The Lac Z ribosome binding site was removed when the original Eco RI site in M13mp18 was mutated. Also, the Fok I sites at positions 239, 6361 and 7244 of M13mp18 were likewise removed with mutant oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 38), 5'-CGAAAGGGGGGTGTGCTGCAA-3' (SEQ ID NO: 39) and 5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 40), respectively.
Again, mutations within the coding region did not alter the amino acid sequence.
The resultant vector, M13IX22, is 7320 base pairs in length, the sequence of which is shown in Figure 6 (SEQ ID NO: 2). The Sac I and Eco RI cloning sites are at positions 6290 and 6314, respectively. Figure 3A also shows M13IX22 where each of the elements necessary for producing a surface expression library between right and left half randomized oligonucleotides is marked.
Library Construction
Each population of right and left half randomized oligonucleotides from columns 1R through 40R and columns 1L through 40L is cloned separately into M13IX42 and M13IX22, respectively, to create sublibraries of right and left half randomized oligonucleotides. Therefore, a total of eighty sublibraries are generated. Each population of randomized oligonucleotides is maintained separately until the final screening step to ensure maximum efficiency of annealing of right and left half oligonucleotides. The greater efficiency increases the total number of randomized oligonucleotides which can be obtained. Alternatively, one can combine all forty populations of right half oligonucleotides (columns 1R-40R) into one population and of left half oligonucleotides (columns 1L-40L) into a second population to generate just one sublibrary for each.
For the generation of sublibraries, each of the above populations of randomized oligonucleotides is cloned separately into the appropriate vector. The right half oligonucleotides are cloned into M13IX42 to generate sublibraries M13IX42.1R through M13IX42.40R. The left half oligonucleotides are similarly cloned into M13IX22 to generate sublibraries M13IX22.1L through M13IX22.40L. Each vector contains unique Eco RI and Sac I restriction enzyme sites which produce 5' and 3' single-stranded overhangs, respectively, when digested. The single strand overhangs are used for the annealing and ligation of the complementary single-stranded random oligonucleotides.
The randomized oligonucleotide populations are cloned between the Eco RI and Sac I sites by sequential digestion and ligation steps. Each vector is treated with an excess of Eco RI (New England Biolabs) at 37°C for 2 hours followed by addition of 4-24 units of calf intestinal alkaline phosphatase (Boehringer Mannheim, Indianapolis, IN). Reactions are stopped by phenol/chloroform extraction and ethanol precipitation. The pellets are resuspended in an appropriate amount of distilled or deionized water (dH2O). About 10 pmol of vector is mixed with a 5000-fold molar excess of each population of randomized oligonucleotides in 10 µl of 1X ligase buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (BRL, Gaithersburg, MD).
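The oligonucleotide amount implied by the stated molar excess follows from simple arithmetic. The sketch below works it through for the quoted values (10 pmol vector, 5000-fold excess, 10 µl); the function name is illustrative, and the concentration figure assumes all of the oligonucleotide is present in the 10 µl reaction.

```python
def oligo_requirement(vector_pmol, fold_excess, volume_ul):
    """Return (oligo amount in nmol, oligo concentration in µM) for a ligation."""
    oligo_pmol = vector_pmol * fold_excess
    oligo_nmol = oligo_pmol / 1000.0
    concentration_um = oligo_pmol / volume_ul  # 1 pmol/µl equals 1 µM
    return oligo_nmol, concentration_um


nmol, micromolar = oligo_requirement(vector_pmol=10, fold_excess=5000, volume_ul=10)
print(nmol, "nmol of oligonucleotide")
print(micromolar, "µM in the reaction")
```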
The ligation is incubated at 16°C for 16 hours. Reactions are stopped by heating at 75°C for 15 minutes and the DNA is digested with an excess of Sac I (New England Biolabs) for 2 hours. Sac I is inactivated by heating at 75°C for minutes and the volume of the reaction mixture is adjusted to 300 µl with an appropriate amount of 10X ligase buffer and dH2O. One unit of T4 DNA ligase (BRL) is added and the mixture is incubated overnight at 16°C. The DNA is ethanol precipitated and resuspended in TE (10 mM Tris-HCl, pH 8.0, 1 mM EDTA). DNA from each ligation is electroporated into XL1 Blue™ cells (Stratagene, La Jolla, CA), as described below, to generate the sublibraries.
E. coli XL1 Blue™ is electroporated as described by Smith et al., Focus 12:38-40 (1990), which is incorporated herein by reference. The cells are prepared by inoculating a fresh colony of XL1 into 5 mls of SOB without magnesium (20 g bacto-tryptone, 5 g bacto-yeast extract, 0.584 g NaCl, 0.186 g KCl, dH2O to 1,000 mls) and grown with vigorous aeration overnight at 37°C. SOB without magnesium (500 ml) is inoculated at 1:1000 with the overnight culture and grown with vigorous aeration at 37°C until the OD550 is 0.8 (about 2 to 3 h). The cells are harvested by centrifugation at 5,000 rpm (2,600 x g) in a GS3 rotor (Sorvall, Newtown, CT) at 4°C for 10 minutes, resuspended in 500 ml of ice-cold 10% (v/v) sterile glycerol, and centrifuged and resuspended a second time in the same manner. After a third centrifugation, the cells are resuspended in 10% sterile glycerol at a final volume of about 2 ml, such that the OD550 of the suspension is 200 to 300. Usually, resuspension is achieved in the 10% glycerol that remains in the bottle after pouring off the supernate.
Cells are frozen in 40 µl aliquots in microcentrifuge tubes using a dry ice-ethanol bath and stored frozen at -70°C.
Frozen cells are electroporated by thawing slowly on ice before use and mixing with about 10 pg to 500 ng of vector per 40 µl of cell suspension. A 40 µl aliquot is placed in a 0.1 cm electroporation chamber (Bio-Rad, Richmond, CA) and pulsed once at 0°C using a 200 Ω parallel resistor, 25 µF, 1.88 kV, which gives a pulse length (τ) of ~4 ms. A 10 µl aliquot of the pulsed cells is diluted into 1 ml SOC (98 mls SOB plus 1 ml of 2 M MgCl2 and 1 ml of 2 M glucose) in a 12- x 75-mm culture tube, and the culture is shaken at 37°C for 1 hour prior to culturing in selective media (see below).
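The pulse parameters can be related to one another with a simple exponential-decay (RC) model. The following is a rough sketch under that assumption, treating the sample as a resistance in parallel with the 200 Ω resistor; the model and the function names are illustrative and are not taken from the cited protocol.

```python
def time_constant_ms(resistance_ohm, capacitance_f):
    """RC time constant in milliseconds."""
    return resistance_ohm * capacitance_f * 1e3


def field_strength_kv_per_cm(voltage_v, gap_cm):
    """Nominal field strength across the cuvette gap."""
    return voltage_v / 1e3 / gap_cm


def implied_sample_resistance_ohm(tau_ms, parallel_ohm, capacitance_f):
    """Sample resistance that, in parallel with the resistor, gives tau_ms."""
    effective_ohm = tau_ms / 1e3 / capacitance_f
    return 1.0 / (1.0 / effective_ohm - 1.0 / parallel_ohm)


tau_nominal = time_constant_ms(200.0, 25e-6)              # resistor alone
field = field_strength_kv_per_cm(1880.0, 0.1)              # 0.1 cm chamber
sample_r = implied_sample_resistance_ohm(4.0, 200.0, 25e-6)
print(tau_nominal, field, sample_r)
```

Under this model the resistor alone would give about 5 ms, and a field of about 18.8 kV/cm; a pulse length of about 4 ms is what would be expected if the cell suspension contributes a parallel resistance on the order of 800 Ω.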
Each of the eighty sublibraries is cultured using methods known to one skilled in the art. Such methods can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, 1989, and in Ausubel et al., Current Protocols in Molecular Biology, John Wiley and Sons, New York, 1989, both of which are incorporated herein by reference. Briefly, the above 1 ml sublibrary cultures were grown up by diluting 50-fold into 2XYT media (16 g tryptone, 10 g yeast extract, 5 g NaCl) and culturing at 37°C for 5-8 hours. The bacteria were pelleted by centrifugation at 10,000 x g. The supernatant containing phage was transferred to a sterile tube and stored at 4°C.
Double strand vector DNA containing right and left half randomized oligonucleotide inserts is isolated from the cell pellet of each sublibrary. Briefly, the pellet is washed in TE (10 mM Tris, pH 8.0, 1 mM EDTA) and recollected by centrifugation at 7,000 rpm for 5 minutes in a Sorvall centrifuge (Newtown, CT). Pellets are resuspended in 6 mls of 10% sucrose, 50 mM Tris, pH 8.0. 3.0 ml of 10 mg/µl lysozyme is added and incubated on ice for 20 minutes. 12 mls of 0.2 M NaOH, 1% SDS is added followed by minutes on ice. The suspensions are then incubated on ice for 20 minutes after addition of 7.5 mls of 3 M NaOAc, pH 4.6. The samples are centrifuged at 15,000 rpm for 15 minutes at 4°C, RNased and extracted with phenol/chloroform, followed by ethanol precipitation. The pellets are resuspended, weighed and an equal weight of CsCl is dissolved into each tube until a density of 1.60 g/ml is achieved. EtBr is added to 600 µg/ml and the double-stranded DNA is isolated by equilibrium centrifugation in a TV-1665 rotor (Sorvall) at 50,000 rpm for 6 hours. These DNAs from each right and left half sublibrary are used to generate forty libraries in which the right and left halves of the randomized oligonucleotides have been randomly joined together.
Each of the forty libraries is produced by joining together one right half and one left half sublibrary. The two sublibraries joined together correspond to the same column number for right and left half random oligonucleotide synthesis. For example, sublibrary M13IX42.1R is joined with M13IX22.1L to produce the surface expression library M13IX.1RL. In the alternative situation where only two sublibraries are generated from the combined populations of all right half synthesis and all left half synthesis, only one surface expression library would be produced.
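The pairing rule, in which sublibraries with the same column number are joined, can be written out as a short enumeration. This sketch only generates the names following the M13IX42.1R + M13IX22.1L = M13IX.1RL pattern given in the example; `library_pairs` is a hypothetical helper.

```python
def library_pairs(n_columns=40):
    """Return (right sublibrary, left sublibrary, joined library) name triples."""
    return [
        (f"M13IX42.{i}R", f"M13IX22.{i}L", f"M13IX.{i}RL")
        for i in range(1, n_columns + 1)
    ]


pairs = library_pairs()
print(len(pairs))   # one surface expression library per column number
print(pairs[0])
print(pairs[-1])
```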
For the random joining of each right and left half oligonucleotide population into a single surface expression vector species, the DNAs isolated from each sublibrary are digested with an excess of Fok I (New England Biolabs). The reactions are stopped by phenol/chloroform extraction, followed by ethanol precipitation. Pellets are resuspended in dH2O. Each surface expression library is generated by ligating equal molar amounts (5-10 pmol) of Fok I digested DNA isolated from corresponding right and left half sublibraries in 10 µl of 1X ligase buffer containing 1.0 U of T4 DNA ligase (Bethesda Research Laboratories, Gaithersburg, MD). The ligations proceed overnight at 16°C and are electroporated into the sup 0 strain MK30-3 (Boehringer Mannheim Biochemical (BMB), Indianapolis, IN) as previously described for XL1 cells.
Because MK30-3 is sup 0, only the vector portions encoding the randomized oligonucleotides which come together will produce viable phage.
Screening of Surface Expression Libraries
Purified phage are prepared from 50 ml liquid cultures of XL1 Blue™ cells (Stratagene) which are infected at a m.o.i. of 10 from the phage stocks stored at 4°C. The cultures are induced with 2 mM IPTG. Supernatants from all cultures are combined and cleared by two centrifugations, and the phage are precipitated by adding 1/7.5 volumes of PEG solution (25% PEG-8000, 2.5 M NaCl), followed by incubation at 4°C overnight. The precipitate is recovered by centrifugation for 90 minutes at 10,000 x g. Phage pellets are resuspended in 25 ml of 0.01 M Tris-HCl, pH 7.6, 1.0 mM EDTA, and 0.1% sarkosyl and then shaken slowly at room temperature for 30 minutes. The solutions are adjusted to 0.5 M NaCl and to a final concentration of 5% polyethylene glycol. After 2 hours at 4°C, the precipitates containing the phage are recovered by centrifugation for 1 hour at 15,000 x g. The precipitates are resuspended in 10 ml of NET buffer (0.1 M NaCl, 1.0 mM EDTA, and 0.01 M Tris-HCl, pH 7.6), mixed well, and the phage repelleted by centrifugation at 170,000 x g for 3 hours. The phage pellets are subsequently resuspended overnight in 2 ml of NET buffer and subjected to cesium chloride centrifugation for 18 hours at 110,000 x g (3.86 g of cesium chloride in 10 ml of buffer). Phage bands are collected, diluted 7-fold with NET buffer, recentrifuged at 170,000 x g for 3 hours, resuspended, and stored at 4°C in 0.3 ml of NET buffer containing 0.1 mM sodium azide.
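The final PEG and NaCl concentrations in the first precipitation follow from the dilution of the stock solution into the supernatant. The sketch below computes them, assuming that 1/7.5 volume of stock is added per volume of supernatant and neglecting any salt already present in the culture medium; `final_concentration` is an illustrative helper.

```python
def final_concentration(stock, added_volumes):
    """Concentration after adding `added_volumes` of stock to 1 volume of sample."""
    return stock * added_volumes / (1.0 + added_volumes)


peg_percent = final_concentration(25.0, 1 / 7.5)  # stock is 25% PEG-8000
nacl_molar = final_concentration(2.5, 1 / 7.5)    # stock is 2.5 M NaCl
print(round(peg_percent, 2), "% PEG")
print(round(nacl_molar, 3), "M NaCl from the PEG solution")
```

On these assumptions the first precipitation is carried out at roughly 2.9% PEG and 0.29 M added NaCl, somewhat milder than the 5% PEG, 0.5 M NaCl conditions used for the second precipitation.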
Ligand binding proteins used for panning on streptavidin coated dishes are first biotinylated and then absorbed against UV-inactivated blocking phage (see below).
The biotinylating reagents are dissolved in dimethylformamide at a ratio of 2.4 mg solid NHS-SS-Biotin (sulfosuccinimidyl 2-(biotinamido)ethyl-1,3'-dithiopropionate; Pierce, Rockford, IL) to 1 ml solvent and used as recommended by the manufacturer. Small-scale reactions are accomplished by mixing 1 µl dissolved reagent with 43 µl of 1 mg/ml ligand binding protein diluted in sterile bicarbonate buffer (0.1 M NaHCO3, pH 8.6). After 2 hours at 25°C, residual biotinylating reagent is reacted with 500 µl 1 M ethanolamine (pH adjusted to 9 with HCl) for an additional 2 hours. The entire sample is diluted with 1 ml TBS containing 1 mg/ml BSA, concentrated to about 50 µl on a Centricon 30 ultra-filter (Amicon), and washed on the same filter three times with 2 ml TBS and once with 1 ml TBS containing 0.02% NaN3 and 7 x 10" UV-inactivated blocking phage (see below); the final retentate (60-80 µl) is stored at 4°C. Ligand binding proteins biotinylated with the NHS-SS-Biotin reagent are linked to biotin via a disulfide-containing chain.
UV-irradiated M13 phage were used for blocking binding proteins which fortuitously bound filamentous phage in general. M13mp8 (Messing and Vieira, Gene 19: 262-276 (1982), which is incorporated herein by reference) was chosen because it carries two amber stop codons, which ensure that the few phage surviving irradiation will not grow in the sup 0 strains used to titer the surface expression libraries. A 5 ml sample containing 5 x 10" M13mp8 phage, purified as described above, was placed in a small petri plate and irradiated with a germicidal lamp at a distance of two feet for 7 minutes (flux 150 µW/cm²).
NaN3 was added to 0.02% and phage particles concentrated to " particles/ml on a Centricon 30-kDa ultrafilter (Amicon).
For panning, polystyrene petri plates (60 x 15 mm, Falcon; Becton Dickinson, Lincoln Park, NJ) are incubated with 1 ml of 1 mg/ml of streptavidin (BMB) in 0.1 M NaHCO3 pH 8.6-0.02% NaN3 in a small, air-tight plastic box overnight in a cold room. The next day streptavidin is removed and replaced with at least 10 ml blocking solution (29 mg/ml of BSA; 3 µg/ml of streptavidin; 0.1 M NaHCO3 pH 8.6-0.02% NaN3) and incubated at least 1 hour at room temperature. The blocking solution is removed and plates are washed rapidly three times with Tris buffered saline containing 0.5% Tween 20 (TBS-0.5% Tween 20).
Selection of phage expressing peptides bound by the ligand binding proteins is performed with 5 µl (2.7 µg ligand binding protein) of blocked biotinylated ligand binding proteins reacted with a 50 µl portion of each library. Each mixture is incubated overnight at 4°C, diluted with 1 ml TBS-0.5% Tween 20, and transferred to a streptavidin-coated petri plate prepared as described above. After rocking 10 minutes at room temperature, unbound phage are removed and plates washed ten times with TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound phage are eluted from plates with 800 µl sterile elution buffer (1 mg/ml BSA, 0.1 M HCl, pH adjusted to 2.2 with glycerol) for 15 minutes and eluates neutralized with 48 µl 2 M Tris (pH unadjusted). A 20 µl portion of each eluate is titered on MK30-3 concentrated cells with dilutions of input phage.
A second round of panning is performed by treating 750 µl of first eluate from each library with 5 mM DTT for 10 minutes to break disulfide bonds linking biotin groups to residual biotinylated binding proteins. The treated eluate is concentrated on a Centricon 30 ultrafilter (Amicon), washed three times with TBS-0.5% Tween 20, and concentrated to a final volume of about 50 µl. Final retentate is transferred to a tube containing 5.0 µl (2.7 µg ligand binding protein) blocked biotinylated ligand binding proteins and incubated overnight. The solution is diluted with 1 ml TBS-0.5% Tween 20, panned, and eluted as described above on fresh streptavidin-coated petri plates.
The entire second eluate (800 µl) is neutralized with 48 µl 2 M Tris, and 20 µl is titered simultaneously with the first eluate and dilutions of the input phage.
Individual phage populations are purified through 2 to 3 rounds of plaque purification. Briefly, the second eluate titer plates are lifted with nitrocellulose filters (Schleicher & Schuell, Inc., Keene, NH) and processed by washing for 15 minutes in TBS (10 mM Tris-HCl, pH 7.2, 150 mM NaCl), followed by an incubation with shaking for an additional 1 hour at 37°C with TBS containing 5% nonfat dry milk (TBS-5% NDM) at 0.5 ml/cm². The wash is discarded and fresh TBS-5% NDM is added (0.1 ml/cm²) containing the ligand binding protein between 1 nM to 100 mM, preferably between 1 to 100 µM. All incubations are carried out in heat-sealable pouches (Sears). Incubation with the ligand binding protein proceeds for 12-16 hours at 4°C with shaking. The filters are removed from the bags and washed 3 times for 30 minutes at room temperature with 150 mls of TBS containing 0.1% NDM and 0.2% NP-40 (Sigma, St. Louis, MO). The filters are then incubated for 2 hours at room temperature in antiserum against the ligand binding protein at an appropriate dilution in TBS-0.5% NDM, washed in 3 changes of TBS containing 0.1% NDM and 0.2% NP-40 as described above and incubated in TBS containing 0.1% NDM and 0.2% NP-40 with 1 x 10^6 cpm of 125I-labeled Protein A (specific activity = 2.1 x 10^7 cpm/µg). After a washing with TBS containing 0.1% NDM and 0.2% NP-40 as described above, the filters are wrapped in Saran Wrap and exposed to Kodak X-Omat x-ray film (Kodak, Rochester, NY) for 1-12 hours at -70°C using Dupont Cronex Lightning Plus Intensifying Screens (Dupont, Wilmington, DE).

Template Preparation and Sequencing
Templates are prepared for sequencing by inoculating a 1 ml culture of 2XYT containing a 1:100 dilution of an overnight culture of XL1 with an individual plaque.
The plaques are picked using a sterile toothpick. The culture is incubated at 37°C for 5-6 hours with shaking and then transferred to a 1.5 ml microfuge tube. 200 µl of PEG solution is added, followed by vortexing, and the tube is placed on ice for 10 minutes. The phage precipitate is recovered by centrifugation in a microfuge at 12,000 x g for 5 minutes.
The supernatant is discarded and the pellet is resuspended in 230 µl of TE (10 mM Tris-HCl, pH 7.5, 1 mM EDTA) by gently pipetting with a yellow pipet tip. Phenol (200 µl) is added, followed by a brief vortex, and the tube is microfuged to separate the phases. The aqueous phase is transferred to a separate tube and extracted with 200 µl of phenol/chloroform (1:1) as described above for the phenol extraction. A 0.1 volume of 3 M NaOAc is added, followed by addition of 2.5 volumes of ethanol, and the templates are precipitated at -20°C for 20 minutes. The precipitated templates are recovered by centrifugation in a microfuge at 12,000 x g for 8 minutes. The pellet is washed in 70% ethanol, dried and resuspended in 25 µl TE. Sequencing was performed using a Sequenase™ sequencing kit following the protocol supplied by the manufacturer (U.S. Biochemical, Cleveland, OH).
EXAMPLE II
Isolation and Characterization of Peptide Ligands Generated From Oligonucleotides Having Random Codons at Two Predetermined Positions
This example shows the generation of a surface expression library from a population of oligonucleotides having randomized codons. The oligonucleotides are ten codons in length and are cloned into a single vector species for the generation of an M13 gene VIII-based surface expression library. The example also shows the selection of peptides for a ligand binding protein and characterization of their encoded nucleic acid sequences.

Oligonucleotide Synthesis
Oligonucleotides were synthesized as described in Example I. The synthesizer was programmed to synthesize the sequences in Table IX. These sequences correspond to the first random codon position synthesized and 3' flanking sequences of the oligonucleotide which hybridizes to the leader sequence in the vector. The complementary sequences are used for insertional mutagenesis of the synthesized population of oligonucleotides.

Table IX
Column        Sequence (5' to 3')
column 1      AA(A/C)GGTTGGTCGGTACCGG
column 2      AG(A/G)GGTTGGTCGGTACCGG
column 3      AT(A/G)GGTTGGTCGGTACCGG
column 4      AC(A/G)GGTTGGTCGGTACCGG
column 5      CA(G/T)GGTGGTCGGTACCGG
column 6      CT(G/C)GGTGGTCGGTACCGG
column 7      AG(T/C)GGTTGGTCGGTACCGG
column 8      AT(T/C)GGTTGGTCGGTACCGG
column 9      CC(A/C)GGTTGGTCGGTACCGG
column 10     T(A/T)TGGTTGGTCGGTACCGG

The next eight random codon positions were synthesized as described for Table V in Example I. Following the ninth position synthesis, the reaction products were once more combined, mixed and redistributed into 10 new reaction columns. Synthesis of the last random codon position and 5' flanking sequences are shown in Table X.
Table X
Column       Sequence (5' to 3')
column 1     AGGATCCGCCGAGCTCAA(A/C)A
column 2     AGGATCCGCCGAGCTCAG(A/G)A
column 3     AGGATCCGCCGAGCTCAT(A/G)A
column 4     AGGATCCGCCGAGCTCAC(A/G)A
column 5     AGGATCCGCCGAGCTCCA(G/T)A
column 6     AGGATCCGCCGAGCTCCT(G/C)A
column 7     AGGATCCGCCGAGCTCAG(T/C)A
column 8     AGGATCCGCCGAGCTCAT(T/C)A
column 9     AGGATCCGCCGAGCTCCC(A/C)A
column 10    AGGATCCGCCGAGCTCT(A/T)TA

The reaction products were mixed once more and the oligonucleotides cleaved and purified as recommended by the manufacturer. The purified population of oligonucleotides was used to generate a surface expression library as described below.
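The ten-column scheme of Tables IX and X can be checked with a short script. This is only a sketch: it assumes that the tabulated triplets are written as the complement of the coding strand (the text calls them "complementary sequences") and that the standard genetic code applies. The helper names `expand` and `revcomp` are illustrative and do not come from the specification. Under those assumptions, the ten columns, each carrying a two-base mixture at one position, decode to twenty different amino acids with no stop codon.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order (assumption: no alternative code).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Random-codon triplets of the ten columns, taken from the tables (flanks omitted).
COLUMNS = [
    "AA(A/C)", "AG(A/G)", "AT(A/G)", "AC(A/G)", "CA(G/T)",
    "CT(G/C)", "AG(T/C)", "AT(T/C)", "CC(A/C)", "T(A/T)T",
]

def expand(pattern):
    """Expand a pattern such as 'AA(A/C)' into every triplet it can produce."""
    choices = []
    for first, second, single in re.findall(r"\(([ACGT])/([ACGT])\)|([ACGT])", pattern):
        choices.append(single if single else first + second)
    return ["".join(p) for p in product(*choices)]

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Assumption: the tables list the antisense strand, so decode the reverse complement.
encoded = {col: [CODON_TABLE[revcomp(t)] for t in expand(col)] for col in COLUMNS}
all_aa = [aa for aas in encoded.values() for aa in aas]

for col, aas in encoded.items():
    print(col, "->", "/".join(aas))
print(len(set(all_aa)), "distinct amino acids; stop codon present:", "*" in all_aa)
```

If the tabulated strand were instead read as the coding strand, the decoded set would differ, so the result depends on the strand assumption stated above.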
Vector Construction

The vector used for generating surface expression libraries from a single oligonucleotide population (i.e., without joining together of right and left half oligonucleotides) is described below. The vector is an M13-based expression vector which directs the synthesis of gene VIII-peptide fusion proteins (Figure 4). This vector exhibits all the functions that the combined right and left half vectors of Example I exhibit.
An M13-based vector was constructed for the cloning and surface expression of populations of random oligonucleotides (Figure 4, M13IX30). M13mp19 (Pharmacia) was the starting vector. This vector was modified to contain, in addition to the encoded wild type M13 gene VIII: (1) a pseudo-wild type gene VIII sequence with an amber stop codon placed between it and the restriction sites for cloning oligonucleotides; (2) Stu I, Spe I and Xho I restriction sites in frame with the pseudo-wild type gVIII for cloning oligonucleotides; (3) sequences necessary for expression, such as a promoter, signal sequence and translation initiation signals; (4) various other mutations to remove redundant restriction sites and the amino terminal portion of Lac Z. Construction of M13IX30 was performed in four steps.
In the first step, a precursor vector containing the pseudo gene VIII and various other mutations was constructed, M13IX01F. The second step involved the construction of a small cloning site in a separate M13mp18 vector to yield M13IX03. In the third step, expression sequences and cloning sites were constructed in M13IX03 to generate the intermediate vector M13IX04B. The fourth step involved the incorporation of the newly constructed sequences from the intermediate vector into M13IX01F to yield M13IX30.
Incorporation of these sequences linked them with the pseudo gene VIII.
Construction of the precursor vector M13IX01F was similar to that of M13IX42 described in Example I except for the following features: (1) M13mp19 was used as the starting vector; (2) the Fok I site 5' to the unique Eco RI site was not incorporated and the overhang at the naturally occurring Fok I site at position 3547 was not changed to 5'-CTTC-3'; (3) the spacer sequence was not incorporated between the Eco RI and Sac I sites; and (4) the amber codon at position 4492 was not incorporated.
In the second step, M13mp18 was mutated to remove the 5' end of Lac Z up to the Lac i binding site and including the Lac Z ribosome binding site and start codon.
Additionally, the polylinker was removed and a Mlu I site was introduced in the coding region of Lac Z. A single oligonucleotide was used for this mutagenesis and had the sequence 5'-AAACGACGGCCAGTGCCAAGTGACGCGTGTGAAATTGTTATCC-3' (SEQ ID NO: 41). Restriction enzyme sites for Hind III and Eco RI were introduced downstream of the Mlu I site using the oligonucleotide 5'-GGCGAAAGGGAATTCTGCAAGGCGATTAAGCTTGGGTAACGCC-3' (SEQ ID NO: ). These modifications of M13mp18 yielded the vector M13IX03.
The expression sequences and cloning sites were introduced into M13IX03 by chemically synthesizing a series of oligonucleotides which encode both strands of the desired sequence. The oligonucleotides are presented in Table XI (SEQ ID NOS: 43 through 50).
Table XI

Top Strand
Oligonucleotides    Sequence (5' to 3')
084                 GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG
027                 TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028                 TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC
029                 TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG

Bottom Strand
Oligonucleotides    Sequence (5' to 3')
085                 TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031                 GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT
032                 TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA
033                 GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA

The above oligonucleotides, except for the terminal oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ ID NO: 47) of Table XI, were mixed, phosphorylated, annealed and ligated to form a double stranded insert as described in Example I. However, instead of cloning directly into the intermediate vector, the insert was first amplified by PCR using the terminal oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ ID NO: 47) as primers. The terminal oligonucleotide 084 (SEQ ID NO: 43) contains a Hind III site 10 nucleotides internal to its 5' end.
Oligonucleotide 085 (SEQ ID NO: 47) has an Eco RI site at its 5' end. Following amplification, the products were restricted with Hind III and Eco RI and ligated as described in Example I into the polylinker of M13mp18 digested with the same two enzymes. The resultant double stranded insert contained a ribosome binding site, a translation initiation codon followed by a leader sequence and three restriction enzyme sites for cloning random oligonucleotides (Xho I, Stu I, Spe I). The vector was named M13IX04.
During cloning of the double-stranded insert, it was found that one of the GCC codons in oligonucleotide 028 and its complement in 031 was deleted. Since this deletion did not affect function, the final construct is missing one of the two GCC codons. Additionally, oligonucleotide 032 contained a GTG codon where a GAG codon was needed.
Mutagenesis was performed using the oligonucleotide 5'-TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 51) to convert the codon to the desired sequence. The resultant intermediate vector was named M13IX04B.
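The GTG/GAG discrepancy in oligonucleotide 032 can be located by comparing its reverse complement with the top strand formed by oligonucleotides 027 and 028. The following is a minimal sketch using the sequences exactly as listed in Table XI. The function names (`mismatches`, `best_offset`) are illustrative, and the brute-force alignment is simply a convenient way to show the comparison, not a method described in the specification.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Sequences as listed in Table XI (5' to 3').
O27 = "TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT"  # top strand
O28 = "TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC"  # top strand
O32 = "TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA"  # bottom strand
REPAIR = "TAACGGTAAGAGTGCCAGTGC"                     # SEQ ID NO: 51

top = O27 + O28  # the stretch of top strand that 032 is meant to pair with

def mismatches(a, b):
    """Positions where two equal-length strings differ, as (index, a_base, b_base)."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

def best_offset(query, target):
    """Offset in target where query aligns with the fewest mismatches."""
    offsets = range(len(target) - len(query) + 1)
    return min(offsets, key=lambda o: len(mismatches(target[o:o + len(query)], query)))

rc32 = revcomp(O32)
offset = best_offset(rc32, top)
diff = mismatches(top[offset:offset + len(rc32)], rc32)

print("032 pairs with the top strand starting at offset", offset)
print("mismatches (index, top base, 032 complement base):", diff)
print("repair oligo fully complementary to top strand:", revcomp(REPAIR) in top)
```

The single mismatch found this way corresponds to the middle base of the GTG codon in 032, and the repair oligonucleotide pairs with the top strand without any mismatch, which is consistent with the correction described in the text.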
The fourth step in constructing M13IX30 involved inserting the expression and cloning sequences from M13IX04B upstream of the pseudo-wild type gVIII in M13IX01F. This was accomplished by digesting M13IX04B with Dra III and Bam HI and gel isolating the 700 base pair insert containing the sequences of interest. M13IX01F was likewise digested with Dra III and Bam HI. The insert was combined with the double digested vector at a molar ratio of 3:1 and ligated as described in Example I. It should be noted that all modifications in the vectors described herein were confirmed by sequence analysis. The sequence of the final construct, M13IX30, is shown in Figure 7 (SEQ ID NO: 3). Figure 4 also shows M13IX30 where each of the elements necessary for surface expression of randomized oligonucleotides is marked.
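A 3:1 insert-to-vector molar ratio translates into a mass ratio that depends on the relative lengths of the two fragments. The sketch below shows the arithmetic only. The vector backbone length (7300 bp) and the vector mass (100 ng) are hypothetical values chosen for illustration; the specification gives neither, and only the 700 base pair insert and the 3:1 ratio come from the text.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio):
    """Mass of insert needed for a given insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length,
    so the mass scales with the length ratio times the molar ratio.
    """
    return vector_ng * (insert_bp / vector_bp) * molar_ratio

VECTOR_BP = 7300   # hypothetical backbone length after the double digest
VECTOR_NG = 100.0  # hypothetical amount of digested vector
INSERT_BP = 700    # from the text
MOLAR_RATIO = 3    # insert:vector, from the text

needed = insert_mass_ng(VECTOR_NG, VECTOR_BP, INSERT_BP, MOLAR_RATIO)
print(f"{needed:.1f} ng of insert for {VECTOR_NG:.0f} ng of vector")
```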
Library Construction, Screening and Characterization of Encoded Oligonucleotides

Construction of an M13IX30 surface expression library is accomplished identically to that described in Example I for sublibrary construction except the oligonucleotides described above are inserted into M13IX30 by mutagenesis instead of by ligation. The library is constructed and propagated on MK30-3 (BMB) and phage stocks are prepared for infection of XL1 cells and screening. The surface expression library is screened and encoding oligonucleotides characterized as described in Example I.

Degenerate Oligonucleotides

This example shows the construction and expression of a surface expression library of degenerate oligonucleotides. The encoded peptides of this example derive from the mixing and joining together of two separate oligonucleotide populations. Also demonstrated is the isolation and characterization of peptide ligands and their corresponding nucleotide sequence for specific binding proteins.
Synthesis of Oligonucleotide Populations

A population of left half degenerate oligonucleotides and a population of right half degenerate oligonucleotides were synthesized using standard automated procedures as described in Example I.
The degenerate codon sequences for each population of oligonucleotides were generated by sequentially synthesizing the triplet NNG/T where N is an equal mixture of all four nucleotides. The antisense sequence for each population of oligonucleotides was synthesized and each population contained 5' and 3' flanking sequences complementary to the vector sequence. The complementary termini were used to incorporate each population of oligonucleotides into its respective vector by standard mutagenesis procedures. Such procedures have been described previously in Example I and in the Detailed Description. Synthesis of the antisense sequence of each population was necessary since the single-stranded form of the vectors is obtained only as the sense strand.
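The coding capacity of the NNG/T triplet can be enumerated directly. The sketch below assumes the standard genetic code and uses illustrative variable names. It lists the 32 codons of the form NN(G/T), counts the amino acids and stop codons they encode, and confirms that the antisense triplet (A/C)NN is the reverse complement of the same codon set.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order (assumption: no alternative code).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Sense-strand codons: N, N, then G or T.
sense_codons = ["".join(p) for p in product("ACGT", "ACGT", "GT")]
translated = {codon: CODON_TABLE[codon] for codon in sense_codons}

amino_acids = sorted(set(translated.values()) - {"*"})
stop_codons = [codon for codon, aa in translated.items() if aa == "*"]

# Antisense triplets as synthesized: A or C, then N, N.
antisense = ["".join(p) for p in product("AC", "ACGT", "ACGT")]
same_set = sorted(revcomp(t) for t in antisense) == sorted(sense_codons)

print(len(sense_codons), "codons encode", len(amino_acids), "amino acids")
print("stop codons in the set:", stop_codons)
print("(A/C)NN is the reverse complement of NN(G/T):", same_set)
```

Restricting the third position to G or T keeps all twenty amino acids while leaving only the amber codon among the three stop codons.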
The left half oligonucleotide population was synthesized having the following sequence: 5'-AGCTCCCGGATGCCTCAGAAGATG(A/CNN)₉GGCTTTTGCCACAGGGG-3' (SEQ ID NO: 52). The right half oligonucleotide population was synthesized having the following sequence: 5'-CAGCCTCGGATCCGCC(A/CNN)₁₀ATG(A/C)GAAT-3' (SEQ ID NO: 53).
These two oligonucleotide populations when incorporated into their respective vectors and joined together encode a 20 codon oligonucleotide having 19 degenerate positions and an internal predetermined codon sequence.
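The theoretical diversity of a library with 19 degenerate positions follows from simple counting. The sketch below assumes that every degenerate position is an independent NN(G/T) codon with 32 equally likely triplets, of which one is the amber stop codon. The stop-free fraction is relevant only in a host that does not suppress amber codons, which is an assumption added here rather than a statement from the specification.

```python
DEGENERATE_POSITIONS = 19          # from the text
CODONS_PER_POSITION = 4 * 4 * 2    # N, N, then G or T
AMINO_ACIDS = 20

# Number of distinct nucleotide sequences and distinct peptides.
dna_diversity = CODONS_PER_POSITION ** DEGENERATE_POSITIONS
peptide_diversity = AMINO_ACIDS ** DEGENERATE_POSITIONS

# Fraction of sequences with no amber codon at any degenerate position.
stop_free_fraction = (31 / 32) ** DEGENERATE_POSITIONS

print(f"nucleotide sequences: {dna_diversity:.2e}")
print(f"peptide sequences:    {peptide_diversity:.2e}")
print(f"fraction without an amber codon: {stop_free_fraction:.3f}")
```

Both counts far exceed the number of phage that can be handled in practice, so any physical library samples only a small part of the theoretical sequence space.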
Vector Construction

Modified forms of the previously described vectors were used for the construction of right and left half sublibraries. The construction of left half sublibraries was performed in an M13-based vector termed M13ED03.
This vector is a modified form of the previously described M13IX30 vector and contains all the essential features of both M13IX30 and M13IX22. M13ED03 contains, in addition to a wild type and a pseudo-wild type gene VIII, sequences necessary for expression and two Fok I sites for joining with a right half oligonucleotide sublibrary. Therefore, this vector combines the advantages of both previous vectors in that it can be used for the generation and expression of surface expression libraries from a single oligonucleotide population or it can be joined with a sublibrary to bring together right and left half oligonucleotide populations into a surface expression library.
M13ED03 was constructed in two steps from M13IX30.
The first step involved the modification of M13IX30 to remove a redundant sequence and to incorporate a sequence encoding the eight amino-terminal residues of human β-endorphin. The leader sequence was also mutated to increase secretion of the product.
During construction of M13IX04 (an intermediate vector to M13IX30 which is described in Example II), a six nucleotide sequence was duplicated in oligonucleotide 027 (SEQ ID NO: 44) and its complement 032 (SEQ ID NO: 49). This sequence, 5'-TTACCG-3', was deleted by mutagenesis in the construction of M13ED01. The oligonucleotide used for the mutagenesis was 5'-GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ ID NO: 54). The mutation in the leader sequence was generated using the oligonucleotide 5'-GGGCTTTTGCCACAGGGGT-3' (SEQ ID NO: 55). This mutagenesis resulted in the A residue at position 6353 of M13IX30 being changed to a G residue.
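The effect of the deletion oligonucleotide can be illustrated on the top-strand sequence of oligonucleotides 027 and 028 as listed in Table XI. This sketch removes one copy of the duplicated 5'-TTACCG-3' and checks that the reverse complement of the mutagenic oligonucleotide (SEQ ID NO: 54) matches the edited sequence but not the original. Modelling the mutagenesis as a string replacement is a simplification introduced here for illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Top-strand oligonucleotides as listed in Table XI (5' to 3').
O27 = "TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT"
O28 = "TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC"
DELETION_OLIGO = "GGTAAACAGTAACGGTAAGAGTGCCAG"  # SEQ ID NO: 54
DUPLICATED = "TTACCG"

original = O27 + O28
# Simplified model of the deletion: collapse the tandem duplicate to one copy.
edited = original.replace(DUPLICATED + DUPLICATED, DUPLICATED, 1)

target = revcomp(DELETION_OLIGO)
print("bases removed:", len(original) - len(edited))
print("oligo pairs perfectly with original:", target in original)
print("oligo pairs perfectly with edited:  ", target in edited)
```

The oligonucleotide pairs perfectly only with the sequence that lacks the duplicate, which is what drives the deletion during mutagenesis.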
The resultant vector was designated M13IX32.
To generate M13ED01, the nucleotide sequence encoding β-endorphin (8 amino acid residues of β-endorphin plus 3 extra amino acid residues) was incorporated after the leader sequence by mutagenesis.
The oligonucleotide used had the following sequence: 5'-AGGGTCATCGCCTTCAGCTCCGGATCCCTCAGAAGTCATAAACCCCCCATAGGCTTTTGCCAC-3' (SEQ ID NO: 56). This mutagenesis also removed some of the downstream sequences through the Spe I site.
The second step in the construction of M13ED03 involved vector changes which put the β-endorphin sequence in frame with the downstream pseudo-gene VIII sequence and incorporated a Fok I site for joining with a sublibrary of right half oligonucleotides. This vector was designed to incorporate oligonucleotide populations by mutagenesis using sequences complementary to those flanking or overlapping with the encoded β-endorphin sequence. The absence of β-endorphin expression after mutagenesis can therefore be used to measure the mutagenesis frequency. In addition to the above vector changes, M13ED03 was also modified to contain an amber codon at position 3262 for biological selection during joining of right and left half sublibraries.
The mutations were incorporated using standard mutagenesis procedures as described in Example I. The frame shift changes and Fok I site were generated using the oligonucleotide 5'-TCGCCTTCAGCTCCCGGATGCCTCAGAAGCATGAACCCCCCATAGGC-3' (SEQ ID NO: 57). The amber codon was generated using the oligonucleotide 5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO: 58). The full sequence of the resultant vector, M13ED03, is provided in Figure 8 (SEQ ID NO: 4).
The construction of right half oligonucleotide sublibraries was performed in a modified form of the M13IX42 vector. The new vector, M13IX421, is identical to M13IX42 except that the amber codon between the Eco RI-Sac I cloning site and the pseudo-gene VIII sequence was removed. This change ensures that all expression off of the Lac Z promoter produces a peptide-gene VIII fusion protein. Removal of the amber codon was performed by mutagenesis using the following oligonucleotide: 5'-GCCTTCAGCCTCGGATCCGCC-3' (SEQ ID NO: 59). The full sequence of M13IX421 is shown in Figure 9 (SEQ ID NO: 5).
Library Construction, Screening and Characterization of Encoded Oligonucleotides

A sublibrary was constructed for each of the previously described degenerate populations of oligonucleotides. The left half population of oligonucleotides was incorporated into M13ED03 to generate the sublibrary M13ED03.L and the right half population of oligonucleotides was incorporated into M13IX421 to generate the sublibrary M13IX421.R. Each of the oligonucleotide populations was incorporated into its respective vector using site-directed mutagenesis as described in Example I. Briefly, the nucleotide sequences flanking the degenerate codon sequences were complementary to the vector at the site of incorporation.
The populations of nucleotides were hybridized to single-stranded M13ED03 or M13IX421 vectors and extended with T4 DNA polymerase to generate a double-stranded circular vector. Mutant templates were obtained by uridine selection in vivo, as described by Kunkel et al., supra.
Each of the vector populations was electroporated into host cells and propagated as described in Example I.
The random joining of right and left half sublibraries into a single surface expression library was accomplished as described in Example I except that prior to digesting each vector population with Fok I they were first digested with an enzyme that cuts in the unwanted portion of each vector. Briefly, M13ED03.L was digested with Bgl II (cuts at 7094) and M13IX421.R was digested with Hind III (cuts at 3919). Each of the digested populations was further treated with alkaline phosphatase to ensure that the ends would not religate and then digested with an excess of Fok I. Ligations, electroporation and propagation of the resultant library were performed as described in Example I.
The surface expression library was screened for ligand binding proteins using a modified panning procedure. Briefly, 1 ml of the library, about 10" phage particles, was added to 1-5 µg of the ligand binding protein. The ligand binding protein was either an antibody or receptor globulin (Rg) molecule, Aruffo et al., Cell 61:1303-1313 (1990), which is incorporated herein by reference. Phage were incubated with shaking with the affinity ligand at room temperature for 1 to 3 hours, followed by the addition of 200 µl of latex beads (Biosite, San Diego, CA) which were coated with goat-antimouse IgG. This mixture was incubated with shaking for an additional 1-2 hours at room temperature. Beads were pelleted for 2 minutes by centrifugation in a microfuge and washed with TBS which can contain 0.1% Tween 20.
Three additional washes were performed where the last wash did not contain any Tween 20. The bound phage were then eluted with 200 µl of 0.1 M Glycine-HCl, pH 2.2 for 15 minutes and the beads were spun down by centrifugation.
The supernatant containing phage (eluate) was removed and phage exhibiting binding to the ligand binding protein were further enriched by one to two more cycles of panning. Typical yields after the first eluate were about 1 x 10⁶ - 5 x 10⁶ pfu. The second and third eluate generally yielded about 5 x 10⁶ - 2 x 10⁷ pfu and 5 x 10⁷ - 1 x 10" pfu, respectively.
The second or third eluate was plated at a suitable density for plaque identification screening and sequencing of positive clones (i.e., plated at confluency for rare clones and 200-500 plaques/plate if pure plaques were needed). Briefly, plaques were grown for about 6 hours at 37°C and were overlaid with nitrocellulose filters that had been soaked in 2 mM IPTG and then briefly dried.
The filters remained on the plaques overnight at room temperature and were then removed and placed in blocking solution for 1-2 hours. Following blocking, the filters were incubated in 1 µg/ml ligand binding protein in blocking solution for 1-2 hours at room temperature. Goat antimouse Ig-coupled alkaline phosphatase (Fisher) was added at a 1:1000 dilution and the filters were rapidly washed with 10 mls of TBS or block solution over a glass vacuum filter. Positive plaques were identified after alkaline phosphatase development for detection.
Screening of the degenerate oligonucleotide library with several different ligand binding proteins resulted in the identification of peptide sequences which bound to each of the ligands. For example, screening with an antibody to β-endorphin resulted in the detection of about 30-40 different clones which essentially all had the core amino acid sequence known to interact with the antibody. The sequences flanking the core sequences were different, showing that they were independently derived and not duplicates of the same clone. Screening with an antibody known as 57 gave similar results (i.e., a core consensus sequence was identified but the flanking sequences among the clones were different).
Generation of a Left Half Random Oligonucleotide Library

This example shows the synthesis and construction of a left half random oligonucleotide library.
A population of random oligonucleotides nine codons in length was synthesized as described in Example I except that different sequences at their 5' and 3' ends were synthesized so that they could be easily inserted into the vector by mutagenesis. Also, the mixing and dividing steps for generating random distributions of reaction products were performed by the alternative method of dispensing equal volumes of bead suspensions. The liquid chosen that was dense enough for the beads to remain dispersed was 100% acetonitrile.
Briefly, each column was prepared for the first coupling reaction by suspending 22 mg (1 µmole) of 48 µmol/g capacity beads (Genta, San Diego, CA) in 0.5 mls of 100% acetonitrile. These beads are smaller than those described in Example I and are derivatized with a guanine nucleotide. They also do not have a controlled pore size. The bead suspension was then transferred to an empty reaction column. Suspensions were kept relatively dispersed by gently pipetting the suspension during transfer.
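The stated bead quantity can be checked against the stated loading capacity. The short sketch below only reproduces that arithmetic; the function name is illustrative.

```python
def bead_capacity_umol(mass_mg, loading_umol_per_g):
    """Synthesis scale in µmol for a given bead mass and loading capacity."""
    return (mass_mg / 1000.0) * loading_umol_per_g

BEAD_MASS_MG = 22          # from the text
LOADING_UMOL_PER_G = 48    # from the text

scale = bead_capacity_umol(BEAD_MASS_MG, LOADING_UMOL_PER_G)
print(f"{BEAD_MASS_MG} mg of beads at {LOADING_UMOL_PER_G} µmol/g gives {scale:.3f} µmol")
```

The result is slightly above 1 µmole, in agreement with the nominal 1 µmole scale given in the text.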
Columns were plugged and monomer coupling reactions were performed as shown in Table XII.

column 3L     AT(A/G)GGCTTTTGCCACAGG
column 4L     AC(A/G)GGCTTTTGCCACAGG
column 5L     CA(G/T)GGCTTTTGCCACAGG
column 6L     CT(G/C)GGCTTTTGCCACAGG
column 7L     AG(T/C)GGCTTTTGCCACAGG
column 8L     AT(T/C)GGCTTTTGCCACAGG
column 9L     CC(A/C)GGCTTTTGCCACAGG
column 10L    T(A/T)TGGCTTTTGCCACAGG

After coupling of the last monomer, the columns were unplugged as described previously and their contents were poured into a 1.5 ml microfuge tube. The columns were rinsed with 100% acetonitrile to recover any remaining beads. The volume used for rinsing was determined so that the final volume of total bead suspension was about 100 µl for each new reaction column that the beads would be aliquoted into. The mixture was vortexed gently to produce a uniformly dispersed suspension and then divided, with constant pipetting of the mixture, into equal volumes. Each mixture of beads was then transferred to an empty reaction column. The empty tubes were washed with a small volume of 100% acetonitrile and the washes were also transferred to their respective columns. Random codon positions 2 through 9 were then synthesized as described in Example I where the mixing and dividing steps were performed using a suspension in 100% acetonitrile. The coupling reactions for codon positions 2 through 9 are shown in Table XIII.
Table XIII
Column        Sequence (5' to 3')
column 1L     AA(A/C)A
column 2L     AG(A/G)A
column 3L     AT(A/G)A
column 4L     AC(A/G)A
column 5L     CA(G/T)A
column 6L     CT(G/C)A
column 7L     AG(T/C)A
column 8L     AT(T/C)A
column 9L     CC(A/C)A
column 10L    T(A/T)TA

After coupling of the last monomer for the ninth codon position, the reaction products were mixed and a portion was transferred to an empty reaction column.
Columns were plugged and the following monomer coupling reactions were performed: 5'-CGGATGCCTCAGAAGCCCCXXA-3' (SEQ ID NO: 60). The resulting population of random oligonucleotides was purified and incorporated by mutagenesis into the left half vector M13ED04.
M13ED04 is a modified version of the M13ED03 vector described in Example III and therefore contains all the features of that vector. The difference between M13ED03 and M13ED04 is that M13ED04 does not contain the five amino acid sequence (Tyr Gly Gly Phe Met) recognized by anti-β-endorphin antibody. This sequence was deleted by mutagenesis using the oligonucleotide 5'-CGGATGCCTCAGAAGGGCTTTTGCCACAGG-3' (SEQ ID NO: 61). The entire nucleotide sequence of this vector is shown in Figure 10 (SEQ ID NO: 6).
Although the invention has been described with reference to the presently preferred embodiment, it should be understood that various modifications can be made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the claims.
SEQUENCE LISTING (1) GENERAL INFORMATION: (i) APPLICANT: Huse, William D. (ii) TITLE OF INVENTION: SURFACE EXPRESSION LIBRARIES OF RANDOMIZED PEPTIDES (iii) NUMBER OF SEQUENCES: 61 (iv) CORRESPONDENCE ADDRESS: (A) ADDRESSEE: Pretty, Schroeder, Brueg emann & Clark (B) STREET: 444 South Flower Street, Su ca 2000 (c) CITY: Los Angeles (D) STATE: California (E) COUNTRY: United States (F) ZIP: 90071 (v) COMPUTER READABLE FORM: (A) MEDIUM TYPE: Floppy disk (B) COMPUTER: IBM PC compatible (C) OPERATING SYSTEM: PC-DOS/MS-DOS (D) SOFTWARE: Patentln Release #l.0, Version #1.25 (vi) CURRENT APPLICATION DATA: (A) APPLICATION NUMBER: (B) FILING DATE: (C) CLASSIFICATION: (viii) ATTORNEY/AGENT INFORMATION: (A) NAME: Campbell. Cathrgn A (B) REGISTRATION NUMBER: 1.815 (C) REFERENCE/DOCKET NUMBER: P31 9072 (ix) TELECOMMUNICATION INFORMATION: (A) TELEPHONE: (619) 535-9001 (B) TELEFAX: (619) 535-8949 (2) INFORMATION FOR SEQ ID NO:1: (1) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7294 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: both (D) TOPOLDGY: circular (xi) SEQUENCE DESCRIPTION: SEQ ID N0:1: AATCCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT CGTTCGCAGA ATTGGCAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTAIAAIAGT 180 240 300 360 420 CAGGGTAAAG TTTGAGGGGG AAACATTTTA GGTTTTTATC AATTCCTTTT ATGAATCTTT TCTTCCCAAC CAATGATTAA CTCGTCAGGG AATATCCGGT TGTACACCGT GTCTGCGCCT CAGGCGAIGA CAAAGATGAG GTGGCATTAC CAAAGCCTCT CGATCCCGCA TGCGTGGGCG ATTCACCTCG TTTTTGGAGA TATTCTCACT TTTACTAACG CTGTGGAATG TGGGTTCCTA TCTGAGGGTG AITCCGGGCT AACCCCGCTA CAGAATAATA CAAGGCACTG TATGACGCTT GATCCATTCG GCTGGCGGCG GGCGGTTCTG GATTTTGATT ACCTGATTTT AITCAATGAA CTATTACCCC GTCGTCTGGT 
GGCGTTATGT CTACCTGTAA GTCCTGACTG AGTTGAAATT CAAGCCTTAT TCTTGTCAAG TCATCTGTCC CGTTCCGGCT TACAAATCTC TGTTTTAGTG GTATTTTACC GTAGCCGTTG AAAGCGGCCT ATGGTTCTTG AAAGGAAGCT TTTTCAACGT CCGCTGAAAC TCTGGAAAGA CTACAGGCGT TTGGGCTTGC GCGGTTCTGA ATACTTATAT ATCCTAATCC GGTTCCGAAA ACCCCGTTAA ACTGGAACGG TTTGTGAATA GCTCTGGTGG AGGGTGGCGG ATGAAAAGAT TGATTTATGC TCATTCTCGT TTTCTGAACT GTTTAAAGCA TATTTATGAC CTCTGGCAAA AAACGAGGGT ATCTGCATTA TAATCTTGTT GTATAATGAG AAACCATCTC TCACTGAATG AITACTCTTG TCTTTCAAAG AAGTAACATG CGTTGTACTT TAITCTTTCG CGTTTAATGG CTACCCTCGT TTAACTCCCT TCATTGTCGG GATAAACCGA GAAAAAATTA TGTTGAAAGT CGACAAAACT TGTAGTTTGT TATCCCTGAA GCCTGGCGGT CAACCCTCTC TTCTCTTGAG TAGGCAGGGG AACTTAITAC TAAATTCAGA TGAAGGCGAA TGGTTCTGGT CTCTGAGGGA GGCAAACGCT GAITCCGCAG ACTTCTTTTG TATGAIAGTG GTTCAATGTG CCGTTAGTTC CCAGTTCTTA AAGCCCAATT AGCAGCTTTG AIGAAGGTGA TTGGTCAGTT GAGCAGGTCG TGTTTCGCGC CCTCTTTCGT AAACTTCCTC TCCGATGCTG GCAAGCCTCA CGCAACTATC TACAAITAAA TTAITCGCAA TGTTTAGCAA TTAGATCGTT ACTGGTGACG AATGAGGGTG ACTAAACCTC GACGGCACTT GAGTCTCAGC GCATTAACTG CAGTACACTC GACTGCGCTT TCGTCTGACC GGCGGCTCTG GGCGGTTCCG AATAAGGGGG TATTGGACGC CAAAAGCCTC TTGCTCTTAC GTATTCCTAA GTTTTAIIAA AAAICGCATA TACTACTCGT TTACGTTGAT GCCAGCCTAT CGGTTCCCTT CGCATTTCGA TTGGTATAAT TTTAGGTTGG ATGAAAAAGT TCTTTCGCTG GCGACCGAAT GGIATCAAGC GGCTCCTTTT TTCCTTTAGT AACCCCATAC ACGCTAACTA AAACTCAGTG GTGGCTCTGA CTGAGTACGG ATCCGCCTGG CTCTTAATAC TTTATACGGG CTGTATCATC TCCATTCTGG TGCCTCAACC AGGGTGGTGG GTGGTGGCTC CTATGACCGA TATCCAGTCT TCGCTATTTT TATGCCTCGT ATCTCAACTG CGTAGATTTT AGGTAATTCA TCTGGTGTTT TTGGGTAATG GCGCCTGGTC ATGAITGACC CACAATTTAT CGCTGGGGGT TGCCTTCGTA CTTTAGTCCT CTGAGGGTGA AIAICGGTTA TGTTTAAGAA GGAGCCTTTT TGTTCCTTTC AGAAAATTCA TGAGGGTTGT TTACGGTACA GGGTGGCGGT TGATACACCT TACTGAGCAA TTTCATGTTT CACTGTTACT AAAAGCCATG CTTTAATGAA TCCTGTCAAI CTCTGAGGGT TGGTTCCGGT AAATGCCGAT 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 GAAAACGCGC GCTGCTATCG GGTGATTTTG TTAATGAATA 
TTTGTCTTTA TTCCGTGGTG TTTGCTAACA TATTATTGCG TTAAAAAGGG GGCTTAACTC TTGTTCAGGG TCTCTGTAAA ATTGGGATAA CTCGTTAGCG CTTGATTTAA CTTAGAATAC TCCTACGATG ACCCGTTCTT AAATTAGGAT CGTTCTGCAI TTTGTCGGTA GTTGGCGTTG ACTGGTAAGA TCCGGTGTTT AATTTAGGTC TGTCTTGCGA GAGGTTAAAA CAGCGTCTTA ACCGACGATT ATTAAAAAGG GTTTCATCAT GTAACTTGGT AGTGTTACTG GTTTTACGTG TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT AIGGTTTCAT CTGGCTCTAA ATTTCCGTCA GCGCTGGTAA TCTTTGCGTT TACTGCGTAA TTTCCTCGGT CTTCGGTAAG AATTCTTGTG TGTTCAGTTA GGCTGCTATT AIAAIATGGC TTGGTAAGAT GGCTTCAAAA CGGATAAGCC AAAAIAAAAA GGAATGATAA GGGAIATTAI TAGCTGAACA CTTTATATTC TTAAATATGG ATTTGTATAA ATTCTTATTT AGAAGATGAA TTGGATTTGC AGGTAGTCTC ATCTAAGCTA TACAGAAGCA TAATTCAAAT CTTCTTTTGC AITCAAAGCA TATATTCATC CTAATAATTT TGGTGACGTT TTCCCAAATG ATATTTACCT ACCAIATGAA TCTTTTATAT TAAGGAGTCT TTCCTTCTGG AIAGCTATTG GGTTATCTCT ATTCTCCCGT TTCATTTTTG TGTTTAITTT TCAGGATAAA CCTCCCGCAA TTCTATATCT CGGCTTGCTT GGAAAGACAG CTTCCTTGTT TGTTGTTTAT TCTTAITACT CGATTCTCAA CGCAIATGAT AACGCCTTAT GCTTACTAAA AICAGCATTT TCAGACCTAT TCGCTAIGTT AGGTTATTCA GAAATTGTTA TCAGGTAATT ATCAGGCGAA TGACGTTAAA TGAIATGGTT TCCGGCCTTG GCTCAAGTCG TCCCTCCCTC TTTTCTATTG GTTGCCACCT TAATCATGCC TAACTTTGTT CTATTTCATT CTGATATTAG CTAATGCGCT ACGTTAAACA GTAACTGGCA ATTGTAGCTG GTCGGGAGGT GATTTGCTTG GTTCTCGATG CCGATTATTG CAGGACTTAT TGTCGTCGTC GGCTCGAAAA TTAAGCCCTA ACTAAACAGG TTATCACACG ATATATTTGA ACAIATAGTT GATTTTGATA TTCAAGGATT CTCACAIATA AATGTAATTA GAAATGAATA TCCGTTATTG CCTGAAAATC GGTTCAATTC CTAATGGTAA GTGACGGTGA AATCGGTTGA AITGTGACAA TTATGTATGT AGTTCTTTTG CGGCTATCTG GTTTCTTGCT CGCTCAATTA TCCCTGTTTT AAAAATCGTT AATTAGGCTC GGTGCAAAAI TCGCTAAAAC CTATTGGGCG AGTGCGGTAC ATTGGTTTCT CTATTGTTGA TGGACAGAAT TGCCTCTGCC CTGTTGAGCG CTTTTTCTAG GTCGGTATTT AAAAGTTTTC AIAIAACCCA AATTCACTAT CTAAGGGAAA TTGATTTATG AITTTCTTTT ATTCGCCTCT TTTCTCCCGA TACGCAATTT CTTCCATTAT TGGTGCTACT TAAITCACCT ATGTCGCCCT AATAAACTTA ATTTTCTACG GGTATTCCGT CTTACTTTTC CTTATTATTC CCCTCTGACT TATGTTATTC TCTTATTTGG TGGAAAGACG AGCAACTAAT GCCTCGCGTT 
CGGTAATGAT TTGGTTTAAT ACAIGCTCGT TAAACAGGCG TACTTTACCT TAAATTACAT TTGGCTTTAT TAATTATGAT CAAACCATTA ACGCGTTCTT ACCTAAGCCG TGACTCTTCT ATTAATTAAT TACTGTTTCC CTTGATGTTT GCGCGATTTT TGTAAAAGGT CTTTATTTCT TTAGAAGTAT 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 45 AATCCAAACA ATCAGGATTA TAITGATGAA TTGCCAICAT CTGATAATCA GGAATATGAT GATAATTCCG TTTAAAATTA TCTAATACTT AGTGCACCTA ACTCACCAGA TTTTCATTTG CTCACCTCTG GGGCTATCAG ATTCTTACGC ACTGGTCGTG CAAAATGTAG CTGGAIATTA ACTAATCAAA GGTGGCCTCA ATCCCTTTAA TACGTGCTCG TGTGGTGGTT CGCTTTCTTC GGGGCTCCCT TTTGGGTGAT GTTGGAGTCC TATCTCGGGC GAGGATTTTC CAGCCGGTGA GCGCCCAATA CGACAGGTTT CACTCATTAG TGTGAGCGGA GTAGGAGAGC AGTTTAGAGG GTTGGTGCTA GCTGGGGTAA ATGGCGAATG CTCCTTCTGG TGGTTTCTTT GTTCCGCAAA ATGAIAATGT TACTCAAACT ATAACGTTCG GGCAAAGGAT CTAAATCCTC AAATGTATTA AAGATATTTT AGATAACCTT TATTGATTGA GGGTTTGATA CTGCTGGCTC TCAGCGTGGC TTTTATCTTC TGCTGGTGGT TTCGCGCA$T AAAGACTAAT TTfCAGGTCA GAAGGGTTCT TGACTGGTGA AICTGCCAAT GTATTTCCAT GAGCGTTTTT CCAGCAAGGC CGATAGTTTG GAAGTATTGC TACAACGGTT CTGATTATAA AAACACTTCT TCGGCCTCCT GTTTAGCTCC TCAAAGCAAC CATAGTACGC ACGCGCAGCG TGACCGCTAC CCTTCCTTTC TCGCCACGTT TTAGGGTTCC GAITTAGTGC GGTTCACGTA GTGGGCCATC ACGTTCTTTA AIAGTGGACT TATTCTTTTG ATTTATAAGG GCCTGCTGGG GCAAACCAGC AGGGCAATCA GCTGTTGCCC CGCAAACCGC CTCTCCCCGC CCCGACTGGA AAGCGGGCAG GCACCCCAGG CTTTACACTT TAAGAAITTC ACACAGGAAA TCGGCGGATC CTAGGCTGAA CAAGTGCTAC TGAGTACATT CCATAGGGAT TAAAITATTC TAGCGAAGAG GCCCGCACCG GCGCTTTGCC TGGTTTCCGG TTAATACGAG TCTATTGACG CCTCAATTCC TTTGAGGTTC ACTGTTGCAG TCGTTCGGTA AGCCATTCAA AICTCTGTTG GTAAATAAIC CCTGTTGCAA AGTTCTTCTA AATTTGCGTG CAAGATTCTG CGCTCTGAIT GCCCTGTAGC ACTTGCCAGC CGCCGGCTTT TTTACGGCAC GCCCTGATAG CTTGTTCCAA GATTTTGCCG GTCGACCGCT GTCTCGCTGG GCGTTGGCCG TGAGCGCAAC TATGCTTCCG CAGCTATGAC GGCGATGACC GGCTACGCTT AAAAAGTTTA AICCCCCTTC CACCAGAAGC TTGTCGAATT GCTCTAATCT TTTCTACTGT AGCAAGGTGA GCGGTGTTAA TTTTTAATGG AAATATTGTC GCCAGAATGT 
CATTTCAGAC TGGCTGGCGG CTCAGGCAAG AIGGACAGAC GCGTACCGTT CCAACGAGGA GGCGCATTAA GCCCTAGCGC CCCCGTCAAG CTCGACCCCA ACGGTTTTTC ACTGGAACAA ATTTCGGAAC TGCTGCAACT TGAAAAGAAA ATTCAITAAT GCAATTAATG GCTCGTATCT CAGGAIGTAC CTGCTAAGGC GGGCTATGGT CGAGCAAGGC CCAACAGTTG GGTGCCGGAA GTTTGTAAAG ATTAGTTGTT TGATTTGCCA TGCTTTAGAT TACTGACCGC CGATGTTTTA TGTGCCACCT CCCTTTTATT GATTGAGCGT TAATATTGTT TGAIGTTATT TCTTTTACTC CCTGTTAAA AAGCACGTTA GCGCGGCGGG CCGCTCCTTT CTCTAAATCG AAAAACTTGA GCCCTTTGAC CACTCAACCC CACCATCAAA CTCTCAGGGC AACCACCCTG GCAGCTCGCA TGAGTTAGCT TCTGTGGAAT GAATTCGCAG TGCATTCAAT AGTAGTTATA TTCTTAACCA CGCAGCCTGA AGCTGGCTGG 4680 4740 4800 4860 4920 4980 5040 5100 S160 5220 5280 5340 5400 S460 5520 S580 5640 S700 5760 S820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 AGTGCGATCT ACGATGCGCC CCACGGAGAA AGGAACGCCA TTAACAAAAA TTATACAATC CAIGCTAGTT TGACCTGATA AGCTAGAACG TTTTGAATCTI AAATTTTTAT TGTTTTTGGT TTCTTTGCCT TCCTGAGGCC GATACGGTCG CATCTACACC AACGTAACCT TCCGACGGGT TGTTACTCGC GACGCGAATT A1111lbATG TTTAACGCGA ATTTTAACAA TTCCTGTTTT TGGGGCTTTT TTACGATTAC CGTTCATCGA GCCTTTGTAG ATCTCTCAAA GTTGAATATC ATAITGATGG TTACCTACAC ATTACTCAGG TCGTCCCCTC AAACTGGCAG ATGCACGGTT ATCCCATTAC GGTCAATCCG CCGTTTGTTC TCACATTTAA TGTTGATGAA AGCTGGCTAC Gbb11bCTAT 1bb11AAAAA AIGAGCTGAT AATATTAACG TTTACAATTT AAATAITTGC CTGATTATCA ACCGGGGTAC AIATGATTGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA AATAGCTACC CTCTCCGGCA TTAAETTATC TGATTTGACT GTCTCCGGCG TTTCTCACCC CAITGCAITT AAAAIATATG AGGGTTCTAA CCTTGCGTTG AAAIAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA CTCTGAGGCT TTATTGCTTA AITTLUCTAA CGTT ACAACCGATT TAGCTTTATG TGCCTGTATC AITTAITGGA (2) INFORMATION FOR SEQ ID NO:2: (1) SEQUENCE CHARACTERISTICS: ( ( ( ( A) LENGTH: 7320 base pairs B) TYPE: nucleic acid C) STRANDEDNESS: both D) TOPOLDGY: circular (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: AAIGCTACIA ATAGCTAAAC CGTTCGCAGA GTTGCATATT TCTGCAAAAA TTGGAGTTTG TCTTTCGGGC CAGGGTAAAG TTTGAGGGGG AAACATTTTA GGTTTTTATC AATTCCTTTT ATGAAICTTT TCTTCCCAAC CTATTAGTAG AATTGATGCC ACCTTTTCAG 
CTCGCGOCCC AAAICAAAAT AGGTTATTGA CCATTTGCGA AAIGTATCTA ATCGTCAAAC TAAATCTACT AITGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA TAAAACAIGT TGAGCTACAG CACCAGAITC AGCAAITAAG CTCTAAGCCA TGACCTCTTA TCAAAAGGAG CAAITAAAGG TACTCTCTAA TCCTGACCTG CTTGCGGTCT GGTTCGCTTT GAAGGTCGAA TTAAAACGCG AIAITTGAAG TTCCTCTTAA TCTTTTTGAT CCAATCCGCT TTGCTTCTGA CTATAATAGT ACCTGATTTT TGAITTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA AITCAAIGAA TATTTAIGAC GAITCCGCAG TATTGGACGC TATCCAGTCT CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCC TCGCTAITTT GTCGTCTGGT AAACGAGGGT TAIGAIAGTG TTGCTCTTAC TATGCCTCGT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG CTACCTGTAA TAAIGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT GTCCTGACTG GTAIAAIGAG CCAGTTCTTA AAAICGCATA AGGTAATTCA 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 72120 180 2&0 300 360 420 480 540 600 660 720 780 840 CAATGATTAA CTCGTCAGGG AATATCCGGT TGTACACCGT GTCTGCGCCT CAGGCGATGA CAAAGAIGAG GTGGCATTAC CAAAGCCTCT CGATCCCGCA TGCGTGGGCG AITCACCTCG TTTTTGGAGA TATTCTCACT TTTACTAACG CTGTGGAATG TGGGTTCCTA TCTGAGGGTG ATTCCGGGCT AACCCCGCTA CAGAAIAATA CAAGGCACTG TATGACGCTT GATCCAITCG GCTGGCGGCG GGCGGTTCTG GATTTTCATT GAAAACGCGC GCTGCTATCG GCTGATTTTG TTAATGAATA TTTGTCTTTA TTCCGTGGTG AGTTGAAAIT CAAGCCTTAI TCTTGTCAAG TCATCTGTCC CGTTCCGGCT TACAAATCTC TGTTTTAGTG GTATTTTACC GTAGCCGTTG AAAGCGGCCT ATGGTTGTTG AAAGCAAGCT TTTTCAACGT CCGCTGAAAC TCTGGAAAGA CTACAGGCGT TTGGGCTTGC GCCGTTCTGA ATACTTATAI ATCCTAATCC GGTTCCGAAA ACCCCGTTAA ACTGGAACGG TTTGTGAATA GCTCTGGTGG AGGGTGGCGG ATGAAAAGAI TACAGTCTGA ATGGTTTCAT CTGGCTCTAA AAACCATCTC TCACTGAATG ATTACTCTTG TCTTTCAAAC AAGTAACATG CGTTGTACTT TATTCTTTCG CGTTTAATGG CTAGCCTCGT TTAACTCCCT TCAITGTCGG GATAAACCGA GAAAAAAETA TGTTGAAAGT CGACAAAACT TGTAGTTTGT TATCCCTGAA GGGTGGCGGT CAACCCTCTC TTCTCTTGAG TAGGCAGGGG AACTTAITAC TAAATTCAGA TCAAGGCCAA TGGTTCTGGT CTCTGAGGGA GGCAAACGCT CGCTAAAGGC TGGTGACGTT TTCCCAAATG ATTTCCGTCA AIATTTACCT CCGCTGGTAA ACCATATGAA TCTTTGCGTT TCTTTTATAT TTTGCTAACA TACTGCGTAA TAACGAGTCT AAGCCCAATT AGCAGCTTTG ATGAAGGTCA 
TTGGTCAGTT GAGCAGGTCG TGTTTCGCGC CCTCTTTCGT AAACTTCCTC TCCGATGCTG GCAAGCCTCA CGCAACTATC TACAATTAAA TTATTCGCAA TCTTTAGCAA TTAGATCGTT ACTGGTGACG AATGAGGGTG ACTAAACCTC GACGGCACTT GAGTCTCAGC GCAITAACTG CAGTACACTC GACTGCGCTT TCGTCTGACC GGCGGCTCTG GGCGGTTCCG AATAAGGGGG AAACTTGAIT TCCGGCCTTG GCTCAAGTCG TCCCTCCCTC TTTTCTATTG GTTGCCACCT TAATCATGCC TACTACTCGT TCTGGTGTTT TTACGTTGAT TTGGGTAATG GCCAGCCTAT GCGCCTGGTC CGGTTCCCTT AIGAITGACC CGGAITTCGA CACAATTTAT TTGGTATAAT CGCTGGGGGT TTTAGGTTGG TGCCTTCGTA ATGAAAAAGT CTTTAGTCCT TCTTTCGCTG CTGAGGGTGA GCGACCGAAI AIATCGGTTA GGTAICAAGC TGTTTAAGAA GGCTCCTTTT GGAGCCTTTT TTCCTTTAGT TGTTCCTTTC AACCCCATAC AGAAAATTCA ACGCTAACTA TGAGGCTTGT AAACTCAGTG TIACGGTACA GTCGCTCTGA GGGTGGCGGT CTGAGTACGG TGAIACACCT ATCCGCCTGG TACTGAGCAA CTCTTAATAC TTTCATGTTT TTTATACGGG CACTGTTACT CTGTAICATC AAAAGCCATG TCCATTCTGG CTTTAATGAA TGCCTCAACC TCCTGTCAAT AGGGTGGTGG CTCTGAGGGT GTGGTGGCTC TGGTTCCGGT CTATGACCGA AAATGCCGAT CTGTCGCTAC TGATTACGGT CTAATGGTAA TGGTGCTACT GTGACGGTGA TAATTCACCT AATCGGTTGA.AIGTCGCCCT ATTGTGACAA AAIAAACTTA TTATGTATGT ATTTTCTACG AGTTCTTTTG GGTATTCCGT 1020 1080 1140 1200 1260 1320 1380 1A40 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 TATTATTGCG TTAAAAAGGG GGCTTAACTC TTGTTCAGGG TCTCTGTAAA AITGGGATAA CTCGTTAGCG CTTGATTTAA CTTAGAATAC TCCTACGATC ACCCGTTCTT AAATTACCAT CGTTCTGCAT TTTGTCGGTA GTTGGCGTTC ACTGGTAAGA TCCGGTGTTT AATTTAGGTC TGTCTTGCGA GAGGTTAAAA CAGCCTCTTA AGCGACGAIT ATTAAAAAAG TGTTTCATCA TGTAACTTGG TACTGTTACT TGTTTTACGT TAATCCAAAC TGATAATTCC TTTTAAAATT GTCTAATACT TAGTGCACCT AACTGACCAG TTTTTCATTT TTTCCTCGGT CTTCGGTAAG AATTCTTCTC TGTTCAGTTA GGCTGCTATT ATAAIATGGC TTGGTAAGAT GGCTTCAAAA CGGATAAGCC AAAAIAAAAA GGAAIGATAA GCGATATTAT TACCTGAACA CTTTATATTC TTAAATATGG ATTTGTATAA ATTCTTATTT AGAAGATGAA TTGGAITTGC AGGTAGTCTC ATCTAAGCTA TACAGAAG CA GTAATTCAAA TCITCTTTTG TAITCAAAGC GTATAITCAT GCTAATAATT AATCAGGATT GCTCCTTCTG AATAACGTTC TCTAAATCCT AAAGATATTT ATAITGATTG GCTGCTGGCT 
TTCCTTCTGG ATAGCTATTG GGTTATCTCT ATTCTCCCGT TTCATTTTTG TGTTTATTTT TTAGGAIAAA CCTCCCGCAA TTCTATATCT CGGCTTGCTT GGAAAGACAG CTTCCTTGTT TGTTGTTTAT TCTTATTACT CGATTCTCAA CGCATATGAT AACGCCTTAT ATTAACTAAA ATCAGCATTT TCAGACCTAT TCGCTATGTT AGGTTATTCA TGAAATTGTT CTCAGGTAAT AATCAGGCGA CTGACGTTAA TTGATATGGT ATATTGATGA GTGGTTTCTT GGGCAAAGGA CAAATGTATT TAGATAACCT AGGGTTTGAT CTCAGCGTGG TAACTTTGTT CTATTTCATT CTCATATTAC CTAATGCGCT ACGTTAAACA GTAACTGGCA ATTGTAGCTG GTCGGGAGGT GATTTGCTTG GTTCTCGATG CCGATTATTG CAGGACTTAT TGTCGTCGTC GCCTCGAAAA TTAAGCCCTA ACTAAACAGG TTATCACACG ATATATTTGA ACATATAGTT CGGCTATCTG GTTTCTTGCT CGCTCAATTA TCCCTGTTTT AAAAAICGTT AATTAGGCTC GGTCCAAAAT TCGCTAAAAC CTATTGGGCG AGTGCGGTAC AITGGTTTCT CTATTGTTGA TGGACAGAAT TGCCTCTGCC CTGTTGAGCG CTTTTTCTAG GTCGGTATTT AAAAGTTTTC ATATAACCCA c'r'rAcI'r'r'rc CTTATTATTG cccrcrcacr m:c:m-r'.rc rcrramcc TGGAAAGACG AGCAACTAAT cccrccccrr cccnwrcxr TTGGTTTAAT ACATGCTCGT TAAACAGGCG TACTTTACCT TAAATTACAT rrcccrrmr TAATTATGAT cumccarm TCGCG'I'TC'1'I‘ ACCTAAGCCG GATTTTGATA AATTCACTAT TGACTCTTCT TTCAAGGATT CTCACATATA CTAAGGGAAA TTGAITTATG AITAAITAAT TACTGTTTCC AAATGTAATT AAITTTGTTT TCTTGAIGTT TGAAATGAAT ATCCGTTATT ACCTGAAAAT TGGTTCAATT ATTGCCAICA TGTTCCGCAA TTTAATACGA ATCTATTGAC TCCTCAATTC ATTTGAGGTT CACTGTTGCA AAITCGCCTC GTTTCTCCCG CTACGCAATT CCTTCCATAA TCTGAIAATC AATGAIAATG GTTGTCGAAT GGCTCTAATC CTTTCTACTG CAGCAAGGTG GGCGGTGTTA TGCGCGATTT AICTAAAAGG TCTTTATTTC TTCAGAAGTA AGGAATATGA TTACTCAAAC TGTTTGTAAA TAITAGTTGT TTGATTTGCC AIGCTTTAGA AIACTGACCG 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 CCTCACCTCT AGGGCTATCA TATTCTTACG TACTGGTCGT TCAAAATGTA TCTGGAIATT TACTAATCAA CGGTGCCCTC AATCCCTTTA ATACGTGCTC GTGTGGTGGT TCGCTTTCTT GGGGGCTCCC ATTTGGGTGA CGTTGCAGTC CTATCTCGGG ACAGGATTTT CCAGGCGGTG GGCGCCCAAI ACGACAGGTT TCACTCATTA TTGTGAGCGG TACGGCAGCC GACCCAGACT ACTGGCCCTC CCTTGCAGCA CCCTTCCCAA AGAAGCGGTG CCCCTCAAAC CATTACGGTC 
ATTTAATGTT TCCTATTGGT TTAACGTTTA TTATCAACCG GTTTTATCTT GTTCGCGCAT CTTTCAGGTC GTGACTGGTG GGTAITTCCA ACCAGCAAGG AGAAGTAITG ACTGATTATA ATCGGCCTCC GICAAAGCAA TACGCGCAGC CCCTTCCTTT TTTAGGGTTC TGGTTCACGT CACGTTCTTT CTAITCTTTT CGCCTGCTGG AAGGGCAATC ACGCAAACCG TCCCGACTGG GGCACCCCAG ATAACAATTT GCTGCATTGT CCAGAAITCC GTTTTACAAC CACCCCCCTT CAGTTGCGCA CCGGAAAGCT TGGCAGATGC AATCCGCCGT GATGAAAGCT TAAAAAATGA CAAITTAAAT GGGTACATAT CTGCTGGTGG TAAAGACTAA AGAAGGGTTC AAICTGCCAA TGAGCGTTTT CCGATAGTTT CTACAACGGT AAAACACTTC TGTTTAGCTC CCATAGTACG GTGACCGCTA CTCGCCACGT CGAITTAGTG AGTGGGCCAT AAIAGTGGAC GATTTATAAG GGCAAACCAG AGCTGTTGCC CCTCTCCCCG AAAGCGGGCA GCTTTACACT CACACGCCAA TAITACTCGC AICCGGAATG GTCGTGACTG TCGCCAGCTG GCCTGAATGG GGCTGGAGTG ACGGTTACGA TTGTTCCCAC GGCTACAGGA GCTGATTTAA ATTTGCTTAT GATTGACATG TTCGTTCGGT TAGCCATTCA TATCTCTGTT TGTAAATAAT TCCTCTTGCA GAGTTCTTCT TAATTTGCGT TCAAGATTCT CCGCTCTGAT CGCCCTGTAG CACTTGCCAG TCCCCGGCTT CTTTACGGCA CGCCCTGATA TCTTGTTCCA GGAITTTGCC CGTGGACCGC CGTCTCGCTG CGCGTTGGCC GTGAGCGCAA TTAIGCTTCC GGAGACAGTC TGCCCAACCA AGTGTTAATT GGAAAACCCT GCGTAAIAGC CGAATGGCGC CGAICTTCCT TGCGCCCATC GGAGAATCCG AGGCCAGACG CAAAAAITTA ACAATCTTCC CTAGTTTTAC AITTTTAATG AAAATATTGT GGCCAGAATG CCATTTCAGA ATGGCTGGCG ACTCAGGCAA GAIGGACAGA GGCGTACCGT TCCAACGAGG CGGCGCATTA CGCCCTAGCG TCCCCCTCAA CCTCGACCGC GACGGTTTTT AACTGGAACA GATTTCGGAA TTGCTGCAAC GTGAAAAGAA GATTCATTAA CGCAATTAAT GGCTCGTATG AIAATGAAAT GCCATGGCCG CTAGAACGCG GGCGTTACCC GAAGAGGCCC TTTGCCTGGT GAGGCCGATA TACACCAACG ACGGGTTGTT CCAAITATTT ACGGGAATTT TGTTTTTGGG GAITACCGTT GCGAIGTTTT CTGTGCCACG TCCCTTTIAT CGAITGAGCG GTAAIATTGT GTGATGTTAT CTCTTTTACT TCCTGTCTAA AAAGCACGTT AGCGCGGCGG CCCGCTCCTT GCTCTAAATC AAAAAACTTG CGCCCTTTGA ACACTCAACC CCACCAICAA TCTCTCAGGG AAACCACCCT TGCAGCTGGC GTGAGTTAGC TTGTGTGGAA ACCTAITGCC AGCTCGTGAI TAAGCTTGGC AACTTAATCG GCACCGATCG TTCCGGCACC CGGTCGTCGT TAACTATCC ACTCGCTCAC TTGAIGGCGT TAAGAAAATA GCTTTTCTGA CATCGATTCT CTTGTTTGCT CCAGACTCTC AGGCAATGAC CTGATAGCCT GCTACCCTCT CCGGCATTAA TTTATCAGCT 
AGAACGGTTG TTGACTGTCT CCGGCCTTTC TCACCCTTTT GAATCTTTAC GCATTTAAAA TATATGAGGG TTCTAAAAAT TTTTATGCTT CCCGCAAAAG TATTACAGGG TCATAATGTT TTTCGTACAA GAGGCTTTAT TCCTTAATTT TGCTAATTCT TTGCCTTGCC (2) INFORMATION FOR SEQ ID NO:3: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7445 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: both (D) TOPOLOGY: circular (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA CGTTCGCAGA GTTGCATATT TCTGCAAAAA TTGGAGTTTG TCTTTCGGGC CAGGGTAAAG TTTGAGGGGG AAACATTTTA GGTTTTTATC AATTCCTTTT ATGAATCTTT TCTTCCCAAC CAATGATTAA CTCGTCAGGG AATATCCGGT TGTACACCGT GTCTGCGCCT CAGGCGATGA CAAAGATGAG ATTGGGAATC TAAAACATGT TGACCTCTTA CTTCCGGTCT TTCCTCTTAA ACCTGATTTT ATTCAATGAA CTATTACCCC GTCGTCTGGT GGCGTTATGT CTACCTGTAA GTCCTGACTG AGTTGAAATT CAAGCCTTAT TCTTGTCAAG TCATCTGTCC CGTTCCGGCT TACAAATCTC TGTTTTAGTG AACTGTTACA TGAGCTACAG TCAAAAGGAG GGTTCGCTTT TCTTTTTGAT TGATTTATGG TATTTATGAC CTCTGGCAAA AAACGAGGGT ATCTGCATTA TAATGTTGTT GTATAATGAG AAACCATCTC TCACTGAATG ATTACTCTTG TCTTTCAAAG AAGTAACATG CGTTGTACTT TATTCTTTCG TGGAATGAAA GACCAGATTC CAATTAAAGG GAAGCTCGAA GCAATCCGCT TCAITCTCGT GATTCCGCAG ACTTCTTTTG TATGATAGTG GTTGAATGTG CCGTTAGTTC CCAGTTCTTA AAGCCCAATT AGCAGCTTTG ATGAAGGTCA TTGGTCAGTT GAGCAGGTCG TGTTTCGCGC CCTCTTTCGT TTGTAGATCT CTCAAAAATA AATATCATAT TGATGGTGAT CTACACATTA CTCAGGCATT GCGTTGAAAT AAAGGCTTCT CCGATTTAGC TTTATGCTCT TGTATGATTT ATTGGACGTT CTCGCGCCCC ATGGTCAAAC CTTCCAGACA AGCAATTAAG TACTCTCTAA TTAAAACGCG TTGCTTCTGA TTTCTGAACT TATTGGACGC CAAAAGCCTC TTGCTCTTAC GTATTCCTAA GTTTTATTAA AAATCGCATA TACTACTCGT TTACGTTGAT GCCAGCCTAT CGGTTCCCTT CGGATTTCGA TTGGTATAAT TTTAGGTTGG AAATGAAAAT TAAATCTACT CCGTACTTTA CTCTAAGCCA TCCTGACCTG ATAITTGAAG CTATAATAGT GTTTAAAGCA TATCCAGTCT TCGCTATTTT TATGCCTCGT ATCTCAACTG CGTAGATTTT AGGTAATTCA TCTGCTGTTT TTGGGTAATG GCGCCTGGTC ATGATTGACC CACAATTTAT CGCTGGGGGT TGCCTTCGTA 7140 7200 7260 73120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1140 1200 1260 
GTGGCATTAC CAAAGCCTCT CGATCCCGCA TGCGTGGGCG ATTCACCTCG TTTTTGGAGA TATTCTCACT TTTACTAACG CTGTGGAATG TGGGTTCCTA TCTGAGGGTG AITCCGGGCT AACCCCGCTA CAGAATAATA CAAGGCACTG TATGACGCTT GAICCATTCG GCTGGCGGCG GGCGGTTCTG GATTTTGATT GAAAACGCGC GCTGCTATCG GGTGATTTTG TTAAIGAATA TTTGTCTTTA TTCCGTGGTG TTTGCTAACA TATTATTGCG TTAAAAAGGC GCCTTAACTC TTGTTCAGGG TCTCTGTAAA ATTGGGATAA CTCGTTAGCG GTATTTTACC GTAGCCGTTG AAAGCGGCCT AIGGTTGTTG AAAGCAAGCT TTTTCAACGT CCGCTGAAAC TCTGGAAAGA CTACAGGCGT TTGGGCTTGC GCGGTICTGA ATACTTAIAT ATCCTAAICC GGTTCCGAAA ACCCCGTTAA ACTGGAACGG TTTGTGAAIA GCTCTGGTGG AGGGTGGCGG AIGAAAAGAI TACAGTCTGA AIGGTTTCAT CTGGCTCTAA ATTTCCGTCA GCGCTGGTAA TCTTTGCGTT TACTGCGTAA TTTCCTCGGT CTTCGGTAAG AAITCTTGTG TGTTCAGTTA GGCTGCTATT AIAAIAIGGC TTGGTAAGAT CGTTTAATGG CTACCCTCGT TTAACTCCCT TCATTGTCGG GATAAACCGA GAAAAAATTA TGTTGAAAGT CGACAAAACT TGTAGTTTGT TAICCCTGAA GGGTGGCGGT CAACCCTCTC TTCTCTTGAG TAGGCAGGGG AACTTATTAC TAAAITCAGA TCAAGGCCAA TGGTTCTGGT CTCTGAGGGA GGCAAACGCT CGCTAAAGGC TGGTGACGTT TTCCCAAATG AIAITTACCT ACCAIATGAA TCTTTTAIAT TAAGGAGTCT TTCCTTCTGG ATAGCTATTG GGTTATCTCT AITCTCCCGT TTCAITTTTG TGTTTATTTT TCAGGATAAA AAACTTCCTC TCCGATGCTG GCAAGCCTCA CGCAACTATC TACAATTAAA TTAITCGCAA TGTTTAGCAA TTAGATCGTT ACTGGTGACG AATGAGGGTG ACTAAACCTC GACGGCACTT GAGTCTCAGC GCATTAACTG CAGTACACTC GACTGCGCTT TCGTCTGACC GGCGGCTCTG GGCGGTTCCG AATAAGGGGG AAACTTGATT TCCGGCCTTG GCTCAAGTCG TCCCTCCCTC TTTTCTATTG GTTGCCACCT TAATCATGCC TAACTTTGTT CTATTTCATT CTGATATTAG CTAATGCGCT ACGTTAAACA GTAACTGGCA AITGTAGCTG AIGAAAAAGT TCTTTCGCTG GCGACCGAAT GGTATCAAGC GGCTCCTTTT TTCCTTTAGT AACCCCATAC ACGCTAACTA AAACTCAGTG GTGGCTCTGA CTGAGTACGG ATCCGCCTGG CTCTTAAIAC TTTATACGGG CTGTATCATC TCCAITCTGG TGCCTCAACC AGGGTGGTGG GTGGTGGCTC CTATGACCGA CTGTCGCTAC CTAATGGTAA GTGACGGTGA AAICGCTTGA ATTGTGACAA TTAIGTATGT AGTTCTTTTG CGGCTATCTG GTTTCTTGCT CGCTCAAITA TCCCTGTTTT AAAAATCGTT AAITAGGCTC CTTTAGTCCT CTGAGGGTGA AIATCGGTTA TGTTTAAGAA GGAGCCTTTT TGTTCCTTTC AGAAAATTCA TGAGGGTTGT TTACGGTACA GGGTGGCGGT TGAIACACCT TACTGAGCAA 
TTTCAIGTTT CACTGTTACT AAAAGCCATG CTTTAATGAA TCCTGTCAAT CTCTGAGGGT TGGTTCCGGT AAAIGCCGAT TGATTACGGT TGGTGCTACT TAATTCACCT ASGTCCCCCT AATAAACTTA AITTTCTACG GGTAITCCGT CTTACTTTTC CTTATTATTG CCCTCTGACT TAIGTTATTC TCTTAITTGG TGGAAAGACG GGTGCAAAAT AGCAACTAAT 1440 1500 1560 1620 1680 1740 1800 1860 1920 1930 2040 2100 2160 2220 2280 2340 2400 2h60 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 32h0 3300 CTTGATTTAA CTTAGAATAC TCCTACGATG ACCCGTTCTT AAATTAGGAT CGTTCTGCAT TTTGTCGGTA GTTGGCGTTG ACTGGTAAGA TCCGGTGTTT AATTTAGGTC TGTCTTGCGA GAGGTTAAAA CAGCGTCTTA AGCGACGATT ATTAAAAAAG TGTTTCATCA TGTAACTTGG TACTCTTACT TGTTTTACGT TAATCCAAAC TGAIAATTCC TTTTAAAATT GTCTAATACT TAGTGCACCT AACTGACCAG TTTTTCATTT CCTCACCTCT AGGGCTATCA TATTCTTACG TACTGGTCGT TCAAAATGTA TCTGGATATT TACTAATCAA GGCTTCAAAA CGGATAAGCC AAAATAAAAA GGAATGATAA GGGATATTAT TAGCTGAACA CTTTATATTC TTAAATATGG ATTTGTATAA ATTCTTATTT AGAAGATGAA TTGGATTTGC AGGIAGTCTC ATCTAAGCTA TACAGAAGCA GTAATTCAAA TCTTCTTTTG TATTCAAAGC GTATAITCAT GCTAATAATT AATCAGGATT GCTCCTTCTG AAIAACGTTC TCTAAATCCT AAAGAIAITT ATATTGATTG GCTGCTGGCT GTTTTATCTT GTTCGCGCAT CTTTCAGGTC GTGACTGGTG GGTATTTCCA ACCAGCAAGG AGAAGTATTG CCTCCCGCAA TTCTATAICT CGGCTTGCTT GGAAAGACAG TTTTCTTGTT TGTTGTTTAT TCTTAITACT CGATTCTCAA CGCATATGAT AACGCCTTAT GCTTACTAAA AICAGCATTT TCAGACCTAT TCGCTATGTT AGGTTATTCA TGAAATTGTT CTCAGGTAAT AATCAGGCGA CTGACGTTAA TTGATATGGT AIATTGATGA GTGGTTTCTT GGGCAAAGGA CAAATGTATT TAGATAACCT AGGGTTTGAT CTCAGCGTGG CTGCTGGTGG TAAAGACTAA AGAAGGGTTC AAICTGCCAA TGAGCGTTTT CCGATAGTTT CTACAACGGT GTCGGGAGGT GATTTGCTTG CTTCTCGATG CCGATTATTG CAGGACTTAT TGTCGTCGTC GGCTCGAAAA TTAAGCCCTA ACTAAACAGG TTATCACACG ATATATTTGA ACATATAGTT GAITTTGATA TTCAAGGATT CTCACATATA AAATGTAAIT TGAAATCAAT ATCCGTTATT ACCTGAAAAT TGGTTCAAIT ATTGCCATCA TGTTCCGCAA TTTAAIACGA ATCTAITGAC TCCTCAATTC ATTTGAGGTT CACTGTTGCA TTCGTTCGGT TAGCCATTCA TATCTCTGTT TGTAAATAAI TCCTGTTGCA GAGTTCTTCT TAATTTGCGT TCGCTAAAAC CTATTGGGCG ACTGCGGTAC ATTGGTTTCT CTATTGTTGA TGGACAGAAT TGCCTCTGCC CTGTTGAGCG 
CTTTTTCTAG GTCGGTATTT AAAAGTTTTC ATATAACCCA AAITCACTAT CTAAGGGAAA TTGATTTATG AATTTTGTTT AAITCGCCTC GTTTCTCCCG CTACGCAATT CCTTCCATAA TCTGAIAATC AATGATAATG GTTGTCGAAT GCCTCTAATC CTTTCTACTG CAGCAAGGTG GGCGGTGTTA ATTTTTAATG AAAAIATTGT GGCCAGAATG CCATTTCAGA ATGGCTGGCG ACTCAGGCAA GATGGACAGA GCCTCGCGTT CGGTAATGAT TTGGTTTAAT ACAIGCTCGT TAAACAGGCG TACTTTACCT TAAAITACAT TTGGCTTTAT TAATTATGAT CAAACCATTA ACGCGTTCTT ACCTAAGCCG TGACTCTTCT ATTAATTAAT TACTGTTTCC TCTTGATGTT TGCGCGATTT ATGTAAAAGG TCTTTATTTC TTCAGAAGTA AGGAATATGA TTACTCAAAC TGTTTGTAAA TATTAGTTGT TTGATTTGCC ATGCTTTAGA ATACTGACCG GCGATGTTTT CTGTGCCACG TCCCTTTTAT CGATTGAGCG GTAAIATTGT GTGAIGTTAT CTCTTTTACT 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4300 4860 4920 4980 5040 5100 5160 5220 5280 5340 CGGTGGCCTC AATCCCTTTA ATACGTGCTC GTGTGGTGGT TCGCTTTCTT GGGGGCTCCC AITTGGGTGA CGTTGGAGTC CTATCTCGGG ACAGGAITTT CCAGGCGGTG GGCGCCCAAT ACGACAGGTT TCACTCATTA TTGTGAGCGG GTGACTGGGA AAGCACTAIT CGCCCAGGTC CTAGGCTGAA TGAGTACATT TAAAITATTC GATCGCCCTT GCACCAGAAG GTCGTCCCCT TAICCCATTA CTCACATTTA GGCGTTCCTA AAATAITAAC TCTGATTATC ATTCTCTTGT AAATAGCTAC GTGATTTGAC GCATTGCATT ACTGATTATA AICGGCTCC GTCAAAGCAA TACGCGCAGC CCCTTCCTTT TTTAGGGTTC TGGTTCACGT CACGTTCTTT CTATTCTTTT CGCCTGCTGG AAGGGCAATC ACGCAAACCG TCCCGACTGG GGCACCCCAG ATAACAAITT AAACCCTGGC GCACTGGCAC CAGCTGCTCG GGCGAIGACC GGCTACGCTT AAAAAGTTTA CCCAACAGTT CGGTGCCGGA CAAACTGGCA CGGTCAATCC ATGTTGATGA TTGGTTAAAA GTTTACAATT AACCGGGGTA TTGCTCCAGA CCTCTGCGGC TGTCTCCGGC TAAAAIATAT AAAACACTTC TGTTTAGCTC CCATAGTACG GTCACCGCTA CTCGCCACGT CGATTTAGTG AGTGGGCCAI AATAGTGGAC GATTTATAAG GGCAAACCAG AGCTGTTCCC CCTCTCCCCG AAAGCGGGCA GCTTTACACT CACACGCGTC GTTACCCAAG TCTTACCGTT AGTCAGGCCT CTGCTAAGGC GGGCTATGGT CGAGCAAGGC GCGCAGCCTG AAGCTGGCTG GATGCACGGT GCCGTTTGTT AAGCTGGCTA AAIGAGCTGA TAAATAITTG CAIATGATTG CTCTCAGGCA AITAAITTAT CTTTCTCACC GAGGGITCTA TCAAGAITCT CCGCTCTGAT CGCCCTGTAG CACTTGCCAG TCGCCGGCTT CTTTACGGCA CGCCCTGATA TCTTGTTCCA 
GGAITTTGCC CGTGGACCGC CGTCTCGCTG CGCGTTGGCC GTGACCGCAA TTATGCTTCC ACTTGGCACT CTTTGIACAT ACCGTTACTG ATTGTGCCCA GGCGTACCGT TCCTGTCTAA TCCAACGAGG AAAGCACGTT CGGCGCATTA AGCGCGGCGG CGCCCTAGCG CCCGCTCCTT TCCCCGTCAA GCTCTAAATC CCTCGACCCC AAAAAACTTG GACGGTTTTT CGCCCTTTGA AACTGGAACA ACACTCAACC GATTTCGGAA CCACCAICAA TTGCTGGAAC TCTCTCAGGG GTGAAAAGAA AAACCACCCT GATTCATTAA CGCAATTAAT GGCTCGTATG GGCCGTCGTT GGAGAAAATA TTTACGCCTG GGGGAITGTA TGCATTCAAT AGTTTACAGG AGIATTATA GTTGGTGCTA TTCTTAAGCA ATAGCGAAGA AATGGCGAAT GGCGCTTTGC GAGTGCGATC TTCCTGAGGC TACGATGCGC CCATCTACAC CCCACGGAGA AICCGACGGG CAGGAAGGCC AGACGCGAAT TTTAACAAAA AITTAACGCG CTTATACAAT CTTCCTGTTT ACATGCTAGT TTTACGATTA ATGACCTGAI AGCCTTTGTA CAGCTAGAAC GGTTGAATAT CTTTTGAAIC TTTACCTACA AAAATTTTIA TCCTTGCGTT TGCAGCTGGC GTGAGTTAGC TTGTGTGGAA TTACAACGTC AAGTGAAACA TGACAAAAGC CTAGTGGAIC CAAGTGCTAC CCATAGGGAT GGCCCGCACC CTGGTTTCCG CGATACGGTC CAACGTAACC TTGTTACTCG TATTTTTGAT AAITTTAACA TTGGGGCTTT CCGTTCATCG GAICTCTCAA CATAITGATG CATTACTCAG GAAATAAAGG CTTCTCCCGC AAAAGTATTA GAGGGTCAIA ATGTTTTTGG TACAACCGAI TTAGCTTTAT h60 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 73 GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG ACGTT
(2) INFORMATION FOR SEQ ID NO:4: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 7409 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: both (D) TOPOLOGY: circular (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
AATGCTACTA ATAGCTAAAC CGTTCGCAGA GTTGCATATT TCTGCAAAAA TTGGAGTTTG TCTTTCGGGC CAGGGTAAAG TTTGAGGGGG AAACATTTTA GGTTTTTATC AATTCCTTTT ATGAATCTTT TCTTCCCAAC CAATGATTAA CTCGTCAGGG AATATCCCCT TGTACACCGT GTCTGCGCCT CAGGCGATGA CAAAGATCAG GTGGCATTAC CAAAGCCTCT CGATCCCGCA TGCGTGGGCG CTAITAGTAG AGGTTATTGA AITGGGAATC TAAAACATGT TGACCTCTTA CTTCCGGTCT TTCCTCTTAA ACCTGATTIT ATTCAATGAA crawracccc crccrcrccr CGCGTTATGT CTACCTGTAA GTCCTGACTG AGTTGAAATT CAAGCCTTAI TCTTGTCAAG TCATCTGTCC ccrrcccccr TACAAATCTC
TGTTTTAGTG GTATTTTACC GTAGCCGTTG AAAGCGGCCT ATGGTTGTTG AATTGATGCC CCATTTGCGA AACTGTTACA TGAGCTACAG TCAAAAGGAG GGTTCGCTTT TCTTTTTGAT TGATTTATGG TAITTATGAC CTCTGGCAAA AAACGAGGGT ATCTGCATTA TAATGTTGTT GTATAATGAG AAACCATCTC TCACTGAATG ATTACTCTTG TCTTTCAAAG AAGTAACATG CGTTGTACTT TATTCTTTCG CGTTTAATGG CTACCCTCGT TTAACTCCCT TCATTGTCGG ACCTTTTCAG AATGTATCTA TCGAATGAAA CACCAGATTC CAATTAAAGG GAAGCTCGAA GCAATCCGCT TCATTCTCGT GAITCCGCAG ACTTCTTTTG TATGATAGTG GTTGAATGTG CCGTTAGTTC CCAGTTCTTA AAGCCCAATT AGCAGCTTTG ATGAAGGTCA TTGGTCAGTT GAGCAGGTCG TGTTTCGCGC CCTCTTTCGT AAACTTCCTC TCCGATGCTG GCAAGCCTCA CGCAACTATC CTCGCGCCCC ATGGTCAAAC CTTCCAGACA AGCAATTAAG TACTCTCTAA TTAAAACGCG TTGCTTCTGA TTTCTGAACT TAITGGACGC CAAAAGCCTC TTGCTCTTAC GTATTCCTAA GTTTIATTAA AAATCGCATA TACTACTCGT TTACGTTGAT GCCAGCCTAT CGGTTCCCTT CGGATTTCGA TTGGTATAAT TTTAGGTTGG ATGAAAAAGT TCTTTCGCTG GCGACCGAAT GGTATCAAGC AAATGAAAAT TAAATCTACT CCGTACTTTA CTCTAAGCCA TCCTGACCTG ATATTTGAAG CTATAATAGT GTTTAAAGCA TATCCAGTCT TCGCTATTTT TATGCCTCGT AICTCAACTG CGTAGATTTT AGGTAATTCA TCTGGTGTTT TTGGGTAATG GCGCCTGGTC ATGATTGACC CACAATTTAT CGCTGGGGGT TCCCTTCGTA CTTTAGTCCT CTGAGGGTGA ATATCGGTTA TGTTTAAGAA 180 2&0 300 360 A20 480 540 600 660 720 780 840 900 960 11hO 1200 1260 1320 1380 1440 1500 ATTCACCTCG TTTTTGGAGA TATTCTGACT TTTACTAACG CTGTGGAATG TGGGTTCCTA TCTGAGGGTG ATTCCGGGCT AACCCCGCTA CAGAATAATA CAAGGCACTG TATGACCCTT GATCCATTCG GCTGGCGGCG GGCGGTTCTG GAITTTGATT GAAAACGCGC GCTGCTATCG GGTGATTTTG TTAATGAATA TTTGTCTTTA TTCCGTGGTG TTTGCTAACA TATTATTGCG TTAAAAAGGG GGCTTAACTC TTGTTCAGGG TCTCTGTAAA ATTGGGATAA CTCGTTAGCG CTTGATTTAA CTTAGAATAC TCCTACGATG ACCCGTTCTT AAAGGAAGCT TTTTCAACGT CCGCTGAAAC TCTGGAAAGA CTACAGGCGT TTGGGCTTGC GCGGTTCTGA ATACTTATAT ATCCTAATCC GGTTCCGAAA ACCCCGTTAA ACTGGAACGG TTTGTGAATA GCTCTGGTGG AGGGTGGCGG GATAAACCGA GAAAAAATTA TGTTGAAAGT CGACAAAACT TGTAGTTTGT TATCCCTGAA GGGTGGCGGT CAACCCTCTC TTCTCTTGAG TAGGCAGGGG AACTTATTAC TAAAITCAGA TCAAGGCCAA TGGTTCTGGT CTCTGAGGGA - TACAAITAAA TTATTCGCAA TGTTTAGCAA TTAGATCGTT 
ACTGGTGACG AATGAGGGTG ACTAAACCTC GACGGCACTT GAGTCTCAGC GCATTAACTG CAGTACACTC GACTGCGCTT TCGTCTGACC GGCGGCTCTG GGCGGTTCCG AIGAAAAGAT GGCAAACGCT AAIAAGGGGG TACAGTCTGA ATGGTTTCAT CTGGCTCTAA AITTCCGTCA GCGCTGGTAA TCTTTGCGTT TACTGCGTAA TTTCCTCGGT CTTCGGTAAG AATTCTTGTG TGTTCAGTTA GGCTGCTATT AIAAIATGGC TTGGTAAGAT GGCTTCAAAA CGGATAAGCC CGCTAAAGGC TGGTGACGTT TTCCCAAATG ATAITTACCT ACCATATGAA TCTTTTATAT TAAGGAGTCT TTGCTTCTGG ATAGCTATTG GGTTAICTCT AITCTCCCGT TTCAITTTTG TGTTTATTTT TCAGGATAAA CCTCCCGCAA TTCTATATCT AAACTTGATT TCCGGCCTTG GCTCAAGTCG TCCCTCCCTC TTTTCTATTG GTTGCCACCT TAAICATGCC IAACTTIGTT CTAITTCATT CTGATATTAG CTAATGCGCT ACGTTAAACA GTAACTGGGA AITGTAGCTG GTCGGGAGGT GAITTGCTTG AAAAIAAAAA CGGCTTGCTT GTTCTCGATG GGAATGAIAA GGAAAGACAG CCGAITATTG GGCTCCTTTT TTCCTTTAGT AACCCCAIAC ACGCTAACTA AAACTCAGTG GTGGCTCTGA CTGAGTACGG ATCCGCCTGG CTCTTAATAC TTTATACGGG CTGTAICATC TCCATTCTGG TGCCTCAACC AGGGTGGTGG GTCGTGGCTC CTAIGACCGA CTGTCGCTAC CTAATGGTAA GTGACGGTGA AATCGGTTGA ATTGTGACAA TTAIGTATGT AGTTCTTTTG CGGCTATCTG GTTTCTTGCT CGCTCAAITA TCCCTGTTTT AAAAATCGTT AAITAGGCTC GGTGCAAAAT TCGCTAAAAC CTATTGGGCG AGTGCGGTAC AITGGTTTCT GGAGCCTTTT TGTTCCTTTC AGAAAATTCA TGAGGGTTGT TTACGGTACA GGGTCGCGGT TGATACACCT TACTGAGCAA TTTCAIGTTT CACTGTTACT AAAAGCCATG CTTTAATGAA TCCTGTCAAT CTCTGAGGGT TGGTTCCGGT AAATGCCGAT TGAITACGGT TGGTGCTACT TAAITCACCT AIGTCGCCCT AAIAAACTTA ATTTTCTACG GGTAITCCGT CTTACTTTTC CTTAITATTG CCCTCTGACT TAIGTTATTC TCTTATTTGG TGGAAAGACG AGCAACTAAT GCCTCGCGTT CGGTAATGAI TTGGTTTAAT ACATGCTCGT 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2h60 2520 2580 2640 2700 2760 2820 2830 2940 3000 3060 3120 3180 3240 3300 3360 3h20 3480 3540 AAATTAGGAT CGTTCTGCAT TTTGTCGTA GTTGGCGTTG ACTGGTAAGA TCCGGTGTTT AATTTAGGTC TGTCTTGCGA GAGGTTAAAA CAGCGTCTTA AGCGACGATT ATTAAAAAAG TGTTTCATCA TGTAACTTGG TACTGTTACT TGTTTTACGT TAATCCAAAC TGATAATTCC TTTTAAAATT GTCTAATACT TAGTGCACCT AACTGACCAG TTTTICATTT CCTCACCTCT AGGGCTATCA TATTCTTACG TACTGGTCGT TCAAAATGTA TCTGCATATT TACTAATCAA CGGTGGCCTC 
AATCCCTTIA ATACGTGCTC GTGTGGTGCT GGGATATTAT TAGCTGAACA CTTTATATTC TTAAATATGG ATTTGTATAA ATTCTTAITT AGAAGATGAA TTGGATTTGC AGGTAGTCTC AICTAAGCTA TACAGAAGCA GTAATTCAAA TCTTCTTTTG TATTCAAACC GTATATTCAT GCTAATAATT AATCAGGATT GCTCCTTCTG AATAACGTTC TCTAAATCCT AAAGATATTT ATATTGATTG GCTGCTGGCT GTTTTATCTT CTTCCCGCAT CTTTCAGGTC GTGACTCGTG GGTATTTCCA ACCAGCAAGG AGAAGTATTG ACTGAITATA ATCGGCCTCC GTCAAAGCAA TACGCGCAGC TTTTCTTGTT TGTTGTTTAT TCTTATTACT CGATTCTCAA CGCATATGAT AACGCCTTAT GCTTACTAAA ATCAGCATTT TCAGACCTAT TCGCTATGTT AGGTTATTCA TGAAATTGTT CTCAGGTAAT AATCAGGCGA CTGACGTTAA TTGATATGGT AIAITGATGA GTGGTTTCTT GGGCAAAGGA CAAATGTAIT TAGATAACCT ACGGTTTGAT CTCAGCGTGG CTGCTGGTGG TAAAGACTAA AGAAGGGTTC AATCTGCCAA TGAGCGTTTT CCGATAGTTT CTACAACGGT AAAACACTTC TGTTTAGCTC CCATAGTACG GTGACCGCTA CAGGACTTAT TGTCGTCGTC GGCTCGAAAA TTAAGCCCTA ACTAAACAGG TTATCACACG AIATATTTGA ACATATAGTT GATTTTGATA TTCAAGGATT CTCACATATA AAAICTAATT TGAAATGAAI AICCGTTATT ACCTGAAAAT TGGTTCAATT AITGCCATCA TGTTCCGCAA TTTAATACGA ATCTATTGAC TCCTCAATTC ATTTGAGGTT CACTGTTGCA TTCGTTCGGT TAGCCATTCA TATCTCTGTT TGTAAATAAT TCCTGTTGCA GAGTTCTTCT TAATTTGCGT TCAAGATTCT CCGCTCTGAT CGCCCTGTAG CACTTGCGAG CTATTGTTGA TGGACAGAAT TGCCTCTGCC CTGTTGAGCG CTTTTTCTAG GTCGGTATTT AAAAGTTTTC ATATAACCCA AATTCACTAT CTAAGGGAAA TTGATTTATG AATTTTGTTT AAITCGCCTC GTTTCTCCCG CTACGCAATT CCTTCGATAA TCTGATAATC AATGATAATG GTTGTCGAAT GGCTCTAATC CTTTCTACTG CAGCAAGGTC GGCGGTGTTA ATTTTTAATG AAAATATTGT GGCCAGAATG CCATTTCAGA ATGGCTGGCG ACTCAGGCAA GATGGACAGA GGCGTACCGT TCCAACGAGG CGGCGCATTA CGCCCTAGCG TAAACAGGCG TACTTTACCT TAAATTACAT TTGGCTTTAT TAATTATGAT CAAACCATTA ACGCGTTCTT ACCTAAGCCG TGACTCTTCT ATTAATTAAT TACTGTTTCC TCTTGATGTT TCCGCGATTT ATGTAAAAGG TCTTTATTTC TTCAGAAGTA AGGAATATGA TTACTCAAAC TGTTTGTAAA TATTAGTTGT TTCATTTGCC ATGCTTTAGA AIACTGACCG GCGATGTTTT CTGTGCCACG TCCCTTTTAT CGATTGAGCG GTAATATTGT GTGAIGTTAT CTCTTTTACT TCCTGTCTAA AAAGCACGTT AGCGCGGCGG CCCGCTCCTT 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 
4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 TCGCTTTCTT GGGGGCTCCC ATTTGGGTGA CGTTGGAGTC CTAICTCGGG ACAGGATTTT CCAGGCGGTG GGCGCCCAAT ACGACAGGTT TCACTCATTA' TTGTGAGCGG GTGACTGGGA AAGCACTATT GGGGTTTATG CAATAGTTTA TATAGTTGGT AGCAATAGCG GAATGGCGCT GATCTTCCTG GCGCCCATCT GAGAAICCGA GGCCAGACGC AAAAATTTAA CAAICTTCCT TAGTTTTACG TGAIAGCCTT GAACGGTTGA AATCTTTACC TTTATCCTTG TTGGTACAAC TGCCTTGCCT CCCTTCCTTT TTTAGGGTTC TCGTTCACGT CACGTTGTTT CTAITCTTTT CGCCTGCTGG AAGCGCAATC ACGCAAACCG TCCCGACTGG GGCACCCCAG AIAACAATTT AAACCCTGGC GCACTGGCAC ACTTCTGAGG CAGGCAAGTG GCTACCATAG AAGAGGCCCG TTGCCTGGTT AGGCCGATAC ACACCAACGT CGGGTTGTTA GAATTATTTT CGCGAATTTT GTTTTTGGGG AITACCGTTC TGTAGATCTC ATATCATATT TACACATTAC CTCGCCACGT CGATTTAGTG AGTGGGCCAT AATAGTGGAC GATTTATAAG GGCAAACCAG AGCTGTTGCC CCTCTCCCCG AAAGCGGGCA GCTTTACACT CACACGCGTC GTTACCCAAG TCTTACCGTT GATCCGGAGC CTACTGAGTA GGATTAAATT CACCGAICGC TCCGGCACCA GGTCGTCGTC AACCTAICCC CTCGCTCACA TGATGGCGTT AACAAAATAI CTTTTCTGAT ATCGATTCTC TCAAAAATAG GATGGTGATT TCAGGCATTG CGTTGAAAIA AAGGCTTCTC CGATTTAGCT TTATGCTCTG GTATGATTTA TTGGACGTT TCGCCGGCTT CTTTACGCCA CGCCCTGATA TCTTGTTCCA GGAITTTGCC CGTGGACCGC CGTCTCGCTG CGCGTTGGCC GTGAGCGCAA TTATGCTTCC ACTTGGCACT CTTTGTACAT ACTGTTTACC TGAAGGCGAI CATTCGCTAC AITCAAAAAG CCTTCCCAAC GAAGCGGTGC CCCTCAAACT ATTACGGTCA TTTAATGTTG CCTATTGGTT TAACGTTTAC TATCAACCGG TTGTTTGCTC CTACCCTCTC TGACTGTCTC CATTTAAAAI CCGCAAAAGT AGGCTTTATT TCCCCGTCAA GCTCTAAATC -CCTCGACCCC AAAAAACTTG GACGGTTTTT CGCCCTTTGA AACTGGAACA ACACTCAACC GATTTCGGAA CCACCAICAA TTGCTGCAAC TCTCTCAGGG GTGAAAAGAA AAACCACCCT GATTCATTAA TGCAGCTGGC CGCAATTAAT GTGAGTTAGC GGCTCGTATG TTGTGTGGAA GGCCGTCGTT TTACAACGTC GGAGAAAATA AAGTGAAACA CCTGTGGCAA AAGCCTATGG GACCCTGCTA AGGCTGCATT GCTTCGGCTA TGGTAGTAGT TTTACGAGCA ABGCTTCTTA AGTTGCGCAG CGGAAAGCTG GGCAGATGCA AICCGCCGTT AIGAAAGCTG AAAAAATGAG AAITTAAAIA GGTACATATG CAGACTCTCA CGGCATTAAI CGGCCTTTCT AIATGAGGGT AITACAGGGT GCTTAATTTT CCTGAATGGC GCTGGAGTGC CGGTTACGAI TGTTCCCACG GCTACAGGAA CTGATTTAAC 
TTTGCTTATA ATTGACATGC GGCAATGACC TTATCAGCTA CACCCTTTTG TCTAAAAATT CATAATGTTT GCTAATTCTT 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 74
(2) INFORMATION FOR SEQ ID NO:5: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 729k base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: both (D) TOPOLOGY: circular (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
AATGCTACTA CTATTAGTAG AATTGATGCC ATAGCTAAAG CGTTCGCAGA GTTGCATATT TCTCCAAAAA TTCGAGTTTG TCTTTCGGGC CACGGTAAAG TTTGAGGGGG AAACATTTTA GGTTTTTATC AATTCCTTTT ATGAATCTTT TCTTCCGAAC CAATGATTAA CTCGTCAGGG AATATCCGGT TGTACACCGT GTCTGCGCCT CAGGCGATGA CAAAGATGAG GTGGCATTAC CAAAGCCTCT CGATCCCGCA TGCGTGGGCG ATTCACCTCG TTTTTGGAGA TATTCTCACT AGGTTATTGA ATTGGGAATC TAAAACATGT TGACCTCTTA CTTCCGGTCT TTCCTCTTAA ACCTGATTTT ATTCAATGAA CTATTACCCC GTCGTCTGGT GGCGTTATGT CTACCTGTAA GTCCTGACTG AGTTGAAATI CAAGCCTTAT TCTTCTCAAG TCATCTGTCC CGTTCCGGCT TACAAATCTC TGTTTTAGTG GTATTTTACC GTAGCCGTTG AAAGCGGCCT ATGGTTGTTG AAAGCAAGCT TTTTCAACGT CCGCTGAAAC CCATTTGCGA AACTGTTACA IGAGCTACAG TCAAAAGGAG GGTTCGCTTT TCTTTTTGAT TGAITTATGG TAITTATGAC CTCTGGCAAA AAACGAGGGT ATCTGCATTA TAATGTIGTT GIATAATGAG AAACCATCTC TCACTGAATG ATTACTCTTG TCTTTCAAAG AAGTAACATG GGTTGTACTT TATTCTTTCG CGTTTAATGG CTACCCTCGT TTAACTCCCT TCATTGTCGG GATAAACCGA GAAAAAATTA TGTTGAAAGT ACCTTTTCAG AATGTATCTA TGGAATGAAA CACCAGATTC CAATTAAAGG GAAGCTCGAA GCAATCCGCT TCATTCTCGT GATTCCGCAG ACTTCTTTTG TATGATAGTG GTTGAATGTG CCGTTAGTTC CCAGTTCTTA AAGCCCAATT AGCAGCTTTG ATGAAGGTCA TTGGTCAGTT GAGCAGGTCG TGTTTCGCGC CCTCTTTCGT AAACTTCCTC TCCGATGCTG GCAAGCGTCA GGGAACTATC TACAATTAAA TTATTCGCAA TGTTTAGCAA CTCGCGCCCC ATGGTCAAAC CTTCCAGACA AGCAAITAAG TACTCTCTAA TTAAAACGCG TTGCTTCTGA TTTCTGAACT TAITGGACGC CAAAAGCCTC TTGCTCTTAC GTATTCCTAA GTTTTATTAA AAAICGCATA TACTACTCGT TTACGTTGAT GCCAGCCTAT CGGTTCCCTT CGGATTTCGA TTGGTAIAAT TTTAGGTTGG ATGAAAAAGT TCTTTCGCTG GCGACCGAAT GGTATCAAGC GGCTCCTTTT TTCCTTTAGT AACCCCATAC AAATGAAAAT TAAAICTACT
CCGTACTTTA CTCTAAGCCA TCCTGACCTG ATATTTGAAG CTATAATAGT GTTTAAAGCA TATCCAGTCT TCGCTATTTT TAIGCCTCGT AICTCAACTG CGTAGATTTT AGGTAATTCA TCTGGTGTTT TTGGGTAATG GCGCCTGGTC AIGATTGACC CACAAITTAT CGCTGGGGGT TGCCTTCGTA CTTTAGTCCT CTGAGGGTGA ATATCGGTTA TGTTTAAGAA GGAGCCTTTT TGTTCCTTTC AGAAAATTCA 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1140 1200 1260 1320 1380 1hh0 1500 1560 1620 1680 TTTACTAACG CTGTGGAATC TGGGTTCCTA TCTGAGGCTC ATTCCGGGCT AACCCCGCTA CAGAATAATA CAAGGCACTG TATGACGCTT GATCCATTCG ccrccccccc cccccrrcwc GATTTTGATT GAAAACGCGC GCTGCTATCG GGTGATTTTG TTAATGAATA TTTGTCTTTA TTCCGTGGTG TTTGCTAACA TAITATTGCG TTAAAAAGGG GGCTTAACTC TTGTTCAGGG TCTCTGTAAA ATTGGGATAA CTCGTTAGCG CTTGATTTAA CTTAGAATAC TCCTACGATG ACCCGTTCTT AAATTAGGAT CGTTCTGCAI TTTGTCGGTA TCTGGAAAGA CGACAAAACT CTACAGGCGT TGTAGTTTGT TTGGGCTTGC TATCCCTGAA GCGGTTCTGA GGGTGGCGGT ATACTTATAT CAACCCTCTC ATCCTAATCC TTCTCTTGAG GGTTCCGAAA TAGGCAGGGG ACCCCGTTAA AACTTATTAC ACTGGAACGG TAAATTCAGA ITTGTGAATA TCAAGGCCAA ccrcrccrcc TGGTTCTGGT AGGGTGGCGG CTCTGAGGGA ATGAAAAGAI GGCAAACGCT TACAGTCTGA CGCTAAAGGC ATGGTTTCAT TGGTGACGTT crcccrcraa TTCCCAAATG ATTTCCGTGA AIATTTACCT GCGCTGGTAA ACCATATGAA TCTTTGCGTT TCTTTTATAT racrcccraa TAAGGAGTCT mccrcccr 'r'rcc'rIc'rcG crrcccmac ATAGCTATTG AATTCTTGTG ccrrarcwcr TGTTCAGTTA AITTCCCGT GGCTGCTATT TTCATTTTTG ATAATATGGC TGTTTAITTT TTGGTAAGAT TCAGGATAAA GGCTTCAAAA CCTCCCGCAA CGGATAAGCC TTCTATATCT AAAATAAAAA ccccrrccrw GGAATGATAA GGAAAGACAG GGGATATTAT crrccrrcrr TAGCTGAACA TGTTGTTTAT CTTTATATTC TCTTATTACT TTAGATCGTT ACTGGTGACG AATGAGGGTG ACTAAACCTC GACGGCACTT GAGTCTCAGC GCATTAACTG CAGTACACTC GACTGCGCTT TCGTCTGACC GGCGGCTCTG GGCGGTTCCG AATAAGGGGG AAACTTGATT TCCGGCCTTG GCTCAAGTCG TCCCTCCCTC TTTTCTATTG GTTGCCACCT TAATCATGCC TAACTTTGTT CTATTTCATT CTGAIATTAG CTAAIGCGCT ACGTTAAACA GTAACTGGCA ATTGTAGCTG GTCGGGAGGT GATTTGCTTG GTTCTCGATG CCGATTAITG CAGGACTTAT TGTCGTCGTC GGCTCGAAAA ACGCTAACTA AAACTCAGTG GTGGCTCTGA CTGAGTACGG ATCCGCCTGG CTCTTAATAC TTTATACGGG CTGTAICATC TCCATTCTGG TGCCTCAACC 
AGGGTGGTGG GTGGTGGCTC CTAIGACCGA CTGTCGCTAC CTAATGGTAA GTGACGGTGA AATCGGTTGA ATTGTGACAA TTATGTATGT AGTTCTTTTG CGGCTATCTG GTTTCTTGCT CGCTCAATTA TCCCTGTTTT AAAAATCGTT AATTAGGCTC GGTGCAAAAT TCGCTAAAAC CTATTGGGCG AGTGCGGTAC AITGGTTTCT CTATTGTTGA TGGACAGAAT TGCCTCTGCC TGAGGGTTGT TTACGGTACA GGGTGGCGGT TGATACACCT TACTGAGCAA TTTCAIGTTT CACTGTTACT AAAAGCCATG CTTTAATGAA TCCTGTCAAT CTCTGAGGGT TGGITCCGGT AAATGCCGAT TGAITACGGT TGGTGCIACT TAAITCACCT ATGTCGCCCT AATAAACTTA AITTTCTACG GGTATTCCGT CTTACTTTTC CTTAITATTG CCCTCTGACT TATGTTATTC TCTTATTTGG TGGAAAGACG AGCAACIAAT CCCTCGCGTT CGGTAATGAT TTGGTTTAAT ACAIGCTCGT TAAACAGGCG TACTTTACCT TAAATTACAI 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 GTTGGCGTTG ACTGGTAAGA TCCGGTGTTT AATTTAGGTC TGTCTTGCGA GAGGTTAAAA CAGCGTCTTA AGCGACGATT ATTAAAAAGG GTTTCATCAT GTAACTTGGT ACTGTTACTG CTTTTACGTG AATCCAAACA GATAATTCCG TTTAAAATTA TCIAATACTT AGTGCACCTA ACTGACCAGA TTTTCATTTG CTCACCTCTG GGGCTATCAG ATTCTTACGC ACTGGTCGTG CAAAATGTAG CTGGAIATTA ACTAATCAAA GCTGGCCTCA ATCCCTTTAA TACGTGCTCG TGTGGTGGTT CGCTTTCTTC GGGGCTCCCT TTTGCGTGAT TTAAATATGG CGATTCTCAA ATTTGTATAA CGCATATGAT ATTCTTATTT AACGCCTTAT AGAAGATGAA GCTTACTAAA TTGGATTTGC ATCAGCATTT AGGTAGTCTC TCAGACCTAT ATCTAAGCTA TCGCTATGTT TACAGAAGCA AGGTTATTCA TAAITCAAAT GAAATTGTTA crrcrrrrcc TCAGGTAATT ATTCAAAGCA ATCAGGCGAA TATATTCATC TGACGTTAAA CTAATAATTT TGATATGGTT ATCAGGATTA TATTGATGAA crccrrcrcc TGGTTTCTTT ATAACGTTCG GGCAAAGGAT CTAAATCCTC AAATGTATTA AAGAIATTTT AGATAACCTT TATTGATTGA CTGCTGGCTC TTTTATCTTC TTCGCGCATT TTTCAGGTCA TGACTGGTGA GTATTTCCAT CCAGCAAGGC GAAGTATTGC CTGATTATAA TCGGCCTCCT TCAAAGCAAC ACGCGCAGCG CCTTCCTTTC TTAGGGTTCC GGTTCACGTA GGGTTTGATA TCAGCGTGGC TGCTGGTGGT AAAGACTAAT GAAGGGTTCT ATCTGCCAAT GAGCGTTTTT CGATAGTTTG TACAACGGTT AAACACTTCT GTTTAGCTCC CATAGTACGC TGACCGCTAC TCGCCACGTT GAITTAGTGC GTGGGCCATC TTAAGCCCTA ACTAAACAGG TTATCACACG ATATATTTGA ACATAIAGTT GATTTTGATA TTCAAGGATT 
CTCACATATA AATGTAATTA GAAATGAATA TCCGTTATTG CCTGAAAATC GGTTCAATTC TTGCCATCAI GTTCCGCAAA TTAATACGAG TCTATTGACG CCTCAATTCC TTTGAGGTTC ACTGTTGCAG TCGTTCGGTA AGCCATTCAA ATCTCTGTTG GTAAATAATC CCTCTTGCAA AGTTCTTCTA AATTTGCGTG CAAGATTCTG CGCTCTGATT GCCCTGTAGC ACTTGCCAGC CGCCGGCTTT TTTACGGCAC GCCCTGATAG CTGTTGAGCG CTTTTTCTAG GTCGGTATTT AAAAGTTTTC ATATAACCCA AATTCACTAT CTAAGGGAAA TTGATTTATG ATTTNHHTT ATTCGCCTCT TTTCTCCCGA TACGCAATTT CTTCCATTAI CTGATAATCA ATGATAATGT TTGTCGAATT GCTCTAATCT TTTCTACTGT AGCAAGGTGA GCGGTGTTAA TTTTIAATGG AAAIATTGTC GCCAGAATGT CATTTCAGAC TGGCTGGCGG CTCAGGCAAG ATGGACAGAC GCGTACCGTT CCAACGAGGA GGCGCATTAA GCCCTAGCGC CCCCGTCAAG CTGGACCCCA ACGGTTTTTC TTGGCTTTAT TAATTATGAT CAAACCATTA ACGCGTTCTT ACCTAAGCCG TGACTCTTCT ATTAATTAAT TACTGTTTCC CTTGATGTTT GCGCGATTTT TGTAAAAGGT CTTTATTTCT TTAGAAGTAT GGAATATGAT TACTCAAACT GTTTGTAAAG ATTAGTTGTT TGATTTGCCA TGCTTTAGAT TACTGACCGC CGATGTTTTA TGTGCCACGT CCCTTTTATT GATTGAGCGT TAATATTGTT TGATGTTATT TCTTTTACTC CCTGTCTAAA AAGCACGTTA GCGCGGCGGG CCGCTCCTTT CTCTAAATCG AAAAACTTGA GCCCTTTGAC GTTGGAGTCC TAICTCGGGC CAGGATTTTC CAGGCGGTGA GCGCCCAATA CGACAGGTTT CACTCATTAG TGTGAGCGGA GTAGGAGAGC AGTTTACAGG GTTGGTGCTA GCTGGCGTAA AIGGCGAATG AGTGCGATCT ACGATGCGCC CCACGGAGAA AGGAAGGCCA TTAACAAAAA TTAIACAATC CATGCTAGTT TGACCTGATA AGCTAGAACG TTTTGAATCT AAAITTTTAT TGTTTTTGGT TTCTTTGCCT ACGTTCTITA TATTCTTTTG GCCTGCTGGG AGGGCAATCA CGCAAACCGC CCCGACTGGA GCACCCCAGG TAACAATTTC TCGGCGGATC CAAGTGCTAC CCATAGGGAT TAGCGAAGAG GCGCTTTGCC TCCTGAGGCC CATCTACACC TCCGACGGGT GACGCGAATT TTTAACGCGA TTCCTGTTTT TTACGAITAC GCCTTTGTAG GTTGAATATC TTACCTACAC CCTTGCGTTG ACAACCGAIT TGCCTGTATG AIAGTGGACT AITTATAAGG GCAAACCAGC GCTGTTGCCC CTCTCCCCGC AAGCGGGCAG CTTTACACTT ACACAGGAAA CGAGGCTGAA TGAGTACATT TAAATTATTC GCCCGGACCG TGGTTTCCGG GATACGGTCG AACGTAACCT TGTTACTCGC AITTTTGATG ATTTTAACAA TCGGGCTTTT CGTTCAICGA AICTCTCAAA AIAITGATGG ATTACTCAGG AAAIAAAGGC TAGCTTTATG AITTATTGGA (2) INFORMATION FOR SEQ ID NO:6: (i) SEQUENCE CHARACTERISTICS: CTTGTTCCAA GATTTTGCCG GTGGACCGCT 
GTCTCGCTGG GCGTTGGCCG TGAGCGCAAC TATGCTTCCG CAGCTATGAC GGCGATGACC GGCTACGCTT AAAAAGTTTA ATCGCCCTTC GACCAGAAGC TCGTCCCCTC ATCCCATTAC TCACATTTAA GCGTTCCTAT AAIAITAACG CTGATTATCA TTCTCTTGTT AAIAGCTACC TGAITTGACT CATTGCATTT TTCTCCCGCA CTCTGAGGCT CGTT (A) LENGTH: 7394 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: both (D) TOPDLOGY: circular (xi) SEQUENCE DESCRIPTION: SEQ ID N026: ACTGGAACAA CACTCAACCC AITTCGGAAC TGCTGCAACT CACCATCAAA CTCTCAGGGC TGAAAAGAAA AACCACCCTG ATTCATTAAT GCAATTAATG GCTCGTATGT CAGGATGTAC CTGCTAAGGC GGGCTAIGGT CGAGCAAGGC CCAACAGTTG GGTGCCGGAA AAACTGGCAG GGTCAATCCG TGTTCAIGAA TGGTTAAAAA TTTACAAITT ACCGGGGTAC TGCTCCAGAC CTCTCCGGCA GTCTCCGGCC AAAATAIATG AAAGTATTAC GCAGCTGGCA TGAGTTAGCT TGTGTGGAAT GAATTCGCAG TGCATTCAAT AGTAGTTATA TTCTTAACCA CGCAGCCTGA AGCTGGCTGG AIGCACGGTT CCGTTTGTTC AGCTGGCTAC AIGAGCTGAI AAATATTTGC ATATGATTGA TCTCAGGCAA TTAATTTATC TTTCTCACCC AGGGTTCTAA AGGGTCATAA TTATTGCTTA ATTTTGCTAA AAIGCTACTA CTAITAGTAG AAITGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 72 ATAGCTAAAC CCTTCGCAGA GTTGCATATT TCTGCAAAAA TTGGACTTTG TCTTTCGGGC CAGGGTAAAG TTTGAGGGGC AAACATTTTA GGTTTTTATC AATTCCTTTT ATGAATCTTT TCTTCCCAAC CAATGATTAA CTCGTCAGGG AAIATCCGGT TGTACACCGT GTCTGCGCCT CAGGCGATGA CAAAGATGAG GTGGCATTAC CAAAGCCTCT CGATCCCGCA TGCGTGGGCG AITCACCTCG TTTTTGGACA TATTCTCACT TTTACTAACG CTGTGGAATG TGGGTTCCTA TCTGAGGGTG ATTCCGGGCT AACCCCGCTA CAGAATAATA AGGTTATTGA ATTGGGAATC TAAAACATGT TGACCTCTTA CTTCCGGTCT TTCCTCTTAA ACCTGATTTT ATTCAATGAA CTATTACCCC GTCGTCTGGT CCCGTTAIGT CTACCTGTAA GTCCTGACTG AGTTGAAATT CAAGCCTTAT TCTTGTCAAG TCATCTGTCC CGTTCCGGCT TACAAATCTC TGTTTTAGTG GTAITTTACC GTAGCCGTTG AAAGCGGCCT ATGGTTGTTG AAAGCAAGCT TTTTCAACGT CCGCTGAAAC TCTGGAAAGA CTACAGGCGT TTGGGCTTGC GCGGTTCTGA ATACTTATAT ATCCTAATCC GGTTCCGAAA CCATTTGCGA AACTGTTACA TGAGCTACAG TCAAAAGGAG GGTTCGCTTT TCTTTTTGAT TGATTTAIGG TATTTATGAC CTCTGGCAAA AAACGAGGGT ATCTGCATTA TAATGTTGTT 
CTATAATGAG AAACCATCTC TCACTGAATG AITACTCTTG TCTTTCAAAG AAGTAACATG CGTTGTACTT TATTCTTTCG CGTTTAATGG CTACCCTCGT TTAACTCCCT TCATTGTCGG GATAAACCGA GAAAAAATTA TGTTGAAAGT CGACAAAACT TGTAGTTTGT TATCCCTGAA GGGTGGCGGT CAACCCTCTC TTCTCTTGAG TAGGCAGGGG AATGTATCTA ATGCTCAAAC TAAATCTACT TGGAATGAAA CACCAGATTC CAATTAAAGG GAAGCTCGAA GCAATCCGCT TCATTCTGGT GAITCCGCAG ACTTCTTTTG TATGATAGTG GTTGAATGTG CCGTTAGTTC CCAGTTCTTA AAGCCCAATT AGCAGCTTTG ATGAAGGTCA TTGGTCAGTT GAGCAGGTCG TGTTTCGCGC CCTCTTTCGT AAACTTCCTC TCCGATGCTG GCAAGCCTCA CGCAACTATC TACAAITAAA TTAITCGCAA TGTTTAGCAA TTAGATCGTT ACTGGTGACG AATGAGGGTG ACTAAACCTC GACCGCACTT GAGTCTCAGC GCATTAACTG CTTCCAGACA AGCAATTAAG TACTCTCTAA TTAAAACGCG TTGCTTCTGA TTTCTGAACT TATTGGACGC CAAAAGCCTC TTGCTCTTAC GTATTCCTAA GTTTTAITAA AAATCGCATA TACTACTCGT TTACGTTGAT GCCAGCCTAT CGGTTCCCTT CGGATTTCGA TTCGTAIAAT TTTAGGTTGG AIGAAAAAGT TCTTTCGCTG GCGACCGAAT GGTATCAAGC GGCTCCTTTT TTCCTTTAGT AACCCCATAC ACCCTAACTA AAACTCAGTG GTGGCTCTGA CTGAGTACGG ATCCGCCTGG CTQTTAATAC TTTAIACGGG CCGTACTTTA CTCTAAGCCA TCCTGACCTG AIATTTGAAG CTAIAATAGT GTTTAAAGCA TATCCAGTCT TCGCTATTTT TATGCCTCGT AICTCAACTG CGTAGATTTT AGGTAATTCA TCTGGTGTTT TTGGGTAATG GCGCCTGGTC ATGATTGACC CACAATTTAT CGCTGGGGGT TGCCTTCGTA CTTTAGTCCT CTGAGGGTGA AIATCGGTTA TGTTTAAGAA GGAGCCTTTT TGTTCCTTTC ACAAAATTCA TGAGGGTTGT TTACGGTACA GGGTGGCGGT TGAIACACCT TACTGAGCAA TTTCATGTTT CACTGTTACT 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 CAAGGCACTG TATGACGCTT GATCCATTCG GCTGGCGGCG GGCGGTTCTG GATTTTGATT GAAAACGCGC GCTGCTATCG GGTGATTTTG ACCCCGTTAA AACTTATTAC ACTGGAACGG TAAATTCAGA TTTGTGAATA TCAAGGCCAA GCTCTGGTGG TGGTTCTGGT AGGGTGGCGG CTCTGAGGGA ATGAAAAGAT GGCAAACGCT TACAGTCTGA CGCTAAAGGC ATCCTTTCAT TGGTGACGTT CTGGCTCTAA TTCCCAAAIG TTAATGAATA ATTTCCGTCA ATATTTAGCT TTTGTCTTTA TTCCGTGGTG TTTGCTAACA TATTATTGCG TTAAAAAGGG GGCTTAACTC TTGTTCAGGG TCTCTGTAAA ATTGGGAIAA CTCCTTAGCG TTGATTTAA CTTAGAATAC TCCTACGATG ACCCGTTCTT 
AAATTAGGAT CGTTCTGCAT TTTGTCGGTA GTTGGCGTTG ACTGGTAAGA TCCGGTGTTT AATTTAGGTC TGTCTTGCGA GAGGTTAAAA CAGCGTCTTA GCGCTGGTAA ACCATATGAA TCTTTGCGTT TCTTTTATAT TACTGCGTAA TAAGGAGTCT TTTCCTCGGT TTCCTTCTGG CTTCGGTAAG AAITCTTGTG TGTTCAGTTA GGCTGCTATT ATAGCTATTG GGTTAICTCT ATTCTCCCGT TTCATTTTTG AIAAIAIGGC TTGGTAAGAI GGCTTCAAAA CGGAIAAGCC AAAAIAAAAA GGAATGATAA GGGAIATTAI TAGCTGAACA CTTTATATTC TTAAAIATGG AITTGTATAA AITCTTATTT AGAAGATGAA TTGGATTTGC AGGTAGTCTC A$CTAAGCTA TGTTTATTTT TTAGGATAAA CCTCCCGCAA TTCTATATCT CGGCTTGCTT GGAAAGACAG TTTTCTTGTT TGTTGTTTAI TCTTATTACT CGAITCTCAA CGCATATGAT AACGCCTTAT GCTTACTAAA ATCAGCATTT TCAGACCTAT TCGCTATGTT CAGTACACTC GACTGCGCTT TCGTCTGACC GGCGGCTCTG GGCGGTTCCG AAIAAGGGGG AAACTTGAIT TCCGGCCTTG GCTCAAGTCG TCCCTCCCTC TTTTCTATTG GTTGCCACCT TAATCATGCC TAACTTTGTT CTATTTCATT CTGATATTAG CTAAIGCGCT ACGTTAAACA GTAACTGGCA ATTGTAGCTG GTCGGGAGGT GATTTGCITG GTTCTCGATG CCGAITAITG CAGGACTTAT TGTCGTCGTC GGCTCGAAAA TTAAGCCCTA ACTAAACAGG TTATCACACG ATATATTTGA ACATATAGTT GATTTTGATA TTCAAGGAIT CTGTATCATC AAAAGCCATG TCCAITCTGG CTTTAATGAA TGCCTCAACC TCCTGTCAAT AGGGTGGTGG CTCTGAGGGT GTGGTGGCTC TGGTTCCGGT CTATGACCGA AAATGCCGAT CTGTCGCTAC TGATTACGGT CTAATGGIAA TGGTGCTACT GTGACGGTGA TAATTCACCT AATCGGTTGA AIGTCGCCCT AITGTGACAA AAIAAACTTA TTATGTAIGT ATTTTCTACG AGTTCTTTTG GGTATTCCGT CGGCTATCTG CTTACTTTTC GTTTCTTGCT CTTATTATTG CGCTCAAITA CCCTCTGACT TCCCTGTTTT TAIGTTATTC AAAAATCGTT TCTTAITTGG AAITAGGCTC TGGAAAGACG GGTGCAAAAI AGCAACTAAT TCGCTAAAAC GCCTCGCGTT CTATTGGGCG AGTGCGGTAC AITGGTTTCT CTAITGTTGA TGGACAGAAT TGCCTCTGCC CTGTTGAGCG CTTTTTCTAG GTCGGTATTT AAAAGTTTTC AIATAACCCA AATTCAGTAI CTAAGGGAAA CGGTAATGAT TTGGTTTAAT ACATGCTCG TAAACAGGCG TACTTTACCT TAAAITACAT TTGGCTTTAT TAAITATGAT CAAACCATTA ACGCGTTCTT ACCTAAGCCG TGACTCTTCT AITAATTAAT 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2380 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 AGCGACGATT ATIAAAAAAG TGTTTCATCA TGTAACTTCG TACTGTTACT .TGTTTTACGT TAATCCAAAC 
TGATAATTCC TTTTAAAATT GTCTAATACT TAGTGCACCT AACTGACCAG TTTTTCATTT CCTCACCTCT AGGGCTATCA TATTCTTACG TACTGGTCGT TCAAAATGTA TCTGGATAIT TACTAATCAA CGGTGGCCTC AATCCCTTTA ATACGTGCTC GTGTGGTGGT TCGCTTTCTT GGGGGCTCCC ATTTGGGTGA CCTTCCAGTC CTATCTCGGG ACAGGATTTT CCAGGCGGTG GGCGCCCAAT ACGACAGGTT TCACTCATTA TACAGAAGCA GTAATTCAAA TCTTCTTTTG TATTCAAAGC GTATATTCAT GCTAATAATT AATCAGGATT GCTCCTTCTG AATAACGTTC $CTAAATCCT AAAGATATTT ATAITGATTG GCTGCTGGCT GTTTTATCTT GTTCGCGCAT CTTTCAGGTC GTGACTGGTG GGTATTTCCA ACCAGCAAGG AGAAGTATTG ACTGAITATA ATCGGCCTCC GTCAAAGCAA TACGCGCAGC CCCTTCCTTT TTTAGGGTTC TGGTTCACGT CACGTTCTTT CTATTCTTTT CGCCTGCTGG AAGGGCAATC ACGCAAACCG TCCCGACTGG GGCACCCCAG ACGTTATTCA CTCACATATA TTGATTTATG TGAAATTGTT CTCAGGTAAT AAICAGGCGA CTGACGTTAA TTGAIATGGT AIATTGATGA GTGGTTTCTT GGGCAAAGGA CAAATGTATT TAGATAACCT AGGGTTTGAT CTCAGCGTGG CTGCTGGTGG TAAAGACTAA AGAAGGGTTC AATCTGCCAA TGAGCGTTTT CCGAIAGTTT CTACAACGGT AAAACACTTC TGTTTAGCTC CCATAGTACG GTGACCGCTA CTCGCCACGT CGATTTAGTG AGTGGGCCAT AATAGTGGAC GATTTATAAG GGCAAACCAG AGCTGTTGCC CCTCTCGCCG AAAGCGGGCA GCTTTACACT AAATGTAATT TGAAATGAAI AICCGTTATT ACCTGAAAAT TGGTTCAATT AITGCCATCA TGTTCCGCAA TTTAATACGA ATCTATTGAC TCCTCAATTC ATTTGAGGTT CACTGTTGCA TTCGTTCGGT TAGCCATTCA TATCTCTGTT TGTAAATAAT TCCTGTTGCA GAGTTCTTCT TAATTTGCGT TCAAGATTCT CCGCTCTGAT CGCCCTCTAC CACTTGCCAG TCGCCGGCTT CTTTACGGCA CGCCCTGATA TCTTGTTCCA GGATTTTGCC CGTGGACCGC CGTCTCGCTG CGCGTTGGCC GTGAGCGCAA TTATGCTTCC AAITTTGTTT AATTCGCCTC GTTTCTCCCG CTACGCAATT CCTTCCATAA TCTGATAATC AATGATAATG GTTGTCGAAT GGCTCTAATC CTTTCTACTG CACCAAGGTG GGCGGTGTTA ATTTTTAATG AAAATATTGT GGCCAGAATG CCAITTCAGA AIGGCTGGCG ACTCAGGCAA GAIGGACAGA GGCGTACCGT TCCAACGAGG CGGCGCATTA CGCCCTAGCG TCCCCGTCAA CCTCGACCCC GACGGTTTTT AACTGGAACA GATTTCGGAA TTGCTGCAAC GTGAAAAGAA GATTCATTAA CGCAATTAAT GGCTCGTATG TACTGTTTCC TCTTGATGTT TGCGCGATTT ATGTAAAAGG TCTTTAITTC TTCAGAAGTA AGGAATATGA TTACTCAAAC TGTTTGTAAA TATTAGTTGT TTGATTTGCC ATGCTTTAGA AIACTGACCG GCGATGTTTT CTGTGCCACG TCCCTTTTAT CGATTGAGCG GTAATATTGT 
GTGATGTTAT CTCTTTTACT TCCTGTCTAA AAAGCACGTT AGCGCGGCGG CCCGCTCCTT GCTCTAAATC AAAAAACTTG CGCCCTTTGA ACACTCAACC CCACCATCAA TCTCTCAGGG AAACCACCCT TGCAGCTGGC GTGAGTTAGC TTGTGTGGAA 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 S220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 TTGTGAGCGG GTGACTGGGA AAGCACTATT GAGCCATCCG AAGTGCTACT CATAGGGATT GCCCGCACCG TGGTTTCCGG GATACGGTCG AACGTAACCT TCTTACTCGC ATTTTTGATG ATTTTAACAA TGGGGCTTTT CGTTCATCGA ATCTCTCAAA ATATTGATGG AITACTCAGC AAATAAAGGC TAGCTTTATG AITTATTGGA ATAACAAITT AAACCCTGGC GCACTGGCAC GGAGCTGAAG GAGTACATTG AAATTATTCA ATCGCCCTTC CACCAGAAGC TCGTCCCCTC ATCCCATTAC TCACATTTAA GCGTTCCTAT AATATTAACG CTGATTATCA TTCTCTTGTT AATAGCTACC TGATTTGACT CAITGCATTT TTCTCCCGCA CTCTGAGGCT CGTT CACACGCGTC GTTACCCAAG TCTTACCGTT GCGATGACCC GCTACGCTTG AAAAGTTTAC CCAACAGTTG GGTGCCGGAA AAACTGGCAG GGTCAATCCG TGTTGAIGAA TGGTTAAAAA TTTACAATTT ACCGGGGTAC TGCTCCAGAC CTCTCCGCCA GTCTCCGGCC AAAAIATATG AAAGTATTAC TTAITGCTTA (2) INFORMATION FOR SEQ ID NO:7: (1) SEQUENCE CHARACTERISTICS: ACTTGGCACT GGCCGTCGTT CTTTGTACAT GGAGAAAATA ACTGTTTACC CCTGTGGCAA TGCTAAGGCT GCATTCAATA GGCTATGGTA GTAGTTATAG GAGCAAGGCT TCTTAAGCAA CGCAGCCTGA ATGGCGAATG AGCTGGCTCG AGTGCGAICT AIGCACGGTT ACGATGCGCC CCGTTTGTTC CCACGGAGAA AGCTGGCTAC AGGAAGGCCA ATGAGCTGAT TTAACAAAAA AAAIAITTGC TTATACAAIC ATATGATTGA CATGCTAGTT TCTCAGGCAA TGACCTGATA TTAATTTATC AGCTAGAACG TTTCTCACCC TTTTGAATCT AGGGTTCTAA AAAITTTTAT AGGGTCATAA TCTTTTTGGT AITTTGCTAA TTCTTTGCCT (A) LENGTH: 37 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLDGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: GATCCTAGGC TGAAGGCGAT GACCCTGGTA AGGCTGC (2) INFORMATION FOR SEQ ID NO:8: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 35 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear TTACAACGTC AAGTGAAACA AAGCCCTTCT GTTTACAGGC TTGGTGCTAC TAGCGAAGAG GCGCTTTGCC TCCTGAGGCC CATCTACACC TCCGACGGCT GACGCGAATT TTTAACGCGA TTCCTGTTTT 
TTACGATTAC GCCTTTGTAG GTTGAATATC TTACCTACAC CCTTGCGTTG ACAACCGATT TGCCTGTATG 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7393
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: ATTCAATACT TTACAGGCAA GTGCTACTGA GTACA
(2) INFORMATION FOR SEQ ID NO:9: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 35 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: TTGCCTACGC TTGGCCTATG GTAGTAGTTA TACTT
(2) INFORMATION FOR SEQ ID NO:10: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 35 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: GGTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT
(2) INFORMATION FOR SEQ ID NO:11: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 18 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: TACGAGCAAG GCTTCTTA
(2) INFORMATION FOR SEQ ID NO:12: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: ACCTTAAGAA GCCTTCCTCG TAAACTTTTT GAATAATTT
(2) INFORMATION FOR SEQ ID NO:13: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 36 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: AATCCCTATG GTACCACCAA cramaacrac TACCAT
(2) INFORMATION FOR SEQ ID NO:14: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 35 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: AGCCCAAGCG TACCCAATGT ACTCAGTAGC ACTTG
(2) INFORMATION FOR SEQ ID NO:15: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 34 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: CCTGTAAACT ATTGAATGCA GCCTTAGCAG GCTC
(2) INFORMATION FOR SEQ ID NO:16: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 16 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: ATCGCCTTCA GCCTAG
(2) INFORMATION FOR SEQ ID NO:17: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 27 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: CTCGAATTCG TACATCCTGG TCATAGC 27
(2) INFORMATION FOR SEQ ID NO:18: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: CATTTTTGCA GATGGCTTAG A 21
(2) INFORMATION FOR SEQ ID NO:19: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 18 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: TAGCATTAAC GTCCAATA 18
(2) INFORMATION FOR SEQ ID NO:20: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: ATATATTTTA GTAAGCTTCA TCTTCT 26
(2) INFORMATION FOR SEQ ID NO:21: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 23 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: GACAAAGAAC GCGTGAAAAC TTT
(2) INFORMATION FOR SEQ ID NO:22: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 35 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: GCGGGCCTCT TCGCTATTGC TTAAGAAGCC TTGCT
(2) INFORMATION FOR SEQ ID NO:23: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 48 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: TTCAGCTAG GATCCGCCGA GCTCTCCTAC CTGCGAATTC GTACATCC
(2) INFORMATION FOR SEQ ID NO:24: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 24 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: TGGATTATAC TTCTAAATAA TGGA
(2) INFORMATION FOR SEQ ID NO:25: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 36 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: TAACACTCAT TCCGGATGGA ATTCTGGAGT c'rccc'r
(2) INFORMATION FOR SEQ ID NO:26: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 22 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: AATTCGCCAA GGAGACAGTC AT
(2) INFORMATION FOR SEQ ID NO:27: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT
(2) INFORMATION FOR SEQ ID NO:28: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: ATTACTCGCT GCCCAACCAG CCATGGCCGA GCTCGTGAT
(2) INFORMATION FOR SEQ ID NO:29: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT
(2) INFORMATION FOR SEQ ID NO:30: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 13 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: TCTAGAACGC crc
(2) INFORMATION FOR SEQ ID NO:31: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 35 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: ACGTGACGCG TTCTAGAATT AACACTCATT CCTGT
(2) INFORMATION FOR SEQ ID NO:32: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: TGGATATCTG GAGTCTGGGT CATCACGAGC TCGGCCATG
(2) INFORMATION FOR SEQ ID NO:33: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 39 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: GCTGGTTGGG CAGCGAGTAA TAACAATCCA GCGGCTGGC
(2) INFORMATION FOR SEQ ID NO:34: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 37 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: GTAGGCAATA GGTATTTCAT TATGACTGTC CTTGGCG
(2) INFORMATION FOR SEQ ID NO:35: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 30 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: TCACTGTCTC CTTGGCCTGT GAAATTGTTA
(2) INFORMATION FOR SEQ ID NO:36: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 36 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: TAACACTCATCTCCGCATGGA ATTCTGGAGT CTGGGT
(2) INFORMATION FOR SEQ ID NO:37: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 25 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: CAATTTTATC CTAAATCTTA CCAAC
(2) INFORMATION FOR SEQ ID NO:38: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: CATTTTTGCA GATGGCTTAG A
(2) INFORMATION FOR SEQ ID NO:39: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: CGAAAGGGGG GTGTGCTGCA A
(2) INFORMATION FOR SEQ ID NO:40: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 18 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: TAGCATTAAC GTCCAATA
(2) INFORMATION FOR SEQ ID NO:41: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 43 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: AAACGACGGC CAGTGCCAAG TGACGCGTGT GAAATTGTTA TCC
(2) INFORMATION FOR SEQ ID NO:42: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 43 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: GGCGAAAGGG AATTCTGCAA GGCGATTAAG CTTGGGTAAC GCC
(2) INFORMATION FOR SEQ ID NO:43: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 36 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: GGCGTTACCC AAGCTTTGTA CATGGAGAAA ATAAAG
(2) INFORMATION FOR SEQ ID NO:44: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 42 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: TGAAACAAAG CACTATTGCA CTGGCACTCT TACCGTTACC GT
(2) INFORMATION FOR SEQ ID NO:45: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 42 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: TACTGTTTAC CCCTGTGACA AAAGCCGCCC AGGTCCAGCT GC
(2) INFORMATION FOR SEQ ID NO:46: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 44 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: TCGAGTCAGG CCTATTGTGC CCAGGGATTG TACTAGTGGA TCCG
(2) INFORMATION FOR SEQ ID NO:47: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 38 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: TGGCGAAAGG GAATTCGGAT CCACTAGTAC AATCCCTG
(2) INFORMATION FOR SEQ ID NO:48: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 42 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: GGCACAATAG GCCTGACTCG AGCAGCTGGA CCAGGGCGGC TT
(2) INFORMATION FOR SEQ ID NO:49: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 42 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49: TTGTCACAGG GGTAAACAGT AACGGTAACG GTAAGTGTGC CA
(2) INFORMATION FOR SEQ ID NO:50: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 42 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGTAC AA
(2) INFORMATION FOR SEQ ID NO:51: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51: TAACGGTAAG AGTGCCAGTG C
(2) INFORMATION FOR SEQ ID NO:52: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 68 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE: (A) NAME/KEY: misc_difference (B) LOCATION: replace(25, "") (D) OTHER INFORMATION: /note= "M REPRESENTS AN EQUAL MIXTURE OF A AND C AT THIS LOCATION AND AT LOCATIONS 28, 31, 34, 37, 40, 43, 46 & 49" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: AGCTCCCGGA TGCCTCAGAA GATCMNNMNN MNNMNNMNNM NNMNNMNNMN NGGCTTTTGC 60 CACAGGGG
(2) INFORMATION FOR SEQ ID NO:53: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 54 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (ix) FEATURE: (A) NAME/KEY: misc_difference (B) LOCATION: replace(17, "") (D) OTHER INFORMATION: /note= "M REPRESENTS AN EQUAL MIXTURE OF A AND C AT THIS LOCATION AND AT LOCATIONS 20, 23, 26, 29, 32, 35, 38, 41, 44 & 50" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: CAGCCTCGGA TCCGCCMNNM NNMNNMNNMN NMNNMNNMNN MNNMNNATGM GAAT 54
(2) INFORMATION FOR SEQ ID NO:54: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 27 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: GGTAAACACT AACGGTAAGA GTGCCAG 27
(2) INFORMATION FOR SEQ ID NO:55: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:55: GGGCTTTTGC CACAGGGGT 19
(2) INFORMATION FOR SEQ ID NO:56: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 63 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: AGGGTCATCG CCTTCAGCTC CGGATCCCTC AGAAGTCATA AACCCCCCAT AGGCTTTTGC CAC
(2) INFORMATION FOR SEQ ID NO:57: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 47 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:57: TCCCCTTCAG CTCCCGGATG CCTCAGAAGC ATGAACCCCC CATAGGC
(2) INFORMATION FOR SEQ ID NO:58: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 25 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: CAATTTTATC CTAAATCTTA CCAAC
(2) INFORMATION FOR SEQ ID NO:59: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: cccmcaccc TCGGATCCGC c
(2) INFORMATION FOR SEQ ID NO:60: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:60: CGGATGCCTC AGAAGCCCCN N
(2) INFORMATION FOR SEQ ID NO:61: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 30 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO:61: CGGATGCCTC AGAAGGGCTT TTGCCACAGG
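The feature note for SEQ ID NO:52 places the A/C mixture M at position 25 and at every third position through 49, which corresponds to nine MNN triplets between two fixed flanks. The following Python sketch, offered only as an illustration, rebuilds the oligonucleotide from that description and checks the stated length and M positions. It also uses the standard IUPAC ambiguity codes and the standard genetic code, neither of which is spelled out in the listing, to show that the reverse complement of MNN is NNK and to count what NNK codons encode. All function and variable names are hypothetical.

```python
from itertools import product

# Fixed flanks copied from the SEQ ID NO:52 description (names are illustrative).
LEFT_FLANK = "AGCTCCCGGATGCCTCAGAAGATC"   # positions 1-24
RIGHT_FLANK = "GGCTTTTGCCACAGGGG"         # positions 52-68
N_TRIPLETS = 9                            # nine MNN triplets, positions 25-51

# IUPAC complements: M (A/C) pairs with K (T/G); N pairs with N.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "M": "K", "K": "M", "N": "N"}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def build_oligo():
    """Assemble flank + (MNN x 9) + flank as described in the feature note."""
    return LEFT_FLANK + "MNN" * N_TRIPLETS + RIGHT_FLANK


def m_positions(seq):
    """Return 1-based positions of the degenerate base M."""
    return [i + 1 for i, base in enumerate(seq) if base == "M"]


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain M, K and N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def expand_nnk():
    """List every concrete codon matched by NNK (K = G or T)."""
    return [a + b + c for a, b in product("ACGT", repeat=2) for c in "GT"]


oligo = build_oligo()
codons = expand_nnk()
encoded = {CODON_TABLE[codon] for codon in codons}

print("length:", len(oligo))
print("M positions:", m_positions(oligo))
print("reverse complement of MNN:", reverse_complement("MNN"))
print("NNK codons:", len(codons), "amino acids:", len(encoded - {"*"}))
```

Read on the complementary strand, each MNN triplet is an NNK codon, so restricting the first synthesized position to A or C limits the third codon position to G or T while still covering all twenty amino acids.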

Claims (1)

CLAIMS

1. A method of constructing a diverse population of vectors containing expressible oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues, comprising operationally linking a diverse population of oligonucleotides encoding completely random codon sequences to expression elements.

2. A method of constructing a diverse population of vectors having combined first and second oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues capable of expressing said combined oligonucleotides as said random polypeptides, comprising the steps of: (a) operationally linking sequences from a diverse population of first oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues to a first vector; (b) operationally linking sequences from a diverse population of second oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues to a second vector; and (c) combining the vector products of steps (a) and (b) under conditions where said populations of first and second oligonucleotides are joined together into a population of combined vectors capable of being expressed.

3. A method according to claim 2 wherein steps (a) to (c) are repeated two or more times.

4. The method of any one of the preceding claims wherein said oligonucleotide is expressed as a fusion protein on the surface of a cell.

5. A method of selecting a peptide capable of being bound by a ligand binding protein from a population of random peptides, comprising: constructing a diverse population of vectors according to any one of the preceding claims and further comprising the steps of: (a) introducing said population of combined vectors into a compatible host under conditions sufficient for expressing said population of random peptides; and (b) determining the peptide which binds to said ligand binding protein.

6. A method for determining the nucleic acid sequence encoding a peptide capable of being bound by a ligand binding protein which is selected from a population of random peptides, comprising: selecting a peptide according to the method of claim 5 and further comprising the steps of: (a) isolating the nucleic acid encoding said peptide; and (b) sequencing said nucleic acid.

7. A composition of matter comprising a plurality of vectors obtainable according to any one of claims 1 to 3.

8. A composition of matter comprising a plurality of cells containing a diverse population of oligonucleotides as described in any one of claims 1 to 4.

9. A composition of matter comprising a plurality of cells containing a diverse population of expressible oligonucleotides operationally linked to expression elements, said expressible oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues produced from random combinations of first and second oligonucleotide precursor populations encoding completely random amino acid residues.

10. A composition of matter comprising a plurality of vectors containing a diverse population of expressible oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues.

11. The composition of matter of any one of claims 7 to 10 wherein said oligonucleotide is expressed as a fusion protein on the surface of a cell.

12. A kit for the preparation of vectors useful for the expression of a diverse population of random peptides from combined first and second oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues, comprising: two vectors: a first vector having a cloning site for said first oligonucleotides and a pair of restriction sites for operationally combining first oligonucleotides with second oligonucleotides; and a second vector having a cloning site for said second oligonucleotides and a pair of restriction sites complementary to those on said first vector, one or both vectors containing expression elements capable of being operationally linked to said combined first and second oligonucleotides.

13. A cloning system for expressing random peptides from diverse populations of combined first and second oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues, comprising: a set of first vectors having a diverse population of first oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues and a set of second vectors having a diverse population of second oligonucleotides encoding polypeptides having a sequence of completely random amino acid residues, said first and second vectors each having a pair of restriction sites so as to allow the operational combination of first and second oligonucleotides into a contiguous oligonucleotide encoding polypeptides having a sequence of completely random amino acid residues.

14. The method of claim 4, wherein said fusion protein is a gene VIII fusion.

15. The composition of matter of claim 11, wherein said fusion protein is a gene VIII fusion protein.
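The claims describe joining a population of first oligonucleotides with a population of second oligonucleotides so that each combined vector carries one member of each. The Python sketch below is a toy model of why that helps, under stated assumptions: it treats each oligonucleotide as a string of uniformly random bases, ignores stop codons, vectors, and restriction sites, and simply concatenates every first member with every second member. The function names and population sizes are hypothetical and are not taken from the specification.

```python
import random


def random_oligo(n_codons, rng):
    """Return a string of n_codons random triplets (toy model, stops allowed)."""
    return "".join(rng.choice("ACGT") for _ in range(3 * n_codons))


def make_population(size, n_codons, rng):
    """Draw `size` random oligonucleotides and keep the distinct ones."""
    return {random_oligo(n_codons, rng) for _ in range(size)}


def combine(first, second):
    """Join every first oligonucleotide to every second oligonucleotide."""
    return {a + b for a in first for b in second}


rng = random.Random(0)  # fixed seed so repeated runs draw the same populations
first = make_population(50, 5, rng)
second = make_population(50, 5, rng)
combined = combine(first, second)

print("first:", len(first), "second:", len(second), "combined:", len(combined))
```

In this model the number of distinct combined sequences equals the product of the two precursor population sizes, which is the sense in which combining two modest populations can yield a much larger one.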
IE342491A 1990-09-28 1991-09-27 Surface expression libraries of randomized peptides IE913424A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US United States of America 28/09/1990 5
US59066490A 1990-09-28 1990-09-28

Publications (2)

Publication Number Publication Date
IE84405B1 true IE84405B1 (en)
IE913424A1 IE913424A1 (en) 1992-04-08

Family

ID=24363161

Family Applications (2)

Application Number Title Priority Date Filing Date
IE20060477A IE20060477A1 (en) 1990-09-28 1991-09-27 Surface expression libraries of randomized peptides
IE342491A IE913424A1 (en) 1990-09-28 1991-09-27 Surface expression libraries of randomized peptides

Family Applications Before (1)

Application Number Title Priority Date Filing Date
IE20060477A IE20060477A1 (en) 1990-09-28 1991-09-27 Surface expression libraries of randomized peptides

Country Status (12)

Country Link
EP (2) EP0551438B1 (en)
JP (3) JP3663515B2 (en)
AT (1) ATE283352T1 (en)
AU (1) AU655788B2 (en)
CA (1) CA2092803A1 (en)
DE (1) DE69133430T2 (en)
DK (1) DK0551438T3 (en)
ES (1) ES2233922T3 (en)
IE (2) IE20060477A1 (en)
IL (1) IL99553A0 (en)
NZ (1) NZ239987A (en)
WO (1) WO1992006176A1 (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5223409A (en) * 1988-09-02 1993-06-29 Protein Engineering Corp. Directed evolution of novel binding proteins
US7413537B2 (en) 1989-09-01 2008-08-19 Dyax Corp. Directed evolution of disulfide-bonded micro-proteins
US5498538A (en) * 1990-02-15 1996-03-12 The University Of North Carolina At Chapel Hill Totally synthetic affinity reagents
US5747334A (en) * 1990-02-15 1998-05-05 The University Of North Carolina At Chapel Hill Random peptide library
AU1545692A (en) 1991-03-01 1992-10-06 Protein Engineering Corporation Process for the development of binding mini-proteins
WO1993006121A1 (en) * 1991-09-18 1993-04-01 Affymax Technologies N.V. Method of synthesizing diverse collections of oligomers
US5639603A (en) * 1991-09-18 1997-06-17 Affymax Technologies N.V. Synthesizing and screening molecular diversity
CA2148838A1 (en) * 1992-11-10 1994-05-26 William D. Huse Soluble peptides having constrained, secondary conformation in solution and method of making same
IT1270939B (en) * 1993-05-11 1997-05-26 Angeletti P Ist Richerche Bio PROCEDURE FOR THE PREPARATION OF IMMUNOGEN AND DIAGNOSTIC REAGENTS, AND IMMUNOGEN AND DIAGNOSTIC REAGENTS SO OBTAINABLE.
WO1995010296A1 (en) * 1993-10-12 1995-04-20 Glycomed Incorporated A library of glyco-peptides useful for identification of cell adhesion inhibitors
US5503805A (en) * 1993-11-02 1996-04-02 Affymax Technologies N.V. Apparatus and method for parallel coupling reactions
US6165778A (en) * 1993-11-02 2000-12-26 Affymax Technologies N.V. Reaction vessel agitation apparatus
US6117679A (en) 1994-02-17 2000-09-12 Maxygen, Inc. Methods for generating polynucleotides having desired characteristics by iterative selection and recombination
US6335160B1 (en) 1995-02-17 2002-01-01 Maxygen, Inc. Methods and compositions for polypeptide engineering
US6165793A (en) 1996-03-25 2000-12-26 Maxygen, Inc. Methods for generating polynucleotides having desired characteristics by iterative selection and recombination
US5605793A (en) 1994-02-17 1997-02-25 Affymax Technologies N.V. Methods for in vitro recombination
US6010861A (en) * 1994-08-03 2000-01-04 Dgi Biotechnologies, Llc Target specific screens and their use for discovering small organic molecular pharmacophores
JPH10511554A (en) * 1994-12-30 1998-11-10 カイロン コーポレイション Controlled synthesis of polynucleotide mixtures encoding desired peptide mixtures
US6096548A (en) 1996-03-25 2000-08-01 Maxygen, Inc. Method for directing evolution of a virus
US6310191B1 (en) 1998-02-02 2001-10-30 Cosmix Molecular Biologicals Gmbh Generation of diversity in combinatorial libraries
ES2199431T3 (en) * 1997-01-31 2004-02-16 Cosmix Molecular Biologicals Gmbh GENERATION OF DIVERSITY IN COMBINATORIAL LIBRARIES.
US6153410A (en) * 1997-03-25 2000-11-28 California Institute Of Technology Recombination of polynucleotide sequences using random or defined primers
AU1449499A (en) 1997-10-31 1999-05-24 Maxygen, Incorporated Modification of virus tropism and host range by viral genome shuffling
US7244826B1 (en) 1998-04-24 2007-07-17 The Regents Of The University Of California Internalizing ERB2 antibodies
WO2000006717A2 (en) 1998-07-27 2000-02-10 Genentech, Inc. Improved transformation efficiency in phage display through modification of a coat protein
EP1117777A2 (en) * 1998-09-29 2001-07-25 Maxygen, Inc. Shuffling of codon altered genes
US6917882B2 (en) 1999-01-19 2005-07-12 Maxygen, Inc. Methods for making character strings, polynucleotides and polypeptides having desired characteristics
US6376246B1 (en) 1999-02-05 2002-04-23 Maxygen, Inc. Oligonucleotide mediated nucleic acid recombination
US6436675B1 (en) 1999-09-28 2002-08-20 Maxygen, Inc. Use of codon-varied oligonucleotide synthesis for synthetic shuffling
US6368861B1 (en) 1999-01-19 2002-04-09 Maxygen, Inc. Oligonucleotide mediated nucleic acid recombination
US6961664B2 (en) 1999-01-19 2005-11-01 Maxygen Methods of populating data structures for use in evolutionary simulations
US7024312B1 (en) 1999-01-19 2006-04-04 Maxygen, Inc. Methods for making character strings, polynucleotides and polypeptides having desired characteristics
US7873477B1 (en) 2001-08-21 2011-01-18 Codexis Mayflower Holdings, Llc Method and system using systematically varied data libraries
US7702464B1 (en) 2001-08-21 2010-04-20 Maxygen, Inc. Method and apparatus for codon determining
US20070065838A1 (en) 1999-01-19 2007-03-22 Maxygen, Inc. Oligonucleotide mediated nucleic acid recombination
US8457903B1 (en) 1999-01-19 2013-06-04 Codexis Mayflower Holdings, Llc Method and/or apparatus for determining codons
IL138002A0 (en) 1999-01-19 2001-10-31 Maxygen Inc Methods for making character strings, polynucleotides and polypeptides having desired characteristics
US6365377B1 (en) 1999-03-05 2002-04-02 Maxygen, Inc. Recombination of insertion modified nucleic acids
US7430477B2 (en) 1999-10-12 2008-09-30 Maxygen, Inc. Methods of populating data structures for use in evolutionary simulations
EP1272647B1 (en) 2000-04-11 2014-11-12 Genentech, Inc. Multivalent antibodies and uses therefor
PT1303293E (en) 2000-07-27 2009-03-11 Genentech Inc Sequential administration of cpt-11 and apo-2l polypeptide

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4458066A (en) * 1980-02-29 1984-07-03 University Patents, Inc. Process for preparing polynucleotides
DE3177281T2 (en) * 1980-12-12 1992-12-10 Unilever Nv STRUCTURAL GENES ENCODING THE DIFFERENT ALLELIC AND MATURATION FORMS OF PREPROTHAUMATIN AND THEIR MUTANTS, CLONING VECTORS CONTAINING THESE STRUCTURAL GENES AND THEIR EXPRESSION IN MICROBIAL CELLS.
EP0768377A1 (en) * 1988-09-02 1997-04-16 Protein Engineering Corporation Generation and selection of recombinant varied binding proteins
CA2009996A1 (en) * 1989-02-17 1990-08-17 Kathleen S. Cook Process for making genes encoding random polymers of amino acids

Similar Documents

Publication Publication Date Title
AU655788B2 (en) Surface expression libraries of randomized peptides
IE84405B1 (en) Surface expression libraries of randomized peptides
US5770434A (en) Soluble peptides having constrained, secondary conformation in solution and method of making same
AU667291B2 (en) Surface expression libraries of heteromeric receptors
US6027933A (en) Surface expression libraries of heteromeric receptors
US5766905A (en) Cytoplasmic bacteriophage display system
JP5161865B2 (en) Selection system
US6258530B1 (en) Surface expression libraries of randomized peptides
US20040048383A1 (en) Novel method and phage for the identification of nucleic acid sequences encoding members of a multimeric (poly)peptide complex
WO1997032017A1 (en) Novel method for the identification of nucleic acid sequences encoding two or more interacting (poly)peptides
WO1994011496A1 (en) Soluble peptides having constrained, secondary conformation in solution and method of making same
Haaparanta et al. A combinatorial method for constructing libraries of long peptides displayed by filamentous phage
CA2709939A1 (en) Engineered hybrid phage vectors for the design and the generation of a human non-antibody peptide or protein phage library via fusion to pIX of M13 phage
Studier et al. Cytoplasmic bacteriophage display system
GB2379933A (en) Selection system for phagemids using proteolytically sensitive helper phage