WO1998020016A1 - Novel coding sequences from herpes simplex virus type-2

Info

Publication number
WO1998020016A1
Authority
WO
WIPO (PCT)
Prior art keywords
orf
polypeptide
polynucleotide
sequence
seq
Application number
PCT/US1997/020016
Other languages
French (fr)
Inventor
Klaus Max Esser
John Y. Chan
Christine Ellen Dabrowski-Amaral
Alfred Michael Delvecchio
Susan B. Dillon
Jeffrey Joseph Leary
David Sutton
Original Assignee
Smithkline Beecham Corporation
Application filed by Smithkline Beecham Corporation
Priority to EP97946877A (EP0948508A4)
Priority to JP52166998A (JP2001508649A)
Priority to CA002270282A (CA2270282A1)
Publication of WO1998020016A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/005 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from viruses
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P31/00 Antiinfectives, i.e. antibiotics, antiseptics, chemotherapeutics
    • A61P31/12 Antivirals
    • A61P31/20 Antivirals for DNA viruses
    • A61P31/22 Antivirals for DNA viruses for herpes viruses
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2710/00 dsDNA viruses
    • C12N2710/00011 Details
    • C12N2710/16011 Herpesviridae
    • C12N2710/16611 Simplexvirus, e.g. human herpesvirus 1, 2
    • C12N2710/16622 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes

Definitions

  • This invention relates to newly identified Herpes Simplex Virus type 2 (HSV-2) polynucleotides, polypeptides encoded by such polynucleotides, the uses of such polynucleotides and polypeptides, as well as the production of such polynucleotides and polypeptides and recombinant host cells transformed with the polynucleotides.
  • HSV-2: Herpes Simplex Virus type 2
  • The herpes viruses consist of large icosahedral enveloped virions containing linear double-stranded DNA genomes.
  • One human herpes virus, herpes simplex virus type 2 (HSV-2), is usually acquired through sexual contact, giving rise to the condition known as genital herpes.
  • The frequency of recurrence of secondary genital herpes ranges between one and six times per year per infected individual. It is estimated that genital HSV-2 infections occur in ten to sixty million individuals in the USA. Less frequently, HSV-2 infection results in herpes labialis, seen as cold sores.
  • General information about HSV-2 may be found in various treatises such as Herpes
  • HSV-2 presents a major public health problem.
  • prophylactic and therapeutic vaccines, as well as methods of identifying anti-HSV-2 agents and reagents useful in such methods.
  • a method of identifying compounds which modulate the activity of HSV-2 polynucleotides and proteins and which affect the ability of the virus to replicate and produce multiple infectious virions in an infected cell. There is a need for methods of, and kits for, distinguishing HSV-2 infections from other herpes virus infections.
  • polypeptides, inter alia, that have been identified as novel HSV-2 polypeptides by comparison between the amino acid sequences set out in Tables 1-4 and known amino acid sequences of proteins of other viruses, such as herpes simplex virus type-1 (HSV-1).
  • HSV-1: herpes simplex virus type-1
  • ORFs: Open Reading Frames
  • polynucleotides comprise any of the regions encoding HSV-2 proteins in the sequences set out in Tables 1-4, including fragments, analogs or derivatives thereof.
  • HSV-2 protein comprising any of the amino acid sequences shown in Table 1, or fragments, analogues or derivatives thereof.
  • an isolated nucleic acid molecule encoding a polypeptide expressible by the HSV-2 polynucleotide contained in the deposited HSV-2 strain, SB5.
  • nucleic acid molecules encoding HSV-2 proteins, including nucleic acid molecules such as mRNAs, cDNAs, genomic DNAs and, in further embodiments of this aspect of the invention, biologically, diagnostically, clinically or therapeutically useful variants, analogs or derivatives thereof, or fragments thereof, including fragments of the variants, analogs and derivatives.
  • HSV-2 origin as well as biologically, diagnostically or therapeutically useful fragments thereof, as well as variants, derivatives and analogs of the foregoing and fragments thereof.
  • probes that hybridize to HSV-2 sequences, useful for detection of viral infection.
  • HSV-2 polypeptides or fragments thereof that may be employed for therapeutic or prophylactic purposes, for example, to treat disease, including treatment by conferring host immunity against viral infections, or as an antiviral agent or a vaccine.
  • a polynucleotide of the invention for therapeutic or prophylactic purposes, in particular genetic immunization.
  • HSV-2 polypeptides encoded by naturally occurring alleles of HSV-2 genes for therapeutic or prophylactic use.
  • methods for producing the aforementioned HSV-2 polypeptides comprising culturing host cells having expressibly incorporated therein an exogenously-derived HSV-2 encoding polynucleotide under conditions for expression of HSV-2 in the host and then recovering the expressed polypeptide.
  • products, compositions, processes and methods that utilize the aforementioned polypeptides and polynucleotides, inter alia, for research, biological, clinical, diagnostic, prophylactic and therapeutic purposes.
  • inhibitors of such polypeptides useful as antiviral agents.
  • antibodies against such polypeptides.
  • the antibodies are selective for HSV-2.
  • compositions comprising a HSV-2 polynucleotide or HSV-2 polypeptide for administration to cells in vitro, to cells ex vivo and to cells in vivo, or to a multicellular organism.
  • the compositions comprise a HSV-2 polynucleotide for expression of a HSV-2 polypeptide in a host organism to raise an immunological response, preferably to raise immunity in such host against HSV-2 or related organisms.
  • Tables 1-3 show the nucleotide sequences of one strand of "contigs," prepared by assembling sequences derived by sequencing HSV-2, Strain SB5, DNA. Collectively, the contigs herein represent from 85% to over 90% of the genome of this organism. Each of Tables 1, 2 and 3 represents a separate sequencing of the HSV-2 SB5 DNA.
  • Tables 1-3 also show the nucleotide sequences of open reading frames (ORFs), which are deduced DNA coding sequences present within each contig.
  • Tables 1-4 also show the deduced amino acid sequences of polypeptides encoded by these ORFs and sequence homologies to proteins in the NCBI non-redundant protein database.
  • Each ORF represents a HSV-2 gene, although in some cases a given ORF may actually have been derived from a gene that is longer than the ORF.
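The Tables describe the ORFs only as deduced coding sequences within each contig. As a rough illustration of how such ORFs can be located, the Python sketch below scans both strands of a contig for an ATG followed in frame by a stop codon. The ATG-only start rule, the `min_codons` cutoff and the function names are assumptions for illustration, not the procedure used to prepare Tables 1-3.

```python
from typing import List, Tuple

# Assumption: ATG is the only start codon; TAA/TAG/TGA are the stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(contig: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Find ATG...stop ORFs on both strands of a contig.

    Returns (strand, start, end) tuples in 0-based, end-exclusive coordinates
    of the given (forward) strand; the stop codon is included in the span.
    min_codons is a hypothetical cutoff on sense codons (stop excluded).
    """
    contig = contig.upper()
    n = len(contig)
    orfs = []
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # the first in-frame ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 - 1 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # Map reverse-strand coordinates back onto the forward strand.
                            orfs.append((strand, n - end, n - start))
                    start = None
    return sorted(orfs, key=lambda orf: (orf[1], orf[2]))
```

For example, `find_orfs("CCATGAAAAAAAAATAAGG", min_codons=3)` returns `[("+", 2, 17)]`. Only the first ATG of each open stretch is reported, so nested starts inside a longer ORF are not listed separately.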
  • each of the DNA sequences provided herein may be used in the discovery and development of antiviral compounds.
  • for sequences containing an open reading frame (ORF) with appropriate initiation and termination codons, the encoded protein, upon expression, can be used as a target for the screening of antiviral drugs.
  • the DNA sequences encoding preferably the amino terminal regions of the encoded protein, or regions immediately upstream therefrom can be used to construct antisense sequences to control the expression of the coding sequence of interest.
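The antisense idea can be sketched minimally, assuming the sense-strand sequence and the 0-based position of the ATG start codon are known. The oligonucleotide is simply the reverse complement of a window spanning the region just upstream of the start codon and the amino-terminal part of the coding sequence. The window sizes and the function name are hypothetical; practical antisense design also weighs secondary structure, specificity and backbone chemistry.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(sense: str, cds_start: int, upstream: int = 10, downstream: int = 20) -> str:
    """Return an antisense oligonucleotide (5'->3') covering the translation start.

    sense: sense-strand DNA; cds_start: 0-based index of the A of the ATG.
    upstream/downstream: hypothetical window sizes around the start codon.
    """
    sense = sense.upper()
    if sense[cds_start:cds_start + 3] != "ATG":
        raise ValueError("cds_start does not point at an ATG codon")
    lo = max(0, cds_start - upstream)
    hi = min(len(sense), cds_start + downstream)
    target = sense[lo:hi]
    # The antisense strand is the reverse complement of the targeted sense region.
    return target.translate(_COMPLEMENT)[::-1]
```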
  • many of the sequences disclosed herein also provide regions upstream and downstream from the encoding sequence. These sequences are useful as a source of regulatory elements for the control of viral gene expression.
  • Such sequences are conveniently isolated by restriction enzyme action or synthesized chemically and introduced, for example, into promoter identification strains. These strains contain a reporter structural gene sequence located downstream from a restriction site such that if an active promoter is inserted, the reporter gene will be expressed.
  • this invention also provides several means for identifying particularly useful target genes.
  • the first of these approaches entails searching appropriate databases for sequence matches in related organisms.
  • the HSV-2-like form of this gene would likely play an analogous role.
  • a HSV-2 protein identified as homologous to a cell surface protein in another organism would be useful as a vaccine candidate.
  • where homologies have been identified for the sequences disclosed herein, they are reported along with the encoding sequence in the Tables.
  • a number of methods can be used to identify genes which are essential to survival per se, or essential to the establishment/maintenance of an infection. Identification of an unknown ORF by one of these methods yields additional information about its function and permits the selection of such an ORF for further development as a screening target.
  • these approaches include: generation of temperature-sensitive mutations (Weller, S.K., et al., Virology 130:290-305 (1983)); site-specific insertion or deletion of a viral gene, a method based on selection of recombinant molecules generated by double recombination through homologous sequences between intact viral DNA molecules and a DNA fragment containing an insertion or deletion and a selectable marker (Post, L.E., et al., Cell 25:227-32 (1981)); and insertional mutagenesis using transposons, a method taking advantage of the random insertion of the DNA phage miniMu into target plasmid DNAs (Jenkins, F.J., et al., Proc.
  • HSV-2 BINDING MOLECULE refers to molecules or ions which bind or interact specifically with HSV-2 polypeptides or polynucleotides of the present invention, including, for example, enzyme substrates, cell membrane components and classical receptors. Binding between polypeptides of the invention and such molecules, including binding or interaction molecules, may be exclusive to polypeptides of the invention, which is preferred, or it may be highly specific for polypeptides of the invention, which is also preferred, or it may be highly specific to a group of proteins that includes polypeptides of the invention, which is preferred, or it may be specific to several groups of proteins at least one of which includes a polypeptide of the invention. Binding molecules also include antibodies and antibody-derived reagents that bind specifically to polypeptides of the invention.
  • GENETIC ELEMENT generally means a polynucleotide comprising a region that is important to the viral life cycle, a polynucleotide comprising a region that encodes a polypeptide or a polynucleotide region that regulates replication, transcription or translation or other processes important to expression of the polypeptide in a host cell, or a polynucleotide comprising both a region that encodes a polypeptide and a region operably linked thereto that regulates expression.
  • Genetic elements may be comprised within a vector that replicates as an episomal element; that is, as a molecule physically independent of the host cell genome. They may be comprised within plasmids. Genetic elements also may be comprised within a host cell genome; not in their natural state but, rather, following manipulation such as isolation, cloning and introduction into a host cell in the form of purified DNA or in a vector, among others.
  • HOST CELL is a cell which has been transformed or transfected, or is capable of transformation or transfection by an exogenous polynucleotide sequence.
  • IDENTITY as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences.
  • identity also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences.
  • Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Preferred computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S.F. et al., J. Molec. Biol.
  • the BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, MD 20894; Altschul, S., et al., J. Mol. Biol. 215: 403-410 (1990)).
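The idea of choosing the alignment that gives the largest match can be made concrete with a minimal global-alignment (Needleman-Wunsch) sketch: it finds one highest-scoring alignment and reports the percentage of aligned columns that are identical. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the columns-based denominator are assumptions. GCG, BLAST and FASTA use their own scoring matrices, gap penalties and, for BLAST and FASTA, local alignments, so their figures will differ.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity over the columns of one optimal global alignment.

    Assumption: simple linear gap penalty and match/mismatch scores.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back one optimal path, counting identical columns and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns
```

For instance, `global_identity("ACGT", "ACT")` aligns `ACGT` with `AC-T` and reports 75.0.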
  • by a polynucleotide having a nucleotide sequence with at least, for example, 95% "identity" to a reference nucleotide sequence, it is intended that the nucleotide sequence of the tested polynucleotide is identical to the reference sequence except that the polynucleotide sequence may include up to five point mutations per each 100 nucleotides of the reference nucleotide sequence.
  • for a polynucleotide having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence.
  • These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.
  • by a polypeptide having an amino acid sequence with at least, for example, 95% identity to a reference amino acid sequence, it is intended that the test amino acid sequence of the polypeptide is identical to the reference sequence except that the polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the reference amino acid sequence.
  • up to 5% of the amino acid residues in the reference sequence may be deleted or substituted with another amino acid, or a number of amino acids up to 5% of the total amino acid residues in the reference sequence may be inserted into the reference sequence.
  • These alterations of the reference sequence may occur at the amino or carboxy terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
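One way to read the 95% identity definitions above is to count each point mutation or amino acid alteration (deletion, substitution or insertion) as one edit and to allow at most 5 edits per 100 residues of the reference. The sketch below applies that reading with a Levenshtein edit distance, so it works the same for nucleotide and amino acid strings. Rounding the allowance down for references whose length is not a multiple of 100, and the helper names, are assumptions.

```python
import math
from fractions import Fraction


def max_alterations(reference_length: int, identity_pct: float = 95) -> int:
    """Largest number of edits allowed at the given identity threshold.

    Assumption: the allowance is (100 - identity_pct)% of the reference
    length, rounded down; Fraction avoids floating-point rounding surprises.
    """
    allowed = Fraction(reference_length) * (100 - Fraction(str(identity_pct))) / 100
    return math.floor(allowed)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion from a
                current[j - 1] + 1,                    # insertion into a
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def meets_identity(test: str, reference: str, identity_pct: float = 95) -> bool:
    """True if test lies within the edit allowance of reference."""
    distance = edit_distance(test.upper(), reference.upper())
    return distance <= max_alterations(len(reference), identity_pct)
```

Under this reading, a 350-residue reference polypeptide gives `max_alterations(350) == 17`, so up to 17 deletions, substitutions or insertions are permitted at 95% identity.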
  • ISOLATED means altered “by the hand of man” from its natural state; i.e., that, if it occurs in nature, it has been changed or removed from its original environment, or both.
  • a naturally occurring polynucleotide or a polypeptide naturally present in a living organism in its natural state is not “isolated,” but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is “isolated", as the term is employed herein.
  • isolated means that it is separated from the genome and cell in which it naturally occurs.
  • polynucleotides can be joined to other polynucleotides, such as DNAs, for mutagenesis, to form fusion proteins, and for propagation or expression in a host, for instance.
  • the isolated polynucleotides, alone or joined to other polynucleotides such as vectors, can be introduced into host cells, in culture or in whole organisms. Introduced into host cells in culture or in whole organisms, such DNAs still would be isolated, as the term is used herein, because they would not be in their naturally occurring form or environment.
  • polynucleotides and polypeptides may occur in a composition, such as media formulations, solutions for introduction of polynucleotides or polypeptides, for example, into cells, or compositions or solutions for chemical or enzymatic reactions, which are not naturally occurring compositions, and therein remain isolated polynucleotides or polypeptides within the meaning of that term as it is employed herein.
  • POLYNUCLEOTIDE(S) generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.
  • polynucleotides as used herein refers to, among others, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions or single-, double- and triple-stranded regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded, or triple-stranded, or a mixture of single- and double-stranded regions.
  • polynucleotide as used herein also refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA.
  • the strands in such regions may be from the same molecule or from different molecules.
  • the regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules.
  • One of the molecules of a triple-helical region often is an oligonucleotide.
  • polynucleotide includes DNAs or RNAs as described above that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are "polynucleotides" as that term is intended herein.
  • DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art.
  • polynucleotide as it is employed herein embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including inter alia, simple and complex cells.
  • polynucleotide(s) embrace short polynucleotides often referred to as oligonucleotides.
  • POLYPEPTIDES includes all polypeptides as described below.
  • the basic structure of polypeptides is well known and has been described in innumerable textbooks and other publications in the art.
  • the term is used herein to refer to any peptide or protein comprising two or more amino acids joined to each other in a linear chain by peptide bonds.
  • the term refers to both short chains, which also commonly are referred to in the art as peptides, oligopeptides and oligomers, for example, and to longer chains, which generally are referred to in the art as proteins, of which there are many types.
  • polypeptides often contain amino acids other than the 20 amino acids commonly referred to as the 20 naturally occurring amino acids, and many amino acids, including the terminal amino acids, may be modified in a given polypeptide, either by natural processes, such as processing and other post-translational modifications, or by chemical modification techniques which are well known to the art. Even the common modifications that occur naturally in polypeptides are too numerous to list exhaustively here, but they are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature, and they are well known to those of skill in the art.
  • modifications that may be present in polypeptides of the present invention are, to name an illustrative few, acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cystine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as
  • polypeptides may be branched as a result of ubiquitination, and they may be circular, with or without branching, generally as a result of posttranslational events, including natural processing events and events brought about by human manipulation which do not occur naturally.
  • Circular, branched and branched circular polypeptides may be synthesized by non-translational natural processes and by entirely synthetic methods, as well. Modifications can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini.
  • blockage of the amino or carboxyl group in a polypeptide, or both, by a covalent modification is common in naturally occurring and synthetic polypeptides and such modifications may be present in polypeptides of the present invention, as well.
  • the amino terminal residue of polypeptides made in E. coli or other cells, prior to proteolytic processing, almost invariably will be N-formylmethionine.
  • a methionine residue at the NH2-terminus may be deleted. Accordingly, this invention contemplates the use of both the methionine-containing and the methionineless amino terminal variants of the protein of the invention.
  • polypeptides made by expressing a cloned gene in a host for instance, the nature and extent of the modifications in large part will be determined by the host cell posttranslational modification capacity and the modification signals present in the polypeptide amino acid sequence.
  • glycosylation often does not occur in bacterial hosts such as, for example, E. coli. Accordingly, when glycosylation is desired, a polypeptide should be expressed in a glycosylating host, generally a eukaryotic cell.
  • Insect cells often carry out the same posttranslational glycosylations as do mammalian cells and, for this reason, insect cell expression systems have been developed to express efficiently mammalian proteins having native patterns of glycosylation. Similar considerations apply to other modifications. It will be appreciated that the same type of modification may be present in the same or varying degree at several sites in a given polypeptide. Also, a given polypeptide may contain many types of modifications. In general, as used herein, the term polypeptide encompasses all such modifications, particularly those that are present in polypeptides synthesized by expressing a polynucleotide in a host cell.
  • VARIANT(S) as the term is used herein, is a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide respectively, but retains essential properties.
  • a typical variant of a polynucleotide differs in nucleotide sequence from another, reference polynucleotide. Changes in the nucleotide sequence of the variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as discussed below.
  • a typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. Generally, differences are limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical.
  • a variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions, deletions in any combination.
  • a substituted or inserted amino acid residue may or may not be one encoded by the genetic code.
  • a variant of a polynucleotide or polypeptide may be a naturally occurring such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
  • HSV-2 strain SB5 has been deposited at the American Type Culture Collection under accession number VR-2546 on October 31, 1996.
  • nucleotide sequences disclosed herein can be obtained by synthetic chemical techniques known in the art or can be obtained from HSV-2, strain SB5 by probing a DNA preparation with probes constructed from the particular sequences disclosed herein.
  • oligonucleotides derived from a disclosed sequence can act as PCR primers in a process of PCR-based cloning of the sequence from a viral genomic source. It is recognised that such sequences will also have utility in diagnosis of the stage of infection and type of infection the pathogen has attained.
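To illustrate how oligonucleotides derived from a disclosed sequence can serve as PCR primers, the sketch below takes the first `length` bases as the forward primer and the reverse complement of the last `length` bases as the reverse primer. It then reports GC content and a rough melting temperature by the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a crude estimate for short oligonucleotides. The 20-nt default and the function names are assumptions, not primers taken from the Tables.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(template: str, length: int = 20) -> Tuple[str, str]:
    """Forward primer = 5' end of template; reverse primer = revcomp of its 3' end.

    Assumption: the whole template is the region to amplify.
    """
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template too short for non-overlapping primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace-rule Tm estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc
```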
  • the present invention relates to novel HSV-2 polypeptides and polynucleotides encoding same, among other things, as described in greater detail below.
  • the invention relates especially to HSV-2 molecules having the nucleotide and amino acid sequences set out in Tables 1-4 and to the HSV-2 nucleotide and amino acid sequences of the DNA isolatable from Deposit No. ATCC VR-2546, which is herein referred to as "the deposited organism" or as the "DNA of the deposited organism." It will be appreciated that the nucleotide and amino acid sequences set out in Tables 1-4 were obtained by sequencing the DNA of the deposited organism. Hence, the sequence of the deposited clone is controlling as to any discrepancies between it (and the sequence it encodes) and the sequences of the Tables.
  • the present invention also relates to additional polynucleotide sequences disclosed herein, which are RNAs transcribed from the DNAs disclosed herein but which may or may not be translated into protein.
  • Such polynucleotides are known in HSV-1 and other herpes viruses.
  • isolated polynucleotides which encode HSV-2 polypeptides having the deduced amino acid sequence of Tables 1-4. It is preferred that these polynucleotides be one of those set forth in Tables 1, 2 or 3. The skilled artisan can readily determine the polynucleotide sequence of such preferred polynucleotides by reference to the ORF start and stop positions set forth in Tables 1-4.
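Recovering an ORF from a contig by its start and stop positions can be sketched as follows. The coordinate convention here (1-based and inclusive, with a start position greater than the stop position taken to mean an ORF on the complementary strand) is an assumption for illustration; the Tables' own convention should be checked before relying on it.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_orf(contig: str, start: int, stop: int) -> str:
    """Return the ORF sequence 5'->3' given 1-based inclusive positions.

    Assumed convention: start <= stop means the given strand;
    start > stop means the ORF lies on the complementary strand.
    """
    contig = contig.upper()
    low, high = min(start, stop), max(start, stop)
    if low < 1 or high > len(contig):
        raise ValueError("positions fall outside the contig")
    segment = contig[low - 1:high]
    return segment if start <= stop else reverse_complement(segment)
```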
  • a polynucleotide of the present invention encoding HSV-2 polypeptide may be obtained using standard cloning and screening procedures.
  • To obtain the polynucleotide encoding the protein using the DNA sequences given in Tables 1-3 typically a library of clones of chromosomal DNA of HSV-2 strain SB5 in E. coli or some other suitable host is probed with a radiolabelled oligonucleotide, preferably a 17mer or longer, derived from a sequence of Tables 1-3. Clones carrying DNA identical to that of the probe can then be distinguished using high stringency washes.
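Before a radiolabelled 17-mer is used to screen a library, it can help to confirm in silico that the probe sequence occurs only once in the available genome sequence, on either strand. The sketch below counts exact matches of a probe and its reverse complement and lists the k-mers of a region that are unique. It is a hypothetical helper, not part of the described method, and exact matching ignores the near-matches that lower-stringency conditions would also detect.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_hits(probe: str, genome: str) -> int:
    """Count exact (possibly overlapping) matches of probe on both strands."""
    probe, genome = probe.upper(), genome.upper()
    hits = 0
    # A set avoids double-counting palindromic probes, whose two strands are identical.
    for target in {probe, reverse_complement(probe)}:
        position = genome.find(target)
        while position != -1:
            hits += 1
            position = genome.find(target, position + 1)
    return hits


def unique_probes(region: str, genome: str, k: int = 17) -> List[str]:
    """Return every k-mer of region that occurs exactly once in genome."""
    region = region.upper()
    return [
        region[i:i + k]
        for i in range(len(region) - k + 1)
        if count_hits(region[i:i + k], genome) == 1
    ]
```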
  • sequencing is then possible to extend the sequence in both directions to determine the full gene sequence.
  • sequencing is performed using denatured double-stranded DNA prepared from a plasmid clone. Suitable techniques are described by Maniatis, T., Fritsch, E.F. and Sambrook, J. in MOLECULAR CLONING, A Laboratory Manual (2nd edition, 1989, Cold Spring Harbor Laboratory; see Screening By Hybridization 1.90 and Sequencing Denatured Double-Stranded DNA Templates 13.70).
  • the DNA sequences set out in Tables 1, 2 and 3 each contain at least one open reading frame encoding a protein having at least about the number of amino acid residues set forth in Tables 1-3.
  • the start and stop codons of each open reading frame are the first three and the last three nucleotides of each polynucleotide set forth in Tables 1, 2 and 3.
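Because each ORF begins with its start codon and ends with its stop codon, its deduced amino acid sequence can be checked and recomputed mechanically. This sketch assumes the standard genetic code and an ATG start; it translates the codons before the stop and rejects sequences with a wrong length, an unexpected start or stop, or an internal stop codon. Codons containing ambiguous bases such as N are not handled and raise a `KeyError`.

```python
# Standard genetic code, built from the conventional TCAG ordering of codon bases.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(_BASES)
    for j, second in enumerate(_BASES)
    for k, third in enumerate(_BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate_orf(orf: str) -> str:
    """Check ORF boundaries and return the deduced polypeptide (stop omitted)."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    if orf[:3] != "ATG":
        raise ValueError("ORF does not start with ATG (assumed start codon)")
    if orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF does not end with a stop codon")
    protein = "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 3, 3))
    if "*" in protein:
        raise ValueError("internal stop codon")
    return protein
```

An ORF of N nucleotides therefore encodes (N - 3) / 3 residues; for example, `translate_orf("ATGGCTTGGTAG")` returns `"MAW"`.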
  • Certain HSV-2 sequences of the invention are structurally related to sequences encoding other proteins of the herpes family, as shown by comparing the sequences of the Tables with that of sequences reported in the literature.
  • certain polynucleotides and polypeptides of the invention are structurally related to known proteins. These proteins exhibit greatest homology to the homologue listed in Tables 1, 2, 3 and 4 from among the known proteins.
  • the invention provides a polynucleotide sequence identical over its entire length to each coding sequence in Tables 1-3. Also provided by the invention is the coding sequence for the mature polypeptide or a fragment thereof, by itself as well as the coding sequence for the mature polypeptide or a fragment in reading frame with other coding sequence, such as those encoding a leader or secretory sequence, a pre-, or pro- or prepro- protein sequence.
  • the polynucleotide may also contain non-coding sequences, including for example, but not limited to non-coding 5' and 3' sequences, such as the transcribed, non-translated sequences, termination signals, ribosome binding sites, sequences that stabilize mRNA, introns, polyadenylation signals, and additional coding sequence which encode additional amino acids.
  • a marker sequence that facilitates purification of the fused polypeptide can be encoded.
  • the marker sequence is a hexa-histidine peptide, as provided in the pQE vector (Qiagen, Inc.) and described in Gentz et al., Proc. Natl. Acad.
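At the sequence level, fusing a hexa-histidine marker means adding six in-frame histidine codons to the coding sequence. The sketch below is illustrative only: it appends the tag at the C-terminus just before the stop codon, whereas a pQE vector supplies the tag from the vector itself; the function name and example ORF are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS6 = "CATCAC" * 3  # six His codons (CAT and CAC both encode histidine)


def add_c_terminal_his6(orf: str) -> str:
    """Fuse a hexa-histidine tag in frame at the C-terminus, just before the stop codon."""
    orf = orf.upper()
    if len(orf) % 3 != 0 or orf[-3:] not in STOP_CODONS:
        raise ValueError("expected a complete ORF ending in a stop codon")
    return orf[:-3] + HIS6 + orf[-3:]


tagged = add_c_terminal_his6("ATGAAACCCTAA")  # hypothetical ORF
print(tagged)  # ATGAAACCCCATCACCATCACCATCACTAA
```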
  • Polynucleotides of the invention also include, but are not limited to, polynucleotides comprising a structural gene and its naturally associated sequences that control gene expression.
  • the invention also includes polynucleotides of the formula X-(R1)n-(R2)-(R3)m-Y, wherein:
  • at the 5' end of the molecule, X is hydrogen, and at the 3' end of the molecule, Y is hydrogen or a metal
  • R1 and R3 are each any nucleic acid residue
  • n and/or m is an integer between 1 and 3000 or zero
  • R2 is a nucleic acid sequence of the invention, particularly a nucleic acid sequence selected from the group set forth in Tables 1, 2 and 3, as well as an ORF sequence selected from the group set forth in Tables 1, 2, 3 and 4 (as indicated by the reading frame numbering).
  • R2 is oriented so that its 5' end residue is at the left, bound to R1, and its 3' end residue is at the right, bound to R3.
  • Any stretch of nucleic acid residues denoted by either R group, where n and/or m is greater than 1, may be either a heteropolymer or a homopolymer, preferably a heteropolymer.
  • n and/or m may also be an integer between 1 and 1000, between 1 and 2000, or between 1 and 3000.
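The definitions above describe a polynucleotide laid out as X-(R1)n-(R2)-(R3)m-Y, the same layout used for the polypeptide formula later in this section. The sketch below assembles such a molecule as a 5'->3' string under stated assumptions: the flanking residues are drawn at random purely for illustration, the terminal groups X and Y are chemical end groups with no letter in the string, and the function name and core sequence are hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"


def build_polynucleotide(r2: str, n: int, m: int, seed: int = 0) -> str:
    """Return (R1)n-(R2)-(R3)m as a single 5'->3' string.

    R1 and R3 are stretches of n and m arbitrary nucleotides (random here, for
    illustration only). X (hydrogen) and Y (hydrogen or a metal) are terminal
    chemical groups, so they are not represented in the sequence string.
    """
    for name, value in (("n", n), ("m", m)):
        if not 0 <= value <= 3000:
            raise ValueError(f"{name} must be zero or an integer from 1 to 3000, got {value}")
    rng = random.Random(seed)
    r1 = "".join(rng.choice(NUCLEOTIDES) for _ in range(n))
    r3 = "".join(rng.choice(NUCLEOTIDES) for _ in range(m))
    # R2 keeps its orientation: its 5' end joins R1 and its 3' end joins R3.
    return r1 + r2.upper() + r3


core = "ATGAAACCCTAA"  # hypothetical R2, not a sequence from the Tables
construct = build_polynucleotide(core, n=5, m=3)
print(len(construct))  # 20
```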
  • the term "polynucleotide encoding a polypeptide" encompasses polynucleotides that include a sequence encoding a polypeptide of the invention, particularly a viral polypeptide and more particularly an HSV-2 polypeptide having an amino acid sequence set out in Table 1, 2, 3 or 4.
  • the term also encompasses polynucleotides that include a single continuous region or discontinuous regions encoding the polypeptide (for example, interrupted by integrated phage or an insertion sequence or editing) together with additional regions, that also may contain coding and/or non-coding sequences.
  • the invention further relates to variants of the polynucleotides described herein that encode variants of the polypeptide having the deduced amino acid sequence of Tables 1, 2, 3 and 4. Variants that are fragments of the polynucleotides of the invention may be used to synthesize full-length polynucleotides of the invention.
  • Polynucleotides of the present invention may be in the form of RNA, such as mRNA, or in the form of DNA, including, for instance, cDNA and genomic DNA obtained by cloning or produced by chemical synthetic techniques or by a combination thereof.
  • the DNA may be double-stranded or single-stranded.
  • Single-stranded DNA may be the coding strand, also known as the sense strand, or it may be the non-coding strand, also referred to as the antisense strand.
  • the coding sequence which encodes the polypeptide may be identical to the coding sequence of the polynucleotide shown in Tables 1-4. It also may be a polynucleotide with a different sequence, which, as a result of the redundancy (degeneracy) of the genetic code, encodes the polypeptides of Tables 1-4.
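Because the genetic code is degenerate, two different DNA sequences can encode the same polypeptide. The sketch below translates codon by codon with the standard genetic code to show this; the function name and both example sequences are hypothetical.

```python
# Standard genetic code built from the conventional TCAG ordering; '*' marks a stop.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame +1, one codon at a time."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATGCTGAGCAAATAA"    # Met-Leu-Ser-Lys-stop
synonymous = "ATGTTATCTAAGTGA"  # different codons, same protein
print(translate(original), translate(synonymous))  # MLSK* MLSK*
```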
  • polynucleotides encoding polypeptide variants that have the amino acid sequence of a polypeptide of Tables 1, 2, 3 and/or 4 in which several, a few, 5 to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues are substituted, deleted or added, in any combination. Especially preferred among these are silent substitutions, additions and deletions, which do not alter the properties and activities of such polynucleotides.
  • polynucleotides that are at least 50%, 60% or 70% identical over their entire length to a polynucleotide encoding a polypeptide having the amino acid sequence set out in Tables 1, 2, 3 or 4, and polynucleotides that are complementary to such polynucleotides.
  • polynucleotides that comprise a region that is at least 80% identical over its entire length to a polynucleotide encoding a polypeptide of the deposited strain and polynucleotides complementary thereto.
  • polynucleotides at least 90% identical over their entire length to the same are particularly preferred, and among these, polynucleotides with at least 95% identity are especially preferred. Polynucleotides with at least 97% identity are highly preferred among those with at least 95%, and among these, those with at least 98% and at least 99% identity are particularly highly preferred, with at least 99% being the most preferred.
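The text does not state which alignment program or scoring rule is used to measure identity. As a simplified sketch, assuming the two sequences have already been aligned to equal length with '-' marking gaps, percent identity can be computed position by position and then compared with the 90%, 95%, 97%, 98% and 99% tiers above. The function name and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap ('-') are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


score = percent_identity("ATGCTGAGCAAA", "ATGCTTAGCAAA")  # 11 of 12 positions match
print(round(score, 1), score >= 90, score >= 95)  # 91.7 True False
```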
  • a preferred embodiment is an isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of: a polynucleotide having at least a 50% identity to a polynucleotide encoding a polypeptide comprising the amino acid sequence of Tables 1, 2, 3, or 4 and obtained from a prokaryotic species other than HSV-2; and a polynucleotide encoding a polypeptide comprising an amino acid sequence which is at least 50% identical to the amino acid sequence of Tables 1, 2, 3 or 4 and obtained from a prokaryotic species other than HSV-2.
  • Preferred embodiments are polynucleotides that encode polypeptides that retain substantially the same biological function or activity as the mature polypeptide encoded by the DNA of Tables 1, 2, 3 or 4.
  • the invention further relates to polynucleotides that hybridize to the hereinabove-described sequences.
  • the invention especially relates to polynucleotides that hybridize under stringent conditions to the herein above-described polynucleotides.
  • the terms "stringent conditions" and "stringent hybridization conditions" mean that hybridization will occur only if there is at least 95%, and preferably at least 97%, identity between the sequences.
  • An example of stringent hybridization conditions is overnight incubation at 42°C in a solution comprising 50% formamide, 5x SSC (1x SSC is 150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate and 20 micrograms/ml denatured, sheared salmon sperm DNA, followed by washing the hybridization support in 0.1x SSC at about 65°C.
  • Hybridization and wash conditions are well known and exemplified in Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y., (1989), particularly Chapter 11 therein.
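The SSC strengths in the example conditions scale linearly from 1x SSC (150 mM NaCl, 15 mM trisodium citrate). The sketch below works out the salt concentrations for the 5x hybridization and 0.1x wash buffers and, assuming a 20x SSC stock (a common laboratory stock, not specified in the text), the volume needed via C1V1 = C2V2. Function names are hypothetical.

```python
NACL_MM_1X = 150.0     # mM NaCl in 1x SSC
CITRATE_MM_1X = 15.0   # mM trisodium citrate in 1x SSC


def ssc_composition(strength: float) -> tuple[float, float]:
    """Return (NaCl mM, trisodium citrate mM) for SSC at the given strength (e.g. 5 for 5x)."""
    return strength * NACL_MM_1X, strength * CITRATE_MM_1X


def stock_volume_ml(final_strength: float, final_volume_ml: float,
                    stock_strength: float = 20.0) -> float:
    """Volume of concentrated SSC stock needed, from C1 * V1 = C2 * V2."""
    return final_strength * final_volume_ml / stock_strength


print(ssc_composition(5))       # (750.0, 75.0) for the hybridization solution
print(stock_volume_ml(5, 100))  # 25.0 ml of 20x stock per 100 ml
```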
  • the invention also provides a polynucleotide consisting essentially of a polynucleotide sequence obtainable by screening an appropriate library containing the complete gene for a polynucleotide sequence set forth in Tables 1, 2, 3 or 4 under stringent hybridization conditions with a probe having the sequence of said polynucleotide sequence or a fragment thereof; and isolating said DNA sequence. Fragments useful for obtaining such a polynucleotide include, for example, probes and primers described elsewhere herein.
  • polynucleotides of the invention may be used as a hybridization probe for RNA, cDNA and genomic DNA to isolate full-length cDNAs and genomic clones encoding a polypeptide and to isolate cDNA and genomic clones of other genes that have a high sequence similarity to a polynucleotide set forth in Table 1, 2, 3 or 4.
  • Such probes generally will comprise at least 15 bases.
  • such probes will have at least 30 bases and may have at least 50 bases.
  • Particularly preferred probes will have at least 30 bases and will have 50 bases or less.
  • each gene that comprises or is comprised by a polynucleotide set forth in Table 1, 2, 3 or 4 may be isolated by screening using a DNA sequence provided in Table 1, 2, 3 or 4 to synthesize an oligonucleotide probe.
  • a labeled oligonucleotide having a sequence complementary to that of a gene of the invention is then used to screen a library of cDNA, genomic DNA or mRNA to determine which members of the library the probe hybridizes to.
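A probe that hybridizes to a target strand is its reverse complement. The sketch below tiles a target into overlapping probes, enforcing the minimum probe length of 15 bases discussed above and defaulting to 30 bases; the step size, function names and target sequence are assumptions for illustration, and labeling is not modeled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_probes(target: str, length: int = 30, step: int = 15):
    """Yield (start, probe) pairs; each probe pairs with target[start:start + length]."""
    if length < 15:
        raise ValueError("probes should comprise at least 15 bases")
    for start in range(0, len(target) - length + 1, step):
        yield start, reverse_complement(target[start:start + length])


target = "ATGC" * 15  # hypothetical 60-nt target
for start, probe in tile_probes(target):
    print(start, probe, gc_fraction(probe))
```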
  • Polynucleotides of the invention that are oligonucleotides derived from a polynucleotide or polypeptide sequence set forth in Table 1, 2, 3 or 4 may be used in the processes described herein, but preferably for PCR, to determine whether or not the polynucleotides identified herein, in whole or in part, are transcribed by the virus in infected tissue. It is recognized that such sequences will also have utility in diagnosis of the stage of infection and type of infection the pathogen has attained.
  • the invention also provides polynucleotides that may encode a polypeptide that is the mature protein plus additional amino or carboxyl-terminal amino acids, or amino acids interior to the mature polypeptide (when the mature form has more than one polypeptide chain, for instance).
  • Such sequences may play a role in processing of a protein from precursor to a mature form, may allow protein transport, may lengthen or shorten protein half-life or may facilitate manipulation of a protein for assay or production, among other things.
  • the additional amino acids may be processed away from the mature protein by cellular enzymes.
  • a precursor protein, having the mature form of the polypeptide fused to one or more prosequences may be an inactive form of the polypeptide. When prosequences are removed such inactive precursors generally are activated. Some or all of the prosequences may be removed before activation. Generally, such precursors are called proproteins.
  • the DNA may also comprise a promoter region that functions to direct the transcription of the mRNA encoding the HSV-2 polypeptides of this invention.
  • promoters may be independently useful to direct the transcription of heterologous genes in recombinant expression systems.
  • Polyadenylation and splicing signal sequences are also present in the polynucleotide sequence and may be useful as gene expression signals in heterologous gene expression vectors and constructs.
  • the polynucleotides and polypeptides of the invention may be employed, for example, as research reagents and materials for discovery of treatments of and diagnostics for disease, particularly human disease, as further discussed herein relating to polynucleotide assays.
  • polynucleotides of the invention that are oligonucleotides may also be used as nucleic acid amplification primers, such as PCR primers, in the process herein described to determine whether or not the HSV-2 genes identified herein in whole or in part are present or transcribed in infected tissue. It is recognized that such sequences will also have utility in diagnosis of the stage of infection and type of infection the pathogen has attained.
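When oligonucleotides are used as PCR primers, a forward and reverse primer are usually chosen with similar melting temperatures. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees C, a textbook approximation for short oligonucleotides that the patent does not itself specify; the 5-degree tolerance, function names and primer sequences are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule, 2(A+T) + 4(G+C).

    Only a rough estimate, intended for short oligonucleotides.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def pair_is_balanced(forward: str, reverse: str, max_delta: int = 5) -> bool:
    """True if the two primers' estimated Tm values differ by at most max_delta."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_delta


fwd = "ATGCATGCATGCATGCATGC"  # hypothetical primers
rev = "AAAATTTTGGGGCCCCAATT"
print(wallace_tm(fwd), wallace_tm(rev), pair_is_balanced(fwd, rev))  # 60 56 True
```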
  • the polynucleotides disclosed herein or portions thereof may be used as probes to discover mRNA transcripts synthesized during productive and latent HSV-2 infections, for example by Northern blot, nuclease protection, and primer extension experiments. Novel transcripts in turn can lead to the discovery of new HSV-2 proteins not deducible from the genome sequences directly.
  • the sequences, or portions thereof may be used to discover antisense inhibitors of virus replication and novel therapeutics based on antisense mechanisms.
  • the sequences, or portions thereof may be used to prepare novel gene therapy vectors.
  • sequences or portions thereof may be used as a basis for the generation of DNA- or RNA-containing oligonucleotides designed to form a triplex with duplex DNA, for use as analytical tools, diagnostics or therapeutics.
  • Nucleic acid sequences, or portion thereof can be used to generate cell lines useful for diagnostics or screening.
  • the DNA sequences can be used to predict restriction enzyme sites useful for replacing the gene in the viral genome with a marker gene such as lacZ or green fluorescent protein. Such a replacement is useful in defining the biological role of the gene in the viral life cycle.
  • These gene knockout experiments are useful to discover genes which are likely to be high quality drug discovery targets (essential genes) or good locations for foreign genes for the purposes of gene therapy (non-essential genes) through an HSV-2 viral vector.
  • Such gene replacements are also useful for discovering virulence factors, for example by comparing the pathogenicity of the modified virus with the unmodified virus, or through the ease of identifying a marker gene such as lacZ.
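Predicting restriction sites from a sequence amounts to scanning it for each enzyme's recognition sequence. The sketch below checks three common enzymes whose sites are palindromic, so scanning one strand suffices; the enzyme list is a small assumed subset, and enzymes that cut only once are flagged because a single site is convenient for a gene replacement. Function names and the example sequence are hypothetical.

```python
import re

# Recognition sequences (5'->3') of a few common type II enzymes; all are palindromic.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq: str, enzymes: dict[str, str] = ENZYMES) -> dict[str, list[int]]:
    """Return 0-based start positions of each enzyme's recognition site in seq."""
    seq = seq.upper()
    # The zero-width lookahead also reports overlapping occurrences.
    return {name: [m.start() for m in re.finditer(f"(?={site})", seq)]
            for name, site in enzymes.items()}


def unique_cutters(hits: dict[str, list[int]]) -> list[str]:
    """Enzymes that recognize exactly one site in the sequence."""
    return [name for name, positions in hits.items() if len(positions) == 1]


hits = find_sites("AAGAATTCTTGGATCCAAGAATTC")  # hypothetical sequence
print(hits)                  # {'EcoRI': [2, 18], 'BamHI': [10], 'HindIII': []}
print(unique_cutters(hits))  # ['BamHI']
```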
  • N means that any of the four DNA or RNA bases may appear at such a designated position in the DNA or RNA sequence, except it is preferred that N is not a base that when taken in combination with adjacent nucleotide positions, when read in the correct reading frame, would have the effect of generating a premature termination codon in such reading frame.
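The preference stated for N can be checked mechanically: expand each N to A, C, G or T and discard any expansion that creates a stop codon before the final codon of the reading frame. This is a brute-force sketch assuming the standard stop codons; the work grows as 4 to the power of the number of Ns, so it suits only a few N positions. The function name and example sequence are hypothetical.

```python
import itertools

STOP_CODONS = {"TAA", "TAG", "TGA"}


def allowed_expansions(seq: str, frame: int = 0) -> list[str]:
    """Expand every N to A/C/G/T, keeping expansions with no premature stop codon.

    The reading frame starts at index `frame`; the last codon is treated as the
    intended termination codon and is not checked.
    """
    seq = seq.upper()
    slots = [i for i, base in enumerate(seq) if base == "N"]
    results = []
    for choice in itertools.product("ACGT", repeat=len(slots)):
        chars = list(seq)
        for i, base in zip(slots, choice):
            chars[i] = base
        candidate = "".join(chars)
        codons = [candidate[i:i + 3] for i in range(frame, len(candidate) - 2, 3)]
        if not any(codon in STOP_CODONS for codon in codons[:-1]):
            results.append(candidate)
    return results


print(allowed_expansions("ATGNAAGCCTAA"))
# ['ATGAAAGCCTAA', 'ATGCAAGCCTAA', 'ATGGAAGCCTAA']  (N = T would give TAA)
```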
  • a polynucleotide of the invention may encode a mature protein, a mature protein plus a leader sequence (which may be referred to as a preprotein), a precursor of a mature protein having one or more prosequences that are not the leader sequences of a preprotein, or a preproprotein, which is a precursor to a proprotein, having a leader sequence and one or more prosequences, which generally are removed during processing steps that produce active and mature forms of the polypeptide.
  • Polypeptides: the present invention further relates to HSV-2 polypeptides that have the deduced amino acid sequences of the polypeptides defined by amino acid sequence in Tables 1-4.
  • the invention also relates to fragments, analogs and derivatives of these polypeptides.
  • the term "fragment," when referring to the polypeptides of the invention, means a polypeptide that retains essentially the same biological function or activity as such polypeptide. Fragments, derivatives and analogs that retain at least 90% of the biological activity of the native HSV-2 protein are preferred, and those that retain at least 95% of the activity of the native HSV-2 protein are more preferred.
  • an analog includes a proprotein which can be activated by cleavage of the proprotein portion to produce an active mature polypeptide
  • the polypeptide of the present invention may be a recombinant polypeptide, a natural polypeptide or a synthetic polypeptide. In certain preferred embodiments it is a recombinant polypeptide.
  • the fragment, derivative or analog of the polypeptides of the invention may be (i) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue), and such substituted amino acid residue may or may not be one encoded by the genetic code; (ii) one in which one or more of the amino acid residues includes a substituent group; (iii) one in which the mature polypeptide is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol); or (iv) one in which additional amino acids are fused to the mature polypeptide, such as a leader or secretory sequence, a sequence employed for purification of the mature polypeptide, or a proprotein sequence.
  • preferred variants are those that vary from a reference by conservative amino acid substitutions. Such substitutions replace a given amino acid in a polypeptide with another amino acid of like characteristics. Typically seen as conservative substitutions are the replacements, one for another, among the aliphatic amino acids Ala, Val, Leu and Ile; interchange of the hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and Glu; substitution between the amide residues Asn and Gln; exchange of the basic residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr. Further particularly preferred in this regard are variants, analogs, derivatives and fragments having the amino acid sequence of one or more of the HSV-2 polypeptides of the invention, in which several, a few, 5 to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues are substituted, deleted or added, in any combination. Especially preferred among these are silent substitutions, additions and deletions, which do not alter the properties and activities of the polypeptides.
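The conservative-substitution groups listed above can be encoded as sets and used to classify each difference between a reference polypeptide and a variant. This sketch uses exactly those six groups and handles substitutions only (no insertions or deletions); the function names and example sequences are hypothetical.

```python
# Groups of mutually conservative residues, as listed in the text.
CONSERVATIVE_GROUPS = [
    set("AVLI"),  # aliphatic: Ala, Val, Leu, Ile
    set("ST"),    # hydroxyl: Ser, Thr
    set("DE"),    # acidic: Asp, Glu
    set("NQ"),    # amide: Asn, Gln
    set("KR"),    # basic: Lys, Arg
    set("FY"),    # aromatic: Phe, Tyr
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b are identical or fall in the same group."""
    a, b = a.upper(), b.upper()
    return a == b or any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(ref: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, reference residue, variant residue, conservative?) per mismatch."""
    if len(ref) != len(variant):
        raise ValueError("only substitutions are handled; sequences must be equal length")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(ref.upper(), variant.upper()), start=1)
        if r != v
    ]


print(classify_substitutions("MKLSDF", "MRLTEW"))
# [(2, 'K', 'R', True), (4, 'S', 'T', True), (5, 'D', 'E', True), (6, 'F', 'W', False)]
```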
  • the invention also includes polypeptides of the formula X-(R1)n-(R2)-(R3)m-Y, wherein:
  • X is hydrogen
  • Y is hydrogen or a metal
  • R1 and R3 are each any amino acid residue
  • n and/or m is an integer between 1 and 2000 or zero
  • R 2 is an amino acid sequence of the invention, particularly an amino acid sequence selected from the group set forth in Tables 1, 2, 3 and 4.
  • R2 is oriented so that its amino terminal residue is at the left, bound to R1, and its carboxy terminal residue is at the right, bound to R3.
  • the polypeptides and polynucleotides of the present invention are preferably provided in an isolated form, and preferably are purified to homogeneity.
  • polypeptides of the present invention include the polypeptides of Tables 1-4, in particular the mature polypeptide as well as polypeptides which have at least 60%, 70% or 80% identity to one or more of the polypeptides of Tables 1-4 and preferably at least 90% similarity to one or more of the polypeptides of Tables 1-4 and more preferably at least 95% similarity; and still more preferably at least 95% identity to one or more of the polypeptides of Tables 1-4 and also include portions of such polypeptides with such portion of the polypeptide generally containing at least 30 contiguous amino acids and more preferably at least 50 contiguous amino acids.
  • polypeptides of this invention are useful as a source of those proteins for screening and/or therapy.
  • polypeptides may be identified by homology, for example, to HSV-1 polypeptides of known function (e.g., helicases, kinases, proteases).
  • Use of polypeptides of the invention for screening or therapy based upon functionality predicted by homology match is a particularly preferred aspect of this invention.
  • polypeptides derived from the deposited strain ATCC VR-2546 herein can be used for comparison with sequences from other HSV-2 strains in the public domain. For example, comparison of the polypeptides of the invention with strain HG52 may be useful in the discovery of virulence factors, since HG52 is avirulent in mouse and guinea pig infection models, whereas HSV-2 SB5 is virulent.
  • likewise, comparison with the public-domain homolog from strain MS may be useful in the discovery of virulence factors, since there are major differences in CNS pathogenesis in animal models between strains MS and SB5.
  • "X" or "Xaa" is also used.
  • "X" and "Xaa" mean that any of the twenty naturally occurring amino acids may appear at such a designated position in the polypeptide sequence.
  • Fragments or portions of the polypeptides of the present invention may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides. Fragments or portions of the polynucleotides of the present invention may be used to synthesize full-length polynucleotides of the present invention.
  • polypeptides comprising fragments of HSV-2, most particularly fragments of HSV-2 having the amino acid sequences set out in Tables 1-4, and variants and derivatives thereof.
  • a fragment is a polypeptide having an amino acid sequence that entirely is the same as part but not all of the amino acid sequence of the aforementioned HSV-2 polypeptides and variants or derivatives thereof.
  • fragments may be "free-standing," i.e., not part of or fused to other amino acids or polypeptides, or they may be comprised within a larger polypeptide of which they form a part or region. When comprised within a larger polypeptide, the presently discussed fragments most preferably form a single continuous region. However, several fragments may be comprised within a single larger polypeptide. For instance, certain preferred embodiments relate to a fragment of an HSV-2 polypeptide of the present invention comprised within a precursor polypeptide designed for expression in a host and having heterologous pre- and pro-polypeptide regions fused to the amino terminus of the HSV-2 fragment and an additional region fused to the carboxyl terminus of the fragment. Therefore, in one aspect of the meaning intended herein, "fragment" refers to the portion or portions of a fusion polypeptide or fusion protein derived from HSV-2.
  • polypeptide fragments of the invention include, for example, those that are about 5-15, 10-20, 15-40, 30-55, 41-75, 41-80, 41-90, 50-100, 75-100, 90-115, 100-125, 110-140, 120-150, 200-300, 1-175, 1-600 or 1-1000 amino acids long.
  • Particular examples of polypeptide fragments of the invention that may be mentioned include fragments of 20-200 amino acids.
  • Truncation mutants include HSV-2 polypeptides having the amino acid sequences of Tables 1-4, or of variants or derivatives thereof, except for deletion of a continuous series of residues (that is, a continuous region, part or portion) that includes the amino terminus, or a continuous series of residues that includes the carboxyl terminus or, as in double truncation mutants, deletion of two continuous series of residues, one including the amino terminus and one including the carboxyl terminus.
  • Fragments having the size ranges set out above also are preferred embodiments of truncation fragments, which are especially preferred among fragments generally.
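The three kinds of truncation mutant described above (amino-terminal, carboxyl-terminal and double) are simple slices of the full-length sequence. This sketch generates all three for given deletion lengths; the function name, keys and example sequence are hypothetical.

```python
def truncation_mutants(protein: str, n_cut: int, c_cut: int) -> dict[str, str]:
    """Return N-terminal, C-terminal and double truncations of a protein sequence.

    n_cut residues are removed from the amino terminus and c_cut residues from the
    carboxyl terminus; at least one residue must remain.
    """
    if n_cut < 0 or c_cut < 0 or n_cut + c_cut >= len(protein):
        raise ValueError("truncations must leave at least one residue")
    end = len(protein) - c_cut  # avoids protein[:-0], which would be empty
    return {
        "n_truncated": protein[n_cut:],
        "c_truncated": protein[:end],
        "double_truncated": protein[n_cut:end],
    }


print(truncation_mutants("MAKLSDFGHQ", n_cut=2, c_cut=3))
# {'n_truncated': 'KLSDFGHQ', 'c_truncated': 'MAKLSDF', 'double_truncated': 'KLSDF'}
```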
  • Degradation forms of the polypeptides of the invention in a host cell are also preferred.
  • Also preferred in this aspect of the invention are fragments characterized by structural or functional attributes of HSV-2.
  • Preferred embodiments of the invention in this regard include fragments that comprise alpha-helix and alpha-helix-forming regions ("alpha-regions"), beta-sheet and beta-sheet-forming regions ("beta-regions"), turn and turn-forming regions ("turn-regions"), coil and coil-forming regions ("coil-regions"), hydrophilic regions, hydrophobic regions, alpha amphipathic regions, beta amphipathic regions, flexible regions, surface-forming regions and high antigenic index regions of HSV-2.
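Hydrophilic and hydrophobic regions are commonly located with a sliding-window hydropathy profile. The text does not name a scale, so this sketch swaps in the standard Kyte-Doolittle hydropathy values; the 9-residue window, the 1.6 threshold, the function names and the test sequence are all assumptions for illustration.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window; entry i covers protein[i:i + window]."""
    if not 1 <= window <= len(protein):
        raise ValueError("window must be between 1 and the protein length")
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def hydrophobic_windows(protein: str, window: int = 9, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score > threshold]


seq = "D" * 5 + "L" * 9 + "D" * 5  # hypothetical: hydrophobic core flanked by Asp
print(hydrophobic_windows(seq))      # [3, 4, 5, 6, 7]
```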
  • Further preferred regions are those that mediate activities of HSV-2.
  • Most highly preferred in this regard are fragments that have a chemical, biological or other activity of the particular HSV-2 protein, including those with a similar activity or an improved activity, or with a decreased undesirable activity. Routinely one generates the fragment by well-known methods, then compares the activity of the fragment to the native protein in a convenient assay such as listed hereinbelow.
  • Highly preferred in this regard are fragments that contain regions that are homologs, in sequence, in position, or in both sequence and position, to active regions of related polypeptides, such as the related polypeptides set out in Table 1.
  • truncation mutants are particularly preferred fragments in these regards.
  • Further preferred polynucleotide fragments are those that are antigenic or immunogenic in an animal, especially in a human.
  • the invention also relates to, among others, polynucleotides encoding the aforementioned fragments, polynucleotides that hybridize to polynucleotides encoding the fragments, particularly those that hybridize under stringent conditions, and polynucleotides, such as PCR primers, for amplifying polynucleotides that encode the fragments.
  • preferred polynucleotides are those that correspond to the preferred fragments, as discussed above.
  • the present invention also relates to vectors which comprise a polynucleotide or polynucleotides of the present invention, host cells which are genetically engineered with vectors of the invention and the production of polypeptides of the invention by recombinant techniques.
  • Host cells can be genetically engineered to incorporate polynucleotides and express polypeptides of the present invention.
  • Introduction of polynucleotides into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, transvection, microinjection, cationic lipid-mediated transfection, electroporation, transduction, scrape loading, ballistic introduction, infection or other methods. Such methods are described in many standard laboratory manuals, such as Davis et al., BASIC METHODS IN MOLECULAR BIOLOGY,
  • Polynucleotide constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.
  • the polypeptides of the invention can be synthetically produced by conventional peptide synthesizers.
  • Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989).
  • the vector may be, for example, a plasmid vector, a single or double-stranded phage vector, a single or double-stranded RNA or DNA viral vector.
  • Plasmids generally are designated herein by a lower case p preceded and/or followed by capital letters and/or numbers, in accordance with standard naming conventions that are familiar to those of skill in the art. Starting plasmids disclosed herein are either commercially available, publicly available, or can be constructed from available plasmids by routine application of well known, published procedures. Many plasmids and other cloning and expression vectors that can be used in accordance with the present invention are well known and readily available to those of skill in the art.
  • preferred vectors in certain respects are those for expression of polynucleotides and polypeptides of the present invention.
  • generally, such vectors comprise cis-acting control regions effective for expression in a host, operatively linked to the polynucleotide to be expressed.
  • Appropriate trans-acting factors either are supplied by the host, supplied by a complementing vector or supplied by the vector itself upon introduction into the host.
  • the vectors provide for specific expression.
  • Such specific expression may be inducible expression or expression only in certain types of cells or both inducible and cell-specific.
  • Particularly preferred among inducible vectors are vectors that can be induced for expression by environmental factors that are easy to manipulate, such as temperature and nutrient additives.
  • a variety of vectors suitable to this aspect of the invention, including constitutive and inducible expression vectors for use in prokaryotic and eukaryotic hosts, are well known and employed routinely by those of skill in the art.
  • vectors can be used to express a polypeptide of the invention.
  • Such vectors include, among others, chromosomal, episomal and virus-derived vectors, e.g., vectors derived from viral plasmids, from bacteriophage, from transposons, from yeast episomes, from insertion elements, from yeast chromosomal elements, from viruses such as baculoviruses, papova viruses, such as SV40, vaccinia viruses, adenoviruses, fowl pox viruses, pseudorabies viruses and retroviruses, and vectors derived from combinations thereof, such as those derived from plasmid and bacteriophage genetic elements, such as cosmids and phagemids, all may be used for expression in accordance with this aspect of the present invention.
  • any vector suitable to maintain, propagate or express polynucleotides to express a polypeptide in a host may be used for expression in this regard.
  • DNA sequence may be inserted into the vector by any of a variety of well-known and routine techniques, such as, for example, those set forth in Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989).
  • the DNA sequence in the expression vector is operatively linked to appropriate expression control sequence(s), including, for instance, a promoter to direct mRNA transcription.
  • promoters include, but are not limited to, the phage lambda PL promoter, the E. coli lac, trp and tac promoters, the SV40 early and late promoters and promoters of retroviral LTRs.
  • expression constructs will contain sites for transcription initiation and termination, and, in some instances, in the transcribed region, a ribosome binding site for translation.
  • the coding portion of the mature transcripts expressed by the constructs will include a translation-initiating AUG at the beginning and a termination codon appropriately positioned at the end of the polypeptide to be translated.
  • constructs may contain control regions that regulate as well as engender expression.
  • generally, such regions will operate by controlling transcription, for example through transcription factors, repressor binding sites and termination signals, among others.
  • Vectors for propagation and expression generally will include selectable markers and amplification regions, such as, for example, those set forth in Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989).
  • bacterial cells such as streptococci, staphylococci, E. coli, streptomyces and Bacillus subtilis cells
  • fungal cells such as yeast cells and Aspergillus cells
  • insect cells such as Drosophila S2 and Spodoptera Sf9 cells
  • animal cells such as CHO, COS, HeLa, C127, 3T3, BHK, 293 and Bowes melanoma cells
  • plant cells
  • the following vectors, which are commercially available, are provided by way of example.
  • among vectors preferred for use in bacteria are pQE70, pQE60 and pQE-9, available from Qiagen; pBS vectors, Phagescript vectors, Bluescript vectors, pNH8A, pNH16a, pNH18A and pNH46A, available from Stratagene; ptrc99a, pKK223-3, pKK233-3, pDR540 and pRIT5, available from Pharmacia; and pBR322 (ATCC 37017).
  • among preferred eukaryotic vectors are pWLNEO, pSV2CAT, pOG44, pXT1 and pSG, available from Stratagene; and pSVK3, pBPV, pMSG and pSVL, available from Pharmacia. These vectors are listed solely by way of illustration of the many commercially available and well known vectors that are available to those of skill in the art for use in accordance with this aspect of the present invention. It will be appreciated that any other plasmid or vector suitable for, for example, introduction, maintenance, propagation or expression of a polynucleotide or polypeptide of the invention in a host may be used in this aspect of the invention.
  • Promoter regions can be selected from any desired gene using vectors that contain a reporter transcription unit lacking a promoter region, such as a chloramphenicol acetyl transferase ("CAT") transcription unit, downstream of a restriction site or sites for introducing a candidate promoter fragment, i.e., a fragment that may contain a promoter.
  • introduction into the vector of a promoter-containing fragment at the restriction site upstream of the cat gene engenders production of CAT activity, which can be detected by standard CAT assays.
  • Vectors suitable to this end are well known and readily available, such as pKK232-8 and pCM7.
  • Promoters for expression of polynucleotides of the present invention include not only well known and readily available promoters, but also promoters that readily may be obtained by the foregoing technique, using a reporter gene.
  • Prokaryotic promoters suitable for expression of polynucleotides and polypeptides in accordance with the present invention include the E. coli lacI and lacZ promoters, the T3 and T7 promoters, the gpt promoter, the lambda PR and PL promoters and the trp promoter.
  • Eukaryotic promoters suitable in this regard include the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, the promoters of retroviral LTRs, such as those of the Rous sarcoma virus ("RSV"), and metallothionein promoters, such as the mouse metallothionein-I promoter.
  • Recombinant expression vectors will include, for example, origins of replication, a promoter preferably derived from a highly-expressed or regulatable gene to direct transcription of a downstream structural sequence, and a selectable marker to permit isolation of vector-containing cells after exposure to the vector.
  • Polynucleotides of the invention encoding the heterologous structural sequence of a polypeptide of the invention generally will be inserted into the vector using standard techniques so that it is operably linked to the promoter for expression.
  • the polynucleotide will be positioned so that the transcription start site is located appropriately 5' to the AUG that initiates translation of the polypeptide to be expressed.
  • a ribosome binding site may be located between the transcription start site and the initiating AUG.
  • Generally, there will be a translation stop codon at the end of the polypeptide, and there will be a polyadenylation signal in constructs for use in eukaryotic hosts.
  • A transcription termination signal appropriately disposed at the 3' end of the transcribed region may also be included in the polynucleotide construct.
  • secretion signals may be incorporated into the expressed polypeptide. These signals may be endogenous to the polypeptide or they may be heterologous signals.
  • the polypeptide may be expressed in a modified form, such as a fusion protein, and may include not only secretion signals but also additional heterologous functional regions.
  • A region of additional amino acids, particularly charged amino acids, may be added to the N- or C-terminus of the polypeptide to improve stability and persistence in the host cell, during purification or during subsequent handling and storage.
  • a region may be added to the polypeptide to facilitate purification. Such regions may be removed prior to final preparation of the polypeptide.
  • The addition of peptide moieties to polypeptides to engender secretion or excretion, to improve stability or to facilitate purification, among others, is a familiar and routine technique in the art.
  • a preferred fusion protein comprises a heterologous region from immunoglobulin that is useful to solubilize or purify polypeptides.
  • EP-A-0 464 533 (Canadian counterpart 2045869) discloses fusion proteins comprising various portions of constant region of immunoglobulin molecules together with another protein or part thereof.
  • Proteins have been fused with antibody Fc portions for the purpose of high-throughput screening assays to identify antagonists. See D. Bennett et al., Journal of Molecular Recognition 8: 52-58 (1995) and K. Johanson et al., The Journal of Biological Chemistry 270(16): 9459-9471 (1995).
  • Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents; such methods are well known to those skilled in the art.
  • Mammalian expression vectors may comprise an origin of replication, a suitable promoter and enhancer, and also any necessary polyadenylation sites, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences that are necessary for expression.
  • DNA sequences derived from the SV40 splice sites, and the SV40 polyadenylation sites are used for required non-transcribed genetic elements of these types.
  • HSV-2 polypeptides can be recovered and purified from recombinant cell cultures by well-known methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Most preferably, high performance liquid chromatography ("HPLC") is employed for purification. Well known techniques for refolding protein may be employed to regenerate active conformation when the polypeptide is denatured during isolation and/or purification.
  • Polypeptides of the present invention include naturally purified products, products of chemical synthetic procedures, and products produced by recombinant techniques from a prokaryotic or eukaryotic host, including, for example, viral, yeast, higher plant, insect and mammalian cells. Depending upon the host employed in a recombinant production procedure, the polypeptides of the present invention may be glycosylated or may be non-glycosylated. In addition, polypeptides of the invention may also include an initial modified methionine residue, in some cases as a result of host-mediated processes.
  • HSV-2 polynucleotides and polypeptides may be used in accordance with the present invention for a variety of applications, particularly those that make use of the chemical and biological properties of HSV-2. Additional applications relate to diagnosis and to treatment of disorders of cells, tissues and organisms. These aspects of the invention are illustrated further by the following discussion.
  • HSV-2 polynucleotides may be used to detect complementary polynucleotides, for example, as a diagnostic reagent. Detection of HSV-2 polynucleotides in a eukaryote, particularly a mammal, and especially a human, will provide a diagnostic method that can add to, define or allow a diagnosis of a disease. Eukaryotes (herein also "individual(s)"), particularly mammals, and especially humans, infected by HSV-2 may be detected at the DNA or RNA level by a variety of techniques.
  • Nucleic acids for diagnosis may be obtained from an individual's cells, tissues, and fluids, such as brain, bone, blood, muscle, cartilage, skin, saliva, urine, semen, and mucus. Tissue biopsy and autopsy material is also preferred for samples from an individual to use in a diagnostic assay.
  • The viral DNA may be used directly for detection or may be amplified enzymatically by using PCR prior to analysis (Saiki et al., Nature 324: 163-166 (1986)).
  • RNA or cDNA may also be used in the same ways.
  • PCR primers complementary to the nucleic acid encoding HSV-2 can be used to identify and analyze HSV-2 presence and expression.
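Primers of this kind are usually screened for G+C content and an approximate melting temperature before use. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough approximation for short oligonucleotides and is swapped in here for illustration; the source does not specify a primer-design method, and the primer shown is hypothetical rather than an HSV-2 sequence from the tables.

```python
def gc_content(primer: str) -> float:
    """Percent G+C in a primer sequence."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule, 2(A+T) + 4(G+C).

    Only a first approximation, intended for short oligos (roughly 14-20 nt).
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primer = "ATGCGTACGTTAGCCA"  # hypothetical 16-mer
print(gc_content(primer), wallace_tm(primer))  # 50.0 48
```

For longer primers, nearest-neighbour thermodynamic models give better estimates than this counting rule.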
  • characterization of the strain of virus present in a eukaryote, particularly a mammal, and especially a human may be made by an analysis of the genotype of the viral gene. For example, deletions and insertions can be detected by a change in size of the amplified product in comparison to the genotype of a reference sequence. Point mutations can be identified by hybridizing amplified DNA to radiolabeled HSV-2 RNA or alternatively, radiolabeled HSV-2 antisense DNA sequences. Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures. Sequence differences between a reference gene and genes having mutations also may be revealed by direct DNA sequencing.
  • cloned DNA segments may be employed as probes to detect specific DNA segments.
  • the sensitivity of such methods can be greatly enhanced by appropriate use of PCR or another amplification method.
  • a sequencing primer is used with double-stranded PCR product or a single-stranded template molecule generated by a modified PCR.
  • the sequence determination is performed by conventional procedures with radiolabeled nucleotide or by automatic sequencing procedures with fluorescent-tags.
  • DNA sequence differences may be detected as alterations in the electrophoretic mobility of DNA fragments in gels, with or without denaturing agents. Small sequence deletions and insertions can be visualized by high resolution gel electrophoresis. DNA fragments of different sequences may be distinguished on denaturing formamide gradient gels in which the mobilities of different DNA fragments are retarded in the gel at different positions according to their specific melting or partial melting temperatures (see, e.g., Myers et al., Science 230: 1242 (1985)). Sequence changes at specific locations also may be revealed by nuclease protection assays, such as RNase and S1 protection, or by the chemical cleavage method (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA 85: 4397-4401 (1985)).
  • The detection of a specific DNA sequence may be achieved by methods such as hybridization, RNase protection, chemical cleavage, direct DNA sequencing or the use of restriction enzymes (e.g., restriction fragment length polymorphisms ("RFLP")) and Southern blotting of genomic DNA.
  • mutations also can be detected by in situ analysis.
  • Cells carrying mutations or polymorphisms in the gene of the present invention may also be detected at the DNA level by a variety of techniques, to allow for serotyping, for example.
  • Nucleic acids for diagnosis may be obtained from an infected individual's cells, including but not limited to blood, urine, saliva, tissue biopsy and autopsy material or from virus isolated and cultured from the above or other sources.
  • The viral DNA may be used directly for detection or may be amplified enzymatically by using PCR (Saiki et al., Nature 324: 163-166 (1986)) prior to analysis.
  • RT-PCR can also be used to detect mutations. It is particularly preferred to use RT-PCR in conjunction with automated detection systems, such as, for example, GeneScan.
  • RNA or cDNA may also be used for the same purpose, via PCR or RT-PCR.
  • PCR primers complementary to the nucleic acid encoding HSV-2 can be used to identify and analyze mutations. For example, deletions and insertions can be detected by a change in size of the amplified product in comparison to the normal genotype. Point mutations can be identified by hybridizing amplified DNA to radiolabeled RNA or alternatively, radiolabeled antisense DNA sequences. Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures.
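The size-based call described above reduces to comparing the observed amplicon length with the reference length. A minimal sketch, assuming sizes read off a gel and a hypothetical 362 bp reference amplicon with a sizing tolerance of 2 bp (all values and the name `classify_amplicon` are illustrative):

```python
def classify_amplicon(observed_bp: int, reference_bp: int, tolerance_bp: int = 0) -> str:
    """Call an insertion or deletion from amplicon length alone."""
    delta = observed_bp - reference_bp
    if abs(delta) <= tolerance_bp:
        # Size alone cannot reveal point mutations; those need hybridization,
        # RNase A mismatch cleavage, melting behaviour or sequencing.
        return "no size change"
    kind = "insertion" if delta > 0 else "deletion"
    return f"{kind} of about {abs(delta)} bp"


# Hypothetical reference amplicon of 362 bp; observed sizes as read off a gel.
for observed in (362, 350, 371):
    print(observed, classify_amplicon(observed, reference_bp=362, tolerance_bp=2))
```

The tolerance stands in for gel sizing error; a real threshold would depend on the electrophoresis system used.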
  • the primers may be used to amplify the gene isolated from the individual such that the gene may then be subject to various techniques for elucidation of the DNA sequence. In this way, mutations in the DNA sequence may be detected.
  • Polypeptide assays:
  • The present invention also relates to quantitative and diagnostic assays for detecting levels of HSV-2 protein in cells and tissues, including determination of normal and abnormal levels.
  • A diagnostic assay in accordance with the invention for detecting expression of HSV-2 protein compared to normal control tissue samples may be used to detect the presence of an infection.
  • Assay techniques that can be used to determine levels of a protein, such as an HSV-2 protein of the present invention, in a sample derived from a host are well known to those of skill in the art.
  • Such assay methods include radioimmunoassays, competitive-binding assays, Western blot analysis and ELISA assays. Among these, ELISAs frequently are preferred.
  • An ELISA assay initially comprises preparing an antibody specific to HSV-2, preferably a monoclonal antibody.
  • A reporter antibody generally is prepared which binds to the monoclonal antibody.
  • the reporter antibody is attached to a detectable reagent such as radioactive, fluorescent or
  • The polypeptides, their fragments or other derivatives, or analogs thereof, or cells expressing them can be used as an immunogen to produce antibodies thereto.
  • The present invention includes, for example, monoclonal and polyclonal antibodies, chimeric, single chain, and humanized antibodies, as well as Fab fragments, or the product of an Fab expression library.
  • Antibodies generated against the polypeptides corresponding to a sequence of the present invention can be obtained by direct injection of the polypeptides into an animal or by administering the polypeptides to an animal, preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies binding the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from tissue expressing that polypeptide.
  • Any technique known in the art which provides antibodies produced by continuous cell line cultures can be used to prepare monoclonal antibodies. Examples include various techniques, such as those in Kohler, G. and Milstein, C., Nature 256: 495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); and Cole et al., pp. 77-96 in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc. (1985).
  • Phage display technology could be utilized to select antibody genes with binding activities towards the polypeptide either from repertoires of PCR-amplified v-genes of lymphocytes from humans screened for possessing anti-Fbp or from naive libraries (McCafferty, J. et al., Nature 348: 552-554 (1990); Marks, J. et al., Biotechnology 10: 779-783 (1992)).
  • The affinity of these antibodies can also be improved by chain shuffling (Clackson, T. et al., Nature 352: 624-628 (1991)).
  • Where an antibody has two antigen-binding domains, each domain may be directed against a different epitope; such antibodies are termed 'bispecific' antibodies.
  • the above-described antibodies may be employed to isolate or to identify clones expressing the polypeptide or purify the polypeptide of the present invention by attachment of the antibody to a solid support for isolation and/or purification by affinity chromatography.
  • Antibodies against HSV-2 may be employed to inhibit and/or treat infections, particularly viral infections, and especially HSV-2 infections, as well as to monitor the effectiveness of antibiotic treatment.
  • Polypeptide derivatives include antigenically, epitopically or immunologically equivalent derivatives which form a particular aspect of this invention.
  • The term "antigenically equivalent derivative" as used herein encompasses a polypeptide or its equivalent which will be specifically recognized by certain antibodies which, when raised to the protein or polypeptide according to the present invention, interfere with the immediate physical interaction between pathogen and mammalian host.
  • the term “immunologically equivalent derivative” as used herein encompasses a peptide or its equivalent which when used in a suitable formulation to raise antibodies in a vertebrate, the antibodies act to interfere with the immediate physical interaction between pathogen and mammalian host.
  • the polypeptide such as an antigenically or immunologically equivalent derivative or a fusion protein thereof, is used as an antigen to immunize a mouse or other animal such as a rabbit, rat or chicken.
  • the fusion protein may provide stability to the polypeptide.
  • The antigen may be associated, for example by conjugation, with an immunogenic carrier protein, for example bovine serum albumin (BSA) or keyhole limpet haemocyanin (KLH).
  • a multiple antigenic peptide comprising multiple copies of the protein or polypeptide, or an antigenically or immunologically equivalent polypeptide thereof may be sufficiently antigenic to improve immunogenicity so as to obviate the use of a carrier.
  • the antibody or derivative thereof is modified to make it less immunogenic in the individual.
  • The antibody may most preferably be "humanised", where the complementarity determining region(s) of the hybridoma-derived antibody has been transplanted into a human monoclonal antibody, for example as described in Jones, P. et al., Nature 321: 522-525 (1986) or Tempest et al., Biotechnology 9: 266-273 (1991).
  • The above antibody reagents will also be useful for assessing the biological role of the gene through antibody inhibition studies, immunoprecipitation studies, super-shift experiments and similar techniques. These studies may lead to discovery of novel protein:protein interactions which may be useful drug targets.
  • the above antibody reagents may lead to the identification of novel viral proteins not predicted by the DNA sequence, which in turn may be novel drug targets.
  • HSV-2 binding molecules and assays:
  • This invention also provides a method for identification of molecules, such as binding molecules, that bind HSV-2.
  • Genes encoding proteins that bind HSV-2, such as binding proteins, can be identified by numerous methods known to those of skill in the art, for example, ligand panning and FACS sorting. Such methods are described in many laboratory manuals such as, for instance, Coligan et al., Current Protocols in Immunology 1(2): Chapter 5 (1991).
  • expression cloning may be employed for this purpose.
  • polyadenylated RNA is prepared from a cell expressing HSV-2, a cDNA library is created from this RNA, the library is divided into pools and the pools are transfected individually into cells that are not expressing HSV-2. The transfected cells then are exposed to labeled HSV-2.
  • HSV-2 can be labeled by a variety of well-known techniques including standard methods of radio-iodination or inclusion of a recognition site for a site-specific protein kinase. Following exposure, the cells are fixed and binding of HSV-2 is determined. These procedures conveniently are carried out on glass slides.
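The pooling step of expression cloning can be sketched as bookkeeping over clone identifiers. The source describes dividing the library into pools and testing each pool; the repeated halving of a positive pool below (sib selection) is a commonly used follow-up added here as an assumption. The predicate `binds` is a hypothetical stand-in for "transfect the pool, expose to labeled HSV-2, score binding", and the clone names and pool size are illustrative.

```python
from typing import Callable, List, Sequence


def make_pools(clones: Sequence[str], pool_size: int) -> List[List[str]]:
    """Split a cDNA library (represented by clone IDs) into pools for transfection."""
    return [list(clones[i:i + pool_size]) for i in range(0, len(clones), pool_size)]


def sib_select(pool: List[str], binds: Callable[[List[str]], bool]) -> str:
    """Halve a positive pool until one clone remains.

    Assumes exactly one clone in `pool` confers binding of labeled HSV-2.
    """
    while len(pool) > 1:
        first_half = pool[: len(pool) // 2]
        pool = first_half if binds(first_half) else pool[len(pool) // 2:]
    return pool[0]


library = [f"clone{i:04d}" for i in range(1000)]
pools = make_pools(library, pool_size=100)


def binds(pool: List[str]) -> bool:
    # Hypothetical assay readout: only clone0421 encodes a binding protein.
    return "clone0421" in pool


positive_pool = next(p for p in pools if binds(p))
print(sib_select(positive_pool, binds))  # clone0421
```

Halving needs about log2(pool size) further rounds of transfection, which is why pools are kept moderately sized.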
  • a labeled ligand can be photoaffinity linked to a cell extract, such as a membrane or a membrane extract, prepared from cells that express a molecule that it binds, such as a binding molecule.
  • Cross-linked material is resolved by polyacrylamide gel electrophoresis ("PAGE") and exposed to X-ray film.
  • The labeled complex containing the ligand-binding molecule can be excised, resolved into peptide fragments, and subjected to protein microsequencing.
  • the amino acid sequence obtained from microsequencing can be used to design unique or degenerate oligonucleotide probes to screen cDNA libraries to identify genes encoding the putative binding molecule.
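Designing a degenerate probe from a microsequenced stretch amounts to back-translating each residue into a degenerate codon written with IUPAC ambiguity codes, then checking how many distinct sequences the mixture contains. A minimal sketch using the standard genetic code; the peptide "MWHKDE" is hypothetical, and the single-codon choices for Leu, Arg and Ser are simplifications that over-cover.

```python
IUPAC_BASES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "H": 3, "B": 3, "V": 3, "D": 3, "N": 4,
}

# One degenerate codon per amino acid (standard genetic code). Leu, Arg and Ser
# have codons that no single IUPAC triplet captures exactly, so YTN, MGN and
# WSN over-cover (e.g. YTN also matches the Phe codons TTT/TTC).
DEGENERATE_CODON = {
    "A": "GCN", "C": "TGY", "D": "GAY", "E": "GAR", "F": "TTY",
    "G": "GGN", "H": "CAY", "I": "ATH", "K": "AAR", "L": "YTN",
    "M": "ATG", "N": "AAY", "P": "CCN", "Q": "CAR", "R": "MGN",
    "S": "WSN", "T": "ACN", "V": "GTN", "W": "TGG", "Y": "TAY",
}


def back_translate(peptide: str) -> str:
    """Degenerate sense-strand oligonucleotide for a peptide sequence."""
    return "".join(DEGENERATE_CODON[aa] for aa in peptide.upper())


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences in the degenerate oligonucleotide mixture."""
    count = 1
    for base in oligo:
        count *= IUPAC_BASES[base]
    return count


probe = back_translate("MWHKDE")  # hypothetical microsequenced stretch
print(probe, degeneracy(probe))    # ATGTGGCAYAARGAYGAR 16
```

Stretches rich in Met and Trp and poor in Leu, Arg and Ser keep the degeneracy low, which is why they are usually preferred for probe design.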
  • Polypeptides of the invention also can be used to assess HSV-2 binding capacity of HSV-2 binding molecules in cells or in cell-free preparations. Polypeptides of the invention may also be used to assess the binding of small molecule substrates and ligands in, for example, cells, cell-free preparations, chemical libraries, and natural product mixtures. These substrates and ligands may be natural substrates and ligands or may be structural or functional mimetics.
  • This invention also provides a method of screening drugs to identify those which interfere with the proteins selected as targets herein, which method comprises measuring the interference of the activity of the protein by a test drug. For example if the protein selected has a catalytic activity, after suitable purification and formulation the activity of the enzyme can be followed by its ability to convert its natural substrates. By incorporating different chemically synthesised test compounds or natural products into such an assay of enzymatic activity one is able to detect those additives which compete with the natural substrate or otherwise inhibit enzymatic activity.
  • the invention also relates to activators and inhibitors identified thereby.
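For a target with catalytic activity, the screen described above is often summarized as percent inhibition relative to an uninhibited control, after subtracting a no-enzyme blank. A minimal sketch under that common convention (the source does not give a formula); the rates, compound names and the 50% hit threshold are hypothetical.

```python
def percent_inhibition(rate_with_compound: float, rate_uninhibited: float,
                       rate_blank: float = 0.0) -> float:
    """Inhibition of substrate conversion, relative to an uninhibited control."""
    window = rate_uninhibited - rate_blank
    if window <= 0:
        raise ValueError("uninhibited rate must exceed the blank")
    return 100.0 * (1.0 - (rate_with_compound - rate_blank) / window)


# Hypothetical rates (product formed per minute) for three test compounds.
control, blank = 110.0, 10.0
rates = {"cmpd-A": 35.0, "cmpd-B": 105.0, "cmpd-C": 60.0}
inhibition = {name: percent_inhibition(r, control, blank) for name, r in rates.items()}
print({name: pct for name, pct in inhibition.items() if pct >= 50.0})
```

Negative values would flag apparent activators, which the text also places within the scope of the screen.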
  • Another aspect of the invention relates to use of a polynucleotide in genetic immunization, which will preferably employ a suitable delivery method such as direct injection of plasmid DNA into muscles (Wolff et al., Hum. Mol. Genet. 1: 363 (1992); Manthorpe et al., Hum. Gene Ther. 4: 419 (1993)), delivery of DNA complexed with specific protein carriers (Wu et al., J. Biol. Chem. 264: 16985 (1989)), coprecipitation of DNA with calcium phosphate (Benvenisty & Reshef, Proc. Nat'l Acad. Sci. USA.
  • Suitable promoters for muscle transfection include CMV, RSV, SRα, actin, MCK, alpha globin, adenovirus and dihydrofolate reductase.
  • The active agent (i.e., the polypeptide, polynucleotide or inhibitor of the invention) may be administered to a patient as an injectable composition, for example as a sterile aqueous dispersion, preferably isotonic.
  • Vaccines:
  • Another aspect of the invention relates to a method for inducing an immunological response in an individual, particularly a mammal, which comprises inoculating the individual with HSV-2 polypeptide, or an antigenic fragment or variant thereof, adequate to produce antibody to protect said individual from infection, particularly HSV-2 infection.
  • Yet another aspect of the invention relates to a method of inducing immunological response in an individual which comprises, through gene therapy, delivering a gene encoding HSV-2, or an antigenic fragment or a variant thereof, for expressing HSV-2, or a fragment or a variant thereof in vivo in order to induce an immunological response to produce antibody to protect said individual from disease.
  • A further aspect of the invention relates to an immunological composition which, when introduced into a host capable of having induced within it an immunological response, induces an immunological response in such host to HSV-2 or a protein coded therefrom, wherein the composition comprises a recombinant HSV-2 or protein coded therefrom comprising DNA which codes for and expresses an antigen of said HSV-2 or protein coded therefrom.
  • the HSV-2 or a fragment thereof may be fused with a co-protein which may not by itself produce antibodies, but is capable of stabilizing the first protein and producing a fused protein which will have immunogenic and protective properties.
  • This fused recombinant protein preferably further comprises an antigenic co-protein, such as glutathione S-transferase (GST) or beta-galactosidase, relatively large co-proteins which solubilise the protein and facilitate its production and purification.
  • the co-protein may act as an adjuvant in the sense of providing a generalized stimulation of the immune system.
  • the co-protein may be attached to either the amino or carboxy terminus of the first protein.
  • the present invention also includes a vaccine formulation which comprises the immunogenic recombinant protein together with a suitable carrier. Since the protein may be broken down in the stomach, it is preferably administered parenterally, including, for example, administration that is subcutaneous, intramuscular, intravenous, or intradermal.
  • Formulations suitable for parenteral administration include aqueous and non-aqueous sterile injection solutions which may contain anti-oxidants, buffers, bacteriostats and solutes which render the formulation isotonic with the bodily fluid, preferably the blood, of the individual; and aqueous and non-aqueous sterile suspensions which may include suspending agents or thickening agents.
  • the formulations may be presented in unit-dose or multi-dose containers, for example, sealed ampoules and vials and may be stored in a freeze-dried condition requiring only the addition of the sterile liquid carrier immediately prior to use.
  • The vaccine formulation may also include adjuvant systems for enhancing the immunogenicity of the formulation, such as oil-in-water systems and other systems known in the art. The dosage will depend on the specific activity of the vaccine and can be readily determined by routine experimentation.
  • Whilst the invention has been described with reference to a certain HSV-2 polypeptide, it is to be understood that this covers fragments of the naturally occurring protein and similar proteins (for example, having sequence homologies of 75% or greater) with additions, deletions or substitutions which do not substantially affect the immunogenic properties of the recombinant protein.
  • Compositions:
  • The invention also relates to compositions comprising the polynucleotide or the polypeptides discussed above or the inhibitors.
  • the polypeptides of the present invention may be employed in combination with a non-sterile or sterile carrier or carriers for use with cells, tissues or organisms, such as a pharmaceutical carrier suitable for administration to a subject.
  • Such compositions comprise, for instance, a media additive or a therapeutically effective amount of a polypeptide of the invention and a pharmaceutically acceptable carrier or excipient.
  • Such carriers may include, but are not limited to, saline, buffered saline, dextrose, water, glycerol, ethanol and combinations thereof. The formulation should suit the mode of administration.
  • the invention further relates to diagnostic and pharmaceutical packs and kits comprising one or more containers filled with one or more of the ingredients of the aforementioned compositions of the invention.
  • Associated with such containers can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, reflecting approval by the agency of the manufacture, use or sale of the product for human administration.
  • Polypeptides and other compounds of the present invention may be employed alone or in conjunction with other compounds, such as therapeutic compounds.
  • compositions may be administered in any effective, convenient manner including, for instance, administration by topical, oral, anal, vaginal, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes among others.
  • compositions generally are administered in an amount effective for treatment or prophylaxis of a specific indication or indications. It will be appreciated that optimum dosage will be determined by standard methods for each treatment modality and indication, taking into account the indication, its severity, route of administration, complicating conditions and the like.
  • the active agent may be administered to an individual as an injectable composition, for example as a sterile aqueous dispersion, preferably isotonic.
  • the composition may be formulated for topical application for example in the form of ointments, creams, lotions, eye ointments, eye drops, ear drops, mouthwash, impregnated dressings and sutures and aerosols, and may contain appropriate conventional additives, including, for example, preservatives, solvents to assist drug penetration, and emollients in ointments and creams.
  • Such topical formulations may also contain compatible conventional carriers, for example cream or ointment bases, and ethanol or oleyl alcohol for lotions.
  • Such carriers may constitute from about 1% to about 98% by weight of the formulation; more usually they will constitute up to about 80% by weight of the formulation.
  • the daily dosage level of the active agent will be from 0.01 mg/kg to 10 mg/kg, typically around 1 mg/kg.
  • the physician in any event will determine the actual dosage which will be most suitable for an individual and will vary with the age, weight and response of the particular individual.
  • the above dosages are exemplary of the average case. There can, of course, be individual instances where higher or lower dosage ranges are merited, and such are within the scope of this invention.
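The dosage statements above reduce to weight-based arithmetic. The sketch below only multiplies body weight by a per-kg level and flags levels outside the exemplary 0.01-10 mg/kg range; because the text allows higher or lower doses in individual cases, the range check is a flag rather than a limit. The body weight and function names are hypothetical, and the actual dose remains a matter for the physician.

```python
DAILY_RANGE_MG_PER_KG = (0.01, 10.0)  # exemplary daily range stated in the text
TYPICAL_MG_PER_KG = 1.0               # "typically around 1 mg/kg"


def daily_dose_mg(weight_kg: float, mg_per_kg: float = TYPICAL_MG_PER_KG) -> float:
    """Total daily amount (mg) for a body weight (kg) and a per-kg dosage level."""
    if weight_kg <= 0 or mg_per_kg <= 0:
        raise ValueError("weight and dosage level must be positive")
    return weight_kg * mg_per_kg


def within_stated_range(mg_per_kg: float) -> bool:
    """True if the per-kg level lies in the exemplary 0.01-10 mg/kg range.

    Higher or lower levels may be merited, so this flags for review only.
    """
    low, high = DAILY_RANGE_MG_PER_KG
    return low <= mg_per_kg <= high


weight = 70.0  # hypothetical body weight, kg
print(daily_dose_mg(weight))        # 70.0 mg at the typical 1 mg/kg
print(within_stated_range(20.0))    # False: outside the exemplary range
```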
  • The composition of the invention may be administered by injection to achieve a systemic effect against the relevant virus shortly before insertion of an in-dwelling device. Treatment may be continued after surgery during the in-body time of the device. In addition, the composition could also be used to broaden perioperative cover for any surgical technique to prevent viral reactivation.
  • composition of the invention may be used to bathe an indwelling device immediately before insertion.
  • a vaccine composition is conveniently in injectable form.
  • Conventional adjuvants may be employed to enhance the immune response.
  • a suitable unit dose for vaccination is 0.5-5 microgram/kg of antigen, and such dose is preferably administered 1-3 times and with an interval of 1-3 weeks.
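The vaccination guidance (0.5-5 microgram/kg of antigen, given 1-3 times at intervals of 1-3 weeks) can be turned into a concrete schedule. A minimal sketch; the body weight, the chosen values within each range and the function name `vaccination_plan` are hypothetical.

```python
from typing import List, Tuple


def vaccination_plan(weight_kg: float, ug_per_kg: float, n_doses: int,
                     interval_weeks: int) -> List[Tuple[int, float]]:
    """(week, antigen in micrograms) pairs within the ranges stated in the text."""
    if not 0.5 <= ug_per_kg <= 5.0:
        raise ValueError("unit dose should be 0.5-5 microgram/kg")
    if not 1 <= n_doses <= 3:
        raise ValueError("dose should be administered 1-3 times")
    if not 1 <= interval_weeks <= 3:
        raise ValueError("interval should be 1-3 weeks")
    dose_ug = weight_kg * ug_per_kg
    return [(i * interval_weeks, dose_ug) for i in range(n_doses)]


print(vaccination_plan(60.0, 1.0, n_doses=3, interval_weeks=2))
# [(0, 60.0), (2, 60.0), (4, 60.0)]
```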
  • This protocol describes the preparation of herpes simplex virus type 2 strain SB5 DNA for sequencing. It combines two protocols, both of which have been modified. Part one describes the crude isolation of the viral DNA from host cell DNA (Hirt, B., J. Mol. Biol. 26: 365-369 (1967)), and part two describes the ultra-purification of the viral DNA through a cesium chloride (CsCl) gradient (Vinograd, J., et al., Proc. Nat'l Acad. Sci. USA 2: 902-910 (1963)).
  • Infected monolayers were harvested by scraping and placed in 10ml of cold 1x PBS.
  • Three roller bottles of infected cells were combined (3 x 10° cells). The cells were spun at 2000g for 5 minutes. The supernatant was removed, and 25ml of DNA extraction buffer (0.25% Triton X-100, 10mM EDTA, 10mM Tris pH 8.0) was added to the cell pellet.
  • The lysate was mixed at room temperature for 10 minutes. Then 1ml of 5M NaCl (0.2M final concentration) was added to the lysate and mixed for another 15 minutes. The lysate was centrifuged at 10,000g for 30 minutes at 4°C. The supernatant, which contains the viral DNA, was saved, and the pellet, which contains mostly chromosomal DNA, was discarded.
  • The precipitate was centrifuged at 10,000g for 30 minutes at 4°C. The pellet was washed once with 70% ethanol, air dried for 30 minutes, and resuspended in 250µl of TE (10mM Tris, pH 7.5, 2mM EDTA). RNase A was added to a final concentration of 10µg/ml, and the sample was incubated at 37°C for one hour.
  • The DNA was phenol extracted twice and chloroform extracted once; 1/10 volume of 3M sodium acetate and 2.5 volumes of 100% ethanol were then added, and the DNA was allowed to precipitate overnight at -20°C. The next day, the precipitate was spun down at 15,000g for 20 minutes. The pellet was washed once with 70% ethanol, briefly air dried and resuspended in 1ml of TE.
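Two calculations sit behind this step: the salt concentration after adding stock to the lysate (C1V1 = C2V2), and the rotor speed needed for a given g-force, using the standard relation RCF = 1.118e-5 x r(cm) x rpm². The sketch assumes the lysate volume is about 25 ml (the buffer volume, ignoring the pellet) and a hypothetical rotor radius of 10 cm; on those assumptions the stated "0.2M final" is a rounded figure.

```python
import math


def final_molarity(stock_molar: float, added_ml: float, starting_ml: float) -> float:
    """C1*V1 = C2*V2 after adding a stock solution to an existing volume."""
    return stock_molar * added_ml / (starting_ml + added_ml)


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed giving a relative centrifugal force: RCF = 1.118e-5 * r * rpm^2."""
    return math.sqrt(rcf_g / (1.118e-5 * radius_cm))


# 1 ml of 5 M NaCl into ~25 ml of lysate (assumed volume).
print(round(final_molarity(5.0, 1.0, 25.0), 3))    # 0.192, i.e. roughly 0.2 M
# 10,000 x g in a rotor with a hypothetical 10 cm radius.
print(round(rpm_for_rcf(10_000, radius_cm=10.0)))  # 9458
```

Because rotor radii differ, protocols quote g-force rather than rpm; the conversion has to be redone for each rotor.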
  • the refractive index of every fourth tube was determined on a refractometer.
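Refractive-index readings are normally converted to buoyant density to locate the DNA band. The sketch below uses the commonly cited empirical relation for CsCl at 25 °C, density = 10.8601 n - 13.4974 g/ml; this conversion is an assumption, since the source does not state how the readings were interpreted, and the tube numbers and readings are hypothetical.

```python
def cscl_density(refractive_index: float) -> float:
    """Approximate CsCl solution density (g/ml) at 25 deg C from refractive index."""
    return 10.8601 * refractive_index - 13.4974


# Hypothetical readings from every fourth fraction of a gradient.
readings = {1: 1.4100, 5: 1.4040, 9: 1.3995, 13: 1.3950, 17: 1.3900}
for tube, n in readings.items():
    print(tube, f"{cscl_density(n):.3f} g/ml")
```

Readings should be taken at, or corrected to, the temperature the relation assumes, since refractive index drifts with temperature.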
  • the final DNA prep was concentrated by precipitating with 1/10 volume 3M sodium acetate and 2.5 volumes of 100% ethanol.
  • The DNA was resuspended in TE and the OD260/280 reading was taken.
  • The DNA was then sequenced as described in Sambrook, J. et al. (1989), Chapter 13, supra, or by automated DNA sequencing according to the manufacturer's protocols, e.g., Applied Biosystems/Perkin Elmer, Foster City, CA.
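The OD260/280 reading gives both a concentration estimate and a purity check. A minimal sketch using the conventional factor of 50 µg/ml per A260 unit for double-stranded DNA; the readings and the 1:20 dilution are hypothetical. A ratio near 1.8 is usually taken as clean DNA, and lower values suggest protein carry-over.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor for double-stranded DNA


def dsdna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Double-stranded DNA concentration from absorbance at 260 nm."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280(a260: float, a280: float) -> float:
    """Purity ratio; about 1.8 is typical of clean DNA, lower suggests protein."""
    return a260 / a280


# Hypothetical readings of a 1:20 dilution of the final prep.
a260, a280 = 0.25, 0.135
print(dsdna_ug_per_ml(a260, dilution_factor=20))  # 250.0
print(round(a260_a280(a260, a280), 2))            # 1.85
```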
  • Tables 1, 2 and 3 represent three different sequencing efforts.
  • Table 4 represents polypeptides encoded by ORFs from Table 3.
  • Table 1 provides polynucleotides of the invention and polypeptides encoded by ORFs, wherein the polynucleotide start and end positions for each ORF are indicated by sequence numbers which correlate to the polynucleotide sequence referred to above each given polypeptide in the Table. Additionally, each ORF-encoded polypeptide sequence is labeled with the Contig number matching the Contig number of the polynucleotide sequence from which it was encoded. For ORF sequences wherein the start polynucleotide number is larger than the end polynucleotide number, translation of that polypeptide initiates on the nucleotide strand which is complementary to the strand depicted in the Table.
  • Table 2 obtained from a separately-performed sequencing, provides polynucleotides of the invention and polypeptides encoded by ORFs, wherein the polynucleotide start and end position for ORFs are indicated by sequence numbers which correlate to the polynucleotide sequence referred to above each given polypeptide in the Table.
  • Each ORF-encoded polypeptide sequence is labeled with a Contig number matching the Contig number of the polynucleotide sequence referred to above it, from which it was encoded.
  • where the nucleotide start number is larger than the end number, translation of that polypeptide initiates on the nucleotide strand which is complementary to the strand depicted in the Table.
  • Table 3, obtained from a separately performed sequencing, provides polynucleotides of the invention and polypeptides encoded by ORFs, wherein the polynucleotide start and end positions for each ORF are indicated by sequence numbers which correlate to the polynucleotide sequence referred to above that polypeptide in the Table.
  • Each ORF-encoded polypeptide sequence is labeled with a Contig number matching the Contig number of the polynucleotide sequence appearing above it from which it was encoded.
  • Contig assembly was performed using the publicly available Phrap program (see Table 1).
  • ORF prediction was accomplished using the publicly available GenMark software program (see Table 1). Homologies of the polypeptide sequences to known proteins are indicated. These homologies were determined by comparison with the public database Mpsrch_pp (see Table 1).
  • Table 4 provides ORF sequences of polypeptides encoded by the polynucleotide sequences of Table 3 which were predicted by the GenMark program (see Table 1) as having more than a single start site (N-terminal methionyl residue).
  • the Contig numbers and polynucleotide start and end sites for these ORFs correlate to the Contig numbers and polynucleotide sequence numbers of Table 3.
  • Gene name gene UL5 protein - human herpesvirus 1
  • ORF # 7 from Contig 101
  • ORF start site 7017
  • ORF end site 5815
  • ORF sequence VCIAYHGMGRLTSGVGTAALLWAVGLRWCAKYALADPSLKMADPNRFRGKNLPVLDQL
  • RKHTYNLTIA YRMGDNCAIPITVMEYTECPYNKSLGVCPIRTQPR SYYDSFSAVSEDN
  • ORF start site 1502
  • ORF end site 465
  • ORF sequence
  • ORF # 2 from Contig 102
  • ORF start site 2996
  • ORF end site 1584
  • ORF sequence:
  • ORF start site 6266
  • ORF end site 5253
  • ORF # 6 from Contig 102
  • ORF start site 9861
  • ORF end site 6319
  • ORF sequence VIRRPVRPFGRTAHPASHGPAAVSVHRVRATVTLVPMANRPAASALAGARSPSERQEPRE PEVAPPGGDHVFCRKVSGVMVLSSDPPGPAAYRISDSSFVQCGSNCSMIIDGDVARGHLR DLEGATSTGAFVAISNVAAGGDGRTAWALGGTSGPSATTSVGTQTSGEFLHGNPRTPEP QGPQAVPPPPPPPFP GHECCARRDARGGAEKDVGAAESW ⁇ DGPSSDSETEDSDSSDEDT GSGSETLSRSSSI AAGATDDDDSDSDSRSDDSVQPDVWRRRWSDGPAPVAFPKPRRPG DSPGNPGLGAGTGPGSATDPRASADSDSAAHAAAPQAEVAPVLDSQPTVGTDPGYPVPLE LTPENAEAVARFLGDAVDREPALMLEYFCRCAREESKRVPPRTFGSAPRL
  • ORF # 7 from Contig 102
  • ORF start site 11144
  • ORF end site 10323
  • ORF sequence VRRRLRCARRRRGGPGPHHDQLRRDAGRGAAGPVFRMPARHGPHARVSPRGHAVFRGASV WTQDELASVTAVCSGPQEATHTGHPGRPCSAVTIPACAFVDLDAELCLGGPGAAFLYLV FTYRQCRDQELCCVYWKSQLPPRGLEAALERLFGRLRITNTIHGAEDMTPLPPNRNVDF PLAVLAASSQSPRCSASQVTNPQFVDRLYRWQPDLRGRPTARTCTYAAFAELGVMPDNSP RCLHRTERFGAVGVPWILEGWWRPGG RACA*
  • ORF # 8 from Contig 102
  • ORF start site 11722
  • ORF end site 10667
  • ORF sequence MKTKPLPTAPMA AESAVETTTSPRELAGHAPLRRVLRPPIARRDGPVLLGDRAPRRTAS TM LLGIDPAESSPGTRATRDDTEQAVDKILRGARRAGGLTVPGAPRYHLTRQVTLTDLC QPNAERAGALLLALRHPTDLPHLARHRAPPGRQTERLAEAWGQLLEASALGSGRAESGCA RAGLVSFNFLVAACAAAYDARDAAEAVRAHITTNYGGTRAGARLDRFSECLRAMVHTHVF PHEVMRFFGGLVSWSHRTS LASPPSAADPRRPHTPATRAGPVRPLPSRPAPL TWTPSC A GALGRRSCTWFSPTDSAGTRSSVACTWSRASSPRAD RRPSSGCSGASG*
  • ORF # 3 from Contig 103
  • ORF start site 6853
  • ORF end site 4784
  • ORF sequence: MAAAATPGAKRPADPARDPDSPPKRPRPNSLDLATVFGPRPAPPRPTSPGAPGSHWPQSP PRGQPDGGAPGEKARPASPALSEASSGPPTPDIPLSPGGAHAIDPDCSPGPPDPDPM SA SAIPNALPPHILAETFERHLRGLLRGVRSPLAIGPLWARLDYLCSLWSLEAAGMVDRGL GRHL RLTRRAPPSAAEAVAPRPLMGFYEAATQNQADCQL ALLRRGLTTASTLRWGAQG PCFSSQ LTHNASLRLDAQSSAVMFGRVNEPTARNLLFRYCVGRADAGVNDDADAGRFVF HQPGDLAEENVHACGVLMDGHTGMVGASLDILVCPRDPHGYLAPAPQTPLAFYEVKCRAK YAFDPADPGAPAASAYEDLMARRSPEAFRAFIRSIPNPGVRYFAPGRVPGPEE
  • ORF # 4 from Contig 103
  • ORF start site 5313
  • ORF end site 4990
  • ORF # 5 from Contig 103
  • ORF start site 8477
  • ORF end site 6894
  • ORF sequence VGGRRPGGRMDESGRQRPASHVAADISPQGAHRRSFKAWLASYIHSLSRRASGRPSGPSP RDGAVSGARPGSRRRSSFRERLRAGLSR RVSRSSRRRSSPEAPGPAAKLRRPPLRRSET AMTSPPSPPSHILSLARIHKLCIPVFAVNPALRYTTLEIPGARSFGGSGGYGEVQLICEH KLAVKTIREKEWFAVELVATLLVGECAFCGGRTHDIRGFITPLGFSLQQRQIVFPAYDMD LGKYIGQLASLRATTPSVATALHHCFTDLARAWFLNTRCGISHLDIKCANVLVMLRSDA VSLRRAVLADFSLVTLNSNSTISRGQFCLQEPDLESPRGFGMPAALTTANFHTLVGHGYN QPPELLVKYLNNERAEFNNRPLKHDVGLAVDLYALGQTLLELLVSVYVAPSLG
  • ORF # 7 from Contig 103
  • ORF start site 8863
  • ORF start site 8749
  • ORF end site 10242
  • ORF # 9 from Contig 103
  • ORF start site 11332
  • ORF end site 10115
  • ORF # 3 from Contig 104
  • ORF start site 6099
  • ORF end site 3643
  • ORF # 7 from Contig 104
  • ORF start site 13917
  • ORF end site 9727
  • ORF # 8 from Contig 104
  • ORF start site 14832
  • ORF end site 14164
  • ORF # 12 from Contig 104
  • ORF start site 21285
  • ORF end site 20155
  • ORF sequence MASHAGQQHAPAFGQAARASGPTDGRAASRPSHRQGASEARGDPELPTLLRVYIDGPHGV GKTTTSAQLMEALGPRDNIVYVPEPMTY QVLGASETLTNIYNTQHRLDRGEISAGEAAV VMTSAQITMSTPYAATDAVLAPHIGGEAVGPQAPPPALTLVFDRHPIASLLCYPAARYLM GSMTPQAVLAFVALMPPTAPGTNLVLGVLPEAEHADRLARRQRPGERLDLAMLSAIRRVY DLLANTVRYLQRGGR RED GRLTGVAAATPRPDPEDGAGSLPRIEDTLFALFRVPELLA PNGDLYHIFAWVLDVLADRLLPMHLFVLDYDQSPVGCRDALLRLTAGMIPTRVTTAGSIA EIRDLARTFAREVGGV*
  • ORF # 15 from Contig 104
  • ORF start site 23784
  • ORF end site 24071
  • ORF # 17 from Contig 104
  • ORF start site 25463
  • ORF # 1 from Contig 76
  • ORF start site 111
  • ORF end site 1
  • ORF start site 507
  • ORF end site 2702
  • ORF sequence
  • ORF # 1 from Contig 91
  • ORF start site 364
  • ORF end site 2751
  • ORF sequence
  • ORF # 1 from Contig 93
  • ORF start site 533
  • ORF end site 1678
  • ORF sequence : VALFVPLRLGWDPQTGLWRVERASWGPPAAPRAALLDVEAKVNFNPLALAARVAEHPGA RLAWARLAAIRNSPQCASSASLAVTITTRTARFAREYTTLAFPPTSKEGAFADLVEVCEV CLRPRGHPHRVTARVLLPRGYNYFVSAGDGFSAPALVALFRQWHTTVHPAPGALAPVFAF LGPGFEVRGGPLQYFAVLGFPGWPPFTVPAAAAAESVRDLLRGAACTHPLCPGGPGPRWA PRSSCPRGHGRPWPRRRPAASCPPFGKRWRGGTPRPPPSNYSTPRRPSGRSGRRGFVSPG SRPSSWPPSRASGRPGCRKPGGGRAWKGWTRWWRPPPR ⁇ PGPCWSAWCRTRATPAPRS GSCSAGSWPPSACRSSRRPAR*
  • ORF # 3 from Contig 93
  • ORF start site 3631
  • ORF end site 2705
  • ORF sequence
  • ORF # 4 from Contig 93
  • ORF start site 4286
  • ORF # 4 from Contig 98
  • ORF start site 4922
  • ORF end site 3906
  • ORF # 5 from Contig 9i
  • ORF start site 6334
  • ORF end site 4874
  • ORF # 6 from Contig 99
  • ORF start site 7758
  • ORF end site 5668
  • ORF sequence :
  • [SEQ ID NO: 119] Contig ID 10 Length: 21036 Type: N Check: 7835 ..
  • RVMAGLREALAARERRAQ I EAEGL ANLKTMLKWAVPATVAKTLDQARS VAE I ADQVEVLLDQTEKTR EL
  • [SEQ ID NO: 141] Contig ID 14 Length: 2647 Type: N Check: 2951 .. [SEQ ID NO: 142] >contig14 (start 2661 - stop 97) translated
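The volume and absorbance arithmetic in the DNA preparation steps above can be sketched as follows. This is a minimal illustration, not part of the original protocol: the helper names are hypothetical, "2.5 volumes" of ethanol is assumed to be relative to the starting DNA solution, and the factor of 50 ug/ml per A260 unit for double-stranded DNA (1 cm path length) is the conventional approximation rather than a value stated in the source.

```python
from __future__ import annotations

# Conventional approximation (assumption, not stated in the source):
# an A260 of 1.0 in a 1 cm cuvette corresponds to about 50 ug/ml dsDNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def precipitation_volumes(sample_ul: float) -> tuple[float, float]:
    """Return (3M sodium acetate, 100% ethanol) volumes in ul.

    Sodium acetate is 1/10 volume and ethanol 2.5 volumes, both taken
    relative to the starting DNA solution (an assumed convention).
    """
    sodium_acetate_ul = sample_ul / 10
    ethanol_ul = sample_ul * 2.5
    return sodium_acetate_ul, ethanol_ul


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ug/ml) from a diluted A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """Return the OD260/280 ratio, a rough check for protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Example with the 1ml TE resuspension and hypothetical readings
# taken at a 1:20 dilution.
naoac_ul, etoh_ul = precipitation_volumes(1000)        # 100.0 ul, 2500.0 ul
conc = dsdna_concentration(0.25, dilution_factor=20)   # 250.0 ug/ml
ratio = purity_ratio(0.27, 0.15)                       # about 1.8
```

A ratio near 1.8 is commonly taken to indicate reasonably pure DNA; markedly lower values suggest residual protein or phenol.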
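The coordinate convention described for Tables 1-3 (a start position larger than the end position means the ORF is read from the complementary strand) can be illustrated with a short sketch. It assumes, since the source does not say so explicitly, that positions are 1-based and inclusive; the function names are hypothetical, and translation uses the standard genetic code with * for stop codons, matching the terminal * on several sequences in the Tables.

```python
from __future__ import annotations

# Standard genetic code; codons enumerated with bases in T, C, A, G order
# (first base outermost, third base innermost).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_orf(contig: str, start: int, end: int) -> str:
    """Return ORF nucleotides in 5'->3' coding orientation.

    start/end are assumed to be 1-based, inclusive positions on the depicted
    strand. If start > end, the ORF lies on the complementary strand, so the
    span is reverse-complemented.
    """
    if start <= end:
        return contig[start - 1:end]
    return reverse_complement(contig[end - 1:start])


def translate(orf: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(orf[pos:pos + 3].upper(), "X")
        for pos in range(0, len(orf) - 2, 3)
    )


# Same toy ORF placed on each strand of a short hypothetical contig.
print(translate(extract_orf("GGATGAAATAGCC", 3, 11)))   # MK*
print(translate(extract_orf("GGCTATTTCATCC", 11, 3)))   # MK*
```

Under these assumptions, ORF # 7 from Contig 101 (start 7017, end 5815) would be taken from positions 5815-7017 of the depicted strand and reverse-complemented, giving 1,203 nt (401 codons).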


Abstract

This invention relates to newly identified HSV-2 polynucleotides, polypeptides encoded by such polynucleotides, the uses of such polynucleotides and polypeptides, as well as the production of such polynucleotides and polypeptides and recombinant host cells transformed with the polynucleotides. This invention also relates to inhibiting or activating the biosynthesis or action of such polynucleotides or polypeptides and the use of such inhibitors or activators in therapy.

Description

NOVEL CODING SEQUENCES FROM HERPES SIMPLEX VIRUS TYPE-2
Field of the Invention:
This invention relates to newly identified Herpes Simplex Virus type 2 (HSV-2) polynucleotides, polypeptides encoded by such polynucleotides, the uses of such polynucleotides and polypeptides, as well as the production of such polynucleotides and polypeptides and recombinant host cells transformed with the polynucleotides. This invention also relates to inhibiting the biosynthesis or action of such polynucleotides or polypeptides and to the use of such inhibitors in therapy of viral infections or related diseases.
Background of the Invention:
The herpes viruses consist of large icosahedral enveloped virions containing linear double-stranded DNA genomes. Currently, eight human herpes viruses have been isolated and are known to be responsible for a variety of disease states, from sub-clinical infections to fatal disease in the immunocompromised. One human herpes virus, herpes simplex virus type 2, designated HSV-2, is usually acquired through sexual contact, giving rise to the condition known as genital herpes. The frequency of recurrence of secondary genital herpes ranges between one and six times per year per infected individual. It is estimated that genital HSV-2 infections occur in ten to sixty million individuals in the USA. Less frequently, HSV-2 infection results in herpes labialis, seen as cold sores. General information about HSV-2 may be found in various treatises such as Herpes Simplex Viruses, In: "Fields Virology", 3rd ed., Lippincott-Raven Publ., pp. 2297-2342 (1996); Magder, L.S., et al., New England J. Med. 321:7-12 (1989); and "The Human Herpesviruses", Roizman, B. et al., eds., Raven Press, New York (1993), the contents of which are incorporated herein by reference for purposes of background. Currently, there are no vaccines available to protect against HSV-2 infection.
Individuals continue to become infected by the virus and no completely satisfactory antiviral agents or vaccines are available. Thus HSV-2 presents a major public health problem. There is a need for prophylactic and therapeutic vaccines as well as a method of identifying anti-HSV-2 agents and for reagents useful in such methods. There is a need for a method of identifying compounds which modulate the activity of HSV-2 polynucleotides and proteins and which affect the ability of the virus to replicate and produce multiple infectious virions in an infected cell. There is a need for methods of, and kits for, distinguishing HSV-2 infections from other herpes virus infections.
Brief Description of the Invention:
Toward these ends, it is an object of the present invention to provide polypeptides, inter alia, that have been identified as novel HSV-2 polypeptides by comparison between the amino acid sequences set out in Tables 1-4 and known amino acid sequences of proteins of other viruses such as herpes simplex virus type-1 (HSV-1).
It is a further object of the invention to provide polynucleotides that encode HSV-2 proteins, particularly polynucleotides that encode the polypeptides encoded by the Open Reading Frames (ORFs) provided herein, or fragments, analogs or derivatives thereof.
In a particularly preferred embodiment of this aspect of the invention the polynucleotides comprise any of the regions encoding HSV-2 proteins in the sequences set out in Tables 1-4, including fragments, analogs or derivatives thereof.
In another particularly preferred embodiment of the present invention, there is a novel HSV-2 protein comprising any of the amino acid sequences shown in Table 1, or fragments, analogues or derivatives thereof. In accordance with the invention there is provided an isolated nucleic acid molecule encoding a polypeptide expressible by the HSV-2 polynucleotide contained in the deposited HSV-2 strain, SB5.
In accordance with the invention there are provided isolated nucleic acid molecules encoding HSV-2 proteins, nucleic acid molecules such as mRNAs, cDNAs, genomic DNAs and, in further embodiments of this aspect of the invention, biologically, diagnostically, clinically or therapeutically useful variants, analogs or derivatives thereof, or fragments thereof, including fragments of the variants, analogs and derivatives.
Among the particularly preferred embodiments of this aspect of the invention are naturally occurring allelic variants of HSV-2 proteins.
In accordance with this aspect of the invention there are provided novel polypeptides of HSV-2 origin as well as biologically, diagnostically or therapeutically useful fragments thereof, as well as variants, derivatives and analogs of the foregoing and fragments thereof.
In accordance with certain preferred embodiments of this and other aspects of the invention there are provided probes that hybridize to HSV-2 sequences and are useful for detection of viral infection.
It also is an object of the invention to provide HSV-2 polypeptides or fragments thereof that may be employed for therapeutic or prophylactic purposes, for example, to treat disease, including treatment by conferring host immunity against viral infections, or as an antiviral agent or a vaccine. In accordance with another aspect of the present invention, there is provided the use of a polynucleotide of the invention for therapeutic or prophylactic purposes, in particular genetic immunization.
Among the particularly preferred embodiments of this aspect of the invention are variants of HSV-2 polypeptides encoded by naturally occurring alleles of HSV-2 genes for therapeutic or prophylactic use.
It is another object of the invention to provide a process for producing the aforementioned polypeptides, polypeptide fragments, variants and derivatives, fragments of the variants and derivatives, and analogs thereof. In a preferred embodiment of this aspect of the invention there are provided methods for producing the aforementioned HSV-2 polypeptides comprising culturing host cells having expressibly incorporated therein an exogenously-derived HSV-2 encoding polynucleotide under conditions for expression of HSV-2 in the host and then recovering the expressed polypeptide. In accordance with another object of the invention there are provided products, compositions, processes and methods that utilize the aforementioned polypeptides and polynucleotides, inter alia, for research, biological, clinical, diagnostic, prophylactic and therapeutic purposes.
In accordance with yet another aspect of the present invention, there are provided inhibitors of such polypeptides, useful as antiviral agents. In particular, there are provided antibodies against such polypeptides. In certain particularly preferred embodiments in this regard, the antibodies are selective for HSV-2.
In a further aspect of the invention there are provided compositions comprising a HSV-2 polynucleotide or HSV-2 polypeptide for administration to cells in vitro, to cells ex vivo and to cells in vivo, or to a multicellular organism. In certain preferred embodiments of this aspect of the invention, the compositions comprise a HSV-2 polynucleotide for expression of a HSV-2 polypeptide in a host organism to raise an immunological response, preferably to raise immunity in such host against HSV-2 or related organisms.
Other objects, features, advantages and aspects of the present invention will become apparent to those of skill from the following description. It should be understood, however, that the following description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only. Various changes and modifications within the spirit and scope of the disclosed invention will become readily apparent to those skilled in the art from reading the following description and from reading the other parts of the present disclosure.
Detailed Description of the Invention:
Tables 1-3 show the nucleotide sequences of one strand of "contigs," prepared by assembling sequences derived by sequencing HSV-2, Strain SB5, DNA. Collectively, the contigs herein represent from 85% to over 90% of the genome of this organism. Each of Tables 1, 2 and 3 represents a separate sequencing of the HSV-2, SB5, DNA.
Tables 1-3 also show the nucleotide sequences of open reading frames (ORFs), which are deduced DNA coding sequences present within each contig. Tables 1-4 also show the deduced amino acid sequences of polypeptides encoded by these ORFs and sequence homologies to proteins in the NCBI non-redundant protein database. Each ORF represents an HSV-2 gene, although in some cases a given ORF may actually have been derived from a gene that is longer than the ORF.
Each of the DNA sequences provided herein may be used in the discovery and development of antiviral compounds. For sequences containing an open reading frame (ORF) with appropriate initiation and termination codons, the encoded protein upon expression can be used as a target for the screening of antiviral drugs. Additionally, the DNA sequences encoding preferably the amino terminal regions of the encoded protein, or regions immediately upstream therefrom, can be used to construct antisense sequences to control the expression of the coding sequence of interest. Furthermore, many of the sequences disclosed herein also provide regions upstream and downstream from the encoding sequence. These sequences are useful as a source of regulatory elements for the control of viral gene expression. Such sequences are conveniently isolated by restriction enzyme action or synthesized chemically and introduced, for example, into promoter identification strains. These strains contain a reporter structural gene sequence located downstream from a restriction site such that if an active promoter is inserted, the reporter gene will be expressed.
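The two uses described above, locating ORFs bounded by initiation and termination codons and deriving an antisense sequence from the region around the amino terminus, can be sketched as follows. This is a simplified illustration under stated assumptions, not the GenMark method used for the Tables: it recognizes only ATG starts and the three standard stop codons, the 100-codon minimum and the oligonucleotide length are hypothetical defaults, the function names are hypothetical, and coordinates are reported 1-based with start > end for the complementary strand, mirroring the Tables' convention.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, str]]:
    """Scan all six frames for ATG...stop ORFs of at least min_codons codons.

    Returns (start, end, strand) with 1-based inclusive coordinates on the
    input strand; minus-strand ORFs have start > end. Counts include the stop.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= n:
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                j = i + 3
                # Walk codon by codon until an in-frame stop codon is found.
                while j + 3 <= n and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > n:
                    break  # open-ended: no stop before the end of the sequence
                if (j + 3 - i) // 3 >= min_codons:
                    if strand == "+":
                        hits.append((i + 1, j + 3, "+"))
                    else:
                        # Map reverse-complement indices back to the input strand.
                        hits.append((n - i, n - j - 2, "-"))
                i = j + 3  # resume scanning after the stop codon
    return hits


def antisense_oligo(seq: str, start: int, length: int = 20, upstream: int = 0) -> str:
    """Antisense to `length` nt beginning `upstream` nt 5' of a plus-strand ORF start."""
    begin = max(0, start - 1 - upstream)
    return reverse_complement(seq.upper()[begin:begin + length])


toy = "CCATGAAAAAATAAGG"  # hypothetical sequence with one 4-codon ORF
print(find_orfs(toy, min_codons=3))                      # [(3, 14, '+')]
print(find_orfs(reverse_complement(toy), min_codons=3))  # [(14, 3, '-')]
print(antisense_oligo(toy, 3, length=6, upstream=2))     # TCATGG
```

Only the longest ORF from the first in-frame ATG is reported; nested downstream ATGs in the same frame are skipped, which is one simple policy among several.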
Although each of the sequences may be employed as described above, this invention also provides several means for identifying particularly useful target genes. The first of these approaches entails searching appropriate databases for sequence matches in related organisms. Thus, if a homologue exists, the HSV-2-like form of this gene would likely play an analogous role. For example, a HSV-2 protein identified as homologous to a cell surface protein in another organism would be useful as a vaccine candidate. To the extent such homologies have been identified for the sequences disclosed herein they are reported along with the encoding sequence in the Tables.
A number of methods can be used to identify genes which are essential to survival per se, or essential to the establishment/maintenance of an infection. Identification of an ORF of unknown function by one of these methods yields additional information about its function and permits the selection of such an ORF for further development as a screening target. Briefly, these approaches include: generation of temperature sensitive mutations (Weller, S.K., et al., Virology 130:290-305 (1983)); site specific insertion or deletion of a viral gene, a method based on selection of recombinant molecules generated by double recombination through homologous sequences between intact viral DNA molecules and a DNA fragment containing an insertion or deletion and a selectable marker (Post, L.E., et al., Cell 25:227-32 (1981)); and insertional mutagenesis using transposons, a method taking advantage of the random insertion of the DNA phage miniMu into target plasmid DNAs (Jenkins, F.J., et al., Proc. Nat'l. Acad. Sci. USA 82:4773-4777 (1985)). Each of these techniques may have advantages or disadvantages depending on the particular application. The skilled artisan would choose the approach that is most relevant to the particular end use in mind. For example, some genes might be recognised as essential for infection but in reality are only necessary for the initiation of infection, and so their products would represent relatively unattractive targets for antivirals developed to cure established and chronic infections.
Use of these technologies when applied to the ORFs of the present invention enables identification of viral proteins expressed during infection, inhibitors of which would have utility in antiviral therapy.
Glossary:
The following explanations are provided to facilitate understanding of certain terms used frequently herein, particularly in the Examples. The explanations are provided as a convenience and are not limitative of the invention.
HSV-2 BINDING MOLECULE, as used herein, refers to molecules or ions which bind or interact specifically with HSV-2 polypeptides or polynucleotides of the present invention, including, for example, enzyme substrates, cell membrane components and classical receptors. Binding between polypeptides of the invention and such molecules, including binding or interaction molecules, may be exclusive to polypeptides of the invention, which is preferred, or it may be highly specific for polypeptides of the invention, which is also preferred, or it may be highly specific to a group of proteins that includes polypeptides of the invention, which is preferred, or it may be specific to several groups of proteins at least one of which includes a polypeptide of the invention. Binding molecules also include antibodies and antibody-derived reagents that bind specifically to polypeptides of the invention.
GENETIC ELEMENT generally means a polynucleotide comprising a region that is important to the viral life cycle, a polynucleotide comprising a region that encodes a polypeptide or a polynucleotide region that regulates replication, transcription or translation or other processes important to expression of the polypeptide in a host cell, or a polynucleotide comprising both a region that encodes a polypeptide and a region operably linked thereto that regulates expression. Genetic elements may be comprised within a vector that replicates as an episomal element; that is, as a molecule physically independent of the host cell genome. They may be comprised within plasmids. Genetic elements also may be comprised within a host cell genome; not in their natural state but, rather, following manipulation such as isolation, cloning and introduction into a host cell in the form of purified DNA or in a vector, among others.
HOST CELL is a cell which has been transformed or transfected, or is capable of transformation or transfection by an exogenous polynucleotide sequence.
IDENTITY, as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, "identity" also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. "Identity" and "similarity" can be readily calculated by known methods, including but not limited to those described in (Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carrillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988)). Preferred methods to determine identity are designed to give the largest match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Preferred computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S.F., et al., J. Molec. Biol. 215: 403-410 (1990)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, MD 20894; Altschul, S., et al., J. Mol. Biol. 215: 403-410 (1990)).
As an illustration, by a polynucleotide having a nucleotide sequence having at least, for example, 95% "identity" to a reference nucleotide sequence it is intended that the nucleotide sequence of the tested polynucleotide is identical to the reference sequence except that the polynucleotide sequence may include up to five point mutations per each 100 nucleotides of the reference nucleotide sequence. In other words, to obtain a polynucleotide having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence. These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence. Analogously, by a polypeptide having an amino acid sequence having at least, for example, 95% identity to a reference amino acid sequence it is intended that the test amino acid sequence of the polypeptide is identical to the reference sequence except that the polypeptide sequence may include up to five amino acid alterations per each 100 amino acids of the reference amino acid sequence. In other words, to obtain a polypeptide having an amino acid sequence at least 95% identical to a reference amino acid sequence, up to 5% of the amino acid residues in the reference sequence may be deleted or substituted with another amino acid, or a number of amino acids up to 5% of the total amino acid residues in the reference sequence may be inserted into the reference sequence.
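As a rough illustration of how a program can "give the largest match" before identities are counted, the sketch below aligns two sequences with a basic Needleman-Wunsch global alignment and reports percent identity relative to the reference length, consistent with the per-100-residues-of-reference wording used in this Glossary. The scoring values (match +1, mismatch -1, gap -2) and the function names are hypothetical; GCG, BLAST and FASTA use their own scoring schemes and may report identity over a different denominator.

```python
from __future__ import annotations


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment; returns gapped copies of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned positions as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    aq, ar = global_align(query, reference)
    matches = sum(1 for x, y in zip(aq, ar) if x == y and x != "-")
    return 100.0 * matches / len(reference)


ref = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt reference
print(percent_identity("ACGTTCGTACGTACGTACGT", ref))  # 95.0 (one substitution)
print(percent_identity("ACGTACGTACGTACGTACG", ref))   # 95.0 (one deletion)
```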
These alterations of the reference sequence may occur at the amino or carboxy terminal positions of the reference amino acid sequence or anywhere between those terminal positions, interspersed either individually among residues in the reference sequence or in one or more contiguous groups within the reference sequence.
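The arithmetic of the 95% illustration above generalizes directly: the number of permitted alterations is the reference length multiplied by (100 - identity)/100, rounded down. A minimal sketch, with hypothetical function names, using exact fractions so that thresholds such as 99.9% are not distorted by floating-point rounding:

```python
import math
from fractions import Fraction


def max_alterations(reference_length: int, identity_percent: float) -> int:
    """Largest number of substitutions, deletions or insertions allowed for a
    sequence to stay at least identity_percent identical to a reference of
    reference_length nucleotides or amino acids."""
    allowed = Fraction(reference_length) * (100 - Fraction(str(identity_percent))) / 100
    return math.floor(allowed)


def meets_identity(reference_length: int, alterations: int,
                   identity_percent: float) -> bool:
    """True if the given number of alterations stays within the threshold."""
    return alterations <= max_alterations(reference_length, identity_percent)


print(max_alterations(100, 95))     # 5, the "five per 100" case above
print(max_alterations(1203, 95))    # 60
print(meets_identity(100, 6, 95))   # False
```

For example, a 1,203-nt ORF such as ORF # 7 from Contig 101 (positions 7017 to 5815) tolerates up to 60 alterations at 95% identity.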
ISOLATED means altered "by the hand of man" from its natural state; i.e., that, if it occurs in nature, it has been changed or removed from its original environment, or both. For example, a naturally occurring polynucleotide or a polypeptide naturally present in a living organism in its natural state is not "isolated," but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is "isolated", as the term is employed herein. For example, with respect to polynucleotides, the term isolated means that it is separated from the genome and cell in which it naturally occurs. As part of or following isolation, such polynucleotides can be joined to other polynucleotides, such as DNAs, for mutagenesis, to form fusion proteins, and for propagation or expression in a host, for instance. The isolated polynucleotides, alone or joined to other polynucleotides such as vectors, can be introduced into host cells, in culture or in whole organisms. Introduced into host cells in culture or in whole organisms, such DNAs still would be isolated, as the term is used herein, because they would not be in their naturally occurring form or environment. Similarly, the polynucleotides and polypeptides may occur in a composition, such as media formulations, solutions for introduction of polynucleotides or polypeptides, for example, into cells, compositions or solutions for chemical or enzymatic reactions, for instance, which are not naturally occurring compositions, and, therein remain isolated polynucleotides or polypeptides within the meaning of that term as it is employed herein.
POLYNUCLEOTIDE(S) generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as used herein refers to, among others, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions or single-, double- and triple-stranded regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded, or triple-stranded, or a mixture of single- and double-stranded regions. In addition, polynucleotide as used herein refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands in such regions may be from the same molecule or from different molecules. The regions may include all of one or more of the molecules, but more typically involve only a region of some of the molecules. One of the molecules of a triple-helical region often is an oligonucleotide. As used herein, the term polynucleotide includes DNAs or RNAs as described above that contain one or more modified bases. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are "polynucleotides" as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art. The term polynucleotide as it is employed herein embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including inter alia, simple and complex cells.
The term polynucleotide(s) embraces short polynucleotides often referred to as oligonucleotides.
POLYPEPTIDES, as used herein, includes all polypeptides as described below. The basic structure of polypeptides is well known and has been described in innumerable textbooks and other publications in the art. In this context, the term is used herein to refer to any peptide or protein comprising two or more amino acids joined to each other in a linear chain by peptide bonds. As used herein, the term refers to both short chains, which also commonly are referred to in the art as peptides, oligopeptides and oligomers, for example, and to longer chains, which generally are referred to in the art as proteins, of which there are many types. It will be appreciated that polypeptides often contain amino acids other than the 20 amino acids commonly referred to as the 20 naturally occurring amino acids, and that many amino acids, including the terminal amino acids, may be modified in a given polypeptide, either by natural processes, such as processing and other post-translational modifications, or by chemical modification techniques which are well known to the art. Even the common modifications that occur naturally in polypeptides are too numerous to list exhaustively here, but they are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature, and they are well known to those of skill in the art.
Among the known modifications which may be present in polypeptides of the present invention are, to name an illustrative few, acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cystine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination. Such modifications are well known to those of skill in the art and have been described in great detail in the scientific literature. Several particularly common modifications, glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation, for instance, are described in most basic texts, such as PROTEINS - STRUCTURE AND MOLECULAR PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993). Many detailed reviews are available on this subject, such as those provided by Wold, F., Posttranslational Protein Modifications: Perspectives and Prospects, pgs. 1-12 in POSTTRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C. Johnson, Ed., Academic Press, New York (1983); Seifter et al., Meth. Enzymol. 182:626-646 (1990); and Rattan et al., Protein Synthesis: Posttranslational Modifications and Aging, Ann. N.Y. Acad. Sci. 663: 48-62 (1992). It will be appreciated, as is well known and as noted above, that polypeptides are not always entirely linear.
For instance, polypeptides may be branched as a result of ubiquitination, and they may be circular, with or without branching, generally as a result of post-translational events, including natural processing events and events brought about by human manipulation which do not occur naturally. Circular, branched and branched circular polypeptides may also be synthesized by non-translational natural processes and by entirely synthetic methods. Modifications can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. In fact, blockage of the amino or carboxyl group in a polypeptide, or both, by a covalent modification is common in naturally occurring and synthetic polypeptides, and such modifications may be present in polypeptides of the present invention, as well. For instance, the amino terminal residue of polypeptides made in E. coli or other cells, prior to proteolytic processing, almost invariably will be N-formylmethionine. During post-translational modification of the peptide, a methionine residue at the NH2-terminus may be deleted. Accordingly, this invention contemplates the use of both the methionine-containing and the methionineless amino terminal variants of the protein of the invention. The modifications that occur in a polypeptide often will be a function of how it is made. For polypeptides made by expressing a cloned gene in a host, for instance, the nature and extent of the modifications in large part will be determined by the host cell's post-translational modification capacity and the modification signals present in the polypeptide amino acid sequence. For instance, as is well known, glycosylation often does not occur in bacterial hosts such as E. coli. Accordingly, when glycosylation is desired, a polypeptide should be expressed in a glycosylating host, generally a eukaryotic cell.
Insect cells often carry out the same posttranslational glycosylations as do mammalian cells and, for this reason, insect cell expression systems have been developed to express efficiently mammalian proteins having native patterns of glycosylation. Similar considerations apply to other modifications. It will be appreciated that the same type of modification may be present in the same or varying degree at several sites in a given polypeptide. Also, a given polypeptide may contain many types of modifications. In general, as used herein, the term polypeptide encompasses all such modifications, particularly those that are present in polypeptides synthesized by expressing a polynucleotide in a host cell.
VARIANT(S) as the term is used herein, is a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide respectively, but retains essential properties. A typical variant of a polynucleotide differs in nucleotide sequence from another, reference polynucleotide. Changes in the nucleotide sequence of the variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as discussed below. A typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. Generally, differences are limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical. A variant and reference polypeptide may differ in amino acid sequence by one or more substitutions, additions and deletions in any combination. A substituted or inserted amino acid residue may or may not be one encoded by the genetic code. A variant of a polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans.
Deposit:
HSV-2, strain SB5 has been deposited at the American Type Culture Collection under accession number VR-2546 on October 31, 1996.
The deposits referred to herein will be maintained under the terms of the Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for Purposes of Patent Procedure. These deposits are provided merely as a convenience to those of skill in the art and are not an admission that a deposit is required under 35 U.S.C. § 112. The sequence of the polynucleotides contained in the deposited material, as well as the amino acid sequence of the polypeptides encoded thereby, are incorporated herein by reference and are controlling in the event of any conflict with any description of sequences herein. A license may be required to make, use or sell the deposited material, and no such license is hereby granted.
Viral Strain and Genome:
The nucleotide sequences disclosed herein can be obtained by synthetic chemical techniques known in the art or can be obtained from HSV-2, strain SB5 by probing a DNA preparation with probes constructed from the particular sequences disclosed herein. Alternatively, oligonucleotides derived from a disclosed sequence can act as PCR primers in a process of PCR-based cloning of the sequence from a viral genomic source. It is recognised that such sequences will also have utility in diagnosis of the stage of infection and type of infection the pathogen has attained.
The present invention relates to novel HSV-2 polypeptides and polynucleotides encoding them, among other things, as described in greater detail below. The invention relates especially to HSV-2 molecules having the nucleotide and amino acid sequences set out in Tables 1-4 and to the HSV-2 nucleotide and amino acid sequences of the DNA isolatable from Deposit No. ATCC VR-2546, which is herein referred to as "the deposited organism" or as the "DNA of the deposited organism." It will be appreciated that the nucleotide and amino acid sequences set out in Tables 1-4 were obtained by sequencing the DNA of the deposited organism. Hence, the sequence of the deposited clone is controlling as to any discrepancies between it (and the sequence it encodes) and the sequences of the Tables.
The present invention also relates to additional polynucleotide sequences disclosed herein, which are RNAs transcribed from the DNAs disclosed herein but which may or may not be translated into protein. Such polynucleotides are known in HSV-1 and other herpesviruses.
Polynucleotides
In accordance with one aspect of the present invention, there are provided isolated polynucleotides which encode HSV-2 polypeptides having the deduced amino acid sequence of Tables 1-4. It is preferred that these polynucleotides be one of those set forth in Tables 1, 2 or 3. The skilled artisan can readily determine the polynucleotide sequence of such preferred polynucleotides by reference to the ORF start and stop positions set forth in Tables 1-4.
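Reading a coding sequence out of a genome by its ORF start and stop positions comes down to careful coordinate handling. The Python sketch below shows one way to do this. It assumes 1-based, inclusive coordinates with start ≤ stop on the strand being read, but this section does not state the Tables' coordinate convention, so treat that as an assumption. `extract_orf` and the toy sequence are hypothetical.

```python
def extract_orf(genome: str, start: int, stop: int) -> str:
    """Return the ORF spanning start..stop (1-based, inclusive) of genome.

    Hypothetical helper: assumes start <= stop, i.e. the ORF is read on the
    strand that `genome` represents.
    """
    if not 1 <= start <= stop <= len(genome):
        raise ValueError(f"coordinates {start}..{stop} fall outside 1..{len(genome)}")
    # 1-based inclusive [start, stop] maps to the Python slice [start - 1, stop).
    return genome[start - 1:stop].upper()


# Toy input for illustration only; this is not an HSV-2 sequence.
toy_genome = "ccATGAAATTTTAGgg"
print(extract_orf(toy_genome, 3, 14))  # ATGAAATTTTAG
```

Converting to a half-open slice in a single place avoids off-by-one errors when positions are copied from a table.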
Using the information provided herein, such as the polynucleotide sequences set out in Tables 1-3, a polynucleotide of the present invention encoding an HSV-2 polypeptide may be obtained using standard cloning and screening procedures. To obtain the polynucleotide encoding the protein using the DNA sequences given in Tables 1-3, a library of clones of chromosomal DNA of HSV-2 strain SB5 in E. coli or some other suitable host is typically probed with a radiolabelled oligonucleotide, preferably a 17mer or longer, derived from a sequence of Tables 1-3. Clones carrying DNA identical to that of the probe can then be distinguished using high stringency washes. By sequencing the individual clones thus identified with sequencing primers designed from the original sequence, it is then possible to extend the sequence in both directions to determine the full gene sequence. Conveniently, such sequencing is performed using denatured double-stranded DNA prepared from a plasmid clone. Suitable techniques are described by Maniatis, T., Fritsch, E.F. and Sambrook, J. in MOLECULAR CLONING, A Laboratory Manual (2nd edition, 1989, Cold Spring Harbor Laboratory; see Screening By Hybridization 1.90 and Sequencing Denatured Double-Stranded DNA Templates 13.70).
The DNA sequences set out in Tables 1, 2 and 3 each contain at least one open reading frame encoding a protein having at least about the number of amino acid residues set forth in Tables 1-3. The start and stop codons of each open reading frame are the first three and the last three nucleotides of each polynucleotide set forth in Tables 1, 2 and 3. Certain HSV-2 sequences of the invention are structurally related to sequences encoding other proteins of the herpes family, as shown by comparing the sequences of the Tables with sequences reported in the literature. Moreover, certain polynucleotides and polypeptides of the invention are structurally related to known proteins. Among the known proteins, these proteins exhibit greatest homology to the homologue listed in Tables 1, 2, 3 and 4. The invention provides a polynucleotide sequence identical over its entire length to each coding sequence in Tables 1-3. The invention also provides the coding sequence for the mature polypeptide or a fragment thereof, by itself, as well as the coding sequence for the mature polypeptide or a fragment in reading frame with other coding sequences, such as those encoding a leader or secretory sequence, or a pre-, pro- or prepro-protein sequence. The polynucleotide may also contain non-coding sequences, including, for example, but not limited to, non-coding 5' and 3' sequences, such as the transcribed, non-translated sequences, termination signals, ribosome binding sites, sequences that stabilize mRNA, introns and polyadenylation signals, as well as additional coding sequences that encode additional amino acids. For example, a marker sequence that facilitates purification of the fused polypeptide can be encoded. In certain embodiments of the invention, the marker sequence is a hexa-histidine peptide, as provided in the pQE vector (Qiagen, Inc.) and described in Gentz et al., Proc. Natl. Acad. Sci. USA 86: 821-824 (1989), or an HA tag (Wilson et al., Cell 37: 767 (1984)).
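Because each listed ORF runs from its start codon to its stop codon, it can be checked and translated computationally. The Python sketch below does this using the standard genetic code. It assumes an ATG start codon, which this section does not name, so treat that as an assumption. `translate_orf` is a hypothetical helper, and the input is a toy sequence. The residue count is the number of codons minus the stop codon.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate_orf(orf: str) -> str:
    """Check ORF structure and return its protein sequence (stop not included)."""
    orf = orf.upper()
    if len(orf) < 6 or len(orf) % 3 != 0:
        raise ValueError("ORF must be at least two codons long and a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if codons[0] != "ATG":  # assumption: ATG start codon
        raise ValueError(f"first codon is {codons[0]}, not ATG")
    if codons[-1] not in STOP_CODONS:
        raise ValueError(f"last codon {codons[-1]} is not a stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("in-frame stop codon before the end of the ORF")
    # Characters other than A, C, G, T raise KeyError in this lookup.
    return "".join(CODON_TABLE[c] for c in codons[:-1])


protein = translate_orf("ATGAAATTTTAG")
print(protein, len(protein))  # MKF 3
```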
Polynucleotides of the invention also include, but are not limited to, polynucleotides comprising a structural gene and its naturally associated sequences that control gene expression. The invention also includes polynucleotides of the formula:
X-(R1)m-(R2)-(R3)n-Y wherein, at the 5' end of the molecule, X is hydrogen, and at the 3' end of the molecule, Y is hydrogen or a metal, R1 and R3 are any nucleic acid residue, n and/or m is an integer between 1 and 3000 or zero, and R2 is a nucleic acid sequence of the invention, particularly a nucleic acid sequence selected from the group set forth in Tables 1, 2 and 3, as well as an ORF sequence selected from the group set forth in Tables 1, 2, 3 and 4 (as indicated by the reading frame numbering). In the polynucleotide formula above, R2 is oriented so that its 5' end residue is at the left, bound to R1, and its 3' end residue is at the right, bound to R3. Any stretch of nucleic acid residues denoted by either R group, where n and/or m is greater than 1, may be either a heteropolymer or a homopolymer, preferably a heteropolymer. In a preferred embodiment n and/or m is an integer between 1 and 1000, or 2000 or 3000.
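The formula fixes only the length of each flank, because R1 and R3 may each be any residue. So testing whether a sequence has the form (R1)m-(R2)-(R3)n reduces to locating R2 and measuring what lies on either side of it. The Python sketch below does this. It assumes the terminal groups X and Y carry no letter in the sequence string. `fits_formula` and the toy core sequence are hypothetical.

```python
def fits_formula(molecule: str, r2: str, max_flank: int = 3000) -> bool:
    """Return True if molecule reads as (R1)m-(R2)-(R3)n with 0 <= m, n <= max_flank.

    Only sequence residues are modelled: the terminal groups X (hydrogen) and
    Y (hydrogen or a metal) have no letter in the string. Because R1 and R3 may
    be any residue, only the flank lengths m and n are constrained.
    """
    if not r2:
        raise ValueError("R2 must be a non-empty sequence")
    molecule, r2 = molecule.upper(), r2.upper()
    start = molecule.find(r2)
    while start != -1:  # try every occurrence of R2, including overlapping ones
        m = start                            # residues 5' of R2
        n = len(molecule) - start - len(r2)  # residues 3' of R2
        if m <= max_flank and n <= max_flank:
            return True
        start = molecule.find(r2, start + 1)
    return False


core = "ATGAAATTTTAG"  # toy stand-in for an R2 sequence
print(fits_formula("GG" + core + "C", core))  # True  (m = 2, n = 1)
print(fits_formula("A" * 3001 + core, core))  # False (m = 3001)
```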
The term "polynucleotide encoding a polypeptide" as used herein encompasses polynucleotides that include a sequence encoding a polypeptide of the invention, particularly a viral polypeptide and more particularly a polypeptide of the HSV-2 having an amino acid sequence set out in Table 1, 2, 3 or 4. The term also encompasses polynucleotides that include a single continuous region or discontinuous regions encoding the polypeptide (for example, interrupted by integrated phage or an insertion sequence or editing) together with additional regions, that also may contain coding and/or non-coding sequences.
The invention further relates to variants of the polynucleotides described herein that encode variants of the polypeptides having the deduced amino acid sequences of Tables 1, 2, 3 and 4. Variants that are fragments of the polynucleotides of the invention may be used to synthesize full-length polynucleotides of the invention.
Polynucleotides of the present invention may be in the form of RNA, such as mRNA, or in the form of DNA, including, for instance, cDNA and genomic DNA obtained by cloning or produced by chemical synthetic techniques or by a combination thereof. The DNA may be double-stranded or single-stranded. Single-stranded DNA may be the coding strand, also known as the sense strand, or it may be the non-coding strand, also referred to as the antisense strand.
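A short Python sketch makes the relationship between the coding (sense) strand, the non-coding (antisense) strand and the RNA form concrete. The function names are hypothetical, and the toy sequence does not come from the Tables.

```python
# Watson-Crick pairing; N (any base) pairs with N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def antisense_strand(coding: str) -> str:
    """Return the non-coding (antisense) strand, written 5'->3'."""
    return coding.upper().translate(COMPLEMENT)[::-1]


def transcript(coding: str) -> str:
    """The mRNA matches the coding (sense) strand, with U in place of T."""
    return coding.upper().replace("T", "U")


sense = "ATGAAATTTTAG"
print(antisense_strand(sense))  # CTAAAATTTCAT
print(transcript(sense))        # AUGAAAUUUUAG
```

Taking the antisense strand of the antisense strand returns the original coding strand. This property is a convenient sanity check.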
The coding sequence which encodes the polypeptide may be identical to the coding sequence of the polynucleotide shown in Tables 1-4. It also may be a polynucleotide with a different sequence, which, as a result of the redundancy (degeneracy) of the genetic code, encodes the polypeptides of Tables 1-4.
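Because the genetic code is redundant, many different coding sequences translate to the same polypeptide. The Python sketch below shows this by enumerating the synonymous codons for each residue under the standard genetic code. The helper names are hypothetical. The stop codon is left out of the enumeration.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codons(aa):
    """All codons that the standard code assigns to one-letter amino acid aa."""
    return sorted(codon for codon, a in CODON_TABLE.items() if a == aa)


def coding_sequences(peptide):
    """Yield every DNA sequence (stop codon omitted) that encodes peptide."""
    choices = [synonymous_codons(aa) for aa in peptide.upper()]
    for combo in product(*choices):
        yield "".join(combo)


def count_coding_sequences(peptide):
    """Number of distinct coding sequences: the product of per-residue codon counts."""
    return prod(len(synonymous_codons(aa)) for aa in peptide.upper())


print(sorted(coding_sequences("MKF")))
# ['ATGAAATTC', 'ATGAAATTT', 'ATGAAGTTC', 'ATGAAGTTT']
print(count_coding_sequences("MKF"))  # 4
```

The count grows multiplicatively with length. For realistic polypeptides, use `count_coding_sequences` and keep the full enumeration for short peptides only.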
Particularly preferred embodiments are polynucleotides encoding polypeptide variants that have the amino acid sequence of a polypeptide of Tables 1, 2, 3 and/or 4 in which several, a few, 5 to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues are substituted, deleted or added, in any combination. Especially preferred among these are silent substitutions, additions and deletions that do not alter the properties and activities of such polynucleotide.
Further preferred embodiments of the invention are polynucleotides that are at least 50%, 60% or 70% identical over their entire length to a polynucleotide encoding a polypeptide having the amino acid sequence set out in Tables 1, 2, 3 or 4, and polynucleotides that are complementary to such polynucleotides. Alternatively, most highly preferred are polynucleotides that comprise a region that is at least 80% identical over its entire length to a polynucleotide encoding a polypeptide of the deposited strain, and polynucleotides complementary thereto. In this regard, polynucleotides at least 90% identical over their entire length to the same are particularly preferred, and among these particularly preferred polynucleotides, those with at least 95% identity are especially preferred. Furthermore, those with at least 97% identity are highly preferred among those with at least 95%, and among these, those with at least 98% and at least 99% identity are particularly highly preferred, with at least 99% being the most preferred.
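This section does not say how percent identity "over the entire length" is computed. One common approach is to build a global alignment and divide the number of identical aligned positions by the alignment length. The Python sketch below follows that approach with a textbook Needleman-Wunsch alignment. Its scores (match +1, mismatch -1, gap -2) are arbitrary. Both the scoring scheme and the choice of denominator are assumptions, not the method used in this document. The helper names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of the alignment length."""
    if not a and not b:
        raise ValueError("at least one sequence must be non-empty")
    top, bottom = global_align(a.upper(), b.upper())
    # A column never holds two gaps, so '-' == '-' cannot inflate the count.
    identical = sum(x == y for x, y in zip(top, bottom))
    return 100.0 * identical / len(top)


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
print(percent_identity("ACGT", "ACG"))               # 75.0
```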
A preferred embodiment is an isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of: a polynucleotide having at least a 50% identity to a polynucleotide encoding a polypeptide comprising the amino acid sequence of Tables 1, 2, 3, or 4 and obtained from a prokaryotic species other than HSV-2; and a polynucleotide encoding a polypeptide comprising an amino acid sequence which is at least 50% identical to the amino acid sequence of Tables 1, 2, 3 or 4 and obtained from a prokaryotic species other than HSV-2. Preferred embodiments are polynucleotides that encode polypeptides that retain substantially the same biological function or activity as the mature polypeptide encoded by the DNA of Tables 1, 2, 3 or 4. The invention further relates to polynucleotides that hybridize to the hereinabove-described sequences. In this regard, the invention especially relates to polynucleotides that hybridize under stringent conditions to the hereinabove-described polynucleotides. As used herein, the terms "stringent conditions" and "stringent hybridization conditions" mean that hybridization will occur only if there is at least 95%, and preferably at least 97%, identity between the sequences. An example of stringent hybridization conditions is overnight incubation at 42°C in a solution comprising 50% formamide, 5x SSC (1x SSC is 150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20 micrograms/ml denatured, sheared salmon sperm DNA, followed by washing the hybridization support in 0.1x SSC at about 65°C. Hybridization and wash conditions are well known and exemplified in Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y. (1989), particularly Chapter 11 therein.
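By the standard definition, 1x SSC is 150 mM NaCl and 15 mM trisodium citrate. The hybridization solution above uses 5x SSC and the wash uses 0.1x SSC. The short Python calculation below makes the working concentrations explicit. It assumes dilution from a 20x SSC stock, which is a common laboratory stock but is not specified here. The helper names are hypothetical.

```python
# Standard definition of 1x SSC, in millimolar.
ONE_X_SSC_MM = {"NaCl": 150.0, "trisodium citrate": 15.0}


def ssc_composition(strength):
    """Millimolar concentration of each SSC component at the given strength (5 for 5x)."""
    return {salt: mm * strength for salt, mm in ONE_X_SSC_MM.items()}


def stock_volume_ml(final_strength, final_volume_ml, stock_strength=20.0):
    """Volume of concentrated SSC stock needed, from C1 * V1 = C2 * V2."""
    return final_strength * final_volume_ml / stock_strength


print(ssc_composition(5))        # {'NaCl': 750.0, 'trisodium citrate': 75.0}
print(stock_volume_ml(5, 1000))  # 250.0 (mL of 20x stock per litre of 5x SSC)
```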
The invention also provides a polynucleotide consisting essentially of a polynucleotide sequence obtainable by screening an appropriate library containing the complete gene for a polynucleotide sequence set forth in Tables 1, 2, 3 or 4 under stringent hybridization conditions with a probe having the sequence of said polynucleotide sequence or a fragment thereof; and isolating said DNA sequence. Fragments useful for obtaining such a polynucleotide include, for example, probes and primers described elsewhere herein.
As discussed additionally herein regarding polynucleotide assays of the invention, for instance, polynucleotides of the invention as discussed above, may be used as a hybridization probe for RNA, cDNA and genomic DNA to isolate full-length cDNAs and genomic clones encoding a polypeptide and to isolate cDNA and genomic clones of other genes that have a high sequence similarity to a polynucleotide set forth in Table 1, 2, 3 or 4. Such probes generally will comprise at least 15 bases. Preferably, such probes will have at least 30 bases and may have at least 50 bases. Particularly preferred probes will have at least 30 bases and will have 50 bases or less.
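One simple way to pick candidate probes from a target sequence is to slide a window of the preferred length along it. The Python sketch below does this. The 15-base minimum and the 50-base upper end follow the preferences stated above. The step size and the GC-content window are illustrative assumptions and are not taken from this document. The helper names and the toy target are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_probes(target, length=30, step=10, gc_range=(0.4, 0.6)):
    """Slide a window along target and keep windows whose GC fraction lies in gc_range.

    Returns (1-based start, probe) pairs. step and gc_range are assumptions.
    """
    if not 15 <= length <= 50:
        raise ValueError("probe length should be between 15 and 50 bases")
    low, high = gc_range
    probes = []
    for start in range(0, len(target) - length + 1, step):
        probe = target[start:start + length].upper()
        if low <= gc_fraction(probe) <= high:
            probes.append((start + 1, probe))
    return probes


toy_target = "ACGT" * 20  # 80 nt toy sequence, 50% GC in every window
print(len(tile_probes(toy_target)))  # 6
```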
For example, the coding region of each gene that comprises or is comprised by a polynucleotide set forth in Table 1, 2, 3 or 4 may be isolated by screening, using a DNA sequence provided in Table 1, 2, 3 or 4 to synthesize an oligonucleotide probe. A labeled oligonucleotide having a sequence complementary to that of a gene of the invention is then used to screen a library of cDNA, genomic DNA or mRNA to determine which members of the library the probe hybridizes to. Polynucleotides of the invention that are oligonucleotides derived from a polynucleotide or polypeptide sequence set forth in Table 1, 2, 3 or 4 may be used in the processes described herein, but preferably for PCR, to determine whether or not the polynucleotides identified herein are transcribed, in whole or in part, by the virus in infected tissue. It is recognized that such sequences will also have utility in diagnosis of the stage of infection and type of infection the pathogen has attained.
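An in silico analogue of the library screen checks which clones contain a site that the labeled probe could base-pair with, on either strand of the insert. The Python sketch below requires a perfect match. That is a simplifying assumption, since real hybridization tolerates mismatches depending on stringency. The library contents and helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def screen_library(library, probe):
    """Names of clones with a perfect base-pairing site for probe on either strand.

    library maps clone name -> one strand (5'->3') of a double-stranded insert.
    """
    probe = probe.upper()
    target = reverse_complement(probe)  # the sequence the probe pairs with
    hits = []
    for name, insert in library.items():
        insert = insert.upper()
        # target on the given strand, or the probe's own sequence on the given
        # strand (meaning the target lies on the complementary strand).
        if target in insert or probe in insert:
            hits.append(name)
    return sorted(hits)


gene = "ATGAAATTTTAG"
probe = reverse_complement(gene)  # labeled probe complementary to the gene
library = {
    "clone1": "GGG" + gene + "CCC",
    "clone2": "T" * 20,
    "clone3": reverse_complement("GGG" + gene + "CCC"),  # insert cloned in reverse
}
print(screen_library(library, probe))  # ['clone1', 'clone3']
```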
The invention also provides polynucleotides that may encode a polypeptide that is the mature protein plus additional amino or carboxyl-terminal amino acids, or amino acids interior to the mature polypeptide (when the mature form has more than one polypeptide chain, for instance). Such sequences may play a role in processing of a protein from precursor to a mature form, may allow protein transport, may lengthen or shorten protein half-life or may facilitate manipulation of a protein for assay or production, among other things. As generally is the case in vivo, the additional amino acids may be processed away from the mature protein by cellular enzymes. A precursor protein, having the mature form of the polypeptide fused to one or more prosequences may be an inactive form of the polypeptide. When prosequences are removed such inactive precursors generally are activated. Some or all of the prosequences may be removed before activation. Generally, such precursors are called proproteins.
The DNA may also comprise a promoter region which functions to direct the transcription of the mRNA encoding the HSV-2 polypeptides of this invention. Such promoters may be independently useful to direct the transcription of heterologous genes in recombinant expression systems. Polyadenylation and splicing signal sequences are also present in the polynucleotide sequence and may be useful as gene expression signals in heterologous gene expression vectors and constructs. The polynucleotides and polypeptides of the invention may be employed, for example, as research reagents and materials for discovery of treatments of and diagnostics for disease, particularly human disease, as further discussed herein relating to polynucleotide assays.
The polynucleotides of the invention that are oligonucleotides may also be used as nucleic acid amplification primers, such as PCR primers, in the process herein described to determine whether or not the HSV-2 genes identified herein in whole or in part are present or transcribed in infected tissue. It is recognized that such sequences will also have utility in diagnosis of the stage of infection and type of infection the pathogen has attained.
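Before a primer pair is used on infected tissue, its expected product can be predicted from the sequence. The forward primer matches the given strand. The reverse primer anneals to the opposite strand, so its reverse complement must appear downstream. The Python sketch below uses exact matches and first occurrences only, which are simplifying assumptions. The helper names and the toy template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon(template, forward, reverse):
    """Return the product expected from a primer pair, or None if either primer fails to match."""
    template, forward = template.upper(), forward.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)  # where the reverse primer anneals
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


template = "TTTT" + "ATGAAATTTTAG" + "GGGG"  # toy template
amplicon = predicted_amplicon(template, "ATGAAA", "CTAAAA")
print(amplicon, len(amplicon))  # ATGAAATTTTAG 12
```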
In addition to the uses mentioned above for the polynucleotides of this invention, the following applications are also contemplated by this invention. Inter alia, the polynucleotides disclosed herein, or portions thereof, may be used as probes to discover mRNA transcripts synthesized during productive and latent HSV-2 infections, for example by Northern blot, nuclease protection, and primer extension experiments. Novel transcripts in turn can lead to the discovery of new HSV-2 proteins not deducible from the genome sequences directly. The sequences, or portions thereof, may be used to discover antisense inhibitors of virus replication and novel therapeutics based on antisense mechanisms. The sequences, or portions thereof, may be used to prepare novel gene therapy vectors. The sequences, or portions thereof, may be used as a basis for the generation of DNA- or RNA-containing oligonucleotides designed to form a triplex with duplex DNA, for use as analytical tools, diagnostics or therapeutics. Nucleic acid sequences, or portions thereof, can be used to generate cell lines useful for diagnostics or screening. The DNA sequences can be used to predict restriction enzyme sites useful for replacing the gene in the viral genome with a marker gene such as lacZ or green fluorescent protein. Such a replacement is useful in defining the biological role of the gene in the viral life cycle. These gene knockout experiments are useful to discover genes which are likely to be high-quality drug discovery targets (essential genes) or good locations for foreign genes for the purposes of gene therapy (non-essential genes) through an HSV-2 viral vector. Such gene replacements are also useful for discovering virulence factors, for example by comparing the pathogenicity of the modified virus with the unmodified virus, or through the ease of identifying a marker gene such as lacZ.
In addition to the standard A, G, C, T and U representations for nucleic acid bases, the term "N" is also used. "N" means that any of the four DNA or RNA bases may appear at such a designated position in the DNA or RNA sequence, except that it is preferred that N is not a base that, when taken in combination with adjacent nucleotide positions and read in the correct reading frame, would generate a premature termination codon in that reading frame.
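Predicting restriction sites from a DNA sequence amounts to searching for each enzyme's recognition motif. The Python sketch below uses three well-known enzymes purely for illustration: EcoRI (GAATTC), BamHI (GGATCC) and HindIII (AAGCTT). Their sites are palindromic, so scanning one strand finds every site. For a marker-gene replacement, enzymes that cut only once are typically preferred. The helper names and the toy sequence are hypothetical.

```python
# Illustrative enzymes only; their recognition sites are palindromic.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(seq, sites=SITES):
    """Map each enzyme to the 1-based start positions of its recognition site."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        i = seq.find(motif)
        while i != -1:  # overlapping occurrences are found too
            positions.append(i + 1)
            i = seq.find(motif, i + 1)
        hits[enzyme] = positions
    return hits


def unique_cutters(seq, sites=SITES):
    """Enzymes whose site occurs exactly once in seq."""
    return sorted(e for e, positions in find_sites(seq, sites).items() if len(positions) == 1)


toy = "AAGAATTCTTTGGATCCTTGGATCCAA"
print(find_sites(toy))      # {'EcoRI': [3], 'BamHI': [12, 20], 'HindIII': []}
print(unique_cutters(toy))  # ['EcoRI']
```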
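The preference that N should not create a premature in-frame stop codon can be checked mechanically. The check applies to every codon containing N except the final codon, which may legitimately be the stop. For each such codon, enumerate the possible bases at the N positions and discard any filling that yields TAA, TAG or TGA. The Python sketch below assumes the standard stop codons and reading frame 0 by default. `safe_fillings` is a hypothetical name.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def safe_fillings(seq, frame=0):
    """Map each codon containing N (by 1-based codon start) to its non-stop fillings.

    The final codon is skipped because it may be the natural stop codon.
    """
    seq = seq.upper()
    starts = list(range(frame, len(seq) - 2, 3))  # starts of complete codons
    result = {}
    for s in starts[:-1]:
        codon = seq[s:s + 3]
        slots = [i for i, base in enumerate(codon) if base == "N"]
        if not slots:
            continue
        safe = []
        # Try every combination of A, C, G, T at the N positions of this codon.
        for bases in product("ACGT", repeat=len(slots)):
            filled = list(codon)
            for i, base in zip(slots, bases):
                filled[i] = base
            candidate = "".join(filled)
            if candidate not in STOP_CODONS:
                safe.append(candidate)
        result[s + 1] = safe
    return result


print(safe_fillings("ATGTNAAAATAG"))  # {4: ['TCA', 'TTA']}
```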
In sum, a polynucleotide of the invention may encode a mature protein; a mature protein plus a leader sequence (which may be referred to as a preprotein); a precursor of a mature protein having one or more prosequences that are not the leader sequences of a preprotein; or a preproprotein, which is a precursor to a proprotein, having a leader sequence and one or more prosequences, which generally are removed during processing steps that produce active and mature forms of the polypeptide.
Polypeptides
The present invention further relates to HSV-2 polypeptides that have the deduced amino acid sequences set out in Tables 1-4. The invention also relates to fragments, analogs and derivatives of these polypeptides. The terms "fragment," "derivative" and "analog," when referring to the polypeptides of the invention, mean a polypeptide which retains essentially the same biological function or activity as such polypeptide. Fragments, derivatives and analogs that retain at least 90% of the biological activity of the native HSV-2 protein are preferred, and those that retain at least 95% of the activity of the native HSV-2 protein are more preferred. Thus, an analog includes a proprotein which can be activated by cleavage of the proprotein portion to produce an active mature polypeptide.
The polypeptide of the present invention may be a recombinant polypeptide, a natural polypeptide or a synthetic polypeptide. In certain preferred embodiments it is a recombinant polypeptide.
The fragment, derivative or analog of the polypeptides of the invention may be (i) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue), and such substituted amino acid residue may or may not be one encoded by the genetic code; (ii) one in which one or more of the amino acid residues includes a substituent group; (iii) one in which the mature polypeptide is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol); or (iv) one in which additional amino acids are fused to the mature polypeptide, such as a leader or secretory sequence, a sequence which is employed for purification of the mature polypeptide, or a proprotein sequence. Such fragments, derivatives and analogs are deemed to be within the reach of those of ordinary skill in the art from the teachings herein.
Among preferred variants are those that vary from a reference by conservative amino acid substitutions. Such substitutions are those that substitute a given amino acid in a polypeptide by another amino acid of like characteristics. Typically seen as conservative substitutions are the replacements, one for another, among the aliphatic amino acids Ala, Val, Leu and Ile; interchange of the hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and Glu; substitution between the amide residues Asn and Gln; exchange of the basic residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr. Further particularly preferred in this regard are variants, analogs, derivatives and fragments having the amino acid sequence of one or more of the HSV-2 polypeptides of the invention, in which several, a few, 5 to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues are substituted, deleted or added, in any combination. Especially preferred among these are silent substitutions, additions and deletions, which do not alter the properties and activities of the HSV-2 protein. Also especially preferred in this regard are conservative substitutions. Most highly preferred are polypeptides having the amino acid sequences of Tables 1-4 without substitutions. The invention also includes polypeptides of the formula:
X-(R1)n-(R2)-(R3)m-Y wherein, at the amino terminus, X is hydrogen, and at the carboxyl terminus, Y is hydrogen or a metal, R1 and R3 are any amino acid residue, n and/or m is an integer between 1 and 2000 or zero, and R2 is an amino acid sequence of the invention, particularly an amino acid sequence selected from the group set forth in Tables 1, 2, 3 and 4. In the formula above, R2 is oriented so that its amino terminal residue is at the left, bound to R1, and its carboxy terminal residue is at the right, bound to R3. Any stretch of amino acid residues denoted by either R group, where n and/or m is greater than 1, may be either a heteropolymer or a homopolymer, preferably a heteropolymer. In preferred embodiments n and/or m is an integer between 1 and 1000 or 2000. The polypeptides and polynucleotides of the present invention are preferably provided in an isolated form, and preferably are purified to homogeneity. The polypeptides of the present invention include the polypeptides of Tables 1-4, in particular the mature polypeptide, as well as polypeptides which have at least 60%, 70% or 80% identity to one or more of the polypeptides of Tables 1-4, preferably at least 90% similarity, more preferably at least 95% similarity, and still more preferably at least 95% identity to one or more of the polypeptides of Tables 1-4. They also include portions of such polypeptides, with such portions generally containing at least 30 contiguous amino acids and more preferably at least 50 contiguous amino acids.
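The conservative groups listed above (Ala/Val/Leu/Ile, Ser/Thr, Asp/Glu, Asn/Gln, Lys/Arg, Phe/Tyr) can be applied directly when comparing a variant with its reference. The Python sketch below classifies each difference between two ungapped sequences of equal length. It reports percent identity and a "similarity" that also counts conservative substitutions. Defining similarity this way is an assumption made for illustration, since alignment programs usually score similarity with a substitution matrix instead. The names and toy sequences are hypothetical.

```python
# Conservative-substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [set("AVLI"), set("ST"), set("DE"), set("NQ"), set("KR"), set("FY")]


def is_conservative(a, b):
    """True if residues a and b differ but fall in the same group."""
    return a != b and any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(reference, variant):
    """List (1-based position, reference residue, variant residue, kind) per difference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch compares ungapped sequences of equal length")
    differences = []
    for pos, (r, v) in enumerate(zip(reference.upper(), variant.upper()), start=1):
        if r != v:
            kind = "conservative" if is_conservative(r, v) else "non-conservative"
            differences.append((pos, r, v, kind))
    return differences


def identity_and_similarity(reference, variant):
    """Percent identical positions, and percent identical-or-conservative positions."""
    if not reference:
        raise ValueError("sequences must be non-empty")
    differences = classify_substitutions(reference, variant)
    identical = len(reference) - len(differences)
    conservative = sum(1 for d in differences if d[3] == "conservative")
    return (100.0 * identical / len(reference),
            100.0 * (identical + conservative) / len(reference))


print(classify_substitutions("MKDLSF", "MRDVAF"))
# [(2, 'K', 'R', 'conservative'), (4, 'L', 'V', 'conservative'), (5, 'S', 'A', 'non-conservative')]
print("%.1f %.1f" % identity_and_similarity("MKDLSF", "MRDVAF"))  # 50.0 83.3
```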
In addition to the uses mentioned above for the polypeptides of this invention, the following applications are also contemplated by this invention. Inter alia, the polypeptides disclosed herein, or portions thereof, which have enzymatic activity or structural functionality are useful as a source of those proteins for screening and/or therapy. Such polypeptides may be identified by homology, for example, to HSV-1 polypeptides with known function (e.g., helicases, kinases, proteases). Use of polypeptides of the invention for screening or therapy based upon functionality predicted by homology match is a particularly preferred aspect of this invention. Also, the polypeptides derived from the deposited strain ATCC VR-2546 can be used for comparison with sequences from other HSV-2 strains in the public domain. For example, comparison of the polypeptides of the invention with strain HG52 may be useful in the discovery of virulence factors, since HG52 is avirulent in mouse and guinea pig infection models whereas HSV-2 SB5 is virulent. Similarly, public domain homologs from strain MS may be useful in the discovery of virulence factors, since there are major differences in CNS pathogenesis in animal models between strains MS and SB5.
In addition to the standard single- and three-letter representations for amino acids, the term "X" or "Xaa" is also used. "X" and "Xaa" mean that any of the twenty naturally occurring amino acids may appear at such a designated position in the polypeptide sequence.
Fragments
Fragments or portions of the polypeptides of the present invention may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides. Fragments or portions of the polynucleotides of the present invention may be used to synthesize full-length polynucleotides of the present invention.
Also among preferred embodiments of this aspect of the present invention are polypeptides comprising fragments of HSV-2, most particularly fragments of HSV-2 having the amino acid sequences set out in Tables 1-4, and variants and derivatives thereof. In this regard, a fragment is a polypeptide having an amino acid sequence that is entirely the same as part, but not all, of the amino acid sequence of the aforementioned HSV-2 polypeptides and variants or derivatives thereof.
Such fragments may be "free-standing," i.e., not part of or fused to other amino acids or polypeptides, or they may be comprised within a larger polypeptide of which they form a part or region. When comprised within a larger polypeptide, the presently discussed fragments most preferably form a single continuous region. However, several fragments may be comprised within a single larger polypeptide. For instance, certain preferred embodiments relate to a fragment of an HSV-2 polypeptide of the present invention comprised within a precursor polypeptide designed for expression in a host and having heterologous pre- and pro-polypeptide regions fused to the amino terminus of the HSV-2 fragment and an additional region fused to the carboxyl terminus of the fragment. Therefore, in one aspect of the meaning intended herein, "fragment" refers to the portion or portions of a fusion polypeptide or fusion protein derived from HSV-2.
Representative examples of polypeptide fragments of the invention include, for example, those that are about 5-15, 10-20, 15-40, 30-55, 41-75, 41-80, 41-90, 50-100, 75-100, 90-115, 100-125, 110-140, 120-150, 200-300, 1-175, 1-600 or 1-1000 amino acids long. Particular examples of polypeptide fragments of the invention that may be mentioned include fragments of 20-200 amino acids.
In this context, "about" includes the particularly recited range and ranges larger or smaller by several, a few, 5, 4, 3, 2 or 1 amino acids at either extreme or at both extremes. Among especially preferred fragments of the invention are truncation mutants of HSV-2. Truncation mutants include HSV-2 polypeptides having the amino acid sequences of Tables 1-4, or of variants or derivatives thereof, except for deletion of a continuous series of residues (that is, a continuous region, part or portion) that includes the amino terminus, or a continuous series of residues that includes the carboxyl terminus or, as in double truncation mutants, deletion of two continuous series of residues, one including the amino terminus and one including the carboxyl terminus. Fragments having the size ranges set out above also are preferred embodiments of truncation fragments, which are especially preferred among fragments generally. Degradation forms of the polypeptides of the invention in a host cell are also preferred. Also preferred in this aspect of the invention are fragments characterized by structural or functional attributes of HSV-2. Preferred embodiments of the invention in this regard include fragments that comprise alpha-helix and alpha-helix-forming regions ("alpha-regions"), beta-sheet and beta-sheet-forming regions ("beta-regions"), turn and turn-forming regions ("turn-regions"), coil and coil-forming regions ("coil-regions"), hydrophilic regions, hydrophobic regions, alpha amphipathic regions, beta amphipathic regions, flexible regions, surface-forming regions and high antigenic index regions of HSV-2.
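The fragment size ranges, the "about" tolerance and the truncation mutants described above can be enumerated mechanically. The sketch below is illustrative only. The function names, the 1-based numbering and the `slack` parameter (standing in for "about") are assumptions, and a real choice of candidates would also weigh the structural or functional features listed above.

```python
from typing import Iterator, Tuple


def fragments(seq: str, min_len: int, max_len: int, slack: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, fragment) for every contiguous fragment whose length lies in
    [min_len - slack, max_len + slack]; `slack` is a hypothetical stand-in for "about".

    The full-length sequence is excluded: a fragment is part, but not all, of it.
    """
    lo = max(1, min_len - slack)
    hi = min(max_len + slack, len(seq) - 1)
    for length in range(lo, hi + 1):
        for start in range(len(seq) - length + 1):
            yield start + 1, seq[start:start + length]


def truncation_mutants(seq: str, max_n_trim: int, max_c_trim: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (n_trim, c_trim, mutant): n_trim residues removed from the amino terminus
    and c_trim from the carboxyl terminus. Both > 0 gives a double truncation mutant.

    The untruncated sequence and empty results are skipped.
    """
    for n in range(max_n_trim + 1):
        for c in range(max_c_trim + 1):
            if (n == 0 and c == 0) or n + c >= len(seq):
                continue
            yield n, c, seq[n:len(seq) - c]


if __name__ == "__main__":
    print(list(fragments("MKTAY", 4, 10)))
    print(list(truncation_mutants("MKTAY", 1, 1)))
```

For a full-length polypeptide, a call such as `fragments(seq, 20, 200)` would cover the 20-200 residue examples mentioned above; the number of candidates grows quickly with sequence length, so in practice the output would be filtered rather than stored in full.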
Further preferred regions are those that mediate activities of HSV-2. Most highly preferred in this regard are fragments that have a chemical, biological or other activity of the particular HSV-2 protein, including those with a similar activity or an improved activity, or with a decreased undesirable activity. Routinely, one generates the fragment by well-known methods and then compares its activity with that of the native protein in a convenient assay, such as those listed hereinbelow. Highly preferred in this regard are fragments that contain regions that are homologs in sequence, in position, or in both sequence and position, to active regions of related polypeptides, such as the related polypeptides set out in Table 1. Among particularly preferred fragments in these regards are truncation mutants, as discussed above. Further preferred polypeptide fragments are those that are antigenic or immunogenic in an animal, especially in a human.
It will be appreciated that the invention also relates to, among others, polynucleotides encoding the aforementioned fragments, polynucleotides that hybridize to polynucleotides encoding the fragments, particularly those that hybridize under stringent conditions, and polynucleotides, such as PCR primers, for amplifying polynucleotides that encode the fragments. In these regards, preferred polynucleotides are those that correspond to the preferred fragments, as discussed above.
Vectors, host cells, expression:
The present invention also relates to vectors that comprise a polynucleotide or polynucleotides of the present invention, host cells that are genetically engineered with vectors of the invention, and the production of polypeptides of the invention by recombinant techniques. Host cells can be genetically engineered to incorporate polynucleotides and express polypeptides of the present invention. Introduction of polynucleotides into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, transvection, microinjection, cationic lipid-mediated transfection, electroporation, transduction, scrape loading, ballistic introduction, infection or other methods. Such methods are described in many standard laboratory manuals, such as Davis et al., BASIC METHODS IN MOLECULAR BIOLOGY (1986) and Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989).
Polynucleotide constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Alternatively, the polypeptides of the invention can be produced synthetically using conventional peptide synthesizers.
Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989).
In accordance with this aspect of the invention, the vector may be, for example, a plasmid vector, a single- or double-stranded phage vector, or a single- or double-stranded RNA or DNA viral vector. Plasmids generally are designated herein by a lower case p preceded and/or followed by capital letters and/or numbers, in accordance with standard naming conventions that are familiar to those of skill in the art. Starting plasmids disclosed herein are either commercially available, publicly available, or can be constructed from available plasmids by routine application of well-known, published procedures. Many plasmids and other cloning and expression vectors that can be used in accordance with the present invention are well known and readily available to those of skill in the art.
Preferred among vectors, in certain respects, are those for expression of polynucleotides and polypeptides of the present invention. Generally, such vectors comprise cis-acting control regions effective for expression in a host, operatively linked to the polynucleotide to be expressed. Appropriate trans-acting factors are supplied by the host, by a complementing vector or by the vector itself upon introduction into the host. In certain preferred embodiments in this regard, the vectors provide for specific expression. Such specific expression may be inducible expression, expression only in certain types of cells, or both inducible and cell-specific. Particularly preferred among inducible vectors are vectors that can be induced for expression by environmental factors that are easy to manipulate, such as temperature and nutrient additives. A variety of vectors suitable to this aspect of the invention, including constitutive and inducible expression vectors for use in prokaryotic and eukaryotic hosts, are well known and employed routinely by those of skill in the art.
A great variety of expression vectors can be used to express a polypeptide of the invention. Such vectors include, among others, chromosomal, episomal and virus-derived vectors, e.g., vectors derived from viral plasmids, from bacteriophage, from transposons, from yeast episomes, from insertion elements, from yeast chromosomal elements, and from viruses such as baculoviruses, papova viruses such as SV40, vaccinia viruses, adenoviruses, fowl pox viruses, pseudorabies viruses and retroviruses, as well as vectors derived from combinations thereof, such as those derived from plasmid and bacteriophage genetic elements (e.g., cosmids and phagemids); all may be used for expression in accordance with this aspect of the present invention. Generally, any vector suitable to maintain, propagate or express polynucleotides to express a polypeptide in a host may be used for expression in this regard.
The appropriate DNA sequence may be inserted into the vector by any of a variety of well-known and routine techniques, such as, for example, those set forth in Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989).
The DNA sequence in the expression vector is operatively linked to appropriate expression control sequence(s), including, for instance, a promoter to direct mRNA transcription. Representatives of such promoters include, but are not limited to, the phage lambda PL promoter, the E. coli lac, trp and tac promoters, the SV40 early and late promoters and promoters of retroviral LTRs.
In general, expression constructs will contain sites for transcription initiation and termination and, in some instances, a ribosome binding site for translation in the transcribed region. The coding portion of the mature transcripts expressed by the constructs will include a translation-initiating AUG at the beginning and a termination codon appropriately positioned at the end of the polypeptide to be translated.
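The requirement that the coding portion begin with a translation-initiating AUG and end with an appropriately positioned termination codon can be checked in silico before a construct is built. A minimal sketch, assuming the standard genetic code and ignoring alternative start codons (such as GTG in some bacteria); it accepts DNA or RNA letters, reading U as T, and the function name is hypothetical.

```python
# Termination codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(cds: str) -> bool:
    """Check that a coding sequence starts with ATG (AUG), ends with a stop codon,
    has a length that is a multiple of three, and has no internal in-frame stop."""
    cds = cds.upper().replace("U", "T")  # accept mRNA-style input
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop before the last codon would truncate the polypeptide.
    return not any(c in STOP_CODONS for c in codons[1:-1])


if __name__ == "__main__":
    print(is_complete_orf("ATGAAATAA"), is_complete_orf("ATGTAAAAATAG"))
```

This checks only the reading frame itself; promoter, ribosome binding site and terminator placement have to be verified separately.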
In addition, the constructs may contain control regions that regulate as well as engender expression. Generally, in accordance with many commonly practiced procedures, such regions will operate by controlling transcription, for example through transcription factors, repressor binding sites and termination signals, among others.
Vectors for propagation and expression generally will include selectable markers and amplification regions, such as, for example, those set forth in Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989).
Representative examples of appropriate hosts include bacterial cells, such as streptococci, staphylococci, E. coli, streptomyces and Bacillus subtilis cells; fungal cells, such as yeast cells and Aspergillus cells; insect cells such as Drosophila S2 and Spodoptera Sf9 cells; animal cells such as CHO, COS, HeLa, C127, 3T3, BHK, 293 and Bowes melanoma cells; and plant cells.
The following vectors, which are commercially available, are provided by way of example. Among vectors preferred for use in bacteria are pQE70, pQE60 and pQE-9, available from Qiagen; pBS vectors, Phagescript vectors, Bluescript vectors, pNH8A, pNH16a, pNH18A and pNH46A, available from Stratagene; ptrc99a, pKK223-3, pKK233-3, pDR540 and pRIT5, available from Pharmacia; and pBR322 (ATCC 37017). Among preferred eukaryotic vectors are pWLNEO, pSV2CAT, pOG44, pXT1 and pSG, available from Stratagene; and pSVK3, pBPV, pMSG and pSVL, available from Pharmacia. These vectors are listed solely by way of illustration of the many commercially available and well-known vectors that are available to those of skill in the art for use in accordance with this aspect of the present invention. It will be appreciated that any other plasmid or vector suitable for, for example, introduction, maintenance, propagation or expression of a polynucleotide or polypeptide of the invention in a host may be used in this aspect of the invention.
Promoter regions can be selected from any desired gene using vectors that contain a reporter transcription unit lacking a promoter region, such as a chloramphenicol acetyl transferase ("CAT") transcription unit, downstream of a restriction site or sites for introducing a candidate promoter fragment, i.e., a fragment that may contain a promoter. As is well known, introduction into the vector of a promoter-containing fragment at the restriction site upstream of the cat gene engenders production of CAT activity, which can be detected by standard CAT assays. Vectors suitable to this end are well known and readily available, such as pKK232-8 and pCM7. Promoters for expression of polynucleotides of the present invention include not only well-known and readily available promoters, but also promoters that may readily be obtained by the foregoing technique, using a reporter gene.
Among known prokaryotic promoters suitable for expression of polynucleotides and polypeptides in accordance with the present invention are the E. coli lacI and lacZ promoters, the T3 and T7 promoters, the gpt promoter, the lambda PR and PL promoters and the trp promoter.
Among known eukaryotic promoters suitable in this regard are the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, the promoters of retroviral LTRs, such as those of the Rous sarcoma virus ("RSV"), and metallothionein promoters, such as the mouse metallothionein-I promoter.
Recombinant expression vectors will include, for example, origins of replication, a promoter preferably derived from a highly expressed or regulatable gene to direct transcription of a downstream structural sequence, and a selectable marker to permit isolation of vector-containing cells after exposure to the vector.
Polynucleotides of the invention encoding the heterologous structural sequence of a polypeptide of the invention generally will be inserted into the vector using standard techniques so that they are operably linked to the promoter for expression. The polynucleotide will be positioned so that the transcription start site is located appropriately 5' to the AUG that initiates translation of the polypeptide to be expressed. Where applicable, a ribosome binding site may be located between the transcription start site and the initiating AUG. Generally, there will be no other open reading frames that begin with an initiation codon, usually AUG, and lie between the ribosome binding site (where applicable) or the 5' end of the transcript and the initiation codon. Also, generally, there will be a translation stop codon at the end of the polypeptide, and there will be a polyadenylation signal in constructs for use in eukaryotic hosts. A transcription termination signal appropriately disposed at the 3' end of the transcribed region may also be included in the polynucleotide construct.
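Two of the layout rules above, no stray initiation codon between the 5' end (or ribosome binding site) and the intended AUG, and a polyadenylation signal for eukaryotic hosts, lend themselves to a quick sequence check. The sketch below is deliberately conservative: it flags every upstream ATG in any frame rather than only those that open a complete upstream reading frame, and it assumes the canonical AATAAA polyadenylation signal. Function names are hypothetical.

```python
from typing import List


def upstream_atgs(leader: str) -> List[int]:
    """Return 0-based positions of every ATG in the 5' leader, i.e. the sequence between
    the transcript 5' end (or ribosome binding site) and the intended initiation codon.
    RNA input is accepted (U read as T)."""
    leader = leader.upper().replace("U", "T")
    return [i for i in range(len(leader) - 2) if leader[i:i + 3] == "ATG"]


def has_polya_signal(three_prime_region: str, signal: str = "AATAAA") -> bool:
    """True if the 3' region contains the (assumed canonical) polyadenylation signal."""
    return signal in three_prime_region.upper().replace("U", "T")


if __name__ == "__main__":
    print(upstream_atgs("GGATGCCATG"))      # stray ATGs a designer would remove
    print(has_polya_signal("GCAATAAAGC"))
```

An empty list from `upstream_atgs` satisfies the "no other open reading frames" condition under this stricter reading; a non-empty list points at positions worth mutating or re-checking.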
For secretion of the translated protein into the lumen of the endoplasmic reticulum, into the periplasmic space or into the extracellular environment, appropriate secretion signals may be incorporated into the expressed polypeptide. These signals may be endogenous to the polypeptide or they may be heterologous signals.
The polypeptide may be expressed in a modified form, such as a fusion protein, and may include not only secretion signals but also additional heterologous functional regions. Thus, for instance, a region of additional amino acids, particularly charged amino acids, may be added to the N- or C-terminus of the polypeptide to improve stability and persistence in the host cell, during purification or during subsequent handling and storage. Also, a region may be added to the polypeptide to facilitate purification. Such regions may be removed prior to final preparation of the polypeptide. The addition of peptide moieties to polypeptides to engender secretion or excretion, to improve stability or to facilitate purification, among others, is a familiar and routine technique in the art. A preferred fusion protein comprises a heterologous region from immunoglobulin that is useful to solubilize or purify polypeptides. For example, EP-A-0 464 533 (Canadian counterpart 2045869) discloses fusion proteins comprising various portions of the constant region of immunoglobulin molecules together with another protein or part thereof. In drug discovery, for example, proteins have been fused with antibody Fc portions for the purpose of high-throughput screening assays to identify antagonists. See D. Bennett et al., Journal of Molecular Recognition, 8: 52-58 (1995) and K. Johanson et al., The Journal of Biological Chemistry, 270(16): 9459-9471 (1995).
Cells typically are then harvested by centrifugation and disrupted by physical or chemical means, and the resulting crude extract is retained for further purification. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents; such methods are well known to those skilled in the art.
Mammalian expression vectors may comprise an origin of replication, a suitable promoter and enhancer, and also any necessary polyadenylation sites, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences that are necessary for expression. In certain preferred embodiments in this regard DNA sequences derived from the SV40 splice sites, and the SV40 polyadenylation sites are used for required non-transcribed genetic elements of these types.
HSV-2 polypeptides can be recovered and purified from recombinant cell cultures by well-known methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin chromatography. Most preferably, high performance liquid chromatography ("HPLC") is employed for purification. Well-known techniques for refolding protein may be employed to regenerate active conformation when the polypeptide is denatured during isolation and/or purification.
Polypeptides of the present invention include naturally purified products, products of chemical synthetic procedures, and products produced by recombinant techniques from a prokaryotic or eukaryotic host, including, for example, viral, yeast, higher plant, insect and mammalian cells. Depending upon the host employed in a recombinant production procedure, the polypeptides of the present invention may be glycosylated or may be non-glycosylated. In addition, polypeptides of the invention may also include an initial modified methionine residue, in some cases as a result of host-mediated processes.
HSV-2 polynucleotides and polypeptides may be used in accordance with the present invention for a variety of applications, particularly those that make use of the chemical and biological properties of HSV-2. Additional applications relate to diagnosis and to treatment of disorders of cells, tissues and organisms. These aspects of the invention are illustrated further by the following discussion.
Polynucleotide assays:
This invention also relates to the use of the HSV-2 polynucleotides to detect complementary polynucleotides, for example, as a diagnostic reagent. Detection of HSV-2 polynucleotides in a eukaryote, particularly a mammal, and especially a human, will provide a diagnostic method that can add to, define or allow a diagnosis of a disease. Eukaryotes (herein also "individual(s)"), particularly mammals, and especially humans, infected by HSV-2 may be detected at the DNA or RNA level by a variety of techniques. Nucleic acids for diagnosis may be obtained from an individual's cells, tissues and fluids, such as brain, bone, blood, muscle, cartilage, skin, saliva, urine, semen and mucus. Tissue biopsy and autopsy material are also preferred as samples from an individual for use in a diagnostic assay. The viral DNA may be used directly for detection or may be amplified enzymatically by PCR prior to analysis (Saiki et al., Nature 324: 163-166 (1986)). RNA or cDNA may also be used in the same ways. As an example, PCR primers complementary to the nucleic acid encoding HSV-2 can be used to identify and analyze HSV-2 presence and expression. Using PCR, the strain of virus present in a eukaryote, particularly a mammal, and especially a human, may be characterized by analysis of the genotype of the viral gene. For example, deletions and insertions can be detected by a change in size of the amplified product in comparison to the genotype of a reference sequence. Point mutations can be identified by hybridizing amplified DNA to radiolabeled HSV-2 RNA or, alternatively, radiolabeled HSV-2 antisense DNA sequences. Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures. Sequence differences between a reference gene and genes having mutations also may be revealed by direct DNA sequencing.
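Detecting insertions or deletions from a change in amplicon size can be previewed by predicting the product length for a given primer pair. A minimal in-silico PCR sketch, assuming perfect-match primers, a linear template written 5' to 3' on the top strand, and a reverse primer given 5' to 3' on the bottom strand. The sequences used below are invented and the function names hypothetical; real amplification also depends on primer mismatches, secondary structure and gel resolution.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ACGT sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Predicted product length for perfect-match primers on a linear template.

    fwd is read 5'->3' on the top strand; rev is read 5'->3' on the bottom strand,
    so its binding site appears on the top strand as its reverse complement.
    Returns None if either site is missing or the sites are in the wrong order.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    reference = "TTACGTACGGGCCCTTGACCAA"
    deletion = "TTACGTACCCCTTGACCAA"  # same region with GGG removed
    for name, tpl in (("reference", reference), ("deletion", deletion)):
        print(name, amplicon_length(tpl, "ACGTAC", "GGTCAA"))
```

With the toy templates, the reference yields an 18 bp product and the deletion variant a 15 bp product; the 3 bp shortfall is the kind of size change the paragraph above describes.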
In addition, cloned DNA segments may be employed as probes to detect specific DNA segments. The sensitivity of such methods can be greatly enhanced by appropriate use of PCR or another amplification method. For example, a sequencing primer is used with double-stranded PCR product or a single-stranded template molecule generated by a modified PCR. The sequence determination is performed by conventional procedures with radiolabeled nucleotides or by automatic sequencing procedures with fluorescent tags.
Genetic typing of various strains of virus based on DNA sequence differences may be achieved by detection of alterations in the electrophoretic mobility of DNA fragments in gels, with or without denaturing agents. Small sequence deletions and insertions can be visualized by high-resolution gel electrophoresis. DNA fragments of different sequences may be distinguished on denaturing formamide gradient gels, in which the mobilities of different DNA fragments are retarded at different positions in the gel according to their specific melting or partial melting temperatures (see, e.g., Myers et al., Science 230: 1242 (1985)). Sequence changes at specific locations also may be revealed by nuclease protection assays, such as RNase and S1 protection, or by the chemical cleavage method (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA 85: 4397-4401 (1988)).
Thus, the detection of a specific DNA sequence may be achieved by methods such as hybridization, RNase protection, chemical cleavage, direct DNA sequencing or the use of restriction enzymes (e.g., restriction fragment length polymorphisms ("RFLP")) and Southern blotting of genomic DNA.
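For the restriction-enzyme (RFLP) approach, a polymorphism that creates or destroys a recognition site changes the fragment pattern. The sketch below predicts fragment lengths for a linear sequence. It assumes a palindromic recognition site, so that a top-strand search is sufficient, such as EcoRI's GAATTC, which is cut after the first G. The function name and example sequences are hypothetical.

```python
from typing import List


def digest_fragment_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths from cutting a linear sequence at every occurrence of a
    recognition site (top-strand search; assumes a palindromic site).

    cut_offset is the cut position within the site, e.g. 1 for EcoRI (G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    allele_1 = "AAAGAATTCAAAAGAATTCAA"
    allele_2 = "AAAGAATTCAAAAGAATTGAA"  # second EcoRI site destroyed by a C->G change
    print(digest_fragment_lengths(allele_1, "GAATTC", 1))
    print(digest_fragment_lengths(allele_2, "GAATTC", 1))
```

The two toy alleles give three fragments (4, 10 and 7 bp) and two fragments (4 and 17 bp) respectively, which is the kind of pattern difference a Southern blot or gel would reveal.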
In addition to more conventional gel-electrophoresis and DNA sequencing, mutations also can be detected by in situ analysis.
Cells carrying mutations or polymorphisms in the gene of the present invention may also be detected at the DNA level by a variety of techniques, to allow for serotyping, for example. Nucleic acids for diagnosis may be obtained from an infected individual's cells, including but not limited to blood, urine, saliva, tissue biopsy and autopsy material, or from virus isolated and cultured from the above or other sources. The viral DNA may be used directly for detection or may be amplified enzymatically by PCR (Saiki et al., Nature 324: 163-166 (1986)) prior to analysis. RT-PCR can also be used to detect mutations. It is particularly preferred to use RT-PCR in conjunction with automated detection systems, such as, for example, GeneScan. RNA or cDNA may also be used for the same purpose, by PCR or RT-PCR. As an example, PCR primers complementary to the nucleic acid encoding HSV-2 can be used to identify and analyze mutations. For example, deletions and insertions can be detected by a change in size of the amplified product in comparison to the normal genotype. Point mutations can be identified by hybridizing amplified DNA to radiolabeled RNA or, alternatively, radiolabeled antisense DNA sequences. Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures. The primers may be used to amplify the gene isolated from the individual such that the gene may then be subjected to various techniques for elucidation of the DNA sequence. In this way, mutations in the DNA sequence may be detected.
Polypeptide assays:
The present invention also relates to diagnostic assays, such as quantitative and diagnostic assays, for detecting levels of HSV-2 protein in cells and tissues, including determination of normal and abnormal levels. Thus, for instance, a diagnostic assay in accordance with the invention for detecting expression of HSV-2 protein compared to normal control tissue samples may be used to detect the presence of an infection. Assay techniques that can be used to determine levels of a protein, such as an HSV-2 protein of the present invention, in a sample derived from a host are well known to those of skill in the art. Such assay methods include radioimmunoassays, competitive-binding assays, Western blot analysis and ELISA assays. Among these, ELISAs frequently are preferred. An ELISA assay initially comprises preparing an antibody specific to HSV-2, preferably a monoclonal antibody. In addition, a reporter antibody generally is prepared which binds to the monoclonal antibody. The reporter antibody is attached to a detectable reagent such as a radioactive, fluorescent or enzymatic reagent, in this example horseradish peroxidase enzyme.
Antibodies:
The polypeptides, their fragments or other derivatives, or analogs thereof, or cells expressing them, can be used as an immunogen to produce antibodies thereto. The present invention includes, for example, monoclonal and polyclonal antibodies, chimeric, single chain, and humanized antibodies, as well as Fab fragments, or the product of an Fab expression library. Antibodies generated against the polypeptides corresponding to a sequence of the present invention can be obtained by direct injection of the polypeptides into an animal or by administering the polypeptides to an animal, preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies binding the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from tissue expressing that polypeptide.
For preparation of monoclonal antibodies, any technique known in the art that provides antibodies produced by continuous cell line cultures can be used. Examples include various techniques, such as those in Kohler, G. and Milstein, C., Nature 256: 495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); and Cole et al., pg. 77-96 in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc. (1985).
Techniques described for the production of single chain antibodies (U.S. Patent No. 4,946,778) can be adapted to produce single chain antibodies to immunogenic polypeptide products of this invention. Also, transgenic mice, or other organisms such as other mammals, may be used to express humanized antibodies to immunogenic polypeptide products of this invention.
Alternatively, phage display technology could be utilized to select antibody genes with binding activities towards the polypeptide, either from repertoires of PCR-amplified v-genes of lymphocytes from humans screened for possessing anti-Fbp or from naive libraries (McCafferty, J. et al., Nature 348: 552-554 (1990); Marks, J. et al., Biotechnology 10: 779-783 (1992)). The affinity of these antibodies can also be improved by chain shuffling (Clackson, T. et al., Nature 352: 624-628 (1991)).
If two antigen-binding domains are present, each domain may be directed against a different epitope; such antibodies are termed 'bispecific' antibodies.
The above-described antibodies may be employed to isolate or to identify clones expressing the polypeptide or purify the polypeptide of the present invention by attachment of the antibody to a solid support for isolation and/or purification by affinity chromatography.
Thus, among others, antibodies against HSV-2 may be employed to inhibit and/or treat infections, particularly viral infections, and especially HSV-2 infections, as well as to monitor the effectiveness of antibiotic treatment.
Polypeptide derivatives include antigenically, epitopically or immunologically equivalent derivatives which form a particular aspect of this invention. The term "antigenically equivalent derivative" as used herein encompasses a polypeptide or its equivalent which will be specifically recognized by certain antibodies which, when raised to the protein or polypeptide according to the present invention, interfere with the immediate physical interaction between pathogen and mammalian host. The term "immunologically equivalent derivative" as used herein encompasses a peptide or its equivalent which when used in a suitable formulation to raise antibodies in a vertebrate, the antibodies act to interfere with the immediate physical interaction between pathogen and mammalian host.
The polypeptide, such as an antigenically or immunologically equivalent derivative or a fusion protein thereof, is used as an antigen to immunize a mouse or other animal such as a rabbit, rat or chicken. The fusion protein may provide stability to the polypeptide. The antigen may be associated, for example by conjugation, with an immunogenic carrier protein, for example bovine serum albumin (BSA) or keyhole limpet haemocyanin (KLH). Alternatively, a multiple antigenic peptide comprising multiple copies of the protein or polypeptide, or an antigenically or immunologically equivalent polypeptide thereof, may be sufficiently antigenic to improve immunogenicity so as to obviate the use of a carrier. Preferably the antibody or derivative thereof is modified to make it less immunogenic in the individual. For example, if the individual is human, the antibody may most preferably be "humanised", where the complementarity determining region(s) of the hybridoma-derived antibody have been transplanted into a human monoclonal antibody, for example as described in Jones, P. et al., Nature 321: 522-525 (1986) or Tempest et al., Biotechnology 9: 266-273 (1991). The above antibody reagents will also be useful for assessing the biological role of the gene through antibody inhibition studies, immunoprecipitation studies, super-shift experiments and similar techniques. These studies may lead to discovery of novel protein:protein interactions which may be useful drug targets. The above antibody reagents may also lead to the identification of novel viral proteins not predicted by the DNA sequence, which in turn may be novel drug targets.
HSV-2 binding molecules and assays:
This invention also provides a method for identification of molecules, such as binding molecules, that bind HSV-2. Genes encoding proteins that bind HSV-2, such as binding proteins, can be identified by numerous methods known to those of skill in the art, for example, ligand panning and FACS sorting. Such methods are described in many laboratory manuals such as, for instance, Coligan et al., Current Protocols in Immunology 1(2): Chapter 5 (1991).
For instance, expression cloning may be employed for this purpose. To this end, polyadenylated RNA is prepared from a cell expressing HSV-2, a cDNA library is created from this RNA, the library is divided into pools, and the pools are transfected individually into cells that are not expressing HSV-2. The transfected cells are then exposed to labeled HSV-2. HSV-2 can be labeled by a variety of well-known techniques, including standard methods of radioiodination or inclusion of a recognition site for a site-specific protein kinase. Following exposure, the cells are fixed and binding of HSV-2 is determined. These procedures are conveniently carried out on glass slides. Alternatively, a labeled ligand can be photoaffinity-linked to a cell extract, such as a membrane or a membrane extract, prepared from cells that express a molecule that it binds, such as a binding molecule. Cross-linked material is resolved by polyacrylamide gel electrophoresis ("PAGE") and exposed to X-ray film. The labeled complex containing the ligand and its binding molecule can be excised, resolved into peptide fragments, and subjected to protein microsequencing. The amino acid sequence obtained from microsequencing can be used to design unique or degenerate oligonucleotide probes to screen cDNA libraries to identify genes encoding the putative binding molecule.
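The probe-design step can be sketched as back-translation: each residue of a microsequenced peptide is replaced by the set of codons that encode it, and each codon position is collapsed into an IUPAC ambiguity code. The following is a minimal Python sketch assuming the standard genetic code; the function names and the example peptide are illustrative only and are not part of the described method.

```python
"""Back-translate a peptide into a degenerate oligonucleotide probe (sketch)."""

# Standard genetic code, amino acid -> codons (sense strand, DNA alphabet).
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"], "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"], "D": ["GAT", "GAC"], "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"], "E": ["GAA", "GAG"], "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"], "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"], "K": ["AAA", "AAG"],
    "M": ["ATG"], "F": ["TTT", "TTC"], "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"], "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"], "Y": ["TAT", "TAC"], "V": ["GTT", "GTC", "GTA", "GTG"],
}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_probe(peptide: str) -> str:
    """Return a 5'->3' sense-strand probe, one IUPAC code per codon position.

    Collapsing each position independently over-covers six-codon residues
    (L, R, S): e.g. L becomes YTN, which also matches the Phe codons TTT/TTC.
    """
    probe = []
    for aa in peptide.upper():
        codons = CODONS[aa]
        for pos in range(3):
            probe.append(IUPAC[frozenset(codon[pos] for codon in codons)])
    return "".join(probe)


def degeneracy(probe: str) -> int:
    """Number of distinct sequences represented by an IUPAC-coded probe."""
    total = 1
    for code in probe:
        total *= CODE_SIZE[code]
    return total


if __name__ == "__main__":
    probe = degenerate_probe("MWKD")  # hypothetical peptide fragment
    print(probe, degeneracy(probe))
```

For the hypothetical peptide MWKD this gives ATGTGGAARGAY, a 4-fold degenerate probe. Stretches rich in single-codon (M, W) and two-codon residues keep degeneracy low, which is why such stretches are usually preferred for probe design.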
Polypeptides of the invention also can be used to assess HSV-2 binding capacity of HSV-2 binding molecules in cells or in cell-free preparations. Polypeptides of the invention may also be used to assess the binding of small molecule substrates and ligands in, for example, cells, cell-free preparations, chemical libraries, and natural product mixtures. These substrates and ligands may be natural substrates and ligands or may be structural or functional mimetics.
This invention also provides a method of screening drugs to identify those which interfere with the proteins selected as targets herein, which method comprises measuring the extent to which a test drug interferes with the activity of the protein. For example, if the selected protein has catalytic activity, then after suitable purification and formulation the activity of the enzyme can be followed by its ability to convert its natural substrates. By incorporating different chemically synthesised test compounds or natural products into such an assay of enzymatic activity, one is able to detect those additives which compete with the natural substrate or otherwise inhibit enzymatic activity.
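One common way to read out such an assay is to express each test compound's effect as percent inhibition of the uninhibited enzyme rate. The Python sketch below assumes a background (no-enzyme) correction and a 50% hit threshold; both are illustrative choices, not values given in the text, and the compound names and rates are hypothetical.

```python
"""Score test compounds in an enzyme-activity screen (sketch)."""


def percent_inhibition(control_rate: float, test_rate: float, blank_rate: float = 0.0) -> float:
    """Percent inhibition relative to an uninhibited control.

    control_rate: enzyme activity (e.g. substrate converted per minute) without compound
    test_rate:    activity measured with the test compound present
    blank_rate:   background signal without enzyme, subtracted from both readings
    """
    window = control_rate - blank_rate
    if window <= 0:
        raise ValueError("control activity must exceed the background")
    return 100.0 * (1.0 - (test_rate - blank_rate) / window)


def find_hits(test_rates: dict, control_rate: float, threshold: float = 50.0,
              blank_rate: float = 0.0) -> list:
    """Names of compounds whose inhibition meets or exceeds the threshold."""
    return [name for name, rate in test_rates.items()
            if percent_inhibition(control_rate, rate, blank_rate) >= threshold]


if __name__ == "__main__":
    rates = {"compound_1": 20.0, "compound_2": 90.0}  # hypothetical readings
    print(find_hits(rates, control_rate=100.0))
```

A compound that competes with the natural substrate and one that inhibits by another mechanism both lower the measured rate, so this readout alone does not distinguish them.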
The invention also relates to activators and inhibitors identified thereby.
Another aspect of the invention relates to the use of a polynucleotide in genetic immunization, which will preferably employ a suitable delivery method such as direct injection of plasmid DNA into muscles (Wolff et al., Hum. Mol. Genet. 1:363 (1992); Manthorpe et al., Hum. Gene Ther. 4:419 (1993)), delivery of DNA complexed with specific protein carriers (Wu et al., J. Biol. Chem. 264:16985 (1989)), coprecipitation of DNA with calcium phosphate (Benvenisty & Reshef, Proc. Nat'l Acad. Sci. USA 83:9551 (1986)), encapsulation of DNA in various forms of liposomes (Kaneda et al., Science 243:375 (1989)), particle bombardment (Tang et al., Nature 356:152 (1992); Eisenbraun et al., DNA Cell Biol. 12:791 (1993)) and in vivo infection using cloned retroviral vectors (Seeger et al., Proc. Nat'l Acad. Sci. USA 81:5849 (1984)). Suitable promoters for muscle transfection include CMV, RSV, SRα, actin, MCK, alpha globin, adenovirus and dihydrofolate reductase. In therapy or as a prophylactic, the active agent, i.e., the polypeptide, polynucleotide or inhibitor of the invention, may be administered to a patient as an injectable composition, for example as a sterile aqueous dispersion, preferably isotonic.
Vaccines:
Another aspect of the invention relates to a method for inducing an immunological response in an individual, particularly a mammal, which comprises inoculating the individual with HSV-2 polypeptide, or an antigenic fragment or variant thereof, in an amount adequate to produce antibody to protect said individual from infection, particularly HSV-2 infection.
Yet another aspect of the invention relates to a method of inducing immunological response in an individual which comprises, through gene therapy, delivering a gene encoding HSV-2, or an antigenic fragment or a variant thereof, for expressing HSV-2, or a fragment or a variant thereof in vivo in order to induce an immunological response to produce antibody to protect said individual from disease.
A further aspect of the invention relates to an immunological composition which, when introduced into a host capable of having induced within it an immunological response, induces an immunological response in such host to HSV-2 or a protein coded therefrom, wherein the composition comprises a recombinant HSV-2 or protein coded therefrom comprising DNA which codes for and expresses an antigen of said HSV-2 or protein coded therefrom.
The HSV-2 or a fragment thereof may be fused with a co-protein which may not by itself produce antibodies, but is capable of stabilizing the first protein and producing a fused protein which will have immunogenic and protective properties. This fused recombinant protein preferably further comprises an antigenic co-protein, such as glutathione S-transferase (GST) or beta-galactosidase, relatively large co-proteins which solubilise the protein and facilitate its production and purification. Moreover, the co-protein may act as an adjuvant in the sense of providing a generalized stimulation of the immune system. The co-protein may be attached to either the amino or the carboxy terminus of the first protein.
The present invention also includes a vaccine formulation which comprises the immunogenic recombinant protein together with a suitable carrier. Since the protein may be broken down in the stomach, it is preferably administered parenterally, including, for example, administration that is subcutaneous, intramuscular, intravenous, or intradermal. Formulations suitable for parenteral administration include aqueous and non-aqueous sterile injection solutions which may contain anti-oxidants, buffers, bacteriostats and solutes which render the formulation isotonic with the bodily fluid, preferably the blood, of the individual; and aqueous and non-aqueous sterile suspensions which may include suspending agents or thickening agents. The formulations may be presented in unit-dose or multi-dose containers, for example sealed ampoules and vials, and may be stored in a freeze-dried condition requiring only the addition of the sterile liquid carrier immediately prior to use. The vaccine formulation may also include adjuvant systems for enhancing the immunogenicity of the formulation, such as oil-in-water systems and other systems known in the art. The dosage will depend on the specific activity of the vaccine and can be readily determined by routine experimentation.
Whilst the invention has been described with reference to a certain HSV-2 polypeptide, it is to be understood that this covers fragments of the naturally occurring protein and similar proteins (for example, having sequence homologies of 75% or greater) with additions, deletions or substitutions which do not substantially affect the immunogenic properties of the recombinant protein.
Compositions:
The invention also relates to compositions comprising the polynucleotide or the polypeptides discussed above or the inhibitors. Thus, the polypeptides of the present invention may be employed in combination with a non-sterile or sterile carrier or carriers for use with cells, tissues or organisms, such as a pharmaceutical carrier suitable for administration to a subject. Such compositions comprise, for instance, a media additive or a therapeutically effective amount of a polypeptide of the invention and a pharmaceutically acceptable carrier or excipient. Such carriers may include, but are not limited to, saline, buffered saline, dextrose, water, glycerol, ethanol and combinations thereof. The formulation should suit the mode of administration.
Kits:
The invention further relates to diagnostic and pharmaceutical packs and kits comprising one or more containers filled with one or more of the ingredients of the aforementioned compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, reflecting approval by the agency of the manufacture, use or sale of the product for human administration.
Administration:
Polypeptides and other compounds of the present invention may be employed alone or in conjunction with other compounds, such as therapeutic compounds.
The pharmaceutical compositions may be administered in any effective, convenient manner including, for instance, administration by topical, oral, anal, vaginal, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes among others.
The pharmaceutical compositions generally are administered in an amount effective for treatment or prophylaxis of a specific indication or indications. It will be appreciated that optimum dosage will be determined by standard methods for each treatment modality and indication, taking into account the indication, its severity, route of administration, complicating conditions and the like.
In therapy or as a prophylactic, the active agent may be administered to an individual as an injectable composition, for example as a sterile aqueous dispersion, preferably isotonic.
Alternatively, the composition may be formulated for topical application, for example in the form of ointments, creams, lotions, eye ointments, eye drops, ear drops, mouthwash, impregnated dressings and sutures, and aerosols, and may contain appropriate conventional additives, including, for example, preservatives, solvents to assist drug penetration, and emollients in ointments and creams. Such topical formulations may also contain compatible conventional carriers, for example cream or ointment bases, and ethanol or oleyl alcohol for lotions. Such carriers may constitute from about 1% to about 98% by weight of the formulation; more usually they will constitute up to about 80% by weight of the formulation. For administration to mammals, and particularly humans, it is expected that the daily dosage level of the active agent will be from 0.01 mg/kg to 10 mg/kg, typically around 1 mg/kg. The physician in any event will determine the actual dosage most suitable for an individual, which will vary with the age, weight and response of the particular individual. The above dosages are exemplary of the average case. There can, of course, be individual instances where higher or lower dosage ranges are merited, and such are within the scope of this invention.
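The per-kilogram dosage figures can be turned into absolute daily amounts with simple arithmetic. The Python sketch below only restates the 0.01-10 mg/kg range and the typical 1 mg/kg figure from the text; the 70 kg body weight is a hypothetical example, and, as the text says, the actual dosage remains for the physician to determine.

```python
"""Convert the per-kilogram daily dose range in the text into absolute amounts (sketch)."""

MIN_MG_PER_KG = 0.01      # lower bound stated in the text
TYPICAL_MG_PER_KG = 1.0   # "typically around 1 mg/kg"
MAX_MG_PER_KG = 10.0      # upper bound stated in the text


def daily_dose_range_mg(body_weight_kg: float) -> tuple:
    """Return (minimum, typical, maximum) daily dose in mg for a body weight in kg."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (MIN_MG_PER_KG * body_weight_kg,
            TYPICAL_MG_PER_KG * body_weight_kg,
            MAX_MG_PER_KG * body_weight_kg)


if __name__ == "__main__":
    low, typical, high = daily_dose_range_mg(70.0)  # hypothetical 70 kg adult
    print(f"{low:.2f} mg to {high:.0f} mg per day, typically {typical:.0f} mg")
```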
The composition of the invention may be administered by injection to achieve a systemic effect against the relevant virus shortly before insertion of an indwelling device. Treatment may be continued after surgery during the in-body time of the device. In addition, the composition could also be used to broaden perioperative cover for any surgical technique to prevent viral reactivation.
Alternatively, the composition of the invention may be used to bathe an indwelling device immediately before insertion. A vaccine composition is conveniently in injectable form. Conventional adjuvants may be employed to enhance the immune response.
A suitable unit dose for vaccination is 0.5-5 microgram/kg of antigen, and such dose is preferably administered 1-3 times and with an interval of 1-3 weeks.
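The stated unit dose and schedule can be expressed as a concrete plan of dates and antigen amounts. This Python sketch enforces only the bounds given in the text (0.5-5 microgram/kg, 1-3 doses, 1-3 weeks apart); the body weight, chosen dose, interval and start date in the example are hypothetical.

```python
"""Turn the stated vaccination dose range and schedule into a concrete plan (sketch)."""
from datetime import date, timedelta


def vaccination_plan(body_weight_kg: float, dose_ug_per_kg: float, n_doses: int,
                     interval_weeks: int, first_dose: date) -> list:
    """List of (date, antigen in micrograms) for each administration.

    Bounds follow the text: 0.5-5 ug/kg per dose, 1-3 doses, 1-3 weeks apart.
    """
    if not 0.5 <= dose_ug_per_kg <= 5.0:
        raise ValueError("dose outside 0.5-5 microgram/kg")
    if not 1 <= n_doses <= 3:
        raise ValueError("number of doses outside 1-3")
    if not 1 <= interval_weeks <= 3:
        raise ValueError("interval outside 1-3 weeks")
    antigen_ug = body_weight_kg * dose_ug_per_kg
    return [(first_dose + timedelta(weeks=interval_weeks * i), antigen_ug)
            for i in range(n_doses)]


if __name__ == "__main__":
    for when, amount in vaccination_plan(70.0, 1.0, 3, 2, date(2024, 1, 1)):
        print(when.isoformat(), f"{amount:.0f} ug")
```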
With the indicated dose range, no adverse toxicological effects will be observed with the compounds of the invention which would preclude their administration to suitable individuals.
In order to facilitate understanding of the following example, certain frequently occurring methods and/or terms will be described.
Example 1: Preparation of ultra-purified herpes simplex virus type 2 DNA
This protocol describes the preparation of herpes simplex virus type 2 (HSV-2) strain SB5 DNA for sequencing. It combines two protocols, both of which have been modified. Part I describes the crude separation of the viral DNA from host cell DNA (Hirt, B., J. Mol. Biol. 26: 365-369 (1967)), and Part II describes the ultra-purification of the viral DNA through a cesium chloride (CsCl) gradient (Vinograd, J., et al., Proc. Nat'l. Acad. Sci. (USA) 2: 902-910 (1963)).
I. Separation of viral DNA from host DNA (modified from Hirt, supra)
Confluent monolayers of Vero cells (ATCC CCL 81), previously seeded into roller bottles (1 x 10^8 cells/bottle), were infected with HSV-2 strain SB5 at an MOI of 0.01 in HBSS. After one hour, the virus inoculum was removed and normal medium (DMEM, 10% FCS) was added.
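Since the multiplicity of infection (MOI) is the ratio of infectious units to cells, the amount of virus per bottle follows directly from the cell number. The Python sketch below uses the 1 x 10^8 cells and MOI of 0.01 from the protocol; the stock titer of 1 x 10^7 PFU/ml is a hypothetical value, as the text does not give one.

```python
"""Inoculum size for infection at a given multiplicity of infection (sketch)."""


def inoculum(cells: float, moi: float, titer_pfu_per_ml: float) -> tuple:
    """Return (PFU required, ml of virus stock) to infect `cells` at `moi`.

    MOI is the ratio of infectious units (PFU) to cells, so PFU = cells x MOI.
    """
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    pfu = cells * moi
    return pfu, pfu / titer_pfu_per_ml


if __name__ == "__main__":
    # 1 x 10^8 Vero cells per roller bottle at MOI 0.01, as in the protocol;
    # the 1 x 10^7 PFU/ml stock titer is a hypothetical value.
    pfu, volume_ml = inoculum(1e8, 0.01, 1e7)
    print(f"{pfu:.1e} PFU = {volume_ml:.2f} ml of stock per bottle")
```

At MOI 0.01 each bottle receives about 10^6 PFU, so only about 1 cell in 100 is infected initially and the virus spreads through the monolayer over the 40-48 hour incubation that follows.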
Approximately 40-48 hours post-infection, infected monolayers were harvested by scraping and placed in 10ml of cold 1x PBS. For subsequent steps, the cells from three roller bottles were combined (3 x 10^8 cells). The cells were centrifuged at 2000g for 5 minutes. The supernatant was removed, and 25ml of DNA extraction buffer (0.25% Triton X-100, 10mM EDTA, 10mM Tris pH 8.0) was added to the cell pellet.
The lysate was mixed at room temperature for 10 minutes. Then 1ml of 5M NaCl (0.2M final concentration) was added to the lysate, and mixing was continued for another 15 minutes. The lysate was centrifuged at 10,000g for 30 minutes at 4°C. The supernatant, which contains the viral DNA, was saved, and the pellet, which contains mostly chromosomal DNA, was discarded.
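The stated 0.2M NaCl can be checked with the usual mixing relation C_final = C_stock x V_added / (V_initial + V_added). This Python sketch ignores the small volume contributed by the cell pellet, which is an assumption; it shows that 1ml of 5M stock in 25ml gives about 0.19M, so the 0.2M in the text is a rounded figure.

```python
"""Final concentration after adding a stock solution to a sample (sketch)."""


def final_concentration(stock: float, v_added: float, v_initial: float) -> float:
    """C_final = C_stock * V_added / (V_initial + V_added); volumes in the same units."""
    return stock * v_added / (v_initial + v_added)


def stock_volume_for(stock: float, target: float, v_initial: float) -> float:
    """Volume of stock to add to v_initial to reach `target` (solves the equation above)."""
    if not 0 <= target < stock:
        raise ValueError("target must be below the stock concentration")
    return target * v_initial / (stock - target)


if __name__ == "__main__":
    # 1 ml of 5 M NaCl added to ~25 ml of lysate (pellet volume ignored).
    print(f"{final_concentration(5.0, 1.0, 25.0):.3f} M")   # about 0.19 M
    print(f"{stock_volume_for(5.0, 0.2, 25.0):.2f} ml for exactly 0.2 M")
```

The same relation applies to the later additions of SDS, Proteinase K and RNase A to stated final concentrations.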
SDS was added to the supernatant to a final concentration of 0.5%, and Proteinase K to a final concentration of 150 µg/ml. This was incubated for 2 hours at 45°C, after which 2.5 volumes of 100% ethanol were added. Viral DNA was precipitated overnight at -20°C.
The precipitate was centrifuged at 10,000g for 30 minutes at 4°C. The pellet was washed once with 70% ethanol and air dried for 30 minutes. The pellet was then resuspended in 250 µl of TE (10mM Tris, pH 7.5, 2mM EDTA). RNase A was added to a final concentration of 10 µg/ml, and the sample was incubated at 37°C for one hour.
SDS and Proteinase K were then added (as above), and the sample was incubated overnight at 37°C.
The DNA was extracted twice with phenol and once with chloroform; 1/10 volume of 3M sodium acetate and 2.5 volumes of 100% ethanol were then added, and the DNA was allowed to precipitate overnight at -20°C. The next day, the precipitate was pelleted at 15,000g for 20 minutes. The pellet was washed once with 70% ethanol, briefly air dried and resuspended in 1ml of TE.
II. Ultrapurification of the viral DNA through a CsCl gradient (modified from Vinograd, et al., supra)
A 57% (w/w) cesium chloride solution containing the DNA prepared above was made as follows:
To the 1ml of viral DNA prepared above, 9ml of TE was added for a total of exactly 10ml. To this, 13.26g of CsCl was added and dissolved. This solution was transferred to ultracentrifuge tubes and spun in a VTi 40 rotor at 35,000 rpm for 72 hours at 25°C. After centrifugation, the tube was mounted on a gradient collector, the bottom of the tube was pierced, and 15-drop fractions were collected through the hole.
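The 57% (w/w) figure follows from the masses used: CsCl divided by the total mass of salt plus solution. The Python sketch below assumes that the 10ml of DNA in TE weighs about 10.0g (density close to 1 g/ml), which the text does not state explicitly.

```python
"""Percent (w/w) of a CsCl solution and the salt mass needed for a target (sketch)."""


def percent_ww(salt_g: float, solvent_g: float) -> float:
    """Salt as a percentage of total solution mass."""
    return 100.0 * salt_g / (salt_g + solvent_g)


def salt_mass_for(fraction_ww: float, solvent_g: float) -> float:
    """Grams of salt giving `fraction_ww` (0-1) when dissolved in `solvent_g` grams."""
    if not 0 < fraction_ww < 1:
        raise ValueError("fraction must be between 0 and 1")
    return fraction_ww * solvent_g / (1.0 - fraction_ww)


if __name__ == "__main__":
    # Assumes the 10 ml of DNA in TE weighs about 10.0 g (density ~1 g/ml).
    print(f"{percent_ww(13.26, 10.0):.1f}% w/w")          # about 57.0%
    print(f"{salt_mass_for(0.57, 10.0):.2f} g CsCl for 57% w/w")
```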
The refractive index of every fourth tube was determined on a refractometer. The viral DNA lies between refractive indices of 1.401 and 1.403 (density range for HSV DNA from Goldin, A., et al., J. Virol.: 50-58). Buoyant density is calculated as ρ = a·η25 - b, where η25 is the refractive index at 25°C and the coefficients a and b are 10.8601 and 13.4974, respectively, for CsCl (ISCO Tables, A Handbook of Data for Biological and Physical Scientists, ISCO, Inc., Lincoln, NE, ninth ed. 1987).
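The density relation and the fraction-selection rule can be sketched directly. This Python example uses the CsCl coefficients and the 1.401-1.403 refractive-index window from the text; the refractometer readings in the usage lines are hypothetical.

```python
"""Buoyant density of CsCl gradient fractions from refractive index (sketch)."""

A_CSCL = 10.8601   # coefficient a for CsCl (from the text)
B_CSCL = 13.4974   # coefficient b for CsCl (from the text)


def buoyant_density(eta25: float) -> float:
    """rho = a * eta25 - b, with eta25 the refractive index at 25 degrees C (g/ml)."""
    return A_CSCL * eta25 - B_CSCL


def fractions_to_pool(readings: list, low: float = 1.401, high: float = 1.403) -> list:
    """Fraction numbers whose measured refractive index falls within [low, high]."""
    return [fraction for fraction, eta in readings if low <= eta <= high]


if __name__ == "__main__":
    print(f"{buoyant_density(1.401):.4f}-{buoyant_density(1.403):.4f} g/ml")
    # Hypothetical readings from every fourth tube: (fraction number, eta25).
    readings = [(4, 1.4050), (8, 1.4025), (12, 1.4012), (16, 1.3990)]
    print(fractions_to_pool(readings))
```

Under this formula the refractive-index window corresponds to roughly 1.718-1.739 g/ml. Because only every fourth tube is read, the unread tubes between two in-range readings would presumably be pooled as well; that step is not shown here.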
The appropriate fractions were pooled and dialyzed overnight against 3L of TE with frequent buffer changes.
The final DNA preparation was concentrated by precipitation with 1/10 volume of 3M sodium acetate and 2.5 volumes of 100% ethanol. The DNA was resuspended in TE, and its OD260/OD280 ratio was measured.
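The OD reading yields both a concentration and a purity estimate. This Python sketch uses the conventional factor of 50 µg/ml double-stranded DNA per A260 unit (1 cm path) and the common rule of thumb that an A260/A280 ratio near 1.8 indicates DNA largely free of protein; neither value is stated in the text, and the readings and dilution in the usage lines are hypothetical.

```python
"""DNA concentration and purity from UV absorbance (sketch)."""

DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor: 1 A260 unit ~ 50 ug/ml dsDNA (1 cm path)


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Double-stranded DNA concentration in ug/ml of the undiluted sample."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values near 1.8 are usually taken to indicate
    DNA largely free of protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings of a 1:10 dilution of the final preparation.
    print(dsdna_concentration(0.5, 10), "ug/ml")
    print(round(purity_ratio(0.5, 0.27), 2))
```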
The DNA was then sequenced as described in Sambrook, J. et al. (1989), Chapter 13, supra, or by automated DNA sequencing according to the manufacturer's protocols (e.g., Applied Biosystems/Perkin Elmer, Foster City, CA).
Certain preferred individual polynucleotide and polypeptide sequences of the invention are summarized in the following Tables. Tables 1, 2 and 3 represent three different sequencing efforts. Table 4 represents polypeptides encoded by ORFs from Table 3.
Table 1 provides polynucleotides of the invention and polypeptides encoded by ORFs, wherein the polynucleotide start and end positions for each ORF are indicated by sequence numbers which correlate to the polynucleotide sequence referred to above each given polypeptide in the Table. Additionally, each ORF-encoded polypeptide sequence is labeled with the Contig number matching the Contig number of the polynucleotide sequence from which it was encoded. For ORF sequences wherein the start polynucleotide number is larger than the end polynucleotide number, translation of that polypeptide initiates on the nucleotide strand which is complementary to the strand depicted in the Table. In many cases more than one ORF is mapped to an individual Contig. Contig assembly was performed using the publicly available Phrap program (P. Green, University of Washington, WA, U.S.A.). ORF prediction was accomplished using the publicly available GenMark program (Georgia Tech Research Corp., Georgia Tech, GA, U.S.A.). Homologies of the polypeptide sequences to known proteins are also indicated. These homologies were determined using the public database Mpsrch_pp, release 2.1, by J. Collins, Biocomputing Research Unit, University of Edinburgh (distributed by IntelliGenetics, Inc.).
Table 2, obtained from a separately performed sequencing, provides polynucleotides of the invention and polypeptides encoded by ORFs, wherein the polynucleotide start and end positions for ORFs are indicated by sequence numbers which correlate to the polynucleotide sequence referred to above each given polypeptide in the Table. Each ORF-encoded polypeptide sequence is labeled with a Contig number matching the Contig number of the polynucleotide sequence referred to above it, from which it was encoded. For ORF sequences wherein the nucleotide start number is larger than the end number, translation of that polypeptide initiates on the nucleotide strand which is complementary to the strand depicted in the Table.
Contig assembly was accomplished using the publicly available Sequencher 3.0 software program (Gene Codes Corp., Ann Arbor, MI, USA). ORF prediction was done using the publicly available GenMark program (see Table 1). Homologies of the polypeptide sequences to known proteins are indicated. These homologies were determined using the publicly available Mpsrch program (see Table 1).
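The start/end convention used in these Tables can be made concrete: when the start number exceeds the end number, the ORF is read from the reverse complement of the depicted strand. The Python sketch below assumes the coordinates are 1-based and inclusive (the text does not say so explicitly) and uses the standard genetic code; it does not reproduce GenMark's ORF prediction, and the contig in the usage lines is a toy sequence, not one from the Tables.

```python
"""Recover an ORF from contig coordinates, honouring the strand convention (sketch)."""

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orf_nucleotides(contig: str, start: int, end: int) -> str:
    """ORF sequence for 1-based, inclusive coordinates.

    If start > end the ORF lies on the complementary strand, so the slice is
    reverse-complemented to read it 5'->3' from the start position.
    """
    if start <= end:
        return contig[start - 1:end]
    return reverse_complement(contig[end - 1:start])


def translate(nucleotides: str) -> str:
    """Translate complete codons; ambiguous codons become 'X'."""
    seq = nucleotides.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    forward = "GGATGGCCTAACC"            # toy contig, not a sequence from the Tables
    reverse = reverse_complement(forward)
    print(translate(orf_nucleotides(forward, 3, 11)))   # ORF on the depicted strand
    print(translate(orf_nucleotides(reverse, 11, 3)))   # start > end: complementary strand
```

Both calls recover the same ORF (MA*): the second contig is the reverse complement of the first, so the ORF appears there with its start number larger than its end number.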
Table 3, obtained from a separately performed sequencing, provides polynucleotides of the invention and polypeptides encoded by ORFs, wherein the polynucleotide start and end positions for each ORF are indicated by sequence numbers which correlate to the polynucleotide sequence referred to above that polypeptide in the Table. Each ORF-encoded polypeptide sequence is labeled with a Contig number matching the Contig number of the polynucleotide sequence appearing above it, from which it was encoded. For ORF sequences wherein the start polynucleotide number is larger than the end number, translation of that polypeptide initiates on the nucleotide strand which is complementary to the strand depicted in the Table. Contig assembly was performed using the publicly available Phrap program (see Table 1). ORF prediction was accomplished using the publicly available GenMark software program (see Table 1). Homologies of the polypeptide sequences to known proteins are indicated. These homologies were determined by comparison with the public database Mpsrch_pp (see Table 1).
Table 4 provides ORF sequences of polypeptides encoded by the polynucleotide sequences of Table 3 which were predicted by the GenMark program (see Table 1) as having more than a single start site (N-terminal methionyl residue). The Contig numbers and polynucleotide start and end sites for these ORFs correlate to the Contig numbers and polynucleotide sequence numbers of Table 3.
TABLE 1
[SEQ ID NO: 1] = Contig ID 100
[SEQ ID NO: 2]
ORF # = 1 from Contig ID 100
ORF start site = 1808 ORF end site = 3 ORF sequence : VLMGRLRNAPESLTYMFCAAIRVAPVTTQSRTSLRVCTHVLFPDPALPVMRYAANGNSR SGRPVGTSKAATSRNHCRRGTCVTSSCCCESSRMRAMIGWTPCMDVKFKNASSLNRTAGL APGCCGGGPGARTSREPSPPDAAMAAQRARAPAMRTRGGDAALCAPEDG VKVHPTPGT LFREILLGQMGYTEGQGVYNWRSSEAATRQLQAAIFHALLNATTYRDLEEDWRRHWAR GLQPQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTLLDFAHGWNCFAPGGPSGPTSF PKYIDWLTCLGLVPILRKTREGEATQRLGAFLRQHTLPRQLATVAGAAERAGPGLLELAV AFDSTRMAEYDRVHIYYNHRRGE LVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQRLC PEIVACHALREHAHICRLRNTASVKVLLGRKSDSERGVAGAARWNKALGEDDETKAGSA ASCLVRLIINMKGMRHVGDINDTVRAYLDEAGGHLIDTPAVDHTLPGFGKGGTGRGSAAQ DPGARPQQLRQAFQTAWNNINGMLEGYINNLFGTIERLRETNAGLATQLQARGGSSRST AX
Gene matched: gi | 136794 | sp | P10190 |UL06_HSV11 Gene name: VIRION PROTEIN UL6. gi | 73994
[SEQ ID NO: 3]
ORF # = 2 from Contig 100 ORF start site = 1378 ORF end site = 4023 ORF sequence: MAASGGEGSRDVRAPGPPPQQPGARPAVRFRDEAFLNFTSMHGVQPIIARIRELSQQQLD VTQVPRLQWFRDVAALEVPTGLPLREFPFAAYLITGNAGSGKSTCVQTLNEVLDCWTGA TRIAAQNMYVKLSGAFLSRPINTIFHEFGFRGNHVQAQLGQHPYTLASSPASLEDLQRRD LTYYWEVILDITKRALAAHGGEDARNEFHALTALEQTLGLGQGALTRLASVTHGALPAFT RSNIIVIDEAGLLGRHLLTTWYCWWMINALYHTPQYAGRLRPVLVCVGSPTQTAΞLEST FEHQKLRCSVRQSENVLTYLICNRTLREYTRLSHSWAIFINNKRCVEHEFGNLMKVLEYG LPITEEHMQFVDRFWPESYITNPANLPGWTRLFSSHKEVSAYMAKLHAYLKVTREGEFV VFTLPVLTFVSVKEFDKYRRLTQQPTLTMEK ITANASRITNYΞQSQDQDAGHVRCEVHS KQQLWARNDITYVLNSQVAVTARLRKMVFGFDGTFRTFEAVLRDDSFVKTQGETSVEFA YRFLSRLMFGGLIHFYNFLQRPGLDATQRTLAYGRLGELTAELLSLRRDAAGASATRAAD TSDRSPGERAFNFKHLGPRDGGPDDFPDDDLDVIFAGLDEQQLDVFYCHYALEEPETTAA VHAQFGLLKRAFLGRYLILRELFGEVFESAPFSTYVDNVIFRGCELLTGSPRGGLMSVAL QTDNYTLMGYTYTRVFAFAEELRRRHATAGVAEFLEESPLPYIVLRDQHGFMSWNTNIΞ EFVESIDSTELAMAINADYGISSKLAMTITRSQGLΞLDKVAICFTPGNLRLNSAYVAMSR TTSSEFLH NLNPLRERHERDDVISEHILSALRDPNWIVY*
Gene matched: gi | 74000 |pir | | MBEU5
Gene name: gene UL5 protein - human herpesvirus 1
[SEQ ID NO: 4]
ORF # = 3 from Contig 100 ORF start site = 4090 ORF end site = 4695 ORF sequence :
MGNPQTTIAYSLHHPRAΞLTSALPDAAQWHVFESGTRAVLTRGRARQDRLPRGGWIQH TPIGLLVIIDCRAEFCAYRFIGRASTQRLER WDAHMYAYPFDSWVSSSHGESVRSATAG ILTWWTPDTIYITATIYGTAPEAARGCDNAPLDVRPTTPPAPVSPTAGEFPANTTDLLV EVLREIQISPTLDDADPTPGT*
Gene matched: gi | 136788 | sp | P28280 | UL04_HSV2H Gene name: PROTEIN UL4. gi | 73890 | pir | | W
[SEQ ID NO: 5]
ORF # = 4 from Contig 100
ORF start site = 5413
ORF end site = 4895
ORF sequence : VGPLDGEPDRDAISPLTSSVAGDPPGADGPYVTFDTLFMVSSIDELGRRQLTDTIRKDLR
LSLAKFΞIACTKTSSFSGTAARQRKRGAPPQRTCVPRSNKSLQMFVLCKRANAAQVREQL
RAVIRSRKPRKYYTRSSDGRLCPAVPVFVHEFVSSEPMRLHRDNVMLSTEPD*
Gene matched: gi | 330308
Gene name: (L02638) nuclear phosphoprotein [Herpes simplex v
[SEQ ID NO: 6]
ORF # = 5 from Contig 100 ORF start site = 6656
ORF end site = 5652
ORF sequence:
MKRARSRSPSPPSRPSSPFRTPPHGGSPRREVGAGILA.SDATSHVCIASHPGSGAGYPTR
LAAGSAVQRRRPRGCPPGVMFSASTTPEQPLGLSGDATPPLPTSVPLDWAAFRRAFLIDD AWRPLLEPELANPLTARLLAEYDRRCQTEEVLPPREDVFSWTRYCTPDDVRWIIGQDPY HHPGQAHGLAFSVRADVPVPPSLRNVLAAVKNCYPDARMΞGRGCLEKWARDGVLLLNTTL TVKRGAAASHSKLGWDRFVGGVVRRLAARRPGLVFML GAHAQNAIRPDPRQHYVLKFΞH PSPLSKVPFGTCQHFLAANRYLETRDIMPID SV*
Gene matched: gi | 330306
Gene name: (M25410) uracil-DNA glycosylase [Herpes simplex v
[SEQ ID NO: 7]
ORF # = 6 from Contig 100 ORF start site = 7080 ORF end site = 6529 ORF sequence:
VPCMRTPADDVSWRYEAPSVIDYARIDGIFLRYHCPGLDTFLWDRHAQRAYLVNPFLFAG GFLEDLSHSVFPADTQETTTRRALYKEIRDALGSRKQAVSHAPVRAGCVNFDYSRTRRCV GRRDLRPANTTST EPPVSSDDEASSQSKPLATQPPVLALSNAPPRRVSPTRGRRRHTRL RRN*
Gene matched: gi | 136776 | sp | P28278 | UL01_HSV2H Gene name: GLYCOPROTEIN L PRECURSOR, gi
[SEQ ID NO: 8] = Contig ID 101
[SEQ ID NO: 9] ORF # = 1 from Contig 101
ORF start site = 351
ORF end site = 1259
ORF sequence : IRRRGNVEIRVYYESVRPSRSRSHLKPSDHQEFPGHHVSPGSPGFPESPGNREFHDLPE NPGSRAYPGTRDPHDPHGCPGSLDPHGNPAQPAGLPSPVPYAPLGSPDPSSPRQRTYVLP
RVGIRNAPASDTRAPKRAHSRHRADRPPESPGSELYPLNAQALAHLQMLPADHRAFFRTV
IEVSRLCALNTHDPPPPLAGARVGQEAQLVHTQWLRANRESSPLWP RTAAMNFIAAAAP
CVQTHRHMHDLLMACAF CCLAHASTCSYAGLYSAHCQHLFRAFGCGPPVLTTSRGQGG
CN*
Gene matched: gi | 757866
Gene name: (X02138) 34K (UslO) (aa 1-284) [Human herpesvirus
[SEQ ID NO: 10] ORF # = 2 from Contig 101
ORF start site = 2140
ORF end site = 1871
ORF sequence:
MTSRPADQDSVRSSASVPLYPAASPVPAEAYYSESEDEAANDFLVR GRQQSVLRRRRRR
TRCVGLVIACLWALLSGGFGALLV LLR*
Gene matched: gi | 135568 | sp | P06481 |TEGP_HSV11 Gene name: TEGUMENT PHOSPHOPROTEIN US9
[SEQ ID NO: 11] ORF # = 3 from Contig 101 ORF start site = 2377 ORF end site = 2240 ORF sequence : VALHAVDAPSQFWWLAWWLRGAVGLGAVLCGIAFYVTSIARGA1'
Gene matched: gi | 477669 |pir | | B45696 Gene name: 23-29K immunoreactive epitope dispens
[SEQ ID NO: 12] ORF # = 4 from Contig 101 ORF start site = 3572 ORF end site = 2529 ORF sequence:
VAPPRHHRVIPEVSHVRGVTVHME PEAIMFAPGETFETKVSIHAVAHDDGPYAMDWM RFDVPSSCAEMRIYEACLYHPQLPECLSPADAPCAVSSWAYRLAVRSYAGCSRTTPPPRC FAEARMEPVPGLAWLASTVNLEFQHASPQHAGLYLCWYVDDHIHAWGHMTISTAAQYRN AWEQHLPQRQPEPVEPTRPHVRAPPPAPSARGPLRLGAVLGAALLLAALGLSAWACMTC RRRS RAVKSRASATGPTYIRVADSELYADWSSDSEGERDGSLWQDPPERPDSPSTNGS GFEILSPTAPSVYPHSEGRKSRRPLTTFGSGSPGRRHSQASYSSVLW*
Gene matched: gi | 138240 | sp | P04488 |VGLE_HSV11 Gene name: GLYCOPROTEIN E PRECURSOR, gi
[SEQ ID NO: 13]
ORF # = 5 from Contig 101
ORF start site = 4176
ORF end site = 3460
ORF sequence : MARGAGLVFFVGVWWSCLAAAPRTS KRVTSGEDWLLPAPAGPEERTRAHKLL AAEP LDACGPLRPSWVAL PPRRVLETWDAACMRAPEPLAIAYSPPFPAGDEGLYSELA RDR VAWNESLVIYGALETDSGLYTLSWGLSDEARQVASWLWEPAPVPTPTPDDYDEEDD AGVSERTPVSVPPPTPPRWSPRGPPEAPSCYPRGVPRARGNGPYGDPGGHYVCPRGDV*
Gene matched: gi | 138241 | sp | P13289 |VGLE_HSV2 Gene name: GLYCOPROTEIN E PRECURSOR, gi |
[SEQ ID NO: 14] ORF # = 6 from Contig 101
ORF start site = 5796
ORF end site = 4495
ORF sequence:
VYL ARVGGWLGYLGGTWTPHKGSLEGGKLGQFIGRERGARTAVPTISHRAHSHLDPSDP GMPGRSLQGLAILGLWVCATGLWRGPTVSLVSDSLVDAGAVGPQGFVEEDLRVFGELHF
VGAQVPHTNYYDGIIELFHYPLGNHCPRWHWTLTACPRRPAVAFTLCRSTHHAHSPAY
PTLELGLARQPLLRWTATRDYAGLYVLRVWVGSATNASLFVLGVALSANGTFVYNGSDY
GSCDPAQLPFSAPRLGPSSVYTPGASRPTPPRTTTSPSSPRDPTPAPGDTGTPAPASGER
APPNSTRSASESRHRLTVAQVIQIAIPASIIAFVFLGSCICFIHRCQRRYRRPRGQIYNP GGVSCAVNEAAMARLGAELRSHPNTPPKPRRRSSSSTTMPSLTSIAEESEPGPWLLSVS
PRPRSGPTAPQEV*
Gene matched: gi | 138328 | sp | P06764 |VGLI_HSV23 Gene name: GLYCOPROTEIN I. gi | 73722 |pir
[SEQ ID NO:15] ORF # = 7 from Contig 101 ORF start site = 7017 ORF end site = 5815 ORF sequence : VCIAYHGMGRLTSGVGTAALLWAVGLRWCAKYALADPSLKMADPNRFRGKNLPVLDQL TDPPGVKRVYHIQPSLEDPFQPPSIPITVYYAVLERACRSVLLHAPSEAPQIVRGASDEA RKHTYNLTIA YRMGDNCAIPITVMEYTECPYNKSLGVCPIRTQPR SYYDSFSAVSEDN LGFLMHAPAFETAGTYLRLVKINDWTEITQFILEHRARASCKYALPLRIPPAACLTSKAY QQGVTVDSIGMLPRFIPENQRTVALYSLKIAGWHGPKPPYTSTLLPPELSDTTNATQPEL VPEDPEDSALLEDPAGTVSSQIPPNWHIPSIQDVAPHHAPAAPSNPGLIIGALAGSTLAV LVIGGIAF VRRRAQMAPKRLRLPHIRDDDAPPSHQPLFY*
Gene matched: gi | 419141 |pir| | E43674
Gene name: US6 protein - human herpesvirus 2 (st
[SEQ ID NO: 16]
ORF # = 8 from Contig 101
ORF start site = 7553
ORF end site = 7440
ORF sequence :
VGGLCLMILGMACLLEVLRRLGRELARCCPHAGQFAP*
Gene matched: gi | 137132 | sp | P13293 |VGLJ_HSV2 Gene name: GLYCOPROTEIN J. gi | 419140 |pir
[SEQ ID NO: 17] = Contig ID 102
[SEQ ID NO: 18]
ORF # = 1 from Contig 102
ORF start site = 1502 ORF end site = 465 ORF sequence :
VCPPPPTNMAWCGSGLRLRPFHPPSPSFFVLRALIRAGPGPFAASPRAPSGPGCGMCRG DSPGVAGGSGEHCLGGDDGDDGRPRLACVGAIARGFAHLWLQATTLGFVGSWLSRGPYA DAMSGAFVIGSTGLGFLRAPPAFARPPTRVCA LRLVGGGAAVAL SLGEAGAPPGVPGP ATQCLALGAAYAALLVLADDVHPLFLLAPRPLFVGTLGVWGGLTIGGSARY WIDPRAA AALTAAWAGLGTTAAGDSFSKACPRHRRFCWSAVESPPPRYAPEDAERPTDHGPLLPS THHQRS PRVCGDGAARPENI WVPWTFAGALALAACAARGWWERS *
Gene matched: gi | 136909 | sp | P10227 | UL43_HSV11 Gene name: MEMBRANE PROTEIN UL43. gi | 73
[SEQ ID NO: 19]
ORF # = 2 from Contig 102 ORF start site = 2996 ORF end site = 1584 ORF sequence:
MAHLPGGAAAAPLSEDAIPSPRERTEDWPPCQIVLQGAELNGILQAFAPLRTSLLDSLLV VGDRGILVHNAIFGEQVFLPLDHSQFSRYR GGPTAAFLSLVDQKRSLLSVFRANQYPDL RRVELTVTGQAPFRTLVQRIWTTASDGEAVELASETLMKRELTSFAVLLPQGDPDVQLRL TKPQLTKWNAVGDETAKPTTFELGPNGKFSVFNARTCVTFAAREEGASSSTSAQVQILT SALKKAGQAAANAKTVYGENTHRTFSVWDDCSMRAVLRRLQVGGGTLNFFLTADVPSVC VTATGPNAVSAVFLLKPQRVCLN LGRTPGSSTGSLASQDSRAGPTDSQDFSSEPDAGDR GAPEEEGLEGQARVPPAFPEPPGTKRRHAGAEWPADDATKRPKTGVPAAPTRAESPPLS ARYGPEAAEGGGDGGRYAWYFRDLQTGDASPSPLSAFRGPQRPPYGFGLP*
Gene matched: gi | 136905 | sp | P10226 |VPAP_HSV11 Gene name: POLYMERASE ACCESSORY PROTEIN
[SEQ ID NO: 20]
ORF # = 3 from Contig 102
ORF start site = 3490 ORF end site = 4152
ORF sequence:
MGLFGMMKFAQTHHLVKRRGLRAPEGYFTPIAVDL NVMYTLWKYQRRYPSYDREAITL
HCLCSMLRVFTQKSLFPIFVTDRGVECTEPWFGAKAILARTTAQCRTDEEASDVDASPP
PFPHHRLQAQFPPFQHAPPRARLRPGGPGERGPPAQARRPPGARPRSRPCAWLTCSVSAF CGR GTPTSTRVSWRPTTPARTSIIPTRSRTCIPRIPISC*
Gene matched: gi | 549322 | sp| P36699 |VHS_HSV2G Gene name: VIRION HOST SHUTOFF PROTEIN.
[SEQ ID NO: 21]
ORF # = 4 from Contig 102
ORF start site = 4122 ORF end site = 4970
ORF sequence :
VHTTDTDLLLMGCDIVLDISTGYIPTIHCRDLLQYFKMSYPQFLALFVRCHTDLHPNNTY
ASVEDVLRECHWTAPSRSQARRGARRERANSRSLESMPTLTAAPVGLETRISWTEILAQQ
IAGEDDYEEDPPLQPPDVAGGPRDGARSSSSEILTPPELVQVPNAQRVAEHRGYVAGRRR HVIHDAPEALDWLPDPMTIAELVEHRYVKYVISLISPKERGPWTLLKRLPIYQDLRDEDL
ARSIVTRHITAPDIADRFLAQLWAHAPPPAFYKDVLAKF DE*
Gene matched: gi | 549322 | sp| P36699 |VHS_HSV2G Gene name: VIRION HOST SHUTOFF PROTEIN.
[SEQ ID NO:22]
ORF # = 5 from Contig 102
ORF start site = 6266 ORF end site = 5253 ORF sequence:
MDPAVSPASTDPLDTHASGAGAAPIPVCPTPERYFYTSQCPDINHLRSLSILNR LETEL VFVGDEEDVSKLSEGELGFYRFLFAFLSAADDLVTENLGGLSGLFEQKDILHYYVEQECI EWHSRVYNIIQLVLFHNNDQARRAYVARTINHPAIRVKVDWLEARVRECDSIPEKFILM ILIEGVFFAASFAAIAYLRTNNLLRVTCQSNDLISRDEAVHTTASCYIYNNYLGGHAKPE AARVYRLFREAVDIEIGFIRSQAPTDSSILSPGALAAIENYVRFSADRLLGLIHMQPLYS APAPDASFPLSLMSTDKHTNFFECRSTSYAGAWNDL*
Gene matched: gi | 132624 | sp| P03174 |RIR2_HSV23 Gene name: RIBONUCLEOSIDE-DIPHOSPHATE R
[SEQ ID NO: 23] ORF # = 6 from Contig 102 ORF start site = 9861 ORF end site = 6319 ORF sequence : VIRRPVRPFGRTAHPASHGPAAVSVHRVRATVTLVPMANRPAASALAGARSPSERQEPRE PEVAPPGGDHVFCRKVSGVMVLSSDPPGPAAYRISDSSFVQCGSNCSMIIDGDVARGHLR DLEGATSTGAFVAISNVAAGGDGRTAWALGGTSGPSATTSVGTQTSGEFLHGNPRTPEP QGPQAVPPPPPPPFP GHECCARRDARGGAEKDVGAAESWΞDGPSSDSETEDSDSSDEDT GSGSETLSRSSSI AAGATDDDDSDSDSRSDDSVQPDVWRRRWSDGPAPVAFPKPRRPG DSPGNPGLGAGTGPGSATDPRASADSDSAAHAAAPQAEVAPVLDSQPTVGTDPGYPVPLE LTPENAEAVARFLGDAVDREPALMLEYFCRCAREESKRVPPRTFGSAPRLTEDDFGLLNY ALAEMRRLCLDLPPVPPNAYTPYHLREYATRLVNGFKPLVRRSARLYRILGILVHLRIRT REASFEEWMRSKEVDLDFGLTERLREHEAQLMILAQALNPYDCLIHSTPNTLVERGLQSA LKYEEFYLKRFGGHYMESVFQMYTRIAGFLACRATRGMRHIALGRQGS WEMFKFFFHRL YDHQIVPSTPAMLNLGTRNYYTSSCYLVNPQATTNQATLRAITGNVSAILARNGGIGLCM QAFNDASPGTASIMPALKVLDSLVAAHNKQSTRPTGACVYLEPWHSDVRAVLRMKGVLAG EEAQRCDNIFSALWMPDLFFKRLIRHLDGEENVTWSLFDRDTSMSLADFHGEEFEKLYEH LEAMGFGETIPIQDLAYAIVRSAATTGSPFIMFKDAVNRHYIYNTQGAAIAGSNLCTEIV HPSSKRSSGVCNLGSVNLARCVSRRTFDFGMLRDAVQACVLMVNIMIDSTLQPTPQCARG HDNLRSMGIGMQGLHTACLKMGLDLESAEFRDLNTHIAEVMLLAAMKTSNALCVRGARPF SHFKRSMYRAGRFHWERFSNASPRYEGEWEMLRQSMMKHGLRNSQFIALMPTAASAQIΞD VSEGFAPLFTNLFSKVTRDGETLRPNTLLLKELERTFGGKRLLDAMDGLEAKQWSVAQAL PCLDPAHPLRRFKTAFDYDQELLIDLCADRAPYVDHSQSMTLYVTEKADGTLPASTLVRL LVHAYKRGL TGMYYCKVRKATNΞGVFAGDDNIVCTSCAL*
Gene matched: gi | 330199
Gene name: (M12700) ribonucleotide reductase large subunit (
[SEQ ID NO: 24] ORF # = 7 from Contig 102 ORF start site = 11144 ORF end site = 10323 ORF sequence: VRRRLRCARRRRGGPGPHHDQLRRDAGRGAAGPVFRMPARHGPHARVSPRGHAVFRGASV WTQDELASVTAVCSGPQEATHTGHPGRPCSAVTIPACAFVDLDAELCLGGPGAAFLYLV FTYRQCRDQELCCVYWKSQLPPRGLEAALERLFGRLRITNTIHGAEDMTPLPPNRNVDF PLAVLAASSQSPRCSASQVTNPQFVDRLYRWQPDLRGRPTARTCTYAAFAELGVMPDNSP RCLHRTERFGAVGVPWILEGWWRPGG RACA*
Gene matched: gi | 139176 | sp | P22486 | VP19_HSV2G Gene name: CAPSID ASSEMBLY AND DNA MATU
[SEQ ID NO:25] ORF # = 8 from Contig 102 ORF start site = 11722 ORF end site = 10667 ORF sequence:
MKTKPLPTAPMA AESAVETTTSPRELAGHAPLRRVLRPPIARRDGPVLLGDRAPRRTAS TM LLGIDPAESSPGTRATRDDTEQAVDKILRGARRAGGLTVPGAPRYHLTRQVTLTDLC QPNAERAGALLLALRHPTDLPHLARHRAPPGRQTERLAEAWGQLLEASALGSGRAESGCA RAGLVSFNFLVAACAAAYDARDAAEAVRAHITTNYGGTRAGARLDRFSECLRAMVHTHVF PHEVMRFFGGLVSWSHRTS LASPPSAADPRRPHTPATRAGPVRPLPSRPAPL TWTPSC A GALGRRSCTWFSPTDSAGTRSSVACTWSRASSPRAD RRPSSGCSGASG*
Gene matched: gi | 139176 | sp | P22486 |VP19_HSV2G Gene name: CAPSID ASSEMBLY AND DNA MATU
[SEQ ID NO:26] = Contig ID 103
[SEQ ID NO: 27]
ORF # = 1 from Contig 102 ORF start site = 3308 ORF end site = 693 ORF sequence : MAETMNVATCTHQTHHAARAPGATSAPGAASGDPLGARRPIGDDECEQYTSSVSLARMLY GGDLAE VPRVHPKTTIERQQHGPVTFPDASAPTARCVTWRAPMGSGKTTALIRWLGEA IHSPDTSVLWSCRRSFTQTLATRFAESGLPDFVTYFSSTNYIMNDRPFHRLIVQVESLH RVGPNLLNNYDVLVLDEVMSTLGQLYSPTMQQLGRVDALMLRLLRTCPRIIAMDATANAQ LVDFLCSLRGEKNVHWIGEYAMPGFSARRCLFLPRLGPEVLQAALRPPGPAGGAPPPDA PPDATFFGEVEARLAGGDNVCIFLSTVSFAEWARFCRQFTDRVLLLHSLTPPGEVTTWG RYRWIYTTWTVGLSFDPPHFDSMFAYVKPMNYGPDMVSVYQSLGRVRTLRKGELLIYM DGSGARSEPVFTPMLLNHWSASGQWPAQFSQVTNLLCRRFKGRCDASHADAAQARGSRI YSKFRYKHYFERCTLACLADSLNILHMLLTLNCMHVRFWGHDAALTPRNFCLFLRGIHFD ALRAQRDLRELRCQDPDTSLSAQAAETEEVGLFVEKYLRPDVAPAEWALMRGLNSLVGR TRFIYLVLLEACLRVPMAAHSSAIFRRLYDHYATGVIPTINAAGELELVALHPTLNVAPV ELFRLCSTMAACLQWDSMAGGSGRTFSPEDVLELLNPHYDRYMQLVFELGHCNVTDGPL LSEDAVKRVADALSGCPPRGSVSETEHALSLFKIIWGELFGVQLAKSTQTFPGAGRVKNL TKRAIVELLDAHRIDHSACRTHRQLYALLMAHKREFAGARFKLRAPAWGRCLRTHASGAQ PNTDIILEAALSELPTEAWPMMQGAVNFSTL*
Gene matched: gi | 136806 | sp| P10193 |UL09_HSV11 Gene name: ORIGIN OF REPLICATION BINDIN
[SEQ ID NO: 28]
ORF # = 2 from Contig 103 ORF start site = 3160 ORF end site = 4590 ORF sequence: VYCSHSSSPMGRRAPRGSPEAAPGADVAPGARAA WVWCVQVATFIVSAICWGLLVLAS VFRDRFPCLYAPATSYAEANATVEVRGGVAVPLRLDTQSLLATYAITSTLLLAAAVYAAV GAVTSRYERALDAARRLAAARMAMPHATLIAGNVCAWLLQITVLLLAHRISQLAHLIYVL HFACLVYLAAHFCTRGVLSGTYLRQVHGLIDPAPTHHRIVGPVRAVMTNALLLGTLLCTA AAAVSLNTIAALNFNFSAPSMLICLTTLFALLWSLLLWEGVLCHYVRVLVGPHLGAIA ATGIVGLACEHYHTGGYYWEQQ PGAQTGVRVALALVAAFALAMAVLRCTRAYLYHRRH HTKFFVRMRDTRHRAHSALRRVRSSMRGSRRGGPPGDPGYAETPYASVSHHAEIDRYGDS DGDPIYDEVAPDHEAELYARVQRPGPVPDAEPIYDTVEGYAPRSAGEPVYSTVRR *
Gene matched: gi | 136810 | sp | P04288 |VIMP_HSV11 Gene name: PROBABLE INTEGRAL MEMBRANE P
[SEQ ID NO: 29] ORF # = 3 from Contig 103 ORF start site = 6853 ORF end site = 4784 ORF sequence: MAAAATPGAKRPADPARDPDSPPKRPRPNSLDLATVFGPRPAPPRPTSPGAPGSHWPQSP PRGQPDGGAPGEKARPASPALSEASSGPPTPDIPLSPGGAHAIDPDCSPGPPDPDPM SA SAIPNALPPHILAETFERHLRGLLRGVRSPLAIGPLWARLDYLCSLWSLEAAGMVDRGL GRHL RLTRRAPPSAAEAVAPRPLMGFYEAATQNQADCQL ALLRRGLTTASTLRWGAQG PCFSSQ LTHNASLRLDAQSSAVMFGRVNEPTARNLLFRYCVGRADAGVNDDADAGRFVF HQPGDLAEENVHACGVLMDGHTGMVGASLDILVCPRDPHGYLAPAPQTPLAFYEVKCRAK YAFDPADPGAPAASAYEDLMARRSPEAFRAFIRSIPNPGVRYFAPGRVPGPEEALVTQDR DWLDSRAAGEKRRCSAPDRALVELNSGWSEVLLFGVPDLERRTISPVAWSSGELVRREP IFANPRHPNFKQILVQGNVPRQPLSRLPPATAPGDVPRQAPRGRGGGRDVPPGGRPRSAR RA RGPRTRQGIDPPGPGRSDRPDHHPRPRRAGDIPGHPAKQPPGLRRYARQVMGLAFSG ARPCCCRHNVIITDGGEWSLTAHEFDWDIESEEEGNFYVPPDMRWTRAPGPQYRRAS DPPSRHTRRRDPDVARPPATLTPPLΞDSE*
Gene matched: gi | 119694 | sp | P06489 | EXON_HSV2 Gene name: ALKALINE EXONUCLEASE. gi 13302
[SEQ ID NO: 30] ORF # = 4 from Contig 103 ORF start site = 5313 ORF end site = 4990 ORF sequence:
VTFLGRHRAGAEEGVTFRLEDGRGAPAGRGGAPGPAKASILPDQAVPIALIITPVRVEPG IYRDIRRNSRLAFDDTLAKL ASRSPGRGPAAADTTSSSPTAGRSSR*
Gene matched: gi | 330252
Gene name: (M11854) 1.9kD ORF [Herpes simplex virus type 2]
[SEQ ID NO: 31]
ORF # = 5 from Contig 103 ORF start site = 8477 ORF end site = 6894 ORF sequence : VGGRRPGGRMDESGRQRPASHVAADISPQGAHRRSFKAWLASYIHSLSRRASGRPSGPSP RDGAVSGARPGSRRRSSFRERLRAGLSR RVSRSSRRRSSPEAPGPAAKLRRPPLRRSET AMTSPPSPPSHILSLARIHKLCIPVFAVNPALRYTTLEIPGARSFGGSGGYGEVQLICEH KLAVKTIREKEWFAVELVATLLVGECAFCGGRTHDIRGFITPLGFSLQQRQIVFPAYDMD LGKYIGQLASLRATTPSVATALHHCFTDLARAWFLNTRCGISHLDIKCANVLVMLRSDA VSLRRAVLADFSLVTLNSNSTISRGQFCLQEPDLESPRGFGMPAALTTANFHTLVGHGYN QPPELLVKYLNNERAEFNNRPLKHDVGLAVDLYALGQTLLELLVSVYVAPSLGVPVTRVP GYQYFNNQLSPDFAVALLAYRCVLHPALFVNSAETNTHGLAYDVPEGIRRHLRNPKIRRA FTEQCINYQRTHKAVLSSVSLPPELRPLLVLVSRLCHANPAARHSLS*
Gene matched: gi | 125628 | sp | P04290 | KR2_HSV11 Gene name: PROBABLE SERINE/THREONINE-PRO
[SEQ ID NO: 32]
ORF # = 6 from Contig 103
ORF start site = 8113
ORF end site = 8352
ORF sequence : MAVSDLRRGGRLSLAAGPGASGDERRRDERLTRHRDSPARSRSRKLDRRRDPGRAPETAP SRGEGPLGRPDARRLRECM*
Gene matched: gi | 93505 |pir | | B34768
Gene name: ORF5 protein - Orf virus (strain NZ2)
[SEQ ID NO: 33]
ORF # = 7 from Contig 103 ORF start site = 8863
ORF end site = 8204
ORF sequence:
MSRDASHAALRRRLAETHLRAEVYRDQTLQLHREGVSTQDPRFVGAFMAAKAAHLELEAR
LKSRARLEMMRQRATCVKIRVEEQAARRDFLTAHRRYLDPALSERLDAADDRLADQEEQL EEAAANASL GDGDLADGWMSPGDSDLLVMWQLTSAPKVHTDAPSRPGSRPTYTPSAAGR
PDAQAAPPPETAPSPEPAPGPAADPAΞGSGFARDCPDGE*
Gene matched: gi | 136823 | sp | P04291 |UL14_HSV11 Gene name: HYPOTHETICAL UL14 PROTEIN, g
[SEQ ID NO:34]
ORF # = 8 from Contig 103
ORF start site = 8749 ORF end site = 10242 ORF sequence:
VYSRPPGVAAGSGPCTPRPGGASRPNVGAGPRG RLGSSRRPRARPTSDSFAPTPLTSAA PASPAMFGQQLASDVQQYLERLEKQRQQKVGVDEASAGLTLGGDALRVPFLDFATATPKR HQTWPGVGTLHDCCEHSPLFSAVARRLLFNSLVPAQLRGRDFGGDHTAKLEFLAPELVR AVARLRFRECAPEDAVPQRNAYYSVLNTFQALHRSEAFRQLVHFVRDFAQLLKTSFRASS LAENTGPPKKRAKVDVATHGQTYGTLELFQKMILMHATYFLAAVLLGDHAEQVNTFLRLV FEIPLFSDTAVRHFRQRATVFLVPRRHGKTWFLVPLIALSLASFRGIKIGYTAHIRKATE PVFDEIDACLRG FGSSRVDHVKGETISFSFPDGSRSTIVFASSHNTNVSTPSSRGACFP GAALPEIDRQTNTARRECGTTRPQPPPPWRGEALLFICNRTMRLWPRPARPRGSSLQTGG YTMTERRGATRRWSGG*
Gene matched: gi | 74013 |pir| |WMBE31
Gene name: 38K protein - human herpesvirus 1 gi | 5
[SEQ ID NO: 35] ORF # = 9 from Contig 103 ORF start site = 11332 ORF end site = 10115 ORF sequence:
VFLFHRSPTPPPKSYTR PLCFWCVSGPFPTTNMAQRAVWRPQGTPGPPGAAAPPGHRGA
PPDARAPDPGPEADLVARIANSVFVWRWRGDERLKIFRCLTVLTEPLCQVALPDPDPER
ALFCEIFLYLTRPKALRLPSNTFFAIFFFNRERRYCATVHLRSVTHPRTPLLCTLAFGHL
EAASPPEETPDPAAEQLADEPVAHELDGAYLVPTEPPPNPGACCALGPGAW HLPGGRIY
C AMDDDLGSLCPPGSRARHLG LLSRITDPPGGGGACAPTAHIDSANAL RAPAVAEAC
PCVAPCMWSNMAQRTLAVRGDASLCQLLFGHPVDAVILRQATRRPRITAHLHEVWGRDG
AESVIRPTSAGWRLCVLSSYTSRLFATSCPAVARAVARASSSDYK*
Gene matched: gi | 136829 | sp | P10200 |UL16_HSV11 Gene name: PROTEIN UL16. gi | 73879 | pir | |
[SEQ ID NO: 36]
ORF # = 10 from Contig 103
ORF start site = 12706
ORF end site = 11336
ORF sequence : FLTGYFRVHGIDKLDQRAVQDVTRRHPVRARPKHAASGVXSGLRQGALVHXAVSGGALGA
SDAEAVLAGLEPPGGGRFATPGGPRAAGDDVLNDVLTLVPGTAKPRSLVE LDRGWEPLA
GGDRPDWLWSRRSISWLRHHYGTKQRFVWSYKNSVA GGRRTRPPLLSSYLATALTEA
CAAERWRPHQLSPAAQTALLRRFPALEGPLRHPRPVLQPFDIAAEVAFVARIQIACLRA
LGHSIRAALQGGPRIFQRLRYDFGPHQSE LGEVTRRFPVLLENLMRALEGTAPDAFFHT AYALAVLAHLGGQGGRGRRRRLVPLSDDIPARFADSDAHYAFDYYSTSGDTLRLTNRPIA
WIDGDVNGREQSKCRFMEGSPSTAPHRVCEQYLPGESYAYLCLGFNRRLCGLWFPGGF
AFTINTAAYLSLADPVARAVGLRFCRGAGTGPGLVR*
Gene matched: gi | 136835 | sp | P10201 | UL17_HSV11 Gene name: PROTEIN UL17. gi | 73875 | pir | |
[SEQ ID NO: 37] = Contig ID 104
[SEQ ID NO: 38]
ORF # = 1 from Contig 104
ORF start site = 3027 ORF end site = 262
ORF sequence :
VSGRAGDPAGLPAPRGGPT PMPSGGPPPEVKAGLRADMWGVMGQYREAXEHQTPDTETV
VAGMHPALVWLKTMFXDAPETPVLVQFFSDHAPTIAKAVSNAINAGSAAVATASPAATV
DAAVRAHGALADAVSALGAAARDPASPLSFLAALADSAAGYVKATRLALEARGAIDKLTT LGSAAADLVFHARRACAQPEGDHAALIDAAARATTAARESLAGHEAGFGGLLHAEGTAGD HSPSGRALQELGKVIGATRRRAEELEAAVADLTGKMAAQRARGSSERWAAGVEAALDRVE NRAEFDWELRRLQALAGTHGYNPRDFRKRAEQALAANAEAVTLALDTAFAFNPYTPENQ RHPMLPPLAAIHRLG SAAFHAAAETYADMFRVDAEPLARLLRIAEGLLEMAQAGDGFID YHEAVGRLADDMTSVPGLRRYVPFFQHGYADYVELRDRLDAIRADVHRALGGVPLDLAAA AEQISAARNDPEATAELVRTGVTLPCPSEDALVACAAALERVDQSPVKNTAYAEYVAFVT RQDTAETKDAWRAKQQRAEATERVMAGLREALAARERRAQIEAEGLANLKTMLKWAVP ATVAKTLDQARSVAEIADQVEVLLDQTEKTRELDVPAVI LEHAQRTFETHPLSAARGDG PGPLARHAGRLGALFDTRRRVDALRRSLEEAEAE DEVWGRFGRVRGGA KSPEGFRAMH EQLRALQDTTNTVSGLRAQPAYERLSARYQGVLGAKGAERAEAVEELGARVTKHTALCAR LRDEWRRVP EMNFDALGRLLAEFDAAAADLAPWAVEEFRGARELIQYRMGLYSAYARA GGKALFLFFFFPPPLSSFLPHFHFFIHHHHSFTKFFTSSSLHSYHLFPSSIYSIPSISPL YPHSSLSFPSSQFLHIFLSLP*
Gene matched: gi | 135576 | sp | P10220 |TEGU_HSV11 Gene name: LARGE TEGUMENT PROTEIN (VIRI
[SEQ ID NO: 39]
ORF # = 2 from Contig 104 ORF start site = 3914
ORF end site = 2901
ORF sequence :
VMPVAPPPRGAGGRAPCPPALGPEAIHARLEDVRIQARRAIESAIKEYFHRGAVYSAKAL
QASDSHDCRFHVASAAWPMVQLLESLPAFDQHTRDVAQRAALPPPPPLATSPQAILLRD LLQRGQTLDAPEDLAA LSVLTDAATQGLIERKPLEELARSIHGINDQQARRSSGLAELQ
RFDALDAALAQQLDSDAAFVPATGPAPYVDGGGLSPEATRMAEDALRQARAMEAAKMTAE
LAPEARSRLRERAHALEAMLNDARERAKVAHDAREKFLHKLQGVLRPLPDFVGLKACPAV
LATLRASLPRGVDRPGRCRPGAPPRKSRRGCGRTCGG*
Gene matched : gi | 221757
Gene name : (D10879 ) virion protein [Herpes simplex virus typ
[SEQ ID NO: 40]
ORF # = 3 from Contig 104 ORF start site = 6099 ORF end site = 3643 ORF sequence: WTGVRNQFATDLEPGGSVSCMRSSLSFLSLLFDVGPRDVLSAEAIEGCLVEGGEWTRAA AGSGPPRMCSIIELPNFLEYPAARGGLRCVFSRVYGEVGFFGEPTAGLLETQCPAHTFFA GP AMRPLSYTLLTIGPLGMGLYRDGDTAYLFDPHGLPAGTPAFIAKVRAGDVYPYLTYY AHDRPKVR AGAMVFFVPSGPGAVAPADLTAAALHLYGASETYLQDEPFVERRVAITHPL RGEIGGLGALFVGWPRGDGEGSGPWPALPAPTHVQTPRADRPPEAPRGASGPPNTPQA GHPNRPPDDV AAALEGTPPAKPSAPDAAASGPPHAAPPPQTPAGDAAEEAEDLRVLEVG AVPVGCHRARYSTGLPKRRRPT TPPSSVEDLTSGERPAPKAPPAKAKKKSAPKKKAPVA AEVPASSPTPIAATVPPAPDTPPQSGQGGGDDGPASPSSPSVLETLGARRPPEPPGADLA QLFEVHPNVAATAVRLAARDAALAREVAACSQLTINALRSPYPAHPGLLELCVIFFFERV LAFLIENGARTHTQAGVAGPAAALLDFTLRMPPRKTAVGDFLASTRMSLADVAAHRPLIQ HVLDKNSQIGRLALAKLVLVARDFIRETDAFYGDLADLDLQLRAAPPANLYARLGKWLLE RSRAHPNTLFAPATPTHPEPLLHRIQALAHFARGKKMRVEAEAREMREALYALARGVYSV SQRAGPPDRDARCPPPPGRRRQGPVPARPGPRGHPCAAGGRADPGPPGDRKRDQGVLPPG SRIQREGPAGQRQPRLSVSRGLGRGRAHGPVAGIATGL*
Gene matched: gi | 221757
Gene name: (D10879) virion protein [Herpes simplex virus typ
[SEQ ID NO: 41] ORF # = 4 from Contig 104
ORF start site = 6751
ORF end site = 6269
ORF sequence:
MNAHFANEVQYDLTRDPSSPASLIHVIISSECLAAAGVPLSALVRGRPDGGAAANFRVET QTRPHAPGDCTPWRSAFAAYVPADAVGAILAPVIPAHPDLLPRVPSAGGLFVSLPVACDA
QGVYDPYTVAALRLAWGPWATCARVLLFSYDELTRYRVCG*
Gene matched: gi | 136835 | sp | P10201 |UL17_HSV11 Gene name: PROTEIN UL17. gi | 73875 | pir | |
[SEQ ID NO: 42] ORF # = 5 from Contig 104
ORF start site = 6781
ORF end site = 8052
ORF sequence :
VPEGAWVGGACARPRGPRAHVRLYAVCFVCPQGIRGQDFNLLFVDEANFIRPDAVQTIMG FLNQANCKIIFVSSTNTGKASTSFLYNLRGAADELLNWTYICDDHMPRWTHTNATACS
CYILNKPVFITMDGAVRRTADLFLPDSFMQEIIGGQARETGDDRPVLTKSAGERFLLYRP
STTTNSGLMAPELYVYVDPAFTANTRASGTGIAWGRYRDDFIIFALEHFFLRALTGSAP
ADIARCWHSLAQVLALHPGAFRSVRVAVEGNSSQDSAVAIATHVHTEMHRILASAGANG
PGPELLFYHCEPPGGAVLYPFFLLNKQKTPAFEYFIKKFNSGGVMASQELVSVTVRLQTD PVEYLSEQLNNLIETVSPNTDVRMYSGKRNGAADDLMVAVIMAIYLAAPTGIPPAFFPI
RTS*
Gene matched: gi | 139646 | sp | P04295 |VTER_HSV11 Gene name: PROBABLE DNA PACKAGING PROTE
[SEQ ID NO: 43] ORF # = 6 from Contig 104 ORF start site = 9483 ORF end site = 8392 ORF sequence:
VLLSPAPPPLPHGRCPPSLFHHRPGCVALSGPPAPPRSGVSRPGAMITDCFEADIAIPSG ISRPDAAALQRCEGRWFLPTIRRQLALADVAHESFVSGGVSPDTLGLLLAYRRRFPAVI TRVLPTRIVACPVDLGLTHAGTVNLRNTSPVDLCNGDPVSLVPPVFEGQATDVRLESLDL TLRFPVPLPTPLAREIVARLVARGIRDLNPDPRTPGELPDLNVLYYNGARLSLVADVQQL ASVNTELRSLVLNMVYSITEGTTLILTLIPRLLALSAQDGYVNALLQMQSVTREAAQLIH PEAPMLMQDGERRLPLYEALVAWLAHAGQLGDILALAPAVRVCTFDGAAWQSGDMAPVI RYP*
Gene matched: gi | 139191 | sp | P10202 |VP23_HSV11 Gene name: CAPSID PROTEIN VP23. gi|7387
[SEQ ID NO: 44] ORF # = 7 from Contig 104 ORF start site = 13917 ORF end site = 9727 ORF sequence:
VWEGLGLPELGLMEPANPPRNPMAAPARDPPGYRYAAAMVPTGSILSTIEVASHRRLFDF FARVRSDENSLYDVEFDALLGSYCNTLSLVRFLELGLSVACVCTKFPELAYMNEGRVQFE VHQPLIARDGPHPVEQPVHNYMTKVIDRRALNAAFSLATEAIALLTGEALDGTGISLHRQ LRAIQQLARNVQAVLGAFERGTADQMLHVLLEKAPPLALLLPMQRYLDNGRLATRVARAT LVAELKRSFCDTSFFLGKAGHRREAIEA LVDLTTATQPSVAVPRLTHADTRGRPVDGVL VTTAAIKQRLLQSFLKVEDTEADVPVTYGEMVLNGANLVTALVMGKAVRSLDDVGRHLLE MQEEQLEANRETLDELESAPQTTRVRADLVAIGDRLVFLEALEKRIYAATNVPYPLVGAM DLTFVLPLGLFNPAMERFAAHAGDLVPAPGHPEPRAFPPRQLFF GKDHQVLRLSMENAV GTVCHPSLMNIDAAVGGVNHDPVEAANPYGAYVAAPAGPGADMQQRFLNAWRQRLAHGRV RWVAECQMTAEQFMQPDNANLALELHPAFDFFAGVADVELPGGEVPPAGPGAIQATWRW NGNLPLALCPVAFRDARGLELGVGRHAMAPATIAAVRGAFEDRSYPAVFYLLQAAIHGSE HVFCALARLVTQCITSYWNNTRCAAFVNDYSLVSYIVTYLGGDLPEECMAVYRDLVAHVE ALAQLVDDFTLPGPELGGQAQAELNHLMRDPALLPPLVWDCDGLMRHAALDRHRDCRIDA GGHEPVYAAACNVATADFNRNDGRLLHNTQARAADAADDRPHRPADWTVHHKIYYYVLVP AFSRGRCCTAGVRFDRVYATLQNMWPEIAPGEECPSDPVTDPAHPLHPANLVANTVNAM FHNGRVWDGPAMLTLQVLAHNMAERTTALLCSAAPDAGANTASTANMRIFDGALHAGVL LMAPQHLDHTIQNGEYFYVLPVHALFAGADHVANAPNFPPALRDLARHVPLVPPALGANY FSSIRQPWQHARESAAGENALTYALMAGYFKMSPVALYHQLKTGLHPGFGFTWRQDRF VTENVLFSERASEAYFLGQLQVARHETGGGVSFTLTQPRGNVDLGVGYTAVAATATVRNP VTDMGNLPQNFYLGRGAPPLLNNAAAVYLRNAWAGNRLGPAQPLPVFGCAQVPRRAGMD HGQDAVCEFIATPVATDINYFRRPCNPRGRAAGGVYAGDKEGDVIALMYDHGQSDPARPF AATA P ASQRFSYGDLLYNGAYHLNGASPVLSPCFKFFTAADITAKHRCLERLIVETGS AVSTATAASDVQFKRPPGCRELVEDPCGLFQEAYPITCASDPALLRSARDGEAHARETHF TQYLIYDASPLKGLSL*
Gene matched: gi | 137571 | sp | P06491 |VCAP_HSV11 Gene name: MAJOR CAPSID PROTEIN (MCP) (
[SEQ ID NO: 45] ORF # = 8 from Contig 104 ORF start site = 14832 ORF end site = 14164 ORF sequence:
MTMRDDVPLLDRELVYEAACGGEDGELPLDEQFSLSSYGTSDFFVSSAYSRLPPHTQPVF SKRVVMFAWSFLVLKPLELVAAGMYYG TGRAVAPACIIAAVLAYYVTWLARALLLYVNI KRDRLPLSPPVFWGLCVIMGGAALCALVAAAHETFSPDGLFH ITASQLLPRTDPLRARS LGIACAAGAAMWVAAADCFAAFTNFFLARF TRAILKAPVAF*
Gene matched: gi | 136841 | sp| P10204 |UL20_HSV11 Gene name: MEMBRANE PROTEIN UL20. gi | 73
[SEQ ID NO: 46] ORF # = 9 from Contig 104 ORF start site = 15168 ORF end site = 17081 ORF sequence :
VGRQGERWVGGGNEKNTQRATSGMRPELSLKGRPCVTEAWCPSTDAAIHSGGSSSVRPQ PYARAARARATHGSRSRHRQPLLPPPSSHHPTIPPPPSPPRGSPAMELTYATTLHHRDW FYVTADRNRAYFVCGGSVYSVGRPRDSQPGEIAKFGLWRGTGPKDRMVANYVRSΞLRQR GLREVRPVGEDEVFLDSVCLLNPNVSSERDVINTNDVEVLDECLAEYCTSLQTSPGVLVT GVRVRARDRVIELFEHPAIVNISSRFAYTPSPYVFALAQAHLPRLPSSLEPLVSGLFDGI PAPRQPLDARDRRTDWITGTRAPRPMAGTGAGGAGAKRATVSEFVQVKHIDRWSPSVS SAPPPSAPDASLPPPGLQEAAPPGPPLRELW VFYAGDRALEEPHAESGLTREEVRAVHG FREQAWKLFGSVGAPRAFLGAALALSPTQKLAVYYYLIHRERRMSPFPALVRLVGRYIQR HGLYVPAPDEPTLADAMNGLFRDALAAGTVAEQLLMFDLLPPKDVPVGSDARADSAALLR FVDSQRLTPGGSVSPEHVMYLGAFLGVLYAGHGRLAAATHTARLTGVTSLVLTVGDVDRM SAFDRGPAGAAGRTRTAGYLDALLTVCLARAQHGQSV*
Gene matched: gi | 136845 | sp | P10205 |UL21_HSV11 Gene name: PROTEIN UL21. gi | 73866 |pir | |
[SEQ ID NO: 47] ORF # = 10 from Contig 104 ORF start site = 19116 ORF end site = 17302 ORF sequence:
VYLSPSALKWPVGVWTTGGLAFGCDAALVRARYGKGFMGWISMRDSPPAEIIWPADKT LARVGNPTDENAPAVLPGPPAGPRYRVFVLGAPTPADNGSALDALRRVAGYPEESTNYAQ YMSRAYAEFLGEDPGSGTDARPSLFWRLAGLLASSGFAFVNAAHAHDAIRLSDLLGFLAH SRVLAGLAARGAAGCAADSVFLNVSVLDPAARLRLEARLGHLVAAILEREQSLAAHALGY QLAFVLDSPAAYGAVAPSAARLIDALYAEFLGGRALTAPMVRRALFYATAVLRAPFLAGA PSAEQRERARRGLLITTALCTSDVAAATHADLRAALARTDHQKNLFWLPDHFSPCAASLR FDLAEGGFILDALAMATRSDIPADVMAQQTRGVASVLTR AHYNALIRAFVPEATHQCSG PSHNAEPRILVPITHNASYWTHTPLPRGIGYKLTGVDVRRPLFITYLTATCEGHAREIE PKRLVRTENRRDLGLVGAVFLRYTPAGEVMSVLLVDTDATQQQLAQGPVAGTPNVFSSDV PSVALLLFPNGTVIHLLAFDTLPIATIAPGFLAASALGWMITAALAGILRWRTCVPFL WRRE*
Gene matched: gi | 138316 | sp | P08356 |VGLH_HSV1E Gene name: GLYCOPROTEIN H PRECURSOR, gi
[SEQ ID NO:48] ORF # = 11 from Contig 104
ORF start site = 20070
ORF end site = 19117
ORF sequence :
VSISAGVRGQGWHRISTPPKNGAGRSVLVFGLVLPLCFYPHPTPSFGPRLRQQRASDSLR GAEPLWAVGTDTPPSADWQPGRTTMGPGL WMGVLVGVAGGHDTYWTEQIDP FLHGLG
LARTYWRDTNTGRLWLPNTPDASDPQRGRLAPPGELNLTTASVPMLRWYAERFCFVLVTT
AEFPRDPGQLLYIPKTYLLGRPRNASLPELPEAGPTSRPPAEVTQLKGLSHNPGASALLR
SRAWVTFAAAPDREGLTFPRGDDGATERHPDGRRNAPPPGPPAGTPRHPTTNLSIAHLHN
ASVT LAARGLLRTPGR*
Gene matched: gi | 364588 |prf | |1508243Y Gene name: UL22 gene [Human herpesvirus 1]
[SEQ ID NO: 49] ORF # = 12 from Contig 104 ORF start site = 21285 ORF end site = 20155 ORF sequence : MASHAGQQHAPAFGQAARASGPTDGRAASRPSHRQGASEARGDPELPTLLRVYIDGPHGV GKTTTSAQLMEALGPRDNIVYVPEPMTY QVLGASETLTNIYNTQHRLDRGEISAGEAAV VMTSAQITMSTPYAATDAVLAPHIGGEAVGPQAPPPALTLVFDRHPIASLLCYPAARYLM GSMTPQAVLAFVALMPPTAPGTNLVLGVLPEAEHADRLARRQRPGERLDLAMLSAIRRVY DLLANTVRYLQRGGR RED GRLTGVAAATPRPDPEDGAGSLPRIEDTLFALFRVPELLA PNGDLYHIFAWVLDVLADRLLPMHLFVLDYDQSPVGCRDALLRLTAGMIPTRVTTAGSIA EIRDLARTFAREVGGV*
Gene matched: gi | 59823
Gene name: (V00466) thymidine kinase [Herpes simplex virus ty
[ SEQ ID NO : 50 ] ORF # = 13 from Contig 104 ORF start site = 20968 ORF end site = 22032 ORF sequence : VLRWDVRQGLGGPQHLPVSHRLGDVDDIVARPQGLHQLRGGGGLPHPVGSVYINPQQRG QLRI PAGFGGPLAMARTGRRAAVGRPARTSSLTERRRVLLAGVRSHTRFYKAFAREVREF NATRICGTLLTLMSGSLQGRSLFEATRVTLICEVDLGPRRPDCICVFEFANDKTLGGVCV ILELKTCKSISSGDTASKREQRTTGMKQLRHSLKLLQSLAPPGDKWYLCPILVFVAQRT LRVSRVTRLVPQKISGNITAAVRMLQSLSTYAVPPEPQTRRSRRRVAATARPQRPPSPTR DPEGTAGHPAPPESDPPSPGWGVAAEGGGVLQKIAALFCVPVAAKSRPRTKTE*
Gene matched: gi | 136854 | sp | P10208 |UL24_HSV11 Gene name: PROTEIN UL24. gi | 74056 |pir | |
[SEQ ID NO: 51] ORF # = 14 from Contig 104 ORF start site = 22313 ORF end site = 23893 ORF sequence:
MDPYYPFDALDVWEHRRFIVADSRSFITPEFPRDFWMLPVFNIPRETAAERAAVMQAQRT AAAAALENAALQAAELPVDIERRIRPIEQQVHHIADALEALETAAAAAEEADAARDAEAR GEGAADGAAPSPTAGPAAAEMEVQIVRNDPPLRYDTNLPVDLLHMVYAGRGAAGSSGWF GT YRTIQERTIADFPLTTRSADFRDGRMSKTFMTALVLSLQSCGRLYVGQRHYSAFECA VLCLYLLYRTTHESSPDRDRAPVAFGDLLARLPRYLARLAAVIGDESGRPQYRYRDDKLP KAQFAAAGGRYEHGALATHWIATLVRHGVLPAAPGDVPRDTSTRVNPDDVAHRDDVNRA AAAFLARGHNLFLWEDQTLLRATANTITALAVLRRLLANGNVYADRLDNRLQLGMLIPGA VPAEAIARGASGLDSGAIKSGDNNLEALCVNYVLPLYQADPTVELTQLFPGAGRPVPGRP GGAATGVDEARGGYWGRPPGGARAPHRAGAHQPHPHKHHPCGGDY* Gene matched: gi | 136863 | sp | P10209 |UL25_HSV11 Gene name: VIRION PROTEIN UL25. gi 17406
[SEQ ID NO: 52] ORF # = 15 from Contig 104 ORF start site = 23784 ORF end site = 24071 ORF sequence:
WDMLSGARQAALVRLTALELINRTRTNTTPVGEIINAHDALGIQYEQGLGLLAQQARIG LASNAKRFATFNVGSDYDLLYFLCLGFIPQYLSVA*
Gene matched: gi | 136863 | sp | P10209 |UL25_HSV11 Gene name: VIRION PROTEIN UL25. gi 17406
[SEQ ID NO: 53] ORF # = 16 from Contig 104
ORF start site = 24292
ORF end site = 25638
ORF sequence:
VRVPMASAEMRERLEAPLPDRAVPIYVAGFLALYDSGDPGELALDPDTVRAALPPENPLP INVDHRARCEVGRVLAWNDPRGPFFVGLIACVQLERVLETAASAAIFERRGPALSREER
LLYLITNYLPSVSLSTKRRGDEVPPDRTLFAHVALCAIGRRLGTIVTYDTSLDAAIAPFR
HLDPATREGVRREAAEAELALAGRTWAPGVEALTHTLLSTAVNNMMLRDRWSLVAERRRQ
AGIAGHTYLQASEKFKIWGAESAPAPERGYKTGAPGAMDTSPAASVPAPQVAVRARQVAS
SSSSSSSFPAPADMNPVSASGAPAPPPPGDGSYL IPAFHYNQLVTGQSAPHHPPLTACG LPAAGTVAYGHPGAGPSPHYPPPPAHPYPGYAVRGPQSPGGPDRRAGGGHRRRPPGGWAS
GGRRRPRDPGVGEPPPTRGGAAGVRLRP*
Gene matched: gi | 1224097 Gene name: (U49329) UL26 protease [Human herpesvirus 2 ]
[SEQ ID NO: 54]
ORF # = 17 from Contig 104 ORF start site = 25463
ORF end site = 26221
ORF sequence:
MLFAGPSPLEAQIAALVGAIAADRQAGGLPAAAGDHGIRGSANRRRHEVEQPEYDCGRDE
PDRDFPYYPGEARPEPRPVDSRRAARQASGPHETITALVGAVTSLQQELAHMRARTHAPY GPYPPVGPYHHPHADTETPAQPPRYPAEAVYLPPPHIAPPGPPLSGAVPPPSYPPVAVTP GPAPPLHQPSPAHAHPPPPPPGPTPPPAASLPQPEAPGAEAGALVNASSAAHVNVDTARA ADLFVSQMMGSR*
Gene matched: gi | 1224097
Gene name: (U49329) UL26 protease [Human herpesvirus 2]
[SEQ ID NO: 55] = Contig ID 14
[SEQ ID NO: 56] ORF # = 1 from Contig 14 ORF start site = 665 ORF end site = 787 ORF sequence: VKYQPRKLGKFKFNNLRDCGLYQRSPLQKFARLDIQPLLH"
Gene matched: gi | 76474 |pir| |JQ0950
Gene name: ICP 18.5 protein - infectious laryngot
[SEQ ID NO: 57] = Contig ID 38
[SEQ ID NO: 58]
ORF # = 1 from Contig 38
ORF start site = 273
ORF end site = 43
ORF sequence: VELTAAQGVLPVSVDSTSGDRAQLNNNNNNNDDDYNNKKQLKPLQTQTTLSHSFEVSSGS
PNTEVEIGERTDYLLK*
Gene matched: gi | 1020200 Gene name: (U31782) minor capsid protein L2 [Human papillom
[SEQ ID NO: 59] = Contig ID 50
[SEQ ID NO: 60]
ORF # = 1 from Contig 50
ORF start site = 365
ORF end site = 3
ORF sequence:
MSRRSPRRRGPRRRPRPGGPTVPRPGAFPTADSQMVPAYDSGTAVESAPAASSLLRR LL
VPQADDSDDADYAGNDDAEWANSPPSEGGGKAPEAPHAAPASACPPPPPRKERGQQRPLP
X
Gene matched: gi | 132753 | sp| P28283 |RL1_HSV2H Gene name: NEUROVIRULENCE FACTOR (ICP34.
[SEQ ID NO: 61] = Contig ID 53
[SEQ ID NO: 62]
ORF # = 1 from Contig 53 ORF start site = 754
ORF end site = 380
ORF sequence:
VETAHARMYPDAPPLRLCRGANVRYRVRTRFGPDTLVPMSPREYRRAVLPALDGRAAASG
AGDAMAPGAPDFCEDEAHSHRACARWGLGAPLRPVYVALGRDTVRGGPADLLGPRREFCA RALL*
Gene matched: gi | 124141 | sp | P08392 | ICP4_HSV11 Gene name: TRANS-ACTING TRANSCRIPTIONAL
[SEQ ID NO: 63] = Contig ID 67
[SEQ ID NO: 64]
ORF # = 1 from Contig 67
ORF start site = 487
ORF end site = 26
ORF sequence: VSDGQHQATVXXEVQASEPYIRVANGFGLWPQGGQGTIDTXELHXDTNLDIRSGDEVHY
HVTAGRR GQLLWATQSVTAFSQEDLLDGAIFYRLNGSLRTRDTLIFSMEMGPVHTDATI
QVTVALEGPLAPLKLVRHKKIYVFXGRGS GIL*
Gene matched: gi | 560570 | bbs | 151525 Gene name : chondroitin sulfate proteoglycan NG2=t
[SEQ ID NO: 65]
ORF # = 2 from Contig 67
ORF start site = 353
ORF end site = 511
ORF sequence :
VELXCVNGALTSLRDHKAKTVGHTDVGLRGLHLXXHSGLVLPIGHTCSWIQP*
Gene matched: gi | 1079684
Gene name: (U39205) Lpel4p [Saccharomyces cerevisiae]
[SEQ ID NO: 66] = Contig ID 74
[SEQ ID NO: 67]
ORF # = 1 from Contig 74
ORF start site = 224
ORF end site = 412
ORF sequence: MITGLDNNVCYPITQFAIYNRLTCDKTYRIMPEYAHEAMNVFVNDQVYN LCGSEIPFKY
LK*
Gene matched: gi | 550075 Gene name: (D10935) cephalosporin-C deacetylase [Bacillus su
[SEQ ID NO: 68] = Contig ID 76
[SEQ ID NO: 69] ORF # = 1 from Contig 76 ORF start site = 111 ORF end site = 1 ORF sequence:
MALTEDASSDSPTSAPEKTPLPVSATAMDQAYRYΞXX
Gene matched: gi | 138297 | sp | P13290 |VGLG_HSV2 Gene name: GLYCOPROTEIN G. gi | 419139 | pir
[SEQ ID NO: 70] = Contig ID 82
[SEQ ID NO: 71] ORF # = 1 from Contig 82 ORF start site = 767 ORF end site = 1156 ORF sequence:
VALAPYVNKTVTGDCLPVLDMETGHIGAYWLVDQTGNVADLLRAAAPA SRRTLLPEHA RNCVRPPDYPTPPASE NSL MTPVGNMLFDQGTLVGALDFHGLRSRHPWSREQGAPAPA GDAPAGHGE*
Gene matched: gi | 124135 | sp | P28284 | ICP0_HSV2H Gene name: TRANS-ACTING TRANSCRIPTIONAL
[SEQ ID NO: 72] = Contig ID 87
[SEQ ID NO: 73] ORF # = 1 from Contig 87
ORF start site = 519
ORF end site = 1475
ORF sequence:
MLNDMQWLASSDSEEETEVGISDDDLHRDSTSEAGSTDTEMFEAGLMDAATPPARPPAER QGSPTPADAQGSCGGGPVGEEEAEAGGGGDVCAVCTDEIAPPLRCQSFPCLHPFCIPCMK
TWIPLRNTCPLCNTPVAYLIVGVTASGSFSTIPIVNDPRTRVEAEAAVRSGTAVDFIWTG
NPRTAPRSLSLGGHTVRALSPTPPWPGTDDEDDDPPDGEGGRGSGTGRGSGTGRGSGTGR
GSGTGRGSGGGQALTGGSRLCLPLQPELISRPPPNTSPPGAAVPGPPLVTPPPLLPNLRP
PAPPGTTLTRGPPFLGRGF
Gene matched: gi | 124135 | sp | P28284 | ICP0_HSV2H Gene name: TRANS-ACTING TRANSCRIPTIONAL
[SEQ ID NO: 74] = Contig ID 89
[SEQ ID NO: 75] ORF # = 1 from Contig 89
ORF start site = 259
ORF end site = 615
ORF sequence:
MLADRWRKHTDGN YWFDNSGEMATGWKKIADK YYFNEEGAMKTGWVKYKDT YYLNAK
EGAMVSNAFIHSAGRNRLVLPQTRPNTGRQARIHSRPRWLDYVKIIMECLSNQNPAYY*
Gene matched: gi | 113676 | sp | P06653 |ALYS_STRPN Gene name: AUTOLYSIN (N-ACETYLMURAMOYL-
[SEQ ID NO: 76] = Contig ID 90
[SEQ ID NO: 77]
ORF # = 1 from Contig 90
ORF start site = 507 ORF end site = 2702 ORF sequence :
VKTIKSMDMPVATSFLAPDGTPLQYALCFPAVTDKLGALLMRPEAACVRPPLPTDVLESA PTVTAMYVLTWNRLQLALSDAQAANFQLFGRFVRHRQATWGASMDAAAELYVALVATTL TREFGCRWAQLGWASGAAAPRPPPGPRGSQRHCVAFNENDVLVALVAGVPEHIYNFWRLD LVRQHEYMHLTLERAFEDAAESMLFVQRLTPHPDARIRVLPTFLDGGPPTRGLLFGTRLA DWRRGKLSETDPLAPWRSALELGTQRRDAPALGKLSPAQALAAVΞVLGRMCLPSAALAAL WTCMFPDDYTEYDSFDALLAARLESGQTLGPAGGREASLPEAPHALYRPTGQHVAVLAAA THRTPAARVTAMDLVLAAVLLGAPVWALRNTTAFSRESELELCLTLFDSRPGGPDAALR DWSSDIETWAVGLLHTDLNPIENACLAAQLPRLSALIAERPLADGPPCLVLVDISMTPV AVLWEAPEPPGPPDVRFVGSEATEELPFVATAGDVLAASAADADPFFARAILGRPFDASL LTGELFPGHPVYQRPLADEAGPSAPTAARDPRDLAGGDGGSGPEDPAAPPARQADPGVLA PTFLTDATTGEPVPPRMWAWIHGLEELASEDAGGPTPNPAPALLPPPATDQSVPTSQYAP RPIGPAXTARETRPSVPPQQNTGRVPVAPRXDPRPSPPTPSPPADAAVPPPAFSGFAAAF SAAVPRVRRSRR
Gene matched: gi | 135576 | sp | P10220 |TEGU_HSV11 Gene name: LARGE TEGUMENT PROTEIN (VIRI
[SEQ ID NO:78] = Contig ID 91
[SEQ ID NO:79] ORF # = 1 from Contig 91 ORF start site = 364 ORF end site = 2751 ORF sequence :
VCPPPTGATWQFEQPRRCPTRPEGQNYTEGIAWFKENIAPYKFKATMYYKDVTVSQVW FGHRYSQFMGIFEDRAPVPFEEVIDKINAKGVCRSTAKYVRNNMETTAFHRDDHETDMEL KPAKVATRTSRGWHTTDLKYNPSRVEAFHRYGTTVNCIVEEVDARSVYPYDEFVLATGDF VYMSPFYGYREGSHTEHTSYAADRFKQVDGFYARDLTTKARATSPTTRNLLTTPKFTVAW DWVPKRPAVCTMTKWQEVDEMLRAEYGGSFRFSSDAISTTFTTNLTQYSLSRVDLGDCIG RDAREAIDRMFARKYNATHIKVGQPQYYLATGGFLIAYQPLLSNTLAELYVREYMREQDR KPRNATPAPLREAPSANASVERIKTTSSIEFARLQFTYNHIQRHVNDMLGRIAVAWCELQ NHELTLWNEARKLNPNAIASATVGRRVSARMLGDVMAVSTCVPVAPDNVIVQNSMRVSSR PGTCYSRPLVSFRYEDQGPLIEGQLGENNELRLTRDALEPCTVGHRRYFIFGGGYVYFEE YAYSHQLSRADVTTVSTFIDLNITMLEDHEFVPLEVYTRHEIKDSGLLDYTEVQRRNQLH DLRFADIDTVIRADANAAMFAGLCAFFEGMGDLGRAVGKWMGWGGWSAVSGVSSFMS NPFGALAVGLLVLAGLVAAFFAFRYVLQLQRNPMKALYPLTTKELKTSDPGGVGGEGEEG AEGGGFDEAKLAEAREMIRYMXLVSAMERTEHKARKKGTSALLSSKVTNMVLRKRNKARY SPLHNEDEAGDEDEL*
Gene matched: gi | 138198 | sp | P06763 |VGLB_HSV23 Gene name: GLYCOPROTEIN B PRECURSOR, gi
[SEQ ID NO: 80] = Contig ID 93
[SEQ ID NO: 81] ORF # = 1 from Contig 93 ORF start site = 533 ORF end site = 1678 ORF sequence : VALFVPLRLGWDPQTGLWRVERASWGPPAAPRAALLDVEAKVNFNPLALAARVAEHPGA RLAWARLAAIRNSPQCASSASLAVTITTRTARFAREYTTLAFPPTSKEGAFADLVEVCEV CLRPRGHPHRVTARVLLPRGYNYFVSAGDGFSAPALVALFRQWHTTVHPAPGALAPVFAF LGPGFEVRGGPLQYFAVLGFPGWPPFTVPAAAAAESVRDLLRGAACTHPLCPGGPGPRWA PRSSCPRGHGRPWPRRRPAASCPPFGKRWRGGTPRPPPSNYSTPRRPSGRSGRRGFVSPG SRPSSWPPSRASGRPGCRKPGGGRAWKGWTRWWRPPPRΞPGPGPCWSAWCRTRATPAPRS GSCSAGSWPPSACRSSRRPAR*
Gene matched: gi | 136802 | sp | P10192 |UL08_HSV11 Gene name: PROTEIN UL8. gi | 73829 | pir | |
[SEQ ID NO: 82] ORF # = 2 from Contig 93 ORF start site = 1288 ORF end site = 2448 ORF sequence:
VASEAAGRLLPAFREAVARWHPTATTIQLLDPPAAVGPVWTARFCFSGLQAQLLAALAGL GEAGLPEARGRAGLERLDALVAAAPSEPWARAVLERLVPDACDACPALRQLLGGVMAAVC LQIEQTASSVKFAVCGGTGAAFWGLFNVDPGDADAAHGAIHDARRALEASVRAVLSANGI RPRLAPSLALEGVYTHWTWSQTGAWFWNSRDDTDFLQGFPLRGPAYAAAAEVMRDALRR ILRRPAAGPPEEAVCAARGIMEDACDRFVLDAFGRRLDAEYWSVLTPPGEADDPLPQTAF RGGALLDAEQYWRRWRVCPGGGESVGVPVDLYPRPLVLPPVDCAHHLREILREIQLVFT GVLEGVWGEGGSFVYPFEEKMRFLFP*
Gene matched: gi | 136802 | sp | P10192 | UL08_HSV11 Gene name: PROTEIN UL8. gi | 73829 | pir | | W
[SEQ ID NO: 83] ORF # = 3 from Contig 93 ORF start site = 3631 ORF end site = 2705 ORF sequence :
VRRTRAGASNAGMADPTPADEGTAAAILKQAIAGDRSLVEVAEGISNQALLRMACEVRQV SDRQPRFTATSVLRVDVTPRGRLRFVLDGSSDDAYVASEDYFKRCGDQPTYRGFAVWLT ANEDHVHSLAVPPLVLLHRLSLFRPTDLRDFELVCLLMYLENCPRSHATPSLFVKVSAWL GWARHASPFERVRCLLLRSCHWILNTLMCMAGVKPFDDELVLPHWYMAHYLLANNPPPV LΞALFCATPQSFALQLPGPVPRTDCVAYNPAGVMGSCWKSKDLRSALVYWWLSGSPKRRT SSLFYRFC*
Gene matched: gi | 136798 | sp| P10191 |UL07_HSV11 Gene name: PROTEIN UL7. gi | 73828 |pir | |w
[SEQ ID NO: 84]
ORF # = 4 from Contig 93 ORF start site = 4286
ORF end site = 3570
ORF sequence :
MSPATQLQARDRELRRAQAGALEREHRAADRAAGGGAGRPAEADLLRADYD11DVSKSMD
DDTYVANSFQHQYIPAYGQDLERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGAIAS FVAPYFEYVLRAPRAGALITGSDVILGEEELWEAVFKKTRLQTYLTDVAALFVADVQHAA LPRPPSPTPADFRASASPRGGSRSRTRTRSRSPGRTPRGAPDQGWGVQRRDGRPHARR*
Gene matched: gi | 136794 | sp | P10190 |UL06_HSV11 Gene name: VIRION PROTEIN UL6. gi 173994
[SEQ ID NO: 85] = Contig ID 94
[SEQ ID NO: 86] ORF # = 1 from Contig 94 ORF start site = 3669 ORF end site = 496 ORF sequence:
PRLSRAYLRHARGFEGSPGDTYPLRIGRRQSFPFGPAVSAPRRRARTPVAMSDSALQVPA PAGMTPPSAPPPNGPLQVLLGSLTNLRRPPSPSSEPAGSADEPAFLΞAAKLRAATAAFLL SGAAVGPAEARACWHPLLEQLCALHRAHGLPETALLAENLPGLLVHRMAVALPETPEAAF REMDVIKDTVLAITGSDTTHALEAAGLRTTAALGPVRVRQCAVEWIDRWRTVTQSCLAMN PRTSLEALGEMSLKMSPVPLGQPGANLTTPAYSLLFPSPIVQEGLRFLALVSNWVTLFSA HLQRIDDAALTPLTRALFTLALVDEYLTTPDRGAWPPPLLAQFQHTVREIDPAIMIPPL EATKMVRSREEVRVSTALSRVSPRSACAPPGTLMARVRTDAAVFDPDVPFLSASALAIFR PAVTGLLQLGEPPSAGAQQRLLALLQQTWALVQNSNSPSWINTLTDAGFTPAHCTQYIS ALEGFLVAGVPARTPPGHGLSEIQQLFGCIALAGANVFGLAREYGHYAGYVKTFRRIQGA SEHTHGRLCEAVGLSGGVLSQTLARIMGPAVPTEHLASLRRTLVGEFETAERRFSAGQPS LLRETALIWLDVYGQTHWDLTPTTPATPLSALLPVGPPSHAPSVHLAAATKIRFPALEGI HPNVLADPGFVPYVLALWGDALRATCNAAYLPRPIEFALRVLAWARDFGLGYLPTVEGH RTKLGALITLLEPATRAGVGPTMQMADNIEQLLRELYVIARGAVEQLRPAVQLPPPQPPE VGSSLLLISMYALAARGVMQEFAERADPLVRQLEDAIVLLRLHMRTLAAFFECRFESDGH RLYAWADAHERLGPWRPEAMGDAVSQYCGMYHDAKRALVASLAGLRSWTETTAHLGVC DELAAQVSHEGNVLAWRREIHGFLAIVSGIHARASKLMSGDQVPGFCYMSQFLARWRRL SAGYQAARAATGPERVAEFVQELHDTWKGLQTERALWAPFASSGDQRTAAIQEVMAHAN EDAPPARPQTRRAHKRHDWGAGXTXXGAWVXDWXDS*
Gene matched: gi | 221758
Gene name: (D10879) UL37 [Herpes simplex virus type 1]
[SEQ ID NO: 87] = Contig ID 95
[SEQ ID NO: 88] ORF # = 1 from Contig 95 ORF start site = 371 ORF end site = 18 ORF sequence :
VLLDAPAPTASGRTKTPAQGLAKEVQFSTAPPSPTAPWTPRVAGFNKRVFCAAVGRLAAT HARLAAVQLWDMSRPHTDGDLNELLDLTTIRVTVCEGKNLLQRANELVNPDAAQGI *
Gene matched: gi | 136927 | sp | P10233 |UL49_HSV11 Gene name: TEGUMENT PROTEIN UL49. gi | 73
[SEQ ID NO: 89] ORF # = 2 from Contig 95 ORF start site = 831 ORF end site = 436 ORF sequence:
MTSRRSVKSCPREAPRGTHEELYYGPVSPADPESPRDDFRRGAGPMRARPRGEVRFLHYD EAGYALYRDSSSSEDNDESRDTARPRRSASVAGSHGPGPARAPPPPGGPVGAGGRSHAPP ARTPKMTRGAP*
Gene matched: gi | 136927 | sp | P10233 |UL49_HSV11 Gene name: TEGUMENT PROTEIN UL49. gi | 73
[SEQ ID NO: 90]
ORF # = 3 from Contig 95 ORF start site = 1441 ORF end site = 2550 ORF sequence: MSQWGPRAILVQTDSTNRNADGDWQAAVAIRGGGWQLNMVNKRAVDFTPAECGDSEWAV GRVSLGLRMAMPRDFCAIIHAPAVSGPGPHVMLGLVHSGYRGTVLAVWSPNGTRGFAPG ALRVDVTFLDIRATPPTLTEPSSLHRFPQLAPSPLAGLREDPWLDGALATAGGAVALPAR RRGGSLVYAGELTQVTTEHGDCVHEAPAFLPKREEDAGFDILIHRAVTVPANGATVIQPS LRVLRAADGPEACYVLGRSSLNARGLLVMPTRWPSGHACAFWCNLTGVPVTLQAGSKVA QLLVAGTHALPWIPPDNIHEDGAFRAYPRGVPDATATPRDPPILVFTNEFDADAPPSKRG AGGFGSTGI*
Gene matched: gi | 118955 | sp | P10234 |DUT_HSV11 Gene name: DEOXYURIDINE 5 ' -TRIPHOSPHATE
[SEQ ID NO: 91] ORF # = 4 from Contig 95 ORF start site = 3535 ORF end site = 2756 ORF sequence:
VGWGKAGAEPRACSGMASLLGVLCGWGTRPEEQQYEMIRAADPPSEAEPRLQEALAWNA LLPAPITLDDALESLDDTRRLVKARALARTYHACMVNLERLARHHPGLEGSTIDGAVAAH RDKMRRLADTCMATILQMYMSVGAADKSADVLVSQAIRSMAESDWMEDVAIAERALGLS TSALAGGTRTAGLGATEAPPGPTRAQAPEVASVPVTHAGDRSPVRPGPVPPADPTPDPRH RTSAPKRQASSTEAPLLLA*
Gene matched: gi | 136933 | sp | P10235 |UL51_HSV11 Gene name: PROTEIN UL51. gi | 73813 | pir | |
[SEQ ID NO: 92] ORF # = 5 from Contig 95 ORF start site = 2889 ORF end site = 5042 ORF sequence :
VTGTDATSGACALVGPGGASVAPSPAVRVPPARAEVERPRARSAIATSSMTTSLSAMLRM AWETSTSADLSAAPTDMYICRMVAMHVSARRRILSRCAATAPSMVEPSSPGWWRASLSRL TMQAWYVRARARAFTRRRVSSSDSRASSSVMGAGKSALTTARASCSRGSASEGGSAARII SYCCSSGRVPQPHSTPSRDAIPEHARGSAPAFPHPTPSGFAGAMGTEDCDHEGRSVAAPV EVMALYATDGCVITSSLALLTNCLLGAEPLYIFSYDAYRPDAPNGPTGAPTEQERFEGSR ALYRDAGGLNGDSFRVTFCLLGTEVGVTHHPKGRTRPMFVCRFERADDVAVLQDALGRGT PLLPAHITATLDLEATFALHANIIMALTVAIVHNAPARIGSGSTAPLYEPGESMRSWGR MSLGQRGLTTLFVHHKARVLAAYRRAYYGSAQSPFWFLSKFGPDKKSLVLAARYYLLQAP RLGGAGATYDLQAVKDICATYAIPHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAG FPLYVERRIAADVRETGALEKFIAHDRSCLRVSDREFITYIYLAHFECFSPPRLATHLRA VTTHDPSPAASTEQPSPLGREAVEQFFRHVRAQLNIREYVKQNVTPRETALAGDAAAAYL RARTYAPAALTPAPAYCGVADSSTKMMGRLAEAERLLVPHGWPAFAPTTPGDDAGGGI
Gene matched: gi | 136939 | sp | P10236 |UL52_HSV11 Gene name: DNA REPLICATION PROTEIN UL52
[SEQ ID NO: 93] = Contig ID 96
[SEQ ID NO: 94] ORF # = 1 from Contig 96 ORF start site = 2599 ORF end site = 1064 ORF sequence:
VGGCVDKLPLLKTPGPVARGARWLARATRRMACRKFCGVYRRPDKRQEASVPPETNTAPA FPASTFYTPAEDAYLAPGPPETIHPSRPPSPGEAARLCQLQEILAQMHSDEDYPIVDAAG AEEEDEADDDAPDDVAYPEDYAEGRFLSMVSAAPLPGASGHPPVPGRAAPPDVRTCDΞGK MGATGFTPEELDTMDREALRAIΞRGCKPPSTLAKLVTGLGFAIHGTLIPGSEGCVFDSSH PNYPHRVIVKAGWYASTNHEARLLRRLNHPAILPLLDLHWSGVTCLVLPKYHCDLYTYL SKRPSPLGHLQITAVSRQLLSAIDYVHCEGIIHRDIKTENILINTPENICLGDFGAACFV RGCRSSPFHYGIAGTIDTNAPEVLAGDPYTQVIDIWSAGLVIFETAVHTASLFSAPRDPE RRPCDNQIARIIRQAQVHVDEFPTHAESRLTAHYRSRAAGNNRPAWTRPAWTRYYKIHTD VEYLICKALTFDAALRPSAAELLRLPLFHPK*
Gene matched: gi | 125617 | sp | P13287 | KR1_HSV2 Gene name: SERINE/THREONINE-PROTEIN KINAS
[SEQ ID NO: 95]
ORF # = 2 from Contig 96 ORF start site = 2795
ORF end site = 3373
ORF sequence :
MGWWSWTLLNQRNALPRTSADASPALWSFLLRQCRILASEPLGTPVWRPANLRRLA
EPLMDLPKFTRPIVRTRSCRCPPNTTTGLFAEDDPLESIEILDAPACFRLLHQERPGPHR LYHLWWGAADLCVPFLEYAQKTRLGFRFIAMKTNDAWVGEPWPLPDRFLPERTVSWTPF
PAAPNHPLGKSP*
Gene matched: gi | 137125 | sp | P13292 | US02_HSV2 Gene name: PROTEIN US2. gi | 419137 | pir | | A
[SEQ ID NO: 96] ORF # = 3 from Contig 96 ORF start site = 3534 ORF end site = 3671 ORF sequence: MGRPEIPDEPSWQTGDDDPQNPGPPLAVGDEWPPSSHVCYPITNL*
Gene matched: gi | 137125 | sp | P13292 |US02_HSV2 Gene name: PROTEIN US2. gi | 419137 | pir | | A
[SEQ ID NO: 97] ORF # = 4 from Contig 96 ORF start site = 5400 ORF end site = 3853 ORF sequence:
VGRMRVGERERGKKKKEGRRRRKREGGEGKGKEEEGGEEGEVREKGERDRGGGEGGGREK RGEKGDGGGGPRSQHPRFIAGRAPPSWTGHRCGNWRQGVATMADIPPDPPAVNTTPANHA PPSPPPGSRKRRRPVLPSSSESEGKPDTESESSSTESSEDEAGDLRGGRRRSPRELGGRY FLDLSAESTTGTESEGTGPSDDDDDDASDGWLVDTPPRKSKRPRINLRLTSSPDRRAGW FPEVWRNDRPIRAAQPQAPAQSSGDRAAAPRRSARQAQMRSGAAWTLDLHYIRQCVNQLF RILRAAPNPPGSANRLRHLVRDCYLMGYCRTRLGPRTWGRLLQISGGTWDVRLRNAIREV EARFEPAAEPVCELPCLNARRYGPECDVGNLETNGGSTSDDEISDATDSDDTLASHSDTE GGPSPAGRENPESASGGAIAARLECEFGTFDWTSEEGSQPWLSAWADTSSAERSGLPAP GACRATEAPEREDGCRKMRFPAACPYPCGHTFLRP*
Gene matched: gi | 124184 | sp | P04485 | IE68_HSV11 Gene name: IMMEDIATE-EARLY PROTEIN IE68
[SEQ ID NO: 98] = Contig ID 98
[SEQ ID NO: 99]
ORF # = 1 from Contig 98
ORF start site = 612
ORF end site = 872
ORF sequence: MVMAACPTEPPGGSVGPADQPRVQSSRTWRPPLVNSRELYRAQRAARCASSSDTPQAPGW
CGGTCRHAVFGWAWWIILAFLWR*
Gene matched: gi | 136952 | sp| P28282 |UL56_HSV2H Gene name: PROTEIN UL56. gi | 73833 | pir | |
[SEQ ID NO: 100]
ORF # = 2 from Contig 98 ORF start site = 1689
ORF end site = 1045
ORF sequence :
MWGPGPARFIARPGTHGRRVFTDPPPRNMTTTPLSNLFLRAPDITHVAPPYCLNATWQAE
NALHTTKTDPACLAARSYLVRASCSTSGPIHCFFFAVYKDSQHSLPLVTELRNFADLVNH PPVLRELEDKRGGRLRCTGPFSCGTIKDVSGASPAGEYTINGIVYHCHCRYPFSKTCWLG ASAALQHLRFISSSGTAARAAEQRRHKIKIKIKV*
Gene matched: gi | 136947 | sp | P28281 |UL55_HSV2H Gene name: PROTEIN UL55. gi | 73806 | pir | |
[SEQ ID NO:101]
ORF # = 3 from Contig 98 ORF start site = 2705
ORF end site = 1821
ORF sequence:
MALSLTPPHADGRAPVPERKAPSADTIDPAVRAVLRSISERAAVERISESFGRSALVMQD
PFGGMPFPAANSPWAPVLATQAGGFDAETRRVSWETLVAHGPSLYRTFAANPRAASTAKA MRDCVLRQENLIEALASADETLAWCKMCIHHNLPLRPQDPIIGTAAAVLENLATRLRPFL
QCYLKARGLCGLDDLCSRRRLSDIKDIASFVLVILARLANRVERGVSEIDYTTVGVGAGE
TMHFYIPGACMAGLIEILDTHRQECSSRVCELTASHTIAPLYVHGKYFYCNSLF*
Gene matched: gi | 124181 | sp | P28276 | IE63_HSV2H Gene name: TRANSCRIPTIONAL REGULATOR IE
[SEQ ID NO: 102]
ORF # = 4 from Contig 98 ORF start site = 4922 ORF end site = 3906 ORF sequence: MLAVRSLQHLTTVIFITAYGLVLAWYIVFGASPLHRCIYAVRPAGAHNDTALVWMKINQT LLFLGPPTAPPGGAWTPHAHVCYANIIEGRAVSLPAIPGAMSRRVMNVHEAVNCLEALWD TQMRLVWGWFLYLAFVALHQRRCMFGWSPAHSMVAPATYLLNYAGRIVSSVFLQYPYT KITRLLCELSVQRQTLVQLFEADPVTFLYHRPAVGVIVGCELLLRFVALGLIVGTALISR GACAITYPLFLTITTWCFVSIIALTELYFILRRDSAPKNAEPAAPRGRSKGWSGVCGRCC SIILSGIAVRLCYIAWAGWLMALRYEQEIQRRLFDL*
Gene matched: gi | 116105 | sp| P22485 |CELF_HSV2H Gene name: CELL FUSION PROTEIN PRECURSO
[SEQ ID NO: 103] ORF # = 5 from Contig 98 ORF start site = 6334 ORF end site = 4874 ORF sequence:
AAFDLEVPGHRPFAPGPALPPGGLAVGGHMYVNRNEIFNAALAVTNIILDLDIALKEPVP FPRLHEALGHFMRGALAAVXLLFPAARVNPDAYPCYFFKSACRPRAPPVCAGDGPSAGGD DGDGDWFPDAGGDDGDEEWEEDTDPMDTTHGPLPDDEAAYLDLLHEQIPAATPSEPDSW CSCADKIGLRVCLPVPAPYWHGSLTMRGVARVIQQAVLLDRNFVEAVGSHVKNFLLIDT GVYAHGHSLRLPYFAKIGPDGSACGRLLPVFVIPPACEDVPAFVAAHADPRRFHFHAPPM FSAAPREIRVLHELGGDYVSFFEKKASRNALEHFGRRETLTEVLGRYDVRPDAGETVEGF ASELLGRIVACIEAHFPEHAREYQAVSVRRAVIKDDWVLLQLIPGRGALNQSLSCLRFKH GRASRATARTFLALSVGTNNRLCASLCQQCFATKCDNNRLHTLFTVDAGTPCSRSAPSST SRPSSS*
Gene matched: gi | 136939 | sp | P10236 |UL52_HSV11 Gene name: DNA REPLICATION PROTEIN UL52
[ SEQ ID NO : 104 ] = Contig ID 99
[ SEQ ID NO : 105 ]
ORF # = 1 from Contig 99
ORF start site = 213
ORF end site = 659
ORF sequence : VGVGVRGWGGGXCGGLWGGWCWVXWGGWVGVFFCFFLFCXXFXXXXXXXFLAPDLTDPL
LFAYVGFQWNHGLMFWPDIAVYAMLGGAVWISLTQVLGLRRRLHKDPDAGPWAAATLR
GLFFEVYALGFAAGVLVRPRMAASRRSG*
Gene matched : gi | 807644
Gene name : (M10053 ) unknown protein [Herpes simplex virus ty
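The translated sequences in this listing wrap over several lines and are sometimes interrupted by stray spaces left by text extraction. They use the standard one-letter codes: `*` marks a stop codon and `X` an undetermined residue (ORF # = 1 from Contig 99 contains several). The following Python sketch normalizes such a sequence before comparison or length checks. `clean_protein` is a hypothetical helper. It only removes whitespace, so it cannot restore a residue that was lost during extraction.

```python
def clean_protein(raw: str) -> tuple[str, bool, int]:
    """Normalize one listed ORF translation.

    Returns (residues, ends_with_stop, n_unknown): residues has all whitespace
    removed and any trailing '*' stripped; n_unknown counts 'X' residues.
    """
    seq = "".join(raw.split())  # drop line wraps and stray spaces
    ends_with_stop = seq.endswith("*")
    if ends_with_stop:
        seq = seq[:-1]
    return seq, ends_with_stop, seq.count("X")


# ORF # = 3 from Contig 96, copied from the listing
orf3 = "MGRPEIPDEPSWQTGDDDPQNPGPPLAVGDEWPPSSHVCYPITNL*"
residues, stop, unknown = clean_protein(orf3)
print(len(residues), stop, unknown)  # 45 True 0
```

If the listed coordinates for that entry (start 3534, end 3671) are read as 1-based and inclusive, they span 138 nt. That is 45 codons plus a stop codon, which agrees with the 45 residues returned here.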
[ SEQ ID NO : 106 ]
ORF # = 2 from Contig 99 ORF start site = 757 ORF end site = 2403 ORF sequence :
MGAGVPWTGIKARGAGGPITVRVLGWEVAQKATHPCCSCPREAWSGNPPRCAGRAHRSF AGAGALLVMALGRVGLAVGLWGLLWVGWWLANASPGRTITVGPRGKESNAAPSASPRN ASAPRTTPTPPQPRKATKSKASTAKPAPPPKTGPPKTSSEPVRCNRHNPLARYGLRVQIR CRFPNSTRTESRLQIWRYATATDAEIGTAPSLEEVMVNVSAPPGGQLVYDSAPNRTDPHV IWAEGAGPGASPRLYSWGPLGRQRLIIEELTLETQGMYYWVWGRTDRPSAYGTWVRVRV FRPPSLTIHPHAVLEGQPFKATCTAATYYPGNRAEFVWFEDGRRVFDPAQIHTQTQENPD GFSTVSTVTSAAVGGQGPPRTFTCQLTWHRDSVSFSRRNASGTASVLPRPTITMEFTGDH AVCTAGCVPEGVTFAWFLGDDSSPAEKVAVASQTSCGRPGTATIRSTLPVSYEQTEYICR LAGYPDGIPVLEHHGSHQPPPRDPTERQVIRAVEGAGIGVAVLVAWLAGTAWYLTHAS SVRYRRLR*
Gene matched: gi | 138220 | sp | P06475 |VGLC_HSV23 Gene name: GLYCOPROTEIN C PRECURSOR, gi
[SEQ ID NO: 107] ORF # = 3 from Contig 99 ORF start site = 2634 ORF end site = 3152 ORF sequence:
MAFRASGPAYQPLAPAASPARARVPAVAWIGVGAIVGAFALVAALVLVPPRSSWGLSPCD SGWQEFNAGCVAWDPTPVEHEQAVGGCSAPATLIPRAAAKHLAALTRVQAERSSGYWWVN GDGIRTCLRLVDSVSGIDEFCEELAIRICYYPRSPGGFVRFVTSIRNALGLP*
Gene matched: gi | 136917 | sp | P06483 |UL45_HSV23 Gene name: PROTEIN UL45 HOMOLOG (18 KD
[SEQ ID NO: 108]
ORF # = 4 from Contig 99
ORF start site = 4072
ORF end site = 3419
ORF sequence: MAGAPPRLPPRNPAPPEQRPAAAARPLAAHREAAGVYNAVRTWGPDAEAEPDQMENTYLL
PEDDAAMPAGVGLGSTPAADTTAAAWPAESHAPRAPSEEADSIYESVSEDGGRVYEEIPW
VRVYENICLRRQDAGGAAPPGDAPDSPYIEAENPLYDWGGSALFSPPGATRAPDPGLSLS
PMPARPRTNALANDGPTNVAALSALLTKLKRGRHQSH*
Gene matched: gi | 114350 | sp | P10230 |ATI2_HSV11 Gene name: ALPHA TRANS-INDUCING FACTOR
[SEQ ID NO: 109]
ORF # = 5 from Contig 99
ORF start site = 5584
ORF end site = 4391
ORF sequence : MQRRARGASSLRLARCLTPANLIRGANAGVPERRIFAGCLLPTPEGLLSAAVGVLRQRAD DLQPAFLTGADRSVRLAARHHNTVPESLIVDGLASDPHYDYIRHYASAAKQALGEVELSG GQLSRAILAQYWKYLQTVVPSGLDIPDDPAGXCDPSLHVLMRPTLLPKLWRAPFKSGAA AAKYAAAVAGLRDAAHRLQQYMFFMRPADPSRPSTDTALRLSELLAYVSVLYHWASWMLW TADKYVCRRLGPADRRFVALSGSLEAPAETFARHLDRGPSGTTGSMQCMALRAAVSDVLG HLTRLAHLWETGKRSGGTYGIVDAIVSTVEVLSIVHHHAQYIINATLTGYWWASDSLNN EYLRAAVDSQERFCRTAAPLFPTMTAPSWARMELSIK*
Gene matched: gi | 114351 | sp | P08314 |ATI2_HSV1F Gene name: ALPHA TRANS-INDUCING FACTOR
[SEQ ID NO: 110]
ORF # = 6 from Contig 99 ORF start site = 7758 ORF end site = 5668 ORF sequence :
MSVRGHAVRRRRASTRSHAPSAHRAESPVEDEPEGGGVGLMGYLRAVFNVDDDSEVEAAG EMASEEPPPRRRREARGHPGSRRASEARAAAPPRRASFPRPRSVTARSQSVRGRRDSAIT RAPRGGYLGPMDPRDVLGRVGGSRWPSPLFLDELSYEEDDYPAAVAHDDGAGARPPATV EIIEGRVSGPELQAAFPLDRLTPRVAAWDESVRSALALGHPAGFYPCPDSAFGLSRVGVM HFASPADPKVFFRQTLQQGEALAWYVTGDAILDLTDRRAKTSPSRAMGFLVDAIVRVAIN GWVCGTRLHTEGRGSELDDRAAELRRQFASLTALRPVGAAAVPLLEAGGAAPPHPGPDAA VFRSSLGSLLYWPGVRALLGRDCRVAARYAGRMTYIATGALLARFNPGAVKCVLPREAAF AGRVLDVLAVLAEQTVQWLSVWGARLHPHSAHPAFADVEQEALFRALPLGSPGWAAEH EALGDTAARRLLATSGLNAVLGAAVYALHTALATVTLKYALACGDARRRRDNAAAARAVL ATGLILQRLLGLADTWACVALAAFDGGETAPEVGTYTPLRYACVLRATQPLYARTTPAK FWADVRAAAEHVDLRPASSAPRAPVSGTADPAFLLEDLAAFPPAPLNSESVLGPRVRWD IMAQFRKLLMGDEETAALRAHVSGRRATGLGGPPRP*
Gene matched: gi | 136920 | sp | P10231 |UL47_HSV11 Gene name: VIRION PROTEIN UL47 (82/81 K
[SEQ ID NO: 111]
ORF # = 7 from Contig 99 ORF start site = 9949
ORF end site = 8279
ORF sequence:
VILKMRGGGREMSVIGDARHPRQFPSQGPRPFSVAGPGSLPPSPPPGARALLIRLSKSLS
PDPTAPMDLLVNNLFADADGVSPPPPRPAGGPKNTPAAPPLYATGRLSQAQLMPSPPMPV PPAALFNRLLDDLGFSAGPALCTMLDTWNEDLFSGFPTNADMYRECKFLSTLPSDVIDWG DAHVPERSPIDIRAHGDVAFPTLPATRDELPSYYEAMAQFFRGELRAREESYRTVLANFC SALYRYLRASVRQLHRQAHMRGRNRDLREMLRTTIADRYYRETARLARVLFLHLYLFLSR EILWAAYAEQMMRPDLFDGLCCDLESWRQLACLFQPLMFINGSLTVRGVPVEARRLRELN HIREHLNLPLVRSAAAEEPGAPLTTPPVLQGNQARSSGYFMLLIRAKLDSYSSVATEEGE SVMREHAYSRGRTRNNYGSTIEGLLDLPDDDDAPAEAGLVAPRMSFLSAGQRPRRLSTTA PITDVSLGDELRLDGEEVDMTPADALDDFDLEMLGDVESPSPGMTHDPVLYGALDVDDFE FEQMFTDAMGIDDFGG*
Gene matched: gi | 1168549 | sp | P29793 |ATIN_HSV23 Gene name: ALPHA TRANS-INDUCING PROTEI
TABLE 2
[SEQ ID NO: 112] = Contig ID 1
[SEQ ID NO: 113] >contig1 (start 332 - stop 874)
MRTPADDVSWRYEAPSVIDYARIDGIFLRYHCPGLDTFLWDRHAQRAYLVNPFLFAAGFLEDLSHSVF PA
DTQETTTRRALYKEIRDALGSRKQAVSHAPVRAGCVNFDYSRTRRCVGRRDLRPANTTSTWEPPVSSD DE
ASSQSKPLATQPPVLALSNAPPRRVSPTRGRRRHTRLRRN*
gi|136776|sp|P28278|VGLL_HSV2H GLYCOPROTEIN L PRECURSOR
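Entries in TABLE 2 carry a header of the form `>contig1 (start 332 - stop 874)`, sometimes followed by the word `translated`. The Python sketch below extracts the contig label and coordinates from such a header. It assumes that a start coordinate greater than the stop coordinate denotes the reverse strand, a convention the listing itself does not state. `parse_header` and `Header` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Tolerates the spacing variants seen in the listing, e.g.
# "( start 818 - stop 2089 )" and "stop2707".
HEADER_RE = re.compile(
    r">\s*contig\s*(\w+)\s*\(\s*start\s+(\d+)\s*-\s*stop\s*(\d+)\s*\)"
)


class Header(NamedTuple):
    contig: str
    start: int
    stop: int
    strand: str  # '+' if start <= stop, otherwise '-' (assumed convention)


def parse_header(line: str) -> Optional[Header]:
    """Parse a '>contigN (start A - stop B)' header; None if it does not match."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    start, stop = int(m.group(2)), int(m.group(3))
    return Header(m.group(1), start, stop, "+" if start <= stop else "-")


print(parse_header("[SEQ ID NO: 113] >contig1 (start 332 - stop 874)"))
print(parse_header(">contig12 (start 5468 - stop 1878) translated"))
```

A header that does not match returns `None` rather than raising, so a caller can flag OCR-damaged headers for manual review.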
[SEQ ID NO: 114] >contig1 (start 747 - stop 1751)
MKRARSRSPSPPSRPSSPFRTPPHGGSPRREVGAGILASDATSHVCIASHPGSGAGYPTRLAAGSAVQ
RR
RPRGCPPGVMFSASTTPEQPLGLSGDATPPLPTSVPLDWAAFRRAFLIDDAWRPLLEPELANPLTARL
LA EYDRRCQTEEVLPPREDVFSWTRYCTPDDVRWIIGQDPYHHPGQAHGLAFSVRADVPVPPSLRNVLA
AV
KNCYPDARMSGRGCLEKWARDGVLLLNTTLTVKRGAAASHSKLGWDRFVGGWRRLAARRPGLVFMLW
GA
HAQNAIRPDPRQHYVLKFSHPSPLSKVPFGTCQHFLAANRYLETRDIMPIDWSV*
gi|137039|sp|P28275|UNG_HSV2H URACIL-DNA GLYCOSYLASE
[SEQ ID NO: 115] >contig1 (start 1806 - stop 2507) MVKSRVSYRSVMSGVGEERVPSAFTILASWGWTFAPQNHDPGASPNTTPIESIAGTAPDAHVGPLDGE PD RDAISPLTSSVAGDPPGADGPYVTFDTLFMVSSIDELGRRQLTDTIRKDLRLSLAKFSIACTKTESFS GT
AARQRKRGAPPQRTCVPRSNKSLQMFVLCKRANAAQVREQLRAVIRSRKPRKYYTRSSDGRLCPAVPV FV HEFVSSEPMRLHRDNVMLSTEPD*
gi|136782|sp|P28279|UL03_HSV2H PROTEIN UL3
[SEQ ID NO:116] >contig1 (start 3312 - stop 2707) MGNPQTTIAYSLHHPRASLTSALPDAAQWHVFESGTRAVLTRGRARQDRLPRGGWIQHTPIGLLVI ID
CRAEFCAYRFIGRASTQRLERWWDAHMYAYPFDSWVSSSHGESVRSATAGILTWWTPDTIYITATIY GT APEAARGCDNAPLDVRPTTPPAPVSPTAGEFPANTTDLLVEVLREIQISPTLDDADPTPGT*
gi|136788|sp|P28280|UL04_HSV2H PROTEIN UL4
[SEQ ID NO: 117] >contig1 (start 6024 - stop 3379)
MAASGGEGSRDVRAPGPPPQQPGARPAVRFRDEAFLNFTSMHGVQPIIARIRELSQQQLDVTQVPRLQ WF
RDVAALEVPTGLPLREFPFAAYLITGNAGSGKSTCVQTLNEVLDCWTGATRIAAQNMYVKLSGAFLS
RP
INTIFHEFGFRGNHVQAQLGQHPYTLASSPASLEDLQRRDLTYYWEVILDITKRALAAHGGEDARNEF
HA LTALEQTLGLGQGALTRLASVTHGALPAFTRSNIIVIDEAGLLGRHLLTTWYCWWMINALYHTPQYA
GR
LRPVLVCVGSPTQTASLESTFEHQKLRCSVRQSENVLTYLICNRTLREYTRLSHSWAIFINNKRCVEH
EF
GNLMKVLEYGLPITEEHMQFVDRFVVPESYITNPANLPGWTRLFSSHKEVSAYMAKLHAYLKVTREGE FV
VFTLPVLTFVSVKEFDEYRRLTQQPTLTMEKWITANASRITNYSQSQDQDAGHVRCEVHSKQQLWAR
ND
ITYVLNSQVAVTARLRKMVFGFDGTFRTFEAVLRDDSFVKTQGETSVEFAYRFLSRLMFGGLIHFYNF
LQ RPGLDATQRTLAYGRLGELTAELLSLRRDAAGASATRAADTSDRSPGERAFNFKHLGPRDGGPDDFPD
DD
LDVIFAGLDEQQLDVFYCHYALEEPETTAAVHAQFGLLKRAFLGRYLILRELFGEVFESAPFSTYVDN
VI
FRGCELLTGSPRGGLMSVALQTDNYTLMGYTYTRVFAFAEELRRRHATAGVAEFLEESPLPYIVLRDQ HG
FMSWNTNISEFVESIDSTELAMAINADYGISSKLAMTITRSQGLSLDKVAICFTPGNLRLNSAYVAM
SR
TTSSEFLHMNLNPLRERHERDDVISEHILSALRDPNWIVY*
gi|74000|pir||MBEU5 gene UL5 protein - human herpesvirus 1
[SEQ ID NO: 118] >contig1 (start 5594 - stop 7375) translated
MVLMGRLRNAPEELTYMFCAAIRVAPVTTQSRTSLRVCTHVLFPDPALPVMRYAANGNSRSGRPVGTS
KA ATSRNHCRRGTCVTSSCCCESSRMRAMIGWTPCMDVKFKNASSLNRTAGLAPGCCGGGPGARTSREPS
PP
DAAMAAQPARAPAMRTRGGDAALCAPEDGWVKVHPTPGTMLFREILLGQMGYTEGQGVYNWRSSEAA
TR
QLQAAIFHALLNATTYRDLEEDWRRHWARGLQPQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTL LD
FAHGWDCFAPGGPSGPTSFPKYIDWLTCLGLVPILRKTREGEATQRLGAFLRQHTLPRQLATVAGAA
ER
AGPGLLELAVAFDSTRMAEYDRVHIYYNHRRGEWLVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQR
LC PEIVACHALREHAHICRLRNTASVKVLLGRKSDSERGVAGAARWNKALGEDDETKAGSAASRLVRLI
IN
MKGMRHVGDINDTVRAYLDEAGGHLIDTPAVDHTLPGFGKGGTGRGSAAQDPGARPQQLRQAFQTAW
NN
INGMLEGYINNLFGTIERLRETNAGLATQLQARV
gi|136794|sp|P10190|UL06_HSV11 VIRION PROTEIN UL6
[SEQ ID NO: 119] = Contig ID 10 Length: 21036 Type: N Check: 7835 ..
[SEQ ID NO: 120] >contig10 (start 5688 - stop 1) translated
VAGAAHMIPAALPHPTMKRQGDRDIWTGVRNQFATDLEPGGSVSCMRSSLSFLSLLFDVGPRDVLSA EA
IEGCLVEGGEWTRAAAGSGPPRMCSIIELPNFLEYPAARGGLRCVFSRVYGEVGFFGEPTAGLLETQC
PA
HTFFAGPWAMRPLSYTLLTIGPLGMGLYRDGDTAYLFDPHGLPAGTPAFIAKVRAGDVYPYLTYYAHD
RP KVRWAGAMVFFVPSGPGAVAPADLTAAALHLYGASETYLQDEPFVERRVAITHPLRGEIGGLGALFVG
W
PRGDGEGSGPWPALPAPTHVQTPRADRPPEAPRGASGPPNTPQAGHPNRPPDDVWAAALEGTPPAKP
SA
PDAAASGPPHAAPPPQTPAGDAAEEAEDLRVLEVGAVPVGRHRARYSTGLPKRRRPTWTPPSSVEDLT SG
ERPAPKAPPAKAKKKSAPKKKAPVAAEVPASSPTPIAATVPPAPDTPPQSGQGGGDDGPASPSSPSVL
ET
LGARRPPEPPGADLAQLFEVHPNVAATAVRLAARDAALAREVAACSQLTINALRSPYPAHPGLLELCV
IF FFERVLAFLIENGARTHTQAGVAGPAAALLDFTLRMLPRKTAVGDFLASTRMSLADVAAHRPLIQHVL
DE
NSQIGRLALAKLVLVARDVIRETDAFYGDLADLDLQLRAAPPANLYARLGEWLLERSRAHPNTLFAPA
TP THPEPLLHRIQALAQFARGEEMRVEAEAREMREALDALARGVDSVSQRAGPLTVMPVPAAPGAGGRAP
CP
PALGPEAIQARLEDVRIQARRAIESAIKEYFHRGAVYEAKALQASDSHDCRFHVASAAWPMVQLLES
LP
AFDQHTRDVAQRAALPPPPPLATSPQAILLRDLLQRGQTLDAPEDLAAWLSVLTDAATQGLIERKPLE EL
ARSIHGINDQQARRSSGLAELQRFDALDAALAQQLDSDAAFVPATGPAPYVDGGGLSPEATRMAEDAL
RQ
ARAMEAAKMTAELAPEARSRLRERAHALEAMLNDARERAKVAHDAREKFLHKLQGVLRPLPDFVGLKA
CP AVLATLRASLPAGWTDLADAVRGPPPEVTAALRADLWGLLGQYREALEHPTPDTATALAGLHPAFVW
LK
TLFADAPETPVLVQFFSDHAPTIAKAVSNAINAGSAAVATASPAATVDAAVRAHGALADAVSALGAAA
RD
PASPLSFLAALADSAAGYVKATRLALEARGAIDELTTLGSAAADLWQARRACAQPEGDHAALIDAAA RA
TTAARESLAGHEAGFGGLLHAEGTAGDHSPSGRALQELGKVIGATRRRADELEAAVADLTAKMAAQRA
RG
SSERWAAGVEAALDRVENRAEFDWELRRLQALAGTHGYNPRDFRKRAEQALAANAEAVTLALDTAFA
FN PYTPENQRHPMLPPLAAIHRLGWSAAFHAAAETYADMFRVDAEPLARLLRIAEGLLEMAQAGDGFIDY
HE
AVGRLADDMTSVPGLRRYVPFFQHGYADYVELRDRLDAIRADVHRALGGVPLDLAAAAEQISAARNDP
EA
TAELVRTGVTLPCPSEDALVACAAALERVDQSPVKNTAYAEYVAFVTRQDTAETKDAWRAKQQRAEA TE
RVMAGLREALAARERRAQIEAEGL ANLKTMLKWAVPATVAKTLDQARS VAEIADQVEVLLDQTEKTR EL
DVPAVIWLEHAQRTFETHPLSAARGDGPGPLARHAGRLGALFDTRRRVDALRRSLEEAEAEWDEVWGR FG RVRGGAWKSPEGFRAMHEQLRALQDTTNTVSGLRAQPAYERLEARYQGVLGAKGAERAEAVEELGARV TK
HTALCARLRDEWRRVPWEMNFDALGRLLAEFDAAAADLAPWAVEEFRGARELIQYRMGLYSAYARAG GQ TXXXXX
gi|135576|sp|P10220|TEGU_HSV11 LARGE TEGUMENT PROTEIN
[SEQ ID NO:121] >contig10 (start 9322 - stop 5978) translated MSDSALQVPAPAGMTPPSAPPPNGPLQVLLGSLTNLRRPPSPSSEPAGSADEPAFLSAAKLRAATAAF L SGAAVGPAEARACWHPLLEQLCALHRAHGLPETALLAENLPGLLVHRMAVALPETPEAAFREMDVIKD
TV
LAITGSDTTHALEAAGLRTTAALGPVRVRQCAVEWIDRWRTVTQSCLAMNPRTSLEALGEMSLKMSPV
PL GQPGANLTTPAYSLLFPSPIVQEGLRFLALVSNWVTLFSAHLQRIDDAALTPLTRALFTLALVDEYLT
TP
DRGAWPPPLLAQFQHTVREIDPAIMIPPLEATKMVRSREEVRVSTALSRVSPRSACAPPGTLMARVR
TD
AAVFDPDVPFLSASALAIFRPAVTGLLQLGEPPSAGAQQRLLALLQQTWALVQNSNSPSWINTLTDA GF
TPAHCTQYISALEGFLVAGVPARTPPGHGLSEIQQLFGCIALAGANVFGLAREYGHYAGYVKTFRRIQ
GA
SEHTHGRLCEAVGLSGGVLSQTLARIMGPAVPTEHLASLRRTLVGEFETAERRFSAGQPSLLRETALI
WL DVYGQTHWDLTPTTPATPLSALLPVGPPSHAPSVHLAAATKIRFPALEGIHPNVLADPGFVPYVLALV
VG
DALRATCNAAYLPRPIEFALRVLAWARDFGLGYLPTVEGHRTKLGALITLLEPATRAGVGPTMQMADN
IE
QLLRELYVIARGAVEQLRPAVQLPPPQPPEVGSSLLLISMYALAARGVLQELAERADPLVRQLEDAIV LL
RLHMRTLAAFFECRFESDGHRLYAWADAHERLGPWRPEAMGDAVSQYCGMYHDAKRALVASLAGLRS
W
TETTAHLGVCDELAAQVSHEGNVLAWRREIHGFLAIVSGIHARASKLMSGDQVPGFCYMSQFLARWR
RL SAGYQAARAATGPERVAEFVQELHDTWKGLQTERALWAPFASSADQRTAAIQEVMAHATEDAPPSPA
AD
LWLTNRHDLGAWGDYSLGPLGQPTWPDSVDLSPQGLAATLSMDWLLINELLQVTDGVFRASAFRPS
AG
PEAPGDLEAQDAGGSTPEPTTPGPQDTQARAPSTRPAGRETVPWPNTPVEDDEMTPQETPPVHP*
gi|221758 UL37 [Herpes simplex virus type 1]
[SEQ ID NO:122] >contig10 (start 9262 - stop 11211) translated VERTGGSCRRAPGPGARCPTWRPACALGDAARRPRAQTGMTAAALYGGAKYRPGTLRNPGRVASTPRR RG
VLYGALCPGIPFVGSGPGAVGWECVCVGGGRRDGGPDQVYRGRSVGRPNRPFKHLRMHRPSQSDTGTH
QR
RKPPSPVRVRVFSGGVFFLSALLPPHLHHPPPTTRPLAIGGKTMKTKPLPTAPMAWAESAVETTTSPR
EL AGHAPLRRVLRPPIARRDGPVLLGDRAPRRTASTMWLLGIDPAESSPGTRATRDDTEQAVDKILRGAR
RA
GGLTVPGAPRYHLTRQVTLTDLCQPNAERAGALLLALRHPTDLPHLARHRAPPGRQTERLAEAWGQLL
EA
SALGSGRAESGCARAGLVSFNFLVAACAAAYDARDAAEAVRAHITTNYGGTRAGARLDRFSECLRAMV HT HVFPHEVMRFFGGLVSWVTQDELASVTAVCSGPQEATHTGHPGRPCSAVTIPACAFVDLDAELCLGGP GA
AFLYLVFTYRQCRDQELCCVYWKSQLPPRGLEAALERLFGRLRITNTIHGAEDMTPPPPNRNVDFPL AV LAASSQSPRCSASQVTNPQFVDRLYRWQPDLRGRPTARTCTYAAFAELGVMPDNSPRCLHRTERFGAV GV PWILEGWWRPGGWRACA*
gi|139176|sp|P22486|VP19_HSV2G CAPSID ASSEMBLY AND DNA MATU
[SEQ ID NO:123] >contig10 (start 11673 - stop 15215) translated
VIRRPVRPFGRTAHPASHGPAAVSVHRVRATVTLVPMANRPAASALAGARSPSERQEPREPEVAPPGG
DH
VFCRKVSGVMVLSSDPPGPAAYRISDSSFVQCGSNCSMIIDGDVARGHLRDLEGATSTGAFVAISNVA AG
GDGRTAWALGGTSGPSATTSVGTQTSGEFLHGNPRTPEPQGPQAVPPPPPPPFPWGHECCARRDARG
GA
EKDVGAAESWSDGPSSDSETEDSDSSDEDTGSGSETLSRSSSIWAAGATDDDDSDSDSRSDDSVQPDV
W RRRWSDGPAPVAFPKPRRPGDSPGNPGLGAGTGPGSATDPRASADSDSAAHAAAPQADVAPVLDSQPT
VG
TDPGYPVPLELTPENAEAVARFLGDAVDREPALMLEYFCRCAREESKRVPPRTFGSAPRLTEDDFGLL
NY
ALAEMRRLCLDLPPVPPNAYTPYHLREYATRLVNGFKPLVRRSARLYRILGILVHLRIRTREASFEEW MR
SKEVDLDFGLTERLREHEAQLMILAQALNPYDCLIHSTPNTLVERGLQSALKYEEFYLKRFGGHYMES
VF
QMYTRIAGFLACRATRGMRHIALGRQGSWWEMFKFFFHRLYDHQIVPSTPAMLNLGTRNYYTSSCYLV
NP QATTNQATLRAITGNVSAILARNGGIGLCMQAFNDASPGTASIMPALKVLDSLVAAHNKQSTRPTGAC
VY
LEPWHSDVRAVLRMKGVLAGEEAQRCDNIFSALWMPDLFFKRLIRHLDGEKNVTWSLFDRDTSMSLAD
FH
GEEFEKLYEHLEAMGFGETIPIQDLAYAIVRSAATTGSPFIMFKDAVNRHYIYDTQGAAIAGSNLCTE IV
HPSΞKRSSGVCNLGSVNLARCVSRRTFDFGMLRDAVQACVLMVNIMIDSTLQPTPQCARGHDNLRSMG
IG
MQGLHTACLKMGLDLESAEFRDLNTHIAEVMLLAAMKTSNALCVRGARPFSHFKRSMYRAGRFHWERF
SN ASPRYEGEWEMLRQSMMKHGLRNSQFIALMPTAASAQISDVSEGFAPLFTNLFSKVTRDGETLRPNTL
LL
KELERTFGGKRLLDAMDGLEAKQWSVAQALPCLDPAHPLRRFKTAFDYDQELLIDLCADRAPYVDHSQ
SM
TLYVTEKADGTLPASTLVRLLVHAYKRGLKTGMYYCKVRKATNSGVFAGDDNIVCTSCAL*
gi|1710385|sp|P09853|RIR1_HSV23 RIBONUCLEOSIDE-DIPHOSPHATE
[SEQ ID NO: 124] >contig10 (start 15268 - stop 16281) translated MDPAVSPASTDPLDTHASGAGAAPIPVCPTPERYFYTSQCPDINHLRSLSILNRWLETELVFVGDEED VS
KLSEGELGFYRFLFAFLSAADDLVTENLGGLSGLFEQKDILHYYVEQECIEWHSRVYNIIQLVLFHN ND
QARRAYVARTINHPAIRVKVDWLEARVRECDSIPEKFILMILIEGVFFAASFAAIAYLRTNNLLRVTC QS NDLIERDEAVHTTASCYIYNNYLGGHAKPEAARVYRLFREAVDIEIGFIRSQAPTDSSILSPGALAAI EN YVRFSADRLLGLIHMQPLYSAPAPDASFPLSLMSTDKHTNFFECRSTSYAGAWNDL*
gi|132624|sp|P03174|RIR2_HSV23 RIBONUCLEOSIDE-DIPHOSPHATE R
[SEQ ID NO: 125] >contig10 (start 17637 - stop 16564) translated
MRRRGHAFAPGDRGTRAAGPGPAAPWGAPSKPALRLAHLFCIRVLRALGYAYINSGQLEADDACANLY
HT
NTVAYVHTTDTDLLLMGCDIVLDISTGYIPTIHCRDLLQYFKMSYPQFLALFVRCHTDLHPNNTYASV ED
VLRECHWTAPSRSQARRAARRERANSRSLESMPTLTAAPVGLETRISWTEILAQQIAGEDDYEEDPPL
QP
PDVAGGPRDGARSSSSEILTPPELVQVPNAQRVAEHRGYVAGRRRHVIHDAPEALDWLPDPMTIAELV
EH RYVKYVISLISPKERGPWTLLKRLPIYQDLRDEDLARSIVTRHITAPDIADRFLAQLWAHAPPPAFYK
DV
LAKFWDE*
gi|549322|sp|P36699|VHS_HSV2G VIRION HOST SHUTOFF PROTEIN
[SEQ ID NO: 126] >contig10 (start 18537 - stop 19949) translated
MAHLPGGAAAAPLSEDAIPSPRERTEDWPPCQIVLQGAELNGILQAFAPLRTSLLDSLLWGDRGILV
HN
AIFGEQVFLPLDHSQFSRYRWGGPTAAFLSLVDQKRSLLSVFRANQYPDLRRVELTVTGQAPFRTLVQ RI
WTTASDGEAVELASETLMKRELTSFAVLLPQGDPDVQLRLTKPQLTKWNAVGDETAKPTTFELGPNG
KF
SVFNARTCVTFAAREEGASSSTSAQVQILTSALKKAGQAAANAKTVYGENTHRTFSVWDDCSMRAVL
RR LQVGGGTLKFFLTADVPSVCVTATGPNAVSAVFLLKPQRVCLNWLGRTPGSSTGSLASQDSRAGPTDS
QD
FSSEPDAGDRGAPEEEGLEGQARVPPAFPEPPGTKRRHAGAEWPADDATKRPKTGVPAAPTRAESPP
LS
ARYGPEAAEGGGDGGRYACYFRDLQTGDASPSPLSAFRGPQRPPYGFGLP*
gi|136905|sp|P10226|VPAP_HSV11 DNA POLYMERASE PROCESSIVITY
[SEQ ID NO: 127] >contig10 (start 20031 - stop 21053) translated
VCPPPPTNMAWCGSGLRLRPFHPPSPSFFVLRALIRAGPGPFAASPRAPSGPGCGMCRGDSPGVAGG SG
EHCLGGDDGDDGRPRLACVGAIARGFAHLWLQATTLGFVGSWLSRGPYADAMSGAFVIGSTGLGFLR AP
PAFARPPTRVCAWLRLVGGGAAVALWSLGEAGAPPGVPGPATQCLALGAAYAALLVLADDVHPLFLLA PR PLFVGTLGVWGGLTIGGSARYWWIDPRAAAALTAAWAGLGTTAAGDSFSKACPRHRRFCWSAVES PP PRYAPEDAERPTDHGPLLPSTHHQRSPRVCGDGAARPENIWVPWTFAGALALAACAARGS
gi|136909 | sp | P10227 |UL43_HSV11 MEMBRANE PROTEIN UL43
[SEQ ID NO:128] = Contig 11 Length: 2343 Type: N Check: 6656
[SEQ ID NO: 129] >contig11 (start 2357 - stop 3) translated
APLLVDLRALDARARASSSPEGHEVDPQLLRRRGEAYLRAGGDPGPLVLREAVSALDLPFATSFLAPD GT
PLQYALCFPAVTDKLGALLMRPEAACVRPPLPTDVLESAPTVTAMYVLTVVNRLQLALSDAQAANFQL
FG
RFVRHRQATWGASMDAAAELYVALVATTLTREFGCRWAQLGWASGAAAPRPPPGPRGSQRHCVAFNEN
DV LVALVAGVPEHIYNFWRLDLVRQHEYMHLTLERAFEDAAESMLFVQRLTPHPDARIRVLPTFLDGGPP
TR
GLLFGTRLADWRRGKLSETDPLAPWRSALELGTQRRDAPALGKLSPAQALAAVSVLGRMCLPSAALAA
LW
TCMFPDDYTEYDSFDALLAARLESGQTLGPAGGREASLPEAPHALYRPTGQHVAVLAAATHRTPAARV TA
MDLVLAAVLLGAPVWALRNTTAFSRESELELCLTLFDSRPGGPDAALRDWSSDIETWAVGLLHTDL
NP
IENACLAAQLPRLSALIAERPLADGPPCLVLVDISMTPVAVLWEAPEPPGPPDVRFVGSEATEELPFV AT AGDVLAASAADADPFFARAILGRPFDASLLTGELFPGHPVYQRPLADEAGPSAPTAARDPRDLAGGDG GS
GPEDPAAPPARQADPGVLAPTLLTDATTGEPVPPRMWAWIHGLEELASDDAGGPTPNPAPALLPPPAT DQ
SVPTSQYAPRPIGPAATARETRPSVPPQQNTGRVPVAPRDDPRPSPPTPSPPADAALPPPAFSGSAAA FS
AAVPRVRRSRXXXXX
gi|135576|sp|P10220|TEGU_HSV11 LARGE TEGUMENT PROTEIN
[SEQ ID NO: 130] = Contig 12 Length: 14928 Type: N Check: 1371
[SEQ ID NO: 131] >contig12 (start 1505 - stop 3) translated MAAAPPAAVSEPTAARQKLLALLGQVQTYVFQLELLRRCDPQIGLGKLAQLKLNALQVRVLRRHLRPG LE
AQAAAFLTPLSVTLELLLEYAWREGERLLGHLETFATTGDVSAFFTETMGLARPCPYHQQIRLETYGG DV RMELCFLHDVENFLKQLNYCHLITPPSGATAALERVREFMVAAVGSGLIVPPELSDPSHPCAVCFEEL CV
TANQGATIARRLADRICNHVTQQAQVRLDANELRRYLPHAAGLSDAARARALCVLDQALARTAAGGGA RA
GPPPADSSEVREEADALLEAHDVFQATTPGLYAISELRFWLASGDRARHSTMDAFADNLNALAQRELQ QE TAAVAVELALFGRRAEHFDRAFGGHLAALDMVDALIIGGQATSPDDQIEALIRACYDHHLTTPLLRRL VS
PEQCDEEALRRVLARLGAGGATGGAEEEEPRAAAEEGGRRRGAGTPASEDGERGPEPGAQGPESWGDI AT RAAADVXXXXX
gi|124088|sp|P10212|PRTP_HSVlll PROCESSING AND TRANSPORT PRO
[SEQ ID NO:132] >contig12 (start 5468 - stop 1878) translated
MDTKPKTTTTVKVPPGPMGYVYGRACPAEGLELLSLLEARSGDADVAVAPLIVGLTVESGFEANVAAV VG
SRTTGLGGTAVSLKLMPSHYSPSVYVFHGGRHLAPSTQAPNLTRLCERARRHFGFSDYAPRPCDLKHE
TT
GDALCERLGLDPDRALLYLVITEGFREAVCISNTFLHLGGMDKVTIGDAEVHRIPVYPLQMFMPDFSR
VI ADPFNCNHRS IGENFNYPLPFFNRPLARLLFEAWGPAAVALRARNVDAVARAAAHLAFDENHEGAAL
PA
DITFTAFEASQGKPQRGARDAGNKGPAGGFEQRLASVMAGDAALALESIVSMAVFDEPPPDITTWPLL
EG
QETPAARAGAVGAYLARAAGLVGAMVFSTNSALHLTEVDDAGPADPKDHSKPSFYRFFLVPGTHVAAN PQ
LDREGHWPGYEGRPTAPLVGGTQEFAGEHLAMLCGFSPALLAKMLFYLERCDGGVIVGRQEMDVFRY
VA
DSGQTDVPCNLCTFETRHACAHTTLMRLRARHPKFASAARGAIGVFGTMNSAYSDCDVLGNYAAFSAL
KR ADGSENTRTIMQETYRAATERVMAELEALQYVDQAVPTALGRLETIIGTREALHTWNNIKQLVDREV
EQ
LMRNLIEGRNFKFRΓSIAEANHAMSLSLDPYTCGPCPLLQLLARRSNLAVYQDLALSQCHGVFAGQSV
EG
RNFRNQFQPVLRRRVMDLFNNGFLSAKTLTVALSEGAAICAPSLTAGQTAPAESSFEGDVARVTLGFP KE LRVKSRVLFAGASANASEAAKARVASLQSAYQKPDKRVDILLGPLGFLLKQFHAVIFPNGKPPGSNQP NP
QWFWTALQRNQLPARLLSREDIETIAFIKRFSLDYGAINFINLAPNNVSELAMYYMANQILRYCDHST YF INTLTAVIAGSRRPPGVQAAAAWAPQGGAGLEAGARALMDSLDAHPGAWTSMFASCNLLRPVMAARPM W
LGLSISKYYGMAGNDRVFQAGNWASLLGGKNACPLLIFDRTRKFVLACPRAGFVCAASSLGGGAHEHS LC
EQLRGIIAEGGAAVASSVFVATVKSLGPRTQQLQIEDWLALLEDEYLSEEMMEFTTRALERGHGEWST DA
ALEVAHEAEALVSQLGAAGEVFNFGDFGDEDDHAASFGGLAAAAGAAGVARKRAFHGDDPFGEGPPEK
KD
LTLDML*
gi|544182|sp|P36384|DNBI_HSV2 MAJOR DNA-BINDING PROTEIN
[SEQ ID NO: 133] >contig12 (start 6286 - stop 10008) translated
MFCAAGGPTSPGGKSAARAASGFFAPHNPRGATQTAPPPCRRQNFYNPHLAQTGTQPKAPGPAQRHTY
YS ECDEFRFIAPRSLDEDAPAEQRTGVHDGRLRRAPKVYCGGDERDVLRVGPEGF PRRLRLWGGADHAP
EG
FDPTVTVFHVYDILEHVEHAYSMRAAQLHERFMDAITPAGTVITLLGLTPEGHRVAVHVYGTRQYFYM
NK
AEVDRHLQCRAPRDLCERLAAALRES PGASFRGI SADHFEAEWERADVYYYETRPTLYYRVFVRSGR AL
AYLCDNFCPAIRKYEGGVDATTRFILDNPGFVTFGWYRLKPGRGNAPAQPRPPTAFGTSSDVEFNCTA
DN
LAVEGAMCDLPAYKLMCFDIECKAGGEDELAFPVAERPEDLVIQISCLLYDLSTTALEHILLFSLGSC
DL PESHLSDLASRGLPAPWLEFDSEFEMLLAFMTFVKQYGPEFVTGYNIINFDWPFVLTKLTEIYKVPL
DG
YGRMNGRGVFRVWDIGQSHFQKRSKIKVNGMVNIDMYGIITDKVKLSSYKLNAVAEAVLKDKKKDLSY
RD
IPAYYASGPAQRGVIGEYCVQDSLLVGQLFFKFLPHLELSAVARLAGINITRTIYDGQQIRVFTCLLR LA
GQKGFILPDTQGRFRGLDKEAPKRPAVPRGEGERPGDGNGDEDKDDDEDGDEDGDEREEVARETGGRH
VG
YQGARVLDPTSGFHVDPVWFDFASLYPSIIQAHNLCFSTLSLRPEAVAHLEADRDYLEIEVGGRRLF
FV KAHVRESLLSILLRDWLAMRKQIRSRIPQSTPEEAVLLDKQQAAIKWCNSVYGFTGVQHGLLPCLHV
AA
TVTTIGREMLLATRAYVHARWAEFDQLLADFPEAAGMRAPGPYSMRIIYGDTDSIFVLCRGLTAAGLV
AM
GDKMASHISRALFLPPIKLECEKTFTKLLLIAKKKYIGVICGGKMLIKGVDLVRKNNCAFINRTSRAL VD LLFYDDTVSGAAAALAERPAEEWLARPLPEGLQAFGAVLVDAHRRITDPERDIQDFVLTAELSRHPRA YT
NKRLAHLTVYYKLMARRAQVPSIKDRIPYVIVAQTREVEETVARLAALRELDAAAPGDEPAPPAALPS PA KRPRETPSHADPPGGASKPRKLLVSELAEDPGYAIARGVPLNTDYYFSHLLGAACVTFKALFGNNAKI TE SLLKRFIPETWHPPDDVAARLRAAGFGPAGAGATAEETRRMLHRAFDTLA*
gi|118882|sp|P07918|DPOL_HSV21 DNA POLYMERASE
[SEQ ID NO:134] >contig12 (start 10870 - stop 9953) translated
MYDIAPRREGSRPGPGRDKTRRRSRFSAAGNPGVERRASRKSLPSHARRLELCLHERRRYRGFFAALA
QT
PSEEIAIVRSLSVPLVKTTPVSLPFSLDQTVADNCLTLSGMGYYLGIGGCCPACEAGDGRLATVSREA LI
LAFVQQINTIFEHRTFLASLWLADRHSTPLQDLLADTLGQPELFFVHTILRGGGACDPRFLFYPDPT
YG
GHMLYVIFPGTSAHLHYRLIDRMLTACPGYRFAAHVWQSTFVLWRRNAEKPADAEIPTVSAADIYCK
MR DISFDGGLMLEYQRLYATFDEFPPP*
gi|136875|sp|P10215|UL31_HSV11 PROTEIN UL31
[SEQ ID NO:135] >contig12 (start 12674 - stop 10863) translated VRPARPAMATSAPGVPSSAAVREESPGSSWKEGAFERPYVAFDPDLLALNEALCAELLAACHWGVPP AS
ALDEDVESDVAPAPPRPRGAAREASGGRGPGSARGPPADPTAEGLLDTGPFAAASVDTFALDRPCLVC RT
IELYKQAYRLSPQWVADYAFLCAKCLGAPHCAASIFVAAFEFVYVMDHHFLRTKKATLVGSFARFALT IN
DIHRHFFLHCCFRTDGGVPGRHAQKQPRPTPSPGAAKVQYSNYSFLAQSATRALIGTLASGGDDGAGA
GG
GSGTQPSLTTALMNWKDCARLLDCTEGKRGGGDSCCTRAAARNGEFEAAAGALAQGGEPETWAYADLI LL LLAGTPAVWESGPRLRAAADARRAAVSESWEAHRGARMRDAAPRFAQFAEPKAQPDLDLGPLMATVLK HG
RGRGRTGGECLLCNLLLVRAYWLAMRRLRASWRYSENNTSLFDCIVPWDQLEADPEAQPGDGGRFV SL
LRAAGPEAIFKHMFCDPMCAITEMEVDPWVLFGHPRADHRDELQLHKAKLACGNEFEGRVCIALRALI YT
FKTYQVFVPKPTALATFVREAGALLRRHSISLLSLEHTLCTYV*
gi|136879|sp|P10216|UL32_HSV11 PROBABLE MAJOR ENVELOPE GLYC
[SEQ ID NO: 136] >contig12 (start 12652 - stop 13044) translated MAGRAGRTRPRTLRDAIPDCALRSQTLESLDARYVSRDGAGDAAVWFEDMTPAELEVIFPTTDAKLNY
LS
RTQRLASLLTYAGPIKAPDGPAAPHTQDTACVHGELDATERERFAAVINRFLDLHQILRG*
gi|136883|sp|P10217|UL33_HSV11 PROTEIN UL33
[SEQ ID NO: 137] >contig12 (start 13134 - stop 13964) translated MAGMGKPYGGRPGDAFEGLVQRIRLIVPTTLRGGGGESGPYSPSNPPSRCAFQFHGQDGSDEAFPIEY VL RLMNDWADVPCNPYLRVQNTGVSVLFQGFFNRPHGAPGGAITAEQTNVILHSTETTGLSLGDLDDVKG RL
GLDARPMMASMWISCFVRMPRVQLAFRFMGPEDAVRTRRILCRAAEQALARRRRSRRSQDDYGAVAVA AA HHESGAPGPGVAASGPPAPPGRGPARPWHQAVQLFRAPRPGPPALLLLVAGLFLGAAIWWAVGARL*
gi|136888|sp|P10218|UL34_HSV11 VIRION PROTEIN UL34
[SEQ ID NO:138] >contig12 (start 14076 - stop 14414) translated MAAPQFHRPSTITADNVRALGMRGLVLATNNAQFIMDNSYPHPHGTQGAVREFLRGQAAALTDLGVTH AN
NTFAPQPMFAGDAAAEWLRPSFGLKRTYSPFWRDPKTPSTP*
gi|139196|sp|P10219|VP26_HSV11 CAPSID PROTEIN VP26
[SEQ ID NO: 139] = Contig ID 13 Length: 838 Type: N Check: 7960
[SEQ ID NO: 140] >contig13 (start 852 - stop 1) translated RRLYADRLTKRSLASLGRCVREQRGELEKMLRVSVHGEVLPATFAAVANGFAARARFCALTAGAGTVI
DN
RAAPGVFDAHRFMRASLLRHQVDPALLPSITHRFFELVNGPLFDHSTHSFAQPPNTALYYSVENVGLL
PH
LKEELARFIMGAGGSGADWAVSEFQKFYCFDGVSGITPTQRAAWRYIRELIIATTLFASVYRCGELEL RR
PDCSRPTSEGLYRYPPGVYLTYNSDCPLVAIVESGPDGCIGPRSVWYDRDVFSILYSVLQHLAPRLA
GXXXXX
gi|124089|sp|P12835|PRTP_HSV1A PROCESSING AND TRANSPORT PRO
[SEQ ID NO: 141] = Contig ID 14 Length: 2647 Type: N Check: 2951 ..
[SEQ ID NO: 142] >contig14 (start 2661 - stop 97) translated
PPVPSPATTKARKRKTKKPPKRPEATPPPDANATVAAGHATLRAHLREIKVENADAQFYVCPPPTGAT
W
QFEQPRRCPTRPEGQNYTEGIAVVFKENIAPYKFKATMYYKDVTVSQVWFGHRYSQFMGIFEDRAPVP FE
EVIDKINAKGVCRSTAKYVRNNMETTAFHRDDHETDMELKPAKVATRTSRGWHTTDLKYNPSRVEAFH
RY
GTTVNCIVEEVDARSVYPYDEFVLATGDFVYMSPFYGYREGSHTEHTSYAADRFKQVDGFYARDLTTK
AR ATSPTTRNLLTTPKFTVAWDWVPKRPAVCTMTKWQEVDEMLRAEYGGSFRFSSDAISTTFTTNLTQYS
LS
RVDLGDCIGRDAREAIDRMFARKYNATHIKVGQPQYYLATGGFLIAYQPLLSNTLAELYVREYMREQD
RK
PRNATPAPLREAPSANASVERIKTTSSIEFARLQFTYNHIQRHVNDMLGRIAVAWCELQNHELTLWNE AR
KLNPNAIASATVGRRVSARMLGDVMAVSTCVPVAPDNVIVQNSMRVSSRPGTCYSRPLVSFRYEDQGP
LI
EGQLGENNELRLTRDALEPCTVGHRRYFIFGGGYVYFEEYAYSHQLSRADVTTVSTFIDLNITMLEDH
EF VPLEVYTRHEIKDSGLLDYTEVQRRNQLHDLRFADIDTVIRADANAAMFAGLCAFFEGMGDLGRAVGK
W
MGWGGWSAVSGVSSFMSNPFGALAVGLLVLAGLVAAFFAFRYVLQLQRNPMKALYPLTTKELKTSD
PG
GVGGEGEEGAEGGGFDEAKLAEAREMIRYMALVSAMERTEHKARKKGTSALLESKVTNMVLRKRNKAR YS
PLHNEDEAGDEDEL*
gi|138198|sp|P06763|VGLB_HSV23 GLYCOPROTEIN B PRECURSOR
[SEQ ID NO:143] = Contig ID 15 Length: 20389 Type: N Check: 2794
[SEQ ID NO:144] >contig15 (start 788 - stop 3) translated MNAHFANEVQYDLTRDPSSPASLIHVIISSECLAAAGVPLSALVRGRPDGGAAANFRVETQTRAHATG
DC
TPWRSAFAAYVPADAVGAILAPVIPAHPDLLPRVPSAGGLFVSLPVACDAQGVYDPYTVAALRLAWGP
WA
TCARVLLFSYDELVPPNTRYAADGARLMRLCRHFCRYVARLGAAAPAAATEAAAHLSLGMGESGTPTP QA
SSVSGGAGPAWGTPDPPISPEEQLTAPGGDTATAEDVSITQENEEIXXXXX
gi|136835|sp|P10201|UL17_HSV11 PROTEIN UL17
[SEQ ID NO: 145] >contig15 (start 818 - stop 2089) translated VPEGAWVGGACARPRGPRAHVRLYAVCFVCPQGIRGQDFNLLFVDEANFIRPDAVQTIMGFLNQANCK II
FVSSTNTGKASTSFLYNLRGAADELLNWTYICDDHMPRWTHTNATACSCYILNKPVFITMDGAVRR TA DLFLPDSFMQEIIGGQARETGDDRPVLTKSAGERFLLYRPSTTTNSGLMAPELYVYVDPAFTANTRAS GT
GIAWGRYRDDFIIFALEHFFLRALTGSAPADIARCWHSLAQVLALHPGAFRSVRVAVEGNSSQDSA VA
IATHVHTEMHRILASAGANGPGPELLFYHCEPPGGAVLYPFFLLNKQKTPAFEYFIKKFNSGGVMASQ EL
VSVTVRLQTDPVEYLSEQLNNLIETVSPNTDVRMYSGKRNGAADDLMVAVIMAIYLAAPTGIPPAFFP
IT
RTS*
gi|139646|sp|P04295|VTER_HSV11 PROBABLE DNA PACKAGING PROTE
[SEQ ID NO: 146] >contig15 (start 3520 - stop 2429) translated
VLLSPAPPPLPHGRCPPSLFHHRPGCVALSGPPAPPRSGVSRPGAMITDCFEADIAIPSGISRPDAAA
LQ RCEGRWFLPTIRRQLALADVAHESFVSGGVSPDTLGLLLAYRRRFPAVITRVLPTRIVACPVDLGLT
HA
GTVNLRNTSPVDLCNGDPVSLVPPVFEGQATDVRLESLDLTLRFPVPLPTPLAREIVARLVARGIRDL
NP
DPRTPGELPDLNVLYYNGARLSLVADVQQLASVNTELRSLVLNMVYSITEGTTLILTLIPRLLALSAQ DG
YVNALLQMQSVTREAAQLIHPEAPMLMQDGERRLPLYEALVAWLAHAGQLGDILALAPAVRVCTFDGA
AV
VQSGDMAPVIRYP*
gi|139191|sp|P10202|VP23_HSV11 CAPSID PROTEIN VP23
[SEQ ID NO: 147] >contig15 (start 7954 - stop 3764) translated VWEGLGLPELGLMEPANPPRNPMAAPARDPPGYRYAAAMVPTGSILSTIEVASHRRLFDFFARVRSDE NS LYDVEFDALLGSYCNTLSLVRFLELGLSVACVCTKFPELAYMNEGRVQFEVHQPLIARDGPHPVEQPV HN
YMTKVIDRRALNAAFSLATEAIALLTGEALDGTGISLHRQLRAIQQLARNVQAVLGAFERGTADQMLH VL LEKAPPLALLLPMQRYLDNGRLATRVARATLVAELKRSFCDTSFFLGKAGHRREAIEAWLVDLTTATQ PS
VAVPRLTHADTRGRPVDGVLVTTAAIKQRLLQSFLKVEDTEADVPVTYGEMVLNGANLVTALVMGKAV RS
LDDVGRHLLEMQEEQLEANRETLDELESAPQTTRVRADLVAIGDRLVFLEALEKRIYAATNVPYPLVG AM DLTFVLPLGLFNPAMERFAAHAGDLVPAPGHPEPRAFPPRQLFFWGKDHQVLRLSMENAVGTVCHPSL
MN
IDAAVGGVNHDPVEAANPYGAYVAAPAGPGADMQQRFLNAWRQRLAHGRVRWVAECQMTAEQFMQPDN
AN LALELHPAFDFFAGVADVELPGGEVPPAGPGAIQATWRWNGNLPLALCPVAFRDARGLELGVGRHAM
AP
ATIAAVRGAFEDRSYPAVFYLLQAAIHGSEHVFCALARLVTQCITSYWNNTRCAAFVNDYSLVSYIVT
YL
GGDLPEECMAVYRDLVAHVEALAQLVDDFTLPGPELGGQAQAELNHLMRDPALLPPLVWDCDGLMRHA AL
DRHRDCRIDAGGHEPVYAAACNVATADFNRNDGRLLHNTQARAADAADDRPHRPADWTVHHKIYYYVL
VP
AFSRGRCCTAGVRFDRVYATLQNMWPEIAPGEECPSDPVTDPAHPLHPANLVANTVNAMFHNGRVW
DG PAMLTLQVLAHNMAERTTALLCSAAPDAGANTASTANMRIFDGALHAGVLLMAPQHLDHTIQNGEYFY
VL
PVHALFAGADHVANAPNFPPALRDLARHVPLVPPALGANYFSSIRQPWQHARESAAGENALTYALMA
GY
FKMSPVALYHQLKTGLHPGFGFTWRQDRFVTENVLFSERASEAYFLGQLQVARHETGGGVSFTLTQP RG
NVDLGVGYTAVAATATVRNPVTDMGNLPQNFYLGRGAPPLLDNAAAVYLRNAWAGNRLGPAQPLPVF
GC
AQVPRRAGMDHGQDAVCEFIATPVATDINYFRRPCNPRGRAAGGVYAGDKEGDVIALMYDHGQSDPAR
PF AATANPWASQRFSYGDLLYNGAYHLNGASPVLSPCFKFFTAADITAKHRCLERLIVETGSAVSTATAA
SD
VQFKRPPGCRELVEDPCGLFQEAYPITCASDPALLRSARDGEAHARETHFTQYLIYDASPLKGLSL*
gi|137571|sp|P06491|VCAP_HSV11 MAJOR CAPSID PROTEIN (MCP)
[SEQ ID NO: 148] >contig15 (start 8869 - stop 8201) translated MTMRDDVPLLDRELVYEAACGGEDGELPLDEQFSLSSYGTSDFFVSSAYSRLPPHTQPVFSKRWMFA WS
FLVLKPLELVAAGMYYGWTGRAVAPACIIAAVLAYYVTWLARALLLYVNIKRDRLPLSPPVFWGLCVI MG
GAALCALVAAAHETFSPDGLFHWITASQLLPRTDPLRARSLGIACAAGAAMWVAAADCFAAFTNFFLA
RF
WTRAILKAPVAF*
gi|136841|sp|P10204|UL20_HSV11 MEMBRANE PROTEIN UL20
[SEQ ID NO: 149] >contig15 (start 9205 - stop 11118) translated VGRQGERWVGGGNEENTQRATSGMRPELSLKGRPCVTEAWCPSTDAAIHSGGSSSVRPQPYARAARA RA THGSRSRHRQPLLPPPSSHHPTIPPPPEPPRGSPAMELSYATTLHHRDWFYVTADRNRAYFVCGGSV
YS
VGRPRDSQPGEIAKFGLWRGTGPKDRMVANYVRSELRQRGLRDVRPVGEDEVFLDSVCLLNPNVSSE
RD VINTNDVEVLDECLAEYCTSLRTSPGVLVTGVRVRARDRVIELFEHPAIVNISSRFAYTPSPYVFALA
QA
HLPRLPSSLEPLVSGLFDGIPAPRQPLDARDRRTDWITGTRAPRPMAGTGAGGAGAKRATVSEFVQV
KH
IDRWSPSVSSAPPPSAPDASLPPPGLQEAAPPGPPLRELWWVFYAGDRALEEPHAESGLTREEVRAV HG
FREQAWKLFGSVGAPRAFLGAALALSPTQKLAVYYYLIHRERRMSPFPALVRLVGRYIQRHGLYVPAP
DE
PTLADAMNGLFRDALAAGTVAEQLLMFDLLPPKDVPVGSDARADSAALLRFVDSQRLTPGGSVSPEHV
MY LGAFLGVLYAGHGRLAAATHTARLTGVTSLVLTVGDVDRMSAFDRGPAGAAGRTRTAGYLDALLTVCL
AR
AQHGQSV*
gi|136845|sp|P10205|UL21_HSV11 PROTEIN UL21
[SEQ ID NO: 150] >contig15 (start 14107 - stop 11339) translated
VSISAGVRGQGWHRISTPPKNGAGRSVLVFGLVLPLCFYPHPTPSFGPRLRQQRASDSLRGAEPLWAV
GT
DTPPSADWQPGRTTMGPGLWWMGVLVGVAGGHDTYWTEQIDPWFLHGLGLARTYWRDTNTGRLWLPN TP
DASDPQRGRLAPPGELNLTTASVPMLRWYAERFCFVLVTTAEFPRDPGQLLYIPKTYLLGRPRNASLP
EL
PEAGPTSRPPAEVTQLKGLSHNPGASALLRSRAWVTFAAAPDREGLTFPRGDDGATERHPDGRRNAPP
PG PPAGTPRHPTTNLSIAHLHNASVTWLAARGLLRTPGRYVYLSPSASTWPVGVWTTGGLAFGCDAALVR
AR
YGKGFMGLVISMRDSPPAEIIWPADKTLARVGNPTDENAPAVLPGPPAGPRYRVFVLGAPTPADNGS
AL
DALRRVAGYPEESTNYAQYMSRAYAEFLGEDPGSGTDARPELFWRLAGLLASSGFAFVNAAHAHDAIR LS
DLLGFLAHSRVLAGLAARGAAGCAADSVFLNVSVLDPAARLRLEARLGHLVAAILEREQSLAAHALGY
QL
AFVLDSPAAYGAVAPSAARLIDALYAEFLGGRALTAPMVRRALFYATAVLRAPFLAGAPSAEQRERAR
RG LLITTALCTSDVAAATHADLRAALARTDHQKNLFWLPDHFSPCAASLRFDLAEGGFILDALAMATRSD
IP
ADVMAQQTRGVASVLTRWAHYNALIRAFVPEATHQCSGPSHNAEPRILVPITHNASYWTHTPLPRGI
GY
KLTGVDVRRPLFI YLTATCEGHAREIEPKRLVRTENRRDLGLVGAVFLRYTPAGEVMSVLLVDTDAT QQ QLAQGPVAGTPNVFSSDVPSVALLLFPNGTVIHLLAFDTLPIATIAPGFLAASALGWMITAALAGIL
RV
VRTCVPFLWRRE*
gi|138315|sp|P06477|VGLH_HSV11 GLYCOPROTEIN H PRECURSOR
[SEQ ID NO:151] >contig15 (start 15322 - stop 14192) translated
MASHAGQQHAPAFGQAARASGPTDGRAASRPSHRQGASEARGDPELPTLLRVYIDGPHGVGKTTTSAQ
LM EALGPRDNIVYVPEPMTYWQVLGASETLTNIYNTQHRLDRGEISAGEAAWMTSAQITMSTPYAATDA
VL
APHIGGEAVGPQAPPPALTLVFDRHPIASLLCYPAARYLMGSMTPQAVLAFVALMPPTAPGTNLVLGV
LP
EAEHADRLARRQRPGERLDLAMLSAIRRVYDLLANTVRYLQRGGRWREDWGRLTGVAAATPRPDPEDG AG
SLPRIEDTLFALFRVPELLAPNGDLYHIFAWVLDVLADRLLPMHLFVLDYDQSPVGCRDALLRLTAGM
IP
TRVTTAGSIAEIRDLARTFAREVGGV*
gi | 125438 | sp | P04407 | KITH_HSV23 THYMIDINE KINASE
[SEQ ID NO: 152] >contig15 (start 15005 - stop 16069) translated
VLRWDVRQGLGGPQHLPVSHRLGDVDDIVARPQGLHQLRGGGGLPHPVGSVYINPQQRGQLRIPAGF
GG PLAMARTGRRAAVGRPARTSSLTERRRVLLAGVRSHTRFYKAFAREVREFNATRICGTLLTLMSGSLQ
GR
SLFEATRVTLICEVDLGPRRPDCICVFEFANDKTLGGVCVILELKTCKSISSGDTASKREQRTTGMKQ
LR
HSLKLLQSLAPPGDKWYLCPILVFVAQRTLRVSRVTRLVPQKISGNITAAVRMLQSLSTYAVPPEPQ TR
RSRRRVAATARPQRPPSPTRDPEGTAGHPAPPESDPPSPGWGVAAEGGGVLQKIAALFCVPVAAKSR
PR
TKTE*
gi|136854|sp|P10208|UL24_HSV11 PROTEIN UL24
[SEQ ID NO:153] >contig15 (start 16350 - stop 18107) translated
MDPYYPFDALDVWEHRRFIVADSRSFITPEFPRDFWMLPVFNIPRETAAERAAVLQAQRTAAAAALEN
AA LQAAELPVDIERRIRPIEQQVHHIADALEALETAAAAAEEADAARDAEARGEGAADGAAPSPTAGPAA
AE
MEVQIVRNDPPLRYDTNLPVDLLHMVYAGRGAAGSSGWFGTWYRTIQERTIADFPLTTRSADFRDGR
MS
KTFMTALVLSLQSCGRLYVGQRHYSAFECAVLCLYLLYRTTHESSPDRDRAPVAFGDLLARLPRYLAR LA AVIGDESGRPQYRYRDDKLPKAQFAAAGGRYEHGALATHWIATLVRHGVLPAAPGDVPRDTSTRVNP
DD
VAHRDDVNRAAAAFLARGHNLFLWEDQTLLRATANTITALAVLRRLLANGNVYADRLDNRLQLGMLIP
GA
VPAEAIARGASGLDSGAIKSGDNNLEALCVNYVLPLYQADPTVELTQLFPGLAALCLDAQAGRPLAST
RR
WDMSSGARQAALVRLTALELINRTRTNTTPVGEIINAHDALGIQYEQGLGLLAQQARIGLASNAKRF
AT
FNVGSDYDLLYFLCLGFIPQYLSVA*
gi|136863|sp|P10209|UL25_HSV11 VIRION PROTEIN UL25
[ SEQ ID NO : 154 ] >contigl5 ( start 18328 - stop 20256 ) translated
VRVPMASAEMRERLEAPLPDRAVPIYVAGFLALYDSGDPGELALDPDTVRAALPPENPLPINVDHRAR CE
VGRVLAWNDPRGPFFVGLIACVQLERVLETAASAAIFERRGPALSREERLLYLITNYLPSVSLSTKR
RG
DEVPPDRTLFAHVALCAIGRRLGTIVTYDTSLDAAIAPFRHLDPATREGVRREAAEAELALAGRTWAP
GV EALTHTLLSTAVNNMMLRDRWSLVAERRRQAGIAGHTYLQASEKFKIWGAESAPAPERGYKTGAPGAM
DT
SPAASVPAPQVAVRARQVASSSSΞSΞSFPAPADMNPVSASGAPAPPPPGDGSYLWIPAFHYNQLVTGQ
SA
PHHPPLTACGLPAAGTVAYGHPGAGPΞPHYPPPPAHPYPGMLFAGPSPLEAQIAALVGAIAADRQAGG LP
AAAGDHGIRGSAKRRRHEVEQPEYDCGRDEPDRDFPYYPGEARPEPRPVDSRRAARQASGPHETITAL
VG
AVTSLQQELAHMRARTHAPYGPYPPVGPYHHPHADTETPAQPPRYPAEAVYLPPPHIAPPGPPLSGAV
PP PSYPPVAVTPGPAPPLHQPSPAHAHPPPPPPGPTPPPAASLPQPEAPGAEAGALVNASSAAHVNVDTA
RA
ADLFVSQMMGSR*
>gi 1529230 UL26 [Herpes simplex virus type 1]
[SEQ ID NO: 155] = Contig 16 Length: 11707 Type: N Check: 605^
[SEQ ID NO:156] >contig16 (start 190 - stop 2) translated
MEAPGIVWVEESVSAITLYAVWLPPRTRDCLHALLYLVCRDAAGEARARFAEVSVGSSXXXXX
gi|136802|sp|P10192|HEPA_HSV11 DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:157] >contig16 (start 2855 - stop 240) translated
MAETMNVATCTHQTHHAARAPGATSAPGAASGDPLGARRPIGDDECEQYTSSVSLARMLYGGDLAEWVPRVHPKTTIERQQHGPVTFPDASAPTARCVTVVRAPMGSGKTTALIRWLGEAIHSPDTSVLWSCRRSFTQTLATRFAESGLPDFVTYFSSTNYIMNDRPFHRLIVQVESLHRVGPNLLNNYDVLVLDEVMSTLGQLYSPTMQQLGRVDALMLRLLRTCPRIIAMDATANAQLVDFLCSLRGEKNVHWIGEYAMPGFSARRCLFLPRLGPEVLQAALRPPGPAGGAPPPDAPPDATFFGELEARLAGGDNVCIFSSTVSFAEWARFCRQFTDRVLLLHSLTPPGDVTTWGRYRWIYTTWTVGLSFDPPHFDSMFAYVKPMNYGPDMVSVYQSLGRVRTLRKGELLIYMDGSGARSEPVFTPMLLNHWSASGQWPAQFSQVTNLLCRRFKGRCDASHADAAQARGSRIYSKFRYKHFERCTLACLADSLNILHMLLTLNCMHVRFWGHDAALTPRNFCLFLRGIHFDALRAQRDLRELRCQDPDTSLSAQAAETEEVGLFVEKYLRPDVAPAEWALMRGLNSLVGRTRFIYLVLLEACLRVPMAAHSSAIFRRLYDHYATGVIPTINAAGELELVALHPTLNVAPVWELFRLCSTMAACLQWDSMAGGSGRTFSPEDVLELLNPHYDRYMQLVFELGHCNVTDGPLLSEDAVKRVADALSGCPPRGSVSETEHALSLFKIIWGELFGVQLAKSTQTFPGAGRVKNLTKRAIVELLDAHRIDHSACRTHRQLYALLMAHKREFAGARFKLRAPAWGRCLRTHASGAQPNTDIILEAALSELPTEAWPMMQGAVNFSTL*
gi|136806|sp|P10193|OBP_HSV11 ORIGIN OF REPLICATION BINDING
[SEQ ID NO:158] >contig16 (start 2707 - stop 4137) translated
VYCSHSSSPMGRRAPRGSPEAAPGADVAPGARAAWWVWCVQVATFIVSAICWGLLVLASVFRDRFPCLYAPATSYAEANATVEVRGGVAVPLRLDTQSLLATYAITSTLLLAAAVYAAVGAVTSRYERALDAARRLAAARMAMPHATLIAGNVCAWLLQITVLLLAHRISQLAHLIYVLHFACLVYLAAHFCTRGVLSGTYLRQVHGLIDPAPTHHRIVGPVRAVMTNALLLGTLLCTAAAAVSLNTIAALNFNFSAPSMLICLTTLFALLWSLLLWEGVLCHYVRVLVGPHLGAIAATGIVGLACEHYHTGGYYWEQQWPGAQTGVRVALALVAAFALAMAVLRCTRAYLYHRRHHTKFFVRMRDTRHRAHSALRRVRSSMRGSRRGGPPGDPGYAETPYASVSHHAEIDRYGDSDGDPIYDEVAPDHEAELYARVQRPGPVPDAEPIYDTVEGYAPRSAGEPVYSTVRRW*
gi|136810|sp|P04288|VGLM_HSV11 GLYCOPROTEIN
[SEQ ID NO:159] >contig16 (start 4621 - stop 4331) translated
MGLAFSGARPCCCRHNVIITDGGEWSLTAHEFDWDIESEEEGNFYVPPDMRWTRAPGPQYRRASDPPSRHTRRRDPDVARPPATLTPPLSDSE*
gi|136816|sp|P13294|UL11_HSV2 HYPOTHETICAL UL11 PROTEIN
[SEQ ID NO:160] >contig16 (start 6399 - stop 4537) translated
MAAAATPGAKRPADPARDPDSPPKRPRPNSLDLATVFGPRPAPPRPTSPGAPGSHWPQSPPRGQPDGGAPGEKARPASPALSEASSGPPTPDIPLSPGGAHAIDPDCSPGPPDPDPMWSASAIPNALPPHILAETFERHLRGLLRGVRSPLAIGPLWARLDYLCSLWSLEAAGMVDRGLGRHLWRLTRRAPPSAAEAVAPRPLMGFYEAATQNQADCQLWALLRRGLTTASTLRWGAQGPCFSSQWLTHNASLRLDAQSSAVMFGRVNEPTARNLLFRYCVGRADAGVNDDADAGRFVFHQPGDLAEENVHACGVLMDGHTGMVGASLDILVCPRDPHGYLAPAPQTPLAFYEVKCRAKYAFDPADPGAPAASAYEDLMARRSPEAFRAFIRSIPNPGVRYFAPGRVPGPEEALVTQDRDWLDSRAAGEKRRCSAPDRALVELNSGWSEVLLFGVPDLERRTISPVAWSSGELVRREPIFANPRHPNFKQILVQGYVLDSHFPDCPLQPHLVTFLGRHRAGAEEGVTFRLEDGRGAPAGRGGAPGPAKASILPDQAVPIALIITPVRVEPGIYRDIRRNSRLAFDDTLAKLWASRSPGRGPAAADTTSSSPTAGRSSR*
gi|119694|sp|P06489|EXON_HSV2 ALKALINE EXONUCLEASE
[SEQ ID NO:161] >contig16 (start 8023 - stop 6440) translated
VGGRRPGGRMDESGRQRPASHVAADISPQGAHRRSFKAWLASYIHSLSRRASGRPSGPSPRDGAVSGARPGSRRRSSFRERLRAGLSRWRVSRSSRRRSSPEAPGPAAKLRRPPLRRSETAMTSPPSPPSHILSLARIHKLCIPVFAVNPALRYTTLEIPGARSFGGSGGYGEVQLIREHKLAVKTIREKEWFAVELVATLLVGECALRGGRTHDIRGFITPLGFSLQQRQIVFPAYDMDLGKYIGQLASLRATTPSVATALHHCFTDLARAWFLNTRCGISHLDIKCANVLVMLRSDAVSLRRAVLADFSLVTLNSNSTISRGQFCLQEPDLESPRGFGMPAALTTANFHTLVGHGYNQPPELLVKYLNNERAEFNNRPLKHDVGLAVDLYALGQTLLELLVSVYVAPSLGVPVTRVPGYQYFNNQLSPDFAVALLAYRCVLHPALFVNSAETNTHGLAYDVPEGIRRHLRNPKIRRAFTEQCINYQRTHKAVLSSVSLPPELRPLLVLVSRLCHANPAARHSLS*
gi|125628|sp|P04290|KR2_HSV11 PROBABLE SERINE/THREONINE-PRO
[SEQ ID NO:162] >contig16 (start 8409 - stop 7750) translated
MSRDASHAALRRRLAETHLRAEVYRDQTLQLHREGVSTQDPRFVGAFMAAKAAHLELEARLKSRARLEMMRQRATCVKIRVEEQAARRDFLTAHRRYLDPALSERLDAADDRLADQEEQLEEAAANASLWGDGDLADGWMSPGDSDLLVMWQLTSAPKVHTDAPSRPGSRPTYTPSAAGRPDAQAAPPPETAPSPEPAPGPAADPASGSGFARDCPDGE*
gi|136823|sp|P04291|UL14_HSV11 HYPOTHETICAL UL14 PROTEIN
[SEQ ID NO:163] >contig16 (start 8295 - stop 9788) translated
VYSRPPGVAAGSGPCTPRPGGASRPNVGAGPRGWRLGSSRRPRARPTSDSFAPTPLTSAAPASPAMFGQQLASDVQQYLERLEKQRQQKVGVDEASAGLTLGGDALRVPFLDFATATPKRHQTWPGVGTLHDCCEHSPLFSAVARRLLFNSLVPAQLRGRDFGGDHTAKLEFLAPELVRAVARLRFRECAPEDAVPQRNAYYSVLNTFQALHRSEAFRQLVHFVRDFAQLLKTSFRASSLAETTGPPKKRAKVDVATHGQTYGTLELFQKMILMHATYFLAAVLLGDHAEQVNTFLRLVFEIPLFSDTAVRHFRQRATVFLVPRRHGKTWFLVPLIALSLASFRGIKIGYTAHIRKATEPVFDEIDACLRGWFGSSRVDHVKGETISFSFPDGSRSTIVFASSHNTNVSTPSSRGACFPGAALPEIDRQTNTARRECGTTRPQPPPPWRGEALLFICNRTMRLWPRPARPRGSSLQTGGWYTMTERRGATRRWSGG*
gi|139646|sp|P04295|VTER_HSV11 PROBABLE DNA PACKAGING PROTEIN
[SEQ ID NO:164] >contig16 (start 10626 - stop 9661) translated
VWRWRGDERLKIFRCLTVLTEPLCQVALPDPDPERALFCEIFLYLTRPKALRLPSNTFFAIFFFNRERRYCATVHLRSVTHPRTPLLCTLAFGHLEAASPPEETPDPAAEQLADEPVAHELDGAYLVPTEPPPNPGACCALGPGAWWHLPGGRIYCWAMDDDLGSLCPPGSRARHLGWLLSRITDPPGGGGACAPTAHIDSANALWRAPAVAEACPCVAPCMWSNMAQRTLAVRGDASLCQLLFGHPVDAVILRQATRRPRITAHLHEWVGRDGAESVIRPTSAGWRLCVLSSYTSRLFATSCPAVARAVARASSSDYK*
gi|136829|sp|P10200|UL16_HSV11 PROTEIN UL16
[SEQ ID NO:165] >contig16 (start 11723 - stop 10881) translated
LTEACAAERWRPHQLSPAAQTALLRRFPALEGPLRHPRPVLQPFDIAAEVAFVARIQIACLRALGHSIRAALQGGPRIFQRLRYDFGPHQSEWLGEVTRRFPVLLENLMRALEGTAPDAFFHTAYALAVLAHLGGQGGRGRRRRLVPLSDDIPARFADSDAHYAFDYYSTSGDTLRLTNRPIAWIDGDVNGREQSKCRFMEGSPSTAPHRVCEQYLPGESYAYLCLGFNRRLCGLWFPGGFAFTINTAAYLSLADPVARAVGLRFCRGAATGPGLVR
gi|136835|sp|P10201|UL17_HSV11 PROTEIN UL17
[SEQ ID NO:166] = Contig ID 17 Length: 732 Type: N Check: 3911
[SEQ ID NO:167] >contig17 (start 747 - stop 1) translated
PAASPLEPLGDPTLWRALYACVLAALERQTGPVALFVPLRLGWDPQTGLWRVERASWGPPAAPRAALLDVEAKVDVDPLALAARVAEHPGARLAWARLAAIRDSPQCASSASLAVTITTRTARFAREYTTLAFPPTSKEGAFADLVEVCEVGLRPRGHPQRVTARVLLPRGYDYFVSAGDGFSAPALVALFRQWHTTVHAAPGALAPVFAFLGAGFDVRGGPVQYFAVLGFPGWPTFTVPAAAXXXXX
gi|136802|sp|P10192|HEPA_HSV11 DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:168] Contig ID 18 Length: 3006 Type: N Check: 6117
[SEQ ID NO:169] >contig18 (start 2 - stop 673) translated
XXXXXALEREQRAADRAAGGGAGRPAEADLLRADYDIIDVSKSMDDDTYVANSFQHQYIPAYGQDLERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGAIASFVAPYFEYVLRAPRAGALITGSDVILGEEELWEAVFKKTRLQTYLTDVAALFVADVQHAALPRPPSPTPADFRASASPRGGSRSRTRTRSRSPGRTPRGAPDQGWGVERRDGRPHARR*
gi|136794|sp|P10190|UL06_HSV11 VIRION PROTEIN UL6
[SEQ ID NO:170] >contig18 (start 612 - stop 1538) translated
VRRTRAGASNAGMADPTPADEGTAAAILKQAIAGDRSLVEVAEGISNQALLRMACEVRQVSDRQPRFTATSVLRVDVTPRGRLRFVLDGSSDDAYVASEDYFKRCGDQPTYRGFAVWLTANEDHVHSLAVPPLVLLHRLSLFRPTDLRDFELVCLLMYLENCPRSHATPSLFVKVSAWLGWARHASPFERVRCLLLRSCHWILNTLMCMAGVKPFDDELVLPHWYMAHYLLANNPPPVLSALFCATPQSSALQLPGPVPRTDCVAYNPAGVMGSCWKSKDLRSALVYWWLSGSPKRRTSSLFYRFC*
gi|136798|sp|P10191|UL07_HSV11 PROTEIN UL7
[SEQ ID NO:171] >contig18 (start 3021 - stop 1795) translated
ACLGAWPAVGARWLPPRAWPAVASEAAGRLLPAFREAVARWHPTATTIQLLDPPAAVGPVWTARFCFSGLQAQLLAALAGLGEAGLPEARGRAGLERLDALVAAAPSEPWARAVLERLVPDACDACPALRQLLGGVMAAVCLQIEQTASSVKFAVCGGTGAAFWGLFNVDPGDADAAHGAIQDARRALEASVRAVLSANGIRPRLAPSLALEGVYTHWTWSQTGAWFWNSRDDTDFLQGFPLRGPAYAAAAEVMRDALRRILRRPAAGPPEEAVCAARGIMEDACDRFVLDAFGRRLDAEYWSVLTPPGEADDPLPQTAFRGGALLDAEQYWRRWRVCPGGGESVGVPVDLYPRPLVLPPVDCAHHLREILREIQLVFTGVLEGVWGEGGSFVYPFEEKMRFLFP*
gi|136802|sp|P10192|HEPA_HSV11 DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO: 172] Contig ID 2 Length: 429 Type: N Check: 5672
[SEQ ID NO:173] Contig ID 3 Length: 15901 Type: N Check: 1337
[SEQ ID NO:174] >contig3 (start 1547 - stop 2791) translated
MADIPPDPPALNTTPANHAPPSPPPGSRKRRRPVLPSSSESEGKPDTESESSSTESSEDEAGDLRGGRRRSPRELGGRYFLDLSAESTTGTESEGTGPSDDDDDDASDGWLVDTPPRKSKRPRINLRLTSSPDRRAGVVFPEVWRNDRPIRAAQPQAPAQSSGDRAAAPRRSARQAQMRSGAAWTLDLHYIRQCVNQLFRILRAAPNPPGSANRLRHLVRDCYLMGYCRTRLGPRTWGRLLQISGGTWDVRLRNAIREVEARFEPAAEPVCELPCLNARRYGPECDVGNLETNGGSTSDDEISDATDSDDTLASHSDTEGGPSPAGRENPESASGGAIAARLECEFGTFDWTSEEGSQPWLSAWADTSSAERSGLPAPGACRATEAPEREDGCRKMRFPAACPYPCGHTFLRP*
gi|124184|sp|P04485|IE68_HSV11 IMMEDIATE-EARLY PROTEIN IE68
[SEQ ID NO:175] >contig3 (start 3848 - stop 2973) translated
MGVVVVSVVTLLDQRNALPRTSADASPALWSFLLRQCRILASEPLGTPVVVRPANLRRLAEPLMDLPKFTRPIVRTRSCRCPPNTTTGLFAEDDPLESIEILDAPACFRLLHQERPGPHRLYHLWWGAADLCVPFLEYAQKTRLGFRFIAMKTNDAWVGEPWPLPDRFLPERTVSWTPFPAAPNHPLENLLSRYEYQYGVWPGDRERSCLRWLRSLVAPHNKPRPASSRPHPATHPTQRPCFTCMGRPEIPDEPSWQTGDDDPQNPGPPLAVGDEWPPSSHVCYPITNL*
gi|137125|sp|P13292|US02_HSV2 PROTEIN US2
[SEQ ID NO:176] >contig3 (start 4044 - stop 5579) translated
VGGCVDKLPLLKTPGPVARGARWLARATRRMACRKFCGVYRRPDKRQEASVPPETNTAPAFPASTFYTPAEDAYLAPGPPETIHPSRPPSPGEAARLCQLQEILAQMHSDEDYPIVDAAGAEEEDEADDDAPDDVAYPEDYAEGRFLSMVSAAPLPGASGHPPVPGRAAPPDVRTCDSGKVGATGFTPEELDTMDREALRAISRGCKPPSTLAKLVTGLGFAIHGALIPGSEGCVFDSSHPNYPHRVIVKAGWYASTNHEARLLRRLNHPAILPLLDLHVVSGVTCLVLPKYHCDLYTYLSKRPSPLGHLQITAVSRQLLSAIDYVHCEGIIHRDIKTENILINTPENICLGDFGAACFVRGCRSSPFHYGIAGTIDTNAPEVLAGDPYTQVIDIWSAGLVIFETAVHTASLFSAPRDPERRPCDNQIARIIRQAQVHVDEFPTHAESRLTAHYRSRAAGNNRPAWTRPAWTRYYKIHTDVEYLICKALTFDAALRPSAAELLRLPLFHPK*
gi|125617|sp|P13287|KR1_HSV2 SERINE/THREONINE-PROTEIN KINASE
[SEQ ID NO:177] >contig3 (start 8255 - stop 8368) translated
VGGLCLMILGMACLLEVLRRLGRELARCCPHAGQFAP*
gi|137132|sp|P13293|VGLJ_HSV2 GLYCOPROTEIN J
[SEQ ID NO:178] >contig3 (start 8791 - stop 9993) translated
VCIAYHGMGRLTSGVGTAALLWAVGLRWCAKYALADPSLKMADPNRFRGKNLPVLDQLTDPPGVKRVYHIQPSLEDPFQPPSIPITVYYAVLERACRSVLLHAPSEAPQIVRGASDEARKHTYNLTIAWYRMGDNCAIPITVMEYTECPYNKSLGVCPIRTQPRWSYYDSFSAVSEDNLGFLMHAPAFETAGTYLRLVKINDWTEITQFILEHRARASCKYALPLRIPPAACLTSKAYQQGVTVDSIGMLPRFIPENQRTVALYSLKIAGWHGPKPPYTSTLLPPELSDTTNATQPELVPEDPEDSALLEDPAGTVSSQIPPNWHIPSIQDVAPHHAPAAPSNPGLIIGALAGSTLAVLVIGGIAFWVRRRAQMAPKRLRLPHIRDDDAPPSHQPLFY*
gi|138234|sp|P03172|VGLD_HSV2 GLYCOPROTEIN D PRECURSOR
[SEQ ID NO:179] >contig3 (start 10012 - stop 11313) translated
VYLWARVGGWLGYLGGTWTPHKGSLEGGKLGQFIGRERGARTAVPTISHRAHSHLDPSDPGMPGRSLQGLAILGLWVCATGLWRGPTVSLVSDSLVDAGAVGPQGFVEEDLRVFGELHFVGAQVPHTNYYDGIIELFHYPLGNHCPRWHWTLTACPRRPAVAFTLCRSTHHAHSPAYPTLELGLARQPLLRVRTATRDYAGLYVLRVWVGSATNASLFVLGVALSANGTFVYNGSDYGSCDPAQLPFSAPRLGPSSVYTPGASRPTPPRTTTSPSSPRDPTPAPGDTGTPAPASGERAPPNSTRSASESRHRLTVAQVIQIAIPASIIAFVFLGSCICFIHRCQRRYRRPRGQIYNPGGVSCAVNEAAMARLGAELRSHPNTPPKPRRRSSSSTTMPSLTSIAEESEPGPWLLSVSPRPRSGPTAPQEV*
gi|138328|sp|P06764|VGLI_HSV23 GLYCOPROTEIN I
[SEQ ID NO:180] >contig3 (start 11632 - stop 12984) translated
MARGAGLVFFVGVWVVSCLAAAPRTSWKRVTSGEDVVLLPAPAGPEERTRAHKLLWAAEPLDACGPLRPSWVALWPPRRVLETWDAACMRAPEPLAIAYSPPFPAGDEGLYSELAWRDRVAWNESLVIYGALETDSGLYTLSWGLSDEARQVASWLWEPAPVPTPTPDDYDEEDDAGVSERTPVSVPPPTPPRRPPVAPPTHPRVIPEVSHVRGVTVHMETPEAILFAPGETFGTNVSIHAIAHDDGPYAMDWWMRFDVPSSCAEMRIYEACLYHPQLPECLSPADAPCAVSSWAYRLAVRSYAGCSRTTPPPRCFAEARMEPVPGLAWLASTVNLEFQHASPQHAGLYLCWYVDDHIHAWGHMTISTAAQYRNAWEQHLPQRQPEPVEPTRPHVRAPPPAPSARGPLRLGAVLGAALLLAALGLSAWGVHDLLAQALLAGG*
gi|138240|sp|P04488|VGLE_HSV11 GLYCOPROTEIN E PRECURSOR
[SEQ ID NO:181] >contig3 (start 13431 - stop 13568) translated
VALHAVDAPSQFVTWLAVRWLRGAVGLGAVLCGIAFYVTSIARGA*
gi|1944544|gnl|PID|e312381 US8A
[SEQ ID NO:182] >contig3 (start 13668 - stop 13937) translated
MTSRPADQDSVRSSASVPLYPAASPVPAEAYYSESEDEAANDFLVRMGRQQSVLRRRRRRTRCVGLVIACLWALLSGGFGALLVWLLR*
gi|135568|sp|P06481|TEGP_HSV11 TEGUMENT PHOSPHOPROTEIN US9
[SEQ ID NO:183] >contig3 (start 15333 - stop 14425) translated
MIRRRGNVEIRVYYESVRPSRSRSHLKPSDHQEFPGHHVSPGSPGFPESPGNREFHDLPENPGSRAYPGTRDPHDPHGCPGSLDPHGNPAQPAGLPSPVPYAPLGSPDPSSPRQRTYVLPRVGIRNAPASDTRAPKRAHSRHRADRPPESPGSELYPLNAQALAHLQMLPADHRAFFRTVIEVSRLCALNTHDPPPPLAGARVGQEAQLVHTQWLRANRESSPLWPWRTAAMNFIAAAAPCVQTHRHMHDLLMACAFWCCLAHASTCSYAGLYSAHCQHLFRAFGCGPPVLTTSRGQGGWCN*
gi|137138|sp|P06486|US10_HSV11 VIRION PROTEIN US10
[SEQ ID NO: 184] = Contig ID 4 Length: 179 Type: N Check: 5124
[SEQ ID NO: 185] = Contig ID 5 Length: 2117 Type: N Check: 9467
[SEQ ID NO:186] >contig5 (start 1020 - stop 1) translated
MLNDMQWLASSDSEEETEVGISDDDLHRDSTSEAGSTDTEMFEAGLMDAATPPARPPAERQGSPTPADAQGSCGGGPVGEEEAEAGGGGDVCAVCTDEIAPPLRCQSFPCLHPFCIPCMKTWIPLRNTCPLCNTPVAYLIVGVTASGSFSTIPIVNDPRTRVEAEAAVRSGTAVDFIWTGNPRTAPRSLSLGGHTVRALSPTPPWPGTDDEDDDLADGEGGRGSGTGRGSGTGRGSGTGRGSGTGRGSGGGRAGVGHWAGVGRGXGTNRGFPSLSPSAADYVPPAPRRAPRRGGGGAGATRGTSQPAATRPAPPGAPRSSSSGGAPLRAGVGSGSXXXXX
gi|124135|sp|P28284|ICP0_HSV2H TRANS-ACTING TRANSCRIPTIONAL
[SEQ ID NO: 187] = Contig 6 Length: 643 Type: N Check: 5042
[SEQ ID NO:188] = Contig 7 Length: 354 Type: N Check: 9326
[SEQ ID NO: 189] = Contig 8 Length: 6387 Type: N Check: 4794
[SEQ ID NO:190] >contig8 (start 3 - stop 1454) translated
XXXXXTRRICARGPALPPGGLAVGGQMYVNRNEIFNAALAVTNIILDLDIALKEPVPFPRLHEALGHFRRGALAAVQLLFPAARVDPDAYPCYFFKSACRPRAPPVCAGDGPSAGGDDGDGDWFPDAGGDDGDEEWEEDTDPMDTTHGPLPDDEAAYLDLLHEQIPAATPSEPDSWCSCADKIGLRVCLPVPAPYWHGSLTMRGVARVIQQAVLLDRDFVEAVGSHVKNFLLIDTGVYAHGHSLRLPYFAKIGPDGSACGRLLPVFVIPPACEDVPAFVAAHADPRRFHFHAPPMFSAAPREIRVLHSLGGDYVSFFEKKASRNALEHFGRRETLTEVLGRYDVRPDAGETVEGFASELLGRIVACIEAHFPEHAREYQAVSVRRAVIKDDWVLLQLIPGRGALNQSLSCLRFKHGRASRATARTFLALSVGTNNRLCASLCQQCFATKCDNNRLHTLFTVDAGTPCSRSAPSSTSRPSSS*
gi|136939|sp|P10236|UL52_HSV11 DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:191] >contig8 (start 1406 - stop 2422) translated
MLAVRSLQHLTTVIFITAYGLVLAWYIVFGASPLHRCIYAVRPAGAHNDTALVWMKINQTLLFLGPPTAPPGGAWTPHAHVCYANIIEGRAVSLPAIPGAMSRRVMNVHEAVNCLEALWDTQMRLVWGWFLYLAFVALHQRRCMFGWSPAHSMVAPATYLLNYAGRIVSSVFLQYPYTKITRLLCELSVQRQTLVQLFEADPVTFLYHRPAVGVIVGCELLLRFVALGLIVGTALISRGACAITYPLFLTITTWCFVSIIALTELYFILRRDSAPKNAEPAAPRGRSKGWSGVCGRCCSIILSGIAVRLCYIAWAGWLMALRYEQEIQRRLFDL*
gi|116105|sp|P22485|CELF_HSV2H CELL FUSION PROTEIN PRECURSOR
[SEQ ID NO:192] >contig8 (start 2752 - stop 4506) translated
VTPDGEGQGGVSESRPRSCGYKGSHRPTGRCVLPCADPGCASVPLLDSDPATLFRHAPPRRTPAIPAPATYNMATDIDMLIDLGLDLSDSELEEDALERDEEGRRDDPESDSSGECSSSDEDMEDPCGDGGAEAIDAAIPKGPPARPEDAGTPEASTPRPAARRGADDPPPATTGVWSRLGTRRSASPREPHGGKVARIQPPSTKAPHPRGGRRGRRRGRGRYGPGGADSTPNPRRRVSRNAHNQGGRHPASARTDGPGATHGEARRGGEQLDVSGGPRPRGTRQAPPPLMALSLTPPHADGRAPVPERKAPSADTIDPAVRAVLRSISERAAVERISESFGRSALVMQDPFGGMPFPAANSPWAPVLATQAGGFDAETRRVSWETLVAHGPSLYRTFAANPRAASTAKAMRDCVLRQENLIEALASADETLAWCKMCIHHNLPLRPQDPIIGTAAAVLENLATRLRPFLQCYLKARGLCGLDDLCSRRRLSDIKDIASFVLVILARLANRVERGVSEIDYTTVGVGAGETMHFYIPGACMAGLIEILDTHRQECSSRVCELTASHTIAPLYVHGKYFYCNSLF*
gi|124181|sp|P28276|IE63_HSV2H TRANSCRIPTIONAL REGULATOR IE
[SEQ ID NO:193] >contig8 (start 4638 - stop 5282) translated
MWGPGPARFIARPGTHGRRVFTDPPPRNMTTTPLSNLFLRAPDITHVAPPYCLNATWQAENALHTTKTDPACLAARSYLVRASCSTSGPIHCFFFAVYKDSQHSLPLVTELRNFADLVNHPPVLRELEDKRGGRLRCTGPFSCGTIKDVSGASPAGEYTINGIVYHCHCRYPFSKTCWLGASAALQHLRSISSSGTAARAAEQRRHKIKIKIKV*
gi|136947|sp|P28281|UL55_HSV2H PROTEIN UL55
[SEQ ID NO:194] >contig8 (start 5808 - stop 5455) translated
MIGAHPGVGGDLPSGLPTYAEATSDRPPTYAMVMAACPTEPPGGSVGPADQPRVQSSRTWRPPLVNSRELYRAQRAARCASSSDTPQAPGWCGGTCRHAVFGWAWWIILAFLWR*
gi|136952|sp|P28282|UL56_HSV2H PROTEIN UL56
[SEQ ID NO:195] = Contig 9 Length: 3700 Type: N Check: 8257
[SEQ ID NO:196] >contig9 (start 2 - stop 355) translated
XXXXXGGHAAAGLTELCQTLAPRDLTDPLLFAYVGFQWNHGLMFWPDIAVYAMLGGAVWISLTQVLGLRRRLHKDPDAGPWAAATLRGLFFSVYALGFAAGVLVRPRMAASRRSG*
gi|136909|sp|P10227|UL43_HSV11 MEMBRANE PROTEIN UL43
[SEQ ID NO:197] >contig9 (start 453 - stop 2099) translated
MGAGVPWTGIKARGAGGPITVRVLGWEVAQKATHPCCSCPREAWSGNPPRCAGRAHRSFAGAGALLVMALGRVGLAVGLWGLLWVGVVVVLANASPGRTITVGPRGNASNAAPSASPRNASAPRTTPTPPQPRKATKSKASTAKPAPPPKTGPPKTSSEPVRCNRHDPLARYGSRVQIRCRFPNSTRTESRLQIWRYATATDAEIGTAPSLEEVMVNVSAPPGGQLVYDSAPNRTDPHVIWAEGAGPGASPRLYSWGPLGRQRLIIEELTLETQGMYYWVWGRTDRPSAYGTWVRVRVFRPPSLTIHPHAVLEGQPFKATCTAATYYPGNRAEFVWFEDGRRVFDPAQIHTQTQENPDGFSTVSTVTSAAVGGQGPPRTFTCQLTWHRDSVSFSRRNASGTASVLPRPTITMEFTGDHAVCTAGCVPEGVTFAWFLGDDSSPAEKVAVASQTSCGRPGTATIRSTLPVSYEQTEYICRLAGYPDGIPVLEHHGSHQPPPRDPTERQVIRAVEGAGIGVAVLVAWLAGTAWYLTHASSVRYRRLR*
gi|138220|sp|P06475|VGLC_HSV23 GLYCOPROTEIN C PRECURSOR
[SEQ ID NO:198] >contig9 (start 2266 - stop 2847) translated
VGSKRLRKRAPRPDIQARGGAMAFRASGPAYQPLAPAASPARARVPAVAWIGVGAIVGAFALVAALVLVPPRSSWGLSPCDSGWQEFNAGCVAWDPTPVEHEQAVGGCSAPATLIPRAAAKHLAALTRVQAERSSGYWVNGDGIRTCLRLVDSVSGIDEFCEELAIRICYYPRSPGGFVRFVTSIRNALGLP*
gi|136917|sp|P06483|UL45_HSV23 PROTEIN UL45 HOMOLOG
[SEQ ID NO:199] >contig9 (start 3716 - stop 3114) translated
QRPAAAARPLAAQREAAGVYDAVRTWGPDAEAEPDQMENTYLLPDDDAAMPAGVGLGATPAADTTAAAWPAESHAPRAPSEDADSIYESVSEDGGRVYEEIPWVRVYENICLRRQDAGGAAPPGDAPDSPYIEAENPLYDWGGSALFSPPGATRAPDPGLSLSPMPARPRTNALANDGPTNVAALSALLTKLKRGRHQSH*
gi|114350|sp|P10230|ATI2_HSV11 ALPHA TRANS-INDUCING FACTOR
TABLE 3
[SEQ ID NO: 200] = Contig ID 2
[SEQ ID NO: 201] = Contig ID 3
[SEQ ID NO: 202] = Contig ID 4
[SEQ ID NO: 203] = Contig ID 5
[SEQ ID NO:204] = Contig ID 7
[SEQ ID NO: 205] = Contig ID 12
[SEQ ID NO: 206]
ORF # = 1 from Contig ID 12
ORF start site = 120
ORF end site = 1371
ORF sequence:
MADIPPDPPALNTTPANHAPPSPPPGSRKRRRPVLPSSSESEGKPDTESESSSTESSEDEAGDLRGGRRRSPRELGGRYFLDLSAESTTGTESEGTGPSDDDDDDASDGWLVDTPPRKSKRPRINLRLTSSPDRRAGVVFPEVWRNDRPIRAAQPQAPAQSSGDRAAAPRRSARQAQMRSGAAWTLDLHYIRQCVNQLFRILRAAPNPPGSANRLRHLVRDCYLMGYCRTRLGPRTWGRLLQISGGTWDVRLRNAIREVEARFEPAAEPVCELPCLNARRYGPECDVGNLETNGGSTSDDEISDATDSDDTLASHSDTEGGPSPAGRENPESASGGAIAARLECEFGTFDWTSEEGSQPWLSAWADTSSAERSGLPAPGACRATEAPEREDGCRKMRFPAACPYPCGHTFLRP*
Gene matched: gi|124184|sp|P04485|IE68_HSV11 Gene name: IMMEDIATE-EARLY PROTEIN IE68
[SEQ ID NO:207]
ORF # = 2 from Contig 12
ORF start site = 2428
ORF end site = 1553
ORF sequence:
MGVVVVSVVTLLDQRNALPRTSADASPALWSFLLRQCRILASEPLGTPVVVRPANLRRLAEPLMDLPKFTRPIVRTRSCRCPPNTTTGLFAEDDPLESIEILDAPACFRLLHQERPGPHRLYHLWWGAADLCVPFLEYAQKTRLGFRFIAMKTNDAWVGEPWPLPDRFLPERTVSWTPFPAAPNHPLENLLSRYEYQYGVWPGDRERSCLRWLRSLVAPHNKPRPASSRPHPATHPTQRPCFTCMGRPEIPDEPSWQTGDDDPQNPGPPLAVGDEWPPSSHVCYPITNL*
Gene matched: gi|137125|sp|P13292|US02_HSV2 Gene name: PROTEIN US2 «gi|419137|pir||A4
[SEQ ID NO:208]
ORF # = 3 from Contig 12
ORF start site = 2714
ORF end site = 4159
ORF sequence:
MACRKFCGVYRRPDKRQEASVPPETNTAPAFPASTFYTPAEDAYLAPGPPETIHPSRPPSPGEAARLCQLQEILAQMHSDEDYPIVDAAGAEEEDEADDDAPDDVAYPEDYAEGRFLSMVSAAPLPGASGHPPVPGRAAPPDVRTCDSGKVGATGFTPEELDTMDREALRAISRGCKPPSTLAKLVTGLGFAIHGALIPGSEGCVFDSSHPNYPHRVIVKAGWYASTNHEARLLRRLNHPAILPLLDLHVVSGVTCLVLPKYHCDLYTYLSKRPSPLGHLQITAVSRQLLSAIDYVHCEGIIHRDIKTENILINTPENICLGDFGAACFVRGCRSSPFHYGIAGTIDTNAPEVLAGDPYTQVIDIWSAGLVIFETAVHTASLFSAPRDPERRPCDNQIARIIRQAQVHVDEFPTHAESRLTAHYRSRAAGNNRPAWTRPAWTRYYKIHTDVEYLICKALTFDAALRPSAAELLRLPLFHPK*
Gene matched: gi|125617|sp|P13287|KR1_HSV2 Gene name: SERINE/THREONINE-PROTEIN KINAS
[SEQ ID NO: 209]
ORF # = 4 from Contig 12
ORF start site = 6835
ORF end site = 6948
ORF sequence:
VGGLCLMILGMACLLEVLRRLGRELARCCPHAGQFAP*
Gene matched: gi|137132|sp|P13293|VGLJ_HSV2 Gene name: GLYCOPROTEIN J«gi|419140|pir|
[SEQ ID NO:210]
ORF # = 5 from Contig 12
ORF start site = 7392
ORF end site = 8573
ORF sequence:
MGRLTSGVGTAALLWAVGLRWCAKYALADPSLKMADPNRFRGKNLPVLDQLTDPPGVKRVYHIQPSLEDPFQPPSIPITVYYAVLERACRSVLLHAPSEAPQIVRGASDEARKHTYNLTIAWYRMGDNCAIPITVMEYTECPYNKSLGVCPIRTQPRWSYYDSFSAVSEDNLGFLMHAPAFETAGTYLRLVKINDWTEITQFILEHRARASCKYALPLRIPPAACLTSKAYQQGVTVDSIGMLPRFIPENQRTVALYSLKIAGWHGPKPPYTSTLLPPELSDTTNATQPELVPEDPEDSALLEDPAGTVSSQIPPNWHIPSIQDVAPHHAPAAPSNPGLIIGALAGSTLAVLVIGGIAFWVRRRAQMAPKRLRLPHIRDDDAPPSHQPLFY*
Gene matched: gi|138234|sp|P03172|VGLD_HSV2 Gene name: GLYCOPROTEIN D PRECURSOR
[SEQ ID NO:211]
ORF # = 6 from Contig 12
ORF start site = 8775
ORF end site = 9893
ORF sequence:
MPGRSLQGLAILGLWVCATGLWRGPTVSLVSDSLVDAGAVGPQGFVEEDLRVFGELHFVGAQVPHTNYYDGIIELFHYPLGNHCPRWHWTLTACPRRPAVAFTLCRSTHHAHSPAYPTLELGLARQPLLRVRTATRDYAGLYVLRVWVGSATNASLFVLGVALSANGTFVYNGSDYGSCDPAQLPFSAPRLGPSSVYTPGASRPTPPRTTTSPSSPRDPTPAPGDTGTPAPASGERAPPNSTRSASESRHRLTVAQVIQIAIPASIIAFVFLGSCICFIHRCQRRYRRPRGQIYNPGGVSCAVNEAAMARLGAELRSHPNTPPKPRRRSSSSTTMPSLTSIAEESEPGPWLLSVSPRPRSGPTAPQEV*
Gene matched: gi|138328|sp|P06764|VGLI_HSV23 Gene name: GLYCOPROTEIN IΛAgi|73722|pir|
[SEQ ID NO:212]
ORF # = 7 from Contig 12
ORF start site = 10212
ORF end site = 11858
ORF sequence:
MARGAGLVFFVGVWVVSCLAAAPRTSWKRVTSGEDVVLLPAPAGPEERTRAHKLLWAAEPLDACGPLRPSWVALWPPRRVLETWDAACMRAPEPLAIAYSPPFPAGDEGLYSELAWRDRVAWNESLVIYGALETDSGLYTLSWGLSDEARQVASWLWEPAPVPTPTPDDYDEEDDAGVSERTPVSVPPPTPPRGPPVAPPTHPRVIPEVSHVRGVTVHMETPEAILFAPGETFGTNVSIHAIAHDDGPYAMDWWMRFDVPSSCAEMRIYEACLYHPQLPECLSPADAPCAVSSWAYRLAVRSYAGCSRTTPPPRCFAEARMEPVPGLAWLASTVNLEFQHASPQHAGLYLCWYVDDHIHAWGHMTISTAAQYRNAWEQHLPQRQPEPVEPTRPHVRAPPPAPSARGPLRLGAVLGAALLLAALGLSAWACMTCWRRRSWRAVKSRASATGPTYIRVADSELYADWSSDSEGERDGSLWQDPPERPDSPSTNGSGFEILSPTAPSVYPHSEGRKSRRPLTTFGSGSPGRRHSQASYSSVLW*
Gene matched: gi|138240|sp|P04488|VGLE_HSV11 Gene name: GLYCOPROTEIN E PRECURSOR
[SEQ ID NO:213]
ORF # = 8 from Contig 12
ORF start site = 12010
ORF end site = 12147
ORF sequence:
VALHAVDAPSQFVTWLAVRWLRGAVGLGAVLCGIAFYVTSIARGA*
Gene matched: gi|1944544|gnl|PID|e312381 Gene name: (X14112) US8A [human herpesvirus
[SEQ ID NO:214]
ORF # = 9 from Contig 12
ORF start site = 12247
ORF end site = 12516
ORF sequence:
MTSRPADQDSVRSSASVPLYPAASPVPAEAYYSESEDEAANDFLVRMGRQQSVLRRRRRRTRCVGLVIACLWALLSGGFGALLVWLLR*
Gene matched: gi|135568|sp|P06481|TEGP_HSV11 Gene name: TEGUMENT PHOSPHOPROTEIN US9
[SEQ ID NO:215]
ORF # = 10 from Contig 12
ORF start site = 13912
ORF end site = 13004
ORF sequence:
MIRRRGNVEIRVYYESVRPSRSRSHLKPSDHQEFPGHHVSPGSPGFPESPGNREFHDLPENPGSRAYPGTRDPHDPHGCPGSLDPHGNPAQPAGLPSPVPYAPLGSPDPSSPRQRTYVLPRVGIRNAPASDTRAPKRAHSRHRADRPPESPGSELYPLNAQALAHLQMLPADHRAFFRTVIEVSRLCALNTHDPPPPLAGARVGQEAQLVHTQWLRANRESSPLWPWRTAAMNFIAAAAPCVQTHRHMHDLLMACAFWCCLAHASTCSYAGLYSAHCQHLFRAFGCGPPVLTTSRGQGGWCN*
Gene matched: gi|137138|sp|P06486|US10_HSV11 Gene name: VIRION PROTEIN US10
[SEQ ID NO: 216]
ORF # = 11 from Contig 12
ORF start site = 15899
ORF end site = 16582
ORF sequence:
MSAEQRKKKKTTTTTQGRGAEVAMADEDGGRLRAAAETTGGPGSPDPADGPPPTPNPDRRPAARPGFGWHGGPEENEDEDDDAAADADADEAAPASGEAVDEPAADGWSPRQLALLASMVDEAVRTIPSPPPERDGAEEEAARSPSPPRTPSMCADYGEENDDDDDDDDRDAGRWVRGPENDVRGPRGVPGPHGQPVAATPGAPPTPPPPPPPPPPARPPPALDRL*
Gene matched: gi|124141|sp|P08392|ICP4_HSV11 Gene name: TRANS-ACTING TRANSCRIPTIONAL
[SEQ ID NO:217] = Contig ID 15
[SEQ ID NO:218]
ORF # = 1 from Contig 15
ORF start site = 755
ORF end site = 1297
ORF sequence:
MRTPADDVSWRYEAPSVIDYARIDGIFLRYHCPGLDTFLWDRHAQRAYLVNPFLFAAGFLEDLSHSVFPADTQETTTRRALYKEIRDALGSRKQAVSHAPVRAGCVNFDYSRTRRCVGRRDLRPANTTSTWEPPVSSDDEASSQSKPLATQPPVLALSNAPPRRVSPTRGRRRHTRLRRN*
Gene matched: gi|136776|sp|P28278|VGLL_HSV2H Gene name: GLYCOPROTEIN L PRECURSOR»gi|
[SEQ ID NO:219]
ORF # = 2 from contig 15
ORF start site = 1170
ORF end site = 2174
ORF sequence:
MKRARSRSPSPPSRPSSPFRTPPHGGSPRREVGAGILASDATSHVCIASHPGSGAGYPTRLAAGSAVQRRRPRGCPPGVMFSASTTPEQPLGLSGDATPPLPTSVPLDWAAFRRAFLIDDAWRPLLEPELANPLTARLLAEYDRRCQTEEVLPPREDVFSWTRYCTPDDVRWIIGQDPYHHPGQAHGLAFSVRADVPVPPSLRNVLAAVKNCYPDARMSGRGCLEKWARDGVLLLNTTLTVKRGAAASHSKLGWDRFVGGWRRLAARRPGLVFMLWGAHAQNAIRPDPRQHYVLKFSHPSPLSKVPFGTCQHFLAANRYLETRDIMPIDWSV*
Gene matched: gi|137037|sp|P10186|UNG_HSV11 Gene name: URACIL-DNA GLYCOSYLASE
[SEQ ID NO:220]
ORF # = 3 from Contig 15
ORF start site = 2229
ORF end site = 2930
ORF sequence:
MVKSRVSYRSVMSGVGEERVPSAFTILASWGWTFAPQNHDPGASPNTTPIESIAGTAPDAHVGPLDGEPDRDAISPLTSSVAGDPPGADGPYVTFDTLFMVSSIDELGRRQLTDTIRKDLRLSLAKFSIACTKTSSFSGTAARQRKRGAPPQRTCVPRSNKSLQMFVLCKRANAAQVREQLRAVIRSRKPRKYYTRSSDGRLCPAVPVFVHEFVSSEPMRLHRDNVMLSTEPD*
Gene matched: gi|136782|sp|P28279|UL03_HSV2H Gene name: PROTEIN UL3
[SEQ ID NO:221]
ORF # = 4 from contig 15
ORF start site = 3735
ORF end site = 3130
ORF sequence:
MGNPQTTIAYSLHHPRASLTSALPDAAQWHVFESGTRAVLTRGRARQDRLPRGGWIQHTPIGLLVIIDCRAEFCAYRFIGRASTQRLERWWDAHMYAYPFDSWVSSSHGESVRSATAGILTWWTPDTIYITATIYGTAPEAARGCDNAPLDVRPTTPPAPVSPTAGEFPANTTDLLVEVLREIQISPTLDDADPTPGT*
Gene matched: gi|136788|sp|P28280|UL04_HSV2H Gene name: PROTEIN UL4«gi|73890|pir||WM
[SEQ ID NO:222]
ORF # = 5 from Contig 15
ORF start site = 6447
ORF end site = 3802
ORF sequence:
MAASGGEGSRDVRAPGPPPQQPGARPAVRFRDEAFLNFTSMHGVQPIIARIRELSQQQLDVTQVPRLQWFRDVAALEVPTGLPLREFPFAAYLITGNAGSGKSTCVQTLNEVLDCVVTGATRIAAQNMYVKLSGAFLSRPINTIFHEFGFRGNHVQAQLGQHPYTLASSPASLEDLQRRDLTYYWEVILDITKRALAAHGGEDARNEFHALTALEQTLGLGQGALTRLASVTHGALPAFTRSNIIVIDEAGLLGRHLLTTWYCWWMINALYHTPQYAGRLRPVLVCVGSPTQTASLESTFEHQKLRCSVRQSENVLTYLICNRTLREYTRLSHSWAIFINNKRCVEHEFGNLMKVLEYGLPITEEHMQFVDRFWPESYITNPANLPGWTRLFSSHKEVSAYMAKLHAYLKVTREGEFVVFTLPVLTFVSVKEFDEYRRLTQQPTLTMEKWITANASRITNYSQSQDQDAGHVRCEVHSKQQLWARNDITYVLNSQVAVTARLRKMVFGFDGTFRTFEAVLRDDSFVKTQGETSVEFAYRFLSRLMFGGLIHFYNFLQRPGLDATQRTLAYGRLGELTAELLSLRRDAAGASATRAADTSDRSPGERAFNFKHLGPRDGGPDDFPDDDLDVIFAGLDEQQLDVFYCHYALEEPETTAAVHAQFGLLKRAFLGRYLILRELFGEVFESAPFSTYVDNFRGCELLTGSPRGGLMSVALQTDNYTLMGYTYTRVFAFAEELRRRHATAGVAEFLEESPLPYIVLRDQHGFMSWNTNISEFVESIDSTELAMAINADYGISSKLAMTITRSQGLSLDKVAICFTPGNLRLNSAYVAMSRTTSSEFLHMNLNPLRERHERDDVISEHILSALRDPNWIVY*
Gene matched: gi|122809|sp|P10189|HELI_HSV11 Gene name: PROBABLE HELICASE
[SEQ ID NO:223]
ORF # = 7 from Contig 15
ORF start site = 8457
ORF end site = 9347
ORF sequence:
MADPTPADEGTAAAILKQAIAGDRSLVEVAEGISNQALLRMACEVRQVSDRQPRFTATSVLRVDVTPRGRLRFVLDGSSDDAYVASEDYFKRCGDQPTYRGFAVWLTANEDHVHSLAVPPLVLLHRLSLFRPTDLRDFELVCLLMYLENCPRSHATPSLFVKVSAWLGWARHASPFERVRCLLLRSCHWILNTLMCMAGVKPFDDELVLPHWYMAHYLLANNPPPVLSALFCATPQSSALQLPGPVPRTDCVAYNPAGVMGSCWKSKDLRSALVYWWLSGSPKRRTSSLFYRFC*
Gene matched: gi|136798|sp|P10191|UL07_HSV11 Gene name: PROTEIN UL7
[SEQ ID NO: 224]
ORF # = 8 from Contig 15
ORF start site = 11855
ORF end site = 9604
ORF sequence:
MEAPGIVWVEESVSAITLYAVWLPPRTRDCLHALLYLVCRDAAGEARARFAEVSVGSSDLQDFYGSPDVSAAGAVAAARAAPAASPLEPLGDPTLWRALYACVLAALERQTGPVALFVPLRLGWDPQTGLWRVERASWGPPAAPRAALLDVEAKVDVDPLALAARVAEHPGARLAWARLAAIRDSPQCASSASLAVTITTRTARFAREYTTLAFPPTSKEGAFADLVEVCEVGLRPRGHPQRVTARVLLPRGYDYFVSAGDGFSAPALVALFRQWHTTVHAAPGALAPVFAFLGPGFEVRGGPVQYFAVLGFPGWPTFTVPAAAAAESARDLVRGAAATHAACLGAWPAVGARWLPPRAWPAVASEAAGRLLPAFREAVARWHPTATTIQLLDPPAAVGPVWTARFCFSGLQAQLLAALAGLGEAGLPEARGRAGLERLDALVAAAPSEPWARAVLERLVPDACDACPALRQLLGGVMAAVCLQIEQTASSVKFAVCGGTGAAFWGLFNVDPGDADAAHGAIQDARRALEASVRAVLSANGIRPRLAPSLALEGVYTHWTWSQTGAWFWNSRDDTDFLQGFPLRGPAYAAAAEVMRDALRRILRRPAAGPPEEAVCAARGIMEDACDRFVLDAFGRRLDAEYWSVLTPPGEADDPLPQTAFRGGALLDAEQYWRRWRVCPGGGESVGVPVDLYPRPLVLPPVDCAHHLREILREIQLVFTGVLEGVWGEGGSFVYPFEEKMRFLFP*
Gene matched: gi|136802|sp|P10192|HEPA_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:225]
ORF # = 10 from Contig 15
ORF start site = 14399
ORF end site = 15802
ORF sequence:
MGRRAPRGSPEAAPGADVAPGARAAWWVWCVQVATFIVSAICWGLLVLASVFRDRFPCLYAPATSYAEANATVEVRGGVAVPLRLDTQSLLATYAITSTLLLAAAVYAAVGAVTSRYERALDAARRLAAARMAMPHATLIAGNVCAWLLQITVLLLAHRISQLAHLIYVLHFACLVYLAAHFCTRGVLSGTYLRQVHGLIDPAPTHHRIVGPVRAVMTNALLLGTLLCTAAAAVSLNTIAALNFNFSAPSMLICLTTLFALLWSLLLWEGVLCHYVRVLVGPHLGAIAATGIVGLACEHYHTGGYYWEQQWPGAQTGVRVALALVAAFALAMAVLRCTRAYLYHRRHHTKFFVRMRDTRHRAHSALRRVRSSMRGSRRGGPPGDPGYAETPYASVSHHAEIDRYGDSDGDPIYDEVAPDHEAELYARVQRPGPVPDAEPIYDTVEGYAPRSAGEPVYSTVRRW*
Gene matched: gi|136810|sp|P04288|VGLM_HSV11 Gene name: GLYCOPROTEIN M
[SEQ ID NO: 226]
ORF # = 11 from Contig 15
ORF start site = 16286
ORF end site = 15996
ORF sequence:
MGLAFSGARPCCCRHNVIITDGGEWSLTAHEFDWDIESEEEGNFYVPPDMRWTRAPGPQYRRASDPPSRHTRRRDPDVARPPATLTPPLSDSE*
Gene matched: gi|136816|sp|P13294|UL11_HSV2 Gene name: HYPOTHETICAL UL11 PROTEIN
[SEQ ID NO:227]
ORF # = 12 from Contig 15
ORF start site = 18064
ORF end site = 16202
ORF sequence:
MAAAATPGAKRPADPARDPDSPPKRPRPNSLDLATVFGPRPAPPRPTSPGAPGSHWPQSPPRGQPDGGAPGEKARPASPALSEASSGPPTPDIPLSPGGAHAIDPDCSPGPPDPDPMWSASAIPNALPPHILAETFERHLRGLLRGVRSPLAIGPLWARLDYLCSLWSLEAAGMVDRGLGRHLWRLTRRAPPSAAEAVAPRPLMGFYEAATQNQADCQLWALLRRGLTTASTLRWGAQGPCFSSQWLTHNASLRLDAQSSAVMFGRVNEPTARNLLFRYCVGRADAGVNDDADAGRFVFHQPGDLAEENVHACGVLMDGHTGMVGASLDILVCPRDPHGYLAPAPQTPLAFYEVKCRAKYAFDPADPGAPAASAYEDLMARRSPEAFRAFIRSIPNPGVRYFAPGRVPGPEEALVTQDRDWLDSRAAGEKRRCSAPDRALVELNSGWSEVLLFGVPDLERRTISPVAWSSGELVRREPIFANPRHPNFKQILVQGYVLDSHFPDCPLQPHLVTFLGRHRAGAEEGVTFRLEDGRGAPAGRGGAPGPAKASILPDQAVPIALIITPVRVEPGIYRDIRRNSRLAFDDTLAKLWASRSPGRGPAAADTTSSSPTAGRSSR*
Gene matched: gi|119694|sp|P06489|EXON_HSV2 Gene name: ALKALINE EXONUCLEASE«gi|33025
[SEQ ID NO:228]
ORF # = 13 from Contig 15
ORF start site = 19661
ORF end site = 18107
ORF sequence:
MDESGRQRPASHVAADISPQGAHRRSFKAWLASYIHSLSRRASGRPSGPSPRDGAVSGARPGSRRRSSFRERLRAGLSRWRVSRSSRRRSSPEAPGPAAKLRRPPLRRSETAMTSPPSPPSHILSLARIHKLCIPVFAVNPALRYTTLEIPGARSFGGSGGYGEVQLIREHKLAVKTIREKEWFAVELVATLLVGECALRGGRTHDIRGFITPLGFSLQQRQIVFPAYDMDLGKYIGQLASLRATTPSVATALHHCFTDLARAWFLNTRCGISHLDIKCANVLVMLRSDAVSLRRAVLADFSLVTLNSNSTISRGQFCLQEPDLESPRGFGMPAALTTANFHTLVGHGYNQPPELLVKYLNNERAEFNNRPLKHDVGLAVDLYALGQTLLELLVSVYVAPSLGVPVTRVPGYQYFNNQLSPDFAVALLAYRCVLHPALFVNSAETNTHGLAYDVPEGIRRHLRNPKIRRAFTEQCINYQRTHKAVLSSVSLPPELRPLLVLVSRLCHANPAARHSLS*
Gene matched: gi|125628|sp|P04290|KR2_HSV11
Gene name: PROBABLE SERINE/THREONINE-PROTEIN KINASE
[SEQ ID NO:229]
ORF # = 14 from Contig 15
ORF start site = 20074
ORF end site = 19415
ORF sequence:
MSRDASHAALRRRLAETHLRAEVYRDQTLQLHREGVSTQDPRFVGAFMAAKAAHLELEARLKSRARLEMMRQRATCVKIRVEEQAARRDFLTAHRRYLDPALSERLDAADDRLADQEEQLEEAAANASLWGDGDLADGWMSPGDSDLLVMWQLTSAPKVHTDAPSRPGSRPTYTPSAAGRPDAQAAPPPETAPSPEPAPGPAADPASGSGFARDCPDGE*
Gene matched: gi|136823|sp|P04291|UL14_HSV11 Gene name: HYPOTHETICAL UL14 PROTEIN
[SEQ ID NO:230]
ORF # = 15 from Contig 15
ORF start site = 20155
ORF end site = 21453
ORF sequence:
MFGQQLASDVQQYLERLEKQRQQKVGVDEASAGLTLGGDALRVPFLDFATATPKRHQTWPGVGTLHDCCEHSPLFSAVARRLLFNSLVPAQLRGRDFGGDHTAKLEFLAPELVRAVARLRFRECAPEDAVPQRNAYYSVLNTFQALHRSEAFRQLVHFVRDFAQLLKTSFRASSLAETTGPPKKRAKVDVATHGQTYGTLELFQKMILMHATYFLAAVLLGDHAEQVNTFLRLVFEIPLFSDTAVRHFRQRATVFLVPRRHGKTWFLVPLIALSLASFRGIKIGYTAHIRKATEPVFDEIDACLRGWFGSSRVDHVKGETISFSFPDGSRSTIVFASSHNTNVSTPSSRGACFPGAALPEIDRQTNTARRECGTTRPQPPPPWRGEALLFICNRTMRLWPRPARPRGSSLQTGGWYTMTERRGATRRWSGG*
Gene matched: gi|139646|sp|P04295|VTER_HSV11 Gene name: PROBABLE DNA PACKAGING PROTEIN
[SEQ ID NO:231]
ORF # = 16 from Contig 15
ORF start site = 22291
ORF end site = 21326
ORF sequence:
VWRWRGDERLKIFRCLTVLTEPLCQVALPDPDPERALFCEIFLYLTRPKALRLPSNTFFAIFFFNRERRYCATVHLRSVTHPRTPLLCTLAFGHLEAASPPEETPDPAAEQLADEPVAHELDGAYLVPTEPPPNPGACCALGPGAWWHLPGGRIYCWAMDDDLGSLCPPGSRARHLGWLLSRITDPPGGGGACAPTAHIDSANALWRAPAVAEACPCVAPCMWSNMAQRTLAVRGDASLCQLLFGHPVDAVILRQATRRPRITAHLHEVWGRDGAESVIRPTSAGWRLCVLSSYTSRLFATSCPAVARAVARASSSDYK*
Gene matched: gi|136829|sp|P10200|UL16_HSV11 Gene name: PROTEIN UL16
[SEQ ID NO:232]
ORF # = 17 from Contig 15
ORF start site = 24654
ORF end site = 22546
ORF sequence:
MNAHFANEVQYDLTRDPSSPASLIHVIISSECLAAAGVPLSALVRGRPDGGAAANFRVETQTRAHATGDCTPWRSAFAAYVPADAVGAILAPVIPAHPDLLPRVPSAGGLFVSLPVACDAQGVYDPYTVAALRLAWGPWATCARVLLFSYDELVPPNTRYAADGARLMRLCRHFCRYVARLGAAAPAAATEAAAHLSLGMGESGTPTPQASSVSGGAGPAWGTPDPPISPEEQLTAPGGDTATAEDVSITQENEEILALVQRAVQDVTRRHPVRARPKHAASGVASGLRQGALVHQAVSGGALGASDAEAVLAGLEPPGGGRFATPGGPRAAGEDVLNDVLTLVPGTAKPRSLVEWLDRGWEALAGGDRPDWLWSRRSISWLRHHYGTKQRFVWSYENSVAWGGRRARPPRLSSELATALTEACAAERWRPHQLSPAAQTALLRRFPALEGPLRHPRPVLQPFDIAAEVAFVARIQIACLRALGHSIRAALQGGPRIFQRLRYDFGPHQSEWLGEVTRRFPVLLENLMRALEGTAPDAFFHTAYALAVLAHLGGQGGRGRRRRLVPLSDDIPARFADSDAHYAFDYYSTSGDTLRLTNRPIAWIDGDVNGREQSKCRFMEGSPSTAPHRVCEQYLPGESYAYLCLGFNRRLCGLWFPGGFAFTINTAAYLSLADPVARAVGLRFCRGAATGPGLVR*
Gene matched: gi|136835|sp|P10201|UL17_HSV11 Gene name: PROTEIN UL17
[SEQ ID NO:233]
ORF # = 18 from Contig 15
ORF start site = 24684
ORF end site = 25955
ORF sequence:
VPEGAWVGGACARPRGPRAHVRLYAVCFVCPQGIRGQDFNLLFVDEANFIRPDAVQTIMGFLNQANCKIIFVSSTNTGKASTSFLYNLRGAADELLNWTYICDDHMPRWTHTNATACSCYILNKPVFITMDGAVRRTADLFLPDSFMQEIIGGQARETGDDRPVLTKSAGERFLLYRPSTTTNSGLMAPELYVYVDPAFTANTRASGTGIAWGRYRDDFIIFALEHFFLRALTGSAPADIARCWHSLAQVLALHPGAFRSVRVAVEGNSSQDSAVAIATHVHTEMHRILASAGANGPGPELLFYHCEPPGGAVLYPFFLLNKQKTPAFEYFIKKFNSGGVMASQELVSVTVRLQTDPVEYLSEQLNNLIETVSPNTDVRMYSGKRNGAADDLMVAVIMAIYLAAPTGIPPAFFPITRTS*
Gene matched: gi|139646|sp|P04295|VTER_HSV11 Gene name: PROBABLE DNA PACKAGING PROTEIN
[SEQ ID NO:234]
ORF # = 19 from Contig 15
ORF start site = 27251
ORF end site = 26295
ORF sequence:
MITDCFEADIAIPSGISRPDAAALQRCEGRWFLPTIRRQLALADVAHESFVSGGVSPDTLGLLLAYR RR
FPAVITRVLPTRIVACPVDLGLTHAGTVNLRNTSPVDLCNGDPVSLVPPVFEGQATDVRLESLDLTLR
FP
VPLPTPLAREIVARLVARGIRDLNPDPRTPGELPDLNVLYYNGARLSLVADVQQLASVNTELRSLVLN
MV YSITEGTTLILTLIPRLLALSAQDGYVNALLQMQSVTREAAQLIHPEAPMLMQDGERRLPLYEALVAW
LA
HAGQLGDILALAPAVRVCTFDGAAWQSGDMAPVIRYP*
Gene matched: gi | 139191 | sp| P10202 |VP23_HSV11 Gene name: CAPSID PROTEIN VP23
[SEQ ID NO:235]
ORF # = 21 from Contig 15 ORF start site = 32735
ORF end site = 32067
ORF sequence:
MTMRDDVPLLDRELVYEAACGGEDGELPLDEQFSLSSYGTSDFFVSSAYSRLPPHTQPVFSKRWMFA
WS FLVLKPLELVAAGMYYGWTGRAVAPACIIAAVLAYYVTWLARALLLYVNIKRDRLPLSPPVFWGLCVI
MG
GAALCALVAAAHETFSPDGLFHWITASQLLPRTDPLRARSLGIACAAGAAMWVAAADCFAAFTNFFLA
RF
WTRAILKAPVAF*
Gene matched: gi | 136841 | sp | P10204 | UL20_HSV11 Gene name: MEMBRANE PROTEIN UL20
[SEQ ID NO:236]
ORF # = 23 from Contig 15
ORF start site = 37721
ORF end site = 35205 ORF sequence:
MGPGLWWMGVLVGVAGGHDTYWTEQIDPWFLHGLGLARTYWRDTNTGRLWLPNTPDASDPQRGRLAP
PG
ELNLTTASVPMLRWYAERFCFVLVTTAEFPRDPGQLLYIPKTYLLGRPRNASLPELPEAGPTSRPPAE
VT QLKGLSHNPGASALLRSRAWVTFAAAPDREGLTFPRGDDGATERHPDGRRNAPPPGPPAGTPRHPTTN
LS
IAHLHNASVTWLAARGLLRTPGRYVYLSPSASTWPVGVWTTGGLAFGCDAALVRARYGKGFMGLVISM
RD
SPPAEIIWPADKTLARVGNPTDENAPAVLPGPPAGPRYRVFVLGAPTPADNGSALDALRRVAGYPEE ST
NYAQYMSRAYAEFLGEDPGSGTDARPSLFWRLAGLLASSGFAFVNAAHAHDAIRLSDLLGFLAHSRVL
AG
LAARGAAGCAADSVFLNVSVLDPAARLRLEARLGHLVAAILEREQSLAAHALGYQLAFVLDSPAAYGA
VA PSAARLIDALYAEFLGGRALTAPMVRRALFYATAVLRAPFLAGAPSAEQRERARRGLLITTALCTSDV
AA
ATHADLRAALARTDHQKNLFWLPDHFSPCAASLRFDLAEGGFILDALAMATRSDIPADVMAQQTRGVA
SV
LTRWAHYNALIRAFVPEATHQCSGPSHNAEPRILVPITHNASYWTHTPLPRGIGYKLTGVDVRRPLF IT
YLTATCEGHAREIEPKRLVRTENRRDLGLVGAVFLRYTPAGEVMSVLLVDTDATQQQLAQGPVAGTPN
VF
SSDVPSVALLLFPNGTVIHLLAFDTLPIATIAPGFLAASALGWMITAALAGILRWRTCVPFLWRRE
Gene matched: gi | 138315 | sp | P06477 |VGLH_HSV11 Gene name: GLYCOPROTEIN H PRECURSOR
[SEQ ID NO:237]
ORF # = 24 from Contig 15
ORF start site = 39188
ORF end site = 38058
ORF sequence: MASHAGQQHAPAFGQAARASGPTDGRAASRPSHRQGASEARGDPELPTLLRVYIDGPHGVGKTTTSAQ
LM
EALGPRDNIVYVPEPMTYWQVLGASETLTNIYNTQHRLDRGEISAGEAAWMTSAQITMSTPYAATDA
VL
APHIGGEAVGPQAPPPALTLVFDRHPIASLLCYPAARYLMGSMTPQAVLAFVALMPPTAPGTNLVLGV LP EAEHADRLARRQRPGERLDLAMLSAIRRVYDLLANTVRYLQRGGRWREDWGRLTGVAAATPRPDPEDG AG
SLPRIEDTLFALFRVPELLAPNGDLYHIFAWVLDVLADRLLPMHLFVLDYDQSPVGCRDALLRLTAGM IP TRVTTAGSIAEIRDLARTFAREVGGV*
Gene matched: gi | 125438 | sp | P04407 | KITH_HSV23 Gene name: THYMIDINE KINASE
[SEQ ID NO:238] ORF # = 25 from Contig 15 ORF start site = 39090 ORF end site = 39935 ORF sequence:
MARTGRRAAVGRPARTSSLTERRRVLLAGVRSHTRFYKAFAREVREFNATRICGTLLTLMSGSLQGRS LF EATRVTLICEVDLGPRRPDCICVFEFANDKTLGGVCVILELKTCKSISSGDTASKREQRTTGMKQLRH SL
KLLQSLAPPGDKWYLCPILVFVAQRTLRVSRVTRLVPQKISGNITAAVRMLQSLSTYAVPPEPQTRR SR
RRVAATARPQRPPSPTRDPEGTAGHPAPPESDPPSPGWGVAAEGGGVLQKIAALFCVPVAAKSRPRT KT E*
Gene matched: gi | 136854 | sp | P10208 | UL24_HSV11 Gene name: PROTEIN UL24
[SEQ ID NO: 239]
ORF # = 26 from Contig 15
ORF start site = 40216
ORF end site = 41973 ORF sequence:
MDPYYPFDALDVWEHRRFIVADSRSFITPEFPRDFWMLPVFNIPRETAAERAAVLQAQRTAAAAALEN
AA
LQAAELPVDIERRIRPIEQQVHHIADALEALETAAAAAEEADAARDAEARGEGAADGAAPSPTAGPAA
AE MEVQIVRNDPPLRYDTNLPVDLLHMVYAGRGAAGSSGWFGTWYRTIQERTIADFPLTTRSADFRDGR
MS
KTFMTALVLSLQSCGRLYVGQRHYSAFECAVLCLYLLYRTTHESSPDRDRAPVAFGDLLARLPRYLAR
LA
AVIGDESGRPQYRYRDDKLPKAQFAAAGGRYEHGALATHWIATLVRHGVLPAAPGDVPRDTSTRVNP DD VAHRDDVNRAAAAFLARGHNLFLWEDQTLLRATANTITALAVLRRLLANGNVYADRLDNRLQLGMLIP GA
VPAEAIARGASGLDSGAIKSGDNNLEALCVNYVLPLYQADPTVELTQLFPGLAALCLDAQAGRPLAST RR WDMSSGARQAALVRLTALELINRTRTNTTPVGEIINAHDALGIQYEQGLGLLAQQARIGLASNAKRF AT FNVGSDYDLLYFLCLGFIPQYLSVA*
Gene matched: gi | 136863 | sp | P10209 | UL25_HSV11 Gene name: VIRION PROTEIN UL25
[SEQ ID NO:240]
ORF # = 27 from Contig 15 ORF start site = 42206
ORF end site = 44179
ORF sequence:
MASAEMRERLEAPLPDRAVPIYVAGFLALYDSGDPGELALDPDTVRAALPPENPLPINVDHRARCEVG
RV LAWNDPRGPFFVGLIACVQLERVLETAASAAIFERRGPALSREERLLYLITNYLPSVSLSTKRRGDE
VP
PDRTLFAHVALCAIGRRLGTIVTYDTSLDAAIAPFRHLDPATREGVRREAAEAELALAGRTWAPGVEA
LT
HTLLSTAVNNMMLRDRWSLVAERRRQAGIAGHTYLQASEKFKIWGAESAPAPERGYKTGAPGAMDTSP AA
SVPAPQVAVRARQVASSSSSSSSFPAPADMNPVSASGAPAPPPPGDGSYLWIPAFHYNQLVTGQSAPH
HP
PLTACGLPAAGTVAYGHPGAGPSPHYPPPPAHPYPGMLFAGPSPLEAQIAALVGAIAADRQAGGLPAA
AG DHGIRGSAKRRRHEVEQPEYDCGRDEPDRDFPYYPGEARPEPRPVDSRRAARQASGPHETITALVGAV
TS
LQQELAHMRARTHAPYGPYPPVGPYHHPHADTETPAQPPRYPAEAVYLPPPHIAPPGPPLSGAVPPPS
YP
PVAVTPGPAPPLHQPSPAHAHPPPPPPGPTPPPAASLPQPEAPGAEAGALVNASSAAHVKRGHGPGRR SV
CVTDDGVPLTRLQDPDLGGVCVFIYFK*
Gene matched: gi | 139233 | sp| P10210 |VP40_HSV11 Gene name: CAPSID PROTEIN P40 (VIRION S
[SEQ ID NO:241] ORF # = 28 from Contig 15 ORF start site = 47298 ORF end site = 44584 ORF sequence:
MRGGGLICALWGALVAAVASAAPAAPAAPRASGGVAATVAANGGPASRPPPVPSPATTKARKRKTKK
PP
KRPEATPPPDANATVAAGHATLRAHLREIKVENADAQFYVCPPPTGATWQFEQPRRCPTRPEGQNYT EG
IAWFKENIAPYKFKATMYYKDVTVSQVWFGHRYSQFMGIFEDRAPVPFEEVIDKINAKGVCRSTAKY
VR
NNMETTAFHRDDHETDMELKPAKVATRTSRGWHTTDLKYNPSRVEAFHRYGTTVNCIVEEVDARSVYP
YD EFVLATGDFVYMSPFYGYREGSHTEHTSYAADRFKQVDGFYARDLTTKARATSPTTRNLLTTPKFTVA
WD
WVPKRPAVCTMTKWQEVDEMLRAEYGGSFRFSSDAISTTFTTNLTQYSLSRVDLGDCIGRDAREAIDR
MF
ARKYNATHIKVGQPQYYLATGGFLIAYQPLLSNTLAELYVREYMREQDRKPRNATPAPLREAPSANAS VE
RIKTTSSIEFARLQFTYNHIQRHVNDMLGRIAVAWCELQNHELTLWNEARKLNPNAIASATVGRRVSA
RM
LGDVMAVSTCVPVAPDNVIVQNSMRVSSRPGTCYSRPLVSFRYEDQGPLIEGQLGENNELRLTRDALE
PC TVGHRRYFIFGGGYVYFEEYAYSHQLSRADVTTVSTFIDLNITMLEDHEFVPLEVYTRHEIKDSGLLD
YT
EVQRRNQLHDLRFADIDTVIRADANAAMFAGLCAFFEGMGDLGRAVGKWMGWGGWSAVSGVSSFM
SN
PFGALAVGLLVLAGLVAAFFAFRYVLQLQRNPMKALYPLTTKELKTSDPGGVGGEGEEGAEGGGFDEA KL
AEAREMIRYMALVSAMERTEHKARKKGTSALLSSKVTNMVLRKRNKARYSPLHNEDEAGDEDEL*
Gene matched: gi | 138198 | sp | P06763 | VGLB_HSV23 Gene name: GLYCOPROTEIN B PRECURSOR • gi |
[SEQ ID NO:242] ORF # = 29 from Contig 15 ORF start site = 47122 ORF end site = 47338 ORF sequence:
WAGLGTGGGREAGPPFAATVAATPPEARGAAGAAGAADATAATSAPTTSAQIKPPPRMAGLRGRVAP AA R*
Gene matched: gi | 729379 | sp | P39055 | DYN1_CAEEL Gene name: DYNAMIN • gi | 456286 (L29031) d
[SEQ ID NO:243]
ORF # = 30 from Contig 15
ORF start site = 49662 ORF end site = 47305
ORF sequence:
MAAAPPAAVSEPTAARQKLLALLGQVQTYVFQLELLRRCDPQIGLGKLAQLKLNALQVRVLRRHLRPG
LE
AQAAAFLTPLSVTLELLLEYAWREGERLLGHLETFATTGDVSAFFTETMGLARPCPYHQQIRLETYGG DV
RMELCFLHDVENFLKQLNYCHLITPPSGATAALERVREFMVAAVGSGLIVPPELSDPSHPCAVCFEEL
CV
TANQGATIARRLADRICNHVTQQAQVRLDANELRRYLPHAAGLSDAARARALCVLDQALARTAAGGGA
RA GPPPADSSSVREEADALLEAHDVFQATTPGLYAISELRFWLASGDRARHSTMDAFADNLNALAQRELQ
QE
TAAVAVELALFGRRAEHFDRAFGGHLAALDMVDALIIGGQATSPDDQIEALIRACYDHHLTTPLLRRL
VS
PEQCDEEALRRVLARLGAGGATGGAEEEEPRAAAEEGGRRRGAGTPASEDGERGPEPGAQGPESWGDI AT
RAAADVPERRRLYADRLTKRSLASLGRCVREQRGELEKMLRVSVHGEVLPATFAAVANGFAARARFCA
LT
AGAGTVIDNRAAPGVFDAHRFMRASLLRHQVDPALLPSITHRFFELVNGPLFDHSTHSFAQPPNTALY
YS VENVGLLPHLKEELARFIMGAGGSGADWAVSEFQKFYCFDGVSGITPTQRAAWRYIRELIIATTLFAS
VY
RCGELELRRPDCSRPTSEGLYRYPPGVYLTYNSDCPLVAIVESGPDGCIGPRSVWYDRDVFSILYSV
LQ
HLAPRLAGGGSDAPP*
Gene matched: gi | 124088 | sp | P10212 | PRTP_HSV11 Gene name: PROCESSING AND TRANSPORT PRO
[SEQ ID NO:244] ORF # = 31 from Contig 15 ORF start site = 51666 ORF end site = 50035 ORF sequence:
MSLSLDPYTCGPCPLLQLLARRSNLAVYQDLALSQCHGVFAGQSVEGRNFRNQFQPVLRRRVMDLFNN GF
LSAKTLTVALSEGAAICAPSLTAGQTAPAESSFEGDVARVTLGFPKELRVKSRVLFAGASANASEAAK AR VASLQSAYQKPDKRVDILLGPLGFLLKQFHAVIFPNGKPPGSNQPNPQWFWTALQRNQLPARLLSRED IE
TIAFIKRFSLDYGAINFINLAPNNVSELAMYYMANQILRYCDHSTYFINTLTAVIAGSRRPPGVQAAA AW APQGGAGLEAGARALMDSLDAHPGAWTSMFASCNLLRPVMAARPMWLGLSISKYYGMAGNDRVFQAG NW
ASLLGGKNACPLLIFDRTRKFVLACPRAGFVCAASSLGGGAHEHSLCEQLRGIIAEGGAAVASSVFVA TV
KSLGPRTQQLQIEDWLALLEDEYLSEEMMEFTTRALERGHGEWSTDAALEVAHEAEALVSQLGAAGEV FN
FGDFGDEDDHAASFGGLAAAAGAAGVARKRAFHGDDPFGEGPPEKKDLTLDML*
Gene matched: gi | 544182 | sp | P36384 | DNBI_HSV2 Gene name: MAJOR DNA-BINDING PROTEIN (IN
[SEQ ID NO: 245]
ORF # = 32 from Contig 15 ORF start site = 53575
ORF end site = 51701
ORF sequence:
MDTKPKTTTTVKVPPGPMGYVYGRACPAEGLELLSLLSARSGDADVAVAPLIVGLTVESGFEANVAAV
VG SRTTGLGGTAVSLKLMPSHYSPSVYVFHGGRHLAPSTQAPNLTRLCERARRHFGFSDYAPRPCDLKHE
TT
GDALCERLGLDPDRALLYLVITEGFREAVCISNTFLHLGGMDKVTIGDAEVHRIPVYPLQMFMPDFSR
VI
ADPFNCNHRSIGENFNYPLPFFNRPLARLLFEAWGPAAVALRARNVDAVARAAAHLAFDENHEGAAL PA
DITFTAFEASQGKPQRGARDAGNKGPAGGFEQRLASVMAGDAALALESIVSMAVFDEPPPDITTWPLL
EG
QETPAARAGAVGAYLARAAGLVGAMVFSTNSALHLTEVDDAGPADPKDHSKPSFYRFFLVPGTHVAAN PQ LDREGHWPGYEGRPTAPLVGGTQEFAGEHLAMLCGFSPALLAKMLFYLERCDGGVIVGRQEMDVFRY VA
DSGQTDVPCNLCTFETRHACAHTTLMRLRARHPKFASAARGAIGVFGTMNSAYSDCDVLGNYAAFSAL KR ADGSENTRTIMQETYRAATERVMAELEALQYVDQAVPTALGRLETIIGTREALHTWNNIKQLV*
Gene matched: gi | 544182 | sp | P36384 | DNBI_HSV2 Gene name: MAJOR DNA-BINDING PROTEIN (IN
[SEQ ID NO:246]
ORF # = 33 from Contig 15
ORF start site = 54393
ORF end site = 58115 ORF sequence:
MFCAAGGPTSPGGKSAARAASGFFAPHNPRGATQTAPPPCRRQNFYNPHLAQTGTQPKAPGPAQRHTY
YS
ECDEFRFIAPRSLDEDAPAEQRTGVHDGRLRRAPKVYCGGDERDVLRVGPEGFWPRRLRLWGGADHAP
EG FDPTVTVFHVYDILEHVEHAYSMRAAQLHERFMDAITPAGTVITLLGLTPEGHRVAVHVYGTRQYFYM
NK
AEVDRHLQCRAPRDLCERLAAALRESPGASFRGISADHFEAEWERADVYYYETRPTLYYRVFVRSGR
AL
AYLCDNFCPAIRKYEGGVDATTRFILDNPGFVTFGWYRLKPGRGNAPAQPRPPTAFGTSSDVEFNCTA DN
LAVEGAMCDLPAYKLMCFDIECKAGGEDELAFPVAERPEDLVIQISCLLYDLSTTALEHILLFSLGSC
DL
PESHLSDLASRGLPAPWLEFDSEFEMLLAFMTFVKQYGPEFVTGYNIINFDWPFVLTKLTEIYKVPL
DG YGRMNGRGVFRVWDIGQSHFQKRSKIKVNGMVNIDMYGIITDKVKLSSYKLNAVAEAVLKDKKKDLSY
RD
IPAYYASGPAQRGVIGEYCVQDSLLVGQLFFKFLPHLELSAVARLAGINITRTIYDGQQIRVFTCLLR
LA
GQKGFILPDTQGRFRGLDKEAPKRPAVPRGEGERPGDGNGDEDKDDDEDGDEDGDEREEVARETGGRH VG
YQGARVLDPTSGFHVDPVWFDFASLYPSIIQAHNLCFSTLSLRPEAVAHLEADRDYLEIEVGGRRLF
FV
KAHVRESLLSILLRDWLAMRKQIRSRIPQSTPEEAVLLDKQQAAIKWCNSVYGFTGVQHGLLPCLHV
AA TVTTIGREMLLATRAYVHARWAEFDQLLADFPEAAGMRAPGPYSMRIIYGDTDSIFVLCRGLTAAGLV
AM
GDKMASHISRALFLPPIKLECEKTFTKLLLIAKKKYIGVICGGKMLIKGVDLVRKNNCAFINRTSRAL
VD
LLFYΌDTVSGAAAALAERPAEEWLARPLPEGLQAFGAVLVDAHRRITDPERDIQDFVLTAELSRHPRA YT
NKRLAHLTVYYKLMARRAQVPSIKDRIPYVIVAQTREVEETVARLAALRELDAAAPGDEPAPPAALPS
PA
KRPRETPSHADPPGGASKPRKLLVSELAEDPGYAIARGVPLNTDYYFSHLLGAACVTFKALFGNNAKI
TE SLLKRFIPETWHPPDDVAARLRAAGFGPAGAGATAEETRRMLHRAFDTLA*
Gene matched: gi | 118882 | sp | P07918 | DPOL_HSV21 Gene name: DNA POLYMERASE
[SEQ ID NO:247]
ORF # = 34 from Contig 15
ORF start site = 58977 ORF end site = 58060
ORF sequence:
MYDIAPRRSGSRPGPGRDKTRRRSRFSAAGNPGVERRASRKSLPSHARRLELCLHERRRYRGFFAALA
QT
PSEEIAIVRSLSVPLVKTTPVSLPFSLDQTVADNCLTLSGMGYYLGIGGCCPACSAGDGRLATVSREA LI
LAFVQQINTIFEHRTFLASLWLADRHSTPLQDLLADTLGQPELFFVHTILRGGGACDPRFLFYPDPT
YG
GHMLYVIFPGTSAHLHYRLIDRMLTACPGYRFAAHVWQSTFVLVVRRNAEKPADAEIPTVSAADIYCK
MR DISFDGGLMLEYQRLYATFDEFPPP*
Gene matched: gi | 136875 | sp | P10215 |UL31_HSV11 Gene name: PROTEIN UL31
[SEQ ID NO:248]
ORF # = 35 from Contig 15
ORF start site = 60760
ORF end site = 58970 ORF sequence:
MATSAPGVPSSAAVREESPGSSWKEGAFERPYVAFDPDLLALNEALCAELLAACHWGVPPASALDED
VE
SDVAPAPPRPRGAAREASGGRGPGSARGPPADPTAEGLLDTGPFAAASVDTFALDRPCLVCRTIELYK
QA YRLSPQWVADYAFLCAKCLGAPHCAASIFVAAFEFVYVMDHHFLRTKKATLVGSFARFALTINDIHRH
FF
LHCCFRTDGGVPGRHAQKQPRPTPSPGAAKVQYSNYSFLAQSATRALIGTLASGGDDGAGAGGGSGTQ
PS
LTTALMNWKDCARLLDCTEGKRGGGDSCCTRAAARNGEFEAAAGALAQGGEPETWAYADLILLLLAGT PA
VWESGPRLRAAADARRAAVSESWEAHRGARMRDAAPRFAQFAEPKAQPDLDLGPLMATVLKHGRGRGR
TG
GECLLCNLLLVRAYWLAMRRLRASWRYSENNTSLFDCIVPWDQLEADPEAQPGDGGRFVSLLRAAG
PE AIFKHMFCDPMCAITEMEVDPWVLFGHPRADHRDELQLHKAKLACGNEFEGRVCIALRALIYTFKTYQ
VF
VPKPTALATFVREAGALLRRHSISLLSLEHTLCTYV*
Gene matched: gi | 136879 | sp | P10216 | UL32_HSV11 Gene name: PROBABLE MAJOR ENVELOPE GLYC
[SEQ ID NO:249] ORF # = 36 from Contig 15 ORF start site = 60759 ORF end site = 61151 ORF sequence:
MAGRAGRTRPRTLRDAIPDCALRSQTLESLDARYVSRDGAGDAAVWFEDMTPAELEVIFPTTDAKLNY LS RTQRLASLLTYAGPIKAPDGPAAPHTQDTACVHGELLARKRERFAAVINRFLDLHQILRG*
Gene matched: gi | 136883 | sp | P10217 | UL33_HSV11 Gene name: UL33
[SEQ ID NO:250]
ORF # = 37 from Contig 15
ORF start site = 61241 ORF end site = 62071
ORF sequence:
MAGMGKPYGGRPGDAFEGLVQRIRLIVPTTLRGGGGESGPYSPSNPPSRCAFQFHGQDGSDEAFPIEY
VL
RLMNDWADVPCNPYLRVQNTGVSVLFQGFFNRPHGAPGGAITAEQTNVILHSTETTGLSLGDLDDVKG RL
GLDARPMMASMWISCFVRMPRVQLAFRFMGPEDAVRTRRILCRAAEQALARRRRSRRSQDDYGAVAVA
AA
HHSSGAPGPGVAASGPPAPPGRGPARPWHQAVQLFRAPRPGPPALLLLVAGLFLGAAIWWAVGARL*
Gene matched: gi | 136888 | sp | P10218 |UL34_HSV11 Gene name: VIRION PROTEIN UL34
[SEQ ID NO: 251]
ORF # = 38 from Contig 15
ORF start site = 62183
ORF end site = 62521
ORF sequence: MAAPQFHRPSTITADNVRALGMRGLVLATNNAQFIMDNSYPHPHGTQGAVREFLRGQAAALTDLGVTH
AN
NTFAPQPMFAGDAAAEWLRPSFGLKRTYSPFWRDPKTPSTP*
Gene matched: gi | 139196 | sp | P10219 | VP26_HSV11 Gene name: CAPSID PROTEIN VP26
[SEQ ID NO:252] ORF # = 39 from Contig 15
ORF start site = 72047
ORF end site = 62688
ORF sequence:
MIPAALPHPTMKRQGDRDIWTGVRNQFATDLEPGGSVSCMRSSLSFLSLLFDVGPRDVLSAEAIEGC LV
EGGEWTRAAAGSGPPRMCSIIELPNFLEYPAARGGLRCVFSRVYGEVGFFGEPTAGLLETQCPAHTFF
AG
PWAMRPLSYTLLTIGPLGMGLYRDGDTAYLFDPHGLPAGTPAFIAKVRAGDVYPYLTYYAHDRPKVRW
AG AMVFFVPSGPGAVAPADLTAAALHLYGASETYLQDEPFVERRVAITHPLRGEIGGLGALFVGWPRGD
GE
GSGPWPALPAPTHVQTPRADRPPEAPRGASGPPNTPQAGHPNRPPDDVWAAALEGTPPAKPSAPDAA
AS
GPPHAAPPPQTPAGDAAEEAEDLRVLEVGAVPVGRHRARYSTGLPKRRRPTWTPPSSVEDLTSGERPA PK
APPAKAKKKSAPKKKAPVAAEVPASSPTPIAATVPPAPDTPPQSGQGGGDDGPASPSSPSVLETLGAR
RP
PEPPGADLAQLFEVHPNVAATAVRLAARDAALAREVAACSQLTINALRSPYPAHPGLLELCVIFFFER VL AFLIENGARTHTQAGVAGPAAALLDFTLRMLPRKTAVGDFLASTRMSLADVAAHRPLIQHVLDENSQI GR
LALAKLVLVARDVIRETDAFYGDLADLDLQLRAAPPANLYARLGEWLLERSRAHPNTLFAPATPTHPE PL
LHRIQALAQFARGEEMRVEAEAREMREALDALARGVDSVSQRAGPLTVMPVPAAPGAGGRAPCPPALG PE
AIQARLEDVRIQARRAIESAIKEYFHRGAVYSAKALQASDSHDCRFHVASAAWPMVQLLESLPAFDQ
HT
RDVAQRAALPPPPPLATSPQAILLRDLLQRGQTLDAPEDLAAWLSVLTDAATQGLIERKPLEELARSI
HG INDQQARRSSGLAELQRFDALDAALAQQLDSDAAFVPATGPAPYVDGGGLSPEATRMAEDALRQARAM
EA
AKMTAELAPEARSRLRERAHALEAMLNDARERAKVAHDAREKFLHKLQGVLRPLPDFVGLKACPAVLA
TL
RASLPAGWTDLADAVRGPPPEVTAALRADLWGLLGQYREALEHPTPDTATALAGLHPAFVWLKTLFA DA
PETPVLVQFFSDHAPTIAKAVSNAINAGSAAVATASPAATVDAAVRAHGALADAVSALGAAARDPASP
LS
FLAALADSAAGYVKATRLALEARGAIDELTTLGSAAADLWQARRACAQPEGDHAALIDAAARATTAA
RE SLAGHEAGFGGLLHAEGTAGDHSPSGRALQELGKVIGATRRRADELEAAVADLTAKMAAQRARGSSER
WA
AGVEAALDRVENRAEFDWELRRLQALAGTHGYNPRDFRKRAEQALAANAEAVTLALDTAFAFNPYTP
EN QRHPMLPPLAAIHRLGWSAAFHAAAETYADMFRVDAEPLARLLRIAEGLLEMAQAGDGFIDYHEAVGR
LA
DDMTSVPGLRRYVPFFQHGYADYVELRDRLDAIRADVHRALGGVPLDLAAAAEQISAARNDPEATAEL
VR
TGVTLPCPSEDALVACAAALERVDQSPVKNTAYAEYVAFVTRQDTAETKDAVVRAKQQRAEATERVMA GL
REALAARERRAQIEAEGLANLKTMLKWAVPATVAKTLDQARSVAEIADQVEVLLDQTEKTRELDVPA
VI
WLEHAQRTFETHPLSAARGDGPGPLARHAGRLGALFDTRRRVDALRRSLEEAEAEWDEVWGRFGRVRG
GA WKSPEGFRAMHEQLRALQDTTNTVSGLRAQPAYERLSARYQGVLGAKGAERAEAVEELGARVTKHTAL
CA
RLRDEWRRVPWEMNFDALGRLLAEFDAAAADLAPWAVEEFRGARELIQRRMGLYSAYARAGGQTGAG
AA
AAPAPLLVDLRALDARARASSSPEGHEVDPQLLRRRGEAYLRAGGDPGPLVLREAVSALDLPFATSFL AP
DGTPLQYALCFPAVTDKLGALLMRPEAACVRPPLPTDVLESAPTVTAMYVLTWNRLQLALSDAQAAN
FQ
LFGRFVRHRQATWGASMDAAAELYVALVATTLTREFGCRWAQLGWASGAAAPRPPPGPRGSQRHCVAF
NE NDVLVALVAGVPEHIYNFWRLDLVRQHEYMHLTLERAFEDAAESMLFVQRLTPHPDARIRVLPTFLDG
GP
PTRGLLFGTRLADWRRGKLSETDPLAPWRSALELGTQRRDAPALGKLSPAQALAAVSVLGRMCLPSAA
LA
ALWTCMFPDDYTEYDSFDALLAARLESGQTLGPAGGREASLPEAPHALYRPTGQHVAVLAAATHRTPA AR
VTAMDLVLAAVLLGAPVWALRNTTAFSRESELELCLTLFDSRPGGPDAALRDWSSDIETWAVGLLH
TD
LNPIENACLAAQLPRLSALIAERPLADGPPCLVLVDISMTPVAVLWEAPEPPGPPDVRFVGSEATEEL
PF VATAGDVLAASAADADPFFARAILGRPFDASLLTGELFPGHPVYQRPLADEAGPSAPTAARDPRDLAG
GD
GGSGPEDPAAPPARQADPGVLAPTLLTDATTGEPVPPRMWAWIHGLEELASDDAGGPTPNPAPALLPP
PA
TDQSVPTSQYAPRPIGPAATARETRPSVPPQQNTGRVPVAPRDDPRPSPPTPSPPADAALPPPAFSGS AA
AFSAAVPRVRRSRRTRAKSRAPRASAPPEGWRPPALPAPVAPVAASARPPDQPPTPESAPPAWVSALP
LP
PGPASARGAFPAPTLAPIPPPPAEGAVAPGDDRRRGRRQTTAGPSPTPPRGPAAGPPRRLTRPAVASL
SA SLNSLPSPRDPADHAAAVSAAAAAVPPSPGLAPPTSAVQTSPPPLAPGPVAPSEPLCGWWPGGPVAR
RP
PPQSPATKPAARTRIRARSVPQPPLPQPPLPQPPLPQPPLPQPPLPQPPLPQPPLPQPPLPQPPLPQP
PL
PQPPLPQPPLPQSRDSVPTPESPTHTNTHLPVSAVTSWASSLALHVDSAPPPASLLQTLHISSDDEHS
DA
DSLRFSDSDDTEALDPLPPEPHLPPADEPPGPLAADHLQSPHSQFGPLPVQANAVLSRRYVRSTGRSA
LA
VLIRACRRIQQQLQRTRRALFQRSNAVLTSLHHVRMLLG*
Gene matched: gi | 135576 | sp | P10220 |TEGU_HSV11 Gene name: LARGE TEGUMENT PROTEIN (VIRI
[SEQ ID NO: 253]
ORF # = 40 from Contig 15
ORF start site = 75699
ORF end site = 72355
ORF sequence: MSDSALQVPAPAGMTPPSAPPPNGPLQVLLGSLTNLRRPPSPSSEPAGSADEPAFLSAAKLRAATAAF
LL
SGAAVGPAEARACWHPLLEQLCALHRAHGLPETALLAENLPGLLVHRMAVALPETPEAAFREMDVIKD
TV
LAITGSDTTHALEAAGLRTTAALGPVRVRQCAVEWIDRWRTVTQSCLAMNPRTSLEALGEMSLKMSPV PL
GQPGANLTTPAYSLLFPSPIVQEGLRFLALVSNWVTLFSAHLQRIDDAALTPLTRALFTLALVDEYLT
TP
DRGAWPPPLLAQFQHTVREIDPAIMIPPLEATKMVRSREEVRVSTALSRVSPRSACAPPGTLMARVR
TD AAVFDPDVPFLSASALAIFRPAVTGLLQLGEPPSAGAQQRLLALLQQTWALVQNSNSPSWINTLTDA
GF
TPAHCTQYISALEGFLVAGVPARTPPGHGLSEIQQLFGCIALAGANVFGLAREYGHYAGYVKTFRRIQ
GA
SEHTHGRLCEAVGLSGGVLSQTLARIMGPAVPTEHLASLRRTLVGEFETAERRFSAGQPSLLRETALI WL
DVYGQTHWDLTPTTPATPLSALLPVGPPSHAPSVHLAAATKIRFPALEGIHPNVLADPGFVPYVLALV
VG
DALRATCNAAYLPRPIEFALRVLAWARDFGLGYLPTVEGHRTKLGALITLLEPATRAGVGPTMQMADN
IE QLLRELYVIARGAVEQLRPAVQLPPPQPPEVGSSLLLISMYALAARGVLQELAERADPLVRQLEDAIV
LL
RLHMRTLAAFFECRFESDGHRLYAWADAHERLGPWRPEAMGDAVSQYCGMYHDAKRALVASLAGLRS
W
TETTAHLGVCDELAAQVSHEGNVLAWRREIHGFLAIVSGIHARASKLMSGDQVPGFCYMSQFLARWR RL SAGYQAARAATGPERVAEFVQELHDTWKGLQTERALWAPFASSADQRTAAIQEVMAHATEDAPPSPA AD
LVVLTNRHDLGAWGDYSLGPLGQPTVVPDSVDLSPQGLAATLSMDWLLINELLQVTDGVFRASAFRPS AG PEAPGDLEAQDAGGSTPEPTTPGPQDTQARAPSTRPAGRETVPWPNTPVEDDEMTPQETPPVHP*
Gene matched: gi | 136894 | sp | P10221 | V120_HSV11 Gene name: CAPSID ASSEMBLY PROTEIN UL37
[SEQ ID NO:254] ORF # = 42 from Contig 15 ORF start site = 78158 ORF end site = 81592 ORF sequence:
MANRPAASALAGARSPSERQEPREPEVAPPGGDHVFCRKVSGVMVLSSDPPGPAAYRISDSSFVQCGS NC SMIIDGDVARGHLRDLEGATSTGAFVAISNVAAGGDGRTAWALGGTSGPSATTSVGTQTSGEFLHGN PR
TPEPQGPQAVPPPPPPPFPWGHECCARRDARGGAEKDVGAAESWSDGPSSDSETEDSDSSDEDTGSGS ET
LSRSSSIWAAGATDDDDSDSDSRSDDSVQPDVWRRRWSDGPAPVAFPKPRRPGDSPGNPGLGAGTGP GS ATDPRASADSDSAAHAAAPQADVAPVLDSQPTVGTDPGYPVPLELTPENAEAVARFLGDAVDREPALM LE
YFCRCAREESKRVPPRTFGSAPRLTEDDFGLLNYALAEMRRLCLDLPPVPPNAYTPYHLREYATRLVN GF KPLVRRSARLYRILGILVHLRIRTREASFEEWMRSKEVDLDFGLTERLREHEAQLMILAQALNPYDCL IH
STPNTLVERGLQSALKYEEFYLKRFGGHYMESVFQMYTRIAGFLACRATRGMRHIALGRQGSWWEMFK FF
FHRLYDHQIVPSTPAMLNLGTRNYYTSSCYLVNPQATTNQATLRAITGNVSAILARNGGIGLCMQAFN DA SPGTASIMPALKVLDSLVAAHNKQSTRPTGACVYLEPWHSDVRAVLRMKGVLAGEEAQRCDNIFSALW MP
DLFFKRLIRHLDGEKNVTWSLFDRDTSMSLADFHGEEFEKLYEHLEAMGFGETIPIQDLAYAIVRSAA TT
GSPFIMFKDAVNRHYIYDTQGAAIAGSNLCTEIVHPSSKRSSGVCNLGSVNLARCVSRRTFDFGMLRD AV
QACVLMVNIMIDSTLQPTPQCARGHDNLRSMGIGMQGLHTACLKMGLDLESAEFRDLNTHIAEVMLLA
AM
KTSNALCVRGARPFSHFKRSMYRAGRFHWERFSNASPRYEGEWEMLRQSMMKHGLRNSQFIALMPTAA
SA QISDVSEGFAPLFTNLFSKVTRDGETLRPNTLLLKELERTFGGKRLLDAMDGLEAKQWSVAQALPCLD PA
HPLRRFKTAFDYDQELLIDLCADRAPYVDHSQSMTLYVTEKADGTLPASTLVRLLVHAYKRGLKTGMY YC KVRKATNSGVFAGDDNIVCTSCAL*
Gene matched: gi | 1710385 | sp | P09853 |RIR1_HSV23 Gene name: RIBONUCLEOSIDE-DIPHOSPHATE
[SEQ ID NO:255]
ORF # = 43 from Contig 15
ORF start site = 81665
ORF end site = 82658 ORF sequence:
MDPAVSPASTDPLDTHASGAGAAPIPVCPTPERYFYTSQCPDINHLRSLSILNRWLETELVFVGDEED
VS
KLSEGELGFYRFLFAFLSAADDLVTENLGGLSGLFEQKDILHYYVEQECIEWHSRVYNIIQLVLFHN
ND QARRAYVARTINHPAIRVKVDWLEARVRECDSIPEKFILMILIEGVFFAASFAAIAYLRTNNLLRVTC
QS
NDLISRDEAVHTTASCYIYNNYLGGHAKPEAARVYRLFREAVDIEIGFIRSQAPTDSSILSPGALAAI
EN
YVRFSADRLLGLIHMQPLYSAPAPDASFPLSLMSTDKHTNFFECRSTSYAGAWNDL*
Gene matched: gi | 132624 | sp| P03174 | RIR2_HSV23 Gene name: RIBONUCLEOSIDE-DIPHOSPHATE R
[SEQ ID NO:256]
ORF # = 44 from Contig 15
ORF start site = 84014
ORF end site = 82941 ORF sequence:
MRRRGHAFAPGDRGTRAAGPGPAAPWGAPSKPALRLAHLFCIRVLRALGYAYINSGQLEADDACANLY
HT
NTVAYVHTTDTDLLLMGCDIVLDISTGYIPTIHCRDLLQYFKMSYPQFLALFVRCHTDLHPNNTYASV
ED VLRECHWTAPSRSQARRAARRERANSRSLESMPTLTAAPVGLETRISWTEILAQQIAGEDDYEEDPPL
QP
PDVAGGPRDGARSSSSEILTPPELVQVPNAQRVAEHRGYVAGRRRHVIHDAPEALDWLPDPMTIAELV
EH
RYVKYVISLISPKERGPWTLLKRLPIYQDLRDEDLARSIVTRHITAPDIADRFLAQLWAHAPPPAFYK DV LAKFWDE*
Gene matched: gi | 549322 | sp | P36699 |VHS_HSV2G Gene name: VIRION HOST SHUTOFF PROTEIN
[SEQ ID NO: 257]
ORF # = 45 from Contig 15 ORF start site = 84914
ORF end site = 86326
ORF sequence:
MAHLPGGAAAAPLSEDAIPSPRERTEDWPPCQIVLQGAELNGILQAFAPLRTSLLDSLLWGDRGILV
HN AIFGEQVFLPLDHSQFSRYRWGGPTAAFLSLVDQKRSLLSVFRANQYPDLRRVELTVTGQAPFRTLVQ
RI
WTTASDGEAVELASETLMKRELTSFAVLLPQGDPDVQLRLTKPQLTKWNAVGDETAKPTTFELGPNG
KF
SVFNARTCVTFAAREEGASSSTSAQVQILTSALKKAGQAAANAKTVYGENTHRTFSVWDDCSMRAVL RR
LQVGGGTLKFFLTADVPSVCVTATGPNAVSAVFLLKPQRVCLNWLGRTPGSSTGSLASQDSRAGPTDS
QD
FSSEPDAGDRGAPEEEGLEGQARVPPAFPEPPGTKRRHAGAEWPADDATKRPKTGVPAAPTRAESPP
LS ARYGPEAAEGGGDGGRYACYFRDLQTGDASPSPLSAFRGPQRPPYGFGLP*
Gene matched: gi | 136905 | sp | P10226 |VPAP_HSV11 Gene name: DNA POLYMERASE PROCESSIVITY
[SEQ ID NO:258] = 15
ORF # = 48 from Contig 15
ORF start site = 89794 ORF end site = 90312
ORF sequence:
MAFRASGPAYQPLAPAASPARARVPAVAWIGVGAIVGAFALVAALVLVPPRSSWGLSPCDSGWQEFNA
GC
VAWDPTPVEHEQAVGGCSAPATLIPRAAAKHLAALTRVQAERSSGYWWVNGDGIRTCLRLVDSVSGID EF
CEELAIRICYYPRSPGGFVRFVTSIRNALGLP*
Gene matched: gi | 136917 | sp | P06483 | UL45_HSV23 Gene name: PROTEIN UL45 HOMOLOG (18 KD
[SEQ ID NO:259] ORF # = 49 from Contig 15 ORF start site = 92744 ORF end site = 90579 ORF sequence:
MQRRARGASSLRLARCLTPANLIRGANAGVPERRIFAGCLLPTPEGLLSAAVGVLRQRADDLQPAFLT GA DRSVRLAARHHNTVPESLIVDGLASDPHYDYIRHYASAAKQALGEVELSGGQLSRAILAQYWKYLQTV VP
SGLDIPDDPAGDCDPSLHVLLRPTLLPKLLVRAPFKSGAAAAKYAAAVAGLRDAAHRLQQYMFFMRPA
DP
SRPSTDTALRLSEFLAYVSVLYHWASWMLWTADKYVCRRLGPADRRFVALSGSLEAPAETFARHLDRG
PS GTTGSMQCMALRAAVΞDVLGHLTRLAHLWETGKRSGGTYGIVDAIVSTVEVLSIVHHHAQYIINATLT
GY
WWASDSLNNEYLRAAVDSQERFCRTAAPLFPTMTAPSWARMELSIKSWFGAALAPDLLRSGTPSPHY
ES
ILRLAASGPPGGRGAVGGSCRDKIQRTRRDNAPPPLPRARPHSTPAAPRRFRRHREDLPEPPHVDAAD RG
PEPCAGRPATYYTHMAGAPPRLPPRNPAPPEQRPAAAARPLAAQREAAGVYDAVRTWGPDAEAEPDQM
EN
TYLLPDDDAAMPAGVGLGATPAADTTAAAWPAESHAPRAPSEDADSIYESVSEDGGRVYEEIPWVRVY
EN ICLRRQDAGGAAPPGDAPDSPYIEAENPLYDWGGSALFSPPGATRAPDPGLSLSPMPARPRTNALAND
GP
TNVAALSALLTKLKRGRHQSH*
Gene matched: gi | 114350 | sp | P10230 | ATI2_HSV11 Gene name: ALPHA TRANS-INDUCING FACTOR
[SEQ ID NO:260] ORF # = 50 from Contig 15
ORF start site = 93910
ORF end site = 92828
ORF sequence:
VGAAAVPLLSAGGAAPPHPGPDAAVFRSSLGSLLYWPGVRALLGRDCRVAARYAGRMTYIATGALLAR FN
PGAVKCVLPREAAFAGRVLDVLAVLAEQTVQWLSVWGARLHPHSAHPAFVDVEQEALFRALPLGSPG
W
AAEHEALGDTAARRLLATSGLNAVLGAAVYALHTALATVTLKYALACGDARRRRDHAAAARAVLATGL
IL QRLLGIADTWACVALAAFDGGSTAPEVGTYTPLRYACVLRATQPLYARTTPAKFWADVRAAAEHVDL RP
ASSAPRAPVSGTADPAFLLEDLAAFPPAPLNSESVLGPRVRWDIMAQFRKLLMGDEETAALRAHVSG RR ATGLGGPPRP*
Gene matched: gi | 136920 | sp | P10231 | UL47_HSV11 Gene name: VIRION PROTEIN UL47
[SEQ ID NO:261]
ORF # = 51 from Contig 15
ORF start site = 94919 ORF end site = 93504
ORF sequence:
MSVRGHAVRRRRASTRSHAPSAHRADSPVEDEPEGGGVGLMGYLRAVFNVDDDSEVEAAGEMASEEPP
PR
RRREARGHPGSRRASEARAAAPPRRASFPRPRSVTARSQSVRGRRDSAITRAPRGGYLGPMDPRDVLG RV
GGSRWPSPLFLDELSYEEDDYPAAVAHDDGAGARPPATVEILAGRVSGPELQAAFPLDRLTPRVAAW
DE
SVRSALALGHPAGFYPCPDSAFGLSRVGVMHFASPADPKVFFRQTLQQGEALAWYVTGDAILDLTDRR
AK TSPSRAMGFLVDAIVRVAINGWVCGTRLHTEGARLGARRQGGRAPTAVREPHGVAARGGRRRAAAQRG
RG
RAPPPRPRRRGLSQFAGVPAVLARGARAPGARLSRGRPLRGAHDVHRHRGSARPLQPRRRQMRAPAGG
RV
CGARPGRAGGPGGADGPVALGGRGGAPAPALRPPRVCGRGAGGAVSRPAPG*
Gene matched: gi | 136920 | sp | P10231 | UL47_HSV11 Gene name: VIRION PROTEIN UL47
[SEQ ID NO: 262] ORF # = 53 from Contig 15 ORF start site = 98257 ORF end site = 97349 ORF sequence:
MTSRRSVKSCPREAPRGTHEELYYGPVSPADPESPRDDFRRGAGPMRARPRGEVRFLHYDEAGYALYR DS
SSSEDNDESRDTARPRRSASVAGSHGPGPARAPPPPGGPVGAGGRSHAPPARTPKMTRGAPKAPATPA TD PARGRRPAQADSAVLLDAPAPTASGRTKTPAQGLAKKLHFSTAPPSPTAPWTPRVAGFNKRVFCAAVG RL
AATHARLAAVQLWDMSRPHTDEDLNELLDLTTIRVTVCEGKNLLQRANELVNPDAAQDVDATAAARGR PA GRAAATARAPARSASRPRRPLE*
Gene matched: gi | 136927 | sp | P10233 | UL49_HSV11 Gene name: TEGUMENT PROTEIN UL49
[SEQ ID NO:263] ORF # = 54 from Contig 15 ORF start site = 98876 ORF end site = 98596 ORF sequence:
WLLFVALVAGVPGEPPNAAGARGVIGDAQCRGDSAGWSVPGVLVPFYLGMTSMGVCMIAHVYQICQ RA LAAGSA*
Gene matched: gi | 1944541 | gnl | PID| e312365 Gene name: (X14112) envelope protein [human
[SEQ ID NO: 264]
ORF # = 55 from Contig 15
ORF start site = 98867
ORF end site = 99976 ORF sequence:
MSQWGPRAILVQTDSTNRNADGDWQAAVAIRGGGWQLNMVNKRAVDFTPAECGDSEWAVGRVSLGLR
MA
MPRDFCAIIHAPAVSGPGPHVMLGLVDSGYRGTVLAVWAPNGTRGFAPGALRVDVTFLDIRATPPTL
TE PSSLHRFPQLAPSPLAGLREDPWLDGALATAGGAVALPARRRGGSLVYAGELTQVTTEHGDCVHEAPA
FL
PKREEDAGFDILIHRAVTVPANGATVIQPSLRVLRAADGPEACYVLGRSSLNARGLLVMPTRWPSGHA
CA
FWCNLTGVPVTLQAGSKVAQLLVAGTHALPWIPPDNIHEDGAFRAYPRGVPDATATPRDPPILVFTN EF
DADAPPSKRGAGGFGSTGI*
Gene matched: gi | 118955 | sp | P10234 | DUT_HSV11 Gene name: DEOXYURIDINE 5'-TRIPHOSPHATE
[SEQ ID NO:265] ORF # = 56 from Contig 15 ORF start site = 101006 ORF end site = 100182 ORF sequence:
MASLLGVLCGWGTRPEEQQYEMIRAAAPPSXXDPRLQEALAWNALLPAPITLDDALESLDDTRRLVK AR ALARTYHACMVNLERLARHHPGLEGSTIDGAVAAHRDKMRRLADTCMATILQMYMSVGAADKSADVLV SQ
AIRSMAESDWMEDVAIAERALGLSTSALAGGTRTAGLGATEAPPGPTRAQAPEVASVPVTHAGDRSP VR PGPVPPADPTPDPRHRTSAPKRQASSTEAPLLLA*
Gene matched: gi | 136933 | sp | P10235 | UL51_HSV1 Gene name: PROTEIN UL51
[SEQ ID NO:266]
ORF # = 58 from Contig 15
ORF start site = 102815
ORF end site = 104188
ORF sequence: MYVNRNEIFNAALAVTNIILDLDIALKEPVPFPRLHEALGHFRRGALAAVQLLFPAARVDPDAYPCYF
FK
SACRPRAPPVCAGDGPSAGGDDGDGDWFPDAGGDDGDEEWEEDTDPMDTTHGPLPDDEAAYLDLLHEQ
IP
AATPSEPDSWCSCADKIGLRVCLPVPAPYWHGSLTMRGVARVIQQAVLLDRDFVEAVGSHVKNFLL ID
TGVYAHGHSLRLPYFAKIGPDGSACGRLLPVFVIPPACEDVPAFVAAHADPRRFHFHAPPMFSAAPRE
IR
VLHSLGGDYVSFFEKKASRNALEHFGRRETLTEVLGRYDVRPDAGETVEGFASELLGRIVACIEAHFP EH AREYQAVSVRRAVIKDDWVLLQLIPGRGALNQSLSCLRFKHGRASRATARTFLALSVGTNNRLCASLC QQ CFATKCDNNRLHTLFTVDAGTPCSRSAPSSTSRPSSS*
Gene matched: gi | 136939 | sp | P10236 | UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:267] ORF # = 59 from Contig 15 ORF start site = 104140 ORF end site = 105156 ORF sequence:
MLAVRSLQHLTTVIFITAYGLVLAWYIVFGASPLHRCIYAVRPAGAHNDTALVWMKINQTLLFLGPPT AP
P∞AWTPHAHVCYANIIEGRAVSLPAIPGAMSRRVMNVHEAWCLEALWDTQMRLVWGWFLYLAFVA LH
QRRCMFGWSPAHSMVAPATYLLNYAGRIVSSVFLQYPYTKITRLLCELSVQRQTLVQLFEADPVTFL YH RPAVGVIVGCELLLRFVALGLIVGTALISRGACAITYPLFLTITTWCFVSIIALTELYFILRRDSAPK NA EPAAPRGRSKGWSGVCGRCCSIILSGIAVRLCYIAWAGWLMALRYEQEIQRRLFDL*
Gene matched: gi | 116105 | sp | P22485 | CELF_HSV2H Gene name: CELL FUSION PROTEIN PRECURSO
[SEQ ID NO:268] ORF # = 60 from Contig 15 ORF start site = 105702 ORF end site = 107240 ORF sequence:
MATDIDMLIDLGLDLSDSELEEDALERDEEGRRDDPESDSSGECSSSDEDMEDPCGDGGAEAIDAAIP KG
PPARPEDAGTPEASTPRPAARRGADDPPPATTGVWSRLGTRRSASPREPHGGKVARIQPPSTKAPHPR
GG
RRGRRRGRGRYGPGGADSTPNPRRRVSRNAHNQGGRHPASARTDGPGATHGEARRGGEQLDVSGGPRP
RG TRQAPPPLMALSLTPPHADGRAPVPERKAPSADTIDPAVRAVLRSISERAAVERISESFGRSALVMQD
PF
GGMPFPAANSPWAPVLATQAGGFDAETRRVSWETLVAHGPSLYRTFAANPRAASTAKAMRDCVLRQEN
LI
EALASADETLAWCKMCIHHNLPLRPQDPIIGTAAAVLENLATRLRPFLQCYLKARGLCGLDDLCSRRR LS
DIKDIASFVLVILARLANRVERGVSEIDYTTVGVGAGETMHFYIPGACMAGLIEILDTHRQECSSRVC
EL
TASHTIAPLYVHGKYFYCNSLF*
Gene matched: gi | 124181 | sp| P28276 | IE63_HSV2H Gene name: TRANSCRIPTIONAL REGULATOR IE
ORF # = 62 ORF start site = 108542 ORF end site = 108189 ORF sequence:
MIGAHPGVGGDLPSGLPTYAEATSDRPPTYAMVMAACPTEPPGGSVGPADQPRVQSSRTWRPPLVNSR EL YRAQRAARCASSSDTPQAPGWCGGTCRHAVFGWAWWIILAFLWR*
Gene matched: gi | 136952 | sp | P28282 | UL56_HSV2H Gene name: PROTEIN UL56 • gi | 73833 | pir | | w
[SEQ ID NO: 269]
ORF # = 63 from Contig 15
ORF start site = 112958 ORF end site = 113542
ORF sequence:
MHLFCQCPLTDGQDLYLCPVYPRMHQEHLVCPLHRLDDARRRGRTSAAWDEGLVRALTHSGGLMGCGG
RS
LTLSETYWGHPLYEKLVPWDHPRDLKVPEASAVGTRALVPRGRGRPLRGRPVPLIPLDCEPNDGLPFG GG
WPGGRLRGAPVPLHPPPPSAPPLSFTPTLTPPCLCRGLSLCVWKQYLKDRNNF*
Gene matched: gi | 1644457 Gene name: (U72521) neural variant mena+ protein [Mus muscu
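Each entry above locates its ORF by a start site and an end site on Contig 15, and in many entries the start site is the larger number. The short Python sketch below shows one way to read such pairs. It rests on assumptions that the listing itself does not state: coordinates are 1-based and inclusive, a start site greater than the end site marks an ORF read from the complementary strand, and the end site includes the stop codon. The names `orf_span`, `OrfSpan`, and `expected_residues` are hypothetical. Under these assumptions ORF # = 36 (60759 to 61151) spans 393 nt, or 131 codons, which agrees with the 130 residues listed for it plus a stop codon; some other entries, such as ORF # = 9a (14520 to 11904), do not span a whole number of codons and would need to be checked individually.

```python
from typing import NamedTuple


class OrfSpan(NamedTuple):
    strand: str     # "+" if start <= end, "-" if start > end (assumed convention)
    length_nt: int  # inclusive span length in nucleotides
    in_frame: bool  # True when the span is a whole number of codons


def orf_span(start: int, end: int) -> OrfSpan:
    """Interpret an ORF start/end pair as 1-based, inclusive contig coordinates.

    Assumption (not stated in the listing): start > end means the ORF is
    read on the complementary strand.
    """
    strand = "+" if start <= end else "-"
    length_nt = abs(end - start) + 1
    return OrfSpan(strand, length_nt, length_nt % 3 == 0)


def expected_residues(span: OrfSpan) -> int:
    """Residue count if the span includes the stop codon (an assumption)."""
    if not span.in_frame:
        raise ValueError("span is not a whole number of codons")
    return span.length_nt // 3 - 1


if __name__ == "__main__":
    # Coordinate pairs copied from the ORF # = 16 and ORF # = 36 entries.
    for orf, (start, end) in {"16": (22291, 21326), "36": (60759, 61151)}.items():
        span = orf_span(start, end)
        print(orf, span.strand, span.length_nt, expected_residues(span))
```

Comparing `expected_residues` with the length of each listed sequence is a quick consistency check on the coordinates, nothing more; a mismatch may reflect a transcription error in the listing or a different coordinate convention.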
TABLE 4
All amino acid sequences within Table 4 are encoded by Contig 15 [SEQ ID NO:217] of Table 3
[SEQ ID NO:270] = 15 ORF # = 9b ORF start site = 14508 ORF end site = 11905 ORF sequence:
MNVATCTHQTHHAARAPGATSAPGAASGDPLGARRPIGDDECEQYTSSVSLARMLYGGDLAEWVPRVH PK TTIERQQHGPVTFPDASAPTARCVTVVRAPMGSGKTTALIRWLGEAIHSPDTSVLWSCRRSFTQTLA
TR
FAESGLPDFVTYFSSTNYIMNDRPFHRLIVQVESLHRVGPNLLNNYDVLVLDEVMSTLGQLYSPTMQQ
LG RVDALMLRLLRTCPRIIAMDATANAQLVDFLCSLRGEKNVHWIGEYAMPGFSARRCLFLPRLGPEVL
QA
ALRPPGPAGGAPPPDAPPDATFFGELEARLAGGDNVCIFSSTVSFAEWARFCRQFTDRVLLLHSLTP
PG
DVTTWGRYRVVIYTTVVTVGLSFDPPHFDSMFAYVKPMNYGPDMVSVYQSLGRVRTLRKGELLIYMDG SG
ARSEPVFTPMLLNHWSASGQWPAQFSQVTNLLCRRFKGRCDASHADAAQARGSRIYSKFRYKHYFER
CT
LACLADSLNILHMLLTLNCMHVRFWGHDAALTPRNFCLFLRGIHFDALRAQRDLRELRCQDPDTSLSA
QA AETEEVGLFVEKYLRPDVAPAEWALMRGLNSLVGRTRFIYLVLLEACLRVPMAAHSSAIFRRLYDHY
AT
GVIPTINAAGELELVALHPTLNVAPVWELFRLCSTMAACLQWDSMAGGSGRTFSPEDVLELLNPHYDR
YM
QLVFELGHCNVTDGPLLSEDAVKRVADALSGCPPRGSVSETEHALSLFKIIWGELFGVQLAKSTQTFP GA
GRVKNLTKRAIVELLDAHRIDHSACRTHRQLYALLMAHKREFAGARFKLRAPAWGRCLRTHASGAQPN
TD
IILEAALSELPTEAWPMMQGAVNFST *
Gene matched: gi | 1869831 | gnl | PID | e304265 Gene name: (Z86099) UL9 [human herpesvirus
[SEQ ID NO:271] = 15 ORF # = 9a
ORF start site = 14520
ORF end site = 11904
ORF sequence:
MAETMNVATCTHQTHHAARAPGATSAPGAASGDPLGARRPIGDDECEQYTSSVSLARMLYGGDLAEWV PR
VHPKTTIERQQHGPVTFPDASAPTARCVTVVRAPMGSGKTTALIRWLGEAIHSPDTSVLWSCRRSFT
QT
LATRFAESGLPDFVTYFSSTNYIMNDRPFHRLIVQVESLHRVGPNLLNNYDVLVLDEVMSTLGQLYSP
TM QQLGRVDALMLRLLRTCPRIIAMDATANAQLVDFLCSLRGEKNVHWIGEYAMPGFSARRCLFLPRLG
PE
VLQAALRPPGPAGGAPPPDAPPDATFFGELEARLAGGDNVCIFSSTVSFAEWARFCRQFTDRVLLLH
SL
TPPGDVTTWGRYRVVIYTTVVTVGLSFDPPHFDSMFAYVKPMNYGPDMVSVYQSLGRVRTLRKGELLI YM DGSGARSEPVFTPMLLNHWSASGQWPAQFSQVTNLLCRRFKGRCDASHADAAQARGSRIYSKFRYKH YF
ERCTLACLADSLNILHMLLTLNCMHVRFWGHDAALTPRNFCLFLRGIHFDALRAQRDLRELRCQDPDT SL SAQAAETEEVGLFVEKYLRPDVAPAEWALMRGLNSLVGRTRFIYLVLLEACLRVPMAAHSSAIFRRL YD
HYATGVIPTINAAGELELVALHPTLNVAPVWELFRLCSTMAACLQWDSMAGGSGRTFSPEDVLELLNP HY
DRYMQLVFELGHCNVTDGPLLSEDAVKRVADALSGCPPRGSVSETEHALSLFKIIWGELFGVQLAKST QT
FPGAGRVKNLTKRAIVELLDAHRIDHSACRTHRQLYALLMAHKREFAGARFKLRAPAWGRCLRTHASG
AQ
PNTDIILEAALSELPTEAWPMMQGAVNFSTL*
Gene matched: gi | 136806 | sp | P10193 | OBP_HSV11 Gene name: ORIGIN OF REPLICATION BINDING
[SEQ ID NO:272] = 15
ORF # = 20a
ORF start site = 31782
ORF end site = 27630
ORF sequence: MEPANPPRNPMAAPARDPPGYRYAAAMVPTGSILSTIEVASHRRLFDFFARVRSDENSLYDVEFDALL
GS
YCNTLSLVRFLELGLSVACVCTKFPELAYMNEGRVQFEVHQPLIARDGPHPVEQPVHNYMTKVIDRRA
LN
AAFSLATEAIALLTGEALDGTGISLHRQLRAIQQLARNVQAVLGAFERGTADQMLHVLLEKAPPLALL LP
MQRYLDNGRLATRVARATLVAELKRSFCDTSFFLGKAGHRREAIEAWLVDLTTATQPSVAVPRLTHAD
TR
GRPVDGVLVTTAAIKQRLLQSFLKVEDTEADVPVTYGEMVLNGANLVTALVMGKAVRSLDDVGRHLLE
MQ EEQLEANRETLDELESAPQTTRVRADLVAIGDRLVFLEALEKRIYAATNVPYPLVGAMDLTFVLPLGL
FN
PAMERFAAHAGDLVPAPGHPEPRAFPPRQLFFWGKDHQVLRLSMENAVGTVCHPSLMNIDAAVGGVNH
DP
VEAANPYGAYVAAPAGPGADMQQRFLNAWRQRLAHGRVRWVAECQMTAEQFMQPDNANLALELHPAFD FF
AGVADVELPGGEVPPAGPGAIQATWRWNGNLPLALCPVAFRDARGLELGVGRHAMAPATIAAVRGAF
ED
RSYPAVFYLLQAAIHGSEHVFCALARLVTQCITSYWNNTRCAAFVNDYSLVSYIVTYLGGDLPEECMA
VY RDLVAHVEALAQLVDDFTLPGPELGGQAQAELNHLMRDPALLPPLVWDCDGLMRHAALDRHRDCRIDA
GG
HEPVYAAACNVATADFNRNDGRLLHNTQARAADAADDRPHRPADWTVHHKIYYYVLVPAFSRGRCCTA
GV RFDRVYATLQNMWPEIAPGEECPSDPVTDPAHPLHPANLVANTVNAMFHNGRVWDGPAMLTLQVLA
HN
MAERTTALLCSAAPDAGANTASTANMRIFDGALHAGVLLMAPQHLDHTIQNGEYFYVLPVHALFAGAD
HV
ANAPNFPPALRDLARHVPLVPPALGANYFSSIRQPWQHARESAAGENALTYALMAGYFKMSPVALYH QL
KTGLHPGFGFTVVRQDRFVTENVLFSERASEAYFLGQLQVARHETGGGVSFTLTQPRGNVDLGVGYTA
VA
ATATVRNPVTDMGNLPQNFYLGRGAPPLLDNAAAVYLRNAWAGNRLGPAQPLPVFGCAQVPRRAGMD
HG QDAVCEFIATPVATDINYFRRPCNPRGRAAGGVYAGDKEGDVIALMYDHGQSDPARPFAATANPWASQ
RF
SYGDLLYNGAYHLNGASPVLSPCFKFFTAADITAKHRCLERLIVETGSAVSTATAASDVQFKRPPGCR
EL
VEDPCGLFQEAYPITCASDPALLRSARDGEAHARETHFTQYLIYDASPLKGLSL*
Gene matched: gi | 137571 | sp| P06491 |VCAP_HSV11 Gene name: MAJOR CAPSID PROTEIN (MCP)
[SEQ ID NO:273] = 15
ORF # = 20b
ORF start site = 31754
ORF end site = 27630
ORF sequence: MAAPARDPPGYRYAAAMVPTGSILSTIEVASHRRLFDFFARVRSDENSLYDVEFDALLGSYCNTLSLV
RF
LELGLSVACVCTKFPELAYMNEGRVQFEVHQPLIARDGPHPVEQPVHNYMTKVIDRRALNAAFSLATE
AI
ALLTGEALDGTGISLHRQLRAIQQLARNVQAVLGAFERGTADQMLHVLLEKAPPLALLLPMQRYLDNG RL
ATRVARATLVAELKRSFCDTSFFLGKAGHRREAIEAWLVDLTTATQPSVAVPRLTHADTRGRPVDGVL
VT
TAAIKQRLLQSFLKVEDTEADVPVTYGEMVLNGANLVTALVMGKAVRSLDDVGRHLLEMQEEQLEANR
ET LDELESAPQTTRVRADLVAIGDRLVFLEALEKRIYAATNVPYPLVGAMDLTFVLPLGLFNPAMERFAA
HA
GDLVPAPGHPEPRAFPPRQLFFWGKDHQVLRLSMENAVGTVCHPSLMNIDAAVGGVNHDPVEAANPYG
AY
VAAPAGPGADMQQRFLNAWRQRLAHGRVRWVAECQMTAEQFMQPDNANLALELHPAFDFFAGVADVEL PG GEVPPAGPGAIQATWRWNGNLPLALCPVAFRDARGLELGVGRHAMAPATIAAVRGAFEDRSYPAVFY
LL
QAAIHGSEHVFCALARLVTQCITSYWNNTRCAAFVNDYSLVSYIVTYLGGDLPEECMAVYRDLVAHVE
AL AQLVDDFTLPGPELGGQAQAELNHLMRDPALLPPLVWDCDGLMRHAALDRHRDCRIDAGGHEPVYAAA
CN
VATADFNRNDGRLLHNTQARAADAADDRPHRPADWTVHHKIYYYVLVPAFSRGRCCTAGVRFDRVYAT
LQ
NMWPEIAPGEECPSDPVTDPAHPLHPANLVANTVNAMFHNGRVWDGPAMLTLQVLAHNMAERTTAL LC
SAAPDAGANTASTANMRIFDGALHAGVLLMAPQHLDHTIQNGEYFYVLPVHALFAGADHVANAPNFPP
AL
RDLARHVPLVPPALGANYFSSIRQPWQHARESAAGENALTYALMAGYFKMSPVALYHQLKTGLHPGF
GF TVVRQDRFVTENVLFSERASEAYFLGQLQVARHETGGGVSFTLTQPRGNVDLGVGYTAVAATATVRNP
VT
DMGNLPQNFYLGRGAPPLLDNAAAVYLRNAWAGNRLGPAQPLPVFGCAQVPRRAGMDHGQDAVCEFI
AT
PVATDINYFRRPCNPRGRAAGGVYAGDKEGDVIALMYDHGQSDPARPFAATANPWASQRFSYGDLLYN GA
YHLNGASPVLSPCFKFFTAADITAKHRCLERLIVETGSAVSTATAASDVQFKRPPGCRELVEDPCGLF
QE
AYPITCASDPALLRSARDGEAHARETHFTQYLIYDASPLKGLSL*
Gene matched: gi | 137571 | sp | P06491 |VCAP_HSV11 Gene name: MAJOR CAPSID PROTEIN (MCP)
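The start and end sites in these entries appear to be genome coordinates, with a start site larger than the end site (as for ORF 20b above) indicating an ORF read on the complementary strand. The short Python sketch below makes that arithmetic explicit. It assumes 1-based, inclusive coordinates that include the stop codon; this convention and the helper name `orf_span` are assumptions for illustration, not something the listing itself states.

```python
def orf_span(start: int, end: int) -> tuple[str, int]:
    """Return (strand, codon_count) for an ORF given its start and end sites.

    Assumption: coordinates are 1-based and inclusive and cover the stop
    codon. A start site greater than the end site is read as the reverse
    (complementary) strand.
    """
    strand = "+" if start <= end else "-"
    length_nt = abs(end - start) + 1
    if length_nt % 3 != 0:
        # Not a whole number of codons under the assumed convention.
        raise ValueError(f"{length_nt} nt is not a multiple of 3")
    return strand, length_nt // 3


# ORF 20b: start site 31754, end site 27630
strand, codons = orf_span(31754, 27630)
print(strand, codons, codons - 1)  # codons - 1 excludes the stop codon
```

Under this assumption ORF 20b spans 4125 nt, or 1375 codons (1374 residues plus the stop). Not every entry resolves cleanly: ORF 22b (33385 to 34984) spans 1600 nt, which is not a whole number of codons, so the helper raises `ValueError` there and the coordinate convention should be checked against the original record.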
[SEQ ID NO:274] = 15
ORF # = 22a
ORF start site = 33002
ORF end site = 34984
ORF sequence:
MRPELSLKGRPCVTEAWCPSTDAAIHSGGSSSVRPQPYARAARARATHGSRSRHRQPLLPPPSSHHPTI
PPPPSPPRGSPAMELSYATTLHHRDWFYVTADRNRAYFVCGGSVYSVGRPRDSQPGEIAKFGLWRGTG
PKDRMVANYVRSELRQRGLRDVRPVGEDEVFLDSVCLLNPNVSSERDVINTNDVEVLDECLAEYCTSLRT
SPGVLVTGVRVRARDRVIELFEHPAIVNISSRFAYTPSPYVFALAQAHLPRLPSSLEPLVSGLFDGIPAP
RQPLDARDRRTDWITGTRAPRPMAGTGAGGAGAKRATVSEFVQVKHIDRWSPSVSSAPPPSAPDASLP
PPGLQEAAPPGPPLRELWWVFYAGDRALEEPHAESGLTREEVRAVHGFREQAWKLFGSVGAPRAFLGAAL
ALSPTQKLAVYYYLIHRERRMSPFPALVRLVGRYIQRHGLYVPAPDEPTLADAMNGLFRDALAAGTVAEQ
LLMFDLLPPKDVPVGSDARADSAALLRFVDSQRLTPGGSVSPEHVMYLGAFLGVLYAGHGRLAAATHTAR
LTGVTSLVLTVGDVDRMSAFDRGPAGAAGRTRTAGYLDALLTVCLARAQHGQSV*
Gene matched: gi|136845|sp|P10205|UL21_HSV11 Gene name: PROTEIN UL21
[SEQ ID NO:275] = 15
ORF # = 22b
ORF start site = 33385
ORF end site = 34984
ORF sequence:
MELSYATTLHHRDVVFYVTADRNRAYFVCGGSVYSVGRPRDSQPGEIAKFGLVVRGTGPKDRMVANYVRS
ELRQRGLRDVRPVGEDEVFLDSVCLLNPNVSSERDVINTNDVEVLDECLAEYCTSLRTSPGVLVTGVRVR
ARDRVIELFEHPAIVNISSRFAYTPSPYVFALAQAHLPRLPSSLEPLVSGLFDGIPAPRQPLDARDRRTD
WITGTRAPRPMAGTGAGGAGAKRATVSEFVQVKHIDRWSPSVSSAPPPSAPDASLPPPGLQEAAPPGP
PLRELWWVFYAGDRALEEPHAESGLTREEVRAVHGFREQAWKLFGSVGAPRAFLGAALALSPTQKLAVYY
YLIHRERRMSPFPALVRLVGRYIQRHGLYVPAPDEPTLADAMNGLFRDALAAGTVAEQLLMFDLLPPKDV
PVGSDARADSAALLRFVDSQRLTPGGSVSPEHVMYLGAFLGVLYAGHGRLAAATHTARLTGVTSLVLTVG
DVDRMSAFDRGPAGAAGRTRTAGYLDALLTVCLARAQHGQSV*
Gene matched: gi|136845|sp|P10205|UL21_HSV11 Gene name: PROTEIN UL21
[SEQ ID NO:276] = 15
ORF # = 41a
ORF start site = 75756
ORF end site = 77588
ORF sequence:
MTAAALYGGAKYRPGTLRNPGRVASTPRRRGVLYGALCPGIPFVGSGPGAVGWECVCVGGGRRDGGPDQV
YRGRSVGRPNRPFKHLRMHRPSQSDTGTHQRRKPPSPVRVRVFSGGVFFLSALLPPHLHHPPPTTRPLAI
GGKTMKTKPLPTAPMAWAESAVETTTSPRELAGHAPLRRVLRPPIARRDGPVLLGDRAPRRTASTMWLLG
IDPAESSPGTRATRDDTEQAVDKILRGARRAGGLTVPGAPRYHLTRQVTLTDLCQPNAERAGALLLALRH
PTDLPHLARHRAPPGRQTERLAEAWGQLLEASALGSGRAESGCARAGLVSFNFLVAACAAAYDARDAAEA
VRAHITTNYGGTRAGARLDRFSECLRAMVHTHVFPHEVMRFFGGLVSWVTQDELASVTAVCSGPQEATHT
GHPGRPCSAVTIPACAFVDLDAELCLGGPGAAFLYLVFTYRQCRDQELCCVYWKSQLPPRGLEAALERL
FGRLRITNTIHGAEDMTPPPPNRNVDFPLAVLAASSQSPRCSASQVTNPQFVDRLYRWQPDLRGRPTART
CTYAAFAELGVMPDNSPRCLHRTERFGAVGVPWILEGWWRPGGWRACA*
Gene matched: gi|139176|sp|P22486|VP19_HSV2G Gene name: CAPSID ASSEMBLY AND DNA MATU
[SEQ ID NO:277] = 15
ORF # = 41b
ORF start site = 75817
ORF end site = 77588
ORF sequence:
MHRPSQSDTGTHQRRKPPSPVRVRVFSGGVFFLSALLPPHLHHPPPTTRPLAIGGKTMKTKPLPTAPMAW
AESAVETTTSPRELAGHAPLRRVLRPPIARRDGPVLLGDRAPRRTASTMWLLGIDPAESSPGTRATRDDT
EQAVDKILRGARRAGGLTVPGAPRYHLTRQVTLTDLCQPNAERAGALLLALRHPTDLPHLARHRAPPGRQ
TERLAEAWGQLLEASALGSGRAESGCARAGLVSFNFLVAACAAAYDARDAAEAVRAHITTNYGGTRAGAR
LDRFSECLRAMVHTHVFPHEVMRFFGGLVSWVTQDELASVTAVCSGPQEATHTGHPGRPCSAVTIPACAF
VDLDAELCLGGPGAAFLYLVFTYRQCRDQELCCVYWKSQLPPRGLEAALERLFGRLRITNTIHGAEDMT
PPPPNRNVDFPLAVLAASSQSPRCSASQVTNPQFVDRLYRWQPDLRGRPTARTCTYAAFAELGVMPDNSP
RCLHRTERFGAVGVPWILEGWWRPGGWRACA*
Gene matched: gi|139176|sp|P22486|VP19_HSV2G Gene name: CAPSID ASSEMBLY AND DNA MATU
[SEQ ID NO:278] = 15
ORF # = 41c
ORF start site = 76188
ORF end site = 77588
ORF sequence:
MKTKPLPTAPMAWAESAVETTTSPRELAGHAPLRRVLRPPIARRDGPVLLGDRAPRRTASTMWLLGIDPA
ESSPGTRATRDDTEQAVDKILRGARRAGGLTVPGAPRYHLTRQVTLTDLCQPNAERAGALLLALRHPTDL
PHLARHRAPPGRQTERLAEAWGQLLEASALGSGRAESGCARAGLVSFNFLVAACAAAYDARDAAEAVRAH
ITTNYGGTRAGARLDRFSECLRAMVHTHVFPHEVMRFFGGLVSWVTQDELASVTAVCSGPQEATHTGHPG
RPCSAVTIPACAFVDLDAELCLGGPGAAFLYLVFTYRQCRDQELCCVYWKSQLPPRGLEAALERLFGRL
RITNTIHGAEDMTPPPPNRNVDFPLAVLAASSQSPRCSASQVTNPQFVDRLYRWQPDLRGRPTARTCTYA
AFAELGVMPDNSPRCLHRTERFGAVGVPWILEGWWRPGGWRACA*
Gene matched: gi|139176|sp|P22486|VP19_HSV2G Gene name: CAPSID ASSEMBLY AND DNA MATU
[SEQ ID NO:279] = 15
ORF # = 46a
ORF start site = 86432
ORF end site = 87820
ORF sequence:
MAWCGSGLRLRPFHPPSPSFFVLRALIRAGPGPFAASPRAPSGPGCGMCRGDSPGVAGGSGEHCLGGDD
GDDGRPRLACVGAIARGFAHLWLQATTLGFVGSWLSRGPYADAMSGAFVIGSTGLGFLRAPPAFARPPT
RVCAWLRLVGGGAAVALWSLGEAGAPPGVPGPATQCLALGAAYAALLVLADDVHPLFLLAPRPLFVGTLG
WVGGLTIGGSARYWWIDPRAAAALTAAWAGLGTTAAGDSFSKACPRHRRFCWSAVESPPPRYAPEDA
ERPTDHGPLLPSTHHQRSPRVCGDGAARPENIWVPWTFAGALALAACAARGSDAAPSGPVLPLWPQVFV
GGHAAAGLTELCQTLAPRDLTDPLLFAYVGFQWNHGLMFWPDIAVYAMLGGAVWISLTQVLGLRRRLH
KDPDAGPWAAATLRGLFFSVYALGFAAGVLVRPRMAASRRSG*
Gene matched: gi|136909|sp|P10227|UL43_HSV11 Gene name: MEMBRANE PROTEIN UL43
[SEQ ID NO:280] = 15
ORF # = 46b
ORF start site = 86576
ORF end site = 87820
ORF sequence:
MCRGDSPGVAGGSGEHCLGGDDGDDGRPRLACVGAIARGFAHLWLQATTLGFVGSWLSRGPYADAMSGA
FVIGSTGLGFLRAPPAFARPPTRVCAWLRLVGGGAAVALWSLGEAGAPPGVPGPATQCLALGAAYAALLV
LADDVHPLFLLAPRPLFVGTLGVVVGGLTIGGSARYWWIDPRAAAALTAAVVAGLGTTAAGDSFSKACPR
HRRFCWSAVESPPPRYAPEDAERPTDHGPLLPSTHHQRSPRVCGDGAARPENIWVPWTFAGALALAAC
AARGSDAAPSGPVLPLWPQVFVGGHAAAGLTELCQTLAPRDLTDPLLFAYVGFQWNHGLMFWPDIAVY
AMLGGAVWISLTQVLGLRRRLHKDPDAGPWAAATLRGLFFSVYALGFAAGVLVRPRMAASRRSG*
Gene matched: gi|136909|sp|P10227|UL43_HSV11 Gene name: MEMBRANE PROTEIN UL43
[SEQ ID NO:281] = 15
ORF # = 57a
ORF start site = 100984
ORF end site = 102942
ORF sequence:
MGTEDCDHEGRSVAAPVEVMALYATDGCVITSSLALLTNCLLGAEPLYIFSYDAYRPDAPNGPTGAPTEQ
ERFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTHHPKGRTRPMFVCRFERADDVAVLQDALGRGTPLL
PAHITATLDLEATFALHANIIMALTVAIVHNAPARIGSGSTAPLYEPGESMRSWGRMSLGQRGLTTLFV
HHEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLVLAARYYVLQAPRLGGAGATYDLQAVKDICATYAI
PHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAGFPLYVERRIAADVRETGALEKFIAHDRSCLRVS
DREFITYIYLAHFECFSPPRLATHLRAVTTHDPSPAASTEQPSPLGREAVEQFFRHVRAQLNIREYVKQN
VTPRETALAGDAAAAYLRARTYAPAALTPAPAYCGVADSSTKMMGRLAEAERLLVPHGWPAFAPTTPGDD
AGGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAALMQDASVQTPLPVYRITMSPTGQAFAAAARDDWAR
VTRDARPPEATWADAAAAPEPGALGRRLTRRICARGPAPPPGRPGRRGPDVREPQRDLQRRAGRYEHHP
GSGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:282] = 15
ORF # = 57b
ORF start site = 100765
ORF end site = 102942
ORF sequence:
MGAGKSALTTARASCSRGSXSEGGAAARIISYCCSSGRVPQPHSTPSRDAIPEHARGSAPAFPHPTPSGF
AGAMGTEDCDHEGRSVAAPVEVMALYATDGCVITSSLALLTNCLLGAEPLYIFSYDAYRPDAPNGPTGAP
TEQERFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTHHPKGRTRPMFVCRFERADDVAVLQDALGRGT
PLLPAHITATLDLEATFALHANIIMALTVAIVHNAPARIGSGSTAPLYEPGESMRSWGRMSLGQRGLTT
LFVHHEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLVLAARYYVLQAPRLGGAGATYDLQAVKDICAT
YAIPHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAGFPLYVERRIAADVRETGALEKFIAHDRSCL
RVSDREFITYIYLAHFECFSPPRLATHLRAVTTHDPSPAASTEQPSPLGREAVEQFFRHVRAQLNIREYV
KQNVTPRETALAGDAAAAYLRARTYAPAALTPAPAYCGVADSSTKMMGRLAEAERLLVPHGWPAFAPTTP
GDDAGGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAALMQDASVQTPLPVYRITMSPTGQAFAAAARDD
WARVTRDARPPEATWADAAAAPEPGALGRRLTRRICARGPAPPPGRPGRRGPDVREPQRDLQRRAGRYE
HHPGSGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:283] = 15
ORF # = 57c
ORF start site = 100678
ORF end site = 102942
ORF sequence:
MQAWYVRARARAFTRRRVSSSDSRASSSVMGAGKSALTTARASCSRGSXSEGGAAARIISYCCSSGRVPQ
PHSTPSRDAIPEHARGSAPAFPHPTPSGFAGAMGTEDCDHEGRSVAAPVEVMALYATDGCVITSSLALLT
NCLLGAEPLYIFSYDAYRPDAPNGPTGAPTEQERFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTHHP
KGRTRPMFVCRFERADDVAVLQDALGRGTPLLPAHITATLDLEATFALHANIIMALTVAIVHNAPARIGS
GSTAPLYEPGESMRSWGRMSLGQRGLTTLFVHHEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLVLA
ARYYVLQAPRLGGAGATYDLQAVKDICATYAIPHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAGF
PLYVERRIAADVRETGALEKFIAHDRSCLRVSDREFITYIYLAHFECFSPPRLATHLRAVTTHDPSPAAS
TTEQPSPLGREAVEQFFRHRAQLNIREYVKQNVTPRETALAGDAAAAYLRARTYAPAALTPAPAYCGVAD
SSTKMMGRLAEAERLLVPHGWPAFAPTTPGDDAGGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAALMQ
DASVQTPLPVYRITMSPTGQAFAAAARDDWARVTRDARPPEATWADAAAAPEPGALGRRLTRRICARGP
APPPGRPGRRGPDVREPQRDLQRRAGRYEHHPGSGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:284] = 15
ORF # = 57d
ORF start site = 100624
ORF end site = 102942
ORF sequence:
MVEPSSPGWWRASLSRLTMQAWYVRARARAFTRRRVSSSDSRASSSVMGAGKSALTTARASCSRGSXSEG
GAAARIISYCCSSGRVPQPHSTPSRDAIPEHARGSAPAFPHPTPSGFAGAMGTEDCDHEGRSVAAPVEVM
ALYATDGCVITSSLALLTNCLLGAEPLYIFSYDAYRPDAPNGPTGAPTEQERFEGSRALYRDAGGLNGDS
FRVTFCLLGTEVGVTHHPKGRTRPMFVCRFERADDVAVLQDALGRGTPLLPAHITATLDLEATFALHANI
IMALTVAIVHNAPARIGSGSTAPLYEPGESMRSWGRMSLGQRGLTTLFVHHEARVLAAYRRAYYGSAQS
PFWFLSKFGPDEKSLVLAARYYVLQAPRLGGAGATYDLQAVKDICATYAIPHDPRPDTLSAASLTSFAAI
TRFCCTSQYSRGAAAAGFPLYVERRIAADVRETGALEKFIAHDRSCLRVSDREFITYIYLAHFECFSPPR
LATHLRAVTTHDPSPAASTEQPSPLGREAVEQFFRHVRAQLNIREYVKQNVTPRETALAGDAAAAYLRAR
TYAPAALTPAPAYCGVADSSTKMMGRLAEAERLLVPHGWPAFAPTTPGDDAGGGTAAPQTCGIVKRLLKL
AATEQQGTTPPAIAALMQDASVQTPLPVYRITMSPTGQAFAAAARDDWARVTRDARPPEATWADAAAAP
EPGALGRRLTRRICARGPAPPPGRPGRRGPDVREPQRDLQRRAGRYEHHPGSGHRPEGARPLSPAPRGPG
SL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:285] = 15
ORF # = 57e
ORF start site = 100567
ORF end site = 102942
ORF sequence:
MHVSARRRILSRCAATAPSMVEPSSPGWWRASLSRLTMQAWYVRARARAFTRRRVSSSDSRASSSVMGAG
KSALTTARASCSRGSXSEGGAAARIISYCCSSGRVPQPHSTPSRDAIPEHARGSAPAFPHPTPSGFAGAM
GTEDCDHEGRSVAAPVEVMALYATDGCVITSSLALLTNCLLGAEPLYIFSYDAYRPDAPNGPTGAPTEQE
RFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTHHPKGRTRPMFVCRFERADDVAVLQDALGRGTPLLP
AHITATLDLEATFALHANIIMALTVAIVHNAPARIGSGSTAPLYEPGESMRSWGRMSLGQRGLTTLFVH
HEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLVLAARYYVLQAPRLGGAGATYDLQAVKDICATYAIP
HDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAGFPLYVERRIAADVRETGALEKFIAHDRSCLRVSD
REFITYIYLAHFECFSPPRLATHLRAVTTHDPSPAASTEQPSPLGREAVEQFFRHVRAQLNIREYVKQNV
TPRETALAGDAAAAYLRARTYAPAALTPAPAYCGVADSSTKMMGRLAEAERLLVPHGWPAFAPTTPGDDA
GGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAALMQDASVQTPLPVYRITMSPTGQAFAAAARDDWARV
TRDARPPEATWADAAAAPEPGALGRRLTRRICARGPAPPPGRPGRRGPDVREPQRDLQRRAGRYEHHPG
SGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:286] = 15
ORF # = 57f
ORF start site = 100558
ORF end site = 102942
ORF sequence:
MVAMHVSARRRILSRCAATAPSMVEPSSPGWWRASLSRLTMQAWYVRARARAFTRRRVSSSDSRASSSVM
GAGKSALTTARASCSRGSXSEGGAAARIISYCCSSGRVPQPHSTPSRDAIPEHARGSAPAFPHPTPSGFA
GAMGTEDCDHEGRSVAAPVEVMALYATDGCVITSSLALLTNCLLGAEPLYIFSYDAYRPDAPNGPTGAPT
EQERFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTHHPKGRTRPMFVCRFERADDVAVLQDALGRGTP
LLPAHITATLDLEATFALHANIIMALTVAIVHNAPARIGSGSTAPLYEPGESMRSWGRMSLGQRGLTTL
FVHHEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLVLAARYYVLQAPRLGGAGATYDLQAVKDICATY
AIPHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAGFPLYVERRIAADVRETGALEKFIAHDRSCLR
VSDREFITYIYLAHFECFSPPRLATHLRAVTTHDPSPAASTEQPSPLGREAVEQFFRHVRAQLNIREYVK
QNVTPRETALAGDAAAAYLRARTYAPAALTPAPAYCGVADSSTKMMGRLAEAERLLVPHGWPAFAPTTPG
DDAGGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAALMQDASVQTPLPVYRITMSPTGQAFAAAARDDW
ARVTRDARPPEATWADAAAAPEPGALGRRLTRRICARGPAPPPGRPGRRGPDVREPQRDLQRRAGRYEH
HPGSGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:287] = 15
ORF # = 57g
ORF start site = 100543
ORF end site = 102942
ORF sequence:
MYICRMVAMHVSARRRILSRCAATAPSMVEPSSPGWWRASLSRLTMQAWYVRARARAFTRRRVSSSDSRA
SSSVMGAGKSALTTARASCSRGSXSEGGAAARIISYCCSSGRVPQPHSTPSRDAIPEHARGSAPAFPHPT
PSGFAGAMGTEDCDHEGRSVAAPVEVMALYATDGCVITSSLALLTNCLLGAEPLYIFSYDAYRPDAPNGP
TGAPTEQERFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTHHPKGRTRPMFVCRFERADDVAVLQDAL
GRGTPLLPAHITATLDLEATFALHANIIMALTVAIVHNAPARIGSGSTAPLYEPGESMRSWGRMSLGQR
GLTTLFVHHEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLVLAARYYVLQAPRLGGAGATYDLQAVKD
ICATYAIPHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAGFPLYVERRIAADVRETGALEKFIAHD
RSCLRVSDREFITYIYLAHFECFSPPRLATHLRAVTTHDPSPAASTEQPSPLGREAVEQFFRHVRAQLNI
REYVKQNVTPRETALAGDAAAAYLRARTYAPAALTPAPAYCGVADSSTKMMGRLAEAERLLVPHGWPAFA
PTTPGDDAGGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAALMQDASVQTPLPVYRITMSPTGQAFAAA
ARDDWARVTRDARPPEATWADAAAAPEPGALGRRLTRRICARGPAPPPGRPGRRGPDVREPQRDLQRRA
GRYEHHPGSGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:288] = 15
ORF # = 57h
ORF start site = 100483
ORF end site = 102942
ORF sequence:
MLRMAWETSTSADLSAAPTDMYICRMVAMHVSARRRILSRCAATAPSMVEPSSPGWWRASLSRLTMQAWY
VRARARAFTRRRVSSSDSRASSSVMGAGKSALTTARASCSRGSXSEGGAAARIISYCCSSGRVPQPHSTP
SRDAIPEHARGSAPAFPHPTPSGFAGAMGTEDCDHEGRSVAAPVEVMALYATDGCVITSSLALLTNCLLG
AEPLYIFSYDAYRPDAPNGPTGAPTEQERFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTHHPKGRTR
PMFVCRFERADDVAVLQDALGRGTPLLPAHITATLDLEATFALHANIIMALTVAIVHNAPARIGSGSTAP
LYEPGESMRSWGRMSLGQRGLTTLFVHHEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLVLAARYYV
LQAPRLGGAGATYDLQAVKDICATYAIPHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAAGFPLYVE
RRIAADVRETGALEKFIAHDRSCLRVSDREFITYIYLAHFECFSPPRLATHLRAVTTHDPSPAASTEQPS
PLGREAVEQFFRHVRAQLNIREYVKQNVTPRETALAGDAAAAYLRARTYAPAALTPAPAYCGVADSSTKM
MGRLAEAERLLVPHGWPAFAPTTPGDDAGGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAALMQDASVQ
TPLPVYRITMSPTGQAFAAAARDDWARVTRDARPPEATWADAAAAPEPGALGRRLTRRICARGPAPPPG
RPGRRGPDVREPQRDLQRRAGRYEHHPGSGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
[SEQ ID NO:289] = 15
ORF # = 57i
ORF start site = 100242
ORF end site = 102942
ORF sequence:
MTTSLSAMLRMAWETSTSADLSAAPTDMYICRMVAMHVSARRRILSRCAATAPSMVEPSSPGWWRASLSR
LTMQAWYVRARARAFTRRRVSSSDSRASSSVMGAGKSALTTARASCSRGSXSEGGAAARIISYCCSSGRV
PQPHSTPSRDAIPEHARGSAPAFPHPTPSGFAGAMGTEDCDHEGRSVAAPVEVMALYATDGCVITSSLAL
LTNCLLGAEPLYIFSYDAYRPDAPNGPTGAPTEQERFEGSRALYRDAGGLNGDSFRVTFCLLGTEVGVTH
HPKGRTRPMFVCRFERADDVAVLQDALGRGTPLLPAHITATLDLEATFALHANIIMALTVAIVHNAPARI
GSGSTAPLYEPGESMRSWGRMSLGQRGLTTLFVHHEARVLAAYRRAYYGSAQSPFWFLSKFGPDEKSLV
LAARYYVLQAPRLGGAGATYDLQAVKDICATYAIPHDPRPDTLSAASLTSFAAITRFCCTSQYSRGAAAA
GFPLYVERRIAADVRETGALEKFIAHDRSCLRVSDREFITYIYLAHFECFSPPRLATHLRAVTTHDPSPA
ASTEQPSPLGREAVEQFFRHVRAQLNIREYVKQNVTPRETALAGDAAAAYLRARTYAPAALTPAPAYCGV
AADSSTKMMGRLAEAERLLVPGWPAFAPTTPGDDAGGGTAAPQTCGIVKRLLKLAATEQQGTTPPAIAAL
MQDASVQTPLPVYRITMSPTGQAFAAAARDDWARVTRDARPPEATWADAAAAPEPGALGRRLTRRICAR
GPAPPPGRPGRRGPDVREPQRDLQRRAGRYEHHPGSGHRPEGARPLSPAPRGPGSL*
Gene matched: gi|136939|sp|P10236|UL52_HSV11 Gene name: DNA HELICASE/PRIMASE COMPLEX
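Entries 57a to 57i share the same end site (102942) and differ only in their start sites, so each longer ORF should be the shorter one with extra residues added at the N-terminus. A minimal Python sketch of that check follows. The helper `n_terminal_extension` is hypothetical, and ORF 57a is shortened to its first residues for brevity; the 73-residue prefix is copied from ORF 57b (SEQ ID NO:282) above.

```python
from typing import Optional

# First 73 residues of ORF 57b (SEQ ID NO:282), which precede the
# MGTEDCDHEG... start of ORF 57a (SEQ ID NO:281).
PREFIX_57B = (
    "MGAGKSALTTARASCSRGSXSEGGAAARIISYCCSSGRVPQPHSTPSRDAIPEHARGSAPAFPHPTPS"
    "GFAGA"
)


def n_terminal_extension(longer: str, shorter: str) -> Optional[str]:
    """Return the residues that `longer` adds in front of `shorter`.

    Returns None when `longer` does not end with `shorter`, i.e. when the
    two ORFs do not share the same downstream sequence and stop codon.
    """
    if not longer.endswith(shorter):
        return None
    return longer[: len(longer) - len(shorter)]


# Shortened stand-in for ORF 57a: only its first residues are used here.
orf_57a_head = "MGTEDCDHEGRSVAAPVEVMAL"
orf_57b_head = PREFIX_57B + orf_57a_head

extension = n_terminal_extension(orf_57b_head, orf_57a_head)
start_offset_codons = (100984 - 100765) // 3  # start sites of 57a and 57b
print(len(extension), start_offset_codons)
```

Here the 73-residue extension matches the 219 nt (73 codon) difference between the 57a and 57b start sites, which is consistent with both ORFs being read in one frame toward the shared end site.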
[SEQ ID NO:290] = 15
ORF # = 61a
ORF start site = 107456
ORF end site = 108016
ORF sequence:
MTTTPLSNLFLRAPDITHVAPPYCLNATWQAENALHTTKTDPACLAARSYLVRASCSTSGPIHCFFFAVY
KDSQHSLPLVTELRNFADLVNHPPVLRELEDKRGGRLRCTGPFSCGTIKDVSGASPAGEYTINGIVYHCH
CRYPFSKTCWLGASAALQHLRSISSSGTAARAAEQRRHKIKIKIKV*
Gene matched: gi|136947|sp|P28281|UL55_HSV2H Gene name: PROTEIN UL55^Agi|73806|pir||w
[SEQ ID NO:291] = 15
ORF # = 61b
ORF start site = 107372
ORF end site = 108016
ORF sequence:
MWGPGPARFIARPGTHGRRVFTDPPPRNMTTTPLSNLFLRAPDITHVAPPYCLNATWQAENALHTTKTDP
ACLAARSYLVRASCSTSGPIHCFFFAVYKDSQHSLPLVTELRNFADLVNHPPVLRELEDKRGGRLRCTGP
FSCGTIKDVSGASPAGEYTINGIVYHCHCRYPFSKTCWLGASAALQHLRSISSSGTAARAAEQRRHKIKI
KIKV*
Gene matched: gi|136947|sp|P28281|UL55_HSV2H Gene name: PROTEIN UL55^Agi|73806|pir||w
[SEQ ID NO:292] = 15
ORF # = 6a
ORF start site = 6446
ORF end site = 8482
ORF sequence:
MAAQRARAPAMRTRGGDAALCAPEDGWVKVHPTPGTMLFREILLGQMGYTEGQGVYNWRSSEAATRQLQ
AAIFHALLNATTYRDLEEDWRRHWARGLQPQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTLLDFAH
GWDCFAPGGPSGPTSFPKYIDWLTCLGLVPILRKTREGEATQRLGAFLRQHTLPRQLATVAGAAERAGP
GLLELAVAFDSTRMAEYDRVHIYYNHRRGEWLVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQRLCPEI
VACHALREHAHICRLRNTASVKVLLGRKSDSERGVAGAAARWNKALGEDDETKAGSAASRLVRLIIMKG
MRHVGDINDTVRAYLDEAGGHLIDTPAVDHTLPGFGKGGTGRGSAAQDPGARPQQLRQAFQTAWNNING
MLEGYINNLFGTIERLRETNAGLATQLQARDRELRRAQAGALEREQRAADRAAGGGAGRPAEADLLRADY
DIIDVSKSMDDDTYVANSFQHQYIPAYGQDLERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGAIAS
FVAPYFEYVLRAPRAGALITGSDVILGEEELWEAVFKKTRLQTYLTDVAALFVADVQHAALPRPPSPTPA
DFRASASPRGGSRSRTRTRSRSPGRTPRGAPDQGWGVERRDGRPHARR*
Gene matched: gi|136794|sp|P10190|UL06_HSV11 Gene name: VIRION PROTEIN UL6^Agi|73994|
[SEQ ID NO:293] = 15
ORF # = 6b
ORF start site = 6326
ORF end site = 8482
ORF sequence:
MDVKFKNASSLNRTAGLAPGCCGGGPGARTSREPSPPDAAMAAQRARAPAMRTRGGDAALCAPEDGWVKV
HPTPGTMLFREILLGQMGYTEGQGVYNWRSSEAATRQLQAAIFHALLNATTYRDLEEDWRRHWARGLQ
PQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTLLDFAHGWDCFAPGGPSGPTSFPKYIDWLTCLGLV
PILRKTREGEATQRLGAFLRQHTLPRQLATVAGAAERAGPGLLELAVAFDSTRMAEYDRVHIYYNHRRGE
WLVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQRLCPEIVACHALREHAHICRLRNTASVKVLLGRKSD
SERGVAGAARWNKALGEDDETKAGSAASRLVRLIINMKGMRHVGDINDTVRAYLDEAGGHLIDTPAVDH
TLPGFGKGGTGRGSAAQDPGARPQQLRQAFQTAWNNINGMLEGYINNLFGTIERLRETNAGLATQLQAR
DRELRRAQAGALEREQRAADRAAGGGAGRPAEADLLRADYDIIDVSKSMDDDTYVANSFQHQYIPAYGQD
LERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGAIASFVAPYFEYVLRAPRAGALITGSDVILGEEE
LWEAVFKKTRLQTYLTDVAALFVADVQHAALPRPPSPTPADFRASASPRGGSRSRTRTRSRSPGRTPRGA
PDQGWGVERRDGRPHARR*
Gene matched: gi|136794|sp|P10190|UL06_HSV11 Gene name: VIRION PROTEIN UL6^Agi|73994|
[SEQ ID NO:294] = 15
ORF # = 6c
ORF start site = 6296
ORF end site = 8482
ORF sequence:
MRAMIGWTPCMDVKFKNASSLNRTAGLAPGCCGGGPGARTSREPSPPDAAMAAQRARAPAMRTRGGDAAL
CAPEDGWVKVHPTPGTMLFREILLGQMGYTEGQGVYNWRSSEAATRQLQAAIFHALLNATTYRDLEEDW
RRHWARGLQPQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTLLDFAHGWDCFAPGGPSGPTSFPKY
IDWLTCLGLVPILRKTREGEATQRLGAFLRQHTLPRQLATVAGAAERAGPGLLELAVAFDSTRMAEYDRV
HIYYNHRRGEWLVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQRLCPEIVACHALREHAHICRLRNTAS
VKVLLGRKSDSERGVAGAARWNKALGEDDETKAGSAASRLVRLIINMKGMRHVGDINDTVRAYLDEAGG
HLIDTPAVDHTLPGFGKGGTGRGSAAQDPGARPQQLRQAFQTAWNNINGMLEGYINNLFGTIERLRETN
AGLATQLQARDRELRRAQAGALEREQRAADRAAGGGAGRPAEADLLRADYDIIDVSKSMDDDTYVANSFQ
HQYIPAYGQDLERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGAIASFVAPYFEYVLRAPRAGALIT
GSDVILGEEELWEAVFKKTRLQTYLTDVAALFVADVQHAALPRPPSPTPADFRASASPRGGSRSRTRTRS
RSPGRTPRGAPDQGWGVERRDGRPHARR*
Gene matched: gi|136794|sp|P10190|UL06_HSV11 Gene name: VIRION PROTEIN UL6^Agi|73994|
[SEQ ID NO:295] = 15
ORF # = 6d
ORF start site = 6167
ORF end site = 8482
ORF sequence:
MRYAANGNSRSGRPVGTSKAATSRNHCRRGTCVTSSCCCESSRMRAMIGWTPCMDVKFKNASSLNRTAGL
APGCCGGGPGARTSREPSPPDAAMAAQRARAPAMRTRGGDAALCAPEDGWVKVHPTPGTMLFREILLGQM
GYTEGQGVYNWRSSEAATRQLQAAIFHALLNATTYRDLEEDWRRHWARGLQPQRLVRRYRNAREGDIA
GVAERVFDTWRCTLRTTLLDFAHGWDCFAPGGPSGPTSFPKYIDWLTCLGLVPILRKTREGEATQRLGA
FLRQHTLPRQLATVAGAAERAGPGLLELAVAFDSTRMAEYDRVHIYYNHRRGEWLVRDPVSGQRGECLVL
CPPLWTGDRLVFDSPVQRLCPEIVACHALREHAHICRLRNTASVKVLLGRKSDSERGVAGAARWNKALG
EDDETKAGSAASRLVRLIINMKGMRHVGDINDTVRAYLDEAGGHLIDTPAVDHTLPGFGKGGTGRGSAAQ
DPGARPQQLRQAFQTAWNNINGMLEGYINNLFGTIERLRETNAGLATQLQARDRELRRAQAGALEREQR
AADRAAGGGAGRPAEADLLRADYDIIDVSKSMDDDTYVANSFQHQYIPAYGQDLERLSRLWEHELVRCFK
ILRHRNNQGQETSISYSSGAIASFVAPYFEYVLRAPRAGALITGSDVILGEEELWEAVFKKTRLQTYLTD
VAALFVADVQHAALPRPPSPTPADFRASASPRGGSRSRTRTRSRSPGRTPRGAPDQGWGVERRDGRPHAR
R*
Gene matched: gi|136794|sp|P10190|UL06_HSV11 Gene name: VIRION PROTEIN UL6^Agi|73994|
[SEQ ID NO:296] = 15
ORF # = 6e
ORF start site = 6065
ORF end site = 8482
ORF sequence:
MFCAAIRVAPVTTQSRTSLRVCTHVLFPDPALPVMRYAANGNSRSGRPVGTSKAATSRNHCRRGTCVTSS
CCCESSRMRAMIGWTPCMDVKFKNASSLNRTAGLAPGCCGGGPGARTSREPSPPDAAMAAQRARAPAMRT
RGGDAALCAPEDGWVKVHPTPGTMLFREILLGQMGYTEGQGVYNVVRSSEAATRQLQAAIFHALLNATTY
RDLEEDWRRHWARGLQPQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTLLDFAHGWDCFAPGGPSG
PTSFPKYIDWLTCLGLVPILRKTREGEATQRLGAFLRQHTLPRQLATVAGAAERAGPGLLELAVAFDSTR
MAEYDRVHIYYNHRRGEWLVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQRLCPEIVACHALREHAHIC
RLRNTASVKVLLGRKSDSERGVAGAARWNKALGEDDETKAGSAASRLVRLIINMKGMRHVGDINDTVRA
YLDEAGGHLIDTPAVDHTLPGFGKGGTGRGSAAQDPGARPQQLRQAFQTAWNNINGMLEGYINNLFGTI
ERLRETNAGLATQLQARDRELRRAQAGALEREQRAADRAAGGGAGRPAEADLLRADYDIIDVSKSMDDDT
YVANSFQHQYIPAYGQDLERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGAIASFVAPYFEYVLRAP
RAGALITGSDVILGEEELWEAVFKKTRLQTYLTDVAALFVADVQHAALPRPPSPTPADFRASASPRGGSR
SRTRTRSRSPGRTPRGAPDQGWGVERRDGRPHARR*
Gene matched: gi|136794|sp|P10190|UL06_HSV11 Gene name: VIRION PROTEIN UL6^Agi|73994|
[SEQ ID NO:297] = 15
ORF # = 6f
ORF start site = 6026
ORF end site = 8482
ORF sequence:
MGRLRNAPESLTYMFCAAIRVAPVTTQSRTSLRVCTHVLFPDPALPVMRYAANGNSRSGRPVGTSKAATS
RNHCRRGTCVTSSCCCESSRMRAMIGWTPCMDVKFKNASSLNRTAGLAPGCCGGGPGARTSREPSPPDAA
MAAQRARAPAMRTRGGDAALCAPEDGWVKVHPTPGTMLFREILLGQMGYTEGQGVYNWRSSEAATRQLQ
AAIFHALLNATTYRDLEEDWRRHWARGLQPQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTLLDFAH
GWDCFAPGGPSGPTSFPKYIDWLTCLGLVPILRKTREGEATQRLGAFLRQHTLPRQLATVAGAAERAGP
GLLELAVAFDSTRMAEYDRVHIYYNHRRGEWLVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQRLCPEI
VACHALREHAHICRLRNTASVKVLLGRKSDSERGVAGAARWNKALGEDDETKAGSAASRLVRLIINMKG
MRHVGDINDTVRAYLDEAGGHLIDTPAVDHTLPGFGKGGTGRGSAAQDPGARPQQLRQAFQTAWNNING
MLEGYINNLFGTIERLRETNAGLATQLQARDRELRRAQAGALEREQRAADRAAGGGAGRPAEADLLRADY
DIIDVSKSMDDDTYVANSFQHQYIPAYGQDLERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGAIAS
FVAPYFEYVLRAPRAGALITGSDVILGEEELWEAVFKKTRLQTYLTDVAALFVADVQHAALPRPPSPTPA
DFRASASPRGGSRSRTRTRSRSPGRTPRGAPDQGWGVERRDGRPHARR*
Gene matched: gi|136794|sp|P10190|UL06_HSV11 Gene name: VIRION PROTEIN UL6^Agi|73994|
[SEQ ID NO:298] = 15
ORF # = 6g
ORF start site = 6017
ORF end site = 8482
ORF sequence:
MVLMGRLRNAPESLTYMFCAAIRVAPVTTQSRTSLRVCTHVLFPDPALPVMRYAANGNSRSGRPVGTSKA
ATSRNHCRRGTCVTSSCCCESSRMRAMIGWTPCMDVKFKNASSLNRTAGLAPGCCGGGPGARTSREPSPP
DAAMAAQRARAPAMRTRGGDAALCAPEDGWVKVHPTPGTMLFREILLGQMGYTEGQGVYNVVRSSEAATR
QLQAAIFHALLNATTYRDLEEDWRRHWARGLQPQRLVRRYRNAREGDIAGVAERVFDTWRCTLRTTLLD
FAHGWDCFAPGGPSGPTSFPKYIDWLTCLGLVPILRKTREGEATQRLGAFLRQHTLPRQLATVAGAAER
AGPGLLELAVAFDSTRMAEYDRVHIYYNHRRGEWLVRDPVSGQRGECLVLCPPLWTGDRLVFDSPVQRLC
PEIVACHALREHAHICRLRNTASVKVLLGRKSDSERGVAGAARVVNKALGEDDETKAGSAASRLVRLIIN
MKGMRHVGDINDTVRAYLDEAGGHLIDTPAVDHTLPGFGKGGTGRGSAAQDPGARPQQLRQAFQTAWNN
INGMLEGYINNLFGTIERLRETNAGLATQLQARDRELRRAQAGALEREQRAADRAAGGGAGRPAEADLLR
ADYDIIDVSKSMDDDTYVANSFQHQYIPAYGQDLERLSRLWEHELVRCFKILRHRNNQGQETSISYSSGA
IASFVAPYFEYVLRAPRAGALITGSDVILGEEELWEAVFKKTRLQTYLTDVAALFVADVQHAALPRPPSP
TPADFRASASPRGGSRSRTRTRSRSPGRTPRGAPDQGWGVERRDGRPHARR*
Gene matched: gi|136794|sp|P10190|UL06_HSV11 Gene name: VIRION PROTEIN UL6^Agi|73994|
[SEQ ID NO:299] = 15
ORF # = 47a
ORF start site = 88122
ORF end site = 89564
ORF sequence:
MALGRVGLAVGLWGLLWVGWWLANASPGRTITVGPRGNASNAAPSASPRNASAPRTTPTPPQPRKATK
SKASTAKPAPPPKTGPPKTSSEPVRCNRHDPLARYGSRVQIRCRFPNSTRTESRLQIWRYATATDAEIGT
APSLEEVMVNVSAPPGGQLVYDSAPNRTDPHVIWAEGAGPGASPRLYSWGPLGRQRLIIEELTLETQGM
YYWVWGRTDRPSAYGTWVRVRVFRPPSLTIHPHAVLEGQPFKATCTAATYYPGNRAEFVWFEDGRRVFDP
AQIHTQTQENPDGFSTVSTVTSAAVGGQGPPRTFTCQLTWHRDSVSFSRRNASGTASVLPRPTITMEFTG
DHAVCTAGCVPEGVTFAWFLGDDSSPAEKVAVASQTSCGRPGTATIRSTLPVSYEQTEYICRLAGYPDGI
PVLEHHGSHQPPPRDPTERQVIRAVEGAGIGVAVLVAWLAGTAWYLTHASSVRYRRLR*
Gene matched: gi|138220|sp|P06475|VGLC_HSV23 Gene name: GLYCOPROTEIN C PRECURSOR^Agi|
[SEQ ID NO:300] = 15
ORF # = 47b
ORF start site = 87918
ORF end site = 89564
ORF sequence:
MGAGVPWTGIKARGAGGPITVRVLGWEVAQKATHPCCSCPREAWSGNPPRCAGRAHRSFAGAGALLVMA
LGRVGLAVGLWGLLWVGWWLANASPGRTITVGPRGNASNAAPSASPRNASAPRTTPTPPQPRKATKSK
ASTAKPAPPPKTGPPKTSSEPVRCNRHDPLARYGSRVQIRCRFPNSTRTESRLQIWRYATATDAEIGTAP
SLEEVMVNVSAPPGGQLVYDSAPNRTDPHVIWAEGAGPGASPRLYSWGPLGRQRLIIEELTLETQGMYY
WVWGRTDRPSAYGTWVRVRVFRPPSLTIHPHAVLEGQPFKATCTAATYYPGNRAEFVWFEDGRRVFDPAQ
IHTQTQENPDGFSTVSTVTSAAVGGQGPPRTFTCQLTWHRDSVSFSRRNASGTASVLPRPTITMEFTGDH
AVCTAGCVPEGVTFAWFLGDDSSPAEKVAVASQTSCGRPGTATIRSTLPVSYEQTEYICRLAGYPDGIPV
LEHHGSHQPPPRDPTERQVIRAVEGAGIGVAVLVAWLAGTAWYLTHASSVRYRRLR*
Gene matched: gi|138220|sp|P06475|VGLC_HSV23 Gene name: GLYCOPROTEIN C PRECURSOR^Agi|
[SEQ ID NO:301] = 15
ORF # = 52a
ORF start site = 97076
ORF end site = 95441
ORF sequence:
MSVLGDARHPRRFPSRGPRPFSVAGPGSLPPSPPPGARARLIRLSRSLFPDPTAPMDLLVDDLFADADGV
SPPPPRPAGGPKNTPAAPPLYATGRLSQAQLMPSPPMPVPPAALFNRLLDDLGFSAGPALCTMLDTWNED
LFSGFPTNADMYRECKFLSTLPSDVIDWGDAHVPERSPIDIRAHGDVAFPTLPATRDELPSYYEAMAQFF
RGELRAREESYRTVLANFCSALYRYLRASVRQLHRQAHMRGRNRDLREMLRTTIADRYYRETARLARVLF
LHLYLFLSREILWAAYAEQMMRPDLFDGLCCDLESWRQLACLFQPLMFINGSLTVRGVPVEARRLRELNH
IREHLNLPLVRSAAAEEPGAPLTTPPVLQGNQARSSGYFMLLIRAKLDSYSSVATSEGESVMREHAYSRG
RTRNNYGSTIEGLLDLPDDDDAPAEAGLVAPRMSFLSAGQRPRRLSTTAPITDVSLGDELRLDGEEVDMT
PADALDDFDLEMLGDVESPSPGMTHDPVSYGALDVDDFEFEQMFTDAMGIDDFGG*
Gene matched: gi|1168549|sp|P29793|ATIN_HSV23 Gene name: ALPHA TRANS-INDUCING PROTEIN
[SEQ ID NO:302] = 15
ORF # = 52b
ORF start site = 969103
ORF end site = 95441
ORF sequence:
MDLLVDDLFADADGVSPPPPRPAGGPKNTPAAPPLYATGRLSQAQLMPSPPMPVPPAALFNRLLDDLGFS
AGPALCTMLDTWNEDLFSGFPTNADMYRECKFLSTLPSDVIDWGDAHVPERSPIDIRAHGDVAFPTLPAT
RDELPSYYEAMAQFFRGELRAREESYRTVLANFCSALYRYLRASVRQLHRQAHMRGRNRDLREMLRTTIA
DRYYRETARLARVLFLHLYLFLSREILWAAYAEQMMRPDLFDGLCCDLESWRQLACLFQPLMFINGSLTV
RGVPVEARRLRELNHIREHLNLPLVRSAAAEEPGAPLTTPPVLQGNQARSSGYFMLLIRAKLDSYSSVAT
SEGESVMREHAYSRGRTRNNYGSTIEGLLDLPDDDDAPAEAGLVAPRMSFLSAGQRPRRLSTTAPITDVS
LGDELRLDGEEVDMTPADALDDFDLEMLGDVESPSPGMTHDPVSYGALDVDDFEFEQMFTDAMGIDDFGG*
Gene matched: gi|1168549|sp|P29793|ATIN_HSV23 Gene name: ALPHA TRANS-INDUCING PROTEIN
[SEQ ID NO:303] = 15
ORF # = 52c
ORF start site = 97097
ORF end site = 95441
ORF sequence:
MRGGGREMSVLGDARHPRRFPSRGPRPFSVAGPGSLPPSPPPGARARLIRLSRSLFPDPTAPMDLLVDDL
FADADGVSPPPPRPAGGPKNTPAAPPLYATGRLSQAQLMPSPPMPVPPAALFNRLLDDLGFSAGPALCTM
LDTWNEDLFSGFPTNADMYRECKFLSTLPSDVIDWGDAHVPERSPIDIRAHGDVAFPTLPATRDELPSYY
EAMAQFFRGELRAREESYRTVLANFCSALYRYLRASVRQLHRQAHMRGRNRDLREMLRTTIADRYYRETA
RLARVLFLHLYLFLSREILWAAYAEQMMRPDLFDGLCCDLESWRQLACLFQPLMFINGSLTVRGVPVEAR
RLRELNHIREHLNLPLVRSAAAEEPGAPLTTPPVLQGNQARSSGYFMLLIRAKLDSYSSVATSEGESVMR
EHAYSRGRTRNNYGSTIEGLLDLPDDDDAPAEAGLVAPRMSFLSAGQRPRRLSTTAPITDVSLGDELRLD
GEEVDMTPADALDDFDLEMLGDVESPSPGMTHDPVSYGALDVDDFEFEQMFTDAMGIDDFGG*
Gene matched: gi|1168549|sp|P29793|ATIN_HSV23 Gene name: ALPHA TRANS-INDUCING PROTEIN
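Because every entry in the table above follows the same layout (SEQ ID line, ORF number, start and end sites, sequence, and gene match), the records can be parsed mechanically and their residue counts cross-checked against the coordinates. The Python sketch below assumes that layout and a 1-based, inclusive coordinate convention that includes the stop codon; `OrfRecord`, `parse_records`, and `residues_match_span` are illustrative names only. The sample is ORF 61a (SEQ ID NO:290), with the trailing secondary identifier trimmed from its Gene matched line.

```python
import re
from dataclasses import dataclass


@dataclass
class OrfRecord:
    seq_id: int
    orf: str
    start: int
    end: int
    protein: str  # translated ORF, including the terminal '*'


# Assumes the record layout used in the ORF table above.
RECORD_RE = re.compile(
    r"\[SEQ ID NO:(?P<seq_id>\d+)\].*?"
    r"ORF # = (?P<orf>\S+)\s+"
    r"ORF start site = (?P<start>\d+)\s+"
    r"ORF end site = (?P<end>\d+)\s+"
    r"ORF sequence:(?P<seq>.*?)Gene matched:",
    re.DOTALL,
)


def parse_records(text: str) -> list[OrfRecord]:
    """Extract ORF records from text laid out like the table above."""
    records = []
    for m in RECORD_RE.finditer(text):
        protein = "".join(m.group("seq").split())  # drop line breaks/spaces
        records.append(
            OrfRecord(int(m.group("seq_id")), m.group("orf"),
                      int(m.group("start")), int(m.group("end")), protein)
        )
    return records


def residues_match_span(rec: OrfRecord) -> bool:
    """Check the residue count against the coordinate span.

    Assumes 1-based inclusive coordinates covering the stop codon, so the
    span should equal 3 * (residues + 1) nucleotides.
    """
    span_nt = abs(rec.end - rec.start) + 1
    residues = len(rec.protein.rstrip("*"))
    return span_nt == 3 * (residues + 1)


SAMPLE = """[SEQ ID NO:290] = 15
ORF # = 61a
ORF start site = 107456
ORF end site = 108016
ORF sequence:
MTTTPLSNLFLRAPDITHVAPPYCLNATWQAENALHTTKTDPACLAARSYLVRASCSTSGPIHCFFFAVY
KDSQHSLPLVTELRNFADLVNHPPVLRELEDKRGGRLRCTGPFSCGTIKDVSGASPAGEYTINGIVYHCH
CRYPFSKTCWLGASAALQHLRSISSSGTAARAAEQRRHKIKIKIKV*
Gene matched: gi|136947|sp|P28281|UL55_HSV2H Gene name: PROTEIN UL55
"""

rec = parse_records(SAMPLE)[0]
print(rec.orf, len(rec.protein) - 1, residues_match_span(rec))
```

For ORF 61a the 186 residues plus the stop codon account for all 561 nt between 107456 and 108016, so the check passes. Entries whose span is not a whole number of codons would return False and merit a closer look at the source record.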
SEQUENCE LISTING
(1) GENERAL INFORMATION
(i) APPLICANT: SmithKline Beecham Corporation
(ii) TITLE OF THE INVENTION: Novel Coding Sequences from Herpes Simplex Virus Type-2
(iii) NUMBER OF SEQUENCES: 303
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: SmithKline Beecham Corporation
(B) STREET: 709 Swedeland Road
(C) CITY: King of Prussia
(D) STATE: PA
(E) COUNTRY: U.S.A.
(F) ZIP: 19046
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: Windows
(D) SOFTWARE: FastSEQ for Windows Version 2.0b
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 60/049,018
(B) FILING DATE: 09-JUN-1997
(A) APPLICATION NUMBER: 60/030,279
(B) FILING DATE: 04-NOV-1996
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Geiger, Kathleen W.
(B) REGISTRATION NUMBER: 35,880
(C) REFERENCE/DOCKET NUMBER: P50583
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 610-270-5968
(B) TELEFAX: 610-270-5090
(C) TELEX:
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8953 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
GCGGTCGATC TAGAGGATCC CCCGCGCGCC TGCAGCTGGG TCGCCAGACC CGCGTTCGTC 60
TCTCGCAGGC GTTCTATGGT TCCAAAGAGA TTATTGATAT AGCCCTCCAG CATGCCGTTG 120
ATGTTGTTGA CCACGGCCGT CTGAAACGCC TGGCGAAGCT GCTGCGGTCG CGCCCCCGGG 180
TCCTGGGCCG CCGACCCGCG GCCGGTGCCG CCCTTGCCGA ACCCAGGGAG GGTGTGGTCG 240
ACGGCGGGGG TGTCGATCAG GTGCCCCCCC GCCTCGTCCA AGTAGGCGCG TACCGTGTCG 300
TTGATGTCGC CCACGTGGCG CATGCCCTTC ATGTTGATGA TGAGCCGCAC GAGACACGAG 360
GCGGCCGAGC CGGCCTTCGT CTCGTCATCC TCCCCCAGCG CCTTATTGAC GACCCGCGCG 420
GCGCCAGCCA CCCCGCGCTC GCTGTCGCTC TTGCGCCCCA ACAGCACCTT GACGGACGCG 480
GTGTTGCGCA GACGGCAGAT GTGCGCGTGT TCCCGGAGGG CGTGGCACGC GACGATCTCG 540
GGGCACAGCC GCTGAACGGG CGAATCGAAG ACCAGGCGGT CGCCGGTCCA CAGGGGGGGG 600
CACAGCACCA GGCACTCGCC GCGCTGCCCG CTGACCGGGT CGCGCACCAG CCACTCCCCC 660
CGGCGATGGT TGTAGTAGAT GTGCACACGG TCGTATTCCG CCATGCGCGT GGAGTCGAAC 720
GCGACGGCCA GCTCCAGAAG CCCCGGGCCG GCGCGCTCCG CGGCCCCGGC GACCGTGGCC 780
AGCTGCCGGG GCAGCGTGTG CTGCCTGAGA AACGCCCCCA GGCGCTGCGT CGCCTCCCCC 840
TCGCGCGTCT TGCGCAATAT GGGAACCAGC CCCAGACACG TCAGCCAGTC GATATATTTG 900
GGGAAGCTGG TCGGTCCGCT TGGGCCGCCC GGCGCAAAGC AGTTTACCAC CCCGTGGGCA 960
AAGTCCAGCA GCGTCGTCCT GAGCGTGCAT CGCCACGTGT CGAACACCCG CTCGGCCACC 1020
CCGGCGATAT CGCCCTCCCG GGCGTTCCGG TACCTGCGAA CCAGCCGCTG CGGCTGGAGG 1080
CCGCGGGCCA CCACGTGGCG GCGCCAGTCC TCCTCCAGGT CCCGGTACGT TGTGGCGTTG 1140
AGGAGCGCGT GGAAGATCGC CGCCTGCAGC TGTCGGGTGG CGGCCTCGCT GGACCGGACG 1200
ACGTTGTACA CCCCCTGACC CTCGGTGTAC CCCATCTGCC CGAGGAGAAT CTCGCGGAAC 1260
AACATCGTCC CGGGGGTGGG GTGAACCTTC ACCCAGCCGT CCTCGGGGGC GCATAGCGCC 1320
GCGTCGCCGC CCCGCGTCCG CATCGCCGGC GCCCGCGCGC GCTGTGCGGC CATGGCGGCG 1380
TCCGGCGGGG AGGGCTCGCG GGACGTCCGG GCACCAGGTC CGCCCCCACA GCAGCCCGGG 1440
GCCAGACCCG CCGTGCGGTT CAGGGACGAG GCGTTTTTAA ATTTTACGTC CATGCACGGG 1500
GTCCAACCAA TCATCGCACG CATCCGAGAG CTCTCGCAGC AGCAGCTCGA CGTCACGCAG 1560
GTGCCGCGCC TGCAGTGGTT CCGGGACGTG GCGGCCTTGG AGGTCCCGAC CGGCCTGCCG 1620
CTCCGGGAGT TTCCGTTCGC GGCGTATCTC ATCACCGGCA ACGCCGGATC CGGAAAGAGT 1680
ACGTGCGTGC AGACCCTCAA CGAGGTCCTG GACTGCGTGG TCACGGGCGC CACGCGAATC 1740
GCGGCGCAGA ACATGTACGT TAAGCTCTCG GGGGCGTTTC TGAGTCGACC CATCAACACC 1800
ATCTTTCACG AGTTCGGGTT TCGCGGGAAT CACGTCCAGG CCCAGCTGGG GCAGCACCCG 1860
TACACCCTGG CCAGCAGCCC CGCCTCGCTG GAAGACCTGC AGCGGCGAGA CCTGACGTAC 1920
TACTGGGAGG TGATCCTCGA CATCACCAAG CGGGCCCTGG CGGCGCACGG GGGCGAAGAC 1980
GCGCGAAACG AGTTCCACGC CCTCACCGCC CTAGAGCAGA CTTTGGGGCT GGGCCAGGGT 2040
GCCCTCACGC GCCTGGCCTC GGTCACACAC GGGGCGCTGC CGGCTTTCAC CCGCAGCAAC 2100
ATTATCGTCA TCGACGAGGC CGGGCTCCTG GGGCGGCACC TACTCACGAC CGTGGTGTAT 2160
TGCTGGTGGA TGATTAACGC CCTGTACCAC ACCCCCCAGT ACGCGGGCCG CCTGCGGCCG 2220
GTGCTGGTGT GCGTGGGGTC GCCGACCCAG ACGGCCTCGC TGGAGTCCAC CTTCGAACAC 2280
CAAAAACTGC GATGCTCCGT CCGGCAGAGC GAAAACGTGC TCACGTACCT CATCTGCAAC 2340
CGCACCCTAC GCGAGTACAC GCGCCTCTCG CACAGCTGGG CCATTTTCAT TAACAACAAG 2400
CGATGTGTGG AGCACGAGTT CGGGAACCTC ATGAAGGTGC TGGAGTACGG CCTTCCCATC 2460
ACCGAGGAGC ACATGCAGTT TGTGGACCGC TTTGTCGTCC CGGAAAGTTA CATCACCAAC 2520
CCGGCCAACC TTCCGGGGTG GACGCGGCTG TTCTCGTCCC ACAAGGAGGT CAGCGCGTAC 2580
ATGGCCAAGC TCCACGCCTA CCTAAAGGTG ACTCGCGAGG GGGAGTTTGT TGTGTTTACC 2640
CTCCCCGTGC TTACGTTTGT GTCGGTCAAA GAGTTTGACA AGTATCGACG GCTCACGCAG 2700
CAACCCACGC TGACCATGGA AAAGTGGATC ACGGCCAACG CCAGTCGCAT CACCAACTAC 2760
TCCCAGAGTC AGGACCAGGA CGCGGGGCAC GTGCGCTGTG AGGTGCACAG CAAGCAACAG 2820
CTAGTCGTGG CCCGGAACGA CATCACGTAC GTCCTCAACA GCCAGGTCGC GGTGACCGCG 2880
CGCCTCCGAA AGATGGTGTT TGGGTTCGAC GGGACGTTTC GGACCTTCGA GGCTGTGCTG 2940
CGCGACGACA GCTTCGTGAA GACCCAGGGG GAGACCTCGG TGGAGTTCGC CTACCGGTTC 3000
CTGTCGCGGC TCATGTTCGG CGGGCTGATT CACTTTTACA ACTTTCTCCA GCGCCCCGGC 3060
CTGGACGCGA CCCAGAGGAC CCTGGCCTAC GGCCGCCTAG GGGAGCTGAC GGCAGAACTC 3120
CTGTCGCTAC GCCGGGACGC CGCCGGCGCA TCGGCAACCA GGGCCGCCGA CACCAGCGAC 3180
CGCTCTCCGG GGGAGCGTGC GTTCAATTTT AAGCACCTGG GCCCGCGGGA CGGGGGCCCG 3240
GACGACTTCC CCGACGACGA CCTTGACGTT ATCTTCGCCG GGCTGGACGA ACAGCAGCTG 3300
GACGTGTTCT ACTGCCACTA CGCCCTCGAA GAGCCGGAGA CCACCGCGGC CGTCCACGCC 3360
CAGTTTGGGC TCCTGAAGAG GGCCTTTCTG GGGCGATACC TTATCCTACG GGAGCTCTTC 3420
GGGGAGGTGT TTGAGAGCGC CCCCTTCAGC ACCTACGTGG ACAATGTCAT CTTCCGGGGC 3480
TGCGAGCTGC TGACCGGCTC GCCGCGCGGG GGGCTGATGT CCGTGGCCCT GCAGACGGAC 3540
AACTACACGC TGATGGGGTA CACGTACACC CGGGTGTTCG CGTTCGCGGA GGAGCTGCGG 3600
CGGCGGCACG CGACGGCCGG CGTGGCCGAG TTCTTGGAGG AGTCCCCCCT GCCCTACATC 3660
GTCCTGCGGG ACCAGCACGG CTTCATGTCT GTCGTCAATA CCAACATCAG TGAGTTTGTC 3720
GAGTCGATCG ACTCCACGGA GCTGGCCATG GCCATCAACG CCGACTACGG CATCAGCTCC 3780
AAACTCGCGA TGACCATCAC GCGCTCCCAG GGGCTCAGTC TGGACAAGGT CGCCATATGC 3840
TTCACGCCCG GAAACCTGCG CCTAAACAGC GCGTACGTAG CCATGTCCCG CACCACCTCA 3900
TCCGAGTTCC TGCACATGAA TCTAAACCCG CTCCGGGAGC GCCACGAACG CGATGACGTC 3960
ATTAGCGAGC ACATACTATC TGCTCTACGC GATCCGAATG TGGTCATTGT CTATTAACCC 4020
TCCATTCCCT CGCGTTCCCA CCGCACCCGG GCCGGGTGAC ATTCACCCCC ACCCCCCCGA 4080
GACATGGGGA ACCCCCAGAC GACCATCGCG TACAGCCTAC ATCACCCCAG GGCGTCGCTA 4140
ACGAGCGCGC TGCCGGACGC GGCACAGGTG GTGCACGTGT TTGAGTCAGG GACGCGCGCG 4200
GTTCTGACGC GGGGTCGAGC GCGCCAGGAC CGCCTGCCCC GCGGAGGCGT GGTGATACAA 4260
CACACCCCCA TCGGGCTGCT GGTGATTATC GACTGTCGTG CCGAATTTTG CGCATACCGC 4320
TTTATAGGAC GCGCTAGTAC CCAGAGGCTG GAGCGCTGGT GGGACGCCCA TATGTACGCG 4380
TACCCCTTTG ACTCCTGGGT CAGCTCATCG CACGGCGAAA GCGTCCGGAG TGCGACGGCC 4440
GGCATCCTGA CGGTGGTGTG GACCCCGGAC ACCATCTACA TCACCGCAAC GATCTACGGG 4500
ACGGCCCCCG AGGCGGCGCG GGGGTGCGAT AACGCACCCC TGGACGTCCG CCCAACCACA 4560
CCCCCCGCCC CCGTATCCCC AACGGCGGGC GAGTTCCCAG CAAACACAAC AGACCTACTG 4620
GTCGAGGTTC TGCGGGAAAT TCAGATCAGC CCCACCCTGG ACGACGCAGA CCCAACCCCC 4680
GGAACCTGAA ACCTTCTTTC CTCCCCACCC CGCCCGCTTG CATATTCCCT CTGCGCGCGG 4740
CGACGGCACC GCCGGGCGAA CGAACGGTCA ATAAAATCAA TCAATCCATC ATCCAACAAA 4800
ATAAGCTACG TGTTATTTAT TGAAACGTCA CAACACATCA GTAACGGGGG GAAGGGTAGG 4860
GGGAAAAGAA AGGGGACGGC GGGGGTGCTT AGTCTGGTTC CGTAGACAGC ATGACGTTAT 4920
CTCGATGGAG GCGCATGGGT TCGGACGAAA CAAACTCGTG TACAAACACG GGGACGGCCG 4980
GGCAGAGCCG CCCATCCGAG GACCGCGTGT AATACTTGCG CGGCTTGCGC GACCGAATAA 5040
CCGCCCGCAG CTGCTCGCGC ACCTGCGCGG CGTTGGCGCG CTTGCACAAA ACGAACATCT 5100
GGAGGCTCTT GTTGCTGCGT GGTACGCATG TGCGTTGCGG CGGTGCTCCG CGCTTGCGCT 5160
GGCGCGCGGC CGTCCCCGAA AACGACGAGG TCTTGGTACA CGCGATGCTG AACTTGGCCA 5220
GCGACAGCCG CAGGTCCTTA CGGATCGTAT CCGTGAGCTG GCGGCGCCCC AGTTCGTCGA 5280
TCGAAGATAC CATAAACAGA GTATCAAAGG TGACGTAGGG GCCGTCCGCC CCCGGCGGGT 5340
CGCCGGCCAC GCTCGACGTA AGCGGGGAGA TCGCATCCCG GTCCGGCTCT CCGTCGAGAG 5400
GCCCCACGTG CGCGTCCGGT GCGGTCCCCG CAATCGACTC TATGGGCGTC GTATTCGGCG 5460
ACGCGCCAGG ATCATGGTTC TGGGGTGCAA ACGTCCAGCC CCACGAGGCA AGGATAGTAA 5520
ACGCAGAGGG GGACCCTCTC TTCCCCCACG CCCGACATCA CGGACCGGTA TGAGACCCGA 5580
GATTTAACCA TCAACAGTCT TTAT AATTG CCCCCTCGTA AATCAAGACC CCGGATGTCG 5640
GCATCTTATA CCGACCAGTC GATAGGCATA ATGTCCCGGG TTTCGAGGTA GCGATTCGCG 5700
GCGAGGAAAT GCTGGCACGT CCCAAACGGG ACCTTGGAGA GGGGCGACGG GTGAGAAAAC 5760
TTGAGGACGT AGTGTTGGCG AGGGTCGGGC CTGATCGCGT TCTGGGCATG GGCGCCCCAG 5820
AGCATAAAGA CCAGGCCCGG GCGGCGCGCG GCCAGCCGTC GGACCACCCC GCCCACAAAG 5880
CGGTCCCATC CAAGCTTGGA GTGGGACGCC GCCGCCCCGC GCTTGACGGT CAGGGTCGTG 5940
TTCAACAACA GCACGCCGTC GCGAGCCCAC TTTTCCAGGC AGCCGCGGCC GCTCATGCGC 6000
GCGTCGGGGT AACAATTTTT TACCGCCGCC AGCACGTTCC GTAGACTCGG AGGCACCGGC 6060
ACATCCGCAC GCACGCTAAA CGCCAGGCCG TGCGCCTGGC CGGGATGGTG GTACGGGTCC 6120
TGCCCGATGA TAACCACGCG CACGTCGTCG GGGGTACAAT ACCGCGTCCA GGAGAACACA 6180
TCCTCCCGCG GCGGCAGCAC CTCTTCGGTC TGGCACCGAC GGTCATACTC CGCGAGGAGG 6240
CGCGCGGTTA GGGGGTTCGC GAGCTCCGGC TCCAACAGGG GCCGCCAGGC GTCGTCGATC 6300
AGAAACGCGC GCCGAAACGC GGCCCAGTCC AGGGGCACGG AAGTCGGCAG GGGCGGCGTC 6360
GCATCGCCCG ACAGCCCCAG GGGCTGTTCG GGGGTTGTAG ATGCGGAAAA CATCACGCCC 6420
GGCGGACAGC CTCTAGGGCG ACGACGCTGG ACAGCCGACC CGGCCGCTAG ACGGGTCGGA 6480
TATCCTGCTC CCGACCCAGG GTGGCTTGCG ATGCAGACGT GGCTAGTTGC GTCGGAGGCG 6540
AGTATGCCGG CGCCGACCTC GCGTCGGGGA GACCCGCCGT GGGGGGGCGT TCGAAAGGGC 6600
GAGGACGGGC GGCTGGGTGG CGAGGGGCTT CGACTGCGAG CTCGCTTCAT CGTCCGACGA 6660
CACAGGCGGT TCCCACGTTG ACGTGGTGTT GGCAGGCCGT AAATCGCGTC GCCCGACGCA 6720
GCGGCGAGTG CGTGAGTAGT CAAAGTTTAC ACACCCGGCC CTGACGGGTG CGTGGCTGAC 6780
GGCCTGTTTT CGACTGCCCA ACGCATCGCG TATCTCTTTA TAAAGGGCCC GGCGCGTCGT 6840
TGTTTCCTGG GTGTCGGCCG GAAACACAGA GTGACTCAAG TCCTCCAAAA ATCCCCCCGC 6900
AAAGAGAAAG GGGTTAACCA GATACGCCCT CTGGGCGTGC CTATCCCACA AAAACGTGTC 6960
CAACCCCGGG CAGTGATAGC GAAGAAATAT TCCGTCTATG CGGGCATAGT CAATAACGGA 7020
CGGGGCCTCG TAGCGCCAAG AAACATCGTC CGCGGGGGTC CGCATGCAAG GCACTCTTAG 7080
TATGTCCCCC ACCTCTTTGG CAATAACACT ACGAAGAACA TATTCGGTTG CCTGTGACCC 7140
ACCCCACGCC CCCCAGGGTC CCATAACGAC AAGCCCAAAC AGACAGACGA ACCCCATAGC 7200
GAGCGGACAG TGTAACCGGT AAGCCCCCTT GTTCCCGCAT AAAAAACGTC CAAACAAGAC 7260
AACCGCGAGC AACCGAATCA CGCGGGTCCA ATATGCCCAT TCCCGCGCTT TCTACCGCTT 7320
TATATATCCC CCGTGTCCTC CCCTCCCCCG CGTCCTCCCA TCCCCCGCGT CCTCCCTTCC 7380
CCCGCGTCCT CCCATCCCCC GCGTCCTCCC CTCCCCTGCG CACACGTGAT AGGTTTTGGG 7440
AACCCGAGGG GCGACGCGGG GAAAGCGCGC CCCCGCCCGG CCGCCGAGCG CCCCCGCCCG 7500
GCAGCCGAGC GCCCCCGCCC GGCCACCGCG AGCCCCCGCC CGGCCGCCCG GGTCGCGCCG 7560
GCGCCCCCTC CCGGCGCTTC CGGGGCCTTT CTGTCGTTCC CCGCCGGGAC CCCGGCCCCG 7620
CCCCACCGCC CCGCCCGGCA GGGGGGCCCC GGCGCCGCGC AAAACACACA GACGAACACA 7680
CGGTGGCGAT CTTTTCTTTA CTTCGGCAGA CCAGCGAGCC CCGGCCCCGG CCCGCGCCCC 7740
GCCGCCACAC CCACGGCACC CCCCCCGCCG CCCACCCCGG GGTCCACACA GGAGCGCGCG 7800
GGCGGCAAAA ACGCGGGCGT TTTTTTTTTT TTTTTTTCCC CTTTTTCTCC TCTTTTTCTC 7860
CCCCTCTTTC TTTTCCTTCC CCTTTTTCTT CTTCCCTCTC TCTTTCTTTC TCCTTCTCTC 7920
TCTTCTCTTT CTCTTTCTCT TTCTTCCTCT CTCTTTTCTT CTCCTCCTCC TATCCTCTTA 7980
TCTGTCCACT TCTCCTTCTT TTTTCTTGCC TGCTGTTTCT CTTTTTCTTT CTCTTTCTTC 8040
TATTCTCTGC CTCTCTCCTT CTTACTTCTC TTCCTCTCTC CTATCTACTG TCACTCTATC 8100
TCTTTCCCTT CTTTTATATG TGTCGTATTC ATTCTTTCTC TACTCACGTT ACTCTATCTA 8160
TCTTCTTCAT CATCTCTCTC TCCTACTCTC TCTCCCTCCT TACTACTTCT CTCTTCCTCT 8220
TTTCCTTTAA TATATTTTCT TTCTTTACTC ATCCCTTTTT TCACTTTACT ATTCCATGTG 8280
TATCTCTTCT CTCTTCCTTT TTTCTCTCTC TTTATTTCTC ACTTTCTCCC TCCTCCACTC 8340
TTCTCATCTT TTTTTCTCTA CTCAACATAT CTCTCATCTC TCTCTCTCTA TTTACTACTC 8400
CTACTTCTTT TTTCTCTCTC TCATATTCTA TTTTCACTTT TCTTCCTCCT TCTCATATCT 8460
CACCTCTCGT TCTCTCCCTC TTTTCATTTC ATCTCTTATA TCCTCTCTAT CTTTATTCCT 8520
CCTCTCTCAT TCTCTCTCCT CTGCTCACAC TTACTTACTC CTCCTCTCCA ATTTGTCTCT 8580
TCGTCTCTCA CTCTTCTCCT TTCTCATTTC TTTCTATACT CTCTCTATCT TCGTATCTCT 8640
TTTATCTATC TACTTCTCTA TTCTATCTCC TCTATGCTTG ACTCTCGTTA CATTCTCATC 8700
CCTTCGCTCC TTATCAATTA TCTTCGCACT GCTTANGTAT TCTCTCTCTC TCTCCTCTCT 8760
CTCTCATTTT CTCCTCTCCT GCTTCTCTCT CTTTTCTTGT GGCTCTATCC ACTTCTTCAT 8820
TATATTCTCT CACTTATTTT CTTCTTTCTC TCTCACTGCT CTCTCACCTC TCTCTCTACT 8880
TTCTCTCTTC CTCTGTTTTC TCTCCTCTCT TTGTCTATTC ATCCCCTCTT AACGTTCTTC 8940
TTCTCTTCCG TC 8953
(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 594 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Met Val Leu Met Gly Arg Leu Arg Asn Ala Pro Glu Ser Leu Thr Tyr
1 5 10 15
Met Phe Cys Ala Ala Ile Arg Val Ala Pro Val Thr Thr Gln Ser Arg
20 25 30
Thr Ser Leu Arg Val Cys Thr His Val Leu Phe Pro Asp Pro Ala Leu
35 40 45
Pro Val Met Arg Tyr Ala Ala Asn Gly Asn Ser Arg Ser Gly Arg Pro
50 55 60
Val Gly Thr Ser Lys Ala Ala Thr Ser Arg Asn His Cys Arg Arg Gly
65 70 75 80
Thr Cys Val Thr Ser Ser Cys Cys Cys Glu Ser Ser Arg Met Arg Ala
85 90 95
Met Ile Gly Trp Thr Pro Cys Met Asp Val Lys Phe Lys Asn Ala Ser
100 105 110
Ser Leu Asn Arg Thr Ala Gly Leu Ala Pro Gly Cys Cys Gly Gly Gly
115 120 125
Pro Gly Ala Arg Thr Ser Arg Glu Pro Ser Pro Pro Asp Ala Ala Met
130 135 140
Ala Ala Gln Arg Ala Arg Ala Pro Ala Met Arg Thr Arg Gly Gly Asp
145 150 155 160
Ala Ala Leu Cys Ala Pro Glu Asp Gly Trp Val Lys Val His Pro Thr
165 170 175
Pro Gly Thr Met Leu Phe Arg Glu Ile Leu Leu Gly Gln Met Gly Tyr
180 185 190
Thr Glu Gly Gln Gly Val Tyr Asn Val Val Arg Ser Ser Glu Ala Ala
195 200 205
Thr Arg Gln Leu Gln Ala Ala Ile Phe His Ala Leu Leu Asn Ala Thr
210 215 220
Tyr Asp Leu Glu Glu Asp Trp Arg Arg His Val Val Arg Leu Gln Pro
225 230 235 240
Gln Arg Leu Val Arg Arg Tyr Arg Asn Ala Arg Glu Gly Asp Ile Ala
245 250 255
Gly Val Ala Glu Arg Val Phe Asp Thr Trp Arg Cys Thr Leu Arg Thr
260 265 270
Thr Leu Leu Asp Phe Ala His Gly Val Val Asn Cys Phe Ala Pro Gly
275 280 285
Gly Pro Ser Gly Pro Thr Ser Phe Pro Lys Tyr Ile Asp Trp Leu Thr
290 295 300
Cys Leu Gly Leu Val Pro Ile Leu Arg Lys Thr Arg Glu Gly Glu Ala
305 310 315 320
Thr Gln Arg Leu Gly Ala Phe Leu Arg Gln His Thr Leu Pro Arg Gln
325 330 335
Leu Ala Thr Val Ala Gly Ala Ala Glu Arg Ala Gly Pro Gly Leu Leu
340 345 350
Glu Leu Ala Val Ala Phe Asp Ser Thr Arg Met Ala Glu Tyr Asp Arg
355 360 365
Val His Ile Tyr Tyr Asn His Arg Arg Gly Glu Trp Leu Val Arg Asp
370 375 380
Pro Val Ser Gly Gln Arg Gly Glu Cys Leu Val Leu Cys Pro Pro Leu
385 390 395 400
Trp Thr Gly Asp Arg Leu Val Phe Asp Ser Pro Val Gln Arg Leu Cys
405 410 415
Pro Glu Ile Val Ala Cys His Ala Leu Arg Glu His Ala His Ile Cys
420 425 430
Arg Leu Arg Asn Thr Ala Ser Val Lys Val Leu Leu Gly Arg Lys Ser
435 440 445
Asp Ser Gly Val Ala Gly Ala Ala Arg Val Val Asn Lys Ala Leu Gly
450 455 460
Glu Asp Asp Glu Thr Lys Ala Gly Ser Ala Ala Ser Cys Leu Val Arg
465 470 475 480
Leu Ile Ile Asn Met Lys Gly Met Arg His Val Gly Asp Ile Asn Asp
485 490 495
Thr Val Arg Ala Tyr Leu Asp Glu Ala Gly Gly His Leu Ile Asp Thr
500 505 510
Pro Ala Val Asp His Thr Leu Pro Gly Phe Gly Lys Gly Gly Thr Gly
515 520 525
Arg Gly Ser Ala Ala Gln Asp Pro Gly Ala Arg Pro Gln Gln Leu Arg
530 535 540
Gln Ala Phe Gln Thr Ala Val Val Asn Asn Ile Asn Gly Met Leu Glu
545 550 555 560
Gly Tyr Ile Asn Asn Leu Phe Gly Thr Ile Glu Arg Leu Arg Glu Thr
565 570 575
Asn Ala Gly Leu Ala Thr Gln Leu Gln Arg Gly Ser Ser Arg Ser Thr
580 585 590
Ala Xaa
(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 877 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
Met Ala Ala Ser Gly Gly Glu Gly Ser Arg Asp Val Arg Ala Pro Gly
1 5 10 15
Pro Pro Pro Gln Gln Pro Gly Ala Arg Pro Ala Val Arg Phe Arg Asp
20 25 30
Glu Ala Phe Leu Asn Phe Thr Ser Met His Gly Val Gln Pro Ile Ile
35 40 45
Ala Arg Ile Arg Glu Leu Ser Gln Gln Gln Leu Asp Val Thr Gln Val
50 55 60
Pro Arg Leu Gln Trp Phe Arg Asp Val Ala Ala Leu Glu Val Pro Thr
65 70 75 80
Gly Leu Pro Leu Arg Glu Phe Pro Phe Ala Ala Tyr Leu Ile Thr Gly
85 90 95
Asn Ala Gly Ser Gly Lys Ser Thr Cys Val Gln Thr Leu Asn Glu Val
100 105 110
Leu Asp Cys Val Val Thr Gly Ala Thr Arg Ile Ala Ala Gln Asn Met
115 120 125
Tyr Val Lys Leu Ser Gly Ala Phe Leu Ser Arg Pro Ile Asn Thr Ile
130 135 140
Phe His Glu Phe Gly Phe Arg Gly Asn His Val Gln Ala Gln Leu Gly
145 150 155 160
Gln His Pro Tyr Thr Leu Ala Ser Ser Pro Ala Ser Leu Glu Asp Leu
165 170 175
Gln Arg Arg Asp Leu Thr Tyr Tyr Trp Glu Val Ile Leu Asp Ile Thr
180 185 190
Lys Arg Ala Ala His Gly Gly Glu Asp Ala Arg Asn Glu Phe His Ala
195 200 205
Leu Thr Ala Leu Glu Gln Thr Leu Gly Leu Gly Gln Gly Ala Leu Thr
210 215 220
Arg Leu Ala Ser Val Thr His Gly Ala Leu Pro Ala Phe Thr Arg Ser
225 230 235 240
Asn Ile Ile Val Ile Asp Glu Ala Gly Leu Leu Gly Arg His Leu Leu
245 250 255
Thr Thr Val Val Tyr Cys Trp Trp Met Ile Asn Ala Leu Tyr His Thr
260 265 270
Pro Gln Tyr Ala Gly Arg Leu Arg Pro Val Leu Val Cys Val Gly Ser
275 280 285
Pro Thr Gln Thr Ala Ser Leu Glu Ser Thr Phe Glu His Gln Lys Leu
290 295 300
Arg Cys Ser Val Arg Gln Ser Glu Asn Val Leu Thr Tyr Leu Ile Cys
305 310 315 320
Asn Arg Thr Leu Arg Glu Tyr Thr Arg Leu Ser His Ser Trp Ala Ile
325 330 335
Phe Ile Asn Asn Lys Arg Cys Val Glu His Glu Phe Gly Asn Leu Met
340 345 350
Lys Val Leu Glu Tyr Gly Leu Pro Ile Thr Glu Glu His Met Gln Phe
355 360 365
Val Asp Arg Phe Val Val Pro Glu Ser Tyr Ile Thr Asn Pro Ala Asn
370 375 380
Leu Pro Gly Trp Thr Arg Leu Phe Ser Ser His Lys Glu Val Ser Ala
385 390 395 400
Tyr Met Ala Lys Leu His Ala Tyr Leu Lys Val Thr Arg Glu Gly Glu
405 410 415
Phe Val Val Phe Thr Leu Pro Val Leu Thr Phe Val Ser Val Lys Glu
420 425 430
Phe Asp Lys Tyr Arg Arg Leu Thr Gln Gln Pro Thr Leu Thr Met Glu
435 440 445
Lys Trp Ile Thr Ala Asn Ala Ser Arg Ile Thr Asn Tyr Ser Gln Ser
450 455 460
Gln Asp Gln Asp Ala Gly His Val Arg Cys Glu Val His Ser Lys Gln
465 470 475 480
Gln Leu Val Val Ala Arg Asn Asp Ile Thr Tyr Val Leu Asn Ser Gln
485 490 495
Val Ala Val Thr Ala Arg Leu Arg Lys Met Val Phe Gly Phe Asp Gly
500 505 510
Thr Phe Arg Thr Phe Glu Ala Val Leu Arg Asp Asp Ser Phe Val Lys
515 520 525
Thr Gln Gly Glu Thr Ser Val Glu Phe Ala Tyr Arg Phe Leu Ser Arg
530 535 540
Leu Met Phe Gly Gly Leu Ile His Phe Tyr Asn Phe Leu Gln Arg Pro
545 550 555 560
Gly Leu Asp Ala Thr Gln Arg Thr Leu Ala Tyr Gly Arg Leu Gly Glu
565 570 575
Leu Thr Ala Glu Leu Leu Ser Leu Arg Arg Asp Ala Ala Gly Ala Ser
580 585 590
Ala Thr Arg Ala Ala Asp Thr Ser Asp Arg Ser Pro Gly Glu Arg Ala
595 600 605
Phe Asn Phe Lys His Leu Gly Pro Arg Asp Gly Gly Pro Asp Asp Phe
610 615 620
Pro Asp Asp Asp Leu Asp Val Ile Phe Ala Gly Leu Asp Glu Gln Gln
625 630 635 640
Leu Asp Val Phe Tyr Cys His Tyr Ala Leu Glu Glu Pro Glu Thr Thr
645 650 655
Ala Ala Val His Ala Gln Phe Gly Leu Leu Lys Arg Ala Phe Leu Gly
660 665 670
Arg Tyr Leu Ile Leu Arg Glu Leu Phe Gly Glu Val Phe Glu Ser Ala
675 680 685
Pro Phe Ser Thr Tyr Val Asp Asn Val Ile Phe Arg Gly Cys Glu Leu
690 695 700
Leu Thr Gly Ser Pro Arg Gly Gly Leu Met Ser Val Gln Thr Asp Asn
705 710 715 720
Tyr Thr Leu Met Gly Tyr Thr Tyr Thr Arg Val Phe Ala Phe Ala Glu
725 730 735
Glu Leu Arg Arg Arg His Ala Thr Ala Gly Val Ala Glu Phe Leu Glu
740 745 750
Glu Ser Pro Leu Pro Tyr Ile Val Leu Arg Asp Gln His Gly Phe Met
755 760 765
Ser Val Val Asn Thr Asn Ile Ser Glu Phe Val Glu Ser Ile Asp Ser
770 775 780
Thr Glu Leu Ala Met Ala Ile Asn Ala Asp Tyr Gly Ile Ser Ser Lys
785 790 795 800
Leu Ala Met Thr Ile Thr Arg Ser Gln Gly Leu Ser Leu Asp Lys Val
805 810 815
Ala Ile Cys Phe Thr Pro Gly Asn Leu Arg Leu Asn Ser Ala Tyr Val
820 825 830
Ala Met Ser Arg Thr Thr Ser Ser Glu Phe Leu His Met Asn Leu Asn
835 840 845
Pro Leu Arg Glu Arg His Glu Arg Asp Asp Val Ile Ser Glu His Ile
850 855 860
Leu Ser Ala Leu Arg Asp Pro Asn Val Val Ile Val Tyr
865 870 875
(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 199 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
Met Gly Asn Pro Gln Thr Thr Ile Ala Tyr Ser Leu His His Pro Arg
1 5 10 15
Ala Ser Leu Thr Ser Ala Leu Pro Asp Ala Ala Gln Val Val His Val
20 25 30
Phe Glu Ser Gly Thr Arg Ala Val Leu Thr Arg Gly Arg Ala Arg Gln
35 40 45
Asp Arg Leu Pro Arg Gly Gly Val Val Ile Gln His Thr Pro Ile Gly
50 55 60
Leu Leu Val Ile Ile Asp Cys Arg Ala Glu Phe Cys Ala Tyr Arg Phe
65 70 75 80
Ile Gly Arg Ala Ser Thr Gln Arg Leu Glu Arg Trp Trp Asp Ala His
85 90 95
Met Tyr Ala Tyr Pro Phe Asp Ser Trp Val Ser Ser Ser His Gly Glu
100 105 110
Ser Val Arg Ser Ala Thr Ala Gly Ile Leu Thr Val Val Trp Thr Pro
115 120 125
Asp Thr Ile Tyr Ile Thr Ala Thr Ile Tyr Gly Thr Ala Pro Glu Ala
130 135 140
Arg Cys Asp Asn Ala Pro Leu Asp Val Arg Pro Thr Thr Pro Pro Ala
145 150 155 160
Pro Val Ser Pro Thr Ala Gly Glu Phe Pro Ala Asn Thr Thr Asp Leu
165 170 175
Leu Val Glu Val Leu Arg Glu Ile Gln Ile Ser Pro Thr Leu Asp Asp
180 185 190
Ala Asp Pro Thr Pro Gly Thr
195
(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 172 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
Val Gly Pro Leu Asp Gly Glu Pro Asp Arg Asp Ala Ile Ser Pro Leu
1 5 10 15
Thr Ser Ser Val Ala Gly Asp Pro Pro Gly Ala Asp Gly Pro Tyr Val
20 25 30
Thr Phe Asp Thr Leu Phe Met Val Ser Ser Ile Asp Glu Leu Gly Arg
35 40 45
Arg Gln Leu Thr Asp Thr Ile Arg Lys Asp Leu Arg Leu Ser Leu Ala
50 55 60
Lys Phe Ser Ile Ala Cys Thr Lys Thr Ser Ser Phe Ser Gly Thr Ala
65 70 75 80
Ala Arg Gln Arg Lys Arg Gly Ala Pro Pro Gln Arg Thr Cys Val Pro
85 90 95
Arg Ser Asn Lys Ser Leu Gln Met Phe Val Leu Cys Lys Arg Ala Asn
100 105 110
Ala Ala Gln Val Arg Glu Gln Leu Arg Ala Val Ile Arg Ser Arg Lys
115 120 125
Pro Arg Lys Tyr Tyr Thr Arg Ser Ser Asp Gly Arg Leu Cys Pro Ala
130 135 140
Val Pro Val Phe Val His Glu Phe Val Ser Ser Glu Pro Met Arg Leu
145 150 155 160
His Arg Asp Asn Val Met Leu Ser Thr Glu Pro Asp
165 170
(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 334 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Met Lys Arg Ala Arg Ser Arg Ser Pro Ser Pro Pro Ser Arg Pro Ser
1 5 10 15
Ser Pro Phe Arg Thr Pro Pro His Gly Gly Ser Pro Arg Arg Glu Val
20 25 30
Gly Ala Gly Ile Leu Ala Ser Asp Ala Thr Ser His Val Cys Ile Ala
35 40 45
Ser His Pro Gly Ser Gly Ala Gly Tyr Pro Thr Arg Leu Ala Ala Gly
50 55 60
Ser Ala Val Gln Arg Arg Arg Pro Arg Gly Cys Pro Pro Gly Val Met
65 70 75 80
Phe Ser Ala Ser Thr Thr Pro Glu Gln Pro Leu Gly Leu Ser Gly Asp
85 90 95
Ala Thr Pro Pro Leu Pro Thr Ser Val Pro Leu Asp Trp Ala Ala Phe
100 105 110
Arg Arg Ala Phe Leu Ile Asp Asp Ala Trp Arg Pro Leu Leu Glu Pro
115 120 125
Glu Leu Ala Asn Pro Leu Thr Ala Arg Leu Leu Ala Glu Tyr Asp Arg
130 135 140
Arg Cys Gln Thr Glu Glu Val Leu Pro Pro Arg Glu Asp Val Phe Ser
145 150 155 160
Trp Thr Arg Tyr Cys Thr Pro Asp Asp Val Arg Val Val Ile Ile Gly
165 170 175
Gln Asp Pro Tyr His His Pro Gly Gln Ala His Gly Leu Ala Phe Ser
180 185 190
Val Arg Ala Asp Val Pro Val Pro Pro Ser Leu Arg Asn Val Leu Ala
195 200 205
Ala Val Lys Asn Cys Tyr Pro Asp Ala Arg Met Ser Gly Arg Gly Cys
210 215 220
Leu Glu Lys Trp Ala Arg Asp Gly Val Leu Leu Leu Asn Thr Thr Leu
225 230 235 240
Thr Val Lys Arg Gly Ala Ala Ala Ser His Ser Lys Leu Gly Trp Asp
245 250 255
Arg Phe Val Gly Gly Val Val Arg Arg Leu Ala Ala Arg Arg Pro Gly
260 265 270
Leu Val Phe Met Leu Trp Gly Ala His Ala Gln Asn Ala Ile Arg Pro
275 280 285
Asp Pro Arg Gln His Tyr Val Leu Lys Phe Ser His Pro Ser Pro Leu
290 295 300
Ser Lys Val Pro Phe Gly Thr Cys Gln His Phe Leu Ala Ala Asn Arg
305 310 315 320
Tyr Leu Glu Thr Arg Asp Ile Met Pro Ile Asp Trp Ser Val
325 330
(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 183 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
Val Pro Cys Met Arg Thr Pro Ala Asp Asp Val Ser Trp Arg Tyr Glu
1 5 10 15
Ala Pro Ser Val Ile Asp Tyr Ala Arg Ile Asp Gly Ile Phe Leu Arg
20 25 30
Tyr His Cys Pro Gly Leu Asp Thr Phe Leu Trp Asp Arg His Ala Gln
35 40 45
Arg Ala Tyr Leu Val Asn Pro Phe Leu Phe Ala Gly Gly Phe Leu Glu
50 55 60
Asp Leu Ser His Ser Val Phe Pro Ala Asp Thr Gln Glu Thr Thr Thr
65 70 75 80
Arg Arg Ala Leu Tyr Lys Glu Ile Arg Asp Ala Leu Gly Ser Arg Lys
85 90 95
Gln Ala Val Ser His Ala Pro Val Arg Ala Gly Cys Val Asn Phe Asp
100 105 110
Tyr Ser Arg Thr Arg Arg Cys Val Gly Arg Arg Asp Leu Arg Pro Ala
115 120 125
Asn Thr Thr Ser Thr Trp Glu Pro Pro Val Ser Ser Asp Asp Glu Ala
130 135 140
Ser Ser Gln Ser Lys Pro Leu Ala Thr Gln Pro Pro Val Leu Ala Leu
145 150 155 160
Ser Asn Ala Pro Pro Arg Arg Val Ser Pro Thr Arg Gly Arg Arg Arg
165 170 175
His Thr Arg Leu Arg Arg Asn
180
(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9218 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CCTCCGGACG TGCGATCGGA TCCCGCGAGT CGAAATCCCA CACAGCAGAC CCGTGGGTGT 60
GCTAGATCGA ACGAGCGGCA GGATCGCGTG CTGGCCCCTT GATACGATCT CGTCGACCGG 120
GGACTCCCCT CTACCCCCAC CCAACCAGCG CGCCGGCGCT TAGGGTGTGA CCCCCCCCAT 180
GGCATCCGGG GTTTCCCCGG CCCACCCCCA AACCCCGGTT GGGGCGGGCA GCCGAGACCT 240
TTCCTTAAAA GGCACCCCAT CCGACGGCAT GCAGCCCAGA GGAGCGGACA CGCTTGAAGG 300
GCACTCGCTT CCGACCGACG GGCCCCCGCA CCGGGGCGGC GACCATGATC CGGCGGCGGG 360
GAAACGTGGA GATTCGGGTC TACTACGAGT CTGTGCGGCC CTCTCGATCC CGAAGCCATC 420
TGAAGCCGTC CGACCATCAA GAATTCCCAG GGCACCACGT GTCCCCAGGG AGCCCCGGGT 480
TCCCCGAGAG CCCAGGGAAC CGCGAGTTCC ACGATCTCCC AGAGAACCCA GGGTCCCGCG 540
CATACCCAGG GACCCGCGAC CCCCACGACC CCCACGGGTG CCCAGGGAGC CTAGACCCCC 600
ACGGGAACCC CGCGCAACCC GCGGGCTTGC CTAGCCCGGT CCCCTACGCC CCCCTCGGCA 660
GCCCGGACCC CTCATCGCCG CGCCAACGCA CGTACGTTCT GCCCCGCGTC GGGATCCGTA 720
ACGCGCCCGC GTCCGACACC CGGGCCCCAA AGCGTGCCCA CTCGCGGCAC CGCGCGGACC 780
GGCCCCCGGA GTCCCCCGGC TCCGAGTTGT ACCCTCTCAA CGCCCAGGCC CTGGCGCACC 840
TGCAGATGCT GCCCGCGGAC CACCGGGCCT TTTTTCGGAC GGTGATCGAG GTGTCCCGCC 900
TGTGTGCTCT CAACACCCAC GACCCACCGC CCCCGCTGGC GGGAGCCAGG GTCGGACAGG 960
AGGCGCAGCT GGTTCATACC CAATGGCTTC GGGCCAACAG GGAGTCCTCG CCGCTGTGGC 1020
CCTGGCGGAC GGCCGCCATG AATTTTATCG CCGCGGCTGC GCCCTGCGTC CAAACACATC 1080
GCCATATGCA CGACCTGCTG ATGGCATGCG CCTTCTGGTG CTGTTTGGCG CACGCGTCGA 1140
CGTGTTCCTA CGCGGGGTTA TATTCGGCAC ACTGCCAGCA TTTGTTTCGT GCGTTTGGGT 1200
GCGGACCCCC GGTCCTGACC ACGTCCCGGG GACAGGGTGG TTGGTGTAAT TAATAATAAA 1260
ATCGTGAAAA TTGAAATCGC TTTGTGTGTT GCTGCGGGGA CGGGGGCAAA TGCGTCGTGA 1320
CTCTAGAACG CCAGATGTGG GGTGCGGATG GGGAAATGTA TGGGTCCTTC GTCTGGAGCC 1380
CGTACCCGGC AGAGAGATTT CCCCAGCACG GAGGAACTGG GGTACTGCAC TGCCCCCCTC 1440
CTGGGGGGGG GGGGGGCGAG AGGTCAATAG ATTTCCCCAA GAGACTTCCC TAACACGGAG 1500
GAGCCGGGAG AGTTCAATAG ATTTCCCAAA CACTGAGGAA CTGGGGTACT GCACTGCCCC 1560
CTCCTGGGGG GGGGGGCGAG AGGTCAATAG ATTTCCCCAA GAGACTTCCC TAACACGGAG 1620
GAGCCGGGAG AGTTCAATAG ATTTCCCAAA CACTGAGGAA CTGGGGTACT GCACTTGCCC 1680
CCCCCGGGGG GTGAAATTCC GAGAATTTTT TACCCTTTTT TGCATTTCCT TCCCCCCCCC 1740
CCCCAAAAAA AAAGACAACC TAGTAGACCG TAATGACAAT CAACCACTTT ATTGCAATTA 1800
ACATACGGAC GTGGGTCGCG GCGAGGGGTG GGGGCGAAGA AGGCGCCATA CATCGAGGCG 1860
TCATTTAGCG GAGCAGCCAC ACCAAAAGTG CCCCGAACCC TCCAGATAGG AGGGCCACGA 1920
CGAGACAGGC GATAACCAGC CCGACGCACC GCGTGCGCCG CCGTCGGCGC CTTAGGACCG 1980
ACTGCTGGCG GCCCATGCGC ACGAGGAAGT CGTTGGCGGC CTCGTCTTCG CTTTCCGAGT 2040
AGTAGGCTTC TGCCGGGACG GGCGAGGCCG CGGGGTAAAG CGGCACCGAC GCGCTGGAAC 2100
GCACCGAGTC TTGGTCGGCG GGCCGGGAGG TCATCGCGGA CGCGGAAGGG CGCTGGCGGA 2160
GGGCCGGAGG CGAAGGTGCG GTTGCCGTGA CTCACGATTT TTATGAGCTG CGGCGGGGCT 2220
GGCCGCCGGA CCTTTATGCG CCTCGGGCGA TTGACGTCAC GTAAAACGCA ATCCCGCACA 2280
GGACGGCCCC GAGACCCACC GCCCCCCGCA GCCAGCGCAC GGCGAGCCAG GTGACGAATT 2340
GGGAGGGGGC GTCCACGGCG TGGAGGGCCA CGGGAAAGGC CGCGGGGGAG CCGCCGCGAG 2400
GTGGTCTGCG GCACGCGGGC GCGGCGCCGC CCGCGCCGGG GGGCAGGGTC TCTGGCGGGT 2460
CCCCGCGTGC GTCCGCGATG GCAATCAGTT CATCGCCGAC GTCCGCGTCG TCGGAAGACG 2520
CCTTACCAGA GGACGGACGA ATAGGAGGCC TGGGAGTGAC GACGGCCCGG GCTTCCCGAA 2580
CCAAAGGTGG TGAGCGGGCG GCGAGATTTA CGCCCCTCGC TATGGGGGTA TACAGACGGA 2640
GCCGTTGGTG ATAAGATCTC AAAGCCGGAT CCATTTGTGG AGGGAGAGTC GGGTCTCTCC 2700
GGAGGGTCCT GCCACAGGGA CCCGTCGCGC TCCCCCTCGC TGTCCGAACT CCAGTCCGCG 2760
TACAGCTCGC TGTCCGCCAC GCGAATGTAA GTGGGGCCCG TCGCCGAGGC CCGGCTTTTA 2820
ACCGCCCGCC AGGAGCGCCT GCGCCAGCAG GTCATGCACG CCCACGCGGA CAGCCCGAGG 2880
GCGGCCAGCA ACAGGGCCGC CCCCAGCACC GCCCCGAGGC GCAGCGGGCC GCGCGCGGAG 2940
GGCGCGGGAG GGGGGGCTCT CACGTGCGGG CGGGTGGGCT CGACGGGCTC GGGCTGGCGC 3000
TGGGGGAGGT GCTGTTCCAC CACCGCGTTC CGGTACTGCG CCGCGGTGCT GATGGTCATG 3060
TGGCCCCAGG CGTGGATATG ATCGTCCACG TACACCACGC ACAGGTAGAG GCCGGCGTGC 3120
TGGGGGGAGG CGTGCTGGAA TTCCAGATTG ACGGTGGAGG CCAGCCACGC CAACCCCGGG 3180
ACCGGTTCCA TGCGAGCCTC GGCAAAACAT CGCGGCGGGG GCGTAGTCCT GGAACAGCCG 3240
GCGTAGCTGC GGACCGCCAG GCGGTACGCC CAGGAACTTA CGGCGCACGG CGCGTCGGCC 3300
GGAGATAGAC ACTCTGGAAG CTGCGGGTGA TACAGACAAG CTTCGTAGAT CCGCATCTCG 3360
GCGCACGAGG ACGGCACGTC AAACCGCATC CAGACGACGT CCATGGCGTA CGGACCGTCG 3420
TCGTGGGCTA CGGCGTGGAT GGAGACCTTC GTCTCAAACG TCTCCCCGGG GGCAAACATA 3480
ATGGCCTCCG GGGTCTCCAT ATGGACCGTT ACCCCGCGCA CGTGGGACAC CTCGGGGATA 3540
ACACGATGGT GCCTCGGGGG GGCCACGGGG GGACCAACGG GGGGGGGTTG GGGGGGGAAC 3600
GCTGACCGGC GTGCGTTCGC TCACGCCCGC GTCGTCTTCT TCGTCGTAGT CGTCGGGGGT 3660
CGGGGTCGGC ACAGGGGCGG GCTCCACGAC CAGAACCACC GACGCCACTT GGCGCGCCTC 3720
GTCGCTTAGG CCGACCACGG ACAGGGTGTA CAGACCGCTG TCCGTCTCCA GGGCCCCGTA 3780
GATGACCAGA CTCTCGTTGA CCACGGCTAC GCGATCGCGC CACGCCAACT CCGAATACAG 3840
TCCCTCGTCG CCCGCGGGGA ACGGGGGACT GTATGCTATG GCGAGCGGTT CCGGGGCGCG 3900
CATGCACGCC GCATCCACGA CCGTCTCGAG CACCCGTCGG GGGGGCCACA GCGCCACCCA 3960
CGACGGGCGC AGGGGACCGC AGGCATCCAG GGGTTCCGCG GCCCACAGTA GTTTGTGGGC 4020
CCGGGTGCGT TCCTCCGGCC CCGCGGGCGC CGGAAGCAAC ACCACGTCCT CGCCCGAGGT 4080
TACCCGTTTC CAGGACGTTC TGGGTGCTGC CGCCAGGCAC GATACGACCC AAACTCCAAC 4140
AAAAAACACC AACCCGGCCC CGCGAGCCAT GTTCGGGTGG CAGGAGCCGT CGGTCGGGGC 4200
AGATCGGAGA CTAGCTGACG GCGGCGCACC AAGTCACCCG AAGACACAGA GTCGGGGCGG 4260
CGACTCCTTA AATGCGCGGC GGGCCTCTCC GACACTACCC CCTTTATTCT TTTTCCTCCC 4320
CCCCCGGGCC CCGCCCATCC ATTACCCGCC TCCCATGCCA TCCGGGGAAT GACGAACGAT 4380
CACAAAGGGA TCCAACACAC GCATATAGGC AAATAACATC GGTTTATTGG GGGGGAAATA 4440
ACCACGATGG GGGCGGTGGG GCGGGCCTGC CGAACGGCCC GCTTGGACCT AAACCTCTTG 4500
GGGGGCCGTC GGGCCACTGC GGGGCCGAGG ACTGACGGAC AGCAGCACGA CTGGACCTGG 4560
CTCCGATTCC TCAGCTATCG ACGTTAGGGA AGGCATGGTC GTGGACGACG ACGAACGGCG 4620
TCGGGGTTTG GGGGGGGTGT TTGGGTGGGA TCGCAGCTCG GCTCCGAGGC GGGCCATGGC 4680
CGCCTCGTTG ACCGCGCAGG AAACGCCCCC GGGGTTGTAA ATCTGGCCGC GGGGGCGCCT 4740
GTATCGGCGC TGGCATCTAT GGATGAAGCA GATACAGCTG CCCAGAAACA CAAAGGCGAT 4800
GATGGACGCC GGTATGGCGA TCTGGATTAC CTGGGCTACG GTTAGCCTGT GTCTCGATTC 4860
GCTGGCCGAT CGCGTGGAAT TGGGCGGGGC TCTCTCGCCG CTCGCGGGCG CGGGCGTCCC 4920
TGTGTCCCCG GGGGCGGGGG TCGGGTCTCG GGGGGAGGAC GGGGATGTCG TTGTCCGTGG 4980
AGGGGTGGGC CGGGAGGCTC CGGGGGTGTA TACGCTCGAG GGTCCCAGGC GCGGGGCCGA 5040
AAAGGGAAGC TGCGCCGGAT CGCAGGAGCC GTAGTCCGAG CCGTTATACA CAAACGTCCC 5100
GTTGGCAGAG AGCGCCACCC CCAAAACAAA CAGGCTGGCG TTCGTCGCGC TGCCGACCCA 5160
TACGCGCAGG ACATACAGAC CGGCATAGTC GCGCGTTGCC GTTCGAACCC GCAGAAGCGG 5220
CTGCCGCGCC AGACCCAGCT CCAGGGTCGG ATAGGCGGGG CTGTGGGCGT GGTGCGTCGA 5280
GCGACACAAG GTGAACGCCA CGGCGGGGCG GCGGGGGCAT GCGGTCAGTG TGACCACGTG 5340
TACAACGCGG GGGCAGTGGT TCCCCAGGGG GTAGTGAAAC AGCTCGATGA TGCCGTCGTA 5400
GTAGTTTGTG TGGGGGACCT GGGCCCCCAC AAAATGAAGC TCCCCGAAAA CACGCAGGTC 5460
CTCTTCCACG AAGCCCTGGG GCCCCACGGC CCCGGCATCC ACGAGTGAGT CTGAGACCAG 5520
ACTGACCGTG GGGCCGCGGA CGACCAGGCC GGTGGCGCAG ACCCACAGGC CCAGGATCGC 5580
CAGGCCCTGC AGCGAGCGGC CGGGCATACC GGGATCGGAC GGGTCGAGGT GACTGTGGGC 5640
GCGGTGGCTA ATCGTCGGGA CAGCGGTGCG CGCCCCACGC TCCCGGCCTA TGAACTGTCC 5700
TAGTTTCCCT CCTTCGAGAC TCCCTTTATG CGGAGTCCAA GTCCCACCCA AATACCCCAG 5760
CCACCCTCCC ACACGGGCCC AGAGGTACAC GGGAGCGGGG ATACTCCTCT AGTAAAACAA 5820
TGGCTGGTGC GAGGGGGGCG CGTCGTCATC CCGGATGTGG GGGAGACGTA GGCGCTTGGG 5880
GGCCATCTGA GCGCGGCGGC GTACCCAAAA CGCAATACCG CCGATGACCA GCACCGCCAG 5940
GGTACTGCCG GCCAGCGCGC CGATGATCAG GCCCGGGTTG CTGGGGGCGG CGGGGGCGTG 6000
GTGCGGCGCG ACGTCCTGGA TCGACGGGAT GTGCCAGTTT GGGGGGATCT GCGAAGACAC 6060
CGTCCCGGCG GGATCCTCTA AGAGGGCCGA GTCCTCGGGG TCTTCCGGAA CGAGTTCGGG 6120
TTGCGTGGCG TTGGTGGTGT CGGACAGCTC CGGCGGCAGC AGGGTGCTGG TGTACGGGGG 6180
CTTGGGGCCG TGCCACCCGG CGATTTTTAA GCTGTAT GG GCGACGGTGC GCTGGTTTTC 6240
GGGGATAAAG CGGGGGAGCA TCCCGATGCT GTCGACCGTC ACGCCCTGTT GGTAGGCCTT 6300
CGAGGTGAGG CACGCTGCCG GGGGGATGCG CAGGGGGAGA GCGTACTTGC AGGAGGCGCG 6360
GGCCCGGTGC TCCAAAATAA ATTGTGTGAT CTCCGTCCAG TCGTTTATCT TCACTAGCCG 6420
CAGGTACGTA CCCGCGGTCT CGAAGGCGGG GGCGTGCATC AGGAATCCCA GGTTATCCTC 6480
GCTGACGGCG CTAAAGCTGT CATAGTAGCT CCAGCGGGGC TGCGTTCGGA TGGGGCAGAC 6540
CCCCAACGAC TTGTTGTAGG GGCACTCGGT GTATTCCATA ACCGTGATGG GGATAGCGCA 6600
ATTGTCTCCC ATGCGATACC AGGCGATGGT CAGGTTGTAC GTGTGCTTTC GGGCCTCGTC 6660
CGAAGCCCCG CGCACGATCT GGGGGGCCTC CGATGGGGCA TGTAGGAGCA CGCTGCGGCA 6720
GGCACGTTCC AGCACTGCGT AGTACACAGT GATCGGGATG CTGGGGGGCT GGAACGGGTC 6780
CTCCAGGCTC GGCTGAATGT GGTAAACACG CTTCACCCCG GGGGGGTCGG TCAGCTGGTC 6840
CAAAACCGGA AGGTTCTTCC CGCGAAATCG ATTGGGATCG GCCATCTTAA GCGAGGGGTC 6900
TGCTAAGGCG TATTTGGCGC AGACGACGCG GAGTCCCACC GCGACAACTA GCAGGGCCGC 6960
CGTCCCGACG CCGGAGGTCA AACGCCCCAT GCCGTGATAC GCGATGCACA CGAAAAACGG 7020
CGACTAGTCG TTCGCAATGC AGCTTATGAC CGAACACCAC ACCGACCCCG GGTTTTAAAC 7080
ACAAAGACTC TATTATACTC CTCCTCCTCG TAAAAATGGA ACCTCCCCTC TGGGGGTGAT 7140
TTGGTTGCAT ATGTGGTCGA ATCGGAGTAT GGTGGTGCGG TGGGTCCGCC ATAACCCCCC 7200
TTGTGGGATG GGTTGTGGGC TTGATGTGTT TTTAGTTTCT TTCCCCCCCC CCCCAAGTTG 7260
GTACGCTTGG GGATCGCCGG ATTATTTCGT CTTTCCCGCA CAACCCATGC CCGGCACGGG 7320
GCGTGTGGCA ACAACCAAAG TTATATTACC GACCGCTCCA TAGCTGCTGT ACCCCGGGCA 7380
CCCGACACAC AATCGGGGCG ATGGGGTGGG GGCAAGGCCA GAAAGGCGAA AAATCATGGG 7440
GCAAATTGGC CCGCGTGGGG GCAGCACCTC GCCAACTCGC GACCCAGGCG ACGCAGGACC 7500
TCGAGTAGAC ACGCCATCCC CAGAATCATG AGACACAGCC CCCCCACGAT GAGGGGGACG 7560
GCAAAGCCCC CGGAACGGAT GAGTGGGGGG TGCGTGGGGA GGCGTGCGGT CGCGTTTGTT 7620
GTATCGGACG CGGGGCCGGT GGGTGCGGCC CCAACAGCAG CACACCCGAG GATTCCCACA 7680
ATCCCCCAGG TCCGAACGGC ATACCGATCC ATTGAGACCA AAACAACAGG CACGCCCCCC 7740
GGCGGCGGTC AAGGTTTTTG TTTTTGTTCG GGGACCCGGG TGACTTCGTC TGGGGGCCTT 7800
TCTCTGTGTG GCCAAAAGTT GCGCGTCTCG AGGGCCCCGG GACACGTCTT TTAAAAGACT 7860
CTGTCACCCA CTGATACCTC CCACCCCCCG CCCCCAGTCC CACAAAACAC AACCAAACTC 7920
ACATATCCGA TTGACGTCAC AGGTTTATTG TTCTTATCGT GGCATTTGGT CGCTGTTCCC 7980
TTTCCCGTCC TTCATCGTTT CTCGCCCCCA CCCCACCCCC TAATCCCGCT CGGGTGGCAG 8040
ACATACGTAA CGCACGCTCG GGTGCGCGTA TCGCCTGCGC CCCGCCCGGC CGCGCCAAAG 8100
TTGTGCTGCC AAGGCGACCA GACAAACGAA CGCCGCCGTG TGGATGGTGG TGCTGATGAT 8160
AAAGAGGATA TCTAGAGCAG GGGAGGCCGT TAGGAACCAG AACAGGGGGA TGTGTTGGGG 8220
TGTGGGGCCC GAGGGCATGT CCTTAGCGGG AGCTTGGGCG GGGGGGCGAG GCGTGTTGGG 8280
GGCGAGCGGC CCAAGAATTC CTGGCGGGAG CGTGGGGCGG ATGGGCCCGG GGCGCGCGGG 8340
GGGTGGTTTG TTGGGGTTCG GAGTTCGGAA GGCGAGGCCG GTGGCGCTGT TGTTGTCATC 8400
GGGGGGTTCG CCGTCCCCGG CGCCCTCAAA CTCCTCGGGT CCGCCGCGAT GTTCGGGGGG 8460
TGGGGGGGCT GGCGAGCCGG GGGGAGCGTC CGCGGGTCCG TGTGGGTGCG TCTTTGGGTC 8520
CGTTGGGGGG GTACGGGCGG TGCCCCGGGT TCCGGGCGTG GCGGTGGTCG CGGCAACCGA 8580
AACGTTGGCG GCCGAGGGCC CCGGCGCGGT ACCGGGGGGC GAAGCGGTGA GGGGGGAATC 8640
GGCCGTGGGT GCGGCGGAAG CGCCCACCGG ACCCGGGGTT GCGGGTCCGG GAGGGGTTGT 8700
TTGGGGCCCC GGATTCCTGG GGCGGGGGGT CACGTGGGTA AACGTGGGCG GGGGGGTCGT 8760
GGGGGCTGGT GTGGTGGGGG GCGTTTCCGC TGCGGGGGCG CTGCTGGTGT TCGTGTGCCC 8820
GGCCCCGGGC GTTGCCGCCG CGGCGGGGAT TGGCTACTAC TCCACGGATG CATTCCCGGG 8880
CGGGGATGCA ACTGCCGTTT CCTCCGGCGT AACGGCGACC GTTGCGGCTT GTGTGGCCCT 8940
CTCGTCGGGG GGAGTATTGG TTGCGGGGGC GGTCGGTCCC CCCCTTGGGT TGACTGATGG 9000
CCCCATGGCG GTGGGGTAAA GAGGGAGGGG GGTTTTTTGG AGAGGGGAAG TTGGGGAAGG 9060
GGAGGAAGGT TTNTGGGGGA GGGGTAAGAG GGGGGGNNNG GGNGAGGGGG GGNNAAAGGG 9120
TGGGGGGGAG GGNGGGGGGG GCTGTNCCCA CGCTCCCCCC CCCCCCCCGC CCCGGTTCGC 9180
AAGCGGGCTT TCGTACCTAC ACCCAGGGCC CCCGCCT 9218
(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 296 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
Met Ile Arg Arg Arg Gly Asn Val Glu Ile Arg Val Tyr Tyr Glu Ser
1 5 10 15
Val Arg Pro Ser Arg Ser Arg Ser His Leu Lys Pro Ser Asp His Gln
20 25 30
Glu Phe Pro Gly His His Val Ser Pro Gly Ser Pro Gly Phe Pro Glu
35 40 45
Ser Pro Gly Asn Arg Glu Phe His Asp Leu Pro Glu Asn Pro Gly Ser
50 55 60
Arg Ala Tyr Pro Gly Thr Arg Asp Pro His Asp Pro His Gly Cys Pro
65 70 75 80
Gly Ser Leu Asp Pro His Gly Asn Pro Ala Gln Pro Ala Gly Leu Pro
85 90 95
Ser Pro Val Pro Tyr Ala Pro Leu Gly Ser Pro Asp Pro Ser Ser Pro
100 105 110
Arg Gln Arg Thr Tyr Val Leu Pro Arg Val Gly Ile Arg Asn Ala Pro
115 120 125
Ala Ser Asp Thr Arg Ala Pro Lys Arg Ala His Ser Arg His Arg Ala
130 135 140
Asp Arg Pro Pro Glu Ser Pro Gly Ser Glu Leu Tyr Pro Leu Asn Ala
145 150 155 160
Gln Ala His Leu Gln Met Leu Pro Ala Asp His Arg Ala Phe Phe Arg
165 170 175
Thr Val Ile Glu Val Ser Arg Leu Cys Ala Leu Asn Thr His Asp Pro
180 185 190
Pro Pro Pro Leu Ala Gly Ala Arg Val Gly Gln Glu Ala Gln Leu Val
195 200 205
His Thr Gln Trp Leu Arg Ala Asn Arg Glu Ser Ser Pro Leu Trp Pro
210 215 220
Trp Arg Thr Ala Ala Met Asn Phe Ile Ala Ala Ala Ala Pro Cys Val
225 230 235 240
Gln Thr His Met His Asp Leu Leu Met Ala Cys Ala Phe Trp Cys Cys
245 250 255
Leu Ala His Ala Ser Thr Cys Ser Tyr Ala Gly Ser Ala His Cys Gln
260 265 270
His Leu Phe Arg Ala Phe Gly Cys Gly Pro Pro Val Leu Thr Thr Ser
275 280 285
Arg Gly Gln Gly Gly Trp Cys Asn
290 295
(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 85 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
Met Thr Ser Arg Pro Ala Asp Gln Asp Ser Val Arg Ser Ser Ala Ser
1 5 10 15
Val Pro Leu Tyr Pro Ala Asp Val Pro Ala Glu Ala Tyr Tyr Ser Glu
20 25 30
Ser Glu Asp Glu Ala Ala Asn Asp Phe Leu Val Arg Met Gly Arg Gln
35 40 45
Gln Ser Val Leu Arg Arg Arg Arg Arg Arg Thr Arg Cys Val Gly Leu
50 55 60
Val Ile Ala Cys Leu Val Val Leu Ser Gly Gly Phe Gly Ala Leu Leu
65 70 75 80
Val Trp Leu Leu Arg
85
(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 41 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
Val His Ala Val Asp Ala Pro Ser Gln Phe Val Thr Trp Leu Ala Val
1 5 10 15
Arg Trp Leu Arg Gly Ala Val Gly Leu Gly Ala Val Leu Cys Gly Ile
20 25 30
Ala Phe Tyr Val Thr Ser Ile Arg Ala
35 40
(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 337 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
Val Ala Pro Pro Arg His His Arg Val Ile Pro Glu Val Ser His Val 1 5 10 15 Arg Gly Val Thr Val His Met Pro Glu Ala Ile Met Phe Ala Pro Gly 20 25 30
Glu Thr Phe Glu Thr Lys Val Ser Ile His Ala Val Ala His Asp Asp
35 40 45
Gly Pro Tyr Ala Met Asp Val Val Trp Met Arg Phe Asp Val Pro Ser 50 55 60
Ser Cys Ala Glu Met Arg Ile Tyr Glu Ala Cys Leu Tyr His Pro Gln 65 70 75 80
Leu Pro Glu Cys Leu Ser Pro Ala Asp Ala Pro Cys Ala Val Ser Ser 85 90 95 Trp Ala Tyr Arg Leu Ala Val Arg Ser Tyr Ala Gly Cys Ser Arg Thr 100 105 110
Thr Pro Pro Pro Arg Cys Phe Ala Glu Ala Arg Met Glu Pro Val Pro
115 120 125
Gly Leu Ala Trp Leu Ala Ser Thr Val Asn Leu Glu Phe Gln His Asp 130 135 140
Gln His Ala Gly Leu Cys Val Val Tyr Val Asp Asp His Ile His Ala
145 150 155 160
Trp Gly His Met Thr Ile Ser Thr Ala Ala Gln Tyr Arg Asn Ala Val
165 170 175 Val Glu Gln His Leu Pro Gln Arg Gln Pro Glu Pro Val Glu Pro Trp
180 185 190
His Val Arg Ala Pro Pro Pro Ala Pro Ser Arg Pro Leu Arg Leu Gly
195 200 205
Ala Val Leu Gly Ala Ala Leu Leu Leu Ala Ala Leu Gly Leu Ser Ala 210 215 220
Trp Ala Cys Met Thr Cys Trp Arg Arg Arg Ser Trp Arg Ala Val Lys
225 230 235 240
Ser Arg Ala Ser Ala Thr Gly Pro Thr Tyr Ile Arg Val Ala Asp Ser
245 250 255 Glu Leu Tyr Ala Asp Trp Ser Ser Asp Ser Glu Gly Glu Arg Asp Gly
260 265 270
Ser Leu Trp Gln Asp Pro Pro Glu Arg Pro Asp Ser Pro Ser Thr Asn
275 280 285
Gly Ser Gly Phe Glu Ile Leu Ser Pro Thr Ala Pro Ser Val Tyr Pro 290 295 300
His Ser Glu Gly Arg Lys Ser Arg Arg Pro Leu Thr Thr Phe Gly Ser 305 310 315 320
Gly Ser Pro Gly Arg Arg His Ser Gln Ala Ser Tyr Ser Ser Val Leu 325 330 335
Trp
(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 226 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
Met Arg Ala Gly Leu Val Phe Phe Val Gly Val Trp Val Val Ser Cys
1 5 10 15
Leu Ala Ala Ala Pro Arg Thr Ser Trp Lys Arg Val Thr Ser Gly Glu 20 25 30
Asp Val Val Leu Leu Pro Ala Pro Ala Gly Pro Glu Glu Arg Thr Arg
35 40 45
Ala His Lys Leu Leu Trp Ala Ala Glu Pro Leu Asp Ala Cys Gly Pro 50 55 60 Leu Arg Pro Ser Trp Val Trp Pro Pro Arg Arg Val Leu Glu Thr Val 65 70 75 80
Val Asp Ala Ala Cys Met Arg Ala Pro Glu Pro Leu Ala Ile Ala Tyr
85 90 95
Ser Pro Pro Phe Pro Ala Gly Asp Glu Gly Ser Glu Leu Ala Trp Arg 100 105 110
Asp Arg Val Ala Val Val Asn Glu Ser Leu Val Ile Tyr Gly Ala Leu
115 120 125
Glu Thr Asp Ser Gly Thr Leu Ser Val Val Gly Leu Ser Asp Glu Ala
130 135 140 Arg Gln Val Ala Ser Val Val Leu Val Val Glu Pro Ala Pro Val Pro
145 150 155 160
Thr Pro Thr Pro Asp Asp Tyr Asp Glu Glu Asp Asp Ala Gly Val Ser
165 170 175
Thr Pro Val Ser Val Pro Pro Pro Thr Pro Pro Arg Trp Ser Pro Arg 180 185 190
Gly Pro Pro Glu Ala Pro Ser Cys Tyr Pro Arg Gly Val Pro Arg Arg
195 200 205
Asn Gly Pro Tyr Gly Asp Pro Gly Gly His Tyr Val Cys Pro Arg Gly 210 215 220
Asp Val 225
(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 429 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
Val Tyr Leu Trp Ala Arg Val Gly Gly Trp Leu Gly Tyr Leu Gly Gly
1 5 10 15
Thr Trp Thr Pro His Lys Gly Ser Leu Glu Gly Gly Lys Leu Gly Gln 20 25 30
Phe Ile Gly Arg Glu Arg Gly Ala Arg Thr Ala Val Pro Thr Ile Ser
35 40 45
His Arg Ala His Ser His Leu Asp Pro Ser Asp Pro Gly Met Pro Gly 50 55 60 Arg Ser Leu Gln Gly Leu Ala Ile Leu Gly Leu Trp Val Cys Ala Thr 65 70 75 80
Gly Leu Val Val Arg Gly Pro Thr Val Ser Leu Val Ser Asp Ser Leu
85 90 95
Val Asp Ala Gly Ala Val Gly Pro Gln Gly Phe Val Glu Glu Asp Leu 100 105 110
Arg Val Phe Gly Glu Leu His Phe Val Gly Ala Gln Val Pro His Thr
115 120 125
Asn Tyr Tyr Asp Gly Ile Ile Glu Leu Phe His Tyr Pro Leu Gly Asn
130 135 140 His Cys Pro Arg Val Val His Val Val Thr Leu Thr Ala Cys Pro Arg
145 150 155 160
Arg Pro Ala Val Ala Phe Thr Leu Cys Arg Ser Thr His His Ala His
165 170 175
Ser Pro Ala Tyr Pro Thr Leu Glu Leu Gly Leu Ala Arg Gln Pro Leu 180 185 190
Leu Arg Val Arg Thr Ala Thr Arg Asp Tyr Ala Gly Val Leu Arg Val
195 200 205
Trp Val Gly Ser Ala Thr Asn Ala Ser Leu Phe Val Leu Gly Val Ser 210 215 220
Ala Asn Gly Thr Phe Val Tyr Asn Gly Ser Asp Tyr Gly Ser Cys Asp
225 230 235 240
Pro Ala Gln Leu Pro Phe Ser Ala Pro Arg Leu Gly Pro Ser Ser Val 245 250 255
Tyr Thr Pro Gly Ala Ser Arg Pro Thr Pro Pro Arg Thr Thr Thr Ser 260 265 270
Pro Ser Ser Pro Arg Asp Pro Thr Pro Ala Pro Gly Asp Thr Gly Thr 275 280 285
Pro Ala Pro Ala Ser Gly Glu Arg Ala Pro Pro Asn Ser Thr Arg Ser 290 295 300
Ala Ser Glu Ser Arg His Arg Leu Thr Val Ala Gln Val Ile Gln Ile
305 310 315 320
Ala Ile Pro Ala Ser Ile Ile Ala Phe Val Phe Leu Gly Ser Cys Ile 325 330 335
Cys Phe Ile His Arg Cys Gln Arg Arg Tyr Arg Arg Pro Arg Gly Gln 340 345 350
Ile Tyr Asn Pro Gly Gly Val Ser Cys Ala Val Asn Glu Ala Ala Met 355 360 365
Ala Arg Leu Gly Ala Glu Leu Arg Ser His Pro Asn Thr Pro Pro Lys 370 375 380
Pro Arg Arg Arg Ser Ser Ser Ser Thr Thr Met Pro Ser Leu Thr Ser
385 390 395 400
Ile Ala Glu Glu Ser Glu Pro Gly Pro Val Val Leu Leu Ser Val Ser 405 410 415
Pro Arg Pro Arg Ser Gly Pro Thr Ala Pro Gln Glu Val 420 425
(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 392 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
Val Cys Ile Ala Tyr His Gly Met Gly Arg Leu Thr Ser Gly Val Gly
1 5 10 15
Thr Ala Ala Leu Leu Val Val Ala Val Gly Leu Arg Val Val Cys Ala 20 25 30
Lys Tyr Ala Asp Pro Ser Leu Lys Met Ala Asp Pro Asn Arg Phe Arg
35 40 45
Gly Lys Asn Leu Pro Val Leu Asp Gln Leu Thr Asp Pro Pro Gly Val 50 55 60
Lys Arg Val Tyr His Ile Gln Pro Ser Leu Glu Asp Pro Phe Gln Pro 65 70 75 80
Pro Ser Ile Pro Ile Thr Val Tyr Tyr Ala Val Leu Glu Arg Ala Cys 85 90 95 Arg Ser Val Leu Leu His Ala Pro Ser Glu Ala Pro Gln Ile Val Arg 100 105 110
Gly Ala Ser Asp Glu Ala Arg Lys His Thr Tyr Asn Leu Thr Ile Ala
115 120 125
Trp Tyr Arg Met Gly Asp Asn Cys Ala Ile Pro Ile Thr Val Met Glu 130 135 140
Tyr Thr Glu Cys Pro Tyr Asn Lys Ser Leu Gly Val Cys Pro Ile Arg
145 150 155 160
Thr Gln Pro Arg Trp Ser Tyr Tyr Asp Ser Phe Ser Ala Val Ser Glu
165 170 175 Asp Asn Leu Gly Phe Leu Met His Ala Pro Ala Phe Glu Thr Ala Gly
180 185 190
Thr Tyr Leu Arg Leu Val Lys Ile Asn Asp Trp Thr Glu Ile Thr Gln
195 200 205
Phe Ile His Arg Ala Arg Ala Ser Cys Lys Tyr Ala Leu Pro Leu Arg 210 215 220 Ile Pro Pro Ala Ala Cys Leu Thr Ser Lys Ala Tyr Gln Gln Gly Val
225 230 235 240
Thr Val Asp Ser Ile Gly Met Leu Pro Arg Phe Ile Pro Glu Asn Gln
245 250 255 Arg Thr Val Ala Lys Leu Lys Ile Ala Gly Trp His Gly Pro Lys Pro
260 265 270
Pro Tyr Thr Ser Thr Leu Leu Pro Pro Glu Leu Ser Asp Thr Thr Asn
275 280 285
Ala Thr Gln Pro Glu Leu Val Pro Glu Asp Pro Glu Asp Ser Ala Leu 290 295 300
Leu Glu Asp Pro Ala Gly Thr Val Ser Ser Gln Ile Pro Pro Asn Trp 305 310 315 320
His Ile Pro Ser Ile Gln Asp Val Ala Pro His His Ala Pro Ala Ala 325 330 335 Pro Ser Asn Pro Gly Leu Ile Ile Gly Ala Gly Ser Thr Leu Ala Val 340 345 350
Leu Val Ile Gly Gly Ile Ala Phe Trp Val Arg Arg Arg Ala Gln Met 355 360 365 Ala Pro Lys Arg Leu Arg Leu Pro His Ile Arg Asp Asp Asp Ala Pro
370 375 380
Pro Ser His Gln Pro Leu Phe Tyr 385 390
(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Val Gly Gly Leu Cys Leu Met Ile Leu Gly Met Ala Cys Leu Leu Glu 1 5 10 15 Val Leu Arg Arg Leu Gly Arg Glu Leu Ala Arg Cys Cys Pro His Ala 20 25 30
Gly Gln Phe Ala Pro 35
(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12489 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GAAAAGGGGG AAGGTGAGGG ATAGGGAAGG AAGAGGAAGG ATAAGAAGTG AAGAAAAAGG 60
GAAGAGAAAA GATAGATATG GGGAGAGGAG GAAGAGAGGG GGTGAAGAAG GGAGAAGAGG 120
GAGAGGAGAG GTAAAGAGGG GAGAGGAGGT AGGAGTGGAA GGGAAGAAGA GAGGAAAAGG 180
GGGGGAGGGA AGAGGGGAGG AGCGGCCGAA GCCGGAATGA CAAACAGACG AAGCGACTGG 240 GGGAGATCCC CCCGCCCCCG AGGACAGCTT TTCCGGGACC TATCCCCGCC ACCGCCGTAT 300
AAGCTCGTCT CCACGGTCGA TATCCCCCAC CCCGAGACAC CCCGGAGAAC ACCGAGCGGC 360
CGACAGGCCA CGGACCCCTA TTGCCGTCGA CACACCACCA GCAATCTCCG CGGATGTGCA 420
GCGACGNGAC CACACCGCCC GAAAACATCT GANTTCCCCT ATGACCTTTC CCACCACCCT 480 CGCGCGGCGC AGGCGGCCAG CGCGAGCGCG CCCGCAAAGG TCACCACGGG AACCCAGATG 540
TTTTCGGGCC GTGCGGCCCC GTCGCCGCAG ACCCGCGGAG ATCGCTGGTG GTGCGTCGAC 600
GGTAACAGGG GTCCGTGGTC TGTTGGCCGC TCGGCGTCCT CCGGGGCGTA TCGGGGCGGG 660
GGAGACTCGA CCGCGGAGAC GACGCAAAAG CGGCGGTGGC GGGGACAGGC CTTGGAAAAG 720 CTGTCCCCGG CGGCGGTTGT CCCGAGGCCC GCCACCACCG CCGCCGTCAG GGCCGCGGCG 780
GCGCGGGGGT CAATCCACCA GTAGCGCGCA CTGCCGCCTA TCGTCAGCCC GCCGACGACA 840
ACCCCCAGGG TGCCGACAAA CAGGGGCCGC GGGGCGAGGA GGAAAAGGGG ATGGACGTCG 900
TCGGCCAGCA CCAGCAGCGC CGCGTAGGCG GCCCCGAGCG CCAGGCACTG GGTCGCCGGG 960
CCCGGAACCC CCGGAGGCGC GCCGGCCTCC CCGAGGCTCC ACAGGGCCAC GGCCGCTCCC 1020 CCGCCGACCA GCCTCAGCCA CGCGCACACA CGCGTCGGCG GCCGGGCGAA CGCGGGGGGG 1080
GCGCGGAGGA ACCCCAGGCC GGTGCTCCCG ATCACGAACG CCCCCGACAT GGCGTCCGCA 1140
TACGGGCCGC GCGACAGAAC GACAGACCCC ACGAAGCCCA GCGTGGTGGC CTGGAGCCAG 1200
AGATGCGCGA ACCCCCGAGC GATGGCACCC ACGCAGGCGA GGCGGGGGCG CCCGTCGTCC 1260
CCATCATCCC CTCCGAGGCA GTGTTCGCCG CTCCCGCCCG CGACCCCGGG GCTGTCCCCG 1320 CGACACATCC CGCACCCGGG GCCCGACGGC GCTCTCGGTG AAGCTGCGAA GGGCCCCGGG 1380
CCCGCTCTTA TAAGCGCACG CAAAACAAAA AAGGAGGGGG AAGGGGGGTG GAAAGGACGG 1440
AGGCGCAACC CGGAACCACA CACAACAGCC ATATTGGTTG GAGGGGGGGG ACACACTACC 1500
CCATTTATTT ATTTTTTTAA CACAACGCAC CCCGCGTGCC CGGGCGCGGT GAACCGTTCG 1560
GCCACCACCC GTCGCCGTCA GGGCAACCCA AAACCGTATG GGGGTCTTTG GGGACCCCGG 1620 AAGGCGGAGA GGGGGCTGGG GCTCGCGTCG CCGGTCTGGA GGTCGCGAAA GTACCACGCG 1680
TAGCGGCCGC CGTCCCCACC ACCCTCCGCC GCCTCGGGTC CG ATCTCGC GGAGAGGGGG 1740
GGCGACTCGG CTCGCGTGGG GGCGGCGGGC ACGCCCGTCT TCGGGCGCTT GGTGGCGTCA 1800
TCCGCGGGGA CAACTTCGGC CCCGGCGTGC CTCCGCTTGG TTCCCGGCGG TTCCGGGAAC 1860
GCGGGCGGGA CCCGGGCCTG GCCCTCGAGG CCTTCTTCTT CTGGGGCGCC GCGGTCGCCC 1920 GCGTCCGGCT CGGAGGAGAA GTCCTGGCTG TCGGTCGGGC CGGCCCGAGA GTCCTGGGAC 1980
GCCAAGCTCC CGGTCGAGGA ACCCGGGGTC CGGCCGAGCC AGTTCAGGCA GACCCGCTGG 2040
GGTTTTAAAA GAAACACCGC CGACACCGCG TTGGGGCCGG TGGCGGTGAC ACACACGCTG 2100
GGGACGTCGG CCGTGAGGAA GAAATTGAGG GTCCCCCCGC CGACCTGGAG CCGCCGGAGG 2160
ACCGCCCGCA TGCTGCAGTC GTCGACGACC ACCGAGAATG TGCGGTGTGT GTTTTCCCCG 2220 TAGACCGTCT TGGCGTTGGC GGCCGCCTGG CCCGCCTTCT TCAGCGCGCT GGTCAGAATC 2280
TGGACCTGGG CGCTGGTGCT GGACGACGCG CCCTCCTCGC GGGCGGCAAA GGTGACGCAG 2340
GTGCGCGCGT TAAACACGGA AAACTTGCCG TTGGGGCCGA GCTCGAACGT GGTGGGTTTG 2400
GCGGTCTCGT CCCCGACGGC GTTCACCACC TTCGTGAGCT GGGGCTTCGT GAGGCGCAGC 2460
TGGACGTCGG GGTCGCCCTG GGGGAGTAGT ACCGCGAAGC TCGTCAACTC GCGTTTCATG 2520 AGCGTCTCGC TGGCAAGCTC CACGGCCTCT CCGTCGGACG CGGTCGTCCA TATGCGCTGC 2580
ACCAGCGTGC GAAACGGGGC CTGGCCCGTG ACCGTCAGCT CCACCCGCCG CAGGTCAGGG 2640
TACTGGTTGG CGCGAAACAC GCTCAGCAGG GATCGCTTCT GGTCCACGAG AGACAGGAAC 2700
GCCGCGGTGG GTCCGCCCCA TCGATAGCGA CTGAACTGCG AATGGTCGAG GGGCAGAAAC 2760
ACCTGCTCGC CGAAAATCGC GTTATGTACA AGGATGCCTC GGTCGCCCAC GACCAGGAGC 2820 GAGTCCAAAA GGCTCGTGCG AAGCGGCGCA AAGGCCTGCA GGATCCCGTT CAGCTCGGCG 2880
CCCTGCAGCA CTATCTGGCA GGGCGGCCAG TCTTCCGTCC GCTCGCGCGG CGACGGGATC 2940
GCGTCCTCCG AAAGGGGGGC GGCGGCCGCA CCGCCGGGAA GATGAGCCAT GCCGCGACGC 3000
TCCCGGGACA ATCCGCAGAG ACGAGGCGCG TCGTGTCACC GGGCCCGGAG GCGCGGCCGT 3060 TTGTGTCGCA GGCGGAGGGG GCGGATGACG CGGACCGGAT GGGGGTTAGG GGGGCCGGGG 3120
GACCCGAGCC ACAGAGCAGT GGCTACCCGA GCCAAGGACT ACGGCGGACC CGCCGCCCTA 3180
GTTTGGTTAA ATACGCCTTC CGCTAGTTAG GCCACACCCT CTTTGAGGGC TCGGGGGAGG 3240
GGGAGGGGGG GAAGAGAGAG ATGGTCGGCC TGCACCGGCG CGCGCCGGCG GTTGCACCAA 3300 TCCGCACGTA GATGGGAAAT AAAAAAGAAT TATAAAGAGC GTGCCTTTCC CGGGATAGCG 3360
TCTTGTTGGA GCGGGGTCGT CGCCGCAGCC ACTGTACACA GGGGCGGCGG GCTTGGGTGT 3420
CCCGGACCGT CACACCTATA CAGCTCTGTA GAGAGACCTA TCCGCACCTA CAATCGTGCC 3480
GGAATGGGTC TGTTTGGCAT GATGAAGTTT GCCCAGACTC ACCATCTGGT GAAGCGCCGG 3540
GGCCTCCGGG CCCCGGAGGG GTACTTTACC CCCATCGCCG TGGACCTGTG GAATGTCATG 3600 TATACCCTGG TGGTTAAATA TCAGCGCCGC TACCCAAGTT ACGACCGCGA GGCAATCACG 3660
CTACACTGTC TCTGTAGTAT GTTACGGGTG TTTACCCAAA AGTCCCTGTT CCCCATCTTC 3720
GTGACCGATC GCGGGGTCGA GTGTACCGAG CCGGTTGTGT TCGGGGCCAA GGCGATCCTG 3780
GCCCGCACGA CGGCCCAGTG CCGCACGGAC GAGGAGGCCA GTGACGTAGA CGCCTCGCCG 3840
CCGCCTTTCC CCCATCACCG ACTCCAGGCC CAGTTTCCCC CTTTCCAACA TGCGCCGCCG 3900 CGGGCACGCC TTCGCCCCGG GGGACCGGGG GAACGCGGGC CGCCGGCCCA GGCCCGGCGG 3960
CCCCCTGGGG CGCGCCCTCG AAGCCGGCCC TGCGCCTGGC TCACCTGTTC TGTATCCGCG 4020
TTCTGCGGGC GCTGGGGTAC GCCTACATCA ACTCGGGTCA GCTGGAGGCC GACGACGCCT 4080
GCGCGAACCT CTATCATACC AACACGGTCG CGTACGTGCA TACCACGGAT ACCGATCTCC 4140
TGCTGATGGG CTGCGATATC GTGTTGGACA TCAGCACCGG CTACATTCCG ACGATTCACT 4200 GCCGCGACCT GCTGCAGTAC TTCAAGATGA GTTACCCGCA GTTCCTGGCG CTGTTCGTCC 4260
GCTGCCACAC AGACCTGCAC CCCAATAACA CCTACGCGTC CGTCGAGGAC GTGCTGCGCG 4320
AGTGTCACTG GACCGCCCCG AGCCGATCCC AGGCCCGCCG GGGGGCCCGG CGGGAGCGCG 4380
CCAACTCGCG CTCCCTGGAG AGCATGCCTA CGCTGACCGC GGCCCCGGTC GGCCTCGAGA 4440
CGCGCATCTC GTGGACCGAA ATTCTGGCCC AACAGATCGC GGGCGAGGAC GACTACGAAG 4500 AAGACCCCCC CCTCCAGCCC CCGGACGTCG CCGGTGGGCC GCGCGACGGC GCCCGGTCGT 4560
CCTCCTCGGA GATACTCACC CCGCCCGAGC TCGTGCAGGT CCCCAACGCG CAGCGGGTCG 4620
CGGAACACCG CGGCTATGTC GCCGGACGTC GCCGCCACGT CATCCACGAC GCCCCGGAGG 4680
CCCTGGACTG GCTGCCCGAT CCGATGACCA TCGCCGAGCT GGTGGAGCAC AGATACGTCA 4740
AGTACGTCAT ATCGCTTATC AGCCCCAAGG AGCGGGGACC CTGGACTCTT CTAAAAAGAC 4800 TGCCCATCTA TCAGGACCTC CGCGACGAAG ATTTAGCGCG CTCCATCGTG ACTCGGCATA 4860
TCACCGCCCC GGACATCGCC GACCGGTTTC TGGCGCAGCT GTGGGCCCAC GCGCCCCCGC 4920
CCGCGTTTTA CAAGGACGTC CTGGCTAAAT TCTGGGACGA GTAGCCGGAA CGGAGGAAAC 4980
GCGCGCCCCC ATCCCCTCCC GATGCCCGAC CTGTTAATAA TAAGAGTAAT AAAATCGTTT 5040
GTTATTATGC ATCTCGGGGT TCTGGTCGGC GCTTGATTTA TCGGTTGGAC GCGTTTCCCT 5100 TTTGGTCCTT TTCTCTGGTT TCGGGCGTTC CTTCCCTTTC CCCAGCCGCC ACCCCCCTCC 5160
CCTGCGTAAT AATCACACCG GAGACCCAAC AGTCCGTTTC GACCCCTTTA TTTCGGTTAG 5220
ACATCGCTAC AAGGGCGCCC AGACCCTCAC AGATCGTTGA CGACGGCCCC GGCGTACGAG 5280
GTGCTGCGGC ACTCGAAGAA GTTGGTGTGT TTGTCGGTGG ACATGAGGCT GAGGGGAAAG 5340
CTGGCGTCGG GGGCGGGGGC GGAATACAGG GGCTGCATAT GGATCAGGCC CAGCAGGCGA 5400 TCCGCGCTGA ATCGCACGTA GTTCTCGATG GCCGCCAGGG CCCCCGGACT CAGGATAGAG 5460
CTGTCCGTCG GGGCCTGGGA TCGGATGAAC CCGATCTCGA TATCCACCGC CTCCCGAAAC 5520
AGCCGGTACA CGCGCGCCGC CTCGGGCTTG GCGTGGCCCC CGAGGTAGTT GTTGTAGATG 5580
TAGCACGAGG CTGTCGTATG CACGGCCTCG TCGCGGCTGA TGAGGTCGTT CGACTGGCAG 5640 GTGACCCGCA GGAGGTTGTT GGTGCGCAGG TACGCGATGG CGGCGAACGA GGCGGCAAAA 5700
AAGACGCCCT CGATGAGGAT CATGAGGATG AACTTCTCCG GGATCGAGTC GCATTCCCGC 5760
ACCCGCGCCT CCAGCCAGTC CACCTTGACG CGAATGGCCG GGTGGTTGAT GGTGCGGGCC 5820
ACATAGGCGC GGCGCGCCTG GTCGTTGTTG TGAAAGAGCA CCAGCTGGAT GATGTTGTAG 5880 ACGCGCGAGT GGACGACCTC GATGCATTCC TGCTCCACGT AGTAGTGAAG AATGTCCTTC 5940
TGTTCGAAGA GGCCGGAGAG GCCGCCCAGG TTTTCCGTCA CCAGGTCGTC CGCGGCCGAC 6000
AGGAAGGCAA ACAGAAAGCG GTAGAAGCCG AGCTCGCCCT CGGAGAGCTT GGAGACGTCC 6060
TCCTCGTCCC CCACGAACAC GAGCTCGGTC TCCAGCCAGC GGTTCAGGAT GCTGAGGGAG 6120
CGAAGGTGGT TGATGTCGGG GCACTGGGAG GTGTAGAAGT ACCGCTCGGG GGTGGGGCAC 6180 ACCGGAATCG GGGCCGCCCC GGCCCCCGAC GCGTGGGTAT CTAGGGGGTC GGTGCTCGCG 6240
GGGGAGACGG CGGGATCCAT GGCGATATGC GGGACCGAGA GCGACGCCTG ACCCCGATCG 6300
GAGCGCTGTT GCTTACAGCG CGCAGCTTGT GCAGACGATG TTGTCGTCGC CGGCGAACAC 6360
CCCGCTGTTG GTCGCCTTGC GAACCTTGCA GTAGTACATC CCCGTCTTCA GGCCGCGCTT 6420
ATATGCGTGG ACGAGAAGGC GGACCAGGGT GGAGGCGGGG AGCGTCCCGT CCGCCTTCTC 6480 TGTGACATAC AGAGTCATGG ATTGGCTGTG ATCAACATAG GGGGCGCGGT CTGCACACAG 6540
GTCGATCAGC AGTTCCTGGT CGTAGTCGAA GGCCGTCTTG AACCGCCGGA GGGGGTGGGC 6600
GGGGTCCAGG CAAGGCAGGG CCTGGGCCAC AGACCACTGC TTGGCCTCGA GCCCGTCCAT 6660
CGCGTCCAGG AGCCGCTTCC CGCCGAACGT GCGCTCGAGT TCCTTCAGCA AGAGCGTGTT 6720
GGGGCGCAGC GTCTCGCCGT CCCTGGTCAC CTTGCTGAAC AGGTTGGTGA ACAGGGGGGC 6780 AAAGCCCTCG CTGACGTCCG AGATCTGGGC CGAGGCGGCG GTGGGCATGA GCGCGATGAA 6840
CTGGCTGTTG CGCAGGCCGT GTTTCATCAT GCTCTGGCGT AGCATCTCCC ACTCGCCCTC 6900
GTACCGCGGG CTGGCGTTCG AAAAGCGCTC CCAGTGAAAG CGGCCGGCCC GGTACATGCT 6960
GCGCTTAAAG TGGCTGAAGG GACGCGCCCC GCGAACGCAC AGCGCGTTAC TGGTCTTCAT 7020
GGCCGCGAGC AGCATCACCT CGGCGATGTG TGTGTTCAGG TCCCGGAACT CGGCCGACTC 7080 CAGATCCAGG CCCATCTTCA GGCACGCCGT GTGCAGGCCC TGCATGCCAA TGCCCATGGA 7140
CCGCAGGTTG TCGTGGCCGC GGGCGCACTG GGGCGTCGGC TGCAGCGTGC TGTCTATCAT 7200
GATATTAACC ATTAGCACGC ACGCCTGCAC GGCGTCGCGG AGCATGCCAA AATCGAACGT 7260
CCGCCGGGAG ACGCATCGGG CCAGATTCAC GCTGCCCAGG TTGCAGACCC CGCTGGAGCG 7320
TTTGGAGGAC GGGTGGACGA TTTCCGTGCA GAGGTTGGAG CCGGCAATGG CCGCCCCTTG 7380 CGTGTTGTAG ATGTAGTGGC GGTTTACCGC GTCCTTAAAC ATGATGAAGG GGCTTCCGGT 7440
GGTGGCCGCG CTGCGCACGA TGGCGTACGC CAGGTCCTGG ATGGGGATCG TTTCGCCGAA 7500
CCCCATGGCC TCGAGGTGCT CGTACAGCTT CTCGAACTCC TCGCCGTGAA AGTCGGCGAG 7560
CGACATGCTG GTGTCCCGGT CGAACAGGGA CCAGGTGACG TTTTCCTCGC CGTCGAGGTG 7620
GCGGATCAGG CGCTTGAAGA ACAGGTCCGG CATCCAGAGG GCGCTGAAGA TGTTGTCGCA 7680 GCGCTGGGCC TCCTCGCCGG CGAGGACGCC CTTCATTCTG AGCACGGCCC GAACGTCGCT 7740
GTGCCAGGGT TCCAGGTACA CGCACGCCCC GGTGGGGCGC GTGCTCTGTT TGTTGTGCGC 7800
CGCCACCAGG GAGTCCAGGA CCTTCAGGGC CGGCATGATG CTGGCGGTGC CGGGGCTGGC 7860
GTCGTTGAAC GCCTGCATGC ACAGCCCGAT GCCCCCGTTG CGGGCGAGGA TGGCGCTCAC 7920
GTTGCCGGTG ATGGCCCGGA GGGTGGCCTG GTTAGTGGTG GCCTGGGGGT TTACCAGGTA 7980 GCAGCTGGAC GTGTAGTAGT TGCGGGTTCC GAGGTTCAGC ATGGCGGGGG TGGACGGCAC 8040
GATCTGGTGG TCGTAGAGGC GGTGGAAAAA GAACTTGAAC ATTTCCCACC ACGACCCCTG 8100
TCGCCCCAGG GCGATGTGGC GCATGCCGCG GGTCGCCCGG CACGCCAGGA ACCCGGCGAT 8160
GCGGGTGTAC ATCTGGAAGA CGGACTCCAT GTAGTGCCCG CCGAAGCGCT TGAGGTAAAA 8220 CTCTTCGTAC TTCAGCGCCG ACTGCAGCCC CCGCTCGACG AGCGTGTTCG GGGTGCTGTG 8280
GATCAGACAG TCGTAGGGGT TCAGGGCCTG GGCCAGGATC ATTAGCTGGG CCTCGTGTTC 8340
GCGAAGCCTT TCCGTCAGCC CGAAGTCCAG GTCCACCTCC TTGGAGCGCA TCCATTCCTC 8400
AAAGGAGGCC TCCCGGGTAC GGATGCGCAG GTGAACCAGA ATCCCCAGGA TGCGATACAG 8460 GCGGGCGGAC CGCCGCACCA GGGGTTTGAA CCCGTTAACC AGCCGCGTCG CATACTCCCT 8520
CAGATGATAG GGCGTGTATG CGTTGGGGGG GACCGGGGGA AGGTCCAGGC ACAGGCGTCG 8580
CATCTCAGCG AGCGCGTAGT TCAGGAGCCC AAAGTCGTCC TCCGTGAGGC GGGGGGCGCT 8640
GCCGAAGGTT CGTGGGGGCA CGCGCTTGCT CTCCTCGCGG GCGCACCGAC AGAAGTACTC 8700
CAGCATGAGC GCGGGCTCGC GGTCGACGGC GTCCCCCAGA AACCGCGCCA CCGCCTCCGC 8760 GTTCTCGGGC GTGAGTTCTA GGGGGACTGG GTAGCCGGGG TCCGTCCCCA CAGTGGGCTG 8820
GCTGTCCAGA ACCGGCGCCA CTTCCGCCTG GGGTGCGGCG GCGTGGGCCG CGGAATCGGA 8880
GTCGGCCGAC GCGCGCGGGT CCGTCGCGGA GCCCGGCCCG GTGCCGGCGC CCAGGCCGGG 8940
GTTTCCGGGG GAGTCCCCGG GGCGCCGGGG CTTGGGAAAG GCCACGGGGG CAGGGCCGTC 9000
GCTCCATCTG CGACGAACGA CAACGTCGGG CTGCACGGAG TCGTCCGACC GCGAGTCGGA 9060 GTCGCTGTCA TCGTCGTCAG TCGCCCCTGC GGCCCAGATC GAAGAGGATC GAGACAGCGT 9120
CTCCGAACCC GAGCCCGTAT CCTCGTCCGA GGAGTCCGAG TCCTCCGTTT CGGAGTCGGA 9180
CGACGGGCCG TCTGACCATG ACTCCGCGGC CCCGACGTCC TTCTCGGCGC CGCCCCTGGC 9240
ATCGCGACGG GCGCAGCACT CGTGGCCCCA TGGAAAGGGG GGAGGAGGGG GCGGGGGGAC 9300
AGCCTGGGGT CCTTGGGGTT CGGGGGTCCT TGGGTTCCCG TGGAGGAACT CCCCGGACGT 9360 CTGGGTCCCC ACGGATGTAG TCGCGGACGG GCCCGAGGTT CCGCCGAGCG CCACGACGGC 9420
GGTTCGGCCA TCCCCGCCGG CTGCGACGTT TGAGATCGCG ACGAAGGCGC CGGTGGACGT 9480
AGCGCCCTCG AGGTCACGCA AATGACCGCG CGCCACGTCT CCGTCGATTA TCATACTGCA 9540
GTTGGAGCCG CATTGAACAA AGCTGCTGTC GCTAATGCGG TAGGCCGCGG GGCCGGGGGG 9600
ATCGCTGGAA AGCACCATCA CGCCGCTGAC TTTCCTGCAA AACACGTGGT CGCCGCCAGG 9660 GGGGGCGACC TCGGGCTCCC GGGGTTCCTG TCGTTCGGAC GGAGACCGCG CTCCGGCGAG 9720
GGCGGATGCG GCAGGGCGGT TGGCCATTGG AACCAAGGTG ACGGTGGCTC GGACGCGATG 9780
AACGGAAACA GCGGCGGGCC CGTGGGAGGC GGGGTGGGCG GTACGACCGA AAGGACGCAC 9840
CGGCCGACGA ATCACCCGGG CAGTCGCCAA CAATCTGTCG ACAGGACAGC ACCGGCCGAC 9900
AGGAACGCAA CAGGAATCGG TGGGCGAAAA CGTGAGGGCT CGCAGGAGAG TGGCGACCGT 9960 CAAGTCCCCG AAGCACCCAC CTGTTGTGGT AACCCAAACC GCCCGCGTTG GCTTTTATCC 10020
CCGGCACACA GGGGTGGGGT GCGATCGTGT GCTGAGTCAA TCTAAGGAGG ATATGCCTAT 10080
CTCCTGAGTC AGGGGGGTGT AACGTGTCCA TGAATCCCAT TTGCATGTCG GTTTCCGCCA 10140
TTTCAGAGAC ACACGCCAAA CACACCACGT CACGGATTTA ACCTGGCTTT ATTGAGACGG 10200
ACGCACGGGG CCGCGCACGG CCAAGACGGC GAGGGCGGAG TTGGTGTGTA GGAGGGGGGG 10260 GGGGGAAAGT GGGAGGGGAA GGTCGCTCGG GTTGGGCGGC CGTCGTCAAT AGACGATCAC 10320
GCGCAGGCCC GCCACCCGCC GGGGCGCCAC ACCACGCCCT CCAGGATGAC AACCGGAACG 10380
CCGACCGCCC CAAACCGCTC GGTGCGGTGC AGACAGCGGG GGCTGTTGTC TGGCATGACA 10440
CCCAGCTCTG CGAAGGCGGC GTATGTGCAG GTGCGTGCGG TAGGGCGCCC CCGCAGATCC 10500
GGCTGCCAGC GGTACAGCCT GTCGACAAAC TGGGGGTTCG TGACTTGGCT CGCCGAGCAC 10560 CGCGGGGATT GCGAGCTCGC GGCCAGGACG GCGAGCGGAA AGTCAACGTT TCGGTTCGGG 10620
GGAAGGGGCG TCATGTCCTC GGCCCCGTGA ATCGTGTTGG TTATCCGGAG GCGCCCGAAC 10680
AGCCGCTCGA GGGCCGCCTC CAGTCCGCGC GGGGGGAGCT GGCTCTTGAC CACGTACACG 10740
CAACAGAGCT CCTGGTCCCG GCACTGTCGG TAGGTGAAAA CCAAGTACAG GAACGCCGCC 10800 CCAGGGCCCC CCAGGCACAG CTCGGCGTCC AGGTCCACAA AGGCGCAGGC CGGGATGGTA 10860
ACGGCCGAAC AGGGCCTGCC CGGGTGGCCG GTGTGTGTGG CCTCCTGGGG TCCGCTGCAG 10920
ACGGCGGTGA CGCTAGCCAG CTCGTCCTGT GTGACCACGA CACTAGCCCC CCGAAAAACC 10980
GCATGACCTC GTGGGGAAAC ACGTGCGTGT GGACCATGGC GCGCAGGCAT TCGGAAAACC 11040 GGTCCAGCCG CGCCCCGGCC CGCGTCCCGC CGTAGTTGGT CGTGATGTGG GCCCGGACCG 11100
CCTCGGCGGC GTCGCGCGCA TCGTAGGCGG CGGCGCACGC GGCCACCAGA AAGTTAAACG 11160
ACACAAGGCC CGCGCGCGCG CAGCCGCTCT CGGCCCGCCC GGACCCCAGG GCGGAGGCCT 11220
CCAGGAGCTG GCCCCAGGCC TCGGCCAGTC GCTCGGTCTG CCGGCCGGGC GGAGCCCGAT 11280
GGCGGGCCAG GTGGGGGAGG TCGGTGGGGT GCCGCAGGGC CAAAAGGAGC GCCCCGGCCC 11340 GCTCCGCGTT TGGTTGGCAG AGATCCGTCA GGGTTACCTG GCGGGTCAGG TGATAGCGGG 11400
GGGCGCCGGG GACGGTCAGC CCTCCCGCGC GCCGGGCTCC CCTGAGGATC TTGTCCACGG 11460
CCTGCTCGGT ATCGTCTCGG GTAGCGCGCG TTCCCGGAGA CGACTCCGCG GGGTCGATCC 11520
CCAGCAGCCA CATCGTACTG GCCGTCCTCC TGGGGGCCCT GTCCCCCAAA AGCACCGGGC 11580
CGTCGCGGCG AGCGATGGGC GGGCGCAGGA CGCGCCGGAG CGGGGCGTGG CCCGCGAGCT 11640 CGCGCGGGCT GGTGGTGGTT TCCACGGCAC TCTCGGCCCA CGCCATCGGG GCTGTCGGGA 11700
GTGGCTTGGT TTTCATGGTT TTCCCGCCGA TCGCCAACGG GCGAGTTGTG GGGGGGGGGG 11760
TGGTGGAGGT GGGCCACCCC CCNGGCCGCC CGTANCCCTC CCCCCCCCTT CTGCCCTCCC 11820
CCCCCCCTCC TCGTGTCCTC TACCCATCTC TCTTCCCATT CCCCTCCCCC CTCTCCCTCC 11880
GTCCCTCTGC CCCTCCCTCC CTCCTCCCCG TTTCCTCTCA TTCCCTCCTT ATTCTCCCAC 11940 TATTCGTCTC TTGGCCTCTC GCTTATCCCC ATATCTCCCC TCCTGTGGTC CTATTATTCT 12000
CATCCTATCT CCTCTCCTTC TTACCATGAC AATTCTCTCC TCATCTCTCC TCTCCTTACT 12060
CCCCTCTCTC TCTCTGTCCA TGCTCTCCCC TCGTCTATCT ACCTCCACAT CTCCATTCTC 12120
TCCTCGTACT CTCTACTCCT CCCTATCATG TCATGCGTGC CCTCGCGTAC TCTTCTCTCC 12180
CCTATCTTTC CACTTCTCCC CCCCGCTCTA CTCTTCCTCT CATGCGTCGC TCTATCTTCT 12240 CTCTCCTCCT TCCGTCTATC TCTCTCCTCC CCTCTTTTCT TCACCCCTCG GTGACCCCCT 12300
CCTTGTTTAC ATCACTTTTG AAACTCCAGC TCTTGCTCTC CCCCACTGCT CCTCCTCCCC 12360
TGACCCCCCC CCCTCTCCCT TTCTCTCGTA TCTCAGTCCT CCTCTCTCCC CCTCTTTTTT 12420
CCCTCCTCTT CGTTCTCCTC CCTTCTTTTC GATCCCGCTC CTCTGTCTCC TCCTCCTTCT 12480
CTTCTCTT 12489
(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 335 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
Val Cys Pro Pro Pro Pro Thr Asn Met Ala Val Val Cys Gly Ser Gly 1 5 10 15
Leu Arg Leu Arg Pro Phe His Pro Pro Ser Pro Ser Phe Phe Val Leu
20 25 30
Arg Ala Leu Ile Arg Ala Gly Pro Gly Pro Phe Ala Asp Arg Ala Pro 35 40 45
Ser Gly Pro Gly Cys Gly Met Cys Arg Gly Asp Ser Pro Gly Val Ala
50 55 60
Gly Gly Ser Gly Glu His Cys Leu Gly Gly Asp Asp Gly Asp Asp Gly 65 70 75 80 Arg Pro Arg Leu Ala Cys Val Gly Ala Ile Arg Phe Ala His Leu Trp
85 90 95
Leu Gln Ala Thr Thr Leu Gly Phe Val Gly Ser Val Val Leu Ser Arg
100 105 110
Gly Pro Tyr Ala Asp Ala Met Ser Gly Ala Phe Val Ile Gly Ser Thr 115 120 125
Gly Leu Gly Phe Leu Arg Ala Pro Pro Ala Phe Ala Arg Pro Pro Thr
130 135 140
Arg Val Cys Ala Trp Leu Arg Leu Val Gly Gly Gly Ala Ala Val Trp 145 150 155 160 Ser Leu Gly Glu Ala Gly Ala Pro Pro Gly Val Pro Gly Pro Ala Thr
165 170 175
Gln Cys Leu Ala Leu Gly Ala Ala Tyr Ala Ala Leu Leu Val Leu Ala
180 185 190
Asp Asp Val His Pro Leu Phe Leu Leu Ala Pro Arg Pro Leu Phe Val 195 200 205
Gly Thr Leu Gly Val Val Val Gly Gly Leu Thr Ile Gly Gly Ser Ala
210 215 220
Arg Tyr Trp Trp Ile Asp Pro Arg Ala Ala Ala Ala Leu Thr Ala Ala 225 230 235 240 Val Val Ala Gly Leu Gly Thr Thr Ala Ala Gly Asp Ser Phe Ser Lys
245 250 255
Ala Cys Pro Arg His Arg Arg Phe Cys Val Val Ser Ala Val Glu Ser
260 265 270
Pro Pro Pro Arg Tyr Ala Pro Glu Asp Ala Glu Arg Pro Thr Asp His 275 280 285
Gly Pro Leu Leu Pro Ser Thr His His Gln Arg Ser Pro Arg Val Cys
290 295 300
Gly Asp Gly Ala Ala Arg Pro Glu Asn Ile Trp Val Pro Val Val Thr 305 310 315 320 Phe Ala Gly Ala Leu Ala Ala Cys Ala Arg Trp Trp Glu Arg Ser
325 330 335
(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 466 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
Met Ala His Leu Pro Gly Gly Ala Ala Ala Ala Pro Leu Ser Glu Asp
1 5 10 15
Ala Ile Pro Ser Pro Arg Glu Arg Thr Glu Asp Trp Pro Pro Cys Gln 20 25 30
Ile Val Leu Gln Gly Ala Glu Leu Asn Gly Ile Leu Gln Ala Phe Ala
35 40 45
Pro Leu Arg Thr Ser Leu Leu Asp Ser Leu Leu Val Val Gly Asp Arg 50 55 60 Gly Ile Leu Val His Asn Ala Ile Phe Gly Glu Gln Val Phe Leu Pro 65 70 75 80
Leu Asp His Ser Gln Phe Ser Arg Tyr Arg Trp Gly Gly Pro Thr Ala
85 90 95
Ala Phe Leu Ser Leu Val Asp Gln Lys Arg Ser Leu Leu Ser Val Phe 100 105 110
Arg Ala Asn Gln Tyr Pro Asp Leu Arg Arg Val Glu Leu Thr Val Thr
115 120 125
Gly Gln Ala Pro Phe Arg Thr Leu Val Gln Arg Ile Trp Thr Thr Ala
130 135 140 Ser Asp Gly Glu Ala Val Glu Leu Ala Ser Glu Thr Leu Met Lys Arg
145 150 155 160
Glu Leu Thr Ser Phe Ala Val Leu Leu Pro Gln Gly Asp Pro Asp Val
165 170 175
Gln Leu Arg Leu Thr Lys Pro Gln Leu Thr Lys Val Val Asn Ala Val 180 185 190
Gly Asp Glu Thr Ala Lys Pro Thr Thr Phe Glu Leu Gly Pro Asn Gly
195 200 205
Lys Phe Ser Val Phe Asn Ala Arg Thr Cys Val Thr Phe Ala Ala Arg 210 215 220 Glu Glu Gly Ala Ser Ser Ser Thr Ser Ala Gln Val Gln Ile Leu Thr 225 230 235 240
Ser Ala Leu Lys Lys Ala Gly Gln Ala Ala Ala Asn Ala Lys Thr Val 245 250 255 Tyr Gly Glu Asn Thr Thr Phe Ser Val Val Val Asp Asp Cys Ser Met
260 265 270
Arg Ala Val Leu Arg Arg Leu Gln Val Gly Gly Gly Thr Leu Asn Phe 275 280 285 Phe Leu Thr Ala Asp Val Pro Ser Val Cys Val Thr Ala Thr Gly Pro 290 295 300
Asn Ala Val Ser Ala Val Phe Leu Leu Lys Pro Gln Arg Val Cys Leu 305 310 315 320
Asn Trp Leu Gly Arg Thr Pro Gly Ser Ser Thr Gly Ser Leu Ala Ser 325 330 335
Gln Asp Ser Arg Ala Gly Pro Thr Asp Ser Gln Asp Phe Ser Ser Glu
340 345 350
Pro Asp Ala Gly Asp Arg Gly Ala Pro Glu Glu Glu Gly Leu Glu Gly 355 360 365 Gln Ala Arg Val Pro Pro Ala Phe Pro Glu Pro Pro Gly Thr Lys Arg 370 375 380
Arg His Ala Gly Ala Glu Val Val Pro Ala Asp Asp Ala Thr Lys Arg 385 390 395 400
Pro Lys Thr Gly Val Pro Ala Ala Pro Thr Arg Ala Glu Ser Pro Pro 405 410 415
Leu Ser Ala Arg Tyr Gly Pro Glu Ala Ala Glu Gly Gly Gly Asp Gly
420 425 430
Gly Arg Tyr Ala Trp Tyr Phe Arg Asp Leu Gln Thr Gly Asp Asp Ser 435 440 445 Pro Leu Ser Ala Phe Arg Gly Pro Gln Arg Pro Pro Tyr Gly Phe Gly 450 455 460
Leu Pro 465
(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 218 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
Met Gly Leu Phe Gly Met Met Lys Phe Ala Gln Thr His His Leu Val 1 5 10 15 Lys Arg Arg Gly Leu Arg Ala Pro Glu Gly Tyr Phe Thr Pro Ile Ala
20 25 30
Val Asp Leu Trp Asn Val Met Tyr Thr Leu Val Val Lys Tyr Gln Arg 35 40 45 Arg Tyr Pro Ser Tyr Asp Arg Glu Ala Ile Thr Leu His Cys Leu Cys 50 55 60
Ser Met Leu Arg Val Phe Thr Gln Lys Ser Leu Phe Pro Ile Phe Val 65 70 75 80
Thr Asp Arg Gly Val Glu Cys Thr Glu Pro Val Val Phe Gly Ala Lys 85 90 95
Ala Ile Leu Ala Arg Thr Thr Ala Gln Cys Arg Thr Asp Glu Glu Ala
100 105 110
Ser Asp Val Asp Asp Pro Pro Phe Pro His His Arg Leu Gln Ala Gln 115 120 125 Phe Pro Pro Phe Gln His Ala Pro Pro Arg Ala Arg Leu Arg Pro Gly 130 135 140
Gly Pro Gly Glu Arg Gly Pro Pro Ala Gln Ala Arg Arg Pro Pro Gly 145 150 155 160
Ala Arg Pro Arg Ser Arg Pro Cys Ala Trp Leu Thr Cys Ser Val Ser 165 170 175
Ala Phe Cys Gly Arg Trp Gly Thr Pro Thr Ser Thr Arg Val Ser Trp
180 185 190
Arg Pro Thr Thr Pro Ala Arg Thr Ser Ile Ile Pro Thr Arg Ser Arg 195 200 205 Thr Cys Ile Pro Arg Ile Pro Ile Ser Cys 210 215
(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 282 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
Val His Thr Thr Asp Thr Asp Leu Leu Leu Met Gly Cys Asp Ile Val 1 5 10 15
Leu Asp Ile Ser Thr Gly Tyr Ile Pro Thr Ile His Cys Arg Asp Leu 20 25 30 Leu Gln Tyr Phe Lys Met Ser Tyr Pro Gln Phe Leu Ala Leu Phe Val
35 40 45
Arg Cys His Thr Asp Leu His Pro Asn Asn Thr Tyr Ala Ser Val Glu 50 55 60 Asp Val Leu Arg Glu Cys His Trp Thr Ala Pro Ser Arg Ser Gln Ala 65 70 75 80
Arg Arg Gly Ala Arg Arg Glu Arg Ala Asn Ser Arg Ser Leu Glu Ser
85 90 95
Met Pro Thr Leu Thr Ala Ala Pro Val Gly Leu Glu Thr Arg Ile Ser 100 105 110
Trp Thr Glu Ile Leu Ala Gln Gln Ile Ala Gly Glu Asp Asp Tyr Glu
115 120 125
Glu Asp Pro Pro Leu Gln Pro Pro Asp Val Ala Gly Gly Pro Arg Asp
130 135 140 Gly Ala Arg Ser Ser Ser Ser Glu Ile Leu Thr Pro Pro Glu Leu Val
145 150 155 160
Gln Val Pro Asn Ala Gln Arg Val Ala Glu His Arg Gly Tyr Val Ala
165 170 175
Gly Arg Arg Arg His Val Ile His Asp Ala Pro Glu Ala Leu Asp Trp 180 185 190
Leu Pro Asp Pro Met Thr Ile Ala Glu Leu Val Glu His Arg Tyr Val
195 200 205
Lys Tyr Val Ile Ser Leu Ile Ser Pro Lys Glu Arg Gly Pro Trp Thr
210 215 220 Leu Leu Lys Arg Leu Pro Ile Tyr Gln Asp Leu Arg Asp Glu Asp Leu
225 230 235 240
Ala Arg Ser Ile Val Thr Arg His Ile Thr Ala Pro Asp Ile Ala Asp
245 250 255
Arg Phe Leu Ala Gln Leu Trp Ala His Ala Pro Pro Pro Ala Phe Tyr 260 265 270
Lys Asp Val Leu Ala Lys Phe Trp Asp Glu 275 280
(2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 528 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
Met Arg Ala Gly Leu Val Phe Phe Val Gly Val Trp Val Val Ser Cys 1 5 10 15 Leu Ala Ala Ala Pro Arg Thr Ser Trp Lys Arg Val Thr Ser Gly Glu 20 25 30
Asp Val Val Leu Leu Pro Ala Pro Ala Gly Pro Glu Glu Arg Thr Arg
35 40 45
Ala His Lys Leu Leu Trp Ala Ala Glu Pro Leu Asp Ala Cys Gly Pro 50 55 60
Leu Arg Pro Ser Trp Val Trp Pro Pro Arg Arg Val Leu Glu Thr Val 65 70 75 80
Val Asp Ala Ala Cys Met Arg Ala Pro Glu Pro Leu Ala Ile Ala Tyr 85 90 95 Ser Pro Pro Phe Pro Ala Gly Asp Glu Gly Ser Glu Leu Ala Trp Arg 100 105 110
Asp Arg Val Ala Val Val Asn Glu Ser Leu Val Ile Tyr Gly Ala Leu
115 120 125
Glu Thr Asp Ser Gly Thr Leu Ser Val Val Gly Leu Ser Asp Glu Ala 130 135 140
Arg Gln Val Ala Ser Val Val Leu Val Val Glu Pro Ala Pro Val Pro
145 150 155 160
Thr Pro Thr Pro Asp Asp Tyr Asp Glu Glu Asp Asp Ala Gly Val Ser
165 170 175 Thr Pro Val Ser Val Pro Pro Pro Thr Pro Pro Arg Gly Pro Pro Val
180 185 190
Ala Pro Pro Thr His Pro Arg Val Ile Pro Glu Val Ser His Val Arg
195 200 205
Gly Val Thr Val His Met Pro Glu Ala Ile Leu Phe Ala Pro Gly Glu 210 215 220
Thr Phe Gly Thr Asn Val Ser Ile His Ala Ile Ala His Asp Asp Gly
225 230 235 240
Pro Tyr Ala Met Asp Val Val Trp Met Arg Phe Asp Val Pro Ser Ser
245 250 255 Cys Ala Glu Met Arg Ile Tyr Glu Ala Cys Leu Tyr His Pro Gln Leu
260 265 270
Pro Glu Cys Leu Ser Pro Ala Asp Ala Pro Cys Ala Val Ser Ser Trp
275 280 285
Ala Tyr Arg Leu Ala Val Arg Ser Tyr Ala Gly Cys Ser Arg Thr Thr 290 295 300
Pro Pro Pro Arg Cys Phe Ala Glu Ala Arg Met Glu Pro Val Pro Gly 305 310 315 320
Leu Ala Trp Leu Ala Ser Thr Val Asn Leu Glu Phe Gln His Asp Gln 325 330 335
His Ala Gly Leu Cys Val Val Tyr Val Asp Asp His Ile His Ala Trp
340 345 350
Gly His Met Thr Ile Ser Thr Ala Ala Gln Tyr Arg Asn Ala Val Val 355 360 365
Glu Gln His Leu Pro Gln Arg Gln Pro Glu Pro Val Glu Pro Trp His
370 375 380
Val Arg Ala Pro Pro Pro Ala Pro Ser Arg Pro Leu Arg Leu Gly Ala 385 390 395 400 Val Leu Gly Ala Ala Leu Leu Leu Ala Ala Leu Gly Leu Ser Ala Trp
405 410 415
Ala Cys Met Thr Cys Trp Arg Arg Arg Ser Trp Arg Ala Val Lys Ser
420 425 430
Arg Ala Ser Ala Thr Gly Pro Thr Tyr Ile Arg Val Ala Asp Ser Glu 435 440 445
Leu Tyr Ala Asp Trp Ser Ser Asp Ser Glu Gly Glu Arg Asp Gly Ser
450 455 460
Leu Trp Gln Asp Pro Pro Glu Arg Pro Asp Ser Pro Ser Thr Asn Gly 465 470 475 480 Ser Gly Phe Glu Ile Leu Ser Pro Thr Ala Pro Ser Val Tyr Pro His
485 490 495
Ser Glu Gly Arg Lys Ser Arg Arg Pro Leu Thr Thr Phe Gly Ser Gly
500 505 510
Ser Pro Gly Arg Arg His Ser Gln Ala Ser Tyr Ser Ser Val Leu Trp 515 520 525
(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 1160 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
Val Ile Arg Arg Pro Val Arg Pro Phe Gly Arg Thr Ala His Pro Ala 1 5 10 15
Ser His Gly Pro Ala Ala Val Ser Val His Arg Val Arg Ala Thr Val
20 25 30
Thr Leu Val Pro Met Ala Asn Arg Pro Ala Ala Ser Ala Gly Ala Arg 35 40 45
Ser Pro Ser Gln Glu Pro Arg Glu Pro Glu Val Ala Pro Pro Gly Gly
50 55 60
Asp His Val Phe Cys Arg Lys Val Ser Gly Val Met Val Leu Ser Ser 65 70 75 80
Asp Pro Pro Gly Pro Ala Ala Tyr Arg Ile Ser Asp Ser Ser Phe Val
85 90 95
Gln Cys Gly Ser Asn Cys Ser Met Ile Ile Asp Gly Asp Val Arg His 100 105 110 Leu Arg Asp Leu Glu Gly Ala Thr Ser Thr Gly Ala Phe Val Ala Ile 115 120 125
Ser Asn Val Ala Ala Gly Gly Asp Gly Arg Thr Ala Val Val Gly Gly
130 135 140
Thr Ser Gly Pro Ser Ala Thr Thr Ser Val Gly Thr Gln Thr Ser Gly 145 150 155 160
Glu Phe Leu His Gly Asn Pro Arg Thr Pro Glu Pro Gln Gly Pro Gln
165 170 175
Ala Val Pro Pro Pro Pro Pro Pro Pro Phe Pro Trp Gly His Glu Cys 180 185 190 Cys Ala Arg Arg Asp Arg Gly Ala Glu Lys Asp Val Gly Ala Ala Glu 195 200 205
Ser Trp Ser Asp Gly Pro Ser Ser Asp Ser Glu Thr Glu Asp Ser Asp
210 215 220
Ser Ser Asp Glu Asp Thr Gly Ser Gly Ser Glu Thr Leu Ser Arg Ser 225 230 235 240
Ser Ser Ile Trp Ala Ala Gly Ala Thr Asp Asp Asp Asp Ser Asp Ser
245 250 255
Asp Ser Arg Ser Asp Asp Ser Val Gln Pro Asp Val Val Val Arg Arg 260 265 270 Arg Trp Ser Asp Gly Pro Ala Pro Val Ala Phe Pro Lys Pro Arg Arg 275 280 285
Pro Gly Asp Ser Pro Gly Asn Pro Gly Leu Gly Ala Gly Thr Gly Pro
290 295 300
Gly Ser Ala Thr Asp Pro Arg Ala Ser Ala Asp Ser Asp Ser Ala Ala 305 310 315 320
His Ala Ala Ala Pro Gln Ala Glu Val Ala Pro Val Leu Asp Ser Gln
325 330 335
Pro Thr Val Gly Thr Asp Pro Gly Tyr Pro Val Pro Leu Glu Leu Thr 340 345 350 Pro Glu Asn Ala Glu Ala Val Ala Arg Phe Leu Gly Asp Ala Val Asp 355 360 365
Arg Glu Pro Ala Leu Met Leu Glu Tyr Phe Cys Arg Cys Ala Arg Glu 370 375 380 Glu Ser Lys Arg Val Pro Pro Arg Thr Phe Gly Ser Ala Pro Arg Leu
385 390 395 400
Thr Glu Asp Asp Phe Gly Leu Leu Asn Tyr Ala Glu Met Arg Arg Leu
405 410 415 Cys Leu Asp Leu Pro Pro Val Pro Pro Asn Ala Tyr Thr Pro Tyr His
420 425 430
Leu Arg Glu Tyr Ala Thr Arg Leu Val Asn Gly Phe Lys Pro Leu Val
435 440 445
Arg Arg Ser Ala Arg Leu Tyr Arg Ile Leu Gly Ile Leu Val His Leu 450 455 460
Arg Ile Arg Thr Arg Glu Ala Ser Phe Glu Glu Trp Met Arg Ser Lys
465 470 475 480
Glu Val Asp Leu Asp Phe Gly Leu Thr Glu Arg Leu Arg Glu His Glu
485 490 495 Ala Gln Leu Met Ile Leu Ala Gln Ala Leu Asn Pro Tyr Asp Cys Leu
500 505 510
Ile His Ser Thr Pro Asn Thr Leu Val Glu Arg Gly Leu Gln Ser Ala
515 520 525
Leu Lys Tyr Glu Glu Phe Tyr Leu Lys Arg Phe Gly Gly His Tyr Met 530 535 540
Glu Ser Val Phe Gln Met Tyr Thr Arg Ile Ala Gly Phe Leu Ala Cys
545 550 555 560
Arg Ala Thr Arg Gly Met Arg His Ile Ala Leu Gly Arg Gln Gly Ser
565 570 575 Trp Trp Glu Met Phe Lys Phe Phe Phe His Arg Leu Tyr Asp His Gln
580 585 590
Ile Val Pro Ser Thr Pro Ala Met Leu Asn Leu Gly Thr Arg Asn Tyr
595 600 605
Tyr Thr Ser Ser Cys Tyr Leu Val Asn Pro Gln Ala Thr Thr Asn Gln 610 615 620
Ala Thr Leu Arg Ala Ile Thr Gly Asn Val Ser Ala Ile Leu Ala Arg
625 630 635 640
Asn Gly Gly Ile Gly Leu Cys Met Gln Ala Phe Asn Asp Asp Gly Thr
645 650 655 Ala Ser Ile Met Pro Ala Leu Lys Val Leu Asp Ser Leu Val Ala Ala
660 665 670
His Asn Lys Gln Ser Trp Thr Gly Ala Cys Val Tyr Leu Glu Pro Trp
675 680 685
His Ser Asp Val Arg Ala Val Leu Arg Met Lys Gly Val Leu Ala Gly 690 695 700
Glu Glu Ala Gln Arg Cys Asp Asn Ile Phe Ser Ala Leu Trp Met Pro 705 710 715 720
Asp Leu Phe Phe Lys Arg Leu Ile Arg His Leu Asp Gly Glu Glu Asn 725 730 735
Val Thr Trp Ser Leu Phe Asp Arg Asp Thr Ser Met Ser Leu Ala Asp
740 745 750
Phe His Gly Glu Glu Phe Glu Lys Leu Tyr Glu His Leu Glu Ala Met 755 760 765
Gly Phe Gly Glu Thr Ile Pro Ile Gln Asp Leu Ala Tyr Ala Ile Val
770 775 780
Arg Ser Ala Ala Thr Thr Gly Ser Pro Phe Ile Met Phe Lys Asp Ala 785 790 795 800 Val Asn Arg His Tyr Ile Tyr Asn Thr Gln Gly Ala Ala Ile Ala Gly
805 810 815
Ser Asn Leu Cys Thr Glu Ile Val His Pro Ser Ser Lys Arg Ser Ser
820 825 830
Gly Val Cys Asn Leu Gly Ser Val Asn Leu Ala Arg Cys Val Ser Arg 835 840 845
Arg Thr Phe Asp Phe Gly Met Leu Arg Asp Ala Val Gln Ala Cys Val
850 855 860
Leu Met Val Asn Ile Met Ile Asp Ser Thr Leu Gln Pro Thr Pro Gln 865 870 875 880 Cys Arg His Asp Asn Leu Arg Ser Met Gly Ile Gly Met Gln Gly Leu
885 890 895
His Thr Ala Cys Leu Lys Met Gly Leu Asp Leu Glu Ser Ala Glu Phe
900 905 910
Arg Asp Leu Asn Thr His Ile Ala Glu Val Met Leu Leu Ala Ala Met 915 920 925
Lys Thr Ser Asn Ala Leu Cys Val Arg Gly Ala Arg Pro Phe Ser His
930 935 940
Phe Lys Arg Ser Met Tyr Arg Ala Gly Arg Phe His Trp Glu Arg Phe 945 950 955 960 Ser Asn Asp Arg Tyr Glu Gly Glu Trp Glu Met Leu Arg Gln Ser Met
965 970 975
Met Lys His Gly Leu Arg Asn Ser Gln Phe Ile Ala Leu Met Pro Thr
980 985 990
Ala Ala Ser Ala Gln Ile Ser Asp Val Ser Glu Gly Phe Ala Pro Leu 995 1000 1005
Phe Thr Asn Leu Phe Ser Lys Val Thr Arg Asp Gly Glu Thr Leu Arg
1010 1015 1020
Pro Asn Thr Leu Leu Leu Lys Glu Leu Glu Arg Thr Phe Gly Gly Lys 1025 1030 1035 1040 Arg Leu Leu Asp Ala Met Asp Gly Leu Glu Ala Lys Gln Trp Ser Val
1045 1050 1055
Ala Gln Ala Leu Pro Cys Leu Asp Pro Ala His Pro Leu Arg Arg Phe 1060 1065 1070 Lys Thr Ala Phe Asp Tyr Asp Gln Glu Leu Leu Ile Asp Leu Cys Ala
1075 1080 1085
Asp Arg Ala Pro Tyr Val Asp His Ser Gln Ser Met Thr Leu Tyr Val
1090 1095 1100
Thr Glu Lys Ala Asp Gly Thr Leu Pro Ala Ser Thr Leu Val Arg Leu 1105 1110 1115 1120
Leu Val His Ala Tyr Lys Arg Gly Leu Lys Thr Gly Met Tyr Tyr Cys
1125 1130 1135
Lys Val Arg Lys Ala Thr Asn Ser Gly Val Phe Ala Gly Asp Asp Asn
1140 1145 1150
Ile Val Cys Thr Ser Cys Ala Leu 1155 1160
(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 269 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
Val Arg Arg Arg Leu Arg Cys Ala Arg Arg Arg Arg Gly Gly Pro Gly
1 5 10 15
Pro His His Asp Gln Leu Arg Arg Asp Ala Gly Arg Gly Ala Ala Gly 20 25 30 Pro Val Phe Arg Met Pro Ala Arg His Gly Pro His Ala Arg Val Ser 35 40 45
Pro Arg Gly His Ala Val Phe Arg Gly Ala Ser Val Val Val Thr Gln
50 55 60
Asp Glu Leu Ala Ser Val Thr Ala Val Cys Ser Gly Pro Gln Glu Ala 65 70 75 80
Thr His Thr Gly His Pro Gly Arg Pro Cys Ser Ala Val Thr Ile Pro
85 90 95
Ala Cys Ala Phe Val Asp Leu Asp Ala Glu Leu Cys Leu Gly Gly Pro 100 105 110 Gly Ala Ala Phe Leu Tyr Leu Val Phe Tyr Gln Cys Arg Asp Gln Glu 115 120 125
Leu Cys Cys Val Tyr Val Val Lys Ser Gln Leu Pro Pro Arg Gly Leu 130 135 140 Glu Ala Ala Leu Glu Arg Leu Phe Gly Arg Leu Arg Ile Thr Asn Thr
145 150 155 160
Ile His Gly Ala Glu Asp Met Thr Pro Leu Pro Pro Asn Arg Asn Val
165 170 175 Asp Phe Pro Leu Ala Val Leu Ala Ala Ser Ser Gln Ser Pro Arg Cys
180 185 190
Ser Ala Ser Gln Val Thr Asn Pro Gln Phe Val Asp Arg Leu Tyr Arg
195 200 205
Trp Gln Pro Asp Leu Arg Gly Arg Pro Thr Ala Arg Thr Cys Thr Tyr 210 215 220
Ala Ala Phe Ala Glu Leu Gly Val Met Pro Asp Asn Ser Pro Arg Cys 225 230 235 240
Leu His Arg Thr Glu Arg Phe Gly Ala Val Gly Val Pro Val Val Ile 245 250 255 Gly Val Val Trp Arg Pro Gly Gly Trp Arg Ala Cys Ala 260 265
(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 347 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
Met Lys Thr Lys Pro Leu Pro Thr Ala Pro Met Ala Trp Ala Glu Ser 1 5 10 15
Ala Val Glu Thr Thr Thr Ser Pro Arg Glu Leu Ala Gly His Ala Pro
20 25 30
Leu Arg Arg Val Leu Arg Pro Pro Ile Ala Arg Arg Asp Gly Pro Val 35 40 45
Leu Leu Gly Asp Arg Ala Pro Arg Arg Thr Ala Ser Thr Met Trp Leu
50 55 60
Leu Gly Ile Asp Pro Ala Glu Ser Ser Pro Gly Thr Arg Ala Thr Arg 65 70 75 80 Asp Asp Thr Glu Gln Ala Val Asp Lys Ile Leu Arg Gly Ala Arg Arg
85 90 95
Ala Gly Gly Leu Thr Val Pro Gly Ala Pro Arg Tyr His Leu Thr Arg 100 105 110 Gln Val Thr Leu Thr Asp Leu Cys Gln Pro Asn Ala Glu Arg Ala Gly
115 120 125
Ala Leu Leu Leu Ala Leu Arg His Pro Thr Asp Leu Pro His Leu Ala
130 135 140 Arg His Arg Ala Pro Pro Gly Arg Gln Thr Glu Arg Leu Ala Glu Ala
145 150 155 160
Trp Gly Gln Leu Leu Glu Ala Ser Ala Leu Gly Ser Gly Arg Ala Glu
165 170 175
Ser Gly Cys Ala Arg Ala Gly Leu Val Ser Phe Asn Phe Leu Val Ala 180 185 190
Ala Cys Ala Ala Ala Tyr Asp Ala Arg Asp Ala Ala Glu Ala Val Arg
195 200 205
Ala His Ile Thr Thr Asn Tyr Gly Gly Thr Arg Ala Gly Ala Arg Leu
210 215 220 Asp Arg Phe Ser Glu Cys Leu Arg Ala Met Val His Thr His Val Phe
225 230 235 240
Phe Val Met Arg Phe Phe Gly Gly Leu Val Ser Trp Ser His Arg Thr
245 250 255
Ser Trp Leu Asp Pro Ser Ala Ala Asp Pro Arg Arg Pro His Thr Pro 260 265 270
Ala Thr Arg Ala Gly Pro Val Arg Pro Leu Pro Ser Arg Pro Ala Pro
275 280 285
Leu Trp Thr Trp Thr Pro Ser Cys Ala Trp Gly Ala Leu Gly Arg Arg
290 295 300 Ser Cys Thr Trp Phe Ser Pro Thr Asp Ser Ala Gly Thr Arg Ser Ser
305 310 315 320
Val Ala Cys Thr Trp Ser Arg Ala Ser Ser Pro Arg Ala Asp Trp Arg
325 330 335
Arg Pro Ser Ser Gly Cys Ser Gly Ala Ser Gly 340 345
(2) INFORMATION FOR SEQ ID NO: 26:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 12701 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
GTAGAAGTAA GGAGAGAGTG AGAGTATAGA GAAATAGATG AGAGAGAGAA GGTNAGATAA 60 ATAAGAGGGA CAGAGAGTAG GGAGTAGGGT TAGAGAGGAT GAGGGAGAAA GATGAGAGAA 120
GAGGGGAAGA TATATAGATG AGAGAAGGGA AGGAATGAGA GAGGGAAGAA TGATGGAAGG 180
GAAGGAGNAG AGAAGAGTAG TAGAGGGAAA AAAGAGAAGG AGAAAAGNAA GGAAAGAAAT 240
AGAAGAGNAG AGAAGGAAGA GAGAAAGAGA GAAGGAAGGA GGGGAAAGAA GAAANAAGAA 300 GGGAAGAAGG AAAGGAAGAG NAGAAGAAGG AAGAGAAGAG AAAGAAGAAG NAGAGGGGGA 360
AGAAAAGAGA GGAAGAGGGG AAGGAGGAAA GGAGGGAGAA GAAAAAANAA AAAACCACAC 420
GGCGGCCGAA ACGTCGGGGG AACCGGTAGA AGTCCTGCAG GTCGGACGAA CCAACGGACA 480
CCTCCGCAAA GCGCGCGCGC GCCTCCCCCG CGGCGTCGCG ACAGACCAGA TACAGCAGGG 540
CGTGGAGGCA GTCGCGCGTG CGCGGGGGCA GCCATACCGC GTATAGGGTA ATGGCGCTGA 600 CGCTCTCCTC CACCCAAACG ATGCCGGGGG CTTCCATGCC ACGACGCCCG GGGGTTGCCG 660
TGTATCGAAC GAGCGCGGCC CCAGACTTAT AGGGTGCTAA AGTTCACCGC CCCCTGCATC 720
ATGGGCCAGG CCTCGGTGGG AAGCTCCGAC AGAGCCGCCT CGAGAATGAT GTCAGTGTTG 780
GGCTGGGCGC CGGAGGCGTG CGTGCGCAAG CAGCGCCCCC ACGCGGGCGC GCGCAGCTTG 840
AAGCGCGCGC CCGCAAACTC CCGCTTATGG GCCATCAGCA GCGCGTACAG CTGTCTGTGC 900 GTCCGGCAGG CGCTGTGGTC GATGCGGTGG GCGTCCAGCA GCTCCACGAT GGCTCGCTTG 960
GTGAGGTTTT TAACGCGCCC CGCCCCGGGA AACGTCTGCG TGCTCTTGGC CAGCTGCACC 1020
CCGAACAGTT CGCCCCAGAT GATCTTGAAC AGCGACAGCG CGTGCTCCGT CTCGCTCACG 1080
GACCCGCGCG GGGGGCAGCC GCTCAGGGCG TCGGCCACGC GCTTAACCGC GTCCTCCGAC 1140
AGCAAGGGGC CGTCGGTCAC GTTACAGTGG CCCAGTTCGA ACACCAGCTG CATGTAGCGG 1200 TCGTAGTGGG GGTTCAGCAG CTCCAGCACG TCCTCGGGGC TAAAGGTTCG CCCCGACCCC 1260
CCGGCCATCG AGTCCCACTG CAGGCACGCG GCCATGGTGC TGCACAGACG GAACAGCTCC 1320
CAGACGGGGG CGACGTTTAG GGTGGGGTGT AGGGCCACAA GCTCCAGCTC TCCGGCGGCG 1380
TTGATCGTGG GGATGACGCC CGTGGCGTAG TGGTCGTAAA GCCGCCGGAA GATGGCGCTG 1440
CTATGGGCGG CCATGGGGAC GCGAAGACAG GCCTCCAGCA GCACCAGGTA GATGAACCGC 1500 GTGCGGCCGA CCAGGCTGTT GAGGCCGCGC ATGAGCGCGA CCACCTCGGC CGGCGCGACG 1560
TCCGGCCGGA GGTACTTTTC GACGAAAAGG CCCACCTCCT CCGTCTCGGC GGCCTGGGCC 1620
GACAGGGACG TGTCGGGGTC CTGGCAGCGC AGCTCCCGCA GATCCCGCTG GGCCCTCAGG 1680
GCATCAAAAT GTATCCCCCG CAAAAACAGA CAAAAGTTCC TCGGGGTCAG CGCGGCGTCG 1740
TGGCCCCAGA ACCGCACGTG CATGCAGTTG AGGGTCAGAA GCATGTGGAG GATGTTAAGA 1800 CTGTCCGCGA GGCACGCCAG CGTGCACCTC TCGAAGTAGT GCTTGTACCG GAATTTGCTG 1860
TAGATGCGCG ACCCCCGCGC CTGCGCCGCG TCGGCGTGCG ACGCGTCGCA GCGCCCTTTG 1920
AACCGGCGGC ACAACAGGTT CGTCACCTGG GAAAACTGTG CCGGCCACTG CCCGCTGGCG 1980
CTCACCACGT GGTTGAGCAG CATGGGCGTA AAGACGGGCT CCGAGCGCGC CCCGGACCCG 2040
TCCATGTAGA TCAGCAGCTC CCCCTTGCGG AGAGTCCGTA CCCGCCCCAG CGACTGGTAC 2100 ACGGACACCA TGTCCGGCCC GTAGTTCATG GGTTTCACGT AGGCGAACAT GCTGTCAAAG 2160
TGCGGCGGAT CGAAGCTAAG GCCCACCGTC ACGACCGTTG TGTAAATGAC CACCCGGTAC 2220
CGGCCCCATG TGGTCACTTC GCCGGGCGGG GTGAGCGAGT GGAGCAGCAG CACGCGGTCC 2280
GTAAACTGCC GGCAGAACCT GGCAACGACC TCCGCGAAGG AGACCGTCGA CAAGAAGATG 2340
CAGACGTTAT CTCCGCCGGC CAGGCGCGCC TCCACCTCCC CGAAGAAGGT GGCGTCCGGG 2400 GGGGCGTCCG GGGGGGGCGC CCCGCCCGCC GGCCCCGGCG GGCGCAGGGC CGCCTGCAGG 2460
ACCTCGGGCC CCAGGCGCGG GAGAAACAGA CAACGGCGCG CCGAAAATCC GGGCATGGCA 2520
TACTCCCCGA TGACCACGTG AACGTTCTTT TCGCCCCGGA GGCTGCACAG AAAGTCCACC 2580
AGCTGCGCGT TGGCGGTGGC GTCCATGGCG ATGATCCGCG GGCACGTGCG CAGCAGGCGC 2640 AGCATCAACG CGTCGACGCG GCCCAGCTGC TGCATCGTCG GCGAGTACAG TTGGCCCAAC 2700
GTCGACATGA CTTCGTCCAG GACGAGCACG TCGTAGTTGT TCAACAGGTT CGGGCCCACG 2760
CGATGAAGAC TTTCCACCTG CACGATGAGA CGGTGGAAGG GGCGGTCGTT CATGATGTAA 2820
TTGGTGGATG AGAAGTAGGT GACGAAGTCG GGCAACCCTG ACTCAGCGAA CCGCGTCGCC 2880 AGGGTCTGAG TAAAACTCCG ACGACAGGAG ACGACCAGCA CACTCGTGTC CGGAGAGTGG 2940
ATCGCTTCCC CCAACCAGCG GATCAGCGCG GTAGTTTTTC CCGAGCCCAT TGGCGCGCGG 3000
ACCACAGTTA CGCACCGGGC CGTCGGGGCG CTCGCGTCCG GGAAGGTGAC GGGTCCGTGT 3060
TGCTGCCGCT CGATCGTTGT TTTCGGGTGG ACCCGGGGAA CCCACTCGGC CAAATCCCCC 3120
CCGTAAAGCA TCCGCGCCAG CGATACACTC GACGTGTACT GCTCGCACTC GTCATCCCCG 3180 ATGGGACGCC GGGCCCCCAG GGGATCCCCC GAGGCCGCGC CGGGCGCCGA CGTCGCGCCC 3240
GGGGCGCGGG CGGCGTGGTG GGTCTGGTGT GTGCAGGTGG CGACGTTCAT CGTCTCGGCC 3300
ATCTGCGTCG TGGGGCTCCT GGTGCTGGCC TCTGTGTTCC GGGACAGGTT TCCCTGCCTT 3360
TACGCCCCCG CGACCTCTTA TGCGGAGGCG AACGCCACGG TCGAGGTGCG CGGGGGTGTA 3420
GCCGTCCCCC TCCGGTTGGA CACGCAGAGC CTGCTGGCCA CGTACGCAAT TACGTCTACG 3480 CTGTTGCTGG CGGCGGCCGT GTACGCCGCG GTGGGCGCGG TGACCTCGCG CTACGAGCGC 3540
GCGCTGGATG CGGCCCGTCG CCTGGCGGCG GCCCGTATGG CGATGCCACA CGCCACGCTA 3600
ATCGCCGGAA ACGTCTGCGC GTGGCTGTTG CAGATCACAG TCCTGCTGCT GGCCCACCGC 3660
ATCAGCCAGC TGGCCCACCT TATCTACGTC CTGCACTTTG CGTGCCTCGT GTATCTCGCG 3720
GCCCATTTTT GCACCAGGGG GGTCCTGAGC GGGACGTACC TGCGTCAGGT TCACGGCCTG 3780 ATTGACCCGG CGCCGACGCA CCATCGTATC GTCGGTCCGG TGCGGGCAGT AATGACAAAC 3840
GCCTTATTAC TGGGCACCCT CCTGTGCACG GCCGCCGCCG CGGTCTCGTT GAACACGATC 3900
GCCGCCCTCA ACTTCAACTT TTCCGCCCCG AGCATGCTCA TCTGCCTGAC GACGCTGTTC 3960
GCCCTGCTTG TCGTGTCGCT GTTGTTGGTG GTCGAGGGGG TGCTGTGTCA CTACGTGCGC 4020
GTGTTGGTGG GCCCCCACCT CGGGGCCATC GCCGCCACCG GCATCGTCGG CCTGGCCTGC 4080 GAGCACTACC ACACCGGTGG CTACTACGTG GTGGAGCAGC AGTGGCCGGG GGCCCAGACG 4140
GGAGTCCGCG TCGCCCTGGC GCTCGTCGCC GCCTTTGCCC TCGCCATGGC CGTGCTTCGG 4200
TGCACGCGCG CCTACCTGTA TCACCGGCGA CACCACACTA AATTTTTCGT GCGCATGCGC 4260
GACACCCGGC ACCGCGCCCA TTCGGCGCTT CGACGCGTAC GCAGCTCCAT GCGCGGTTCT 4320
AGGCGTGGCG GGCCGCCCGG AGACCCGGGC TACGCGGAAA CCCCCTACGC GAGCGTGTCC 4380 CACCACGCCG AGATCGACCG GTATGGGGAT TCCGACGGGG ACCCGATCTA CGACGAAGTG 4440
GCCCCCGACC ACGAGGCCGA GCTCTACGCC CGAGTGCAAC GCCCCGGGCC TGTGCCCGAC 4500
GCCGAGCCCA TTTACGACAC CGTGGAGGGG TATGCGCCAA GGTCCGCGGG GGAGCCGGTG 4560
TACAGCACCG TTCGGCGATG GTAGCCGTTT CGTTCGTTTT AATAAACCGA CGTTGTGCGT 4620
TTCACCATAC TTCGGCGCGC GCGTGTGTGT GTTTTTTTTG TGGTGTTTAT TTTCCCCCAC 4680 CCCTTCCTTT TCTTTCGGCC ACCACCCCCC TCCTCCCCCG TACTATACAA CAAAAAATAC 4740
CACACATACG ACCAAATACG GACAATCATT TCTGTCTTTA TTCGCTGTCA GAGAGTGGGG 4800
GCGTGAGCGT GGCAGGAGGG CGGGCCACGT CGGGGTCCCG CCGTCTGGTG TGACGCGATG 4860
GGGGGTCCGA TGCGCGCCGG TACTGGGGCC CCGGCGCCCG GGTGACCACG CGCATGTCGG 4920
GGGGCACGTA GAAGTTACCC TCTTCTTCGG ACTCGATGTC CACGACGTCA AATTCGTGGG 4980 CGGTCAGCGA GACGACCTCC CCGCCGTCGG TGATGATGAC GTTGTGTCGG CAGCAGCAGG 5040
GCCGCGCCCC GGAGAACGCG AGGCCCATAA CTTGGCGAGC GTATCGTCGA AGGCCAGGCG 5100
GCTGTTTCGC CGGATGTCCC GGTATATCCC CGGCTCGACG CGGACGGGGG TGATGATCAG 5160
GGCGATCGGA ACGGCCTGGT CCGGGAGGAT CGATGCCTTG GCGGGTCCGG GGGCCCCGCC 5220 ACGCCCGGCG GGCGCTCCGC GGCCGTCCTC CAGGCGGAAC GTCACGCCCT CCTCCGCGCC 5280
CGCGCGGTGC CTGCCGAGGA ACGTCACCAG GTGCGGTTGC AGGGGGCAGT CGGGAAAGTG 5340
GCTGTCGAGG GACGTTTCCC TGCACCAAGA TCTGTTTGAA GTTCGGGTGG CGGGGGTTGG 5400
CGAAGATGGG CTCGCGGCGA ACCAGCTCCC CGGAGCTCCA GGCCACGGGA GAGATGGTGC 5460 GACGCTCGAG GTCGGGGACG CCAAACAGAA GCACCTCCGA GACAACGCCG CTATTTAACT 5520
CCACCAGCGC CCGATCCGGG GCGGAGCATC GCCTTTTTTC GCCGGCGGCG CGGGAATCGA 5580
GCCAGTCCCG GTCTTGGGTG ACGAGCGCCT CCTCCGGGCC CGGGACGCGC CCGGGCGCGA 5640
AGTAGCGCAC GCCGGGGTTG GGGATGGACC GGATGAACGC CCGGAACGCC TCCGGCGATC 5700
GCCGCGCCAT CAGGTCCTCG TACGCGGAGG CCGCGGGGGC GCCGGGGTCC GCGGGGTCGA 5760 ACGCGTACTT GGCTCGGCAC TTAACCTCGT AGAAGGCCAG GGGGGTCTGG GGGGCGGGGG 5820
CCAGGTAGCC GTGAGGGTCC CTGGGGCACA CGAGGATGTC CAGGGACGCC CCCACCATGC 5880
CCGTGTGGCC GTCCATGAGG ACCCCGCACG CGTGCACGTT CTCCTCGGCG AGGTCCCCGG 5940
GTTGGTGAAA GACGAAGCGC CCGGCGTCGG CGTCGTCGTT GACGCCCGCG TCCGCGCGGC 6000
CCACGCAGTA GCGAAACAGC AGGTTTCGGG CCGTCGGCTC GTTCACCCGC CCGAACATCA 6060 CCGCCGACGA CTGGGCGTCC AGCCGCAGGC TGGCGTTGTG GGTGAGCCAC TGGGACGAGA 6120
AGCACGGACC CTGCGCGCCC CACCGCAGCG TGGAGGCGGT CGTCAGGCCC CGCCGAAGCA 6180
GGGCCCAGAG CTGGCAGTCG GCCTGGTTTT GCGTCGCCGC CTCGTAAAAT CCCATAAGCG 6240
GGCGGGGGGC GACGGCTTCG GCGGCGGACG GGGGGGCGCG GCGCGTCAGG CGCCAGAGGT 6300
GCCGGCCGAG CCCGCGGTCC ACCATGCCGG CCGCCTCCAG CGACACGACG AGGGAGCACA 6360 GATAGTCCAG GCGAGCCCAC AGGGGCCCGA TGGCCAGAGG GGAGCGGACG CCGCGCAGCA 6420
GGCCGCGCAG GTGGCGCTCG AACGTTTCCG CCAAGATATG GGGGGGCAGT GCGTTGGGGA 6480
TCGCCGACGC CGACCACATC GGGTCGGGGT CCGGGGGACC GGGGCTGCAG TCCGGGTCGA 6540
TGGCGTGTGC GCCCCCCGGC GAGAGGGGAA TGTCGGGGGT TGGCGGGCCG GATGAGGCCT 6600
CAGAGAGGGC CGGGGACGCG GGCCGGGCCT TTTCGCCCGG GGCCCCGCCG TCGGGTTGCC 6660 CACGTGGGGG GCTCTGGGGC CAATGGGAAC CCGGGGCCCC CGGTGAAGTG GGGCGGGGTG 6720
GGGCGGGGCG GGGCCCAAAG ACGGTCGCCA GATCTAGGCT GTTGGGTCGG GGCCGCTTCG 6780
GGGGACTATC GGGGTCGCGG GCGGGGTCCG CGGGGCGCTT GGCGCCGGGT GTTGCGGCGG 6840
CCGCCATTTT TACGAGCAGC CGAAGAGCTC GAGGGCGGAA GGGATCCTCA CGACAGAGAG 6900
TGGCGCGCGG CCGGGTTGGC GTGACAGAGG CGGGAGACCA GCACCAGCAG CGGCCTCAGC 6960 TCGGGCGGCA GCGACACCGA CGACAGGACG GCCTTGTGCG TGCGCTGGTA ATTTATACAC 7020
TGCTCCGTGA ACGCGCGCCG AATCTTGGGA TTGCGAAGGT GGCGCCGGAT GCCCTCCGGC 7080
ACGTCATACG CCAGGCCGTG GGTGTTGGTC TCGGCCGAGT TGACAAAGAG GGCGGGGTGC 7140
AGAACGCAGC GATAGGCGAG GAGGGCCACG GCAAAGTCCG GCGAGAGCTG GTTGTTAAAG 7200
TACTGGTAGC CCGGGACGCG GGTCACGGGG ACGCCCAGGC TCGGGGCCAC GTACACGCTA 7260 ACCAGCAGCT CCAGCAGCGT CTGCCCCAGG GCGTAGAGAT CGACCGCCAG CCCGACGTCG 7320
TGCTTCAGGG GGCGGTTGTT AAACTCGGCC CGCTCGTTGT TGAGGTACTT TACCAAGAGC 7380
TCCGGCGGCT GGTTGTACCC GTGCCCCACC AGAGTGTGAA AGTTGGCCGT GGTCAGGGCG 7440
GCGGGCATCC CAAACCCCCG GGGGGACTCG AGGTCCGGCT CCTGGAGGCA AAACTGGCCC 7500
CGGGATATCG TGGAGTTGGA GTTCAGGGTC ACCAGGCTAA AGTCGGCCAG GACGGCCCGC 7560 CGGAGCGACA CCGCGTCCGA TCGCAGCATC ACGAGGACGT TGGCGCACTT GATGTCCAGG 7620
TGGCTGATCC CGCACCTGGT GTTCAGGAAC ACCACGGCGC GCGCCAGGTC TGTGAAGCAG 7680
TGGTGGAGGG CCGTCGCGAC GGAGGGGGTG GTCGCGCGCA GGGACGCCAG CTGGCCGATG 7740
TACTTGCCGA GGTCCATGTC GTACGCGGGG AACACGATCT GGCGCTGCTG CAGCGAGAAC 7800 CCGAGCGGGG TGATAAAGCC GCGGATGTCG TGGGTGCGGC CGCCGCAAAA AGCGCACTCC 7860
CCCACGAGCA GGGTCGCGAC GAGCTCCACG GCAAACCACT CCTTTTCCCG GATGGTCTTC 7920
ACGGCCAGCT TGTGTTCGCA AATCAACTGC ACCTCGCCGT ACCCCCCCGA GCCCCCGAAG 7980
CTGCGGGCCC CGGGGATCTC CAGGGTCGTG TAGCGGAGGG CGGGGTTGAC GGCGAATACG 8040 GGGATGCATA GCTTGTGGAT GCGCGCGAGG GACAGGATGT GCGAGGGGGG CGACGGGGGC 8100
GAGGTCATGG CCGTCTCGGA CCTGCGCAGG GGCGGGCGCC TCAGCTTGGC CGCAGGGCCG 8160
GGGGCCTCGG GGGACGAGCG GCGACGAGAC GAGCGGCTCA CTCGCCATCG GGACAGTCCC 8220
GCGCGAAGCC GCTCCCGGAA GCTGGATCGG CGGCGGGACC CGGGGCGGGC TCCGGAGACG 8280
GCGCCGTCTC GGGGGGAGGG GCCGCTTGGG CGTCCGGACG CCCGGCGGCT GAGGGAGTGT 8340 ATGTAGGACG CGAGCCAGGC CTTGAAGGAG CGTCGGTGTG CACCTTGGGG GCTGATGTCA 8400
GCTGCCACAT GACTAGCAGG TCGCTGTCGC CCGGACTCAT CCATCCGTCC GCCAGGTCGC 8460
CGTCCCCCCA CAGAGACGCG TTCGCCGCGG CCTCTTCGAG CTGCTCCTCC TGGTCCGCAA 8520
GACGATCGTC CGCCGCGTCC AGGCGCTCGC TAAGCGCGGG ATCGAGGTAC CGTCGGTGTG 8580
CGGTTAGAAA ATCACGTCGC GCCGCTTGCT CTTCCACGCG AATTTTAACA CAGGTCGCTC 8640 GCTGTCGCAT CATCTCTAAG CGCGCGCGGG ACTTTAGCCG CGCCTCCAAT TCCAAGTGGG 8700
CCGCCTTGGC GGCCATAAAG GCGCCAACAA ACCTAGGATC TTGTGTACTC ACGCCCTCCC 8760
GGTGTAGCTG CAGGGTCTGG TCCCTGTACA CCTCGGCCCG GAGGTGCGTC TCGGCCAAAC 8820
GTCGGCGCAG GGCCGCGTGG CTGGCGTCTC GGCTCATCTC GCCGCCCCCG CGCGCGCCCG 8880
ACGTCGGACT CCTTCGCCCC GACCCCCCTG ACCTCAGCCG CCCCCGCCTC GCCCGCGATG 8940 TTTGGCCAGC AGCTGGCGTC CGACGTGCAG CAGTACCTGG AGCGCCTGGA GAAACAGAGG 9000
CAACAGAAGG TGGGCGTCGA CGAGGCGTCG GCGGGCCTGA CGCTCGGCGG CGATGCGCTG 9060
CGCGTCCCTT TTTTGGATTT TGCCACCGCG ACGCCCAAGC GCCACCAGAC CGTGGTCCCG 9120
GGCGTCGGGA CGCTCCACGA CTGCTGCGAG CACTCGCCGC TCTTCTCGGC CGTCGCGCGG 9180
CGGTTGCTGT TTAATAGCCT GGTGCCGGCG CAACTCAGGG GGCGTGACTT TGGGGGCGAC 9240 CACACGGCCA AGCTGGAGTT CCTGGCCCCC GAGCTGGTGC GGGCGGTGGC GCGCCTGCGG 9300
TTTCGGGAGT GCGCGCCGGA GGACGCCGTG CCCCAACGCA ACGCCTACTA CAGCGTCCTG 9360
AACACGTTTC AGGCCCTGCA CCGCTCCGAA GCCTTTCGGC AGTTGGTTCA CTTCGTGCGG 9420
GACTTCGCCC AGTTGTTGAA AACCTCGTTC CGGGCCTCTA GTCTCGCGGA GAATACGGGC 9480
CCCCCGAAGA AACGGGCCAA GGTGGACGTG GCCACCCACG GGCAGACGTA CGGCACCTTG 9540 GAGCTCTTCC AGAAAATGAT ACTAATGCAC GCGACCTACT TTCTGGCCGC CGTGCTGCTC 9600
GGGGACCACG CGGAGCAGGT CAACACGTTC CTGCGGCTCG TGTTCGAGAT CCCCCTGTTT 9660
AGCGACACGG CCGTGCGGCA CTTCCGCCAG CGCGCCACCG TGTTTCTAGT CCCCAGGCGC 9720
CACGGAAAGA CCTGGTTTTT GGTGCCCCTC ATCGCGCTGT CGCTCGCGTC CTTCCGGGGG 9780
ATCAAGATAG GCTACACGGC CCACATCCGC AAGGCGACCG AGCCCGTGTT TGATGAGATC 9840 GACGCCTGCC TGCGGGGCTG GTTTGGCTCG TCCCGGGTGG ACCACGTCAA GGGGGAAACC 9900
ATCTCGTTCT CGTTCCCGGA CGGCTCGCGC AGCACGATCG TGTTTGCCTC CAGCCACAAC 9960
ACGAACGTAA GTACGCCTTC CTCCCGCGGT GCCTGTTTCC CCGGTGCCGC CCTCCCCGAG 10020
ATCGACCGAC AGACAAACAC AGCCAGACGC GAGTGTGGGA CGACACGCCC GCAGCCCCCC 10080
CCGCCATGGC GGGGGGAAGC CTTACTGTTT ATTTGTAATC GGACGATGAG GCTCTGGCCA 10140 CGGCCCGCGC GACCGCGGGG CAGCTCGTTG CAAACAGGCG GCTGGTATAC GATGACAGAA 10200
CGCAGAGGCG CCACCCGGCG CTGGTCGGGC GGATGACGCT TTCCGCGCCG TCCCGGCCCA 10260
CGACGACCTC GTGCAGGTGG GCCGTGATGC GCGGGCGGCG GGTCGCCTGC CGCAGGATAA 10320
CCGCGTCCAC GGGGTGCCCG AAGAGGAGCT GACACAGGCT CGCGTCCCCC CGGACGGCCA 10380 GGGTGCGCTG GGCCATATTG GACCACATGC ACGGGGCGAC GCAGGGACAG GCCTCCGCCA 10440
CGGCGGGGGC GCGCCACAGC GCGTTGGCGG AATCGATGTG GGCCGTCGGG GCGCAGGCGC 10500
CGCCTCCTCC CGGGGGGTCG GTAATCCTGG ATAGCAGCCA TCCTAAATGG CGGGCCCGGC 10560
TGCCCGGGGG ACAGAGCGAC CCCAGGTCAT CATCCATGGC CCAGCAGTAT ATGCGGCCGC 10620 CGGGGAGGTG CCACCAGGCC CCCGGACCCA GGGCACAGCA CGCCCCCGGA TTCGGGGGCG 10680
GTTCCGTGGG TACCAGGTAG GCGCCGTCGA GCTCGTGGGC CACGGGCTCG TCCGCGAGCT 10740
GTTCGGCGGC GGGGTCGGGG GTTTCCTCCG GGGGGGAGGC AGCTTCCAGG TGGCCGAAGG 10800
CTAGGGTGCA CAGCAGCGGG GTCCGGGGGT GCGTTACGCT GCGGAGGTGG ACGGTGGCGC 10860
AGTAGCGGCG CTCGCGGTTA AAGAAAAAAA TGGCAAAAAA CGTGTTCGAA GGCAGGCGCA 10920 GCGCCTTGGG CCGCGTCAGG TACAGGAAAA TCTCGCAAAA AAGGGCACGC TCGGGGTCGG 10980
GGTCCGGAAG GGCCACCTGG CACAGCGGCT CGGTGAGGAC CGTGAGGCAC CGAAAAATCT 11040
TAAGCCGCTC GTCCCCCCGA ACGACGCGCC ACACGAAAAC AGAGTTGGCG ATGCGCGCGA 11100
CAAGGTCGGC TTCGGGCCCC GGGTCGGGGG CGCGCGCGTC GGGGGGGGCG CCCCGGTGAC 11160
CCGGCGGGGC CGCGGCTCCC GGGGGGCCTG GCGTCCCCTG GGGACGCCAC ACTGCCCGCT 11220 GTGCCATGTT GGTGGTGGGG AAGGGACCGG AGACGCACCA AAAGCAGAGG GGCCAGCGCG 11280
TGTATGACTT GGGGGGGGGG GTGGGTGACC GGTGGAACAA AAACACGCGT CAGCGGACAA 11340
GGCCGGGTCC CGTACCCGCC CCGCGACAGA ACCGGAGTCC GACGGCACGC GCGACGGGGT 11400
CTGCGAGGCT GAGGTACGCC GCGGTGTTAA TGGTAAACGC AAAGCCTCCC GGAAAGACCA 11460
CTAGCCCGCA GAGGCGGCGA TTGAACCCAA GGCAGAGGTA CGCGTAGCTC TCTCCCGGAA 11520 GGTATTGCTC GCAAACCCTG TGCGGGGCAG TGGAGGGGCT GCCCTCCATG AAGCGACATT 11580
TACTCTGCTC GCGTCCATTG ACGTCACCGT CAATCACCAC TGCGATTGGA CGGTTGGTAA 11640
GGCGCAGCGT GTCTCCGCTG GTGCTG AGT AGTCAAACGC GTAGTGGGCG TCGGAGTCGG 11700
CGAAGCGGGC GGGGATGTCG TCGCTGAGAG GGACGAGCCG CCGCCGCCGC CCCCGACCGC 11760
CCTGGCCGCC CAGATGCGCC AGCACGGCCA GGGCGTACGC GGTGTGAAAG AACGCGTCGG 11820 GGGCGGTCCC CTCGAGGGCG CGCATCAGGT TCTCCAGGAG CACGGGGAAG CGCCGCGTCA 11880
CCTCCCCTAG CCACTCGCTC TGGTGGGGGC CAAAGTCGTA GCGCAGGCGC TGGAAGATGC 11940
GCGGGCCGCC TTGGAGCGCG GCCCGGATAG AGTGGCCCAG GGCCCGCAGA CACGCGATCT 12000
GGATGCGCGC GACGAAGGCC ACCTCGGCCG CGATGTCAAA GGGCTGCAGC ACGGGGCGCG 12060
GGTGGCGCAG GGGTCCCTCG AGCGCGGGAA AGCGACGCAG CAGCGCCGTC TGGGCCGCGG 12120 GGGACAGCTG GTGGGGGCGC ACGACGCGCT CGGCGGCACA GGCCTCCGTC AGGGCCGTGG 12180
CCAGATAGGA GGACAACAGC GGGGGGCGGG TGCGTCGCCC GCCCCACGCC ACCGAATTTT 12240
TGTAGGAGAC GACGACGAAG CGCTGCTTGG TCCCGTAGTG ATGGCGCAGG ACCACGGAGA 12300
TGGAGCGACG GCTCCACAGC CAGTCGGGCC GGTCGCCGCC GGCCAGAGGT TCCCACCCGC 12360
GGTCCAGCCA CTCGACCAGC GATCGCGGCT TGGCGGTCCC CGGCACGAGG GTGAGCACGT 12420 CGTTGAGGAC GTCATCGCCC GCGGCCCGGG GGCCCCCCGG GGTGGCAAAG CGCCCCCCGC 12480
CGGGCGGTTC CAGGCCCGCC AGCACCGCCT CCGCGTCCGA CGCGCCCAGG GCTCCCCCGC 12540
TGACGGCNTG GTGGACCAGG GCGCCCTGGC GGAGCCCCGA GGNGACGCCG GAGGCCGCGT 12600
GCTTGGGGCG CGCGCGGACC GGGTGGCGGC GGGTGACGTC CTGCACGGCC CGCTGATCAA 12660
GCTTGTCGAT ACCGTGGACT CTGAAGTAGC CCGTAAGGAA 12701
(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 857 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
Met Ala Glu Thr Met Asn Val Ala Thr Cys Thr His Gln Thr His His 1 5 10 15
Ala Ala Arg Ala Pro Gly Ala Thr Ser Ala Pro Gly Ala Ala Ser Gly
20 25 30
Asp Pro Leu Gly Ala Arg Arg Pro Ile Gly Asp Asp Glu Cys Glu Gln 35 40 45
Tyr Thr Ser Ser Val Ser Leu Ala Arg Met Leu Tyr Gly Gly Asp Leu
50 55 60
Ala Glu Trp Val Pro Arg Val His Pro Lys Thr Thr Ile Glu Arg Gln 65 70 75 80 Gln His Gly Pro Val Thr Phe Pro Asp Ala Ser Ala Pro Thr Ala Arg
85 90 95
Cys Val Thr Val Val Arg Ala Pro Met Gly Ser Gly Lys Thr Thr Ala
100 105 110
Leu Ile Arg Trp Leu Gly Glu Ala Ile His Ser Pro Asp Thr Ser Val 115 120 125
Leu Val Val Ser Cys Arg Arg Ser Phe Thr Gln Thr Leu Ala Thr Arg
130 135 140
Phe Ala Glu Ser Gly Leu Pro Asp Phe Val Thr Tyr Phe Ser Ser Thr 145 150 155 160 Asn Tyr Ile Met Asn Asp Arg Pro Phe His Arg Leu Ile Val Gln Val
165 170 175
Glu Ser Leu His Arg Val Gly Pro Asn Leu Leu Asn Asn Tyr Asp Val
180 185 190
Leu Val Leu Asp Glu Val Met Ser Thr Leu Gly Gln Lys Pro Thr Met 195 200 205
Gln Gln Leu Gly Arg Val Asp Ala Leu Met Leu Arg Leu Leu Arg Thr
210 215 220
Cys Pro Arg Ile Ile Ala Met Asp Ala Thr Ala Asn Ala Gln Leu Val 225 230 235 240 Asp Phe Leu Cys Ser Leu Arg Gly Glu Lys Asn Val His Val Val Ile
245 250 255
Gly Glu Tyr Ala Met Pro Gly Phe Ser Ala Arg Arg Cys Leu Phe Leu 260 265 270 Pro Arg Leu Gly Pro Glu Val Leu Gln Ala Ala Leu Arg Pro Pro Gly
275 280 285
Pro Ala Gly Gly Ala Pro Pro Pro Asp Ala Pro Pro Asp Ala Thr Phe
290 295 300 Phe Gly Glu Val Glu Ala Arg Leu Ala Gly Gly Asp Asn Val Cys Ile
305 310 315 320
Phe Leu Ser Thr Val Ser Phe Ala Glu Val Val Ala Arg Phe Cys Arg
325 330 335
Gln Phe Thr Asp Arg Val Leu Leu Leu His Ser Leu Thr Pro Pro Gly 340 345 350
Glu Val Thr Thr Trp Gly Arg Tyr Arg Val Val Ile Tyr Thr Thr Val
355 360 365
Val Thr Val Gly Leu Ser Phe Asp Pro Pro His Phe Asp Ser Met Phe
370 375 380 Ala Tyr Val Lys Pro Met Asn Tyr Gly Pro Asp Met Val Ser Val Tyr
385 390 395 400
Gln Ser Leu Gly Arg Val Arg Thr Leu Arg Lys Gly Glu Leu Leu Ile
405 410 415
Tyr Met Asp Gly Ser Gly Ala Arg Ser Glu Pro Val Phe Thr Pro Met 420 425 430
Leu Leu Asn His Val Val Ser Ala Ser Gly Gln Trp Pro Ala Gln Phe
435 440 445
Ser Gln Val Thr Asn Leu Leu Cys Arg Arg Phe Lys Gly Arg Cys Asp
450 455 460 Ala Ser His Ala Asp Ala Ala Gln Arg Ser Arg Ile Tyr Ser Lys Phe
465 470 475 480
Arg Tyr Lys His Tyr Phe Glu Arg Cys Thr Leu Ala Cys Leu Ala Asp
485 490 495
Ser Leu Asn Ile Leu His Met Leu Leu Thr Leu Asn Cys Met His Val 500 505 510
Arg Phe Trp Gly His Asp Ala Ala Leu Thr Pro Arg Asn Phe Cys Leu
515 520 525
Phe Leu Arg Gly Ile His Phe Asp Ala Leu Arg Ala Gln Arg Asp Leu
530 535 540 Arg Glu Leu Arg Cys Gln Asp Pro Asp Thr Ser Leu Ser Ala Gln Ala
545 550 555 560
Ala Glu Thr Glu Glu Val Gly Leu Phe Val Glu Lys Tyr Leu Arg Pro
565 570 575
Asp Val Ala Pro Ala Glu Val Val Met Arg Gln Ser Leu Val Gly Arg 580 585 590
Thr Arg Phe Ile Tyr Leu Val Leu Leu Glu Ala Cys Leu Arg Val Pro
595 600 605
Met Ala Ala His Ser Ser Ala Ile Phe Arg Arg Leu Tyr Asp His Tyr 610 615 620
Ala Thr Gly Val Ile Pro Thr Ile Asn Ala Ala Gly Glu Leu Glu Leu 625 630 635 640
Val His Pro Thr Leu Asn Val Ala Pro Val Trp Glu Leu Phe Arg Leu 645 650 655
Cys Ser Thr Met Ala Ala Cys Leu Gln Trp Asp Ser Met Ala Gly Gly
660 665 670
Ser Gly Arg Thr Phe Ser Pro Glu Asp Val Leu Glu Leu Leu Asn Pro 675 680 685 His Tyr Asp Arg Tyr Met Gln Leu Val Phe Glu Leu Gly His Cys Asn 690 695 700
Val Thr Asp Gly Pro Leu Leu Ser Glu Asp Ala Val Lys Arg Val Ala 705 710 715 720
Asp Ala Leu Ser Gly Cys Pro Pro Arg Gly Ser Val Ser Glu Thr Glu 725 730 735
His Ala Leu Ser Leu Phe Lys Ile Ile Trp Gly Glu Leu Phe Gly Val
740 745 750
Gln Leu Ala Lys Ser Thr Gln Thr Phe Pro Gly Ala Gly Arg Val Lys 755 760 765 Asn Leu Thr Lys Arg Ala Ile Val Glu Leu Leu Asp Ala His Arg Ile 770 775 780
Asp His Ser Ala Cys Arg Thr Gln Leu Tyr Ala Leu Leu Met Ala His 785 790 795 800
Lys Arg Glu Phe Ala Gly Ala Arg Phe Lys Leu Arg Ala Pro Ala Trp 805 810 815
Gly Arg Cys Leu Arg Thr His Ala Ser Gly Ala Gln Pro Asn Thr Asp
820 825 830
Ile Ile Ala Ala Leu Ser Glu Leu Pro Thr Glu Ala Trp Pro Met Met 835 840 845 Gln Gly Ala Val Asn Phe Ser Thr Leu 850 855
(2) INFORMATION FOR SEQ ID NO: 28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 470 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
Val Tyr Cys Ser His Ser Ser Ser Pro Met Gly Arg Arg Ala Pro Arg
1 5 10 15
Gly Ser Pro Glu Ala Ala Pro Gly Ala Asp Val Ala Pro Gly Ala Arg 20 25 30
Ala Ala Trp Trp Val Trp Cys Val Gln Val Ala Thr Phe Ile Val Ser
35 40 45
Ala Ile Cys Val Val Gly Leu Leu Val Leu Ala Ser Val Phe Arg Asp 50 55 60 Arg Phe Pro Cys Leu Tyr Ala Pro Ala Thr Ser Tyr Ala Glu Ala Asn 65 70 75 80
Ala Thr Val Glu Val Arg Gly Gly Val Ala Val Pro Leu Arg Leu Asp
85 90 95
Thr Gln Ser Leu Leu Ala Thr Tyr Ala Ile Thr Ser Thr Leu Leu Leu 100 105 110
Ala Ala Ala Val Tyr Ala Ala Val Gly Ala Val Thr Ser Arg Tyr Glu
115 120 125
Arg Ala Leu Asp Ala Ala Arg Arg Leu Ala Ala Ala Arg Met Ala Met
130 135 140 Pro His Ala Thr Leu Ile Ala Gly Asn Val Cys Ala Trp Leu Leu Gln
145 150 155 160
Ile Thr Val Leu Leu Leu Ala His Arg Ile Ser Gln Leu Ala His Leu
165 170 175
Ile Tyr Val Leu His Phe Ala Cys Leu Val Tyr Leu Ala Ala His Phe 180 185 190
Cys Thr Arg Gly Val Leu Ser Gly Thr Tyr Leu Arg Gln Val His Gly
195 200 205
Leu Ile Asp Pro Ala Pro Thr His His Arg Ile Val Gly Pro Val Arg
210 215 220 Ala Val Met Thr Asn Ala Leu Leu Leu Gly Thr Leu Leu Cys Thr Ala
225 230 235 240
Ala Ala Ala Val Ser Leu Asn Thr Ile Ala Ala Leu Asn Phe Asn Phe
245 250 255
Ser Ala Pro Ser Met Leu Ile Cys Leu Thr Thr Leu Phe Ala Leu Leu 260 265 270
Val Val Ser Leu Leu Leu Val Val Glu Gly Val Leu Cys His Tyr Val
275 280 285
Arg Val Leu Val Gly Pro His Leu Gly Ala Ile Ala Ala Thr Gly Ile 290 295 300 Val Gly Leu Ala Cys Glu His Tyr His Thr Gly Gly Tyr Tyr Val Val 305 310 315 320
Glu Gln Gln Trp Pro Gly Ala Gln Thr Gly Val Arg Val Val Ala Ala 325 330 335 Phe Ala Met Ala Val Leu Arg Cys Thr Arg Ala Tyr Leu Tyr His Arg
340 345 350
Arg His His Thr Lys Phe Phe Val Arg Met Arg Asp Thr Arg His Arg 355 360 365 Ala His Ser Ala Leu Arg Arg Val Arg Ser Ser Met Arg Gly Ser Arg 370 375 380
Arg Gly Gly Pro Pro Gly Asp Pro Gly Tyr Ala Glu Thr Pro Tyr Ala 385 390 395 400
Ser Val Ser His His Ala Glu Ile Asp Arg Tyr Gly Asp Ser Asp Gly 405 410 415
Asp Pro Ile Tyr Asp Glu Val Ala Pro Asp His Glu Ala Glu Leu Tyr
420 425 430
Ala Arg Val Gln Arg Pro Gly Pro Val Pro Asp Ala Glu Pro Ile Tyr 435 440 445 Asp Thr Val Glu Gly Tyr Ala Pro Arg Ser Ala Gly Glu Pro Val Tyr 450 455 460
Ser Thr Val Arg Arg Trp 465 470
(2) INFORMATION FOR SEQ ID NO: 29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 687 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
Met Ala Ala Ala Ala Thr Pro Gly Ala Lys Arg Pro Ala Asp Pro Ala
1 5 10 15
Arg Asp Pro Asp Ser Pro Pro Lys Arg Pro Arg Pro Asn Ser Leu Asp 20 25 30
Leu Ala Thr Val Phe Gly Pro Arg Pro Ala Pro Pro Arg Pro Thr Ser
35 40 45
Pro Gly Ala Pro Gly Ser His Trp Pro Gln Ser Pro Pro Arg Gly Gln 50 55 60 Pro Asp Gly Gly Ala Pro Gly Glu Lys Ala Arg Pro Asp Ala Leu Ser 65 70 75 80
Glu Ala Ser Ser Gly Pro Pro Thr Pro Asp Ile Pro Leu Ser Pro Gly 85 90 95 Gly Ala His Ala Ile Asp Pro Asp Cys Ser Pro Gly Pro Pro Asp Pro
100 105 110
Asp Pro Met Trp Ser Ala Ser Ala Ile Pro Asn Ala Leu Pro Pro His 115 120 125 Ile Leu Ala Glu Thr Phe Glu Arg His Leu Arg Gly Leu Leu Arg Gly 130 135 140
Val Arg Ser Pro Leu Ala Ile Gly Pro Leu Trp Ala Arg Leu Asp Tyr 145 150 155 160
Leu Cys Ser Leu Val Val Ser Leu Glu Ala Ala Gly Met Val Asp Arg 165 170 175
Gly Leu Gly Arg His Leu Trp Arg Leu Thr Arg Arg Ala Pro Pro Ser
180 185 190
Ala Ala Glu Ala Val Ala Pro Arg Pro Leu Met Gly Phe Tyr Glu Ala 195 200 205 Ala Thr Gln Asn Gln Ala Asp Cys Gln Leu Trp Ala Leu Leu Arg Arg 210 215 220
Gly Leu Thr Thr Ala Ser Thr Leu Arg Trp Gly Ala Gln Gly Pro Cys 225 230 235 240
Phe Ser Ser Gln Trp Leu Thr His Asn Ala Ser Leu Arg Leu Asp Ala 245 250 255
Gln Ser Ser Ala Val Met Phe Gly Arg Val Asn Glu Pro Thr Ala Arg
260 265 270
Asn Leu Leu Phe Arg Tyr Cys Val Gly Arg Ala Asp Ala Gly Val Asn 275 280 285 Asp Asp Ala Asp Ala Gly Arg Phe Val Phe His Gln Pro Gly Asp Leu 290 295 300
Ala Glu Glu Asn Val His Ala Cys Gly Val Leu Met Asp Gly His Thr 305 310 315 320
Gly Met Val Gly Ala Ser Leu Asp Ile Leu Val Cys Pro Arg Asp Pro 325 330 335
His Gly Tyr Leu Ala Pro Ala Pro Gln Thr Pro Leu Ala Phe Tyr Glu
340 345 350
Val Lys Cys Arg Ala Lys Tyr Ala Phe Asp Pro Ala Asp Pro Gly Ala 355 360 365 Pro Ala Ala Ser Ala Tyr Glu Asp Leu Met Ala Arg Arg Ser Pro Glu 370 375 380
Ala Phe Arg Ala Phe Ile Arg Ser Ile Pro Asn Pro Gly Val Arg Tyr 385 390 395 400
Phe Ala Pro Gly Arg Val Pro Gly Pro Glu Glu Ala Leu Val Thr Gln 405 410 415
Asp Arg Asp Trp Leu Asp Ser Arg Ala Ala Gly Glu Lys Arg Arg Cys
420 425 430
Ser Ala Pro Asp Arg Ala Leu Val Glu Leu Asn Ser Gly Val Val Ser 435 440 445
Glu Val Leu Leu Phe Gly Val Pro Asp Leu Glu Arg Arg Thr Ile Ser
450 455 460
Pro Val Ala Trp Ser Ser Gly Glu Leu Val Arg Arg Glu Pro Ile Phe 465 470 475 480
Ala Asn Pro Arg His Pro Asn Phe Lys Gln Ile Leu Val Gln Gly Asn
485 490 495
Val Pro Arg Gln Pro Leu Ser Arg Leu Pro Pro Ala Thr Ala Pro Gly 500 505 510 Asp Val Pro Arg Gln Ala Pro Arg Gly Arg Gly Gly Gly Arg Asp Val 515 520 525
Pro Pro Gly Gly Arg Pro Arg Ser Ala Arg Arg Ala Trp Arg Gly Pro
530 535 540
Arg Thr Arg Gln Gly Ile Asp Pro Pro Gly Pro Gly Arg Ser Asp Arg 545 550 555 560
Pro Asp His His Pro Arg Pro Arg Arg Ala Gly Asp Ile Pro Gly His
565 570 575
Pro Ala Lys Gln Pro Pro Gly Leu Arg Arg Tyr Ala Arg Gln Val Met 580 585 590 Gly Leu Ala Phe Ser Gly Ala Arg Pro Cys Cys Cys Arg His Asn Val 595 600 605
Ile Ile Thr Asp Gly Gly Glu Val Val Ser Leu Thr Ala His Glu Phe
610 615 620
Asp Val Val Asp Ile Glu Ser Glu Glu Glu Gly Asn Phe Tyr Val Pro 625 630 635 640
Pro Asp Met Arg Val Val Thr Arg Ala Pro Gly Pro Gln Tyr Arg Arg
645 650 655
Ala Ser Asp Pro Pro Ser Arg His Thr Arg Arg Arg Asp Pro Asp Val 660 665 670 Ala Arg Pro Pro Ala Thr Leu Thr Pro Pro Leu Ser Asp Ser Glu 675 680 685
(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 107 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
Val Thr Phe Leu Gly Arg His Arg Ala Gly Ala Glu Glu Gly Val Thr
1 5 10 15
Phe Arg Leu Glu Asp Gly Arg Gly Ala Pro Ala Gly Arg Gly Gly Ala 20 25 30
Pro Gly Pro Ala Lys Ala Ser Ile Leu Pro Asp Gln Ala Val Pro Ile
35 40 45
Ala Leu Ile Ile Thr Pro Val Arg Val Glu Pro Gly Ile Tyr Arg Asp 50 55 60 Ile Arg Arg Asn Ser Arg Leu Ala Phe Asp Asp Thr Leu Ala Lys Leu 65 70 75 80
Trp Ala Ser Arg Ser Pro Gly Arg Gly Pro Ala Ala Ala Asp Thr Thr
85 90 95
Ser Ser Ser Pro Thr Ala Gly Arg Ser Ser Arg 100 105
(2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 525 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
Val Gly Gly Arg Arg Pro Gly Gly Arg Met Asp Glu Ser Gly Arg Gln 1 5 10 15
Arg Pro Ala Ser His Val Ala Ala Asp Ile Ser Pro Gln Gly Ala His
20 25 30
Arg Arg Ser Phe Lys Ala Trp Leu Ala Ser Tyr Ile His Ser Leu Ser 35 40 45 Arg Arg Ala Ser Gly Arg Pro Ser Gly Pro Ser Pro Arg Asp Gly Ala 50 55 60
Val Ser Gly Ala Arg Pro Gly Ser Arg Arg Arg Ser Ser Phe Arg Glu 65 70 75 80
Arg Leu Arg Ala Gly Leu Ser Arg Trp Arg Val Ser Arg Ser Ser Arg 85 90 95
Arg Arg Ser Ser Pro Glu Ala Pro Gly Pro Ala Ala Lys Leu Arg Arg
100 105 110
Pro Pro Leu Arg Arg Ser Glu Thr Ala Met Thr Ser Pro Pro Ser Pro 115 120 125
Pro Ser His Ile Leu Ser Leu Ala Arg Ile His Lys Leu Cys Ile Pro
130 135 140
Val Phe Ala Val Asn Pro Ala Leu Arg Tyr Thr Thr Leu Glu Ile Pro 145 150 155 160
Gly Ala Arg Ser Phe Gly Gly Ser Gly Gly Tyr Gly Glu Val Gln Leu
165 170 175
Ile Cys Glu His Lys Leu Ala Val Lys Thr Ile Arg Glu Lys Glu Trp 180 185 190 Phe Ala Val Glu Leu Val Ala Thr Leu Leu Val Gly Glu Cys Ala Phe 195 200 205
Cys Gly Gly Arg Thr His Asp Ile Arg Gly Phe Ile Thr Pro Leu Gly
210 215 220
Phe Ser Leu Gln Gln Arg Gln Ile Val Phe Pro Ala Tyr Asp Met Asp 225 230 235 240
Leu Gly Lys Tyr Ile Gly Gln Leu Ala Ser Leu Arg Ala Thr Thr Pro
245 250 255
Ser Val Ala Thr Ala Leu His His Cys Phe Thr Asp Leu Ala Arg Ala 260 265 270 Val Val Phe Leu Asn Thr Arg Cys Gly Ile Ser His Leu Asp Ile Lys 275 280 285
Cys Ala Asn Val Leu Val Met Leu Arg Ser Asp Ala Val Ser Leu Arg
290 295 300
Arg Ala Val Leu Ala Asp Phe Ser Leu Val Thr Leu Asn Ser Asn Ser 305 310 315 320
Thr Ile Ser Arg Gly Gln Phe Cys Leu Gln Glu Pro Asp Leu Glu Ser
325 330 335
Pro Arg Gly Phe Gly Met Pro Ala Ala Leu Thr Thr Ala Asn Phe His 340 345 350 Thr Leu Val Gly His Gly Tyr Asn Gln Pro Pro Glu Leu Leu Val Lys 355 360 365
Tyr Leu Asn Asn Glu Arg Ala Glu Phe Asn Asn Arg Pro Leu Lys His
370 375 380
Asp Val Gly Leu Ala Val Asp Leu Tyr Ala Leu Gly Gln Thr Leu Leu 385 390 395 400
Glu Leu Leu Val Ser Val Tyr Val Ala Pro Ser Leu Gly Val Pro Val
405 410 415
Thr Arg Val Pro Gly Tyr Gln Tyr Phe Asn Asn Gln Leu Ser Pro Asp 420 425 430 Phe Ala Val Leu Ala Tyr Arg Cys Val Leu His Pro Ala Leu Phe Val 435 440 445
Asn Ser Ala Glu Thr Asn Thr His Gly Leu Ala Tyr Asp Val Pro Glu 450 455 460 Gly Ile Arg Arg His Leu Arg Asn Pro Lys Ile Arg Arg Ala Phe Thr 465 470 475 480
Glu Gln Cys Ile Asn Tyr Gln Arg Thr His Lys Ala Val Leu Ser Ser 485 490 495 Val Ser Leu Pro Pro Glu Leu Arg Pro Leu Leu Val Leu Val Ser Arg 500 505 510
Leu Cys His Ala Asn Pro Ala Ala Arg His Ser Leu Ser 515 520 525
(2) INFORMATION FOR SEQ ID NO: 32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 79 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
Met Ala Val Ser Asp Leu Arg Arg Gly Gly Arg Leu Ser Leu Ala Ala
1 5 10 15
Gly Pro Gly Ala Ser Gly Asp Glu Arg Arg Arg Asp Glu Arg Leu Thr 20 25 30
Arg His Arg Asp Ser Pro Ala Arg Ser Arg Ser Arg Lys Leu Asp Arg
35 40 45
Arg Arg Asp Pro Gly Arg Ala Pro Glu Thr Ala Pro Ser Arg Gly Glu 50 55 60 Gly Pro Leu Gly Arg Pro Asp Ala Arg Arg Leu Arg Glu Cys Met 65 70 75
(2) INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 217 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
Met Ser Arg Asp Ala Ser His Ala Ala Leu Arg Arg Arg Leu Ala Glu
1 5 10 15
Thr His Leu Arg Ala Glu Val Tyr Arg Asp Gln Thr Leu Gln Leu His 20 25 30
Arg Glu Gly Val Ser Thr Gln Asp Pro Arg Phe Val Gly Ala Phe Met
35 40 45
Ala Ala Lys Ala Ala His Leu Glu Leu Glu Ala Arg Leu Lys Ser Arg 50 55 60 Ala Arg Leu Glu Met Met Arg Gln Arg Ala Thr Cys Val Lys Ile Arg 65 70 75 80
Val Glu Glu Gln Ala Ala Arg Arg Asp Phe Leu Thr Ala His Arg Arg
85 90 95
Tyr Leu Asp Pro Ala Leu Ser Leu Asp Ala Ala Asp Asp Arg Leu Ala 100 105 110
Asp Gln Glu Glu Gln Leu Glu Glu Ala Ala Ala Asn Ala Ser Leu Trp
115 120 125
Gly Asp Gly Asp Leu Ala Asp Gly Trp Met Ser Pro Gly Asp Ser Asp
130 135 140 Leu Leu Val Met Trp Gln Leu Thr Ser Ala Pro Lys Val His Thr Asp
145 150 155 160
Ala Pro Ser Arg Pro Gly Ser Arg Pro Thr Tyr Thr Pro Ser Ala Ala
165 170 175
Gly Arg Pro Asp Ala Gln Ala Ala Pro Pro Pro Glu Thr Ala Pro Ser 180 185 190
Pro Glu Pro Ala Pro Gly Pro Ala Ala Asp Pro Ala Ser Gly Ser Gly
195 200 205
Phe Ala Arg Asp Cys Pro Asp Gly Glu 210 215
(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 493 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
Val Tyr Ser Arg Pro Pro Gly Val Ala Ala Gly Ser Gly Pro Cys Thr 1 5 10 15
Pro Arg Pro Gly Gly Ala Ser Arg Pro Asn Val Gly Ala Gly Pro Arg
20 25 30
Gly Trp Arg Leu Gly Ser Ser Arg Arg Pro Arg Ala Arg Pro Thr Ser 35 40 45
Asp Ser Phe Ala Pro Thr Pro Leu Thr Ser Ala Ala Pro Asp Ala Met
50 55 60
Phe Gly Gln Gln Leu Ala Ser Asp Val Gln Gln Tyr Leu Glu Arg Leu 65 70 75 80 Glu Lys Gln Arg Gln Gln Lys Val Gly Val Asp Glu Ala Ser Ala Gly
85 90 95
Leu Thr Leu Gly Gly Asp Ala Leu Arg Val Pro Phe Leu Asp Phe Ala 100 105 110
Thr Ala Thr Pro Lys Arg His Gln Thr Val Val Pro Gly Val Gly Thr 115 120 125
Leu His Asp Cys Cys Glu His Ser Pro Leu Phe Ser Ala Val Ala Arg
130 135 140
Arg Leu Leu Phe Asn Ser Leu Val Pro Ala Gln Leu Arg Gly Arg Asp 145 150 155 160 Phe Gly Gly Asp His Thr Ala Lys Leu Glu Phe Leu Ala Pro Glu Leu
165 170 175
Val Arg Ala Val Ala Arg Leu Arg Phe Arg Glu Cys Ala Pro Glu Asp 180 185 190
Ala Val Pro Gln Arg Asn Ala Tyr Tyr Ser Val Leu Asn Thr Phe Gln 195 200 205
Ala Leu His Arg Ser Glu Ala Phe Arg Gln Leu Val His Phe Val Arg
210 215 220
Asp Phe Ala Gln Leu Leu Lys Thr Ser Phe Arg Ala Ser Ser Leu Ala 225 230 235 240 Glu Asn Thr Gly Pro Pro Lys Lys Arg Ala Lys Val Asp Val Ala Thr
245 250 255
His Gly Gln Thr Tyr Gly Thr Leu Glu Leu Phe Gln Lys Met Ile Leu 260 265 270
Met His Ala Thr Tyr Phe Leu Ala Ala Val Leu Leu Gly Asp His Ala 275 280 285
Glu Gln Val Asn Thr Phe Leu Arg Leu Val Phe Glu Ile Pro Leu Phe
290 295 300
Ser Asp Thr Ala Val Arg His Phe Arg Gln Arg Ala Thr Val Phe Leu 305 310 315 320 Val Pro Arg Arg His Gly Lys Thr Trp Phe Leu Val Pro Leu Ile Ala
325 330 335
Leu Ser Leu Ala Ser Phe Arg Gly Ile Lys Ile Gly Tyr Thr Ala His 340 345 350 Ile Arg Lys Ala Thr Glu Pro Val Phe Asp Glu Ile Asp Ala Cys Leu
355 360 365
Arg Gly Trp Phe Gly Ser Ser Arg Val Asp His Val Lys Gly Glu Thr
370 375 380 Ile Ser Phe Ser Phe Pro Asp Gly Ser Arg Ser Thr Ile Val Phe Ala
385 390 395 400
Ser Ser His Asn Thr Asn Val Ser Thr Pro Ser Ser Arg Gly Ala Cys
405 410 415
Phe Pro Gly Ala Ala Leu Pro Glu Ile Asp Arg Gln Thr Asn Thr Ala 420 425 430
Arg Arg Glu Cys Gly Thr Trp Gln Pro Pro Pro Pro Trp Arg Gly Glu
435 440 445
Ala Leu Leu Phe Ile Cys Asn Arg Thr Met Arg Leu Trp Pro Arg Pro 450 455 460 Ala Arg Pro Arg Gly Ser Ser Leu Gln Thr Gly Gly Trp Tyr Thr Met 465 470 475 480
Thr Glu Arg Arg Gly Ala Thr Arg Arg Trp Ser Gly Gly 485 490
(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 399 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
Val Phe Leu Phe His Arg Ser Pro Thr Pro Pro Pro Lys Ser Tyr Thr
1 5 10 15
Arg Trp Pro Leu Cys Phe Trp Cys Val Ser Gly Pro Phe Pro Thr Thr 20 25 30
Asn Met Ala Gln Arg Ala Val Trp Arg Pro Gln Gly Thr Pro Gly Pro
35 40 45
Pro Gly Ala Ala Ala Pro Pro Gly His Arg Gly Ala Pro Pro Asp Ala 50 55 60 Arg Ala Pro Asp Pro Gly Pro Glu Ala Asp Leu Val Ala Arg Ile Ala 65 70 75 80
Asn Ser Val Phe Val Trp Arg Val Val Arg Gly Asp Glu Arg Leu Lys 85 90 95 Ile Phe Arg Cys Leu Thr Val Leu Thr Glu Pro Leu Cys Gln Val Pro
100 105 110
Asp Pro Asp Pro Glu Arg Ala Leu Phe Cys Glu Ile Phe Leu Tyr Leu 115 120 125 Trp Lys Ala Leu Arg Leu Pro Ser Asn Thr Phe Phe Ala Ile Phe Phe 130 135 140
Phe Asn Arg Glu Arg Arg Tyr Cys Ala Thr Val His Leu Arg Ser Val 145 150 155 160
Thr His Pro Arg Thr Pro Leu Leu Cys Thr Leu Ala Phe Gly His Leu 165 170 175
Glu Ala Asp Pro Glu Glu Thr Pro Asp Pro Ala Ala Glu Gln Leu Ala
180 185 190
Asp Glu Pro Val Ala His Glu Leu Asp Gly Ala Tyr Leu Val Pro Thr 195 200 205 Glu Pro Pro Pro Asn Pro Gly Ala Cys Cys Ala Leu Gly Pro Gly Ala 210 215 220
Trp Trp His Leu Pro Gly Gly Arg Ile Tyr Cys Trp Ala Met Asp Asp 225 230 235 240
Asp Leu Gly Ser Leu Cys Pro Pro Gly Ser Arg Ala Arg His Leu Gly 245 250 255
Trp Leu Leu Ser Arg Ile Thr Asp Pro Pro Gly Gly Gly Gly Ala Cys
260 265 270
Ala Pro Thr Ala His Ile Asp Ser Ala Asn Ala Leu Trp Arg Ala Pro 275 280 285 Ala Val Ala Glu Ala Cys Pro Cys Val Ala Pro Cys Met Trp Ser Asn 290 295 300
Met Ala Gln Arg Thr Leu Ala Val Arg Gly Asp Ala Ser Leu Cys Gln 305 310 315 320
Leu Leu Phe Gly His Pro Val Asp Ala Val Ile Leu Arg Gln Ala Thr 325 330 335
Arg Arg Pro Arg Ile Thr Ala His Leu His Glu Val Val Val Gly Arg
340 345 350
Asp Gly Ala Glu Ser Val Ile Arg Pro Thr Ser Ala Gly Trp Arg Leu 355 360 365 Cys Val Leu Ser Ser Tyr Thr Ser Arg Leu Phe Ala Thr Ser Cys Pro 370 375 380
Ala Val Ala Arg Ala Val Ala Arg Ala Ser Ser Ser Asp Tyr Lys 385 390 395
(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 452 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
Phe Leu Thr Gly Tyr Phe Arg Val His Gly Ile Asp Lys Leu Asp Gln 1 5 10 15
Arg Ala Val Gln Asp Val Thr Arg Arg His Pro Val Arg Ala Arg Pro
20 25 30
Lys His Ala Ala Ser Gly Val Xaa Ser Gly Leu Arg Gln Gly Ala Leu 35 40 45 Val His Xaa Ala Val Ser Gly Gly Ala Leu Gly Ala Ser Asp Ala Glu 50 55 60
Ala Val Leu Ala Gly Leu Glu Pro Pro Gly Gly Gly Arg Phe Ala Thr 65 70 75 80
Pro Gly Gly Pro Arg Ala Ala Gly Asp Asp Val Leu Asn Asp Val Leu 85 90 95
Thr Leu Val Pro Gly Thr Ala Lys Pro Arg Ser Leu Val Glu Trp Leu
100 105 110
Asp Arg Gly Trp Glu Pro Leu Ala Gly Gly Asp Arg Pro Asp Trp Leu 115 120 125 Trp Ser Arg Arg Ser Ile Ser Val Val Leu Arg His His Tyr Gly Thr 130 135 140
Lys Gln Arg Phe Val Val Val Ser Tyr Lys Asn Ser Val Ala Trp Gly 145 150 155 160
Gly Arg Arg Trp Pro Leu Leu Ser Ser Tyr Leu Ala Thr Ala Leu Thr 165 170 175
Glu Ala Cys Ala Ala Glu Arg Val Val Arg Pro His Gln Leu Ser Pro
180 185 190
Ala Ala Gln Thr Ala Leu Leu Arg Arg Phe Pro Ala Leu Glu Gly Pro 195 200 205 Leu Arg His Pro Arg Pro Val Leu Gln Pro Phe Asp Ile Ala Ala Glu 210 215 220
Val Ala Phe Val Ala Arg Ile Gln Ile Ala Cys Leu Arg Ala Leu Gly 225 230 235 240
His Ser Ile Arg Ala Ala Leu Gln Gly Gly Pro Arg Ile Phe Gln Arg 245 250 255
Leu Arg Tyr Asp Phe Gly Pro His Gln Ser Glu Trp Leu Gly Glu Val
260 265 270
Thr Arg Arg Phe Pro Val Leu Leu Glu Asn Leu Met Arg Ala Leu Glu 275 280 285
Gly Thr Ala Pro Asp Ala Phe Phe His Thr Ala Tyr Ala Val Leu Ala
290 295 300
His Leu Gly Gly Gln Gly Gly Arg Gly Arg Arg Arg Arg Leu Val Pro 305 310 315 320
Leu Ser Asp Asp Ile Pro Ala Arg Phe Ala Asp Ser Asp Ala His Tyr
325 330 335
Ala Phe Asp Tyr Tyr Ser Thr Ser Gly Asp Thr Leu Arg Leu Thr Asn 340 345 350 Arg Pro Ile Ala Val Val Ile Asp Gly Asp Val Asn Gly Arg Glu Gln 355 360 365
Ser Lys Cys Arg Phe Met Glu Gly Ser Pro Ser Thr Ala Pro His Arg
370 375 380
Val Cys Glu Gln Tyr Leu Pro Gly Glu Ser Tyr Ala Tyr Leu Cys Leu 385 390 395 400
Gly Phe Asn Arg Arg Leu Cys Gly Leu Val Val Phe Pro Gly Gly Phe
405 410 415
Ala Phe Thr Ile Asn Thr Ala Ala Tyr Leu Ser Leu Ala Asp Pro Val 420 425 430 Ala Arg Ala Val Gly Leu Arg Phe Cys Arg Gly Ala Gly Thr Gly Pro 435 440 445
Gly Leu Val Arg 450
(2) INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26339 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GGGAGCGAGG TGAGGGAATG AGAGAAGAAA GGAGGAGGAT AGTAAGTGGG GAGATGAAGA 60
AGAAAATAGA CGCTGGGAAG GAGAATACGA CGAGGAGGGA GGAAGGAGAA GAGAGTCGGA 120
AAGCATAAGA GGTGGAAAGG AGGTGTGAGT AATTGAGCGG AGATGAGAGG ATAGGATAAG 180
GCGAGAGACG GAAGATAAGA AAGTGAAGGG AGTAAGGGTA AGGTGAGAGA GAGAAAGAGG 240 AGGATTATGT GATGGTTAGG GGAGAGAGAG GAATATGTGG AGAAATTGTG AGGAAGGAAA 300
AGAGAGAGAA GAGTGTGGGT ATAAAGGAGA TATGGATGGA ATAGAGTAAA TTGAAGAAGG 360
GAAGAGATGG TAAGAATGGA GTGAAGAGGA GGTAAAGAAT TTAGTAAAGG AGTGGTGATG 420
ATGGATAAAA AAGTGGAAAT GGGGTAAGAA AGAAGAGAGA GGAGGAGGAA AAAAAAAAAA 480 TAAAAAAAGG GCCTTGCCGC CGGCCCTGGC GTACGCGCTA TA AAGCCCA TGCGGTATTG 540
GATGAGTTCC CGCGCGCCCC GGAACTCCTC CACCGCCCAC GGGGCCAGGT CCGCGGCCGC 600
CGCGTCAAAC TCCGCCAGCA GGCGCCCCAG GGCGTCAAAG TTCATCTCCC AGGGCACCCT 660
GCGCACCACC TCATCCCGCA GCCGGGCGCA CAGGGCGGTG TGCTTGGTGA CGCGCGCGCC 720 CAGCTCCTCC ACGGCCTCCG CGCGCTCGGC GCCCTTGGCG CCCAGGACGC CCTGGTACCT 780
GGCGGAAAGG CGCTCGTAGG CCGGCTGGGC CCGCAGCCCC GACACCGTGT TGGTGGTGTC 840
CTGCAGGGCG CGCAGCTGCT CGTGCATGGC GCGGAACCCC TCGGGGGACT TCCAGGCGCC 900
CCCCCGGACG CGGCCAAAGC GACCCCAGAC CTCGTCCCAC TCCGCCTCGG CCTCCTCCAG 960
GGACCTCCGC AGGGCGTCGA CGCGGCGCCG AGTATCAAAG AGCGCCCCCA GGCGGCCGGC 1020 GTGCCGCGCC AGGGGGCCGG GGCCGTCGCC GCGGGCGGCG CTTAGCGGGT GCGTCTCGAA 1080
GGTGCGCTGG GCGTGCTCTA GCCAGATAAC CGCGGGCACG TCGAGCTCGC GCGTTTTCTC 1140
GGTCTGATCC AACAGAACCT CGACCTGGTC GGCGATCTCC GCCACCGAGC GCGCCTGGTC 1200
GAGCGTCTTG GCCACGGTCG CCGGGACGGC GACCACCTTC AGCATGGTCT TGAGGTTGGC 1260
CAGGCCCTCG GCCTCGATCT GGGCCCGGCG CTCGCGCGCG GCCAGCGCCT CCCGCAGGCC 1320 CGCCATGACC CGCTCGGTGG CCTCCGCGCG CTGCTGTTTG GCGCGCACCA CTGCGTCCTT 1380
GGTCTCGGCC GTGTCCTGCC GGGTCACGAA GGCGACATAC TCGGCGTACG CCGTGTTCTT 1440
CACGGGGCTC TGGTCCACGC GCTCCAACGC CGCCGCGCAC GCGACCAGCG CGTCCTCGCT 1500
GGGACACGGC AGGGTGACCC CGGTCCGGAC CAGCTCCGCG GTGGCCTCCG GGTCATTCCG 1560
GGCCGCGGAT ATCTGCTCCG CGGCGGCCGC CAGGTCCAGG GGCACGCCGC CGAGCGCCCG 1620 GTGCACGTCG GCCCGGATGG CGTCCAGGCG ATCGCGGAGC TCCACGTAGT CGGCGTAGCC 1680
ATGTTGGAAG AACGGCACGT ACCGGCGCAG GCCGGGCACG CTCGTCATGT CGTCCGCCAG 1740
GCGCCCCACG GCCTCGTGGT AGTCGATAAA CCCGTCGCCC GCCTGGGCCA TTTCCAGGAG 1800
CCCCTCCGCG ATGCGCAGCA GCCGCGCCAG GGGCTCGGCG TCGACCCGAA ACATGTCGGC 1860
GTAGGTTTCG GCGGCGGCGT GGAACGCCGC GCTCCAGCCG AGGCGGTGGA TGGCGGCGAG 1920 CGGGGGGAGC ATGGGGTGGC GCTGGTTCTC GGGGGTGTAG GGGTTAAACG CGAAGGCCGT 1980
ATCCAGGGCG AGGGTGACCG CCTCGGCGTT GGCCGCGAGC GCCTGCTCGG CGCGCTTGCG 2040
GAAGTCCCGG GGGTTGTAGC CGTGCGTGCC CGCCAGCGCC TGCAGGCGGC GCAGCTCGAC 2100
CACGTCGAAC TCGGCGCGGT TCTCGACGCG GTCCAGCGCC GCCTCGACGC CGGCGGCCCA 2160
GCGCTCGCTG CTGCCCCGGG CGCGCTGGGC CGCCATCTTC CCCGTCAGGT CGGCGACGGC 2220 GGCCTCAAGT TCCTCGGCGC GGCGTCGCGT GGCGCCGATG ACCTTGCCCA GCTCCTGCAG 2280
GGCGCGCCCG CTGGGGGAAT GGTCCCCGGC CGTCCCTTCG GCGTGCAGCA GGCCCCCGAA 2340
CCCAGCCTCG TGCCCCGCGA GGCTTTCCCG AGCAGCGGTG GTCGCGCGGG CCGCGGCATC 2400
GATGAGGGCG GCATGGTCCC CCTCCGGCTG GGCGCAGGCC CGGCGCGCAT GGAATACCAG 2460
GTCGGCGGCC GCCGACCCCA GGGTGGTGAG CTTGTCGATG GCCCCCCGCG CCTCCAGGGC 2520 CAGCCGAGTC GCCTTTACAT ACCCCGCGGC GCTATCGGCC AGCGCCGCGA GGAAGGACAG 2580
GGGCGAGGCC GGGTCGCGGG CGGCCGCGCC CAGGGCCGAC ACCGCGTCCG CCAGGGCGCC 2640
ATGCGCCCGC ACGGCCGCGT CCACCGTCGC CGCGGGACTT GCCGTCGCGA CGGCGGCGCT 2700
CCCGGCGTTG ATGGCGTTTG ACACGGCTTT GGCGATTGTG GGGGCGTGAT CGGAAAAGAA 2760
CTGCACGAGG ACCGGCGTCT CGGGGGCGTC NGCGAACATG GTCTTCAGCA CCACCACTAA 2820 GGCGGGATGC ATGCCGGCCA CAACCGTCTC GGTATCCGGG GTCTGGTGTT CCANGGCCTC 2880
CCGGTACTGC CCCATCACCC CCCACATGTC CGCCCGCAGC CCCGCCTTGA CTTCCGGGGG 2940
GGGGCCCCCG GACGGCATCG GCCAGGTCGG TCCACCCCGC GGGGCAGGGA GGCCCGCAGG 3000
GTCGCCAGCA CGGCCGGACA CGCCTTTAGC CCCACAAAGT CCGGGAGGGG CCGCAGGACC 3060 CCTTGGAGTT TGTGCAGGAA CTTCTCCCGG GCGTCGTGGG CCACCTTGGC GCGCTCCCGC 3120
GCGTCGTTGA GCATCGCCTC CAGGGCGTGG GCGCGCTCCC GAAGCCGGGA GCGCGCCTCC 3180
GGAGCGAGCT CCGCCGTCAT CTTGGCCGCC TCCATGGCCC TCGCCTGCCG CAGCGCGTCT 3240
TCGGCCATGC GCGTGGCCTC GGGGGACAGC CCGCCCCCGT CGACGTACGG CGCGGGGCCG 3300 GTCGCCGGGA CGAAGGCCGC GTCGCTGTCC AGCTGCTGCG CGAGCGCCGC GTCGAGGGCG 3360
TCGAAGCGCT GCAGTTCGGC CAGCCCCGAG CTGCGCCGCG CCTGCTGGTC GTTGATGCCG 3420
TGGATGCTGC GCGCCAGCTC TTCCAGGGGC TTGCGTTCGA TGAGCCCCTG GGTCGCGGCG 3480
TCGGTCAGGA CCGAGAGCCA GGCCGCCAGG TCCTCGGGGG CATCTAGGGT CTGGCCCCGC 3540
TGGAGCAGGT CCCGCAGCAG GATGGCCTGG GGGCTGGTGG CGAGGGGGGG CGGGGGGGGG 3600 AGCGCGGCGC GCTGAGCGAC GTCCCGCGTG TGTTGGTCAA AGGCCGGTAG CGATTCCAGC 3660
AACTGGACCA TGGGCACGAC CGCGGCCGAG GCCACGTGAA ACCGACAGTC GTGGCTGTCG 3720
CTGGCCTGCA GGGCCTTCGC GCTGTATACG GCTCCCCGGT GGAAGTACTC CTTGATCGCG 3780
CTTTCGATCG CCCGGCGGGC CTGGATCCGC ACGTCCTCCA GCCGCGCATG GATGGCCTCG 3840
GGGCCCAGGG CGGGCGGGCA CGGGGCCCTG CCGCCGGCGC CCCGGGGGGG GGGGGCAACG 3900 GGCATCACGG TCAGGGGGCC CGGCGCGCTG CGAGACCGAG TAGACCCCGC GGGCGAGGGC 3960
GTATAAGGCC TCGCGCATCT CGCGGGCCTC CGCCTCGACC CGCATCTTTT TGCCCCGGGC 4020
AAAATGGGCC AGCGCCTGGA TCCGATGGAG AAGCGGCTCC GGGTGCGTCG GGGTGGCGGG 4080
GGCGAACAGG GTGTTCGGGT GGGCGCGCGA GCGCTCCAGG AGCCACTTTC CGAGGCGTGC 4140
GTACAGATTG GCCGGCGGGG CGGCGCGCAG CTGCAGATCC AGGTCCGCGA GGTCCCCGTA 4200 AAAGGCGTCC GTCTCCCGAA TAAAGTCCCT GGCGACCAGG ACCAGCTTAG CGAGGGCCAG 4260
GCGCCCGATC TGCGAATTTT TGTCCAGCAC GTGCTGGATG AGGGGCCGGT GGGCGGCCAC 4320
GTCCGCCAGG CTCATGCGCG TGGACGCCAG GAAGTCCCCG ACGGCCGTTT TGCGGGGCGG 4380
CATGCGCAGG GTGAAGTCCA GCAGGGCCGC GGCCGGGCCG GCCACCCCGG CCTGCGTATG 4440
CGTGCGGGCC CCGTTCTCGA TCAAAAAGGC GAGGACGCGC TCAAAGAAGA AGATGACGCA 4500 GAGCTCCAAC AGCCCCGGGT GCGCCGGGTA CGGCGACCGC AGGGCGTTGA TGGTGAGCTG 4560
CGAACACGCG GCCACCTCGC GGGCCAGGGC GGCATCGCGC GCCGCGAGCC GGACCGCCGT 4620
GGCGGCCACA TTGGGGTGGA CCTCGAACAG CTGCGCCAGG TCGGCGCCGG GGGGCTCCGG 4680
GGGGCGGCGG GCCCCCAGCG TCTCGAGCAC GGACGGCGAC GACGGGCTCG CGGGCCCGTC 4740
ATCGCCGCCT CCCTGCCCGG ACTGCGGGGG GGTATCCGGT GCGGGAGGGA CCGTGGCGGC 4800 TATGGGCGTC GGGGAGGAGG CGGGGACCTC GGCGGCGACG GGGGCCTTCT TCTTGGGCGC 4860
GGACTTCTTC TTGGCCTTGG CGGGCGGGGC CTTGGGGGCG GGCCTCTCGC CCGAGGTCAG 4920
ATCCTCCACG CTGGACGGTG GGGTCCAGGT GGGCCGGCGG CGCTTGGGCA AGCCGGTAGA 4980
ATAGCGCGCC CGGTGGCAAC CCACCGGCAC TGCCCCCACC TCCAGGACCC GCAGGTCCTC 5040
GGCTTCTTCG GCCGCGTCCC CGGCGGGTGT CTGCGGGGGC GGGGCGGCGT GCGGTGGACC 5100 CGAGGCCGCG GCGTCCGGGG CCGAGGGCTT TGCGGGCGGG GTCCCCTCCA GGGCTGCTGC 5160
CCACACATCA TCGGGGGGGC GGTTTGGGTG CCCCGCCTGC GGTGTGTTGG GTGGGCCCGA 5220
GGCCCCCCGG GGGGCCTCGG GGGGCCGGTC GGCCCGAGGG GTCTGGACGT GGGTGGGCGC 5280
GGGGAGCGCG GGGACGACCG GGCCCGAGCC TTCTCCGTCC CCCCTGGGGA CCACACCGAC 5340
AAAGAGCGCC CCTAGCCCCC CGATCTCGCC CCGCAGGGGG TGGGTGATGG CCACGCGCCG 5400 CTCGACGAAC GGTTCGTCCT GCAGGTAAGT CTCGCTGGCC CCGTAGAGGT GCAGGGCCGC 5460
GGCGGTCAGG TCCGCCGGCG CCACGGCCCC CGGGCCGGAG GGCACAAAAA ACACCATGGC 5520
GCCCGCCCAC CGCACCTTGG GGCGGTCGTG GGCGTAATAC GTCAGGTACG GGTACACGTC 5580
GCCCGCCCGC ACCTTGGCGA TAAACGCGGG CGTTCCCGCG GGCAGGCCGT GCGGGTCAAA 5640 CAGATAGGCC GTGTCGCCGT CCCGGTAGAG CCCCATGCCC AGGGGGCCGA TGGTCAGGAG 5700
CGTGTAGGAC AGCGGCCGCA TGGCCCAGGG GCCGGCGAAG AACGTGTGCG CGGGGCATTG 5760
CGTCTCCAGC AGCCCCGCCG TGGGCTCCCC GAAGAAGCCC ACCTCGCCGT ACACCCGCGA 5820
AAACACGCAA CGCAGGCCGC CGCGCGCCGC CGGGTACTCC AGGAAGTTGG GGAGCTCGAT 5880 AATGGAACAC ATGCGCGGCG GCCCGGAGCC CGCGGCCGCG CGCGTCCACT CGCCCCCCTC 5940
CACCAGACAT CCCTCAATGG CCTCCGCGGA CAGCACGTCG CGGGGCCCCA CGTCGAAAAG 6000
AAGACTGAGA AACGACAGGG ACGAGCGCAT GCACGATACC GACCCCCCCG GCTCCAGATC 6060
GGTCGCGAAC TGGTTCCGAA CACCGGTGAC CACGATATCG CGATCCCCCT GGGCGCTTCA 6120
TCGTGGGGTG AGGTAGCGCG GCCGGAATCA TGTGTGCCGC GCCCGCCACG AGCGGGGCCT 6180 GTTTATGGGC CGGGCGTCCC GATGAGTACT GTTGTTTCCG CCGCCCGAAC CCCCCCCGCC 6240
CATCAACCGC CTGTTCGTCC CCCTAACCAC ACACCCGGTA TCGCGTGAGC TCGTCGTAAC 6300
TGAACAGGAG CACGCGGGCG CAGGTCGCCC ACGGGCCCCA CGCCAGGCGC AGCGCCGCAA 6360
CCGTGTACGG GTCGTACACG CCTTGGGCGT CGCACGCGAC CGGCAGGGAG ACGAACAGCC 6420
CGCCCGCGCT GGGGACGCGC GGCAGGAGGT CCGGGTGCGC CGGGATGACG GGGGCTAGGA 6480 TCGCCCCCAC CGCATCCGCC GGCACGTAGG CGGCAAACGC CGAACGCCAC GGGGTGCAGT 6540
CGCCGGGCGC GTGGGGCCGG GTCTGGGTTT CGACCCGGAA GTTCGCGGCC GCCCCGCCGT 6600
CGGGGCGGCC GCGCACGAGG GCGGACAGCG GGACCCCCGC CGCCGCCAGG CACTCGCTGG 6660
AGATGATGAC GTGAATCAGC GAGGCGGGGC TGCTCGGGTC CCGGGTGAGA TCGTATTGGA 6720
CCTCGTTGGC AAAGTGCGCG TTCATGGCCC GGCCGGCGGT GCGAGCCCTT CCCGGTGCCG 6780 GAAGGGGCGT GGGTGGGGGG TGCGTGTGCG CGTCCTCGGG GCCCGCGGGC GCACGTGCGC 6840
TTATACGCTG TGTGTTTCGT CTGTCCCCAG GGAATCCGGG GCCAGGACTT TAACCTGCTT 6900
TTCGTCGACG AGGCCAACTT TATTCGCCCG GATGCGGTCC AGACGATTAT GGGCTTTCTC 6960
AATCAGGCCA ACTGCAAGAT CATCTTCGTC TCGTCGACCA ACACCGGGAA GGCCAGCACG 7020
AGCTTTTTGT ACAACCTCCG CGGGGCCGCC GACGAGCTGC TCAACGTGGT CACCTATATA 7080 TGCGACGACC ACATGCCGCG GGTGGTGACG CACACCAACG CCACGGCCTG TTCCTGCTAT 7140
ATCCTGAACA AACCCGTGTT TATCACGATG GACGGCGCCG TTCGCCGGAC GGCCGATCTG 7200
TTTCTGCCCG ACTCCTTCAT GCAGGAGATC ATCGGGGGGC AGGCCCGCGA GACCGGCGAC 7260
GACCGGCCCG TCCTAACAAA GTCGGCGGGG GAGCGGTTTC TGCTGTACCG CCCCTCCACC 7320
ACCACCAACA GCGGCCTGAT GGCCCCCGAG CTGTACGTGT ACGTGGACCC GGCGTTCACG 7380 GCCAACACGC GCGCCTCCGG CACCGGCATC GCGGTCGTCG GGAGGTACCG CGACGATTTC 7440
ATTATCTTCG CCCTGGAGCA CTTTTTCCTC CGCGCGCTCA CGGGATCGGC CCCCGCGGAC 7500
ATCGCCCGCT GCGTCGTGCA CAGCCTCGCC CAGGTGCTGG CGCTGCACCC CGGGGCGTTT 7560
CGCAGCGTTC GCGTGGCGGT CGAGGGCAAC AGCAGCCAGG ACTCGGCCGT GGCCATCGCC 7620
ACACACGTGC ATACCGAGAT GCACCGCATC CTGGCCTCGG CGGGGGCCAA CGGCCCGGGG 7680 CCCGAGCTCC TCTTCTATCA CTGCGAGCCG CCCGGCGGCG CGGTATTGTA CCCCTTCTTT 7740
CTGCTCAACA AACAGAAGAC GCCCGCCTTC GAATACTTTA TCAAAAAGTT CAACTCCGGG 7800
GGCGTCATGG CGTCCCAGGA GCTCGTCTCC GTGACGGTGC GCCTGCAGAC CGACCCGGTC 7860
GAGTATCTGT CCGAGCAGCT CAACAACCTC ATCGAAACCG TCTCTCCCAA CACCGACGTC 7920
CGCATGTACT CCGGAAAACG CAACGGTGCC GCGGACGACC TCATGGTCGC GGTCATCATG 7980 GCCATTTACC TGGCGGCCCC GACCGGGATC CCCCCGGCCT TTTTTCCGAT CACGCGCACG 8040
TCTTGAGTCT TTCTTGCCGT TTCTTTTGTT TCTCTTTCTT TCCCCCCTCT CTCCGCAATA 8100
AACGCCTTCC CGGAACTGTG TTTTCCCCCC CTACAACAGT GTTGTCCGTT GGTTGGGTGG 8160
TTGGGGTGCG GGGGTGGGCG GGGGAAGCAA GAAAACGGTC GGCGAACACA ACATCGGGAA 8220 AACGGATTCC CGAACGTGCG TCTTCCCAGA TTCGACACAC ACCCCCCTTC TCCTTAAATA 8280
AACACAAACC ACACGCTCGT TGGTTGGTTA ATGCCGGCGC TTTATTTACG TCTTGTTTTT 8340
TTGCGTTTCC TCCGCGGGTC CCTTCCCAAC ACGCCTGCCC CCGCCTCAGG GGTAGCGGAT 8400
AACCGGGGCC ATGTCGCCGG ATTGCACAAC GGCGGCGCCG TCGAACGTAC ACACCCGAAC 8460 CGCCGGGGCC AGGGCCAGGA TGTCCCCGAG TTGGCCCGCG TGCGCCAGCC AGGCGACCAG 8520
CGCCTCGTAA AGCGGCAGCC TGCGCTCGCC GTCCTGCATC AGCATGGGGG CTTCGGGGTG 8580
GATGAGCTGG GCGGCTTCTC GCGTGACGCT CTGCATCTGC AGGAGCGCGT TCACGTATCC 8640
GTCCTGGGCG CTCAGCGCGA GCAGCCGGGG GATGAGCGTG AGGATGAGGG TGGTTCCTTC 8700
GGTTATGGAG TAGACCATGT TGAGGACGAG CGACCGCAGC TCGGTGTTTA CGGAGGCGAG 8760 TTGCTGGACG TCGGCCACGA GCGAGAGACG GGCCCCGTTG TAATACAGCA CGTTGAGGTC 8820
GGGGAGCTCC CCGGGCGTCC GGGGGTCGGG GTTGAGGTCC CGGATGCCCC GGGCGACCAG 8880
CCGCGCGACT ATCTCGCGGG CCAGGGGCGT TGGGAGCGGG ACCGGAAACC GCAGCGTGAG 8940
GTCCAGCGAC TCCAGGCGCA CGTCCGTCGC CTGGCCCTCG AAGACGGGCG GGACGAGGCT 9000
GACGGGATCC CCGTTGCAGA GGTCGACGGG GGAGGTGTTG CGGAGATTGA CGGTGCCGGC 9060 GTGCGTGAGC CCCAGGTCCA CGGGGCAGGC GACGATTCGC GTGGGCAGCA CCCGCGTGAT 9120
TACCGCGGGG AAGCGCCTGC GGTACGCCAG CAACAACCCC AACGTGTCGG GACTAACTCC 9180
TCCGGAGACG AACGATTCGT GCGCCACGTC CGCGAGCGCC AGCTGGCGGC GGATGGTCGG 9240
CAGAAAGACC ACTCGACCCT CGCACCGCTG CAGCGCCGCG GCATCGGGGC GCGAGATACC 9300
CGAGGGGATC GCGATGTCTG CTTCGAAACA ATCCGTGATC ATGGCGCCGG GCCGCGAGAC 9360 ACCGGAACGC GGGGGTGCGG GAGGGCCGGA AAGCGCAACG CAACCGGGAC GATGATGAAA 9420
CAGAGATGGG GGGCACCGAC CGTGTGGGAG AGGGGGCGGG GCAGGGCTCA GCAGCACGCA 9480
CGGGGAGGTC TGTCGTGCGC AGGAGCCCCA GGTGAGAATC AGTCCCCCGG AGCTCGGGTC 9540
TGGGTTTTAT TGGGACCTGC CCTCGGAATC GCGGCTCCCA GTCCAAGCCC CCCCGGGGGG 9600
GCGGGGACAG GGGGTGTGTG TGGGTAAAAG CAACGTCGGA AAATCAAACC CAATGCCCCA 9660 AACAGGAAAA AAAAAAAAGA CGGGCGGGTG GAGGGAAAGC TGGGGAAGAA GAAGCCAATT 9720
TTACAGAGAC AGGCCCTTTA GCGGGGAGGC GTCGTAGATG AGATACTGCG TAAAGTGGGT 9780
CTCTCGCGCG TGGGCCTCCC CATCGCGGGC GCTGCGTAGC AGGGCGGGGT CGCTGGCGCA 9840
GGTGATCGGG TAGGCTTCCT GAAACAGGCC GCACGGGTCT TCCACGAGCT CGCGGCACCC 9900
CGGCGGGCGC TTAAACTGCA CGTCGCTGGC AGCGGTGGCC GTGGATACCG CCGATCCCGT 9960 TTCCACGATA AGACGCTCCA GGCAGCGATG TTTGGCCGTG ATGTCGGCCG CGGTGAAGAA 10020
CTTGAAGCAG GGGCTGAGGA CGGGCGAGGC CCCGTTGAGG TGATAGGCCC CGTTGTACAG 10080
CAGGTCCCCG TACGAGAACC GCTGCGACGC CCACGGGTTG GCCGTGGCCG CGAAGGGCCG 10140
CGCCGGGTCG CTCTGGCCGT GGTCGTACAT GAGGGCTATG ACGTCCCCCT CCTTGTCCCC 10200
CGCGTACACG CCGCCGGCCG CGCGTCCCCG CGGGTTGCAG GGCCGGCGAA AGTAGTTGAT 10260 GTCCGTGGCC ACGGGGGTGG CGATGAACTC ACACACGGCA TCCTGCCCGT GGTCCATGCC 10320
GGCGCGCCGC GGCACCTGGG CGCAGCCAAA GACCGGGAGG GGCTGGGCCG GCCCCAGCCG 10380
GTTTCCCGCC ACGACCGCGT TGCGCAGGTA CACGGCGGCC GCGTTGTTTA GCAGCGGGGG 10440
GGCCCCGCGG CCGAGGTAAA AGTTTTGGGG GAGGTTGCCC ATGTCCGTAA CGGGGTTGCG 10500
GACGGTGGCC GTGGCCGCGA CGGCGGTGTA GCCCACACCC AGGTCCACGT TTCCGCGCGG 10560 CTGGGTGAGC GTGAAGCTGA CCCCCCCGCC CGTTTCGTGG CGGGCCACCT GGAGCTGGCC 10620
CAGAAAGTAC GCCTCCGACG CGCGCTCGGA AAACAGCACG TTTTCGGTCA CGAAGCGGTC 10680
CTGCCGCACG ACGGTGAACC CGAACCCGGG GTGGAGGCCC GTCTTGAGCT GGTGATACAG 10740
GGCCACGGGG CTCATCTTGA AGTACCCCGC CATGAGCGCG TAGGTCAGCG CGTTCTCCCC 10800 CGCCGCGCTC TCGCGGGCGT GCTGCACCAC GGGCTGGCGG ATGGAGGAGA AGTAGTTGGC 10860
CCCCAGGGCC GGGGGGACCA GGGGGACGTG GCGCGCCAGG TCGCGCAGGG CCGGGGGGAA 10920
GTTGGGCGCG TTGGCCACGT GGTCGGCGCC CGCAAACAGC GCGTGGACGG GCAGGACGTA 10980
GAAGTATTCG CCATTTTGGA TGGTGTGGTC CAGGTGCTGG GGGGCCATGA GCAGCACGCC 11040 GGCGTGCAGC GCCCCGTCAA AAATGCGCAT GTTGGCCGTC GACGCGGTGT TGGCGCCCGC 11100
GTCGGGCGCC GCGGAGCACA GCAGCGCCGT CGTGCGCTCG GCCATGTTGT GCGCCAGCAC 11160
CTGCAGCGTG AGCATGGCGG GCCCGTCAAC AACAACGCGC CCGTTGTGGA ACATGGCGTT 11220
GACCGTGTTG GCCACCAAAT TGGCGGGATG CAGCGGGTGG GCGGGGTCGG TCACGGGATC 11280
GCTCGGGCAC TCCTCGCCGG GGGCGATCTC CGGGACCACC ATGTTCTGCA GCGTGGCGTA 11340 CACGCGGTCG AAGCGGACCC CCGCGGTGCA GCAGCGCCCC CGCGAGAAGG CCGGCACCAG 11400
CACGTAATAG TAGATTTTGT GGTGGACGGT CCAGTCGGCC GGCCGGTGCG GCCGGTCGTC 11460
GGCGGCGTCG GCCGCGCGGG CCTGGGTGTT GTGCAGCAGC CGGCCGTCGT TGCGGTTAAA 11520
GTCGGCCGTC GCCACGTTGC ACGCCGCCGC GTAGACGGGC TCGTGTCCCC CCGCGTCAAT 11580
CCGGCAGTCT CGGTGGCGGT CCAGGGCCGC GTGTCGCATA AGGCCGTCGC AGTCCCACAC 11640 GAGGGGCGGC AGCAGCGCCG GGTCGCGCAT CAGGTGATTC AGCTCGGCCT GAGCCTGCCC 11700
GCCCAGCTCC GGGCCCGGCA GGGTAAAGTC GTCCACCAGC TGGGCCAGGG CCTCGACGTG 11760
GGCCACCAGG TCCCGATACA CGGCCATGCA CTCCTCGGGG AGGTCGCCCC CGAGGTAGGT 11820
CACGATGTAC GAGACCAGCG AGTAGTCGTT CACGAACGCC GCGCATCGCG TGTTGTTCCA 11880
GTAGCTGGTG ATGCACTGAG TCACGAGCCG CGCCAGGGCG CAGAACACGT GCTCGCTGCC 11940 GTGAATCGCG GCTTGCAGCA GGTAAAACAC CGCCGGGTAG CTGCGGTCCT CGAACGCCCC 12000
GCGGACGGCG GCTATGGTAG CCGGCGCCAT GGCGTGGCGG CCAACGCCGA GCTCCAGGCC 12060
CCGGGCGTCA CGAAACGCCA CCGGACACAG CGCCAGGGGC AGGTTGCCGT TGACCACGCG 12120
CCAGGTGGCC TGGATCGCCC CCGGACCGGC CGGGGGGACT TCGCCGCCGG GAAGCTCGAC 12180
GTCGGCCACG CCCGCGAAGA AGTCGAACGC GGGGTGCAGC TCCAGAGCCA GGTTGGCGTT 12240 GTCGGGCTGC ATGAACTGCT CCGCGGTCAT CTGGCACTCG GCGACCCACC GGACCCGGCC 12300
GTGGGCGAGG CGCTGCCGCC AGGCGTTCAG AAAACGCTGC TGCATGTCCG CGCCGGGGCC 12360
GGCCGGGGCC GCGACGTACG CCCCGTACGG ATTCGCGGCC TCGACGGGGT CGTGGTTCAC 12420
GCCCCCGACG GCCGCGTCGA TGTTCATGAG CGAAGGATGA CACACGGTCC CGACCGCGTT 12480
CTCCATGGAC AGCCGCAGAA CCTGGTGGTC CTTTCCCCAA AAAAACAGCT GCCGGGGAGG 12540 GAACGCGCGG GGCTCCGGGT GGCCGGGGGC GGGCACCAGG TCCCCGGCGT GCGCGGCGAA 12600
GCGCTCCATG GCCGGGTTGA ACAGCCCCAG GGGCAGGACG AACGTCAGGT CCATGGCGCC 12660
CACCAGGGGG TAGGGCACGT TGGTGGCGGC GTAGATGCGC TTCTCCAGGG CCTCCAGGAA 12720
GACCAGCCTG TCGCCTATGG CCACCAGATC CGCGCGCACG CGCGTTGTCT GGGGGGCGCT 12780
TTCGAGTTCA TCCAGCGTCT CCCGGTTCGC CTCGAGTTGC TCCTCCTGCA TCTCCAGCAG 12840 GTGGCGGCCC ACGTCGTCCA GGCTCCGCAC GGCCTTGCCC ATCACCAGCG CCGTGACGAG 12900
GTTGGCCCCG TTCAAGACCA TCTCGCCGTA GGTCACCGGC ACGTCGGCCT CGGTGTCCTC 12960
CACCTTCAGG AAGGACTGCA GGAGGCGCTG TTTGATGGCG GCGGTGGTGA CCAGCACCCC 13020
GTCGACCGGC CGCCCGCGCG TGTCGGCGTG CGTCAGGCGG GGCACGGCCA CGGACGGCTG 13080
CGTCGCCGTG GTCAGGTCCA CGAGCCAGGC CTCGATGGCC TCGCGGCGAT GGCCCGCCTT 13140 GCCCAGGAAG AAGCTCGTGT CGCAAAAGCT CCGCTTCAGC TCGGCGACCA GGGTCGCCCG 13200
GGCGACCCTG GTCGCCAGGC GCCCGTTGTC GAGATATCGT TGCATGGGCA ACAGCAGGGC 13260
CAGGGGAGGC GCCTTCTCCA ACAGCACGTG CAGCATCTGG TCGGCCGTGC CGCGCTCAAA 13320
CGCCCCCAGG ACGGCCTGGA CGTTGCGCGC GAGCTGCTGG ATGGCGCGCA GCTGGCGATG 13380 CAGGCTAATG CCCGTCCCGT CCAGGGCCTC CCCCGTGAGC AGGGCAATGG CCTCGGTGGC 13440
CAGGCTGAAG GCGGCGTTCA GGGCCCGGCG GTCGATGACC TTCGTCATGT AATTATGCAC 13500
GGGCTGCTCG ACGGGGTGCG GGCCGTCGCG GGCGATGAGG GGCTGGTGGA CCTCGAACTG 13560
CACACGCCCT TCGTTCATGT AAGCCAGCTC CGGGAACTTG GTGCACACGC ACGCCACGGA 13620
CAGGCCGAGC TCCAGAAAGC GCACGAGCGA CAGGGTGTTG CAGTAGGACC CCAGCAGGGC 13680
GTCAAACTCT ACGTCATACA GGCTGTTTTC GTCGGAGCGC ACGCGGGCGA AAAAATCAAA 13740
GAGTCTGCGG TGGGACGCCA CCTCGATCGT ACTCAGGATG GAGCCGGTGG GCACCATGGC 13800
CGCGGCGTAC CGGTAACCCG GGGGGTCGCG GGCAGGAGCG GCCATTGGGT TCCTTGGGGG 13860
ATTCGCAGGC TCCATCAAGC CAAGCTCGGG AAGGCCAAGC CCCTCCCACA CAACGCCTCA 13920
CCGCCGGCGG ACGCGACTAA CAACCCACGG GCCGCCAAAA ACCCCAAGGG GCAACCCGAC 13980
CAACAACAGG CGAGGGGAGG AAAGGCGTAA AGGGGGCGTT GGGAGGCAAA AAGAAAGAAA 14040
ACACCCAGAC GTAGGCCCGA GGACCGGCCG GCGTCCTCTG TCCCCGAGCA CCCACTGTGC 14100
CCAACAGGCA CGGGGGCGAG CTGCCCCTGC CTTATATACC CCCCCGCCAC ACCCCCGTTA 14160
GAACGCGACG GGTGCCTTCA AGATGGCCCT GGTCCAAAAG CGTGCTAGAA AAAAGTTGGT 14220
AAAGGCGGCA AAGCAGTCCG CCGCCGCCAC CCACATGGCG GCGCCGGCCG CGCAGGCGAT 14280
TCCCAGAGAA CGGGCGCGGA GGGGATCCGT GCGGGGCAGC AGCTGGCTGG CGGTGATCCA 14340
ATGGAAAAGC CCGTCGGGAC TGAACGTCTC ATGGGCGGCC GCCACCAGGG CGCACAGGGC 14400
CGCGCCGCCC ATGATCACGC ACAACCCCCA AAACACGGGT GGCGACAACG GCAGGCGATC 14460
CCGTTTGATG TTCACGTACA GGAGGAGCGC CCGTGCCAGC CACGTGACAT AGTAGGCGAG 14520
GACGGCGGCT ATAATACATG CCGGCGCCAC CGCCCGTCCG GTCCACCCGT AATACATGCC 14580
CGCGGCCACC AGCTCCAGCG GCTTGAGGAC CAGGAACGAC CAAGCAAACA TCACCACCCG 14640
CTTGGAAAAG ACCGGCTGGG TGTGGGGCGG AAGACGCGAG TAGGCCGAAC TGACAAAAAA 14700
ATCAGACGTG CCGTACGAGG ACAGCGAAAA CTGTTCATCG AGCGGCAGTT CGCCGTCCTC 14760
CCCGCCACAC GCGGCCTCGT ATACCAGCTC GCGATCCAAC AAAGGAACAT CATCCCGCAT 14820
TGTCATGGTC GGTGCGGGGA GCCGGCGAGG CAGCGAAACC GAAAGTAGTG CTGGCGGCGC 14880
GGGCCCGGGT CCGGACCCAA GCTTCAGGGA TGGGGGGCGG AGGCCAAAAT CAAACAAGCA 14940
CCGCGCGGGT TCTACACACA ACCCCCACCC GGGTAGTATC CGCGGATGCG AGTGCCTGGC 15000
GAAGTCACGT CCCAGCAGGA TATAAACCTC GGCCGTTGGG CCCGGAACCC CCGAAATTCA 15060
CACCCACGCC CTGACGCCCA AATCATGGGT GGATGTGGTT CGCGAGCCGC ACATCCGTGC 15120
GTCCGCCCTC CCCCGCGGGC TGATGACGTG GCGGTTAGTC AGTGGGAAGG CAGGGGGAAA 15180
GATGGGTTGG GGGAGGAAAC GAAAAAAACA CCCAGAGGGC CACGTCGGGA ATGCGCCCGG 15240
AGTTGTCCTT AAAAGGCCGG CCGTGCGTGA CGGAAGCCGT CGTTTGCCCA AGCACCGACG 15300
CCGCGATCCA CAGTGGGGGG AGTTCCTCCG TCCGGCCACA ACCCTACGCG CGGGCGGCAC 15360
GCGCGAGAGC AACCCACGGG TCCCGTTCGC GCCACCGCCA GCCCTTGCTC CCACCACCCT 15420
CCTCCCACCA CCCCACTATT CCCCCCCCCC CAAGTCCGCC CCGTGGCTCG CCGGCCATGG 15480
AGCTCACCTA TGCCACCACC CTGCACCACC GGGACGTTGT GTTTTACGTC ACGGCAGACA 15540
GAAACCGCGC CTACTTTGTG TGCGGGGGGT CCGTTTATTC CGTAGGGCGG CCTCGGGATT 15600
CTCAGCCGGG GGAAATTGCC AAGTTTGGCC TGGTGGTCCG GGGGACAGGC CCCAAAGACC 15660
GCATGGTCGC CAACTACGTA CGAAGCGAGC TCCGCCAGCG CGGCCTGCGG GAAGTGCGGC 15720
CCGTGGGGGA GGACGAGGTG TTCCTGGACA GCGTGTGTCT GCTAAACCCG AACGTGAGCT 15780
CCGAGCGAGA CGTGATTAAT ACCAACGACG TTGAAGTGCT GGACGAATGC CTGGCCGAAT 15840
ACTGCACCTC GCTGCAAACC AGCCCGGGGG TGCTGGTGAC CGGGGTGCGC GTGCGCGCGC 15900
GAGACAGGGT CATCGAGCTA TTTGAGCACC CGGCGATCGT CAACATTTCC TCGCGCTTCG 15960
CGTACACCCC CTCCCCCTAC GTATTCGCCC TGGCCCAGGC GCACCTCCCC CGGCTCCCGA 16020
GCTCGCTGGA GCCCCTGGTG AGCGGCCTGT TTGACGGCAT TCCCGCCCCG CGCCAGCCCC 16080
TGGACGCCCG CGACCGGCGC ACGGATGTTG TGATCACGGG CACCCGCGCC CCCAGACCGA 16140
TGGCCGGGAC CGGGGCCGGG GGCGCGGGGG CCAAGCGGGC CACCGTCAGC GAGTTCGTGC 16200
AAGTGAAGCA CATCGACCGT GTTGTGTCCC CGAGCGTCTC TTCCGCCCCC CCGCCGAGCG 16260
CCCCCGACGC GAGTCTGCCG CCCCCGGGGC TCCAGGAGGC CGCCCCGCCG GGCCCCCCGC 16320
TCAGGGAGCT GTGGTGGGTG TTCTACGCCG GCGACCGGGC GCTGGAGGAG CCCCACGCCG 16380
AGTCGGGATT GACGCGCGAG GAGGTCCGCG CCGTGCATGG GTTCCGGGAG CAGGCGTGGA 16440
AGCTGTTTGG GTCGGTGGGG GCTCCGCGGG CGTTTCTCGG GGCCGCGCTG GCCCTGAGCC 16500
CGACCCAAAA GCTCGCCGTC TACTACTATC TCATCCACCG GGAGCGGCGC ATGTCCCCCT 16560
TCCCCGCGCT CGTGCGGCTC GTCGGTCGGT ACATCCAGCG CCACGGCCTG TACGTTCCCG 16620
CGCCCGACGA ACCGACGTTG GCCGATGCCA TGAACGGGCT GTTCCGCGAC GCGCTGGCGG 16680
CCGGGACCGT GGCCGAGCAG CTCCTCATGT TCGACCTCCT CCCGCCCAAG GACGTGCCGG 16740
TGGGGAGCGA CGCGCGGGCC GACAGCGCCG CCCTGCTGCG CTTTGTGGAC TCGCAACGCC 16800
TGACCCCGGG GGGGTCCGTC TCGCCCGAGC ACGTCATGTA CCTCGGCGCG TTCCTGGGCG 16860
TGTTGTACGC CGGCCACGGA CGCCTGGCCG CGGCCACGCA TACCGCGCGC CTGACGGGCG 16920
TGACGTCCCT GGTCCTGACC GTGGGGGACG TCGACCGGAT GTCCGCGTTT GACCGCGGGC 16980
CGGCGGGGGC GGCTGGCCGC ACGCGAACCG CCGGGTACCT GGACGCGCTG CTTACCGTTT 17040
GCCTGGCTCG CGCCCAGCAC GGCCAGTCTG TGTGAGATAT CCCAATAAAG TGCAGTCGTT 17100
TTCTAACCCA CGGATGCCGT TGTATGCCTA TACGGGGGAC TATGGGGGGG GGGGGAAAGG 17160
AAAGGAAACA GGAATGGAGA AGGGAAAGGA ACAGAGGCGG TAGCGGACGC ACGGCGGACA 17220
CAATAACAAA CAGACCGCGG ACACGGAGGG AGTCGGTTGG GTTGGGCGTG GACGCCGCTG 17280
CGTCCACACA CCCGTTTATT CGCGTCTCCA CAAAAATGGG ACGCACGTTC GGACCACCCT 17340
GAGGATGCCC GCCAGGGCCG CGGTGATCAT AACGACCCCC AGCGCGGACG CGGCCAGAAA 17400
CCCGGGGGCG ATGGTGGCGA TGGGCAGCGT GTCAAAGGCC AGCAGATGAA TCACAGTTCC 17460
GTTGGGGAAC AACAACAGGG CCACGGACGG CACGTCGCTG GAAAACACGT TCGGGGTGCC 17520
CGCCACCGGC CCCTGGGCCA GCTGCTGTTG GGTGGCATCC GTGTCCACCA GCAGCACCGA 17580
CATGACCTCC CCGGCCGGGG TGTAGCGCAG AAACACGGCC CCCACGAGGC CGAGGTCGCG 17640
CCGGTTTTCG GTGCGCACCA GCCGCTTCGG CTCAATCTCC CGCGCGTGCC CTTCGCAGGT 17700
GGCGGTGAGA TAGGTGATAA ACAGCGGGCG GCGGACGTCA ACGCCCGTAA GCTTGTATCC 17760
GATCCCGCGG GGCAAGGGGG TGTGGGTGAC GACGTAGCTG GCGTTGTGGG TGATGGGCAC 17820
GAGGATCCGG GGCTCCGCGT TGTGCGACGG GCCGCTACAC TGGTGGGTGG CCTCCGGGAC 17880
GAAGGCGCGG ATCAGGGCGT TGTAGTGCGC CCAGCGCGTG AGAACGGAGG CCACGCCGCG 17940
GGTCTGTTGT GCCATGACGT CCGCCGGGAT GTCGGATCGG GTGGCCATGG CCAGCGCGTC 18000
CAGGATGAAC CCGCCCTCGG CGAGATCGAA GCGCAGGGAA GCTGCGCATG GGGAAAAGTG 18060
GTCCGGGAGC CAGAAGAGGT TTTTCTGGTG GTCGGTCCTG GCTAGCGCGG CCCGGAGATC 18120
GGCGTGGGTC GCCGCGGCGA CGTCGGACGT ACACAGGGCC GTGGTTATGA GGAGGCCCCG 18180
GCGGGCGCGT TCCCGCTGCT CGGCCGAGGG CGCGCCCGCC AGGAACGGCG CCCGGAGGAC 18240
GGCCGTGGCG TAAAACAGCG CTCGGCGGAC CATCGGGGCG GTTAGCGCGC GGCCGCCGAG 18300
AAACTCGGCG TACAGGGCGT CGATCAGGCG GGCCGCGCTC GGGGCCACCG CGCCATAGGC 18360
CGCGGGGCTG TCCAACACGA ACGCCAGCTG ATAGCCCAGC GCGTGCGCCG CCAGGCTCTG 18420
CTCTCGCTCG AGGATCGCGG CCACCAGATG CCCGAGGCGC GCCTCCAGCC GCAGGCGGGC 18480
CGCCGGGTCC AACACGGACA CGTTCAGGAA CACCGAGTCG GCCGCGCAGC CCGCTGCTCC 18540
CCGGGCGGCC AGGCCGGCCA GCACGCGCGA GTGGGCCAAA AAGCCCAGCA GGTCGGAGAG 18600
GCGAATCGCG TCGTGGGCGT GGGCCGCGTT GACGAACGCA AACCCCGACG AGGCGAGCAG 18660
CCCCGCGAGG CGCCAGAACA GGGACGGACG CGCGTCCGTG CCGGAGCCCG GGTCCTCCCC 18720
CAAAAACTCC GCATAGGCCC GCGACATATA CTGGGCGTAG TTCGTGCTCT CCTCGGGGTA 18780
GCCGGCCACC CGCCGGAGGG CGTCCAGCGC CGAGCCGTTG TCGGCGGGCG TCGGGGCCCC 18840
CAGGACAAAG ACGCGATACC TGGGGCCGGC CGGAGGCCCG GGGAGCACCG CGGGGGCGTT 18900
TTCGTCGGTC GGATTTCCGA CCCGAGCGAG GGTCTTGTCC GCAGGCACCA CTATGATCTC 18960
GGCCGGAGGG CTGTCCCGCA TCGATATCAC AACCCCCATG AAGCCCTTCC CGTATCGCGC 19020
GCGCACAAGC GCGGCGTCGC ACCCGAACGC CAGCCCGCCC GTCGTCCAAA CGCCCACGGG 19080
CCACTTCAAG GCCGACGGGG AGAGGTACAC TTACCGACCC GGAGTCCGTA GCAGGCCCCT 19140
GGCGGCCAGC CAGGTCACGG ATGCGTTGTG CAGATGCGCG ATGCTCAGGT TCGTCGTCGG 19200
ATGCCTCGGT GTCCCCGCGG GCGGCCCCGG GGGCGGCGCG TTGCGTCGGC CGTCCGGGTG 19260
CCTCTCGGTC GCCCCGTCGT CTCCCCGCGG GAACGTAAGC CCCTCGCGGT CCGGCGCGGC 19320
CGCGAATGTT ACCCAGGCCC GGGACCGCAA CAGCGCGGAG GCGCCGGGGT TGTGCGACAG 19380
TCCCTTGAGC TGGGTCACCT CGGCGGGGGG ACGGGACGTG GGCCCCGCCT CGGGGAGCTC 19440
GGGCAGGCTC GCGTTCCGAG GCCGGCCGAG CAGATAGGTC TTTGGGATGT AAAGCAGCTG 19500
CCCGGGGTCC CGAGGAAACT CGGCCGTGGT GACCAACACA AAACAAAAGC GCTCGGCGTA 19560
CCACCGAAGC ATGGGCACGG ATGCCGTAGT CAGGTTGAGT TCGCCCGGGG GCGCCAAGCG 19620
TCCGCGCTGG GGGTCGCTGG CGTCGGGGGT GTTGGGCAAC CACAGACGCC CGGTGTTTGT 19680
GTCGCGCCAG TACGTGCGGG CCAACCCCAG ACCGTGCAAA AACCACGGGT CGATTTGCTC 19740
CGTCCAGTAC GTGTCATGGC CCCCGGCAAC GCCCACCAGG ACCCCCATCA CCACCCACAG 19800
ACCGGGGCCC ATGGTCGTCC GTCCCGGCTG CCAGTCCGCA GATGGGGGGG TGTCCGTACC 19860
CACGGCCCAA AGAGGCTCCG CACCTCGGAG GCTATCGGAG GCCCTTTGTT GCCGTAAGCG 19920
CGGGCCAAAG GATGGGGTGG GGTGAGGGTA AAAGCACAAA GGGAGTACCA GACCGAAAAC 19980
AAGGACGGAT CGGCCCGCTC CGTTTTTCGG TGGGGTGCTG ATACGGTGCC AGCCCTGGCC 20040
CCGAACCCCC GCGCTTATGG ACACACCACA CGACAACAAT GCCTTTTATT CTGTTCTTTT 20100
ATTGCCGTCA TCGCCGGGAG GCCTTCCGTT CGGGCTTCCG TGTTTGAACT AAACTCCCCC 20160
CACCTCGCGG GCAAACGTGC GCGCCAGGTC GCGTATCTCG GCGATGGACC CGGCGGTTGT 20220
GACGCGGGTT GGGATCATCC CGGCGGTGAG GCGCAACAGG GCGTCTCGAC ACCCGACGGG 20280
CGACTGATCG TAATCCAGGA CAAATAGATG CATCGGAAGG AGGCGGTCGG CCAAGACGTC 20340
CAAGACCCAG GCAAAAATGT GGTACAAGTC CCCGTTGGGG GCCAGCAGCT CGGGAACGCG 20400
GAACAGGGCA AACAGCGTGT CCTCGATGCG GGGCAGAGAC CCCGCGCCGT CCTCGGGGTC 20460
GGGGCGCGGG GTCGCCGCGG CGACCCCCGT CAGCCGGCCC CAGTCCTCCC GCCACCTCCC 20520
GCCGCGCTGC AGGTACCGCA CCGTGTTGGC GAGTAGATCG TAGACACGGC GAATGGCGGA 20580
CAGCATGGCC AGGTCAAGCC GCTCGCCCGG GCGTTGGCGT CTGGCCAGGC GGTCGGCGTG 20640
TTCGGCCTCC GGAAGGACAC CCAGGACCAG GTTCGTGCCG GGCGCGGTCG GGGGCATGAG 20700
GGCCACGAAC GCCAACACGG CCTGGGGGGT CATGCTTCCC ATGAGGTACC GCGCGGCCGG 20760
GTAGCACAGC AGGGAGGCGA TAGGGTGCCG GTCGAAAACA AGGGTGAGGG CCGGGGGCGG 20820
GGCTTGCGGG CCCACAGCCT CCCCCCCGAT ATGAGGAGCC AAAACGGCGT CCGTCGCCGC 20880
ATAAGGCGTG CTCATTGTTA TCTGGGCGCT GGTCATTACC ACCGCCGCCT CCCCGGCCGA 20940
TATCTCGCCG CGGTCCAGAC GGTGCTGCGT GTTGTAGATG TTCGTCAGGG TCTCGGAGGC 21000
CCCCAGCACC TGCCAGTAAG TCATCGGCTC GGGGACGTAG ACGATATTGT CGCGCGGCCC 21060
CAGGGCCTCC ATCAGCTGCG CGGAGGTGGT GGTCTTCCCC ACCCCGTGGG GTCCGTCTAT 21120
ATAAACCCGC AGCAGCGTGG GCAGCTCCGG ATCCCCGCGG GCTTCGGAGG CCCCCTGGCG 21180
ATGGCTAGGA CGGGACGCCG CGCGGCCGTC GGTAGGCCCG CTCGCACGAG CAGCCTGACC 21240
GAACGCAGGC GCGTGCTGTT GGCCGGCGTG AGAAGCCATA CCCGCTTCTA CAAGGCGTTC 21300
GCCCGAGAGG TGCGGGAGTT CAACGCCACC AGGATTTGTG GAACGCTGCT GACGCTGATG 21360
AGCGGGTCGC TGCAGGGTCG CTCGCTGTTC GAGGCCACGC GCGTCACCTT AATATGCGAA 21420
GTGGACCTCG GGCCGCGCCG CCCAGACTGC ATCTGCGTGT TCGAATTCGC CAATGACAAA 21480
ACGTTGGGAG GTGTGTGCGT CATCCTGGAG CTAAAGACAT GCAAATCGAT TTCTTCCGGG 21540
GACACGGCCA GCAAACGCGA ACAGCGGACC ACGGGCATGA AGCAGCTGCG CCACTCCCTG 21600
AAGCTGCTGC AGTCGCTCGC GCCTCCGGGG GACAAGGTCG TCTACCTGTG TCCTATTTTG 21660
GTGTTTGTCG CGCAGCGTAC GCTGCGCGTC AGCCGCGTGA CCCGGCTCGT CCCGCAAAAG 21720
ATCTCCGGCA ACATCACCGC GGCCGTGCGG ATGCTCCAAA GCCTGTCCAC GTATGCCGTG 21780
CCGCCGGAAC CGCAGACCCG GCGGTCGCGG CGCCGGGTTG CCGCGACCGC CAGACCGCAA 21840
AGGCCCCCCT CCCCGACACG TGACCCGGAA GGCACGGCGG GTCATCCGGC CCCACCAGAG 21900
AGCGACCCCC CCTCCCCAGG GGTTGTAGGC GTCGCTGCGG AGGGTGGGGG TGTGCTTCAG 21960
AAAATCGCGG CGCTTTTTTG CGTGCCGGTG GCCGCCAAGA GCAGACCCCG GACCAAAACC 22020
GAGTGAGGTT CTGTGTGTTG TTTTTTTTCC TCGTTTTGTT TTCTCTTCTT TCCCCCCCCC 22080
CTCCCCCGCT TCTGGCCAAG CATCCTCACC TGCTTAAGCG GAACCCGCGG GCGCGCGGGG 22140
ACTCATTTGT CGCCGGCGAC ACCCACCCGA CAACAGCCCC TGGGTGTAGA CCGCTGTCGC 22200
CCCCGTCTGT CGCCTCTCCC TTTTTTCCCC CCCTCAAAAA ACGTGGTGTT GGGCGCCGGC 22260
CAATTCTTCC CGGAGCGCCG TCGTCGCCCG CCCGCCGCCC TCGAACATGG ACCCGTACTA 22320
CCCTTTCGAC GCGCTGGACG TTTGGGAACA CAGGCGCTTC ATCGTCGCCG ACTCCAGGAG 22380
CTTCATCACC CCCGAGTTCC CCCGGGACTT CTGGATGTTG CCCGTGTTCA ACATCCCCCG 22440
GGAGACGGCG GCGGAGCGGG CGGCAGTGAT GCAGGCCCAG CGCACCGCGG CCGCGGCGGC 22500
CCTGGAGAAC GCCGCCCTCC AGGCCGCCGA GCTGCCCGTC GACATCGAGC GCCGGATACG 22560
CCCGATCGAG CAGCAGGTGC ATCACATCGC CGACGCCCTG GAGGCGCTGG AGACCGCGGC 22620
GGCCGCGGCC GAAGAGGCGG ATGCCGCGCG GGACGCCGAG GCGAGGGGGG AGGGCGCTGC 22680
GGACGGGGCA GCGCCGTCGC CCACCGCGGG CCCCGCCGCC GCGGAGATGG AGGTTCAGAT 22740
CGTACGCAAC GACCCGCCGC TACGATACGA TACCAACCTC CCCGTGGATC TGCTACACAT 22800
GGTGTACGCG GGCCGCGGGG CCGCGGGTTC GTCGGGAGTC GTCTTTGGTA CCTGGTACCG 22860
CACGATCCAG GAACGCACCA TCGCGGACTT CCCCCTGACC ACCCGCAGCG CCGACTTTCG 22920
AGACGGGCGC ATGTCCAAAA CCTTCATGAC CGCGCTGGTC CTGTCTCTGC AGTCGTGCGG 22980
CCGGCTGTAC GTGGGCCAGC GCCACTATTC CGCCTTCGAG TGCGCCGTGC TGTGTCTGTA 23040
TCTGCTGTAC CGAACCACCC ACGAGTCCTC CCCCGATCGC GATCGCGCTC CCGTTGCGTT 23100
CGGGGACCTG CTGGCCCGCC TGCCGCGCTA CCTGGCGCGT CTGGCCGCGG TAATCGGCGA 23160
CGAGAGCGGA CGCCCGCAGT ACCGCTACCG CGACGACAAG CTGCCCAAAG CGCAGTTCGC 23220
GGCGGCCGGC GGCCGCTACG AGCACGGGGC CCTGGCCACC CACGTCGTGA TCGCCACGTT 23280
GGTGCGCCAC GGGGTGCTAC CGGCGGCCCC GGGCGACGTT CCCCGAGACA CCAGCACCCG 23340
CGTGAACCCC GACGACGTGG CCCACCGCGA CGACGTCAAC CGCGCCGCCG CCGCGTTTTT 23400
GGCACGCGGC CACAACCTCT TCCTGTGGGA GGACCAGACG CTGCTGCGGG CGACCGCCAA 23460
CACCATTACG GCCCTGGCCG TGCTTCGGCG GCTCCTCGCG AACGGCAACG TGTACGCGGA 23520
CCGCCTCGAC AACCGCCTGC AGCTGGGCAT GCTGATCCCG GGAGCCGTCC CGGCGGAGGC 23580
CATCGCTCGG GGGGCGTCCG GATTGGACTC GGGCGCCATA AAAAGCGGCG ACAACAACCT 23640
GGAGGCGCTG TGCGTTAACT ATGTACTTCC GCTGTATCAG GCAGACCCCA CGGTCGAGCT 23700
GACCCAGTTG TTTCCGGGGG CTGGCCGCCC TGTGCCTGGA CGCCCAGGCG GGGCGGCCAC 23760
TGGCGTTGAC GAGGCGCGTG GTGGATATGT TGTCGGGCGC CCGCCAGGCG GCGCTCGTGC 23820
GCCTCACCGC GCTGGAGCTC ATCAACCGCA CCCGCACAAA CACCACCCCT GTGGGGGAGA 23880
TTATTAACGC CCACGATGCC TTGGGGATAC AATACGAACA GGGCCTGGGG CTGCTCGCCC 23940
AGCAGGCACG CATCGGCTTG GCGTCGAACG CCAAGCGATT CGCCACGTTC AACGTGGGCA 24000
GCGACTACGA CCTGTTGTAC TTTTTGTGTC TCGGGTTCAT TCCCCAGTAC CTGTCCGTGG 24060
CCTAGGGAAG GGTGGGGGTG GTGGTGGTGG GGTGTTTTTC TGCTGTTGTT GTTTCTGGTC 24120
CGCCTGGTCA CAAAAGGCAC GGCGCCCCGA AACGCGGGCT TTAGTCCCGG CCCGGACGTC 24180
GGCGGACACA CAACAACGGC GGGCCCCGTG GGTGGGTAAG TTGGTTCGGG GGCATCGCTG 24240
TATTCCCTTG CCCGCTTCCA CCCCCCCTTC CCGTTTTGTT TGTTTGTGCG GGTGCCCATG 24300
GCGTCGGCGG AAATGCGCGA GCGGTTGGAG GCGCCTCTGC CCGACCGGGC GGTGCCCATC 24360
TACGTGGCCG GGTTTTTGGC CCTGTACGAC AGCGGGGACC CGGGCGAGCT GGCCCTGGAC 24420
CCAGACACGG TGCGTGCGGC CCTGCCTCCG GAGAACCCCC TGCCGATCAA CGTAGACCAC 24480
CGCGCTCGGT GCGAGGTGGG CCGGGTGCTC GCCGTGGTCA ACGACCCTCG GGGGCCGTTT 24540
TTTGTGGGGC TGATCGCGTG CGTGCAGCTG GAGCGCGTCC TCGAGACGGC CGCCAGCGCC 24600
GCTATTTTTG AGCGCCGCGG ACCCGCGCTC TCCCGGGAGG AGCGTCTGCT GTACCTGATC 24660
ACCAACTACC TGCCATCGGT CTCGCTGTCC ACAAAACGCC GGGGGGACGA GGTTCCGCCC 24720
GACCGCACCC TGTTTGCGCA CGTGGCCCTG TGCGCCATCG GGCGGCGCCT TGGAACCATC 24780
GTCACCTACG ACACCAGCCT AGACGCGGCC ATCGCTCCGT TTCGCCACCT GGACCCGGCG 24840
ACGCGCGAGG GGGTGCGACG CGAGGCCGCC GAGGCCGAGC TCGCGCTGGC CGGGCGCACC 24900
TGGGCCCCCG GCGTGGAGGC GCTCACACAC ACGCTGCTCT CCACCGCCGT CAACAACATG 24960
ATGCTGCGTG ACCGCTGGAG CCTCGTGGCC GAGCGGCGGC GGCAGGCCGG GATCGCCGGA 25020
CACACGTACC TTCAGGCGAG CGAAAAATTT AAAATATGGG GGGCGGAGTC TGCCCCTGCG 25080
CCGGAGCGCG GGTATAAAAC CGGCGCCCCG GGTGCCATGG ACACATCCCC CGCCGCGAGC 25140
GTTCCCGCGC CGCAGGTCGC CGTCCGTGCG CGTCAAGTCG CGTCGTCGTC GTCTTCTTCT 25200
TCTTCTTTTC CGGCACCGGC CGATATGAAC CCCGTTTCGG CATCGGGCGC CCCGGCCCCT 25260
CCGCCGCCCG GCGACGGGAG TTATTTGTGG ATCCCCGCCT TTCATTACAA TCAGCTCGTC 25320
ACCGGGCAAT CCGCGCCCCA CCACCCGCCG CTGACCGCGT GCGGCCTGCC GGCCGCGGGG 25380
ACGGTGGCCT ACGGACACCC CGGCGCCGGC CCGTCCCCGC ACTACCCGCC TCCTCCCGCC 25440
CACCCGTACC CGGGGTATGC TGTTCGCGGG CCCCAGTCCC CTGGAGGCCC AGATCGCCGC 25500
GCTGGTGGGG GCCATCGCCG CCGACCGCCA GGCGGGTGGG CTTCCGGCGG CCGCCGGAGA 25560
CCACGGGATC CGGGGGTCGG CGAACCGCCG CCGACACGAG GTGGAGCAGC CGGAGTACGA 25620
CTGCGGCCGT GACGAGCCGG ACCGGGACTT CCCGTATTAC CCGGGCGAGG CCCGCCCCGA 25680
GCCGCGCCCG GTCGACTCCC GGCGCGCCGC GCGCCAGGCT TCCGGGCCCC ACGAAACCAT 25740
CACGGCGCTG GTGGGGGCGG TGACGTCCCT GCAGCAGGAA CTGGCGCACA TGCGCGCGCG 25800
TACCCACGCC CCCTACGGGC CGTATCCGCC GGTGGGGCCC TACCACCACC CCCACGCAGA 25860
CACGGAGACC CCCGCCCAAC CACCCCGCTA CCCCGCCGAG GCCGTCTATC TGCCGCCGCC 25920
GCACATCGCC CCCCCGGGGC CTCCTCTATC CGGGGCGGTC CCCCCACCCT CGTATCCCCC 25980
AGTTGCGGTT ACCCCCGGTC CCGCCCCCCC GCTACATCAG CCCTCCCCCG CACACGCCCA 26040
CCCCCCTCCG CCGCCGCCGG GACCCACGCC TCCCCCCGCC GCGAGCTTAC CCCAACCCGA 26100
GGCGCCCGGC GCGGAGGCCG GCGCCTTAGT TAACGCCAGC AGCGCGGCCC ACGTGAACGT 26160
GGACACGGCC CGGGCCGCCG ATTTGTTTGT GTCACAGATG ATGGGGTCCC GCTAACTCGC 26220
CTCCAGGATC CGGACTTGGG GGGGGTGTGT GTTTTCATAT ATTTTAAATA AACAAACAAC 26280
CGGACAAAAG TATACCCACT TCGTGTGCTT GTGTTTTTGT TTGAGAGGGG GGGGGTGG 26339
(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 897 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
Val Ser Gly Arg Ala Gly Asp Pro Ala Gly Leu Pro Ala Pro Arg Gly
1 5 10 15
Gly Pro Thr Trp Pro Met Pro Ser Gly Gly Pro Pro Pro Glu Val Lys
20 25 30
Ala Gly Leu Arg Ala Asp Met Trp Gly Val Met Gly Gln Tyr Arg Glu
35 40 45
Ala Xaa Glu His Gln Thr Pro Asp Thr Glu Thr Val Val Ala Gly Met
50 55 60
His Pro Ala Leu Val Val Val Leu Lys Thr Met Phe Xaa Asp Ala Pro
65 70 75 80
Glu Thr Pro Val Leu Val Gln Phe Phe Ser Asp His Ala Pro Thr Ile
85 90 95
Ala Lys Ala Val Ser Asn Ala Ile Asn Ala Gly Ser Ala Ala Val Ala
100 105 110
Thr Asp Ala Ala Thr Val Asp Ala Ala Val Arg Ala His Gly Ala Asp
115 120 125
Ala Val Ser Ala Leu Gly Ala Ala Ala Arg Asp Pro Asp Leu Ser Phe
130 135 140
Leu Ala Ala Asp Ser Ala Ala Gly Tyr Val Lys Ala Thr Arg Leu Ala
145 150 155 160
Leu Glu Arg Ala Ile Asp Lys Leu Thr Thr Leu Gly Ser Ala Ala Ala
165 170 175
Asp Leu Val Phe His Ala Arg Arg Ala Cys Ala Gln Pro Glu Gly Asp
180 185 190
His Ala Ala Leu Ile Asp Ala Ala Ala Arg Ala Thr Thr Ala Ala Arg
195 200 205
Glu Ser Leu Ala Gly His Glu Ala Gly Phe Gly Gly Leu Leu His Ala
210 215 220
Glu Gly Thr Ala Gly Asp His Ser Pro Ser Gly Arg Ala Leu Gln Glu
225 230 235 240
Leu Gly Lys Val Ile Gly Ala Thr Arg Arg Arg Ala Glu Glu Leu Glu
245 250 255
Ala Ala Val Ala Asp Leu Thr Gly Lys Met Ala Ala Gln Arg Arg Ser
260 265 270
Ser Trp Ala Ala Gly Val Glu Ala Ala Leu Asp Arg Val Glu Asn Arg
275 280 285
Ala Glu Phe Asp Val Val Glu Leu Arg Arg Leu Gln Ala Gly Thr His
290 295 300
Gly Tyr Asn Pro Arg Asp Phe Arg Lys Arg Ala Glu Gln Ala Ala Asn
305 310 315 320
Ala Glu Ala Val Thr Leu Ala Leu Asp Thr Ala Phe Ala Phe Asn Pro
325 330 335
Tyr Thr Pro Glu Asn Gln Arg His Pro Met Leu Pro Pro Leu Ala Ala
340 345 350
Ile His Arg Leu Gly Trp Ser Ala Ala Phe His Ala Ala Ala Glu Thr
355 360 365
Tyr Ala Asp Met Phe Arg Val Asp Ala Glu Pro Leu Ala Arg Leu Leu
370 375 380
Arg Ile Ala Glu Gly Leu Leu Glu Met Ala Gln Ala Gly Asp Gly Phe
385 390 395 400
Ile Asp Tyr His Glu Ala Val Gly Arg Leu Ala Asp Asp Met Thr Ser
405 410 415
Val Pro Gly Leu Arg Arg Tyr Val Pro Phe Phe Gln His Gly Tyr Ala
420 425 430
Asp Tyr Val Glu Leu Arg Asp Arg Leu Asp Ala Ile Arg Ala Asp Val
435 440 445
His Arg Ala Leu Gly Gly Val Pro Leu Asp Leu Ala Ala Ala Ala Glu
450 455 460
Gln Ile Ser Ala Ala Arg Asn Asp Pro Glu Ala Thr Ala Glu Leu Val
465 470 475 480
Arg Thr Gly Val Thr Leu Pro Cys Pro Ser Glu Asp Ala Leu Val Ala
485 490 495
Cys Ala Ala Ala Leu Glu Arg Val Asp Gln Ser Pro Val Lys Asn Thr
500 505 510
Ala Tyr Ala Glu Tyr Val Ala Phe Val Thr Arg Gln Asp Thr Ala Glu
515 520 525
Thr Lys Asp Ala Val Val Arg Ala Lys Gln Gln Arg Ala Glu Ala Thr
530 535 540
Glu Arg Val Met Ala Gly Leu Arg Glu Ala Ala Arg Glu Arg Arg Ala
545 550 555 560
Gln Ile Glu Ala Glu Gly Leu Ala Asn Leu Lys Thr Met Leu Lys Val
565 570 575
Val Ala Val Pro Ala Thr Val Ala Lys Thr Leu Asp Gln Ala Arg Ser
580 585 590
Val Ala Glu Ile Ala Asp Gln Val Glu Val Leu Leu Asp Gln Thr Glu
595 600 605
Lys Thr Arg Glu Leu Asp Val Pro Ala Val Ile Trp Leu Glu His Ala
610 615 620
Gln Arg Thr Phe Glu Thr His Pro Leu Ser Ala Arg Asp Gly Pro Gly
625 630 635 640
Pro Leu Ala Arg His Ala Gly Arg Leu Gly Ala Leu Phe Asp Thr Arg
645 650 655
Arg Arg Val Asp Ala Leu Arg Arg Ser Leu Glu Glu Ala Glu Ala Glu
660 665 670
Trp Asp Glu Val Trp Gly Arg Phe Gly Arg Val Arg Gly Gly Ala Trp
675 680 685
Lys Ser Pro Glu Gly Phe Arg Ala Met His Glu Gln Leu Arg Ala Leu
690 695 700
Gln Asp Thr Thr Asn Thr Val Ser Gly Leu Arg Ala Gln Pro Ala Tyr
705 710 715 720
Glu Arg Leu Ser Ala Arg Tyr Gln Gly Val Leu Gly Ala Lys Gly Ala
725 730 735
Glu Arg Ala Glu Ala Val Glu Glu Leu Gly Ala Arg Val Thr Lys His
740 745 750
Thr Ala Leu Cys Ala Arg Leu Arg Asp Glu Val Val Arg Arg Val Pro
755 760 765
Trp Glu Met Asn Phe Asp Ala Leu Gly Arg Leu Leu Ala Glu Phe Asp
770 775 780
Ala Ala Ala Ala Asp Leu Ala Pro Trp Ala Val Glu Glu Phe Arg Gly
785 790 795 800
Ala Arg Glu Leu Ile Gln Tyr Arg Met Gly Ser Ala Tyr Ala Arg Ala
805 810 815
Gly Gly Lys Ala Leu Phe Leu Phe Phe Phe Phe Pro Pro Pro Leu Ser
820 825 830
Ser Phe Leu Pro His Phe His Phe Phe Ile His His His His Ser Phe
835 840 845
Thr Lys Phe Phe Thr Ser Ser Ser Leu His Ser Tyr His Leu Phe Pro
850 855 860
Ser Ser Ile Tyr Ser Ile Pro Ser Ile Ser Pro Leu Tyr Pro His Ser
865 870 875 880
Ser Leu Ser Phe Pro Ser Ser Gln Phe Leu His Ile Phe Leu Ser Leu
885 890 895
Pro
(2) INFORMATION FOR SEQ ID NO: 39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 335 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
Val Met Pro Val Ala Pro Pro Pro Arg Gly Ala Gly Gly Arg Ala Pro
1 5 10 15
Cys Pro Pro Ala Leu Gly Pro Glu Ala Ile His Ala Arg Leu Glu Asp
20 25 30
Val Arg Ile Gln Ala Arg Arg Ala Ile Glu Ser Ala Ile Lys Glu Tyr
35 40 45
Phe His Arg Gly Ala Val Tyr Ser Ala Lys Ala Leu Gln Ala Ser Asp
50 55 60
Ser His Asp Cys Arg Phe His Val Ala Ser Ala Ala Val Val Pro Met
65 70 75 80
Val Gln Leu Leu Glu Ser Leu Pro Ala Phe Asp Gln His Thr Arg Asp
85 90 95
Val Ala Gln Arg Ala Ala Leu Pro Pro Pro Pro Pro Leu Ala Thr Ser
100 105 110
Pro Gln Ala Ile Leu Leu Arg Asp Leu Leu Gln Arg Gly Gln Thr Leu
115 120 125
Asp Ala Pro Glu Asp Leu Ala Ala Trp Leu Ser Val Leu Thr Asp Ala
130 135 140
Ala Thr Gln Gly Leu Ile Glu Arg Lys Pro Leu Glu Glu Leu Ala Arg
145 150 155 160
Ser Ile His Gly Ile Asn Asp Gln Gln Ala Arg Arg Ser Ser Gly Leu
165 170 175
Ala Glu Leu Gln Arg Phe Asp Ala Leu Asp Ala Ala Gln Gln Leu Asp
180 185 190
Ser Asp Ala Ala Phe Val Pro Ala Thr Gly Pro Ala Pro Tyr Val Asp
195 200 205
Gly Gly Gly Leu Ser Pro Glu Ala Thr Arg Met Ala Glu Asp Ala Leu
210 215 220
Arg Gln Ala Arg Ala Met Glu Ala Ala Lys Met Thr Ala Glu Leu Ala
225 230 235 240
Pro Glu Ala Arg Ser Arg Leu Arg Glu Arg Ala His Ala Leu Glu Ala
245 250 255
Met Leu Asn Asp Ala Arg Glu Arg Ala Lys Val Ala His Asp Ala Arg
260 265 270
Glu Lys Phe Leu His Lys Leu Gln Gly Val Leu Arg Pro Leu Pro Asp
275 280 285
Phe Val Gly Leu Lys Ala Cys Pro Ala Val Leu Ala Thr Leu Arg Ala
290 295 300
Ser Leu Pro Arg Gly Val Asp Arg Pro Gly Arg Cys Arg Pro Gly Ala
305 310 315 320
Pro Pro Arg Lys Ser Arg Arg Gly Cys Gly Arg Thr Cys Gly Gly
325 330 335
(2) INFORMATION FOR SEQ ID NO: 40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 800 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
Val Val Thr Gly Val Arg Asn Gln Phe Ala Thr Asp Leu Glu Pro Gly
1 5 10 15
Gly Ser Val Ser Cys Met Arg Ser Ser Leu Ser Phe Leu Ser Leu Leu
20 25 30
Phe Asp Val Gly Pro Arg Asp Val Leu Ser Ala Glu Ala Ile Glu Gly
35 40 45
Cys Leu Val Glu Gly Gly Glu Trp Thr Arg Ala Ala Ala Gly Ser Gly
50 55 60
Pro Pro Arg Met Cys Ser Ile Ile Glu Leu Pro Asn Phe Leu Glu Tyr
65 70 75 80
Pro Ala Arg Gly Leu Arg Cys Val Phe Ser Arg Val Tyr Gly Glu Val
85 90 95
Gly Phe Phe Gly Glu Pro Thr Ala Gly Leu Leu Glu Thr Gln Cys Pro
100 105 110
Ala His Thr Phe Phe Ala Gly Pro Trp Ala Met Arg Pro Leu Ser Tyr
115 120 125
Thr Leu Leu Thr Ile Gly Pro Leu Gly Met Gly Arg Asp Gly Asp Thr
130 135 140
Ala Tyr Leu Phe Asp Pro His Gly Leu Pro Ala Gly Thr Pro Ala Phe
145 150 155 160
Ile Ala Lys Val Arg Ala Gly Asp Val Tyr Pro Tyr Leu Thr Tyr Tyr
165 170 175
Ala His Asp Arg Pro Lys Val Arg Trp Ala Gly Ala Met Val Phe Phe
180 185 190
Val Pro Ser Gly Pro Gly Ala Val Ala Pro Ala Asp Leu Thr Ala Ala
195 200 205
Ala Leu His Leu Tyr Gly Ala Ser Glu Thr Tyr Leu Gln Asp Glu Pro
210 215 220
Phe Val Glu Arg Arg Val Ala Ile Thr His Pro Leu Arg Gly Glu Ile
225 230 235 240
Gly Gly Leu Gly Ala Leu Phe Val Gly Val Val Pro Arg Gly Asp Gly
245 250 255
Glu Gly Ser Gly Pro Val Val Pro Ala Leu Pro Ala Pro Thr His Val
260 265 270
Gln Thr Pro Arg Ala Asp Arg Pro Pro Glu Ala Pro Arg Gly Ala Ser
275 280 285
Gly Pro Pro Asn Thr Pro Gln Ala Gly His Pro Asn Arg Pro Pro Asp
290 295 300
Asp Val Trp Ala Ala Ala Leu Glu Gly Thr Pro Pro Ala Lys Pro Ser
305 310 315 320
Ala Pro Asp Ala Ala Ala Ser Gly Pro Pro His Ala Ala Pro Pro Pro
325 330 335
Gln Thr Pro Ala Gly Asp Ala Ala Glu Glu Ala Glu Asp Leu Arg Val
340 345 350
Leu Glu Val Gly Ala Val Pro Val Gly Cys His Arg Ala Arg Tyr Ser
355 360 365
Thr Gly Leu Pro Lys Arg Arg Arg Pro Thr Trp Thr Pro Pro Ser Ser
370 375 380
Val Glu Asp Leu Thr Ser Gly Glu Arg Pro Ala Pro Lys Ala Pro Pro
385 390 395 400
Ala Lys Ala Lys Lys Lys Ser Ala Pro Lys Lys Lys Ala Pro Val Ala
405 410 415
Ala Glu Val Pro Ala Ser Ser Pro Thr Pro Ile Ala Ala Thr Val Pro
420 425 430
Pro Ala Pro Asp Thr Pro Pro Gln Ser Gly Gln Gly Gly Gly Asp Asp
435 440 445
Gly Pro Asp Ser Ser Pro Ser Val Leu Glu Thr Leu Gly Ala Arg Arg
450 455 460
Pro Pro Glu Pro Pro Gly Ala Asp Leu Ala Gln Leu Phe Glu Val His
465 470 475 480
Pro Asn Val Ala Ala Thr Ala Val Arg Leu Ala Ala Arg Asp Ala Ala
485 490 495
Arg Glu Val Ala Ala Cys Ser Gln Leu Thr Ile Asn Ala Leu Arg Ser
500 505 510
Pro Tyr Pro Ala His Pro Gly Leu Leu Glu Leu Cys Val Ile Phe Phe
515 520 525
Phe Glu Arg Val Leu Ala Phe Leu Ile Glu Asn Gly Ala Arg Thr His
530 535 540
Thr Gln Ala Gly Val Ala Gly Pro Ala Ala Ala Leu Leu Asp Phe Thr
545 550 555 560
Leu Arg Met Pro Pro Arg Lys Thr Ala Val Gly Asp Phe Leu Ala Ser
565 570 575
Thr Arg Met Ser Leu Ala Asp Val Ala Ala His Arg Pro Leu Ile Gln
580 585 590
His Val Leu Asp Lys Asn Ser Gln Ile Gly Arg Leu Ala Lys Leu Val
595 600 605
Leu Val Ala Arg Asp Phe Ile Arg Glu Thr Asp Ala Phe Tyr Gly Asp
610 615 620
Leu Ala Asp Leu Asp Leu Gln Leu Arg Ala Ala Pro Pro Ala Asn Leu
625 630 635 640
Tyr Ala Arg Leu Gly Lys Trp Leu Leu Glu Arg Ser Arg Ala His Pro
645 650 655
Asn Thr Leu Phe Ala Pro Ala Thr Pro Thr His Pro Glu Pro Leu Leu
660 665 670
His Arg Ile Gln Ala His Phe Arg Lys Lys Met Arg Val Glu Ala Glu
675 680 685
Ala Arg Glu Met Arg Glu Ala Leu Tyr Arg Val Tyr Ser Val Ser Gln
690 695 700
Arg Ala Gly Pro Pro Asp Arg Asp Ala Arg Cys Pro Pro Pro Pro Gly
705 710 715 720
Arg Arg Arg Gln Gly Pro Val Pro Ala Arg Pro Gly Pro Arg Gly His
725 730 735
Pro Cys Ala Ala Gly Gly Arg Ala Asp Pro Gly Pro Pro Gly Asp Arg
740 745 750
Lys Arg Asp Gln Gly Val Leu Pro Pro Gly Ser Arg Ile Gln Arg Glu
755 760 765
Gly Pro Ala Gly Gln Arg Gln Pro Arg Leu Ser Val Ser Arg Gly Leu
770 775 780
Gly Arg Gly Arg Ala His Gly Pro Val Ala Gly Ile Ala Thr Gly Leu
785 790 795 800
(2) INFORMATION FOR SEQ ID NO: 41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 158 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
Met Asn Ala His Phe Ala Asn Glu Val Gln Tyr Asp Leu Thr Arg Asp
1 5 10 15
Pro Ser Ser Pro Ala Ser Leu Ile His Val Ile Ile Ser Ser Glu Cys
20 25 30
Leu Ala Ala Ala Gly Val Pro Leu Ser Ala Leu Val Arg Gly Arg Pro
35 40 45
Asp Gly Gly Ala Ala Ala Asn Phe Arg Val Glu Thr Gln Trp His Ala
50 55 60
Pro Gly Asp Cys Thr Pro Trp Arg Ser Ala Phe Ala Ala Tyr Val Pro
65 70 75 80
Ala Asp Ala Val Gly Ala Ile Leu Ala Pro Val Ile Pro Ala His Pro
85 90 95
Asp Leu Leu Pro Arg Val Pro Ser Ala Gly Gly Leu Phe Val Ser Leu
100 105 110
Pro Val Ala Cys Asp Ala Gln Gly Val Tyr Asp Pro Tyr Thr Val Ala
115 120 125
Ala Leu Arg Leu Ala Trp Gly Pro Trp Ala Thr Cys Ala Arg Val Leu
130 135 140
Leu Phe Ser Tyr Asp Glu Leu Thr Arg Tyr Arg Val Cys Gly
145 150 155
(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 423 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
Val Pro Glu Gly Ala Trp Val Gly Gly Ala Cys Ala Arg Pro Arg Gly
1 5 10 15
Pro Arg Ala His Val Arg Leu Tyr Ala Val Cys Phe Val Cys Pro Gln
20 25 30
Gly Ile Arg Gly Gln Asp Phe Asn Leu Leu Phe Val Asp Glu Ala Asn
35 40 45
Phe Ile Arg Pro Asp Ala Val Gln Thr Ile Met Gly Phe Leu Asn Gln
50 55 60
Ala Asn Cys Lys Ile Ile Phe Val Ser Ser Thr Asn Thr Gly Lys Ala
65 70 75 80
Ser Thr Ser Phe Leu Tyr Asn Leu Arg Gly Ala Ala Asp Glu Leu Leu
85 90 95
Asn Val Val Thr Tyr Ile Cys Asp Asp His Met Pro Arg Val Val Thr
100 105 110
His Thr Asn Ala Thr Ala Cys Ser Cys Tyr Ile Leu Asn Lys Pro Val
115 120 125
Phe Ile Thr Met Asp Gly Ala Val Arg Arg Thr Ala Asp Leu Phe Leu
130 135 140
Pro Asp Ser Phe Met Gln Glu Ile Ile Gly Gly Gln Ala Arg Glu Thr
145 150 155 160
Gly Asp Asp Arg Pro Val Leu Thr Lys Ser Ala Gly Glu Arg Phe Leu
165 170 175
Leu Tyr Arg Pro Ser Thr Thr Thr Asn Ser Gly Leu Met Ala Pro Glu
180 185 190
Leu Tyr Val Tyr Val Asp Pro Ala Phe Thr Ala Asn Thr Arg Ala Ser
195 200 205
Gly Thr Gly Ile Ala Val Val Gly Arg Tyr Arg Asp Asp Phe Ile Ile
210 215 220
Phe Ala Leu Glu His Phe Phe Leu Arg Ala Leu Thr Gly Ser Ala Pro
225 230 235 240
Ala Asp Ile Ala Arg Cys Val Val His Ser Leu Ala Gln Val Leu Ala
245 250 255
Leu His Pro Gly Ala Phe Arg Ser Val Arg Val Ala Val Glu Gly Asn
260 265 270
Ser Ser Gln Asp Ser Ala Val Ala Ile Ala Thr His Val His Thr Glu
275 280 285
Met His Arg Ile Leu Ala Ser Ala Gly Ala Asn Gly Pro Gly Pro Glu
290 295 300
Leu Leu Phe Tyr His Cys Glu Pro Pro Gly Gly Ala Val Leu Tyr Pro
305 310 315 320
Phe Phe Leu Leu Asn Lys Gln Lys Thr Pro Ala Phe Glu Tyr Phe Ile
325 330 335
Lys Lys Phe Asn Ser Gly Gly Val Met Ala Ser Gln Glu Leu Val Ser
340 345 350
Val Thr Val Arg Leu Gln Thr Asp Pro Val Glu Tyr Leu Ser Glu Gln
355 360 365
Leu Asn Asn Leu Ile Glu Thr Val Ser Pro Asn Thr Asp Val Arg Met
370 375 380
Tyr Ser Gly Lys Arg Asn Gly Ala Ala Asp Asp Leu Met Val Ala Val
385 390 395 400
Ile Met Ala Ile Tyr Leu Ala Ala Pro Thr Gly Ile Pro Pro Ala Phe
405 410 415
Phe Pro Ile Thr Arg Thr Ser
420
(2) INFORMATION FOR SEQ ID NO: 43:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 355 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
Val Leu Leu Ser Pro Ala Pro Pro Pro Leu Pro His Gly Arg Cys Pro
1 5 10 15
Pro Ser Leu Phe His His Arg Pro Gly Cys Val Ser Gly Pro Pro Ala
20 25 30
Pro Pro Arg Ser Gly Val Ser Arg Pro Gly Ala Met Ile Thr Asp Cys
35 40 45
Phe Glu Ala Asp Ile Ala Ile Pro Ser Gly Ile Ser Arg Pro Asp Ala
50 55 60
Ala Ala Leu Gln Arg Cys Glu Gly Arg Val Val Phe Leu Pro Thr Ile
65 70 75 80
Arg Arg Gln Leu Ala Asp Val Ala His Glu Ser Phe Val Ser Gly Gly
85 90 95
Val Ser Pro Asp Thr Leu Gly Leu Leu Leu Ala Tyr Arg Arg Arg Phe
100 105 110
Pro Ala Val Ile Thr Arg Val Leu Pro Thr Arg Ile Val Ala Cys Pro
115 120 125
Val Asp Leu Gly Leu Thr His Ala Gly Thr Val Asn Leu Arg Asn Thr
130 135 140
Ser Pro Val Asp Leu Cys Asn Gly Asp Pro Val Ser Leu Val Pro Pro
145 150 155 160
Val Phe Glu Gly Gln Ala Thr Asp Val Arg Leu Glu Ser Leu Asp Leu
165 170 175
Thr Leu Arg Phe Pro Val Pro Leu Pro Thr Pro Leu Ala Arg Glu Ile
180 185 190
Val Ala Arg Leu Val Arg Ile Arg Asp Leu Asn Pro Asp Pro Arg Thr
195 200 205
Pro Gly Glu Leu Pro Asp Leu Asn Val Leu Tyr Tyr Asn Gly Ala Arg
210 215 220
Leu Ser Leu Val Ala Asp Val Gln Gln Leu Ala Ser Val Asn Thr Glu
225 230 235 240
Leu Arg Ser Leu Val Leu Asn Met Val Tyr Ser Ile Thr Glu Gly Thr
245 250 255
Thr Leu Ile Leu Thr Leu Ile Pro Arg Leu Leu Ala Leu Ser Ala Gln
260 265 270
Asp Gly Tyr Val Asn Ala Leu Leu Gln Met Gln Ser Val Thr Arg Glu
275 280 285
Ala Ala Gln Leu Ile His Pro Glu Ala Pro Met Leu Met Gln Asp Gly
290 295 300
Glu Arg Arg Leu Pro Leu Tyr Glu Ala Leu Val Ala Trp Leu Ala His
305 310 315 320
Ala Gly Gln Leu Gly Asp Ile Leu Ala Pro Ala Val Arg Val Cys Thr
325 330 335
Phe Asp Gly Ala Ala Val Val Gln Ser Gly Asp Met Ala Pro Val Ile
340 345 350
Arg Tyr Pro
355
(2) INFORMATION FOR SEQ ID NO: 44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1382 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
Val Trp Glu Gly Leu Gly Leu Pro Glu Leu Gly Leu Met Glu Pro Ala
1 5 10 15
Asn Pro Pro Arg Asn Pro Met Ala Ala Pro Ala Arg Asp Pro Pro Gly
20 25 30
Tyr Arg Tyr Ala Ala Ala Met Val Pro Thr Gly Ser Ile Leu Ser Thr
35 40 45
Ile Glu Val Ala Ser His Arg Arg Leu Phe Asp Phe Phe Ala Arg Val
50 55 60
Arg Ser Asp Glu Asn Ser Leu Tyr Asp Val Glu Phe Asp Ala Leu Leu
65 70 75 80
Gly Ser Tyr Cys Asn Thr Leu Ser Leu Val Arg Phe Leu Glu Leu Gly
85 90 95
Leu Ser Val Ala Cys Val Cys Thr Lys Phe Pro Glu Leu Ala Tyr Met
100 105 110
Asn Glu Gly Arg Val Gln Phe Glu Val His Gln Pro Leu Ile Ala Arg
115 120 125
Asp Gly Pro His Pro Val Glu Gln Pro Val His Asn Tyr Met Thr Lys
130 135 140
Val Ile Asp Arg Arg Ala Leu Asn Ala Ala Phe Ser Leu Ala Thr Glu
145 150 155 160
Ala Ile Ala Leu Leu Thr Gly Glu Ala Leu Asp Gly Thr Gly Ile Ser
165 170 175
Leu His Arg Gln Leu Arg Ala Ile Gln Gln Leu Ala Arg Asn Val Gln
180 185 190
Ala Val Leu Gly Ala Phe Glu Arg Gly Thr Ala Asp Gln Met Leu His
195 200 205
Val Leu Leu Glu Lys Ala Pro Pro Leu Ala Leu Leu Leu Pro Met Gln
210 215 220
Arg Tyr Leu Asp Asn Gly Arg Leu Ala Thr Arg Val Ala Arg Ala Thr
225 230 235 240
Leu Val Ala Glu Leu Lys Arg Ser Phe Cys Asp Thr Ser Phe Phe Leu
245 250 255
Gly Lys Ala Gly His Arg Arg Glu Ala Ile Glu Ala Trp Leu Val Asp
260 265 270
Leu Thr Thr Ala Thr Gln Pro Ser Val Ala Val Pro Arg Leu Thr His
275 280 285
Ala Asp Thr Arg Gly Arg Pro Val Asp Gly Val Leu Val Thr Thr Ala
290 295 300
Ala Ile Lys Gln Arg Leu Leu Gln Ser Phe Leu Lys Val Glu Asp Thr
305 310 315 320
Glu Ala Asp Val Pro Val Thr Tyr Gly Glu Met Val Leu Asn Gly Ala
325 330 335
Asn Leu Val Thr Ala Leu Val Met Gly Lys Ala Val Arg Ser Leu Asp
340 345 350
Asp Val Gly Arg His Leu Leu Glu Met Gln Glu Glu Gln Leu Glu Ala
355 360 365
Asn Arg Glu Thr Leu Asp Glu Leu Glu Ser Ala Pro Gln Thr Thr Arg
370 375 380
Val Arg Ala Asp Leu Val Ala Ile Gly Asp Arg Leu Val Phe Leu Glu
385 390 395 400
Ala Leu Glu Lys Arg Ile Tyr Ala Ala Thr Asn Val Pro Tyr Pro Leu
405 410 415
Val Gly Ala Met Asp Leu Thr Phe Val Leu Pro Leu Gly Leu Phe Asn
420 425 430
Pro Ala Met Glu Arg Phe Ala Ala His Ala Gly Asp Leu Val Pro Ala
435 440 445
Pro Gly His Pro Glu Pro Arg Ala Phe Pro Pro Arg Gln Leu Phe Phe
450 455 460
Trp Gly Lys Asp His Gln Val Leu Arg Leu Ser Met Glu Asn Ala Val
465 470 475 480
Gly Thr Val Cys His Pro Ser Leu Met Asn Ile Asp Ala Ala Val Gly
485 490 495
Gly Val Asn His Asp Pro Val Glu Ala Ala Asn Pro Tyr Gly Ala Tyr
500 505 510
Val Ala Ala Pro Ala Gly Pro Gly Ala Asp Met Gln Gln Arg Phe Leu
515 520 525
Asn Ala Trp Arg Gln Arg Leu Ala His Gly Arg Val Arg Trp Val Ala
530 535 540
Glu Cys Gln Met Thr Ala Glu Gln Phe Met Gln Pro Asp Asn Ala Asn
545 550 555 560
Leu Ala Leu Glu Leu His Pro Ala Phe Asp Phe Phe Ala Gly Val Ala
565 570 575
Asp Val Glu Leu Pro Gly Gly Glu Val Pro Pro Ala Gly Pro Gly Ala
580 585 590
Ile Gln Ala Thr Trp Arg Val Val Asn Gly Asn Leu Pro Leu Ala Leu
595 600 605
Cys Pro Val Ala Phe Arg Asp Arg Leu Glu Leu Gly Val Gly Arg His
610 615 620
Ala Met Ala Pro Ala Thr Ile Ala Ala Val Arg Gly Ala Phe Glu Asp
625 630 635 640
Arg Ser Tyr Pro Ala Val Phe Tyr Leu Leu Gln Ala Ala Ile His Gly
645 650 655
Ser Glu His Val Phe Cys Ala Arg Leu Val Thr Gln Cys Ile Thr Ser
660 665 670
Tyr Trp Asn Asn Thr Arg Cys Ala Ala Phe Val Asn Asp Tyr Ser Leu
675 680 685
Val Ser Tyr Ile Val Thr Tyr Leu Gly Gly Asp Leu Pro Glu Glu Cys
690 695 700
Met Ala Val Tyr Arg Asp Leu Val Ala His Val Glu Ala Gln Leu Val
705 710 715 720
Asp Asp Phe Thr Leu Pro Gly Pro Glu Leu Gly Gly Gln Ala Gln Ala
725 730 735
Glu Leu Asn His Leu Met Arg Asp Pro Ala Leu Leu Pro Pro Leu Val
740 745 750
Trp Asp Cys Asp Gly Leu Met Arg His Ala Ala Leu Asp Arg His Arg
755 760 765
Asp Cys Arg Ile Asp Ala Gly Gly His Glu Pro Val Tyr Ala Ala Ala
770 775 780
Cys Asn Val Ala Thr Ala Asp Phe Asn Arg Asn Asp Gly Arg Leu Leu
785 790 795 800
His Asn Thr Gln Ala Arg Ala Ala Asp Ala Ala Asp Asp Arg Pro His
805 810 815
Arg Pro Ala Asp Trp Thr Val His His Lys Ile Tyr Tyr Tyr Val Leu
820 825 830
Val Pro Ala Phe Ser Arg Gly Arg Cys Cys Thr Ala Gly Val Arg Phe
835 840 845
Asp Arg Val Tyr Ala Thr Leu Gln Asn Met Val Val Pro Glu Ile Ala
850 855 860
Pro Gly Glu Glu Cys Pro Ser Asp Pro Val Thr Asp Pro Ala His Pro
865 870 875 880
Leu His Pro Ala Asn Leu Val Ala Asn Thr Val Asn Ala Met Phe His
885 890 895
Asn Gly Arg Val Val Val Asp Gly Pro Ala Met Leu Thr Leu Gln Val
900 905 910
Leu Ala His Asn Met Ala Glu Arg Thr Thr Ala Leu Leu Cys Ser Ala
915 920 925
Ala Pro Asp Ala Gly Ala Asn Thr Ala Ser Thr Ala Asn Met Arg Ile
930 935 940
Phe Asp Gly Ala Leu His Ala Gly Val Leu Leu Met Ala Pro Gln His
945 950 955 960
Leu Asp His Thr Ile Gln Asn Gly Glu Tyr Phe Tyr Val Leu Pro Val
965 970 975
His Ala Leu Phe Ala Gly Ala Asp His Val Ala Asn Ala Pro Asn Phe
980 985 990
Pro Pro Ala Leu Arg Asp Leu Ala Arg His Val Pro Leu Val Pro Pro
995 1000 1005
Ala Leu Gly Ala Asn Tyr Phe Ser Ser Ile Arg Gln Pro Val Val Gln
1010 1015 1020
His Ala Arg Glu Ser Ala Ala Gly Glu Asn Ala Leu Thr Tyr Ala Leu
1025 1030 1035 1040
Met Ala Gly Tyr Phe Lys Met Ser Pro Val Tyr His Gln Leu Lys Thr
1045 1050 1055
Gly Leu His Pro Gly Phe Gly Phe Thr Val Val Arg Gln Asp Arg Phe
1060 1065 1070
Val Thr Glu Asn Val Leu Phe Ser Ala Ser Glu Ala Tyr Phe Leu Gly
1075 1080 1085
Gln Leu Gln Val Ala Arg His Glu Thr Gly Gly Gly Val Ser Phe Thr
1090 1095 1100
Leu Thr Gln Pro Arg Gly Asn Val Asp Leu Gly Val Gly Tyr Thr Ala
1105 1110 1115 1120
Val Ala Ala Thr Ala Thr Val Arg Asn Pro Val Thr Asp Met Gly Asn
1125 1130 1135
Leu Pro Gln Asn Phe Tyr Leu Gly Arg Gly Ala Pro Pro Leu Leu Asn
1140 1145 1150
Asn Ala Ala Ala Val Tyr Leu Arg Asn Ala Val Val Ala Gly Asn Arg
1155 1160 1165
Leu Gly Pro Ala Gln Pro Leu Pro Val Phe Gly Cys Ala Gln Val Pro
1170 1175 1180
Arg Arg Ala Gly Met Asp His Gly Gln Asp Ala Val Cys Glu Phe Ile
1185 1190 1195 1200
Ala Thr Pro Val Ala Thr Asp Ile Asn Tyr Phe Arg Arg Pro Cys Asn
1205 1210 1215
Pro Arg Gly Arg Ala Ala Gly Gly Val Tyr Ala Gly Asp Lys Glu Gly
1220 1225 1230
Asp Val Ile Ala Leu Met Tyr Asp His Gly Gln Ser Asp Pro Ala Arg
1235 1240 1245
Pro Phe Ala Ala Thr Ala Asn Pro Trp Ala Ser Gln Arg Phe Ser Tyr
1250 1255 1260
Gly Asp Leu Leu Tyr Asn Gly Ala Tyr His Leu Asn Gly Asp Val Leu
1265 1270 1275 1280
Ser Pro Cys Phe Lys Phe Phe Thr Ala Ala Asp Ile Thr Ala Lys His
1285 1290 1295
Arg Cys Leu Glu Arg Leu Ile Val Glu Thr Gly Ser Ala Val Ser Thr
1300 1305 1310
Ala Thr Ala Ala Ser Asp Val Gln Phe Lys Arg Pro Pro Gly Cys Arg
1315 1320 1325
Glu Leu Val Glu Asp Pro Cys Gly Leu Phe Gln Glu Ala Tyr Pro Ile
1330 1335 1340
Thr Cys Ala Ser Asp Pro Ala Leu Leu Arg Ser Ala Arg Asp Gly Glu
1345 1350 1355 1360
Ala His Ala Arg Glu Thr His Phe Thr Gln Tyr Leu Ile Tyr Asp Asp
1365 1370 1375
Leu Lys Gly Leu Ser Leu
1380
(2) INFORMATION FOR SEQ ID NO: 45:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 222 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
Met Thr Met Arg Asp Asp Val Pro Leu Leu Asp Arg Glu Leu Val Tyr
1 5 10 15
Glu Ala Ala Cys Gly Gly Glu Asp Gly Glu Leu Pro Leu Asp Glu Gln
20 25 30
Phe Ser Leu Ser Ser Tyr Gly Thr Ser Asp Phe Phe Val Ser Ser Ala
35 40 45
Tyr Ser Arg Leu Pro Pro His Thr Gln Pro Val Phe Ser Lys Arg Val
50 55 60
Val Met Phe Ala Trp Ser Phe Leu Val Leu Lys Pro Leu Glu Leu Val
65 70 75 80
Ala Ala Gly Met Tyr Tyr Gly Trp Thr Gly Arg Ala Val Ala Pro Ala
85 90 95
Cys Ile Ile Ala Ala Val Leu Ala Tyr Tyr Val Thr Trp Leu Ala Arg
100 105 110
Ala Leu Leu Leu Tyr Val Asn Ile Lys Arg Asp Arg Leu Pro Leu Ser
115 120 125
Pro Pro Val Phe Trp Gly Leu Cys Val Ile Met Gly Gly Ala Ala Leu
130 135 140
Cys Ala Leu Val Ala Ala Ala His Glu Thr Phe Ser Pro Asp Gly Leu
145 150 155 160
Phe His Trp Ile Thr Ala Ser Gln Leu Leu Pro Arg Thr Asp Pro Leu
165 170 175
Arg Ala Arg Ser Leu Gly Ile Ala Cys Ala Ala Gly Ala Ala Met Trp
180 185 190
Val Ala Ala Ala Asp Cys Phe Ala Ala Phe Thr Asn Phe Phe Leu Ala
195 200 205
Arg Phe Trp Thr Arg Ala Ile Leu Lys Ala Pro Val Ala Phe
210 215 220
(2) INFORMATION FOR SEQ ID NO: 46:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 627 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
Val Gly Arg Gln Gly Glu Arg Trp Val Gly Gly Gly Asn Glu Lys Asn
1 5 10 15
Thr Gln Arg Ala Thr Ser Gly Met Arg Pro Glu Leu Ser Leu Lys Gly
20 25 30
Arg Pro Cys Val Thr Glu Ala Val Val Cys Pro Ser Thr Asp Ala Ala
35 40 45
Ile His Ser Gly Gly Ser Ser Ser Val Arg Pro Gln Pro Tyr Ala Arg
50 55 60
Ala Ala Arg Ala Arg Ala Thr His Gly Ser Arg Ser Arg His Arg Gln
65 70 75 80
Pro Leu Leu Pro Pro Pro Ser Ser His His Pro Thr Ile Pro Pro Pro
85 90 95
Pro Ser Pro Pro Arg Gly Ser Pro Ala Met Glu Leu Thr Tyr Ala Thr
100 105 110
Thr Leu His His Arg Asp Val Val Phe Tyr Val Thr Ala Asp Arg Asn
115 120 125
Arg Ala Tyr Phe Val Cys Gly Gly Ser Val Tyr Ser Val Gly Arg Pro
130 135 140
Arg Asp Ser Gln Pro Gly Glu Ile Ala Lys Phe Gly Leu Val Val Arg
145 150 155 160
Gly Thr Gly Pro Lys Asp Arg Met Val Ala Asn Tyr Val Arg Ser Glu
165 170 175
Leu Arg Gln Arg Gly Leu Arg Glu Val Arg Pro Val Gly Glu Asp Glu
180 185 190
Val Phe Leu Asp Ser Val Cys Leu Leu Asn Pro Asn Val Ser Ser Asp
195 200 205
Val Ile Asn Thr Asn Asp Val Glu Val Leu Asp Glu Cys Leu Ala Glu
210 215 220
Tyr Cys Thr Ser Leu Gln Thr Ser Pro Gly Val Leu Val Thr Gly Val
225 230 235 240
Arg Val Arg Ala Arg Asp Arg Val Ile Glu Leu Phe Glu His Pro Ala
245 250 255
Ile Val Asn Ile Ser Ser Arg Phe Ala Tyr Thr Pro Ser Pro Tyr Val
260 265 270
Phe Ala Gln Ala His Leu Pro Arg Leu Pro Ser Ser Leu Glu Pro Leu
275 280 285
Val Ser Gly Leu Phe Asp Gly Ile Pro Ala Pro Arg Gln Pro Leu Asp
290 295 300
Ala Arg Asp Arg Arg Thr Asp Val Val Ile Thr Gly Thr Arg Ala Pro
305 310 315 320
Arg Pro Met Ala Gly Thr Gly Ala Gly Gly Ala Gly Ala Lys Arg Ala
325 330 335
Thr Val Ser Glu Phe Val Gln Val Lys His Ile Asp Arg Val Val Ser
340 345 350
Pro Ser Val Ser Ser Ala Pro Pro Pro Ser Ala Pro Asp Ala Ser Leu
355 360 365
Pro Pro Pro Gly Leu Gln Glu Ala Ala Pro Pro Gly Pro Pro Leu Arg
370 375 380
Glu Leu Trp Trp Val Phe Tyr Ala Gly Asp Arg Ala Leu Glu Glu Pro
385 390 395 400
His Ala Glu Ser Gly Leu Thr Arg Glu Glu Val Arg Ala Val His Gly
405 410 415
Phe Arg Glu Gln Ala Trp Lys Leu Phe Gly Ser Val Gly Ala Pro Arg
420 425 430
Ala Phe Leu Gly Ala Ala Leu Ser Pro Thr Gln Lys Leu Ala Val Tyr
435 440 445
Tyr Tyr Leu Ile His Arg Glu Arg Arg Met Ser Pro Phe Pro Ala Leu
450 455 460
Val Arg Leu Val Gly Arg Tyr Ile Gln Arg His Gly Val Pro Ala Pro
465 470 475 480
Asp Glu Pro Thr Leu Ala Asp Ala Met Asn Gly Leu Phe Arg Asp Ala
485 490 495
Ala Gly Thr Val Ala Glu Gln Leu Leu Met Phe Asp Leu Leu Pro Pro
500 505 510
Lys Asp Val Pro Val Gly Ser Asp Ala Arg Ala Asp Ser Ala Ala Leu
515 520 525
Leu Arg Phe Val Asp Ser Gln Arg Leu Thr Pro Gly Gly Ser Val Ser
530 535 540
Pro Glu His Val Met Tyr Leu Gly Ala Phe Leu Gly Val Leu Tyr Ala
545 550 555 560
Gly His Gly Arg Leu Ala Ala Ala Thr His Thr Ala Arg Leu Thr Gly
565 570 575
Val Thr Ser Leu Val Leu Thr Val Gly Asp Val Asp Arg Met Ser Ala
580 585 590
Phe Asp Arg Gly Pro Ala Gly Ala Ala Gly Arg Thr Arg Thr Ala Gly
595 600 605
Tyr Leu Asp Ala Leu Leu Thr Val Cys Leu Ala Arg Ala Gln His Gly
610 615 620
Gln Ser Val
625
(2) INFORMATION FOR SEQ ID NO: 47:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 592 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
Val Tyr Leu Ser Pro Ser Ala Leu Lys Trp Pro Val Gly Val Trp Thr
1 5 10 15
Thr Gly Gly Leu Ala Phe Gly Cys Asp Ala Ala Leu Val Arg Ala Arg
20 25 30
Tyr Gly Lys Gly Phe Met Gly Val Val Ile Ser Met Arg Asp Ser Pro
35 40 45
Pro Ala Glu Ile Ile Val Val Pro Ala Asp Lys Thr Leu Ala Arg Val
50 55 60
Gly Asn Pro Thr Asp Glu Asn Ala Pro Ala Val Leu Pro Gly Pro Pro
65 70 75 80
Ala Gly Pro Arg Tyr Arg Val Phe Val Leu Gly Ala Pro Thr Pro Ala
85 90 95
Asp Asn Gly Ser Ala Leu Asp Ala Leu Arg Arg Val Ala Gly Tyr Pro
100 105 110
Glu Glu Ser Thr Asn Tyr Ala Gln Tyr Met Ser Arg Ala Tyr Ala Glu
115 120 125
Phe Leu Gly Glu Asp Pro Gly Ser Gly Thr Asp Ala Arg Pro Ser Leu
130 135 140
Phe Trp Arg Leu Ala Gly Leu Leu Ala Ser Ser Gly Phe Ala Phe Val
145 150 155 160
Asn Ala Ala His Ala His Asp Ala Ile Arg Leu Ser Asp Leu Leu Gly
165 170 175
Phe Leu Ala His Ser Arg Val Leu Ala Gly Leu Ala Arg Ala Ala Gly
180 185 190
Cys Ala Ala Asp Ser Val Phe Leu Asn Val Ser Val Leu Asp Pro Ala
195 200 205
Ala Arg Leu Arg Leu Glu Ala Arg Leu Gly His Leu Val Ala Ala Ile
210 215 220
Arg Glu Gln Ser Leu Ala Ala His Ala Leu Gly Tyr Gln Leu Ala Phe
225 230 235 240
Val Leu Asp Ser Pro Ala Ala Tyr Gly Ala Val Ala Pro Ser Ala Ala
245 250 255
Arg Leu Ile Asp Ala Leu Tyr Ala Glu Phe Leu Gly Gly Arg Ala Leu
260 265 270
Thr Ala Pro Met Val Arg Arg Ala Leu Phe Tyr Ala Thr Ala Val Leu
275 280 285
Arg Ala Pro Phe Leu Ala Gly Ala Pro Ser Ala Glu Gln Arg Glu Arg
290 295 300
Ala Arg Arg Gly Leu Leu Ile Thr Thr Ala Leu Cys Thr Ser Asp Val
305 310 315 320
Ala Ala Ala Thr His Ala Asp Leu Arg Ala Ala Arg Thr Asp His Gln
325 330 335
Lys Asn Leu Phe Trp Leu Pro Asp His Phe Ser Pro Cys Ala Ala Ser
340 345 350
Leu Arg Phe Asp Leu Ala Glu Gly Gly Phe Ile Leu Asp Ala Met Ala
355 360 365
Thr Arg Ser Asp Ile Pro Ala Asp Val Met Ala Gln Gln Thr Arg Gly
370 375 380
Val Ala Ser Val Leu Thr Arg Trp Ala His Tyr Asn Ala Leu Ile Arg
385 390 395 400
Ala Phe Val Pro Glu Ala Thr His Gln Cys Ser Gly Pro Ser His Asn
405 410 415
Ala Glu Pro Arg Ile Leu Val Pro Ile Thr His Asn Ala Ser Tyr Val
420 425 430
Val Thr His Thr Pro Leu Pro Arg Gly Ile Gly Tyr Lys Leu Thr Gly
435 440 445
Val Asp Val Arg Arg Pro Leu Phe Ile Thr Tyr Leu Thr Ala Thr Cys
450 455 460
Glu Gly His Ala Arg Glu Ile Glu Pro Lys Arg Leu Val Arg Thr Glu
465 470 475 480
Asn Arg Arg Asp Leu Gly Leu Val Gly Ala Val Phe Leu Arg Tyr Thr
485 490 495
Pro Ala Gly Glu Val Met Ser Val Leu Leu Val Asp Thr Asp Ala Thr
500 505 510
Gln Gln Gln Leu Ala Gln Gly Pro Val Ala Gly Thr Pro Asn Val Phe
515 520 525
Ser Ser Asp Val Pro Ser Val Leu Leu Phe Pro Asn Gly Thr Val Ile
530 535 540
His Leu Leu Ala Phe Asp Thr Leu Pro Ile Ala Thr Ile Ala Pro Gly
545 550 555 560
Phe Leu Ala Ala Ser Ala Leu Gly Val Val Met Ile Thr Ala Ala Gly
565 570 575
Ile Leu Arg Val Val Arg Thr Cys Val Pro Phe Leu Trp Arg Arg Glu
580 585 590
(2) INFORMATION FOR SEQ ID NO: 48:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 315 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
Val Ser Ile Ser Ala Gly Val Arg Gly Gln Gly Trp His Arg Ile Ser
1 5 10 15
Thr Pro Pro Lys Asn Gly Ala Gly Arg Ser Val Leu Val Phe Gly Leu
20 25 30
Val Leu Pro Leu Cys Phe Tyr Pro His Pro Thr Pro Ser Phe Gly Pro
35 40 45
Arg Leu Arg Gln Gln Arg Ala Ser Asp Ser Leu Arg Gly Ala Glu Pro
50 55 60
Leu Trp Ala Val Gly Thr Asp Thr Pro Pro Ser Ala Asp Trp Gln Pro
65 70 75 80
Gly Arg Thr Thr Met Gly Pro Gly Leu Trp Val Val Met Gly Val Leu
85 90 95
Val Gly Val Ala Gly Gly His Asp Thr Tyr Trp Thr Glu Gln Ile Asp
100 105 110
Pro Trp Phe Leu His Gly Leu Gly Leu Ala Arg Thr Tyr Trp Arg Asp
115 120 125
Thr Asn Thr Gly Arg Leu Trp Leu Pro Asn Thr Pro Asp Ala Ser Asp
130 135 140
Pro Gln Arg Gly Arg Leu Ala Pro Pro Gly Glu Leu Asn Leu Thr Thr
145 150 155 160
Ala Ser Val Pro Met Leu Arg Trp Tyr Ala Glu Arg Phe Cys Phe Val
165 170 175
Leu Val Thr Thr Ala Glu Phe Pro Arg Asp Pro Gly Gln Leu Leu Tyr
180 185 190
Ile Pro Lys Thr Tyr Leu Leu Gly Arg Pro Arg Asn Ala Ser Leu Pro
195 200 205
Glu Leu Pro Glu Ala Gly Pro Thr Ser Arg Pro Pro Ala Glu Val Thr
210 215 220
Gln Leu Lys Gly Leu Ser His Asn Pro Gly Ala Ser Ala Leu Leu Arg
225 230 235 240
Ser Arg Ala Trp Val Thr Phe Ala Ala Ala Pro Asp Arg Glu Gly Leu
245 250 255
Thr Phe Pro Arg Gly Asp Asp Gly Ala Thr Glu Arg His Pro Asp Gly
260 265 270
Arg Arg Asn Ala Pro Pro Pro Gly Pro Pro Ala Gly Thr Pro Arg His
275 280 285
Pro Thr Thr Asn Leu Ser Ile Ala His Leu His Asn Ala Ser Val Thr
290 295 300
Trp Leu Ala Arg Leu Leu Arg Thr Pro Gly Arg
305 310 315
(2) INFORMATION FOR SEQ ID NO: 49:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 370 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
Met Ala Ser His Ala Gly Gln Gln His Ala Pro Ala Phe Gly Gln Ala
1 5 10 15
Ala Arg Ala Ser Gly Pro Thr Asp Gly Arg Ala Ala Ser Arg Pro Ser
20 25 30
His Arg Gln Gly Ala Ser Asp Pro Glu Leu Pro Thr Leu Leu Arg Val
35 40 45
Tyr Ile Asp Gly Pro His Gly Val Gly Lys Thr Thr Thr Ser Ala Gln
50 55 60
Leu Met Glu Ala Leu Gly Pro Arg Asp Asn Ile Val Tyr Val Pro Glu
65 70 75 80
Pro Met Thr Tyr Trp Gln Val Leu Gly Ala Ser Glu Thr Leu Thr Asn
85 90 95
Ile Tyr Asn Thr Gln His Arg Leu Asp Arg Gly Glu Ile Ser Ala Gly
100 105 110
Glu Ala Ala Val Val Met Thr Ser Ala Gln Ile Thr Met Ser Thr Pro
115 120 125
Tyr Ala Ala Thr Asp Ala Val Leu Ala Pro His Ile Gly Gly Glu Ala
130 135 140
Val Gly Pro Gln Ala Pro Pro Pro Ala Leu Thr Leu Val Phe Asp Arg
145 150 155 160
His Pro Ile Ala Ser Leu Leu Cys Tyr Pro Ala Ala Arg Tyr Leu Met
165 170 175
Gly Ser Met Thr Pro Gln Ala Val Leu Ala Phe Val Met Pro Pro Thr
180 185 190
Ala Pro Gly Thr Asn Leu Val Leu Gly Val Leu Pro Glu Ala Glu His
195 200 205
Ala Asp Arg Leu Ala Arg Arg Gln Arg Pro Gly Glu Arg Leu Asp Leu
210 215 220
Ala Met Leu Ser Ala Ile Arg Arg Val Tyr Asp Leu Leu Ala Asn Thr
225 230 235 240
Val Arg Tyr Leu Gln Arg Gly Gly Arg Trp Arg Glu Asp Trp Gly Arg
245 250 255
Leu Thr Gly Val Ala Ala Ala Thr Pro Arg Pro Asp Pro Glu Asp Gly
260 265 270
Ala Gly Ser Leu Pro Arg Ile Glu Asp Thr Leu Phe Ala Leu Phe Arg
275 280 285
Val Pro Glu Leu Leu Ala Pro Asn Gly Asp Leu Tyr His Ile Phe Ala
290 295 300
Trp Val Leu Asp Val Leu Ala Asp Arg Leu Leu Pro Met His Leu Phe
305 310 315 320
Val Leu Asp Tyr Asp Gln Ser Pro Val Gly Cys Arg Asp Ala Leu Leu
325 330 335
Arg Leu Thr Ala Gly Met Ile Pro Thr Arg Val Thr Thr Ala Gly Ser
340 345 350
Ile Ala Glu Ile Arg Asp Leu Ala Arg Thr Phe Ala Arg Glu Val Gly
355 360 365
Gly Val
370
(2) INFORMATION FOR SEQ ID NO: 50:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 352 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
Val Leu Arg Val Val Asp Val Arg Gln Gly Leu Gly Gly Pro Gln His
1 5 10 15
Leu Pro Val Ser His Arg Leu Gly Asp Val Asp Asp Ile Val Ala Arg
20 25 30
Pro Gln Gly Leu His Gln Leu Arg Gly Gly Gly Gly Leu Pro His Pro
35 40 45
Val Gly Ser Val Tyr Ile Asn Pro Gln Gln Arg Gly Gln Leu Arg Ile
50 55 60
Pro Ala Gly Phe Gly Gly Pro Leu Ala Met Ala Arg Thr Gly Arg Arg
65 70 75 80
Ala Ala Val Gly Arg Pro Ala Arg Thr Ser Ser Leu Thr Glu Arg Arg
85 90 95
Arg Val Leu Leu Ala Gly Val Arg Ser His Thr Arg Phe Tyr Lys Ala
100 105 110
Phe Ala Arg Glu Val Arg Glu Phe Asn Ala Thr Arg Ile Cys Gly Thr
115 120 125
Leu Leu Thr Leu Met Ser Gly Ser Leu Gln Gly Arg Ser Leu Phe Glu
130 135 140
Ala Thr Arg Val Thr Leu Ile Cys Glu Val Asp Leu Gly Pro Arg Arg
145 150 155 160
Pro Asp Cys Ile Cys Val Phe Glu Phe Ala Asn Asp Lys Thr Leu Gly
165 170 175
Gly Val Cys Val Ile Leu Lys Thr Cys Lys Ser Ile Ser Ser Gly Asp
180 185 190
Thr Ala Ser Lys Arg Glu Gln Arg Thr Thr Gly Met Lys Gln Leu Arg
195 200 205
His Ser Leu Lys Leu Leu Gln Ser Leu Ala Pro Pro Gly Asp Lys Val
210 215 220
Val Tyr Leu Cys Pro Ile Leu Val Phe Val Ala Gln Arg Thr Leu Arg
225 230 235 240
Val Ser Arg Val Thr Arg Leu Val Pro Gln Lys Ile Ser Gly Asn Ile
245 250 255
Thr Ala Ala Val Arg Met Leu Gln Ser Leu Ser Thr Tyr Ala Val Pro
260 265 270
Pro Glu Pro Gln Thr Arg Arg Ser Arg Arg Arg Val Ala Ala Thr Ala
275 280 285
Arg Pro Gln Arg Pro Pro Ser Pro Thr Arg Asp Pro Glu Gly Thr Ala
290 295 300
Gly His Pro Ala Pro Pro Glu Ser Asp Pro Pro Ser Pro Gly Val Val
305 310 315 320
Gly Val Ala Ala Glu Gly Gly Gly Val Leu Gln Lys Ile Ala Ala Leu
325 330 335
Phe Cys Val Pro Val Ala Ala Lys Ser Arg Pro Arg Thr Lys Thr Glu
340 345 350
(2) INFORMATION FOR SEQ ID NO: 51:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 514 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
Met Asp Pro Tyr Tyr Pro Phe Asp Ala Leu Asp Val Trp Glu His Arg
1 5 10 15
Arg Phe Ile Val Ala Asp Ser Arg Ser Phe Ile Thr Pro Glu Phe Pro
20 25 30
Arg Asp Phe Trp Met Leu Pro Val Phe Asn Ile Pro Arg Glu Thr Ala
35 40 45
Ala Glu Arg Ala Ala Val Met Gln Ala Gln Arg Thr Ala Ala Ala Ala
50 55 60
Ala Leu Glu Asn Ala Ala Leu Gln Ala Ala Glu Leu Pro Val Asp Ile
65 70 75 80
Glu Arg Arg Ile Arg Pro Ile Glu Gln Gln Val His His Ile Ala Asp
85 90 95
Ala Leu Glu Ala Leu Glu Thr Ala Ala Ala Ala Ala Glu Glu Ala Asp
100 105 110
Ala Ala Arg Asp Ala Glu Arg Glu Gly Ala Ala Asp Gly Ala Ala Pro
115 120 125
Ser Pro Thr Ala Gly Pro Ala Ala Ala Glu Met Glu Val Gln Ile Val
130 135 140
Arg Asn Asp Pro Pro Leu Arg Tyr Asp Thr Asn Leu Pro Val Asp Leu
145 150 155 160
Leu His Met Val Tyr Ala Gly Arg Gly Ala Ala Gly Ser Ser Gly Val
165 170 175
Val Phe Gly Thr Trp Tyr Arg Thr Ile Gln Glu Arg Thr Ile Ala Asp
180 185 190
Phe Pro Leu Thr Thr Arg Ser Ala Asp Phe Arg Asp Gly Arg Met Ser
195 200 205
Lys Thr Phe Met Thr Ala Leu Val Leu Ser Leu Gln Ser Cys Gly Arg
210 215 220
Leu Tyr Val Gly Gln Arg His Tyr Ser Ala Phe Glu Cys Ala Val Leu
225 230 235 240
Cys Leu Tyr Leu Leu Tyr Arg Thr Thr His Glu Ser Ser Pro Asp Arg
245 250 255
Asp Arg Ala Pro Val Ala Phe Gly Asp Leu Leu Ala Arg Leu Pro Arg
260 265 270
Tyr Leu Ala Arg Leu Ala Ala Val Ile Gly Asp Glu Ser Gly Arg Pro
275 280 285
Gln Tyr Arg Tyr Arg Asp Asp Lys Leu Pro Lys Ala Gln Phe Ala Ala
290 295 300
Ala Gly Gly Arg Tyr Glu His Gly Ala Thr His Val Val Ile Ala Thr
305 310 315 320
Leu Val Arg His Gly Val Leu Pro Ala Ala Pro Gly Asp Val Pro Arg
325 330 335
Asp Thr Ser Thr Arg Val Asn Pro Asp Asp Val Ala His Arg Asp Asp
340 345 350
Val Asn Arg Ala Ala Ala Ala Phe Leu Arg His Asn Leu Phe Leu Trp
355 360 365
Glu Asp Gln Thr Leu Leu Arg Ala Thr Ala Asn Thr Ile Thr Ala Val
370 375 380
Leu Arg Arg Leu Leu Ala Asn Gly Asn Val Tyr Ala Asp Arg Leu Asp
385 390 395 400
Asn Arg Leu Gln Leu Gly Met Leu Ile Pro Gly Ala Val Pro Ala Glu
405 410 415
Ala Ile Arg Ala Ser Gly Leu Asp Ser Gly Ala Ile Lys Ser Gly Asp
420 425 430
Asn Asn Leu Glu Ala Leu Cys Val Asn Tyr Val Leu Pro Leu Tyr Gln
435 440 445
Ala Asp Pro Thr Val Glu Leu Thr Gln Leu Phe Pro Gly Ala Gly Arg
450 455 460
Pro Val Pro Gly Arg Pro Gly Gly Ala Ala Thr Gly Val Asp Glu Arg
465 470 475 480
Gly Tyr Val Val Gly Arg Pro Pro Gly Gly Ala Arg Ala Pro His Arg
485 490 495
Ala Gly Ala His Gln Pro His Pro His Lys His His Pro Cys Gly Gly
500 505 510
Asp Tyr
(2) INFORMATION FOR SEQ ID NO: 52:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 91 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
Val Val Asp Met Leu Ser Gly Ala Arg Gln Ala Ala Leu Val Arg Leu
1 5 10 15
Thr Ala Leu Glu Leu Ile Asn Arg Thr Arg Thr Asn Thr Thr Pro Val
20 25 30
Gly Glu Ile Ile Asn Ala His Asp Ala Leu Gly Ile Gln Tyr Glu Gln
35 40 45
Gly Leu Gly Leu Leu Ala Gln Gln Ala Arg Ile Gln Ala Lys Arg Phe
50 55 60
Ala Thr Phe Asn Val Gly Ser Asp Tyr Asp Leu Leu Tyr Phe Leu Cys
65 70 75 80
Leu Gly Phe Ile Pro Gln Tyr Leu Ser Val Ala
85 90
(2) INFORMATION FOR SEQ ID NO: 53:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 444 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
Val Arg Val Pro Met Ala Ser Ala Glu Met Arg Glu Arg Leu Glu Ala
1 5 10 15
Pro Leu Pro Asp Arg Ala Val Pro Ile Tyr Val Ala Gly Phe Leu Ala
20 25 30
Leu Tyr Asp Ser Gly Asp Pro Gly Glu Leu Ala Leu Asp Pro Asp Thr
35 40 45
Val Arg Ala Ala Leu Pro Pro Glu Asn Pro Leu Pro Ile Asn Val Asp
50 55 60
His Arg Ala Arg Cys Glu Val Gly Arg Val Leu Ala Val Val Asn Asp
65 70 75 80
Pro Arg Gly Pro Phe Phe Val Gly Leu Ile Ala Cys Val Gln Leu Glu
85 90 95
Arg Val Leu Glu Thr Ala Ala Ser Ala Ala Ile Phe Glu Arg Arg Gly
100 105 110
Pro Ala Leu Ser Arg Glu Glu Arg Leu Leu Tyr Leu Ile Thr Asn Tyr
115 120 125
Leu Pro Ser Val Ser Leu Ser Thr Lys Arg Arg Gly Asp Glu Val Pro
130 135 140
Pro Asp Arg Thr Leu Phe Ala His Val Cys Ala Ile Gly Arg Arg Leu
145 150 155 160
Gly Thr Ile Val Thr Tyr Asp Thr Ser Leu Asp Ala Ala Ile Ala Pro
165 170 175
Phe Arg His Leu Asp Pro Ala Thr Arg Glu Gly Val Arg Arg Glu Ala
180 185 190
Ala Glu Ala Glu Leu Ala Gly Arg Thr Trp Ala Pro Gly Val Glu Ala
195 200 205
Leu Thr His Thr Leu Leu Ser Thr Ala Val Asn Asn Met Met Leu Arg
210 215 220
Asp Arg Trp Ser Leu Val Ala Glu Arg Arg Arg Gln Ala Gly Ile Ala
225 230 235 240
Gly His Thr Tyr Leu Gln Ala Ser Glu Lys Phe Lys Ile Trp Gly Ala
245 250 255
Glu Ser Ala Pro Ala Pro Glu Arg Gly Tyr Lys Thr Gly Ala Pro Gly
260 265 270
Ala Met Asp Thr Ser Pro Ala Ala Ser Val Pro Ala Pro Gln Val Ala
275 280 285
Val Arg Ala Arg Gln Val Ala Ser Ser Ser Ser Ser Ser Ser Ser Phe
290 295 300
Pro Ala Pro Ala Asp Met Asn Pro Val Ser Ala Ser Gly Ala Pro Ala
305 310 315 320
Pro Pro Pro Pro Gly Asp Gly Ser Tyr Leu Trp Ile Pro Ala Phe His
325 330 335
Tyr Asn Gln Leu Val Thr Gly Gln Ser Ala Pro His His Pro Pro Leu
340 345 350
Thr Ala Cys Gly Leu Pro Ala Ala Gly Thr Val Ala Tyr Gly His Pro
355 360 365
Gly Ala Gly Pro Ser Pro His Tyr Pro Pro Pro Pro Ala His Pro Tyr
370 375 380
Pro Gly Tyr Ala Val Arg Gly Pro Gln Ser Pro Gly Gly Pro Asp Arg
385 390 395 400
Arg Ala Gly Gly Gly His Arg Arg Arg Pro Pro Gly Gly Trp Ala Ser
405 410 415
Gly Gly Arg Arg Arg Pro Arg Asp Pro Gly Val Gly Glu Pro Pro Pro
420 425 430
Thr Arg Gly Gly Ala Ala Gly Val Arg Leu Arg Pro
435 440
(2) INFORMATION FOR SEQ ID NO: 54:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 250 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
Met Leu Phe Ala Gly Pro Ser Pro Leu Glu Ala Gln Ile Ala Ala Leu
1 5 10 15
Val Gly Ala Ile Ala Ala Asp Arg Gln Ala Gly Gly Leu Pro Ala Ala
20 25 30
Ala Gly Asp His Gly Ile Arg Gly Ser Ala Asn Arg Arg Arg His Glu
35 40 45
Val Glu Gln Pro Glu Tyr Asp Cys Gly Arg Asp Glu Pro Asp Arg Asp
50 55 60
Phe Pro Tyr Tyr Pro Gly Glu Ala Arg Pro Glu Pro Arg Pro Val Asp
65 70 75 80
Ser Arg Arg Ala Ala Arg Gln Ala Ser Gly Phe Thr Ile Thr Ala Leu
85 90 95
Val Gly Ala Val Thr Ser Leu Gln Gln Glu Leu Ala His Met Arg Ala
100 105 110
Arg Thr His Ala Pro Tyr Gly Pro Tyr Pro Pro Val Gly Pro Tyr His
115 120 125
His Pro His Ala Asp Thr Glu Thr Pro Ala Gln Pro Pro Arg Tyr Pro
130 135 140
Ala Glu Ala Val Tyr Leu Pro Pro Pro His Ile Ala Pro Pro Gly Pro
145 150 155 160
Pro Leu Ser Gly Ala Val Pro Pro Pro Ser Tyr Pro Pro Val Ala Val
165 170 175
Thr Pro Gly Pro Ala Pro Pro Leu His Gln Pro Ser Pro Ala His Ala
180 185 190
His Pro Pro Pro Pro Pro Pro Gly Pro Thr Pro Pro Pro Ala Ala Ser
195 200 205
Leu Pro Gln Pro Glu Ala Pro Gly Ala Glu Ala Gly Ala Leu Val Asn
210 215 220
Ala Ser Ser Ala Ala His Val Asn Val Asp Thr Ala Arg Ala Ala Asp
225 230 235 240
Leu Phe Val Ser Gln Met Met Gly Ser Arg
245 250
(2) INFORMATION FOR SEQ ID NO: 55:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1161 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
CCTGGCACCT TGTAGCAGTC CACCTATAGG ATACCCCAGC ACTTTGGGAG GCCGAGGCAG 60
ACGGATCACA AGTTCAGGAG ATCGAAACCA TCCTGGCCAA CATGGTAAAA CCCCGTCTCT 120
ACTAAAAACG TAAAAATTAN CTGAGTGTGG TGGTGTGTGC CTGTACTCCC AGCTACTTGG 180
GAGGCTGAGG CAAAAAATTC ACTTGAACCT GGGAGGTGGA GGTTGCACTG AGCTGAGGTC 240
ATGCCACTGC ACTCCAGCCT AGCAACAGAG TGAGACTCCA TCTCAAAAAA ATAAATAAAT 300
AAATAAATAA ATAAATAAAG ACATATGGAG GCCTTACTCT GTGCCAAGCA CTATGATGGG 360
CACAGGGAAC AAACACACGG GCTCCCTGAG CACCAGCGGT GAGCCAGGCA CCGTGCCTGG 420
AGACCAACGT CTGGCGTTTT GTATGCGGAC ATGATACCCG GCACTCTCCC CTATGGCTAA 480
TGAATCATCG AGCTTCACCA GAGAAACGCG AACAGACCCC CTTGTCCATG AATTTGCTAA 540
TTGACCTCCC CCAACTCAGA CATCAACCCT GCATTACCAT AAATTACTGG CTAAGAAACA 600
CCCGTTCTCA ACCTGCTGGC CTCAATGGGT TACACGTCCC ACAAACCCCT TTCCCAAGGT 660
GAAATACCAA CCTCGAAAAC TCGGAAAATT CAAGTTCAAC AACCTCCGGG ATTGCGGGCT 720
TTACCAGCGA AGCCCTCTCC AAAAATTTGC TCGCTTAGAC ATCCAACCGC TTCTCCACTA 780
AAGACCCCGG CTTCTTCTCA CCTCGGCGTT CTCTTGCAAA AAATACGCGT CTGTTAATCC 840
GCGCCCCTCT TCCTCACACA CTTCTCCCCT GCCTACTCAT ACCTCATCTC TCCTATAACC 900
CTCTCGCGAA AGAGCCCCTG TCTCTCACCT GTTTAATACC ACCCATGCGG GCGGGTTGTC 960
CCTTAAATGG ATATTCTGAA CCTCAACCCT CCCCCCATTC TAATTTGGCG TGTTGGCCCC 1020
CTCACCTGCC CCCTCCTCTC CCAAAGTCCG GGAATATCCG TTCCCTGGCC ACCCTCCTTC 1080
TTTATGAAAC CCGGCTTACC CCCCCGGGGA AATAGGCCGT TTGGCTTTTG TGGCGCGCCC 1140
TTCCACCTTC CCCTCTACAA 1161
(2) INFORMATION FOR SEQ ID NO: 56:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
Val Lys Tyr Gln Pro Arg Lys Leu Gly Lys Phe Lys Phe Asn Asn Leu
1 5 10 15
Arg Asp Cys Gly Gln Arg Ser Pro Leu Gln Lys Phe Ala Arg Leu Asp
20 25 30
Ile Gln Pro Leu Leu His
35
(2) INFORMATION FOR SEQ ID NO: 57:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 524 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
GCAGGTCGAT CTAGAAGTCC CCAGGGGTCA GGGGTCTCAC TTGAGAAGGT AGTCTGTCCG 60
TTCTCCAATC TCAACCTCCG TGTTGGGAGA TCCACTGCTC ACTTCAAAGC TGTGAGACAG 120
AGTTGTTTGC GTCTGCAGAG GTTTCAGCTG CTTTTTGTTG TTGTAGTCGT CGTCGTTGTT 180
GTTGTTGTTG TTGTTTAGCT GTGCCCTGTC CCCAGAGGTG GAGTCTACAG AGACAGGCAG 240
GACTCCTTGA GCTGCTGTGA GCTCCACCCA GTTGGAGCTT CCCAGCTGCT TTGTTTATCT 300
ACTTAAGCCT CAGTAATGGC GGGCGCCCCT CCCCCAACCT CGCTGCTGCC TTGCCCCCAG 360
ATCGCAGACT GCTGTGCTAA CAACGAGGGA GGCCCTGTGG GCATGGGACC CTCCTGGCCA 420
NGTGTGGGAT ATANTCTCCT GGTTGTGCCC GTTGGTAAAA TTTCTGGGTA AACCCCATAT 480
TGGGGGTTTG AATTCCCCAA ATTTCCCAGT TTGTTTTGTG TCT 524
(2) INFORMATION FOR SEQ ID NO: 58:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 76 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
Val Glu Leu Thr Ala Ala Gln Gly Val Leu Pro Val Ser Val Asp Ser
1 5 10 15
Thr Ser Gly Asp Arg Ala Gln Leu Asn Asn Asn Asn Asn Asn Asn Asp
20 25 30
Asp Asp Tyr Asn Asn Lys Lys Gln Leu Lys Pro Leu Gln Thr Gln Thr
35 40 45
Thr Leu Ser His Ser Phe Glu Val Ser Ser Gly Ser Pro Asn Thr Glu
50 55 60
Val Glu Ile Gly Glu Arg Thr Asp Tyr Leu Leu Lys
65 70 75
(2) INFORMATION FOR SEQ ID NO: 59:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 773 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
GGAAGAGGGC GCTGTTGCCC GCGCTCCTTG CGCGGTGGCG GCGGGGGGCA GGCGGAGGCA 60
GGCGCGGCGT GCGGGGCCTC CGGCGCCTTC CCCCCGCCCT CGCTCGGGGG GCTGTTCGCC 120
CACTCTGCGT CGTCGTTGCC GGCGTAATCC GCGTCGTCGC TGTCGTCCGC CTGGGGCACC 180
AGCAGCCAGC GCCGCAGGAG CGATGACGCG GCCGGCGCGC TCTCGACCGC GGTTCCCGAG 240
TCGTACGCAG GGACCATTTG GGAGTCTGCG GTTGGGAACG CGCCGGGGCG CGGCACGGTT 300
GGACCGCCGG GGCGCGGCCG GCGCCGGGGA CCCCGGCGGC GGGGACTCCG GCGGGACATG 360
GAGGGCGGCT GGGCTCGGCC TATGCCCGGA TCCGGATCGC GTCTGGGCGG GAGATTTCAC 420
TCGGCACGCA TGCACGTCTC CCCCCCCCCC CGTGGTTGCC TATGAAACTA CCCCGTCCCG 480
CTGGTGTGCG CATTTCTGTC CGCGTTGCCG GCCTTCTTTG CGGCGCGTGG CTTGACTGGG 540
ATCCCCTCCC CTCTCCCTTC CCCTCCGGGA TTCACCCCCG GGGGGGGTTT TTCTGGGGGG 600
GGGGTTAATA GCTGTCTGTC CCCTCCCCAC CGTTTCCTCC CTGGACTCCA CGGCGCTCCA 660
TAACTCTCTC CTGGTCCACC CCCCATTCCC CACATGGCCT TTGCTTTTCA ACCCCCCCCT 720
CCGGTTGGGC TGCATATCAA TTTCCTTCTC CCCCGGGGGA TCCCCTATTA CG 773
(2) INFORMATION FOR SEQ ID NO: 60:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 121 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
Met Ser Arg Arg Ser Pro Arg Arg Arg Gly Pro Arg Arg Arg Pro Arg
1 5 10 15
Pro Gly Gly Pro Thr Val Pro Arg Pro Gly Ala Phe Pro Thr Ala Asp
20 25 30
Ser Gln Met Val Pro Ala Tyr Asp Ser Gly Thr Ala Val Glu Ser Ala
35 40 45
Pro Ala Ala Ser Ser Leu Leu Arg Arg Trp Leu Leu Val Pro Gln Ala
50 55 60
Asp Asp Ser Asp Asp Ala Asp Tyr Ala Gly Asn Asp Asp Ala Glu Trp
65 70 75 80
Ala Asn Ser Pro Pro Ser Glu Gly Gly Gly Lys Ala Pro Glu Ala Pro
85 90 95
His Ala Ala Pro Ala Ser Ala Cys Pro Pro Pro Pro Pro Arg Lys Glu
100 105 110
Arg Gly Gln Gln Arg Pro Leu Pro Xaa
115 120
(2) INFORMATION FOR SEQ ID NO: 61:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 981 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CCCGGCGGAG AATGAGCCCA CACCCAGATT GTTCACCGCC CCCCATTTCC CCCCCCCCGG 60
GTATACACCN AGGGAAAAGG TTTTTCCCCC CCCCCCCGGA TCAAATTTCC CCCACAAGAA 120
CCGAGTTCCA GGTTAAGTTT AGTTGGGTGC CCTTCCCAGG TTGACGGGGG TCGCCAATGT 180
CCCAGCGGGG GTTGGCGCCC TCAGGGGGGG NGGGGCCAGC CCCCGCGGGC GGTCGCCCAC 240
CAACTTCCAA GCCGCGGCCC GCCGAGGCCA GCACGGTCCC CGGGGGGCCG GTGGCAGACG 300
CCCAGCGTAT CTGCGGGGGC GGGCCCGCGT CCGCGTCGTC GCGCAGCACC AGCGGGGGCG 360
CGTCGCCGTC GGGCTAGAGC AGCGCCCGCG CGCAGAACTC CCGCCGCGGC CCGAGCAGAT 420
CAGCCGGGCC GCCGCGCACG GTGTCGCGCC CCAGCGCCAC GTAGACGGGC CGCAGCGGCG 480
CGCCCAGGCC CCAGCGCGCG CAGGCGCGGT GCGAGTGCGC CTCGTCCTCG CAGAAGTCCG 540
GCGCGCCGGG CGCCATGGCG TCGCCCGCGC CCGAGGCGGC GGCCCGGCCG TCCAGCGCCG 600
GGAGCACGGC GCGGCGGTAC TCGCGCGGGG ACATGGGCAC CAGCGTGTCG GGGCCGAAGC 660
GCGTGCGCAC GCGGTACCGC ACGTTGGCCC CGCGGCAGAG GCGCAGCGGC GGCGCGTCGG 720
GGTACATGCG CGCGTGCGCG GTCTCCACGC GCGCGAATAC CCCGGCCCTA ACACTCTGCC 780
GGATGCCATC ACGGTGCTGC GCTTGTTCCG CGCCCCCGGT CTTCGCACGG CGCTCTGTCT 840
TGGCGGGCTC CTCCTCCCTA GGTTATTTTT GGGTTCTTTC CTCTAAAAAA CCCGGGGCCT 900
CTTTTGGGGG GGCCTTTTCC TCCCGGTCCC CTCCCCGGTT TGTGAACCAA CTAAATATAG 960
GCCGGTGGTT CCCCCAGGCC 981
(2) INFORMATION FOR SEQ ID NO: 62:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 122 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
Val Glu Thr Ala His Ala Arg Met Tyr Pro Asp Ala Pro Pro Leu Arg 1 5 10 15
Leu Cys Arg Gly Ala Asn Val Arg Tyr Arg Val Arg Thr Arg Phe Gly
20 25 30
Pro Asp Thr Leu Val Pro Met Ser Pro Arg Glu Tyr Arg Arg Ala Val 35 40 45
Leu Pro Ala Leu Asp Gly Arg Ala Ala Ala Ser Gly Ala Gly Asp Ala
50 55 60
Met Ala Pro Gly Ala Pro Asp Phe Cys Glu Asp Glu Ala His Ser His 65 70 75 80
Arg Ala Cys Ala Arg Trp Gly Leu Gly Ala Pro Leu Arg Pro Val Tyr
85 90 95
Val Gly Arg Asp Thr Val Arg Gly Gly Pro Ala Asp Leu Leu Gly Pro
100 105 110
Arg Arg Glu Phe Cys Ala Arg Ala Leu Leu 115 120
(2) INFORMATION FOR SEQ ID NO: 63:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 644 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
GCATGCCTGC AGGTCGACTC TAAAGGATCC CCCAGCTGCC TCTCCCNTGG AAGACATATA 60
TCTTCTTGTG CCGGACCAGC TTCAGTGGGG CCAGTGGGCC CTCTAGGGCA ACGGTCACTT 120
GTATCGTGGC ATCCGTGTGC ACTGGCCCCA TTTCCATTGA AAAAATCAGG GTGTCACGGG 180
TTCTGAGGCT GCCATTGAGG CGGTAAAAAA TGGCCCCGTC TAGCAGGTCC TCCTGGGAGA 240
AGGCTGTGAC TGACTGAGTG GCCCAAAGCA ACTGCCCCCA GCGACGGCCG GCTGTGACAT 300
GGTAGTGCAC CTCATCCCCA CTGCGGATGT CAAGGTTGGT GTCCANGTGG AGCTCANCTG 360
TGTCAATGGT GCCCTGACCT CCTTGAGGGA CCACAAGGCC AAAACCGTTG GCCACACGGA 420
TGTAGGGCTC CGAGGCCTGC ACCTCCANCA NCACAGTGGC CTGGTGTTGC CCATCGGACA 480
CACCTGCAGC TGGATCCAGC CATGATCAGC CCAAGTGCAT GAACAGGACT CGCCTCTTTC 540
TGAGGTCCTC CTGGGTGAAG CGGTAAATGG GCCACGTGGG CTCATCCACG GCCACGATAC 600
TGCCAAAAAA GAAGTCCTTG CGGGTCAGGG TACCGANCTC AAA 644
(2) INFORMATION FOR SEQ ID NO: 64:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 151 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
Val Ser Asp Gly Gln His Gln Ala Thr Val Xaa Xaa Glu Val Gln Ala 1 5 10 15
Ser Glu Pro Tyr Ile Arg Val Ala Asn Gly Phe Gly Leu Val Val Pro
20 25 30
Gln Gly Gly Gln Gly Thr Ile Asp Thr Xaa Glu Leu His Xaa Asp Thr 35 40 45
Asn Leu Asp Ile Arg Ser Gly Asp Glu Val His Tyr His Val Thr Ala 50 55 60
Gly Arg Arg Trp Gly Gln Leu Leu Trp Ala Thr Gln Ser Val Thr Ala 65 70 75 80
Phe Ser Gln Glu Asp Leu Leu Asp Gly Ala Ile Phe Tyr Arg Leu Asn
85 90 95
Gly Ser Leu Arg Thr Arg Asp Thr Leu Ile Phe Ser Met Glu Met Gly 100 105 110
Pro Val His Thr Asp Ala Thr Ile Gln Val Thr Val Glu Gly Pro Leu 115 120 125
Ala Pro Leu Lys Leu Val Arg His Lys Lys Ile Tyr Val Phe Xaa Gly
130 135 140
Arg Gly Ser Trp Gly Ile Leu 145 150
(2) INFORMATION FOR SEQ ID NO: 65:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 52 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
Val Glu Leu Xaa Cys Val Asn Gly Ala Leu Thr Ser Leu Arg Asp His 1 5 10 15
Lys Ala Lys Thr Val Gly His Thr Asp Val Gly Leu Arg Gly Leu His
20 25 30
Leu Xaa Xaa His Ser Gly Leu Val Leu Pro Ile Gly His Thr Cys Ser 35 40 45
Trp Ile Gln Pro 50
(2) INFORMATION FOR SEQ ID NO: 66:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 585 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
GCTCAATCCT CGAATTCAGA AAACAGTTGC CATTTATCCC TTCTTGTCAG ACTTCAGACG 60
GGTGATTGAG ATTGGTAATA CTAGCGAGGC TTACGACGAA CTTTTCCGTT ATTTCAAGTT 120
TCACGACCCC TTCCATGAAA CAGAGGAGGA AATCATGGCG ACCCTTGCCT ATATCGATGT 180
CAAAAATCTT GCCCATCGTA TCCAAGGTGA GGTTAAAATG ATTACGGGCT TGGACAACAA 240
TGTTTGCTAT CCCATTACCC AGTTTGCGAT TTATAATCGT CTGACCTGCG ATAAAACCTA 300
TCGCATCATG CCTGAGTATG CTCACGAAGC CATGAATGTA TTTGTCAATG ACCAAGTCTA 360
CAACTGGCTC TGTGGAAGTG AGATTCCTTT TAAATATCTA AAATAAGGAG TCGACTCTAA 420
GCACAAAATC TTAAAAATTA CAAACACGCA TAGTATCAGG GGATTAAAAA AACTTGATAC 480
TATGCGTTTT ATCATGGACA TATATTATAA TGAAACAAGA ACAGGACAAA TCGATCCGGA 540
CAGTCCAATC GATTTCTAAC AATGTTTTAA AAGTAAATGT GTCT 585
(2) INFORMATION FOR SEQ ID NO: 67:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 60 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
Met Ile Thr Gly Leu Asp Asn Asn Val Cys Tyr Pro Ile Thr Gln Phe 1 5 10 15
Ala Ile Tyr Asn Arg Leu Thr Cys Asp Lys Tyr Ile Met Pro Glu Tyr
20 25 30
Ala His Glu Ala Met Asn Val Phe Val Asn Asp Gln Val Tyr Asn Trp 35 40 45
Leu Cys Gly Ser Glu Ile Pro Phe Lys Tyr Leu Lys 50 55 60
(2) INFORMATION FOR SEQ ID NO: 68:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1237 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
ACTGTATCGA TAAGCTTGAT CCATGGCGGT GGCCGACACA GGGAGGGGCG TCTTCTCCGG 60
AGCGGACGTA GGCGAATCGG AGGACGCGTC CTCGGTCAAG GCCATGAGGC GCCGCCCGGT 120
TAGGGGGGCC CGAACGTCGG GGTCAACCCC CTCGGGGTCT GTCCGCAGGG CGCGTCAAAA 180
CCGCGGGCGG GGTGGGAAGG GGGCGTACGG ACCGTCATCT AGGGCCCCGG GGGCCCAATG 240
GGGTGGCAGG ACCCCGACGT CTTCCGTGGG TCGTGCCATC CGAATAAACG TGCGGCCCGT 300
AATCCCCACC AGCAGGCTCT GGGGAGCAAA ACCGACGCGT GGTAGGTCGC TGGGGGCGGC 360
GGGCGTCTGT GGGGGCAAAC AGCGCTCCCG GAAACGCAGG CCACAAAACC CGGGGTTGGG 420
GGCGGAATAC CATACCGGGG GCACCTATCG CCACGGGCGG CCCGCGGGGA CCGGGGGGAC 480
TCACGGGCCG CCCTCCGCAC GCGCCTCCTG TGGGGGGGCG GTGGGGTTTT CTGCCTATTC 540
CCTTCCTTTC CTCATCCTCT TCCTTCCCTA CTTCCCCCCT TCTCATTTCT CCTCTCTTCT 600
GTTCACCCCT TACTCCGCTT CACTTGCTCT CTCTCTATTC ATTCCGTCCT CTACTTTTCT 660
CGTCCTTTCC TCTCTCTCCC CTCATCTATC TTCTCTTCAT CTCTTTTTCT CTCGCTCCCT 720
CTCTTTCCCA TCTTCCGTTC TTCTCTCACT TTCATATCAT TCTTGCCTCA CCCCGACACT 780
CGCTATTCTC TCTTTCTCGC CACCAATCCT GTGTGTGTTT CTCGTTTTCT TCACACCTCG 840
TTCTATAGCT CACCACATCA TGTGCTTTCT CGTATCTCCT ATCCTCCTTA TCCTTCTCTT 900
TTCTTTCTCT CTCAACCGCT CCCTTCTGTT CCACAGACAC TCTCTCTGCT CTCTCTCATT 960
CTTGCGCTCT TGTATTCACT TCTCATCATT CTTACACTTT TCTCTCTCAT TCGCACCCAT 1020
CTACCGCTAC GTCATTCACA CCGCGATTTT TTCTAGCTCT ACCTATTCCT CCTCGACTTC 1080
TCTGTGCGAC TATACTCCCC TCTTCTTTCT GTCCTACACG TCTGAGATCA CATTGATCTT 1140
CCCTCACCCC TCTGCTCCTG ACTATACCTT ATTCTATTTA TTTCTCGACC CTTCCTTCCC 1200
ATTCTCTATT CTCGACTTCT CTGCACCTCT CCTCAC 1237
(2) INFORMATION FOR SEQ ID NO: 69:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
Met Ala Leu Thr Glu Asp Ala Ser Ser Asp Ser Pro Thr Ser Ala Pro 1 5 10 15
Glu Lys Thr Pro Leu Pro Val Ser Ala Thr Ala Met Asp Gln Ala Tyr
20 25 30
Arg Tyr Ser Xaa Xaa 35
(2) INFORMATION FOR SEQ ID NO: 70:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2057 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
GGCGACACGA GAAGAGAGCA AGGAGGGGAG AGCGACAGAG TAACTACACA TCGGAGCGCA 60
TAATGAAGGA CAGTCATAAG ATGAGAGCAA GAAGAGATGA GCAGAGTCAG AAGATATAGC 120
GACATAGAAG TCAGAGGCGA CACGGGGTAC GTGAGAATGT CGAACGCAGG ACGGGCGGAG 180
GAGAGGAAGG GGGCCCGAGC GCAGAACGAA ACAGGGGAAC ACAGGAGAAC GGAAGGAGCG 240
AGCAAAGGGG AACAGACAGA AGGGCGCGCA NAGCGGAACG CCGCGAGAGC CGAGAGCCAC 300
AACACGGCCA ACGCGGCGAG AGCGACAGGG CAGACAGGGC AACGCGCCAG ACGGAAGCGG 360
CGAGGGACGA AGCGNAGAGC NGGAGAGGCA AGGACAAAGC AAAGGAGGAG GAGAGCGCAC 420
AGAAACGAAG AAGACGAGCG CGACAAGCCA CGGGAACGGA ACGAAACCGA GGCCACGGAG 480
GGAGCAAAGC GAAGGGAGGC CTCCTTCCTC ATCATATTCG GCATCATCAT CGTCTTCATC 540
CGCATATTCC TCTGCGGGCG GGGCTGGTGG GAGCGTCGCG TCCGCGTCCG GCGCTGGGGA 600
GAGACGAGAA ACCTCCCTCG GCCCCCGCGC TGCTGCGCCG CGGGGGCCGA GGAAGTGTGC 660
CAGGAAGACG CGCCACGCGG AGGGCGGCCC CGAGCCCGGG GCCCGCGACC CGGCGCCCGG 720
CCTCACGCGC TACCTGCCCA TCGCGGGGGT CTCGAGCGTC GTGGCCCTGG CGCCTTACGT 780
GAACAAGACG GTCACGGGGG ACTGCCTGCC CGTCCTGGAC ATGGAGACGG GCCACATAGG 840
GGCCTACGTG GTCCTCGTGG ACCAGACGGG GAACGTGGCG GACCTGCTGC GGGCCGCGGC 900
CCCCGCGTGG AGCCGCCGCA CCCTGCTCCC CGAGCACGCG CGCAACTGCG TGAGGCCCCC 960
CGACTACCCG ACGCCCCCCG CGTCGGAGTG GAACAGCCTC TGGATGACCC CGGTGGGCAA 1020
CATGCTCTTT GACCAGGGCA CCCTGGTGGG CGCGCTGGAC TTCCACGGCC TCCGGTCGCG 1080
CCACCCGTGG TCTCGGGAGC AGGGCGCGCC CGCGCCGGCC GGCGACGCCC CCGCGGGCCA 1140
CGGGGAGTAG GGGGAGCTAA CACTCGGCTT GCTGCCCGAA GGGAAGCCGC CCCCCACCGG 1200
ACCACCGGCC GAGGCGCCTC GGGGGCATGG GGATGTGGGG GGGGGGGGAA AACNGGGATC 1260
ATATCCGGAT TGCGGGTGGG ATTGGGGGGG GTATGTTTTT TGTTTNTTTT TGTTTTTTTT 1320
TTTTTNTTTT GGTGTTGGTT TTTTTGGTTT TTGTTTTTTT TTNGGGGGAT TTTTGTTTTT 1380
TTTTTTTTTT TTNTTTTTTC GTTTTTTTTT TTGTGTTTTT NTTTGGTNTT TGGTTTGTTT 1440
TGTGTTTTTT TTTTTTTTNT TNTTTTTTTT TTTTGGGNTT TNTGTTTTTT TTGTTTGTTT 1500
CTTTGTTTTT TTTTNTTTTG TTTCGTGTTT TTCTTTTTTT TTTCCTTCCT TTTCCCCCCG 1560
CTTTCCCCCC CCTNCTCCCC CTCCTCTTCT CTTTCTCTNN TTTTCCTCTT CCCTTTTTCT 1620
TCCCGTCTCC CCTCTGCGTT TCCCTCTCCC TTTTCTTCCC TTCCCGCTTC TCCGTCCCTC 1680
CTCTTTTCCC TCCTTCCTCT TTCTTCCCCT GCTGCCTCCC TCTCTCCTCC TGTCCTTTTC 1740
CCTCTTTTTC CCCTCCCTCT GCCCTTTCTT CCCTTCTCCT CTTCCCTCCC CTCCTTTCTT 1800
TCCTTCCTCG CGTCGTTCCT CCCTCTTCTC TCTCTCTCTC CTCTNGTCCC CCCCCTCTTT 1860
CTCTTCCCCC CTCTTTTTCT CTCCTCTCGT CCTCCTTCCC CCTCATTTTA GCCTCATCCC 1920
CTCCATCCTA TTACTCCTCT ATTCTCCTCT CTCTCCCTCT TCCATCCCTT CCGCTCCTCC 1980
CATTATTCCT CTAAGCTTGC CCTCCTCCAC CTTCTCTCTA TCTCAAGTCC TCCTCCCTCT 2040
CACTATTCGG TTCCCT 2057
(2) INFORMATION FOR SEQ ID NO: 71:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 125 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
Val Ala Pro Tyr Val Asn Lys Thr Val Thr Gly Asp Cys Leu Pro Val 1 5 10 15
Leu Asp Met Gly His Ile Gly Ala Tyr Val Val Leu Val Asp Gln Thr
20 25 30
Gly Asn Val Ala Asp Leu Leu Arg Ala Ala Ala Pro Ala Trp Ser Arg 35 40 45
Arg Thr Leu Leu Pro Glu His Ala Arg Asn Cys Val Arg Pro Pro Asp
50 55 60
Tyr Pro Thr Pro Pro Ala Ser Glu Trp Asn Ser Leu Trp Met Thr Pro 65 70 75 80
Val Gly Asn Met Leu Phe Asp Gln Gly Thr Leu Val Gly Ala Leu Asp
85 90 95
Phe His Gly Leu Arg Ser Arg His Pro Trp Ser Arg Glu Gln Gly Ala
100 105 110
Pro Ala Pro Ala Gly Asp Ala Pro Ala Gly His Gly Glu 115 120 125
(2) INFORMATION FOR SEQ ID NO: 72:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1468 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
GCACCGCCCG GAAAGGGATC CCGGGGGGAA CCCCGCCCCC GAGAGGCGAC CGGGGCAGAA 60
CCCCCGGCAC GGTGAGAGGN GACCCCCGGT TATCAGGCCC CCCTTTTTCC CCGACCACCC 120
AGGAGGGGGG TTGGGGGTGT TGCGGGGCGT GGGGTTTGGG GGCGGGGACG CTTGACGGGG 180
CAGACCCCCG CCCCGCTTAA GCGGTCGGGG GACCCCCATG GGCCGTGCGC CGCCCCCCGA 240
CCCTTTGGGG GGGGCGAGGG AGGCAGGGAG CCCTGAGCCC GAGAGCGGGG GACAGGGGGG 300
GGGAGACGAG GGGTAGGAAT CCAAAGGACG CAGACCACCT TTGGTTACGG ACCCCTTTCT 360
CCCCCCCTTC CGAACAAAAA GCAGCGGGCG GGGGGCCGGG GTGAGGGAGG GACACGGGGG 420
ACACGGCACG GGGGTCCCGC CTCACGCCCC GCGCCCTCTA AATCCCCCCC CGTTTCTTTG 480
TCAAGCAGCC CGCCGCCCCG CACGCCTGGG GGATGCTCAA CGACATGCAG TGGCTCGCCA 540
GCAGCGACTC GGAGGAGGAG ACCGAGGTGG GAATCTCTGA CGACGACCTT CACCGCGACT 600
CCACCTCCGA GGCGGGCAGC ACGGACACGG AGATGTTCGA GGCGGGCCTG ATGGACGCGG 660
CCACGCCCCC GGCCCGGCCC CCGGCCGAGC GCCAGGGCAG CCCCACGCCC GCCGACGCGC 720
AGGGATCCTG CGGGGGTGGG CCCGTGGGTG AGGAGGAAGC GGAAGCGGGA GGGGGGGGCG 780
ACGTGTGTGC CGTGTGCACG GACGAGATCG CCCCGCCCCT GCGCTGCCAG AGTTTTCCCT 840
GCCTGCACCC CTTCTGCATC CCGTGCATGA AGACCTGGAT TCCGTTGCGC AACACGTGTC 900
CCCTGTGCAA CACCCCGGTG GCGTACCTGA TAGTGGGCGT GACCGCCAGC GGGTCGTTCA 960
GCACCATCCC GATAGTGAAC GACCCCCGGA CCCGCGTGGA GGCCGAGGCG GCCGTGCGGT 1020
CCGGCACGGC CGTGGACTTT ATCTGGACGG GCAACCCGCG GACGGCCCCG CGCTCCCTGT 1080
CGCTGGGGGG ACACACGGTC CGCGCCCTGT CGCCCACCCC CCCGTGGCCC GGCACGGACG 1140
ACGAGGACGA TGACCCGCCC GACGGTGAGG GCGGGCGGGG GTCTGGCACT GGGCGGGGGT 1200
CCGGCACTGG GCGGGGGTCT GGCACTGGGC GGGGGTCCGG CACTGGGCGG GGGTCTGGCG 1260
GGGGTCAGGC ACTAACCGGG GGTTCCCGTC TCTGTCTCCC TCTGCAACCG GAACTAATTT 1320
CCCGCCCCCC CCCTAATACC TCCCCGCCCG GGGCTGCTGT GCCGGGGCCA CCCCTGGTAA 1380
CTCCACCCCC CCTTTTACCT AACCTGCGCC CCCCGGCCCC CCCCGGGACT ACACTCACCC 1440
GTGGCCCCCC CTTCCTGGGC CGGGGTT 1468
(2) INFORMATION FOR SEQ ID NO: 73:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 319 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:
Met Leu Asn Asp Met Gln Trp Leu Ala Ser Ser Asp Ser Glu Glu Glu 1 5 10 15
Thr Glu Val Gly Ile Ser Asp Asp Asp Leu His Arg Asp Ser Thr Ser 20 25 30
Glu Ala Gly Ser Thr Asp Thr Glu Met Phe Glu Ala Gly Leu Met Asp 35 40 45
Ala Ala Thr Pro Pro Ala Arg Pro Pro Ala Glu Arg Gln Gly Ser Pro
50 55 60
Thr Pro Ala Asp Ala Gln Gly Ser Cys Gly Gly Gly Pro Val Gly Glu 65 70 75 80
Glu Glu Ala Glu Ala Gly Gly Gly Gly Asp Val Cys Ala Val Cys Thr
85 90 95
Asp Glu Ile Ala Pro Pro Leu Arg Cys Gln Ser Phe Pro Cys Leu His
100 105 110
Pro Phe Cys Ile Pro Cys Met Lys Thr Trp Ile Pro Leu Arg Asn Thr 115 120 125
Cys Pro Leu Cys Asn Thr Pro Val Ala Tyr Leu Ile Val Gly Val Thr
130 135 140
Ala Ser Gly Ser Phe Ser Thr Ile Pro Ile Val Asn Asp Pro Arg Thr 145 150 155 160
Arg Val Glu Ala Glu Ala Ala Val Arg Ser Gly Thr Ala Val Asp Phe
165 170 175
Ile Trp Thr Gly Asn Pro Arg Thr Ala Pro Arg Ser Leu Ser Leu Gly
180 185 190
Gly His Thr Val Arg Ala Leu Ser Pro Thr Pro Pro Trp Pro Gly Thr 195 200 205
Asp Asp Glu Asp Asp Asp Pro Pro Asp Gly Glu Gly Gly Arg Gly Ser
210 215 220
Gly Thr Gly Arg Gly Ser Gly Thr Gly Arg Gly Ser Gly Thr Gly Arg 225 230 235 240
Gly Ser Gly Thr Gly Arg Gly Ser Gly Gly Gly Gln Ala Leu Thr Gly
245 250 255
Gly Ser Arg Leu Cys Leu Pro Leu Gln Pro Glu Leu Ile Ser Arg Pro
260 265 270
Pro Pro Asn Thr Ser Pro Pro Gly Ala Ala Val Pro Gly Pro Pro Leu 275 280 285
Val Thr Pro Pro Pro Leu Leu Pro Asn Leu Arg Pro Pro Ala Pro Pro
290 295 300
Gly Thr Thr Leu Thr Arg Gly Pro Pro Phe Leu Gly Arg Gly Phe 305 310 315
(2) INFORMATION FOR SEQ ID NO: 74:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 620 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
AAAACGCACG AGTATTGCAC GAATAACCAA CCAAACAACC ACTCAGACCA TGTGGATCCA 60
TACCCTTACT TGGCAAAATG GGGCATTAGC CGTGAGCAGT TTAAGCATGA TATTGAGAAC 120
GGCTTGACGA TTGAAACAGG CTGGCAGAAG AATGACACTG GCTACTGGTA CGTACACTCA 180
GACGGCTCTT ATCCAAAAGA CAAGTTTGAG AAAATCAATG GCACTTGGTA CTACTTTGAC 240
AGTTCAGGCT ATATGCTTGC AGACCGCTGG AGGAAGCACA CAGACGGCAA CTGGTACTGG 300
TTCGACAACT CAGGCGAAAT GGCTACAGGC TGGAAGAAAA TCGCTGATAA GTGGTACTAT 360
TTCAACGAAG AAGGTGCCAT GAAGACAGGC TGGGTCAAGT ACAAGGACAC TTGGTACTAC 420
TTAAACGCTA AAGAAGGCGC CATGGTATCA AATGCCTTTA TCCACTCAGC CGGACGGAAC 480
AGGCTGGTAC TACCTCAAAC CAGACCGAAC ACTGGCAGAC AAGCCAGAAT TCACAGTAGA 540
CCCAGATGGC TTGATTACGT TAAAATAATA ATGGAATGTC TTTCAAATCA AAACCCCGCA 600
TATTATTAGG TCTTGAAAA 620
(2) INFORMATION FOR SEQ ID NO: 75:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 116 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:
Met Leu Ala Asp Arg Trp Arg Lys His Thr Asp Gly Asn Trp Tyr Trp 1 5 10 15
Phe Asp Asn Ser Gly Glu Met Ala Thr Gly Trp Lys Lys Ile Ala Asp 20 25 30
Lys Trp Tyr Tyr Phe Asn Glu Glu Gly Ala Met Lys Thr Gly Trp Val
35 40 45
Lys Tyr Lys Asp Thr Trp Tyr Tyr Leu Asn Ala Lys Glu Gly Ala Met 50 55 60
Val Ser Asn Ala Phe Ile His Ser Ala Gly Arg Asn Arg Leu Val Leu 65 70 75 80
Pro Gln Trp Asn Thr Gly Arg Gln Ala Arg Ile His Ser Arg Pro Arg 85 90 95
Trp Leu Asp Tyr Val Lys Ile Ile Met Glu Cys Leu Ser Asn Gln Asn 100 105 110
Pro Ala Tyr Tyr 115
(2) INFORMATION FOR SEQ ID NO: 76:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2695 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
GGGAGAGAAG AGAGAGAGAG AGAGAGAAGG AGTAGGAGAG CGAGGAGAGG AGAATAAGGA 60
GTGAATGGAA GCAGTAAGCT AGATAGGCAG AGAGAGAGAG AACGGAGAGT AGGAGTGGAA 120
GAAGTGGAAG TTGAGAACGA CAAGGAGAGA GAAAGGAAGA AAAGTAGAGA GACTAGAGAA 180
TAGAAGAGGA GAACAGAGAG GTAGGAGAAA GAGGAAAAGA AGAGAGAGAG AAGGCAGCGA 240
GAAAGAGAGG AGCAGGCGGA CAAGGAGAAG AGGGAGGAAG AGGAAAAGAG GAAGAGAGAA 300
GAAGAATGGT GGAGAGAGAA GAGGAAAGAG CACCCGCGCC ACCGAGGATT GGGAGATGAA 360
TTAGGGGCCC CTAAGAGGAC CGAAGACCCG GGCGTAGATT ATTCGCCCCG GAGGGCAAGG 420
GAGGTCGACC GCAAAGTAAA TACACCACCA GGGAGGAGGG AAATATGAAC GCCGGCGGAG 480
ACCCGGGGCC CTAGATACTG GTGAAGACGA TCAAAAGTAT GGATATGCCG GTCGCCACCA 540
GCTTTTTGGC CCCGGACGGA ACGCCGCTGC AGTACGCGCT ATGCTTCCCG GCCGTCACCG 600
ACAAACTCGG CGCGCTGCTG ATGCGTCCCG AGGCGGCCTG CGTGCGGCCC CCGCTTCCGA 660
CGGACGTCCT CGAATCGGCC CCGACGGTCA CGGCCATGTA CGTGCTGACC GTCGTGAACC 720
GGCTCCAGCT GGCCCTCAGC GACGCCCAGG CCGCCAACTT TCAGCTGTTC GGTCGCTTCG 780
TGCGCCATCG CCAGGCGACG TGGGGCGCCT CGATGGACGC GGCGGCCGAG CTGTACGTCG 840
CCCTCGTCGC CACCACCCTC ACGCGCGAGT TTGGGTGTCG CTGGGCCCAG CTGGGCTGGG 900
CGTCCGGAGC GGCGGCGCCG CGTCCGCCGC CGGGCCCCCG GGGGTCCCAG CGCCACTGCG 960
TCGCCTTCAA CGAGAACGAC GTGCTGGTCG CGCTGGTGGC CGGCGTTCCG GAACACATCT 1020
ACAACTTCTG GCGCCTGGAC CTCGTTCGCC AGCACGAGTA CATGCACCTC ACCCTCGAAC 1080
GCGCGTTCGA GGACGCAGCG GAGTCCATGC TGTTCGTCCA GCGCCTGACC CCGCATCCCG 1140
ACGCCCGCAT CCGCGTGTTG CCGACGTTTT TGGACGGAGG CCCCCCGACC CGGGGCCTCC 1200
TGTTCGGCAC GCGGCTGGCC GACTGGCGCC GGGGCAAGCT GTCCGAAACC GACCCGCTGG 1260
CGCCCTGGCG CTCGGCCTTG GAGCTCGGGA CCCAGCGCCG GGACGCCCCG GCGCTCGGGA 1320
AGCTCAGTCC GGCCCAGGCC CTGGCGGCGG TGAGCGTCCT CGGGCGCATG TGTCTGCCGA 1380
GCGCCGCTTT GGCCGCGCTG TGGACCTGCA TGTTTCCCGA CGACTACACC GAGTACGACA 1440
GCTTCGACGC CCTCCTGGCC GCACGCCTGG AGTCTGGCCA GACGCTCGGC CCGGCGGGGG 1500
GGCGCGAGGC GTCCCTCCCC GAGGCCCCCC ACGCCCTCTA CCGACCCACG GGCCAGCACG 1560
TGGCCGTGCT GGCCGCCGCG ACCCACCGCA CCCCCGCCGC GCGCGTTACG GCCATGGACC 1620
TGGTTCTGGC CGCGGTGCTC CTCGGCGCGC CCGTCGTGGT GGCGCTCCGC AACACCACGG 1680
CCTTCTCCCG CGAGTCGGAA CTGGAACTGT GCCTGACGCT CTTCGACTCG CGCCCCGGCG 1740
GGCCGGACGC CGCCCTGCGC GACGTCGTGT CGTCCGACAT CGAGACGTGG GCCGTCGGCC 1800
TCCTCCACAC CGATCTCAAC CCGATCGAAA ACGCGTGTCT GGCGGCGCAG CTCCCGCGCC 1860
TGTCGGCGCT CATCGCCGAG CGCCCTCTCG CCGACGGGCC CCCGTGCCTG GTCCTCGTGG 1920
ACATCTCCAT GACCCCGGTC GCGGTCTTGT GGGAAGCCCC GGAGCCCCCC GGCCCCCCTG 1980
ACGTGCGGTT TGTGGGCAGC GAGGCCACCG AGGAGCTTCC GTTTGTGGCT ACCGCGGGGG 2040
ACGTTCTTGC GGCGAGCGCC GCCGACGCGG ACCCCTTCTT CGCGCGGGCC ATCCTCGGGC 2100
GGCCCTTCGA CGCCTCCCTC CTGACGGGGG AGCTGTTCCC GGGACACCCG GTTTACCAGC 2160
GCCCCCTCGC CGACGAGGCA GGTCCCTCTG CCCCGACCGC CGCCCGCGAC CCGCGGGACC 2220
TTGCGGGGGG GGATGGCGGA TCGGGTCCCG AGGACCCCGC TGCCCCCCCC GCGCGGCAGG 2280
CGGACCCGGG GGTCCTCGCC CCCACTTTCC TCACCGACGC CACCACCGGC GAGCCCGTCC 2340
CCCCTCGCAT GTGGGCCTGG ATCCACGGCC TGGAGGAGCT GGCGTCCGAG GACGCCGGCG 2400
GCCCCACGCC CAATCCGGCC CCGGCCTTAC TTCCCCCCCC CGCCACCGAT CAGTCCGTCC 2460
CCACGTCCCA GTACGCACCG CGGCCCATCG GGCCGGCAGN TACGGCTCGC GAAACACGAC 2520
CGAGTGTCCC GCCTCAACAA AACACGGGGC GCGTGCCCGT GGCCCCTCGG GANGACCCAC 2580
GGCCCTCGCC ACCCACACCG AGTCCCCCCG CGGATGCCGC GGTTCCTCCC CCGGCCTTTT 2640
CCGGGTTTGC CGCCGCTTTT TCCGCCGCCG TGCCGCGCGT GCGCAGATCC CGCC 2695
(2) INFORMATION FOR SEQ ID NO: 77:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 718 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:
Val Lys Thr Ile Lys Ser Met Asp Met Pro Val Ala Thr Ser Phe Leu
1 5 10 15
Ala Pro Asp Gly Thr Pro Leu Gln Tyr Ala Leu Cys Phe Pro Ala Val 20 25 30
Thr Asp Lys Leu Gly Ala Leu Leu Met Arg Pro Glu Ala Ala Cys Val 35 40 45
Arg Pro Pro Leu Pro Thr Asp Val Leu Glu Ser Ala Pro Thr Val Thr
50 55 60
Ala Met Tyr Val Leu Thr Val Val Asn Arg Leu Gln Leu Ala Leu Ser
65 70 75 80
Asp Ala Gln Ala Ala Asn Phe Gln Leu Phe Gly Arg Phe Val Arg His
85 90 95
Arg Gln Ala Thr Trp Gly Ala Ser Met Asp Ala Ala Ala Glu Leu Tyr 100 105 110
Val Val Ala Thr Thr Leu Thr Arg Glu Phe Gly Cys Arg Trp Ala Gln 115 120 125
Leu Gly Trp Ala Ser Gly Ala Ala Ala Pro Arg Pro Pro Pro Gly Pro
130 135 140
Arg Gly Ser Gln Arg His Cys Val Ala Phe Asn Glu Asn Asp Val Leu
145 150 155 160
Val Val Ala Gly Val Pro Glu His Ile Tyr Asn Phe Trp Arg Leu Asp
165 170 175
Leu Val Arg Gln His Glu Tyr Met His Leu Thr Leu Glu Arg Ala Phe
180 185 190
Glu Asp Ala Ala Glu Ser Met Leu Phe Val Gln Arg Leu Thr Pro His 195 200 205
Pro Asp Ala Arg Ile Arg Val Leu Pro Thr Phe Leu Asp Gly Gly Pro
210 215 220
Pro Thr Arg Gly Leu Leu Phe Gly Thr Arg Leu Ala Asp Trp Arg Arg
225 230 235 240
Gly Lys Leu Ser Glu Thr Asp Pro Leu Ala Pro Trp Arg Ser Ala Leu
245 250 255
Glu Leu Gly Thr Gln Arg Arg Asp Ala Pro Ala Leu Gly Lys Leu Ser
260 265 270
Pro Ala Gln Ala Ala Val Ser Val Leu Gly Arg Met Cys Leu Pro Ser 275 280 285
Ala Ala Ala Leu Trp Thr Cys Met Phe Pro Asp Asp Tyr Thr Glu Tyr
290 295 300
Asp Ser Phe Asp Ala Leu Leu Ala Ala Arg Leu Glu Ser Gly Gln Thr
305 310 315 320
Leu Gly Pro Ala Gly Gly Arg Glu Ala Ser Leu Pro Glu Ala Pro His
325 330 335
Ala Leu Tyr Arg Pro Thr Gly Gln His Val Ala Val Leu Ala Ala Ala
340 345 350
Thr Thr Pro Ala Ala Arg Val Thr Ala Met Asp Leu Val Leu Ala Ala 355 360 365
Val Leu Leu Gly Ala Pro Val Val Val Arg Asn Thr Thr Ala Phe Ser
370 375 380
Arg Glu Ser Glu Leu Glu Leu Cys Leu Thr Leu Phe Asp Ser Arg Pro
385 390 395 400
Gly Gly Pro Asp Ala Ala Leu Arg Asp Val Val Ser Ser Asp Ile Glu
405 410 415
Thr Trp Ala Val Gly Leu Leu His Thr Asp Leu Asn Pro Ile Glu Asn
420 425 430
Ala Cys Leu Ala Ala Gln Leu Pro Arg Leu Ser Ala Leu Ile Ala Glu 435 440 445
Arg Pro Leu Ala Asp Gly Pro Pro Cys Leu Val Leu Val Asp Ile Ser
450 455 460
Met Thr Pro Val Ala Val Leu Trp Glu Ala Pro Glu Pro Pro Gly Pro 465 470 475 480
Pro Asp Val Arg Phe Val Gly Ser Glu Ala Thr Glu Glu Leu Pro Phe
485 490 495
Val Ala Thr Ala Gly Asp Val Leu Ala Ala Ser Ala Ala Asp Ala Asp 500 505 510
Pro Phe Phe Ala Arg Ala Ile Leu Gly Arg Pro Phe Asp Ala Ser Leu
515 520 525
Leu Thr Gly Glu Leu Phe Pro Gly His Pro Val Tyr Gln Arg Pro Leu
530 535 540
Ala Asp Glu Ala Gly Pro Ser Ala Pro Thr Ala Ala Arg Asp Pro Arg
545 550 555 560
Asp Leu Ala Gly Gly Asp Gly Gly Ser Gly Pro Glu Asp Pro Ala Ala
565 570 575
Pro Pro Ala Arg Gln Ala Asp Pro Gly Val Leu Ala Pro Thr Phe Leu 580 585 590
Thr Asp Ala Thr Thr Gly Glu Pro Val Pro Pro Arg Met Trp Ala Trp
595 600 605
Ile His Gly Leu Glu Glu Leu Ala Ser Glu Asp Ala Gly Gly Pro Thr
610 615 620
Pro Asn Pro Ala Pro Ala Leu Leu Pro Pro Pro Ala Thr Asp Gln Ser
625 630 635 640
Val Pro Thr Ser Gln Tyr Ala Pro Arg Pro Ile Gly Pro Ala Xaa Thr
645 650 655
Ala Arg Glu Trp Ser Val Pro Pro Gln Gln Asn Thr Gly Arg Val Pro 660 665 670
Val Ala Pro Arg Xaa Asp Pro Arg Pro Ser Pro Pro Thr Pro Ser Pro
675 680 685
Pro Ala Asp Ala Ala Val Pro Pro Pro Ala Phe Ser Gly Phe Ala Ala 690 695 700
Ala Phe Ser Ala Ala Val Pro Arg Val Arg Arg Ser Arg Arg 705 710 715
(2) INFORMATION FOR SEQ ID NO: 78:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2842 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
GGGGGGAGGG AAACAAGCCC GATAGAGCGC TAATGAGAAG GGAGGATATA ATGAGGGACA 60
TTGGGGGAGG GAAGGTAACA GGAAGGTTTT AGGAGCCCAA GAAGACCGGA GGACCCCCAA 120
GACATTGGAG GAAATGGCCG AGGCCGTTAA GGAGGGAAGA GGTTCAGACA TCCCGGCCCC 180
CACCCGTACA AGAGCCCAGG AGCCACCAAG CCCCGGGAGC GGGAAAACAA AAAGCCGCCC 240
AAGCGGCCCG AGGCGACCCC GCCCCCCGAC GCCAACGCGA CCGTCGCCGC CGGCCACGCC 300
ACGCTGCGCG CGCACCTGCG GGAAATCAAG GTCGAGAACG CCGATGCCCA GTTTTACGTG 360
TGCCCGCCCC CGACGGGCGC CACGGTGGTG CAGTTTGAGC AGCCGCGCCG CTGCCCGACG 420
CGCCCGGAGG GGCAGAACTA CACGGAGGGC ATCGCGGTGG TCTTCAAGGA GAACATCGCC 480
CCGTACAAAT TCAAGGCCAC CATGTACTAC AAAGACGTGA CCGTGTCGCA GGTGTGGTTC 540
GGCCACCGCT ACTCCCAGTT TATGGGGATA TTCGAGGACC GCGCCCCCGT TCCCTTCGAG 600
GAGGTGATCG ACAAGATTAA CGCCAAGGGG GTCTGCCGCT CCACGGCCAA GTACGTGCGG 660
AACAACATGG AGACCACCGC GTTTCACCGG GACGACCACG AGACCGACAT GGAGCTCAAG 720
CCGGCGAAGG TCGCCACGCG CACGAGCCGG GGGTGGCACA CCACCGACCT CAAGTACAAC 780
CCCTCGCGGG TGGAGGCGTT CCATCGGTAC GGCACGACGG TCAACTGCAT CGTCGAGGAG 840
GTGGACGCGC GGTCGGTGTA CCCGTACGAT GAGTTTGTGT TGGCGACGGG CGACTTTGTG 900
TACATGTCCC CGTTTTACGG CTACCGGGAG GGGTCGCACA CCGAGCACAC CAGCTACGCC 960
GCCGACCGCT TCAAGCAGGT CGACGGCTTC TACGCGCGCG ACCTCACCAC GAAGGCCCGG 1020
GCCACGTCGC CGACGACCCG CAACTTGCTG ACGACCCCCA AGTTTACCGT GGCCTGGGAC 1080
TGGGTGCCGA AGCGACCGGC GGTCTGCACC ATGACCAAGT GGCAGGAGGT GGACGAGATG 1140
CTCCGCGCCG AGTACGGCGG CTCCTTCCGC TTCTCCTCCG ACGCCATCTC GACCACCTTC 1200
ACCACCAACC TGACCCAGTA CTCGCTCTCG CGCGTCGACC TGGGCGACTG CATTGGCCGG 1260
GATGCCCGCG AGGCCATCGA CCGCATGTTT GCGCGCAAGT ACAACGCCAC GCACATCAAG 1320
GTGGGCCAGC CGCAGTACTA CCTGGCCACG GGGGGCTTCC TCATCGCGTA CCAGCCCCTC 1380
CTCAGCAACA CGCTCGCCGA GCTGTACGTG CGGGAGTACA TGCGGGAGCA GGACCGCAAG 1440
CCCCGGAATG CCACGCCCGC GCCACTGCGG GAGGCGCCCA GCGCCAACGC GTCCGTGGAG 1500
CGCATCAAGA CCACCTCCTC GATCGAGTTC GCCCGGCTGC AGTTTACGTA TAACCACATA 1560
CAGCGCCACG TGAACGACAT GCTGGGGCGC ATCGCCGTCG CGTGGTGCGA GCTGCAGAAC 1620
CACGAGCTGA CTCTCTGGAA CGAGGCCCGC AAGCTCAACC CCAACGCCAT CGCCTCCGCC 1680
ACCGTCGGCC GGCGGGTGAG CGCGCGCATG CTCGGAGACG TCATGGCCGT CTCCACGTGC 1740
GTGCCCGTCG CCCCGGACAA CGTGATCGTG CAGAACTCGA TGCGCGTCAG CTCGCGGCCG 1800
GGGACGTGCT ACAGCCGCCC CCTGGTCAGC TTTCGGTACG AAGACCAGGG CCCGCTGATC 1860
GAGGGGCAGC TGGGCGAGAA CAACGAGCTG CGCCTCACCC GCGACGCGCT CGAGCCGTGC 1920
ACCGTGGGCC ACCGGCGCTA CTTCATCTTC GGCGGGGGCT ACGTGTACTT CGAGGAGTAC 1980
GCGTACTCTC ACCAGCTGAG TCGCGCCGAC GTCACCACCG TCAGCACCTT CATCGACCTG 2040
AACATCACCA TGCTGGAGGA CCACGAGTTT GTGCCCCTGG AGGTCTACAC GCGCCACGAG 2100
ATCAAGGACA GCGGCCTGCT GGACTACACG GAGGTCCAGC GCCGCAACCA GCTGCACGAC 2160
CTGCGCTTTG CCGACATCGA CACGGTCATC CGCGCCGACG CCAACGCCGC CATGTTCGCG 2220
GGGCTGTGCG CGTTCTTCGA GGGGATGGGG GACTTGGGGC GCGCGGTCGG CAAGGTAGTC 2280
ATGGGAGTAG TGGGGGGCGT GGTGTCGGCC GTCTCGGGCG TGTCCTCCTT TATGTCCAAC 2340
CCCTTCGGGG CGCTTGCCGT GGGGCTGCTG GTCCTGGCCG GCCTGGTCGC GGCCTTCTTC 2400
GCCTTCCGCT ACGTCCTGCA ACTGCAACGC AATCCCATGA AGGCCCTGTA TCCGCTCACC 2460
ACCAAGGAAC TCAAGACTTC CGACCCCGGG GGCGTGGGCG GGGAGGGGGA GGAAGGCGCG 2520
GAGGGGGGCG GGTTTGACGA GGCCAAGTTG GCCGAGGCCC GAGAAATGAT CCGATATATG 2580
GNTTTGGTGT CGGCCATGGA GCGCACGGAA CACAAGGCCA GAAAGAAGGG CACGAGCGCC 2640
CTGCTCAGCT CCAAGGTCAC CAACATGGTT CTGCGCAAGC GCAACAAAGC CAGGTACTCT 2700
CCGCTCCACA ACGAGGACGA GGCCGGAGAC GAAGACGAGC TCTAAGGGAG GGGAGGGGAG 2760
CTGGGCTTGT GTATAAATAA AAAGACACCG ATGTTCAAAA ATACACATGA CTTCTNGGTA 2820
TGTNTGGGTA CCGAGCTCGA A 2842
(2) INFORMATION FOR SEQ ID NO: 79:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 787 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
Val Cys Pro Pro Pro Thr Gly Ala Thr Val Val Gln Phe Glu Gln Pro 1 5 10 15
Arg Arg Cys Pro Trp Glu Gly Gln Asn Tyr Thr Glu Gly Ile Ala Val
20 25 30
Val Phe Lys Glu Asn Ile Ala Pro Tyr Lys Phe Lys Ala Thr Met Tyr 35 40 45
Tyr Lys Asp Val Thr Val Ser Gln Val Trp Phe Gly His Arg Tyr Ser 50 55 60
Gln Phe Met Gly Ile Phe Glu Asp Arg Ala Pro Val Pro Phe Glu Glu 65 70 75 80
Val Ile Asp Lys Ile Asn Ala Lys Gly Val Cys Arg Ser Thr Ala Lys 85 90 95
Tyr Val Arg Asn Asn Met Thr Ala Phe His Arg Asp Asp His Glu Thr
100 105 110
Asp Met Glu Leu Lys Pro Ala Lys Val Ala Thr Arg Thr Ser Arg Gly 115 120 125
Trp His Thr Thr Asp Leu Lys Tyr Asn Pro Ser Arg Val Glu Ala Phe 130 135 140
His Arg Tyr Gly Thr Thr Val Asn Cys Ile Val Glu Glu Val Asp Ala 145 150 155 160
Arg Ser Val Tyr Pro Tyr Asp Glu Phe Val Leu Ala Thr Gly Asp Phe 165 170 175
Val Tyr Met Ser Pro Phe Tyr Gly Tyr Arg Glu Gly Ser His Thr Glu
180 185 190
His Thr Ser Tyr Ala Ala Asp Arg Phe Lys Gln Val Asp Gly Phe Tyr 195 200 205
Ala Arg Asp Leu Thr Thr Lys Ala Arg Ala Thr Ser Pro Thr Thr Arg 210 215 220
Asn Leu Leu Thr Thr Pro Lys Phe Thr Val Ala Trp Asp Trp Val Pro
225 230 235 240
Lys Arg Pro Ala Val Cys Thr Met Thr Lys Trp Gln Glu Val Asp Glu 245 250 255
Met Leu Arg Ala Glu Tyr Gly Gly Ser Phe Arg Phe Ser Ser Asp Ala 260 265 270
Ile Ser Thr Thr Phe Thr Thr Asn Leu Thr Gln Tyr Ser Leu Ser Arg 275 280 285
Val Asp Leu Gly Asp Cys Ile Gly Arg Asp Ala Arg Glu Ala Ile Asp
290 295 300
Arg Met Phe Ala Arg Lys Tyr Asn Ala Thr His Ile Lys Val Gly Gln
305 310 315 320
Pro Gln Tyr Tyr Leu Ala Thr Gly Gly Phe Leu Ile Ala Tyr Gln Pro 325 330 335
Leu Leu Ser Asn Thr Leu Ala Glu Leu Tyr Val Arg Glu Tyr Met Arg 340 345 350
Glu Gln Asp Arg Lys Pro Arg Asn Ala Thr Pro Ala Pro Leu Arg Glu 355 360 365
Ala Pro Ser Ala Asn Ala Ser Val Glu Arg Ile Lys Thr Thr Ser Ser 370 375 380
Ile Glu Phe Ala Arg Leu Gln Phe Thr Tyr Asn His Ile Gln Arg His
385 390 395 400
Val Asn Asp Met Leu Gly Arg Ile Ala Val Ala Trp Cys Glu Leu Gln
405 410 415
Asn His Glu Leu Thr Leu Trp Asn Glu Ala Arg Lys Leu Asn Pro Asn 420 425 430
Ala Ile Ala Ser Ala Thr Val Gly Arg Arg Val Ser Ala Arg Met Leu 435 440 445
Gly Asp Val Met Ala Val Ser Thr Cys Val Pro Val Ala Pro Asp Asn 450 455 460
Val Ile Val Gln Asn Ser Met Arg Val Ser Ser Arg Pro Gly Thr Cys
465 470 475 480
Arg Pro Leu Val Ser Phe Arg Tyr Glu Asp Gln Gly Pro Leu Ile Glu 485 490 495
Gly Gln Leu Gly Glu Asn Asn Glu Leu Arg Leu Thr Arg Asp Ala Leu 500 505 510
Glu Pro Cys Thr Val Gly His Arg Arg Tyr Phe Ile Phe Gly Gly Gly 515 520 525
Tyr Val Tyr Phe Glu Glu Tyr Ala Tyr Ser His Gln Leu Ser Arg Ala 530 535 540
Asp Val Thr Thr Val Ser Thr Phe Ile Asp Leu Asn Ile Thr Met Leu
545 550 555 560
Glu Asp His Glu Phe Val Pro Leu Glu Val Tyr Thr Arg His Glu Ile
565 570 575
Lys Asp Ser Gly Leu Leu Asp Tyr Thr Glu Val Gln Arg Arg Asn Gln
580 585 590
Leu His Asp Leu Arg Phe Ala Asp Ile Asp Thr Val Ile Arg Ala Asp
595 600 605
Ala Asn Ala Ala Met Phe Ala Gly Leu Cys Ala Phe Phe Glu Gly Met 610 615 620
Gly Asp Leu Gly Arg Ala Val Gly Lys Val Val Met Gly Val Val Gly
625 630 635 640
Gly Val Val Ser Ala Val Ser Gly Val Ser Ser Phe Met Ser Asn Pro
645 650 655
Phe Gly Ala Val Gly Leu Leu Val Leu Ala Gly Leu Val Ala Ala Phe
660 665 670
Phe Ala Phe Arg Tyr Val Leu Gln Leu Gln Arg Asn Pro Met Lys Ala
675 680 685
Leu Tyr Pro Leu Thr Thr Lys Glu Leu Lys Thr Ser Asp Pro Gly Gly 690 695 700
Val Gly Gly Glu Gly Glu Glu Gly Ala Glu Gly Gly Gly Phe Asp Glu
705 710 715 720
Ala Lys Leu Ala Glu Ala Arg Glu Met Ile Arg Tyr Met Xaa Leu Val
725 730 735
Ser Ala Met Glu Arg Thr Glu His Lys Ala Arg Lys Lys Gly Thr Ser
740 745 750
Ala Leu Leu Ser Ser Lys Val Thr Asn Met Val Leu Arg Lys Arg Asn
755 760 765
Lys Ala Arg Tyr Ser Pro Leu His Asn Glu Asp Glu Ala Gly Asp Glu 770 775 780
Asp Glu Leu 785
(2) INFORMATION FOR SEQ ID NO: 80:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4290 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
GAGGAAGAGA GGGGGGAGGG GAAGAGAAAA GAGAAAAGGA AGAGGAGGGA GGAGAAGAAG 60
GGAAGGGTAA GAGGGGAAGA AGGGAANAAG GAAAAAGGAG TGTGAGAGGA GGTAGGAGAT 120
GAAGAGAAAA GAGGGGGAGA GAGAAGGGAA AAAGAAGTGG AAGGGAGGGA GAAATAGGGG 180
AGAGGAGAAA AGTAAGATTA GGAGGTGGAG AGGGGAGGAA GAGGAATAAG ATAGGTAGAG 240
TAGGTGAAGG TGGGAGAAGG AGAGATGTAG GGATAGGGAA AAGGGGGGGG AGGGGAGATG 300
AGATAGAGAG GAGGGGGGAA AGGAAGGGGA TGAAGAGGAG GGGAGAGGGA GGGGGGGAGA 360
AAGGAGGTGA GGGGGGAGGG GGAAGAGGGG GGAGGGGAGG GGGGAGAGAA GAANAGAAAA 420
NNAAGNNNCC CCGGCCGCGT CCCCGNTGGA GCCCCTGGGG GACCCGACCC TGTGGCGGGC 480
GCTGTATGCG TGCGTCCTGG CGGCCCTGGA GCGCCAGACG GGGCCGGTGG CCCTCTTCGT 540
CCCGCTGCGC CTGGGCTGGG ACCCGCAGAC GGGTCTGGTC GTGAGGGTCG AAAGGGCGTC 600
GTGGGGCCCG CCGGCCGCTC CTCGCGCCGC CCTCCTGGAC GTGGAGGCCA AGGTCAACTT 660
CAACCCGCTG GCCCTGGCCG CGCGCGTCGC CGAGCACCCC GGCGCGCGGT TGGCGTGGGC 720
GCGCCTGGCC GCCATTCGCA ACAGCCCCCA GTGCGCGTCC TCCGCCTCGC TCGCCGTCAC 780
CATCACGACG AGGACCGCGC GTTTCGCGCG CGAATACACC ACCCTGGCGT TTCCGCCGAC 840
CAGCAAGGAG GGCGCCTTCG CGGACCTGGT CGAGGTGTGC GAGGTATGCC TGCGGCCCCG 900
CGGACACCCG CATCGGGTCA CGGCGCGGGT GCTGCTGCCG CGCGGCTACA ACTACTTCGT 960
GAGCGCCGGC GACGGGTTCT CCGCCCCGGC GCTGGTCGCC CTCTTCCGGC AGTGGCATAC 1020
CACGGTCCAC CCCGCCCCCG GAGCCCTGGC CCCCGTCTTC GCTTTTCTGG GGCCCGGGTT 1080
TGAGGTCCGG GGAGGGCCCC TCCAATACTT TGCCGTGCTG GGATTTCCGG GCTGGCCCCC 1140
CTTTACCGTG CCGGCCGCCG CCGCCGCCGA ATCGGTGCGT GACCTGCTGC GGGGCGCCGC 1200
GTGCACCCAT CCCCTTTGCC CTGGGGGCCC TGGCCCGCGG TGGGCGCCTA GGTCTTCCTG 1260
CCCCCGCGGG CATGGCCGGC CGTGGCCTCG GAGGCGGCCG GCCGCCTCCT GCCCGCCTTT 1320
CGGGAAGCGG TGGCGCGGTG GCACCCCACG GCCACCACCA TCCAACTACT CGACCCCCCG 1380
GCGGCCGTCG GGCCGGTCTG GACGGCGCGG TTTTGTTTCT CCGGGCTCCA GGCCCAGCTC 1440
CTGGCCGCCC TCGCGGGCCT CGGGGAGGCC GGGCTGCCGG AAGCCCGGGG GCGGGCGGGC 1500
CTGGAAAGGC TGGACGCGCT GGTGGCGGCC GCCCCCTCGG AGCCCTGGGC CCGGGCCGTG 1560
CTGGAGCGCC TGGTGCCGGA CGCGTGCGAC GCCTGCCCCG CGCTCCGGCA GCTGCTCGGC 1620
GGGGTCATGG CCGCCGTCTG CCTGCAGATC GAGCAGACGG CCAGCTCGGT GAAGTTTGCG 1680
GTCTGCGGCG GCACCGGGGC TGCGTTCTGG GGGCTGTTCA ACGTGGACCC CGGGGACGCG 1740
GACGCCGCGC ACGGCGCGAT CCATGACGCC CGCCGGGCCC TCGAGGCGTC CGTGCGCGCC 1800
GTACTTTCGG CCAACGGCAT ACGCCCGCGC CTCGCCCCCT CCCTGGCGCT AGAGGGCGTC 1860
TACACCCACG TCGTCACCTG GAGCCAGACC GGGGCGTGGT TCTGGAACTC CCGCGATGAC 1920
ACCGACTTCC TGCAGGGATT TCCTCTCCGC GGGCCCGCGT ACGCCGCGGC GGCCGAGGTT 1980
ATGCGCGACG CGCTGAGACG AATCCTCCGG CGGCCGGCCG CCGGCCCGCC GGAGGAGGCC 2040
GTGTGCGCGG CCCGGGGCAT CATGGAGGAC GCCTGTGACC GCTTTGTCCT GGATGCCTTC 2100
GGGAGGCGTC TGGACGCGGA GTACTGGAGC GTTCTGACCC CCCCGGGCGA GGCCGACGAC 2160
CCCCTGCCCC AAACGGCCTT CCGCGGAGGC GCCCTGCTGG ACGCGGAGCA ATACTGGAGA 2220
CGCGTCGTGC GCGTATGTCC CGGGGGCGGG GAGTCGGTCG GCGTCCCCGT GGATCTGTAC 2280
CCGCGGCCCT TGGTGCTCCC CCCCGTGGAC TGCGCCCATC ACCTGCGCGA GATCCTGCGC 2340
GAGATTCAAC TGGTGTTTAC GGGGGTTCTG GAAGGCGTGT GGGGCGAGGG CGGGAGCTTT 2400
GTGTACCCTT TCGAGGAAAA GATGCGGTTT CTGTTTCCCT GAATTTGGTC AATAAACTGG 2460
GGCCCCGTGC TCCAACTTAC CCCCGCGTGT GCGCGCGTCC GTATTTACTG ACACGCGCCG 2520
GTTGTGGTTT TCTTCTATTT CTTTGTTCCC TCTATCATGT CTTTCCACCA CCAGCACCAC 2580
CACCCCCCCA CTTTCCTCCT TCGGTGCACA AGACACACAC ACAGGCCCAC CACCATCCCC 2640
CGAGAGATGA CACGACAGGT AGGGAACGTT CCATAAAAAA CACGTTTATT TTCCGGAGTT 2700
AGCAAAACCG ATAGAAAAGC GACGAGGTCC GTCGTTTGGG GCTCCCCGAA AGCCACCAAT 2760
ACACCAGAGC CGAACGCAGG TCCTTGGATT TCCAGCAGCT TCCCATGACG CCGGCCGGGT 2820
TATAGGCCAC ACAGTCCGTG CGGGGGACGG GCCCGGGCAA CTGCAACGCA AAGCTCTGCG 2880
GGGTGGCGCA AAACAGGGCC GAAAGGACGG GGGGCGGATT GTTGGCCAGC AGGTAGTGGG 2940
CCATGTACCA GTGGGGCAGG ACTAGCTCGT CGTCGAAGGG CTTCACGCCC GCCATGCACA 3000
TTAGCGTGTT CAGGATCCAG TGGCAGCTGC GGAGGAGAAG GCAGCGGACG CGCTCGAAGG 3060
GAGACGCGTG GCGGGCCACA ACCCCCAACC ACGCCGACAC CTTGACAAAC AGCGAGGGCG 3120
TGGCGTGGCT CCGGGGACAG TTCTCCAGGT ACATCAGCAG GCAGACGAGC TCGAAGTCCC 3180
GGAGGTCCGT GGGGCGAAAC AAGGAGAGCC GGTGCAGCAG AACGAGGGGG GGCACGGCCA 3240
GGCTGTGCAC GTGGTCCTCG TTGGCCGTGA GGACGACGAC CGCAAAACCG CGATACGTCG 3300
GCTGGTCCCC GCAGCGCTTA AAGTAATCCT CCGACGCCAC GTACGCGTCG TCGGAACTCC 3360
CGTCCAGAAC GAACCGCAAC CGCCCCCTGG GGGTGACGTC AACGCGCAGG ACGCTGGTCG 3420
CGGTAAACCG CGGCTGGCGA TCGCTGACCT GGCGCACCTC GCAGGCCATG CGCAGCAGCG 3480
CCTGGTTGCT GATCCCCTCC GCCACCTCGA CCAGACTGCG GTCCCCGGCG ATGGCCTGTT 3540
TGAGGATGGC GGCGGCCGTT CCCTCATCGG CGGGCGTGGG GTCGGCCATC CCTGCGTTGG 3600
ACGCCCCAGC CCTGGTCCGG CGCACCCCTC GGCGTTCTCC CGGGCGACCG GGATCGGGTC 3660
CGGGTCCGGG ACCGGGACCC GCCCCGCGGG GACGCGCTCG CCCGGAAATC GGCGGGGGTT 3720
GGGGAGGGGG GCCGGGGCAG AGCCGCGTGC TGTACGTCCG CCACGAACAG GGCCGCGACG 3780
TCTGTCAGGT ACGTCTGCAG GCGGGTTTTT TTAAAGACCG CCTCCCATAA CTCCTCCTCC 3840
CCTAGGATGA CATCGGAGCC GGTGATGAGC GCGCCCGCTC GGGGGGCGCG AAGCACGTAC 3900
TCGAAATACG GGGCCACGAA GGAGGCGATC GCCCCGCTAG AGTACGAGAT CGACGTTTCC 3960
TGGCCCTGGT TGTTGCGGTG GCGCAGAATC TTGAAGCAGC GCACCAGCTC GTGCTCCCAG 4020
AGGCGCGACA GGCGCTCGAG GTCCTGGCCG TACGCGGGGA TGTACTGGTG CTGGAAACTG 4080
TTGGCCACGT ACGTGTCGTC GTCCATGGAC TTGCTGACGT CGATAATGTC GTAGTCGGCC 4140
CGGAGAAGAT CCGCCTCCGC CGGGCGGCCC GCGCCTCCCC CGGCCGCCCG GTCCGCCGCG 4200
CGATGCTCCC GCTCCAGCGC CCCCGCCTGG GCGCGCCGCA GCTCGCGGTC GCGCGCCTGC 4260
AGCTGGGTCG CCGGGGACAT CTAGAGTCG 4290
(2) INFORMATION FOR SEQ ID NO: 81:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 373 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:
Val Phe Val Pro Leu Arg Leu Gly Trp Asp Pro Gln Thr Gly Leu Val
1 5 10 15
Val Arg Val Glu Arg Ala Ser Trp Gly Pro Pro Ala Ala Pro Arg Ala
20 25 30
Ala Leu Leu Asp Val Glu Ala Lys Val Asn Phe Asn Pro Leu Ala Ala
35 40 45
Arg Val Ala Glu His Pro Gly Ala Arg Leu Ala Trp Ala Arg Leu Ala
50 55 60
Ala Ile Arg Asn Ser Pro Gln Cys Ala Ser Ser Ala Ser Leu Ala Val
65 70 75 80
Thr Ile Thr Thr Arg Thr Ala Arg Phe Ala Arg Glu Tyr Thr Thr Leu
85 90 95
Ala Phe Pro Pro Thr Ser Lys Glu Gly Ala Phe Ala Asp Leu Val Glu
100 105 110
Val Cys Glu Val Cys Leu Arg Pro Arg Gly His Pro His Arg Val Thr
115 120 125
Ala Arg Val Leu Leu Pro Arg Gly Tyr Asn Tyr Phe Val Ser Ala Gly
130 135 140
Asp Gly Phe Ser Ala Pro Ala Leu Val Phe Arg Gln Trp His Thr Thr
145 150 155 160
Val His Pro Ala Pro Gly Ala Pro Val Phe Ala Phe Leu Gly Pro Gly
165 170 175
Phe Glu Val Arg Gly Gly Pro Leu Gln Tyr Phe Ala Val Leu Gly Phe
180 185 190
Pro Gly Trp Pro Pro Phe Thr Val Pro Ala Ala Ala Ala Ala Glu Ser
195 200 205
Val Arg Asp Leu Leu Arg Gly Ala Ala Cys Thr His Pro Leu Cys Pro
210 215 220
Gly Gly Pro Gly Pro Arg Trp Ala Pro Arg Ser Ser Cys Pro Arg Gly
225 230 235 240
His Gly Arg Pro Trp Pro Arg Arg Arg Pro Ala Ala Ser Cys Pro Pro
245 250 255
Phe Gly Lys Arg Trp Arg Gly Gly Thr Pro Arg Pro Pro Pro Ser Asn
260 265 270
Tyr Ser Thr Pro Arg Arg Pro Ser Gly Arg Ser Gly Arg Arg Gly Phe
275 280 285
Val Ser Pro Gly Ser Arg Pro Ser Ser Trp Pro Pro Ser Arg Ala Ser
290 295 300
Gly Arg Pro Gly Cys Arg Lys Pro Gly Gly Gly Arg Ala Trp Lys Gly
305 310 315 320
Trp Thr Arg Trp Trp Arg Pro Pro Pro Arg Ser Pro Gly Pro Gly Pro
325 330 335
Cys Trp Ser Ala Trp Cys Arg Thr Arg Ala Thr Pro Ala Pro Arg Ser
340 345 350
Gly Ser Cys Ser Ala Gly Ser Trp Pro Pro Ser Ala Cys Arg Ser Ser
355 360 365
Arg Arg Pro Ala Arg
370
(2) INFORMATION FOR SEQ ID NO: 82:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 380 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:
Val Ala Ser Glu Ala Ala Gly Arg Leu Leu Pro Ala Phe Arg Glu Ala
1 5 10 15
Val Ala Arg Trp His Pro Thr Ala Thr Thr Ile Gln Leu Leu Asp Pro
20 25 30
Pro Ala Ala Val Gly Pro Val Trp Thr Ala Arg Phe Cys Phe Ser Gly
35 40 45
Leu Gln Ala Gln Leu Leu Ala Ala Gly Leu Gly Glu Ala Gly Leu Pro
50 55 60
Glu Arg Arg Ala Gly Leu Glu Arg Leu Asp Ala Leu Val Ala Ala Ala
65 70 75 80
Pro Ser Glu Pro Trp Ala Arg Ala Val Leu Glu Arg Leu Val Pro Asp
85 90 95
Ala Cys Asp Ala Cys Pro Ala Leu Arg Gln Leu Leu Gly Gly Val Met
100 105 110
Ala Ala Val Cys Leu Gln Ile Glu Gln Thr Ala Ser Ser Val Lys Phe
115 120 125
Ala Val Cys Gly Gly Thr Gly Ala Ala Phe Trp Gly Leu Phe Asn Val
130 135 140
Asp Pro Gly Asp Ala Asp Ala Ala His Gly Ala Ile His Asp Ala Arg
145 150 155 160
Arg Ala Leu Glu Ala Ser Val Arg Ala Val Leu Ser Ala Asn Gly Ile
165 170 175
Arg Pro Arg Leu Ala Pro Ser Leu Ala Leu Glu Gly Val Tyr Thr His
180 185 190
Val Val Thr Trp Ser Gln Thr Gly Ala Trp Phe Trp Asn Ser Arg Asp
195 200 205
Asp Thr Asp Phe Leu Gln Gly Phe Pro Leu Arg Gly Pro Ala Tyr Ala
210 215 220
Ala Ala Ala Glu Val Met Arg Asp Ala Leu Arg Arg Ile Leu Arg Arg
225 230 235 240
Pro Ala Ala Gly Pro Pro Glu Glu Ala Val Cys Ala Arg Ile Met Glu
245 250 255
Asp Ala Cys Asp Arg Phe Val Leu Asp Ala Phe Gly Arg Arg Leu Asp
260 265 270
Ala Glu Tyr Trp Ser Val Leu Thr Pro Pro Gly Glu Ala Asp Asp Pro
275 280 285
Leu Pro Gln Thr Ala Phe Arg Gly Gly Ala Leu Leu Asp Ala Glu Gln
290 295 300
Tyr Trp Arg Arg Val Val Arg Val Cys Pro Gly Gly Gly Glu Ser Val
305 310 315 320
Gly Val Pro Val Asp Leu Tyr Pro Arg Pro Leu Val Leu Pro Pro Val
325 330 335
Asp Cys Ala His His Leu Arg Glu Ile Leu Arg Glu Ile Gln Leu Val
340 345 350
Phe Thr Gly Val Leu Glu Gly Val Trp Gly Glu Gly Gly Ser Phe Val
355 360 365
Tyr Pro Phe Glu Glu Lys Met Arg Phe Leu Phe Pro
370 375 380
(2) INFORMATION FOR SEQ ID NO: 83:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 302 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:
Val Arg Arg Thr Arg Ala Gly Asn Ala Gly Met Ala Asp Pro Thr Pro
1 5 10 15
Ala Asp Glu Gly Thr Ala Ala Ala Ile Leu Lys Gln Ala Ile Ala Gly
20 25 30
Asp Arg Ser Leu Val Glu Val Ala Glu Gly Ile Ser Asn Gln Ala Leu
35 40 45
Leu Arg Met Ala Cys Glu Val Arg Gln Val Ser Asp Arg Gln Pro Arg
50 55 60
Phe Thr Ala Thr Ser Val Leu Arg Val Asp Val Thr Pro Arg Gly Arg
65 70 75 80
Leu Arg Phe Val Leu Asp Gly Ser Ser Asp Asp Ala Tyr Val Ala Ser
85 90 95
Glu Asp Tyr Phe Lys Arg Cys Gly Asp Gln Pro Tyr Gly Phe Ala Val
100 105 110
Val Val Leu Thr Ala Asn Glu Asp His Val His Ser Leu Ala Val Pro
115 120 125
Pro Leu Val Leu Leu His Arg Leu Ser Leu Phe Arg Pro Thr Asp Leu
130 135 140
Arg Asp Phe Glu Leu Val Cys Leu Leu Met Tyr Leu Glu Asn Cys Pro
145 150 155 160
Arg Ser His Ala Thr Pro Ser Leu Phe Val Lys Val Ser Ala Trp Leu
165 170 175
Gly Val Val Ala Arg His Asp Phe Glu Arg Val Arg Cys Leu Leu Leu
180 185 190
Arg Ser Cys His Trp Ile Leu Asn Thr Leu Met Cys Met Ala Gly Val
195 200 205
Lys Pro Phe Asp Asp Glu Leu Val Leu Pro His Trp Tyr Met Ala His
210 215 220
Tyr Leu Leu Ala Asn Asn Pro Pro Pro Val Leu Ser Ala Leu Phe Cys
225 230 235 240
Ala Thr Pro Gln Ser Phe Ala Leu Gln Leu Pro Gly Pro Val Pro Arg
245 250 255
Thr Asp Cys Val Ala Tyr Asn Pro Ala Gly Val Met Gly Ser Cys Trp
260 265 270
Lys Ser Lys Asp Leu Arg Ser Ala Leu Val Tyr Trp Trp Leu Ser Gly
275 280 285
Ser Pro Lys Arg Arg Thr Ser Ser Leu Phe Tyr Arg Phe Cys
290 295 300
(2) INFORMATION FOR SEQ ID NO: 84:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 236 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:
Met Ser Pro Ala Thr Gln Leu Gln Ala Arg Asp Arg Glu Leu Arg Arg
1 5 10 15
Ala Gln Ala Gly Ala Leu Glu Arg Glu His Arg Ala Ala Asp Arg Ala
20 25 30
Ala Gly Gly Gly Ala Gly Arg Pro Ala Glu Ala Asp Leu Leu Arg Ala
35 40 45
Asp Tyr Asp Ile Ile Asp Val Ser Lys Ser Met Asp Asp Asp Thr Tyr
50 55 60
Val Ala Asn Ser Phe Gln His Gln Tyr Ile Pro Ala Tyr Gly Gln Asp
65 70 75 80
Leu Glu Arg Leu Ser Arg Leu Trp Glu His Glu Leu Val Arg Cys Phe
85 90 95
Lys Ile Leu Arg His Arg Asn Asn Gln Gly Gln Glu Thr Ser Ile Ser
100 105 110
Tyr Ser Ser Gly Ala Ile Ala Ser Phe Val Ala Pro Tyr Phe Glu Tyr
115 120 125
Val Leu Arg Ala Pro Arg Ala Gly Ala Leu Ile Thr Gly Ser Asp Val
130 135 140
Ile Leu Gly Glu Glu Glu Leu Trp Glu Ala Val Phe Lys Lys Thr Arg
145 150 155 160
Leu Gln Thr Tyr Leu Thr Asp Val Ala Ala Leu Phe Val Ala Asp Val
165 170 175
Gln His Ala Ala Leu Pro Arg Pro Pro Ser Pro Thr Pro Ala Asp Phe
180 185 190
Arg Ala Ser Asp Arg Gly Gly Ser Arg Ser Arg Thr Arg Thr Arg Ser
195 200 205
Arg Ser Pro Gly Arg Thr Pro Arg Gly Ala Pro Asp Gln Gly Trp Gly
210 215 220
Val Gln Arg Arg Asp Gly Arg Pro His Ala Arg Arg
225 230 235
(2) INFORMATION FOR SEQ ID NO: 85:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3664 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:
GTGTGTTTTT GTTTGTCTCC ACTTGAAGAG GCGTAAATTT GAGTTTCTAG GGGGGGGCCA 60
GAGGAAAACC ANCAAGGCCC TTTAGGTTTT CGCCCTTNGN GGCCCNTGGT AACGTTTTAT 120
CCGGNGTTAT CCAGATAGGG GAGACCCANA CCCCCTTGAG GAGGNAAACC TTTCCCCACC 180
CGNACCCGCG CCCCAGATTT AGTGAGGGGG ANGGAAGAGC CCCAAACACC NACCCCCTTT 240
CCGGNGGGTN GCGATTTAAT ATGCANTGCA GACAGTTCTC GATTGGAACG GGCATGGCGC 300
AACCANTNAT GGGTNGTCAT CTTGCACCCC GCTACATTAA GTTCGTTTGA AGTGGGGATG 360
GGGGTAACAT TAACAGAACA GTTAGCCAGA TACGCCAGGG GCATTACCTC ATAAAGGACA 420
AAGTGAGTTC CACGCGTGCG CCGTTTTAGA TTAGTGATCC CCCGGCTGCA GGATTCGATN 480
GGGAGACAGT CACGAGTCNN GGACCACGTC GGNGACCCAG GCCCCAGNNT GTGTCCNCCC 540
AGCCCCCCAG TCATGACGTT TGTGAGCACG ACGAGTCTGC GGCCGGGCTG GGGGCGCGTC 600
TTCGTTCGCG TGGGCCATCA CTTCCTGAAT GGCTGCGGTG CGCTGATCGC CCGAGCTGGC 660
GAAGGGCGCC ACAACCAGCG CGCGCTCCGT CTGCAGGCCC TTCCACGTGT CGTGGAGTTC 720
CTGAACGAAC TCGGCCACCC GCTCGGGGCC CGTGGCCGCG CGCGCGGCCT GATAGCCGGC 780
CGAGAGGCGC CGCCAGCGCG CCAGGAACTG ACTCATGTAA CAGAACCCGG GGACCTGGTC 840
CCCCGACATC AACTTTGACG CCCTGGCGTG GATGCCCGAC ACGATGGCCA GGAACCCGTG 900
GATTTCCCGC CGCACGACGG CCAGCACGTT ACCCTCGTGC GAGACCTGGG CCGCCAGCTC 960
GTCGCATACC CCGAGGTGCG CCGTCGTCTC GGTGACGACG GACCGCAGCC CCGCGAGGGA 1020
CGCGACCAGC GCGCGCTTGG CGTCGTGATA CATGCCGCAG TACTGGCTCA CCGCGTCGCC 1080
CATGGCCTCG GGGCGCCAGG GCCCCAGGCG CTCGTGGGCG TCTGCGACCA CGGCGTACAG 1140
GCGGTGCCCG TCGCTCTCGA ACCGGCACTC AAAGAAGGCG GCGAGCGTGC GCATGTGCAG 1200
CCGCAGCAGC ACGATCGCGT CCTCCAGCTG GCGGACCAGG GGGTCGGCGC GCTCGGCAAA 1260
CTCCTGCATC ACCCCCCGGG CCGCCAGGGC GTACATGCTG ATCAGCAGCA GGCTGCTGCC 1320
CACCTCGGGA GGCTGGGGGG GAGGCAGCTG GACCGCGGGC CGCAGCTGCT CGACGGCCCC 1380
CCTGGCGATC ACGTACAGCT CGCGCAGCAG CTGCTCGATG TTGTCGGCCA TCTGCATCGT 1440
GGGCCCGACG CCGGCCCGGG TGGCCGGTTC GAGGAGGGTG ATCAGCGCGC CCAATTTTGT 1500
GCGGTGCCCC TCGACGGTGG GGAGATAGCC CAGGCCGAAG TCGCGCGCCC AGGCCAGCAC 1560
CCGCAGGGCA AACTCGATGG GGCGGGGCAG GTAGGCAGCG TTGCACGTGG CCCTCAGCGC 1620
GTCCCCGACC ACCAGGGCCA GCACGTAAGG GACGAACCCC GGGTCGGCGA GGACGTTGGG 1680
GTGGATGCCC TCCAGGGCCG GGAAGCGGAT CTTGGTGGCC GCGGCCAGGT GAACCGAGGG 1740
GGCGTGGCTA GGCGGCCCGA CGGGGAGCAG CGCGGACAGC GGCGTGGCCG GGGTGGTGGG 1800
GGTCAGGTCC CAGTGGGTCT GGCCGTACAC GTCGAGCCAG ATGAGCGCCG TCTCGCGCAG 1860
GAGGCTGGGC TGGCCGGCGC TGAAGCGGCG CTCGGCCGTC TCAAACTCCC CCACGAGCGT 1920
GCGCCGCAGG CTCGCCAGGT GTTCCGTCGG CACGGCCGGG CCCATGATGC GCGCCAGCGT 1980
CTGGCTGAGG ACGCCGCCCG ACAGGCCGAC CGCCTCACAG AGCCGCCCGT GCGTGTGCTC 2040
GCTGGCGCCC TGGATCCGCC GGAACGTTTT CACGTAGCCG GCGTAGTGCC CGTACTCCCG 2100
CGCGAGCCCG AACACGTTCG CCCCCGCAAG GGCAATGCAC CCAAAGAGCT GCTGGATCTC 2160
GCTGAGCCCG TGGCCGGGGG GCGTCCGCGC GGGCACCCCC GCCACCAAAA ACCCCTCCAG 2220
GGCCGATATG TACTGGGTGC AGTGCGCGGG CGTGAACCCC GCGTCGGTAA GCGTGTTGAT 2280
CACCACGGAG GGCGAGTTGC TGTTTTGGAC CAAAGCCCAC GTCTGCTGCA GCAGCGCGAG 2340
GAGCCGTTGC TGGGCCCCGG CGGAGGGCGG CTCCCCTAGC TGCAGCAGGC CGGTGACGGC 2400
CGGACGGAAG ATGGCCAGCG CCGACGCACT CAGAAACGGC ACGTCGGGGT CGAAGACGGC 2460
CGCGTCCGTC CGCACGCGCG CCATCAGCGT CCCCGGGGGC GCGCACGCCG ACCGCGGGCT 2520
GACGCGGCTT AGGGCGGTCG ACACGCGCAC CTCCTCGCGA CTGCGAACCA TTTTGGTGGC 2580
CTCGAGGGGC GGGATCATGA TAGCCGGGTC GATCTCCCGC ACCGTGTGCT GAAACTGGGC 2640
CAGCAGCGGC GGCGGGACCA CCGCGCCCCG ATCGGGGGTC GTCAGGTACT CGTCCACCAG 2700
CGCCAGCGTA AACAGGGCCC GCGTGAGGGG GGTCAGGGCG GCGTCGTCGA TGCGCTGTAG 2760
GTGCGCCGAG AACAGCGTCA CCCAATTGCT GACCAGGGCC AAGAACCGGA GACCCTCTTG 2820
CACGATCGGG GACGGGAAGA GCAGGCTGTA CGCCGGGGTG GTCAGGTTGG CGCCGGGTTG 2880
CCCCAGGGGA ACCGGGGACA TCTTAAGCGA CATCTCCCCG AGGGCCTCCA GGGAGGTCCG 2940
CGGGTTCATG GCCAGGCAGC TCTGGGTGAC GGTCCGCCAG CGGTCGATCC ACTCCACGGC 3000
ACACTGGCGG ACGCGCACCG GCCCCAGGGC CGCCGTGGTG CGCAGCCCGG CGGCCTCCAG 3060
CGCGTGGGTC GTGTCGGAGC CGGTGATCGC CAGGACCGTG TCCTTGATGA CGTCCATCTC 3120
CCGGAAGGCC GCCTCGGGGG TCTCGGGGAG CGCCACCGCC ATGCGGTGCA CCAGCAGCCC 3180
GGGGAGGTTC TCGGCCAAGA GCGCCGTCTC CGGAAGCCCG TGGGCCCGGT GCAAGGCGCA 3240
CAGTTGCTCC AGGAGCGGGT GCCAGCACGC CCGCGCCTCC GCCGGGCCGA CCGCCGCGCC 3300
CGACAACAGA AACGCCGCCG TGGCGGCGCG CAGTTTGGCC GCGGACAGAA ACGCCGGCTC 3360
GTCCGCGCTG CCCGCCGGCT CGCTCGAGGG GGAGGGCGGC CGGCGGAGGT TGGTCAGGCT 3420
CCCCAACAGG ACCTGCAACG GTCCGTTTGG GGGTGGAGCG GACGGGGGGG TCATGCCGGC 3480
GGGCGCCGGG ACCTGGAGCG CGCTGTCCGA CATGGCGACC GGCGTGCGCG CTCGGCGACG 3540
CGGCGCGGAG ACCGCGGGCC CAAACGGGAA TGACTGCCGC CGCCCTATAC GGAGGGGGTA 3600
AGTATCGCCC GGGGACCCTT CGAAACCCCG GGCGTGTCGC AAGTACGCCC GCGAAAGGCG 3660
CGG 3664
(2) INFORMATION FOR SEQ ID NO: 86:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1043 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:
Pro Arg Leu Ser Arg Ala Tyr Leu Arg His Arg Phe Glu Gly Ser Pro
1 5 10 15
Gly Asp Thr Tyr Pro Leu Arg Ile Gly Arg Arg Gln Ser Phe Pro Phe
20 25 30
Gly Pro Ala Val Ser Ala Pro Arg Arg Arg Ala Arg Thr Pro Val Ala
35 40 45
Met Ser Asp Ser Ala Leu Gln Val Pro Ala Pro Ala Gly Met Thr Pro
50 55 60
Pro Ser Ala Pro Pro Pro Asn Gly Pro Leu Gln Val Leu Leu Gly Ser
65 70 75 80
Leu Thr Asn Leu Arg Arg Pro Pro Ser Pro Ser Ser Glu Pro Ala Gly
85 90 95
Ser Ala Asp Glu Pro Ala Phe Leu Ser Ala Ala Lys Leu Arg Ala Ala
100 105 110
Thr Ala Ala Phe Leu Leu Ser Gly Ala Ala Val Gly Pro Ala Glu Ala
115 120 125
Arg Ala Cys Trp His Pro Leu Leu Glu Gln Leu Cys Ala Leu His Arg
130 135 140
Ala His Gly Leu Pro Glu Thr Ala Leu Leu Ala Glu Asn Leu Pro Gly
145 150 155 160
Leu Leu Val His Arg Met Ala Val Pro Glu Thr Pro Glu Ala Ala Phe
165 170 175
Arg Glu Met Asp Val Ile Lys Asp Thr Val Leu Ala Ile Thr Gly Ser
180 185 190
Asp Thr Thr His Ala Leu Glu Ala Ala Gly Leu Arg Thr Thr Ala Ala
195 200 205
Leu Gly Pro Val Arg Val Arg Gln Cys Ala Val Glu Trp Ile Asp Arg
210 215 220
Trp Arg Thr Val Thr Gln Ser Cys Leu Ala Met Asn Pro Arg Thr Ser
225 230 235 240
Leu Glu Ala Leu Gly Glu Met Ser Leu Lys Met Ser Pro Val Pro Leu
245 250 255
Gly Gln Pro Gly Ala Asn Leu Thr Thr Pro Ala Tyr Ser Leu Leu Phe
260 265 270
Pro Ser Pro Ile Val Gln Glu Gly Leu Arg Phe Leu Ala Leu Val Ser
275 280 285
Asn Trp Val Thr Leu Phe Ser Ala His Leu Gln Arg Ile Asp Asp Ala
290 295 300
Ala Leu Thr Pro Leu Thr Arg Ala Leu Phe Thr Leu Ala Leu Val Asp
305 310 315 320
Glu Tyr Leu Thr Thr Pro Asp Arg Gly Ala Val Val Pro Pro Pro Leu
325 330 335
Leu Ala Gln Phe Gln His Thr Val Arg Glu Ile Asp Pro Ala Ile Met
340 345 350
Ile Pro Pro Leu Glu Ala Thr Lys Met Val Arg Ser Arg Glu Glu Val
355 360 365
Arg Val Ser Thr Ala Leu Ser Arg Val Ser Pro Arg Ser Ala Cys Ala
370 375 380
Pro Pro Gly Thr Leu Met Ala Arg Val Arg Thr Asp Ala Ala Val Phe
385 390 395 400
Asp Pro Asp Val Pro Phe Leu Ser Ala Ser Ala Ile Phe Arg Pro Ala
405 410 415
Val Thr Gly Leu Leu Gln Leu Gly Glu Pro Pro Ser Ala Gly Ala Gln
420 425 430
Gln Arg Leu Leu Ala Leu Leu Gln Gln Thr Trp Ala Leu Val Gln Asn
435 440 445
Ser Asn Ser Pro Ser Val Val Ile Asn Thr Leu Thr Asp Ala Gly Phe
450 455 460
Thr Pro Ala His Cys Thr Gln Tyr Ile Ser Ala Leu Glu Gly Phe Leu
465 470 475 480
Val Ala Gly Val Pro Ala Arg Thr Pro Pro Gly His Gly Leu Ser Glu
485 490 495
Ile Gln Gln Leu Phe Gly Cys Ile Ala Gly Ala Asn Val Phe Gly Leu
500 505 510
Ala Arg Glu Tyr Gly His Tyr Ala Gly Tyr Val Lys Thr Phe Arg Arg
515 520 525
Ile Gln Gly Ala Ser Glu His Thr His Gly Arg Leu Cys Glu Ala Val
530 535 540
Gly Leu Ser Gly Gly Val Leu Ser Gln Thr Leu Ala Arg Ile Met Gly
545 550 555 560
Pro Ala Val Pro Thr Glu His Leu Ala Ser Leu Arg Arg Thr Leu Val
565 570 575
Gly Glu Phe Glu Thr Ala Glu Arg Arg Phe Ser Ala Gly Gln Pro Ser
580 585 590
Leu Leu Arg Glu Thr Ala Leu Ile Trp Leu Asp Val Tyr Gly Gln Thr
595 600 605
His Trp Asp Leu Thr Pro Thr Thr Pro Ala Thr Pro Leu Ser Ala Leu
610 615 620
Leu Pro Val Gly Pro Pro Ser His Ala Pro Ser Val His Leu Ala Ala
625 630 635 640
Ala Thr Lys Ile Arg Phe Pro Ala Leu Glu Gly Ile His Pro Asn Val
645 650 655
Leu Ala Asp Pro Gly Phe Val Pro Tyr Val Leu Ala Leu Val Val Gly
660 665 670
Asp Ala Leu Arg Ala Thr Cys Asn Ala Ala Tyr Leu Pro Arg Pro Ile
675 680 685
Glu Phe Ala Leu Arg Val Leu Ala Trp Ala Arg Asp Phe Gly Leu Gly
690 695 700
Tyr Leu Pro Thr Val Glu Gly His Arg Thr Lys Leu Gly Ala Leu Ile
705 710 715 720
Thr Leu Leu Glu Pro Ala Thr Arg Ala Gly Val Gly Pro Thr Met Gln
725 730 735
Met Ala Asp Asn Ile Glu Gln Leu Leu Arg Glu Leu Tyr Val Ile Arg
740 745 750
Ala Val Glu Gln Leu Arg Pro Ala Val Gln Leu Pro Pro Pro Gln Pro
755 760 765
Pro Glu Val Gly Ser Ser Leu Leu Leu Ile Ser Met Tyr Ala Arg Val
770 775 780
Met Gln Glu Phe Ala Glu Arg Ala Asp Pro Leu Val Arg Gln Leu Glu
785 790 795 800
Asp Ala Ile Val Leu Leu Arg Leu His Met Arg Thr Leu Ala Ala Phe
805 810 815
Phe Glu Cys Arg Phe Glu Ser Asp Gly His Arg Leu Tyr Ala Val Val
820 825 830
Ala Asp Ala His Glu Arg Leu Gly Pro Trp Arg Pro Glu Ala Met Gly
835 840 845
Asp Ala Val Ser Gln Tyr Cys Gly Met Tyr His Asp Ala Lys Arg Ala
850 855 860
Leu Val Ala Ser Leu Ala Gly Leu Arg Ser Val Val Thr Glu Thr Thr
865 870 875 880
Ala His Leu Gly Val Cys Asp Glu Leu Ala Ala Gln Val Ser His Glu
885 890 895
Gly Asn Val Leu Ala Val Val Arg Arg Glu Ile His Gly Phe Leu Ala
900 905 910
Ile Val Ser Gly Ile His Ala Arg Ala Ser Lys Leu Met Ser Gly Asp
915 920 925
Gln Val Pro Gly Phe Cys Tyr Met Ser Gln Phe Leu Ala Arg Trp Arg
930 935 940
Arg Leu Ser Ala Gly Tyr Gln Ala Ala Arg Ala Ala Thr Gly Pro Glu
945 950 955 960
Arg Val Ala Glu Phe Val Gln Glu Leu His Asp Thr Trp Lys Gly Leu
965 970 975
Gln Thr Glu Arg Ala Leu Val Val Ala Pro Phe Ala Ser Ser Gly Asp
980 985 990
Gln Arg Thr Ala Ala Ile Gln Glu Val Met Ala His Ala Asn Glu Asp
995 1000 1005
Ala Pro Pro Ala Arg Pro Gln Thr Arg Arg Ala His Lys Arg His Asp
1010 1015 1020
Trp Gly Ala Gly Xaa Thr Xaa Xaa Gly Ala Trp Val Xaa Asp Val Val
1025 1030 1035 1040
Xaa Asp Ser
(2) INFORMATION FOR SEQ ID NO: 87:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5033 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:
CGCGGTCGAC TCTAGAGGAT CCCCTGCGCC GCGTCGGGAT TCACCAACTC GTTCGCGCGC 60
TGCAGGAGGT TCTTGCCCTC GCAGACCGTC ACGCGAATGG TGGTGAGGTC GAGGAGCTCG 120
TTGAGGTCTC CGTCGGTGTG CGGCCGCGAC ATGTCCCACA GCTGTACCGC CGCCAGCCGG 180
GCGTGCGTGG CCGCCAGGCG CCCGACCGCG GCGCAAAAAA CGCGCTTGTT GAACCCGGCC 240
ACCCGGGGGG TCCACGGCGC CGTGGGGCTC GGTGGGGCGG TGCTGAATTG CACCTCCTTG 300
GCCAGTCCCT GGGCGGGTGT CTTGGTTCTT CCCGAGGCCG TGGGAGCGGG GGCGTCTAGG 360
AGCACGGCGG TATCGGCCTG GGCGGGTCGC CTGCCGCGGG CAGGGTCGGT CGCCGGGGTC 420
GCGGGGGCCT TAGGGCGCCC CGCGCGTCAT TTTGGGGGTC CGCGCGGGAG GGGCGTGCGA 480
GCGCCCGCCG GCGCCCACGG GGCCCCCGGG GGGTGGAGGA GCGCGCGCGG GGCCGGGGCC 540
GTGAGAGCCC GCGACGGACG CCGAACGACG CGGTCGCGCG GTATCCCGGG ACTCGTCGTT 600
GTCTTCGGAC GACGACGAGT CCCGGTAGAG GGCATACCCA GCCTCGTCAT AATGGAGAAA 660
GCGAACCTCG CCCCTTGGGC GCGCGCGCAT CGGGCCAGCG CCGCGGCGGA AGTCGTCGCG 720
CGGACTCTCT GGATCCGCCG GGGAGACCGG GCCATAGTAC AGCTCCTCGT GGGTCCCGCG 780
CGGCGCTTCC CGCGGACACG ACTTGACGGA GCGGCGAGAG GTCATGGTCT ATCGGAGACA 840
CCGGGGACGC CCGTGCGGAT CACAGGGAAG GCGTCGGCGA AGCAGGCAGA GAGCGTCGGA 900
AGGCGGCGAG GGAGGGAAAG AGGGAGACCG GCGGGGTACG GGAGAGCAGC GAGGGCCTGC 960
GTAACCCACG GGGGCCGCGG GAGTGGCTCC CTGCGGGTTG CGGGGGAGAG TTTATAGGAA 1020
GTGGATATAA CCGCAGGCGA CGGGACTAAC CAATCCCCGG GGGGGCAACG GACAGACACG 1080
CCCCGAACCG GCCCGACTTC CGCGAGGAAG CAAAGGCCGG GGGCCGCCCA ACGACACGCC 1140
CACCCCTTCC CAACAGGGCG GGCTCAGGCT GACCCGGCGG CCAGTGCCCG CTGGCATATC 1200
TGATACACGT GCGCGATCAT ACATACGCCC ATCGAGGTCA TGCCTAGATA AAAGGGCACC 1260
AGGACCCCCG GGACGGACAC CACACCGGCG CTGTCGCCCC GGCATTGCGC GTCCCCGATA 1320
ACGCCGCGTG CGCCTGCCGC GTTCGGCGGC TCCCCGGGCA CGCCCGCGAC GAGCGCGACG 1380
AACAACAGCA CCACCCAGCG GCCCAGTCTT GCGGGTTTCC CCGTCATCGC GGCGATGAGT 1440
CAGTGGGGGC CCAGGGCGAT CCTTGTCCAG ACGGACAGCA CCAACCGGAA TGCCGATGGG 1500
GACTGGCAAG CGGCCGTAGC TATTCGCGGG GGCGGAGTCG TTCAACTGAA CATGGTCAAC 1560
AAACGCGCCG TGGATTTTAC CCCGGCAGAA TGCGGGGACT CCGAATGGGC CGTGGGCCGC 1620
GTCTCTCTGG GCCTGCGAAT GGCAATGCCG CGTGACTTCT GCGCGATTAT TCACGCCCCC 1680
GCGGTATCCG GCCCCGGGCC CCACGTGATG CTCGGTCTCG TCCACTCGGG CTACCGCGGA 1740
ACCGTCCTGG CCGTGGTCGT ATCCCCGAAC GGGACGCGCG GGTTTGCCCC CGGGGCCCTC 1800
CGGGTCGACG TGACGTTTCT GGACATCCGG GCCACCCCCC CGACCCTCAC CGAGCCGAGC 1860
TCCCTGCACC GGTTTCCGCA GTTGGCGCCG TCCCCGCTGG CAGGGTTACG AGAAGATCCT 1920
TGGTTGGACG GGGCGCTCGC GACCGCCGGG GGGGCGGTGG CCCTGCCGGC CAGACGGCGC 1980
GGGGGATCGC TGGTCTACGC GGGCGAGCTA ACGCAGGTGA CCACCGAGCA CGGCGACTGC 2040
GTGCACGAGG CGCCCGCCTT TCTGCCAAAG CGCGAGGAGG ACGCAGGCTT TGACATTCTC 2100
ATCCACCGAG CCGTGACCGT CCCGGCCAAC GGCGCCACGG TCATACAGCC GTCCCTCCGC 2160
GTATTGCGCG CGGCCGACGG ACCAGAGGCC TGCTATGTGC TGGGGCGGTC GTCGCTCAAT 2220
GCCAGGGGCC TCCTGGTCAT GCCTACGCGC TGGCCCTCCG GGCACGCCTG TGCGTTTGTT 2280
GTATGTAACC TGACCGGAGT CCCGGTGACC CTACAAGCCG GGTCCAAGGT CGCCCAGCTG 2340
CTCGTCGCGG GGACCCACGC CCTCCCCTGG ATCCCCCCCG ACAACATCCA CGAGGACGGC 2400
GCATTCCGGG CCTACCCCAG AGGGGTTCCG GACGCGACCG CCACCCCCCG AGACCCGCCG 2460
ATTTTGGTGT TTACGAACGA GTTTGACGCG GACGCCCCCC CAAGCAAGCG GGGGGCCGGG 2520
GGGTTTGGCT CCACTGGCAT CTAAACCGCG CCTCGCGTCG GGCCAGATGG GGCCCCGGTC 2580
AATAAAGAGC TCTGTTTCGC ATATGCCCTG GTGTTGGCGG TTTTTTTTTT GTTGTCTGTC 2640
TGCCCGGCAC TCGGTTGTCC GTTCTGTCGT CGCTATCACA TACGCACAAA CACACGGGTA 2700
GAGTGGAACC GAAACCGGTC GACGTTTATT CACCACACAG AAACACAAGC TAAGCGAGAA 2760
GGAGGGGGGC CTCGGTCGAC GAGGCCTGGC GTTTGGGGGC GGACGTGCGA TGACGTGGGT 2820
CCGGTGTAGG GTCCGCGGGG GGCACGGGCC CGGGGCGAAC GGGGGATCTG TCGCCGGCGT 2880
GGGTGACTGG GACCGACGCA ACCTCCGGGG CTTGTGCCCT CGTAGGCCCG GGGGGGGCCT 2940
CGGTCGCTCC GAGCCCCGCG GTGCGGGTCC CTCCGGCCAG AGCCGAGGTG GAGAGACCAA 3000
GGGCCCGCTC CGCGATCGCC ACGTCCTCCA TGACCACGTC GCTTTCGGCC ATGCTCCGAA 3060
TGGCCTGGGA GACGAGCACG TCCGCCGACT TGTCCGCGGC CCCCACCGAC ATGTACATCT 3120
GCAGGATGGT GGCCATGCAC GTGTCCGCCA GGCGGCGCAT CTTGTCCCGA TGCGCCGCAA 3180
CGGCCCCGTC GATGGTGGAG CCCTCGAGTC CCGGGTGGTG GCGCGCCAGC CTCTCGAGGT 3240
TGACCATGCA GGCGTGGTAT GTGCGGGCCA GGGCGCGCGC CTTCACGAGG CGCCGGGTGT 3300
CGTCCAGCGA CTCTAGGGCG TCATCAAGCG TGATGGGGGC GGGCAAAAGC GCATTGACCA 3360
CCGCCAGGGC CTCCTGCAGC CGCGGCTCCG CCTCCGAGGG CGGATCCGCG GCCCGAATCA 3420
TCTCATATTG TTGTTCCTCG GGGCGCGTGC CCCAACCGCA CAGCACCCCG AGCAGGGACG 3480
CCATCCCGGA ACACGCGCGC GGCTCTGCGC CGGCTTTCCC CCACCCCACC CCCTCCGGGT 3540
TCGCAGGGGC GATGGGGACG GAAGACTGCG ATCACGAAGG GCGGTCGGTT GCGGCTCCCG 3600
TGGAGGTTAT GGCGCTGTAT GCGACCGACG GGTGCGTTAT CACCTCCTCG CTCGCCCTCC 3660
TCACAAACTG CCTGCTGGGG GCCGAGCCGT TGTATATATT CAGCTACGAC GCGTACCGGC 3720
CCGATGCGCC CAATGGCCCC ACGGGCGCGC CCACCGAACA GGAGAGGTTC GAGGGGAGCC 3780
GGGCGCTCTA CCGGGATGCG GGGGGGCTAA ATGGCGATTC ATTTCGGGTG ACCTTTTGTT 3840
TATTGGGGAC GGAAGTGGGC GTGACCCACC ACCCGAAAGG GCGCACCCGG CCCATGTTTG 3900
TGTGCCGCTT CGAGCGAGCG GACGACGTCG CCGTGCTCCA AGACGCCCTG GGCCGCGGGA 3960
CCCCATTGCT CCCGGCCCAC ATCACAGCAA CTCTGGACTT GGAGGCGACG TTTGCGCTCC 4020
ACGCTAACAT CATCATGGCT CTCACCGTGG CCATAGTCCA CAACGCCCCC GCCCGCATCG 4080
GCAGCGGCAG CACCGCTCCC CTGTATGAGC CCGGCGAATC GATGCGCTCG GTCGTCGGGC 4140
GCATGTCCCT GGGGCAGCGC GGCCTCACCA CGCTGTTCGT GCACCACAAG GCGCGCGTGC 4200
TGGCGGCGTA CCGCCGGGCG TATTATGGGA GCGCCCAAAG CCCCTTTTGG TTTCTGAGCA 4260
AATTCGGCCC GGACAAAAAG AGCCTGGTGC TGGCCGCTAG GTACTACCTA CTCCAGGCTC 4320
CGCGCTTGGG GGGCGCCGGA GCCACGTACG ATCTGCAGGC CGTGAAAGAC ATCTGCGCGA 4380
CCTACGCGAT CCCCCACGAC CCACGCCCCG ACACCCTCAG TGCCGCGTCC TTGACCTCGT 4440
TCGCCGCCAT CACTCGGTTC TGTTGCACGA GCCAGTACTC CCGCGGGGCC GCGGCCGCTG 4500
GGTTTCCGCT GTATGTGGAG CGCCGCATCG CCGCCGACGT ACGCGAGACC GGCGCGCTGG 4560
AGAAGTTCAT CGCCCACGAT CGCAGTTGCC TGCGCGTGTC CGACCGGGAA TTCATTACGT 4620
ACATCTACCT GGCCCACTTT GAGTGCTTCA GCCCCCCGCG CCTGGCCACG CATCTCCGGG 4680
CCGTGACCAC CCACGACCCC AGCCCCGCGG CCAGCACGGA GCAGCCCTCG CCCCTGGGTC 4740
GGGAGGCGGT GGAACAGTTC TTCCGGCACG TGCGCGCCCA GCTGAACATC CGCGAGTACG 4800
TAAAGCAAAA CGTCACCCCC AGGGAAACCG CCCTGGCGGG AGACGCGGCC GCCGCCTACC 4860
TGCGCGCGCG CACGTATGCC CCGGCGGCCC TCACGCCCGC CCCCGCGTAC TGCGGGGTCG 4920
CAGACTCGTC CACCAAAATG ATGGGACGTC TGGCGGAAGC AGAAAGGCTC CTAGTCCCCC 4980
ACGGCTGGCC CGCGTTCGCA CCAACAACCC CCGGGGACGA CGCGGGGGGC GG 5033
(2) INFORMATION FOR SEQ ID NO: 88:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 117 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:
Val Leu Leu Asp Ala Pro Ala Pro Thr Ala Ser Gly Arg Thr Lys Thr
1 5 10 15
Pro Ala Gln Gly Leu Ala Lys Glu Val Gln Phe Ser Thr Ala Pro Pro
20 25 30
Ser Pro Thr Ala Pro Trp Thr Pro Arg Val Ala Gly Phe Asn Lys Arg
35 40 45
Val Phe Cys Ala Ala Val Gly Arg Leu Ala Ala Thr His Ala Arg Leu
50 55 60
Ala Ala Val Gln Leu Trp Asp Met Ser Arg Pro His Thr Asp Gly Asp
65 70 75 80
Leu Asn Glu Leu Leu Asp Leu Thr Thr Ile Arg Val Thr Val Cys Glu
85 90 95
Gly Lys Asn Leu Leu Gln Arg Ala Asn Glu Leu Val Asn Pro Asp Ala
100 105 110
Ala Gln Gly Ile Leu
115
(2) INFORMATION FOR SEQ ID NO: 89:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 131 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:
Met Thr Ser Arg Arg Ser Val Lys Ser Cys Pro Arg Glu Ala Pro Arg
1 5 10 15
Gly Thr His Glu Glu Leu Tyr Tyr Gly Pro Val Ser Pro Ala Asp Pro
20 25 30
Glu Ser Pro Arg Asp Asp Phe Arg Arg Gly Ala Gly Pro Met Arg Ala
35 40 45
Arg Pro Arg Gly Glu Val Arg Phe Leu His Tyr Asp Glu Ala Gly Tyr
50 55 60
Ala Leu Tyr Arg Asp Ser Ser Ser Ser Glu Asp Asn Asp Glu Ser Arg
65 70 75 80
Asp Thr Ala Arg Pro Arg Arg Ser Ala Ser Val Ala Gly Ser His Gly
85 90 95
Pro Gly Pro Ala Arg Ala Pro Pro Pro Pro Gly Gly Pro Val Gly Ala
100 105 110
Gly Gly Arg Ser His Ala Pro Pro Ala Arg Thr Pro Lys Met Thr Arg
115 120 125
Gly Ala Pro
130
(2) INFORMATION FOR SEQ ID NO: 90:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 363 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:
Met Ser Gln Trp Gly Pro Arg Ala Ile Leu Val Gln Thr Asp Ser Thr
1 5 10 15
Asn Arg Asn Ala Asp Gly Asp Trp Gln Ala Ala Val Ala Ile Arg Gly
20 25 30
Gly Gly Val Val Gln Leu Asn Met Val Asn Lys Arg Ala Val Asp Phe
35 40 45
Thr Pro Ala Glu Cys Gly Asp Ser Glu Trp Ala Val Gly Arg Val Ser
50 55 60
Leu Gly Leu Arg Met Ala Met Pro Arg Asp Phe Cys Ala Ile Ile His
65 70 75 80
Ala Pro Ala Val Ser Gly Pro Gly Pro His Val Met Leu Gly Leu Val
85 90 95
His Ser Gly Tyr Arg Gly Thr Val Leu Ala Val Val Val Ser Pro Asn
100 105 110
Gly Thr Arg Gly Phe Ala Pro Gly Ala Leu Arg Val Asp Val Thr Phe
115 120 125
Leu Asp Ile Arg Ala Thr Pro Pro Thr Leu Thr Glu Pro Ser Ser Leu
130 135 140
His Arg Phe Pro Gln Leu Ala Pro Ser Pro Leu Ala Gly Leu Arg Glu
145 150 155 160
Asp Pro Trp Leu Asp Gly Ala Thr Ala Gly Gly Ala Val Pro Ala Arg
165 170 175
Arg Arg Gly Gly Ser Leu Val Tyr Ala Gly Glu Leu Thr Gln Val Thr
180 185 190
Thr Glu His Gly Asp Cys Val His Glu Ala Pro Ala Phe Leu Pro Lys
195 200 205
Arg Glu Glu Asp Ala Gly Phe Asp Ile Leu Ile His Arg Ala Val Thr
210 215 220
Val Pro Ala Asn Gly Ala Thr Val Ile Gln Pro Ser Leu Arg Val Leu
225 230 235 240
Arg Ala Ala Asp Gly Pro Glu Ala Cys Tyr Val Leu Gly Arg Ser Ser
245 250 255
Leu Asn Arg Leu Leu Val Met Pro Thr Arg Trp Pro Ser Gly His Ala
260 265 270
Cys Ala Phe Val Val Cys Asn Leu Thr Gly Val Pro Val Thr Leu Gln
275 280 285
Ala Gly Ser Lys Val Ala Gln Leu Leu Val Ala Gly Thr His Ala Leu
290 295 300
Pro Trp Ile Pro Pro Asp Asn Ile His Glu Asp Gly Ala Phe Arg Ala
305 310 315 320
Tyr Pro Arg Gly Val Pro Asp Ala Thr Ala Thr Pro Arg Asp Pro Pro
325 330 335
Ile Leu Val Phe Thr Asn Glu Phe Asp Ala Asp Ala Pro Pro Ser Lys
340 345 350
Arg Gly Ala Gly Gly Phe Gly Ser Thr Gly Ile
355 360
(2) INFORMATION FOR SEQ ID NO: 91:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 251 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91:
Val Gly Trp Gly Lys Ala Gly Ala Glu Pro Arg Ala Cys Ser Gly Met
1 5 10 15
Ala Ser Leu Leu Gly Val Leu Cys Gly Trp Gly Trp Glu Glu Gln Gln
20 25 30
Tyr Glu Met Ile Arg Ala Ala Asp Pro Pro Ser Glu Ala Glu Pro Arg
35 40 45
Leu Gln Glu Ala Val Val Asn Ala Leu Leu Pro Ala Pro Ile Thr Leu
50 55 60
Asp Asp Ala Leu Glu Ser Leu Asp Asp Thr Arg Arg Leu Val Lys Ala
65 70 75 80
Arg Ala Arg Thr Tyr His Ala Cys Met Val Asn Leu Glu Arg Leu Ala
85 90 95
Arg His His Pro Gly Leu Glu Gly Ser Thr Ile Asp Gly Ala Val Ala
100 105 110
Ala His Arg Asp Lys Met Arg Arg Leu Ala Asp Thr Cys Met Ala Thr
115 120 125
Ile Leu Gln Met Tyr Met Ser Val Gly Ala Ala Asp Lys Ser Ala Asp
130 135 140
Val Leu Val Ser Gln Ala Ile Arg Ser Met Ala Glu Ser Asp Val Val
145 150 155 160
Met Glu Asp Val Ala Ile Ala Glu Arg Ala Leu Gly Leu Ser Thr Ser
165 170 175
Ala Gly Gly Thr Arg Thr Ala Gly Leu Gly Ala Thr Glu Ala Pro Pro
180 185 190
Gly Pro Thr Arg Ala Gln Ala Pro Glu Val Ala Ser Val Pro Val Thr
195 200 205
His Ala Gly Asp Arg Ser Pro Val Arg Pro Gly Pro Val Pro Pro Ala
210 215 220
Asp Pro Thr Pro Asp Pro Arg His Arg Thr Ser Ala Pro Lys Arg Gln
225 230 235 240
Ala Ser Ser Thr Glu Ala Pro Leu Leu Leu Ala
245 250
(2) INFORMATION FOR SEQ ID NO: 92:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 710 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92:
Val Thr Gly Thr Asp Ala Thr Ser Gly Ala Cys Ala Leu Val Gly Pro 1 5 10 15 Gly Gly Ala Ser Val Ala Pro Ser Pro Ala Val Arg Val Pro Pro Ala 20 25 30
Arg Ala Glu Val Glu Arg Pro Arg Ala Arg Ser Ala Ile Ala Thr Ser
35 40 45
Ser Met Thr Thr Ser Leu Ser Ala Met Leu Arg Met Ala Trp Glu Thr 50 55 60
Ser Thr Ser Ala Asp Leu Ser Ala Ala Pro Thr Asp Met Tyr Ile Cys 65 70 75 80
Arg Met Val Ala Met His Val Ser Ala Arg Arg Arg Ile Leu Ser Arg 85 90 95 Cys Ala Ala Thr Ala Pro Ser Met Val Glu Pro Ser Ser Pro Gly Trp 100 105 110
Trp Arg Ala Ser Leu Ser Arg Leu Thr Met Gln Ala Trp Tyr Val Arg
115 120 125
Ala Arg Ala Arg Ala Phe Thr Arg Arg Arg Val Ser Ser Ser Asp Ser 130 135 140
Arg Ala Ser Ser Ser Val Met Gly Ala Gly Lys Ser Ala Leu Thr Thr
145 150 155 160
Ala Arg Ala Ser Cys Ser Arg Gly Ser Ala Ser Glu Gly Gly Ser Ala
165 170 175 Ala Arg Ile Ile Ser Tyr Cys Cys Ser Ser Gly Arg Val Pro Gln Pro
180 185 190
His Ser Thr Pro Ser Arg Asp Ala Ile Pro Glu His Arg Ser Ala Pro
195 200 205
Ala Phe Pro His Pro Thr Pro Ser Gly Phe Ala Gly Ala Met Gly Thr 210 215 220
Glu Asp Cys Asp His Glu Gly Arg Ser Val Ala Ala Pro Val Glu Val 225 230 235 240
Met Ala Leu Tyr Ala Thr Asp Gly Cys Val Ile Thr Ser Ser Leu Ala 245 250 255 Leu Leu Thr Asn Cys Leu Leu Gly Ala Glu Pro Leu Tyr Ile Phe Ser 260 265 270
Tyr Asp Ala Tyr Arg Pro Asp Ala Pro Asn Gly Pro Thr Gly Ala Pro 275 280 285 Thr Glu Gln Glu Arg Phe Glu Gly Ser Arg Ala Leu Tyr Arg Asp Ala
290 295 300
Gly Gln Gly Asp Ser Phe Arg Val Thr Phe Cys Leu Leu Gly Thr Glu 305 310 315 320 Val Gly Val Thr His His Pro Lys Gly Arg Trp Met Phe Val Cys Arg
325 330 335
Phe Glu Arg Ala Asp Asp Val Ala Val Leu Gln Asp Ala Leu Gly Arg
340 345 350
Gly Thr Pro Leu Leu Pro Ala His Ile Thr Ala Thr Leu Asp Leu Glu 355 360 365
Ala Thr Phe Ala Leu His Ala Asn Ile Ile Met Ala Leu Thr Val Ala
370 375 380
Ile Val His Asn Ala Pro Ala Arg Ile Gly Ser Gly Ser Thr Ala Pro 385 390 395 400 Leu Tyr Glu Pro Gly Glu Ser Met Arg Ser Val Val Gly Arg Met Ser
405 410 415
Leu Gly Gln Arg Gly Leu Thr Thr Leu Phe Val His His Lys Ala Arg
420 425 430
Val Leu Ala Ala Tyr Arg Arg Ala Tyr Tyr Gly Ser Ala Gln Ser Pro 435 440 445
Phe Trp Phe Leu Ser Lys Phe Gly Pro Asp Lys Lys Ser Leu Val Leu
450 455 460
Ala Ala Arg Tyr Tyr Leu Leu Gln Ala Pro Arg Leu Gly Gly Ala Gly 465 470 475 480 Ala Thr Tyr Asp Leu Gln Ala Val Lys Asp Ile Cys Ala Thr Tyr Ala
485 490 495
Ile Pro His Asp Pro Arg Pro Asp Thr Leu Ser Ala Ala Ser Leu Thr
500 505 510
Ser Phe Ala Ala Ile Thr Arg Phe Cys Cys Thr Ser Gln Tyr Ser Arg 515 520 525
Gly Ala Ala Ala Ala Gly Phe Pro Leu Tyr Val Glu Arg Arg Ile Ala
530 535 540
Ala Asp Val Arg Glu Thr Gly Ala Leu Glu Lys Phe Ile Ala His Asp 545 550 555 560 Arg Ser Cys Leu Arg Val Ser Asp Arg Glu Phe Ile Thr Tyr Ile Tyr
565 570 575
Leu Ala His Phe Glu Cys Phe Ser Pro Pro Arg Leu Ala Thr His Leu
580 585 590
Arg Ala Val Thr Thr His Asp Pro Ser Pro Ala Ala Ser Thr Glu Gln 595 600 605
Pro Ser Pro Leu Gly Arg Glu Ala Val Glu Gln Phe Phe Arg His Val
610 615 620
Arg Ala Gln Leu Asn Ile Arg Glu Tyr Val Lys Gln Asn Val Thr Pro 625 630 635 640
Arg Glu Thr Ala Gly Asp Ala Ala Ala Ala Tyr Leu Arg Ala Arg Thr
645 650 655
Tyr Ala Pro Ala Ala Leu Thr Pro Ala Pro Ala Tyr Cys Gly Val Ala 660 665 670
Asp Ser Ser Thr Lys Met Met Gly Arg Leu Ala Glu Ala Glu Arg Leu
675 680 685
Leu Val Pro His Gly Trp Pro Ala Phe Ala Pro Thr Thr Pro Gly Asp 690 695 700 Asp Ala Gly Gly Gly Ile 705 710
(2) INFORMATION FOR SEQ ID NO: 93:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5742 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:
AGAGGAGTAG AATAGAGAAG AGGATAGAGA GGAGTAGAGT GATCAATAAG ATGTGAAATA 60 TGAGAGAGTA GTAAAGTAGT AAAGAATTTG GGACGGAGCG TAGACAGATA GATATAGAGA 120
TATGGCCGTC TAGGAAGAAG AAATGTGTGA GAGATATAAA GTGGGTAAGA GGTCTATATG 180
AAGAGTAACG AGTAAGGGAT GGGTAGAAGA AGCCTGATGG GGAAGGTGAG AAGAAGTGTT 240
AAAGGGGATA GAATGGAGGT TAGCGAGGTG GTAGAAACAA GAAGGGGAAT AGGAAGGAAC 300
GGCCAATAGG AGAAAGAAGA GGAATGATGG AGGGAAGATG AGGTAGGAGC CATCCCGGCC 360 CACATTTACG GAAAACAGAC CAACGTGCAG GTCGCGACGG AGTTCGATAT GGAAGTAGAA 420
GTTCTCCGCG GCGCGGTCCC AAATCGGCAC CAGCAGGGAA GCATTTACAA AAGCGTACCG 480
GGTCGGAGGC CCGCCGCCCT GGTACGTGTA CGTGTACAAC CCGCACGTCT TTGGGGACGG 540
CGGCTCGCCG CCGCGACACC CTCCATACTG CCGAAGGAGG ATGGGCTGAC GGCAGGCGCG 600
GGTGAGCCGG TATTACGTCA CACGGGCCGC ATACGTTGCG TTGGGTGCGC CAAATCCGCG 660 TACCGGCGGC GCCAGCAAAA CCAGCCCCCC CGTGAGGAAC GAGCGGCCCA GGGGCTCGTG 720
ACGGACGACG CGCCGGGGCA GGTCTTGCCG CCCGGCGTCC GCGGAACCCA AGGGCCCGGG 780
ATCGTCCAAC CGGGGATAGG CATAACAATA TTGAGCCACG GGGGAACCTC CCGGGAAAAC 840
AACATCGTTG TTGGGGGGAT TAATTGGTCC GGGGACACCC GACCCGCCGC GTGTCCCCGG 900
AAGACCAGAA AGAACAAAAA GAAGAAGCAA CCTGGGAGCG ATGGCGTGCA TGACGCCGGG 960 CGGCAAGGTG CCAAAAACAC CAAAGCCCGA GGGCCGCGTC TTTTTGTGCA AAAACATCCA 1020
ACCAGCCCCC CCTCCACGCC CCCCGGGGGG AGCGGGGTCA CTTAGGGTGA AATAGCGGCA 1080
GGCGCAGCAA CTCCGCGGCG CTTGGGCGGA GCGCCGCGTC AAAGGTAAGG GCTTTGCAGA 1140
TGAGATATTC GACGTCTGTG TGGATCTTGT AGTAGCGGGT CCATGCCGGT CGGGTCCACG 1200 CCGGACGATT GTTCCCGGCC GCCCGCGAGC GGTAGTGCGC GGTGAGGCGC GATTCCGCGT 1260
GCGTTGGAAA CTCGTCGACG TGTACCTGGG CCTGTCGGAT GATGCGCGCG ATCTGGTTGT 1320
CGCACGGCCG CCTTTCGGGG TCGCGCGGGG CCGAGAACAA GGACGCGGTG TGGACGGCGG 1380
TCTCAAAGAT CACCAGGCCG GCGCTCCAGA TGTCGATTAC CTGGGTGTAC GGATCCCCGG 1440 CCAGGACCTC GGGGGCGTTT GTATCGATGG TGCCTGCGAT CCCGTAATGG AAGGGGCTCG 1500
ATCGACACCC GCGCACAAAG CACGCCGCCC CAAAGTCCCC CAGACAGATG TTCTCGGGGG 1560
TGTTGATGAG GATGTTCTCG GTCTTAATAT CGCGGTGGAT GATGCCTTCG CAGTGGACGT 1620
AGTCGATGGC GCTCAAGAGC TGCCGGGAGA CCGCGGTTAT CTGTAGGTGG CCCAACGGAG 1680
ACGGGCGCTT GCTCAGATAG GTATACAGGT CGCAGTGATA CTTGGGGAGG ACCAGACACG 1740 TGACCCCAGA AACGACGTGC AGGTCCAGGA GGGGTAGGAT CGCGGGGTGG TTCAGGCGTC 1800
TCAGCAGCCG CGCCTCGTGG TTTGTGCTGG CGTACCACCC CGCCTTGACG ATTACCCGAT 1860
GAGGGTAGTT CGGGTGGCTG CTATCAAAGA CACACCCCTC CGACCCCGGG ATGAGCGTTC 1920
CGTGGATCGC GAATCCCAGC CCCGTCACCA GTTTTGCCAG GGTCGAAGGG GGCTTGCACC 1980
CGCGGCTGAT GGCCCGAAGT GCCTCCCGGT CCATGGTGTC GAGCTCTTCC GGGGTGAACC 2040 CCGTGGCCCC CATCTTACCG CTGTCGCAGG TCCGGACGTC GGGGGGGGCT GCGCGGCCCG 2100
GAACAGGAGG ATGGCCGCTG GCTCCGGGCA GGGGGGCGGC CGAAACCATG GACAGAAAAC 2160
GCCCCTCCGC GTAGTCCTCC GGGTAGGCCA CGTCATCCGG GGCGTCATCG TCGGCCTCGT 2220
CTTCCTCCTC CGCACCCGCG GCGTCCACGA TGGGGTAGTC CTCGTCGCTG TGCATCTGGG 2280
CCAAGATCTC CTGCAGCTGA CACAGGCGCG CAGCCTCGCC GGGGGACGGT GGGCGGGAAG 2340 GGTGGATGGT TTCCGGGGGC CCGGGGGCCA GGTACGCATC CTCCGCGGGG GTATAAAAGG 2400
TGCTCGCCGG GAAGGCCGGG GCCGTGTTTG TCTCCGGCGG GACGGACGCC TCCTGTCTCT 2460
TGTCGGGTCT ACGGTAGACC CCACAGAACT TACGACAGGC CATTCGCCGC GTCGCGCGTG 2520
CCAACCAACG AGCACCCCGA GCGACGGGCC CCGGTGTTTT AAGAAGCGGC AGTTTGTCGA 2580
CACACCCCCC CACTACCCCC GCCCCCTATA TCCGGAACGT CAGATTATCC GGGATACCTA 2640 GCCAACCAAA CAAGGCTGAA AAAATCGAAC GTGCGAACGG GCCGTGTGAT AGCAAGCAGC 2700
CCCCCCGGGT CCGCGCGCCG TCCCGCCGTG CATAGGTCCG CAGACAGGCG AGTGAGTGAA 2760
GATCGGACCA CGGGCCTAAT ATACCGACAT GGGCGTTGTT GTTGTAAGTG TGGTTACCCT 2820
CCTAAACCAA CGAAACGCCC TGCCGCGGAC TTCCGCTGAC GCAAGCCCGG CTCTGTGGAG 2880
TTTTCTGCTT CGGCAATGCC GGATCCTGGC CTCCGAGCCT CTGGGAACCC CGGTGGTGGT 2940 TCGCCCGGCG AACCTTCGCA GGCTGGCCGA GCCTCTGATG GACTTGCCCA AATTCACCCG 3000
ACCGATCGTG CGAACCCGCT CCTGTCGCTG TCCCCCAAAC ACCACGACGG GCCTGTTTGC 3060
GGAGGACGAC CCCCTGGAAA GCATCGAGAT TCTGGATGCC CCTGCGTGTT TTCGGCTCCT 3120
GCATCAAGAG CGCCCCGGCC CCCACCGGCT ATACCACCTG TGGGTGGTCG GGGCGGCGGA 3180
CCTGTGTGTG CCGTTTTTAG AGTACGCACA AAAAACCCGG CTGGGGTTTC GCTTCATCGC 3240 CATGAAGACC AACGACGCGT GGGTGGGGGA ACCGTGGCCC CTGCCCGATC GGTTTTTGCC 3300
CGAGCGGACC GTGTCGTGGA CCCCGTTCCC CGCAGCGCCT AATCACCCCC TGGGAAAATC 3360
TCCTTAGCCG ATACGAATAC CAATACGGCG TGGTGGTGCC CGGCGACCGG GAACGCAGCT 3420
GTCTTCGCTG GCTACGGTCC CTCGTGGCTC CTCACAACAA ACCCCGCCCC GCATCATCCC 3480
GCCCGCATCC GGCGACCCAC CCCACGCAGC GCCCGTGTTT TACGTGCATG GGGCGACCCG 3540 AGATTCCCGA TGAGCCCTCC TGGCAGACGG GGGACGATGA CCCCCAGAAC CCCGGGCCCC 3600
CGCTGGCCGT TGGCGACGAG TGGCCTCCGT CATCCCACGT TTGCTATCCA ATCACCAACC 3660
TCTAACCCCC CCCCGATGCT AATAAAAAAC ACTGCGCCCC ATTACACGTA CGAGCGGTGT 3720
CGCGTTTGTT TCTTTTTTTG TCGTTCCTTC CTCCACCCCC AGAAAAACCA GACACTCAGA 3780 CACAAAAGCT TTCTTGTAGG GCGTTTATTT TCGTTTGGCA AACACACCGG GCTGGGGGTC 3840
CCGCGCTCAT GGCCGGAGAA ATGTGTGGCC GCAGGGATAG GGGCAGGCGG CGGGGAAGCG 3900
CATTTTTCGG CACCCGTCCT CGCGTTCTGG GGCTTCCGTT GCGCGACACG CGCCCGGGGC 3960
GGGTAGGCCA GAGCGTTCGG CGGAGCTGGT ATCGGCGACC ACCGCGGACA GCCAGGGCTG 4020 GGAGCCCTCC TCGGACGTCC AGTCAAACGT CCCAAACTCA CACTCCAGCC GAGCCGCGAT 4080
AGCCCCGCCG GACGCGGATT CCGGGTTCTC CCGGCCGGCC GGGGAGGGCC CCCCCTCCGT 4140
GTCGGAATGG GACGCGAGGG TATCGTCCGA GTCCGTCGCA TCCGATATCT CATCATCGCT 4200
CGTCGAGCCG CCGTTGGTCT CGAGATTGCC AACATCACAC TCGGGGCCGT AACGCCTGGC 4260
GTTCAGACAG GGCAGCTCGC ACACGGGCTC GGCGGCGGGT TCAAAACGCG CCTCGACCTC 4320 CCGGATTGCG TTTCGCAGGC GCACGTCCCA GGTTCCGCCC GAGATCTGCA GCAGGCGGCC 4380
CCACGTGCGC GGCCCCAGGC GGGTCCGGCA GTAGCCCATA AGGTAGCAGT CTCGCACCAG 4440
GTGGCGCAGG CGGTTGGCGC TGCCGGGCGG GTTCGGGGCG GCACGCAGGA TCCGAAAGAG 4500
CTGGTTGACG CACTGGCGTA TGTAATGCAG ATCAAGCGTC CAGGCGGCTC CGCTCCGCAT 4560
CTGGGCCTGG CGAGCAGAGC GCCGCGGTGC GGCTGCGCGA TCCCCGGAAG ACTGGGCCGG 4620 GGCCTGGGGT TGCGCCGCGC GGATAGGTCT GTCGTTTCTC CACACCTCGG GGAAAACCAC 4680
ACCCGCGCGC CGGTCGGGGG AGCTCGTTAA TCGCAGGTTG ATTCGGGGTC GCTTGGATTT 4740
ACGGGGGGGG GTGTCAACCA ACCAGCCGTC TGACGCATCA TCATCATCGT CGTCCGACGG 4800
CCCCGTTCCC TCCGATTCCG TCCCCGTGGT CGATTCTGCC GACAGATCCA AAAAATACCT 4860
CCCCCCGAGC TCCCGCGGGG AGCGACGGCG CCCGCCGCGT AGGTCTCCCG CCTCATCCTC 4920 GGACGACTCG GTCGAGGAGG ATTCCGATTC TGTGTCGGGC TTACCCTCAG ATTCCGACGA 4980
GCTGGGGAGG ACGGGGCGTC TGCGCTTCCG TGAACCCGGG GGTGGGGATG GGGGAGCATG 5040
ATTCGCAGGC GTCGTGTTGA CCGCGGGCGG GTCCGGGGGG ATGTCTGCCA TAGTGGCGAC 5100
GCCTTGTCGC CAGTTACCAC ACCGGTGTCC CGTCCACGAA GGCGGCGCCC GGCCTGCGAT 5160
AAAGCGCGGA TGTTGGGATC GGGGCCCCCC CCCCCCGTCT CCCTTTTCCC CTCTCTTCTC 5220 TCTCCCTCCT CCCTCCCCCC CTCCTCTGTC TCTCTCCCCT TTTTCCCTCA CCTCCCCCTC 5280
TTCTCCTCCC TCCTCCTCCT TCCCTTTCCC CTCCCCTCCC TCTCTCTTCC TCCTTCTCCT 5340
CCCCTCTTTC TTCTTCTTCC CTCTCTCCCT TTCCCCCACT CTCATTCTTC CCACTTCGCT 5400
CCCTTTTCTC TCTCTTCCCT TCTTTTCCTT TTCCCTACCC TCTCCCTTCT TCTTCCGTCT 5460
CCCCTCCCCT TCTCTCCTCT CTCTCTCCTC GTCTTTGTAT CCACGCTACC TCCTCTTCAT 5520 CTCATCTCTT TCTTCCTCTC TTCTCCTCCC TCTCCCCTCT TTTATCTCCC CATTCCTTAC 5580
TCTCTCCTTA TCTCTACCTT TATCTCAACA GCTCTCTCAC GCTCCTCCCA TGGCCATCTC 5640
CTCTCCTTCC TCTCCCCTCC TCTCACATCT CATCCTCTCT TCTCCTTCCA TCTCATCTCC 5700
CATCTCCTGC CCCCACGCTT CTCCCCCTCT CTCTTCTCAC T 5742
(2) INFORMATION FOR SEQ ID NO: 94:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 507 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:
Val Gly Gly Cys Val Asp Lys Leu Pro Leu Leu Lys Thr Pro Gly Pro 1 5 10 15
Val Arg Ala Arg Trp Leu Ala Arg Ala Thr Arg Arg Met Ala Cys Arg
20 25 30
Lys Phe Cys Gly Val Tyr Arg Arg Pro Asp Lys Arg Gln Glu Ala Ser 35 40 45 Val Pro Pro Glu Thr Asn Thr Ala Pro Ala Phe Pro Ala Ser Thr Phe 50 55 60
Tyr Thr Pro Ala Glu Asp Ala Tyr Leu Ala Pro Gly Pro Pro Glu Thr 65 70 75 80
Ile His Pro Ser Arg Pro Pro Ser Pro Gly Glu Ala Ala Arg Leu Cys 85 90 95
Gln Leu Gln Glu Ile Leu Ala Gln Met His Ser Asp Glu Asp Tyr Pro
100 105 110
Ile Val Asp Ala Ala Gly Ala Glu Glu Glu Asp Glu Ala Asp Asp Asp 115 120 125 Ala Pro Asp Asp Val Ala Tyr Pro Glu Asp Tyr Ala Glu Gly Arg Phe 130 135 140
Leu Ser Met Val Ser Ala Ala Pro Leu Pro Gly Ala Ser Gly His Pro 145 150 155 160
Pro Val Pro Gly Arg Ala Ala Pro Pro Asp Val Arg Thr Cys Asp Ser 165 170 175
Gly Lys Met Gly Ala Thr Gly Phe Thr Pro Glu Glu Leu Asp Thr Met
180 185 190
Asp Arg Glu Ala Leu Arg Ala Ile Ser Arg Gly Cys Lys Pro Pro Ser 195 200 205 Thr Leu Ala Lys Leu Val Thr Gly Leu Gly Phe Ala Ile His Gly Thr 210 215 220
Leu Ile Pro Gly Ser Glu Gly Cys Val Phe Asp Ser Ser His Pro Asn 225 230 235 240
Tyr Pro His Arg Val Ile Val Lys Ala Gly Trp Tyr Ala Ser Thr Asn 245 250 255
His Glu Ala Arg Leu Leu Arg Arg Leu Asn His Pro Ala Ile Leu Pro
260 265 270
Leu Leu Asp Leu His Val Val Ser Gly Val Thr Cys Leu Val Leu Pro 275 280 285 Lys Tyr His Cys Asp Leu Tyr Thr Tyr Leu Ser Lys Arg Pro Ser Pro 290 295 300
Leu Gly His Leu Gln Ile Thr Ala Val Ser Arg Gln Leu Leu Ser Ala 305 310 315 320 Ile Asp Tyr Val His Cys Glu Gly Ile Ile His Arg Asp Ile Lys Thr 325 330 335
Glu Asn Ile Leu Ile Asn Thr Pro Glu Asn Ile Cys Leu Gly Asp Phe
340 345 350
Gly Ala Ala Cys Phe Val Arg Gly Cys Arg Ser Ser Pro Phe His Tyr 355 360 365
Gly Ile Ala Gly Thr Ile Asp Thr Asn Ala Pro Glu Val Leu Ala Gly 370 375 380
Asp Pro Tyr Thr Gln Val Ile Asp Ile Trp Ser Ala Gly Leu Val Ile
385 390 395 400
Phe Glu Thr Ala Val His Thr Ala Ser Leu Phe Ser Ala Pro Arg Asp 405 410 415
Pro Glu Arg Arg Pro Cys Asp Asn Gln Ile Ala Arg Ile Ile Arg Gln 420 425 430
Ala Gln Val His Val Asp Glu Phe Pro Thr His Ala Glu Ser Arg Leu
435 440 445
Thr Ala His Tyr Arg Ser Arg Ala Ala Gly Asn Asn Arg Pro Ala Trp 450 455 460
Trp Ala Trp Thr Arg Tyr Tyr Lys Ile His Thr Asp Val Glu Tyr Leu
465 470 475 480
Ile Cys Lys Ala Leu Thr Phe Asp Ala Ala Leu Arg Pro Ser Ala Ala 485 490 495
Glu Leu Leu Arg Leu Pro Leu Phe His Pro Lys 500 505
(2) INFORMATION FOR SEQ ID NO: 95:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 188 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:
Met Gly Val Val Val Val Ser Val Val Thr Leu Leu Asn Gln Arg Asn 1 5 10 15 Ala Leu Pro Arg Thr Ser Ala Asp Asp Ala Leu Trp Ser Phe Leu Leu 20 25 30
Arg Gln Cys Arg Ile Leu Ala Ser Glu Pro Leu Gly Thr Pro Val Val 35 40 45 Val Arg Pro Ala Asn Leu Arg Arg Leu Ala Glu Pro Leu Met Asp Leu
50 55 60
Pro Lys Phe Trp Ile Val Arg Thr Arg Ser Cys Arg Cys Pro Pro Asn 65 70 75 80 Thr Thr Thr Gly Leu Phe Ala Glu Asp Asp Pro Leu Glu Ser Ile Glu
85 90 95
Ile Leu Asp Ala Pro Ala Cys Phe Arg Leu Leu His Gln Glu Arg Pro
100 105 110
Gly Pro His Arg Leu Tyr His Leu Trp Val Val Gly Ala Ala Asp Leu 115 120 125
Cys Val Pro Phe Leu Glu Tyr Ala Gln Lys Thr Arg Leu Gly Phe Arg
130 135 140
Phe Ile Ala Met Lys Thr Asn Asp Ala Trp Val Gly Glu Pro Trp Pro 145 150 155 160 Leu Pro Asp Arg Phe Leu Pro Glu Arg Thr Val Ser Trp Thr Pro Phe
165 170 175
Pro Ala Ala Pro Asn His Pro Leu Gly Lys Ser Pro 180 185
(2) INFORMATION FOR SEQ ID NO: 96:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 45 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:
Met Gly Arg Pro Glu Ile Pro Asp Glu Pro Ser Trp Gln Thr Gly Asp
1 5 10 15
Asp Asp Pro Gln Asn Pro Gly Pro Pro Leu Ala Val Gly Asp Glu Trp 20 25 30
Pro Pro Ser Ser His Val Cys Tyr Pro Ile Thr Asn Leu 35 40 45
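The 45-residue peptide of SEQ ID NO: 96 appears to be encoded within SEQ ID NO: 93: the open reading frame that starts at the ATG at positions 3528-3530 and ends at the TAA stop at positions 3663-3665 translates to the same residues, with each Gln coming from a CAG codon. The sketch below checks this under the assumption of the standard genetic code (NCBI translation table 1); the helper names `translate` and `listing_to_one_letter` are hypothetical, and the DNA is copied block by block from the SEQ ID NO: 93 listing.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in TCAG order so they line up with the 64-letter string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC three-letter residue codes mapped to one-letter codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate an in-frame ORF, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def listing_to_one_letter(text: str) -> str:
    """Convert listing residues to one-letter codes, skipping position numbers."""
    return "".join(THREE_TO_ONE[tok] for tok in text.split() if not tok.isdigit())


# SEQ ID NO: 93, positions 3528-3665, copied in the listing's 10-base blocks.
ORF_93 = "".join([
    "ATG",  # last three bases of the block ending at position 3530
    "GGGCGACCCG", "AGATTCCCGA", "TGAGCCCTCC", "TGGCAGACGG", "GGGACGATGA",
    "CCCCCAGAAC", "CCCGGGCCCC", "CGCTGGCCGT", "TGGCGACGAG", "TGGCCTCCGT",
    "CATCCCACGT", "TTGCTATCCA", "ATCACCAACC",
    "TCTAA",  # first five bases of the block starting at 3661; TAA is the stop
])

# SEQ ID NO: 96 as listed (residues plus position numbers).
SEQ_96 = """
Met Gly Arg Pro Glu Ile Pro Asp Glu Pro Ser Trp Gln Thr Gly Asp 1 5 10 15
Asp Asp Pro Gln Asn Pro Gly Pro Pro Leu Ala Val Gly Asp Glu Trp 20 25 30
Pro Pro Ser Ser His Val Cys Tyr Pro Ile Thr Asn Leu 35 40 45
"""

peptide = translate(ORF_93)
print(len(peptide))                              # 45
print(peptide == listing_to_one_letter(SEQ_96))  # True
```

Converting to one-letter codes only keeps the comparison compact; the same check could be repeated for other peptide entries once the coordinates of their reading frames are known.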
(2) INFORMATION FOR SEQ ID NO: 97:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 515 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97:
Val Gly Arg Met Arg Val Gly Glu Arg Glu Arg Gly Lys Lys Lys Lys 1 5 10 15 Glu Gly Arg Arg Arg Arg Lys Arg Glu Gly Gly Glu Gly Lys Gly Lys 20 25 30
Glu Glu Glu Gly Gly Glu Glu Gly Glu Val Arg Glu Lys Gly Glu Arg
35 40 45
Asp Arg Gly Gly Gly Glu Gly Gly Gly Arg Glu Lys Arg Gly Glu Lys 50 55 60
Gly Asp Gly Gly Gly Gly Pro Arg Ser Gln His Pro Arg Phe Ile Ala 65 70 75 80
Gly Arg Ala Pro Pro Ser Trp Thr Gly His Arg Cys Gly Asn Trp Arg 85 90 95 Gln Gly Val Ala Thr Met Ala Asp Ile Pro Pro Asp Pro Pro Ala Val 100 105 110
Asn Thr Thr Pro Ala Asn His Ala Pro Pro Ser Pro Pro Pro Gly Ser
115 120 125
Arg Lys Arg Arg Arg Pro Val Leu Pro Ser Ser Ser Glu Ser Glu Gly 130 135 140
Lys Pro Asp Thr Glu Ser Glu Ser Ser Ser Thr Glu Ser Ser Glu Asp
145 150 155 160
Glu Ala Gly Asp Leu Arg Gly Gly Arg Arg Arg Ser Pro Arg Glu Leu
165 170 175 Gly Gly Arg Tyr Phe Leu Asp Leu Ser Ala Glu Ser Thr Thr Gly Thr
180 185 190
Glu Ser Glu Gly Thr Gly Pro Ser Asp Asp Asp Asp Asp Asp Ala Ser
195 200 205
Asp Gly Trp Leu Val Asp Thr Pro Pro Arg Lys Ser Lys Arg Pro Arg 210 215 220
Ile Asn Leu Arg Leu Thr Ser Ser Pro Asp Arg Arg Ala Gly Val Val 225 230 235 240
Phe Pro Glu Val Trp Arg Asn Asp Arg Pro Ile Arg Ala Ala Gln Pro 245 250 255 Gln Ala Pro Ala Gln Ser Ser Gly Asp Arg Ala Ala Ala Pro Arg Arg 260 265 270
Ser Ala Arg Gln Ala Gln Met Arg Ser Gly Ala Ala Trp Thr Leu Asp 275 280 285 Leu His Tyr Ile Arg Gln Cys Val Asn Gln Leu Phe Arg Ile Leu Arg
290 295 300
Ala Ala Pro Asn Pro Pro Gly Ser Ala Asn Arg Leu Arg His Leu Val 305 310 315 320 Arg Asp Cys Tyr Leu Met Gly Tyr Cys Arg Thr Arg Leu Gly Pro Arg
325 330 335
Thr Trp Gly Arg Leu Leu Gln Ile Ser Gly Gly Thr Trp Asp Val Arg
340 345 350
Leu Arg Asn Ala Ile Arg Glu Val Glu Ala Arg Phe Glu Pro Ala Ala 355 360 365
Glu Pro Val Cys Glu Leu Pro Cys Leu Asn Ala Arg Arg Tyr Gly Pro
370 375 380
Glu Cys Asp Val Gly Asn Leu Glu Thr Asn Gly Gly Ser Thr Ser Asp 385 390 395 400 Asp Glu Ile Ser Asp Ala Thr Asp Ser Asp Asp Thr Leu Ala Ser His
405 410 415
Ser Asp Thr Glu Gly Gly Pro Ser Pro Ala Gly Arg Glu Asn Pro Glu
420 425 430
Ser Ala Ser Gly Gly Ala Ile Ala Ala Arg Leu Glu Cys Glu Phe Gly 435 440 445
Thr Phe Asp Trp Thr Ser Glu Glu Gly Ser Gln Pro Trp Leu Ser Ala
450 455 460
Val Val Ala Asp Thr Ser Ser Ala Glu Arg Ser Gly Leu Pro Ala Pro 465 470 475 480 Gly Ala Cys Arg Ala Thr Glu Ala Pro Glu Arg Glu Asp Gly Cys Arg
485 490 495
Lys Met Arg Phe Pro Ala Ala Cys Pro Tyr Pro Cys Gly His Thr Phe
500 505 510
Leu Arg Pro 515
(2) INFORMATION FOR SEQ ID NO: 98:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 6328 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:
TGCGGTCGAC TCTAGAAGAC CCTGTGCACG GGACTCGGTT GGGCGACGTC TGCGGTCTAN 60 TGGTCGGCGT GGGGACCGGC TGTGTGGTGG GTGGGGGAAG CACGTTGGAC TACAACCGAC 120
CACACGACAT GCAGGCTTCT GAGCCGCGAA TCCCCACCAT GGCGTTGGGG GCTGGGCATG 180
CCCACGCATG CAGGGATGAC GGAGATGATA GCGTGATTGA CGCCCCGCCC CCATACGAAA 240
TTGTGGCCGG CGCGAGCGCG GGCCANTTTG TCGTTATTGA TATCGACACC CCCACGGACT 300 CGCCTCCACC GTACTCTGCA GGGACGTCTC CCGTTGGGCT TGTTTCACCG GCTTCTTCCG 360
GTGACGGCGA GGTGTGTGAG CGTGGCCGCT CGCGCCGCGC CGCCTGGCGG GCCGCTCGGC 420
GCGCCAGGCG CCGCGCCGAA CGACGGGCGC GGCGCCGGAG CTTTGGCCCA GGGGGGTTGT 480
TTGTGGAGAC CCCCCTGTTT CTACCGGAAA CTATGATTGG GGCCCACCCT GGCGTGGGAG 540
GCGACCTCCC GTCGGGCCTC CCTACTTACG CAAAGGCGAC CTCGGATCGC CCCCCCACCT 600 ACGCCATGGT CATGGCCGCA TGTCCGACCG AGCCACCGGG CGGGTCCGTG GGGCCGGCCG 660
ACCAACCCCG CGTGCAAAGC TCGCGCACGT GGCGACCCCC GCTCGTCAAT TCGCGAGAGC 720
TGTACCGGGC CCAACGCGCG GCCCGCTGTG CGTCAAGCTC CGACACGCCC CAAGCCCCAG 780
GGTGGTGTGG CGGGACGTGT CGTCATGCGG TTTTTGGGGT GGTCGCGGTG GTCGTCGTTA 840
TCATCTTGGC CTTCCTTTGG CGGTAAGCTT CCCCCCTCCC GCGATACAAC GAATAAAAGT 900 CGCGTTAACA CACACGCTGG TTCGTCGCGT GGTATTTACC GGGTTCCTAT AACCCACAAA 960
CTCACACCGC GTCTGTTTTG GTTGGTTCTC ACTCTTTATT AATGAGGTTG CA ACGGACT 1020
CGGAGGGAAG GGGGTGGGTT ATACCTTGAT TTTGATTTTG ATTTTGTGGC GTCGCTGTTC 1080
TGCCGCGCGA GCGGCCGTGC CGCTTGAGCT TATAAAGCGA AGGTGTTGTA GGGCCGCGGA 1140
TGCCCCGAGC CAGCAGGTTT TGGAGAACGG ATACCGACAG TGACAGTGGT ACACGATACC 1200 GTTTATCGTG TATTCCCCCG CGGGGGATGC ACCGGAGACG TCCTTGATGG TTCCGCAGCT 1260
GAATGGGCCC GTGCACCGCA GCCGCCCCCC ACGCTTATCC TCTAGTTCGC GCAAGACGGG 1320
CGGGTGGTTG ACCAGGTCCG CGAAGTTGCG GAGCTCGGTA ACCAGCGGAA GGGAGTGCTG 1380
CGAGTCCTTG TACACCGCAA AGAAAAAACA GTGGATGGGG CCGCTGGTCG AGCAGGAGGC 1440
GCGGACTAAA TAACTCCGCG CGGCCAGGCA CGCGGGGTCC GTTTTGGTCG TGTGCAGGGC 1500 GTTTTCGGCC TGCCACGTGG CATTCAGACA GTACGGGGGG GCGACGTGGG TGATGTCCGG 1560
GGCCCGTAAA AACAGGTTCG AGAGGGGCGT CGTTGTCATG TTGCGAGGGG GGGGGTCGGT 1620
GAATACGCGT CTGCCGTGAG TTCCTGGACG CGCTATGAAG CGGGCCGGGC CGGGGCCCCA 1680
CATTTATCCG GTGGGTCATC GCCCTCCTCC CACGCGCACG CCGGCATCGC CCCGGAGTCT 1740
CCGCCCCACC CGCCGCGCGC GCCAAGAACA TCACACGGAA CCACTTGGGT TGACGTCAAT 1800 ATGTTTATTC TTGCCTAAAA TAGGGAGTTG CAGTAGAAGT ATTTGCCGTG CACATATAAG 1860
GGGGCGATAG TGTGACTGGC CGTCAGCTCG CACACGCGAC TGGAACACTC CTGGCGGTGC 1920
GTGTCCAGTA TTTCAATGAG ACCCGCCATG CAGGCCCCCG GGATGTAAAA GTGCATCGTC 1980
TCGCCGGCCC CAACCCCCAC GGTCGTGTAG TCGATCTCCG ACACGCCGCG CTCGACGCGG 2040
TTGGCGAGGC GGGCCAGGAT GACCAACACA AAGGAGGCAA TATCCTTAAT GTCCGACAGG 2100 CGTCGCCGCG AGCACAGGTC GTCCAGCCCG CACAGGCCTC GGGCCTTCAG GTAGCACTGC 2160
AGAAAGGGGC GCAGGCGCGT GGCGAGGTTT TCCAGCACGG CGGCCGCCGT TCCGATGATA 2220
GGGTCCTGGG GGCGGAGCGG CAGATTGTGG TGAATGCACA TCTTGCACCA CGCCAGCGTC 2280
TCATCCGCGG ACGCCAGGGC CTCGATGAGA TTTTCCTGGC GCAGCACGCA GTCGCGCATG 2340
GCCTTGGCTG TCGACGCGGC CCGCGGGTTG GCTGCGAATG TGCGGTAGAG GCTCGGGCCG 2400 TGAGCGACCA GGGTTTCCCA GGAAACCCGA CGGGTCTCGG CGTCAAACCC CCCCGCTTGG 2460
GTGGCCAGCA CGGGAGCCCA GGGGCTGTTC GCGGCGGGAA ACGGCATCCC GCCAAAGGGG 2520
TCTTGCATGA CCAGGGCACT GCGTCCAAAG CTTTCGCTGA TGCGCTCGAC CGCCGCGCGC 2580
TCGGATATGG ATCGCAGAAC CGCCCGAACG GCGGGGTCGA TGGTGTCGGC AGAGGGCGCC 2640 TTTCGCTCCG GGACCGGGGC GCGGCCGTCC GCGTGCGGGG GGGTCAGGGA CAGCGCCATC 2700
AGCGGAGGGG GGGCCTGGCG CGTGCCTCGT GGCCGCGGGG CCCCCGGAGA CGTCGAGCTG 2760
CTCCCCTCCG CGCCGCGCCT CGCCGTGGGT GGCGCCGGGG CCGTCCGTCC GCGCCGACGC 2820
GGGGTGGCAA CCCCCTTGGT TGTGGGCGTT TCTGGAGACG CGCCGGCGGG GGTTTGGTGT 2880 GGAGTCGGCG CCGCCGGGGC CGTATTGACC CCGGCCCCGG CGGCGACCTC GCCGCCCGCC 2940
TCGGGGATGC GGTGCCTTGG TCGACGGGGG TTGGATGCGG GCCACCTTCC CCCCGTGCGG 3000
TTCCCGGGGG GAAGCCGACC GCCTGGTCCC GAGGCGCGAC CACACGCCGG TGGTCGCGGG 3060
TGGCGGATCG TCGGCTCCCC GCCGCGCTGC CGGGCGAGGC GTCAAGGCTT CGGGGGTGCC 3120
GGCGTCCTCG GGGCGGGCCG GGGGACCTTT GGGAATCGCC GCGTAGATGG CCTCCGCCCC 3180 TCCGTCTCCG CAGGGGTCTT CCATGTCCTC GTCCGACAAG GAACACTCCC CGCTGCTGTC 3240
GGACTCGGGG TCGTCGCGGC GGCCCTCCTC GTCCCGCTCC AGAGCGTCCT CCTCGAGCTC 3300
GCTGTCGGAC AGGTCCAATC CTAGGTCGAT TAGCATATCA ATGTCGGTAG CCATGTTGTA 3360
GGTCGCCGGG GCTGGGATGG CGGGTGTCCT CCGAGGGGGC GCGTGTCGGA AGAGAGTGGC 3420
CGGGTCCGAA TCGAGGAGCG GCACCGACGC GCAACCGGGG TCGGCACACG GCAGCACACA 3480 GCGCCCAGTG GGCCGGTGGC TGCCCTTATA CCCGCACGAC CGGGGCCGGC TTTCCGAAAC 3540
TCCTCCTTGT CCCTCCCCGT CGGGCGTCAC CGCCCCCGCC CCCGCCGTCC CCAGAAACCA 3600
ATCGGACGCC GAGGGTGGGT TTTATGTATT TAATTAGCAT ACGGCAGGTC TGGGTCCGCC 3660
TTCGCGTACA CGCGTAGGCG GGGGTGCGGA AGCACGCGGT AGGGTGGGGT GTATGCGGAA 3720
GTCGGACGAG CCTGCCTGTG CTGGACCGGG GGAGGGGCAA GCAGACCCGA GGCCGGATCG 3780 GCTCTGTGCA CGATTTTAAT TTGCATGCGA CGTGCGAGGG TGCGTAGGCC CGAGGCGGGT 3840
CGTGTATTTA ATTTGCATGG GCGACTGGGT CCGCCTCTTC CAACGGAAGA GGCGTTACGT 3900
CACAGATCAA ACAGGCGCCG CTGAATCTCC TGTTCGTAGC GAAGCGCCAT AAGCACCACC 3960
CCGGCCACGA CGGCGATATA GCACAGGCGC ACGGCGATAC CGGAGAGGAT GATGGAACAG 4020
CAGCGCCCGC AGACGCCCGA CCACCCTTTG GAGCGCCCCC TGGGGGCCGC TGGTTCCGCG 4080 TTTTTGGGGG CCGAGTCCCG CCGCAGGATG AAATACAGCT CCGTCAGGGC GATGATGGAC 4140
ACGAAACACC AGGTGGTGAT TGTTAGAAAC AGGGGGTATG TGATCGCGCA GGCGCCCCGG 4200
GAGATGAGAG CGGTGCCGAC GATGAGACCG AGGGCCACGA AGCGGAGCAG CAGCTCGCAG 4260
CCCACGATGA CGCCAACGGC CGGGCGGTGG TACAAGAAGG TGACCGGATC CGCCTCGAAC 4320
AGCTGCACCA GGGTCTGGCG TTGAACGGAT AGCTCGCAGA GGAGGCGGGT GATTTTCGTG 4380 TAGGGGTATT GCAAGAACAC GCTCGACACT ATGCGGCCGG CGTAGTTCAA AAGATAGGTC 4440
GCCGGGGCCA CCATGCTGTG CGCGGGACTC ACGACGCCGA ACATGCATCG TCGTTGGTGA 4500
AGGGCGACGA ACGCTAGATA CAGAAACCAA CCGACGACCA CCAGGCGCAT CTGGGTGTCC 4560
CAGAGGGCCT CCAAGCAGTT TACGGCCTCG TGCACGTTCA TGACCCGGCG GCTCATGGCG 4620
CCGGGGATGG CCGGGAGGGA CACGGCCCGA CCTTCGATGA TATTGGCGTA GCAGACGTGG 4680 GCGTGGGGGG TCCATGCCCC GCCGGGGGGG GCGGTCGGCG GGCCCAGAAA CAACAGCGTC 4740
TGGTTTATCT TCATCCACAC GAGGGCGGTA TCGTTGTGTG CCCCGGCGGG GCGCACCGCG 4800
TAAATACATC GGTGGAGCGG ACTGGCACCA AAGACGATGT ACCACGCGAG CACGAGGCCG 4860
TAGGCCGTTA TGAAGATGAC GGTCGTGAGG TGCTGGAGGG AGCGGACCGC GAGCATGGCG 4920
TGCCCGCATC GACGGTAAAC AGCGTGTGCA GGCGGTTGTT ATCGCATTTA GTGGCAAAGC 4980 ACTGCTGACA CAGGGACGCG CATAGGCGGT TGTTGGTCCC GACGCTCAGC GCGAGAAAGG 5040
TCCGGGCCGT CGCGCGACTT GCCCTGCCGT GCTTGAAGCG CAGACACGAG AGGCTTTGGT 5100
TCAGGGCGCC GCGGCCGGGG ATCAGCTGCA GCAGGACCCA GTCGTCCTTA ATGACGGCCC 5160
GGCGAACGGA CACGGCCTGA TATTCCCGCG CGTGCTCGGG AAAGTGGGCC TCGATGCACG 5220 CGACTATTCG CCCCAGCAGT TCTGACGCGA ACCCCTCCAC GGTCTCCCCG GCGTCGGGCC 5280
GCACATCGTA GCGGCCCAGA ACCTCCGTCA GGGTCTCGCG TCGCCCAAAG TGCTCCAGGG 5340
CGTTGCGCGA CGCCTTCTTC TCGAAAAAGC TGACATAGTC CCCGCCCAGG CTGTGGAGGA 5400
CGCGGATCTC CCGCGGGGCC GCGGAAAACA TGGGCGGGGC GTGAAAGTGG AAGCGCCGCG 5460 GGTCGGCGTG CGCGGCGACG AACGCCGGAA CGTCCTCGCA CGCGGGGGGG ATCACGAAGA 5520
CGGGCAATAA CCGGCCGCAC GCGGAGCCGT CGGGGCCGAT TTTGGCGAAA TACGGCAAGC 5580
GCAGGCTGTG GCCGTGGGCG TACACGCCCG TATCGATCAG CAAAAAGTTC TTTACGTGGC 5640
TCCCTACGGC CTCCACGAAG TTGCGGTCCA ACAGCACCGC CTGCTGGATC ACCCTCGCCA 5700
CCCCACGCAT TGTCAGGGAG CCGTGCACAA CGTACGGGGC GGGGACCGGT AGGCACACGC 5760 GCAGCCCGAT CTTGTCGGCG CAGGAACACA CGACGGAGTC CGGTTCGCTG GGCGTCGCCG 5820
CTGGTATCTG TTCGTGTAGC AGGTCGAGGT ACGCGGCCTC GTCGTCCGGG AGGGGGCCGT 5880
GGGTCGTGTC CATGGGGTCC GTGTCCTCCT CCCACTCCTC GTCGCCGTCG TCGCCACCGG 5940
CGTCGGGGAA CCAGTCCCCG TCGCCGTCGT CGCCACCGGC CGAGGGCCCG TCGCCCGCAC 6000
AGACGGGCGG CGCGCGGGGC CGACAGGCGC TTTTGAAAAA ATAACAGGGA TAGGCGTCGG 6060 GGTTTACGCG GGCCGCGGGA AACAACAGNT GAACCGCCGC CAGCGCCCCG CGCATAAAGT 6120
GACCCAGGGC CTCGTGGAGC CGGGGAAAGG GGACGGGCTC CTTCAGGGCG ATGTCCAGAT 6180
CCAGGATGAT GTTCGTAACG GCCAGCGCGG CGTTGAAGAT CTCGTTGCGG TTCACGTACA 6240
TATGGCCCCC GACGGCCAGG CCGCCCGGGG GGAGCGCGGG CCCCGGCGCA AAAGGGCGGT 6300
GACCGGGGAC TTCTAGATCG AATGCAG 6328
(2) INFORMATION FOR SEQ ID NO: 99:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 86 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:
Met Val Met Ala Ala Cys Pro Thr Glu Pro Pro Gly Gly Ser Val Gly 1 5 10 15 Pro Ala Asp Gln Pro Arg Val Gln Ser Ser Arg Thr Trp Arg Pro Pro 20 25 30
Leu Val Asn Ser Arg Glu Leu Tyr Arg Ala Gln Arg Ala Ala Arg Cys
35 40 45
Ala Ser Ser Ser Asp Thr Pro Gln Ala Pro Gly Trp Cys Gly Gly Thr 50 55 60
Cys Arg His Ala Val Phe Gly Val Val Ala Val Val Val Val Ile Ile 65 70 75 80
Leu Ala Phe Leu Trp Arg 85
(2) INFORMATION FOR SEQ ID NO: 100:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 212 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:
Met Trp Gly Pro Gly Pro Ala Arg Phe Ile Ala Arg Pro Gly Thr His 1 5 10 15
Gly Arg Arg Val Phe Thr Asp Pro Pro Pro Arg Asn Met Thr Thr Thr
20 25 30
Pro Leu Ser Asn Leu Phe Leu Arg Ala Pro Asp Ile Thr His Val Ala 35 40 45
Pro Pro Tyr Cys Leu Asn Ala Thr Trp Gln Ala Glu Asn Ala Leu His
50 55 60
Thr Thr Lys Thr Asp Pro Ala Cys Leu Ala Ala Arg Ser Tyr Leu Val 65 70 75 80 Arg Ala Ser Cys Ser Thr Ser Gly Pro Ile His Cys Phe Phe Phe Ala
85 90 95
Val Tyr Lys Asp Ser Gln His Ser Leu Pro Leu Val Thr Glu Leu Arg
100 105 110
Asn Phe Ala Asp Leu Val Asn His Pro Pro Val Leu Arg Glu Leu Glu 115 120 125
Asp Lys Arg Gly Gly Arg Leu Arg Cys Thr Gly Pro Phe Ser Cys Gly
130 135 140
Thr Ile Lys Asp Val Ser Gly Asp Ala Gly Glu Tyr Thr Ile Asn Gly 145 150 155 160 Ile Val Tyr His Cys His Cys Arg Tyr Pro Phe Ser Lys Thr Cys Trp
165 170 175
Leu Gly Ala Ser Ala Ala Leu Gln His Leu Arg Phe Ile Ser Ser Ser
180 185 190
Gly Thr Ala Ala Arg Ala Ala Glu Gln Arg Arg His Lys Ile Lys Ile 195 200 205
Lys Ile Lys Val 210
(2) INFORMATION FOR SEQ ID NO: 101:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 286 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:
Met Ala Leu Ser Leu Thr Pro Pro His Ala Asp Gly Arg Ala Pro Val 1 5 10 15 Pro Glu Arg Lys Ala Pro Ser Ala Asp Thr Ile Asp Pro Ala Val Arg 20 25 30
Ala Val Leu Arg Ser Ile Ser Ala Ala Val Glu Arg Ile Ser Glu Ser
35 40 45
Phe Gly Arg Ser Ala Leu Val Met Gln Asp Pro Phe Gly Gly Met Pro 50 55 60
Phe Pro Ala Ala Asn Ser Pro Trp Ala Pro Val Leu Ala Thr Gln Ala 65 70 75 80
Gly Gly Phe Asp Ala Glu Thr Arg Arg Val Ser Trp Glu Thr Leu Val 85 90 95 Ala His Gly Pro Ser Leu Tyr Arg Thr Phe Ala Ala Asn Pro Arg Ala 100 105 110
Ala Ser Thr Ala Lys Ala Met Arg Asp Cys Val Leu Arg Gln Glu Asn
115 120 125
Leu Ile Glu Ala Ser Ala Asp Glu Thr Leu Ala Trp Cys Lys Met Cys 130 135 140
Ile His His Asn Leu Pro Leu Arg Pro Gln Asp Pro Ile Ile Gly Thr
145 150 155 160
Ala Ala Ala Val Leu Glu Asn Leu Ala Thr Arg Leu Arg Pro Phe Leu
165 170 175 Gln Cys Tyr Leu Lys Arg Leu Cys Gly Leu Asp Asp Leu Cys Ser Arg
180 185 190
Arg Arg Leu Ser Asp Ile Lys Asp Ile Ala Ser Phe Val Leu Val Ile
195 200 205
Leu Ala Arg Leu Ala Asn Arg Val Glu Arg Gly Val Ser Glu Ile Asp 210 215 220
Tyr Thr Thr Val Gly Val Gly Ala Gly Glu Thr Met His Phe Tyr Ile 225 230 235 240
Pro Gly Ala Cys Met Ala Gly Leu Ile Glu Ile Leu Asp Thr Gln Glu 245 250 255
Cys Ser Ser Arg Val Cys Glu Leu Thr Ala Ser His Thr Ile Ala Pro
260 265 270
Leu Tyr Val His Gly Lys Tyr Phe Tyr Cys Asn Ser Leu Phe 275 280 285
(2) INFORMATION FOR SEQ ID NO: 102:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 332 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:
Met Leu Ala Val Arg Ser Leu Gln His Leu Thr Thr Val Ile Phe Ile 1 5 10 15
Thr Ala Tyr Gly Leu Val Leu Ala Trp Tyr Ile Val Phe Gly Asp Leu
20 25 30
His Arg Cys Ile Tyr Ala Val Arg Pro Ala Gly Ala His Asn Asp Thr 35 40 45 Ala Leu Val Trp Met Lys Ile Asn Gln Thr Leu Leu Phe Leu Gly Pro 50 55 60
Pro Thr Ala Pro Pro Gly Gly Ala Trp Thr Pro His Ala His Val Cys 65 70 75 80
Tyr Ala Asn Ile Ile Glu Gly Arg Ala Val Ser Leu Pro Ala Ile Pro 85 90 95
Gly Ala Met Ser Arg Arg Val Met Asn Val His Glu Ala Val Asn Cys
100 105 110
Leu Glu Ala Leu Trp Asp Thr Gln Met Arg Leu Val Val Val Gly Trp 115 120 125 Phe Leu Tyr Leu Ala Phe Val His Gln Arg Arg Cys Met Phe Gly Val 130 135 140
Val Ser Pro Ala His Ser Met Val Ala Pro Ala Thr Tyr Leu Leu Asn 145 150 155 160
Tyr Ala Gly Arg Ile Val Ser Ser Val Phe Leu Gln Tyr Pro Tyr Thr 165 170 175
Lys Ile Thr Arg Leu Leu Cys Glu Leu Ser Val Gln Arg Gln Thr Leu
180 185 190
Val Gln Leu Phe Glu Ala Asp Pro Val Thr Phe Leu Tyr His Arg Pro 195 200 205
Ala Val Gly Val Ile Val Gly Cys Glu Leu Leu Leu Arg Phe Val Gly 210 215 220
Leu Ile Val Gly Thr Ala Leu Ile Ser Arg Gly Ala Cys Ala Ile Thr
225 230 235 240
Tyr Pro Leu Phe Leu Thr Ile Thr Thr Trp Cys Phe Val Ser Ile Ile 245 250 255
Ala Leu Thr Glu Leu Tyr Phe Ile Leu Arg Arg Asp Ser Ala Pro Lys 260 265 270
Asn Ala Glu Pro Ala Ala Pro Arg Gly Arg Ser Lys Gly Trp Ser Gly
275 280 285
Val Cys Gly Arg Cys Cys Ser Ile Ile Leu Ser Gly Ile Ala Val Arg
290 295 300
Leu Cys Tyr Ile Ala Val Val Ala Gly Val Val Leu Met Ala Leu Arg
305 310 315 320
Tyr Glu Gln Glu Ile Gln Arg Arg Leu Phe Asp Leu 325 330
(2) INFORMATION FOR SEQ ID NO: 103:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 482 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:
Ala Ala Phe Asp Leu Glu Val Pro Gly His Arg Pro Phe Ala Pro Gly
1 5 10 15
Pro Ala Leu Pro Pro Gly Gly Leu Ala Val Gly Gly His Met Tyr Val 20 25 30 Asn Arg Asn Glu Ile Phe Asn Ala Ala Val Thr Asn Ile Ile Leu Asp 35 40 45
Leu Asp Ile Ala Leu Lys Glu Pro Val Pro Phe Pro Arg Leu His Glu
50 55 60
Ala Leu Gly His Phe Met Arg Gly Ala Ala Val Xaa Leu Leu Phe Pro 65 70 75 80
Ala Ala Arg Val Asn Pro Asp Ala Tyr Pro Cys Tyr Phe Phe Lys Ser
85 90 95
Ala Cys Arg Pro Arg Ala Pro Pro Val Cys Ala Gly Asp Gly Pro Ser 100 105 110
Ala Gly Gly Asp Asp Gly Asp Gly Asp Trp Phe Pro Asp Ala Gly Gly
115 120 125
Asp Asp Gly Asp Glu Glu Trp Glu Glu Asp Thr Asp Pro Met Asp Thr 130 135 140
Thr His Gly Pro Leu Pro Asp Asp Glu Ala Ala Tyr Leu Asp Leu Leu
145 150 155 160
His Glu Gln Ile Pro Ala Ala Thr Pro Ser Glu Pro Asp Ser Val Val 165 170 175
Cys Ser Cys Ala Asp Lys Ile Gly Leu Arg Val Cys Leu Pro Val Pro 180 185 190
Ala Pro Tyr Val Val His Gly Ser Leu Thr Met Arg Gly Val Ala Arg 195 200 205
Val Ile Gln Gln Ala Val Leu Leu Asp Arg Asn Phe Val Glu Ala Val 210 215 220
Gly Ser His Val Lys Asn Phe Leu Leu Ile Asp Thr Gly Val Tyr Ala
225 230 235 240
His Gly His Ser Leu Arg Leu Pro Tyr Phe Ala Lys Ile Gly Pro Asp 245 250 255
Gly Ser Ala Cys Gly Arg Leu Leu Pro Val Phe Val Ile Pro Pro Ala 260 265 270
Cys Glu Asp Val Pro Ala Phe Val Ala Ala His Ala Asp Pro Arg Arg 275 280 285
Phe His Phe His Ala Pro Pro Met Phe Ser Ala Ala Pro Arg Glu Ile 290 295 300
Arg Val Leu His Ser Leu Gly Gly Asp Tyr Val Ser Phe Phe Glu Lys
305 310 315 320
Lys Ala Ser Arg Asn Ala Leu Glu His Phe Gly Arg Arg Glu Thr Leu 325 330 335
Thr Glu Val Leu Gly Arg Tyr Asp Val Arg Pro Asp Ala Gly Glu Thr
340 345 350
Val Glu Gly Phe Ala Ser Glu Leu Leu Gly Arg Ile Val Ala Cys Ile
355 360 365
Glu Ala His Phe Pro Glu His Ala Arg Glu Tyr Gln Ala Val Ser Val
370 375 380
Arg Arg Ala Val Ile Lys Asp Asp Trp Val Leu Leu Gln Leu Ile Pro
385 390 395 400
Gly Arg Gly Ala Leu Asn Gln Ser Leu Ser Cys Leu Arg Phe Lys His
405 410 415
Gly Arg Ala Ser Arg Ala Thr Ala Arg Thr Phe Leu Ala Leu Ser Val
420 425 430
Gly Thr Asn Asn Arg Leu Cys Ala Ser Leu Cys Gln Gln Cys Phe Ala
435 440 445
Thr Lys Cys Asp Asn Asn Arg Leu His Thr Leu Phe Thr Val Asp Ala
450 455 460
Gly Thr Pro Cys Ser Arg Ser Ala Pro Ser Ser Thr Ser Arg Pro Ser
465 470 475 480
Ser Ser
(2) INFORMATION FOR SEQ ID NO: 104:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10212 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:
GAAAGTGGAG GGAAGGGGGG GAGGAGTGGG AGTGGTTAAG GAGGTGGGAG GAGAGAGAGA 60
AGGGAGGGGA GGTGGTGGAG AGTGTGGAGT AAGGAAGGAG AAAGGAGGAG AAGGGAGGGA 120
GTTTGATTGT AGAGGAAAAG AGGTGGGAGA GGAGAGGGGG ATGATGGATG GAGGGTATGG 180
GAGTGGAGGG GGGGGGTGGA GGTGGAGTGG GGGTGGGAGT TAGGGGGTGG GGGGGTGGGG 240
GNTGTGGTGG GTTGGTGGTG GGGGGGTGGT GTTGGGTGGG NTGGGGTGGG TGGGTGGGGG 300
TTTTTTTTTG TTTTTTTTTG TTTTGTTNTT NNTTTTNNNT NNNNNTNNTN NNNNNTTTTT 360
TGGCGCCGGA CCTCACGGAC CCGCTGCTGT TTGCGTACGT CGGATTCCAG GTCGTGAACC 420
ACGGGCTGAT GTTTGTGGTC CCCGACATCG CCGTATACGC GATGCTGGGG GGCGCCGTGT 480
GGATCTCGCT GACGCAGGTG CTTGGGCTCC GGCGCCGCCT TCACAAGGAC CCAGACGCCG 540
GGCCCTGGGC GGCCGCGACC CTGCGGGGCC TCTTTTTCTC CGTCTACGCA TTGGGGTTTG 600
CGGCGGGGGT GCTGGTGCGG CCGCGGATGG CGGCGAGCCG GCGGTCGGGG TGATCGCCAT 660
TTCAAATAAA AGGCACGAGT TCCCCGAATA CCACCGGCGT GTGATGATTT CGCCCTACCG 720
CTCCGATCCC CGGGGGGAGG GGGGAAGGAA ATGGGGGCGG GGGTGCCGTG GACGGGTATA 780
AAGGCCAGGG GGGCAGGCGG GCCCATCACT GTTAGGGTGT TAGGTTGGGA GGTGGCACAA 840
AAAGCGACAC ACCCGTGTTG TAGTTGTCCG CGGGAGGCGG TGGTTTCCGG CAACCCTCCT 900
CGCTGCGCCG GGCGCGCCCA CCGGTCCTTC GCGGGGGCCG GGGCTCTTCT GGTCATGGCC 960
CTTGGACGGG TGGGCCTAGC CGTGGGCCTG TGGGGCCTGC TGTGGGTGGG TGTGGTCGTG 1020
GTGCTGGCCA ATGCCTCCCC CGGACGCACG ATAACGGTGG GCCCGCGGGG GAAAGAGAGC 1080
AATGCCGCCC CCTCCGCGTC CCCGCGGAAC GCATCCGCCC CCCGAACCAC ACCCACGCCC 1140
CCCCAACCCC GCAAGGCGAC AAAAAGTAAG GCCTCCACCG CCAAACCGGC CCCGCCCCCC 1200
AAGACCGGGC CCCCGAAGAC ATCCTCGGAG CCCGTGCGAT GCAACCGCCA CAACCCGCTG 1260
GCCCGGTACG GCTTGCGGGT GCAAATCCGA TGCCGGTTTC CCAACTCCAC CCGCACGGAG 1320
TCCCGCCTCC AGATCTGGCG TTATGCCACG GCGACGGACG CCGAGATCGG AACGGCGCCT 1380
AGCTTAGAGG AGGTGATGGT AAACGTGTCG GCCCCGCCCG GGGGCCAACT GGTGTATGAC 1440
AGCGCCCCCA ACCGAACGGA CCCGCACGTG ATCTGGGCGG AGGGCGCCGG CCCGGGCGCC 1500
AGCCCGCGGC TGTACTCGGT CGTCGGGCCG CTGGGTCGGC AGCGGCTCAT CATCGAAGAG 1560
CTGACCCTGG AGACCCAGGG CATGTACTAC TGGGTGTGGG GCCGGACGGA CCGCCCGTCC 1620
GCGTACGGGA CCTGGGTGCG CGTTCGCGTG TTCCGCCCTC CGTCGCTGAC CATCCACCCC 1680
CACGCGGTGC TGGAGGGCCA GCCGTTTAAG GCGACGTGCA CGGCCGCCAC CTACTACCCG 1740
GGCAACCGCG CGGAGTTCGT CTGGTTCGAG GACGGTCGCC GGGTATTCGA TCCGGCCCAG 1800
ATACACACGC AGACGCAGGA GAACCCCGAC GGCTTTTCCA CCGTCTCCAC CGTGACCTCC 1860
GCGGCCGTCG GCGGCCAGGG CCCCCCGCGC ACCTTCACCT GCCAGCTGAC GTGGCACCGC 1920
GACTCCGTGT CGTTCTCTCG GCGCAACGCC AGCGGCACGG CATCGGTGCT GCCGCGGCCA 1980
ACCATTACCA TGGAGTTTAC GGGCGACCAT GCGGTCTGCA CGGCCGGCTG TGTGCCCGAG 2040
GGGGTGACGT TTGCCTGGTT CCTGGGGGAC GACTCCTCGC CGGCGGAGAA GGTGGCCGTC 2100
GCGTCCCAGA CATCGTGCGG GCGCCCCGGC ACCGCCACGA TCCGCTCCAC CCTGCCGGTC 2160
TCGTACGAGC AGACCGAGTA CATCTGCCGG CTGGCGGGAT ACCCGGACGG AATTCCGGTC 2220
CTAGAGCACC ACGGCAGCCA CCAGCCCCCG CCGCGGGACC CCACCGAGCG GCAGGTGATC 2280
CGGGCGGTGG AGGGGGCGGG GATCGGAGTG GCTGTCCTTG TCGCGGTGGT TCTGGCCGGG 2340
ACCGCGGTAG TGTACCTCAC CCACGCCTCC TCGGTGCGCT ATCGTCGGCT GCGGTAACTC 2400
CGGGGCCGGG CCCGGCCGCC GGTTGTCTTC TTTTCCACCC CTTCCGTCCC CCGTACCCAC 2460
CACACCCCAC CCCACCCCCC CGCCGTCCCC CGGGCGTTAT AAGCCGCCGC ACTCGCTTTT 2520
CCCACCGGAA AATCCTCGGC CCGATCCGAA CGGCGCACGC CGCGTGGGCT CCAAACGCCT 2580
CCGCGAAGAG AGCGCCCCGC CCCGATATTC AAGCCCGCGG TGGTGCTATG GCTTTCCGTG 2640
CTTCGGGACC CGCCTACCAG CCCCTCGCCC CCGCGGCCTC CCCGGCGCGG GCTCGTGTTC 2700
CGGCCGTGGC CTGGATCGGC GTCGGAGCGA TCGTCGGGGC CTTTGCGCTC GTCGCCGCGT 2760
TGGTTCTCGT ACCCCCTCGG TCCTCGTGGG GACTCTCGCC GTGCGACAGC GGCTGGCAGG 2820
AATTCAACGC GGGATGCGTC GCGTGGGACC CCACCCCCGT CGAGCACGAG CAGGCGGTCG 2880
GCGGCTGCAG CGCGCCGGCC ACCCTTATCC CCCGTGCGGC CGCCAAGCAC CTGGCCGCTC 2940
TGACACGCGT CCAGGCGGAG AGATCGTCGG GTTACTGGTG GGTGAACGGA GACGGCATCC 3000
GGACCTGTCT GAGACTCGTC GACAGCGTCA GTGGCATCGA CGAGTTTTGC GAGGAGCTCG 3060
CGATCCGCAT ATGCTACTAC CCACGAAGCC CCGGCGGGTT TGTCCGCTTC GTAACTTCAA 3120
TACGTAACGC CCTGGGGTTG CCGTGAGGCG CGCGTCCGAC GGTCCCGCTT CTCGCCTCTC 3180
TTCTTCCCCC TCCCCACCCC ACCCACCGAC CAACGACGGC GTTTGGCCAA TACCCTCCTT 3240
TTTTCTTTTT CTCTTCCCCC CCCAAAAAAA AAAACAATAA ACAGCTAATT GCGTACGACA 3300
AACCATGCGG AACTCGCTGT TTTTTTTCCC CTGTTTGTTA CTTTTTATTG AAAACAGACA 3360
TACGGGGAAA GGGGCCGGAA ACCGAGACGG TGGGGCCGGC GGTCGCATTT TTTTAATGGC 3420
TCTGGTGTCG GCCGCGTTTG AGCTTCGTCA ACAGGGCGCT GAGGGCGGCG ACGTTTGTCG 3480
GGCCGTCGTT GGCCAGCGCG TTGGTCCGGG GGCGGGCGGG CATGGGCGAC AGGCTTAGTC 3540
CCGGGTCCGG GGCGCGTGTG GCCCCCGGAG GGGAGAAGAG GGCAGACCCG CCCCAGTCGT 3600
ACAGGGGATT TTCCGCCTCG ATGTACGGGG AGTCCGGGGC GTCTCCCGGC GGGGCCGCCC 3660
CGCCGGCGTC TTGCCGGCGA AGGCAGATGT TTTCGTATAC CCGAACCCAG GGGATCTCCT 3720
CGTAGACGCG CCCCCCATCC TCGCTCACCG ACTCGTAAAT GGAATCTGCC TCCTCGGAGG 3780
GGGCGCGGGG GGCGTGGCTT TCGGCCGGCC AGGCGGCGGC GGTGGTGTCG GCGGCGGGGG 3840
TGGAGCCAAG CCCGACGCCC GCGGGCATGG CGGCGTCATC CTCCGGCAGC AGATACGTGT 3900
TTTCCATCTG GTCCGGTTCG GCCTCCGCGT CTGGCCCCCA GGTCCGCACT GCGTTGTAAA 3960
CCCCGGCGGC CTCGCGATGA GCCGCGAGCG GGCGCGCCGC GGCTGCCGGC CGCTGCTCGG 4020
GGGGCGCGGG GTTGCGGGGC GGGAGGCGCG GGGGCGCCCC GGCCATATGC GTGTAATAAG 4080
TGGCCGGCCG GCCGGCGCAG GGCTCGGGAC CCCGGTCGGC CGCGTCAACG TGCGGGGGCT 4140
CGGGGAGGTC CTCGCGGTGG CGCCTGAACC TCCGAGGGGC CGCGGGGGTC AATTGGGGGC 4200
CACCCCGGGG GAGCGGCGGG GGTGCGTTAT CGCGCCGGGT CCGTTGTATC TTGTCCCGGC 4260
AGCTCCCGCC GACCGCGCCG CGGCCCCCCG GTGGGCCGGA CGCCGCGAGG CGCAGGATGG 4320
ACTCGTAGTG GGGCGACGGG GTTCCGCTCC GAAGCAGGTC CGGGGCCAGG GCGGCCCCTA 4380
ACCACTACTT GATGCTGAGT TCCATCCGGG CCCAGCTCGG GGCGGTCATC GTGGGGAACA 4440
GGGGGGCGGC GGTCCTGCAG AAGCGCTCCT GGCTGTCCAC CGCCGCCCGT AGGTACTCGT 4500
TGTTCAGGCT GTCGGAGGCC CAAACAACAT ACCCGGTAAG CGTCGCGTTA ATTATATACT 4560
GGGCGTGGTG GTGGACTATG GATAGAACCT CGACGGTCGA GACGATGGCG TCCACGATCC 4620
CGTACGTGCC GCCGCTGCGC TTGCCGGTCT CCCACAGGTG GGCCAGGCGC GTCAGGTGGC 4680
CCAGGACGTC GCTGACCGCC GCCCGCAGGG CCATGCACTG CATCGAGCCC GTGGTGCCGC 4740
TGGGCCCGCG GTCCAGGTGG CGCGCAAACG TCTCCGCGGG CGCCTCCAGA CTCCCGCTGA 4800
GCGCCACGAA CCGGCGATCG GCGGGGCCCA GGCGGCGACA CACGTACTTG TCCGCCGTCC 4860
ACAGCATCCA CGAGGCCCAA TGGTACAACA CGGAGACGTA GGCCAGGAGC TCGCTCAGCC 4920
GCAGTGCGGT GTCCGTGCTC GGCCGGCTCG GGTCTGCGGG GCGCATAAAA AACATGTACT 4980
GCTGGAGCCT GTGGGCCGCG TCGCGCAACC CCGCCACCGC GGCGGCGTAC TTGGCCGCGG 5040
CGGCCCCGCT CTTGAACGGG GCGCGCACCA CCAGCTTCGG GAGCAGGGTG GGCCGCATCA 5100
ACACGTGCAG GCTGGGGTCG CANTCGCCCG CCGGGTCGTC GGGGATGTCC AGGCCGCTGG 5160
GCACAACCGT CTGGAGGTAC TTCCAGTACT GCGCTAGGAT GGCGCGGCTC AGCTGGCCGC 5220
CCGACAGCTC CACCTCGCCG AGCGCCTGCT TGGCGGCCGA CGCGTAGTGC CGGATGTAGT 5280
CGTAGTGCGG GTCGCTGGCG AGCCCGTCTA CGATCAGGCT CTCGGGGACG GTGTTATGGT 5340
GCCGCGCCGC CAGCCGGACG CTGCGATCGG CGCCGGTCAG AAACGCCGGC TGCAGGTCGT 5400
CGGCGCGCTG CCGCAGGACG CCCACGGCCG CGCTGAGGAG CCCCTCCGGG GTGGGGAGCA 5460
GACACCCGGC GAAGATGCGC CGCTCGGGGA CGCCCGCGTT GGCGCCGCGG ATGAGGTTGG 5520
CCGGCGTCAG GCACCGCGCC AGCCGCAGGG AGCTCGCGCC GCGCGCCCGG CGTTGCATGG 5580
CGGAGACCGT TCGGTCGGGG GCCCGCCGGT CGGAGGTATG CCGCGTCCCG GGATATAGGG 5640
TTGCTTTTTA TGGGGAGGCG CCTATGGGCG TGGCGGGCCG CCCAGCCCGG TCGCGCGCCT 5700
CCCGGACACG TGCGCCCGGA GGGCGGCGGT CTCCTCGTCG CCCATGAGCA GTTTCCGAAA 5760
CTGCGCCATG ATGTCCACGA CGCGGACCCG CGGCCCCAGC ACGGACTCGC TATTCAGGGG 5820
GGCGGGGGGG AAGGCCGCCA GGTCTTCGAG CAGGAAGGCG GGGTCTGCCG TCCCGCTCAC 5880
GGGCGCCCGG GGCGCCGAGG ACGCGGGGCG AAGGTCCACG TGTTCCGCGG CGGCGCGCAC 5940
GTCCGCCCAA AATTTGGCGG GGGTGGTCCG CGCGTACAGG GGCTGGGTCG CGCGGAGGAC 6000
GCACGCGTAG CGCAGGGGGG TGTACGTGCC CACCTCGGGG GCCGTCGACC CGCCGTCAAA 6060
CGCGGCCAGG GCCACGCACG CGACCACCGT GTCGGCCAGG CCCAGCAGCC GCTGCAGGAT 6120
GAGCCCCGTC GCCAGCACGG CGCGCGCGGC CGCCGCGTTG TCCCTGCGCC GGCGCGCGTC 6180
CCCGCAGGCC AGGGCGTATT TCAGGGTAAC GGTCGCCAGG GCCGTGTGCA GCGCGTACAC 6240
GGCCGCGCCC AGCACGGCGT TCAGCCCGCT GGTGGCGAGC AGGCGGCGCG CCGCGGTGTC 6300
GCCCAGCGCC TCGTGCTCGG CCGCCACGAC CCCGGGGCTA CCCAGGGGCA GGGCGCGAAA 6360
CAGCGCCTCC TGCTCCACGT CCGCAAACGC GGGGTGGGCG GAGTGCGGGT GCAGGCGCGC 6420
CCCCACGACC ACCGAGAGCC ACTGGACCGT CTGCTCCGCC AGGACCGCCA GCACGTCCAG 6480
GACGCGCCCC GCAAACGCGG CCTCCCGCGG GAGCACGCAT TTGACGGCGC CGGGGTTGAA 6540
GCGGGCGAGC AGAGCCCCGG TGGCGATGTA CGTCATGCGC CCCGCGTAGC GGGCGGCCAC 6600
GCGACAGTCG CGCCCCAGGA GCGCGCGCAC CCCGGGCCAG TACAGCAGGG ACCCCAGCGA 6660
ACTGCGAAAG ACCGCGGCGT CGGGGCCGGG GTGGGGGGGC GCGGCCCCTC CCGCGCTGAG 6720
CAGCGGCACG GCGGCGGCCC CCACGGGCCG CAACGCCGTG AGGCTCGCGA ACTGCCGTCG 6780
GAGCTCGGCC GCCCTGTCGT CGAGCTCCGA GCCGCGCCCC TCCGTGTGCA GGCGCGTCCC 6840
GCAGACCCAC CCGTTGATCG CCACCCGCAC GATGGCGTCC ACCAGAAAGC CCATCGCGCG 6900
GGAGGGGCTG GTTTTTGCCC GCCGATCCGT CAGGTCGAGG ATCGCGTCGC CCGTGACGTA 6960
CCAGGCCAGC GCCTCGCCCT GCTGCAGCGT CTGGCGGAAA AACACCTTTG GGTCGGCCGG 7020
GGAGGCAAAG TGCATGACCC CCACGCGCGA CAGCCCGAAC GCGCTATCCG GACACGGGTA 7080
GAACCCGGCC GGATGTCCCA GGGCCAGGGC CGAGCGCACG GACTCGTCCC ACGCGGCGAC 7140
TCGGGGGGTC AGGCGGTCCA GGGGGAATGC CGCCTGCAGC TCCGGGCCCG ACACGCGGCC 7200
CTCTATAATC TCGACCGTCG CGGGAGGCCG CGCCCCGGCG CCGTCATCGT GCGCGACGGC 7260
GGCGGGGTAG TCGTCCTCCT CGTAGCTGAG CTCGTCCAGG AACAGCGGCG AGGGCACCAC 7320
CCGCGAACCG CCCACCCGCC CCAAAACGTC GCGTGGGTCC ATCGGGCCCA GGTAGCCTCC 7380
CCGCGGGGCC CGCGTGATGG CGCTGTCCCG GCGTCCGCGA ACGGACTGGC TCCTGGCCGT 7440
AACGGACCTG GGGCGCGGAA AGGACGCCCG GCGGGGGGGC GCCGCCGCCC GGGCCTCGGA 7500
CGCGCGTCGG GACCCGGGGT GACCGCGGGC CTCCCGGCGA CGGCGCGGGG GCGGCTCTTC 7560
GCTCGCCATC TCCCCCGCGG CCTCGACCTC GCTGTCGTCG TCCACGTTAA ACACCGCCCG 7620
CAGGTACCCC ATTAACCCGA CTCCACCGCC CTCGGGCTCG TCCTCCACGG GCGATTCGGC 7680
GCGATGCGCG GACGGGGCAT GGGACCGGGT GGAGGCGCGC CTCCGGCGTA CGGCATGCCC 7740
GCGCACGGAC ATGGTGGCCG GAGGCCCGAT TTTTTACACA CCCCCTCCCC GCAAACGGAC 7800
AAGGAAAGGG GTGGTGCGAG GGGGGAGGCC CAAACGGGGA GGTGGGGGGT AGGGGGCGGT 7860
CCCAGGGAGC GGGGGGTACG AACCGGCACG ACGGGAACAG AGAAACGCGA CCGCTCCAAC 7920
AAGGGTGGGG GGTGGGCCTC ATCCCCACGC AAACCCGCGG GCAAATGCGA GAACGGGACC 7980
CGCGCGCCTG CCTTTATACG CGGACCCCAG CACCACGAGC CGTTCTGTGA CCCGAATCTA 8040
CACGACCGCG GGCTCGTAGG CGCGACTAAC GCCCAACCCA ACGGCACACA CCCCCCACCC 8100
CGCGCGTAAC CCCATTTCTT TCATGGTCCC GTAATAAACA GCCAACGCAC GCCGCGTATG 8160
ATGAGTTGCT TGCCAATGTT TATTGCTGTG GTTGCGAACC CTCTATCGCG ATACAGACGG 8220
AGGTGAGGCG GGGCGGTGGT GGGGGGGGGG GCGCCCCCCC CGGTCGCACA TCCTACCCCC 8280
CAAAGTCGTC AATGCCCATG GCATCGGTAA ACATCTGTTC AAACTCAAAA TCGTCCACGT 8340
CCAAAGCCCC ATACAAAACG GGGTCGTGGG TCATTCCCGG GGAGGGGGAC TCCACGTCCC 8400
CCAGCATCTC CAAGTCGAAG TCGTCCAGGG CGTCGGCGGG CGTCATATCC ACCTCCTCGC 8460
CGTCCAGGCG GAGTTCGTCT CCCAGGCTGA CGTCGGTAAT GGGGGCGGTG GTGGACAGTC 8520
TGCGGGGGCG TTGTCCCGCG GAGAGAAACG ACATGCGCGG CGCCACCAGC CCGGCCTCCG 8580
CGGGAGCGTC ATCGTCGTCC GGGAGGTCGA GCAGGCCCTC GATTGTCGAT CCGTAATTAT 8640
TTCTGGTCCG CCCGCGGCTA TACGCGTGCT CCCGCATGAC GGACTCGCCC TCCGAGGTCG 8700
CAACGCTGGA GTACGAGTCC AACTTGGCCC GGATCAGCAG CATAAAGTAC CCAGAGGAGC 8760
GGGCCTGGTT GCCCTGCAGG ACGGGCGGGG TCGTGAGGGG CGCCCCGGGT TCCTCCGCCG 8820
CCGCACTTCG CACCAGCGGG AGGTTCAGGT GCTCGCGAAT GTGGTTTAGC TCCCGCAGTC 8880
GCCGGGCCTC CACGGGAACT CCCCGCACGG TGAGCGATCC GTTGATAAAC ATCAGGGGCT 8940
GAAACAGACA CGCCAACTGG CGCCAGCTCT CCAGGTCGCA GCAGAGGCCG TCGAACAGAT 9000
CGGGCCGCAT CATCTGCTCG GCGTACGCGG CCCATAGGAT CTCGCGGCTC AGAAAGAGGT 9060
ATAGATGCAG AAACAGGACG CGCGCCAGGC GCGCGGTCTC GCGGTAGTAC CTGTCCGCGA 9120
TCGTGGTGCG CAGCATCTCC CGCAGGTCGC GGTTGCGGCC CCGCATGTGT GCCTGGCGGT 9180
GTAGCTGCCG AACGCTGGCG CGCAGGTACC GGTACAGGGC CGAGCAAAAA TTTGCCAACA 9240
CGGTCCGGTA GCTCTCCTCC CGCGCCCGCA GCTCACCGCG GAAAAACTGC GCCATGGCCT 9300
CGTAGTACGA AGGCAGCTCG TCGCGGGTGG CGGGCAGGGT GGGGAACGCC ACGTCGCCGT 9360
GGGCGCGAAT GTCGATCGGG GAGCGCTCGG GGACGTGCGC ATCCCCCCAG TCGATCACGT 9420
CGCTGGGCAG CGTCGACAGA AACTTGCACT CCCGGTACAT GTCGGCGTTG GTCGGGAACC 9480
CAGAGAACAG GTCCTCGTTC CAGGTATCTA GCATGGTACA CAGCGCGGGA CCCGCGCTGA 9540
AGCCCAGATC GTCGAGGAGA CGGTTAAACA GGGCCGCGGG GGGGACGGGC ATGGGCGGCG 9600
AGGGCATCAG CTGGGCCTGA CTCAGCCGAC CGGTGGCGTA CAGCGGAGGG GCGGCTGGGG 9660
TGTTCTTGGG ACCCCCGGCT GGCCTGGGGG GCGGTGGCGA AACCCCGTCC GCGTCCGCAA 9720
ACAGATTGTT GACCAACAGG TCCATGGGGG CGGTTGGGTC CGGGGATAAC GATTTTGAGA 9780
GGCGAATGAG AAGTGCCCGA GCGCCCGGCG GCGGAGAGGG GGGGAGGGAT CCGGGACCCG 9840
CGACAGAAAA AGGCCGGGGC CCTTGCGAAG GGAATTGCCG GGGGTGCCGT GCGTCCCCGA 9900
TGACTGACAT CTCTCTTCCT CCCCCCCGCA TTTTTAGTAT CACCCCAATT GCCGCCCCAA 9960
AACCTTCTTG ACTTCCCCCA CCCGTTTCCG TGGCGGCCCC TTCCCCCCTG CTCCTCTGTA 10020
ACGGGATGGT CTTATTCCCT CCTTCCCCTG GCCCTTCCCC CTCCTCTCTT CCTTTTTCCT 10080
TCCCCTTCTT CCGTCACTCC TTCCTCCCCT CTCTCGATTC CTCCCTTCTT CCCCATCTCT 10140
TCCTTCTCCT CTCACTCTCA TATCCTTCAA TACTCTCCTC CTCTCTATCT TTCCCCCCGC 10200
TTCTTCTCTC T 10212
(2) INFORMATION FOR SEQ ID NO: 105:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 148 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:
Val Gly Val Gly Val Arg Gly Trp Gly Gly Gly Xaa Cys Gly Gly Leu
1 5 10 15
Val Val Gly Gly Trp Cys Trp Val Xaa Trp Gly Gly Trp Val Gly Val
20 25 30
Phe Phe Cys Phe Phe Leu Phe Cys Xaa Xaa Phe Xaa Xaa Xaa Xaa Xaa
35 40 45
Xaa Xaa Phe Leu Ala Pro Asp Leu Thr Asp Pro Leu Leu Phe Ala Tyr
50 55 60
Val Gly Phe Gln Val Val Asn His Gly Leu Met Phe Val Val Pro Asp
65 70 75 80
Ile Ala Val Tyr Ala Met Leu Gly Gly Ala Val Trp Ile Ser Leu Thr
85 90 95
Gln Val Leu Gly Leu Arg Arg Arg Leu His Lys Asp Pro Asp Ala Gly
100 105 110
Pro Trp Ala Ala Ala Thr Leu Arg Gly Leu Phe Phe Ser Val Tyr Ala
115 120 125
Leu Gly Phe Ala Ala Gly Val Leu Val Arg Pro Arg Met Ala Ala Ser
130 135 140
Arg Arg Ser Gly
145
(2) INFORMATION FOR SEQ ID NO: 106:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 538 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:
Met Gly Ala Gly Val Pro Trp Thr Gly Ile Lys Arg Ala Gly Gly Pro
1 5 10 15
Ile Thr Val Arg Val Leu Gly Trp Glu Val Ala Gln Lys Ala Thr His
20 25 30
Pro Cys Cys Ser Cys Pro Arg Glu Ala Val Val Ser Gly Asn Pro Pro
35 40 45
Arg Cys Ala Gly Arg Ala His Arg Ser Phe Ala Gly Ala Gly Ala Leu
50 55 60
Leu Val Met Ala Leu Gly Arg Val Gly Leu Ala Val Gly Leu Trp Gly
65 70 75 80
Leu Leu Trp Val Gly Val Val Val Val Leu Ala Asn Asp Gly Arg Thr
85 90 95
Ile Thr Val Gly Pro Arg Gly Lys Glu Ser Asn Ala Ala Pro Ser Asp
100 105 110
Arg Asn Ala Ser Ala Pro Arg Thr Thr Pro Thr Pro Pro Gln Pro Arg
115 120 125
Lys Ala Thr Lys Ser Lys Ala Ser Thr Ala Lys Pro Ala Pro Pro Pro
130 135 140
Lys Thr Gly Pro Pro Lys Thr Ser Ser Glu Pro Val Arg Cys Asn Arg
145 150 155 160
His Asn Pro Leu Ala Arg Tyr Gly Leu Arg Val Gln Ile Arg Cys Arg
165 170 175
Phe Pro Asn Ser Thr Arg Thr Glu Ser Arg Leu Gln Ile Trp Arg Tyr
180 185 190
Ala Thr Ala Thr Asp Ala Glu Ile Gly Thr Ala Pro Ser Leu Glu Glu
195 200 205
Val Met Val Asn Val Ser Ala Pro Pro Gly Gly Gln Leu Val Tyr Asp
210 215 220
Ser Ala Pro Asn Arg Thr Asp Pro His Val Ile Trp Ala Glu Gly Ala
225 230 235 240
Gly Pro Gly Asp Arg Lys Val Val Gly Pro Leu Gly Arg Gln Arg Leu
245 250 255
Ile Ile Glu Glu Leu Thr Leu Glu Thr Gln Gly Met Tyr Tyr Trp Val
260 265 270
Trp Gly Arg Thr Asp Arg Pro Ser Ala Tyr Gly Thr Trp Val Arg Val
275 280 285
Arg Val Phe Arg Pro Pro Ser Leu Thr Ile His Pro His Ala Val Leu
290 295 300
Glu Gly Gln Pro Phe Lys Ala Thr Cys Thr Ala Ala Thr Tyr Tyr Pro
305 310 315 320
Gly Asn Arg Ala Glu Phe Val Trp Phe Glu Asp Gly Arg Arg Val Phe
325 330 335
Asp Pro Ala Gln Ile His Thr Gln Thr Gln Glu Asn Pro Asp Gly Phe
340 345 350
Ser Thr Val Ser Thr Val Thr Ser Ala Ala Val Gly Gly Gln Gly Pro
355 360 365
Pro Arg Thr Phe Thr Cys Gln Leu Thr Trp His Arg Asp Ser Val Ser
370 375 380
Phe Ser Arg Arg Asn Ala Ser Gly Thr Ala Ser Val Leu Pro Arg Pro
385 390 395 400
Thr Ile Thr Met Glu Phe Thr Gly Asp His Ala Val Cys Thr Ala Gly
405 410 415
Cys Val Pro Glu Gly Val Thr Phe Ala Trp Phe Leu Gly Asp Asp Ser
420 425 430
Ser Pro Ala Glu Lys Val Ala Val Ala Ser Gln Thr Ser Cys Gly Arg
435 440 445
Pro Gly Thr Ala Thr Ile Arg Ser Thr Leu Pro Val Ser Tyr Glu Gln
450 455 460
Thr Glu Tyr Ile Cys Arg Leu Ala Gly Tyr Pro Asp Gly Ile Pro Val
465 470 475 480
Leu Glu His His Gly Ser His Gln Pro Pro Pro Arg Asp Pro Thr Glu
485 490 495
Arg Gln Val Ile Arg Ala Val Glu Gly Ala Gly Ile Gly Val Ala Val
500 505 510
Leu Val Ala Val Val Leu Ala Gly Thr Ala Val Val Tyr Leu Thr His
515 520 525
Ala Ser Ser Val Arg Tyr Arg Arg Leu Arg
530 535
(2) INFORMATION FOR SEQ ID NO: 107:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 170 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:
Met Ala Phe Arg Ala Ser Gly Pro Ala Tyr Gln Pro Leu Ala Pro Ala
1 5 10 15
Asp Ala Arg Ala Arg Val Pro Ala Val Ala Trp Ile Gly Val Gly Ala
20 25 30
Ile Val Gly Ala Phe Ala Leu Val Ala Ala Leu Val Leu Val Pro Pro
35 40 45
Arg Ser Ser Trp Gly Leu Ser Pro Cys Asp Ser Gly Trp Gln Glu Phe
50 55 60
Asn Ala Gly Cys Val Ala Trp Asp Pro Thr Pro Val Glu His Glu Gln
65 70 75 80
Ala Val Gly Gly Cys Ser Ala Pro Ala Thr Leu Ile Pro Arg Ala Ala
85 90 95
Ala Lys His Leu Ala Ala Leu Thr Arg Val Gln Ala Glu Arg Ser Ser
100 105 110
Gly Tyr Trp Trp Val Asn Gly Asp Gly Ile Arg Thr Cys Leu Arg Leu
115 120 125
Val Asp Ser Val Ser Gly Ile Asp Glu Phe Cys Glu Glu Leu Ala Ile
130 135 140
Arg Ile Cys Tyr Tyr Pro Arg Ser Pro Gly Gly Phe Val Arg Phe Val
145 150 155 160
Thr Ser Ile Arg Asn Ala Leu Gly Leu Pro
165 170
(2) INFORMATION FOR SEQ ID NO: 108:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 215 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:
Met Ala Gly Ala Pro Pro Arg Leu Pro Pro Arg Asn Pro Ala Pro Pro
1 5 10 15
Glu Gln Arg Pro Ala Ala Ala Ala Arg Pro Leu Ala Ala His Arg Glu
20 25 30
Ala Ala Gly Val Tyr Asn Ala Val Arg Thr Trp Gly Pro Asp Ala Glu
35 40 45
Ala Glu Pro Asp Gln Met Glu Asn Thr Tyr Leu Leu Pro Glu Asp Asp
50 55 60
Ala Ala Met Pro Ala Gly Val Gly Leu Gly Ser Thr Pro Ala Ala Asp
65 70 75 80
Thr Thr Ala Ala Ala Trp Pro Ala Glu Ser His Ala Pro Arg Ala Pro
85 90 95
Ser Glu Glu Ala Asp Ser Ile Tyr Glu Ser Val Ser Glu Asp Gly Gly
100 105 110
Arg Val Tyr Glu Glu Ile Pro Trp Val Arg Val Tyr Glu Asn Ile Cys
115 120 125
Leu Arg Arg Gln Asp Ala Gly Gly Ala Ala Pro Pro Gly Asp Ala Pro
130 135 140
Asp Ser Pro Tyr Ile Glu Ala Glu Asn Pro Leu Tyr Asp Trp Gly Gly
145 150 155 160
Ser Ala Leu Phe Ser Pro Pro Gly Ala Thr Arg Ala Pro Asp Pro Gly
165 170 175
Leu Ser Leu Ser Pro Met Pro Ala Arg Pro Arg Thr Asn Ala Asn Asp
180 185 190
Gly Pro Thr Asn Val Ala Ala Leu Ser Ala Leu Leu Thr Lys Leu Lys
195 200 205
Arg Gly Arg His Gln Ser His
210 215
(2) INFORMATION FOR SEQ ID NO: 109:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 393 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
Met Gln Arg Arg Arg Ala Ser Ser Leu Arg Leu Ala Arg Cys Leu Thr
1 5 10 15
Pro Ala Asn Leu Ile Arg Gly Ala Asn Ala Gly Val Pro Glu Arg Arg
20 25 30
Ile Phe Ala Gly Cys Leu Leu Pro Thr Pro Glu Gly Leu Leu Ser Ala
35 40 45
Ala Val Gly Val Leu Arg Gln Arg Ala Asp Asp Leu Gln Pro Ala Phe
50 55 60
Leu Thr Gly Ala Asp Arg Ser Val Arg Leu Ala Ala Arg His His Asn
65 70 75 80
Thr Val Pro Glu Ser Leu Ile Val Asp Gly Leu Ala Ser Asp Pro His
85 90 95
Tyr Asp Tyr Ile Arg His Tyr Ala Ser Ala Ala Lys Gln Ala Leu Gly
100 105 110
Glu Val Glu Leu Ser Gly Gly Gln Leu Ser Arg Ala Ile Leu Ala Gln
115 120 125
Tyr Trp Lys Tyr Leu Gln Thr Val Val Pro Ser Gly Leu Asp Ile Pro
130 135 140
Asp Asp Pro Ala Gly Xaa Cys Asp Pro Ser Leu His Val Leu Met Arg
145 150 155 160
Pro Thr Leu Leu Pro Lys Leu Val Val Arg Ala Pro Phe Lys Ser Gly
165 170 175
Ala Ala Ala Ala Lys Tyr Ala Ala Ala Val Ala Gly Leu Arg Asp Ala
180 185 190
Ala His Arg Leu Gln Gln Tyr Met Phe Phe Met Arg Pro Ala Asp Pro
195 200 205
Ser Arg Pro Ser Thr Asp Thr Ala Leu Arg Leu Ser Glu Leu Leu Ala
210 215 220
Tyr Val Ser Val Leu Tyr His Trp Ala Ser Trp Met Leu Trp Thr Ala
225 230 235 240
Asp Lys Tyr Val Cys Arg Arg Leu Gly Pro Ala Asp Arg Arg Phe Val
245 250 255
Ser Gly Ser Leu Glu Ala Pro Ala Glu Thr Phe Ala Arg His Leu Asp
260 265 270
Arg Gly Pro Ser Gly Thr Thr Gly Ser Met Gln Cys Met Ala Leu Arg
275 280 285
Ala Ala Val Ser Asp Val Leu Gly His Leu Thr Arg Leu Ala His Leu
290 295 300
Trp Glu Thr Gly Lys Arg Ser Gly Gly Thr Tyr Gly Ile Val Asp Ala
305 310 315 320
Ile Val Ser Thr Val Glu Val Leu Ser Ile Val His His His Ala Gln
325 330 335
Tyr Ile Ile Asn Ala Thr Leu Thr Gly Tyr Val Val Trp Ala Ser Asp
340 345 350
Ser Leu Asn Asn Glu Tyr Leu Arg Ala Ala Val Asp Ser Gln Glu Arg
355 360 365
Phe Cys Arg Thr Ala Ala Pro Leu Phe Pro Thr Met Thr Ala Pro Ser
370 375 380
Trp Ala Arg Met Glu Leu Ser Ile Lys
385 390
(2) INFORMATION FOR SEQ ID NO: 110:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 680 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
Met Ser Val Arg Gly His Ala Val Arg Arg Arg Arg Ala Ser Thr Arg
1 5 10 15
Ser His Ala Pro Ser Ala His Arg Ala Glu Ser Pro Val Glu Asp Glu
20 25 30
Pro Glu Gly Gly Gly Val Gly Leu Met Gly Tyr Leu Arg Ala Val Phe
35 40 45
Asn Val Asp Asp Asp Ser Glu Val Glu Ala Ala Gly Glu Met Ala Ser
50 55 60
Glu Glu Pro Pro Pro Arg Arg Arg Arg Glu Arg His Pro Gly Ser Arg
65 70 75 80
Arg Ala Ser Glu Ala Arg Ala Ala Ala Pro Pro Arg Arg Ala Ser Phe
85 90 95
Pro Arg Pro Arg Ser Val Thr Ala Arg Ser Gln Ser Val Arg Gly Arg
100 105 110
Arg Asp Ser Ala Ile Thr Arg Ala Pro Arg Gly Gly Tyr Leu Gly Pro
115 120 125
Met Asp Pro Arg Asp Val Leu Gly Arg Val Gly Gly Ser Arg Val Val
130 135 140
Pro Ser Pro Leu Phe Leu Asp Glu Leu Ser Tyr Glu Glu Asp Asp Tyr
145 150 155 160
Pro Ala Ala Val Ala His Asp Asp Gly Ala Gly Ala Arg Pro Pro Ala
165 170 175
Thr Val Glu Ile Ile Glu Gly Arg Val Ser Gly Pro Glu Leu Gln Ala
180 185 190
Ala Phe Pro Leu Asp Arg Leu Thr Pro Arg Val Ala Ala Trp Asp Glu
195 200 205
Ser Val Arg Ser Ala Leu Gly His Pro Ala Gly Phe Tyr Pro Cys Pro
210 215 220
Asp Ser Ala Phe Gly Leu Ser Arg Val Gly Val Met His Phe Asp Ala
225 230 235 240
Asp Pro Lys Val Phe Phe Arg Gln Thr Leu Gln Gln Gly Glu Ala Trp
245 250 255
Tyr Val Thr Gly Asp Ala Ile Leu Asp Leu Thr Asp Arg Arg Ala Lys
260 265 270
Thr Ser Pro Ser Arg Ala Met Gly Phe Leu Val Asp Ala Ile Val Arg
275 280 285
Val Ala Ile Asn Gly Trp Val Cys Gly Thr Arg Leu His Thr Glu Gly
290 295 300
Arg Gly Ser Glu Leu Asp Asp Arg Ala Ala Glu Leu Arg Arg Gln Phe
305 310 315 320
Ala Ser Leu Thr Ala Leu Arg Pro Val Gly Ala Ala Ala Val Pro Leu
325 330 335
Leu Ser Ala Gly Gly Ala Ala Pro Pro His Pro Gly Pro Asp Ala Ala
340 345 350
Val Phe Arg Ser Ser Leu Gly Ser Leu Leu Tyr Trp Pro Gly Val Arg
355 360 365
Ala Leu Leu Gly Arg Asp Cys Arg Val Ala Ala Arg Tyr Ala Gly Arg
370 375 380
Met Thr Tyr Ile Ala Thr Gly Ala Leu Leu Ala Arg Phe Asn Pro Gly
385 390 395 400
Ala Val Lys Cys Val Leu Pro Arg Glu Ala Ala Phe Ala Gly Arg Val
405 410 415
Leu Asp Val Leu Ala Val Leu Ala Glu Gln Thr Val Gln Trp Leu Ser
420 425 430
Val Val Val Gly Ala Arg Leu His Pro His Ser Ala His Pro Ala Phe
435 440 445
Ala Asp Val Glu Gln Glu Ala Leu Phe Arg Ala Leu Pro Leu Gly Ser
450 455 460
Pro Gly Val Val Ala Ala Glu His Glu Ala Leu Gly Asp Thr Ala Ala
465 470 475 480
Arg Arg Leu Leu Ala Thr Ser Gln Ala Val Leu Gly Ala Ala Val Tyr
485 490 495
Ala Leu His Thr Ala Thr Val Thr Leu Lys Tyr Ala Cys Gly Asp Ala
500 505 510
Arg Arg Arg Arg Asp Asn Ala Ala Ala Ala Arg Ala Val Leu Ala Thr
515 520 525
Gly Leu Ile Leu Gln Arg Leu Leu Gly Leu Ala Asp Thr Val Val Ala
530 535 540
Cys Val Ala Ala Phe Asp Gly Gly Ser Thr Ala Pro Glu Val Gly Thr
545 550 555 560
Tyr Thr Pro Leu Arg Tyr Ala Cys Val Leu Arg Ala Thr Gln Pro Leu
565 570 575
Tyr Ala Arg Thr Thr Pro Ala Lys Phe Trp Ala Asp Val Arg Ala Ala
580 585 590
Ala Glu His Val Asp Leu Arg Pro Ala Ser Ser Ala Pro Arg Ala Pro
595 600 605
Val Ser Gly Thr Ala Asp Pro Ala Phe Leu Leu Glu Asp Leu Ala Ala
610 615 620
Phe Pro Pro Ala Pro Leu Asn Ser Glu Ser Val Leu Gly Pro Arg Val
625 630 635 640
Arg Val Val Asp Ile Met Ala Gln Phe Arg Lys Leu Leu Met Gly Asp
645 650 655
Glu Glu Thr Ala Ala Leu Arg Ala His Val Ser Gly Arg Arg Ala Thr
660 665 670
Gly Leu Gly Gly Pro Pro Arg Pro
675 680
(2) INFORMATION FOR SEQ ID NO: 111:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 556 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111:
Val Ile Leu Lys Met Arg Gly Gly Gly Arg Glu Met Ser Val Ile Gly
1 5 10 15
Asp Ala Arg His Pro Arg Gln Phe Pro Ser Gln Gly Pro Arg Pro Phe
20 25 30
Ser Val Ala Gly Pro Gly Ser Leu Pro Pro Ser Pro Pro Pro Gly Ala
35 40 45
Arg Ala Leu Leu Ile Arg Leu Ser Lys Ser Leu Ser Pro Asp Pro Thr
50 55 60
Ala Pro Met Asp Leu Leu Val Asn Asn Leu Phe Ala Asp Ala Asp Gly
65 70 75 80
Val Ser Pro Pro Pro Pro Arg Pro Ala Gly Gly Pro Lys Asn Thr Pro
85 90 95
Ala Ala Pro Pro Leu Tyr Ala Thr Gly Arg Leu Ser Gln Ala Gln Leu
100 105 110
Met Pro Ser Pro Pro Met Pro Val Pro Pro Ala Ala Leu Phe Asn Arg
115 120 125
Leu Leu Asp Asp Leu Gly Phe Ser Ala Gly Pro Ala Leu Cys Thr Met
130 135 140
Leu Asp Thr Trp Asn Glu Asp Leu Phe Ser Gly Phe Pro Thr Asn Ala
145 150 155 160
Asp Met Tyr Arg Glu Cys Lys Phe Leu Ser Thr Leu Pro Ser Asp Val
165 170 175
Ile Asp Trp Gly Asp Ala His Val Pro Glu Arg Ser Pro Ile Asp Ile
180 185 190
Arg Ala His Gly Asp Val Ala Phe Pro Thr Leu Pro Ala Thr Arg Asp
195 200 205
Glu Leu Pro Ser Tyr Tyr Glu Ala Met Ala Gln Phe Phe Arg Gly Glu
210 215 220
Leu Arg Ala Arg Glu Glu Ser Tyr Arg Thr Val Leu Ala Asn Phe Cys
225 230 235 240
Ser Ala Leu Tyr Arg Tyr Leu Arg Ala Ser Val Arg Gln Leu His Arg
245 250 255
Gln Ala His Met Arg Gly Arg Asn Arg Asp Leu Arg Glu Met Leu Arg
260 265 270
Thr Thr Ile Ala Asp Arg Tyr Tyr Arg Glu Thr Ala Arg Leu Ala Arg
275 280 285
Val Leu Phe Leu His Leu Tyr Leu Phe Leu Ser Arg Glu Ile Leu Trp
290 295 300
Ala Ala Tyr Ala Glu Gln Met Met Arg Pro Asp Leu Phe Asp Gly Leu
305 310 315 320
Cys Cys Asp Leu Glu Ser Trp Arg Gln Leu Ala Cys Leu Phe Gln Pro
325 330 335
Leu Met Phe Ile Asn Gly Ser Leu Thr Val Arg Gly Val Pro Val Glu
340 345 350
Ala Arg Arg Leu Arg Glu Leu Asn His Ile Arg Glu His Leu Asn Leu
355 360 365
Pro Leu Val Arg Ser Ala Ala Ala Glu Glu Pro Gly Ala Pro Leu Thr
370 375 380
Thr Pro Pro Val Leu Gln Gly Asn Gln Ala Arg Ser Ser Gly Tyr Phe
385 390 395 400
Met Leu Leu Ile Arg Ala Lys Leu Asp Ser Tyr Ser Ser Val Ala Thr
405 410 415
Ser Glu Gly Glu Ser Val Met Arg Glu His Ala Tyr Ser Arg Gly Arg
420 425 430
Thr Arg Asn Asn Tyr Gly Ser Thr Ile Glu Gly Leu Leu Asp Leu Pro
435 440 445
Asp Asp Asp Asp Ala Pro Ala Glu Ala Gly Leu Val Ala Pro Arg Met
450 455 460
Ser Phe Leu Ser Ala Gly Gln Arg Pro Arg Arg Leu Ser Thr Thr Ala
465 470 475 480
Pro Ile Thr Asp Val Ser Leu Gly Asp Glu Leu Arg Leu Asp Gly Glu
485 490 495
Glu Val Asp Met Thr Pro Ala Asp Ala Leu Asp Asp Phe Asp Leu Glu
500 505 510
Met Leu Gly Asp Val Glu Ser Pro Ser Pro Gly Met Thr His Asp Pro
515 520 525
Val Leu Tyr Gly Ala Leu Asp Val Asp Asp Phe Glu Phe Glu Gln Met
530 535 540
Phe Thr Asp Ala Met Gly Ile Asp Asp Phe Gly Gly
545 550 555
(2) INFORMATION FOR SEQ ID NO: 112:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7362 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:
CGCGGGGGAG GGGACGACGC GGGGGAGGGG ACGACGCGGG GGAGGGGAGG ACGCGGGGGA 60
TATATAAAGC GGTACAAAGC GCGGGAATGG GCATATTGGA CCCGCGTGAT TCGGTTGCTC 120
GCGGTTGTCT TGTTTGGACG TTTTTTATGC GGGAACAAGG GGGCTTACCG GTTACACTGT 180
CCGCTCGCTA TGGGGTTCGT CTGTCTGTTT GGGCTTGTCG TTATGGGAGC CTGGGGGGCG 240
TGGGGTGGGT CACAGGCAAC CGAATATGTT CTTCGTAGTG TTATTGCCAA AGAGGTGGGG 300
GACATACTAA GAGTGCCTTG CATGCGGACC CCCGCGGACG ATGTTTCTTG GCGCTACGAG 360
GCCCCGTCCG TTATTGACTA TGCCCGCATA GACGGAATAT TTCTTCGCTA TCACTGCCCG 420
GGGTTGGACA CGTTTTTGTG GGATAGGCAC GCCCAGAGGG CGTATCTGGT TAACCCCTTT 480
CTCTTTGCGG CGGGATTTTT GGAGGACTTG AGTCACTCTG TGTTTCCGGC CGACACCCAG 540
GAAACAACGA CGCGCCGGGC CCTTTATAAA GAGATACGCG ATGCGTTGGG CAGTCGAAAA 600
CAGGCCGTCA GCCACGCACC CGTCAGGGCC GGGTGTGTAA ACTTTGACTA CTCACGCACT 660
CGCCGCTGCG TCGGGCGACG CGATTTACGG CCTGCCAACA CCACGTCAAC GTGGGAACCG 720
CCTGTGTCGT CGGACGATGA AGCGAGCTCG CAGTCGAAGC CCCTCGCCAC CCAGCCGCCC 780
GTCCTCGCCC TTTCGAACGC CCCCCCACGG CGGGTCTCCC CGACGCGAGG TCGGCGCCGG 840
CATACTCGCC TCCGACGCAA CTAGCCACGT CTGCATCGCA AGCCACCCTG GGTCGGGAGC 900
AGGATATCCG ACCCGTCTAG CGGCCGGGTC GGCTGTCCAG CGTCGTCGCC CTAGAGGCTG 960
TCCGCCGGGC GTGATGTTTT CCGCATCTAC GACCCCCGAA CAGCCCCTGG GGCTGTCGGG 1020
CGATGCGACG CCGCCCCTGC CGACTTCCGT GCCCCTGGAC TGGGCCGCGT TTCGGCGCGC 1080
GTTTCTGATC GACGACGCCT GGCGGCCCCT GTTGGAGCCG GAGCTCGCGA ACCCCCTAAC 1140
CGCGCGCCTC CTCGCGGAGT ATGACCGTCG GTGCCAGACC GAAGAGGTGC TGCCGCCGCG 1200
GGAGGATGTG TTCTCCTGGA CGCGGTATTG TACCCCCGAC GACGTGCGCG TGGTTATCAT 1260
CGGGCAGGAC CCGTACCACC ATCCCGGCCA GGCGCACGGC CTGGCGTTTA GCGTGCGTGC 1320
GGATGTGCCG GTGCCTCCGA GTCTACGGAA CGTGCTGGCG GCGGTAAAAA ATTGTTACCC 1380
CGACGCGCGC ATGAGCGGCC GCGGCTGCCT GGAAAAGTGG GCTCGCGACG GCGTGCTGTT 1440
GTTGAACACG ACCCTGACCG TCAAGCGCGG GGCGGCGGCG TCCCACTCCA AGCTTGGATG 1500
GGACCGCTTT GTGGGCGGGG TGGTCCGACG GCTGGCCGCG CGCCGCCCGG GCCTGGTCTT 1560
TATGCTCTGG GGCGCCCATG CCCAGAACGC GATCAGGCCC GACCCTCGCC AACACTACGT 1620
CCTCAAGTTT TCTCACCCGT CGCCCCTCTC CAAGGTCCCG TTTGGGACGT GCCAGCATTT 1680
CCTCGCCGCG AATCGCTACC TCGAAACCCG GGACATTATG CCTATCGACT GGTCGGTATA 1740
AGATGCCGAC ATCCGGGGTC TTGATTTACG AGGGGGCAAT TAATAAAGAC TGTTGATGGT 1800
TAAATCTCGG GTCTCATACC GGTCCGTGAT GTCGGGCGTG GGGGAAGAGA GGGTCCCCTC 1860
TGCGTTTACT ATCCTTGCCT CGTGGGGCTG GACGTTTGCA CCCCAGAACC ATGATCCTGG 1920
CGCGTCGCCG AATACGACGC CCATAGAGTC GATTGCGGGG ACCGCACCGG ACGCGCACGT 1980
GGGGCCTCTC GACGGAGAGC CGGACCGGGA TGCGATCTCC CCGCTTACGT CGAGCGTGGC 2040
CGGCGACCCG CCGGGGGCGG ACGGCCCCTA CGTCACCTTT GATACTCTGT TTATGGTATC 2100
TTCGATCGAC GAACTGGGGC GCCGCCAGCT CACGGATACG ATCCGTAAGG ACCTGCGGCT 2160
GTCGCTGGCC AAGTTCAGCA TCGCGTGTAC CAAGACCTCG TCGTTTTCGG GGACGGCCGC 2220
GCGCCAGCGC AAGCGCGGAG CACCGCCGCA ACGCACATGC GTACCACGCA GCAACAAGAG 2280
CCTCCAGATG TTCGTTTTGT GCAAGCGCGC CAACGCCGCG CAGGTGCGCG AGCAGCTGCG 2340
GGCGGTTATT CGGTCGCGCA AGCCGCGCAA GTATTACACG CGGTCCTCGG ATGGGCGGCT 2400
CTGCCCGGCC GTCCCCGTGT TTGTACACGA GTTTGTTTCG TCCGAACCCA TGCGCCTCCA 2460
TCGAGATAAC GTCATGCTGT CTACGGAACC AGACTAAGCA CCCCCGCCGT CCCCTTTCTT 2520
TTCCCCCTAC CCTTCCCCCC GTTACTGATG TGTTGTGACG TTTCAATAAA TAACACGTAG 2580
CTTATTTTGT TGGATGATGG ATTGATTGAT TTTATTGACC GTTCGTTCGC CCGGCGGTGC 2640
CGTCGCCGCG CGCAGAGGGA ATATGCAAGC GGGCGGGGTG GGGAGGAAAG AAGGTTTCAG 2700
GTTCCGGGGG TTGGGTCTGC GTCGTCCAGG GTGGGGCTGA TCTGAATTTC CCGCAGAACC 2760
TCGACCAGTA GGTCTGTTGT GTTTGCTGGG AACTCGCCCG CCGTTGGGGA TACGGGGGCG 2820
GGGGGTGTGG TTGGGCGGAC GTCCAGGGGT GCGTTATCGC ACCCCCGCGC CGCCTCGGGG 2880
GCCGTCCCGT AGATCGTTGC GGTGATGTAG ATGGTGTCCG GGGTCCACAC CACCGTCAGG 2940
ATGCCGGCCG TCGCACTCCG GACGCTTTCG CCGTGCGATG AGCTGACCCA GGAGTCAAAG 3000
GGGTACGCGT ACATATGGGC GTCCCACCAG CGCTCCAGCC TCTGGGTACT AGCGCGTCCT 3060
ATAAAGCGGT ATGCGCAAAA TTCGGCACGA CAGTCGATAA TCACCAGCAG CCCGATGGGG 3120
GTGTGTTGTA TCACCACGCC TCCGCGGGGC AGGCGGTCCT GGCGCGCTCG ACCCCGCGTC 3180
AGAACCGCGC GCGTCCCTGA CTCAAACACG TGCACCACCT GTGCCGCGTC CGGCAGCGCG 3240
CTCGTTAGCG ACGCCCTGGG GTGATGTAGG CTGTACGCGA TGGTCGTCTG GGGGTTCCCC 3300
ATGTCTCGGG GGGGTGGGGG TGAATGTCAC CCGGCCCGGG TGCGGTGGGA ACGCGAGGGA 3360
ATGGAGGGTT AATAGACAAT GACCACATTC GGATCGCGTA GAGCAGATAG TATGTGCTCG 3420
CTAATGACGT CATCGCGTTC GTGGCGCTCC CGGAGCGGGT TTAGATTCAT GTGCAGGAAC 3480
TCGGATGAGG TGGTGCGGGA CATGGCTACG TACGCGCTGT TTAGGCGCAG GTTTCCGGGC 3540
GTGAAGCATA TGGCGACCTT GTCCAGACTG AGCCCCTGGG AGCGCGTGAT GGTCATCGCG 3600
AGTTTGGAGC TGATGCCGTA GTCGGCGTTG ATGGCCATGG CCAGCTCCGT GGAGTCGATC 3660
GACTCGACAA ACTCACTGAT GTTGGTATTG ACGACAGACA TGAAGCCGTG CTGGTCCCGC 3720
AGGACGATGT AGGGCAGGGG GGACTCCTCC AAGAACTCGG CCACGCCGGC CGTCGCGTGC 3780
CGCCGCCGCA GCTCCTCCGC GAACGCGAAC ACCCGGGTGT ACGTGTACCC CATCAGCGTG 3840
TAGTTGTCCG TCTGCAGGGC CACGGACATC AGCCCCCCGC GCGGCGAGCC GGTCAGCAGC 3900
TCGCAGCCCC GGAAGATGAC ATTGTCCACG TAGGTGCTGA AGGGGGCGCT CTCAAACACC 3960
TCCCCGAAGA GCTCCCGTAG GATAAGGTAT CGCCCCAGAA AGGCCCTCTT CAGGAGCCCA 4020
AACTGGGCGT GGACGGCCGC GGTGGTCTCC GGCTCTTCGA GGGCGTAGTG GCAGTAGAAC 4080
ACGTCCAGCT GCTGTTCGTC CAGCCCGGCG AAGATAACGT CAAGGTCGTC GTCGGGGAAG 4140
TCGTCCGGGC CCCCGTCCCG CGGGCCCAGG TGCTTAAAAT TGAACGCACG CTCCCCCGGA 4200
GAGCGGTCGC TGGTGTCGGC GGCCCTGGTT GCCGATGCGC CGGCGGCGTC CCGGCGTAGC 4260
GACAGGAGTT CTGCCGTCAG CTCCCCTAGG CGGCCGTAGG CCAGGGTCCT CTGGGTCGCG 4320
TCCAGGCCGG GGCGCTGGAG AAAGTTGTAA AAGTGAATCA GCCCGCCGAA CATGAGCCGC 4380
GACAGGAACC GGTAGGCGAA CTCCACCGAG GTCTCCCCCT GGGTCTTCAC GAAGCTGTCG 4440
TCGCGCAGCA CAGCCTCGAA GGTCCGAAAC GTCCCGTCGA ACCCAAACAC CATCTTTCGG 4500
AGGCGCGCGG TCACCGCGAC CTGGCTGTTG AGGACGTACG TGATGTCGTT CCGGGCCACG 4560
ACTAGCTGTT GCTTGCTGTG CACCTCACAG CGCACGTGCC CCGCGTCCTG GTCCTGACTC 4620
TGGGAGTAGT TGGTGATGCG ACTGGCGTTG GCCGTGATCC ACTTTTCCAT GGTCAGCGTG 4680
GGTTGCTGCG TGAGCCGTCG ATACTCGTCA AACTCTTTGA CCGACACAAA CGTAAGCACG 4740
GGGAGGGTAA ACACAACAAA CTCCCCCTCG CGAGTCACCT TTAGGTAGGC GTGGAGCTTG 4800
GCCATGTACG CGCTGACCTC CTTGTGGGAC GAGAACAGCC GCGTCCACCC CGGAAGGTTG 4860
GCCGGGTTGG TGATGTAACT TTCCGGGACG ACAAAGCGGT CCACAAACTG CATGTGCTCC 4920
TCGGTGATGG GAAGGCCGTA CTCCAGCACC TTCATGAGGT TCCCGAACTC GTGCTCCACA 4980
CATCGCTTGT TGTTAATGAA AATGGCCCAG CTGTGCGAGA GGCGCGTGTA CTCGCGTAGG 5040
GTGCGGTTGC AGATGAGGTA CGTGAGCACG TTTTCGCTCT GCCGGACGGA GCATCGCAGT 5100
TTTTGGTGTT CGAAGGTGGA CTCCAGCGAG GCCGTCTGGG TCGGCGACCC CACGCACACC 5160
AGCACCGGCC GCAGGCGGCC CGCGTACTGG GGGGTGTGGT ACAGGGCGTT AATCATCCAC 5220
CAGCAATACA CCACGGTCGT GAGTAGGTGC CGCCCCAGGA GCCCGGCCTC GTCGATGACG 5280
ATAATGTTGC TGCGGGTGAA AGCCGGCAGC GCCCCGTGTG TGACCGAGGC CAGGCGCGTG 5340
AGGGCACCCT GGCCCAGCCC CAAAGTCTGC TCTAGGGCGG TGAGGGCGTG GAACTCGTTT 5400
CGCGCGTCTT CGCCCCCGTG CGCCGCCAGG GCCCGCTTGG TGATGTCGAG GATCACCTCC 5460
CAGTAGTACG TCAGGTCTCG CCGCTGCAGG TCTTCCAGCG AGGCGGGGCT GCTGGCCAGG 5520
GTGTACGGGT GCTGCCCCAG CTGGGCCTGG ACGTGATTCC CGCGAAACCC GAACTCGTGA 5580
AAGATGGTGT TGATGGGTCG ACTCAGAAAC GCCCCCGAGA GCTTAACGTA CATGTTCTGC 5640
GCCGCGATTC GCGTGGCGCC CGTGACCACG CAGTCCAGGA CCTCGTTGAG GGTCTGCACG 5700
CACGTACTCT TTCCGGATCC GGCGTTGCCG GTGATGAGAT ACGCCGCGAA CGGAAACTCC 5760
CGGAGCGGCA GGCCGGTCGG GACCTCCAAG GCCGCCACGT CCCGGAACCA CTGCAGGCGC 5820
GGCACCTGCG TGACGTCGAG CTGCTGCTGC GAGAGCTCTC GGATGCGTGC GATGATTGGT 5880
TGGACCCCGT GCATGGACGT AAAATTTAAA AACGCCTCGT CCCTGAACCG CACGGCGGGT 5940
CTGGCCCCGG GCTGCTGTGG GGGCGGACCT GGTGCCCGGA CGTCCCGCGA GCCCTCCCCG 6000
CCGGACGCCG CCATGGCCGC ACAGCGCGCG CGGGCGCCGG CGATGCGGAC GCGGGGCGGC 6060
GACGCGGCGC TATGCGCCCC CGAGGACGGC TGGGTGAAGG TTCACCCCAC CCCCGGGACG 6120
ATGTTGTTCC GCGAGATTCT CCTCGGGCAG ATGGGGTACA CCGAGGGTCA GGGGGTGTAC 6180
AACGTCGTCC GGTCCAGCGA GGCCGCCACC CGACAGCTGC AGGCGGCGAT CTTCCACGCG 6240
CTCCTCAACG CCACAACGTA CCGGGACCTG GAGGAGGACT GGCGCCGCCA CGTGGTGGCC 6300
CGCGGCCTCC AGCCGCAGCG GCTGGTTCGC AGGTACCGGA ACGCCCGGGA GGGCGATATC 6360
GCCGGGGTGG CCGAGCGGGT GTTCGACACG TGGCGATGCA CGCTCAGGAC GACGCTGCTG 6420
GACTTTGCCC ACGGGGTGGT AGACTGCTTT GCGCCGGGCG GCCCAAGCGG ACCGACCAGC 6480
TTCCCCAAAT ATATCGACTG GCTGACGTGT CTGGGGCTGG TTCCCATATT GCGCAAGACG 6540
CGCGAGGGGG AGGCGACGCA GCGCCTGGGG GCGTTTCTCA GGCAGCACAC GCTGCCCCGG 6600
CAGCTGGCCA CGGTCGCCGG GGCCGCGGAG CGCGCCGGCC CGGGGCTTCT GGAGCTGGCC 6660
GTCGCGTTCG ACTCCACGCG CATGGCGGAA TACGACCGTG TGCACATCTA CTACAACCAT 6720
CGCCGGGGGG AGTGGCTGGT GCGCGACCCG GTCAGCGGGC AGCGCGGCGA GTGCCTGGTG 6780
CTGTGCCCCC CCCTGTGGAC CGGCGACCGC CTGGTCTTCG ATTCGCCCGT TCAGCGGCTG 6840
TGCCCCGAGA TCGTCGCGTG CCACGCCCTC CGGGAACACG CGCACATCTG CCGTCTGCGC 6900
AACACCGCGT CCGTCAAGGT GCTGTTGGGG CGCAAGAGCG ACAGCGAGCG CGGGGTGGCT 6960
GGCGCCGCGC GGGTCGTCAA TAAGGCGCTG GGGGAGGATG ACGAGACGAA GGCCGGCTCG 7020
GCCGCCTCGC GTCTCGTGCG GCTCATCATC AACATGAAGG GCATGCGCCA CGTGGGCGAC 7080
ATCAACGACA CGGTACGCGC CTACTTGGAC GAGGCGGGGG GGCACCTGAT CGACACCCCC 7140
GCCGTCGACC ACACCCTCCC TGGGTTCGGC AAGGGCGGCA CCGGCCGCGG GTCGGCGGCC 7200
CAGGACCCGG GGGCGCGACC GCAGCAGCTT CGCCAGGCGT TTCAGACGGC CGTGGTCAAC 7260
AACATCAACG GCATGCTGGA GGGCTATATC AATAATCTCT TTGGAACCAT AGAACGCCTG 7320
CGAGAGACGA ACGCGGGTCT GGCGACCCAG CTGCAGGCGC G 7362
(2) INFORMATION FOR SEQ ID NO: 113:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 180 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:
Met Arg Thr Pro Ala Asp Asp Val Ser Trp Arg Tyr Glu Ala Pro Ser
1 5 10 15
Val Ile Asp Tyr Ala Arg Ile Asp Gly Ile Phe Leu Arg Tyr His Cys
20 25 30
Pro Gly Leu Asp Thr Phe Leu Trp Asp Arg His Ala Gln Arg Ala Tyr
35 40 45
Leu Val Asn Pro Phe Leu Phe Ala Ala Gly Phe Leu Glu Asp Leu Ser
50 55 60
His Ser Val Phe Pro Ala Asp Thr Gln Glu Thr Thr Thr Arg Arg Ala
65 70 75 80
Leu Tyr Lys Glu Ile Arg Asp Ala Leu Gly Ser Arg Lys Gln Ala Val
85 90 95
Ser His Ala Pro Val Arg Ala Gly Cys Val Asn Phe Asp Tyr Ser Arg
100 105 110
Thr Arg Arg Cys Val Gly Arg Arg Asp Leu Arg Pro Ala Asn Thr Thr
115 120 125
Ser Thr Trp Glu Pro Pro Val Ser Ser Asp Asp Glu Ala Ser Ser Gln
130 135 140
Ser Lys Pro Leu Ala Thr Gln Pro Pro Val Leu Ala Leu Ser Asn Ala
145 150 155 160
Pro Pro Arg Arg Val Ser Pro Thr Arg Gly Arg Arg Arg His Thr Arg
165 170 175
Leu Arg Arg Asn
180
(2) INFORMATION FOR SEQ ID NO: 114:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 334 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
Met Lys Arg Ala Arg Ser Arg Ser Pro Ser Pro Pro Ser Arg Pro Ser
1 5 10 15
Ser Pro Phe Arg Thr Pro Pro His Gly Gly Ser Pro Arg Arg Glu Val
20 25 30
Gly Ala Gly Ile Leu Ala Ser Asp Ala Thr Ser His Val Cys Ile Ala
35 40 45
Ser His Pro Gly Ser Gly Ala Gly Tyr Pro Thr Arg Leu Ala Ala Gly
50 55 60
Ser Ala Val Gln Arg Arg Arg Pro Arg Gly Cys Pro Pro Gly Val Met
65 70 75 80
Phe Ser Ala Ser Thr Thr Pro Glu Gln Pro Leu Gly Leu Ser Gly Asp
85 90 95
Ala Thr Pro Pro Leu Pro Thr Ser Val Pro Leu Asp Trp Ala Ala Phe
100 105 110
Arg Arg Ala Phe Leu Ile Asp Asp Ala Trp Arg Pro Leu Leu Glu Pro
115 120 125
Glu Leu Ala Asn Pro Leu Thr Ala Arg Leu Leu Ala Glu Tyr Asp Arg
130 135 140
Arg Cys Gln Thr Glu Glu Val Leu Pro Pro Arg Glu Asp Val Phe Ser
145 150 155 160
Trp Thr Arg Tyr Cys Thr Pro Asp Asp Val Arg Val Val Ile Ile Gly
165 170 175
Gln Asp Pro Tyr His His Pro Gly Gln Ala His Gly Leu Ala Phe Ser
180 185 190
Val Arg Ala Asp Val Pro Val Pro Pro Ser Leu Arg Asn Val Leu Ala
195 200 205
Ala Val Lys Asn Cys Tyr Pro Asp Ala Arg Met Ser Gly Arg Gly Cys
210 215 220
Leu Glu Lys Trp Ala Arg Asp Gly Val Leu Leu Leu Asn Thr Thr Leu
225 230 235 240
Thr Val Lys Arg Gly Ala Ala Ala Ser His Ser Lys Leu Gly Trp Asp
245 250 255
Arg Phe Val Gly Gly Val Val Arg Arg Leu Ala Ala Arg Arg Pro Gly
260 265 270
Leu Val Phe Met Leu Trp Gly Ala His Ala Gln Asn Ala Ile Arg Pro
275 280 285
Asp Pro Arg Gln His Tyr Val Leu Lys Phe Ser His Pro Ser Pro Leu
290 295 300
Ser Lys Val Pro Phe Gly Thr Cys Gln His Phe Leu Ala Ala Asn Arg
305 310 315 320
Tyr Leu Glu Thr Arg Asp Ile Met Pro Ile Asp Trp Ser Val
325 330
(2) INFORMATION FOR SEQ ID NO: 115:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 231 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:
Met Val Lys Ser Arg Val Ser Tyr Arg Ser Val Met Ser Gly Val Gly
1 5 10 15
Glu Glu Arg Val Pro Ser Ala Phe Thr Ile Leu Ala Ser Trp Gly Trp
20 25 30
Thr Phe Ala Pro Gln Asn His Asp Pro Gly Asp Asn Thr Thr Pro Ile
35 40 45
Glu Ser Ile Ala Gly Thr Ala Pro Asp Ala His Val Gly Pro Leu Asp
50 55 60
Gly Glu Pro Asp Arg Asp Ala Ile Ser Pro Leu Thr Ser Ser Val Ala
65 70 75 80
Gly Asp Pro Pro Gly Ala Asp Gly Pro Tyr Val Thr Phe Asp Thr Leu
85 90 95
Phe Met Val Ser Ser Ile Asp Glu Leu Gly Arg Arg Gln Leu Thr Asp
100 105 110
Thr Ile Arg Lys Asp Leu Arg Leu Ser Leu Ala Lys Phe Ser Ile Ala
115 120 125
Cys Thr Lys Thr Ser Ser Phe Ser Gly Thr Ala Ala Arg Gln Arg Lys
130 135 140
Arg Gly Ala Pro Pro Gln Arg Thr Cys Val Pro Arg Ser Asn Lys Ser
145 150 155 160
Leu Gln Met Phe Val Leu Cys Lys Arg Ala Asn Ala Ala Gln Val Arg
165 170 175
Glu Gln Leu Arg Ala Val Ile Arg Ser Arg Lys Pro Arg Lys Tyr Tyr
180 185 190
Thr Arg Ser Ser Asp Gly Arg Leu Cys Pro Ala Val Pro Val Phe Val
195 200 205
His Glu Phe Val Ser Ser Glu Pro Met Arg Leu His Arg Asp Asn Val
210 215 220
Met Leu Ser Thr Glu Pro Asp
225 230
(2) INFORMATION FOR SEQ ID NO: 116:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 199 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:
Met Gly Asn Pro Gln Thr Thr Ile Ala Tyr Ser Leu His His Pro Arg
1 5 10 15
Ala Ser Leu Thr Ser Ala Leu Pro Asp Ala Ala Gln Val Val His Val
20 25 30
Phe Glu Ser Gly Thr Arg Ala Val Leu Thr Arg Gly Arg Ala Arg Gln
35 40 45
Asp Arg Leu Pro Arg Gly Gly Val Val Ile Gln His Thr Pro Ile Gly
50 55 60
Leu Leu Val Ile Ile Asp Cys Arg Ala Glu Phe Cys Ala Tyr Arg Phe
65 70 75 80
Ile Gly Arg Ala Ser Thr Gln Arg Leu Glu Arg Trp Trp Asp Ala His
85 90 95
Met Tyr Ala Tyr Pro Phe Asp Ser Trp Val Ser Ser Ser His Gly Glu
100 105 110
Ser Val Arg Ser Ala Thr Ala Gly Ile Leu Thr Val Val Trp Thr Pro
115 120 125
Asp Thr Ile Tyr Ile Thr Ala Thr Ile Tyr Gly Thr Ala Pro Glu Ala
130 135 140
Arg Cys Asp Asn Ala Pro Leu Asp Val Arg Pro Thr Thr Pro Pro Ala
145 150 155 160
Pro Val Ser Pro Thr Ala Gly Glu Phe Pro Ala Asn Thr Thr Asp Leu
165 170 175
Leu Val Glu Val Leu Arg Glu Ile Gln Ile Ser Pro Thr Leu Asp Asp
180 185 190
Ala Asp Pro Thr Pro Gly Thr
195
(2) INFORMATION FOR SEQ ID NO: 117:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 877 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:
Met Ala Ala Ser Gly Gly Glu Gly Ser Arg Asp Val Arg Ala Pro Gly
1 5 10 15
Pro Pro Pro Gln Gln Pro Gly Ala Arg Pro Ala Val Arg Phe Arg Asp
20 25 30
Glu Ala Phe Leu Asn Phe Thr Ser Met His Gly Val Gln Pro Ile Ile
35 40 45
Ala Arg Ile Arg Glu Leu Ser Gln Gln Gln Leu Asp Val Thr Gln Val
50 55 60
Pro Arg Leu Gln Trp Phe Arg Asp Val Ala Ala Leu Glu Val Pro Thr
65 70 75 80
Gly Leu Pro Leu Arg Glu Phe Pro Phe Ala Ala Tyr Leu Ile Thr Gly
85 90 95
Asn Ala Gly Ser Gly Lys Ser Thr Cys Val Gln Thr Leu Asn Glu Val
100 105 110
Leu Asp Cys Val Val Thr Gly Ala Thr Arg Ile Ala Ala Gln Asn Met
115 120 125
Tyr Val Lys Leu Ser Gly Ala Phe Leu Ser Arg Pro Ile Asn Thr Ile
130 135 140
Phe His Glu Phe Gly Phe Arg Gly Asn His Val Gln Ala Gln Leu Gly
145 150 155 160
Gln His Pro Tyr Thr Leu Ala Ser Ser Pro Ala Ser Leu Glu Asp Leu
165 170 175
Gln Arg Arg Asp Leu Thr Tyr Tyr Trp Glu Val Ile Leu Asp Ile Thr
180 185 190
Lys Arg Ala Ala His Gly Gly Glu Asp Ala Arg Asn Glu Phe His Ala
195 200 205
Leu Thr Ala Leu Glu Gln Thr Leu Gly Leu Gly Gln Gly Ala Leu Thr
210 215 220
Arg Leu Ala Ser Val Thr His Gly Ala Leu Pro Ala Phe Thr Arg Ser
225 230 235 240
Asn Ile Ile Val Ile Asp Glu Ala Gly Leu Leu Gly Arg His Leu Leu
245 250 255
Thr Thr Val Val Tyr Cys Trp Trp Met Ile Asn Ala Leu Tyr His Thr
260 265 270
Pro Gln Tyr Ala Gly Arg Leu Arg Pro Val Leu Val Cys Val Gly Ser
275 280 285
Pro Thr Gln Thr Ala Ser Leu Glu Ser Thr Phe Glu His Gln Lys Leu
290 295 300
Arg Cys Ser Val Arg Gln Ser Glu Asn Val Leu Thr Tyr Leu Ile Cys
305 310 315 320
Asn Arg Thr Leu Arg Glu Tyr Thr Arg Leu Ser His Ser Trp Ala Ile
325 330 335
Phe Ile Asn Asn Lys Arg Cys Val Glu His Glu Phe Gly Asn Leu Met
340 345 350
Lys Val Leu Glu Tyr Gly Leu Pro Ile Thr Glu Glu His Met Gln Phe
355 360 365
Val Asp Arg Phe Val Val Pro Glu Ser Tyr Ile Thr Asn Pro Ala Asn
370 375 380
Leu Pro Gly Trp Thr Arg Leu Phe Ser Ser His Lys Glu Val Ser Ala
385 390 395 400
Tyr Met Ala Lys Leu His Ala Tyr Leu Lys Val Thr Arg Glu Gly Glu
405 410 415
Phe Val Val Phe Thr Leu Pro Val Leu Thr Phe Val Ser Val Lys Glu
420 425 430
Phe Asp Glu Tyr Arg Arg Leu Thr Gln Gln Pro Thr Leu Thr Met Glu
435 440 445
Lys Trp Ile Thr Ala Asn Ala Ser Arg Ile Thr Asn Tyr Ser Gln Ser
450 455 460
Gln Asp Gln Asp Ala Gly His Val Arg Cys Glu Val His Ser Lys Gln
465 470 475 480
Gln Leu Val Val Ala Arg Asn Asp Ile Thr Tyr Val Leu Asn Ser Gln
485 490 495
Val Ala Val Thr Ala Arg Leu Arg Lys Met Val Phe Gly Phe Asp Gly
500 505 510
Thr Phe Arg Thr Phe Glu Ala Val Leu Arg Asp Asp Ser Phe Val Lys
515 520 525
Thr Gln Gly Glu Thr Ser Val Glu Phe Ala Tyr Arg Phe Leu Ser Arg
530 535 540
Leu Met Phe Gly Gly Leu Ile His Phe Tyr Asn Phe Leu Gln Arg Pro
545 550 555 560
Gly Leu Asp Ala Thr Gln Arg Thr Leu Ala Tyr Gly Arg Leu Gly Glu
565 570 575
Leu Thr Ala Glu Leu Leu Ser Leu Arg Arg Asp Ala Ala Gly Ala Ser
580 585 590
Ala Thr Arg Ala Ala Asp Thr Ser Asp Arg Ser Pro Gly Glu Arg Ala
595 600 605
Phe Asn Phe Lys His Leu Gly Pro Arg Asp Gly Gly Pro Asp Asp Phe
610 615 620
Pro Asp Asp Asp Leu Asp Val Ile Phe Ala Gly Leu Asp Glu Gln Gln
625 630 635 640
Leu Asp Val Phe Tyr Cys His Tyr Ala Leu Glu Glu Pro Glu Thr Thr
645 650 655
Ala Ala Val His Ala Gln Phe Gly Leu Leu Lys Arg Ala Phe Leu Gly
660 665 670
Arg Tyr Leu Ile Leu Arg Glu Leu Phe Gly Glu Val Phe Glu Ser Ala
675 680 685
Pro Phe Ser Thr Tyr Val Asp Asn Val Ile Phe Arg Gly Cys Glu Leu
690 695 700
Leu Thr Gly Ser Pro Arg Gly Gly Leu Met Ser Val Gln Thr Asp Asn
705 710 715 720
Tyr Thr Leu Met Gly Tyr Thr Tyr Thr Arg Val Phe Ala Phe Ala Glu
725 730 735
Glu Leu Arg Arg Arg His Ala Thr Ala Gly Val Ala Glu Phe Leu Glu
740 745 750
Glu Ser Pro Leu Pro Tyr Ile Val Leu Arg Asp Gln His Gly Phe Met
755 760 765
Ser Val Val Asn Thr Asn Ile Ser Glu Phe Val Glu Ser Ile Asp Ser
770 775 780
Thr Glu Leu Ala Met Ala Ile Asn Ala Asp Tyr Gly Ile Ser Ser Lys
785 790 795 800
Leu Ala Met Thr Ile Thr Arg Ser Gln Gly Leu Ser Leu Asp Lys Val
805 810 815
Ala Ile Cys Phe Thr Pro Gly Asn Leu Arg Leu Asn Ser Ala Tyr Val
820 825 830
Ala Met Ser Arg Thr Thr Ser Ser Glu Phe Leu His Met Asn Leu Asn
835 840 845
Pro Leu Arg Glu Arg His Glu Arg Asp Asp Val Ile Ser Glu His Ile
850 855 860
Leu Ser Ala Leu Arg Asp Pro Asn Val Val Ile Val Tyr
865 870 875
(2) INFORMATION FOR SEQ ID NO: 118:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 588 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:
Met Val Leu Met Gly Arg Leu Arg Asn Ala Pro Glu Ser Leu Thr Tyr
1 5 10 15
Met Phe Cys Ala Ala Ile Arg Val Ala Pro Val Thr Thr Gln Ser Arg
20 25 30
Thr Ser Leu Arg Val Cys Thr His Val Leu Phe Pro Asp Pro Ala Leu
35 40 45
Pro Val Met Arg Tyr Ala Ala Asn Gly Asn Ser Arg Ser Gly Arg Pro
50 55 60
Val Gly Thr Ser Lys Ala Ala Thr Ser Arg Asn His Cys Arg Arg Gly
65 70 75 80
Thr Cys Val Thr Ser Ser Cys Cys Cys Glu Ser Ser Arg Met Arg Ala
85 90 95
Met Ile Gly Trp Thr Pro Cys Met Asp Val Lys Phe Lys Asn Ala Ser
100 105 110
Ser Leu Asn Arg Thr Ala Gly Leu Ala Pro Gly Cys Cys Gly Gly Gly
115 120 125
Pro Gly Ala Arg Thr Ser Arg Glu Pro Ser Pro Pro Asp Ala Ala Met
130 135 140
Ala Ala Gln Arg Ala Arg Ala Pro Ala Met Arg Thr Arg Gly Gly Asp
145 150 155 160
Ala Ala Leu Cys Ala Pro Glu Asp Gly Trp Val Lys Val His Pro Thr
165 170 175
Pro Gly Thr Met Leu Phe Arg Glu Ile Leu Leu Gly Gln Met Gly Tyr
180 185 190
Thr Glu Gly Gln Gly Val Tyr Asn Val Val Arg Ser Ser Glu Ala Ala
195 200 205
Thr Arg Gln Leu Gln Ala Ala Ile Phe His Ala Leu Leu Asn Ala Thr
210 215 220
Tyr Asp Leu Glu Glu Asp Trp Arg Arg His Val Val Arg Leu Gln Pro
225 230 235 240
Gln Arg Leu Val Arg Arg Tyr Arg Asn Ala Arg Glu Gly Asp Ile Ala
245 250 255
Gly Val Ala Glu Arg Val Phe Asp Thr Trp Arg Cys Thr Leu Arg Thr
260 265 270
Thr Leu Leu Asp Phe Ala His Gly Val Val Asp Cys Phe Ala Pro Gly
275 280 285
Gly Pro Ser Gly Pro Thr Ser Phe Pro Lys Tyr Ile Asp Trp Leu Thr
290 295 300
Cys Leu Gly Leu Val Pro Ile Leu Arg Lys Thr Arg Glu Gly Glu Ala
305 310 315 320
Thr Gln Arg Leu Gly Ala Phe Leu Arg Gln His Thr Leu Pro Arg Gln
325 330 335
Leu Ala Thr Val Ala Gly Ala Ala Glu Arg Ala Gly Pro Gly Leu Leu
340 345 350
Glu Leu Ala Val Ala Phe Asp Ser Thr Arg Met Ala Glu Tyr Asp Arg
355 360 365
Val His Ile Tyr Tyr Asn His Arg Arg Gly Glu Trp Leu Val Arg Asp
370 375 380
Pro Val Ser Gly Gln Arg Gly Glu Cys Leu Val Leu Cys Pro Pro Leu
385 390 395 400
Trp Thr Gly Asp Arg Leu Val Phe Asp Ser Pro Val Gln Arg Leu Cys
405 410 415
Pro Glu Ile Val Ala Cys His Ala Leu Arg Glu His Ala His Ile Cys
420 425 430
Arg Leu Arg Asn Thr Ala Ser Val Lys Val Leu Leu Gly Arg Lys Ser
435 440 445
Asp Ser Gly Val Ala Gly Ala Ala Arg Val Val Asn Lys Ala Leu Gly
450 455 460
Glu Asp Asp Glu Thr Lys Ala Gly Ser Ala Ala Ser Arg Leu Val Arg
465 470 475 480
Leu Ile Ile Asn Met Lys Gly Met Arg His Val Gly Asp Ile Asn Asp
485 490 495
Thr Val Arg Ala Tyr Leu Asp Glu Ala Gly Gly His Leu Ile Asp Thr
500 505 510
Pro Ala Val Asp His Thr Leu Pro Gly Phe Gly Lys Gly Gly Thr Gly
515 520 525
Arg Gly Ser Ala Ala Gln Asp Pro Gly Ala Arg Pro Gln Gln Leu Arg
530 535 540
Gln Ala Phe Gln Thr Ala Val Val Asn Asn Ile Asn Gly Met Leu Glu
545 550 555 560
Gly Tyr Ile Asn Asn Leu Phe Gly Thr Ile Glu Arg Leu Arg Glu Thr
565 570 575
Asn Ala Gly Leu Ala Thr Gln Leu Gln Ala Arg Val
580 585
(2) INFORMATION FOR SEQ ID NO: 119:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21035 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:
GTCTGGCCGC CGGCCCTGGC GTACGCGCTA TATAAGCCCA TGCGGTATTG GATGAGTTCC 60
CGCGCGCCCC GGAACTCCTC CACCGCCCAC GGGGCCAGGT CCGCGGCCGC CGCGTCGAAC 120
TCCGCCAGCA GGCGCCCCAG GGCGTCAAAG TTCATCTCCC AGGGCACCCT GCGCACCACC 180
TCATCCCGCA GCCGGGCGCA CAGGGCGGTG TGCTTGGTGA CGCGCGCGCC CAGCTCCTCC 240
ACGGCCTCCG CGCGCTCGGC GCCCTTGGCG CCCAGGACGC CCTGGTACCT GGCGGAAAGG 300
CGCTCGTAGG CCGGCTGGGC CCGCAGCCCC GACACCGTGT TGGTGGTGTC CTGCAGGGCG 360
CGCAGCTGCT CGTGCATGGC GCGGAACCCC TCGGGGGACT TCCAGGCGCC CCCCCGGACG 420
CGGCCAAAGC GACCCCAGAC CTCGTCCCAC TCCGCCTCGG CCTCCTCCAG GGACCTCCGC 480
AGGGCGTCGA CGCGGCGCCG AGTATCAAAG AGCGCCCCCA GGCGGCCGGC GTGCCGCGCC 540
AGGGGGCCGG GGCCGTCGCC GCGGGCGGCG CTTAGCGGGT GCGTCTCGAA GGTGCGCTGG 600
GCGTGCTCTA GCCAGATAAC CGCGGGCACG TCGAGCTCGC GCGTTTTCTC GGTCTGATCC 660
AACAGAACCT CGACCTGGTC GGCGATCTCC GCCACCGAGC GCGCCTGGTC GAGCGTCTTG 720
GCCACGGTCG CCGGGACGGC GACCACCTTC AGCATGGTCT TGAGGTTGGC CAGGCCCTCG 780
GCCTCGATCT GGGCCCGGCG CTCGCGCGCG GCCAGCGCCT CCCGCAGGCC CGCCATGACC 840
CGCTCGGTGG CCTCCGCGCG CTGCTGTTTG GCGCGCACCA CTGCGTCCTT GGTCTCGGCC 900
GTGTCCTGCC GGGTCACGAA GGCGACATAC TCGGCGTACG CCGTGTTCTT CACGGGGCTC 960
TGGTCCACGC GCTCCAACGC CGCCGCGCAC GCGACCAGCG CGTCCTCGCT GGGACACGGC 1020
AGGGTGACCC CGGTCCGGAC CAGCTCCGCG GTGGCCTCCG GGTCATTCCG GGCCGCGGAT 1080
ATCTGCTCCG CGGCGGCCGC CAGGTCCAGG GGCACGCCGC CGAGCGCCCG GTGCACGTCG 1140
GCCCGGATGG CGTCCAGGCG ATCGCGGAGC TCCACGTAGT CGGCGTAGCC ATGTTGGAAG 1200
AACGGCACGT ACCGGCGCAG GCCGGGCACG CTCGTCATGT CGTCCGCCAG GCGCCCCACG 1260
GCCTCGTGGT AGTCGATAAA CCCGTCGCCC GCCTGGGCCA TTTCCAGGAG CCCCTCCGCG 1320
ATGCGCAGCA GCCGCGCCAG GGGCTCGGCG TCGACCCGAA ACATGTCGGC GTAGGTTTCG 1380
GCGGCGGCGT GGAACGCCGC GCTCCAGCCG AGGCGGTGGA TGGCGGCGAG CGGGGGGAGC 1440
ATGGGGTGGC GCTGGTTCTC GGGGGTGTAG GGGTTAAACG CGAAGGCCGT ATCCAGGGCG 1500
AGGGTGACCG CCTCGGCGTT GGCCGCGAGC GCCTGCTCGG CGCGCTTGCG GAAGTCCCGG 1560
GGGTTGTAGC CGTGCGTGCC CGCCAGCGCC TGCAGGCGGC GCAGCTCGAC CACGTCGAAC 1620
TCGGCGCGGT TCTCGACGCG GTCCAGCGCC GCCTCGACGC CGGCGGCCCA GCGCTCGCTG 1680
CTGCCCCGGG CGCGCTGGGC CGCCATCTTC GCCGTCAGGT CGGCGACGGC GGCCTCAAGT 1740
TCGTCGGCGC GGCGTCGCGT GGCGCCGATG ACCTTGCCCA GCTCCTGCAG GGCGCGCCCG 1800
CTGGGGGAAT GGTCCCCGGC CGTCCCTTCG GCGTGCAGCA GGCCCCCGAA CCCAGCCTCG 1860
TGCCCCGCGA GGCTTTCCCG AGCAGCGGTC GTCGCGCGGG CCGCGGCATC GATGAGGGCG 1920
GCATGGTCCC CCTCCGGCTG GGCGCAGGCC CGGCGCGCCT GGACTACCAG GTCGGCGGCC 1980
GCCGACCCCA GGGTCGTGAG CTCGTCGATG GCCCCCCGCG CCTCCAGGGC CAGCCGAGTC 2040
GCCTTTACAT ACCCCGCGGC GCTATCGGCC AGCGCCGCGA GGAAGGACAG GGGCGAGGCC 2100
GGGTCGCGGG CGGCCGCGCC CAGGGCCGAC ACCGCGTCCG CCAGGGCGCC ATGCGCCCGC 2160
ACGGCCGCGT CCACCGTCGC CGCGGGACTT GCCGTCGCGA CGGCGGCGCT CCCGGCGTTG 2220
ATGGCGTTTG ACACGGCTTT GGCGATTGTG GGGGCGTGAT CGGAAAAGAA CTGCACGAGG 2280
ACCGGCGTCT CGGGGGCGTC GGCGAACAGG GTCTTCAGCA CCACCACGAA GGCGGGATGC 2340
AGGCCGGCCA GAGCCGTCGC GGTATCCGGG GTCGGGTGTT CCAGGGCCTC CCGGTACTGC 2400
CCCAGCAGCC CCCACAGGTC CGCCCGCAGC GCCGCCGTGA CTTCCGGGGG GGGGCCCCGG 2460
ACGGCATCGG CCAGGTCGGT CCACCCCGCG GGCAGGGAGG CCCGCAGGGT CGCCAGCACG 2520
GCCGGACACG CCTTTAGCCC CACAAAGTCC GGGAGGGGCC GCAGGACCCC TTGGAGTTTG 2580
TGCAGGAACT TCTCCCGGGC GTCGTGGGCC ACCTTGGCGC GCTCCCGCGC GTCGTTGAGC 2640
ATCGCCTCCA GGGCGTGGGC GCGCTCCCGA AGCCGGGAGC GCGCCTCCGG AGCGAGCTCC 2700
GCCGTCATCT TGGCCGCCTC CATGGCCCTC GCCTGCCGCA GCGCGTCTTC GGCCATGCGC 2760
GTGGCCTCGG GGGACAGCCC GCCCCCGTCG ACGTACGGCG CGGGGCCGGT CGCCGGGACG 2820
AAGGCCGCGT CGCTGTCCAG CTGCTGCGCG AGCGCCGCGT CGAGGGCGTC GAAGCGCTGC 2880
AGTTCGGCCA GCCCCGAGCT GCGCCGCGCC TGCTGGTCGT TGATGCCGTG GATGCTGCGC 2940
GCCAGCTCTT CCAGGGGCTT GCGTTCGATG AGCCCCTGGG TCGCGGCGTC GGTCAGGACC 3000
GAGAGCCAGG CCGCCAGGTC CTCGGGGGCA TCTAGGGTCT GGCCCCGCTG GAGCAGGTCC 3060
CGCAGCAGGA TGGCCTGGGG GCTGGTGGCG AGGGGGGGCG GGGGGGGGAG CGCGGCGCGC 3120
TGAGCGACGT CCCGCGTGTG TTGGTCAAAG GCCGGTAGCG ATTCCAGCAA CTGGACCATG 3180
GGCACGACCG CGGCCGAGGC CACGTGAAAC CGACAGTCGT GGCTGTCGCT GGCCTGCAGG 3240
GCCTTCGCGC TGTATACGGC TCCCCGGTGG AAGTACTCCT TGATCGCGCT CTCGATCGCC 3300
CGGCGGGCCT GGATCCGCAC GTCCTCCAGC CGCGCCTGGA TGGCCTCGGG GCCCAGGGCG 3360
GGCGGGCACG GGGCCCTGCC GCCGGCGCCC GGGGCGGCGG GCACGGGCAT CACGGTCAGG 3420
GGCCCGGCGC GCTGCGAGAC CGAGTCGACC CCGCGGGCGA GGGCGTCTAA GGCCTCGCGC 3480
ATCTCGCGGG CCTCCGCCTC GACCCGCATC TCTTCGCCCC GGGCAAACTG GGCCAGCGCC 3540
TGGATCCGAT GGAGAAGCGG CTCCGGGTGC GTCGGGGTGG CGGGGGCGAA CAGGGTGTTC 3600
GGGTGGGCGC GCGAGCGCTC CAGGAGCCAC TCTCCGAGGC GTGCGTACAG ATTGGCCGGC 3660
GGGGCGGCGC GCAGCTGCAG ATCCAGGTCC GCGAGGTCCC CGTAAAAGGC GTCCGTCTCC 3720
CGAATAACGT CCCTGGCGAC CAGGACCAGC TTAGCGAGGG CCAGGCGCCC GATCTGCGAA 3780
TTTTCGTCCA GCACGTGCTG GATGAGGGGC CGGTGGGCGG CCACGTCCGC CAGGCTCATG 3840
CGCGTGGACG CCAGGAAGTC CCCGACGGCC GTTTTGCGGG GCAGCATGCG CAGGGTGAAG 3900
TCCAGCAGGG CCGCGGCCGG GCCGGCCACC CCGGCCTGCG TATGCGTGCG GGCCCCGTTC 3960
TCGATCAAAA AGGCGAGGAC GCGCTCAAAG AAGAAGATGA CGCAGAGCTC CAACAGCCCC 4020
GGGTGCGCCG GGTACGGCGA CCGCAGGGCG TTGATGGTGA GCTGCGAACA CGCGGCCACC 4080
TCGCGGGCCA GGGCGGCATC GCGCGCCGCG AGCCGGACCG CCGTGGCGGC CACATTGGGG 4140
TGGACCTCGA ACAGCTGCGC CAGGTCGGCG CCGGGGGGCT CCGGGGGGCG GCGGGCCCCC 4200
AGCGTCTCGA GCACGGACGG CGACGACGGG CTCGCGGGCC CGTCATCGCC GCCTCCCTGC 4260
CCGGACTGCG GGGGGGTATC CGGTGCGGGA GGGACCGTGG CGGCTATGGG CGTCGGGGAG 4320
GAGGCGGGGA CCTCGGCGGC GACGGGGGCC TTCTTCTTGG GCGCGGACTT CTTCTTGGCC 4380
TTGGCGGGCG GGGCCTTGGG GGCGGGCCTC TCGCCCGAGG TCAGATCCTC CACGCTGGAC 4440
GGTGGGGTCC AGGTGGGCCG GCGGCGCTTG GGCAAGCCGG TAGAATAGCG CGCCCGGTGG 4500
CGACCCACCG GCACTGCCCC CACCTCCAGG ACCCGCAGGT CCTCGGCTTC TTCGGCCGCG 4560
TCCCCGGCGG GTGTCTGCGG GGGCGGGGCG GCGTGCGGTG GACCCGAGGC CGCGGCGTCC 4620
GGGGCCGAGG GCTTTGCGGG CGGGGTCCCC TCCAGGGCTG CTGCCCACAC ATCATCGGGG 4680
GGGCGGTTTG GGTGCCCCGC CTGCGGTGTG TTGGGTGGGC CCGAGGCCCC CCGGGGGGCC 4740
TCGGGGGGCC GGTCGGCCCG AGGGGTCTGG ACGTGGGTGG GCGCGGGGAG CGCGGGGACG 4800
ACCGGGCCCG AGCCTTCTCC GTCCCCCCTG GGGACCACAC CGACAAAGAG CGCCCCTAGC 4860
CCCCCGATCT CGCCCCGCAG GGGGTGGGTG ATGGCCACGC GCCGCTCGAC GAACGGTTCG 4920
TCCTGCAGGT AAGTCTCGCT GGCCCCGTAG AGGTGCAGGG CCGCGGCGGT CAGGTCCGCC 4980
GGCGCCACGG CCCCCGGGCC GGAGGGCACA AAAAACACCA TGGCGCCCGC CCACCGCACC 5040
TTGGGGCGGT CGTGGGCGTA ATACGTCAGG TACGGGTACA CGTCGCCCGC CCGCACCTTG 5100
GCGATAAACG CGGGCGTTCC CGCGGGCAGG CCGTGCGGGT CAAACAGATA GGCCGTGTCG 5160
CCGTCCCGGT AGAGCCCCAT GCCCAGGGGG CCGATGGTCA GGAGCGTGTA GGACAGCGGC 5220
CGCATGGCCC AGGGGCCGGC GAAGAACGTG TGCGCGGGGC ATTGCGTCTC CAGCAGCCCC 5280
GCCGTGGGCT CCCCGAAGAA GCCCACCTCG CCGTACACCC GCGAAAACAC GCAACGCAGG 5340
CCGCCGCGCG CCGCCGGGTA CTCCAGGAAG TTGGGGAGCT CGATAATGGA ACACATGCGC 5400
GGCGGCCCGG AGCCCGCGGC CGCGCGCGTC CACTCGCCCC CCTCCACCAG ACATCCCTCA 5460
ATGGCCTCCG CGGACAGCAC GTCGCGGGGC CCCACGTCGA AAAGAAGACT GAGAAACGAC 5520
AGGGACGAGC GCATGCACGA TACCGACCCC CCCGGCTCCA GATCGGTCGC GAACTGGTTC 5580
CGAACACCGG TGACCACGAT ATCGCGATCC CCCTGGCGCT TCATCGTGGG GTGAGGTAGC 5640
GCGGCCGGAA TCATGTGTGC CGCGCCCGCC ACGAGCGGGG CCTGTTTATG GGCCGGGCGT 5700
CCCGATGAGT ACTGTTGTTT CCGCCGCCCG AACCCCCCCC GCCCATCAAC CGCCTGTTCG 5760
TCCCCCTAAC CACACACCCG GTATCGCGTG TGTGGTTTCC CGGGAAGCCA CATCCCACCC 5820
CATGAAGTTT TGCCCTTTTT TTCCGTCCCG CACTACGCCA CCTTTCCACC CCCCCCCCCC 5880
AAAAAAAAAA AAACAACAAC CAACTCCCAG ATGGATGGGT GCGATAATAA AGCTTTATTA 5940
TTGTTTAACC AAAGGCGAGT CCTACGGGTG TACCGGTGGT GTCTCCTGCG GCGTCATCTC 6000
GTCGTCCTCC ACGGGGGTGT TGGGCCAAGG GACCGTCTCG CGGCCCGCCG GGCGCGTCGA 6060
CGGCGCGCGG GCCTGCGTGT CCTGTGGGCC GGGTGTCGTG GGTTCGGGGG TGCTACCGCC 6120
GGCATCTTGG GCCTCCAGGT CCCCGGGGGC CTCCGGGCCG GCGGAAGGCC GAAACGCCGA 6180
GGCGCGAAAC ACGCCGTCGG TGACCTGCAG GAGCTCGTTT ATTAATAGCC AGTCCATGCT 6240
CAGCGTAGCG GCCAGCCCCT GGGGAGACAG GTCCACGGAG TCCGGAACCA CCGTCGGCTG 6300
ACCCAGGGGC CCCAGGCTGT AGTCCCCCCA GGCCCCCAGG TCATGACGGT TCGTGAGCAC 6360
GACGAGGTCT GCGGCCGGGC TGGGGGGCGC GTCCTCGGTC GCGTGGGCCA TCACCTCCTG 6420
AATGGCTGCG GTGCGCTGAT CGGCCGAGCT GGCGAAGGGC GCCACGACCA GCGCGCGCTC 6480
CGTCTGCAGG CCCTTCCACG TGTCGTGGAG TTCCTGAACG AACTCGGCCA CCCGCTCGGG 6540
GCCCGTGGCC GCGCGCGCGG CCTGATAGCC GGCCGAGAGG CGCCGCCAGC GCGCCAGGAA 6600
CTGACTCATG TAACAGAACC CGGGGACCTG GTCCCCCGAC ATCAACTTTG ACGCCCTGGC 6660
GTGGATGCCC GACACGATGG CCAGGAACCC GTGGATTTCC CGCCGCACGA CGGCCAGCAC 6720
GTTACCCTCG TGCGAGACCT GGGCCGCCAG CTCGTCGCAT ACCCCGAGGT GCGCCGTCGT 6780
CTCGGTGACG ACGGACCGCA GCCCCGCGAG GGACGCGACC AGCGCGCGCT TGGCGTCGTG 6840
ATACATGCCG CAGTACTGGC TCACCGCGTC GCCCATGGCC TCGGGGCGCC AGGGCCCCAG 6900
GCGCTCGTGG GCGTCTGCGA CCACGGCGTA CAGGCGGTGC CCGTCGCTCT CGAACCGGCA 6960
CTCAAAGAAG GCGGCGAGCG TGCGCATGTG CAGCCGCAGC AGCACGATCG CGTCCTCCAG 7020
CTGGCGGACC AGGGGGTCGG CGCGCTCGGC GAGCTCCTGC AGCACCCCCC GGGCCGCCAG 7080
GGCGTACATG CTGATCAGCA GCAGGCTGCT GCCCACCTCG GGAGGCTGGG GGGGAGGCAG 7140
CTGGACCGCG GGCCGCAGCT GCTCGACGGC CCCCCTGGCG ATCACGTACA GCTCGCGCAG 7200
CAGCTGCTCG ATGTTGTCGG CCATCTGCAT CGTGGGCCCG ACGCCGGCCC GGGTGGCCGG 7260
TTCGAGGAGG GTGATCAGCG CGCCCAATTT TGTGCGGTGC CCCTCGACGG TGGGGAGATA 7320
GCCCAGGCCG AAGTCGCGCG CCCAGGCCAG CACCCGCAGG GCAAACTCGA TGGGGCGGGG 7380
CAGGTAGGCA GCGTTGCACG TGGCCCTCAG CGCGTCCCCG ACCACCAGGG CCAGCACGTA 7440
AGGGACGAAC CCCGGGTCGG CGAGGACGTT GGGGTGGATG CCCTCCAGGG CCGGGAAGCG 7500
GATCTTGGTG GCCGCGGCCA GGTGAACCGA GGGGGCGTGG CTAGGCGGCC CGACGGGGAG 7560
CAGCGCGGAC AGCGGCGTGG CCGGGGTGGT GGGGGTCAGG TCCCAGTGGG TCTGGCCGTA 7620
CACGTCGAGC CAGATGAGCG CCGTCTCGCG CAGGAGGCTG GGCTGGCCGG CGCTGAAGCG 7680
GCGCTCGGCC GTCTCAAACT CCCCCACGAG CGTGCGCCGC AGGCTCGCCA GGTGTTCCGT 7740
CGGCACGGCC GGGCCCATGA TGCGCGCCAG CGTCTGGCTG AGGACGCCGC CCGACAGGCC 7800
GACCGCCTCA CAGAGCCGCC CGTGCGTGTG CTCGCTGGCG CCCTGGATCC GCCGGAACGT 7860
TTTCACGTAG CCGGCGTAGT GCCCGTACTC CCGCGCGAGC CCGAACACGT TCGCCCCCGC 7920
AAGGGCAATG CACCCAAAGA GCTGCTGGAT CTCGCTGAGC CCGTGGCCGG GGGGCGTCCG 7980
CGCGGGCACC CCCGCCACCA AAAACCCCTC CAGGGCCGAT ATGTACTGGG TGCAGTGCGC 8040
GGGCGTGAAC CCCGCGTCGG TAAGCGTGTT GATCACCACG GAGGGCGAGT TGCTGTTCTG 8100
GACCAAAGCC CACGTCTGCT GCAGCAGCGC GAGGAGCCGT TGCTGGGCCC CGGCGGAGGG 8160
CGGCTCCCCT AGCTGCAGCA GGCCGGTGAC GGCCGGACGG AAGATGGCCA GCGCCGACGC 8220
ACTCAGAAAC GGCACGTCGG GGTCGAAGAC GGCCGCGTCC GTCCGCACGC GCGCCATCAG 8280
CGTCCCCGGG GGCGCGCACG CCGACCGCGG GCTGACGCGG CTTAGGGCGG TCGACACGCG 8340
CACCTCCTCG CGACTGCGAA CCATTTTGGT GGCCTCGAGG GGCGGGATCA TGATAGCCGG 8400
GTCGATCTCC CGCACCGTGT GCTGAAACTG GGCCAGCAGC GGCGGCGGGA CCACCGCGCC 8460
CCGATCGGGG GTCGTCAGGT ACTCGTCCAC CAGCGCCAGC GTAAACAGGG CCCGCGTGAG 8520
GGGGGTCAGG GCGGCGTCGT CGATGCGCTG TAGGTGCGCC GAGAACAGCG TCACCCAATT 8580
GCTGACCAGG GCCAAGAACC GGAGACCCTC TTGCACGATC GGGGACGGGA AGAGCAGGCT 8640
GTACGCCGGG GTGGTCAGGT TGGCGCCGGG TTGCCCCAGG GGAACCGGGG ACATCTTAAG 8700
CGACATCTCC CCGAGGGCCT CCAGGGAGGT CCGCGGGTTC ATGGCCAGGC AGCTCTGGGT 8760
GACGGTCCGC CAGCGGTCGA TCCACTCCAC GGCACACTGG CGGACGCGCA CCGGCCCCAG 8820
GGCCGCCGTG GTGCGCAGCC CGGCGGCCTC CAGCGCGTGG GTCGTGTCGG AGCCGGTGAT 8880
CGCCAGGACC GTGTCCTTGA TGACGTCCAT CTCCCGGAAG GCCGCCTCGG GGGTCTCGGG 8940
GAGCGCCACC GCCATGCGGT GCACCAGCAG CCCGGGGAGG TTCTCGGCCA AGAGCGCCGT 9000
CTCCGGAAGC CCGTGGGCCC GGTGCAAGGC GCACAGTTGC TCCAGGAGCG GGTGCCAGCA 9060
CGCCCGCGCC TCCGCCGGGC CGACCGCCGC GCCCGACAAC AGAAACGCCG CCGTGGCGGC 9120
GCGCAGTTTG GCCGCGGACA GAAACGCCGG CTCGTCCGCG CTGCCCGCCG GCTCGCTCGA 9180
GGGGGAGGGC GGCCGGCGGA GGTTGGTCAG GCTCCCCAAC AGGACCTGCA ACGGTCCGTT 9240
TGGGGGTGGA GCGGACGGGG GGGTCATGCC GGCGGGCGCC GGGACCTGGA GCGCGCTGTC 9300
CGACATGGCG ACCGGCGTGC GCGCTCGGCG ACGCGGCGCG GAGACCGCGG GCCCAAACGG 9360
GAATGACTGC CGCCGCCCTA TACGGAGGGG CTAAGTATCG CCCGGGGACC CTTCGAAACC 9420
CCGGGCGTGT CGCAAGTACG CCGCGAAGGC GCGGCGTGTT ATACGGCGCG TTATGTCCCG 9480
GCATTCCGTT CGTGGGTTCG GGCCCGGGTG CTGTCGGGTG GGAGTGTGTG TGTGTGGGGG 9540
GGGGGCGGCG CGACGGCGGC CCGGACCAAG TGTATCGCGG CCGTTCCGTG GGGCGGCCCA 9600
ACAGGCCCTT TAAACATTTG CGTATGCACC GGCCCAGCCA GTCGGACACC GGAACCCACC 9660
AGAGGCGGAA GCCGCCTTCG CCCGTGAGGG TGCGTGTGTT TTCTGGTGGC GTGTTTTTCC 9720
TTTCCGCCCT CCTCCCTCCC CACCTCCACC ACCCCCCCCC CACAACTCGC CCGTTGGCGA 9780
TCGGCGGGAA AACCATGAAA ACCAAGCCAC TCCCGACAGC CCCGATGGCG TGGGCCGAGA 9840
GTGCCGTGGA AACCACCACC AGCCCGCGCG AGCTCGCGGG CCACGCCCCG CTCCGGCGCG 9900
TCCTGCGCCC GCCCATCGCT CGCCGCGACG GCCCGGTGCT TTTGGGGGAC AGGGCCCCCA 9960
GGAGGACGGC CAGTACGATG TGGCTGCTGG GGATCGACCC CGCGGAGTCG TCTCCGGGAA 10020
CGCGCGCTAC CCGAGACGAT ACCGAGCAGG CCGTGGACAA GATCCTCAGG GGAGCCCGGC 10080
GCGCGGGAGG GCTGACCGTC CCCGGCGCCC CCCGCTATCA CCTGACCCGC CAGGTAACCC 10140
TGACGGATCT CTGCCAACCA AACGCGGAGC GGGCCGGGGC GCTCCTTTTG GCCCTGCGGC 10200
ACCCCACCGA CCTCCCCCAC CTGGCCCGCC ATCGGGCTCC GCCCGGCCGG CAGACCGAGC 10260
GACTGGCCGA GGCCTGGGGC CAGCTCCTGG AGGCCTCCGC CCTGGGGTCC GGGCGGGCCG 10320
AGAGCGGCTG CGCGCGCGCG GGCCTTGTGT CGTTTAACTT TCTGGTGGCC GCGTGCGCCG 10380
CCGCCTACGA TGCGCGCGAC GCCGCCGAGG CGGTCCGGGC CCACATCACG ACCAACTACG 10440
GCGGGACGCG GGCCGGGGCG CGGCTGGACC GGTTTTCCGA ATGCCTGCGC GCCATGGTCC 10500
ACACGCACGT GTTTCCCCAC GAGGTCATGC GGTTTTTCGG GGGGCTAGTG TCGTGGGTCA 10560
CACAGGACGA GCTGGCTAGC GTCACCGCCG TCTGCAGCGG ACCCCAGGAG GCCACACACA 10620
CCGGCCACCC GGGCAGGCCC TGTTCGGCCG TTACCATCCC GGCCTGCGCC TTCGTGGACC 10680
TGGACGCCGA GCTGTGCCTG GGGGGCCCTG GGGCGGCGTT CCTGTACTTG GTCTTCACCT 10740
ACCGACAGTG CCGGGACCAG GAGCTCTGTT GCGTGTACGT GGTCAAGAGC CAGCTCCCCC 10800
CGCGCGGACT GGAGGCGGCC CTCGAGCGGC TGTTCGGGCG CCTCCGGATA ACCAACACGA 10860
TTCACGGGGC CGAGGACATG ACGCCCCCTC CCCCGAACCG AAACGTTGAC TTTCCGCTCG 10920
CCGTCCTGGC CGCGAGCTCG CAATCCCCGC GGTGCTCGGC GAGCCAAGTC ACGAACCCCC 10980
AGTTTGTCGA CAGGCTGTAC CGCTGGCAGC CGGATCTGCG GGGGCGCCCT ACCGCACGCA 11040
CCTGCACATA CGCCGCCTTC GCAGAGCTGG GTGTCATGCC AGACAACAGC CCCCGCTGTC 11100
TGCACCGCAC CGAGCGGTTT GGGGCGGTCG GCGTTCCGGT TGTCATCCTG GAGGGCGTGG 11160
TGTGGCGCCC CGGCGGGTGG CGGGCCTGCG CGTGATCGTC TATTGACGAC GGCCGCCCAA 11220
CCCGAGCGAC CTTCCCCTCC CACTTTCCCC CCCCCCCCTC CTACACACCA ACTCCGCCCT 11280
CGCCGTCTTG GCCGTGCGCG GCCCCGTGCG TCCGTCTCAA TAAAGCCAGG TTAAATCCGT 11340
GACGTGGTGT GTTTGGCGTG TGTCTCTGAA ATGGCGGAAA CCGACATGCA AATGGGATTC 11400
ATGGACACGT TACACCCCCC TGACTCAGGA GATAGGCATA TCCTCCTTAG ATTGACTCAG 11460
CACACGATCG CACCCCACCC CTGTGTGCCG GGGATAAAAG CCAACGCGGG CGGTCTGGGT 11520
TACCACAACA GGTGGGTGCT TCGGGGACTT GACGGTCGCC ACTCTCCTGC GAGCCCTCAC 11580
GTCTTCGCCC ACCGATTCCT GTTGCGTTCC TGTCGGCCGG TGCTGTCCTG TCGACAGATT 11640
GTTGGCGACT GCCCGGGTGA TTCGTCGGCC GGTGCGTCCT TTCGGTCGTA CCGCCCACCC 11700
CGCCTCCCAC GGGCCCGCCG CTGTTTCCGT TCATCGCGTC CGAGCCACCG TCACCTTGGT 11760
TCCAATGGCC AACCGCCCTG CCGCATCCGC CCTCGCCGGA GCGCGGTCTC CGTCCGAACG 11820
ACAGGAACCC CGGGAGCCCG AGGTCGCCCC CCCTGGCGGC GACCACGTGT TTTGCAGGAA 11880
AGTCAGCGGC GTGATGGTGC TTTCCAGCGA TCCCCCCGGC CCCGCGGCCT ACCGCATTAG 11940
CGACAGCAGC TTTGTTCAAT GCGGCTCCAA CTGCAGTATG ATAATCGACG GAGACGTGGC 12000
GCGCGGTCAT TTGCGTGACC TCGAGGGCGC TACGTCCACC GGCGCCTTCG TCGCGATCTC 12060
AAACGTCGCA GCCGGCGGGG ATGGCCGAAC CGCCGTCGTG GCGCTCGGCG GAACCTCGGG 12120
CCCGTCCGCG ACTACATCCG TGGGGACCCA GACGTCCGGG GAGTTCCTCC ACGGGAACCC 12180
AAGGACCCCC GAACCCCAAG GACCCCAGGC TGTCCCCCCG CCCCCTCCTC CCCCCTTTCC 12240
ATGGGGCCAC GAGTGCTGCG CCCGTCGCGA TGCCAGGGGC GGCGCCGAGA AGGACGTCGG 12300
GGCCGCGGAG TCATGGTCAG ACGGCCCGTC GTCCGACTCC GAAACGGAGG ACTCGGACTC 12360
CTCGGACGAG GATACGGGCT CGGGTTCGGA GACGCTGTCT CGATCCTCTT CGATCTGGGC 12420 CGCAGGGGCG ACTGACGACG ATGACAGCGA CTCCGACTCG CGGTCGGACG ACTCCGTGCA 12480
GCCCGACGTT GTCGTTCGTC GCAGATGGAG CGACGGCCCT GCCCCCGTGG CCTTTCCCAA 12540
GCCCCGGCGC CCCGGCGACT CCCCCGGAAA CCCCGGCCTG GGCGCCGGCA CCGGGCCGGG 12600
CTCCGCGACG GACCCGCGCG CGTCGGCCGA CTCCGATTCC GCGGCCCACG CCGCCGCACC 12660
CCAGGCGGAC GTGGCGCCGG TTCTGGACAG CCAGCCCACT GTGGGAACGG ACCCCGGCTA 12720 CCCAGTCCCC CTAGAACTCA CGCCCGAGAA CGCGGAGGCG GTGGCGCGGT TTCTGGGGGA 12780
CGCCGTCGAC CGCGAGCCCG CGCTCATGCT GGAGTACTTC TGTCGGTGCG CCCGCGAGGA 12840
GAGCAAGCGC GTGCCCCCAC GAACCTTCGG CAGCGCCCCC CGCCTCACGG AGGACGACTT 12900
TGGGCTCCTG AACTACGCGC TCGCTGAGAT GCGACGCCTG TGCCTGGACC TTCCCCCGGT 12960
CCCCCCCAAC GCATACACGC CCTATCATCT GAGGGAGTAT GCGACGCGGC TGGTTAACGG 13020 GTTCAAACCC CTGGTGCGGC GGTCCGCCCG CCTGTATCGC ATCCTGGGGA TTCTGGTTCA 13080
CCTGCGCATC CGTACCCGGG AGGCCTCCTT TGAGGAATGG ATGCGCTCCA AGGAGGTGGA 13140
CCTGGACTTC GGGCTGACGG AAAGGCTTCG CGAACACGAG GCCCAGCTAA TGATCCTGGC 13200
CCAGGCCCTG AACCCCTACG ACTGTCTGAT CCACAGCACC CCGAACACGC TCGTCGAGCG 13260 GGGGCTGCAG TCGGCGCTGA AGTACGAAGA GTTTTACCTC AAGCGCTTCG GCGGGCACTA 13320
CATGGAGTCC GTCTTCCAGA TGTACACCCG CATCGCCGGG TTCCTGGCGT GCCGGGCGAC 13380
CCGCGGCATG CGCCACATCG CCCTGGGGCG ACAGGGGTCG TGGTGGGAAA TGTTCAAGTT 13440
CTTTTTCCAC CGCCTCTACG ACCACCAGAT CGTGCCGTCC ACCCCCGCCA TGCTGAACCT 13500 CGGAACCCGC AACTACTACA CGTCCAGCTG CTACCTGGTA AACCCCCAGG CCACCACTAA 13560
CCAGGCCACC CTCCGGGCCA TCACCGGCAA CGTGAGCGCC ATCCTCGCCC GCAACGGGGG 13620
CATCGGGCTG TGCATGCAGG CGTTCAACGA CGCCAGCCCC GGCACCGCCA GCATCATGCC 13680
GGCCCTGAAG GTCCTGGACT CCCTGGTGGC GGCGCACAAC AAACAGAGCA CGCGCCCCAC 13740
CGGGGCGTGC GTGTACCTGG AACCCTGGCA CAGCGACGTT CGGGCCGTGC TCAGAATGAA 13800 GGGCGTCCTC GCCGGCGAGG AGGCCCAGCG CTGCGACAAC ATCTTCAGCG CCCTCTGGAT 13860
GCCGGACCTG TTCTTCAAGC GCCTGATCCG CCACCTCGAC GGCGAGAAAA ACGTCACCTG 13920
GTCCCTGTTC GACCGGGACA CCAGCATGTC GCTCGCCGAC TTTCACGGCG AGGAGTTCGA 13980
GAAGCTGTAC GAGCACCTCG AGGCCATGGG GTTCGGCGAA ACGATCCCCA TCCAGGACCT 14040
GGCGTACGCC ATCGTGCGCA GCGCGGCCAC CACCGGAAGC CCCTTCATCA TGTTTAAGGA 14100 CGCGGTAAAC CGCCACTACA TCTACGACAC GCAAGGGGCG GCCATTGCCG GCTCCAACCT 14160
CTGCACGGAG ATCGTCCACC CGTCCTCCAA ACGCTCCAGC GGGGTCTGCA ACCTGGGCAG 14220
CGTGAATCTG GCCCGATGCG TCTCCCGGCG GACGTTCGAT TTTGGCATGC TCCGCGACGC 14280
CGTGCAGGCG TGCGTGCTAA TGGTTAATAT CATGATAGAC AGCACGCTGC AGCCGACGCC 14340
CCAGTGCGCC CGCGGCCACG ACAACCTGCG GTCCATGGGC ATTGGCATGC AGGGCCTGCA 14400 CACGGCGTGC CTGAAGATGG GCCTGGATCT GGAGTCGGCC GAGTTCCGGG ACCTGAACAC 14460
ACACATCGCC GAGGTGATGC TGCTCGCGGC CATGAAGACC AGTAACGCGC TGTGCGTTCG 14520
CGGGGCGCGT CCCTTCAGCC ACTTTAAGCG CAGCATGTAC CGGGCCGGCC GCTTTCACTG 14580
GGAGCGCTTT TCGAACGCCA GCCCGCGGTA CGAGGGCGAG TGGGAGATGC TACGCCAGAG 14640
CATGATGAAA CACGGCCTGC GCAACAGCCA GTTCATCGCG CTCATGCCCA CCGCCGCCTC 14700 GGCCCAGATC TCGGACGTCA GCGAGGGCTT TGCCCCCCTG TTCACCAACC TGTTCAGCAA 14760
GGTGACCAGG GACGGCGAGA CGCTGCGCCC CAACACGCTC TTGCTGAAGG AACTCGAGCG 14820
CACGTTCGGC GGGAAGCGGC TCCTGGACGC GATGGACGGG CTCGAGGCCA AGCAGTGGTC 14880
TGTGGCCCAG GCCCTGCCTT GCCTGGACCC CGCCCACCCC CTCCGGCGGT TCAAGACGGC 14940
CTTCGACTAC GACCAGGAAC TGCTGATCGA CCTGTGTGCA GACCGCGCCC CCTATGTTGA 15000 TCACAGCCAA TCCATGACTC TGTATGTCAC AGAGAAGGCG GACGGGACGC TCCCCGCCTC 15060
CACCCTGGTC CGCCTTCTCG TCCACGCATA TAAGCGCGGC CTGAAGACGG GGATGTACTA 15120
CTGCAAGGTT CGCAAGGCGA CCAACAGCGG GGTGTTCGCC GGCGACGACA ACATCGTCTG 15180
CACAAGCTGC GCGCTGTAAG CAACAGCGCT CCGATCGGGG TCAGGCGTCG CTCTCGGTCC 15240
CGCATATCGC CATGGATCCC GCCGTCTCCC CCGCGAGCAC CGACCCCCTA GATACCCACG 15300 CGTCGGGGGC CGGGGCGGCC CCGATTCCGG TGTGCCCCAC CCCCGAGCGG TACTTCTACA 15360
CCTCCCAGTG CCCCGACATC AACCACCTTC GCTCCCTCAG CATCCTGAAC CGCTGGCTGG 15420
AGACCGAGCT CGTGTTCGTG GGGGACGAGG AGGACGTCTC CAAGCTCTCC GAGGGCGAGC 15480
TCGGCTTCTA CCGCTTTCTG TTTGCCTTCC TGTCGGCCGC GGACGACCTG GTGACGGAAA 15540
ACCTGGGCGG CCTCTCCGGC CTCTTCGAAC AGAAGGACAT TCTTCACTAC TACGTGGAGC 15600 AGGAATGCAT CGAGGTCGTC CACTCGCGCG TCTACAACAT CATCCAGCTG GTGCTCTTTC 15660
ACAACAACGA CCAGGCGCGC CGCGCCTATG TGGCCCGCAC CATCAACCAC CCGGCCATTC 15720
GCGTCAAGGT GGACTGGCTG GAGGCGCGGG TGCGGGAATG CGACTCGATC CCGGAGAAGT 15780
TCATCCTCAT GATCCTCATC GAGGGCGTCT TTTTTGCCGC CTCGTTCGCC GCCATCGCGT 15840 ACCTGCGCAC CAACAACCTC CTGCGGGTCA CCTGCCAGTC GAACGACCTC ATCAGCCGCG 15900
ACGAGGCCGT GCATACGACA GCCTCGTGCT ACATCTACAA CAACTACCTC GGGGGCCACG 15960
CCAAGCCCGA GGCGGCGCGC GTGTACCGGC TGTTTCGGGA GGCGGTGGAT ATCGAGATCG 16020
GGTTCATCCG ATCCCAGGCC CCGACGGACA GCTCTATCCT GAGTCCGGGG GCCCTGGCGG 16080 CCATCGAGAA CTACGTGCGA TTCAGCGCGG ATCGCCTGCT GGGCCTGATC CATATGCAGC 16140
CCCTGTATTC CGCCCCCGCC CCCGACGCCA GCTTTCCCCT CAGCCTCATG TCCACCGACA 16200
AACACACCAA CTTCTTCGAG TGCCGCAGCA CCTCGTACGC CGGGGCCGTC GTCAACGATC 16260
TGTGAGGGTC TGGGCGCCCT TGTAGCGATG TCTAACCGAA ATAAAGGGGT CGAAACGGAC 16320
TGTTGGGTCT CCGGTGTGAT TATTACGCAG GGGAGGGGGG TGGCGGCTGG GGAAAGGGAA 16380 GGAACGCCCG AAACCAGAGA AAAGGACCAA AAGGGAAACG CGTCCAACCG ATAAATCAAG 16440
CGCCGACCAG AACCCCGAGA TGCATAATAA CAAACGATTT TATTACTCTT ATTATTAACA 16500
GGTCGGGCAT CGGGAGGGGA TGGGGGCGCG CGTTTCCTCC GTTCCGGCTA CTCGTCCCAG 16560
AATTTAGCCA GGACGTCCTT GTAAAACGCG GGCGGGGGCG CGTGGGCCCA CAGCTGCGCC 16620
AGAAACCGGT CGGCGATGTC CGGGGCGGTG ATATGCCGAG TCACGATGGA GCGCGCTAAA 16680 TCTTCGTCGC GGAGGTCCTG ATAGATGGGC AGTCTTTTTA GAAGAGTCCA GGGTCCCCGC 16740
TCCTTGGGGC TGATAAGCGA TATGACGTAC TTGACGTATC TGTGCTCCAC CAGCTCGGCG 16800
ATGGTCATCG GATCGGGCAG CCAGTCCAGG GCCTCCGGGG CGTCGTGGAT GACGTGGCGG 16860
CGACGTCCGG CGACATAGCC GCGGTGTTCC GCGACCCGCT GCGCGTTGGG GACCTGCACG 16920
AGCTCGGGCG GGGTGAGTAT CTCCGAGGAG GACGACCGGG CGCCGTCGCG CGGCCCACCG 16980 GCGACGTCCG GGGGCTGGAG GGGGGGGTCT TCTTCGTAGT CGTCCTCGCC CGCGATCTGT 17040
TGGGCCAGAA TTTCGGTCCA CGAGATGCGC GTCTCGAGGC CGACCGGGGC CGCGGTCAGC 17100
GTAGGCATGC TCTCCAGGGA GCGCGAGTTG GCGCGCTCCC GCCGGGCCGC CCGGCGGGCC 17160
TGGGATCGGC TCGGGGCGGT CCAGTGACAC TCGCGCAGCA CGTCCTCGAC GGACGCGTAG 17220
GTGTTATTGG GGTGCAGGTC TGTGTGGCAG CGGACGAACA GCGCCAGGAA CTGCGGGTAA 17280 CTCATCTTGA AGTACTGCAG CAGGTCGCGG CAGTGAATCG TCGGAATGTA GCCGGTGCTG 17340
ATGTCCAACA CGATATCGCA GCCCATCAGC AGGAGATCGG TATCCGTGGT ATGCACGTAC 17400
GCGACCGTGT TGGTATGATA GAGGTTCGCG CAGGCGTCGT CGGCCTCCAG CTGACCCGAG 17460
TTGATGTAGG CGTACCCCAG CGCCCGCAGA ACGCGGATAC AGAACAGGTG AGCCAGGCGC 17520
AGGGCCGGCT TCGAGGGCGC GCCCCAGGGG GCCGCCGGGC CTGGGCCGGC GGCCCGCGTT 17580 CCCCGGTCCC CCGGGGCGAA GGCGTGCCCG CGGCGGCGCA TGTTGGAAAA AGGCGAAACT 17640
GGGCCTGGAG TCGGTGATGG GGGAAGGCGG CGGCGAGGCG TCTACGTCAC TGGCCTCCTC 17700
GTCCGTGCGG CACTGGGCCG TCGTGCGGGC CAGGATCGCC TTGGCCCCGA ACACAACCGG 17760
CTCGGTACAC TCGACCCCGC GATCGGTCAC GAAGATGGGG AACAGGGACT TTTGGGTAAA 17820
CACCCGTAAC ATACTACAGA GACAGTGTAG CGTGATTGCC TCGCGGTCGT AACTTGGGTA 17880 GCGGCGCTGA TATTTAACCA CCAGGGTATA CATGACATTC CACAGGTCCA CGGCGATGGG 17940
GGTAAAGTAG CCCTCCGGGG CCCGGAGGCC CCGGCGCTTC ACCAGATGGT GAGTCTGGGC 18000
AAACTTCATC ATGCCAAACA GACCCATTCC GGCACGATTG TAGGTGCGGA TAGGTCTCTC 18060
TACAGAGCTG TATAGGTGTG ACGGTCCGGG ACACCCAAGC CCGCCGCCCC TGTGTACAGT 18120
GGCTGCGGCG ACGACCCCGC TCCAACAAGA CGCTATCCCG GGAAAGGCAC GCTCTTTATA 18180 ATTCTTTTTT ATTTCCCATC TACGTGCGGA TTGGTGCAAC CGCCGGCGCG CGCCGGTGCA 18240
GGCCGACCAT CTCTCTCTTC CCCCCCTCCC CCTCCCCCGA GCCCTCAAAG AGGGTGTGGC 18300
CTAACTAGCG GAAGGCGTAT TTAACCAGAC TAGGGCGGCG GGTCCGCCGT AGTCCTTGGC 18360
TCGGGTAGCC ACTGCTCTGT GGCTCGGGTC CCCCGGCCCC CCTAACCCCC ATCCGGTCCG 18420 CGTCATCCGC CCCCTCCGCC TGCGACACAA ACGGCCGCGC CTCCGGGCCC GGTGACACGA 18480
CGCGCCTCGT CTCTGCGGAT TGTCCCGGGA GCGTCGCGGC ATGGCTCATC TTCCCGGCGG 18540
TGCGGCCGCC GCCCCCCTTT CGGAGGACGC GATCCCGTCG CCGCGCGAGC GGACGGAAGA 18600
CTGGCCGCCC TGCCAGATAG TGCTGCAGGG CGCCGAGCTG AACGGGATCC TGCAGGCCTT 18660 TGCGCCGCTT CGCACGAGCC TTTTGGACTC GCTCCTGGTC GTGGGCGACC GAGGCATCCT 18720
TGTACATAAC GCGATTTTCG GCGAGCAGGT GTTTCTGCCC CTCGACCATT CGCAGTTCAG 18780
TCGCTATCGA TGGGGCGGAC CCACCGCGGC GTTCCTGTCT CTCGTGGACC AGAAGCGATC 18840
CCTGCTGAGC GTGTTTCGCG CCAACCAGTA CCCTGACCTG CGGCGGGTGG AGCTGACGGT 18900
CACGGGCCAG GCCCCGTTTC GCACGCTGGT GCAGCGCATA TGGACGACCG CGTCCGACGG 18960 AGAGGCCGTG GAGCTTGCCA GCGAGACGCT CATGAAACGC GAGTTGACGA GCTTCGCGGT 19020
ACTACTCCCC CAGGGCGACC CCGACGTCCA GCTGCGCCTC ACGAAGCCCC AGCTCACGAA 19080
GGTGGTGAAC GCCGTCGGGG ACGAGACCGC CAAACCCACC ACGTTCGAGC TCGGCCCCAA 19140
CGGCAAGTTT TCCGTGTTTA ACGCGCGCAC CTGCGTCACC TTTGCCGCCC GCGAGGAGGG 19200
CGCGTCGTCC AGCACCAGCG CCCAGGTCCA GATTCTGACC AGCGCGCTGA AGAAGGCGGG 19260 CCAGGCGGCC GCCAACGCCA AGACGGTCTA CGGGGAAAAC ACACACCGCA CATTCTCGGT 19320
GGTCGTCGAC GACTGCAGCA TGCGGGCGGT CCTCCGGCGG CTCCAGGTCG GCGGGGGGAC 19380
CCTCAAGTTC TTCCTCACGG CCGACGTCCC CAGCGTGTGT GTCACCGCCA CCGGCCCCAA 19440
CGCGGTGTCG GCGGTGTTTC TTTTAAAACC CCAGCGGGTC TGCCTGAACT GGCTCGGCCG 19500
GACCCCGGGT TCCTCGACCG GGAGCTTGGC GTCCCAGGAC TCTCGGGCCG GCCCGACCGA 19560 CAGCCAGGAC TTCTCCTCCG AGCCGGACGC GGGCGACCGC GGCGCCCCAG AAGAAGAAGG 19620
CCTCGAGGGC CAGGCCCGGG TCCCGCCCGC GTTCCCGGAA CCGCCGGGAA CCAAGCGGAG 19680
GCACGCCGGG GCCGAAGTTG TCCCCGCGGA CGACGCCACC AAGCGCCCGA AGACGGGCGT 19740
GCCCGCCGCC CCCACGCGAG CCGAGTCGCC CCCCCTCTCC GCGAGATACG GACCCGAGGC 19800
GGCGGAGGGT GGTGGGGACG GCGGCCGCTA CGCGTGCTAC TTTCGCGACC TCCAGACCGG 19860 CGACGCGAGC CCCAGCCCCC TCTCCGCCTT CCGGGGTCCC CAAAGACCCC CATACGGCTT 19920
TGGGTTGCCC TGACGGCGAC GGGTGGTGGC CGAACGCTTC ACCGCGCCCG GGCACGCGGG 19980
GTGCGTTGTG TTAAAAAAAT AAATAAATGG GGTAGTGTGT CCCCCCCCTC CAACCAATAT 20040
GGCTGTCGTG TGTGGTTCCG GGTTGCGCCT CCGTCCTTTC CACCCCCCTT CCCCCTCCTT 20100
TTTTGTTTTG CGTGCGCTTA TAAGAGCGGG CCCGGGGCCC TTCGCAGCTT CACCGAGAGC 20160 GCCGTCGGGC CCCGGGTGCG GGATGTGTCG CGGGGACAGC CCCGGGGTCG CGGGCGGGAG 20220
CGGCGAACAC TGCCTCGGAG GGGATGATGG GGACGACGGG CGCCCCCGCC TCGCCTGCGT 20280
GGGTGCCATC GCTCGGGGGT TCGCGCATCT CTGGCTCCAG GCCACCACGC TGGGCTTCGT 20340
GGGGTCTGTC GTTCTGTCGC GCGGCCCGTA TGCGGACGCC ATGTCGGGGG CGTTCGTGAT 20400
CGGGAGCACC GGCCTGGGGT TCCTCCGCGC CCCCCCCGCG TTCGCCCGGC CGCCGACGCG 20460 TGTGTGCGCG TGGCTGAGGC TGGTCGGCGG GGGAGCGGCC GTGGCCCTGT GGAGCCTCGG 20520
GGAGGCCGGC GCGCCTCCGG GGGTTCCGGG CCCGGCGACC CAGTGCCTGG CGCTCGGGGC 20580
CGCCTACGCG GCGCTGCTGG TGCTGGCCGA CGACGTCCAT CCCCTTTTCC TCCTCGCCCC 20640
GCGGCCCCTG TTTGTCGGCA CCCTGGGGGT TGTCGTCGGC GGGCTGACGA TAGGCGGCAG 20700
TGCGCGCTAC TGGTGGATCG ACCCCCGCGC CGCCGCGGCC CTGACGGCGG CGGTGGTGGC 20760 GGGCCTCGGG ACAACCGCCG CCGGGGACAG CTTTTCCAAG GCCTGTCCCC GCCACCGCCG 20820
CTTTTGCGTC GTCTCCGCGG TCGAGTCTCC CCCGCCCCGA TACGCCCCGG AGGACGCCGA 20880
GCGGCCAACA GACCACGGAC CCCTGTTACC GTCGACGCAC CACCAGCGAT CTCCGCGGGT 20940
CTGCGGCGAC GGGGCCGCAC GGCCCGAAAA CATCTGGGTT CCCGTGGTGA CCTTTGCGGG 21000 CGCGCTCGCG CTGGCCGCCT GCGCCGCGCG AGGG 21035
(2) INFORMATION FOR SEQ ID NO: 120:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1850 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:
Val Ala Gly Ala Ala His Met Ile Pro Ala Ala Leu Pro His Pro Thr 1 5 10 15
Met Lys Arg Gln Gly Asp Arg Asp Ile Val Val Thr Gly Val Arg Asn
20 25 30
Gln Phe Ala Thr Asp Leu Glu Pro Gly Gly Ser Val Ser Cys Met Arg 35 40 45
Ser Ser Leu Ser Phe Leu Ser Leu Leu Phe Asp Val Gly Pro Arg Asp
50 55 60
Val Leu Ser Ala Glu Ala Ile Glu Gly Cys Leu Val Glu Gly Gly Glu 65 70 75 80 Trp Thr Arg Ala Ala Ala Gly Ser Gly Pro Pro Arg Met Cys Ser Ile
85 90 95
Ile Glu Leu Pro Asn Phe Leu Glu Tyr Pro Ala Arg Gly Leu Arg Cys
100 105 110
Val Phe Ser Arg Val Tyr Gly Glu Val Gly Phe Phe Gly Glu Pro Thr 115 120 125
Ala Gly Leu Leu Glu Thr Gln Cys Pro Ala His Thr Phe Phe Ala Gly
130 135 140
Pro Trp Ala Met Arg Pro Leu Ser Tyr Thr Leu Leu Thr Ile Gly Pro 145 150 155 160 Leu Gly Met Gly Arg Asp Gly Asp Thr Ala Tyr Leu Phe Asp Pro His
165 170 175
Gly Leu Pro Ala Gly Thr Pro Ala Phe Ile Ala Lys Val Arg Ala Gly
180 185 190
Asp Val Tyr Pro Tyr Leu Thr Tyr Tyr Ala His Asp Arg Pro Lys Val 195 200 205
Arg Trp Ala Gly Ala Met Val Phe Phe Val Pro Ser Gly Pro Gly Ala
210 215 220
Val Ala Pro Ala Asp Leu Thr Ala Ala Ala Leu His Leu Tyr Gly Ala 225 230 235 240
Ser Glu Thr Tyr Leu Gln Asp Glu Pro Phe Val Glu Arg Arg Val Ala
245 250 255
Ile Thr His Pro Leu Arg Gly Glu Ile Gly Gly Leu Gly Ala Leu Phe 260 265 270
Val Gly Val Val Pro Arg Gly Asp Gly Glu Gly Ser Gly Pro Val Val
275 280 285
Pro Ala Leu Pro Ala Pro Thr His Val Gln Thr Pro Arg Ala Asp Arg
290 295 300 Pro Pro Glu Ala Pro Arg Gly Ala Ser Gly Pro Pro Asn Thr Pro Gln
305 310 315 320
Ala Gly His Pro Asn Arg Pro Pro Asp Asp Val Trp Ala Ala Ala Leu
325 330 335
Glu Gly Thr Pro Pro Ala Lys Pro Ser Ala Pro Asp Ala Ala Ala Ser 340 345 350
Gly Pro Pro His Ala Ala Pro Pro Pro Gln Thr Pro Ala Gly Asp Ala
355 360 365
Ala Glu Glu Ala Glu Asp Leu Arg Val Leu Glu Val Gly Ala Val Pro
370 375 380 Val Gly Arg His Arg Ala Arg Tyr Ser Thr Gly Leu Pro Lys Arg Arg
385 390 395 400
Arg Pro Thr Trp Thr Pro Pro Ser Ser Val Glu Asp Leu Thr Ser Gly
405 410 415
Glu Arg Pro Ala Pro Lys Ala Pro Pro Ala Lys Ala Lys Lys Lys Ser 420 425 430
Ala Pro Lys Lys Lys Ala Pro Val Ala Ala Glu Val Pro Ala Ser Ser
435 440 445
Pro Thr Pro Ile Ala Ala Thr Val Pro Pro Ala Pro Asp Thr Pro Pro
450 455 460 Gln Ser Gly Gln Gly Gly Gly Asp Asp Gly Pro Asp Ser Ser Pro Ser
465 470 475 480
Val Leu Glu Thr Leu Gly Ala Arg Arg Pro Pro Glu Pro Pro Gly Ala
485 490 495
Asp Leu Ala Gln Leu Phe Glu Val His Pro Asn Val Ala Ala Thr Ala 500 505 510
Val Arg Leu Ala Ala Arg Asp Ala Ala Arg Glu Val Ala Ala Cys Ser
515 520 525
Gln Leu Thr Ile Asn Ala Leu Arg Ser Pro Tyr Pro Ala His Pro Gly 530 535 540 Leu Leu Glu Leu Cys Val Ile Phe Phe Phe Glu Arg Val Leu Ala Phe 545 550 555 560
Leu Ile Glu Asn Gly Ala Arg Thr His Thr Gln Ala Gly Val Ala Gly 565 570 575 Pro Ala Ala Ala Leu Leu Asp Phe Thr Leu Arg Met Leu Pro Arg Lys
580 585 590
Thr Ala Val Gly Asp Phe Leu Ala Ser Thr Arg Met Ser Leu Ala Asp 595 600 605 Val Ala Ala His Arg Pro Leu Ile Gln His Val Leu Asp Glu Asn Ser 610 615 620
Gln Ile Gly Arg Leu Ala Lys Leu Val Leu Val Ala Arg Asp Val Ile 625 630 635 640
Arg Glu Thr Asp Ala Phe Tyr Gly Asp Leu Ala Asp Leu Asp Leu Gln 645 650 655
Leu Arg Ala Ala Pro Pro Ala Asn Leu Tyr Ala Arg Leu Gly Glu Trp
660 665 670
Leu Leu Glu Arg Ser Arg Ala His Pro Asn Thr Leu Phe Ala Pro Ala 675 680 685 Thr Pro Thr His Pro Glu Pro Leu Leu His Arg Ile Gln Ala Gln Phe 690 695 700
Arg Glu Glu Met Arg Val Glu Ala Glu Ala Arg Glu Met Arg Glu Ala 705 710 715 720
Leu Asp Arg Val Asp Ser Val Ser Gln Arg Ala Gly Pro Leu Thr Val 725 730 735
Met Pro Val Pro Ala Ala Pro Gly Ala Gly Gly Arg Ala Pro Cys Pro
740 745 750
Pro Ala Leu Gly Pro Glu Ala Ile Gln Ala Arg Leu Glu Asp Val Arg 755 760 765 Ile Gln Ala Arg Arg Ala Ile Glu Ser Ala Ile Lys Glu Tyr Phe His 770 775 780
Arg Gly Ala Val Tyr Ser Ala Lys Ala Leu Gln Ala Ser Asp Ser His 785 790 795 800
Asp Cys Arg Phe His Val Ala Ser Ala Ala Val Val Pro Met Val Gln 805 810 815
Leu Leu Glu Ser Leu Pro Ala Phe Asp Gln His Thr Arg Asp Val Ala
820 825 830
Gln Arg Ala Ala Leu Pro Pro Pro Pro Pro Leu Ala Thr Ser Pro Gln 835 840 845 Ala Ile Leu Leu Arg Asp Leu Leu Gln Arg Gly Gln Thr Leu Asp Ala 850 855 860
Pro Glu Asp Leu Ala Ala Trp Leu Ser Val Leu Thr Asp Ala Ala Thr 865 870 875 880
Gln Gly Leu Ile Glu Arg Lys Pro Leu Glu Glu Leu Ala Arg Ser Ile 885 890 895
His Gly Ile Asn Asp Gln Gln Ala Arg Arg Ser Ser Gly Leu Ala Glu
900 905 910
Leu Gln Arg Phe Asp Ala Leu Asp Ala Ala Gln Gln Leu Asp Ser Asp 915 920 925
Ala Ala Phe Val Pro Ala Thr Gly Pro Ala Pro Tyr Val Asp Gly Gly
930 935 940
Gly Leu Ser Pro Glu Ala Thr Arg Met Ala Glu Asp Ala Leu Arg Gln 945 950 955 960
Ala Arg Ala Met Glu Ala Ala Lys Met Thr Ala Glu Leu Ala Pro Glu
965 970 975
Ala Arg Ser Arg Leu Arg Glu Arg Ala His Ala Leu Glu Ala Met Leu 980 985 990 Asn Asp Ala Arg Glu Arg Ala Lys Val Ala His Asp Ala Arg Glu Lys 995 1000 1005
Phe Leu His Lys Leu Gln Gly Val Leu Arg Pro Leu Pro Asp Phe Val
1010 1015 1020
Gly Leu Lys Ala Cys Pro Ala Val Leu Ala Thr Leu Arg Ala Ser Leu 1025 1030 1035 1040
Pro Ala Gly Trp Thr Asp Leu Ala Asp Ala Val Arg Gly Pro Pro Pro
1045 1050 1055
Glu Val Thr Ala Ala Leu Arg Ala Asp Leu Trp Gly Leu Leu Gly Gln 1060 1065 1070 Tyr Arg Glu Ala Leu Glu His Pro Thr Pro Asp Thr Ala Thr Ala Gly 1075 1080 1085
Leu His Pro Ala Phe Val Val Val Leu Lys Thr Leu Phe Ala Asp Ala
1090 1095 1100
Pro Glu Thr Pro Val Leu Val Gln Phe Phe Ser Asp His Ala Pro Thr 1105 1110 1115 1120
Ile Ala Lys Ala Val Ser Asn Ala Ile Asn Ala Gly Ser Ala Ala Val
1125 1130 1135
Ala Thr Asp Ala Ala Thr Val Asp Ala Ala Val Arg Ala His Gly Ala 1140 1145 1150 Asp Ala Val Ser Ala Leu Gly Ala Ala Ala Arg Asp Pro Asp Leu Ser 1155 1160 1165
Phe Leu Ala Ala Asp Ser Ala Ala Gly Tyr Val Lys Ala Thr Arg Leu
1170 1175 1180
Ala Leu Glu Arg Ala Ile Asp Glu Leu Thr Thr Leu Gly Ser Ala Ala 1185 1190 1195 1200
Ala Asp Leu Val Val Gln Ala Arg Arg Ala Cys Ala Gln Pro Glu Gly
1205 1210 1215
Asp His Ala Ala Leu Ile Asp Ala Ala Ala Arg Ala Thr Thr Ala Ala 1220 1225 1230 Arg Glu Ser Leu Ala Gly His Glu Ala Gly Phe Gly Gly Leu Leu His 1235 1240 1245
Ala Glu Gly Thr Ala Gly Asp His Ser Pro Ser Gly Arg Ala Leu Gln 1250 1255 1260 Glu Leu Gly Lys Val Ile Gly Ala Thr Arg Arg Arg Ala Asp Glu Leu
1265 1270 1275 1280
Glu Ala Ala Val Ala Asp Leu Thr Ala Lys Met Ala Ala Gln Arg Arg
1285 1290 1295 Ser Ser Trp Ala Ala Gly Val Glu Ala Ala Leu Asp Arg Val Glu Asn
1300 1305 1310
Arg Ala Glu Phe Asp Val Val Glu Leu Arg Arg Leu Gln Ala Gly Thr
1315 1320 1325
His Gly Tyr Asn Pro Arg Asp Phe Arg Lys Arg Ala Glu Gln Ala Ala 1330 1335 1340
Asn Ala Glu Ala Val Thr Leu Ala Leu Asp Thr Ala Phe Ala Phe Asn
1345 1350 1355 1360
Pro Tyr Thr Pro Glu Asn Gln Arg His Pro Met Leu Pro Pro Leu Ala
1365 1370 1375 Ala Ile His Arg Leu Gly Trp Ser Ala Ala Phe His Ala Ala Ala Glu
1380 1385 1390
Thr Tyr Ala Asp Met Phe Arg Val Asp Ala Glu Pro Leu Ala Arg Leu
1395 1400 1405
Leu Arg Ile Ala Glu Gly Leu Leu Glu Met Ala Gln Ala Gly Asp Gly 1410 1415 1420
Phe Ile Asp Tyr His Glu Ala Val Gly Arg Leu Ala Asp Asp Met Thr
1425 1430 1435 1440
Ser Val Pro Gly Leu Arg Arg Tyr Val Pro Phe Phe Gln His Gly Tyr
1445 1450 1455 Ala Asp Tyr Val Glu Leu Arg Asp Arg Leu Asp Ala Ile Arg Ala Asp
1460 1465 1470
Val His Arg Ala Leu Gly Gly Val Pro Leu Asp Leu Ala Ala Ala Ala
1475 1480 1485
Glu Gln Ile Ser Ala Ala Arg Asn Asp Pro Glu Ala Thr Ala Glu Leu 1490 1495 1500
Val Arg Thr Gly Val Thr Leu Pro Cys Pro Ser Glu Asp Ala Leu Val
1505 1510 1515 1520
Ala Cys Ala Ala Ala Leu Glu Arg Val Asp Gln Ser Pro Val Lys Asn
1525 1530 1535 Thr Ala Tyr Ala Glu Tyr Val Ala Phe Val Thr Arg Gln Asp Thr Ala
1540 1545 1550
Glu Thr Lys Asp Ala Val Val Arg Ala Lys Gln Gln Arg Ala Glu Ala
1555 1560 1565
Thr Glu Arg Val Met Ala Gly Leu Arg Glu Ala Ala Arg Glu Arg Arg 1570 1575 1580
Ala Gln Ile Glu Ala Glu Gly Leu Ala Asn Leu Lys Thr Met Leu Lys 1585 1590 1595 1600
Val Val Ala Val Pro Ala Thr Val Ala Lys Thr Leu Asp Gln Ala Arg 1605 1610 1615
Ser Val Ala Glu Ile Ala Asp Gln Val Glu Val Leu Leu Asp Gln Thr
1620 1625 1630
Glu Lys Thr Arg Glu Leu Asp Val Pro Ala Val Ile Trp Leu Glu His 1635 1640 1645
Ala Gln Arg Thr Phe Glu Thr His Pro Leu Ser Ala Arg Asp Gly Pro
1650 1655 1660
Gly Pro Leu Ala Arg His Ala Gly Arg Leu Gly Ala Leu Phe Asp Thr 1665 1670 1675 1680 Arg Arg Arg Val Asp Ala Leu Arg Arg Ser Leu Glu Glu Ala Glu Ala
1685 1690 1695
Glu Trp Asp Glu Val Trp Gly Arg Phe Gly Arg Val Arg Gly Gly Ala
1700 1705 1710
Trp Lys Ser Pro Glu Gly Phe Arg Ala Met His Glu Gln Leu Arg Ala 1715 1720 1725
Leu Gln Asp Thr Thr Asn Thr Val Ser Gly Leu Arg Ala Gln Pro Ala
1730 1735 1740
Tyr Glu Arg Leu Ser Ala Arg Tyr Gln Gly Val Leu Gly Ala Lys Gly 1745 1750 1755 1760 Ala Glu Arg Ala Glu Ala Val Glu Glu Leu Gly Ala Arg Val Thr Lys
1765 1770 1775
His Thr Ala Leu Cys Ala Arg Leu Arg Asp Glu Val Val Arg Arg Val
1780 1785 1790
Pro Trp Glu Met Asn Phe Asp Ala Leu Gly Arg Leu Leu Ala Glu Phe 1795 1800 1805
Asp Ala Ala Ala Ala Asp Leu Ala Pro Trp Ala Val Glu Glu Phe Arg
1810 1815 1820
Gly Ala Arg Glu Leu Ile Gln Tyr Arg Met Gly Ser Ala Tyr Ala Arg 1825 1830 1835 1840 Ala Gly Gly Gln Thr Xaa Xaa Xaa Xaa Xaa
1845 1850
(2) INFORMATION FOR SEQ ID NO: 121:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1100 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121:
Met Ser Asp Ser Ala Leu Gln Val Pro Ala Pro Ala Gly Met Thr Pro
1 5 10 15
Pro Ser Ala Pro Pro Pro Asn Gly Pro Leu Gln Val Leu Leu Gly Ser 20 25 30
Leu Thr Asn Leu Arg Arg Pro Pro Ser Pro Ser Ser Glu Pro Ala Gly
35 40 45
Ser Ala Asp Glu Pro Ala Phe Leu Ser Ala Ala Lys Leu Arg Ala Ala 50 55 60 Thr Ala Ala Phe Leu Leu Ser Gly Ala Ala Val Gly Pro Ala Glu Ala 65 70 75 80
Arg Ala Cys Trp His Pro Leu Leu Glu Gln Leu Cys Ala Leu His Arg
85 90 95
Ala His Gly Leu Pro Glu Thr Ala Leu Leu Ala Glu Asn Leu Pro Gly 100 105 110
Leu Leu Val His Arg Met Ala Val Pro Glu Thr Pro Glu Ala Ala Phe
115 120 125
Arg Glu Met Asp Val Ile Lys Asp Thr Val Leu Ala Ile Thr Gly Ser
130 135 140 Asp Thr Thr His Ala Leu Glu Ala Ala Gly Leu Arg Thr Thr Ala Ala
145 150 155 160
Leu Gly Pro Val Arg Val Arg Gln Cys Ala Val Glu Trp Ile Asp Arg
165 170 175
Trp Arg Thr Val Thr Gln Ser Cys Leu Ala Met Asn Pro Arg Thr Ser 180 185 190
Leu Glu Ala Leu Gly Glu Met Ser Leu Lys Met Ser Pro Val Pro Leu
195 200 205
Gly Gln Pro Gly Ala Asn Leu Thr Thr Pro Ala Tyr Ser Leu Leu Phe
210 215 220 Pro Ser Pro Ile Val Gln Glu Gly Leu Arg Phe Leu Ala Leu Val Ser
225 230 235 240
Asn Trp Val Thr Leu Phe Ser Ala His Leu Gln Arg Ile Asp Asp Ala
245 250 255
Ala Leu Thr Pro Leu Thr Arg Ala Leu Phe Thr Leu Ala Leu Val Asp 260 265 270
Glu Tyr Leu Thr Thr Pro Asp Arg Gly Ala Val Val Pro Pro Pro Leu
275 280 285
Leu Ala Gln Phe Gln His Thr Val Arg Glu Ile Asp Pro Ala Ile Met 290 295 300 Ile Pro Pro Leu Glu Ala Thr Lys Met Val Arg Ser Arg Glu Glu Val 305 310 315 320
Arg Val Ser Thr Ala Leu Ser Arg Val Ser Pro Arg Ser Ala Cys Ala 325 330 335 Pro Pro Gly Thr Leu Met Ala Arg Val Arg Thr Asp Ala Ala Val Phe
340 345 350
Asp Pro Asp Val Pro Phe Leu Ser Ala Ser Ala Ile Phe Arg Pro Ala 355 360 365 Val Thr Gly Leu Leu Gln Leu Gly Glu Pro Pro Ser Ala Gly Ala Gln 370 375 380
Gln Arg Leu Leu Ala Leu Leu Gln Gln Thr Trp Ala Leu Val Gln Asn 385 390 395 400
Ser Asn Ser Pro Ser Val Val Ile Asn Thr Leu Thr Asp Ala Gly Phe 405 410 415
Thr Pro Ala His Cys Thr Gln Tyr Ile Ser Ala Leu Glu Gly Phe Leu
420 425 430
Val Ala Gly Val Pro Ala Arg Thr Pro Pro Gly His Gly Leu Ser Glu 435 440 445 Ile Gln Gln Leu Phe Gly Cys Ile Ala Gly Ala Asn Val Phe Gly Leu 450 455 460
Ala Arg Glu Tyr Gly His Tyr Ala Gly Tyr Val Lys Thr Phe Arg Arg 465 470 475 480 Ile Gln Gly Ala Ser Glu His Thr His Gly Arg Leu Cys Glu Ala Val 485 490 495
Gly Leu Ser Gly Gly Val Leu Ser Gln Thr Leu Ala Arg Ile Met Gly
500 505 510
Pro Ala Val Pro Thr Glu His Leu Ala Ser Leu Arg Arg Thr Leu Val 515 520 525 Gly Glu Phe Glu Thr Ala Glu Arg Arg Phe Ser Ala Gly Gln Pro Ser 530 535 540
Leu Leu Arg Glu Thr Ala Leu Ile Trp Leu Asp Val Tyr Gly Gln Thr 545 550 555 560
His Trp Asp Leu Thr Pro Thr Thr Pro Ala Thr Pro Leu Ser Ala Leu 565 570 575
Leu Pro Val Gly Pro Pro Ser His Ala Pro Ser Val His Leu Ala Ala
580 585 590
Ala Thr Lys Ile Arg Phe Pro Ala Leu Glu Gly Ile His Pro Asn Val 595 600 605 Leu Ala Asp Pro Gly Phe Val Pro Tyr Val Leu Ala Leu Val Val Gly 610 615 620
Asp Ala Leu Arg Ala Thr Cys Asn Ala Ala Tyr Leu Pro Arg Pro Ile 625 630 635 640
Glu Phe Ala Leu Arg Val Leu Ala Trp Ala Arg Asp Phe Gly Leu Gly 645 650 655
Tyr Leu Pro Thr Val Glu Gly His Arg Thr Lys Leu Gly Ala Leu Ile
660 665 670
Thr Leu Leu Glu Pro Ala Thr Arg Ala Gly Val Gly Pro Thr Met Gln 675 680 685
Met Ala Asp Asn Ile Glu Gln Leu Leu Arg Glu Leu Tyr Val Ile Arg
690 695 700
Ala Val Glu Gln Leu Arg Pro Ala Val Gln Leu Pro Pro Pro Gln Pro 705 710 715 720
Pro Glu Val Gly Ser Ser Leu Leu Leu Ile Ser Met Tyr Ala Arg Val
725 730 735
Leu Gln Glu Leu Ala Glu Arg Ala Asp Pro Leu Val Arg Gln Leu Glu 740 745 750 Asp Ala Ile Val Leu Leu Arg Leu His Met Arg Thr Leu Ala Ala Phe 755 760 765
Phe Glu Cys Arg Phe Glu Ser Asp Gly His Arg Leu Tyr Ala Val Val
770 775 780
Ala Asp Ala His Glu Arg Leu Gly Pro Trp Arg Pro Glu Ala Met Gly 785 790 795 800
Asp Ala Val Ser Gln Tyr Cys Gly Met Tyr His Asp Ala Lys Arg Ala
805 810 815
Leu Val Ala Ser Leu Ala Gly Leu Arg Ser Val Val Thr Glu Thr Thr 820 825 830 Ala His Leu Gly Val Cys Asp Glu Leu Ala Ala Gln Val Ser His Glu 835 840 845
Gly Asn Val Leu Ala Val Val Arg Arg Glu Ile His Gly Phe Leu Ala
850 855 860
Ile Val Ser Gly Ile His Ala Arg Ala Ser Lys Leu Met Ser Gly Asp 865 870 875 880
Gln Val Pro Gly Phe Cys Tyr Met Ser Gln Phe Leu Ala Arg Trp Arg
885 890 895
Arg Leu Ser Ala Gly Tyr Gln Ala Ala Arg Ala Ala Thr Gly Pro Glu 900 905 910 Arg Val Ala Glu Phe Val Gln Glu Leu His Asp Thr Trp Lys Gly Leu 915 920 925
Gln Thr Glu Arg Ala Leu Val Val Ala Pro Phe Ala Ser Ser Ala Asp
930 935 940
Gln Arg Thr Ala Ala Ile Gln Glu Val Met Ala His Ala Thr Glu Asp 945 950 955 960
Ala Pro Pro Ser Pro Ala Ala Asp Leu Val Val Leu Thr Asn Arg His
965 970 975
Asp Leu Gly Ala Trp Gly Asp Tyr Ser Leu Gly Pro Leu Gly Gln Pro 980 985 990 Thr Val Val Pro Asp Ser Val Asp Leu Ser Pro Gln Gly Leu Ala Ala 995 1000 1005
Thr Leu Ser Met Asp Trp Leu Leu Ile Asn Glu Leu Leu Gln Val Thr 1010 1015 1020 Asp Gly Val Phe Arg Ala Ser Ala Phe Arg Pro Ser Ala Gly Pro Glu
1025 1030 1035 1040
Ala Pro Gly Asp Leu Glu Ala Gln Asp Ala Gly Gly Ser Thr Pro Glu
1045 1050 1055 Pro Thr Thr Pro Gly Pro Gln Asp Thr Gln Ala Arg Ala Pro Ser Trp
1060 1065 1070
Ala Gly Arg Glu Thr Val Pro Trp Pro Asn Thr Pro Val Glu Asp Asp
1075 1080 1085
Glu Met Thr Pro Gln Glu Thr Pro Pro Val His Pro 1090 1095 1100
(2) INFORMATION FOR SEQ ID NO: 122:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 641 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:
Val Glu Arg Thr Gly Gly Ser Cys Arg Arg Ala Pro Gly Pro Gly Ala 1 5 10 15
Arg Cys Pro Thr Trp Arg Pro Ala Cys Ala Leu Gly Asp Ala Ala Arg
20 25 30
Arg Pro Arg Ala Gln Thr Gly Met Thr Ala Ala Ala Leu Tyr Gly Gly 35 40 45 Ala Lys Tyr Arg Pro Gly Thr Leu Arg Asn Pro Gly Arg Val Ala Ser 50 55 60
Thr Pro Arg Arg Arg Gly Val Leu Tyr Gly Ala Leu Cys Pro Gly Ile 65 70 75 80
Pro Phe Val Gly Ser Gly Pro Gly Ala Val Gly Trp Glu Cys Val Cys 85 90 95
Val Gly Gly Gly Arg Arg Asp Gly Gly Pro Asp Gln Val Tyr Arg Gly
100 105 110
Arg Ser Val Gly Arg Pro Asn Arg Pro Phe Lys His Leu Arg Met His 115 120 125 Arg Pro Ser Gln Ser Asp Thr Gly Thr His Gln Arg Arg Lys Pro Pro 130 135 140
Ser Pro Val Arg Val Arg Val Phe Ser Gly Gly Val Phe Phe Leu Ser 145 150 155 160 Ala Leu Leu Pro Pro His Leu His His Pro Pro Pro Thr Trp Leu Ala
165 170 175
Ile Gly Gly Lys Thr Met Lys Thr Lys Pro Leu Pro Thr Ala Pro Met 180 185 190 Ala Trp Ala Glu Ser Ala Val Glu Thr Thr Thr Ser Pro Arg Glu Leu 195 200 205
Ala Gly His Ala Pro Leu Arg Arg Val Leu Arg Pro Pro Ile Ala Arg
210 215 220
Arg Asp Gly Pro Val Leu Leu Gly Asp Arg Ala Pro Arg Arg Thr Ala 225 230 235 240
Ser Thr Met Trp Leu Leu Gly Ile Asp Pro Ala Glu Ser Ser Pro Gly
245 250 255
Thr Arg Ala Thr Arg Asp Asp Thr Glu Gln Ala Val Asp Lys Ile Leu 260 265 270 Arg Gly Ala Arg Arg Ala Gly Gly Leu Thr Val Pro Gly Ala Pro Arg 275 280 285
Tyr His Leu Thr Arg Gln Val Thr Leu Thr Asp Leu Cys Gln Pro Asn
290 295 300
Ala Glu Arg Ala Gly Ala Leu Leu Leu Ala Leu Arg His Pro Thr Asp 305 310 315 320
Leu Pro His Leu Ala Arg His Arg Ala Pro Pro Gly Arg Gln Thr Glu
325 330 335
Arg Leu Ala Glu Ala Trp Gly Gln Leu Leu Glu Ala Ser Ala Leu Gly 340 345 350 Ser Gly Arg Ala Glu Ser Gly Cys Ala Arg Ala Gly Leu Val Ser Phe 355 360 365
Asn Phe Leu Val Ala Ala Cys Ala Ala Ala Tyr Asp Ala Arg Asp Ala
370 375 380
Ala Glu Ala Val Arg Ala His Ile Thr Thr Asn Tyr Gly Gly Thr Arg 385 390 395 400
Ala Gly Ala Arg Leu Asp Arg Phe Ser Glu Cys Leu Arg Ala Met Val
405 410 415
His Thr His Val Phe Phe Val Met Arg Phe Phe Gly Gly Leu Val Ser 420 425 430 Trp Val Thr Gln Asp Glu Leu Ala Ser Val Thr Ala Val Cys Ser Gly 435 440 445
Pro Gln Glu Ala Thr His Thr Gly His Pro Gly Arg Pro Cys Ser Ala
450 455 460
Val Thr Ile Pro Ala Cys Ala Phe Val Asp Leu Asp Ala Glu Leu Cys 465 470 475 480
Leu Gly Gly Pro Gly Ala Ala Phe Leu Tyr Leu Val Phe Tyr Gln Cys
485 490 495
Arg Asp Gln Glu Leu Cys Cys Val Tyr Val Val Lys Ser Gln Leu Pro 500 505 510
Pro Arg Gly Leu Glu Ala Ala Leu Glu Arg Leu Phe Gly Arg Leu Arg
515 520 525
Ile Thr Asn Thr Ile His Gly Ala Glu Asp Met Thr Pro Pro Pro Pro 530 535 540
Asn Arg Asn Val Asp Phe Pro Leu Ala Val Leu Ala Ala Ser Ser Gln
545 550 555 560
Ser Pro Arg Cys Ser Ala Ser Gln Val Thr Asn Pro Gln Phe Val Asp
565 570 575 Arg Leu Tyr Arg Trp Gln Pro Asp Leu Arg Gly Arg Pro Thr Ala Arg
580 585 590
Thr Cys Thr Tyr Ala Ala Phe Ala Glu Leu Gly Val Met Pro Asp Asn
595 600 605
Ser Pro Arg Cys Leu His Arg Thr Glu Arg Phe Gly Ala Val Gly Val 610 615 620
Pro Val Val Ile Gly Val Val Trp Arg Pro Gly Gly Trp Arg Ala Cys 625 630 635 640
Ala
(2) INFORMATION FOR SEQ ID NO: 123:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1160 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:
Val Ile Arg Arg Pro Val Arg Pro Phe Gly Arg Thr Ala His Pro Ala 1 5 10 15 Ser His Gly Pro Ala Ala Val Ser Val His Arg Val Arg Ala Thr Val 20 25 30
Thr Leu Val Pro Met Ala Asn Arg Pro Ala Ala Ser Ala Gly Ala Arg
35 40 45
Ser Pro Ser Gln Glu Pro Arg Glu Pro Glu Val Ala Pro Pro Gly Gly 50 55 60
Asp His Val Phe Cys Arg Lys Val Ser Gly Val Met Val Leu Ser Ser 65 70 75 80
Asp Pro Pro Gly Pro Ala Ala Tyr Arg Ile Ser Asp Ser Ser Phe Val 85 90 95
Gln Cys Gly Ser Asn Cys Ser Met Ile Ile Asp Gly Asp Val Arg His
100 105 110
Leu Arg Asp Leu Glu Gly Ala Thr Ser Thr Gly Ala Phe Val Ala Ile 115 120 125
Ser Asn Val Ala Ala Gly Gly Asp Gly Arg Thr Ala Val Val Gly Gly
130 135 140
Thr Ser Gly Pro Ser Ala Thr Thr Ser Val Gly Thr Gln Thr Ser Gly 145 150 155 160 Glu Phe Leu His Gly Asn Pro Arg Thr Pro Glu Pro Gln Gly Pro Gln
165 170 175
Ala Val Pro Pro Pro Pro Pro Pro Pro Phe Pro Trp Gly His Glu Cys
180 185 190
Cys Ala Arg Arg Asp Arg Gly Ala Glu Lys Asp Val Gly Ala Ala Glu 195 200 205
Ser Trp Ser Asp Gly Pro Ser Ser Asp Ser Glu Thr Glu Asp Ser Asp
210 215 220
Ser Ser Asp Glu Asp Thr Gly Ser Gly Ser Glu Thr Leu Ser Arg Ser 225 230 235 240 Ser Ser Ile Trp Ala Ala Gly Ala Thr Asp Asp Asp Asp Ser Asp Ser
245 250 255
Asp Ser Arg Ser Asp Asp Ser Val Gln Pro Asp Val Val Val Arg Arg
260 265 270
Arg Trp Ser Asp Gly Pro Ala Pro Val Ala Phe Pro Lys Pro Arg Arg 275 280 285
Pro Gly Asp Ser Pro Gly Asn Pro Gly Leu Gly Ala Gly Thr Gly Pro
290 295 300
Gly Ser Ala Thr Asp Pro Arg Ala Ser Ala Asp Ser Asp Ser Ala Ala 305 310 315 320 His Ala Ala Ala Pro Gln Ala Asp Val Ala Pro Val Leu Asp Ser Gln
325 330 335
Pro Thr Val Gly Thr Asp Pro Gly Tyr Pro Val Pro Leu Glu Leu Thr
340 345 350
Pro Glu Asn Ala Glu Ala Val Ala Arg Phe Leu Gly Asp Ala Val Asp 355 360 365
Arg Glu Pro Ala Leu Met Leu Glu Tyr Phe Cys Arg Cys Ala Arg Glu
370 375 380
Glu Ser Lys Arg Val Pro Pro Arg Thr Phe Gly Ser Ala Pro Arg Leu 385 390 395 400 Thr Glu Asp Asp Phe Gly Leu Leu Asn Tyr Ala Glu Met Arg Arg Leu
405 410 415
Cys Leu Asp Leu Pro Pro Val Pro Pro Asn Ala Tyr Thr Pro Tyr His 420 425 430 Leu Arg Glu Tyr Ala Thr Arg Leu Val Asn Gly Phe Lys Pro Leu Val
435 440 445
Arg Arg Ser Ala Arg Leu Tyr Arg Ile Leu Gly Ile Leu Val His Leu
450 455 460 Arg Ile Arg Thr Arg Glu Ala Ser Phe Glu Glu Trp Met Arg Ser Lys
465 470 475 480
Glu Val Asp Leu Asp Phe Gly Leu Thr Glu Arg Leu Arg Glu His Glu
485 490 495
Ala Gln Leu Met Ile Leu Ala Gln Ala Leu Asn Pro Tyr Asp Cys Leu 500 505 510
Ile His Ser Thr Pro Asn Thr Leu Val Glu Arg Gly Leu Gln Ser Ala
515 520 525
Leu Lys Tyr Glu Glu Phe Tyr Leu Lys Arg Phe Gly Gly His Tyr Met
530 535 540 Glu Ser Val Phe Gln Met Tyr Thr Arg Ile Ala Gly Phe Leu Ala Cys
545 550 555 560
Arg Ala Thr Arg Gly Met Arg His Ile Ala Leu Gly Arg Gln Gly Ser
565 570 575
Trp Trp Glu Met Phe Lys Phe Phe Phe His Arg Leu Tyr Asp His Gln 580 585 590
Ile Val Pro Ser Thr Pro Ala Met Leu Asn Leu Gly Thr Arg Asn Tyr
595 600 605
Tyr Thr Ser Ser Cys Tyr Leu Val Asn Pro Gln Ala Thr Thr Asn Gln
610 615 620 Ala Thr Leu Arg Ala Ile Thr Gly Asn Val Ser Ala Ile Leu Ala Arg
625 630 635 640
Asn Gly Gly Ile Gly Leu Cys Met Gln Ala Phe Asn Asp Asp Gly Thr
645 650 655
Ala Ser Ile Met Pro Ala Leu Lys Val Leu Asp Ser Leu Val Ala Ala 660 665 670
His Asn Lys Gln Ser Trp Thr Gly Ala Cys Val Tyr Leu Glu Pro Trp
675 680 685
His Ser Asp Val Arg Ala Val Leu Arg Met Lys Gly Val Leu Ala Gly
690 695 700 Glu Glu Ala Gin Arg Cys Asp Asn Ile Phe Ser Ala Leu Trp Met Pro
705 710 715 720
Asp Leu Phe Phe Lys Arg Leu Ile Arg His Leu Asp Gly Glu Lys Asn
725 730 735
Val Thr Trp Ser Leu Phe Asp Arg Asp Thr Ser Met Ser Leu Ala Asp 740 745 750
Phe His Gly Glu Glu Phe Glu Lys Leu Tyr Glu His Leu Glu Ala Met
755 760 765
Gly Phe Gly Glu Thr Ile Pro Ile Gln Asp Leu Ala Tyr Ala Ile Val 770 775 780
Arg Ser Ala Ala Thr Thr Gly Ser Pro Phe Ile Met Phe Lys Asp Ala 785 790 795 800
Val Asn Arg His Tyr Ile Tyr Asp Thr Gln Gly Ala Ala Ile Ala Gly 805 810 815
Ser Asn Leu Cys Thr Glu Ile Val His Pro Ser Ser Lys Arg Ser Ser
820 825 830
Gly Val Cys Asn Leu Gly Ser Val Asn Leu Ala Arg Cys Val Ser Arg 835 840 845
Arg Thr Phe Asp Phe Gly Met Leu Arg Asp Ala Val Gln Ala Cys Val 850 855 860
Leu Met Val Asn Ile Met Ile Asp Ser Thr Leu Gln Pro Thr Pro Gln 865 870 875 880
Cys Arg His Asp Asn Leu Arg Ser Met Gly Ile Gly Met Gln Gly Leu 885 890 895
His Thr Ala Cys Leu Lys Met Gly Leu Asp Leu Glu Ser Ala Glu Phe
900 905 910
Arg Asp Leu Asn Thr His Ile Ala Glu Val Met Leu Leu Ala Ala Met 915 920 925
Lys Thr Ser Asn Ala Leu Cys Val Arg Gly Ala Arg Pro Phe Ser His 930 935 940
Phe Lys Arg Ser Met Tyr Arg Ala Gly Arg Phe His Trp Glu Arg Phe 945 950 955 960
Ser Asn Asp Arg Tyr Glu Gly Glu Trp Glu Met Leu Arg Gln Ser Met 965 970 975
Met Lys His Gly Leu Arg Asn Ser Gln Phe Ile Ala Leu Met Pro Thr
980 985 990
Ala Ala Ser Ala Gln Ile Ser Asp Val Ser Glu Gly Phe Ala Pro Leu 995 1000 1005
Phe Thr Asn Leu Phe Ser Lys Val Thr Arg Asp Gly Glu Thr Leu Arg 1010 1015 1020
Pro Asn Thr Leu Leu Leu Lys Glu Leu Glu Arg Thr Phe Gly Gly Lys 1025 1030 1035 1040
Arg Leu Leu Asp Ala Met Asp Gly Leu Glu Ala Lys Gln Trp Ser Val 1045 1050 1055
Ala Gin Ala Leu Pro Cys Leu Asp Pro Ala His Pro Leu Arg Arg Phe
1060 1065 1070
Lys Thr Ala Phe Asp Tyr Asp Gln Glu Leu Leu Ile Asp Leu Cys Ala 1075 1080 1085
Asp Arg Ala Pro Tyr Val Asp His Ser Gln Ser Met Thr Leu Tyr Val 1090 1095 1100
Thr Glu Lys Ala Asp Gly Thr Leu Pro Ala Ser Thr Leu Val Arg Leu 1105 1110 1115 1120
Leu Val His Ala Tyr Lys Arg Gly Leu Lys Thr Gly Met Tyr Tyr Cys
1125 1130 1135
Lys Val Arg Lys Ala Thr Asn Ser Gly Val Phe Ala Gly Asp Asp Asn 1140 1145 1150
Ile Val Cys Thr Ser Cys Ala Leu 1155 1160
(2) INFORMATION FOR SEQ ID NO: 124:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 333 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:
Met Asp Pro Ala Val Ser Pro Ala Ser Thr Asp Pro Leu Asp Thr His 1 5 10 15
Ala Ser Gly Ala Gly Ala Ala Pro Ile Pro Val Cys Pro Thr Pro Glu
20 25 30
Arg Tyr Phe Tyr Thr Ser Gln Cys Pro Asp Ile Asn His Leu Arg Ser 35 40 45
Leu Ser Ile Leu Asn Arg Trp Leu Glu Thr Glu Leu Val Phe Val Gly
50 55 60
Asp Glu Glu Asp Val Ser Lys Leu Ser Glu Gly Glu Leu Gly Phe Tyr 65 70 75 80
Arg Phe Leu Phe Ala Phe Leu Ser Ala Ala Asp Asp Leu Val Thr Glu
85 90 95
Asn Leu Gly Gly Leu Ser Gly Leu Phe Glu Gln Lys Asp Ile Leu His
100 105 110
Tyr Tyr Val Glu Gln Glu Cys Ile Glu Val Val His Ser Arg Val Tyr 115 120 125
Asn Ile Ile Gln Leu Val Leu Phe His Asn Asn Asp Gln Ala Arg Arg
130 135 140
Ala Tyr Val Ala Arg Thr Ile Asn His Pro Ala Ile Arg Val Lys Val 145 150 155 160
Asp Trp Leu Glu Ala Arg Val Arg Glu Cys Asp Ser Ile Pro Glu Lys
165 170 175
Phe Ile Leu Met Ile Leu Ile Glu Gly Val Phe Phe Ala Ala Ser Phe 180 185 190
Ala Ala Ile Ala Tyr Leu Arg Thr Asn Asn Leu Leu Arg Val Thr Cys
195 200 205
Gln Ser Asn Asp Leu Ile Ser Arg Asp Glu Ala Val His Thr Thr Ala
210 215 220
Ser Cys Tyr Ile Tyr Asn Asn Tyr Leu Gly Gly His Ala Lys Pro Glu
225 230 235 240
Ala Ala Arg Val Tyr Arg Leu Phe Arg Glu Ala Val Asp Ile Glu Ile
245 250 255
Gly Phe Ile Arg Ser Gln Ala Pro Thr Asp Ser Ser Ile Leu Ser Pro 260 265 270
Gly Ala Ala Ile Glu Asn Tyr Val Arg Phe Ser Ala Asp Arg Leu Leu
275 280 285
Gly Leu Ile His Met Gln Pro Lys Ala Pro Ala Pro Asp Ala Ser Phe 290 295 300
Pro Leu Ser Leu Met Ser Thr Asp Lys His Thr Asn Phe Phe Glu Cys 305 310 315 320
Arg Ser Thr Ser Tyr Ala Gly Ala Val Val Asn Asp Leu 325 330
(2) INFORMATION FOR SEQ ID NO: 125:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 357 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:
Met Arg Arg Arg Gly His Ala Phe Ala Pro Gly Asp Arg Gly Thr Arg
1 5 10 15
Ala Ala Gly Pro Gly Pro Ala Ala Pro Trp Gly Ala Pro Ser Lys Pro 20 25 30
Ala Leu Arg Leu Ala His Leu Phe Cys Ile Arg Val Leu Arg Ala Leu
35 40 45
Gly Tyr Ala Tyr Ile Asn Ser Gly Gln Leu Glu Ala Asp Asp Ala Cys 50 55 60
Ala Asn Leu Tyr His Thr Asn Thr Val Ala Tyr Val His Thr Thr Asp 65 70 75 80
Thr Asp Leu Leu Leu Met Gly Cys Asp Ile Val Leu Asp Ile Ser Thr 85 90 95
Gly Tyr Ile Pro Thr Ile His Cys Arg Asp Leu Leu Gln Tyr Phe Lys
100 105 110
Met Ser Tyr Pro Gln Phe Leu Ala Leu Phe Val Arg Cys His Thr Asp 115 120 125
Leu His Pro Asn Asn Thr Tyr Ala Ser Val Glu Asp Val Leu Arg Glu 130 135 140
Cys His Trp Thr Ala Pro Ser Arg Ser Gln Ala Arg Arg Ala Ala Arg 145 150 155 160
Arg Glu Arg Ala Asn Ser Arg Ser Leu Glu Ser Met Pro Thr Leu Thr 165 170 175
Ala Ala Pro Val Gly Leu Glu Thr Arg Ile Ser Trp Thr Glu Ile Leu
180 185 190
Ala Gln Gln Ile Ala Gly Glu Asp Asp Tyr Glu Glu Asp Pro Pro Leu 195 200 205
Gln Pro Pro Asp Val Ala Gly Gly Pro Arg Asp Gly Ala Arg Ser Ser 210 215 220
Ser Ser Glu Ile Leu Thr Pro Pro Glu Leu Val Gln Val Pro Asn Ala 225 230 235 240
Gln Arg Val Ala Glu His Arg Gly Tyr Val Ala Gly Arg Arg Arg His 245 250 255
Val Ile His Asp Ala Pro Glu Ala Leu Asp Trp Leu Pro Asp Pro Met
260 265 270
Thr Ile Ala Glu Leu Val Glu His Arg Tyr Val Lys Tyr Val Ile Ser 275 280 285
Leu Ile Ser Pro Lys Glu Arg Gly Pro Trp Thr Leu Leu Lys Arg Leu 290 295 300
Pro Ile Tyr Gln Asp Leu Arg Asp Glu Asp Leu Ala Arg Ser Ile Val 305 310 315 320
Thr Arg His Ile Thr Ala Pro Asp Ile Ala Asp Arg Phe Leu Ala Gln 325 330 335
Leu Trp Ala His Ala Pro Pro Pro Ala Phe Tyr Lys Asp Val Leu Ala
340 345 350
Lys Phe Trp Asp Glu 355
(2) INFORMATION FOR SEQ ID NO: 126:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 466 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:
Met Ala His Leu Pro Gly Gly Ala Ala Ala Ala Pro Leu Ser Glu Asp 1 5 10 15
Ala Ile Pro Ser Pro Arg Glu Arg Thr Glu Asp Trp Pro Pro Cys Gln
20 25 30
Ile Val Leu Gln Gly Ala Glu Leu Asn Gly Ile Leu Gln Ala Phe Ala 35 40 45
Pro Leu Arg Thr Ser Leu Leu Asp Ser Leu Leu Val Val Gly Asp Arg
50 55 60
Gly Ile Leu Val His Asn Ala Ile Phe Gly Glu Gln Val Phe Leu Pro 65 70 75 80
Leu Asp His Ser Gln Phe Ser Arg Tyr Arg Trp Gly Gly Pro Thr Ala
85 90 95
Ala Phe Leu Ser Leu Val Asp Gln Lys Arg Ser Leu Leu Ser Val Phe
100 105 110
Arg Ala Asn Gln Tyr Pro Asp Leu Arg Arg Val Glu Leu Thr Val Thr 115 120 125
Gly Gln Ala Pro Phe Arg Thr Leu Val Gln Arg Ile Trp Thr Thr Ala
130 135 140
Ser Asp Gly Glu Ala Val Glu Leu Ala Ser Glu Thr Leu Met Lys Arg 145 150 155 160
Glu Leu Thr Ser Phe Ala Val Leu Leu Pro Gln Gly Asp Pro Asp Val
165 170 175
Gln Leu Arg Leu Thr Lys Pro Gln Leu Thr Lys Val Val Asn Ala Val
180 185 190
Gly Asp Glu Thr Ala Lys Pro Thr Thr Phe Glu Leu Gly Pro Asn Gly 195 200 205
Lys Phe Ser Val Phe Asn Ala Arg Thr Cys Val Thr Phe Ala Ala Arg
210 215 220
Glu Glu Gly Ala Ser Ser Ser Thr Ser Ala Gln Val Gln Ile Leu Thr 225 230 235 240
Ser Ala Leu Lys Lys Ala Gly Gln Ala Ala Ala Asn Ala Lys Thr Val
245 250 255
Tyr Gly Glu Asn Thr Thr Phe Ser Val Val Val Asp Asp Cys Ser Met
260 265 270
Arg Ala Val Leu Arg Arg Leu Gln Val Gly Gly Gly Thr Leu Lys Phe 275 280 285
Phe Leu Thr Ala Asp Val Pro Ser Val Cys Val Thr Ala Thr Gly Pro
290 295 300
Asn Ala Val Ser Ala Val Phe Leu Leu Lys Pro Gln Arg Val Cys Leu 305 310 315 320
Asn Trp Leu Gly Arg Thr Pro Gly Ser Ser Thr Gly Ser Leu Ala Ser
325 330 335
Gln Asp Ser Arg Ala Gly Pro Thr Asp Ser Gln Asp Phe Ser Ser Glu 340 345 350
Pro Asp Ala Gly Asp Arg Gly Ala Pro Glu Glu Glu Gly Leu Glu Gly
355 360 365
Gln Ala Arg Val Pro Pro Ala Phe Pro Glu Pro Pro Gly Thr Lys Arg
370 375 380
Arg His Ala Gly Ala Glu Val Val Pro Ala Asp Asp Ala Thr Lys Arg
385 390 395 400
Pro Lys Thr Gly Val Pro Ala Ala Pro Thr Arg Ala Glu Ser Pro Pro
405 410 415
Leu Ser Ala Arg Tyr Gly Pro Glu Ala Ala Glu Gly Gly Gly Asp Gly 420 425 430
Gly Arg Tyr Ala Cys Tyr Phe Arg Asp Leu Gln Thr Gly Asp Asp Ser
435 440 445
Pro Leu Ser Ala Phe Arg Gly Pro Gln Arg Pro Pro Tyr Gly Phe Gly 450 455 460
Leu Pro 465
(2) INFORMATION FOR SEQ ID NO: 127:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 331 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:
Val Cys Pro Pro Pro Pro Thr Asn Met Ala Val Val Cys Gly Ser Gly
1 5 10 15
Leu Arg Leu Arg Pro Phe His Pro Pro Ser Pro Ser Phe Phe Val Leu
20 25 30
Arg Ala Leu Ile Arg Ala Gly Pro Gly Pro Phe Ala Asp Arg Ala Pro 35 40 45
Ser Gly Pro Gly Cys Gly Met Cys Arg Gly Asp Ser Pro Gly Val Ala
50 55 60
Gly Gly Ser Gly Glu His Cys Leu Gly Gly Asp Asp Gly Asp Asp Gly 65 70 75 80
Arg Pro Arg Leu Ala Cys Val Gly Ala Ile Arg Phe Ala His Leu Trp
85 90 95
Leu Gln Ala Thr Thr Leu Gly Phe Val Gly Ser Val Val Leu Ser Arg 100 105 110
Gly Pro Tyr Ala Asp Ala Met Ser Gly Ala Phe Val Ile Gly Ser Thr
115 120 125
Gly Leu Gly Phe Leu Arg Ala Pro Pro Ala Phe Ala Arg Pro Pro Thr
130 135 140
Arg Val Cys Ala Trp Leu Arg Leu Val Gly Gly Gly Ala Ala Val Trp
145 150 155 160
Ser Leu Gly Glu Ala Gly Ala Pro Pro Gly Val Pro Gly Pro Ala Thr
165 170 175
Gln Cys Leu Ala Leu Gly Ala Ala Tyr Ala Ala Leu Leu Val Leu Ala 180 185 190
Asp Asp Val His Pro Leu Phe Leu Leu Ala Pro Arg Pro Leu Phe Val
195 200 205
Gly Thr Leu Gly Val Val Val Gly Gly Leu Thr Ile Gly Gly Ser Ala
210 215 220
Arg Tyr Trp Trp Ile Asp Pro Arg Ala Ala Ala Ala Leu Thr Ala Ala
225 230 235 240
Val Val Ala Gly Leu Gly Thr Thr Ala Ala Gly Asp Ser Phe Ser Lys
245 250 255
Ala Cys Pro Arg His Arg Arg Phe Cys Val Val Ser Ala Val Glu Ser 260 265 270
Pro Pro Pro Arg Tyr Ala Pro Glu Asp Ala Glu Arg Pro Thr Asp His
275 280 285
Gly Pro Leu Leu Pro Ser Thr His His Gln Arg Ser Pro Arg Val Cys 290 295 300
Gly Asp Gly Ala Ala Arg Pro Glu Asn Ile Trp Val Pro Val Val Thr 305 310 315 320
Phe Ala Gly Ala Leu Ala Ala Cys Ala Arg Ser 325 330
(2) INFORMATION FOR SEQ ID NO: 128:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2342 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:
GGCGGGATCT GCGCACGCGC GGCACGGCGG CGGAGAAAGC GGCGGCAGAG CCGGAAAAGG 60
CCGGGGGAGG AAGCGCGGCA TCCGCGGGGG GACTCGGTGT GGGTGGCGAG GGCCGTGGGT 120
CGTCGCGAGG GGCCACGGGC ACGCGCCCCG TGTTTTGTTG AGGCGGGACA CTCGGTCGTG 180
TTTCGCGAGC CGTAGCTGCC GGCCCGATGG GCCGCGGTGC GTACTGGGAC GTGGGGACGG 240
ACTGATCGGT GGCGGGGGGG GGAAGAAGGG CCGGGGCCGG ATTGGGCGTG GGGCCGCCGG 300
CGTCGTCGGA CGCCAGCTCC TCCAGGCCGT GGATCCAGGC CCACATGCGA GGGGGGACGG 360
GCTCGCCGGT GGTGGCGTCG GTGAGGAGAG TGGGGGCGAG GACCCCCGGG TCCGCCTGCC 420
GCGCGGGGGG GGCAGCGGGG TCCTCGGGAC CCGATCCGCC ATCCCCCCCC GCAAGGTCCC 480
GCGGGTCGCG GGCGGCGGTC GGGGCAGAGG GACCTGCCTC GTCGGCGAGG GGGCGCTGGT 540
AAACCGGGTG TCCCGGGAAC AGCTCCCCCG TCAGGAGGGA GGCGTCGAAG GGCCGCCCGA 600
GGATGGCCCG CGCGAAGAAG GGGTCCGCGT CGGCGGCGCT CGCCGCGAGA ACGTCCCCCG 660
CGGTAGCCAC AAACGGAAGC TCCTCGGTGG CCTCGCTGCC CACAAACCGC ACGTCAGGGG 720
GGCCGGGGGG CTCCGGGGCT TCCCACAAGA CCGCGACCGG GGTCATGGAG ATGTCCACGA 780
GGACCAGGCA CGGGGGCCCG TCGGCGAGAG GGCGCTCGGC GATGAGCGCC GACAGGCGCG 840
GGAGCTGCGC CGCCAGACAC GCGTTTTCGA TCGGGTTGAG ATCGGTGTGG AGGAGGCCGA 900
CGGCCCACGT CTCGATGTCG GACGACACGA CGTCGCGCAG GGCGGCGTCC GGCCCGCCGG 960
GGCGCGAGTC GAAGAGCGTC AGGCACAGTT CCAGTTCCGA CTCGCGGGAG AAGGCCGTGG 1020
TGTTGCGGAG CGCCACCACG ACGGGCGCGC CGAGGAGCAC CGCGGCCAGA ACCAGGTCCA 1080
TGGCCGTAAC GCGCGCGGCG GGGGTGCGGT GGGTCGCGGC GGCCAGCACG GCCACGTGCT 1140
GGCCCGTGGG TCGGTAGAGG GCGTGGGGGG CCTCGGGGAG GGACGCCTCG CGCCCCCCCG 1200
CCGGGCCGAG CGTCTGGCCA GACTCCAGGC GTGCGGCCAG GAGGGCGTCG AAGCTGTCGT 1260
ACTCGGTGTA GTCGTCGGGA AACATGCAGG TCCACAGCGC GGCCAAAGCG GCGCTCGGCA 1320
GACACATGCG CCCGAGGACG CTCACCGCCG CCAGGGCCTG GGCCGGACTG AGCTTCCCGA 1380
GCGCCGGGGC GTCCCGGCGC TGGGTCCCGA GCTCCAAGGC CGAGCGCCAG GGCGCCAGCG 1440
GGTCGGTTTC GGACAGCTTG CCCCGGCGCC AGTCGGCCAG CCGCGTGCCG AACAGGAGGC 1500
CCCGGGTCGG GGGGCCTCCG TCCAAAAACG TCGGCAACAC GCGGATGCGG GCGTCGGGAT 1560
GCGGGGTCAG GCGCTGGACG AACAGCATGG ACTCCGCTGC GTCCTCGAAC GCGCGTTCGA 1620
GGGTGAGGTG CATGTACTCG TGCTGGCGAA CGAGGTCCAG GCGCCAGAAG TTGTAGATGT 1680
GTTCCGGAAC GCCGGCCACC AGCGCGACCA GCACGTCGTT CTCGTTGAAG GCGACGCAGT 1740
GGCGCTGGGA CCCCCGGGGG CCCGGCGGCG GACGCGGCGC CGCCGCTCCG GACGCCCAGC 1800
CCAGCTGGGC CCAGCGACAC CCAAACTCGC GCGTGAGGGT GGTGGCGACG AGGGCGACGT 1860
ACAGCTCGGC CGCCGCGTCC ATCGAGGCGC CCCACGTCGC CTGGCGATGG CGCACGAAGC 1920
GACCGAACAG CTGAAAGTTG GCGGCCTGGG CGTCGCTGAG GGCCAGCTGG AGCCGGTTCA 1980
CGACGGTCAG CACGTACATG GCCGTGACCG TCGGGGCCGA TTCGAGGACG TCCGTCGGAA 2040
GCGGGGGCCG CACGCAGGCC GCCTCGGGAC GCATCAGCAG CGCGCCGAGT TTGTCGGTGA 2100
CGGCCGGGAA GCATAGCGCG TACTGCAGCG GCGTTCCGTC CGGGGCCAAA AAGCTGGTGG 2160
CGAACGGCAG ATCCAGAGCG CTGACGGCCT CACGCAGCAC CAGGGGCCCC GGGTCTCCGC 2220
CGGCGCGCAG ATACGCCTCG CCCCGGCGGC GCAGCAGCTG CGGGTCGACC TCGTGGCCCT 2280
CGGGGGAAGA AGAGGCCCGG GCGCGGGCGT CGAGGGCGCG AAGATCAACG AGCAGGGGCG 2340
C 2342
(2) INFORMATION FOR SEQ ID NO: 129:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 771 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:
Ala Pro Leu Leu Val Asp Leu Arg Ala Leu Asp Ala Arg Ala Arg Ala 1 5 10 15
Ser Ser Ser Pro Glu Gly His Glu Val Asp Pro Gln Leu Leu Arg Arg 20 25 30
Arg Gly Glu Ala Tyr Leu Arg Ala Gly Gly Asp Pro Gly Pro Leu Val
35 40 45
Leu Arg Glu Ala Val Ser Ala Leu Asp Leu Pro Phe Ala Thr Ser Phe 50 55 60
Leu Ala Pro Asp Gly Thr Pro Leu Gln Tyr Ala Leu Cys Phe Pro Ala 65 70 75 80
Val Thr Asp Lys Leu Gly Ala Leu Leu Met Arg Pro Glu Ala Ala Cys 85 90 95
Val Arg Pro Pro Leu Pro Thr Asp Val Leu Glu Ser Ala Pro Thr Val 100 105 110
Thr Ala Met Tyr Val Leu Thr Val Val Asn Arg Leu Gln Leu Ala Leu
115 120 125
Ser Asp Ala Gln Ala Ala Asn Phe Gln Leu Phe Gly Arg Phe Val Arg 130 135 140
His Arg Gln Ala Thr Trp Gly Ala Ser Met Asp Ala Ala Ala Glu Leu
145 150 155 160
Tyr Val Val Ala Thr Thr Leu Thr Arg Glu Phe Gly Cys Arg Trp Ala
165 170 175
Gln Leu Gly Trp Ala Ser Gly Ala Ala Ala Pro Arg Pro Pro Pro Gly
180 185 190
Pro Arg Gly Ser Gln Arg His Cys Val Ala Phe Asn Glu Asn Asp Val
195 200 205
Leu Val Val Ala Gly Val Pro Glu His Ile Tyr Asn Phe Trp Arg Leu 210 215 220
Asp Leu Val Arg Gln His Glu Tyr Met His Leu Thr Leu Glu Arg Ala 225 230 235 240
Phe Glu Asp Ala Ala Glu Ser Met Leu Phe Val Gln Arg Leu Thr Pro 245 250 255
His Pro Asp Ala Arg Ile Arg Val Leu Pro Thr Phe Leu Asp Gly Gly 260 265 270
Pro Pro Thr Arg Gly Leu Leu Phe Gly Thr Arg Leu Ala Asp Trp Arg 275 280 285
Arg Gly Lys Leu Ser Glu Thr Asp Pro Leu Ala Pro Trp Arg Ser Ala 290 295 300
Leu Glu Leu Gly Thr Gln Arg Arg Asp Ala Pro Ala Leu Gly Lys Leu
305 310 315 320
Ser Pro Ala Gln Ala Ala Val Ser Val Leu Gly Arg Met Cys Leu Pro 325 330 335
Ser Ala Ala Ala Leu Trp Thr Cys Met Phe Pro Asp Asp Tyr Thr Glu 340 345 350
Tyr Asp Ser Phe Asp Ala Leu Leu Ala Ala Arg Leu Glu Ser Gly Gln 355 360 365
Thr Leu Gly Pro Ala Gly Gly Arg Glu Ala Ser Leu Pro Glu Ala Pro 370 375 380
His Ala Leu Tyr Arg Pro Thr Gly Gln His Val Ala Val Leu Ala Ala
385 390 395 400
Ala Thr Thr Pro Ala Ala Arg Val Thr Ala Met Asp Leu Val Leu Ala 405 410 415
Ala Val Leu Leu Gly Ala Pro Val Val Val Arg Asn Thr Thr Ala Phe 420 425 430
Ser Arg Glu Ser Glu Leu Glu Leu Cys Leu Thr Leu Phe Asp Ser Arg 435 440 445
Pro Gly Gly Pro Asp Ala Ala Leu Arg Asp Val Val Ser Ser Asp Ile
450 455 460
Glu Thr Trp Ala Val Gly Leu Leu His Thr Asp Leu Asn Pro Ile Glu
465 470 475 480
Asn Ala Cys Leu Ala Ala Gln Leu Pro Arg Leu Ser Ala Leu Ile Ala 485 490 495
Glu Arg Pro Leu Ala Asp Gly Pro Pro Cys Leu Val Leu Val Asp Ile 500 505 510
Ser Met Thr Pro Val Ala Val Leu Trp Glu Ala Pro Glu Pro Pro Gly 515 520 525
Pro Pro Asp Val Arg Phe Val Gly Ser Glu Ala Thr Glu Glu Leu Pro 530 535 540
Phe Val Ala Thr Ala Gly Asp Val Leu Ala Ala Ser Ala Ala Asp Ala
545 550 555 560
Asp Pro Phe Phe Ala Arg Ala Ile Leu Gly Arg Pro Phe Asp Ala Ser
565 570 575
Leu Leu Thr Gly Glu Leu Phe Pro Gly His Pro Val Tyr Gln Arg Pro 580 585 590
Leu Ala Asp Glu Ala Gly Pro Ser Ala Pro Thr Ala Ala Arg Asp Pro
595 600 605
Arg Asp Leu Ala Gly Gly Asp Gly Gly Ser Gly Pro Glu Asp Pro Ala
610 615 620
Ala Pro Pro Ala Arg Gln Ala Asp Pro Gly Val Leu Ala Pro Thr Leu
625 630 635 640
Leu Thr Asp Ala Thr Thr Gly Glu Pro Val Pro Pro Arg Met Trp Ala
645 650 655
Trp Ile His Gly Leu Glu Glu Leu Ala Ser Asp Asp Ala Gly Gly Pro 660 665 670
Thr Pro Asn Pro Ala Pro Ala Leu Leu Pro Pro Pro Ala Thr Asp Gln
675 680 685
Ser Val Pro Thr Ser Gln Tyr Ala Pro Arg Pro Ile Gly Pro Ala Ala
690 695 700
Thr Ala Arg Glu Trp Ser Val Pro Pro Gln Gln Asn Thr Gly Arg Val
705 710 715 720
Pro Val Ala Pro Arg Asp Asp Pro Arg Pro Ser Pro Pro Thr Pro Ser
725 730 735
Pro Pro Ala Asp Ala Ala Leu Pro Pro Pro Ala Phe Ser Gly Ser Ala 740 745 750
Ala Ala Phe Ser Ala Ala Val Pro Arg Val Arg Arg Ser Arg Xaa Xaa
755 760 765
Xaa Xaa Xaa 770
(2) INFORMATION FOR SEQ ID NO: 130:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14927 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:
GCACGTCAGC GGCGGCCCGC GTCGCGATGT CGCCCCAGCT CTCCGGCCCC TGCGCCCCTG 60
GCTCGGGGCC GCGCTCCCCG TCCTCGCTCG CGGGCGTCCC CGCGCCACGC CTCCGCCCCC 120
CCTCCTCCGC GGCGGCCCGG GGCTCTTCCT CCTCGGCCCC CCCGGTCGCG CCGCCGGCCC 180
CCAGCCGCGC CAGCACGCGG CGCAGCGCCT CCTCGTCGCA CTGCTCGGGG CTGACGAGCC 240
GCCGCAGCAG CGGCGTCGTC AGGTGGTGGT CGTAGCACGC GCGTATCAGC GCCTCGATCT 300
GATCGTCGGG CGACGTCGCC TGGCCGCCGA TGATCAGGGC GTCCACCATG TCCAGCGCCG 360
CCAGGTGGCC CCCGAACGCG CGATCGAAGT GCTCCGCCCG CCGCCCGAAC AGCGCCAGCT 420
CCACGGCCAC CGCGGCGGTC TCCTGCTGCA GCTCGCGCTG CGCCAGCGCG TTCAGGTTGT 480
CGGCGAAGGC GTCCATGGTG GAGTGGCGGG CGCGATCGCC GGACGCCAGC CAGAAGCGCA 540
GCTCGCTGAT GGCGTACAGG CCGGGCGTAG TGGCCTGAAA CACGTCATGC GCCTCCAGCA 600
GGGCGTCGGC CTCCTCGCGG ACAGAAGAGC TATCGGCGGG CGGCGGGCCG GCCCGGGCCC 660
CGCCGCCCGC CGCGGTCCGC GCCAGCGCCT GGTCCAGCAC ACAGAGCGCT CGCGCGCGGG 720
CGGCGTCCGA CAGCCCGGCG GCGTGGGGCA GGTACCGTCG CAGCTCGTTG GCGTCCAGCC 780
GCACCTGGGC CTGTTGGGTG ACGTGGTTAC AGATGCGGTC CGCCAGGCGG CGGGCGATGG 840
TCGCCCCCTG GTTCGCGGTG ACGCACAGCT CCTCGAAACA GACCGCGCAC GGGTGGGACG 900
GGTCGCTCAG CTCCGGGGGC ACGATGAGGC CCGACCCCAC CGCCGCCACC ATAAACTCCC 960
GGACGCGCTC CAGCGCGGCC GTGGCGCCGC TCGGGGGGGT GATGAGGTGG CAGTAGTTCA 1020
GCTGCTTGAG AAAATTCTCG ACATCATGCA GGAAGCACAG CTCCATGCGG ACGTCCCCGC 1080
CGTACGTCTC CAGCCGGATC TGCTGGTGGT ACGGACAGGG TCGGGCCAGA CCCATGGTCT 1140
CGGTGAAAAA GGCAGAGACG TCACCCGTGG TCGCGAACGT TTCCAGGTGG CCCAGGAGCC 1200
GCTCCCCCTC GCGCCACGCG TACTCCAGGA GCAACTCCAG GGTGACCGAC AGCGGGGTGA 1260
GAAAGGCGGC GGCCTGAGCC TCCAGCCCCG GCCGCAGGTG CCGCCGCAGC ACGCGCACCT 1320
GGAGCGCGTT GAGCTTTAGC TGGGCGAGCT TCCCCAGGCC GATCTGGGGG TCGCATCGTC 1380
GAAGCAGCTC TAGCTGAAAA ACGTACGTCT GTACCTGCCC GAGCAGGGCC AACAGTTTCT 1440
GTCGGGCCGC AGTGGGCTCG GAAACCGCGG CCGGGGGCGC GGCCGCCATG GCGAGTCGCC 1500
CGGCCGTGCT GTGGTTTAGT TAAGGTTTGG GGGGGTGGGT CAGAGGCGCG CCCCGCGCGG 1560
ACTGATGCGG CGGCGGGCCC CTGACATCCC CTCTTTATGC CCGTCGCCCG CCCGCCCGCC 1620
CCGCCGGTGT GCCGTGATTC GCGGAGTCGG GGCCTTGTGT TTCTTTCTTT CCCCCCCCGA 1680
ATCCGTTCTT TCTTCCTCAC CCCCCCTCCC CACACACCCA CCCAGGACTC GCCACCACAA 1740
GGAGGCGAGA GCCCGTCGCT AACCCAAAGA CACAGTCACG AGACACGATA TCGACTGTAG 1800
TTGCGATCGT TTATTTTATA CACAACACCA ACCTTTCCTT CGACCCCCCC CACCCCCGCC 1860
CCTAGAGCAT ATCCAACGTC AGGTCCTTTT TCTCCGGTGG TCCCTCCCCA AACGGATCGT 1920
CGCCGTGAAA CGCCCGCTTT CGGGCGACGC CGGCCGCCCC CGCCGCCGCC GCCAAACCGC 1980
CGAACGACGC CGCGTGGTCA TCCTCGTCGC CGAAATCCCC AAAGTTAAAC ACCTCCCCGG 2040
CGGCGCCGAG CTGGCTGACC AGGGCCTCCG CCTCGTGGGC CACCTCCAGG GCCGCGTCGG 2100
TCGACCACTC GCCGTGCCCG CGCTCCAGGG CGCGGGTGGT AAACTCCATC ATTTCCTCGC 2160
TCAGGTACTC GTCCTCCAGC AGCGCCAGCC AGTCCTCGAT CTGCAGCTGC TGGGTGCGGG 2220
GGCCCAGGCT CTTGACGGTC GCCACAAACA CGCTGCTGGC GACCGCCGCC CCGCCCTCCG 2280
CAATGATGCC CCGGAGCTGC TCGCACAGCG AATGCTCGTG GGCCCCGCCC CCGAGACTCG 2340
ACGCCGCGCA CACAAACCCG GCCCTGGGGC AGGCCAGGAC AAACTTGCGG GTGCGGTCAA 2400
AGATCAGCAG CGGGCACGCG TTTTTGCCGC CCAGCAGGCT GGCCCAGTTC CCGGCCTGAA 2460
ACACGCGGTC GTTGCCGGCC ATGCCGTAGT ATTTGCTGAT GCTGAGGCCC AGCACGACCA 2520
TCGGGCGCGC GGCCATCACG GGCCGCAGCA GGTTGCAGCT CGCGAACATG GACGTCCAGG 2580
CGCCGGGGTG CGCGTCGAGG GAGTCCATCA GCGCGCGGGC CCCGGCCTCC AGGCCCGCGC 2640
CGCCCTGCGG GGCCCAGGCG GCGGCCGCCT GCACGCCGGG GGGACGGCGG GACCCGGCGA 2700
TGACGGCCGT GAGGGTGTTT ATGAAGTACG TCGAGTGGTC GCAGTACCTC AAGATCTGGT 2760
TGGCCATGTA GTACATGGCC AGTTCGCTCA CGTTATTGGG GGCCAGGTTG ATAAAGTTAA 2820
TCGCGCCGTA GTCCAGGGAG AACCTCTTAA TGAACGCGAT GGTCTCTATG TCCTCGCGCG 2880
ACAAGAGCCG GGCGGGGAGC TGGTTGCGCT GGAGGGCGGT CCAGAACCAC TGCGGGTTCG 2940
GCTGGTTCGA CCCCGGGGGC TTGCCGTTGG GAAAGATGAC CGCGTGGAAC TGCTTCAGCA 3000
GGAAGCCCAG CGGTCCGAGG AGGATGTCCA CGCGCTTGTC GGGCTTCTGG TAGGCGCTCT 3060
GGAGGCTGGC GACCCGCGCC TTGGCGGCCT CGGACGCGTT GGCGCTCGCG CCCGCGAACA 3120
ACACGCGGCT CTTGACGCGC AGCTCCTTGG GAAACCCCAG GGTCACGCGG GCAACGTCGC 3180
CCTCGAAGCT GCTCTCGGCG GGGGCCGTCT GGCCGGCCGT TAGGCTGGGG GCGCAGATAG 3240
CCGCCCCCTC CGAGAGCGCG ACCGTCAGCG TCTTCGCCGA CAGGAACCCG TTGTTGAACA 3300
GGTCCATGAC GCGCCGCCGC AGCACCGGTT GGAATTGATT GCGAAAGTTG CGCCCCTCGA 3360
CCGACTGCCC GGCGAACACC CCGTGGCACT GGCTCAGGGC CAGGTCCTGG TACACGGCGA 3420
GGTTGGACCG CCGCGCGAGG AGCTGCAGCA GGGGGCACGG CCCGCAGGTG TACGGGTCCA 3480
GCGACAGCGA CATGGCGTGG TTGGCCTCGG CCAGACCGTC GCGGAACTTA AAGTTGCGCC 3540
CCTCGATCAG GTTGCGCATC AGCTGTTCCA CCTCGCGATC CACCAGCTGC TTGATGTTGT 3600
TCACCACCGT GTGCAGGGCC TCGCGGGTGC CGATAATCGT CTCCAGCCTC CCCAGGGCCG 3660
TGGGCACCGC CTGGTCCACG TACTGCAGGG CCTCGAGCTC GGCCATGACG CGCTCGGTGG 3720
CCGCGCGGTA CGTCTCCTGC ATGATGGTCC GGGTGTTCTC GGACCCGTCC GCGCGCTTCA 3780
GGGCCGAGAA GGCGGCGTAG TTCCCCAGCA CGTCGCAGTC GCTGTACGCG CTGTTCATCG 3840
TTCCGAAGAC CCCAATGGCC CCCCGGGCGG CGCTCGCGAA CTTGGGGTGG CGGGCCCGCA 3900
GCCGCATCAG CGTCGTGTGC GCGCAGGCGT GGCGGGTCTC GAAGGTACAC AGGTTGCAGG 3960
GCACGTCGGT CTGGCCCGAG TCCGCGACGT AGCGAAACAC GTCCATCTCC TGGCGCCCGA 4020
CGATGACTCC GCCGTCGCAG CGCTCCAGGT AAAACAGCAT CTTGGCCAGC AGGGCCGGAG 4080
AGAACCCGCA CAGCATGGCC AGGTGCTCGC CGGCGAACTC CTGGGTTCCG CCGACGAGGG 4140
GCGCCGTGGG GCGCCCCTCG TACCCGGGCA CCACGTGGCC CTCGCGGTCC AGCTGCGGGT 4200
TGGCCGCCAC GTGCGTGCCG GGCACGAGAA AGAAGCGGTA AAAGGAGGGC TTGCTGTGGT 4260
CCTTGGGGTC CGCCGGCCCG GCGTCGTCCA CCTCGGTCAG GTGGAGGGCC GAGTTGGTGC 4320
TGAACACCAT GGCGCCCACG AGGCCCGCGG CGCGCGCCAG GTACGCCCCG ACGGCGCCGG 4380
CACGGGCCGC GGGCGTTTCC TGGCCCTCAA GCAGGGGCCA CGTGGTGATG TCGGGGGGCG 4440
GCTCGTCAAA GACCGCCATC GACACGATGG ACTCCAGGGC CAGGGCGGCG TCGCCCGCCA 4500
TCACCGAGGC CAGGCGCTGC TCAAACCCGC CCGCCGGGCC CTTGTTCCCG GCGTCGCGCG 4560
CGCCCCGCTG GGGCTTACCC TGGCTGGCCT CGAAGGCCGT GAACGTAATG TCGGCGGGGA 4620
GGGCCGCGCC CTCGTGGTTT TCGTCGAACG CCAGGTGGGC GGCCGCGCGG GCCACGGCGT 4680
CCACGTTCCG AGCACGCAGG GCCACGGCGG CGGGCCCGAC GACCGCCTCG AACAGCAGGC 4740
GGGCGAGGGG GCGGTTGAAA AACGGAAGGG GGTAGTTGAA ATTCTCCCCG ATCGATCGGT 4800
GGTTGCAGTT AAACGGATCG GCGATGACCC GGCTAAAATC CGGCATAAAC ATCTGCAGCG 4860
GATACACGGG GATGCGGTGA ACCTCCGCGT CCCCGATGGT TACCTTGTCC ATCCCGCCCA 4920
GGTGCAGGAA GGTGTTGCTG ATGCACACGG CCTCCCGGAA GCCCTCCGTG ATCACCAGAT 4980
ACAGCAAGGC CCGGTCCGGG TCCAGTCCGA GCCGCTCGCA CAGCGCGTCC CCCGTCGTCT 5040
CGTGCTTTAG GTCGCAGGGC CGGGGCGCGT AGTCCGAGAA GCCAAAATGG CGGCGCGCCC 5100
GCTCGCAGAG CCGCGTCAGG TTGGGGGCCT GGGTGCTGGG GGCCAGGTGG CGGCCGCCGT 5160
GAAAGACGTA GACGGACGGG CTGTAGTGCG AGGGCATAAG CTTGAGGGAC ACCGCGGTCC 5220
CCCCAAGGCC CGTCGTGCGG GACCCGACGA CCGCGGCCAC GTTGGCCTCA AACCCGCTCT 5280
CCACGGTCAG GCCGACGATG AGGGGCGCGA CGGCGACGTC CGCGTCGCCG CTGCGCGCCG 5340
ACAGTAGCGA CAGCAGCTCC AGGCCTTCGG CCGGACAGGC GCGGCCATAC ACGTACCCCA 5400
TCGGCCCCGG AGGAACCTTG ACGGTGGTCG TCGTTTTGGG CTTGGTGTCC ATGGCTTTCG 5460
GGAGATTGGC GACCGGCAGG AACGGGGGCC CGGCAAGACG ACCGGGGGCA GACGGGGGAG 5520
GCCGCGCGTG GTCGACGGCT GCTGCCCGCC GTCGTCTCTC CGATGGGGTC GAATGCCGGC 5580
GCTGGGGGTG GGGTCTACAC CCGCCCGTTC ACCGAGCGGC CCCTGGTGGG GGTGGGATGG 5640
GTGGGATGGG GTGGGCGAGA ATGGCCCGCC ACCGGATCGC GCCGGACGGG GGGGCCCGGG 5700
GTTGGGCAAG GTTTGGGCGC AAGGCTCCAG CGGCGATTCG AGAGGCCTGC GGATGGCGGC 5760
CCAGAGCTGG GTATGCTCGG CCGGGGCGGC CGGTATATGT ACGGCGTGCT GGGAGGGGCG 5820
GCGTCGGGCC CCGCCCACGG TCCGCCACGC CCCGCGCGTC ATCGGCAGGG GGCGTGGCCG 5880
CCCTTCTAAA AAAAGTGAGA ACGCGAAGCG TTCGCACTTT GTCCTAATAG TATATATATT 5940
ATTAGGACAA AGTGCGAACG CTTCGCGTTC TCACTTTTTT TAGAAGGGCG GCCACGCCCC 6000
CTTTGACGTC ACGCTCACCC GGGCGGCCGG CCGCCCATAA GCGCGGCCTG CCGGGCCGAT 6060
AAAAAGAAAC CGCGGCGCCC CCGCGGACAC CACACACTGG CTCTCGAACC CCGGACGCGC 6120
AGAAGGGACC CGGGCGCGGG TCCGCCGGTA AGAGCCGGGG GGAACATCGG CACCGCCATC 6180
CCACCCCGAG CTGTTGGGTG GGCGGGTGGG GGGGCTGGTG AGGCGGTGGT GGGAGGGGGC 6240
GGCGTATAGC AGGACAACGA CCGGCGGCGA TGTTTTGTGC CGCGGGCGGC CCGACTTCCC 6300
CCGGGGGGAA GTCGGCGGCT CGGGCGGCGT CTGGGTTTTT TGCCCCCCAC AACCCCCGGG 6360
GAGCCACCCA GACGGCACCG CCGCCTTGCC GCCGGCAGAA CTTCTACAAC CCCCACCTCG 6420
CTCAGACCGG AACGCAGCCA AAGGCCCCCG GGCCGGCTCA GCGCCATACG TACTACAGCG 6480
AGTGCGACGA ATTTCGATTT ATCGCCCCGC GTTCGCTGGA CGAGGACGCC CCCGCGGAGC 6540
AGCGCACCGG GGTCCACGAC GGCCGCCTCC GGCGCGCCCC TAAGGTGTAC TGCGGGGGGG 6600
ACGAGCGCGA CGTCCTCCGC GTGGGCCCGG AGGGCTTCTG GCCGCGTCGC TTGCGCCTGT 6660
GGGGCGGTGC GGACCATGCC CCCGAGGGGT TCGACCCCAC CGTCACCGTC TTCCACGTGT 6720
ACGACATCCT GGAGCACGTG GAACACGCGT ACAGCATGCG CGCCGCCCAG CTCCACGAGC 6780
GATTTATGGA CGCCATCACG CCCGCCGGGA CCGTCATCAC GCTTCTGGGT CTGACCCCCG 6840
AAGGCCATCG CGTCGCCGTT CACGTCTACG GCACGCGGCA GTACTTTTAC ATGAACAAGG 6900
CAGAGGTGGA TCGGCACCTG CAGTGCCGTG CCCCGCGCGA TCTCTGCGAG CGCCTGGCGG 6960
CGGCCCTGCG CGAGTCGCCG GGGGCGTCGT TCCGCGGCAT CTCCGCGGAC CACTTCGAGG 7020
CGGAGGTGGT GGAGCGCGCC GACGTGTACT ATTACGAAAC GCGCCCGACC CTGTACTACC 7080
GCGTCTTCGT GCGAAGCGGG CGCGCGCTGG CCTACCTGTG CGACAACTTT TGCCCCGCGA 7140
TCAGGAAGTA CGAGGGGGGC GTCGACGCCA CCACCCGGTT TATCCTGGAC AACCCGGGGT 7200
TTGTCACCTT CGGCTGGTAC CGCCTCAAGC CCGGCCGCGG GAACGCGCCG GCCCAACCGC 7260
GCCCCCCGAC GGCGTTCGGA ACCTCGAGCG ACGTCGAGTT TAACTGCACG GCGGACAACC 7320
TGGCCGTCGA GGGGGCCATG TGTGACCTGC CGGCCTACAA GCTCATGTGC TTCGATATCG 7380
AATGCAAGGC CGGGGGGGAG GACGAGCTGG CCTTTCCGGT CGCGGAACGC CCGGAAGACC 7440
TCGTCATCCA GATCTCCTGT CTGCTCTACG ACCTGTCCAC CACCGCCCTC GAGCACATCC 7500
TCCTGTTTTC GCTCGGATCC TGCGACCTCC CCGAGTCCCA CCTCAGCGAT CTCGCCTCCA 7560
GGGGCCTGCC GGCCCCCGTC GTCCTGGAGT TTGACAGCGA ATTCGAGATG CTGCTGGCCT 7620
TCATGACCTT CGTCAAGCAG TACGGCCCCG AGTTCGTGAC CGGGTACAAC ATCATCAACT 7680
TCGACTGGCC CTTCGTCCTG ACCAAGCTGA CGGAGATCTA CAAGGTCCCG CTCGACGGGT 7740
ACGGGCGCAT GAACGGCCGG GGTGTGTTCC GCGTGTGGGA CATCGGCCAG AGCCACTTTC 7800
AGAAGCGCAG CAAGATCAAG GTGAACGGGA TGGTGAACAT CGACATGTAC GGCATCATCA 7860
CCGACAAGGT CAAACTCTCC AGCTACAAGC TGAACGCCGT CGCCGAGGCC GTCTTGAAGG 7920
ACAAGAAGAA GGATCTGAGC TACCGCGACA TCCCCGCCTA CTACGCCTCC GGGCCCGCGC 7980
AGCGCGGGGT GATCGGCGAG TATTGTGTGC AGGACTCGCT GCTGGTCGGG CAGCTGTTCT 8040
TCAAGTTTCT GCCGCACCTG GAGCTTTCCG CCGTCGCGCG CCTGGCGGGC ATCAACATCA 8100
CCCGCACCAT CTACGACGGC CAGCAGATCC GCGTCTTCAC GTGCCTCCTG CGCCTTGCGG 8160
GCCAGAAGGG CTTCATCCTG CCGGACACCC AGGGGCGGTT TCGGGGCCTC GACAAGGAGG 8220
CGCCCAAGCG CCCGGCCGTG CCTCGGGGGG AAGGGGAGCG GCCGGGGGAC GGGAACGGGG 8280
ACGAGGATAA GGACGACGAC GAGGACGGGG ACGAGGACGG GGACGAGCGC GAGGAGGTCG 8340
CGCGCGAGAC CGGGGGCCGG CACGTTGGGT ACCAGGGGGC CCGGGTCCTC GACCCCACCT 8400
CCGGGTTTCA CGTCGACCCC GTGGTGGTGT TTGACTTTGC CAGCCTGTAC CCCAGCATCA 8460
TCCAGGCCCA CAACCTGTGC TTCAGTACGC TCTCCCTGCG GCCCGAGGCC GTCGCGCACC 8520
TGGAGGCGGA CCGGGACTAC CTGGAGATCG AGGTGGGGGG CCGACGGCTG TTCTTCGTGA 8580
AGGCCCACGT ACGCGAGAGC CTGCTGAGCA TCCTGCTGCG CGACTGGCTG GCCATGCGAA 8640
AGCAGATCCG CTCGCGGATC CCCCAGAGCA CCCCCGAGGA GGCCGTCCTC CTCGACAAGC 8700
AACAGGCCGC CATCAAGGTG GTGTGCAACT CGGTGTACGG GTTCACCGGG GTGCAGCACG 8760
GTCTTCTGCC CTGCCTGCAC GTGGCCGCCA CCGTGACGAC CATCGGCCGC GAGATGCTCC 8820
TCGCGACGCG CGCGTACGTG CACGCGCGCT GGGCGGAGTT CGATCAGCTG CTGGCCGACT 8880
TTCCGGAGGC GGCCGGCATG CGCGCCCCCG GTCCGTACTC CATGCGCATC ATCTACGGGG 8940
ACACGGACTC CATTTTCGTT TTGTGCCGCG GCCTCACGGC CGCGGGCCTG GTGGCCATGG 9000
GCGACAAGAT GGCGAGCCAC ATCTCGCGCG CGCTGTTCCT CCCCCCGATC AAGCTCGAGT 9060
GCGAAAAAAC GTTCACCAAG CTGCTGCTCA TCGCCAAGAA AAAGTACATC GGCGTCATCT 9120
GCGGGGGCAA GATGCTCATC AAGGGCGTGG ATCTGGTGCG CAAAAACAAC TGCGCGTTTA 9180
TCAACCGCAC CTCCAGGGCC CTGGTCGACC TGCTGTTTTA CGACGATACC GTATCCGGAG 9240
CGGCCGCCGC GTTAGCCGAG CGCCCCGCAG AGGAGTGGCT GGCGCGACCC CTGCCCGAGG 9300
GACTGCAGGC GTTCGGGGCC GTCCTCGTAG ACGCCCATCG GCGCATCACC GACCCGGAGA 9360
GGGACATCCA GGACTTTGTC CTCACCGCCG AACTGAGCAG ACACCCGCGC GCGTACACCA 9420
ACAAGCGCCT GGCCCACCTG ACGGTGTATT ACAAGCTCAT GGCCCGCCGC GCGCAGGTCC 9480
CGTCCATCAA GGACCGGATC CCGTACGTGA TCGTGGCCCA GACCCGCGAG GTAGAGGAGA 9540
CGGTCGCGCG GCTGGCCGCC CTCCGCGAGC TAGACGCCGC CGCCCCAGGG GACGAGCCCG 9600
CCCCCCCAGC GGCCCTGCCC TCCCCGGCCA AGCGCCCCCG GGAGACGCCG TCGCATGCCG 9660
ACCCCCCGGG AGGCGCGTCC AAGCCCCGCA AGCTGCTGGT GTCCGAGCTG GCGGAGGATC 9720
CCGGGTACGC CATCGCCCGG GGCGTTCCGC TCAACACGGA CTATTACTTC TCGCACCTGC 9780
TGGGGGCGGC CTGCGTGACG TTCAAGGCCC TGTTTGGAAA TAACGCCAAG ATCACCGAGA 9840
GTCTGTTAAA GAGGTTTATT CCCGAGACGT GGCACCCCCC GGACGACGTG GCCGCGCGGC 9900
TCAGGGCCGC GGGGTTCGGG CCGGCGGGGG CCGGCGCTAC GGCGGAGGAA ACTCGTCGAA 9960
TGTTGCATAG AGCCTTTGAT ACTCTAGCAT GAGCCCCCCG TCGAAGCTGA TGTCCCGCAT 10020
CTTGCAATAA ATGTCTGCGG CCGACACGGT CGGAATTTCC GCGTCCGCTG GTTTCTCTGC 10080
GTTGCGTCTG ACCACGAGCA CAAACGTGCT CTGCCACACG TGGGCGGCGA ACCGGTAGCC 10140
GGGGCACGCG GTCAGCATCC GATCGATGAG CCGGTAGTGC AGGTGGGCCG ACGTGCCGGG 10200
GAAGATGACG TACAGCATGT GGCCCCCGTA CGTGGGGTCC GGGTAAAAAA GAAACCGGGG 10260
GTCGCACGCC CCCCCTCCGC GCAGGATCGT GTGCACGAAA AAGAGCTCGG GCTGGCCGAG 10320
CGTATCGGCC AGGAGGTCCT GGAGGGGGGT GCTGTGGCGG TCGGCCAGCA CGACCAGGGA 10380
GGCCAGAAAG GTGCGGTGCT CAAAGATCGT ATTGATCTGC TGCACGAAGG CCAGGATGAG 10440
GGCCTCGCGG CTGACGGTGG CCAGCCGCCC GTCGCCCGCG CTGCACGCGG GGCAGCAGCC 10500
CCCGATCCCC AGGTAGTAGC CCATGCCCGA GAGGGTCAGG CAGTTGTCGG CCACGGTCTG 10560
GTCCAGGCTG AAGGGGAGCG ACACGGGGGT CGTCTTCACC AGGGGCACGG ATAGCGAGCG 10620
CACGATGGCG ATCTCCTCGG AGGGCGTCTG GGCGAGGGCG GCGAAGAAGC CGCGGTAGCG 10680
ACGGCGCTCG TGCAGGCAGA GCTCCAGCCT GCGCGCGTGC GACGGCAGGC TCTTGCGGGA 10740
GGCCCGGCGC TCCACGCCGG GGTTCCCGGC GGCGGAAAAG CGCGACCGCC GCCGGGTCTT 10800
GTCGCGGCCG GGCCCGGGCC GGGAGCCGGA GCGACGGGGG GCGATGTCAT ACATAGGTAC 10860
AGAGGGTGTG CTCCAGGGAC AGGAGAGAGA TCGAGTGTCG TCTGAGCAGC GCGCCGGCCT 10920
CGCGGACAAA TGTGGCCAGC GCGGTGGGCT TCGGCACAAA TACCTGGTAC GTCTTGAAGG 10980
TGTAGATGAG GGCCCGCAGG GCTATACAGA CCCGCCCCTC GAACTCGTTG CCGCAGGCCA 11040
ACTTGGCCTT GTGAAGCTGC AGCTCGTCGC GATGGTCGGC GCGGGGGTGG CCAAACAGGA 11100
CCCAGGGGTC GACTTCCATC TCCGTGATGG CGCACATGGG ATCGCAGAAC ATGTGCTTGA 11160
AGATGGCCTC GGGGCCCGCG GCCCGAAGCA GGCTCACGAA CCGGCCCCCG TCCCCGGGCT 11220
GCGCCTCGGG GTCCGCCTCG AGCTGGTCCA CGACCGGCAC TATGCAGTCG AAGAGGCTGG 11280
TGTTGTTCTC CGAGTAGCGG ACGACGGACG CCCTCAGGCG TCGCATGGCC AGCCAGTAGG 11340
CCCGCACCAG CAACAGATTG CACAGCAGGC ATTCCCCGCC GGTGCGCCCG CGGCCCCGGC 11400
CGTGCTTCAG CACGGTGGCC ATCAGCGGGC CCAGGTCCAG GTCGGGCTGG GCCTTGGGCT 11460
CGGCGAACTG CGCAAAGCGC GGGGCCGCGT CGCGCATGCG CGCCCCGCGG TGCGCTTCCC 11520
AGGACTCGCT GACCGCGGCG CGGCGGGCGT CCGCGGCGGC GCGCAGCCGG GGCCCCGACT 11580
CCCAGACGGC GGGGGTGCCG GCGAGCAGCA GCAGGATCAG GTCGGCGTAC GCCCACGTCT 11640
CCGGCTCACC CCCCTGCGCC AGCGCCCCGG CGGCGGCCTC GAACTCCCCG TTGCGGGCGG 11700
CGGCGCGCGT GCAGCAGCTG TCTCCGCCCC CGCGCTTGCC CTCGGTGCAG TCGAGCAGGC 11760
GGGCGCAGTC CTTCCAGTTC ATCAGGGCGG TGGTGAGGGA GGGTTGCGTT CCCGAGCCCC 11820
CGCCCGCCCC CGCCCCGTCA TCGCCCCCGG AGGCCAGGGT CCCGATGAGG GCCCGGGTTG 11880
CGGACTGCGC GAGGAAGGAA TAGTTGGAGT ACTGCACCTT GGCGGCGCCC GGGGAGGGCG 11940
TCGGCCTGGG TTGCTTCTGG GCGTGGCGCC CGGGCACCCC GCCGTCGGTC CGGAAGCAGC 12000
AGTGGAGAAA GAAATGCCGG TGGATGTCGT TGATGGTCAG GGCGAAGCGC GCGAAGGAGC 12060
CGACAAGGGT CGCCTTCTTG GTGCGCAGGA AGTGGTGGTC CATGACGTAG ACGAACTCGA 12120
AGGCGGCCAC GAAGATGCTC GCGGCGCAGT GGGGCGCGCC CAGGCACTTG GCGCAGAGGA 12180
ACGCGTAATC GGCCACCCAC TGGGGCGAGA GGCGGTAGGC CTGCTTGTAC AGCTCGATGG 12240
TGCGGCAGAC CAGACAGGGG CGGTCCAGCG CGAAGGTGTC GACGGACGCC GCGGCGAAGG 12300
GCCCCGTGTC CAAGAGTCCC TCTGCCGTGG GGTCTGCGGG CGGGCCGCGG GCGGACCCCG 12360
GCCCCCGCCC CCCCGAAGCC TCGCGCGCGG CCCCGCGCGG CCGCGGGGGG GCGGGCGCGA 12420
CGTCGCTCTC CACGTCCTCG TCGAGCGCGC TCGCGGGCGG CACGCCTACC ACGTGACAGG 12480
CCGCCAGGAG CTCGGCGCAC AGGGCCTCGT TAAGAGCCAG AAGGTCGGGA TCGAAGGCCA 12540
CATACGGACG CTCGAACGCG CCCTCCTTCC AGCTGCTGCC CGGCGACTCT TCGCGCACGG 12600
CGGCGCTCGA CGGCACCCCC GGGGCGGACG TCGCCATGGC CGGTCGAGCG GGGCGCACGC 12660
GTCCGCGAAC GTTACGGGAC GCGATCCCCG ACTGCGCGCT GCGGTCCCAG ACCCTGGAAA 12720
GTCTAGACGC GCGCTACGTC TCGCGAGACG GCGCGGGGGA CGCGGCCGTC TGGTTCGAGG 12780
ACATGACCCC CGCCGAACTA GAGGTTATAT TCCCGACCAC GGACGCCAAG CTGAACTACC 12840
TCTCGCGGAC GCAGCGGCTG GCCTCCCTCC TGACGTACGC CGGGCCTATA AAAGCGCCCG 12900
ACGGCCCCGC CGCCCCACAT ACGCAGGACA CCGCGTGCGT GCACGGCGAG CTGCTCGCCC 12960
GAAAGCGCGA ACGGTTCGCG GCGGTCATTA ACCGGTTCCT GGACCTGCAC CAGATCCTGC 13020
GGGGCTGACG CGCGCTTCGG CGGGGCACCG GCACCGGGAC CGACTTGTTT TACATAACAG 13080
TAGGGGGTGG GGGAACGCGC ACCCTTGCCC GGTCGCGATG GCGGGGATGG GGAAGCCCTA 13140
CGGCGGCCGC CCGGGGGACG CGTTCGAGGG TCTCGTTCAG CGCATCAGGC TCATTGTTCC 13200
CACCACGCTG CGCGGCGGGG GTGGGGAGTC GGGCCCCTAC TCGCCATCCA ACCCGCCCTC 13260
GAGATGTGCC TTCCAGTTCC ACGGCCAGGA TGGGTCCGAC GAGGCCTTCC CGATCGAGTA 13320
CGTCCTGCGG CTCATGAACG ACTGGGCCGA TGTGCCCTGC AACCCCTACC TGCGCGTGCA 13380
GAACACCGGC GTTTCGGTGC TGTTTCAGGG GTTTTTTAAC CGGCCCCACG GCGCCCCGGG 13440
GGGCGCGATC ACGGCGGAGC AGACCAAGGT GATTCTGCAC TCCACCGAGA CGACGGGACT 13500
GTCCCTCGGA GACCTGGACG ACGTCAAGGG GCGCCTCGGC CTGGACGCCC GGCCGATGAT 13560
GGCCAGCATG TGGATCAGCT GCTTTGTGCG CATGCCCCGG GTGCAGCTCG CGTTTCGGTT 13620
CATGGGCCCC GAGGACGCCG TTCGCACGCG GCGGATCCTG TGTCGCGCCG CCGAGCAGGC 13680
CCTCGCCCGT CGCCGCCGGT CCAGGCGGTC CCAGGATGAC TACGGGGCGG TGGCGGTGGC 13740
GGCGGCGCAC CACTCTTCCG GAGCGCCCGG GCCGGGGGTC GCCGCCTCGG GCCCGCCAGC 13800
GCCGCCCGGA CGGGGACCGG CCCGTCCGTG GCATCAGGCC GTGCAGTTGT TCCGGGCCCC 13860
GCGTCCGGGC CCCCCGGCGC TTCTGTTGCT GGTGGCGGGG CTGTTTCTGG GGGCCGCTAT 13920
CTGGTGGGCG GTTGGCGCGC GCCTATGAAA GGGGGCGAGC CACCGTCCCG CCCGCCAGTG 13980
CATCCCAGAC GCCCGCGAGC CGCACATCCC CTCCGCTCCC GCCTCCGGCC CGATTCTTAC 14040
GGCGCGACCC AAGGTCCCGA TGGCCGCCCC GCAGTTTCAC CGCCCCAGCA CCATTACCGC 14100
CGACAACGTC CGGGCGCTCG GCATGCGCGG GCTCGTGTTG GCCACCAACA ACGCTCAGTT 14160
CATCATGGAT AACAGCTACC CGCATCCGCA CGGAACGCAG GGTGCGGTGC GAGAGTTTCT 14220
TCGCGGGCAG GCCGCGGCGC TGACGGACCT CGGGGTGACC CACGCCAACA ACACGTTCGC 14280
CCCGCAGCCT ATGTTCGCGG GCGACGCCGC GGCCGAATGG CTGCGGCCCT CGTTCGGTCT 14340
TAAGCGCACG TATTCCCCCT TTGTCGTTCG CGACCCCAAG ACCCCCAGCA CCCCGTGAGT 14400
CCTCGGCGGG TCCCTCCGCG GCCGTCTCTC GTTGCCCCCC CTTTCCCCCT TCCCGGGTGG 14460
TTCAATAAAA AACACCAACA TACGATATTC GCGTTTGATA CGTTTATTGG GGGGGTGTAG 14520
GGCCCAACGA TCGGCGATTA ACAACACCAA ACAATCGAGC GCGTCTAACC CAGTAACATG 14580
CGCACGTGAT GTAGGCTGGT CAGCACGGCG TTGCTGCGCT GAAACAGCGC CCTGCGGGTC 14640
CGCTGCAGCT GTTGTTGTAT GCGGCGGCAT GCGCGGATCA AAACCGCCAG GGCGCTACGA 14700
CCGGTGCTTC GTACGTAGCG TCGCGACAAG ACGGCATTTG CCTGTACGGG CAAGGGGCCA 14760
AATTGCGAGT GTGGTGACTG GAGGTGGTCG GCGGCCAATG GGCCGGGTGG TTCGTCGGCG 14820
GGGGGCAAGT GCGGTTCCGG TGGGAGGGGG TCGAGCGCCT CGGTATCATC CGAGTCCGAG 14880
AAACGCAGGG AGTCTGCGTC GGAGTGTTCA TCATCGGAGG AGATGT 14927
(2) INFORMATION FOR SEQ ID NO: 131:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 495 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:
Met Ala Ala Ala Pro Pro Ala Ala Val Ser Glu Pro Thr Ala Ala Arg
1 5 10 15
Gln Lys Leu Leu Ala Leu Leu Gly Gln Val Gln Thr Tyr Val Phe Gln
20 25 30
Leu Glu Leu Leu Arg Arg Cys Asp Pro Gln Ile Gly Leu Gly Lys Leu
35 40 45
Ala Gln Leu Lys Leu Asn Ala Leu Gln Val Arg Val Leu Arg Arg His
50 55 60
Leu Arg Pro Gly Leu Glu Ala Gln Ala Ala Ala Phe Leu Thr Pro Leu
65 70 75 80
Ser Val Thr Leu Glu Leu Leu Leu Glu Tyr Ala Trp Arg Glu Gly Glu
85 90 95
Arg Leu Leu Gly His Leu Glu Thr Phe Ala Thr Thr Gly Asp Val Ser
100 105 110
Ala Phe Phe Thr Glu Thr Met Gly Leu Ala Arg Pro Cys Pro Tyr His
115 120 125
Gln Gln Ile Arg Leu Glu Thr Tyr Gly Gly Asp Val Arg Met Glu Leu
130 135 140
Cys Phe Leu His Asp Val Glu Asn Phe Leu Lys Gln Leu Asn Tyr Cys
145 150 155 160
His Leu Ile Thr Pro Pro Ser Gly Ala Thr Ala Ala Leu Glu Arg Val
165 170 175
Arg Glu Phe Met Val Ala Ala Val Gly Ser Gly Leu Ile Val Pro Pro
180 185 190
Glu Leu Ser Asp Pro Ser His Pro Cys Ala Val Cys Phe Glu Glu Leu
195 200 205
Cys Val Thr Ala Asn Gln Gly Ala Thr Ile Ala Arg Arg Leu Ala Asp
210 215 220
Arg Ile Cys Asn His Val Thr Gln Gln Ala Gln Val Arg Leu Asp Ala
225 230 235 240
Asn Glu Leu Arg Arg Tyr Leu Pro His Ala Ala Gly Leu Ser Asp Ala
245 250 255
Ala Arg Ala Arg Ala Leu Cys Val Leu Asp Gln Ala Arg Thr Ala Ala
260 265 270
Gly Gly Gly Ala Arg Ala Gly Pro Pro Pro Ala Asp Ser Ser Ser Val
275 280 285
Arg Glu Glu Ala Asp Ala Leu Leu Glu Ala His Asp Val Phe Gln Ala
290 295 300
Thr Thr Pro Gly Ala Ile Ser Glu Leu Arg Phe Trp Leu Ala Ser Gly
305 310 315 320
Asp Arg Ala Arg His Ser Thr Met Asp Ala Phe Ala Asp Asn Leu Asn
325 330 335
Ala Gln Arg Glu Leu Gln Gln Glu Thr Ala Ala Val Ala Val Glu Leu
340 345 350
Ala Leu Phe Gly Arg Arg Ala Glu His Phe Asp Arg Ala Phe Gly Gly
355 360 365
His Leu Ala Ala Leu Asp Met Val Asp Ala Leu Ile Ile Gly Gly Gln
370 375 380
Ala Thr Ser Pro Asp Asp Gln Ile Glu Ala Leu Ile Arg Ala Cys Tyr
385 390 395 400
Asp His His Leu Thr Thr Pro Leu Leu Arg Arg Leu Val Ser Pro Glu
405 410 415
Gln Cys Asp Glu Glu Ala Leu Arg Arg Val Leu Ala Arg Leu Gly Ala
420 425 430
Gly Gly Ala Thr Gly Gly Ala Glu Glu Glu Glu Pro Arg Ala Ala Ala
435 440 445
Glu Glu Gly Gly Arg Arg Arg Gly Ala Gly Thr Pro Ala Ser Glu Asp
450 455 460
Gly Glu Arg Gly Pro Glu Pro Gly Ala Gln Gly Pro Glu Ser Trp Gly
465 470 475 480
Asp Ile Ala Thr Arg Ala Ala Ala Asp Val Xaa Xaa Xaa Xaa Xaa
485 490 495
(2) INFORMATION FOR SEQ ID NO: 132:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1186 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:
Met Asp Thr Lys Pro Lys Thr Thr Thr Thr Val Lys Val Pro Pro Gly
1 5 10 15
Pro Met Gly Tyr Val Tyr Gly Arg Ala Cys Pro Ala Glu Gly Leu Glu
20 25 30
Leu Leu Ser Leu Leu Ser Ala Arg Ser Gly Asp Ala Asp Val Ala Val
35 40 45
Ala Pro Leu Ile Val Gly Leu Thr Val Glu Ser Gly Phe Glu Ala Asn
50 55 60
Val Ala Ala Val Val Gly Ser Arg Thr Thr Gly Leu Gly Gly Thr Ala
65 70 75 80
Val Ser Leu Lys Leu Met Pro Ser His Tyr Ser Pro Ser Val Tyr Val
85 90 95
Phe His Gly Gly Arg His Leu Ala Pro Ser Thr Gln Ala Pro Asn Leu
100 105 110
Thr Arg Leu Cys Glu Arg Ala Arg Arg His Phe Gly Phe Ser Asp Tyr
115 120 125
Ala Pro Arg Pro Cys Asp Leu Lys His Glu Thr Thr Gly Asp Ala Leu
130 135 140
Cys Glu Arg Leu Gly Leu Asp Pro Asp Arg Ala Leu Leu Tyr Leu Val
145 150 155 160
Ile Thr Glu Gly Phe Arg Glu Ala Val Cys Ile Ser Asn Thr Phe Leu
165 170 175
His Leu Gly Gly Met Asp Lys Val Thr Ile Gly Asp Ala Glu Val His
180 185 190
Arg Ile Pro Val Tyr Pro Leu Gln Met Phe Met Pro Asp Phe Ser Arg
195 200 205
Val Ile Ala Asp Pro Phe Asn Cys Asn His Arg Ser Ile Gly Glu Asn
210 215 220
Phe Asn Tyr Pro Leu Pro Phe Phe Asn Arg Pro Leu Ala Arg Leu Leu
225 230 235 240
Phe Glu Ala Val Val Gly Pro Ala Ala Val Arg Ala Arg Asn Val Asp
245 250 255
Ala Val Ala Arg Ala Ala Ala His Leu Ala Phe Asp Glu Asn His Glu
260 265 270
Gly Ala Ala Leu Pro Ala Asp Ile Thr Phe Thr Ala Phe Glu Ala Ser
275 280 285
Gln Gly Lys Pro Gln Arg Gly Ala Arg Asp Ala Gly Asn Lys Gly Pro
290 295 300
Ala Gly Gly Phe Glu Gln Arg Leu Ala Ser Val Met Ala Gly Asp Ala
305 310 315 320
Ala Leu Glu Ser Ile Val Ser Met Ala Val Phe Asp Glu Pro Pro Pro
325 330 335
Asp Ile Thr Thr Trp Pro Leu Leu Glu Gly Gln Glu Thr Pro Ala Ala
340 345 350
Arg Ala Gly Ala Val Gly Ala Tyr Leu Ala Arg Ala Ala Gly Leu Val
355 360 365
Gly Ala Met Val Phe Ser Thr Asn Ser Ala Leu His Leu Thr Glu Val
370 375 380
Asp Asp Ala Gly Pro Ala Asp Pro Lys Asp His Ser Lys Pro Ser Phe
385 390 395 400
Tyr Arg Phe Phe Leu Val Pro Gly Thr His Val Ala Ala Asn Pro Gln
405 410 415
Leu Asp Arg Glu Gly His Val Val Pro Gly Tyr Glu Gly Arg Pro Thr
420 425 430
Ala Pro Leu Val Gly Gly Thr Gln Glu Phe Ala Gly Glu His Leu Ala
435 440 445
Met Leu Cys Gly Phe Ser Pro Ala Leu Leu Ala Lys Met Leu Phe Tyr
450 455 460
Leu Glu Arg Cys Asp Gly Gly Val Ile Val Gly Arg Gln Glu Met Asp
465 470 475 480
Val Phe Arg Tyr Val Ala Asp Ser Gly Gln Thr Asp Val Pro Cys Asn
485 490 495
Leu Cys Thr Phe Glu Thr Arg His Ala Cys Ala His Thr Thr Leu Met
500 505 510
Arg Leu Arg Ala Arg His Pro Lys Phe Ala Ser Ala Arg Ala Ile Gly
515 520 525
Val Phe Gly Thr Met Asn Ser Ala Tyr Ser Asp Cys Asp Val Leu Gly
530 535 540
Asn Tyr Ala Ala Phe Ser Ala Leu Lys Arg Ala Asp Gly Ser Glu Asn
545 550 555 560
Thr Arg Thr Ile Met Gln Glu Tyr Ala Ala Thr Glu Arg Val Met Ala
565 570 575
Glu Leu Glu Ala Leu Gln Tyr Val Asp Gln Ala Val Pro Thr Ala Leu
580 585 590
Gly Arg Leu Glu Thr Ile Ile Gly Thr Arg Glu Ala Leu His Thr Val
595 600 605
Val Asn Asn Ile Lys Gln Leu Val Asp Arg Glu Val Glu Gln Leu Met
610 615 620
Arg Asn Leu Ile Glu Gly Arg Asn Phe Lys Phe Arg Asp Gly Leu Ala
625 630 635 640
Glu Ala Asn His Ala Met Ser Leu Ser Leu Asp Pro Tyr Thr Cys Gly
645 650 655
Pro Cys Pro Leu Leu Gln Leu Leu Ala Arg Arg Ser Asn Leu Ala Val
660 665 670
Tyr Gln Asp Leu Ala Leu Ser Gln Cys His Gly Val Phe Ala Gly Gln
675 680 685
Ser Val Glu Gly Arg Asn Phe Arg Asn Gln Phe Gln Pro Val Leu Arg
690 695 700
Arg Arg Val Met Asp Leu Phe Asn Asn Gly Phe Leu Ser Ala Lys Thr
705 710 715 720
Leu Thr Val Ser Glu Gly Ala Ala Ile Cys Ala Pro Ser Leu Thr Ala
725 730 735
Gly Gln Thr Ala Pro Ala Glu Ser Ser Phe Glu Gly Asp Val Ala Arg
740 745 750
Val Thr Leu Gly Phe Pro Lys Glu Leu Arg Val Lys Ser Arg Val Leu
755 760 765
Phe Ala Gly Ala Ser Ala Asn Ala Ser Glu Ala Ala Lys Ala Arg Val
770 775 780
Ala Ser Leu Gln Ser Ala Tyr Gln Lys Pro Asp Lys Arg Val Asp Ile
785 790 795 800
Leu Leu Gly Pro Leu Gly Phe Leu Leu Lys Gln Phe His Ala Val Ile
805 810 815
Phe Pro Asn Gly Lys Pro Pro Gly Ser Asn Gln Pro Asn Pro Gln Trp
820 825 830
Phe Trp Thr Ala Leu Gln Arg Asn Gln Leu Pro Ala Arg Leu Leu Ser
835 840 845
Arg Glu Asp Ile Glu Thr Ile Ala Phe Ile Lys Arg Phe Ser Leu Asp
850 855 860
Tyr Gly Ala Ile Asn Phe Ile Asn Leu Ala Pro Asn Asn Val Ser Glu
865 870 875 880
Leu Ala Met Tyr Tyr Met Ala Asn Gln Ile Leu Arg Tyr Cys Asp His
885 890 895
Ser Thr Tyr Phe Ile Asn Thr Leu Thr Ala Val Ile Ala Gly Ser Arg
900 905 910
Arg Pro Pro Gly Val Gln Ala Ala Ala Ala Trp Ala Pro Gln Gly Gly
915 920 925
Ala Gly Leu Glu Ala Gly Ala Arg Ala Leu Met Asp Ser Leu Asp Ala
930 935 940
His Pro Gly Ala Trp Thr Ser Met Phe Ala Ser Cys Asn Leu Leu Arg
945 950 955 960
Pro Val Met Ala Ala Arg Pro Met Val Val Leu Gly Leu Ser Ile Ser
965 970 975
Lys Tyr Tyr Gly Met Ala Gly Asn Asp Arg Val Phe Gln Ala Gly Asn
980 985 990
Trp Ala Ser Leu Leu Gly Gly Lys Asn Ala Cys Pro Leu Leu Ile Phe
995 1000 1005
Asp Arg Thr Arg Lys Phe Val Leu Ala Cys Pro Arg Ala Gly Phe Val
1010 1015 1020
Cys Ala Ala Ser Ser Leu Gly Gly Gly Ala His Glu His Ser Leu Cys
1025 1030 1035 1040
Glu Gln Leu Arg Gly Ile Ile Ala Glu Gly Gly Ala Ala Val Ala Ser
1045 1050 1055
Ser Val Phe Val Ala Thr Val Lys Ser Leu Gly Pro Arg Thr Gln Gln
1060 1065 1070
Leu Gln Ile Glu Asp Trp Leu Ala Leu Leu Glu Asp Glu Tyr Leu Ser
1075 1080 1085
Glu Glu Met Met Glu Phe Thr Thr Arg Ala Leu Glu Arg Gly His Gly
1090 1095 1100
Glu Trp Ser Thr Asp Ala Ala Leu Glu Val Ala His Glu Ala Glu Ala
1105 1110 1115 1120
Leu Val Ser Gln Leu Gly Ala Ala Gly Glu Val Phe Asn Phe Gly Asp
1125 1130 1135
Phe Gly Asp Glu Asp Asp His Ala Ala Ser Phe Gly Gly Leu Ala Ala
1140 1145 1150
Ala Ala Gly Ala Ala Gly Val Ala Arg Lys Arg Ala Phe His Gly Asp
1155 1160 1165
Asp Pro Phe Gly Glu Gly Pro Pro Glu Lys Lys Asp Leu Thr Leu Asp
1170 1175 1180
Met Leu
1185
(2) INFORMATION FOR SEQ ID NO: 133:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1228 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:
Met Phe Cys Ala Ala Gly Gly Pro Thr Ser Pro Gly Gly Lys Ser Ala
1 5 10 15
Ala Arg Ala Ala Ser Gly Phe Phe Ala Pro His Asn Pro Arg Gly Ala
20 25 30
Thr Gln Thr Ala Pro Pro Pro Cys Arg Arg Gln Asn Phe Tyr Asn Pro
35 40 45
His Leu Ala Gln Thr Gly Thr Gln Pro Lys Ala Pro Gly Pro Ala Gln
50 55 60
Arg His Thr Tyr Tyr Ser Glu Cys Asp Glu Phe Arg Phe Ile Ala Pro
65 70 75 80
Arg Ser Leu Asp Glu Asp Ala Pro Ala Glu Gln Arg Thr Gly Val His
85 90 95
Asp Gly Arg Leu Arg Arg Ala Pro Lys Val Tyr Cys Gly Gly Asp Glu
100 105 110
Arg Asp Val Leu Arg Val Gly Pro Glu Gly Phe Trp Pro Arg Arg Leu
115 120 125
Arg Leu Trp Gly Gly Ala Asp His Ala Pro Glu Gly Phe Asp Pro Thr
130 135 140
Val Thr Val Phe His Val Tyr Asp Ile His Val Glu His Ala Tyr Ser
145 150 155 160
Met Arg Ala Ala Gln Leu His Glu Arg Phe Met Asp Ala Ile Thr Pro
165 170 175
Ala Gly Thr Val Ile Thr Leu Leu Gly Leu Thr Pro Glu Gly His Arg
180 185 190
Val Ala Val His Val Tyr Gly Thr Arg Gln Tyr Phe Tyr Met Asn Lys
195 200 205
Ala Glu Val Asp Arg His Leu Gln Cys Arg Ala Pro Arg Asp Leu Cys
210 215 220
Glu Arg Leu Ala Ala Ala Leu Arg Glu Ser Pro Gly Ala Ser Phe Arg
225 230 235 240
Gly Ile Ser Ala Asp His Phe Glu Ala Glu Val Val Glu Arg Ala Asp
245 250 255
Val Tyr Tyr Tyr Glu Trp Thr Leu Tyr Tyr Arg Val Phe Val Arg Ser
260 265 270
Gly Arg Ala Tyr Leu Cys Asp Asn Phe Cys Pro Ala Ile Arg Lys Tyr
275 280 285
Glu Gly Gly Val Asp Ala Thr Thr Arg Phe Ile Leu Asp Asn Pro Gly
290 295 300
Phe Val Thr Phe Gly Trp Tyr Arg Leu Lys Pro Gly Arg Gly Asn Ala
305 310 315 320
Pro Ala Gln Pro Arg Pro Pro Thr Ala Phe Gly Thr Ser Ser Asp Val
325 330 335
Glu Phe Asn Cys Thr Ala Asp Asn Leu Ala Val Glu Gly Ala Met Cys
340 345 350
Asp Leu Pro Ala Tyr Lys Leu Met Cys Phe Asp Ile Glu Cys Lys Ala
355 360 365
Gly Gly Glu Asp Glu Leu Ala Phe Pro Val Ala Glu Arg Pro Glu Asp
370 375 380
Leu Val Ile Gln Ile Ser Cys Leu Leu Tyr Asp Leu Ser Thr Thr Ala
385 390 395 400
Leu Glu His Ile Leu Leu Phe Ser Leu Gly Ser Cys Asp Leu Pro Glu
405 410 415
Ser His Leu Ser Asp Leu Ala Ser Arg Gly Leu Pro Ala Pro Val Val
420 425 430
Leu Glu Phe Asp Ser Glu Phe Glu Met Leu Leu Ala Phe Met Thr Phe
435 440 445
Val Lys Gln Tyr Gly Pro Glu Phe Val Thr Gly Tyr Asn Ile Ile Asn
450 455 460
Phe Asp Trp Pro Phe Val Leu Thr Lys Leu Thr Glu Ile Tyr Lys Val
465 470 475 480
Pro Leu Asp Gly Tyr Gly Arg Met Asn Gly Arg Gly Val Phe Arg Val
485 490 495
Trp Asp Ile Gly Gln Ser His Phe Gln Lys Arg Ser Lys Ile Lys Val
500 505 510
Asn Gly Met Val Asn Ile Asp Met Tyr Gly Ile Ile Thr Asp Lys Val
515 520 525
Lys Leu Ser Ser Tyr Lys Leu Asn Ala Val Ala Glu Ala Val Leu Lys
530 535 540
Asp Lys Lys Lys Asp Leu Ser Tyr Arg Asp Ile Pro Ala Tyr Tyr Ala
545 550 555 560
Ser Gly Pro Ala Gln Arg Gly Val Ile Gly Glu Tyr Cys Val Gln Asp
565 570 575
Ser Leu Leu Val Gly Gin Leu Phe Phe Lys Phe Leu Pro His Leu Glu
580 585 590
Leu Ser Ala Val Ala Arg Leu Ala Gly Ile Asn Ile Thr Arg Thr Ile
595 600 605
Tyr Asp Gly Gln Gln Ile Arg Val Phe Thr Cys Leu Leu Arg Leu Ala
610 615 620
Gly Gln Lys Gly Phe Ile Leu Pro Asp Thr Gln Gly Arg Phe Arg Gly
625 630 635 640
Leu Asp Lys Glu Ala Pro Lys Arg Pro Ala Val Pro Arg Gly Glu Gly
645 650 655
Glu Arg Pro Gly Asp Gly Asn Gly Asp Glu Asp Lys Asp Asp Asp Glu
660 665 670
Asp Gly Asp Glu Asp Gly Asp Glu Arg Glu Glu Val Ala Arg Glu Thr
675 680 685
Gly Gly Arg His Val Gly Tyr Gln Gly Ala Arg Val Leu Asp Pro Thr
690 695 700
Ser Gly Phe His Val Asp Pro Val Val Val Phe Asp Phe Ala Ser Leu
705 710 715 720
Tyr Pro Ser Ile Ile Gln Ala His Asn Leu Cys Phe Ser Thr Leu Ser
725 730 735
Leu Arg Pro Glu Ala Val Ala His Leu Glu Ala Asp Arg Asp Tyr Leu
740 745 750
Glu Ile Glu Val Gly Gly Arg Arg Leu Phe Phe Val Lys Ala His Val
755 760 765
Arg Glu Ser Leu Leu Ser Ile Leu Leu Arg Asp Trp Leu Ala Met Arg
770 775 780
Lys Gln Ile Arg Ser Arg Ile Pro Gln Ser Thr Pro Glu Glu Ala Val
785 790 795 800
Leu Leu Asp Lys Gln Gln Ala Ala Ile Lys Val Val Cys Asn Ser Val
805 810 815
Tyr Gly Phe Thr Gly Val Gln His Gly Leu Leu Pro Cys Leu His Val
820 825 830
Ala Ala Thr Val Thr Thr Ile Gly Arg Glu Met Leu Leu Ala Thr Arg
835 840 845
Ala Tyr Val His Ala Arg Trp Ala Glu Phe Asp Gln Leu Leu Ala Asp
850 855 860
Phe Pro Glu Ala Ala Gly Met Arg Ala Pro Gly Pro Tyr Ser Met Arg
865 870 875 880
Ile Ile Tyr Gly Asp Thr Asp Ser Ile Phe Val Leu Cys Arg Gly Leu
885 890 895
Thr Ala Ala Gly Leu Val Ala Met Gly Asp Lys Met Ala Ser His Arg
900 905 910
Ala Leu Phe Leu Pro Pro Ile Lys Leu Glu Cys Glu Lys Thr Phe Thr
915 920 925
Lys Leu Leu Leu Ile Ala Lys Lys Lys Tyr Ile Gly Val Ile Cys Gly
930 935 940
Gly Lys Met Leu Ile Lys Gly Val Asp Leu Val Arg Lys Asn Asn Cys
945 950 955 960
Ala Phe Ile Asn Arg Thr Ser Arg Ala Leu Val Asp Leu Leu Phe Tyr
965 970 975
Asp Asp Thr Val Ser Gly Ala Ala Ala Ala Glu Arg Pro Ala Glu Glu
980 985 990
Trp Leu Ala Arg Pro Leu Pro Glu Gly Leu Gln Ala Phe Gly Ala Val
995 1000 1005
Leu Val Asp Ala His Arg Arg Ile Thr Asp Pro Glu Arg Asp Ile Gln
1010 1015 1020
Asp Phe Val Leu Thr Ala Glu Leu Ser Arg His Pro Arg Ala Tyr Thr
1025 1030 1035 1040
Asn Lys Arg Leu Ala His Leu Thr Val Tyr Tyr Lys Leu Met Ala Arg
1045 1050 1055
Arg Ala Gln Val Pro Ser Ile Lys Asp Arg Ile Pro Tyr Val Ile Val
1060 1065 1070
Ala Gln Thr Arg Glu Val Glu Glu Thr Val Ala Arg Leu Ala Ala Leu
1075 1080 1085
Arg Glu Leu Asp Ala Ala Ala Pro Gly Asp Glu Pro Ala Pro Pro Ala
1090 1095 1100
Ala Leu Pro Ser Pro Ala Lys Arg Pro Arg Glu Thr Pro Ser His Ala
1105 1110 1115 1120
Asp Pro Pro Gly Gly Ala Ser Lys Pro Arg Lys Leu Leu Val Ser Glu
1125 1130 1135
Leu Ala Glu Asp Pro Gly Tyr Ala Ile Arg Val Pro Leu Asn Thr Asp
1140 1145 1150
Tyr Tyr Phe Ser His Leu Leu Gly Ala Ala Cys Val Thr Phe Lys Ala
1155 1160 1165
Leu Phe Gly Asn Asn Ala Lys Ile Thr Glu Ser Leu Leu Lys Arg Phe
1170 1175 1180
Ile Pro Glu Thr Trp His Pro Pro Asp Asp Val Ala Ala Arg Leu Arg
1185 1190 1195 1200
Ala Ala Gly Phe Gly Pro Ala Gly Ala Gly Ala Thr Ala Glu Glu Thr
1205 1210 1215
Arg Arg Met Leu His Arg Ala Phe Asp Thr Leu Ala
1220 1225
(2) INFORMATION FOR SEQ ID NO: 134:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 303 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134:
Met Tyr Asp Ile Ala Pro Arg Arg Ser Gly Ser Arg Pro Gly Pro Gly
1 5 10 15
Arg Asp Lys Thr Arg Arg Arg Ser Arg Phe Ser Ala Ala Gly Asn Pro
20 25 30
Gly Val Glu Arg Arg Ala Ser Arg Lys Ser Leu Pro Ser His Ala Arg
35 40 45
Arg Leu Glu Leu Cys Leu His Glu Arg Arg Arg Tyr Arg Gly Phe Phe
50 55 60
Ala Ala Gln Thr Pro Ser Glu Glu Ile Ala Ile Val Arg Ser Leu Ser
65 70 75 80
Val Pro Leu Val Lys Thr Thr Pro Val Ser Leu Pro Phe Ser Leu Asp
85 90 95
Gln Thr Val Ala Asp Asn Cys Leu Thr Leu Ser Gly Met Gly Tyr Tyr
100 105 110
Leu Gly Ile Gly Gly Cys Cys Pro Ala Cys Ser Ala Gly Asp Gly Arg
115 120 125
Leu Ala Thr Val Ser Arg Glu Ala Leu Ile Leu Ala Phe Val Gln Gln
130 135 140
Ile Asn Thr Ile Phe Glu His Arg Thr Phe Leu Ala Ser Leu Val Val
145 150 155 160
Leu Ala Asp Arg His Ser Thr Pro Leu Gln Asp Leu Leu Ala Asp Thr
165 170 175
Leu Gly Gln Pro Glu Leu Phe Phe Val His Thr Ile Leu Arg Gly Gly
180 185 190
Gly Ala Cys Asp Pro Arg Phe Leu Phe Tyr Pro Asp Pro Thr Tyr Gly
195 200 205
Gly His Met Leu Tyr Val Ile Phe Pro Gly Thr Ser Ala His Leu His
210 215 220
Tyr Arg Leu Ile Asp Arg Met Leu Thr Ala Cys Pro Gly Tyr Arg Phe
225 230 235 240
Ala Ala His Val Trp Gln Ser Thr Phe Val Leu Val Val Arg Arg Asn
245 250 255
Ala Glu Lys Pro Ala Asp Ala Glu Ile Pro Thr Val Ser Ala Ala Asp
260 265 270
Ile Tyr Cys Lys Met Arg Asp Ile Ser Phe Asp Gly Gly Leu Met Leu
275 280 285
Glu Tyr Gln Arg Leu Tyr Ala Thr Phe Asp Glu Phe Pro Pro Pro
290 295 300
(2) INFORMATION FOR SEQ ID NO: 135:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 597 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:
Val Arg Pro Ala Arg Pro Ala Met Ala Thr Ser Ala Pro Gly Val Pro
1 5 10 15
Ser Ser Ala Ala Val Arg Glu Glu Ser Pro Gly Ser Ser Trp Lys Glu
20 25 30
Gly Ala Phe Glu Arg Pro Tyr Val Ala Phe Asp Pro Asp Leu Leu Ala
35 40 45
Leu Asn Glu Ala Leu Cys Ala Glu Leu Leu Ala Ala Cys His Val Val
50 55 60
Gly Val Pro Pro Ala Ser Ala Leu Asp Glu Asp Val Glu Ser Asp Val
65 70 75 80
Ala Pro Ala Pro Pro Arg Pro Arg Gly Ala Ala Arg Glu Ala Ser Gly
85 90 95
Gly Arg Gly Pro Gly Ser Arg Pro Pro Ala Asp Pro Thr Ala Glu Gly
100 105 110
Leu Leu Asp Thr Gly Pro Phe Ala Ala Ala Ser Val Asp Thr Phe Ala
115 120 125
Leu Asp Arg Pro Cys Leu Val Cys Arg Thr Ile Glu Leu Tyr Lys Gln
130 135 140
Ala Tyr Arg Leu Ser Pro Gln Trp Val Ala Asp Tyr Ala Phe Leu Cys
145 150 155 160
Ala Lys Cys Leu Gly Ala Pro His Cys Ala Ala Ser Ile Phe Val Ala
165 170 175
Ala Phe Glu Phe Val Tyr Val Met Asp His His Phe Leu Arg Thr Lys
180 185 190
Lys Ala Thr Leu Val Gly Ser Phe Ala Arg Phe Ala Leu Thr Ile Asn
195 200 205
Asp Ile His Arg His Phe Phe Leu His Cys Cys Phe Arg Thr Asp Gly
210 215 220
Gly Val Pro Gly Arg His Ala Gln Lys Gln Pro Arg Pro Thr Pro Ser
225 230 235 240
Pro Gly Ala Ala Lys Val Gln Tyr Ser Asn Tyr Ser Phe Leu Ala Gln
245 250 255
Ser Ala Thr Arg Ala Leu Ile Gly Thr Leu Ala Ser Gly Gly Asp Asp
260 265 270
Gly Ala Gly Ala Gly Gly Gly Ser Gly Thr Gln Pro Ser Leu Thr Thr
275 280 285
Ala Leu Met Asn Trp Lys Asp Cys Ala Arg Leu Leu Asp Cys Thr Glu
290 295 300
Gly Lys Arg Gly Gly Gly Asp Ser Cys Cys Thr Arg Ala Ala Ala Arg
305 310 315 320
Asn Gly Glu Phe Glu Ala Ala Ala Gly Ala Gln Gly Gly Glu Pro Glu
325 330 335
Thr Trp Ala Tyr Ala Asp Leu Ile Leu Leu Leu Leu Ala Gly Thr Pro
340 345 350
Ala Val Trp Glu Ser Gly Pro Arg Leu Arg Ala Ala Ala Asp Ala Arg
355 360 365
Arg Ala Ala Val Ser Glu Ser Trp Glu Ala His Arg Gly Ala Arg Met
370 375 380
Arg Asp Ala Ala Pro Arg Phe Ala Gln Phe Ala Glu Pro Lys Ala Gln
385 390 395 400
Pro Asp Leu Asp Leu Gly Pro Leu Met Ala Thr Val Leu Lys His Gly
405 410 415
Arg Gly Arg Gly Arg Thr Gly Gly Glu Cys Leu Leu Cys Asn Leu Leu
420 425 430
Leu Val Arg Ala Tyr Trp Leu Ala Met Arg Arg Leu Arg Ala Ser Val
435 440 445
Val Arg Tyr Ser Glu Asn Asn Thr Ser Leu Phe Asp Cys Ile Val Pro
450 455 460
Val Val Asp Gln Leu Glu Ala Asp Pro Glu Ala Gln Pro Gly Asp Gly
465 470 475 480
Gly Arg Phe Val Ser Leu Leu Arg Ala Ala Gly Pro Glu Ala Ile Phe
485 490 495
Lys His Met Phe Cys Asp Pro Met Cys Ala Ile Thr Glu Met Glu Val
500 505 510
Asp Pro Trp Val Leu Phe Gly His Pro Arg Ala Asp His Arg Asp Glu
515 520 525
Leu Gln Leu His Lys Ala Lys Leu Ala Cys Gly Asn Glu Phe Glu Gly
530 535 540
Arg Val Cys Ile Ala Leu Arg Ala Leu Ile Tyr Thr Phe Lys Thr Tyr
545 550 555 560
Gln Val Phe Val Pro Lys Pro Thr Ala Thr Phe Val Arg Glu Ala Gly
565 570 575
Ala Leu Leu Arg Arg His Ser Ile Ser Leu Leu Ser Leu Glu His Thr
580 585 590
Leu Cys Thr Tyr Val
595
(2) INFORMATION FOR SEQ ID NO: 136:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 128 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:
Met Ala Gly Arg Ala Gly Arg Trp Arg Thr Leu Arg Asp Ala Ile Pro
1 5 10 15
Asp Cys Ala Leu Arg Ser Gln Thr Leu Glu Ser Leu Asp Ala Arg Tyr
20 25 30
Val Ser Arg Asp Gly Ala Gly Asp Ala Ala Val Trp Phe Glu Asp Met
35 40 45
Thr Pro Ala Glu Leu Glu Val Ile Phe Pro Thr Thr Asp Ala Lys Leu
50 55 60
Asn Tyr Leu Ser Arg Thr Gln Arg Leu Ala Ser Leu Leu Thr Tyr Ala
65 70 75 80
Gly Pro Ile Lys Ala Pro Asp Gly Pro Ala Ala Pro His Thr Gln Asp
85 90 95
Thr Ala Cys Val His Gly Glu Leu Asp Ala Thr Glu Arg Glu Arg Phe
100 105 110
Ala Ala Val Ile Asn Arg Phe Leu Asp Leu His Gln Ile Leu Arg Gly
115 120 125
(2) INFORMATION FOR SEQ ID NO: 137:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 274 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:
Met Ala Gly Met Gly Lys Pro Tyr Gly Gly Arg Pro Gly Asp Ala Phe
1 5 10 15
Glu Gly Leu Val Gln Arg Ile Arg Leu Ile Val Pro Thr Thr Leu Arg
20 25 30
Gly Gly Gly Gly Glu Ser Gly Pro Tyr Ser Pro Ser Asn Pro Pro Ser
35 40 45
Arg Cys Ala Phe Gln Phe His Gly Gln Asp Gly Ser Asp Glu Ala Phe
50 55 60
Pro Ile Glu Tyr Val Leu Arg Leu Met Asn Asp Trp Ala Asp Val Pro
65 70 75 80
Cys Asn Pro Tyr Leu Arg Val Gln Asn Thr Gly Val Ser Val Leu Phe
85 90 95
Gln Gly Phe Phe Asn Arg Pro His Gly Ala Pro Gly Gly Ala Ile Thr
100 105 110
Ala Glu Gln Thr Asn Val Ile Leu His Ser Thr Glu Thr Thr Gly Leu
115 120 125
Ser Leu Gly Asp Leu Asp Asp Val Lys Gly Arg Leu Gly Leu Asp Ala
130 135 140
Arg Pro Met Met Ala Ser Met Trp Ile Ser Cys Phe Val Arg Met Pro
145 150 155 160
Arg Val Gln Leu Ala Phe Arg Phe Met Gly Pro Glu Asp Ala Val Arg
165 170 175
Thr Arg Arg Ile Leu Cys Arg Ala Ala Glu Gln Ala Arg Arg Arg Arg
180 185 190
Ser Arg Arg Ser Gln Asp Asp Tyr Gly Ala Val Ala Val Ala Ala Ala
195 200 205
His His Ser Ser Gly Ala Pro Gly Pro Gly Val Ala Ala Ser Gly Pro
210 215 220
Pro Ala Pro Pro Gly Arg Gly Pro Ala Arg Pro Trp His Gln Ala Val
225 230 235 240
Gln Leu Phe Arg Ala Pro Arg Pro Gly Pro Pro Ala Leu Leu Leu Leu
245 250 255
Val Ala Gly Leu Phe Leu Gly Ala Ala Ile Trp Trp Ala Val Gly Ala
260 265 270
Arg Leu
(2) INFORMATION FOR SEQ ID NO: 138:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 112 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:
Met Ala Ala Pro Gln Phe His Arg Pro Ser Thr Ile Thr Ala Asp Asn
1 5 10 15
Val Arg Ala Leu Gly Met Arg Gly Leu Val Leu Ala Thr Asn Asn Ala
20 25 30
Gln Phe Ile Met Asp Asn Ser Tyr Pro His Pro His Gly Thr Gln Gly
35 40 45
Ala Val Arg Glu Phe Leu Arg Gly Gln Ala Ala Ala Leu Thr Asp Leu
50 55 60
Gly Val Thr His Ala Asn Asn Thr Phe Ala Pro Gln Pro Met Phe Ala
65 70 75 80
Gly Asp Ala Ala Ala Glu Trp Leu Arg Pro Ser Phe Gly Leu Lys Arg
85 90 95
Thr Tyr Ser Pro Phe Val Val Arg Asp Pro Lys Thr Pro Ser Thr Pro
100 105 110
(2) INFORMATION FOR SEQ ID NO: 139:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 837 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:
CCCGCTAGTC TGGGGGCGAG GTGCTGCAGG ACCGAGTAGA GGATGGAAAA AACGTCTCGG 60
TCGTAAACCA CGACCGAGCG GGGTCCGATG CAGCCGTCGG GGCCGCTCTC GACGATGGCC 120
ACCAGCGGAC AGTCGGAGTT GTACGTGAGG TACACGCCCG GCGGGTAGCG GTACAGACCT 180
TCGGAGGTCG GGCGGCTGCA GTCGGGGCGG CGCAACTCAA GCTCCCCGCA CCGGTAGACC 240
GACGCAAAGA GTGTGGTGGC GATAATGAGC TCGCGAATAT ATCGCCAGGC GGCGCGCTGG 300
GTGGGCGTGA TTCCGGAAAC ACCGTCAAAA CAGTAGAACT TTTGAAACTC GCTGACGGCC 360
CAATCAGCGC CCGAACCCCC CGCGCCCATG ATGAAGCGGG CGAGTTCCTC CTTGAGGTGC 420
GGCAGGAGCC CCACGTTCTC GACGCTGTAG TACAGCGCGG TGTTGGGGGG CTGGGCGAAG 480
CTGTGGGTGG AGTGGTCGAA CAGGGGCCCG TTGACGAGCT CGAAGAAGCG ATGGGTGATG 540
CTGGGGAGCA GGGCCGGGTC CACCTGGTGG CGCAGCAGCG ACGCTCGCAT GAACCGGTGC 600
GCGTCAAACA CGCCCGGGGC GGCGCGGTTG TCGATGACCG TGCCCGCGCC CGCCGTCAGG 660
GCGCAGAAGC GCGCGCGCGC CGCGAAGCCG TTGGCGACCG CGGCGAAGGT CGCGGGCAGC 720
ACCTCGCCGT GGACGCTGAC CCGCAGCATC TTCTCGAGCT CCCCGCGCTG CTCGCGCACG 780
CAGCGCCCGA GGCTGGCCAG CGACCGCTTG GTCAGGCGGT CCGCGTACAG CCGCCG 837
(2) INFORMATION FOR SEQ ID NO: 140:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 278 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
Arg Arg Leu Tyr Ala Asp Arg Leu Thr Lys Arg Ser Leu Ala Ser Leu
1 5 10 15
Gly Arg Cys Val Arg Glu Gln Arg Gly Glu Leu Glu Lys Met Leu Arg
20 25 30
Val Ser Val His Gly Glu Val Leu Pro Ala Thr Phe Ala Ala Val Ala
35 40 45
Asn Gly Phe Ala Ala Arg Ala Arg Phe Cys Ala Leu Thr Ala Gly Ala
50 55 60
Gly Thr Val Ile Asp Asn Arg Ala Ala Pro Gly Val Phe Asp Ala His
65 70 75 80
Arg Phe Met Arg Ala Ser Leu Leu Arg His Gln Val Asp Pro Ala Leu
85 90 95
Leu Pro Ser Ile Thr Phe Phe Glu Leu Val Asn Gly Pro Leu Phe Asp
100 105 110
His Ser Thr His Ser Phe Ala Gln Pro Pro Asn Thr Ala Leu Tyr Tyr
115 120 125
Ser Val Glu Asn Val Gly Leu Leu Pro His Leu Lys Glu Glu Leu Ala
130 135 140
Arg Phe Ile Met Gly Ala Gly Gly Ser Gly Ala Asp Trp Ala Val Ser
145 150 155 160
Glu Phe Gln Lys Phe Tyr Cys Phe Asp Gly Val Ser Gly Ile Thr Pro
165 170 175
Thr Gln Arg Ala Ala Trp Arg Tyr Ile Arg Glu Leu Ile Ile Ala Thr
180 185 190
Thr Leu Phe Ala Ser Val Tyr Arg Cys Gly Glu Leu Glu Leu Arg Arg
195 200 205
Pro Asp Cys Ser Arg Pro Thr Ser Glu Gly Arg Tyr Pro Pro Gly Val
210 215 220
Tyr Leu Thr Tyr Asn Ser Asp Cys Pro Leu Val Ala Ile Val Glu Ser
225 230 235 240
Gly Pro Asp Gly Cys Ile Gly Pro Arg Ser Val Val Val Tyr Asp Arg
245 250 255
Asp Val Phe Ser Ile Lys Val Leu Gln His Leu Ala Pro Arg Leu Ala
260 265 270
Gly Xaa Xaa Xaa Xaa Xaa
275
(2) INFORMATION FOR SEQ ID NO: 141:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2646 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:
AAACAATACC AGAAGTCATG TGTATTTTTG AACATCGGTG TCTTTTTATT TATACACAAG 60
CCCAGCTCCC CTCCCCTCCC TTAGAGCTCG TCTTCGTCTC CGGCCTCGTC CTCGTTGTGG 120
AGCGGAGAGT ACCTGGCTTT GTTGCGCTTG CGCAGAACCA TGTTGGTGAC CTTGGAGCTG 180
AGCAGGGCGC TCGTGCCCTT CTTTCTGGCC TTGTGTTCCG TGCGCTCCAT GGCCGACACC 240
AAAGCCATAT ATCGGATCAT TTCTCGGGCC TCGGCCAACT TGGCCTCGTC AAACCCGCCC 300
CCCTCCGCGC CTTCCTCCCC CTCCCCGCCC ACGCCCCCGG GGTCGGAAGT CTTGAGTTCC 360
TTGGTGGTGA GCGGATACAG GGCCTTCATG GGATTGCGTT GCAGTTGCAG GACGTAGCGG 420
AAGGCGAAGA AGGCCGCGAC CAGGCCGGCC AGGACCAGCA GCCCCACGGC AAGCGCCCCG 480
AAGGGGTTGG ACATAAAGGA GGACACGCCC GAGACGGCCG ACACCACGCC CCCCACTACT 540
CCCATGACTA CCTTGCCGAC CGCGCGCCCC AAGTCCCCCA TCCCCTCGAA GAACGCGCAC 600
AGCCCCGCGA ACATGGCGGC GTTGGCGTCG GCGCGGATGA CCGTGTCGAT GTCGGCAAAG 660
CGCAGGTCGT GCAGCTGGTT GCGGCGCTGG ACCTCCGTGT AGTCCAGCAG GCCGCTGTCC 720
TTGATCTCGT GGCGCGTGTA GACCTCCAGG GGCACAAACT CGTGGTCCTC CAGCATGGTG 780
ATGTTCAGGT CGATGAAGGT GCTGACGGTG GTGACGTCGG CGCGACTCAG CTGGTGAGAG 840
TACGCGTACT CCTCGAAGTA CACGTAGCCC CCGCCGAAGA TGAAGTAGCG CCGGTGGCCC 900
ACGGTGCACG GCTCGAGCGC GTCGCGGGTG AGGCGCAGCT CGTTGTTCTC GCCCAGCTGC 960
CCCTCGATCA GCGGGCCCTG GTCTTCGTAC CGAAAGCTGA CCAGGGGGCG GCTGTAGCAC 1020
GTCCCCGGCC GCGAGCTGAC GCGCATCGAG TTCTGCACGA TCACGTTGTC CGGGGCGACG 1080
GGCACGCACG TGGAGACGGC CATGACGTCT CCGAGCATGC GCGCGCTCAC CCGCCGGCCG 1140
ACGGTGGCGG AGGCGATGGC GTTGGGGTTG AGCTTGCGGG CCTCGTTCCA GAGAGTCAGC 1200
TCGTGGTTCT GCAGCTCGCA CCACGCGACG GCGATGCGCC CCAGCATGTC GTTCACGTGG 1260
CGCTGTATGT GGTTATACGT AAACTGCAGC CGGGCGAACT CGATCGAGGA GGTGGTCTTG 1320
ATGCGCTCCA CGGACGCGTT GGCGCTGGGC GCCTCCCGCA GTGGCGCGGG CGTGGCATTC 1380
CGGGGCTTGC GGTCCTGCTC CCGCATGTAC TCCCGCACGT ACAGCTCGGC GAGCGTGTTG 1440
CTGAGGAGGG GCTGGTACGC GATGAGGAAG CCCCCCGTGG CCAGGTAGTA CTGCGGCTGG 1500
CCCACCTTGA TGTGCGTGGC GTTGTACTTG CGCGCAAACA TGCGGTCGAT GGCCTCGCGG 1560
GCATCCCGGC CAATGCAGTC GCCCAGGTCG ACGCGCGAGA GCGAGTACTG GGTCAGGTTG 1620
GTGGTGAAGG TGGTCGAGAT GGCGTCGGAG GAGAAGCGGA AGGAGCCGCC GTACTCGGCG 1680
CGGAGCATCT CGTCCACCTC CTGCCACTTG GTCATGGTGC AGACCGCCGG TCGCTTCGGC 1740
ACCCAGTCCC AGGCCACGGT AAACTTGGGG GTCGTCAGCA AGTTGCGGGT CGTCGGCGAC 1800
GTGGCCCGGG CCTTCGTGGT GAGGTCGCGC GCGTAGAAGC CGTCGACCTG CTTGAAGCGG 1860
TCGGCGGCGT AGCTGGTGTG CTCGGTGTGC GACCCCTCCC GG AGCCGTA AAACGGGGAC 1920
ATGTACACAA AGTCGCCCGT CGCCAACACA AACTCATCGT ACGGGTACAC CGACCGCGCG 1980
TCCACCTCCT CGACGATGCA GTTGACCGTC GTGCCGTACC GATGGAACGC CTCCACCCGC 2040
GAGGGGTTGT ACTTGAGGTC GGTGGTGTGC CACCCCCGGC TCGTGCGCGT GGCGACCTTC 2100
GCCGGCTTGA GCTCCATGTC GGTCTCGTGG TCGTCCCGGT GAAACGCGGT GGTCTCCATG 2160
TTGTTCCGCA CGTACTTGGC CGTGGAGCGG CAGACCCCCT TGGCGTTAAT CTTGTCGATC 2220
ACCTCCTCGA AGGGAACGGG GGCGCGGTCC TCGAATATCC CCATAAACTG GGAGTAGCGG 2280
TGGCCGAACC ACACCTGCGA CACGGTCACG TCTTTGTAGT ACATGGTGGC CTTGAATTTG 2340
TACGGGGCGA TGTTCTCCTT GAAGACCACC GCGATGCCCT CCGTGTAGTT CTGCCCCTCC 2400
GGGCGCGTCG GGCAGCGGCG CGGCTGCTCA AACTGCACCA CCGTGGCGCC CGTCGGGGGC 2460
GGGCACACGT AAAACTGGGC ATCGGCGTTC TCGACCTTGA TTTCCCGCAG GTGCGCGCGC 2520
AGCGTGGCGT GGCCGGCGGC GACGGTCGCG TTGGCGTCGG GGGGCGGGGT CGCCTCGGGC 2580
CGCTTGGGCG GCTTTTTGGT TTTCCGCTTC CGGGCCTTGG TGGTCGCGGG GCTCGGGACG 2640
GGGGG 2646
(2) INFORMATION FOR SEQ ID NO: 142:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 846 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:
Pro Pro Val Pro Ser Pro Ala Thr Thr Lys Ala Arg Lys Arg Lys Thr
1 5 10 15
Lys Lys Pro Pro Lys Arg Pro Glu Ala Thr Pro Pro Pro Asp Ala Asn
20 25 30
Ala Thr Val Ala Ala Gly His Ala Thr Leu Arg Ala His Leu Arg Glu
35 40 45
Ile Lys Val Glu Asn Ala Asp Ala Gln Phe Tyr Val Cys Pro Pro Pro
50 55 60
Thr Gly Ala Thr Val Val Gin Phe Glu Gin Pro Arg Arg Cys Pro Trp 65 70 75 80
Glu Gly Gin Asn Tyr Thr Glu Gly Ile Ala Val Val Phe Lys Glu Asn
85 90 95
Ile Ala Pro Tyr Lys Phe Lys Ala Thr Met Tyr Tyr Lys Asp Val Thr 100 105 110 Val Ser Gin Val Trp Phe Gly His Arg Tyr Ser Gin Phe Met Gly Ile 115 120 125
Phe Glu Asp Arg Ala Pro Val Pro Phe Glu Glu Val Ile Asp Lys Ile
130 135 140
Asn Ala Lys Gly Val Cys Arg Ser Thr Ala Lys Tyr Val Arg Asn Asn 145 150 155 160
Met Thr Ala Phe His Arg Asp Asp His Glu Thr Asp Met Glu Leu Lys
165 170 175
Pro Ala Lys Val Ala Thr Arg Thr Ser Arg Gly Trp His Thr Thr Asp 180 185 190 Leu Lys Tyr Asn Pro Ser Arg Val Glu Ala Phe His Arg Tyr Gly Thr 195 200 205
Thr Val Asn Cys Ile Val Glu Glu Val Asp Ala Arg Ser Val Tyr Pro
210 215 220
Tyr Asp Glu Phe Val Leu Ala Thr Gly Asp Phe Val Tyr Met Ser Pro 225 230 235 240
Phe Tyr Gly Tyr Arg Glu Gly Ser His Thr Glu His Thr Ser Tyr Ala
245 250 255
Ala Asp Arg Phe Lys Gin Val Asp Gly Phe Tyr Ala Arg Asp Leu Thr 260 265 270 Thr Lys Ala Arg Ala Thr Ser Pro Thr Thr Arg Asn Leu Leu Thr Thr 275 280 285
Pro Lys Phe Thr Val Ala Trp Asp Trp Val Pro Lys Arg Pro Ala Val 290 295 300 Cys Thr Met Thr Lys Trp Gin Glu Val Asp Glu Met Leu Arg Ala Glu
305 310 315 320
Tyr Gly Gly Ser Phe Arg Phe Ser Ser Asp Ala Ile Ser Thr Thr Phe
325 330 335 Thr Thr Asn Leu Thr Gin Tyr Ser Leu Ser Arg Val Asp Leu Gly Asp
340 345 350
Cys Ile Gly Arg Asp Ala Arg Glu Ala Ile Asp Arg Met Phe Ala Arg
355 360 365
Lys Tyr Asn Ala Thr His Ile Lys Val Gly Gin Pro Gin Tyr Tyr Leu 370 375 380
Ala Thr Gly Gly Phe Leu Ile Ala Tyr Gin Pro Leu Leu Ser Asn Thr
385 390 395 400
Leu Ala Glu Leu Tyr Val Arg Glu Tyr Met Arg Glu Gin Asp Arg Lys
405 410 415 Pro Arg Asn Ala Thr Pro Ala Pro Leu Arg Glu Ala Pro Ser Ala Asn
420 425 430
Ala Ser Val Glu Arg Ile Lys Thr Thr Ser Ser Ile Glu Phe Ala Arg
435 440 445
Leu Gin Phe Thr Tyr Asn His Ile Gin Arg His Val Asn Asp Met Leu 450 455 460
Gly Arg Ile Ala Val Ala Trp Cys Glu Leu Gin Asn His Glu Leu Thr
465 470 475 480
Leu Trp Asn Glu Ala Arg Lys Leu Asn Pro Asn Ala Ile Ala Ser Ala
485 490 495 Thr Val Gly Arg Arg Val Ser Ala Arg Met Leu Gly Asp Val Met Ala
500 505 510
Val Ser Thr Cys Val Pro Val Ala Pro Asp Asn Val Ile Val Gin Asn
515 520 525
Ser Met Arg Val Ser Ser Arg Pro Gly Thr Cys Arg Pro Leu Val Ser 530 535 540
Phe Arg Tyr Glu Asp Gin Gly Pro Leu Ile Glu Gly Gin Leu Gly Glu
545 550 555 560
Asn Asn Glu Leu Arg Leu Thr Arg Asp Ala Leu Glu Pro Cys Thr Val
565 570 575 Gly His Arg Arg Tyr Phe Ile Phe Gly Gly Gly Tyr Val Tyr Phe Glu
580 585 590
Glu Tyr Ala Tyr Ser His Gin Leu Ser Arg Ala Asp Val Thr Thr Val
595 600 605
Ser Thr Phe Ile Asp Leu Asn Ile Thr Met Leu Glu Asp His Glu Phe 610 615 620
Val Pro Leu Glu Val Tyr Thr Arg His Glu Ile Lys Asp Ser Gly Leu 625 630 635 640
Leu Asp Tyr Thr Glu Val Gin Arg Arg Asn Gin Leu His Asp Leu Arg 645 650 655
Phe Ala Asp Ile Asp Thr Val Ile Arg Ala Asp Ala Asn Ala Ala Met
660 - 665 670
Phe Ala Gly Leu Cys Ala Phe Phe Glu Gly Met Gly Asp Leu Gly Arg 675 680 685
Ala Val Gly Lys Val Val Met Gly Val Val Gly Gly Val Val Ser Ala
690 695 700
Val Ser Gly Val Ser Ser Phe Met Ser Asn Pro Phe Gly Ala Val Gly 705 710 715 720 Leu Leu Val Leu Ala Gly Leu Val Ala Ala Phe Phe Ala Phe Arg Tyr
725 730 735
Val Leu Gin Leu Gin Arg Asn Pro Met Lys Ala Leu Tyr Pro Leu Thr
740 745 750
Thr Lys Glu Leu Lys Thr Ser Asp Pro Gly Gly Val Gly Gly Glu Gly 755 760 765
Glu Glu Gly Ala Glu Gly Gly Gly Phe Asp Glu Ala Lys Leu Ala Glu
770 775 780
Ala Arg Glu Met Ile Arg Tyr Met Ala Leu Val Ser Ala Met Glu Arg 785 790 795 800 Thr Glu His Lys Ala Arg Lys Lys Gly Thr Ser Ala Leu Leu Ser Ser
805 810 815
Lys Val Thr Asn Met Val Leu Arg Lys Arg Asn Lys Ala Arg Tyr Ser
820 825 830
Pro Leu His Asn Glu Asp Glu Ala Gly Asp Glu Asp Glu Leu 835 840 845
(2) INFORMATION FOR SEQ ID NO: 143:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20388 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:
GGATCTCCTC GTTCTCTTGC GTGATGGACA CGTCCTCCGC GGTGGCCGTG TCGCCTCCCG 60
GGGCCGTGAG CTGCTCCTCC GGGGAGATGG GGGGGTCTGG GGTGCCGACA ACGGCCGGCC 120
CGGCCCCGCC CGAGACCGAG GACGCCTGGG GAGTGGGGGT GCCGCTTTCC CCCATCCCCA 180
GGGACAGGTG GGCCGCCGCC TCCGTCGCGG CGGCGGGAGC CGCGGCCCCC AGCCGCGCGA 240
CGTAGCGACA AAAGTGGCGA CAGAGGCGCA TGAGGCGCGC GCCGTCGGCC GCGTATCGCG 300
TGTTTGGCGG GACGAGCTCG TCGTAACTGA ACAGGAGCAC GCGGGCGCAG GTCGCCCACG 360
GGCCCCACGC CAGGCGCAGC GCCGCGACCG TGTACGGGTC GTACACGCCT TGGGCGTCGC 420
ACGCGACCGG CAGGGAGACG AACAGCCCGC CCGCGCTGGG GACGCGCGGC AGGAGGTCCG 480
GGTGCGCCGG GATGACGGGG GCTAGGATCG CCCCCACCGC ATCCGCCGGC ACGTAGGCGG 540
CAAACGCCGA ACGCCACGGG GTGCAGTCGC CGGTCGCGTG GGCCCGGGTC TGGGTTTCGA 600
CCCGGAAGTT CGCGGCCGCC CCGCCGTCGG GGCGGCCGCG CACGAGGGCG GACAGCGGGA 660
CCCCCGCCGC CGCCAGGCAC TCGCTGGAGA TGATGACGTG AATCAGCGAG GCGGGGCTGC 720
TCGGGTCCCG GGTGAGATCG TATTGGACCT CGTTGGCAAA GTGCGCGTTC ATGGCCCGGC 780
CGGCGGTGCG AGCCCTTCCC GGTGCCGGAA GGGGCGTGGG TGGGGGGTGC GTGTGCGCGT 840
CCTCGGGGCC CGCGGGCGCA CGTGCGCTTA TACGCTGTGT GTTTCGTCTG TCCCCAGGGA 900
ATCCGGGGCC AGGACTTTAA CCTGCTTTTC GTCGACGAGG CCAACTTTAT TCGCCCGGAT 960
GCGGTCCAGA CGATTATGGG CTTTCTCAAT CAGGCCAACT GCAAGATCAT CTTCGTCTCG 1020
TCGACCAACA CCGGGAAGGC CAGCACGAGC TTTTTGTACA ACCTCCGCGG GGCCGCCGAC 1080
GAGCTGCTCA ACGTGGTCAC CTATATATGC GACGACCACA TGCCGCGGGT GGTGACGCAC 1140
ACCAACGCCA CGGCCTGTTC CTGCTATATC CTGAACAAAC CCGTGTTTAT CACGATGGAC 1200
GGCGCCGTTC GCCGGACGGC CGATCTGTTT CTGCCCGACT CCTTCATGCA GGAGATCATC 1260
GGGGGGCAGG CCCGCGAGAC CGGCGACGAC CGGCCCGTCC TAACAAAGTC GGCGGGGGAG 1320
CGGTTTCTGC TGTACCGCCC CTCCACCACC ACCAACAGCG GCCTGATGGC CCCCGAGCTG 1380
TACGTGTACG TGGACCCGGC GTTCACGGCC AACACGCGCG CCTCCGGCAC CGGCATCGCG 1440
GTCGTCGGGA GGTACCGCGA CGATTTCATT ATCTTCGCCC TGGAGCACTT TTTCCTCCGC 1500
GCGCTCACGG GATCGGCCCC CGCGGACATC GCCCGCTGCG TCGTGCACAG CCTCGCCCAG 1560
GTGCTGGCGC TGCACCCCGG GGCGTTTCGC AGCGTTCGCG TGGCGGTCGA GGGCAACAGC 1620
AGCCAGGACT CGGCCGTGGC CATCGCCACA CACGTGCATA CCGAGATGCA CCGCATCCTG 1680
GCCTCGGCGG GGGCCAACGG CCCGGGGCCC GAGCTCCTCT TCTATCACTG CGAGCCGCCC 1740
GGCGGCGCGG TATTGTACCC CTTCTTTCTG CTCAACAAAC AGAAGACGCC CGCCTTCGAA 1800
TACTTTATCA AAAAGTTCAA CTCCGGGGGC GTCATGGCGT CCCAGGAGCT CGTCTCCGTG 1860
ACGGTGCGCC TGCAGACCGA CCCGGTCGAG TATCTGTCCG AGCAGCTCAA CAACCTCATC 1920
GAAACCGTCT CTCCCAACAC CGACGTCCGC ATGTACTCCG GAAAACGCAA CGGTGCCGCG 1980
GACGACCTCA TGGTCGCGGT CATCATGGCC ATTTACCTGG CGGCCCCGAC CGGGATCCCC 2040
CCGGCCTTTT TTCCGATCAC GCGCACGTCT TGAGTCTTTC TTGCCGTTTC TTTTGTTTCT 2100
CTTTCTTTCC CCCCTCTCTC CGCAATAAAC GCCTTCCCGG AACTGTGTTT TCCCCCCCTA 2160
CAACAGTGTT GTCCGTTGGT TGGGTGGTTG GGGTGCGGGG GTGGGCGGGG GAAGCAAGAA 2220
AACGGTCGGC GAACACAACA TCGGGAAAAC GGATTCCCGA ACGTGCGTCT TCCCAGATTC 2280
GACACACACC CCCCTTCTCC TTAAATAAAC ACAAACCACA CGCTCGTTGG TTGGTTAATG 2340
CCGGCGCTTT ATTTACGTCT TGTTTTTTTG CGTTTCCTCC GCGGGTCCCT TCCCAACACG 2400
CCTGCCCCCG CCTCAGGGGT AGCGGATAAC CGGGGCCATG TCGCCGGATT GCACAACGGC 2460
GGCGCCGTCG AACGTACACA CCCGAACCGC CGGGGCCAGG GCCAGGATGT CCCCGAGTTG 2520
GCCCGCGTGC GCCAGCCAGG CGACCAGCGC CTCGTAAAGC GGCAGCCTGC GCTCGCCGTC 2580
CTGCATCAGC ATGGGGGCTT CGGGGTGGAT GAGCTGGGCG GCTTCTCGCG TGACGCTCTG 2640
CATCTGCAGG AGCGCGTTCA CGTATCCGTC CTGGGCGCTC AGCGCGAGCA GCCGGGGGAT 2700
GAGCGTGAGG ATGAGGGTGG TTCCTTCGGT TATGGAGTAG ACCATGTTGA GGACGAGCGA 2760
CCGCAGCTCG GTGTTTACGG AGGCGAGTTG CTGGACGTCG GCCACGAGCG AGAGACGGGC 2820
CCCGTTGTAA TACAGCACGT TGAGGTCGGG GAGCTCCCCG GGCGTCCGGG GGTCGGGGTT 2880
GAGGTCCCGG ATGCCCCGGG CGACCAGCCG CGCGACTATC TCGCGGGCCA GGGGCGTTGG 2940
GAGCGGGACC GGAAACCGCA GCGTGAGGTC CAGCGACTCC AGGCGCACGT CCGTCGCCTG 3000
GCCCTCGAAG ACGGGCGGGA CGAGGCTGAC GGGATCCCCG TTGCAGAGGT CGACGGGGGA 3060
GGTGTTGCGG AGATTGACGG TGCCGGCGTG CGTGAGCCCC AGGTCCACGG GGCAGGCGAC 3120
GATTCGCGTG GGCAGCACCC GCGTGATTAC CGCGGGGAAG CGCCTGCGGT ACGCCAGCAA 3180
CAACCCCAAC GTGTCGGGAC TAACTCCTCC GGAGACGAAC GATTCGTGCG CCACGTCCGC 3240
GAGCGCCAGC TGGCGGCGGA TGGTCGGCAG AAAGACCACT CGACCCTCGC ACCGCTGCAG 3300
CGCCGCGGCA TCGGGGCGCG AGATACCCGA GGGGATCGCG ATGTCTGCTT CGAAACAATC 3360
CGTGATCATG GCGCCGGGCC GCGAGACACC GGAACGCGGG GGTGCGGGAG GGCCGGAAAG 3420
CGCAACGCAA CCGGGACGAT GATGAAACAG AGATGGGGGG CACCGACCGT GTGGGAGAGG 3480
GGGCGGGGCA GGGCTCAGCA GCACGCACGG GGAGGTCTGT CGTGCGCAGG AGCCCCAGGT 3540
GAGAATCAGT CCCCCGGAGC TCGGGTCTGG GTTTTATTGG GACCTGCCCT CGGAATCGCG 3600
GCTCCCAGTC CAAGCCCCCC CGGGGGGGCG GGGACAGGGG GTGTGTGTGG GTAAAAGCAA 3660
CGTCGGAAAA TCAAACCCAA TGCCCCAAAC AGGAAAAAAA AAAAAGACGG GCGGGTGGAG 3720
GGAAAGCTGG GGAAGAAGAA GCCAATTTTA CAGAGACAGG CCCTTTAGCG GGGAGGCGTC 3780
GTAGATGAGA TACTGCGTAA AGTGGGTCTC TCGCGCGTGG GCCTCCCCAT CGCGGGCGCT 3840
GCGTAGCAGG GCGGGGTCGC TGGCGCAGGT GATCGGGTAG GCTTCCTGAA ACAGGCCGCA 3900
CGGGTCTTCC ACGAGCTCGC GGCACCCCGG CGGGCGCTTA AACTGCACGT CGCTGGCAGC 3960
GGTGGCCGTG GATACCGCCG ATCCCGTTTC CACGATAAGA CGCTCCAGGC AGCGATGTTT 4020
GGCCGTGATG TCGGCCGCGG TGAAGAACTT GAAGCAGGGG CTGAGGACGG GCGAGGCCCC 4080
GTTGAGGTGA TAGGCCCCGT TGTACAGCAG GTCCCCGTAC GAGAACCGCT GCGACGCCCA 4140
CGGGTTGGCC GTGGCCGCGA AGGGCCGCGC CGGGTCGCTC TGGCCGTGGT CGTACATGAG 4200
GGCTATGACG TCCCCCTCCT TGTCCCCCGC GTACACGCCG CCGGCCGCGC GTCCCCGCGG 4260
GTTGCAGGGC CGGCGAAAGT AGTTGATGTC CGTGGCCACG GGGGTGGCGA TGAACTCACA 4320
CACGGCATCC TGCCCGTGGT CCATGCCGGC GCGCCGCGGC ACCTGGGCGC AGCCAAAGAC 4380
CGGGAGGGGC TGGGCCGGCC CCAGCCGGTT TCCCGCCACG ACCGCGTTGC GCAGGTACAC 4440
GGCGGCCGCG TTGTCTAGCA GCGGGGGGGC CCCGCGGCCG AGGTAAAAGT TTTGGGGGAG 4500
GTTGCCCATG TCCGTAACGG GGTTGCGGAC GGTGGCCGTG GCCGCGACGG CGGTGTAGCC 4560
CACACCCAGG TCCACGTTTC CGCGCGGCTG GGTGAGCGTG AAGCTGACCC CCCCGCCCGT 4620
TTCGTGGCGG GCCACCTGGA GCTGGCCCAG AAAGTACGCC TCCGACGCGC GCTCGGAAAA 4680
CAGCACGTTC TCGGTCACGA AGCGGTCCTG CCGCACGACG GTGAACCCGA ACCCGGGGTG 4740
GAGGCCCGTC TTGAGCTGGT GATACAGGGC CACGGGGCTC ATCTTGAAGT ACCCCGCCAT 4800
GAGCGCGTAG GTCAGCGCGT TCTCCCCCGC CGCGCTCTCG CGGGCGTGCT GCACCACGGG 4860
CTGGCGGATG GAGGAGAAGT AGTTGGCCCC CAGGGCCGGG GGGACCAGGG GGACGTGGCG 4920
CGCCAGGTCG CGCAGGGCCG GGGGGAAGTT GGGCGCGTTG GCCACGTGGT CGGCGCCCGC 4980
AAACAGCGCG TGGACGGGCA GGACGTAGAA GTATTCGCCA TTTTGGATGG TGTGGTCCAG 5040
GTGCTGGGGG GCCATGAGCA GCACGCCGGC GTGCAGCGCC CCGTCGAAGA TGCGCATGTT 5100
GGCCGTCGAC GCGGTGTTGG CGCCCGCGTC GGGCGCCGCG GAGCACAGCA GCGCCGTCGT 5160
GCGCTCGGCC ATGTTGTGCG CCAGCACCTG CAGCGTGAGC ATGGCGGGCC CGTCGACGAC 5220
GACGCGCCCG TTGTGGAACA TGGCGTTGAC CGTGTTGGCC ACCAGATTGG CGGGATGCAG 5280
CGGGTGGGCG GGGTCGGTCA CGGGATCGCT CGGGCACTCC TCGCCGGGGG CGATCTCCGG 5340
GACCACCATG TTCTGCAGCG TGGCGTACAC GCGGTCGAAG CGGACCCCCG CGGTGCAGCA 5400
GCGCCCCCGC GAGAAGGCCG GCACCAGCAC GTAATAGTAG ATTTTGTGGT GGACGGTCCA 5460
GTCGGCCGGC CGGTGCGGCC GGTCGTCGGC GGCGTCGGCC GCGCGGGCCT GGGTGTTGTG 5520
CAGCAGCCGG CCGTCGTTGC GGTTAAAGTC GGCCGTCGCC ACGTTGCACG CCGCCGCGTA 5580
GACGGGCTCG TGTCCCCCCG CGTCAATCCG GCAGTCTCGG TGGCGGTCCA GGGCCGCGTG 5640
TCGCATAAGG CCGTCGCAGT CCCACACGAG GGGCGGCAGC AGCGCCGGGT CGCGCATCAG 5700
GTGATTCAGC TCGGCCTGAG CCTGCCCGCC CAGCTCCGGG CCCGGCAGGG TAAAGTCGTC 5760
CACCAGCTGG GCCAGGGCCT CGACGTGGGC CACCAGGTCC CGATACACGG CCATGCACTC 5820
CTCGGGGAGG TCGCCCCCGA GGTAGGTCAC GATGTACGAG ACCAGCGAGT AGTCGTTCAC 5880
GAACGCCGCG CATCGCGTGT TGTTCCAGTA GCTGGTGATG CACTGAGTCA CGAGCCGCGC 5940
CAGGGCGCAG AACACGTGCT CGCTGCCGTG AATCGCGGCT TGCAGCAGGT AAAACACCGC 6000
CGGGTAGCTG CGGTCCTCGA ACGCCCCGCG GACGGCGGCT ATGGTAGCCG GCGCCATGGC 6060
GTGGCGGCCA ACGCCGAGCT CCAGGCCCCG GGCGTCACGA AACGCCACCG GACACAGCGC 6120
CAGGGGCAGG TTGCCGTTGA CCACGCGCCA GGTGGCCTGG ATCGCCCCCG GACCGGCCGG 6180
GGGGACTTCG CCGCCGGGAA GCTCGACGTC GGCCACGCCC GCGAAGAAGT CGAACGCGGG 6240
GTGCAGCTCC AGAGCCAGGT TGGCGTTGTC GGGCTGCATG AACTGCTCCG CGGTCATCTG 6300
GCACTCGGCG ACCCACCGGA CCCGGCCGTG GGCGAGGCGC TGCCGCCAGG CGTTCAGAAA 6360
ACGCTGCTGC ATGTCCGCGC CGGGGCCGGC CGGGGCCGCG ACGTACGCCC CGTACGGATT 6420
CGCGGCCTCG ACGGGGTCGT GGTTCACGCC CCCGACGGCC GCGTCGATGT TCATGAGCGA 6480
AGGATGACAC ACGGTCCCGA CCGCGTTCTC CATGGACAGC CGCAGAACCT GGTGGTCCTT 6540
TCCCCAAAAA AACAGCTGCC GGGGAGGGAA CGCGCGGGGC TCCGGGTGGC CGGGGGCGGG 6600
CACCAGGTCC CCGGCGTGCG CGGCGAAGCG CTCCATGGCC GGGTTGAACA GCCCCAGGGG 6660
CAGGACGAAC GTCAGGTCCA TGGCGCCCAC CAGGGGGTAG GGCACGTTGG TGGCGGCGTA 6720
GATGCGCTTC TCCAGGGCCT CCAGGAAGAC CAGCCTGTCG CCTATGGCCA CCAGATCCGC 6780
GCGCACGCGC GTTGTCTGGG GGGCGCTTTC GAGTTCATCC AGCGTCTCCC GGTTCGCCTC 6840
GAGTTGCTCC TCCTGCATCT CCAGCAGGTG GCGGCCCACG TCGTCCAGGC TCCGCACGGC 6900
CTTGCCCATC ACCAGCGCCG TGACGAGGTT GGCCCCGTTC AAGACCATCT CGCCGTAGGT 6960
CACCGGCACG TCGGCCTCGG TGTCCTCCAC CTTCAGGAAG GACTGCAGGA GGCGCTGTTT 7020
GATGGCGGCG GTGGTGACCA GCACCCCGTC GACCGGCCGC CCGCGCGTGT CGGCGTGCGT 7080
CAGGCGGGGC ACGGCCACGG ACGGCTGCGT CGCCGTGGTC AGGTCCACGA GCCAGGCCTC 7140
GATGGCCTCG CGGCGATGGC CCGCCTTGCC CAGGAAGAAG CTCGTGTCGC AAAAGCTCCG 7200
CTTCAGCTCG GCGACCAGGG TCGCCCGGGC GACCCTGGTC GCCAGGCGCC CGTTGTCGAG 7260
ATATCGTTGC ATGGGCAACA GCAGGGCCAG GGGAGGCGCC TTCTCCAACA GCACGTGCAG 7320
CATCTGGTCG GCCGTGCCGC GCTCAAACGC CCCCAGGACG GCCTGGACGT TGCGCGCGAG 7380
CTGCTGGATG GCGCGCAGCT GGCGATGCAG GCTAATGCCC GTCCCGTCCA GGGCCTCCCC 7440
CGTGAGCAGG GCAATGGCCT CGGTGGCCAG GCTGAAGGCG GCGTTCAGGG CCCGGCGGTC 7500
GATGACCTTC GTCATGTAAT TATGCACGGG CTGCTCGACG GGGTGCGGGC CGTCGCGGGC 7560
GATGAGGGGC TGGTGGACCT CGAACTGCAC ACGCCCTTCG TTCATGTAAG CCAGCTCCGG 7620
GAACTTGGTG CACACGCACG CCACGGACAG GCCGAGCTCC AGAAAGCGCA CGAGCGACAG 7680
GGTGTTGCAG TAGGACCCCA GCAGGGCGTC AAACTCTACG TCATACAGGC TGTTTTCGTC 7740
GGAGCGCACG CGGGCGAAAA AATCAAAGAG TCTGCGGTGG GACGCCACCT CGATCGTACT 7800
CAGGATGGAG CCGGTGGGCA CCATGGCCGC GGCGTACCGG TAACCCGGGG GGTCGCGGGC 7860
AGGAGCGGCC ATTGGGTTCC TTGGGGGATT CGCAGGCTCC ATCAAGCCAA GCTCGGGAAG 7920
GCCAAGCCCC TCCCACACAA CGCCTCACCG CCGGCGGACG CGACTAACAA CCCACGGGCC 7980
GCCAAAAACC CCAAGGGGCA ACCCGACCAA CAACAGGCGA GGGGAGGAAA GGCGTAAAGG 8040
GGGCGTTGGG AGGCAAAAAG AAAGAAAACA CCCAGACGTA GGCCCGAGGA CCGGCCGGCG 8100
TCCTCTGTCC CCGAGCACCC ACTGTGCCCA ACAGGCACGG GGGCGAGCTG CCCCTGCCTT 8160
ATATACCCCC CCGCCACACC CCCGTTAGAA CGCGACGGGT GCCTTCAAGA TGGCCCTGGT 8220
CCAAAAGCGT GCTAGAAAAA AGTTGGTAAA GGCGGCAAAG CAGTCCGCCG CCGCCACCCA 8280
CATGGCGGCG CCGGCCGCGC AGGCGATTCC CAGAGAACGG GCGCGGAGGG GATCCGTGCG 8340
GGGCAGCAGC TGGCTGGCGG TGATCCAATG GAAAAGCCCG TCGGGACTGA ACGTCTCATG 8400
GGCGGCCGCC ACCAGGGCGC ACAGGGCCGC GCCGCCCATG ATCACGCACA ACCCCCAAAA 8460
CACGGGTGGC GACAACGGCA GGCGATCCCG TTTGATGTTC ACGTACAGGA GGAGCGCCCG 8520
TGCCAGCCAC GTGACATAGT AGGCGAGGAC GGCGGCTATA ATACATGCCG GCGCCACCGC 8580
CCGTCCGGTC CACCCGTAAT ACATGCCCGC GGCCACCAGC TCCAGCGGCT TGAGGACCAG 8640
GAACGACCAA GCAAACATCA CCACCCGCTT GGAAAAGACC GGCTGGGTGT GGGGCGGAAG 8700
ACGCGAGTAG GCCGAACTGA CAAAAAAATC AGACGTGCCG TACGAGGACA GCGAAAACTG 8760
TTCATCGAGC GGCAGTTCGC CGTCCTCCCC GCCACACGCG GCCTCGTATA CCAGCTCGCG 8820
ATCCAACAAA GGAACATCAT CCCGCATTGT CATGGTCGGT GCGGGGAGCC GGCGAGGCAG 8880
CGAAACCGAA AGTAGTGCTG GCGGCGCGGG CCCGGGTCCG GACCCAAGCT TCAGGGATGG 8940
GGGGCGGAGG CCAAAATCAA ACAAGCACCG CGCGGGTTCT ACACACAACC CCCACCCGGG 9000
TAGTATCCGC GGATGCGAGT GCCTGGCGAA GTCACGTCCC AGCAGGATAT AAACCTCGGC 9060
CGTTGGGCCC GGAACCCCCG AAATTCACAC CCACGCCCTG ACGCCCAAAT CATGGGTGGA 9120
TGTGGTTCGC GAGCCGCACA TCCGTGCGTC CGCCCTCCCC CGCGGGCTGA TGACGTGGCG 9180
GTTAGTCAGT GGGAAGGCAG GGGGAAAGAT GGGTTGGGGG AGGAAACGAA GAAAACACCC 9240
AGAGGGCCAC GTCGGGAATG CGCCCGGAGT TGTCCTTAAA AGGCCGGCCG TGCGTGACGG 9300
AAGCCGTCGT TTGCCCAAGC ACCGACGCCG CGATCCACAG TGGGGGGAGT TCCTCCGTCC 9360
GGCCACAACC CTACGCGCGG GCGGCACGCG CGAGAGCAAC CCACGGGTCC CGTTCGCGCC 9420
ACCGCCAGCC CTTGCTCCCA CCACCCTCCT CCCACCACCC CACTATTCCC CCCCCCCCAA 9480
GTCCGCCCCG TGGCTCGCCG GCCATGGAGC TCAGCTATGC CACCACCCTG CACCACCGGG 9540
ACGTTGTGTT TTACGTCACG GCAGACAGAA ACCGCGCCTA CTTTGTGTGC GGGGGGTCCG 9600
TTTATTCCGT AGGGCGGCCT CGGGATTCTC AGCCGGGGGA AATTGCCAAG TTTGGCCTGG 9660
TGGTCCGGGG GACAGGCCCC AAAGACCGCA TGGTCGCCAA CTACGTACGA AGCGAGCTCC 9720
GCCAGCGCGG CCTGCGGGAC GTGCGGCCCG TGGGGGAGGA CGAGGTGTTC CTGGACAGCG 9780
TGTGTCTGCT AAACCCGAAC GTGAGCTCCG AGCGAGACGT GATTAATACC AACGACGTTG 9840
AAGTGCTGGA CGAATGCCTG GCCGAATACT GCACCTCGCT GCGAACCAGC CCGGGGGTGC 9900
TGGTGACCGG GGTGCGCGTG CGCGCGCGAG ACAGGGTCAT CGAGCTATTT GAGCACCCGG 9960
CGATCGTCAA CATTTCCTCG CGCTTCGCGT ACACCCCCTC CCCCTACGTA TTCGCCCTGG 10020
CCCAGGCGCA CCTCCCCCGG CTCCCGAGCT CGCTGGAGCC CCTGGTGAGC GGCCTGTTTG 10080
ACGGCATTCC CGCCCCGCGC CAGCCCCTGG ACGCCCGCGA CCGGCGCACG GATGTCGTGA 10140
TCACGGGCAC CCGCGCCCCC AGACCGATGG CCGGGACCGG GGCCGGGGGC GCGGGGGCCA 10200
AGCGGGCCAC CGTCAGCGAG TTCGTGCAAG TGAAGCACAT CGACCGTGTT GTGTCCCCGA 10260
GCGTCTCTTC CGCCCCCCCG CCGAGCGCCC CCGACGCGAG TCTGCCGCCC CCGGGGCTCC 10320
AGGAGGCCGC CCCGCCGGGC CCCCCGCTCA GGGAGCTGTG GTGGGTGTTC TACGCCGGCG 10380
ACCGGGCGCT GGAGGAGCCC CACGCCGAGT CGGGATTGAC GCGCGAGGAG GTCCGCGCCG 10440
TGCATGGGTT CCGGGAGCAG GCGTGGAAGC TGTTTGGGTC GGTGGGGGCT CCGCGGGCGT 10500
TTCTCGGGGC CGCGCTGGCC CTGAGCCCGA CCCAAAAGCT CGCCGTCTAC TACTATCTCA 10560
TCCACCGGGA GCGGCGCATG TCCCCCTTCC CCGCGCTCGT GCGGCTCGTC GGTCGGTACA 10620
TCCAGCGCCA CGGCCTGTAC GTTCCCGCGC CCGACGAACC GACGTTGGCC GATGCCATGA 10680
ACGGGCTGTT CCGCGACGCG CTGGCGGCCG GGACCGTGGC CGAGCAGCTC CTCATGTTCG 10740
ACCTCCTCCC GCCCAAGGAC GTGCCGGTGG GGAGCGACGC GCGGGCCGAC AGCGCCGCCC 10800
TGCTGCGCTT TGTGGACTCG CAACGCCTGA CCCCGGGGGG GTCCGTCTCG CCCGAGCACG 10860
TCATGTACCT CGGCGCGTTC CTGGGCGTGT TGTACGCCGG CCACGGACGC CTGGCCGCGG 10920
CCACGCATAC CGCGCGCCTG ACGGGCGTGA CGTCCCTGGT CCTGACCGTG GGGGACGTCG 10980
ACCGGATGTC CGCGTTTGAC CGCGGGCCGG CGGGGGCGGC TGGCCGCACG CGAACCGCCG 11040
GGTACCTGGA CGCGCTGCTT ACCGTTTGCC TGGCTCGCGC CCAGCACGGC CAGTCTGTGT 11100
GAGATATCCC AATAAAGTGC AGTCGTTTTC TAACCCACGG ATGCCGTTGT ATGCCTATAC 11160
GGGGGACTAT GGGGGGGGGG GGAAAGGAAA GGAAACAGGA ATGGAGAAGG GAAAGGAACA 11220
GAGGCGGTAG CGGACGCACG GCGGACACAA TAACAAACAG ACCGCGGACA CGGAGGGAGT 11280
CGGTTGGGTT GGGCGTGGAC GCCGCTGCGT CCACACACCC GTTTATTCGC GTCTCCACAA 11340
AAATGGGACG CACGTTCGGA CCACCCTGAG GATGCCCGCC AGGGCCGCGG TGATCATAAC 11400
GACCCCCAGC GCGGACGCGG CCAGAAACCC GGGGGCGATG GTGGCGATGG GCAGCGTGTC 11460
AAAGGCCAGC AGATGAATCA CAGTTCCGTT GGGGAACAAC AACAGGGCCA CGGACGGCAC 11520
GTCGCTGGAA AACACGTTCG GGGTGCCCGC CACCGGCCCC TGGGCCAGCT GCTGTTGGGT 11580
GGCATCCGTG TCCACCAGCA GCACCGACAT GACCTCCCCG GCCGGGGTGT AGCGCAGAAA 11640
CACGGCCCCC ACGAGGCCGA GGTCGCGCCG GTTTTCGGTG CGCACCAGCC GCTTCGGCTC 11700
AATCTCCCGC GCGTGCCCTT CGCAGGTGGC GGTGAGATAG GTGATAAACA GCGGGCGGCG 11760
GACGTCAACG CCCGTAAGCT TGTATCCGAT CCCGCGGGGC AAGGGGGTGT GGGTGACGAC 11820
GTAGCTGGCG TTGTGGGTGA TGGGCACGAG GATCCGGGGC TCCGCGTTGT GCGACGGGCC 11880
GCTACACTGG TGGGTGGCCT CCGGGACGAA GGCGCGGATC AGGGCGTTGT AGTGCGCCCA 11940
GCGCGTGAGA ACGGAGGCCA CGCCGCGGGT CTGTTGTGCC ATGACGTCCG CCGGGATGTC 12000
GGATCGGGTG GCCATGGCCA GCGCGTCCAG GATGAACCCG CCCTCGGCGA GATCGAAGCG 12060
CAGGGAAGCT GCGCATGGGG AAAAGTGGTC CGGGAGCCAG AAGAGGTTTT TCTGGTGGTC 12120
GGTCCTGGCT AGCGCGGCCC GGAGATCGGC GTGGGTCGCC GCGGCGACGT CGGACGTACA 12180
CAGGGCCGTG GTTATGAGGA GGCCCCGGCG GGCGCGTTCC CGCTGCTCGG CCGAGGGCGC 12240
GCCCGCCAGG AACGGCGCCC GGAGGACGGC CGTGGCGTAA AACAGCGCTC GGCGGACCAT 12300
CGGGGCGGTT AGCGCGCGGC CGCCGAGAAA CTCGGCGTAC AGGGCGTCGA TCAGGCGGGC 12360
CGCGCTCGGG GCCACCGCGC CATAGGCCGC GGGGCTGTCC AACACGAACG CCAGCTGATA 12420
GCCCAGCGCG TGCGCCGCCA GGCTCTGCTC TCGCTCGAGG ATCGCGGCCA CCAGATGCCC 12480
GAGGCGCGCC TCCAGCCGCA GGCGGGCCGC CGGGTCCAAC ACGGACACGT TCAGGAACAC 12540
CGAGTCGGCC GCGCAGCCCG CTGCTCCCCG GGCGGCCAGG CCGGCCAGCA CGCGCGAGTG 12600
GGCCAAAAAG CCCAGCAGGT CGGAGAGGCG AATCGCGTCG TGGGCGTGGG CCGCGTTGAC 12660
GAACGCAAAC CCCGACGAGG CGAGCAGCCC CGCGAGGCGC CAGAACAGGG ACGGACGCGC 12720
GTCCGTGCCG GAGCCCGGGT CCTCCCCCAA AAACTCCGCA TAGGCCCGCG ACATATACTG 12780
GGCGTAGTTC GTGCTCTCCT CGGGGTAGCC GGCCACCCGC CGGAGGGCGT CCAGCGCCGA 12840
GCCGTTGTCG GCGGGCGTCG GGGCCCCCAG GACAAAGACG CGATACCTGG GGCCGGCCGG 12900
AGGCCCGGGG AGCACCGCGG GGGCGTTTTC GTCGGTCGGA TTTCCGACCC GAGCGAGGGT 12960
CTTGTCCGCA GGCACCACTA TGATCTCGGC CGGAGGGCTG TCCCGCATCG ATATCACGAG 13020
CCCCATGAAG CCCTTCCCGT ATCGCGCGCG CACGAGCGCG GCGTCGCACC CGAACGCCAG 13080
CCCGCCCGTC GTCCAGACGC CCACGGGCCA CGTCGAGGCC GACGGGGAGA GGTACACGTA 13140
CCGACCCGGA GTCCGTAGCA GGCCCCTGGC GGCCAGCCAG GTCACGGATG CGTTGTGCAG 13200
ATGCGCGATG CTCAGGTTCG TCGTCGGATG CCTCGGTGTC CCCGCGGGCG GCCCCGGGGG 13260
CGGCGCGTTG CGTCGGCCGT CCGGGTGCCT CTCGGTCGCC CCGTCGTCTC CCCGCGGGAA 13320
CGTAAGCCCC TCGCGGTCCG GCGCGGCCGC GAATGTTACC CAGGCCCGGG ACCGCAACAG 13380
CGCGGAGGCG CCGGGGTTGT GCGACAGTCC CTTGAGCTGG GTCACCTCGG CGGGGGGACG 13440
GGACGTGGGC CCCGCCTCGG GGAGCTCGGG CAGGCTCGCG TTCCGAGGCC GGCCGAGCAG 13500
ATAGGTCTTT GGGATGTAAA GCAGCTGCCC GGGGTCCCGA GGAAACTCGG CCGTGGTGAC 13560
CAACACGAAA CAAAAGCGCT CGGCGTACCA CCGAAGCATG GGCACGGATG CCGTAGTCAG 13620
GTTGAGTTCG CCCGGGGGCG CCAAGCGTCC GCGCTGGGGG TCGCTGGCGT CGGGGGTGTT 13680
GGGCAACCAC AGACGCCCGG TGTTTGTGTC GCGCCAGTAC GTGCGGGCCA ACCCCAGACC 13740
GTGCAAAAAC CACGGGTCGA TTTGCTCCGT CCAGTACGTG TCATGGCCCC CGGCAACGCC 13800
CACCAGGACC CCCATCACCA CCCACAGACC GGGGCCCATG GTCGTCCGTC CCGGCTGCCA 13860
GTCCGCAGAT GGGGGGGTGT CCGTACCCAC GGCCCAAAGA GGCTCCGCAC CTCGGAGGCT 13920
ATCGGAGGCC CTTTGTTGCC GTAAGCGCGG GCCAAAGGAT GGGGTGGGGT GAGGGTAAAA 13980
GCACAAAGGG AGTACCAGAC CGAAAACAAG GACGGATCGG CCCGCTCCGT TTTTCGGTGG 14040
GGTGCTGATA CGGTGCCAGC CCTGGCCCCG AACCCCCGCG CTTATGGACA CACCACACGA 14100
CAACAATGCC TTTTATTCTG TTCTTTTATT GCCGTCATCG CCGGGAGGCC TTCCGTTCGG 14160
GCTTCCGTGT TTGAACTAAA CTCCCCCCAC CTCGCGGGCA AACGTGCGCG CCAGGTCGCG 14220
TATCTCGGCG ATGGACCCGG CGGTTGTGAC GCGGGTTGGG ATCATCCCGG CGGTGAGGCG 14280
CAACAGGGCG TCTCGACACC CGACGGGCGA CTGATCGTAA TCCAGGACAA ATAGATGCAT 14340
CGGAAGGAGG CGGTCGGCCA AGACGTCCAA GACCCAGGCA AAAATGTGGT ACAAGTCCCC 14400
GTTGGGGGCC AGCAGCTCGG GAACGCGGAA CAGGGCAAAC AGCGTGTCCT CGATGCGGGG 14460
CAGAGACCCC GCGCCGTCCT CGGGGTCGGG GCGCGGGGTC GCCGCGGCGA CCCCCGTCAG 14520
CCGGCCCCAG TCCTCCCGCC ACCTCCCGCC GCGCTGCAGG TACCGCACCG TGTTGGCGAG 14580
TAGATCGTAG ACACGGCGAA TGGCGGACAG CATGGCCAGG TCAAGCCGCT CGCCCGGGCG 14640
TTGGCGTCTG GCCAGGCGGT CGGCGTGTTC GGCCTCCGGA AGGACACCCA GGACCAGGTT 14700
CGTGCCGGGC GCGGTCGGGG GCATGAGGGC CACGAACGCC AACACGGCCT GGGGGGTCAT 14760
GCTTCCCATG AGGTACCGCG CGGCCGGGTA GCACAGCAGG GAGGCGATAG GGTGCCGGTC 14820
GAAAACAAGG GTGAGGGCCG GGGGCGGGGC TTGCGGGCCC ACAGCCTCCC CCCCGATATG 14880
AGGAGCCAAA ACGGCGTCCG TCGCCGCATA AGGCGTGCTC ATTGTTATCT GGGCGCTGGT 14940
CATTACCACC GCCGCCTCCC CGGCCGATAT CTCGCCGCGG TCCAGACGGT GCTGCGTGTT 15000
GTAGATGTTC GTCAGGGTCT CGGAGGCCCC CAGCACCTGC CAGTAAGTCA TCGGCTCGGG 15060
GACGTAGACG ATATTGTCGC GCGGCCCCAG GGCCTCCATC AGCTGCGCGG AGGTGGTGGT 15120
CTTCCCCACC CCGTGGGGTC CGTCTATATA AACCCGCAGC AGCGTGGGCA GCTCCGGATC 15180
CCCGCGGGCT TCGGAGGCCC CCTGGCGATG GCTAGGACGG GACGCCGCGC GGCCGTCGGT 15240
AGGCCCGCTC GCACGAGCAG CCTGACCGAA CGCAGGCGCG TGCTGTTGGC CGGCGTGAGA 15300
AGCCATACCC GCTTCTACAA GGCGTTCGCC CGAGAGGTGC GGGAGTTCAA CGCCACCAGG 15360
ATTTGTGGAA CGCTGCTGAC GCTGATGAGC GGGTCGCTGC AGGGTCGCTC GCTGTTCGAG 15420
GCCACGCGCG TCACCTTAAT ATGCGAAGTG GACCTCGGGC CGCGCCGCCC AGACTGCATC 15480
TGCGTGTTCG AATTCGCCAA TGACAAAACG TTGGGAGGTG TGTGCGTCAT CCTGGAGCTA 15540
AAGACATGCA AATCGATTTC TTCCGGGGAC ACGGCCAGCA AACGCGAACA GCGGACCACG 15600
GGCATGAAGC AGCTGCGCCA CTCCCTGAAG CTGCTGCAGT CGCTCGCGCC TCCGGGGGAC 15660
AAGGTCGTCT ACCTGTGTCC TATTTTGGTG TTTGTCGCGC AGCGTACGCT GCGCGTCAGC 15720
CGCGTGACCC GGCTCGTCCC GCAAAAGATC TCCGGCAACA TCACCGCGGC CGTGCGGATG 15780
CTCCAAAGCC TGTCCACGTA TGCCGTGCCG CCGGAACCGC AGACCCGGCG GTCGCGGCGC 15840
CGGGTCGCCG CGACCGCCAG ACCGCAAAGG CCCCCCTCCC CGACACGTGA CCCGGAAGGC 15900
ACGGCGGGTC ATCCGGCCCC ACCAGAGAGC GACCCCCCCT CCCCAGGGGT CGTAGGCGTC 15960
GCTGCGGAGG GTGGGGGTGT GCTTCAGAAA ATCGCGGCGC TTTTTTGCGT GCCGGTGGCC 16020
GCCAAGAGCA GACCCCGGAC CAAAACCGAG TGAGGTTCTG TGTGTTGTTT TTTTTCCTCG 16080
TTTTGTTTTC TCTTCTTTCC CCCCCCCCTC CCCCGCTTCT GGCCAAGCAT CCTCACCTGC 16140
TTAAGCGGAA CCCGCGGGCG CGCGGGGACT CATTTGTCGC CGGCGACACC CACCCGACAA 16200
CAGCCCCTGG GTGTAGACCG CTGTCGCCCC CGTCTGTCGC CTCTCCCTTT TTTCCCCCCC 16260
TCAAAGAACG TGGTGTTGGG CGCCGGCCAA TTCTTCCCGG AGCGCCGTCG TCGCCCGCCC 16320
GCCGCCCTCG AACATGGACC CGTACTACCC TTTCGACGCG CTGGACGTTT GGGAACACAG 16380
GCGCTTCATC GTCGCCGACT CCAGGAGCTT CATCACCCCC GAGTTCCCCC GGGACTTCTG 16440
GATGTTGCCC GTGTTCAACA TCCCCCGGGA GACGGCGGCG GAGCGGGCGG CAGTGCTGCA 16500
GGCCCAGCGC ACCGCGGCCG CGGCGGCCCT GGAGAACGCC GCCCTCCAGG CCGCCGAGCT 16560
GCCCGTCGAC ATCGAGCGCC GGATACGCCC GATCGAGCAG CAGGTGCATC ACATCGCCGA 16620
CGCCCTGGAG GCGCTGGAGA CCGCGGCGGC CGCGGCCGAA GAGGCGGATG CCGCGCGGGA 16680
CGCCGAGGCG AGGGGGGAGG GCGCTGCGGA CGGGGCAGCG CCGTCGCCCA CCGCGGGCCC 16740
CGCCGCCGCG GAGATGGAGG TTCAGATCGT ACGCAACGAC CCGCCGCTAC GATACGATAC 16800
CAACCTCCCC GTGGATCTGC TACACATGGT GTACGCGGGC CGCGGGGCCG CGGGTTCGTC 16860
GGGAGTCGTC TTTGGTACCT GGTACCGCAC GATCCAGGAA CGCACCATCG CGGACTTCCC 16920
CCTGACCACC CGCAGCGCCG ACTTTCGAGA CGGGCGCATG TCCAAGACCT TCATGACCGC 16980
GCTGGTCCTG TCTCTGCAGT CGTGCGGCCG GCTGTACGTG GGCCAGCGCC ACTATTCCGC 17040
CTTCGAGTGC GCCGTGCTGT GTCTGTATCT GCTGTACCGA ACCACCCACG AGTCCTCCCC 17100
CGATCGCGAT CGCGCTCCCG TTGCGTTCGG GGACCTGCTG GCCCGCCTGC CGCGCTACCT 17160
GGCGCGTCTG GCCGCGGTAA TCGGCGACGA GAGCGGACGC CCGCAGTACC GCTACCGCGA 17220
CGACAAGCTG CCCAAAGCGC AGTTCGCGGC GGCCGGCGGC CGCTACGAGC ACGGGGCCCT 17280
GGCCACCCAC GTCGTGATCG CCACGTTGGT GCGCCACGGG GTGCTACCGG CGGCCCCGGG 17340
CGACGTTCCC CGAGACACCA GCACCCGCGT GAACCCCGAC GACGTGGCCC ACCGCGACGA 17400
CGTCAACCGC GCCGCCGCCG CGTTTTTGGC ACGCGGCCAC AACCTCTTCC TGTGGGAGGA 17460
CCAGACGCTG CTGCGGGCGA CCGCCAACAC CATTACGGCC CTGGCCGTGC TTCGGCGGCT 17520
CCTCGCGAAC GGCAACGTGT ACGCGGACCG CCTCGACAAC CGCCTGCAGC TGGGCATGCT 17580
GATCCCGGGA GCCGTCCCGG CGGAGGCCAT CGCTCGGGGG GCGTCCGGAT TGGACTCGGG 17640
CGCCATAAAA AGCGGCGACA ACAACCTGGA GGCGCTGTGC GTTAACTATG TACTTCCGCT 17700
GTATCAGGCA GACCCCACGG TCGAGCTGAC CCAGTTGTTT CCGGGGCTGG CCGCCCTGTG 17760
CCTGGACGCC CAGGCGGGGC GGCCACTGGC GTCGACGAGG CGCGTGGTGG ATATGTCGTC 17820
GGGCGCCCGC CAGGCGGCGC TCGTGCGCCT CACCGCGCTG GAGCTCATCA ACCGCACCCG 17880
CACAAACACC ACCCCTGTGG GGGAGATTAT TAACGCCCAC GATGCCTTGG GGATACAATA 17940
CGAACAGGGC CTGGGGCTGC TCGCCCAGCA GGCACGCATC GGCTTGGCGT CGAACGCCAA 18000
GCGATTCGCC ACGTTCAACG TGGGCAGCGA CTACGACCTG TTGTACTTTT TGTGTCTCGG 18060
GTTCATTCCC CAGTACCTGT CCGTGGCCTA GGGAAGGGTG GGGGTGGTGG TGGTGGGGTG 18120
TTTTTCTGCT GTTGTTGTTT CTGGTCCGCC TGGTCACAAA AGGCACGGCG CCCCGAAACG 18180
CGGGCTTTAG TCCCGGCCCG GACGTCGGCG GACACACAAC AACGGCGGGC CCCGTGGGTG 18240
GGTAAGTTGG TTCGGGGGCA TCGCTGTATT CCCTTGCCCG CTTCCACCCC CCCTTCCCGT 18300
TTTGTTTGTT TGTGCGGGTG CCCATGGCGT CGGCGGAAAT GCGCGAGCGG TTGGAGGCGC 18360
CTCTGCCCGA CCGGGCGGTG CCCATCTACG TGGCCGGGTT TTTGGCCCTG TACGACAGCG 18420
GGGACCCGGG CGAGCTGGCC CTGGACCCAG ACACGGTGCG TGCGGCCCTG CCTCCGGAGA 18480
ACCCCCTGCC GATCAACGTA GACCACCGCG CTCGGTGCGA GGTGGGCCGG GTGCTCGCCG 18540
TGGTCAACGA CCCTCGGGGG CCGTTTTTTG TGGGGCTGAT CGCGTGCGTG CAGCTGGAGC 18600
GCGTCCTCGA GACGGCCGCC AGCGCCGCTA TTTTTGAGCG CCGCGGACCC GCGCTCTCCC 18660
GGGAGGAGCG TCTGCTGTAC CTGATCACCA ACTACCTGCC ATCGGTCTCG CTGTCCACAA 18720
AACGCCGGGG GGACGAGGTT CCGCCCGACC GCACCCTGTT TGCGCACGTG GCCCTGTGCG 18780
CCATCGGGCG GCGCCTTGGA ACCATCGTCA CCTACGACAC CAGCCTAGAC GCGGCCATCG 18840
CTCCGTTTCG CCACCTGGAC CCGGCGACGC GCGAGGGGGT GCGACGCGAG GCCGCCGAGG 18900
CCGAGCTCGC GCTGGCCGGG CGCACCTGGG CCCCCGGCGT GGAGGCGCTC ACACACACGC 18960
TGCTCTCCAC CGCCGTCAAC AACATGATGC TGCGTGACCG CTGGAGCCTC GTGGCCGAGC 19020
GGCGGCGGCA GGCCGGGATC GCCGGACACA CGTACCTTCA GGCGAGCGAA AAATTTAAAA 19080
TATGGGGGGC GGAGTCTGCC CCTGCGCCGG AGCGCGGGTA TAAAACCGGC GCCCCGGGTG 19140
CCATGGACAC ATCCCCCGCC GCGAGCGTTC CCGCGCCGCA GGTCGCCGTC CGTGCGCGTC 19200
AAGTCGCGTC GTCGTCGTCT TCTTCTTCTT CTTTTCCGGC ACCGGCCGAT ATGAACCCCG 19260
TTTCGGCATC GGGCGCCCCG GCCCCTCCGC CGCCCGGCGA CGGGAGTTAT TTGTGGATCC 19320
CCGCCTTTCA TTACAATCAG CTCGTCACCG GGCAATCCGC GCCCCACCAC CCGCCGCTGA 19380
CCGCGTGCGG CCTGCCGGCC GCGGGGACGG TGGCCTACGG ACACCCCGGC GCCGGCCCGT 19440
CCCCGCACTA CCCGCCTCCT CCCGCCCACC CGTACCCGGG TATGCTGTTC GCGGGCCCCA 19500
GTCCCCTGGA GGCCCAGATC GCCGCGCTGG TGGGGGCCAT CGCCGCCGAC CGCCAGGCGG 19560
GTGGGCTTCC GGCGGCCGCC GGAGACCACG GGATCCGGGG GTCGGCGAAG CGCCGCCGAC 19620
ACGAGGTGGA GCAGCCGGAG TACGACTGCG GCCGTGACGA GCCGGACCGG GACTTCCCGT 19680
ATTACCCGGG CGAGGCCCGC CCCGAGCCGC GCCCGGTCGA CTCCCGGCGC GCCGCGCGCC 19740
AGGCTTCCGG GCCCCACGAA ACCATCACGG CGCTGGTGGG GGCGGTGACG TCCCTGCAGC 19800
AGGAACTGGC GCACATGCGC GCGCGTACCC ACGCCCCCTA CGGGCCGTAT CCGCCGGTGG 19860
GGCCCTACCA CCACCCCCAC GCAGACACGG AGACCCCCGC CCAACCACCC CGCTACCCCG 19920
CCGAGGCCGT CTATCTGCCG CCGCCGCACA TCGCCCCCCC GGGGCCTCCT CTATCCGGGG 19980
CGGTCCCCCC ACCCTCGTAT CCCCCAGTTG CGGTTACCCC CGGTCCCGCC CCCCCGCTAC 20040
ATCAGCCCTC CCCCGCACAC GCCCACCCCC CTCCGCCGCC GCCGGGACCC ACGCCTCCCC 20100
CCGCCGCGAG CTTACCCCAA CCCGAGGCGC CCGGCGCGGA GGCCGGCGCC TTAGTTAACG 20160
CCAGCAGCGC GGCCCACGTG AACGTGGACA CGGCCCGGGC CGCCGATCTG TTTGTGTCAC 20220
AGATGATGGG GTCCCGCTAA CTCGCCTCCA GGATCCGGAC TTGGGGGGGG TGTGTGTTTT 20280
CATATATTTT AAATAAACAA ACAACCGGAC AAAAGTATAC CCACTTCGTG TGCTTGTGTT 20340
TTTGTTTGAG AGGGGGGGGG TGGAGTGGGG GGGAAAGTGG GCCGAAT 20388
(2) INFORMATION FOR SEQ ID NO: 144:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 262 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:
Met Asn Ala His Phe Ala Asn Glu Val Gln Tyr Asp Leu Thr Arg Asp
1 5 10 15
Pro Ser Ser Pro Ala Ser Leu Ile His Val Ile Ile Ser Ser Glu Cys
20 25 30
Leu Ala Ala Ala Gly Val Pro Leu Ser Ala Leu Val Arg Gly Arg Pro
35 40 45
Asp Gly Gly Ala Ala Ala Asn Phe Arg Val Glu Thr Gln Thr Arg Ala
50 55 60
His Ala Thr Gly Asp Cys Thr Pro Trp Arg Ser Ala Phe Ala Ala Tyr
65 70 75 80
Val Pro Ala Asp Ala Val Gly Ala Ile Leu Ala Pro Val Ile Pro Ala
85 90 95
His Pro Asp Leu Leu Pro Arg Val Pro Ser Ala Gly Gly Leu Phe Val
100 105 110
Ser Leu Pro Val Ala Cys Asp Ala Gln Gly Val Tyr Asp Pro Tyr Thr
115 120 125
Val Ala Ala Leu Arg Leu Ala Trp Gly Pro Trp Ala Thr Cys Ala Arg
130 135 140
Val Leu Leu Phe Ser Tyr Asp Glu Leu Val Pro Pro Asn Thr Arg Tyr
145 150 155 160
Ala Ala Asp Gly Ala Arg Leu Met Arg Leu Cys Arg His Phe Cys Arg
165 170 175
Tyr Val Ala Arg Leu Gly Ala Ala Ala Pro Ala Ala Ala Thr Glu Ala
180 185 190
Ala Ala His Leu Ser Leu Gly Met Gly Glu Ser Gly Thr Pro Thr Pro
195 200 205
Gln Ala Ser Ser Val Ser Gly Gly Ala Gly Pro Ala Val Val Gly Thr
210 215 220
Pro Asp Pro Pro Ile Ser Pro Glu Glu Gln Leu Thr Ala Pro Gly Gly
225 230 235 240
Asp Thr Ala Thr Ala Glu Asp Val Ser Ile Thr Gln Glu Asn Glu Glu
245 250 255
Ile Xaa Xaa Xaa Xaa Xaa
260
(2) INFORMATION FOR SEQ ID NO: 145:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 423 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:
Val Pro Glu Gly Ala Trp Val Gly Gly Ala Cys Ala Arg Pro Arg Gly
1 5 10 15
Pro Arg Ala His Val Arg Leu Tyr Ala Val Cys Phe Val Cys Pro Gln
20 25 30
Gly Ile Arg Gly Gln Asp Phe Asn Leu Leu Phe Val Asp Glu Ala Asn
35 40 45
Phe Ile Arg Pro Asp Ala Val Gln Thr Ile Met Gly Phe Leu Asn Gln
50 55 60
Ala Asn Cys Lys Ile Ile Phe Val Ser Ser Thr Asn Thr Gly Lys Ala
65 70 75 80
Ser Thr Ser Phe Leu Tyr Asn Leu Arg Gly Ala Ala Asp Glu Leu Leu
85 90 95
Asn Val Val Thr Tyr Ile Cys Asp Asp His Met Pro Arg Val Val Thr
100 105 110
His Thr Asn Ala Thr Ala Cys Ser Cys Tyr Ile Leu Asn Lys Pro Val
115 120 125
Phe Ile Thr Met Asp Gly Ala Val Arg Arg Thr Ala Asp Leu Phe Leu
130 135 140
Pro Asp Ser Phe Met Gln Glu Ile Ile Gly Gly Gln Ala Arg Glu Thr
145 150 155 160
Gly Asp Asp Arg Pro Val Leu Thr Lys Ser Ala Gly Glu Arg Phe Leu
165 170 175
Leu Tyr Arg Pro Ser Thr Thr Thr Asn Ser Gly Leu Met Ala Pro Glu
180 185 190
Leu Tyr Val Tyr Val Asp Pro Ala Phe Thr Ala Asn Thr Arg Ala Ser
195 200 205
Gly Thr Gly Ile Ala Val Val Gly Arg Tyr Arg Asp Asp Phe Ile Ile
210 215 220
Phe Ala Leu Glu His Phe Phe Leu Arg Ala Leu Thr Gly Ser Ala Pro
225 230 235 240
Ala Asp Ile Ala Arg Cys Val Val His Ser Leu Ala Gln Val Leu Ala
245 250 255
Leu His Pro Gly Ala Phe Arg Ser Val Arg Val Ala Val Glu Gly Asn
260 265 270
Ser Ser Gln Asp Ser Ala Val Ala Ile Ala Thr His Val His Thr Glu
275 280 285
Met His Arg Ile Leu Ala Ser Ala Gly Ala Asn Gly Pro Gly Pro Glu
290 295 300
Leu Leu Phe Tyr His Cys Glu Pro Pro Gly Gly Ala Val Leu Tyr Pro
305 310 315 320
Phe Phe Leu Leu Asn Lys Gln Lys Thr Pro Ala Phe Glu Tyr Phe Ile
325 330 335
Lys Lys Phe Asn Ser Gly Gly Val Met Ala Ser Gln Glu Leu Val Ser
340 345 350
Val Thr Val Arg Leu Gln Thr Asp Pro Val Glu Tyr Leu Ser Glu Gln
355 360 365
Leu Asn Asn Leu Ile Glu Thr Val Ser Pro Asn Thr Asp Val Arg Met
370 375 380
Tyr Ser Gly Lys Arg Asn Gly Ala Ala Asp Asp Leu Met Val Ala Val
385 390 395 400
Ile Met Ala Ile Tyr Leu Ala Ala Pro Thr Gly Ile Pro Pro Ala Phe
405 410 415
Phe Pro Ile Thr Arg Thr Ser
420
(2) INFORMATION FOR SEQ ID NO: 146:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 355 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:
Val Leu Leu Ser Pro Ala Pro Pro Pro Leu Pro His Gly Arg Cys Pro
1 5 10 15
Pro Ser Leu Phe His His Arg Pro Gly Cys Val Ser Gly Pro Pro Ala 20 25 30
Pro Pro Arg Ser Gly Val Ser Arg Pro Gly Ala Met Ile Thr Asp Cys
35 40 45
Phe Glu Ala Asp Ile Ala Ile Pro Ser Gly Ile Ser Arg Pro Asp Ala 50 55 60 Ala Ala Leu Gln Arg Cys Glu Gly Arg Val Val Phe Leu Pro Thr Ile 65 70 75 80
Arg Arg Gln Leu Ala Asp Val Ala His Glu Ser Phe Val Ser Gly Gly 85 90 95 Val Ser Pro Asp Thr Leu Gly Leu Leu Leu Ala Tyr Arg Arg Arg Phe
100 105 110
Pro Ala Val Ile Thr Arg Val Leu Pro Thr Arg Ile Val Ala Cys Pro 115 120 125 Val Asp Leu Gly Leu Thr His Ala Gly Thr Val Asn Leu Arg Asn Thr 130 135 140
Ser Pro Val Asp Leu Cys Asn Gly Asp Pro Val Ser Leu Val Pro Pro 145 150 155 160
Val Phe Glu Gly Gln Ala Thr Asp Val Arg Leu Glu Ser Leu Asp Leu 165 170 175
Thr Leu Arg Phe Pro Val Pro Leu Pro Thr Pro Leu Ala Arg Glu Ile
180 185 190
Val Ala Arg Leu Val Arg Ile Arg Asp Leu Asn Pro Asp Pro Arg Thr 195 200 205 Pro Gly Glu Leu Pro Asp Leu Asn Val Leu Tyr Tyr Asn Gly Ala Arg 210 215 220
Leu Ser Leu Val Ala Asp Val Gln Gln Leu Ala Ser Val Asn Thr Glu 225 230 235 240
Leu Arg Ser Leu Val Leu Asn Met Val Tyr Ser Ile Thr Glu Gly Thr 245 250 255
Thr Leu Ile Leu Thr Leu Ile Pro Arg Leu Leu Ala Leu Ser Ala Gln
260 265 270
Asp Gly Tyr Val Asn Ala Leu Leu Gln Met Gln Ser Val Thr Arg Glu 275 280 285 Ala Ala Gln Leu Ile His Pro Glu Ala Pro Met Leu Met Gln Asp Gly 290 295 300
Glu Arg Arg Leu Pro Leu Tyr Glu Ala Leu Val Ala Trp Leu Ala His 305 310 315 320
Ala Gly Gln Leu Gly Asp Ile Leu Ala Pro Ala Val Arg Val Cys Thr 325 330 335
Phe Asp Gly Ala Ala Val Val Gln Ser Gly Asp Met Ala Pro Val Ile
340 345 350
Arg Tyr Pro 355
(2) INFORMATION FOR SEQ ID NO: 147:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1382 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:
Val Trp Glu Gly Leu Gly Leu Pro Glu Leu Gly Leu Met Glu Pro Ala 1 5 10 15
Asn Pro Pro Arg Asn Pro Met Ala Ala Pro Ala Arg Asp Pro Pro Gly
20 25 30
Tyr Arg Tyr Ala Ala Ala Met Val Pro Thr Gly Ser Ile Leu Ser Thr 35 40 45
Ile Glu Val Ala Ser His Arg Arg Leu Phe Asp Phe Phe Ala Arg Val
50 55 60
Arg Ser Asp Glu Asn Ser Leu Tyr Asp Val Glu Phe Asp Ala Leu Leu 65 70 75 80 Gly Ser Tyr Cys Asn Thr Leu Ser Leu Val Arg Phe Leu Glu Leu Gly
85 90 95
Leu Ser Val Ala Cys Val Cys Thr Lys Phe Pro Glu Leu Ala Tyr Met
100 105 110
Asn Glu Gly Arg Val Gln Phe Glu Val His Gln Pro Leu Ile Ala Arg 115 120 125
Asp Gly Pro His Pro Val Glu Gln Pro Val His Asn Tyr Met Thr Lys
130 135 140
Val Ile Asp Arg Arg Ala Leu Asn Ala Ala Phe Ser Leu Ala Thr Glu 145 150 155 160 Ala Ile Ala Leu Leu Thr Gly Glu Ala Leu Asp Gly Thr Gly Ile Ser
165 170 175
Leu His Arg Gln Leu Arg Ala Ile Gln Gln Leu Ala Arg Asn Val Gln
180 185 190
Ala Val Leu Gly Ala Phe Glu Arg Gly Thr Ala Asp Gln Met Leu His 195 200 205
Val Leu Leu Glu Lys Ala Pro Pro Leu Ala Leu Leu Leu Pro Met Gln
210 215 220
Arg Tyr Leu Asp Asn Gly Arg Leu Ala Thr Arg Val Ala Arg Ala Thr 225 230 235 240 Leu Val Ala Glu Leu Lys Arg Ser Phe Cys Asp Thr Ser Phe Phe Leu
245 250 255
Gly Lys Ala Gly His Arg Arg Glu Ala Ile Glu Ala Trp Leu Val Asp
260 265 270
Leu Thr Thr Ala Thr Gln Pro Ser Val Ala Val Pro Arg Leu Thr His 275 280 285
Ala Asp Thr Arg Gly Arg Pro Val Asp Gly Val Leu Val Thr Thr Ala
290 295 300
Ala Ile Lys Gln Arg Leu Leu Gln Ser Phe Leu Lys Val Glu Asp Thr 305 310 315 320
Glu Ala Asp Val Pro Val Thr Tyr Gly Glu Met Val Leu Asn Gly Ala
325 330 335
Asn Leu Val Thr Ala Leu Val Met Gly Lys Ala Val Arg Ser Leu Asp 340 345 350
Asp Val Gly Arg His Leu Leu Glu Met Gln Glu Glu Gln Leu Glu Ala
355 360 365
Asn Arg Glu Thr Leu Asp Glu Leu Glu Ser Ala Pro Gln Thr Thr Arg
370 375 380 Val Arg Ala Asp Leu Val Ala Ile Gly Asp Arg Leu Val Phe Leu Glu
385 390 395 400
Ala Leu Glu Lys Arg Ile Tyr Ala Ala Thr Asn Val Pro Tyr Pro Leu
405 410 415
Val Gly Ala Met Asp Leu Thr Phe Val Leu Pro Leu Gly Leu Phe Asn 420 425 430
Pro Ala Met Glu Arg Phe Ala Ala His Ala Gly Asp Leu Val Pro Ala
435 440 445
Pro Gly His Pro Glu Pro Arg Ala Phe Pro Pro Arg Gln Leu Phe Phe
450 455 460 Trp Gly Lys Asp His Gln Val Leu Arg Leu Ser Met Glu Asn Ala Val
465 470 475 480
Gly Thr Val Cys His Pro Ser Leu Met Asn Ile Asp Ala Ala Val Gly
485 490 495
Gly Val Asn His Asp Pro Val Glu Ala Ala Asn Pro Tyr Gly Ala Tyr 500 505 510
Val Ala Ala Pro Ala Gly Pro Gly Ala Asp Met Gln Gln Arg Phe Leu
515 520 525
Asn Ala Trp Arg Gln Arg Leu Ala His Gly Arg Val Arg Trp Val Ala
530 535 540 Glu Cys Gln Met Thr Ala Glu Gln Phe Met Gln Pro Asp Asn Ala Asn
545 550 555 560
Leu Ala Leu Glu Leu His Pro Ala Phe Asp Phe Phe Ala Gly Val Ala
565 570 575
Asp Val Glu Leu Pro Gly Gly Glu Val Pro Pro Ala Gly Pro Gly Ala 580 585 590
Ile Gln Ala Thr Trp Arg Val Val Asn Gly Asn Leu Pro Leu Ala Leu
595 600 605
Cys Pro Val Ala Phe Arg Asp Arg Leu Glu Leu Gly Val Gly Arg His 610 615 620 Ala Met Ala Pro Ala Thr Ile Ala Ala Val Arg Gly Ala Phe Glu Asp 625 630 635 640
Arg Ser Tyr Pro Ala Val Phe Tyr Leu Leu Gln Ala Ala Ile His Gly 645 650 655 Ser Glu His Val Phe Cys Ala Arg Leu Val Thr Gln Cys Ile Thr Ser
660 665 670
Tyr Trp Asn Asn Thr Arg Cys Ala Ala Phe Val Asn Asp Tyr Ser Leu 675 680 685 Val Ser Tyr Ile Val Thr Tyr Leu Gly Gly Asp Leu Pro Glu Glu Cys 690 695 700
Met Ala Val Tyr Arg Asp Leu Val Ala His Val Glu Ala Gln Leu Val 705 710 715 720
Asp Asp Phe Thr Leu Pro Gly Pro Glu Leu Gly Gly Gln Ala Gln Ala 725 730 735
Glu Leu Asn His Leu Met Arg Asp Pro Ala Leu Leu Pro Pro Leu Val
740 745 750
Trp Asp Cys Asp Gly Leu Met Arg His Ala Ala Leu Asp Arg His Arg 755 760 765 Asp Cys Arg Ile Asp Ala Gly Gly His Glu Pro Val Tyr Ala Ala Ala 770 775 780
Cys Asn Val Ala Thr Ala Asp Phe Asn Arg Asn Asp Gly Arg Leu Leu 785 790 795 800
His Asn Thr Gln Ala Arg Ala Ala Asp Ala Ala Asp Asp Arg Pro His 805 810 815
Arg Pro Ala Asp Trp Thr Val His His Lys Ile Tyr Tyr Tyr Val Leu
820 825 830
Val Pro Ala Phe Ser Arg Gly Arg Cys Cys Thr Ala Gly Val Arg Phe 835 840 845 Asp Arg Val Tyr Ala Thr Leu Gln Asn Met Val Val Pro Glu Ile Ala 850 855 860
Pro Gly Glu Glu Cys Pro Ser Asp Pro Val Thr Asp Pro Ala His Pro 865 870 875 880
Leu His Pro Ala Asn Leu Val Ala Asn Thr Val Asn Ala Met Phe His 885 890 895
Asn Gly Arg Val Val Val Asp Gly Pro Ala Met Leu Thr Leu Gln Val
900 905 910
Leu Ala His Asn Met Ala Glu Arg Thr Thr Ala Leu Leu Cys Ser Ala 915 920 925 Ala Pro Asp Ala Gly Ala Asn Thr Ala Ser Thr Ala Asn Met Arg Ile 930 935 940
Phe Asp Gly Ala Leu His Ala Gly Val Leu Leu Met Ala Pro Gln His 945 950 955 960
Leu Asp His Thr Ile Gln Asn Gly Glu Tyr Phe Tyr Val Leu Pro Val 965 970 975
His Ala Leu Phe Ala Gly Ala Asp His Val Ala Asn Ala Pro Asn Phe
980 985 990
Pro Pro Ala Leu Arg Asp Leu Ala Arg His Val Pro Leu Val Pro Pro 995 1000 1005
Ala Leu Gly Ala Asn Tyr Phe Ser Ser Ile Arg Gln Pro Val Val Gln
1010 1015 1020
His Ala Arg Glu Ser Ala Ala Gly Glu Asn Ala Leu Thr Tyr Ala Leu 1025 1030 1035 1040
Met Ala Gly Tyr Phe Lys Met Ser Pro Val Tyr His Gln Leu Lys Thr
1045 1050 1055
Gly Leu His Pro Gly Phe Gly Phe Thr Val Val Arg Gln Asp Arg Phe 1060 1065 1070 Val Thr Glu Asn Val Leu Phe Ser Ala Ser Glu Ala Tyr Phe Leu Gly 1075 1080 1085
Gln Leu Gln Val Ala Arg His Glu Thr Gly Gly Gly Val Ser Phe Thr
1090 1095 1100
Leu Thr Gln Pro Arg Gly Asn Val Asp Leu Gly Val Gly Tyr Thr Ala 1105 1110 1115 1120
Val Ala Ala Thr Ala Thr Val Arg Asn Pro Val Thr Asp Met Gly Asn
1125 1130 1135
Leu Pro Gln Asn Phe Tyr Leu Gly Arg Gly Ala Pro Pro Leu Leu Asp 1140 1145 1150 Asn Ala Ala Ala Val Tyr Leu Arg Asn Ala Val Val Ala Gly Asn Arg 1155 1160 1165
Leu Gly Pro Ala Gln Pro Leu Pro Val Phe Gly Cys Ala Gln Val Pro
1170 1175 1180
Arg Arg Ala Gly Met Asp His Gly Gln Asp Ala Val Cys Glu Phe Ile 1185 1190 1195 1200
Ala Thr Pro Val Ala Thr Asp Ile Asn Tyr Phe Arg Arg Pro Cys Asn
1205 1210 1215
Pro Arg Gly Arg Ala Ala Gly Gly Val Tyr Ala Gly Asp Lys Glu Gly 1220 1225 1230 Asp Val Ile Ala Leu Met Tyr Asp His Gly Gln Ser Asp Pro Ala Arg 1235 1240 1245
Pro Phe Ala Ala Thr Ala Asn Pro Trp Ala Ser Gln Arg Phe Ser Tyr
1250 1255 1260
Gly Asp Leu Leu Tyr Asn Gly Ala Tyr His Leu Asn Gly Asp Val Leu 1265 1270 1275 1280
Ser Pro Cys Phe Lys Phe Phe Thr Ala Ala Asp Ile Thr Ala Lys His
1285 1290 1295
Arg Cys Leu Glu Arg Leu Ile Val Glu Thr Gly Ser Ala Val Ser Thr 1300 1305 1310 Ala Thr Ala Ala Ser Asp Val Gln Phe Lys Arg Pro Pro Gly Cys Arg 1315 1320 1325
Glu Leu Val Glu Asp Pro Cys Gly Leu Phe Gln Glu Ala Tyr Pro Ile 1330 1335 1340 Thr Cys Ala Ser Asp Pro Ala Leu Leu Arg Ser Ala Arg Asp Gly Glu 1345 1350 1355 1360
Ala His Ala Arg Glu Thr His Phe Thr Gln Tyr Leu Ile Tyr Asp Asp 1365 1370 1375 Leu Lys Gly Leu Ser Leu 1380
(2) INFORMATION FOR SEQ ID NO: 148:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 222 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148:
Met Thr Met Arg Asp Asp Val Pro Leu Leu Asp Arg Glu Leu Val Tyr 1 5 10 15
Glu Ala Ala Cys Gly Gly Glu Asp Gly Glu Leu Pro Leu Asp Glu Gln
20 25 30
Phe Ser Leu Ser Ser Tyr Gly Thr Ser Asp Phe Phe Val Ser Ser Ala 35 40 45
Tyr Ser Arg Leu Pro Pro His Thr Gln Pro Val Phe Ser Lys Arg Val
50 55 60
Val Met Phe Ala Trp Ser Phe Leu Val Leu Lys Pro Leu Glu Leu Val 65 70 75 80 Ala Ala Gly Met Tyr Tyr Gly Trp Thr Gly Arg Ala Val Ala Pro Ala
85 90 95
Cys Ile Ile Ala Ala Val Leu Ala Tyr Tyr Val Thr Trp Leu Ala Arg
100 105 110
Ala Leu Leu Leu Tyr Val Asn Ile Lys Arg Asp Arg Leu Pro Leu Ser 115 120 125
Pro Pro Val Phe Trp Gly Leu Cys Val Ile Met Gly Gly Ala Ala Leu
130 135 140
Cys Ala Leu Val Ala Ala Ala His Glu Thr Phe Ser Pro Asp Gly Leu 145 150 155 160 Phe His Trp Ile Thr Ala Ser Gln Leu Leu Pro Arg Thr Asp Pro Leu
165 170 175
Arg Ala Arg Ser Leu Gly Ile Ala Cys Ala Ala Gly Ala Ala Met Trp 180 185 190 Val Ala Ala Ala Asp Cys Phe Ala Ala Phe Thr Asn Phe Phe Leu Ala
195 200 205
Arg Phe Trp Thr Arg Ala Ile Leu Lys Ala Pro Val Ala Phe 210 215 220
(2) INFORMATION FOR SEQ ID NO: 149:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 627 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149:
Val Gly Arg Gln Gly Glu Arg Trp Val Gly Gly Gly Asn Glu Glu Asn 1 5 10 15 Thr Gln Arg Ala Thr Ser Gly Met Arg Pro Glu Leu Ser Leu Lys Gly 20 25 30
Arg Pro Cys Val Thr Glu Ala Val Val Cys Pro Ser Thr Asp Ala Ala
35 40 45
Ile His Ser Gly Gly Ser Ser Ser Val Arg Pro Gln Pro Tyr Ala Arg 50 55 60
Ala Ala Arg Ala Arg Ala Thr His Gly Ser Arg Ser Arg His Arg Gln 65 70 75 80
Pro Leu Leu Pro Pro Pro Ser Ser His His Pro Thr Ile Pro Pro Pro 85 90 95 Pro Ser Pro Pro Arg Gly Ser Pro Ala Met Glu Leu Ser Tyr Ala Thr 100 105 110
Thr Leu His His Arg Asp Val Val Phe Tyr Val Thr Ala Asp Arg Asn
115 120 125
Arg Ala Tyr Phe Val Cys Gly Gly Ser Val Tyr Ser Val Gly Arg Pro 130 135 140
Arg Asp Ser Gln Pro Gly Glu Ile Ala Lys Phe Gly Leu Val Val Arg 145 150 155 160
Gly Thr Gly Pro Lys Asp Arg Met Val Ala Asn Tyr Val Arg Ser Glu 165 170 175 Leu Arg Gln Arg Gly Leu Arg Asp Val Arg Pro Val Gly Glu Asp Glu 180 185 190
Val Phe Leu Asp Ser Val Cys Leu Leu Asn Pro Asn Val Ser Ser Asp 195 200 205 Val Ile Asn Thr Asn Asp Val Glu Val Leu Asp Glu Cys Leu Ala Glu
210 215 220
Tyr Cys Thr Ser Leu Arg Thr Ser Pro Gly Val Leu Val Thr Gly Val 225 230 235 240 Arg Val Arg Ala Arg Asp Arg Val Ile Glu Leu Phe Glu His Pro Ala
245 250 255
Ile Val Asn Ile Ser Ser Arg Phe Ala Tyr Thr Pro Ser Pro Tyr Val
260 265 270
Phe Ala Gln Ala His Leu Pro Arg Leu Pro Ser Ser Leu Glu Pro Leu 275 280 285
Val Ser Gly Leu Phe Asp Gly Ile Pro Ala Pro Arg Gln Pro Leu Asp
290 295 300
Ala Arg Asp Arg Arg Thr Asp Val Val Ile Thr Gly Thr Arg Ala Pro 305 310 315 320 Arg Pro Met Ala Gly Thr Gly Ala Gly Gly Ala Gly Ala Lys Arg Ala
325 330 335
Thr Val Ser Glu Phe Val Gln Val Lys His Ile Asp Arg Val Val Ser
340 345 350
Pro Ser Val Ser Ser Ala Pro Pro Pro Ser Ala Pro Asp Ala Ser Leu 355 360 365
Pro Pro Pro Gly Leu Gln Glu Ala Ala Pro Pro Gly Pro Pro Leu Arg
370 375 380
Glu Leu Trp Trp Val Phe Tyr Ala Gly Asp Arg Ala Leu Glu Glu Pro 385 390 395 400 His Ala Glu Ser Gly Leu Thr Arg Glu Glu Val Arg Ala Val His Gly
405 410 415
Phe Arg Glu Gln Ala Trp Lys Leu Phe Gly Ser Val Gly Ala Pro Arg
420 425 430
Ala Phe Leu Gly Ala Ala Leu Ser Pro Thr Gln Lys Leu Ala Val Tyr 435 440 445
Tyr Tyr Leu Ile His Arg Glu Arg Arg Met Ser Pro Phe Pro Ala Leu
450 455 460
Val Arg Leu Val Gly Arg Tyr Ile Gln Arg His Gly Val Pro Ala Pro 465 470 475 480 Asp Glu Pro Thr Leu Ala Asp Ala Met Asn Gly Leu Phe Arg Asp Ala
485 490 495
Ala Gly Thr Val Ala Glu Gln Leu Leu Met Phe Asp Leu Leu Pro Pro
500 505 510
Lys Asp Val Pro Val Gly Ser Asp Ala Arg Ala Asp Ser Ala Ala Leu 515 520 525
Leu Arg Phe Val Asp Ser Gln Arg Leu Thr Pro Gly Gly Ser Val Ser
530 535 540
Pro Glu His Val Met Tyr Leu Gly Ala Phe Leu Gly Val Leu Tyr Ala 545 550 555 560
Gly His Gly Arg Leu Ala Ala Ala Thr His Thr Ala Arg Leu Thr Gly
565 570 575
Val Thr Ser Leu Val Leu Thr Val Gly Asp Val Asp Arg Met Ser Ala 580 585 590
Phe Asp Arg Gly Pro Ala Gly Ala Ala Gly Arg Thr Arg Thr Ala Gly
595 600 605
Tyr Leu Asp Ala Leu Leu Thr Val Cys Leu Ala Arg Ala Gln His Gly 610 615 620 Gln Ser Val 625
(2) INFORMATION FOR SEQ ID NO: 150:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 908 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150:
Val Ser Ile Ser Ala Gly Val Arg Gly Gln Gly Trp His Arg Ile Ser 1 5 10 15
Thr Pro Pro Lys Asn Gly Ala Gly Arg Ser Val Leu Val Phe Gly Leu
20 25 30
Val Leu Pro Leu Cys Phe Tyr Pro His Pro Thr Pro Ser Phe Gly Pro 35 40 45
Arg Leu Arg Gln Gln Arg Ala Ser Asp Ser Leu Arg Gly Ala Glu Pro
50 55 60
Leu Trp Ala Val Gly Thr Asp Thr Pro Pro Ser Ala Asp Trp Gln Pro 65 70 75 80 Gly Arg Thr Thr Met Gly Pro Gly Leu Trp Val Val Met Gly Val Leu
85 90 95
Val Gly Val Ala Gly Gly His Asp Thr Tyr Trp Thr Glu Gln Ile Asp
100 105 110
Pro Trp Phe Leu His Gly Leu Gly Leu Ala Arg Thr Tyr Trp Arg Asp 115 120 125
Thr Asn Thr Gly Arg Leu Trp Leu Pro Asn Thr Pro Asp Ala Ser Asp
130 135 140
Pro Gln Arg Gly Arg Leu Ala Pro Pro Gly Glu Leu Asn Leu Thr Thr 145 150 155 160
Ala Ser Val Pro Met Leu Arg Trp Tyr Ala Glu Arg Phe Cys Phe Val
165 170 175
Leu Val Thr Thr Ala Glu Phe Pro Arg Asp Pro Gly Gln Leu Leu Tyr 180 185 190
Ile Pro Lys Thr Tyr Leu Leu Gly Arg Pro Arg Asn Ala Ser Leu Pro
195 200 205
Glu Leu Pro Glu Ala Gly Pro Thr Ser Arg Pro Pro Ala Glu Val Thr
210 215 220 Gln Leu Lys Gly Leu Ser His Asn Pro Gly Ala Ser Ala Leu Leu Arg
225 230 235 240
Ser Arg Ala Trp Val Thr Phe Ala Ala Ala Pro Asp Arg Glu Gly Leu
245 250 255
Thr Phe Pro Arg Gly Asp Asp Gly Ala Thr Glu Arg His Pro Asp Gly 260 265 270
Arg Arg Asn Ala Pro Pro Pro Gly Pro Pro Ala Gly Thr Pro Arg His
275 280 285
Pro Thr Thr Asn Leu Ser Ile Ala His Leu His Asn Ala Ser Val Thr
290 295 300 Trp Leu Ala Arg Leu Leu Arg Thr Pro Gly Arg Tyr Val Tyr Leu Ser
305 310 315 320
Pro Ser Ala Ser Thr Trp Pro Val Gly Val Trp Thr Thr Gly Gly Leu
325 330 335
Ala Phe Gly Cys Asp Ala Ala Leu Val Arg Ala Arg Tyr Gly Lys Gly 340 345 350
Phe Met Gly Leu Val Ile Ser Met Arg Asp Ser Pro Pro Ala Glu Ile
355 360 365
Ile Val Val Pro Ala Asp Lys Thr Leu Ala Arg Val Gly Asn Pro Thr
370 375 380 Asp Glu Asn Ala Pro Ala Val Leu Pro Gly Pro Pro Ala Gly Pro Arg
385 390 395 400
Tyr Arg Val Phe Val Leu Gly Ala Pro Thr Pro Ala Asp Asn Gly Ser
405 410 415
Ala Leu Asp Ala Leu Arg Arg Val Ala Gly Tyr Pro Glu Glu Ser Thr 420 425 430
Asn Tyr Ala Gln Tyr Met Ser Arg Ala Tyr Ala Glu Phe Leu Gly Glu
435 440 445
Asp Pro Gly Ser Gly Thr Asp Ala Arg Pro Ser Leu Phe Trp Arg Leu 450 455 460 Ala Gly Leu Leu Ala Ser Ser Gly Phe Ala Phe Val Asn Ala Ala His 465 470 475 480
Ala His Asp Ala Ile Arg Leu Ser Asp Leu Leu Gly Phe Leu Ala His 485 490 495 Ser Arg Val Leu Ala Gly Leu Ala Arg Ala Ala Gly Cys Ala Ala Asp
500 505 510
Ser Val Phe Leu Asn Val Ser Val Leu Asp Pro Ala Ala Arg Leu Arg 515 520 525 Leu Glu Ala Arg Leu Gly His Leu Val Ala Ala Ile Arg Glu Gln Ser 530 535 540
Leu Ala Ala His Ala Leu Gly Tyr Gln Leu Ala Phe Val Leu Asp Ser 545 550 555 560
Pro Ala Ala Tyr Gly Ala Val Ala Pro Ser Ala Ala Arg Leu Ile Asp 565 570 575
Ala Leu Tyr Ala Glu Phe Leu Gly Gly Arg Ala Leu Thr Ala Pro Met
580 585 590
Val Arg Arg Ala Leu Phe Tyr Ala Thr Ala Val Leu Arg Ala Pro Phe 595 600 605 Leu Ala Gly Ala Pro Ser Ala Glu Gln Arg Glu Arg Ala Arg Arg Gly 610 615 620
Leu Leu Ile Thr Thr Ala Leu Cys Thr Ser Asp Val Ala Ala Ala Thr 625 630 635 640
His Ala Asp Leu Arg Ala Ala Arg Thr Asp His Gln Lys Asn Leu Phe 645 650 655
Trp Leu Pro Asp His Phe Ser Pro Cys Ala Ala Ser Leu Arg Phe Asp
660 665 670
Leu Ala Glu Gly Gly Phe Ile Leu Asp Ala Met Ala Thr Arg Ser Asp 675 680 685 Ile Pro Ala Asp Val Met Ala Gln Gln Thr Arg Gly Val Ala Ser Val 690 695 700
Leu Thr Arg Trp Ala His Tyr Asn Ala Leu Ile Arg Ala Phe Val Pro 705 710 715 720
Glu Ala Thr His Gln Cys Ser Gly Pro Ser His Asn Ala Glu Pro Arg 725 730 735
Ile Leu Val Pro Ile Thr His Asn Ala Ser Tyr Val Val Thr His Thr
740 745 750
Pro Leu Pro Arg Gly Ile Gly Tyr Lys Leu Thr Gly Val Asp Val Arg 755 760 765 Arg Pro Leu Phe Ile Thr Tyr Leu Thr Ala Thr Cys Glu Gly His Ala 770 775 780
Arg Glu Ile Glu Pro Lys Arg Leu Val Arg Thr Glu Asn Arg Arg Asp 785 790 795 800
Leu Gly Leu Val Gly Ala Val Phe Leu Arg Tyr Thr Pro Ala Gly Glu 805 810 815
Val Met Ser Val Leu Leu Val Asp Thr Asp Ala Thr Gln Gln Gln Leu
820 825 830
Ala Gln Gly Pro Val Ala Gly Thr Pro Asn Val Phe Ser Ser Asp Val 835 840 845
Pro Ser Val Leu Leu Phe Pro Asn Gly Thr Val Ile His Leu Leu Ala
850 855 860
Phe Asp Thr Leu Pro Ile Ala Thr Ile Ala Pro Gly Phe Leu Ala Ala 865 870 875 880
Ser Ala Leu Gly Val Val Met Ile Thr Ala Ala Gly Ile Leu Arg Val
885 890 895
Val Arg Thr Cys Val Pro Phe Leu Trp Arg Arg Glu 900 905
(2) INFORMATION FOR SEQ ID NO: 151:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 370 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:
Met Ala Ser His Ala Gly Gln Gln His Ala Pro Ala Phe Gly Gln Ala
1 5 10 15
Ala Arg Ala Ser Gly Pro Thr Asp Gly Arg Ala Ala Ser Arg Pro Ser
20 25 30
His Arg Gln Gly Ala Ser Asp Pro Glu Leu Pro Thr Leu Leu Arg Val 35 40 45
Tyr Ile Asp Gly Pro His Gly Val Gly Lys Thr Thr Thr Ser Ala Gln
50 55 60
Leu Met Glu Ala Leu Gly Pro Arg Asp Asn Ile Val Tyr Val Pro Glu
65 70 75 80
Pro Met Thr Tyr Trp Gln Val Leu Gly Ala Ser Glu Thr Leu Thr Asn
85 90 95
Ile Tyr Asn Thr Gln His Arg Leu Asp Arg Gly Glu Ile Ser Ala Gly
100 105 110
Glu Ala Ala Val Val Met Thr Ser Ala Gln Ile Thr Met Ser Thr Pro 115 120 125
Tyr Ala Ala Thr Asp Ala Val Leu Ala Pro His Ile Gly Gly Glu Ala
130 135 140
Val Gly Pro Gln Ala Pro Pro Pro Ala Leu Thr Leu Val Phe Asp Arg
145 150 155 160
His Pro Ile Ala Ser Leu Leu Cys Tyr Pro Ala Ala Arg Tyr Leu Met 165 170 175
Gly Ser Met Thr Pro Gln Ala Val Leu Ala Phe Val Met Pro Pro Thr 180 185 190
Ala Pro Gly Thr Asn Leu Val Leu Gly Val Leu Pro Glu Ala Glu His 195 200 205
Ala Asp Arg Leu Ala Arg Arg Gln Arg Pro Gly Glu Arg Leu Asp Leu 210 215 220
Ala Met Leu Ser Ala Ile Arg Arg Val Tyr Asp Leu Leu Ala Asn Thr
225 230 235 240
Val Arg Tyr Leu Gln Arg Gly Gly Arg Trp Arg Glu Asp Trp Gly Arg
245 250 255
Leu Thr Gly Val Ala Ala Ala Thr Pro Arg Pro Asp Pro Glu Asp Gly 260 265 270
Ala Gly Ser Leu Pro Arg Ile Glu Asp Thr Leu Phe Ala Leu Phe Arg 275 280 285
Val Pro Glu Leu Leu Ala Pro Asn Gly Asp Leu Tyr His Ile Phe Ala
290 295 300
Trp Val Leu Asp Val Leu Ala Asp Arg Leu Leu Pro Met His Leu Phe
305 310 315 320
Val Leu Asp Tyr Asp Gln Ser Pro Val Gly Cys Arg Asp Ala Leu Leu 325 330 335
Arg Leu Thr Ala Gly Met Ile Pro Thr Arg Val Thr Thr Ala Gly Ser 340 345 350
Ile Ala Glu Ile Arg Asp Leu Ala Arg Thr Phe Ala Arg Glu Val Gly 355 360 365
Gly Val 370
(2) INFORMATION FOR SEQ ID NO: 152:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 352 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152:
Val Leu Arg Val Val Asp Val Arg Gln Gly Leu Gly Gly Pro Gln His
1 5 10 15
Leu Pro Val Ser His Arg Leu Gly Asp Val Asp Asp Ile Val Ala Arg 20 25 30
Pro Gln Gly Leu His Gln Leu Arg Gly Gly Gly Gly Leu Pro His Pro
35 40 45
Val Gly Ser Val Tyr Ile Asn Pro Gln Gln Arg Gly Gln Leu Arg Ile 50 55 60
Pro Ala Gly Phe Gly Gly Pro Leu Ala Met Ala Arg Thr Gly Arg Arg 65 70 75 80
Ala Ala Val Gly Arg Pro Ala Arg Thr Ser Ser Leu Thr Glu Arg Arg 85 90 95 Arg Val Leu Leu Ala Gly Val Arg Ser His Thr Arg Phe Tyr Lys Ala 100 105 110
Phe Ala Arg Glu Val Arg Glu Phe Asn Ala Thr Arg Ile Cys Gly Thr
115 120 125
Leu Leu Thr Leu Met Ser Gly Ser Leu Gln Gly Arg Ser Leu Phe Glu 130 135 140
Ala Thr Arg Val Thr Leu Ile Cys Glu Val Asp Leu Gly Pro Arg Arg
145 150 155 160
Pro Asp Cys Ile Cys Val Phe Glu Phe Ala Asn Asp Lys Thr Leu Gly
165 170 175 Gly Val Cys Val Ile Leu Lys Thr Cys Lys Ser Ile Ser Ser Gly Asp
180 185 190
Thr Ala Ser Lys Arg Glu Gln Arg Thr Thr Gly Met Lys Gln Leu Arg
195 200 205
His Ser Leu Lys Leu Leu Gln Ser Leu Ala Pro Pro Gly Asp Lys Val 210 215 220
Val Tyr Leu Cys Pro Ile Leu Val Phe Val Ala Gln Arg Thr Leu Arg
225 230 235 240
Val Ser Arg Val Thr Arg Leu Val Pro Gln Lys Ile Ser Gly Asn Ile
245 250 255 Thr Ala Ala Val Arg Met Leu Gln Ser Leu Ser Thr Tyr Ala Val Pro
260 265 270
Pro Glu Pro Gln Thr Arg Arg Ser Arg Arg Arg Val Ala Ala Thr Ala
275 280 285
Arg Pro Gln Arg Pro Pro Ser Pro Thr Arg Asp Pro Glu Gly Thr Ala 290 295 300
Gly His Pro Ala Pro Pro Glu Ser Asp Pro Pro Ser Pro Gly Val Val
305 310 315 320
Gly Val Ala Ala Glu Gly Gly Gly Val Leu Gln Lys Ile Ala Ala Leu
325 330 335 Phe Cys Val Pro Val Ala Ala Lys Ser Arg Pro Arg Thr Lys Thr Glu
340 345 350
(2) INFORMATION FOR SEQ ID NO: 153:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 571 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153:
Met Asp Pro Tyr Tyr Pro Phe Asp Ala Leu Asp Val Trp Glu His Arg
1 5 10 15
Arg Phe Ile Val Ala Asp Ser Arg Ser Phe Ile Thr Pro Glu Phe Pro 20 25 30
Arg Asp Phe Trp Met Leu Pro Val Phe Asn Ile Pro Arg Glu Thr Ala
35 40 45
Ala Glu Arg Ala Ala Val Leu Gln Ala Gln Arg Thr Ala Ala Ala Ala 50 55 60 Ala Leu Glu Asn Ala Ala Leu Gln Ala Ala Glu Leu Pro Val Asp Ile 65 70 75 80
Glu Arg Arg Ile Arg Pro Ile Glu Gln Gln Val His His Ile Ala Asp
85 90 95
Ala Leu Glu Ala Leu Glu Thr Ala Ala Ala Ala Ala Glu Glu Ala Asp 100 105 110
Ala Ala Arg Asp Ala Glu Arg Glu Gly Ala Ala Asp Gly Ala Ala Pro
115 120 125
Ser Pro Thr Ala Gly Pro Ala Ala Ala Glu Met Glu Val Gln Ile Val
130 135 140 Arg Asn Asp Pro Pro Leu Arg Tyr Asp Thr Asn Leu Pro Val Asp Leu
145 150 155 160
Leu His Met Val Tyr Ala Gly Arg Gly Ala Ala Gly Ser Ser Gly Val
165 170 175
Val Phe Gly Thr Trp Tyr Arg Thr Ile Gln Glu Arg Thr Ile Ala Asp 180 185 190
Phe Pro Leu Thr Thr Arg Ser Ala Asp Phe Arg Asp Gly Arg Met Ser
195 200 205
Lys Thr Phe Met Thr Ala Leu Val Leu Ser Leu Gln Ser Cys Gly Arg 210 215 220 Leu Tyr Val Gly Gln Arg His Tyr Ser Ala Phe Glu Cys Ala Val Leu 225 230 235 240
Cys Leu Tyr Leu Leu Tyr Arg Thr Thr His Glu Ser Ser Pro Asp Arg 245 250 255 Asp Arg Ala Pro Val Ala Phe Gly Asp Leu Leu Ala Arg Leu Pro Arg
260 265 270
Tyr Leu Ala Arg Leu Ala Ala Val Ile Gly Asp Glu Ser Gly Arg Pro 275 280 285 Gln Tyr Arg Tyr Arg Asp Asp Lys Leu Pro Lys Ala Gln Phe Ala Ala 290 295 300
Ala Gly Gly Arg Tyr Glu His Gly Ala Thr His Val Val Ile Ala Thr 305 310 315 320
Leu Val Arg His Gly Val Leu Pro Ala Ala Pro Gly Asp Val Pro Arg 325 330 335
Asp Thr Ser Thr Arg Val Asn Pro Asp Asp Val Ala His Arg Asp Asp
340 345 350
Val Asn Arg Ala Ala Ala Ala Phe Leu Arg His Asn Leu Phe Leu Trp 355 360 365 Glu Asp Gln Thr Leu Leu Arg Ala Thr Ala Asn Thr Ile Thr Ala Val 370 375 380
Leu Arg Arg Leu Leu Ala Asn Gly Asn Val Tyr Ala Asp Arg Leu Asp 385 390 395 400
Asn Arg Leu Gln Leu Gly Met Leu Ile Pro Gly Ala Val Pro Ala Glu 405 410 415
Ala Ile Arg Ala Ser Gly Leu Asp Ser Gly Ala Ile Lys Ser Gly Asp
420 425 430
Asn Asn Leu Glu Ala Leu Cys Val Asn Tyr Val Leu Pro Leu Tyr Gln 435 440 445 Ala Asp Pro Thr Val Glu Leu Thr Gln Leu Phe Pro Gly Leu Ala Ala 450 455 460
Leu Cys Leu Asp Ala Gin Ala Gly Arg Pro Leu Ala Ser Thr Arg Arg 465 470 475 480
Val Val Asp Met Ser Ser Gly Ala Arg Gln Ala Ala Leu Val Arg Leu 485 490 495
Thr Ala Leu Glu Leu Ile Asn Arg Thr Arg Thr Asn Thr Thr Pro Val
500 505 510
Gly Glu Ile Ile Asn Ala His Asp Ala Leu Gly Ile Gln Tyr Glu Gln 515 520 525 Gly Leu Gly Leu Leu Ala Gln Gln Ala Arg Ile Gln Ala Lys Arg Phe 530 535 540
Ala Thr Phe Asn Val Gly Ser Asp Tyr Asp Leu Leu Tyr Phe Leu Cys 545 550 555 560
Leu Gly Phe Ile Pro Gln Tyr Leu Ser Val Ala 565 570
(2) INFORMATION FOR SEQ ID NO: 154:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 571 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:
Met Asp Pro Tyr Tyr Pro Phe Asp Ala Leu Asp Val Trp Glu His Arg
1 5 10 15
Arg Phe Ile Val Ala Asp Ser Arg Ser Phe Ile Thr Pro Glu Phe Pro 20 25 30 Arg Asp Phe Trp Met Leu Pro Val Phe Asn Ile Pro Arg Glu Thr Ala 35 40 45
Ala Glu Arg Ala Ala Val Leu Gln Ala Gln Arg Thr Ala Ala Ala Ala
50 55 60
Ala Leu Glu Asn Ala Ala Leu Gln Ala Ala Glu Leu Pro Val Asp Ile 65 70 75 80
Glu Arg Arg Ile Arg Pro Ile Glu Gln Gln Val His His Ile Ala Asp
85 90 95
Ala Leu Glu Ala Leu Glu Thr Ala Ala Ala Ala Ala Glu Glu Ala Asp 100 105 110 Ala Ala Arg Asp Ala Glu Arg Glu Gly Ala Ala Asp Gly Ala Ala Pro 115 120 125
Ser Pro Thr Ala Gly Pro Ala Ala Ala Glu Met Glu Val Gln Ile Val
130 135 140
Arg Asn Asp Pro Pro Leu Arg Tyr Asp Thr Asn Leu Pro Val Asp Leu 145 150 155 160
Leu His Met Val Tyr Ala Gly Arg Gly Ala Ala Gly Ser Ser Gly Val
165 170 175
Val Phe Gly Thr Trp Tyr Arg Thr Ile Gln Glu Arg Thr Ile Ala Asp 180 185 190 Phe Pro Leu Thr Thr Arg Ser Ala Asp Phe Arg Asp Gly Arg Met Ser 195 200 205
Lys Thr Phe Met Thr Ala Leu Val Leu Ser Leu Gln Ser Cys Gly Arg
210 215 220
Leu Tyr Val Gly Gln Arg His Tyr Ser Ala Phe Glu Cys Ala Val Leu 225 230 235 240
Cys Leu Tyr Leu Leu Tyr Arg Thr Thr His Glu Ser Ser Pro Asp Arg
245 250 255
Asp Arg Ala Pro Val Ala Phe Gly Asp Leu Leu Ala Arg Leu Pro Arg 260 265 270
Tyr Leu Ala Arg Leu Ala Ala Val Ile Gly Asp Glu Ser Gly Arg Pro
275 280 285
Gln Tyr Arg Tyr Arg Asp Asp Lys Leu Pro Lys Ala Gln Phe Ala Ala 290 295 300
Ala Gly Gly Arg Tyr Glu His Gly Ala Thr His Val Val Ile Ala Thr
305 310 315 320
Leu Val Arg His Gly Val Leu Pro Ala Ala Pro Gly Asp Val Pro Arg
325 330 335 Asp Thr Ser Thr Arg Val Asn Pro Asp Asp Val Ala His Arg Asp Asp
340 345 350
Val Asn Arg Ala Ala Ala Ala Phe Leu Arg His Asn Leu Phe Leu Trp
355 360 365
Glu Asp Gln Thr Leu Leu Arg Ala Thr Ala Asn Thr Ile Thr Ala Val 370 375 380
Leu Arg Arg Leu Leu Ala Asn Gly Asn Val Tyr Ala Asp Arg Leu Asp
385 390 395 400
Asn Arg Leu Gln Leu Gly Met Leu Ile Pro Gly Ala Val Pro Ala Glu
405 410 415 Ala Ile Arg Ala Ser Gly Leu Asp Ser Gly Ala Ile Lys Ser Gly Asp
420 425 430
Asn Asn Leu Glu Ala Leu Cys Val Asn Tyr Val Leu Pro Leu Tyr Gln
435 440 445
Ala Asp Pro Thr Val Glu Leu Thr Gln Leu Phe Pro Gly Leu Ala Ala 450 455 460
Leu Cys Leu Asp Ala Gin Ala Gly Arg Pro Leu Ala Ser Thr Arg Arg
465 470 475 480
Val Val Asp Met Ser Ser Gly Ala Arg Gln Ala Ala Leu Val Arg Leu
485 490 495 Thr Ala Leu Glu Leu Ile Asn Arg Thr Arg Thr Asn Thr Thr Pro Val
500 505 510
Gly Glu Ile Ile Asn Ala His Asp Ala Leu Gly Ile Gln Tyr Glu Gln
515 520 525
Gly Leu Gly Leu Leu Ala Gln Gln Ala Arg Ile Gln Ala Lys Arg Phe 530 535 540
Ala Thr Phe Asn Val Gly Ser Asp Tyr Asp Leu Leu Tyr Phe Leu Cys 545 550 555 560
Leu Gly Phe Ile Pro Gln Tyr Leu Ser Val Ala 565 570
(2) INFORMATION FOR SEQ ID NO: 155:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11706 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155:
GGACGAACCA ACGGACACCT CCGCAAAGCG CGCGCGCGCC TCCCCCGCGG CGTCGCGACA 60 GACCAGATAC AGCAGGGCGT GGAGGCAGTC GCGCGTGCGC GGGGGCAGCC ATACCGCGTA 120
TAGGGTAATG GCGCTGACGC TCTCCTCCAC CCAAACGATG CCGGGGGCTT CCATGCCACG 180
ACGCCCGGGG GTTGCCGTGT ATCGAACGAG CGCGGCCCCA GACTTATAGG GTGCTAAAGT 240
TCACCGCCCC CTGCATCATG GGCCAGGCCT CGGTGGGAAG CTCCGACAGA GCCGCCTCGA 300
GAATGATGTC AGTGTTGGGC TGGGCGCCGG AGGCGTGCGT GCGCAAGCAG CGCCCCCACG 360 CGGGCGCGCG CAGCTTGAAG CGCGCGCCCG CAAACTCCCG CTTATGGGCC ATCAGCAGCG 420
CGTACAGCTG TCTGTGCGTC CGGCAGGCGC TGTGGTCGAT GCGGTGGGCG TCCAGCAGCT 480
CCACGATGGC TCGCTTGGTG AGGTTTTTAA CGCGCCCCGC CCCGGGAAAC GTCTGCGTGC 540
TCTTGGCCAG CTGCACCCCG AACAGTTCGC CCCAGATGAT CTTGAACAGC GACAGCGCGT 600
GCTCCGTCTC GCTCACGGAC CCGCGCGGGG GGCAGCCGCT CAGGGCGTCG GCCACGCGCT 660 TAACCGCGTC CTCCGACAGC AAGGGGCCGT CGGTCACGTT ACAGTGGCCC AGTTCGAACA 720
CCAGCTGCAT GTAGCGGTCG TAGTGGGGGT TCAGCAGCTC CAGCACGTCC TCGGGGCTAA 780
AGGTTCGCCC CGACCCCCCG GCCATCGAGT CCCACTGCAG GCACGCGGCC ATGGTGCTGC 840
ACAGACGGAA CAGCTCCCAG ACGGGGGCGA CGTTTAGGGT GGGGTGTAGG GCCACAAGCT 900
CCAGCTCTCC GGCGGCGTTG ATCGTGGGGA TGACGCCCGT GGCGTAGTGG TCGTAAAGCC 960 GCCGGAAGAT GGCGCTGCTA TGGGCGGCCA TGGGGACGCG AAGACAGGCC TCCAGCAGCA 1020
CCAGGTAGAT GAACCGCGTG CGGCCGACCA GGCTGTTGAG GCCGCGCATG AGCGCGACCA 1080
CCTCGGCCGG CGCGACGTCC GGCCGGAGGT ACTTTTCGAC GAAAAGGCCC ACCTCCTCCG 1140
TCTCGGCGGC CTGGGCCGAC AGGGACGTGT CGGGGTCCTG GCAGCGCAGC TCCCGCAGAT 1200
CCCGCTGGGC CCTCAGGGCA TCAAAATGTA TCCCCCGCAA AAACAGACAA AAGTTCCTCG 1260 GGGTCAGCGC GGCGTCGTGG CCCCAGAACC GCACGTGCAT GCAGTTGAGG GTCAGAAGCA 1320
TGTGGAGGAT GTTAAGACTG TCCGCGAGGC ACGCCAGCGT GCACCTCTCG AAGTAGTGCT 1380
TGTACCGGAA TTTGCTGTAG ATGCGCGACC CCCGCGCCTG CGCCGCGTCG GCGTGCGACG 1440
CGTCGCAGCG CCCTTTGAAC CGGCGGCACA ACAGGTTCGT CACCTGGGAA AACTGTGCCG 1500
GCCACTGCCC GCTGGCGCTC ACCACGTGGT TGAGCAGCAT GGGCGTAAAG ACGGGCTCCG 1560 AGCGCGCCCC GGACCCGTCC ATGTAGATCA GCAGCTCCCC CTTGCGGAGA GTCCGTACCC 1620
GCCCCAGCGA CTGGTACACG GACACCATGT CCGGCCCGTA GTTCATGGGT TTCACGTAGG 1680
CGAACATGCT GTCAAAGTGC GGCGGATCGA AGCTAAGGCC CACCGTCACG ACCGTTGTGT 1740
AGATGACCAC CCGGTACCGG CCCCATGTGG TCACGTCGCC GGGCGGGGTG AGCGAGTGGA 1800
GCAGCAGCAC GCGGTCCGTA AACTGCCGGC AGAACCTGGC AACGACCTCC GCGAAGGAGA 1860 CCGTCGACGA GAAGATGCAG ACGTTATCTC CGCCGGCCAG GCGCGCCTCC AGCTCCCCGA 1920
AGAAGGTGGC GTCCGGGGGG GCGTCCGGGG GGGGCGCCCC GCCCGCCGGC CCCGGCGGGC 1980
GCAGGGCCGC CTGCAGGACC TCGGGCCCCA GGCGCGGGAG AAACAGACAA CGGCGCGCCG 2040
AAAATCCGGG CATGGCATAC TCCCCGATGA CCACGTGAAC GTTCTTTTCG CCCCGGAGGC 2100 TGCACAGAAA GTCCACCAGC TGCGCGTTGG CGGTGGCGTC CATGGCGATG ATCCGCGGGC 2160
ACGTGCGCAG CAGGCGCAGC ATCAACGCGT CGACGCGGCC CAGCTGCTGC ATCGTCGGCG 2220
AGTACAGTTG GCCCAACGTC GACATGACTT CGTCCAGGAC GAGCACGTCG TAGTTGTTCA 2280
ACAGGTTCGG GCCCACGCGA TGAAGACTTT CCACCTGCAC GATGAGACGG TGGAAGGGGC 2340 GGTCGTTCAT GATGTAATTG GTGGATGAGA AGTAGGTGAC GAAGTCGGGC AACCCTGACT 2400
CAGCGAACCG CGTCGCCAGG GTCTGAGTAA AACTCCGACG ACAGGAGACG ACCAGCACAC 2460
TCGTGTCCGG AGAGTGGATC GCTTCCCCCA ACCAGCGGAT CAGCGCGGTA GTTTTTCCCG 2520
AGCCCATTGG CGCGCGGACC ACAGTTACGC ACCGGGCCGT CGGGGCGCTC GCGTCCGGGA 2580
AGGTGACGGG TCCGTGTTGC TGCCGCTCGA TCGTTGTTTT CGGGTGGACC CGGGGAACCC 2640 ACTCGGCCAA ATCCCCCCCG TAAAGCATCC GCGCCAGCGA TACACTCGAC GTGTACTGCT 2700
CGCACTCGTC ATCCCCGATG GGACGCCGGG CCCCCAGGGG ATCCCCCGAG GCCGCGCCGG 2760
GCGCCGACGT CGCGCCCGGG GCGCGGGCGG CGTGGTGGGT CTGGTGTGTG CAGGTGGCGA 2820
CGTTCATCGT CTCGGCCATC TGCGTCGTGG GGCTCCTGGT GCTGGCCTCT GTGTTCCGGG 2880
ACAGGTTTCC CTGCCTTTAC GCCCCCGCGA CCTCTTATGC GGAGGCGAAC GCCACGGTCG 2940 AGGTGCGCGG GGGTGTAGCC GTCCCCCTCC GGTTGGACAC GCAGAGCCTG CTGGCCACGT 3000
ACGCAATTAC GTCTACGCTG TTGCTGGCGG CGGCCGTGTA CGCCGCGGTG GGCGCGGTGA 3060
CCTCGCGCTA CGAGCGCGCG CTGGATGCGG CCCGTCGCCT GGCGGCGGCC CGTATGGCGA 3120
TGCCACACGC CACGCTAATC GCCGGAAACG TCTGCGCGTG GCTGTTGCAG ATCACAGTCC 3180
TGCTGCTGGC CCACCGCATC AGCCAGCTGG CCCACCTTAT CTACGTCCTG CACTTTGCGT 3240 GCCTCGTGTA TCTCGCGGCC CATTTTTGCA CCAGGGGGGT CCTGAGCGGG ACGTACCTGC 3300
GTCAGGTTCA CGGCCTGATT GACCCGGCGC CGACGCACCA TCGTATCGTC GGTCCGGTGC 3360
GGGCAGTAAT GACAAACGCC TTATTACTGG GCACCCTCCT GTGCACGGCC GCCGCCGCGG 3420
TCTCGTTGAA CACGATCGCC GCCCTCAACT TCAACTTTTC CGCCCCGAGC ATGCTCATCT 3480
GCCTGACGAC GCTGTTCGCC CTGCTTGTCG TGTCGCTGTT GTTGGTGGTC GAGGGGGTGC 3540
TGTGTCACTA CGTGCGCGTG TTGGTGGGCC CCCACCTCGG GGCCATCGCC GCCACCGGCA 3600
TCGTCGGCCT GGCCTGCGAG CACTACCACA CCGGTGGCTA CTACGTGGTG GAGCAGCAGT 3660
GGCCGGGGGC CCAGACGGGA GTCCGCGTCG CCCTGGCGCT CGTCGCCGCC TTTGCCCTCG 3720
CCATGGCCGT GCTTCGGTGC ACGCGCGCCT ACCTGTATCA CCGGCGACAC CACACTAAAT 3780
TTTTCGTGCG CATGCGCGAC ACCCGGCACC GCGCCCATTC GGCGCTTCGA CGCGTACGCA 3840
GCTCCATGCG CGGTTCTAGG CGTGGCGGGC CGCCCGGAGA CCCGGGCTAC GCGGAAACCC 3900
CCTACGCGAG CGTGTCCCAC CACGCCGAGA TCGACCGGTA TGGGGATTCC GACGGGGACC 3960
CGATCTACGA CGAAGTGGCC CCCGACCACG AGGCCGAGCT CTACGCCCGA GTGCAACGCC 4020
CCGGGCCTGT GCCCGACGCC GAGCCCATTT ACGACACCGT GGAGGGGTAT GCGCCAAGGT 4080
CCGCGGGGGA GCCGGTGTAC AGCACCGTTC GGCGATGGTA GCCGTTTCGT TCGTTTTAAT 4140
AAACCGACGT TGTGCGTTTC ACCATACTTC GGCGCGCGCG TGTGTGTGTT TTTTTTGTGG 4200
TGTTTATTTT CCCCCACCCC TTCCTTTTCT TTCGGCCACC ACCCCCCTCC TCCCCCGTAC 4260
TATACAACAA AAAATACCAC ACATACGACC AAATACGGAC AATCATTTCT GTCTTTATTC 4320
GCTGTCAGAG AGTGGGGGCG TGAGCGTGGC AGGAGGGCGG GCCACGTCGG GGTCCCGCCG 4380
TCTGGTGTGA CGCGATGGGG GGTCCGATGC GCGCCGGTAC TGGGGCCCCG GCGCCCGGGT 4440
GACCACGCGC ATGTCGGGGG GCACGTAGAA GTTACCCTCT TCTTCGGACT CGATGTCCAC 4500
GACGTCAAAT TCGTGGGCGG TCAGCGAGAC GACCTCCCCG CCGTCGGTGA TGATGACGTT 4560
GTGTCGGCAG CAGCAGGGCC GCGCCCCGGA GAACGCGAGG CCCATAACTT GGCGAGCGTA 4620
TCGTCGAAGG CCAGGCGGCT GTTTCGCCGG ATGTCCCGGT AGATCCCCGG CTCGACGCGG 4680
ACGGGGGTGA TGATCAGGGC GATCGGAACG GCCTGGTCCG GGAGGATCGA TGCCTTGGCG 4740
GGTCCGGGGG CCCCGCCACG CCCGGCGGGC GCTCCGCGGC CGTCCTCCAG GCGGAACGTC 4800
ACGCCCTCCT CCGCGCCCGC GCGGTGCCTG CCGAGGAACG TCACCAGGTG CGGTTGCAGG 4860
GGGCAGTCGG GAAAGTGGCT GTCGAGGACG TATCCCTGCA CCAAGATCTG TTTGAAGTTC 4920
GGGTGGCGGG GGTTGGCGAA GATGGGCTCG CGGCGAACCA GCTCCCCGGA GCTCCAGGCC 4980
ACGGGAGAGA TGGTGCGACG CTCGAGGTCG GGGACGCCAA ACAGAAGCAC CTCCGAGACA 5040
ACGCCGCTAT TTAACTCCAC CAGCGCCCGA TCCGGGGCGG AGCATCGCCT TTTTTCGCCG 5100
GCGGCGCGGG AATCGAGCCA GTCCCGGTCT TGGGTGACGA GCGCCTCCTC CGGGCCCGGG 5160
ACGCGCCCGG GCGCGAAGTA GCGCACGCCG GGGTTGGGGA TGGACCGGAT GAACGCCCGG 5220
AACGCCTCCG GCGATCGCCG CGCCATCAGG TCCTCGTACG CGGAGGCCGC GGGGGCGCCG 5280
GGGTCCGCGG GGTCGAACGC GTACTTGGCT CGGCACTTAA CCTCGTAGAA GGCCAGGGGG 5340
GTCTGGGGGG CGGGGGCCAG GTAGCCGTGA GGGTCCCTGG GGCACACGAG GATGTCCAGG 5400
GACGCCCCCA CCATGCCCGT GTGGCCGTCC ATGAGGACCC CGCACGCGTG CACGTTCTCC 5460
TCGGCGAGGT CCCCGGGTTG GTGAAAGACG AAGCGCCCGG CGTCGGCGTC GTCGTTGACG 5520
CCCGCGTCCG CGCGGCCCAC GCAGTAGCGA AACAGCAGGT TTCGGGCCGT CGGCTCGTTC 5580
ACCCGCCCGA ACATCACCGC CGACGACTGG GCGTCCAGCC GCAGGCTGGC GTTGTGGGTG 5640
AGCCACTGGG ACGAGAAGCA CGGACCCTGC GCGCCCCACC GCAGCGTGGA GGCGGTCGTC 5700
AGGCCCCGCC GAAGCAGGGC CCAGAGCTGG CAGTCGGCCT GGTTTTGCGT CGCCGCCTCG 5760
TAAAATCCCA TAAGCGGGCG GGGGGCGACG GCTTCGGCGG CGGACGGGGG GGCGCGGCGC 5820
GTCAGGCGCC AGAGGTGCCG GCCGAGCCCG CGGTCCACCA TGCCGGCCGC CTCCAGCGAC 5880
ACGACGAGGG AGCACAGATA GTCCAGGCGA GCCCACAGGG GCCCGATGGC CAGAGGGGAG 5940
CGGACGCCGC GCAGCAGGCC GCGCAGGTGG CGCTCGAACG TTTCCGCCAA GATATGGGGG 6000
GGCAGTGCGT TGGGGATCGC CGACGCCGAC CACATCGGGT CGGGGTCCGG GGGACCGGGG 6060
CTGCAGTCCG GGTCGATGGC GTGTGCGCCC CCCGGCGAGA GGGGAATGTC GGGGGTTGGC 6120
GGGCCGGATG AGGCCTCAGA GAGGGCCGGG GACGCGGGCC GGGCCTTTTC GCCCGGGGCC 6180
CCGCCGTCGG GTTGCCCACG TGGGGGGCTC TGGGGCCAAT GGGAACCCGG GGCCCCCGGT 6240
GACGTGGGGC GGGGTGGGGC GGGGCGGGGC CCAAAGACGG TCGCCAGATC TAGGCTGTTG 6300
GGTCGGGGCC GCTTCGGGGG ACTATCGGGG TCGCGGGCGG GGTCCGCGGG GCGCTTGGCG 6360
CCGGGTGTTG CGGCGGCCGC CATTTTTACG AGCAGCCGAA GAGCTCGAGG GCGGAAGGGA 6420
TCCTCACGAC AGAGAGTGGC GCGCGGCCGG GTTGGCGTGA CAGAGGCGGG AGACCAGCAC 6480
CAGCAGCGGC CTCAGCTCGG GCGGCAGCGA CACCGACGAC AGGACGGCCT TGTGCGTGCG 6540
CTGGTAATTT ATACACTGCT CCGTGAACGC GCGCCGAATC TTGGGATTGC GAAGGTGGCG 6600
CCGGATGCCC TCCGGCACGT CATACGCCAG GCCGTGGGTG TTGGTCTCGG CCGAGTTGAC 6660
AAAGAGGGCG GGGTGCAGAA CGCAGCGATA GGCGAGGAGG GCCACGGCAA AGTCCGGCGA 6720
GAGCTGGTTG TTAAAGTACT GGTAGCCCGG GACGCGGGTC ACGGGGACGC CCAGGCTCGG 6780
GGCCACGTAC ACGCTAACCA GCAGCTCCAG CAGCGTCTGC CCCAGGGCGT AGAGATCGAC 6840
CGCCAGCCCG ACGTCGTGCT TCAGGGGGCG GTTGTTAAAC TCGGCCCGCT CGTTGTTGAG 6900
GTACTTTACC AAGAGCTCCG GCGGCTGGTT GTACCCGTGC CCCACCAGAG TGTGAAAGTT 6960
GGCCGTGGTC AGGGCGGCGG GCATCCCAAA CCCCCGGGGG GACTCGAGGT CCGGCTCCTG 7020
GAGGCAAAAC TGGCCCCGGG ATATCGTGGA GTTGGAGTTC AGGGTCACCA GGCTAAAGTC 7080
GGCCAGGACG GCCCGCCGGA GCGACACCGC GTCCGATCGC AGCATCACGA GGACGTTGGC 7140
GCACTTGATG TCCAGGTGGC TGATCCCGCA CCTGGTGTTC AGGAACACCA CGGCGCGCGC 7200
CAGGTCTGTG AAGCAGTGGT GGAGGGCCGT CGCGACGGAG GGGGTGGTCG CGCGCAGGGA 7260
CGCCAGCTGG CCGATGTACT TGCCGAGGTC CATGTCGTAC GCGGGGAACA CGATCTGGCG 7320
CTGCTGCAGC GAGAACCCGA GCGGGGTGAT AAAGCCGCGG ATGTCGTGGG TGCGGCCGCC 7380
GCGAAGAGCG CACTCCCCCA CGAGCAGGGT CGCGACGAGC TCCACGGCAA ACCACTCCTT 7440
TTCCCGGATG GTCTTCACGG CGAGCTTGTG TTCGCGAATC AACTGCACCT CGCCGTACCC 7500
CCCCGAGCCC CCGAAGCTGC GGGCCCCGGG GATCTCCAGG GTCGTGTAGC GGAGGGCGGG 7560
GTTGACGGCG AATACGGGGA TGCATAGCTT GTGGATGCGC GCGAGGGACA GGATGTGCGA 7620
GGGGGGCGAC GGGGGCGAGG TCATGGCCGT CTCGGACCTG CGCAGGGGCG GGCGCCTCAG 7680
CTTGGCCGCA GGGCCGGGGG CCTCGGGGGA CGAGCGGCGA CGAGACGAGC GGCTCACTCG 7740
CCATCGGGAC AGTCCCGCGC GAAGCCGCTC CCGGAAGCTG GATCGGCGGC GGGACCCGGG 7800
GCGGGCTCCG GAGACGGCGC CGTCTCGGGG GGAGGGGCCG CTTGGGCGTC CGGACGCCCG 7860
GCGGCTGAGG GAGTGTATGT AGGACGCGAG CCAGGCCTTG AAGGAGCGTC GGTGTGCACC 7920
TTGGGGGCTG ATGTCAGCTG CCACATGACT AGCAGGTCGC TGTCGCCCGG ACTCATCCAT 7980
CCGTCCGCCA GGTCGCCGTC CCCCCACAGA GACGCGTTCG CCGCGGCCTC TTCGAGCTGC 8040
TCCTCCTGGT CCGCAAGACG ATCGTCCGCC GCGTCCAGGC GCTCGCTAAG CGCGGGATCG 8100
AGGTACCGTC GGTGTGCGGT TAGAAAATCA CGTCGCGCCG CTTGCTCTTC CACGCGAATT 8160
TTAACACAGG TCGCTCGCTG TCGCATCATC TCTAAGCGCG CGCGGGACTT TAGCCGCGCC 8220
TCCAATTCCA AGTGGGCCGC CTTGGCGGCC ATAAAGGCGC CAACAAACCT AGGATCTTGT 8280
GTACTCACGC CCTCCCGGTG TAGCTGCAGG GTCTGGTCCC TGTACACCTC GGCCCGGAGG 8340
TGCGTCTCGG CCAAACGTCG GCGCAGGGCC GCGTGGCTGG CGTCTCGGCT CATCTCGCCG 8400
CCCCCGCGCG CGCCCGACGT CGGACTCCTT CGCCCCGACC CCCCTGACCT CAGCCGCCCC 8460
CGCCTCGCCC GCGATGTTTG GCCAGCAGCT GGCGTCCGAC GTGCAGCAGT ACCTGGAGCG 8520
CCTGGAGAAA CAGAGGCAAC AGAAGGTGGG CGTCGACGAG GCGTCGGCGG GCCTGACGCT 8580
CGGCGGCGAT GCGCTGCGCG TCCCTTTTTT GGATTTTGCC ACCGCGACGC CCAAGCGCCA 8640
CCAGACCGTG GTCCCGGGCG TCGGGACGCT CCACGACTGC TGCGAGCACT CGCCGCTCTT 8700
CTCGGCCGTC GCGCGGCGGT TGCTGTTTAA TAGCCTGGTG CCGGCGCAAC TCAGGGGGCG 8760
TGACTTTGGG GGCGACCACA CGGCCAAGCT GGAGTTCCTG GCCCCCGAGC TGGTGCGGGC 8820
GGTGGCGCGC CTGCGGTTTC GGGAGTGCGC GCCGGAGGAC GCCGTGCCCC AACGCAACGC 8880
CTACTACAGC GTCCTGAACA CGTTTCAGGC CCTGCACCGC TCCGAAGCCT TTCGGCAGTT 8940
GGTTCACTTC GTGCGGGACT TCGCCCAGTT GTTGAAAACC TCGTTCCGGG CCTCTAGTCT 9000
CGCGGAGACT ACGGGCCCCC CGAAGAAACG GGCCAAGGTG GACGTGGCCA CCCACGGGCA 9060
GACGTACGGC ACCTTGGAGC TCTTCCAGAA AATGATACTA ATGCACGCGA CCTACTTTCT 9120
GGCCGCCGTG CTGCTCGGGG ACCACGCGGA GCAGGTCAAC ACGTTCCTGC GGCTCGTGTT 9180
CGAGATCCCC CTGTTTAGCG ACACGGCCGT GCGGCACTTC CGCCAGCGCG CCACCGTGTT 9240
TCTAGTCCCC AGGCGCCACG GAAAGACCTG GTTTTTGGTG CCCCTCATCG CGCTGTCGCT 9300
CGCGTCCTTC CGGGGGATCA AGATAGGCTA CACGGCCCAC ATCCGCAAGG CGACCGAGCC 9360
CGTGTTTGAT GAGATCGACG CCTGCCTGCG GGGCTGGTTT GGCTCGTCCC GGGTGGACCA 9420
CGTCAAGGGG GAAACCATCT CGTTCTCGTT CCCGGACGGC TCGCGCAGCA CGATCGTGTT 9480
TGCCTCCAGC CACAACACGA ACGTAAGTAC GCCTTCCTCC CGCGGTGCCT GTTTCCCCGG 9540
TGCCGCCCTC CCCGAGATCG ACCGACAGAC AAACACAGCC AGACGCGAGT GTGGGACGAC 9600
ACGCCCGCAG CCCCCCCCGC CATGGCGGGG GGAAGCCTTA CTGTTTATTT GTAATCGGAC 9660
GATGAGGCTC TGGCCACGGC CCGCGCGACC GCGGGGCAGC TCGTTGCAAA CAGGCGGCTG 9720
GTATACGATG ACAGAACGCA GAGGCGCCAC CCGGCGCTGG TCGGGCGGAT GACGCTTTCC 9780
GCGCCGTCCC GGCCCACGAC GACCTCGTGC AGGTGGGCCG TGATGCGCGG GCGGCGGGTC 9840
GCCTGCCGCA GGATAACCGC GTCCACGGGG TGCCCGAAGA GGAGCTGACA CAGGCTCGCG 9900
TCCCCCCGGA CGGCCAGGGT GCGCTGGGCC ATATTGGACC ACATGCACGG GGCGACGCAG 9960
GGACAGGCCT CCGCCACGGC GGGGGCGCGC CACAGCGCGT TGGCGGAATC GATGTGGGCC 10020
GTCGGGGCGC AGGCGCCGCC TCCTCCCGGG GGGTCGGTAA TCCTGGATAG CAGCCATCCT 10080
AAATGGCGGG CCCGGCTGCC CGGGGGACAG AGCGACCCCA GGTCATCATC CATGGCCCAG 10140
CAGTATATGC GGCCGCCGGG GAGGTGCCAC CAGGCCCCCG GACCCAGGGC ACAGCACGCC 10200
CCCGGATTCG GGGGCGGTTC CGTGGGTACC AGGTAGGCGC CGTCGAGCTC GTGGGCCACG 10260
GGCTCGTCCG CGAGCTGTTC GGCGGCGGGG TCGGGGGTTT CCTCCGGGGG GGAGGCAGCT 10320
TCCAGGTGGC CGAAGGCTAG GGTGCACAGC AGCGGGGTCC GGGGGTGCGT TACGCTGCGG 10380
AGGTGGACGG TGGCGCAGTA GCGGCGCTCG CGGTTAAAGA AGAAAATGGC AAAGAACGTG 10440
TTCGAAGGCA GGCGCAGCGC CTTGGGCCGC GTCAGGTACA GGAAGATCTC GCAGAAAAGG 10500
GCACGCTCGG GGTCGGGGTC CGGAAGGGCC ACCTGGCACA GCGGCTCGGT GAGGACCGTG 10560
AGGCACCGAA AAATCTTAAG CCGCTCGTCC CCCCGAACGA CGCGCCACAC GAAGACAGAG 10620
TTGGCGATGC GCGCGACGAG GTCGGCTTCG GGCCCCGGGT CGGGGGCGCG CGCGTCGGGG 10680
GGGGCGCCCC GGTGACCCGG CGGGGCCGCG GCTCCCGGGG GGCCTGGCGT CGCCTGGGGA 10740
CGCCAGAGTG CCCGCTGTGC CAGGTTGGTG GTGGGGAAGG GACCGGAGAC GCACCAAAAG 10800
CAGAGGGGCC AGCGCGTGTA TGAGTTGGGG GGGGGGTGGG TGAGCGGTGG AACAAAAGCA 10860
CGCGTCAGCG GACAAGGCCG GGTCCCGTAG CCGCCCCGCG ACAGAACCGG AGTCCGACGG 10920
CACGCGCGAC GGGGTCTGCG AGGCTGAGGT ACGCCGCGGT GTTAATGGTA AACGCAAAGC 10980
CTCCCGGAAA GACCACTAGC CCGCAGAGGC GGCGATTGAA CCCAAGGCAG AGGTACGCGT 11040
AGCTCTCTCC CGGAAGGTAT TGCTCGCAGA CCCTGTGCGG GGCAGTGGAG GGGCTGCCCT 11100
CCATGAAGCG ACATTTACTC TGCTCGCGTC CATTGACGTC ACCGTCAATC ACCACTGCGA 11160
TTGGACGGTT GGTAAGGCGC AGCGTGTCTC CGCTGGTGCT GTAGTAGTCA AACGCGTAGT 11220
GGGCGTCGGA GTCGGCGAAG CGGGCGGGGA TGTCGTCGCT GAGAGGGACG AGCCGCCGCC 11280
GCCGCCCCCG ACCGCCCTGG CCGCCCAGAT GCGCCAGCAC GGCCAGGGCG TACGCGGTGT 11340
GAAAGAACGC GTCGGGGGCG GTCCCCTCGA GGGCGCGCAT CAGGTTCTCC AGGAGCACGG 11400
GGAAGCGCCG CGTCACCTCC CCTAGCCACT CGCTCTGGTG GGGGCCAAAG TCGTAGCGCA 11460
GGCGCTGGAA GATGCGCGGG CCGCCTTGGA GCGCGGCCCG GATAGAGTGG CCCAGGGCCC 11520
GCAGACACGC GATCTGGATG CGCGCGACGA AGGCCACCTC GGCCGCGATG TCAAAGGGCT 11580
GCAGCACGGG GCGCGGGTGG CGCAGGGGTC CCTCGAGCGC GGGAAAGCGA CGCAGCAGCG 11640
CCGTCTGGGC CGCGGGGGAC AGCTGGTGGG GGCGCACGAC GCGCTCGGCG GCACAGGCCT 11700
CCGTC 11706
(2) INFORMATION FOR SEQ ID NO: 156:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 63 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156:
Met Glu Ala Pro Gly Ile Val Trp Val Glu Glu Ser Val Ser Ala Ile 1 5 10 15 Thr Leu Tyr Ala Val Trp Leu Pro Pro Arg Thr Arg Asp Cys Leu His 20 25 30
Ala Leu Leu Tyr Leu Val Cys Arg Asp Ala Ala Gly Glu Ala Arg Ala
35 40 45
Arg Phe Ala Glu Val Ser Val Gly Ser Ser Xaa Xaa Xaa Xaa Xaa 50 55 60
(2) INFORMATION FOR SEQ ID NO: 157:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 857 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157:
Met Ala Glu Thr Met Asn Val Ala Thr Cys Thr His Gln Thr His His 1 5 10 15
Ala Ala Arg Ala Pro Gly Ala Thr Ser Ala Pro Gly Ala Ala Ser Gly
20 25 30
Asp Pro Leu Gly Ala Arg Arg Pro Ile Gly Asp Asp Glu Cys Glu Gln 35 40 45 Tyr Thr Ser Ser Val Ser Leu Ala Arg Met Leu Tyr Gly Gly Asp Leu 50 55 60
Ala Glu Trp Val Pro Arg Val His Pro Lys Thr Thr Ile Glu Arg Gln 65 70 75 80
Gln His Gly Pro Val Thr Phe Pro Asp Ala Ser Ala Pro Thr Ala Arg 85 90 95
Cys Val Thr Val Val Arg Ala Pro Met Gly Ser Gly Lys Thr Thr Ala
100 105 110
Leu Ile Arg Trp Leu Gly Glu Ala Ile His Ser Pro Asp Thr Ser Val 115 120 125 Leu Val Val Ser Cys Arg Arg Ser Phe Thr Gln Thr Leu Ala Thr Arg 130 135 140
Phe Ala Glu Ser Gly Leu Pro Asp Phe Val Thr Tyr Phe Ser Ser Thr 145 150 155 160 Asn Tyr Ile Met Asn Asp Arg Pro Phe His Arg Leu Ile Val Gln Val
165 170 175
Glu Ser Leu His Arg Val Gly Pro Asn Leu Leu Asn Asn Tyr Asp Val 180 185 190 Leu Val Leu Asp Glu Val Met Ser Thr Leu Gly Gln Lys Pro Thr Met 195 200 205
Gln Gln Leu Gly Arg Val Asp Ala Leu Met Leu Arg Leu Leu Arg Thr
210 215 220
Cys Pro Arg Ile Ile Ala Met Asp Ala Thr Ala Asn Ala Gln Leu Val 225 230 235 240
Asp Phe Leu Cys Ser Leu Arg Gly Glu Lys Asn Val His Val Val Ile
245 250 255
Gly Glu Tyr Ala Met Pro Gly Phe Ser Ala Arg Arg Cys Leu Phe Leu 260 265 270 Pro Arg Leu Gly Pro Glu Val Leu Gln Ala Ala Leu Arg Pro Pro Gly 275 280 285
Pro Ala Gly Gly Ala Pro Pro Pro Asp Ala Pro Pro Asp Ala Thr Phe
290 295 300
Phe Gly Glu Leu Glu Ala Arg Leu Ala Gly Gly Asp Asn Val Cys Ile 305 310 315 320
Phe Ser Ser Thr Val Ser Phe Ala Glu Val Val Ala Arg Phe Cys Arg
325 330 335
Gln Phe Thr Asp Arg Val Leu Leu Leu His Ser Leu Thr Pro Pro Gly 340 345 350 Asp Val Thr Thr Trp Gly Arg Tyr Arg Val Val Ile Tyr Thr Thr Val 355 360 365
Val Thr Val Gly Leu Ser Phe Asp Pro Pro His Phe Asp Ser Met Phe
370 375 380
Ala Tyr Val Lys Pro Met Asn Tyr Gly Pro Asp Met Val Ser Val Tyr 385 390 395 400
Gln Ser Leu Gly Arg Val Arg Thr Leu Arg Lys Gly Glu Leu Leu Ile
405 410 415
Tyr Met Asp Gly Ser Gly Ala Arg Ser Glu Pro Val Phe Thr Pro Met 420 425 430 Leu Leu Asn His Val Val Ser Ala Ser Gly Gln Trp Pro Ala Gln Phe 435 440 445
Ser Gln Val Thr Asn Leu Leu Cys Arg Arg Phe Lys Gly Arg Cys Asp
450 455 460
Ala Ser His Ala Asp Ala Ala Gln Arg Ser Arg Ile Tyr Ser Lys Phe 465 470 475 480
Arg Tyr Lys His Tyr Phe Glu Arg Cys Thr Leu Ala Cys Leu Ala Asp
485 490 495
Ser Leu Asn Ile Leu His Met Leu Leu Thr Leu Asn Cys Met His Val 500 505 510
Arg Phe Trp Gly His Asp Ala Ala Leu Thr Pro Arg Asn Phe Cys Leu
515 520 525
Phe Leu Arg Gly Ile His Phe Asp Ala Leu Arg Ala Gln Arg Asp Leu 530 535 540
Arg Glu Leu Arg Cys Gln Asp Pro Asp Thr Ser Leu Ser Ala Gln Ala
545 550 555 560
Ala Glu Thr Glu Glu Val Gly Leu Phe Val Glu Lys Tyr Leu Arg Pro
565 570 575 Asp Val Ala Pro Ala Glu Val Val Met Arg Gln Ser Leu Val Gly Arg
580 585 590
Thr Arg Phe Ile Tyr Leu Val Leu Leu Glu Ala Cys Leu Arg Val Pro
595 600 605
Met Ala Ala His Ser Ser Ala Ile Phe Arg Arg Leu Tyr Asp His Tyr 610 615 620
Ala Thr Gly Val Ile Pro Thr Ile Asn Ala Ala Gly Glu Leu Glu Leu
625 630 635 640
Val His Pro Thr Leu Asn Val Ala Pro Val Trp Glu Leu Phe Arg Leu
645 650 655 Cys Ser Thr Met Ala Ala Cys Leu Gln Trp Asp Ser Met Ala Gly Gly
660 665 670
Ser Gly Arg Thr Phe Ser Pro Glu Asp Val Leu Glu Leu Leu Asn Pro
675 680 685
His Tyr Asp Arg Tyr Met Gln Leu Val Phe Glu Leu Gly His Cys Asn 690 695 700
Val Thr Asp Gly Pro Leu Leu Ser Glu Asp Ala Val Lys Arg Val Ala
705 710 715 720
Asp Ala Leu Ser Gly Cys Pro Pro Arg Gly Ser Val Ser Glu Thr Glu
725 730 735 His Ala Leu Ser Leu Phe Lys Ile Ile Trp Gly Glu Leu Phe Gly Val
740 745 750
Gln Leu Ala Lys Ser Thr Gln Thr Phe Pro Gly Ala Gly Arg Val Lys
755 760 765
Asn Leu Thr Lys Arg Ala Ile Val Glu Leu Leu Asp Ala His Arg Ile 770 775 780
Asp His Ser Ala Cys Arg Thr Gln Leu Tyr Ala Leu Leu Met Ala His 785 790 795 800
Lys Arg Glu Phe Ala Gly Ala Arg Phe Lys Leu Arg Ala Pro Ala Trp 805 810 815 Gly Arg Cys Leu Arg Thr His Ala Ser Gly Ala Gln Pro Asn Thr Asp 820 825 830
Ile Ile Ala Ala Leu Ser Glu Leu Pro Thr Glu Ala Trp Pro Met Met 835 840 845 Gln Gly Ala Val Asn Phe Ser Thr Leu 850 855
(2) INFORMATION FOR SEQ ID NO: 158:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 470 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158:
Val Tyr Cys Ser His Ser Ser Ser Pro Met Gly Arg Arg Ala Pro Arg
1 5 10 15
Gly Ser Pro Glu Ala Ala Pro Gly Ala Asp Val Ala Pro Gly Ala Arg 20 25 30 Ala Ala Trp Trp Val Trp Cys Val Gln Val Ala Thr Phe Ile Val Ser 35 40 45
Ala Ile Cys Val Val Gly Leu Leu Val Leu Ala Ser Val Phe Arg Asp
50 55 60
Arg Phe Pro Cys Leu Tyr Ala Pro Ala Thr Ser Tyr Ala Glu Ala Asn 65 70 75 80
Ala Thr Val Glu Val Arg Gly Gly Val Ala Val Pro Leu Arg Leu Asp
85 90 95
Thr Gln Ser Leu Leu Ala Thr Tyr Ala Ile Thr Ser Thr Leu Leu Leu 100 105 110 Ala Ala Ala Val Tyr Ala Ala Val Gly Ala Val Thr Ser Arg Tyr Glu 115 120 125
Arg Ala Leu Asp Ala Ala Arg Arg Leu Ala Ala Ala Arg Met Ala Met
130 135 140
Pro His Ala Thr Leu Ile Ala Gly Asn Val Cys Ala Trp Leu Leu Gln 145 150 155 160
Ile Thr Val Leu Leu Leu Ala His Arg Ile Ser Gln Leu Ala His Leu
165 170 175
Ile Tyr Val Leu His Phe Ala Cys Leu Val Tyr Leu Ala Ala His Phe 180 185 190 Cys Thr Arg Gly Val Leu Ser Gly Thr Tyr Leu Arg Gln Val His Gly 195 200 205
Leu Ile Asp Pro Ala Pro Thr His His Arg Ile Val Gly Pro Val Arg 210 215 220 Ala Val Met Thr Asn Ala Leu Leu Leu Gly Thr Leu Leu Cys Thr Ala
225 230 235 240
Ala Ala Ala Val Ser Leu Asn Thr Ile Ala Ala Leu Asn Phe Asn Phe
245 250 255
Ser Ala Pro Ser Met Leu Ile Cys Leu Thr Thr Leu Phe Ala Leu Leu
260 265 270
Val Val Ser Leu Leu Leu Val Val Glu Gly Val Leu Cys His Tyr Val 275 280 285
Arg Val Leu Val Gly Pro His Leu Gly Ala Ile Ala Ala Thr Gly Ile
290 295 300
Val Gly Leu Ala Cys Glu His Tyr His Thr Gly Gly Tyr Tyr Val Val
305 310 315 320
Glu Gln Gln Trp Pro Gly Ala Gln Thr Gly Val Arg Val Val Ala Ala
325 330 335
Phe Ala Met Ala Val Leu Arg Cys Thr Arg Ala Tyr Leu Tyr His Arg
340 345 350
Arg His His Thr Lys Phe Phe Val Arg Met Arg Asp Thr Arg His Arg 355 360 365
Ala His Ser Ala Leu Arg Arg Val Arg Ser Ser Met Arg Gly Ser Arg
370 375 380
Arg Gly Gly Pro Pro Gly Asp Pro Gly Tyr Ala Glu Thr Pro Tyr Ala
385 390 395 400
Ser Val Ser His His Ala Glu Ile Asp Arg Tyr Gly Asp Ser Asp Gly
405 410 415
Asp Pro Ile Tyr Asp Glu Val Ala Pro Asp His Glu Ala Glu Leu Tyr
420 425 430
Ala Arg Val Gln Arg Pro Gly Pro Val Pro Asp Ala Glu Pro Ile Tyr 435 440 445
Asp Thr Val Glu Gly Tyr Ala Pro Arg Ser Ala Gly Glu Pro Val Tyr
450 455 460
Ser Thr Val Arg Arg Trp
465 470
(2) INFORMATION FOR SEQ ID NO: 159:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 96 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159:
Met Gly Leu Ala Phe Ser Gly Ala Arg Pro Cys Cys Cys Arg His Asn 1 5 10 15 Val Ile Ile Thr Asp Gly Gly Glu Val Val Ser Leu Thr Ala His Glu 20 25 30
Phe Asp Val Val Asp Ile Glu Ser Glu Glu Glu Gly Asn Phe Tyr Val
35 40 45
Pro Pro Asp Met Arg Val Val Thr Arg Ala Pro Gly Pro Gln Tyr Arg 50 55 60
Arg Ala Ser Asp Pro Pro Ser Arg His Thr Arg Arg Arg Asp Pro Asp 65 70 75 80
Val Ala Arg Pro Pro Ala Thr Leu Thr Pro Pro Leu Ser Asp Ser Glu 85 90 95
(2) INFORMATION FOR SEQ ID NO: 160:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 618 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160:
Met Ala Ala Ala Ala Thr Pro Gly Ala Lys Arg Pro Ala Asp Pro Ala
1 5 10 15
Arg Asp Pro Asp Ser Pro Pro Lys Arg Pro Arg Pro Asn Ser Leu Asp 20 25 30
Leu Ala Thr Val Phe Gly Pro Arg Pro Ala Pro Pro Arg Pro Thr Ser 35 40 45
Pro Gly Ala Pro Gly Ser His Trp Pro Gln Ser Pro Pro Arg Gly Gln 50 55 60
Pro Asp Gly Gly Ala Pro Gly Glu Lys Ala Arg Pro Asp Ala Leu Ser
65 70 75 80
Glu Ala Ser Ser Gly Pro Pro Thr Pro Asp Ile Pro Leu Ser Pro Gly 85 90 95
Gly Ala His Ala Ile Asp Pro Asp Cys Ser Pro Gly Pro Pro Asp Pro 100 105 110
Asp Pro Met Trp Ser Ala Ser Ala Ile Pro Asn Ala Leu Pro Pro His 115 120 125 Ile Leu Ala Glu Thr Phe Glu Arg His Leu Arg Gly Leu Leu Arg Gly
130 135 140
Val Arg Ser Pro Leu Ala Ile Gly Pro Leu Trp Ala Arg Leu Asp Tyr 145 150 155 160 Leu Cys Ser Leu Val Val Ser Leu Glu Ala Ala Gly Met Val Asp Arg
165 170 175
Gly Leu Gly Arg His Leu Trp Arg Leu Thr Arg Arg Ala Pro Pro Ser
180 185 190
Ala Ala Glu Ala Val Ala Pro Arg Pro Leu Met Gly Phe Tyr Glu Ala 195 200 205
Ala Thr Gln Asn Gln Ala Asp Cys Gln Leu Trp Ala Leu Leu Arg Arg
210 215 220
Gly Leu Thr Thr Ala Ser Thr Leu Arg Trp Gly Ala Gln Gly Pro Cys 225 230 235 240 Phe Ser Ser Gln Trp Leu Thr His Asn Ala Ser Leu Arg Leu Asp Ala
245 250 255
Gln Ser Ser Ala Val Met Phe Gly Arg Val Asn Glu Pro Thr Ala Arg
260 265 270
Asn Leu Leu Phe Arg Tyr Cys Val Gly Arg Ala Asp Ala Gly Val Asn 275 280 285
Asp Asp Ala Asp Ala Gly Arg Phe Val Phe His Gln Pro Gly Asp Leu
290 295 300
Ala Glu Glu Asn Val His Ala Cys Gly Val Leu Met Asp Gly His Thr 305 310 315 320 Gly Met Val Gly Ala Ser Leu Asp Ile Leu Val Cys Pro Arg Asp Pro
325 330 335
His Gly Tyr Leu Ala Pro Ala Pro Gln Thr Pro Leu Ala Phe Tyr Glu
340 345 350
Val Lys Cys Arg Ala Lys Tyr Ala Phe Asp Pro Ala Asp Pro Gly Ala 355 360 365
Pro Ala Ala Ser Ala Tyr Glu Asp Leu Met Ala Arg Arg Ser Pro Glu
370 375 380
Ala Phe Arg Ala Phe Ile Arg Ser Ile Pro Asn Pro Gly Val Arg Tyr 385 390 395 400 Phe Ala Pro Gly Arg Val Pro Gly Pro Glu Glu Ala Leu Val Thr Gln
405 410 415
Asp Arg Asp Trp Leu Asp Ser Arg Ala Ala Gly Glu Lys Arg Arg Cys
420 425 430
Ser Ala Pro Asp Arg Ala Leu Val Glu Leu Asn Ser Gly Val Val Ser 435 440 445
Glu Val Leu Leu Phe Gly Val Pro Asp Leu Glu Arg Arg Thr Ile Ser
450 455 460
Pro Val Ala Trp Ser Ser Gly Glu Leu Val Arg Arg Glu Pro Ile Phe 465 470 475 480
Ala Asn Pro Arg His Pro Asn Phe Lys Gln Ile Leu Val Gln Gly Tyr
485 490 495
Val Leu Asp Ser His Phe Pro Asp Cys Pro Leu Gln Pro His Leu Val 500 505 510
Thr Phe Leu Gly Arg His Arg Ala Gly Ala Glu Glu Gly Val Thr Phe
515 520 525
Arg Leu Glu Asp Gly Arg Gly Ala Pro Ala Gly Arg Gly Gly Ala Pro
530 535 540 Gly Pro Ala Lys Ala Ser Ile Leu Pro Asp Gln Ala Val Pro Ile Ala
545 550 555 560
Leu Ile Ile Thr Pro Val Arg Val Glu Pro Gly Ile Tyr Arg Asp Ile
565 570 575
Arg Arg Asn Ser Arg Leu Ala Phe Asp Asp Thr Leu Ala Lys Leu Trp 580 585 590
Ala Ser Arg Ser Pro Gly Arg Gly Pro Ala Ala Ala Asp Thr Thr Ser
595 600 605
Ser Ser Pro Thr Ala Gly Arg Ser Ser Arg 610 615
(2) INFORMATION FOR SEQ ID NO: 161:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 525 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161:
Val Gly Gly Arg Arg Pro Gly Gly Arg Met Asp Glu Ser Gly Arg Gln 1 5 10 15 Arg Pro Ala Ser His Val Ala Ala Asp Ile Ser Pro Gln Gly Ala His 20 25 30
Arg Arg Ser Phe Lys Ala Trp Leu Ala Ser Tyr Ile His Ser Leu Ser
35 40 45
Arg Arg Ala Ser Gly Arg Pro Ser Gly Pro Ser Pro Arg Asp Gly Ala 50 55 60
Val Ser Gly Ala Arg Pro Gly Ser Arg Arg Arg Ser Ser Phe Arg Glu 65 70 75 80
Arg Leu Arg Ala Gly Leu Ser Arg Trp Arg Val Ser Arg Ser Ser Arg 85 90 95 Arg Arg Ser Ser Pro Glu Ala Pro Gly Pro Ala Ala Lys Leu Arg Arg 100 105 110
Pro Pro Leu Arg Arg Ser Glu Thr Ala Met Thr Ser Pro Pro Ser Pro 115 120 125
Pro Ser His Ile Leu Ser Leu Ala Arg Ile His Lys Leu Cys Ile Pro
130 135 140
Val Phe Ala Val Asn Pro Ala Leu Arg Tyr Thr Thr Leu Glu Ile Pro 145 150 155 160 Gly Ala Arg Ser Phe Gly Gly Ser Gly Gly Tyr Gly Glu Val Gln Leu
165 170 175 Ile Arg Glu His Lys Leu Ala Val Lys Thr Ile Arg Glu Lys Glu Trp 180 185 190
Phe Ala Val Glu Leu Val Ala Thr Leu Leu Val Gly Glu Cys Ala Leu 195 200 205
Arg Gly Gly Arg Thr His Asp Ile Arg Gly Phe Ile Thr Pro Leu Gly
210 215 220
Phe Ser Leu Gln Gln Arg Gln Ile Val Phe Pro Ala Tyr Asp Met Asp 225 230 235 240 Leu Gly Lys Tyr Ile Gly Gln Leu Ala Ser Leu Arg Ala Thr Thr Pro
245 250 255 Ser Val Ala Thr Ala Leu His His Cys Phe Thr Asp Leu Ala Arg Ala 260 265 270
Val Val Phe Leu Asn Thr Arg Cys Gly Ile Ser His Leu Asp Ile Lys 275 280 285
Cys Ala Asn Val Leu Val Met Leu Arg Ser Asp Ala Val Ser Leu Arg
290 295 300
Arg Ala Val Leu Ala Asp Phe Ser Leu Val Thr Leu Asn Ser Asn Ser 305 310 315 320 Thr Ile Ser Arg Gly Gln Phe Cys Leu Gln Glu Pro Asp Leu Glu Ser
325 330 335 Pro Arg Gly Phe Gly Met Pro Ala Ala Leu Thr Thr Ala Asn Phe His 340 345 350
Thr Leu Val Gly His Gly Tyr Asn Gln Pro Pro Glu Leu Leu Val Lys 355 360 365
Tyr Leu Asn Asn Glu Arg Ala Glu Phe Asn Asn Arg Pro Leu Lys His
370 375 380
Asp Val Gly Leu Ala Val Asp Leu Tyr Ala Leu Gly Gln Thr Leu Leu 385 390 395 400 Glu Leu Leu Val Ser Val Tyr Val Ala Pro Ser Leu Gly Val Pro Val
405 410 415 Thr Arg Val Pro Gly Tyr Gln Tyr Phe Asn Asn Gln Leu Ser Pro Asp 420 425 430 Phe Ala Val Leu Ala Tyr Arg Cys Val Leu His Pro Ala Leu Phe Val
435 440 445
Asn Ser Ala Glu Thr Asn Thr His Gly Leu Ala Tyr Asp Val Pro Glu
450 455 460 Gly Ile Arg Arg His Leu Arg Asn Pro Lys Ile Arg Arg Ala Phe Thr
465 470 475 480
Glu Gln Cys Ile Asn Tyr Gln Arg Thr His Lys Ala Val Leu Ser Ser
485 490 495
Val Ser Leu Pro Pro Glu Leu Arg Pro Leu Leu Val Leu Val Ser Arg 500 505 510
Leu Cys His Ala Asn Pro Ala Ala Arg His Ser Leu Ser 515 520 525
(2) INFORMATION FOR SEQ ID NO: 162:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 217 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:
Met Ser Arg Asp Ala Ser His Ala Ala Leu Arg Arg Arg Leu Ala Glu
1 5 10 15
Thr His Leu Arg Ala Glu Val Tyr Arg Asp Gln Thr Leu Gln Leu His 20 25 30 Arg Glu Gly Val Ser Thr Gln Asp Pro Arg Phe Val Gly Ala Phe Met 35 40 45
Ala Ala Lys Ala Ala His Leu Glu Leu Glu Ala Arg Leu Lys Ser Arg
50 55 60
Ala Arg Leu Glu Met Met Arg Gln Arg Ala Thr Cys Val Lys Ile Arg 65 70 75 80
Val Glu Glu Gln Ala Ala Arg Arg Asp Phe Leu Thr Ala His Arg Arg
85 90 95
Tyr Leu Asp Pro Ala Leu Ser Leu Asp Ala Ala Asp Asp Arg Leu Ala 100 105 110 Asp Gln Glu Glu Gln Leu Glu Glu Ala Ala Ala Asn Ala Ser Leu Trp 115 120 125
Gly Asp Gly Asp Leu Ala Asp Gly Trp Met Ser Pro Gly Asp Ser Asp 130 135 140 Leu Leu Val Met Trp Gln Leu Thr Ser Ala Pro Lys Val His Thr Asp
145 150 155 160
Ala Pro Ser Arg Pro Gly Ser Arg Pro Thr Tyr Thr Pro Ser Ala Ala
165 170 175 Gly Arg Pro Asp Ala Gln Ala Ala Pro Pro Pro Glu Thr Ala Pro Ser
180 185 190
Pro Glu Pro Ala Pro Gly Pro Ala Ala Asp Pro Ala Ser Gly Ser Gly
195 200 205
Phe Ala Arg Asp Cys Pro Asp Gly Glu 210 215
(2) INFORMATION FOR SEQ ID NO: 163:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 239 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163:
Pro Ala Asp Leu Glu Pro Leu Gly Asp Pro Thr Leu Trp Arg Ala Leu 1 5 10 15
Tyr Ala Cys Val Leu Ala Ala Leu Glu Arg Gln Thr Gly Pro Val Phe
20 25 30
Val Pro Leu Arg Leu Gly Trp Asp Pro Gln Thr Gly Leu Val Val Arg 35 40 45 Val Glu Arg Ala Ser Trp Gly Pro Pro Ala Ala Pro Arg Ala Ala Leu 50 55 60
Leu Asp Val Glu Ala Lys Val Asp Val Asp Pro Leu Ala Ala Arg Val 65 70 75 80
Ala Glu His Pro Gly Ala Arg Leu Ala Trp Ala Arg Leu Ala Ala Ile 85 90 95
Arg Asp Ser Pro Gln Cys Ala Ser Ser Ala Ser Leu Ala Val Thr Ile
100 105 110
Thr Thr Arg Thr Ala Arg Phe Ala Arg Glu Tyr Thr Thr Leu Ala Phe 115 120 125 Pro Pro Thr Ser Lys Glu Gly Ala Phe Ala Asp Leu Val Glu Val Cys 130 135 140
Glu Val Gly Leu Arg Pro Arg Gly His Pro Gln Arg Val Thr Ala Arg 145 150 155 160 Val Leu Leu Pro Arg Gly Tyr Asp Tyr Phe Val Ser Ala Gly Asp Gly
165 170 175
Phe Ser Ala Pro Ala Leu Val Phe Arg Gln Trp His Thr Thr Val His 180 185 190 Ala Ala Pro Gly Ala Pro Val Phe Ala Phe Leu Gly Ala Gly Phe Asp 195 200 205
Val Arg Gly Gly Pro Val Gln Tyr Phe Ala Val Leu Gly Phe Pro Gly
210 215 220
Trp Pro Thr Phe Thr Val Pro Ala Ala Ala Xaa Xaa Xaa Xaa Xaa 225 230 235
(2) INFORMATION FOR SEQ ID NO: 164:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 315 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:
Val Trp Arg Val Val Arg Gly Asp Glu Arg Leu Lys Ile Phe Arg Cys 1 5 10 15
Leu Thr Val Leu Thr Glu Pro Leu Cys Gln Val Pro Asp Pro Asp Pro
20 25 30
Glu Arg Ala Leu Phe Cys Glu Ile Phe Leu Tyr Leu Trp Lys Ala Leu 35 40 45 Arg Leu Pro Ser Asn Thr Phe Phe Ala Ile Phe Phe Phe Asn Arg Glu 50 55 60
Arg Arg Tyr Cys Ala Thr Val His Leu Arg Ser Val Thr His Pro Arg 65 70 75 80
Thr Pro Leu Leu Cys Thr Leu Ala Phe Gly His Leu Glu Ala Asp Pro 85 90 95
Glu Glu Thr Pro Asp Pro Ala Ala Glu Gln Leu Ala Asp Glu Pro Val
100 105 110
Ala His Glu Leu Asp Gly Ala Tyr Leu Val Pro Thr Glu Pro Pro Pro 115 120 125 Asn Pro Gly Ala Cys Cys Ala Leu Gly Pro Gly Ala Trp Trp His Leu 130 135 140
Pro Gly Gly Arg Ile Tyr Cys Trp Ala Met Asp Asp Asp Leu Gly Ser 145 150 155 160 Leu Cys Pro Pro Gly Ser Arg Ala Arg His Leu Gly Trp Leu Leu Ser
165 170 175
Arg Ile Thr Asp Pro Pro Gly Gly Gly Gly Ala Cys Ala Pro Thr Ala 180 185 190 His Ile Asp Ser Ala Asn Ala Leu Trp Arg Ala Pro Ala Val Ala Glu 195 200 205
Ala Cys Pro Cys Val Ala Pro Cys Met Trp Ser Asn Met Ala Gln Arg
210 215 220
Thr Leu Ala Val Arg Gly Asp Ala Ser Leu Cys Gln Leu Leu Phe Gly 225 230 235 240
His Pro Val Asp Ala Val Ile Leu Arg Gln Ala Thr Arg Arg Pro Arg
245 250 255
Ile Thr Ala His Leu His Glu Val Val Val Gly Arg Asp Gly Ala Glu 260 265 270 Ser Val Ile Arg Pro Thr Ser Ala Gly Trp Arg Leu Cys Val Leu Ser 275 280 285
Ser Tyr Thr Ser Arg Leu Phe Ala Thr Ser Cys Pro Ala Val Ala Arg
290 295 300
Ala Val Ala Arg Ala Ser Ser Ser Asp Tyr Lys 305 310 315
(2) INFORMATION FOR SEQ ID NO: 165:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 278 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165:
Leu Thr Glu Ala Cys Ala Ala Glu Arg Val Val Arg Pro His Gln Leu 1 5 10 15
Ser Pro Ala Ala Gln Thr Ala Leu Leu Arg Arg Phe Pro Ala Leu Glu
20 25 30
Gly Pro Leu Arg His Pro Arg Pro Val Leu Gln Pro Phe Asp Ile Ala 35 40 45 Ala Glu Val Ala Phe Val Ala Arg Ile Gln Ile Ala Cys Leu Arg Ala 50 55 60
Leu Gly His Ser Ile Arg Ala Ala Leu Gln Gly Gly Pro Arg Ile Phe 65 70 75 80 Gln Arg Leu Arg Tyr Asp Phe Gly Pro His Gln Ser Glu Trp Leu Gly
85 90 95
Glu Val Thr Arg Arg Phe Pro Val Leu Leu Glu Asn Leu Met Arg Ala 100 105 110 Leu Glu Gly Thr Ala Pro Asp Ala Phe Phe His Thr Ala Tyr Ala Val 115 120 125
Leu Ala His Leu Gly Gly Gln Gly Gly Arg Gly Arg Arg Arg Arg Leu
130 135 140
Val Pro Leu Ser Asp Asp Ile Pro Ala Arg Phe Ala Asp Ser Asp Ala 145 150 155 160
His Tyr Ala Phe Asp Tyr Tyr Ser Thr Ser Gly Asp Thr Leu Arg Leu
165 170 175
Thr Asn Arg Pro Ile Ala Val Val Ile Asp Gly Asp Val Asn Gly Arg 180 185 190 Glu Gln Ser Lys Cys Arg Phe Met Glu Gly Ser Pro Ser Thr Ala Pro 195 200 205
His Arg Val Cys Glu Gln Tyr Leu Pro Gly Glu Ser Tyr Ala Tyr Leu
210 215 220
Cys Leu Gly Phe Asn Arg Arg Leu Cys Gly Leu Val Val Phe Pro Gly 225 230 235 240
Gly Phe Ala Phe Thr Ile Asn Thr Ala Ala Tyr Leu Ser Leu Ala Asp
245 250 255
Pro Val Ala Arg Ala Val Gly Leu Arg Phe Cys Arg Gly Ala Ala Thr 260 265 270 Gly Pro Gly Leu Val Arg 275
(2) INFORMATION FOR SEQ ID NO: 166:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 731 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166:
GCGGCGGCCG GCACGGTAAA CGTGGGCCAG CCCGGAAATC CCAGCACGGC AAAGTATTGG 60
ACGGGCCCTC CCCGGACGTC AAACCCGGCC CCCAGAAAAG CGAAGACGGG GGCCAGGGCT 120
CCGGGGGCGG CGTGGACCGT GGTATGCCAC TGCCGGAAGA GGGCGACCAG CGCCGGGGCG 180
GAGAACCCGT CGCCGGCGCT CACGAAGTAG TCGTAGCCGC GCGGCAGCAG CACCCGCGCC 240
GTGACCCGCT GCGGGTGTCC GCGGGGCCGC AGGCCGACCT CGCACACCTC GACCAGGTCC 300
GCGAAGGCGC CCTCCTTGCT GGTCGGCGGA AACGCCAGGG TGGTGTATTC GCGCGCGAAA 360
CGCGCGGTCC TCGTCGTGAT GGTGACGGCG AGCGAGGCGG AGGACGCGCA CTGGGGGCTG 420
TCGCGAATGG CGGCCAGGCG CGCCCACGCC AACCGCGCGC CGGGGTGCTC GGCGACGCGC 480
GCGGCCAGGG CCAGCGGGTC GACGTCGACC TTGGCCTCCA CGTCCAGGAG GGCGGCGCGA 540
GGAGCGGCCG GCGGGCCCCA CGACGCCCTT TCGACCCTCA CGACCAGACC CGTCTGCGGG 600
TCCCAGCCCA GGCGCAGCGG GACGAAGAGG GCCACCGGCC CCGTCTGGCG CTCCAGGGCC 660
GCCAGAACGC ACGCATACAG CGCCCGCCAC AGGGTCGGGT CCCCCAGGGG CTCCAGCGGG 720
GAGGCGGCCG 731
(2) INFORMATION FOR SEQ ID NO: 167:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 239 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167:
Pro Ala Asp Leu Glu Pro Leu Gly Asp Pro Thr Leu Trp Arg Ala Leu
1 5 10 15
Tyr Ala Cys Val Leu Ala Ala Leu Glu Arg Gln Thr Gly Pro Val Phe
20 25 30 Val Pro Leu Arg Leu Gly Trp Asp Pro Gln Thr Gly Leu Val Val Arg
35 40 45
Val Glu Arg Ala Ser Trp Gly Pro Pro Ala Ala Pro Arg Ala Ala Leu
50 55 60 Leu Asp Val Glu Ala Lys Val Asp Val Asp Pro Leu Ala Ala Arg Val 65 70 75 80
Ala Glu His Pro Gly Ala Arg Leu Ala Trp Ala Arg Leu Ala Ala Ile
85 90 95 Arg Asp Ser Pro Gln Cys Ala Ser Ser Ala Ser Leu Ala Val Thr Ile 100 105 110 Thr Thr Arg Thr Ala Arg Phe Ala Arg Glu Tyr Thr Thr Leu Ala Phe 115 120 125 Pro Pro Thr Ser Lys Glu Gly Ala Phe Ala Asp Leu Val Glu Val Cys
130 135 140 Glu Val Gly Leu Arg Pro Arg Gly His Pro Gln Arg Val Thr Ala Arg 145 150 155 160
Val Leu Leu Pro Arg Gly Tyr Asp Tyr Phe Val Ser Ala Gly Asp Gly
165 170 175
Phe Ser Ala Pro Ala Leu Val Phe Arg Gln Trp His Thr Thr Val His 180 185 190
Ala Ala Pro Gly Ala Pro Val Phe Ala Phe Leu Gly Ala Gly Phe Asp
195 200 205
Val Arg Gly Gly Pro Val Gln Tyr Phe Ala Val Leu Gly Phe Pro Gly
210 215 220
Trp Pro Thr Phe Thr Val Pro Ala Ala Ala Xaa Xaa Xaa Xaa Xaa 225 230 235
(2) INFORMATION FOR SEQ ID NO: 168:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3005 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168:
GCGCTGGAGC GGGAGCAGCG CGCGGCGGAC CGGGCGGCCG GGGGAGGCGC GGGCCGCCCG 60
GCGGAGGCGG ATCTTCTCCG GGCCGACTAC GACATTATCG ACGTCAGCAA GTCCATGGAC 120
GACGACACGT ACGTGGCCAA CAGTTTCCAG CACCAGTACA TCCCCGCGTA CGGCCAGGAC 180
CTCGAGCGCC TGTCGCGCCT CTGGGAGCAC GAGCTGGTGC GCTGCTTCAA GATTCTGCGC 240
CACCGCAACA ACCAGGGCCA GGAAACGTCG ATCTCGTACT CTAGCGGGGC GATCGCCTCC 300
TTCGTGGCCC CGTATTTCGA GTACGTGCTT CGCGCCCCCC GAGCGGGCGC GCTCATCACC 360
GGCTCCGATG TCATCCTAGG GGAGGAGGAG TTATGGGAGG CGGTCTTTAA GAAAACCCGC 420
CTGCAGACGT ACCTGACAGA CGTCGCGGCC CTGTTCGTGG CGGACGTACA GCACGCGGCT 480
CTGCCCCGGC CCCCCTCCCC AACCCCCGCC GATTTCCGGG CGAGCGCGTC CCCGCGGGGC 540
GGGTCCCGGT CCCGGACCCG GACCCGATCC CGGTCGCCCG GGAGAACGCC GAGGGGTGCG 600
CCGGACCAGG GCTGGGGCGT CGAACGCAGG GATGGCCGAC CCCACGCCCG CCGATGAGGG 660
AACGGCCGCC GCCATCCTCA AACAGGCCAT CGCCGGGGAC CGCAGTCTGG TCGAGGTGGC 720
GGAGGGGATC AGCAACCAGG CGCTGCTGCG CATGGCCTGC GAGGTGCGCC AGGTCAGCGA 780
TCGCCAGCCG CGGTTTACCG CGACCAGCGT CCTGCGCGTT GACGTCACCC CCAGGGGGCG 840
GTTGCGGTTC GTTCTGGACG GGAGTTCCGA CGACGCGTAC GTGGCGTCGG AGGATTACTT 900
TAAGCGCTGC GGGGACCAGC CGACGTATCG CGGTTTTGCG GTCGTCGTCC TCACGGCCAA 960
CGAGGACCAC GTGCACAGCC TGGCCGTGCC CCCCCTCGTT CTGCTGCACC GGCTCTCCTT 1020
GTTTCGCCCC ACGGACCTCC GGGACTTCGA GCTCGTCTGC CTGCTGATGT ACCTGGAGAA 1080
CTGTCCCCGG AGCCACGCCA CGCCCTCGCT GTTCGTCAAG GTGTCGGCGT GGTTGGGGGT 1140
CGTGGCCCGC CACGCGTCTC CCTTCGAGCG CGTCCGCTGC CTTCTCCTCC GCAGCTGCCA 1200
CTGGATCCTG AACACGCTAA TGTGCATGGC GGGCGTGAAG CCCTTCGACG ACGAGCTAGT 1260
CCTGCCCCAC TGGTACATGG CCCACTACCT GCTGGCCAAC AATCCGCCCC CCGTCCTCTC 1320
GGCCCTGTTT TGCGCCACCC CGCAGAGCTC TGCGTTGCAG TTGCCCGGGC CCGTCCCCCG 1380
CACGGACTGT GTGGCCTATA ACCCGGCCGG CGTCATGGGA AGCTGCTGGA AATCCAAGGA 1440
CCTGCGTTCG GCTCTGGTGT ATTGGTGGCT TTCGGGGAGC CCCAAACGAC GGACCTCGTC 1500
GCTTTTCTAT CGGTTTTGCT AACTCCGGAA AATAAACGTG TTTTTTATGG AACGTTCCCT 1560
ACCTGTCGTG TCATCTCTCG GGGGATGGTG GTGGGCCTGT GTGTGTGTCT TGTGCACCGA 1620
AGGAGGAAAG TGGGGGGGTG GTGGTGCTGG TGGTGGAAAG ACATGATAGA GGGAACAAAG 1680
AAATAGAAGA AAACCACAAC CGGCGCGTGT CAGTAAATAC GGACGCGCGC ACACGCGGGG 1740
GTAAGTTGGA GCACGGGGCC CCAGTTTATT GACCAAATTC AGGGAAACAG AAACCGCATC 1800
TTTTCCTCGA AAGGGTACAC AAAGCTCCCG CCCTCGCCCC ACACGCCTTC CAGAACCCCC 1860
GTAAACACCA GTTGAATCTC GCGCAGGATC TCGCGCAGGT GATGGGCGCA GTCCACGGGG 1920
GGGAGCACCA AGGGCCGCGG GTACAGATCC ACGGGGACGC CGACCGACTC CCCGCCCCCG 1980
GGACATACGC GCACGACGCG TCTCCAGTAT TGCTCCGCGT CCAGCAGGGC GCCTCCGCGG 2040
AAGGCCGTTT GGGGCAGGGG GTCGTCGGCC TCGCCCGGGG GGGTCAGAAC GCTCCAGTAC 2100
TCCGCGTCCA GACGCCTCCC GAAGGCATCC AGGACAAAGC GGTCACAGGC GTCCTCCATG 2160
ATGCCCCGGG CCGCGCACAC GGCCTCCTCC GGCGGGCCGG CGGCCGGCCG CCGGAGGATT 2220
CGTCTCAGCG CGTCGCGCAT AACCTCGGCC GCCGCGGCGT ACGCGGGCCC GCGGAGAGGA 2280
AATCCCTGCA GGAAGTCGGT GTCATCGCGG GAGTTCCAGA ACCACGCCCC GGTCTGGCTC 2340
CAGGTGACGA CGTGGGTGTA GACGCCCTCT AGCGCCAGGG AGGGGGCGAG GCGCGGGCGT 2400
ATGCCGTTGG CCGAAAGTAC GGCGCGCACG GACGCCTCGA GGGCCCGGCG GGCGTCCTGG 2460
ATCGCGCCGT GCGCGGCGTC CGCGTCCCCG GGGTCCACGT TGAACAGCCC CCAGAACGCA 2520
GCCCCGGTGC CGCCGCAGAC CGCAAACTTC ACCGAGCTGG CCGTCTGCTC GATCTGCAGG 2580
CAGACGGCGG CCATGACCCC GCCGAGCAGC TGCCGGAGCG CGGGGCAGGC GTCGCACGCG 2640
TCCGGCACCA GGCGCTCCAG CACGGCCCGG GCCCAGGGCT CCGAGGGGGC GGCCGCCACC 2700
AGCGCGTCCA GCCTTTCCAG GCCCGCCCGC CCCCGGGCTT CCGGCAGCCC GGCCTCCCCG 2760
AGGCCCGCGA GGGCGGCCAG GAGCTGGGCC TGGAGCCCGG AGAAACAAAA CCGCGCCGTC 2820
CAGACCGGCC CGACGGCCGC CGGGGGGTCG AGTAGTTGGA TGGTGGTGGC CGTGGGGTGC 2880
CACCGCGCCA CCGCTTCCCG AAAGGCGGGC AGGAGGCGGC CGGCCGCCTC CGAGGCCACG 2940
GCCGGCCATG CCCGCGGGGG CAGGACGACC CTGGCGCCCA CCGCGGGCCA GGCCCCCAGG 3000
CACG 3005
(2) INFORMATION FOR SEQ ID NO: 169:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 221 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169:
Xaa Xaa Xaa Xaa Xaa Ala Leu Glu Arg Glu Gln Arg Ala Ala Asp Arg
1 5 10 15
Ala Ala Gly Gly Gly Ala Gly Arg Pro Ala Glu Ala Asp Leu Leu Arg
20 25 30
Ala Asp Tyr Asp Ile Ile Asp Val Ser Lys Ser Met Asp Asp Asp Thr
35 40 45
Tyr Val Ala Asn Ser Phe Gln His Gln Tyr Ile Pro Ala Tyr Gly Gln
50 55 60
Asp Leu Glu Arg Leu Ser Arg Leu Trp Glu His Glu Leu Val Arg Cys
65 70 75 80
Phe Lys Ile Leu Arg His Arg Asn Asn Gln Gly Gln Glu Thr Ser Ile
85 90 95
Ser Tyr Ser Ser Gly Ala Ile Ala Ser Phe Val Ala Pro Tyr Phe Glu
100 105 110
Tyr Val Leu Arg Ala Pro Arg Ala Gly Ala Leu Ile Thr Gly Ser Asp
115 120 125
Val Ile Leu Gly Glu Glu Glu Leu Trp Glu Ala Val Phe Lys Lys Thr
130 135 140
Arg Leu Gln Thr Tyr Leu Thr Asp Val Ala Ala Leu Phe Val Ala Asp
145 150 155 160
Val Gln His Ala Ala Leu Pro Arg Pro Pro Ser Pro Thr Pro Ala Asp
165 170 175
Phe Arg Ala Ser Asp Arg Gly Gly Ser Arg Ser Arg Thr Arg Thr Arg
180 185 190
Ser Arg Ser Pro Gly Arg Thr Pro Arg Gly Ala Pro Asp Gln Gly Trp
195 200 205
Gly Val Glu Arg Arg Asp Gly Arg Pro His Ala Arg Arg
210 215 220
(2) INFORMATION FOR SEQ ID NO: 170:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 302 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:
Val Arg Arg Thr Arg Ala Gly Asn Ala Gly Met Ala Asp Pro Thr Pro
1 5 10 15
Ala Asp Glu Gly Thr Ala Ala Ala Ile Leu Lys Gln Ala Ile Ala Gly
20 25 30
Asp Arg Ser Leu Val Glu Val Ala Glu Gly Ile Ser Asn Gln Ala Leu
35 40 45
Leu Arg Met Ala Cys Glu Val Arg Gln Val Ser Asp Arg Gln Pro Arg
50 55 60
Phe Thr Ala Thr Ser Val Leu Arg Val Asp Val Thr Pro Arg Gly Arg
65 70 75 80
Leu Arg Phe Val Leu Asp Gly Ser Ser Asp Asp Ala Tyr Val Ala Ser
85 90 95
Glu Asp Tyr Phe Lys Arg Cys Gly Asp Gln Pro Tyr Gly Phe Ala Val
100 105 110
Val Val Leu Thr Ala Asn Glu Asp His Val His Ser Leu Ala Val Pro
115 120 125
Pro Leu Val Leu Leu His Arg Leu Ser Leu Phe Arg Pro Thr Asp Leu
130 135 140
Arg Asp Phe Glu Leu Val Cys Leu Leu Met Tyr Leu Glu Asn Cys Pro
145 150 155 160
Arg Ser His Ala Thr Pro Ser Leu Phe Val Lys Val Ser Ala Trp Leu
165 170 175
Gly Val Val Ala Arg His Asp Phe Glu Arg Val Arg Cys Leu Leu Leu
180 185 190
Arg Ser Cys His Trp Ile Leu Asn Thr Leu Met Cys Met Ala Gly Val
195 200 205
Lys Pro Phe Asp Asp Glu Leu Val Leu Pro His Trp Tyr Met Ala His
210 215 220
Tyr Leu Leu Ala Asn Asn Pro Pro Pro Val Leu Ser Ala Leu Phe Cys
225 230 235 240
Ala Thr Pro Gln Ser Ser Ala Leu Gln Leu Pro Gly Pro Val Pro Arg
245 250 255
Thr Asp Cys Val Ala Tyr Asn Pro Ala Gly Val Met Gly Ser Cys Trp
260 265 270
Lys Ser Lys Asp Leu Arg Ser Ala Leu Val Tyr Trp Trp Leu Ser Gly
275 280 285
Ser Pro Lys Arg Arg Thr Ser Ser Leu Phe Tyr Arg Phe Cys
290 295 300
(2) INFORMATION FOR SEQ ID NO: 171:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 402 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171:
Ala Cys Leu Gly Ala Trp Pro Ala Val Gly Ala Arg Val Val Leu Pro
1 5 10 15
Pro Arg Ala Trp Pro Ala Val Ala Ser Glu Ala Ala Gly Arg Leu Leu
20 25 30
Pro Ala Phe Arg Glu Ala Val Ala Arg Trp His Pro Thr Ala Thr Thr
35 40 45
Ile Gln Leu Leu Asp Pro Pro Ala Ala Val Gly Pro Val Trp Thr Ala
50 55 60
Arg Phe Cys Phe Ser Gly Leu Gln Ala Gln Leu Leu Ala Ala Gly Leu
65 70 75 80
Gly Glu Ala Gly Leu Pro Glu Arg Arg Ala Gly Leu Glu Arg Leu Asp
85 90 95
Ala Leu Val Ala Ala Ala Pro Ser Glu Pro Trp Ala Arg Ala Val Leu
100 105 110
Glu Arg Leu Val Pro Asp Ala Cys Asp Ala Cys Pro Ala Leu Arg Gln
115 120 125
Leu Leu Gly Gly Val Met Ala Ala Val Cys Leu Gln Ile Glu Gln Thr
130 135 140
Ala Ser Ser Val Lys Phe Ala Val Cys Gly Gly Thr Gly Ala Ala Phe
145 150 155 160
Trp Gly Leu Phe Asn Val Asp Pro Gly Asp Ala Asp Ala Ala His Gly
165 170 175
Ala Ile Gln Asp Ala Arg Arg Ala Leu Glu Ala Ser Val Arg Ala Val
180 185 190
Leu Ser Ala Asn Gly Ile Arg Pro Arg Leu Ala Pro Ser Leu Ala Leu
195 200 205
Glu Gly Val Tyr Thr His Val Val Thr Trp Ser Gln Thr Gly Ala Trp
210 215 220
Phe Trp Asn Ser Arg Asp Asp Thr Asp Phe Leu Gln Gly Phe Pro Leu
225 230 235 240
Arg Gly Pro Ala Tyr Ala Ala Ala Ala Glu Val Met Arg Asp Ala Leu
245 250 255
Arg Arg Ile Leu Arg Arg Pro Ala Ala Gly Pro Pro Glu Glu Ala Val
260 265 270
Cys Ala Arg Ile Met Glu Asp Ala Cys Asp Arg Phe Val Leu Asp Ala
275 280 285
Phe Gly Arg Arg Leu Asp Ala Glu Tyr Trp Ser Val Leu Thr Pro Pro
290 295 300
Gly Glu Ala Asp Asp Pro Leu Pro Gln Thr Ala Phe Arg Gly Gly Ala
305 310 315 320
Leu Leu Asp Ala Glu Gln Tyr Trp Arg Arg Val Val Arg Val Cys Pro
325 330 335
Gly Gly Gly Glu Ser Val Gly Val Pro Val Asp Leu Tyr Pro Arg Pro
340 345 350
Leu Val Leu Pro Pro Val Asp Cys Ala His His Leu Arg Glu Ile Leu
355 360 365
Arg Glu Ile Gln Leu Val Phe Thr Gly Val Leu Glu Gly Val Trp Gly
370 375 380
Glu Gly Gly Ser Phe Val Tyr Pro Phe Glu Glu Lys Met Arg Phe Leu
385 390 395 400
Phe Pro
(2) INFORMATION FOR SEQ ID NO: 172:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 428 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172:
CGCGACGCGG GCCGCTGGGT CCGCGGACCG GAGAACGACG TCCGCGGTCC GCGGGGCGTA 60
CCCGGACCCC ATGGCCAGCC TGTCGCCGCG ACCCCCGGCG CCCCGCCGAC ACCACCACCA 120
CCACCACCGC CGCCGCCGCC GGCGCGCCCC CCGCCGGCGC TCGACCGCCT CTGACTCATC 180
AAAATCCGGA TCCTCGTCGT CGGCGTCCTC CGCCTCCTCC TCCGCCTCCT CCTCCTCGTC 240
TGCATCCGCC TCCTCGTCTG ACGACGACGA CGACGACGCC GCCCGCGCCC CCGCCAGCGC 300
CGCAGACCAC GCCGCGGGCG GGACCCTCGG CGCGGACGAC GAGGAGGCGG GGGTGCCCGC 360
GAGGGCCCCG GGGGCGGCGC CCCGGCCGAG CCCGCCCAGG GCCGAGCCCG CCCCCGGGGC 420
CGGGGCG 428
(2) INFORMATION FOR SEQ ID NO: 173:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15900 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:
CGCGCTCCGT GTGGACGATC GCCCCGTCGC CTGGCTGATA TAGTCCTCGG GGCGCGCGGG 60
GCGGGGGGAA AGGAGGAGGA CGCGGAGGAG GAGCGATCGA CGCCGCCGCG CCCCGGCTCG 120
CCGGGGTTCC GCCCCCAGGT GGAACCGCAT TATGCGCGGC CCCGCCCCGA CGCCCGCGCG 180
TCCGCGTCCG TGGCGGCGGC CCGTTGGTCG CGCCGCCGCC GGCTCCGCCC GCGCGGCATC 240
TCATTAGCGC CCGGCGCGGG CGGCTTCCGC TTCCGCCCGC GATGCTAATG AGACCCTCGT 300
CGCGGGCGGG CTCGCTCCCC TGCCCTTCCG GGTTCGTGGT AATGAGATGC CGGCCCCGCG 360
CTCCCGTTGG CCCCCGCCGG CCCCTTTGGG GCCGGCGAGG TCGCCCCGTT GGTCCGCGGG 420
CGGCTCCGCC CCAAAGGGGG CGGGGCCGCA GGGTAAAAGA AGTGAGAACG CGAAGCGTTC 480
GCACTTCGTC CTAATAGTAT ATATATTATT AGGGCAAAGT GCGAGCACTG GCGCCCTGCC 540
CGGGGCCCGC GTCATCCCGC GGGCTCCGCC CCAAAGGGGG CGGGGCCGCA GGGTAAAAGA 600
AGTGAGAACG CGAAGCGTTC GCACTTCGTC CTAATAGTAT ATATATTATT AGGGCAAAGT 660
GCGAGCACTG GCGCCCTGCC CGGGGCCCGC GTCATCCCGC GGGCCCCGCC CCGAGGCGGG 720
CCCGGACGGG GGGCGGGCCG TTCCTCGCGC ACATAAAGGG CCGGCGTCCC GGTCGCCGCC 780
GCACCAGGGG CACACCGGCT GCGCGGCGGA GACCGGGACG GCAGCGGCGG CATCGCGAAG 840
GGGGCCACAG CGAGACAGAG ACGCCGGCGG CGAGCGGGGC ACCGACGCAC CCGGATCGGA 900
TCGGATACAG AGACGCGGGC GCATCGGTTC CTTTTCGTTC TGCCTTTCCC TCCCCCCCCC 960
CCCCACCCTG TACGTACCGC GAGGACCCAT CCACCCACTG CAGCCTTATC GCAGGTACGG 1020
TGACCCGGGG GCGCCGGGGC GGGGGGACGG GACGGGGGGA CGGGACGGGG GGACGGGACG 1080
GGGGGACGGG ACGGGGGGAC GGGACGGGGG GACGGGACGG GGGGACGGGA CGGGGGGACG 1140
GGACGGGGGG ACGGGACGGG GGGACGGGAC GGGGGGACGG GACGGGGGGA CGGGACGGGG 1200
GGACGGGACG GGGGGACGGG ACGGGGGGAC GGGACGGGGG GACGGGACGG GGGGACGGGA 1260
CGGGGGGACG GGACGGGGGG ACGGGACGGG GGGACGGGAC GGGGGGACGG GACGGGGGGA 1320
CGGGACGGGG GGACGGGACG GGGGGACGGG ACGGGGGGAC GGGACGGGGG GACGGGACGG 1380
GGGGACGGGA CGGGGGGACG GGACGGGGGG ACGGGACGGG GGGACGGGAC GGGGGGGCCC 1440
CGATCCCAAC ATCCGCGCTT TCTCGCAGGC CGGGCGCCGC CTTCGTGGAC GGGACACCGG 1500
TGTGGTAACT GGCGACAAGG CGTCGCCACT ATGGCAGACA TCCCCCCGGA CCCGCCCGCG 1560
CTCAACACGA CGCCTGCGAA TCATGCTCCC CCATCCCCAC CCCCGGGTTC ACGGAAGCGC 1620
AGACGCCCCG TCCTCCCCAG CTCGTCGGAA TCTGAGGGTA AGCCCGACAC AGAATCGGAA 1680
TCCTCCTCGA CCGAGTCGTC CGAGGATGAG GCGGGAGACC TACGCGGCGG GCGCCGTCGC 1740
TCCCCGCGGG AGCTCGGGGG GAGGTATTTT TTGGATCTGT CGGCAGAATC GACCACGGGG 1800
ACGGAATCGG AGGGAACGGG GCCGTCGGAC GACGATGATG ATGATGCGTC AGACGGCTGG 1860
TTGGTTGACA CCCCCCCCCG TAAATCCAAG CGACCCCGAA TCAACCTGCG ATTAACGAGC 1920
TCCCCCGACC GGCGCGCGGG TGTGGTTTTC CCCGAGGTGT GGAGAAACGA CAGACCTATC 1980
CGCGCGGCGC AACCCCAGGC CCCGGCCCAG TCTTCCGGGG ATCGCGCAGC CGCACCGCGG 2040
CGCTCTGCTC GCCAGGCCCA GATGCGGAGC GGAGCCGCCT GGACGCTTGA TCTGCATTAC 2100
ATACGCCAGT GCGTCAACCA GCTCTTTCGG ATCCTGCGTG CCGCCCCGAA CCCGCCCGGC 2160
AGCGCCAACC GCCTGCGCCA CCTGGTGCGA GACTGCTACC TTATGGGCTA CTGCCGGACC 2220
CGCCTGGGGC CGCGCACGTG GGGCCGCCTG CTGCAGATCT CGGGCGGAAC CTGGGACGTG 2280
CGCCTGCGAA ACGCAATCCG GGAGGTCGAG GCGCGTTTTG AACCCGCCGC CGAGCCCGTG 2340
TGCGAGCTGC CCTGTCTGAA CGCCAGGCGT TACGGCCCCG AGTGTGATGT TGGCAATCTC 2400
GAGACCAACG GCGGCTCGAC GAGCGATGAT GAGATATCGG ATGCGACGGA CTCGGACGAT 2460
ACCCTCGCGT CCCATTCCGA CACGGAGGGG GGGCCCTCCC CGGCCGGCCG GGAGAACCCG 2520
GAATCCGCGT CCGGCGGGGC TATCGCGGCT CGGCTGGAGT GTGAGTTTGG GACGTTTGAC 2580
TGGACGTCCG AGGAGGGCTC CCAGCCCTGG CTGTCCGCGG TGGTCGCCGA TACCAGCTCC 2640
GCCGAACGCT CTGGCCTACC CGCCCCGGGC GCGTGTCGCG CAACGGAAGC CCCAGAACGC 2700
GAGGACGGGT GCCGAAAAAT GCGCTTCCCC GCCGCCTGCC CCTATCCCTG CGGCCACACA 2760
TTTCTCCGGC CATGAGCGCG GGACCCCCAG CCCGGTGTGT TTGCCAAACG AAAATAAACG 2820
CCCTACAAGA AAGCTTTTGT GTCTGAGTGT CTGGTTTTTC TGGGGGTGGA GGAAGGAACG 2880
ACAAAAAAAG AAACAAACGC GACACCGCTC GTACGTGTAA TGGGGCGCAG TGTTTTTTAT 2940
TAGCATCGGG GGGGGGTTAG AGGTTGGTGA TTGGATAGCA AACGTGGGAT GACGGAGGCC 3000
ACTCGTCGCC AACGGCCAGC GGGGGCCCGG GGTTCTGGGG GTCATCGTCC CCCGTCTGCC 3060
AGGAGGGCTC ATCGGGAATC TCGGGTCGCC CCATGCACGT AAAACACGGG CGCTGCGTGG 3120
GGTGGGTCGC CGGATGCGGG CGGGATGATG CGGGGCGGGG TTTGTTGTGA GGAGCCACGA 3180
GGGACCGTAG CCAGCGAAGA CAGCTGCGTT CCCGGTCGCC GGGCACCACC ACGCCGTATT 3240
GGTATTCGTA TCGGCTAAGG AGATTTTCCA GGGGGTGATT AGGCGCTGCG GGGAACGGGG 3300
TCCACGACAC GGTCCGCTCG GGCAAAAACC GATCGGGCAG GGGCCACGGT TCCCCCACCC 3360
ACGCGTCGTT GGTCTTCATG GCGATGAAGC GAAACCCCAG CCGGGTTTTT TGTGCGTACT 3420
CTAAAAACGG CACACACAGG TCCGCCGCCC CGACCACCCA CAGGTGGTAT AGCCGGTGGG 3480
GGCCGGGGCG CTCTTGATGC AGGAGCCGAA AACACGCAGG GGCATCCAGA ATCTCGATGC 3540
TTTCCAGGGG GTCGTCCTCC GCAAACAGGC CCGTCGTGGT GTTTGGGGGA CAGCGACAGG 3600
AGCGGGTTCG CACGATCGGT CGGGTGAATT TGGGCAAGTC CATCAGAGGC TCGGCCAGCC 3660
TGCGAAGGTT CGCCGGGCGA ACCACCACCG GGGTTCCCAG AGGCTCGGAG GCCAGGATCC 3720
GGCATTGCCG AAGCAGAAAA CTCCACAGAG CCGGGCTTGC GTCAGCGGAA GTCCGCGGCA 3780
GGGCGTTTCG TTGGTCTAGG AGGGTAACCA CACTTACAAC AACAACGCCC ATGTCGGTAT 3840
ATTAGGCCCG TGGTCCGATC TTCACTCACT CGCCTGTCTG CGGACCTATG CACGGCGGGA 3900
CGGCGCGCGG ACCCGGGGGG GCTGCTTGCT ATCACACGGC CCGTTCGCAC GTTCGATTTT 3960
TTCAGCCTTG TTTGGTTGGC TAGGTATCCC GGATAATCTG ACGTTCCGGA TATAGGGGGC 4020
GGGGGTAGTG GGGGGGTGTG TCGACAAACT GCCGCTTCTT AAAACACCGG GGCCCGTCGC 4080
TCGGGGTGCT CGTTGGTTGG CACGCGCGAC GCGGCGAATG GCCTGTCGTA AGTTCTGTGG 4140
GGTCTACCGT AGACCCGACA AGAGACAGGA GGCGTCCGTC CCGCCGGAGA CAAACACGGC 4200
CCCGGCCTTC CCGGCGAGCA CCTTTTATAC CCCCGCGGAG GATGCGTACC TGGCCCCCGG 4260
GCCCCCGGAA ACCATCCACC CTTCCCGCCC ACCGTCCCCC GGCGAGGCTG CGCGCCTGTG 4320
TCAGCTGCAG GAGATCTTGG CCCAGATGCA CAGCGACGAG GACTACCCCA TCGTGGACGC 4380
CGCGGGTGCG GAGGAGGAAG ACGAGGCCGA CGATGACGCC CCGGATGACG TGGCCTACCC 4440
GGAGGACTAC GCGGAGGGGC GTTTTCTGTC CATGGTTTCG GCCGCCCCCC TGCCCGGAGC 4500
CAGCGGCCAT CCTCCTGTTC CGGGCCGCGC AGCCCCCCCC GACGTCCGGA CCTGCGACAG 4560
CGGTAAGGTG GGGGCCACGG GGTTCACCCC GGAAGAGCTC GACACCATGG ACCGGGAGGC 4620
ACTTCGGGCC ATCAGCCGCG GGTGCAAGCC CCCTTCGACC CTGGCAAAAC TGGTGACCGG 4680
GCTGGGATTC GCGATCCACG GAGCGCTCAT CCCGGGGTCG GAGGGGTGTG TCTTTGATAG 4740
CAGCCACCCG AACTACCCTC ATCGGGTAAT CGTCAAGGCG GGGTGGTACG CCAGCACGAA 4800
CCACGAGGCG CGGCTGCTGA GACGCCTGAA CCACCCCGCG ATCCTACCCC TCCTGGACCT 4860
GCACGTCGTT TCTGGGGTCA CGTGTCTGGT CCTCCCCAAG TATCACTGCG ACCTGTATAC 4920
CTATCTGAGC AAGCGCCCGT CTCCGTTGGG CCACCTACAG ATAACCGCGG TCTCCCGGCA 4980
GCTCTTGAGC GCCATCGACT ACGTCCACTG CGAAGGCATC ATCCACCGCG ATATTAAGAC 5040
CGAGAACATC CTCATCAACA CCCCCGAGAA CATCTGTCTG GGGGACTTTG GGGCGGCGTG 5100
CTTTGTGCGC GGGTGTCGAT CGAGCCCCTT CCATTACGGG ATCGCAGGCA CCATCGATAC 5160
AAACGCCCCC GAGGTCCTGG CCGGGGATCC GTACACCCAG GTAATCGACA TCTGGAGCGC 5220
CGGCCTGGTG ATCTTTGAGA CCGCCGTCCA CACCGCGTCC TTGTTCTCGG CCCCGCGCGA 5280
CCCCGAAAGG CGGCCGTGCG ACAACCAGAT CGCGCGCATC ATCCGACAGG CCCAGGTACA 5340
CGTCGACGAG TTTCCAACGC ACGCGGAATC GCGCCTCACC GCGCACTACC GCTCGCGGGC 5400
GGCCGGGAAC AATCGTCCGG CGTGGACCCG ACCGGCATGG ACCCGCTACT ACAAGATCCA 5460
CACAGACGTC GAATATCTCA TCTGCAAAGC CCTTACCTTT GACGCGGCGC TCCGCCCAAG 5520
CGCCGCGGAG TTGCTGCGCC TGCCGCTATT TCACCCTAAG TGACCCCGCT CCCCCCGGGG 5580
GGCGTGGAGG GGGGGCTGGT TGGATGTTTT TGCACAAAAA GACGCGGCCC TCGGGCTTTG 5640
GTGTTTTTGG CACCTTGCCG CCCGGCGTCA TGCACGCCAT CGCTCCCAGG TTGCTTCTTC 5700
TTTTTGTTCT TTCTGGTCTT CCGGGGACAC GCGGCGGGTC GGGTGTCCCC GGACCAATTA 5760
ATCCCCCCAA CAACGATGTT GTTTTCCCGG GAGGTTCCCC CGTGGCTCAA TATTGTTATG 5820
CCTATCCCCG GTTGGACGAT CCCGGGCCCT TGGGTTCCGC GGACGCCGGG CGGCAAGACC 5880
TGCCCCGGCG CGTCGTCCGT CACGAGCCCC TGGGCCGCTC GTTCCTCACG GGGGGGCTGG 5940
TTTTGCTGGC GCCGCCGGTA CGCGGATTTG GCGCACCCAA CGCAACGTAT GCGGCCCGTG 6000
TGACGTACTA CCGGCTCACC CGCGCCTGCC GTCAGCCCAT CCTCCTTCGG CAGTATGGAG 6060
GGTGTCGCGG CGGCGAGCCG CCGTCCCCAA AGACGTGCGG GTCGTACACG TACACGTACC 6120
AGGGCGGCGG GCCTCCGACC CGGTACGCTC TCGTAAATGC TTCCCTGCTG GTGCCGATCT 6180
GGGACCGCGC CGCGGAGACA TTCGAGTACC AGATCGAACT CGGCGGCGAG CTGCACGTGG 6240
GTCTGTTGTG GGTAGAGGTG GGCGGGGAGG GCCCCGCCCC CACCGCCCCC CCACAGGCGG 6300
CGCGTGCGGA GGGCGGCCCG TGCGTCCCCC CGGTCCCCGC GGGCCGCCCG TGGCGCTCGG 6360
TGCCCCCGGT ATGGTATTCC GCCCCCAACC CCGGGTTTCG TGGCCTGCGT TTCCGGGAGC 6420
GCTGTCTGCC CCCACAGACG CCCGCCGCCC CCAGCGACCT ACCACGCGTC GCTTTTGCTC 6480
CCCAGAGCCT GCTGGTGGGG ATTACGGGCC GCACGTTTAT TCGGATGGCA CGACCCACGG 6540
AAGACGTCGG GGTCCTGCCA CCCCATTGGG CCCCCGGGGC CCTAGATGAC GGTCCGTACG 6600
CCCCCTTCCC ACCCCGCCCG CGGTTTCGAC GCGCCCTGCG GACAGACCCC GAGGGGGTCG 6660
ACCCCGACGT TCGGGCCCCC CTAACCGGGC GGCGCCTCAT GGCCTTGACC GAGGACGCGT 6720
CCTCCGATTC GCCTACGTCC GCTCCGGAGA AGACGCCCCT CCCTGTGTCG GCCACCGCCA 6780
TGGCGCCCTC AGTCGACCCA AGCGCGGAAC CGACCGCCCC CGCAACCACT ACTCCCCCCG 6840
ACGAGATGGC CACACAAGCC GCAACGGTCG CCGTTACGCC GGAGGAAACG GCAGTCGCCT 6900
CCCCGCCCGC GACTGCATCC GTGGAGTCGT CGCCACTCCC CGCCGCGGCG GCAACGCCCG 6960
GGGCCGGGCA CACGAACACC AGCAGCGCCC CCGCAGCGAA AACGCCCCCC ACCACACCAG 7020
CCCCCACGAC CCCCCCGCCC ACGTCTACCC ACGCGACCCC CCGCCCCACG AGTCCGGGGC 7080
CCCAAACAAC CCCTCCCGGA CCCGCAACCC CGGGTCCGGT GGGCGCCTCC GCCGCACCCA 7140
CGGCCGATTC CCCCCTCACC GCCTCGCCCC CCGCTACCGC GCCGGGGCCC TCGGCCGCCA 7200
ACGTTTCGGT CGCCGCGACC ACCGCCACGC CCGGAACCCG GGGCACCGCC CGTACCCCCC 7260
CAACGGACCC AAAGACGCAC CCACACGGAC CCGCGGACGC TCCCCCCGGC TCGCCAGCCC 7320
CCCCACCCCC CGAACATCGC GGCGGACCCG AGGAGTTTGA GGGCGCCGGG GACGGCGAAC 7380
CCCCCGATGA CGACGACAGC GCCACCGGCC TCGCCTTCCG AACTCCGAAC CCCAACAAAC 7440
CACCCCCCGC GCGCCCCGGG CCCATCCGCC CCACGCTCCC GCCAGGAATT CTTGGGCCGC 7500
TCGCCCCCAA CACGCCTCGC CCCCCCGCCC AAGCTCCCGC TAAGGACATG CCCTCGGGCC 7560
CCACACCCCA ACACATCCCC CTGTTCTGGT TCCTAACGGC CTCCCCTGCT CTAGATATCC 7620
TCTTTATCAT CAGCACCACC ATCCACACGG CGGCGTTCGT TTGTCTGGTC GCCTTGGCAG 7680
CACAACTTTG GCGCGGCCGG GCGGGGCGCA GGCGATACGC GCACCCGAGC GTGCGTTACG 7740
TATGTCTGCC ACCCGAGCGG GATTAGGGGG TGGGGTGGGG GCGAGAAACG ATGAAGGACG 7800
GGAAAGGGAA CAGCGACCAA ATGCCACGAT AAGAACAATA AACCTGTGAC GTCAATCGGA 7860
TATGTGAGTT TGGTTGTGTT TTGTGGGACT GGGGGCGGGG GGTGGGAGGT ATCAGTGGGT 7920
GACAGAGTCT TTTAAAAGAC GTGTCCCGGG GCCCTCGAGA CGCGCAACTT TTGGCCACAC 7980
AGAGAAAGGC CCCCAGACGA AGTCACCCGG GTCCCCGAAC AAAAACAAAA ACCTTGACCG 8040
CCGCCGGGGG GCGTGCCTGT TGTTTTGGTC TCAATGGATC GGTATGCCGT TCGGACCTGG 8100
GGGATTGTGG GAATCCTCGG GTGTGCTGCT GTTGGGGCCG CACCCACCGG CCCCGCGTCC 8160
GATACAACAA ACGCGACCGC ACGCCTCCCC ACGCACCCCC CACTCATCCG TTCCGGGGGC 8220
TTTGCCGTCC CCCTCATCGT GGGGGGGCTG TGTCTCATGA TTCTGGGGAT GGCGTGTCTA 8280
CTCGAGGTCC TGCGTCGCCT GGGTCGCGAG TTGGCGAGGT GCTGCCCCCA CGCGGGCCAA 8340
TTTGCCCCAT GATTTTTCGC CTTTCTGGCC TTGCCCCCAC CCCATCGCCC CGATTGTGTG 8400
TCGGGTGCCC GGGGTACAGC AGCTATGGAG CGGTCGGTAA TATAACTTTG GTTGTCGCCA 8460
CACGCCCCGT GCCGGGCATG GGTTGTGCGG GAAAGACGAA ATAATCCGGC GATCCCCAAG 8520
CGTACCAACT TGGGGGGGGG GGGAAAGAAA CTAAAAACAC ATCAAGCCCA CAACCCATCC 8580
CACAAGGGGG GTTATGGCGG ACCCACCGCA CCACCATACT CCGATTCGAC CACATATGCA 8640
ACCAAATCAC CCCCAGAGGG GAGGTTCCAT TTTTACGAGG AGGAGGAGTA TAATAGAGTC 8700
TTTGTGTTTA AAACCCGGGG TCGGTGTGGT GTTCGGTCAT AAGCTGCATT GCGAACGACT 8760
AGTCGCCGTT TTTCGTGTGC ATCGCGTATC ACGGCATGGG GCGTTTGACC TCCGGCGTCG 8820
GGACGGCGGC CCTGCTAGTT GTCGCGGTGG GACTCCGCGT CGTCTGCGCC AAATACGCCT 8880
TAGCAGACCC CTCGCTTAAG ATGGCCGATC CCAATCGATT TCGCGGGAAG AACCTTCCGG 8940
TTTTGGACCA GCTGACCGAC CCCCCCGGGG TGAAGCGTGT TTACCACATT CAGCCGAGCC 9000
TGGAGGACCC GTTCCAGCCC CCCAGCATCC CGATCACTGT GTACTACGCA GTGCTGGAAC 9060
GTGCCTGCCG CAGCGTGCTC CTACATGCCC CATCGGAGGC CCCCCAGATC GTGCGCGGGG 9120
CTTCGGACGA GGCCCGAAAG CACACGTACA ACCTGACCAT CGCCTGGTAT CGCATGGGAG 9180
ACAATTGCGC TATCCCCATC ACGGTTATGG AATACACCGA GTGCCCCTAC AACAAGTCGT 9240
TGGGGGTCTG CCCCATCCGA ACGCAGCCCC GCTGGAGCTA CTATGACAGC TTTAGCGCCG 9300
TCAGCGAGGA TAACCTGGGA TTCCTGATGC ACGCCCCCGC CTTCGAGACC GCGGGTACGT 9360
ACCTGCGGCT AGTGAAGATA AACGACTGGA CGGAGATCAC ACAATTTATC CTGGAGCACC 9420
GGGCCCGCGC CTCCTGCAAG TACGCTCTCC CCCTGCGCAT CCCCCCGGCA GCGTGCCTCA 9480
CCTCGAAGGC CTACCAACAG GGCGTGACGG TCGACAGCAT CGGGATGCTC CCCCGCTTTA 9540
TCCCCGAAAA CCAGCGCACC GTCGCCCTAT ACAGCTTAAA AATCGCCGGG TGGCACGGCC 9600
CCAAGCCCCC GTACACCAGC ACCCTGCTGC CGCCGGAGCT GTCCGACACC ACCAACGCCA 9660
CGCAACCCGA ACTCGTTCCG GAAGACCCCG AGGACTCGGC CCTCTTAGAG GATCCCGCCG 9720
GGACGGTGTC TTCGCAGATC CCCCCAAACT GGCACATCCC GTCGATCCAG GACGTCGCGC 9780
CGCACCACGC CCCCGCCGCC CCCAGCAACC CGGGCCTGAT CATCGGCGCG CTGGCCGGCA 9840
GTACCCTGGC GGTGCTGGTC ATCGGCGGTA TTGCGTTTTG GGTACGCCGC CGCGCTCAGA 9900
TGGCCCCCAA GCGCCTACGT CTCCCCCACA TCCGGGATGA CGACGCGCCC CCCTCGCACC 9960
AGCCATTGTT TTACTAGAGG AGTATCCCCG CTCCCGTGTA CCTCTGGGCC CGTGTGGGAG 10020
GGTGGCTGGG GTATTTGGGT GGGACTTGGA CTCCGCATAA AGGGAGTCTC GAAGGAGGGA 10080
AACTAGGACA GTTCATAGGC CGGGAGCGTG GGGCGCGCAC CGCTGTCCCG ACGATTAGCC 10140
ACCGCGCCCA CAGTCACCTC GACCCGTCCG ATCCCGGTAT GCCCGGCCGC TCGCTGCAGG 10200
GCCTGGCGAT CCTGGGCCTG TGGGTCTGCG CCACCGGCCT GGTCGTCCGC GGCCCCACGG 10260
TCAGTCTGGT CTCAGACTCA CTCGTGGATG CCGGGGCCGT GGGGCCCCAG GGCTTCGTGG 10320
AAGAGGACCT GCGTGTTTTC GGGGAGCTTC ATTTTGTGGG GGCCCAGGTC CCCCACACAA 10380
ACTACTACGA CGGCATCATC GAGCTGTTTC ACTACCCCCT GGGGAACCAC TGCCCCCGCG 10440
TTGTACACGT GGTCACACTG ACCGCATGCC CCCGCCGCCC CGCCGTGGCG TTCACCTTGT 10500
GTCGCTCGAC GCACCACGCC CACAGCCCCG CCTATCCGAC CCTGGAGCTG GGTCTGGCGC 10560
GGCAGCCGCT TCTGCGGGTT CGAACGGCAA CGCGCGACTA TGCCGGTCTG TATGTCCTGC 10620
GCGTATGGGT CGGCAGCGCG ACGAACGCCA GCCTGTTTGT TTTGGGGGTG GCGCTCTCTG 10680
CCAACGGGAC GTTTGTGTAT AACGGCTCGG ACTACGGCTC CTGCGATCCG GCGCAGCTTC 10740
CCTTTTCGGC CCCGCGCCTG GGACCCTCGA GCGTATACAC CCCCGGAGCC TCCCGGCCCA 10800
CCCCTCCACG GACAACGACA TCCCCGTCCT CCCCCCGAGA CCCGACCCCC GCCCCCGGGG 10860
ACACAGGGAC GCCCGCGCCC GCGAGCGGCG AGAGAGCCCC GCCCAATTCC ACGCGATCGG 10920
CCAGCGAATC GAGACACAGG CTAACCGTAG CCCAGGTAAT CCAGATCGCC ATACCGGCGT 10980
CCATCATCGC CTTTGTGTTT CTGGGCAGCT GTATCTGCTT CATCCATAGA TGCCAGCGCC 11040
GATACAGGCG CCCCCGCGGC CAGATTTACA ACCCCGGGGG CGTTTCCTGC GCGGTCAACG 11100
AGGCGGCCAT GGCCCGCCTC GGAGCCGAGC TGCGATCCCA CCCAAACACC CCCCCCAAAC 11160
CCCGACGCCG TTCGTCGTCG TCCACGACCA TGCCTTCCCT AACGTCGATA GCTGAGGAAT 11220
CGGAGCCAGG TCCAGTCGTG CTGCTGTCCG TCAGTCCTCG GCCCCGCAGT GGCCCGACGG 11280
CCCCCCAAGA GGTCTAGGTC CAAGCGGGCC GTTCGGCAGG CCCGCCCCAC CGCCCCCATC 11340
GTGGTTATTT CCCCCCCAAT AAACCGATGT TATTTGCCTA TATGCGTGTG TTGGATCCCT 11400
TTGTGATCGT TCGTCATTCC CCGGATGGCA TGGGAGGCGG GTAATGGATG GGCGGGGCCC 11460
GGGGGGGGAG GAAAAAGAAT AAAGGGGGTA GTGTCGGAGA GGCCCGCCGC GCATTTAAGG 11520
AGTCGCCGCC CCGACTCTGT GTCTTCGGGT GACTTGGTGC GCCGCCGTCA GCTAGTCTCC 11580
GATCTGCCCC GACCGACGGC TCCTGCCACC CGAACATGGC TCGCGGGGCC GGGTTGGTGT 11640
TTTTTGTTGG AGTTTGGGTC GTATCGTGCC TGGCGGCAGC ACCCAGAACG TCCTGGAAAC 11700
GGGTAACCTC GGGCGAGGAC GTGGTGTTGC TTCCGGCGCC CGCGGGGCCG GAGGAACGCA 11760
CCCGGGCCCA CAAACTACTG TGGGCCGCGG AACCCCTGGA TGCCTGCGGT CCCCTGCGCC 11820
CGTCGTGGGT GGCGCTGTGG CCCCCCCGAC GGGTGCTCGA GACGGTCGTG GATGCGGCGT 11880
GCATGCGCGC CCCGGAACCG CTCGCCATAG CATACAGTCC CCCGTTCCCC GCGGGCGACG 11940
AGGGACTGTA TTCGGAGTTG GCGTGGCGCG ATCGCGTAGC CGTGGTCAAC GAGAGTCTGG 12000
TCATCTACGG GGCCCTGGAG ACGGACAGCG GTCTGTACAC CCTGTCCGTG GTCGGCCTAA 12060
GCGACGAGGC GCGCCAAGTG GCGTCGGTGG TTCTGGTCGT GGAGCCCGCC CCTGTGCCGA 12120
CCCCGACCCC CGACGACTAC GACGAAGAAG ACGACGCGGG CGTGAGCGAA CGCACGCCGG 12180
TCAGCGTTCC CCCCCCAACC CCCCCCCGTC GTCCCCCCGT CGCCCCCCCG ACGCACCCTC 12240
GTGTTATCCC CGAGGTGTCC CACGTGCGCG GGGTAACGGT CCATATGGAG ACCCCGGAGG 12300
CCATTCTGTT TGCCCCCGGG GAGACGTTTG GGACGAACGT CTCCATCCAC GCCATTGCCC 12360
ACGACGACGG TCCGTACGCC ATGGACGTCG TCTGGATGCG GTTTGACGTG CCGTCCTCGT 12420
GCGCCGAGAT GCGGATCTAC GAAGCTTGTC TGTATCACCC GCAGCTTCCA GAGTGTCTAT 12480
CTCCGGCCGA CGCGCCGTGC GCCGTAAGTT CCTGGGCGTA CCGCCTGGCG GTCCGCAGCT 12540
ACGCCGGCTG TTCCAGGACT ACGCCCCCGC CGCGATGTTT TGCCGAGGCT CGCATGGAAC 12600
CGGTCCCGGG GTTGGCGTGG CTGGCCTCCA CCGTCAATCT GGAATTCCAG CACGCCTCCC 12660
CCCAGCACGC CGGCCTCTAC CTGTGCGTGG TGTACGTGGA CGATCATATC CACGCCTGGG 12720
GCCACATGAC CATCAGCACC GCGGCGCAGT ACCGGAACGC GGTGGTGGAA CAGCACCTCC 12780
CCCAGCGCCA GCCCGAGCCC GTCGAGCCCA CCCGCCCGCA CGTGAGAGCC CCCCCTCCCG 12840
CGCCCTCCGC GCGCGGCCCG CTGCGCCTCG GGGCGGTGCT GGGGGCGGCC CTGTTGCTGG 12900
CCGCCCTCGG GCTGTCCGCG TGGGGCGTGC ATGACCTGCT GGCGCAGGCG CTCCTGGCGG 12960
GCGGTTAAAA GCCGGGCCTC GGCGACGGGC CCCACTTACA TTCGCGTGGC GGACAGCGAG 13020
CTGTACGCGG ACTGGAGTTC GGACAGCGAG GGGGAGCGCG ACGGGTCCCT GTGGCAGGAC 13080
CCTCCGGAGA GACCCGACTC TCCCTCCACA AATGGATCCG GCTTTGAGAT CTTATCACCA 13140
ACGGCTCCGT CTGTATACCC CCATAGCGAG GGGCGTAAAT CTCGCCGCCC GCTCACCACC 13200
TTTGGTTCGG GAAGCCCGGG CCGTCGTCAC TCCCAGGCCT CCTATTCGTC CGTCCTCTGG 13260
TAAGGCGTCT TCCGACGACG CGGACGTCGG CGATGAACTG ATTGCCATCG CGGACGCACG 13320
CGGGGACCCG CCAGAGACCC TGCCCCCCGG CGCGGGCGGC GCCGCGCCCG CGTGCCGCAG 13380
ACCACCTCGC GGCGGCTCCC CCGCGGCCTT TCCCGTGGCC CTCCACGCCG TGGACGCCCC 13440
CTCCCAATTC GTCACCTGGC TCGCCGTGCG CTGGCTGCGG GGGGCGGTGG GTCTCGGGGC 13500
CGTCCTGTGC GGGATTGCGT TTTACGTGAC GTCAATCGCC CGAGGCGCAT AAAGGTCCGG 13560
CGGCCAGCCC CGCCGCAGCT CATAAAAATC GTGAGTCACG GCAACCGCAC CTTCGCCTCC 13620
GGCCCTCCGC CAGCGCCCTT CCGCGTCCGC GATGACCTCC CGGCCCGCCG ACCAAGACTC 13680
GGTGCGTTCC AGCGCGTCGG TGCCGCTTTA CCCCGCGGCC TCGCCCGTCC CGGCAGAAGC 13740
CTACTACTCG GAAAGCGAAG ACGAGGCCGC CAACGACTTC CTCGTGCGCA TGGGCCGCCA 13800
GCAGTCGGTC CTAAGGCGCC GACGGCGGCG CACGCGGTGC GTCGGGCTGG TTATCGCCTG 13860
TCTCGTCGTG GCCCTCCTAT CTGGAGGGTT CGGGGCACTT TTGGTGTGGC TGCTCCGCTA 13920
AATGACGCCT CGATGTATGG CGCCTTCTTC GCCCCCACCC CTCGCCGCGA CCCACGTCCG 13980
TATGTTAATT GCAATAAAGT GGTTGATTGT CATTACGGTC TACTAGGTTG TCTTTTTTTT 14040
TTGGGGGGGG GGGGAAGGAA ATGCAGAAAA GGGTAAGAAA TTCTCGGAAT TTCACCCCCC 14100
GGGGGGGGCA AGTGCAGTAC CCCAGTTCCT CAGTGTTTGG GAAATCTATT GAACTCTCCC 14160
GGCTCCTCCG TGTTAGGGAA GTCTCTTGGG GAAATCTATT GACCTCTCGC CCCCCCCCCC 14220
AGGAGGGGGC AGTGCAGTAC CCCAGTTCCT CCGTGCTGGG GAAATCTCTC TGCCGGGTAC 14280
GGGCTCCAGA CGAAGGACCC ATACATTTCC CCATCCGCAC CCCACATCTG GCGTTCTAGA 14340
GTCACGACGC ATTTGCCCCC GTCCCCGCAG CAACACACAA AGCGATTTCA ATTTTCACGA 14400
TTTTATTATT AATTACACCA ACCACCCTGT CCCCGGGACG TGGTCAGGAC CGGGGGTCCG 14460
CACCCAAACG CACGAAACAA ATGCTGGCAG TGTGCCGAAT ATAACCCCGC GTAGGAACAC 14520
GTCGACGCGT GCGCCAAACA GCACCAGAAG GCGCATGCCA TCAGCAGGTC GTGCATATGG 14580
CGATGTGTTT GGACGCAGGG CGCAGCCGCG GCGATAAAAT TCATGGCGGC CGTCCGCCAG 14640
GGCCACAGCG GCGAGGACTC CCTGTTGGCC CGAAGCCATT GGGTATGAAC CAGCTGCGCC 14700
TCCTGTCCGA CCCTGGCTCC CGCCAGCGGG GGCGGTGGGT CGTGGGTGTT GAGAGCACAC 14760
AGGCGGGACA CCTCGATCAC CGTCCGAAAA AAGGCCCGGT GGTCCGCGGG CAGCATCTGC 14820
AGGTGCGCCA GGGCCTGGGC GTTGAGAGGG TACAACTCGG AGCCGGGGGA CTCCGGGGGC 14880
CGGTCCGCGC GGTGCCGCGA GTGGGCACGC TTTGGGGCCC GGGTGTCGGA CGCGGGCGCG 14940
TTACGGATCC CGACGCGGGG CAGAACGTAC GTGCGTTGGC GCGGCGATGA GGGGTCCGGG 15000
CTGCCGAGGG GGGCGTAGGG GACCGGGCTA GGCAAGCCCG CGGGTTGCGC GGGGTTCCCG 15060
TGGGGGTCTA GGCTCCCTGG GCACCCGTGG GGGTCGTGGG GGTCGCGGGT CCCTGGGTAT 15120
GCGCGGGACC CTGGGTTCTC TGGGAGATCG TGGAACTCGC GGTTCCCTGG GCTCTCGGGG 15180
AACCCGGGGC TCCCTGGGGA CACGTGGTGC CCTGGGAATT CTTGATGGTC GGACGGCTTC 15240
AGATGGCTTC GGGATCGAGA GGGCCGCACA GACTCGTAGT AGACCCGAAT CTCCACGTTT 15300
CCCCGCCGCC GGATCATGGT CGCCGCCCCG GTGCGGGGGC CCGTCGGTCG GAAGCGAGTG 15360
CCCTTCAAGC GTGTCCGCTC CTCTGGGCTG CATGCCGTCG GATGGGGTGC CTTTTAAGGA 15420
AAGGTCTCGG CTGCCCGCCC CAACCGGGGT TTGGGGGTGG GCCGGGGAAA CCCCGGATGC 15480
CATGGGGGGG GTCACACCCT AAGCGCCGGC GCGCTGGTTG GGTGGGGGTA GAGGGGAGTC 15540
CCCGGTCGAC GAGATCGTAT CAAGGGGCCA GCACGCGATC CTGCCGCTCG TTCGATCTAG 15600
CACACCCACG GGTCTGCTGT GTGGGATTTC GACTCGCGGG ATCCGATCGC ACGTCCGGAG 15660
GACACAGCAG CGGGAGCTCC GGGTCGGTCA CCGCAGTTCT GGCCGCCTCT CGGTCCTCCC 15720
GTTCCCTTTT ATGGATCTCC GCGCAGACAT CGCCATACGT CCGGTGTGTG CACCGCGAAG 15780
AATCCAAAAA CATGTCCGTC GTTTTCAGGG CCCAAGACAT GGTGTCCCGT CCACGAAGGC 15840
GGCGCCCGGC CTGCGAGAAA GCGCGGATGT TGGGATCGGG GCCCCCCCGT CCCGTCCCC 15900
(2) INFORMATION FOR SEQ ID NO: 174:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 414 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174:
Met Ala Asp Ile Pro Pro Asp Pro Pro Ala Leu Asn Thr Thr Pro Ala
1 5 10 15
Asn His Ala Pro Pro Ser Pro Pro Pro Gly Ser Arg Lys Arg Arg Arg
20 25 30
Pro Val Leu Pro Ser Ser Ser Glu Ser Glu Gly Lys Pro Asp Thr Glu
35 40 45
Ser Glu Ser Ser Ser Thr Glu Ser Ser Glu Asp Glu Ala Gly Asp Leu
50 55 60
Arg Gly Gly Arg Arg Arg Ser Pro Arg Glu Leu Gly Gly Arg Tyr Phe
65 70 75 80
Leu Asp Leu Ser Ala Glu Ser Thr Thr Gly Thr Glu Ser Glu Gly Thr
85 90 95
Gly Pro Ser Asp Asp Asp Asp Asp Asp Ala Ser Asp Gly Trp Leu Val
100 105 110
Asp Thr Pro Pro Arg Lys Ser Lys Arg Pro Arg Ile Asn Leu Arg Leu
115 120 125
Thr Ser Ser Pro Asp Arg Arg Ala Gly Val Val Phe Pro Glu Val Trp
130 135 140
Arg Asn Asp Arg Pro Ile Arg Ala Ala Gln Pro Gln Ala Pro Ala Gln
145 150 155 160
Ser Ser Gly Asp Arg Ala Ala Ala Pro Arg Arg Ser Ala Arg Gln Ala
165 170 175
Gln Met Arg Ser Gly Ala Ala Trp Thr Leu Asp Leu His Tyr Ile Arg
180 185 190
Gln Cys Val Asn Gln Leu Phe Arg Ile Leu Arg Ala Ala Pro Asn Pro
195 200 205
Pro Gly Ser Ala Asn Arg Leu Arg His Leu Val Arg Asp Cys Tyr Leu
210 215 220
Met Gly Tyr Cys Arg Thr Arg Leu Gly Pro Arg Thr Trp Gly Arg Leu
225 230 235 240
Leu Gln Ile Ser Gly Gly Thr Trp Asp Val Arg Leu Arg Asn Ala Ile
245 250 255
Arg Glu Val Glu Ala Arg Phe Glu Pro Ala Ala Glu Pro Val Cys Glu
260 265 270
Leu Pro Cys Leu Asn Ala Arg Arg Tyr Gly Pro Glu Cys Asp Val Gly
275 280 285
Asn Leu Glu Thr Asn Gly Gly Ser Thr Ser Asp Asp Glu Ile Ser Asp
290 295 300
Ala Thr Asp Ser Asp Asp Thr Leu Ala Ser His Ser Asp Thr Glu Gly
305 310 315 320
Gly Pro Ser Pro Ala Gly Arg Glu Asn Pro Glu Ser Ala Ser Gly Gly
325 330 335
Ala Ile Ala Ala Arg Leu Glu Cys Glu Phe Gly Thr Phe Asp Trp Thr
340 345 350
Ser Glu Glu Gly Ser Gln Pro Trp Leu Ser Ala Val Val Ala Asp Thr
355 360 365
Ser Ser Ala Glu Arg Ser Gly Leu Pro Ala Pro Gly Ala Cys Arg Ala
370 375 380
Thr Glu Ala Pro Glu Arg Glu Asp Gly Cys Arg Lys Met Arg Phe Pro
385 390 395 400
Ala Ala Cys Pro Tyr Pro Cys Gly His Thr Phe Leu Arg Pro
405 410
(2) INFORMATION FOR SEQ ID NO: 175:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 287 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175:
Met Gly Val Val Val Val Ser Val Val Thr Leu Leu Asp Gln Arg Asn
1 5 10 15
Ala Leu Pro Arg Thr Ser Ala Asp Asp Ala Leu Trp Ser Phe Leu Leu
20 25 30
Arg Gln Cys Arg Ile Leu Ala Ser Glu Pro Leu Gly Thr Pro Val Val
35 40 45
Val Arg Pro Ala Asn Leu Arg Arg Leu Ala Glu Pro Leu Met Asp Leu
50 55 60
Pro Lys Phe Trp Ile Val Arg Thr Arg Ser Cys Arg Cys Pro Pro Asn
65 70 75 80
Thr Thr Thr Gly Leu Phe Ala Glu Asp Asp Pro Leu Glu Ser Ile Glu
85 90 95
Ile Leu Asp Ala Pro Ala Cys Phe Arg Leu Leu His Gln Glu Arg Pro
100 105 110
Gly Pro His Arg Leu Tyr His Leu Trp Val Val Gly Ala Ala Asp Leu
115 120 125
Cys Val Pro Phe Leu Glu Tyr Ala Gln Lys Thr Arg Leu Gly Phe Arg
130 135 140
Phe Ile Ala Met Lys Thr Asn Asp Ala Trp Val Gly Glu Pro Trp Pro
145 150 155 160
Leu Pro Asp Arg Phe Leu Pro Glu Arg Thr Val Ser Trp Thr Pro Phe
165 170 175
Pro Ala Ala Pro Asn His Pro Leu Glu Asn Leu Leu Ser Arg Tyr Glu
180 185 190
Tyr Gln Tyr Gly Val Val Val Pro Gly Asp Arg Glu Arg Ser Cys Leu
195 200 205
Arg Trp Leu Arg Ser Leu Val Ala Pro His Asn Lys Pro Arg Pro Ala
210 215 220
Ser Ser Arg Pro His Pro Ala Thr His Pro Thr Gln Arg Pro Cys Phe
225 230 235 240
Thr Cys Met Gly Arg Pro Glu Ile Pro Asp Glu Pro Ser Trp Gln Thr
245 250 255
Gly Asp Asp Asp Pro Gln Asn Pro Gly Pro Pro Leu Ala Val Gly Asp
260 265 270
Glu Trp Pro Pro Ser Ser His Val Cys Tyr Pro Ile Thr Asn Leu
275 280 285
(2) INFORMATION FOR SEQ ID NO: 176:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 507 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:
Val Gly Gly Cys Val Asp Lys Leu Pro Leu Leu Lys Thr Pro Gly Pro
1 5 10 15
Val Arg Ala Arg Trp Leu Ala Arg Ala Thr Arg Arg Met Ala Cys Arg
20 25 30
Lys Phe Cys Gly Val Tyr Arg Arg Pro Asp Lys Arg Gln Glu Ala Ser
35 40 45
Val Pro Pro Glu Thr Asn Thr Ala Pro Ala Phe Pro Ala Ser Thr Phe
50 55 60
Tyr Thr Pro Ala Glu Asp Ala Tyr Leu Ala Pro Gly Pro Pro Glu Thr
65 70 75 80
Ile His Pro Ser Arg Pro Pro Ser Pro Gly Glu Ala Ala Arg Leu Cys
85 90 95
Gln Leu Gln Glu Ile Leu Ala Gln Met His Ser Asp Glu Asp Tyr Pro
100 105 110
Ile Val Asp Ala Ala Gly Ala Glu Glu Glu Asp Glu Ala Asp Asp Asp
115 120 125
Ala Pro Asp Asp Val Ala Tyr Pro Glu Asp Tyr Ala Glu Gly Arg Phe
130 135 140
Leu Ser Met Val Ser Ala Ala Pro Leu Pro Gly Ala Ser Gly His Pro
145 150 155 160
Pro Val Pro Gly Arg Ala Ala Pro Pro Asp Val Arg Thr Cys Asp Ser
165 170 175
Gly Lys Val Gly Ala Thr Gly Phe Thr Pro Glu Glu Leu Asp Thr Met
180 185 190
Asp Arg Glu Ala Leu Arg Ala Ile Ser Arg Gly Cys Lys Pro Pro Ser
195 200 205
Thr Leu Ala Lys Leu Val Thr Gly Leu Gly Phe Ala Ile His Gly Ala
210 215 220
Leu Ile Pro Gly Ser Glu Gly Cys Val Phe Asp Ser Ser His Pro Asn
225 230 235 240
Tyr Pro His Arg Val Ile Val Lys Ala Gly Trp Tyr Ala Ser Thr Asn
245 250 255
His Glu Ala Arg Leu Leu Arg Arg Leu Asn His Pro Ala Ile Leu Pro
260 265 270
Leu Leu Asp Leu His Val Val Ser Gly Val Thr Cys Leu Val Leu Pro
275 280 285
Lys Tyr His Cys Asp Leu Tyr Thr Tyr Leu Ser Lys Arg Pro Ser Pro
290 295 300
Leu Gly His Leu Gln Ile Thr Ala Val Ser Arg Gln Leu Leu Ser Ala
305 310 315 320
Ile Asp Tyr Val His Cys Glu Gly Ile Ile His Arg Asp Ile Lys Thr
325 330 335
Glu Asn Ile Leu Ile Asn Thr Pro Glu Asn Ile Cys Leu Gly Asp Phe
340 345 350
Gly Ala Ala Cys Phe Val Arg Gly Cys Arg Ser Ser Pro Phe His Tyr
355 360 365
Gly Ile Ala Gly Thr Ile Asp Thr Asn Ala Pro Glu Val Leu Ala Gly
370 375 380
Asp Pro Tyr Thr Gln Val Ile Asp Ile Trp Ser Ala Gly Leu Val Ile
385 390 395 400
Phe Glu Thr Ala Val His Thr Ala Ser Leu Phe Ser Ala Pro Arg Asp
405 410 415
Pro Glu Arg Arg Pro Cys Asp Asn Gln Ile Ala Arg Ile Ile Arg Gln
420 425 430
Ala Gln Val His Val Asp Glu Phe Pro Thr His Ala Glu Ser Arg Leu
435 440 445
Thr Ala His Tyr Arg Ser Arg Ala Ala Gly Asn Asn Arg Pro Ala Trp
450 455 460
Trp Ala Trp Thr Arg Tyr Tyr Lys Ile His Thr Asp Val Glu Tyr Leu
465 470 475 480
Ile Cys Lys Ala Leu Thr Phe Asp Ala Ala Leu Arg Pro Ser Ala Ala
485 490 495
Glu Leu Leu Arg Leu Pro Leu Phe His Pro Lys
500 505
(2) INFORMATION FOR SEQ ID NO: 177:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 392 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177:
Val Cys Ile Ala Tyr His Gly Met Gly Arg Leu Thr Ser Gly Val Gly
1 5 10 15
Thr Ala Ala Leu Leu Val Val Ala Val Gly Leu Arg Val Val Cys Ala
20 25 30
Lys Tyr Ala Asp Pro Ser Leu Lys Met Ala Asp Pro Asn Arg Phe Arg
35 40 45
Gly Lys Asn Leu Pro Val Leu Asp Gln Leu Thr Asp Pro Pro Gly Val
50 55 60
Lys Arg Val Tyr His Ile Gln Pro Ser Leu Glu Asp Pro Phe Gln Pro
65 70 75 80
Pro Ser Ile Pro Ile Thr Val Tyr Tyr Ala Val Leu Glu Arg Ala Cys
85 90 95
Arg Ser Val Leu Leu His Ala Pro Ser Glu Ala Pro Gln Ile Val Arg
100 105 110
Gly Ala Ser Asp Glu Ala Arg Lys His Thr Tyr Asn Leu Thr Ile Ala
115 120 125
Trp Tyr Arg Met Gly Asp Asn Cys Ala Ile Pro Ile Thr Val Met Glu
130 135 140
Tyr Thr Glu Cys Pro Tyr Asn Lys Ser Leu Gly Val Cys Pro Ile Arg
145 150 155 160
Thr Gln Pro Arg Trp Ser Tyr Tyr Asp Ser Phe Ser Ala Val Ser Glu
165 170 175
Asp Asn Leu Gly Phe Leu Met His Ala Pro Ala Phe Glu Thr Ala Gly
180 185 190
Thr Tyr Leu Arg Leu Val Lys Ile Asn Asp Trp Thr Glu Ile Thr Gln
195 200 205
Phe Ile His Arg Ala Arg Ala Ser Cys Lys Tyr Ala Leu Pro Leu Arg
210 215 220
Ile Pro Pro Ala Ala Cys Leu Thr Ser Lys Ala Tyr Gln Gln Gly Val
225 230 235 240
Thr Val Asp Ser Ile Gly Met Leu Pro Arg Phe Ile Pro Glu Asn Gln
245 250 255
Arg Thr Val Ala Lys Leu Lys Ile Ala Gly Trp His Gly Pro Lys Pro
260 265 270
Pro Tyr Thr Ser Thr Leu Leu Pro Pro Glu Leu Ser Asp Thr Thr Asn
275 280 285
Ala Thr Gln Pro Glu Leu Val Pro Glu Asp Pro Glu Asp Ser Ala Leu
290 295 300
Leu Glu Asp Pro Ala Gly Thr Val Ser Ser Gln Ile Pro Pro Asn Trp
305 310 315 320
His Ile Pro Ser Ile Gln Asp Val Ala Pro His His Ala Pro Ala Ala
325 330 335
Pro Ser Asn Pro Gly Leu Ile Ile Gly Ala Gly Ser Thr Leu Ala Val
340 345 350
Leu Val Ile Gly Gly Ile Ala Phe Trp Val Arg Arg Arg Ala Gln Met
355 360 365
Ala Pro Lys Arg Leu Arg Leu Pro His Ile Arg Asp Asp Asp Ala Pro
370 375 380
Pro Ser His Gln Pro Leu Phe Tyr
385 390
(2) INFORMATION FOR SEQ ID NO: 178:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 392 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178:
Val Cys Ile Ala Tyr His Gly Met Gly Arg Leu Thr Ser Gly Val Gly
1 5 10 15
Thr Ala Ala Leu Leu Val Val Ala Val Gly Leu Arg Val Val Cys Ala
20 25 30
Lys Tyr Ala Asp Pro Ser Leu Lys Met Ala Asp Pro Asn Arg Phe Arg
35 40 45
Gly Lys Asn Leu Pro Val Leu Asp Gln Leu Thr Asp Pro Pro Gly Val
50 55 60
Lys Arg Val Tyr His Ile Gln Pro Ser Leu Glu Asp Pro Phe Gln Pro
65 70 75 80
Pro Ser Ile Pro Ile Thr Val Tyr Tyr Ala Val Leu Glu Arg Ala Cys
85 90 95
Arg Ser Val Leu Leu His Ala Pro Ser Glu Ala Pro Gln Ile Val Arg
100 105 110
Gly Ala Ser Asp Glu Ala Arg Lys His Thr Tyr Asn Leu Thr Ile Ala
115 120 125
Trp Tyr Arg Met Gly Asp Asn Cys Ala Ile Pro Ile Thr Val Met Glu
130 135 140
Tyr Thr Glu Cys Pro Tyr Asn Lys Ser Leu Gly Val Cys Pro Ile Arg
145 150 155 160
Thr Gln Pro Arg Trp Ser Tyr Tyr Asp Ser Phe Ser Ala Val Ser Glu
165 170 175
Asp Asn Leu Gly Phe Leu Met His Ala Pro Ala Phe Glu Thr Ala Gly
180 185 190
Thr Tyr Leu Arg Leu Val Lys Ile Asn Asp Trp Thr Glu Ile Thr Gln
195 200 205
Phe Ile His Arg Ala Arg Ala Ser Cys Lys Tyr Ala Leu Pro Leu Arg
210 215 220
Ile Pro Pro Ala Ala Cys Leu Thr Ser Lys Ala Tyr Gln Gln Gly Val
225 230 235 240
Thr Val Asp Ser Ile Gly Met Leu Pro Arg Phe Ile Pro Glu Asn Gln
245 250 255
Arg Thr Val Ala Lys Leu Lys Ile Ala Gly Trp His Gly Pro Lys Pro
260 265 270
Pro Tyr Thr Ser Thr Leu Leu Pro Pro Glu Leu Ser Asp Thr Thr Asn
275 280 285
Ala Thr Gln Pro Glu Leu Val Pro Glu Asp Pro Glu Asp Ser Ala Leu
290 295 300
Leu Glu Asp Pro Ala Gly Thr Val Ser Ser Gln Ile Pro Pro Asn Trp
305 310 315 320
His Ile Pro Ser Ile Gln Asp Val Ala Pro His His Ala Pro Ala Ala
325 330 335
Pro Ser Asn Pro Gly Leu Ile Ile Gly Ala Gly Ser Thr Leu Ala Val
340 345 350
Leu Val Ile Gly Gly Ile Ala Phe Trp Val Arg Arg Arg Ala Gln Met
355 360 365
Ala Pro Lys Arg Leu Arg Leu Pro His Ile Arg Asp Asp Asp Ala Pro
370 375 380
Pro Ser His Gln Pro Leu Phe Tyr
385 390
(2) INFORMATION FOR SEQ ID NO: 179:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 429 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179:
Val Tyr Leu Trp Ala Arg Val Gly Gly Trp Leu Gly Tyr Leu Gly Gly
1 5 10 15
Thr Trp Thr Pro His Lys Gly Ser Leu Glu Gly Gly Lys Leu Gly Gln
20 25 30
Phe Ile Gly Arg Glu Arg Gly Ala Arg Thr Ala Val Pro Thr Ile Ser
35 40 45
His Arg Ala His Ser His Leu Asp Pro Ser Asp Pro Gly Met Pro Gly
50 55 60
Arg Ser Leu Gln Gly Leu Ala Ile Leu Gly Leu Trp Val Cys Ala Thr
65 70 75 80
Gly Leu Val Val Arg Gly Pro Thr Val Ser Leu Val Ser Asp Ser Leu
85 90 95
Val Asp Ala Gly Ala Val Gly Pro Gln Gly Phe Val Glu Glu Asp Leu
100 105 110
Arg Val Phe Gly Glu Leu His Phe Val Gly Ala Gln Val Pro His Thr
115 120 125
Asn Tyr Tyr Asp Gly Ile Ile Glu Leu Phe His Tyr Pro Leu Gly Asn
130 135 140
His Cys Pro Arg Val Val His Val Val Thr Leu Thr Ala Cys Pro Arg
145 150 155 160
Arg Pro Ala Val Ala Phe Thr Leu Cys Arg Ser Thr His His Ala His
165 170 175
Ser Pro Ala Tyr Pro Thr Leu Glu Leu Gly Leu Ala Arg Gln Pro Leu
180 185 190
Leu Arg Val Arg Thr Ala Thr Arg Asp Tyr Ala Gly Val Leu Arg Val
195 200 205
Trp Val Gly Ser Ala Thr Asn Ala Ser Leu Phe Val Leu Gly Val Ser
210 215 220
Ala Asn Gly Thr Phe Val Tyr Asn Gly Ser Asp Tyr Gly Ser Cys Asp
225 230 235 240
Pro Ala Gln Leu Pro Phe Ser Ala Pro Arg Leu Gly Pro Ser Ser Val
245 250 255
Tyr Thr Pro Gly Ala Ser Arg Pro Thr Pro Pro Arg Thr Thr Thr Ser
260 265 270
Pro Ser Ser Pro Arg Asp Pro Thr Pro Ala Pro Gly Asp Thr Gly Thr
275 280 285
Pro Ala Pro Ala Ser Gly Glu Arg Ala Pro Pro Asn Ser Thr Arg Ser
290 295 300
Ala Ser Glu Ser Arg His Arg Leu Thr Val Ala Gln Val Ile Gln Ile
305 310 315 320
Ala Ile Pro Ala Ser Ile Ile Ala Phe Val Phe Leu Gly Ser Cys Ile
325 330 335
Cys Phe Ile His Arg Cys Gln Arg Arg Tyr Arg Arg Pro Arg Gly Gln
340 345 350
Ile Tyr Asn Pro Gly Gly Val Ser Cys Ala Val Asn Glu Ala Ala Met
355 360 365
Ala Arg Leu Gly Ala Glu Leu Arg Ser His Pro Asn Thr Pro Pro Lys
370 375 380
Pro Arg Arg Arg Ser Ser Ser Ser Thr Thr Met Pro Ser Leu Thr Ser
385 390 395 400
Ile Ala Glu Glu Ser Glu Pro Gly Pro Val Val Leu Leu Ser Val Ser
405 410 415
Pro Arg Pro Arg Ser Gly Pro Thr Ala Pro Gln Glu Val
420 425
(2) INFORMATION FOR SEQ ID NO: 180:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 430 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180:
Met Arg Ala Gly Leu Val Phe Phe Val Gly Val Trp Val Val Ser Cys
1 5 10 15
Leu Ala Ala Ala Pro Arg Thr Ser Trp Lys Arg Val Thr Ser Gly Glu
20 25 30
Asp Val Val Leu Leu Pro Ala Pro Ala Gly Pro Glu Glu Arg Thr Arg
35 40 45
Ala His Lys Leu Leu Trp Ala Ala Glu Pro Leu Asp Ala Cys Gly Pro
50 55 60
Leu Arg Pro Ser Trp Val Trp Pro Pro Arg Arg Val Leu Glu Thr Val
65 70 75 80
Val Asp Ala Ala Cys Met Arg Ala Pro Glu Pro Leu Ala Ile Ala Tyr
85 90 95
Ser Pro Pro Phe Pro Ala Gly Asp Glu Gly Ser Glu Leu Ala Trp Arg
100 105 110
Asp Arg Val Ala Val Val Asn Glu Ser Leu Val Ile Tyr Gly Ala Leu
115 120 125
Glu Thr Asp Ser Gly Thr Leu Ser Val Val Gly Leu Ser Asp Glu Ala
130 135 140
Arg Gln Val Ala Ser Val Val Leu Val Val Glu Pro Ala Pro Val Pro
145 150 155 160
Thr Pro Thr Pro Asp Asp Tyr Asp Glu Glu Asp Asp Ala Gly Val Ser
165 170 175
Thr Pro Val Ser Val Pro Pro Pro Thr Pro Pro Arg Arg Pro Pro Val
180 185 190
Ala Pro Pro Thr His Pro Arg Val Ile Pro Glu Val Ser His Val Arg
195 200 205
Gly Val Thr Val His Met Pro Glu Ala Ile Leu Phe Ala Pro Gly Glu
210 215 220
Thr Phe Gly Thr Asn Val Ser Ile His Ala Ile Ala His Asp Asp Gly
225 230 235 240
Pro Tyr Ala Met Asp Val Val Trp Met Arg Phe Asp Val Pro Ser Ser
245 250 255
Cys Ala Glu Met Arg Ile Tyr Glu Ala Cys Leu Tyr His Pro Gln Leu
260 265 270
Pro Glu Cys Leu Ser Pro Ala Asp Ala Pro Cys Ala Val Ser Ser Trp
275 280 285
Ala Tyr Arg Leu Ala Val Arg Ser Tyr Ala Gly Cys Ser Arg Thr Thr
290 295 300
Pro Pro Pro Arg Cys Phe Ala Glu Ala Arg Met Glu Pro Val Pro Gly
305 310 315 320
Leu Ala Trp Leu Ala Ser Thr Val Asn Leu Glu Phe Gln His Asp Gln
325 330 335
His Ala Gly Leu Cys Val Val Tyr Val Asp Asp His Ile His Ala Trp
340 345 350
Gly His Met Thr Ile Ser Thr Ala Ala Gln Tyr Arg Asn Ala Val Val
355 360 365
Glu Gln His Leu Pro Gln Arg Gln Pro Glu Pro Val Glu Pro Trp His
370 375 380
Val Arg Ala Pro Pro Pro Ala Pro Ser Arg Pro Leu Arg Leu Gly Ala
385 390 395 400
Val Leu Gly Ala Ala Leu Leu Leu Ala Ala Leu Gly Leu Ser Ala Trp
405 410 415
Gly Val His Asp Leu Leu Ala Gln Ala Leu Leu Ala Gly Gly
420 425 430
(2) INFORMATION FOR SEQ ID NO: 181:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 41 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181:
Val His Ala Val Asp Ala Pro Ser Gln Phe Val Thr Trp Leu Ala Val
1 5 10 15
Arg Trp Leu Arg Gly Ala Val Gly Leu Gly Ala Val Leu Cys Gly Ile
20 25 30
Ala Phe Tyr Val Thr Ser Ile Arg Ala
35 40
(2) INFORMATION FOR SEQ ID NO: 182:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 85 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182:
Met Thr Ser Arg Pro Ala Asp Gln Asp Ser Val Arg Ser Ser Ala Ser
1 5 10 15
Val Pro Leu Tyr Pro Ala Asp Val Pro Ala Glu Ala Tyr Tyr Ser Glu
20 25 30
Ser Glu Asp Glu Ala Ala Asn Asp Phe Leu Val Arg Met Gly Arg Gln
35 40 45
Gln Ser Val Leu Arg Arg Arg Arg Arg Arg Thr Arg Cys Val Gly Leu
50 55 60
Val Ile Ala Cys Leu Val Val Leu Ser Gly Gly Phe Gly Ala Leu Leu
65 70 75 80
Val Trp Leu Leu Arg
85
(2) INFORMATION FOR SEQ ID NO: 183:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 296 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183:
Met Ile Arg Arg Arg Gly Asn Val Glu Ile Arg Val Tyr Tyr Glu Ser
1 5 10 15
Val Arg Pro Ser Arg Ser Arg Ser His Leu Lys Pro Ser Asp His Gln
20 25 30
Glu Phe Pro Gly His His Val Ser Pro Gly Ser Pro Gly Phe Pro Glu
35 40 45
Ser Pro Gly Asn Arg Glu Phe His Asp Leu Pro Glu Asn Pro Gly Ser
50 55 60
Arg Ala Tyr Pro Gly Thr Arg Asp Pro His Asp Pro His Gly Cys Pro
65 70 75 80
Gly Ser Leu Asp Pro His Gly Asn Pro Ala Gln Pro Ala Gly Leu Pro
85 90 95
Ser Pro Val Pro Tyr Ala Pro Leu Gly Ser Pro Asp Pro Ser Ser Pro
100 105 110
Arg Gln Arg Thr Tyr Val Leu Pro Arg Val Gly Ile Arg Asn Ala Pro
115 120 125
Ala Ser Asp Thr Arg Ala Pro Lys Arg Ala His Ser Arg His Arg Ala
130 135 140
Asp Arg Pro Pro Glu Ser Pro Gly Ser Glu Leu Tyr Pro Leu Asn Ala
145 150 155 160
Gln Ala His Leu Gln Met Leu Pro Ala Asp His Arg Ala Phe Phe Arg
165 170 175
Thr Val Ile Glu Val Ser Arg Leu Cys Ala Leu Asn Thr His Asp Pro
180 185 190
Pro Pro Pro Leu Ala Gly Ala Arg Val Gly Gln Glu Ala Gln Leu Val
195 200 205
His Thr Gln Trp Leu Arg Ala Asn Arg Glu Ser Ser Pro Leu Trp Pro
210 215 220
Trp Arg Thr Ala Ala Met Asn Phe Ile Ala Ala Ala Ala Pro Cys Val
225 230 235 240
Gln Thr His Met His Asp Leu Leu Met Ala Cys Ala Phe Trp Cys Cys
245 250 255
Leu Ala His Ala Ser Thr Cys Ser Tyr Ala Gly Ser Ala His Cys Gln
260 265 270
His Leu Phe Arg Ala Phe Gly Cys Gly Pro Pro Val Leu Thr Thr Ser
275 280 285
Arg Gly Gln Gly Gly Trp Cys Asn
290 295
(2) INFORMATION FOR SEQ ID NO: 184:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 178 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184:
ATGCACGTGT AACCGCCAGT CCGTGCTTGC CTAGCGAACT CACCCGTCCC GGCTGGCGTG 60
CGCAGCCCGG GCCGTGTTGC GGGCCCTCTT AAGGGGCGGC GGCAGGACGG GGACTCCGCC 120
CCGCCTCCTT TCCCCCGGGG AGTCAACCCC CGGGGGGGTG TATTCTGGGG GGGGGGT 178
(2) INFORMATION FOR SEQ ID NO: 185:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2116 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185:
GACCCAGATC CCACCCCCGC CCGCAACGGG GCGCCGCCGC TGCTGCTGCT CCGCGGGGCG 60
CCAGGGGGCG CCGGTCGGGT CGCGGCGGGC TGGGAGGTTC CGCGGGTCGC CCCCGCACCG 120
CCGCCCCCGC GCCGGGGCGC TCTTCGGGGG GCGGGCGGGA CGTAGTCCGC TGCAGAGGGA 180
GACAGAGACG GGAACCCCCG GTTAGTGCCN GACCCCCGCC CGACCCCCGC CCAGTGCCCG 240
ACCCCCGCCC GACCCCCGCC CGACCCCCGC CCAGTGCCCG ACCCCCGCCC AGTGCCCGAC 300
CCCCGCCCAG TGCCCGACCC CCGCCCAGTG CCCGACCCCC GCCCGCCCTC ACCGTCGGCC 360
AGGTCATCGT CCTCGTCGTC CGTGCCGGGC CACGGGGGGG TGGGCGACAG GGCGCGGACC 420
GTGTGTCCCC CCAGCGACAG GGAGCGCGGG GCCGTCCGCG GGTTGCCCGT CCAGATAAAG 480
TCCACGGCCG TGCCGGACCG CACGGCCGCC TCGGCCTCCA CGCGGGTCCG GGGGTCGTTC 540
ACTATCGGGA TGGTGCTGAA CGACCCGCTG GCGGTCACGC CCACTATCAG GTACGCCACC 600
GGGGTGTTGC ACAGGGGACA CGTGTTGCGC AACGGAATCC AGGTCTTCAT GCACGGGATG 660
CAGAAGGGGT GCAGGCAGGG AAAACTCTGG CAGCGCAGGG GCGGGGCGAT CTCGTCCGTG 720
CACACGGCAC ACACGTCGCC CCCCCCTCCC GCTTCCGCTT CCTCCTCACC CACGGGCCCA 780
CCCCCGCAGG ATCCCTGCGC GTCGGCGGGC GTGGGGCTGC CCTGGCGCTC GGCCGGGGGC 840
CGGGCCGGGG GCGTGGCCGC GTCCATCAGG CCCGCCTCGA ACATCTCCGT GTCCGTGCTG 900
CCCGCCTCGG AGGTGGAGTC GCGGTGAAGG TCGTCGTCAG AGATTCCCAC CTCGGTCTCC 960
TCCTCCGAGT CGCTGCTGGC GAGCCACTGC ATGTCGTTGA GCATCCCCCA GGCGTGCGGG 1020
GCGGCGGGCT GCTTGACAAA GAAACGGGGG GGGATTTAGA GGGCGCGGGG CGTGAGGCGG 1080
GACCCCCGTG CCGTGTCCCC CGTGTCCCTC CCTCACCCCG GCCCCCCGCC CGCTGCTTTT 1140
TGTTCGGAAG GGGGGGAGAA AGGGGTCCGT AACCAAAGGT GGTCTGCGTC CTTTGGATTC 1200
CGACCCCTCG TCTCCCCCCC CCTGTCCCCC GCTCTCGGGC TCAGGGCTCC CTGCCTCCCT 1260
CGCCCCCCCA AAGGGTCGGG GGGCGGCGCA CGGCCCACGG GGGTCCCCCG ACCGCTTAAG 1320
CGGGCCGGGG GTCGGCCCCG TCAAGCGTCC CCGCCCCCGA GCCCACCGCC CGCGACCACC 1380
CCCAACCCGC AGCCGGGTGG TCCGGGGAAA AGGGGGGGCC TGAGACCCGG GGGTCGCCCT 1440
CTCACCGTGC CGGGGGTCTG CCGCGGCGGC CGCTCGGGGC CGGGGTCCGC CCGGGAGCTC 1500
GTGCCGGGCC GGGGTTCCAT GAGCCGGGGT AGGGTAGACT CGAGACGGCG GCCCGCGGTC 1560
TCTCTCTTGC CGGGTGTTAG TCTCTGTCTC TCCGGGTCTC CTCCTCCCGC CGGGCCGCCG 1620
CTCCGTCGCT CGCAGTGCCG GGGTGCGAAT GCGGCCCGAC CGTCACACGG GGCTGCCTTA 1680
TACCCGGCGC CTATCCACTC CCCCAAAGGG GCGGCATTTA CGATTCCCCC AATAGCCGCG 1740
CGCCCCGGCG GGGGCGGAGG GAGGGAATCC CCCCCTCTCG GGGCGGCCCC GTCCCCGGGG 1800
ACCAACCGGG TGTACTCCAA GAACCCCATT AGCATGCGCC GCCCCCCGCC GACGCAGATG 1860
GGAGTCCCCC CGGCGCCCCG CCGGCGCGGC CCTGAGTGGT GCCCGCCCCC GGGGAGAAAT 1920
TCATTAGCAT ACTAGGAAGC CCAGGGGACC AATAGGGGCC GATCAGCCCA CCCACCCGGC 1980
GGCGCGCGAG GCTCTGCGTG TTCTGCCAAG AAAGTAATCA GCATAACCCG GAACCCCGAG 2040
GGAGTAATTA CGCGGGGAGC GAGGGGCCGT CCGAACGTTT TTAATTACCA TAAGCGGGAA 2100
TGGCGGCCCG TTAAA 2116
(2) INFORMATION FOR SEQ ID NO: 186:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 338 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186:
Met Leu Asn Asp Met Gln Trp Leu Ala Ser Ser Asp Ser Glu Glu Glu
1 5 10 15
Thr Glu Val Gly Ile Ser Asp Asp Asp Leu His Arg Asp Ser Thr Ser
20 25 30
Glu Ala Gly Ser Thr Asp Thr Glu Met Phe Glu Ala Gly Leu Met Asp
35 40 45
Ala Ala Thr Pro Pro Ala Arg Pro Pro Ala Glu Arg Gln Gly Ser Pro
50 55 60
Thr Pro Ala Asp Ala Gln Gly Ser Cys Gly Gly Gly Pro Val Gly Glu
65 70 75 80
Glu Glu Ala Glu Ala Gly Gly Gly Gly Asp Val Cys Ala Val Cys Thr
85 90 95
Asp Glu Ile Ala Pro Pro Leu Arg Cys Gln Ser Phe Pro Cys Leu His
100 105 110
Pro Phe Cys Ile Pro Cys Met Lys Thr Trp Ile Pro Leu Arg Asn Thr
115 120 125
Cys Pro Leu Cys Asn Thr Pro Val Ala Tyr Leu Ile Val Gly Val Thr
130 135 140
Ala Ser Gly Ser Phe Ser Thr Ile Pro Ile Val Asn Asp Pro Arg Thr
145 150 155 160
Arg Val Glu Ala Glu Ala Ala Val Arg Ser Gly Thr Ala Val Asp Phe
165 170 175
Ile Trp Thr Gly Asn Pro Arg Thr Ala Pro Arg Ser Leu Ser Leu Gly
180 185 190
Gly His Thr Val Arg Ala Leu Ser Pro Thr Pro Pro Trp Pro Gly Thr
195 200 205
Asp Asp Glu Asp Asp Asp Leu Ala Asp Gly Glu Gly Gly Arg Gly Ser
210 215 220
Gly Thr Gly Arg Gly Ser Gly Thr Gly Arg Gly Ser Gly Thr Gly Arg
225 230 235 240
Gly Ser Gly Thr Gly Arg Gly Ser Gly Gly Gly Arg Ala Gly Val Gly
245 250 255
His Trp Ala Gly Val Gly Arg Gly Xaa Gly Thr Asn Arg Gly Phe Pro
260 265 270
Ser Leu Ser Pro Ser Ala Ala Asp Tyr Val Pro Pro Ala Pro Arg Arg
275 280 285
Ala Pro Arg Arg Gly Gly Gly Gly Ala Gly Ala Thr Arg Gly Thr Ser
290 295 300
Gln Pro Ala Ala Trp Ala Pro Pro Gly Ala Pro Arg Ser Ser Ser Ser
305 310 315 320
Gly Gly Ala Pro Leu Arg Ala Gly Val Gly Ser Gly Ser Xaa Xaa Xaa
325 330 335
Xaa Xaa
(2) INFORMATION FOR SEQ ID NO: 187:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 642 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187:
CGGCGGCGTT TCCGCGTTCC GTTTCTTCTC CCTCCCGGCC GCCCCGCTCC CGGGCCCGAC 60
CCTCGCCCCT TCCCTTCTCC TCGTCTTCCC CCGTCCCGCC GCGCCCCTTC CCTCTTCCTT 120
CTCTCTCTCT GTCTCGCTGT CTCGCTCTCC TCACATTTCC CCCCCCCCCC CCCGCCGCCG 180
CCGCCGCCCT CTGCCCGCGT CCCACCGAGA CGCCGCGCCG CGTGAGCCGT CCGCCGGGGG 240
ACCCAGGCTC CGGGGGGGGG GCGCGCCTGC GTGTGTCTCG TGTGAGAGAG CGCGCCCCTC 300
GAACGCCGCG CGTTCTCGCA GGTAGGTTTA GGGTCGTACA GGTGAGCTTC TGCTGAGGCG 360
GCGGGAGAGG GGGGGGCGGG CGGAAGAGAG AAGAGAGCAG GGGTTGGGGG AAAACTGTTC 420
TTCCTCCCCC TTTCAAGAAA CACGAGGCGG GGGTCCCAGA AAGGGCAGGC AGGTCAGCCG 480
CACCGCCCGC GAGCCAACCC GTATCCTTTT TTTCTAGGTG TTTTTGTTTT TGTTTCTGTT 540
TTTGTTTGTT TTGTTATTAT TTTCGCGGAT CCGGCGTGTT CGGATCCACC CCCCCCTTTC 600
TCCTTCCTCT TCCCTTCCAC CCACCCCCGT TTCCCCCCCC C 642
(2) INFORMATION FOR SEQ ID NO: 188:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 353 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188:
CGCGCCCCCG CCCGGCCGCC GCGCGCCCCC GCCCGACCGC CGCGCGCCCC CGCCCGGCCG 60
CCGCGCGCCC CCGCCCGGCC GCCCGCGTCG CGCCGGCGCC CCCTCCCGGC GCTTCCGGGG 120
CCTTTCCTTC CTTCCCCGCC GCGACCCCGG CCCCGCCCCA CCGCCCCGCC CGGCAGGGGG 180
GCCCCGGCGC CGCGCAGAAC ACACAGACGA ACACACGGTG GCGATCTTTT CTTTACTTCG 240
GCAGACCAGC GAGCCCCGGC CCCGGCCCGC GCCCCGCCGC CACACCCACG GCACCCCCCC 300
CGCCGCCCAC CCCGGGGTCC ACACAGGAGC GCGCGGGCGG CAGAAACGCG GG 353
(2) INFORMATION FOR SEQ ID NO: 189:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6386 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:
CACGCGCCGC ATTTGCGCCC GGGGCCCCGC GCTCCCCCCG GGCGGCCTGG CCGTCGGGGG 60
CCAGATGTAC GTGAACCGCA ACGAGATCTT CAACGCCGCG CTGGCCGTTA CGAACATCAT 120
CCTGGATCTG GACATCGCCC TGAAGGAGCC CGTCCCCTTT CCCCGGCTCC ACGAGGCCCT 180
GGGTCACTTT AGGCGCGGGG CGCTGGCGGC GGTTCAGCTG TTGTTTCCCG CGGCCCGCGT 240
AGACCCCGAC GCCTATCCCT GTTATTTTTT CAAAAGCGCC TGTCGGCCCC GCGCGCCGCC 300
CGTCTGTGCG GGCGACGGGC CCTCGGCCGG TGGCGACGAC GGCGACGGGG ACTGGTTCCC 360
CGACGCCGGT GGCGACGACG GCGACGAGGA GTGGGAGGAG GACACGGACC CCATGGACAC 420
GACCCACGGC CCCCTCCCGG ACGACGAGGC CGCGTACCTC GACCTGCTAC ACGAACAGAT 480
ACCAGCGGCG ACGCCCAGCG AACCGGACTC CGTCGTGTGT TCCTGCGCCG ACAAGATCGG 540
GCTGCGCGTG TGCCTACCGG TCCCCGCCCC GTACGTTGTG CACGGCTCCC TGACGATGCG 600
TGGGGTGGCG AGGGTGATCC AGCAGGCGGT GCTGTTGGAC CGCGACTTCG TGGAGGCCGT 660
AGGGAGCCAC GTAAAGAACT TTTTGCTGAT CGATACGGGC GTGTACGCCC ACGGCCACAG 720
CCTGCGCTTG CCGTATTTCG CCAAGATCGG CCCCGACGGC TCCGCGTGCG GCCGGTTATT 780
GCCCGTCTTC GTGATCCCCC CCGCGTGCGA GGACGTTCCG GCGTTCGTCG CCGCGCACGC 840
CGACCCGCGG CGCTTCCACT TTCACGCCCC GCCCATGTTT TCCGCGGCCC CGCGGGAGAT 900
CCGCGTCCTC CACAGCCTGG GCGGGGACTA TGTCAGCTTT TTCGAGAAGA AGGCGTCGCG 960
CAACGCCCTG GAGCACTTTG GGCGACGCGA GACCCTGACG GAGGTTCTGG GCCGCTACGA 1020
TGTGCGGCCC GACGCCGGGG AGACCGTGGA GGGGTTCGCG TCAGAACTGC TGGGGCGAAT 1080
AGTCGCGTGC ATCGAGGCCC ACTTTCCCGA GCACGCGCGG GAATATCAGG CCGTGTCCGT 1140
TCGCCGGGCC GTCATTAAGG ACGACTGGGT CCTGCTGCAG CTGATCCCCG GCCGCGGCGC 1200
CCTGAACCAA AGCCTCTCGT GTCTGCGCTT CAAGCACGGC AGGGCAAGTC GCGCGACGGC 1260
CCGGACCTTT CTCGCGCTGA GCGTCGGGAC CAACAACCGC CTATGCGCGT CCCTGTGTCA 1320
GCAGTGCTTT GCCACTAAAT GCGATAACAA CCGCCTGCAC ACGCTGTTTA CCGTCGATGC 1380
GGGCACGCCA TGCTCGCGGT CCGCTCCCTC CAGCACCTCA CGACCGTCAT CTTCATAACG 1440
GCCTACGGCC TCGTGCTCGC GTGGTACATC GTCTTTGGTG CCAGTCCGCT CCACCGATGT 1500
ATTTACGCGG TGCGCCCCGC CGGGGCACAC AACGATACCG CCCTCGTGTG GATGAAGATA 1560
AACCAGACGC TGTTGTTTCT GGGCCCGCCG ACCGCCCCCC CCGGCGGGGC ATGGACCCCC 1620
CACGCCCACG TCTGCTACGC CAATATCATC GAAGGTCGGG CCGTGTCCCT CCCGGCCATC 1680
CCCGGCGCCA TGAGCCGCCG GGTCATGAAC GTGCACGAGG CCGTAAACTG CTTGGAGGCC 1740
CTCTGGGACA CCCAGATGCG CCTGGTGGTC GTCGGTTGGT TTCTGTATCT AGCGTTCGTC 1800
GCCCTTCACC AACGACGATG CATGTTCGGC GTCGTGAGTC CCGCGCACAG CATGGTGGCC 1860
CCGGCGACCT ATCTTTTGAA CTACGCCGGC CGCATAGTGT CGAGCGTGTT CTTGCAATAC 1920
CCCTACACGA AAATCACCCG CCTCCTCTGC GAGCTATCCG TTCAACGCCA GACCCTGGTG 1980
CAGCTGTTCG AGGCGGATCC GGTCACCTTC TTGTACCACC GCCCGGCCGT TGGCGTCATC 2040
GTGGGCTGCG AGCTGCTGCT CCGCTTCGTG GCCCTCGGTC TCATCGTCGG CACCGCTCTC 2100
ATCTCCCGGG GCGCCTGCGC GATCACATAC CCCCTGTTTC TAACAATCAC CACCTGGTGT 2160
TTCGTGTCCA TCATCGCCCT GACGGAGCTG TATTTCATCC TGCGGCGGGA CTCGGCCCCC 2220
AAAAACGCGG AACCAGCGGC CCCCAGGGGG CGCTCCAAAG GGTGGTCGGG CGTCTGCGGG 2280
CGCTGCTGTT CCATCATCCT CTCCGGTATC GCCGTGCGCC TGTGCTATAT CGCCGTCGTG 2340
GCCGGGGTGG TGCTTATGGC GCTTCGCTAC GAACAGGAGA TTCAGCGGCG CCTGTTTGAT 2400
CTGTGACGTA ACGCCTCTTC CGTTGGAAGA GGCGGACCCA GTCGCCCATG CAAATTAAAT 2460
ACACGACCCG CCTCGGGCCT ACGCACCCTC GCACGTCGCA TGCAAATTAA AATCGTGCAC 2520
AGAGCCGATC CGGCCTCGGG TCTGCTTGCC CCTCCCCCGG TCCAGCACAG GCAGGCTCGT 2580
CCGACTTCCG CATACACCCC ACCCTACCGC GTGCTTCCGC ACCCCCGCCT ACGCGTGTAC 2640
GCGAAGGCGG ACCCAGACCT GCCGTATGCT AATTAAATAC ATAAAACCCA CCCTCGGCGT 2700
CCGATTGGTT TCTGGGGACG GCGGGGGCGG GGGCGGTGAC GCCCGACGGG GAGGGACAAG 2760
GAGGAGTTTC GGAAAGCCGG CCCCGGTCGT GCGGGTATAA GGGCAGCCAC CGGCCCACTG 2820
GGCGCTGTGT GCTGCCGTGT GCCGACCCCG GTTGCGCGTC GGTGCCGCTC CTCGATTCGG 2880
ACCCGGCCAC TCTCTTCCGA CACGCGCCCC CTCGGAGGAC ACCCGCCATC CCAGCCCCGG 2940
CGACCTACAA CATGGCTACC GACATTGATA TGCTAATCGA CCTAGGATTG GACCTGTCCG 3000
ACAGCGAGCT CGAGGAGGAC GCTCTGGAGC GGGACGAGGA GGGCCGCCGC GACGACCCCG 3060
AGTCCGACAG CAGCGGGGAG TGTTCCTCGT CGGACGAGGA CATGGAAGAC CCCTGCGGAG 3120
ACGGAGGGGC GGAGGCCATC GACGCGGCGA TTCCCAAAGG TCCCCCGGCC CGCCCCGAGG 3180
ACGCCGGCAC CCCCGAAGCC TCGACGCCTC GCCCGGCAGC GCGGCGGGGA GCCGACGATC 3240
CGCCACCCGC GACCACCGGC GTGTGGTCGC GCCTCGGGAC CAGGCGGTCG GCTTCCCCCC 3300
GGGAACCGCA CGGGGGGAAG GTGGCCCGCA TCCAACCCCC GTCGACCAAG GCACCGCATC 3360
CCCGAGGCGG GCGGCGAGGT CGCCGCCGGG GCCGGGGTCG ATACGGCCCC GGCGGCGCCG 3420
ACTCCACACC AAACCCCCGC CGGCGCGTCT CCAGAAACGC CCACAACCAA GGGGGTCGCC 3480
ACCCCGCGTC GGCGCGGACG GACGGCCCCG GCGCCACCCA CGGCGAGGCG CGGCGCGGAG 3540
GGGAGCAGCT CGACGTCTCC GGGGGCCCGC GGCCACGAGG CACGCGCCAG GCCCCCCCTC 3600
CGCTGATGGC GCTGTCCCTG ACCCCCCCGC ACGCGGACGG CCGCGCCCCG GTCCCGGAGC 3660
GAAAGGCGCC CTCTGCCGAC ACCATCGACC CCGCCGTTCG GGCGGTTCTG CGATCCATAT 3720
CCGAGCGCGC GGCGGTCGAG CGCATCAGCG AAAGCTTTGG ACGCAGTGCC CTGGTCATGC 3780
AAGACCCCTT TGGCGGGATG CCGTTTCCCG CCGCGAACAG CCCCTGGGCT CCCGTGCTGG 3840
CCACCCAAGC GGGGGGGTTT GACGCCGAGA CCCGTCGGGT TTCCTGGGAA ACCCTGGTCG 3900
CTCACGGCCC GAGCCTCTAC CGCACATTCG CAGCCAACCC GCGGGCCGCG TCGACAGCCA 3960
AGGCCATGCG CGACTGCGTG CTGCGCCAGG AAAATCTCAT CGAGGCCCTG GCGTCCGCGG 4020
ATGAGACGCT GGCGTGGTGC AAGATGTGCA TTCACCACAA TCTGCCGCTC CGCCCCCAGG 4080
ACCCTATCAT CGGAACGGCG GCCGCCGTGC TGGAAAACCT CGCCACGCGC CTGCGCCCCT 4140
TTCTGCAGTG CTACCTGAAG GCCCGAGGCC TGTGCGGGCT GGACGACCTG TGCTCGCGGC 4200
GACGCCTGTC GGACATTAAG GATATTGCCT CCTTTGTGTT GGTCATCCTG GCCCGCCTCG 4260
CCAACCGCGT CGAGCGCGGC GTGTCGGAGA TCGACTACAC GACCGTGGGG GTTGGGGCCG 4320
GCGAGACGAT GCACTTTTAC ATCCCGGGGG CCTGCATGGC GGGTCTCATT GAAATACTGG 4380
ACACGCACCG CCAGGAGTGT TCCAGTCGCG TGTGCGAGCT GACGGCCAGT CACACTATCG 4440
CCCCCTTATA TGTGCACGGC AAATACTTCT ACTGCAACTC CCTATTTTAG GCAAGAATAA 4500
ACATATTGAC GTCAACCCAA GTGGTTCCGT GTGATGTTCT TGGCGCGCGC GGCGGGTGGG 4560
GCGGAGACTC CGGGGCGATG CCGGCGTGCG CGTGGGAGGA GGGCGATGAC CCACCGGATA 4620
AATGTGGGGC CCCGGCCCGG CCCGCTTCAT AGCGCGTCCA GGAACTCACG GCAGACGCGT 4680
ATTCACCGAC CCCCCCCCTC GCAACATGAC AACGACGCCC CTCTCGAACC TGTTTTTACG 4740
GGCCCCGGAC ATCACCCACG TCGCCCCCCC GTACTGTCTG AATGCCACGT GGCAGGCCGA 4800
AAACGCCCTG CACACGACCA AAACGGACCC CGCGTGCCTG GCCGCGCGGA GTTATTTAGT 4860
CCGCGCCTCC TGCTCGACCA GCGGCCCCAT CCACTGTTTT TTCTTTGCGG TGTACAAGGA 4920
CTCGCAGCAC TCCCTTCCGC TGGTTACCGA GCTCCGCAAC TTCGCGGACC TGGTCAACCA 4980
CCCGCCCGTC TTGCGCGAAC TAGAGGATAA GCGTGGGGGG CGGCTGCGGT GCACGGGCCC 5040
ATTCAGCTGC GGAACCATCA AGGACGTCTC CGGTGCATCC CCCGCGGGGG AATACACGAT 5100
AAACGGTATC GTGTACCACT GTCACTGTCG GTATCCGTTC TCCAAAACCT GCTGGCTCGG 5160
GGCATCCGCG GCCCTACAAC ACCTTCGCTC TATAAGCTCA AGCGGCACGG CCGCTCGCGC 5220
GGCAGAACAG CGACGCCACA AAATCAAAAT CAAAATCAAG GTATAACCCA CCCCCTTCCC 5280
TCCGAGTCCG TATGCAACCT CATTAATAAA GAGTGAGAAC CAACCAAAAC AGACGCGGTG 5340
TGAGTTTGTG GGTTATAGGA ACCCGGTAAA TACCACGCGA CGAACCAGCG TGTGTGTTAA 5400
CGCGACTTTT ATTCGTTGTA TCGCGGGAGG GGGGAAGCTT ACCGCCAAAG GAAGGCCAAG 5460
ATGATAACGA CGACCACCGC GACCACCCCA AAAACCGCAT GACGACACGT CCCGCCACAC 5520
CACCCTGGGG CTTGGGGCGT GTCGGAGCTC GACGCACAGC GGGCCGCGCG TTGGGCCCGG 5580
TACAGCTCTC GCGAATTGAC GAGCGGGGGT CGCCACGTGC GCGAGCTTTG CACGCGGGGT 5640
TGGTCGGCCG GCCCCACGGA CCCGCCCGGT GGCTCGGTCG GACATGCGGC CATGACCATG 5700
GCGTAGGTGG GGGGGCGATC CGAGGTCGCC TCTGCGTAAG TAGGGAGGCC CGACGGGAGG 5760
TCGCCTCCCA CGCCAGGGTG GGCCCCAATC ATAGTTTCCG GTAGAAACAG GGGGGTCTCC 5820
ACAAACAACC CCCCTGGGCC AAAGCTCCGG CGCCGCGCCC GTCGTTCGGC GCGGCGCCTG 5880
GCGCGCCGAG CGGCCCGCCA GGCGGCGCGG CGCGAGCGGC CACGCTCACA CACCTCGCCG 5940
TCACCGGAAG AAGCCGGTGA AACAAGCCCA ACCGGCGACG TCCCTGCAGA GTACGGTGGA 6000
GGCGAGTCCG TGGGGGTGTC GATATCAATA ACGACAAACT GGCCCGCGCT CGCGCCGGCC 6060
ACACTCTCGT ATGGGGGCGG GGCGTCAATC ACGCTATCAT CTCCGTCATC CCTGCATGCG 6120
TGGGCATGCC CAGCCCCCAA CGCCATGGTG GGGATTCGCG GCTCAGAAGC CTGCATGTCG 6180
TGTGGTCGGT CGTAGTCCAA CGTGCCTCCC CCACCCACCA CACAGCCGGT CCCCACGCCG 6240
ACCACTAGAC CGCAGACGTC GCCCAACCGA GGTCCCCGTG CACAGACCGC GCCTTTTATA 6300
GCCCCAGGGG TTGCTAATTA ACGCACGCAT GCAGACGCAA TTTATTTTGC TCCCCCGCGT 6360
CCTCCCCTCC CCCGCGTCCT CCCNT 6386
(2) INFORMATION FOR SEQ ID NO: 190:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 477 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190:
Xaa Xaa Xaa Xaa Xaa Thr Arg Arg Ile Cys Arg Pro Ala Leu Pro Pro
1 5 10 15
Gly Gly Leu Ala Val Gly Gly Gln Met Tyr Val Asn Arg Asn Glu Ile
20 25 30
Phe Asn Ala Ala Val Thr Asn Ile Ile Leu Asp Leu Asp Ile Ala Leu
35 40 45
Lys Glu Pro Val Pro Phe Pro Arg Leu His Glu Ala Leu Gly His Phe
50 55 60
Arg Arg Gly Ala Ala Val Gln Leu Leu Phe Pro Ala Ala Arg Val Asp
65 70 75 80
Pro Asp Ala Tyr Pro Cys Tyr Phe Phe Lys Ser Ala Cys Arg Pro Arg
85 90 95
Ala Pro Pro Val Cys Ala Gly Asp Gly Pro Ser Ala Gly Gly Asp Asp
100 105 110
Gly Asp Gly Asp Trp Phe Pro Asp Ala Gly Gly Asp Asp Gly Asp Glu
115 120 125
Glu Trp Glu Glu Asp Thr Asp Pro Met Asp Thr Thr His Gly Pro Leu
130 135 140
Pro Asp Asp Glu Ala Ala Tyr Leu Asp Leu Leu His Glu Gln Ile Pro
145 150 155 160
Ala Ala Thr Pro Ser Glu Pro Asp Ser Val Val Cys Ser Cys Ala Asp
165 170 175
Lys Ile Gly Leu Arg Val Cys Leu Pro Val Pro Ala Pro Tyr Val Val
180 185 190
His Gly Ser Leu Thr Met Arg Gly Val Ala Arg Val Ile Gln Gln Ala
195 200 205
Val Leu Leu Asp Arg Asp Phe Val Glu Ala Val Gly Ser His Val Lys
210 215 220
Asn Phe Leu Leu Ile Asp Thr Gly Val Tyr Ala His Gly His Ser Leu
225 230 235 240
Arg Leu Pro Tyr Phe Ala Lys Ile Gly Pro Asp Gly Ser Ala Cys Gly
245 250 255
Arg Leu Leu Pro Val Phe Val Ile Pro Pro Ala Cys Glu Asp Val Pro
260 265 270
Ala Phe Val Ala Ala His Ala Asp Pro Arg Arg Phe His Phe His Ala
275 280 285
Pro Pro Met Phe Ser Ala Ala Pro Arg Glu Ile Arg Val Leu His Ser
290 295 300
Leu Gly Gly Asp Tyr Val Ser Phe Phe Glu Lys Lys Ala Ser Arg Asn
305 310 315 320
Ala Leu Glu His Phe Gly Arg Arg Glu Thr Leu Thr Glu Val Leu Gly
325 330 335
Arg Tyr Asp Val Arg Pro Asp Ala Gly Glu Thr Val Glu Gly Phe Ala
340 345 350
Ser Glu Leu Leu Gly Arg Ile Val Ala Cys Ile Glu Ala His Phe Pro
355 360 365
Glu His Ala Arg Glu Tyr Gln Ala Val Ser Val Arg Arg Ala Val Ile
370 375 380
Lys Asp Asp Trp Val Leu Leu Gln Leu Ile Pro Gly Arg Gly Ala Leu
385 390 395 400
Asn Gln Ser Leu Ser Cys Leu Arg Phe Lys His Gly Arg Ala Ser Arg
405 410 415
Ala Thr Ala Arg Thr Phe Leu Ala Leu Ser Val Gly Thr Asn Asn Arg
420 425 430
Leu Cys Ala Ser Leu Cys Gln Gln Cys Phe Ala Thr Lys Cys Asp Asn
435 440 445
Asn Arg Leu His Thr Leu Phe Thr Val Asp Ala Gly Thr Pro Cys Ser
450 455 460
Arg Ser Ala Pro Ser Ser Thr Ser Arg Pro Ser Ser Ser
465 470 475
(2) INFORMATION FOR SEQ ID NO: 191:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 332 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:
Met Leu Ala Val Arg Ser Leu Gin His Leu Thr Thr Val Ile Phe lie 1 5 10 15 Thr Ala Tyr Gly Leu Val Leu Ala Trp Tyr Ile Val Phe Gly Asp Leu 20 25 30
His Arg Cys Ile Tyr Ala Val Arg Pro Ala Gly Ala His Asn Asp Thr
35 40 45
Ala Leu Val Trp Met Lys Ile Asn Gin Thr Leu Leu Phe Leu Gly Pro 50 55 60
Pro Thr Ala Pro Pro Gly Gly Ala Trp Thr Pro His Ala His Val Cys 65 70 75 80
Tyr Ala Asn Ile Ile Glu Gly Arg Ala Val Ser Leu Pro Ala Ile Pro 85 90 95 Gly Ala Met Ser Arg Arg Val Met Asn Val His Glu Ala Val Asn Cys 100 105 110
Leu Glu Ala Leu Trp Asp Thr Gin Met Arg Leu Val Val Val Gly Trp
115 120 125
Phe Leu Tyr Leu Ala Phe Val His Gin Arg Arg Cys Met Phe Gly Val 130 135 140
Val Ser Pro Ala His Ser Met Val Ala Pro Ala Thr Tyr Leu Leu Asn 145 150 155 160
Tyr Ala Gly Arg Ile Val Ser Ser Val Phe Leu Gln Tyr Pro Tyr Thr 165 170 175 Lys Ile Thr Arg Leu Leu Cys Glu Leu Ser Val Gln Arg Gln Thr Leu 180 185 190
Val Gln Leu Phe Glu Ala Asp Pro Val Thr Phe Leu Tyr His Arg Pro 195 200 205 Ala Val Gly Val Ile Val Gly Cys Glu Leu Leu Leu Arg Phe Val Gly
210 215 220
Leu Ile Val Gly Thr Ala Leu Ile Ser Arg Gly Ala Cys Ala Ile Thr 225 230 235 240 Tyr Pro Leu Phe Leu Thr Ile Thr Thr Trp Cys Phe Val Ser Ile Ile
245 250 255
Ala Leu Thr Glu Leu Tyr Phe Ile Leu Arg Arg Asp Ser Ala Pro Lys
260 265 270
Asn Ala Glu Pro Ala Ala Pro Arg Gly Arg Ser Lys Gly Trp Ser Gly 275 280 285
Val Cys Gly Arg Cys Cys Ser Ile Ile Leu Ser Gly Ile Ala Val Arg
290 295 300
Leu Cys Tyr Ile Ala Val Val Ala Gly Val Val Leu Met Ala Leu Arg 305 310 315 320 Tyr Glu Gln Glu Ile Gln Arg Arg Leu Phe Asp Leu
325 330
(2) INFORMATION FOR SEQ ID NO: 192:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 574 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192:
Val Thr Pro Asp Gly Glu Gly Gln Gly Gly Val Ser Glu Ser Arg Pro 1 5 10 15
Arg Ser Cys Gly Tyr Lys Gly Ser His Arg Pro Thr Gly Arg Cys Val
20 25 30
Leu Pro Cys Ala Asp Pro Gly Cys Ala Ser Val Pro Leu Leu Asp Ser 35 40 45
Asp Pro Ala Thr Leu Phe Arg His Ala Pro Pro Arg Arg Thr Pro Ala
50 55 60
Ile Pro Ala Pro Ala Thr Tyr Asn Met Ala Thr Asp Ile Asp Met Leu 65 70 75 80 Ile Asp Leu Gly Leu Asp Leu Ser Asp Ser Glu Leu Glu Glu Asp Ala
85 90 95
Leu Glu Arg Asp Glu Glu Gly Arg Arg Asp Asp Pro Glu Ser Asp Ser 100 105 110 Ser Gly Glu Cys Ser Ser Ser Asp Glu Asp Met Glu Asp Pro Cys Gly
115 120 125
Asp Gly Gly Ala Glu Ala Ile Asp Ala Ala Ile Pro Lys Gly Pro Pro
130 135 140 Ala Arg Pro Glu Asp Ala Gly Thr Pro Glu Ala Ser Thr Pro Arg Pro
145 150 155 160
Ala Ala Arg Arg Gly Ala Asp Asp Pro Pro Pro Ala Thr Thr Gly Val
165 170 175
Trp Ser Arg Leu Gly Thr Arg Arg Ser Asp Arg Glu Pro His Gly Gly 180 185 190
Lys Val Ala Arg Ile Gln Pro Pro Ser Thr Lys Ala Pro His Pro Arg
195 200 205
Gly Gly Arg Arg Gly Arg Arg Arg Gly Arg Gly Arg Tyr Gly Pro Gly
210 215 220 Gly Ala Asp Ser Thr Pro Asn Pro Arg Arg Arg Val Ser Arg Asn Ala
225 230 235 240
His Asn Gln Gly Gly Arg His Pro Ala Ser Ala Arg Thr Asp Gly Pro
245 250 255
Gly Ala Thr His Gly Glu Ala Arg Arg Gly Gly Glu Gln Leu Asp Val 260 265 270
Ser Gly Gly Pro Arg Pro Arg Gly Thr Arg Gln Ala Pro Pro Pro Leu
275 280 285
Met Ala Leu Ser Leu Thr Pro Pro His Ala Asp Gly Arg Ala Pro Val
290 295 300 Pro Glu Arg Lys Ala Pro Ser Ala Asp Thr Ile Asp Pro Ala Val Arg
305 310 315 320
Ala Val Leu Arg Ser Ile Ser Ala Ala Val Glu Arg Ile Ser Glu Ser
325 330 335
Phe Gly Arg Ser Ala Leu Val Met Gln Asp Pro Phe Gly Gly Met Pro 340 345 350
Phe Pro Ala Ala Asn Ser Pro Trp Ala Pro Val Leu Ala Thr Gln Ala
355 360 365
Gly Gly Phe Asp Ala Glu Thr Arg Arg Val Ser Trp Glu Thr Leu Val
370 375 380 Ala His Gly Pro Ser Leu Tyr Arg Thr Phe Ala Ala Asn Pro Arg Ala
385 390 395 400
Ala Ser Thr Ala Lys Ala Met Arg Asp Cys Val Leu Arg Gln Glu Asn
405 410 415
Leu Ile Glu Ala Ser Ala Asp Glu Thr Leu Ala Trp Cys Lys Met Cys 420 425 430
Ile His His Asn Leu Pro Leu Arg Pro Gln Asp Pro Ile Ile Gly Thr
435 440 445
Ala Ala Ala Val Leu Glu Asn Leu Ala Thr Arg Leu Arg Pro Phe Leu 450 455 460
Gln Cys Tyr Leu Lys Arg Leu Cys Gly Leu Asp Asp Leu Cys Ser Arg 465 470 475 480
Arg Arg Leu Ser Asp Ile Lys Asp Ile Ala Ser Phe Val Leu Val Ile 485 490 495
Leu Ala Arg Leu Ala Asn Arg Val Glu Arg Gly Val Ser Glu Ile Asp
500 505 510
Tyr Thr Thr Val Gly Val Gly Ala Gly Glu Thr Met His Phe Tyr Ile 515 520 525 Pro Gly Ala Cys Met Ala Gly Leu Ile Glu Ile Leu Asp Thr Gln Glu 530 535 540
Cys Ser Ser Arg Val Cys Glu Leu Thr Ala Ser His Thr Ile Ala Pro 545 550 555 560
Leu Tyr Val His Gly Lys Tyr Phe Tyr Cys Asn Ser Leu Phe 565 570
(2) INFORMATION FOR SEQ ID NO: 193:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 212 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193:
Met Trp Gly Pro Gly Pro Ala Arg Phe Ile Ala Arg Pro Gly Thr His 1 5 10 15
Gly Arg Arg Val Phe Thr Asp Pro Pro Pro Arg Asn Met Thr Thr Thr
20 25 30
Pro Leu Ser Asn Leu Phe Leu Arg Ala Pro Asp Ile Thr His Val Ala 35 40 45 Pro Pro Tyr Cys Leu Asn Ala Thr Trp Gln Ala Glu Asn Ala Leu His 50 55 60
Thr Thr Lys Thr Asp Pro Ala Cys Leu Ala Ala Arg Ser Tyr Leu Val 65 70 75 80
Arg Ala Ser Cys Ser Thr Ser Gly Pro Ile His Cys Phe Phe Phe Ala 85 90 95
Val Tyr Lys Asp Ser Gln His Ser Leu Pro Leu Val Thr Glu Leu Arg
100 105 110
Asn Phe Ala Asp Leu Val Asn His Pro Pro Val Leu Arg Glu Leu Glu 115 120 125
Asp Lys Arg Gly Gly Arg Leu Arg Cys Thr Gly Pro Phe Ser Cys Gly
130 135 140
Thr Ile Lys Asp Val Ser Gly Asp Ala Gly Glu Tyr Thr Ile Asn Gly 145 150 155 160
Ile Val Tyr His Cys His Cys Arg Tyr Pro Phe Ser Lys Thr Cys Trp
165 170 175
Leu Gly Ala Ser Ala Ala Leu Gln His Leu Arg Ser Ile Ser Ser Ser 180 185 190 Gly Thr Ala Ala Arg Ala Ala Glu Gln Arg Arg His Lys Ile Lys Ile 195 200 205
Lys Ile Lys Val 210
(2) INFORMATION FOR SEQ ID NO: 194:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 117 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194:
Met Ile Gly Ala His Pro Gly Val Gly Gly Asp Leu Pro Ser Gly Leu
1 5 10 15
Pro Thr Tyr Ala Glu Ala Thr Ser Asp Arg Pro Pro Thr Tyr Ala Met 20 25 30
Val Met Ala Ala Cys Pro Thr Glu Pro Pro Gly Gly Ser Val Gly Pro
35 40 45
Ala Asp Gln Pro Arg Val Gln Ser Ser Arg Thr Trp Arg Pro Pro Leu 50 55 60 Val Asn Ser Arg Glu Leu Tyr Arg Ala Gln Arg Ala Ala Arg Cys Ala 65 70 75 80
Ser Ser Ser Asp Thr Pro Gln Ala Pro Gly Trp Cys Gly Gly Thr Cys
85 90 95
Arg His Ala Val Phe Gly Val Val Ala Val Val Val Val Ile Ile Leu 100 105 110
Ala Phe Leu Trp Arg 115
(2) INFORMATION FOR SEQ ID NO: 195:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3699 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195:
GGGGGCCACG CGGCGGCGGG CCTGACGGAG CTGTGTCAGA CCCTCGCGCC CCGGGACCTC 60
ACGGACCCGC TGCTGTTTGC GTACGTCGGA TTCCAGGTCG TGAACCACGG GCTGATGTTT 120
GTGGTCCCCG ACATCGCCGT ATACGCGATG CTGGGGGGCG CCGTGTGGAT CTCGCTGACG 180 CAGGTGCTTG GGCTCCGGCG CCGCCTTCAC AAGGACCCAG ACGCCGGGCC CTGGGCGGCC 240
GCGACCCTGC GGGGCCTCTT TTTCTCCGTC TACGCATTGG GGTTTGCGGC GGGGGTGCTG 300
GTGCGGCCGC GGATGGCGGC GAGCCGGCGG TCGGGGTGAT CGCCATTTCA AATAAAAGGC 360
ACGAGTTCCC CGAATACCAC CGGCGTGTGA TGATTTCGCC CTACCGCTCC GATCCCCGGG 420
GGGAGGGGGG AAGGAAATGG GCGCGGGGGT GCCGTGGACG GGTATAAAGG CCAGGGGGGC 480 AGGCGGGCCC ATCACTGTTA GGGTGTTAGG TTGGGAGGTG GCACAAAAAG CGACACACCC 540
GTGTTGTAGT TGTCCGCGGG AGGCGGTGGT TTCCGGCAAC CCTCCTCGCT GCGCCGGGCG 600
CGCCCACCGG TCCTTCGCGG GGGCCGGGGC TCTTCTGGTC ATGGCCCTTG GACGGGTGGG 660
CCTAGCCGTG GGCCTGTGGG GCCTGCTGTG GGTGGGTGTG GTCGTGGTGC TGGCCAATGC 720
CTCCCCCGGA CGCACGATAA CGGTGGGCCC GCGGGGGAAC GCGAGCAATG CCGCCCCCTC 780 CGCGTCCCCG CGGAACGCAT CCGCCCCCCG AACCACACCC ACGCCCCCCC AACCCCGCAA 840
GGCGACGAAA AGTAAGGCCT CCACCGCCAA ACCGGCCCCG CCCCCCAAGA CCGGGCCCCC 900
GAAGACATCC TCGGAGCCCG TGCGATGCAA CCGCCACGAC CCGCTGGCCC GGTACGGCTC 960
GCGGGTGCAA ATCCGATGCC GGTTTCCCAA CTCCACCCGC ACGGAGTCCC GCCTCCAGAT 1020
CTGGCGTTAT GCCACGGCGA CGGACGCCGA GATCGGAACG GCGCCTAGCT TAGAGGAGGT 1080 GATGGTAAAC GTGTCGGCCC CGCCCGGGGG CCAACTGGTG TATGACAGCG CCCCCAACCG 1140
AACGGACCCG CACGTGATCT GGGCGGAGGG CGCCGGCCCG GGCGCCAGCC CGCGGCTGTA 1200
CTCGGTCGTC GGGCCGCTGG GTCGGCAGCG GCTCATCATC GAAGAGCTGA CCCTGGAGAC 1260
CCAGGGCATG TACTACTGGG TGTGGGGCCG GACGGACCGC CCGTCCGCGT ACGGGACCTG 1320
GGTGCGCGTT CGCGTGTTCC GCCCTCCGTC GCTGACCATC CACCCCCACG CGGTGCTGGA 1380 GGGCCAGCCG TTTAAGGCGA CGTGCACGGC CGCCACCTAC TACCCGGGCA ACCGCGCGGA 1440
GTTCGTCTGG TTCGAGGACG GTCGCCGGGT ATTCGATCCG GCCCAGATAC ACACGCAGAC 1500
GCAGGAGAAC CCCGACGGCT TTTCCACCGT CTCCACCGTG ACCTCCGCGG CCGTCGGCGG 1560
CCAGGGCCCC CCGCGCACCT TCACCTGCCA GCTGACGTGG CACCGCGACT CCGTGTCGTT 1620
CTCTCGGCGC AACGCCAGCG GCACGGCATC GGTGCTGCCG CGGCCAACCA TTACCATGGA 1680 GTTTACGGGC GACCATGCGG TCTGCACGGC CGGCTGTGTG CCCGAGGGGG TGACGTTTGC 1740
CTGGTTCCTG GGGGACGACT CCTCGCCGGC GGAGAAGGTG GCCGTCGCGT CCCAGACATC 1800
GTGCGGGCGC CCCGGCACCG CCACGATCCG CTCCACCCTG CCGGTCTCGT ACGAGCAGAC 1860
CGAGTACATC TGCCGGCTGG CGGGATACCC GGACGGAATT CCGGTCCTAG AGCACCACGG 1920 CAGCCACCAG CCCCCGCCGC GGGACCCCAC CGAGCGGCAG GTGATCCGGG CGGTGGAGGG 1980
GGCGGGGATC GGAGTGGCTG TCCTTGTCGC GGTGGTTCTG GCCGGGACCG CGGTAGTGTA 2040
CCTCACCCAC GCCTCCTCGG TGCGCTATCG TCGGCTGCGG TAACTCCGGG GCCGGGCCCG 2100
GCCGCCGGTT GTCTTCTTTT CCACCCCTTC CGTCCCCCGT ACCCACCACA CCCCACCCCA 2160 CCCCCCCGCC GTCCCCCGGG CGTTATAAGC CGCCGCACTC GCTTTTCCCA CCGGAAAATC 2220
CTCGGCCCGA TCCGAACGGC GCACGCCGCG TGGGCTCCAA ACGCCTCCGG AAGAGAGCGC 2280
CCCGCCCCGA TATTCAAGCC CGCGGTGGTG CTATGGCTTT CCGTGCTTCG GGACCCGCCT 2340
ACCAGCCCCT CGCCCCCGCG GCCTCCCCGG CGCGGGCTCG TGTTCCGGCC GTGGCCTGGA 2400
TCGGCGTCGG AGCGATCGTC GGGGCCTTTG CGCTCGTCGC CGCGTTGGTT CTCGTACCCC 2460 CTCGGTCCTC GTGGGGACTC TCGCCGTGCG ACAGCGGCTG GCAGGAATTC AACGCGGGAT 2520
GCGTCGCGTG GGACCCCACC CCCGTCGAGC ACGAGCAGGC GGTCGGCGGC TGCAGCGCGC 2580
CGGCCACCCT TATCCCCCGT GCGGCCGCCA AGCACCTGGC CGCTCTGACA CGCGTCCAGG 2640
CGGAGAGATC GTCGGGTTAC TGGTGGGTGA ACGGAGACGG CATCCGGACC TGTCTGAGAC 2700
TCGTCGACAG CGTCAGTGGC ATCGACGAGT TTTGCGAGGA GCTCGCGATC CGCATATGCT 2760 ACTACCCACG AAGCCCCGGC GGGTTTGTCC GCTTCGTAAC TTCGATACGT AACGCCCTGG 2820
GGTTGCCGTG AGGCGCGCGT CCGACGGTCC CGCTTCTCGC CTCTCTTCTT CCCCCTCCCC 2880
ACCCCACCCA CCGACCAACG ACGGCGTTTG GCCAATACCC TCCTTTTTTC TTTTTCTCTT 2940
CCCCCCCCAA AAAAAAAAAC AATAAACAGC TAATTGCGTA CGACAAACCA TGCGGAACTC 3000
GCTGTTTTTT TTTCTCTGTT TGTTACTTTT TATTGAAAAC AGACATACGG GGAAAGGGGC 3060 CGGAAACCGA GACGGTGGGG CCGGCGGTCG CATTTTTTTA ATGGCTCTGG TGTCGGCCGC 3120
GTTTGAGCTT CGTCAACAGG GCGCTGAGGG CGGCGACGTT TGTCGGGCCG TCGTTGGCCA 3180
GCGCGTTGGT CCGGGGGCGG GCGGGCATGG GCGACAGGCT TAGTCCCGGG TCCGGGGCGC 3240
GTGTGGCCCC CGGAGGGGAG AAGAGGGCAG ACCCGCCCCA GTCGTACAGG GGATTTTCCG 3300
CCTCGATGTA CGGGGAGTCC GGGGCGTCTC CCGGCGGGGC CGCCCCGCCG GCGTCTTGCC 3360 GGCGAAGGCA GATGTTTTCG TATACCCGAA CCCAGGGGAT CTCCTCGTAG ACGCGCCCCC 3420
CATCCTCGCT CACCGACTCG TAAATGGAAT CTGCGTCCTC GGAGGGGGCG CGGGGGGCGT 3480
GGCTTTCGGC CGGCCAGGCG GCGGCGGTGG TGTCGGCGGC GGGGGTGGCG CCAAGCCCGA 3540
CGCCCGCGGG CATGGCGGCG TCATCGTCGG GCAGCAGATA CGTGTTTTCC ATCTGGTCCG 3600
GTTCGGCCTC CGCGTCTGGC CCCCAGGTCC GCACCGCGTC GTAAACCCCG GCGGCCTCGC 3660 GCTGAGCCGC GAGCGGGCGC GCCGCGGCTG CCGGCCGC 3699
(2) INFORMATION FOR SEQ ID NO: 196:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 117 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196:
Xaa Xaa Xaa Xaa Xaa Gly Gly His Ala Ala Ala Gly Leu Thr Glu Leu
1 5 10 15
Cys Gln Thr Leu Ala Pro Arg Asp Leu Thr Asp Pro Leu Leu Phe Ala 20 25 30 Tyr Val Gly Phe Gln Val Val Asn His Gly Leu Met Phe Val Val Pro 35 40 45
Asp Ile Ala Val Tyr Ala Met Leu Gly Gly Ala Val Trp Ile Ser Leu
50 55 60
Thr Gln Val Leu Gly Leu Arg Arg Arg Leu His Lys Asp Pro Asp Ala 65 70 75 80
Gly Pro Trp Ala Ala Ala Thr Leu Arg Gly Leu Phe Phe Ser Val Tyr
85 90 95
Ala Leu Gly Phe Ala Ala Gly Val Leu Val Arg Pro Arg Met Ala Ala 100 105 110 Ser Arg Arg Ser Gly 115
(2) INFORMATION FOR SEQ ID NO: 197:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 536 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197:
Met Gly Ala Gly Val Pro Trp Thr Gly Ile Lys Arg Ala Gly Gly Pro 1 5 10 15
Ile Thr Val Arg Val Leu Gly Trp Glu Val Ala Gln Lys Ala Thr His
20 25 30
Pro Cys Cys Ser Cys Pro Arg Glu Ala Val Val Ser Gly Asn Pro Pro 35 40 45
Arg Cys Ala Gly Arg Ala His Arg Ser Phe Ala Gly Ala Gly Ala Leu
50 55 60
Leu Val Met Ala Leu Gly Arg Val Gly Leu Ala Val Gly Leu Trp Gly 65 70 75 80 Leu Leu Trp Val Gly Val Val Val Val Leu Ala Asn Asp Gly Arg Thr
85 90 95
Ile Thr Val Gly Pro Arg Gly Asn Asn Ala Ala Pro Ser Asp Arg Asn 100 105 110 Ala Ser Ala Pro Arg Thr Thr Pro Thr Pro Pro Gln Pro Arg Lys Ala
115 120 125
Thr Lys Ser Lys Ala Ser Thr Ala Lys Pro Ala Pro Pro Pro Lys Thr
130 135 140 Gly Pro Pro Lys Thr Ser Ser Glu Pro Val Arg Cys Asn Arg His Asp
145 150 155 160
Pro Leu Ala Arg Tyr Gly Ser Arg Val Gln Ile Arg Cys Arg Phe Pro
165 170 175
Asn Ser Thr Arg Thr Glu Ser Arg Leu Gln Ile Trp Arg Tyr Ala Thr 180 185 190
Ala Thr Asp Ala Glu Ile Gly Thr Ala Pro Ser Leu Glu Glu Val Met
195 200 205
Val Asn Val Ser Ala Pro Pro Gly Gly Gln Leu Val Tyr Asp Ser Ala
210 215 220 Pro Asn Arg Thr Asp Pro His Val Ile Trp Ala Glu Gly Ala Gly Pro
225 230 235 240
Gly Asp Arg Lys Val Val Gly Pro Leu Gly Arg Gln Arg Leu Ile Ile
245 250 255
Glu Glu Leu Thr Leu Glu Thr Gin Gly Met Tyr Tyr Trp Val Trp Gly 260 265 270
Arg Thr Asp Arg Pro Ser Ala Tyr Gly Thr Trp Val Arg Val Arg Val
275 280 285
Phe Arg Pro Pro Ser Leu Thr Ile His Pro His Ala Val Leu Glu Gly
290 295 300 Gln Pro Phe Lys Ala Thr Cys Thr Ala Ala Thr Tyr Tyr Pro Gly Asn
305 310 315 320
Arg Ala Glu Phe Val Trp Phe Glu Asp Gly Arg Arg Val Phe Asp Pro
325 330 335
Ala Gln Ile His Thr Gln Thr Gln Glu Asn Pro Asp Gly Phe Ser Thr 340 345 350
Val Ser Thr Val Thr Ser Ala Ala Val Gly Gly Gln Gly Pro Pro Arg
355 360 365
Thr Phe Thr Cys Gln Leu Thr Trp His Arg Asp Ser Val Ser Phe Ser
370 375 380 Arg Arg Asn Ala Ser Gly Thr Ala Ser Val Leu Pro Arg Pro Thr Ile
385 390 395 400
Thr Met Glu Phe Thr Gly Asp His Ala Val Cys Thr Ala Gly Cys Val
405 410 415
Pro Glu Gly Val Thr Phe Ala Trp Phe Leu Gly Asp Asp Ser Ser Pro 420 425 430
Ala Glu Lys Val Ala Val Ala Ser Gln Thr Ser Cys Gly Arg Pro Gly
435 440 445
Thr Ala Thr Ile Arg Ser Thr Leu Pro Val Ser Tyr Glu Gln Thr Glu 450 455 460
Tyr Ile Cys Arg Leu Ala Gly Tyr Pro Asp Gly Ile Pro Val Leu Glu 465 470 475 480
His His Gly Ser His Gln Pro Pro Pro Arg Asp Pro Thr Glu Arg Gln 485 490 495
Val Ile Arg Ala Val Glu Gly Ala Gly Ile Gly Val Ala Val Leu Val
500 505 510
Ala Val Val Leu Ala Gly Thr Ala Val Val Tyr Leu Thr His Ala Ser 515 520 525 Ser Val Arg Tyr Arg Arg Leu Arg 530 535
(2) INFORMATION FOR SEQ ID NO: 198:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 189 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198:
Val Gly Ser Lys Arg Leu Arg Lys Arg Ala Pro Arg Pro Asp Ile Gln 1 5 10 15
Arg Gly Ala Met Ala Phe Arg Ala Ser Gly Pro Ala Tyr Gln Pro Leu
20 25 30
Ala Pro Ala Asp Ala Arg Ala Arg Val Pro Ala Val Ala Trp Ile Gly 35 40 45
Val Gly Ala Ile Val Gly Ala Phe Ala Leu Val Ala Ala Leu Val Leu
50 55 60
Val Pro Pro Arg Ser Ser Trp Gly Leu Ser Pro Cys Asp Ser Gly Trp 65 70 75 80 Gln Glu Phe Asn Ala Gly Cys Val Ala Trp Asp Pro Thr Pro Val Glu
85 90 95
His Glu Gln Ala Val Gly Gly Cys Ser Ala Pro Ala Thr Leu Ile Pro
100 105 110
Arg Ala Ala Ala Lys His Leu Ala Ala Leu Thr Arg Val Gln Ala Glu 115 120 125
Arg Ser Ser Gly Tyr Trp Trp Val Asn Gly Asp Gly Ile Arg Thr Cys
130 135 140
Leu Arg Leu Val Asp Ser Val Ser Gly Ile Asp Glu Phe Cys Glu Glu 145 150 155 160
Leu Ala Ile Arg Ile Cys Tyr Tyr Pro Arg Ser Pro Gly Gly Phe Val
165 170 175
Arg Phe Val Thr Ser Ile Arg Asn Ala Leu Gly Leu Pro 180 185
(2) INFORMATION FOR SEQ ID NO: 199:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 198 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199:
Gln Arg Pro Ala Ala Ala Ala Arg Pro Leu Ala Ala Gln Arg Glu Ala 1 5 10 15
Ala Gly Val Tyr Asp Ala Val Arg Thr Trp Gly Pro Asp Ala Glu Ala
20 25 30
Glu Pro Asp Gln Met Glu Asn Thr Tyr Leu Leu Pro Asp Asp Asp Ala 35 40 45 Ala Met Pro Ala Gly Val Gly Leu Gly Ala Thr Pro Ala Ala Asp Thr 50 55 60
Thr Ala Ala Ala Trp Pro Ala Glu Ser His Ala Pro Arg Ala Pro Ser 65 70 75 80
Glu Asp Ala Asp Ser Ile Tyr Glu Ser Val Ser Glu Asp Gly Gly Arg 85 90 95
Val Tyr Glu Glu Ile Pro Trp Val Arg Val Tyr Glu Asn Ile Cys Leu
100 105 110
Arg Arg Gln Asp Ala Gly Gly Ala Ala Pro Pro Gly Asp Ala Pro Asp 115 120 125 Ser Pro Tyr Ile Glu Ala Glu Asn Pro Leu Tyr Asp Trp Gly Gly Ser 130 135 140
Ala Leu Phe Ser Pro Pro Gly Ala Thr Arg Ala Pro Asp Pro Gly Leu 145 150 155 160
Ser Leu Ser Pro Met Pro Ala Arg Pro Arg Thr Asn Ala Asn Asp Gly 165 170 175
Pro Thr Asn Val Ala Ala Leu Ser Ala Leu Leu Thr Lys Leu Lys Arg
180 185 190
Gly Arg His Gln Ser His 195
(2) INFORMATION FOR SEQ ID NO: 200:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 152 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200:
CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAAAGT 60 GTAAAGCCTG GGGTGCCTAA TGAGTGAGCT AACTCACATT AATTGCGTTG CGCTCACTGC 120
CCGCTTTCCA GTCGGGAAAC CTGTCGTGCC A 152
(2) INFORMATION FOR SEQ ID NO: 201:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 129 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201:
CCCGGAGCCC GGCGGCCGCA GCCGAGCAGC GCCGCGGGCT CCGGGGCCGG GCCGGGCCGG 60 CAACGCCCCG CGCCGGCCGC GGCGGTGAGA ACCCCTGTGT CATTGTTTAC GTGGCCGCGG 120
GCCAGCAG 129
(2) INFORMATION FOR SEQ ID NO: 202:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 127 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202:
GGTGGGGCGC GGCGGCCGGC TCGGGGTGGG GGGAGAGTGT CGTGGGTGTG TTTTCGTGTC 60 CCCCACCACC ACTCCCACCC CGACCGCCGC CGCGCCCGCG TTTCTGCCGC CCGCGCGCTC 120 CTGTGT 127
(2) INFORMATION FOR SEQ ID NO: 203:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 157 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203:
ATGCACGTGT AACCGCCAGT CCGTGCTTGC CTAGCGAACT CACCCGTCCC GGCTGGCGTG 60 CGCAGCCCGG GCCGTGTTGC GGGCCCTCTT AAGGGGCGGC GGCAGGACGG GGACTCCGCC 120 CCGCCTCCTT TCCCCCGGGG AGTCAACCCC CGGGGG 157
(2) INFORMATION FOR SEQ ID NO: 204:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16813 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204:
GGGGGGACGG GACGGGGGGA CGGGACGGGG GGGCCCCGAT CCCAACATCC GCGCTTTCTC 60
GCAGGCCGGG CGCCGCCTTC GTGGACGGGA CACCGGTGTG GTAACTGGCG ACAAGGCGTC 120
GCCACTATGG CAGACATCCC CCCGGACCCG CCCGCGCTCA ACACGACGCC TGCGAATCAT 180
GCTCCCCCAT CCCCACCCCC GGGTTCACGG AAGCGCAGAC GCCCCGTCCT CCCCAGCTCG 240 TCGGAATCTG AGGGTAAGCC CGACACAGAA TCGGAATCCT CCTCGACCGA GTCGTCCGAG 300
GATGAGGCGG GAGACCTACG CGGCGGGCGC CGTCGCTCCC CGCGGGAGCT CGGGGGGAGG 360
TATTTTTTGG ATCTGTCGGC AGAATCGACC ACGGGGACGG AATCGGAGGG AACGGGGCCG 420
TCGGACGACG ATGATGATGA TGCGTCAGAC GGCTGGTTGG TTGACACCCC CCCCCGTAAA 480
TCCAAGCGAC CCCGAATCAA CCTGCGATTA ACGAGCTCCC CCGACCGGCG CGCGGGTGTG 540 GTTTTCCCCG AGGTGTGGAG AAACGACAGA CCTATCCGCG CGGCGCAACC CCAGGCCCCG 600
GCCCAGTCTT CCGGGGATCG CGCAGCCGCA CCGCGGCGCT CTGCTCGCCA GGCCCAGATG 660
CGGAGCGGAG CCGCCTGGAC GCTTGATCTG CATTACATAC GCCAGTGCGT CAACCAGCTC 720
TTTCGGATCC TGCGTGCCGC CCCGAACCCG CCCGGCAGCG CCAACCGCCT GCGCCACCTG 780 GTGCGAGACT GCTACCTTAT GGGCTACTGC CGGACCCGCC TGGGGCCGCG CACGTGGGGC 840
CGCCTGCTGC AGATCTCGGG CGGAACCTGG GACGTGCGCC TGCGAAACGC AATCCGGGAG 900
GTCGAGGCGC GTTTTGAACC CGCCGCCGAG CCCGTGTGCG AGCTGCCCTG TCTGAACGCC 960
AGGCGTTACG GCCCCGAGTG TGATGTTGGC AATCTCGAGA CCAACGGCGG CTCGACGAGC 1020 GATGATGAGA TATCGGATGC GACGGACTCG GACGATACCC TCGCGTCCCA TTCCGACACG 1080
GAGGCGGGGC CCTCCCCGGC CGGCCGGGAG AACCCGGAAT CCGCGTCCGG CGGGGCTATC 1140
GCGGCTCGGC TGGAGTGTGA GTTTGGGACG TTTGACTGGA CGTCCGAGGA GGGCTCCCAG 1200
CCCTGGCTGT CCGCGGTGGT CGCCGATACC AGCTCCGCCG AACGCTCTGG CCTACCCGCC 1260
CCGGGCGCGT GTCGCGCAAC GGAAGCCCCA GAACGCGAGG ACGGGTGCCG AAAAATGCGC 1320 TTCCCCGCCG CCTGCCCCTA TCCCTGCGGC CACACATTTC TCCGGCCATG AGCGCGGGAC 1380
CCCCAGCCCG GTGTGTTTGC CAAACGAAAA TAAACGCCCT ACAAGAAAGC TTTTGTGTCT 1440
GAGTGTCTGG TTTTTCTGGG GGTGGAGGAA GGAACGACAA AAAAAGAAAC AAACGCGACA 1500
CCGCTCGTAC GTGTAATGGG GCGCAGTGTT TTTTATTAGC ATCGGGGGGG GGTTAGAGGT 1560
TGGTGATTGG ATAGCAAACG TGGGATGACG GAGGCCACTC GTCGCCAACG GCCAGCGGGG 1620 GCCCGGGGTT CTGGGGGTCA TCGTCCCCCG TCTGCCAGGA GGGCTCATCG GGAATCTCGG 1680
GTCGCCCCAT GCACGTAAAA CACGGGCGCT GCGTGGGGTG GGTCGCCGGA TGCGGGCGGG 1740
ATGATGCGGG GCGGGGTTTG TTGTGAGGAG CCACGAGGGA CCGTAGCCAG CGAAGACAGC 1800
TGCGTTCCCG GTCGCCGGGC ACCACCACGC CGTATTGGTA TTCGTATCGG CTAAGGAGAT 1860
TTTCCAGGGG GTGATTAGGC GCTGCGGGGA ACGGGGTCCA CGACACGGTC CGCTCGGGCA 1920 AAAACCGATC GGGCAGGGGC CACGGTTCCC CCACCCACGC GTCGTTGGTC TTCATGGCGA 1980
TGAAGCGAAA CCCCAGCCGG GTTTTTTGTG CGTACTCTAA AAACGGCACA CACAGGTCCG 2040
CCGCCCCGAC CACCCACAGG TGGTATAGCC GGTGGGGGCC GGGGCGCTCT TGATGCAGGA 2100
GCCGAAAACA CGCAGGGGCA TCCAGAATCT CGATGCTTTC CAGGGGGTCG TCCTCCGCAA 2160
ACAGGCCCGT CGTGGTGTTT GGGGGACAGC GACAGGAGCG GGTTCGCACG ATCGGTCGGG 2220 TGAATTTGGG CAAGTCCATC AGAGGCTCGG CCAGCCTGCG AAGGTTCGCC GGGCGAACCA 2280
CCACCGGGGT TCCCAGAGGC TCGGAGGCCA GGATCCGGCA TTGCCGAAGC AGAAAACTCC 2340
ACAGAGCCGG GCTTGCGTCA GCGGAAGTCC GCGGCAGGGC GTTTCGTTGG TCTAGGAGGG 2400
TAACCACACT TACAACAACA ACGCCCATGT CGGTATATTA GGCCCGTGGT CCGATCTTCA 2460
CTCACTCGCC TGTCTGCGGA CCTATGCACG GCGGGACGGC GCGCGGACCC GGGGGGGCTG 2520 CTTGCTATCA CACGGCCCGT TCGCACGTTC GATTTTTTCA GCCTTGTTTG GTTGGCTAGG 2580
TATCCCGGAT AATCTGACGT TCCGGATATA GGGGGCGGGG GTAGTGGGGG GGTGTGTCGA 2640
CAAACTGCCG CTTCTTAAAA CACCGGGGCC CGTCGCTCGG GGTGCTCGTT GGTTGGCACG 2700
CGCGACGCGG CGAATGGCCT GTCGTAAGTT CTGTGGGGTC TACCGTAGAC CCGACAAGAG 2760
ACAGGAGGCG TCCGTCCCGC CGGAGACAAA CACGGCCCCG GCCTTCCCGG CGAGCACCTT 2820 TTATACCCCC GCGGAGGATG CGTACCTGGC CCCCGGGCCC CCGGAAACCA TCCACCCTTC 2880
CCGCCCACCG TCCCCCGGCG AGGCTGCGCG CCTGTGTCAG CTGCAGGAGA TCTTGGCCCA 2940
GATGCACAGC GACGAGGACT ACCCCATCGT GGACGCCGCG GGTGCGGAGG AGGAAGACGA 3000
GGCCGACGAT GACGCCCCGG ATGACGTGGC CTACCCGGAG GACTACGCGG AGGGGCGTTT 3060
TCTGTCCATG GTTTCGGCCG CCCCCCTGCC CGGAGCCAGC GGCCATCCTC CTGTTCCGGG 3120 CCGCGCAGCC CCCCCCGACG TCCGGACCTG CGACAGCGGT AAGGTGGGGG CCACGGGGTT 3180
CACCCCGGAA GAGCTCGACA CCATGGACCG GGAGGCACTT CGGGCCATCA GCCGCGGGTG 3240
CAAGCCCCCT TCGACCCTGG CAAAACTGGT GACCGGGCTG GGATTCGCGA TCCACGGAGC 3300
GCTCATCCCG GGGTCGGAGG GGTGTGTCTT TGATAGCAGC CACCCGAACT ACCCTCATCG 3360 GGTAATCGTC AAGGCGGGGT GGTACGCCAG CACGAACCAC GAGGCGCGGC TGCTGAGACG 3420
CCTGAACCAC CCCGCGATCC TACCCCTCCT GGACCTGCAC GTCGTTTCTG GGGTCACGTG 3480
TCTGGTCCTC CCCAAGTATC ACTGCGACCT GTATACCTAT CTGAGCAAGC GCCCGTCTCC 3540
GTTGGGCCAC CTACAGATAA CCGCGGTCTC CCGGCAGCTC TTGAGCGCCA TCGACTACGT 3600 CCACTGCGAA GGCATCATCC ACCGCGATAT TAAGACCGAG AACATCCTCA TCAACACCCC 3660
CGAGAACATC TGTCTGGGGG ACTTTGGGGC GGCGTGCTTT GTGCGCGGGT GTCGATCGAG 3720
CCCCTTCCAT TACGGGATCG CAGGCACCAT CGATACAAAC GCCCCCGAGG TCCTGGCCGG 3780
GGATCCGTAC ACCCAGGTAA TCGACATCTG GAGCGCCGGC CTGGTGATCT TTGAGACCGC 3840
CGTCCACACC GCGTCCTTGT TCTCGGCCCC GCGCGACCCC GAAAGGCGGC CGTGCGACAA 3900 CCAGATCGCG CGCATCATCC GACAGGCCCA GGTACACGTC GACGAGTTTC CAACGCACGC 3960
GGAATCGCGC CTCACCGCGC ACTACCGCTC GCGGGCGGCC GGGAACAATC GTCCGGCGTG 4020
GACCCGACCG GCATGGACCC GCTACTACAA GATCCACACA GACGTCGAAT ATCTCATCTG 4080
CAAAGCCCTT ACCTTTGACG CGGCGCTCCG CCCAAGCGCC GCGGAGTTGC TGCGCCTGCC 4140
GCTATTTCAC CCTAAGTGAC CCCGCTCCCC CCGGGGGGCG TGGAGGGGGG GCTGGTTGGA 4200 TGTTTTTGCA CAAAAAGACG CGGCCCTCGG GCTTTGGTGT TTTTGGCACC TTGCCGCCCG 4260
GCGTCATGCA CGCCATCGCT CCCAGGTTGC TTCTTCTTTT TGTTCTTTCT GGTCTTCCGG 4320
GGACACGCGG CGGGTCGGGT GTCCCCGGAC CAATTAATCC CCCCAACAAC GATGTTGTTT 4380
TCCCGGGAGG TTCCCCCGTG GCTCAATATT GTTATGCCTA TCCCCGGTTG GACGATCCCG 4440
GGCCCTTGGG TTCCGCGGAC GCCGGGCGGC AAGACCTGCC CCGGCGCGTC GTCCGTCACG 4500 AGCCCCTGGG CCGCTCGTTC CTCACGGGGG GGCTGGTTTT GCTGGCGCCG CCGGTACGCG 4560
GATTTGGCGC ACCCAACGCA ACGTATGCGG CCCGTGTGAC GTACTACCGG CTCACCCGCG 4620
CCTGCCGTCA GCCCATCCTC CTTCGGCAGT ATGGAGGGTG TCGCGGCGGC GAGCCGCCGT 4680
CCCCAAAGAC GTGCGGGTCG TACACGTACA CGTACCAGGG CGGCGGGCCT CCGACCCGGT 4740
ACGCTCTCGT AAATGCTTCC CTGCTGGTGC CGATCTGGGA CCGCGCCGCG GAGACATTCG 4800 AGTACCAGAT CGAACTCGGC GGCGAGCTGC ACGTGGGTCT GTTGTGGGTA GAGGTGGGCG 4860
GGGAGGGCCC CGGCCCCACC GCCCCCCCAC AGGCGGCGCG TGCGGAGGGC GGCCCGTGCG 4920
TCCCCCCGGT CCCCGCGGGC CGCCCGTGGC GCTCGGTGCC CCCGGTATGG TATTCCGCCC 4980
CCAACCCCGG GTTTCGTGGC CTGCGTTTCC GGGAGCGCTG TCTGCCCCCA CAGACGCCCG 5040
CCGCCCCCAG CGACCTACCA CGCGTCGCTT TTGCTCCCCA GAGCCTGCTG GTGGGGATTA 5100 CGGGCCGCAC GTTTATTCGG ATGGCACGAC CCACGGAAGA CGTCGGGGTC CTGCCACCCC 5160
ATTGGGCCCC CGGGGCCCTA GATGACGGTC CGTACGCCCC CTTCCCACCC CGCCCGCGGT 5220
TTCGACGCGC CCTGCGGACA GACCCCGAGG GGGTCGACCC CGACGTTCGG GCCCCCCTAA 5280
CCGGGCGGCG CCTCATGGCC TTGACCGAGG ACGCGTCCTC CGATTCGCCT ACGTCCGCTC 5340
CGGAGAAGAC GCCCCTCCCT GTGTCGGCCA CCGCCATGGC GCCCTCAGTC GACCCAAGCG 5400 CGGAACCGAC CGCCCCCGCA ACCACTACTC CCCCCGACGA GATGGCCACA CAAGCCGCAA 5460
CGGTCGCCGT TACGCCGGAG GAAACGGCAG TCGCCTCCCC GCCCGCGACT GCATCCGTGG 5520
AGTCGTCGCC ACTCCCCGCC GCGGCGGCAA CGCCCGGGGC CGGGCACACG AACACCAGCA 5580
GCGCCCCCGC AGCGAAAACG CCCCCCACCA CACCAGCCCC CACGACCCCC CCGCCCACGT 5640
CTACCCACGC GACCCCCCGC CCCACGAGTC CGGGGCCCCA AACAACCCCT CCCGGACCCG 5700 CAACCCCGGG TCCGGTGGGC GCCTCCGCCG CACCCACGGC CGATTCCCCC CTCACCGCCT 5760
CGCCCCCCGC TACCGCGCCG GGGCCCTCGG CCGCCAACGT TTCGGTCGCC GCGACCACCG 5820
CCACGCCCGG AACCCGGGGC ACCGCCCGTA CCCCCCCAAC GGACCCAAAG ACGCACCCAC 5880
ACGGACCCGC GGACGCTCCC CCCGGCTCGC CAGCCCCCCC ACCCCCCGAA CATCGCGGCG 5940 GACCCGAGGA GTTTGAGGGC GCCGGGGACG GCGAACCCCC CGATGACGAC GACAGCGCCA 6000
CCGGCCTCGC CTTCCGAACT CCGAACCCCA ACAAACCACC CCCCGCGCGC CCCGGGCCCA 6060
TCCGCCCCAC GCTCCCGCCA GGAATTCTTG GGCCGCTCGC CCCCAACACG CCTCGCCCCC 6120
CCGCCCAAGC TCCCGCTAAG GACATGCCCT CGGGCCCCAC ACCCCAACAC ATCCCCCTGT 6180 TCTGGTTCCT AACGGCCTCC CCTGCTCTAG ATATCCTCTT TATCATCAGC ACCACCATCC 6240
ACACGGCGGC GTTCGTTTGT CTGGTCGCCT TGGCAGCACA ACTTTGGCGC GGCCGGGCGG 6300
GGCGCAGGCG ATACGCGCAC CCGAGCGTGC GTTACG ATG TCTGCCACCC GAGCGGGATT 6360
AGGGGGTGGG GTGGGGGCGA GAAACGATGA AGGACGGGAA AGGGAACAGC GACCAAATGC 6420
CACGATAAGA ACAATAAACC TGTGACGTCA ATCGGATATG TGAGTTTGGT TGTGTTTTGT 6480 GGGACTGGGG GCGGGGGGTG GGAGGTATCA GTGGGTGACA GAGTCTTTTA AAAGACGTGT 6540
CCCGGGGCCC TCGAGACGCG CAACTTTTGG CCACACAGAG AAAGGCCCCC AGACGAAGTC 6600
ACCCGGGTCC CCGAACAAAA ACAAAAACCT TGACCGCCGC CGGGGGGCGT GCCTGTTGTT 6660
TTGGTCTCAA TGGATCGGTA TGCCGTTCGG ACCTGGGGGA TTGTGGGAAT CCTCGGGTGT 6720
GCTGCTGTTG GGGCCGCACC CACCGGCCCC GCGTCCGATA CAACAAACGC GACCGCACGC 6780 CTCCCCACGC ACCCCCCACT CATCCGTTCC GGGGGCTTTG CCGTCCCCCT CATCGTGGGG 6840
GGGCTGTGTC TCATGATTCT GGGGATGGCG TGTCTACTCG AGGTCCTGCG TCGCCTGGGT 6900
CGCGAGTTGG CGAGGTGCTG CCCCCACGCG GGCCAATTTG CCCCATGATT TTTCGCCTTT 6960
CTGGCCTTGC CCCCACCCCA TCGCCCCGAT TGTGTGTCGG GTGCCCGGGG TACAGCAGCT 7020
ATGGAGCGGT CGGTAATATA ACTTTGGTTG TCGCCACACG CCCCGTGCCG GGCATGGGTT 7080 GTGCGGGAAA GACGAAATAA TCCGGCGATC CCCAAGCGTA CCAACTTGGG GGGGGGGGGA 7140
AAGAAACTAA AAACACATCA AGCCCACAAC CCATCCCACA AGGGGGGTTA TGGCGGACCC 7200
ACCGCACCAC CATACTCCGA TTCGACCACA TATGCAACCA AATCACCCCC AGAGGGGAGG 7260
TTCCATTTTT ACGAGGAGGA GGAGTATAAT AGAGTCTTTG TGTTTAAAAC CCGGGGTCGG 7320
TGTGGTGTTC GGTCATAAGC TGCATTGCGA ACGACTAGTC GCCGTTTTTC GTGTGCATCG 7380 CGTATCACGG CATGGGGCGT TTGACCTCCG GCGTCGGGAC GGCGGCCCTG CTAGTTGTCG 7440
CGGTGGGACT CCGCGTCGTC TGCGCCAAAT ACGCCTTAGC AGACCCCTCG CTTAAGATGG 7500
CCGATCCCAA TCGATTTCGC GGGAAGAACC TTCCGGTTTT GGACCAGCTG ACCGACCCCC 7560
CCGGGGTGAA GCGTGTTTAC CACATTCAGC CGAGCCTGGA GGACCCGTTC CAGCCCCCCA 7620
GCATCCCGAT CACTGTGTAC TACGCAGTGC TGGAACGTGC CTGCCGCAGC GTGCTCCTAC 7680 ATGCCCCATC GGAGGCCCCC CAGATCGTGC GCGGGGCTTC GGACGAGGCC CGAAAGCACA 7740
CCTACAACCT GACCATCGCC TGGTATCGCA TGGGAGACAA TTGCGCTATC CCCATCACGG 7800
TTATGGAATA CACCGAGTGC CCCTACAACA AGTCGTTGGG GGTCTGCCCC ATCCGAACGC 7860
AGCCCCGCTG GAGCTACTAT GACAGCTTTA GCGCCGTCAG CGAGGATAAC CTGGGATTCC 7920
TGATGCACGC CCCCGCCTTC GAGACCGCGG GTACGTACCT GCGGCTAGTG AAGATAAACG 7980 ACTGGACGGA GATCACACAA TTTATCCTGG AGCACCGGGC CCGCGCCTCC TGCAAGTACG 8040
CTCTCCCCCT GCGCATCCCC CCGGCAGCGT GCCTCACCTC GAAGGCCTAC CAACAGGGCG 8100
TGACGGTCGA CAGCATCGGG ATGCTCCCCC GCTTTATCCC CGAAAACCAG CGCACCGTCG 8160
CCCTATACAG CTTAAAAATC GCCGGGTGGC ACGGCCCCAA GCCCCCGTAC ACCAGCACCC 8220
TGCTGCCGCC GGAGCTGTCC GACACCACCA ACGCCACGCA ACCCGAACTC GTTCCGGAAG 8280 ACCCCGAGGA CTCGGCCCTC TTAGAGGATC CCGCCGGGAC GGTGTCTTCG CAGATCCCCC 8340
CAAACTGGCA CATCCCGTCG ATCCAGGACG TCGCGCCGCA CCACGCCCCC GCCGCCCCCA 8400
GCAACCCGGG CCTGATCATC GGCGCGCTGG CCGGCAGTAC CCTGGCGGTG CTGGTCATCG 8460
GCGGTATTGC GTTTTGGGTA CGCCGCCGCG CTCAGATGGC CCCCAAGCGC CTACGTCTCC 8520 CCCACATCCG GGATGACGAC GCGCCCCCCT CGCACCAGCC ATTGTTTTAC TAGAGGAGTA 8580
TCCCCGCTCC CGTGTACCTC TGGGCCCGTG TGGGAGGGTG GCTGGGGTAT TTGGGTGGGA 8640
CTTGGACTCC GCATAAAGGG AGTCTCGAAG GAGGGAAACT AGGACAGTTC ATAGGCCGGG 8700
AGCGTGGGGC GCGCACCGCT GTCCCGACGA TTAGCCACCG CGCCCACAGT CACCTCGACC 8760 CGTCCGATCC CGGTATGCCC GGCCGCTCGC TGCAGGGCCT GGCGATCCTG GGCCTGTGGG 8820
TCTGCGCCAC CGGCCTGGTC GTCCGCGGCC CCACGGTCAG TCTGGTCTCA GACTCACTCG 8880
TGGATGCCGG GGCCGTGGGG CCCCAGGGCT TCGTGGAAGA GGACCTGCGT GTTTTCGGGG 8940
AGCTTCATTT TGTGGGGGCC CAGGTCCCCC ACACAAACTA CTACGACGGC ATCATCGAGC 9000
TGTTTCACTA CCCCCTGGGG AACCACTGCC CCCGCGTTGT ACACGTGGTC ACACTGACCG 9060 CATGCCCCCG CCGCCCCGCC GTGGCGTTCA CCTTGTGTCG CTCGACGCAC CACGCCCACA 9120
GCCCCGCCTA TCCGACCCTG GAGCTGGGTC TGGCGCGGCA GCCGCTTCTG CGGGTTCGAA 9180
CGGCAACGCG CGACTATGCC GGTCTGTATG TCCTGCGCGT ATGGGTCGGC AGCGCGACGA 9240
ACGCCAGCCT GTTTGTTTTG GGGGTGGCGC TCTCTGCCAA CGGGACGTTT GTGTATAACG 9300
GCTCGGACTA CGGCTCCTGC GATCCGGCGC AGCTTCCCTT TTCGGCCCCG CGCCTGGGAC 9360 CCTCGAGCGT ATACACCCCC GGAGCCTCCC GGCCCACCCC TCCACGGACA ACGACATCCC 9420
CGTCCTCCCC CCGAGACCCG ACCCCCGCCC CCGGGGACAC AGGGACGCCC GCGCCCGCGA 9480
GCGGCGAGAG AGCCCCGCCC AATTCCACGC GATCGGCCAG CGAATCGAGA CACAGGCTAA 9540
CCGTAGCCCA GGTAATCCAG ATCGCCATAC CGGCGTCCAT CATCGCCTTT GTGTTTCTGG 9600
GCAGCTGTAT CTGCTTCATC CATAGATGCC AGCGCCGATA CAGGCGCCCC CGCGGCCAGA 9660 TTTACAACCC CGGGGGCGTT TCCTGCGCGG TCAACGAGGC GGCCATGGCC CGCCTCGGAG 9720
CCGAGCTGCG ATCCCACCCA AACACCCCCC CCAAACCCCG ACGCCGTTCG TCGTCGTCCA 9780
CGACCATGCC TTCCCTAACG TCGATAGCTG AGGAATCGGA GCCAGGTCCA GTCGTGCTGC 9840
TGTCCGTCAG TCCTCGGCCC CGCAGTGGCC CGACGGCCCC CCAAGAGGTC TAGGTCCAAG 9900
CGGGCCGTTC GGCAGGCCCG CCCCACCGCC CCCATCGTGG TTATTTCCCC CCCAATAAAC 9960 CGATGTTATT TGCCTATATG CGTGTGTTGG ATCCCTTTGT GATCGTTCGT CATTCCCCGG 10020
ATGGCATGGG AGGCGGGTAA TGGATGGGCG GGGCCCGGGG GGGGAGGAAA AAGAATAAAG 10080
GGGGTAGTGT CGGAGAGGCC CGCCGCGCAT TTAAGGAGTC GCCGCCCCGA CTCTGTGTCT 10140
TCGGGTGACT TGGTGCGCCG CCGTCAGCTA GTCTCCGATC TGCCCCGACC GACGGCTCCT 10200
GCCACCCGAA CATGGCTCGC GGGGCCGGGT TGGTGTTTTT TGTTGGAGTT TGGGTCGTAT 10260 CGTGCCTGGC GGCAGCACCC AGAACGTCCT GGAAACGGGT AACCTCGGGC GAGGACGTGG 10320
TGTTGCTTCC GGCGCCCGCG GGGCCGGAGG AACGCACCCG GGCCCACAAA CTACTGTGGG 10380
CCGCGGAACC CCTGGATGCC TGCGGTCCCC TGCGCCCGTC GTGGGTGGCG CTGTGGCCCC 10440
CCCGACGGGT GCTCGAGACG GTCGTGGATG CGGCGTGCAT GCGCGCCCCG GAACCGCTCG 10500
CCATAGCATA CAGTCCCCCG TTCCCCGCGG GCGACGAGGG ACTGTATTCG GAGTTGGCGT 10560 GGCGCGATCG CGTAGCCGTG GTCAACGAGA GTCTGGTCAT CTACGGGGCC CTGGAGACGG 10620
ACAGCGGTCT GTACACCCTG TCCGTGGTCG GCCTAAGCGA CGAGGCGCGC CAAGTGGCGT 10680
CGGTGGTTCT GGTCGTGGAG CCCGCCCCTG TGCCGACCCC GACCCCCGAC GACTACGACG 10740
AAGAAGACGA CGCGGGCGTG AGCGAACGCA CGCCGGTCAG CGTTCCCCCC CCAACCCCCC 10800
CCCGTGGTCC CCCCGTGGCC CCCCCGACGC ACCCTCGTGT TATCCCCGAG GTGTCCCACG 10860 TGCGCGGGGT AACGGTCCAT ATGGAGACCC CGGAGGCCAT TCTGTTTGCC CCCGGGGAGA 10920
CGTTTGGGAC GAACGTCTCC ATCCACGCCA TTGCCCACGA CGACGGTCCG TACGCCATGG 10980
ACGTCGTCTG GATGCGGTTT GACGTGCCGT CCTCGTGCGC CGAGATGCGG ATCTACGAAG 11040
CTTGTCTGTA TCACCCGCAG CTTCCAGAGT GTCTATCTCC GGCCGACGCG CCGTGCGCCG 11100 TAAGTTCCTG GGCGTACCGC CTGGCGGTCC GCAGCTACGC CGGCTGTTCC AGGACTACGC 11160
CCCCGCCGCG ATGTTTTGCC GAGGCTCGCA TGGAACCGGT CCCGGGGTTG GCGTGGCTGG 11220
CCTCCACCGT CAATCTGGAA TTCCAGCACG CCTCCCCCCA GCACGCCGGC CTCTACCTGT 11280
GCGTGGTGTA CGTGGACGAT CATATCCACG CCTGGGGCCA CATGACCATC AGCACCGCGG 11340 CGCAGTACCG GAACGCGGTG GTGGAACAGC ACCTCCCCCA GCGCCAGCCC GAGCCCGTCG 11400
AGCCCACCCG CCCGCACGTG AGAGCCCCCC CTCCCGCGCC CTCCGCGCGC GGCCCGCTGC 11460
GCCTCGGGGC GGTGCTGGGG GCGGCCCTGT TGCTGGCCGC CCTCGGGCTG TCCGCGTGGG 11520
CGTGCATGAC CTGCTGGCGC AGGCGCTCCT GGCGGGCGGT TAAAAGCCGG GCCTCGGCGA 11580
CGGGCCCCAC TTACATTCGC GTGGCGGACA GCGAGCTGTA CGCGGACTGG AGTTCGGACA 11640 GCGAGGGGGA GCGCGACGGG TCCCTGTGGC AGGACCCTCC GGAGAGACCC GACTCTCCCT 11700
CCACAAATGG ATCCGGCTTT GAGATCTTAT CACCAACGGC TCCGTCTGTA TACCCCCATA 11760
GCGAGGGGCG TAAATCTCGC CGCCCGCTCA CCACCTTTGG TTCGGGAAGC CCGGGCCGTC 11820
GTCACTCCCA GGCCTCCTAT TCGTCCGTCC TCTGGTAAGG CGTCTTCCGA CGACGCGGAC 11880
GTCGGCGATG AACTGATTGC CATCGCGGAC GCACGCGGGG ACCCGCCAGA GACCCTGCCC 11940 CCCGGCGCGG GCGGCGCCGC GCCCGCGTGC CGCAGACCAC CTCGCGGCGG CTCCCCCGCG 12000
GCCTTTCCCG TGGCCCTCCA CGCCGTGGAC GCCCCCTCCC AATTCGTCAC CTGGCTCGCC 12060
GTGCGCTGGC TGCGGGGGGC GGTGGGTCTC GGGGCCGTCC TGTGCGGGAT TGCGTTTTAC 12120
GTGACGTCAA TCGCCCGAGG CGCATAAAGG TCCGGCGGCC AGCCCCGCCG CAGCTCATAA 12180
AAATCGTGAG TCACGGCAAC CGCACCTTCG CCTCCGGCCC TCCGCCAGCG CCCTTCCGCG 12240 TCCGCGATGA CCTCCCGGCC CGCCGACCAA GACTCGGTGC GTTCCAGCGC GTCGGTGCCG 12300
CTTTACCCCG CGGCCTCGCC CGTCCCGGCA GAAGCCTACT ACTCGGAAAG CGAAGACGAG 12360
GCCGCCAACG ACTTCCTCGT GCGCATGGGC CGCCAGCAGT CGGTCCTAAG GCGCCGACGG 12420
CGGCGCACGC GGTGCGTCGG GCTGGTTATC GCCTGTCTCG TCGTGGCCCT CCTATCTGGA 12480
GGGTTCGGGG CACTTTTGGT GTGGCTGCTC CGCTAAATGA CGCCTCGATG TATGGCGCCT 12540 TCTTCGCCCC CACCCCTCGC CGCGACCCAC GTCCGTATGT TAATTGCAAT AAAGTGGTTG 12600
ATTGTCATTA CGGTCTACTA GGTTGTCTTT TTTTTTTGGG GGGGGGGGGA AGGAAATGCA 12660
GAAAAGGGTA AGAAATTCTC GGAATTTCAC CCCCCGGGGG GGGCAAGTGC AGTACCCCAG 12720
TTCCTCAGTG TTTGGGAAAT CTATTGAACT CTCCCGGCTC CTCCGTGTTA GGGAAGTCTC 12780
TTGGGGAAAT CTATTGACCT CTCGCCCCCC CCCCCAGGAG GGGGCAGTGC AGTACCCCAG 12840 TTCCTCCGTG CTGGGGAAAT CTCTCTGCCG GGTACGGGCT CCAGACGAAG GACCCATACA 12900
TTTCCCCATC CGCACCCCAC ATCTGGCGTT CTAGAGTCAC GACGCATTTG CCCCCGTCCC 12960
CGCAGCAACA CACAAAGCGA TTTCAATTTT CACGATTTTA TTATTAATTA CACCAACCAC 13020
CCTGTCCCCG GGACGTGGTC AGGACCGGGG GTCCGCACCC AAACGCACGA AACAAATGCT 13080
GGCAGTGTGC CGAATATAAC CCCGCGTAGG AACACGTCGA CGCGTGCGCC AAACAGCACC 13140 AGAAGGCGCA TGCCATCAGC AGGTCGTGCA TATGGCGATG TGTTTGGACG CAGGGCGCAG 13200
CCGCGGCGAT AAAATTCATG GCGGCCGTCC GCCAGGGCCA CAGCGGCGAG GACTCCCTGT 13260
TGGCCCGAAG CCATTGGGTA TGAACCAGCT GCGCCTCCTG TCCGACCCTG GCTCCCGCCA 13320
GCGGGGGCGG TGGGTCGTGG GTGTTGAGAG CACACAGGCG GGACACCTCG ATCACCGTCC 13380
GAAAAAAGGC CCGGTGGTCC GCGGGCAGCA TCTGCAGGTG CGCCAGGGCC TGGGCGTTGA 13440 GAGGGTACAA CTCGGAGCCG GGGGACTCCG GGGGCCGGTC CGCGCGGTGC CGCGAGTGGG 13500
CACGCTTTGG GGCCCGGGTG TCGGACGCGG GCGCGTTACG GATCCCGACG CGGGGCAGAA 13560
CGTACGTGCG TTGGCGCGGC GATGAGGGGT CCGGGCTGCC GAGGGGGGCG TAGGGGACCG 13620
GGCTAGGCAA GCCCGCGGGT TGCGCGGGGT TCCCGTGGGG GTCTAGGCTC CCTGGGCACC 13680 CGTGGGGGTC GTGGGGGTCG CGGGTCCCTG GGTATGCGCG GGACCCTGGG TTCTCTGGGA 13740
GATCGTGGAA CTCGCGGTTC CCTGGGCTCT CGGGGAACCC GGGGCTCCCT GGGGACACGT 13800
GGTGCCCTGG GAATTCTTGA TGGTCGGACG GCTTCAGATG GCTTCGGGAT CGAGAGGGCC 13860
GCACAGACTC GTAGTAGACC CGAATCTCCA CGTTTCCCCG CCGCCGGATC ATGGTCGCCG 13920 CCCCGGTGCG GGGGCCCGTC GGTCGGAAGC GAGTGCCCTT CAAGCGTGTC CGCTCCTCTG 13980
GGCTGCATGC CGTCGGATGG GGTGCCTTTT AAGGAAAGGT CTCGGCTGCC CGCCCCAACC 14040
GGGGTTTGGG GGTGGGCCGG GGAAACCCCG GATGCCATGG GGGGGGTCAC ACCCTAAGCG 14100
CCGGCGCGCT GGTTGGGTGG GGGTAGAGGG GAGTCCCCGG TCGACGAGAT CGTATCAAGG 14160
GGCCAGCACG CGATCCTGCC GCTCGTTCGA TCTAGCACAC CCACGGGTCT GCTGTGTGGG 14220 ATTTCGACTC GCGGGATCCG ATCGCACGTC CGGAGGACAC AGCAGCGGGA GCTCCGGGTC 14280
GGTCACCGCA GTTCTGGCCG CCTCTCGGTC CTCCCGTTCC CTTTTATGGA TCTCCGCGCA 14340
GACATCGCCA TACGTCCGGT GTGTGCACCG CGAAGAATCC AAAAACATGT CCGTCGTTTT 14400
CAGGGCCCAA GACATGGTGT CCCGTCCACG AAGGCGGCGC CCGGCCTGCG AGAAAGCGCG 14460
GATGTTGGGA TCGGGGCCCC CCCGTCCCGT CCCCCCGTCC CGTCCCCCCG TCCCGTCCCC 14520 CCGTCCCGTC CCCCCGTCCC GTCCCCCCGT CCCGTCCCCC CGTCCCGTCC CCCCGTCCCG 14580
TCCCCCCGTC CCGTCCCCCC GTCCCGTCCC CCCGTCCCGT CCCCCCGTCC CGTCCCCCCG 14640
TCCCGTCCCC CCGTCCCGTC CCCCCGTCCC GTCCCCCCGT CCCGTCCCCC CGCCCCGGCG 14700
CCCCCGGGTC ACCGTACCTG CGATAAGGCT GCAGTGGGTG GATGGGTCCT CGCGGTACGT 14760
ACAGGGTGGG GGGGGGGGGG AGGGAAAGGC AGAACGAAAA GGAACCGATG CGCCCGCGTC 14820 TCTGTATCCG ATCCGATCCG GGTGCGTCGG TGCCCCGCTC GCCGCCGGCG TCTCTGTCTC 14880
GCTGTGGCCC CCTTCGCGAT GCCGCCGCTG CCGTCCCGGT CTCCGCCGCG CAGCCGGTGT 14940
GCCCCTGGTG CGGCGGCGAC CGGGACGCCG GCCCTTTATG TGCGCGAGGA ACGGCCCGCC 15000
CCCCGTCCGG GCCCGCCTCG GGGCGGGGCC CGCGGGATGA CGCGGGCCCC GGGCAGGGCG 15060
CCAGTGCTCG CACTTTGCCC TAATAATATA TATACTATTA GGACGAAGTG CGAACGCTTC 15120 GCGTTCTCAC TTCTTTTACC CTGCGGCCCC GCCCCCTTTG GGGCGGAGCC CGCGGGATGA 15180
CGCGGGCCCC GGGCAGGGCG CCAGTGCTCG CACTTTGCCC TAATAATATA TATACTATTA 15240
GGACGAAGTG CGAACGCTTC GCGTTCTCAC TTCTTTTACC CTGCGGCCCC GCCCCCTTTG 15300
GGGCGGAGCC GCCCGCGGAC CAACGGGGCG ACCTCGCCGG CCCCAAAGGG GCCGGCGGGG 15360
GCCAACGGGA GCGCGGGGCC GGCATCTCAT TACCACGAAC CCGGAAGGGC AGGGGAGCGA 15420 GCCCGCCCGC GACGAGGGTC TCATTAGCAT CGCGGGCGGA AGCGGAAGCC GCCCGCGCCG 15480
GGCGCTAATG AGATGCCGCG CGGGCGGAGC CGGCGGCGGC GCGACCAACG GGCCGCCGCC 15540
ACGGACGCGG ACGCGCGGGC GTCGGGGCGG GGCCGCGCAT AATGCGGTTC CACCTGGGGG 15600
CGGAACCCCG GCGAGCCGGG GCGCGGCGGC GTCGATCGCT CCTCCTCCGC GTCCTCCTCC 15660
TTTCCCCCCG CCCCGCGCGC CCCGAGGACT ATATCAGCCA GGCGACGGGG CGATCGTCCA 15720 CACGGAGCGC GGCTACCGAC GCGGCCGCCA GGATCTACCC GATCGGCGCG GAGAGGCGAA 15780
AAGACACAGG CACACGCACG CACCGCACGG GGGGGAGAGA GAGACCGCCA ACCCCCCCCC 15840
CCCCCCACTG CCGCCCCTGA AGAAGAAGAA GACCCCCCGC ACACCCCGGT CGGAGGCGAT 15900
GTCGGCGGAG CAGCGGAAGA AGAAGAAGAC GACGACGACG ACGCAGGGCC GCGGGGCCGA 15960
GGTCGCGATG GCGGACGAGG ACGGGGGACG TCTCCGGGCC GCGGCGGAGA CGACCGGCGG 16020 CCCCGGATCT CCGGATCCAG CCGACGGACC GCCGCCCACC CCGAACCCGG ACCGTCGCCC 16080
CGCCGCGCGG CCCGGGTTCG GGTGGCACGG TGGGCCGGAG GAGAACGAAG ACGAGGACGA 16140
CGACGCCGCC GCCGATGCCG ATGCCGACGA GGCGGCCCCG GCGTCCGGGG AGGCCGTCGA 16200
CGAGCCTGCC GCGGACGGCG TCGTCTCGCC GCGGCAGCTG GCCCTGCTGG CCTCGATGGT 16260 GGACGAGGCC GTTCGCACGA TCCCGTCGCC CCCCCCGGAG CGCGACGGCG CGGAAGAAGA 16320
AGCAGCCCGC TCGCCTTCTC CGCCGCGGAC CCCCTCCATG TGCGCCGATT ATGGCGAGGA 16380
GAACGACGAC GACGACGATG ACGAGGACCG CGACGCGGGC CGCTGGGTCC GCGGACCGGA 16440
GAACGACGTC CGCGGTCCGC GGGGCGTACC CGGACCCCAT GGCCAGCCTG TCGCCGCGAC 16500 CCCCGGCGCC CCGCCGACAC CACCACCACC ACCACCGCCG CCGCCGCCGG CGCGCCCCCC 16560
GCCGGCGCTC GACCGCCTCT GACTCATCAA AATCCGGATC CTCGTCGTCG GCGTCCTCCG 16620
CCTCCTCCTC CGCCTCCTCC TCCTCGTCTG CATCCGCCTC CTCGTCTGAC GACGACGACG 16680
ACGACGCCGC CCGCGCCCCC GCCAGCGCCG CAGACCACGC CGCGGGCGGG ACCCTCGGCG 16740
CGGACGACGA GGAGGCGGGG GTGCCCGCGA GGGCCCCGGG GGCGGCGCCC CGGCCGAGCC 16800 CGCCCAGGGC CG 16813
(2) INFORMATION FOR SEQ ID NO: 205:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 414 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205:
Met Ala Asp Ile Pro Pro Asp Pro Pro Ala Leu Asn Thr Thr Pro Ala 1 5 10 15
Asn His Ala Pro Pro Ser Pro Pro Pro Gly Ser Arg Lys Arg Arg Arg
20 25 30
Pro Val Leu Pro Ser Ser Ser Glu Ser Glu Gly Lys Pro Asp Thr Glu 35 40 45 Ser Glu Ser Ser Ser Thr Glu Ser Ser Glu Asp Glu Ala Gly Asp Leu 50 55 60
Arg Gly Gly Arg Arg Arg Ser Pro Arg Glu Leu Gly Gly Arg Tyr Phe 65 70 75 80
Leu Asp Leu Ser Ala Glu Ser Thr Thr Gly Thr Glu Ser Glu Gly Thr 85 90 95
Gly Pro Ser Asp Asp Asp Asp Asp Asp Ala Ser Asp Gly Trp Leu Val
100 105 110
Asp Thr Pro Pro Arg Lys Ser Lys Arg Pro Arg Ile Asn Leu Arg Leu 115 120 125 Thr Ser Ser Pro Asp Arg Arg Ala Gly Val Val Phe Pro Glu Val Trp 130 135 140
Arg Asn Asp Arg Pro Ile Arg Ala Ala Gln Pro Gln Ala Pro Ala Gln 145 150 155 160 Ser Ser Gly Asp Arg Ala Ala Ala Pro Arg Arg Ser Ala Arg Gln Ala
165 170 175
Gln Met Arg Ser Gly Ala Ala Trp Thr Leu Asp Leu His Tyr Ile Arg 180 185 190 Gln Cys Val Asn Gln Leu Phe Arg Ile Leu Arg Ala Ala Pro Asn Pro 195 200 205
Pro Gly Ser Ala Asn Arg Leu Arg His Leu Val Arg Asp Cys Tyr Leu
210 215 220
Met Gly Tyr Cys Arg Thr Arg Leu Gly Pro Arg Thr Trp Gly Arg Leu 225 230 235 240
Leu Gln Ile Ser Gly Gly Thr Trp Asp Val Arg Leu Arg Asn Ala Ile
245 250 255
Arg Glu Val Glu Ala Arg Phe Glu Pro Ala Ala Glu Pro Val Cys Glu 260 265 270 Leu Pro Cys Leu Asn Ala Arg Arg Tyr Gly Pro Glu Cys Asp Val Gly 275 280 285
Asn Leu Glu Thr Asn Gly Gly Ser Thr Ser Asp Asp Glu Ile Ser Asp
290 295 300
Ala Thr Asp Ser Asp Asp Thr Leu Ala Ser His Ser Asp Thr Glu Gly 305 310 315 320
Gly Pro Ser Pro Ala Gly Arg Glu Asn Pro Glu Ser Ala Ser Gly Gly
325 330 335
Ala Ile Ala Ala Arg Leu Glu Cys Glu Phe Gly Thr Phe Asp Trp Thr 340 345 350 Ser Glu Glu Gly Ser Gln Pro Trp Leu Ser Ala Val Val Ala Asp Thr 355 360 365
Ser Ser Ala Glu Arg Ser Gly Leu Pro Ala Pro Gly Ala Cys Arg Ala
370 375 380
Thr Glu Ala Pro Glu Arg Glu Asp Gly Cys Arg Lys Met Arg Phe Pro 385 390 395 400
Ala Ala Cys Pro Tyr Pro Cys Gly His Thr Phe Leu Arg Pro 405 410
(2) INFORMATION FOR SEQ ID NO: 206:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 414 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206:
Met Ala Asp Ile Pro Pro Asp Pro Pro Ala Leu Asn Thr Thr Pro Ala 1 5 10 15 Asn His Ala Pro Pro Ser Pro Pro Pro Gly Ser Arg Lys Arg Arg Arg 20 25 30
Pro Val Leu Pro Ser Ser Ser Glu Ser Glu Gly Lys Pro Asp Thr Glu
35 40 45
Ser Glu Ser Ser Ser Thr Glu Ser Ser Glu Asp Glu Ala Gly Asp Leu 50 55 60
Arg Gly Gly Arg Arg Arg Ser Pro Arg Glu Leu Gly Gly Arg Tyr Phe 65 70 75 80
Leu Asp Leu Ser Ala Glu Ser Thr Thr Gly Thr Glu Ser Glu Gly Thr 85 90 95 Gly Pro Ser Asp Asp Asp Asp Asp Asp Ala Ser Asp Gly Trp Leu Val 100 105 110
Asp Thr Pro Pro Arg Lys Ser Lys Arg Pro Arg Ile Asn Leu Arg Leu
115 120 125
Thr Ser Ser Pro Asp Arg Arg Ala Gly Val Val Phe Pro Glu Val Trp 130 135 140
Arg Asn Asp Arg Pro Ile Arg Ala Ala Gln Pro Gln Ala Pro Ala Gln
145 150 155 160
Ser Ser Gly Asp Arg Ala Ala Ala Pro Arg Arg Ser Ala Arg Gln Ala
165 170 175 Gln Met Arg Ser Gly Ala Ala Trp Thr Leu Asp Leu His Tyr Ile Arg
180 185 190
Gln Cys Val Asn Gln Leu Phe Arg Ile Leu Arg Ala Ala Pro Asn Pro
195 200 205
Pro Gly Ser Ala Asn Arg Leu Arg His Leu Val Arg Asp Cys Tyr Leu 210 215 220 Met Gly Tyr Cys Arg Thr Arg Leu Gly Pro Arg Thr Trp Gly Arg Leu
225 230 235 240
Leu Gln Ile Ser Gly Gly Thr Trp Asp Val Arg Leu Arg Asn Ala Ile
245 250 255 Arg Glu Val Glu Ala Arg Phe Glu Pro Ala Ala Glu Pro Val Cys Glu
260 265 270
Leu Pro Cys Leu Asn Ala Arg Arg Tyr Gly Pro Glu Cys Asp Val Gly
275 280 285
Asn Leu Glu Thr Asn Gly Gly Ser Thr Ser Asp Asp Glu Ile Ser Asp 290 295 300
Ala Thr Asp Ser Asp Asp Thr Leu Ala Ser His Ser Asp Thr Glu Gly 305 310 315 320
Gly Pro Ser Pro Ala Gly Arg Glu Asn Pro Glu Ser Ala Ser Gly Gly 325 330 335
Ala Ile Ala Ala Arg Leu Glu Cys Glu Phe Gly Thr Phe Asp Trp Thr
340 345 350
Ser Glu Glu Gly Ser Gln Pro Trp Leu Ser Ala Val Val Ala Asp Thr 355 360 365
Ser Ser Ala Glu Arg Ser Gly Leu Pro Ala Pro Gly Ala Cys Arg Ala
370 375 380
Thr Glu Ala Pro Glu Arg Glu Asp Gly Cys Arg Lys Met Arg Phe Pro 385 390 395 400 Ala Ala Cys Pro Tyr Pro Cys Gly His Thr Phe Leu Arg Pro
405 410
(2) INFORMATION FOR SEQ ID NO: 207:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 287 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207:
Met Gly Val Val Val Val Ser Val Val Thr Leu Leu Asp Gln Arg Asn 1 5 10 15
Ala Leu Pro Arg Thr Ser Ala Asp Asp Ala Leu Trp Ser Phe Leu Leu
20 25 30
Arg Gln Cys Arg Ile Leu Ala Ser Glu Pro Leu Gly Thr Pro Val Val 35 40 45
Val Arg Pro Ala Asn Leu Arg Arg Leu Ala Glu Pro Leu Met Asp Leu
50 55 60
Pro Lys Phe Trp Ile Val Arg Thr Arg Ser Cys Arg Cys Pro Pro Asn 65 70 75 80 Thr Thr Thr Gly Leu Phe Ala Glu Asp Asp Pro Leu Glu Ser Ile Glu
85 90 95
Ile Leu Asp Ala Pro Ala Cys Phe Arg Leu Leu His Gln Glu Arg Pro
100 105 110
Gly Pro His Arg Leu Tyr His Leu Trp Val Val Gly Ala Ala Asp Leu 115 120 125
Cys Val Pro Phe Leu Glu Tyr Ala Gln Lys Thr Arg Leu Gly Phe Arg
130 135 140
Phe Ile Ala Met Lys Thr Asn Asp Ala Trp Val Gly Glu Pro Trp Pro 145 150 155 160
Leu Pro Asp Arg Phe Leu Pro Glu Arg Thr Val Ser Trp Thr Pro Phe
165 170 175
Pro Ala Ala Pro Asn His Pro Leu Glu Asn Leu Leu Ser Arg Tyr Glu 180 185 190
Tyr Gln Tyr Gly Val Val Val Pro Gly Asp Arg Glu Arg Ser Cys Leu
195 200 205
Arg Trp Leu Arg Ser Leu Val Ala Pro His Asn Lys Pro Arg Pro Ala
210 215 220 Ser Ser Arg Pro His Pro Ala Thr His Pro Thr Gln Arg Pro Cys Phe
225 230 235 240
Thr Cys Met Gly Arg Pro Glu Ile Pro Asp Glu Pro Ser Trp Gln Thr
245 250 255
Gly Asp Asp Asp Pro Gln Asn Pro Gly Pro Pro Leu Ala Val Gly Asp 260 265 270
Glu Trp Pro Pro Ser Ser His Val Cys Tyr Pro Ile Thr Asn Leu 275 280 285
(2) INFORMATION FOR SEQ ID NO: 208:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 479 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208:
Met Ala Cys Arg Lys Phe Cys Gly Val Tyr Arg Arg Pro Asp Lys Arg
1 5 10 15
Gln Glu Ala Ser Val Pro Pro Glu Thr Asn Thr Ala Pro Ala Phe Pro 20 25 30 Ala Ser Thr Phe Tyr Thr Pro Ala Glu Asp Ala Tyr Leu Ala Pro Gly 35 40 45
Pro Pro Glu Thr Ile His Pro Ser Arg Pro Pro Ser Pro Gly Glu Ala
50 55 60
Ala Arg Leu Cys Gln Leu Gln Glu Ile Leu Ala Gln Met His Ser Asp 65 70 75 80
Glu Asp Tyr Pro Ile Val Asp Ala Ala Gly Ala Glu Glu Glu Asp Glu
85 90 95
Ala Asp Asp Asp Ala Pro Asp Asp Val Ala Tyr Pro Glu Asp Tyr Ala 100 105 110
Glu Gly Arg Phe Leu Ser Met Val Ser Ala Ala Pro Leu Pro Gly Ala
115 120 125
Ser Gly His Pro Pro Val Pro Gly Arg Ala Ala Pro Pro Asp Val Arg 130 135 140
Thr Cys Asp Ser Gly Lys Val Gly Ala Thr Gly Phe Thr Pro Glu Glu
145 150 155 160
Leu Asp Thr Met Asp Arg Glu Ala Leu Arg Ala Ile Ser Arg Gly Cys
165 170 175 Lys Pro Pro Ser Thr Leu Ala Lys Leu Val Thr Gly Leu Gly Phe Ala
180 185 190
Ile His Gly Ala Leu Ile Pro Gly Ser Glu Gly Cys Val Phe Asp Ser
195 200 205
Ser His Pro Asn Tyr Pro His Arg Val Ile Val Lys Ala Gly Trp Tyr 210 215 220
Ala Ser Thr Asn His Glu Ala Arg Leu Leu Arg Arg Leu Asn His Pro
225 230 235 240
Ala Ile Leu Pro Leu Leu Asp Leu His Val Val Ser Gly Val Thr Cys
245 250 255 Leu Val Leu Pro Lys Tyr His Cys Asp Leu Tyr Thr Tyr Leu Ser Lys
260 265 270
Arg Pro Ser Pro Leu Gly His Leu Gln Ile Thr Ala Val Ser Arg Gln
275 280 285
Leu Leu Ser Ala Ile Asp Tyr Val His Cys Glu Gly Ile Ile His Arg 290 295 300
Asp Ile Lys Thr Glu Asn Ile Leu Ile Asn Thr Pro Glu Asn Ile Cys
305 310 315 320
Leu Gly Asp Phe Gly Ala Ala Cys Phe Val Arg Gly Cys Arg Ser Ser
325 330 335 Pro Phe His Tyr Gly Ile Ala Gly Thr Ile Asp Thr Asn Ala Pro Glu
340 345 350
Val Leu Ala Gly Asp Pro Tyr Thr Gln Val Ile Asp Ile Trp Ser Ala
355 360 365
Gly Leu Val Ile Phe Glu Thr Ala Val His Thr Ala Ser Leu Phe Ser 370 375 380
Ala Pro Arg Asp Pro Glu Arg Arg Pro Cys Asp Asn Gln Ile Ala Arg 385 390 395 400
Ile Ile Arg Gln Ala Gln Val His Val Asp Glu Phe Pro Thr His Ala 405 410 415 Glu Ser Arg Leu Thr Ala His Tyr Arg Ser Arg Ala Ala Gly Asn Asn 420 425 430
Arg Pro Ala Trp Trp Ala Trp Thr Arg Tyr Tyr Lys Ile His Thr Asp 435 440 445 Val Glu Tyr Leu Ile Cys Lys Ala Leu Thr Phe Asp Ala Ala Leu Arg
450 455 460
Pro Ser Ala Ala Glu Leu Leu Arg Leu Pro Leu Phe His Pro Lys 465 470 475
(2) INFORMATION FOR SEQ ID NO: 209:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209:
Val Gly Gly Leu Cys Leu Met Ile Leu Gly Met Ala Cys Leu Leu Glu 1 5 10 15 Val Leu Arg Arg Leu Gly Arg Glu Leu Ala Arg Cys Cys Pro His Ala 20 25 30
Gly Gln Phe Ala Pro 35
(2) INFORMATION FOR SEQ ID NO: 210:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 385 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210:
Met Gly Arg Leu Thr Ser Gly Val Gly Thr Ala Ala Leu Leu Val Val
1 5 10 15
Ala Val Gly Leu Arg Val Val Cys Ala Lys Tyr Ala Asp Pro Ser Leu 20 25 30
Lys Met Ala Asp Pro Asn Arg Phe Arg Gly Lys Asn Leu Pro Val Leu
35 40 45
Asp Gln Leu Thr Asp Pro Pro Gly Val Lys Arg Val Tyr His Ile Gln 50 55 60
Pro Ser Leu Glu Asp Pro Phe Gln Pro Pro Ser Ile Pro Ile Thr Val 65 70 75 80
Tyr Tyr Ala Val Leu Glu Arg Ala Cys Arg Ser Val Leu Leu His Ala 85 90 95
Pro Ser Glu Ala Pro Gln Ile Val Arg Gly Ala Ser Asp Glu Ala Arg
100 105 110
Lys His Thr Tyr Asn Leu Thr Ile Ala Trp Tyr Arg Met Gly Asp Asn 115 120 125 Cys Ala Ile Pro Ile Thr Val Met Glu Tyr Thr Glu Cys Pro Tyr Asn 130 135 140
Lys Ser Leu Gly Val Cys Pro Ile Arg Thr Gln Pro Arg Trp Ser Tyr 145 150 155 160
Tyr Asp Ser Phe Ser Ala Val Ser Glu Asp Asn Leu Gly Phe Leu Met 165 170 175
His Ala Pro Ala Phe Glu Thr Ala Gly Thr Tyr Leu Arg Leu Val Lys
180 185 190
Ile Asn Asp Trp Thr Glu Ile Thr Gln Phe Ile His Arg Ala Arg Ala 195 200 205 Ser Cys Lys Tyr Ala Leu Pro Leu Arg Ile Pro Pro Ala Ala Cys Leu 210 215 220
Thr Ser Lys Ala Tyr Gln Gln Gly Val Thr Val Asp Ser Ile Gly Met 225 230 235 240
Leu Pro Arg Phe Ile Pro Glu Asn Gln Arg Thr Val Ala Lys Leu Lys 245 250 255
Ile Ala Gly Trp His Gly Pro Lys Pro Pro Tyr Thr Ser Thr Leu Leu
260 265 270
Pro Pro Glu Leu Ser Asp Thr Thr Asn Ala Thr Gln Pro Glu Leu Val 275 280 285 Pro Glu Asp Pro Glu Asp Ser Ala Leu Leu Glu Asp Pro Ala Gly Thr 290 295 300
Val Ser Ser Gln Ile Pro Pro Asn Trp His Ile Pro Ser Ile Gln Asp 305 310 315 320
Val Ala Pro His His Ala Pro Ala Ala Pro Ser Asn Pro Gly Leu Ile 325 330 335
Ile Gly Ala Gly Ser Thr Leu Ala Val Leu Val Ile Gly Gly Ile Ala
340 345 350
Phe Trp Val Arg Arg Arg Ala Gln Met Ala Pro Lys Arg Leu Arg Leu 355 360 365 Pro His Ile Arg Asp Asp Asp Ala Pro Pro Ser His Gln Pro Leu Phe 370 375 380
Tyr 385
(2) INFORMATION FOR SEQ ID NO: 211:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 368 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211:
Met Pro Gly Arg Ser Leu Gln Gly Leu Ala Ile Leu Gly Leu Trp Val 1 5 10 15
Cys Ala Thr Gly Leu Val Val Arg Gly Pro Thr Val Ser Leu Val Ser
20 25 30
Asp Ser Leu Val Asp Ala Gly Ala Val Gly Pro Gln Gly Phe Val Glu 35 40 45 Glu Asp Leu Arg Val Phe Gly Glu Leu His Phe Val Gly Ala Gln Val 50 55 60
Pro His Thr Asn Tyr Tyr Asp Gly Ile Ile Glu Leu Phe His Tyr Pro 65 70 75 80
Leu Gly Asn His Cys Pro Arg Val Val His Val Val Thr Leu Thr Ala 85 90 95
Cys Pro Arg Arg Pro Ala Val Ala Phe Thr Leu Cys Arg Ser Thr His
100 105 110
His Ala His Ser Pro Ala Tyr Pro Thr Leu Glu Leu Gly Leu Ala Arg 115 120 125 Gln Pro Leu Leu Arg Val Arg Thr Ala Thr Arg Asp Tyr Ala Gly Val 130 135 140
Leu Arg Val Trp Val Gly Ser Ala Thr Asn Ala Ser Leu Phe Val Leu 145 150 155 160
Gly Val Ser Ala Asn Gly Thr Phe Val Tyr Asn Gly Ser Asp Tyr Gly 165 170 175
Ser Cys Asp Pro Ala Gln Leu Pro Phe Ser Ala Pro Arg Leu Gly Pro
180 185 190
Ser Ser Val Tyr Thr Pro Gly Ala Ser Arg Pro Thr Pro Pro Arg Thr 195 200 205 Thr Thr Ser Pro Ser Ser Pro Arg Asp Pro Thr Pro Ala Pro Gly Asp 210 215 220
Thr Gly Thr Pro Ala Pro Ala Ser Gly Glu Arg Ala Pro Pro Asn Ser 225 230 235 240 Thr Arg Ser Ala Ser Glu Ser Arg His Arg Leu Thr Val Ala Gln Val
245 250 255
Ile Gln Ile Ala Ile Pro Ala Ser Ile Ile Ala Phe Val Phe Leu Gly 260 265 270 Ser Cys Ile Cys Phe Ile His Arg Cys Gln Arg Arg Tyr Arg Arg Pro 275 280 285
Arg Gly Gln Ile Tyr Asn Pro Gly Gly Val Ser Cys Ala Val Asn Glu
290 295 300
Ala Ala Met Ala Arg Leu Gly Ala Glu Leu Arg Ser His Pro Asn Thr 305 310 315 320
Pro Pro Lys Pro Arg Arg Arg Ser Ser Ser Ser Thr Thr Met Pro Ser
325 330 335
Leu Thr Ser Ile Ala Glu Glu Ser Glu Pro Gly Pro Val Val Leu Leu 340 345 350 Ser Val Ser Pro Arg Pro Arg Ser Gly Pro Thr Ala Pro Gln Glu Val 355 360 365
(2) INFORMATION FOR SEQ ID NO: 212:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 528 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212:
Met Arg Ala Gly Leu Val Phe Phe Val Gly Val Trp Val Val Ser Cys 1 5 10 15
Leu Ala Ala Ala Pro Arg Thr Ser Trp Lys Arg Val Thr Ser Gly Glu
20 25 30
Asp Val Val Leu Leu Pro Ala Pro Ala Gly Pro Glu Glu Arg Thr Arg 35 40 45
Ala His Lys Leu Leu Trp Ala Ala Glu Pro Leu Asp Ala Cys Gly Pro
50 55 60
Leu Arg Pro Ser Trp Val Trp Pro Pro Arg Arg Val Leu Glu Thr Val 65 70 75 80 Val Asp Ala Ala Cys Met Arg Ala Pro Glu Pro Leu Ala Ile Ala Tyr
85 90 95
Ser Pro Pro Phe Pro Ala Gly Asp Glu Gly Ser Glu Leu Ala Trp Arg 100 105 110 Asp Arg Val Ala Val Val Asn Glu Ser Leu Val Ile Tyr Gly Ala Leu
115 120 125
Glu Thr Asp Ser Gly Thr Leu Ser Val Val Gly Leu Ser Asp Glu Ala
130 135 140 Arg Gln Val Ala Ser Val Val Leu Val Val Glu Pro Ala Pro Val Pro
145 150 155 160
Thr Pro Thr Pro Asp Asp Tyr Asp Glu Glu Asp Asp Ala Gly Val Ser
165 170 175
Thr Pro Val Ser Val Pro Pro Pro Thr Pro Pro Arg Gly Pro Pro Val 180 185 190
Ala Pro Pro Thr His Pro Arg Val Ile Pro Glu Val Ser His Val Arg
195 200 205
Gly Val Thr Val His Met Pro Glu Ala Ile Leu Phe Ala Pro Gly Glu
210 215 220 Thr Phe Gly Thr Asn Val Ser Ile His Ala Ile Ala His Asp Asp Gly
225 230 235 240
Pro Tyr Ala Met Asp Val Val Trp Met Arg Phe Asp Val Pro Ser Ser
245 250 255
Cys Ala Glu Met Arg Ile Tyr Glu Ala Cys Leu Tyr His Pro Gln Leu 260 265 270
Pro Glu Cys Leu Ser Pro Ala Asp Ala Pro Cys Ala Val Ser Ser Trp
275 280 285
Ala Tyr Arg Leu Ala Val Arg Ser Tyr Ala Gly Cys Ser Arg Thr Thr
290 295 300 Pro Pro Pro Arg Cys Phe Ala Glu Ala Arg Met Glu Pro Val Pro Gly
305 310 315 320
Leu Ala Trp Leu Ala Ser Thr Val Asn Leu Glu Phe Gln His Asp Gln
325 330 335
His Ala Gly Leu Cys Val Val Tyr Val Asp Asp His Ile His Ala Trp 340 345 350
Gly His Met Thr Ile Ser Thr Ala Ala Gln Tyr Arg Asn Ala Val Val
355 360 365
Glu Gln His Leu Pro Gln Arg Gln Pro Glu Pro Val Glu Pro Trp His
370 375 380 Val Arg Ala Pro Pro Pro Ala Pro Ser Arg Pro Leu Arg Leu Gly Ala
385 390 395 400
Val Leu Gly Ala Ala Leu Leu Leu Ala Ala Leu Gly Leu Ser Ala Trp
405 410 415
Ala Cys Met Thr Cys Trp Arg Arg Arg Ser Trp Arg Ala Val Lys Ser 420 425 430
Arg Ala Ser Ala Thr Gly Pro Thr Tyr Ile Arg Val Ala Asp Ser Glu
435 440 445
Leu Tyr Ala Asp Trp Ser Ser Asp Ser Glu Gly Glu Arg Asp Gly Ser 450 455 460
Leu Trp Gln Asp Pro Pro Glu Arg Pro Asp Ser Pro Ser Thr Asn Gly 465 470 475 480
Ser Gly Phe Glu Ile Leu Ser Pro Thr Ala Pro Ser Val Tyr Pro His
485 490 495
Ser Glu Gly Arg Lys Ser Arg Arg Pro Leu Thr Thr Phe Gly Ser Gly
500 505 510
Ser Pro Gly Arg Arg His Ser Gln Ala Ser Tyr Ser Ser Val Leu Trp 515 520 525
(2) INFORMATION FOR SEQ ID NO: 213:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 41 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213:
Val His Ala Val Asp Ala Pro Ser Gln Phe Val Thr Trp Leu Ala Val 1 5 10 15 Arg Trp Leu Arg Gly Ala Val Gly Leu Gly Ala Val Leu Cys Gly Ile 20 25 30
Ala Phe Tyr Val Thr Ser Ile Arg Ala 35 40
(2) INFORMATION FOR SEQ ID NO: 214:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 85 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214:
Met Thr Ser Arg Pro Ala Asp Gln Asp Ser Val Arg Ser Ser Ala Ser 1 5 10 15 Val Pro Leu Tyr Pro Ala Asp Val Pro Ala Glu Ala Tyr Tyr Ser Glu
20 25 30
Ser Glu Asp Glu Ala Ala Asn Asp Phe Leu Val Arg Met Gly Arg Gln 35 40 45 Gln Ser Val Leu Arg Arg Arg Arg Arg Arg Thr Arg Cys Val Gly Leu 50 55 60
Val Ile Ala Cys Leu Val Val Leu Ser Gly Gly Phe Gly Ala Leu Leu 65 70 75 80
Val Trp Leu Leu Arg 85
(2) INFORMATION FOR SEQ ID NO: 215:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 227 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215:
Met Ser Ala Glu Gln Arg Lys Lys Lys Lys Thr Thr Thr Thr Thr Gln 1 5 10 15
Gly Arg Gly Ala Glu Val Ala Met Ala Asp Glu Asp Gly Gly Arg Leu
20 25 30
Arg Ala Ala Ala Glu Thr Thr Gly Gly Pro Gly Ser Pro Asp Pro Ala 35 40 45 Asp Gly Pro Pro Pro Thr Pro Asn Pro Asp Arg Arg Pro Ala Ala Arg 50 55 60
Pro Gly Phe Gly Trp His Gly Gly Pro Glu Glu Asn Glu Asp Glu Asp 65 70 75 80
Asp Asp Ala Ala Ala Asp Ala Asp Ala Asp Glu Ala Ala Pro Ala Ser 85 90 95
Gly Glu Ala Val Asp Glu Pro Ala Ala Asp Gly Val Val Ser Pro Arg
100 105 110
Gln Leu Ala Leu Leu Ala Ser Met Val Asp Glu Ala Val Arg Thr Ile 115 120 125 Pro Ser Pro Pro Pro Glu Arg Asp Gly Ala Glu Glu Glu Ala Ala Arg 130 135 140
Ser Pro Ser Pro Pro Arg Thr Pro Ser Met Cys Ala Asp Tyr Gly Glu 145 150 155 160 Glu Asn Asp Asp Asp Asp Asp Asp Asp Asp Arg Asp Ala Gly Arg Trp
165 170 175
Val Arg Gly Pro Glu Asn Asp Val Arg Gly Pro Arg Gly Val Pro Gly 180 185 190 Pro His Gly Gln Pro Val Ala Ala Thr Pro Gly Ala Pro Pro Thr Pro 195 200 205
Pro Pro Pro Pro Pro Pro Pro Pro Pro Ala Arg Pro Pro Pro Ala Leu
210 215 220
Asp Arg Leu 225
(2) INFORMATION FOR SEQ ID NO: 216:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 227 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216:
Met Ser Ala Glu Gln Arg Lys Lys Lys Lys Thr Thr Thr Thr Thr Gln 1 5 10 15
Gly Arg Gly Ala Glu Val Ala Met Ala Asp Glu Asp Gly Gly Arg Leu
20 25 30
Arg Ala Ala Ala Glu Thr Thr Gly Gly Pro Gly Ser Pro Asp Pro Ala 35 40 45 Asp Gly Pro Pro Pro Thr Pro Asn Pro Asp Arg Arg Pro Ala Ala Arg 50 55 60
Pro Gly Phe Gly Trp His Gly Gly Pro Glu Glu Asn Glu Asp Glu Asp 65 70 75 80
Asp Asp Ala Ala Ala Asp Ala Asp Ala Asp Glu Ala Ala Pro Ala Ser 85 90 95
Gly Glu Ala Val Asp Glu Pro Ala Ala Asp Gly Val Val Ser Pro Arg
100 105 110
Gln Leu Ala Leu Leu Ala Ser Met Val Asp Glu Ala Val Arg Thr Ile 115 120 125 Pro Ser Pro Pro Pro Glu Arg Asp Gly Ala Glu Glu Glu Ala Ala Arg 130 135 140
Ser Pro Ser Pro Pro Arg Thr Pro Ser Met Cys Ala Asp Tyr Gly Glu 145 150 155 160 Glu Asn Asp Asp Asp Asp Asp Asp Asp Asp Arg Asp Ala Gly Arg Trp
165 170 175
Val Arg Gly Pro Glu Asn Asp Val Arg Gly Pro Arg Gly Val Pro Gly 180 185 190 Pro His Gly Gln Pro Val Ala Ala Thr Pro Gly Ala Pro Pro Thr Pro 195 200 205
Pro Pro Pro Pro Pro Pro Pro Pro Pro Ala Arg Pro Pro Pro Ala Leu
210 215 220
Asp Arg Leu 225
(2) INFORMATION FOR SEQ ID NO: 217:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217:
CGCCCGCGTT TCTGCCGCCC GCGCGCTCCT GTGTGGACCC CGGGGTGGGC GGCGGGGGGG 60
GTGCCGTGGG TGTGGCGGCG GGGCGCGGGC CGGGGCCGGG GCTCGCTGGT CTGCCGAAGT 120 AAAGAAAAGA TCGCCACCGT GTGTTCGTCT GTGTGTTCTG CGCGGCGCCG GGGCCCCCCT 180
GCCGGGCGGG GCGGTGGGGC GGGGCCGGGG TCGCGGCGGG GAAGGAAGGA AAGGCCCCGG 240
AAGCGCCGGG AGGGGGCGCC GGCGCGACGC GGGCGGCCGG GCGGGGGCGC GCGGCGGCCG 300
GGCGGGGGCG CGCGGCGGCC GGGCGGGGGC GCGCGGCGGC CGGGCGGGGG CGCGCTTTCC 360
CCGCGTCGCC CCTCGGGTTC CCAAGACCTA TCACGTGTGC GCAGGGGAGG GGAGGACGCG 420 GGGGAGGGGA GGACGCGGGG GAGGGGAGGA CGCGGGGGAG GGGAGGACGC GGGGGAGGGG 480
AGGACGCGGG GGATATATAA AGCGGTAGAA AGCGCGGGAA TGGGCATATT GGACCCGCGT 540
GATTCGGTTG CTCGCGGTTG TCTTGTTTGG ACGTTTTTTA TGCGGGAACA AGGGGGCTTA 600
CCGGTTACAC TGTCCGCTCG CTATGGGGTT CGTCTGTCTG TTTGGGCTTG TCGTTATGGG 660
AGCCTGGGGG GCGTGGGGTG GGTCACAGGC AACCGAATAT GTTCTTCGTA GTGTTATTGC 720 CAAAGAGGTG GGGGACATAC TAAGAGTGCC TTGCATGCGG ACCCCCGCGG ACGATGTTTC 780
TTGGCGCTAC GAGGCCCCGT CCGTTATTGA CTATGCCCGC ATAGACGGAA TATTTCTTCG 840
CTATCACTGC CCGGGGTTGG ACACGTTTTT GTGGGATAGG CACGCCCAGA GGGCGTATCT 900
GGTTAACCCC TTTCTCTTTG CGGCGGGATT TTTGGAGGAC TTGAGTCACT CTGTGTTTCC 960
GGCCGACACC CAGGAAACAA CGACGCGCCG GGCCCTTTAT AAAGAGATAC GCGATGCGTT 1020 GGGCAGTCGA AAACAGGCCG TCAGCCACGC ACCCGTCAGG GCCGGGTGTG TAAACTTTGA 1080
CTACTCACGC ACTCGCCGCT GCGTCGGGCG ACGCGATTTA CGGCCTGCCA ACACCACGTC 1140
AACGTGGGAA CCGCCTGTGT CGTCGGACGA TGAAGCGAGC TCGCAGTCGA AGCCCCTCGC 1200
CACCCAGCCG CCCGTCCTCG CCCTTTCGAA CGCCCCCCCA CGGCGGGTCT CCCCGACGCG 1260 AGGTCGGCGC CGGCATACTC GCCTCCGACG CAACTAGCCA CGTCTGCATC GCAAGCCACC 1320
CTGGGTCGGG AGCAGGATAT CCGACCCGTC TAGCGGCCGG GTCGGCTGTC CAGCGTCGTC 1380
GCCCTAGAGG CTGTCCGCCG GGCGTGATGT TTTCCGCATC TACGACCCCC GAACAGCCCC 1440
TGGGGCTGTC GGGCGATGCG ACGCCGCCCC TGCCGACTTC CGTGCCCCTG GACTGGGCCG 1500 CGTTTCGGCG CGCGTTTCTG ATCGACGACG CCTGGCGGCC CCTGTTGGAG CCGGAGCTCG 1560
CGAACCCCCT AACCGCGCGC CTCCTCGCGG AGTATGACCG TCGGTGCCAG ACCGAAGAGG 1620
TGCTGCCGCC GCGGGAGGAT GTGTTCTCCT GGACGCGGTA TTGTACCCCC GACGACGTGC 1680
GCGTGGTTAT CATCGGGCAG GACCCGTACC ACCATCCCGG CCAGGCGCAC GGCCTGGCGT 1740
TTAGCGTGCG TGCGGATGTG CCGGTGCCTC CGAGTCTACG GAACGTGCTG GCGGCGGTAA 1800 AAAATTGTTA CCCCGACGCG CGCATGAGCG GCCGCGGCTG CCTGGAAAAG TGGGCTCGCG 1860
ACGGCGTGCT GTTGTTGAAC ACGACCCTGA CCGTCAAGCG CGGGGCGGCG GCGTCCCACT 1920
CCAAGCTTGG ATGGGACCGC TTTGTGGGCG GGGTGGTCCG ACGGCTGGCC GCGCGCCGCC 1980
CGGGCCTGGT CTTTATGCTC TGGGGCGCCC ATGCCCAGAA CGCGATCAGG CCCGACCCTC 2040
GCCAACACTA CGTCCTCAAG TTTTCTCACC CGTCGCCCCT CTCCAAGGTC CCGTTTGGGA 2100 CGTGCCAGCA TTTCCTCGCC GCGAATCGCT ACCTCGAAAC CCGGGACATT ATGCCTATCG 2160
ACTGGTCGGT ATAAGATGCC GACATCCGGG GTCTTGATTT ACGAGGGGGC AATTAATAAA 2220
GACTGTTGAT GGTTAAATCT CGGGTCTCAT ACCGGTCCGT GATGTCGGGC GTGGGGGAAG 2280
AGAGGGTCCC CTCTGCGTTT ACTATCCTTG CCTCGTGGGG CTGGACGTTT GCACCCCAGA 2340
ACCATGATCC TGGCGCGTCG CCGAATACGA CGCCCATAGA GTCGATTGCG GGGACCGCAC 2400 CGGACGCGCA CGTGGGGCCT CTCGACGGAG AGCCGGACCG GGATGCGATC TCCCCGCTTA 2460
CGTCGAGCGT GGCCGGCGAC CCGCCGGGGG CGGACGGCCC CTACGTCACC TTTGATACTC 2520
TGTTTATGGT ATCTTCGATC GACGAACTGG GGCGCCGCCA GCTCACGGAT ACGATCCGTA 2580
AGGACCTGCG GCTGTCGCTG GCCAAGTTCA GCATCGCGTG TACCAAGACC TCGTCGTTTT 2640
CGGGGACGGC CGCGCGCCAG CGCAAGCGCG GAGCACCGCC GCAACGCACA TGCGTACCAC 2700 GCAGCAACAA GAGCCTCCAG ATGTTCGTTT TGTGCAAGCG CGCCAACGCC GCGCAGGTGC 2760
GCGAGCAGCT GCGGGCGGTT ATTCGGTCGC GCAAGCCGCG CAAGTATTAC ACGCGGTCCT 2820
CGGATGGGCG GCTCTGCCCG GCCGTCCCCG TGTTTGTACA CGAGTTTGTT TCGTCCGAAC 2880
CCATGCGCCT CCATCGAGAT AACGTCATGC TGTCTACGGA ACCAGACTAA GCACCCCCGC 2940
CGTCCCCTTT CTTTTCCCCC TACCCTTCCC CCCGTTACTG ATGTGTTGTG ACGTTTCAAT 3000 AAATAACACG TAGCTTATTT TGTTGGATGA TGGATTGATT GATTTTATTG ACCGTTCGTT 3060
CGCCCGGCGG TGCCGTCGCC GCGCGCAGAG GGAATATGCA AGCGGGCGGG GTGGGGAGGA 3120
AAGAAGGTTT CAGGTTCCGG GGGTTGGGTC TGCGTCGTCC AGGGTGGGGC TGATCTGAAT 3180
TTCCCGCAGA ACCTCGACCA GTAGGTCTGT TGTGTTTGCT GGGAACTCGC CCGCCGTTGG 3240
GGATACGGGG GCGGGGGGTG TGGTTGGGCG GACGTCCAGG GGTGCGTTAT CGCACCCCCG 3300 CGCCGCCTCG GGGGCCGTCC CGTAGATCGT TGCGGTGATG TAGATGGTGT CCGGGGTCCA 3360
CACCACCGTC AGGATGCCGG CCGTCGCACT CCGGACGCTT TCGCCGTGCG ATGAGCTGAC 3420
CCAGGAGTCA AAGGGGTACG CGTACATATG GGCGTCCCAC CAGCGCTCCA GCCTCTGGGT 3480
ACTAGCGCGT CCTATAAAGC GGTATGCGCA AAATTCGGCA CGACAGTCGA TAATCACCAG 3540
CAGCCCGATG GGGGTGTGTT GTATCACCAC GCCTCCGCGG GGCAGGCGGT CCTGGCGCGC 3600 TCGACCCCGC GTCAGAACCG CGCGCGTCCC TGACTCAAAC ACGTGCACCA CCTGTGCCGC 3660
GTCCGGCAGC GCGCTCGTTA GCGACGCCCT GGGGTGATGT AGGCTGTACG CGATGGTCGT 3720
CTGGGGGTTC CCCATGTCTC GGGGGGGTGG GGGTGAATGT CACCCGGCCC GGGTGCGGTG 3780
GGAACGCGAG GGAATGGAGG GTTAATAGAC AATGACCACA TTCGGATCGC GTAGAGCAGA 3840 TAGTATGTGC TCGCTAATGA CGTCATCGCG TTCGTGGCGC TCCCGGAGCG GGTTTAGATT 3900
CATGTGCAGG AACTCGGATG AGGTGGTGCG GGACATGGCT ACGTACGCGC TGTTTAGGCG 3960
CAGGTTTCCG GGCGTGAAGC ATATGGCGAC CTTGTCCAGA CTGAGCCCCT GGGAGCGCGT 4020
GATGGTCATC GCGAGTTTGG AGCTGATGCC GTAGTCGGCG TTGATGGCCA TGGCCAGCTC 4080 CGTGGAGTCG ATCGACTCGA CAAACTCACT GATGTTGGTA TTGACGACAG ACATGAAGCC 4140
GTGCTGGTCC CGCAGGACGA TGTAGGGCAG GGGGGACTCC TCCAAGAACT CGGCCACGCC 4200
GGCCGTCGCG TGCCGCCGCC GCAGCTCCTC CGCGAACGCG AACACCCGGG TGTACGTGTA 4260
CCCCATCAGC GTGTAGTTGT CCGTCTGCAG GGCCACGGAC ATCAGCCCCC CGCGCGGCGA 4320
GCCGGTCAGC AGCTCGCAGC CCCGGAAGAT GACATTGTCC ACGTAGGTGC TGAAGGGGGC 4380 GCTCTCAAAC ACCTCCCCGA AGAGCTCCCG TAGGATAAGG TATCGCCCCA GAAAGGCCCT 4440
CTTCAGGAGC CCAAACTGGG CGTGGACGGC CGCGGTGGTC TCCGGCTCTT CGAGGGCGTA 4500
GTGGCAGTAG AACACGTCCA GCTGCTGTTC GTCCAGCCCG GCGAAGATAA CGTCAAGGTC 4560
GTCGTCGGGG AAGTCGTCCG GGCCCCCGTC CCGCGGGCCC AGGTGCTTAA AATTGAACGC 4620
ACGCTCCCCC GGAGAGCGGT CGCTGGTGTC GGCGGCCCTG GTTGCCGATG CGCCGGCGGC 4680 GTCCCGGCGT AGCGACAGGA GTTCTGCCGT CAGCTCCCCT AGGCGGCCGT AGGCCAGGGT 4740
CCTCTGGGTC GCGTCCAGGC CGGGGCGCTG GAGAAAGTTG TAAAAGTGAA TCAGCCCGCC 4800
GAACATGAGC CGCGACAGGA ACCGGTAGGC GAACTCCACC GAGGTCTCCC CCTGGGTCTT 4860
CACGAAGCTG TCGTCGCGCA GCACAGCCTC GAAGGTCCGA AACGTCCCGT CGAACCCAAA 4920
CACCATCTTT CGGAGGCGCG CGGTCACCGC GACCTGGCTG TTGAGGACGT ACGTGATGTC 4980 GTTCCGGGCC ACGACTAGCT GTTGCTTGCT GTGCACCTCA CAGCGCACGT GCCCCGCGTC 5040
CTGGTCCTGA CTCTGGGAGT AGTTGGTGAT GCGACTGGCG TTGGCCGTGA TCCACTTTTC 5100
CATGGTCAGC GTGGGTTGCT GCGTGAGCCG TCGATACTCG TCAAACTCTT TGACCGACAC 5160
AAACGTAAGC ACGGGGAGGG TAAACACAAC AAACTCCCCC TCGCGAGTCA CCTTTAGGTA 5220
GGCGTGGAGC TTGGCCATGT ACGCGCTGAC CTCCTTGTGG GACGAGAACA GCCGCGTCCA 5280 CCCCGGAAGG TTGGCCGGGT TGGTGATGTA ACTTTCCGGG ACGACAAAGC GGTCCACAAA 5340
CTGCATGTGC TCCTCGGTGA TGGGAAGGCC GTACTCCAGC ACCTTCATGA GGTTCCCGAA 5400
CTCGTGCTCC ACACATCGCT TGTTGTTAAT GAAAATGGCC CAGCTGTGCG AGAGGCGCGT 5460
GTACTCGCGT AGGGTGCGGT TGCAGATGAG GTACGTGAGC ACGTTTTCGC TCTGCCGGAC 5520
GGAGCATCGC AGTTTTTGGT GTTCGAAGGT GGACTCCAGC GAGGCCGTCT GGGTCGGCGA 5580 CCCCACGCAC ACCAGCACCG GCCGCAGGCG GCCCGCGTAC TGGGGGGTGT GGTACAGGGC 5640
GTTAATCATC CACCAGCAAT ACACCACGGT CGTGAGTAGG TGCCGCCCCA GGAGCCCGGC 5700
CTCGTCGATG ACGATAATGT TGCTGCGGGT GAAAGCCGGC AGCGCCCCGT GTGTGACCGA 5760
GGCCAGGCGC GTGAGGGCAC CCTGGCCCAG CCCCAAAGTC TGCTCTAGGG CGGTGAGGGC 5820
GTGGAACTCG TTTCGCGCGT CTTCGCCCCC GTGCGCCGCC AGGGCCCGCT TGGTGATGTC 5880
GAGGATCACC TCCCAGTAGT ACGTCAGGTC TCGCCGCTGC AGGTCTTCCA GCGAGGCGGG 5940
GCTGCTGGCC AGGGTGTACG GGTGCTGCCC CAGCTGGGCC TGGACGTGAT TCCCGCGAAA 6000
CCCGAACTCG TGAAAGATGG TGTTGATGGG TCGACTCAGA AACGCCCCCG AGAGCTTAAC 6060
GTACATGTTC TGCGCCGCGA TTCGCGTGGC GCCCGTGACC ACGCAGTCCA GGACCTCGTT 6120
GAGGGTCTGC ACGCACGTAC TCTTTCCGGA TCCGGCGTTG CCGGTGATGA GATACGCCGC 6180
GAACGGAAAC TCCCGGAGCG GCAGGCCGGT CGGGACCTCC AAGGCCGCCA CGTCCCGGAA 6240
CCACTGCAGG CGCGGCACCT GCGTGACGTC GAGCTGCTGC TGCGAGAGCT CTCGGATGCG 6300
TGCGATGATT GGTTGGACCC CGTGCATGGA CGTAAAATTT AAAAACGCCT CGTCCCTGAA 6360
CCGCACGGCG GGTCTGGCCC CGGGCTGCTG TGGGGGCGGA CCTGGTGCCC GGACGTCCCG 6420
CGAGCCCTCC CCGCCGGACG CCGCCATGGC CGCACAGCGC GCGCGGGCGC CGGCGATGCG 6480
GACGCGGGGC GGCGACGCGG CGCTATGCGC CCCCGAGGAC GGCTGGGTGA AGGTTCACCC 6540
CACCCCCGGG ACGATGTTGT TCCGCGAGAT TCTCCTCGGG CAGATGGGGT ACACCGAGGG 6600
TCAGGGGGTG TACAACGTCG TCCGGTCCAG CGAGGCCGCC ACCCGACAGC TGCAGGCGGC 6660
GATCTTCCAC GCGCTCCTCA ACGCCACAAC GTACCGGGAC CTGGAGGAGG ACTGGCGCCG 6720
CCACGTGGTG GCCCGCGGCC TCCAGCCGCA GCGGCTGGTT CGCAGGTACC GGAACGCCCG 6780
GGAGGGCGAT ATCGCCGGGG TGGCCGAGCG GGTGTTCGAC ACGTGGCGAT GCACGCTCAG 6840
GACGACGCTG CTGGACTTTG CCCACGGGGT GGTAGACTGC TTTGCGCCGG GCGGCCCAAG 6900
CGGACCGACC AGCTTCCCCA AATATATCGA CTGGCTGACG TGTCTGGGGC TGGTTCCCAT 6960
ATTGCGCAAG ACGCGCGAGG GGGAGGCGAC GCAGCGCCTG GGGGCGTTTC TCAGGCAGCA 7020
CACGCTGCCC CGGCAGCTGG CCACGGTCGC CGGGGCCGCG GAGCGCGCCG GCCCGGGGCT 7080
TCTGGAGCTG GCCGTCGCGT TCGACTCCAC GCGCATGGCG GAATACGACC GTGTGCACAT 7140
CTACTACAAC CATCGCCGGG GGGAGTGGCT GGTGCGCGAC CCGGTCAGCG GGCAGCGCGG 7200
CGAGTGCCTG GTGCTGTGCC CCCCCCTGTG GACCGGCGAC CGCCTGGTCT TCGATTCGCC 7260
CGTTCAGCGG CTGTGCCCCG AGATCGTCGC GTGCCACGCC CTCCGGGAAC ACGCGCACAT 7320
CTGCCGTCTG CGCAACACCG CGTCCGTCAA GGTGCTGTTG GGGCGCAAGA GCGACAGCGA 7380
GCGCGGGGTG GCTGGCGCCG CGCGGGTCGT CAATAAGGCG CTGGGGGAGG ATGACGAGAC 7440
GAAGGCCGGC TCGGCCGCCT CGCGTCTCGT GCGGCTCATC ATCAACATGA AGGGCATGCG 7500
CCACGTGGGC GACATCAACG ACACGGTACG CGCCTACTTG GACGAGGCGG GGGGGCACCT 7560
GATCGACACC CCCGCCGTCG ACCACACCCT CCCTGGGTTC GGCAAGGGCG GCACCGGCCG 7620
CGGGTCGGCG GCCCAGGACC CGGGGGCGCG ACCGCAGCAG CTTCGCCAGG CGTTTCAGAC 7680
GGCCGTGGTC AACAACATCA ACGGCATGCT GGAGGGCTAT ATCAATAATC TCTTTGGAAC 7740
CATAGAACGC CTGCGAGAGA CGAACGCGGG TCTGGCGACC CAGCTGCAGG CGCGCGACCG 7800
CGAGCTGCGG CGCGCCCAGG CGGGGGCGCT GGAGCGGGAG CAGCGCGCGG CGGACCGGGC 7860
GGCCGGGGGA GGCGCGGGCC GCCCGGCGGA GGCGGATCTT CTCCGGGCCG ACTACGACAT 7920
TATCGACGTC AGCAAGTCCA TGGACGACGA CACGTACGTG GCCAACAGTT TCCAGCACCA 7980
GTACATCCCC GCGTACGGCC AGGACCTCGA GCGCCTGTCG CGCCTCTGGG AGCACGAGCT 8040
GGTGCGCTGC TTCAAGATTC TGCGCCACCG CAACAACCAG GGCCAGGAAA CGTCGATCTC 8100
GTACTCTAGC GGGGCGATCG CCTCCTTCGT GGCCCCGTAT TTCGAGTACG TGCTTCGCGC 8160
CCCCCGAGCG GGCGCGCTCA TCACCGGCTC CGATGTCATC CTAGGGGAGG AGGAGTTATG 8220
GGAGGCGGTC TTTAAGAAAA CCCGCCTGCA GACGTACCTG ACAGACGTCG CGGCCCTGTT 8280
CGTGGCGGAC GTACAGCACG CGGCTCTGCC CCGGCCCCCC TCCCCAACCC CCGCCGATTT 8340
CCGGGCGAGC GCGTCCCCGC GGGGCGGGTC CCGGTCCCGG ACCCGGACCC GATCCCGGTC 8400
GCCCGGGAGA ACGCCGAGGG GTGCGCCGGA CCAGGGCTGG GGCGTCGAAC GCAGGGATGG 8460
CCGACCCCAC GCCCGCCGAT GAGGGAACGG CCGCCGCCAT CCTCAAACAG GCCATCGCCG 8520
GGGACCGCAG TCTGGTCGAG GTGGCGGAGG GGATCAGCAA CCAGGCGCTG CTGCGCATGG 8580
CCTGCGAGGT GCGCCAGGTC AGCGATCGCC AGCCGCGGTT TACCGCGACC AGCGTCCTGC 8640
GCGTTGACGT CACCCCCAGG GGGCGGTTGC GGTTCGTTCT GGACGGGAGT TCCGACGACG 8700
CGTACGTGGC GTCGGAGGAT TACTTTAAGC GCTGCGGGGA CCAGCCGACG TATCGCGGTT 8760
TTGCGGTCGT CGTCCTCACG GCCAACGAGG ACCACGTGCA CAGCCTGGCC GTGCCCCCCC 8820
TCGTTCTGCT GCACCGGCTC TCCTTGTTTC GCCCCACGGA CCTCCGGGAC TTCGAGCTCG 8880
TCTGCCTGCT GATGTACCTG GAGAACTGTC CCCGGAGCCA CGCCACGCCC TCGCTGTTCG 8940
TCAAGGTGTC GGCGTGGTTG GGGGTCGTGG CCCGCCACGC GTCTCCCTTC GAGCGCGTCC 9000
GCTGCCTTCT CCTCCGCAGC TGCCACTGGA TCCTGAACAC GCTAATGTGC ATGGCGGGCG 9060
TGAAGCCCTT CGACGACGAG CTAGTCCTGC CCCACTGGTA CATGGCCCAC TACCTGCTGG 9120
CCAACAATCC GCCCCCCGTC CTCTCGGCCC TGTTTTGCGC CACCCCGCAG AGCTCTGCGT 9180
TGCAGTTGCC CGGGCCCGTC CCCCGCACGG ACTGTGTGGC CTATAACCCG GCCGGCGTCA 9240
TGGGAAGCTG CTGGAAATCC AAGGACCTGC GTTCGGCTCT GGTGTATTGG TGGCTTTCGG 9300
GGAGCCCCAA ACGACGGACC TCGTCGCTTT TCTATCGGTT TTGCTAACTC CGGAAAATAA 9360
ACGTGTTTTT TATGGAACGT TCCCTACCTG TCGTGTCATC TCTCGGGGGA TGGTGGTGGG 9420
CCTGTGTGTG TGTCTTGTGC ACCGAAGGAG GAAAGTGGGG GGGTGGTGGT GCTGGTGGTG 9480
GAAAGACATG ATAGAGGGAA CAAAGAAATA GAAGAAAACC ACAACCGGCG CGTGTCAGTA 9540
AATACGGACG CGCGCACACG CGGGGGTAAG TTGGAGCACG GGGCCCCAGT TTATTGACCA 9600
AATTCAGGGA AACAGAAACC GCATCTTTTC CTCGAAAGGG TACACAAAGC TCCCGCCCTC 9660
GCCCCACACG CCTTCCAGAA CCCCCGTAAA CACCAGTTGA ATCTCGCGCA GGATCTCGCG 9720
CAGGTGATGG GCGCAGTCCA CGGGGGGGAG CACCAAGGGC CGCGGGTACA GATCCACGGG 9780
GACGCCGACC GACTCCCCGC CCCCGGGACA TACGCGCACG ACGCGTCTCC AGTATTGCTC 9840
CGCGTCCAGC AGGGCGCCTC CGCGGAAGGC CGTTTGGGGC AGGGGGTCGT CGGCCTCGCC 9900
CGGGGGGGTC AGAACGCTCC AGTACTCCGC GTCCAGACGC CTCCCGAAGG CATCCAGGAC 9960
AAAGCGGTCA CAGGCGTCCT CCATGATGCC CCGGGCCGCG CACACGGCCT CCTCCGGCGG 10020
GCCGGCGGCC GGCCGCCGGA GGATTCGTCT CAGCGCGTCG CGCATAACCT CGGCCGCCGC 10080
GGCGTACGCG GGCCCGCGGA GAGGAAATCC CTGCAGGAAG TCGGTGTCAT CGCGGGAGTT 10140
CCAGAACCAC GCCCCGGTCT GGCTCCAGGT GACGACGTGG GTGTAGACGC CCTCTAGCGC 10200
CAGGGAGGGG GCGAGGCGCG GGCGTATGCC GTTGGCCGAA AGTACGGCGC GCACGGACGC 10260
CTCGAGGGCC CGGCGGGCGT CCTGGATCGC GCCGTGCGCG GCGTCCGCGT CCCCGGGGTC 10320
CACGTTGAAC AGCCCCCAGA ACGCAGCCCC GGTGCCGCCG CAGACCGCAA ACTTCACCGA 10380
GCTGGCCGTC TGCTCGATCT GCAGGCAGAC GGCGGCCATG ACCCCGCCGA GCAGCTGCCG 10440
GAGCGCGGGG CAGGCGTCGC ACGCGTCCGG CACCAGGCGC TCCAGCACGG CCCGGGCCCA 10500
GGGCTCCGAG GGGGCGGCCG CCACCAGCGC GTCCAGCCTT TCCAGGCCCG CCCGCCCCCG 10560
GGCTTCCGGC AGCCCGGCCT CCCCGAGGCC CGCGAGGGCG GCCAGGAGCT GGGCCTGGAG 10620
CCCGGAGAAA CAAAACCGCG CCGTCCAGAC CGGCCCGACG GCCGCCGGGG GGTCGAGTAG 10680
TTGGATGGTG GTGGCCGTGG GGTGCCACCG CGCCACCGCT TCCCGAAAGG CGGGCAGGAG 10740
GCGGCCGGCC GCCTCCGAGG CCACGGCCGG CCATGCCCGC GGGGGCAGGA CGACCCTGGC 10800
GCCCACCGCG GGCCAGGCCC CCAGGCACGC GGCATGGGTG GCCGCGGCGC CCCGCACCAG 10860
GTCACGCGCC GACTCGGCGG CGGCGGCGGC CGGCACGGTA AACGTGGGCC AGCCCGGAAA 10920
TCCCAGCACG GCAAAGTATT GGACGGGCCC TCCCCGGACC TCAAACCCGG GCCCCAGAAA 10980
AGCGAAGACG GGGGCCAGGG CTCCGGGGGC GGCGTGGACC GTGGTATGCC ACTGCCGGAA 11040
GAGGGCGACC AGCGCCGGGG CGGAGAACCC GTCGCCGGCG CTCACGAAGT AGTCGTAGCC 11100
GCGCGGCAGC AGCACCCGCG CCGTGACCCG CTGCGGGTGT CCGCGGGGCC GCAGGCCTAC 11160
CTCGCACACC TCGACCAGGT CCGCGAAGGC GCCCTCCTTG CTGGTCGGCG GAAACGCCAG 11220
GGTGGTGTAT TCGCGCGCGA AACGCGCGGT CCTCGTCGTG ATGGTGACGG CGAGCGAGGC 11280
GGAGGACGCG CACTGGGGGC TGTCGCGAAT GGCGGCCAGG CGCGCCCACG CCAACCGCGC 11340
GCCGGGGTGC TCGGCGACGC GCGCGGCCAG GGCCAGCGGG TCGACGTCGA CCTTGGCCTC 11400
CACGTCCAGG AGGGCGGCGC GAGGAGCGGC CGGCGGGCCC CACGACGCCC TTTCGACCCT 11460
CACGACCAGA CCCGTCTGCG GGTCCCAGCC CAGGCGCAGC GGGACGAAGA GGGCCACCGG 11520
CCCCGTCTGG CGCTCCAGGG CCGCCAGAAC GCACGCATAC AGCGCCCGCC ACAGGGTCGG 11580
GTCCCCCAGG GGCTCCAGCG GGGAGGCGGC CGGGGCGGCG CGGGCGGCCG CGACGGCCCG 11640
GCGGCCGAGA CGTCGGGGGA GCCGTAGAAG TCCTGCAGGT CGGACGAACC AACGGACACC 11700
TCCGCAAAGC GCGCGCGCGC CTCCCCCGCG GCGTCGCGAC AGACCAGATA CAGCAGGGCG 11760
TGGAGGCAGT CGCGCGTGCG CGGGGGCAGC CATACCGCGT ATAGGGTAAT GGCGCTGACG 11820
CTCTCCTCCA CCCAAACGAT GCCGGGGGCT TCCATGCCAC GACGCCCGGG GGTTGCCGTG 11880
TATCGAACGA GCGCGGCCCC AGACTTATAG GGTGCTAAAG TTCACCGCCC CCTGCATCAT 11940
GGGCCAGGCC TCGGTGGGAA GCTCCGACAG AGCCGCCTCG AGAATGATGT CAGTGTTGGG 12000
CTGGGCGCCG GAGGCGTGCG TGCGCAAGCA GCGCCCCCAC GCGGGCGCGC GCAGCTTGAA 12060
GCGCGCGCCC GCAAACTCCC GCTTATGGGC CATCAGCAGC GCGTACAGCT GTCTGTGCGT 12120
CCGGCAGGCG CTGTGGTCGA TGCGGTGGGC GTCCAGCAGC TCCACGATGG CTCGCTTGGT 12180
GAGGTTTTTA ACGCGCCCCG CCCCGGGAAA CGTCTGCGTG CTCTTGGCCA GCTGCACCCC 12240
GAACAGTTCG CCCCAGATGA TCTTGAACAG CGACAGCGCG TGCTCCGTCT CGCTCACGGA 12300
CCCGCGCGGG GGGCAGCCGC TCAGGGCGTC GGCCACGCGC TTAACCGCGT CCTCCGACAG 12360
CAAGGGGCCG TCGGTCACGT TACAGTGGCC CAGTTCGAAC ACCAGCTGCA TGTAGCGGTC 12420
GTAGTGGGGG TTCAGCAGCT CCAGCACGTC CTCGGGGCTA AAGGTTCGCC CCGACCCCCC 12480
GGCCATCGAG TCCCACTGCA GGCACGCGGC CATGGTGCTG CACAGACGGA ACAGCTCCCA 12540
GACGGGGGCG ACGTTTAGGG TGGGGTGTAG GGCCACAAGC TCCAGCTCTC CGGCGGCGTT 12600
GATCGTGGGG ATGACGCCCG TGGCGTAGTG GTCGTAAAGC CGCCGGAAGA TGGCGCTGCT 12660
ATGGGCGGCC ATGGGGACGC GAAGACAGGC CTCCAGCAGC ACCAGGTAGA TGAACCGCGT 12720
GCGGCCGACC AGGCTGTTGA GGCCGCGCAT GAGCGCGACC ACCTCGGCCG GCGCGACGTC 12780
CGGCCGGAGG TACTTTTCGA CGAAAAGGCC CACCTCCTCC GTCTCGGCGG CCTGGGCCGA 12840
CAGGGACGTG TCGGGGTCCT GGCAGCGCAG CTCCCGCAGA TCCCGCTGGG CCCTCAGGGC 12900
ATCAAAATGT ATCCCCCGCA AAAACAGACA AAAGTTCCTC GGGGTCAGCG CGGCGTCGTG 12960
GCCCCAGAAC CGCACGTGCA TGCAGTTGAG GGTCAGAAGC ATGTGGAGGA TGTTAAGACT 13020
GTCCGCGAGG CACGCCAGCG TGCACCTCTC GAAGTAGTGC TTGTACCGGA ATTTGCTGTA 13080
GATGCGCGAC CCCCGCGCCT GCGCCGCGTC GGCGTGCGAC GCGTCGCAGC GCCCTTTGAA 13140
CCGGCGGCAC AACAGGTTCG TCACCTGGGA AAACTGTGCC GGCCACTGCC CGCTGGCGCT 13200
CACCACGTGG TTGAGCAGCA TGGGCGTAAA GACGGGCTCC GAGCGCGCCC CGGACCCGTC 13260
CATGTAGATC AGCAGCTCCC CCTTGCGGAG AGTCCGTACC CGCCCCAGCG ACTGGTACAC 13320
GGACACCATG TCCGGCCCGT AGTTCATGGG TTTCACGTAG GCGAACATGC TGTCAAAGTG 13380
CGGCGGATCG AAGCTAAGGC CCACCGTCAC GACCGTTGTG TAGATGACCA CCCGGTACCG 13440
GCCCCATGTG GTCACGTCGC CGGGCGGGGT GAGCGAGTGG AGCAGCAGCA CGCGGTCCGT 13500
AAACTGCCGG CAGAACCTGG CAACGACCTC CGCGAAGGAG ACCGTCGACG AGAAGATGCA 13560
GACGTTATCT CCGCCGGCCA GGCGCGCCTC CAGCTCCCCG AAGAAGGTGG CGTCCGGGGG 13620
GGCGTCCGGG GGGGGCGCCC CGCCCGCCGG CCCCGGCGGG CGCAGGGCCG CCTGCAGGAC 13680
CTCGGGCCCC AGGCGCGGGA GAAACAGACA ACGGCGCGCC GAAAATCCGG GCATGGCATA 13740
CTCCCCGATG ACCACGTGAA CGTTCTTTTC GCCCCGGAGG CTGCACAGAA AGTCCACCAG 13800
CTGCGCGTTG GCGGTGGCGT CCATGGCGAT GATCCGCGGG CACGTGCGCA GCAGGCGCAG 13860
CATCAACGCG TCGACGCGGC CCAGCTGCTG CATCGTCGGC GAGTACAGTT GGCCCAACGT 13920
CGACATGACT TCGTCCAGGA CGAGCACGTC GTAGTTGTTC AACAGGTTCG GGCCCACGCG 13980
ATGAAGACTT TCCACCTGCA CGATGAGACG GTGGAAGGGG CGGTCGTTCA TGATGTAATT 14040
GGTGGATGAG AAGTAGGTGA CGAAGTCGGG CAACCCTGAC TCAGCGAACC GCGTCGCCAG 14100
GGTCTGAGTA AAACTCCGAC GACAGGAGAC GACCAGCACA CTCGTGTCCG GAGAGTGGAT 14160
CGCTTCCCCC AACCAGCGGA TCAGCGCGGT AGTTTTTCCC GAGCCCATTG GCGCGCGGAC 14220
CACAGTTACG CACCGGGCCG TCGGGGCGCT CGCGTCCGGG AAGGTGACGG GTCCGTGTTG 14280
CTGCCGCTCG ATCGTTGTTT TCGGGTGGAC CCGGGGAACC CACTCGGCCA AATCCCCCCC 14340
GTAAAGCATC CGCGCCAGCG ATACACTCGA CGTGTACTGC TCGCACTCGT CATCCCCGAT 14400
GGGACGCCGG GCCCCCAGGG GATCCCCCGA GGCCGCGCCG GGCGCCGACG TCGCGCCCGG 14460
GGCGCGGGCG GCGTGGTGGG TCTGGTGTGT GCAGGTGGCG ACGTTCATCG TCTCGGCCAT 14520
CTGCGTCGTG GGGCTCCTGG TGCTGGCCTC TGTGTTCCGG GACAGGTTTC CCTGCCTTTA 14580
CGCCCCCGCG ACCTCTTATG CGGAGGCGAA CGCCACGGTC GAGGTGCGCG GGGGTGTAGC 14640
CGTCCCCCTC CGGTTGGACA CGCAGAGCCT GCTGGCCACG TACGCAATTA CGTCTACGCT 14700
GTTGCTGGCG GCGGCCGTGT ACGCCGCGGT GGGCGCGGTG ACCTCGCGCT ACGAGCGCGC 14760
GCTGGATGCG GCCCGTCGCC TGGCGGCGGC CCGTATGGCG ATGCCACACG CCACGCTAAT 14820
CGCCGGAAAC GTCTGCGCGT GGCTGTTGCA GATCACAGTC CTGCTGCTGG CCCACCGCAT 14880
CAGCCAGCTG GCCCACCTTA TCTACGTCCT GCACTTTGCG TGCCTCGTGT ATCTCGCGGC 14940
CCATTTTTGC ACCAGGGGGG TCCTGAGCGG GACGTACCTG CGTCAGGTTC ACGGCCTGAT 15000
TGACCCGGCG CCGACGCACC ATCGTATCGT CGGTCCGGTG CGGGCAGTAA TGACAAACGC 15060
CTTATTACTG GGCACCCTCC TGTGCACGGC CGCCGCCGCG GTCTCGTTGA ACACGATCGC 15120
CGCCCTCAAC TTCAACTTTT CCGCCCCGAG CATGCTCATC TGCCTGACGA CGCTGTTCGC 15180
CCTGCTTGTC GTGTCGCTGT TGTTGGTGGT CGAGGGGGTG CTGTGTCACT ACGTGCGCGT 15240
GTTGGTGGGC CCCCACCTCG GGGCCATCGC CGCCACCGGC ATCGTCGGCC TGGCCTGCGA 15300
GCACTACCAC ACCGGTGGCT ACTACGTGGT GGAGCAGCAG TGGCCGGGGG CCCAGACGGG 15360
AGTCCGCGTC GCCCTGGCGC TCGTCGCCGC CTTTGCCCTC GCCATGGCCG TGCTTCGGTG 15420
CACGCGCGCC TACCTGTATC ACCGGCGACA CCACACTAAA TTTTTCGTGC GCATGCGCGA 15480
CACCCGGCAC CGCGCCCATT CGGCGCTTCG ACGCGTACGC AGCTCCATGC GCGGTTCTAG 15540
GCGTGGCGGG CCGCCCGGAG ACCCGGGCTA CGCGGAAACC CCCTACGCGA GCGTGTCCCA 15600
CCACGCCGAG ATCGACCGGT ATGGGGATTC CGACGGGGAC CCGATCTACG ACGAAGTGGC 15660
CCCCGACCAC GAGGCCGAGC TCTACGCCCG AGTGCAACGC CCCGGGCCTG TGCCCGACGC 15720
CGAGCCCATT TACGACACCG TGGAGGGGTA TGCGCCAAGG TCCGCGGGGG AGCCGGTGTA 15780
CAGCACCGTT CGGCGATGGT AGCCGTTTCG TTCGTTTTAA TAAACCGACG TTGTGCGTTT 15840
CACCATACTT CGGCGCGCGC GTGTGTGTGT TTTTTTTGTG GTGTTTATTT TCCCCCACCC 15900
CTTCCTTTTC TTTCGGCCAC CACCCCCCTC CTCCCCCGTA CTATACAACA AAAAATACCA 15960
CACATACGAC CAAATACGGA CAATCATTTC TGTCTTTATT CGCTGTCAGA GAGTGGGGGC 16020
GTGAGCGTGG CAGGAGGGCG GGCCACGTCG GGGTCCCGCC GTCTGGTGTG ACGCGATGGG 16080
GGGTCCGATG CGCGCCGGTA CTGGGGCCCC GGCGCCCGGG TGACCACGCG CATGTCGGGG 16140
GGCACGTAGA AGTTACCCTC TTCTTCGGAC TCGATGTCCA CGACGTCAAA TTCGTGGGCG 16200
GTCAGCGAGA CGACCTCCCC GCCGTCGGTG ATGATGACGT TGTGTCGGCA GCAGCAGGGC 16260
CGCGCCCCGG AGAACGCGAG GCCCATAACT TGGCGAGCGT ATCGTCGAAG GCCAGGCGGC 16320
TGTTTCGCCG GATGTCCCGG TAGATCCCCG GCTCGACGCG GACGGGGGTG ATGATCAGGG 16380
CGATCGGAAC GGCCTGGTCC GGGAGGATCG ATGCCTTGGC GGGTCCGGGG GCCCCGCCAC 16440
GCCCGGCGGG CGCTCCGCGG CCGTCCTCCA GGCGGAACGT CACGCCCTCC TCCGCGCCCG 16500
CGCGGTGCCT GCCGAGGAAC GTCACCAGGT GCGGTTGCAG GGGGCAGTCG GGAAAGTGGC 16560
TGTCGAGGAC GTATCCCTGC ACCAAGATCT GTTTGAAGTT CGGGTGGCGG GGGTTGGCGA 16620
AGATGGGCTC GCGGCGAACC AGCTCCCCGG AGCTCCAGGC CACGGGAGAG ATGGTGCGAC 16680
GCTCGAGGTC GGGGACGCCA AACAGAAGCA CCTCCGAGAC AACGCCGCTA TTTAACTCCA 16740
CCAGCGCCCG ATCCGGGGCG GAGCATCGCC TTTTTTCGCC GGCGGCGCGG GAATCGAGCC 16800
AGTCCCGGTC TTGGGTGACG AGCGCCTCCT CCGGGCCCGG GACGCGCCCG GGCGCGAAGT 16860
AGCGCACGCC GGGGTTGGGG ATGGACCGGA TGAACGCCCG GAACGCCTCC GGCGATCGCC 16920
GCGCCATCAG GTCCTCGTAC GCGGAGGCCG CGGGGGCGCC GGGGTCCGCG GGGTCGAACG 16980
CGTACTTGGC TCGGCACTTA ACCTCGTAGA AGGCCAGGGG GGTCTGGGGG GCGGGGGCCA 17040
GGTAGCCGTG AGGGTCCCTG GGGCACACGA GGATGTCCAG GGACGCCCCC ACCATGCCCG 17100
TGTGGCCGTC CATGAGGACC CCGCACGCGT GCACGTTCTC CTCGGCGAGG TCCCCGGGTT 17160
GGTGAAAGAC GAAGCGCCCG GCGTCGGCGT CGTCGTTGAC GCCCGCGTCC GCGCGGCCCA 17220
CGCAGTAGCG AAACAGCAGG TTTCGGGCCG TCGGCTCGTT CACCCGCCCG AACATCACCG 17280
CCGACGACTG GGCGTCCAGC CGCAGGCTGG CGTTGTGGGT GAGCCACTGG GACGAGAAGC 17340
ACGGACCCTG CGCGCCCCAC CGCAGCGTGG AGGCGGTCGT CAGGCCCCGC CGAAGCAGGG 17400
CCCAGAGCTG GCAGTCGGCC TGGTTTTGCG TCGCCGCCTC GTAAAATCCC ATAAGCGGGC 17460
GGGGGGCGAC GGCTTCGGCG GCGGACGGGG GGGCGCGGCG CGTCAGGCGC CAGAGGTGCC 17520
GGCCGAGCCC GCGGTCCACC ATGCCGGCCG CCTCCAGCGA CACGACGAGG GAGCACAGAT 17580
AGTCCAGGCG AGCCCACAGG GGCCCGATGG CCAGAGGGGA GCGGACGCCG CGCAGCAGGC 17640
CGCGCAGGTG GCGCTCGAAC GTTTCCGCCA AGATATGGGG GGGCAGTGCG TTGGGGATCG 17700
CCGACGCCGA CCACATCGGG TCGGGGTCCG GGGGACCGGG GCTGCAGTCC GGGTCGATGG 17760
CGTGTGCGCC CCCCGGCGAG AGGGGAATGT CGGGGGTTGG CGGGCCGGAT GAGGCCTCAG 17820
AGAGGGCCGG GGACGCGGGC CGGGCCTTTT CGCCCGGGGC CCCGCCGTCG GGTTGCCCAC 17880
GTGGGGGGCT CTGGGGCCAA TGGGAACCCG GGGCCCCCGG TGACGTGGGG CGGGGTGGGG 17940
CGGGGCGGGG CCCAAAGACG GTCGCCAGAT CTAGGCTGTT GGGTCGGGGC CGCTTCGGGG 18000
GACTATCGGG GTCGCGGGCG GGGTCCGCGG GGCGCTTGGC GCCGGGTGTT GCGGCGGCCG 18060
CCATTTTTAC GAGCAGCCGA AGAGCTCGAG GGCGGAAGGG ATCCTCACGA CAGAGAGTGG 18120
CGCGCGGCCG GGTTGGCGTG ACAGAGGCGG GAGACCAGCA CCAGCAGCGG CCTCAGCTCG 18180
GGCGGCAGCG ACACCGACGA CAGGACGGCC TTGTGCGTGC GCTGGTAATT TATACACTGC 18240
TCCGTGAACG CGCGCCGAAT CTTGGGATTG CGAAGGTGGC GCCGGATGCC CTCCGGCACG 18300
TCATACGCCA GGCCGTGGGT GTTGGTCTCG GCCGAGTTGA CAAAGAGGGC GGGGTGCAGA 18360
ACGCAGCGAT AGGCGAGGAG GGCCACGGCA AAGTCCGGCG AGAGCTGGTT GTTAAAGTAC 18420
TGGTAGCCCG GGACGCGGGT CACGGGGACG CCCAGGCTCG GGGCCACGTA CACGCTAACC 18480
AGCAGCTCCA GCAGCGTCTG CCCCAGGGCG TAGAGATCGA CCGCCAGCCC GACGTCGTGC 18540
TTCAGGGGGC GGTTGTTAAA CTCGGCCCGC TCGTTGTTGA GGTACTTTAC CAAGAGCTCC 18600
GGCGGCTGGT TGTACCCGTG CCCCACCAGA GTGTGAAAGT TGGCCGTGGT CAGGGCGGCG 18660
GGCATCCCAA ACCCCCGGGG GGACTCGAGG TCCGGCTCCT GGAGGCAAAA CTGGCCCCGG 18720
GATATCGTGG AGTTGGAGTT CAGGGTCACC AGGCTAAAGT CGGCCAGGAC GGCCCGCCGG 18780
AGCGACACCG CGTCCGATCG CAGCATCACG AGGACGTTGG CGCACTTGAT GTCCAGGTGG 18840
CTGATCCCGC ACCTGGTGTT CAGGAACACC ACGGCGCGCG CCAGGTCTGT GAAGCAGTGG 18900
TGGAGGGCCG TCGCGACGGA GGGGGTGGTC GCGCGCAGGG ACGCCAGCTG GCCGATGTAC 18960
TTGCCGAGGT CCATGTCGTA CGCGGGGAAC ACGATCTGGC GCTGCTGCAG CGAGAACCCG 19020
AGCGGGGTGA TAAAGCCGCG GATGTCGTGG GTGCGGCCGC CGCGAAGAGC GCACTCCCCC 19080
ACGAGCAGGG TCGCGACGAG CTCCACGGCA AACCACTCCT TTTCCCGGAT GGTCTTCACG 19140
GCGAGCTTGT GTTCGCGAAT CAACTGCACC TCGCCGTACC CCCCCGAGCC CCCGAAGCTG 19200
CGGGCCCCGG GGATCTCCAG GGTCGTGTAG CGGAGGGCGG GGTTGACGGC GAATACGGGG 19260
ATGCATAGCT TGTGGATGCG CGCGAGGGAC AGGATGTGCG AGGGGGGCGA CGGGGGCGAG 19320
GTCATGGCCG TCTCGGACCT GCGCAGGGGC GGGCGCCTCA GCTTGGCCGC AGGGCCGGGG 19380
GCCTCGGGGG ACGAGCGGCG ACGAGACGAG CGGCTCACTC GCCATCGGGA CAGTCCCGCG 19440
CGAAGCCGCT CCCGGAAGCT GGATCGGCGG CGGGACCCGG GGCGGGCTCC GGAGACGGCG 19500
CCGTCTCGGG GGGAGGGGCC GCTTGGGCGT CCGGACGCCC GGCGGCTGAG GGAGTGTATG 19560
TAGGACGCGA GCCAGGCCTT GAAGGAGCGT CGGTGTGCAC CTTGGGGGCT GATGTCAGCT 19620
GCCACATGAC TAGCAGGTCG CTGTCGCCCG GACTCATCCA TCCGTCCGCC AGGTCGCCGT 19680
CCCCCCACAG AGACGCGTTC GCCGCGGCCT CTTCGAGCTG CTCCTCCTGG TCCGCAAGAC 19740
GATCGTCCGC CGCGTCCAGG CGCTCGCTAA GCGCGGGATC GAGGTACCGT CGGTGTGCGG 19800
TTAGAAAATC ACGTCGCGCC GCTTGCTCTT CCACGCGAAT TTTAACACAG GTCGCTCGCT 19860
GTCGCATCAT CTCTAAGCGC GCGCGGGACT TTAGCCGCGC CTCCAATTCC AAGTGGGCCG 19920
CCTTGGCGGC CATAAAGGCG CCAACAAACC TAGGATCTTG TGTACTCACG CCCTCCCGGT 19980
GTAGCTGCAG GGTCTGGTCC CTGTACACCT CGGCCCGGAG GTGCGTCTCG GCCAAACGTC 20040
GGCGCAGGGC CGCGTGGCTG GCGTCTCGGC TCATCTCGCC GCCCCCGCGC GCGCCCGACG 20100
TCGGACTCCT TCGCCCCGAC CCCCCTGACC TCAGCCGCCC CCGCCTCGCC CGCGATGTTT 20160
GGCCAGCAGC TGGCGTCCGA CGTGCAGCAG TACCTGGAGC GCCTGGAGAA ACAGAGGCAA 20220
CAGAAGGTGG GCGTCGACGA GGCGTCGGCG GGCCTGACGC TCGGCGGCGA TGCGCTGCGC 20280
GTCCCTTTTT TGGATTTTGC CACCGCGACG CCCAAGCGCC ACCAGACCGT GGTCCCGGGC 20340
GTCGGGACGC TCCACGACTG CTGCGAGCAC TCGCCGCTCT TCTCGGCCGT CGCGCGGCGG 20400
TTGCTGTTTA ATAGCCTGGT GCCGGCGCAA CTCAGGGGGC GTGACTTTGG GGGCGACCAC 20460
ACGGCCAAGC TGGAGTTCCT GGCCCCCGAG CTGGTGCGGG CGGTGGCGCG CCTGCGGTTT 20520
CGGGAGTGCG CGCCGGAGGA CGCCGTGCCC CAACGCAACG CCTACTACAG CGTCCTGAAC 20580
ACGTTTCAGG CCCTGCACCG CTCCGAAGCC TTTCGGCAGT TGGTTCACTT CGTGCGGGAC 20640
TTCGCCCAGT TGTTGAAAAC CTCGTTCCGG GCCTCTAGTC TCGCGGAGAC TACGGGCCCC 20700
CCGAAGAAAC GGGCCAAGGT GGACGTGGCC ACCCACGGGC AGACGTACGG CACCTTGGAG 20760
CTCTTCCAGA AAATGATACT AATGCACGCG ACCTACTTTC TGGCCGCCGT GCTGCTCGGG 20820
GACCACGCGG AGCAGGTCAA CACGTTCCTG CGGCTCGTGT TCGAGATCCC CCTGTTTAGC 20880
GACACGGCCG TGCGGCACTT CCGCCAGCGC GCCACCGTGT TTCTAGTCCC CAGGCGCCAC 20940
GGAAAGACCT GGTTTTTGGT GCCCCTCATC GCGCTGTCGC TCGCGTCCTT CCGGGGGATC 21000
AAGATAGGCT ACACGGCCCA CATCCGCAAG GCGACCGAGC CCGTGTTTGA TGAGATCGAC 21060
GCCTGCCTGC GGGGCTGGTT TGGCTCGTCC CGGGTGGACC ACGTCAAGGG GGAAACCATC 21120
TCGTTCTCGT TCCCGGACGG CTCGCGCAGC ACGATCGTGT TTGCCTCCAG CCACAACACG 21180
AACGTAAGTA CGCCTTCCTC CCGCGGTGCC TGTTTCCCCG GTGCCGCCCT CCCCGAGATC 21240
GACCGACAGA CAAACACAGC CAGACGCGAG TGTGGGACGA CACGCCCGCA GCCCCCCCCG 21300
CCATGGCGGG GGGAAGCCTT ACTGTTTATT TGTAATCGGA CGATGAGGCT CTGGCCACGG 21360
CCCGCGCGAC CGCGGGGCAG CTCGTTGCAA ACAGGCGGCT GGTATACGAT GACAGAACGC 21420
AGAGGCGCCA CCCGGCGCTG GTCGGGCGGA TGACGCTTTC CGCGCCGTCC CGGCCCACGA 21480
CGACCTCGTG CAGGTGGGCC GTGATGCGCG GGCGGCGGGT CGCCTGCCGC AGGATAACCG 21540
CGTCCACGGG GTGCCCGAAG AGGAGCTGAC ACAGGCTCGC GTCCCCCCGG ACGGCCAGGG 21600
TGCGCTGGGC CATATTGGAC CACATGCACG GGGCGACGCA GGGACAGGCC TCCGCCACGG 21660
CGGGGGCGCG CCACAGCGCG TTGGCGGAAT CGATGTGGGC CGTCGGGGCG CAGGCGCCGC 21720
CTCCTCCCGG GGGGTCGGTA ATCCTGGATA GCAGCCATCC TAAATGGCGG GCCCGGCTGC 21780
CCGGGGGACA GAGCGACCCC AGGTCATCAT CCATGGCCCA GCAGTATATG CGGCCGCCGG 21840
GGAGGTGCCA CCAGGCCCCC GGACCCAGGG CACAGCACGC CCCCGGATTC GGGGGCGGTT 21900
CCGTGGGTAC CAGGTAGGCG CCGTCGAGCT CGTGGGCCAC GGGCTCGTCC GCGAGCTGTT 21960
CGGCGGCGGG GTCGGGGGTT TCCTCCGGGG GGGAGGCAGC TTCCAGGTGG CCGAAGGCTA 22020
GGGTGCACAG CAGCGGGGTC CGGGGGTGCG TTACGCTGCG GAGGTGGACG GTGGCGCAGT 22080
AGCGGCGCTC GCGGTTAAAG AAGAAAATGG CAAAGAACGT GTTCGAAGGC AGGCGCAGCG 22140
CCTTGGGCCG CGTCAGGTAC AGGAAGATCT CGCAGAAAAG GGCACGCTCG GGGTCGGGGT 22200
CCGGAAGGGC CACCTGGCAC AGCGGCTCGG TGAGGACCGT GAGGCACCGA AAAATCTTAA 22260
GCCGCTCGTC CCCCCGAACG ACGCGCCACA CGAAGACAGA GTTGGCGATG CGCGCGACGA 22320
GGTCGGCTTC GGGCCCCGGG TCGGGGGCGC GCGCGTCGGG GGGGGCGCCC CGGTGACCCG 22380
GCGGGGCCGC GGCTCCCGGG GGGCCTGGCG TCGCCTGGGG ACGCCAGAGT GCCCGCTGTG 22440
CCAGGTTGGT GGTGGGGAAG GGACCGGAGA CGCACCAAAA GCAGAGGGGC CAGCGCGTGT 22500
ATGAGTTGGG GGGGGGGTGG GTGAGCGGTG GAACAAAAGC ACGCGTCAGC GGACAAGGCC 22560
GGGTCCCGTA GCCGCCCCGC GACAGAACCG GAGTCCGACG GCACGCGCGA CGGGGTCTGC 22620
GAGGCTGAGG TACGCCGCGG TGTTAATGGT AAACGCAAAG CCTCCCGGAA AGACCACTAG 22680
CCCGCAGAGG CGGCGATTGA ACCCAAGGCA GAGGTACGCG TAGCTCTCTC CCGGAAGGTA 22740
TTGCTCGCAG ACCCTGTGCG GGGCAGTGGA GGGGCTGCCC TCCATGAAGC GACATTTACT 22800
CTGCTCGCGT CCATTGACGT CACCGTCAAT CACCACTGCG ATTGGACGGT TGGTAAGGCG 22860
CAGCGTGTCT CCGCTGGTGC TGTAGTAGTC AAACGCGTAG TGGGCGTCGG AGTCGGCGAA 22920
GCGGGCGGGG ATGTCGTCGC TGAGAGGGAC GAGCCGCCGC CGCCGCCCCC GACCGCCCTG 22980
GCCGCCCAGA TGCGCCAGCA CGGCCAGGGC GTACGCGGTG TGAAAGAACG CGTCGGGGGC 23040
GGTCCCCTCG AGGGCGCGCA TCAGGTTCTC CAGGAGCACG GGGAAGCGCC GCGTCACCTC 23100
CCCTAGCCAC TCGCTCTGGT GGGGGCCAAA GTCGTAGCGC AGGCGCTGGA AGATGCGCGG 23160
GCCGCCTTGG AGCGCGGCCC GGATAGAGTG GCCCAGGGCC CGCAGACACG CGATCTGGAT 23220
GCGCGCGACG AAGGCCACCT CGGCCGCGAT GTCAAAGGGC TGCAGCACGG GGCGCGGGTG 23280
GCGCAGGGGT CCCTCGAGCG CGGGAAAGCG ACGCAGCAGC GCCGTCTGGG CCGCGGGGGA 23340
CAGCTGGTGG GGGCGCACGA CGCGCTCGGC GGCACAGGCC TCCGTCAGGG CCGTGGCCAG 23400
CTCGGAGGAC AGCCGCGGGG GGCGGGCGCG TCGCCCGCCC CACGCCACCG AATTCTCGTA 23460
GGAGACGACG ACGAAGCGCT GCTTGGTCCC GTAGTGATGG CGCAGGACCA CGGAGATGGA 23520
GCGACGGCTC CACAGCCAGT CGGGCCGGTC GCCGCCGGCC AGAGCTTCCC ACCCGCGGTC 23580
CAGCCACTCG ACCAGCGATC GCGGCTTGGC GGTCCCCGGC ACGAGGGTGA GCACGTCGTT 23640
GAGGACGTCC TCGCCCGCGG CCCGGGGGCC CCCCGGGGTG GCAAAGCGCC CCCCGCCGGG 23700
CGGCTCCAGG CCCGCCAGCA CCGCCTCCGC GTCCGACGCG CCCAGGGCTC CCCCGCTGAC 23760
GGCCTGGTGG ACCAGGGCGC CCTGGCGGAG CCCCGAGGCG ACGCCGGAGG CCGCGTGCTT 23820
GGGGCGCGCG CGGACCGGGT GGCGGCGGGT GACGTCCTGC ACGGCCCGCT GGACCAGCGC 23880
GAGGATCTCC TCGTTCTCTT GCGTGATGGA CACGTCCTCC GCGGTGGCCG TGTCGCCTCC 23940
CGGGGCCGTG AGCTGCTCCT CCGGGGAGAT GGGGGGGTCT GGGGTGCCGA CAACGGCCGG 24000
CCCGGCCCCG CCCGAGACCG AGGACGCCTG GGGAGTGGGG GTGCCGCTTT CCCCCATCCC 24060
CAGGGACAGG TGGGCCGCCG CCTCCGTCGC GGCGGCGGGA GCCGCGGCCC CCAGCCGCGC 24120
GACGTAGCGA CAAAAGTGGC GACAGAGGCG CATGAGGCGC GCGCCGTCGG CCGCGTATCG 24180
CGTGTTTGGC GGGACGAGCT CGTCGTAACT GAACAGGAGC ACGCGGGCGC AGGTCGCCCA 24240
CGGGCCCCAC GCCAGGCGCA GCGCCGCGAC CGTGTACGGG TCGTACACGC CTTGGGCGTC 24300
GCACGCGACC GGCAGGGAGA CGAACAGCCC GCCCGCGCTG GGGACGCGCG GCAGGAGGTC 24360
CGGGTGCGCC GGGATGACGG GGGCTAGGAT CGCCCCCACC GCATCCGCCG GCACGTAGGC 24420
GGCAAACGCC GAACGCCACG GGGTGCAGTC GCCGGTCGCG TGGGCCCGGG TCTGGGTTTC 24480
GACCCGGAAG TTCGCGGCCG CCCCGCCGTC GGGGCGGCCG CGCACGAGGG CGGACAGCGG 24540
GACCCCCGCC GCCGCCAGGC ACTCGCTGGA GATGATGACG TGAATCAGCG AGGCGGGGCT 24600
GCTCGGGTCC CGGGTGAGAT CGTATTGGAC CTCGTTGGCA AAGTGCGCGT TCATGGCCCG 24660
GCCGGCGGTG CGAGCCCTTC CCGGTGCCGG AAGGGGCGTG GGTGGGGGGT GCGTGTGCGC 24720
GTCCTCGGGG CCCGCGGGCG CACGTGCGCT TATACGCTGT GTGTTTCGTC TGTCCCCAGG 24780
GAATCCGGGG CCAGGACTTT AACCTGCTTT TCGTCGACGA GGCCAACTTT ATTCGCCCGG 24840
ATGCGGTCCA GACGATTATG GGCTTTCTCA ATCAGGCCAA CTGCAAGATC ATCTTCGTCT 24900
CGTCGACCAA CACCGGGAAG GCCAGCACGA GCTTTTTGTA CAACCTCCGC GGGGCCGCCG 24960
ACGAGCTGCT CAACGTGGTC ACCTATATAT GCGACGACCA CATGCCGCGG GTGGTGACGC 25020
ACACCAACGC CACGGCCTGT TCCTGCTATA TCCTGAACAA ACCCGTGTTT ATCACGATGG 25080
ACGGCGCCGT TCGCCGGACG GCCGATCTGT TTCTGCCCGA CTCCTTCATG CAGGAGATCA 25140
TCGGGGGGCA GGCCCGCGAG ACCGGCGACG ACCGGCCCGT CCTAACAAAG TCGGCGGGGG 25200
AGCGGTTTCT GCTGTACCGC CCCTCCACCA CCACCAACAG CGGCCTGATG GCCCCCGAGC 25260
TGTACGTGTA CGTGGACCCG GCGTTCACGG CCAACACGCG CGCCTCCGGC ACCGGCATCG 25320
CGGTCGTCGG GAGGTACCGC GACGATTTCA TTATCTTCGC CCTGGAGCAC TTTTTCCTCC 25380
GCGCGCTCAC GGGATCGGCC CCCGCGGACA TCGCCCGCTG CGTCGTGCAC AGCCTCGCCC 25440
AGGTGCTGGC GCTGCACCCC GGGGCGTTTC GCAGCGTTCG CGTGGCGGTC GAGGGCAACA 25500
GCAGCCAGGA CTCGGCCGTG GCCATCGCCA CACACGTGCA TACCGAGATG CACCGCATCC 25560
TGGCCTCGGC GGGGGCCAAC GGCCCGGGGC CCGAGCTCCT CTTCTATCAC TGCGAGCCGC 25620
CCGGCGGCGC GGTATTGTAC CCCTTCTTTC TGCTCAACAA ACAGAAGACG CCCGCCTTCG 25680
AATACTTTAT CAAAAAGTTC AACTCCGGGG GCGTCATGGC GTCCCAGGAG CTCGTCTCCG 25740
TGACGGTGCG CCTGCAGACC GACCCGGTCG AGTATCTGTC CGAGCAGCTC AACAACCTCA 25800
TCGAAACCGT CTCTCCCAAC ACCGACGTCC GCATGTACTC CGGAAAACGC AACGGTGCCG 25860
CGGACGACCT CATGGTCGCG GTCATCATGG CCATTTACCT GGCGGCCCCG ACCGGGATCC 25920
CCCCGGCCTT TTTTCCGATC ACGCGCACGT CTTGAGTCTT TCTTGCCGTT TCTTTTGTTT 25980
CTCTTTCTTT CCCCCCTCTC TCCGCAATAA ACGCCTTCCC GGAACTGTGT TTTCCCCCCC 26040
TACAACAGTG TTGTCCGTTG GTTGGGTGGT TGGGGTGCGG GGGTGGGCGG GGGAAGCAAG 26100
AAAACGGTCG GCGAACACAA CATCGGGAAA ACGGATTCCC GAACGTGCGT CTTCCCAGAT 26160
TCGACACACA CCCCCCTTCT CCTTAAATAA ACACAAACCA CACGCTCGTT GGTTGGTTAA 26220
TGCCGGCGCT TTATTTACGT CTTGTTTTTT TGCGTTTCCT CCGCGGGTCC CTTCCCAACA 26280
CGCCTGCCCC CGCCTCAGGG GTAGCGGATA ACCGGGGCCA TGTCGCCGGA TTGCACAACG 26340
GCGGCGCCGT CGAACGTACA CACCCGAACC GCCGGGGCCA GGGCCAGGAT GTCCCCGAGT 26400
TGGCCCGCGT GCGCCAGCCA GGCGACCAGC GCCTCGT AA GCGGCAGCCT GCGCTCGCCG 26460
TCCTGCATCA GCATGGGGGC TTCGGGGTGG ATGAGCTGGG CGGCTTCTCG CGTGACGCTC 26520
TGCATCTGCA GGAGCGCGTT CACGTATCCG TCCTGGGCGC TCAGCGCGAG CAGCCGGGGG 26580
ATGAGCGTGA GGATGAGGGT GGTTCCTTCG GTTATGGAGT AGACCATGTT GAGGACGAGC 26640
GACCGCAGCT CGGTGTTTAC GGAGGCGAGT TGCTGGACGT CGGCCACGAG CGAGAGACGG 26700
GCCCCGTTGT AATACAGCAC GTTGAGGTCG GGGAGCTCCC CGGGCGTCCG GGGGTCGGGG 26760
TTGAGGTCCC GGATGCCCCG GGCGACCAGC CGCGCGACTA TCTCGCGGGC CAGGGGCGTT 26820
GGGAGCGGGA CCGGAAACCG CAGCGTGAGG TCCAGCGACT CCAGGCGCAC GTCCGTCGCC 26880
TGGCCCTCGA AGACGGGCGG GACGAGGCTG ACGGGATCCC CGTTGCAGAG GTCGACGGGG 26940
GAGGTGTTGC GGAGATTGAC GGTGCCGGCG TGCGTGAGCC CCAGGTCCAC GGGGCAGGCG 27000
ACGATTCGCG TGGGCAGCAC CCGCGTGATT ACCGCGGGGA AGCGCCTGCG GTACGCCAGC 27060
AACAACCCCA ACGTGTCGGG ACTAACTCCT CCGGAGACGA ACGATTCGTG CGCCACGTCC 27120
GCGAGCGCCA GCTGGCGGCG GATGGTCGGC AGAAAGACCA CTCGACCCTC GCACCGCTGC 27180
AGCGCCGCGG CATCGGGGCG CGAGATACCC GAGGGGATCG CGATGTCTGC TTCGAAACAA 27240
TCCGTGATCA TGGCGCCGGG CCGCGAGACA CCGGAACGCG GGGGTGCGGG AGGGCCGGAA 27300
AGCGCAACGC AACCGGGACG ATGATGAAAC AGAGATGGGG GGCACCGACC GTGTGGGAGA 27360
GGGGGCGGGG CAGGGCTCAG CAGCACGCAC GGGGAGGTCT GTCGTGCGCA GGAGCCCCAG 27420
GTGAGAATCA GTCCCCCGGA GCTCGGGTCT GGGTTTTATT GGGACCTGCC CTCGGAATCG 27480
CGGCTCCCAG TCCAAGCCCC CCCGGGGGGG CGGGGACAGG GGGTGTGTGT GGGTAAAAGC 27540
AACGTCGGAA AATCAAACCC AATGCCCCAA ACAGGAAAAA AAAAAAAGAC GGGCGGGTGG 27600
AGGGAAAGCT GGGGAAGAAG AAGCCAATTT TACAGAGACA GGCCCTTTAG CGGGGAGGCG 27660
TCGTAGATGA GATACTGCGT AAAGTGGGTC TCTCGCGCGT GGGCCTCCCC ATCGCGGGCG 27720
CTGCGTAGCA GGGCGGGGTC GCTGGCGCAG GTGATCGGGT AGGCTTCCTG AAACAGGCCG 27780
CACGGGTCTT CCACGAGCTC GCGGCACCCC GGCGGGCGCT TAAACTGCAC GTCGCTGGCA 27840
GCGGTGGCCG TGGATACCGC CGATCCCGTT TCCACGATAA GACGCTCCAG GCAGCGATGT 27900
TTGGCCGTGA TGTCGGCCGC GGTGAAGAAC TTGAAGCAGG GGCTGAGGAC GGGCGAGGCC 27960
CCGTTGAGGT GATAGGCCCC GTTGTACAGC AGGTCCCCGT ACGAGAACCG CTGCGACGCC 28020
CACGGGTTGG CCGTGGCCGC GAAGGGCCGC GCCGGGTCGC TCTGGCCGTG GTCGTACATG 28080
AGGGCTATGA CGTCCCCCTC CTTGTCCCCC GCGTACACGC CGCCGGCCGC GCGTCCCCGC 28140
GGGTTGCAGG GCCGGCGAAA GTAGTTGATG TCCGTGGCCA CGGGGGTGGC GATGAACTCA 28200
CACACGGCAT CCTGCCCGTG GTCCATGCCG GCGCGCCGCG GCACCTGGGC GCAGCCAAAG 28260
ACCGGGAGGG GCTGGGCCGG CCCCAGCCGG TTTCCCGCCA CGACCGCGTT GCGCAGGTAC 28320
ACGGCGGCCG CGTTGTCTAG CAGCGGGGGG GCCCCGCGGC CGAGGTAAAA GTTTTGGGGG 28380
AGGTTGCCCA TGTCCGTAAC GGGGTTGCGG ACGGTGGCCG TGGCCGCGAC GGCGGTGTAG 28440
CCCACACCCA GGTCCACGTT TCCGCGCGGC TGGGTGAGCG TGAAGCTGAC CCCCCCGCCC 28500
GTTTCGTGGC GGGCCACCTG GAGCTGGCCC AGAAAGTACG CCTCCGACGC GCGCTCGGAA 28560
AACAGCACGT TCTCGGTCAC GAAGCGGTCC TGCCGCACGA CGGTGAACCC GAACCCGGGG 28620
TGGAGGCCCG TCTTGAGCTG GTGATACAGG GCCACGGGGC TCATCTTGAA GTACCCCGCC 28680
ATGAGCGCGT AGGTCAGCGC GTTCTCCCCC GCCGCGCTCT CGCGGGCGTG CTGCACCACG 28740
GGCTGGCGGA TGGAGGAGAA GTAGTTGGCC CCCAGGGCCG GGGGGACCAG GGGGACGTGG 28800
CGCGCCAGGT CGCGCAGGGC CGGGGGGAAG TTGGGCGCGT TGGCCACGTG GTCGGCGCCC 28860
GCAAACAGCG CGTGGACGGG CAGGACGTAG AAGTATTCGC CATTTTGGAT GGTGTGGTCC 28920
AGGTGCTGGG GGGCCATGAG CAGCACGCCG GCGTGCAGCG CCCCGTCGAA GATGCGCATG 28980
TTGGCCGTCG ACGCGGTGTT GGCGCCCGCG TCGGGCGCCG CGGAGCACAG CAGCGCCGTC 29040
GTGCGCTCGG CCATGTTGTG CGCCAGCACC TGCAGCGTGA GCATGGCGGG CCCGTCGACG 29100
ACGACGCGCC CGTTGTGGAA CATGGCGTTG ACCGTGTTGG CCACCAGATT GGCGGGATGC 29160
AGCGGGTGGG CGGGGTCGGT CACGGGATCG CTCGGGCACT CCTCGCCGGG GGCGATCTCC 29220
GGGACCACCA TGTTCTGCAG CGTGGCGTAC ACGCGGTCGA AGCGGACCCC CGCGGTGCAG 29280
CAGCGCCCCC GCGAGAAGGC CGGCACCAGC ACGTAATAGT AGATTTTGTG GTGGACGGTC 29340
CAGTCGGCCG GCCGGTGCGG CCGGTCGTCG GCGGCGTCGG CCGCGCGGGC CTGGGTGTTG 29400
TGCAGCAGCC GGCCGTCGTT GCGGTTAAAG TCGGCCGTCG CCACGTTGCA CGCCGCCGCG 29460
TAGACGGGCT CGTGTCCCCC CGCGTCAATC CGGCAGTCTC GGTGGCGGTC CAGGGCCGCG 29520
TGTCGCATAA GGCCGTCGCA GTCCCACACG AGGGGCGGCA GCAGCGCCGG GTCGCGCATC 29580
AGGTGATTCA GCTCGGCCTG AGCCTGCCCG CCCAGCTCCG GGCCCGGCAG GGTAAAGTCG 29640
TCCACCAGCT GGGCCAGGGC CTCGACGTGG GCCACCAGGT CCCGATACAC GGCCATGCAC 29700
TCCTCGGGGA GGTCGCCCCC GAGGTAGGTC ACGATGTACG AGACCAGCGA GTAGTCGTTC 29760
ACGAACGCCG CGCATCGCGT GTTGTTCCAG TAGCTGGTGA TGCACTGAGT CACGAGCCGC 29820
GCCAGGGCGC AGAACACGTG CTCGCTGCCG TGAATCGCGG CTTGCAGCAG GTAAAACACC 29880
GCCGGGTAGC TGCGGTCCTC GAACGCCCCG CGGACGGCGG CTATGGTAGC CGGCGCCATG 29940
GCGTGGCGGC CAACGCCGAG CTCCAGGCCC CGGGCGTCAC GAAACGCCAC CGGACACAGC 30000
GCCAGGGGCA GGTTGCCGTT GACCACGCGC CAGGTGGCCT GGATCGCCCC CGGACCGGCC 30060
GGGGGGACTT CGCCGCCGGG AAGCTCGACG TCGGCCACGC CCGCGAAGAA GTCGAACGCG 30120
GGGTGCAGCT CCAGAGCCAG GTTGGCGTTG TCGGGCTGCA TGAACTGCTC CGCGGTCATC 30180
TGGCACTCGG CGACCCACCG GACCCGGCCG TGGGCGAGGC GCTGCCGCCA GGCGTTCAGA 30240
AAACGCTGCT GCATGTCCGC GCCGGGGCCG GCCGGGGCCG CGACGTACGC CCCGTACGGA 30300
TTCGCGGCCT CGACGGGGTC GTGGTTCACG CCCCCGACGG CCGCGTCGAT GTTCATGAGC 30360
GAAGGATGAC ACACGGTCCC GACCGCGTTC TCCATGGACA GCCGCAGAAC CTGGTGGTCC 30420
TTTCCCCAAA AAAACAGCTG CCGGGGAGGG AACGCGCGGG GCTCCGGGTG GCCGGGGGCG 30480
GGCACCAGGT CCCCGGCGTG CGCGGCGAAG CGCTCCATGG CCGGGTTGAA CAGCCCCAGG 30540
GGCAGGACGA ACGTCAGGTC CATGGCGCCC ACCAGGGGGT AGGGCACGTT GGTGGCGGCG 30600
TAGATGCGCT TCTCCAGGGC CTCCAGGAAG ACCAGCCTGT CGCCTATGGC CACCAGATCC 30660
GCGCGCACGC GCGTTGTCTG GGGGGCGCTT TCGAGTTCAT CCAGCGTCTC CCGGTTCGCC 30720
TCGAGTTGCT CCTCCTGCAT CTCCAGCAGG TGGCGGCCCA CGTCGTCCAG GCTCCGCACG 30780
GCCTTGCCCA TCACCAGCGC CGTGACGAGG TTGGCCCCGT TCAAGACCAT CTCGCCGTAG 30840
GTCACCGGCA CGTCGGCCTC GGTGTCCTCC ACCTTCAGGA AGGACTGCAG GAGGCGCTGT 30900
TTGATGGCGG CGGTGGTGAC CAGCACCCCG TCGACCGGCC GCCCGCGCGT GTCGGCGTGC 30960
GTCAGGCGGG GCACGGCCAC GGACGGCTGC GTCGCCGTGG TCAGGTCCAC GAGCCAGGCC 31020
TCGATGGCCT CGCGGCGATG GCCCGCCTTG CCCAGGAAGA AGCTCGTGTC GCAAAAGCTC 31080
CGCTTCAGCT CGGCGACCAG GGTCGCCCGG GCGACCCTGG TCGCCAGGCG CCCGTTGTCG 31140
AGATATCGTT GCATGGGCAA CAGCAGGGCC AGGGGAGGCG CCTTCTCCAA CAGCACGTGC 31200
AGCATCTGGT CGGCCGTGCC GCGCTCAAAC GCCCCCAGGA CGGCCTGGAC GTTGCGCGCG 31260
AGCTGCTGGA TGGCGCGCAG CTGGCGATGC AGGCTAATGC CCGTCCCGTC CAGGGCCTCC 31320
CCCGTGAGCA GGGCAATGGC CTCGGTGGCC AGGCTGAAGG CGGCGTTCAG GGCCCGGCGG 31380
TCGATGACCT TCGTCATGTA ATTATGCACG GGCTGCTCGA CGGGGTGCGG GCCGTCGCGG 31440
GCGATGAGGG GCTGGTGGAC CTCGAACTGC ACACGCCCTT CGTTCATGTA AGCCAGCTCC 31500
GGGAACTTGG TGCACACGCA CGCCACGGAC AGGCCGAGCT CCAGAAAGCG CACGAGCGAC 31560
AGGGTGTTGC AGTAGGACCC CAGCAGGGCG TCAAACTCTA CGTCATACAG GCTGTTTTCG 31620
TCGGAGCGCA CGCGGGCGAA AAAATCAAAG AGTCTGCGGT GGGACGCCAC CTCGATCGTA 31680
CTCAGGATGG AGCCGGTGGG CACCATGGCC GCGGCGTACC GGTAACCCGG GGGGTCGCGG 31740
GCAGGAGCGG CCATTGGGTT CCTTGGGGGA TTCGCAGGCT CCATCAAGCC AAGCTCGGGA 31800
AGGCCAAGCC CCTCCCACAC AACGCCTCAC CGCCGGCGGA CGCGACTAAC AACCCACGGG 31860
CCGCCAAAAA CCCCAAGGGG CAACCCGACC AACAACAGGC GAGGGGAGGA AAGGCGTAAA 31920
GGGGGCGTTG GGAGGCAAAA AGAAAGAAAA CACCCAGACG TAGGCCCGAG GACCGGCCGG 31980
CGTCCTCTGT CCCCGAGCAC CCACTGTGCC CAACAGGCAC GGGGGCGAGC TGCCCCTGCC 32040
TTATATACCC CCCCGCCACA CCCCCGTTAG AACGCGACGG GTGCCTTCAA GATGGCCCTG 32100
GTCCAAAAGC GTGCTAGAAA AAAGTTGGTA AAGGCGGCAA AGCAGTCCGC CGCCGCCACC 32160
CACATGGCGG CGCCGGCCGC GCAGGCGATT CCCAGAGAAC GGGCGCGGAG GGGATCCGTG 32220
CGGGGCAGCA GCTGGCTGGC GGTGATCCAA TGGAAAAGCC CGTCGGGACT GAACGTCTCA 32280
TGGGCGGCCG CCACCAGGGC GCACAGGGCC GCGCCGCCCA TGATCACGCA CAACCCCCAA 32340
AACACGGGTG GCGACAACGG CAGGCGATCC CGTTTGATGT TCACGTACAG GAGGAGCGCC 32400
CGTGCCAGCC ACGTGACATA GTAGGCGAGG ACGGCGGCTA TAATACATGC CGGCGCCACC 32460 GCCCGTCCGG TCCACCCGTA ATACATGCCC GCGGCCACCA GCTCCAGCGG CTTGAGGACC 32520
AGGAACGACC AAGCAAACAT CACCACCCGC TTGGAAAAGA CCGGCTGGGT GTGGGGCGGA 32580
AGACGCGAGT AGGCCGAACT GACAAAAAAA TCAGACGTGC CGTACGAGGA CAGCGAAAAC 32640
TGTTCATCGA GCGGCAGTTC GCCGTCCTCC CCGCCACACG CGGCCTCGTA TACCAGCTCG 32700
CGATCCAACA AAGGAACATC ATCCCGCATT GTCATGGTCG GTGCGGGGAG CCGGCGAGGC 32760 AGCGAAACCG AAAGTAGTGC TGGCGGCGCG GGCCCGGGTC CGGACCCAAG CTTCAGGGAT 32820
GGGGGGCGGA GGCCAAAATC AAACAAGCAC CGCGCGGGTT CTACACACAA CCCCCACCCG 32880
GGTAGTATCC GCGGATGCGA GTGCCTGGCG AAGTCACGTC CCAGCAGGAT ATAAACCTCG 32940
GCCGTTGGGC CCGGAACCCC CGAAATTCAC ACCCACGCCC TGACGCCCAA ATCATGGGTG 33000
GATGTGGTTC GCGAGCCGCA CATCCGTGCG TCCGCCCTCC CCCGCGGGCT GATGACGTGG 33060 CGGTTAGTCA GTGGGAAGGC AGGGGGAAAG ATGGGTTGGG GGAGGAAACG AAGAAAACAC 33120
CCAGAGGGCC ACGTCGGGAA TGCGCCCGGA GTTGTCCTTA AAAGGCCGGC CGTGCGTGAC 33180
GGAAGCCGTC GTTTGCCCAA GCACCGACGC CGCGATCCAC AGTGGGGGGA GTTCCTCCGT 33240
CCGGCCACAA CCCTACGCGC GGGCGGCACG CGCGAGAGCA ACCCACGGGT CCCGTTCGCG 33300
CCACCGCCAG CCCTTGCTCC CACCACCCTC CTCCCACCAC CCCACTATTC CCCCCCCCCC 33360 AAGTCCGCCC CGTGGCTCGC CGGCCATGGA GCTCAGCTAT GCCACCACCC TGCACCACCG 33420
GGACGTTGTG TTTTACGTCA CGGCAGACAG AAACCGCGCC TACTTTGTGT GCGGGGGGTC 33480
CGTTTATTCC GTAGGGCGGC CTCGGGATTC TCAGCCGGGG GAAATTGCCA AGTTTGGCCT 33540
GGTGGTCCGG GGGACAGGCC CCAAAGACCG CATGGTCGCC AACTACGTAC GAAGCGAGCT 33600
CCGCCAGCGC GGCCTGCGGG ACGTGCGGCC CGTGGGGGAG GACGAGGTGT TCCTGGACAG 33660 CGTGTGTCTG CTAAACCCGA ACGTGAGCTC CGAGCGAGAC GTGATTAATA CCAACGACGT 33720
TGAAGTGCTG GACGAATGCC TGGCCGAATA CTGCACCTCG CTGCGAACCA GCCCGGGGGT 33780
GCTGGTGACC GGGGTGCGCG TGCGCGCGCG AGACAGGGTC ATCGAGCTAT TTGAGCACCC 33840
GGCGATCGTC AACATTTCCT CGCGCTTCGC GTACACCCCC TCCCCCTACG TATTCGCCCT 33900
GGCCCAGGCG CACCTCCCCC GGCTCCCGAG CTCGCTGGAG CCCCTGGTGA GCGGCCTGTT 33960 TGACGGCATT CCCGCCCCGC GCCAGCCCCT GGACGCCCGC GACCGGCGCA CGGATGTCGT 34020
GATCACGGGC ACCCGCGCCC CCAGACCGAT GGCCGGGACC GGGGCCGGGG GCGCGGGGGC 34080
CAAGCGGGCC ACCGTCAGCG AGTTCGTGCA AGTGAAGCAC ATCGACCGTG TTGTGTCCCC 34140
GAGCGTCTCT TCCGCCCCCC CGCCGAGCGC CCCCGACGCG AGTCTGCCGC CCCCGGGGCT 34200
CCAGGAGGCC GCCCCGCCGG GCCCCCCGCT CAGGGAGCTG TGGTGGGTGT TCTACGCCGG 34260 CGACCGGGCG CTGGAGGAGC CCCACGCCGA GTCGGGATTG ACGCGCGAGG AGGTCCGCGC 34320
CGTGCATGGG TTCCGGGAGC AGGCGTGGAA GCTGTTTGGG TCGGTGGGGG CTCCGCGGGC 34380
GTTTCTCGGG GCCGCGCTGG CCCTGAGCCC GACCCAAAAG CTCGCCGTCT ACTACTATCT 34440
CATCCACCGG GAGCGGCGCA TGTCCCCCTT CCCCGCGCTC GTGCGGCTCG TCGGTCGGTA 34500
CATCCAGCGC CACGGCCTGT ACGTTCCCGC GCCCGACGAA CCGACGTTGG CCGATGCCAT 34560 GAACGGGCTG TTCCGCGACG CGCTGGCGGC CGGGACCGTG GCCGAGCAGC TCCTCATGTT 34620
CGACCTCCTC CCGCCCAAGG ACGTGCCGGT GGGGAGCGAC GCGCGGGCCG ACAGCGCCGC 34680
CCTGCTGCGC TTTGTGGACT CGCAACGCCT GACCCCGGGG GGGTCCGTCT CGCCCGAGCA 34740
CGTCATGTAC CTCGGCGCGT TCCTGGGCGT GTTGTACGCC GGCCACGGAC GCCTGGCCGC 34800 GGCCACGCAT ACCGCGCGCC TGACGGGCGT GACGTCCCTG GTCCTGACCG TGGGGGACGT 34860
CGACCGGATG TCCGCGTTTG ACCGCGGGCC GGCGGGGGCG GCTGGCCGCA CGCGAACCGC 34920
CGGGTACCTG GACGCGCTGC TTACCGTTTG CCTGGCTCGC GCCCAGCACG GCCAGTCTGT 34980
GTGAGATATC CCAATAAAGT GCAGTCGTTT TCTAACCCAC GGATGCCGTT GTATGCCTAT 35040 ACGGGGGACT ATGGGGGGGG GGGGAAAGGA AAGGAAACAG GAATGGAGAA GGGAAAGGAA 35100
CAGAGGCGGT AGCGGACGCA CGGCGGACAC AATAACAAAC AGACCGCGGA CACGGAGGGA 35160
GTCGGTTGGG TTGGGCGTGG ACGCCGCTGC GTCCACACAC CCGTTTATTC GCGTCTCCAC 35220
AAAAATGGGA CGCACGTTCG GACCACCCTG AGGATGCCCG CCAGGGCCGC GGTGATCATA 35280
ACGACCCCCA GCGCGGACGC GGCCAGAAAC CCGGGGGCGA TGGTGGCGAT GGGCAGCGTG 35340 TCAAAGGCCA GCAGATGAAT CACAGTTCCG TTGGGGAACA ACAACAGGGC CACGGACGGC 35400
ACGTCGCTGG AAAACACGTT CGGGGTGCCC GCCACCGGCC CCTGGGCCAG CTGCTGTTGG 35460
GTGGCATCCG TGTCCACCAG CAGCACCGAC ATGACCTCCC CGGCCGGGGT GTAGCGCAGA 35520
AACACGGCCC CCACGAGGCC GAGGTCGCGC CGGTTTTCGG TGCGCACCAG CCGCTTCGGC 35580
TCAATCTCCC GCGCGTGCCC TTCGCAGGTG GCGGTGAGAT AGGTGATAAA CAGCGGGCGG 35640 CGGACGTCAA CGCCCGTAAG CTTGTATCCG ATCCCGCGGG GCAAGGGGGT GTGGGTGACG 35700
ACGTAGCTGG CGTTGTGGGT GATGGGCACG AGGATCCGGG GCTCCGCGTT GTGCGACGGG 35760
CCGCTACACT GGTGGGTGGC CTCCGGGACG AAGGCGCGGA TCAGGGCGTT GTAGTGCGCC 35820
CAGCGCGTGA GAACGGAGGC CACGCCGCGG GTCTGTTGTG CCATGACGTC CGCCGGGATG 35880
TCGGATCGGG TGGCCATGGC CAGCGCGTCC AGGATGAACC CGCCCTCGGC GAGATCGAAG 35940 CGCAGGGAAG CTGCGCATGG GGAAAAGTGG TCCGGGAGCC AGAAGAGGTT TTTCTGGTGG 36000
TCGGTCCTGG CTAGCGCGGC CCGGAGATCG GCGTGGGTCG CCGCGGCGAC GTCGGACGTA 36060
CACAGGGCCG TGGTTATGAG GAGGCCCCGG CGGGCGCGTT CCCGCTGCTC GGCCGAGGGC 36120
GCGCCCGCCA GGAACGGCGC CCGGAGGACG GCCGTGGCGT AAAACAGCGC TCGGCGGACC 36180
ATCGGGGCGG TTAGCGCGCG GCCGCCGAGA AACTCGGCGT ACAGGGCGTC GATCAGGCGG 36240 GCCGCGCTCG GGGCCACCGC GCCATAGGCC GCGGGGCTGT CCAACACGAA CGCCAGCTGA 36300
TAGCCCAGCG CGTGCGCCGC CAGGCTCTGC TCTCGCTCGA GGATCGCGGC CACCAGATGC 36360
CCGAGGCGCG CCTCCAGCCG CAGGCGGGCC GCCGGGTCCA ACACGGACAC GTTCAGGAAC 36420
ACCGAGTCGG CCGCGCAGCC CGCTGCTCCC CGGGCGGCCA GGCCGGCCAG CACGCGCGAG 36480
TGGGCCAAAA AGCCCAGCAG GTCGGAGAGG CGAATCGCGT CGTGGGCGTG GGCCGCGTTG 36540 ACGAACGCAA ACCCCGACGA GGCGAGCAGC CCCGCGAGGC GCCAGAACAG GGACGGACGC 36600
GCGTCCGTGC CGGAGCCCGG GTCCTCCCCC AAAAACTCCG CATAGGCCCG CGACATATAC 36660
TGGGCGTAGT TCGTGCTCTC CTCGGGGTAG CCGGCCACCC GCCGGAGGGC GTCCAGCGCC 36720
GAGCCGTTGT CGGCGGGCGT CGGGGCCCCC AGGACAAAGA CGCGATACCT GGGGCCGGCC 36780
GGAGGCCCGG GGAGCACCGC GGGGGCGTTT TCGTCGGTCG GATTTCCGAC CCGAGCGAGG 36840 GTCTTGTCCG CAGGCACCAC TATGATCTCG GCCGGAGGGC TGTCCCGCAT CGATATCACG 36900
AGCCCCATGA AGCCCTTCCC GTATCGCGCG CGCACGAGCG CGGCGTCGCA CCCGAACGCC 36960
AGCCCGCCCG TCGTCCAGAC GCCCACGGGC CACGTCGAGG CCGACGGGGA GAGGTACACG 37020
TACCGACCCG GAGTCCGTAG CAGGCCCCTG GCGGCCAGCC AGGTCACGGA TGCGTTGTGC 37080
AGATGCGCGA TGCTCAGGTT CGTCGTCGGA TGCCTCGGTG TCCCCGCGGG CGGCCCCGGG 37140 GGCGGCGCGT TGCGTCGGCC GTCCGGGTGC CTCTCGGTCG CCCCGTCGTC TCCCCGCGGG 37200
AACGTAAGCC CCTCGCGGTC CGGCGCGGCC GCGAATGTTA CCCAGGCCCG GGACCGCAAC 37260
AGCGCGGAGG CGCCGGGGTT GTGCGACAGT CCCTTGAGCT GGGTCACCTC GGCGGGGGGA 37320
CGGGACGTGG GCCCCGCCTC GGGGAGCTCG GGCAGGCTCG CGTTCCGAGG CCGGCCGAGC 37380 AGATAGGTCT TTGGGATGTA AAGCAGCTGC CCGGGGTCCC GAGGAAACTC GGCCGTGGTG 37440
ACCAACACGA AACAAAAGCG CTCGGCGTAC CACCGAAGCA TGGGCACGGA TGCCGTAGTC 37500
AGGTTGAGTT CGCCCGGGGG CGCCAAGCGT CCGCGCTGGG GGTCGCTGGC GTCGGGGGTG 37560
TTGGGCAACC ACAGACGCCC GGTGTTTGTG TCGCGCCAGT ACGTGCGGGC CAACCCCAGA 37620 CCGTGCAAAA ACCACGGGTC GATTTGCTCC GTCCAGTACG TGTCATGGCC CCCGGCAACG 37680
CCCACCAGGA CCCCCATCAC CACCCACAGA CCGGGGCCCA TGGTCGTCCG TCCCGGCTGC 37740
CAGTCCGCAG ATGGGGGGGT GTCCGTACCC ACGGCCCAAA GAGGCTCCGC ACCTCGGAGG 37800
CTATCGGAGG CCCTTTGTTG CCGTAAGCGC GGGCCAAAGG ATGGGGTGGG GTGAGGGTAA 37860
AAGCACAAAG GGAGTACCAG ACCGAAAACA AGGACGGATC GGCCCGCTCC GTTTTTCGGT 37920 GGGGTGCTGA TACGGTGCCA GCCCTGGCCC CGAACCCCCG CGCTTATGGA CACACCACAC 37980
GACAACAATG CCTTTTATTC TGTTCTTTTA TTGCCGTCAT CGCCGGGAGG CCTTCCGTTC 38040
GGGCTTCCGT GTTTGAACTA AACTCCCCCC ACCTCGCGGG CAAACGTGCG CGCCAGGTCG 38100
CGTATCTCGG CGATGGACCC GGCGGTTGTG ACGCGGGTTG GGATCATCCC GGCGGTGAGG 38160
CGCAACAGGG CGTCTCGACA CCCGACGGGC GACTGATCGT AATCCAGGAC AAATAGATGC 38220 ATCGGAAGGA GGCGGTCGGC CAAGACGTCC AAGACCCAGG CAAAAATGTG GTACAAGTCC 38280
CCGTTGGGGG CCAGCAGCTC GGGAACGCGG AACAGGGCAA ACAGCGTGTC CTCGATGCGG 38340
GGCAGAGACC CCGCGCCGTC CTCGGGGTCG GGGCGCGGGG TCGCCGCGGC GACCCCCGTC 38400
AGCCGGCCCC AGTCCTCCCG CCACCTCCCG CCGCGCTGCA GGTACCGCAC CGTGTTGGCG 38460
AGTAGATCGT AGACACGGCG AATGGCGGAC AGCATGGCCA GGTCAAGCCG CTCGCCCGGG 38520 CGTTGGCGTC TGGCCAGGCG GTCGGCGTGT TCGGCCTCCG GAAGGACACC CAGGACCAGG 38580
TTCGTGCCGG GCGCGGTCGG GGGCATGAGG GCCACGAACG CCAACACGGC CTGGGGGGTC 38640
ATGCTTCCCA TGAGGTACCG CGCGGCCGGG TAGCACAGCA GGGAGGCGAT AGGGTGCCGG 38700
TCGAAAACAA GGGTGAGGGC CGGGGGCGGG GCTTGCGGGC CCACAGCCTC CCCCCCGATA 38760
TGAGGAGCCA AAACGGCGTC CGTCGCCGCA TAAGGCGTGC TCATTGTTAT CTGGGCGCTG 38820 GTCATTACCA CCGCCGCCTC CCCGGCCGAT ATCTCGCCGC GGTCCAGACG GTGCTGCGTG 38880
TTGTAGATGT TCGTCAGGGT CTCGGAGGCC CCCAGCACCT GCCAGTAAGT CATCGGCTCG 38940
GGGACGTAGA CGATATTGTC GCGCGGCCCC AGGGCCTCCA TCAGCTGCGC GGAGGTGGTG 39000
GTCTTCCCCA CCCCGTGGGG TCCGTCTATA TAAACCCGCA GCAGCGTGGG CAGCTCCGGA 39060
TCCCCGCGGG CTTCGGAGGC CCCCTGGCGA TGGCTAGGAC GGGACGCCGC GCGGCCGTCG 39120 GTAGGCCCGC TCGCACGAGC AGCCTGACCG AACGCAGGCG CGTGCTGTTG GCCGGCGTGA 39180
GAAGCCATAC CCGCTTCTAC AAGGCGTTCG CCCGAGAGGT GCGGGAGTTC AACGCCACCA 39240
GGATTTGTGG AACGCTGCTG ACGCTGATGA GCGGGTCGCT GCAGGGTCGC TCGCTGTTCG 39300
AGGCCACGCG CGTCACCTTA A ATGCGAAG TGGACCTCGG GCCGCGCCGC CCAGACTGCA 39360
TCTGCGTGTT CGAATTCGCC AATGACAAAA CGTTGGGAGG TGTGTGCGTC ATCCTGGAGC 39420 TAAAGACATG CAAATCGATT TCTTCCGGGG ACACGGCCAG CAAACGCGAA CAGCGGACCA 39480
CGGGCATGAA GCAGCTGCGC CACTCCCTGA AGCTGCTGCA GTCGCTCGCG CCTCCGGGGG 39540
ACAAGGTCGT CTACCTGTGT CCTATTTTGG TGTTTGTCGC GCAGCGTACG CTGCGCGTCA 39600
GCCGCGTGAC CCGGCTCGTC CCGCAAAAGA TCTCCGGCAA CATCACCGCG GCCGTGCGGA 39660
TGCTCCAAAG CCTGTCCACG TATGCCGTGC CGCCGGAACC GCAGACCCGG CGGTCGCGGC 39720 GCCGGGTCGC CGCGACCGCC AGACCGCAAA GGCCCCCCTC CCCGACACGT GACCCGGAAG 39780
GCACGGCGGG TCATCCGGCC CCACCAGAGA GCGACCCCCC CTCCCCAGGG GTCGTAGGCG 39840
TCGCTGCGGA GGGTGGGGGT GTGCTTCAGA AAATCGCGGC GCTTTTTTGC GTGCCGGTGG 39900
CCGCCAAGAG CAGACCCCGG ACCAAAACCG AGTGAGGTTC TGTGTGTTGT TTTTTTTCCT 39960 CGTTTTGTTT TCTCTTCTTT CCCCCCCCCC TCCCCCGCTT CTGGCCAAGC ATCCTCACCT 40020
GCTTAAGCGG AACCCGCGGG CGCGCGGGGA CTCATTTGTC GCCGGCGACA CCCACCCGAC 40080
AACAGCCCCT GGGTGTAGAC CGCTGTCGCC CCCGTCTGTC GCCTCTCCCT TTTTTCCCCC 40140
CCTCAAAGAA CGTGGTGTTG GGCGCCGGCC AATTCTTCCC GGAGCGCCGT CGTCGCCCGC 40200 CCGCCGCCCT CGAACATGGA CCCGTACTAC CCTTTCGACG CGCTGGACGT TTGGGAACAC 40260
AGGCGCTTCA TCGTCGCCGA CTCCAGGAGC TTCATCACCC CCGAGTTCCC CCGGGACTTC 40320
TGGATGTTGC CCGTGTTCAA CATCCCCCGG GAGACGGCGG CGGAGCGGGC GGCAGTGCTG 40380
CAGGCCCAGC GCACCGCGGC CGCGGCGGCC CTGGAGAACG CCGCCCTCCA GGCCGCCGAG 40440
CTGCCCGTCG ACATCGAGCG CCGGATACGC CCGATCGAGC AGCAGGTGCA TCACATCGCC 40500 GACGCCCTGG AGGCGCTGGA GACCGCGGCG GCCGCGGCCG AAGAGGCGGA TGCCGCGCGG 40560
GACGCCGAGG CGAGGGGGGA GGGCGCTGCG GACGGGGCAG CGCCGTCGCC CACCGCGGGC 40620
CCCGCCGCCG CGGAGATGGA GGTTCAGATC GTACGCAACG ACCCGCCGCT ACGATACGAT 40680
ACCAACCTCC CCGTGGATCT GCTACACATG GTGTACGCGG GCCGCGGGGC CGCGGGTTCG 40740
TCGGGAGTCG TCTTTGGTAC CTGGTACCGC ACGATCCAGG AACGCACCAT CGCGGACTTC 40800 CCCCTGACCA CCCGCAGCGC CGACTTTCGA GACGGGCGCA TGTCCAAGAC CTTCATGACC 40860
GCGCTGGTCC TGTCTCTGCA GTCGTGCGGC CGGCTGTACG TGGGCCAGCG CCACTATTCC 40920
GCCTTCGAGT GCGCCGTGCT GTGTCTGTAT CTGCTGTACC GAACCACCCA CGAGTCCTCC 40980
CCCGATCGCG ATCGCGCTCC CGTTGCGTTC GGGGACCTGC TGGCCCGCCT GCCGCGCTAC 41040
CTGGCGCGTC TGGCCGCGGT AATCGGCGAC GAGAGCGGAC GCCCGCAGTA CCGCTACCGC 41100 GACGACAAGC TGCCCAAAGC GCAGTTCGCG GCGGCCGGCG GCCGCTACGA GCACGGGGCC 41160
CTGGCCACCC ACGTCGTGAT CGCCACGTTG GTGCGCCACG GGGTGCTACC GGCGGCCCCG 41220
GGCGACGTTC CCCGAGACAC CAGCACCCGC GTGAACCCCG ACGACGTGGC CCACCGCGAC 41280
GACGTCAACC GCGCCGCCGC CGCGTTTTTG GCACGCGGCC ACAACCTCTT CCTGTGGGAG 41340
GACCAGACGC TGCTGCGGGC GACCGCCAAC ACCATTACGG CCCTGGCCGT GCTTCGGCGG 41400 CTCCTCGCGA ACGGCAACGT GTACGCGGAC CGCCTCGACA ACCGCCTGCA GCTGGGCATG 41460
CTGATCCCGG GAGCCGTCCC GGCGGAGGCC ATCGCTCGGG GGGCGTCCGG ATTGGACTCG 41520
GGCGCCATAA AAAGCGGCGA CAACAACCTG GAGGCGCTGT GCGTTAACTA TGTACTTCCG 41580
CTGTATCAGG CAGACCCCAC GGTCGAGCTG ACCCAGTTGT TTCCGGGGCT GGCCGCCCTG 41640
TGCCTGGACG CCCAGGCGGG GCGGCCACTG GCGTCGACGA GGCGCGTGGT GGATATGTCG 41700 TCGGGCGCCC GCCAGGCGGC GCTCGTGCGC CTCACCGCGC TGGAGCTCAT CAACCGCACC 41760
CGCACAAACA CCACCCCTGT GGGGGAGATT ATTAACGCCC ACGATGCCTT GGGGATACAA 41820
TACGAACAGG GCCTGGGGCT GCTCGCCCAG CAGGCACGCA TCGGCTTGGC GTCGAACGCC 41880
AAGCGATTCG CCACGTTCAA CGTGGGCAGC GACTACGACC TGTTGTACTT TTTGTGTCTC 41940
GGGTTCATTC CCCAGTACCT GTCCGTGGCC TAGGGAAGGG TGGGGGTGGT GGTGGTGGGG 42000 TGTTTTTCTG CTGTTGTTGT TTCTGGTCCG CCTGGTCACA AAAGGCACGG CGCCCCGAAA 42060
CGCGGGCTTT AGTCCCGGCC CGGACGTCGG CGGACACACA ACAACGGCGG GCCCCGTGGG 42120
TGGGTAAGTT GGTTCGGGGG CATCGCTGTA TTCCCTTGCC CGCTTCCACC CCCCCTTCCC 42180
GTTTTGTTTG TTTGTGCGGG TGCCCATGGC GTCGGCGGAA ATGCGCGAGC GGTTGGAGGC 42240
GCCTCTGCCC GACCGGGCGG TGCCCATCTA CGTGGCCGGG TTTTTGGCCC TGTACGACAG 42300 CGGGGACCCG GGCGAGCTGG CCCTGGACCC AGACACGGTG CGTGCGGCCC TGCCTCCGGA 42360
GAACCCCCTG CCGATCAACG TAGACCACCG CGCTCGGTGC GAGGTGGGCC GGGTGCTCGC 42420
CGTGGTCAAC GACCCTCGGG GGCCGTTTTT TGTGGGGCTG ATCGCGTGCG TGCAGCTGGA 42480
GCGCGTCCTC GAGACGGCCG CCAGCGCCGC TATTTTTGAG CGCCGCGGAC CCGCGCTCTC 42540 CCGGGAGGAG CGTCTGCTGT ACCTGATCAC CAACTACCTG CCATCGGTCT CGCTGTCCAC 42600
AAAACGCCGG GGGGACGAGG TTCCGCCCGA CCGCACCCTG TTTGCGCACG TGGCCCTGTG 42660
CGCCATCGGG CGGCGCCTTG GAACCA-TCGT CACCTACGAC ACCAGCCTAG ACGCGGCCAT 42720
CGCTCCGTTT CGCCACCTGG ACCCGGCGAC GCGCGAGGGG GTGCGACGCG AGGCCGCCGA 42780 GGCCGAGCTC GCGCTGGCCG GGCGCACCTG GGCCCCCGGC GTGGAGGCGC TCACACACAC 42840
GCTGCTCTCC ACCGCCGTCA ACAACATGAT GCTGCGTGAC CGCTGGAGCC TCGTGGCCGA 42900
GCGGCGGCGG CAGGCCGGGA TCGCCGGACA CACGTACCTT CAGGCGAGCG AAAAATTTAA 42960
AATATGGGGG GCGGAGTCTG CCCCTGCGCC GGAGCGCGGG TATAAAACCG GCGCCCCGGG 43020
TGCCATGGAC ACATCCCCCG CCGCGAGCGT TCCCGCGCCG CAGGTCGCCG TCCGTGCGCG 43080 TCAAGTCGCG TCGTCGTCGT CTTCTTCTTC TTCTTTTCCG GCACCGGCCG ATATGAACCC 43140
CGTTTCGGCA TCGGGCGCCC CGGCCCCTCC GCCGCCCGGC GACGGGAGTT ATTTGTGGAT 43200
CCCCGCCTTT CATTACAATC AGCTCGTCAC CGGGCAATCC GCGCCCCACC ACCCGCCGCT 43260
GACCGCGTGC GGCCTGCCGG CCGCGGGGAC GGTGGCCTAC GGACACCCCG GCGCCGGCCC 43320
GTCCCCGCAC TACCCGCCTC CTCCCGCCCA CCCGTACCCG GGTATGCTGT TCGCGGGCCC 43380 CAGTCCCCTG GAGGCCCAGA TCGCCGCGCT GGTGGGGGCC ATCGCCGCCG ACCGCCAGGC 43440
GGGTGGGCTT CCGGCGGCCG CCGGAGACCA CGGGATCCGG GGGTCGGCGA AGCGCCGCCG 43500
ACACGAGGTG GAGCAGCCGG AGTACGACTG CGGCCGTGAC GAGCCGGACC GGGACTTCCC 43560
GTATTACCCG GGCGAGGCCC GCCCCGAGCC GCGCCCGGTC GACTCCCGGC GCGCCGCGCG 43620
CCAGGCTTCC GGGCCCCACG AAACCATCAC GGCGCTGGTG GGGGCGGTGA CGTCCCTGCA 43680 GCAGGAACTG GCGCACATGC GCGCGCGTAC CCACGCCCCC TACGGGCCGT ATCCGCCGGT 43740
GGGGCCCTAC CACCACCCCC ACGCAGACAC GGAGACCCCC GCCCAACCAC CCCGCTACCC 43800
CGCCGAGGCC GTCTATCTGC CGCCGCCGCA CATCGCCCCC CCGGGGCCTC CTCTATCCGG 43860
GGCGGTCCCC CCACCCTCGT ATCCCCCAGT TGCGGTTACC CCCGGTCCCG CCCCCCCGCT 43920
ACATCAGCCC TCCCCCGCAC ACGCCCACCC CCCTCCGCCG CCGCCGGGAC CCACGCCTCC 43980 CCCCGCCGCG AGCTTACCCC AACCCGAGGC GCCCGGCGCG GAGGCCGGCG CCTTAGTTAA 44040
CGCCAGCAGC GCGGCCCACG TGAACGTGGA CACGGCCCGG GCCGCCGATC TGTTTGTGTC 44100
ACAGATGATG GGGTCCCGCT AACTCGCCTC CAGGATCCGG ACTTGGGGGG GGTGTGTGTT 44160
TTCATATATT TTAAATAAAC AAACAACCGG ACAAAAG AT ACCCACTTCG TGTGCTTGTG 44220
TTTTTGTTTG AGAGGGGGGG GGTGGAGTGG GGGGGAAAGT GGGCCGAATG ACACAAAAAT 44280 TAGGTCGGAG GGGTGAGGGG GGGGGGGCTA GGAGCCGAAC CGATGGCCCC CACACGCGAC 44340
GGAAGGCCCG GAAGACTACC ACGGGGAGGG GGTGTGGAAA GCGACCGGTC GCAGGGAGAC 44400
GGGGTTGGTT TGGGGTTGGT TTGGGGTTGG TTTTCCCGTT AGCACATGTC TGCATTTGTT 44460
TTTCTAGTCA CACGCCCCCC CCCCCAAATA AAAACCAAGG CAAAACAATA CCAGAAGTCA 44520
TGTGTATTTT TGAACATCGG TGTCTTTTTA TTTATACACA AGCCCAGCTC CCCTCCCCTC 44580 CCTTAGAGCT CGTCTTCGTC TCCGGCCTCG TCCTCGTTGT GGAGCGGAGA GTACCTGGCT 44640
TTGTTGCGCT TGCGCAGAAC CATGTTGGTG ACCTTGGAGC TGAGCAGGGC GCTCGTGCCC 44700
TTCTTTCTGG CCTTGTGTTC CGTGCGCTCC ATGGCCGACA CCAAAGCCAT A ATCGGATC 44760
ATTTCTCGGG CCTCGGCCAA CTTGGCCTCG TCAAACCCGC CCCCCTCCGC GCCTTCCTCC 44820
CCCTCCCCGC CCACGCCCCC GGGGTCGGAA GTCTTGAGTT CCTTGGTGGT GAGCGGATAC 44880 AGGGCCTTCA TGGGATTGCG TTGCAGTTGC AGGACGTAGC GGAAGGCGAA GAAGGCCGCG 44940
ACCAGGCCGG CCAGGACCAG CAGCCCCACG GCAAGCGCCC CGAAGGGGTT GGACATAAAG 45000
GAGGACACGC CCGAGACGGC CGACACCACG CCCCCCACTA CTCCCATGAC TACCTTGCCG 45060
ACCGCGCGCC CCAAGTCCCC CATCCCCTCG AAGAACGCGC ACAGCCCCGC GAACATGGCG 45120 GCGTTGGCGT CGGCGCGGAT GACCGTGTCG ATGTCGGCAA AGCGCAGGTC GTGCAGCTGG 45180
TTGCGGCGCT GGACCTCCGT GTAGTCCAGC AGGCCGCTGT CCTTGATCTC GTGGCGCGTG 45240
TAGACCTCCA GGGGCACAAA CTCGTGGTCC TCCAGCATGG TGATGTTCAG GTCGATGAAG 45300
GTGCTGACGG TGGTGACGTC GGCGCGACTC AGCTGGTGAG AGTACGCGTA CTCCTCGAAG 45360 TACACGTAGC CCCCGCCGAA GATGAAGTAG CGCCGGTGGC CCACGGTGCA CGGCTCGAGC 45420
GCGTCGCGGG TGAGGCGCAG CTCGTTGTTC TCGCCCAGCT GCCCCTCGAT CAGCGGGCCC 45480
TGGTCTTCGT ACCGAAAGCT GACCAGGGGG CGGCTGTAGC ACGTCCCCGG CCGCGAGCTG 45540
ACGCGCATCG AGTTCTGCAC GATCACGTTG TCCGGGGCGA CGGGCACGCA CGTGGAGACG 45600
GCCATGACGT CTCCGAGCAT GCGCGCGCTC ACCCGCCGGC CGACGGTGGC GGAGGCGATG 45660 GCGTTGGGGT TGAGCTTGCG GGCCTCGTTC CAGAGAGTCA GCTCGTGGTT CTGCAGCTCG 45720
CACCACGCGA CGGCGATGCG CCCCAGCATG TCGTTCACGT GGCGCTGTAT GTGGTTATAC 45780
GTAAACTGCA GCCGGGCGAA CTCGATCGAG GAGGTGGTCT TGATGCGCTC CACGGACGCG 45840
TTGGCGCTGG GCGCCTCCCG CAGTGGCGCG GGCGTGGCAT TCCGGGGCTT GCGGTCCTGC 45900
TCCCGCATGT ACTCCCGCAC GTACAGCTCG GCGAGCGTGT TGCTGAGGAG GGGCTGGTAC 45960 GCGATGAGGA AGCCCCCCGT GGCCAGGTAG TACTGCGGCT GGCCCACCTT GATGTGCGTG 46020
GCGTTGTACT TGCGCGCAAA CATGCGGTCG ATGGCCTCGC GGGCATCCCG GCCAATGCAG 46080
TCGCCCAGGT CGACGCGCGA GAGCGAGTAC TGGGTCAGGT TGGTGGTGAA GGTGGTCGAG 46140
ATGGCGTCGG AGGAGAAGCG GAAGGAGCCG CCGTACTCGG CGCGGAGCAT CTCGTCCACC 46200
TCCTGCCACT TGGTCATGGT GCAGACCGCC GGTCGCTTCG GCACCCAGTC CCAGGCCACG 46260 GTAAACTTGG GGGTCGTCAG CAAGTTGCGG GTCGTCGGCG ACGTGGCCCG GGCCTTCGTG 46320
GTGAGGTCGC GCGCGTAGAA GCCGTCGACC TGCTTGAAGC GGTCGGCGGC GTAGCTGGTG 46380
TGCTCGGTGT GCGACCCCTC CCGGTAGCCG TAAAACGGGG ACATGTACAC AAAGTCGCCC 46440
GTCGCCAACA CAAACTCATC GTACGGGTAC ACCGACCGCG CGTCCACCTC CTCGACGATG 46500
CAGTTGACCG TCGTGCCGTA CCGATGGAAC GCCTCCACCC GCGAGGGGTT GTACTTGAGG 46560 TCGGTGGTGT GCCACCCCCG GCTCGTGCGC GTGGCGACCT TCGCCGGCTT GAGCTCCATG 46620
TCGGTCTCGT GGTCGTCCCG GTGAAACGCG GTGGTCTCCA TGTTGTTCCG CACGTACTTG 46680
GCCGTGGAGC GGCAGACCCC CTTGGCGTTA ATCTTGTCGA TCACCTCCTC GAAGGGAACG 46740
GGGGCGCGGT CCTCGAATAT CCCCATAAAC TGGGAGTAGC GGTGGCCGAA CCACACCTGC 46800
GACACGGTCA CGTCTTTGTA GTACATGGTG GCCTTGAATT TGTACGGGGC GATGTTCTCC 46860 TTGAAGACCA CCGCGATGCC CTCCGTGTAG TTCTGCCCCT CCGGGCGCGT CGGGCAGCGG 46920
CGCGGCTGCT CAAACTGCAC CACCGTGGCG CCCGTCGGGG GCGGGCACAC GTAAAACTGG 46980
GCATCGGCGT TCTCGACCTT GATTTCCCGC AGGTGCGCGC GCAGCGTGGC GTGGCCGGCG 47040
GCGACGGTCG CGTTGGCGTC GGGGGGCGGG GTCGCCTCGG GCCGCTTGGG CGGCTTTTTG 47100
GTTTTCCGCT TCCGGGCCTT GGTGGTCGCG GGGCTCGGGA CGGGGGGCGG CCGGGAGGCG 47160 GGACCCCCGT TCGCCGCGAC GGTCGCGGCC ACGCCGCCCG AGGCGCGGGG GGCCGCCGGG 47220
GCCGCCGGGG CCGCCGACGC CACCGCGGCC ACCAGCGCCC CCACGACCAG CGCGCAAATC 47280
AAGCCCCCCC CGCGCATGGC GGGCCTACGG GGGCGCGTCG CTCCCGCCGC CCGCTAGTCT 47340
GGGGGCGAGG TGCTGCAGGA CCGAGTAGAG GATGGAAAAA ACGTCTCGGT CGTAAACCAC 47400
GACCGAGCGG GGTCCGATGC AGCCGTCGGG GCCGCTCTCG ACGATGGCCA CCAGCGGACA 47460 GTCGGAGTTG TACGTGAGGT ACACGCCCGG CGGGTAGCGG TACAGACCTT CGGAGGTCGG 47520
GCGGCTGCAG TCGGGGCGGC GCAACTCAAG CTCCCCGCAC CGGTAGACCG ACGCAAAGAG 47580
TGTGGTGGCG ATAATGAGCT CGCGAATATA TCGCCAGGCG GCGCGCTGGG TGGGCGTGAT 47640
TCCGGAAACA CCGTCAAAAC AGTAGAACTT TTGAAACTCG CTGACGGCCC AATCAGCGCC 47700 CGAACCCCCC GCGCCCATGA TGAAGCGGGC GAGTTCCTCC TTGAGGTGCG GCAGGAGCCC 47760
CACGTTCTCG ACGCTGTAGT ACAGCGCGGT GTTGGGGGGC TGGGCGAAGC TGTGGGTGGA 47820
GTGGTCGAAC AGGGGCCCGT TGACGAGCTC GAAGAAGCGA TGGGTGATGC TGGGGAGCAG 47880
GGCCGGGTCC ACCTGGTGGC GCAGCAGCGA CGCTCGCATG AACCGGTGCG CGTCAAACAC 47940 GCCCGGGGCG GCGCGGTTGT CGATGACCGT GCCCGCGCCC GCCGTCAGGG CGCAGAAGCG 48000
CGCGCGCGCC GCGAAGCCGT TGGCGACCGC GGCGAAGGTC GCGGGCAGCA CCTCGCCGTG 48060
GACGCTGACC CGCAGCATCT TCTCGAGCTC CCCGCGCTGC TCGCGCACGC AGCGCCCGAG 48120
GCTGGCCAGC GACCGCTTGG TCAGGCGGTC CGCGTACAGC CGCCGGCGCT CCGGCACGTC 48180
CGCGGCGGCC CGCGTCGCGA TGTCGCCCCA GCTCTCCGGC CCCTGCGCCC CTGGCTCGGG 48240 GCCGCGCTCC CCGTCCTCGC TCGCGGGCGT CCCCGCGCCA CGCCTCCGCC CCCCCTCCTC 48300
CGCGGCGGCC CGGGGCTCTT CCTCCTCGGC CCCCCCGGTC GCGCCGCCGG CCCCCAGCCG 48360
CGCCAGCACG CGGCGCAGCG CCTCCTCGTC GCACTGCTCG GGGCTGACGA GCCGCCGCAG 48420
CAGCGGCGTC GTCAGGTGGT GGTCGTAGCA CGCGCGTATC AGCGCCTCGA TCTGATCGTC 48480
GGGCGACGTC GCCTGGCCGC CGATGATCAG GGCGTCCACC ATGTCCAGCG CCGCCAGGTG 48540 GCCCCCGAAC GCGCGATCGA AGTGCTCCGC CCGCCGCCCG AACAGCGCCA GCTCCACGGC 48600
CACCGCGGCG GTCTCCTGCT GCAGCTCGCG CTGCGCCAGC GCGTTCAGGT TGTCGGCGAA 48660
GGCGTCCATG GTGGAGTGGC GGGCGCGATC GCCGGACGCC AGCCAGAAGC GCAGCTCGCT 48720
GATGGCGTAC AGGCCGGGCG TAGTGGCCTG AAACACGTCA TGCGCCTCCA GCAGGGCGTC 48780
GGCCTCCTCG CGGACAGAAG AGCTATCGGC GGGCGGCGGG CCGGCCCGGG CCCCGCCGCC 48840 CGCCGCGGTC CGCGCCAGCG CCTGGTCCAG CACACAGAGC GCTCGCGCGC GGGCGGCGTC 48900
CGACAGCCCG GCGGCGTGGG GCAGGTACCG TCGCAGCTCG TTGGCGTCCA GCCGCACCTG 48960
GGCCTGTTGG GTGACGTGGT TACAGATGCG GTCCGCCAGG CGGCGGGCGA TGGTCGCCCC 49020
CTGGTTCGCG GTGACGCACA GCTCCTCGAA ACAGACCGCG CACGGGTGGG ACGGGTCGCT 49080
CAGCTCCGGG GGCACGATGA GGCCCGACCC CACCGCCGCC ACCATAAACT CCCGGACGCG 49140 CTCCAGCGCG GCCGTGGCGC CGCTCGGGGG GGTGATGAGG TGGCAGTAGT TCAGCTGCTT 49200
GAGAAAATTC TCGACATCAT GCAGGAAGCA CAGCTCCATG CGGACGTCCC CGCCGTACGT 49260
CTCCAGCCGG ATCTGCTGGT GGTACGGACA GGGTCGGGCC AGACCCATGG TCTCGGTGAA 49320
AAAGGCAGAG ACGTCACCCG TGGTCGCGAA CGTTTCCAGG TGGCCCAGGA GCCGCTCCCC 49380
CTCGCGCCAC GCGTACTCCA GGAGCAACTC CAGGGTGACC GACAGCGGGG TGAGAAAGGC 49440 GGCGGCCTGA GCCTCCAGCC CCGGCCGCAG GTGCCGCCGC AGCACGCGCA CCTGGAGCGC 49500
GTTGAGCTTT AGCTGGGCGA GCTTCCCCAG GCCGATCTGG GGGTCGCATC GTCGAAGCAG 49560
CTCTAGCTGA AAAACGTACG TCTGTACCTG CCCGAGCAGG GCCAACAGTT TCTGTCGGGC 49620
CGCAGTGGGC TCGGAAACCG CGGCCGGGGG CGCGGCCGCC ATGGCGAGTC GCCCGGCCGT 49680
GCTGTGGTTT AGTTAAGGTT TGGGGGGGTG GGTCAGAGGC GCGCCCCGCG CGGACTGATG 49740 CGGCGGCGGG CCCCTGACAT CCCCTCTTTA TGCCCGTCGC CCGCCCGCCC GCCCCGCCGG 49800
TGTGCCGTGA TTCGCGGAGT CGGGGCCTTG TGTTTCTTTC TTTCCCCCCC CGAATCCGTT 49860
CTTTCTTCCT CACCCCCCCT CCCCACACAC CCACCCAGGA CTCGCCACCA CAAGGAGGCG 49920
AGAGCCCGTC GCTAACCCAA AGACACAGTC ACGAGACACG ATATCGACTG TAGTTGCGAT 49980
CGTTTATTTT ATACACAACA CCAACCTTTC CTTCGACCCC CCCCACCCCC GCCCCTAGAG 50040 CATATCCAAC GTCAGGTCCT TTTTCTCCGG TGGTCCCTCC CCAAACGGAT CGTCGCCGTG 50100
AAACGCCCGC TTTCGGGCGA CGCCGGCCGC CCCCGCCGCC GCCGCCAAAC CGCCGAACGA 50160
CGCCGCGTGG TCATCCTCGT CGCCGAAATC CCCAAAGTTA AACACCTCCC CGGCGGCGCC 50220
GAGCTGGCTG ACCAGGGCCT CCGCCTCGTG GGCCACCTCC AGGGCCGCGT CGGTCGACCA 50280 CTCGCCGTGC CCGCGCTCCA GGGCGCGGGT GGTAAACTCC ATCATTTCCT CGCTCAGGTA 50340
CTCGTCCTCC AGCAGCGCCA GCCAGTCCTC GATCTGCAGC TGCTGGGTGC GGGGGCCCAG 50400
GCTCTTGACG GTCGCCACAA ACACGCTGCT GGCGACCGCC GCCCCGCCCT CCGCAATGAT 50460
GCCCCGGAGC TGCTCGCACA GCGAATGCTC GTGGGCCCCG CCCCCGAGAC TCGACGCCGC 50520 GCACACAAAC CCGGCCCTGG GGCAGGCCAG GACAAACTTG CGGGTGCGGT CAAAGATCAG 50580
CAGCGGGCAC GCGTTTTTGC CGCCCAGCAG GCTGGCCCAG TTCCCGGCCT GAAACACGCG 50640
GTCGTTGCCG GCCATGCCGT AGTATTTGCT GATGCTGAGG CCCAGCACGA CCATCGGGCG 50700
CGCGGCCATC ACGGGCCGCA GCAGGTTGCA GCTCGCGAAC ATGGACGTCC AGGCGCCGGG 50760
GTGCGCGTCG AGGGAGTCCA TCAGCGCGCG GGCCCCGGCC TCCAGGCCCG CGCCGCCCTG 50820 CGGGGCCCAG GCGGCGGCCG CCTGCACGCC GGGGGGACGG CGGGACCCGG CGATGACGGC 50880
CGTGAGGGTG TTTATGAAGT ACGTCGAGTG GTCGCAG AC CTCAAGATCT GGTTGGCCAT 50940
GTAGTACATG GCCAGTTCGC TCACGTTATT GGGGGCCAGG TTGATAAAGT TAATCGCGCC 51000
GTAGTCCAGG GAGAACCTCT TAATGAACGC GATGGTCTCT ATGTCCTCGC GCGACAAGAG 51060
CCGGGCGGGG AGCTGGTTGC GCTGGAGGGC GGTCCAGAAC CACTGCGGGT TCGGCTGGTT 51120 CGACCCCGGG GGCTTGCCGT TGGGAAAGAT GACCGCGTGG AACTGCTTCA GCAGGAAGCC 51180
CAGCGGTCCG AGGAGGATGT CCACGCGCTT GTCGGGCTTC TGGTAGGCGC TCTGGAGGCT 51240
GGCGACCCGC GCCTTGGCGG CCTCGGACGC GTTGGCGCTC GCGCCCGCGA ACAACACGCG 51300
GCTCTTGACG CGCAGCTCCT TGGGAAACCC CAGGGTCACG CGGGCAACGT CGCCCTCGAA 51360
GCTGCTCTCG GCGGGGGCCG TCTGGCCGGC CGTTAGGCTG GGGGCGCAGA TAGCCGCCCC 51420 CTCCGAGAGC GCGACCGTCA GCGTCTTCGC CGACAGGAAC CCGTTGTTGA ACAGGTCCAT 51480
GACGCGCCGC CGCAGCACCG GTTGGAATTG ATTGCGAAAG TTGCGCCCCT CGACCGACTG 51540
CCCGGCGAAC ACCCCGTGGC ACTGGCTCAG GGCCAGGTCC TGGTACACGG CGAGGTTGGA 51600
CCGCCGCGCG AGGAGCTGCA GCAGGGGGCA CGGCCCGCAG GTGTACGGGT CCAGCGACAG 51660
CGACATGGCG TGGTTGGCCT CGGCCAGACC GTCGCGGAAC TTAAACCAGC TGCTTGATGT 51720 TGTTCACCAC CGTGTGCAGG GCCTCGCGGG TGCCGATAAT CGTCTCCAGC CTCCCCAGGG 51780
CCGTGGGCAC CGCCTGGTCC ACGTACTGCA GGGCCTCGAG CTCGGCCATG ACGCGCTCGG 51840
TGGCCGCGCG GTACGTCTCC TGCATGATGG TCCGGGTGTT CTCGGACCCG TCCGCGCGCT 51900
TCAGGGCCGA GAAGGCGGCG TAGTTCCCCA GCACGTCGCA GTCGCTGTAC GCGCTGTTCA 51960
TCGTTCCGAA GACCCCAATG GCCCCCCGGG CGGCGCTCGC GAACTTGGGG TGGCGGGCCC 52020 GCAGCCGCAT CAGCGTCGTG TGCGCGCAGG CGTGGCGGGT CTCGAAGGTA CACAGGTTGC 52080
AGGGCACGTC GGTCTGGCCC GAGTCCGCGA CGTAGCGAAA CACGTCCATC TCCTGGCGCC 52140
CGACGATGAC TCCGCCGTCG CAGCGCTCCA GGTAAAACAG CATCTTGGCC AGCAGGGCCG 52200
GAGAGAACCC GCACAGCATG GCCAGGTGCT CGCCGGCGAA CTCCTGGGTT CCGCCGACGA 52260
GGGGCGCCGT GGGGCGCCCC TCGTACCCGG GCACCACGTG GCCCTCGCGG TCCAGCTGCG 52320 GGTTGGCCGC CACGTGCGTG CCGGGCACGA GAAAGAAGCG GTAAAAGGAG GGCTTGCTGT 52380
GGTCCTTGGG GTCCGCCGGC CCGGCGTCGT CCACCTCGGT CAGGTGGAGG GCCGAGTTGG 52440
TGCTGAACAC CATGGCGCCC ACGAGGCCCG CGGCGCGCGC CAGGTACGCC CCGACGGCGC 52500
CGGCACGGGC CGCGGGCGTT TCCTGGCCCT CAAGCAGGGG CCACGTGGTG ATGTCGGGGG 52560
GCGGCTCGTC AAAGACCGCC ATCGACACGA TGGACTCCAG GGCCAGGGCG GCGTCGCCCG 52620 CCATCACCGA GGCCAGGCGC TGCTCAAACC CGCCCGCCGG GCCCTTGTTC CCGGCGTCGC 52680
GCGCGCCCCG CTGGGGCTTA CCCTGGCTGG CCTCGAAGGC CGTGAACGTA ATGTCGGCGG 52740
GGAGGGCCGC GCCCTCGTGG TTTTCGTCGA ACGCCAGGTG GGCGGCCGCG CGGGCCACGG 52800
CGTCCACGTT CCGAGCACGC AGGGCCACGG CGGCGGGCCC GACGACCGCC TCGAACAGCA 52860 GGCGGGCGAG GGGGCGGTTG AAAAACGGAA GGGGGTAGTT GAAATTCTCC CCGATCGATC 52920
GGTGGTTGCA GTTAAACGGA TCGGCGATGA CCCGGCTAAA ATCCGGCATA AACATCTGCA 52980
GCGGATACAC GGGGATGCGG TGAACCTCCG CGTCCCCGAT GGTTACCTTG TCCATCCCGC 53040
CCAGGTGCAG GAAGGTGTTG CTGATGCACA CGGCCTCCCG GAAGCCCTCC GTGATCACCA 53100 GATACAGCAA GGCCCGGTCC GGGTCCAGTC CGAGCCGCTC GCACAGCGCG TCCCCCGTCG 53160
TCTCGTGCTT TAGGTCGCAG GGCCGGGGCG CGTAGTCCGA GAAGCCAAAA TGGCGGCGCG 53220
CCCGCTCGCA GAGCCGCGTC AGGTTGGGGG CCTGGGTGCT GGGGGCCAGG TGGCGGCCGC 53280
CGTGAAAGAC GTAGACGGAC GGGCTGTAGT GCGAGGGCAT AAGCTTGAGG GACACCGCGG 53340
TCCCCCCAAG GCCCGTCGTG CGGGACCCGA CGACCGCGGC CACGTTGGCC TCAAACCCGC 53400 TCTCCACGGT CAGGCCGACG ATGAGGGGCG CGACGGCGAC GTCCGCGTCG CCGCTGCGCG 53460
CCGACAGTAG CGACAGCAGC TCCAGGCCTT CGGCCGGACA GGCGCGGCCA TACACGTACC 53520
CCATCGGCCC CGGAGGAACC TTGACGGTGG TCGTCGTTTT GGGCTTGGTG TCCATGGCTT 53580
TCGGGAGATT GGCGACCGGC AGGAACGGGG GCCCGGCAAG ACGACCGGGG GCAGACGGGG 53640
GAGGCCGCGC GTGGTCGACG GCTGCTGCCC GCCGTCGTCT CTCCGATGGG GTCGAATGCC 53700 GGCGCTGGGG GTGGGGTCTA CACCCGCCCG TTCACCGAGC GGCCCCTGGT GGGGGTGGGA 53760
TGGGTGGGAT GGGGTGGGCG AGAATGGCCC GCCACCGGAT CGCGCCGGAC GGGGGGGCCC 53820
GGGGTTGGGC AAGGTTTGGG CGCAAGGCTC CAGCGGCGAT TCGAGAGGCC TGCGGATGGC 53880
GGCCCAGAGC TGGGTATGCT CGGCCGGGGC GGCCGGTATA TGTACGGCGT GCTGGGAGGG 53940
GCGGCGTCGG GCCCCGCCCA CGGTCCGCCA CGCCCCGCGC GTCATCGGCA GGGGGCGTGG 54000 CCGCCCTTCT AAAAAAAGTG AGAACGCGAA GCGTTCGCAC TTTGTCCTAA TAGTATATAT 54060
ATTATTAGGA CAAAGTGCGA ACGCTTCGCG TTCTCACTTT TTTTAGAAGG GCGGCCACGC 54120
CCCCTTTGAC GTCACGCTCA CCCGGGCGGC CGGCCGCCCA TAAGCGCGGC CTGCCGGGCC 54180
GATAAAAAGA AACCGCGGCG CCCCCGCGGA CACCACACAC TGGCTCTCGA ACCCCGGACG 54240
CGCAGAAGGG ACCCGGGCGC GGGTCCGCCG GTAAGAGCCG GGGGGAACAT CGGCACCGCC 54300 ATCCCACCCC GAGCTGTTGG GTGGGCGGGT GGGGGGGCTG GTGAGGCGGT GGTGGGAGGG 54360
GGCGGCGTAT AGCAGGACAA CGACCGGCGG CGATGTTTTG TGCCGCGGGC GGCCCGACTT 54420
CCCCCGGGGG GAAGTCGGCG GCTCGGGCGG CGTCTGGGTT TTTTGCCCCC CACAACCCCC 54480
GGGGAGCCAC CCAGACGGCA CCGCCGCCTT GCCGCCGGCA GAACTTCTAC AACCCCCACC 54540
TCGCTCAGAC CGGAACGCAG CCAAAGGCCC CCGGGCCGGC TCAGCGCCAT ACGTACTACA 54600 GCGAGTGCGA CGAATTTCGA TTTATCGCCC CGCGTTCGCT GGACGAGGAC GCCCCCGCGG 54660
AGCAGCGCAC CGGGGTCCAC GACGGCCGCC TCCGGCGCGC CCCTAAGGTG TACTGCGGGG 54720
GGGACGAGCG CGACGTCCTC CGCGTGGGCC CGGAGGGCTT CTGGCCGCGT CGCTTGCGCC 54780
TGTGGGGCGG TGCGGACCAT GCCCCCGAGG GGTTCGACCC CACCGTCACC GTCTTCCACG 54840
TGTACGACAT CCTGGAGCAC GTGGAACACG CGTACAGCAT GCGCGCCGCC CAGCTCCACG 54900 AGCGATTTAT GGACGCCATC ACGCCCGCCG GGACCGTCAT CACGCTTCTG GGTCTGACCC 54960
CCGAAGGCCA TCGCGTCGCC GTTCACGTCT ACGGCACGCG GCAGTACTTT TACATGAACA 55020
AGGCAGAGGT GGATCGGCAC CTGCAGTGCC GTGCCCCGCG CGATCTCTGC GAGCGCCTGG 55080
CGGCGGCCCT GCGCGAGTCG CCGGGGGCGT CGTTCCGCGG CATCTCCGCG GACCACTTCG 55140
AGGCGGAGGT GGTGGAGCGC GCCGACGTGT ACTATTACGA AACGCGCCCG ACCCTGTACT 55200 ACCGCGTCTT CGTGCGAAGC GGGCGCGCGC TGGCCTACCT GTGCGACAAC TTTTGCCCCG 55260
CGATCAGGAA GTACGAGGGG GGCGTCGACG CCACCACCCG GTTTATCCTG GACAACCCGG 55320
GGTTTGTCAC CTTCGGCTGG TACCGCCTCA AGCCCGGCCG CGGGAACGCG CCGGCCCAAC 55380
CGCGCCCCCC GACGGCGTTC GGAACCTCGA GCGACGTCGA GTTTAACTGC ACGGCGGACA 55440 ACCTGGCCGT CGAGGGGGCC ATGTGTGACC TGCCGGCCTA CAAGCTCATG TGCTTCGATA 55500
TCGAATGCAA GGCCGGGGGG GAGGACGAGC TGGCCTTTCC GGTCGCGGAA CGCCCGGAAG 55560
ACCTCGTCAT CCAGATCTCC TGTCTGCTCT ACGACCTGTC CACCACCGCC CTCGAGCACA 55620
TCCTCCTGTT TTCGCTCGGA TCCTGCGACC TCCCCGAGTC CCACCTCAGC GATCTCGCCT 55680 CCAGGGGCCT GCCGGCCCCC GTCGTCCTGG AGTTTGACAG CGAATTCGAG ATGCTGCTGG 55740
CCTTCATGAC CTTCGTCAAG CAGTACGGCC CCGAGTTCGT GACCGGGTAC AACATCATCA 55800
ACTTCGACTG GCCCTTCGTC CTGACCAAGC TGACGGAGAT CTACAAGGTC CCGCTCGACG 55860
GGTACGGGCG CATGAACGGC CGGGGTGTGT TCCGCGTGTG GGACATCGGC CAGAGCCACT 55920
TTCAGAAGCG CAGCAAGATC AAGGTGAACG GGATGGTGAA CATCGACATG TACGGCATCA 55980 TCACCGACAA GGTCAAACTC TCCAGCTACA AGCTGAACGC CGTCGCCGAG GCCGTCTTGA 56040
AGGACAAGAA GAAGGATCTG AGCTACCGCG ACATCCCCGC CTACTACGCC TCCGGGCCCG 56100
CGCAGCGCGG GGTGATCGGC GAGTATTGTG TGCAGGACTC GCTGCTGGTC GGGCAGCTGT 56160
TCTTCAAGTT TCTGCCGCAC CTGGAGCTTT CCGCCGTCGC GCGCCTGGCG GGCATCAACA 56220
TCACCCGCAC CATCTACGAC GGCCAGCAGA TCCGCGTCTT CACGTGCCTC CTGCGCCTTG 56280 CGGGCCAGAA GGGCTTCATC CTGCCGGACA CCCAGGGGCG GTTTCGGGGC CTCGACAAGG 56340
AGGCGCCCAA GCGCCCGGCC GTGCCTCGGG GGGAAGGGGA GCGGCCGGGG GACGGGAACG 56400
GGGACGAGGA TAAGGACGAC GACGAGGACG GGGACGAGGA CGGGGACGAG CGCGAGGAGG 56460
TCGCGCGCGA GACCGGGGGC CGGCACGTTG GGTACCAGGG GGCCCGGGTC CTCGACCCCA 56520
CCTCCGGGTT TCACGTCGAC CCCGTGGTGG TGTTTGACTT TGCCAGCCTG TACCCCAGCA 56580 TCATCCAGGC CCACAACCTG TGCTTCAGTA CGCTCTCCCT GCGGCCCGAG GCCGTCGCGC 56640
ACCTGGAGGC GGACCGGGAC TACCTGGAGA TCGAGGTGGG GGGCCGACGG CTGTTCTTCG 56700
TGAAGGCCCA CGTACGCGAG AGCCTGCTGA GCATCCTGCT GCGCGACTGG CTGGCCATGC 56760
GAAAGCAGAT CCGCTCGCGG ATCCCCCAGA GCACCCCCGA GGAGGCCGTC CTCCTCGACA 56820
AGCAACAGGC CGCCATCAAG GTGGTGTGCA ACTCGGTGTA CGGGTTCACC GGGGTGCAGC 56880 ACGGTCTTCT GCCCTGCCTG CACGTGGCCG CCACCGTGAC GACCATCGGC CGCGAGATGC 56940
TCCTCGCGAC GCGCGCGTAC GTGCACGCGC GCTGGGCGGA GTTCGATCAG CTGCTGGCCG 57000
ACTTTCCGGA GGCGGCCGGC ATGCGCGCCC CCGGTCCGTA CTCCATGCGC ATCATCTACG 57060
GGGACACGGA CTCCATTTTC GTTTTGTGCC GCGGCCTCAC GGCCGCGGGC CTGGTGGCCA 57120
TGGGCGACAA GATGGCGAGC CACATCTCGC GCGCGCTGTT CCTCCCCCCG ATCAAGCTCG 57180 AGTGCGAAAA AACGTTCACC AAGCTGCTGC TCATCGCCAA GAAAAAGTAC ATCGGCGTCA 57240
TCTGCGGGGG CAAGATGCTC ATCAAGGGCG TGGATCTGGT GCGCAAAAAC AACTGCGCGT 57300
TTATCAACCG CACCTCCAGG GCCCTGGTCG ACCTGCTGTT TTACGACGAT ACCGTATCCG 57360
GAGCGGCCGC CGCGTTAGCC GAGCGCCCCG CAGAGGAGTG GCTGGCGCGA CCCCTGCCCG 57420
AGGGACTGCA GGCGTTCGGG GCCGTCCTCG TAGACGCCCA TCGGCGCATC ACCGACCCGG 57480 AGAGGGACAT CCAGGACTTT GTCCTCACCG CCGAACTGAG CAGACACCCG CGCGCGTACA 57540
CCAACAAGCG CCTGGCCCAC CTGACGGTGT ATTACAAGCT CATGGCCCGC CGCGCGCAGG 57600
TCCCGTCCAT CAAGGACCGG ATCCCGTACG TGATCGTGGC CCAGACCCGC GAGGTAGAGG 57660
AGACGGTCGC GCGGCTGGCC GCCCTCCGCG AGCTAGACGC CGCCGCCCCA GGGGACGAGC 57720
CCGCCCCCCC AGCGGCCCTG CCCTCCCCGG CCAAGCGCCC CCGGGAGACG CCGTCGCATG 57780 CCGACCCCCC GGGAGGCGCG TCCAAGCCCC GCAAGCTGCT GGTGTCCGAG CTGGCGGAGG 57840
ATCCCGGGTA CGCCATCGCC CGGGGCGTTC CGCTCAACAC GGACTATTAC TTCTCGCACC 57900
TGCTGGGGGC GGCCTGCGTG ACGTTCAAGG CCCTGTTTGG AAATAACGCC AAGATCACCG 57960
AGAGTCTGTT AAAGAGGTTT ATTCCCGAGA CGTGGCACCC CCCGGACGAC GTGGCCGCGC 58020 GGCTCAGGGC CGCGGGGTTC GGGCCGGCGG GGGCCGGCGC TACGGCGGAG GAAACTCGTC 58080
GAATGTTGCA TAGAGCCTTT GATACTCTAG CATGAGCCCC CCGTCGAAGC TGATGTCCCG 58140
CATCTTGCAA TAAATGTCTG CGGCCGACAC GGTCGGAATT TCCGCGTCCG CTGGTTTCTC 58200
TGCGTTGCGT CTGACCACGA GCACAAACGT GCTCTGCCAC ACGTGGGCGG CGAACCGGTA 58260 GCCGGGGCAC GCGGTCAGCA TCCGATCGAT GAGCCGGTAG TGCAGGTGGG CCGACGTGCC 58320
GGGGAAGATG ACGTACAGCA TGTGGCCCCC GTACGTGGGG TCCGGGTAAA AAAGAAACCG 58380
GGGGTCGCAC GCCCCCCCTC CGCGCAGGAT CGTGTGCACG AAAAAGAGCT CGGGCTGGCC 58440
GAGCGTATCG GCCAGGAGGT CCTGGAGGGG GGTGCTGTGG CGGTCGGCCA GCACGACCAG 58500
GGAGGCCAGA AAGGTGCGGT GCTCAAAGAT CGTATTGATC TGCTGCACGA AGGCCAGGAT 58560 GAGGGCCTCG CGGCTGACGG TGGCCAGCCG CCCGTCGCCC GCGCTGCACG CGGGGCAGCA 58620
GCCCCCGATC CCCAGGTAGT AGCCCATGCC CGAGAGGGTC AGGCAGTTGT CGGCCACGGT 58680
CTGGTCCAGG CTGAAGGGGA GCGACACGGG GGTCGTCTTC ACCAGGGGCA CGGATAGCGA 58740
GCGCACGATG GCGATCTCCT CGGAGGGCGT CTGGGCGAGG GCGGCGAAGA AGCCGCGGTA 58800
GCGACGGCGC TCGTGCAGGC AGAGCTCCAG CCTGCGCGCG TGCGACGGCA GGCTCTTGCG 58860 GGAGGCCCGG CGCTCCACGC CGGGGTTCCC GGCGGCGGAA AAGCGCGACC GCCGCCGGGT 58920
CTTGTCGCGG CCGGGCCCGG GCCGGGAGCC GGAGCGACGG GGGGCGATGT CATACATAGG 58980
TACAGAGGGT GTGCTCCAGG GACAGGAGAG AGATCGAGTG TCGTCTGAGC AGCGCGCCGG 59040
CCTCGCGGAC AAATGTGGCC AGCGCGGTGG GCTTCGGCAC AAATACCTGG TACGTCTTGA 59100
AGGTGTAGAT GAGGGCCCGC AGGGCTATAC AGACCCGCCC CTCGAACTCG TTGCCGCAGG 59160
CCAACTTGGC CTTGTGAAGC TGCAGCTCGT CGCGATGGTC GGCGCGGGGG TGGCCAAACA 59220
GGACCCAGGG GTCGACTTCC ATCTCCGTGA TGGCGCACAT GGGATCGCAG AACATGTGCT 59280
TGAAGATGGC CTCGGGGCCC GCGGCCCGAA GCAGGCTCAC GAACCGGCCC CCGTCCCCGG 59340
GCTGCGCCTC GGGGTCCGCC TCGAGCTGGT CCACGACCGG CACTATGCAG TCGAAGAGGC 59400
TGGTGTTGTT CTCCGAGTAG CGGACGACGG ACGCCCTCAG GCGTCGCATG GCCAGCCAGT 59460
AGGCCCGCAC CAGCAACAGA TTGCACAGCA GGCATTCCCC GCCGGTGCGC CCGCGGCCCC 59520
GGCCGTGCTT CAGCACGGTG GCCATCAGCG GGCCCAGGTC CAGGTCGGGC TGGGCCTTGG 59580
GCTCGGCGAA CTGCGCAAAG CGCGGGGCCG CGTCGCGCAT GCGCGCCCCG CGGTGCGCTT 59640
CCCAGGACTC GCTGACCGCG GCGCGGCGGG CGTCCGCGGC GGCGCGCAGC CGGGGCCCCG 59700
ACTCCCAGAC GGCGGGGGTG CCGGCGAGCA GCAGCAGGAT CAGGTCGGCG TACGCCCACG 59760
TCTCCGGCTC ACCCCCCTGC GCCAGCGCCC CGGCGGCGGC CTCGAACTCC CCGTTGCGGG 59820
CGGCGGCGCG CGTGCAGCAG CTGTCTCCGC CCCCGCGCTT GCCCTCGGTG CAGTCGAGCA 59880
GGCGGGCGCA GTCCTTCCAG TTCATCAGGG CGGTGGTGAG GGAGGGTTGC GTTCCCGAGC 59940
CCCCGCCCGC CCCCGCCCCG TCATCGCCCC CGGAGGCCAG GGTCCCGATG AGGGCCCGGG 60000
TTGCGGACTG CGCGAGGAAG GAATAGTTGG AGTACTGCAC CTTGGCGGCG CCCGGGGAGG 60060
GCGTCGGCCT GGGTTGCTTC TGGGCGTGGC GCCCGGGCAC CCCGCCGTCG GTCCGGAAGC 60120
AGCAGTGGAG AAAGAAATGC CGGTGGATGT CGTTGATGGT CAGGGCGAAG CGCGCGAAGG 60180
AGCCGACAAG GGTCGCCTTC TTGGTGCGCA GGAAGTGGTG GTCCATGACG TAGACGAACT 60240
CGAAGGCGGC CACGAAGATG CTCGCGGCGC AGTGGGGCGC GCCCAGGCAC TTGGCGCAGA 60300
GGAACGCGTA ATCGGCCACC CACTGGGGCG AGAGGCGGTA GGCCTGCTTG TACAGCTCGA 60360
TGGTGCGGCA GACCAGACAG GGGCGGTCCA GCGCGAAGGT GTCGACGGAC GCCGCGGCGA 60420
AGGGCCCCGT GTCCAAGAGT CCCTCTGCCG TGGGGTCTGC GGGCGGGCCG CGGGCGGACC 60480
CCGGCCCCCG CCCCCCCGAA GCCTCGCGCG CGGCCCCGCG CGGCCGCGGG GGGGCGGGCG 60540
CGACGTCGCT CTCCACGTCC TCGTCGAGCG CGCTCGCGGG CGGCACGCCT ACCACGTGAC 60600
AGGCCGCCAG GAGCTCGGCG CACAGGGCCT CGTTAAGAGC CAGAAGGTCG GGATCGAAGG 60660
CCACATACGG ACGCTCGAAC GCGCCCTCCT TCCAGCTGCT GCCCGGCGAC TCTTCGCGCA 60720
CGGCGGCGCT CGACGGCACC CCCGGGGCGG ACGTCGCCAT GGCCGGTCGA GCGGGGCGCA 60780
CGCGTCCGCG AACGTTACGG GACGCGATCC CCGACTGCGC GCTGCGGTCC CAGACCCTGG 60840
AAAGTCTAGA CGCGCGCTAC GTCTCGCGAG ACGGCGCGGG GGACGCGGCC GTCTGGTTCG 60900
AGGACATGAC CCCCGCCGAA CTAGAGGTTA TATTCCCGAC CACGGACGCC AAGCTGAACT 60960
ACCTCTCGCG GACGCAGCGG CTGGCCTCCC TCCTGACGTA CGCCGGGCCT ATAAAAGCGC 61020
CCGACGGCCC CGCCGCCCCA CATACGCAGG ACACCGCGTG CGTGCACGGC GAGCTGCTCG 61080
CCCGAAAGCG CGAACGGTTC GCGGCGGTCA TTAACCGGTT CCTGGACCTG CACCAGATCC 61140
TGCGGGGCTG ACGCGCGCTT CGGCGGGGCA CCGGCACCGG GACCGACTTG TTTTACATAA 61200
CAGTAGGGGG TGGGGGAACG CGCACCCTTG CCCGGTCGCG ATGGCGGGGA TGGGGAAGCC 61260
CTACGGCGGC CGCCCGGGGG ACGCGTTCGA GGGTCTCGTT CAGCGCATCA GGCTCATTGT 61320
TCCCACCACG CTGCGCGGCG GGGGTGGGGA GTCGGGCCCC TACTCGCCAT CCAACCCGCC 61380
CTCGAGATGT GCCTTCCAGT TCCACGGCCA GGATGGGTCC GACGAGGCCT TCCCGATCGA 61440
GTACGTCCTG CGGCTCATGA ACGACTGGGC CGATGTGCCC TGCAACCCCT ACCTGCGCGT 61500
GCAGAACACC GGCGTTTCGG TGCTGTTTCA GGGGTTTTTT AACCGGCCCC ACGGCGCCCC 61560
GGGGGGCGCG ATCACGGCGG AGCAGACCAA CGTGATTCTG CACTCCACCG AGACGACGGG 61620
ACTGTCCCTC GGAGACCTGG ACGACGTCAA GGGGCGCCTC GGCCTGGACG CCCGGCCGAT 61680
GATGGCCAGC ATGTGGATCA GCTGCTTTGT GCGCATGCCC CGGGTGCAGC TCGCGTTTCG 61740
GTTCATGGGC CCCGAGGACG CCGTTCGCAC GCGGCGGATC CTGTGTCGCG CCGCCGAGCA 61800
GGCCCTCGCC CGTCGCCGCC GGTCCAGGCG GTCCCAGGAT GACTACGGGG CGGTGGCGGT 61860
GGCGGCGGCG CACCACTCTT CCGGAGCGCC CGGGCCGGGG GTCGCCGCCT CGGGCCCGCC 61920
AGCGCCGCCC GGACGGGGAC CGGCCCGTCC GTGGCATCAG GCCGTGCAGT TGTTCCGGGC 61980
CCCGCGTCCG GGCCCCCCGG CGCTTCTGTT GCTGGTGGCG GGGCTGTTTC TGGGGGCCGC 62040
TATCTGGTGG GCGGTTGGCG CGCGCCTATG AAAGGGGGCG AGCCACCGTC CCGCCCGCCA 62100
GTGCATCCCA GACGCCCGCG AGCCGCACAT CCCCTCCGCT CCCGCCTCCG GCCCGATTCT 62160
TACGGCGCGA CCCAAGGTCC CGATGGCCGC CCCGCAGTTT CACCGCCCCA GCACCATTAC 62220
CGCCGACAAC GTCCGGGCGC TCGGCATGCG CGGGCTCGTG TTGGCCACCA ACAACGCTCA 62280
GTTCATCATG GATAACAGCT ACCCGCATCC GCACGGAACG CAGGGTGCGG TGCGAGAGTT 62340
TCTTCGCGGG CAGGCCGCGG CGCTGACGGA CCTCGGGGTG ACCCACGCCA ACAACACGTT 62400
CGCCCCGCAG CCTATGTTCG CGGGCGACGC CGCGGCCGAA TGGCTGCGGC CCTCGTTCGG 62460
TCTTAAGCGC ACGTATTCCC CCTTTGTCGT TCGCGACCCC AAGACCCCCA GCACCCCGTG 62520
AGTCCTCGGC GGGTCCCTCC GCGGCCGTCT CTCGTTGCCC CCCCTTTCCC CCTTCCCGGG 62580
TGGTTCAATA AAAAACACCA ACATACGATA TTCGCGTTTG ATACGTTTAT TGGGGGGGTG 62640
TAGGGCCCAA CGATCGGCGA TTAACAACAC CAAACAATCG AGCGCGTCTA ACCCAGTAAC 62700
ATGCGCACGT GATGTAGGCT GGTCAGCACG GCGTTGCTGC GCTGAAACAG CGCCCTGCGG 62760
GTCCGCTGCA GCTGTTGTTG TATGCGGCGG CATGCGCGGA TCAAAACCGC CAGGGCGCTA 62820
CGACCGGTGC TTCGTACGTA GCGTCGCGAC AAGACGGCAT TTGCCTGTAC GGGCAAGGGG 62880
CCAAATTGCG AGTGTGGTGA CTGGAGGTGG TCGGCGGCCA ATGGGCCGGG TGGTTCGTCG 62940
GCGGGGGGCA AGTGCGGTTC CGGTGGGAGG GGGTCGAGCG CCTCGGTATC ATCCGAGTCC 63000
GAGAAACGCA GGGAGTCTGC GTCGGAGTGT TCATCATCGG AGGAGATGTG CAGCGTCTGA 63060
AGCAGCGATG CGGGTGGGGG CGCGGAGTCG ACGTGAAGCG CGAGAGAGGA AGCCCACGAA 63120
GTCACAGCGG ACACTGGGAG GTGGGTGTTT GTATGTGTGG GAGACTCGGG CGTCGGGACC 63180
GAGTCTCGGC TCTGGGGCAG GGGCGGCTGG GGCAGGGGCG GCTGGGGCAG GGGCGGCTGG 63240
GGCAGGGGCG GCTGGGGCAG GGGCGGCTGG GGCAGGGGCG GCTGGGGCAG GGGCGGCTGG 63300
GGCAGGGGCG GCTGGGGCAG GGGCGGCTGG GGCAGGGGCG GCTGGGGCAG GGGCGGCTGG 63360
GGCAGGGGCG GCTGGGGCAC CGAGCGCGCG CGGATGCGCG TCCGCGCGGC GGGTTTGGTC 63420
GCGGGTGACT GGGGTGGGGG GCGGCGGGCA ACCGGGCCTC CGGGCACGAC CCAACCGCAC 63480
AAAGGCTCGC TCGGGGCAAC CGGGCCTGGG GCCAAAGGCG GGGGGCTGGT CTGGACGGCG 63540
GAGGTCGGGG GGGCAAGGCC CGGAGAAGGC GGCACTGCCG CCGCTGCGGC GGAAACCGCG 63600
GCCGCGTGGT CGGCTGGGTC CCGGGGAGAG GGGAGGGAGT TCAACGAGGC CGAGAGCGAG 63660
GCGACCGCGG GGCGCGTGAG GCGCCGGGGT GGGCCGGCCG CGGGGCCCCG GGGGGGTGTC 63720
GGCGAGGGAC CCGCTGTTGT CTGGCGGCGG CCGCGGCGGC GGTCGTCCCC GGGGGCGACC 63780
GCTCCTTCGG CGGGCGGAGG CGGGATGGGC GCGAGCGTGG GGGCGGGAAA GGCCCCGCGA 63840
GCCGAGGCGG GGCCGGGCGG AAGGGGCAAA GCAGAAACCC AAGCCGGGGG CGCGGACTCC 63900
GGGGTGGGCG GCTGGTCGGG AGGACGCGCG GAAGCGGCGA CCGGGGCGAC CGGGGCGGGG 63960
AGTGCCGGCG GACGCCACCC CTCGGGGGGG GCGGAGGCCC GGGGCGCGCG CGATTTGGCA 64020
CGCGTCCGGC GGGATCTGCG CACGCGCGGC ACGGCGGCGG AGAAAGCGGC GGCAGAGCCG 64080
GAAAAGGCCG GGGGAGGAAG CGCGGCATCC GCGGGGGGAC TCGGTGTGGG TGGCGAGGGC 64140
CGTGGGTCGT CGCGAGGGGC CACGGGCACG CGCCCCGTGT TTTGTTGAGG CGGGACACTC 64200
GGTCGTGTTT CGCGAGCCGT AGCTGCCGGC CCGATGGGCC GCGGTGCGTA CTGGGACGTG 64260
GGGACGGACT GATCGGTGGC GGGGGGGGGA AGAAGGGCCG GGGCCGGATT GGGCGTGGGG 64320
CCGCCGGCGT CGTCGGACGC CAGCTCCTCC AGGCCGTGGA TCCAGGCCCA CATGCGAGGG 64380
GGGACGGGCT CGCCGGTGGT GGCGTCGGTG AGGAGAGTGG GGGCGAGGAC CCCCGGGTCC 64440
GCCTGCCGCG CGGGGGGGGC AGCGGGGTCC TCGGGACCCG ATCCGCCATC CCCCCCCGCA 64500
AGGTCCCGCG GGTCGCGGGC GGCGGTCGGG GCAGAGGGAC CTGCCTCGTC GGCGAGGGGG 64560
CGCTGGTAAA CCGGGTGTCC CGGGAACAGC TCCCCCGTCA GGAGGGAGGC GTCGAAGGGC 64620
CGCCCGAGGA TGGCCCGCGC GAAGAAGGGG TCCGCGTCGG CGGCGCTCGC CGCGAGAACG 64680
TCCCCCGCGG TAGCCACAAA CGGAAGCTCC TCGGTGGCCT CGCTGCCCAC AAACCGCACG 64740
TCAGGGGGGC CGGGGGGCTC CGGGGCTTCC CACAAGACCG CGACCGGGGT CATGGAGATG 64800
TCCACGAGGA CCAGGCACGG GGGCCCGTCG GCGAGAGGGC GCTCGGCGAT GAGCGCCGAC 64860
AGGCGCGGGA GCTGCGCCGC CAGACACGCG TTTTCGATCG GGTTGAGATC GGTGTGGAGG 64920
AGGCCGACGG CCCACGTCTC GATGTCGGAC GACACGACGT CGCGCAGGGC GGCGTCCGGC 64980
CCGCCGGGGC GCGAGTCGAA GAGCGTCAGG CACAGTTCCA GTTCCGACTC GCGGGAGAAG 65040
GCCGTGGTGT TGCGGAGCGC CACCACGACG GGCGCGCCGA GGAGCACCGC GGCCAGAACC 65100
AGGTCCATGG CCGTAACGCG CGCGGCGGGG GTGCGGTGGG TCGCGGCGGC CAGCACGGCC 65160
ACGTGCTGGC CCGTGGGTCG GTAGAGGGCG TGGGGGGCCT CGGGGAGGGA CGCCTCGCGC 65220
CCCCCCGCCG GGCCGAGCGT CTGGCCAGAC TCCAGGCGTG CGGCCAGGAG GGCGTCGAAG 65280
CTGTCGTACT CGGTGTAGTC GTCGGGAAAC ATGCAGGTCC ACAGCGCGGC CAAAGCGGCG 65340
CTCGGCAGAC ACATGCGCCC GAGGACGCTC ACCGCCGCCA GGGCCTGGGC CGGACTGAGC 65400
TTCCCGAGCG CCGGGGCGTC CCGGCGCTGG GTCCCGAGCT CCAAGGCCGA GCGCCAGGGC 65460
GCCAGCGGGT CGGTTTCGGA CAGCTTGCCC CGGCGCCAGT CGGCCAGCCG CGTGCCGAAC 65520
AGGAGGCCCC GGGTCGGGGG GCCTCCGTCC AAAAACGTCG GCAACACGCG GATGCGGGCG 65580
TCGGGATGCG GGGTCAGGCG CTGGACGAAC AGCATGGACT CCGCTGCGTC CTCGAACGCG 65640
CGTTCGAGGG TGAGGTGCAT GTACTCGTGC TGGCGAACGA GGTCCAGGCG CCAGAAGTTG 65700
TAGATGTGTT CCGGAACGCC GGCCACCAGC GCGACCAGCA CGTCGTTCTC GTTGAAGGCG 65760
ACGCAGTGGC GCTGGGACCC CCGGGGGCCC GGCGGCGGAC GCGGCGCCGC CGCTCCGGAC 65820
GCCCAGCCCA GCTGGGCCCA GCGACACCCA AACTCGCGCG TGAGGGTGGT GGCGACGAGG 65880
GCGACGTACA GCTCGGCCGC CGCGTCGATC GAGGCGCCCC ACGTCGCCTG GCGATGGCGC 65940
ACGAAGCGAC CGAACAGCTG AAAGTTGGCG GCCTGGGCGT CGCTGAGGGC CAGCTGGAGC 66000
CGGTTCACGA CGGTCAGCAC GTACATGGCC GTGACCGTCG GGGCCGATTC GAGGACGTCC 66060
GTCGGAAGCG GGGGCCGCAC GCAGGCCGCC TCGGGACGCA TCAGCAGCGC GCCGAGTTTG 66120
TCGGTGACGG CCGGGAAGCA TAGCGCGTAC TGCAGCGGCG TTCCGTCCGG GGCCAAAAAG 66180
CTGGTGGCGA ACGGCAGATC CAGAGCGCTG ACGGCCTCAC GCAGCACCAG GGGCCCCGGG 66240
TCTCCGCCGG CGCGCAGATA CGCCTCGCCC CGGCGGCGCA GCAGCTGCGG GTCGACCTCG 66300
TGGCCCTCGG GGGAAGAAGA GGCCCGGGCG CGGGCGTCGA GGGCGCGAAG ATCAACGAGC 66360
AGGGGCGCGG GCGCGGCCGC CGCGCCCGCG CCCGTCTGGC CGCCGGCCCT GGCGTACGCG 66420
CTATATAAGC CCATGCGGCG TTGGATGAGT TCCCGCGCGC CCCGGAACTC CTCCACCGCC 66480
CACGGGGCCA GGTCCGCGGC CGCCGCGTCG AACTCCGCCA GCAGGCGCCC CAGGGCGTCA 66540
AAGTTCATCT CCCAGGGCAC CCTGCGCACC ACCTCATCCC GCAGCCGGGC GCACAGGGCG 66600
GTGTGCTTGG TGACGCGCGC GCCCAGCTCC TCCACGGCCT CCGCGCGCTC GGCGCCCTTG 66660
GCGCCCAGGA CGCCCTGGTA CCTGGCGGAA AGGCGCTCGT AGGCCGGCTG GGCCCGCAGC 66720
CCCGACACCG TGTTGGTGGT GTCCTGCAGG GCGCGCAGCT GCTCGTGCAT GGCGCGGAAC 66780
CCCTCGGGGG ACTTCCAGGC GCCCCCCCGG ACGCGGCCAA AGCGACCCCA GACCTCGTCC 66840
CACTCCGCCT CGGCCTCCTC CAGGGACCTC CGCAGGGCGT CGACGCGGCG CCGAGTATCA 66900
AAGAGCGCCC CCAGGCGGCC GGCGTGCCGC GCCAGGGGGC CGGGGCCGTC GCCGCGGGCG 66960
GCGCTTAGCG GGTGCGTCTC GAAGGTGCGC TGGGCGTGCT CTAGCCAGAT AACCGCGGGC 67020
ACGTCGAGCT CGCGCGTTTT CTCGGTCTGA TCCAACAGAA CCTCGACCTG GTCGGCGATC 67080
TCCGCCACCG AGCGCGCCTG GTCGAGCGTC TTGGCCACGG TCGCCGGGAC GGCGACCACC 67140
TTCAGCATGG TCTTGAGGTT GGCCAGGCCC TCGGCCTCGA TCTGGGCCCG GCGCTCGCGC 67200
GCGGCCAGCG CCTCCCGCAG GCCCGCCATG ACCCGCTCGG TGGCCTCCGC GCGCTGCTGT 67260
TTGGCGCGCA CCACTGCGTC CTTGGTCTCG GCCGTGTCCT GCCGGGTCAC GAAGGCGACA 67320
TACTCGGCGT ACGCCGTGTT CTTCACGGGG CTCTGGTCCA CGCGCTCCAA CGCCGCCGCG 67380
CACGCGACCA GCGCGTCCTC GCTGGGACAC GGCAGGGTGA CCCCGGTCCG GACCAGCTCC 67440
GCGGTGGCCT CCGGGTCATT CCGGGCCGCG GATATCTGCT CCGCGGCGGC CGCCAGGTCC 67500
AGGGGCACGC CGCCGAGCGC CCGGTGCACG TCGGCCCGGA TGGCGTCCAG GCGATCGCGG 67560
AGCTCCACGT AGTCGGCGTA GCCATGTTGG AAGAACGGCA CGTACCGGCG CAGGCCGGGC 67620
ACGCTCGTCA TGTCGTCCGC CAGGCGCCCC ACGGCCTCGT GGTAGTCGAT AAACCCGTCG 67680
CCCGCCTGGG CCATTTCCAG GAGCCCCTCC GCGATGCGCA GCAGCCGCGC CAGGGGCTCG 67740
GCGTCGACCC GAAACATGTC GGCGTAGGTT TCGGCGGCGG CGTGGAACGC CGCGCTCCAG 67800
CCGAGGCGGT GGATGGCGGC GAGCGGGGGG AGCATGGGGT GGCGCTGGTT CTCGGGGGTG 67860
TAGGGGTTAA ACGCGAAGGC CGTATCCAGG GCGAGGGTGA CCGCCTCGGC GTTGGCCGCG 67920
AGCGCCTGCT CGGCGCGCTT GCGGAAGTCC CGGGGGTTGT AGCCGTGCGT GCCCGCCAGC 67980
GCCTGCAGGC GGCGCAGCTC GACCACGTCG AACTCGGCGC GGTTCTCGAC GCGGTCCAGC 68040
GCCGCCTCGA CGCCGGCGGC CCAGCGCTCG CTGCTGCCCC GGGCGCGCTG GGCCGCCATC 68100
TTCGCCGTCA GGTCGGCGAC GGCGGCCTCA AGTTCGTCGG CGCGGCGTCG CGTGGCGCCG 68160
ATGACCTTGC CCAGCTCCTG CAGGGCGCGC CCGCTGGGGG AATGGTCCCC GGCCGTCCCT 68220
TCGGCGTGCA GCAGGCCCCC GAACCCAGCC TCGTGCCCCG CGAGGCTTTC CCGAGCAGCG 68280
GTCGTCGCGC GGGCCGCGGC ATCGATGAGG GCGGCATGGT CCCCCTCCGG CTGGGCGCAG 68340
GCCCGGCGCG CCTGGACTAC CAGGTCGGCG GCCGCCGACC CCAGGGTCGT GAGCTCGTCG 68400
ATGGCCCCCC GCGCCTCCAG GGCCAGCCGA GTCGCCTTTA CATACCCCGC GGCGCTATCG 68460
GCCAGCGCCG CGAGGAAGGA CAGGGGGGAG GCCGGGTCGC GGGCGGCCGC GCCCAGGGCC 68520
GACACCGCGT CCGCCAGGGC GCCATGCGCC CGCACGGCCG CGTCCACCGT CGCCGCGGGA 68580
CTTGCCGTCG CGACGGCGGC GCTCCCGGCG TTGATGGCGT TTGACACGGC TTTGGCGATT 68640
GTGGGGGCGT GATCGGAAAA GAACTGCACG AGGACCGGCG TCTCGGGGGC GTCGGCGAAC 68700
AGGGTCTTCA GCACCACCAC GAAGGCGGGA TGCAGGCCGG CCAGAGCCGT CGCGGTATCC 68760
GGGGTCGGGT GTTCCAGGGC CTCCCGGTAC TGCCCCAGCA GCCCCCACAG GTCCGCCCGC 68820
AGCGCCGCCG TGACTTCCGG GGGGGGGCCC CGGACGGCAT CGGCCAGGTC GGTCCACCCC 68880
GCGGGCAGGG AGGCCCGCAG GGTCGCCAGC ACGGCCGGAC ACGCCTTTAG CCCCACAAAG 68940
TCCGGGAGGG GCCGCAGGAC CCCTTGGAGT TTGTGCAGGA ACTTCTCCCG GGCGTCGTGG 69000
GCCACCTTGG CGCGCTCCCG CGCGTCGTTG AGCATCGCCT CCAGGGCGTG GGCGCGCTCC 69060
CGAAGCCGGG AGCGCGCCTC CGGAGCGAGC TCCGCCGTCA TCTTGGCCGC CTCCATGGCC 69120
CTCGCCTGCC GCAGCGCGTC TTCGGCCATG CGCGTGGCCT CGGGGGACAG CCCGCCCCCG 69180
TCGACGTACG GCGCGGGGCC GGTCGCCGGG ACGAAGGCCG CGTCGCTGTC CAGCTGCTGC 69240
GCGAGCGCCG CGTCGAGGGC GTCGAAGCGC TGCAGTTCGG CCAGCCCCGA GCTGCGCCGC 69300
GCCTGCTGGT CGTTGATGCC GTGGATGCTG CGCGCCAGCT CTTCCAGGGG CTTGCGTTCG 69360
ATGAGCCCCT GGGTCGCGGC GTCGGTCAGG ACCGAGAGCC AGGCCGCCAG GTCCTCGGGG 69420
GCATCTAGGG TCTGGCCCCG CTGGAGCAGG TCCCGCAGCA GGATGGCCTG GGGGCTGGTG 69480
GCGAGGGGGG GCGGGGGGGG GAGCGCGGCG CGCTGAGCGA CGTCCCGCGT GTGTTGGTCA 69540
AAGGCCGGTA GCGATTCCAG CAACTGGACC ATGGGCACGA CCGCGGCCGA GGCCACGTGA 69600
AACCGACAGT CGTGGCTGTC GCTGGCCTGC AGGGCCTTCG CGCTGTATAC GGCTCCCCGG 69660
TGGAAGTACT CCTTGATCGC GCTCTCGATC GCCCGGCGGG CCTGGATCCG CACGTCCTCC 69720
AGCCGCGCCT GGATGGCCTC GGGGCCCAGG GCGGGCGGGC ACGGGGCCCT GCCGCCGGCG 69780
CCCGGGGCGG CGGGCACGGG CATCACGGTC AGGGGCCCGG CGCGCTGCGA GACCGAGTCG 69840
ACCCCGCGGG CGAGGGCGTC TAAGGCCTCG CGCATCTCGC GGGCCTCCGC CTCGACCCGC 69900
ATCTCTTCGC CCCGGGCAAA CTGGGCCAGC GCCTGGATCC GATGGAGAAG CGGCTCCGGG 69960
TGCGTCGGGG TGGCGGGGGC GAACAGGGTG TTCGGGTGGG CGCGCGAGCG CTCCAGGAGC 70020
CACTCTCCGA GGCGTGCGTA CAGATTGGCC GGCGGGGCGG CGCGCAGCTG CAGATCCAGG 70080
TCCGCGAGGT CCCCGTAAAA GGCGTCCGTC TCCCGAATAA CGTCCCTGGC GACCAGGACC 70140
AGCTTAGCGA GGGCCAGGCG CCCGATCTGC GAATTTTCGT CCAGCACGTG CTGGATGAGG 70200
GGCCGGTGGG CGGCCACGTC CGCCAGGCTC ATGCGCGTGG ACGCCAGGAA GTCCCCGACG 70260
GCCGTTTTGC GGGGCAGCAT GCGCAGGGTG AAGTCCAGCA GGGCCGCGGC CGGGCCGGCC 70320
ACCCCGGCCT GCGTATGCGT GCGGGCCCCG TTCTCGATCA AAAAGGCGAG GACGCGCTCA 70380
AAGAAGAAGA TGACGCAGAG CTCCAACAGC CCCGGGTGCG CCGGGTACGG CGACCGCAGG 70440
GCGTTGATGG TGAGCTGCGA ACACGCGGCC ACCTCGCGGG CCAGGGCGGC ATCGCGCGCC 70500
GCGAGCCGGA CCGCCGTGGC GGCCACATTG GGGTGGACCT CGAACAGCTG CGCCAGGTCG 70560
GCGCCGGGGG GCTCCGGGGG GCGGCGGGCC CCCAGCGTCT CGAGCACGGA CGGCGACGAC 70620
GGGCTCGCGG GCCCGTCATC GCCGCCTCCC TGCCCGGACT GCGGGGGGGT ATCCGGTGCG 70680
GGAGGGACCG TGGCGGCTAT GGGCGTCGGG GAGGAGGCGG GGACCTCGGC GGCGACGGGG 70740
GCCTTCTTCT TGGGCGCGGA CTTCTTCTTG GCCTTGGCGG GCGGGGCCTT GGGGGCGGGC 70800
CTCTCGCCCG AGGTCAGATC CTCCACGCTG GACGGTGGGG TCCAGGTGGG CCGGCGGCGC 70860
TTGGGCAAGC CGGTAGAATA GCGCGCCCGG TGGCGACCCA CCGGCACTGC CCCCACCTCC 70920
AGGACCCGCA GGTCCTCGGC TTCTTCGGCC GCGTCCCCGG CGGGTGTCTG CGGGGGCGGG 70980
GCGGCGTGCG GTGGACCCGA GGCCGCGGCG TCCGGGGCCG AGGGCTTTGC GGGCGGGGTC 71040
CCCTCCAGGG CTGCTGCCCA CACATCATCG GGGGGGCGGT TTGGGTGCCC CGCCTGCGGT 71100
GTGTTGGGTG GGCCCGAGGC CCCCCGGGGG GCCTCGGGGG GCCGGTCGGC CCGAGGGGTC 71160
TGGACGTGGG TGGGCGCGGG GAGCGCGGGG ACGACCGGGC CCGAGCCTTC TCCGTCCCCC 71220
CTGGGGACCA CACCGACAAA GAGCGCCCCT AGCCCCCCGA TCTCGCCCCG CAGGGGGTGG 71280
GTGATGGCCA CGCGCCGCTC GACGAACGGT TCGTCCTGCA GGTAAGTCTC GCTGGCCCCG 71340
TAGAGGTGCA GGGCCGCGGC GGTCAGGTCC GCCGGCGCCA CGGCCCCCGG GCCGGAGGGC 71400
ACAAAAAACA CCATGGCGCC CGCCCACCGC ACCTTGGGGC GGTCGTGGGC GTAATACGTC 71460
AGGTACGGGT ACACGTCGCC CGCCCGCACC TTGGCGATAA ACGCGGGCGT TCCCGCGGGC 71520
AGGCCGTGCG GGTCAAACAG ATAGGCCGTG TCGCCGTCCC GGTAGAGCCC CATGCCCAGG 71580
GGGCCGATGG TCAGGAGCGT GTAGGACAGC GGCCGCATGG CCCAGGGGCC GGCGAAGAAC 71640
GTGTGCGCGG GGCATTGCGT CTCCAGCAGC CCCGCCGTGG GCTCCCCGAA GAAGCCCACC 71700
TCGCCGTACA CCCGCGAAAA CACGCAACGC AGGCCGCCGC GCGCCGCCGG GTACTCCAGG 71760
AAGTTGGGGA GCTCGATAAT GGAACACATG CGCGGCGGCC CGGAGCCCGC GGCCGCGCGC 71820
GTCCACTCGC CCCCCTCCAC CAGACATCCC TCAATGGCCT CCGCGGACAG CACGTCGCGG 71880
GGCCCCACGT CGAAAAGAAG ACTGAGAAAC GACAGGGACG AGCGCATGCA CGATACCGAC 71940
CCCCCCGGCT CCAGATCGGT CGCGAACTGG TTCCGAACAC CGGTGACCAC GATATCGCGA 72000
TCCCCCTGGC GCTTCATCGT GGGGTGAGGT AGCGCGGCCG GAATCATGTG TGCCGCGCCC 72060
GCCACGAGCG GGGCCTGTTT ATGGGCCGGG CGTCCCGATG AGTACTGTTG TTTCCGCCGC 72120
CCGAACCCCC CCCGCCCATC AACCGCCTGT TCGTCCCCCT AACCACACAC CCGGTATCGC 72180
GTGTGTGGTT TCCCGGGAAG CCACATCCCA CCCCATGAAG TTTTGCCCTT TTTTTCCGTC 72240
CCGCACTACG CCACCTTTCC ACCCCCCCCC CCCAAAAAAA AAAAAACAAC AACCAACTCC 72300
CAGATGGATG GGTGCGATAA TAAAGCTTTA TTATTGTTTA ACCAAAGGCG AGTCCTACGG 72360
GTGTACCGGT GGTGTCTCCT GCGGCGTCAT CTCGTCGTCC TCCACGGGGG TGTTGGGCCA 72420
AGGGACCGTC TCGCGGCCCG CCGGGCGCGT CGACGGCGCG CGGGCCTGCG TGTCCTGTGG 72480
GCCGGGTGTC GTGGGTTCGG GGGTGCTACC GCCGGCATCT TGGGCCTCCA GGTCCCCGGG 72540
GGCCTCCGGG CCGGCGGAAG GCCGAAACGC CGAGGCGCGA AACACGCCGT CGGTGACCTG 72600
CAGGAGCTCG TTTATTAATA GCCAGTCCAT GCTCAGCGTA GCGGCCAGCC CCTGGGGAGA 72660
CAGGTCCACG GAGTCCGGAA CCACCGTCGG CTGACCCAGG GGCCCCAGGC TGTAGTCCCC 72720
CCAGGCCCCC AGGTCATGAC GGTTCGTGAG CACGACGAGG TCTGCGGCCG GGCTGGGGGG 72780
CGCGTCCTCG GTCGCGTGGG CCATCACCTC CTGAATGGCT GCGGTGCGCT GATCGGCCGA 72840
GCTGGCGAAG GGCGCCACGA CCAGCGCGCG CTCCGTCTGC AGGCCCTTCC ACGTGTCGTG 72900
GAGTTCCTGA ACGAACTCGG CCACCCGCTC GGGGCCCGTG GCCGCGCGCG CGGCCTGATA 72960
GCCGGCCGAG AGGCGCCGCC AGCGCGCCAG GAACTGACTC ATGTAACAGA ACCCGGGGAC 73020
CTGGTCCCCC GACATCAACT TTGACGCCCT GGCGTGGATG CCCGACACGA TGGCCAGGAA 73080
CCCGTGGATT TCCCGCCGCA CGACGGCCAG CACGTTACCC TCGTGCGAGA CCTGGGCCGC 73140
CAGCTCGTCG CATACCCCGA GGTGCGCCGT CGTCTCGGTG ACGACGGACC GCAGCCCCGC 73200
GAGGGACGCG ACCAGCGCGC GCTTGGCGTC GTGATACATG CCGCAGTACT GGCTCACCGC 73260
GTCGCCCATG GCCTCGGGGC GCCAGGGCCC CAGGCGCTCG TGGGCGTCTG CGACCACGGC 73320
GTACAGGCGG TGCCCGTCGC TCTCGAACCG GCACTCAAAG AAGGCGGCGA GCGTGCGCAT 73380
GTGCAGCCGC AGCAGCACGA TCGCGTCCTC CAGCTGGCGG ACCAGGGGGT CGGCGCGCTC 73440
GGCGAGCTCC TGCAGCACCC CCCGGGCCGC CAGGGCGTAC ATGCTGATCA GCAGCAGGCT 73500
GCTGCCCACC TCGGGAGGCT GGGGGGGAGG CAGCTGGACC GCGGGCCGCA GCTGCTCGAC 73560
GGCCCCCCTG GCGATCACGT ACAGCTCGCG CAGCAGCTGC TCGATGTTGT CGGCCATCTG 73620
CATCGTGGGC CCGACGCCGG CCCGGGTGGC CGGTTCGAGG AGGGTGATCA GCGCGCCCAA 73680
TTTTGTGCGG TGCCCCTCGA CGGTGGGGAG ATAGCCCAGG CCGAAGTCGC GCGCCCAGGC 73740
CAGCACCCGC AGGGCAAACT CGATGGGGCG GGGCAGGTAG GCAGCGTTGC ACGTGGCCCT 73800
CAGCGCGTCC CCGACCACCA GGGCCAGCAC GTAAGGGACG AACCCCGGGT CGGCGAGGAC 73860
GTTGGGGTGG ATGCCCTCCA GGGCCGGGAA GCGGATCTTG GTGGCCGCGG CCAGGTGAAC 73920
CGAGGGGGCG TGGCTAGGCG GCCCGACGGG GAGCAGCGCG GACAGCGGCG TGGCCGGGGT 73980
GGTGGGGGTC AGGTCCCAGT GGGTCTGGCC GTACACGTCG AGCCAGATGA GCGCCGTCTC 74040
GCGCAGGAGG CTGGGCTGGC CGGCGCTGAA GCGGCGCTCG GCCGTCTCAA ACTCCCCCAC 74100
GAGCGTGCGC CGCAGGCTCG CCAGGTGTTC CGTCGGCACG GCCGGGCCCA TGATGCGCGC 74160
CAGCGTCTGG CTGAGGACGC CGCCCGACAG GCCGACCGCC TCACAGAGCC GCCCGTGCGT 74220
GTGCTCGCTG GCGCCCTGGA TCCGCCGGAA CGTTTTCACG TAGCCGGCGT AGTGCCCGTA 74280
CTCCCGCGCG AGCCCGAACA CGTTCGCCCC CGCAAGGGCA ATGCACCCAA AGAGCTGCTG 74340
GATCTCGCTG AGCCCGTGGC CGGGGGGCGT CCGCGCGGGC ACCCCCGCCA CCAAAAACCC 74400
CTCCAGGGCC GATATGTACT GGGTGCAGTG CGCGGGCGTG AACCCCGCGT CGGTAAGCGT 74460
GTTGATCACC ACGGAGGGCG AGTTGCTGTT CTGGACCAAA GCCCACGTCT GCTGCAGCAG 74520
CGCGAGGAGC CGTTGCTGGG CCCCGGCGGA GGGCGGCTCC CCTAGCTGCA GCAGGCCGGT 74580
GACGGCCGGA CGGAAGATGG CCAGCGCCGA CGCACTCAGA AACGGCACGT CGGGGTCGAA 74640
GACGGCCGCG TCCGTCCGCA CGCGCGCCAT CAGCGTCCCC GGGGGCGCGC ACGCCGACCG 74700
CGGGCTGACG CGGCTTAGGG CGGTCGACAC GCGCACCTCC TCGCGACTGC GAACCATTTT 74760
GGTGGCCTCG AGGGGCGGGA TCATGATAGC CGGGTCGATC TCCCGCACCG TGTGCTGAAA 74820
CTGGGCCAGC AGCGGCGGCG GGACCACCGC GCCCCGATCG GGGGTCGTCA GGTACTCGTC 74880
CACCAGCGCC AGCGTAAACA GGGCCCGCGT GAGGGGGGTC AGGGCGGCGT CGTCGATGCG 74940
CTGTAGGTGC GCCGAGAACA GCGTCACCCA ATTGCTGACC AGGGCCAAGA ACCGGAGACC 75000
CTCTTGCACG ATCGGGGACG GGAAGAGCAG GCTGTACGCC GGGGTGGTCA GGTTGGCGCC 75060
GGGTTGCCCC AGGGGAACCG GGGACATCTT AAGCGACATC TCCCCGAGGG CCTCCAGGGA 75120
GGTCCGCGGG TTCATGGCCA GGCAGCTCTG GGTGACGGTC CGCCAGCGGT CGATCCACTC 75180
CACGGCACAC TGGCGGACGC GCACCGGCCC CAGGGCCGCC GTGGTGCGCA GCCCGGCGGC 75240
CTCCAGCGCG TGGGTCGTGT CGGAGCCGGT GATCGCCAGG ACCGTGTCCT TGATGACGTC 75300
CATCTCCCGG AAGGCCGCCT CGGGGGTCTC GGGGAGCGCC ACCGCCATGC GGTGCACCAG 75360
CAGCCCGGGG AGGTTCTCGG CCAAGAGCGC CGTCTCCGGA AGCCCGTGGG CCCGGTGCAA 75420
GGCGCACAGT TGCTCCAGGA GCGGGTGCCA GCACGCCCGC GCCTCCGCCG GGCCGACCGC 75480
CGCGCCCGAC AACAGAAACG CCGCCGTGGC GGCGCGCAGT TTGGCCGCGG ACAGAAACGC 75540
CGGCTCGTCC GCGCTGCCCG CCGGCTCGCT CGAGGGGGAG GGCGGCCGGC GGAGGTTGGT 75600
CAGGCTCCCC AACAGGACCT GCAACGGTCC GTTTGGGGGT GGAGCGGACG GGGGGGTCAT 75660
GCCGGCGGGC GCCGGGACCT GGAGCGCGCT GTCCGACATG GCGACCGGCG TGCGCGCTCG 75720
GCGACGCGGC GCGGAGACCG CGGGCCCAAA CGGGAATGAC TGCCGCCGCC CTATACGGAG 75780
GGGCTAAGTA TCGCCCGGGG ACCCTTCGAA ACCCCGGGCG TGTCGCAAGT ACGCCGCGAA 75840
GGCGCGGCGT GTTATACGGC GCGTTATGTC CCGGCATTCC GTTCGTGGGT TCGGGCCCGG 75900
GTGCTGTCGG GTGGGAGTGT GTGTGTGTGG GGGGGGGGCG GCGCGACGGC GGCCCGGACC 75960
AAGTGTATCG CGGCCGTTCC GTGGGGCGGC CCAACAGGCC CTTTAAACAT TTGCGTATGC 76020
ACCGGCCCAG CCAGTCGGAC ACCGGAACCC ACCAGAGGCG GAAGCCGCCT TCGCCCGTGA 76080
GGGTGCGTGT GTTTTCTGGT GGCGTGTTTT TCCTTTCCGC CCTCCTCCCT CCCCACCTCC 76140
ACCACCCCCC CCCCACAACT CGCCCGTTGG CGATCGGCGG GAAAACCATG AAAACCAAGC 76200
CACTCCCGAC AGCCCCGATG GCGTGGGCCG AGAGTGCCGT GGAAACCACC ACCAGCCCGC 76260
GCGAGCTCGC GGGCCACGCC CCGCTCCGGC GCGTCCTGCG CCCGCCCATC GCTCGCCGCG 76320
ACGGCCCGGT GCTTTTGGGG GACAGGGCCC CCAGGAGGAC GGCCAGTACG ATGTGGCTGC 76380
TGGGGATCGA CCCCGCGGAG TCGTCTCCGG GAACGCGCGC TACCCGAGAC GATACCGAGC 76440
AGGCCGTGGA CAAGATCCTC AGGGGAGCCC GGCGCGCGGG AGGGCTGACC GTCCCCGGCG 76500
CCCCCCGCTA TCACCTGACC CGCCAGGTAA CCCTGACGGA TCTCTGCCAA CCAAACGCGG 76560
AGCGGGCCGG GGCGCTCCTT TTGGCCCTGC GGCACCCCAC CGACCTCCCC CACCTGGCCC 76620
GCCATCGGGC TCCGCCCGGC CGGCAGACCG AGCGACTGGC CGAGGCCTGG GGCCAGCTCC 76680
TGGAGGCCTC CGCCCTGGGG TCCGGGCGGG CCGAGAGCGG CTGCGCGCGC GCGGGCCTTG 76740
TGTCGTTTAA CTTTCTGGTG GCCGCGTGCG CCGCCGCCTA CGATGCGCGC GACGCCGCCG 76800
AGGCGGTCCG GGCCCACATC ACGACCAACT ACGGCGGGAC GCGGGCCGGG GCGCGGCTGG 76860
ACCGGTTTTC CGAATGCCTG CGCGCCATGG TCCACACGCA CGTGTTTCCC CACGAGGTCA 76920
TGCGGTTTTT CGGGGGGCTA GTGTCGTGGG TCACACAGGA CGAGCTGGCT AGCGTCACCG 76980
CCGTCTGCAG CGGACCCCAG GAGGCCACAC ACACCGGCCA CCCGGGCAGG CCCTGTTCGG 77040
CCGTTACCAT CCCGGCCTGC GCCTTCGTGG ACCTGGACGC CGAGCTGTGC CTGGGGGGCC 77100
CTGGGGCGGC GTTCCTGTAC TTGGTCTTCA CCTACCGACA GTGCCGGGAC CAGGAGCTCT 77160
GTTGCGTGTA CGTGGTCAAG AGCCAGCTCC CCCCGCGCGG ACTGGAGGCG GCCCTCGAGC 77220
GGCTGTTCGG GCGCCTCCGG ATAACCAACA CGATTCACGG GGCCGAGGAC ATGACGCCCC 77280
CTCCCCCGAA CCGAAACGTT GACTTTCCGC TCGCCGTCCT GGCCGCGAGC TCGCAATCCC 77340
CGCGGTGCTC GGCGAGCCAA GTCACGAACC CCCAGTTTGT CGACAGGCTG TACCGCTGGC 77400
AGCCGGATCT GCGGGGGCGC CCTACCGCAC GCACCTGCAC ATACGCCGCC TTCGCAGAGC 77460
TGGGTGTCAT GCCAGACAAC AGCCCCCGCT GTCTGCACCG CACCGAGCGG TTTGGGGCGG 77520
TCGGCGTTCC GGTTGTCATC CTGGAGGGCG TGGTGTGGCG CCCCGGCGGG TGGCGGGCCT 77580
GCGCGTGATC GTCTATTGAC GACGGCCGCC CAACCCGAGC GACCTTCCCC TCCCACTTTC 77640
CCCCCCCCCC CTCCTACACA CCAACTCCGC CCTCGCCGTC TTGGCCGTGC GCGGCCCCGT 77700
GCGTCCGTCT CAATAAAGCC AGGTTAAATC CGTGACGTGG TGTGTTTGGC GTGTGTCTCT 77760
GAAATGGCGG AAACCGACAT GCAAATGGGA TTCATGGACA CGTTACACCC CCCTGACTCA 77820
GGAGATAGGC ATATCCTCCT TAGATTGACT CAGCACACGA TCGCACCCCA CCCCTGTGTG 77880
CCGGGGATAA AAGCCAACGC GGGCGGTCTG GGTTACCACA ACAGGTGGGT GCTTCGGGGA 77940
CTTGACGGTC GCCACTCTCC TGCGAGCCCT CACGTCTTCG CCCACCGATT CCTGTTGCGT 78000
TCCTGTCGGC CGGTGCTGTC CTGTCGACAG ATTGTTGGCG ACTGCCCGGG TGATTCGTCG 78060
GCCGGTGCGT CCTTTCGGTC GTACCGCCCA CCCCGCCTCC CACGGGCCCG CCGCTGTTTC 78120
CGTTCATCGC GTCCGAGCCA CCGTCACCTT GGTTCCAATG GCCAACCGCC CTGCCGCATC 78180
CGCCCTCGCC GGAGCGCGGT CTCCGTCCGA ACGACAGGAA CCCCGGGAGC CCGAGGTCGC 78240
CCCCCCTGGC GGCGACCACG TGTTTTGCAG GAAAGTCAGC GGCGTGATGG TGCTTTCCAG 78300
CGATCCCCCC GGCCCCGCGG CCTACCGCAT TAGCGACAGC AGCTTTGTTC AATGCGGCTC 78360
CAACTGCAGT ATGATAATCG ACGGAGACGT GGCGCGCGGT CATTTGCGTG ACCTCGAGGG 78420
CGCTACGTCC ACCGGCGCCT TCGTCGCGAT CTCAAACGTC GCAGCCGGCG GGGATGGCCG 78480
AACCGCCGTC GTGGCGCTCG GCGGAACCTC GGGCCCGTCC GCGACTACAT CCGTGGGGAC 78540
CCAGACGTCC GGGGAGTTCC TCCACGGGAA CCCAAGGACC CCCGAACCCC AAGGACCCCA 78600
GGCTGTCCCC CCGCCCCCTC CTCCCCCCTT TCCATGGGGC CACGAGTGCT GCGCCCGTCG 78660
CGATGCCAGG GGCGGCGCCG AGAAGGACGT CGGGGCCGCG GAGTCATGGT CAGACGGCCC 78720
GTCGTCCGAC TCCGAAACGG AGGACTCGGA CTCCTCGGAC GAGGATACGG GCTCGGGTTC 78780
GGAGACGCTG TCTCGATCCT CTTCGATCTG GGCCGCAGGG GCGACTGACG ACGATGACAG 78840
CGACTCCGAC TCGCGGTCGG ACGACTCCGT GCAGCCCGAC GTTGTCGTTC GTCGCAGATG 78900
GAGCGACGGC CCTGCCCCCG TGGCCTTTCC CAAGCCCCGG CGCCCCGGCG ACTCCCCCGG 78960
AAACCCCGGC CTGGGCGCCG GCACCGGGCC GGGCTCCGCG ACGGACCCGC GCGCGTCGGC 79020
CGACTCCGAT TCCGCGGCCC ACGCCGCCGC ACCCCAGGCG GACGTGGCGC CGGTTCTGGA 79080
CAGCCAGCCC ACTGTGGGAA CGGACCCCGG CTACCCAGTC CCCCTAGAAC TCACGCCCGA 79140
GAACGCGGAG GCGGTGGCGC GGTTTCTGGG GGACGCCGTC GACCGCGAGC CCGCGCTCAT 79200
GCTGGAGTAC TTCTGTCGGT GCGCCCGCGA GGAGAGCAAG CGCGTGCCCC CACGAACCTT 79260
CGGCAGCGCC CCCCGCCTCA CGGAGGACGA CTTTGGGCTC CTGAACTACG CGCTCGCTGA 79320
GATGCGACGC CTGTGCCTGG ACCTTCCCCC GGTCCCCCCC AACGCATACA CGCCCTATCA 79380
TCTGAGGGAG TATGCGACGC GGCTGGTTAA CGGGTTCAAA CCCCTGGTGC GGCGGTCCGC 79440
CCGCCTGTAT CGCATCCTGG GGATTCTGGT TCACCTGCGC ATCCGTACCC GGGAGGCCTC 79500
CTTTGAGGAA TGGATGCGCT CCAAGGAGGT GGACCTGGAC TTCGGGCTGA CGGAAAGGCT 79560
TCGCGAACAC GAGGCCCAGC TAATGATCCT GGCCCAGGCC CTGAACCCCT ACGACTGTCT 79620
GATCCACAGC ACCCCGAACA CGCTCGTCGA GCGGGGGCTG CAGTCGGCGC TGAAGTACGA 79680
AGAGTTTTAC CTCAAGCGCT TCGGCGGGCA CTACATGGAG TCCGTCTTCC AGATGTACAC 79740
CCGCATCGCC GGGTTCCTGG CGTGCCGGGC GACCCGCGGC ATGCGCCACA TCGCCCTGGG 79800
GCGACAGGGG TCGTGGTGGG AAATGTTCAA GTTCTTTTTC CACCGCCTCT ACGACCACCA 79860
GATCGTGCCG TCCACCCCCG CCATGCTGAA CCTCGGAACC CGCAACTACT ACACGTCCAG 79920
CTGCTACCTG GTAAACCCCC AGGCCACCAC TAACCAGGCC ACCCTCCGGG CCATCACCGG 79980
CAACGTGAGC GCCATCCTCG CCCGCAACGG GGGCATCGGG CTGTGCATGC AGGCGTTCAA 80040
CGACGCCAGC CCCGGCACCG CCAGCATCAT GCCGGCCCTG AAGGTCCTGG ACTCCCTGGT 80100
GGCGGCGCAC AACAAACAGA GCACGCGCCC CACCGGGGCG TGCGTGTACC TGGAACCCTG 80160
GCACAGCGAC GTTCGGGCCG TGCTCAGAAT GAAGGGCGTC CTCGCCGGCG AGGAGGCCCA 80220
GCGCTGCGAC AACATCTTCA GCGCCCTCTG GATGCCGGAC CTGTTCTTCA AGCGCCTGAT 80280
CCGCCACCTC GACGGCGAGA AAAACGTCAC CTGGTCCCTG TTCGACCGGG ACACCAGCAT 80340
GTCGCTCGCC GACTTTCACG GCGAGGAGTT CGAGAAGCTG TACGAGCACC TCGAGGCCAT 80400
GGGGTTCGGC GAAACGATCC CCATCCAGGA CCTGGCGTAC GCCATCGTGC GCAGCGCGGC 80460
CACCACCGGA AGCCCCTTCA TCATGTTTAA GGACGCGGTA AACCGCCACT ACATCTACGA 80520
CACGCAAGGG GCGGCCATTG CCGGCTCCAA CCTCTGCACG GAGATCGTCC ACCCGTCCTC 80580
CAAACGCTCC AGCGGGGTCT GCAACCTGGG CAGCGTGAAT CTGGCCCGAT GCGTCTCCCG 80640
GCGGACGTTC GATTTTGGCA TGCTCCGCGA CGCCGTGCAG GCGTGCGTGC TAATGGTTAA 80700
TATCATGATA GACAGCACGC TGCAGCCGAC GCCCCAGTGC GCCCGCGGCC ACGACAACCT 80760
GCGGTCCATG GGCATTGGCA TGCAGGGCCT GCACACGGCG TGCCTGAAGA TGGGCCTGGA 80820
TCTGGAGTCG GCCGAGTTCC GGGACCTGAA CACACACATC GCCGAGGTGA TGCTGCTCGC 80880
GGCCATGAAG ACCAGTAACG CGCTGTGCGT TCGCGGGGCG CGTCCCTTCA GCCACTTTAA 80940
GCGCAGCATG TACCGGGCCG GCCGCTTTCA CTGGGAGCGC TTTTCGAACG CCAGCCCGCG 81000
GTACGAGGGC GAGTGGGAGA TGCTACGCCA GAGCATGATG AAACACGGCC TGCGCAACAG 81060
CCAGTTCATC GCGCTCATGC CCACCGCCGC CTCGGCCCAG ATCTCGGACG TCAGCGAGGG 81120
CTTTGCCCCC CTGTTCACCA ACCTGTTCAG CAAGGTGACC AGGGACGGCG AGACGCTGCG 81180
CCCCAACACG CTCTTGCTGA AGGAACTCGA GCGCACGTTC GGCGGGAAGC GGCTCCTGGA 81240
CGCGATGGAC GGGCTCGAGG CCAAGCAGTG GTCTGTGGCC CAGGCCCTGC CTTGCCTGGA 81300
CCCCGCCCAC CCCCTCCGGC GGTTCAAGAC GGCCTTCGAC TACGACCAGG AACTGCTGAT 81360
CGACCTGTGT GCAGACCGCG CCCCCTATGT TGATCACAGC CAATCCATGA CTCTGTATGT 81420
CACAGAGAAG GCGGACGGGA CGCTCCCCGC CTCCACCCTG GTCCGCCTTC TCGTCCACGC 81480
ATATAAGCGC GGCCTGAAGA CGGGGATGTA CTACTGCAAG GTTCGCAAGG CGACCAACAG 81540
CGGGGTGTTC GCCGGCGACG ACAACATCGT CTGCACAAGC TGCGCGCTGT AAGCAACAGC 81600
GCTCCGATCG GGGTCAGGCG TCGCTCTCGG TCCCGCATAT CGCCATGGAT CCCGCCGTCT 81660
CCCCCGCGAG CACCGACCCC CTAGATACCC ACGCGTCGGG GGCCGGGGCG GCCCCGATTC 81720
CGGTGTGCCC CACCCCCGAG CGGTACTTCT ACACCTCCCA GTGCCCCGAC ATCAACCACC 81780
TTCGCTCCCT CAGCATCCTG AACCGCTGGC TGGAGACCGA GCTCGTGTTC GTGGGGGACG 81840
AGGAGGACGT CTCCAAGCTC TCCGAGGGCG AGCTCGGCTT CTACCGCTTT CTGTTTGCCT 81900
TCCTGTCGGC CGCGGACGAC CTGGTGACGG AAAACCTGGG CGGCCTCTCC GGCCTCTTCG 81960
AACAGAAGGA CATTCTTCAC TACTACGTGG AGCAGGAATG CATCGAGGTC GTCCACTCGC 82020
GCGTCTACAA CATCATCCAG CTGGTGCTCT TTCACAACAA CGACCAGGCG CGCCGCGCCT 82080
ATGTGGCCCG CACCATCAAC CACCCGGCCA TTCGCGTCAA GGTGGACTGG CTGGAGGCGC 82140
GGGTGCGGGA ATGCGACTCG ATCCCGGAGA AGTTCATCCT CATGATCCTC ATCGAGGGCG 82200
TCTTTTTTGC CGCCTCGTTC GCCGCCATCG CGTACCTGCG CACCAACAAC CTCCTGCGGG 82260
TCACCTGCCA GTCGAACGAC CTCATCAGCC GCGACGAGGC CGTGCATACG ACAGCCTCGT 82320
GCTACATCTA CAACAACTAC CTCGGGGGCC ACGCCAAGCC CGAGGCGGCG CGCGTGTACC 82380
GGCTGTTTCG GGAGGCGGTG GATATCGAGA TCGGGTTCAT CCGATCCCAG GCCCCGACGG 82440
ACAGCTCTAT CCTGAGTCCG GGGGCCCTGG CGGCCATCGA GAACTACGTG CGATTCAGCG 82500
CGGATCGCCT GCTGGGCCTG ATCCATATGC AGCCCCTGTA TTCCGCCCCC GCCCCCGACG 82560
CCAGCTTTCC CCTCAGCCTC ATGTCCACCG ACAAACACAC CAACTTCTTC GAGTGCCGCA 82620
GCACCTCGTA CGCCGGGGCC GTCGTCAACG ATCTGTGAGG GTCTGGGCGC CCTTGTAGCG 82680
ATGTCTAACC GAAATAAAGG GGTCGAAACG GACTGTTGGG TCTCCGGTGT GATTATTACG 82740
CAGGGGAGGG GGGTGGCGGC TGGGGAAAGG GAAGGAACGC CCGAAACCAG AGAAAAGGAC 82800
CAAAAGGGAA ACGCGTCCAA CCGATAAATC AAGCGCCGAC CAGAACCCCG AGATGCATAA 82860
TAACAAACGA TTTTATTACT CTTATTATTA ACAGGTCGGG CATCGGGAGG GGATGGGGGC 82920
GCGCGTTTCC TCCGTTCCGG CTACTCGTCC CAGAATTTAG CCAGGACGTC CTTGTAAAAC 82980
GCGGGCGGGG GCGCGTGGGC CCACAGCTGC GCCAGAAACC GGTCGGCGAT GTCCGGGGCG 83040
GTGATATGCC GAGTCACGAT GGAGCGCGCT AAATCTTCGT CGCGGAGGTC CTGATAGATG 83100
GGCAGTCTTT TTAGAAGAGT CCAGGGTCCC CGCTCCTTGG GGCTGATAAG CGATATGACG 83160
TACTTGACGT ATCTGTGCTC CACCAGCTCG GCGATGGTCA TCGGATCGGG CAGCCAGTCC 83220
AGGGCCTCCG GGGCGTCGTG GATGACGTGG CGGCGACGTC CGGCGACATA GCCGCGGTGT 83280
TCCGCGACCC GCTGCGCGTT GGGGACCTGC ACGAGCTCGG GCGGGGTGAG TATCTCCGAG 83340
GAGGACGACC GGGCGCCGTC GCGCGGCCCA CCGGCGACGT CCGGGGGCTG GAGGGGGGGG 83400
TCTTCTTCGT AGTCGTCCTC GCCCGCGATC TGTTGGGCCA GAATTTCGGT CCACGAGATG 83460
CGCGTCTCGA GGCCGACCGG GGCCGCGGTC AGCGTAGGCA TGCTCTCCAG GGAGCGCGAG 83520
TTGGCGCGCT CCCGCCGGGC CGCCCGGCGG GCCTGGGATC GGCTCGGGGC GGTCCAGTGA 83580
CACTCGCGCA GCACGTCCTC GACGGACGCG TAGGTGTTAT TGGGGTGCAG GTCTGTGTGG 83640
CAGCGGACGA ACAGCGCCAG GAACTGCGGG TAACTCATCT TGAAGTACTG CAGCAGGTCG 83700
CGGCAGTGAA TCGTCGGAAT GTAGCCGGTG CTGATGTCCA ACACGATATC GCAGCCCATC 83760
AGCAGGAGAT CGGTATCCGT GGTATGCACG TACGCGACCG TGTTGGTATG ATAGAGGTTC 83820
GCGCAGGCGT CGTCGGCCTC CAGCTGACCC GAGTTGATGT AGGCGTACCC CAGCGCCCGC 83880
AGAACGCGGA TACAGAACAG GTGAGCCAGG CGCAGGGCCG GCTTCGAGGG CGCGCCCCAG 83940
GGGGCCGCCG GGCCTGGGCC GGCGGCCCGC GTTCCCCGGT CCCCCGGGGC GAAGGCGTGC 84000
CCGCGGCGGC GCATGTTGGA AAAAGGCGAA ACTGGGCCTG GAGTCGGTGA TGGGGGAAGG 84060
CGGCGGCGAG GCGTCTACGT CACTGGCCTC CTCGTCCGTG CGGCACTGGG CCGTCGTGCG 84120
GGCCAGGATC GCCTTGGCCC CGAACACAAC CGGCTCGGTA CACTCGACCC CGCGATCGGT 84180
CACGAAGATG GGGAACAGGG ACTTTTGGGT AAACACCCGT AACATACTAC AGAGACAGTG 84240
TAGCGTGATT GCCTCGCGGT CGTAACTTGG GTAGCGGCGC TGATATTTAA CCACCAGGGT 84300
ATACATGACA TTCCACAGGT CCACGGCGAT GGGGGTAAAG TAGCCCTCCG GGGCCCGGAG 84360
GCCCCGGCGC TTCACCAGAT GGTGAGTCTG GGCAAACTTC ATCATGCCAA ACAGACCCAT 84420
TCCGGCACGA TTGTAGGTGC GGATAGGTCT CTCTACAGAG CTGTATAGGT GTGACGGTCC 84480
GGGACACCCA AGCCCGCCGC CCCTGTGTAC AGTGGCTGCG GCGACGACCC CGCTCCAACA 84540
AGACGCTATC CCGGGAAAGG CACGCTCTTT ATAATTCTTT TTTATTTCCC ATCTACGTGC 84600
GGATTGGTGC AACCGCCGGC GCGCGCCGGT GCAGGCCGAC CATCTCTCTC TTCCCCCCCT 84660
CCCCCTCCCC CGAGCCCTCA AAGAGGGTGT GGCCTAACTA GCGGAAGGCG TATTTAACCA 84720
GACTAGGGCG GCGGGTCCGC CGTAGTCCTT GGCTCGGGTA GCCACTGCTC TGTGGCTCGG 84780
GTCCCCCGGC CCCCCTAACC CCCATCCGGT CCGCGTCATC CGCCCCCTCC GCCTGCGACA 84840
CAAACGGCCG CGCCTCCGGG CCCGGTGACA CGACGCGCCT CGTCTCTGCG GATTGTCCCG 84900
GGAGCGTCGC GGCATGGCTC ATCTTCCCGG CGGTGCGGCC GCCGCCCCCC TTTCGGAGGA 84960
CGCGATCCCG TCGCCGCGCG AGCGGACGGA AGACTGGCCG CCCTGCCAGA TAGTGCTGCA 85020
GGGCGCCGAG CTGAACGGGA TCCTGCAGGC CTTTGCGCCG CTTCGCACGA GCCTTTTGGA 85080
CTCGCTCCTG GTCGTGGGCG ACCGAGGCAT CCTTGTACAT AACGCGATTT TCGGCGAGCA 85140
GGTGTTTCTG CCCCTCGACC ATTCGCAGTT CAGTCGCTAT CGATGGGGCG GACCCACCGC 85200
GGCGTTCCTG TCTCTCGTGG ACCAGAAGCG ATCCCTGCTG AGCGTGTTTC GCGCCAACCA 85260
GTACCCTGAC CTGCGGCGGG TGGAGCTGAC GGTCACGGGC CAGGCCCCGT TTCGCACGCT 85320
GGTGCAGCGC ATATGGACGA CCGCGTCCGA CGGAGAGGCC GTGGAGCTTG CCAGCGAGAC 85380
GCTCATGAAA CGCGAGTTGA CGAGCTTCGC GGTACTACTC CCCCAGGGCG ACCCCGACGT 85440
CCAGCTGCGC CTCACGAAGC CCCAGCTCAC GAAGGTGGTG AACGCCGTCG GGGACGAGAC 85500
CGCCAAACCC ACCACGTTCG AGCTCGGCCC CAACGGCAAG TTTTCCGTGT TTAACGCGCG 85560
CACCTGCGTC ACCTTTGCCG CCCGCGAGGA GGGCGCGTCG TCCAGCACCA GCGCCCAGGT 85620
CCAGATTCTG ACCAGCGCGC TGAAGAAGGC GGGCCAGGCG GCCGCCAACG CCAAGACGGT 85680
CTACGGGGAA AACACACACC GCACATTCTC GGTGGTCGTC GACGACTGCA GCATGCGGGC 85740
GGTCCTCCGG CGGCTCCAGG TCGGCGGGGG GACCCTCAAG TTCTTCCTCA CGGCCGACGT 85800
CCCCAGCGTG TGTGTCACCG CCACCGGCCC CAACGCGGTG TCGGCGGTGT TTCTTTTAAA 85860
ACCCCAGCGG GTCTGCCTGA ACTGGCTCGG CCGGACCCCG GGTTCCTCGA CCGGGAGCTT 85920
GGCGTCCCAG GACTCTCGGG CCGGCCCGAC CGACAGCCAG GACTTCTCCT CCGAGCCGGA 85980
CGCGGGCGAC CGCGGCGCCC CAGAAGAAGA AGGCCTCGAG GGCCAGGCCC GGGTCCCGCC 86040
CGCGTTCCCG GAACCGCCGG GAACCAAGCG GAGGCACGCC GGGGCCGAAG TTGTCCCCGC 86100
GGACGACGCC ACCAAGCGCC CGAAGACGGG CGTGCCCGCC GCCCCCACGC GAGCCGAGTC 86160
GCCCCCCCTC TCCGCGAGAT ACGGACCCGA GGCGGCGGAG GGTGGTGGGG ACGGCGGCCG 86220
CTACGCGTGC TACTTTCGCG ACCTCCAGAC CGGCGACGCG AGCCCCAGCC CCCTCTCCGC 86280
CTTCCGGGGT CCCCAAAGAC CCCCATACGG CTTTGGGTTG CCCTGACGGC GACGGGTGGT 86340
GGCCGAACGC TTCACCGCGC CCGGGCACGC GGGGTGCGTT GTGTTAAAAA AATAAATAAA 86400
TGGGGTAGTG TGTCCCCCCC CTCCAACCAA TATGGCTGTC GTGTGTGGTT CCGGGTTGCG 86460
CCTCCGTCCT TTCCACCCCC CTTCCCCCTC CTTTTTTGTT TTGCGTGCGC TTATAAGAGC 86520
GGGCCCGGGG CCCTTCGCAG CTTCACCGAG AGCGCCGTCG GGCCCCGGGT GCGGGATGTG 86580
TCGCGGGGAC AGCCCCGGGG TCGCGGGCGG GAGCGGCGAA CACTGCCTCG GAGGGGATGA 86640
TGGGGACGAC GGGCGCCCCC GCCTCGCCTG CGTGGGTGCC ATCGCTCGGG GGTTCGCGCA 86700
TCTCTGGCTC CAGGCCACCA CGCTGGGCTT CGTGGGGTCT GTCGTTCTGT CGCGCGGCCC 86760
GTATGCGGAC GCCATGTCGG GGGCGTTCGT GATCGGGAGC ACCGGCCTGG GGTTCCTCCG 86820
CGCCCCCCCC GCGTTCGCCC GGCCGCCGAC GCGTGTGTGC GCGTGGCTGA GGCTGGTCGG 86880
CGGGGGAGCG GCCGTGGCCC TGTGGAGCCT CGGGGAGGCC GGCGCGCCTC CGGGGGTTCC 86940
GGGCCCGGCG ACCCAGTGCC TGGCGCTCGG GGCCGCCTAC GCGGCGCTGC TGGTGCTGGC 87000
CGACGACGTC CATCCCCTTT TCCTCCTCGC CCCGCGGCCC CTGTTTGTCG GCACCCTGGG 87060
GGTTGTCGTC GGCGGGCTGA CGATAGGCGG CAGTGCGCGC TACTGGTGGA TCGACCCCCG 87120
CGCCGCCGCG GCCCTGACGG CGGCGGTGGT GGCGGGCCTC GGGACAACCG CCGCCGGGGA 87180
CAGCTTTTCC AAGGCCTGTC CCCGCCACCG CCGCTTTTGC GTCGTCTCCG CGGTCGAGTC 87240
TCCCCCGCCC CGATACGCCC CGGAGGACGC CGAGCGGCCA ACAGACCACG GACCCCTGTT 87300
ACCGTCGACG CACCACCAGC GATCTCCGCG GGTCTGCGGC GACGGGGCCG CACGGCCCGA 87360
AAACATCTGG GTTCCCGTGG TGACCTTTGC GGGCGCGCTC GCGCTGGCCG CCTGCGCCGC 87420
GCGAGGGTCT GACGCGGCTC CGTCAGGCCC GGTCCTGCCG CTGTGGCCCC AGGTGTTTGT 87480
CGGGGGCCAC GCGGCGGCGG GCCTGACGGA GCTGTGTCAG ACCCTCGCGC CCCGGGACCT 87540
CACGGACCCG CTGCTGTTTG CGTACGTCGG ATTCCAGGTC GTGAACCACG GGCTGATGTT 87600
TGTGGTCCCC GACATCGCCG TATACGCGAT GCTGGGGGGC GCCGTGTGGA TCTCGCTGAC 87660
GCAGGTGCTT GGGCTCCGGC GCCGCCTTCA CAAGGACCCA GACGCCGGGC CCTGGGCGGC 87720
CGCGACCCTG CGGGGCCTCT TTTTCTCCGT CTACGCATTG GGGTTTGCGG CGGGGGTGCT 87780
GGTGCGGCCG CGGATGGCGG CGAGCCGGCG GTCGGGGTGA TCGCCATTTC AAATAAAAGG 87840
CACGAGTTCC CCGAATACCA CCGGCGTGTG ATGATTTCGC CCTACCGCTC CGATCCCCGG 87900
GGGGAGGGGG GAAGGAAATG GGGGCGGGGG TGCCGTGGAC GGGTATAAAG GCCAGGGGGG 87960
CAGGCGGGCC CATCACTGTT AGGGTGTTAG GTTGGGAGGT GGCACAAAAA GCGACACACC 88020
CGTGTTGTAG TTGTCCGCGG GAGGCGGTGG TTTCCGGCAA CCCTCCTCGC TGCGCCGGGC 88080
GCGCCCACCG GTCCTTCGCG GGGGCCGGGG CTCTTCTGGT CATGGCCCTT GGACGGGTGG 88140
GCCTAGCCGT GGGCCTGTGG GGCCTGCTGT GGGTGGGTGT GGTCGTGGTG CTGGCCAATG 88200
CCTCCCCCGG ACGCACGATA ACGGTGGGCC CGCGGGGGAA CGCGAGCAAT GCCGCCCCCT 88260
CCGCGTCCCC GCGGAACGCA TCCGCCCCCC GAACCACACC CACGCCCCCC CAACCCCGCA 88320
AGGCGACGAA AAGTAAGGCC TCCACCGCCA AACCGGCCCC GCCCCCCAAG ACCGGGCCCC 88380
CGAAGACATC CTCGGAGCCC GTGCGATGCA ACCGCCACGA CCCGCTGGCC CGGTACGGCT 88440
CGCGGGTGCA AATCCGATGC CGGTTTCCCA ACTCCACCCG CACGGAGTCC CGCCTCCAGA 88500
TCTGGCGTTA TGCCACGGCG ACGGACGCCG AGATCGGAAC GGCGCCTAGC TTAGAGGAGG 88560
TGATGGTAAA CGTGTCGGCC CCGCCCGGGG GCCAACTGGT GTATGACAGC GCCCCCAACC 88620
GAACGGACCC GCACGTGATC TGGGCGGAGG GCGCCGGCCC GGGCGCCAGC CCGCGGCTGT 88680
ACTCGGTCGT CGGGCCGCTG GGTCGGCAGC GGCTCATCAT CGAAGAGCTG ACCCTGGAGA 88740
CCCAGGGCAT GTACTACTGG GTGTGGGGCC GGACGGACCG CCCGTCCGCG TACGGGACCT 88800
GGGTGCGCGT TCGCGTGTTC CGCCCTCCGT CGCTGACCAT CCACCCCCAC GCGGTGCTGG 88860
AGGGCCAGCC GTTTAAGGCG ACGTGCACGG CCGCCACCTA CTACCCGGGC AACCGCGCGG 88920
AGTTCGTCTG GTTCGAGGAC GGTCGCCGGG TATTCGATCC GGCCCAGATA CACACGCAGA 88980
CGCAGGAGAA CCCCGACGGC TTTTCCACCG TCTCCACCGT GACCTCCGCG GCCGTCGGCG 89040
GCCAGGGCCC CCCGCGCACC TTCACCTGCC AGCTGACGTG GCACCGCGAC TCCGTGTCGT 89100
TCTCTCGGCG CAACGCCAGC GGCACGGCAT CGGTGCTGCC GCGGCCAACC ATTACCATGG 89160
AGTTTACGGG CGACCATGCG GTCTGCACGG CCGGCTGTGT GCCCGAGGGG GTGACGTTTG 89220
CCTGGTTCCT GGGGGACGAC TCCTCGCCGG CGGAGAAGGT GGCCGTCGCG TCCCAGACAT 89280
CGTGCGGGCG CCCCGGCACC GCCACGATCC GCTCCACCCT GCCGGTCTCG TACGAGCAGA 89340
CCGAGTACAT CTGCCGGCTG GCGGGATACC CGGACGGAAT TCCGGTCCTA GAGCACCACG 89400
GCAGCCACCA GCCCCCGCCG CGGGACCCCA CCGAGCGGCA GGTGATCCGG GCGGTGGAGG 89460
GGGCGGGGAT CGGAGTGGCT GTCCTTGTCG CGGTGGTTCT GGCCGGGACC GCGGTAGTGT 89520
ACCTCACCCA CGCCTCCTCG GTGCGCTATC GTCGGCTGCG GTAACTCCGG GGCCGGGCCC 89580
GGCCGCCGGT TGTCTTCTTT TCCACCCCTT CCGTCCCCCG TACCCACCAC ACCCCACCCC 89640
ACCCCCCCGC CGTCCCCCGG GCGTTATAAG CCGCCGCACT CGCTTTTCCC ACCGGAAAAT 89700
CCTCGGCCCG ATCCGAACGG CGCACGCCGC GTGGGCTCCA AACGCCTCCG GAAGAGAGCG 89760
CCCCGCCCCG ATATTCAAGC CCGCGGTGGT GCTATGGCTT TCCGTGCTTC GGGACCCGCC 89820
TACCAGCCCC TCGCCCCCGC GGCCTCCCCG GCGCGGGCTC GTGTTCCGGC CGTGGCCTGG 89880
ATCGGCGTCG GAGCGATCGT CGGGGCCTTT GCGCTCGTCG CCGCGTTGGT TCTCGTACCC 89940
CCTCGGTCCT CGTGGGGACT CTCGCCGTGC GACAGCGGCT GGCAGGAATT CAACGCGGGA 90000
TGCGTCGCGT GGGACCCCAC CCCCGTCGAG CACGAGCAGG CGGTCGGCGG CTGCAGCGCG 90060
CCGGCCACCC TTATCCCCCG TGCGGCCGCC AAGCACCTGG CCGCTCTGAC ACGCGTCCAG 90120
GCGGAGAGAT CGTCGGGTTA CTGGTGGGTG AACGGAGACG GCATCCGGAC CTGTCTGAGA 90180
CTCGTCGACA GCGTCAGTGG CATCGACGAG TTTTGCGAGG AGCTCGCGAT CCGCATATGC 90240
TACTACCCAC GAAGCCCCGG CGGGTTTGTC CGCTTCGTAA CTTCGATACG TAACGCCCTG 90300
GGGTTGCCGT GAGGCGCGCG TCCGACGGTC CCGCTTCTCG CCTCTCTTCT TCCCCCTCCC 90360
CACCCCACCC ACCGACCAAC GACGGCGTTT GGCCAATACC CTCCTTTTTT CTTTTTCTCT 90420
TCCCCCCCCA AAAAAAAAAA CAATAAACAG CTAATTGCGT ACGACAAACC ATGCGGAACT 90480
CGCTGTTTTT TTTTCTCTGT TTGTTACTTT TTATTGAAAA CAGACATACG GGGAAAGGGG 90540
CCGGAAACCG AGACGGTGGG GCCGGCGGTC GCATTTTTTT AATGGCTCTG GTGTCGGCCG 90600
CGTTTGAGCT TCGTCAACAG GGCGCTGAGG GCGGCGACGT TTGTCGGGCC GTCGTTGGCC 90660
AGCGCGTTGG TCCGGGGGCG GGCGGGCATG GGCGACAGGC TTAGTCCCGG GTCCGGGGCG 90720
CGTGTGGCCC CCGGAGGGGA GAAGAGGGCA GACCCGCCCC AGTCGTACAG GGGATTTTCC 90780
GCCTCGATGT ACGGGGAGTC CGGGGCGTCT CCCGGCGGGG CCGCCCCGCC GGCGTCTTGC 90840
CGGCGAAGGC AGATGTTTTC GTATACCCGA ACCCAGGGGA TCTCCTCGTA GACGCGCCCC 90900
CCATCCTCGC TCACCGACTC GTAAATGGAA TCTGCGTCCT CGGAGGGGGC GCGGGGGGCG 90960
TGGCTTTCGG CCGGCCAGGC GGCGGCGGTG GTGTCGGCGG CGGGGGTGGC GCCAAGCCCG 91020
ACGCCCGCGG GCATGGCGGC GTCATCGTCG GGCAGCAGAT ACGTGTTTTC CATCTGGTCC 91080
GGTTCGGCCT CCGCGTCTGG CCCCCAGGTC CGCACCGCGT CGTAAACCCC GGCGGCCTCG 91140
CGCTGAGCCG CGAGCGGGCG CGCCGCGGCT GCCGGCCGCT GCTCGGGGGG CGCGGGGTTG 91200
CGGGGCGGGA GGCGCGGGGG CGCCCCGGCC ATATGCGTGT AATACGTGGC CGGCCGGCCG 91260
GCGCAGGGCT CGGGACCCCG GTCGGCCGCG TCGACGTGCG GGGGCTCGGG GAGGTCCTCG 91320
CGGTGGCGCC TGAACCTCCG AGGGGCCGCG GGGGTCGAGT GGGGGCGAGC CCGGGGGAGC 91380
GGCGGGGGTG CGTTATCGCG CCGGGTCCGT TGTATCTTGT CCCGGCAGCT CCCGCCGACC 91440
GCGCCGCGGC CCCCCGGTGG GCCGGACGCC GCGAGGCGCA GGATGGACTC GTAGTGGGGC 91500
GACGGGGTTC CGCTCCGAAG CAGGTCCGGG GCCAGGGCGG CCCCGAACCA GGACTTGATG 91560
CTGAGTTCCA TCCGGGCCCA GCTCGGGGCG GTCATCGTGG GGAACAGGGG GGCGGCGGTC 91620
CTGCAGAAGC GCTCCTGGCT GTCCACCGCC GCCCGTAGGT ACTCGTTGTT CAGGCTGTCG 91680
GAGGCCCAGA CGACATACCC GGTAAGCGTC GCGTTAATTA TATACTGGGC GTGGTGGTGG 91740
ACTATGGATA GAACCTCGAC GGTCGAGACG ATGGCGTCCA CGATCCCGTA CGTGCCGCCG 91800
CTGCGCTTGC CGGTCTCCCA CAGGTGGGCC AGGCGCGTCA GGTGGCCCAG GACGTCGCTG 91860
ACCGCCGCCC GCAGGGCCAT GCACTGCATC GAGCCCGTGG TGCCGCTGGG CCCGCGGTCC 91920
AGGTGGCGCG CAAACGTCTC CGCGGGCGCC TCCAGACTCC CGCTGAGCGC CACGAACCGG 91980
CGATCGGCGG GGCCCAGGCG GCGACACACG TACTTGTCCG CCGTCCACAG CATCCACGAG 92040
GCCCAATGGT ACAACACGGA GACGTAGGCC AGGAACTCGC TCAGCCGCAG TGCGGTGTCC 92100
GTGCTCGGCC GGCTCGGGTC TGCGGGGCGC ATAAAGAACA TGTACTGCTG GAGCCTGTGG 92160
GCCGCGTCGC GCAACCCCGC CACCGCGGCG GCGTACTTGG CCGCGGCGGC CCCGCTCTTG 92220
AACGGGGCGC GCACCAGCAG CTTCGGGAGC AGGGTGGGCC GCAGCAGCAC GTGCAGGCTG 92280
GGGTCGCAGT CGCCCGCCGG GTCGTCGGGG ATGTCCAGGC CGCTGGGCAC GACCGTCTGG 92340
AGGTACTTCC AGTACTGCGC TAGGATGGCG CGGCTCAGCT GGCCGCCCGA CAGCTCCACC 92400
TCGCCGAGCG CCTGCTTGGC GGCCGACGCG TAGTGCCGGA TGTAGTCGTA GTGCGGGTCG 92460
CTGGCGAGCC CGTCTACGAT CAGGCTCTCG GGGACGGTGT TATGGTGCCG CGCCGCCAGC 92520
CGGACGCTGC GATCGGCGCC GGTCAGAAAC GCCGGCTGCA GGTCGTCGGC GCGCTGCCGC 92580
AGGACGCCCA CGGCCGCGCT GAGGAGCCCC TCCGGGGTGG GGAGCAGACA CCCGGCGAAG 92640
ATGCGCCGCT CGGGGACGCC CGCGTTGGCG CCGCGGATGA GGTTGGCCGG CGTCAGGCAC 92700
CGCGCCAGCC GCAGGGAGCT CGCGCCGCGC GCCCGGCGTT GCATGGCGGA GACCGTTCGG 92760
TCGGGGGCCC GCCGGTCGGA GGTATGCCGC GTCCCGGGAT ATAGGGTTGC TTTTTATGGG 92820
GAGGCGCCTA TGGGCGTGGC GGGCCGCCCA GCCCGGTCGC GCGCCTCCCG GACACGTGCG 92880
CCCGGAGGGC GGCGGTCTCC TCGTCGCCCA TGAGCAGTTT CCGAAACTGC GCCATGATGT 92940
CCACGACGCG GACCCGCGGC CCCAGCACGG ACTCGCTATT CAGGGGGGCG GGGGGGAAGG 93000
CCGCCAGGTC TTCGAGCAGG AAGGCGGGGT CTGCCGTCCC GCTCACGGGC GCCCGGGGCG 93060
CCGAGGACGC GGGGCGAAGG TCCACGTGTT CCGCGGCGGC GCGCACGTCC GCCCAAAATT 93120
TGGCGGGGGT GGTCCGCGCG TACAGGGGCT GGGTCGCGCG GAGGACGCAC GCGTAGCGCA 93180
GGGGGGTGTA CGTGCCCACC TCGGGGGCCG TCGACCCGCC GTCAAACGCG GCCAGGGCCA 93240
CGCACGCGAC CACCGTGTCG GCCAGGCCCA GCAGCCGCTG CAGGATGAGC CCCGTCGCCA 93300
GCACGGCGCG CGCGGCCGCC GCGTGGTCCC TGCGCCGGCG CGCGTCCCCG CAGGCCAGGG 93360
CGTATTTCAG GGTAACGGTC GCCAGGGCCG TGTGCAGCGC GTACACGGCC GCGCCCAGCA 93420
CGGCGTTCAG CCCGCTGGTG GCGAGCAGGC GGCGCGCCGC GGTGTCGCCC AGCGCCTCGT 93480
GCTCGGCCGC CACGACCCCG GGGCTACCCA GGGGCAGGGC GCGAAACAGC GCCTCCTGCT 93540
CCACGTCCAC AAACGCGGGG TGGGCGGAGT GCGGGTGCAG GCGCGCCCCC ACGACCACCG 93600
AGAGCCACTG GACCGTCTGC TCCGCCAGGA CCGCCAGCAC GTCCAGGACG CGCCCCGCAA 93660
ACGCGGCCTC CCGCGGGAGC ACGCATTTGA CGGCGCCGGG GTTGAAGCGG GCGAGCAGAG 93720
CCCCGGTGGC GATGTACGTC ATGCGCCCCG CGTAGCGGGC GGCCACGCGA CAGTCGCGCC 93780
CCAGGAGCGC GCGCACCCCG GGCCAGTACA GCAGGGACCC CAGCGAACTG CGAAAGACCG 93840
CGGCGTCGGG GCCGGGGTGG GGGGGCGCGG CCCCTCCCGC GCTGAGCAGC GGCACGGCGG 93900
CGGCCCCCAC GGGCCGCAAC GCCGTGAGGC TCGCGAACTG CCGTCGGAGC TCGGCCGCCC 93960
TGTCGTCGAG CTCCGAGCCG CGCGCCCTCC GTGTGCAGGC GCGTCCCGCA GACCCACCCG 94020
TTGATCGCCA CCCGCACGAT GGCGTCCACC AGAAAGCCCA TCGCGCGGGA GGGGCTGGTT 94080
TTTGCCCGCC GATCCGTCAG GTCGAGGATC GCGTCGCCCG TGACGTACCA GGCCAGCGCC 94140
TCGCCCTGCT GCAGCGTCTG GCGGAAAAAC ACCTTTGGGT CGGCCGGGGA GGCAAAGTGC 94200
ATGACCCCCA CGCGCGACAG CCCGAACGCG CTATCCGGAC ACGGGTAGAA CCCGGCCGGA 94260
TGTCCCAGGG CCAGGGCCGA GCGCACGGAC TCGTCCCACG CGGCGACTCG GGGGGTCAGG 94320
CGGTCCAGGG GGAATGCCGC CTGCAGCTCC GGGCCCGACA CGCGGCCCGC GAGAATCTCG 94380
ACCGTCGCGG GAGGCCGCGC CCCGGCGCCG TCATCGTGCG CGACGGCGGC GGGGTAGTCG 94440
TCCTCCTCGT AGCTGAGCTC GTCCAGGAAC AGCGGCGAGG GCACCACCCG CGAACCGCCC 94500
ACCCGCCCCA AAACGTCGCG TGGGTCCATC GGGCCCAGGT AGCCTCCCCG CGGGGCCCGC 94560
GTGATGGCGC TGTCCCGGCG TCCGCGAACG GACTGGCTCC TGGCCGTAAC GGACCTGGGG 94620
CGCGGAAAGG ACGCCCGGCG GGGGGGCGCC GCCGCCCGGG CCTCGGACGC GCGTCGGGAC 94680
CCGGGGTGAC CGCGGGCCTC CCGGCGACGG CGCGGGGGCG GCTCTTCGCT CGCCATCTCC 94740
CCCGCGGCCT CGACCTCGCT GTCGTCGTCC ACGTTAAACA CCGCCCGCAG GTACCCCATT 94800
AACCCGACTC CACCGCCCTC GGGCTCGTCC TCCACGGGCG AGTCGGCGCG ATGCGCGGAC 94860
GGGGCATGGG ACCGGGTGGA GGCGCGCCTC CGGCGTACGG CATGCCCGCG CACGGACATG 94920
GTGGCCGGAG GCCCGATTTT TTACACACGC CCTCCCCGCA GACGGACGAG GAAAGGGGTG 94980
GTGCGAGGGG GGAGGCCCAA ACGGGGAGGT GGGGGGTAGG GGGCGGTCCC AGGGAGCGGG 95040
GGGTAGGAAC CGGCACGACG GGAACAGAGA AACGCGACCG CTCCAACAAG GGTGGGGGGT 95100
GGGCCTCGTC CCCACGCAGA CCCGCGGGCA AATGCGAGAA CGGGACCCGC GCGCCTGCCT 95160
TTATACGCGG ACCCCAGCAC CACGAGCCGT TCTGTGACGC GAATCTACAC GACCGCGGGC 95220
TCGTAGGCGC GACTAACGCC CAACCCAACG GCACACACCC CCCACCCCGC GCGTAACCCC 95280
ATTTCTTTCA TGGTCCCGTA ATAAACAGCC AACGCACGCC GCGTATGATG AGTTGCTTGC 95340
CAATGTTTAT TGCTGTGGTT GCGAACCCTC TATCGCGATA CAGACGGAAG TGAGGCGGGG 95400
CGGTGGTGGG GGGGGGGGGC GCGCCGCCCG GTCGCACATC CTACCCCCCA AAGTCGTCAA 95460
TGCCCATGGC ATCGGTAAAC ATCTGTTCAA ACTCAAAATC GTCCACGTCC AAAGCCCCAT 95520
ACGAGACGGG GTCGTGGGTC ATTCCCGGGG AGGGGGACTC CACGTCCCCC AGCATCTCCA 95580
AGTCGAAGTC GTCCAGGGCG TCGGCGGGCG TCATATCCAC CTCCTCGCCG TCCAGGCGGA 95640
GTTCGTCTCC CAGGCTGACG TCGGTAATGG GGGCGGTGGT GGACAGTCTG CGGGGGCGTT 95700
GTCCCGCGGA GAGAAACGAC ATGCGCGGCG CCACCAGCCC GGCCTCCGCG GGAGCGTCAT 95760
CGTCGTCCGG GAGGTCGAGC AGGCCCTCGA TTGTCGATCC GTAATTATTT CTGGTCCGCC 95820
CGCGGCTATA CGCGTGCTCC CGCATGACGG ACTCGCCCTC CGAGGTCGCA ACGCTGGAGT 95880
ACGAGTCCAA CTTGGCCCGG ATCAGCAGCA TAAAGTACCC AGAGGAGCGG GCCTGGTTGC 95940
CCTGCAGGAC GGGCGGGGTC GTGAGGGGCG CCCCGGGTTC CTCCGCCGCC GCACTTCGCA 96000
CCAGCGGGAG GTTCAGGTGC TCGCGAATGT GGTTTAGCTC CCGCAGTCGC CGGGCCTCCA 96060
CGGGAACTCC CCGCACGGTG AGCGATCCGT TGATAAACAT CAGGGGCTGA AACAGACACG 96120
CCAACTGGCG CCAGCTCTCC AGGTCGCAGC AGAGGCCGTC GAACAGATCG GGCCGCATCA 96180
TCTGCTCGGC GTACGCGGCC CATAGGATCT CGCGGCTCAG AAAGAGGTAT AGATGCAGAA 96240
ACAGGACGCG CGCCAGGCGC GCGGTCTCGC GGTAGTACCT GTCCGCGATC GTGGTGCGCA 96300
GCATCTCCCG CAGGTCGCGG TTGCGGCCCC GCATGTGTGC CTGGCGGTGT AGCTGCCGAA 96360
CGCTGGCGCG CAGGTACCGG TACAGGGCCG AGCAAAAATT TGCCAACACG GTCCGGTAGC 96420
TCTCCTCCCG CGCCCGCAGC TCACCGCGGA AAAACTGCGC CATGGCCTCG TAGTACGAAG 96480
GCAGCTCGTC GCGGGTGGCG GGCAGGGTGG GGAACGCCAC GTCGCCGTGG GCGCGAATGT 96540
CGATCGGGGA GCGCTCGGGG ACGTGCGCAT CCCCCCAGTC GATCACGTCG CTGGGCAGCG 96600
TCGACAGAAA CTTGCACTCC CGGTACATGT CGGCGTTGGT CGGGAACCCA GAGAACAGGT 96660
CCTCGTTCCA GGTATCTAGC ATGGTACACA GCGCGGGACC CGCGCTGAAG CCCAGATCGT 96720
CGAGGAGACG GTTAAACAGG GCCGCGGGGG GGACGGGCAT GGGCGGCGAG GGCATCAGCT 96780
GGGCCTGACT CAGCCGACCG GTGGCGTACA GCGGAGGGGC GGCTGGGGTG TTCTTGGGAC 96840
CCCCGGCTGG CCTGGGGGGC GGTGGCGAAA CCCCGTCCGC GTCCGCAAAC AGATCGTCGA 96900
CCAACAGGTC CATGGGGGCG GTTGGGTCCG GGAATAACGA TCTCGAGAGG CGAATGAGAC 96960
GTGCCCGAGC GCCCGGCGGC GGAGAGGGGG GGAGGGATCC GGGACCCGCG ACAGAAAAAG 97020
GCCGGGGCCC TCGCGAAGGG AATCGCCGGG GGTGCCGTGC GTCCCCGAGG ACTGACATCT 97080
CGCGTCCACC ACCCCGCATT TAAGTATCAC CCCAGTGCCG CCCCAAACCT CGTGACTTCC 97140
CCACCGCTCC GGGCGGCCCG TCCCCCGCGC TCGGAAGGGA GGCGTGTCGC CCCTCCCGCC 97200
CCTCCCGCCC CTCCCGCCCC TCCCGCCCCT CCCGCCCCTC CCGCCCCTCC CGCCCCTCCC 97260
GCCCCTCGCC ACAAACGCGT GCTGACAGCG AAGTGGTTAA ATCGACCGTG ATGCTTTATT 97320
GTCTGTCGTC TGAACGCGGC CGGGGTCGCT ACTCGAGGGG GCGGCGGGGA CGGGAAGCCG 97380
AGCGGGCGGG GGCCCGTGCG GTCGCGGCGG CACGCCCCGC GGGGCGGCCC CGGGCGGCCG 97440
CGGTCGCGTC GACGTCCTGC GCCGCGTCGG GATTCACCAA CTCGTTCGCG CGCTGCAGGA 97500
GGTTCTTGCC CTCGCAGACC GTCACGCGAA TGGTGGTGAG GTCGAGGAGC TCGTTGAGGT 97560
CTTCGTCGGT GTGCGGCCGC GACATGTCCC ACAGCTGTAC CGCCGCCAGC CGGGCGTGCG 97620
TGGCCGCCAG GCGCCCGACC GCGGCGCAGA AGACGCGCTT GTTGAACCCG GCCACCCGGG 97680
GGGTCCACGG CGCCGTGGGG CTCGGTGGGG CGGTGCTGAA GTGCAGCTTC TTGGCCAGTC 97740
CCTGGGCGGG TGTCTTGGTT CTTCCCGAGG CCGTGGGAGC GGGGGCGTCT AGGAGCACGG 97800
CGGAGTCGGC CTGGGCGGGT CGCCTGCCGC GGGCAGGGTC GGTCGCCGGG GTCGCGGGGG 97860
CCTTAGGCGC CCCGCGCGTC ATTTTGGGGG TCCGCGCGGG AGGGGCGTGC GAGCGCCCGC 97920
CGGCGCCCAC GGGGCCCCCG GGGGGTGGAG GAGCGCGCGC GGGGCCGGGG CCGTGAGAGC 97980
CCGCGACGGA CGCCGAACGA CGCGGTCGCG CGGTATCCCG GGACTCGTCG TTGTCTTCGG 98040
ACGACGACGA GTCCCGGTAG AGGGCATACC CAGCCTCGTC ATAATGGAGA AAGCGAACCT 98100
CGCCCCTTGG GCGCGCGCGC ATCGGGCCAG CGCCGCGGCG GAAGTCGTCG CGCGGACTCT 98160
CTGGATCCGC CGGGGAGACC GGGCCATAGT ACAGCTCCTC GTGGGTCCCG CGCGGCGCTT 98220
CCCGCGGACA CGACTTGACG GAGCGGCGAG AGGTCATGGT CTATCGGAGA CACCGGGGAC 98280
GCCCGTGCGG ATCACAGGGA AGGCGTCGGC GAAGCAGGCA GAGAGCGTCG GAAGGCGGCG 98340
AGGGAGGGAA AGAGGGAGAC CGGCGGGGTA CGGGAGAGCA GCGAGGGCCT GCGTAACCCA 98400
CGGGGGCCGC GGGAGTGGCT CCCTGCGGGT TGCGGGGGAG AGTTTATAGG AAGTGGATAT 98460
AACCGCAGGC GACGGGACTA ACCAATCCCC GGGGGGGCAA CGGACAGACA CGCCCCGAAC 98520
CGGCCCGACT TCCGCGAGGA AGCAAAGGCC GGGGGCCGCC CAACGACACG CCCACCCCTT 98580
CCCAACAGGG CGGGCTCAGG CTGACCCGGC GGCCAGTGCC CGCTGGCATA TCTGATACAC 98640
GTGCGCGATC ATACATACGC CCATCGAGGT CATGCCTAGA TAAAAGGGCA CCAGGACCCC 98700
CGGGACGGAC ACCACACCGG CGCTGTCGCC CCGGCATTGC GCGTCCCCGA TAACGCCGCG 98760
TGCGCCTGCC GCGTTCGGCG GCTCCCCGGG CACGCCCGCG ACGAGCGCGA CGAACAACAG 98820
CACCACCCAG CGGCCCAGTC TTGCGGGTTT CCCCGTCATC GCGGCGATGA GTCAGTGGGG 98880
GCCCAGGGCG ATCCTTGTCC AGACGGACAG CACCAACCGG AATGCCGATG GGGACTGGCA 98940
AGCGGCCGTA GCTATTCGCG GGGGCGGAGT CGTTCAACTG AACATGGTCA ACAAACGCGC 99000
CGTGGATTTT ACCCCGGCAG AATGCGGGGA CTCCGAATGG GCCGTGGGCC GCGTCTCTCT 99060
GGGCCTGCGA ATGGCAATGC CGCGTGACTT CTGCGCGATT ATTCACGCCC CCGCGGTATC 99120
CGGCCCCGGG CCCCACGTGA TGCTCGGTCT CGTCGACTCG GGCTACCGCG GAACCGTCCT 99180
GGCCGTGGTC GTAGCCCCGA ACGGGACGCG CGGGTTTGCC CCCGGGGCCC TCCGGGTCGA 99240
CGTGACGTTT CTGGACATCC GGGCCACCCC CCCGACCCTC ACCGAGCCGA GCTCCCTGCA 99300
CCGGTTTCCG CAGTTGGCGC CGTCCCCGCT GGCAGGGTTA CGAGAAGATC CTTGGTTGGA 99360
CGGGGCGCTC GCGACCGCCG GGGGGGCGGT GGCCCTGCCG GCCAGACGGC GCGGGGGATC 99420
GCTGGTCTAC GCGGGCGAGC TAACGCAGGT GACCACCGAG CACGGCGACT GCGTGCACGA 99480
GGCGCCCGCC TTTCTGCCAA AGCGCGAGGA GGACGCAGGC TTTGACATTC TCATCCACCG 99540
AGCCGTGACC GTCCCGGCCA ACGGCGCCAC GGTCATACAG CCGTCCCTCC GCGTATTGCG 99600
CGCGGCCGAC GGACCAGAGG CCTGCTATGT GCTGGGGCGG TCGTCGCTCA ATGCCAGGGG 99660
CCTCCTGGTC ATGCCTACGC GCTGGCCCTC CGGGCACGCC TGTGCGTTTG TTGTATGTAA 99720
CCTGACCGGA GTCCCGGTGA CCCTACAAGC CGGGTCCAAG GTCGCCCAGC TGCTCGTCGC 99780
GGGGACCCAC GCCCTCCCCT GGATCCCCCC CGACAACATC CACGAGGACG GCGCATTCCG 99840
GGCCTACCCC AGAGGGGTTC CGGACGCGAC CGCCACCCCC CGAGACCCGC CGATTTTGGT 99900
GTTTACGAAC GAGTTTGACG CGGACGCCCC CCCAAGCAAG CGGGGGGCCG GGGGGTTTGG 99960
CTCCACTGGC ATCTAGACCG CGCCTCGCGT CGGGCCAGAT GGGGCCCCGG TCAATAAAGA 100020
GCTCTGTTTC GCATATGCCC TGGTGTTGGC GGTTTTTTTT TTGTTGTCTG TCTGCCCGGC 100080
ACTCGGTTGT CCGTTCTGTC GTCGCTATCA CA ACGCACA AACACACGGG TAGAGTGGAA 100140
CCGAAACCGG TCGACGTTTA TTCACCACAC AGAAACACAA GCTAAGCGAG AAGGAGGGGG 100200
GCCTCGGTCG ACGAGGCCTG GCGTTTGGGG GCGGACGTGC GATGACGTGG GTCCGGTGTA 100260
GGGTCCGCGG GGGGCACGGG CCCGGGGCGA ACGGGGGATC TGTCGCCGGC GTGGGTGACT 100320
GGGACCGACG CAACCTCCGG GGCTTGTGCC CTCGTAGGCC CGGGGGGGGC CTCGGTCGCT 100380
CCGAGCCCCG CGGTGCGGGT CCCTCCGGCC AGAGCCGAGG TGGAGAGACC AAGGGCCCGC 100440
TCCGCGATCG CCACGTCCTC CATGACCACG TCGCTCTCGG CCATGCTCCG AATGGCCTGG 100500
GAGACGAGCA CGTCCGCCGA CTTGTCCGCG GCCCCCACCG ACATGTACAT CTGCAGGATG 100560
GTGGCCATGC ACGTGTCCGC CAGGCGGCGC ATCTTGTCCC GATGCGCCGC AACGGCCCCG 100620
TCGATGGTGG AGCCCTCGAG TCCCGGGTGG TGGCGCGCCA GCCTCTCGAG GTTGACCATG 100680
CAGGCGTGGT ATGTGCGGGC CAGGGCGCGC GCCTTCACGA GGCGCCGGGT GTCGTCCAGC 100740
GACTCTAGGG CGTCGTCGAG CGTGATGGGG GCGGGCAAAA GCGCATTGAC CACCGCCAGG 100800
GCCTCCTGCA GCCGCGGGTC CNNNTCCGAG GGCGGAGCCG CGGCCCGAAT CATCTCATAT 100860
TGTTGTTCCT CGGGGCGCGT TCCCCAACCG CACAGCACCC CGAGCAGGGA CGCCATCCCG 100920
GAACACGCGC GCGGCTCTGC GCCGGCTTTC CCCCACCCCA CCCCCTCCGG GTTCGCAGGG 100980
GCGATGGGGA CGGAAGACTG CGATCACGAA GGGCGGTCGG TTGCGGCTCC CGTGGAGGTT 101040
ATGGCGCTGT ATGCGACCGA CGGGTGCGTT ATCACCTCCT CGCTCGCCCT CCTCACAAAC 101100
TGCCTGCTGG GGGCCGAGCC GTTGTATATA TTCAGCTACG ACGCGTACCG GCCCGATGCG 101160
CCCAATGGCC CCACGGGCGC GCCCACCGAA CAGGAGAGGT TCGAGGGGAG CCGGGCGCTC 101220
TACCGGGATG CGGGGGGGCT AAATGGCGAT TCATTTCGGG TGACCTTTTG TTTATTGGGG 101280
ACGGAAGTGG GCGTGACCCA CCACCCGAAA GGGCGCACCC GGCCCATGTT TGTGTGCCGC 101340
TTCGAGCGAG CGGACGACGT CGCCGTGCTC CAAGACGCCC TGGGCCGCGG GACCCCATTG 101400
CTCCCGGCCC ACATCACAGC AACTCTGGAC TTGGAGGCGA CGTTTGCGCT CCACGCTAAC 101460
ATCATCATGG CTCTCACCGT GGCCATAGTC CACAACGCCC CCGCCCGCAT CGGCAGCGGC 101520
AGCACCGCTC CCCTGTATGA GCCCGGCGAA TCGATGCGCT CGGTCGTCGG GCGCATGTCC 101580
CTGGGGCAGC GCGGCCTCAC CACGCTGTTC GTGCACCACG AGGCGCGCGT GCTGGCGGCG 101640
TACCGCCGGG CGTATTATGG GAGCGCCCAA AGCCCCTTTT GGTTTCTGAG CAAATTCGGC 101700
CCGGACGAAA AGAGCCTGGT GCTGGCCGCT AGGTACTACG TACTCCAGGC TCCGCGCTTG 101760
GGGGGCGCCG GAGCCACGTA CGATCTGCAG GCCGTGAAAG ACATCTGCGC GACCTACGCG 101820
ATCCCCCACG ACCCACGCCC CGACACCCTC AGTGCCGCGT CCTTGACCTC GTTCGCCGCC 101880
ATCACTCGGT TCTGTTGCAC GAGCCAGTAC TCCCGCGGGG CCGCGGCCGC TGGGTTTCCG 101940
CTGTATGTGG AGCGCCGCAT CGCCGCCGAC GTACGCGAGA CCGGCGCGCT GGAGAAGTTC 102000
ATCGCCCACG ATCGCAGTTG CCTGCGCGTG TCCGACCGGG AATTCATTAC GTACATCTAC 102060
CTGGCCCACT TTGAGTGCTT CAGCCCCCCG CGCCTGGCCA CGCATCTCCG GGCCGTGACC 102120
ACCCACGACC CCAGCCCCGC GGCCAGCACG GAGCAGCCCT CGCCCCTGGG TCGGGAGGCG 102180
GTGGAACAGT TCTTCCGGCA CGTGCGCGCC CAGCTGAACA TCCGCGAGTA CGTAAAGCAA 102240
AACGTCACCC CCAGGGAAAC CGCCCTGGCG GGAGACGCGG CCGCCGCCTA CCTGCGCGCG 102300
CGCACGTATG CCCCGGCGGC CCTCACGCCC GCCCCCGCGT ACTGCGGGGT CGCAGACTCG 102360
TCCACCAAAA TGATGGGACG TCTGGCGGAA GCAGAAAGGC TCCTAGTCCC CCACGGCTGG 102420
CCCGCGTTCG CACCAACAAC CCCCGGGGAC GACGCGGGGG GCGGCACTGC CGCCCCCCAG 102480
ACCTGCGGAA TCGTCAAGCG CCTCCTCAAG CTGGCCGCCA CGGAGCAGCA GGGCACGACG 102540
CCCCCGGCGA TCGCGGCTCT CATGCAGGAC GCGTCGGTCC AAACCCCCCT GCCCGTGTAC 102600
AGGATTACCA TGTCCCCGAC CGGCCAGGCG TTTGCCGCGG CGGCGCGGGA CGACTGGGCC 102660
CGCGTGACGC GGGACGCGCG CCCGCCGGAA GCGACCGTGG TCGCGGACGC GGCGGCGGCG 102720
CCCGAGCCCG GCGCGCTCGG CCGGCGGCTC ACGCGCCGCA TTTGCGCCCG GGGCCCCGCG 102780
CCTCCCCCCG GGCGGCCTGG CCGTCGGGGG CCAGATGTAC GTGAACCGCA ACGAGATCTT 102840
CAACGCCGCG CTGGCCGTTA CGAACATCAT CCTGGATCTG GACATCGCCC TGAAGGAGCC 102900
CGTCCCCTTT CCCCGGCTCC ACGAGGCCCT GGGTCACTTT AGGCGCGGGG CGCTGGCGGC 102960
GGTTCAGCTG TTGTTTCCCG CGGCCCGCGT AGACCCCGAC GCCTATCCCT GTTATTTTTT 103020
CAAAAGCGCC TGTCGGCCCC GCGCGCCGCC CGTCTGTGCG GGCGACGGGC CCTCGGCCGG 103080
TGGCGACGAC GGCGACGGGG ACTGGTTCCC CGACGCCGGT GGCGACGACG GCGACGAGGA 103140
GTGGGAGGAG GACACGGACC CCATGGACAC GACCCACGGC CCCCTCCCGG ACGACGAGGC 103200
CGCGTACCTC GACCTGCTAC ACGAACAGAT ACCAGCGGCG ACGCCCAGCG AACCGGACTC 103260
CGTCGTGTGT TCCTGCGCCG ACAAGATCGG GCTGCGCGTG TGCCTACCGG TCCCCGCCCC 103320
GTACGTTGTG CACGGCTCCC TGACGATGCG TGGGGTGGCG AGGGTGATCC AGCAGGCGGT 103380
GCTGTTGGAC CGCGACTTCG TGGAGGCCGT AGGGAGCCAC GTAAAGAACT TTTTGCTGAT 103440
CGATACGGGC GTGTACGCCC ACGGCCACAG CCTGCGCTTG CCGTATTTCG CCAAGATCGG 103500
CCCCGACGGC TCCGCGTGCG GCCGGTTATT GCCCGTCTTC GTGATCCCCC CCGCGTGCGA 103560
GGACGTTCCG GCGTTCGTCG CCGCGCACGC CGACCCGCGG CGCTTCCACT TTCACGCCCC 103620
GCCCATGTTT TCCGCGGCCC CGCGGGAGAT CCGCGTCCTC CACAGCCTGG GCGGGGACTA 103680
TGTCAGCTTT TTCGAGAAGA AGGCGTCGCG CAACGCCCTG GAGCACTTTG GGCGACGCGA 103740
GACCCTGACG GAGGTTCTGG GCCGCTACGA TGTGCGGCCC GACGCCGGGG AGACCGTGGA 103800
GGGGTTCGCG TCAGAACTGC TGGGGCGAAT AGTCGCGTGC ATCGAGGCCC ACTTTCCCGA 103860
GCACGCGCGG GAATATCAGG CCGTGTCCGT TCGCCGGGCC GTCATTAAGG ACGACTGGGT 103920
CCTGCTGCAG CTGATCCCCG GCCGCGGCGC CCTGAACCAA AGCCTCTCGT GTCTGCGCTT 103980
CAAGCACGGC AGGGCAAGTC GCGCGACGGC CCGGACCTTT CTCGCGCTGA GCGTCGGGAC 104040
CAACAACCGC CTATGCGCGT CCCTGTGTCA GCAGTGCTTT GCCACTAAAT GCGATAACAA 104100
CCGCCTGCAC ACGCTGTTTA CCGTCGATGC GGGCACGCCA TGCTCGCGGT CCGCTCCCTC 104160
CAGCACCTCA CGACCGTCAT CTTCATAACG GCCTACGGCC TCGTGCTCGC GTGGTACATC 104220
GTCTTTGGTG CCAGTCCGCT CCACCGATGT ATTTACGCGG TGCGCCCCGC CGGGGCACAC 104280
AACGATACCG CCCTCGTGTG GATGAAGATA AACCAGACGC TGTTGTTTCT GGGCCCGCCG 104340
ACCGCCCCCC CCGGCGGGGC ATGGACCCCC CACGCCCACG TCTGCTACGC CAATATCATC 104400
GAAGGTCGGG CCGTGTCCCT CCCGGCCATC CCCGGCGCCA TGAGCCGCCG GGTCATGAAC 104460
GTGCACGAGG CCGTAAACTG CTTGGAGGCC CTCTGGGACA CCCAGATGCG CCTGGTGGTC 104520
GTCGGTTGGT TTCTGTATCT AGCGTTCGTC GCCCTTCACC AACGACGATG CATGTTCGGC 104580
GTCGTGAGTC CCGCGCACAG CATGGTGGCC CCGGCGACCT ATCTTTTGAA CTACGCCGGC 104640
CGCATAGTGT CGAGCGTGTT CTTGCAATAC CCCTACACGA AAATCACCCG CCTCCTCTGC 104700
GAGCTATCCG TTCAACGCCA GACCCTGGTG CAGCTGTTCG AGGCGGATCC GGTCACCTTC 104760
TTGTACCACC GCCCGGCCGT TGGCGTCATC GTGGGCTGCG AGCTGCTGCT CCGCTTCGTG 104820
GCCCTCGGTC TCATCGTCGG CACCGCTCTC ATCTCCCGGG GCGCCTGCGC GATCACATAC 104880
CCCCTGTTTC TAACAATCAC CACCTGGTGT TTCGTGTCCA TCATCGCCCT GACGGAGCTG 104940
TATTTCATCC TGCGGCGGGA CTCGGCCCCC AAAAACGCGG AACCAGCGGC CCCCAGGGGG 105000
CGCTCCAAAG GGTGGTCGGG CGTCTGCGGG CGCTGCTGTT CCATCATCCT CTCCGGT TC 105060
GCCGTGCGCC TGTGCTATAT CGCCGTCGTG GCCGGGGTGG TGCTTATGGC GCTTCGCTAC 105120
GAACAGGAGA TTCAGCGGCG CCTGTTTGAT CTGTGACGTA ACGCCTCTTC CGTTGGAAGA 105180
GGCGGACCCA GTCGCCCATG CAAATTAAAT ACACGACCCG CCTCGGGCCT ACGCACCCTC 105240
GCACGTCGCA TGCAAATTAA AATCGTGCAC AGAGCCGATC CGGCCTCGGG TCTGCTTGCC 105300
CCTCCCCCGG TCCAGCACAG GCAGGCTCGT CCGACTTCCG CATACACCCC ACCCTACCGC 105360
GTGCTTCCGC ACCCCCGCCT ACGCGTGTAC GCGAAGGCGG ACCCAGACCT GCCGTATGCT 105420
AATTAAATAC ATAAAACCCA CCCTCGGCGT CCGATTGGTT TCTGGGGACG GCGGGGGCGG 105480
GGGCGGTGAC GCCCGACGGG GAGGGACAAG GAGGAGTTTC GGAAAGCCGG CCCCGGTCGT 105540
GCGGGTATAA GGGCAGCCAC CGGCCCACTG GGCGCTGTGT GCTGCCGTGT GCCGACCCCG 105600
GTTGCGCGTC GGTGCCGCTC CTCGATTCGG ACCCGGCCAC TCTCTTCCGA CACGCGCCCC 105660
CTCGGAGGAC ACCCGCCATC CCAGCCCCGG CGACCTACAA CATGGCTACC GACATTGATA 105720
TGCTAATCGA CCTAGGATTG GACCTGTCCG ACAGCGAGCT CGAGGAGGAC GCTCTGGAGC 105780
GGGACGAGGA GGGCCGCCGC GACGACCCCG AGTCCGACAG CAGCGGGGAG TGTTCCTCGT 105840
CGGACGAGGA CATGGAAGAC CCCTGCGGAG ACGGAGGGGC GGAGGCCATC GACGCGGCGA 105900
TTCCCAAAGG TCCCCCGGCC CGCCCCGAGG ACGCCGGCAC CCCCGAAGCC TCGACGCCTC 105960
GCCCGGCAGC GCGGCGGGGA GCCGACGATC CGCCACCCGC GACCACCGGC GTGTGGTCGC 106020
GCCTCGGGAC CAGGCGGTCG GCTTCCCCCC GGGAACCGCA CGGGGGGAAG GTGGCCCGCA 106080
TCCAACCCCC GTCGACCAAG GCACCGCATC CCCGAGGCGG GCGGCGAGGT CGCCGCCGGG 106140
GCCGGGGTCG ATACGGCCCC GGCGGCGCCG ACTCCACACC AAACCCCCGC CGGCGCGTCT 106200
CCAGAAACGC CCACAACCAA GGGGGTCGCC ACCCCGCGTC GGCGCGGACG GACGGCCCCG 106260
GCGCCACCCA CGGCGAGGCG CGGCGCGGAG GGGAGCAGCT CGACGTCTCC GGGGGCCCGC 106320
GGCCACGAGG CACGCGCCAG GCCCCCCCTC CGCTGATGGC GCTGTCCCTG ACCCCCCCGC 106380
ACGCGGACGG CCGCGCCCCG GTCCCGGAGC GAAAGGCGCC CTCTGCCGAC ACCATCGACC 106440
CCGCCGTTCG GGCGGTTCTG CGATCCATAT CCGAGCGCGC GGCGGTCGAG CGCATCAGCG 106500
AAAGCTTTGG ACGCAGTGCC CTGGTCATGC AAGACCCCTT TGGCGGGATG CCGTTTCCCG 106560
CCGCGAACAG CCCCTGGGCT CCCGTGCTGG CCACCCAAGC GGGGGGGTTT GACGCCGAGA 106620
CCCGTCGGGT TTCCTGGGAA ACCCTGGTCG CTCACGGCCC GAGCCTCTAC CGCACATTCG 106680
CAGCCAACCC GCGGGCCGCG TCGACAGCCA AGGCCATGCG CGACTGCGTG CTGCGCCAGG 106740
AAAATCTCAT CGAGGCCCTG GCGTCCGCGG ATGAGACGCT GGCGTGGTGC AAGATGTGCA 106800
TTCACCACAA TCTGCCGCTC CGCCCCCAGG ACCCTATCAT CGGAACGGCG GCCGCCGTGC 106860
TGGAAAACCT CGCCACGCGC CTGCGCCCCT TTCTGCAGTG CTACCTGAAG GCCCGAGGCC 106920
TGTGCGGGCT GGACGACCTG TGCTCGCGGC GACGCCTGTC GGACATTAAG GATATTGCCT 106980
CCTTTGTGTT GGTCATCCTG GCCCGCCTCG CCAACCGCGT CGAGCGCGGC GTGTCGGAGA 107040
TCGACTACAC GACCGTGGGG GTTGGGGCCG GCGAGACGAT GCACTTTTAC ATCCCGGGGG 107100
CCTGCATGGC GGGTCTCATT GAAATACTGG ACACGCACCG CCAGGAGTGT TCCAGTCGCG 107160
TGTGCGAGCT GACGGCCAGT CACACTATCG CCCCCTTATA TGTGCACGGC AAATACTTCT 107220
ACTGCAACTC CCTATTTTAG GCAAGAATAA ACATATTGAC GTCAACCCAA GTGGTTCCGT 107280
GTGATGTTCT TGGCGCGCGC GGCGGGTGGG GCGGAGACTC CGGGGCGATG CCGGCGTGCG 107340
CGTGGGAGGA GGGCGATGAC CCACCGGATA AATGTGGGGC CCCGGCCCGG CCCGCTTCAT 107400
AGCGCGTCCA GGAACTCACG GCAGACGCGT ATTCACCGAC CCCCCCCCTC GCAACATGAC 107460
AACGACGCCC CTCTCGAACC TGTTTTTACG GGCCCCGGAC ATCACCCACG TCGCCCCCCC 107520
GTACTGTCTG AATGCCACGT GGCAGGCCGA AAACGCCCTG CACACGACCA AAACGGACCC 107580
CGCGTGCCTG GCCGCGCGGA GTTATTTAGT CCGCGCCTCC TGCTCGACCA GCGGCCCCAT 107640
CCACTGTTTT TTCTTTGCGG TGTACAAGGA CTCGCAGCAC TCCCTTCCGC TGGTTACCGA 107700
GCTCCGCAAC TTCGCGGACC TGGTCAACCA CCCGCCCGTC TTGCGCGAAC TAGAGGATAA 107760
GCGTGGGGGG CGGCTGCGGT GCACGGGCCC ATTCAGCTGC GGAACCATCA AGGACGTCTC 107820
CGGTGCATCC CCCGCGGGGG AATACACGAT AAACGGTATC GTGTACCACT GTCACTGTCG 107880
GTATCCGTTC TCCAAAACCT GCTGGCTCGG GGCATCCGCG GCCCTACAAC ACCTTCGCTC 107940
TATAAGCTCA AGCGGCACGG CCGCTCGCGC GGCAGAACAG CGACGCCACA AAATCAAAAT 108000
CAAAATCAAG GTATAACCCA CCCCCTTCCC TCCGAGTCCG TATGCAACCT CATTAATAAA 108060
GAGTGAGAAC CAACCAAAAC AGACGCGGTG TGAGTTTGTG GGTTATAGGA ACCCGGTAAA 108120
TACCACGCGA CGAACCAGCG TGTGTGTTAA CGCGACTTTT ATTCGTTGTA TCGCGGGAGG 108180
GGGGAAGCTT ACCGCCAAAG GAAGGCCAAG ATGATAACGA CGACCACCGC GACCACCCCA 108240
AAAACCGCAT GACGACACGT CCCGCCACAC CACCCTGGGG CTTGGGGCGT GTCGGAGCTC 108300
GACGCACAGC GGGCCGCGCG TTGGGCCCGG TACAGCTCTC GCGAATTGAC GAGCGGGGGT 108360
CGCCACGTGC GCGAGCTTTG CACGCGGGGT TGGTCGGCCG GCCCCACGGA CCCGCCCGGT 108420
GGCTCGGTCG GACATGCGGC CATGACCATG GCGTAGGTGG GGGGGCGATC CGAGGTCGCC 108480
TCTGCGTAAG TAGGGAGGCC CGACGGGAGG TCGCCTCCCA CGCCAGGGTG GGCCCCAATC 108540
ATAGTTTCCG GTAGAAACAG GGGGGTCTCC ACAAACAACC CCCCTGGGCC AAAGCTCCGG 108600
CGCCGCGCCC GTCGTTCGGC GCGGCGCCTG GCGCGCCGAG CGGCCCGCCA GGCGGCGCGG 108660
CGCGAGCGGC CACGCTCACA CACCTCGCCG TCACCGGAAG AAGCCGGTGA AACAAGCCCA 108720
ACCGGCGACG TCCCTGCAGA GTACGGTGGA GGCGAGTCCG TGGGGGTGTC GATATCAATA 108780
ACGACAAACT GGCCCGCGCT CGCGCCGGCC ACACTCTCGT ATGGGGGCGG GGCGTCAATC 108840
ACGCTATCAT CTCCGTCATC CCTGCATGCG TGGGCATGCC CAGCCCCCAA CGCCATGGTG 108900
GGGATTCGCG GCTCAGAAGC CTGCATGTCG TGTGGTCGGT CGTAGTCCAA CGTGCCTCCC 108960
CCACCCACCA CACAGCCGGT CCCCACGCCG ACCACTAGAC CGCAGACGTC GCCCAACCGA 109020
GGTCCCCGTG CACAGACCGC GCCTTTTATA GCCCCAGGGG TTGCTAATTA ACGCACGCAT 109080
GCAGACGCAA TTTATTTTGC TCCCCCGCGT CCTCCCCTCC CCCGCGTCCT CCCCTCCCCG 109140
TCCTCCCCTC CCCCGCGTCC TCCCCTCCCC CGCGTCCTCC CCTCCCCTGC GCACACGTGA 109200
TAGGTCTTGG GAACCCGAGG GGCGACGCGG GGAAAGCGCG CCCCCGCCCG GCCGCCGCGC 109260
GCCCCCGCCC GGCCGCCGCG CGCCCCCGCC CGGCCGCGCG CCCCCGCCCG GCCGCCGCGC 109320
GCCCCCGCCC GGCCGCCGCG CGCCCCCGCC CGGCCGCCGC GCGCCCCCGC CCGGCCGCCC 109380
GCGTCGCGCC GGCGCCCCCT CCCGGCGCTT CCGGGGCCTT TCCTTCCTTC CCCGCCGCGA 109440
CCCCGGCCCC GCCCCACCGC CCCGCCCGGC AGGGGGGCCC CGGCGCCGCG CAGAACACAC 109500
AGACGAACAC ACGGTGGCGA TCTTTTCTTT ACTTCGGCAG ACCAGCGAGC CCCGGCCCCG 109560
GCCCGCGCCC CGCCGCCACA CCCACGGCAC CCCCCCCGCC GCCCACCCCG GGGTCCACAC 109620
AGGAGCGCGC GGGCGGCAGA AACGCGGGCG CGGCGGCGGT CGGGGTGGGA GTGGTGGTGG 109680
GGGACACGAA AACACACCCA CGACACTCTC CCCCCACCCC GACCGCCGCC GCGCCCCACC 109740
GGCGGGATCG CGGCGAGACG CAGCCGGGCC CCCCCCCACC ACCCGCCCAC CCACCTACCC 109800
CGCGCCCGCA GCCTCCGGCA GCACGCCGAC CACCGCCGCC ACCCCCCAAA CAGCCAAGGC 109860
GCGGTGGGGG GCGTGGTGGT GAACGATGGG GGGAACACGG GGGGGAGGGG TCCGGGGCGA 109920
GGCGGGCGGG CGAAGGAAGG GGGGGTGGTG GCGGCGGCGG TGGAAAGCGG AAAAACGGAG 109980
GATGGAAGGG CAGAAGATGG GGAGTCCCGA TCCTCCTCCT GCATCCCCTC GCCTTCCATT 110040
CTCCGGCCCT CCGCGAGTCC CGACGCCCCC CCCCCGCCGC CCGACGAAGG AGACCCAAGC 110100
ACCGCAGCCG GAGAGGCCGA GCGGGGAGTG GGCGGCCGGG CGGGAGGATG GCGGAGAGAG 110160
AGAGAGAGAG AGAGAGAGGG GGGGGGGGGG AGAGGGAAAG CAACGGGAAA GAGAGGCGCG 110220
CGGAAAAGCA GCAAGAGGGG GGACGGGGCG AGCCGGGCAG AGTGCGGAGC CCCCGGAGCC 110280
CGCGGCCGCA GCCGAGCAGC GCCGCGGGCT CCGGGGCCGG GCCGGGCCGG CAACGCCCCG 110340
CGCCGGCCGC GGCGGAGAGA ACCCCTGTGT CATTGTTTAC GTGGCCGCGG GCCAGCAGAC 110400
GGGCCGCGGG CCAGCAGACG GGCCGCGGCG CCAGCGGCCC ACGCCTCCCG CCGCATTAGG 110460
CCCCCGCGGG CATCCGGCGG CCGGCCCCAC GCCCTTCCAT TAAACACTCC CACGTTGGGG 110520
GGGGGCGCGC CAGCTGAGTG CTCTGCGGTT GCGGGCGCCG TGCCCGGAGA TCCATTAAGC 110580
CGCCGGAGAG CCCGAGCCCC GCCCGCGTGT TGCTGTGGGC ATTTCTGCTG CGTCATCCCT 110640
GTCTTTATAA AACCGGGGGC GCGGCAGCAA CGAACACAGG GGCCCGCCGC CGATCGAGAG 110700
GGACTCCGGA GAAGGAAGGC TGCTCCGCGC ACCGGCGCGC CCTTCTCCTC TCCCCTCCCT 110760
ACCTCCCCCT CTCTTCCCCC TTTTTTCCCC CGCCTCCCGT CTTCTTCCGC GCCTCCGAGG 110820
GTCCGCCTCT GCCTCGGGGA CCCCCGGGCG GGCCGGGGCT TGGCCGCCGA GGTGCGCCCC 110880
GGCCGGAGGG GCCCCCGCAC CTCGGCGGCC GCCCCCTCCG GCGCCGCGCG TTCGCGAAAG 110940
GCGCGAAAGG GGCCCCCGGA GGCTTTTTTC GATTCCCGGC CGGGGGTCCC GGGTAGCCGC 111000
CCGGCGCCGG TCGGAAGGCG TCCCCCGCCC GGCGGTCNGG NNNNGGCCCC CGGCGGAGCG 111060
CGGGGGCCCC GGGGCCCCGG GCCGCGCCGG CGGCGTTTCC GCGTTCCGTT TCTTCTCCCT 111120
CCCGGCCGCC CCGCTCCCGG GCCCGACCCT CGCCCCTTCC CTTCTCCTCG TCTTCCCCCG 111180
TCCCGCCGCG CCCCTTCCCT CTTCCTTCTC TCTCTCTGTC TCGCTGTCTC GCTCTCCTCA 111240
CATTTCCCCC CCCCCCCCCC GGCCGCCGCC GCCGCCCTCT GCCCGCGTCC CACCGAGACG 111300
CCGCGCCGCG TGAGCCGTCC GCCGGGGGAC CCAGGCTCCG GGGGGGGGGC GCGCCTGCGT 111360
GTGTCTCGTG TGAGAGAGCG CGCCCCTCGA ACGCCGCGCG TTCTCGCAGG TAGGTTTAGG 111420
GTCGTACAGG TGAGCTTCTG CTGAGGCGGC GGGAGAGGGG GGGGCGGGCG GAAGAGAGAA 111480
GAGAGCAGGG GTTGGGGGAA AACTGTTCTT CCTCCCCCTT TCAAGAAACA CGAGGCGGGG 111540
GTCCCAGAAA GGGCAGGCAG GTCAGCCGCA CCGCCCGCGA GCCAACCCGT ATCCTTTTTT 111600
TCTAGGTGTT TTTGTTTTTG TTTCTGTTTT TGTTTGTTTT GTTATTATTT TCGCGGATCC 111660
GGCGTGTTCG GATCCACCCC CCCCTTTCTC CTTCCTCTTC CCTTCCACCC ACCCCCGTTT 111720
CCCCCCCCCC CCCGTGGTGT CGTCCGGGGG CGTCGTTCCC AGGGGGGGCA GGCGCGGGTC 111780
GGGCCCATAC GCCCACCGCC CCCACGCGCC GGTCACCCCC CCCCCAACAA CCCCAAAGGC 111840
GCGTGCCCGG CCACAGCCGT GGGTGTGGCG CCCGTCCCCT TCCTCTACCG CGTGGGCGCG 111900
GGCGGGGGGG TGGTGGTGGT AGTGGTGGCG GAAGGAAACG GGCCGGGGGG CCGGGGCCGC 111960
TAGGGAAAGG TAGGCACGCG CGCGGTGTGT CGACTTGCAT GCCCCGCAAA ACGCGTCGTG 112020
TCGTGTTGTG TCGTGGTGGG CCGTGTTGTG GTGGGCCGTG TGGTGTGGTG TGGTGTTGCG 112080
AACGCGCGAG CCCCCTCGCC CCGATGGGAG TCTCCCCGCA GCCAGGGTAA GGAGGGGCGG 112140
GCGTGGCGGG CAGGTGTGCG GGCGGGGTGG GGTGAGTGCG GTTGCATGCC TCGGGTCTCC 112200
TCTTCCTGCT CCTCCTCCTT TCTCCCAGCC AGGGTGAGGA GGGGCGGGCG TGGCGGGCGG 112260
GTGTGCGGGC GGGGTGGGCG CCGGGGCGGG GGTGGGCACG GGCGTAAGTG CGGGTGCATG 112320
CCTCGGGTCT TCTCTTCTCC CTCCTCCTTC CTCCCACCCG TCCCCGGGGG CAGAGGGCGT 112380
GCATGCGTTG TGATTCAACC GCCCTCGCCC CCGCCCCACT TTCCCCCCTC TCTATCAAAG 112440
TTCCCTGGCC CCTGGCTTCG CGCCGGTGGT GCGGCTGACC CCCCCCTCCT CCCTCCCCGA 112500
GCCAGGCGCC CTCCCACTCC TGCCCACCAC CCCCCGGGTC TGGCCGGCCA GACGTGCGTG 112560
CTCTGCACGA TCGGGCCCCC CTCCCTGTCA ACACGGACAC ACTCTTTTTT TACCCGCCAG 112620
CCCGCCCACC CACCAAGACA GGGAGCCAGA ACGCAGGCCG GGCCCCGGCT CTGTTCTATG 112680
ATAAAGACCA ACAGGCCTCG GGGGTGGGGG CGGCTTCTCG TGCCCGCCAA GGAGACGCCC 112740
GCCCCGACCA CCCTCGCAGC GCAGGCGCAG GCCGCGACCC CTCTCTGCTC TTTGGAGGGA 112800
GCCGGGGGCG CGACGACGCG GCGCCCCCGG CTCCTTCACA CGGTCCTTCT GCGCGGTGCG 112860
CCTCCCGCCG GGGCTCGGGC CCCGGGCGCT GGGCCGCGGG CCGGAGTGCG CGATGGACGG 112920
GTAGCGCCCC CAGAGCTCGC AGCACCGGGA CCGCGGAATG CACTTGTTCT GCCAGTGCCC 112980
CCTGACGGAC GGGCAGGACC TGTACCTCTG CCCGGTGTAT CCCCGGATGC ACCAGGAGCA 113040
CCTGGTCTGC CCCTTGCACC GCCTGGACGA CGCCCGGCGC CGGGGGCGCA CCTCGGCGGC 113100
GTGGGACGAG GGGCTCGTGC GCGCGTTGAC GCACTCCGGG GGGCTGATGG GCTGCGGGGG 113160
GCGCAGCCTC ACCTTGTCGG AGACCTACTG GGGCCACCCG TTGTACGAGA AACTGGTCCC 113220
GTGGGACCAC CCGCGCGACC TGAAGGTGCC GGAGGCCAGC GCGGTGGGCA CCAGAGCCCT 113280
CGTCCCGCGC GGGCGCGGCC GGCCGCTGCG GGGGCGCCCG GTGCCCCTCA TCCCCCTCGA 113340
TTGTGAGCCG AACGACGGGC TTCCCTTCGG CGGGGGGTGG CCTGGTGGCC GGCTCCGCGG 113400
AGCCCCCGTC CCCCTCCACC CCCCCCCCCC TTCTGCCCCT CCTCTGTCCT TCACCCCCAC 113460
CCTCACCCCC CCCTGCCTGT GCCGGGGCTT GTCGTTGTGT GTGGTCGTAA AACAATACCT 113520
GAAAGACCGG AACAACTTTT GAACTCCTTT TTTTTGAAAT ATAAATATTT TTAAAATGTT 113580
ATTTCAAAAC ACTACGAAAA CTGTGTGAAA CAACAACCGG AAACTACGTC GAGGGGGCGC 113640
GTCCCCCCGG CCCCTACGCC CCCCCCTTTC CCTCCTCCTC CTCCCCGCCC TCCGCCTCCT 113700
CCTCCTCCGC CTCCTCCTCC TCCGCCTCCT CCTCCTCCGC CTCCTCCTCC TCCGCCTCCT 113760
CCTCCTCCGC CTCCTCCTCC TCCGCCTCCT CCTCCTCCGC CGCCGCTGGC GCCGGACCCT 113820
GCTGCCTCTG CGGCTGCCCC CGCGCCGCGG GCGCCTGCGG CCCCGCTCGC CGGGCACCGG 113880
CGCCAGCGGG CTCAGGCTCA GGCCCCGGGC CGCGCCGCGG CGGGAGAACC GGGGGTGGGG 113940
GACCCCCCGC TCCCCGCTCG CGCCCCGCCG CCTCCTTCTC CGCCTCCTGC TCCGGCGCCC 114000
CGGGCTCAGG CTGGGCGCGG AGAAGGCCCC CGCCCGGAGA GGAGGAGGCG CGTCCACAGG 114060
AGCCCGGGGC CCCCCCCTCC AGACGGTGTC AGCAGCCCCG CGCGCCGCGC GGGGGCGCGC 114120
CGGCAGCGGG GCGCGCAGGC CTCAGGCGGG GCGCGGCGGC GGCGGGGGCA CCACAGACGC 114180
TCGCGCCTGC GCCGGCCCGG GCGCGGCGGG CGGCACGGCC ACCTGCGCGT GGCGCGCGGG 114240
GCCAGCGCGT ACTGGGTCCG AGTCTGGCTG TGGGTTCGTG TCTCAGACCC GGCCCGTCCG 114300
CGCTGGCTGC GCGCGCCCAG CCCTCCCGGC CCGCGCCTCC CTCCTGGGCC CCAGGGGGCG 114360
CCGTGGTTGT GGGGGCCACG GCGGGGGGTG CGGCGCCTCC CCCGCCGTCT GTTTCCTCTC 114420
GCCGGGCCCC GGGCGCCCCG CCGCGCCTCT GCCGCCCCCT CTCAGCGACT ACTGATACCC 114480
CCCGAGGACC CGGCGCGCCC CGACAGAGCG CCCCCCGCAG GACGGGAGGC GGCGGCGCCG 114540
CAGAAGCGGG TGGGCGGCGC GGACGCGCGC GGGGGGCGGC CGGCGTCCCC CTTCTCTCCG 114600
GTGAGAGCCG TGCTGCCGGC GCTGCCGTCC CGGCGGGGGT ATGCGGTTCG CGCGTTAATT 114660
GGGAGTGATT TCCCTTGTTT TCGACCTCGA GGTGGCGCCA CCGCCGGCGA GATCTTGATC 114720
ACCTAGGGGG CCCGACGTCC TTAAGCTACT GGGTCTAGGG TGGGGGCGGG CGTTGCCCCG 114780
CGGCGGCGAC GACGACGAGG CGCCCCGCGG TCCCCCGCGG CCAGCCCAGC GCCGCCCGAC 114840
CCTCCAAGGC GCCCAGCGGG GGCGTGGCGG CGGGGGCGCG GCCCCGCGAG AAGCCCCCCG 114900
CCCGCCCTGC ATCAGGCGAC GTCTCCCTCT GTCTCTGCCC TTGGGGGCCA ATCACGGGCT 114960
GGGGGCGGGC TGGGGGCGGG TCACGGGCTG GGGGCGGGTC ACGGGCTGGG GGCGGGTCAC 115020
GGGCTGGGGG CGGGTCACGG GCTGGGGGCG GGCGGGAGTG GCAGCCGGTC CAGTAGCAGG 115080
AGCAGCAGGC ACGGCCCGGT GCCCCCCCAC CCGCTGTCCC GCGCCTGGCA CACAGGGGGG 115140
TCGCTGTCCC TCGCGCCCCG GCAGGCGCCC AACGGGCAGG TCTATTTCAG GTGCCGGCAC 115200
GGCCTGGCGT GCCGGCGGAG CCGGAGGTGC GCCCAGGCCC CCAGCAAGTG ATAGCCCTAC 115260
CACGACTTGC TGGGCGACCG CCAGTGCGGG TGATAGTCCA TGCGGTGGCC CCACAACGTG 115320
TCCCCTGTGC ACAACGCGTT GCCTTAGGTC CAGAAGTACG TGCCCTACGT CTTCCCCACG 115380
TCCGTCCCTT TTGAGACCGT CGCGTCCCCG CCCCGCTAGA GCAGGCACGT GTGCCGTGTG 115440
TGCAGCGGGG GGGGAGGGCG AAGGCGAAGG AGGAGTGGGT GCCCGGGTGG GGGCGTCCTA 115500
GGGACGCGCA GCCGCCCGCA CCCCGACGGG ACCGCGAGCC GGCCCCCGGC CCGGCCCCCG 115560
CACCGGCGCA GGTAGTCCGG GCGGAGCTTG TAGAGGCACA GGCACGACGG GCGGAGCCTC 115620
CACCTCAGCG CCACTTCCAG CAGCAGTCTC TAAGGGTGGA GCCAGAGGAG GAGGCTCAGC 115680
GACGACCGCT CGGTGACGTA CAGCAACTCG TAGGGGGTCC GCACGCCCCG CCGCCCGACG 115740
AACTGTTTCT TTGCCCCCCC CTAAATCTCC CGCGCCCCGC ACTCCGCCCT GGGGGCACGG 115800
CACAGGGGGC ACAGGGAGGG AGTGGGGCCG GGGGGCGGGC GACGAAAAAC AAGCCTTCCC 115860
CCCCTCTTTC CCCAGGCATT GGTTTCCACC AGACGCAGGA AACCTAAGGC TGGGGAGCAG 115920
AGGGGGGGGG ACAGGGGGCG AGAGCCCGAG TCCCGAGGGA CGGAGGGAGC GGGGGGGTTT 115980
CCCAGCCCCC CGCCGCGTGC CGGGTGCCCC CAGGGGGCTG GCGAATTCGC CCGGCCCCCA 116040
GCCGGGGCAG TTCGCAGGGG CGGGGGCTCG GGTGGCGGGC GCTGGTGGGG GTTGGGCGTC 116100
GGCCCACCAG GCCCCTTTTC CCCCCCGGAC TCTGGGCCCC CAGCGGGAGA GTGGCACGGC 116160
CCCCAGACGG CGCCGCCGGC GAGCCCCGGC CCCAGGCGGG CCCTCGAGCA CGGCCCGGCC 116220
CCAAGGTACT CGGCCCCATC CCATCTGAGC TCTGCCGCCG GGCGCCAGAG AGAGAACGGC 116280
CCACAATCAG AGACAGAGAG GCCCAGAGGA GGAGGGCGGC CCGGCGGCGA GGCAGCGAGC 116340
GTCACGGCCC CACGCTTACG CCGGGCTGGC AGTGTGCCCC GACGGAATAT GGGCCGCGGA 116400
TAGGTGAGGG GGTTTCCCCG CCGTAAATGC TAAGGGGGTT ATCGGCGCGC GGGGCCGCCC 116460
CCGCCTCCCT CCCTTAGGGG GGGAGAGCCC CGCCGGGGCA GGGGCCCCTG GTTGGCCCAC 116520
ATGAGGTTCT TGGGGTAATC GTACGCGGCG GGGGGCGGCT GCGTCTACCC TCAGGGGGGC 116580
CGCGGGGCGG CCGCGCCGGG ACTCACCACG GGCGGGGGCC CCTCTTTAAG TAATCGTATG 116640
ATCCTTCGGG TCCCCTGGTT ATCCCCGGCT AGTCGGGTGG GTGGGCCGCC GCGCGCTCCG 116700
AGACGCACAA GACGGTTCTT TCATTAGTCG TATTGGGCCT TGGGGCTCCC TCATTAATGC 116760
GCCCCTCGCT CCCCGGCAGG CTTGCAAAAA TTAATGGTAT TCGCCCTTAC CGCCGGGCAA 116820
TTTTCGACGA TTAATGGCGC TCGCCCTTGC GGCCGGGTAA TTTTCAACGA TTAATGGTAC 116880
TCGCCCCTAC CGCCGGCCCT GGCGGATAAT TTTCAAAGAT TAATGGTATG GCCCTTCGGC 116940
CGCGCCCCGC CAGCGGCCCC GCCTCAGGCC CGGGCGCGCC GCCGCGCGCC AACCGGCCGC 117000
GGCGGGGGAC CCCGCCCGCC TCGCCGCCCC GCCGCGGCCC GGGAGCGCCT ATATATGCGC 117060
CCCGAGGGTA GCAGAGAAGC CTCTCGCCGG AGCGCGTCTG GAAGCCTCGA GGCCCCGAGG 117120
CGGCCGGCTC CGGCGGGAGC GGCCAAGTTG GGATCTGGCG GGCTGCCGGG CCCGGGCGCC 117180
GCCGCCTCCT GGGCGCGCGG CGGCGGCGGC GGA 117213
(2) INFORMATION FOR SEQ ID NO: 218:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 180 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218:
Met Arg Thr Pro Ala Asp Asp Val Ser Trp Arg Tyr Glu Ala Pro Ser
1 5 10 15
Val Ile Asp Tyr Ala Arg Ile Asp Gly Ile Phe Leu Arg Tyr His Cys
20 25 30
Pro Gly Leu Asp Thr Phe Leu Trp Asp Arg His Ala Gln Arg Ala Tyr
35 40 45
Leu Val Asn Pro Phe Leu Phe Ala Ala Gly Phe Leu Glu Asp Leu Ser
50 55 60
His Ser Val Phe Pro Ala Asp Thr Gln Glu Thr Thr Thr Arg Arg Ala
65 70 75 80
Leu Tyr Lys Glu Ile Arg Asp Ala Leu Gly Ser Arg Lys Gln Ala Val
85 90 95
Ser His Ala Pro Val Arg Ala Gly Cys Val Asn Phe Asp Tyr Ser Arg
100 105 110
Thr Arg Arg Cys Val Gly Arg Arg Asp Leu Arg Pro Ala Asn Thr Thr
115 120 125
Ser Thr Trp Glu Pro Pro Val Ser Ser Asp Asp Glu Ala Ser Ser Gln
130 135 140
Ser Lys Pro Leu Ala Thr Gln Pro Pro Val Leu Ala Leu Ser Asn Ala
145 150 155 160
Pro Pro Arg Arg Val Ser Pro Thr Arg Gly Arg Arg Arg His Thr Arg
165 170 175
Leu Arg Arg Asn
180
(2) INFORMATION FOR SEQ ID NO: 219:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 334 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219:
Met Lys Arg Ala Arg Ser Arg Ser Pro Ser Pro Pro Ser Arg Pro Ser
1 5 10 15
Ser Pro Phe Arg Thr Pro Pro His Gly Gly Ser Pro Arg Arg Glu Val
20 25 30
Gly Ala Gly Ile Leu Ala Ser Asp Ala Thr Ser His Val Cys Ile Ala
35 40 45
Ser His Pro Gly Ser Gly Ala Gly Tyr Pro Thr Arg Leu Ala Ala Gly
50 55 60
Ser Ala Val Gln Arg Arg Arg Pro Arg Gly Cys Pro Pro Gly Val Met
65 70 75 80
Phe Ser Ala Ser Thr Thr Pro Glu Gln Pro Leu Gly Leu Ser Gly Asp
85 90 95
Ala Thr Pro Pro Leu Pro Thr Ser Val Pro Leu Asp Trp Ala Ala Phe
100 105 110
Arg Arg Ala Phe Leu Ile Asp Asp Ala Trp Arg Pro Leu Leu Glu Pro
115 120 125
Glu Leu Ala Asn Pro Leu Thr Ala Arg Leu Leu Ala Glu Tyr Asp Arg
130 135 140
Arg Cys Gln Thr Glu Glu Val Leu Pro Pro Arg Glu Asp Val Phe Ser
145 150 155 160
Trp Thr Arg Tyr Cys Thr Pro Asp Asp Val Arg Val Val Ile Ile Gly
165 170 175
Gln Asp Pro Tyr His His Pro Gly Gln Ala His Gly Leu Ala Phe Ser
180 185 190
Val Arg Ala Asp Val Pro Val Pro Pro Ser Leu Arg Asn Val Leu Ala
195 200 205
Ala Val Lys Asn Cys Tyr Pro Asp Ala Arg Met Ser Gly Arg Gly Cys
210 215 220
Leu Glu Lys Trp Ala Arg Asp Gly Val Leu Leu Leu Asn Thr Thr Leu
225 230 235 240
Thr Val Lys Arg Gly Ala Ala Ala Ser His Ser Lys Leu Gly Trp Asp
245 250 255
Arg Phe Val Gly Gly Val Val Arg Arg Leu Ala Ala Arg Arg Pro Gly
260 265 270
Leu Val Phe Met Leu Trp Gly Ala His Ala Gln Asn Ala Ile Arg Pro
275 280 285
Asp Pro Arg Gln His Tyr Val Leu Lys Phe Ser His Pro Ser Pro Leu
290 295 300
Ser Lys Val Pro Phe Gly Thr Cys Gln His Phe Leu Ala Ala Asn Arg
305 310 315 320
Tyr Leu Glu Thr Arg Asp Ile Met Pro Ile Asp Trp Ser Val
325 330
(2) INFORMATION FOR SEQ ID NO: 220:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 231 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220:
Met Val Lys Ser Arg Val Ser Tyr Arg Ser Val Met Ser Gly Val Gly
1 5 10 15
Glu Glu Arg Val Pro Ser Ala Phe Thr Ile Leu Ala Ser Trp Gly Trp
20 25 30
Thr Phe Ala Pro Gln Asn His Asp Pro Gly Asp Asn Thr Thr Pro Ile
35 40 45
Glu Ser Ile Ala Gly Thr Ala Pro Asp Ala His Val Gly Pro Leu Asp
50 55 60
Gly Glu Pro Asp Arg Asp Ala Ile Ser Pro Leu Thr Ser Ser Val Ala
65 70 75 80
Gly Asp Pro Pro Gly Ala Asp Gly Pro Tyr Val Thr Phe Asp Thr Leu
85 90 95
Phe Met Val Ser Ser Ile Asp Glu Leu Gly Arg Arg Gln Leu Thr Asp
100 105 110
Thr Ile Arg Lys Asp Leu Arg Leu Ser Leu Ala Lys Phe Ser Ile Ala
115 120 125
Cys Thr Lys Thr Ser Ser Phe Ser Gly Thr Ala Ala Arg Gln Arg Lys
130 135 140
Arg Gly Ala Pro Pro Gln Arg Thr Cys Val Pro Arg Ser Asn Lys Ser
145 150 155 160
Leu Gln Met Phe Val Leu Cys Lys Arg Ala Asn Ala Ala Gln Val Arg
165 170 175
Glu Gln Leu Arg Ala Val Ile Arg Ser Arg Lys Pro Arg Lys Tyr Tyr
180 185 190
Thr Arg Ser Ser Asp Gly Arg Leu Cys Pro Ala Val Pro Val Phe Val
195 200 205
His Glu Phe Val Ser Ser Glu Pro Met Arg Leu His Arg Asp Asn Val
210 215 220
Met Leu Ser Thr Glu Pro Asp
225 230
(2) INFORMATION FOR SEQ ID NO: 221:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 199 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221:
Met Gly Asn Pro Gln Thr Thr Ile Ala Tyr Ser Leu His His Pro Arg
1 5 10 15
Ala Ser Leu Thr Ser Ala Leu Pro Asp Ala Ala Gln Val Val His Val
20 25 30
Phe Glu Ser Gly Thr Arg Ala Val Leu Thr Arg Gly Arg Ala Arg Gln
35 40 45
Asp Arg Leu Pro Arg Gly Gly Val Val Ile Gln His Thr Pro Ile Gly
50 55 60
Leu Leu Val Ile Ile Asp Cys Arg Ala Glu Phe Cys Ala Tyr Arg Phe
65 70 75 80
Ile Gly Arg Ala Ser Thr Gln Arg Leu Glu Arg Trp Trp Asp Ala His
85 90 95
Met Tyr Ala Tyr Pro Phe Asp Ser Trp Val Ser Ser Ser His Gly Glu
100 105 110
Ser Val Arg Ser Ala Thr Ala Gly Ile Leu Thr Val Val Trp Thr Pro
115 120 125
Asp Thr Ile Tyr Ile Thr Ala Thr Ile Tyr Gly Thr Ala Pro Glu Ala
130 135 140
Arg Cys Asp Asn Ala Pro Leu Asp Val Arg Pro Thr Thr Pro Pro Ala
145 150 155 160
Pro Val Ser Pro Thr Ala Gly Glu Phe Pro Ala Asn Thr Thr Asp Leu
165 170 175
Leu Val Glu Val Leu Arg Glu Ile Gln Ile Ser Pro Thr Leu Asp Asp
180 185 190
Ala Asp Pro Thr Pro Gly Thr
195
(2) INFORMATION FOR SEQ ID NO: 222:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 877 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222:
Met Ala Ala Ser Gly Gly Glu Gly Ser Arg Asp Val Arg Ala Pro Gly
1 5 10 15
Pro Pro Pro Gln Gln Pro Gly Ala Arg Pro Ala Val Arg Phe Arg Asp
20 25 30
Glu Ala Phe Leu Asn Phe Thr Ser Met His Gly Val Gln Pro Ile Ile
35 40 45
Ala Arg Ile Arg Glu Leu Ser Gln Gln Gln Leu Asp Val Thr Gln Val
50 55 60
Pro Arg Leu Gln Trp Phe Arg Asp Val Ala Ala Leu Glu Val Pro Thr
65 70 75 80
Gly Leu Pro Leu Arg Glu Phe Pro Phe Ala Ala Tyr Leu Ile Thr Gly
85 90 95
Asn Ala Gly Ser Gly Lys Ser Thr Cys Val Gln Thr Leu Asn Glu Val
100 105 110
Leu Asp Cys Val Val Thr Gly Ala Thr Arg Ile Ala Ala Gln Asn Met
115 120 125
Tyr Val Lys Leu Ser Gly Ala Phe Leu Ser Arg Pro Ile Asn Thr Ile
130 135 140
Phe His Glu Phe Gly Phe Arg Gly Asn His Val Gln Ala Gln Leu Gly
145 150 155 160
Gln His Pro Tyr Thr Leu Ala Ser Ser Pro Ala Ser Leu Glu Asp Leu
165 170 175
Gln Arg Arg Asp Leu Thr Tyr Tyr Trp Glu Val Ile Leu Asp Ile Thr
180 185 190
Lys Arg Ala Ala His Gly Gly Glu Asp Ala Arg Asn Glu Phe His Ala
195 200 205
Leu Thr Ala Leu Glu Gln Thr Leu Gly Leu Gly Gln Gly Ala Leu Thr
210 215 220
Arg Leu Ala Ser Val Thr His Gly Ala Leu Pro Ala Phe Thr Arg Ser
225 230 235 240
Asn Ile Ile Val Ile Asp Glu Ala Gly Leu Leu Gly Arg His Leu Leu
245 250 255
Thr Thr Val Val Tyr Cys Trp Trp Met Ile Asn Ala Leu Tyr His Thr
260 265 270
Pro Gln Tyr Ala Gly Arg Leu Arg Pro Val Leu Val Cys Val Gly Ser
275 280 285
Pro Thr Gln Thr Ala Ser Leu Glu Ser Thr Phe Glu His Gln Lys Leu
290 295 300
Arg Cys Ser Val Arg Gln Ser Glu Asn Val Leu Thr Tyr Leu Ile Cys
305 310 315 320
Asn Arg Thr Leu Arg Glu Tyr Thr Arg Leu Ser His Ser Trp Ala Ile
325 330 335
Phe Ile Asn Asn Lys Arg Cys Val Glu His Glu Phe Gly Asn Leu Met
340 345 350
Lys Val Leu Glu Tyr Gly Leu Pro Ile Thr Glu Glu His Met Gln Phe
355 360 365
Val Asp Arg Phe Val Val Pro Glu Ser Tyr Ile Thr Asn Pro Ala Asn
370 375 380
Leu Pro Gly Trp Thr Arg Leu Phe Ser Ser His Lys Glu Val Ser Ala
385 390 395 400
Tyr Met Ala Lys Leu His Ala Tyr Leu Lys Val Thr Arg Glu Gly Glu
405 410 415
Phe Val Val Phe Thr Leu Pro Val Leu Thr Phe Val Ser Val Lys Glu
420 425 430
Phe Asp Glu Tyr Arg Arg Leu Thr Gln Gln Pro Thr Leu Thr Met Glu
435 440 445
Lys Trp Ile Thr Ala Asn Ala Ser Arg Ile Thr Asn Tyr Ser Gln Ser
450 455 460
Gln Asp Gln Asp Ala Gly His Val Arg Cys Glu Val His Ser Lys Gln
465 470 475 480
Gln Leu Val Val Ala Arg Asn Asp Ile Thr Tyr Val Leu Asn Ser Gln
485 490 495
Val Ala Val Thr Ala Arg Leu Arg Lys Met Val Phe Gly Phe Asp Gly
500 505 510
Thr Phe Arg Thr Phe Glu Ala Val Leu Arg Asp Asp Ser Phe Val Lys
515 520 525
Thr Gln Gly Glu Thr Ser Val Glu Phe Ala Tyr Arg Phe Leu Ser Arg
530 535 540
Leu Met Phe Gly Gly Leu Ile His Phe Tyr Asn Phe Leu Gln Arg Pro
545 550 555 560
Gly Leu Asp Ala Thr Gln Arg Thr Leu Ala Tyr Gly Arg Leu Gly Glu
565 570 575
Leu Thr Ala Glu Leu Leu Ser Leu Arg Arg Asp Ala Ala Gly Ala Ser
580 585 590
Ala Thr Arg Ala Ala Asp Thr Ser Asp Arg Ser Pro Gly Glu Arg Ala
595 600 605
Phe Asn Phe Lys His Leu Gly Pro Arg Asp Gly Gly Pro Asp Asp Phe
610 615 620
Pro Asp Asp Asp Leu Asp Val Ile Phe Ala Gly Leu Asp Glu Gln Gln
625 630 635 640
Leu Asp Val Phe Tyr Cys His Tyr Ala Leu Glu Glu Pro Glu Thr Thr
645 650 655
Ala Ala Val His Ala Gln Phe Gly Leu Leu Lys Arg Ala Phe Leu Gly
660 665 670
Arg Tyr Leu Ile Leu Arg Glu Leu Phe Gly Glu Val Phe Glu Ser Ala
675 680 685
Pro Phe Ser Thr Tyr Val Asp Asn Val Ile Phe Arg Gly Cys Glu Leu
690 695 700
Leu Thr Gly Ser Pro Arg Gly Gly Leu Met Ser Val Gln Thr Asp Asn
705 710 715 720
Tyr Thr Leu Met Gly Tyr Thr Tyr Thr Arg Val Phe Ala Phe Ala Glu
725 730 735
Glu Leu Arg Arg Arg His Ala Thr Ala Gly Val Ala Glu Phe Leu Glu
740 745 750
Glu Ser Pro Leu Pro Tyr Ile Val Leu Arg Asp Gln His Gly Phe Met
755 760 765
Ser Val Val Asn Thr Asn Ile Ser Glu Phe Val Glu Ser Ile Asp Ser
770 775 780
Thr Glu Leu Ala Met Ala Ile Asn Ala Asp Tyr Gly Ile Ser Ser Lys
785 790 795 800
Leu Ala Met Thr Ile Thr Arg Ser Gln Gly Leu Ser Leu Asp Lys Val
805 810 815
Ala Ile Cys Phe Thr Pro Gly Asn Leu Arg Leu Asn Ser Ala Tyr Val
820 825 830
Ala Met Ser Arg Thr Thr Ser Ser Glu Phe Leu His Met Asn Leu Asn
835 840 845
Pro Leu Arg Glu Arg His Glu Arg Asp Asp Val Ile Ser Glu His Ile
850 855 860
Leu Ser Ala Leu Arg Asp Pro Asn Val Val Ile Val Tyr
865 870 875
(2) INFORMATION FOR SEQ ID NO: 223:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 292 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223:
Met Ala Asp Pro Thr Pro Ala Asp Glu Gly Thr Ala Ala Ala Ile Leu
1 5 10 15
Lys Gln Ala Ile Ala Gly Asp Arg Ser Leu Val Glu Val Ala Glu Gly
20 25 30
Ile Ser Asn Gln Ala Leu Leu Arg Met Ala Cys Glu Val Arg Gln Val
35 40 45
Ser Asp Arg Gln Pro Arg Phe Thr Ala Thr Ser Val Leu Arg Val Asp
50 55 60
Val Thr Pro Arg Gly Arg Leu Arg Phe Val Leu Asp Gly Ser Ser Asp
65 70 75 80
Asp Ala Tyr Val Ala Ser Glu Asp Tyr Phe Lys Arg Cys Gly Asp Gln
85 90 95
Pro Tyr Gly Phe Ala Val Val Val Leu Thr Ala Asn Glu Asp His Val
100 105 110
His Ser Leu Ala Val Pro Pro Leu Val Leu Leu His Arg Leu Ser Leu
115 120 125
Phe Arg Pro Thr Asp Leu Arg Asp Phe Glu Leu Val Cys Leu Leu Met
130 135 140
Tyr Leu Glu Asn Cys Pro Arg Ser His Ala Thr Pro Ser Leu Phe Val
145 150 155 160
Lys Val Ser Ala Trp Leu Gly Val Val Ala Arg His Asp Phe Glu Arg
165 170 175
Val Arg Cys Leu Leu Leu Arg Ser Cys His Trp Ile Leu Asn Thr Leu
180 185 190
Met Cys Met Ala Gly Val Lys Pro Phe Asp Asp Glu Leu Val Leu Pro
195 200 205
His Trp Tyr Met Ala His Tyr Leu Leu Ala Asn Asn Pro Pro Pro Val
210 215 220
Leu Ser Ala Leu Phe Cys Ala Thr Pro Gln Ser Ser Ala Leu Gln Leu
225 230 235 240
Pro Gly Pro Val Pro Arg Thr Asp Cys Val Ala Tyr Asn Pro Ala Gly
245 250 255
Val Met Gly Ser Cys Trp Lys Ser Lys Asp Leu Arg Ser Ala Leu Val
260 265 270
Tyr Trp Trp Leu Ser Gly Ser Pro Lys Arg Arg Thr Ser Ser Leu Phe
275 280 285
Tyr Arg Phe Cys
290
(2) INFORMATION FOR SEQ ID NO: 224:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 734 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:
Met Glu Ala Pro Gly Ile Val Trp Val Glu Glu Ser Val Ser Ala Ile
1 5 10 15
Thr Leu Tyr Ala Val Trp Leu Pro Pro Arg Thr Arg Asp Cys Leu His
20 25 30
Ala Leu Leu Tyr Leu Val Cys Arg Asp Ala Ala Gly Glu Ala Arg Ala
35 40 45
Arg Phe Ala Glu Val Ser Val Gly Ser Ser Asp Leu Gln Asp Phe Tyr
50 55 60
Gly Ser Pro Asp Val Ser Ala Ala Gly Ala Val Ala Ala Ala Arg Ala
65 70 75 80
Ala Pro Ala Asp Leu Glu Pro Leu Gly Asp Pro Thr Leu Trp Arg Ala
85 90 95
Leu Tyr Ala Cys Val Leu Ala Ala Leu Glu Arg Gln Thr Gly Pro Val
100 105 110
Phe Val Pro Leu Arg Leu Gly Trp Asp Pro Gln Thr Gly Leu Val Val
115 120 125
Arg Val Glu Arg Ala Ser Trp Gly Pro Pro Ala Ala Pro Arg Ala Ala
130 135 140
Leu Leu Asp Val Glu Ala Lys Val Asp Val Asp Pro Leu Ala Ala Arg
145 150 155 160
Val Ala Glu His Pro Gly Ala Arg Leu Ala Trp Ala Arg Leu Ala Ala
165 170 175
Ile Arg Asp Ser Pro Gln Cys Ala Ser Ser Ala Ser Leu Ala Val Thr
180 185 190
Ile Thr Thr Arg Thr Ala Arg Phe Ala Arg Glu Tyr Thr Thr Leu Ala
195 200 205
Phe Pro Pro Thr Ser Lys Glu Gly Ala Phe Ala Asp Leu Val Glu Val
210 215 220
Cys Glu Val Gly Leu Arg Pro Arg Gly His Pro Gln Arg Val Thr Ala
225 230 235 240
Arg Val Leu Leu Pro Arg Gly Tyr Asp Tyr Phe Val Ser Ala Gly Asp
245 250 255
Gly Phe Ser Ala Pro Ala Leu Val Phe Arg Gln Trp His Thr Thr Val
260 265 270
His Ala Ala Pro Gly Ala Pro Val Phe Ala Phe Leu Gly Pro Gly Phe
275 280 285
Glu Val Arg Gly Gly Pro Val Gln Tyr Phe Ala Val Leu Gly Phe Pro
290 295 300
Gly Trp Pro Thr Phe Thr Val Pro Ala Ala Ala Ala Ala Glu Ser Ala
305 310 315 320
Arg Asp Leu Val Arg Gly Ala Ala Ala Thr His Ala Ala Cys Leu Gly
325 330 335
Ala Trp Pro Ala Val Gly Ala Arg Val Val Leu Pro Pro Arg Ala Trp
340 345 350
Pro Ala Val Ala Ser Glu Ala Ala Gly Arg Leu Leu Pro Ala Phe Arg
355 360 365
Glu Ala Val Ala Arg Trp His Pro Thr Ala Thr Thr Ile Gln Leu Leu
370 375 380
Asp Pro Pro Ala Ala Val Gly Pro Val Trp Thr Ala Arg Phe Cys Phe
385 390 395 400
Ser Gly Leu Gln Ala Gln Leu Leu Ala Ala Gly Leu Gly Glu Ala Gly
405 410 415
Leu Pro Glu Arg Arg Ala Gly Leu Glu Arg Leu Asp Ala Leu Val Ala
420 425 430
Ala Ala Pro Ser Glu Pro Trp Ala Arg Ala Val Leu Glu Arg Leu Val
435 440 445
Pro Asp Ala Cys Asp Ala Cys Pro Ala Leu Arg Gln Leu Leu Gly Gly
450 455 460
Val Met Ala Ala Val Cys Leu Gln Ile Glu Gln Thr Ala Ser Ser Val
465 470 475 480
Lys Phe Ala Val Cys Gly Gly Thr Gly Ala Ala Phe Trp Gly Leu Phe
485 490 495
Asn Val Asp Pro Gly Asp Ala Asp Ala Ala His Gly Ala Ile Gln Asp
500 505 510
Ala Arg Arg Ala Leu Glu Ala Ser Val Arg Ala Val Leu Ser Ala Asn
515 520 525
Gly Ile Arg Pro Arg Leu Ala Pro Ser Leu Ala Leu Glu Gly Val Tyr
530 535 540
Thr His Val Val Thr Trp Ser Gln Thr Gly Ala Trp Phe Trp Asn Ser
545 550 555 560
Arg Asp Asp Thr Asp Phe Leu Gln Gly Phe Pro Leu Arg Gly Pro Ala
565 570 575
Tyr Ala Ala Ala Ala Glu Val Met Arg Asp Ala Leu Arg Arg Ile Leu
580 585 590
Arg Arg Pro Ala Ala Gly Pro Pro Glu Glu Ala Val Cys Ala Arg Ile
595 600 605
Met Glu Asp Ala Cys Asp Arg Phe Val Leu Asp Ala Phe Gly Arg Arg
610 615 620
Leu Asp Ala Glu Tyr Trp Ser Val Leu Thr Pro Pro Gly Glu Ala Asp
625 630 635 640
Asp Pro Leu Pro Gln Thr Ala Phe Arg Gly Gly Ala Leu Leu Asp Ala
645 650 655
Glu Gln Tyr Trp Arg Arg Val Val Arg Val Cys Pro Gly Gly Gly Glu
660 665 670
Ser Val Gly Val Pro Val Asp Leu Tyr Pro Arg Pro Leu Val Leu Pro
675 680 685
Pro Val Asp Cys Ala His His Leu Arg Glu Ile Leu Arg Glu Ile Gln
690 695 700
Leu Val Phe Thr Gly Val Leu Glu Gly Val Trp Gly Glu Gly Gly Ser
705 710 715 720
Phe Val Tyr Pro Phe Glu Glu Lys Met Arg Phe Leu Phe Pro
725 730
(2) INFORMATION FOR SEQ ID NO: 225:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 461 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225:
Met Gly Arg Arg Ala Pro Arg Gly Ser Pro Glu Ala Ala Pro Gly Ala
1 5 10 15
Asp Val Ala Pro Gly Ala Arg Ala Ala Trp Trp Val Trp Cys Val Gln
20 25 30
Val Ala Thr Phe Ile Val Ser Ala Ile Cys Val Val Gly Leu Leu Val
35 40 45
Leu Ala Ser Val Phe Arg Asp Arg Phe Pro Cys Leu Tyr Ala Pro Ala
50 55 60
Thr Ser Tyr Ala Glu Ala Asn Ala Thr Val Glu Val Arg Gly Gly Val
65 70 75 80
Ala Val Pro Leu Arg Leu Asp Thr Gln Ser Leu Leu Ala Thr Tyr Ala
85 90 95
Ile Thr Ser Thr Leu Leu Leu Ala Ala Ala Val Tyr Ala Ala Val Gly
100 105 110
Ala Val Thr Ser Arg Tyr Glu Arg Ala Leu Asp Ala Ala Arg Arg Leu
115 120 125
Ala Ala Ala Arg Met Ala Met Pro His Ala Thr Leu Ile Ala Gly Asn
130 135 140
Val Cys Ala Trp Leu Leu Gln Ile Thr Val Leu Leu Leu Ala His Arg
145 150 155 160
Ile Ser Gln Leu Ala His Leu Ile Tyr Val Leu His Phe Ala Cys Leu
165 170 175
Val Tyr Leu Ala Ala His Phe Cys Thr Arg Gly Val Leu Ser Gly Thr
180 185 190
Tyr Leu Arg Gln Val His Gly Leu Ile Asp Pro Ala Pro Thr His His
195 200 205
Arg Ile Val Gly Pro Val Arg Ala Val Met Thr Asn Ala Leu Leu Leu
210 215 220
Gly Thr Leu Leu Cys Thr Ala Ala Ala Ala Val Ser Leu Asn Thr Ile
225 230 235 240
Ala Ala Leu Asn Phe Asn Phe Ser Ala Pro Ser Met Leu Ile Cys Leu
245 250 255
Thr Thr Leu Phe Ala Leu Leu Val Val Ser Leu Leu Leu Val Val Glu
260 265 270
Gly Val Leu Cys His Tyr Val Arg Val Leu Val Gly Pro His Leu Gly
275 280 285
Ala Ile Ala Ala Thr Gly Ile Val Gly Leu Ala Cys Glu His Tyr His
290 295 300
Thr Gly Gly Tyr Tyr Val Val Glu Gln Gln Trp Pro Gly Ala Gln Thr
305 310 315 320
Gly Val Arg Val Val Ala Ala Phe Ala Met Ala Val Leu Arg Cys Thr
325 330 335
Arg Ala Tyr Leu Tyr His Arg Arg His His Thr Lys Phe Phe Val Arg
340 345 350
Met Arg Asp Thr Arg His Arg Ala His Ser Ala Leu Arg Arg Val Arg
355 360 365
Ser Ser Met Arg Gly Ser Arg Arg Gly Gly Pro Pro Gly Asp Pro Gly
370 375 380
Tyr Ala Glu Thr Pro Tyr Ala Ser Val Ser His His Ala Glu Ile Asp
385 390 395 400
Arg Tyr Gly Asp Ser Asp Gly Asp Pro Ile Tyr Asp Glu Val Ala Pro
405 410 415
Asp His Glu Ala Glu Leu Tyr Ala Arg Val Gln Arg Pro Gly Pro Val
420 425 430
Pro Asp Ala Glu Pro Ile Tyr Asp Thr Val Glu Gly Tyr Ala Pro Arg
435 440 445
Ser Ala Gly Glu Pro Val Tyr Ser Thr Val Arg Arg Trp
450 455 460
(2) INFORMATION FOR SEQ ID NO: 226:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 96 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226:
Met Gly Leu Ala Phe Ser Gly Ala Arg Pro Cys Cys Cys Arg His Asn
1 5 10 15
Val Ile Ile Thr Asp Gly Gly Glu Val Val Ser Leu Thr Ala His Glu
20 25 30
Phe Asp Val Val Asp Ile Glu Ser Glu Glu Glu Gly Asn Phe Tyr Val
35 40 45
Pro Pro Asp Met Arg Val Val Thr Arg Ala Pro Gly Pro Gln Tyr Arg
50 55 60
Arg Ala Ser Asp Pro Pro Ser Arg His Thr Arg Arg Arg Asp Pro Asp
65 70 75 80
Val Ala Arg Pro Pro Ala Thr Leu Thr Pro Pro Leu Ser Asp Ser Glu
85 90 95
(2) INFORMATION FOR SEQ ID NO: 227:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 618 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227:
Met Ala Ala Ala Ala Thr Pro Gly Ala Lys Arg Pro Ala Asp Pro Ala
1 5 10 15
Arg Asp Pro Asp Ser Pro Pro Lys Arg Pro Arg Pro Asn Ser Leu Asp
20 25 30
Leu Ala Thr Val Phe Gly Pro Arg Pro Ala Pro Pro Arg Pro Thr Ser
35 40 45
Pro Gly Ala Pro Gly Ser His Trp Pro Gln Ser Pro Pro Arg Gly Gln
50 55 60
Pro Asp Gly Gly Ala Pro Gly Glu Lys Ala Arg Pro Asp Ala Leu Ser
65 70 75 80
Glu Ala Ser Ser Gly Pro Pro Thr Pro Asp Ile Pro Leu Ser Pro Gly
85 90 95
Gly Ala His Ala Ile Asp Pro Asp Cys Ser Pro Gly Pro Pro Asp Pro
100 105 110
Asp Pro Met Trp Ser Ala Ser Ala Ile Pro Asn Ala Leu Pro Pro His
115 120 125
Ile Leu Ala Glu Thr Phe Glu Arg His Leu Arg Gly Leu Leu Arg Gly
130 135 140
Val Arg Ser Pro Leu Ala Ile Gly Pro Leu Trp Ala Arg Leu Asp Tyr
145 150 155 160
Leu Cys Ser Leu Val Val Ser Leu Glu Ala Ala Gly Met Val Asp Arg
165 170 175
Gly Leu Gly Arg His Leu Trp Arg Leu Thr Arg Arg Ala Pro Pro Ser
180 185 190
Ala Ala Glu Ala Val Ala Pro Arg Pro Leu Met Gly Phe Tyr Glu Ala
195 200 205
Ala Thr Gln Asn Gln Ala Asp Cys Gln Leu Trp Ala Leu Leu Arg Arg
210 215 220
Gly Leu Thr Thr Ala Ser Thr Leu Arg Trp Gly Ala Gln Gly Pro Cys
225 230 235 240
Phe Ser Ser Gln Trp Leu Thr His Asn Ala Ser Leu Arg Leu Asp Ala
245 250 255
Gln Ser Ser Ala Val Met Phe Gly Arg Val Asn Glu Pro Thr Ala Arg
260 265 270
Asn Leu Leu Phe Arg Tyr Cys Val Gly Arg Ala Asp Ala Gly Val Asn
275 280 285
Asp Asp Ala Asp Ala Gly Arg Phe Val Phe His Gln Pro Gly Asp Leu
290 295 300
Ala Glu Glu Asn Val His Ala Cys Gly Val Leu Met Asp Gly His Thr
305 310 315 320
Gly Met Val Gly Ala Ser Leu Asp Ile Leu Val Cys Pro Arg Asp Pro
325 330 335
His Gly Tyr Leu Ala Pro Ala Pro Gln Thr Pro Leu Ala Phe Tyr Glu
340 345 350
Val Lys Cys Arg Ala Lys Tyr Ala Phe Asp Pro Ala Asp Pro Gly Ala
355 360 365
Pro Ala Ala Ser Ala Tyr Glu Asp Leu Met Ala Arg Arg Ser Pro Glu
370 375 380
Ala Phe Arg Ala Phe Ile Arg Ser Ile Pro Asn Pro Gly Val Arg Tyr
385 390 395 400
Phe Ala Pro Gly Arg Val Pro Gly Pro Glu Glu Ala Leu Val Thr Gln
405 410 415
Asp Arg Asp Trp Leu Asp Ser Arg Ala Ala Gly Glu Lys Arg Arg Cys
420 425 430
Ser Ala Pro Asp Arg Ala Leu Val Glu Leu Asn Ser Gly Val Val Ser
435 440 445
Glu Val Leu Leu Phe Gly Val Pro Asp Leu Glu Arg Arg Thr Ile Ser
450 455 460
Pro Val Ala Trp Ser Ser Gly Glu Leu Val Arg Arg Glu Pro Ile Phe
465 470 475 480
Ala Asn Pro Arg His Pro Asn Phe Lys Gln Ile Leu Val Gln Gly Tyr
485 490 495
Val Leu Asp Ser His Phe Pro Asp Cys Pro Leu Gln Pro His Leu Val
500 505 510
Thr Phe Leu Gly Arg His Arg Ala Gly Ala Glu Glu Gly Val Thr Phe
515 520 525
Arg Leu Glu Asp Gly Arg Gly Ala Pro Ala Gly Arg Gly Gly Ala Pro
530 535 540
Gly Pro Ala Lys Ala Ser Ile Leu Pro Asp Gln Ala Val Pro Ile Ala
545 550 555 560
Leu Ile Ile Thr Pro Val Arg Val Glu Pro Gly Ile Tyr Arg Asp Ile
565 570 575
Arg Arg Asn Ser Arg Leu Ala Phe Asp Asp Thr Leu Ala Lys Leu Trp
580 585 590
Ala Ser Arg Ser Pro Gly Arg Gly Pro Ala Ala Ala Asp Thr Thr Ser
595 600 605
Ser Ser Pro Thr Ala Gly Arg Ser Ser Arg
610 615
(2) INFORMATION FOR SEQ ID NO: 228:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 516 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228:
Met Asp Glu Ser Gly Arg Gln Arg Pro Ala Ser His Val Ala Ala Asp
1 5 10 15
Ile Ser Pro Gln Gly Ala His Arg Arg Ser Phe Lys Ala Trp Leu Ala
20 25 30
Ser Tyr Ile His Ser Leu Ser Arg Arg Ala Ser Gly Arg Pro Ser Gly
35 40 45
Pro Ser Pro Arg Asp Gly Ala Val Ser Gly Ala Arg Pro Gly Ser Arg
50 55 60
Arg Arg Ser Ser Phe Arg Glu Arg Leu Arg Ala Gly Leu Ser Arg Trp
65 70 75 80
Arg Val Ser Arg Ser Ser Arg Arg Arg Ser Ser Pro Glu Ala Pro Gly
85 90 95
Pro Ala Ala Lys Leu Arg Arg Pro Pro Leu Arg Arg Ser Glu Thr Ala
100 105 110
Met Thr Ser Pro Pro Ser Pro Pro Ser His Ile Leu Ser Leu Ala Arg
115 120 125
Ile His Lys Leu Cys Ile Pro Val Phe Ala Val Asn Pro Ala Leu Arg
130 135 140
Tyr Thr Thr Leu Glu Ile Pro Gly Ala Arg Ser Phe Gly Gly Ser Gly
145 150 155 160
Gly Tyr Gly Glu Val Gln Leu Ile Arg Glu His Lys Leu Ala Val Lys
165 170 175
Thr Ile Arg Glu Lys Glu Trp Phe Ala Val Glu Leu Val Ala Thr Leu
180 185 190
Leu Val Gly Glu Cys Ala Leu Arg Gly Gly Arg Thr His Asp Ile Arg
195 200 205
Gly Phe Ile Thr Pro Leu Gly Phe Ser Leu Gln Gln Arg Gln Ile Val
210 215 220
Phe Pro Ala Tyr Asp Met Asp Leu Gly Lys Tyr Ile Gly Gln Leu Ala
225 230 235 240
Ser Leu Arg Ala Thr Thr Pro Ser Val Ala Thr Ala Leu His His Cys
245 250 255
Phe Thr Asp Leu Ala Arg Ala Val Val Phe Leu Asn Thr Arg Cys Gly
260 265 270
Ile Ser His Leu Asp Ile Lys Cys Ala Asn Val Leu Val Met Leu Arg
275 280 285
Ser Asp Ala Val Ser Leu Arg Arg Ala Val Leu Ala Asp Phe Ser Leu
290 295 300
Val Thr Leu Asn Ser Asn Ser Thr Ile Ser Arg Gly Gln Phe Cys Leu
305 310 315 320
Gln Glu Pro Asp Leu Glu Ser Pro Arg Gly Phe Gly Met Pro Ala Ala
325 330 335
Leu Thr Thr Ala Asn Phe His Thr Leu Val Gly His Gly Tyr Asn Gln
340 345 350
Pro Pro Glu Leu Leu Val Lys Tyr Leu Asn Asn Glu Arg Ala Glu Phe
355 360 365
Asn Asn Arg Pro Leu Lys His Asp Val Gly Leu Ala Val Asp Leu Tyr
370 375 380
Ala Leu Gly Gln Thr Leu Leu Glu Leu Leu Val Ser Val Tyr Val Ala
385 390 395 400
Pro Ser Leu Gly Val Pro Val Thr Arg Val Pro Gly Tyr Gln Tyr Phe
405 410 415
Asn Asn Gln Leu Ser Pro Asp Phe Ala Val Leu Ala Tyr Arg Cys Val
420 425 430
Leu His Pro Ala Leu Phe Val Asn Ser Ala Glu Thr Asn Thr His Gly
435 440 445
Leu Ala Tyr Asp Val Pro Glu Gly Ile Arg Arg His Leu Arg Asn Pro
450 455 460
Lys Ile Arg Arg Ala Phe Thr Glu Gln Cys Ile Asn Tyr Gln Arg Thr
465 470 475 480
His Lys Ala Val Leu Ser Ser Val Ser Leu Pro Pro Glu Leu Arg Pro
485 490 495
Leu Leu Val Leu Val Ser Arg Leu Cys His Ala Asn Pro Ala Ala Arg
500 505 510
His Ser Leu Ser
515
(2) INFORMATION FOR SEQ ID NO: 229:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 217 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229:
Met Ser Arg Asp Ala Ser His Ala Ala Leu Arg Arg Arg Leu Ala Glu
1 5 10 15
Thr His Leu Arg Ala Glu Val Tyr Arg Asp Gln Thr Leu Gln Leu His
20 25 30
Arg Glu Gly Val Ser Thr Gln Asp Pro Arg Phe Val Gly Ala Phe Met
35 40 45
Ala Ala Lys Ala Ala His Leu Glu Leu Glu Ala Arg Leu Lys Ser Arg
50 55 60
Ala Arg Leu Glu Met Met Arg Gln Arg Ala Thr Cys Val Lys Ile Arg
65 70 75 80
Val Glu Glu Gln Ala Ala Arg Arg Asp Phe Leu Thr Ala His Arg Arg
85 90 95
Tyr Leu Asp Pro Ala Leu Ser Leu Asp Ala Ala Asp Asp Arg Leu Ala
100 105 110
Asp Gln Glu Glu Gln Leu Glu Glu Ala Ala Ala Asn Ala Ser Leu Trp
115 120 125
Gly Asp Gly Asp Leu Ala Asp Gly Trp Met Ser Pro Gly Asp Ser Asp
130 135 140
Leu Leu Val Met Trp Gln Leu Thr Ser Ala Pro Lys Val His Thr Asp
145 150 155 160
Ala Pro Ser Arg Pro Gly Ser Arg Pro Thr Tyr Thr Pro Ser Ala Ala
165 170 175
Gly Arg Pro Asp Ala Gln Ala Ala Pro Pro Pro Glu Thr Ala Pro Ser
180 185 190
Pro Glu Pro Ala Pro Gly Pro Ala Ala Asp Pro Ala Ser Gly Ser Gly
195 200 205
Phe Ala Arg Asp Cys Pro Asp Gly Glu
210 215
(2) INFORMATION FOR SEQ ID NO: 230:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 430 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230:
Met Phe Gly Gin Gin Leu Ala Ser Asp Val Gin Gin Tyr Leu Glu Arg
1 5 10 15
Leu Glu Lys Gin Arg Gin Gin Lys Val Gly Val Asp Glu Ala Ser Ala 20 25 30
Gly Leu Thr Leu Gly Gly Asp Ala Leu Arg Val Pro Phe Leu Asp Phe
35 -40 45
Ala Thr Ala Thr Pro Lys Arg His Gin Thr Val Val Pro Gly Val Gly 50 55 60
Thr Leu His Asp Cys Cys Glu His Ser Pro Leu Phe Ser Ala Val Ala 65 70 75 80
Arg Arg Leu Leu Phe Asn Ser Leu Val Pro Ala Gin Leu Arg Gly Arg 85 90 95 Asp Phe Gly Gly Asp His Thr Ala Lys Leu Glu Phe Leu Ala Pro Glu 100 105 110
Leu Val Arg Ala Val Ala Arg Leu Arg Phe Arg Glu Cys Ala Pro Glu
115 120 125
Asp Ala Val Pro Gln Arg Asn Ala Tyr Tyr Ser Val Leu Asn Thr Phe 130 135 140
Gln Ala Leu His Arg Ser Glu Ala Phe Arg Gln Leu Val His Phe Val
145 150 155 160
Arg Asp Phe Ala Gln Leu Leu Lys Thr Ser Phe Arg Ala Ser Ser Leu
165 170 175 Ala Glu Thr Thr Gly Pro Pro Lys Lys Arg Ala Lys Val Asp Val Ala
180 185 190
Thr His Gly Gln Thr Tyr Gly Thr Leu Glu Leu Phe Gln Lys Met Ile
195 200 205
Leu Met His Ala Thr Tyr Phe Leu Ala Ala Val Leu Leu Gly Asp His 210 215 220
Ala Glu Gln Val Asn Thr Phe Leu Arg Leu Val Phe Glu Ile Pro Leu
225 230 235 240
Phe Ser Asp Thr Ala Val Arg His Phe Arg Gln Arg Ala Thr Val Phe
245 250 255 Leu Val Pro Arg Arg His Gly Lys Thr Trp Phe Leu Val Pro Leu Ile
260 265 270
Ala Leu Ser Leu Ala Ser Phe Arg Gly Ile Lys Ile Gly Tyr Thr Ala
275 280 285
His Ile Arg Lys Ala Thr Glu Pro Val Phe Asp Glu Ile Asp Ala Cys 290 295 300
Leu Arg Gly Trp Phe Gly Ser Ser Arg Val Asp His Val Lys Gly Glu 305 310 315 320
Thr Ile Ser Phe Ser Phe Pro Asp Gly Ser Arg Ser Thr Ile Val Phe 325 330 335 Ala Ser Ser His Asn Thr Asn Val Ser Thr Pro Ser Ser Arg Gly Ala 340 345 350
Cys Phe Pro Gly Ala Ala Leu Pro Glu Ile Asp Arg Gln Thr Asn Thr 355 360 365 Ala Arg Arg Glu Cys Gly Thr Trp Gln Pro Pro Pro Pro Trp Arg Gly
370 375 380
Glu Ala Leu Leu Phe Ile Cys Asn Arg Thr Met Arg Leu Trp Pro Arg 385 390 395 400 Pro Ala Arg Pro Arg Gly Ser Ser Leu Gln Thr Gly Gly Trp Tyr Thr
405 410 415
Met Thr Glu Arg Arg Gly Ala Thr Arg Arg Trp Ser Gly Gly 420 425 430
(2) INFORMATION FOR SEQ ID NO: 231:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 315 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231:
Val Trp Arg Val Val Arg Gly Asp Glu Arg Leu Lys Ile Phe Arg Cys
1 5 10 15
Leu Thr Val Leu Thr Glu Pro Leu Cys Gln Val Pro Asp Pro Asp Pro 20 25 30
Glu Arg Ala Leu Phe Cys Glu Ile Phe Leu Tyr Leu Trp Lys Ala Leu
35 40 45
Arg Leu Pro Ser Asn Thr Phe Phe Ala Ile Phe Phe Phe Asn Arg Glu 50 55 60 Arg Arg Tyr Cys Ala Thr Val His Leu Arg Ser Val Thr His Pro Arg 65 70 75 80
Thr Pro Leu Leu Cys Thr Leu Ala Phe Gly His Leu Glu Ala Asp Pro
85 90 95
Glu Glu Thr Pro Asp Pro Ala Ala Glu Gln Leu Ala Asp Glu Pro Val 100 105 110
Ala His Glu Leu Asp Gly Ala Tyr Leu Val Pro Thr Glu Pro Pro Pro
115 120 125
Asn Pro Gly Ala Cys Cys Ala Leu Gly Pro Gly Ala Trp Trp His Leu 130 135 140 Pro Gly Gly Arg Ile Tyr Cys Trp Ala Met Asp Asp Asp Leu Gly Ser 145 150 155 160
Leu Cys Pro Pro Gly Ser Arg Ala Arg His Leu Gly Trp Leu Leu Ser 165 170 175 Arg Ile Thr Asp Pro Pro Gly Gly Gly Gly Ala Cys Ala Pro Thr Ala
180 185 190
His Ile Asp Ser Ala Asn Ala Leu Trp Arg Ala Pro Ala Val Ala Glu 195 200 205 Ala Cys Pro Cys Val Ala Pro Cys Met Trp Ser Asn Met Ala Gln Arg 210 215 220
Thr Leu Ala Val Arg Gly Asp Ala Ser Leu Cys Gln Leu Leu Phe Gly 225 230 235 240
His Pro Val Asp Ala Val Ile Leu Arg Gln Ala Thr Arg Arg Pro Arg 245 250 255
Ile Thr Ala His Leu His Glu Val Val Val Gly Arg Asp Gly Ala Glu
260 265 270
Ser Val Ile Arg Pro Thr Ser Ala Gly Trp Arg Leu Cys Val Leu Ser 275 280 285 Ser Tyr Thr Ser Arg Leu Phe Ala Thr Ser Cys Pro Ala Val Ala Arg 290 295 300
Ala Val Ala Arg Ala Ser Ser Ser Asp Tyr Lys 305 310 315
(2) INFORMATION FOR SEQ ID NO: 232:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 698 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232:
Met Asn Ala His Phe Ala Asn Glu Val Gln Tyr Asp Leu Thr Arg Asp
1 5 10 15
Pro Ser Ser Pro Ala Ser Leu Ile His Val Ile Ile Ser Ser Glu Cys 20 25 30
Leu Ala Ala Ala Gly Val Pro Leu Ser Ala Leu Val Arg Gly Arg Pro
35 40 45
Asp Gly Gly Ala Ala Ala Asn Phe Arg Val Glu Thr Gln Thr Arg Ala 50 55 60 His Ala Thr Gly Asp Cys Thr Pro Trp Arg Ser Ala Phe Ala Ala Tyr 65 70 75 80
Val Pro Ala Asp Ala Val Gly Ala Ile Leu Ala Pro Val Ile Pro Ala 85 90 95 His Pro Asp Leu Leu Pro Arg Val Pro Ser Ala Gly Gly Leu Phe Val
100 105 110
Ser Leu Pro Val Ala Cys Asp Ala Gln Gly Val Tyr Asp Pro Tyr Thr 115 120 125 Val Ala Ala Leu Arg Leu Ala Trp Gly Pro Trp Ala Thr Cys Ala Arg 130 135 140
Val Leu Leu Phe Ser Tyr Asp Glu Leu Val Pro Pro Asn Thr Arg Tyr 145 150 155 160
Ala Ala Asp Gly Ala Arg Leu Met Arg Leu Cys Arg His Phe Cys Arg 165 170 175
Tyr Val Ala Arg Leu Gly Ala Ala Ala Pro Ala Ala Ala Thr Glu Ala
180 185 190
Ala Ala His Leu Ser Leu Gly Met Gly Glu Ser Gly Thr Pro Thr Pro 195 200 205 Gln Ala Ser Ser Val Ser Gly Gly Ala Gly Pro Ala Val Val Gly Thr 210 215 220
Pro Asp Pro Pro Ile Ser Pro Glu Glu Gln Leu Thr Ala Pro Gly Gly 225 230 235 240
Asp Thr Ala Thr Ala Glu Asp Val Ser Ile Thr Gln Glu Asn Glu Glu 245 250 255
Ile Leu Ala Leu Val Gln Arg Ala Val Gln Asp Val Thr Arg Arg His
260 265 270
Pro Val Arg Ala Arg Pro Lys His Ala Ala Ser Gly Val Ala Ser Gly 275 280 285 Leu Arg Gln Gly Ala Leu Val His Gln Ala Val Ser Gly Gly Ala Leu 290 295 300
Gly Ala Ser Asp Ala Glu Ala Val Leu Ala Gly Leu Glu Pro Pro Gly 305 310 315 320
Gly Gly Arg Phe Ala Thr Pro Gly Gly Pro Arg Ala Ala Gly Glu Asp 325 330 335
Val Leu Asn Asp Val Leu Thr Leu Val Pro Gly Thr Ala Lys Pro Arg
340 345 350
Ser Leu Val Glu Trp Leu Asp Arg Gly Trp Glu Ala Gly Gly Asp Arg 355 360 365 Pro Asp Trp Leu Trp Ser Arg Arg Ser Ile Ser Val Val Leu Arg His 370 375 380
His Tyr Gly Thr Lys Gln Arg Phe Val Val Val Ser Tyr Glu Asn Ser 385 390 395 400
Val Ala Trp Gly Gly Arg Arg Ala Arg Pro Pro Arg Leu Ser Ser Glu 405 410 415
Leu Ala Thr Ala Leu Thr Glu Ala Cys Ala Ala Glu Arg Val Val Arg
420 425 430
Pro His Gln Leu Ser Pro Ala Ala Gln Thr Ala Leu Leu Arg Arg Phe 435 440 445
Pro Ala Leu Glu Gly Pro Leu Arg His Pro Arg Pro Val Leu Gln Pro
450 455 460
Phe Asp Ile Ala Ala Glu Val Ala Phe Val Ala Arg Ile Gln Ile Ala 465 470 475 480
Cys Leu Arg Ala Leu Gly His Ser Ile Arg Ala Ala Leu Gln Gly Gly
485 490 495
Pro Arg Ile Phe Gln Arg Leu Arg Tyr Asp Phe Gly Pro His Gln Ser 500 505 510 Glu Trp Leu Gly Glu Val Thr Arg Arg Phe Pro Val Leu Leu Glu Asn 515 520 525
Leu Met Arg Ala Leu Glu Gly Thr Ala Pro Asp Ala Phe Phe His Thr
530 535 540
Ala Tyr Ala Val Leu Ala His Leu Gly Gly Gln Gly Gly Arg Gly Arg 545 550 555 560
Arg Arg Arg Leu Val Pro Leu Ser Asp Asp Ile Pro Ala Arg Phe Ala
565 570 575
Asp Ser Asp Ala His Tyr Ala Phe Asp Tyr Tyr Ser Thr Ser Gly Asp 580 585 590 Thr Leu Arg Leu Thr Asn Arg Pro Ile Ala Val Val Ile Asp Gly Asp 595 600 605
Val Asn Gly Arg Glu Gln Ser Lys Cys Arg Phe Met Glu Gly Ser Pro
610 615 620
Ser Thr Ala Pro His Arg Val Cys Glu Gln Tyr Leu Pro Gly Glu Ser 625 630 635 640
Tyr Ala Tyr Leu Cys Leu Gly Phe Asn Arg Arg Leu Cys Gly Leu Val
645 650 655
Val Phe Pro Gly Gly Phe Ala Phe Thr Ile Asn Thr Ala Ala Tyr Leu 660 665 670 Ser Leu Ala Asp Pro Val Ala Arg Ala Val Gly Leu Arg Phe Cys Arg 675 680 685
Gly Ala Ala Thr Gly Pro Gly Leu Val Arg 690 695
(2) INFORMATION FOR SEQ ID NO: 233:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 423 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233:
Val Pro Glu Gly Ala Trp Val Gly Gly Ala Cys Ala Arg Pro Arg Gly 1 5 10 15
Pro Arg Ala His Val Arg Leu Tyr Ala Val Cys Phe Val Cys Pro Gln
20 25 30
Gly Ile Arg Gly Gln Asp Phe Asn Leu Leu Phe Val Asp Glu Ala Asn 35 40 45 Phe Ile Arg Pro Asp Ala Val Gln Thr Ile Met Gly Phe Leu Asn Gln 50 55 60
Ala Asn Cys Lys Ile Ile Phe Val Ser Ser Thr Asn Thr Gly Lys Ala 65 70 75 80
Ser Thr Ser Phe Leu Tyr Asn Leu Arg Gly Ala Ala Asp Glu Leu Leu 85 90 95
Asn Val Val Thr Tyr Ile Cys Asp Asp His Met Pro Arg Val Val Thr
100 105 110
His Thr Asn Ala Thr Ala Cys Ser Cys Tyr Ile Leu Asn Lys Pro Val 115 120 125 Phe Ile Thr Met Asp Gly Ala Val Arg Arg Thr Ala Asp Leu Phe Leu 130 135 140
Pro Asp Ser Phe Met Gln Glu Ile Ile Gly Gly Gln Ala Arg Glu Thr 145 150 155 160
Gly Asp Asp Arg Pro Val Leu Thr Lys Ser Ala Gly Glu Arg Phe Leu 165 170 175
Leu Tyr Arg Pro Ser Thr Thr Thr Asn Ser Gly Leu Met Ala Pro Glu
180 185 190
Leu Tyr Val Tyr Val Asp Pro Ala Phe Thr Ala Asn Thr Arg Ala Ser 195 200 205 Gly Thr Gly Ile Ala Val Val Gly Arg Tyr Arg Asp Asp Phe Ile Ile 210 215 220
Phe Ala Leu Glu His Phe Phe Leu Arg Ala Leu Thr Gly Ser Ala Pro 225 230 235 240
Ala Asp Ile Ala Arg Cys Val Val His Ser Leu Ala Gln Val Leu Ala 245 250 255
Leu His Pro Gly Ala Phe Arg Ser Val Arg Val Ala Val Glu Gly Asn
260 265 270
Ser Ser Gln Asp Ser Ala Val Ala Ile Ala Thr His Val His Thr Glu 275 280 285 Met His Arg Ile Leu Ala Ser Ala Gly Ala Asn Gly Pro Gly Pro Glu 290 295 300
Leu Leu Phe Tyr His Cys Glu Pro Pro Gly Gly Ala Val Leu Tyr Pro 305 310 315 320 Phe Phe Leu Leu Asn Lys Gln Lys Thr Pro Ala Phe Glu Tyr Phe Ile
325 330 335
Lys Lys Phe Asn Ser Gly Gly Val Met Ala Ser Gln Glu Leu Val Ser
340 345 350
Val Thr Val Arg Leu Gln Thr Asp Pro Val Glu Tyr Leu Ser Glu Gln
355 360 365
Leu Asn Asn Leu Ile Glu Thr Val Ser Pro Asn Thr Asp Val Arg Met
370 375 380
Tyr Ser Gly Lys Arg Asn Gly Ala Ala Asp Asp Leu Met Val Ala Val 385 390 395 400 Ile Met Ala Ile Tyr Leu Ala Ala Pro Thr Gly Ile Pro Pro Ala Phe
405 410 415
Phe Pro Ile Thr Arg Thr Ser 420
(2) INFORMATION FOR SEQ ID NO: 234:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 312 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234:
Met Ile Thr Asp Cys Phe Glu Ala Asp Ile Ala Ile Pro Ser Gly Ile 1 5 10 15 Ser Arg Pro Asp Ala Ala Ala Leu Gln Arg Cys Glu Gly Arg Val Val 20 25 30
Phe Leu Pro Thr Ile Arg Arg Gln Leu Ala Asp Val Ala His Glu Ser
35 40 45
Phe Val Ser Gly Gly Val Ser Pro Asp Thr Leu Gly Leu Leu Leu Ala 50 55 60
Tyr Arg Arg Arg Phe Pro Ala Val Ile Thr Arg Val Leu Pro Thr Arg 65 70 75 80
Ile Val Ala Cys Pro Val Asp Leu Gly Leu Thr His Ala Gly Thr Val 85 90 95 Asn Leu Arg Asn Thr Ser Pro Val Asp Leu Cys Asn Gly Asp Pro Val 100 105 110
Ser Leu Val Pro Pro Val Phe Glu Gly Gln Ala Thr Asp Val Arg Leu 115 120 125 Glu Ser Leu Asp Leu Thr Leu Arg Phe Pro Val Pro Leu Pro Thr Pro
130 135 140
Leu Ala Arg Glu Ile Val Ala Arg Leu Val Arg Ile Arg Asp Leu Asn 145 150 155 160 Pro Asp Pro Arg Thr Pro Gly Glu Leu Pro Asp Leu Asn Val Leu Tyr
165 170 175 Tyr Asn Gly Ala Arg Leu Ser Leu Val Ala Asp Val Gln Gln Leu Ala 180 185 190
Ser Val Asn Thr Glu Leu Arg Ser Leu Val Leu Asn Met Val Tyr Ser 195 200 205
Ile Thr Glu Gly Thr Thr Leu Ile Leu Thr Leu Ile Pro Arg Leu Leu
210 215 220
Ala Leu Ser Ala Gln Asp Gly Tyr Val Asn Ala Leu Leu Gln Met Gln 225 230 235 240 Ser Val Thr Arg Glu Ala Ala Gln Leu Ile His Pro Glu Ala Pro Met
245 250 255 Leu Met Gln Asp Gly Glu Arg Arg Leu Pro Leu Tyr Glu Ala Leu Val 260 265 270
Ala Trp Leu Ala His Ala Gly Gln Leu Gly Asp Ile Leu Ala Pro Ala 275 280 285
Val Arg Val Cys Thr Phe Asp Gly Ala Ala Val Val Gln Ser Gly Asp
290 295 300
Met Ala Pro Val Ile Arg Tyr Pro 305 310
(2) INFORMATION FOR SEQ ID NO: 235:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 222 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235:
Met Thr Met Arg Asp Asp Val Pro Leu Leu Asp Arg Glu Leu Val Tyr 1 5 10 15 Glu Ala Ala Cys Gly Gly Glu Asp Gly Glu Leu Pro Leu Asp Glu Gln 20 25 30
Phe Ser Leu Ser Ser Tyr Gly Thr Ser Asp Phe Phe Val Ser Ser Ala 35 40 45 Tyr Ser Arg Leu Pro Pro His Thr Gln Pro Val Phe Ser Lys Arg Val
50 55 60
Val Met Phe Ala Trp Ser Phe Leu Val Leu Lys Pro Leu Glu Leu Val 65 70 75 80 Ala Ala Gly Met Tyr Tyr Gly Trp Thr Gly Arg Ala Val Ala Pro Ala
85 90 95
Cys Ile Ile Ala Ala Val Leu Ala Tyr Tyr Val Thr Trp Leu Ala Arg
100 105 110
Ala Leu Leu Leu Tyr Val Asn Ile Lys Arg Asp Arg Leu Pro Leu Ser 115 120 125
Pro Pro Val Phe Trp Gly Leu Cys Val Ile Met Gly Gly Ala Ala Leu
130 135 140
Cys Ala Leu Val Ala Ala Ala His Glu Thr Phe Ser Pro Asp Gly Leu 145 150 155 160 Phe His Trp Ile Thr Ala Ser Gln Leu Leu Pro Arg Thr Asp Pro Leu
165 170 175
Arg Ala Arg Ser Leu Gly Ile Ala Cys Ala Ala Gly Ala Ala Met Trp
180 185 190
Val Ala Ala Ala Asp Cys Phe Ala Ala Phe Thr Asn Phe Phe Leu Ala 195 200 205
Arg Phe Trp Thr Arg Ala Ile Leu Lys Ala Pro Val Ala Phe 210 215 220
(2) INFORMATION FOR SEQ ID NO: 236:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 824 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236:
Met Gly Pro Gly Leu Trp Val Val Met Gly Val Leu Val Gly Val Ala
1 5 10 15
Gly Gly His Asp Thr Tyr Trp Thr Glu Gln Ile Asp Pro Trp Phe Leu 20 25 30 His Gly Leu Gly Leu Ala Arg Thr Tyr Trp Arg Asp Thr Asn Thr Gly 35 40 45
Arg Leu Trp Leu Pro Asn Thr Pro Asp Ala Ser Asp Pro Gln Arg Gly 50 55 60 Arg Leu Ala Pro Pro Gly Glu Leu Asn Leu Thr Thr Ala Ser Val Pro 65 70 75 80
Met Leu Arg Trp Tyr Ala Glu Arg Phe Cys Phe Val Leu Val Thr Thr 85 90 95 Ala Glu Phe Pro Arg Asp Pro Gly Gln Leu Leu Tyr Ile Pro Lys Thr 100 105 110
Tyr Leu Leu Gly Arg Pro Arg Asn Ala Ser Leu Pro Glu Leu Pro Glu
115 120 125
Ala Gly Pro Thr Ser Arg Pro Pro Ala Glu Val Thr Gln Leu Lys Gly 130 135 140
Leu Ser His Asn Pro Gly Ala Ser Ala Leu Leu Arg Ser Arg Ala Trp
145 150 155 160
Val Thr Phe Ala Ala Ala Pro Asp Arg Glu Gly Leu Thr Phe Pro Arg
165 170 175 Gly Asp Asp Gly Ala Thr Glu Arg His Pro Asp Gly Arg Arg Asn Ala
180 185 190
Pro Pro Pro Gly Pro Pro Ala Gly Thr Pro Arg His Pro Thr Thr Asn
195 200 205
Leu Ser Ile Ala His Leu His Asn Ala Ser Val Thr Trp Leu Ala Arg 210 215 220
Leu Leu Arg Thr Pro Gly Arg Tyr Val Tyr Leu Ser Pro Ser Ala Ser
225 230 235 240
Thr Trp Pro Val Gly Val Trp Thr Thr Gly Gly Leu Ala Phe Gly Cys
245 250 255 Asp Ala Ala Leu Val Arg Ala Arg Tyr Gly Lys Gly Phe Met Gly Leu
260 265 270
Val Ile Ser Met Arg Asp Ser Pro Pro Ala Glu Ile Ile Val Val Pro
275 280 285
Ala Asp Lys Thr Leu Ala Arg Val Gly Asn Pro Thr Asp Glu Asn Ala 290 295 300
Pro Ala Val Leu Pro Gly Pro Pro Ala Gly Pro Arg Tyr Arg Val Phe
305 310 315 320
Val Leu Gly Ala Pro Thr Pro Ala Asp Asn Gly Ser Ala Leu Asp Ala
325 330 335 Leu Arg Arg Val Ala Gly Tyr Pro Glu Glu Ser Thr Asn Tyr Ala Gln
340 345 350
Tyr Met Ser Arg Ala Tyr Ala Glu Phe Leu Gly Glu Asp Pro Gly Ser
355 360 365
Gly Thr Asp Ala Arg Pro Ser Leu Phe Trp Arg Leu Ala Gly Leu Leu 370 375 380
Ala Ser Ser Gly Phe Ala Phe Val Asn Ala Ala His Ala His Asp Ala 385 390 395 400
Ile Arg Leu Ser Asp Leu Leu Gly Phe Leu Ala His Ser Arg Val Leu 405 410 415
Ala Gly Leu Ala Arg Ala Ala Gly Cys Ala Ala Asp Ser Val Phe Leu
420 425 430
Asn Val Ser Val Leu Asp Pro Ala Ala Arg Leu Arg Leu Glu Ala Arg 435 440 445
Leu Gly His Leu Val Ala Ala Ile Arg Glu Gln Ser Leu Ala Ala His
450 455 460
Ala Leu Gly Tyr Gln Leu Ala Phe Val Leu Asp Ser Pro Ala Ala Tyr 465 470 475 480 Gly Ala Val Ala Pro Ser Ala Ala Arg Leu Ile Asp Ala Leu Tyr Ala
485 490 495
Glu Phe Leu Gly Gly Arg Ala Leu Thr Ala Pro Met Val Arg Arg Ala
500 505 510
Leu Phe Tyr Ala Thr Ala Val Leu Arg Ala Pro Phe Leu Ala Gly Ala 515 520 525
Pro Ser Ala Glu Gln Arg Glu Arg Ala Arg Arg Gly Leu Leu Ile Thr
530 535 540
Thr Ala Leu Cys Thr Ser Asp Val Ala Ala Ala Thr His Ala Asp Leu 545 550 555 560 Arg Ala Ala Arg Thr Asp His Gln Lys Asn Leu Phe Trp Leu Pro Asp
565 570 575
His Phe Ser Pro Cys Ala Ala Ser Leu Arg Phe Asp Leu Ala Glu Gly
580 585 590
Gly Phe Ile Leu Asp Ala Met Ala Thr Arg Ser Asp Ile Pro Ala Asp 595 600 605
Val Met Ala Gln Gln Thr Arg Gly Val Ala Ser Val Leu Thr Arg Trp
610 615 620
Ala His Tyr Asn Ala Leu Ile Arg Ala Phe Val Pro Glu Ala Thr His 625 630 635 640 Gln Cys Ser Gly Pro Ser His Asn Ala Glu Pro Arg Ile Leu Val Pro
645 650 655
Ile Thr His Asn Ala Ser Tyr Val Val Thr His Thr Pro Leu Pro Arg
660 665 670
Gly Ile Gly Tyr Lys Leu Thr Gly Val Asp Val Arg Arg Pro Leu Phe 675 680 685
Ile Thr Tyr Leu Thr Ala Thr Cys Glu Gly His Ala Arg Glu Ile Glu
690 695 700
Pro Lys Arg Leu Val Arg Thr Glu Asn Arg Arg Asp Leu Gly Leu Val 705 710 715 720 Gly Ala Val Phe Leu Arg Tyr Thr Pro Ala Gly Glu Val Met Ser Val
725 730 735
Leu Leu Val Asp Thr Asp Ala Thr Gln Gln Gln Leu Ala Gln Gly Pro 740 745 750 Val Ala Gly Thr Pro Asn Val Phe Ser Ser Asp Val Pro Ser Val Leu
755 760 765
Leu Phe Pro Asn Gly Thr Val Ile His Leu Leu Ala Phe Asp Thr Leu
770 775 780 Pro Ile Ala Thr Ile Ala Pro Gly Phe Leu Ala Ala Ser Ala Leu Gly
785 790 795 800
Val Val Met Ile Thr Ala Ala Gly Ile Leu Arg Val Val Arg Thr Cys
805 810 815
Val Pro Phe Leu Trp Arg Arg Glu 820
(2) INFORMATION FOR SEQ ID NO: 237:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 370 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237:
Met Ala Ser His Ala Gly Gln Gln His Ala Pro Ala Phe Gly Gln Ala 1 5 10 15
Ala Arg Ala Ser Gly Pro Thr Asp Gly Arg Ala Ala Ser Arg Pro Ser
20 25 30
His Arg Gln Gly Ala Ser Asp Pro Glu Leu Pro Thr Leu Leu Arg Val 35 40 45 Tyr Ile Asp Gly Pro His Gly Val Gly Lys Thr Thr Thr Ser Ala Gln 50 55 60
Leu Met Glu Ala Leu Gly Pro Arg Asp Asn Ile Val Tyr Val Pro Glu 65 70 75 80
Pro Met Thr Tyr Trp Gln Val Leu Gly Ala Ser Glu Thr Leu Thr Asn 85 90 95
Ile Tyr Asn Thr Gln His Arg Leu Asp Arg Gly Glu Ile Ser Ala Gly
100 105 110
Glu Ala Ala Val Val Met Thr Ser Ala Gln Ile Thr Met Ser Thr Pro 115 120 125 Tyr Ala Ala Thr Asp Ala Val Leu Ala Pro His Ile Gly Gly Glu Ala 130 135 140
Val Gly Pro Gln Ala Pro Pro Pro Ala Leu Thr Leu Val Phe Asp Arg 145 150 155 160 His Pro Ile Ala Ser Leu Leu Cys Tyr Pro Ala Ala Arg Tyr Leu Met
165 170 175
Gly Ser Met Thr Pro Gln Ala Val Leu Ala Phe Val Met Pro Pro Thr 180 185 190 Ala Pro Gly Thr Asn Leu Val Leu Gly Val Leu Pro Glu Ala Glu His 195 200 205
Ala Asp Arg Leu Ala Arg Arg Gln Arg Pro Gly Glu Arg Leu Asp Leu
210 215 220
Ala Met Leu Ser Ala Ile Arg Arg Val Tyr Asp Leu Leu Ala Asn Thr 225 230 235 240
Val Arg Tyr Leu Gln Arg Gly Gly Arg Trp Arg Glu Asp Trp Gly Arg
245 250 255
Leu Thr Gly Val Ala Ala Ala Thr Pro Arg Pro Asp Pro Glu Asp Gly 260 265 270 Ala Gly Ser Leu Pro Arg Ile Glu Asp Thr Leu Phe Ala Leu Phe Arg 275 280 285
Val Pro Glu Leu Leu Ala Pro Asn Gly Asp Leu Tyr His Ile Phe Ala
290 295 300
Trp Val Leu Asp Val Leu Ala Asp Arg Leu Leu Pro Met His Leu Phe 305 310 315 320
Val Leu Asp Tyr Asp Gln Ser Pro Val Gly Cys Arg Asp Ala Leu Leu
325 330 335
Arg Leu Thr Ala Gly Met Ile Pro Thr Arg Val Thr Thr Ala Gly Ser 340 345 350 Ile Ala Glu Ile Arg Asp Leu Ala Arg Thr Phe Ala Arg Glu Val Gly 355 360 365
Gly Val 370
(2) INFORMATION FOR SEQ ID NO: 238:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 279 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238:
Met Ala Arg Thr Gly Arg Arg Ala Ala Val Gly Arg Pro Ala Arg Thr 1 5 10 15 Ser Ser Leu Thr Glu Arg Arg Arg Val Leu Leu Ala Gly Val Arg Ser
20 25 30
His Thr Arg Phe Tyr Lys Ala Phe Ala Arg Glu Val Arg Glu Phe Asn 35 40 45 Ala Thr Arg Ile Cys Gly Thr Leu Leu Thr Leu Met Ser Gly Ser Leu 50 55 60
Gln Gly Arg Ser Leu Phe Glu Ala Thr Arg Val Thr Leu Ile Cys Glu 65 70 75 80
Val Asp Leu Gly Pro Arg Arg Pro Asp Cys Ile Cys Val Phe Glu Phe 85 90 95
Ala Asn Asp Lys Thr Leu Gly Gly Val Cys Val Ile Leu Lys Thr Cys
100 105 110
Lys Ser Ile Ser Ser Gly Asp Thr Ala Ser Lys Arg Glu Gln Arg Thr 115 120 125 Thr Gly Met Lys Gln Leu Arg His Ser Leu Lys Leu Leu Gln Ser Leu 130 135 140
Ala Pro Pro Gly Asp Lys Val Val Tyr Leu Cys Pro Ile Leu Val Phe 145 150 155 160
Val Ala Gln Arg Thr Leu Arg Val Ser Arg Val Thr Arg Leu Val Pro 165 170 175
Gln Lys Ile Ser Gly Asn Ile Thr Ala Ala Val Arg Met Leu Gln Ser
180 185 190
Leu Ser Thr Tyr Ala Val Pro Pro Glu Pro Gln Thr Arg Arg Ser Arg 195 200 205 Arg Arg Val Ala Ala Thr Ala Arg Pro Gln Arg Pro Pro Ser Pro Thr 210 215 220
Arg Asp Pro Glu Gly Thr Ala Gly His Pro Ala Pro Pro Glu Ser Asp 225 230 235 240
Pro Pro Ser Pro Gly Val Val Gly Val Ala Ala Glu Gly Gly Gly Val 245 250 255
Leu Gln Lys Ile Ala Ala Leu Phe Cys Val Pro Val Ala Ala Lys Ser
260 265 270
Arg Pro Arg Thr Lys Thr Glu 275
(2) INFORMATION FOR SEQ ID NO: 239:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 571 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239:
Met Asp Pro Tyr Tyr Pro Phe Asp Ala Leu Asp Val Trp Glu His Arg 1 5 10 15
Arg Phe Ile Val Ala Asp Ser Arg Ser Phe Ile Thr Pro Glu Phe Pro
20 25 30
Arg Asp Phe Trp Met Leu Pro Val Phe Asn Ile Pro Arg Glu Thr Ala 35 40 45
Ala Glu Arg Ala Ala Val Leu Gln Ala Gln Arg Thr Ala Ala Ala Ala
50 55 60
Ala Leu Glu Asn Ala Ala Leu Gln Ala Ala Glu Leu Pro Val Asp Ile 65 70 75 80 Glu Arg Arg Ile Arg Pro Ile Glu Gln Gln Val His His Ile Ala Asp
85 90 95
Ala Leu Glu Ala Leu Glu Thr Ala Ala Ala Ala Ala Glu Glu Ala Asp
100 105 110
Ala Ala Arg Asp Ala Glu Arg Glu Gly Ala Ala Asp Gly Ala Ala Pro 115 120 125
Ser Pro Thr Ala Gly Pro Ala Ala Ala Glu Met Glu Val Gln Ile Val
130 135 140
Arg Asn Asp Pro Pro Leu Arg Tyr Asp Thr Asn Leu Pro Val Asp Leu 145 150 155 160 Leu His Met Val Tyr Ala Gly Arg Gly Ala Ala Gly Ser Ser Gly Val
165 170 175
Val Phe Gly Thr Trp Tyr Arg Thr Ile Gln Glu Arg Thr Ile Ala Asp
180 185 190
Phe Pro Leu Thr Thr Arg Ser Ala Asp Phe Arg Asp Gly Arg Met Ser 195 200 205
Lys Thr Phe Met Thr Ala Leu Val Leu Ser Leu Gln Ser Cys Gly Arg
210 215 220
Leu Tyr Val Gly Gln Arg His Tyr Ser Ala Phe Glu Cys Ala Val Leu 225 230 235 240 Cys Leu Tyr Leu Leu Tyr Arg Thr Thr His Glu Ser Ser Pro Asp Arg
245 250 255
Asp Arg Ala Pro Val Ala Phe Gly Asp Leu Leu Ala Arg Leu Pro Arg
260 265 270
Tyr Leu Ala Arg Leu Ala Ala Val Ile Gly Asp Glu Ser Gly Arg Pro 275 280 285
Gln Tyr Arg Tyr Arg Asp Asp Lys Leu Pro Lys Ala Gln Phe Ala Ala
290 295 300
Ala Gly Gly Arg Tyr Glu His Gly Ala Thr His Val Val Ile Ala Thr 305 310 315 320
Leu Val Arg His Gly Val Leu Pro Ala Ala Pro Gly Asp Val Pro Arg
325 330 335
Asp Thr Ser Thr Arg Val Asn Pro Asp Asp Val Ala His Arg Asp Asp 340 345 350
Val Asn Arg Ala Ala Ala Ala Phe Leu Arg His Asn Leu Phe Leu Trp
355 360 365
Glu Asp Gln Thr Leu Leu Arg Ala Thr Ala Asn Thr Ile Thr Ala Val
370 375 380 Leu Arg Arg Leu Leu Ala Asn Gly Asn Val Tyr Ala Asp Arg Leu Asp
385 390 395 400
Asn Arg Leu Gln Leu Gly Met Leu Ile Pro Gly Ala Val Pro Ala Glu
405 410 415
Ala Ile Arg Ala Ser Gly Leu Asp Ser Gly Ala Ile Lys Ser Gly Asp 420 425 430
Asn Asn Leu Glu Ala Leu Cys Val Asn Tyr Val Leu Pro Leu Tyr Gln
435 440 445
Ala Asp Pro Thr Val Glu Leu Thr Gln Leu Phe Pro Gly Leu Ala Ala
450 455 460 Leu Cys Leu Asp Ala Gin Ala Gly Arg Pro Leu Ala Ser Thr Arg Arg
465 470 475 480
Val Val Asp Met Ser Ser Gly Ala Arg Gln Ala Ala Leu Val Arg Leu
485 490 495
Thr Ala Leu Glu Leu Ile Asn Arg Thr Arg Thr Asn Thr Thr Pro Val 500 505 510
Gly Glu Ile Ile Asn Ala His Asp Ala Leu Gly Ile Gln Tyr Glu Gln
515 520 525
Gly Leu Gly Leu Leu Ala Gln Gln Ala Arg Ile Gln Ala Lys Arg Phe 530 535 540 Ala Thr Phe Asn Val Gly Ser Asp Tyr Asp Leu Leu Tyr Phe Leu Cys 545 550 555 560
Leu Gly Phe Ile Pro Gln Tyr Leu Ser Val Ala 565 570
(2) INFORMATION FOR SEQ ID NO: 240:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 651 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240:
Met Ala Ser Ala Glu Met Arg Glu Arg Leu Glu Ala Pro Leu Pro Asp 1 5 10 15
Arg Ala Val Pro Ile Tyr Val Ala Gly Phe Leu Ala Leu Tyr Asp Ser
20 25 30
Gly Asp Pro Gly Glu Leu Ala Leu Asp Pro Asp Thr Val Arg Ala Ala 35 40 45 Leu Pro Pro Glu Asn Pro Leu Pro Ile Asn Val Asp His Arg Ala Arg 50 55 60
Cys Glu Val Gly Arg Val Leu Ala Val Val Asn Asp Pro Arg Gly Pro 65 70 75 80
Phe Phe Val Gly Leu Ile Ala Cys Val Gln Leu Glu Arg Val Leu Glu 85 90 95
Thr Ala Ala Ser Ala Ala Ile Phe Glu Arg Arg Gly Pro Ala Leu Ser
100 105 110
Arg Glu Glu Arg Leu Leu Tyr Leu Ile Thr Asn Tyr Leu Pro Ser Val 115 120 125 Ser Leu Ser Thr Lys Arg Arg Gly Asp Glu Val Pro Pro Asp Arg Thr 130 135 140
Leu Phe Ala His Val Cys Ala Ile Gly Arg Arg Leu Gly Thr Ile Val 145 150 155 160
Thr Tyr Asp Thr Ser Leu Asp Ala Ala Ile Ala Pro Phe Arg His Leu 165 170 175
Asp Pro Ala Thr Arg Glu Gly Val Arg Arg Glu Ala Ala Glu Ala Glu
180 185 190
Leu Ala Gly Arg Thr Trp Ala Pro Gly Val Glu Ala Leu Thr His Thr 195 200 205 Leu Leu Ser Thr Ala Val Asn Asn Met Met Leu Arg Asp Arg Trp Ser 210 215 220
Leu Val Ala Glu Arg Arg Arg Gln Ala Gly Ile Ala Gly His Thr Tyr 225 230 235 240
Leu Gln Ala Ser Glu Lys Phe Lys Ile Trp Gly Ala Glu Ser Ala Pro 245 250 255
Ala Pro Glu Arg Gly Tyr Lys Thr Gly Ala Pro Gly Ala Met Asp Thr
260 265 270
Ser Pro Ala Ala Ser Val Pro Ala Pro Gln Val Ala Val Arg Ala Arg 275 280 285 Gln Val Ala Ser Ser Ser Ser Ser Ser Ser Ser Phe Pro Ala Pro Ala 290 295 300
Asp Met Asn Pro Val Ser Ala Ser Gly Ala Pro Ala Pro Pro Pro Pro 305 310 315 320 Gly Asp Gly Ser Tyr Leu Trp Ile Pro Ala Phe His Tyr Asn Gln Leu
325 330 335
Val Thr Gly Gln Ser Ala Pro His His Pro Pro Leu Thr Ala Cys Gly 340 345 350 Leu Pro Ala Ala Gly Thr Val Ala Tyr Gly His Pro Gly Ala Gly Pro 355 360 365
Ser Pro His Tyr Pro Pro Pro Pro Ala His Pro Tyr Pro Gly Met Leu
370 375 380
Phe Ala Gly Pro Ser Pro Leu Glu Ala Gln Ile Ala Ala Leu Val Gly 385 390 395 400
Ala Ile Ala Ala Asp Arg Gln Ala Gly Gly Leu Pro Ala Ala Ala Gly
405 410 415
Asp His Gly Ile Arg Gly Ser Ala Lys Arg Arg Arg His Glu Val Glu 420 425 430 Gln Pro Glu Tyr Asp Cys Gly Arg Asp Glu Pro Asp Arg Asp Phe Pro 435 440 445
Tyr Tyr Pro Gly Glu Ala Arg Pro Glu Pro Arg Pro Val Asp Ser Arg
450 455 460
Arg Ala Ala Arg Gln Ala Ser Gly Phe Thr Ile Thr Ala Leu Val Gly 465 470 475 480
Ala Val Thr Ser Leu Gln Gln Glu Leu Ala His Met Arg Ala Arg Thr
485 490 495
His Ala Pro Tyr Gly Pro Tyr Pro Pro Val Gly Pro Tyr His His Pro 500 505 510 His Ala Asp Thr Glu Thr Pro Ala Gln Pro Pro Arg Tyr Pro Ala Glu 515 520 525
Ala Val Tyr Leu Pro Pro Pro His Ile Ala Pro Pro Gly Pro Pro Leu
530 535 540
Ser Gly Ala Val Pro Pro Pro Ser Tyr Pro Pro Val Ala Val Thr Pro 545 550 555 560
Gly Pro Ala Pro Pro Leu His Gln Pro Ser Pro Ala His Ala His Pro
565 570 575
Pro Pro Pro Pro Pro Gly Pro Thr Pro Pro Pro Ala Ala Ser Leu Pro 580 585 590 Gln Pro Glu Ala Pro Gly Ala Glu Ala Gly Ala Leu Val Asn Ala Ser 595 600 605
Ser Ala Ala His Val Lys Arg Gly His Gly Pro Gly Arg Arg Ser Val
610 615 620
Cys Val Thr Asp Asp Gly Val Pro Leu Thr Arg Leu Gln Asp Pro Asp 625 630 635 640
Leu Gly Gly Val Cys Val Phe Ile Tyr Phe Lys 645 650
(2) INFORMATION FOR SEQ ID NO: 241:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 896 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241:
Met Arg Gly Gly Gly Leu Ile Cys Ala Leu Val Val Gly Ala Leu Val 1 5 10 15 Ala Ala Val Ala Ser Ala Ala Pro Ala Ala Pro Ala Ala Pro Arg Ala 20 25 30
Ser Gly Gly Val Ala Ala Thr Val Ala Ala Asn Gly Gly Pro Ala Ser
35 40 45
Arg Pro Pro Pro Val Pro Ser Pro Ala Thr Thr Lys Ala Arg Lys Arg 50 55 60
Lys Thr Lys Lys Pro Pro Lys Arg Pro Glu Ala Thr Pro Pro Pro Asp 65 70 75 80
Ala Asn Ala Thr Val Ala Ala Gly His Ala Thr Leu Arg Ala His Leu 85 90 95 Arg Glu Ile Lys Val Glu Asn Ala Asp Ala Gln Phe Tyr Val Cys Pro 100 105 110
Pro Pro Thr Gly Ala Thr Val Val Gln Phe Glu Gln Pro Arg Arg Cys
115 120 125
Pro Trp Glu Gly Gln Asn Tyr Thr Glu Gly Ile Ala Val Val Phe Lys 130 135 140
Glu Asn Ile Ala Pro Tyr Lys Phe Lys Ala Thr Met Tyr Tyr Lys Asp
145 150 155 160
Val Thr Val Ser Gln Val Trp Phe Gly His Arg Tyr Ser Gln Phe Met
165 170 175 Gly Ile Phe Glu Asp Arg Ala Pro Val Pro Phe Glu Glu Val Ile Asp
180 185 190
Lys Ile Asn Ala Lys Gly Val Cys Arg Ser Thr Ala Lys Tyr Val Arg
195 200 205
Asn Asn Met Thr Ala Phe His Arg Asp Asp His Glu Thr Asp Met Glu 210 215 220
Leu Lys Pro Ala Lys Val Ala Thr Arg Thr Ser Arg Gly Trp His Thr 225 230 235 240
Thr Asp Leu Lys Tyr Asn Pro Ser Arg Val Glu Ala Phe His Arg Tyr 245 250 255
Gly Thr Thr Val Asn Cys Ile Val Glu Glu Val Asp Ala Arg Ser Val
260 265 270
Tyr Pro Tyr Asp Glu Phe Val Leu Ala Thr Gly Asp Phe Val Tyr Met 275 280 285
Ser Pro Phe Tyr Gly Tyr Arg Glu Gly Ser His Thr Glu His Thr Ser
290 295 300
Tyr Ala Ala Asp Arg Phe Lys Gln Val Asp Gly Phe Tyr Ala Arg Asp 305 310 315 320 Leu Thr Thr Lys Ala Arg Ala Thr Ser Pro Thr Thr Arg Asn Leu Leu
325 330 335
Thr Thr Pro Lys Phe Thr Val Ala Trp Asp Trp Val Pro Lys Arg Pro
340 345 350
Ala Val Cys Thr Met Thr Lys Trp Gln Glu Val Asp Glu Met Leu Arg 355 360 365
Ala Glu Tyr Gly Gly Ser Phe Arg Phe Ser Ser Asp Ala Ile Ser Thr
370 375 380
Thr Phe Thr Thr Asn Leu Thr Gln Tyr Ser Leu Ser Arg Val Asp Leu 385 390 395 400 Gly Asp Cys Ile Gly Arg Asp Ala Arg Glu Ala Ile Asp Arg Met Phe
405 410 415
Ala Arg Lys Tyr Asn Ala Thr His Ile Lys Val Gly Gln Pro Gln Tyr
420 425 430
Tyr Leu Ala Thr Gly Gly Phe Leu Ile Ala Tyr Gln Pro Leu Leu Ser 435 440 445
Asn Thr Leu Ala Glu Leu Tyr Val Arg Glu Tyr Met Arg Glu Gln Asp
450 455 460
Arg Lys Pro Arg Asn Ala Thr Pro Ala Pro Leu Arg Glu Ala Pro Ser 465 470 475 480 Ala Asn Ala Ser Val Glu Arg Ile Lys Thr Thr Ser Ser Ile Glu Phe
485 490 495
Ala Arg Leu Gln Phe Thr Tyr Asn His Ile Gln Arg His Val Asn Asp
500 505 510
Met Leu Gly Arg Ile Ala Val Ala Trp Cys Glu Leu Gln Asn His Glu 515 520 525
Leu Thr Leu Trp Asn Glu Ala Arg Lys Leu Asn Pro Asn Ala Ile Ala
530 535 540
Ser Ala Thr Val Gly Arg Arg Val Ser Ala Arg Met Leu Gly Asp Val 545 550 555 560 Met Ala Val Ser Thr Cys Val Pro Val Ala Pro Asp Asn Val Ile Val
565 570 575
Gln Asn Ser Met Arg Val Ser Ser Arg Pro Gly Thr Cys Arg Pro Leu 580 585 590 Val Ser Phe Arg Tyr Glu Asp Gln Gly Pro Leu Ile Glu Gly Gln Leu
595 600 605
Gly Glu Asn Asn Glu Leu Arg Leu Thr Arg Asp Ala Leu Glu Pro Cys
610 615 620 Thr Val Gly His Arg Arg Tyr Phe Ile Phe Gly Gly Gly Tyr Val Tyr
625 630 635 640
Phe Glu Glu Tyr Ala Tyr Ser His Gln Leu Ser Arg Ala Asp Val Thr
645 650 655
Thr Val Ser Thr Phe Ile Asp Leu Asn Ile Thr Met Leu Glu Asp His 660 665 670
Glu Phe Val Pro Leu Glu Val Tyr Thr Arg His Glu Ile Lys Asp Ser
675 680 685
Gly Leu Leu Asp Tyr Thr Glu Val Gln Arg Arg Asn Gln Leu His Asp
690 695 700 Leu Arg Phe Ala Asp Ile Asp Thr Val Ile Arg Ala Asp Ala Asn Ala
705 710 715 720
Ala Met Phe Ala Gly Leu Cys Ala Phe Phe Glu Gly Met Gly Asp Leu
725 730 735
Gly Arg Ala Val Gly Lys Val Val Met Gly Val Val Gly Gly Val Val 740 745 750
Ser Ala Val Ser Gly Val Ser Ser Phe Met Ser Asn Pro Phe Gly Ala
755 760 765
Val Gly Leu Leu Val Leu Ala Gly Leu Val Ala Ala Phe Phe Ala Phe
770 775 780 Arg Tyr Val Leu Gln Leu Gln Arg Asn Pro Met Lys Ala Leu Tyr Pro
785 790 795 800
Leu Thr Thr Lys Glu Leu Lys Thr Ser Asp Pro Gly Gly Val Gly Gly
805 810 815
Glu Gly Glu Glu Gly Ala Glu Gly Gly Gly Phe Asp Glu Ala Lys Leu 820 825 830
Ala Glu Ala Arg Glu Met Ile Arg Tyr Met Ala Leu Val Ser Ala Met
835 840 845
Glu Arg Thr Glu His Lys Ala Arg Lys Lys Gly Thr Ser Ala Leu Leu 850 855 860 Ser Ser Lys Val Thr Asn Met Val Leu Arg Lys Arg Asn Lys Ala Arg 865 870 875 880
Tyr Ser Pro Leu His Asn Glu Asp Glu Ala Gly Asp Glu Asp Glu Leu 885 890 895
(2) INFORMATION FOR SEQ ID NO: 242:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 69 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242:
Val Val Ala Gly Leu Gly Thr Gly Gly Gly Arg Glu Ala Gly Pro Pro 1 5 10 15
Phe Ala Ala Thr Val Ala Ala Thr Pro Pro Glu Arg Ala Ala Gly Ala
20 25 30
Ala Gly Ala Ala Asp Ala Thr Ala Ala Thr Ser Ala Pro Thr Thr Ser 35 40 45 Ala Gln Ile Lys Pro Pro Pro Arg Met Ala Gly Leu Arg Gly Arg Val 50 55 60
Ala Pro Ala Ala Arg 65
(2) INFORMATION FOR SEQ ID NO: 243:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 773 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243:
Met Ala Ala Ala Pro Pro Ala Ala Val Ser Glu Pro Thr Ala Ala Arg
1 5 10 15
Gln Lys Leu Leu Ala Leu Leu Gly Gln Val Gln Thr Tyr Val Phe Gln 20 25 30
Leu Glu Leu Leu Arg Arg Cys Asp Pro Gln Ile Gly Leu Gly Lys Leu
35 40 45
Ala Gln Leu Lys Leu Asn Ala Leu Gln Val Arg Val Leu Arg Arg His 50 55 60 Leu Arg Pro Gly Leu Glu Ala Gln Ala Ala Ala Phe Leu Thr Pro Leu 65 70 75 80
Ser Val Thr Leu Glu Leu Leu Leu Glu Tyr Ala Trp Arg Glu Gly Glu 85 90 95 Arg Leu Leu Gly His Leu Glu Thr Phe Ala Thr Thr Gly Asp Val Ser
100 105 110
Ala Phe Phe Thr Glu Thr Met Gly Leu Ala Arg Pro Cys Pro Tyr His 115 120 125 Gln Gln Ile Arg Leu Glu Thr Tyr Gly Gly Asp Val Arg Met Glu Leu 130 135 140
Cys Phe Leu His Asp Val Glu Asn Phe Leu Lys Gln Leu Asn Tyr Cys 145 150 155 160
His Leu Ile Thr Pro Pro Ser Gly Ala Thr Ala Ala Leu Glu Arg Val 165 170 175
Arg Glu Phe Met Val Ala Ala Val Gly Ser Gly Leu Ile Val Pro Pro
180 185 190
Glu Leu Ser Asp Pro Ser His Pro Cys Ala Val Cys Phe Glu Glu Leu 195 200 205 Cys Val Thr Ala Asn Gin Gly Ala Thr Ile Ala Arg Arg Leu Ala Asp 210 215 220
Arg Ile Cys Asn His Val Thr Gin Gin Ala Gin Val Arg Leu Asp Ala 225 230 235 240
Asn Glu Leu Arg Arg Tyr Leu Pro His Ala Ala Gly Leu Ser Asp Ala 245 250 255
Ala Arg Ala Arg Ala Leu Cys Val Leu Asp Gin Ala Arg Thr Ala Ala
260 265 270
Gly Gly Gly Ala Arg Ala Gly Pro Pro Pro Ala Asp Ser Ser Ser Val 275 280 285 Arg Glu Glu Ala Asp Ala Leu Leu Glu Ala His Asp Val Phe Gin Ala 290 295 300
Thr Thr Pro Gly Ala Ile Ser Glu Leu Arg Phe Trp Leu Ala Ser Gly 305 310 315 320
Asp Arg Ala Arg His Ser Thr Met Asp Ala Phe Ala Asp Asn Leu Asn 325 330 335
Ala Gin Arg Glu Leu Gin Gin Glu Thr Ala Ala Val Ala Val Glu Leu
340 345 350
Ala Leu Phe Gly Arg Arg Ala Glu His Phe Asp Arg Ala Phe Gly Gly 355 360 365 His Leu Ala Ala Leu Asp Met Val Asp Ala Leu Ile Ile Gly Gly Gin 370 375 380
Ala Thr Ser Pro Asp Asp Gin Ile Glu Ala Leu Ile Arg Ala Cys Tyr 385 390 395 400
Asp His His Leu Thr Thr Pro Leu Leu Arg Arg Leu Val Ser Pro Glu 405 410 415
Gin Cys Asp Glu Glu Ala Leu Arg Arg Val Leu Ala Arg Leu Gly Ala
420 425 430
Gly Gly Ala Thr Gly Gly Ala Glu Glu Glu Glu Pro Arg Ala Ala Ala 435 440 445
Glu Glu Gly Gly Arg Arg Arg Gly Ala Gly Thr Pro Ala Ser Glu Asp
450 455 460
Gly Glu Arg Gly Pro Glu Pro Gly Ala Gin Gly Pro Glu Ser Trp Gly 465 470 475 480
Asp Ile Ala Thr Arg Ala Ala Ala Asp Val Pro Glu Arg Arg Arg Leu
485 490 495
Tyr Ala Asp Arg Leu Thr Lys Arg Ser Leu Ala Ser Leu Gly Arg Cys 500 505 510 Val Arg Glu Gin Arg Gly Glu Leu Glu Lys Met Leu Arg Val Ser Val 515 520 525
His Gly Glu Val Leu Pro Ala Thr Phe Ala Ala Val Ala Asn Gly Phe
530 535 540
Ala Ala Arg Ala Arg Phe Cys Ala Leu Thr Ala Gly Ala Gly Thr Val 545 550 555 560
Ile Asp Asn Arg Ala Ala Pro Gly Val Phe Asp Ala His Arg Phe Met
565 570 575
Arg Ala Ser Leu Leu Arg His Gin Val Asp Pro Ala Leu Leu Pro Ser 580 585 590 Ile Thr Phe Phe Glu Leu Val Asn Gly Pro Leu Phe Asp His Ser Thr 595 600 605
His Ser Phe Ala Gin Pro Pro Asn Thr Ala Leu Tyr Tyr Ser Val Glu
610 615 620
Asn Val Gly Leu Leu Pro His Leu Lys Glu Glu Leu Ala Arg Phe Ile 625 630 635 640
Met Gly Ala Gly Gly Ser Gly Ala Asp Trp Ala Val Ser Glu Phe Gin
645 650 655
Lys Phe Tyr Cys Phe Asp Gly Val Ser Gly Ile Thr Pro Thr Gin Arg 660 665 670 Ala Ala Trp Arg Tyr Ile Arg Glu Leu Ile Ile Ala Thr Thr Leu Phe 675 680 685
Ala Ser Val Tyr Arg Cys Gly Glu Leu Glu Leu Arg Arg Pro Asp Cys
690 695 700
Ser Arg Pro Thr Ser Glu Gly Arg Tyr Pro Pro Gly Val Tyr Leu Thr 705 710 715 720
Tyr Asn Ser Asp Cys Pro Leu Val Ala Ile Val Glu Ser Gly Pro Asp
725 730 735
Gly Cys Ile Gly Pro Arg Ser Val Val Val Tyr Asp Arg Asp Val Phe 740 745 750 Ser lie Lys Val Leu Gin His Leu Ala Pro Arg Leu Ala Gly Gly Gly 755 760 765
Ser Asp Ala Pro Pro 770 (2) INFORMATION FOR SEQ ID NO: 244:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 616 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244:
Met Asp Thr Lys Pro Lys Thr Thr Thr Thr Val Lys Val Pro Pro Gly
1 5 10 15
Pro Met Gly Tyr Val Tyr Gly Arg Ala Cys Pro Ala Glu Gly Leu Glu
20 25 30
Leu Leu Ser Leu Leu Ser Ala Arg Ser Gly Asp Ala Asp Val Ala Val
35 40 45
Ala Pro Leu Ile Val Gly Leu Thr Val Glu Ser Gly Phe Glu Ala Asn
50 55 60
Val Ala Ala Val Val Gly Ser Arg Thr Thr Gly Leu Gly Gly Thr Ala
65 70 75 80
Val Ser Leu Lys Leu Met Pro Ser His Tyr Ser Pro Ser Val Tyr Val
85 90 95
Phe His Gly Gly Arg His Leu Ala Pro Ser Thr Gln Ala Pro Asn Leu
100 105 110
Thr Arg Leu Cys Glu Arg Ala Arg Arg His Phe Gly Phe Ser Asp Tyr
115 120 125
Ala Pro Arg Pro Cys Asp Leu Lys His Glu Thr Thr Gly Asp Ala Leu
130 135 140
Cys Glu Arg Leu Gly Leu Asp Pro Asp Arg Ala Leu Leu Tyr Leu Val
145 150 155 160
Ile Thr Glu Gly Phe Arg Glu Ala Val Cys Ile Ser Asn Thr Phe Leu
165 170 175
His Leu Gly Gly Met Asp Lys Val Thr Ile Gly Asp Ala Glu Val His
180 185 190
Arg Ile Pro Val Tyr Pro Leu Gln Met Phe Met Pro Asp Phe Ser Arg
195 200 205
Val Ile Ala Asp Pro Phe Asn Cys Asn His Arg Ser Ile Gly Glu Asn
210 215 220
Phe Asn Tyr Pro Leu Pro Phe Phe Asn Arg Pro Leu Ala Arg Leu Leu
225 230 235 240
Phe Glu Ala Val Val Gly Pro Ala Ala Val Arg Ala Arg Asn Val Asp
245 250 255
Ala Val Ala Arg Ala Ala Ala His Leu Ala Phe Asp Glu Asn His Glu
260 265 270
Gly Ala Ala Leu Pro Ala Asp Ile Thr Phe Thr Ala Phe Glu Ala Ser
275 280 285
Gln Gly Lys Pro Gln Arg Gly Ala Arg Asp Ala Gly Asn Lys Gly Pro
290 295 300
Ala Gly Gly Phe Glu Gln Arg Leu Ala Ser Val Met Ala Gly Asp Ala
305 310 315 320
Ala Leu Glu Ser Ile Val Ser Met Ala Val Phe Asp Glu Pro Pro Pro
325 330 335
Asp Ile Thr Thr Trp Pro Leu Leu Glu Gly Gln Glu Thr Pro Ala Ala
340 345 350
Arg Ala Gly Ala Val Gly Ala Tyr Leu Ala Arg Ala Ala Gly Leu Val
355 360 365
Gly Ala Met Val Phe Ser Thr Asn Ser Ala Leu His Leu Thr Glu Val
370 375 380
Asp Asp Ala Gly Pro Ala Asp Pro Lys Asp His Ser Lys Pro Ser Phe
385 390 395 400
Tyr Arg Phe Phe Leu Val Pro Gly Thr His Val Ala Ala Asn Pro Gln
405 410 415
Leu Asp Arg Glu Gly His Val Val Pro Gly Tyr Glu Gly Arg Pro Thr
420 425 430
Ala Pro Leu Val Gly Gly Thr Gln Glu Phe Ala Gly Glu His Leu Ala
435 440 445
Met Leu Cys Gly Phe Ser Pro Ala Leu Leu Ala Lys Met Leu Phe Tyr
450 455 460
Leu Glu Arg Cys Asp Gly Gly Val Ile Val Gly Arg Gln Glu Met Asp
465 470 475 480
Val Phe Arg Tyr Val Ala Asp Ser Gly Gln Thr Asp Val Pro Cys Asn
485 490 495
Leu Cys Thr Phe Glu Thr Arg His Ala Cys Ala His Thr Thr Leu Met
500 505 510
Arg Leu Arg Ala Arg His Pro Lys Phe Ala Ser Ala Arg Ala Ile Gly
515 520 525
Val Phe Gly Thr Met Asn Ser Ala Tyr Ser Asp Cys Asp Val Leu Gly
530 535 540
Asn Tyr Ala Ala Phe Ser Ala Leu Lys Arg Ala Asp Gly Ser Glu Asn
545 550 555 560
Thr Arg Thr Ile Met Gln Glu Tyr Ala Ala Thr Glu Arg Val Met Ala
565 570 575
Glu Leu Glu Ala Leu Gln Tyr Val Asp Gln Ala Val Pro Thr Ala Leu
580 585 590
Gly Arg Leu Glu Thr Ile Ile Gly Thr Arg Glu Ala Leu His Thr Val
595 600 605
Val Asn Asn Ile Lys Gln Leu Val
610 615
(2) INFORMATION FOR SEQ ID NO: 245:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 616 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245:
Met Asp Thr Lys Pro Lys Thr Thr Thr Thr Val Lys Val Pro Pro Gly
1 5 10 15
Pro Met Gly Tyr Val Tyr Gly Arg Ala Cys Pro Ala Glu Gly Leu Glu
20 25 30
Leu Leu Ser Leu Leu Ser Ala Arg Ser Gly Asp Ala Asp Val Ala Val
35 40 45
Ala Pro Leu Ile Val Gly Leu Thr Val Glu Ser Gly Phe Glu Ala Asn
50 55 60
Val Ala Ala Val Val Gly Ser Arg Thr Thr Gly Leu Gly Gly Thr Ala
65 70 75 80
Val Ser Leu Lys Leu Met Pro Ser His Tyr Ser Pro Ser Val Tyr Val
85 90 95
Phe His Gly Gly Arg His Leu Ala Pro Ser Thr Gln Ala Pro Asn Leu
100 105 110
Thr Arg Leu Cys Glu Arg Ala Arg Arg His Phe Gly Phe Ser Asp Tyr
115 120 125
Ala Pro Arg Pro Cys Asp Leu Lys His Glu Thr Thr Gly Asp Ala Leu
130 135 140
Cys Glu Arg Leu Gly Leu Asp Pro Asp Arg Ala Leu Leu Tyr Leu Val
145 150 155 160
Ile Thr Glu Gly Phe Arg Glu Ala Val Cys Ile Ser Asn Thr Phe Leu
165 170 175
His Leu Gly Gly Met Asp Lys Val Thr Ile Gly Asp Ala Glu Val His
180 185 190
Arg Ile Pro Val Tyr Pro Leu Gln Met Phe Met Pro Asp Phe Ser Arg
195 200 205
Val Ile Ala Asp Pro Phe Asn Cys Asn His Arg Ser Ile Gly Glu Asn
210 215 220
Phe Asn Tyr Pro Leu Pro Phe Phe Asn Arg Pro Leu Ala Arg Leu Leu
225 230 235 240
Phe Glu Ala Val Val Gly Pro Ala Ala Val Arg Ala Arg Asn Val Asp
245 250 255
Ala Val Ala Arg Ala Ala Ala His Leu Ala Phe Asp Glu Asn His Glu
260 265 270
Gly Ala Ala Leu Pro Ala Asp Ile Thr Phe Thr Ala Phe Glu Ala Ser
275 280 285
Gln Gly Lys Pro Gln Arg Gly Ala Arg Asp Ala Gly Asn Lys Gly Pro
290 295 300
Ala Gly Gly Phe Glu Gln Arg Leu Ala Ser Val Met Ala Gly Asp Ala
305 310 315 320
Ala Leu Glu Ser Ile Val Ser Met Ala Val Phe Asp Glu Pro Pro Pro
325 330 335
Asp Ile Thr Thr Trp Pro Leu Leu Glu Gly Gln Glu Thr Pro Ala Ala
340 345 350
Arg Ala Gly Ala Val Gly Ala Tyr Leu Ala Arg Ala Ala Gly Leu Val
355 360 365
Gly Ala Met Val Phe Ser Thr Asn Ser Ala Leu His Leu Thr Glu Val
370 375 380
Asp Asp Ala Gly Pro Ala Asp Pro Lys Asp His Ser Lys Pro Ser Phe
385 390 395 400
Tyr Arg Phe Phe Leu Val Pro Gly Thr His Val Ala Ala Asn Pro Gln
405 410 415
Leu Asp Arg Glu Gly His Val Val Pro Gly Tyr Glu Gly Arg Pro Thr
420 425 430
Ala Pro Leu Val Gly Gly Thr Gln Glu Phe Ala Gly Glu His Leu Ala
435 440 445
Met Leu Cys Gly Phe Ser Pro Ala Leu Leu Ala Lys Met Leu Phe Tyr
450 455 460
Leu Glu Arg Cys Asp Gly Gly Val Ile Val Gly Arg Gln Glu Met Asp
465 470 475 480
Val Phe Arg Tyr Val Ala Asp Ser Gly Gln Thr Asp Val Pro Cys Asn
485 490 495
Leu Cys Thr Phe Glu Thr Arg His Ala Cys Ala His Thr Thr Leu Met
500 505 510
Arg Leu Arg Ala Arg His Pro Lys Phe Ala Ser Ala Arg Ala Ile Gly
515 520 525
Val Phe Gly Thr Met Asn Ser Ala Tyr Ser Asp Cys Asp Val Leu Gly
530 535 540
Asn Tyr Ala Ala Phe Ser Ala Leu Lys Arg Ala Asp Gly Ser Glu Asn
545 550 555 560
Thr Arg Thr Ile Met Gln Glu Tyr Ala Ala Thr Glu Arg Val Met Ala
565 570 575
Glu Leu Glu Ala Leu Gln Tyr Val Asp Gln Ala Val Pro Thr Ala Leu
580 585 590
Gly Arg Leu Glu Thr Ile Ile Gly Thr Arg Glu Ala Leu His Thr Val
595 600 605
Val Asn Asn Ile Lys Gln Leu Val
610 615
(2) INFORMATION FOR SEQ ID NO: 246:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1228 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:246:
Met Phe Cys Ala Ala Gly Gly Pro Thr Ser Pro Gly Gly Lys Ser Ala
1 5 10 15
Ala Arg Ala Ala Ser Gly Phe Phe Ala Pro His Asn Pro Arg Gly Ala
20 25 30
Thr Gln Thr Ala Pro Pro Pro Cys Arg Arg Gln Asn Phe Tyr Asn Pro
35 40 45
His Leu Ala Gln Thr Gly Thr Gln Pro Lys Ala Pro Gly Pro Ala Gln
50 55 60
Arg His Thr Tyr Tyr Ser Glu Cys Asp Glu Phe Arg Phe Ile Ala Pro
65 70 75 80
Arg Ser Leu Asp Glu Asp Ala Pro Ala Glu Gln Arg Thr Gly Val His
85 90 95
Asp Gly Arg Leu Arg Arg Ala Pro Lys Val Tyr Cys Gly Gly Asp Glu
100 105 110
Arg Asp Val Leu Arg Val Gly Pro Glu Gly Phe Trp Pro Arg Arg Leu
115 120 125
Arg Leu Trp Gly Gly Ala Asp His Ala Pro Glu Gly Phe Asp Pro Thr
130 135 140
Val Thr Val Phe His Val Tyr Asp Ile His Val Glu His Ala Tyr Ser
145 150 155 160
Met Arg Ala Ala Gln Leu His Glu Arg Phe Met Asp Ala Ile Thr Pro
165 170 175
Ala Gly Thr Val Ile Thr Leu Leu Gly Leu Thr Pro Glu Gly His Arg
180 185 190
Val Ala Val His Val Tyr Gly Thr Arg Gln Tyr Phe Tyr Met Asn Lys
195 200 205
Ala Glu Val Asp Arg His Leu Gln Cys Arg Ala Pro Arg Asp Leu Cys
210 215 220
Glu Arg Leu Ala Ala Ala Leu Arg Glu Ser Pro Gly Ala Ser Phe Arg
225 230 235 240
Gly Ile Ser Ala Asp His Phe Glu Ala Glu Val Val Glu Arg Ala Asp
245 250 255
Val Tyr Tyr Tyr Glu Trp Thr Leu Tyr Tyr Arg Val Phe Val Arg Ser
260 265 270
Gly Arg Ala Tyr Leu Cys Asp Asn Phe Cys Pro Ala Ile Arg Lys Tyr
275 280 285
Glu Gly Gly Val Asp Ala Thr Thr Arg Phe Ile Leu Asp Asn Pro Gly
290 295 300
Phe Val Thr Phe Gly Trp Tyr Arg Leu Lys Pro Gly Arg Gly Asn Ala
305 310 315 320
Pro Ala Gln Pro Arg Pro Pro Thr Ala Phe Gly Thr Ser Ser Asp Val
325 330 335
Glu Phe Asn Cys Thr Ala Asp Asn Leu Ala Val Glu Gly Ala Met Cys
340 345 350
Asp Leu Pro Ala Tyr Lys Leu Met Cys Phe Asp Ile Glu Cys Lys Ala
355 360 365
Gly Gly Glu Asp Glu Leu Ala Phe Pro Val Ala Glu Arg Pro Glu Asp
370 375 380
Leu Val Ile Gln Ile Ser Cys Leu Leu Tyr Asp Leu Ser Thr Thr Ala
385 390 395 400
Leu Glu His Ile Leu Leu Phe Ser Leu Gly Ser Cys Asp Leu Pro Glu
405 410 415
Ser His Leu Ser Asp Leu Ala Ser Arg Gly Leu Pro Ala Pro Val Val
420 425 430
Leu Glu Phe Asp Ser Glu Phe Glu Met Leu Leu Ala Phe Met Thr Phe
435 440 445
Val Lys Gln Tyr Gly Pro Glu Phe Val Thr Gly Tyr Asn Ile Ile Asn
450 455 460
Phe Asp Trp Pro Phe Val Leu Thr Lys Leu Thr Glu Ile Tyr Lys Val
465 470 475 480
Pro Leu Asp Gly Tyr Gly Arg Met Asn Gly Arg Gly Val Phe Arg Val
485 490 495
Trp Asp Ile Gly Gln Ser His Phe Gln Lys Arg Ser Lys Ile Lys Val
500 505 510
Asn Gly Met Val Asn Ile Asp Met Tyr Gly Ile Ile Thr Asp Lys Val
515 520 525
Lys Leu Ser Ser Tyr Lys Leu Asn Ala Val Ala Glu Ala Val Leu Lys
530 535 540
Asp Lys Lys Lys Asp Leu Ser Tyr Arg Asp Ile Pro Ala Tyr Tyr Ala
545 550 555 560
Ser Gly Pro Ala Gln Arg Gly Val Ile Gly Glu Tyr Cys Val Gln Asp
565 570 575
Ser Leu Leu Val Gly Gln Leu Phe Phe Lys Phe Leu Pro His Leu Glu
580 585 590
Leu Ser Ala Val Ala Arg Leu Ala Gly Ile Asn Ile Thr Arg Thr Ile
595 600 605
Tyr Asp Gly Gln Gln Ile Arg Val Phe Thr Cys Leu Leu Arg Leu Ala
610 615 620
Gly Gln Lys Gly Phe Ile Leu Pro Asp Thr Gln Gly Arg Phe Arg Gly
625 630 635 640
Leu Asp Lys Glu Ala Pro Lys Arg Pro Ala Val Pro Arg Gly Glu Gly
645 650 655
Glu Arg Pro Gly Asp Gly Asn Gly Asp Glu Asp Lys Asp Asp Asp Glu
660 665 670
Asp Gly Asp Glu Asp Gly Asp Glu Arg Glu Glu Val Ala Arg Glu Thr
675 680 685
Gly Gly Arg His Val Gly Tyr Gln Gly Ala Arg Val Leu Asp Pro Thr
690 695 700
Ser Gly Phe His Val Asp Pro Val Val Val Phe Asp Phe Ala Ser Leu
705 710 715 720
Tyr Pro Ser Ile Ile Gln Ala His Asn Leu Cys Phe Ser Thr Leu Ser
725 730 735
Leu Arg Pro Glu Ala Val Ala His Leu Glu Ala Asp Arg Asp Tyr Leu
740 745 750
Glu Ile Glu Val Gly Gly Arg Arg Leu Phe Phe Val Lys Ala His Val
755 760 765
Arg Glu Ser Leu Leu Ser Ile Leu Leu Arg Asp Trp Leu Ala Met Arg
770 775 780
Lys Gln Ile Arg Ser Arg Ile Pro Gln Ser Thr Pro Glu Glu Ala Val
785 790 795 800
Leu Leu Asp Lys Gln Gln Ala Ala Ile Lys Val Val Cys Asn Ser Val
805 810 815
Tyr Gly Phe Thr Gly Val Gln His Gly Leu Leu Pro Cys Leu His Val
820 825 830
Ala Ala Thr Val Thr Thr Ile Gly Arg Glu Met Leu Leu Ala Thr Arg
835 840 845
Ala Tyr Val His Ala Arg Trp Ala Glu Phe Asp Gln Leu Leu Ala Asp
850 855 860
Phe Pro Glu Ala Ala Gly Met Arg Ala Pro Gly Pro Tyr Ser Met Arg
865 870 875 880
Ile Ile Tyr Gly Asp Thr Asp Ser Ile Phe Val Leu Cys Arg Gly Leu
885 890 895
Thr Ala Ala Gly Leu Val Ala Met Gly Asp Lys Met Ala Ser His Arg
900 905 910
Ala Leu Phe Leu Pro Pro Ile Lys Leu Glu Cys Glu Lys Thr Phe Thr
915 920 925
Lys Leu Leu Leu Ile Ala Lys Lys Lys Tyr Ile Gly Val Ile Cys Gly
930 935 940
Gly Lys Met Leu Ile Lys Gly Val Asp Leu Val Arg Lys Asn Asn Cys
945 950 955 960
Ala Phe Ile Asn Arg Thr Ser Arg Ala Leu Val Asp Leu Leu Phe Tyr
965 970 975
Asp Asp Thr Val Ser Gly Ala Ala Ala Ala Glu Arg Pro Ala Glu Glu
980 985 990
Trp Leu Ala Arg Pro Leu Pro Glu Gly Leu Gln Ala Phe Gly Ala Val
995 1000 1005
Leu Val Asp Ala His Arg Arg Ile Thr Asp Pro Glu Arg Asp Ile Gln
1010 1015 1020
Asp Phe Val Leu Thr Ala Glu Leu Ser Arg His Pro Arg Ala Tyr Thr
1025 1030 1035 1040
Asn Lys Arg Leu Ala His Leu Thr Val Tyr Tyr Lys Leu Met Ala Arg
1045 1050 1055
Arg Ala Gln Val Pro Ser Ile Lys Asp Arg Ile Pro Tyr Val Ile Val
1060 1065 1070
Ala Gln Thr Arg Glu Val Glu Glu Thr Val Ala Arg Leu Ala Ala Leu
1075 1080 1085
Arg Glu Leu Asp Ala Ala Ala Pro Gly Asp Glu Pro Ala Pro Pro Ala
1090 1095 1100
Ala Leu Pro Ser Pro Ala Lys Arg Pro Arg Glu Thr Pro Ser His Ala
1105 1110 1115 1120
Asp Pro Pro Gly Gly Ala Ser Lys Pro Arg Lys Leu Leu Val Ser Glu
1125 1130 1135
Leu Ala Glu Asp Pro Gly Tyr Ala Ile Arg Val Pro Leu Asn Thr Asp
1140 1145 1150
Tyr Tyr Phe Ser His Leu Leu Gly Ala Ala Cys Val Thr Phe Lys Ala
1155 1160 1165
Leu Phe Gly Asn Asn Ala Lys Ile Thr Glu Ser Leu Leu Lys Arg Phe
1170 1175 1180
Ile Pro Glu Thr Trp His Pro Pro Asp Asp Val Ala Ala Arg Leu Arg
1185 1190 1195 1200
Ala Ala Gly Phe Gly Pro Ala Gly Ala Gly Ala Thr Ala Glu Glu Thr
1205 1210 1215
Arg Arg Met Leu His Arg Ala Phe Asp Thr Leu Ala
1220 1225
(2) INFORMATION FOR SEQ ID NO: 247:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 303 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:247:
Met Tyr Asp Ile Ala Pro Arg Arg Ser Gly Ser Arg Pro Gly Pro Gly
1 5 10 15
Arg Asp Lys Thr Arg Arg Arg Ser Arg Phe Ser Ala Ala Gly Asn Pro
20 25 30
Gly Val Glu Arg Arg Ala Ser Arg Lys Ser Leu Pro Ser His Ala Arg
35 40 45
Arg Leu Glu Leu Cys Leu His Glu Arg Arg Arg Tyr Arg Gly Phe Phe
50 55 60
Ala Ala Gln Thr Pro Ser Glu Glu Ile Ala Ile Val Arg Ser Leu Ser
65 70 75 80
Val Pro Leu Val Lys Thr Thr Pro Val Ser Leu Pro Phe Ser Leu Asp
85 90 95
Gln Thr Val Ala Asp Asn Cys Leu Thr Leu Ser Gly Met Gly Tyr Tyr
100 105 110
Leu Gly Ile Gly Gly Cys Cys Pro Ala Cys Ser Ala Gly Asp Gly Arg
115 120 125
Leu Ala Thr Val Ser Arg Glu Ala Leu Ile Leu Ala Phe Val Gln Gln
130 135 140
Ile Asn Thr Ile Phe Glu His Arg Thr Phe Leu Ala Ser Leu Val Val
145 150 155 160
Leu Ala Asp Arg His Ser Thr Pro Leu Gln Asp Leu Leu Ala Asp Thr
165 170 175
Leu Gly Gln Pro Glu Leu Phe Phe Val His Thr Ile Leu Arg Gly Gly
180 185 190
Gly Ala Cys Asp Pro Arg Phe Leu Phe Tyr Pro Asp Pro Thr Tyr Gly
195 200 205
Gly His Met Leu Tyr Val Ile Phe Pro Gly Thr Ser Ala His Leu His
210 215 220
Tyr Arg Leu Ile Asp Arg Met Leu Thr Ala Cys Pro Gly Tyr Arg Phe
225 230 235 240
Ala Ala His Val Trp Gln Ser Thr Phe Val Leu Val Val Arg Arg Asn
245 250 255
Ala Glu Lys Pro Ala Asp Ala Glu Ile Pro Thr Val Ser Ala Ala Asp
260 265 270
Ile Tyr Cys Lys Met Arg Asp Ile Ser Phe Asp Gly Gly Leu Met Leu
275 280 285
Glu Tyr Gln Arg Leu Tyr Ala Thr Phe Asp Glu Phe Pro Pro Pro
290 295 300
(2) INFORMATION FOR SEQ ID NO: 248:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 590 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248:
Met Ala Thr Ser Ala Pro Gly Val Pro Ser Ser Ala Ala Val Arg Glu
1 5 10 15
Glu Ser Pro Gly Ser Ser Trp Lys Glu Gly Ala Phe Glu Arg Pro Tyr
20 25 30
Val Ala Phe Asp Pro Asp Leu Leu Ala Leu Asn Glu Ala Leu Cys Ala
35 40 45
Glu Leu Leu Ala Ala Cys His Val Val Gly Val Pro Pro Ala Ser Ala
50 55 60
Leu Asp Glu Asp Val Glu Ser Asp Val Ala Pro Ala Pro Pro Arg Pro
65 70 75 80
Arg Gly Ala Ala Arg Glu Ala Ser Gly Gly Arg Gly Pro Gly Ser Arg
85 90 95
Pro Pro Ala Asp Pro Thr Ala Glu Gly Leu Leu Asp Thr Gly Pro Phe
100 105 110
Ala Ala Ala Ser Val Asp Thr Phe Ala Leu Asp Arg Pro Cys Leu Val
115 120 125
Cys Arg Thr Ile Glu Leu Tyr Lys Gln Ala Tyr Arg Leu Ser Pro Gln
130 135 140
Trp Val Ala Asp Tyr Ala Phe Leu Cys Ala Lys Cys Leu Gly Ala Pro
145 150 155 160
His Cys Ala Ala Ser Ile Phe Val Ala Ala Phe Glu Phe Val Tyr Val
165 170 175
Met Asp His His Phe Leu Arg Thr Lys Lys Ala Thr Leu Val Gly Ser
180 185 190
Phe Ala Arg Phe Ala Leu Thr Ile Asn Asp Ile His Arg His Phe Phe
195 200 205
Leu His Cys Cys Phe Arg Thr Asp Gly Gly Val Pro Gly Arg His Ala
210 215 220
Gln Lys Gln Pro Arg Pro Thr Pro Ser Pro Gly Ala Ala Lys Val Gln
225 230 235 240
Tyr Ser Asn Tyr Ser Phe Leu Ala Gln Ser Ala Thr Arg Ala Leu Ile
245 250 255
Gly Thr Leu Ala Ser Gly Gly Asp Asp Gly Ala Gly Ala Gly Gly Gly
260 265 270
Ser Gly Thr Gln Pro Ser Leu Thr Thr Ala Leu Met Asn Trp Lys Asp
275 280 285
Cys Ala Arg Leu Leu Asp Cys Thr Glu Gly Lys Arg Gly Gly Gly Asp
290 295 300
Ser Cys Cys Thr Arg Ala Ala Ala Arg Asn Gly Glu Phe Glu Ala Ala
305 310 315 320
Ala Gly Ala Gln Gly Gly Glu Pro Glu Thr Trp Ala Tyr Ala Asp Leu
325 330 335
Ile Leu Leu Leu Leu Ala Gly Thr Pro Ala Val Trp Glu Ser Gly Pro
340 345 350
Arg Leu Arg Ala Ala Ala Asp Ala Arg Arg Ala Ala Val Ser Glu Ser
355 360 365
Trp Glu Ala His Arg Gly Ala Arg Met Arg Asp Ala Ala Pro Arg Phe
370 375 380
Ala Gln Phe Ala Glu Pro Lys Ala Gln Pro Asp Leu Asp Leu Gly Pro
385 390 395 400
Leu Met Ala Thr Val Leu Lys His Gly Arg Gly Arg Gly Arg Thr Gly
405 410 415
Gly Glu Cys Leu Leu Cys Asn Leu Leu Leu Val Arg Ala Tyr Trp Leu
420 425 430
Ala Met Arg Arg Leu Arg Ala Ser Val Val Arg Tyr Ser Glu Asn Asn
435 440 445
Thr Ser Leu Phe Asp Cys Ile Val Pro Val Val Asp Gln Leu Glu Ala
450 455 460
Asp Pro Glu Ala Gln Pro Gly Asp Gly Gly Arg Phe Val Ser Leu Leu
465 470 475 480
Arg Ala Ala Gly Pro Glu Ala Ile Phe Lys His Met Phe Cys Asp Pro
485 490 495
Met Cys Ala Ile Thr Glu Met Glu Val Asp Pro Trp Val Leu Phe Gly
500 505 510
His Pro Arg Ala Asp His Arg Asp Glu Leu Gln Leu His Lys Ala Lys
515 520 525
Leu Ala Cys Gly Asn Glu Phe Glu Gly Arg Val Cys Ile Ala Leu Arg
530 535 540
Ala Leu Ile Tyr Thr Phe Lys Thr Tyr Gln Val Phe Val Pro Lys Pro
545 550 555 560
Thr Ala Thr Phe Val Arg Glu Ala Gly Ala Leu Leu Arg Arg His Ser
565 570 575
Ile Ser Leu Leu Ser Leu Glu His Thr Leu Cys Thr Tyr Val
580 585 590
(2) INFORMATION FOR SEQ ID NO: 249:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 128 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249:
Met Ala Gly Arg Ala Gly Arg Trp Arg Thr Leu Arg Asp Ala Ile Pro
1 5 10 15
Asp Cys Ala Leu Arg Ser Gln Thr Leu Glu Ser Leu Asp Ala Arg Tyr
20 25 30
Val Ser Arg Asp Gly Ala Gly Asp Ala Ala Val Trp Phe Glu Asp Met
35 40 45
Thr Pro Ala Glu Leu Glu Val Ile Phe Pro Thr Thr Asp Ala Lys Leu
50 55 60
Asn Tyr Leu Ser Arg Thr Gln Arg Leu Ala Ser Leu Leu Thr Tyr Ala
65 70 75 80
Gly Pro Ile Lys Ala Pro Asp Gly Pro Ala Ala Pro His Thr Gln Asp
85 90 95
Thr Ala Cys Val His Gly Glu Leu Leu Ala Arg Lys Arg Glu Arg Phe
100 105 110
Ala Ala Val Ile Asn Arg Phe Leu Asp Leu His Gln Ile Leu Arg Gly
115 120 125
(2) INFORMATION FOR SEQ ID NO: 250:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 112 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:250:
Met Ala Ala Pro Gln Phe His Arg Pro Ser Thr Ile Thr Ala Asp Asn
1 5 10 15
Val Arg Ala Leu Gly Met Arg Gly Leu Val Leu Ala Thr Asn Asn Ala
20 25 30
Gln Phe Ile Met Asp Asn Ser Tyr Pro His Pro His Gly Thr Gln Gly
35 40 45
Ala Val Arg Glu Phe Leu Arg Gly Gln Ala Ala Ala Leu Thr Asp Leu
50 55 60
Gly Val Thr His Ala Asn Asn Thr Phe Ala Pro Gln Pro Met Phe Ala
65 70 75 80
Gly Asp Ala Ala Ala Glu Trp Leu Arg Pro Ser Phe Gly Leu Lys Arg
85 90 95
Thr Tyr Ser Pro Phe Val Val Arg Asp Pro Lys Thr Pro Ser Thr Pro
100 105 110
(2) INFORMATION FOR SEQ ID NO: 251:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 112 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251:
Met Ala Ala Pro Gln Phe His Arg Pro Ser Thr Ile Thr Ala Asp Asn
1 5 10 15
Val Arg Ala Leu Gly Met Arg Gly Leu Val Leu Ala Thr Asn Asn Ala
20 25 30
Gln Phe Ile Met Asp Asn Ser Tyr Pro His Pro His Gly Thr Gln Gly
35 40 45
Ala Val Arg Glu Phe Leu Arg Gly Gln Ala Ala Ala Leu Thr Asp Leu
50 55 60
Gly Val Thr His Ala Asn Asn Thr Phe Ala Pro Gln Pro Met Phe Ala
65 70 75 80
Gly Asp Ala Ala Ala Glu Trp Leu Arg Pro Ser Phe Gly Leu Lys Arg
85 90 95
Thr Tyr Ser Pro Phe Val Val Arg Asp Pro Lys Thr Pro Ser Thr Pro
100 105 110
(2) INFORMATION FOR SEQ ID NO: 252:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3051 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:252:
Met Ile Pro Ala Ala Leu Pro His Pro Thr Met Lys Arg Gin Gly Asp 1 5 10 15
Arg Asp Ile Val Val Thr Gly Val Arg Asn Gin Phe Ala Thr Asp Leu
20 25 30
Glu Pro Gly Gly Ser Val Ser Cys Met Arg Ser Ser Leu Ser Phe Leu 35 40 45
Ser Leu Leu Phe Asp Val Gly Pro Arg Asp Val Leu Ser Ala Glu Ala
50 55 60
Ile Glu Gly Cys Leu Val Glu Gly Gly Glu Trp Thr Arg Ala Ala Ala 65 70 75 80 Gly Ser Gly Pro Pro Arg Met Cys Ser Ile Ile Glu Leu Pro Asn Phe
85 90 95
Leu Glu Tyr Pro Ala Arg Gly Leu Arg Cys Val Phe Ser Arg Val Tyr
100 105 110
Gly Glu Val Gly Phe Phe Gly Glu Pro Thr Ala Gly Leu Leu Glu Thr 115 120 125
Gin Cys Pro Ala His Thr Phe Phe Ala Gly Pro Trp Ala Met Arg Pro
130 135 140
Leu Ser Tyr Thr Leu Leu Thr Ile Gly Pro Leu Gly Met Gly Arg Asp 145 150 155 160
Gly Asp Thr Ala Tyr Leu Phe Asp Pro His Gly Leu Pro Ala Gly Thr 165 170 175
Pro Ala Phe Ile Ala Lys Val Arg Ala Gly Asp Val Tyr Pro Tyr Leu
180 185 190
Thr Tyr Tyr Ala His Asp Arg Pro Lys Val Arg Trp Ala Gly Ala Met 195 200 205
Val Phe Phe Val Pro Ser Gly Pro Gly Ala Val Ala Pro Ala Asp Leu
210 215 220
Thr Ala Ala Ala Leu His Leu Tyr Gly Ala Ser Glu Thr Tyr Leu Gin
225 230 235 240
Asp Glu Pro Phe Val Glu Arg Arg Val Ala Ile Thr His Pro Leu Arg 245 250 255
Gly Glu Ile Gly Gly Leu Gly Ala Leu Phe Val Gly Val Val Pro Arg
260 265 270
Gly Asp Gly Glu Gly Ser Gly Pro Val Val Pro Ala Leu Pro Ala Pro 275 280 285
Thr His Val Gin Thr Pro Arg Ala Asp Arg Pro Pro Glu Ala Pro Arg
290 295 300
Gly Ala Ser Gly Pro Pro Asn Thr Pro Gin Ala Gly His Pro Asn Arg
305 310 315 320
Pro Pro Asp Asp Val Trp Ala Ala Ala Leu Glu Gly Thr Pro Pro Ala 325 330 335
Lys Pro Ser Ala Pro Asp Ala Ala Ala Ser Gly Pro Pro His Ala Ala 340 345 350
Pro Pro Pro Gin Thr Pro Ala Gly Asp Ala Ala Glu Glu Ala Glu Asp 355 360 365
Leu Arg Val Leu Glu Val Gly Ala Val Pro Val Gly Arg His Arg Ala
370 375 380
Arg Tyr Ser Thr Gly Leu Pro Lys Arg Arg Arg Pro Thr Trp Thr Pro
385 390 395 400
Pro Ser Ser Val Glu Asp Leu Thr Ser Gly Glu Arg Pro Ala Pro Lys 405 410 415
Ala Pro Pro Ala Lys Ala Lys Lys Lys Ser Ala Pro Lys Lys Lys Ala 420 425 430
Pro Val Ala Ala Glu Val Pro Ala Ser Ser Pro Thr Pro Ile Ala Ala 435 440 445
Thr Val Pro Pro Ala Pro Asp Thr Pro Pro Gin Ser Gly Gin Gly Gly
450 455 460
Gly Asp Asp Gly Pro Asp Ser Ser Pro Ser Val Leu Glu Thr Leu Gly
465 470 475 480
Ala Arg Arg Pro Pro Glu Pro Pro Gly Ala Asp Leu Ala Gin Leu Phe 485 490 495 Glu Val His Pro Asn Val Ala Ala Thr Ala Val Arg Leu Ala Ala Arg
500 505 510
Asp Ala Ala Arg Glu Val Ala Ala Cys Ser Gin Leu Thr Ile Asn Ala 515 520 525 Leu Arg Ser Pro Tyr Pro Ala His Pro Gly Leu Leu Glu Leu Cys Val 530 535 540
Ile Phe Phe Phe Glu Arg Val Leu Ala Phe Leu Ile Glu Asn Gly Ala 545 550 555 560
Arg Thr His Thr Gin Ala Gly Val Ala Gly Pro Ala Ala Ala Leu Leu 565 570 575
Asp Phe Thr Leu Arg Met Leu Pro Arg Lys Thr Ala Val Gly Asp Phe
580 585 590
Leu Ala Ser Thr Arg Met Ser Leu Ala Asp Val Ala Ala His Arg Pro 595 600 605 Leu Ile Gin His Val Leu Asp Glu Asn Ser Gin Ile Gly Arg Leu Ala 610 615 620
Lys Leu Val Leu Val Ala Arg Asp Val Ile Arg Glu Thr Asp Ala Phe 625 630 635 640
Tyr Gly Asp Leu Ala Asp Leu Asp Leu Gin Leu Arg Ala Ala Pro Pro 645 650 655
Ala Asn Leu Tyr Ala Arg Leu Gly Glu Trp Leu Leu Glu Arg Ser Arg
660 665 670
Ala His Pro Asn Thr Leu Phe Ala Pro Ala Thr Pro Thr His Pro Glu 675 680 685 Pro Leu Leu His Arg Ile Gin Ala Gin Phe Arg Glu Glu Met Arg Val 690 695 700
Glu Ala Glu Ala Arg Glu Met Arg Glu Ala Leu Asp Arg Val Asp Ser 705 710 715 720
Val Ser Gin Arg Ala Gly Pro Leu Thr Val Met Pro Val Pro Ala Ala 725 730 735
Pro Gly Ala Gly Gly Arg Ala Pro Cys Pro Pro Ala Leu Gly Pro Glu
740 745 750
Ala Ile Gin Ala Arg Leu Glu Asp Val Arg Ile Gin Ala Arg Arg Ala 755 760 765 Ile Glu Ser Ala Ile Lys Glu Tyr Phe His Arg Gly Ala Val Tyr Ser 770 775 780
Ala Lys Ala Leu Gin Ala Ser Asp Ser His Asp Cys Arg Phe His Val 785 790 795 800
Ala Ser Ala Ala Val Val Pro Met Val Gin Leu Leu Glu Ser Leu Pro 805 810 815
Ala Phe Asp Gin His Thr Arg Asp Val Ala Gin Arg Ala Ala Leu Pro
820 825 830
Pro Pro Pro Pro Leu Ala Thr Ser Pro Gin Ala Ile Leu Leu Arg Asp 835 840 845
Leu Leu Gin Arg Gly Gin Thr Leu Asp Ala Pro Glu Asp Leu Ala Ala
850 855 860
Trp Leu Ser Val Leu Thr Asp Ala Ala Thr Gin Gly Leu Ile Glu Arg 865 870 875 880
Lys Pro Leu Glu Glu Leu Ala Arg Ser Ile His Gly Ile Asn Asp Gin
885 890 895
Gin Ala Arg Arg Ser Ser Gly Leu Ala Glu Leu Gin Arg Phe Asp Ala 900 905 910 Leu Asp Ala Ala Gin Gin Leu Asp Ser Asp Ala Ala Phe Val Pro Ala 915 920 925
Thr Gly Pro Ala Pro Tyr Val Asp Gly Gly Gly Leu Ser Pro Glu Ala
930 935 940
Thr Arg Met Ala Glu Asp Ala Leu Arg Gin Ala Arg Ala Met Glu Ala 945 950 955 960
Ala Lys Met Thr Ala Glu Leu Ala Pro Glu Ala Arg Ser Arg Leu Arg
965 970 975
Glu Arg Ala His Ala Leu Glu Ala Met Leu Asn Asp Ala Arg Glu Arg 980 985 990 Ala Lys Val Ala His Asp Ala Arg Glu Lys Phe Leu His Lys Leu Gin 995 1000 1005
Gly Val Leu Arg Pro Leu Pro Asp Phe Val Gly Leu Lys Ala Cys Pro
1010 1015 1020
Ala Val Leu Ala Thr Leu Arg Ala Ser Leu Pro Ala Gly Trp Thr Asp 1025 1030 1035 104
Leu Ala Asp Ala Val Arg Gly Pro Pro Pro Glu Val Thr Ala Ala Leu
1045 1050 1055
Arg Ala Asp Leu Trp Gly Leu Leu Gly Gin Tyr Arg Glu Ala Leu Glu 1060 1065 1070 His Pro Thr Pro Asp Thr Ala Thr Ala Gly Leu His Pro Ala Phe Val 1075 1080 1085
Val Val Leu Lys Thr Leu Phe Ala Asp Ala Pro Glu Thr Pro Val Leu
1090 1095 1100
Val Gin Phe Phe Ser Asp His Ala Pro Thr Ile Ala Lys Ala Val Ser 1105 1110 1115 112
Asn Ala Ile Asn Ala Gly Ser Ala Ala Val Ala Thr Asp Ala Ala Thr
1125 1130 1135
Val Asp Ala Ala Val Arg Ala His Gly Ala Asp Ala Val Ser Ala Leu 1140 1145 1150 Gly Ala Ala Ala Arg Asp Pro Asp Leu Ser Phe Leu Ala Ala Asp Ser 1155 1160 1165
Ala Ala Gly Tyr Val Lys Ala Thr Arg Leu Ala Leu Glu Arg Ala Ile 1170 1175 1180 Asp Glu Leu Thr Thr Leu Gly Ser Ala Ala Ala Asp Leu Val Val Gin
1185 1190 1195 120
Ala Arg Arg Ala Cys Ala Gin Pro Glu Gly Asp His Ala Ala Leu Ile
1205 1210 1215 Asp Ala Ala Ala Arg Ala Thr Thr Ala Ala Arg Glu Ser Leu Ala Gly
1220 1225 1230
His Glu Ala Gly Phe Gly Gly Leu Leu His Ala Glu Gly Thr Ala Gly
1235 1240 1245
Asp His Ser Pro Ser Gly Arg Ala Leu Gin Glu Leu Gly Lys Val Ile 1250 1255 1260
Gly Ala Thr Arg Arg Arg Ala Asp Glu Leu Glu Ala Ala Val Ala Asp
1265 1270 1275 128
Leu Thr Ala Lys Met Ala Ala Gin Arg Arg Ser Ser Trp Ala Ala Gly
1285 1290 1295 Val Glu Ala Ala Leu Asp Arg Val Glu Asn Arg Ala Glu Phe Asp Val
1300 1305 1310
Val Glu Leu Arg Arg Leu Gin Ala Gly Thr His Gly Tyr Asn Pro Arg
1315 1320 1325
Asp Phe Arg Lys Arg Ala Glu Gin Ala Ala Asn Ala Glu Ala Val Thr 1330 1335 1340
Leu Ala Leu Asp Thr Ala Phe Ala Phe Asn Pro Tyr Thr Pro Glu Asn
1345 1350 1355 136
Gin Arg His Pro Met Leu Pro Pro Leu Ala Ala Ile His Arg Leu Gly
1365 1370 1375 Trp Ser Ala Ala Phe His Ala Ala Ala Glu Thr Tyr Ala Asp Met Phe
1380 1385 1390
Arg Val Asp Ala Glu Pro Leu Ala Arg Leu Leu Arg Ile Ala Glu Gly
1395 1400 1405
Leu Leu Glu Met Ala Gin Ala Gly Asp Gly Phe Ile Asp Tyr His Glu 1410 1415 1420
Ala Val Gly Arg Leu Ala Asp Asp Met Thr Ser Val Pro Gly Leu Arg
1425 1430 1435 144
Arg Tyr Val Pro Phe Phe Gin His Gly Tyr Ala Asp Tyr Val Glu Leu
1445 1450 1455 Arg Asp Arg Leu Asp Ala Ile Arg Ala Asp Val His Arg Ala Leu Gly
1460 1465 1470
Gly Val Pro Leu Asp Leu Ala Ala Ala Ala Glu Gin Ile Ser Ala Ala
1475 1480 1485
Arg Asn Asp Pro Glu Ala Thr Ala Glu Leu Val Arg Thr Gly Val Thr 1490 1495 1500
Leu Pro Cys Pro Ser Glu Asp Ala Leu Val Ala Cys Ala Ala Ala Leu 1505 1510 1515 152
Glu Arg Val Asp Gin Ser Pro Val Lys Asn Thr Ala Tyr Ala Glu Tyr 1525 1530 1535
Val Ala Phe Val Thr Arg Gin Asp Thr Ala Glu Thr Lys Asp Ala Val
1540 1545 1550
Val Arg Ala Lys Gin Gin Arg Ala Glu Ala Thr Glu Arg Val Met Ala 1555 1560 1565
Gly Leu Arg Glu Ala Ala Arg Glu Arg Arg Ala Gin Ile Glu Ala Glu
1570 1575 1580
Gly Leu Ala Asn Leu Lys Thr Met Leu Lys Val Val Ala Val Pro Ala 1585 1590 1595 160 Thr Val Ala Lys Thr Leu Asp Gin Ala Arg Ser Val Ala Glu Ile Ala
1605 1610 1615
Asp Gin Val Glu Val Leu Leu Asp Gin Thr Glu Lys Thr Arg Glu Leu
1620 1625 1630
Asp Val Pro Ala Val Ile Trp Leu Glu His Ala Gin Arg Thr Phe Glu 1635 1640 1645
Thr His Pro Leu Ser Ala Arg Asp Gly Pro Gly Pro Leu Ala Arg His
1650 1655 1660
Ala Gly Arg Leu Gly Ala Leu Phe Asp Thr Arg Arg Arg Val Asp Ala 1665 1670 1675 168 Leu Arg Arg Ser Leu Glu Glu Ala Glu Ala Glu Trp Asp Glu Val Trp
1685 1690 1695
Gly Arg Phe Gly Arg Val Arg Gly Gly Ala Trp Lys Ser Pro Glu Gly
1700 1705 1710
Phe Arg Ala Met His Glu Gin Leu Arg Ala Leu Gin Asp Thr Thr Asn 1715 1720 1725
Thr Val Ser Gly Leu Arg Ala Gin Pro Ala Tyr Glu Arg Leu Ser Ala
1730 1735 1740
Arg Tyr Gin Gly Val Leu Gly Ala Lys Gly Ala Glu Arg Ala Glu Ala 1745 1750 1755 176 Val Glu Glu Leu Gly Ala Arg Val Thr Lys His Thr Ala Leu Cys Ala
1765 1770 1775
Arg Leu Arg Asp Glu Val Val Arg Arg Val Pro Trp Glu Met Asn Phe
1780 1785 1790
Asp Ala Leu Gly Arg Leu Leu Ala Glu Phe Asp Ala Ala Ala Ala Asp 1795 1800 1805
Leu Ala Pro Trp Ala Val Glu Glu Phe Arg Gly Ala Arg Glu Leu Ile
1810 1815 1820
Gin Arg Arg Met Gly Ser Ala Tyr Ala Arg Ala Gly Gly Gin Thr Gly 1825 1830 1835 184 Ala Gly Ala Ala Ala Ala Pro Ala Pro Leu Leu Val Asp Leu Arg Ala
1845 1850 1855
Leu Asp Ala Arg Ala Arg Ala Ser Ser Ser Pro Glu Gly His Glu Val 1860 1865 1870 Asp Pro Gin Leu Leu Arg Arg Arg Gly Glu Ala Tyr Leu Arg Ala Gly
1875 1880 1885
Gly Asp Pro Gly Pro Leu Val Leu Arg Glu Ala Val Ser Ala Leu Asp
1890 1895 1900 Leu Pro Phe Ala Thr Ser Phe Leu Ala Pro Asp Gly Thr Pro Leu Gln
1905 1910 1915 1920
Tyr Ala Leu Cys Phe Pro Ala Val Thr Asp Lys Leu Gly Ala Leu Leu
1925 1930 1935
Met Arg Pro Glu Ala Ala Cys Val Arg Pro Pro Leu Pro Thr Asp Val 1940 1945 1950
Leu Glu Ser Ala Pro Thr Val Thr Ala Met Tyr Val Leu Thr Val Val
1955 1960 1965
Asn Arg Leu Gln Leu Ala Leu Ser Asp Ala Gln Ala Ala Asn Phe Gln
1970 1975 1980 Leu Phe Gly Arg Phe Val Arg His Arg Gln Ala Thr Trp Gly Ala Ser
1985 1990 1995 2000
Met Asp Ala Ala Ala Glu Leu Tyr Val Val Ala Thr Thr Leu Thr Arg
2005 2010 2015
Glu Phe Gly Cys Arg Trp Ala Gln Leu Gly Trp Ala Ser Gly Ala Ala 2020 2025 2030
Ala Pro Arg Pro Pro Pro Gly Pro Arg Gly Ser Gln Arg His Cys Val
2035 2040 2045
Ala Phe Asn Glu Asn Asp Val Leu Val Val Ala Gly Val Pro Glu His
2050 2055 2060 Ile Tyr Asn Phe Trp Arg Leu Asp Leu Val Arg Gln His Glu Tyr Met
2065 2070 2075 2080
His Leu Thr Leu Glu Arg Ala Phe Glu Asp Ala Ala Glu Ser Met Leu
2085 2090 2095
Phe Val Gln Arg Leu Thr Pro His Pro Asp Ala Arg Ile Arg Val Leu 2100 2105 2110
Pro Thr Phe Leu Asp Gly Gly Pro Pro Thr Arg Gly Leu Leu Phe Gly
2115 2120 2125
Thr Arg Leu Ala Asp Trp Arg Arg Gly Lys Leu Ser Glu Thr Asp Pro
2130 2135 2140 Leu Ala Pro Trp Arg Ser Ala Leu Glu Leu Gly Thr Gln Arg Arg Asp
2145 2150 2155 2160
Ala Pro Ala Leu Gly Lys Leu Ser Pro Ala Gln Ala Ala Val Ser Val
2165 2170 2175
Leu Gly Arg Met Cys Leu Pro Ser Ala Ala Ala Leu Trp Thr Cys Met 2180 2185 2190
Phe Pro Asp Asp Tyr Thr Glu Tyr Asp Ser Phe Asp Ala Leu Leu Ala
2195 2200 2205
Ala Arg Leu Glu Ser Gly Gln Thr Leu Gly Pro Ala Gly Gly Arg Glu 2210 2215 2220
Ala Ser Leu Pro Glu Ala Pro His Ala Leu Tyr Arg Pro Thr Gly Gln 2225 2230 2235 2240
His Val Ala Val Leu Ala Ala Ala Thr Thr Pro Ala Ala Arg Val Thr 2245 2250 2255
Ala Met Asp Leu Val Leu Ala Ala Val Leu Leu Gly Ala Pro Val Val
2260 2265 2270
Val Arg Asn Thr Thr Ala Phe Ser Arg Glu Ser Glu Leu Glu Leu Cys 2275 2280 2285 Leu Thr Leu Phe Asp Ser Arg Pro Gly Gly Pro Asp Ala Ala Leu Arg 2290 2295 2300
Asp Val Val Ser Ser Asp Ile Glu Thr Trp Ala Val Gly Leu Leu His 2305 2310 2315 2320
Thr Asp Leu Asn Pro Ile Glu Asn Ala Cys Leu Ala Ala Gln Leu Pro 2325 2330 2335
Arg Leu Ser Ala Leu Ile Ala Glu Arg Pro Leu Ala Asp Gly Pro Pro
2340 2345 2350
Cys Leu Val Leu Val Asp Ile Ser Met Thr Pro Val Ala Val Leu Trp 2355 2360 2365 Glu Ala Pro Glu Pro Pro Gly Pro Pro Asp Val Arg Phe Val Gly Ser 2370 2375 2380
Glu Ala Thr Glu Glu Leu Pro Phe Val Ala Thr Ala Gly Asp Val Leu 2385 2390 2395 2400
Ala Ala Ser Ala Ala Asp Ala Asp Pro Phe Phe Ala Arg Ala Ile Leu 2405 2410 2415
Gly Arg Pro Phe Asp Ala Ser Leu Leu Thr Gly Glu Leu Phe Pro Gly
2420 2425 2430
His Pro Val Tyr Gln Arg Pro Leu Ala Asp Glu Ala Gly Pro Ser Ala 2435 2440 2445 Pro Thr Ala Ala Arg Asp Pro Arg Asp Leu Ala Gly Gly Asp Gly Gly 2450 2455 2460
Ser Gly Pro Glu Asp Pro Ala Ala Pro Pro Ala Arg Gln Ala Asp Pro 2465 2470 2475 2480
Gly Val Leu Ala Pro Thr Leu Leu Thr Asp Ala Thr Thr Gly Glu Pro 2485 2490 2495
Val Pro Pro Arg Met Trp Ala Trp Ile His Gly Leu Glu Glu Leu Ala
2500 2505 2510
Ser Asp Asp Ala Gly Gly Pro Thr Pro Asn Pro Ala Pro Ala Leu Leu 2515 2520 2525 Pro Pro Pro Ala Thr Asp Gln Ser Val Pro Thr Ser Gln Tyr Ala Pro 2530 2535 2540
Arg Pro Ile Gly Pro Ala Ala Thr Ala Arg Glu Trp Ser Val Pro Pro 2545 2550 2555 2560 Gln Gln Asn Thr Gly Arg Val Pro Val Ala Pro Arg Asp Asp Pro Arg
2565 2570 2575
Pro Ser Pro Pro Thr Pro Ser Pro Pro Ala Asp Ala Ala Leu Pro Pro 2580 2585 2590 Pro Ala Phe Ser Gly Ser Ala Ala Ala Phe Ser Ala Ala Val Pro Arg 2595 2600 2605
Val Arg Arg Ser Arg Arg Thr Arg Ala Lys Ser Arg Ala Pro Arg Ala
2610 2615 2620
Ser Ala Pro Pro Glu Gly Trp Arg Pro Pro Ala Leu Pro Ala Pro Val 2625 2630 2635 2640
Ala Pro Val Ala Ala Ser Ala Arg Pro Pro Asp Gln Pro Pro Thr Pro
2645 2650 2655
Glu Ser Ala Pro Pro Ala Trp Val Ser Ala Leu Pro Leu Pro Pro Gly 2660 2665 2670 Pro Ala Ser Arg Ala Phe Pro Ala Pro Thr Leu Ala Pro Ile Pro Pro 2675 2680 2685
Pro Pro Ala Glu Gly Ala Val Ala Pro Gly Asp Asp Arg Arg Arg Gly
2690 2695 2700
Arg Arg Gln Thr Thr Ala Gly Pro Ser Pro Thr Pro Pro Arg Gly Pro 2705 2710 2715 2720
Ala Ala Gly Pro Pro Arg Arg Leu Trp Ala Val Ala Ser Leu Ser Ala
2725 2730 2735
Ser Leu Asn Ser Leu Pro Ser Pro Arg Asp Pro Ala Asp His Ala Ala 2740 2745 2750 Ala Val Ser Ala Ala Ala Ala Ala Val Pro Pro Ser Pro Gly Leu Ala 2755 2760 2765
Pro Pro Thr Ser Ala Val Gln Thr Ser Pro Pro Pro Leu Ala Pro Gly
2770 2775 2780
Pro Val Ala Pro Ser Glu Pro Leu Cys Gly Trp Val Val Pro Gly Gly 2785 2790 2795 2800
Pro Val Ala Arg Arg Pro Pro Pro Gln Ser Pro Ala Thr Lys Pro Ala
2805 2810 2815
Ala Arg Thr Arg Ile Arg Ala Arg Ser Val Pro Gln Pro Pro Leu Pro 2820 2825 2830 Gln Pro Pro Leu Pro Gln Pro Pro Leu Pro Gln Pro Pro Leu Pro Gln 2835 2840 2845
Pro Pro Leu Pro Gln Pro Pro Leu Pro Gln Pro Pro Leu Pro Gln Pro
2850 2855 2860
Pro Leu Pro Gln Pro Pro Leu Pro Gln Pro Pro Leu Pro Gln Pro Pro 2865 2870 2875 2880
Leu Pro Gln Pro Pro Leu Pro Gln Ser Arg Asp Ser Val Pro Thr Pro
2885 2890 2895
Glu Ser Pro Thr His Thr Asn Thr His Leu Pro Val Ser Ala Val Thr 2900 2905 2910
Ser Trp Ala Ser Ser Leu Ala Leu His Val Asp Ser Ala Pro Pro Pro
2915 2920 2925
Ala Ser Leu Leu Gln Thr Leu His Ser Asp Asp Glu His Ser Asp Ala 2930 2935 2940
Asp Ser Leu Arg Phe Ser Asp Ser Asp Asp Thr Glu Ala Leu Asp Pro
2945 2950 2955 2960
Leu Pro Pro Glu Pro His Leu Pro Pro Ala Asp Glu Pro Pro Gly Pro
2965 2970 2975 Leu Ala Ala Asp His Leu Gln Ser Pro His Ser Gln Phe Gly Pro Leu
2980 2985 2990
Pro Val Gln Ala Asn Ala Val Leu Ser Arg Arg Tyr Val Arg Ser Thr
2995 3000 3005
Gly Arg Ser Ala Val Leu Ile Arg Ala Cys Arg Arg Ile Gln Gln Gln 3010 3015 3020
Leu Gln Arg Thr Arg Arg Ala Leu Phe Gln Arg Ser Asn Ala Val Leu 3025 3030 3035 3040
Thr Ser Leu His His Val Arg Met Leu Leu Gly 3045 3050
(2) INFORMATION FOR SEQ ID NO: 253:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1124 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253:
Met Ala Asn Arg Pro Ala Ala Ser Ala Gly Ala Arg Ser Pro Ser Gln 1 5 10 15 Glu Pro Arg Glu Pro Glu Val Ala Pro Pro Gly Gly Asp His Val Phe 20 25 30
Cys Arg Lys Val Ser Gly Val Met Val Leu Ser Ser Asp Pro Pro Gly
35 40 45
Pro Ala Ala Tyr Arg Ile Ser Asp Ser Ser Phe Val Gln Cys Gly Ser 50 55 60
Asn Cys Ser Met Ile Ile Asp Gly Asp Val Arg His Leu Arg Asp Leu 65 70 75 80
Glu Gly Ala Thr Ser Thr Gly Ala Phe Val Ala Ile Ser Asn Val Ala 85 90 95
Ala Gly Gly Asp Gly Arg Thr Ala Val Val Gly Gly Thr Ser Gly Pro
100 105 110
Ser Ala Thr Thr Ser Val Gly Thr Gln Thr Ser Gly Glu Phe Leu His 115 120 125
Gly Asn Pro Arg Thr Pro Glu Pro Gln Gly Pro Gln Ala Val Pro Pro
130 135 140
Pro Pro Pro Pro Pro Phe Pro Trp Gly His Glu Cys Cys Ala Arg Arg 145 150 155 160 Asp Arg Gly Ala Glu Lys Asp Val Gly Ala Ala Glu Ser Trp Ser Asp
165 170 175
Gly Pro Ser Ser Asp Ser Glu Thr Glu Asp Ser Asp Ser Ser Asp Glu
180 185 190
Asp Thr Gly Ser Gly Ser Glu Thr Leu Ser Arg Ser Ser Ser Ile Trp 195 200 205
Ala Ala Gly Ala Thr Asp Asp Asp Asp Ser Asp Ser Asp Ser Arg Ser
210 215 220
Asp Asp Ser Val Gln Pro Asp Val Val Val Arg Arg Arg Trp Ser Asp 225 230 235 240 Gly Pro Ala Pro Val Ala Phe Pro Lys Pro Arg Arg Pro Gly Asp Ser
245 250 255
Pro Gly Asn Pro Gly Leu Gly Ala Gly Thr Gly Pro Gly Ser Ala Thr
260 265 270
Asp Pro Arg Ala Ser Ala Asp Ser Asp Ser Ala Ala His Ala Ala Ala 275 280 285
Pro Gln Ala Asp Val Ala Pro Val Leu Asp Ser Gln Pro Thr Val Gly
290 295 300
Thr Asp Pro Gly Tyr Pro Val Pro Leu Glu Leu Thr Pro Glu Asn Ala 305 310 315 320 Glu Ala Val Ala Arg Phe Leu Gly Asp Ala Val Asp Arg Glu Pro Ala
325 330 335
Leu Met Leu Glu Tyr Phe Cys Arg Cys Ala Arg Glu Glu Ser Lys Arg
340 345 350
Val Pro Pro Arg Thr Phe Gly Ser Ala Pro Arg Leu Thr Glu Asp Asp 355 360 365
Phe Gly Leu Leu Asn Tyr Ala Glu Met Arg Arg Leu Cys Leu Asp Leu
370 375 380
Pro Pro Val Pro Pro Asn Ala Tyr Thr Pro Tyr His Leu Arg Glu Tyr 385 390 395 400 Ala Thr Arg Leu Val Asn Gly Phe Lys Pro Leu Val Arg Arg Ser Ala
405 410 415
Arg Leu Tyr Arg Ile Leu Gly Ile Leu Val His Leu Arg Ile Arg Thr 420 425 430 Arg Glu Ala Ser Phe Glu Glu Trp Met Arg Ser Lys Glu Val Asp Leu
435 440 445
Asp Phe Gly Leu Thr Glu Arg Leu Arg Glu His Glu Ala Gln Leu Met
450 455 460 Ile Leu Ala Gln Ala Leu Asn Pro Tyr Asp Cys Leu Ile His Ser Thr
465 470 475 480
Pro Asn Thr Leu Val Glu Arg Gly Leu Gln Ser Ala Leu Lys Tyr Glu
485 490 495
Glu Phe Tyr Leu Lys Arg Phe Gly Gly His Tyr Met Glu Ser Val Phe 500 505 510
Gln Met Tyr Thr Arg Ile Ala Gly Phe Leu Ala Cys Arg Ala Thr Arg
515 520 525
Gly Met Arg His Ile Ala Leu Gly Arg Gln Gly Ser Trp Trp Glu Met
530 535 540 Phe Lys Phe Phe Phe His Arg Leu Tyr Asp His Gln Ile Val Pro Ser
545 550 555 560
Thr Pro Ala Met Leu Asn Leu Gly Thr Arg Asn Tyr Tyr Thr Ser Ser
565 570 575
Cys Tyr Leu Val Asn Pro Gln Ala Thr Thr Asn Gln Ala Thr Leu Arg 580 585 590
Ala Ile Thr Gly Asn Val Ser Ala Ile Leu Ala Arg Asn Gly Gly Ile
595 600 605
Gly Leu Cys Met Gln Ala Phe Asn Asp Asp Gly Thr Ala Ser Ile Met
610 615 620 Pro Ala Leu Lys Val Leu Asp Ser Leu Val Ala Ala His Asn Lys Gln
625 630 635 640
Ser Trp Thr Gly Ala Cys Val Tyr Leu Glu Pro Trp His Ser Asp Val
645 650 655
Arg Ala Val Leu Arg Met Lys Gly Val Leu Ala Gly Glu Glu Ala Gln 660 665 670
Arg Cys Asp Asn Ile Phe Ser Ala Leu Trp Met Pro Asp Leu Phe Phe
675 680 685
Lys Arg Leu Ile Arg His Leu Asp Gly Glu Lys Asn Val Thr Trp Ser
690 695 700 Leu Phe Asp Arg Asp Thr Ser Met Ser Leu Ala Asp Phe His Gly Glu
705 710 715 720
Glu Phe Glu Lys Leu Tyr Glu His Leu Glu Ala Met Gly Phe Gly Glu
725 730 735
Thr Ile Pro Ile Gln Asp Leu Ala Tyr Ala Ile Val Arg Ser Ala Ala 740 745 750
Thr Thr Gly Ser Pro Phe Ile Met Phe Lys Asp Ala Val Asn Arg His
755 760 765
Tyr Ile Tyr Asp Thr Gln Gly Ala Ala Ile Ala Gly Ser Asn Leu Cys 770 775 780
Thr Glu Ile Val His Pro Ser Ser Lys Arg Ser Ser Gly Val Cys Asn 785 790 795 800
Leu Gly Ser Val Asn Leu Ala Arg Cys Val Ser Arg Arg Thr Phe Asp 805 810 815
Phe Gly Met Leu Arg Asp Ala Val Gln Ala Cys Val Leu Met Val Asn
820 825 830
Ile Met Ile Asp Ser Thr Leu Gln Pro Thr Pro Gln Cys Arg His Asp 835 840 845 Asn Leu Arg Ser Met Gly Ile Gly Met Gln Gly Leu His Thr Ala Cys 850 855 860
Leu Lys Met Gly Leu Asp Leu Glu Ser Ala Glu Phe Arg Asp Leu Asn 865 870 875 880
Thr His Ile Ala Glu Val Met Leu Leu Ala Ala Met Lys Thr Ser Asn 885 890 895
Ala Leu Cys Val Arg Gly Ala Arg Pro Phe Ser His Phe Lys Arg Ser
900 905 910
Met Tyr Arg Ala Gly Arg Phe His Trp Glu Arg Phe Ser Asn Asp Arg 915 920 925 Tyr Glu Gly Glu Trp Glu Met Leu Arg Gln Ser Met Met Lys His Gly 930 935 940
Leu Arg Asn Ser Gln Phe Ile Ala Leu Met Pro Thr Ala Ala Ser Ala 945 950 955 960
Gln Ile Ser Asp Val Ser Glu Gly Phe Ala Pro Leu Phe Thr Asn Leu 965 970 975
Phe Ser Lys Val Thr Arg Asp Gly Glu Thr Leu Arg Pro Asn Thr Leu
980 985 990
Leu Leu Lys Glu Leu Glu Arg Thr Phe Gly Gly Lys Arg Leu Leu Asp 995 1000 1005 Ala Met Asp Gly Leu Glu Ala Lys Gln Trp Ser Val Ala Gln Ala Leu 1010 1015 1020
Pro Cys Leu Asp Pro Ala His Pro Leu Arg Arg Phe Lys Thr Ala Phe 1025 1030 1035 1040
Asp Tyr Asp Gln Glu Leu Leu Ile Asp Leu Cys Ala Asp Arg Ala Pro 1045 1050 1055
Tyr Val Asp His Ser Gln Ser Met Thr Leu Tyr Val Thr Glu Lys Ala
1060 1065 1070
Asp Gly Thr Leu Pro Ala Ser Thr Leu Val Arg Leu Leu Val His Ala 1075 1080 1085 Tyr Lys Arg Gly Leu Lys Thr Gly Met Tyr Tyr Cys Lys Val Arg Lys 1090 1095 1100
Ala Thr Asn Ser Gly Val Phe Ala Gly Asp Asp Asn Ile Val Cys Thr 1105 1110 1115 1120 Ser Cys Ala Leu
(2) INFORMATION FOR SEQ ID NO: 254:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1124 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:
Met Ala Asn Arg Pro Ala Ala Ser Ala Gly Ala Arg Ser Pro Ser Gln
1 5 10 15
Glu Pro Arg Glu Pro Glu Val Ala Pro Pro Gly Gly Asp His Val Phe 20 25 30 Cys Arg Lys Val Ser Gly Val Met Val Leu Ser Ser Asp Pro Pro Gly 35 40 45
Pro Ala Ala Tyr Arg Ile Ser Asp Ser Ser Phe Val Gln Cys Gly Ser
50 55 60
Asn Cys Ser Met Ile Ile Asp Gly Asp Val Arg His Leu Arg Asp Leu 65 70 75 80
Glu Gly Ala Thr Ser Thr Gly Ala Phe Val Ala Ile Ser Asn Val Ala
85 90 95
Ala Gly Gly Asp Gly Arg Thr Ala Val Val Gly Gly Thr Ser Gly Pro 100 105 110 Ser Ala Thr Thr Ser Val Gly Thr Gln Thr Ser Gly Glu Phe Leu His 115 120 125
Gly Asn Pro Arg Thr Pro Glu Pro Gln Gly Pro Gln Ala Val Pro Pro
130 135 140
Pro Pro Pro Pro Pro Phe Pro Trp Gly His Glu Cys Cys Ala Arg Arg 145 150 155 160
Asp Arg Gly Ala Glu Lys Asp Val Gly Ala Ala Glu Ser Trp Ser Asp
165 170 175
Gly Pro Ser Ser Asp Ser Glu Thr Glu Asp Ser Asp Ser Ser Asp Glu 180 185 190 Asp Thr Gly Ser Gly Ser Glu Thr Leu Ser Arg Ser Ser Ser Ile Trp 195 200 205
Ala Ala Gly Ala Thr Asp Asp Asp Asp Ser Asp Ser Asp Ser Arg Ser 210 215 220 Asp Asp Ser Val Gln Pro Asp Val Val Val Arg Arg Arg Trp Ser Asp
225 230 235 240
Gly Pro Ala Pro Val Ala Phe Pro Lys Pro Arg Arg Pro Gly Asp Ser
245 250 255 Pro Gly Asn Pro Gly Leu Gly Ala Gly Thr Gly Pro Gly Ser Ala Thr
260 265 270
Asp Pro Arg Ala Ser Ala Asp Ser Asp Ser Ala Ala His Ala Ala Ala
275 280 285
Pro Gln Ala Asp Val Ala Pro Val Leu Asp Ser Gln Pro Thr Val Gly 290 295 300
Thr Asp Pro Gly Tyr Pro Val Pro Leu Glu Leu Thr Pro Glu Asn Ala
305 310 315 320
Glu Ala Val Ala Arg Phe Leu Gly Asp Ala Val Asp Arg Glu Pro Ala
325 330 335 Leu Met Leu Glu Tyr Phe Cys Arg Cys Ala Arg Glu Glu Ser Lys Arg
340 345 350
Val Pro Pro Arg Thr Phe Gly Ser Ala Pro Arg Leu Thr Glu Asp Asp
355 360 365
Phe Gly Leu Leu Asn Tyr Ala Glu Met Arg Arg Leu Cys Leu Asp Leu 370 375 380
Pro Pro Val Pro Pro Asn Ala Tyr Thr Pro Tyr His Leu Arg Glu Tyr
385 390 395 400
Ala Thr Arg Leu Val Asn Gly Phe Lys Pro Leu Val Arg Arg Ser Ala
405 410 415 Arg Leu Tyr Arg Ile Leu Gly Ile Leu Val His Leu Arg Ile Arg Thr
420 425 430
Arg Glu Ala Ser Phe Glu Glu Trp Met Arg Ser Lys Glu Val Asp Leu
435 440 445
Asp Phe Gly Leu Thr Glu Arg Leu Arg Glu His Glu Ala Gln Leu Met 450 455 460
Ile Leu Ala Gln Ala Leu Asn Pro Tyr Asp Cys Leu Ile His Ser Thr
465 470 475 480
Pro Asn Thr Leu Val Glu Arg Gly Leu Gln Ser Ala Leu Lys Tyr Glu
485 490 495 Glu Phe Tyr Leu Lys Arg Phe Gly Gly His Tyr Met Glu Ser Val Phe
500 505 510
Gln Met Tyr Thr Arg Ile Ala Gly Phe Leu Ala Cys Arg Ala Thr Arg
515 520 525
Gly Met Arg His Ile Ala Leu Gly Arg Gln Gly Ser Trp Trp Glu Met 530 535 540
Phe Lys Phe Phe Phe His Arg Leu Tyr Asp His Gln Ile Val Pro Ser 545 550 555 560
Thr Pro Ala Met Leu Asn Leu Gly Thr Arg Asn Tyr Tyr Thr Ser Ser 565 570 575
Cys Tyr Leu Val Asn Pro Gln Ala Thr Thr Asn Gln Ala Thr Leu Arg
580 585 590
Ala Ile Thr Gly Asn Val Ser Ala Ile Leu Ala Arg Asn Gly Gly Ile 595 600 605
Gly Leu Cys Met Gln Ala Phe Asn Asp Asp Gly Thr Ala Ser Ile Met
610 615 620
Pro Ala Leu Lys Val Leu Asp Ser Leu Val Ala Ala His Asn Lys Gln 625 630 635 640 Ser Trp Thr Gly Ala Cys Val Tyr Leu Glu Pro Trp His Ser Asp Val
645 650 655
Arg Ala Val Leu Arg Met Lys Gly Val Leu Ala Gly Glu Glu Ala Gln
660 665 670
Arg Cys Asp Asn Ile Phe Ser Ala Leu Trp Met Pro Asp Leu Phe Phe 675 680 685
Lys Arg Leu Ile Arg His Leu Asp Gly Glu Lys Asn Val Thr Trp Ser
690 695 700
Leu Phe Asp Arg Asp Thr Ser Met Ser Leu Ala Asp Phe His Gly Glu 705 710 715 720 Glu Phe Glu Lys Leu Tyr Glu His Leu Glu Ala Met Gly Phe Gly Glu
725 730 735
Thr Ile Pro Ile Gln Asp Leu Ala Tyr Ala Ile Val Arg Ser Ala Ala
740 745 750
Thr Thr Gly Ser Pro Phe Ile Met Phe Lys Asp Ala Val Asn Arg His 755 760 765
Tyr Ile Tyr Asp Thr Gln Gly Ala Ala Ile Ala Gly Ser Asn Leu Cys
770 775 780
Thr Glu Ile Val His Pro Ser Ser Lys Arg Ser Ser Gly Val Cys Asn 785 790 795 800 Leu Gly Ser Val Asn Leu Ala Arg Cys Val Ser Arg Arg Thr Phe Asp
805 810 815
Phe Gly Met Leu Arg Asp Ala Val Gln Ala Cys Val Leu Met Val Asn
820 825 830
Ile Met Ile Asp Ser Thr Leu Gln Pro Thr Pro Gln Cys Arg His Asp 835 840 845
Asn Leu Arg Ser Met Gly Ile Gly Met Gln Gly Leu His Thr Ala Cys
850 855 860
Leu Lys Met Gly Leu Asp Leu Glu Ser Ala Glu Phe Arg Asp Leu Asn 865 870 875 880 Thr His Ile Ala Glu Val Met Leu Leu Ala Ala Met Lys Thr Ser Asn
885 890 895
Ala Leu Cys Val Arg Gly Ala Arg Pro Phe Ser His Phe Lys Arg Ser 900 905 910 Met Tyr Arg Ala Gly Arg Phe His Trp Glu Arg Phe Ser Asn Asp Arg
915 920 925
Tyr Glu Gly Glu Trp Glu Met Leu Arg Gln Ser Met Met Lys His Gly
930 935 940 Leu Arg Asn Ser Gln Phe Ile Ala Leu Met Pro Thr Ala Ala Ser Ala
945 950 955 960
Gln Ile Ser Asp Val Ser Glu Gly Phe Ala Pro Leu Phe Thr Asn Leu
965 970 975
Phe Ser Lys Val Thr Arg Asp Gly Glu Thr Leu Arg Pro Asn Thr Leu 980 985 990
Leu Leu Lys Glu Leu Glu Arg Thr Phe Gly Gly Lys Arg Leu Leu Asp
995 1000 1005
Ala Met Asp Gly Leu Glu Ala Lys Gln Trp Ser Val Ala Gln Ala Leu
1010 1015 1020 Pro Cys Leu Asp Pro Ala His Pro Leu Arg Arg Phe Lys Thr Ala Phe
1025 1030 1035 1040
Asp Tyr Asp Gln Glu Leu Leu Ile Asp Leu Cys Ala Asp Arg Ala Pro
1045 1050 1055
Tyr Val Asp His Ser Gln Ser Met Thr Leu Tyr Val Thr Glu Lys Ala 1060 1065 1070
Asp Gly Thr Leu Pro Ala Ser Thr Leu Val Arg Leu Leu Val His Ala
1075 1080 1085
Tyr Lys Arg Gly Leu Lys Thr Gly Met Tyr Tyr Cys Lys Val Arg Lys 1090 1095 1100 Ala Thr Asn Ser Gly Val Phe Ala Gly Asp Asp Asn Ile Val Cys Thr 1105 1110 1115 1120
Ser Cys Ala Leu
(2) INFORMATION FOR SEQ ID NO: 255:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 333 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255:
Met Asp Pro Ala Val Ser Pro Ala Ser Thr Asp Pro Leu Asp Thr His 1 5 10 15 Ala Ser Gly Ala Gly Ala Ala Pro Ile Pro Val Cys Pro Thr Pro Glu
20 25 30
Arg Tyr Phe Tyr Thr Ser Gln Cys Pro Asp Ile Asn His Leu Arg Ser 35 40 45 Leu Ser Ile Leu Asn Arg Trp Leu Glu Thr Glu Leu Val Phe Val Gly 50 55 60
Asp Glu Glu Asp Val Ser Lys Leu Ser Glu Gly Glu Leu Gly Phe Tyr 65 70 75 80
Arg Phe Leu Phe Ala Phe Leu Ser Ala Ala Asp Asp Leu Val Thr Glu 85 90 95
Asn Leu Gly Gly Leu Ser Gly Leu Phe Glu Gln Lys Asp Ile Leu His
100 105 110
Tyr Tyr Val Glu Gln Glu Cys Ile Glu Val Val His Ser Arg Val Tyr 115 120 125 Asn Ile Ile Gln Leu Val Leu Phe His Asn Asn Asp Gln Ala Arg Arg 130 135 140
Ala Tyr Val Ala Arg Thr Ile Asn His Pro Ala Ile Arg Val Lys Val 145 150 155 160
Asp Trp Leu Glu Ala Arg Val Arg Glu Cys Asp Ser Ile Pro Glu Lys 165 170 175
Phe Ile Leu Met Ile Leu Ile Glu Gly Val Phe Phe Ala Ala Ser Phe
180 185 190
Ala Ala Ile Ala Tyr Leu Arg Thr Asn Asn Leu Leu Arg Val Thr Cys 195 200 205 Gln Ser Asn Asp Leu Ile Ser Arg Asp Glu Ala Val His Thr Thr Ala 210 215 220
Ser Cys Tyr Ile Tyr Asn Asn Tyr Leu Gly Gly His Ala Lys Pro Glu 225 230 235 240
Ala Ala Arg Val Tyr Arg Leu Phe Arg Glu Ala Val Asp Ile Glu Ile 245 250 255
Gly Phe Ile Arg Ser Gln Ala Pro Thr Asp Ser Ser Ile Leu Ser Pro
260 265 270
Gly Ala Ala Ile Glu Asn Tyr Val Arg Phe Ser Ala Asp Arg Leu Leu 275 280 285 Gly Leu Ile His Met Gln Pro Lys Ala Pro Ala Pro Asp Ala Ser Phe 290 295 300
Pro Leu Ser Leu Met Ser Thr Asp Lys His Thr Asn Phe Phe Glu Cys 305 310 315 320
Arg Ser Thr Ser Tyr Ala Gly Ala Val Val Asn Asp Leu 325 330
(2) INFORMATION FOR SEQ ID NO: 256:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 357 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256:
Met Arg Arg Arg Gly His Ala Phe Ala Pro Gly Asp Arg Gly Thr Arg
1 5 10 15
Ala Ala Gly Pro Gly Pro Ala Ala Pro Trp Gly Ala Pro Ser Lys Pro 20 25 30 Ala Leu Arg Leu Ala His Leu Phe Cys Ile Arg Val Leu Arg Ala Leu 35 40 45
Gly Tyr Ala Tyr Ile Asn Ser Gly Gln Leu Glu Ala Asp Asp Ala Cys
50 55 60
Ala Asn Leu Tyr His Thr Asn Thr Val Ala Tyr Val His Thr Thr Asp 65 70 75 80
Thr Asp Leu Leu Leu Met Gly Cys Asp Ile Val Leu Asp Ile Ser Thr
85 90 95
Gly Tyr Ile Pro Thr Ile His Cys Arg Asp Leu Leu Gln Tyr Phe Lys 100 105 110 Met Ser Tyr Pro Gln Phe Leu Ala Leu Phe Val Arg Cys His Thr Asp 115 120 125
Leu His Pro Asn Asn Thr Tyr Ala Ser Val Glu Asp Val Leu Arg Glu
130 135 140
Cys His Trp Thr Ala Pro Ser Arg Ser Gln Ala Arg Arg Ala Ala Arg 145 150 155 160
Arg Glu Arg Ala Asn Ser Arg Ser Leu Glu Ser Met Pro Thr Leu Thr
165 170 175
Ala Ala Pro Val Gly Leu Glu Thr Arg Ile Ser Trp Thr Glu Ile Leu 180 185 190 Ala Gln Gln Ile Ala Gly Glu Asp Asp Tyr Glu Glu Asp Pro Pro Leu 195 200 205
Gln Pro Pro Asp Val Ala Gly Gly Pro Arg Asp Gly Ala Arg Ser Ser
210 215 220
Ser Ser Glu Ile Leu Thr Pro Pro Glu Leu Val Gln Val Pro Asn Ala 225 230 235 240
Gln Arg Val Ala Glu His Arg Gly Tyr Val Ala Gly Arg Arg Arg His
245 250 255
Val Ile His Asp Ala Pro Glu Ala Leu Asp Trp Leu Pro Asp Pro Met 260 265 270
Thr Ile Ala Glu Leu Val Glu His Arg Tyr Val Lys Tyr Val Ile Ser
275 280 285
Leu Ile Ser Pro Lys Glu Arg Gly Pro Trp Thr Leu Leu Lys Arg Leu 290 295 300
Pro Ile Tyr Gln Asp Leu Arg Asp Glu Asp Leu Ala Arg Ser Ile Val 305 310 315 320
Thr Arg His Ile Thr Ala Pro Asp Ile Ala Asp Arg Phe Leu Ala Gln 325 330 335 Leu Trp Ala His Ala Pro Pro Pro Ala Phe Tyr Lys Asp Val Leu Ala 340 345 350
Lys Phe Trp Asp Glu 355
(2) INFORMATION FOR SEQ ID NO: 257:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 466 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257:
Met Ala His Leu Pro Gly Gly Ala Ala Ala Ala Pro Leu Ser Glu Asp
1 5 10 15
Ala Ile Pro Ser Pro Arg Glu Arg Thr Glu Asp Trp Pro Pro Cys Gln 20 25 30
Ile Val Leu Gln Gly Ala Glu Leu Asn Gly Ile Leu Gln Ala Phe Ala
35 40 45
Pro Leu Arg Thr Ser Leu Leu Asp Ser Leu Leu Val Val Gly Asp Arg 50 55 60 Gly Ile Leu Val His Asn Ala Ile Phe Gly Glu Gln Val Phe Leu Pro 65 70 75 80
Leu Asp His Ser Gln Phe Ser Arg Tyr Arg Trp Gly Gly Pro Thr Ala
85 90 95
Ala Phe Leu Ser Leu Val Asp Gln Lys Arg Ser Leu Leu Ser Val Phe 100 105 110
Arg Ala Asn Gln Tyr Pro Asp Leu Arg Arg Val Glu Leu Thr Val Thr
115 120 125
Gly Gln Ala Pro Phe Arg Thr Leu Val Gln Arg Ile Trp Thr Thr Ala 130 135 140
Ser Asp Gly Glu Ala Val Glu Leu Ala Ser Glu Thr Leu Met Lys Arg 145 150 155 160
Glu Leu Thr Ser Phe Ala Val Leu Leu Pro Gln Gly Asp Pro Asp Val 165 170 175
Gln Leu Arg Leu Thr Lys Pro Gln Leu Thr Lys Val Val Asn Ala Val
180 185 190
Gly Asp Glu Thr Ala Lys Pro Thr Thr Phe Glu Leu Gly Pro Asn Gly 195 200 205 Lys Phe Ser Val Phe Asn Ala Arg Thr Cys Val Thr Phe Ala Ala Arg 210 215 220
Glu Glu Gly Ala Ser Ser Ser Thr Ser Ala Gln Val Gln Ile Leu Thr 225 230 235 240
Ser Ala Leu Lys Lys Ala Gly Gln Ala Ala Ala Asn Ala Lys Thr Val 245 250 255
Tyr Gly Glu Asn Thr Thr Phe Ser Val Val Val Asp Asp Cys Ser Met
260 265 270
Arg Ala Val Leu Arg Arg Leu Gln Val Gly Gly Gly Thr Leu Lys Phe 275 280 285 Phe Leu Thr Ala Asp Val Pro Ser Val Cys Val Thr Ala Thr Gly Pro 290 295 300
Asn Ala Val Ser Ala Val Phe Leu Leu Lys Pro Gln Arg Val Cys Leu 305 310 315 320
Asn Trp Leu Gly Arg Thr Pro Gly Ser Ser Thr Gly Ser Leu Ala Ser 325 330 335
Gln Asp Ser Arg Ala Gly Pro Thr Asp Ser Gln Asp Phe Ser Ser Glu
340 345 350
Pro Asp Ala Gly Asp Arg Gly Ala Pro Glu Glu Glu Gly Leu Glu Gly 355 360 365 Gln Ala Arg Val Pro Pro Ala Phe Pro Glu Pro Pro Gly Thr Lys Arg 370 375 380
Arg His Ala Gly Ala Glu Val Val Pro Ala Asp Asp Ala Thr Lys Arg 385 390 395 400
Pro Lys Thr Gly Val Pro Ala Ala Pro Thr Arg Ala Glu Ser Pro Pro 405 410 415
Leu Ser Ala Arg Tyr Gly Pro Glu Ala Ala Glu Gly Gly Gly Asp Gly
420 425 430
Gly Arg Tyr Ala Cys Tyr Phe Arg Asp Leu Gln Thr Gly Asp Asp Ser 435 440 445 Pro Leu Ser Ala Phe Arg Gly Pro Gln Arg Pro Pro Tyr Gly Phe Gly 450 455 460
Leu Pro 465
(2) INFORMATION FOR SEQ ID NO: 258:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 170 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258:
Met Ala Phe Arg Ala Ser Gly Pro Ala Tyr Gln Pro Leu Ala Pro Ala 1 5 10 15
Asp Ala Arg Ala Arg Val Pro Ala Val Ala Trp Ile Gly Val Gly Ala
20 25 30
Ile Val Gly Ala Phe Ala Leu Val Ala Ala Leu Val Leu Val Pro Pro 35 40 45 Arg Ser Ser Trp Gly Leu Ser Pro Cys Asp Ser Gly Trp Gln Glu Phe 50 55 60
Asn Ala Gly Cys Val Ala Trp Asp Pro Thr Pro Val Glu His Glu Gln 65 70 75 80
Ala Val Gly Gly Cys Ser Ala Pro Ala Thr Leu Ile Pro Arg Ala Ala 85 90 95
Ala Lys His Leu Ala Ala Leu Thr Arg Val Gln Ala Glu Arg Ser Ser
100 105 110
Gly Tyr Trp Trp Val Asn Gly Asp Gly Ile Arg Thr Cys Leu Arg Leu 115 120 125 Val Asp Ser Val Ser Gly Ile Asp Glu Phe Cys Glu Glu Leu Ala Ile 130 135 140
Arg Ile Cys Tyr Tyr Pro Arg Ser Pro Gly Gly Phe Val Arg Phe Val 145 150 155 160
Thr Ser Ile Arg Asn Ala Leu Gly Leu Pro 165 170
(2) INFORMATION FOR SEQ ID NO: 259:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 713 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259:
Met Gln Arg Arg Arg Ala Ser Ser Leu Arg Leu Ala Arg Cys Leu Thr
1 5 10 15
Pro Ala Asn Leu Ile Arg Gly Ala Asn Ala Gly Val Pro Glu Arg Arg 20 25 30 Ile Phe Ala Gly Cys Leu Leu Pro Thr Pro Glu Gly Leu Leu Ser Ala 35 40 45
Ala Val Gly Val Leu Arg Gln Arg Ala Asp Asp Leu Gln Pro Ala Phe
50 55 60
Leu Thr Gly Ala Asp Arg Ser Val Arg Leu Ala Ala Arg His His Asn 65 70 75 80
Thr Val Pro Glu Ser Leu Ile Val Asp Gly Leu Ala Ser Asp Pro His
85 90 95
Tyr Asp Tyr Ile Arg His Tyr Ala Ser Ala Ala Lys Gln Ala Leu Gly 100 105 110 Glu Val Glu Leu Ser Gly Gly Gln Leu Ser Arg Ala Ile Leu Ala Gln 115 120 125
Tyr Trp Lys Tyr Leu Gln Thr Val Val Pro Ser Gly Leu Asp Ile Pro
130 135 140
Asp Asp Pro Ala Gly Asp Cys Asp Pro Ser Leu His Val Leu Leu Arg 145 150 155 160
Pro Thr Leu Leu Pro Lys Leu Leu Val Arg Ala Pro Phe Lys Ser Gly
165 170 175
Ala Ala Ala Ala Lys Tyr Ala Ala Ala Val Ala Gly Leu Arg Asp Ala 180 185 190 Ala His Arg Leu Gln Gln Tyr Met Phe Phe Met Arg Pro Ala Asp Pro 195 200 205
Ser Arg Pro Ser Thr Asp Thr Ala Leu Arg Leu Ser Glu Phe Leu Ala
210 215 220
Tyr Val Ser Val Leu Tyr His Trp Ala Ser Trp Met Leu Trp Thr Ala 225 230 235 240
Asp Lys Tyr Val Cys Arg Arg Leu Gly Pro Ala Asp Arg Arg Phe Val
245 250 255
Ser Gly Ser Leu Glu Ala Pro Ala Glu Thr Phe Ala Arg His Leu Asp 260 265 270 Arg Gly Pro Ser Gly Thr Thr Gly Ser Met Gln Cys Met Ala Leu Arg 275 280 285
Ala Ala Val Ser Asp Val Leu Gly His Leu Thr Arg Leu Ala His Leu 290 295 300 Trp Glu Thr Gly Lys Arg Ser Gly Gly Thr Tyr Gly Ile Val Asp Ala
305 310 315 320
Ile Val Ser Thr Val Glu Val Leu Ser Ile Val His His His Ala Gln
325 330 335 Tyr Ile Ile Asn Ala Thr Leu Thr Gly Tyr Val Val Trp Ala Ser Asp
340 345 350
Ser Leu Asn Asn Glu Tyr Leu Arg Ala Ala Val Asp Ser Gln Glu Arg
355 360 365
Phe Cys Arg Thr Ala Ala Pro Leu Phe Pro Thr Met Thr Ala Pro Ser 370 375 380
Trp Ala Arg Met Glu Leu Ser Ile Lys Ser Trp Phe Gly Ala Ala Pro
385 390 395 400
Asp Leu Leu Arg Ser Gly Thr Pro Ser Pro His Tyr Glu Ser Ile Leu
405 410 415 Arg Leu Ala Ala Ser Gly Pro Pro Gly Gly Arg Gly Ala Val Gly Gly
420 425 430
Ser Cys Arg Asp Lys Ile Gln Arg Thr Arg Arg Asp Asn Ala Pro Pro
435 440 445
Pro Leu Pro Arg Ala Arg Pro His Ser Thr Pro Ala Ala Pro Arg Arg 450 455 460
Phe Arg Arg His Arg Glu Asp Leu Pro Glu Pro Pro His Val Asp Ala
465 470 475 480
Ala Asp Arg Gly Pro Glu Pro Cys Ala Gly Arg Pro Ala Thr Tyr Tyr
485 490 495 Thr His Met Ala Gly Ala Pro Pro Arg Leu Pro Pro Arg Asn Pro Ala
500 505 510
Pro Pro Glu Gln Arg Pro Ala Ala Ala Ala Arg Pro Leu Ala Ala Gln
515 520 525
Arg Glu Ala Ala Gly Val Tyr Asp Ala Val Arg Thr Trp Gly Pro Asp 530 535 540
Ala Glu Ala Glu Pro Asp Gln Met Glu Asn Thr Tyr Leu Leu Pro Asp
545 550 555 560
Asp Asp Ala Ala Met Pro Ala Gly Val Gly Leu Gly Ala Thr Pro Ala
565 570 575 Ala Asp Thr Thr Ala Ala Ala Trp Pro Ala Glu Ser His Ala Pro Arg
580 585 590
Ala Pro Ser Glu Asp Ala Asp Ser Ile Tyr Glu Ser Val Ser Glu Asp
595 600 605
Gly Gly Arg Val Tyr Glu Glu Ile Pro Trp Val Arg Val Tyr Glu Asn 610 615 620
Ile Cys Leu Arg Arg Gln Asp Ala Gly Gly Ala Ala Pro Pro Gly Asp 625 630 635 640
Ala Pro Asp Ser Pro Tyr Ile Glu Ala Glu Asn Pro Leu Tyr Asp Trp 645 650 655
Gly Gly Ser Ala Leu Phe Ser Pro Pro Gly Ala Thr Arg Ala Pro Asp
660 665 670
Pro Gly Leu Ser Leu Ser Pro Met Pro Ala Arg Pro Arg Thr Asn Ala
675 680 685
Asn Asp Gly Pro Thr Asn Val Ala Ala Leu Ser Ala Leu Leu Thr Lys
690 695 700
Leu Lys Arg Gly Arg His Gln Ser His 705 710
(2) INFORMATION FOR SEQ ID NO: 260:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 352 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 260:
Val Gly Ala Ala Ala Val Pro Leu Leu Ser Ala Gly Gly Ala Ala Pro 1 5 10 15 Pro His Pro Gly Pro Asp Ala Ala Val Phe Arg Ser Ser Leu Gly Ser 20 25 30
Leu Leu Tyr Trp Pro Gly Val Arg Ala Leu Leu Gly Arg Asp Cys Arg
35 40 45
Val Ala Ala Arg Tyr Ala Gly Arg Met Thr Tyr Ile Ala Thr Gly Ala 50 55 60
Leu Leu Ala Arg Phe Asn Pro Gly Ala Val Lys Cys Val Leu Pro Arg 65 70 75 80
Glu Ala Ala Phe Ala Gly Arg Val Leu Asp Val Leu Ala Val Leu Ala 85 90 95 Glu Gln Thr Val Gln Trp Leu Ser Val Val Val Gly Ala Arg Leu His 100 105 110
Pro His Ser Ala His Pro Ala Phe Val Asp Val Glu Gln Glu Ala Leu
115 120 125
Phe Arg Ala Leu Pro Leu Gly Ser Pro Gly Val Val Ala Ala Glu His 130 135 140
Glu Ala Leu Gly Asp Thr Ala Ala Arg Arg Leu Leu Ala Thr Ser Gln 145 150 155 160
Ala Val Leu Gly Ala Ala Val Tyr Ala Leu His Thr Ala Thr Val Thr 165 170 175
Leu Lys Tyr Ala Cys Gly Asp Ala Arg Arg Arg Arg Asp His Ala Ala
180 185 190
Ala Ala Arg Ala Val Leu Ala Thr Gly Leu Ile Leu Gln Arg Leu Leu 195 200 205
Gly Leu Ala Asp Thr Val Val Ala Cys Val Ala Ala Phe Asp Gly Gly
210 215 220
Ser Thr Ala Pro Glu Val Gly Thr Tyr Thr Pro Leu Arg Tyr Ala Cys 225 230 235 240 Val Leu Arg Ala Thr Gln Pro Leu Tyr Ala Arg Thr Thr Pro Ala Lys
245 250 255
Phe Trp Ala Asp Val Arg Ala Ala Ala Glu His Val Asp Leu Arg Pro
260 265 270
Ala Ser Ser Ala Pro Arg Ala Pro Val Ser Gly Thr Ala Asp Pro Ala 275 280 285
Phe Leu Leu Glu Asp Leu Ala Ala Phe Pro Pro Ala Pro Leu Asn Ser
290 295 300
Glu Ser Val Leu Gly Pro Arg Val Arg Val Val Asp Ile Met Ala Gln 305 310 315 320 Phe Arg Lys Leu Leu Met Gly Asp Glu Glu Thr Ala Ala Leu Arg Ala
325 330 335
His Val Ser Gly Arg Arg Ala Thr Gly Leu Gly Gly Pro Pro Arg Pro 340 345 350
(2) INFORMATION FOR SEQ ID NO: 261:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 457 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261:
Met Ser Val Arg Gly His Ala Val Arg Arg Arg Arg Ala Ser Thr Arg
1 5 10 15
Ser His Ala Pro Ser Ala His Arg Ala Asp Ser Pro Val Glu Asp Glu 20 25 30
Pro Glu Gly Gly Gly Val Gly Leu Met Gly Tyr Leu Arg Ala Val Phe
35 40 45
Asn Val Asp Asp Asp Ser Glu Val Glu Ala Ala Gly Glu Met Ala Ser 50 55 60
Glu Glu Pro Pro Pro Arg Arg Arg Arg Glu Arg His Pro Gly Ser Arg 65 70 75 80
Arg Ala Ser Glu Ala Arg Ala Ala Ala Pro Pro Arg Arg Ala Ser Phe 85 90 95
Pro Arg Pro Arg Ser Val Thr Ala Arg Ser Gln Ser Val Arg Gly Arg 100 105 110
Arg Asp Ser Ala Ile Thr Arg Ala Pro Arg Gly Gly Tyr Leu Gly Pro 115 120 125 Met Asp Pro Arg Asp Val Leu Gly Arg Val Gly Gly Ser Arg Val Val 130 135 140
Pro Ser Pro Leu Phe Leu Asp Glu Leu Ser Tyr Glu Glu Asp Asp Tyr 145 150 155 160
Pro Ala Ala Val Ala His Asp Asp Gly Ala Gly Ala Arg Pro Pro Ala 165 170 175
Thr Val Glu Ile Leu Ala Gly Arg Val Ser Gly Pro Glu Leu Gln Ala 180 185 190
Ala Phe Pro Leu Asp Arg Leu Thr Pro Arg Val Ala Ala Trp Asp Glu 195 200 205 Ser Val Arg Ser Ala Leu Gly His Pro Ala Gly Phe Tyr Pro Cys Pro 210 215 220
Asp Ser Ala Phe Gly Leu Ser Arg Val Gly Val Met His Phe Asp Ala 225 230 235 240
Asp Pro Lys Val Phe Phe Arg Gln Thr Leu Gln Gln Gly Glu Ala Trp 245 250 255
Tyr Val Thr Gly Asp Ala Ile Leu Asp Leu Thr Asp Arg Arg Ala Lys 260 265 270
Thr Ser Pro Ser Arg Ala Met Gly Phe Leu Val Asp Ala Ile Val Arg 275 280 285 Val Ala Ile Asn Gly Trp Val Cys Gly Thr Arg Leu His Thr Glu Gly 290 295 300
Ala Arg Leu Gly Ala Arg Arg Gln Gly Gly Arg Ala Pro Thr Ala Val 305 310 315 320
Arg Glu Pro His Gly Val Ala Arg Gly Arg Arg Arg Ala Ala Ala Gln 325 330 335
Arg Gly Arg Gly Arg Ala Pro Pro Pro Arg Pro Arg Arg Arg Gly Leu 340 345 350
Ser Gln Phe Ala Gly Val Pro Ala Val Leu Arg Ala Arg Ala Pro Gly 355 360 365 Ala Arg Leu Ser Arg Gly Arg Pro Leu Arg Gly Ala His Asp Val His 370 375 380
Arg His Arg Gly Ser Ala Arg Pro Leu Gln Pro Arg Arg Arg Gln Met 385 390 395 400 Arg Ala Pro Ala Gly Gly Arg Val Cys Gly Ala Arg Pro Gly Arg Ala
405 410 415
Gly Gly Pro Gly Gly Ala Asp Gly Pro Val Gly Gly Arg Gly Gly Ala 420 425 430 Pro Ala Pro Ala Leu Arg Pro Pro Arg Val Cys Gly Arg Gly Ala Gly 435 440 445
Gly Ala Val Ser Arg Pro Ala Pro Gly 450 455
(2) INFORMATION FOR SEQ ID NO: 262:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 298 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262:
Met Thr Ser Arg Arg Ser Val Lys Ser Cys Pro Arg Glu Ala Pro Arg
1 5 10 15
Gly Thr His Glu Glu Leu Tyr Tyr Gly Pro Val Ser Pro Ala Asp Pro 20 25 30
Glu Ser Pro Arg Asp Asp Phe Arg Arg Gly Ala Gly Pro Met Arg Ala
35 40 45
Arg Pro Arg Gly Glu Val Arg Phe Leu His Tyr Asp Glu Ala Gly Tyr 50 55 60 Ala Leu Tyr Arg Asp Ser Ser Ser Ser Glu Asp Asn Asp Glu Ser Arg 65 70 75 80
Asp Thr Ala Arg Pro Arg Arg Ser Ala Ser Val Ala Gly Ser His Gly
85 90 95
Pro Gly Pro Ala Arg Ala Pro Pro Pro Pro Gly Gly Pro Val Gly Ala 100 105 110
Gly Gly Arg Ser His Ala Pro Pro Ala Arg Thr Pro Lys Met Thr Arg
115 120 125
Gly Ala Pro Lys Ala Pro Ala Thr Pro Ala Thr Asp Pro Arg Arg Arg 130 135 140 Pro Ala Gln Ala Asp Ser Ala Val Leu Leu Asp Ala Pro Ala Pro Thr 145 150 155 160
Ala Ser Gly Arg Thr Lys Thr Pro Ala Gln Gly Leu Ala Lys Lys Leu 165 170 175 His Phe Ser Thr Ala Pro Pro Ser Pro Thr Ala Pro Trp Thr Pro Arg
180 185 190
Val Ala Gly Phe Asn Lys Arg Val Phe Cys Ala Ala Val Gly Arg Leu 195 200 205 Ala Ala Thr His Ala Arg Leu Ala Ala Val Gln Leu Trp Asp Met Ser 210 215 220
Arg Pro His Thr Asp Glu Asp Leu Asn Glu Leu Leu Asp Leu Thr Thr 225 230 235 240
Ile Arg Val Thr Val Cys Glu Gly Lys Asn Leu Leu Gln Arg Ala Asn 245 250 255
Glu Leu Val Asn Pro Asp Ala Ala Gln Asp Val Asp Ala Thr Ala Ala
260 265 270
Arg Arg Pro Ala Gly Arg Ala Ala Ala Thr Ala Arg Ala Pro Ala Arg 275 280 285 Ser Ala Ser Arg Pro Arg Arg Pro Leu Glu 290 295
(2) INFORMATION FOR SEQ ID NO: 263:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 70 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:263:
Val Val Leu Leu Phe Val Val Ala Gly Val Pro Gly Glu Pro Pro Asn 1 5 10 15
Ala Ala Gly Arg Val Ile Gly Asp Ala Gln Cys Arg Gly Asp Ser Ala
20 25 30
Gly Val Val Ser Val Pro Gly Val Leu Val Pro Phe Tyr Leu Gly Met 35 40 45
Thr Ser Met Gly Val Cys Met Ile Ala His Val Tyr Gln Ile Cys Gln
50 55 60
Arg Ala Ala Gly Ser Ala 65 70
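One practical check on a listing entry is to convert its three-letter residues to one-letter code and compare the count with the declared LENGTH field. The sketch below is a minimal Python illustration and is not part of the original filing; the function name `listing_to_one_letter` and the OCR correction table are hypothetical, the table assuming the scanning misreadings that occur in copies of this listing ("Gin" for Gln, "lie" or "He" for Ile). It uses SEQ ID NO: 263 as input.

```python
import re

# Standard three-letter amino acid codes mapped to one-letter codes.
# "Xaa" (unknown or unspecified residue, as in SEQ ID NO: 265) maps to "X".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}

# Hypothetical table of OCR misreadings assumed to occur in scanned copies.
OCR_FIXES = {"Gin": "Gln", "lie": "Ile", "He": "Ile"}


def listing_to_one_letter(text: str) -> str:
    """Convert a three-letter sequence block to a one-letter string.

    Position numbers and stray punctuation are skipped because only
    alphabetic tokens are read. Any unrecognized alphabetic token raises
    ValueError so that scanning errors are not silently dropped.
    """
    residues = []
    for token in re.findall(r"[A-Za-z]+", text):
        token = OCR_FIXES.get(token, token)
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue token: {token!r}")
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# SEQ ID NO: 263, copied from the listing with its position numbers.
SEQ_263 = """
Val Val Leu Leu Phe Val Val Ala Gly Val Pro Gly Glu Pro Pro Asn 1 5 10 15
Ala Ala Gly Arg Val Ile Gly Asp Ala Gln Cys Arg Gly Asp Ser Ala 20 25 30
Gly Val Val Ser Val Pro Gly Val Leu Val Pro Phe Tyr Leu Gly Met 35 40 45
Thr Ser Met Gly Val Cys Met Ile Ala His Val Tyr Gln Ile Cys Gln 50 55 60
Arg Ala Ala Gly Ser Ala 65 70
"""

seq = listing_to_one_letter(SEQ_263)
assert len(seq) == 70  # agrees with "(A) LENGTH: 70 amino acids"
print(seq)  # prints VVLLFVVAGVPGEPPNAAGRVIGDAQCRGDSAGVVSVPGVLVPFYLGMTSMGVCMIAHVYQICQRAAGSA
```

Raising on unknown tokens, rather than skipping them, is a deliberate choice: a misread residue name would otherwise shorten the sequence without warning and only show up as a LENGTH mismatch.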
(2) INFORMATION FOR SEQ ID NO: 264:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 363 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264:
Met Ser Gln Trp Gly Pro Arg Ala Ile Leu Val Gln Thr Asp Ser Thr 1 5 10 15
Asn Arg Asn Ala Asp Gly Asp Trp Gln Ala Ala Val Ala Ile Arg Gly
20 25 30
Gly Gly Val Val Gln Leu Asn Met Val Asn Lys Arg Ala Val Asp Phe 35 40 45
Thr Pro Ala Glu Cys Gly Asp Ser Glu Trp Ala Val Gly Arg Val Ser
50 55 60
Leu Gly Leu Arg Met Ala Met Pro Arg Asp Phe Cys Ala Ile Ile His 65 70 75 80 Ala Pro Ala Val Ser Gly Pro Gly Pro His Val Met Leu Gly Leu Val
85 90 95
Asp Ser Gly Tyr Arg Gly Thr Val Leu Ala Val Val Val Ala Pro Asn
100 105 110
Gly Thr Arg Gly Phe Ala Pro Gly Ala Leu Arg Val Asp Val Thr Phe 115 120 125
Leu Asp Ile Arg Ala Thr Pro Pro Thr Leu Thr Glu Pro Ser Ser Leu
130 135 140
His Arg Phe Pro Gln Leu Ala Pro Ser Pro Leu Ala Gly Leu Arg Glu 145 150 155 160 Asp Pro Trp Leu Asp Gly Ala Thr Ala Gly Gly Ala Val Pro Ala Arg
165 170 175
Arg Arg Gly Gly Ser Leu Val Tyr Ala Gly Glu Leu Thr Gln Val Thr
180 185 190
Thr Glu His Gly Asp Cys Val His Glu Ala Pro Ala Phe Leu Pro Lys 195 200 205
Arg Glu Glu Asp Ala Gly Phe Asp Ile Leu Ile His Arg Ala Val Thr
210 215 220
Val Pro Ala Asn Gly Ala Thr Val Ile Gln Pro Ser Leu Arg Val Leu 225 230 235 240 Arg Ala Ala Asp Gly Pro Glu Ala Cys Tyr Val Leu Gly Arg Ser Ser
245 250 255
Leu Asn Arg Leu Leu Val Met Pro Thr Arg Trp Pro Ser Gly His Ala 260 265 270 Cys Ala Phe Val Val Cys Asn Leu Thr Gly Val Pro Val Thr Leu Gln
275 280 285
Ala Gly Ser Lys Val Ala Gln Leu Leu Val Ala Gly Thr His Ala Leu
290 295 300
Pro Trp Ile Pro Pro Asp Asn Ile His Glu Asp Gly Ala Phe Arg Ala 305 310 315 320
Tyr Pro Arg Gly Val Pro Asp Ala Thr Ala Thr Pro Arg Asp Pro Pro
325 330 335
Ile Leu Val Phe Thr Asn Glu Phe Asp Ala Asp Ala Pro Pro Ser Lys
340 345 350
Arg Gly Ala Gly Gly Phe Gly Ser Thr Gly Ile 355 360
(2) INFORMATION FOR SEQ ID NO: 265:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 236 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265:
Met Ala Ser Leu Leu Gly Val Leu Cys Gly Trp Gly Trp Glu Glu Gln
1 5 10 15
Gln Tyr Glu Met Ile Arg Ala Ala Ala Pro Pro Ser Xaa Xaa Asp Pro 20 25 30 Arg Leu Gln Glu Ala Val Val Asn Ala Leu Leu Pro Ala Pro Ile Thr 35 40 45
Leu Asp Asp Ala Leu Glu Ser Leu Asp Asp Thr Arg Arg Leu Val Lys
50 55 60
Ala Arg Ala Arg Thr Tyr His Ala Cys Met Val Asn Leu Glu Arg Leu 65 70 75 80
Ala Arg His His Pro Gly Leu Glu Gly Ser Thr Ile Asp Gly Ala Val
85 90 95
Ala Ala His Arg Asp Lys Met Arg Arg Leu Ala Asp Thr Cys Met Ala 100 105 110 Thr Ile Leu Gln Met Tyr Met Ser Val Gly Ala Ala Asp Lys Ser Ala 115 120 125
Asp Val Leu Val Ser Gln Ala Ile Arg Ser Met Ala Glu Ser Asp Val 130 135 140 Val Met Glu Asp Val Ala Ile Ala Glu Arg Ala Leu Gly Leu Ser Thr 145 150 155 160
Ser Ala Gly Gly Thr Arg Thr Ala Gly Leu Gly Ala Thr Glu Ala Pro
165 170 175
Pro Gly Pro Thr Arg Ala Gln Ala Pro Glu Val Ala Ser Val Pro Val
180 185 190
Thr His Ala Gly Asp Arg Ser Pro Val Arg Pro Gly Pro Val Pro Pro
195 200 205
Ala Asp Pro Thr Pro Asp Pro Arg His Arg Thr Ser Ala Pro Lys Arg
210 215 220
Gln Ala Ser Ser Thr Glu Ala Pro Leu Leu Leu Ala 225 230 235
(2) INFORMATION FOR SEQ ID NO: 266:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 453 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266:
Met Tyr Val Asn Arg Asn Glu Ile Phe Asn Ala Ala Val Thr Asn Ile
1 5 10 15
Ile Leu Asp Leu Asp Ile Ala Leu Lys Glu Pro Val Pro Phe Pro Arg 20 25 30 Leu His Glu Ala Leu Gly His Phe Arg Arg Gly Ala Ala Val Gln Leu 35 40 45
Leu Phe Pro Ala Ala Arg Val Asp Pro Asp Ala Tyr Pro Cys Tyr Phe
50 55 60
Phe Lys Ser Ala Cys Arg Pro Arg Ala Pro Pro Val Cys Ala Gly Asp 65 70 75 80
Gly Pro Ser Ala Gly Gly Asp Asp Gly Asp Gly Asp Trp Phe Pro Asp
85 90 95
Ala Gly Gly Asp Asp Gly Asp Glu Glu Trp Glu Glu Asp Thr Asp Pro 100 105 110 Met Asp Thr Thr His Gly Pro Leu Pro Asp Asp Glu Ala Ala Tyr Leu 115 120 125
Asp Leu Leu His Glu Gln Ile Pro Ala Ala Thr Pro Ser Glu Pro Asp 130 135 140 Ser Val Val Cys Ser Cys Ala Asp Lys Ile Gly Leu Arg Val Cys Leu
145 150 155 160
Pro Val Pro Ala Pro Tyr Val Val His Gly Ser Leu Thr Met Arg Gly
165 170 175 Val Ala Arg Val Ile Gln Gln Ala Val Leu Leu Asp Arg Asp Phe Val
180 185 190
Glu Ala Val Gly Ser His Val Lys Asn Phe Leu Leu Ile Asp Thr Gly
195 200 205
Val Tyr Ala His Gly His Ser Leu Arg Leu Pro Tyr Phe Ala Lys Ile 210 215 220
Gly Pro Asp Gly Ser Ala Cys Gly Arg Leu Leu Pro Val Phe Val Ile
225 230 235 240
Pro Pro Ala Cys Glu Asp Val Pro Ala Phe Val Ala Ala His Ala Asp
245 250 255 Pro Arg Arg Phe His Phe His Ala Pro Pro Met Phe Ser Ala Ala Pro
260 265 270
Arg Glu Ile Arg Val Leu His Ser Leu Gly Gly Asp Tyr Val Ser Phe
275 280 285
Phe Glu Lys Lys Ala Ser Arg Asn Ala Leu Glu His Phe Gly Arg Arg 290 295 300
Glu Thr Leu Thr Glu Val Leu Gly Arg Tyr Asp Val Arg Pro Asp Ala
305 310 315 320
Gly Glu Thr Val Glu Gly Phe Ala Ser Glu Leu Leu Gly Arg Ile Val
325 330 335 Ala Cys Ile Glu Ala His Phe Pro Glu His Ala Arg Glu Tyr Gln Ala
340 345 350
Val Ser Val Arg Arg Ala Val Ile Lys Asp Asp Trp Val Leu Leu Gln
355 360 365
Leu Ile Pro Gly Arg Gly Ala Leu Asn Gln Ser Leu Ser Cys Leu Arg 370 375 380
Phe Lys His Gly Arg Ala Ser Arg Ala Thr Ala Arg Thr Phe Leu Ala
385 390 395 400
Leu Ser Val Gly Thr Asn Asn Arg Leu Cys Ala Ser Leu Cys Gln Gln
405 410 415 Cys Phe Ala Thr Lys Cys Asp Asn Asn Arg Leu His Thr Leu Phe Thr
420 425 430
Val Asp Ala Gly Thr Pro Cys Ser Arg Ser Ala Pro Ser Ser Thr Ser
435 440 445
Arg Pro Ser Ser Ser 450
(2) INFORMATION FOR SEQ ID NO: 267:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 332 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267:
Met Leu Ala Val Arg Ser Leu Gln His Leu Thr Thr Val Ile Phe Ile
1 5 10 15
Thr Ala Tyr Gly Leu Val Leu Ala Trp Tyr Ile Val Phe Gly Asp Leu 20 25 30 His Arg Cys Ile Tyr Ala Val Arg Pro Ala Gly Ala His Asn Asp Thr 35 40 45
Ala Leu Val Trp Met Lys Ile Asn Gln Thr Leu Leu Phe Leu Gly Pro
50 55 60
Pro Thr Ala Pro Pro Gly Gly Ala Trp Thr Pro His Ala His Val Cys 65 70 75 80
Tyr Ala Asn Ile Ile Glu Gly Arg Ala Val Ser Leu Pro Ala Ile Pro
85 90 95
Gly Ala Met Ser Arg Arg Val Met Asn Val His Glu Ala Val Asn Cys 100 105 110 Leu Glu Ala Leu Trp Asp Thr Gln Met Arg Leu Val Val Val Gly Trp 115 120 125
Phe Leu Tyr Leu Ala Phe Val His Gln Arg Arg Cys Met Phe Gly Val
130 135 140
Val Ser Pro Ala His Ser Met Val Ala Pro Ala Thr Tyr Leu Leu Asn 145 150 155 160
Tyr Ala Gly Arg Ile Val Ser Ser Val Phe Leu Gln Tyr Pro Tyr Thr
165 170 175
Lys Ile Thr Arg Leu Leu Cys Glu Leu Ser Val Gln Arg Gln Thr Leu 180 185 190 Val Gln Leu Phe Glu Ala Asp Pro Val Thr Phe Leu Tyr His Arg Pro 195 200 205
Ala Val Gly Val Ile Val Gly Cys Glu Leu Leu Leu Arg Phe Val Gly
210 215 220
Leu Ile Val Gly Thr Ala Leu Ile Ser Arg Gly Ala Cys Ala Ile Thr 225 230 235 240
Tyr Pro Leu Phe Leu Thr Ile Thr Thr Trp Cys Phe Val Ser Ile Ile
245 250 255
Ala Leu Thr Glu Leu Tyr Phe Ile Leu Arg Arg Asp Ser Ala Pro Lys 260 265 270
Asn Ala Glu Pro Ala Ala Pro Arg Gly Arg Ser Lys Gly Trp Ser Gly
275 280 285
Val Cys Gly Arg Cys Cys Ser Ile Ile Leu Ser Gly Ile Ala Val Arg
290 295 300
Leu Cys Tyr Ile Ala Val Val Ala Gly Val Val Leu Met Ala Leu Arg 305 310 315 320
Tyr Glu Gln Glu Ile Gln Arg Arg Leu Phe Asp Leu 325 330
(2) INFORMATION FOR SEQ ID NO: 268:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 117 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268:
Met Ile Gly Ala His Pro Gly Val Gly Gly Asp Leu Pro Ser Gly Leu 1 5 10 15 Pro Thr Tyr Ala Glu Ala Thr Ser Asp Arg Pro Pro Thr Tyr Ala Met 20 25 30
Val Met Ala Ala Cys Pro Thr Glu Pro Pro Gly Gly Ser Val Gly Pro
35 40 45
Ala Asp Gln Pro Arg Val Gln Ser Ser Arg Thr Trp Arg Pro Pro Leu 50 55 60
Val Asn Ser Arg Glu Leu Tyr Arg Ala Gln Arg Ala Ala Arg Cys Ala 65 70 75 80
Ser Ser Ser Asp Thr Pro Gln Ala Pro Gly Trp Cys Gly Gly Thr Cys 85 90 95 Arg His Ala Val Phe Gly Val Val Ala Val Val Val Val Ile Ile Leu 100 105 110
Ala Phe Leu Trp Arg 115
(2) INFORMATION FOR SEQ ID NO: 269:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 194 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269:
Met His Leu Phe Cys Gln Cys Pro Leu Thr Asp Gly Gln Asp Leu Tyr 1 5 10 15
Leu Cys Pro Val Tyr Pro Arg Met His Gln Glu His Leu Val Cys Pro
20 25 30
Leu His Arg Leu Asp Asp Ala Arg Arg Arg Gly Arg Thr Ser Ala Ala 35 40 45 Trp Asp Glu Gly Leu Val Arg Ala Leu Thr His Ser Gly Gly Leu Met 50 55 60
Gly Cys Gly Gly Arg Ser Leu Thr Leu Ser Glu Thr Tyr Trp Gly His 65 70 75 80
Pro Leu Tyr Glu Lys Leu Val Pro Trp Asp His Pro Arg Asp Leu Lys 85 90 95
Val Pro Glu Ala Ser Ala Val Gly Thr Arg Ala Leu Val Pro Arg Gly
100 105 110
Arg Gly Arg Pro Leu Arg Gly Arg Pro Val Pro Leu Ile Pro Leu Asp 115 120 125 Cys Glu Pro Asn Asp Gly Leu Pro Phe Gly Gly Gly Trp Pro Gly Gly 130 135 140
Arg Leu Arg Gly Ala Pro Val Pro Leu His Pro Pro Pro Pro Ser Ala 145 150 155 160
Pro Pro Leu Ser Phe Thr Pro Thr Leu Thr Pro Pro Cys Leu Cys Arg 165 170 175
Gly Leu Ser Leu Cys Val Val Val Lys Gln Tyr Leu Lys Asp Arg Asn
180 185 190
Asn Phe
(2) INFORMATION FOR SEQ ID NO: 270:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 853 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270:
Met Asn Val Ala Thr Cys Thr His Gln Thr His His Ala Ala Arg Ala 1 5 10 15
Pro Gly Ala Thr Ser Ala Pro Gly Ala Ala Ser Gly Asp Pro Leu Gly
20 25 30
Ala Arg Arg Pro Ile Gly Asp Asp Glu Cys Glu Gln Tyr Thr Ser Ser 35 40 45
Val Ser Leu Ala Arg Met Leu Tyr Gly Gly Asp Leu Ala Glu Trp Val
50 55 60
Pro Arg Val His Pro Lys Thr Thr Ile Glu Arg Gln Gln His Gly Pro 65 70 75 80 Val Thr Phe Pro Asp Ala Ser Ala Pro Thr Ala Arg Cys Val Thr Val
85 90 95
Val Arg Ala Pro Met Gly Ser Gly Lys Thr Thr Ala Leu Ile Arg Trp
100 105 110
Leu Gly Glu Ala Ile His Ser Pro Asp Thr Ser Val Leu Val Val Ser 115 120 125
Cys Arg Arg Ser Phe Thr Gln Thr Leu Ala Thr Arg Phe Ala Glu Ser
130 135 140
Gly Leu Pro Asp Phe Val Thr Tyr Phe Ser Ser Thr Asn Tyr Ile Met 145 150 155 160 Asn Asp Arg Pro Phe His Arg Leu Ile Val Gln Val Glu Ser Leu His
165 170 175
Arg Val Gly Pro Asn Leu Leu Asn Asn Tyr Asp Val Leu Val Leu Asp
180 185 190
Glu Val Met Ser Thr Leu Gly Gln Lys Pro Thr Met Gln Gln Leu Gly 195 200 205
Arg Val Asp Ala Leu Met Leu Arg Leu Leu Arg Thr Cys Pro Arg Ile
210 215 220
Ile Ala Met Asp Ala Thr Ala Asn Ala Gln Leu Val Asp Phe Leu Cys 225 230 235 240 Ser Leu Arg Gly Glu Lys Asn Val His Val Val Ile Gly Glu Tyr Ala
245 250 255
Met Pro Gly Phe Ser Ala Arg Arg Cys Leu Phe Leu Pro Arg Leu Gly
260 265 270
Pro Glu Val Leu Gln Ala Ala Leu Arg Pro Pro Gly Pro Ala Gly Gly 275 280 285
Ala Pro Pro Pro Asp Ala Pro Pro Asp Ala Thr Phe Phe Gly Glu Leu
290 295 300
Glu Ala Arg Leu Ala Gly Gly Asp Asn Val Cys Ile Phe Ser Ser Thr 305 310 315 320
Val Ser Phe Ala Glu Val Val Ala Arg Phe Cys Arg Gln Phe Thr Asp
325 330 335
Arg Val Leu Leu Leu His Ser Leu Thr Pro Pro Gly Asp Val Thr Thr 340 345 350
Trp Gly Arg Tyr Arg Val Val Ile Tyr Thr Thr Val Val Thr Val Gly
355 360 365
Leu Ser Phe Asp Pro Pro His Phe Asp Ser Met Phe Ala Tyr Val Lys
370 375 380 Pro Met Asn Tyr Gly Pro Asp Met Val Ser Val Tyr Gln Ser Leu Gly
385 390 395 400
Arg Val Arg Thr Leu Arg Lys Gly Glu Leu Leu Ile Tyr Met Asp Gly
405 410 415
Ser Gly Ala Arg Ser Glu Pro Val Phe Thr Pro Met Leu Leu Asn His 420 425 430
Val Val Ser Ala Ser Gly Gln Trp Pro Ala Gln Phe Ser Gln Val Thr
435 440 445
Asn Leu Leu Cys Arg Arg Phe Lys Gly Arg Cys Asp Ala Ser His Ala
450 455 460 Asp Ala Ala Gln Arg Ser Arg Ile Tyr Ser Lys Phe Arg Tyr Lys His
465 470 475 480
Tyr Phe Glu Arg Cys Thr Leu Ala Cys Leu Ala Asp Ser Leu Asn Ile
485 490 495
Leu His Met Leu Leu Thr Leu Asn Cys Met His Val Arg Phe Trp Gly 500 505 510
His Asp Ala Ala Leu Thr Pro Arg Asn Phe Cys Leu Phe Leu Arg Gly
515 520 525
Ile His Phe Asp Ala Leu Arg Ala Gln Arg Asp Leu Arg Glu Leu Arg
530 535 540 Cys Gln Asp Pro Asp Thr Ser Leu Ser Ala Gln Ala Ala Glu Thr Glu
545 550 555 560
Glu Val Gly Leu Phe Val Glu Lys Tyr Leu Arg Pro Asp Val Ala Pro
565 570 575
Ala Glu Val Val Met Arg Gln Ser Leu Val Gly Arg Thr Arg Phe Ile 580 585 590
Tyr Leu Val Leu Leu Glu Ala Cys Leu Arg Val Pro Met Ala Ala His
595 600 605
Ser Ser Ala Ile Phe Arg Arg Leu Tyr Asp His Tyr Ala Thr Gly Val 610 615 620 Ile Pro Thr Ile Asn Ala Ala Gly Glu Leu Glu Leu Val His Pro Thr 625 630 635 640
Leu Asn Val Ala Pro Val Trp Glu Leu Phe Arg Leu Cys Ser Thr Met 645 650 655 Ala Ala Cys Leu Gln Trp Asp Ser Met Ala Gly Gly Ser Gly Arg Thr
660 665 670
Phe Ser Pro Glu Asp Val Leu Glu Leu Leu Asn Pro His Tyr Asp Arg 675 680 685 Tyr Met Gln Leu Val Phe Glu Leu Gly His Cys Asn Val Thr Asp Gly 690 695 700
Pro Leu Leu Ser Glu Asp Ala Val Lys Arg Val Ala Asp Ala Leu Ser 705 710 715 720
Gly Cys Pro Pro Arg Gly Ser Val Ser Glu Thr Glu His Ala Leu Ser 725 730 735
Leu Phe Lys Ile Ile Trp Gly Glu Leu Phe Gly Val Gln Leu Ala Lys
740 745 750
Ser Thr Gln Thr Phe Pro Gly Ala Gly Arg Val Lys Asn Leu Thr Lys 755 760 765 Arg Ala Ile Val Glu Leu Leu Asp Ala His Arg Ile Asp His Ser Ala 770 775 780
Cys Arg Thr Gln Leu Tyr Ala Leu Leu Met Ala His Lys Arg Glu Phe 785 790 795 800
Ala Gly Ala Arg Phe Lys Leu Arg Ala Pro Ala Trp Gly Arg Cys Leu 805 810 815
Arg Thr His Ala Ser Gly Ala Gln Pro Asn Thr Asp Ile Ile Ala Ala
820 825 830
Leu Ser Glu Leu Pro Thr Glu Ala Trp Pro Met Met Gln Gly Ala Val 835 840 845 Asn Phe Ser Thr Leu 850
(2) INFORMATION FOR SEQ ID NO: 271:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 857 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271:
Met Ala Glu Thr Met Asn Val Ala Thr Cys Thr His Gln Thr His His 1 5 10 15
Ala Ala Arg Ala Pro Gly Ala Thr Ser Ala Pro Gly Ala Ala Ser Gly 20 25 30 Asp Pro Leu Gly Ala Arg Arg Pro Ile Gly Asp Asp Glu Cys Glu Gln
35 40 45
Tyr Thr Ser Ser Val Ser Leu Ala Arg Met Leu Tyr Gly Gly Asp Leu 50 55 60 Ala Glu Trp Val Pro Arg Val His Pro Lys Thr Thr Ile Glu Arg Gln 65 70 75 80
Gln His Gly Pro Val Thr Phe Pro Asp Ala Ser Ala Pro Thr Ala Arg
85 90 95
Cys Val Thr Val Val Arg Ala Pro Met Gly Ser Gly Lys Thr Thr Ala 100 105 110
Leu Ile Arg Trp Leu Gly Glu Ala Ile His Ser Pro Asp Thr Ser Val
115 120 125
Leu Val Val Ser Cys Arg Arg Ser Phe Thr Gln Thr Leu Ala Thr Arg
130 135 140 Phe Ala Glu Ser Gly Leu Pro Asp Phe Val Thr Tyr Phe Ser Ser Thr
145 150 155 160
Asn Tyr Ile Met Asn Asp Arg Pro Phe His Arg Leu Ile Val Gln Val
165 170 175
Glu Ser Leu His Arg Val Gly Pro Asn Leu Leu Asn Asn Tyr Asp Val 180 185 190
Leu Val Leu Asp Glu Val Met Ser Thr Leu Gly Gln Lys Pro Thr Met
195 200 205
Gln Gln Leu Gly Arg Val Asp Ala Leu Met Leu Arg Leu Leu Arg Thr
210 215 220 Cys Pro Arg Ile Ile Ala Met Asp Ala Thr Ala Asn Ala Gln Leu Val
225 230 235 240
Asp Phe Leu Cys Ser Leu Arg Gly Glu Lys Asn Val His Val Val Ile
245 250 255
Gly Glu Tyr Ala Met Pro Gly Phe Ser Ala Arg Arg Cys Leu Phe Leu 260 265 270
Pro Arg Leu Gly Pro Glu Val Leu Gln Ala Ala Leu Arg Pro Pro Gly
275 280 285
Pro Ala Gly Gly Ala Pro Pro Pro Asp Ala Pro Pro Asp Ala Thr Phe
290 295 300 Phe Gly Glu Leu Glu Ala Arg Leu Ala Gly Gly Asp Asn Val Cys Ile
305 310 315 320
Phe Ser Ser Thr Val Ser Phe Ala Glu Val Val Ala Arg Phe Cys Arg
325 330 335
Gln Phe Thr Asp Arg Val Leu Leu Leu His Ser Leu Thr Pro Pro Gly 340 345 350
Asp Val Thr Thr Trp Gly Arg Tyr Arg Val Val Ile Tyr Thr Thr Val
355 360 365
Val Thr Val Gly Leu Ser Phe Asp Pro Pro His Phe Asp Ser Met Phe 370 375 380
Ala Tyr Val Lys Pro Met Asn Tyr Gly Pro Asp Met Val Ser Val Tyr 385 390 395 400
Gln Ser Leu Gly Arg Val Arg Thr Leu Arg Lys Gly Glu Leu Leu Ile 405 410 415
Tyr Met Asp Gly Ser Gly Ala Arg Ser Glu Pro Val Phe Thr Pro Met
420 425 430
Leu Leu Asn His Val Val Ser Ala Ser Gly Gln Trp Pro Ala Gln Phe 435 440 445 Ser Gln Val Thr Asn Leu Leu Cys Arg Arg Phe Lys Gly Arg Cys Asp 450 455 460
Ala Ser His Ala Asp Ala Ala Gln Arg Ser Arg Ile Tyr Ser Lys Phe 465 470 475 480
Arg Tyr Lys His Tyr Phe Glu Arg Cys Thr Leu Ala Cys Leu Ala Asp 485 490 495
Ser Leu Asn Ile Leu His Met Leu Leu Thr Leu Asn Cys Met His Val
500 505 510
Arg Phe Trp Gly His Asp Ala Ala Leu Thr Pro Arg Asn Phe Cys Leu 515 520 525 Phe Leu Arg Gly Ile His Phe Asp Ala Leu Arg Ala Gln Arg Asp Leu 530 535 540
Arg Glu Leu Arg Cys Gln Asp Pro Asp Thr Ser Leu Ser Ala Gln Ala 545 550 555 560
Ala Glu Thr Glu Glu Val Gly Leu Phe Val Glu Lys Tyr Leu Arg Pro 565 570 575
Asp Val Ala Pro Ala Glu Val Val Met Arg Gln Ser Leu Val Gly Arg
580 585 590
Thr Arg Phe Ile Tyr Leu Val Leu Leu Glu Ala Cys Leu Arg Val Pro 595 600 605 Met Ala Ala His Ser Ser Ala Ile Phe Arg Arg Leu Tyr Asp His Tyr 610 615 620
Ala Thr Gly Val Ile Pro Thr Ile Asn Ala Ala Gly Glu Leu Glu Leu 625 630 635 640
Val His Pro Thr Leu Asn Val Ala Pro Val Trp Glu Leu Phe Arg Leu 645 650 655
Cys Ser Thr Met Ala Ala Cys Leu Gln Trp Asp Ser Met Ala Gly Gly
660 665 670
Ser Gly Arg Thr Phe Ser Pro Glu Asp Val Leu Glu Leu Leu Asn Pro 675 680 685 His Tyr Asp Arg Tyr Met Gln Leu Val Phe Glu Leu Gly His Cys Asn 690 695 700
Val Thr Asp Gly Pro Leu Leu Ser Glu Asp Ala Val Lys Arg Val Ala 705 710 715 720 Asp Ala Leu Ser Gly Cys Pro Pro Arg Gly Ser Val Ser Glu Thr Glu
725 730 735
His Ala Leu Ser Leu Phe Lys Ile Ile Trp Gly Glu Leu Phe Gly Val 740 745 750 Gln Leu Ala Lys Ser Thr Gln Thr Phe Pro Gly Ala Gly Arg Val Lys 755 760 765
Asn Leu Thr Lys Arg Ala Ile Val Glu Leu Leu Asp Ala His Arg Ile
770 775 780
Asp His Ser Ala Cys Arg Thr Gln Leu Tyr Ala Leu Leu Met Ala His 785 790 795 800
Lys Arg Glu Phe Ala Gly Ala Arg Phe Lys Leu Arg Ala Pro Ala Trp
805 810 815
Gly Arg Cys Leu Arg Thr His Ala Ser Gly Ala Gln Pro Asn Thr Asp 820 825 830 Ile Ile Ala Ala Leu Ser Glu Leu Pro Thr Glu Ala Trp Pro Met Met 835 840 845
Gln Gly Ala Val Asn Phe Ser Thr Leu 850 855
(2) INFORMATION FOR SEQ ID NO: 272:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1370 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:272:
Met Glu Pro Ala Asn Pro Pro Arg Asn Pro Met Ala Ala Pro Ala Arg
1 5 10 15
Asp Pro Pro Gly Tyr Arg Tyr Ala Ala Ala Met Val Pro Thr Gly Ser 20 25 30
Ile Leu Ser Thr Ile Glu Val Ala Ser His Arg Arg Leu Phe Asp Phe
35 40 45
Phe Ala Arg Val Arg Ser Asp Glu Asn Ser Leu Tyr Asp Val Glu Phe 50 55 60 Asp Ala Leu Leu Gly Ser Tyr Cys Asn Thr Leu Ser Leu Val Arg Phe 65 70 75 80
Leu Glu Leu Gly Leu Ser Val Ala Cys Val Cys Thr Lys Phe Pro Glu 85 90 95 Leu Ala Tyr Met Asn Glu Gly Arg Val Gln Phe Glu Val His Gln Pro
100 105 110 Leu Ile Ala Arg Asp Gly Pro His Pro Val Glu Gln Pro Val His Asn 115 120 125 Tyr Met Thr Lys Val Ile Asp Arg Arg Ala Leu Asn Ala Ala Phe Ser 130 135 140
Leu Ala Thr Glu Ala Ile Ala Leu Leu Thr Gly Glu Ala Leu Asp Gly 145 150 155 160
Thr Gly Ile Ser Leu His Arg Gln Leu Arg Ala Ile Gln Gln Leu Ala 165 170 175
Arg Asn Val Gln Ala Val Leu Gly Ala Phe Glu Arg Gly Thr Ala Asp
180 185 190 Gln Met Leu His Val Leu Leu Glu Lys Ala Pro Pro Leu Ala Leu Leu 195 200 205 Leu Pro Met Gln Arg Tyr Leu Asp Asn Gly Arg Leu Ala Thr Arg Val 210 215 220
Ala Arg Ala Thr Leu Val Ala Glu Leu Lys Arg Ser Phe Cys Asp Thr 225 230 235 240
Ser Phe Phe Leu Gly Lys Ala Gly His Arg Arg Glu Ala Ile Glu Ala 245 250 255
Trp Leu Val Asp Leu Thr Thr Ala Thr Gln Pro Ser Val Ala Val Pro
260 265 270 Arg Leu Thr His Ala Asp Thr Arg Gly Arg Pro Val Asp Gly Val Leu 275 280 285 Val Thr Thr Ala Ala Ile Lys Gln Arg Leu Leu Gln Ser Phe Leu Lys 290 295 300
Val Glu Asp Thr Glu Ala Asp Val Pro Val Thr Tyr Gly Glu Met Val 305 310 315 320
Leu Asn Gly Ala Asn Leu Val Thr Ala Leu Val Met Gly Lys Ala Val 325 330 335
Arg Ser Leu Asp Asp Val Gly Arg His Leu Leu Glu Met Gln Glu Glu
340 345 350 Gln Leu Glu Ala Asn Arg Glu Thr Leu Asp Glu Leu Glu Ser Ala Pro 355 360 365 Gln Thr Thr Arg Val Arg Ala Asp Leu Val Ala Ile Gly Asp Arg Leu 370 375 380
Val Phe Leu Glu Ala Leu Glu Lys Arg Ile Tyr Ala Ala Thr Asn Val 385 390 395 400
Pro Tyr Pro Leu Val Gly Ala Met Asp Leu Thr Phe Val Leu Pro Leu 405 410 415
Gly Leu Phe Asn Pro Ala Met Glu Arg Phe Ala Ala His Ala Gly Asp
420 425 430 Leu Val Pro Ala Pro Gly His Pro Glu Pro Arg Ala Phe Pro Pro Arg 435 440 445
Gln Leu Phe Phe Trp Gly Lys Asp His Gln Val Leu Arg Leu Ser Met
450 455 460
Glu Asn Ala Val Gly Thr Val Cys His Pro Ser Leu Met Asn Ile Asp 465 470 475 480
Ala Ala Val Gly Gly Val Asn His Asp Pro Val Glu Ala Ala Asn Pro
485 490 495
Tyr Gly Ala Tyr Val Ala Ala Pro Ala Gly Pro Gly Ala Asp Met Gln 500 505 510 Gln Arg Phe Leu Asn Ala Trp Arg Gln Arg Leu Ala His Gly Arg Val 515 520 525
Arg Trp Val Ala Glu Cys Gln Met Thr Ala Glu Gln Phe Met Gln Pro
530 535 540
Asp Asn Ala Asn Leu Ala Leu Glu Leu His Pro Ala Phe Asp Phe Phe 545 550 555 560
Ala Gly Val Ala Asp Val Glu Leu Pro Gly Gly Glu Val Pro Pro Ala
565 570 575
Gly Pro Gly Ala Ile Gln Ala Thr Trp Arg Val Val Asn Gly Asn Leu 580 585 590 Pro Leu Ala Leu Cys Pro Val Ala Phe Arg Asp Arg Leu Glu Leu Gly 595 600 605
Val Gly Arg His Ala Met Ala Pro Ala Thr Ile Ala Ala Val Arg Gly
610 615 620
Ala Phe Glu Asp Arg Ser Tyr Pro Ala Val Phe Tyr Leu Leu Gln Ala 625 630 635 640
Ala Ile His Gly Ser Glu His Val Phe Cys Ala Arg Leu Val Thr Gln
645 650 655
Cys Ile Thr Ser Tyr Trp Asn Asn Thr Arg Cys Ala Ala Phe Val Asn 660 665 670 Asp Tyr Ser Leu Val Ser Tyr Ile Val Thr Tyr Leu Gly Gly Asp Leu 675 680 685
Pro Glu Glu Cys Met Ala Val Tyr Arg Asp Leu Val Ala His Val Glu
690 695 700
Ala Gln Leu Val Asp Asp Phe Thr Leu Pro Gly Pro Glu Leu Gly Gly 705 710 715 720
Gln Ala Gln Ala Glu Leu Asn His Leu Met Arg Asp Pro Ala Leu Leu
725 730 735
Pro Pro Leu Val Trp Asp Cys Asp Gly Leu Met Arg His Ala Ala Leu 740 745 750 Asp Arg His Arg Asp Cys Arg Ile Asp Ala Gly Gly His Glu Pro Val 755 760 765
Tyr Ala Ala Ala Cys Asn Val Ala Thr Ala Asp Phe Asn Arg Asn Asp 770 775 780 Gly Arg Leu Leu His Asn Thr Gln Ala Arg Ala Ala Asp Ala Ala Asp
785 790 795 800
Asp Arg Pro His Arg Pro Ala Asp Trp Thr Val His His Lys Ile Tyr
805 810 815 Tyr Tyr Val Leu Val Pro Ala Phe Ser Arg Gly Arg Cys Cys Thr Ala
820 825 830
Gly Val Arg Phe Asp Arg Val Tyr Ala Thr Leu Gln Asn Met Val Val
835 840 845
Pro Glu Ile Ala Pro Gly Glu Glu Cys Pro Ser Asp Pro Val Thr Asp 850 855 860
Pro Ala His Pro Leu His Pro Ala Asn Leu Val Ala Asn Thr Val Asn
865 870 875 880
Ala Met Phe His Asn Gly Arg Val Val Val Asp Gly Pro Ala Met Leu
885 890 895 Thr Leu Gln Val Leu Ala His Asn Met Ala Glu Arg Thr Thr Ala Leu
900 905 910
Leu Cys Ser Ala Ala Pro Asp Ala Gly Ala Asn Thr Ala Ser Thr Ala
915 920 925
Asn Met Arg Ile Phe Asp Gly Ala Leu His Ala Gly Val Leu Leu Met 930 935 940
Ala Pro Gln His Leu Asp His Thr Ile Gln Asn Gly Glu Tyr Phe Tyr
945 950 955 960
Val Leu Pro Val His Ala Leu Phe Ala Gly Ala Asp His Val Ala Asn
965 970 975 Ala Pro Asn Phe Pro Pro Ala Leu Arg Asp Leu Ala Arg His Val Pro
980 985 990
Leu Val Pro Pro Ala Leu Gly Ala Asn Tyr Phe Ser Ser Ile Arg Gln
995 1000 1005
Pro Val Val Gln His Ala Arg Glu Ser Ala Ala Gly Glu Asn Ala Leu 1010 1015 1020
Thr Tyr Ala Leu Met Ala Gly Tyr Phe Lys Met Ser Pro Val Tyr His
1025 1030 1035 1040
Gln Leu Lys Thr Gly Leu His Pro Gly Phe Gly Phe Thr Val Val Arg
1045 1050 1055 Gln Asp Arg Phe Val Thr Glu Asn Val Leu Phe Ser Ala Ser Glu Ala
1060 1065 1070
Tyr Phe Leu Gly Gln Leu Gln Val Ala Arg His Glu Thr Gly Gly Gly
1075 1080 1085
Val Ser Phe Thr Leu Thr Gln Pro Arg Gly Asn Val Asp Leu Gly Val 1090 1095 1100
Gly Tyr Thr Ala Val Ala Ala Thr Ala Thr Val Arg Asn Pro Val Thr 1105 1110 1115 1120
Asp Met Gly Asn Leu Pro Gln Asn Phe Tyr Leu Gly Arg Gly Ala Pro 1125 1130 1135
Pro Leu Leu Asp Asn Ala Ala Ala Val Tyr Leu Arg Asn Ala Val Val
1140 1145 1150
Ala Gly Asn Arg Leu Gly Pro Ala Gln Pro Leu Pro Val Phe Gly Cys 1155 1160 1165
Ala Gln Val Pro Arg Arg Ala Gly Met Asp His Gly Gln Asp Ala Val
1170 1175 1180
Cys Glu Phe Ile Ala Thr Pro Val Ala Thr Asp Ile Asn Tyr Phe Arg 1185 1190 1195 1200 Arg Pro Cys Asn Pro Arg Gly Arg Ala Ala Gly Gly Val Tyr Ala Gly
1205 1210 1215
Asp Lys Glu Gly Asp Val Ile Ala Leu Met Tyr Asp His Gly Gln Ser
1220 1225 1230
Asp Pro Ala Arg Pro Phe Ala Ala Thr Ala Asn Pro Trp Ala Ser Gln 1235 1240 1245
Arg Phe Ser Tyr Gly Asp Leu Leu Tyr Asn Gly Ala Tyr His Leu Asn
1250 1255 1260
Gly Asp Val Leu Ser Pro Cys Phe Lys Phe Phe Thr Ala Ala Asp Ile 1265 1270 1275 1280 Thr Ala Lys His Arg Cys Leu Glu Arg Leu Ile Val Glu Thr Gly Ser
1285 1290 1295
Ala Val Ser Thr Ala Thr Ala Ala Ser Asp Val Gln Phe Lys Arg Pro
1300 1305 1310
Pro Gly Cys Arg Glu Leu Val Glu Asp Pro Cys Gly Leu Phe Gln Glu 1315 1320 1325
Ala Tyr Pro Ile Thr Cys Ala Ser Asp Pro Ala Leu Leu Arg Ser Ala
1330 1335 1340
Arg Asp Gly Glu Ala His Ala Arg Glu Thr His Phe Thr Gln Tyr Leu 1345 1350 1355 1360 Ile Tyr Asp Asp Leu Lys Gly Leu Ser Leu
1365 1370
(2) INFORMATION FOR SEQ ID NO: 273:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1360 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273:
Met Ala Ala Pro Ala Arg Asp Pro Pro Gly Tyr Arg Tyr Ala Ala Ala
1 5 10 15
Met Val Pro Thr Gly Ser Ile Leu Ser Thr Ile Glu Val Ala Ser His 20 25 30
Arg Arg Leu Phe Asp Phe Phe Ala Arg Val Arg Ser Asp Glu Asn Ser
35 40 45
Leu Tyr Asp Val Glu Phe Asp Ala Leu Leu Gly Ser Tyr Cys Asn Thr 50 55 60 Leu Ser Leu Val Arg Phe Leu Glu Leu Gly Leu Ser Val Ala Cys Val 65 70 75 80
Cys Thr Lys Phe Pro Glu Leu Ala Tyr Met Asn Glu Gly Arg Val Gln
85 90 95
Phe Glu Val His Gln Pro Leu Ile Ala Arg Asp Gly Pro His Pro Val 100 105 110
Glu Gln Pro Val His Asn Tyr Met Thr Lys Val Ile Asp Arg Arg Ala
115 120 125
Leu Asn Ala Ala Phe Ser Leu Ala Thr Glu Ala Ile Ala Leu Leu Thr
130 135 140 Gly Glu Ala Leu Asp Gly Thr Gly Ile Ser Leu His Arg Gln Leu Arg
145 150 155 160
Ala Ile Gln Gln Leu Ala Arg Asn Val Gln Ala Val Leu Gly Ala Phe
165 170 175
Glu Arg Gly Thr Ala Asp Gln Met Leu His Val Leu Leu Glu Lys Ala 180 185 190
Pro Pro Leu Ala Leu Leu Leu Pro Met Gln Arg Tyr Leu Asp Asn Gly
195 200 205
Arg Leu Ala Thr Arg Val Ala Arg Ala Thr Leu Val Ala Glu Leu Lys
210 215 220 Arg Ser Phe Cys Asp Thr Ser Phe Phe Leu Gly Lys Ala Gly His Arg
225 230 235 240
Arg Glu Ala Ile Glu Ala Trp Leu Val Asp Leu Thr Thr Ala Thr Gln
245 250 255
Pro Ser Val Ala Val Pro Arg Leu Thr His Ala Asp Thr Arg Gly Arg 260 265 270
Pro Val Asp Gly Val Leu Val Thr Thr Ala Ala Ile Lys Gln Arg Leu
275 280 285
Leu Gln Ser Phe Leu Lys Val Glu Asp Thr Glu Ala Asp Val Pro Val 290 295 300 Thr Tyr Gly Glu Met Val Leu Asn Gly Ala Asn Leu Val Thr Ala Leu 305 310 315 320
Val Met Gly Lys Ala Val Arg Ser Leu Asp Asp Val Gly Arg His Leu 325 330 335 Leu Glu Met Gln Glu Glu Gln Leu Glu Ala Asn Arg Glu Thr Leu Asp
340 345 350
Glu Leu Glu Ser Ala Pro Gln Thr Thr Arg Val Arg Ala Asp Leu Val 355 360 365 Ala Ile Gly Asp Arg Leu Val Phe Leu Glu Ala Leu Glu Lys Arg Ile 370 375 380
Tyr Ala Ala Thr Asn Val Pro Tyr Pro Leu Val Gly Ala Met Asp Leu 385 390 395 400
Thr Phe Val Leu Pro Leu Gly Leu Phe Asn Pro Ala Met Glu Arg Phe 405 410 415
Ala Ala His Ala Gly Asp Leu Val Pro Ala Pro Gly His Pro Glu Pro
420 425 430
Arg Ala Phe Pro Pro Arg Gln Leu Phe Phe Trp Gly Lys Asp His Gln 435 440 445 Val Leu Arg Leu Ser Met Glu Asn Ala Val Gly Thr Val Cys His Pro 450 455 460
Ser Leu Met Asn Ile Asp Ala Ala Val Gly Gly Val Asn His Asp Pro 465 470 475 480
Val Glu Ala Ala Asn Pro Tyr Gly Ala Tyr Val Ala Ala Pro Ala Gly 485 490 495
Pro Gly Ala Asp Met Gln Gln Arg Phe Leu Asn Ala Trp Arg Gln Arg
500 505 510
Leu Ala His Gly Arg Val Arg Trp Val Ala Glu Cys Gln Met Thr Ala 515 520 525 Glu Gln Phe Met Gln Pro Asp Asn Ala Asn Leu Ala Leu Glu Leu His 530 535 540
Pro Ala Phe Asp Phe Phe Ala Gly Val Ala Asp Val Glu Leu Pro Gly 545 550 555 560
Gly Glu Val Pro Pro Ala Gly Pro Gly Ala Ile Gln Ala Thr Trp Arg 565 570 575
Val Val Asn Gly Asn Leu Pro Leu Ala Leu Cys Pro Val Ala Phe Arg
580 585 590
Asp Arg Leu Glu Leu Gly Val Gly Arg His Ala Met Ala Pro Ala Thr 595 600 605 Ile Ala Ala Val Arg Gly Ala Phe Glu Asp Arg Ser Tyr Pro Ala Val 610 615 620
Phe Tyr Leu Leu Gln Ala Ala Ile His Gly Ser Glu His Val Phe Cys 625 630 635 640
Ala Arg Leu Val Thr Gln Cys Ile Thr Ser Tyr Trp Asn Asn Thr Arg 645 650 655
Cys Ala Ala Phe Val Asn Asp Tyr Ser Leu Val Ser Tyr Ile Val Thr
660 665 670
Tyr Leu Gly Gly Asp Leu Pro Glu Glu Cys Met Ala Val Tyr Arg Asp 675 680 685
Leu Val Ala His Val Glu Ala Gln Leu Val Asp Asp Phe Thr Leu Pro
690 695 700
Gly Pro Glu Leu Gly Gly Gln Ala Gln Ala Glu Leu Asn His Leu Met 705 710 715 720
Arg Asp Pro Ala Leu Leu Pro Pro Leu Val Trp Asp Cys Asp Gly Leu
725 730 735
Met Arg His Ala Ala Leu Asp Arg His Arg Asp Cys Arg Ile Asp Ala 740 745 750 Gly Gly His Glu Pro Val Tyr Ala Ala Ala Cys Asn Val Ala Thr Ala 755 760 765
Asp Phe Asn Arg Asn Asp Gly Arg Leu Leu His Asn Thr Gln Ala Arg
770 775 780
Ala Ala Asp Ala Ala Asp Asp Arg Pro His Arg Pro Ala Asp Trp Thr 785 790 795 800
Val His His Lys Ile Tyr Tyr Tyr Val Leu Val Pro Ala Phe Ser Arg
805 810 815
Gly Arg Cys Cys Thr Ala Gly Val Arg Phe Asp Arg Val Tyr Ala Thr 820 825 830 Leu Gln Asn Met Val Val Pro Glu Ile Ala Pro Gly Glu Glu Cys Pro 835 840 845
Ser Asp Pro Val Thr Asp Pro Ala His Pro Leu His Pro Ala Asn Leu
850 855 860
Val Ala Asn Thr Val Asn Ala Met Phe His Asn Gly Arg Val Val Val 865 870 875 880
Asp Gly Pro Ala Met Leu Thr Leu Gln Val Leu Ala His Asn Met Ala
885 890 895
Glu Arg Thr Thr Ala Leu Leu Cys Ser Ala Ala Pro Asp Ala Gly Ala 900 905 910 Asn Thr Ala Ser Thr Ala Asn Met Arg Ile Phe Asp Gly Ala Leu His 915 920 925
Ala Gly Val Leu Leu Met Ala Pro Gln His Leu Asp His Thr Ile Gln
930 935 940
Asn Gly Glu Tyr Phe Tyr Val Leu Pro Val His Ala Leu Phe Ala Gly 945 950 955 960
Ala Asp His Val Ala Asn Ala Pro Asn Phe Pro Pro Ala Leu Arg Asp
965 970 975
Leu Ala Arg His Val Pro Leu Val Pro Pro Ala Leu Gly Ala Asn Tyr 980 985 990 Phe Ser Ser Ile Arg Gln Pro Val Val Gln His Ala Arg Glu Ser Ala 995 1000 1005
Ala Gly Glu Asn Ala Leu Thr Tyr Ala Leu Met Ala Gly Tyr Phe Lys 1010 1015 1020 Met Ser Pro Val Tyr His Gln Leu Lys Thr Gly Leu His Pro Gly Phe
1025 1030 1035 1040
Gly Phe Thr Val Val Arg Gln Asp Arg Phe Val Thr Glu Asn Val Leu
1045 1050 1055 Phe Ser Ala Ser Glu Ala Tyr Phe Leu Gly Gln Leu Gln Val Ala Arg
1060 1065 1070
His Glu Thr Gly Gly Gly Val Ser Phe Thr Leu Thr Gln Pro Arg Gly
1075 1080 1085
Asn Val Asp Leu Gly Val Gly Tyr Thr Ala Val Ala Ala Thr Ala Thr 1090 1095 1100
Val Arg Asn Pro Val Thr Asp Met Gly Asn Leu Pro Gln Asn Phe Tyr
1105 1110 1115 1120
Leu Gly Arg Gly Ala Pro Pro Leu Leu Asp Asn Ala Ala Ala Val Tyr
1125 1130 1135 Leu Arg Asn Ala Val Val Ala Gly Asn Arg Leu Gly Pro Ala Gln Pro
1140 1145 1150
Leu Pro Val Phe Gly Cys Ala Gln Val Pro Arg Arg Ala Gly Met Asp
1155 1160 1165
His Gly Gln Asp Ala Val Cys Glu Phe Ile Ala Thr Pro Val Ala Thr 1170 1175 1180
Asp Ile Asn Tyr Phe Arg Arg Pro Cys Asn Pro Arg Gly Arg Ala Ala
1185 1190 1195 1200
Gly Gly Val Tyr Ala Gly Asp Lys Glu Gly Asp Val Ile Ala Leu Met
1205 1210 1215 Tyr Asp His Gly Gln Ser Asp Pro Ala Arg Pro Phe Ala Ala Thr Ala
1220 1225 1230
Asn Pro Trp Ala Ser Gln Arg Phe Ser Tyr Gly Asp Leu Leu Tyr Asn
1235 1240 1245
Gly Ala Tyr His Leu Asn Gly Asp Val Leu Ser Pro Cys Phe Lys Phe 1250 1255 1260
Phe Thr Ala Ala Asp Ile Thr Ala Lys His Arg Cys Leu Glu Arg Leu
1265 1270 1275 1280
Ile Val Glu Thr Gly Ser Ala Val Ser Thr Ala Thr Ala Ala Ser Asp
1285 1290 1295 Val Gln Phe Lys Arg Pro Pro Gly Cys Arg Glu Leu Val Glu Asp Pro
1300 1305 1310
Cys Gly Leu Phe Gln Glu Ala Tyr Pro Ile Thr Cys Ala Ser Asp Pro
1315 1320 1325
Ala Leu Leu Arg Ser Ala Arg Asp Gly Glu Ala His Ala Arg Glu Thr 1330 1335 1340
His Phe Thr Gln Tyr Leu Ile Tyr Asp Asp Leu Lys Gly Leu Ser Leu 1345 1350 1355 1360
(2) INFORMATION FOR SEQ ID NO: 274:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 604 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 274:
Met Arg Pro Glu Leu Ser Leu Lys Gly Arg Pro Cys Val Thr Glu Ala 1 5 10 15 Val Val Cys Pro Ser Thr Asp Ala Ala Ile His Ser Gly Gly Ser Ser 20 25 30
Ser Val Arg Pro Gln Pro Tyr Ala Arg Ala Ala Arg Ala Arg Ala Thr
35 40 45
His Gly Ser Arg Ser Arg His Arg Gln Pro Leu Leu Pro Pro Pro Ser 50 55 60
Ser His His Pro Thr Ile Pro Pro Pro Pro Ser Pro Pro Arg Gly Ser 65 70 75 80
Pro Ala Met Glu Leu Ser Tyr Ala Thr Thr Leu His His Arg Asp Val 85 90 95 Val Phe Tyr Val Thr Ala Asp Arg Asn Arg Ala Tyr Phe Val Cys Gly 100 105 110
Gly Ser Val Tyr Ser Val Gly Arg Pro Arg Asp Ser Gln Pro Gly Glu
115 120 125
Ile Ala Lys Phe Gly Leu Val Val Arg Gly Thr Gly Pro Lys Asp Arg 130 135 140
Met Val Ala Asn Tyr Val Arg Ser Glu Leu Arg Gln Arg Gly Leu Arg
145 150 155 160
Asp Val Arg Pro Val Gly Glu Asp Glu Val Phe Leu Asp Ser Val Cys
165 170 175 Leu Leu Asn Pro Asn Val Ser Ser Asp Val Ile Asn Thr Asn Asp Val
180 185 190
Glu Val Leu Asp Glu Cys Leu Ala Glu Tyr Cys Thr Ser Leu Arg Thr
195 200 205
Ser Pro Gly Val Leu Val Thr Gly Val Arg Val Arg Ala Arg Asp Arg 210 215 220
Val Ile Glu Leu Phe Glu His Pro Ala Ile Val Asn Ile Ser Ser Arg 225 230 235 240
Phe Ala Tyr Thr Pro Ser Pro Tyr Val Phe Ala Gln Ala His Leu Pro 245 250 255
Arg Leu Pro Ser Ser Leu Glu Pro Leu Val Ser Gly Leu Phe Asp Gly
260 265 270
Ile Pro Ala Pro Arg Gln Pro Leu Asp Ala Arg Asp Arg Arg Thr Asp 275 280 285
Val Val Ile Thr Gly Thr Arg Ala Pro Arg Pro Met Ala Gly Thr Gly
290 295 300
Ala Gly Gly Ala Gly Ala Lys Arg Ala Thr Val Ser Glu Phe Val Gln 305 310 315 320 Val Lys His Ile Asp Arg Val Val Ser Pro Ser Val Ser Ser Ala Pro
325 330 335
Pro Pro Ser Ala Pro Asp Ala Ser Leu Pro Pro Pro Gly Leu Gln Glu
340 345 350
Ala Ala Pro Pro Gly Pro Pro Leu Arg Glu Leu Trp Trp Val Phe Tyr 355 360 365
Ala Gly Asp Arg Ala Leu Glu Glu Pro His Ala Glu Ser Gly Leu Thr
370 375 380
Arg Glu Glu Val Arg Ala Val His Gly Phe Arg Glu Gln Ala Trp Lys 385 390 395 400 Leu Phe Gly Ser Val Gly Ala Pro Arg Ala Phe Leu Gly Ala Ala Leu
405 410 415
Ser Pro Thr Gln Lys Leu Ala Val Tyr Tyr Tyr Leu Ile His Arg Glu
420 425 430
Arg Arg Met Ser Pro Phe Pro Ala Leu Val Arg Leu Val Gly Arg Tyr 435 440 445
Ile Gln Arg His Gly Val Pro Ala Pro Asp Glu Pro Thr Leu Ala Asp
450 455 460
Ala Met Asn Gly Leu Phe Arg Asp Ala Ala Gly Thr Val Ala Glu Gln 465 470 475 480 Leu Leu Met Phe Asp Leu Leu Pro Pro Lys Asp Val Pro Val Gly Ser
485 490 495
Asp Ala Arg Ala Asp Ser Ala Ala Leu Leu Arg Phe Val Asp Ser Gln
500 505 510
Arg Leu Thr Pro Gly Gly Ser Val Ser Pro Glu His Val Met Tyr Leu 515 520 525
Gly Ala Phe Leu Gly Val Leu Tyr Ala Gly His Gly Arg Leu Ala Ala
530 535 540
Ala Thr His Thr Ala Arg Leu Thr Gly Val Thr Ser Leu Val Leu Thr 545 550 555 560 Val Gly Asp Val Asp Arg Met Ser Ala Phe Asp Arg Gly Pro Ala Gly
565 570 575
Ala Ala Gly Arg Thr Arg Thr Ala Gly Tyr Leu Asp Ala Leu Leu Thr 580 585 590 Val Cys Leu Ala Arg Ala Gln His Gly Gln Ser Val 595 600
(2) INFORMATION FOR SEQ ID NO: 275:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 522 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 275:
Met Glu Leu Ser Tyr Ala Thr Thr Leu His His Arg Asp Val Val Phe
1 5 10 15
Tyr Val Thr Ala Asp Arg Asn Arg Ala Tyr Phe Val Cys Gly Gly Ser 20 25 30 Val Tyr Ser Val Gly Arg Pro Arg Asp Ser Gln Pro Gly Glu Ile Ala 35 40 45
Lys Phe Gly Leu Val Val Arg Gly Thr Gly Pro Lys Asp Arg Met Val
50 55 60
Ala Asn Tyr Val Arg Ser Glu Leu Arg Gln Arg Gly Leu Arg Asp Val 65 70 75 80
Arg Pro Val Gly Glu Asp Glu Val Phe Leu Asp Ser Val Cys Leu Leu
85 90 95
Asn Pro Asn Val Ser Ser Asp Val Ile Asn Thr Asn Asp Val Glu Val 100 105 110 Leu Asp Glu Cys Leu Ala Glu Tyr Cys Thr Ser Leu Arg Thr Ser Pro 115 120 125
Gly Val Leu Val Thr Gly Val Arg Val Arg Ala Arg Asp Arg Val Ile
130 135 140
Glu Leu Phe Glu His Pro Ala Ile Val Asn Ile Ser Ser Arg Phe Ala 145 150 155 160
Tyr Thr Pro Ser Pro Tyr Val Phe Ala Gln Ala His Leu Pro Arg Leu
165 170 175
Pro Ser Ser Leu Glu Pro Leu Val Ser Gly Leu Phe Asp Gly Ile Pro 180 185 190 Ala Pro Arg Gln Pro Leu Asp Ala Arg Asp Arg Arg Thr Asp Val Val 195 200 205
Ile Thr Gly Thr Arg Ala Pro Arg Pro Met Ala Gly Thr Gly Ala Gly 210 215 220 Gly Ala Gly Ala Lys Arg Ala Thr Val Ser Glu Phe Val Gln Val Lys
225 230 235 240
His Ile Asp Arg Val Val Ser Pro Ser Val Ser Ser Ala Pro Pro Pro
245 250 255 Ser Ala Pro Asp Ala Ser Leu Pro Pro Pro Gly Leu Gln Glu Ala Ala
260 265 270
Pro Pro Gly Pro Pro Leu Arg Glu Leu Trp Trp Val Phe Tyr Ala Gly
275 280 285
Asp Arg Ala Leu Glu Glu Pro His Ala Glu Ser Gly Leu Thr Arg Glu 290 295 300
Glu Val Arg Ala Val His Gly Phe Arg Glu Gln Ala Trp Lys Leu Phe
305 310 315 320
Gly Ser Val Gly Ala Pro Arg Ala Phe Leu Gly Ala Ala Leu Ser Pro
325 330 335 Thr Gln Lys Leu Ala Val Tyr Tyr Tyr Leu Ile His Arg Glu Arg Arg
340 345 350
Met Ser Pro Phe Pro Ala Leu Val Arg Leu Val Gly Arg Tyr Ile Gln
355 360 365
Arg His Gly Val Pro Ala Pro Asp Glu Pro Thr Leu Ala Asp Ala Met 370 375 380
Asn Gly Leu Phe Arg Asp Ala Ala Gly Thr Val Ala Glu Gln Leu Leu
385 390 395 400
Met Phe Asp Leu Leu Pro Pro Lys Asp Val Pro Val Gly Ser Asp Ala
405 410 415 Arg Ala Asp Ser Ala Ala Leu Leu Arg Phe Val Asp Ser Gln Arg Leu
420 425 430
Thr Pro Gly Gly Ser Val Ser Pro Glu His Val Met Tyr Leu Gly Ala
435 440 445
Phe Leu Gly Val Leu Tyr Ala Gly His Gly Arg Leu Ala Ala Ala Thr 450 455 460
His Thr Ala Arg Leu Thr Gly Val Thr Ser Leu Val Leu Thr Val Gly 465 470 475 480
Asp Val Asp Arg Met Ser Ala Phe Asp Arg Gly Pro Ala Gly Ala Ala 485 490 495 Gly Arg Thr Arg Thr Ala Gly Tyr Leu Asp Ala Leu Leu Thr Val Cys 500 505 510
Leu Ala Arg Ala Gln His Gly Gln Ser Val 515 520
(2) INFORMATION FOR SEQ ID NO: 276:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 602 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 276:
Met Thr Ala Ala Ala Leu Tyr Gly Gly Ala Lys Tyr Arg Pro Gly Thr 1 5 10 15
Leu Arg Asn Pro Gly Arg Val Ala Ser Thr Pro Arg Arg Arg Gly Val
20 25 30
Leu Tyr Gly Ala Leu Cys Pro Gly Ile Pro Phe Val Gly Ser Gly Pro 35 40 45 Gly Ala Val Gly Trp Glu Cys Val Cys Val Gly Gly Gly Arg Arg Asp 50 55 60
Gly Gly Pro Asp Gln Val Tyr Arg Gly Arg Ser Val Gly Arg Pro Asn 65 70 75 80
Arg Pro Phe Lys His Leu Arg Met His Arg Pro Ser Gln Ser Asp Thr 85 90 95
Gly Thr His Gln Arg Arg Lys Pro Pro Ser Pro Val Arg Val Arg Val
100 105 110
Phe Ser Gly Gly Val Phe Phe Leu Ser Ala Leu Leu Pro Pro His Leu 115 120 125 His His Pro Pro Pro Thr Trp Leu Ala Ile Gly Gly Lys Thr Met Lys 130 135 140
Thr Lys Pro Leu Pro Thr Ala Pro Met Ala Trp Ala Glu Ser Ala Val 145 150 155 160
Glu Thr Thr Thr Ser Pro Arg Glu Leu Ala Gly His Ala Pro Leu Arg 165 170 175
Arg Val Leu Arg Pro Pro Ile Ala Arg Arg Asp Gly Pro Val Leu Leu
180 185 190
Gly Asp Arg Ala Pro Arg Arg Thr Ala Ser Thr Met Trp Leu Leu Gly 195 200 205 Ile Asp Pro Ala Glu Ser Ser Pro Gly Thr Arg Ala Thr Arg Asp Asp 210 215 220
Thr Glu Gln Ala Val Asp Lys Ile Leu Arg Gly Ala Arg Arg Ala Gly 225 230 235 240
Gly Leu Thr Val Pro Gly Ala Pro Arg Tyr His Leu Thr Arg Gln Val 245 250 255
Thr Leu Thr Asp Leu Cys Gln Pro Asn Ala Glu Arg Ala Gly Ala Leu
260 265 270
Leu Leu Ala Leu Arg His Pro Thr Asp Leu Pro His Leu Ala Arg His 275 280 285
Arg Ala Pro Pro Gly Arg Gln Thr Glu Arg Leu Ala Glu Ala Trp Gly
290 295 300
Gln Leu Leu Glu Ala Ser Ala Leu Gly Ser Gly Arg Ala Glu Ser Gly 305 310 315 320
Cys Ala Arg Ala Gly Leu Val Ser Phe Asn Phe Leu Val Ala Ala Cys
325 330 335
Ala Ala Ala Tyr Asp Ala Arg Asp Ala Ala Glu Ala Val Arg Ala His 340 345 350 Ile Thr Thr Asn Tyr Gly Gly Thr Arg Ala Gly Ala Arg Leu Asp Arg 355 360 365
Phe Ser Glu Cys Leu Arg Ala Met Val His Thr His Val Phe Phe Val
370 375 380
Met Arg Phe Phe Gly Gly Leu Val Ser Trp Val Thr Gln Asp Glu Leu 385 390 395 400
Ala Ser Val Thr Ala Val Cys Ser Gly Pro Gln Glu Ala Thr His Thr
405 410 415
Gly His Pro Gly Arg Pro Cys Ser Ala Val Thr Ile Pro Ala Cys Ala 420 425 430 Phe Val Asp Leu Asp Ala Glu Leu Cys Leu Gly Gly Pro Gly Ala Ala 435 440 445
Phe Leu Tyr Leu Val Phe Tyr Gln Cys Arg Asp Gln Glu Leu Cys Cys
450 455 460
Val Tyr Val Val Lys Ser Gln Leu Pro Pro Arg Gly Leu Glu Ala Ala 465 470 475 480
Leu Glu Arg Leu Phe Gly Arg Leu Arg Ile Thr Asn Thr Ile His Gly
485 490 495
Ala Glu Asp Met Thr Pro Pro Pro Pro Asn Arg Asn Val Asp Phe Pro 500 505 510 Leu Ala Val Leu Ala Ala Ser Ser Gln Ser Pro Arg Cys Ser Ala Ser 515 520 525
Gln Val Thr Asn Pro Gln Phe Val Asp Arg Leu Tyr Arg Trp Gln Pro
530 535 540
Asp Leu Arg Gly Arg Pro Thr Ala Arg Thr Cys Thr Tyr Ala Ala Phe 545 550 555 560
Ala Glu Leu Gly Val Met Pro Asp Asn Ser Pro Arg Cys Leu His Arg
565 570 575
Thr Glu Arg Phe Gly Ala Val Gly Val Pro Val Val Ile Gly Val Val 580 585 590 Trp Arg Pro Gly Gly Trp Arg Ala Cys Ala 595 600
(2) INFORMATION FOR SEQ ID NO: 277:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 515 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277:
Met His Arg Pro Ser Gln Ser Asp Thr Gly Thr His Gln Arg Arg Lys
1 5 10 15
Pro Pro Ser Pro Val Arg Val Arg Val Phe Ser Gly Gly Val Phe Phe 20 25 30
Leu Ser Ala Leu Leu Pro Pro His Leu His His Pro Pro Pro Thr Trp
35 40 45
Leu Ala Ile Gly Gly Lys Thr Met Lys Thr Lys Pro Leu Pro Thr Ala 50 55 60 Pro Met Ala Trp Ala Glu Ser Ala Val Glu Thr Thr Thr Ser Pro Arg 65 70 75 80
Glu Leu Ala Gly His Ala Pro Leu Arg Arg Val Leu Arg Pro Pro Ile
85 90 95
Ala Arg Arg Asp Gly Pro Val Leu Leu Gly Asp Arg Ala Pro Arg Arg 100 105 110
Thr Ala Ser Thr Met Trp Leu Leu Gly Ile Asp Pro Ala Glu Ser Ser
115 120 125
Pro Gly Thr Arg Ala Thr Arg Asp Asp Thr Glu Gln Ala Val Asp Lys
130 135 140 Ile Leu Arg Gly Ala Arg Arg Ala Gly Gly Leu Thr Val Pro Gly Ala
145 150 155 160
Pro Arg Tyr His Leu Thr Arg Gln Val Thr Leu Thr Asp Leu Cys Gln
165 170 175
Pro Asn Ala Glu Arg Ala Gly Ala Leu Leu Leu Ala Leu Arg His Pro 180 185 190
Thr Asp Leu Pro His Leu Ala Arg His Arg Ala Pro Pro Gly Arg Gln
195 200 205
Thr Glu Arg Leu Ala Glu Ala Trp Gly Gln Leu Leu Glu Ala Ser Ala 210 215 220 Leu Gly Ser Gly Arg Ala Glu Ser Gly Cys Ala Arg Ala Gly Leu Val 225 230 235 240
Ser Phe Asn Phe Leu Val Ala Ala Cys Ala Ala Ala Tyr Asp Ala Arg 245 250 255 Asp Ala Ala Glu Ala Val Arg Ala His Ile Thr Thr Asn Tyr Gly Gly 260 265 270
Thr Arg Ala Gly Ala Arg Leu Asp Arg Phe Ser Glu Cys Leu Arg Ala 275 280 285
Met Val His Thr His Val Phe Phe Val Met Arg Phe Phe Gly Gly Leu 290 295 300
Val Ser Trp Val Thr Gln Asp Glu Leu Ala Ser Val Thr Ala Val Cys
305 310 315 320
Ser Gly Pro Gln Glu Ala Thr His Thr Gly His Pro Gly Arg Pro Cys
325 330 335
Ser Ala Val Thr Ile Pro Ala Cys Ala Phe Val Asp Leu Asp Ala Glu 340 345 350
Leu Cys Leu Gly Gly Pro Gly Ala Ala Phe Leu Tyr Leu Val Phe Tyr
355 360 365
Gln Cys Arg Asp Gln Glu Leu Cys Cys Val Tyr Val Val Lys Ser Gln 370 375 380
Leu Pro Pro Arg Gly Leu Glu Ala Ala Leu Glu Arg Leu Phe Gly Arg
385 390 395 400
Leu Arg Ile Thr Asn Thr Ile His Gly Ala Glu Asp Met Thr Pro Pro 405 410 415
Pro Pro Asn Arg Asn Val Asp Phe Pro Leu Ala Val Leu Ala Ala Ser 420 425 430
Ser Gln Ser Pro Arg Cys Ser Ala Ser Gln Val Thr Asn Pro Gln Phe 435 440 445
Val Asp Arg Leu Tyr Arg Trp Gln Pro Asp Leu Arg Gly Arg Pro Thr
450 455 460
Ala Arg Thr Cys Thr Tyr Ala Ala Phe Ala Glu Leu Gly Val Met Pro
465 470 475 480
Asp Asn Ser Pro Arg Cys Leu His Arg Thr Glu Arg Phe Gly Ala Val 485 490 495
Gly Val Pro Val Val Ile Gly Val Val Trp Arg Pro Gly Gly Trp Arg 500 505 510
Ala Cys Ala 515
(2) INFORMATION FOR SEQ ID NO: 278:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 460 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278:
Met Lys Thr Lys Pro Leu Pro Thr Ala Pro Met Ala Trp Ala Glu Ser 1 5 10 15
Ala Val Glu Thr Thr Thr Ser Pro Arg Glu Leu Ala Gly His Ala Pro
20 25 30
Leu Arg Arg Val Leu Arg Pro Pro Ile Ala Arg Arg Asp Gly Pro Val 35 40 45
Leu Leu Gly Asp Arg Ala Pro Arg Arg Thr Ala Ser Thr Met Trp Leu
50 55 60
Leu Gly Ile Asp Pro Ala Glu Ser Ser Pro Gly Thr Arg Ala Thr Arg 65 70 75 80 Asp Asp Thr Glu Gln Ala Val Asp Lys Ile Leu Arg Gly Ala Arg Arg
85 90 95
Ala Gly Gly Leu Thr Val Pro Gly Ala Pro Arg Tyr His Leu Thr Arg
100 105 110
Gln Val Thr Leu Thr Asp Leu Cys Gln Pro Asn Ala Glu Arg Ala Gly 115 120 125
Ala Leu Leu Leu Ala Leu Arg His Pro Thr Asp Leu Pro His Leu Ala
130 135 140
Arg His Arg Ala Pro Pro Gly Arg Gln Thr Glu Arg Leu Ala Glu Ala 145 150 155 160 Trp Gly Gln Leu Leu Glu Ala Ser Ala Leu Gly Ser Gly Arg Ala Glu
165 170 175
Ser Gly Cys Ala Arg Ala Gly Leu Val Ser Phe Asn Phe Leu Val Ala
180 185 190
Ala Cys Ala Ala Ala Tyr Asp Ala Arg Asp Ala Ala Glu Ala Val Arg 195 200 205
Ala His Ile Thr Thr Asn Tyr Gly Gly Thr Arg Ala Gly Ala Arg Leu
210 215 220
Asp Arg Phe Ser Glu Cys Leu Arg Ala Met Val His Thr His Val Phe 225 230 235 240 Phe Val Met Arg Phe Phe Gly Gly Leu Val Ser Trp Val Thr Gln Asp
245 250 255
Glu Leu Ala Ser Val Thr Ala Val Cys Ser Gly Pro Gln Glu Ala Thr
260 265 270
His Thr Gly His Pro Gly Arg Pro Cys Ser Ala Val Thr Ile Pro Ala 275 280 285
Cys Ala Phe Val Asp Leu Asp Ala Glu Leu Cys Leu Gly Gly Pro Gly
290 295 300
Ala Ala Phe Leu Tyr Leu Val Phe Tyr Gln Cys Arg Asp Gln Glu Leu 305 310 315 320
Cys Cys Val Tyr Val Val Lys Ser Gln Leu Pro Pro Arg Gly Leu Glu
325 330 335
Ala Ala Leu Glu Arg Leu Phe Gly Arg Leu Arg Ile Thr Asn Thr Ile 340 345 350
His Gly Ala Glu Asp Met Thr Pro Pro Pro Pro Asn Arg Asn Val Asp
355 360 365
Phe Pro Leu Ala Val Leu Ala Ala Ser Ser Gln Ser Pro Arg Cys Ser
370 375 380 Ala Ser Gln Val Thr Asn Pro Gln Phe Val Asp Arg Leu Tyr Arg Trp
385 390 395 400
Gln Pro Asp Leu Arg Gly Arg Pro Thr Ala Arg Thr Cys Thr Tyr Ala
405 410 415
Ala Phe Ala Glu Leu Gly Val Met Pro Asp Asn Ser Pro Arg Cys Leu 420 425 430
His Arg Thr Glu Arg Phe Gly Ala Val Gly Val Pro Val Val Ile Gly
435 440 445
Val Val Trp Arg Pro Gly Gly Trp Arg Ala Cys Ala 450 455 460
(2) INFORMATION FOR SEQ ID NO: 279:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 452 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 279:
Met Ala Val Val Cys Gly Ser Gly Leu Arg Leu Arg Pro Phe His Pro 1 5 10 15 Pro Ser Pro Ser Phe Phe Val Leu Arg Ala Leu Ile Arg Ala Gly Pro 20 25 30
Gly Pro Phe Ala Asp Arg Ala Pro Ser Gly Pro Gly Cys Gly Met Cys
35 40 45
Arg Gly Asp Ser Pro Gly Val Ala Gly Gly Ser Gly Glu His Cys Leu 50 55 60
Gly Gly Asp Asp Gly Asp Asp Gly Arg Pro Arg Leu Ala Cys Val Gly 65 70 75 80
Ala Ile Arg Phe Ala His Leu Trp Leu Gln Ala Thr Thr Leu Gly Phe 85 90 95
Val Gly Ser Val Val Leu Ser Arg Gly Pro Tyr Ala Asp Ala Met Ser
100 105 110
Gly Ala Phe Val Ile Gly Ser Thr Gly Leu Gly Phe Leu Arg Ala Pro 115 120 125
Pro Ala Phe Ala Arg Pro Pro Thr Arg Val Cys Ala Trp Leu Arg Leu
130 135 140
Val Gly Gly Gly Ala Ala Val Trp Ser Leu Gly Glu Ala Gly Ala Pro 145 150 155 160 Pro Gly Val Pro Gly Pro Ala Thr Gln Cys Leu Ala Leu Gly Ala Ala
165 170 175
Tyr Ala Ala Leu Leu Val Leu Ala Asp Asp Val His Pro Leu Phe Leu
180 185 190
Leu Ala Pro Arg Pro Leu Phe Val Gly Thr Leu Gly Val Val Val Gly 195 200 205
Gly Leu Thr Ile Gly Gly Ser Ala Arg Tyr Trp Trp Ile Asp Pro Arg
210 215 220
Ala Ala Ala Ala Leu Thr Ala Ala Val Val Ala Gly Leu Gly Thr Thr 225 230 235 240 Ala Ala Gly Asp Ser Phe Ser Lys Ala Cys Pro Arg His Arg Arg Phe
245 250 255
Cys Val Val Ser Ala Val Glu Ser Pro Pro Pro Arg Tyr Ala Pro Glu
260 265 270
Asp Ala Glu Arg Pro Thr Asp His Gly Pro Leu Leu Pro Ser Thr His 275 280 285
His Gln Arg Ser Pro Arg Val Cys Gly Asp Gly Ala Ala Arg Pro Glu
290 295 300
Asn Ile Trp Val Pro Val Val Thr Phe Ala Gly Ala Leu Ala Ala Cys 305 310 315 320 Ala Arg Ser Asp Ala Ala Pro Ser Gly Pro Val Leu Pro Leu Trp Pro
325 330 335
Gln Val Phe Val Gly Gly His Ala Ala Ala Gly Leu Thr Glu Leu Cys
340 345 350
Gln Thr Leu Ala Pro Arg Asp Leu Thr Asp Pro Leu Leu Phe Ala Tyr 355 360 365
Val Gly Phe Gln Val Val Asn His Gly Leu Met Phe Val Val Pro Asp
370 375 380
Ile Ala Val Tyr Ala Met Leu Gly Gly Ala Val Trp Ile Ser Leu Thr 385 390 395 400 Gln Val Leu Gly Leu Arg Arg Arg Leu His Lys Asp Pro Asp Ala Gly
405 410 415
Pro Trp Ala Ala Ala Thr Leu Arg Gly Leu Phe Phe Ser Val Tyr Ala 420 425 430 Leu Gly Phe Ala Ala Gly Val Leu Val Arg Pro Arg Met Ala Ala Ser
435 440 445
Arg Arg Ser Gly 450
(2) INFORMATION FOR SEQ ID NO: 280:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 406 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280:
Met Cys Arg Gly Asp Ser Pro Gly Val Ala Gly Gly Ser Gly Glu His
1 5 10 15
Cys Leu Gly Gly Asp Asp Gly Asp Asp Gly Arg Pro Arg Leu Ala Cys
20 25 30
Val Gly Ala Ile Arg Phe Ala His Leu Trp Leu Gln Ala Thr Thr Leu 35 40 45
Gly Phe Val Gly Ser Val Val Leu Ser Arg Gly Pro Tyr Ala Asp Ala 50 55 60
Met Ser Gly Ala Phe Val Ile Gly Ser Thr Gly Leu Gly Phe Leu Arg
65 70 75 80
Ala Pro Pro Ala Phe Ala Arg Pro Pro Thr Arg Val Cys Ala Trp Leu 85 90 95
Arg Leu Val Gly Gly Gly Ala Ala Val Trp Ser Leu Gly Glu Ala Gly 100 105 110
Ala Pro Pro Gly Val Pro Gly Pro Ala Thr Gln Cys Leu Ala Leu Gly
115 120 125
Ala Ala Tyr Ala Ala Leu Leu Val Leu Ala Asp Asp Val His Pro Leu 130 135 140
Phe Leu Leu Ala Pro Arg Pro Leu Phe Val Gly Thr Leu Gly Val Val
145 150 155 160
Val Gly Gly Leu Thr Ile Gly Gly Ser Ala Arg Tyr Trp Trp Ile Asp
165 170 175
Pro Arg Ala Ala Ala Ala Leu Thr Ala Ala Val Val Ala Gly Leu Gly 180 185 190
Thr Thr Ala Ala Gly Asp Ser Phe Ser Lys Ala Cys Pro Arg His Arg
195 200 205 Arg Phe Cys Val Val Ser Ala Val Glu Ser Pro Pro Pro Arg Tyr Ala
210 215 220
Pro Glu Asp Ala Glu Arg Pro Thr Asp His Gly Pro Leu Leu Pro Ser 225 230 235 240 Thr His His Gln Arg Ser Pro Arg Val Cys Gly Asp Gly Ala Ala Arg
245 250 255
Pro Glu Asn Ile Trp Val Pro Val Val Thr Phe Ala Gly Ala Leu Ala
260 265 270
Ala Cys Ala Arg Ser Asp Ala Ala Pro Ser Gly Pro Val Leu Pro Leu 275 280 285
Trp Pro Gln Val Phe Val Gly Gly His Ala Ala Ala Gly Leu Thr Glu
290 295 300
Leu Cys Gln Thr Leu Ala Pro Arg Asp Leu Thr Asp Pro Leu Leu Phe 305 310 315 320 Ala Tyr Val Gly Phe Gln Val Val Asn His Gly Leu Met Phe Val Val
325 330 335
Pro Asp Ile Ala Val Tyr Ala Met Leu Gly Gly Ala Val Trp Ile Ser
340 345 350
Leu Thr Gln Val Leu Gly Leu Arg Arg Arg Leu His Lys Asp Pro Asp 355 360 365
Ala Gly Pro Trp Ala Ala Ala Thr Leu Arg Gly Leu Phe Phe Ser Val
370 375 380
Tyr Ala Leu Gly Phe Ala Ala Gly Val Leu Val Arg Pro Arg Met Ala 385 390 395 400 Ala Ser Arg Arg Ser Gly
405
(2) INFORMATION FOR SEQ ID NO: 281:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 644 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281:
Met Gly Thr Glu Asp Cys Asp His Glu Gly Arg Ser Val Ala Ala Pro 1 5 10 15
Val Glu Val Met Ala Leu Tyr Ala Thr Asp Gly Cys Val Ile Thr Ser 20 25 30 Ser Leu Ala Leu Leu Thr Asn Cys Leu Leu Gly Ala Glu Pro Leu Tyr
35 40 45 Ile Phe Ser Tyr Asp Ala Tyr Arg Pro Asp Ala Pro Asn Gly Pro Thr 50 55 60 Gly Ala Pro Thr Glu Gln Glu Arg Phe Glu Gly Ser Arg Ala Leu Tyr 65 70 75 80
Arg Asp Ala Gly Gln Gly Asp Ser Phe Arg Val Thr Phe Cys Leu Leu
85 90 95 Gly Thr Glu Val Gly Val Thr His His Pro Lys Gly Arg Trp Met Phe
100 105 110
Val Cys Arg Phe Glu Arg Ala Asp Asp Val Ala Val Leu Gln Asp Ala
115 120 125
Leu Gly Arg Gly Thr Pro Leu Leu Pro Ala His Ile Thr Ala Thr Leu
130 135 140 Asp Leu Glu Ala Thr Phe Ala Leu His Ala Asn Ile Ile Met Ala Leu
145 150 155 160
Thr Val Ala Ile Val His Asn Ala Pro Ala Arg Ile Gly Ser Gly Ser
165 170 175 Thr Ala Pro Leu Tyr Glu Pro Gly Glu Ser Met Arg Ser Val Val Gly 180 185 190
Arg Met Ser Leu Gly Gln Arg Gly Leu Thr Thr Leu Phe Val His His
195 200 205
Glu Ala Arg Val Leu Ala Ala Tyr Arg Arg Ala Tyr Tyr Gly Ser Ala
210 215 220 Gln Ser Pro Phe Trp Phe Leu Ser Lys Phe Gly Pro Asp Glu Lys Ser
225 230 235 240
Leu Val Leu Ala Ala Arg Tyr Tyr Val Leu Gln Ala Pro Arg Leu Gly
245 250 255 Gly Ala Gly Ala Thr Tyr Asp Leu Gln Ala Val Lys Asp Ile Cys Ala 260 265 270
Thr Tyr Ala Ile Pro His Asp Pro Arg Pro Asp Thr Leu Ser Ala Ala
275 280 285
Ser Leu Thr Ser Phe Ala Ala Ile Thr Arg Phe Cys Cys Thr Ser Gln
290 295 300 Tyr Ser Arg Gly Ala Ala Ala Ala Gly Phe Pro Leu Tyr Val Glu Arg
305 310 315 320
Arg Ile Ala Ala Asp Val Arg Glu Thr Gly Ala Leu Glu Lys Phe Ile
325 330 335 Ala His Asp Arg Ser Cys Leu Arg Val Ser Asp Arg Glu Phe Ile Thr 340 345 350
Tyr Ile Tyr Leu Ala His Phe Glu Cys Phe Ser Pro Pro Arg Leu Ala
355 360 365 Thr His Leu Arg Ala Val Thr Thr His Asp Pro Ser Pro Ala Ala Ser 370 375 380
Thr Glu Gln Pro Ser Pro Leu Gly Arg Glu Ala Val Glu Gln Phe Phe 385 390 395 400
Arg His Val Arg Ala Gln Leu Asn Ile Arg Glu Tyr Val Lys Gln Asn 405 410 415
Val Thr Pro Arg Glu Thr Ala Gly Asp Ala Ala Ala Ala Tyr Leu Arg 420 425 430
Ala Arg Thr Tyr Ala Pro Ala Ala Leu Thr Pro Ala Pro Ala Tyr Cys 435 440 445 Gly Val Ala Asp Ser Ser Thr Lys Met Met Gly Arg Leu Ala Glu Ala 450 455 460
Glu Arg Leu Leu Val Pro His Gly Trp Pro Ala Phe Ala Pro Thr Thr 465 470 475 480
Pro Gly Asp Asp Ala Gly Gly Gly Thr Ala Ala Pro Gln Thr Cys Gly 485 490 495
Ile Val Lys Arg Leu Leu Lys Leu Ala Ala Thr Glu Gln Gln Gly Thr 500 505 510
Thr Pro Pro Ala Ile Ala Ala Leu Met Gln Asp Ala Ser Val Gln Thr 515 520 525 Pro Leu Pro Val Tyr Arg Ile Thr Met Ser Pro Thr Gly Gln Ala Phe 530 535 540
Ala Ala Ala Ala Arg Asp Asp Trp Ala Arg Val Thr Arg Asp Ala Arg 545 550 555 560
Pro Pro Glu Ala Thr Val Val Ala Asp Ala Ala Ala Ala Pro Glu Pro 565 570 575
Gly Ala Leu Gly Arg Arg Leu Thr Arg Arg Ile Cys Arg Pro Ala Pro 580 585 590
Pro Pro Gly Arg Pro Gly Arg Arg Gly Pro Asp Val Arg Glu Pro Gln 595 600 605 Arg Asp Leu Gln Arg Arg Ala Gly Arg Tyr Glu His His Pro Gly Ser 610 615 620
Gly His Arg Pro Glu Gly Ala Arg Pro Leu Ser Pro Ala Pro Arg Gly 625 630 635 640
Pro Gly Ser Leu
(2) INFORMATION FOR SEQ ID NO: 282:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 715 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 282:
Met Gly Ala Gly Lys Ser Ala Leu Thr Thr Ala Arg Ala Ser Cys Ser
1 5 10 15
Arg Gly Ser Xaa Ser Glu Gly Gly Ala Ala Ala Arg Ile Ile Ser Tyr 20 25 30 Cys Cys Ser Ser Gly Arg Val Pro Gln Pro His Ser Thr Pro Ser Arg 35 40 45
Asp Ala Ile Pro Glu His Arg Ser Ala Pro Ala Phe Pro His Pro Thr
50 55 60
Pro Ser Gly Phe Ala Gly Ala Met Gly Thr Glu Asp Cys Asp His Glu 65 70 75 80
Gly Arg Ser Val Ala Ala Pro Val Glu Val Met Ala Leu Tyr Ala Thr
85 90 95
Asp Gly Cys Val Ile Thr Ser Ser Leu Ala Leu Leu Thr Asn Cys Leu 100 105 110 Leu Gly Ala Glu Pro Leu Tyr Ile Phe Ser Tyr Asp Ala Tyr Arg Pro 115 120 125
Asp Ala Pro Asn Gly Pro Thr Gly Ala Pro Thr Glu Gln Glu Arg Phe
130 135 140
Glu Gly Ser Arg Ala Leu Tyr Arg Asp Ala Gly Gln Gly Asp Ser Phe 145 150 155 160
Arg Val Thr Phe Cys Leu Leu Gly Thr Glu Val Gly Val Thr His His
165 170 175
Pro Lys Gly Arg Trp Met Phe Val Cys Arg Phe Glu Arg Ala Asp Asp 180 185 190 Val Ala Val Leu Gln Asp Ala Leu Gly Arg Gly Thr Pro Leu Leu Pro 195 200 205
Ala His Ile Thr Ala Thr Leu Asp Leu Glu Ala Thr Phe Ala Leu His
210 215 220
Ala Asn Ile Ile Met Ala Leu Thr Val Ala Ile Val His Asn Ala Pro 225 230 235 240
Ala Arg Ile Gly Ser Gly Ser Thr Ala Pro Leu Tyr Glu Pro Gly Glu
245 250 255
Ser Met Arg Ser Val Val Gly Arg Met Ser Leu Gly Gln Arg Gly Leu 260 265 270 Thr Thr Leu Phe Val His His Glu Ala Arg Val Leu Ala Ala Tyr Arg 275 280 285
Arg Ala Tyr Tyr Gly Ser Ala Gln Ser Pro Phe Trp Phe Leu Ser Lys 290 295 300 Phe Gly Pro Asp Glu Lys Ser Leu Val Leu Ala Ala Arg Tyr Tyr Val
305 310 315 320
Leu Gln Ala Pro Arg Leu Gly Gly Ala Gly Ala Thr Tyr Asp Leu Gln
325 330 335 Ala Val Lys Asp Ile Cys Ala Thr Tyr Ala Ile Pro His Asp Pro Arg
340 345 350
Pro Asp Thr Leu Ser Ala Ala Ser Leu Thr Ser Phe Ala Ala Ile Thr
355 360 365
Arg Phe Cys Cys Thr Ser Gln Tyr Ser Arg Gly Ala Ala Ala Ala Gly 370 375 380
Phe Pro Leu Tyr Val Glu Arg Arg Ile Ala Ala Asp Val Arg Glu Thr
385 390 395 400
Gly Ala Leu Glu Lys Phe Ile Ala His Asp Arg Ser Cys Leu Arg Val
405 410 415 Ser Asp Arg Glu Phe Ile Thr Tyr Ile Tyr Leu Ala His Phe Glu Cys
420 425 430
Phe Ser Pro Pro Arg Leu Ala Thr His Leu Arg Ala Val Thr Thr His
435 440 445
Asp Pro Ser Pro Ala Ala Ser Thr Glu Gln Pro Ser Pro Leu Gly Arg 450 455 460
Glu Ala Val Glu Gln Phe Phe Arg His Val Arg Ala Gln Leu Asn Ile
465 470 475 480
Arg Glu Tyr Val Lys Gln Asn Val Thr Pro Arg Glu Thr Ala Gly Asp
485 490 495 Ala Ala Ala Ala Tyr Leu Arg Ala Arg Thr Tyr Ala Pro Ala Ala Leu
500 505 510
Thr Pro Ala Pro Ala Tyr Cys Gly Val Ala Asp Ser Ser Thr Lys Met
515 520 525
Met Gly Arg Leu Ala Glu Ala Glu Arg Leu Leu Val Pro His Gly Trp 530 535 540
Pro Ala Phe Ala Pro Thr Thr Pro Gly Asp Asp Ala Gly Gly Gly Thr
545 550 555 560
Ala Ala Pro Gln Thr Cys Gly Ile Val Lys Arg Leu Leu Lys Leu Ala
565 570 575 Ala Thr Glu Gln Gln Gly Thr Thr Pro Pro Ala Ile Ala Ala Leu Met
580 585 590
Gln Asp Ala Ser Val Gln Thr Pro Leu Pro Val Tyr Arg Ile Thr Met
595 600 605
Ser Pro Thr Gly Gln Ala Phe Ala Ala Ala Ala Arg Asp Asp Trp Ala 610 615 620
Arg Val Thr Arg Asp Ala Arg Pro Pro Glu Ala Thr Val Val Ala Asp 625 630 635 640
Ala Ala Ala Ala Pro Glu Pro Gly Ala Leu Gly Arg Arg Leu Thr Arg 645 650 655
Arg Ile Cys Arg Pro Ala Pro Pro Pro Gly Arg Pro Gly Arg Arg Gly
660 665 670
Pro Asp Val Arg Glu Pro Gln Arg Asp Leu Gln Arg Arg Ala Gly Arg
675 680 685
Tyr Glu His His Pro Gly Ser Gly His Arg Pro Glu Gly Ala Arg Pro
690 695 700
Leu Ser Pro Ala Pro Arg Gly Pro Gly Ser Leu 705 710 715
(2) INFORMATION FOR SEQ ID NO: 283:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 744 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283:
Met Gln Ala Trp Tyr Val Arg Ala Arg Ala Arg Ala Phe Thr Arg Arg 1 5 10 15 Arg Val Ser Ser Ser Asp Ser Arg Ala Ser Ser Ser Val Met Gly Ala 20 25 30
Gly Lys Ser Ala Leu Thr Thr Ala Arg Ala Ser Cys Ser Arg Gly Ser
35 40 45
Xaa Ser Glu Gly Gly Ala Ala Ala Arg Ile Ile Ser Tyr Cys Cys Ser 50 55 60
Ser Gly Arg Val Pro Gln Pro His Ser Thr Pro Ser Arg Asp Ala Ile 65 70 75 80
Pro Glu His Arg Ser Ala Pro Ala Phe Pro His Pro Thr Pro Ser Gly 85 90 95 Phe Ala Gly Ala Met Gly Thr Glu Asp Cys Asp His Glu Gly Arg Ser 100 105 110
Val Ala Ala Pro Val Glu Val Met Ala Leu Tyr Ala Thr Asp Gly Cys
115 120 125
Val Ile Thr Ser Ser Leu Ala Leu Leu Thr Asn Cys Leu Leu Gly Ala 130 135 140
Glu Pro Leu Tyr Ile Phe Ser Tyr Asp Ala Tyr Arg Pro Asp Ala Pro 145 150 155 160
Asn Gly Pro Thr Gly Ala Pro Thr Glu Gln Glu Arg Phe Glu Gly Ser 165 170 175
Arg Ala Leu Tyr Arg Asp Ala Gly Gln Gly Asp Ser Phe Arg Val Thr
180 185 190
Phe Cys Leu Leu Gly Thr Glu Val Gly Val Thr His His Pro Lys Gly 195 200 205
Arg Trp Met Phe Val Cys Arg Phe Glu Arg Ala Asp Asp Val Ala Val
210 215 220
Leu Gln Asp Ala Leu Gly Arg Gly Thr Pro Leu Leu Pro Ala His Ile 225 230 235 240 Thr Ala Thr Leu Asp Leu Glu Ala Thr Phe Ala Leu His Ala Asn Ile
245 250 255
Ile Met Ala Leu Thr Val Ala Ile Val His Asn Ala Pro Ala Arg Ile
260 265 270
Gly Ser Gly Ser Thr Ala Pro Leu Tyr Glu Pro Gly Glu Ser Met Arg 275 280 285
Ser Val Val Gly Arg Met Ser Leu Gly Gln Arg Gly Leu Thr Thr Leu
290 295 300
Phe Val His His Glu Ala Arg Val Leu Ala Ala Tyr Arg Arg Ala Tyr 305 310 315 320 Tyr Gly Ser Ala Gln Ser Pro Phe Trp Phe Leu Ser Lys Phe Gly Pro
325 330 335
Asp Glu Lys Ser Leu Val Leu Ala Ala Arg Tyr Tyr Val Leu Gln Ala
340 345 350
Pro Arg Leu Gly Gly Ala Gly Ala Thr Tyr Asp Leu Gln Ala Val Lys 355 360 365
Asp Ile Cys Ala Thr Tyr Ala Ile Pro His Asp Pro Arg Pro Asp Thr
370 375 380
Leu Ser Ala Ala Ser Leu Thr Ser Phe Ala Ala Ile Thr Arg Phe Cys 385 390 395 400 Cys Thr Ser Gln Tyr Ser Arg Gly Ala Ala Ala Ala Gly Phe Pro Leu
405 410 415
Tyr Val Glu Arg Arg Ile Ala Ala Asp Val Arg Glu Thr Gly Ala Leu
420 425 430
Glu Lys Phe Ile Ala His Asp Arg Ser Cys Leu Arg Val Ser Asp Arg 435 440 445
Glu Phe Ile Thr Tyr Ile Tyr Leu Ala His Phe Glu Cys Phe Ser Pro
450 455 460
Pro Arg Leu Ala Thr His Leu Arg Ala Val Thr Thr His Asp Pro Ser 465 470 475 480 Pro Ala Ala Ser Thr Thr Glu Gln Pro Ser Pro Leu Gly Arg Glu Ala
485 490 495
Val Glu Gln Phe Phe Arg His Arg Ala Gln Leu Asn Ile Arg Glu Tyr 500 505 510 Val Lys Gln Asn Val Thr Pro Arg Glu Thr Ala Gly Asp Ala Ala Ala
515 520 525
Ala Tyr Leu Arg Ala Arg Thr Tyr Ala Pro Ala Ala Leu Thr Pro Ala
530 535 540 Pro Ala Tyr Cys Gly Val Ala Asp Ser Ser Thr Lys Met Met Gly Arg
545 550 555 560
Leu Ala Glu Ala Glu Arg Leu Leu Val Pro His Gly Trp Pro Ala Phe
565 570 575
Ala Pro Thr Thr Pro Gly Asp Asp Ala Gly Gly Gly Thr Ala Ala Pro 580 585 590
Gln Thr Cys Gly Ile Val Lys Arg Leu Leu Lys Leu Ala Ala Thr Glu
595 600 605
Gln Gln Gly Thr Thr Pro Pro Ala Ile Ala Ala Leu Met Gln Asp Ala
610 615 620 Ser Val Gln Thr Pro Leu Pro Val Tyr Arg Ile Thr Met Ser Pro Thr
625 630 635 640
Gly Gln Ala Phe Ala Ala Ala Ala Arg Asp Asp Trp Ala Arg Val Thr
645 650 655
Arg Asp Ala Arg Pro Pro Glu Ala Thr Val Val Ala Asp Ala Ala Ala 660 665 670
Ala Pro Glu Pro Gly Ala Leu Gly Arg Arg Leu Thr Arg Arg Ile Cys
675 680 685
Arg Pro Ala Pro Pro Pro Gly Arg Pro Gly Arg Arg Gly Pro Asp Val
690 695 700 Arg Glu Pro Gln Arg Asp Leu Gln Arg Arg Ala Gly Arg Tyr Glu His
705 710 715 720
His Pro Gly Ser Gly His Arg Pro Glu Gly Ala Arg Pro Leu Ser Pro
725 730 735
Ala Pro Arg Gly Pro Gly Ser Leu 740
(2) INFORMATION FOR SEQ ID NO: 284:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 762 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284:
Met Val Glu Pro Ser Ser Pro Gly Trp Trp Arg Ala Ser Leu Ser Arg
1 5 10 15
Leu Thr Met Gln Ala Trp Tyr Val Arg Ala Arg Ala Arg Ala Phe Thr 20 25 30 Arg Arg Arg Val Ser Ser Ser Asp Ser Arg Ala Ser Ser Ser Val Met 35 40 45
Gly Ala Gly Lys Ser Ala Leu Thr Thr Ala Arg Ala Ser Cys Ser Arg
50 55 60
Gly Ser Xaa Ser Glu Gly Gly Ala Ala Ala Arg Ile Ile Ser Tyr Cys 65 70 75 80
Cys Ser Ser Gly Arg Val Pro Gln Pro His Ser Thr Pro Ser Arg Asp
85 90 95
Ala Ile Pro Glu His Arg Ser Ala Pro Ala Phe Pro His Pro Thr Pro 100 105 110 Ser Gly Phe Ala Gly Ala Met Gly Thr Glu Asp Cys Asp His Glu Gly 115 120 125
Arg Ser Val Ala Ala Pro Val Glu Val Met Ala Leu Tyr Ala Thr Asp
130 135 140
Gly Cys Val Ile Thr Ser Ser Leu Ala Leu Leu Thr Asn Cys Leu Leu 145 150 155 160
Gly Ala Glu Pro Leu Tyr Ile Phe Ser Tyr Asp Ala Tyr Arg Pro Asp
165 170 175
Ala Pro Asn Gly Pro Thr Gly Ala Pro Thr Glu Gln Glu Arg Phe Glu 180 185 190 Gly Ser Arg Ala Leu Tyr Arg Asp Ala Gly Gln Gly Asp Ser Phe Arg 195 200 205
Val Thr Phe Cys Leu Leu Gly Thr Glu Val Gly Val Thr His His Pro
210 215 220
Lys Gly Arg Trp Met Phe Val Cys Arg Phe Glu Arg Ala Asp Asp Val 225 230 235 240
Ala Val Leu Gln Asp Ala Leu Gly Arg Gly Thr Pro Leu Leu Pro Ala
245 250 255
His Ile Thr Ala Thr Leu Asp Leu Glu Ala Thr Phe Ala Leu His Ala 260 265 270 Asn Ile Ile Met Ala Leu Thr Val Ala Ile Val His Asn Ala Pro Ala 275 280 285
Arg Ile Gly Ser Gly Ser Thr Ala Pro Leu Tyr Glu Pro Gly Glu Ser
290 295 300
Met Arg Ser Val Val Gly Arg Met Ser Leu Gly Gln Arg Gly Leu Thr 305 310 315 320
Thr Leu Phe Val His His Glu Ala Arg Val Leu Ala Ala Tyr Arg Arg
325 330 335
Ala Tyr Tyr Gly Ser Ala Gln Ser Pro Phe Trp Phe Leu Ser Lys Phe 340 345 350
Gly Pro Asp Glu Lys Ser Leu Val Leu Ala Ala Arg Tyr Tyr Val Leu
355 360 365
Gln Ala Pro Arg Leu Gly Gly Ala Gly Ala Thr Tyr Asp Leu Gln Ala 370 375 380
Val Lys Asp Ile Cys Ala Thr Tyr Ala Ile Pro His Asp Pro Arg Pro
385 390 395 400
Asp Thr Leu Ser Ala Ala Ser Leu Thr Ser Phe Ala Ala Ile Thr Arg
405 410 415 Phe Cys Cys Thr Ser Gln Tyr Ser Arg Gly Ala Ala Ala Ala Gly Phe
420 425 430
Pro Leu Tyr Val Glu Arg Arg Ile Ala Ala Asp Val Arg Glu Thr Gly
435 440 445
Ala Leu Glu Lys Phe Ile Ala His Asp Arg Ser Cys Leu Arg Val Ser 450 455 460
Asp Arg Glu Phe Ile Thr Tyr Ile Tyr Leu Ala His Phe Glu Cys Phe
465 470 475 480
Ser Pro Pro Arg Leu Ala Thr His Leu Arg Ala Val Thr Thr His Asp
485 490 495 Pro Ser Pro Ala Ala Ser Thr Glu Gln Pro Ser Pro Leu Gly Arg Glu
500 505 510
Ala Val Glu Gln Phe Phe Arg His Val Arg Ala Gln Leu Asn Ile Arg
515 520 525
Glu Tyr Val Lys Gln Asn Val Thr Pro Arg Glu Thr Ala Gly Asp Ala 530 535 540
Ala Ala Ala Tyr Leu Arg Ala Arg Thr Tyr Ala Pro Ala Ala Leu Thr
545 550 555 560
Pro Ala Pro Ala Tyr Cys Gly Val Ala Asp Ser Ser Thr Lys Met Met
565 570 575 Gly Arg Leu Ala Glu Ala Glu Arg Leu Leu Val Pro His Gly Trp Pro
580 585 590
Ala Phe Ala Pro Thr Thr Pro Gly Asp Asp Ala Gly Gly Gly Thr Ala
595 600 605
Ala Pro Gln Thr Cys Gly Ile Val Lys Arg Leu Leu Lys Leu Ala Ala 610 615 620
Thr Glu Gln Gln Gly Thr Thr Pro Pro Ala Ile Ala Ala Leu Met Gln 625 630 635 640
Asp Ala Ser Val Gln Thr Pro Leu Pro Val Tyr Arg Ile Thr Met Ser 645 650 655 Pro Thr Gly Gln Ala Phe Ala Ala Ala Ala Arg Asp Asp Trp Ala Arg 660 665 670
Val Thr Arg Asp Ala Arg Pro Pro Glu Ala Thr Val Val Ala Asp Ala 675 680 685 Ala Ala Ala Pro Glu Pro Gly Ala Leu Gly Arg Arg Leu Thr Arg Arg
690 695 700
Ile Cys Arg Pro Ala Pro Pro Pro Gly Arg Pro Gly Arg Arg Gly Pro 705 710 715 720 Asp Val Arg Glu Pro Gln Arg Asp Leu Gln Arg Arg Ala Gly Arg Tyr
725 730 735
Glu His His Pro Gly Ser Gly His Arg Pro Glu Gly Ala Arg Pro Leu
740 745 750
Ser Pro Ala Pro Arg Gly Pro Gly Ser Leu 755 760
(2) INFORMATION FOR SEQ ID NO: 285:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 781 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285:
Met His Val Ser Ala Arg Arg Arg Ile Leu Ser Arg Cys Ala Ala Thr 1 5 10 15
Ala Pro Ser Met Val Glu Pro Ser Ser Pro Gly Trp Trp Arg Ala Ser
20 25 30
Leu Ser Arg Leu Thr Met Gln Ala Trp Tyr Val Arg Ala Arg Ala Arg 35 40 45 Ala Phe Thr Arg Arg Arg Val Ser Ser Ser Asp Ser Arg Ala Ser Ser 50 55 60
Ser Val Met Gly Ala Gly Lys Ser Ala Leu Thr Thr Ala Arg Ala Ser 65 70 75 80
Cys Ser Arg Gly Ser Xaa Ser Glu Gly Gly Ala Ala Ala Arg Ile Ile 85 90 95
Ser Tyr Cys Cys Ser Ser Gly Arg Val Pro Gln Pro His Ser Thr Pro
100 105 110
Ser Arg Asp Ala Ile Pro Glu His Arg Ser Ala Pro Ala Phe Pro His 115 120 125 Pro Thr Pro Ser Gly Phe Ala Gly Ala Met Gly Thr Glu Asp Cys Asp 130 135 140
His Glu Gly Arg Ser Val Ala Ala Pro Val Glu Val Met Ala Leu Tyr 145 150 155 160 Ala Thr Asp Gly Cys Val Ile Thr Ser Ser Leu Ala Leu Leu Thr Asn
165 170 175
Cys Leu Leu Gly Ala Glu Pro Leu Tyr Ile Phe Ser Tyr Asp Ala Tyr 180 185 190 Arg Pro Asp Ala Pro Asn Gly Pro Thr Gly Ala Pro Thr Glu Gln Glu 195 200 205
Arg Phe Glu Gly Ser Arg Ala Leu Tyr Arg Asp Ala Gly Gln Gly Asp
210 215 220
Ser Phe Arg Val Thr Phe Cys Leu Leu Gly Thr Glu Val Gly Val Thr 225 230 235 240
His His Pro Lys Gly Arg Trp Met Phe Val Cys Arg Phe Glu Arg Ala
245 250 255
Asp Asp Val Ala Val Leu Gln Asp Ala Leu Gly Arg Gly Thr Pro Leu 260 265 270 Leu Pro Ala His Ile Thr Ala Thr Leu Asp Leu Glu Ala Thr Phe Ala 275 280 285
Leu His Ala Asn Ile Ile Met Ala Leu Thr Val Ala Ile Val His Asn
290 295 300
Ala Pro Ala Arg Ile Gly Ser Gly Ser Thr Ala Pro Leu Tyr Glu Pro 305 310 315 320
Gly Glu Ser Met Arg Ser Val Val Gly Arg Met Ser Leu Gly Gln Arg
325 330 335
Gly Leu Thr Thr Leu Phe Val His His Glu Ala Arg Val Leu Ala Ala 340 345 350 Tyr Arg Arg Ala Tyr Tyr Gly Ser Ala Gln Ser Pro Phe Trp Phe Leu 355 360 365
Ser Lys Phe Gly Pro Asp Glu Lys Ser Leu Val Leu Ala Ala Arg Tyr
370 375 380
Tyr Val Leu Gln Ala Pro Arg Leu Gly Gly Ala Gly Ala Thr Tyr Asp 385 390 395 400
Leu Gln Ala Val Lys Asp Ile Cys Ala Thr Tyr Ala Ile Pro His Asp
405 410 415
Pro Arg Pro Asp Thr Leu Ser Ala Ala Ser Leu Thr Ser Phe Ala Ala 420 425 430 Ile Thr Arg Phe Cys Cys Thr Ser Gln Tyr Ser Arg Gly Ala Ala Ala 435 440 445
Ala Gly Phe Pro Leu Tyr Val Glu Arg Arg Ile Ala Ala Asp Val Arg
450 455 460
Glu Thr Gly Ala Leu Glu Lys Phe Ile Ala His Asp Arg Ser Cys Leu 465 470 475 480
Arg Val Ser Asp Arg Glu Phe Ile Thr Tyr Ile Tyr Leu Ala His Phe
485 490 495
Glu Cys Phe Ser Pro Pro Arg Leu Ala Thr His Leu Arg Ala Val Thr 500 505 510
Thr His Asp Pro Ser Pro Ala Ala Ser Thr Glu Gln Pro Ser Pro Leu
515 520 525
Gly Arg Glu Ala Val Glu Gln Phe Phe Arg His Val Arg Ala Gln Leu 530 535 540
Asn Ile Arg Glu Tyr Val Lys Gln Asn Val Thr Pro Arg Glu Thr Ala
545 550 555 560
Gly Asp Ala Ala Ala Ala Tyr Leu Arg Ala Arg Thr Tyr Ala Pro Ala
565 570 575 Ala Leu Thr Pro Ala Pro Ala Tyr Cys Gly Val Ala Asp Ser Ser Thr
580 585 590
Lys Met Met Gly Arg Leu Ala Glu Ala Glu Arg Leu Leu Val Pro His
595 600 605
Gly Trp Pro Ala Phe Ala Pro Thr Thr Pro Gly Asp Asp Ala Gly Gly 610 615 620
Gly Thr Ala Ala Pro Gln Thr Cys Gly Ile Val Lys Arg Leu Leu Lys
625 630 635 640
Leu Ala Ala Thr Glu Gln Gln Gly Thr Thr Pro Pro Ala Ile Ala Ala
645 650 655 Leu Met Gln Asp Ala Ser Val Gln Thr Pro Leu Pro Val Tyr Arg Ile
660 665 670
Thr Met Ser Pro Thr Gly Gln Ala Phe Ala Ala Ala Ala Arg Asp Asp
675 680 685
Trp Ala Arg Val Thr Arg Asp Ala Arg Pro Pro Glu Ala Thr Val Val 690 695 700
Ala Asp Ala Ala Ala Ala Pro Glu Pro Gly Ala Leu Gly Arg Arg Leu
705 710 715 720
Thr Arg Arg Ile Cys Arg Pro Ala Pro Pro Pro Gly Arg Pro Gly Arg
725 730 735 Arg Gly Pro Asp Val Arg Glu Pro Gln Arg Asp Leu Gln Arg Arg Ala
740 745 750
Gly Arg Tyr Glu His His Pro Gly Ser Gly His Arg Pro Glu Gly Ala
755 760 765
Arg Pro Leu Ser Pro Ala Pro Arg Gly Pro Gly Ser Leu 770 775 780
(2) INFORMATION FOR SEQ ID NO: 286:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 784 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286:
Met Val Ala Met His Val Ser Ala Arg Arg Arg Ile Leu Ser Arg Cys
1 5 10 15
Ala Ala Thr Ala Pro Ser Met Val Glu Pro Ser Ser Pro Gly Trp Trp 20 25 30 Arg Ala Ser Leu Ser Arg Leu Thr Met Gln Ala Trp Tyr Val Arg Ala 35 40 45
Arg Ala Arg Ala Phe Thr Arg Arg Arg Val Ser Ser Ser Asp Ser Arg
50 55 60
Ala Ser Ser Ser Val Met Gly Ala Gly Lys Ser Ala Leu Thr Thr Ala 65 70 75 80
Arg Ala Ser Cys Ser Arg Gly Ser Xaa Ser Glu Gly Gly Ala Ala Ala
85 90 95
Arg Ile Ile Ser Tyr Cys Cys Ser Ser Gly Arg Val Pro Gln Pro His 100 105 110 Ser Thr Pro Ser Arg Asp Ala Ile Pro Glu His Arg Ser Ala Pro Ala 115 120 125
Phe Pro His Pro Thr Pro Ser Gly Phe Ala Gly Ala Met Gly Thr Glu
130 135 140
Asp Cys Asp His Glu Gly Arg Ser Val Ala Ala Pro Val Glu Val Met 145 150 155 160
Ala Leu Tyr Ala Thr Asp Gly Cys Val Ile Thr Ser Ser Leu Ala Leu
165 170 175
Leu Thr Asn Cys Leu Leu Gly Ala Glu Pro Leu Tyr Ile Phe Ser Tyr 180 185 190 Asp Ala Tyr Arg Pro Asp Ala Pro Asn Gly Pro Thr Gly Ala Pro Thr 195 200 205
Glu Gln Glu Arg Phe Glu Gly Ser Arg Ala Leu Tyr Arg Asp Ala Gly
210 215 220
Gln Gly Asp Ser Phe Arg Val Thr Phe Cys Leu Leu Gly Thr Glu Val 225 230 235 240
Gly Val Thr His His Pro Lys Gly Arg Trp Met Phe Val Cys Arg Phe
245 250 255
Glu Arg Ala Asp Asp Val Ala Val Leu Gln Asp Ala Leu Gly Arg Gly 260 265 270 Thr Pro Leu Leu Pro Ala His Ile Thr Ala Thr Leu Asp Leu Glu Ala 275 280 285
Thr Phe Ala Leu His Ala Asn Ile Ile Met Ala Leu Thr Val Ala Ile 290 295 300 Val His Asn Ala Pro Ala Arg Ile Gly Ser Gly Ser Thr Ala Pro Leu
305 310 315 320
Tyr Glu Pro Gly Glu Ser Met Arg Ser Val Val Gly Arg Met Ser Leu
325 330 335 Gly Gln Arg Gly Leu Thr Thr Leu Phe Val His His Glu Ala Arg Val
340 345 350
Leu Ala Ala Tyr Arg Arg Ala Tyr Tyr Gly Ser Ala Gln Ser Pro Phe
355 360 365
Trp Phe Leu Ser Lys Phe Gly Pro Asp Glu Lys Ser Leu Val Leu Ala 370 375 380
Ala Arg Tyr Tyr Val Leu Gln Ala Pro Arg Leu Gly Gly Ala Gly Ala
385 390 395 400
Thr Tyr Asp Leu Gln Ala Val Lys Asp Ile Cys Ala Thr Tyr Ala Ile
405 410 415 Pro His Asp Pro Arg Pro Asp Thr Leu Ser Ala Ala Ser Leu Thr Ser
420 425 430
Phe Ala Ala Ile Thr Arg Phe Cys Cys Thr Ser Gln Tyr Ser Arg Gly
435 440 445
Ala Ala Ala Ala Gly Phe Pro Leu Tyr Val Glu Arg Arg Ile Ala Ala 450 455 460
Asp Val Arg Glu Thr Gly Ala Leu Glu Lys Phe Ile Ala His Asp Arg
465 470 475 480
Ser Cys Leu Arg Val Ser Asp Arg Glu Phe Ile Thr Tyr Ile Tyr Leu
485 490 495 Ala His Phe Glu Cys Phe Ser Pro Pro Arg Leu Ala Thr His Leu Arg
500 505 510
Ala Val Thr Thr His Asp Pro Ser Pro Ala Ala Ser Thr Glu Gln Pro
515 520 525
Ser Pro Leu Gly Arg Glu Ala Val Glu Gln Phe Phe Arg His Val Arg 530 535 540
Ala Gln Leu Asn Ile Arg Glu Tyr Val Lys Gln Asn Val Thr Pro Arg
545 550 555 560
Glu Thr Ala Gly Asp Ala Ala Ala Ala Tyr Leu Arg Ala Arg Thr Tyr
565 570 575 Ala Pro Ala Ala Leu Thr Pro Ala Pro Ala Tyr Cys Gly Val Ala Asp
580 585 590
Ser Ser Thr Lys Met Met Gly Arg Leu Ala Glu Ala Glu Arg Leu Leu
595 600 605
Val Pro His Gly Trp Pro Ala Phe Ala Pro Thr Thr Pro Gly Asp Asp 610 615 620
Ala Gly Gly Gly Thr Ala Ala Pro Gln Thr Cys Gly Ile Val Lys Arg 625 630 635 640
Leu Leu Lys Leu Ala Ala Thr Glu Gln Gln Gly Thr Thr Pro Pro Ala 645 650 655
Ile Ala Ala Leu Met Gln Asp Ala Ser Val Gln Thr Pro Leu Pro Val 660 665 670
Tyr Arg Ile Thr Met Ser Pro Thr Gly Gln Ala Phe Ala Ala Ala Ala 675 680 685
Arg Asp Asp Trp Ala Arg Val Thr Arg Asp Ala Arg Pro Pro Glu Ala 690 695 700
Thr Val Val Ala Asp Ala Ala Ala Ala Pro Glu Pro Gly Ala Leu Gly
705 710 715 720
Arg Arg Leu Thr Arg Arg Ile Cys Arg Pro Ala Pro Pro Pro Gly Arg 725 730 735
Pro Gly Arg Arg Gly Pro Asp Val Arg Glu Pro Gln Arg Asp Leu Gln 740 745 750
Arg Arg Ala Gly Arg Tyr Glu His His Pro Gly Ser Gly His Arg Pro
755 760 765
Glu Gly Ala Arg Pro Leu Ser Pro Ala Pro Arg Gly Pro Gly Ser Leu 770 775 780
(2) INFORMATION FOR SEQ ID NO: 287:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 789 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:287:
Met Tyr Ile Cys Arg Met Val Ala Met His Val Ser Ala Arg Arg Arg
1 5 10 15
Ile Leu Ser Arg Cys Ala Ala Thr Ala Pro Ser Met Val Glu Pro Ser 20 25 30 Ser Pro Gly Trp Trp Arg Ala Ser Leu Ser Arg Leu Thr Met Gln Ala 35 40 45
Trp Tyr Val Arg Ala Arg Ala Arg Ala Phe Thr Arg Arg Arg Val Ser
50 55 60
Ser Ser Asp Ser Arg Ala Ser Ser Ser Val Met Gly Ala Gly Lys Ser 65 70 75 80
Ala Leu Thr Thr Ala Arg Ala Ser Cys Ser Arg Gly Ser Xaa Ser Glu
85 90 95
Gly Gly Ala Ala Ala Arg Ile Ile Ser Tyr Cys Cys Ser Ser Gly Arg 100 105 110
Val Pro Gln Pro His Ser Thr Pro Ser Arg Asp Ala Ile Pro Glu His
115 120 125
Arg Ser Ala Pro Ala Phe Pro His Pro Thr Pro Ser Gly Phe Ala Gly 130 135 140
Ala Met Gly Thr Glu Asp Cys Asp His Glu Gly Arg Ser Val Ala Ala
145 150 155 160
Pro Val Glu Val Met Ala Leu Tyr Ala Thr Asp Gly Cys Val Ile Thr
165 170 175 Ser Ser Leu Ala Leu Leu Thr Asn Cys Leu Leu Gly Ala Glu Pro Leu
180 185 190
Tyr Ile Phe Ser Tyr Asp Ala Tyr Arg Pro Asp Ala Pro Asn Gly Pro
195 200 205
Thr Gly Ala Pro Thr Glu Gln Glu Arg Phe Glu Gly Ser Arg Ala Leu 210 215 220
Tyr Arg Asp Ala Gly Gln Gly Asp Ser Phe Arg Val Thr Phe Cys Leu
225 230 235 240
Leu Gly Thr Glu Val Gly Val Thr His His Pro Lys Gly Arg Trp Met
245 250 255 Phe Val Cys Arg Phe Glu Arg Ala Asp Asp Val Ala Val Leu Gln Asp
260 265 270
Ala Leu Gly Arg Gly Thr Pro Leu Leu Pro Ala His Ile Thr Ala Thr
275 280 285
Leu Asp Leu Glu Ala Thr Phe Ala Leu His Ala Asn Ile Ile Met Ala 290 295 300
Leu Thr Val Ala Ile Val His Asn Ala Pro Ala Arg Ile Gly Ser Gly
305 310 315 320
Ser Thr Ala Pro Leu Tyr Glu Pro Gly Glu Ser Met Arg Ser Val Val
325 330 335 Gly Arg Met Ser Leu Gly Gln Arg Gly Leu Thr Thr Leu Phe Val His
340 345 350
His Glu Ala Arg Val Leu Ala Ala Tyr Arg Arg Ala Tyr Tyr Gly Ser
355 360 365
Ala Gln Ser Pro Phe Trp Phe Leu Ser Lys Phe Gly Pro Asp Glu Lys 370 375 380
Ser Leu Val Leu Ala Ala Arg Tyr Tyr Val Leu Gln Ala Pro Arg Leu 385 390 395 400
Gly Gly Ala Gly Ala Thr Tyr Asp Leu Gln Ala Val Lys Asp Ile Cys 405 410 415 Ala Thr Tyr Ala Ile Pro His Asp Pro Arg Pro Asp Thr Leu Ser Ala 420 425 430
Ala Ser Leu Thr Ser Phe Ala Ala Ile Thr Arg Phe Cys Cys Thr Ser 435 440 445 Gln Tyr Ser Arg Gly Ala Ala Ala Ala Gly Phe Pro Leu Tyr Val Glu
450 455 460
Arg Arg Ile Ala Ala Asp Val Arg Glu Thr Gly Ala Leu Glu Lys Phe 465 470 475 480 Ile Ala His Asp Arg Ser Cys Leu Arg Val Ser Asp Arg Glu Phe Ile
485 490 495
Thr Tyr Ile Tyr Leu Ala His Phe Glu Cys Phe Ser Pro Pro Arg Leu
500 505 510
Ala Thr His Leu Arg Ala Val Thr Thr His Asp Pro Ser Pro Ala Ala 515 520 525
Ser Thr Glu Gln Pro Ser Pro Leu Gly Arg Glu Ala Val Glu Gln Phe
530 535 540
Phe Arg His Val Arg Ala Gln Leu Asn Ile Arg Glu Tyr Val Lys Gln 545 550 555 560 Asn Val Thr Pro Arg Glu Thr Ala Gly Asp Ala Ala Ala Ala Tyr Leu
565 570 575
Arg Ala Arg Thr Tyr Ala Pro Ala Ala Leu Thr Pro Ala Pro Ala Tyr
580 585 590
Cys Gly Val Ala Asp Ser Ser Thr Lys Met Met Gly Arg Leu Ala Glu 595 600 605
Ala Glu Arg Leu Leu Val Pro His Gly Trp Pro Ala Phe Ala Pro Thr
610 615 620
Thr Pro Gly Asp Asp Ala Gly Gly Gly Thr Ala Ala Pro Gln Thr Cys 625 630 635 640 Gly Ile Val Lys Arg Leu Leu Lys Leu Ala Ala Thr Glu Gln Gln Gly
645 650 655
Thr Thr Pro Pro Ala Ile Ala Ala Leu Met Gln Asp Ala Ser Val Gln
660 665 670
Thr Pro Leu Pro Val Tyr Arg Ile Thr Met Ser Pro Thr Gly Gln Ala 675 680 685
Phe Ala Ala Ala Ala Arg Asp Asp Trp Ala Arg Val Thr Arg Asp Ala
690 695 700
Arg Pro Pro Glu Ala Thr Val Val Ala Asp Ala Ala Ala Ala Pro Glu 705 710 715 720 Pro Gly Ala Leu Gly Arg Arg Leu Thr Arg Arg Ile Cys Arg Pro Ala
725 730 735
Pro Pro Pro Gly Arg Pro Gly Arg Arg Gly Pro Asp Val Arg Glu Pro
740 745 750
Gln Arg Asp Leu Gln Arg Arg Ala Gly Arg Tyr Glu His His Pro Gly 755 760 765
Ser Gly His Arg Pro Glu Gly Ala Arg Pro Leu Ser Pro Ala Pro Arg
770 775 780
Gly Pro Gly Ser Leu 785
(2) INFORMATION FOR SEQ ID NO: 288:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 809 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288:
Met Leu Arg Met Ala Trp Glu Thr Ser Thr Ser Ala Asp Leu Ser Ala 1 5 10 15
Ala Pro Thr Asp Met Tyr Ile Cys Arg Met Val Ala Met His Val Ser
20 25 30
Ala Arg Arg Arg Ile Leu Ser Arg Cys Ala Ala Thr Ala Pro Ser Met 35 40 45
Val Glu Pro Ser Ser Pro Gly Trp Trp Arg Ala Ser Leu Ser Arg Leu
50 55 60
Thr Met Gln Ala Trp Tyr Val Arg Ala Arg Ala Arg Ala Phe Thr Arg 65 70 75 80 Arg Arg Val Ser Ser Ser Asp Ser Arg Ala Ser Ser Ser Val Met Gly
85 90 95
Ala Gly Lys Ser Ala Leu Thr Thr Ala Arg Ala Ser Cys Ser Arg Gly
100 105 110
Ser Xaa Ser Glu Gly Gly Ala Ala Ala Arg Ile Ile Ser Tyr Cys Cys 115 120 125
Ser Ser Gly Arg Val Pro Gln Pro His Ser Thr Pro Ser Arg Asp Ala
130 135 140
Ile Pro Glu His Arg Ser Ala Pro Ala Phe Pro His Pro Thr Pro Ser 145 150 155 160 Gly Phe Ala Gly Ala Met Gly Thr Glu Asp Cys Asp His Glu Gly Arg
165 170 175
Ser Val Ala Ala Pro Val Glu Val Met Ala Leu Tyr Ala Thr Asp Gly
180 185 190
Cys Val Ile Thr Ser Ser Leu Ala Leu Leu Thr Asn Cys Leu Leu Gly 195 200 205
Ala Glu Pro Leu Tyr Ile Phe Ser Tyr Asp Ala Tyr Arg Pro Asp Ala
210 215 220
Pro Asn Gly Pro Thr Gly Ala Pro Thr Glu Gln Glu Arg Phe Glu Gly 225 230 235 240
Ser Arg Ala Leu Tyr Arg Asp Ala Gly Gln Gly Asp Ser Phe Arg Val
245 250 255
Thr Phe Cys Leu Leu Gly Thr Glu Val Gly Val Thr His His Pro Lys 260 265 270
Gly Arg Trp Met Phe Val Cys Arg Phe Glu Arg Ala Asp Asp Val Ala
275 280 285
Val Leu Gln Asp Ala Leu Gly Arg Gly Thr Pro Leu Leu Pro Ala His
290 295 300 Ile Thr Ala Thr Leu Asp Leu Glu Ala Thr Phe Ala Leu His Ala Asn
305 310 315 320
Ile Ile Met Ala Leu Thr Val Ala Ile Val His Asn Ala Pro Ala Arg
325 330 335
Ile Gly Ser Gly Ser Thr Ala Pro Leu Tyr Glu Pro Gly Glu Ser Met 340 345 350
Arg Ser Val Val Gly Arg Met Ser Leu Gly Gln Arg Gly Leu Thr Thr
355 360 365
Leu Phe Val His His Glu Ala Arg Val Leu Ala Ala Tyr Arg Arg Ala
370 375 380 Tyr Tyr Gly Ser Ala Gln Ser Pro Phe Trp Phe Leu Ser Lys Phe Gly
385 390 395 400
Pro Asp Glu Lys Ser Leu Val Leu Ala Ala Arg Tyr Tyr Val Leu Gln
405 410 415
Ala Pro Arg Leu Gly Gly Ala Gly Ala Thr Tyr Asp Leu Gln Ala Val 420 425 430
Lys Asp Ile Cys Ala Thr Tyr Ala Ile Pro His Asp Pro Arg Pro Asp
435 440 445
Thr Leu Ser Ala Ala Ser Leu Thr Ser Phe Ala Ala Ile Thr Arg Phe
450 455 460 Cys Cys Thr Ser Gln Tyr Ser Arg Gly Ala Ala Ala Ala Gly Phe Pro
465 470 475 480
Leu Tyr Val Glu Arg Arg Ile Ala Ala Asp Val Arg Glu Thr Gly Ala
485 490 495
Leu Glu Lys Phe Ile Ala His Asp Arg Ser Cys Leu Arg Val Ser Asp 500 505 510
Arg Glu Phe Ile Thr Tyr Ile Tyr Leu Ala His Phe Glu Cys Phe Ser
515 520 525
Pro Pro Arg Leu Ala Thr His Leu Arg Ala Val Thr Thr His Asp Pro 530 535 540 Ser Pro Ala Ala Ser Thr Glu Gln Pro Ser Pro Leu Gly Arg Glu Ala 545 550 555 560
Val Glu Gln Phe Phe Arg His Val Arg Ala Gln Leu Asn Ile Arg Glu 565 570 575 Tyr Val Lys Gln Asn Val Thr Pro Arg Glu Thr Ala Gly Asp Ala Ala
580 585 590
Ala Ala Tyr Leu Arg Ala Arg Thr Tyr Ala Pro Ala Ala Leu Thr Pro 595 600 605 Ala Pro Ala Tyr Cys Gly Val Ala Asp Ser Ser Thr Lys Met Met Gly 610 615 620
Arg Leu Ala Glu Ala Glu Arg Leu Leu Val Pro His Gly Trp Pro Ala 625 630 635 640
Phe Ala Pro Thr Thr Pro Gly Asp Asp Ala Gly Gly Gly Thr Ala Ala 645 650 655
Pro Gln Thr Cys Gly Ile Val Lys Arg Leu Leu Lys Leu Ala Ala Thr
660 665 670
Glu Gln Gln Gly Thr Thr Pro Pro Ala Ile Ala Ala Leu Met Gln Asp 675 680 685 Ala Ser Val Gln Thr Pro Leu Pro Val Tyr Arg Ile Thr Met Ser Pro 690 695 700
Thr Gly Gln Ala Phe Ala Ala Ala Ala Arg Asp Asp Trp Ala Arg Val 705 710 715 720
Thr Arg Asp Ala Arg Pro Pro Glu Ala Thr Val Val Ala Asp Ala Ala 725 730 735
Ala Ala Pro Glu Pro Gly Ala Leu Gly Arg Arg Leu Thr Arg Arg Ile
740 745 750
Cys Arg Pro Ala Pro Pro Pro Gly Arg Pro Gly Arg Arg Gly Pro Asp 755 760 765 Val Arg Glu Pro Gln Arg Asp Leu Gln Arg Arg Ala Gly Arg Tyr Glu 770 775 780
His His Pro Gly Ser Gly His Arg Pro Glu Gly Ala Arg Pro Leu Ser 785 790 795 800
Pro Ala Pro Arg Gly Pro Gly Ser Leu 805
(2) INFORMATION FOR SEQ ID NO: 289:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 816 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 289: Met Thr Thr Ser Leu Ser Ala Met Leu Arg Met Ala Trp Glu Thr Ser
1 5 10 15
Thr Ser Ala Asp Leu Ser Ala Ala Pro Thr Asp Met Tyr Ile Cys Arg 20 25 30
Met Val Ala Met His Val Ser Ala Arg Arg Arg Ile Leu Ser Arg Cys 35 40 45
Ala Ala Thr Ala Pro Ser Met Val Glu Pro Ser Ser Pro Gly Trp Trp 50 55 60
Arg Ala Ser Leu Ser Arg Leu Thr Met Gln Ala Trp Tyr Val Arg Ala
65 70 75 80
Arg Ala Arg Ala Phe Thr Arg Arg Arg Val Ser Ser Ser Asp Ser Arg 85 90 95
Ala Ser Ser Ser Val Met Gly Ala Gly Lys Ser Ala Leu Thr Thr Ala 100 105 110
Arg Ala Ser Cys Ser Arg Gly Ser Xaa Ser Glu Gly Gly Ala Ala Ala
115 120 125
Arg Ile Ile Ser Tyr Cys Cys Ser Ser Gly Arg Val Pro Gln Pro His
130 135 140
Ser Thr Pro Ser Arg Asp Ala Ile Pro Glu His Arg Ser Ala Pro Ala
145 150 155 160
Phe Pro His Pro Thr Pro Ser Gly Phe Ala Gly Ala Met Gly Thr Glu 165 170 175
Asp Cys Asp His Glu Gly Arg Ser Val Ala Ala Pro Val Glu Val Met 180 185 190
Ala Leu Tyr Ala Thr Asp Gly Cys Val Ile Thr Ser Ser Leu Ala Leu 195 200 205
Leu Thr Asn Cys Leu Leu Gly Ala Glu Pro Leu Tyr Ile Phe Ser Tyr 210 215 220
Asp Ala Tyr Arg Pro Asp Ala Pro Asn Gly Pro Thr Gly Ala Pro Thr
225 230 235 240
Glu Gln Glu Arg Phe Glu Gly Ser Arg Ala Leu Tyr Arg Asp Ala Gly 245 250 255
Gln Gly Asp Ser Phe Arg Val Thr Phe Cys Leu Leu Gly Thr Glu Val
260 265 270
Gly Val Thr His His Pro Lys Gly Arg Trp Met Phe Val Cys Arg Phe
275 280 285
Glu Arg Ala Asp Asp Val Ala Val Leu Gln Asp Ala Leu Gly Arg Gly 290 295 300
Thr Pro Leu Leu Pro Ala His Ile Thr Ala Thr Leu Asp Leu Glu Ala
305 310 315 320
Thr Phe Ala Leu His Ala Asn Ile Ile Met Ala Leu Thr Val Ala Ile 325 330 335
Val His Asn Ala Pro Ala Arg Ile Gly Ser Gly Ser Thr Ala Pro Leu 340 345 350
Tyr Glu Pro Gly Glu Ser Met Arg Ser Val Val Gly Arg Met Ser Leu
355 360 365
Gly Gln Arg Gly Leu Thr Thr Leu Phe Val His His Glu Ala Arg Val 370 375 380
Leu Ala Ala Tyr Arg Arg Ala Tyr Tyr Gly Ser Ala Gln Ser Pro Phe
385 390 395 400
Trp Phe Leu Ser Lys Phe Gly Pro Asp Glu Lys Ser Leu Val Leu Ala
405 410 415 Ala Arg Tyr Tyr Val Leu Gln Ala Pro Arg Leu Gly Gly Ala Gly Ala
420 425 430
Thr Tyr Asp Leu Gln Ala Val Lys Asp Ile Cys Ala Thr Tyr Ala Ile
435 440 445
Pro His Asp Pro Arg Pro Asp Thr Leu Ser Ala Ala Ser Leu Thr Ser 450 455 460
Phe Ala Ala Ile Thr Arg Phe Cys Cys Thr Ser Gln Tyr Ser Arg Gly
465 470 475 480
Ala Ala Ala Ala Gly Phe Pro Leu Tyr Val Glu Arg Arg Ile Ala Ala
485 490 495 Asp Val Arg Glu Thr Gly Ala Leu Glu Lys Phe Ile Ala His Asp Arg
500 505 510
Ser Cys Leu Arg Val Ser Asp Arg Glu Phe Ile Thr Tyr Ile Tyr Leu
515 520 525
Ala His Phe Glu Cys Phe Ser Pro Pro Arg Leu Ala Thr His Leu Arg 530 535 540
Ala Val Thr Thr His Asp Pro Ser Pro Ala Ala Ser Thr Glu Gln Pro
545 550 555 560
Ser Pro Leu Gly Arg Glu Ala Val Glu Gln Phe Phe Arg His Val Arg
565 570 575 Ala Gln Leu Asn Ile Arg Glu Tyr Val Lys Gln Asn Val Thr Pro Arg
580 585 590
Glu Thr Ala Gly Asp Ala Ala Ala Ala Tyr Leu Arg Ala Arg Thr Tyr
595 600 605
Ala Pro Ala Ala Leu Thr Pro Ala Pro Ala Tyr Cys Gly Val Ala Ala 610 615 620
Asp Ser Ser Thr Lys Met Met Gly Arg Leu Ala Glu Ala Glu Arg Leu 625 630 635 640
Leu Val Pro Gly Trp Pro Ala Phe Ala Pro Thr Thr Pro Gly Asp Asp 645 650 655 Ala Gly Gly Gly Thr Ala Ala Pro Gln Thr Cys Gly Ile Val Lys Arg 660 665 670
Leu Leu Lys Leu Ala Ala Thr Glu Gln Gln Gly Thr Thr Pro Pro Ala 675 680 685 Ile Ala Ala Leu Met Gln Asp Ala Ser Val Gln Thr Pro Leu Pro Val
690 695 700
Tyr Arg Ile Thr Met Ser Pro Thr Gly Gln Ala Phe Ala Ala Ala Ala 705 710 715 720 Arg Asp Asp Trp Ala Arg Val Thr Arg Asp Ala Arg Pro Pro Glu Ala
725 730 735
Thr Val Val Ala Asp Ala Ala Ala Ala Pro Glu Pro Gly Ala Leu Gly
740 745 750
Arg Arg Leu Thr Arg Arg Ile Cys Arg Pro Ala Pro Pro Pro Gly Arg 755 760 765
Pro Gly Arg Arg Gly Pro Asp Val Arg Glu Pro Gln Arg Asp Leu Gln
770 775 780
Arg Arg Ala Gly Arg Tyr Glu His His Pro Gly Ser Gly His Arg Pro 785 790 795 800 Glu Gly Ala Arg Pro Leu Ser Pro Ala Pro Arg Gly Pro Gly Ser Leu
805 810 815
(2) INFORMATION FOR SEQ ID NO: 290:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 184 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 290:
Met Thr Thr Thr Pro Leu Ser Asn Leu Phe Leu Arg Ala Pro Asp Ile 1 5 10 15
Thr His Val Ala Pro Pro Tyr Cys Leu Asn Ala Thr Trp Gln Ala Glu
20 25 30
Asn Ala Leu His Thr Thr Lys Thr Asp Pro Ala Cys Leu Ala Ala Arg 35 40 45
Ser Tyr Leu Val Arg Ala Ser Cys Ser Thr Ser Gly Pro Ile His Cys
50 55 60
Phe Phe Phe Ala Val Tyr Lys Asp Ser Gln His Ser Leu Pro Leu Val 65 70 75 80 Thr Glu Leu Arg Asn Phe Ala Asp Leu Val Asn His Pro Pro Val Leu
85 90 95
Arg Glu Leu Glu Asp Lys Arg Gly Gly Arg Leu Arg Cys Thr Gly Pro 100 105 110 Phe Ser Cys Gly Thr Ile Lys Asp Val Ser Gly Asp Ala Gly Glu Tyr
115 120 125
Thr Ile Asn Gly Ile Val Tyr His Cys His Cys Arg Tyr Pro Phe Ser
130 135 140 Lys Thr Cys Trp Leu Gly Ala Ser Ala Ala Leu Gln His Leu Arg Ser
145 150 155 160
Ile Ser Ser Ser Gly Thr Ala Ala Arg Ala Ala Glu Gln Arg Arg His
165 170 175
Lys Ile Lys Ile Lys Ile Lys Val 180
(2) INFORMATION FOR SEQ ID NO: 291:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 212 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 291:
Met Trp Gly Pro Gly Pro Ala Arg Phe Ile Ala Arg Pro Gly Thr His 1 5 10 15
Gly Arg Arg Val Phe Thr Asp Pro Pro Pro Arg Asn Met Thr Thr Thr
20 25 30
Pro Leu Ser Asn Leu Phe Leu Arg Ala Pro Asp Ile Thr His Val Ala 35 40 45 Pro Pro Tyr Cys Leu Asn Ala Thr Trp Gln Ala Glu Asn Ala Leu His 50 55 60
Thr Thr Lys Thr Asp Pro Ala Cys Leu Ala Ala Arg Ser Tyr Leu Val 65 70 75 80
Arg Ala Ser Cys Ser Thr Ser Gly Pro Ile His Cys Phe Phe Phe Ala 85 90 95
Val Tyr Lys Asp Ser Gln His Ser Leu Pro Leu Val Thr Glu Leu Arg
100 105 110
Asn Phe Ala Asp Leu Val Asn His Pro Pro Val Leu Arg Glu Leu Glu 115 120 125 Asp Lys Arg Gly Gly Arg Leu Arg Cys Thr Gly Pro Phe Ser Cys Gly 130 135 140
Thr Ile Lys Asp Val Ser Gly Asp Ala Gly Glu Tyr Thr Ile Asn Gly 145 150 155 160 Ile Val Tyr His Cys His Cys Arg Tyr Pro Phe Ser Lys Thr Cys Trp
165 170 175
Leu Gly Ala Ser Ala Ala Leu Gln His Leu Arg Ser Ile Ser Ser Ser 180 185 190 Gly Thr Ala Ala Arg Ala Ala Glu Gln Arg Arg His Lys Ile Lys Ile 195 200 205
Lys Ile Lys Val 210
(2) INFORMATION FOR SEQ ID NO: 292:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 670 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292:
Met Ala Ala Gln Arg Ala Arg Ala Pro Ala Met Arg Thr Arg Gly Gly
1 5 10 15
Asp Ala Ala Leu Cys Ala Pro Glu Asp Gly Trp Val Lys Val His Pro 20 25 30
Thr Pro Gly Thr Met Leu Phe Arg Glu Ile Leu Leu Gly Gln Met Gly
35 40 45
Tyr Thr Glu Gly Gln Gly Val Tyr Asn Val Val Arg Ser Ser Glu Ala 50 55 60 Ala Thr Arg Gln Leu Gln Ala Ala Ile Phe His Ala Leu Leu Asn Ala 65 70 75 80
Thr Tyr Asp Leu Glu Glu Asp Trp Arg Arg His Val Val Arg Leu Gln
85 90 95
Pro Gln Arg Leu Val Arg Arg Tyr Arg Asn Ala Arg Glu Gly Asp Ile 100 105 110
Ala Gly Val Ala Glu Arg Val Phe Asp Thr Trp Arg Cys Thr Leu Arg
115 120 125
Thr Thr Leu Leu Asp Phe Ala His Gly Val Val Asp Cys Phe Ala Pro 130 135 140 Gly Gly Pro Ser Gly Pro Thr Ser Phe Pro Lys Tyr Ile Asp Trp Leu 145 150 155 160
Thr Cys Leu Gly Leu Val Pro Ile Leu Arg Lys Thr Arg Glu Gly Glu 165 170 175 Ala Thr Gln Arg Leu Gly Ala Phe Leu Arg Gln His Thr Leu Pro Arg
180 185 190
Gln Leu Ala Thr Val Ala Gly Ala Ala Glu Arg Ala Gly Pro Gly Leu 195 200 205 Leu Glu Leu Ala Val Ala Phe Asp Ser Thr Arg Met Ala Glu Tyr Asp 210 215 220
Arg Val His Ile Tyr Tyr Asn His Arg Arg Gly Glu Trp Leu Val Arg 225 230 235 240
Asp Pro Val Ser Gly Gln Arg Gly Glu Cys Leu Val Leu Cys Pro Pro 245 250 255
Leu Trp Thr Gly Asp Arg Leu Val Phe Asp Ser Pro Val Gln Arg Leu
260 265 270
Cys Pro Glu Ile Val Ala Cys His Ala Leu Arg Glu His Ala His Ile 275 280 285 Cys Arg Leu Arg Asn Thr Ala Ser Val Lys Val Leu Leu Gly Arg Lys 290 295 300
Ser Asp Ser Gly Val Ala Gly Ala Ala Ala Arg Val Val Asn Lys Ala 305 310 315 320
Leu Gly Glu Asp Asp Glu Thr Lys Ala Gly Ser Ala Ala Ser Arg Leu 325 330 335
Val Arg Leu Ile Ile Met Lys Gly Met Arg His Val Gly Asp Ile Asn
340 345 350
Asp Thr Val Arg Ala Tyr Leu Asp Glu Ala Gly Gly His Leu Ile Asp 355 360 365 Thr Pro Ala Val Asp His Thr Leu Pro Gly Phe Gly Lys Gly Gly Thr 370 375 380
Gly Arg Gly Ser Ala Ala Gln Asp Pro Gly Ala Arg Pro Gln Gln Leu 385 390 395 400
Arg Gln Ala Phe Gln Thr Ala Val Val Asn Asn Ile Asn Gly Met Leu 405 410 415
Glu Gly Tyr Ile Asn Asn Leu Phe Gly Thr Ile Glu Arg Leu Arg Glu
420 425 430
Thr Asn Ala Gly Leu Ala Thr Gln Leu Gln Ala Arg Asp Arg Glu Leu 435 440 445 Arg Arg Ala Gln Ala Gly Ala Leu Glu Arg Glu Gln Arg Ala Ala Asp 450 455 460
Arg Ala Ala Gly Gly Gly Ala Gly Arg Pro Ala Glu Ala Asp Leu Leu 465 470 475 480
Arg Ala Asp Tyr Asp Ile Ile Asp Val Ser Lys Ser Met Asp Asp Asp 485 490 495
Thr Tyr Val Ala Asn Ser Phe Gln His Gln Tyr Ile Pro Ala Tyr Gly
500 505 510
Gln Asp Leu Glu Arg Leu Ser Arg Leu Trp Glu His Glu Leu Val Arg 515 520 525
Cys Phe Lys Ile Leu Arg His Arg Asn Asn Gln Gly Gln Glu Thr Ser
530 535 540
Ile Ser Tyr Ser Ser Gly Ala Ile Ala Ser Phe Val Ala Pro Tyr Phe 545 550 555 560
Glu Tyr Val Leu Arg Ala Pro Arg Ala Gly Ala Leu Ile Thr Gly Ser
565 570 575
Asp Val Ile Leu Gly Glu Glu Glu Leu Trp Glu Ala Val Phe Lys Lys 580 585 590 Thr Arg Leu Gln Thr Tyr Leu Thr Asp Val Ala Ala Leu Phe Val Ala 595 600 605
Asp Val Gln His Ala Ala Leu Pro Arg Pro Pro Ser Pro Thr Pro Ala
610 615 620
Asp Phe Arg Ala Ser Asp Arg Gly Gly Ser Arg Ser Arg Thr Arg Thr 625 630 635 640
Arg Ser Arg Ser Pro Gly Arg Thr Pro Arg Gly Ala Pro Asp Gln Gly
645 650 655
Trp Gly Val Glu Arg Arg Asp Gly Arg Pro His Ala Arg Arg 660 665 670
(2) INFORMATION FOR SEQ ID NO: 293:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 710 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:293:
Met Asp Val Lys Phe Lys Asn Ala Ser Ser Leu Asn Arg Thr Ala Gly 1 5 10 15 Leu Ala Pro Gly Cys Cys Gly Gly Gly Pro Gly Ala Arg Thr Ser Arg 20 25 30
Glu Pro Ser Pro Pro Asp Ala Ala Met Ala Ala Gln Arg Ala Arg Ala
35 40 45
Pro Ala Met Arg Thr Arg Gly Gly Asp Ala Ala Leu Cys Ala Pro Glu 50 55 60
Asp Gly Trp Val Lys Val His Pro Thr Pro Gly Thr Met Leu Phe Arg 65 70 75 80
Glu Ile Leu Leu Gly Gln Met Gly Tyr Thr Glu Gly Gln Gly Val Tyr 85 90 95
Asn Val Val Arg Ser Ser Glu Ala Ala Thr Arg Gln Leu Gln Ala Ala
100 105 110
Ile Phe His Ala Leu Leu Asn Ala Thr Tyr Asp Leu Glu Glu Asp Trp 115 120 125
Arg Arg His Val Val Arg Leu Gln Pro Gln Arg Leu Val Arg Arg Tyr
130 135 140
Arg Asn Ala Arg Glu Gly Asp Ile Ala Gly Val Ala Glu Arg Val Phe 145 150 155 160 Asp Thr Trp Arg Cys Thr Leu Arg Thr Thr Leu Leu Asp Phe Ala His
165 170 175
Gly Val Val Asp Cys Phe Ala Pro Gly Gly Pro Ser Gly Pro Thr Ser
180 185 190
Phe Pro Lys Tyr Ile Asp Trp Leu Thr Cys Leu Gly Leu Val Pro Ile 195 200 205
Leu Arg Lys Thr Arg Glu Gly Glu Ala Thr Gln Arg Leu Gly Ala Phe
210 215 220
Leu Arg Gln His Thr Leu Pro Arg Gln Leu Ala Thr Val Ala Gly Ala 225 230 235 240 Ala Glu Arg Ala Gly Pro Gly Leu Leu Glu Leu Ala Val Ala Phe Asp
245 250 255
Ser Thr Arg Met Ala Glu Tyr Asp Arg Val His Ile Tyr Tyr Asn His
260 265 270
Arg Arg Gly Glu Trp Leu Val Arg Asp Pro Val Ser Gly Gln Arg Gly 275 280 285
Glu Cys Leu Val Leu Cys Pro Pro Leu Trp Thr Gly Asp Arg Leu Val
290 295 300
Phe Asp Ser Pro Val Gln Arg Leu Cys Pro Glu Ile Val Ala Cys His 305 310 315 320 Ala Leu Arg Glu His Ala His Ile Cys Arg Leu Arg Asn Thr Ala Ser
325 330 335
Val Lys Val Leu Leu Gly Arg Lys Ser Asp Ser Gly Val Ala Gly Ala
340 345 350
Ala Arg Val Val Asn Lys Ala Leu Gly Glu Asp Asp Glu Thr Lys Ala 355 360 365
Gly Ser Ala Ala Ser Arg Leu Val Arg Leu Ile Ile Asn Met Lys Gly
370 375 380
Met Arg His Val Gly Asp Ile Asn Asp Thr Val Arg Ala Tyr Leu Asp 385 390 395 400 Glu Ala Gly Gly His Leu Ile Asp Thr Pro Ala Val Asp His Thr Leu
405 410 415
Pro Gly Phe Gly Lys Gly Gly Thr Gly Arg Gly Ser Ala Ala Gln Asp 420 425 430 Pro Gly Ala Arg Pro Gln Gln Leu Arg Gln Ala Phe Gln Thr Ala Val
435 440 445
Val Asn Asn Ile Asn Gly Met Leu Glu Gly Tyr Ile Asn Asn Leu Phe
450 455 460 Gly Thr Ile Glu Arg Leu Arg Glu Thr Asn Ala Gly Leu Ala Thr Gln
465 470 475 480
Leu Gln Ala Arg Asp Arg Glu Leu Arg Arg Ala Gln Ala Gly Ala Leu
485 490 495
Glu Arg Glu Gln Arg Ala Ala Asp Arg Ala Ala Gly Gly Gly Ala Gly 500 505 510
Arg Pro Ala Glu Ala Asp Leu Leu Arg Ala Asp Tyr Asp Ile Ile Asp
515 520 525
Val Ser Lys Ser Met Asp Asp Asp Thr Tyr Val Ala Asn Ser Phe Gln
530 535 540 His Gln Tyr Ile Pro Ala Tyr Gly Gln Asp Leu Glu Arg Leu Ser Arg
545 550 555 560
Leu Trp Glu His Glu Leu Val Arg Cys Phe Lys Ile Leu Arg His Arg
565 570 575
Asn Asn Gln Gly Gln Glu Thr Ser Ile Ser Tyr Ser Ser Gly Ala Ile 580 585 590
Ala Ser Phe Val Ala Pro Tyr Phe Glu Tyr Val Leu Arg Ala Pro Arg
595 600 605
Ala Gly Ala Leu Ile Thr Gly Ser Asp Val Ile Leu Gly Glu Glu Glu
610 615 620 Leu Trp Glu Ala Val Phe Lys Lys Thr Arg Leu Gln Thr Tyr Leu Thr
625 630 635 640
Asp Val Ala Ala Leu Phe Val Ala Asp Val Gln His Ala Ala Leu Pro
645 650 655
Arg Pro Pro Ser Pro Thr Pro Ala Asp Phe Arg Ala Ser Asp Arg Gly 660 665 670
Gly Ser Arg Ser Arg Thr Arg Thr Arg Ser Arg Ser Pro Gly Arg Thr
675 680 685
Pro Arg Gly Ala Pro Asp Gln Gly Trp Gly Val Glu Arg Arg Asp Gly 690 695 700 Arg Pro His Ala Arg Arg 705 710
(2) INFORMATION FOR SEQ ID NO: 294:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 720 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 294:
Met Arg Ala Met Ile Gly Trp Thr Pro Cys Met Asp Val Lys Phe Lys
1 5 10 15
Asn Ala Ser Ser Leu Asn Arg Thr Ala Gly Leu Ala Pro Gly Cys Cys 20 25 30
Gly Gly Gly Pro Gly Ala Arg Thr Ser Arg Glu Pro Ser Pro Pro Asp
35 40 45
Ala Ala Met Ala Ala Gln Arg Ala Arg Ala Pro Ala Met Arg Thr Arg 50 55 60 Gly Gly Asp Ala Ala Leu Cys Ala Pro Glu Asp Gly Trp Val Lys Val 65 70 75 80
His Pro Thr Pro Gly Thr Met Leu Phe Arg Glu Ile Leu Leu Gly Gln
85 90 95
Met Gly Tyr Thr Glu Gly Gln Gly Val Tyr Asn Val Val Arg Ser Ser 100 105 110
Glu Ala Ala Thr Arg Gln Leu Gln Ala Ala Ile Phe His Ala Leu Leu
115 120 125
Asn Ala Thr Tyr Asp Leu Glu Glu Asp Trp Arg Arg His Val Val Arg
130 135 140 Leu Gln Pro Gln Arg Leu Val Arg Arg Tyr Arg Asn Ala Arg Glu Gly
145 150 155 160
Asp Ile Ala Gly Val Ala Glu Arg Val Phe Asp Thr Trp Arg Cys Thr
165 170 175
Leu Arg Thr Thr Leu Leu Asp Phe Ala His Gly Val Val Asp Cys Phe 180 185 190
Ala Pro Gly Gly Pro Ser Gly Pro Thr Ser Phe Pro Lys Tyr Ile Asp
195 200 205
Trp Leu Thr Cys Leu Gly Leu Val Pro Ile Leu Arg Lys Thr Arg Glu
210 215 220 Gly Glu Ala Thr Gln Arg Leu Gly Ala Phe Leu Arg Gln His Thr Leu
225 230 235 240
Pro Arg Gln Leu Ala Thr Val Ala Gly Ala Ala Glu Arg Ala Gly Pro
245 250 255
Gly Leu Leu Glu Leu Ala Val Ala Phe Asp Ser Thr Arg Met Ala Glu 260 265 270
Tyr Asp Arg Val His Ile Tyr Tyr Asn His Arg Arg Gly Glu Trp Leu
275 280 285
Val Arg Asp Pro Val Ser Gly Gln Arg Gly Glu Cys Leu Val Leu Cys 290 295 300
Pro Pro Leu Trp Thr Gly Asp Arg Leu Val Phe Asp Ser Pro Val Gln
305 310 315 320
Arg Leu Cys Pro Glu Ile Val Ala Cys His Ala Leu Arg Glu His Ala 325 330 335
His Ile Cys Arg Leu Arg Asn Thr Ala Ser Val Lys Val Leu Leu Gly 340 345 350
Arg Lys Ser Asp Ser Gly Val Ala Gly Ala Ala Arg Val Val Asn Lys 355 360 365
Ala Leu Gly Glu Asp Asp Glu Thr Lys Ala Gly Ser Ala Ala Ser Arg 370 375 380
Leu Val Arg Leu Ile Ile Asn Met Lys Gly Met Arg His Val Gly Asp
385 390 395 400
Ile Asn Asp Thr Val Arg Ala Tyr Leu Asp Glu Ala Gly Gly His Leu 405 410 415
Ile Asp Thr Pro Ala Val Asp His Thr Leu Pro Gly Phe Gly Lys Gly 420 425 430
Gly Thr Gly Arg Gly Ser Ala Ala Gln Asp Pro Gly Ala Arg Pro Gln
435 440 445
Gln Leu Arg Gln Ala Phe Gln Thr Ala Val Val Asn Asn Ile Asn Gly 450 455 460
Met Leu Glu Gly Tyr Ile Asn Asn Leu Phe Gly Thr Ile Glu Arg Leu
465 470 475 480
Arg Glu Thr Asn Ala Gly Leu Ala Thr Gln Leu Gln Ala Arg Asp Arg 485 490 495
Glu Leu Arg Arg Ala Gln Ala Gly Ala Leu Glu Arg Glu Gln Arg Ala 500 505 510
Ala Asp Arg Ala Ala Gly Gly Gly Ala Gly Arg Pro Ala Glu Ala Asp
515 520 525
Leu Leu Arg Ala Asp Tyr Asp Ile Ile Asp Val Ser Lys Ser Met Asp 530 535 540
Asp Asp Thr Tyr Val Ala Asn Ser Phe Gln His Gln Tyr Ile Pro Ala
545 550 555 560
Tyr Gly Gln Asp Leu Glu Arg Leu Ser Arg Leu Trp Glu His Glu Leu 565 570 575
Val Arg Cys Phe Lys Ile Leu Arg His Arg Asn Asn Gln Gly Gln Glu 580 585 590
Thr Ser Ile Ser Tyr Ser Ser Gly Ala Ile Ala Ser Phe Val Ala Pro 595 600 605
Tyr Phe Glu Tyr Val Leu Arg Ala Pro Arg Ala Gly Ala Leu Ile Thr 610 615 620
Gly Ser Asp Val Ile Leu Gly Glu Glu Glu Leu Trp Glu Ala Val Phe
625 630 635 640 Lys Lys Thr Arg Leu Gln Thr Tyr Leu Thr Asp Val Ala Ala Leu Phe
645 650 655
Val Ala Asp Val Gln His Ala Ala Leu Pro Arg Pro Pro Ser Pro Thr 660 665 670 Pro Ala Asp Phe Arg Ala Ser Asp Arg Gly Gly Ser Arg Ser Arg Thr 675 680 685
Arg Thr Arg Ser Arg Ser Pro Gly Arg Thr Pro Arg Gly Ala Pro Asp
690 695 700
Gln Gly Trp Gly Val Glu Arg Arg Asp Gly Arg Pro His Ala Arg Arg 705 710 715 720
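Each record in this listing pairs a block of three-letter residue codes with a declared LENGTH. A minimal Python sketch for checking one against the other, assuming the residue block is available as plain text: it keeps only alphabetic tokens (so position numerals and stray punctuation are ignored), maps the standard three-letter codes to one-letter codes, and also maps a few common OCR misreadings of scanned listings (such as "Gin" for "Gln"). The OCR table and the names `parse_residues` and `check_length` are illustrative assumptions, not part of the patent.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed OCR misreadings of scanned listings, mapped back to real codes.
OCR_FIXES = {"Gin": "Gln", "He": "Ile", "lie": "Ile"}


def parse_residues(text: str) -> str:
    """Convert a three-letter residue listing to a one-letter string.

    Only alphabetic tokens are considered, so position numerals and
    stray punctuation (e.g. "Met.") are skipped.
    """
    residues = []
    for token in re.findall(r"[A-Za-z]+", text):
        token = OCR_FIXES.get(token, token)
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognised residue token: {token!r}")
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


def check_length(text: str, declared_length: int) -> bool:
    """True if the parsed residue count equals the declared LENGTH."""
    return len(parse_residues(text)) == declared_length


if __name__ == "__main__":
    # Last line of SEQ ID NO: 294, deliberately using the OCR form "Gin".
    tail = ("Gin Gly Trp Gly Val Glu Arg Arg Asp Gly Arg Pro His Ala "
            "Arg Arg 705 710 715 720")
    print(parse_residues(tail))  # QGWGVERRDGRPHARR
```

If a record survived extraction intact, parsing all residue lines of SEQ ID NO: 294 should give 720 residues, matching its declared LENGTH; a mismatch would point to residues lost or garbled in extraction.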
(2) INFORMATION FOR SEQ ID NO: 295:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 763 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295:
Met Arg Tyr Ala Ala Asn Gly Asn Ser Arg Ser Gly Arg Pro Val Gly 1 5 10 15
Thr Ser Lys Ala Ala Thr Ser Arg Asn His Cys Arg Arg Gly Thr Cys
20 25 30
Val Thr Ser Ser Cys Cys Cys Glu Ser Ser Arg Met Arg Ala Met Ile 35 40 45 Gly Trp Thr Pro Cys Met Asp Val Lys Phe Lys Asn Ala Ser Ser Leu 50 55 60
Asn Arg Thr Ala Gly Leu Ala Pro Gly Cys Cys Gly Gly Gly Pro Gly 65 70 75 80
Ala Arg Thr Ser Arg Glu Pro Ser Pro Pro Asp Ala Ala Met Ala Ala 85 90 95
Gln Arg Ala Arg Ala Pro Ala Met Arg Thr Arg Gly Gly Asp Ala Ala
100 105 110
Leu Cys Ala Pro Glu Asp Gly Trp Val Lys Val His Pro Thr Pro Gly 115 120 125 Thr Met Leu Phe Arg Glu Ile Leu Leu Gly Gln Met Gly Tyr Thr Glu 130 135 140
Gly Gln Gly Val Tyr Asn Val Val Arg Ser Ser Glu Ala Ala Thr Arg 145 150 155 160 Gln Leu Gln Ala Ala Ile Phe His Ala Leu Leu Asn Ala Thr Tyr Asp
165 170 175
Leu Glu Glu Asp Trp Arg Arg His Val Val Arg Leu Gln Pro Gln Arg 180 185 190 Leu Val Arg Arg Tyr Arg Asn Ala Arg Glu Gly Asp Ile Ala Gly Val 195 200 205
Ala Glu Arg Val Phe Asp Thr Trp Arg Cys Thr Leu Arg Thr Thr Leu
210 215 220
Leu Asp Phe Ala His Gly Val Val Asp Cys Phe Ala Pro Gly Gly Pro 225 230 235 240
Ser Gly Pro Thr Ser Phe Pro Lys Tyr Ile Asp Trp Leu Thr Cys Leu
245 250 255
Gly Leu Val Pro Ile Leu Arg Lys Thr Arg Glu Gly Glu Ala Thr Gln 260 265 270 Arg Leu Gly Ala Phe Leu Arg Gln His Thr Leu Pro Arg Gln Leu Ala 275 280 285
Thr Val Ala Gly Ala Ala Glu Arg Ala Gly Pro Gly Leu Leu Glu Leu
290 295 300
Ala Val Ala Phe Asp Ser Thr Arg Met Ala Glu Tyr Asp Arg Val His 305 310 315 320 Ile Tyr Tyr Asn His Arg Arg Gly Glu Trp Leu Val Arg Asp Pro Val
325 330 335
Ser Gly Gln Arg Gly Glu Cys Leu Val Leu Cys Pro Pro Leu Trp Thr 340 345 350 Gly Asp Arg Leu Val Phe Asp Ser Pro Val Gln Arg Leu Cys Pro Glu 355 360 365
Ile Val Ala Cys His Ala Leu Arg Glu His Ala His Ile Cys Arg Leu
370 375 380
Arg Asn Thr Ala Ser Val Lys Val Leu Leu Gly Arg Lys Ser Asp Ser 385 390 395 400
Gly Val Ala Gly Ala Ala Arg Val Val Asn Lys Ala Leu Gly Glu Asp
405 410 415
Asp Glu Thr Lys Ala Gly Ser Ala Ala Ser Arg Leu Val Arg Leu Ile 420 425 430 Ile Asn Met Lys Gly Met Arg His Val Gly Asp Ile Asn Asp Thr Val 435 440 445
Arg Ala Tyr Leu Asp Glu Ala Gly Gly His Leu Ile Asp Thr Pro Ala
450 455 460
Val Asp His Thr Leu Pro Gly Phe Gly Lys Gly Gly Thr Gly Arg Gly 465 470 475 480
Ser Ala Ala Gln Asp Pro Gly Ala Arg Pro Gln Gln Leu Arg Gln Ala
485 490 495
Phe Gln Thr Ala Val Val Asn Asn Ile Asn Gly Met Leu Glu Gly Tyr 500 505 510
Ile Asn Asn Leu Phe Gly Thr Ile Glu Arg Leu Arg Glu Thr Asn Ala
515 520 525
Gly Leu Ala Thr Gln Leu Gln Ala Arg Asp Arg Glu Leu Arg Arg Ala 530 535 540
Gln Ala Gly Ala Leu Glu Arg Glu Gln Arg Ala Ala Asp Arg Ala Ala
545 550 555 560
Gly Gly Gly Ala Gly Arg Pro Ala Glu Ala Asp Leu Leu Arg Ala Asp
565 570 575 Tyr Asp Ile Ile Asp Val Ser Lys Ser Met Asp Asp Asp Thr Tyr Val
580 585 590
Ala Asn Ser Phe Gln His Gln Tyr Ile Pro Ala Tyr Gly Gln Asp Leu
595 600 605
Glu Arg Leu Ser Arg Leu Trp Glu His Glu Leu Val Arg Cys Phe Lys 610 615 620
Ile Leu Arg His Arg Asn Asn Gln Gly Gln Glu Thr Ser Ile Ser Tyr
625 630 635 640
Ser Ser Gly Ala Ile Ala Ser Phe Val Ala Pro Tyr Phe Glu Tyr Val
645 650 655 Leu Arg Ala Pro Arg Ala Gly Ala Leu Ile Thr Gly Ser Asp Val Ile
660 665 670
Leu Gly Glu Glu Glu Leu Trp Glu Ala Val Phe Lys Lys Thr Arg Leu
675 680 685
Gln Thr Tyr Leu Thr Asp Val Ala Ala Leu Phe Val Ala Asp Val Gln 690 695 700
His Ala Ala Leu Pro Arg Pro Pro Ser Pro Thr Pro Ala Asp Phe Arg 705 710 715 720
Ala Ser Asp Arg Gly Gly Ser Arg Ser Arg Thr Arg Thr Arg Ser Arg 725 730 735 Ser Pro Gly Arg Thr Pro Arg Gly Ala Pro Asp Gln Gly Trp Gly Val 740 745 750
Glu Arg Arg Asp Gly Arg Pro His Ala Arg Arg 755 760
(2) INFORMATION FOR SEQ ID NO: 296:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 414 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296:
Met Ala Asp Ile Pro Pro Asp Pro Pro Ala Leu Asn Thr Thr Pro Ala 1 5 10 15
Asn His Ala Pro Pro Ser Pro Pro Pro Gly Ser Arg Lys Arg Arg Arg
20 25 30
Pro Val Leu Pro Ser Ser Ser Glu Ser Glu Gly Lys Pro Asp Thr Glu 35 40 45 Ser Glu Ser Ser Ser Thr Glu Ser Ser Glu Asp Glu Ala Gly Asp Leu 50 55 60
Arg Gly Gly Arg Arg Arg Ser Pro Arg Glu Leu Gly Gly Arg Tyr Phe 65 70 75 80
Leu Asp Leu Ser Ala Glu Ser Thr Thr Gly Thr Glu Ser Glu Gly Thr 85 90 95
Gly Pro Ser Asp Asp Asp Asp Asp Asp Ala Ser Asp Gly Trp Leu Val
100 105 110
Asp Thr Pro Pro Arg Lys Ser Lys Arg Pro Arg Ile Asn Leu Arg Leu 115 120 125 Thr Ser Ser Pro Asp Arg Arg Ala Gly Val Val Phe Pro Glu Val Trp 130 135 140
Arg Asn Asp Arg Pro Ile Arg Ala Ala Gln Pro Gln Ala Pro Ala Gln 145 150 155 160
Ser Ser Gly Asp Arg Ala Ala Ala Pro Arg Arg Ser Ala Arg Gln Ala 165 170 175
Gln Met Arg Ser Gly Ala Ala Trp Thr Leu Asp Leu His Tyr Ile Arg
180 185 190
Gln Cys Val Asn Gln Leu Phe Arg Ile Leu Arg Ala Ala Pro Asn Pro 195 200 205 Pro Gly Ser Ala Asn Arg Leu Arg His Leu Val Arg Asp Cys Tyr Leu 210 215 220
Met Gly Tyr Cys Arg Thr Arg Leu Gly Pro Arg Thr Trp Gly Arg Leu 225 230 235 240
Leu Gln Ile Ser Gly Gly Thr Trp Asp Val Arg Leu Arg Asn Ala Ile 245 250 255
Arg Glu Val Glu Ala Arg Phe Glu Pro Ala Ala Glu Pro Val Cys Glu
260 265 270
Leu Pro Cys Leu Asn Ala Arg Arg Tyr Gly Pro Glu Cys Asp Val Gly 275 280 285 Asn Leu Glu Thr Asn Gly Gly Ser Thr Ser Asp Asp Glu Ile Ser Asp 290 295 300
Ala Thr Asp Ser Asp Asp Thr Leu Ala Ser His Ser Asp Thr Glu Gly 305 310 315 320 Gly Pro Ser Pro Ala Gly Arg Glu Asn Pro Glu Ser Ala Ser Gly Gly
325 330 335
Ala Ile Ala Ala Arg Leu Glu Cys Glu Phe Gly Thr Phe Asp Trp Thr
340 345 350
Ser Glu Glu Gly Ser Gln Pro Trp Leu Ser Ala Val Val Ala Asp Thr
355 360 365
Ser Ser Ala Glu Arg Ser Gly Leu Pro Ala Pro Gly Ala Cys Arg Ala
370 375 380
Thr Glu Ala Pro Glu Arg Glu Asp Gly Cys Arg Lys Met Arg Phe Pro 385 390 395 400
Ala Ala Cys Pro Tyr Pro Cys Gly His Thr Phe Leu Arg Pro 405 410
(2) INFORMATION FOR SEQ ID NO: 297:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 810 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297:
Met Gly Arg Leu Arg Asn Ala Pro Glu Ser Leu Thr Tyr Met Phe Cys
1 5 10 15
Ala Ala Ile Arg Val Ala Pro Val Thr Thr Gln Ser Arg Thr Ser Leu 20 25 30 Arg Val Cys Thr His Val Leu Phe Pro Asp Pro Ala Leu Pro Val Met 35 40 45
Arg Tyr Ala Ala Asn Gly Asn Ser Arg Ser Gly Arg Pro Val Gly Thr
50 55 60
Ser Lys Ala Ala Thr Ser Arg Asn His Cys Arg Arg Gly Thr Cys Val 65 70 75 80
Thr Ser Ser Cys Cys Cys Glu Ser Ser Arg Met Arg Ala Met Ile Gly
85 90 95
Trp Thr Pro Cys Met Asp Val Lys Phe Lys Asn Ala Ser Ser Leu Asn 100 105 110 Arg Thr Ala Gly Leu Ala Pro Gly Cys Cys Gly Gly Gly Pro Gly Ala 115 120 125
Arg Thr Ser Arg Glu Pro Ser Pro Pro Asp Ala Ala Met Ala Ala Gln 130 135 140 Arg Ala Arg Ala Pro Ala Met Arg Thr Arg Gly Gly Asp Ala Ala Leu
145 150 155 160
Cys Ala Pro Glu Asp Gly Trp Val Lys Val His Pro Thr Pro Gly Thr
165 170 175 Met Leu Phe Arg Glu Ile Leu Leu Gly Gln Met Gly Tyr Thr Glu Gly
180 185 190
Gln Gly Val Tyr Asn Val Val Arg Ser Ser Glu Ala Ala Thr Arg Gln
195 200 205
Leu Gln Ala Ala Ile Phe His Ala Leu Leu Asn Ala Thr Tyr Asp Leu 210 215 220
Glu Glu Asp Trp Arg Arg His Val Val Arg Leu Gln Pro Gln Arg Leu
225 230 235 240
Val Arg Arg Tyr Arg Asn Ala Arg Glu Gly Asp Ile Ala Gly Val Ala
245 250 255 Glu Arg Val Phe Asp Thr Trp Arg Cys Thr Leu Arg Thr Thr Leu Leu
260 265 270
Asp Phe Ala His Gly Val Val Asp Cys Phe Ala Pro Gly Gly Pro Ser
275 280 285
Gly Pro Thr Ser Phe Pro Lys Tyr Ile Asp Trp Leu Thr Cys Leu Gly 290 295 300
Leu Val Pro Ile Leu Arg Lys Thr Arg Glu Gly Glu Ala Thr Gln Arg
305 310 315 320
Leu Gly Ala Phe Leu Arg Gln His Thr Leu Pro Arg Gln Leu Ala Thr
325 330 335 Val Ala Gly Ala Ala Glu Arg Ala Gly Pro Gly Leu Leu Glu Leu Ala
340 345 350
Val Ala Phe Asp Ser Thr Arg Met Ala Glu Tyr Asp Arg Val His Ile
355 360 365
Tyr Tyr Asn His Arg Arg Gly Glu Trp Leu Val Arg Asp Pro Val Ser 370 375 380
Gly Gln Arg Gly Glu Cys Leu Val Leu Cys Pro Pro Leu Trp Thr Gly
385 390 395 400
Asp Arg Leu Val Phe Asp Ser Pro Val Gln Arg Leu Cys Pro Glu Ile
405 410 415 Val Ala Cys His Ala Leu Arg Glu His Ala His Ile Cys Arg Leu Arg
420 425 430
Asn Thr Ala Ser Val Lys Val Leu Leu Gly Arg Lys Ser Asp Ser Gly
435 440 445
Val Ala Gly Ala Ala Arg Val Val Asn Lys Ala Leu Gly Glu Asp Asp 450 455 460
Glu Thr Lys Ala Gly Ser Ala Ala Ser Arg Leu Val Arg Leu Ile Ile 465 470 475 480
Asn Met Lys Gly Met Arg His Val Gly Asp Ile Asn Asp Thr Val Arg 485 490 495
Ala Tyr Leu Asp Glu Ala Gly Gly His Leu Ile Asp Thr Pro Ala Val
500 505 510
Asp His Thr Leu Pro Gly Phe Gly Lys Gly Gly Thr Gly Arg Gly Ser 515 520 525
Ala Ala Gln Asp Pro Gly Ala Arg Pro Gln Gln Leu Arg Gln Ala Phe
530 535 540
Gln Thr Ala Val Val Asn Asn Ile Asn Gly Met Leu Glu Gly Tyr Ile 545 550 555 560 Asn Asn Leu Phe Gly Thr Ile Glu Arg Leu Arg Glu Thr Asn Ala Gly
565 570 575
Leu Ala Thr Gln Leu Gln Ala Arg Asp Arg Glu Leu Arg Arg Ala Gln
580 585 590
Ala Gly Ala Leu Glu Arg Glu Gln Arg Ala Ala Asp Arg Ala Ala Gly 595 600 605
Gly Gly Ala Gly Arg Pro Ala Glu Ala Asp Leu Leu Arg Ala Asp Tyr
610 615 620
Asp Ile Ile Asp Val Ser Lys Ser Met Asp Asp Asp Thr Tyr Val Ala 625 630 635 640 Asn Ser Phe Gln His Gln Tyr Ile Pro Ala Tyr Gly Gln Asp Leu Glu
645 650 655
Arg Leu Ser Arg Leu Trp Glu His Glu Leu Val Arg Cys Phe Lys Ile
660 665 670
Leu Arg His Arg Asn Asn Gln Gly Gln Glu Thr Ser Ile Ser Tyr Ser 675 680 685
Ser Gly Ala Ile Ala Ser Phe Val Ala Pro Tyr Phe Glu Tyr Val Leu
690 695 700
Arg Ala Pro Arg Ala Gly Ala Leu Ile Thr Gly Ser Asp Val Ile Leu 705 710 715 720 Gly Glu Glu Glu Leu Trp Glu Ala Val Phe Lys Lys Thr Arg Leu Gln
725 730 735
Thr Tyr Leu Thr Asp Val Ala Ala Leu Phe Val Ala Asp Val Gln His
740 745 750
Ala Ala Leu Pro Arg Pro Pro Ser Pro Thr Pro Ala Asp Phe Arg Ala 755 760 765
Ser Asp Arg Gly Gly Ser Arg Ser Arg Thr Arg Thr Arg Ser Arg Ser
770 775 780
Pro Gly Arg Thr Pro Arg Gly Ala Pro Asp Gln Gly Trp Gly Val Glu 785 790 795 800 Arg Arg Asp Gly Arg Pro His Ala Arg Arg
805 810
(2) INFORMATION FOR SEQ ID NO: 298: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 813 amino acids
(B) TYPE: amino acid (C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 298:
Met Val Leu Met Gly Arg Leu Arg Asn Ala Pro Glu Ser Leu Thr Tyr
1 5 10 15
Met Phe Cys Ala Ala Ile Arg Val Ala Pro Val Thr Thr Gln Ser Arg 20 25 30
Thr Ser Leu Arg Val Cys Thr His Val Leu Phe Pro Asp Pro Ala Leu
35 40 45
Pro Val Met Arg Tyr Ala Ala Asn Gly Asn Ser Arg Ser Gly Arg Pro 50 55 60 Val Gly Thr Ser Lys Ala Ala Thr Ser Arg Asn His Cys Arg Arg Gly 65 70 75 80
Thr Cys Val Thr Ser Ser Cys Cys Cys Glu Ser Ser Arg Met Arg Ala
85 90 95
Met Ile Gly Trp Thr Pro Cys Met Asp Val Lys Phe Lys Asn Ala Ser 100 105 110
Ser Leu Asn Arg Thr Ala Gly Leu Ala Pro Gly Cys Cys Gly Gly Gly
115 120 125
Pro Gly Ala Arg Thr Ser Arg Glu Pro Ser Pro Pro Asp Ala Ala Met
130 135 140 Ala Ala Gln Arg Ala Arg Ala Pro Ala Met Arg Thr Arg Gly Gly Asp
145 150 155 160
Ala Ala Leu Cys Ala Pro Glu Asp Gly Trp Val Lys Val His Pro Thr
165 170 175
Pro Gly Thr Met Leu Phe Arg Glu Ile Leu Leu Gly Gln Met Gly Tyr 180 185 190
Thr Glu Gly Gln Gly Val Tyr Asn Val Val Arg Ser Ser Glu Ala Ala
195 200 205
Thr Arg Gln Leu Gln Ala Ala Ile Phe His Ala Leu Leu Asn Ala Thr 210 215 220 Tyr Asp Leu Glu Glu Asp Trp Arg Arg His Val Val Arg Leu Gln Pro 225 230 235 240
Gln Arg Leu Val Arg Arg Tyr Arg Asn Ala Arg Glu Gly Asp Ile Ala 245 250 255 Gly Val Ala Glu Arg Val Phe Asp Thr Trp Arg Cys Thr Leu Arg Thr
260 265 270
Thr Leu Leu Asp Phe Ala His Gly Val Val Asp Cys Phe Ala Pro Gly 275 280 285 Gly Pro Ser Gly Pro Thr Ser Phe Pro Lys Tyr Ile Asp Trp Leu Thr 290 295 300
Cys Leu Gly Leu Val Pro Ile Leu Arg Lys Thr Arg Glu Gly Glu Ala 305 310 315 320
Thr Gln Arg Leu Gly Ala Phe Leu Arg Gln His Thr Leu Pro Arg Gln 325 330 335
Leu Ala Thr Val Ala Gly Ala Ala Glu Arg Ala Gly Pro Gly Leu Leu
340 345 350
Glu Leu Ala Val Ala Phe Asp Ser Thr Arg Met Ala Glu Tyr Asp Arg 355 360 365 Val His Ile Tyr Tyr Asn His Arg Arg Gly Glu Trp Leu Val Arg Asp 370 375 380
Pro Val Ser Gly Gln Arg Gly Glu Cys Leu Val Leu Cys Pro Pro Leu 385 390 395 400
Trp Thr Gly Asp Arg Leu Val Phe Asp Ser Pro Val Gln Arg Leu Cys 405 410 415
Pro Glu Ile Val Ala Cys His Ala Leu Arg Glu His Ala His Ile Cys
420 425 430
Arg Leu Arg Asn Thr Ala Ser Val Lys Val Leu Leu Gly Arg Lys Ser 435 440 445 Asp Ser Gly Val Ala Gly Ala Ala Arg Val Val Asn Lys Ala Leu Gly 450 455 460
Glu Asp Asp Glu Thr Lys Ala Gly Ser Ala Ala Ser Arg Leu Val Arg 465 470 475 480
Leu Ile Ile Asn Met Lys Gly Met Arg His Val Gly Asp Ile Asn Asp 485 490 495
Thr Val Arg Ala Tyr Leu Asp Glu Ala Gly Gly His Leu Ile Asp Thr
500 505 510
Pro Ala Val Asp His Thr Leu Pro Gly Phe Gly Lys Gly Gly Thr Gly 515 520 525 Arg Gly Ser Ala Ala Gln Asp Pro Gly Ala Arg Pro Gln Gln Leu Arg 530 535 540
Gln Ala Phe Gln Thr Ala Val Val Asn Asn Ile Asn Gly Met Leu Glu 545 550 555 560
Gly Tyr Ile Asn Asn Leu Phe Gly Thr Ile Glu Arg Leu Arg Glu Thr 565 570 575
Asn Ala Gly Leu Ala Thr Gln Leu Gln Ala Arg Asp Arg Glu Leu Arg
580 585 590
Arg Ala Gln Ala Gly Ala Leu Glu Arg Glu Gln Arg Ala Ala Asp Arg 595 600 605
Ala Ala Gly Gly Gly Ala Gly Arg Pro Ala Glu Ala Asp Leu Leu Arg 610 615 620 Ala Asp Tyr Asp Ile Ile Asp Val Ser Lys Ser Met Asp Asp Asp Thr 625 630 635 640 Tyr Val Ala Asn Ser Phe Gln His Gln Tyr Ile Pro Ala Tyr Gly Gln 645 650 655
Asp Leu Glu Arg Leu Ser Arg Leu Trp Glu His Glu Leu Val Arg Cys 660 665 670
Phe Lys Ile Leu Arg His Arg Asn Asn Gln Gly Gln Glu Thr Ser Ile 675 680 685
Ser Tyr Ser Ser Gly Ala Ile Ala Ser Phe Val Ala Pro Tyr Phe Glu 690 695 700 Tyr Val Leu Arg Ala Pro Arg Ala Gly Ala Leu Ile Thr Gly Ser Asp 705 710 715 720 Val Ile Leu Gly Glu Glu Glu Leu Trp Glu Ala Val Phe Lys Lys Thr 725 730 735
Arg Leu Gln Thr Tyr Leu Thr Asp Val Ala Ala Leu Phe Val Ala Asp 740 745 750
Val Gln His Ala Ala Leu Pro Arg Pro Pro Ser Pro Thr Pro Ala Asp 755 760 765
Phe Arg Ala Ser Asp Arg Gly Gly Ser Arg Ser Arg Thr Arg Thr Arg 770 775 780 Ser Arg Ser Pro Gly Arg Thr Pro Arg Gly Ala Pro Asp Gln Gly Trp 785 790 795 800 Gly Val Glu Arg Arg Asp Gly Arg Pro His Ala Arg Arg 805 810
(2) INFORMATION FOR SEQ ID NO: 299:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 470 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299:
Met Ala Leu Gly Arg Val Gly Leu Ala Val Gly Leu Trp Gly Leu Leu
1 5 10 15
Trp Val Gly Val Val Val Val Leu Ala Asn Asp Gly Arg Thr Ile Thr 20 25 30
Val Gly Pro Arg Gly Asn Asn Ala Ala Pro Ser Asp Arg Asn Ala Ser
35 40 45
Ala Pro Arg Thr Thr Pro Thr Pro Pro Gln Pro Arg Lys Ala Thr Lys 50 55 60
Ser Lys Ala Ser Thr Ala Lys Pro Ala Pro Pro Pro Lys Thr Gly Pro 65 70 75 80
Pro Lys Thr Ser Ser Glu Pro Val Arg Cys Asn Arg His Asp Pro Leu 85 90 95 Ala Arg Tyr Gly Ser Arg Val Gln Ile Arg Cys Arg Phe Pro Asn Ser 100 105 110
Thr Arg Thr Glu Ser Arg Leu Gln Ile Trp Arg Tyr Ala Thr Ala Thr
115 120 125
Asp Ala Glu Ile Gly Thr Ala Pro Ser Leu Glu Glu Val Met Val Asn 130 135 140
Val Ser Ala Pro Pro Gly Gly Gln Leu Val Tyr Asp Ser Ala Pro Asn
145 150 155 160
Arg Thr Asp Pro His Val Ile Trp Ala Glu Gly Ala Gly Pro Gly Asp
165 170 175 Arg Lys Val Val Gly Pro Leu Gly Arg Gln Arg Leu Ile Ile Glu Glu
180 185 190
Leu Thr Leu Glu Thr Gln Gly Met Tyr Tyr Trp Val Trp Gly Arg Thr
195 200 205
Asp Arg Pro Ser Ala Tyr Gly Thr Trp Val Arg Val Arg Val Phe Arg 210 215 220
Pro Pro Ser Leu Thr Ile His Pro His Ala Val Leu Glu Gly Gln Pro
225 230 235 240
Phe Lys Ala Thr Cys Thr Ala Ala Thr Tyr Tyr Pro Gly Asn Arg Ala
245 250 255 Glu Phe Val Trp Phe Glu Asp Gly Arg Arg Val Phe Asp Pro Ala Gln
260 265 270
Ile His Thr Gln Thr Gln Glu Asn Pro Asp Gly Phe Ser Thr Val Ser
275 280 285
Thr Val Thr Ser Ala Ala Val Gly Gly Gln Gly Pro Pro Arg Thr Phe 290 295 300
Thr Cys Gln Leu Thr Trp His Arg Asp Ser Val Ser Phe Ser Arg Arg 305 310 315 320
Asn Ala Ser Gly Thr Ala Ser Val Leu Pro Arg Pro Thr Ile Thr Met 325 330 335 Glu Phe Thr Gly Asp His Ala Val Cys Thr Ala Gly Cys Val Pro Glu 340 345 350
Gly Val Thr Phe Ala Trp Phe Leu Gly Asp Asp Ser Ser Pro Ala Glu 355 360 365 Lys Val Ala Val Ala Ser Gln Thr Ser Cys Gly Arg Pro Gly Thr Ala
370 375 380
Thr Ile Arg Ser Thr Leu Pro Val Ser Tyr Glu Gln Thr Glu Tyr Ile 385 390 395 400
Cys Arg Leu Ala Gly Tyr Pro Asp Gly Ile Pro Val Leu Glu His His
405 410 415
Gly Ser His Gln Pro Pro Pro Arg Asp Pro Thr Glu Arg Gln Val Ile
420 425 430
Arg Ala Val Glu Gly Ala Gly Ile Gly Val Ala Val Leu Val Ala Val
435 440 445
Val Leu Ala Gly Thr Ala Val Val Tyr Leu Thr His Ala Ser Ser Val
450 455 460
Arg Tyr Arg Arg Leu Arg 465 470
(2) INFORMATION FOR SEQ ID NO: 300:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 536 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 300:
Met Gly Ala Gly Val Pro Trp Thr Gly Ile Lys Arg Ala Gly Gly Pro 1 5 10 15 Ile Thr Val Arg Val Leu Gly Trp Glu Val Ala Gln Lys Ala Thr His 20 25 30
Pro Cys Cys Ser Cys Pro Arg Glu Ala Val Val Ser Gly Asn Pro Pro
35 40 45
Arg Cys Ala Gly Arg Ala His Arg Ser Phe Ala Gly Ala Gly Ala Leu 50 55 60
Leu Val Met Ala Leu Gly Arg Val Gly Leu Ala Val Gly Leu Trp Gly 65 70 75 80
Leu Leu Trp Val Gly Val Val Val Val Leu Ala Asn Asp Gly Arg Thr 85 90 95 Ile Thr Val Gly Pro Arg Gly Asn Asn Ala Ala Pro Ser Asp Arg Asn 100 105 110
Ala Ser Ala Pro Arg Thr Thr Pro Thr Pro Pro Gln Pro Arg Lys Ala 115 120 125 Thr Lys Ser Lys Ala Ser Thr Ala Lys Pro Ala Pro Pro Pro Lys Thr
130 135 140
Gly Pro Pro Lys Thr Ser Ser Glu Pro Val Arg Cys Asn Arg His Asp 145 150 155 160 Pro Leu Ala Arg Tyr Gly Ser Arg Val Gln Ile Arg Cys Arg Phe Pro
165 170 175
Asn Ser Thr Arg Thr Glu Ser Arg Leu Gln Ile Trp Arg Tyr Ala Thr
180 185 190
Ala Thr Asp Ala Glu Ile Gly Thr Ala Pro Ser Leu Glu Glu Val Met 195 200 205
Val Asn Val Ser Ala Pro Pro Gly Gly Gln Leu Val Tyr Asp Ser Ala
210 215 220
Pro Asn Arg Thr Asp Pro His Val Ile Trp Ala Glu Gly Ala Gly Pro 225 230 235 240 Gly Asp Arg Lys Val Val Gly Pro Leu Gly Arg Gln Arg Leu Ile Ile
245 250 255
Glu Glu Leu Thr Leu Glu Thr Gln Gly Met Tyr Tyr Trp Val Trp Gly
260 265 270
Arg Thr Asp Arg Pro Ser Ala Tyr Gly Thr Trp Val Arg Val Arg Val 275 280 285
Phe Arg Pro Pro Ser Leu Thr Ile His Pro His Ala Val Leu Glu Gly
290 295 300
Gln Pro Phe Lys Ala Thr Cys Thr Ala Ala Thr Tyr Tyr Pro Gly Asn 305 310 315 320 Arg Ala Glu Phe Val Trp Phe Glu Asp Gly Arg Arg Val Phe Asp Pro
325 330 335
Ala Gln Ile His Thr Gln Thr Gln Glu Asn Pro Asp Gly Phe Ser Thr
340 345 350
Val Ser Thr Val Thr Ser Ala Ala Val Gly Gly Gln Gly Pro Pro Arg 355 360 365
Thr Phe Thr Cys Gln Leu Thr Trp His Arg Asp Ser Val Ser Phe Ser
370 375 380
Arg Arg Asn Ala Ser Gly Thr Ala Ser Val Leu Pro Arg Pro Thr Ile 385 390 395 400 Thr Met Glu Phe Thr Gly Asp His Ala Val Cys Thr Ala Gly Cys Val
405 410 415
Pro Glu Gly Val Thr Phe Ala Trp Phe Leu Gly Asp Asp Ser Ser Pro
420 425 430
Ala Glu Lys Val Ala Val Ala Ser Gln Thr Ser Cys Gly Arg Pro Gly 435 440 445
Thr Ala Thr Ile Arg Ser Thr Leu Pro Val Ser Tyr Glu Gln Thr Glu
450 455 460
Tyr Ile Cys Arg Leu Ala Gly Tyr Pro Asp Gly Ile Pro Val Leu Glu 465 470 475 480
His His Gly Ser His Gln Pro Pro Pro Arg Asp Pro Thr Glu Arg Gln
485 490 495
Val Ile Arg Ala Val Glu Gly Ala Gly Ile Gly Val Ala Val Leu Val
500 505 510
Ala Val Val Leu Ala Gly Thr Ala Val Val Tyr Leu Thr His Ala Ser
515 520 525
Ser Val Arg Tyr Arg Arg Leu Arg 530 535
(2) INFORMATION FOR SEQ ID NO: 301:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 545 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301:
Met Ser Val Leu Gly Asp Ala Arg His Pro Arg Arg Phe Pro Ser Arg 1 5 10 15 Gly Pro Arg Pro Phe Ser Val Ala Gly Pro Gly Ser Leu Pro Pro Ser 20 25 30
Pro Pro Pro Gly Ala Arg Ala Arg Leu Ile Arg Leu Ser Arg Ser Leu
35 40 45
Phe Pro Asp Pro Thr Ala Pro Met Asp Leu Leu Val Asp Asp Leu Phe 50 55 60
Ala Asp Ala Asp Gly Val Ser Pro Pro Pro Pro Arg Pro Ala Gly Gly
65 70 75 80
Pro Lys Asn Thr Pro Ala Ala Pro Pro Leu Tyr Ala Thr Gly Arg Leu
85 90 95 Ser Gln Ala Gln Leu Met Pro Ser Pro Pro Met Pro Val Pro Pro Ala
100 105 110
Ala Leu Phe Asn Arg Leu Leu Asp Asp Leu Gly Phe Ser Ala Gly Pro
115 120 125
Ala Leu Cys Thr Met Leu Asp Thr Trp Asn Glu Asp Leu Phe Ser Gly 130 135 140
Phe Pro Thr Asn Ala Asp Met Tyr Arg Glu Cys Lys Phe Leu Ser Thr 145 150 155 160
Leu Pro Ser Asp Val Ile Asp Trp Gly Asp Ala His Val Pro Glu Arg 165 170 175
Ser Pro Ile Asp Ile Arg Ala His Gly Asp Val Ala Phe Pro Thr Leu
180 185 190
Pro Ala Thr Arg Asp Glu Leu Pro Ser Tyr Tyr Glu Ala Met Ala Gln 195 200 205
Phe Phe Arg Gly Glu Leu Arg Ala Arg Glu Glu Ser Tyr Arg Thr Val
210 215 220
Leu Ala Asn Phe Cys Ser Ala Leu Tyr Arg Tyr Leu Arg Ala Ser Val 225 230 235 240 Arg Gln Leu His Arg Gln Ala His Met Arg Gly Arg Asn Arg Asp Leu
245 250 255
Arg Glu Met Leu Arg Thr Thr Ile Ala Asp Arg Tyr Tyr Arg Glu Thr
260 265 270
Ala Arg Leu Ala Arg Val Leu Phe Leu His Leu Tyr Leu Phe Leu Ser 275 280 285
Arg Glu Ile Leu Trp Ala Ala Tyr Ala Glu Gln Met Met Arg Pro Asp
290 295 300
Leu Phe Asp Gly Leu Cys Cys Asp Leu Glu Ser Trp Arg Gln Leu Ala 305 310 315 320 Cys Leu Phe Gln Pro Leu Met Phe Ile Asn Gly Ser Leu Thr Val Arg
325 330 335
Gly Val Pro Val Glu Ala Arg Arg Leu Arg Glu Leu Asn His Ile Arg
340 345 350
Glu His Leu Asn Leu Pro Leu Val Arg Ser Ala Ala Ala Glu Glu Pro 355 360 365
Gly Ala Pro Leu Thr Thr Pro Pro Val Leu Gln Gly Asn Gln Ala Arg
370 375 380
Ser Ser Gly Tyr Phe Met Leu Leu Ile Arg Ala Lys Leu Asp Ser Tyr 385 390 395 400 Ser Ser Val Ala Thr Ser Glu Gly Glu Ser Val Met Arg Glu His Ala
405 410 415
Tyr Ser Arg Gly Arg Thr Arg Asn Asn Tyr Gly Ser Thr Ile Glu Gly
420 425 430
Leu Leu Asp Leu Pro Asp Asp Asp Asp Ala Pro Ala Glu Ala Gly Leu 435 440 445
Val Ala Pro Arg Met Ser Phe Leu Ser Ala Gly Gln Arg Pro Arg Arg
450 455 460
Leu Ser Thr Thr Ala Pro Ile Thr Asp Val Ser Leu Gly Asp Glu Leu 465 470 475 480 Arg Leu Asp Gly Glu Glu Val Asp Met Thr Pro Ala Asp Ala Leu Asp
485 490 495
Asp Phe Asp Leu Glu Met Leu Gly Asp Val Glu Ser Pro Ser Pro Gly 500 505 510 Met Thr His Asp Pro Val Ser Tyr Gly Ala Leu Asp Val Asp Asp Phe
515 520 525
Glu Phe Glu Gln Met Phe Thr Asp Ala Met Gly Ile Asp Asp Phe Gly 530 535 540 Gly 545
(2) INFORMATION FOR SEQ ID NO: 302:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 490 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302:
Met Asp Leu Leu Val Asp Asp Leu Phe Ala Asp Ala Asp Gly Val Ser 1 5 10 15
Pro Pro Pro Pro Arg Pro Ala Gly Gly Pro Lys Asn Thr Pro Ala Ala
20 25 30
Pro Pro Leu Tyr Ala Thr Gly Arg Leu Ser Gln Ala Gln Leu Met Pro 35 40 45
Ser Pro Pro Met Pro Val Pro Pro Ala Ala Leu Phe Asn Arg Leu Leu
50 55 60
Asp Asp Leu Gly Phe Ser Ala Gly Pro Ala Leu Cys Thr Met Leu Asp 65 70 75 80 Thr Trp Asn Glu Asp Leu Phe Ser Gly Phe Pro Thr Asn Ala Asp Met
85 90 95
Tyr Arg Glu Cys Lys Phe Leu Ser Thr Leu Pro Ser Asp Val Ile Asp
100 105 110
Trp Gly Asp Ala His Val Pro Glu Arg Ser Pro Ile Asp Ile Arg Ala 115 120 125
His Gly Asp Val Ala Phe Pro Thr Leu Pro Ala Thr Arg Asp Glu Leu
130 135 140
Pro Ser Tyr Tyr Glu Ala Met Ala Gln Phe Phe Arg Gly Glu Leu Arg 145 150 155 160 Ala Arg Glu Glu Ser Tyr Arg Thr Val Leu Ala Asn Phe Cys Ser Ala
165 170 175
Leu Tyr Arg Tyr Leu Arg Ala Ser Val Arg Gln Leu His Arg Gln Ala 180 185 190 His Met Arg Gly Arg Asn Arg Asp Leu Arg Glu Met Leu Arg Thr Thr
195 200 205
Ile Ala Asp Arg Tyr Tyr Arg Glu Thr Ala Arg Leu Ala Arg Val Leu
210 215 220 Phe Leu His Leu Tyr Leu Phe Leu Ser Arg Glu Ile Leu Trp Ala Ala
225 230 235 240
Tyr Ala Glu Gln Met Met Arg Pro Asp Leu Phe Asp Gly Leu Cys Cys
245 250 255
Asp Leu Glu Ser Trp Arg Gln Leu Ala Cys Leu Phe Gln Pro Leu Met 260 265 270
Phe Ile Asn Gly Ser Leu Thr Val Arg Gly Val Pro Val Glu Ala Arg
275 280 285
Arg Leu Arg Glu Leu Asn His Ile Arg Glu His Leu Asn Leu Pro Leu
290 295 300 Val Arg Ser Ala Ala Ala Glu Glu Pro Gly Ala Pro Leu Thr Thr Pro
305 310 315 320
Pro Val Leu Gln Gly Asn Gln Ala Arg Ser Ser Gly Tyr Phe Met Leu
325 330 335
Leu Ile Arg Ala Lys Leu Asp Ser Tyr Ser Ser Val Ala Thr Ser Glu 340 345 350
Gly Glu Ser Val Met Arg Glu His Ala Tyr Ser Arg Gly Arg Thr Arg
355 360 365
Asn Asn Tyr Gly Ser Thr Ile Glu Gly Leu Leu Asp Leu Pro Asp Asp
370 375 380 Asp Asp Ala Pro Ala Glu Ala Gly Leu Val Ala Pro Arg Met Ser Phe
385 390 395 400
Leu Ser Ala Gly Gln Arg Pro Arg Arg Leu Ser Thr Thr Ala Pro Ile
405 410 415
Thr Asp Val Ser Leu Gly Asp Glu Leu Arg Leu Asp Gly Glu Glu Val 420 425 430
Asp Met Thr Pro Ala Asp Ala Leu Asp Asp Phe Asp Leu Glu Met Leu
435 440 445
Gly Asp Val Glu Ser Pro Ser Pro Gly Met Thr His Asp Pro Val Ser 450 455 460 Tyr Gly Ala Leu Asp Val Asp Asp Phe Glu Phe Glu Gln Met Phe Thr 465 470 475 480
Asp Ala Met Gly Ile Asp Asp Phe Gly Gly 485 490
(2) INFORMATION FOR SEQ ID NO: 303:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 552 amino acids (B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303:
Met Arg Gly Gly Gly Arg Glu Met Ser Val Leu Gly Asp Ala Arg His 1 5 10 15
Pro Arg Arg Phe Pro Ser Arg Gly Pro Arg Pro Phe Ser Val Ala Gly
20 25 30
Pro Gly Ser Leu Pro Pro Ser Pro Pro Pro Gly Ala Arg Ala Arg Leu 35 40 45 Ile Arg Leu Ser Arg Ser Leu Phe Pro Asp Pro Thr Ala Pro Met Asp 50 55 60
Leu Leu Val Asp Asp Leu Phe Ala Asp Ala Asp Gly Val Ser Pro Pro 65 70 75 80
Pro Pro Arg Pro Ala Gly Gly Pro Lys Asn Thr Pro Ala Ala Pro Pro 85 90 95
Leu Tyr Ala Thr Gly Arg Leu Ser Gln Ala Gln Leu Met Pro Ser Pro
100 105 110
Pro Met Pro Val Pro Pro Ala Ala Leu Phe Asn Arg Leu Leu Asp Asp 115 120 125 Leu Gly Phe Ser Ala Gly Pro Ala Leu Cys Thr Met Leu Asp Thr Trp 130 135 140
Asn Glu Asp Leu Phe Ser Gly Phe Pro Thr Asn Ala Asp Met Tyr Arg 145 150 155 160
Glu Cys Lys Phe Leu Ser Thr Leu Pro Ser Asp Val Ile Asp Trp Gly 165 170 175
Asp Ala His Val Pro Glu Arg Ser Pro Ile Asp Ile Arg Ala His Gly
180 185 190
Asp Val Ala Phe Pro Thr Leu Pro Ala Thr Arg Asp Glu Leu Pro Ser 195 200 205 Tyr Tyr Glu Ala Met Ala Gln Phe Phe Arg Gly Glu Leu Arg Ala Arg 210 215 220
Glu Glu Ser Tyr Arg Thr Val Leu Ala Asn Phe Cys Ser Ala Leu Tyr 225 230 235 240
Arg Tyr Leu Arg Ala Ser Val Arg Gln Leu His Arg Gln Ala His Met 245 250 255
Arg Gly Arg Asn Arg Asp Leu Arg Glu Met Leu Arg Thr Thr Ile Ala
260 265 270
Asp Arg Tyr Tyr Arg Glu Thr Ala Arg Leu Ala Arg Val Leu Phe Leu 275 280 285
His Leu Tyr Leu Phe Leu Ser Arg Glu Ile Leu Trp Ala Ala Tyr Ala
290 295 300
Glu Gln Met Met Arg Pro Asp Leu Phe Asp Gly Leu Cys Cys Asp Leu 305 310 315 320
Glu Ser Trp Arg Gln Leu Ala Cys Leu Phe Gln Pro Leu Met Phe Ile
325 330 335
Asn Gly Ser Leu Thr Val Arg Gly Val Pro Val Glu Ala Arg Arg Leu 340 345 350 Arg Glu Leu Asn His Ile Arg Glu His Leu Asn Leu Pro Leu Val Arg 355 360 365
Ser Ala Ala Ala Glu Glu Pro Gly Ala Pro Leu Thr Thr Pro Pro Val
370 375 380
Leu Gln Gly Asn Gln Ala Arg Ser Ser Gly Tyr Phe Met Leu Leu Ile 385 390 395 400
Arg Ala Lys Leu Asp Ser Tyr Ser Ser Val Ala Thr Ser Glu Gly Glu
405 410 415
Ser Val Met Arg Glu His Ala Tyr Ser Arg Gly Arg Thr Arg Asn Asn 420 425 430 Tyr Gly Ser Thr Ile Glu Gly Leu Leu Asp Leu Pro Asp Asp Asp Asp 435 440 445
Ala Pro Ala Glu Ala Gly Leu Val Ala Pro Arg Met Ser Phe Leu Ser
450 455 460
Ala Gly Gln Arg Pro Arg Arg Leu Ser Thr Thr Ala Pro Ile Thr Asp 465 470 475 480
Val Ser Leu Gly Asp Glu Leu Arg Leu Asp Gly Glu Glu Val Asp Met
485 490 495
Thr Pro Ala Asp Ala Leu Asp Asp Phe Asp Leu Glu Met Leu Gly Asp 500 505 510 Val Glu Ser Pro Ser Pro Gly Met Thr His Asp Pro Val Ser Tyr Gly 515 520 525
Ala Leu Asp Val Asp Asp Phe Glu Phe Glu Gln Met Phe Thr Asp Ala
530 535 540
Met Gly Ile Asp Asp Phe Gly Gly 545 550

Claims

What is claimed is:
1. An isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of:
(a) a polynucleotide having at least a 70% identity to a polynucleotide encoding a polypeptide comprising an amino acid sequence of Table 1, 2, 3 or 4;
(b) a polynucleotide having at least a 70% identity to a polynucleotide encoding a mature polypeptide expressed by the gene contained in the HSV-2 of deposited strain VR-2546 that was sequenced to obtain a polynucleotide sequence of Table 1, 2 or 3;
(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence which is at least 70% identical to an amino acid sequence of Table 1, 2, 3 or 4;
(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or (c); and
(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide of (a), (b), (c) or (d).
2. The polynucleotide of Claim 1 wherein the polynucleotide is DNA.
3. The polynucleotide of Claim 1 wherein the polynucleotide is RNA.
4. The polynucleotide of Claim 2 comprising the nucleic acid sequence selected from the group consisting of the nucleic acid sequences set forth in Table 1, 2 and 3.
5. The polynucleotide of Claim 2 which encodes a polypeptide comprising an amino acid sequence selected from the group consisting of the amino acid sequences set forth in Table 1, 2, 3 and 4.
6. A vector comprising the polynucleotide of Claim 1.
7. A host cell comprising the vector of Claim 6.
8. A process for producing a polypeptide comprising expressing in the host cell of Claim 7 a polypeptide encoded by said polynucleotide.
9. A process for producing a polypeptide or fragment thereof comprising culturing a host cell of Claim 7 under conditions sufficient for the production of said polypeptide or fragment.
10. A polypeptide comprising an amino acid sequence which is at least 70% identical to an amino acid sequence selected from the group consisting of the amino acid sequences or fragments thereof set forth in Table 1, 2, 3 and 4.
11. A polypeptide comprising an amino acid sequence selected from the group consisting of the amino acid sequences or fragments thereof set forth in Table 1, 2, 3, and 4.
12. An antibody generated against the polypeptide of claim 10.
13. An antagonist or agonist of the activity or expression of the polypeptide of claim 10.
14. A method for the treatment or prevention of disease of an individual comprising administering to the individual a therapeutically effective amount of the polypeptide of claim 10.
15. A method for the treatment of an individual having medical need to inhibit a viral polypeptide comprising administering to the individual a therapeutically effective amount of the antagonist of Claim 13.
16. A process for diagnosing a disease related to expression or activity of the polypeptide of claim 10 in an individual comprising:
(a) determining a nucleic acid sequence encoding said polypeptide, and/or
(b) analyzing for the presence or amount of said polypeptide in a sample derived from the individual.
17. A method for identifying compounds which inhibit or activate the polypeptide of claim 10 comprising:
(a) contacting a composition comprising the polypeptide with the compound to be screened under conditions to permit interaction between the compound and the polypeptide to assess the interaction of a compound, such interaction being associated with a second component capable of providing a detectable signal in response to the interaction of the polypeptide with the compound; and
(b) determining whether the compound activates or inhibits the polypeptide by detecting the presence or absence of the signal generated from the interaction of the compound with the polypeptide.
18. A method for inducing an immunological response in a mammal which comprises inoculating the mammal with the polypeptide of Claim 10, or a variant thereof, adequate to produce antibody and/or T cell immune response to protect said animal from disease.
19. A method of inducing immunological response in a mammal which comprises delivering a nucleic acid vector to direct expression of a polypeptide of Claim 10, or a variant thereof, for expressing said polypeptide in vivo in order to induce an immunological response to produce antibody and/or T cell immune response to protect said animal from disease.
20. The isolated polynucleotide of claim 1 wherein said nucleotide is selected from the group consisting of:
(a) a polynucleotide having at least a 90% identity to a polynucleotide encoding a polypeptide comprising the amino acid sequence of Table 1, 2, 3 or 4;
(b) a polynucleotide having at least a 90% identity to a polynucleotide encoding the same mature polypeptide expressed by the gene contained in the HSV-2 of the deposited strain VR-2546 that was sequenced to obtain a polynucleotide sequence of Table 1, 2 or 3;
(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence which is at least 90% identical to the amino acid sequence of Table 1, 2, 3 or 4;
(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or (c); and
(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide of (a), (b), (c) or (d).
21. The isolated polynucleotide of Claim 1 selected from the group consisting of:
(a) a polynucleotide having at least a 95% identity to a polynucleotide encoding a polypeptide comprising the amino acid sequence of Table 1, 2, 3 or 4;
(b) a polynucleotide having at least a 95% identity to a polynucleotide encoding the same mature polypeptide expressed by the gene contained in the HSV-2 of the deposited strain VR-2546 that was sequenced to obtain a polynucleotide sequence of Table 1, 2 or 3;
(c) a polynucleotide encoding a polypeptide comprising an amino acid sequence which is at least 95% identical to the amino acid sequence of Table 1, 2, 3 or 4;
(d) a polynucleotide which is complementary to the polynucleotide of (a), (b) or (c); and
(e) a polynucleotide comprising at least 15 sequential bases of the polynucleotide of (a), (b), (c) or (d).
22. An isolated polynucleotide comprising a polynucleotide sequence selected from the group consisting of:
(a) a polynucleotide having at least a 50% identity to a polynucleotide encoding a polypeptide comprising the amino acid sequence of Table 1, 2, 3 or 4 and obtained from a prokaryotic species other than HSV-2;
(b) a polynucleotide encoding a polypeptide comprising an amino acid sequence which is at least 50% identical to the amino acid sequence of Table 1, 2, 3 or 4 and obtained from a prokaryotic species other than HSV-2; and
(c) a polynucleotide which is complementary to the polynucleotide of (a) or (b).
23. An isolated polypeptide having one of the amino acid sequences given in Table 1, 2, 3 or 4.
24. An isolated nucleic acid encoding one of the amino acid sequences of Claim 1 and nucleic acid sequences capable of hybridizing therewith under stringent conditions.
25. A recombinant vector comprising the nucleic acid sequences of Claim 24 and host cells transformed or transfected therewith.
26. A method of identifying an antiviral compound comprising contacting candidate compounds with a polypeptide of Claim 10 and selecting those compounds capable of inhibiting the bioactivity of said polypeptide.
27. Antiviral compounds identified by the method of Claim 26.
28. An isolated polypeptide having an amino acid sequence or fragment thereof given in Table 1, 2, 3 or 4.
29. An isolated nucleic acid encoding one of the amino acid sequences of Claim 28 and nucleic acid sequences capable of hybridizing therewith under stringent conditions.
30. A method of identifying an antiviral compound comprising contacting candidate compounds with a polypeptide of Claim 28 and selecting those compounds capable of inhibiting the bioactivity of said polypeptide.
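Claims 1 and 20 to 22 rest on percent-identity thresholds (70%, 90%, 95% and 50%), and claims 1, 20 and 21 also cover fragments of at least 15 sequential bases, but the claims themselves do not say how identity is to be computed. The following is a minimal Python sketch under stated assumptions: the two sequences have already been aligned to equal length with `-` marking gaps, and identity is counted over every alignment column. The function names are hypothetical, and any alignment program or scoring rule the applicants may have specified elsewhere in the description is not reproduced here. The example uses the last eight residues of the listing above, Met Gly Ile Asp Asp Phe Gly Gly (MGIDDFGG in one-letter code), against a hypothetical single-substitution variant.

```python
"""Sketch: percent identity over a pre-computed alignment, plus a check for
a shared run of at least k sequential bases.

Assumptions (not taken from the patent): inputs are already aligned to equal
length, '-' marks a gap, and identity is counted over all alignment columns.
"""


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only when both residues match and
    # neither side is a gap.
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if identity is at least `threshold` percent (e.g. 70, 90 or 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


def shares_contiguous_run(candidate: str, reference: str, k: int = 15) -> bool:
    """True if `candidate` contains at least `k` sequential bases of `reference`."""
    candidate, reference = candidate.upper(), reference.upper()
    if len(candidate) < k or len(reference) < k:
        return False
    # Index every length-k window of the reference, then scan the candidate's
    # windows for any exact hit.
    ref_windows = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(
        candidate[i:i + k] in ref_windows
        for i in range(len(candidate) - k + 1)
    )


if __name__ == "__main__":
    listed = "MGIDDFGG"   # residues 545-552 from the listing above
    variant = "MGIEDFGG"  # hypothetical Asp->Glu substitution at position 548
    print(percent_identity(listed, variant))              # 87.5
    print(meets_identity_threshold(listed, variant, 70))  # True
    print(meets_identity_threshold(listed, variant, 90))  # False
```

Counting over all alignment columns penalizes gaps; other conventions, such as dividing by the length of the reference or of the shorter sequence, give different percentages for the same alignment, so a real comparison should follow whatever definition governs the claims.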

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP97946877A EP0948508A4 (en) 1996-11-04 1997-10-31 Novel coding sequences from herpes simplex virus type-2
JP52166998A JP2001508649A (en) 1996-11-04 1997-10-31 Novel coding sequence from herpes simplex virus type 2
CA002270282A CA2270282A1 (en) 1996-11-04 1997-10-31 Novel coding sequences from herpes simplex virus type-2

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US3027996P 1996-11-04 1996-11-04
US60/030,279 1996-11-04
US4901897P 1997-06-09 1997-06-09
US60/049,018 1997-06-09

Publications (1)

Publication Number Publication Date
WO1998020016A1 (en) 1998-05-14

Family

ID=26705857

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/020016 WO1998020016A1 (en) 1996-11-04 1997-10-31 Novel coding sequences from herpes simplex virus type-2

Country Status (4)

Country Link
EP (1) EP0948508A4 (en)
JP (1) JP2001508649A (en)
CA (1) CA2270282A1 (en)
WO (1) WO1998020016A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6413518B1 (en) * 1999-09-30 2002-07-02 University Of Washington Immunologically significant herpes simplex virus antigens and methods for identifying and using same
US6537555B2 (en) * 2000-06-29 2003-03-25 Corixa Corporation Compositions and methods for the diagnosis and treatment of herpes simplex virus infection
US6821519B2 (en) * 2000-06-29 2004-11-23 Corixa Corporation Compositions and methods for the diagnosis and treatment of herpes simplex virus infection
EP1523582A2 (en) * 2002-07-18 2005-04-20 University Of Washington Rapid, efiicient purification of hsv-specific t-lymphocytes and hsv antigens identified via same
US7037509B2 (en) 2001-07-31 2006-05-02 University Of Washington Immunologically significant herpes simplex virus antigens and methods for using same
WO2007049731A1 (en) * 2005-10-28 2007-05-03 Mitsubishi Tanabe Pharma Corporation Novel cell membrane-permeable peptide
US7744903B2 (en) 1998-08-07 2010-06-29 University Of Washington Immunological herpes simplex virus antigens and methods for use thereof
WO2012074881A3 (en) * 2010-11-24 2012-09-13 Genocea Biosciences, Inc. Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US8460674B2 (en) 2009-02-07 2013-06-11 University Of Washington HSV-1 epitopes and methods for using same
US8617564B2 (en) 2009-05-22 2013-12-31 Genocea Biosciences, Inc. Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US20140127247A1 (en) * 2012-05-16 2014-05-08 Immune Design Corp. Vaccines for hsv-2
US9044447B2 (en) 2009-04-03 2015-06-02 University Of Washington Antigenic peptide of HSV-2 and methods for using same
US9624273B2 (en) 2011-11-23 2017-04-18 Genocea Biosciences, Inc. Nucleic acid vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US20180303929A1 (en) * 2015-10-22 2018-10-25 Moderna TX, Inc. Herpes simplex virus vaccine
US10350288B2 (en) 2016-09-28 2019-07-16 Genocea Biosciences, Inc. Methods and compositions for treating herpes
WO2021015987A1 (en) * 2019-07-19 2021-01-28 Merck Sharp & Dohme Corp. Antigenic glycoprotein e polypeptides, compositions, and methods of use thereof
CN113150071A (en) * 2021-04-25 2021-07-23 上海健康医学院 Marine brevibacillus brevis antitumor active polypeptide and medicine and application thereof
EP3641810A4 (en) * 2017-04-26 2021-08-18 Modernatx, Inc. Herpes simplex virus vaccine
US11752206B2 (en) 2017-03-15 2023-09-12 Modernatx, Inc. Herpes simplex virus vaccine
US12070495B2 (en) 2019-03-15 2024-08-27 Modernatx, Inc. HIV RNA vaccines

Citations (2)

Publication number Priority date Publication date Assignee Title
WO1995006055A1 (en) * 1993-08-20 1995-03-02 Smithkline Beecham Corporation Hsv-2 ul26 gene, capsid proteins, immunoassays and protease inhibitors
WO1995016779A1 (en) * 1993-12-14 1995-06-22 Smithkline Beecham Biologicals (S.A.) Herpes-symplex-virus type 2 icp4 protein and its use in a vaccine composition

Non-Patent Citations (2)

Title
JOURNAL OF GENERAL VIROLOGY, 1995, Vol. 76, STEFFY et al., "Nucleotide Sequence of Herpes Simplex Virus Type 2 Gene Encoding the Protease and Capsid Protein ICP35", pages 1069-1072. *
See also references of EP0948508A4 *

Cited By (48)

Publication number Priority date Publication date Assignee Title
US8852602B2 (en) 1998-08-07 2014-10-07 David M. Koelle Immunological herpes simplex virus antigens and methods for use thereof
US7744903B2 (en) 1998-08-07 2010-06-29 University Of Washington Immunological herpes simplex virus antigens and methods for use thereof
US8067010B2 (en) 1998-08-07 2011-11-29 University Of Washington Immunological herpes simplex virus antigens and methods for use thereof
EP1102790B1 (en) * 1998-08-07 2014-05-07 University of Washington Immunological Herpes Simplex Virus antigens and methods for use thereof
EP2272859A3 (en) * 1998-08-07 2011-06-08 University of Washington Immunological herpes simplex virus antigens and methods for use thereof
US6962709B2 (en) 1999-09-30 2005-11-08 University Of Washington, Fred Hutchinson Cancer Research Center Immunologically significant herpes simplex virus antigens and methods for identifying and using same
US6413518B1 (en) * 1999-09-30 2002-07-02 University Of Washington Immunologically significant herpes simplex virus antigens and methods for identifying and using same
AU2005234654B2 (en) * 1999-09-30 2008-04-24 Corixa Corporation Immunologically significant herpes simplex virus antigens and methods for identifying and using same
US6821519B2 (en) * 2000-06-29 2004-11-23 Corixa Corporation Compositions and methods for the diagnosis and treatment of herpes simplex virus infection
US6537555B2 (en) * 2000-06-29 2003-03-25 Corixa Corporation Compositions and methods for the diagnosis and treatment of herpes simplex virus infection
US7037509B2 (en) 2001-07-31 2006-05-02 University Of Washington Immunologically significant herpes simplex virus antigens and methods for using same
US7078041B2 (en) 2002-07-18 2006-07-18 University Of Washington Rapid, efficient purification of HSV-specific T-lymphocytes and HSV antigens identified via same
US7666434B2 (en) 2002-07-18 2010-02-23 University Of Washington Rapid, efficient purification of HSV-specific T-lymphocytes and HSV antigens identified via same
US7431934B2 (en) 2002-07-18 2008-10-07 University Of Washington Rapid, efficient purification of HSV-specific T-lymphocytes and HSV antigens identified via same
EP1523582A4 (en) * 2002-07-18 2007-02-07 Univ Washington Rapid, efficient purification of hsv-specific t-lymphocytes and hsv antigens identified via same
US9675688B2 (en) 2002-07-18 2017-06-13 University Of Washington Rapid, efficient purification of HSV-specific T-lymphocytes and HSV antigens identified via same
EP2316479A3 (en) * 2002-07-18 2011-07-06 University of Washington Pharmaceutical compositions comprising immunologically active herpes simplex virus (HSV) protein fragments
US8197824B2 (en) 2002-07-18 2012-06-12 University Of Washington Rapid, efficient purification of HSV-specific T-lymphocytes and HSV antigens identified via same
US9138473B2 (en) 2002-07-18 2015-09-22 University Of Washington Rapid, efficient purification of HSV-specific T-lymphocytes and HSV antigens identified via same
EP2865386A1 (en) * 2002-07-18 2015-04-29 University of Washington Pharmaceutical compositions comprising immunologically active herpes simplex virus (HSV) protein fragments
EP1523582A2 (en) * 2002-07-18 2005-04-20 University Of Washington Rapid, efiicient purification of hsv-specific t-lymphocytes and hsv antigens identified via same
WO2007049731A1 (en) * 2005-10-28 2007-05-03 Mitsubishi Tanabe Pharma Corporation Novel cell membrane-permeable peptide
US8691528B2 (en) 2005-10-28 2014-04-08 Mitsubishi Tanabe Pharma Corporation Cell penetrating peptide
US8044019B2 (en) 2005-10-28 2011-10-25 Mitsubishi Tanabe Pharma Corporation Cell penetrating peptide
US8460674B2 (en) 2009-02-07 2013-06-11 University Of Washington HSV-1 epitopes and methods for using same
US9328144B2 (en) 2009-02-07 2016-05-03 University Of Washington HSV-1 epitopes and methods for using same
US9044447B2 (en) 2009-04-03 2015-06-02 University Of Washington Antigenic peptide of HSV-2 and methods for using same
US9579376B2 (en) 2009-04-03 2017-02-28 University Of Washington Antigenic peptide of HSV-2 and methods for using same
US9895436B2 (en) 2009-05-22 2018-02-20 Genocea Biosciences, Inc. Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US8617564B2 (en) 2009-05-22 2013-12-31 Genocea Biosciences, Inc. Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US10653771B2 (en) 2009-05-22 2020-05-19 Genocea Biosciences, Inc. Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US9782474B2 (en) 2010-11-24 2017-10-10 Genocea Biosciences, Inc. Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
WO2012074881A3 (en) * 2010-11-24 2012-09-13 Genocea Biosciences, Inc. Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
EP2643014A4 (en) * 2010-11-24 2015-11-11 Genocea Biosciences Inc Vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US9624273B2 (en) 2011-11-23 2017-04-18 Genocea Biosciences, Inc. Nucleic acid vaccines against herpes simplex virus type 2: compositions and methods for eliciting an immune response
US9895435B2 (en) 2012-05-16 2018-02-20 Immune Design Corp. Vaccines for HSV-2
US9555099B2 (en) 2012-05-16 2017-01-31 Immune Design Corp. Vaccines for HSV-2
US20140127247A1 (en) * 2012-05-16 2014-05-08 Immune Design Corp. Vaccines for hsv-2
US10391164B2 (en) 2012-05-16 2019-08-27 Immune Design Corp. Vaccines for HSV-2
US20180303929A1 (en) * 2015-10-22 2018-10-25 Moderna TX, Inc. Herpes simplex virus vaccine
US10350288B2 (en) 2016-09-28 2019-07-16 Genocea Biosciences, Inc. Methods and compositions for treating herpes
US11752206B2 (en) 2017-03-15 2023-09-12 Modernatx, Inc. Herpes simplex virus vaccine
EP3641810A4 (en) * 2017-04-26 2021-08-18 Modernatx, Inc. Herpes simplex virus vaccine
US12070495B2 (en) 2019-03-15 2024-08-27 Modernatx, Inc. HIV RNA vaccines
WO2021015987A1 (en) * 2019-07-19 2021-01-28 Merck Sharp & Dohme Corp. Antigenic glycoprotein e polypeptides, compositions, and methods of use thereof
EP3999093A4 (en) * 2019-07-19 2023-11-22 Merck Sharp & Dohme LLC Antigenic glycoprotein e polypeptides, compositions, and methods of use thereof
CN113150071A (en) * 2021-04-25 2021-07-23 上海健康医学院 Marine brevibacillus brevis antitumor active polypeptide and medicine and application thereof
CN113150071B (en) * 2021-04-25 2023-02-17 上海健康医学院 Marine brevibacillus brevis antitumor active polypeptide and medicine and application thereof

Also Published As

Publication number Publication date
EP0948508A4 (en) 2001-11-07
EP0948508A1 (en) 1999-10-13
CA2270282A1 (en) 1998-05-14
JP2001508649A (en) 2001-07-03

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2270282

Country of ref document: CA

Ref country code: CA

Ref document number: 2270282

Kind code of ref document: A

Format of ref document f/p: F

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1998 521669

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1997946877

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 09297477

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1997946877

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1997946877

Country of ref document: EP