AU3335101A - Streptococcus pneumoniae polynucleotides and sequences - Google Patents

Streptococcus pneumoniae polynucleotides and sequences


Publication number
AU3335101A
Authority
AU
Australia
Prior art keywords
protein
pneumoniae
sequence
fragments
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU33351/01A
Other versions
AU777190B2 (en)
Inventor
Steven C. Barash
Gil H. Choi
Patrick J. Dillon
Brian A. Dougherty
Michael Fannon
Charles A. Kunsch
Craig A Rosen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Human Genome Sciences Inc
Original Assignee
Human Genome Sciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU69090/98A (AU6909098A)
Application filed by Human Genome Sciences Inc
Publication of AU3335101A
Application granted
Publication of AU777190B2
Anticipated expiration
Ceased


Description

P/00/011 28/5/91 Regulation 3.2
AUSTRALIA
Patents Act 1990
ORIGINAL
COMPLETE SPECIFICATION
STANDARD PATENT
Name of Applicant: Human Genome Sciences, Inc.
Address for service is: WRAY ASSOCIATES, 239 Adelaide Terrace, Perth, WA 6000
Attorney code: WR
Invention Title: "Streptococcus pneumoniae Antigens and Vaccines"
This application is a divisional application by virtue of Section 39 of Australian Patent Application 69090/98 filed on 30 October 1997.
The following statement is a full description of this invention, including the best method of performing it known to me:-
Streptococcus pneumoniae Polynucleotides and Sequences
FIELD OF THE INVENTION
The present invention relates to the field of molecular biology. In particular, it relates to, among other things, nucleotide sequences of Streptococcus pneumoniae, contigs, ORFs, fragments, probes, primers and related polynucleotides thereof, peptides and polypeptides encoded by the sequences, and uses of the polynucleotides and sequences thereof, such as in fermentation, polypeptide production, assays and pharmaceutical development, among others.
BACKGROUND OF THE INVENTION
Streptococcus pneumoniae has been one of the most extensively studied microorganisms since its first isolation in 1881. It was the object of many investigations that led to important scientific discoveries. In 1928, Griffith observed that when heat-killed encapsulated pneumococci and live strains constitutively lacking any capsule were concomitantly injected into mice, the nonencapsulated could be converted into encapsulated pneumococci with the same capsular type as the heat-killed strain. Years later, the nature of this "transforming principle," or carrier of genetic information, was shown to be DNA. (Avery, O.T., et al., J. Exp. Med., 79:137-157 (1944)).
In spite of the vast number of publications on S. pneumoniae many questions about its virulence are still unanswered, and this pathogen remains a major causative agent of serious human disease, especially community-acquired pneumonia. (Johnston, et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517 (1991)). In addition, in developing countries, the pneumococcus is responsible for the death of a large number of children under the age of 5 years from pneumococcal pneumonia. The incidence of pneumococcal disease is highest in infants under 2 years of age and in people over 60 years of age. Pneumococci are the second most frequent cause (after Haemophilus influenzae type b) of bacterial meningitis and otitis media in children. With the recent introduction of conjugate vaccines for H. influenzae type b, pneumococcal meningitis is likely to become increasingly prominent. S. pneumoniae is the most important etiologic agent of community-acquired pneumonia in adults and is the second most common cause of bacterial meningitis behind Neisseria meningitidis.
The antibiotic generally prescribed to treat S. pneumoniae is benzylpenicillin, although resistance to this and to other antibiotics is found occasionally. Pneumococcal resistance to penicillin results from mutations in its penicillin-binding proteins. In uncomplicated pneumococcal pneumonia caused by a sensitive strain, treatment with penicillin is usually successful unless started too late. Erythromycin or clindamycin can be used to treat pneumonia in patients hypersensitive to penicillin, but strains resistant to these drugs exist. Broad spectrum antibiotics (e.g., the tetracyclines) may also be effective, although tetracycline-resistant strains are not rare. In spite of the availability of antibiotics, the mortality of pneumococcal bacteremia in the last four decades has remained stable between 25 and 29%. (Gillespie, et al., J. Med. Microbiol. 28:237-248 (1989)).
S. pneumoniae is carried in the upper respiratory tract by many healthy individuals. It has been suggested that attachment of pneumococci is mediated by a disaccharide receptor on fibronectin, present on human pharyngeal epithelial cells. (Anderson, et al., J. Immunol. 142:2464-2468 (1989)). The mechanisms by which pneumococci translocate from the nasopharynx to the lung, thereby causing pneumonia, or migrate to the blood, giving rise to bacteremia or septicemia, are poorly understood. (Johnston, et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517 (1991)).
Various proteins have been suggested to be involved in the pathogenicity of S. pneumoniae; however, only a few of them have actually been confirmed as virulence factors. Pneumococci produce an IgA1 protease that might interfere with host defense at mucosal surfaces. (Kornfield, et al., Rev. Inf. Dis. 3:521-534 (1981)). S. pneumoniae also produces neuraminidase, an enzyme that may facilitate attachment to epithelial cells by cleaving sialic acid from the host glycolipids and gangliosides. Partially purified neuraminidase was observed to induce meningitis-like symptoms in mice; however, the reliability of this finding has been questioned because the neuraminidase preparations used were probably contaminated with cell wall products. Other pneumococcal proteins besides neuraminidase are involved in the adhesion of pneumococci to epithelial and endothelial cells. These pneumococcal proteins have as yet not been identified.
Recently, Cundell et al. reported that peptide permeases can modulate pneumococcal adherence to epithelial and endothelial cells. It was, however, unclear whether these permeases function directly as adhesins or whether they enhance adherence by modulating the expression of pneumococcal adhesins. (DeVelasco, et al., Micro. Rev. 59:591-603 (1995)). A better understanding of the virulence factors determining its pathogenicity will need to be developed to cope with the devastating effects of pneumococcal disease in humans.
Ironically, despite the prominent role of S. pneumoniae in the discovery of DNA, little is known about the molecular genetics of the organism. The S. pneumoniae genome consists of one circular, covalently closed, double-stranded DNA and a collection of so-called variable accessory elements, such as prophages, plasmids, transposons and the like. Most physical characteristics and almost all of the genes of S. pneumoniae are unknown. Among the few that have been identified, most have not been physically mapped or characterized in detail. Only a few genes of this organism have been sequenced. (See, for instance, current versions of GENBANK and other nucleic acid databases, and references that relate to the genome of S. pneumoniae such as those set out elsewhere herein.)
It is clear that the etiology of diseases mediated or exacerbated by S. pneumoniae infection involves the programmed expression of S. pneumoniae genes, and that characterizing the genes and their patterns of expression would add dramatically to our understanding of the organism and its host interactions.
Knowledge of S. pneumoniae genes and genomic organization would improve our understanding of disease etiology and lead to improved and new ways of preventing, ameliorating, arresting and reversing diseases. Moreover, characterized genes and genomic fragments of S. pneumoniae would provide reagents for, among other things, detecting, characterizing and controlling S. pneumoniae infections. There is a need to characterize the genome of S. pneumoniae and for polynucleotides of this organism.
SUMMARY OF THE INVENTION
The present invention is based on the sequencing of fragments of the Streptococcus pneumoniae genome. The primary nucleotide sequences which were generated are provided in SEQ ID NOS: 1-391.
The present invention provides the nucleotide sequence of several hundred contigs of the Streptococcus pneumoniae genome, which are listed in tables below and set out in the Sequence Listing submitted herewith, and representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as contiguous strings of primary sequence information corresponding to the nucleotide sequences depicted in SEQ ID NOS:1-391.
The present invention further provides nucleotide sequences which are at least 95% identical to the nucleotide sequences of SEQ ID NOS:1-391.
The nucleotide sequence of SEQ ID NOS: 1-391, a representative fragment thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide sequence of SEQ ID NOS:1-391 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
The present invention further provides systems, particularly computer-based systems which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Streptococcus pneumoniae genome.
Another embodiment of the present invention is directed to fragments of the Streptococcus pneumoniae genome having particular structural or functional attributes. Such fragments of the Streptococcus pneumoniae genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter referred to as open reading frames or ORFs, fragments which modulate the expression of an operably linked ORF, hereinafter referred to as expression modulating fragments or EMFs, and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample, hereinafter referred to as diagnostic fragments or DFs.
Each of the ORFs in fragments of the Streptococcus pneumoniae genome disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. For instance, the sequences can be used as diagnostic probes or amplification primers for detecting or determining the presence of a specific microbe in a sample, to selectively control gene expression in a host and in the production of polypeptides, such as polypeptides encoded by ORFs of the present invention, particularly those polypeptides that have a pharmacological activity.
The present invention further includes recombinant constructs comprising one or more fragments of the Streptococcus pneumoniae genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Streptococcus pneumoniae genome has been inserted.
The present invention further provides host cells containing any of the isolated fragments of the Streptococcus pneumoniae genome of the present invention. The host cells can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell such as a bacterial cell.
The present invention is further directed to isolated polypeptides and proteins encoded by ORFs of the present invention. A variety of methods, well known to those of skill in the art, routinely may be utilized to obtain any of the polypeptides and proteins of the present invention. For instance, polypeptides and proteins of the present invention having relatively short, simple amino acid sequences readily can be synthesized using commercially available automated peptide synthesizers. Polypeptides and proteins of the present invention also may be purified from bacterial cells which naturally produce the protein. Yet another alternative is to purify polypeptides and proteins of the present invention from cells which have been altered to express them.
The invention further provides methods of obtaining homologs of the fragments of the Streptococcus pneumoniae genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention.
Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
The invention further provides antibodies which selectively bind polypeptides and proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.
The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.
The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.
In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays.
Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: a first container comprising one of the antibodies, or one of the DFs of the present invention; and one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of bound antibodies or hybridized DFs.
Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a polypeptide or protein encoded by one of the ORFs of the present invention.
Specifically, such agents include, as further described below, antibodies, peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise steps of: contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and determining whether the agent binds to said protein.
The present genomic sequences of Streptococcus pneumoniae will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Streptococcus pneumoniae genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Streptococcus pneumoniae researchers and for immediate commercial value for the production of proteins or to control gene expression.
The methodology and technology for elucidating extensive genomic sequences of bacterial and other genomes have enhanced, and will continue to greatly enhance, the ability to analyze and understand chromosomal organization. In particular, sequenced contigs and genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to do comparative genomics and molecular phylogeny.
DESCRIPTION OF THE FIGURES
FIGURE 1 is a block diagram of a computer system (102) that can be used to implement computer-based systems of the present invention.
FIGURE 2 is a schematic diagram depicting the data flow and computer programs used to collect, assemble, edit and annotate the contigs of the Streptococcus pneumoniae genome of the present invention. Both Macintosh and Unix platforms are used to handle the AB 373 and 377 sequence data files, largely as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, 585, IEEE Computer Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector sequence removal and end-trimming of sequence files. The program Loadis runs on a Macintosh platform and parses the feature data extracted from the sequence files by Factura to the Unix based Streptococcus pneumoniae relational database. Assembly of contigs (and whole genome sequences) is accomplished by retrieving a specific set of sequence files and their associated features using Extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting sequence file is processed by seqfilter to trim portions of the sequences with more than 2% ambiguous nucleotides. The sequence files were assembled using TIGR Assembler, an assembly engine designed at The Institute for Genomic Research (TIGR) for rapid and accurate assembly of thousands of sequence fragments. The collection of contigs generated by the assembly step is loaded into the database with the lassie program. Identification of open reading frames (ORFs) is accomplished by processing contigs with zorf or GenMark. The ORFs are searched against S. pneumoniae sequences from GenBank and against all protein sequences using the BLASTN and BLASTP programs, described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990). Results of the ORF determination and similarity searching steps were loaded into the database. As described below, some results of the determination and the searches are set out in Tables 1-3.
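The trimming step attributed to seqfilter can be illustrated with a minimal sketch. The source does not describe how seqfilter decides what to trim, so the greedy end-trimming rule below, the function names, and the treatment of every non-ACGT character as ambiguous are all assumptions made for illustration only; only the 2% threshold comes from the text.

```python
UNAMBIGUOUS = set("ACGT")  # assumption: any other symbol (N, R, Y, ...) counts as ambiguous


def ambiguous_fraction(seq):
    """Return the fraction of bases in seq that are not A, C, G or T."""
    if not seq:
        return 0.0
    return sum(1 for base in seq.upper() if base not in UNAMBIGUOUS) / len(seq)


def trim_ambiguous_ends(seq, max_fraction=0.02):
    """Greedily trim from the ends until at most max_fraction of bases are ambiguous.

    Hypothetical rule: on each pass, cut through whichever ambiguous base
    lies closest to an end of the retained region.
    """
    seq = seq.upper()
    start, end = 0, len(seq)
    while start < end and ambiguous_fraction(seq[start:end]) > max_fraction:
        left = next(i for i in range(start, end) if seq[i] not in UNAMBIGUOUS)
        right = next(i for i in range(end - 1, start - 1, -1) if seq[i] not in UNAMBIGUOUS)
        if left - start <= (end - 1) - right:
            start = left + 1  # drop the left flank up to and including the ambiguous base
        else:
            end = right       # drop the right flank from the ambiguous base onward
    return seq[start:end]


read = "ACGT" * 10 + "NNN"
print(trim_ambiguous_ends(read))
```

A production filter would more likely work from per-base quality values; this sketch only shows the thresholding idea.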
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The present invention is based on the sequencing of fragments of the Streptococcus pneumoniae genome and analysis of the sequences. The primary nucleotide sequences generated by sequencing the fragments are provided in SEQ ID NOS:1-391. (As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system.) In addition to the aforementioned Streptococcus pneumoniae polynucleotide and polynucleotide sequences, the present invention provides the nucleotide sequences of SEQ ID NOS:1-391, or representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan.
As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NOS:1-391" refers to any portion of the SEQ ID NOS:1-391 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Streptococcus pneumoniae open reading frames (ORFs), expression modulating fragments (EMFs), and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample (DFs). A non-limiting identification of preferred representative fragments is provided in Tables 1-3. As discussed in detail below, the information provided in SEQ ID NOS:1-391 and in Tables 1-3 together with routine cloning, synthesis, sequencing and assay methods will enable those skilled in the art to clone and sequence all "representative fragments" of interest, including open reading frames encoding a large variety of Streptococcus pneumoniae proteins.
While the presently disclosed sequences of SEQ ID NOS: 1-391 are highly accurate, sequencing techniques are not perfect and, in relatively rare instances, further investigation of a fragment or sequence of the invention may reveal a nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID NOS:1-391. However, once the present invention is made available (i.e., once the information in SEQ ID NOS:1-391 and Tables 1-3 has been made available), resolving a rare sequencing error in SEQ ID NOS:1-391 will be well within the skill of the art. The present disclosure makes available sufficient sequence information to allow any of the described contigs or portions thereof to be obtained readily by straightforward application of routine techniques. Further sequencing of such polynucleotides may proceed in like manner using manual and automated sequencing methods which are employed ubiquitously in the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler can be used as an aid during visual inspection of nucleotide sequences. By employing such routine techniques potential errors readily may be identified and the correct sequence then may be ascertained by targeting further sequencing effort, also of a routine nature, to the region containing the potential error.
Even if all of the very rare sequencing errors in SEQ ID NOS:1-391 were corrected, the resulting nucleotide sequences would still be at least 95% identical, nearly all would be at least 99% identical, and the great majority would be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-391.
As discussed elsewhere herein, polynucleotides of the present invention readily may be obtained by routine application of well known and standard procedures for cloning and sequencing DNA. Detailed methods for obtaining libraries and for sequencing are provided below, for instance. A wide variety of Streptococcus pneumoniae strains that can be used to prepare S. pneumoniae genomic DNA for cloning and for obtaining polynucleotides of the present invention are available to the public from recognized depository institutions, such as the American Type Culture Collection (ATCC). While the present invention is enabled by the sequences and other information herein disclosed, the S. pneumoniae strain that provided the DNA of the present Sequence Listing, Strain 7/87 14.8.91, has been deposited in the ATCC, as a convenience to those of skill in the art. As a further convenience, a library of S. pneumoniae genomic DNA, derived from the same strain, also has been deposited in the ATCC. The S. pneumoniae strain was deposited on October 10, 1996, and was given Deposit No. 55840, and the cDNA library was deposited on October 11, 1996 and was given Deposit No. 97755. The genomic fragments in the library are 15 to 20 kb fragments generated by partial Sau3AI digestion and they are inserted into the BamHI site in the well-known lambda-derived vector lambda DASH II (Stratagene, La Jolla, CA). The provision of the deposits is not a waiver of any rights of the inventors or their assignees in the present subject matter.
The nucleotide sequences of the genomes from different strains of Streptococcus pneumoniae differ somewhat. However, the nucleotide sequences of the genomes of all Streptococcus pneumoniae strains will be at least 95% identical, in corresponding part, to the nucleotide sequences provided in SEQ ID NOS: 1-391. Nearly all will be at least 99% identical and the great majority will be 99.9% identical.
Thus, the present invention further provides nucleotide sequences which are at least 95%, preferably 99% and most preferably 99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-391, in a form which can be readily used, analyzed and interpreted by the skilled artisan.
Methods for determining whether a nucleotide sequence is at least 95%, at least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-391 are routine and readily available to the skilled artisan. For example, the well known fasta algorithm described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988) can be used to generate the percent identity of nucleotide sequences. The BLASTN program also can be used to generate an identity score of polynucleotides compared to one another.
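For orientation, the final percent-identity arithmetic can be sketched as follows. This is not the fasta or BLASTN algorithm: those programs also compute the alignment, whereas this sketch assumes the two sequences have already been aligned to equal length (with "-" marking gaps). The function names and the choice to count every alignment column in the denominator are illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same base.

    Assumes the inputs are already aligned to equal length; gap columns
    ('-') count toward the denominator but never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=95.0):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 9 of 10 columns match
```

Different tools define the denominator differently (for example, excluding gap columns), so figures from this sketch are not directly comparable with any particular program's report.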
COMPUTER RELATED EMBODIMENTS
The nucleotide sequences provided in SEQ ID NOS: 1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ ID NOS:1-391 may be "provided" in a variety of mediums to facilitate use thereof.
As used herein, "provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS: 1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS:1-391.
Such a manufacture provides a large portion of the Streptococcus pneumoniae genome and parts thereof (e.g., a Streptococcus pneumoniae open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Streptococcus pneumoniae genome or a subset thereof as it exists in nature or in purified form.
In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention.
Likewise, it will be clear to those of skill how additional computer readable media that may be developed also can be used to create analogous manufactures having recorded thereon a nucleotide sequence of the present invention.
As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention. A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially-available software such as WordPerfect and MicroSoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
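As one illustration of recording sequence information as an ASCII file, the sketch below writes and re-reads records in the widely used FASTA text layout. The source names no specific file format, so FASTA, the record name "SEQ_ID_NO_1", the placeholder sequence, and both function names are assumptions for illustration; an in-memory buffer stands in for a file on disk.

```python
import io


def write_fasta(records, handle, width=60):
    """Write {name: sequence} records as FASTA text, wrapping lines at `width`."""
    for name, seq in records.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def read_fasta(handle):
    """Parse FASTA text back into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Placeholder data only; not a sequence from the Sequence Listing.
records = {"SEQ_ID_NO_1": "ACGT" * 20}
buffer = io.StringIO()
write_fasta(records, buffer)
buffer.seek(0)
print(read_fasta(buffer) == records)
```

The same records could equally be stored as rows in a relational database; the text-file form is shown only because it needs nothing beyond the standard library.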
Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. Thus, by providing in computer readable form the nucleotide sequences of SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a sequence of SEQ ID NOS: 1-391, the present invention enables the skilled artisan routinely to access the provided sequence information for a wide variety of purposes.
The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Streptococcus pneumoniae genome which contain homology to ORFs or proteins from both Streptococcus pneumoniae and from other organisms. Among the ORFs discussed herein are protein encoding fragments of the Streptococcus pneumoniae genome useful in producing commercially important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify, among other things, commercially important fragments of the Streptococcus pneumoniae genome.
As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.
As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means.
As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.
As used herein, "search means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the present genomic sequences which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly, and a variety of commercially available software for conducting search means is available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.
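The simplest possible search means, an exact-match scan of stored sequences on both strands, can be sketched as below. This is far cruder than the homology searches performed by the programs named above, and the contig names, sequences, and function names are hypothetical placeholders.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_target(contigs, target):
    """List (contig name, 0-based position, strand) for every exact match.

    A '-' strand hit means the reverse complement of the target occurs at
    that position of the stored (forward) sequence. A palindromic target
    is reported on both strands at the same position.
    """
    target = target.upper()
    rc_target = reverse_complement(target)
    hits = []
    for name, seq in contigs.items():
        seq = seq.upper()
        for strand, query in (("+", target), ("-", rc_target)):
            pos = seq.find(query)
            while pos != -1:
                hits.append((name, pos, strand))
                pos = seq.find(query, pos + 1)  # allow overlapping matches
    return hits


# Placeholder contigs for illustration only.
contigs = {"contig_1": "TTGGATCATT", "contig_2": "CCTGATCCAA"}
print(find_target(contigs, "GGATCA"))
```

Real search programs tolerate mismatches and gaps and score the statistical significance of each hit; exact matching is shown only to make the data flow from stored sequence to hit list concrete.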
As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that searches for commercially important fragments, such as sequence fragments involved in gene expression and protein processing, may be of shorter length.
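The relationship between target length and chance matches can be illustrated with a rough estimate. The sketch below assumes, purely for illustration, a database of independent, uniformly distributed residues; the function name and the 2,000,000-nucleotide database size are hypothetical and are not taken from the present disclosure.

```python
def expected_random_hits(target_len: int, db_len: int, alphabet_size: int = 4) -> float:
    """Expected number of exact chance matches of one target in a random database.

    Assumes every residue is independent and equally likely (an idealization).
    """
    # Number of positions at which a target of this length can start.
    positions = max(db_len - target_len + 1, 0)
    # Probability that any one position matches the target exactly.
    return positions * float(alphabet_size) ** -target_len


if __name__ == "__main__":
    db_len = 2_000_000  # hypothetical database size, in nucleotides
    for n in (6, 12, 30):
        print(n, expected_random_hits(n, db_len))
```

Under these assumptions the expected number of chance matches falls by a factor of four for each added nucleotide, which is consistent with the preference for longer target sequences stated above.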
As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif. There are a variety of target motifs known in the art.
Protein target motifs include, but are not limited to, enzymic active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the Streptococcus pneumoniae genomic sequences possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
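A ranked output of this kind can be sketched as follows. This is a deliberately naive, ungapped sliding-window comparison written for illustration only; it is not the BLAST algorithm, and the function name and the 0-based start positions are assumptions of the sketch.

```python
from typing import List, Tuple


def rank_fragments(genome: str, target: str, top: int = 3) -> List[Tuple[int, float]]:
    """Rank genome windows by percent identity to the target (ungapped).

    Returns (0-based start, percent identity) pairs, best match first.
    """
    n = len(target)
    if n == 0:
        raise ValueError("target must not be empty")
    scored = []
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        matches = sum(a == b for a, b in zip(window, target))
        scored.append((start, 100.0 * matches / n))
    # Highest identity first; ties are broken by position in the genome.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


if __name__ == "__main__":
    for start, identity in rank_fragments("AAACGTACGTTT", "ACGT"):
        print(start, identity)
```

Each returned pair identifies a fragment and the degree of identity it shares with the target, which is the information the preferred output format is described as presenting.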
A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Streptococcus pneumoniae genome. In the present examples, implementing software which implements the BLAST and BLAZE algorithms, described in Altschul et al., J. Mol. Biol. 215:403-410 (1990), is used to identify open reading frames within the Streptococcus pneumoniae genome. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.
Of course, suitable proprietary systems that may be known to those of skill also may be employed in this regard.
Figure 1 provides a block diagram of a computer system illustrative of embodiments of this aspect of the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable storage medium 116, once it is inserted into the removable medium storage device 114.
A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. During execution, software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 108, in accordance with the requirements and operating parameters of the operating system, the hardware system and the software program or programs.
BIOCHEMICAL EMBODIMENTS
Other embodiments of the present invention are directed to isolated fragments of the Streptococcus pneumoniae genome. The fragments of the Streptococcus pneumoniae genome of the present invention include, but are not limited to fragments which encode peptides and polypeptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs) and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample, hereinafter diagnostic fragments (DFs).
As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the Streptococcus pneumoniae genome" refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition. Particularly, the term refers to the nucleic acid molecules having the sequences set out in SEQ ID NOS:1-391, to representative fragments thereof as described above, and to polynucleotides at least 95%, preferably at least 99% and especially preferably at least 99.9% identical in sequence thereto, also as set out above.
A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to methods which separate constituents of a solution based on charge, solubility, or size.
In one embodiment, Streptococcus pneumoniae DNA can be enzymatically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Streptococcus pneumoniae library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF, such as those enumerated in Tables 1-3, can then be generated using nucleotide sequence information provided in SEQ ID NOS:1-391. Well known and routine techniques of PCR cloning then can be used to isolate the ORF from the lambda DNA library or Streptococcus pneumoniae genomic DNA. Thus, given the availability of SEQ ID NOS:1-391, the information in Tables 1, 2 and 3, and the information that may be obtained readily by analysis of the sequences of SEQ ID NOS:1-391 using methods set out above, those of skill will be enabled by the present disclosure to isolate any ORF-containing or other nucleic acid fragment of the present invention.
The isolated nucleic acid molecules of the present invention include, but are not limited to single stranded and double stranded DNA, and single stranded RNA.
As used herein, an "open reading frame," ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein.
Tables 1, 2, and 3 list ORFs in the Streptococcus pneumoniae genomic contigs of the present invention that were identified as putative coding regions by the GeneMark software using organism-specific second-order Markov probability transition matrices. It will be appreciated that other criteria can be used, in accordance with well known analytical methods, such as those discussed herein, to generate more inclusive, more restrictive, or more selective lists.
Table 1 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that over a continuous region of at least 50 bases are 95% or more identical (by BLAST analysis) to a nucleotide sequence available through GenBank in October, 1997.
Table 2 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that are not in Table 1 and match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence available through GenBank in October, 1997.
Table 3 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that do not match significantly, by BLASTP analysis, a polypeptide sequence available through GenBank in October, 1997.
In each table, the first and second columns identify the ORF by, respectively, contig number and ORF number within the contig; the third column indicates the first nucleotide of the ORF (actually the first nucleotide of the stop codon immediately preceding the ORF), counting from the 5' end of the contig strand; and the fourth column, "stop," indicates the last nucleotide of the stop codon defining the 3' end of the ORF.
In Tables 1 and 2, column five lists the Reference for the closest matching sequence available through GenBank. These reference numbers are the database entry numbers commonly used by those of skill in the art, who will be familiar with their denominators. Descriptions of the nomenclature are available from the National Center for Biotechnology Information. Column six in Tables 1 and 2 provides the gene name of the matching sequence; column seven provides the BLAST identity score and column eight the BLAST similarity score from the comparison of the ORF and the homologous gene; and column nine indicates the length in nucleotides of the highest scoring segment pair identified by the BLAST identity analysis.
Each ORF described in the tables is defined by "start" and "stop" nucleotide position numbers. These position numbers refer to the boundaries of each ORF and provide orientation with respect to whether the forward or reverse strand is the coding strand and in which reading frame the coding sequence is contained. The "start" position is the first nucleotide of the triplet encoding a stop codon just 5' to the ORF and the "stop" position is the last nucleotide of the triplet encoding the next in-frame stop codon (i.e., the stop codon at the 3' end of the ORF). Those of ordinary skill in the art appreciate that preferred fragments within each ORF described in the table include fragments of each ORF which include the entire sequence from the delineated "start" and "stop" positions excepting the first and last three nucleotides since these encode stop codons. Thus, polynucleotides set out as ORFs in the tables but lacking the three 5' nucleotides and the three 3' nucleotides are encompassed by the present invention. Those of skill also appreciate that particularly preferred are fragments within each ORF that are polynucleotide fragments comprising polypeptide coding sequence. As defined herein, "coding sequence" includes the fragment within an ORF beginning at the first in-frame ATG (triplet encoding methionine) and ending with the last nucleotide prior to the triplet encoding the 3' stop codon. Preferred are fragments comprising the entire coding sequence and fragments comprising the entire coding sequence, excepting the coding sequence for the N-terminal methionine. Those of skill appreciate that the N-terminal methionine is often removed during post-translational processing and that polynucleotides lacking the ATG can be used to facilitate production of N-terminal fusion proteins which may be beneficial in the production or use of genetically engineered proteins.
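The coordinate convention described above can be sketched in a few lines. The sketch assumes 1-based, inclusive positions on the contig and assumes that a "start" greater than "stop" denotes an ORF on the reverse strand; that encoding of strand, the function names and the toy contig are illustrative assumptions rather than details confirmed by the tables.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def coding_sequence(contig: str, start: int, stop: int) -> Optional[str]:
    """Extract the coding sequence of an ORF bounded by two stop codons.

    start: first nucleotide of the stop codon just 5' to the ORF.
    stop:  last nucleotide of the in-frame stop codon at the 3' end.
    Assumption: start > stop means the ORF lies on the reverse strand.
    """
    if start <= stop:
        region = contig[start - 1:stop]
    else:
        region = reverse_complement(contig[stop - 1:start])
    # Drop the first and last three nucleotides, which encode stop codons.
    interior = region[3:-3]
    # Scan codon by codon for the first in-frame ATG.
    for i in range(0, len(interior) - 2, 3):
        if interior[i:i + 3] == "ATG":
            return interior[i:]
    return None  # no in-frame ATG between the two stop codons


if __name__ == "__main__":
    toy_contig = "CCTAAGGGATGAAACCCTGAGG"  # hypothetical 22-nt contig
    print(coding_sequence(toy_contig, 3, 20))
```

Removing the leading ATG from the returned string gives the variant lacking the N-terminal methionine codon that is mentioned above for fusion constructs.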
Of course, due to the degeneracy of the genetic code many polynucleotides can encode a given polypeptide. Thus, the invention further includes polynucleotides comprising a nucleotide sequence encoding a polypeptide sequence itself encoded by the coding sequence within an ORF described in Tables 1-3 herein. Further, polynucleotides at least 95%, preferably at least 99% and especially preferably at least 99.9% identical in sequence to the foregoing polynucleotides, are contemplated by the present invention.
Polypeptides encoded by polynucleotides described above and elsewhere herein are also provided by the present invention, as are polypeptides comprising an amino acid sequence at least about 95%, preferably at least 97% and even more preferably 99% identical to the amino acid sequence of a polypeptide encoded by an ORF shown in Tables 1-3. These polypeptides may or may not comprise an N-terminal methionine.
The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were "similar" (i.e., possessed similar biochemical characteristics). Many programs for analysis of nucleotide or amino acid sequence similarity, such as FASTA and BLAST, specifically list percent identity of a matching region as an output parameter. Thus, for instance, Tables 1 and 2 herein enumerate the percent identity of the highest scoring segment pair in each ORF and its listed relative. Further details concerning the algorithms and criteria used for homology searches are provided below and are described in the pertinent literature highlighted by the citations provided below.
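The 70% identity and 80% similarity figures in the example above can be reproduced with a short sketch. The grouping of "similar" amino acids used here is a simplified, hypothetical one chosen for illustration; actual programs such as BLAST score similarity with substitution matrices, and the two peptides are invented for the example.

```python
from typing import Tuple

# Hypothetical groups of biochemically similar residues (illustration only).
SIMILAR_GROUPS = ("ILVM", "FYW", "KR", "DE", "ST", "NQ")


def _similar(a: str, b: str) -> bool:
    """True if two residues are identical or share a similarity group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_and_similarity(seq_a: str, seq_b: str) -> Tuple[float, float]:
    """Compare two equal-length, ungapped peptides position by position."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_a)
    identical = sum(a == b for a, b in zip(seq_a, seq_b))
    similar = sum(_similar(a, b) for a, b in zip(seq_a, seq_b))
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    # The peptides differ at positions 1, 3 and 5; only position 5 (D vs E)
    # is a "similar" substitution under the grouping above.
    print(percent_identity_and_similarity("AKLVDGSTPW", "WKDVEGSTPW"))
```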
It will be appreciated that other criteria can be used to generate more inclusive and more exclusive listings of the types set out in the tables. As those of skill will appreciate, narrow and broad searches both are useful. Thus, a skilled artisan can readily identify ORFs in contigs of the Streptococcus pneumoniae genome other than those listed in Tables 1-3, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF in addition to those ascertainable using the computer-based systems of the present invention.
As used herein, an "expression modulating fragment," EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.
As used herein, a sequence is said to "modulate the expression of an operably linked sequence" when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters, and promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event.
EMF sequences can be identified within the contigs of the Streptococcus pneumoniae genome by their proximity to the ORFs provided in Tables 1-3. An intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate the expression of an operably linked ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an "intergenic segment" refers to fragments of the Streptococcus pneumoniae genome which are between two ORF(s) herein described. EMFs also can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention. Further, the two methods can be combined and used together.
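Locating candidate intergenic segments from a list of ORF coordinates can be sketched as below. The sketch assumes 1-based, inclusive "start"/"stop" pairs on a single contig and treats a pair with start greater than stop as a reverse-strand ORF; the function name, the minimum-length default and the example coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple


def intergenic_segments(orfs: Iterable[Tuple[int, int]], min_len: int = 10) -> List[Tuple[int, int]]:
    """Return (first, last) positions of gaps between neighbouring ORFs.

    Only gaps of at least min_len nucleotides are reported. Overlapping
    ORFs are handled by tracking the furthest ORF end seen so far.
    """
    # Normalize so each interval runs low-to-high regardless of strand.
    intervals = sorted((min(s, e), max(s, e)) for s, e in orfs)
    gaps = []
    prev_end = None
    for low, high in intervals:
        if prev_end is not None and low - prev_end - 1 >= min_len:
            gaps.append((prev_end + 1, low - 1))
        prev_end = high if prev_end is None else max(prev_end, high)
    return gaps


if __name__ == "__main__":
    # Hypothetical ORF coordinates on one contig.
    print(intergenic_segments([(1, 100), (300, 151), (305, 400)]))
```

A segment found this way is only a candidate; as described in this section, its activity would still need to be confirmed experimentally, for example with an EMF trap vector.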
The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a cloning site linked to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided below. A sequence which is suspected of being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate host using known procedures and the phenotype of the transformed host is examined under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence.
As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize to Streptococcus pneumoniae sequences.
DFs can be readily identified by identifying unique sequences within contigs of the Streptococcus pneumoniae genome, such as by using well-known computer analysis software, and by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity.
The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequences provided in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to SEQ ID NOS:1-391, with a sequence from another isolate of the same species. Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated. Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequence both strands). Alternatively, error screening can be performed by sequencing corresponding polynucleotides of Streptococcus pneumoniae origin isolated by using part or all of the fragments in question as a probe or primer.
Preferred DFs of the present invention comprise at least about 17, preferably at least about 20, and more preferably at least about 50 contiguous nucleotides within an ORF set out in Tables 1-3. Most highly preferred DFs specifically hybridize to a polynucleotide containing the sequence of the ORF from which they are derived. Specific hybridization occurs even under stringent conditions defined elsewhere herein.
Each of the ORFs of the Streptococcus pneumoniae genome disclosed in Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as polynucleotide reagents in numerous ways. For example, the sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe in a sample, particularly Streptococcus pneumoniae.
Especially preferred in this regard are ORFs such as those of Table 3, which do not match previously characterized sequences from other organisms and thus are most likely to be highly selective for Streptococcus pneumoniae. Also particularly preferred are ORFs that can be used to distinguish between strains of Streptococcus pneumoniae, particularly those that distinguish medically important strains, such as drug-resistant strains.
In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA. Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Information from the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription, for triple-helix formation, or to the mRNA itself, for antisense inhibition. Both techniques have been demonstrated to be effective in model systems, and the requisite techniques are well known and involve routine procedures. Triple helix techniques are discussed in, for example, Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991). Antisense techniques in general are discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988).
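Designing an antisense oligonucleotide complementary to an mRNA region reduces, at the sequence level, to taking a reverse complement. The sketch below is a minimal illustration that enforces the 20 to 40 base length mentioned above; the function name, the 0-based window start and the example mRNA are hypothetical, and the sketch ignores practical concerns such as secondary structure and specificity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligonucleotide complementary to an mRNA window.

    mrna:   sense-strand sequence (U or T accepted).
    start:  0-based index of the first targeted base.
    length: oligonucleotide length; 20 to 40 bases, per the text above.
    """
    if not 20 <= length <= 40:
        raise ValueError("length should be 20 to 40 bases")
    if start < 0:
        raise ValueError("start must be non-negative")
    # Work in the DNA alphabet, then take the targeted window.
    target = mrna.upper().replace("U", "T")[start:start + length]
    if len(target) != length:
        raise ValueError("window extends past the end of the mRNA")
    # The antisense strand is the reverse complement, written 5' to 3'.
    return target.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(antisense_oligo("AUGGCUAAAGGUCCAGAUUU", 0))
```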
The present invention further provides recombinant constructs comprising one or more fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention. Certain preferred recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Streptococcus pneumoniae genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including, for example, a promoter, operably linked to the ORF. For vectors comprising the EMFs of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF.
Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK, pBS KS, pNH8a, pNH16a, pNH18a, pNH46a (available from Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from Pharmacia).
Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (available from Stratagene); pSVK3, pBPV, pMSG, pSVL (available from Pharmacia).
Promoter regions can be selected from any desired gene using CAT (chloramphenicol transferase) vectors or other vectors with selectable markers.
Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
The present invention further provides host cells containing any one of the isolated fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention, wherein the fragment has been introduced into the host cell using known methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or a procaryotic cell, such as a bacterial cell.
A polynucleotide of the present invention, such as a recombinant construct comprising an ORF of the present invention, may be introduced into the host by a variety of well established techniques that are standard in the art, such as calcium phosphate transfection, DEAE-dextran mediated transfection and electroporation, which are described in, for instance, Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY (1986).
A host cell containing one of the fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF. The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. By "degenerate variant" is intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence.
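Whether one nucleotide fragment is a degenerate variant of another, in the sense just defined, can be checked by translating both and comparing the results. The sketch below uses the standard genetic code and assumes both inputs are in-frame DNA coding sequences; the function names and example sequences are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon.

    Any trailing partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ in nucleotides but encode the same polypeptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


if __name__ == "__main__":
    # AAA and AAG both encode lysine; CCC and CCA both encode proline.
    print(is_degenerate_variant("ATGAAACCC", "ATGAAGCCA"))
```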
Preferred nucleic acid fragments of the present invention are the ORFs and subfragments thereof depicted in Tables 2 and 3 which encode proteins.
A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Such short fragments as may be obtained most readily by synthesis are useful, for example, in generating antibodies against the native polypeptide, as discussed further below.
In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein. One skilled in the art can readily employ well-known methods for isolating polypeptides and proteins to isolate and purify polypeptides or proteins of the present invention produced naturally by a bacterial strain, or by other methods. Methods for isolation and purification that can be employed in this regard include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity chromatography.
The polypeptides and proteins of the present invention also can be purified from cells which have been altered to express the desired polypeptide or protein.
As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level. Those skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides or proteins of the present invention.
Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.
"Recombinant," as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems.
"Microbial" refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.
"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides.
Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Streptococcus pneumoniae genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
"Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) genetic regulatory elements necessary for gene expression in the host, including elements required to initiate and maintain transcription at a level sufficient for suitable expression of the desired polypeptide, including, for example, promoters and, where necessary, an enhancer and a polyadenylation signal; (2) a structural or coding sequence which is transcribed into mRNA and translated into protein; and (3) appropriate signals to initiate translation at the beginning of the desired coding region and terminate translation at its end. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell.
Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
"Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.
Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which is hereby incorporated by reference in its entirety.
Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium.
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, when desirable, provide amplification within the host.
Suitable prokaryotic hosts for transformation include strains of E. coli, B. subtilis, Salmonella typhimurium and various species within the genera Pseudomonas and Streptomyces. Others may also be employed as a matter of choice.
As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (available from Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.
Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter, where it is inducible, is derepressed or induced by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period to provide for expression of the induced gene product. Thereafter cells are typically harvested, generally by centrifugation, disrupted to release expressed protein, generally by physical or chemical means, and the resulting crude extract is retained for further purification.
Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.
Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements.
Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps.
Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.
The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described.
As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity, and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.
The invention further provides methods of obtaining homologs from other strains of Streptococcus pneumoniae, of the fragments of the Streptococcus pneumoniae genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Streptococcus pneumoniae is defined as a homolog of a fragment of the Streptococcus pneumoniae fragments or contigs or a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Streptococcus pneumoniae genome of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain regions which possess greater than 25 sequence (amino acid or nucleic acid) homology. Preferred homologs in this regard are those with more than 90% homology. Especially preferred are those with 93% or more homology. Among especially preferred homologs those with 95% or more homology are particularly preferred. Very particularly preferred among these are those with 97% and even more particularly preferred among those are homologs with 99% or more homology. The most preferred homologs among these are those with 99.9% homology or more. It will be understood that, among measures of homology, identity is particularly preferred in this regard.
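The preference tiers above can be illustrated with a short sketch. It is not part of the disclosed method: it assumes the two sequences have already been aligned to equal length (with "-" marking gaps), it uses identity as the measure of homology, and the names percent_identity, highest_tier_met and PREFERENCE_TIERS are hypothetical. Treating the 97% tier as "97% or more" is also an assumption.

```python
from typing import Optional

# (threshold in percent, strict) pairs for the tiers named in the text,
# ordered from most to least preferred. "More than 90%" is a strict bound;
# the others are read as "X% or more" (an assumption for the 97% tier).
PREFERENCE_TIERS = [
    (99.9, False),
    (99.0, False),
    (97.0, False),
    (95.0, False),
    (93.0, False),
    (90.0, True),
]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Columns where both sequences have a gap are ignored; a column with a gap
    in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def highest_tier_met(identity: float) -> Optional[float]:
    """Return the most stringent tier threshold satisfied, or None."""
    for threshold, strict in PREFERENCE_TIERS:
        if identity > threshold or (not strict and identity == threshold):
            return threshold
    return None


if __name__ == "__main__":
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(identity, highest_tier_met(identity))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch only covers the comparison against the stated thresholds.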
Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NOS:1-391 or from a nucleotide sequence at least particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ ID NOS:1-391 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog. Methods suitable to this aspect of the present invention are well known and have been described in great detail in many publications such as, for example, Innis et al., PCR Protocols, Academic Press, San Diego, CA (1990).
When using primers derived from SEQ ID NOS:1-391 or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS:1-391, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 65°C in 0.5X SSPC) only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences which are greater than 40-50% homologous to the primer will also be amplified.
When using DNA probes derived from SEQ ID NOS:1-391, or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS:1-391, for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65°C in SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in SSPC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.
Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Streptococcus pneumoniae.
ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION
Each ORF provided in Tables 1 and 2 is identified with a function by homology to a known gene or polypeptide. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide. Such identifications permit one skilled in the art to use the Streptococcus pneumoniae ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. A variety of reviews illustrative of this aspect of the invention are available, including the following reviews on the industrial use of enzymes, for example, BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd Ed., MacMillan Publications, Ltd. NY (1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of exemplary uses that illustrate this and similar aspects of the present invention are discussed below.
1. Biosynthetic Enzymes
Open reading frames encoding proteins involved in mediating the catalytic reactions involved in intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used for industrial biosynthesis. These include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis.
The various metabolic pathways present in Streptococcus pneumoniae can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Tables 1-3 and SEQ ID NOS:1-391.
Of particular interest are polypeptides involved in the degradation of intermediary metabolites as well as non-macromolecular metabolism. Such enzymes include amylases, glucose oxidases, and catalase.
Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a number of industrial processes including the processing of flax and other vegetable fibers, in the extraction, clarification and depectinization of fruit juices, in the extraction of vegetable oil and in the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided in Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in Biocatalysts In Agricultural Biotechnology, Whitaker et al., Eds., American Chemical Society Symposium Series 389:93 (1989).
The metabolism of sugars is an important aspect of the primary metabolism of Streptococcus pneumoniae. Enzymes involved in the degradation of sugars, such as, particularly, glucose, galactose, fructose and xylose, can be used in industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase.
Other metabolic enzymes have found commercial use, such as glucose oxidases which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using Reichstein's procedure, as described in Krueger et al., Biotechnology Rhine et al., Eds., Verlag Press, Weinheim, Germany (1984).
Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer. See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most important application of GOD is the industrial scale fermentation of gluconic acid.
There is a market for gluconic acids, which are used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries, as described, for example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS AND FUNGI; Benett et al., Eds., Academic Press, New York (1985). In addition to industrial applications, GOD has found applications in medicine for quantitative determination of glucose in body fluids and, more recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. This application is described in Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986), for instance.
The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).
Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, ed., Plenum Press, New York (1977) and Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner Associates, London (1986)).
Another class of commercially usable proteins of the present invention comprises the microbial lipases, described by, for instance, Macrae et al., Philosophical Transactions of the Chiral Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemist Society 61:1758 (1984). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of the washing procedures.
The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates. Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990)). The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles, esterification reactions, trans-esterification reactions, synthesis of amides, reduction of alkanones and oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming reactions such as the aldol reaction.
When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an isolated, partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.
Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. An advantage of using microbial based enzyme systems is that the amino transferase enzymes catalyze the stereoselective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods of Enzymology 136:479 (1987).
Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair, and recombination.
2. Generation of Antibodies
As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.
The invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.
In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of producing the desired antibody are well known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques In Biochemistry And Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980), Kohler and Milstein, Nature 256:495-497 (1975)), including the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983), pgs. 77-96 of Cole et al., in Monoclonal Antibodies And Cancer Therapy, Alan R. Liss, Inc. (1985)). Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.
The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to coupling the antigen with a heterologous protein (such as globulin or galactosidase) or through the inclusion of an adjuvant during immunization.
For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.
Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).
Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).
Techniques described for the production of single chain antibodies (U.S. Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.
For polyclonal antibodies, antibody containing antiserum is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.
The present invention further provides the above-described antibodies in detectably labelled form. Antibodies can be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labeling are well known in the art; see, for example, Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. J. Immunol. Meth. 13:215 (1976).
The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Streptococcus pneumoniae genome is expressed.
The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34 Academic Press, N. Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.
3. Diagnostic Assays and Kits
The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or homolog thereof, in a test sample, using one of the DFs or antibodies of the present invention.
In detail, such methods comprise incubating a test sample with one or more of the antibodies or one or more of the DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.
Conditions for incubating a DF or antibody with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).
The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.
In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention.
Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: a first container comprising one of the DFs or antibodies of the present invention; and one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of a bound DF or antibody.
In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody or DF.
Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or in the alternative, if the primary antibody is labelled, the enzymatic, or antibody binding reagents which are capable of reacting with the labelled antibody. One skilled in the art will readily recognize that the disclosed DFs and antibodies of the present invention can be readily incorporated into one of the established kit formats which are well known in the art.
4. Screening Assay for Binding Agents
Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the fragments and the Streptococcus pneumoniae fragment and contigs herein described.
In general, such methods comprise steps of: contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Streptococcus pneumoniae genome; and determining whether the agent binds to said protein or said fragment.
The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed using protein modeling techniques.
For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.
Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides, for example see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989), or pharmaceutical agents, or the like.
In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or multiple ORFs which rely on the same EMF for expression control.
One class of DNA binding agents are agents which contain base residues which hybridize or form a triple helix by binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.
Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides, and other DNA binding agents.
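As an illustration of the antisense design step described above, the following sketch derives an oligodeoxynucleotide complementary to a chosen region of an mRNA. It is a minimal example under stated assumptions rather than a disclosed procedure: the function name antisense_oligo is hypothetical, the 20 to 40 base window is enforced as a hard limit only for the purposes of the sketch, and the example mRNA is an arbitrary made-up sequence, not one of SEQ ID NOS:1-391.

```python
# Complement table for DNA letters; the oligo is written 5' to 3'.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return the antisense DNA oligo (5'->3') for mrna[start:start+length].

    The mRNA may be given with U or T; the result is the reverse complement
    of the target region, written in DNA letters.
    """
    if not 20 <= length <= 40:
        raise ValueError("length must be between 20 and 40 bases")
    target = mrna.upper().replace("U", "T")
    if set(target) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, U/T")
    if start < 0 or start + length > len(target):
        raise ValueError("target region falls outside the sequence")
    region = target[start:start + length]
    # Complement each base, then reverse so the oligo reads 5'->3'.
    return region.translate(_DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    example_mrna = "AUGGCUAGCUUAGGCAUCGAUCGGAUCC"  # hypothetical sequence
    print(antisense_oligo(example_mrna, start=0, length=20))
```

Choosing which region to target (for example, around the translation start) is a design decision the sketch leaves to the caller.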
5. Pharmaceutical Compositions and Vaccines
The present invention further provides pharmaceutical agents which can be used to modulate the growth or pathogenicity of Streptococcus pneumoniae, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.
As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of Streptococcus pneumoniae or a related organism, in vivo or in vitro," when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth or pathogenicity of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth by binding to an important protein, thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more prone to attack by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of a vaccine based on outer membrane components are well known in the art.
As used herein, a "related organism" is a broad term which refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As such, related organisms do not need to be bacterial but may be fungal or viral pathogens.
The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner, such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount which is effective for treating and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 1 mg/kg body weight and in most cases they will be administered in an amount not in excess of about 1 g/kg body weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into account the routes of administration, symptoms, etc.
The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in, among other sources, REMINGTON'S PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein.
For example, such moieties may change an immunological character of the functional derivative, such as affinity for a given antibody. Such changes in immunomodulation activity are measured by the appropriate assay, such as a competitive type immunoassay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers also may be effected in this way and can be assayed by methods well known to the skilled artisan.
S The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled. To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.
In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.
As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either the physiological effects of each compound, or the serum concentrations of each compound, can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.
The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.
The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.
The agents of the present invention are administered to a subject, such as a mammal, or a patient, in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be "pharmacologically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.
The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in a mixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed., Osol, Ed., Mack Publishing, Easton PA (1980). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.
Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention.
The controlled delivery may be effectuated by a variety of well known techniques, including formulation with macromolecules such as, for example, polyesters, polyamino acids, polyvinyl pyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the macromolecules and the agent in the formulation, and by appropriate use of methods of incorporation, which can be manipulated to effectuate a desired time course of release. Another possible method to control the duration of action by controlled release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization with, for example, hydroxymethylcellulose or gelatine microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules, or in macroemulsions. Such techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).
The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.
6. Shot-Gun Approach to Megabase DNA Sequencing
The present invention further demonstrates that a large sequence can be sequenced using a random shotgun approach. This procedure, described in detail in the examples that follow, has eliminated the up-front cost of isolating and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols.
Certain aspects of the present invention are described in greater detail in the examples that follow. The examples are provided by way of illustration. Other aspects and embodiments of the present invention are contemplated by the inventors, as will be clear to those of skill in the art from reading the present disclosure.
ILLUSTRATIVE EXAMPLES
LIBRARIES AND SEQUENCING
1. Shotgun Sequencing Probability Analysis
The overall strategy for a shotgun approach to whole genome sequencing follows from the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988)) application of the equation for the Poisson distribution. According to this treatment, the probability, P, that any given base in a sequence of size L, in nucleotides, is not sequenced after a certain amount, n, in nucleotides, of random sequence has been determined can be calculated by the equation $P = e^{-m}$, where $m = n/L$ is the fold coverage. For instance, for a genome of 2.8 Mb, m = 1 when 2.8 Mb of sequence has been randomly generated (1X coverage). At that point, $P = e^{-1} = 0.37$. The probability that any given base has not been sequenced is the same as the probability that any region of the whole sequence L has not been determined and, therefore, is equivalent to the fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage, approximately 37% of a polynucleotide of size L, in nucleotides, has not been sequenced. When 14 Mb of sequence has been generated, coverage is 5X for a 2.8 Mb genome and the unsequenced fraction drops to 0.0067 or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing approximately 17,000 random clones from both insert ends with an average sequence read length of 410 bp.
Similarly, the total gap length, G, is determined by the equation $G = Le^{-m}$, and the average gap size, g, follows the equation $g = L/N$, where N is the number of sequence reads (about 34,000 when 17,000 clones are sequenced from both ends). Thus, 5X coverage leaves about 240 gaps averaging about 82 bp in size in a polynucleotide 2.8 Mb long.
The treatment above is essentially that of Lander and Waterman, Genomics 2:231 (1988).
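The Poisson estimates above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original disclosure; the function and key names are hypothetical, and it assumes that the average gap size is the genome length divided by the number of reads, which is how the 82 bp figure appears to have been obtained.

```python
import math

def shotgun_stats(genome_len, read_len, n_reads):
    """Poisson (Lander-Waterman style) estimates for a random shotgun project.

    All names here are illustrative assumptions, not taken from the source.
    """
    total_bases = read_len * n_reads             # nucleotides of random sequence
    coverage = total_bases / genome_len          # m, the fold coverage
    p_unsequenced = math.exp(-coverage)          # P = e^-m
    total_gap_len = genome_len * p_unsequenced   # G = L * e^-m
    avg_gap = genome_len / n_reads               # assumed: g = L / number of reads
    return {
        "coverage": coverage,
        "p_unsequenced": p_unsequenced,
        "total_gap_len": total_gap_len,
        "avg_gap": avg_gap,
        "n_gaps": total_gap_len / avg_gap,
    }

# 17,000 clones sequenced from both ends, 410 bp average read, 2.8 Mb genome
stats = shotgun_stats(2_800_000, 410, 2 * 17_000)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

With these inputs the sketch gives a coverage just under 5X, an unsequenced fraction a little under 0.7%, and an average gap of about 82 bp; the gap count lands in the low 200s, the same range as the "about 240" quoted in the text.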
2. Random Library Construction
In order to approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. The following library construction procedure was developed to achieve this end.
Streptococcus pneumoniae DNA is prepared by phenol extraction. A mixture containing 200 µg DNA in 1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 50% glycerol is processed through a nebulizer (IPI Medical Products) with a stream of nitrogen adjusted to 35 kPa for 2 minutes. The sonicated DNA is ethanol precipitated and redissolved in 500 µl TE buffer.
To create blunt ends, a 100 µl aliquot of the resuspended DNA is digested with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 µl BAL31 buffer. The digested DNA is phenol-extracted, ethanol-precipitated, redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis through a 1.0% low melting temperature agarose gel. The section containing DNA fragments 1.6-2.0 kb in size is excised from the gel, the LGT agarose is melted, and the resulting solution is extracted with phenol to separate the agarose from the DNA. DNA is ethanol precipitated and redissolved in 20 µl of TE buffer for ligation to vector.
A two-step ligation procedure is used to produce a plasmid library with 97% inserts, of which >99% are single inserts. The first ligation mixture (50 µl) contains 2 µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase (GIBCO/BRL) and is incubated at 14°C for 4 hr. The ligation mixture then is phenol extracted and ethanol precipitated, and the precipitated DNA is dissolved in TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete bands in a ladder are visualized by ethidium bromide staining and UV illumination and identified by size as insert (i), vector (v), v+i, v+2i, v+3i, etc. The portion of the gel containing v+i DNA is excised and the v+i DNA is recovered and resuspended into 20 µl TE. The v+i DNA then is blunt-ended by T4 polymerase treatment for 5 min. at 37°C in a reaction mixture (50 µl) containing the v+i linears, 500 µM each of the 4 dNTPs, and 9 units of T4 polymerase (New England BioLabs), under recommended buffer conditions. After phenol extraction and ethanol precipitation the repaired v+i linears are dissolved in 20 µl TE. The final ligation to produce circles is carried out in a 50 µl reaction containing 5 µl of v+i linears and 5 units of T4 ligase at 14°C overnight. After 10 min. at 70°C the following day, the reaction mixture is stored at -20°C.
This two-stage procedure results in a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras or free vector.
Since deviation from randomness can arise from propagation of the DNA in the host, E. coli host cells deficient in all recombination and restriction functions (Greener, Strategies 3 (1990)) are used to prevent rearrangements, deletions, and loss of clones by restriction. Furthermore, transformed cells are plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells.
Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol is added to the aliquot of cells to a final concentration of 25 mM. Cells are incubated on ice for 10 min. A 1 µl aliquot of the final ligation is added to the cells and incubated on ice for 30 min. The cells are heat pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture is eliminated from this protocol in order to minimize the preferential growth of any given transformed cell. Instead the transformation mixture is plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal, 1 ml MgCl2 (1 M), and 1 ml MgSO4 per 100 ml SOB agar. The 15 ml top layer is poured just prior to plating. Our titer is approximately 100 colonies/10 µl aliquot of transformation.
All colonies are picked for template preparation regardless of size. Thus, only clones lost due to "poison" DNA or deleterious gene products are deleted from the library, resulting in a slight increase in gap number over that expected.
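The beta-mercaptoethanol addition in the plating protocol above is a simple dilution, and the stated final concentration can be checked by arithmetic. The sketch below is illustrative only; the helper name is hypothetical, and it assumes the volumes are additive and that the cell aliquot contributes no beta-mercaptoethanol of its own.

```python
def final_concentration_mM(stock_molar, stock_ul, sample_ul):
    """Final concentration (mM) after adding stock_ul of stock to sample_ul.

    Assumes additive volumes; a hypothetical helper, not from the source.
    """
    return stock_molar * 1000.0 * stock_ul / (stock_ul + sample_ul)

# 1.7 microliters of 1.42 M stock added to a 100 microliter aliquot of cells
bme_mM = final_concentration_mM(1.42, 1.7, 100.0)
print(f"final beta-mercaptoethanol: {bme_mM:.1f} mM")
```

Under these assumptions the result is slightly below the nominal 25 mM given in the protocol, which is consistent with that figure being a rounded target.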
3. Random DNA Sequencing
High quality double stranded DNA plasmid templates are prepared using a "boiling bead" method developed in collaboration with Advanced Genetic Technology Corp. (Gaithersburg, MD) (Adams et al., Science 252:1651 (1991); Adams et al., Nature 355:632 (1992)). Plasmid preparation is performed in a 96-well format for all stages of DNA preparation from bacterial growth through final DNA purification. Template concentration is determined using Hoechst Dye and a Millipore Cytofluor. DNA concentrations are not adjusted, but low-yielding templates are identified where possible and not sequenced.
Templates are also prepared from two Streptococcus pneumoniae lambda genomic libraries. An amplified library is constructed in the vector Lambda GEM-12 (Promega) and an unamplified library is constructed in Lambda DASH II (Stratagene). In particular, for the unamplified lambda library, Streptococcus pneumoniae DNA (>100 kb) is partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X Sau3AI buffer, and 20 units Sau3AI for 6 min. at 23°C. The digested DNA is phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at 2 V/cm for 7 hours. Fragments from 15 to 25 kb are excised and recovered in a final volume of 6 µl. One µl of fragments is used with 1 µl of DASH II vector (Stratagene) in the recommended ligation reaction. One µl of the ligation mixture is used per packaging reaction following the recommended protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage are plated directly without amplification from the packaging mixture (after dilution with 500 µl of recommended SM buffer and chloroform treatment). Yield is about 2.5x10^3 pfu/µl. The amplified library is prepared essentially as above except the lambda GEM-12 vector is used. After packaging, about 3.5x10^4 pfu are plated on the restrictive NM539 host. The lysate is harvested in 2 ml of SM buffer and stored frozen in 7% dimethylsulfoxide. The phage titer is approximately 1x10^9 pfu/ml.
Liquid lysates (100 µl) are prepared from randomly selected plaques (from the unamplified library) and template is prepared by long-range PCR using T7 and T3 vector-specific primers.
Sequencing reactions are carried out on plasmid and/or PCR templates using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye terminator sequencing reactions are carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. T7 and SP6 primers are used to sequence the ends of the inserts from the Lambda GEM-12 library and T7 and T3 primers are used to sequence the ends of the inserts from the Lambda DASH II library.
Sequencing reactions are performed by eight individuals using an average of fourteen AB 373 DNA Sequencers per day. All sequencing reactions are analyzed using the Stretch modification of the AB 373, primarily using a 34 cm well-to-read distance. The overall sequencing success rate is approximately 85% for M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.
Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND ANALYSIS, M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press, London, (1994) described the value of using sequence from both ends of sequencing templates to facilitate ordering of contigs in shotgun assembly projects of lambda and cosmid clones. We balance the desirability of both-end sequencing (including the reduced cost of lower total number of templates) against shorter read-lengths for sequencing reactions performed with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer. Approximately one-half of the templates are sequenced from both ends. Random reverse sequencing reactions are done based on successful forward sequencing reactions. Some M13RP1 sequences are obtained in a semi-directed fashion: M13-21 sequences pointing outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to specifically order contigs.
4. Protocol for Automated Cycle Sequencing
The sequencing is carried out using ABI Catalyst robots and AB 373 Automated DNA Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and temperature control robot which has been developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-labelled sequencing primers, and reaction buffer. Reaction mixes and templates are combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (i.e., one primer synthesis) steps are performed, including denaturation, annealing of primer and template, and extension (i.e., DNA synthesis). A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for an oil overlay.
Two sequencing protocols are used: one for dye-labelled primers and a second for dye-labelled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labelled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.
Thirty-two reactions are loaded per AB373 Sequencer each day, for a total of 960 samples. Electrophoresis is run overnight following the manufacturer's protocols, and the data are collected for twelve hours. Following electrophoresis and fluorescence detection, the ABI373 performs automatic lane tracking and basecalling. The lane-tracking is confirmed visually. Each sequence electropherogram (or fluorescence lane trace) is inspected visually and assessed for quality. Trailing sequences of low quality are removed and the sequence itself is loaded via software to a Sybase database (archived daily to 8mm tape). Leading vector polylinker sequence is removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373 are around 400 bp and depend mostly on the quality of the template used for the sequencing reaction. ABI 373 Sequencers converted to Stretch Liners provide a longer electrophoresis path prior to fluorescence detection and increase the average number of usable bases to 500-600 bp.
INFORMATICS
1. Data Management
A number of information management systems for a large-scale sequencing lab have been developed. (For review see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington D.C., 585 (1993).) The system used to collect and assemble the sequence data was developed using the Sybase relational database management system and was designed to automate data flow wherever possible and to reduce user error. The database stores and correlates all information collected during the entire operation from template preparation to final analysis of the genome. Because the raw output of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications which allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.
2. Assembly
An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence fragments was employed to generate contigs. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm which provides for optimal gapped alignments (Waterman, M., Methods in Enzymology 164:765 (1988)). The contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig.
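The hash-table step described above can be illustrated with a short sketch. This is not TIGR Assembler's code; it is a simplified illustration with hypothetical names that indexes every 12 bp subsequence of each fragment on one strand only (reverse complements are ignored) and counts how many 12-mers each pair of fragments shares. The toy sequences are invented for the example.

```python
from collections import defaultdict

K = 12  # oligonucleotide length used for the hash table, as in the text

def build_kmer_index(fragments, k=K):
    """Map each k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index

def potential_overlaps(fragments, k=K):
    """Count shared k-mers for every pair of fragments that share at least one."""
    shared = defaultdict(int)
    for ids in build_kmer_index(fragments, k).values():
        ordered = sorted(ids)
        for pos, a in enumerate(ordered):
            for b in ordered[pos + 1:]:
                shared[(a, b)] += 1
    return dict(shared)

# Toy data: r1 and r2 overlap by 15 bases; r3 is unrelated.
genome = "ACGTTGCATGCCGATAGCTAGGCTAACCGGTTAACGTCAGT"
fragments = {"r1": genome[:25], "r2": genome[10:], "r3": "T" * 20}
overlaps = potential_overlaps(fragments)
print(overlaps)
```

A fragment that appears in an unusually large number of candidate pairs would, following the text, be flagged as likely to lie in a repetitive element; candidate pairs would then be passed to a gapped alignment step for confirmation.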
TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).
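The both-ends constraint described above can be expressed as a small consistency check. The sketch below is an assumption-laden illustration rather than the assembler's actual logic: the `Placement` record and `mates_consistent` function are hypothetical, coordinates are contig positions, and the 1.6-2.0 kb range is taken from the size-selected plasmid inserts described earlier.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    start: int     # leftmost contig coordinate covered by the read
    end: int       # rightmost contig coordinate covered by the read (exclusive)
    forward: bool  # True if the read runs left to right in the contig

def mates_consistent(a, b, min_insert, max_insert):
    """Check that two reads from one template point toward each other
    and span a distance within the library's insert size range."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    points_inward = left.forward and not right.forward
    span = right.end - left.start
    return points_inward and min_insert <= span <= max_insert

# Reads from the two ends of a plasmid clone with a 1.6-2.0 kb insert
fwd = Placement(start=100, end=500, forward=True)
rev = Placement(start=1400, end=1800, forward=False)
print(mates_consistent(fwd, rev, 1600, 2000))
```

A pair that fails this check would suggest either a misassembly or a chimeric clone, which is why the constraint is useful for ordering contigs.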
The process resulted in 391 contigs as represented by SEQ ID NOs: 1-391.
3. Identifying Genes
The predicted coding regions of the Streptococcus pneumoniae genome were initially defined with the program GeneMark, which finds ORFs using a probabilistic classification technique. The predicted coding region sequences were used in searches against a database of all nucleotide sequences from GenBank (October, 1997), using the BLASTN search method to identify overlaps of 50 or more nucleotides with at least a 95% identity. Those ORFs with nucleotide sequence matches are shown in Table 1. The ORFs without such matches were translated to protein sequences and compared to a non-redundant database of known proteins generated by combining the Swiss-prot, PIR and GenPept databases. ORFs that matched a database protein with BLASTP probability less than or equal to 0.01 are shown in Table 2. The table also lists assigned functions based on the closest match in the databases. ORFs that did not match protein or nucleotide sequences in the databases at these levels are shown in Table 3.
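The three-way sorting of ORFs described above amounts to applying two thresholds in sequence. A minimal sketch follows; the function name and the input representation (lists of already-parsed hit values) are assumptions made for illustration, and no BLAST output parsing is attempted.

```python
def classify_orf(nt_hits, protein_hits,
                 min_overlap=50, min_identity=95.0, max_prob=0.01):
    """Assign an ORF to Table 1, 2 or 3 using the thresholds from the text.

    nt_hits: list of (overlap_length_nt, percent_identity) for BLASTN matches.
    protein_hits: list of BLASTP probabilities for the translated ORF.
    Both input shapes are hypothetical conveniences for this sketch.
    """
    if any(length >= min_overlap and identity >= min_identity
           for length, identity in nt_hits):
        return "Table 1"   # matches a known nucleotide sequence
    if any(prob <= max_prob for prob in protein_hits):
        return "Table 2"   # matches a known protein
    return "Table 3"       # no match at these levels

print(classify_orf([(450, 96.0)], []))
print(classify_orf([(40, 99.0)], [1e-20]))
print(classify_orf([(120, 80.0)], [0.5]))
```

The order of the tests matters: the protein comparison is only consulted for ORFs that have no qualifying nucleotide match, mirroring the sequence of searches in the text.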
ILLUSTRATIVE APPLICATIONS
1. Production of an Antibody to a Streptococcus pneumoniae Protein
Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows.
2. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 (1989).
3. Polyclonal Antibody Production by Immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Wier, ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, Washington, D. C. (1980).
Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. In addition, antibodies are useful in various animal models of pneumococcal disease as a means of evaluating the protein used to make the antibody as a potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic or immunoprophylactic reagent.
4. Preparation of PCR Primers and Amplification of DNA
Various fragments of the Streptococcus pneumoniae genome, such as those of Tables 1-3 and SEQ ID NOS: 1-391, can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.
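The primer selection preferences above (minimum length, matched G/C content, similar melting temperatures) can be checked mechanically. The sketch below is illustrative only: the helper names, the example primer sequences, and the 4 degree tolerance are assumptions, and the melting temperature is estimated with the simple Wallace rule (2 degrees per A or T, 4 degrees per G or C), which the source does not specify.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer):
    """Rough melting temperature estimate (Wallace rule), in degrees C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def primers_compatible(fwd, rev, min_len=15, max_tm_diff=4):
    """True if both primers meet the minimum length and have similar Tm.

    max_tm_diff is an assumed tolerance, not a value from the source.
    """
    long_enough = len(fwd) >= min_len and len(rev) >= min_len
    return long_enough and abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff

# Hypothetical 18-base primers, invented for this example
fwd = "ATGGCTAAAGGTGAACTT"
rev = "TTAGCCTTCAGCAATACG"
print(wallace_tm(fwd), wallace_tm(rev), primers_compatible(fwd, rev))
```

More accurate nearest-neighbor estimates exist, but for a quick comparison of two primers of similar length the simple rule is usually enough to spot a badly mismatched pair.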
5. Gene Expression from DNA Sequences Corresponding to ORFs
A fragment of the Streptococcus pneumoniae genome provided in Tables 1-3 is introduced into an expression vector using conventional technology. Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U. S. Patent No. 5,082,767, incorporated herein by this reference.
The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Streptococcus pneumoniae genome fragment. Bacterial ORFs generally lack a poly A addition signal. The addition signal sequence can be added to the construct by, for example, splicing out the poly A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Streptococcus pneumoniae DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Streptococcus pneumoniae DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding Streptococcus pneumoniae DNA 3' primer, taking care to ensure that the Streptococcus pneumoniae DNA is positioned such that it is followed by the poly A addition sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A addition sequence and digested with BglII.
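The primer design described above, with a PstI site on the 5' primer and a BglII site at the 5' end of the 3' primer, can be sketched as follows. This is an illustration under stated assumptions: the function names, the 18-base annealing length, and the example ORF are hypothetical, and the extra flanking bases often added upstream of a restriction site to aid digestion are omitted. The recognition sequences used are the standard ones for PstI (CTGCAG) and BglII (AGATCT).

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def design_cloning_primers(orf, anneal_len=18):
    """Return (5' primer, 3' primer) carrying PstI and BglII sites.

    anneal_len is an assumed value chosen to match the preferred
    minimum primer length of 18 bases.
    """
    orf = orf.upper()
    five_prime = PSTI_SITE + orf[:anneal_len]
    three_prime = BGLII_SITE + reverse_complement(orf[-anneal_len:])
    return five_prime, three_prime

# Hypothetical ORF, invented for this example
orf = "ATGAAACGTCTGGCTATTGCAGGTACCGATCTGTTAGCGTAA"
five_primer, three_primer = design_cloning_primers(orf)
print(five_primer)
print(three_primer)
```

In practice one would also confirm that neither recognition sequence occurs inside the ORF itself, since an internal site would be cut during the digestion steps.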
The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).
The protein is preferably released into the supernatant. However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or expression may be restricted to the cell surface. Since it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Streptococcus pneumoniae DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Streptococcus pneumoniae DNA.
Alternatively, and if antibody production is not possible, the Streptococcus pneumoniae DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as, for example, a globin fusion. Antibody to the globin moiety then is used to purify the chimeric protein. Corresponding protease cleavage sites are engineered between the globin moiety and the polypeptide encoded by the Streptococcus pneumoniae DNA so that the latter may be freed from the former by simple protease digestion. One useful expression vector for generating globin chimerics is pSG5 (Stratagene). This vector encodes a rabbit globin. Intron II of the rabbit globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., cited elsewhere herein, and many of the methods are available from the technical assistance representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptides of the invention also may be produced using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).
While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.
All patents, patent applications and publications referred to above are hereby incorporated by reference.
.e Page(s) 4152U4Qiare claims pages they appear after the sequence listing S. .nuo a coin rein conainn a no a seaauen*es 437~ S pnemoia -gI475 Cteodgros cnuonann knownd seeine sufxd euas s ad 92056 I I I Ijhomoserine kinase homolog ithril genes, complete cdsIIII 2 I5 16169 15720 1gb1U040471 IStreptococcus pneumoniae SSZ dextran glucosidase gene and Insertion 96 45045 I~ I 1 sequence_151202_transposase gene,_complete cdsIII 4 6T 659-*-167- *embl8333-1--,-P---IIS--- -eumoniaedexB----- BCD-EF.GI,-TJ.K1g-e-es-dTDP-ri-amo- 4- 4- 6 2 6927 j 6167 jembIZ8333SISPZ8 IS.pneumoniae dcxli. capltAli.C.D.E.F.G.ll.I.3.KI genes. d1'DP-rlamnose 94 6461 I I I I Ibiosynthesis genes and aliA gene I II 46 12 I 104I977 961 J- embj283335j5PZ8 jS.pneumoniae dcxli ScaplIA.B.C.D.E.F.G.H.1.J.KI genes. dTDP-rhamnose 91 62462 I I I I Ibiosynthesis genes and aliA gene I I I9 03- 4-132019-*g-l---5261----*Strptcoccus--pne---- 4 4 11049 961 1mbi133315P5 S~neumoniaedxl, aiA ee. TJ-hans 9 I112 b ios nhess gene and aene, genel d. 3 153 11546 112019 gbIU43S261 jStreptococcus pneumoniae neuraminidase B Inane) gene, complete cds. and 1 99 91 918 I I I I I Ineu raminidase (nanA) gene, partial cdsj I I 3 114 1121 113371 igbIt143526I IStreptococcus pneumoniae neuraminidase Bi (nanlil gene, complete cds, and 99 841394 13591 I I 1 I1 neuraminidase (nanA) gene, partial cds III I 4- ;141 143 bU32I Streptococcus pneumoniae neuraminidase li (nanil gene, complete cds. and 99 211 181 I I I I I neuraminidase (nanA) gene, partial cdsIIII f 3 16 14297 11517 jgbIu43S26I IStreptococcus pneumoniae neuraminidase B (nanil gene, complete cds, and 99 8383 I I I I IIneu raminidase InanA) gene, partial cds I 1 01 I 'l I 8 11267 118437 IgbjU41356 Streptococcus pneumoniae neurmide B t ionnsloi ge e cplee cdsA and 81 1069 11317 I 4 I I 46 118 I~ Yl6 ISD jhrooccusn pinehmoae dnaG r eos, cpopene and O? 
n 1314 4 732 7 19 529 IembIYII4631IS ISte ocu pneumoniae d nrt po poAu ene s and38 (966 anp) 99 j 860 j 132 6 I7 12 764 1eb1777261SPIS IS.pneumoniae DNA for insertion sequence 1S131 (1372 bpl 1 99 4523 6 23 12322 I9770 IembIZ77725ISPIS IS.pneumoniae DNAB for insertion equence1181 (966s bTp -ramIs 96 460 495 7 11 835 782 embZ83351PZ8 S.pneumoniae desli, cspllA lC.D,E,F.,ll1.1,J.Ki genes, dTDP-rhamnose 95 62 62 I I I I biosyitlesis genes3 and aiA gene6264 4 4 4 0 0* 49 44* TABE IS. pneumoniae -Coding regions containing known sequences m Coti IOR c Str Stp jathma&tch-gene -name percentl 145 nt ORF n ID jiD j (JIt) I (ntl I cession I I ident Ilength jlengl Ox I ii I 9024 j8206 jembIZ8333SjSPz8 S.pneumoniae dexli, capilA.8.C.D.E,P.U.Fl.1.J,Kj genes, dTD)P-rhamnoae 1 95 819 819 j biosynthesis genes and aliA gene I 1 113 930 8078 IbL921 IStreptococcus pneumoniae methyl transferase imtrl gene cluster, complete j 93 5312 I I I I cds 51I I 'll I 1 2 j548 19i9 1emb157969i iSOOn jS.pneumoniae yorflA.8.C.D,E1. ftsL. pbpx and regH genes I 99 316 372 4 11 5 j 3040 1 190 embiZ7969llSOOR IS.pneumoniae yorflA ,C.D.EI. ftsL, pbpx and regR genes I 99 259 4389 F 4 1 11 1 6 13480 13247 lembIZ7969lISoOR jS.pneumoniae yorflA.8.C.D.El. ftsL. pbpX and regR genes j 99 234 234 4 11 17 13601 -4557 ;embIlZ79691l500R IS.pneueoniae yortIA.B.C.D.EI. ftsL. pbpX and regRl genes 957 i 957 T 4 I it 1 4884 7 142 lembIXlfi367lSPPB jStreptococcus pneumoniae pbpX gene for penicillin binding protein 2X1 99 1 2259 1 22591 I 1 10 I7132 8124. 1emb1X1636715PPe IStreptococcus pneumoniae pbpX gene for penicillin binding protein 2X 1 98 70o 993 1 13 1 j 53 j 1126 lgblH3l296l IS.pneumoniae recP gene, complete cda 99 I 437 1074 24 3 1 1837. 2148 1emb1Z8333515PZ8 S.pn~umoniae dexi. capllA.B.C.D.El'.a.H.1.JKI genes. 
drOP-rhamnose 879632 I I Ibiosynthesis genes and aliA gene 9I 11I.1 SA-CAR-syt-hea-e-- I I I II (purCl genes, complete cds I 1 1 4 4- 4 Is 9 1 92.181 ibU929 trpoccu pemnietpe1Fcapsular polysaccharid .e biosynthesis 89 340 432 I I .1 Ipartial cds1 17 7 1.3910 I3458 jembIZ77726ISPlS jS.pneumoniae DNA for insertion sequence 151318 (112 bpl 1 98 4S 453I 17 8 14304 j 3873 1emb1Z7772715P15 jS.pneumoniae DNA for Insertion sequence IS1318 1823 bp) 96 382 432 I 19 111 41 1529 jembIX949O91SPIO IS.pneumoniae iga gene 7 5 368 j 489 4 1 19 12 1554 1 75 gbIL077S2j jStreptococcus pneozsoniae attachment site lattli. DNA sequence 99 167 204 I 19 3 j946 1827 IgbIL077521 Streptococcus pneumoniae attachment site (attLSl, DNA sequence I 94 100 j 882 4- -0 -377- 182----gb lI033315t1 Streptococcus e pneumoniae orrIL gene, partial cds, competenceo stimulating 99t 756 756t-in-g----- 7-56-1 ?5-6-1 I I I Ipeptide precursor (coed), histidine protein kinase lcos'DI and response I I I Iregulator (comEl genes, complete cds, tRNA-Arg and tRNA-Gin genesI 4 20 12 12271 1931 gbU111 IStreptococcus pneumoniae ortL gene, partial cds. competence stimulating 1 98 13134 I I Jgu3331I peptide precursor lcomCl. histldine protein kinase lcomD) and response 11 1 1 1 1 regulator (comE) genes, complete cds, tRUA-Arg and tRNA-Gin genesII 4 TAB E. S. .nuona Codin rein cotinn know sequences.
C CoTABLER I pneumonia -S Coin regon cotiig nw eqec ID JID I nt I nt) I acession I ident I length Ilength, -4 4- I 20 3 375 684Stretocccu pnumoniae competence stimulating peptide precursor comc f 9 492 492 I 1 gblu76218j (~CO14C), histidine kinase homolog ComD (coml)). and response regulator I- I homolog ComE (comZ) genes, complete cds
j4 3322 J4527 lgblAF~OO6581 Streptococcus pneumonlae R801 tRLNA-Arg gene, partial sequence, and puaie 99 1206 beta subunit of DNA polymerase III (spdnan) genes, complete cds PuaveI Streptococcus pneumoniaa 11801 tRNA-Arg gene, partial sequence, and putative 9 7 7 ~o 543 JebIAooo~s8I erine protease (sphtra), SPSpoJ (apspoJ), initiator protein (spdnaa) and j I I I I i Ibeta subunit of DNA polymerase III iapdnan) genas, complete cdsI 20 6 53 97 IbA0051 Srpo o neumoniae R801 tltNA-Arg gene, partial sequence, an uaie 99 1386 1386 6 953 J 617 gbIAooos8I srie ro tease (sphtra). SPSpoJ (spspoJl, initiator protein (spdnaa) andI I I I I beta subunit of DNA polymerase III (spdnan) genes, complete cds 4-4 4 7 6995 18212 IbA0 651 Istreptococcus pneumoniae R801 tRI4A-Arg gene, partial sequence, and putative 99 1218 1218 jgbIA0006~j Iserine protease (sphtra), SPSpoJ lspspoJ). initiator protein (spdnaal and I- 4 beta subunit of DNA polymerase III ispdnan) genes, complete cds 20 a 24 47 1bAF0651 Streptococcus pneumoniae R801 tRI4A-Arg gene, partial sequence, and putative 98 25825 I 2(4 8471 jgbIF00068I Iserine protease lsphtra), SPSpoJ (spspo2), initiator protein fspdnaal I I I I I Ibeta subunit of DNA polymerase III (spdnanl genes, complete cdsII 853 970 9bAF00651 trp~cocu pnumnle 80 tUAAr gne prtalsequence, and putative 9 j 854 9670 fgiAVOOO6e i nepoes tetooc sp trla, PotJ4Ar (p o nitatr protein (spdnaa) and 1413 I I I Ibeta subunit of DNA polymerase III (spdnan) genes, complete cds 1 22 1 1187- 1267 e- nerton- uece-S131- -172- 2 3 i-i-i-4 I 22 115 112708 112256 IemblZ77727lSPIS IS.pneumoniae DNA for insertion sequence IS1318 (823 bp) I 97 I 353 I 453 22 11 1315 12662 1embIZ777261SPIS IS.pneumoniae DNA for insertion sequence IS1318 (1373 bp) 98 504 504 13 1 98 181, b 8111P8 S.pneumoniae genes encoding galacturonosyl transferase and transposase and 95 I 463 513 I 18 I I Insertion-sequence IS1515II I 2 24 118829 119299 jembIz861121SPZ8 inertoniseqgene encodin galacturonosyl 
transferase and transposase and J9 41 71 1 23 1 5 1 5624 1 4203 lembIX52474lSPPL IS.pneumoniae ply gene for pneumolysin 1 99 1 1422 1 14221 I 23 16 16063 1 5629 lgbIH177171 IS.pneumoniae pneumolysin gene, complete cds I 98 j 197 435 1 26 1 1 15500 1 2 IemblX94909ISPIG IS.pneumoniae iga gene 1 87 3487 5499 1 26 1 5823 15584 jgbIU476871 IStreptococcus pneumoniae immunoglobulin Al protease (iga) gene, complete 1 99 j 151 240 I I I I1 cds I II 2 6 3 1 6 7 1 5 8 g I 4 6 7 4 I s r p o o c s p e m n a m u o l b l n A p r t s e e e o p e e1 0 1 I I cds I1 -0 -9 5 0 *e *eso* TABLE I TABLE I S. pneumoniae Coding regions containing known sequences Contig jORFa Start I Stop I match match gene name ID ID (nt) (nt) (aession percent HSP nt ORF nt I Int int acession ident length length 26 4 148- 4 j 26 8 14498 114854 lembjZ833351SP78 Sbpoeumone dexB. capljA.8,C. DEEFGH. K genes, dTDP-rhamnose 99 gD 357 1 1 1 1 1 biosynthesis genes and llA gene 8 2 03 5 b I I jbiosynthesis genes and aliA gene 6 2 .8 1517 IgbIU04047 IStrentococcus pneumoniae Sf dearn lcsia-geead netLnI24+5 sequence IS1202 transposase gene, complete cds I1 51 a 28 80 505 Iem1Z6335srz S~neuonie dancaPlA.eB..E. FGH IJK genes dtLP-rhaeose jo 99 j a 42 426 S 1. bblosynthesis genes and aliA gene
1248 7 2-b L0- -11 1Streptococcus pneumoniae mS sdextrak (lucosmdase gene and inerio 97 28 I 5-1-03-_ Ie952sequence IS1202 transposase gene complete ds 2 0 8 0- 1 29- jIg jU O4 O047 j- S tr e p t -c o cc u a p n e u o n i a e-SZ-dea r---gl c o s i d s e-gen-and-isertio-j- 0- S I 7 gbU47 StreptococcusS detran glucosidase gene and insertion 96 III sequence 11202 transposase gene, complete dcds 1 I 58 0 nhsee -3 1 207 152 IgbLUO861j IStreptococcus pneumoniae maltose/maltodetrn Uptake malX) and two 99 1317 1317 I I j maltodextrin permease s C maGiC and maIKN genes, complete cds n0 3 1 14 137 Igb8LOa6I8 Istreptococcus pneumoniae maltsose/ealtodetrin uptake Imaixi and to 96 795 891 maltodeatri,, permease iaiC and maim genes, complete cds 3 9 0 b8 IStreptococcus pneumoniae malA gene complete cds; aR gene complete cds 96 446 828 34 I 14 2790 12647 IgbIL20569j IStreptococcus pneumoniae mal. gene, complete cds maL gene, complete cds 98 137 144 N--s 38Ltococcus peumoniae malA gene, complete cds; maim gene, complete cds 96 999 999 4 pneumoniae pepidemethionine sultoxide 258uctase fmsrA) and 13 101 111 -V93 3(20 luoJJ21 8rin h inasepPmolog hril genes, complete cds 93 j I 1176 1439 IembIZS33JSPazn S.pneumonlae den. caplA.C.D.o.s.onlJXI genes. dTDia-rhamnose j 87 248 j 264 biosynthesis genes and 1iA a Fcapsular polysaccharide biosyntiaeis 98 264 504 ope on cps~fA CDE GIIJKL NO) genes, complete cds, n l~ e e I I I I ~partial cds a J..1458.. 196! genes,- complete- 35 117 16172 115477 IembIxns7nl1SPrc IS.pneumoniae dean, cpsl4A, cps148. cpsI4C, cpsl4D, cps4. cpsI4F, cpsl4o, 9 I 696696 I I I I j cpsl4-i cpsI4I, cpsl4j, cpsl4(, cpa14L. tact. genesI
a 135 8 116961 16170 lemblzn333sIspza IS.pneumoniae dea, capliA.C.D.E.F. GHlfIJK genes, dTDP-rhauose 86 792 7921 4 -a a I I I Ibiosynthesis genes and aliA gneI 117620 3S 11 116 16871 IgbIuO92391 IStreptococcus pneumonlae type 19F capsular polysacciaride biosynthesis 1 83 750 750 operon. cpsl9fACDEFGiaJKL~.aO) genes, complete cds, and ali gene partial cds g
TABLE I S. pneumoniae Coding regions containing known sequences IID ID I ntl I nt) acesonConi OR tr tp wth jmthgn aeidrent I lispgth I lengt 3- 2 0- 190---176-- 1 35 j0 1101 1164 Imb1X85787I5PCv lSpneumoniae dexe, cpsi4A' cpsl4B, cpsi4C, cpsl4D, cpsi4E. cpsi4F, cpsl4G, 94 1458 j 1458 4 II I I I I cpsl44, Cps 4 cpsI4J, cpsi4X, cpsl4L tasA genes I1 60 0 -4 1, 4 4- 4 g- 0 -a 6 1 S -r e p-t -c -c c u a -p u -o -j e -u r -c -a n-t 1 -e -v -r I -n L -e cu a r- -J a A I I I I kDa protein genes, complete cds. and ORFI gene, partial cdsI
136 101941196IbU39 Streptococcus pneumoniae surface adhesin A precursor IpsaAj gene, complete I 99j 6961 :1111-~ 4 37 1 2743 117 emb1Z677391sppA IS.pneumoniae parC, parE and transposase genes and unknown onf 99 1 2565j 25651 4- 4- i 2 i 985- 2824 IembiZi77391sPPA IS-pneumoniae parC, parE and transposase genes and unknown orf 1 100 162 162 I I 50341 3070 IembIZ67739ISPrA IS.pneumoniae parC, parE and transposase genes and unknown on I~peuonaeparC, parE and transposase genes and unknown orf 1 99 1 1657 65 196 3 5 61115833 IembIZ677391SPPA IS.pneumoniaa parC. parE and transposase genes and unknown orf I 96 339 1 339 1 1 38 119 112969 113268 1gb1H2a6791 IS.pneueoniae promoter region DNA 1101 6 0 2 1256 12137 19bIU41?35I (streptococcus pneumoniae peptide methionine sulfoxide reductase imsrAj and 1 9I 86 8 I 1 I 1I homosarine Icinase homolog (throl genes, complete cds I I
39 j 2405 13370 IgbIU417351 Streptococcus pneumoniae peptide methionine sulfoxlde reductase lmsrAl and 1 99 1 966 j 66 00 I ~homoserine kinase homolog thrl genes, complete cds J9 j 5253 17208 IgbIl129686j IS.pneussoniae mismatch repair (hexa) gene, complete cds I 99 j 1956 1956 41 1 3 137IebIl7o7s~E Is.pneumoniae recA gene encoding RecA 99 1027 1035 -41- I 1 I Igenes, and downstream sequences
41 3 08 j445jgji181~ S.pneumoniae autolysin (lytA) gene, complete cds 1 99 I 963 j 963 1 41. 14 1 3272 13096 IgbIM13812I IS.pneumoniae autolysin ilytA) gene, complete cds j 100 177 177 41 15 1 3603 13860 jgbjN138l21 IS.pneumoniae autolysin (lytA) gene, complete cds 1I 0 25825 41 j6 j4755 j 5162 Igbj[L 66601 Streptococcus pneumoniae ORF, complete cds 1 98 408 408 T i I 41 17 I5270 J 5716 IgbIL366601 Streptococcus pneumoniae ORF, complete cds j 98 447 447I -4 41 j8 6112 j 6918 1gb1L366601 IStreptococcus pneumonlae OR?, complete cds j 98 431 j 807 41 19. 1 6916 1 7119 IgbjL366601 IStreptococcus pneumoniae OR?, complete cds j 100 I 204 204 41 1 02I7660 jgbIL36660j IStreptococcus pneumoniae OR?. complete cds 1 97 j 552 579 Il 41 11 I 7680 1 9 9 IgbJL3666Of Istreptococcus pneumoniae OR?, complete cds j 98 1 81 j 300 1 41 112 1 9169 18717 IembIZ77727IsPIS IS.pneumoniae DNA for insertion sequence 1S1318 823 bp) 1 97 353 I 453 TABLE 1 S. pneumoniae -Coding regions containing known sequences Contig I0RF start IStop match Imatch gene name I percentl HSP nt 01fF ni I 41 j13 9533 9132 IeinbIZ77725ISPIS IS.pnewaoniae DNA for insertion sequence 1i381 (966 bp) 95 I 160 I 402 4 41 11 1 796 1975 IemblZ820011SPZB IS.pneumoniae pcpA gene and open reading frames 10 199 196656 4 t I 44 16 18059 1 7607 IembIZ777261SPIS IS.pneumoniae DNA for insertion sequence 1S1318 (1372 bp) 97 I 453 I 453I I I7 j8423 18022 IembIZ7772SISPIS 1S.pneumoniae DNA for Insertion sequence 151381 (966 bp) I 95 j 160 j 402 I 44 1 8 18559 1 8365' IembIZ820lIjSPZ8 IS.pneumoniae pcpA gene and open reading frames I 100 189 195 11 48 9 IL 6 4 80 14687 IgblL390741 IStreptococcus pneumoniae pyruvate oxidase ispxB) gene, complete cds 99 I 1794 j 1794 -4 49 12 1231 12603 IgbIL2OS611 IStreptococcus pneumoniae Exp7 gene, partial cds 100 216 2373 4- 53 16 120 16 IbU441 Streptococcus pneumoniaeSSZ dextran glucosidase gene and insertion 97 2421 252 53 j 6 2407 156 1 bjU0447J Isequence IS1202 transposase gene, complete cdsII I 
4 I~ I 1 biosynthesis genes and aliA gene I 416 4 I 53 a 8 2831 2475 lembIZO333SISPZ8 S.pneumoniae dex8; cap~IA,S.CD.E.F.GIi.1.J.KI genes. dTOP-rhamnose 99 I 338 357~ I I I biosynthesis genes and aliA gene III I 54 1I3 112409 111105 IembIZ8333SISPZS S.pneumoniae dexB. capl(A.B.C.D EF,OGi1.3.JKi genes. dTOP-rhamnose 67 591 11051 I j j. j biosynthesis genes and aliA gene III 55-- 22 0--8 -11994'9 lembIZ84379111SZ8 IS.pneumoniae dfr gene (isolate 92) 99 I 540 540j I 61 11I 111864 1 9909 lembIZ16O82IPNAL IStreptococcus pneumoniae aliB gene 1 98 1965 1965s I 63 1 3 1 239 IgbIM1l87291 IS.pneumoniae Mismatch repair protein (hexA) gene. complete cd3 100 237 j 237 63 -j 2 33 1 2611 IgbIM187291 JS.pneumoniae ismatch repair protein thexAl gene, complete cds 1 99 1 2330 1 2379 #4 I 63 1 3 1 255-7 2823 IgbIN187291 jS.pneumoniae mismatch repair protein ihexAj gene, complete cds 99 1 266 j 267 I 63 14 12958 14664 Igb1H187291 IS.pneumoniae mismatch repair protein ihexA) gene, complete cds 1 95 1 69 1 1707 4 67 7 7161 I4171 IgbIL206701 iStreptococcus pneumoniae hyaluronidase gene, complete cds 1 99 1 2938 1 2991 V 4T 1 0 1 702 IgbIHI434OI Is.pneumoniae Dpn1 gene region encoding dpnC and dpnD. complete cds I 100 693 7 -02 I 70 I2 678 1 116 0 IgbIM1434OI IS.pneumoniae Dpnl gene region encoding dpnC and dpnD. complete cds j 100 1 483 1 483 4 70 I3 1 2490 1 1210 igbIlii43391 fs.pneumoniae Dpnll gene region encoding dpni. dpnA, dpns., complete cds I 98 j 462 1 1281 70 1 7 4230 14424 igbIJO42341 Is.pneumoniae exodeoxyribonuclease (exoAl gene, complete cds 1 99 147 1 195 4 4 4 I 70 18 15197 14316 IgbIJO42341 IS.pneumoniae exodeoxyribonuclease lexoA) gene, complete cds 99 1 881 j 882 T A B L I S. 
999m a co in re i n 9on ai9n kn99 99que99es TABL 12 2s.6 284 mI662Spneumoniae oin egos -cotiigkow eune p e r c e nt9s3 1 3 1 8710 j3 987 jbI62OS IStrppcspneumoniae ExpBo gene pata1csI93 3334 j 976 f ft4 -4 ft 4- 73 13 135 1 I lebIZ268SISPA js.pneumonlae DN(11222 s (pl) genes forATmspasuun t Te b su ni an1T~s 99 102 1 06 S- 4-64- 5-79- g -I 36180 S-re-tococcu-- o-ae t-a-ns -o -t- 77 3 J 2 1 j4 1 1339 1emb1X63325PZ8 S.rneumoniae mesA-Bo caIABCDEFG1,,,jgns TPrans 91 j 2 193239 f-t f- f I3 bi 65 7 jbJ4 snei ene an polmeraen(oA ene opeecsI9 62j28 f- f I 77' 4 3 34 2523 199 emb1Z8333515rz8 S.pneumoniae dexB. caplIA.8.C.D ,E ,Fr .o J,~a Kj genes. dfl7P-rhaenose I 95 624 819 I 1 1 1 biosynthesis genes and allA geneII ft S ft. 78 1 14133411 253 1embIz833351SPz8 IS.pneumoniae (RxB) cap/la.BcoeroIa genes. 78 1 2 j 1095 1 325 1embIx772491SPR6 IS.pneumonlae IR6) claR/cilN genes I 99 771 771 0' 826 110 111436 110816 IgbIU90721I IStreptococcus pne..moniae signal peptidase I ispil gene, complete cds 1 97 1 621 621 C 82 1II 112402 111434 1gb1U935761 Streptococcus pneumonlae ribonuclease HI! irnhe) gene, complete cds I 98 1 953 j 969 82 112 112381 112704 jgbju93S761 IStreptoc~ccus pneumoniae rlbonuclease Nil irnhai gene, complete cds 100 51. 324 I 3 I8 3212 3550 1emb1Z7772715P15 IS.pneumoniae DNA for insertion sequence 1S1318 (823 bp) 97 290 339 83 1 1 662 1 681 IbIN6181 I~repococu3pneumoniae transposase, (comA and comB) and SAICAR synthetase 99 1 2190 29 8 11 I621 685 II 4310 I1Suc) genes complete cds III ft- 83 Ill 6849 18213 IgbjN36l8Oj jStreptococcus pneumonlae transposase. (comA an d comB) and SAICAR synthetase j 9 I 1365 1365 I I I I I ipurc) genes, complete cds 1 99 1 f- S I 83 112 18236 j 9090 fgbIN36l8OI IStreptococcus pneumoniae transposase, (comA and comB) and SAICAR synthetase j 99 j 85 5 855 I I I I 1 ipurc) genes, complete cdsIIII t- ft- 83 113 j 9283 j13017 IabjLlSl9OI Istreptococcus pneumoniae SAICAR synthetase ipurCi gene. 
complete cds 100o 107 3735 8-3 12-3-1,-2-1-7 1247---- L3 ft 8J2 236 2 3 4 5 0 cdIL623 srpoocspemn beta-N-acetylhesosammnidase istrHi gene. complete 98 172 18 83 125 1257 23505 IgbIL369231 Istreptococcus pneumoniae beta--acetleaolnda 4- e omlte 9 1 321 41 272 cI yIm l s Isrl gee copet 32 J 42 -4 -ft- T A .L c *nu o a C o i n r e i n c o n ai i n k n w *eu e cc *oti ccR cc ct r Sto ma c *a c *en nam cer c TABIlE)I(t I pesonia Codin regon cotinn knw seuece I 83 126 128472 127771 jgbjL36923j Streptococcus pneumoniae beta-N-acetylhexosaminidase (strll gene, complete 99 1 702 1 1 cds III 845 1 6173 IembIZ83335jSPz8 jS.pneumoniae dexB. cOPiIA.B.C.D.E.F.G.H.1.J.KI genes, dTDP-rhamnose 8 67 1 0 I I I I I ~biosynthesis genes and aliA gene98 i 67 j 12 87 6 591 516 1eblZ7772515815 jS.pneumoniae DNA-.for insertion sequence 151381 (966 bpi 96 439 j 636 6-T f f- -t f 88 5 125 51 IbM681 Streptococcus pneumoniae transposase, 1comA and come) and SAICAR synthetase 94 '555 555 88 S 297 j35 1 Igl3Io purC) genes complete cdsIIII 88 346Streptococcus pneumoniaectransposase. (comA and camell aqd SAICAR synthetase 94 80)4 j 804 1 t4 10 1 (pu rCl genes complete cdsIII I 89 Ill 1 9878 110093 IgbIHi361801 Streptococcus pneumoniae transposase. 
(comA and come) and SAICAR synthetase I 97 I 211 11 1 I 1 (purCl genes, complete cdsIj21 89 114 110062 1141 -ebZ335S S.pneumoniae doe, caplIA.e,C,D.E,F.G,ltL.J.gI genes, dTDP-rhamnose -t -t I01 Iem I jsj I~ biosynthesis genes and aliA geneI 1 93 110 1 5303 14941 IembIX63602Isveo IS.pneumoniae mmsA-Box 89 1 2317 363 f 97 j4 (708 11520 Ib '1 5 IStreptococcus pneumoniae peptide methionine sulfoxide reductase (msrAl and 91 1418 I homogeUnel735 1 homolog (thre) genes, complete cds 1TD1rha"1s1 99 j 1 8 0 I835SZ ~nuoiedxcapllA,e,C.,n .,jc genes, jTPramoe9 592 612 I 1 9 70 (mls3SSz biosynthesis genes and aliA gene I 1 I 9 I2 1773 I715 1emb1733715rAn jStrepococcus pneumoniae ami locus conferring aminopterin resistance 1 99 I 998 j 999 I ft 1 99 1 3 1 2794 11712 IembIX17337ISPAN IStreptococcus pneumoniae aol locus conferring aminopterin resistance 1 99 10831 1083 1 99 1 4 1 3732 j 2788 IlinbIXI73I7SPAN IStreptococcus pneuinoniae ami locus conferring aminopterin resistance 100 945 1 945 I I 99 1 5 5249 13714 IembIXI7337ISrn4 Istreptococcus pneumoniae sini locus conferring aminopterin resistance 1 100 j 1536 j 1536 I 9 6 j722 5277 1emb1X1733715M IStreptococcus pneumoniae aol locus conferring aminopterin resistance 1 99 I 1986 j 1986 ;__101 1-1-58-ebX42SSE pnunna epuA and endA -genes for 7 kDa protein and membrane 9 j 146 1323 I I I I j Iendonuclease II 101 2 1149f179 X542- gnesfor7ft--rotin-nd-emran-9922- -2 12----492-j-1719-Ie-bIXS422SI-PEN----neu--on----epuA-a----ndA-genes-for-7---a-protein-and-membraneI---228-22 I I j jeidonucl ease
64 1855 jembjXS4225jSPEN Is.pneumoniae epuA and endA genes for 7 kDa protein and membrane j 100 162 162 I I I Iendonuclease
1 101 4 1701 1 2582 IembIXS422SISPEII IS.pneumoniae epuA and endA genes for 7 kDa protein and membrane 1 100 882 j 182 I I I I I endonuclease I I I 1 103 17 15556 15041 1emb12.959l415rz9 Istreptococcus pneumoniae sodA gene i 100 396 516 -ft 1 104 12 11347 11556 1emb1Z777271585S jS.pneumoniae DNA for insertion sequence 151318 (823 bpi I a) 206 j 210 4 TABL I 9 ,9 *9 9 9 9 9S9 **uo a *oin region cotinn knw eune 6089 S 57 mZ679S ISpneumoniae -ac Codin rgoscn an krno seenes an ukow r1 98871 4- I 107 1 4 1 75381 180 lembI1677391PPA jS.pneuzmoniae peArc pen d t1npss ee n nnw r 98 1 72 105 j j609 498 embI1673915PPA js.pneumoniae pArC pen d trnp1s gene an unknow or209471 -4-4 107 J6 4981 5595 ICmbIXL3l36ISPPE JStreptococcus pneumonia. penA gene for penicillin binding protein 28 91 107 615 4 I 108 19 19068 1 8718 1emb1267-73915P2A IS.pneumoniae parC, parE and transposae genes and unknoon orf 1 95 j 342 J 351 4 I 108 112 11!1308 110922 IembIZ67739ISPPA fS.pneumoniae parC, parE and transposase genes and unknown orf* 1 99 1 199 387 I 09 I3 76 24 lm 1 7251SPIS jS.pneumoniae DNA for Insertion sequence 1S1381 (966 bpl 96 1 61 528 109 14 12688 12855 1emb1Z7772615P15 IS.pneumoniae DNA for insertion sequence 1S1318 (1372 bp) 96 j 148 1 168 109 5 12862 13269 1emliZ77727l5P15 IS.pneumoniae DNA for Insertion sequence 151318 (823 bpi I 9'7 353 408 I 19 6 j5320 J3584 JgbjMl87291 IS.pneumoniae mismatch repair protein (hexA) gene, complete cds I 100 371 1737 I 113 i 431lI 3 igbIH36180I Streptococcus pneumonia. transposasm. (comA and come) and SAICAR synthetase f 95 429 j 29 I I I I j (purCi genes, complete cdsIIII I 13 11 I978 8532 1emb1X9940015PDA 1S.pneumoniae dacA gene end OR? 99 I 1257 j 1257 113 11l 1 9870 110985 IembIx994001SPDA IS.pneumoniae dacA gene and OR? 1 99 j 1116 1116 -j 4 14 13 123 120Q Ib 3681 Srpoocspnieumonia. transposase. 
(comA and comel and SAICAR synthetase 9 0 11 j 3prC ge50n0esgIH68O gc: complete cds 48 1 115 111 111303 11093i IgbjtJO4O47j Istreptococcus pneumoniae SSZ dextran glucosidase gene and insertion 1 97 372 372 1 1 1 1 1 sequence 1S1202 transposase gene, complete cds I I I 117 111 897 j 3302 IembIX729671SPNA JS.pneumonise nanA gene I 99 2402 2406 17 I2 377 3831 IembIX72967ISPNA IS.pneumoniae nanA gene j 99 237 555 4- 1 117 J3 14327) 3899 jgbIH3618OI IStreptococcus pneumonise transposase, (comA and come) and SAICAR synthetase 98 j 29 421 I I I I I ipurC) genes, complete cdsIIII 4 12 39 114 gI770 tdp ocspneumoniae heat shock protein 70 (dnaK) gene, complete cds 99 202 j 57.3 136 1911bu~7 and DnaJ ldnaJ) gene, partial cda jI )21 3 12412 4253 IgbIU72720j Streptococcus pneumoniae heat shock protein 70 (dnsK) gene. complete cds 1 99 j 182 j 84 I I I I I and Dna) gene, partial cds 1 184 184 I 12 8 156 1557 IbU441 Itetccu nuoieS dextran glucosidase gene and insertion 64 j 411 522 122~~ I~ 06158 b007 sequence 2S1202 transposase gene, complete cds TABLE I S. pneumoniae -Coding regions containing known sequences I ontig lOAF IStart IStop match jmatch gene name iprat S tjOn I D 1 II) 113 I Intl I (nil acession I ident Ilength length.
4 4 1811 (ur) jg89ebnes,0 complete cds II 128 15 112496 11204 1embIt8333515Pz8 jS.pneumoniae dean, capllA.B.C.D.E.F.0.l1.I.Jacl genes. d1DP-rhamnose j 91 705 1293 I I I j I j biosynthesis genes and aliA geneIIII a I134 1 1 1492 lembIYl08l8jSPYI IS.pneumoniae spsA gene j 99 203 j 492 a F134 j- F2-j 556 j- i2652 -Igb[AF0l99041 Istreptococcus pneumonlae choline binding proteln A (cbpA) gene, partial cds 86 685 2097 I14 3j1160 I837 IembIY088SPY1 IS.pneueaoniae spsA gene 86 j 324 324 I34 4 3952 2882 IgblAFol99041 IStreptococcus pneumoniae choline binding protein A (cbpAl gene, partial cds 98 215 1071 13 8:92 94 gll5 trpoocspneumoniaeP13 glycerol-3-phosphate dehydrogenase Ipo)99 25 87 I Igee partia Cd3. and glycerol uptake facilitator (gIp?) and ORFI genes I coplete cdsI 4 13 86162IbU261 Srpaoc:pneumoniae P13 glycerol-3-phosphate dehydrogenase igpO) I950 1 134 9 i984 (1622~gbU1267~ gene, pertis cds, and glycerol uptakefacilirator lglpF) and ORF31 genes, I I I I I Icomplete cdi I134 110 110805 111122 IghIu1256lj Streptococcus pneumoniae P13 glycerol-3-phosphate dehydrogenase igipDl 100 318 318 I I i I Igene, partial cds, and glycerol uptake facilitator lglpF) and ORF3 genes., I I I.complete cds I iI 1137 113 I7970 18443 1bU93 Streptococcus pneumbnlae type 19F capsular polysaccharide biosynthesis 90 j 420 174 I I (gb~u09239j operon, IcpsIVfABCDEFGIIJKLNNO) genes, complete cds, and aliA gene,. 
I I I Ipartial cds 1 137 14 18590 j 8775 1emb1Z8333515PZ8 IS.rneumoniae dean, capliA,8,C,D,E,F,G.I1,IJK) genes, d713P-rhamnose 1 7 1 1) I Ibiosynthesis genes and sliA geneI I I I* Ibiosynthesis gene and aliA gene j1 J95 4 137 .116 I9223 j 9687 jembjZ77726jSPIS IS.pneumoniee DNA for Insertion sequence 151318 (1372 bp) 96 j 446 465 137 117 1 9641 110051 IemblZ77727ISPIS jS.pneumoniae DN4A for insertion sequence 1S1318 (823 bp) 96 293 j 411 a 1391 110- 112998 112702 1emb1X6360215P80 on---ae- -900-- j 2234-- 229977 4 141 I8 17805 j 8938 lemblZ49988ISPHH jStreptococcus pneumoniae mSA gene 99 338 j 1134 203 141--- 10972---je---- sA-gene- 9 2037 -2037 4---a1 141 1101 111472 112467 e- treptococcu-----p---u--on--ae- A--gene--j -100 -j 76 996 1142 12 1257 1814 jgbjI4802151 Istreptococcus pneumoniae uvs4OZ protein gene, complete cds 1 98 1 174 j 558 4 I142 13 1787 1957. IgbIN802l51 Istreptococcus pneumoniae uvs4O2 protein gene, complete cds 100 j 142 171 I142 1 4 1 980 13022 jgbIl4802lsj Istreptococcus pneumoniae uvs4O2 protein gene, Complete cds 1 95 j 1997 2043 4 S. .nu on Cod n rei c ona n n kn w sequen cs TABL ID I a So I 142 5 3020 1 3595 1gbjN802151 Istrept ococcus pneumoniae uvs4O2 protein gene, complete cds 1 100 153 5761 145 1 1 2 1 96 IebIZ305i3S A IStrpocspneumonia gne poenamiAllin-idnkeon A gene, omlet 99 216 18 219 I 145 25 1188 9942 Igb(L205271 IStr ptococcus pneumoniae pip en ell partnial poe in Ip)gncmlt 99 5 1 56182 C C- C I :5 14 3 27 379 embIZ8202osPZB jS.pneumoniae dexfl ca3.cpgand capC genes an rf 99 1052 516 C C- 145 1 2 110448 1 992 lebIZ8021S B ISt pocspneumon iae protenn ppdA gene. 
coplt 99 12 255 1 146 11 11175 1179 lemblZa2002lSPZ8 IS.pneumonlae pcpB and pcpC genes 1 98 1 156 1 162 I 1467 1 44 1167 91002 embIZ2102ISPzS ls.pneumoniae pcp gen and cpC genes enoigiai -N gyoyaead8 98 25 275157 14 116 111795 110674 IembIZ202ISPz8 jS.pneumoniae pcpB~n and o.t encdin genes-DN gl s 8nd 8- 26 3 61002N I I I I Ioxo uTP nucSOSide triphosphatase 99 II 148 2 1 9336 110676 IebIU21702sP m Strpocspneumonia e e an td i e e ncoing utxd eucae(s n 147 I j oxodO n. nuce sde t pho sp(haas genesON gcompletae and 90 663 663 156 3 4 1154 I1402 lembIX636O2ISPBO IS.pneumoniae mmsA-Box I 94 I 185 2493 159 113 1 9048 1 8521 3gbIkI361801 IStreptococcus pneulsoniae transposase, (comA and comB) and SAICAR synthetase I 98 52652 I I 3IpurCl genes, complete cds I 58 1607 l 7 9 embIZ268511SPAT IS.pneumoniae IR6) genes for ATPase a subunit. ATPase b subunit and ATPase c j 0 99 j 2 7 I I I I I I subunitIIII 3 160 3 19 1406 jebIZ2685 I SPAT IS.pneumoniae IH22) genes for AIPase a subunit. ATPase b subunit and ATPase c 9 72072 1 1c subunit 51 I 160 3 0 1426 embIZ2685O1sPAT IS.pneumoniae 111222) genes for ATPase a subunit, ATPase b Subunit and ATPase 9 0 0 I I I I c subunit I 71 3650 I 1610 1 137 1 942 embi7Z2851 SPAT Is.pneumoniae 186 )caW c genes 99as a 984iA~ sebsb ntan Tae 8 3 9840 I 161 7 1 6910 1 7497 lemb1X839171SPGY jS.pneumoniae orflgyre and gyrB gene encoding DNA gyrase 8 subunit 1 99 I 437 SO 58 161 j8 7443 I9386 IembIX839l7ISPCY IS.pneumoniae orflgyrB and gyre gene encoding DNA gyrase B subunit I 98 1912 1944 13 II2155 igbIL2OS591 IStreptococcus pneumoniae Exp5 gene, partial cds 98 I 327 2154 TABE IS. pneumoniae -Coding regions containing known sequences 4 Contig joRFI' Start I Stop I match I match gene name percent lISP nt ORF nt I ID fIII I nt) I (nt) I acession I dn lelngh 16 2 11 gI076 S~pneumonjae maIX and maiN genes encoding membrane protein and 1 5 187 1587 1 5 1 1 j 2 11 gI076 mylomaitase. complete cds. 
end malP gene encoding phosphorylase iI I 165 2 11608 13902 jgbjJ0l7961 IS.pneumoniae maix and malm genes encoding membrane protein and I 100 280 2295 I I I I I I amylomaltase, complete cds, and malP gene encoding phosphorylase 1i 1 166 1j 1 378 1 I embIYI1463ISPDN IStreptocaccus pneumanlae dnaG, rpoo), cpoA genes and 0RF3 and ORF5 1 100 375 1 375 1 166 12 11507 1 320 lembI1l463ISPDN IStreptococcus pneumoniae dnaG, rpao. cpoh genes and ORF3 and ORFS 1 99 1188 1188 F- 1- 4 0 14- 32--emb-I l1-4- treptococcus -neum-oi-- *1 o-9 I 167 1 1 1077 1 328 IemblZ7l5521SPAo Istreptococcus pneumoniae adcCBA operon I 94 155 750 i- 18 -I e -bI Z715 521 S PA t u pn-- 1 6 7 -7 -F 2- 1 8 4 4-1 4- 1 19 9 9 I e m b IZb-Z 7- -l S S 2 I s P -P -ISo t i -p-tr e p t o c o c c u s p n e u m o n i a e a da c C B A a p e r o n I 9 4- -6 4 1 167 4 1 4 39 1842 lembIZ71552jSPAD IStreptococcus pneumoniae adcCBA operon 99 1 604 739 167 1 13 129 9 241 I LblZ7 12 A IStreptococcus pneumaniae adcCBA ne o pe r a I 99 1 702 259 4 170 110 j7338 I7685 IembIz777261SPIS IS.pneumonias DNA for insertion sequence IS1318 11372 bp) I 95 I 315 j 348 4 4 172 16 2462 14981 IgI465 Streptococcus pneumoniae formate acetyltransterase (exp?2) gene, partial 1 97 365 2520 tJA IgI46S cdsI 120 IgbIH3618Oj Streptococcus pneumoniae transposase, (comA and comBI and SAICAR synthetase 1 89 1 353 I I j j (purC) genes, complete cds I j4 j1843 j3621 jembjZ4721OISPDE IS.pneumoniae dexB, cap3A, cap38 and caplC genes and arts I 95 I 89 1779 I 176 5 1 3984 1 2980 IembIZ67739ISPPA jS.pneumoniae parC. parE and transposase genes and unknown art 100 573 I 1005 1 178 1 1 1 3 1 425 embIZ677391SPPA IS.pneumoniae parC, parE and transposase genes and unknown ort 95 I 423 4231 1 179 1 I426 70 emb1Z83335[SPz8 IS.pneumoniae dexB. 
capllA.BCD.E.xc lx.J.Xj genes, dTDP-rhamnose 9 11 1 1 1 biosynthesis genes and allA gene 1 3 3 I3084 I1855 lembjX9S7l81SPGY jS.pneumoniae gyrA gene 1 99 j 381 1230 186 1 714 1 I4embIZ1969lISOOR Is.pneumoniae yorf(A.B.C.D,El, ftsL, pbpX and regR genes I 98 59 I 711 I 186 j2 12254 1 608 lembIZ796911S00a jS.pneumoniae yartlA.8.C.0.EJ, ttsL, pbpX and regR genes I 98 1 315 1647 I 8 07 1 880 IembIZ796911SOOR IS.pneumoniae yorfiA.B.C.D.Ej, ftsL. pbpX and regR genes j 98 1 174 174 -3-17 4 18 129 IbU221 IStreptococcus pneumoniae heat shock protein 70 (dnaXl gene, complete cds j 99 258 1 258 189 1 2 59 1 gbU7720 jand Onaj (dna3) gene, partial cds
189 | 2 | 600 | 385 | gb|U72720| | Streptococcus pneumoniae heat shock protein 70 (dnaK) gene, complete cds and DnaJ (dnaJ) gene, partial cds | 98 | 204 | 211
t T A L I S Cnu a C o i n r e i n c o t i n n k n w s e q u e n c e s TABLED_._nt I S. peuion a den Codinh relnscetniggthnsqune atchnJ d gene nariale I D ID I nt I l I acedsslonaj gn, aril 7 p- -c e n t 4 -t 1 91 9 1 08 1 5 4 l m I 6 6 2 S B- I~nu o i e m s -B x1 9 3 0 19 2 gbIU727201 iStreptococcus pneumoniae heat soa ck m rte n 7 dnie com pl ACR yetc se 99 1 678 1 671 4 I I I I I I and C en es, comapee patil d I I i- 4 I11 ]9 79 II 754biX301P o neni eeso an 95i ge2ne06 1 0 19 1 1 97 17297 igbIL20561 IStreptococcus pneumoniae rXPra gn .pareticmAa n c omB an SACA synthetase4 I I I I(prlns complete cds 91 72 111 4 88 1 2 5 12 l m I 8 3 5 S Z S.pneumonlae de B ,E F G H1.2 I ge es lTDP-rhamnose9 3 3 199 biosynthesis genes and aLiA gene S 21i 6 2 embIZ833351SPZ8 IS.pneumoniae dexB. capI(A.B.C,DE,F,G1,.J.K( genes. dTnp-rhamnose 99 89811 iboytei-ee-n i--2 -99 i -18 224 -IembIZB3335ISPz8 IS.pneumonlae dexB. capl iA.B.CD. E.F.0o.I..3.K( genes. dTDl'-rhamnose -1 1 1 I I I I I I biosynthesis genes and aliA gene 9 0 0 i-227-- 13- I 0 23 1 j17 33 gbIL32S6 IStrpocspneumonae Esp9 gene, partiale cds 95 42 164 4 a 247 1 1145 1803 IgbjL3611~ Istreptococcus pneumoniae exanpo gene, comlete cs, reane 5'CA syend 94 113 1143 gees complete cds 1eb23355z emoia ialA 253 131 24556 13 bgI3681i llepooc neuned A geansoae geoA n es.mB ad SDP- Ram n th e 96 332 360j 28 125 ebZ331P8I~ nuoiedexB, capl[ABCD,E,F,.,I.,J,K) genes. dTfP-ramnose 338 4083 4 222 I I I biosynthesis genes and aliA gene Ii a TAB E S *nup a Coin r egin cn an know sequences TA LE. numeie on a r egion cnaining .,EFG,~,JK genes, sequences170,50 I I I I J ~biosy'nthesis genes and aliA genei 79 255 I I80 embI-Z820021sPzS IS.pneumoniae pcpB and pcpC genes I 97 1 5312 1984 255 I2493 I1969 lembIZ677391SPPA IS.pneumoniam parC. parE and transposase genes and unknown orf 1 92 435 I 525 27 2 j985 I770 lembIX17337ISPhn IStreptococcus pneumoniae ami locus conferring aminopterin resistance I 96 1 117 1 216 4 I 2I I 2 purC) genes. 
complete cds 67 2495 11208 1gbjULl-6561 Streptococcus pneumoniae dihydropteroate synthase (sulA1, dihydrofolate 95 84 714 I. I I ynthetase (sul~l. guanosine triphosphate cyclohydrolase (suIC), aldolase- Psyrophosphokinase (sulD) genes, complete cds 26 21 130 bU65I Streptococcus pneumoniae dihydropterot synthase (sulk), dihdofo- 267heas (3uB) gunsn 1291spat 2277 I droiase suIC), aldolase- 015598 267 5 1 2261 136 j~ I116 Streptocc pneumoniae dihdroteot synthase (sulAl, dihdt 98 1341 513 jgbnu16356e 1oc 0Bcu yuaoe rposate cylhdoa r SI) o.e pyrophosphokinase (sultl) genes, complete cds 27 6 4164 14949 IbU651 Streptococcus pneumonlae dihydropteroate synthase IsulA). dihydrofolate 99 748 786 1 gb~ullS~g Isynthetase (sulB), guanosine triphosphate cyclohydrolase (su1C), aldolase-II I i I I j pyrophosphokinase 13ulD) genes, complete cds iI 61 -56- -n s-- 1267 1 54 5140 IgbIUl656 Streptococcus pneumoniae dihydropteroate synthase (sulA), dihydrofolate 100 186 J 405 4 -I 271 1j 1 562 1 104 lgbIH296861 jS.pneumoniae mismatch repair fhexBl gene, complete cds 93 J 160 459 291 1 1 75 1 524 jgbIU040471 iStreptococcus pneumoniae SSZ dextran glucosidase gene and insertion 96 j 50 0 I I I sequence IS1202 transposase gene, complete cds
291 | 2 | 1001 | 525 | emb|Z83335|SPZ8 | S.pneumoniae dexB, cap1[A,B,C,D,E,F,G,H,I,J,K] genes, dTDP-rhamnose biosynthesis genes and aliA gene | 87 | 205 | 477
291 | 3 | 807 | 559 | emb|Z83335|SPZ8 | S.pneumoniae dexB, cap1[A,B,C,D,E,F,G,H,I,J,K] genes, dTDP-rhamnose biosynthesis genes and aliA gene | 90 | 170 | 249
291 4 j1374 11099 gbIN3618Oj Streptococcus pneumoniae transposase, (comA and comB) and SAICAR synthetase 85 264 271 I I I (purC1 genes, complete cds TABLE I S. pneursoniae -Coding regions containing known sequenceA c-ge- I ID I itnt) nt) I acession I percentl NSP ft IORF nt 531 67 i 296 -IDE S.pneumoniae gyrB.cpA a3 an cpC genes andunnw orf -eng-----ngth 317 1 157 510 lembIZ677391SPPA jS.pneumoiiae parC. parE and transposase genes and unknown art I I biosynthesis genes and aliA gene 91 1 299 j 7513 326 I 1 1 1 462 lembIZ820011SPz8 IS.pneumoniae PcPA gene and open reading frames I 100 233 1 462 32 I60 6 Imbza33~szs S.pneumoniae dexB, cap1 IA.BCD..FGHI.JK genes, dTDP-rhamnose 94 I 89 5401 1 1 I biosynthesis genes and aliA gene 34 j3-- 1 5 545 Streptococcus- 1gb treptcoccu pneumoniae peptide methionine sulfoxide reductase,(msrA) 1 1 1I homoserine kinase hornolog (thrB) genes, complete cds 336 1 309 93 IemhIZ268SoISPAT S.pneumoniae W3222) genes for ATPase a subunit. AT~ase b subunit and ATPase 97 102 2161 I -I I 360 j1 1 j519 1embIZ67739ISPPA IS.pneumoniae parC, parE and transposase genes and unknown art 95 1 435 I 519 I i ;360 I 159 1960 -IembIze-333-51ssP-ze Sne:umonia e dee.BC -a pl( eco.F1IK 94 353 b3snh6i3ee ndai en 362 1 J 673 2 ~~~~biosynthesis genes and ahA geneges. TDrhno 963 1 72 362 1 1673 1-2 embIZ83335SjPz8 IS.pneumoniae dxcap IA. BC,DE,GHXIJKI eedD-hmns 56 7 blseunces gSenes tandpoaseA gene, oplt cd I 2 18 178 gI007 Strpocspneumoniae dexe dpsi4A, cps4B. cps4. si4da en n cp i ncseri on cs 96 4 3 34 1 eb18775~ cps14I. cpsl4l, cpsl4J, cpsl4K, cpsI4L. tasA genes
I- I TABLE 2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig j ORP Start I Stop macatch gene name aim 6ident jlength, I Di ID v jIt ntl aces s ion j ntl 4 I 2 2 f10 1942 jpirIF606621F606 Itranslation elongation factor Tu Streptococcus oralis 100 J 101f1837 4 260 1 1382 lirI1F F066 jtraslaoen l onation facr ph shT a s er -y te H~Streptococcus oai 99 93 11 25 1 486 2 1394 ;gi13479995---- jhy pend -ticl a e piu inluenzae suui Irpoocu ai 98 965 199 94 68 102 g1J3i0627 lioimnphosph yu atesua pehdosphrn ease se (Streptococcus Iygns 98 94 381 S07 7 1290 589 1011987050 jlaaZ gene product [unidentified cloning vector) 1 98 98 300 I 11 9 598 766 19115755 Iphspo-btav-g~atosda E (C 3.2.1.85) ILactococcus lactic crernorisi 1 97 I 94 1419 4 1 32 18 1 6575 17486 IspIP372l4IERA.S IGTP-82N31N0 PROTEIN ERLA HOHOLOG. I 96 I 91 I 9121 94 1 951 12741 Igi(153615 lphosphoenolpyruvate:suear phosphorransferase sse nyeI(tetccu 6 j 9 71O I I I I s livariusl I (II" 4 1 127 1 1 1 1 168 1911581299 linltlation factor IF-i ILactococcus (actisl I 96 I 89 I 168 1 128 114 110438 j11154 19i11276873 joeoo (Streptococcus thermophilusl1 96 93 j -171 1 181 14 11362 1 1598 191146606 Ilacn polypeptide (A.A 1-326) (Staphylococcus aureusj 96 80o 237 218 I 1 I 34 g1 14356 llntrageneric coaggregation-relevant adhesin (Streptococcus gordonii( 96 93 j 834 4 4 39 2 1 115 1441 1011208225 heat-shock protein 82/neomcyn phosphotransferase fusion protein (hsp82-neol 96 96 3271 I I II (unidentified cloning vector) I 54 1 12 I82 j1967 1on11P101dl00972 jpyruvate formate-lyase (Streptococcus niutansl 95 I 89 2346 11 I2 j606 .1289 IgiI149396 Ilacy (Lactococcus lactis) I 95 I 89 684j F 4- -4-6 -i 3 1-l 3045 -igI 341 804 19 11 5 0 6 0 6 ix (Streptococcus-- 6 4- 1 89 110 1 7972 17337 1011703442 Ithymidine kinase (Streptococcus gordoniil 94 I 86 1 6361 4- -4 4- -4 4 148 I_ 9iI-- 6431----I354 1911995767 IUOP-yolucose os pyrophosphorylase (Streptococcusu pyogenesle I 9I 'I -585 942 160 
430 5357-11-+- -4 il 1 4 -7 4 F160 7 4430- 5848---1---1-53573- I- nterococcu---- fa--ca--is---j 94 1419plsi -4 4- 2 3 j 4598 351---i---37-3-------n-receptr-(Streptococcs-pyogenes--j93---86 -j-108 fomlt~ayrflt 4 4 -12 ;_8T 7877 -i6204 igi1103865 fr--tera y roo-t sy tet (Streptococcus--- -1674- 4 S. .a Pu at v co in re i n of no e pr te n sim la to kn w protein .i 9R I Star I Sto ma c 9c gen 9am 9i 9*en I* *D 1 11 *n t I n t acs s o *e n t I 4 4- 111 1D 4nt4 j 5120 acession0 jL1 prti (A 1-22 Intllssut l9 8 68 I1 53 1297 191147341 ;antitumor protein (Streptococcus pyogenes( 93 j 87 1245 4- j j 339 IgnljIo dl6 6 Iribosomal protein 57L (Bacillus subtilsi 93 84 297 160 5 11924 j3462 19111773264 IATPase, alpha subunit (Streptococcus mutans) I 93 I 85 I 1539 211 5 3757 3047 1gij535273 laminopeptidase C (Streptococcus thermophilus) 93 I 82 j 711 r 4- I 36 1 97 3 91295259 (trflptophan synthase beta subunit (Synechocystis sp.( 93 I 91 195 1 3 j 1392 j 1976 I91I1574496 1hypothetical (laemophilus influenzae( 92 1 80 585 1 36 121 120781 119927 1ei1310632 Ihydrophobic membrane pro *tein (Streptococtus gordonli) 92 1 86 855 4 4 4 I 81 3 12265 j -1534. (;gij149396 ilaco (Lactococcus lactis) 1 92 83 270 1 I 8 62 (460 (gi(149410 Ienzs'me III (Lactococcus lactis) 1 92 83 3991 -32 61 (33 gn(I~~~s fbrnci-idn roenlk rti A [Streptococcus gordonii) 91 as5 1695(0 46 2 13054 1 1462 IgiII850607 Isignal recognition patticle Ftb (Streptococcus mutans( 1 91 84 1593 1 65 110 j1 4442 14726 1pir151786515178 (ribosomal protein 517 Bacillus stearothermophilus j 91 80 j 285 I 77 2 260 1900 1gi1287871 loroEL gene product (Lactococcus lactilj 91 82 j 1641 4 -i 4 84 (I j 2 2056 1911871784 ICIp-like APP-dependent protease binding subunit (Boa taurus) 91 9 I 2055 4 99 1 8 110750 19272 1911153740 Isucrose phosphorylase (Streptococcus mutans) 91 I 84 1479I I 99 1 9 111947 111072 1911153739 (membrane protein (Streptococcus mutans) j 91 78 876 I 17 (5 (2065 j2469 IpirIS07223(RSBS (ribosomal protein LI? 
Bacillus stearothermophilus 91 j 78 405 4 4- 17 8 j4765 16153 IgnlIPlDIdlOO347 Nat -ATPase beta subunit (Enterococcus hirae) I 91 9 79 1389 4- 4 4 4 1 201 1 2 1 1798 1 2718 19112208998 (dextran glucosidase fleaS (Streptococcus suis) j 91 79 1521 222- 4- 63-183-- -4AT-- pr- -4 -4 222 2-T 673 -1839 153741 protein (Streptococcus---mutans-----91----85--j-1167 293 4- 14 -400 -49-91- nown- -ot-i- n-rtio-s- -unce-S861- 91-1 -28 T293--+ 4--13--j 4400-----i------92---unknown---protein----Insertion----sequence---1S86---- 91- -71 288 32- 616-1- 670- -4 ccus4utns 9 1 7 T A B E 2S n u Onl e Pua t v o in eg on f o e proe in a a o know pr te n *I I aD aI .1 ant *nL ac o I a, TABL 2 2 s.192 pnuoia utative rodeing (Irionse noe protein Siia to know proein S4- 4- 4 I 3 21 81 2 191111969njP~j2II unknow protein (inlsto s equence 158613 70 1 23 4 4 4 1 56 12 17l 977 19111710133 Iflagellar filament cap jeorrelia burgdorferi( 90 1 s 261.
4 6 5 1 1 j 606 19L11165303 jL3 (Bacillus subtilis) I 90 75 1 606 t 11 j 2 j988 1911153562 aspartate beta-semialdehyde dehydrogenase (EC 1.2.1.111 (Streptococcus 9 80 j 87 12 I1 135g2 1.11407880 1Q91') (Streptococcus equisimilis). 90 75 519 159 112 1j7690 18298 1911143012 jGur synthetase (Bacillus subtilis) 1 90 1 84 6091 4- 1 183 1 1 28 11395 1gi1308858 IATP:pyruvate 2-0-phosphotransferase (Lactococcus lactisi 90 76 1368 4 4 4 1 191 13 12891 11662 1gi1149521 Itryptophan synthase beta subunit (Lactococcus lactial 1 90 78 1230 1 198 1'.2 j 1551 1 436 19112323342 I(AF0144601 CcpA (Streptococcus mutansi 90 1 76 1 1116I -4 4 4 305 1 1 37 1 783 jgil1lIS5l lasparagine aynthetase A (asnAl- (Iammophilus influenrae) 1 90 80s 747 I 8 3 I2285 1334 19(1149434 Iputative (Lactococcus (actis( 89 78 1059 0 46 18 j 7577 j 73. 62 IpirIA45434IA4S4 Iribosomal protein L19 Bacillus stearotharmophilus 1 89 1 76 216 4 4 4 I I 86g1042 11153792 Ireor peptide (Streptococcus pneumoniae( 9 j 83 f 18 4 4- 4 I 1 14 j81 19441 1911308857 jATP:0-fructose 6-phosphate 1-phosphotransferase lMactoccccus lactial 89 at8 1038 4 I 57 (1 j 9686 110669 19n11P101d100932 1112-forming IADII Oxidase (Streptococcus mutansi )97 8 4 4- I 65 f 5 12418 1 2786 1gi11165307 1519 (Bacillus subtilial 1 89 81a 369 I8 j3806 1j 4225 IspIP14577IRL16.. SOS RIBOSOI4AL PROTEIN Li16. 
1 89 1 82 420j 6 4 65 j1a 8219 8719 1911143417 Iribosomal protein S5 (Bacillus stearothermophilus] 89 f 76 501 T 73 1 9 1 6337 15315 1911532204 jprs IListeria monocytogenes( 1 89 j 70 1 1023 1 76 j 3 13360 j 1465 jgnljlDje20067I jlepA gene product (Bacillus subtilis( 1 89 1 76 1 8961 4 1 99 110 112818 111919 1911153738 jmembrane protein (Streptococcus mutans) 89 73 900 4 1 120 12 13552 11300 1911407881 Istrlngent response-like protein (Streptococcus equisimilis( 1 89 79 I 2253 4 1 122 15 14512 1 2791 IonlIDje280490 junknown (Streptococcus pneumoniae( 89 81 j 1722 4- CotgtR tr tp I mth I* atch g e e Si iea I leenere, I ID qI I Int I *nt 0 tcs o 0n S.491 pnuoi en utte coding reios ofnovl prten simla to knwnprten -4 4 1761 1 j 66319 4 273 1gi14573 jS-ioorlpeptidase (Streptococcus pogens) 89 J 8 6667 4- S 7 6ITDLTN 3050EAS (EC4 1911912423 IHputativeTN (Lactcoccu lactis. 1 89 78 1 1305 4 4 18 8 1436 5651 1g111 44 Ienzve ABC (La sotrsbntCmAItetococcus lactisii 89 78 178 4 4 4 36 1j 1 1 43 1 22 83 gij 11969 I2 Iqunn n proeinus s rti os eqec as61 66 8922050 4 4 1 57 1 2 1 611 1 1468 19nl1P101e134943 1putative reductase I (Saccharomyces cerevisias) as 68 75 1 8585 5--i-3--5497--6069-pir-A2102 I 5 20 I 03 59 0 912078381 Iribosomal protein L15 (Staphylococcus aureusl I 8B 83 5 4715 -4 4- 4- -I 78 ;3 -;3636 -I 1108 19-gn-15101D-dl--078l *Iysyl-aeminopeptlidase Lactococcus lactic) 88 1 80 j 2529 1 -i 4- 4 107 2 5 219 1 962 1unl191flje339862 Sputative ecylneuraminate lysse (Clostridium tertium) 88 75 1 744I 4 4 4- I 1 -1l 1 81 5103 10420 19-1402363 IRN~A polymerase beta-subunit (Bacillus subtilisl 1 88 1 74 5 3654 1 1 126 19 113096 112062 5onl5P10te31l46i8 5unknown (Bacillus subtilis) 88 1 74 1 1035 140 11 1114 1887 11(57359 5i. 
influenzam predicted coding region 1110659 iHaemophilus influenzae( as8 61 1 270 4 144 111 394 1555 Ign1IPIDje2747O5 Ilactate oxidase (Streptococcus iniael 1 88 5 75 1 1621 148 5 4 j2723 ;_49 19111591672 lphosphate transport system ATP-binding protein (hethanococcus jannaschii) j 88 68 1 771 -853- 4- 4 4- -2-4 F 177 1 4 51770 1 2885 1911149426 1putative (Lactococcus lactis) 88 j 72 11 211 1 6 14140 1 3613 1911535273 Iaminopeptidase C (Streptococcus thermophilus) 88 74 1 528 231 4 80 1497- 4 3- 231 T4 -580 -1 40 86 homologous----to co i-ribosoma----protein---L--7-(Bacillus----ubt----i-- -88-5--78 378 1 260. 5 1 2387 1 2998 19111196922 junknown protein [Insertion sequence 1S8611 88 5 69 6121 1 291 16 12017 5 3375 IgnlIIlDIdlOOS7l Iadens'losuccinate synthetase (Bacillus subtilis) as 88 75 5 1359 4 I 319 14 1 658 5 317 lgi1603578 5serine/threonine kinase (Phytophthora capsicl 1 88 Be8 342 4 4 4 4 40 5- j435 3_*451 -11 1 5 3 6 7 2 ;lactose-repressor (Streptococcus mutans( 87 5 56 162 ti .OR St r I Sto ma c *c gen name 4S9 pnusona -06 Putativ coding2 regonsofoove proteins similario touec known proteins0 4- f f 110 16623 170939 Igif114992 Ilnkownm protein SOnsertin sunciSiII 87 j 72 1 247 f- 11 I 663 78--ib-s-a--rotei--8 7t- -82 ;541 -i248-- Igil 1196921 unknown-protein llnsertion sequence 358611 I 87 69 j 94 I 4 2 25033 123897 1onl1P1iDe254999 Iphenylslany-tRNA synthetase beta subunit laacillus subtilis( 87 74 1137 12]--l 1 214 4 1.14411 8516 joi122813o5 glucoseZihibited division protein homolog GidA iLactococcus lactic 87 75 1926 220 i2 -i27142 874 IgnlIPlDje3243S8 iproduct highly similar to elongation factor EF-O (Bacillus subtilisl 87- 7 1869 4 4 260 1 4 12096 12389 IgI119692l junknown protein (Insertion sequence IS8611 j 87 72 j 294 4 f# 32 1 1 1 27 j 650 IgiI897795 130S ribosomal protein IPediococcus acidilactici) j 87 73 1 624 57 j 154 j 570 joif1044978 Iribosomal protein S8 ilacillus subtilisi 1 87 73 1 417 T 4 f 10927 111445 19111196922 Iunknown protein 
[Insertion sequence IS8611 86 63 519 59 ;12 17461 j 9224-- 191501 Irelaxase (Streptococcus pneumoniae( 86 68 1764 f 4 1 1553 12401 IpirIAO27S9IRftBS Iribosomal protein L2 Bacillus stearothermophilus 86 771 849 -ft t 4 4- t 1 65 123 110957 111610 jgij 44074 jadenylate kinase (Lactococcus lactis) j 86 76 f 654j I 82 j 4 14374 14856 1gi1153745 Imannitol-specific enzyme III iStreptococcus mutansi 86 72 483 f 4 i I 106' 6 7824 6880 IgnI IPIDje a137s98 jaspar tate transcarbamyl ase aILactobacilIlIus. lei chmanniIi1 86 68 j 945 f -t -t 107 1 1 j273 IonIIPIDIe339862 jputative acylneuraminate lyase [Clostridium tertiumi 0 6 j 71 j 273 -t f- I 1 1 7 1 10432 16710 IgnIIPIDIeZ28283 ItNA-dependent RNA polymerase [Streptococcus pyogenesi 86 80o 3723 I 11 j9 j5704 4892 1o111661193 Ipolipoprotein diacylglycerol transterase (Streptococcus mutans) j 86 71 813 4-13-li--- i- f-T 13 40 7980 IgiI23B8637 glycerol kinase ltnterococcus taecalisi 86 I 73 1551 -f 4 153 j2 1 595 1 2010 jgij2160707 Idipeptidase iLactococcus lactis) 1 86 78 j) 1416 4 I 154 1 1 2 1 1435 1gi11857246 16-phosphogluconate dehydrogenase (Lactococcus lactisi 86 j 74 1 1434 C C C TABLE 2 S. 
pneumonlae -Putative coding regions of novel proteins similar to known proteins ID ID I (nt) I (nt) I acession I t) 161 5 j5025 I6284 1 gi 147529 IUnknown (Streptococcus salivarius) 86n6t2) *1-2 F :1642667 NADP-'cep enden:t g-yceara dehyde -3-phos pha&e- dehydrog en a se -(Streptococcus 86 73 -1182 4- 4- 1 2610 4 8 2659 51 j99 gij213661 Itrallvson inittin fc (Enterococcus faec iumj 86 76 12629 1 :1 11- 250 1 1 247 138) 1gi115355 japuaraie syC thaseorAersubn it (amophlu (Sretoou se jlroni 86 1 68 1863 I i 57 38 j 1- 2757 3589 glP)j1I 1:120554 Ipu ai u ABC tran spote suui OY srptccu odnt 85 j 72 j 1113 S 82 1 5 3 915 14 Ii1258546 ImanomY opht (Syrgeae(treptococcus gordonji 1 85 68 j 1140 4- 4 57141277 23789 IgniII dl 316 iyqfa (Ba cil us ut s as 8597 972 8231495 j654 jg11153746 Omannit1Kprtoste ehyroenause (Septcocu utns 8 6 1409 66 2 7 117 1 628 9 ij1184967 (rs l (s r eoccS1 (utacillu (utls 85 69 3897 108 13 1 2666 j 314 19i1365566 lOeR ah(9K rotein)e (entrons/ylhdoaeI rptcoccus faecalis( Is 85 j 71 1 89j6 12 I 2 J I1 9 08 Iriosa pESS r otei MEI S13 (Bacillus subtilis I 857I8 13187 3 1346 4096 Igil6B7llO Iptatwdrotolka dproen ascco hydolacse Srpoocsteiohls I 85 71 827 4 i 233 23. j2 728 1873 Igit1163116 jORF-5 (Streptococcus pneujsoniae 85 as j 67 1 11461
111 309 j 1931 IgiI143597 jCTP synthetese (Bacillus subtllis( 85 1 70 j 1623 I 6 1 j199 j 1521 1911508979 IOTP-binding protein (Bacillus subtills) j 84 1 72 1323 4 4375 I3443 IgnlIPIDje339862 Iputative acylneurainate lyase (Clostridium, tertium( 84 70 933 14 -1-1 6 63 2093 IgiI520753 DNA--- topoisomerasem I (BacillusB subtilis-- s tbt-l 84 02031 19 14 11793 12593 19112352484 I tAFOOSO98( KNAseN II (Lactococcus lactis( I -84 1 68 1 801 1 1 20 117 117720 119687 IgnlIXDIdlOOS84 Icell division protein (Bacillus subtllls( j 84 71 j 1968 1 22 128 121723 120884 19(1299163 lalanine dehydrogenase (Bacillus subtilis( 84 68 8401 4 4 -4 -4 cc 0 S a cc cc c 0. 0 0 cc *s se cc cce 6:cc cc TABLE 2 S. Pneumoniae -Putative coding regions of novel proteins similar to known proteins i C o n t i g -F I t- t-4I 1 a c c h e m I ID I I D I nt) cession Imatc eenm dn lnth I *7730 56792 lgnllPl0ldl00296 Ifructokinase (Streptococcus mutans) 1I47 33 9 15650 55300 IgI1147194 5phMA 0rotein (Ehihia coi 841 1 4 4 3- ;2--121551 5-20772 Igi15310631 ATP binding protein (Streptococcus gordonli) 84 5 72 780 48 14 1 2837 1 2505 5giI882609 I6-phospho-beta-glucosidase (Escherichla coi) 5 84 69 333 -a51 1- 8 a- 4- 4- 5 9 50 56715 1 7116 5gij951053 jORFlO. putative (Streptococcus pneumoniae( 84 5 74 1 4025 5 2 1 5 21 5644 gi5806487 508F211; putative (Lactococcus lactisi 84 5 66 624 5 6 517 17779 1 8207 19111044980 Iribosomal protein LIB (Bacillus subtilis) 84 1 7'3 5 429 4 4 06 4 574 2262 jgnl1141Dle19s387 Icarbamoyl-phosphate synthase (Lactobacillus pientarum) 5 84 5 73 5 32135 192 1 5 47 1348 5gi45806 IsiFpl;pt ate (Lactococcus lactis) 84 7 3 126 L44 F T -T 46 1390 1 5910 Igij2293164 5 1A0820 sige sadA s nase rei (Bacillus subtilis 1 84 5 68 5 1221 1 1 17 15 1 57426 1 185 I91495046 It rip eias (Lactococcus lact ois)5 84 663 102 348 671 j 6 23 1+4 1 335 2606 gill7 8294 iAC000245). 
2 hi 2386 79o f s 4 pct identical to 336s amin acid of A74iy n S11 resP2088duhs 10a addton28a prote idue (ECEOIS:43Escherichiacot 4- 28 1 6 1 330 15 2 1 305 lill143766 5ll (thr u nv) (ECi te 6 1 .3 (B coilu subti s) H1 6 95am p i u n l e z e 83 5 57 5 004 -4 357 1 18 5 3417 jgn 1D17 0 7 6 5sing hetrcaln clDNA d bin in drin rti (B ac l s btlsm 5ada 83 1 68 5 142 1 5 17 1 15 574326 58 191527085 OR d prot e hi Ste ococs num nae 83 6 59 4032 4 120 1 536 1149144 IgnIJ lD9 lo8 3 juo wnx Bacillus subtils) 83 5 61 5 285 4-4- 4 4 4 23 5 4 5 158 2 5g1 0 (AE02 90 Bacllu suThis) 23 aa or s 4 c dnia i as o 21j 84 751 6 8 7 6 8 g111721394 5 iu eo f au n ap prote24 as I~rprotein u Y neB mO n aL S1 P223 J195 4 4 T A B L 2 S 0 00m a 0ua t v 0oi n r e g o n 0 0f 0 00e 0 00ei n s i i a o k o n p o e n TA IJ 11 112 5477 ~n jeumonlae -putative podigrgosozoe protein Iail s s imilar to know protein 9 112 18963 1 9631 jgli47394 I5-oxoptolyl-peptidase (Streptococcus pyogenesi 83 j 73 I 689 4 1 98 1 1 1 3 1 263 jg111183885 Iglutamine-binding subunit (Bacillus subtills( 83 55 I 261 4- -4 4 4- -4 4 1120---F4-- 7170 70 1 5233 1gil310630 metalloprotease (Streptococcus gordonlilc jc 83 72go 1938niiI 7 1 198 I 17 7 99 437 giJ 150056 IN. yp j ann a l eited coding reo 1416 hehnccu anshl 83 72 8 3466- 4356 1gi.-g-a-ml773265 uIAT-t-ISt-ase. 
gammac subunit -m-8tans I-783-I 67-- 891 6- -2 -8- 226 1 3 1 2367 1 2020 1911142154 Ithioredoxin (Synechococcus PCC63011 j 83 58 3481 4 9- 4 4- olcs IoeaeA A Bclu sertemohls 8 7 1 1 4 -303 1049 jgi- 0046---ph-- -pho--lucos----iso-- erase- 1-4 9 -6--1047 331155 J1931 IgiJ289282 Iglutamyl-tR4A synthetase (Bacillus subtilis) 83 67 777 -6 1 537 (1431 (Irlbose--phosr'-hate pyrophosphokinase cilu caido---yt--c--- 82 j 64 1053 j 7 i 1 T i14 68-- -o-sma-lprtei--- 4- 4---04- 4 4 I 9 1 3 I 1479 11090 1gij385178 junknown (Bacillus subtilisi I 82 j 46 j 3901 7 4213 3899 IgnlPIDjdloOs76 Iribosomal protein S6 (Bacillus subtilis) 1 82 60 1 315 4 1 12 16 1 4688 1 3942 IgnljPIDld10057l junknown (Bacillus subtilis( 82 68 747 4- 4- i i 2 1 32 187 91 274ptti e( ail ssbii)8 911 4 1 22 (18 114897 115658 lgnIle~rildxol92s luridine monophosphate kinase (Synechocystis sp.1 82 62 7621 4 I 35 1 9 1 7400 6255 0i 11881543 lUDP-k4-acetylglucosaine-2-.epimerase (Streptococcus pneumoniae) 1 82 1 68 I 1146 4 4- 4 1 40 110 I 8003 1 533 Igi) 1173519 Iribottavin synthase beta subunit (Actinobacillus pleuropneumonlae( 82 j 68 f 471 48 12 13159 123437 IiI1930092 louter membrane protein (Campylobacter jejunij 82 j 61 279 1I 2 1 113833 114765 1g1jl42521 Ideoxyribodipyrimidine photolyase (Bacillus subtilis) 1 82 61 1 933 j 4 14737 11849 IgnilP1Ddl02221 I(ABOOIIIOI uvrA (Delnococcus radiodurans) I 82 66 2889 2131- 4 57-4- -4 -4noytognes)82 6 67 I 62 I 2131-[ j- 1457 -g 4 9-j 0962 thiored---xin-reductase----(-isteria----onocytogenes---- 8---63 675 168 171 njl e326 4- 4 4 trptocccu- -4 -93 71 III11 6586-- j1 51 -gn e3 s- ga-- act (Streptococcus-- ema ia---8-j60-- 4 1 73 113 j 9222 1 7837 jgnlPl'xodIoos86 junknown (Bacillus subtilisl 1 82 1 65 1386 4 ~0 e 0e0 0 0* 0 0 0O 0 0 0 TABLE 2 S. 
pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig Start Sto ma-- tch m tch gene name aim--- ident-lengt 10 lID (nil int) arc eso a IiIntl 14 1 1 13771 IgnI[IlDldlOll99 lalkaline amylopullulanase [Bacillus sp.i 82 68 3771l 4- 83 9 13696 1 3983 jgnlJPIDje3O5362 junnamed protein product [Streptococcus thermophilusi 82 52 288 4 4- 4- 86 Ill 110776 1 9394 loll 683583 I5-enolpyruvy1shiklmate-3-phosphate synthase iLactococcslcii8 7 18 89 112 1 8295 19752 Igi140025 [homologous to Ecoli 50K (Bacillus subtilisi 82 66 1458 11 j j j04 882 jijI d102090 j(A8003927( phospho-beta-galactosidase 1 [Lactobacillus gasseril8213 I 1li I I I 1 1 1332 Ign~lPlDldOOS7)9 Iaeryl-tRNA synthatase (Bacillus subtilisi 82 71 1332 4- 151 3 65 j626 pirlS060971S060 ltnpe I site-specific deoxyribonuclease (EC 3.1.21.33 CtrA chain S -82 66 1590 I Citrobacter freundil 4 4- 4 173 4183- 350 1i131336 p.000584) conserved hypotheticalprotein ieli-c-o-bacter pylon!- 82 1 68 1 681 177 112 15481' 17442 fgnIlPIDjdiOl999 l(A8001341( Ncr8 (Escherichia colil I 82 58o 19621 193 12 1 178 1576 lpirlSO8S64IR3BS Iribosomal protein S9 Bacillus stearothermophilus 82 1 0 j 399 245 2 j 258 1 845 jgij 146402 lEcoA type I rastriction-modification enzyme S subunit lEacherichia colil 1 82 1 68 588a 4- 4 9 5 1 3400 1 3146 IgnllPIDdOOS76 Iribosomal protein 518 (Bacillus subtilis) 81 1 66 255 1 16 17 1 7484 1 8413 fgi11100074 itryptophanyl-RNA synthetase [Clostridum IongispruiI(a 81 70 j 930 111 110308 113820 IgnlIDIdlOOSB3 Itranscrlption-repair coupling factor (Bacillus subtilisi 81 1 63 1 3513 4- 4- 38 2 1232 j1606 1i12058543 jpultative Dt4A binding protein (Streptococcus gordonil) 81 1 63 1 375I 2l 01 1151 Ii405 enolase (Bacillus subtilisi I 81 67 131! 
46 1 1 2 1 1267 I0Il431231 juracil permease (Bacillus caldolyticusl at 61 12661 48 1 3 12453 j 1440 lgnllPII~ldlOO4S3 IKannosephosphate Isomerase (Streptococcus mutans) 81 1 70 1 10141 54 12 11106 1336 jgII54752 Itransport protein (Agrobacterium tumefaciensl a 81 64 1 7711 4--4 4 4- 122 110306 110821 Igi144073 ISecY protein (Lactococcus lactisl 81 j 66 j 56 69 14 j 3874 j 2603 lgilsS6BB6 Isenine hydroxymethyltransferase (Bacillus subtilis( 81 69 1272 4 I- 4 4 4- 1 99 116 19126 118929 IgiJ231352 6 p.2E0005571 Hl. pylori predicted coding region fIP04il (Helicobacter pyloril 81 j 75 j 198 7 837 782 lg PIDlel993B4 jpyrR (Lactobaclillus plantaruml 8 1 j 52 108 16 15054 16877 1gi11469939 Igroup B oligopeptidase Pepe (Streptococcus agalactise)a 81 66 f 1824 11 s 189 188 4- 4 prten- 1 128 1 5 3359 13634 IgtI16851l1 lorflO91_1Streptococcus thermophilual 81 -j 69-T 2761 TAB. 02 5 5 0a Puatv *oin ein of nve l prten *iia to knw prten Coti CR Str I Sto 0 mac tc *en nam si IC Cd egh 4i-- 1 19 1 11F 1 672 len7gth298 IM snh ts (ailssbtls 1 9 1 11 1 1501 1 1 8303 1 1 913 86 45co sypeI~dO20 I(BO 8 restction -mNKod.iicionenzyme tlsuit 1scelii ol 83 559 2821 4 191 1i 72 779 18937 jgiI1 228 ItONptp synth i alus subutils fIcoocu at 81 69 j 6 1 2 17220 1 19 45 IgnIIPIDd1O00 jIABOvlIse FUancrtON UNKNOWN. 
iac se (rsoptila is i ls1 81 554 282 217 2 1745 1 4089 1911466413 jtrypophan phsyn t sease esbntm 11'acto c i us ac rtisiohi a 1 81 59 48 2 2 3 5290 1994 191153757 revertse tra nipase tedocleaus miorospiaurls) 81 j 43 j 329 F2 T i f 1 299 1 1 !663 1 4 Ign1IPIDIe3Ol1S4 jStySKI methylase (Salmonella entericel 81 60 1 660 366 I 76 83 1911149521 Itryvtophan synthase beta subunit iLactococcus lactisl 16 9 B14--- 9-41 12 110 I8766 19242 jgi 11216490 IDNA/pantothenste metabolism fiavoprotein istreptococcus mutansi 80 64 4)7 17 11 11i 6050 15748 Ign1IPIDI8305362 junnamed protein product iStreptococcus thermophilusi j 80 1 67 1 303 4- -17 16__f 8455__;9066__;gIj703I26 i ucocin -A -translocator Leuconostoc -gelidumi 0--s 59 612 4-
1 18 13 12440 11613 IgIj 1591672 Iphosphate transport system ATP-binding protein iMethanococcus jannaschiii 80 58o 828 27 j3 I4248 11579 jgiJ452309 jvalyl-tRNA synthetase isacilius subtilisi 8 69 2670 4 1 28 17 13671 13288 19111573660 IN. intluensae predicted coding region H410660 ilaemophilus Influenzaei 80 63 j 384 4 1 32 12 1902 11933 IgnIIPIDje264499 Idihydroorotate dehydrogenase B iLactococcus lactisil 80 66 1032 1 39 1 1 I 1 1266 jgnljPIje234O78 jhom fLactococcus lactis) 80 1 63 j 1266 4 4 4 1 2 jS 46 3593 Ioi11183884 IATP-blnding subunit (facillus subtilisi 1 80 1 57 j 771 I 54 1 S 4550 14744 1o112198826 I(AP0042251 CuxICDP(IB1); cux/CDP homeoprotein Itlus musculus) 1 80 1 60 195 4- 11 59 II 7109 7486 jgij9SlOS2 lOAFS, putatIve iStreptococcus pneumonlael I 0 j 6 7 5 4 -4 3 11230 11550 IpirIAO28LSIRSBS Iribosomal protein L23 Bacillus stearothermophilus 1 80 1 69 1 321 1 65 112 j 5174 15503 IpirIAO2819IRSBS Iribosomal protein L24 Bacillus stearothermophilus so 70 j 330 1 66 19 19884 110687 911I2313836 IiAEOOOS84I conserved hypothetical protein iHelicobacter pylorl 80 jo 66 804 1 82 12 1648 1 2438 1911622991. Imannitol transport protein [Bacillus stearothermopitllus) 80 65 17911 I 5 90 30 11528995 Ipolyketide synthase (Bacillus subtilisi 80 46 321 8_ 4 1 89 1 8 1 6870 15779 1911853776 jpeptide chain release factor I [sacillus subtilis) 80o 63 1092 1 93 112 1 8718 17438 IonlIPIDIOl959 jhypothetical protein (Synechocystis sp.) 80 60 12811 S. 
pnu a Ptatv poin rein pf nope* proen siia to p p rten 4- Contig loaF IStart stop I match Imatch gene name I aim tident length I ID lID I Intl Intl I acession I Intl 106 5 j6854 j5751 gnljPIDje199386 Iglutamilnase of carbamoyl-phosphate synthase (Lactobacillus plantaruml 80 65 1104 109 2 1 2160 1 i450 laiI40056 IphoP gene product (Bacillus subtilis so 80 59 711 4 124 9 14246 13953 IgnljPIDIdi02254 130S ribosomal protein 516 (Bacillus subtilisi s0 65 294 13 1 5 9 1 0- 9 4 1N A d- -e-nd enI--- 4---40 I128 j9 8 148 64 19 gij228l308: Iphosphe no ase [SLac tococcus actis cersi 1 80 70 6 1281 4 4 158 12 12474 1984 10111877423 j galactoae-1-P-uridyl transferase [Straptococcus mutans so 80 65 14911 111 110 1 47 7728 Igij397800 jcyclophilin C-associated protein titus musculusi 80o 60 f 255 619 1 giI149395 IlacC (Lactococcus lactisl 0066 618 ;C 1 313 1 1 1 27 1539 1o11143467 Iribosomal protein 54 (Bacillus subtilil I so 70 513 229 2 1652 1858 1g11533080 maecF protein (Streptococcus pyogenesl 80 63 795 371 1 I95 1g14230 Clzpc adenosine triphosphatase (Bactilus subtilial 80o 58 957 1 8 1 4312 5580 IgiI149435 Iputative (Lactococcus lactial 1 79 1 64 1 1269 23 1 1175 1135 jgiJI542975 JAbcB (Thermoanserobacterium thetmosulfurigeneal 79 61 1041 I 33 114- 1 9244 21 Io~ nlIPIDls25389l IUDP-glucose 4-epimerase (Bacillus subtilla) 1 9 j 62 j 10441 36 3 1242 j 2633 onIP 101e324218 IftaA (Enterococcus hirasiI 79 j 5 f 19 4 3 8 113 7-1-155 18378 jgij4O5l34 lacetats kinase (Bacillus aubtilisi I so 1224 4901 1- 829 4- 4 C- -4subills 79 6 78 1 65. 119 1 8661 1891 ,5 IgiJ2078380 jribosomal proteinL30_(Staphylococcus aureusI 1 79 j 68 255 69 4 3678 j 18 IgnIPIDje3ll452 junknown (Bacillus subt is 79 I 64 J 1551 698 9 6 1 7279 lgi1'677850 Ihypothetical protein (Staphylococcus aureusl 1 79 I 59 J 603
72 110 1 8491 19783 IgnllPIDIl09l Ihypothetical protein (Synechocystia sp.( 79 62 I 12931 4 3 12906 17300 1911143342 jpolyaeraae III (Bacillus subtilisi 79 1 65 4395 12j4 113326 115689 gnIPIDje255O9l jhypothetical protein (Bacillus subtilisi 79 65 j 2364 1 86 113 112233 111118 jgi1683582 Iprephenate dehydrogenass (Lactococcus lactisI 79 1 so8 1116 4 92 3 940 1734 gilI537286 lrlosephosphate isomerase (Lactococcus lactisl 79 65 j 795 4- I 98 I6 14023 1 4742 IgnllPiDIdOO262 jI~lvG protein (Salmonella typhimuriumt 79 63 1 720 4- c. anuo a Puatv cod n a ein .f no e *rten siia to a pro*eia a. aD aI aI ant Ic a.s o I* Ia *aa( 1 9 1 1615 1410 S.1533 pnumn a -g Puatv ctoding Ireiosof noelpt ins s7mla to know proein -4 1 1 11615 1164106 gi1153736 la-alactoidasae (Streptococcus prtiutanslccsfaclsi7 5 2 i-113-- 19-- 64 -2166- 1 107 j 2 j 158 j 6067 19i1506000 jaDal a phyo coianin liaesrlae prti1Etrccu cl 79 67 16672 113 5 j 28685 1 3032 IgiJ1482 Ippsl:iv f146.c2..189 u lyacterlu 1erc 79 I 641 1446 F 4- I 117 180 1982 112213 lgi1145686 I3phosphog lycetcu kiacse 1Temtg 7aiia 9 60 f 3121 4 4 162 Iaclu 2 s15u07 g5070 Iip (tpylcccsares 79 536 180 #4 18 1 7 2386 j 430 jgilj91242 l3 4 Iputetive at ococcusn lateic) ofACtp BcluIutla 79 61 1772 191 1 5 1 49 1 3449 1eil49429 Ipaivlgyea hsht ytae Lactococcus lactis) 1 79 66 6 1 366 4 21 18 3 278 1 2907 IgnIPl3IlO200 I (aB001488 FUNCTION suuniNWN Iacillus subihalis) I 79 I 57 93 18 4- 4 4-- 212 1 7 3893 1 3501 jgnljPl(el83O49 IputatiedATP-lndln protein of AoC-type acils abils 79 so 6723 32 9 1 2 J 4249 1 3489 Iui1495930 rind o slyol poste synthaocs actoilccslats 79 66 1 80152 4 4- 2101 3 1805 1 273 191111440 Imalnoseopemeae phsubunita IN-Hen s l che ihi l) 1 79 64 I 93 22 53 (3635 2362 5gnl(ID32090 jgutrdoll protein Lu.,ptaieaccocus lacti) s 1 79 59 437 4 I I 1 j 0 98 175 Igi j29432 I(ua00220 farginineu suciaesynhs (Bclu1ut s 79 j 64 737 1 4 4-- 32 3 8 gi5B97795 y308 geneosa pro tei (Pediococcus acidilac icil 79 59 67 220 S4 4- 230 10 1 
943 201 jqgiI1184680 58 Isa e sotle io phoshorlasl Bils subtilis) 78 1 64 j 563 4- 2016765 i71 5 g3 ji 2820 409 g 1 5 IU-N- ac--h-erlgluosamlne i-bol- -a-erarboxyvin lanstera s (eactils su tlsj 78 62 527 1 22 1 11138 186 1gni1149432 35 Iq p tai (Lactococcu latc 78 1 63 1037 4 4 1 22 1 35091 12012 191897793 jy98 gne peyroct s [Bed cicus acidilclil j 78 j 59 3622 1 738 1 10105 lnDI dlOOS835 jasta e N sporu atio (Bacllus hl subtls) j 78 585 33 t 4- 4 i 17657 171 IgiI4915 4 IhnoxEn[tae spy oros ytrenseae atcocslatc 78 j 0 19062 4 4 4 TABE 2S. pneumoniae Putative coding regions of novel proteins slmilar to known proteins -tc I onigjRF Str Sop at I'match gene name I sirn ident Ilength I 10 JID I Intl (nt) acession I Itl *1 408001 IgI1731 jGTP cyclohydrase II/ 3 .4-difiydroxy-2-butanone-4-phosphate'synthass5 18 40 f~l 9287 73518 Actinobacillus pleuropneumoniaelj5828 I 48 31 12422 13183 giJ2-314330 IAE000623) glutamine ABC transporter, ATP-binding protein iginQI 85 I 72 I .1 l~elicobacter pylorij 8 18 76 52 2 12101 11430 IgiJ1183887 [integral membrane'prOteill IBacillus subtilisl 78 54 1 672 55 114 113605 112712 Ign11PID1d102026 I(A8002150) YbbP [Bacillus subtilisl 1 78 f 58 I 8941 I 55 117 116637 115612 jgnlPIoIe313o27 [hypothetical protein [Bacillus subtilisi 78 j 51 j 1026 71 f 114 1197S6 119598 Igi1179764 Icalcium channel alpha-ID subunit (Hlomo Sapiensi 78 j 57 I 159 7 5 1 9 1 6623 1 7972 Igil1877423 Igalactose-1-P-uridyl transferase IStreptococcus mutanal I 78 62 I 1350 1 81 112 112125 113906 jglj 1573607 IL-fucose iaomerase (fucil llaemophilus influenzaci 78 66 1782 4 82 I3 1 2423 1 4417 1gi1153744 JORF Xi putative IStreptococcus mutans) 78 I 64 I 1995 C- C 4- 4- -r -y -t-ran te r n -s i 83 iBj1926j150 j~11333 monophosphate cyclohydrolas: (PUR-4(J)IBacillussubiis) 1 780 3 I 83 120 120212 120775 IgiI143364 lphosphoribosyl aminoimidazole carboxylase I (PUR-El (Bacillus subtilis) I 78 64 5641 92-T 2 T 165- ;-878- I~d~l~ IO S rept-oco-cus mutansij I 78- 62 714 98 F T 
5863 | 6909 | gi|2331287 | (AF013188) release factor 2 [Bacillus subtilis] | 78 | 63 | 1047 |
| [illegible] | 3 | 1071 | 2741 | gi|580914 | dnaZX [Bacillus subtilis] | 78 | 64 | 1671 |
| 127 | 4 | 1133 | 2071 | gi|142463 | RNA polymerase alpha-core-subunit [Bacillus subtilis] | 78 | 59 | 939 |
| 132 | 1 | 2782 | 497 | gi|561763 | pullulanase [Bacteroides thetaiotaomicron] | 78 | 58 | 2286 |
| [illegible] | 4 | 2698 | 3537 | gi|1788036 | (AE000269) NH3-dependent NAD synthetase [Escherichia coli] | 78 | 66 | 840 |
| 140 | 24 | 26853 | 25423 | gi|1100077 | phospho-beta-glucosidase [Clostridium longisporum] | 78 | 64 | 1431 |
| 150 | 5 | 4690 | 4514 | [illegible] | amino peptidase [Lactococcus lactis] | 78 | 42 | 177 |
| 152 | 1 | 1 | 795 | gi|639915 | NADH dehydrogenase subunit [Thunbergia alata] | 78 | 43 | 795 |
| 162 | 4 | 4997 | 4110 | gnl|PID|e323528 | putative YhaP protein [Bacillus subtilis] | 78 | 64 | 888 |
| 181 | 10 | 8651 | 7947 | gi|149402 | lactose repressor (lacR; alt.) [Lactococcus lactis] | 78 | 48 | 705 |
| 200 | 4 | 3627 | 4958 | gnl|PID|d100172 | invertase [Zymomonas mobilis] | 78 | 61 | 1332 |
| [illegible] | 3 | 3230 | 3015 | gi|1174237 | CycK [Pseudomonas fluorescens] | 78 | 57 | 216 |
TABLE 2. S. pneumoniae - Putative coding regions of novel proteins similar to known proteins
| Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt) |
| 210 | 9 | 6789 | 7172 | gi|580902 | ORF6 gene product [Bacillus subtilis] | 78 | 42 | 384 |
| 214 | 6 | 1781 | 2797 | [illegible] | P.
haemolytica O-sialoglycoprotein endopeptidase; P36175 (660) transmembrane [Bacillus subtilis] | 78 | 60 | 1017 |
| 214 | 13 | 6322 | 8163 | gi|1377831 | unknown [Bacillus subtilis] | 78 | 62 | 1842 |
| 217 | 1 | 9 | 2717 | gi|488430 | alcohol dehydrogenase 2 [Entamoeba histolytica] | 78 | 64 | 2709 |
2- I- poe I I I I 1Iinfluenzaal III
| [illegible] | 1 | [illegible] | [illegible] | [illegible] | putative [illegible] [Streptococcus pyogenes] | 78 | 65 | 531 |
27 22 [BacgnIX~dO36Irbsma rtinlu L Baills abiii 86 3 1 12
| [illegible] | 3 | 1567 | 1079 | gi|289261 | comE ORF2 [Bacillus subtilis] | 78 | 54 | 489 |
| 339 | 1 | 117 | 794 | gi|1916729 | CadD [Staphylococcus aureus] | 78 | 53 | 678 |
| 342 | 2 | 762 | 265 | gi|1842439 | phosphatidylglycerophosphate synthase [Bacillus subtilis] | 78 | 59 | 498 |
| 383 | 1 | 737 | [illegible] | gi|1184680 | polynucleotide phosphorylase [Bacillus subtilis] | 78 | 64 | 735 |
1 7 115 111923 111018 1LI1i399855 Icarboxyitransferase beta subunit (Synechococcus PCC79421 1 71 I 63 1 906 I 8 12 11698 1 2255 1011149433 Iputative_(Lactococcus lactis) j 7 1 59 I 558j1 I 17 -114 I ,*6948 -i7550 1ii520738 IcomA protein (Streptococcus pneumoniae( 7 I 60 I 6031 112 9761 896 ;g~OOS ;ie Bclu utls 71 43 I 7951 1 36 114 111421 112131 IgiII573766 Iphosphoglyceromutase IgpmA) (iaemophiius Infiuenze( 77 I 64 1 7111 I .55 I3 I3836 1 4096 Igill08640 IveaB (Bacillus subtilisl I 77 I 5 261 I 61 8 18377 1 8054. jgij1890649 Imultidrug resistance protein LmrA ILactococcuS lactisl I 77 1 51 324 I 65 I2 1607 11254 g01140103 Iribosomal protein L4 (Bacillus stearothermophilusl 7763 648 I 6 8 759 7240 IgiI4755l IHRP (Streptococcus suisl 1 ~77 68 270 69 1 1 1083 118 IgnlIPIOIe3ll493 Iunknown (Bacillus subtilislI 77 57 9661 T?7 5 1 4583 14026 IgnIIPIDIe28I578 1hypothetIcal 12.2 kd protein (Bacillus subtiiis( 1 77 1 60 1 5581 6 1 83 114 113104 114552 IgiII590947 iamidophophoribosyltransferase (liethanococcus jannaschii( 1 77 I 56 1 1449 I 94 1 4 1 3006 1 5444 IgnlIPIOIe3299 11AJ0004961 cyclic nucleotide-gated channel beta subunit (Rattus norvegicuslj 77 66 2439 96 111 1 8518 1 8880 1a11551879 IORF 1 (Lactococcus lactis( ~77 62 3631 I 9 jl 1402 1279 gi15337 sugar-binding protein (Streptococcus mutans( )7 I 61 1284 TABL a2 s a S. 
n a Pua tv coin a ein of noe prten simaa ao knw aroae*n 106 pnuona 2 Puatv codin region of2 novel proteins similarilu tolun e know protein 4- 108 14 13152 14030 1g111574730 Itellurite resistance protein (tehll (Haemophilus influenzae( 77 1 58 j 8791 -1-18 j 4- ;3520-,13131- Igij1573900 ID-alanine permease (dagA) Ilaemophilus influenzael 1 77 57 1 390 124 14 j1796 1071 19i11573162 jtRl4A (guanine-N1)-methyltransferase itrmD( (laemophilus influenzae( 1 77 I 58 1 726 1 126 14 15909 14614 IgnlIPiOjIOll63 ISrb (Bacillus subtilisi 1 77 62 1296 ;-28 4- -4-4 10 1 1 1287 jgniIPIDje325013 1hypothetlca1 protein (Bacillus subtilis) 17 1 61 1 1287 1 139 1 5 1.4388 13639 19112293302 (JAF008220) YtqA (Bacillus subtilis) 1 77 1 59 I 7501 1 140 111 110931 1 9582 1gij289284 jcysteinyl-tR4A synthetase leacillus subtilis) 1 77 1 64 1 1350 18 195 119263 jgiJ517210 putative transposase (Streptococcus pyogenes) 1 77 1 66 1 189 4- -4-1 7 I 4 735 5293 ~giI556258 IsecA (Listeria monocytogenes) 1 77 1 59 I 2559 1- 14 2 671 I13 IgnIIPIDIdOOS8S jlysyl-tRUA thynthetase (Bacillus subtillsl 77' 61 1503 U.) 163 15 1 6412 1 7308 IgilsIOtS Idihydroorotate dehydrogenase A (tactococcus lactia) 1 77 62 987 1 164 110 7841 17074 Ign1IP1D~dl00964 Ihomologue of Iron dicitrate transport ATP-binding protein FecE of E. coli 77 52 761 1 1 1 I I I (Bacillus subtilisi F 1 191 1 8 1 7257 1 5791 1giJ149516 lanthranhlate synthase alpha subunit iLactococcus lactis) 77' 57 14671 4 I 198 j 8 15377 1 5177 1gi11573856 Ihyvothetical illaemophilus influenzae( 1 77 1 66 2011 213 11 202 I462 g0111743860 IBrca2 (Iuamusculusj 1 77 1 50 261f1 289 3 11737 1 1276 IonlIPIDjIOO947 IRibosomal Protein LIO (Bacillus subtilis) I 77 1 62 j 4621 4---4 292 2 1399 668 g011143004 Itransfer RWA-GIn synthetase (Bacillus atearothermophilus( 77 58 1 732 4-T I I3 I2734 11166 Ign~jPID~dl01824 jpeptide-chain-release factor 3 (Synechocystis sp.) 
76 53 1569 I -1 123 118474 118235 IgiI4S5ls7 lacyl carrier protein (Cryptomonas phi) 1 76 1 57 1 240 1 9 1 8B 5706 14342 loll 1146247 jasparaginyi-tR4A synthetase (Bacillus subtilis) 1 76 1 61 j 1365 1 10 15 14531 14385 IgnlIPlDIe3l449S [hypothetical protein (Clostridum periringensi 76 1 53 147 4 1 18 12 1 1615 1842 1g111591672 lphosphate transport system ATP-binding protein (lethanococcus jannaschii( j 76 56 1 7 -4 4 4 4- T A B E. S .nu o n a P u t t i v c o i n r e i n o f. n o ve S eoei n a t o k n w r o e n ID, *I *nt I I t See I I@ I ID n It I I acessio I tt) -4 4-4 22 137 1127896 1286 IgI lO4 el338 Iters ioninitiain fatorys IF'] llu AA -72 Bailus 1taohrohls 76 64 378 I6 86 1 28 jniI177 23346 IfapSO (Staphylococcus aureusi 1 76 61 5 9lie 1 58 12 21413 1 20957 19112ID31432 (Ayote003 gluteine [BCanlssortrpeseaepoen(lP)(llcbc 76 Ii404 I SL29 pyloen (A 1-6 I8clu ub )7 521 21 4- I 52 12 1128 1 113786 jgni5 ~252 Idonbo diriedine phbaoytoly ease [atacillus ubtilis) I 76 8 51 064 695 110 111271 60051 5gn~lP105e283110 jferm iD i nStph lcocsd aures)h rl s B cl u t a ohe mp i )7 11 9 -:1111-- T 7I I 5 84 1 6559 giljI29 O 2O IC.s (aherm c hium bea -g u oi a coi8 195lBai ls)u t l l7 0 3 412 6 762 5 406 j72956 5gnlP140e302 Ihnoe tl59)c nsprtein (Bachil subtin)oatr yarl7 6 9 4- 44 82 65 161 J42 419 ~gi5701 1.29proin (mAse 1-66 (alus ubils) inleze 76 58 5 29 -I 4 4- 4- 4- 4 69 69 72971 1118 6005 437 jn5 dx l42 o Py ridine nu c s ide h s hy ase (aius- startrophil) [Bclu 6 5 60 1269 S- I 6 114 I1739 112671 Ign1I4lDe26 2 Iunkown (ycou btrlumtuecuois 76 5 53 1179 4 4 4 4 5 I 80 j1II5 764 5 7936 Igil23l40320 5(uA iE0059 cm posredn hyptei al potelinl eco ctr yl )J 76 5 6 245 948 15 16 9 51 696 1g157330 16.0ain perme ase daA (Heepils nlunsl 76 56 978 816 Il 110 1 12231 gi143806 p ero (B a llu su btaei p tis n A Irp aocu nu oi 76 59 1179 1 V4T Ii-- 4 4 S- 4 1 167 12 1 2156 jl293O IgnlPlDdO320 putive e tiBacillus subtilis) 76 6 6457 4 4 4 4- 1 128 11 1 6 19 3 6 1 7 9 gI114 41 16.0 n 
kd nRuc l s id Ch spo lB ci l s s bt l s 76 I 60 82 3 1 -My g-en 35- 4 123 6 44 595 5g5 3129 I yclr APlasea (Liserlna eonctgns 5 I 245 -4 an a Pu at v co in en of noe prte n si a r o o p o e n a- 4- I- 1 140 114 114872 112536 1gL11184680 Ipoilnucieotlde phosphorylase (Bacillus subtills) 1 76 j 62 23371 I 13 283 j 90 1114395 Itransfer RN4A-Tyr synthetase (Bacillus subtiiis( 1 76 j 61 1323 t- 170 j6 j 5095 j6114 lgnllPIDldlOO959 ycgQ (Bacillus subtilis) j 76 44 j 1020 F 180 280 19279-27 71- 5 gi-- 4g 00l91 4I0 1-821) (Bacillus- subt-- is)- j- 76- 371- 19 7 j 81 j528 gi5588 anthranilate synthase beta subunit (Lactococcus iactis( 76 61 1 508 195 3 13829 12444 IgiJ2l49905 ID-glutamic acid adding enzyme (Enterococcus faecalis) 76 j 60 1386 200 J3 11914 3629 1911431272 lysis protein (Bacillus subtilisI 76 j 58 J 1716 ft ;201 431 1207 Ig112208998 Idextran glucosidase DexS (Streptococcus suil(. 76 57 225 24 (2 1283 2380 (911663278 trsnsposase (Streptococcus pneumoniae) 76 55 1098 T 4 225 13 12338 1 3411 19i11552775 IATP-binding protein (Escherichis coli( 76 56 1 10741 f- f f 4 ft- 233 1 1I 2 1724 IgilI63li5 Ineuraminidase B (Streptococcus pneumoniae) j 76 60 .723 1 347 1 523 3 8 j 9 1j537033: IORF~f356 (Eacherichis call) 1 76 60 460 356 2 1 842 1165 (gi(214990s jo-glutamic acid adding enzyme (Enterococcus faecalis) 76 61 I .678 -366 j- +13- 1-734 1348 j1gi 1-49S -2O jphosphoribosyl anthranilate -Isomerase (Lactococcus Iactis( 76 1 69 387 5 ft- f I S 8 112599 111484 1gi11574293 Ifimbrial transcription regulation repressor (p118) (Haemophilus influenzae( 75 I 6j I 1116 6 1 12553 111894 InIPIDIdlO2OSO ydlHi (Bacillus subtilis) 5 1 51 6601 ft-- 9 (0 7282 6062 (gij 142538 jaspartate amlnotransferase (Bacillus sp.( 7 5 I 55 J 1221 1--0T T 112 I8080 7940 1911149493 ISCRFI methylase [Lactococcus lactis) 7S 56 1 141 4- t- S4 22 j4 1838 1 2728 IgiI1373157 j orf-X; hypothetical protein; method: conceptual translation supplied by 75 62 1 891 1 Ill 1 9015 1 7828 1i11I53801 fenzyme 
scr-II (Streptococcus mutans) I 75 1 64 1188 1 3 fT ft I 3 4 I175 14 1111976 Iipa-52r gene product (Bacillus subtilil I 75 I 53 I 288 1 33 110 1 6470 15769 1911533105 junknown (Bacillus subtilis( 1 75 1 56 702 a a a a a a TA BLE 2S. pneumoniae -Putative coding regions of novel proteins similar to known pr Iptelns Contig lORF I Start (stop I catch match gene name t Sic Iident length 4 4- I 3 (2 I6878 (7183 Ipir(AOO2OSIFECL (ferredoxin (4Fe-45( Clostridium thermaceticws j 75 56 306 -4 I 36 1 1 181 1 2 jg112068739 ((AF003141) strong similarity to the FABP/P2/CRBP/CRABP family of 75 I 4 8 38 122 114510 j15379 1g111574058 1hypothetical .(Iaemophilus influensaej 75 J 56 8701 1 48 133 j23398 124066 IgiI1930092 (outer membrane protein (Campylobacter jejunil 1 75 I 56 1 6691 13-9--*1g i-4 -9 -4 -1-4* Si (1 I838 1537192 JCO Site No. 620; alternate gene names hs, hsp. har, rm7. apparent frameshift 75 50 33116 j 9 1 in DenBank Accession Number X06545 (Escherlchia coll I 18 11966 2059 gi666069 (orf2 gene product ILactobacillus leichmannii( 7 5 58 1194 I 57 I9 j8448 1 7822 (gij290561 (o188 (Eacherichia coli) 75I 50 1 627 I 6 11 (6072 (6356 (gi(606241 (30S ribosomal subunit protein S14 (Eacherichia coli( 75 j 64 285 65-- 1-6 4 4 70-- 40--3071--i-- 2472 1g1711-- 256617 jadenine--- phosphoribosyltransferaseos (Bacillus aubtilisi I 75 I 57 ssI 600il----I 75 f 4 1 71 124 130399 (29404 (gi(1574390 (C4-dicarboxylate transport protein (Naemophilus influenzae) 7 5 I 57 1 996 ;2i910----*-----i-nIIP-Dl--4-6561Yn-T[-ac-l-u--s-btl--- 4- 9 1 10 41 gil 629 128.2% of identity to the Escherichia coli DiP-binding protein Era; putative 71 1 9 I 1320I 79 1 110 [§11146219 Bacillus subtilis( -I 17 i 82 i6 6360 1 6536 1g111655715 InatD (Rhodobacrer capsulatus( 1 7'5 5517 I 83 I6 1 1938 1 2975 Ignl1P101e323529 (putative PlsX protein (Bacillus subtills) I 75 I 56 1038 4 f 93 111 1 7368 1 5317 191139989 (methionyl-tNA synthetase (Bacillus stearothercophilusl 75 58 1 2052 T 9 5 1 1 1 17~95 1 47 
IgnIIPID~e323510 (flov protein (Bacillus subtilis) 1 75 I 57 1749 10 2 j362 (1186 (gnl(P1D~e266928 Iunknown (Nycobacteriuc tuberculosisi) 75 64 825( F 104 i1-j 691 1 915 (gil 460026 (repressor protein (Streptococcus pneumoniael I I5 54 225 1 113 15 1 2951 3883 (gnljPID~dloIll (ABC transporter subunit (Synechocystis sp.) j 75 55 9334- ,44 4 20-- 4- pres-inHrcA--- IStre-t--oc---mutanSJ-1----- -7-4 I 121 6 1 j 201 390 (gi(21451IM (repre s fc lass p eiceat o ckn rg n M 1 5 expressi n c u jtrept cus Iuas J 75 44 5838171 I 37 (8 108 10687 0g1393116 (P-glycoprotein S (Entamoeba histolytica( 75 S 2 6061 4-4- 4- 149 111 j 8499 1 9338 Ignl(PID~d100582 (unknown [Bacillus subtilis) 1 75 55 1 840(1 4 4- TABLE 2 S. pneumoniae Putative cod ing regions of novel proteins similar to known proteins I Contifg (jOarF- S t S art I Stop- ma tch *match gene name -length-- I ID (ID I (n t) nt) J acession (III nt), I 151 1 9100 I63 (gnIPI40467 89 l s poly epi e r at of cilu C srfa il Ctoatr rudis)7 713 t* 4 4 -4 I 1 172 18 15653 1 6774 (gi(142978 jglycerol dehydrogenase (Bacillus stesrothermophilusi 1 75 56 j 1122 4 T 4- 4- 17 61 7 gnh(ID e264 IlS .6 (Cenourhabint reansr IT-idn prti 't)Iampi 5 I 56 1853 I I I i I ~~~~~influenz a)(tasotAPbnin rti 6 I 191 1 6 1 5235 1 4213 1gi11149518 (phosphorlbosyl anthranilate transferase (Lactococcus lactisi 7 5 61 j 10231 226 1 2 !1774 j 1181 Jgi(2314588 ((AE000642) conserved hypothetical protein (Ielicobacter pylon) j 75 1 65 594 4- 1 231 1 1 1 1 1 153 (gi(40173 (homolog of E.coli ribosomal protein L21 (Bacillus subtilis) 5 75 5 57 153 I 234 55( 2 1 41 (gI( 9298 j(AF0082 0 prtein (Bacillus subtilis( 75 5 590 4172 I 21 I7 1 3558 1 3827 (gi(40011 (0RF17 (A.A 1-161) (Bacillus subtilis( 75 48 1 270( 00
375 1 2 1 137 (628 (91(410137 10RFX13 (Bacillus subtilis) 75 1 58 5 4921 i 4 6 (20 (16721 176 1912293323 ((A7008220) YtdI (Bacillus subtilis( 7 4 1 53 8401 7 I6 1 4682 1 6052 (gi(1354311 (PE7'l12-like protein (Bacillus subtilis( 7 4 5 60 13711 4 I 8 4 34 (2427 (gnlPID~d101319 5YqgI (BacIllus subtilis( 74 I 54 9155 4 21 1 6 1 5885 5 4800 (gI(1072381 (glutamyl-aminopeptidase (Lactococcus Iactil I 4 I 59 1086 24 (2 739 58 (I2172 (AEOOO65Sp-ABC transporter. permease protein tyaeE) ti-elicobactar pylori( 74 46 192 4 1 1 1 2 1 367 IgnI(PIDjd100932 (120-forming NADH Oxidase (Streptococcus mutans) I 74 63 5 366 3 (18 (11432 (12964 (gi(537o34 (ORF..o488 (Escherichis coli( 74 I 57 1533 4- -i -4 4 4- 48 (10 (8924 1 6669 (gi( 1513069 (P-type adenosine tniphosphatase (Listerla monocytogenes) I 74 1 53 2256 5 (11 (11964 (11401 (gnl(PIDje2e3llo (femD (Staphylococcus aureus( 74 64 5 5645 4 4 4 I 6 (2 172 47 gi(2921 ((AF008220) putative UDP-N-acetylmuramate-alanine ligase (Bacillus subtilis( 74 1 55 1356 4 4 1 76 (10 1 9414 1 8065 (gnl(PIDldlo1325 5YqiB (Bacillus subtilis] 74 I 4 1350( I 8 I 66 1 926 (pirIC33496(C334 (hisC homolog Bacillus subtilis I 74 55 261- 4 4 86 (9 1 8985 (8080 (gi(683585 (prephenate dehydratase (Lactococcus lactis( 74 55 9064- 4 4 4 4- 4 T A B L 2 S qqmo a P u t a t ie cig r i n s f n v e p o t e i n. m l a t o k o n p o e n TABLE 236 S26 pgn leumoniae -o putatecing rgcionus ofnovl prten simla toknwnprten 4 102 7 1 686 500 75652 jgnlj143394 76 (OtlPP'transferase Bac illocus subtilia 1 74 1 56 I 6489 4- 4 4 4 1-4- 4--*gnll-IDIdl01320-'--- 74- S4- 1-33 I2-1 -1380-49-19 -g19nlI1-PID-le313025 (*hypotheticalprotein (Bacill-us subtilisi 46 6 4 13 1I 17 168 Isn1IPIDIdl00479 INa. 
-ATPase subunit D (Enterococcus hirsel I 74 I 53 I 621 149 j (306 383 jgljlDd1051 Ihigh level kasgamycin resistance (Bacillus subtllis( '74 I 55 876 1P 1D Id -0 05 8-i- I II I i nfluenzael 181 164 1 6 13515 1 4249 IgiJ410131 IORFX7 (Bacillus subtilisl 1 74 1 48 1 735J 7- 52---igi141392 24 171 (1 1 1 1818 IgnlIlDIdlO22sl Ibeta-galactosidase (Bacillus circulansi 7 2 11 4 4 4 17 04 I2392 1011466474 (cellobiose phosphocransferase enzyme 11l" (Bacillus stearothermophiiua( 74 62 1329 1 185 1 1326 1 3 19111573646 Hgf24) transport ATPase protein C (isgtC) (SP:P22037) (Haezsophiius j 74 68 J 324 18 (1 691 7174 10111661199 jsakacin A production response regulator (Streptococcus mutans) 7 4 I 6i I 684 9 I 4 210 (2 (520 I1287 jgi(2293207 IAF00822Oi YtmQ (Bacillus subtilis( 1 74 j 60 768 4-T, 26 3 1192 (gil6669B3 (putative ATP binding subunit (Bacillus subtilisl 74 I 55 645 4- 4 266 69 (35S 1911663232 Similarity with S. cerevisiae hypothetical 137.7 lcD protein in subteloiseric I 74 I 221 I I j Y repeat region iSaccharomyces cerevisiad(4i23l 265 I2 1 844 1 1227 1011492712 jAsparaginase (Bacillus licheniforiisi6438 1- 4- 1 368 1( 1 1 1 942 1911603998 (unknown ISaccharomyces cerevisiael1 3i4 7 (16 113357 111921 (onlPID~d101324 (YqhX (Bacillus subtiliag5 13 :1111 31 (2 I522 (244 IgnIIPID~d100576 (single strand DNA binding protein (Bacillus subtilis) 73 I 55 1 279 32 1 6 15667 j 6194 IgnI(P1D~d101315 jYqIG (Bacillus subtilis( 73 so8 528 34 (1 (128 I 970 InII~l~ IABool6a4) 08F42c iChlorella vulgaris) 7 3 I 46 492( 4 4 040 .0 00 0 w we :10 1e TABLE 2S. pneumoniae Putative coding regions of novel proteins similar to known proteins Contig IORF start IStop match Imatch gene name taim tident Ilength ID J1D jintl I (ntl acession I I 4 j12 9876 9226 jgi11173Sl7 jriboflavin synthase alpha subunit (Actinobacillus pleuropneumoniaej 73 I 55 6511 55 392 83 InlPlDldIOl887 Ication-transporting ATPase PacL. 
(Synechocystis sp.( 73 j 60 I 2754 -4 118 117494 116586 lgnllPIDle265S8O lunknown (Hycobacterium tuberculosis! 1 '3 1 52 1 909 1 116 1 7213 17767 1911143419 Iribosomal protein L6i [Bacillus stearothermophilusi 1 73 1 60 555 4- 66 3 3300 3659 IgnlIle269883 ILacF (Lactobacillus caseil I 73 5236 4- 4- S 4 110 I5557 1 5733 IgiI 657631 lenvelope protein (Human Immunodeficiency virus type 11 73 1 60 177 4 4 71 4 6133 0 262 1gnl1P101e322063 jss-l,4-galactosyltransferase (Streptococcus pneumoniae( 7 3 1 45 I 2130 4- 72 1 3. I 851 jgi12293177 g(AF0082201 transporter (Bacillus subtililI 73 50 849 76 7 019 1 6195 jgnljPIDjdI132S jvqir (Bacillus subtilisi 1 7 3 66 1 825 4- 76 112 110009 1 9533 jgij 1573086 lurldine kinase (uridine monophosphokinase) (udkl (Haemophilus influenzasi 73 j 54 I 477I -811- 372 g-i-I1 377 82 97 5 1 3389 j1668 jgnljPIDjdlOl9S4 Idihydroxyacid dehydratase (Synechocystis sp.( 1 73 1 54 1 1722 4 00 98 9 6912 17619 1jgn11P1Dje3l4991 IFtsK (Iycobacterium tueclss 351 708 0 _T 4 -73 4 128 6 13632 4222 19i11685111 forf1091 (Streptococcus therisophilus! 63 1 591 1 138 12 1 575 1394 1911147326 Itransport. protein (Escherlchia coli 73 1 60 1182 (4 13 112538 111903 1pir1E534021E534 Iserine 0-acetyltransferase (EC 2.3.1.30) -Bacillus stearothermophilus 73 1 55 j 636 4 162 571 491 I -IPIDIC-e323S-1I j-putative YhaQ protein (Bacillus subtI Its) 73 1 so j 7111 4- 1-R _X 1 S- 164 4 j 2323 53279 1gnji592076 9 homooheical detie protein of78 (E.ancoccu nacilss ii I 73 j1 5 46890 164laio 8rti B. 4815d 5546oiu jgijlli 10F1 (Bclu7utls 3 56 73 4 104 5 09 439 32 lgnlIPIDjdIOO9S9 jhmlouIf ndntfe protein o .ci (Bacillus subtilienis)I 73 1 41 89 2 18 7 83 4855 10114622 odulation protein B. hoendg (sheoumc spotiic jN-idn rti 73 56 9 26 I 0g6596 678 1nlP01e217l IPmoo o lcRbsoa protein (Bcilu thuriniensis sut s 1 73 6 1 j 2081 27 I1 2 505 1oi11773151 ladenine phosphoribosyltransferase (Eacherichia col! 1 73 I 51 504 4 *0 a a. a a a0 ae 00 a a TABLE 2 S. 
pneumoniae Putative coding regions of novel proteins similar to known proteins Contig lOaF( Start( Stop match I match gene name I Si sl jtident I length ID (ID j Intl j (nt) I acession (jII nt), 269 1 1( 2 I691 (gnlPID~d101328 lYqix [Bacillus subtilis) I I 3669 289 2 1 1272 832 (pir(A02771(R7HC (ribosomal protein L7/L12 Ilicrococcus luteus 1 73 66 4411 4- 33 11 1 1 48 gI782 A007)hypothetical 30.4 kO protein in manZ-cspC intergenic region 73 47 471 14 48 gi~12 (Escherichia coil III 1 356 j 1 222 1 4 (gi(2149905 jD-glutamic acid adding enzyme tEnterococcus faecalls) 1 73 so0 2191 t 1 7 I 5 1 3165 1 4691 Ignl(PiD~dl0l833 lamidase (Synechocystis sp.) 72 52 15271 I 9 I7195 (7647 jgi(146976 (nusfi (Escherlchia colil 72 54 453I 7 117 113743 j13300 jgnI(PID~e289141 Isiia ohdoyyity-ay carrier protein) dehydratase (BacIllus 72 5944 I 2 (19 (15637 116224 (gnllPlDldl01929 Iribosome releasing factor [Synechocystis ap.) 72 51 j 588 4- 4 4 I 33 117 (12111 j11425 IgnIjPIDjdlOll9O 10RF3 (Streptococcus mutans] 72 1 55 I 6871 38 123 115372 (16085 (pIr(I16410,8(1641 L-ribuloae-phosphare 4-episierase (arafl) homolog Hacutophilus lnfluenzae 72 54 7114 1 1 I 1 (strain Rtd 3CW20)III 4 39 5 1 5094 1 6905 -IgnlPIDe254877 (unknown (Hycobacterium, tuberculosis) 72 56 1812 1 6 1 4469 1 4636 (gl(153672 Ilactose repressor (Streptococcus mutans) 72 j 58 I 168j1 48 2 159 153 gi(1080 Inlbi bta-A-subunit (Ovis aries) 72 33 207 _nhIbn_ (829 1129 122424 g1342 (AE000623) glutamine ABC transporter, permease protein (glnPI ((tlicobacter 72 49 696 I I I I I pylon) II 2-- 5l1( 3 1 1044 1 2282 (gl(2293230 ((AF008220) Ytb3 [Bacillus subtilis) 72 54 1239 I i1 4- 55 1 41 5 91882518 IoRF..o304; OTO start (Eacherichia coll 72 I 59 1 807 4-4 4- 1 75 1 5 1 2832 1 3191 (gnl(PlDje209886 (mercuric resistance operon regulatory protein (BacIllus subtilis) 72 44 1 360 76 1. 
6 1 6229 1 5771 (gi(142450 (ahrC protein [Bacillus subtilis) 72 53 459 72-6--474 I 79 5 5065 (4592 (gi(2293279 I(AF008220) YtcO (Bacillus subtilis)726 1 44 -4 1 87 (34 (14726 (12309 (gn)(PIoje323502 (putative PrIA protein (Bacillus subtllis) 72 52 I 2418 9 444 1 662 (gi(500691 (HYol gene product LSaccharomyces cerevislse)( 72 so5 219 91 7 1 4516 1 4764 (gi(829615 (skeletal muscle sodium channel alpha-subunit )Equus ceballus) 72 38 1 249 0. 5 5 55** 5 *00 1 151 gnl.5e23 2 puat v 5**2 *ro ei 55 55us *sulsl1 7 4 8 4 4 1925 21-- 2004 1717 IgnljPIDje323521 Iplutative-bAsp23 p eis cprotein Ifiacillusti subtlis 72 1 40 2880 1 130 1 j1 475 j 1178 1giJ2133l jallAF05 ca osphat s regu atoyprin Baills sbls 72 53 7334 4- 4 4 4 S1262183 2929 IgniPID7d2 l83 Igltamine-bindl n erplasmic prtenIyehcytss. 72 46 2190 1310 0 1961 73 92478 giJ49249 I(AF01575 Iyehcarox pepids (Bclu sutiis 1 53 I 199 7 4 190 13 1 55 2297 1giljI47 292 5 lvptveticaAr ei l~nercoccus huim) 1 2 1 46 6 5 4 4- 1 147 1 2 1 2084 1 1083 jgnlIDje325016 [hypothetical protein (Bacillus subtilis) 72 1 56 j 1002 0 4 1 147. 5 16156 1 5146 lgil472327 ITPP-dependent acetoin dehydrogenase beta-subunit (Clostridium magnum) 72 56 1 1011 1 148 18 15381 16433 jgiI974332 INAo(P)H-dependent dihydroxyacetone-phosphate reductase (Bacillus subtilisi 72 54 1053 4 4- 1 148 114 110256 19675 IgnijPXDjdI13l9 IYqgNI4 lacillus subtilis) 7 50 5821 159 8 4005 14949 J gij1788770 j AE0003301 o463; 24 pct identical (44 gaps) to 338 residues from I 72 j 43 f 945I I Ieni cillin-bndng protein PBPE-BACSU 5SW: P32959 (451 as) IEscherichia II 1 172 j10 1 9907 110620 1011763387 Iunknon )Saccharomyces cerevisiae( 72 I 55 I 714j 4--4 4 220 1 3 j 2862 1 3602 Ig111574175 Ihypothetical (Hammophilus influenzae) jo 72 041 4 267 1 1 j 3 1449 jgil290513 1I47 )Escherlchla colli 72 j 48 447 281 12 8 99 1540 1 gnIIPIDIdOO964 homologuecof aproi se2 alpha and beta subunits LysC of B. 
subtilis 72 45 360 29 1118 1 1, 914495 Ti OFishmooos oa 00kd hypothetical protein in the htrB,,' 1 72 54 1005 4- I 06 I 6 5 7 1 74 39 tran c ip i n el naion fro C l A c tssor Numceric i X 1 0 yco)m k 72g j 5052 43--F587--g-147 -4 I 16 j 126 4 0i1512 protein kinase C (Drosophila melanogaster) j 72 1 40 1323j 4- 4 342 1 227 3 IgnlIPIDIdlOll64 Iunknowt (Bacillus subtilis) 172 54 I 225j T 4- 4- 4 I 5 I1 1 05101PI~l2O48 IC. thermocellum beta-gluc .osldase; P26208 (985) [Bacillus subtilis) 72 52 1005 4 4 4- 6 10 .1 8134 110467 Ion1IP1DjI64229 Iunknown (Hycob .acterium tuberculosis) 71 S7 I 2334 4- -4 1 7 120 116231 j15464 Igi118046 13-oxoscyl-Iacrl-carrier protein) reductase (Cuphes lanceolata) 71 52 768 1T~ 12971 2 Io n 1I- PD1-0-1dlOOS- 71 ireplicative DNA helicase (Bacillus subtilisj 71 51 12961 I 5 4 I4415 ]3869 I911499384 Iorfl89 (Bacillus subtilis) 71 47 5671 -4 TABL 2 4 9 9 .nuona Puatv coin rein o4. noe *r n simla tokow roen 29 1 1 S pnemoia -iI 4 Putlartive oding oei i ET-EO region of nove proein simla to knwnprten 4lshrchaclj---- 4 Coti8IR 20 Start7 Stop3 match03 matchI5 geeri amel 1 aim j504et ent 4 4- 4 51 112 115015 112676 1911149528 Idipeptidyl peptidase IV (Lactococcus lactisi 71 55 I 2340 55 23120401258 ig~24385 I(AF0154531 surface located protein (Lactobacillus rhamnosus) 71 58 456 4-4- 1- 210 265 lgnllPIOIdOl32O lYqgZ (Bacillus subtilisi I 1 44 441 71 118 124679 126226 1gij580920 Irodo (gtaAl polypeptide (A.A 1-673) (Bacillus subtilisi 71 44 1548 71 125 130 587 110360 1911606028 jORF.o414j Geneplot suggests frameshitt near start but none found 1 71 j 5 2 I I I II Escherichia coll I0 I I 4---------;Iysine-e--rboxyla-s--I8ac 1214 111991 112878 1911624085 Isimilar..toorat bete-alanine synthetase encoded by Genflank Accession 71888
I I I Ivirus 3 1 1 "3 111 j 7269 17033 .1gi11906594 jPN1 (Rtattus norvegicus) 71 1 42 1 27 4 1 74 .1 6 110385 8517 IgiIl573733 Iprolyl-tmlA synthetase (proS) (Naemophilus influensael 1 71 j 52 18691 7 81 9 5772 16578 jgi1 147404 jmannose permease subunit II-N-Nan IEscherichia colil 71 45 807 4- 4 86 5 60 j3604 IgnlIPIDje322O63 ss-l.4-galactosyltransferese (Streptococcus pneumoniae( 53 999I 7 4 ;--105 ;4 -i3619 -i4707 jigi1232334i 11AF0144601 PepQ (Streptococcus mutans) 71 58 1089I 1 106 113 113557 112955 19111519287 jLemA (Listensa monocytogenes) 71 1 48 1 6031 -114 -2 -+1029 -i1979 1911l310303 siosA (Rhizobium melilotil 1 1 55 I 951 1 122... 2. 564 j1205 19111649037 Iglutamlne transport ATP-bindlng protein GLNQ (Salmonella typhimuriuml 71 50 642 132 5 9018 7063 jgnljPlDjdlO2O49 H. inlezehypothetical ABC transporter; P44808 (974) (BacIllus 7 1 j i 140 1 1 1141 227 bqI6378 IAE000015) Nycoplasma pneumoniae, fructose-bisphosphate aldolase; similar 71 4 1 I 1I111638 to swiss-prot Accession Number P13243. from B. subtilis (Nycoplasa I I I I I pneumoniael 140 5 15635 j4973 IgnlJPOIdOO964 homologue of hypothetical protein in a rapamycin synthesis genie cluster of j 1 48 663I I I IStreptomyces hygroscopicus IBacillus subtilil III IglIIldO00 (A0148 FNTINUNNWN, SINILAR PRODUCT IN E. CDLI AND NYCOPLASNA I 71 51 I 477 I i I I PNEUNONIAE. (Bacillus subtilis) l TABLE 2 S. pneumoniae -Putative coding regions of novel proteint similar to known proteins Contlg Jonr Start IStop j match match gene name %sim i dent length I ID JID Int) (nl) I acession I Ii ntl 1 13 1 1 I165 igi46912 Iribososial protein L13 (Staphylococcus carnosus) 71 f 59 I 165 4: 4- ;199 i3 I110IO 1319 1gi12182574 J(AE000090( Y4pE (Rhizobiumsp 5. 
NGR2341( 14 9 20-2266gi352 191787378 I1A20002131 hypothetical protein In purfl 5' region (Escherichia coli( 71 57 1137 1 209 12 12022 1 1141 IJ41432 Ifepe gene product (Escherichia col) j 71 46 f 882 -4 4 1 210 1 5S 1911 13071 1g1149316 IORF2 gene product (Bacillus subtilisi 71 45 1 1161 1 210 1 6 3069 13386 IglI580900 banF gene product (Bacillus subtilisi 71 j 48 J 3181 4 212 j2 3561 1 381 Igil557567 Iribonucleotide reductase Rl subunit (Mlycobacteriums tuberculosis( 153 28 -11- 233 3 00 J2920 JgnIlPlD~dlOlJ2O jYqgft (Bacillus subtilisi 7 5098 I 244 1 03 105 7-prtk-as 24 1 105 IenIPIDIOO964 horslogue of aproi se2 alpha and beta subunits LysC of B. subtilis 71 55 1041 251 2 008 I 874 jI750 Iun~ (Bacillus subtilis 14 6 4- 2512 2 9 0 1 18 Igljl53874 junknown (RhBacl u t a slat 71 46 895 M 31 13 06 1 565 1gnll110d104 5 (IABnOOSS4 yabEBacillu ca suatsls 1 71 1 46 15 I 31 33 1 j 13 3 I 6 gn1I1591 0 22 Ih pothetica proteBi lu I sP;P31 66 (1ta o o c s anshi 71 4 j 681 4 4 I 346 1 1 3 1164 IgIJ1591234 Ihypothecical protein ISP:P42297) (Hethanococcus jannaschil( I 1 36 162 4 4 I 377 1 1 688 1 2 1gi1397526 jclumping factor (Staphylococcus aureus( 71 1 23 687 4 4 I 3 j 8 1 7419 16958 jgn1jP1oje269486 jUnknown (Bacillus subtilisl I 0 1 42 1 462 1 110 1 8395 1 9075 jgnljPIDje2SsS43 Iputative Iron dependant repressor (Staphylococcus epidereidis) 1 70 1 46 681 14 11024 j10254 IgnlIPIDldloo29o lundefined open reading frame (Bacillus stearothermophilusi 1 70 1 55 771 -118 114213 113719 gIj~PIDIOlO9O jblotin carboxyl catrier protein of acetyl-CoA carboxylase (Synechocystls 70 56-- 1 1 1 1 1jsp.(II 2 105 1 28 I ignIlPIDIdOOS81 Iunknown (Bacillus subtilis( 70 52 j 771 4- 40-- 4 42-2 21 j2 12586 11846 1giJ2293447 I(AF008930) AT~ase (Bacillus subtills) j 70 1 54 7 41 22- 2 13-1 -101-955-5 1--111512 9 -loll 116529 5 jd-r-4O-----drs-c--aocp (Saccharozsycess cere is-ia--7e(-- -70 -0-50 5581--1 6 4315 3980 101(39478 IATP binding protein of transport AtPases (Bacillus firmus( 70 51 1 336 4- 
a. a a a a a a a a a *a a a a TABLE 2 S. pneusioniaa Putative coding regions of novel proteins timilar to known proteins I Conrtig l*ORF j -Start I-Stop match ma-- tch genename ident--- -length--- I ID liD I ntl I Intl j acsso j ntl 1 I 3-1 3i 1 370 ;113 jgi1662792 singla-stranded DNA binding protein (unidentified eubacteriws) j 70 J 36 258 1 33 115 110639 19521 fgiI 161219 jhomadgous to D-amino acid dehydrogenass enzyme (Psaudomonas aeruginosal I 70 50 1 1119 I 8 6 I3812 14312 19i12058547 IComYn (Streptococcus gordonil) 70 48 j 501 :1111:- 38 125 117986 118477 1911537033 IORF.1356 (Eacherichia coli( 1 70 58 1 492 11 4 ji j11054 1 9846 19111173516 riboflavin-specific deaminasa (Actinobacillus pleuropneumoniaa( 1 0 52 1 1209 -0 -722-- putativ---BC-11----s-bt1li- f- I---3-1 I 43 1 3 1 2373 j 1612 19111591493 1glutamine transport ATP-binding protein 0 Iethanococcus jannaschil( 70 48 j 7621 I 45 1 8 19197 18049 1gn1jP11d102036 Isubunit of ADP-glucbse pyrophosphorylase [Bacillus stearorhermophilus( 70 54 1 149 I 59 2 f567 j956 IgnlIlDjdlOO0302 Ineopuliulanase (Bacillus sp.( 70 42 390 3- I 1874 795 IgniPI e276466laminopeptidase P (Lactococcus lactisl 70 48 1080 61 4 5553 2437 1gnl1P101e275074 ISNF (Bacillus cereus( 70 51 3117 F_ T ft I 61 17 17914 16802 10111573037 Icystathionine gamma-synthase imetBfll Haemophilus influenzael I 70 52 1113 I 7 f5372 I7222 IgnlIPIDjdIOO974 junknown (Bacillus subtilis( 70 54 I 1851 t I 68 ;7 +1 7126 6962 19111263014 jemmle.1 gene product (Streptococcus pyogenesi 70 37 I 165 1 72 in2 110081 j10911 Jg1J2313093 i 1AE000524( carboxynorspermidine decarboxylase (nspC( (Helicobacter pyloril 1 70 56 831 11 I788 814 9117723 Ialactose-l-P-uridyl transferase (Streptococcus mutansl 1 70 59237 g I 9 I 3424 1 2525 g91139881 lORP 311 (A.A 1-3111 (Bacillus subtilisl 04 0 87 1 9369 7324 70j1De230 puatv Pkn 9rti Bclu sbii)-a500 I 9 96 14 110640 111780 19111573209 ItRNA-guani transglycosylase itgt( (Hsemophilug influenzae( 7 52 1149 S 1 74 1 1086 jgi1433630 
IAI80 (saceharomyces cerevisiael 0 1 5 1 1 123 5 1 2901 1 3461 Ign1IPIDId10S8S Iunkno~n (Bacillus subtilisl 70 45 51 4 f- t t- ;-125 j -15--145090 13 -428 g-ll-P-D -l-0 314-- I 04 6 268 19 cJ2 apacira) tive c aci u enty cannl 1(osturs 70 so 1 12 I 2 5 40 35 InIIDdoll yqre Bclu subtiliocc slejoie 70 j 47 1 1047 135- 4 20- 4- ft- -ft 133 68 19394 19i122912 I(Ayp020 Ya-tfr (Ba rcill us ubti l 70 57 21 Ft- -f"T f- 18 1 440 j 3 1911147336 Itranamembrane protein (Escherlchia coll 70 I 42 438 aAB. 2 4 4 4 4 S. 4*u a *uat v co in re i n of no e pr te n sim la to k ow. ro e n 4 Contig 0 l 1~ Start Sto 669 match53 matchgne nciae n enym aimtbailu %aei idn 274 lgljind 02 4 jE Intl hy o he i a prntin ace180o 12 7 11a il u Iut l s 1 nt51) 4 140 116 118796 116364921 101197644 NS-mthy tyo olate hisomsen ehlrnfrs Schr ye 70 45 243 113 I8 Ig 33j cern visiae IRo ob c e Ias l s 705 I i- 204 12 32265 1145 1on11P10dl20489 ILcihypoth etical protein 380 (27 (Bacillus subtilis) 70 59441 400 4- I 207 3 j 2-682 1526 Ignil2 j392l iraAP (Dicty osteallu d s ium 1 70 45 I 21431 4-- 22 39 11362 8821 1g11135387405 junknown (RhBacu ter c slat 69 70 51 25510 2 l 12 9551 118453 jgnljPI lO2389 uhynohw cloen [Bacillus subtilis 69 j 44 1
22 6 j62 84 j Ij2209379 2 I(Ar006720) PrCO N (B NN 1acillus subtilis 69 2 8 14 F 4 4 22 11 I 97 110716 IgnlI4lO9 lOOS l Isu n Bcl l us s ubthtils Sahlccu ue) 69 I 51 2897 -4 4 27 j 7 j 585 5348 Ign II d20 i2 11aB001488 FUNCTOgNas UNKOWN (B a l lsbillns jurph 69 28 j 109 36 0 1>0 721942 1105116 gi1453916 Iieol3e yl n sntetays ds (Staphy loccu (aempusi nlunae 69 50 S7826' 47 38542190 1gi1141008 sach dbacydrilyogen ytase f.11 Acllens ubtrophs( 6 48 108 1 40 134 112332 121544 1gni17 38 1Hyo ia junct ei n DNahellas (uvA)lisempiu inlese 69 j 36 j 61 61 16194 11621 jgiI3653 IDN-m thla denlne glycora siase Iystagenym) 1 (Ilseeop ihi inf luen e 69 so 576 I 49 a 6 569 j 5490 1011584088 lstmar o (acraliglyen) syntohseG (Baciluls subtilispht 3Ipm 69 47 7 48 243 124353 1gnl1P1Dje338 1hypothetical protein (Bacillus subtilis) I 69 36 7380 T 4- 4 62 1 3 1 178 1 8338 1011396420 l I sie[h aro t o lcalrtine Jyetrohuts pNG1 1iuoe5popht pmrs 69 49 1 249 1 IRF (p t Iu at v Ma t c c u la t a (E c er i co9i 42 1 6 1 3 6 I8627 7033 10i111438 lpholyAp olera se Bacillse E 3ubt 15) Lco cusl ti crm t 69 50 14230 4T 4 -5 -5-26 -1 1- 33 -0 4- 69-- 4-6- -4 1 63 6 7281171019017 106 g~P~ 2 0 873 p tav e ac etococcus actis) je 69 42 f 6465 4 T A L E 2S p n a P u a t v o i n r e g o n o f n o e p r o is S i m l a t o k n o w p r t e n TABLE 273 126 pneumdoniae -Putatiecin regpions of novelpteins simla to know protein 362 4101 Ig I530 Ifnt)s opnt) prti acessio j~eopiu nleze 9 1 52 (n480 71 1 4270 127966 lp IOdo49C3 iEcaherino (Drosopilau mebtligste1 69 1 4 23675 V 4- 4 I 83 I 14 263 jgij132877 jgr s bo prod c nmd fLc o us ltats)eas (PRN Bcl uIutl s 69 46 59237 j 13212 11 19gl19407 IIfucos p on promt bdi ngcU (factorhllus Inf cluac) 69 48 52948 4 58 16 36578 142 7 1633 1 1 4712 Iphosph ribo l gycnside-t fpossyltrnsfe ase aPUcN)iva illu p bti isr) I 69 46 597 9* -H I I I II Heewophilus influensam)
I 9 jS 327 j4032 IonlIPIdIOO262 ILIUF protein (Salmonella typhiwuriuml 69 51 786 I 108 j 5 14085 15056 1gni1PIl3e25762s Itranscription factor (Lactococcus lactis) 69 49 I 972 I 126 13 1 3078 14568 IgnIIPIDIdIOI329 IYqiJ (Bacillus aubtilil I 69 j 49 I 1491 131 1 6 14121 12869 19nlPIt3dl0l3l4 I~qeR (Bacillus subtilis) 69 1 47 I 12331 ;-136 ;2 ;-1505 ;2299 Ign-lIPIDldl0058l junknown (Bacillus eubtilie) I 69 47 j 19 (S 32 473 gnPIj33sjYloQ protein (Bacillus subtilis) 69 50 912 1-49 4 4- 1o-ology w-ith-E -oli .aerugI oS ysA-gene;----- r-duct--t 1i O 11 I I I function; putative (Pseudowonas syringe 1 13 4 3191 3829 1g111710373 B8mg (Bacillus subtilis) 69 44 I 6391 169 3 j849 j2324 IgnlIiDIdlO0s82 Itemperature sensitive cell division (Bacillus subtills) I 69 49 I 1476 4 4- too0 1 1 566 j gi1488339 lalpha-awylase (unidentified cloning vco)j 69 50 564 31-g4--9209,- 4 4- t-Mycbactriu-tub4 -ulsis]69-5-96 226 1 1 1 2 1661 IprIJQ2285I3Q22 Inodulin-26 soybean 69 41 6601 233 5 3249 14766 1911472918 Iv-type Na-ATrase (Enterococcus hiram) j 69 56 I518 23 3 I 60 I 76 Ii~4845 Iethylase Illaeeophilus influenzael) 69 43 I 1107w ;660 11766---. -1 I--I1 489 I 23 2 j865 12361 IgnlIPIDIdlO2 ORF (Barley yellow dwarf virus) 69 1 69 j 1497 4- 251 13 1 2899 11967 19112289231 Iwacrolide-efflux protein (Streptococcus agalactiae) 1 69 51 S 9331 I-PI-le322442--epti-e--e--r-y----------tr---um 45- 41 4- I 369 1 8 68 1 2 1911397526 jclumping factor (Staphylococcus aureus) 69 22 8671 I 30 1 j 79 Ii13756 clumping factor (Staphylococcus aureus) 69 21 j 747 S 5. 5 0% TABLE 2 S. PfleumOniae -Putative coding regions of novel proteings imilar to known proteins atch-- -1 -at h-g e- na ID JI I ntl Intl acession I I s I dnt lnth 39 11 1 44 f 280 lgnlPIDIdlOO649 IDE-cadherin (Drosophila melanogasteri 1 69 30 t 37 388 1 260 72 gi1872 (A2000225J hypothetical 69. 30 237ei intp-tRitegncrgo I I I [Eacherichia coli) 32. 
lc rti ntp-tu negncrgo9 44 189 4 4- 1 2 12006 1 3040 lonliPIDldlol8os IABC transporter [Synechocystis sp.1 1 68 43 I 1035 4 12 15 1 3958 12600 19112182992 Ihistidine kinase ItLactococcus lactis cremoria( 68 45 1 1359 1 2 j 1790 j 1311 IpirlSl6974laSs jribosomal protein L9 Bacillus stearothermophilus I 68 ,56 1 480 16 16 1,7353 *5701 1 911178704 1 1(A5-000184)- o530O; This 530 aa orf is 33 pct identical (,14 gaps) to 525 1 68 41 1653 I I I I I jresidues of an approx. 640 aa protein YHESJ4AEIN SW: P44808 (Escherichia I Ijcolil I I 17 112 1. 6479 I 6805 1911553165 Iacetylcholinesterase IHomo sapiensi 1 68 1 68 J 32)1 113 114128 114505 1911142700 IP competence protein (trg start codon) putative (Bacillus subtilis) 1 68 1 40 1 378 i-2--i2--1 1 1 7 j 4548 4288 19ii311388 loan (Azorhizoblum caulinodans( 68 1 46 I 261 I 36 I 5 I 3911 4585 19111573041 Ihypothetical (Haemophilus influenzael I 68 54 I 675J' 46 6 15219 6040 19111790131 I (A5000446) hypothetical 29.7 kD protein in ibpA-gyrB intergenic region I 68 4782 I I I I escherichia colil
I 54 110 I6235 7086 jgij882579 ICG Site No. 29739 IEscherichia coll 68 j 551 852 I 5 I5 I7069 5165 lgnlIPIDldIol9l4 IABC transporter [Synechocystis sp.) I 68 45[ 1905j 4- 4- -4 71 3 j6134 5613 Ig111573353 louter membrane integrity protein (tolA) Ul1aemophilus influenzae( 68 1 50 522 4 4-T 1 71 110 115342 116613 jgij560866 Iipa-12d gene product (Bacillus suhtiiis( 68 31 J 1272 71 11 f750 1892 191407 SecY protein (Lactococcus lactis] 68 35 I 1233 I 1 J 1 22295 J24703 19111762349 jinvolved in protein export (Bacillus subtilis( 68 50 2409 4 I 73 116 110208 1 9729 1gi11353537 IdUTPase (Bacteriophage rnt! 68 5148 4- I 8 18 1798 j1611 1914393 ipa-19d gene product (Bacillus subtilis) 68 53 118 4 87 117 117491 115866 1911150209 jORW I'(lycoplairta mycoides) 68 43 1626 89 j6 539 354 191149824 I~.janaschii predicted coding region K4.0062 liethanococcus jannaschilj 68 j 40 f 786 1 89 111 j 8021 18242 1i1I150974 14-oxalocroronate tautomerase (Pseudomonas putidal 1 66 43 I 2221 97 1 8 16755 I 5394 1gi12367358 I(AE000491) hypothetical 52.9 kD protein in aidB-rpsF intergenic region I 68 41 -4 4 -2- I I I I I( Escherichia colil I I 32 4 9.999 9oo 999 9 99 :9 TABLE 2 S- Pneumonie Putative coding regions of novel proteinsr similar to known Proteins m- aoti t1 Star hf Stop-- j- mthatch gene-name------- sm ie t Iln h I D lIV I nt) Int) I acession a sh n idn jIent 4 -98 13 f 1418 -;2308 -IgnljPIDdlOo261 VivA protein (Salmonella typhimuriumj 68 40 891 4 4 4 j3 1 16414 1-1728 -0 jgij-455363 Ireduklatory protein (Streptococcus mutans) I 68 j 50 j 867 1 115 ,I5054 -;3693 gij466474 Icellobiose phosphotransferase enzyme 11'' (Bacillus stearothermophixusl 68 1 44 13621 4 4- 4- 124 ;7 -j1- 3394 3321 !gnhIPmo~dloo702 Icucl4 protein (Schi zcsaccharomyces pombel 1 68 1 56 174 I 125 I2 j 2923 1922 jgiJ4S0S66 Itransembrane protein (Bacillus subtilial 68 1 50 1 1002 I 132 I2 1 4858 1 2888 j gnljPIodlOl732 JONA ligase (Synechocystis sp.J 1 68 1 52 1971 ;-140 ;7 17765 -17580 jgj 1*9iI209711 -Iunknaw-n 
(Saccharomyces cerevisiael j 68 1 47 1 186 I 150 1 539 j 3 ig1j402490 I-ADP-ribcsylarginine hydrolase tHus musculus) j 68 j 59 537I i164 58 1867 IgnlPIDle2ssxld Iglutamate racemase (Bacillus subtilis) l 68 49 I 810 -4- 4 F 169 17 ji3946 1 4104 jIpirI-B54s-4sI1-Bs54s Ihypothetical protein Lactococcus lactis subsp. lactis plasmid pSL2 j 68 j 0 1 159 170 14 14247 1 4396 IgiI304146 Ispore coat protein (Bacillus subtilis) I 68 1 52 150 1 171 1 8 1 6002 1 7054 1gi138722 1precursor (aa -20 to 381) (Acinetobacter calcoaceticus) 1 68 54 I Oc 1I98 3 j 2473 1 18-71 Ion~jPIje3l3O7s Ihypothetical protein (Bacilljjs subtilisf 68 46 6031 1 211 1 2 1969 1 1802 IgiJ1439528 IErIC-man (Lactobacillus curvatusl 68 j 45 834 4 1 214 1 8 1 4926 1 4 31 ion1lIIdIo2o49 IN. influenzae hypothetical protein; P43990 182) (Bacillus au btilis( 68 j0 696 21 95 51.70 tohodra reere683621 218 *----!transcriptase (Arabidopsi~s thaianiac) r .I rvre 8 j 3 1 I--1 [3930 1 4745 IgiJ2293198 I(AF0082201'YtgP (Bacillus subtilis( 1 68 38 816 220 6 1 4628 j 4338 lgn1lPI0le32s791 I(A30000051 cr11 (Bacillus megateriuml 6 68 f 51 291 4 -4 236+ -10- I 1 746 J1 4 108 0oI 3 -7 IO F l1Bc l u u t l s 68 1 46 j 639 4 -237 *12 675- 1 1451 Igi1396348 Ihomoserine transauccinylase (Escherichia cclij 1 68 j 49 f 7771 4 1 254 111 517 155 gi1815 (AE000189) o648 was o669; This 669 aa orf is 40 pct identical 11 gaps) to 68 44 363 217 residues of an approx. 232 aa protein YBBAIIAEIN SW: P45247I 337 1 1 1 774 IonlIPxDIe26l9so Iputative orE (Bacillus subtllis( 1 68 47 774 345 1 3 653 lgill49513 Ithymidylate synthase (EC 2.1.1.45) (Lactococcus locris) 1 68 61 651 TA L nuoies Puatv coin regon of nove poeno a to. know prten TABLE 2 s69 pneu15 oniae M. 
Putaive rec coding region of50 noe rtirimilatknowncu protns 626 1 06 i I(F020 Contil troaFcio reuao Startlu Stopls matc 4j cac0gn7nm i-D -51- nt- t- acessi-- i nt--- 38 2 3 1 1 54 Ig1J 157335 jouer057 mem ra Inetypoteisntolta) (Haccophls influese jy i 68 51 142 22 1 6 5397 51 87 jgiJ128293 l(AF008220) siga xla ran ductn reuactordi (Ba c sabis 67 44 8041 4 13S j 3 1 574 1g1l23213 j1AE0041 para-amintoerteu dsyntheaIaum (eioatelyoi 67 48 1728 4 2--32 8 8709 j17897 giJ419296 IyRroine-5prox t e redctiae (Ati n i de1ioa 67 5 71 13804 36 29 6 114835 90724 1911i144 gtc geaneprt pro (bailus APbeis)n uui mtanccu ansh 67 1 41 738 211 13795 1455 giJ24533 (RF0199861 PsBhrc Io cotllu isodem 67 52 735 32042 8 111 giJ4210 jORan n g enzprduc (acerihi coEi 2...8.~clu taohrohls 67 517 132 4 44 I 6 16 118304 j17514 l0J1 942. IaBC 5 tranrer pro bal u AlP biisubnt(ehnccu]anshi 67 50 71 4----423 I 1 8~ 317 1141 3 giI570371 jOimrF.21 trancrichian col toerso (iB Heohiu nleze 67 52 738142 4 F 614 123 118344 117514 1gnji4139 4 )I IiTP-25dign produte (Bacispre Istilis) ocu ar 67 50 1 81 -41--F 141 611--;g 1-1 1- 7- -u 1c 1us- 4 813 1 43891 493 Igijl7429l4 Iimbriaenlp ranscriptio reultonphorses (pB)leemphi us I influae 67 40 495 4 4 4 618910 6829 bgnIIIe47ll PS jan-bining casette ransore A1Sahlccu ues 67 1 5 88132 971 2 5861 13611 IgnIP1197667 3 jvleloge3 nn(amoed puleillou t[tetoocs(uas 67 1 36 5 04 61 448 1483 1giI11214 phosphgenopruvat emaseci phoporasers elmn1itatbclu 67 42 495 1 4 4- 4 83 2 19 j 3148 IgiJ1276746 lUcy crr protein Porphyr pnf urpuea 67 437 81 S4- 4- 4 4 4 puaiecloisphshtntrse nze-- 4 4- 4 -47 C0 .C *0 C0 C* C TABLE 2 S. pneumoniae Putative coding regions of novel proteins similar to known proteins 4 Contig 101WI Start Stop match match gene name I~ aim 8idant length I ID lID I Intl I Intl I acession I gtnt), F 11-5 i--;8421 T 8077 fil466473 Jcellobiose phosphotransferase enzyme II' (Bacillus stearothermophilusI 67-- 511--- 345 I 127 113 18127 7021 1911147326 Itransport. 
protein (Escherichia colil 67 45 1 3107 -4 136 13 12215 12859 lgnllPIDIdIOOS8l junknown (Bacillus aubtilis) I67 1 49 1 6451 4 ;-1-40-121 1233 17 -120906 -IgnlIDIdlOl9l2 jphenylalanyl-t .RNA synthetase (Synechocystis sp.i 67 j 43 j 24121 4 146 16 12894 11893 1g112182994 Ihiatidine kinase (Lactococcus lactis cremoris) 67 1 44 j 1002 -4 4 4 1 151 18 111476 111117 lgnlIPIDldlOOOBS 10RF129 (Bacillus cereus) j 67 1 48 360 1 t i60 10 1 7453 8646 Igii228l3l7 lorfe; similar to a Streptococcus pneumoniae putative membrane prot ein 67 46 1194 I I encodedoby DenBank AccessiontNumber X(99400; inactivation of the orfa gene leads to LW-sensit ivit y and to decrease of homologous recombination (plasmidic test) (LactococcusI 4 163 13 13099 14505 IgnlIPIDjdlOl3l7 jYqfR (Bacillus subtilis( 67 47 1 14071 16 74 j5454 1g111161933 IDltB (Lactobacillus casei) 67 45 1 1251 1 4 171 Ill1 7656 8384 1gi1153841 Ipneumococcal. surface protein A (Streptococcus pneumoniae( 67 1 50 729- *T 4- 4 C 18 90 I3723 19111542975 IAbcB (Thermoanaerobacterium thermosulfurigenes) 67 1 46 1 1794 C0 1----i3--i1930- 19 6 3599 3141 IonlIPIDle325l78 Illyothetical protein (Bacillus subtills) 67 j 52 459 F189--i---F -205 -F 3 -1663- 1 2211- 9-1 gi 6 0 6 0 7 3 66t--o169 l[Eacherichia-co-li) I 671-I- 47 I 549 I 4 207 4 2896 1 456 19112276374 jotxe/iron regulated lipoprotein precursor (Corynebacterium, dlphtheriae( 67 1 49 1 561 217 3 4086 3703 IgiJ895750 1putative cellobiose phosphotransferase enzyme III (Bacillus subtillal 67 42 384 26 I2 21 662 19111842438 Iunknown (Bacillus subtilis) 67 43 372 2- 4 4 29- 252 1 1 2 1745 19112351768 IPspA (Streptococcus pneumonlae( 67 j 41 1 744 F4 25 3 1134 1811 jg112313847 I(AE0005851 L-asparaginase 11 (ansB) (Ielicobacter pylon)l 67 j 42 678 295 1 1 375 19112276374 IDtxR/iron regulated lipoprotein precursor (Corynebacterium, diphtheniae( 67 43 375 7 149 51-- 4 6 4 4- l 1 66 5 4 7 j- 489 5146-- *lgn1IP1D e2--5l79---I--nkn-- cob- ct--ri--m--tuberculosis)-- 66 249-- 4 3 111 389 1 3 
jgnIIPIDje2IS548 jUnknown (Bacillus subtilis) j 66 1 48 I 3871 -4 1 3 120 119267 120805 IgiI39SS6 jIlGlc (Bacillus subtilis(,1 66 1 50 1I S1391 21 gi-178564-4- 4- ock- ro- -C 4--1 136 41 -4 I 4 3- 254 271 1 1776 j E 028 phg shock- pr ti ch r-hi-ci)--6-j36 j-7 149 59-4- 4- rnscipton- gua- -n resor 4 4- -4ael 66 4660 9 13197-- 112592--1911--57-- 291-Ifimbria--l transcription---- regulation---repressor----pi e ophilus--inf;luen-a--)-66--j-46 j -606 4 1 a a a TABLE 2 S. pneumoniae -Putative coding regions of novel proteinszsmllar to known proteins 4 4 4 4 4 Contig OR? Start Stop mtch jmtch gene name I% aim I ident I length I ID 11D Int) ntl a cession j I Int) 7 4 4 4- I 9 I4 I2812 1451 IgnIIPIDIe266928 junknown [Mycobacterium tuberculosis( 66 43 I 14221 4 I 12 1 2 1 1469 1 1200 Igij520407 Iorf2: ar start codon (Bacillus thuringieneis) 66 42 270 I 5 112 110979 1 9897 IgiJ2314738 j(AE000653) translation elongation factor EF-TS (tsf( (Helicobacter pyloril I 66 49 1083 16 1 2 11312 j 734 jgnIIPID~dlO2245 I(ABOOSSS4I yxbF (Bacillus subtilial 1 66 35 5791 22 13 11372 11851 IgiIi4809l6 Isignal peptidase type II (Lactococcus lactisi 66 30 480 22 I588 09 ~n1PX~e0661 gamma-glutamy1 phosphate reductase (Streptococcus thermophilusj 66 j 51 1269 22 120 116194 J17138. jgnlJP1D~e2819l4 IYitL (Bacillus subtilisi 66 50s 945 j2 1 530 976 Jgi12314379 I(AE000627) ABC transporter. ATP-blndlng protein (yhcG) (Helicobacter 66 4047 4 I 32 1 j 199. 
984 jgiJ312444 jORF2 (Bacillus caldolyticus) 66 49 I 7861 33 13I 1 8352 7234 IgI377 44% identity over.302 residues with hypothetical protein from Synechocsi 66 44 1119 I I jg(138979 ap.acceSion D64006SCD; expression Induced by environmental stress;,om I I I I I Isimilarity to glycosyl transferases; two potential membrane-spanning he-- (Bacillus-- subt-- 34 16 15658 14708 IgnljPIOIe250724 Iorf2 (Lactobacillus sake) 66 1 39 I 951 1 0> 34----92 95 4 -4 072l~thaocccu jnnachll 6 1 4 -49 36 j 9 6173 6976 19i11518680 Iminicell-assoclated protein DivIVA (Bacillus subtilisl 66 35 804 36 111 j10396 110824 Jbbsi1SS344 insulin activator factor.' INSAF (human. Pancreatic insulinoma. Peptide 66 3 2 48 1 j 28 1419 1gnl1P101e325204 hypothetical protein (Bacillus subtilis] 1 66 j s5 1392 48 3810 4112 ~giI21B2574 j(AEOOOO9O) Yip5 (Rhizobium sp. HGR234) 66 40 1 303 4 52 14 13595 1 2789 jglJ388565 Imalor cell-binding factor (Campylobacter jejunil j 66 1 52 j 807 I 54 I 3 1 2662 11076 IgnIIrlIjdlOl83l Iglutamine-binding periplasmic protein (Synechocystis sp.i 66 j 43 j 1587 eI54 4 gn rdct(tpyoocu ue 6 14 -455 61 110 I 70 j 13 ign II l4 4 Imd gen prdc (Sahlccu au s 558--- 13-10-94---- gI--4 4- yloi-- iced- cdi- -ion -P00 -4 4- -4 6 44110 I 7~4 I 9 113267 112476 jgij1573941 Ihnpothetica1 (Hammophilus influenzae( 66 1 3 I 7921 I4 4 -4 7 5 1 1 1 2 1 868 Igil574611 Inicotinamide mononucleotide transporter (pnuC( IHaemophilus influenael 1 66 1 48 1 8671 427- 4 -4 4- olij 66 1 4 -02 I 755 I--7;7 5303 4275 put.---EBG- -repressor- protein----- cherichia--- 66--40 1029- L 2 S mo a Pu at v co in reg on of* nove pro ein 9 a o k o n p o e n i lO-R I* -r I I ma a ta c hs aIm a t C h n e 4 4- 1 82 8130 8123 IgniPID85 Sl2 jtrlggaer f acto (Bacll cus ub ti1 66 51 1 3113 703 01 1219 IplIC3396IC33 Ihuiv ho mog -Bascis sub tili Atnmcs asudi 66 52 44 315 T 366 86 11 47 8251 jg16I83 jsia tm020o 9 k hinase (Lco o acisl 51ptietcl Sgp)t 9 66 1 491 9548 88 110 j 7001 6060idue f91081 lpttv fmralassroci0ate 
protein Actinomycs nSluni R,27 66 52 j 942a 89 4 1 35795 1 425 jgij1410118 homo~logu oEcigBacillus subtilis 66 421 678j1 1 10 1 j 61 j6 1 16 19111748936 O reide l of aniu aprx 04a poei CSBCS W:R 92( erci 66 49 901 31 j 1805 3049 Igi114767 jIp a c dvso perein tW Eteoocu hrl 66 48 1245 4 4 1 107 1 1 965 1864 Ignl 144858 j0R A~i (ailrus pertiingna 66 1 16 9001 4515 1 0 jgij727676 jilyr(e a roti chrmyces cereviseel 1 66 56 30017 4 122 j 3236 156 IgnIPID40 132 jqiy (eepocBacillus subtilis 66 36 643 1287 Il 2 1 814 1gi1723628 jgothPdp n ass ciatpoin GAP-43 xen opus a l avsubnt[lsrdim(an 66 j 481 j 8 2 1 t 4- I 140 3 32 436 54 191140056 OIS8 iphen e-5producate(Bacilu s e ubi(ehcsi sp) 66 3 6 1 963j 140 11 116318 115434 IgniP1658 1 9 jS.10-m(Ethn ete trocu eahfliaterdcae(ril aoooal 66 1 48 885 4 4- 6 7378 3654 IgiJ47326 jTPP-deenenprodacto(ain u dehyrogease lph-uui (Cotilmmgu66 48 3298 F -1 47- 13 4 149 6 5143 1 5305 jgnjI~j dl18 8 jtobnecp pa3epto C ierlse (ynichocsts a. 66 46 996 4- 4 4 4 4 4- I 210 7 4 335 1 3678 Igij49318 ja n74 e g n merd oBacL23us subt i Ioi 66 I 46 j 3 3 1 4- 4- 4- S n u o a u at v .ei n *f e ein simle n w ro e n .a c .a c nam sim 9n length,* S07 peumoni aa~e2 8 zin fuaiver c ogteins onove poten simla to knwnprten +4 24 2 1864 1 2640 jgiI1l76399 Iputative ABC transporter subunit (Staphylococcus epidermidisi 66 1 4177 44 4 243 111 3 6 72 ldbjllAOOO617-.2 IjAB000617) YcdJI (Bacillus subtilisi 66 j 45 870 268 j 81 68 jg~5720 putative transposase (Streptococcus pyogenesi 66 60 324 322 11 1 2 1643 1giJl499836 jun protease (Hethanococcus Jannaschlil 66 40 642j 5 10 1190 j318 g~15422 hypothetical (Ilaemophilus influenzae) 65 34 732 ;1-*11909-- 1-13 4 6 Il 106 111190 IgiI1428S4 homologous to E. 
colt radC gene product and to unidentified protein from 65 48 j 726 I I I I I1 Staphylococcus aureus (Bacillus subtililIIII 1 7' 2 1647 1405 1pir1C64l46lC64l Ihypothetical protein 1110259 Ilaemophilus Influenzea (strain Rd KW20) 65 1 42 243 4 4- 4 4 4 1 7 1 I 6246 1 6821 lgnljPIl~jdt0l323 jYqhU (Bacillus subtilisi 65 50s 576 1 4 1 10 1 2 1 1873 1 1397 Igij 1163111 IORF-1 [Streptococcus pnaumoniae( 1 65 J 54 d 77 I 4- 1 16 1 3 1 1428 12222 IgnIIDle32SOIO Ihyvothetical protein (Bacillus subtilis( 65 1 45 795 1 1 21 14 13815 13357 IgnI[PIoIe3l49lO jhypothetical protein [Staphylococcus sciuril 65 1 40 1 459 22-- 347*25776-12684--'Ig11 11 230-3 ICpxA-A-ti-o----11us-pler-opneumoni-e I 3 2 I1648 290 gil 1044826 jFl4Es.l (Caenorhabditis elegansi 65 38 1 1359 8 13 110062 110856 jgij1573390 hypothetical (Ilmemophilus influenzael 65 45 1 7951 4- 48 122 117521 116883 1giIl57339l (hypothetical (Imaemophilus influensae( 65 37 I 6391 4 48- 25- 119027---1- acchar- 1 48 12 3 1193856 1j85334' jgnIP480 4 JC02c jputativ tra scritio areulatrv(B ailu .taotemph1a 65 j 32 j 495 I 50 1 6 15337 4519 jgij 171963 ItRtNA isopentenyl transferase (Saccharomyces cerevisimel 65 42 819 52 1S 1478 1558 IgI14974 It. jnnachl pedicted coding region MJ.0912 (lethanococcus jannmschiil 65 j 46 861 4 4- I 9 I I3963 1 4745 1giJ496514 jorf zeta (Streptococcus pyogenes! 65 1 42 7831 4 68 3 1 2500 13483 Igil887824 IORFt-o310 (Escherichia colil1 65 1 46 I 9841 4- 4- I 69 3 121111 1017 jynljPlDje311453 junknown (Bacillus subtilisl 65 j 42 1095 69 7 6029 5325 1911809660 jdeomyribose-phosphate aldolase (Bacillus subtilisl j 65 55 j 705 4- 4 71 15 18536 j1 9783 IgiII5'322 Iglycosyl transferase lgtC (GP:U14554-4) lHaemophiius influene) 65 42 1248 72 8 7664 8527 1gn11P101e267589 Unknown, highly similar to several spermidine synthases (Bacillus subtilisi 65 j 39 1 864 -2--F8T 4 C 9 4* .a TABLE 2 S. 
pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig jORF I Start IStop match jmatch gene name 8slm J8ident Ilength, I ID lI0 1 Int) (ntL acession I ntL 76 5 15773 14097 jgnhjP1IDd101723 IDNA REPAIR PROTEIN RECH (RECONBINATION PROTEIN IEscherichla col)j 65 44 1677 4 76 j9 f8099 I7875 ilI574276 lexodeoxvrlbonuclease. small subunit IxseB) (Neemophilus influenzaej 65 I 38 j 225 T 84 .2 I2870 12352 1g112313188 I(AEOOOS32I conserved hypothetical protein (Nelicobacter pyloril I 65 41 519 4 86 115 114495 113407 jgnljPIDjdl~l8aO J3-dehydroquinate synthase fSynechocystis api j 65 44 j 1089 4 4 87 3 f37~06 12423 1qiJ151259 IHNG-CoA reductase (EC 1.1.1.88) (Pseudomonas mevalonli) 1 65 j 51 1284 88 3 22 j23 gj 11098510 Iunknown [Lactococcus lactisi j 65 30 312 89 2 11627 1 1007 Ign1lPIDjd1020O8 (AB001488l SI MILARsTO ORF14 OF ENTEROCOCCUS FAECALIS TRANSPOSON TN916. 65 41 621 I I I I jI (Bacillus subtilisI IIII 1 116 j 1 j 3 '11016 IgnilIlDIdlO1l2s Jqueuoslne biosynthesis protein QueA (Synechocyatis ap.1 j 65 1 44 1 1014 4 123 1 1j 69 1389 Ig1i498839 jORF2 (Clostridum perfringens) 1 65 j 36 321 4 123 7 16522 17190 Ig1i1575577 IDNA-binding response regulator IThermotoga maritima) I 65 1 39 669 12i 3 381 j2859 IgnIPIDje2S76O9 jaugar-binding transport protein (Anaerocellum thermophilumi 65S 47 963 13 j2 815 718 Ig~28274 IAEOOOO9OI Y4pE (Rhizobium sp, NGR234) 1 65 41 198 1-2 1- -5 .17 14 1 5021 13885 Igil472329 jdihydrollpoamlde acetyltransferase [Clostridium magnum) 65 47 1137 I .148 12 11053 11931 IgnllP1Dld101319 JvqgN [Bacillus subtilis) 65 42 J 8.79 151 3212 4687 1gij304897 jEcoE type I restriction modification enzyme N subunit lEscherichia colil 65 50 1476 I 156 12 1 730 1437 lglJ310893 Imembrane protein IThelleria parval I 65 j 47 j 294 -4 4 164 4256 j 4837 gi(41Ol32 10RFX8 Bacillus -subtilisl 65 j 48 j- 582 j 4- 4---4 169 16 13192 1 3914 Jg111552737 Isimilar to purine nucleoside phosphorylase (deool (Escherichia colil 65 f 41 7123 4- 4 176 
I4 1 2951 2220 jgnIPIDje339S00 joligopeptide binding lipoprotein (Streptococcus pneumoniael 65 4 3 732 195 4 14556 13900 jgij1592142 JABC transporter, probable ATP-binding subunit Imethanococcus jannaschi( 65 40 j 657 i 4 4 4- 196 1 1160 1572 jgn1IPIDjI2004 IABOOI4B) PROBABLE UDP-N-ACETYLNURANOYLALANyL-D-GLUTANYL-2. 6- 65 51 j 1413 I I I DIANINOLIGASE IEC 6.3.2.15). (Bacillus subtilis)II 204 12 12246 1215 Ig1i143156 membrane bound protein (Bacillus subtilis) 65 I 37 J 1032 4 210 14 11544 1891 Igi149315 IORFI gene product (Bacillus subtills) I 65 j 48 3481 4- -4 242 j2 11625 1723' gil 1787540 (AE000226( f249; This 249 sa orf Is 32 pct identical (8 gaps) to 244 65 42 903 residues o f an a pprox 272 aa protein AGAR-.ECOLI SW: P42902 IEscherlchia I I IColli)
4 4 4 TABLE 2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins FContig lORF j Start I Stop I match match gene name S im j eidtI length, I D jID I (nt( I nt) acession j (nt) 284 1 1 1900 jgiJsSSS6l jcly4 (Plasmid pADlI 6 36 I 9001 4 I 4 304 1 574 1gn---l1-57iP-- 10je290934 Iunknoni (hlycobacteriu--m tuberculosis; 65 52ube-r-ul-osi3-I 52- 1 573 315 1 1483 1gij790694 Imannuronan C-5-epimerase (Azotobacter vinelandlilj 65 1 57 1 1482 1320 1 1I 3 569 jgnlIPX0Jdl02048 aerogeones. histidine utilization repressor; P12380 1199) DNA binding I 65 46 7 I 1 5 I BcilssbtilIis) I 5 7 ;-4------i409--1--l-----e323508-IYIo-p-o--in--(aci11u 2 F 7571 I -I6696 jigil498753 Inicotinate-nucleotide pyrophosphorylase [Rhodosplrillum 'rubrumj 64 1 4 876 4 4- I 6 I6 I5924 6802 IgnljPOIdlOIll jmethionine aminopeptidase (Synechocystis sp. I 64 1 52 1 879j1 4 4- I 8j3417 3686 gij1045935 DNA helicase 11 (Hycoplasma genitaliumi 1 64 58 2701 8 I 1 j329 I 68 InlPD~2852 Orfa (Streptococcus pneumoniae) 1 64 46 561 4 I I I7 604 715 giI162328 jYcrS9c/YlgZ homolog (Bacillus aubtill 64 45 I 642 1 22 Ill 1 9548 19895 IgnlIlDjdlOOS8l Iunknown (Bacillus subtilisj 64 38 3481 1 22 130 122503 123174 jgiJ289260 1comE ORFl (Bacillus subtilis) 1 64 44 I 6721 I 6 7 14375 114199 Ig11409286 IbmrU (Bacillus subtills) 1 64 I 30 177 F--I 27 12 1510 j1334 jg1140795 IDdeI methylase (Desulfovibrlo vulgarisl 64 51 1) I 29 12 1614 1297 Igij232616B Itnpe VII collagen (Ilus musculusi j 64 j 50 j 3181 I Is 2ueain 36 2 prIfslJf strain P022) plasmid Ti I 40 1 j 1 449 Ig~i 4970 lepiD gene product (Staphylococcus epidermidis; 1 64 f 41 1 447 7 j 4683 14976 1eni1P501e325792 IiAJOOOOO5I glucose kinase (Bacillus megateriumi j 64 1 45 I 294 45 +1 8068 16920 jgnljIPIDjdlO2O36 laubunit of ADP-glucose pyrophosphorylase (Bacillus atearothermophllus) j 64 40 j 1149 51 j2 j301 j1059 Igij43985 InifS-llke gene (Lctobacillus delbruecklill 64 54 759 T F 1 i 1 5 113 115251 118397 Igij2293260 I(AF008220( 
DNA-polymerase Ill alpha-chain (Bacillus subtilla) j 64 46 j 31471 53 3--;1157-15745559 gijl574292 Ihypothetical e (Haemophilus influensee! j -64-4- 7 1 603j- 58 2523 1662Ii4236 1606 I--n--gi-t-l573826 jalanyl-tRNA syntheta---seHae (ala-hi-lu-s-i-ni (ilsemophilus----------------ensae-l-4 -64-15-- -j 6 36311 I 4 4 4- 1 66 111 3 1 1259 Ig1I895749 Iputative cellobiose phosphotransferase enzyme I1- (Bacillus subtiil 64 42 12571 521- 1656- 4 3696- -neprodcts( -cl- ils) 4 4 6-868 5 j--5213 i6556 i---3696-- gene--products-- (Bacillus------earot- ermop-- us-- 1344- 6 6 536 44 -I -4 4 -2 4 T *D 0 0 Int 0 Int 0 *css o I 0a Int0
a 3 1 1283 5 1465 lbbs1l33379 TLS-CNarsfuslon protein(CIIOPoC/EBP transcription factor, TLS=nuclear RNA- 6 6411 I I I I ~binding protein) (human, myxoid liposarcomas cells, Peptide Mutant, 462 5 ~1 1 3 I I I I 5 5aaj (10mo sapienslI 1 81 113 114016 114231 Igij143175 Imethanol dehydrogenase alpha-10 subunit (Bacillus sp.) 1 64 5 35 216 51312 ~85A 122090 5gn15P105dlOl3l5 5YqfA (Bacillus subtilis) 64 44 I 2405 83 a -1 87 -1-1-410046- 59300 51 230 -putati-e-Ptc--protein--Baci-lus-subti-is) 6-4-1 43 7471 1 98 17 5 5032 1 5706 5gnIjP105e233880 5hypothetlcal protein [Bacillus subtilis) 1 64 5 38 5 675 1 11505 5-1- 5- 2 5- 1276 g 1657503 -simil r -to aureus.--mercur------)-reductase----E-c-er--chi-aIcoil)--- 64 1275 I 13 57 55136 1 6410 5gn15P105dlOlll9 5NifS (Synechocystis sp.) 1 64 50s 12755 4 -a 19 1 5 2 5197 5gnl5P1D5e32052O jhypothetical protein (Natronobacterium pharaonis( 64 5 37 1296 123 53 1125 2156 5gnl5P105e253284 loRF YDL244w ISaccharomyces cerevisile 64 5 40 5 1032 14 5 52331 1780 5gnl5P1O5d10l884 jhypothetical. protein (Synechocystis sp.) 64 50 5o 5521 1-4 19 54 j3467 2709 5gnilP105dl01314 JYqeU (Bacillus subtilis) 64 5 52 7591 ;4 aO 1-5--152- o----(Bacillus----s-bt---is- 64--5-42 5 505 1 -a1-3 4- 1 131 511 1 7196 57549 5pir5JCll15JCll hypothetical 20.3K protein (insertion sequence 15113)) Agrobacterium 5 64 j 50 5 354 S I I Itumefaciens (strain P022) plasmid Ti IISI 139 5 3 1 3226 12651 5gij2293301 I(AF008220) YtqB (Bacillus subtilisi 5 64 5 44 I 5765 146 110 56730 55648 5gi11322245 5mevalonate pyrophtosphate decarboxylase [Rartus norvegicus) 64 1 45 5 10835 a 1 14 8 i 58430 5 8783 5gi52130630 51AF000430) dynamin-like protein (Homo sapiens)] 64 5 28 5 3545 4313 3612--, a a- -0 Ita 1a70 5--156--5-5--4313- -3612- tran--m--mbrane--(Bacillus---subti--is----- 64 31 5 -7025 8-- 5F 157 ;4 5 i_1299 5 2114 Ignl5-IP105Idl00892- jhm u to O trnpr syte pemes prten (Bclu subt-is 5 64- 5- 816-- 4 162 1 6 1 5880 5 6362 5gi5517204 (ORFI. 
putative 42 kDa protein (Streptococcus pyogenes) 64 5 58 5 4835 -4 164 513 5 9707 58769 5gnl5Pl05d100964 5hom oge f ferriccanguibactin transport system permerase protein PatD of 5 64 5 40 5 9395 1 I 1 I I. 1 V. anogulloarum (Baillus subtl is) 5I 1 175 1 5 5 3906 1 4598 jgi5534045 jantiterminator [Bacillus subtilis) 1 64 5 39 1 69351 1 169 110 5 6154 51 6507 5gi5581307 5response regulator (Lactobacillus plantarum) 5 64 5 33 5 3545 5 91 54 5 I3519 5 286 5i 149520 5po oioy anhan t ismrs Lcoccu ra s 64 46-- 5 657-- I I a. a. **a*aa TABLE 2 S. pneumoniae -Putative coding regions of novel proteinrvl'fmilar to known proteins Contig IQAF Start Stop j match jmatch gene name i im %ident length I TO jID In t I int) acesslon In nt) I ;202 j1 76 11140 lgn11Pzole293806 JO-acetylhososerlne sulthydrylase iLeptospira rreyerij 64 47 1065 i i-224-- -3 g- 1- 57-3 3- 1---38j 23 3 21 47 111014jOaF X1 (Bacillus subtilis) 1 64 1 43 1 357I 253' 3 1709 1089 IlrIJCl1SllJC1I hypothetical 20.3K protein (insertion sequence ISl131) Agrobacterius 64 5038 I I I i I I tuwefaciens (strain P022) plasmid TiI 38 I B -9 I 297 1 I 1 660 jgiI1590873 Icoilagenase (Imethanococcus jannaschiil 64 48 660j 1 5 14 18730 18098 jgij556885 jUnknown [Bacillus subtilis) I 63 48 1 6331 4 1 10 16 15178 1 4483 I91I1573101 jhypothetica1 (Iaemophilus Influenzae( 63 1 40 6961 4 4 1 12 ill 1 9324 1 9902 IgIj1806536 Imembrane protein (Bacillus acidopullulyticusi 63 1 42 1 59 1 15 110 1 BR97 19187 19)1722339 junknown (Acetobacter xylinusi 63 40 1 291 37 I2 101 I309 IgnIjPIOje2L76O2 jP~nUIJ Lactobacillus plantarumj 63 32 723- I 18 8 7778 I6975 19111377843 Iunknown [Bacillus subtilisi 63 45 j -4 4 C) 1- 4-T 4- 26 14 1 9780 1 7078 gi 1142440 lATP-dependent nuclease (Baclllus subtilill 63 1 46 2703 I 4 I 29 15 1 3488 14192 IgiI11377829 junknown (Bacillus subtilia) I 63 1 35 I 7051 OF-~trccu-acl I 34 Itl 8830 j79f8 ignljPzDjdll98 0R8(neo cus acasj63 45 I 8431 1 35 1 3 1 1187 1876 1gij722339 Iunknown (Acetobacter xylinumi j 63 39 I 
312 48 115 f12509 111691 jgiJ1573389 Ihypothetical (Naemophilus influenzae( 63 41 819 4 I 51 Ill 112719 112189 1IgI1142450 jahrc protein (Bacillus subtilis) 63 j 35 I 531 I 55 I 1 3979 5022 lgij1708640 IYeaB (Bacillus aubtilis) 63 41 1044 4 I 55 115 113669 114670 IgnIPIDje3llSO2 Ithioredoxlne reductase (Bacillus subtilis] 1 63 44 j 1002 1 68 110 19242 1 8919 jspIP376B61YIAY_ I)YPOTNETICAL 40.2 KD PROTEIN IN AVTA-SELB INTERGENIC REGION iF382). 63 j 40 324 4 4- 86 6-4--585 1-11748 Ilic-1 operon protein ilicD) (Heemophilus influenael 1 63 I 41 1 870 4 4 I 88 1 8 16085 15180 19112098719 jputatlve fimbrial-aesociatead protein (Actinomyces naeslundil) 63 43 1 906 4 1 96 18 5858 16484 IglI11052803 jorflgyrb gene product (Streptococcus pneumoniae( 63 38 627 I I 10 4 14 g117171 Ifucosidase ilictyostelium discoideum( 63 36 j 1701 C* o 000 0e c TABLE 2 S. pneumoniae -Putative coding regions of novel proteins irmisar to known proteins Contig 101W Start jStop match match gene name Itsim Iidant length, ID JID J nt) j nt) I acession III (nt 122 __4704 14886 jgnljPIDjdlO1X39 Itransposase_(Synechocystis sp.i 63 39 183 1287 4 SI 203 1 gn112W9d0l3 jorf2e Netano~aeiu E therosutotrophcum 1 63 27 6 857 -S S 142~ I 10 j48 glPoe105(yohtcal protein (Bacillus subtilis) 63 I 44 I 4861 159 15 11741 2571 10 1874 (AE000184)1201; This.271 aa orf is 24 pct identical (16 gaps) to 265 63 39 831 I I 1 iI7703 resi dues of an ppo.272 as protein Y*ACOLI' oW; P099974 (E~b-scL4ichIS -71 (12 i8803 14406 gn1jP10_j13249l_8 ((gAl protease (Streptococcus sanguisi 1 63 48 1 5604 1 177 1 3 1 347 Igi11773150 1hypotheticai 14.8kd protein (Escherichia colt) 63 34 345I j2_ 42_ 1 i- 11- 22 -3 -I I 1 178 1 3 1 794 11012 jgills9l5a2 Icobalamin biosynthesis protein N IMethanococcus jannaschii( 63 36 1 219 I 95 177 j175 jgnilPsDle3242l7 (ftso (Enterococcus hirel 63 j 33 12030 234 5 (1739 1 1527 Igi11591582 Icobalamin biosynthesis protein N (Hethanococcus jannaschii( 63 j 36 213 4-i T 2 49 1 1 1 81 1 257 Igij 1000453 
ITreR (Bacillus subtilis) 63 1 411 177 1 283 1 127 1347 jgi[396486 108F8 (Bacillus subtilisi 63 1 44 12211 29 84 3466 jg1j722339 junknown (Acetobacter xylinum( 63 j 37 663 2 4- 11 1 95 486 19111877424 IUDP-oalactose 4-epimerase (Streptococcus mutans) 63 46 j 420 -4-1 34 I1 I 2 556 10i11477741 Ihistidine periplasmic binding protein P29 (Campylobacter jejunil 63 1 36 j 555 -4 I 35 1 j29 13 Igi 12252843 i (AF013293) No definition line found (Arabidopsis thalisnal 63 33 207 4- 382 1 1 88e 378 1gi1722339 Iunknown (Acatobacter xylinum)1 63 40 291 385 13 1364 1158 1gi12252843 j(AF013293) No definition line found (Arabidopsis thaliana) 63 33 207 4 2 II 2495 j 288 jgnl(PIDje325007 penicillin-binding protein (Bacillus subtilisi 62 42 2208 S4 3 123 123374 124231 IonljP1Dje254993 Ihypothetical protein (Bacillus subtilisi 62 35 858 6 1 130 113193 IgnlIle3496l4 jnifS-like protein (Hycobacterius leprae) 62 37 j 1128 6819 7232 jgnljPIDidlOl324 YqhY (Bacillus subtilis) 62 32 414 1 7 119 115466 114207 IgnlIPlDldlO8O4 lbeta ketoacyl-acyl carrier protein synthase (Synechocysti s sp.) 62 43 j 1260 4 *D @11 1 (n t0 a *c *i on*t TABL 2115 1622 n leoniae Putative cobdigriosonve protein lailu i l mila to know proein I 4 7 24 119526 118519 Il 1276434 Ibeta-ketoacyl-ACP synthase III (Cuphee wrightil 62 37 j 1008 12 j7 15904 14702 1gi11573768 JA/0-specific adenine glycosylase (mutY) (Haemophllus influenzel 62 43 1 1203 4- -4 I 5 (1 I9678 9328 jpirIJCll5lIJC11 hypothetical 20.3K protein (insertion sequence 1S1131) Agrobacterium 62 4 I I I I j ~tumefaciens (strain P0221 plasmid Ti I 4 17 4 2609 12442 jgi(1591081 IM. jannaachli predicted coding region HJ.0374 llethanocqccus jannaschii( 62 43 1681 I 17 I 1, 3053 12835 1g11149570 Irole in the expression of lactacin F, part of the let operon (Lactobacillus j 62 I 44 I 219I I'r I I I I 22 110 1 8627 19538 IgnilPIDIdlOOS8O Isimilar to B. 
subtilis Onall (Bacillus subtilisi 62 I 43 j 912 3 1865 2043 1gij2314379 1AE000627) ABC transporter, ATP-binding protein (yhco) Ielicobecter I 62 1 43 I 1179I I I I I I pylon) II 33 5 1 2235 11636 191j413976 Iipa-52r gene product (Bacillus subtilisi j 62 1 44 1 600 I 38 1I1 1 5689 16123 IgiI148231 joZ51l(Eacherichie colil 62 34 I 4351
I 40 10~ 114272 j13l28 IgnliPIDIdlOl9O4 jhypothenical protein (Synechocystis sp.1 j 62 43 945 0 4 42 3 1 311 jgij1146182 jputetive (Bacillus subtilisi 62 41 309 -4 Ihypothetical protein fragment YBGB-ECOLI SW: P54746 (Escherichie coli) 48 112 19732 19304 IgIJ662920 jrepressor protein (Enterococcus hirei 62 1 32 j 429 51 8 j 15664 1 7181 1gnl1Pl01e301153 jStySKl methylase [Salmonella enterical 62 j 44 j 1518 52 i3 I i2791-i 2099 1 giJ1l83886 ntegral me mbrane protein (Bacillus subtilis) I 62 41 693 1 55 116 115702 114704 1gn1jP101e313028 jhypothetical protein (Bacillus subtilis( 1 62 I 40 I 999I 1 9 6 1 3418 1 3984 1gi12065483 junknown (Lactococcus lectis lectisi 1 62 32 1 567 I 3 IS 4997 j 4809 Igi 1149771 jpilin gene inverting protein (PivilL) (Iorexelle lacunetal I 62 j 28 189I I 70 114 110002 110739 1911992977 IbplG gene product (Bordetella pertussial 62 I 45 j 738 4 1 71 113 118790 j20382 1g111280135 cded for by C. elegans cDI4A cm2le6; coded for by C. elegens cONA cmOleZ; I 62 I 62 1593 similar to melibiose carrier protein (thiomethylgalactoside permease 11 I I I I I(Ceenorhabditis elegansi' I 1 71 128 132217 132768 lgnl1Pxold101312 jrqec (Bacillus subtilisi 62 I 35 552j 7 4 1 7 111666 110383 1g111552753 hypothetical (Escherichla col)I 62 I 38 I 1284 00.L 02 0 0 0 S. 
0nu a 0uatv coin 0 ein 0f 0000 l 0rten 0iia 0 o 00w roen 4 4 06-13-6B-g-l--I-Dl-0-19 F- 4- 98 206 j 26 jnl~ljd~lSG Ilira (ineralebn protein seudomnas funaegnl 62 423 963 -2 14-115--- 102 1 282439 5649 lgnllPID43l3O l c- ohpoteics protein [Bacillu subtopiluisl ue 62 24 717 F1- 4- 10 3 279 124 intuenzae hpteia B rnpre;P488(73(ails6 1 4- 12 1 I 2 3120 j 2329 giJ58197 jisPcy Iactnooccus e Lact ilu Iebuekl 62 40 8314 *-126--17- tI -07 129 I I 4931 J 4080 jgirlS43709I Ilic opner protein ic6 lmohil s ic nfviunse 62 I 8 1 927 11 I6 439 569 gijl85748 Illc1oprn prcoteinu laciCs Heohls nleie 62 j 2 39071 1243 257 3gI524 IB 1137orer 721abl jT n seucnts e thInrd occ I jsemophilu 62 41 417 T 124 I j 16 05532 jgnI(609076 35 Ileucy arotinpetds (Batacillus delbruckill 62 40 834 6 2 103 1 795 IgnllPXDId1Oll63 IORF4mebn (Bacillus subtil 62 38 3678 4 4 4* 7 j 131 2 1 3845 413 101I85725 uknive (Lctocios lacphtis eaeezm I Bclu utls 62 42 4389 1 14973 1 23 25 9 19i115921732 Icobl transportpol ATP-binding subunit 0(Iletha nococcus jannasch i i 1 62 41 67 7 4 4- 4 1 64 1 256 1 6705 jgniII323O ILao pracoti bacillus csuilis 62 1 40 1 6962 I 2607 I 4 381 jgnJ18P~25464 meane0 0 proten (retoccums Neumonie 62 I 40 I 13 4 I 0 15 I 6 I13061 I 2935 jgn PiDJ 10 200 Itransmembrane pon(Bacillus subtilis) 62 37 62 I 0 1 3 177 I 291 43 gIi4439 IP31- Seepor utS Isiella pnEumonerihi jo 62 41 1 I 8513 I 12 I2 85 73 Ii19570 puttie cerobis hoporaseascnymtI (Bacillus subtilis) 62 45 1 319 4- I. 
aD aI a ant I ant a ao a I Intl 4 211 1 1 j 3 1 971 1911147402 Imannose permease subunit 1ll-Nan (Escherichia coi) f 62 43 I 969 I 23 2 J1495 1034 IgnljPlDIdlOllSO jORF2 (Streptococcus mutansi 62 41 4621 ;-223 _1 _j 1- 00 6- g-4 -y c e r-4 4 4 4_4 7- -4 I 22 j5 I1765 1487 jgnljPZDje27647S jgalactokinase (Arabidopsis thaliana) 62 j 33 279 2 42-- 1 T 375 f Ij1i 159 IgIj167423l 1AE000052i Nycoplasma pneumoniae, hypothetical protein homolog; similar to 62 4015 j S q54 357 19111573353 jouter membrane integrity protein (toiA) (Neemophilus influenzae) 62 47 228 4 I 7 I 2725 3225 g112114425 similar to Synechocystis sp. hypothetical protein, encoded by GenBank 61425 1 4 4 17 6 3326 j3054 9gI149569 Ilactacin F (Lactobacillus sp.) 61 43 273 4- 7- I 4 l I888 724 jg1IDdll29 IQ Bacillus subtille) 61 42 1155j 4 4 57 i 6 j-3974 -1 6037- Ign-I-IPI-Di-dll13l-6 YqfiC (Bac lu su- s- j -42 2064-- I 5 5 736 665 jsjP4I6j~TC SP~lDIN/P1~ESC1NE TRANSPORT SYSTEM PERNEASE PROTEIN POTC. 61 34792 i_58--FS-i7356-- 4 67 1 j 692 1911537108 IORF-f254 (Escherichia coli) I 61 46 6901 4-F-i- F I 68 19 1 8816 1 7890 101119501 IpPLZ12 gene produc t (AA 1-184) (Lupinus polyphyllus) 61 j 41 927 7 ji 1177 1200o8 IglI992976 jbplF gene product (Borderella pertussis) 61 j 44 1272 101 -4 4- 72 Il1 9759 110202 IgnlIPIDId1Ol833 Icarboxs-normpermidine decarboxylasm (Synechocystis sp.) 61 36 444 4 4- 4 4 7881-j--7003--gn-------d-----------ar--sy---d-phos-hate--synt--as--(Bacillus----tearother--oph-----l-614------879 I 87 4 j 4914 13697 Iglj528991 junknown [Bacillus subtilis) 1 61 j 42 j 1218 1 4 4 4 4 1 87 113. 112311 111361 jgij1789683 I(AE000407) methionyl-tNA formyltransferase (Eacherichia coll) 61 44 I 9511 I 1 2 1731 1 2989 IgiIS37OBO Iribanucleoside triphosphate reductase (Eacherichia colil 61 45 j 2259 4- 105 3 12711 3499 IgnlIPIl 1l851 Jhypothetical protein (Synechocystls sp.) 
j 61 44 j 789 1 115 1 6 17968 16478 jg11895747 Iputative cel operon regulator (Bacillus subtllis) 61 36 1491 7181-8518 g11124957- -rotei4- 4 4- sl 614 -3 ;123 8- j_ 7181 8518 19-110952 prt; h tdn k as (Etrccu facl 1338--- 4 -4 .oti aOR a .tj a *LO *c aee m ai *dn I a.nga *D a1 a at a *nt ac ao *a a a a'aI S4 t 4755 72 jiJ7804 (AE0001841 f27l; This .271 as orf is 24 pct Identical (16 gaps) to 265 61.380 326 j 8 I 525 6725 1 i residues of an approx. 272 as protein YIDA-EcOLI SW: P09997 (Bscherichia *j 3 0 f- ;--128 V11 I--i639-- jgnljP1DjdlOI328 fvqiY (Bacillus subtilis) l 1 41 1 639 13 74 5054 IgiI1022726 Iunknown (staphylococcus heemolyticus) 1 61 1 41 1 261 f- f 13 1632 5913 Jgn1lP1Dle2700l4 beta-galactosidase (Thermoanaerobacter ethanolicusi 61 1 41 6720 t f- -t f I 43 jI 55 4 gil52054l penicillin-binding proteins IA and lB (Bacillus eubtilial 61 '42 1 2511 143 f--f t f 14 16 1225 1142 1i15574 Ietehdrdivicoiinate N-succinyltransferase (Eacherichia coli) 61 1 42 702 f- F 162 i3 -i4?12 3456 -ignlP1DIdl01829 iphosphoglycolate phosphetase (Synechocystis ap.) 61-- 30- 657 72 13 1 727 1 1077 jgnIjPIDI02048i j. subtilis. cellobiose phosphotransferase system. celA; P46318 (2201 I 1 44 I 311 I I II Ba cillus subtilisi 1 177 3 1101 1772 gnIP1Djd100574 junknown (Bacillus subtilis) 1 61 1 4- 672 T _I 202 2 1278 j2585 1g111045831 hypothetical protein_(GB:L18965-6l (Iycoplasma genitalium) 61 36 1 1308 ft t- -ft f 224 3 272 344 )~j19114 j. 
jennaschil predicted coding region NJ.0440 [ftethanococcus jannaschiil 630 j 363f- ft 22 35 3766 jgiJ1552774 1hypothetical (Escherichia coli)1 61 j 40 I 3721 f- 2-5 -f T 4-9_ T t f f ft- 254 2 83 84 gnIPlDjdlO0dl7 105F120 (Escherichla cclii 61 36 j 3601 257 1 1 3 1350 ignlPIDje2SS3lS Iunknown (Iycobacterium tuberculosis) 61 42 3481 293 j4 j3971 13657 JpirIJCIIIJCII Ihnpothetical 20.3K protein (insertion sequence IS11311 Agrobacterlum61435 I I I Itumefaciens (strain P0223 plasmid Ti 614 I 31 301 VI T 949 17___gij229I209 i(AF016424l contains similarity to acyltransferases (caenorhabditis elegans) 61 j 33 933I ft-- r 1 313 1 1 1066 1287 IgiI393396 ITb-292 membrane associated protein (Trypanosoma brucei subgroup( 61 38 780 3 2 243 145 giJ537093 I0RPo153b (Escherichia coli) 60 27 483 1- 4--3 f- 61 4636 15739 JgiJ2293258 (JAF0082201 Ytol (Bacillus subtilis) 60 35 1104 t 6 ft 6 12 1196 1117 gi293011 10573 putative (Lactococcus lactis) 60 f 44 J 750 17 113 j6708 1 6484 Igg1j149569 liactecin F (Lactobacillus sp.] 60 32 I 2251 ft 1 6977 5670 giI810 (AE00027pl o481; This 481 *aa orf is 35 pct identical 31 as o30 60 133 I I gij178140 residues of an approx. 858 aa protein 340LI1..IUHMAi SW: P46087 (Echerichia cclii 38 115 115878 117167 jgnlIPIDjdlOOS84 u-nknown (Bacillus subtilis) 60 44 1290 0 9 99 9% 9 09 0 9 T A B L E* 99 S. .nu on a Pu at v co..n re i n of 9ove 90ti s r t no n p o e n 9 0 I 2 110 8n96 896 gint)27 W00 2cesio Ita (Bclu Iu s 6nt 37 1 6 22~~rpH jnA 10d 1id 243 1ggl11d025 j(ransimbrun Iflcilussubili 60 I 36 86241 43 1 i 1 826 8 964 IgiI1223757 IAFOO82in YtnaG (Bacillus yce subtilis a 60 37 I 6691 4- -4 1 38 1 1 883 9269 1011400j323 18u btiwnl~is ensacrpinfl, r pA. 
be S1d iAadgd Bclu utls 60 1 3 82691 4 4 I 45 10 111610 10368 IgjI'9'488 proteinp kinase 1b*Iaccharoenmce Bcill I 60 6 I 667 44 1 1 1269 ungnmedIprotein2prouncnol~me Ischizosaccharomycesmpostaans 60 32 4 1269 4- 4 2 11 j 838 1138 1gni397 488 3 JIRF28alpaoglucn brau nin enzymer(illu sut1s 60 1 31 87 I 48 192 115668. 11737 gn11P11e2057 lol (Lctbacillus heltius)1 60 32 1389 -4 4 4 489 j~ 177 16510 IgnIIDdlO2O3 1HAB002668) unnamediprte ing roduct M 541aemophilus ctnomce e ianl j 60 1 32 22514 6 -4T 4- 4 4 ARTO----NE-P-OU-C-OFT-HISENTR--- 62 2 63 17 InIIjlO8 nnw Bacillus subtili s 04 I I40 68 4 350 1450 gi11586 lIN. infuenee prdicte coingu reginl 154 laephlsiflese 60 36 j 2161 4 i 6182-- SIMILAR- TO-- Y-FR-----02 1 7 1 I 63430 13366 jgnljPijO327 Iphnotheticale protein (Bltailsersbils 38ncoc sp. 179145 4 1 761 8 1i~ 4 11 41537 1g1125066 Irip 12 genela poroduc t hBac iu pubtin jhshts Frioatr 60 33 2457 76 1 16 3 3 6 152 gi 1147429 I orfi si m a suui II-a I~ce i colI03 I 94 1 15 IgiI 7 putati cllo s itrinepoenpopaas )rioatru 60 39 750 92 1 1192 ae (scerihi coij-- -42 4- 4 I3 1 119 198 gJ788 (AE0002971 o464; This 464 aa onf is 33 pct identical (9 gaps) to 331 j 60 27 123.6 114 (0619 9384 (0(1188389 residues of an tppzox. 
416 aa protein HTRCJIEIOO SW: P43505 lEsciterichia j- II I I I coli I 1 94 1 5 I 5548 18121 jgnijV10je329895 I(AJ000496) cyclic nucleotide-gated channel beta subunit (Rattus norvegicusi j 60 50s 2574 4 I 91 1 I 5396 14533 1gi11591396 Itransketolas&' (liethanococcus jannaschiij 60 43 8641 2 3-4- otheica n- 4 4- 60 3 75 -102 2 208 2833 n 1P---3 092 cal- pr t i b c- tu e c l s 60-- 75 -4 4 a..m ccnt I *ec C C c TABLE977 2 918 n lemonlae -Y patecing rgionus of nol prt6n0lha to knw protein 4- FD _15 ((noaceru 60nl I aeaon jIII It S4 4 1 2755 18 24 gnllPIle334782 IIlb o tein2 (Blucsae1 m subtllisi 60 321 5912 1178631 568 1gni1466875 18 jtrnifU; a B.46.lS7 Hyc o bcteru lepral 60 39 306 1 1 42-- 4 9 122 1 4 1438 568 ignIIPIDl3l876 Itrnpoasea protnechocti s p i ls 1 60 39 41 306 S4- 4- 127 f 8 9 4510 jORm (Thrpnma tepl ul1 60 I 38 1 74J1 139--- a 44 -9 3082-2672-- o-h- tica-l- prote-n- (Bacillus-- 1 1390 1 2 17 9 1 249 jgnljl09d1068 lpoF i Thrmstdn thar ahsl ~trccu acls 60 39 1 1749 -4 11 i 15208 11309 gi1I53l4 90BFinesf437 iv (Easytae aoF cherichia coli 60 301 1512 148 20 77252 814 gnljl092 32 hpotei a otekin (ae lu (Etrcu i ci sl 9 637 9 134 4-T, 4 -4 4 4- 14 IS 538 9 45 e 14562 Iptrane-ensglv e DH yrnt ase [Earo ri(Echeria col 60 1 401 1038 155 2 1 4 3558 j24492 1911600711 07 puteative (Bgacillu subtills) Psuomn5 ergno 60 1 37 42 1-5 9 1539 1 3 6 7 134286 Ig122932 product20 brah -chimiain amo acidu atraster p(Bac iu (uBtcllus 9 0 9 42 129 4- 4- S155 801 9 1 748 99192104504l~l 9put al uPlus e deydognae Echrihi cli 60 40 38 66 92 63--i7--i 9 1580 9 3. 431 9 28 1gn11 1107 2 Ia. 
nfegati e preulat o diho regon (H1udoona (aerohiug in lueze 60 9 379 6 145 1 a-T 1115 3 1 477 13861 III0 7 prFouct (higy imiar o Bi llsatrcsCp)rti Bclu 60 48 6095 9 6 7 84 86 gnIIOd r Yeprso (BailSr utioicu pygnspaeT2 60 .38 3406 4 S9 170 3 1 9441 828 jglj 15747 Icat influe n tr pri te ing rgcionu H11244 ism opiu)nlese 60 1 34 1017 2 4- 11139-1----- 21 1 3 1 2448 15 9 gi91877427 epryesorb (Stetococcus poeneosae T11 60 9 .389 3689 -y -o 4 4- 9 28 9 92145 2363 1911608520 fmyosin heavy chain kinase A [oictyostelius discoideus(. 60 1 31 219 F 4 4 00 0 0C C* CC C TABLE 2 S. pneumoniae -Putative coding regions of novel proteinssitilar to known proteins 4 a Contig bOR? Start Stop match w atch gene name I'Sim i dent lnt 4p-- 4 4 242 1 1 725 J 3 191143938 ISor regulator (Kiebsiella pneumonlae( j 60 41 1 723 S4 1 1 288 IgiI304897 Ifros type I restriction modification enzyme II subunit (Bacherichia colil j 60 56 288 21 j1 905 45 lgi1671632 Iunknown [Staphylococcus eureusi j 60 j 36 861 -4 1g i- 1 5- 9 4- 12--414929- 1662662 1pirl531840l5318 Iprobable tranaposesea -s Bacillus stearothermophilus 60 26e 6- 26 7 274 1 836 196 1gi11.592173 IN-ethylammeline chiorohydrolase (lMethanococcue jannaachii( 60 1 40 741 F 4 308 j 1 463 1 2 1giI1787397 ((AE000214) ol57 (Eacherichia coli) 60 43 1 462I I 18 I I30 gn11P101el37594 IxerC recombinase (Lactobacillus leichmannii( 60 42 1 306 1 344 111 73 1522 1gij509672 Irepreasor protein (Bacteriophage Tuc2009) j 60 1 32 450 5 4--iI--5 7-6 i1229-1-7----i(-F0-820-)Ytx I j2 184 1712 gn110e2074junknown (Hycobacterium tuberculosis( 59 I 39 I 9991 4- 1 10 j 1 1413 1 4 IgII353880 jalalidase L (Hacrobdella decora) 59 41 1410 LP'i 4 j6 j643 5156 1gi1580841 Il (Bacillus subtilis) 1 59 1 35 I 13081 4 22 j I47 193 jgj1249 als operom regulatory protein (Bacillus subtilisi 1 59 I 34 915 22 5 2698 4614 19nIlP1Dje280623 JPCPA (Streptococcus pneumoniae) 59 I 44 I 1917 I 30 111 208 1558 Ign11Pl01e233868 jhypothetical protein (Bacillus aubtilisi 1 59 I 37 J 3511 4 
67 255 lglPI~e0290jukown (Lactobacillus sake) I 59 I 33 I 1224 I 5 13 12201 111071 IgnIIPIDje23B664 1hypothetical protein (Bacillus subtilisi 59 I 35 I 1131 3 4 4 I 5 14 1328 118 I116767 p8 (tphylococcus aureus) I 59 I 39 1107 4- ,ap- H-St 4 4 i 36 1-18 ji -7-6-j17897 Igi 11 5 0053-5 *jM. annach-i predicted coding-region 11J1635 Imethanococcus jannachiil 59 I 33 I 3 112 1 6172 I7137 19112293239 I (AF0082201 YtxC (Bacillus subtilisl 59 I 34 I 966 4 4- 4 42 j 15 361 IgI6884 pinin (Canis familiaris) j 59 j 40 1410j 4 3 1192 136 g11--4 4 4 I 5 5 180 288 b~1I10e1754 xerc recombinase (LactobacIllus leichmannii( 59 1 41 519 61 1i 6 16812 I 5628 IgnIPlDje3lli56 Iaminotransterase (Bacillus subtilis) 59 I 40 ll1185j 4 67 1S 12382 -i3023 IgiI1146190 I2-keto-3-deoxy-6-phosphogluconate aldolase (Bacillus subtills) I 59 I 36 I 642 4 TABLE 2 S. pneumoniae -Putative coding regions of novel proteins si~ilar to known proteins Contig IORF Start Stop I matc ath gene name a im Iident length ID lID Intl tnt) I acession 0I I I nt) I 69 110 I8567 I8899 Igi11573628 lantothenate kinase (coaAl (Haemophilus influenzsel 59 I 38 I 333 87 112 111383 110055 IgnljPIDe323SO4 jputatlve Fmu protein (Bacillus subtilis] 59 I 44 1 1329 4 111 11 111117 115891 gi 1163731 AEOOOOIO) IMycoplasma pneumoniae, fructose-permeasm uISC component; similar 5 316 113 j4 11327 j5894 911163731 to Swiss-Prot Accession Number P20966. from E. coi (Mycoplasma j16 4 119 2 11966 1526 jgnlIPIDje2O900S homologous to ORF2 in nrdEF operons of Ecoll and Styphimurium I 59 413 411 I I I ([Lactacoccus lactietI I 128 117 113438 113178 1gnl1PIDje279632 junknown (Mycobacterium tuberculosis) 59 1 38 1 261 14 121290 2388111482922 protein with homology to pail repressor of B.subtilis (Lactobacillus I 59 1 40 516 I I I I I I delbrueckil( II 7 14 3 199 04 g1~~dOO JAB0014881 FUNCTION UNKNOWN, SIMILAR PRODUCT INI It. INFLUENZAE AND 59 32 684 148 1 l 967 1 014 JniI~o~d1~005SYNECIIOCYSTIS. 
(Bacillus subtilis) I 10 7213 8244 1911710422 Icep-binding-factor 1 IStaphylococcus aureus( 59 I 40 I 1032I 164 9 6993 16013 IgnlIPIOjd10096S ferric angulbactin-binding protein precusor FatB of V. anguillarum I 59 1I 91 I I I I BacilIlus'subti is) I164 112 j8836 17823 IgnIIPIDI00964 Ihomologue of ferric anguibactin transport system permerase protein FatC of 59 35 1014 I I I I I I V. anguillarum (Bacillus subtilisl 177' 2 1 401 11072 Ii289759 jcded for by C. elegans cDNA CE2G3 JOenBank:114728)i putative 59 40 672k II II (Caenorhabditis elegansi I 4 I 177 I7 I3841 ;4200 19112313445 j(AEOOOSSII N. pylonl predicted coding region 11P0342 (Nelicobacter pylon)i 59 38 360 I 183 14 12768 1 2508 1911509672 Irepressor protein (Bacteriophage Tuc2009) 1 59 1 50 j 261I 186 16 13398 12820 Ii606080 IORF-o290, Geneplot suggests frameshift linking to o267. not found I 59 I 3857 I Escherichla colil I 190 1 3 I3120 11711 Igi11613768 Ihistidine protein kinase (Streptococcus pneumoniae( 1 59 j 32 1410 14 2 1621 11019 IgnIPIDI0s79 junknown (Bacillus subtilial 5 40 603 1- 4 I 198 I 5205 14306 IgnIPIDje3l3O73 hypothetical protein (Bacillus subtilis( 59 I 38 I 9001
I 220 15 14362 13958 IgnIIPIDId101322 JYqhL (Bacillus subtills) 1 59 1 46 1 4051 4 4- 24 17 2367 Ii 16 45 (AE000184( f308; This 308 .aa orf is 35 pct identical (35 gaps) to 305 1 59 4 9 242 3 173 jgi~787045 residues of an approx. 296 aa protein PFLC-ECOLI SW: P32675 (Eschenichia I4 I I jcoli) I 247 12 1 1154 1 1480 191140073 JORF107 (Bacillus subtilis) 1 59 I 39 3271 4 4 *06 a a a a a TABLE 2S. pileumoniam Putative coding regions of novel proteins'sliuilar to known proteins sm Iet lnt W ID I (nt) Int acession j I I j nt) I 256 f1 868 1 2 IgnlIPIDIdlI924 jhemolysin [Synechocystis sp.1 5 9 I 39 867 258 1 1 65 1 820 19112246532 I ORF 73, contains large complex repeat CR 13 (iaposi's sarcoma-associated 59 20 756 I 1 11 1 1 herpesvirus)JI I 270 1 j386 1126 lgn1IP!Dld102092 jYfnfl (Bacillus subtilist 59 j 40 j 741 I 21 I552 1166 1911666062 Iputative (Lactococcus lactisi 1 s5 31 1 387 24- 11--i -4 1 309 j 1 3 1 479 1911405879 IyeiI4 iEscherichia col) 1 59 38 j 477 I 363 111 2 1 1894 IglJ915208 Igastric mucin (Sus scrotal 59 1 31 1 1893( 1 5 6 111123 110465 Ign~jPIDIdiOl8I2 ILumo (Synechocystis sp.( 1 58 .29 1 759j1 1 29 14 12098 13513 IgnIlIlDIdioO479 jIla# -ATPase subunit .1 (Enterococcus hirag) 58 39 1 1416 1 30 15 14058 13651 191139478 jATP binding protein of transport ATrases (Bacillus litmus) 58 34 j 408 I 33 16 1 2983 1 2210 IgnIlPIDIdI164 Iunknown (Bacillus subtilist 58 45 7'4I 36 j8 5316 I6179 jgiJ 1518679 ott (Bacillus subtilis) 58 I 32 I 864 4-- I 43 1 5 j 5926 1 3971 1gi11788150 j 1AE000278) protease 11 tEscherichia colil I 58 1 37 19561 8 -518 I 48 14 1i1722 111066 1gnl1P1131d101771 Ithiamin biosynthetic bifunctional enzyme (Synechocystis sp.( 58 j 34 657 I 2 1 1229 j 3 gInlIDIdiOl29l Ireductase (Pseudomonas aeruginosa) 58 1 35 1 1227 4- -54--i 2- 02- *1 412- 1g i- 23 1-3 57- -E 4 5_8_ 2_5_ -91 6586 5498 g911147329 transport protein (Eacherichia coli) 58 j 41 1 1089 59--FT 6 4* 1 69 I1 I 4934 1 3807 IjgnlItDje3li492 junknown (Bacillus subtilis) 58 1 41 1 1128 
1 4# 1 71 127 131357 132277 19i12408014 Ihypothetical protein (Schizosaccharomyces pombel I 58 1 33 I 9211 1 72 14 13586 12882 1911186 94 jnodulin-21 (AA 1-2011 (Glycine maxl i 58 1 34 705 74 I 3 I4937 j4230 19i12293252 I(AP008220( YtmO (Bacillus subtilis) 1 58 1 33 1 7081 I 79 I4 I4594 13422 19111217989 JORF3 iStreptococcus pneumoniae( 1 58 j 44 3 1173 4 1 82 18 110585 1 8171 19i1882711 lexonuclease V alpha-subunit (Escherichia colil I 58 j 38 2415 86 17 j16017 115337 191147642 5-dehydroquinate hydrolyase (3-dehydroquinese) [Salmonella typhi) 58 1 32 j 681 1 97 I 2 931 1560 191 1153794 jrgg (Streptococcus gordonlil 58 32 I 3721 4 9 9 C9 9 9 Ow to: s .19 *0 09. 0 90 9 .0 99 '9 9. @9 00a TABLE.2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig lOBS' I Start jStop match jmatch gene name S sin ident jlength ID JIX I (ntl nt) ceso nt) 108 1 2 1358 12724 Igij537020 Ivacs gene product (Kicherichia colil 58 j 37 23671 I 11 J 59 540 g~15214 jBCtransporter, probable ATP-binding subunit (Nethanococcus jannasehi) I 58 36 I 648 120 j3 1.4421 5110 IgnIPIDIdlOlJ2O YqgX (Bacillus eubtilia) sBo 47 I 6901 12 16 1311 112673 jgiJ662919 joaF U (Enterococcus hlrael 58 j 42 459 4- 4 13 6174 4939 jgiilBOO3Ol Imacrolide-eftlux determinant (Streptococcus pneumoniae( 58 j 35 1236 ia90--g-l-----e--9-88-IUnko-w---acil-ussub-t1-is 16 il I865 98_65 _I11473901 SP0-FI ([Lactococcus lactia) 58 1 39 j 1251 1 4- 161 16 16268 16849 IgnlIPIDIdO1024 IDJ-l protein_(liomo sapiens) 58I 32 1 5821 4 169 1 1214 1 2 IgnlIPIDIdOO447 jtranslatlon elongation factor-3 (Chiorella Virus) 1 58 31 1 213 I 87 I1 j487 1 2 jgi 475114 Iregulatory protein (Pediococcus pentosaceusl 58 38 486 17 j6 j4384 j4620 g1j 167475 Idesslcation-related protein (Crarerostigma plantagineumi 58 55 237 4- 4 I 190 j2 1464 1640 jgnljP10je246727 jcompetence pheromone (Streptococcus gordonil 58 38 177j 192 1 2 j 2012 11344 jgnlIPIDIdlOOSS6 Irat GCP360 (Rattus rattus) 58 44 I 66910 1T 41-6-9-4 4 216 2 
j2333 555 9-gn 1 P 10D-Ie325 0 36 Ihyvothetlcal protein (Bacillus subtilisl 58 1 33 1 1779 4- 217 15 1 5250 14321 Iglj466474 Icellobiose phoaphotransferase enzyme II'' (Bacillus stearotherrsophllus( 58 1 38 j 930 I217 7 5636 1 5106 jgnl1P101d102048 B. subtilis cellobiose phosphotransferase system ceIB; P46317 19981 58 44 I 531 I~ I B.transembrane (Bacillus subtilis) I 4 232 1 1 2 1 l jgiIl573777 Icell division ATP-blndlng protein iftsEl Illaemophilus influenzae( I 58 1 39 1 8101 264 1j 1 2 1 715 jgiJ973330 I~atA (Bacillus subtills) 58 32 7141 4 280 j8 j1 33 +17673 1-7l786l--g 7 1IA1E17O-8l61-1-hypothetical_(A- E29.60 -k-131 protein in thrC-taIB intergenic regiontei InthrCtaIBin58genc 31ion587331 1 I I I I I (Escherichia coli) 1 306 1 1845 1 3 IgnljPIDje334780 IY~bL protein (Bacillus subtilisi 58 1 47 j 843 4- 360 3 55 j109 IpI4651YZG~ HYOTETCAL 45.4 lCD PROTEIN IN TIIAJINASE I 5'14E010N. j 58 1 32 465 363 5 2160 1867 g211160671 IS antigen precursor (Plasmodium talciparuel 1 58 1 51 j 294 372 1 1806 3 IgiJ393394 ITb-291 membrane associated protein (Trypanosoma brucei subgroup) I 58 1 7804 4 382 2 74 1 519 IpirIJCll5lIJCll [hypothetical 20.3K protein (insertion sequence IS)131) Agrobacterium 58 4)21 I I I I I I tumefaciens (strain P0221 plasmid Ti III 2 1 -I -4 -I 4 a. 0 a a. a a a. a.00 a a a..00 TABLE 2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig JOR? Start Stop I match match gene name jI aim Iident jlength I D I JID I ntl I ntl I acession I I nol~ I 3 19 1 8409 17471 1giJl499745 jH. 
jann aschii predicted coding region 1(30912 (Iethanococcus jannaschiil 1 57 38 1 939 1 10 10 I 67 j757 gi17719 hoologue to SICPI (Arabidopsis thalianal 1 57 30 1 168 11 I 2 J412 lonllPIOIdlO0l3S lORF (Acetobacter pasteurianual 57 42 411 I 31 14 12032 11388 fgij2293213 IjAF0082201 YtpR (Blacillus subtilisi 1 57 I 31 I 645 1 4 33 I6931 I6449 gn1jP1d0-e324949 Ihypo-thetical protein (Bacill-us a-ubtilisI 57 1 36 1 483j1 6 15IS I54 5060 giJ1592204 Iphosphoserine phosphatase llethanococcua jannaschiil 57 1 44 1 387 4- I 9 j7 j6523 j 7632 19i1155369 jan'S enzyme-II fructose IXanthomonas campestrial I 51 I 1 11101 52 6 j 4.520 j 6850 19111574144 jsingle-stranded-V1JA-specific exonuclease (recJl (liemophilus influenzaei j 57 I 35 2331 I 53 1 5 2079 11795 jg9j 1843580 Ireplicase-associated polyprotein (oat blue dwarf virusi 57 46 j 2851 -0 -4I- 4- 63 16 1 5312 14995 Igij2l82608 IIAEOOOO94) Y4rJ IfAhizobium sp. NGR2341 I 57 I 39 318 S4- -4 4- -4 4-- 79 12 2561 1j 1815 IgnlIDIdlOO96S Ihomologue of NADPH-flavin oxidoreductase Frp of V. harveyi (Bacillus I 57 I 1 14 747I I I I I I I subtilisi 82 9 959 j 9763 giI12O6o45 jshort region of similarity to glycerophosphoryl diester phosphodiesterases I~ 6 I I I I I(Caenorhabditis elegans( 6 8 6131 j1l 7 7 8 YOIB-.ECOLI SW: P28244 (223 as) (Escherichia col iduso rget I 3 I3 I1695 1177 19111500003 Imutator mutT protein (Hethanococcus Jannaschfi( 57 I 33 S 59 96 6 13026 14519 1g11559882 Ithreonine synthase (Arabidopsis thallanal I 57 I 43 1 14941 I 99 j14 117211 118212 1911773349 jairA protein [Bacillus subtilisi j 57 j 44 1002 1 4 112 I I744 703 jgj 19193 II. jannachiil predicted coding region H(10678 (Hethanococcus jannaschii) 57 I 30 1 456 4- -4 4 4- 11 16 1827 j132 jiIA45605IA456 Imature-parasite-infected erythrocyte surface antigen SMESA -Plasmodium 5J 22 j 00 1 123 12 j 343 11110 1pir1F641491P641 jhypotheticai. 
protein H10355 Haemophilus influenise (strain Rd KW203 57 38 768 13 4 2108 12884 IgnlIlDIdIO2l48 i(AB0016841 sulfate transport system permease protein (Chlorella vulgarial I 57 39 777 4- 4---4 1 27- 110 Ij 6477 5587 19111573082 n~trogenasm C (nifC) (Haemophilus influenaelI 57 I 35 I 8911 28 13 I9251 9790 19i1153692 1pneumolysln Istreptococcud pneumoniaelj 57 i8 540j 12-113T 4; 112139 1363 191142081 Inago gene product (AA 1-250) (Escherichia col) 1 57 j 36 777 F-3- a aa a TABLE 2 S. pneumoniae -Putative coding regions of novel proteins Similar to known proteins 1 10 JIt I Int) I nt) I acessian I I nt) 136 1 1214 j1221 1bbsI148453 ISpaA=endocarditis immunodominant antigen (Streptococcus sobrinus, HUCOB 571 I j 108 I I I I263; Peptide, 1566 (Streptococcus sobrinus) III 4i -140 125- j271 165 gi-1505576 Ibeta-giucoside permease (Bacillus subtilis) I 57 1 38 1 1851 1 141 16 1 6395 1 7438 1gi1995560 Iunknown (Schizosaccharomyces pombe) 57 1 41 1 1044 1444 3+ 3231 2785 -27-----Ig-n-l-gnljPIO -O-13---1dl00139c JORF (Acetobacter pasteurianus--- 5 -57 42--42 -4447- 4- +1yos-y---anseras 1 I 159 9 4877 I5854 Igi129b509 10307 (Escterichia coli) 1 57 1 35 1 978 1 -T-T 16 Il 910 9249 IgnilIDIdlO0l39 IORF (Acetobacter pasteurianus) 57 42 462 1 171 1 6 14023 1 4436 IgiJ1474o2 Imannose permease subunit III-Man )Escherlchia coli) 57 1 29 414
1 178 14 3 170 1 1076 igiP1I12041AB001488) ATP-DEPENDENT RN4A IIELICASE DEAD IIOMOLOG. (Bacillus subtilia) 1 57 39 1095 4 I- a 1 190 111 145 j 1455 1gi1149420 lexportiprocessing protein (Lactococcus lactis) 1 57 I 30 1311 298- 9- i- 8 uidentifie .ORF22. 6 1
203 12 13195 j 2110 1gn11P101e283915 jore c01003 ISulfolobus soifataricusi I 57 I 41 j 108 6 305 1 1 40 1 507 )gi11439527 III-man iLactobacillus curvatus) 1 57 28 468 I I I I I j (Bacillus subtilis) III 268 I3 1767 11276 Igi 143979 IL.curvatus small cryptic plasmid gene for rep protein IMactobacillus I 57 36 492 I II I I curvatusl 1 351 1 1 324 1 34 jgnljPI13Ie275871 1T03F6.b ICaenorhabditis elegans) I 57 I 31 291 386 226 2 ij -160671 S-antigen precursor IPlasmodi-um falciparum) 1 57 1 45 225 1 4 4 a 5 10364 1 8777 1gi1405871joahrihacllI 63 364 31 gj619 pksC; L518j1..2 [Hycobacterium lepre j 56 j 39 1 237 10 3 j 3442 i1874 jgn1jP1Djd10l907 Isodium-coupled permease ISynechocystis sp.) I 56 36 1 1569 f I 21 1 1 11880 1333 IoiI2313949 j (AE000593) osmoprotection protein (proliX) )ifelicobacter pylon)l 56 1 33 1 1548 4 a 22 129 121968 122456 IgnIlPIDjdlO200l j(AB001488) PROBABLE ACETYLRAJSFERtASE. (Bacillus subtilis) 56 37 J 489
1 27 1 j 1361 1 3 jgiJ215132 Iea59 (525) (Bacteriophage lambda) 56 1 30 1359 I 8 9 46671 4278 ~giJ1592090 IDNA repair protein RA02 Ilethanococcus jannaschii) 56 1 29 I 390 -a i- a I 33 1 1 3 386 Ign~jPxo~d100i39 10SF lAcetobacter pasteurianus) 56 j 41 j 384 4 0. .0 0. 0 a a a 0 TABLE 2 S. pneumoniae -Putative coding regions of novel proteintlf-gmilar to known prot eins m- Coti lot tr tp mtch jmatLchgenename s im Ident length I ID lID I (nt) ntl a acession II ntl I 6 j7 52 5397 lpirPQOS3lPQo0 hypothetical protein (proC region) Pseudomonas aeruginosa (strain PAO) 56 28 276 I I I I j (fragrpent( 11 4 1P- _j__17602_* 1nU_- 4 6- 4 137 431 19i18000l trancr tide-e afcu ivdtrinat lu (Stre tcoccsp moie 56 27 1182 167 4 11 115 1 21391 gnlIPIDIeI276 IPebnU [Lact facillus la t ea Iphi 56 1 38 6891 8 117__ -i 7 *1 13023 g ll -Pij 43729 Otr -F-a crit o nac t ivp asto-r [a cilu- s subti- is-- 56 -415 3- 1075 1 2 1 1364 1 2594 IOnIPIDjdIO2I3 1ymembr ane protein (ilsy tao therm phiu 1 56 25 3 I 191 1 852 3 1425 1314 IgnII D1 00l3 loRF_04 (A ccer pateia ns i I S6 1 411 344 892 7 I1 5 49304 giI8539 5 iroducffiilt Eri coiii PRFtai prot i n (Bac i Stlionll J 56 42 876 4 I 0 136 271 IgnIi~ l I hyphica poeirSyecoysi s. I 56 I I 139 124 3 92151 3194 1911537D2 8 IORF..o345 (Echbceria ct)bruoss 56 3127 1044 4 1 12 1 13 4 2540 93 I1.7 gnIPlDIdlOO34O lhuma (Pl uspox myors in jev hi ospes 56 284 29701 23 3 j 79 20 3 825 1149 035IPDdII7 higohafftity peias cgutm ebndgprotein (Salmonellati spI 56 30 821 I I 1 17 110 1 681 193 7245 IgiISro2bB s uknown (Iyber ium(csA (tuer co s jansh 56 274 42 I -I p -r 1A-47 -7 11 -7 0 -rf _di~ _I f i I us s u _L 1 -5 1 1 SF 4 4 I 3 46 89 IPD;ll7 hypothetical protein (Synechacystis sp.1 56 1 329 1- F _148- 6- 4 4 t- r-Le---tA -[Synecho 1 834 131 j 1 446849 2743 19iIj192 0 2 sulfatel pierm e cfcr omase (cs) (itao ocu janneoaschl faiyo nyeI 56 34 j 3 4 142 8 019 45 282 1PilA4771A7 lorfi 9 immdiael Thi of 7 nf Baills subptidenis a j1 as)t 0 56 29 4381 I rsde a 146. 
82 466 36aglPadlll hpteia protein (Synecocysti SW; P314 56 32 1017 I148 3 1 99 2739 IgnllPD9 dl09 ihospart transport systemi pmease prote. sn (Sncotis p. rei 56 3 3 S4-- 4 172 1 372 2087 1911787791 (AS E00249 for7 Thinse 317aaurfis 27~a pctideica (ris gas)to30 56 1 36 207 1 17 12 120 71 I prS794 7 iir prt eillu S ubetcocis tyoh. e 20 kfaprin in sro3 reMio) 1 56 40 6158 -F 4- TABLE 2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig 10SF IStart jStop j match match gene name sim ident Ilength I ID jIG .1 nt) I (nt) acession I I ntl 772 2239 jgiJ606376 l0BF...oj62 Itscherichia coiij 56 35 534I 206 2 3342 j1633 jg11559861 iclyW (Plasmid pADlI 1 56 38 1 17101 219 3 11689 1 1096 IgiIl146197 jputative (Bacillus subtilisj 56 j 27 5941 230 2 j 409 1485 lprC02I63 hypothetical protein 2 (sr S' region) Streptococcus mutans (strain 564017 I I I I 0l1Z175. aerotype f) i$2 ;4 rhoptyrotn--- 23 2 I1543 12724 jgiJl43089 flap protein (Bacillus subtilis) 56 1 32 1182 4 4- ;353 -j 1 F5-16 j gnljIPl10-ie32SOOO- ihypothetlca1 protein (Bacillus subtilisi 56 41 1 5161 1 359 1 1 87 1 641 IgiIl786952 IiAEOO0l76) o877, 100 pet identical to the first 86 residues of the 100 aa 56 j 46 555 I I I I I j hypothetical protein fragment YBGBECOLX SW: P54746 (Escheridta colil III 4 1 363 17 14482 14198 IgiJ1573353 jooter membrane integrity protein (tolAl (l~aemophilus influenael 56 j 38 1 285 4 I 18 111 836 j 177 IgnljPID~d100872 Ia negative regulator of pho regulon (Pseudomonas aeruginosal 1 55 31 1 660 1 -4 4-~ 28 j 4 1824 1619 IgnlPIOIe31,6518 jSTAT protein (Dictyostelium discoideuml 54 0 4 -4 4 -041 29 6 14496 15041 19i11088261 junknown protein (Anabaena sp.) 55 j 31 546 4- 4 38 116 9695 110702 jgijSBO9OS IBaubtilis genes rpml,. rnpA. 
S0kd, gidA and gidB (Bacillus subtilis( 55 j 31 1 1008 4- -T4- 4 4 I 49 I 5 5727 16182 IgiI1786951 j(AE000176) heat-responsive regulatory protein (Escherichia coli) I 55 1 29 1 456 I 4- 51 4 12381 13241 IgnlP1DId101293 lYbbA (Bacillus subtilis) 1 55 I 42 1 861 4 52 19 19640 110866 1 9 i1153016 10SF 419 protein (Staphylococcus aureusj 55 I 23 1227
I 3 I 81 34 918642 IsP Brella bugofrjI 55 30 465 I 60 15 14194 15756 1gl11499876 [magnesium and cobalt transport protein (Hethanococcus jannaschill I 55 38 1 963 71 19 114176 j15408 IgiI1857l20 jglycoayl transferase (Nelsseria meningitidisj I I 41 1 1233 4- 4 16 13189 14229 IgnlPlDje2O989O IHAD alcohol dehydrogenase (Bacillus subtilisj 55 I 44 1 1041 10 I_8 Ol-0 114-68 I9820 Ign11-P101-e324997 lhypothetical protein (Bacillus subtilist 55 f 36 669 11 12 112273 I13037 lgnIIPlDje3ll496 Iunknown (Bacillus subtilisi 55 I 34 I 7651 S4- 4- 11 13 j13007 113945 IgiI13423 1-phosphofructokinase (frus( (Haemophilus influenzae( 55 I 39 I 939 1 126 5 6764 5907 1gi11790l31 I(AE000446( hypothetical 29.7 kG protein in ibpA-gyrB intergenic region 55 I 37 I 8581 I I I I I I (Escherichia colil I II 0~~0 0too000 0 00, TABLE 2 S. pneuisonlae -Putative coding regions of novel proreintdamilar to known proteins 4- Coti ORP Start I Stop I match match gene name %Sim ident jlength I ID jio 1 Intl j Intl I acession (ntl I 129 3 1 2719 1902 IgnlPIDjdIOit2S lPz-peptidase (Bacilus licheniformisl 55 I 35 1818 4- 138 3 2593 j 1610 1gi1142833 I08r2 (Bacillus subtilis( 55 I 37 1 984 10 6 16916 15633 JgnljPLD~d100964 homologue of hypothetical protein in a rapamycin synthesis gene cluster of 55 26 1 1284 I i I i i~ Streptowyces hygroscopicus (Baciilus subtilis)II II i--147 i3 13854 12136 jIgi1472330 dihydrolipoamide dehydrogenase (Clostridium magnum! 55 j 39 I 1719 I147 110 110204 I8921 IgnIIPIDIS73078 IdLhydroorotase [Lactobacillus leichmannili 55 38 j 1284 148 5 3430 i4119 igij290572 Iperipheral membrane protein U (Escherichia coli( 55 j 29 690 148 6 14171 j 4650 19i1695769 Itrensposase IXanthobacter autotrophicus) 1 55 1 A7 1 480j I 149 ji4 j12564 111650 IgnIlPlDIdlOl329 YqjG (Bacillus subtilis) 1 55 1 32 1 915 4 i; 1 1 pyioril II I 5 1 65 I5897 Ii203 similar to E. 
col 0RP adjacent to sue operon; similar to gnrR class of 55 1 29 729 4 16r 74 J23 gnjI~25 hpeltocr1 proteins (Bachius subtils 5 II 4 j 164 i5 2772 ;3521 gIt 1 4 0 348 I---1p-u resa-ivase Tnp I (AA- 2843 (Bacillus thuringiensis( 55 I 35 1 750 1 F 164 Ill- 7428 f -7216 1ign11Plole2494o7 Iunknown (Hycobacterium tuberculosis( 55 1 38 1 213 I 161 1 5 1 3860 1 3345 IgiJ535052 linvolved In protein secretion (Bacillus subtilisi 55 I 28 1 516I 186 I 5 2880 2563 Jg1I606080 IORF .o290: Geneplot suggests trameshift linking to o267, not found I 55 1 35 I 318I I I I I(Escherichia call
4- 4 189 8- I 31 I 5396 jignl--P-D-le183450 jhypothetical Bosa-protein IBacilus subtilisI 55 I 32 1086 I I 192 IT S- 320 I37*g~16vitellogenn convertase (Aedes aegypti( 55 I 38 192 1 195 1 2 1 2454 1 1384 IgiJi574693 Itransferaae, peptidoglycan synthesis lmurO) (Heemophilus influenzae) I 55 1 33 1071 ;---9198 *1I 1I-3013 7-2471 Ignll--P-L)I-PXD 1-07e 4-i3074 jhypotherical p protein (Bacillus subtilisi I 29 543I3 4 1 214 1 1 1 373 1 744 IgnijPID~d10174l Itransposase (Synechocystis sp.) 55 I 33 I 3721 4 30 66 2199 2 1115 456 gene---product----(Bacillus-- egateriu---- I 23 I I3742 I3443 IgiJ18137 Icgcr-4 product (Chiamydomonas reinhardtii( 55 I 48 300I -263 1 285 1 1 1 2 1 829 JgnlJP1D~d100974 Iunknown, (Bacillus subtilis( 55 1 40 828I 286 I 1 650 I249 gI(396844 loaF (18 k~a l(Vibrio cholerae( 5 31 I 402 T -T i I 297 I2 1 1229 I1696 IgiI150848 jprtC (Porphyx-omonas glngivalis( 55 1 39 I 468I 4 09 9 9 t 9 99 9 09 0 99 9 9 9 9 TABLE 2 S. pneumoniae -Putative coding regions of novel protel-slilar to known pro~eins Ioni jO? Start IStop match match gene name I a im ident length, I ID JlD Intl. j Intl I acession I I I Intl 39 2 218 1982 1gi11574491 jhypothetical (Iaemophilus influenzae( 55 f 35 765 F328 i2-1 646- 224 191 1571500 Iprohlbitin (Saccharomyces cerevisime) 55 27 423 330 111 1340 474 1911396397 IsoxS (Escherichia colil 5 29 87 364 132538 1546 jgij393394 jTb-291 membrane associated protein (Trypanosoma brucei subgroup) 55 j 36 j 993 744- 368 f-94- 1 1 0 5 19g 1 1160 6 71 IS antigen precursor (Pasmodium falciparum( 1 55 j 40 f 837 I 3 IS 4604 3624' 1112293176 l(AF0082203 signal transduction protein kinase (Bacillus subtillal 54 j 26 f 981 34 -4*J5- "1 1 1746? 
1 7246 19111146245 1putative (Bacillus subtilis( 1 54 j 38 j 501 L 38 24 11213 I17937 19111480429 jputatlve transcriptional regulator (Bacillus stearothermophilus( 1 54 1 27 1 1725 4 1 40 18 15076 1 48182 IglI139909 Imethionyl-tRNA synthetase (Bacillus atearothermophilual .1 54 35 1 195 4 I 43 I 4 I 3980 12367 IgnIIPID~lel4B6II IABC transporter ILactobacilius helvetlcus( 1 54 25 1614 I 57 1 f 3 1512 1911558177 lendo-l,4-beta-xvlanase (Cellulomonas timi( j 54 1 36 SI 51 1.
58 79 14246 IgnIjPIDjdl0l237 Ihypothetical (Bacillus subtilis( 54 29 504 I 1 I20 110 6 111 7 19115025 53 l o i n reac e pthi or i u .ov g c s I 54 1 31 1 02 I 72 I2 1844 I 109P 191I148613 Isrna3 gene product (Plasmid F( I 54 37 72 7 1 7438 16695 19111196496 Irecomblnase (Moraxella bovis( 4 I 3 4 74 110 114043 113465 19i11200342 jORF 3 gene product (Bradyrhizobium japonicuml 54 I 32 579 74 112 116483 115995 10112317798 Imaturase-related protein (Pseudomonas alcaligenes( 54 I 30 f 489 I 6 I3 I2877 I2155 191146988 Iorf9.6 possibly encodes the 0 unit polymerase ISalmonella enterica( 54 34 723 T I 9 I5 I4433 I 3921 1gi1147211 jphno protein (Escherichia colil 54 j 41 I 513 4- 4 I1 j 3 1464 19i12317798 Imaturase-related protein [Pseudomonas alcaligenes( 54 j 30 462 4- 4 44 4 96 110 I 8058 I8510 IonlIDxIdlo2ols 11AB0014881 SIMILAR TO SALMONELLA TYPIIURIUM SLYY GENE REQUIRED FOR I 54 I 3245 I I I I I I SURVIVAL IN MACROPNAGE. (Bacillus subtilialI 4 I 97 16 j 4662 I 3604 19i11591394 Itransketolasr' (Methanococcus jannaschii( I .54 j 30 10591 106 Ill 110406 112010 19i1606286 IORF~o637 lEscherichia coli( I 54 I 32 I 16051 147- I-8- I 8663 I- i 7404 I;gn IIPIO I d 1 6 15 ORFIID: o319 7; simi lar to (Swi ssProt Access ion Number P37340( (E cher ichia I 54 -35 I 1260 1 TABLE 2 S. 
pneumoniae -Putative coding regions of novel proteiins similar to known proteins IConcig jaM' Start Stop I match jmatch gene name I sim Iident length, I D 10 ID I Int) Iintl I acession II Iintl 4# 171 4 12477 1 3223 19111439528 jElIC-man [Lactobacillus curvatusl 5 4 I 36 747 4 4 174 2 12068 11787 IgnlIPlDjdl0o5l8 Imdt ,or protein illomo sapiens) I 54 I 35 I 282 1 188 1 1 526 11188 IgnlIP10~e250352 junknown [Mycobacterium tuberculosis) 54 31 j 663 198 j j358 j284 In1P 13074 jhypothetical protein leacillus subtilis) 54 I 33 I 699 F-207 F1- j- 1 1-;l641 jg-;qnh-jPITDdI-1la1-1l 1hypothetical protein iSynechocystis sp.j 54 j 24 j 1641 1I1 2 1655 li12293208 j(AF0082201 YtmP (Bacillus subtilis)1 54 .2 1 641 I 2 2 j 966 1 2357 IgnIIPIOje33Ol94 jRIlH6.1 iCaenorhabditis elegans) 5 2 9 65492 1241 1 1 16811 347 IgniIPzojd101813 Ihypothetical protein (Synechocystis ap.) 1 54 1 26 1 13 1 263 1 2 1907 11395 IonhIPIOIdiaa86 Itransposase ISynechocystis sgvi 1 54 1 30 489 I263 1 6 j 3450 12977 jgijl6O67I IS antigen precursor (Plasmodium falciparuml 1 54 1 47 I 474I 4 4 277 13 12517 11363 1giI1196526 lunknown protein (Streptococcus mutans) j 54 j 30 j 11551 1307 1 1 828 14 19112293198 I)AP0082201 YtgP iBacillus subtilis) 54 j 28 J 825j I332 I2 898 50 g1985 AOP-ribosylglycohydrolase idraG) ifethanococcus jannaschiii I 54 j 32 309I 4 4- 4 4- I385 4 240 1 479 gij530878 aino acid feature: f-glycosylation sites. as 41 43. 46 48. 51 3. 5 4 49 240) 72 74j 10 .1 128 130, 132 134. 158 160, 161 165; ac fe t Relod protein domain, as 169..30amnacdttre I I I I ~~~~globular protein domal.30 mn cdfaue 17 125 119702 119493 IgnlIPIDIe2SSIl 1hypothetical protein (Bacillus subtilis) I 53 I 32 1 2101 23 3 1 2497 j 2033 Ign1IPIoId1O2ols IABOO1488I SII LARi 'O.SALISONELLA TYPIIINURIUH SLYY GENE REQUIRED FOR I 32 j I I I SURVIVAL IN MACROPHAGE. 
[Bacillus subtilis) 1 1 2 I 46 1 29 111 1 9042 110121 g9i1143331 jalkaline phosphatase regulatory protein (Bacillus subtilis) I 53 j 31 10801 F-4- -009- -y-o-occs--- o -s -i -a jm -n *1--711 36 6 I 4583 j5134 Ign1IPIDIe3I6O29 junknown (Nycobacterium tuberculosis) I 53 I 30 S 521 i 4 S 8 1 8521 I8898 IgiI580904 Ihomologous to Ecoli rnpA (Bacillus subtiiisj 53 I 30 378 152 1 7 I 7007 1 8686 Ig111377831 Iunknown (Bacillus subtilisl 1 53 I 9I 1680 I~ 171755I954191666 orf2 gene product iLactobacillus ieichmannii) I 53 I 36 I 2010I 1 56 1 1 1 1 1 681 19i11592266 Irestriction modification system S subunit imtedanococcus jannaschil I 53 1 32 1 681 I 4- 0 TABLE 2 S. pneuimonlae -Putative coding regions Of novel proteins61milar to known proteins I onig 11W StrtjStp athmatch gene name I im Sident jlength, I 10 JID I't A I ntl a-cessio I ntl 10 19431 18487 19111788543 l1AE0003101 f351; Residues 1-121 are 100 pct identical to YOJLECOLI SW: 53 31 945 I I P33544 (122 al and as 152-351 are 100 pct identical to YWJK-ECOLI SW:I I I I P3394-3 (Escherichia col! II 61 1 I49 I 4 1gnl1P1Ije236467 190024.12 ICeenorhabditis eleganal j 53 j 33 I 4261 4 4- 71 I 1 j5772 1 4 Igij393394 ITb-291 membrane-associated protein (Trypanosoma brucel subgroup( I 53 I 33 I 5769 4 72--*3T89 2 -9 -1 4-- 11- 19793--*---1 -i I1 -7 -5 -8-4j 88 j7 5217 4342 Igi120987l9 Iputative fimbrial-associated protein (Actinomyces naeslundiij 53 I 38 1 876 4 1 93 1 5 2395 1 1688 1911563366 Igiuconate oxidoreductase Icluconobacter oxydans) 1 51 I 33 7081 96 i9 6 632 ;7762 1911517204 IORPI. putative 42 kna protein (Streptococcus pyogenesI 53-- 42-- 11- 31-
08 1 8 17629 1 8600 191l149581 Imaturation protein (*Lactobacillus paracaseil 53 32 I 972 12 I9 612 I6972 IgnIIPIDje317237 Iunknown (Hycobacterlum tuberculosis( 53 I 36 j 561 1 128 112 1 8429 1 9253 Igij3il070 Ipentraxin fusion protein IXenopus laevisi 53 1 31 I 8251 I 48 I I 3 90 IpIrIA6I6O7IA616 Iprobable hemolysin precursor Streptococcus agalactise Istrain 74-360) 53 I 36 I 948 IX 163 2- 2162 3022 -gi-1-5 Inocturnin -IXenopu la s 1 171- j 3 12304 12624 1gi11732200 IPTS permease for mannose subunit IIPlian (Vibrio furnissii( 53 I 32 11 321 I 82 I I375 05 gnlIPIDIdlOO572 junknown (Bacillus subtilis( 53 sY5 7u35 209 13 12948 11935 19111778505 Iferric enterobactin transport protein lEscherlchia colil I 53 28 1014 218 5 88 2 0 140162 Imurs gene product (Bacillus subtilis( 53 I 34 1479 3 473 ;790 Iogn1 D-1I0e 3 3 4776 IYlbH protein (Bacillus subtilis( 53 I 30 3181 i 275 1 1 1611 lgn11PIt11d101314 lYqeW (Bacillus subtilisi 53 35 I 1611 T-i g- 1-0 -2 8- 2 21 2543 3445 1gnlP101e233879 Ihypothetical protein IBacillus subtilis) 523 I FI I2- 122402 123376 1gi138969 IlacF gene---- 11 product (Agrobacterium radiobacter(eri-----r---i--bac-Le-r 36- 915-.S S4- 04 j 2356 1gnl1P101e3249l5 11gMl protease (Streptococcus sanguisl 1 52 j 32 5739 4 1 22 126 119961 120212 IgI1152901 jORF 3 (Spirochaera aurantial 1 52 35 I 252 1 22 131 123140 124666 1911289262 IcomE OAF] (Bacillus subrilis( 52 1 32 j 1527 27 1 5397 4801 191139 5 73 P2 (AA 1-1781 (Bacillus licheniformis) I 5 5 I 57 i 0 00 0 0 TA LE 2S. pneumoniae -Putative coding regions of novel proteind-hmllar to known proteinss i Contig loa-F I Start J *-Stop -I 7 match jmatchgenename ID jio 1 Intl I Intl a acession I I I In tl l I 5 10 J8604 7357 fgiJ50824l Iputative 0-antigen transporter IEscherichia colil 1 52 5 27 1 12485 4-S I 5 I4 14801 3662 jgnllPiDIO2243 5(ABOOSSS4) homologs are found in E. coil and H. 
intluenzae; see SWISS..PHOT 523614 I I IACCI: P42100 (Bacillus subtilil I I I 1 I I 8 11 148 5326 InIIPIDje20SI74 jorf2 (Lactobacillus helveticusi 52 25 5 6601 I 49 1 4 1 5321 5755 igi12317740 j(AF013987) nitrogen regulatory 11A protein (Vibrio cholerael j 52 f 19 435 1 54 1 4 1 2773 1 4668 Igi11500472 IN. jannaschii predicted coding region 1431577 (lethanococcus Jannaschil 52 f 36 5 1896 I I6 J5250 j4969 IgiJ2l82453 I(AE0000791 Y4iO (Rhizobium, sp. NGR2341 52 40 2821 54+_6*j T 1 66 1 6 1 8400 16955 jgiI431dO ITrkG protein [KEcherichia colil I 52 1 30 1446 11 26 136659 531312 jgnl5P1Dje314993 junknown (Hycobacterium tuberculosis( 52 1 23 j 6541 4 1 75 I2 11673 1 1035 IgnlIPIDjdiO227l 5(AB0016831 ParA (Streptomyces sp.( 52 27 6391 1 81 1 3 11439 12893 jgnIjP1Dje31458 jrhamnulose kinase (Bacillus subtilisj 52 32 1455 4 0 4 83 21 120687 121853 (gJ435 lhshrbslainiiaoecroyae1 PUR-K; t start coo) 237 10 Bacillus subtilialI 86 56 15785 I4592 10111276879 jspsF (Streptococcus thermophilusl 52 26 11945 1 86 120 119390 117861 1011454844 IORF 3 (Schistosoma mansonil 52 26 1 96 113 110540 1 9659 JgIJ288299 joRFI gene product (Bacillus megaterluml j 52 5 33 8825 I Ill 1 2 2026 1 1480 cytolysln a transport protein (Enterococcus faecall 52 27 5 2025 12 I2 1457 j2161 5gi[47l234 lorfi IHaemophilus influenzaej 52 I 33 I 711 3 291 26 ibbs5151233 5Nip=24 kda macrophage Infectivity potentiator protein (Legionella 52 3 6 I I i I 1 pneumophila, Philadelphia-I. 
Peptide, 184 aal (Legionella pneumophilal III 12 59 5646 5951 10118214 jmyosin heavy chain (Drosophila melanogaster) 52 36 306 12 122 Ill I 6159 56374 IgiJ434025 dihydrolipoamide aceryltransferase (Pelobacter carbinolicus) 5 52 5 52 j 216 4- -4 141 13 5 1681 12319 IgniIPlDId10S73 Iunknown, (Bacillus subtilisl 5 52 5 32 1 8395 4 a 4---4 I 161 I 4 I 2562 5024 loll 1146243 122.4% Identity with £schierichia coli DNA-damage inducible protein 5 52 5 36 5 2463 1 I I I I 1 Putative (Bacillus subtilis( l I 173 2 968 183 gi1 1215693 Iputative orf; 079-orf434 [Nycoplasma pneumoniael 5 52 I 30 786 i_ 4 0 0 0 0 0 0 0 0. :00 0 0 0 0 0 0 00 e 00 0* 0% 0 0. 0 0 00 TABLE 2 S. pneumonlae -Putative coding regions of novel proteinS 's'isllar to known proteins t Cotg OBV Start stop I match Imatch gene name S oim ident length I D j I (ntl I ntl acession I- I 1 6 f4400 f3567 jgnllP1Dle31300 lhypothetical protein (Bacillus subtilis) 52 26 834 4 1 210 112 1 8844 19101 Ig1i497647 IDNA gyrase subunit B (Nycoplasma genitalluml 52 I 38 j 264 211t I56 5431 jgJ155O697 jenvelope protein (Hluman immunodeficiency virus type 11 52 36 168 1 225 1 j 15 1 884 IgiJ1552773 1hypothetlcal (Escherichia colil 52 j 34 I 8701 1 230 1j 1 39 1 362 Ign1lPIDld100582 junknown [Bacillus subtilisi 1 52 28 I 324 27 1 j871 2 ~gnlPIle335028 Iprotease/peptldase (Nycobacterlum leprae( 52 j 29 870 F 36-3 I- F2- I 1305 4 jgij393394 ITb-291 membrane associated protein (Trypanosoma brucal subgroupl j 52 1 32 1 13021 23 2 24 1173 ~gn1lPIle254943 Iunknown [Hycobacterium tuberculosisl I 51 I 30 876 423 0- 4 4 29 13 1742 11521 Igi929900 j5-methylthloadeiioslne phosphorylase (Sulfolobus solfataricusl 51 j 31 780, 4 4 I 45 I 1 410 '11597 IgiJ1877429 lintegrase (Streptococcus pyogenes phage T121 51 32 1188 48 26 1927 1894 ~i~21445 (E00633 tacriptional regulator (tenAl (Ilelicobacter pyloril I 1 33 282 I 73 I 5 4276 14016 IglI474177 lalpha-0-l.4-giucosldase (Staphylo coccus xylosus( 51 31 261 -l*1 5- 1- 1070- -4 NJvis 4 83 I 5 11195 
I 1986 jgnl!PlDjdlOl3l6 jYqfI (Bacillus suibtilisl -I 51 33 1 921 1 .98 110 I7531 18538 jgiJ41500 lOF3(. 8k pt t~ Eceiha cl(I 51 1 28 I 1008 1 I 113 16 13908 15173 IglJ466882 Ippsl; B1496C2.189 (Hycobacterium leprael 1 51 1 27 1 12661 ;I 124 1 I 326 I- ;_57 ig1i2191168 ;(AF007270) -conta inss sim iIa r ity to myos in heavy cha in- IArabi dops is -thalIi ana( II 5 32-- 270- 1 129 110 1 7.286 16816 jgl 1046241 jorfl4 (Bacteriophage H-P13 51 j 30 1 4711 143 13 4963 1 3983 Ig1i13S493S iprobable copper-transporting atpase (Escherichia colil I 51 1 26 I 981I 1 148 js5 111359 110226 jgiJ2293256 I 1AF0082201 putative hippurate hydrolase (Bacillus subtilisl 51 I 36 1134I 149 18 f6003 17313 jgij 1633572 jIerpesvirus salmiri 0RF73 homolog ((aposi's sarcoma-associated herpes-like 51 211' I I I I I I ~virusl ~pI1 2 1 11 4- 4- 4 -151 9 1 i12092 -i111550 1ogn119l0e28l580 ihypothetical 40.7 kd protein (Bacillus subtilis] 51 1 34 1 543I I 19 I6 I2555 I3208 Igi1146944 ICxP-N-acetylneuraminlc acid synthetase (Escherichia colil 51 I -36 I 654I 4- I 174 I 1 I 1797 I 4 IgiJ1773l66 1probable copper-transporting arpase (Escherlcila colil I 51 1 280 1794 1 4- I 4 4- 1 265 I 4 1.2231 I 1773 jgnljP1Dje2S6400 Iantl-P.falciparum antigenic polypeptide (Salmiri sciureusl 51 I 8is 459I 4 4- -9 277 I2 1.643 I1311 1pir153291515329 pilD protein -e-sser-a gonorrhoeae--51-I----66 0*ti O@ St r stop ma c ma c 0en naa Si U. I l n t TO SI I4 an) I (t I a e s. a @0 3. 
IgJ900 500 55cerci cal 0 0880 4 I 363 1 4 11228 14485 Igi11707247 Ipartial CDS (Caenorhabditls elegansi 51 j 23 j 3258 367 1 1 j 1701Tj 4 Igil393394 ITb-291 membrane associated protein ('Trypanosoma brucel subgroup) 51 32 1698 I s 15 5 j 5174 -;4497 jignljPIDjeS8l5l IF3 (Bacillus subtill so5 38 1 678 16 i25-2--U-- 19 5 2591 4159 Iei11SS2733 jslmilar to voltage-gated chloride channel protein (Escherichia colil so 50 30 1569 25 -4 2701 j -1997 Igi88 7849 IORFI..219 (Eseherichia coli 50s 27 7051 1 35 1 1 2?11 j 417 IgnlIPIDje236697 Iunknown (Saccharomyces cerevisiael I 50 33 I 207 1 I 9 I4 3416 1 5152 jgnlP1D~dl00974 junknown (Bacillus subtilisi so5 27 j 1737 U 7 S 4000 518) 19111592027 Icarbamoyl-phosphate syntlhase, pyrimidine-specific, large subunit I 50 1 27) 1182 1 I I I I I (Iethanococcus jannaschil(II 51 J -i9 7179 -i8303 -gij 1591847 jtype I restriction-modification enzyme, S subunit IKethanococcus i 50 28 1125 1 1 1 1 1 jannaschll) I III 52 I 8 1 8740 19534 IgI1144297 jacety1 esterase (XynC( Icaldocellum saccharolyticum( 50 34 795 IN 52 jd 116591 115770 jgij2108229 Ibasic surface protein (Lactobacillus fermentum) so 50 34 822 I I7 j6011 6336 1 9 iI2275264 160S ribosomal protein LIB (Schlzosaccharomyces pombel I so j 40 306 1 71 123 129348 128383 IgnlIPIDIdIOl328 IYqJA (Bacillus subtilis) 1 50 1 30 1 966 1 86 112 111155 j10769 jgnljP10je324964 jhypothetical protein (Bacillus subtilisl 50 24 j 3871 93 2 1205 330 1i11066016 Isimilar to Escherichia colt pyruvate. water dikinase. Swiss-Prot Accession so 24 I 876 I I I IjNumber P23538 (Pyrococcus furiosus).IIII 7i-- 96- 5 j1673 j 2959: gn II P I0 Ie322433 jgamma -glIutamylIcyste Ine synthet ase I(Brass ica juncea( 1 50 1 29 1287 1 98 1 2 1218 11171 jgiJl51l0 jleucine-. 
isoleucine-, and valine-binding protein (Pseudomonas aeruginosa) 50s 30 1 954 1 103 14 13303 1 2785 1igI1154330 jo-antigen ligase (Salmonella typhimurluml I so 1 31 j 519 F _--i6480- -U1 -4 11595 2 6480 5 giJ8577 31putAte cploen regbuato Baillu utls 50 26 50122 T I129 Il 1 155 570 gnIP12164751 Iseal muscle ryanodin reepo (IlomoUCTE aienusui s j o 5 29 597 a a* 00 a. .0 a: TABLE .2 S. pneumoniae -Putative coding regions of novel proteins similar to known proteins Contig 101W Start Istop I match j watch gene name I" sim Iident length 11)D ID IIntl I tnt) I acession I Intl 4 I 155 1 5 1 5986 j 5432 19111276880 I~psG [Streptococcus thermophilus] 50 j 28 1 5551 16 30 16323 IgiJl786983 (AEDOO~i79( o31; 92 pct identical to the 333 as hypothetical protein 5030 1068 160 7390YBHE-ECOLI SW: P52697; 26 pct identical 17 gaps) to 167 residues of the I I I I I f373 aa protein HLE-TRICU SW: P46057; SW; P52697 (Escherlchia col) 1 167 16 15232 13940 Igi1413926 jipa-2r gene product (Bacillus subtilisi I s0 27 1 12931 19 2 1801 130 10nl1Pl01e304540 lendolysin (Bacteriophage Bastilie) I so 35 1 6781 4- 4 4 171 j 5 13168 14025 IgiJ6o6080 IORF-o290; Geneplot suggests frameshift linking to o267, not found I 50 27 1 8581 I I I I (Escherichia colil I 210 111 j 8151 18414 1911330038 IHRV 2 polyprotein (Human rhinovirusl I s0o 25 1 2641 4 1 364 1 1 j 1538 1135 1911393396 lTb-292 membrane associated protein (Trypanosoma brucel subgroupi 1 50 1 31 1 14041 4- 1 10 17 j 5911 15090 1gi1144859 IORF B (Clostridium perfringens( 49 I 24 1 8221 4 4 4- 4 26 15 110754 1 9768 1 9 i1142440 [ATP-dependent nuclease (Bacillus subtilis) 49 j 31 987 66 7 j9777 j 398 1911414170 jtrkA gene product (i~ethanosarcina mazeii( 49 j 26 1380 6- -i T 77 6 5364 4648 gIelPIDIe2BS322 IRecX protein Llycobacterium smegmatisl 49 28 I 717 C V 82 113 112689 113249 IgnlIPIDIe2sSO9l hypothetical protein (Bacillus iubtilisj I 49 I 20 I 5611 I 93 I9 I4866 4531 191140067 IX gene product (Bacillus sphaericus( 49 26 1 336 4- I 112 5 
14019 14948 IgiIl574380 Ilic-l operon protein (1icB( (laemophilua influenzaej 49 27 930 I 29 j6058 I4949 1gn11P1D~e267587 jUnknown (Bacillus subtilis) 49 I 35 I 11101 1 3875 4438 7--1 139573 jP2O In 1-178)n (Bacillus heni- o---misi- 4- -1 255--564 154 1 2 11423 11953 IgnlIPlDIdlOllO2 jregulatory components of sensory transduction system (Synechocystis sp.] I 49 29 j 531 27 1631--. 4 4- 5*124 4 I- 4 173 5 3500 j2940 1911490324 ILORP X gene product (unidenrified( 49 30 561 4 4 182 111 1057 1 2 1911333002 Ifirst methionine codon in the ECLFI ORF (Saimiriine herpesvirus 21 1 49 25 f 1056 192 1 6 15352 13667 JgiJ2394472 j(AF024499) contains similarity to homeobox domains (Caenorhabditis elegans( 1 49 1 23 j 16861 4 253 14 11129 11350 19i1531116 1S194 protein (Saccharomyces cerevislael 1 49 I 23 1 222 4- 1 277 111 600 1136 1i1I396844 lOB? (18 kna) IVibrio choleram) j 49 j 32 465 4 4 4 1 327 j 3 1 1435 1 887 1911733524 Iphospharidylinositol-4.5-diphosphate 3-kinase (Dictyostelium discoideum] 1 49 24 549 4- L a. 
.nu o l e P t a i e c d n e g i n .f .o e p r o e i n a o o w r t e n .c ha ae aD a.D *I ant I ant a *csso a *r a t i *f un n w au ct o a~c e i o i 48 .6 *1 67 ;4 -T 317S3 pneumoniae2 Putativecodigreons fve rte ins similar to known protei ns8]a 4 Coti l 981 364 Star604 Stop match atch ge e nameas (fch skinrihi c i id8n length 81-- T 97-- ;602-- 51--g 11-1-37-9 4 1 11 13 I 41 32 (gi1454 (cde for a1 proei of unkow fucto (Eceihi4 48 26 1185 4 4 401 2 5 6214 17L7 gnl(PID~e29649I (ornithine dlea royase (Nasictans abu m( 48 29 1125 I- 4 167 1 4 1 3774 1 2384 (gni(177 26 8 I2-keto-3-e oxyBaiusat kutins lfrxaiati 48 j 7 3 108 13 1 2 4269 81 5 g91911049388 IIAEOOO10I gene p rhi obiu sp.orad i NCR234( l 1 48 1 27 1 39 1 139 8 2 501326 1 5415 (gi(153672 (lactose retpesoy (rpococcus m iutas) 48 29 630 W8 -4 4 14 1 560 1 4541 911i159137 1 (g (Slalntretococcs gordhaonill u jans 48 29 101552 I 1610 1 128 1 3324 jgnIj 13114 8 jprtllagenae prdut o (EaCtbclu delbrueci coi 48 33 24 1335 6 1321 11 29051 147 (gnlII1(e18~3 1 Ayl-ACP ti poteis (BSyehsica p. I 48o 27 7681 4- 34-51 2139 11 6 1074231 1114596 (i(I1304388 1Z8470. E YgeneSI ro uct (anTPOradTEis ele n( D-9) 48 30 36 139.( 8 5036 (p5665cted(coding5r(uknownH(Staphylococcuslha molyticuse 48 29 760 i- -4 4 246 2 9 48560 7546 (ginl7~e3 9 unelvnonate kiasoyes (liethncos annscl 1 48 41 107 4 3 61 3 1814 11237 IgnIPIDdIO57 Icola odue prcrs yoor E a34-.) (Escherdichacoi 48 4 7 -TT 4 189 1 1 1 390 1 28 (e1(40037 (x gene product (ilusdmoa ph eicus) ti 1 48 8 1 5041 4 e5 S 55 55 5550** Str Sto Sac *atc Sen nam Si Sdn le* *08 46S 5g 1357 P2 fA 1-78 (Bclu 0ihn s 47 23 -7--6-*I760-g--I-PI--10-0---Iunkno-n'(B--llus-u- 4- 1 35 115 114516 113263 (gLi1773351 (CapSL. 
)Staphylocnccus, aureust 47 20 12S4 51 (6 (3547 14002 (pirA37024(A370 132K antigen precursor Mycobacterium tuberculosis I 47 38 456 51--F* 4 1 55 1 8 110154 1 9273 1(gl139848 (13 (Bacillus subtilisi 1 47 1 26 1 882 -4 92 (4 I1753 3276 IgnlIPIyJe2aO6ll (Pcrc (Streptococcus pneumoniae) 47 I 35 1524 -4-4 127 9 559 5386 1gI765 A20001341 f120;aThis 120 aa orf is 76 pct identical (0 gaps) to,42 4732 204 5589 residues of an approx. 48 aa protein Y127-.HAEIN SW± P43949 [Esclterichia 2 I(232I 179 (g((P(~e26555 unkown(yoatrumtbruoi! 7I 2 2 4 4 140 4 (4951 1 3542 (gnlIPItlIdlOo964 (homologue of hypothetical protein in a rapamycin synthesis gene cluster of j 47 24 1410 I I I I Streptoisyces hygroscopicus (Bacillus subtilis) I 1 5L1 4 1 6814 16200 19111522674 jannascull predicted coding region HJECL.41 (Hethanococcus jaitnascliil 1 47 27 1 615 1 157 j 3 1.803 11174 jynl(P10jd101320 (YqgZ (Bacillus subtills) 1 47 I 25 1 372 t 178 5 1 3267 12155 9g112367190 (AE000390) o334; sequence change joins ORE's ygjfi L ygjS from earlier 4730 I 1 1 version )GJR..ECOL( SW; P42599 and 'GJSECOLI SW; P42600) (Escherlchla I I I I .col! 
-273 111 j- 2- 1-549 -Ign1IP1Ie254973 jautolysin sensor kinase (Bacillus subtllis) j- 47-- 32 1548 1 1 100 12 1.880 1 644 (gII835755 (zinc finger protein Png-l tmus musculus) 47 j 22 I 237 1 1 54 114 114182 112638 1plr154360915436 Irofh protein Streptococcus pyogenes j 46 1 24 15I45 1 88 1 1 2 1 1018 1on11P101e223891 jxyloae repressor (Anserocellum thermophilumi 46 j 27 1017 96 (7 (4553 15860 g lPDd 162IRFI: 0 5 similar to ISwlssProt Accession Number P45272) (Escherichia 46 1 23 1 13081 I I I I jgIPO~llS (ORE..ID375 112 j 1 1 1127 3 I 12625 (F03 )puaieoligosaccharide repeat unit transporter (Streptococcus 46 24 1125 122 111 1 7308 1 7982 (giIIl054776 (hr44 gene product (Homo sapiens) 46 34 6751 1 127 114 1 9198 1 8125 Igi(1469286 (afuA gene product (Actinobacillus pleuropneumonlael 1 46 j 28 1 10741 4- 1 132 1 4 1 7093 1 6197 (gI(153794 (rgg (Streptococcus gordonilli j 46 1 26 1 897 4 140 (8 1 8220 1 7723 1g111235795 (pullulanase (7hermoanaerobacterium thermosulfurigenesl 46 21 498 4 I9 20 81 11407878 jleuclne rich protein (Streptococcus equislmilis) 46 I 27 89 1~ 1* 112 011430 09F7 Mehd cocpta trnlto suple by auhr5 hgla l 4 5 12 i 199 1 1S1 pnemoia -91977 Pative9 co dinireion ofinel rotCens sbimi t kgnwn prqteiSOs 162 1 2 1 76 11258 Ig1910612 joafi; geth poduoctu~al thranslaio ulie byato1~iel o 46 28 84 12 -2 4 F 1 68- 3- 8 1-9 44 I 23 jf 2 7 -36 1 168 jgiil7ldll2 jycfnl g enepoducte Ixyanphr paaoa 46 28 489 I 1 292 I 3067 2 20 1 gi1673744 dieOOl No~y coplas DN pneumonia .3cydedfo deamna e simiar tocenaan 46 29 46 I Ig a Acc so CNumbe 9or S ce rb C.312 froman H.N pirum l;ycodeadm pnfonor 1 0 82 16 543 647 11 788 gi 109 I1AE0002701 o235;s Th i 235 88 o ~rteis 29 p ctitcanl (1 aa o18j 45 24 0 I 6 346 38 8 g j7 3 9ju ko wen l e ter y num of t ejo F p r n p ta i e J h z b u l l t 45 j 29 J 14081 6027 11 1 30746 260 bsjgl6 9 codedfror byti lgana roteA yk~emlh4.3; coedfornze by p C, elegan cOMA 45 36 306 1 1 1 I I by C.6 epiega c50A ceZglO c1aeodedl inle 72 1 137 1161 
148741 IgiJi32l900 INADHp dehyrogease lubiqulnone)s irtemi frnicaa1 45 1 25 154 1 0 J 15 j79 1 36 gi1l52192 muetaitionaues d scnoluc asn-iu phenotype: EoQ ias] ata emrn 45 28 11 3127 1 6 046 6 1 b2Ii11 58 Ilit8ilr en tiliz ao protinlle mhila nfunatp b. L4 Tl j 45 24 441 1--4--4l--4*g-ll-l--e-08082imembrane-ta-nspt--- 209 1 77 1 3641 7 jgn1D041I00 JItito enoulas easuuifacillus coguanl 44 28 41 32--F-T -4 34 1 1564 125 jgiJ140457 1M Ila ta alerge previce brasingei n MG6 Icpam eiai 43 31 670 4 4- 6 7 8 69 7030 6452 pgi153720 j0F~27 E cherichia coli 43 29 6579 -4 4 136 j2 115416 11385 IgIl0390 IN.migenitilu predic ted coding argorM06 Iycoplama gnil lm 43 26 20791 4 T A L*.S.num n a Pu at v co in region s *.no e p o eirl- i *e o now r te n .o i .R St r Sto ma c ma c *en *am Si Sn I-lengt 387 1 47S pnemoia -i2 5 Ptative69 co dinireion ofinel rotceinosii t kgnwn proteins1 4 4 1 3801 1 82 1 10 jgi12315652 f1F01669 NCPdayg o l ef nton lie founbd paenorhabitis elgas 41 30 53 4- 185 4 42251 j1317 19112187239 jRAEO27-2prteinf ITrhzobima sp.i 1G24 41 j 25 1 0292j j2_ 14-34--- 1-112-7 -ut 3 0 1 4482 706 g i 1P0 2 18 8 emb di lgyer o syTP d pe nt htrase or(Ar abi ois thali man o pr isand 1 4 20 13 1-3--4 Y 4- 363 6 425 194 g1 6311 5722 pvrtesin ilOR7 Ityao oma lo cu I aoissroaascae epslk 41 21 2292 4- F- 4- 1 1 3 44899 81 911202 memberj1O90 1y of tiAPPdpnettasotfmiy eysmlrt proteins andeho i 40. 48 j91 2 1 682 1 15 1381 1 64 IgilPD~dO191 heolystin xp protein t scherichia coli9t 1 1 83 4 4 4 4 4 4 36 26 9 t05 174 iggi1l635 Ill-erpeviussaiir nucleas haclogs bilpsls sacm-sscae hepe-lk 0 201 314 -4 4 4- 4 a a a a..
a a a a a. a a a e. *a a. asa TABLE 3 S. pneumoniae -Putative coding regionis of novel proteins 4 -t Contig JORF IStart stop I ID 4 -4tt) (t 1 4 3428 3009 1 6 j4611 4964 3 2 818 I994 3 j3 [1182 I1574 i- 3 125 j25046 125396 I 6 -i 2 -15-19- I 1689 6 285 1161 6 1325-184 i--6 18 115977 11I5390 -i 4 I 10 11-- 6061 I 990 i 6-4 82 6 3154 48 1- 17 602 1506 ;5-57--38--4 10 14 64509 61598 not irnUrlar to known proteins 00. to a* be: *e t: 0 TABLE 3 S. pneuzsonlae -Putative coding regions of novel proteins not similar to known proteins Contig IORF I Start I StopI ID IID I Intl Int) I 21 3 I3359 1 2589 4- I 21 5 4802 I 4482 I *1-22 21 11I7099 11I7362 I 1 22 125 .119467 119982 I 22- 135 j;26388 1i26218 I; I 22 36 16382 j27572 2- I 23 17 1 6655 16032 1 -23 8- ;BT 7132 I 66531 I 24 1 36 I518 I 25 I5 I3009 2641 27 4 4819 j 4223 I 27 5: 4789 1 4956 Uj 28 5 30117 1797 I 28 I8 I4212 1 3850 1 28 110 1 5028 1 597 28 111 1 5746 1 5072 I 29 I7 I5596 1 4919 29 8 j5039 I5518 I 29 9 I5595 1 8207 9 I6511 I6263 I 31 I6 j2664 2344 I 32 5. 5203 i 5538 I 33 I8 5327 i 4668 1 34 110 1 8024 1-7740 1 4 1 34 12 j9360 I8641 I 34 113 1 9667 1 9377 1 TABLE 3 S. pneumoniae -Putative coding regions of novel proteins not Atdlar to known proteins IContig I0RF Start StopI I ID lID I Int) I i'nt) 4 I 34 118 113104 ;11902 I I 35 11i 9688 1 8588 I 5 112 111073 1 9670 I 36 2 334 1 1041 1 36 112 111120 110893 I i 36 i13- 110993 111-388 -I 36 127 149 I 38 -7 -i 4269 -i4577 i I 38 18 I4480 15001 38 i10-1 5-5-1 i57-11 1 I 40 I3 I1728 I3143 43-- 43 I- I 7- 8884 I 87327 43 8 9568 I 907 I 44 4 4831 I68311 1 45 I 3 13204 13665 I 1 46 I4 1 3875 I3468 46 F 7 I 6074 -;7081 -i 48 IS I3196 1 3582 I I 48 8 I4579 j4229 I 48 111 1 9323 1 8922 1 1 48 116 113042 112494 1 1 48 I20 116342 115764 4 I 48 124 117971 118351 1 1 48 130 121979 121776 I 4 4 1 49 111 209 1 3 1 0 TALE3 S. 
pneumoniae -Putative coding regions of novel proteins not bl~1ar to known proteins IContig IORF start Istop I I ID JID I (nt) I Intl I 4 I 50 I4 I3307 I2672 I 51 5 3239 j3598 1 52 111 112146 112883 1 I 54 1 7 5588 5 187 54 8 16013 ;5459 I 54 1 9 16004 1 6210 4 1 54 116 117685 117506 I 1 55 1 9 110515 110123 1 4 1 55 112 111947 112141 1 56-1 3-1 -935 ;1387- I 56 14 1496 I1939 I 57 4 2100 I2501 IT-T0 I 58 6 7541 7335 I 59 1I 1 2 1 430 1 59 -4 I 2416 I 2736 I 9-----T4-4-59 I 59 I9 I 5459 I5929 I I 61 I3 I2395 i 1772 I 61- 6-3- 76 i I 64 I1 I2722 2 I I 66 I2 I1180 I3147 I 66 I8 I9082 I9495 I 67) I-3 I -13-43- I 1182 I 69 i 2 -I 1165 -I 980 -i TABLE 3 S. pneumoniae -Putative coding regions of novel proteins not b1'seilar to known proteins Contig loRF I Start I Stop I 10 lID I (nt) I Intl I 5 4059 I3922 70. 6 4215 I4057 1 I 70 9 j5268 5504 71 15 120351 121901 1 71 116 121859 122338 1 71 19 126204 121556 72 I9 I8458 I8081 7 3 I4 I3815 I4216 73 j6 14214 1 4582 1 73 10 I7183 ;6428 i 73 115 I9462 1 9668 76 1 524 j195 jt 4 76 j2 867 535 I 76 Iii 8602 9210 I I 80 6 7924 j8109 I 81 1 244 2 I 81 10 j6631 8931 I 83 I4 I1872 1150 1 83 117 116810 116460 1 84 3 I4464 2929 i 86 1 2 j2147 ;1092 1 86 ;4 3606 I2875 I 86 119 116767 117114 1 i 1--01 S- -87 i7 1 6459 I 6001-- 87 9 j7224 I7006 TABLE 3 S. pneunioniae -Putative coding regions of novel proteins not slisllar to known proteins Contig 101W Start IStop ID ID nt) nt) 8 7 118 117930 117670 1 1 87 119 118275 117928 1 I 88 I2 1619 I1840 88 4 2711 2878 88 1 9 1 6252 I60i6 89 1 2634- 1-1621 89 19 7371 I6868 12 899 2395 90 3 1143 I 91 1 4 3170- ;-3691 I 91 6 4253 I45-73 I 9 1139112 -4 I9 16 12648 12379 93 I8 4533 3712 96 1 3 182 96 j3 j1407 1147 I 97 9 I7043 1 6753 1 99 115 118522 118692 1 1 99 117 119717 119541 1 I 100 j2 4094 I1980 103 I1. 48 I299 I 103 I 6 924 I4373 I 104 5 6142 6735 a* a.
108 1 2 268 I 111 I3 3417 j 788 L 111i 4 1 3609 I4606 1 1 115 110 110854 110438 1 116 1 3 2873 2121 118 2 2274 1357 I 122 I4 I2698 33 4- 1 122 110 1 5858 16199 1 I 122 112 1 6301 1 7416 1 I 124 2 346 690 I 128 I4 I254 36 I 129 1 689 102 129 2 1011 724 129 8 6454 6056 129 9 6540 6277 1 129 112 1 7809 17621 1 131 I3 I1433 756 1 131 110 1 5972 15673 1 4 1 134 ill 111838 111209 1 I 15 I2 j625 j1140 41 4- 4 36 I4 2913 3830 F 137-1 2-1-325 ;134 4 I 139 i3 114840 1- 14532 I 1 139 114 115363 114875 1 0 0 0 0* 0 0, .0 0090* 0 .0 0. 0 TABLE 3 S. pneumoniae -Putative coding regions of novel proteins not simsilar to known proteins 4 IContig IORF Start IStop I I ID lID I int) Itnt) I 1 140 -120 119822 120838 1 14'2 1 j 1 1 285 146 3 1760 479 I 146 I4 1 1149 1 7 4- I 146 1 I 3604 I2885 146 113 1 8223 I9401 1 146 114 1 9399 110676 1 4j-- 4 4j---4 1 146 15s 110052 I9750 I 47 i-7 I 7488 i 7276 1 147 9 1 8913 8647 1 148 j7 I5298 4765 149 1112 119361 I 149 3 2557 2880 I 149 I9 I6258 6070- 4 150 2 1355 57 150 3 I2556 1909 I 153 I3 I2061 2 642 I 154 I3 I1953 i 1741 I 155 1 2 2181 I1411 I 156 8 4550 I4311 1 157 Ill 37 1.294 1 4- 1 2 1 631 780 I 159 I4 1 1384 1722 I 159 I7 j3271 4017 I 161 2 1332 I1018 165 3 5535 I4945 I 4 166 6 15406 4972 TABLE 3 S. pneuzsoniae -Putative coding regions ot novel proteins not similar to known proteins Contig ORF I Start Stop ID ID I ntl (nt) 167 j9 j6075 6395 169 I5 I2828 j3205 170 j7 6485 6243 170 8 1 6964 6362 170 1 9 7303 6962 1 170 111 1 8790 7906 171 I9 17150 7476 7 172 5 2298 1948 13 I4 2913 2677 175 2 659 835 115 I3 893 1789 I 176 12 I1487 I546 4 176 3 2200 1466 4 I 177 9 4686 14925 177 110 4923 ;5177 I 177 I 1 5111 1 5347 I 1-177 113- 7396-1 87031 I 181 I5 11853 j2473 182 2 12112 11102 18 2 I2126 j2320 185 5 4683 4219 -185 j6 I4846 4634 18 4 2940 j3557 188 I5. I4183 4821 I. Coti 5*R Str Sto I 1 88 6 588 6493* S. 
5 5 14 28 5* A 5 I 189 6 5882 15643 1 68 148 44 192 1 3 I2861 I2268 I 192 I4 I3081 2878 192 I7 j6800 5331 4 1 193 I3 j997 839 I1194 F4 I 2315 1 2127-- I 195 I5 I6249 I4543 I I 195 6 6620 I6231 I 196 2 1553 I1849 197 1 1 861 I 198 I9 I6844 16644 200 15 I5329 5769 200 j6 5993 I6595 I 204 I5 I3914 3276 205 2 447 j1709 209 I4 j2038 2460 1 209 5 2458 2682 210 10 I7370 18230 210 -3 i 9029 j10441 I ;210 14 110439 110705 -I 214 15 1 2581 12330 214 9 1 5065 J5277 214 111 1 5996 1 574 00 0 0. TABLE 3 S. pneumonlae -Putative coding regions of novel proteins not bilffdlar to knovn protplns Contig JORF Start Stop 217 1I2 1 541 1194 1 I 18 2 914 11432 218 3 j1430 I1972 I 218 6 3639 j3821 I 219 I1 458 39 F 220 i1 -i 869 i600 1 223 4 2617 I1964 I 227 1 1 j 510 ;4 235 1 52 I312 2- 4 -4 4 0 238 1 1 1660 1 64 1 246 I1 1 I270 I 248 1 1 3 I362 248 2 443 I1222 254 3 I2789 I792 I 258 I2 1179 I1616 I 260 3 1770 I2123 4 I 263 1 1 653 1 177 I 263 j4 1 2244 1900 263 5 3569 1 2973 I 266 1 1 342 266 2 177 1 1022 I 270 I2 1124 I1681 272 I1 j857 I186 275 12 1684 2295 0 **0 a a.
TABLE 3 S. pneumoniae - Putative coding regions of novel proteins not similar to known proteins
Contig ID, ORF ID, Start (nt), Stop (nt)
278 1 2 j406 I 282 1 714 J391 1 282 4 j1463 11134 1 4 287 2 1119 j826 I 288 1 j 40 4 I 289 1 684 I 4 I 291 1589 I1858 293 2 2539 I2925 1 294 1 21 608 ,i4---40 4 i 296 13 670 I843 302 261 530- -309 3 559 *1 350 I 30 2 249 I1889- 316% 2 12087 1818 317 2 1 1048 1 584 318 2 .1313 1 I 319 3- 477 i 133 -i i 327 12 j912 607 4 4 I 331 1 1 1 549 i 333 J1 2 535 333 2 465 I 82 4- 4 I 345 1 2 I895 101 4 346 12 1750 I199 4- 349 1 1 1 198 1
350 I2 81 I413 I 3 I 1 44 973 I 38 j2 j636 I448 360 2 946 ;-628-- 364 2 1639 j1265 I 378 1 j345 I1004 I 379 I2 683 I510 I381 1 109 693 3 85 I1 I 365 2 269 I

GENERAL INFORMATION:
(i) APPLICANT:
    Charles Kunsch
    Gil H. Choi
    Patrick S. Dillon
    Craig A. Rosen
    Steven C. Barash
    Michael R. Fannon
    Brian A. Dougherty
(ii) TITLE OF INVENTION: Streptococcus pneumoniae Polynucleotides and Sequences
(iii) NUMBER OF SEQUENCES: 391
(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE: Human Genome Sciences, Inc.
    (B) STREET: 9410 Key West Avenue
    (C) CITY: Rockville
    (D) STATE: Maryland
    (E) COUNTRY: USA
    (F) ZIP: 20850
(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage
    (B) COMPUTER: HP Vectra 486/33
    (C) OPERATING SYSTEM: MSDOS version 6.2
    (D) SOFTWARE: ASCII Text
(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: Brookes, A. Anders
    (B) REGISTRATION NUMBER: 36,373
    (C) REFERENCE/DOCKET NUMBER: PB340P1
(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: (301) 309-8504
    (B) TELEFAX: (301) 309-8512

INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 5625 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCAAGCAAAA CCAGCTACAG TCAAGGTGTT GGTATCAAAG AAAAAGTTGA AGACGCTATG AGATGTGTAC TATTCTAGTT GCTATATCAA AACCAGTCCT GAACGACATG CGTTAAAAGT GAGAGGGGCT AGAGATTATC CTAAAGGAAC TTACGTAACA TTGACGTAAA CTCACTTTAA TCTCAACTTT TTTTGATGTA TCAATCTACT ATAGTAGCTC TGAAAAACGT GGACTGGTTT TAGTTGAACC GCCGTATGCC CCCTACTCGA TTTCGAAATC AACTTGACTA TCACAACTAC TCAGTAGTTA AAGTAATGTA CGACGGGCAT GTTGTATAGT AGAAGTCGGT ACTTAAACGT CGTGTTTGGA TTATTACCTT GAACGGACGT ACGGTGGTGT TAGTGGAATG AATCTGGAAT
AGTCCATCGA GCTTTCTAAT ACTCTTCGAA AATCTCTTCA AACCACGTCA ACGTCGCCTT GCCGTGCGTA TGGTTACTGA CTrrCGTCAGT TCTATCCACA ACCTCAAAAC AGTGTTTTGA GCTGACTACG TCAGTTCCAT CTACAACCTC AAAACAGTGT TTTGAGCAAC CTGCGGCTAG TTTCCTAGTT TGCTCTTTGG TTTTCATTGA GTATAACACA TTGTTAGAAG TTGG3TTTAAA TTTCCTAATC AGTTTGT'rCA CATTTACCTT CGATATATTA TATCCCATAG TTAAGGTTGG TCATACAGAT GATTATAGTC ATGGAGCCGT AAAACTTAGT GTTTCTTTAG TTGACAAAGA
0 TGCCATGAAA AAAATATTTG TATCACCGAT ATTCTATACG TGTTrCAATA GTTTCGGCAA TTTAGAGTTA CTAGATAAAT TTGAGGGTAA GGAAAAGTAA TGGGAATGAG TGGATGGATT !TATTATTGGA CAGTTAGTCT TGCTAATAAT GAGGAGGTTA ATTGCTAAAA CATTTATAGA TTCAATCCGC TATATATTAT ATATCTGATT ACAAACAGAA TAACTGTAAT AGGATATTTT TAAATGGTAC TGCTATTCTT TTGATAGCAG TGAAGCAATG CTCAACCTTT TGAAGAAGAA AAGCAGTAAG AAAAATGTCT GAATAAAATT TGATTAAGAG TGAAGTAGTC TAAGAATTAG GTTTATGTAT AGTAGACTGA AATTAATTTT ACTTTCCCAA GGTATCGAAT CTTCATCAGA TATGAAAGCT TTTTATATCA GAAATAAATA TAGATGAAAA TATCTTTATT TACGTTCAAT TTGCTACCTA TCATTAATGT TAATTrATTA GCTCACTAAA TGCATTATAC AGCAACCTTT TGGATGATTT ATCTGTAGAT GTTATAATCA GTAGAAGCCT ATCTAAAATA GTACGAAACA TCGATTrGTT CTCATCTTAT ATGATAAAAT TAATCAATTG CTATTGAAAA ATTTATACGA 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 151 GATGATGAAA GCCTTAAGTG 'rTATT'rTATA AAGTATr'r CAAGTCGTTC CAACGTAACA AGTCTAGATC AGATTGAAGC AAATTTATTG GATTTTATAA AAGAGGTGGT CGAGTTGGTr ACTTCTTTr'r CATGAATGAG ATATT'rTCTG GATAGAGGGA GAGAAGATGA GTAGGTTGGT TTTCATGAGA GAAATCCTAA TTGGACTATA ATAGGTTGGT AGTAAGGATT TAGAATA'TT GATAGATATG GCAATGATAG GGAACCGGGC TTGAAGGGGG AGCGGTCAAG GATTTGACCT AAATGGTGCT GGGAAGTCGA TGGTTr'rTGT CGGATTAACG TATTGGCGTA GTCTTTGGAC CTACACTGTC TTAAAAGAGA CTrTTTTAAT GAAGCTGG ACTGCGACAA CGGATGCGGG *TTTTTTAGAT GAGCCGACCA AATTACTCAG ATCAATCAAG
TGATAAAACG
TGAGA?1'ATT
TAGGTAGTCG
GTAAAAGAGC
GTATCCGATT
AATTTAAACT
TTTCACAATC
ATAAAGCCTT
GTAGTTAAAA
AAGTGGAACA
CTTTGCGCTC
TTGAGGTTCC
CAACCATTAA
GCAAGATTCC
ATACAAAGAA AATATTCAAG TGAGCTAAAA TGTGAGGAAA ATAGTTTCCT ACATGTACGA A'rGCGTGAGT TGATAATTCT CAGGGTATGG AGGTATTGTT TAGAGACAA'r CATITCTGAGC TTATGATCAA AGTTAATACC GCCCTCTGGT ATTAAACAGA ATTTTTGATT AAAAGTATTA CATAGGCAAA CGCTTGCAT'r TCG~rTTTTA CTGTAGTAAT AAAATGTAGA AGGTGTAGAA ACACAATGT'r GCTATTCCTT ACGATAGGGA TCTTCAGAAA AATT-rTGTGA AGACTGTTA-A CTTTATTCAT CCTGAAAAGC AGACCTTTGA AAAAGGGCAG AT'NTTAGGAT TTATCGGGC *9 S S 5.
AATGCTGACA
CCAGGACAAT
GGAATTTTGA AACCAACATC CGGCAAGATT ATGTCAAAGA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700- 2760 2820 2880 2940 3000 3060 3120 3180 AACGCACCCA GCTATGGTGG GATT'rGGCTC TGCAAGAGAC TTT'ATGATGT GCCAGACTCG CTCTrTCATA AGCGTATGGA ATTTGAAGGA CTTTATCAAG GATCCCGTGC GGACTCTTTC CGGATATTGC GGCCTCCTTG CTCCACAATC CCAAGGTTCT
TCATATTGAG
TGGAACGGTG
GCTACCACGT
TAGACAAGGA
TATCAAGCAA
TGAGGA'rATT
GTI'ATAAACC
TTCTCTATCG
CAACTTTGTG
AGCCAACTCA
CAAAGTCATC
AACAGCCTCA
ACCCTGTC'rG
TTGGTTTGGA
AGGAAGAAAC
ATCGGATTTT
AGGAGACCTT
TCGTCTCTCA
ACATTGAATT
ATTTTGAAAT
CGTTTCGGTT AAGGATAATA IrCGTCGGGC TACCATTCTT 'rTGACCACTC ACGATTTGAG CATGATTGAC AAGGGGCAAG AGAT'rr TGA TGGTAAGATG AAGACTCTCT CTTTTGAACT CTATGACGGT CTGTCTGATA TGACCATTGA TGATAGTTCT CGCTACCAGT CAGCTGACAT CCGCGATTTG AAGATGGTGG ATACGGATAT ATCCGTCGCT TC'rACCGAA.A GGAGCTCTAG CTTTATCAAT GCAGGGGTTC AGGAGI'CAT GATTGGCGAT GTCATGGGGG CTTITTGTGGC GATGATCAAA TTGTGGAGAC TACTTACCGA GTCAACTTTA CTTTTATCTC TGGAAGGCTG o o** TCT'rTGATTC TTCGCAAGAG ACATCATCAT GAGTrTTTGTG GGGAGGAGGT CAAGGATGGC CCTCCTATCT TTTCACCGAG CATNTTTAAG TGTCATT-GTC TAGGA'rTAAC TGTCATTT-AT TTAATATTTG CTTTGGAT TTAAGACT'rC CATAGTGGCT AGGTTGT'N'C AGATA'N'CTC TGATCATTGT TGGAAAATAC TCTGGCTCTT AGTGATGGTG TCACCA'rTCA AGGAGGTTAG CAATACATCA AACAAATCAT TTTCTGACTC AAGGCTTGAA CTAGAAGGCT GGACCTTTCA GGAATGGACC ATCTCTTTTT GGGGAGTTTG ACAAGTATCT ACC'rTCAGA T'rGATGCCTT GTGACCAGCA TTGmTGGAC GCGACCT'rGA TTTATACTTC CAGTCAGGCG CCATGATTTA TCTATTTACA AT'rcCCrCT GCCTACTATC CAGCTAGCTA TTGATGTTdA TTTCTCTGGT GAT'rCCTACG AAAGTGCGGG 'rCCATTATCA TGCGTTTGTT CTTGGT'rCCA AGTGGTTGAT TTGATGAAAA TCATATCGCG CTTTTTAGCT TAACGCTCGC TCAGCCTr'rG TGI'TAAAAA
TTTATGTCGG
TCCTTTTTGC
GATGCCAGTC
GGATTGTCTC
TATGAAAAAA
GGAATATAAG
TCTCTTGTTT
GGACTTTGAT
CTTTTCATC
AGATTCT'rCA AGT'rAATTTG
TATCAACGAA
GTAGATTTTG
CTCAA'rGTCA GCGACCAGTG CATTTTGCGG rIrTTATCACC GTTCCCCTTC TCAAGGTATT GTAGAGGTGC CTATCTGATT AACTTrCT TCTTTGGGCT TCCAACCTAC TCCCTTGGCA TTTT'rTCCAA CTTGA'N'TAT ACTCCAGTTA GGCACTCCTT TTGCAGTTCT GAAACGGGTC CAGTCCI'TTA TIGCATCTGAT TTTATCAGA TGGTTGGTGT CTTGGGAGTC TCTT1TCAACA TATTCCATTC 152 TCTTTGATTC AGGGCTTCAG TATCGCGGAT ATCACCCTCT ACCAATCTTC TGACTAGATC CGATTCCTCC TTTATGNITrG AGAGATAGCT T1CATTTATG GATTTTCCTT GATTCCCAAG TGACAATCTC TGGGCACTAG GGCAACGCCT AG'TCCGAAAA GACTCGTCCC ATCAATCCTC TCTTTCACA'r CCTAGTTGAA GGGTGAACTC T'rAGTCGGTG GTATTTTATT GGGAACAACA 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 TCTTCCAAAA TTCCTGCTTr TCTTAAAATC GCAACAGCCA TCCTAGTTTG TA'rTCCTTTT GTATCGCC'rT TTGGACTAAG
CATCTI'CTAT
TCGTTGGTTG
TTTCTTACAG
TTTCTTTGTT
TTCGTAAAAG
ATGT1CAATG ACT'rTGCTAA ATTAGCTTTA TCGTGCCTTT T1'ATGATGTT AGG7"T'rGAA 'rGTAATTGAA GAAGTCAACG GCATTTGCCA GAATGGTTTG
GAAAAGGATG
ATTTCCCTTA
CTAAAGTAAG
ATGAAAATCA
GAATCCCAGA
CCTATCAGGA
CAGAGATTGA
GCTTGCTACT
TGTTCTTTAA
AACTTTGGGA
ACTAAAATCA
GTATCCGATT
CGCCTTTACA
CGTAGGAGGT
TAAGGGCTTA
AGAAAGAAAC
AAAAAAGGCA GTTGTCGCTG AALGCACACAA GCCTATATAG AAGGAACCAC GACACTGCAA GTTTGGACCG TAAGCTTATC CTATTCGAGT GAAGATTGTG TTATCAAGGT AGAAAAATTG GGAGCCAATT GAGTGATT'rG T'rGTC'rCGGC
TTAGAGAGTG
ACTAGATT'rG
GTAAAAAAC
AAGCTCG'rAA 153 AAAAGTTGGT TATCTGCAGG TCAAAACAGT GGCAGAAGGT TCTAATAAAG ATTATGATCG 5040 AACAAATGAC TTTTATcGAG GTCTTGGCTT1 TAAAAAGTTA GAGATTTrTc cTCAACTATG 5100 GAATCCGCAA AATCCTTGTC AGATTTTGA'r TAAAAAGCTT GAATAATATr ACTTGACATC 5160 TATTCTCAGA GTGCTATACT GTA.AGTGTAA TCGCCGATTr AGCI-rAGT'rG GTAGAGCAAG 5220 GCACTCGTAA AGCCTAGGTT ATAGGTAGAT AAACGACTGA GGATr'TGAAA AAATAGATAG 5280 GTAGAAGATA ACCGTTAAGC CTTACTCTTA GCGGTrTATT'r ATATTGTTTA ATAGCGCTAA 5340 TATTTTATCA ATTArGCCTG TTTCGTGTT TCTGGTAGTT GTTCAAGTTT ATTGC'rACTA 5400 TTTTTGATGG TATGAATGTG CTTATAATGT ATCCCGGTTA ACGAAAGTTT TGGACTTATA 5460 CTCTTCGAAA. ATCTCTTCA.A ACCACGTCAA CGTCGCCT'rG CCGTGCGTAT GGTI-ATGACT 5520 TCGTCAGT'rC TATCCACAAC CTCAAAACAG TGTTTTGAGP GACTACGTCA GTTCCATCTA 5580 CAACCTCAAA ACACTGTTT GCCCA.ATCTG CGGCTAGTTT CCTAG 5625 INFORMATION FOR SEQ ID NO: 2: SEQUENCE CHARACTERISTICS: LENGTH: 7571 base pairs TYPE: nucleic acid STRANDEDNESS: double :0 00 TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: CTCTCCAGCT TTCCTTGCGA GTTGGCCATG ?1'GTGTCTTT AAGAAGTCTA AAAATATCTC CAATAAAACG CATCGCTCTC TCCTATCTCG TTTCTCTGTG TGTAGTGTAC TTGCCACAAT 120 .GCTTACAAAA TTTAT'rTACT TCTAGTCGTG TAGGCTTGAG GTTTCCGCTG ATCTTGATTG 180 AATAGTTTCT CGAACCACAA ACCGCACAAG CTAGGCTTGC TTN"TTTTAGT GCCATAACGC 240 CTCCATCTTA TCCATTATAA CAAGAAAGCT AGGCTTTGAC AAGCATCTTA GCGAAATAGA 300 0 .0 TTGACTAT.CG AATCCCATAT TGTTTGAGCC TTTTCCTTAA TCTTCGCATC TGAGATAGCC 360 CGGCTAGCCT CATCTACTAG ACTTTGCGCA CGCCCTCGAA TATCAGACAA ATTATCATCT 420 0GTCTGGCTAT TATCAT'rGGT TTGTACTTGT CTTTTTGTAT TGGCTGGTGC AATTCCATTT 480 .o:TGCTTATAAG CATT'PTCAAC CGTAAAGGTA CTTCCTGGCG TATAAGGTAA AATGGTATTrG 540 GCAATGTTTC TAAAGACATG AGCTGCACCG TTTGAAGTAG AGCCAGCTAG ATAGTGGTTT 600 TCATCAGTGG TCGGAAAGCC AAGCCAGTGG CTAATCAC'TA CATCCGGAGT ATAACCAATT 660 ACCCACTGGT CACTTGTCTA CTCCGGATTG AAAACTGCTT CAGTTGTTCC AGTTTTCCCT 72 720 154 GCCATGACAT AGTCTGCAGG CGATGAACTA ATACCGGTAC CGTTGGTGAA AGTCCCCAAC ATCATACTGG TCATCTTGTC AGCTACAGAC TTA'rCAATCA CCCGT'rrG TGAAT'rTTTA TGACTCGCAA TAACTTGTCC 
ACTAGCA'TT TCAA'rTCTAC TAATAAAATG AGCTTCAGGC ATTAAACCTT CATTTGCAAA GGCGGCGTAT ACACCGCTTC CCAAGGCGAC ACCAAGAACA TTTTCGCCTG CCTCAAAAGC CTTGTCGACA AGATTAAGCG ATTCTGCCAA GGCT'rGATAC GCATAGTTAT CAACCT'rATA GCTGTCATAC AAAGCCCAGC TTGCTTCAAC TGCTGGCGTA GGACTACGCT TTGATT-GGGT TGCATAGT'rG GCAACTTGAC CGACAAC.rCC ACGAACTCCC GATTGAGCAA ACGTTCCATC CTCTGCCCTC TGCATATTTG CTTGGTAGTT TTGGTCCAGC ATCTCrrCCT CTGTTAGATT ATACTTGGAA GAGGGGTAAC GGTAA'rCTGA GATT'TTCCT TCAACTTCAG CAGCTTTGGT TTrCTTGG'ITTT TGCAAGACAG TATCGCGCCG ATTAGTAGAA TCCGGCCCCT TGAGCATCCC TGCCAGAGTC GAAACTCCAA AGTATTfTCTT ACTCGCATCT GCGTTGTTAA GGTACATGGT TAGAATTTGC GCAAGGAAAA ATTCTTTCGC T'N'rCTCTCA TTAGCCAGCT G'rTGGGTAAT GGTAGAGCCA AAGAAAAAAC GGCCATAGTT AATCCCGTCA ATAACAGCAT TCTGCAAGTT TTTACTGATG
TAAACAACTA
AAATTCCGGA
CCTGTTTTCG
GGAAATAGCG
TCTGTGTAAA
ACAGCTTCAT
TCATACTTAT
TTATCAATAT
TCTTCTACGG
GCAGCTTGAT
TCTACACCCC
TCCTTACTAT
ACAGTTTGA'r
CCACCTGAAC
AAGGCTTAAT TGTAGAACCA ATCCAGTTTT ATCATTGTCA GTTCGAGGGC TACAC?'rCCT ATGTGTTTC ATAAACAATC TGCGGTAGCC ATTATTGACA TAACCACCGC ATCAAAATAA CGTGCAATTG CGAAGTCATA ATCCTGCTGC AACCATATTC AATTCAAGGG ATTA'rACAGT CCAGACTCAC TTCTGATGCA ACACACCATT TCCAAAATAA ATTTTTTGCT TAATTCTAAG CCTGCGATAA ATAGGCGTTT GTCCAGCAGT GACAATAGCC GCTrTGAGCCA T'rGAAGAGG GTTGGTTTCA CGTCGACCI' TITTCCATGTTI GAGTCCGAAT CCCAAATCAT TAACAGTGGC AACAGCAGGT ATAGGAACTT CTCGACTCGT TTTGA'rCCCT TGCATGGTAT GGTTATCCAA CTGCTTAT'rC 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 TTTTTATAGA AAGAACGGTC TTCTG'rCGCA TCAGTCAGCT CAACATAGGT TCCCTTTTGA CCAGACAAGG CACCAGCCTC 7rETCTTCA CGGTCAAAAA GCATTCCA AATCATTGAC ATTGGTCGAC TTGGCTACAG TA6AGAGTCCG AGTTTTCAAG CAAACAAATA GATTCCAACT
AGCAAGCCTG
AATTTTCGAA
ATAGTAGAAT
TCAAATAATT
CACTCAAACC TAGTATAAGG ATAATCTTTG TTAGATGATA ACGACGCCAG TCGGACCTAC TTGGGCTAAT TTTTTTCGAT CACTACGAGA GCGACGTAAG CAGAGTCCTC TAGTTCACTT GTTTCTTTTT TAAAAAGAGA AAGAAATTTC TATCTAATTT CATGCGTTTA TTTTATCATC TTICATCATAG GAAGACAAGA
ATTTAGCTAT
ATTTACATTA
CCTCATCCCT
AGAAGAAGTC
CGACGAAGAT
TTATCAAGAT
TCAACCAAAC
TGTCGTTCAT
TATCCTGCCC
TCTAGTTGAT
TCGCCATGA'r
TGTAAGCAGA
AGGGCGAACC
CCCTCTCTAT
CTTTACCCAC
AAAAGAATTA
CCACAAAGCC
AGAGTACGAA
ACCAATTGTT
TAAGACATAC
TTTGAAGCTG
T'rCCTATCCA AATAGGGCT CCCGCCTC1'C TACCTCAAAT AGAAAAATCC GTCATTTTTT CACTGGAAGG AAATCGTAAA TATTCCCAAA AGACGATCCC CAACACTTGA TTATTGTAAA GAAATTGCCC TTCTTAACCA CGTCTGGACA TGGAAACCAG ATTCTCAATC GCTTATTGGA GGAAATATCA ACAGAAAAGA CGTAGAAAAA GAATAGTTGA 'rTAAAGCAAT TC'rCAAACAA CATCAGATTC GTGTGCACCT AATAGTAAAT CAAAGACAAG CCACTTACTT TAGAGAAGCT AAAAAGAATG GATGATCGTG TTGCTTTrA TCAACTCAAG CAAGTTGTGC AGTGTATGAC TACCGTCAAC GTCAAGAACT CTACGATATC TGAAGATACG CTTTCATAGC TGCGTTCACT
TTTTTGTTAC
GACAGTAXAG
GAGAATCAAG
TCCTGGAGAT
TTGGGGCAAC
CAAACCAGAG
TGTCAGTACC
TGGCTTAGTT
AATATCTGTA TGCAATTCAC CAATTACTTG AGGAACAACT AAACATATTT TGATAAATrCA
GTTTGCCAGT
CCAGACTTAG
CGGATGAAAA
TATGTTGGCC
CTCTTTGCCA
TGACTTTTGA
TGCAGGAAGT
CGCATGGThA
AAACCTGCTA
AAAA'rCCTTT 0 0 .to.
GAAAAAAGAG ATTTCTAGAG AATATTGGGC ACTTGTTTTC AGAGACAAAA TTGGACGTGA TGCAAAAAAT GGGCAATATG CTGAAACGCA GACTTCCTTG GCTCATTGCA AGCTAAAGAC TT'CGCATCAT AATCTTCCTA TCCTGGGAGA CCGGCTTATG CTTCATGCCT TCCGACTr'rC AACTTTCACT ACCCTTTCAA ATACATTTGA TCATCCATTT TTCCATATAA AAAAGCAAGA AA'IATTTAG CAATTTTTGC GAAGTATTCA ATTTCGTTGT CGTACCATGA TACAACTTTA TTAGTTTGAG TTGCGTCAAA CAATGAACCG ATTGGATCTT CTGTGTAACC GTATGATTCG TCATCAACAG TAACGT'rCTT TTCAAGAACT GTTGGAACGC GT'rGTGCAGA TCCGTCAAGT ATAGCTT'rTG CAGCACr-AGT TGAGTTAGGA 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 GCTACCAAT'r CAGTAACTGA TCCAGTTGGA 'N'ACCATTCA AT'rCTGGGAT TACAAGACCG ACGATGTTTG CAGCACCAGC GCGAGCACGG AtGGATCATTT GGTCACCAGT GTAAGCGTGG AACTTGTCTT GAAGAGCTTT AGCCATTGGA GAGATAACTG TTTCAGTACC GTCAAGAACG ACGTCGTTTC CACCAGGAGC AGTGATAACA TCACT~GCTT CTTTCTTAGC AAAGAAACCA
CGAAGGTCAC
ATAGTAGTCA
GCCAAGCAGT
TCGTGOI'TAG
ACTT T'ITTAG
CACCACGGTG
TCAATCCTTC
TTG'rAGTACA
TGTTGAATAC
CTCCACCTTT
TGGTCCG'rCA
AACAACACCA
TGAAGCACCT
AACTGTTTTA
AAGGTGTTTT
GTAGCTTCAA GAACGATTTC TACACCGTCA 156
GTAGCCCAGT
ACTTCAAATC
TATTTCAACA
ACACCTTCTA
ATACCAACTT
TGAAAAGAGT
TGAGf'rGAAT
ACCAAGGAAT
CUATTGTTC TGATCACGT TCAGCAGAAA CTTTGATGAA TTTACCGTTA CACCTTCTTT AACTTCAACA GTACCGTcr.A AACGACCTTG AGTTGTGTCG AGTGTGCAAG CATAACTCCA TCTGTAAGGT CGTTGATGCG TGTAACTTCA CGT'rTTGGAT ACGACGGAAA GCAAGACGAC CGATACGTCC GAAACCGTTA TAACTACCAT TAGTGATTTC CTCCTTATGA AAATCATGAA ATTr?1ATTG AACTTGAATC ACTACAAATC ACCTTTCAAC AAACCTATTrA TACAACTATT TGCAAGTATG GCCATTGTT'r T'rCTATGTTA GTTTCPTTTTr AAGACTGTAA CCCTTACTAT TCATAGCATA ACGATTCTAT AGGATCCATT T'rACTAATCT a.
TACGCGCCGG GAAGTAGGCT AAACAGATAA AAGATTTAAT TCGCATTCGC CAAACTrCCC CTACAATCCA GATAGCCTCG.
CTGCTTTCAT GACACCTATT CAATCATAAC CGCTGCTACC TAACACGAAT AAAGGTAATC TCTTATT-TTT CTGTAATTCT AGATAAAATA AGATACAGCT CAGACAGCAT GAAACTGTTG
ACCCCTTGTG
TAAATAAAA.A
TCCTTGGAAC
ACAATAGCTT
ACAATATCAA
TCTGTTACTA
TTCGTAAATC
CTGTCCTCCA
CAACCAAAAA TGCCAGCAC TT-CCTx-rGAC ATCACGATTC GTTGCATGAT ATTGATGTAA GTGATGAAAG CACAATCAAT GAACTCTCTG TTGAGAAAGC CTTTTGTCTC TGATGGATCT CAGCCTCTTT CAAAATCGT'r TGTCATCT'rC ATCATarCATT TTTCGGCAGC ACTTTCTACA TCAAATTTTT TCCTGTCTGT
AAGGCGATGC
TGATAACCAA
ATAATGATAC
AATCCCTGPA
ACAGTATACT
TTGAGTTCCA
TCCATTTGAT
ACACGTACAA
ATGCTGGCTG
TCATTCCAAT
GAGACATAAC CAACTA.ATAG AGCGAAAACT AGAGTTCC'rA T'rAAAAACCT TAGTGATGGA TGGGTAAAAG TGACTTACAA 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 606)A0 TCTTCGT'rTG AAATTGAGCA ATCTTACTAG AGACTGATTT GCCAATAAGA TCATTAGCTG TTTTTAGTAA ACTGCTTGGA ATCGTTAATC CAGCCAACAC TTTGTCCGTC TCATTATTAC TCACTACTTG AGCATAAGAA GGCATCGTTT CCTGTTCATT TGTATCAGTA TAGAGGGATC TAACAGAGA'r ACT'rCTATCA TCATAAAGAC GACTCAGATC CATI-rCTTGC CCATCTATAG TAATATTTGA CATGTTCATC CCAAAAGGAC TCTCCAAATA TTTAATAGCT TCTTCCCAA CTGTATCCGT GATATATAGT CAATTGAAAC AAGAGCAGGA TAAAAAAGCC TCGTAAAAGG TATTGCAACT TGGTAATACC TTTTTGAGGT GCTTT'rTGAT ATGAGCCCAT GT'TTTCTCAA TAGGATTGTA CTCAGGCGAG TAGGGAGGAA GAGGTAAAAG TTTATGCCCA AACTCTTCGC ATAAAAGTTC TAGCTTCCCC ATTCTATGGA ATCTTACATT ATCCATAATA ATAACCGATG GTGTGTTTAA TGTTGGTAAG AGAAAATTCT GAAACCAAGC TTCAAAAAAG TCGCTCGTCA TCGTCTCTTC GTAAGTCATT GGAGCGATTA ATTCACCATT TGTTAGACCT GCALACCrAAAG 157 AAATCCTCTG ATATCTTCTT CCAGATACTT TGCCTCTTAT TAATTGACCT TTAATGAGC GACCATATTC TCGATAAAAA TAAGTATCGA ATCCTGTNTC GTCAATCTAA ACAGGTGCTA GGTGCTTTAA ACTATTAAAA TTCTTAAGAA ATAAGGCTAC TTFTTTCTGGG TCTTGTTCAT AGTAGGTGTG GTTCTTTTTT CGAGTGTAGC CCATAGCTTT GAGCGTATAG TGGATGGTAG TTGGATGACA GCCAAATrCA GAAGCTAT'rT CAGTCAAATA AGCGTCTGGA TTGTCAGTAA GATAGTTTTT AAGTCTATCT CTATCAACCT TTCTTGGTrT TATTCCTTT'r ACTTGGTGGT TT~AGCTCTCC TGTTT1TCTCT TTTAGCTT'rA ACCAGCCATA AATGGTATTA CGTGAGATTr "94*99 4 9 8 9 96* 4 S. @5 9 9 9 *9 0 9 *9 9. 4 9 8.4 9 0* 49 9990 0 *0*e 9 899* 9. 
*9 9 9 4 0 GGAAAACGTG TGATGCTTCT GTTATACTAC CTGTTCGCTC TACGAAAATC TATTGAATAT GCCATAAAAA GATTATACCA TCATTTTACT ATATTTGAAG AGGCGTTTAA AC'rATCTGAC AAGACATCCT TrAAAAAGTT AGTTTATTTT ACAACTTAGA TTCATGGAAA AATCAAGACT CTTAGCACTA TGGGTTAAAC ATCGCTAAAC CACGAAAACG GCTAATAGTG GTCATATCAA CGAGAACGTC CTGCAATTAG GGTAATGGCC TGTTCAATCT AACATGATAA TATCAGCACC CGCCGCCGCA GCTTCTrCGG TCCACCTCGA CCATTTTCAC AAAAGGGGCA TAGGCACGCG ACACTACCTA CTGCCGCAAT GTGATTGTCT ?TTrAGCAGGA CGATOATTAT AGCCACCGCC AACrCTCACG GCATATT'rCT GTAGTTTTTC GAGTATCAAA TACCTI'AATG CAATCATCGC GCTGTCATCG AAGCAATCCC TGATAAATGT TGTAAAAAAT GTTAAGAGAC TTCTCACCGA GCCTATGATT TCTAAAACCA TCCCCATCCT TAAATTGATG AGGATTCTGG AAGGTCACCT ACCCTTrGAA AAACGGTTAG CCCCGCTAAA ACACCAGCTT TITGGCTTGGC CATGATGATC AAAAATGGCA TTGGTACTGT TCTCGCAAGG CTGCTTTCAA TGTATCATCT ATTTGAAAAG ATTGACATCA C INFORMATION FOR SEQ ID NO: 3: SEQUENCE CHARACTERISTICS: LENGTH: 26385 base pairs TYPE: nucleic acid STRANDEDNESS: double ACAATAAGAG AGAACTT CATTGTGTAC TATTTTTGGT ATAAAACTCG TrCTAGAGGA CATCAAGGTA GGTTAACCCC TACCACTGGA GACGTAATCA TATTTCCAGA ACATTCAATC GT'rCCAATGA CATATTATCC CAGCAGCAAG GCTTTCCACT CTTGAGCAAT TGCCTTTTGA TAGCATCTGA TAAATTAAAG CAAAAAGACG TAAATTAGGA CTAAGGCTTC TACATAAGCA TCAAGGCAAC GCGT'rCACAT AATCGCCACT AGTCAAACGA CGGCATCAAA TAGGGTAAAA CCTTGGCAAA AAGCGACACC AATCTTCGGA ATGAACATCT GGGTTAAATC AGTTGAAATG 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7571 TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 'rTTGC'rAGTG
GTTTCATCTA
GCAAGAAGCT
TAAGCACCAA
AATAAATCTT-
GTTACTTCAT
GCTGCAATAA
GAAATTGTGA
TTCTACCAAG
ACITTCTTGT
TTAAAACGAG
ATTATTCTTT
ACTCAAGGGA
ATCGTTACTC
GCTCTTTTAC
TGAAACAGTT
AATGAAAAAC
TCCATAGAGA
ATAGGAAGTC
CGTTAGAGCC
GTTGCGACGT
AAGAATTAGT
ATCATGATTC
ATTTCCAACG
TGGAACAAAA
TAGACATTCT
GCTTAAATTC
TATAAAGACT
GATTAAATAG
ACGCTGAAAT
TTTTAATAAT
GTCCTCGTTT
CTGCTAGTTT CA'TTTTTTAT ACCTCTCTTG TTGTAATTAT TT'rAGTTACA CACTCTTAAT AATCAATGTC AATAGTCTG CTTAATTATT ATCAAAATAT AAAACTAACC ATGATTCTAG TGAAAAAAAA TCTTCTTTGT CAACAAATTT TTTAAACATG CTATAATAAT CATAGCAAGA GATC'rAAGTr GTCTGTTTTT GTGATTATCA TGCGTAGATT CTATTCCCAT CTCCCCTACT ATCTGGTCAT TATTGGCCAC TT'rATGAGTT GTTCTTACTA GTTGT'rTCTG ACCCCCTTAC CTCTATATAA ACAATCTTCT CTTCTTTACA CCTCTGGTAA TCTTGATTGT TATAGCTACC GTTTCCGTTr CTCACTTTGA TGGTTAGTTG GTAACGGACT TTTACTATCA TAACCTTTGG TGAGTTTfATA CTAATTTACT TGCTAATCTA GCTCTGGTCG GCATGGATTC 'rGGTATTAGC ATCAAGCATA T'rCTACAAAA AAAAAACTTT CACAAAATCC 'rTGAAAAATC TCACAATCAT GCTATAATAA CAAGTCACTT AGTCCCTTTC TACTACAGAG TGCGTGGTTG CTGGAAACGC TAAACTGATA CTACTCTTGA GTTrTTTTATG AAAACATAAA ACGGTGGCCA GATCAGAGGT GTCCCTCTCT TTTGAGGTAC ATAAATGAAG GTGGAACCAC CCTITCGAGG ATGTCGCATI TTTTTATTAG GATACTAATT ATGGAGTTGC GGAGCGCAGT TGGGCAATCC GACAAGCTTA TCACGAACTG GAAGTTAAGC CAAGTGGACG GTAGAAGAAG ACC'rCTTGGC TTTATCTAAT GATATTGGAA ACTGGTGATG ACAAAGCAAG GACGCTACTA TGATGAA.ACA CCCTACACAC ACTT'rCAGAA AATATCTGGT GGCTATTAGA ACTTTCTCAA CGTTTGGATA GACGGAAATG GAAAACTTCC TCTCTGATAA AGAAAAGCAA TTGAACGTTA 'N'CAGGAAAA TCAGGCGTAT CTAAAAGTCG TGTCG'TTT TCCTGCTCCC CCTACAACTA GAAAACGTGT CTGTGTTCCA TTrCGATTGAT TTGCTGTGGA GCGGTAGCGT A'rCTGGTGTA AACAGCATCA AATCCAGTAA GATCATCTTT TGTCAACTCA AGACTCAGCT TGACTTTTGT TTTCAGAACG AACAATAGCC GACTGCTTCT TCAACAATTG CTTTCCCCGC TTGTCCATTT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 GGACTTGGAA GTAGTCTGCT GATAAAAAAT CAATGCTTAG AAACTATGAA ATAATAAAAA 159 AGGAGAACAT CATGATTAAC AT'rACTT'rCC CAGATGGCGC TGTTCGTGAA TTCGAATCTG GCGTAACAAC TTTrGAAATT GCCCAATCTA TCAGCAAT'rC CCTAGCTAAA AAAGCCTTGG CTGGTAAATT CAACGGCAAA CTCATCGACA TCGAAATTGT GACACCTGAT CACGAAGATG ACT'rGTTCGC CCAAGCAGCT CCATCGAAGA TGGTTTCTAC ACCTTCCTCG TATCGAAGAA GTGAAGAAGT GACTAAAGAC AATTGAT'rGA AGAACACTCA ATGTAGACCT C'TGCCGTGGA TTCTCCATGT AGCTGGTGCG TC'rACGGTAC ACT'rGGTTT AAGCTAAGGA ACGTGACCAC AAGAAGTGGG ACAACGTTTG TGGAACGCTA 
CATCGTAAAC CACTTGCTTC TGI-rGAGCTT TGTTCCCAAC CATGGACATG CGCACCACAT CCAAGT'rTrC TCGCTGAAAT CGGTATGATG GTGTACGTGA AATG'rCACTC AAGAATTCCA ACGTGCCCTT ACTACCGCTT CCGCCTCTCT CGTCGCTTTr
TACGATACTG
GAAATGCAAA
GAGGCACGTG
GAAGACGAAG
CC'TCACGTTC CATCAACAGG TCGTATCCAA TACTGGCGTG GAAACAGCGA CAACGCTATG GACAAGAAAG ACTGAAAAA CTACCTTCA.A CGTAAACTTG GTAAAGAGCT TGACCTCTTT CCATTCTGGT TGCCAAATGG TGCGACTATC AAAGAGTTGG TTTCTGGCTA CCAACACGTC TACAAGACTT CTGGTCACTG GGATCATTAC GGTGACGGGG AAGAATTTCT CCTTCGTCCA AAACACCATG TTCACTC'rTA CCGTGAATTG CACCGTTACG AAAAATCTGG 'rGCCCTCACT AACGACGGTC ACCTATTCGT TACTCCAGAA CTACTCGCGC TATCACTGAA CCCTTCCAAT CTrGCGTCAC TCCCAGACAT TCACTTGGGA ACAACACAGC TGGTCAAATC AAATCGTCAA AGAAAACTTC AAATCTTCAA AAATGACCC'r GCGGTP'rGAC TATCTATCCT
GATGGAAGCA
TCAGCAGCTC
G'N'GGTCCAG
TCTAACGAAG
CCATCTATTC
TACAAGTTGG
CAGGGTGAAT
ATCTTCCACC
ATGCAACGTA
ATGCGTGAAG
ATGATTTCAC
CGTCGTGAAT
TACACTCCAC
CAAGAAGACA
ATCAACTGTC
CCAATCCGTA
GGCCTTCAAC
CAAATCCAAG
1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 CAGT'rGATTA TCCATGTTTA TGAAGACTTC AACTTGACTG CTTCGTGACC CTCAAGATAC TCATAAGTAC TTTGATAACG ATGAGATGTG GGAAAATGCC CAAACCATGC TTCGTGCAGC TCTTGATGA.A ATGGGCGTGG ACTACTNTGA AGCCGAAGGT GAAGCAGCCT TCTACGGACC AAAATTGGAT ATCCAGATTA AAACTGCCCT TGGAAAAGAA GAAACCCTTT CTACI'ATCCA ACT'rGATTTC TTGTTGCCAG AACGCTTCGA CCTCAAATAC ATCGGAGCTG ATGGCGAAGA TCACCG'rCCA GTCATGATCC ACCGTGGGGT TATCTCAACT ATGGAACGCT TCACAGCTAT CTTGATTGAG AACTACAAGG GGGCCT'rCCC AACATGGCTG CCACCACACC AAGTAACCCT CATCCCAGTA TCTAACGAAA AACACGTGGA CTACGCTTGG GAAGTGGCCA AGAAACTCCG TGACCGCGGT GTCCGTGCAG
ACGTAGATGA
TTCCTTACCA
GCTACGGCCA
CTGATATCGC
TGGAGGCTTT
160 GCGCAA'rGAA AAAATGCAGT TCAAGATCCG TGCTTCACAA ACCAGCAAGA ATTAATTG~rr GGAGACAAAG AAATGGAAGA CGAAACAGTC AACGTTCCTC AAAAGAAACA CAAACTGTCT CAG7rGATAA IrTTGTCAA GCTATCCTAG CAACAAATCA CGCGTTGAGA AATAAGAGTC TAGCATAAAA GCCTCCAATC TTCTCATCTA TTTTTACTCA AGGACTAAGT TCACTGAGC AAACTGAATC 3420 3480 3540 3600 3660 3720 3780 CGCACTGTCG TTCCTTr'rCC GACCTCAGAC TCGATACGAA TCTGGTGCCC CAGTTCTTCA CCAGAGGACT GC'rGGGTCAA ACGGCCATTG
GAAATTTTCT
TATCCTGA.AA
GTATCTTTGA
TACTTGAGAC
TAGATAGATA AACGCCAAGT AGCCACCTTC AAATACTCGG TACAAAGCTC TTGGTCATCC AGGACATCAC TGTTTTTTAT ATATAAATCT CCAGACCACC
TGTTTGAGAT
ACGATTTCTT T1ATCAAGGTC GCATATTTAC GAATTATTTC TCATCATGGA AACTTTCTAA TTGAAAATTT CCTGTTCTAG AGT-rGACTGG CTGCAATGCG AAATCCGTCA GTI-rCTTTC TCTAATTTrT CTGCTAAAGC AGTTCCTGGC GATAGACCTG ACAAAGCAAC ACAAGAAGAA TGAAAGAGTA AGACAAGAAA CGCAGATAGG CTAGAAAAAA TTCTTGGTC TCGATAAATC CACATTGACA GAGAGGGTAT CAGGTCGTCA CGTGCTACGA TTCATTCT'rG GTCAAATTCA QAGGATAACA CCAGCATATT CAACAAGCCC TGAACCTTAG ATCCGCCCCC ATATTGATTG CATGATAGGT ACCTTGGAAA GGGCAAACCA ATATCCATGA GATTTGCTrCA A'rAACCACTA GCAGCCACT'r ATGTAGATTG ACAT'rTAAGC CTTTTTGAAT CT'rGACCAAG TCCTCAATTT GAACCTGCT ACGCAGGTAC TGTAAAACTA GGTTGGTATA
CTGCTGCTTC
GGTCTTTATC
TGCTT'N'GAC
TATTTCCAAA
CGTTTCCACC
AAAGTAGAGG
TGCCAAAGAA
TTGTTTCCAA
CTACCAATCC
AGTTGGCGGT
TGATGGACCC
CTCTCCTGAT
GGAGACTTGG
AATATGTCCC
AAGTAAATTC
AGCAGATAGA
TCAAGCA'rGC
CTGCYICCTCC
CGACCACTTC
ACAAGGTATA
AGAGTTCCAT
CTTCCCTCTC
AAGTGAAAAA
CTAGACTGGC
TAAAAAGACG
TTCAATCCGT
AACTTTTTAC
CTGTTCCAAA
CCCGATTCCC 3840 TTCCTTGGTG 3900 TTTATCCGTC 3960 AAAGAAAAGA 4020 TAAGACCAAA 4080 GGAGTCGATT 4140 TGCAACTAAG 4200 GTAATCCAGC 4260 CTCACGCGCT 4320 TCCATAGAGA 4380 TAATATGGTT 4440 AAATAAAAAC 4500 ACTACGGGAG 4560 ACCCTATTCC 4620 GCAAACGAGC 4680 GTTCCCGCAT 4740 TATCATCAAT GAAAAAGTCA TGTTGCCTGC ATGCTCAAAT AGACTTGCCC TTGATAATGT CCAGCAAACT CTCATCACGC CTAAAAGAAC CTGCTGGTCA CCATGACAAT ATCCATAGCC TCTTGCGGAT TTCCTGACAC GGACCAGATG AGGTTCCGAC AACACGCGTA AAATCTGGAA AAATCCATGG ATT'rGGTATT CCAAACTCAT AGGAACGACG AAAGGC~rGG TCACAAAGTC TGGTCTCTCG AAGAAAGAAA CAGTGATAAC CATTAAACAA TGAACAAATA GACTCAAAAC 4800 4860 4920 4980 5040 5100 5160 161 AAATCCCCAT TTCCATAAAG TCTTCTACCA GGACCACTTC CTGTTGACGA ATGACCTGAT CATCTTCTAT TTCCATTATT ATAACAGATT TTTCCATGCT TG'N'TTTAGC CAGTACAAAC AGr.CTA'rGCT T1CAGAGAGCA ?3'TTCCCAAT
TAAATAAAAA
ATGATTCTTA
CCTACCTCTC
CTCAGGTTrCC
AATTGGCTCA
ACCCTGAACT
GAAAGGAGCI' CTTATGGCCA TTACGACCT'r CCCTTGAATG CTN'GATAAT CTGGTCTCCA AAGAGATCCC ACCATGCTTA ACACAAGCGC TTCAAAAATT TAATAAAATC TTGTGCATGC AGATGGTC1'G AAACTGAATI' ACTAGCTAAT TTGAGGGAAA ATATTTGA CTATCTCAAA AGTTAGACAT TCTAACCTTA CACTTCCTCA ACGTCTTTTA C2'AGCAAAAA TCGCCTTCAA GCAAACTCTC CCATTTTATC
GCTTCTCCTT
TGAAATACC
TT'rGCTAAGA GATrGTCGCAT
ATAGAAATCA
GATCTAGCAC
TTATTAGATG
AACGACATCG
5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 GCAAAAGCAA T'TTGCGGCTA TGACTITATCG TGTrCAGCCTC GATACCTATC TGATTGTCTT TCGTGGGACA CCTATATGAA GGAAATTCCT CCCATCATCC TAAGCAAAAG TCTATGCTGC TAGCCAAATT1 TTGATGCACC TGGTCTCCAT ATAGAAGCAA GATATTCAT'r CTCACCAAAT CATCGTTCAG GTTGGCAGAT TGAGGACAAG AAGTAGACAC AACCTTTAAA ACTTCGACCT CTTCTTTGGC CTTCCTTAAA GGCGC'N'GAA CAGAAGAAAG AGAAACCTTG CATGGAAAAA TAGATAATAC GATGACAGTA TCATTGGCTG GCTCAAAAGC ACGCCCTTCG GTTATrCTAG CTGGGCATTC GAGCAAAGTT TGCAAkAATCA CAAGAATTGA CACAGACTGC CCACAAGGTT CCATTATCGG AGTACTGCCC TGG GTGGCAT CACTTCGTCC AACTGGATAA GAATGGGTCG CCACAGTCCC ACTATTCTTG ATGCTGGTAT GAAGGAAGAT TTCC-ACCTGA CTATTTAAAG AACTTTXTTG CAAGGGAGGA AA'rCTCGCTA GATCACAGCA GTTTATACAT GGGTTATCAA AGGATAATGG TATGATGC'rG GAAATTCCTG CGCCCAGCAC GATACCTTTA GACCAACAGT GATAGCCAGC TGACGAAGAA CTTCAGCTCT 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840
TACATTCATC
GGTCGCCTTA
TCTTGAAAAT
ATCTCTTTGT
CCCAGTTATT
TAAATGTATA
TAAATAGAAT
CGTCCCTCTC
TAGCTCTATC
CCAAGCTCAA
GATTGATACT
CAAAACAAAA
CTAATAGTCA
GC'rTCAACAA ACATACTTTC ATCTGCATTC TAAGTCTTT AAGAGCATTG AGAGATAATG GGGCTTGGAA TTATAGATTA AAAAGATGCC ACTTAGAAAA ATATAAAAAG CTAACTGAAC ATTCTCGTAT AAATAACTTG AAATGAGGGA TAATAAAAAT
AATGACTTGG
TCCCTCACTC
CGTTACCAGG
GACCTAGAAT
ATAAAAATCA
AATGACCCCA
AAAGGCAAA'r GGA'rAGATAA
TCTATTATCC
AGCAAAAAAG GAAGTAAGAC CCATTTTTrAT AAAAAAGGTA AATACTGGAT TCCACAAACT T'rCCAAAAT1G ACACTATAAA GGCTAATACA ATTCCTATAA CGAGATACAT TTCTTACTCC 60 6900 162 rrrTAATAGCT ACAT'rTTATC ATAATTATCC AAAGAAAAAA GAGGGCATTT ATCCTTCATC TGAC'TCTCTG CATCGGCCAC GACTTNPrCT AGACTGGTTT TGCCTCCATA GTCAACS'CAA TTCTCTCCAA TTTTTGATCC AAAACATCAT TCCTACAGGG CAA'n'rCGAT TCGGATTGTC ATGGAAACTG AAGAGTTGAC AAGACATTCC ACCGCCTGA'r CI'CTGTTCCG CCCGT'rCCAC GATCTTCTG ATAATGACAG AAACATCTAA AAGACTAATA '-T=AAGGT1 GCGCTACTGA AATC-AGCTCT GCCTTCTTCA CATTGACCCC GACACTAGCA GCCAGAAAAT
ATCCCTCTTA
GACCAAGTTC
GAATATGAGC
CTGTCI'rACC
CCTTGACAAT
ACTGGGACAA
CACTGGTCAC
TGGTAAATCT
CTCTTCTTTT
ACCAGTCGTC
CTTGCTTTCC TTCCCCTCGA GGGCAATGAT TATCAGCATA.TGAGTCGCAA ACT'rGGAATT TGCATCCTCT TCTCCTTTTT ACGAGGCTAC CCTGCCTCTA 6 0 .t 0 0* *t 0 *0 50 66 0 *0 0 6 6 00 06 0 *600 p .6.6 00 0*
#096 50 *6 00 6 0 'rCTAT'rATTA TACCCTTTTT TAACTCCCGA TCGCAGCCCT ATGGTGGATC CCATTGACCA TTTAGCCAGC TGCAATTCAG CAGGCCTTTG TCATCAAACT AGCGTCCAAA AAGAAAGGTT CCCGTAACGA AGCAACTGCT AGTCACCAAG CGGTCAAACT CTTATCAGGC AGATGATGGC AAGAGGTTGA TTG'rGGAGCA GCACTGGCGC AAGAGATAGC CTTrTCTGAGC CAATTCTCTC GACTTTCATA GTAAACCTCA CTGCTACATC GTAGTCTACC CCAAAATCAr ATACTGGGCC GGCCAATCGA ACCCGGATTG GGTGAATATG TCCATAAACA CCTCTTGTTT GCCAGTATGA TAATTCCCAC CGTCAAATCC CTTCAATTTC TTCTAGGGAA GTTGACTGGG GCGAGTACTG AAAAATTCCT GATGATGAGT AAATAGGGAA GTCTCAGGTC CGTCGGAAGT CCATATCTAC CGCAAGTCCT TCCGTAGCTG ACAATCAATT GCCCACCAGT GCAATATCAC AGGGAGGATG ATCAACTCTC GCCCCCAGTT CCAAACTGAC GATGAATTTG ATTTCCTCTrA AAACA'rACTG TCCAATTCCTr TACGGACACC GTAATCGGTA GTTGATCCAA ATATCTCCCA AAAGCCAGTA TCCAAGGCGG TGGTATTTCC TCCTCGTTTr TTCTCTTGCA TTTTTTTGA'r ATACTAGATA AGTTGTAATG TCAATCGTTA CCACTTTTCA 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 ATGCCAAAGA CTGTCTTCCC AGTTTCCCAA AACTCTAGCC CAAGTCCAAA ATCCTrTCTAC GCCCTGTCCC TGGCATGAGA
TTCATCCACT
ATGAATATCT
ATAAGTATAA
AAGATATTAG
TATGAAATTG
GTCTCTCCCC
AAGGACAAGG
TATCGTAAGC
CCTATCTGCC
GAAAGAAGAG
CATAAAAAGT
ACAAGAGGAA
GC.ATAGACCT
CTGCCGTAAC
AATGTGCCTA
ACCGCTNTGAT
GAGCATCTGC CAAAACAGCC CTAT'rTTCGT CATATCCATC CACAGCTAGA GAAATCTAGC ACGAATGACC CCAAACAAAG GCATAAGATT ACCAACAAGG TGAAATGATC AALACGAATGA TCTACTGACT GACCTCGGTC TGAAGTTTTT CTAGTTCATC
AAGACTATCT
AAATTGCGGC
AAAGTGAAAA
TCAAACTGGT
ATTTAGACTA
AAAATGTATT
TCGCATGCAA
TCTCATCCTA
CTCTGAGCTC
TACAAGTGAC
163 CAGATTCACG AGGAAGCTGA GGTCTTGGAA CACACTGTCT CTGACCTGTT CGTGGAAAGA CTAGATAAAC TGCTAGGTTT CCCTAAAACC TCCCCCACG GGGGAACTAT TCCTGCCAAG GGAGAACTAC TCGT'rGAAAT CAATAACCTC CCACTAGCTG ATA'rCAAGGA AGCTGGCGCC TACCGCC'rGA CTCGGGTGCA CGATAGTTTT GACATTCTCC A'rTATCTGGA CAAGCACTCA CTTCACATCG GTGACCAGCT CCAAGTCAAG CAC'rTrGATG GCTrCAGCAA TACCTTCACT A'rCCTCAGTA ACGACGAGGA TT'rACAAGTG AATATGGACA 'rTGCAAAACA ACTCTATGTC GAGAAAATCA ACTAATTTCT CAAGTCCCCT ACCAACCCTG AAAGTTTTAT TTTGGCTCTT TGTCAACTGT AGTGGGTTGA AGTCAGCTAA GCTCGACAAA GGACAAArTTT TGTCCTTTCT TTTTTGATAT TCAGAGCGAT AAAAATCCGT AAGGCATTGC GCTTGATAAG TTrTGATGAGA TAGTGTAGTT GAAGGGCGTT GACAATCT'N' GTCTGAAAAA TAGGATGAAC CTGCTTTAGA CGTTTCTTAT TCTGAAAGTG AAACAGCAAG TCTTGTGAA'r AGCTCAAAAG CTT1GTCTAAA TTTTGAAGT TTTCAAAGTT CCGAAAACCA TTATTGGTCG CTTCCAGTTT GGCATTAGAA TCTTTATCTT TGAGGAAGGT TTTAAAGACA TTGTCCTCAA TGAGTCCGAA AAATTTCTCC AGTTGATACA GCTGATAGTG GTGT'rTCAAG GTAGGACGAT AAAATCGCTT TAGCGCTTGA TAGCCTTGTA CGCACACGAC TCATAGCACG TTAGCATTCG GGAGTGAAAC AGCGAAGCTG TTTAGCCAAG CT'rGACAACG AACGGCTCTA TGCCTTCAAG AACAGTGATA TTCCCTTAGT GAAGGCATAC GCTCAAAGTG AAAGTCATTG GGGCAATATC AGTCATAGAA TACGAGGGAT TTGGTGATTT AGTGATAGCA CTTGAAACGA
ATCACTCAGT
TTrCATGGGAT
GCTAAGATGT
AGTCTGGGAG3
TCATAGTAAG
TCGTAGCGAA
ATATTAAGAT
ATCTCTTTAT TGGTTAAGTG TTACGCI'AT CCTGTTGTAT TTTCGATCCA ATTGGTTCAT TGTACAATGT GAAAGCGATC
CATACGAAAA
GAGCTTCCAG
*AANTTTGAACA
CAACACGATT
8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440
ACTGTTTCAG
GACTAAACAT
GAAAGTGATT
TATCAAAATC
TCATCCCAAG ACATAATCTT AGCTTGCGAA TGACAGTTGA AT'TTTTTCA.A TTAACTTTTG TrCfTTACCA GGGGAGTCTC CGCTTTCTAA GGAGAATTCT CCTGAGCCTA GAAATTTGAA ATCCATCGTA ATGATTTTCA TCGGATGACA GCTTGTGTTC TTGCGCAATG AAACTCATCT TGGAAGCCGA GAAAAATCAT AGTTGAAA'rG GCCAGCTGAT AGCAATyT-rT TGGTTGATGA AGCAACCATC ATTTTTGAAC AGAAGGCATA CCAGTCGT'TT CTTCAATTGG TTTCCGCACT CAAGATAAGG AATTTTAGAA GGTTrTGAA AGTCATATTT CAGGGCAAGA TGGGGCGTCG TAGTCCAGTT TGGCGATGAT TTCCTTGTGT GTATCCTTAT TGATGATGTC TAAAATCTGG ATATTAGGGT CTTTAATGTC TAGTAATTTT GTGATAAAAT GTAATTGTrC CATATGATI'C ATGGGAC?1'T TI'?ICTACAA ACAAATATTA TAGAACCGTA
TTTTTGATGA
GTCTGTTCTA
AAATTAATTT
CA'rrCAGGCT
TAAAATATAG
ATAGAATTAT
TATGACTATT AACCI'GTCT TCGAAACAAG TA'N'GTAAGA AGTTAAAAAA GArTTGAAAC AAAAAGTATA CTCTAATTGA ATTTCTCTCT TAAAAGTTTT CTATTAGTGC TAGAATAATA AATTCAAGTC CCCAAATAGA CTTCATGAAT ATCAATTITCA TGTTTGTCGC TAGAAAAATC CAAAGAAGAA ACTAGCAAGC TAAGACTGAC CTATATAATC TATAGGCTAG AGAGTAGTGT GGGAACT'rGG ATAGGAAAAA AAT'rAGCCGT TTAGGAACTT TCTAGAAAALA -ACAGGACTAG ATTGTTTGGG AATTGCTTTA CCACCAAAAA TATTTGGATA ATCGCCGGAA AATGTAAGCG CACAGGGAAA GTGGAACAGT 164 TTTCTAATGA GTTGTTGT CGCTTTTCAT TATAGGTCAT TAAAATAGGC TCCATAATAT CTATAGTGGA TTTACCCACT AAAATAGAAG GAGATAGCAG GTTTTCAAGC CTGCTATCTT GATACGAAAT CATAAGAGGT CTGAAAC'rAC TTTCAGAGTA TAGATTGAAA TAAGATGTGA ACAACTCTAT CAGGAAAGTC TTTAGCAGTC AAGGTGTACT GTTATAGATT CAATATATTA TCTCCTAAAA TTGACTTTCT TGTTTTCTTA TCTTGTCCAC ATTTGATTAT 'TI-rGAALAGT ACTTTTAATA TACTTGATAT TAAATTCCAA ATTIAGAAAAA GACT'rGAAAT ACTAAAAAAA AAACGGTAAC AAAACTAATT TAGAGAATGA AATATAGAGT TGGTGAAACG AGATGTAGAA AGGAGATTTA GCCAAAGAGT GATTAGAATr ATNTTAGA-AA AACGAAGTGA GCAGCTTATA TTCATACTAG TATCTTTTGC AAAAAATAAA GGGCGACTTC TCTATAAGGA AGGTAGCTAA TTGAACTAAC TTATTTATTC AGACCTCCTT GTGAAGATTG AGGAGATACT TAATGAAAAT TAGTAGCAGA TTGCCCAAAA CACCGCTTTG AGGTTGTAGA CAAGGTGAAG CGACTGTGGT TTGA-AGAGAT TTTCAAAGAG T'rTTATGTCC TrrCTAGTAGA AAATGCTAGA CAGAAGAATG TAGATTGAGA AAGGAGGTTA GAAGAGATGA TTATTACAAA ATGTGGGAGT AAATCCACAT TTTGCAACAT TAATAGATTT AAAATTTAAC AGAAGGTTCG ATTGCTATCG ATGGTAATCG 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240
CTTATCTAGC
TTICATTTAGT
TTACCCAAGA
TGGTTCA'N'T
AGAAGATTTA CATCAACCCA AGGTTCGTAT TAAAGTTGCG ATTCTTAAT GTAGAAAGAG ATGTCTAGCr GGAATTGCGC TAGCGGCTGT AGCTACAACT AGTACTGAAC CACCAACAGA CTTTACTCAA GGACCCCGTT TAGAAAGTAT AGATGGTCAA GCAGGGGCTT 'rCTTTGAAAC TTTGGAAAAC GAAGAAGCCA TGGCTGTTAC ATATGATG.AA GAGAPArGATA TTGAATTATA GAGAGCTGGC GAATGCCTCA TCACTT-rTCC AAATGATGAA CCTGTGAAAA AAGTTGTCTT AAGAACGATG AAAAAAATGA GAAAGTTTTT TGCCTTGGTA GCTTGI'TCAG GAAAAAAAGA ATTATCTGGT GAGATTACAA TGTGGCACTC TCAAAAATCA GCAGATGC?? TCATGCAAAA 165 GCATCCAAAA ACGAAAATCA AGATTGAAAC ATTT'rCTTGG AATGACTTCT ATACTAAATG GACTACAGT TTAGCAAATG GAAATGTGCC AGATATCAGT ACAGCTCTTC CTAACCAAGT AATGGAAATG GTCAACTCAG ATGCTTTGGT TCCGCTAAAT GATTCTATCA AGCGTATTGG ACAAGATAAA TTTAACGAAA CTCCCT'rAAA TGAAGCAA.AA ATCGGAGATG ATTACTACTC TGTTCCTCTT TATTCACATG TAATAT'rGAG GTTCCTAAAA AGCTGGAGTT TATGGCTTGT CTTGAACTTC TACGTACGTA CTTGACAAGC CAACTTGCTC CTCACCTCAA GATTCTTTGA AAAAACAGCA TTTGACTTTA TCAATTGATT GATTCGATTG CCAAGGAATT GAAACCTCAA AGTTGCTAAA GCATTCTTAG TTCAACTCCA GTAGG'rATGT AGAAAATGAA ACTCGTAAGA AAAAGGTACT GCTAT'rGGTT CCAACACATT ATTGAACAAA AGCAGCAAAA GAAGCAGAAA CACAAGTCAT GTGGGTTAGA ACAGATTTGT CTTGGGATCA ACTCTATGAA GCTTCTAAAA CTGTTCCGTT TGGAACAAAT rTGGTGGAGG AAGCCTCTTA AAGATGGTAT TAAATACTGG ACTTTAATGT CCTTCAACAA ACTCTGGCTT CCATATCGGA ATGCTTATCC TATTCCAAAA
GA=~AATCG
ACAAAAGATC
TAAAAGAACA
AATTGAAAGA
CAACACGTT'r
TTAAAGCAGA
ACATTCCAAT
AAGCACTTTA
TGCCAACTAT
AATTTAAACA
ATGAAAATGG
TGTTCCAAGA
AACAATTAAA
GGT'rGTTTGG
TAATGAAGAA
TAAGGGGATT
TGCTGAAGAA
GCCAAGTGTA
TATCATTACA
G'rTAAATTGT ATAAAGAAAT GCTACCTTGT TCTATCAAGG GGAATTAATG CCAACAGTCC ATCAAAGAGT CTGATAAAGA AAAAATTCAA AACATCCAGA GACTACGTTA AATTCCT'rGA AGCGATTCTG CAGCC'rArA.A GTAATTACTG AAGCTGT'rAA CAAGCTGGTA TGTTGACTAA AATGGAACAG ATCCTATGAA 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13'740 13800 13860 13920 13980
AAAGACTAGA
AGGGAGAAGG
TATGATTATC
TACCAATAAG
TGTGCTATCA
CTCATTAGTT
CCACTTCAAG
TATTGCCTTC
CGTAAAATTA
AAATAGGTGG GATAGTGAGC AGAATGGT'rA AAGAACGTAA
GTAGGATTAC.TCTTTGTTTA
CATTTGATTA TGCCTAATTA GATCCCAACT TCTTTAA'rGC GGTCAAGTTT TAGTAGGGTT AAATTATATA GGACATTATT TCTTGGCAGT GGATTCTAAA GGTTTAATGG AACATACACC TGATrTATTT GAGGCTGTTC AGTAGATGTA TGAAAAGCTC TAGCCCAATC TTGTAAAAGA TTTAACTCGC TGGATATTTG TTTTGCCAGC TCCGTTTTTC TCGAGTATTT TTTATAGCTr' TAAATT'rGTT GGTTTGGCTA ACTATAAAGC GTTCTTTAAT TCAATTAAGT GGACCGTN'T TGTATTGGCT 'IrAGCTCI-rC ACAGAGTACG GATTGTTCCT TGGGCATTTC CTACCATCGT CGGGGTTTAT GGCTACTTAC CTAATCTAAT TGCATTTTTG ACAGATAGTA CATGGGCATT TGGAGCACCA ATGATTATGG T'rAATGTGCT CCTATGTTTG GTGTTTATCA ACATTTGGT'r 166 TTCAGCTr'rG CAAACAGTAC CAGAAGAACA ATTrGAGGCT AAGTTGGCAG GTGITrCAAGT TTATCGTCTT TCCACATAT'r AGTTGTTG AGAACTGTAT GGATCTTTAA TAACTI'TGAC TGGTGGACCA GCCAATGCTA CAACGACGCT TCCAATTT'TT AACTAAATTG T'rGGGTCGTG GATTTGCTTT ATCTACTT'rG AACAAGAAAT CCAGTATT-TA ATCGTTGCAG TTTTCCCATT TTAACTCAGT ATCCAACACG GTTATCAACG ATTTGCACTT ACAACCCTA TTGCGATTA'r CCTAAATTGG GAGCAATCA'r TTGTTAGCAA TTCCCTATTC GGC'rTGATGA TGGTTTATCT TTTTTCCAAA CAGTTCCAAT TTTGTTACGT TTITATAAAGT ATTTATACAT T1TATCAATGC ACAGGAAAGA 'rGACAGTAC TGGGGAGATA TGATGGCAGC ATCATCCAAA ATAAGATTGC GGAAAAAAAT GAATAAAAGA GTCTTTTAAT GGGAGTCCCC CCATATCTCC TATTrTCAA GCTCT TGTr ATTAGATAAA cAGATAAACC AAACTCTCTT CTTCAGCAGT TACAGTACTG CTATCATCAG TAAGTGGCAA TTTAGATATT CTCTCACATG GGTATGGATT ATCATATCT ATTTTGGCC'r GAACAGTTTA CATTGATAAC AT'rCGAAACA TATTCTGCT ATGGCAGCCT GTCGAGACTA CTCGTCATTA AATTGCCATT GCTAAAGTTG ATCTTTTAGT GTTCCATATG TGGAATTGAA GAAGCGGCTA TGTGCTACCG ATTGTAGCAC TTGGAATGAA TTCCTGTATG AGTAGCCCTT CGTTCACTTA GTCTGTTATT GTAGTTCTTC AAGTGGATTA TCAGAAGGAT
GCTAAGATAG
AAAGTGGTTG
AT'rATCTACC
GCTTACAACC
CTC'rTrATCT
AAGGAGGGTA
TACTTTTAGT
CTGTCAAAGG
CATTAGATTA
GTTTAATCAT
ATGGTATTGT
CCTACA=r?
GGTTAACAAA
CAGTTTGGCT
GAATTGATGG
CAGGTATTGT
CCTTGATTTT
ATGGTTCAGA
CATCAATTAT
CTGTGAAGTA
ATGGTGCTTrC TAGGACTTrCT
TCATTACTGG
TGGGCTGGGG
TCT'rGGTGGC
GAAAATAATG
TGGTGCGACC
GAAAGGGGAA
TTTCACTCAT
TGCCTTGGCT
TCGATTCTr'r
CCCACCAATT
TAGTTTATTT
CTTAGTTGGA
TGCAAATAAA
ACCAACAGCT
GATTAACAAT
AATACTAGAC
TTTCTTCTCT
GACGAAAGAA
GTAGGCATTA
GGTCAACTGT
14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 GGTCTTTATT CAAAACTAGG AATTTCCGTT ACTTTGATTC ATGCGAATGA ATTAAACTAT
GGAGGTTCAT
TTGTCTGGAG
CAAGCTTT~GT
ATCAACTGAA
AGAGTCAGAC
TTGGCCTATC
AAAATAATTA CTTTTCAAT'r TTCATGAGAG ATTCTGGTGA ACGCCCAAAA GGGAATAAAT TATTTATT'rT CCAGACCAGC AAGGACAGGC AGTTGAAAAT ACACTAGTAT TTGTATCTGA CAATGTATGT TAATGGAATA GAAGTGTTCT CTGAAACAGT CAATAAGAGT ATAGATATCA AGTAGTAATG AAATTTAAAG TAATAGTAAA GCAGGCTTTA GATAGGTGTA GAAATAAGAG TTCATTATGG GGAAAACATA TTCTAAAGAT AAAACATACA TGATACATTT TTGCCAATTT CAAATATAAA TGGTATAGAT AAGGCAACAC TAGGAGCTGT TAATCGTGAA GGTAAGGAAC ATT'ACC'rCGC AAAAGGAAGT AGGAAGT'rTC AACTATTCCC CTACTCAAGC TAACTATTr'r TATCAAGTAT TGATGCACGT CCACTTCTTA TAGTGATGAT TTAATGACTA TGAGGAGCAG AAATTAGTGG AAGTGCTTCA AAACGATATTI ACTAGCTGAT CCGACTCAGG T'rTTAAAGAA 167 AT'rGATGAALA TCAGTCTATT TAACAAAGCA ATTAGTGATC T1'GTCAAATC CAT'rTCAGT'r AAT'rCCAA TCAGCAGATT AGAATACCGA CACTATATAC ATTAAGTAGr GGAAGAGT'rC TATGGTGGGA CTCATGA'1rC TAAAAGTAAG ATTAATATTG AATGGGAAAA CGTGGAGTG3A GCCAATTTTT GCTATGAALGT TTAGTTTACT GGCCACGAGA TAATAAATTA AAGAATAGTC
TTCATAGATT
GTTATCCCTG
ATAAATGGTC
CATCCATTGT
CGGGTATTGG
ATTAT'rATTT TGAAGATAAA AAATCTGGGA
AAATAATAAT
AAAACTAAAG
ATAACGATTT CCGTTATACA GTTAGAGAAA ATGGTGTCGT TATAATGAA AACCTACAAA TTATACTATA AATGATAAGT ATGAAGTTTT GGAGGGAGGA
GCAAATAAAG
AAGAATGGAG
ACAACTAATA
AAGTCTTTAA
CAGTCGAACA ATATTCGGTT AACAGGT'rCC TATGAATGTT ATATAGCAAT GACAACTAGT CTCCGTTCTT AGGAGAAAAA TAAAATCAAG T"AACAGATTG TTTCTGATGA TAGTGGTCAA CAACAGCAGA AGCACAAATG CCACTACAGG TAAGATAGCT TTTCGTATAT TGATGGA-ATC ACTCTCAATT AATTGATGGA GCCGCAAGGG AGGCCAATTA GGAAATACCA CTATGATATT AATTGCCAAA TCATCACATA
GATTTTGATA
TTCTACAAAG
CAGAATAGAG
CATAATGGAA
ATTrTGCAA
ACATGGAAGA
GTTGAACTGA
TATATGACTA
CAACAAACTT
AAAGAAGCAG
GTTGTCGGTT
GATTTGCCTT
GGTGTACTGT
GTGGCTCTTT AAGAGAAAGG CATAATGGAA ATTCGTTATT TAAAGTGACT CCTACTAATT GAGAGAGTTG GGAACAATTT AAGTTGTTGC CTTACTTATG TCCCGGACAA GGT'rTAGCAT CATATACTAG TGGAGAACTA- ACCTATCTCA AATCCTCAGC TTCAATTCCG TTTAAAAATG GAGATGGTGT GATTAGAACA TTCTTTAGAA GTAGAGATTC TGGAGAAACA TGGTCGAAAG CATATGGCAC ACAAGTATCT GCAATrAAAT TCATTTTGAG TACACCAAAT TCTAGAAGTG TAGTCAATAA AGAAGATGAT AGTATTGATT CGTATGGTTA TGCCrATTCT GCGAT'rACAG TrGAAAAATA TGATT1CGTGG TCGAGAAATG 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 AATTGCATTT AAGCAATGTA GTTCAGTATA TAGA'TTGGA AAAGGAGAAA AACATGGTTA AATACGGTGT TGTTGGAACA GGCTCGCTAC ATGCAAAAGA ATGATGGAGC AGAGATTACT TGCAGAGGCG ATTGCAGAAG AATTGGGAGC AAAAGTAGCA TTCTAGCGAT GAAGTAGATT GTGTTATCGT CGCAACTCCA AATTAATGA'r
GGGTATTTTG
CTTCTCTATG
TTAACAAAAT
GAGCTGAATT
ATCCAGATAA
AGTTCCTTAG ATGAGTTGGT AATAATCTTC ATAAGGAACC GGTTATTAAG GCTGCACAC TTATCAAGAT TGTCGCGAGA AGGACATAT'r ATGAATTTCT
ATGGTAAAAA
TGGTAGATGC
TTAATCGTGT
TGTTTrTCTGT
GTGTAAAGAA
TCATCATGCA
AGTTATCGGA
GTCAGTA'rCA
TGAATTGGAT
TCGAAATGTG
TATGGAATTT
TGAACATTAT
TAAAGGAACT
GACGTTCTAT ATTGTCATAC AGCTCGTAAT TGGAAAAAAA TTCGTGAAAA ATCAGGTGGT TCCGTTCAAT TCC'N'ATGGG GGGCA'rGCCT GCCCATGAAG GTGAACATTT CCGTGATGAA TCTAATAAGC GTTTTGCCTT GTTAGAATGG GTCTTAATCC AAGGAAGCAA AGGTGCCATC C'rTAAGCTAG ATGGGCAAGA AAGCTATTTC GAAAAACCAA T1TGCGCTTTC AACAATGTAA CCTTTATGGC AAAGAACTCA TTAATCAACG GGTTGGGAAG AACAACAAcC CACTTGTATC ACCACATCCA GAAACTGTAA CCATGACAGG GATGATATGA TTTTTGTCAA GGTTCAGCTT ATCGTTGGGG CGCTTAGACT TATTCAACTG T'rGAT'rCACG AATCGCAAGA ATGGATGGAG CAATTGCTTA AGAAGATGAT GATCGGACTC GTATCTATCA TAGTACAGAG TGGTAAACCA GGTAAACGTA CTCCAT1'ATG GCTATCATCT GTCATTGATA AAGAAATGCG CTATCTGCAT GAGATTATGG AAGGAGCTCC AGTATCAGAA GAATTTGCAA AACTTTTGAC AGGTGAAGCT GCCCTAGAAG CAATTGCTAC TGCAGATGCT TGTACCCAGT CTATGTTTGA AGATCGCAAA GTAAAATTGT CAGAAATTGT AAAATAAATT TTGGTATTCT CCTATTTATA GGTCGACTTG CTCCTCTGAA AGTACTT'rTA GAGGAGCTGT TTGACrTTTGC TAGTTTTTGA AACTGAAATC TATTATACTA CAAACTATTG AAAGCGTTTT AATTTTAAGG TATAATAATC 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 TCATAGAAAT AAAGAAAAGG GAGCAAATCA AAGATGGAAT ACAGAAGCGG GAGGGGTGAT GGTATCCGAG CAAACAGTGT ATCATTGGGA TTATCAAACG AAAGAAGTTG ATGAATTGGC CGTGAACGCT ACGATGGTTT CCTAATCAGC TTTTGATGGC PCAGGAATTG ACTr'rGTCGG GACGGTCCAG ATTrTTAATT GAAGGAAAAA TrrCATACACC AGGAAAGAGG ATGCCACAGA TTAGCAAAGA AGCCTTGATT CATCGTT'rCT TGTCAGGCTC TTCCTCATGA ACCGCTTTAT TCCCTTGCTG GTCAAAGCGG CTGCGCAAGG TCGCGATATC AAGGAAATTA AGGAAG'rCAC TGATTATCCA CCTCAGGAAC CCTTCATCAC AGAACTGGAC ATCGAGGTGA TTGCTCTGGA GGAAATTCAA GAGTI'CATTC GTCAGGTTAA TGATACTAGT ATCTTCGAAG AAGGGCTAGC AACAACCTTA TCAGGCTACA CATCCTACAG GATTAAGAAA CTCTGTGATG CTGGTGTAGA AGAACAAGCC AAACAAATCC TTGAATATGG
TGGAGCAGTC
TAAACTTCCA
GGCTACTATG
TTGTACCAAG
GGAGAAATAT
AGCTGTAGAA
TCCAAAAGTA
TGTCATTGCA
AGTGCGAGGC
CGTTGCTAGT
TGGAAAAGCT
ATCGTTGTTG GTGGCGCCAT TACTAGACCA AAAGAGATTA CAGAACGCTT CT'rAAATAAG ATC.Tr.GCCG GArTTTTATG TTTAAAGTTT TACAAAAAr~T 169 TTTATGTTAC CTATAGCTAT ACTTCCTGCA GCAGGTCTAC TT'rTGGGGAT TGGTGGTGCA cTTTCAAACC CALACCACGAT AGCAACTIAT CCAATACTAG ACAATAGTAT V=TCAATCA ATANTCCAAG TAATGAGCTC TOCAGGAGAG GTTGTATTCA GTAAmTGTC ACTACTTCTC TGTGTGGGAT TATGTAI'GG CTTAGCGAAA CGAGATAAAG GAACCGCTGC GTTAGCAGGA GTAACTGGTT ACTTAGTTAT GACTGCAACG ATCAAAGCTT TGGTAAAACT TNTATGGCA GAAGGATCTG cAATTGATAC TGGAGTTATT GGAGCATTAG TTGTCGGAAT AGTTGCCGTA TATTTGCACA ACCGATATAA CAATATTCAA T'rACCTTCCG CTTTAGGAT CTTTGGAGGT TCACGCI'CG TTCCTATTG'r TACATCGTTC TCTTCTATCT 'rGATTGGCTT TGTCr'rCT GTTATTTGGC CACC'TTTCCA ACAACTTCTT GTTTCTACAG GTGGATATAT TTCTCAGGCG 6 69 5* S S
*6SS 45 S S GGTCCAATTG GAACTTTTCT ATATGGATTT CATCATATAA Tr'rACCCTAT GT1'TTGGTAT GGACAAACAG TGGTTGGAGC TCAAAAAATA TCTGGATTAT TTACAGAAGG AACAAGGTTT GGTT'rACCGG CTGCCTGTT'r AGCGATGTAC TACGCGGGTT TGTTTTrGG AGTTGCTTTA ATTGAAT'rTA TGTTrCTATT CGTC AGTCCG GGTGTTAGCT TCrTTATTGC AGACGTCTTA GGTGTAATCG ATTTCACTr'r ATTTGGAATI' CTTCAGATTC CATTITGGACT TATTTGGAGT ATTACTCAAT TCAACGTTCT AACGCCAGGG TCTGAATCCG CAGATTCAAC TTCAAATACT ATTATCAGAG CCT'rGGGTGG ATCAAATAAT ACTGAACTTG GTGGTGTTGA AACTGTTGCA TTrTGC'rC AATTAGCCGA TTTGCAGGTC GTTTCTCAAC CATAGTGTTC CTAAAAATCG ACATCTTTTA TTACCGGTAT GTTCTATATG TTGTTCACGC AATATTTCAA TAGGAAACAC TTGCAGGGGA ACGCTAAGAC GTTrTGTATT ATATTATTT CGAGGAGAAG AAGTAGATTC TTrGGCCCAT
AATGATGTTC
TCGTAAAAAA
TACAGAACCA
ATTCCT1'GAT
ATTTCAGGA
GAATTGGGTT
TAGATGGTTC
TAAAGAAATT
TTAATGAGAC 'TTTrGGAGC AGTAGGCITrA 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 GCAGATTATT TAAAACAGGA TAGCCTrACAA ATAGAAGATG TAGATGCTrG TGTGACACGT TTACGTGTAG CTGTAAAAGA AGTTAATCAA GTTGATAAAG CACTTTTAAA ACAAATTGGT GCAGTTGATG TCTTAGAAGT GAAGGGTGGC ATTCAAGCAA TCTATGGAGC AAAAGCAATC TTATATAAAA ATAGTATTAA TGAAATTTTA GGTGTAGATG, ATTAAGTACT TACTGACTTA ATAAAAAACA GAGGAGAGTG ATGGATGAGT AGGATGAAAT GAAATCGCAT ACAAGAAATA AAGAACTCAT TATCCAAGTT GGATACGCTT ATTACATAGG AGAATACAAA TGAAATTTAG AAAATTAGCT TGTACAGTAC TTGCGGGTGC TGCGGTTCTT GGTCTTGCTG CTTGTGGCAA TTCTGGCGGA AGTAAAGATG CTGCCAAATC AGGTGGTGAC GGTGCCAAAA CAGAAATCAC 170 TTGGTGGGCA TTCCCAGTAT TTACCCAAGA AAAAACTGGT GACGGTGTTG GAAC'rTATGA 21120 AAAATCAATC ATCGAAGCGT TrGAAAAAGC CATCGACTTC AAGTCAGGTC CTGAAAAAAT AGACGTACTC TTTGATGCAC CAGGACGTAT TGAGTTGAAT GACCTCTTCA CAGATGAATTf ACAAGCAAGT AAAGCTGGAG ACAAGGCTTA CATGGCAATG AACAAGAAAA TGTTAGAAGA TTGGACAAC'r GATGATTTTG AAAAAGTATT AGGTTCATTG TTCAGTTCTG GTCAAGGGGG CCTTTATAGC GGTTCTGTAA CAGATGAAAA A'r'CGTCAAA GGTCTTGAAA AAGCAACTAG TT~CACAAT'rT GACGGTGGGG CAGATATCCA AATCCTTTGG GCACCAGCTC AAAATGGTAT AGAAGTGGTA GAAGTACCAT TCCCATCAGA AAACGGGTTT GCAGTATTCA ACAATAAAGA CATCCAG'N'T ATCGCAGATG ACAAGGAGTG TTTCCCAGTC CGTACrITCAT TTGGAAAACT CGGCTGGACT CAATACTACT CACCATACTA AACACr'rTGG TTCCCAATGT TGCAATCTGT TTTGAAAGCC TTwCACTGAAA AAGCGAACGA AAACCCAGAT ATAAAAGTGA AATTGGAAAC CACAACAGCC ATCGAAGCAG GAACAGCTcc CATCCAATAC GGTAAAAACG GTAAATTGGc TGT'rAAAGAT GTCAACAATG AAAACATCGT TATGTATCCG ATTAGTTCTG CCCCA'N'CTA TGCTGGAGTA GCAAACCTTG TAAAAGAAGG GAAAGCACTT AAAGACAAGG GTTACACACC AGACCAAGGA ACACGTGCCT TTATCTCTAA AGTTAGCAA-A TATACAACTG ATGATCCTAA CTGGATTAAA GACAATTTGA TCAATAATGG AAACTTTGCC AACGGTCAAA CATCTTACAC CCAAGCTAAA CTTTTAGAAG CAAGTAAGGT CGAAGGTAAG CCAGCTCTTG AGTACCTTGT CGACAAGAAA GTCGCTGCAT CTAAGAAATT GGGACCTAAA GACGTAGTTC GTACAGGTGC TTATGAAGAC AAACGC-ATGG AAACAATCAG CAACACTATT 
GATGGA'rTTG CTGAAATGAG ATCAAATGGT GACGAAAAAC CAGCAGATGC AACAATCAAA AAAGCTATGA AACAATAGTC AAGAACCTAA GAGTGTATAC CCCCTTTTCC TTTTGT'rTAA AATGTAAGAA ACTGTCACGA CATAAAAAAT TTCATTTTGA TTTTAAAACA GAAAGAGAGG TGCCGACTGT GAAAGTCAAT TACGCTTTCC TAGCACCAGT ATTATTCTTC 'ATGGGCTTCA TTACAAG'rTT CTTTAACTAC GATAACTATA TCCGTATGTT TAAAGATCCT ATTT'rGGTTA TTGGATCTGT ACCAGTTGTT ACCTATCATC AAAATGTCA'r TGCCAGATCC GTAACGGGTA GTGTTGCCGT GACAGTTGTT 21180 21240 21300 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860
CTTAGTTATT
CTCTACACAG
AATTAAAATG
GTTCAAGAAA
AAAATCCGTA
TTTGTCATCT
'rCAATGACTA
GTCTTTACAA
GTTCTATTCT
TTCTACCGTT
CTATAAAAAG TAGTTTTTTA ATAGTGTAAG AAAAGGGGC AAGTTCTTAC ATAAGCGAAT GTCAAAAA.AT TATTCTATTT TGCGGGAAAC AGTGATTTCC TTGTGTTGGC TCCGATGGTG AATTTGAGTT TGTAGGCTTG AATCTCTGAT TAACACAGTT CACTCTTTGT AGCATCTCAG TCGTCTTCTT CCTTCCTGTT TGGA-AATGGA TTTATGACCC ACTATCAGGG ATTICTAAACT ATCATCAGCC AAAAcATTTc AT'rATT-CTCT TGACCACTTC AATA'rTGACA ATTCACTGGT TTTTGGAAGA T1'AAATGGCC ACAATTAACT CATTCCAGTG TACTCAACAA GTACCTTGAT GGCTATGCCA ACACAATTGG CAATTTAAAG TACTTGGAAA ACAGAAAAAA AACCATTAAC ACTGTGCTGT TCATCTTrCC GATACAATTG TTrATTCCTCC TTGGTTGG4GA GATAAAAAC'r AGTTGGTCAG CCCATCATCC TGAAGCGGCG CGTGTTGATG AAGCCr'rCTT CCAACAACTC TTTCGCCTTG ATTCAGCTTT TTGTCCT'rAA GTCCAGCCAC GGGCATTGAT GGCGATTATG TTTA'rATCGC TGCCATGGGG, GTGCAACTGA GTTTCAAGTT TTTATA'N'GC AATCATCACA TGACATCTGG TGGTCCAA.AC 22920 22980 23040 23100 23160 23220 GTACTACCTT TACGAAAAAG CCTTCCAATT TGTCTTCTTG GCAGTCATGA TTGCTATCGT CCACGTAGAA TACTAAAGAA AGCCTTTACT GT'rATTTCAA ATTCTACTGG ATTTTGACAG TCAGTGGTTC CCTAAAATGC
AGGAGACAGC
CAATCATTTT
GGGCATTCAA
CAACCATGGA
CAACTCATGG
GTAACCATGT
CGTT'rCTATG
CAAGTTGTCC
TGGGCAGTTA
AGTGAAAATA
CGTACCT'rCT TT'rACC'rrCA
TGCAGAACCC
TCTTAGTTTG
GTCAACGCAT
T'rGTACCATT
TCTTGCCTTT
TGCCTTGCAA TGGATGTGGA ACTCAGTATT TGCAACCTCA TCTCTAGCAG GTTATGTATT TCTATTTGCT ATCTITTATCG CTGCTATGGC GGTACGTATC GTCAACTTCA TGGGAATCCA GATTGGATGG CCATTCGGTG TCTTCCTCAT GACAGAATAC 23280 AAGCTTTGTT 23340 TATGCAArCT 23400 GCTCTTGTTG 23460 ATCACAACCT 23520 AAACTTCCAA 23580 TATCTCATTG 23640 GGC'rAAAAAA 23700.
GCTTCCAAAA 23760 'rGATACTCTC 23820 GAAACAGTTIC 23880 TGGTGAGATr 23940 CCTTGCAATC 24000 GACTTCACGT 24060 TCCCTACAGA GTT'GCT'rGAA TCAGCTAAAA TCGACGGTTG GGAGTGTAGC CTTCCCGATT GTGAAACCAG GGTTTGCAGC TCAATACTTG GAATGACTAC T'rCATGCAAT TGGTAATGTT AACAATTTGA CCATCTCACT TGGGGT'rGCG ACCATGCAGG CTGAAATGGC AACCAACTAT GGTTTGATTA TGGCAGGACC TGCCCTTGCT GCTGTTCCAA TCGTCACAGT CTTCCTAGTC TTCCAAAAAT CCTTCACACA GGGTATTACT ATGGGAGCGG TCAAAGGATA ATACTCTGCG AAAATCTCTrT CAAACTACGT CAGC'rrCACC TTIGCCATACT TAAGTATTGC CTGCGGTTAG CTTCCTAGTT TGT'rCTTCAA TTTTCATTGA GTATAGGAAA ATCAATCTAT CAAGATACAG AAGTATATTT TATAGATTTA GAGAATATAG AGGTTATAAG TGTCTACAAA ATGGAGGGTA 24120 24180 24240 24300 24360 24420 24480 24540 24600
TGCAGTTACT
TATCAGAAAC
TTATGAAGTT TTGTCAGACA GAAGGAAAGA GTATGATTTT CTTATAAACT TAAGAATGGT TTAGTTAAC TGACGATTTG AAAAACATCA CCTTTTACAA CGACTATCTC TACCAACATC GTAAGGATTC AGGCATTCAT CCTAATTTAG ACAAGGCTAT TTT'CGAAT'rA GGAAAGTATG ATATTGATGG TGTCCTCAAT CAAGCTGAAA ATGATCAATT T'GTGTA GAAGGACATG AATA'rTCGAG
AGCATTCGAC
GTTGGGTTAT
TGCAGGCATG
GGATGAATTG
GGTAGAGAAA
TAGTTTCT'?A
TGTGAATGCA
GAAGCGAGTG ACATTGGCTT CACAATTrG CGATTTTCT'r GAAGAAAAGG TTCGAAAA'rA TrTTTTcTA AAGCTTTGAT TGAGAATAAA ATATTTAAAA GATGGACAGG CGATTACAGT TTGATTGGTA GATACATAAA 172 AGATAAAGTC TTTCTAGTTG 'rTCAGGAAAA TGAGTATCAT AAGAACTA'rG CAGATTTGCA CTACGGTTCA CGTATCAAAG ACGAGGCAGT TGTTCATTGT CATGAACACT ACCCACTCTT CCCAGG'rGAG CCACATCAGC CAAATGGTTA TCTCTTTAAA ATTTGATTG Nr'rAAAAATA AATACTCTAC CATGAAATTG ATCTTTGTGA ATTGGTATCT TCTAAGTATG CTGCAAGAGC TGATGAGATG GCTTGGATAA TTAGGGGCAT ATTAGGTACT TATGCGGCTA AGTATGGTAT TAGTATGGCA CGCTCGATCT TAAGTAGGGT AGCTGCAACT GCAGCACCAA GAGTAGGA'TT ACTGACCAAG ATTTCTGGAT GGA'rTTTACG AGTAGCTGTG AATGTAGCTG ATGTATATGG TAATTTTGCC AACAATATTG CTGCAGCTTG GGATGCATAT GATAAA.ArrC CTAACAATGG TCGTATAAAC TTTTAAAATG CGAGAATGAA AGCACTTTGT ATrTTTTTTAT TGAATATGTT AGCTTGGACA GTGCTTGCAA TGATAATTCG TGGAGGGCTA GATGGATTTG TTGGAGTACT ATTTTAATTG CGTCGCTGTT CGGGGTATAT GATTATAAGC 'AAATAGAAAA AAGTCCAAAA GAAAAAATAG ATrGTTCAT GGTAGGGACT TTACTGACAA AAAAGAAAAC AGTTTACAAA GAAAAATGAT GGAGGAGCAA
ATAGGOATAC
CCATAGATAA
TATGAAAGCT
ACATGGCACA
TGCGTTTTAG
GTT'rACCCAT
TTAAGAAGAG
TGAAACTAGG
24660 24720 24780 24840 24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26385 AAAAGGAGTA AGCCTTATCA AGGCAGCAT TGATACAGAT TGAGAAGGTC TTGGACATCG TGACAGCCAA TCTTCTTTTT CGTGACGATT GGAGTGGCTA AAATCAGCCT CTACGAGACC CAGACGGGTG CCTGTTTTTA AAATCTATCT AAGATCTTTC TCTTCAGCTG GGTTTAATGG AGTTAGGAAT TGTGTTTCTT TTTCTGGGGT CAAACAGCTC TGCCCTTCCA ATTGCTGAAA GATTTTTCT'r ACTATCGTGA TGCTGGCTAG TTACCCTATC T.TGGAAAGAA ATTCTTCAAA AAGGATTGAT GTTGGCTAG T CC2'CATGTTA GCCATTCTTG TCCTCATTGT GATGGTCTT ACTCTTAGGT GGCTCAGTCT TCCTACT'N'r 'rGGGTTTGGA TGGATTGATG GAGAAAATTT TCGCAAAATA CCAATAGGAG
AACTTTCTCA
GTCGTCTCTT
ATGTTCGAAG
AAGCAAAATC
ACCCTTTCAG ATCTCTATCT GCCATTTGTT TAGGTATTCT GCGGCACGTT ATGACCTATC TTTAACTTTC CTTGGTTCTT TATCTGTCCG CCTTCAGTCT CTATTGGTCT TTATCCAGAC CT'rTATCI GAAACTACTT TCAAAGGCTC CAAACGCTAT TCTATAAGCG AGAAACTAAA ATCGG
INFORMATION FOR SEQ ID NO: 4:
SEQUENCE CHARACTERISTICS:
LENGTH: 2716 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
CCTGCCCGCA TTGCCCTAGG CATTAAGTAA AAAGCGAGGA AATTTCCCCT CTTTTCCTCT AAAGAAAATG ATATAATAGT AGTTATGGAG TCAAGTGAGA AAGTAGCAGG ACAGGGAGTT CTTCACCGTG CTGCCAAGGA CCAATTGATT GTGACTCACT TTCATACGAT TGATTTTCCC ACATATAAAA GCATGTGAGA GACTGTTGGA
AGTCTCTCCT
AAAAAGAAAT
TCAGGTGCTT
GTTACAGAAA
TATTATTTAT
TTCTTTTGCT
TACGCATCAA
ACCGTGAATT
ATCTTCCAAT
CAACCTTCCA
GATT'IrATTC
TATGTTGAGT
AGTTCGTCTT
CGAGGCAGAT
AAAGAAACGC
TCAGGGAGAA
AT'rCCATTTT
GAGCACTTGG
CGTGAAAAAG
CAAGAAGAGG
GTAGGTGCTG
GAATTGCCTC
GGTTATGAAC
ATTGTATCGC
AGATTGGCTA TGTGCATTTC TCTTAAAGGG AATTGTGAAA TTGTGGTCAA TCCTATGTTr TGACCTATAT TCCTAACTTT TAGTCAGACT GCGCACAGAT GGCAAGTTCA GAAACGTAAA TTGCCAGCTA CACTTGAGGG AAGTTTGAAA CGCTATGTAT TTTCTTTTTA CAACCGGATG ATTGAGGATT TGGTAGCAGC TGGTATTCCA GTCAACAAGG AAAAATGGCA TCCTCTACCA CTTGGTCTTA GTGACAATCA GTTTATCGTA GGGATTGATG AC'rTTATCCG TCTGGCTGAG GGTGGCTTCT CTTTTGGTGG TATGACAGAT AATCCCCC'rA AAAATTTGAT TTTTCCAGGC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380
AGATTACCTT
ACTATAAGAA
CAGAGCGGAT
TATCTGGGCT
AATTATGGAA
GCGCGAATTG
TATGACTATT
CTATAAGGTG
AGTTACAATG AGCTCTTTCC ATGTTGCGTG ATTTAGATCT
TATGCTCTAG
TTAGAAGCTG
ATTTTGGAGG
GAATATCAAG
CGGATCTTTT CTTGTTGCCT CGAGTTGTGA GGCTCCTATT GAAATTATCG GGCGACAGCG CAAATCCTGC TGTCTTAAAA GGTAGAGAAG AGATGAAAGA GATCTCAAAG AAAAGGCTAA ATCTGGTTGG ACTTTTATGA TCTATGCGAA TTGGTTTATT AGTATTCGAA CCTTGAAAAC ACGACAGATA AGGATGTCAA
GGCTATTTTG
GAATATTTCC AGAGAGTATT. CTGAAGAGCA TCTGTTACAA GAAACAAGCC GCTTTAGGGA GAAAGTAAAA AGTGAGGTAA TACAGATACC TATT-TTCCTC AGGTTTCTGG TGTTGCGACC AGAACTTGAA AAGCAGGGAC ATGCTGTTTT TATCTTTACG TCGCTACGAA GATTGGCAAA TTATCCGCAT TCCAAGTGTT CCTTTCTr'rG
ATTGCTAAAC
TTGGGGATTT
TATGAAGACT
TATCTGGTTA
CGTGACTTGC
GAATrAGCCA
AAACTAGGGA.
AAAAATATTC
AAACTGGTAG
CTAGAGATTC
TACTATAAG
TACTTGGAAA
AACCTCATCA
GCTATTTTGG
TTGTATGAGA
ATTATTTCAA
TTTAAGACAG
CGCATGTTGA
GAAGAATAGA
AAGATTGCGG
CAAGTCATTT
CTTTAAGGA
AGTATCAGCT
GGATTGCGCG
ATGTCCATTA
GAGGTTTCCT
174 TCGTCGCTTT GCCTACCGAG AGATATTATC CATACTCAGA TGAATTGAAA ATTCCAGTCA TATTGCTAAG GGGATGTTGA GCATGATGTG GATGGGGTTA TATCTGATrA TAAGGTCAAG GrTGAAAAAC AGTTTGAGCG TCCGGAAATC AAGCAGGAAA TTCAAGATGG TGAAAAGACG TTGCTTAGTC AAGCAGTTTT AGCAGCCTTT GCTGATGTTC TAGCTGGGGA TGGCCCTTAT CTGAATGACC AAGAC'rCAGT
CGGCGGATTT
CATCTTTACA GGGATGATTG CTTCAT'rTCG GCATCGACAA GTTTTAGC.AA GGCACTTGAA CAGAATr'rTC TCT'rGGCCTG TCCATACCTA TCACACCCAG TCCGGCCGAG TATGGTCAAG TTTGCCCTAG TGAGATTGTC GGGTCATTCC TACTGGGATT ATTTGAAAGA ACTGCGTAGT TTTCGAGAAT CTCCTATGAA TGAAAGAGGA AGACAAGGTT TCAA.AGAGCA AGCCCAGAAC CTCCTAGTGA GACGGCTCTT GCGAAACGCA AGGTTTGACC ACGGAAATCC TTATTrGAAC GAGAACATGA TTTGGCTGGT AGCATACCTT ATCAGAGAAA ATGAGTTTTA TCTGGATGCC ATAdGGTCAG tCAGCGTATC TACCTGTAAA AGGATCTAGA GAGACTATTG GAAAGACCAT TGAGAAGCGG TCGTGATAAA ATATGGATCC GACTATCGTT GAATTGTAGC TTACATTATT 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2716 4 4 *44*4* 4
GCT'rAGCCAG TGGAACACCT GTCAT'rGCTC GTGATAAAAT GTTTGGAACC *TTGTACTATG AAGCCCTGAT TGCAACACCA GACATGAACG TTTCAGCTGA GAACTTTGGG AAACGAGTGC ATAACTTCCA GAAAGATTTG GCTAAAGATG TTTTGTATCT TCAGCAACAG GTGG~rrGCTG AGGCT'rCAAA AACACAGTTG ATCAGTATGA AAGAGGAACA GCTATGAAAA AAACAATTAA GTGTTTGTGC TGGGGTGGCC CAT'rATCTGG GGGGTGTTCT TACTTGCTGT TACGGAGCTG TTATGGATTA TCGCGA
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 13926 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CTTTGGT'TT CCITATTCA AGACATGAGG GCCATCAGGA ATGATCTGAA ACTGCGAATC TGTTrAACAGT CTATGGAGAG AATTAGTAAG GTTGG.ATAAG CTCCTCAGAA ACTCCGACCA GGAGTAG!' TTAAAAATCA CGGGCATCCT GACAGAATCA TGTCAGGGCA GCACCTAAGG GTGCTGATAA ACTCGCTCTT
CTTTCATAGA
GGTAAGT'rCC
AAGAGTCTT
GCAATTGAAG
AAGCTCGAAG
ACAATCCAAT
TAGCTTGTTG
6 *S *S S. GATAGCCTCA GAAGGATAAT CTGTCAGTAG CTGCCCTAAC CCATGCAAAA ATAT'rAATTT CAAGCCTCTC ACTGCAT'rAC AGCCGTAAAT CAAGACT'rrC TCTTCTACCT GTCCTGTTI'C GATTCCAAGT CCCATAGGCG GTGTGTAGAG TATAGAGGTT TCAAGCAGTT CTCCTGACTT GCTCAAATGC GGTCGGTGAG TGGT'rTTAAA CTGAAGACAG ACTTGGGCCT GACAAAAGCT GACTTCTATC 'rCTCCAGATT TGCTAAGGCT ACAATCCTGA CACTCTTCCT CAATCTTGTG AAAATAACGA CTAGCTTI'TC TCAGCCTTTC TTGGCTGATT TTTCCAGTTG TAATT1AATTG TGCCTTTTGA TGAACCTCTC GGTATTCAGA GCCAACTCCA 'rAAATGGCTT GACCTTTGTG AATCCGTCG'r CCATTTGGAA GCAAGAGACC AGGCTCCA-AG TCCTTGATAA TCTCCATTGT ACAAGGAAAG AGTGAGCGGA AGATTT'CAAC GATGGTCGAA GTCATCTGCC AAACAGTTGA CTGCTCGCTC CCAACTTCAG AAA'rACGGTT CATATTI'CA GAGCGATTTT TGGGATCCTG TTGGTCAGTT ACCCCACGCT GAGTCGTCCC AI=rGCTCA AAAAAGAGCT CTGGGCTCAT 1.75 ACTAAGATTC GGTTTATCTT TGCTGCCACPA TGCTATATCC GTTAAATCAA GTGTCTTCAA GTCTGCTCCC TGT1'TTCAA ATACTCTTTT ATAAAATAGG ATATTCCCTG CTAATNTAAG ATTTGGTAAA TCGTAACTGG AAACTCTAG CAAAACAAAA GGTTCTGTCT CTTGAGCTAG ATAGTI'ACTA ACTCCAGAAG GAAATAACTC ATTCCGAACT TCTTTCCAAG ACTCTGCTGA CATCTAGTTC TCCTCAAGGC TTAATTCATA AGCTTCTGCT TGGG'rTAAAT CTGCCAAGGT TAGCAAATGC TGACGGTAAA T'rCCTOGCAA TTTTCCAGCG A'T'rTCAGAA CCAAATTTCC ATTGTGGTAA ATCTTCTCTT GTTCTCCTAG GTAGGTAAAG GATTGATTCA AAGCAGCTTC TGTACTGAGA GGGGTTAATA CTGACGATT GATTCGCAAG CGGTAATCTC GATTAGCTTC TCCCAAGTCT TCTGCATCAA AAGGAAAAC CAGArGTTGT TCTTCAAACA TCAGTTG'N'T GAAGCGAGCT TGTTTACGAT AGAGAACTC T'rCCCATGTG CTATCCCAAG TAATCCCTCC AAGT'rGAATG GTACGAATGG CCACATTAAA AATCGTTCCA CACTAGACTC CACGCGGT'rG CGCAA'TTC GGTGCACCCG TTATGGAACC AAG= CCACA TCCTCTCGCA ACTGACTCTT ATACTGCTCT ACCTGACACA GACGCTCCAC CATATCATTG CGCAACGT CCACAATCAT TTCCAACCAA CTGGCCTGTT CAAGATCTTC CTTCATTGGT CGTGTTGTCA ACTCGCGATC GGAAATCAC1' GTCATCTCG'r CATGTTCCAC 5.
C S 9 176 ATAGGCAT'rG TAGCCCGCCT CC'rGCTCTAC CACCA'rACGA TTGTAGATGG CAAAAGGA'rr GGCATTTAAC T~TTTGCTTAA GTTGGACCGT GTAGTTGACC TAAATGATGG TGAATTTGGG CAATGGCCTT TTCATAGTCT CCAATT'rGAG GGCAAATCAA TATCCTCATA AGTCAGAGGA ATCATGAACA GTAAAGTAAA GCAGGTACTC TCCCAGTAGG TTTTTCCTCA AAAGCAGGTG CAGCCTCGTA GCTGACATAC CTC'rTGGTAG CTTTCCACTr GTGCCAGCAA ATCTGCCACT CAAC'rCTTTA ATAGGCTGGG TAAAGGTATA TCTCTCCCC TGTTT'CTA TGCATACCTT AAGTATAGCA TAAAATA.AGA GATGAGAGAT TTCAATTATT TAAAGAT'rGA AGTTrTAAAG TTTCTTATAA ACAGCTTCT'r TTAAT'rrAAC TGTATTA'rTC GTTTGCTTCT TGTTTAAGAG TTTCGGCATC 1'r'TTTAACA TAAATCATCG TATGATGAAA CGGAAGAACC ATTTACTTCG TGCTTTATCT TTAACTTCT'r TGAAGTAAGC 'TT'rTTA-AAT ATTGTTAGAT ATTTTCT'rGA TAATATATTC ATCACTrAGA AGATTGTTGT TTATAT'N'AT TTGAAGCATA ACCTAAGAAC ACCCCATAAT CTAAAAGCAT TATGTTTGAA TGAAACAGCT ATTACCTCCG TAGATACCGG TCATCATTCT AACACCTACA TGATAGGTAT CTCCCTGCCG GCTGCAGACG TTACTTCCTG ATAGGGGAAG TT'rCTACGAT GGATCCTTGT GAACTGCTAA CCCACCACAT AATAACCTTG TCTTCTACAT TTCTCGTTTT AAAGTCCTAA AATCAATCAC AAACCCTCAT CCGCAAAGCA CTATTTGT'rT GTTGAAGAAG ATAGATACTG TTTTATTACC GCTTCTTTAA ACAATGTCZAG AATGTTGTTA ATCCTTTCGT TCTT'CAATAG TATTAAATGT ACAGACTCAC CATCTGTTrrT CCATT'rTCGT ATCCGTAGTA CCAGGAGCAC CTTTACTAGT TAAGGTGArI' GATCGTTATA CGATTAGTCA TTAATTGTTG TTCTCTTCTT GACTTAGATT 000 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 GCTAATTGCT TCGGGTTTA'r AGATACCATT ATCAACTAAA TCATTAACAG ATTGAATATT TCGAATTTTA.TCCCATTGAT TTAATTTATT GAACCATGCA CTATTTAAAT CT'rTATr-N'G
ACCTGGATTG
TAATTCATTT
ATCAAGAAGA
TTTTAGAGTT
CATATTAATA
CATACCCTGA
AATCGCACGG
GTTAAAGTGT CATTATAACC TTTGATCTG TAATATACCA CCTAAAGAAC CAAACTCATC GCATATGCT'r CAGCATCAGT TCGT'rGACGT GTGTTGTTTC GTTATCACGG TATTC'rCTAT CTATTTTTTT TTGAGAAATC ACAGATTCAG CCTCAATTTC CTTCATATAT CTATTAATAT CTTCTCGTGT CTGATTCCCA TCATTTTTGC GTTTAAATAC AAATCCACTA CCAGTAACAG GACrT'GTAG ACCTTCACGG TGTCCAAAGC CACCTAAGTA ATGTGTGTAA ACTGAAATAC CGTATTCAcC AGTTCTAATA PCATCAGAGT TAGGATATAT ATAATACTTA TCCATAGGAC CAAAGAATTC ATAGCGGCCA TATTTTTCAA CCCATCCACC AACCATTTCT AAATGAACAT AGCAGCATAA GCTCCTGTTC TCTAAGAGGA GTATATACTT ATTT1'ACATC
CATTATAATT
TGTCGGTATT
AGGAGCGTTA
GTTATCAGAC
GTTAACGATA
GGCAATTGCA
GAGTATAGTA
TTTAATTTCT
AATAAACCAA
TGCATCTAAA
ACGTGAACCT
AACTTNTTCA
AAATT'rAACT
TTTCTCGACT
TTCGTTTTGA
TTCT'TCTAAA
CTGATACATC
AAGAGTAGTG
AGTA'rATTCT
TGTACCGTCC
ATTGTTGTTC
TAATTTATTA
ATTTTTrATAG
TTCATTAACT
AGTAGTAACT
TAACCTTCCC AAATAGGAAT AACAGCATCT GCTAGACGAT ACCAGAAATC ATAATAGTTT TCN'AATAT CTTCTAATGA TTTI'?rACC1' TTATAATTG AAATTAAATA AAGATGTGCT TTTCTAAGGT GACTTCGTTT TAAATTATCG TCGACCTCAG AAGCGC=~C TGCGATGTAG TCGTTCA'rAT TGTCTATATT TGTGAACAAT TTACCTGATT TAG'rATA'rTT AGCCAATACT TTAATGTTGT TCTCTTTAGA ACCGATTTCA CCATAGAAAT CTGGTTTGAA TAGCATTAAT CCATAGTAAC GATT'rAGGTA AGTTAAACCT TTATCACGAA TCATTTGACG AGCAGCTGGA AC'rAATTT'TG TGATTAGGTT TGTTAAGTTT TATAAATCTT TGATTGCAT'r AACTCTATAG GI'TGAGACT GAAGCTCTAC TGATTC1'AAA TTATCTTT GAACGATATT AGGTGTATAT TTTACATTAC 'rTAAACCTTC ACTGCTAGPA GCATAGTGAA CAATAATTTr ATTAGCTTCA ATCGCGGTAA CAGAAAGAAC TTCTTI'AGTA CCTTGATA'rA CAATATAATC TTTATGTAG GCTTGGTTAT ATTCAGCGT'r ATAATCTTGA
CTI'ACTAGTC
CTATAACCAT
AATCGCTCCTG
TTATCAATAT
AATGCACGAT
ACATGGTCTT
TGTCTATTAT
TGACCGAATG
ATTAATCTGT
TCTTTAATAT
AGTAATAAAG
GAATCATTTA
TCTT'rAACAT
TCACCTAATC
ATAGATTTTA
TTAATTCCTA
GACAAGTTAA
TCTAGGTTTG
TTTAGATGGT
A.ATGGTATTA
ATACTAGAAT
GTTG?1"rAAC
CTGC-AGCTTT
CACTACCAAA
TCAGTAGTGG
GTTTAGAATT
CTGTAdCATC!
AATTTAAAAA
CGTCGAATGT
CTAATACGCT
TAACATCACC
CTGCTTTGTT
GTI'CATGTTC
CTGTGAAGCT
GATTTAGATG
TATCATTAAC
AGTCAGTTAT
AGTAATCTTT
TGATAAACTC
GTTCT'rTATT
ATTTTCAAG
AGGCTTTTTC
3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 TTTGCAAGAG GAGATAGATC ACTTTCTAAT TTATCAGCAG TAATATTGAA TTAGCATCAG CTTGTTCTTT AGTTAATTTA GTAAATGTT TAGATTTCCT AAATGATCTA TTACCTGACG AATATCCCTC TACCGCATAT AAATCTTTTA AGCATAATCA GAATCATCAA CGTCGTTAGA GCCGAATAAC TCCTCTCCAC AGCATAGCTG ACAGAATTAC TTA.CCGTACC TACAGGCCAA GTCTTACTTG AACTTCTACT GGATTGAAA CATCTATTTT ACCITTTTACA ACCGACTCAG TTTTGTACCA ATAAGATGGT CTAGAGTTAA TCCATAATCT ACT1-rAGGAA
TATGAGCACT
GGATAATCTT
CTATTGCTCC
TT'AGGAGAGC
CTAACAAGCT
GGCGCGTGTT TTGTTTCCTG TAATAGTAGC ATCAACATAT GCTTTI'CTAA CAATTCCTCT 173 ATAGTTTGTA CCT'GCAATTC CCCC1'GTATG AGAGCCATTT CCACTTGTAG AGTGTAGTN' GCCAAAGAAA GCAACAT'I- CAATACCAGT TCCATCATTC ATATTATTTA CAAATCCAGC AAAT'rATIA CGACCTGAAA GTGTGCCTGT '1TTCATAGTA TTGGCTAATG ATGCAATATT TTCAAAATTC ACATTAT'rTA TCGTTGCGTT AATTTTGACA TTTGTAATAA CTGAAGAACC ATCTTGACCA GAACGTTCTA
TGTTATCACA
TTCCTTCAGA AC?1'AAAAGT 'rTCAGTAATA GCAAATTG'TT GATATATGAT TTTCCATTAG TTCTTTTGAA GGATCGTTTT ATCTTCGTGG ACTTTAGGTT AGCAGTTCTA GAGACTAAAT AACCGTAGTT TCTTCTATAT TATTTTTAAA TAATAATTGC TTCTTTTCCA TTTTCGTATT TTTAAGATCT AATTGAATAT GTCGTAAATC ATAGTTGTAG
GAACAACATT
GAATAGCTTC
TTTCAATATA
TGTCTGCGAT
7TTTAACAGC
TCTAGCGCTC
CACTAATTCT
GTGAACGTAT
TGCTGTAACT
TAGTAATGTA
P'rAAATAATG
TTTCCTGTGA
ATTGATTGTC
TTGAAATTA'r TCT'rCTTCAA
TTATATACAG
GT'ITrCTGAT
GTTATCAGTG
ATATTTTTAA
GTT'rCTTCAC
TCTCTACATT
GATGTTCCAA
AT'rCTTTAGT
CCAGACGATA
AATATACATT
A'ITT'AT'rATC GTGTrTCCGTT
TATTTGAAGT
ATTCATTAGT
GCTCAACTTT
CGTTACCTCT
TCTTATCATC AGGAATAGTT TGATTAAATC TGTACGTTTA TTTGATTTTC TAGAGTT'rCA ATAGGGTGTA TTCTTTGTAG TACTCTAGGT TCTTAAATGC
AGCGCTTATA
TTCTCCTTTT
AGTATACTTA
GTTTCTGTTG TTACCTI'GTC TTCAATTCAG CTGTGATTGA GCAACAGCTT CACGTTCCAA ATCTGTA-AGG ACTACAGTAT TAATAACTTC TTTGATTTTTGTTTTGTTTT GATTTTCTAG TATTTTCrlwrA TCGGTACTAG TCAATGTTAA 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 t5780 6840 6900 6960 7020 7080 7140 TATTGGCTTT TCAGATAATr' CAACCAATTT TTCAATAGTT GCAGTTAATT TTTCAACAGC TTCG -rAACT TCACTTTGTT TACCATCTGT ATTAGCTGCA ACTTTTTCAG CCTTTGTAAC TTCAGTTTGG AGGT'TTTGCC AACTTCTATC ACTGTAATGT TCTTTTACCT TTGTTT'N'GC ATCTGCAA'rC GTAT'rGTTTA ATTCAGTTTT ATCAACGTTT AGAGCGTCAA 'rAGCCGTTTT AAGTT'CATTT G'rCTCGCTAT TTACCTCAGG CTGTTTTACA GGCTCTGAAG CATAGACACC TTTTGCAGTT TCTAAAACAG GTCCAAGAGC ATTGTAACTT GCTGTAGAAT AATCAGTAGG AGAAACTGAA CTAGCTr'rA' CAATTTGATT ATTI'AACTCA CT'rTrATCAA CTGCT'rCTTT AGTACCAATA CCCTr'rA'IT TATCTTCTGG TTTCGGTG'rT TCCTCTACAG CCTTCTCTTC 'N'CAGGAACT TCTGGTTGCT TTTCTGGCTC AACTGGTGCC GT'rGGTGCCT GTTCGTCTTC TCTTGGCGCG ACTGGTTCAC CTGCTTGTTC AACTTTTGGT TCCTCTGTTG GTTCTGTTTG TTITTCTACA GCAGGCGTTT CAACTTTTGG TTGTTCAATA GATTGATTAA CAGTCTCCTC rTTTTGGTTCT ACAGTTTCTT CAGCCTTGGT ATCTGCAGTT GACTCrCTr GTTTCGGTGT 179 TTCCTCTACA GCCTTrCTCT'r CTTCAGGAGC TTCTGGTTGC CTTTTCGTCT TCTCTTGGCG CGACTGGTT~C ACCTGCTTGT TGGTTTGTCT GATGGTTGAC 'TTCTGGCTT AACTGCTACT AAC'rTCTCCA CCTACTTCTT CAACTGGAGC 'rGGTTCTGCT TACTN'TAGGA AGGGTGTCGT CAGTAGGTTT TACCTCCGAT TTCTTCTGT'r TTAGGTGCTT CTTCT'rTTGG AGCTICCTCT TGTCCTAGCT TGCTCCTGAT TTGTTATTGA TTGAGGAGTC CTCTCCAGGT TTTGCTGAGG TT'rCTTCTAA AACAGTGTCC GTCACCTGAT AGATAACCAA CATAGCGATA GCCCTCCATT AGCCAGCGCT AGGG'rCGCAA CTGGGTCTAC AGCCCCTGCA CATAGCTCCA ACTAGAAAGA CGCTAGCAAT TTTCTTTCTC CCCAACAGTC AGCAAACCAA: AAGCTGTCAA AACAGATGCT CTGATCTT'rT TGATACACCA AACCATATAC AACTTCATTC AATTAAATCT TTAGCTTCTT GTGAALATAAT CTCTTTATTT GTCCACTACA GAAGGAGCCA TCAAAAGGCT TCCAAGAAAT TTICGGCT CGACTGGTGC TCAACTTTTIG ATTCCTCAGC TTTTCCTCTG GTTN'GACTC GAATCTTCTI' TCCCCTCTTC TTTGGT'rCTT CCTTTGGACT GTCTCTACTA CTTGGTT-rrC TCAACT'rCGA CCACAGTCAC AAGCCAAGCG TTTTGAGGAT TCAACAACAC CCTCTCGACT 
CTAGGAAGAA CTACCAATCC TTGTAGATTA AAAGCAAGCT TCTGTCCCTG TTTGAGGCAA CTGTCAGGCT T'rCCTGTCTG ACATAGTGAT AGGTGGCTGC ACAGAGCCTA CAACTCCCTT 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 AATCTTACGA ATTGAAAAAC GGTCTTTTTT AAACACTTTT ATCTCC'rTTA TTCATTCTCA AAACTTCCTA ATAGCATCTT ACTAGCCAGT GCCGTTACAT CTGAACATGC TTGACATGCA CTTGATAAAT TCAACCTCAA ACCTCCTGCA TGAAGAGTAG CTGACGTTCA ACAACGAGAG ACCTCTACTA CCTAGAATAT GGATTCTTTC CCAATATGAC GCCCCCCTGC TTCAAATTGG T'TGAATCTGT GTCTCGTTCA T'rTCTTGTTT TTAGCAAGAT CAGAATTGAC TGGGAGTTAG CTTTTTGGGG TCTAGAATCA GCGGATAGTG CGCACGCGCA CCTCCGATTA ATTTTGGACG GGGCATGACC AATCTCTCTC AAAATAGGGC GAATCGGAAC TGCCAATTGC AGTGTCTCCG ATATCCAATC CAGCATGAGC CTGGATCCTG CATAAACTTA AAGGCTGCCA ACTGCCCCGA GATGGACACT GACAATTTCC AGACCAAACT CCTCTGCCAC CCCGATTGAC ATGCTCACAA CCTTGAACTG CTAAATGGAT CCAAGATAGT C'TCCACTATC AGCTCACCAA TCTCTTGACT CACCTAGCAC CTCACTAGAA GATAGACCTA AAACAAAAAG TCrrTTTC'rAA AACATCTTCC ACTACCTGAC GTGTTTCTCT TCTCTGTTAC CTCTGTTGTC ACTCTTCTAT CATACCG'N'T AGACAACCTA GAAAGTTTGC CCAATTACGC ATAAAACTCC CTAGTT'rCTA T'rCTATTTAT ATATATT'rCA ACTTCGTCC ATCTTCATAT GGTAATTGGC TCCAAAATGA AGTTGAGCC GT'rGATCGAC ATTTTGAAGA CCAACTCCCC CAGCATCTTG GAAGCCAACG CCATCATCCT TCTGGACAGA AAGTTTAATA TGGCCCTGAC CATTTTCTAC AAGGGGTTGT AGGACCAGCT TTTCATTAA'r TTCGTATTCC AGCTTATCTC GGCGGACA'rG ATTGATTTCG TCAGAGAGAC AGCGGAAATA GGT'rGCCAAG GACTTGGTCA CAGCCATCCA GATGATGGTG TCCAAAGTGT AAAGGGCTTG AAGTTGGTAC TGACGGGTCG 180 CACGTTTGAG TTGACTTTGA C'rACTATCAC CAATACGGAT GACCAATCCC GAATccTGT'r CTTCCITrrTC CTAATGCCA TGGTAAAGAG TOGGTAAGAC TAAATTATCA AAGGCAACAT CATAGCGTTG TTTCTCGATA AAGAGATACT AAATCAAGTC CTTGCCTTGrA TTGAGCGCCA CCTGCACCAC TCGCTGACTA TCATGAAAT'r TATAGAGGAA ATGTGGATTA ATCTGGCTCG TT-TCT'rCCTG GCTACGAAPA GCTACCATCA ACTGATCAAT CTGATCCAAC ATAGCATTAA ATTGGCGAGT TACTTCTCTC AGTTCATAGG CACCAACTTC CTTGGCACGA AGATTTTGAG CACCAGAAGC TCAAATCCTT CAAAGGAGCA ATCCAGCGTT TAAGACTGAA CAAGAAGAGA TGTGACACTG 
GCCCCAAGCA AGGTCCACAA 0* CTAACTTTTC CAATGATGAC GACTGACGTA GGATTTGTGA TAGCCTCCAT 'N'TGCTAGAC GGTTTTCATT GATAATGAAG CTTCCAGAGT TTCATAAGAA CAACAAGTTC TTGAGTGACA AAACAGGCAT AGCTCCCTGA AGGAAGTTT'r CATCTGCACA GCACAACAGT TTTCAAGTCC CCTCGACCTT GTCTTGACTG AACCAGTCGA GGTGGTN'TCT TGATGGTCGT TTGGCTGTTG ACGCCAAGCA CCGTCCAATC CCAGGAGTAT AACCCTGACC GAACTATAAA CTGTGTGTTG GCAAAGCCCT GCTGCCCCAA ATATCCAAAC GAAGCACACC GAAATGACCC ACTGACTATC TGAATGGCCT TTTGGTACCA CTGTCATCTG TAGAAATGAC TTArCTGACT TCAAGATGGT GGATTCTCAG CATAGGCCAG AATTTCCAAC ATGGTTTCTC CCACACTAAG CAGAGACAGA GAGCTGACTC CGAACCTGGT AG'rTCCTGCA ATCTTCTCTT TGTATCGATG TAGGGTTTCA AGGATGGTAG ACAAAT'rCAT CTGGAGTTGA TTGAGATAGG 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680
AAGATTGGCT
TGATTTACGA
ATCCTCAGCC
CTGACCAGAT
CAAAAACAAA
AACATCCGTC
AATAAAGTGG
GGCTGAAGAA
CAGAAAGATG
TTCTTAAACT
CCCTTTGCAT
GCTGGAGTCA
ATCATATCAG
TTGGTCACCA
TCTCGGATTC
TGCTGGGTCA
CTAGTCTGGC
CTTGATTrGAT
GAAATAACCA
GACGAGGTG'r
AGTTTTTTGA
CCCTCAATGG
AGTAGAAAGT
TTCTAAC'rAA
CACACCTGCA
CTCTGCGATC
ACGTTCTCTC
TCCAACCAGA GCTAGGAGAA AAGAGAAGAA CGCTTCATCG
TATAAGACTG
TGGCCTCAAT
TGAGAAAGAC
GTCTTCTCCC
ATCTGCT'rAA AACGTTGGGT AAAATAGTTC ATATCTTCAA AACCAACCTT TCATAAATCI' TCAGATCTGT AGTTAAAAGC AAGAGCTTGG CTTGTTTAAC ACCAGATAAT CCTGAAAAGG CAAGCCCAAC TCTTTCTTAA TCAAGGAACT
CAGATAGGTC
AGCCAGA'rGA T'rGTAACTGC CTCAATA'rCC
AGACAAGGCA
GGTTTCTCGT
GGACTAAAAC CTAAGTCACT GGCTAAAGAC TTTAAACTAA ATTGGCTATC GACTGGATTT TCTGGGCCAT GTCCTTCA AACCTATTrAG TCAATAAATC TCrTrCTTT-CT CTTCCTTGTC TAGTITTGT TTGA'rTTTCC CCAACATTTC TGACGAGAAA ACGGTTTGAG CAGOTAGTCG TAATCAAAAT CATCGTAACC TGTTAAAAAG ACCAGACTGG CCAACTGGAT GCCAT'rTAGA *4.
TAAAATGATA TCTGGCACCT CTGACCGATG ATTCCATAT TACCAGATAT TCATCTrCTA ?TTCTAGTAT CAGTATAGCA AATAAAAATC AAAAAGTAAA GGTrGTAA-AT AAAACTGACG TGACGAAGTC GATAACCCTA CGAAGAGTAT TAATCAACAT TTGGGAATAA AGCGGATAGA GGACCATCCG TAAAGACATG CGCGTCATAT TGTAGGACTT GTAAAACTAG GCCAGCCACA CCAGTTGCTA TATCCACArA GCTCGTTCTG TTTGATTTrC AATTCCTCAT CACTTGGTTT
AAGTCGACTC
CATACGGTAA
AATCTAGTAA
GAGGCTA'IrG
CCCAAGGTGA
ATCTTCCTTG
ACCAGACTCA
GATACCGGAT
CTGGGTAACT
TGGA'rA'TTG
AAAGTATAC
GGCGACGCTG
ATAAGCGTAc
ATACAGTAAC
GAATCTCCTA
TAGGTGACAA
AATTTGTCTT
TCAAAT'TTAT
GCATACTCCT
GC'TTrTGGAT CAArrCCcAA CGTAGGCTGC TACA7TTGACC CGATTAAGAT TGTGTAGGTC AAATTCTCCT CTAACTGCTT CTAGGAAGAT AGCCACAGGT TCCACACCTA GTTTGACAGC ACCAAATGAA CCTGAGGATA TG.AGGCATGT TGATATCGGT GCCTGCCTTC CA'IrTTCAGC AGT11TAGTCA AACCTTGTCT ATGCTCTGCT CCTTTACCAC AGGAAAGACC TCTTATACTC TrCTCAAAGT ACCGCTTTGA TTTGAGGTTG TAGATAAAAC ACGTGGTTTG AAGAGATTTT CTTTTTCTTC CATTTGGTCT GTAAGCCGCC CTTGTCCTGT CTCGGCrCCG CACTTCCATA CATCTGGACT GATGGGTTGG T'rGATGAAAA GAGAGGTTCC CCCAG'rAACG GN'TGAGAAA CAGGTGACAG GGTCTTTTTC 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 TTAACATTGA TATGGCAGTA GCCATTTGGA TCAGCCACCA CAAAATTCTT CAAGTTTTCC TTAGCCACCT CATCAAAGAC TTGGTTAATC ACACCAGTAC GGTACTGGGT CCCCACATCA ATAATGCGGA AATAGTGAAG CAGGATTTCC ACATGGACGG TTTCTGCATG ACCTGTTTGG CTACCA'ITTG CATAGCCTGA AACGGCATCC TCCACTCCCC AGAA.ACAACC TCCAGCTAGA CTGGCATCAA TGACAGGATA TT'IrTCTTGA GATAGTCTTG TTTTCAACTG CTAGAGGTTG ACTTCCAAAT CCTTGTCATC TTTCCTTGTT TATTTTTGCT
GGCCGCCTGA
ATGGTAATCC
ATCGTATTTC
TGTGTAATAA
GGTTGGATTG
TTGAGAGAAA TTTGCTTGGC ATCATAGGTG TTAATCAATTr CGTAC?1'GGT TGTTTCTCCT GTCACCCCGG GAACACGTGA GAAATATTCC TAALATTTCGT GCAAGTCTGC GTCTTTACTA ATTTCTGTTT TT?1'CACTGC GCATCTGTCT GCCCTGCATT AATACTCCTA GCAACAAGAA 'rTCCTTCAAA GTTTGCAAAA CTrGCCTrCT r'rGTCTATAA AAGTTTGCCT GATGGGTCAA ATTCTTAAAG TCCGCTTCAG 182 TTTTCCTCCT TGGCTAACTG CCGCCTTTTC AATTTGCGAG TCGTATCAAT AGAACATAGA A.ACCGGTTAT GGCTAGAAAA GATTTTT~AAC TTATCATTCA TAAGACGCCT CCTAGGCTAA TTGCATCTTT TTCCATGAAT CCrGGATGTG 'N'rTGACCAG AGGCTTGGGT TGCGTAAGAA CGGACACCAT CTAGGACTGG GAGATTTA TIAATCCAATC ATTGCTCTCC CTTATGTCCT GGTGACACTA CTTTAGCAAT CTCATCCGTA TCTGGAAGAC AGAATTTGAG ATAGACTTTC TTGCCCTTGT
AAGTTTCCAA
CCTTATACCA
CTGTCAAGAC
TAGCCAGACA
AATCAGATAA
CACATAGTCA
GATGGAACAC
ACGGTAGGTC
TGCGCTTGT'r
AGTCACGGAC
TCACCAGCTT
CAAGAAGCCC
TTGCCATCTA
TTACTAGCTG
TTGCCTGAAC
CTCCCATCAA TTCAAAATCA GCCACCTCTT TCCCTrTAGC TCTGCTCCGT C'TTCATTTCA. TCTTTCGTTT GGTGTTCACT AAGCCGTCAA ACAAAGGAGC GAACCTGCTC CAAGAACACA a a TGTTTGCCAT TTTTTCATAT TGATATTCCI' T'rGAAGCATT TCCAAACAGA ACCAAGAAGC TTTTGAGGAT TCCGAGATAG GGATGAAGTT TCAGAGCTAG AAGCAAGAAT GGTAGCGCCA CTCCCTGCCA AGCTCCTGAA CCACCTGAAG GCCCCACGCA AGGCGTCCAA GCAAAACTAA AGCCCTTACC ATTTTGCCCC TGTCCTTGCA TAAAGTGTAG AATCTCCATT TGGTGCAAAC TTCCATTTTA TTCAAATAAT CCATCACAAT AATGAGAAAA TTCGGAA.ATG TTTCAAAACA AGCCCAGCGT ATACACCAAC CCGCCAAGGC CAAAACAGAC AGGTCAAGCC CAATAAAAAT GTrGTAGCCT CTTTTCCTTA CAAGAAGGAT AATAATTGCC
TGACTTAAAA
CCACCCACTT
TAACTAGAGG
ATGAGACCAG
CCCAGAACCG
GCCTGACTAT
TAAAGCCCCT
CCAGTAAGAT
12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13926
ATTGGAACCA
TAAATATAAA
TTGAAAATTT
AGACCGGTAA
CACTTAGAAA
AGAAGCATAA
GGAAATTCCT
GCCGCTAGAA
CAAAGGTAAG
AAAGAAAATA
AGCAAATCGC
GCTATAAAGG
GCCTGAGCAC
ATACAAGGAG
TGACCCATAA
CTAAAAAACC AGCTCCATAG CCCAACAAAA CCAGAGTTCG TAATAAACTA GTAACTGAGA CATCCTTATC ATCTAGTAAC ACTCCTGTAT AAAAGAAGGA TAGAATCCCT GCCAAAAAGA AGTTCCTCC1' ATCATTTTAT TGATAGATTT
ATTATA
INFORMATION FOR SEQ ID NO: 6:
SEQUENCE CHARACTERISTICS:
LENGTH: 20199 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CCCAGCAGAA AAATGGCA'rT 'PGGAGATAAT GGAAATCGTA AAAAAACTAT GTTTGAGAAA ATAACCTTGT TTATCGTGAT TATCATGCTA GTAGCAAGTr TATTGGGAAT ?rTGCAACT GCAATTGGTG CCCrCAGTAA 'rCTATAAAA'r AGATTCAAGA AAATTTAGTG ACTGGGATTT CCCAGCCCTT TTTTAAAGTG AGAAGAAATA ATGAGTATGT TTTTAGATAC AGCTAAGAT AAGGTCAAGG CTGGTAATGG TGGCGATGGT ATGGTTGCCT TTCGTCGTGA AAAATATGTC CCTAATGGAG GCCCTTGGGG TGGTGATGGT GGTCGTGGAG GCAATGTGGT CTTCGTTGTA GACGAAGGAC TACGTACCTT GATGGATTTC CGCTACAATC GTCATTTCAA GGCTGATTCT GGTGAAAAAG GGATGACCAA AGGGATGCAT GGTCGTGGTG CTGAGGACCT TAGAGTTCGA GTACCAPAAG GTACGACTGT TCGTGATGCG GAGACTGGCA AGGTTTTAAC AGATTTGATT GAACATGGGC AAGAATTTAT CGTTGCCCAC GGTGGTCGTG G'rGGACGTGG AAATATTCGT TTCGCGACAC CAAAAAATCC TGCACCGGAA ATCTCTGAAA ATGGAGAACC AGGTCAGGAA CGTGAGTTAC AATTGGAACT AAAAATCTTG GCAGATGTCG GTTTAGTAGG ATTCCCATCT GTAGGGAAGT CAACACTTTT AAGTGTTATT ACCTCAGCTA AGCC'rAAAAT TGGTGCCTAC
CAC'ITTACCA
GCAGTAGCCG
CAGTTCCTCC
AGCGAGGGCC
CTATTGTACC AAATTTAGGT ATGGT'rCGCA ACTTGCCAGG TTTGATTGAA GGGGCTAGTC GTCACATCGA GCGTACACGT GTTATCCTTC GTGATCCATA TGAGGACTAC CTAGCTATCA AATCTTCGCC TCATGGAGCG TCCACAGATT ATTGTAGCTA AGTCAGGAAA ATCTTGAAGA CTT'rAAGAAA AAAN'GGCTG GAGTTACCAG CTATCTTCCC AATT'rCTGGA TTGACCAAGC GATGCTACAG CTGAATTGTT AGACAAGACA CCAGAATTTr ATGGAAGAAG AAGCTTACTA TGGATrTGAC GAAGAAGAAA GATGACGATG CGACATGGGT ACTTTCTGGT GAAAAACTCA AACTTTGATC GTGATGAATC TGTCATGAAA TTTGCCCGTC CCCAATCAGG TGAATCCTTT AAGGTGT'rGG T'rTGGGAACT ACA'rCATTGA TATGTCAGCT ATAA-AGAGCT GGAGTCTTAC ATAAGATGGA CATGCCTGAG AAAA'rTATGA TGAATTTGAA AAGGTCTGGC AACACTT'rTA TGCTCTACGA CGAGTCCGAT AAGCCTTTGA AATTAGTCGT TGAA.ACTCTT TAATATGACC AGCTTCGTGG TATGGGGGTT TGGTCCGCAT TGGTAAATTT PAACCGA'rATC TTTCCGAGAT AAAAGAAATT GGAAGAACTA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 GATGAAGCCC TTCGTGCGCG GAGTTTGAAT TTGTAGACTA GCGGATGGTA ATTTTGTrrC
TGGAGCTAAA
GGAGACTGGT
CGCCGCAGAC
GATGGGGATT
ATGGGAGATA
GTTTGGAATG
TTAATCGTC TCAATCCAAA TCTCAGTAAA GAAGCTAAAA AGGCATGTAr AGCAAACTGA TGAAATGAGA ACAGGACAAA AAGCAGAGAT GTACTATTCT CAAAATTAAA TTGTTTGATT1 TCGGCGAGTC AAATAGCGAT CTGGGGATAG ACCGTTTTAA CGAAATCGTG GCTCTACGAA CATCAAACTC TAAAGTCCAA TTGAATTCGT ACTAAGATTT ATACACGAGG AAAGATGTAC CAGATAGAAG TGATCCTGAG 184 TCGTGCCTTG AGATTGGCAC GAACTAAAAA CGAAAATCCA AATCCCGTGC CTCATCAGAC ACGGGATTTT GTGGTACGAC ATCTCGAATA CCACAGCATA TCTTCTAAAA TATAGTAAAA TCGATCAGGA CAGTAAAATC AGTTTCAA'rC AACTATATTG CTTAT'ITCAA TTTGTTATAG TCCCAAGCCT GACTATCGTG
GTCTGACGCT
CAGGAACGTG
AAAGGTAGTC
TC'rATTTTCA
GACTTATCCC
TCACGGTTAT
GGAAATAAGA
ATAATAAGGC
GTAACCTATA
TCTGTGAAC TGAGAGAAGG GGGAGAAGT'r CCGATACTTA GATAAGAGAT CTAGTCTTAG GGCAATAGCG ATTCGAGAAA GATTATACTC
CTGTAACCTT
GTGAGGTCTA
CTGTCTGATA
CTTGCTAAAA
CTCCTACTCA
TTCGAAAATC
CGTCAGTTCC
T'rrGATCTTT
TATCCAATAA
CAT' CTAAC AATGTTTTAT ?I'ATAAATTG' ATTTGAATTTr TATATCTGA'r CTCAAAGTTC AGGTAGCGGA TTAAAATGGT ATTGTCAGAA GAAGGGATAG GTATATAGCG GATAAGAGGG TGCGTAAATC ACGAGAGTAA TTAACCCCCT TATATCTTGT TCACTATAAA GAGAAAACGA GGACGGTATG TATAAAACGC TTTAGTTGAA CAGCCGTATT GTTTTAGGGG ATAAAAAAGG TCTTCAAATC ACGTCAA'rAT ATCTACAACC TCAAAACAGT GATTTTCATT GAGTA1'TAGT AATTGAAAAG GATGGAAAAA 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420
CGCCTTGTCG
GTTTTGAGCA
AATTCAGTTA
AGGATAAATT
GAAAACTTAG
CTATTAGTGG
ATGTGGTGAC
TGGAATTGAT
GTGT'rCAAAA TATGTGTAGG ATACTGACTA ACcTGCGGCT AGTTTCCTAG CTAACTCGTC AACTCTGATT TATGATATAC TTTATTTTGA AATGAGAAAA ATTGTTATCA TGCTAAAAAT AGTGTCGTTG TTTGGATTGC GTTCCAGATA GGGAGCTACT GTTAAGCGT'r TATTCCAATG CCTTATGGTA AGACCTTATT ACAAATCTTG AAAGAGTAT'r ATGGTGGATT ACCACTGCAA GGTGAAATCA CCTTAAT'rCC AGCTATTATC TTGGCTGATG TT'rCGGATGT AGCCAGTCTT GTCGAAATCA ATGACGATGT ATTGGAGATT GACCCAAGAG AAATTAACAG TCTTCGTGCA TCTTACTATT AAGCGACAGT TGGTCTACCG GGAGGATGTG TTAAGGCGTT TGAAGCTATG; GGTGCCACTG CTGCTAAAGA TACAGGACTTr CATGGTGCAA CAACGATTAA TACGATGATT GCTGCGGTTA CAGCCCGTGA ACCTGAGATT ATTGATGTAG 'NYTATGGGAG CCTCTTAGGC CGTTTTGGTG ATCTTGGTCC TCGTCCGATT GACTTACACC CTAGCTACGA GGGAGATh-AC ATGAAGTTAT GTATTT-ACAT GGATACGGTT AGTGTGGGAG AAGCAAATGG TCGTAC1'ATT Ar'rGAAAATG 185 CTACTCTCTT GAATAATATG GGTGCCCATA TCCG'rGGGGC AGGAACTAAT ATCATCATTA 3480 TTGATGGTGT TGAAAGATTA CATGGGACAC GTCATCAGGT GATTCCAGAC CGCAT'rGAAG 3540 CTGGAACATA TATATCTr'rA GCTGCTGCAG TTGCTAAAGG AATTCGTATA AATAATGT'rC 3600 TTTACGAACA CCTGGAAGGG TTATTGCTA AGT1TGGAAGA AATGGGAGTG AGAATGACTG 3660 TATCTGAAGA CAGCATTTTT GTCGAGGAAC AGTCTAAT'N' GAAAGCAATC AATA'rTAAGA 3720 CAGCTICCTTA CCCAGGCTTT GCAACTGATT TGCAACAACC GCTTACCCCT CI"rTACTAA 3780 GAGCGAATGG TCGTGGTACA ATTGTCGATA CGATTTACGA AAAACGTG'rA AATCATG'rrT 3840 TTGAACTAGC AAAGATGGAT GCGGATATr'r CGACAACAAA TGC'rCATATT TTGTACACGG 3900 GTGGACGTGA TTTACGTGGG GCCAGTGTTA AAGCGACCGA CTTAAGAGCT GGGGCTGCAC 3960 TAGTCA'rTGC 'rGGGCTTATG GCTGAAGGTA AAACTGAAAT TACCAATATC GAGTTTATCT 4020 TACGTGGTTA TTCTGATA'I' ATCGAAAAAT 'rACGTAATTT AGGAGCGGAT ATTAGACTTG 4080 'rTGAGGATTA AACCGTAGAG GTGTT'rATGA ATATTTGGAC CAAATTAGCA ATGTTT'rCTT 4140 TTT?1'GAAAC GGATCGCTTG TATT'rCCGTC CTCTTTTT TAGTGATAGT CAGGACTTCC 4200 GCGAGATAGC TETCAAATCCA GAAAATCTTC AATTT~ATTTT CCCAACGCAG GCAAGTCTGG 4260 *..AAGAAAGTCA ATA'rGCACTG GCCAAT'rACT TTATGAAGTC CCCTTTGGGA GTGTGGGCAA 4320 TTTGTGACCA GAAAAATCAA CAAATGATTG GTTCTATTAA ATTTGAGAAG TTAGATGAAA 4380 :*.TCAAAAAAGA AGCTGAGCTT GGCTATTTTT TGAGAAAAGA TGCTTGGTCG CAAGGATTTA 4440 TGACAGAGGT 
TGTTAGAAAA ATTTGTCAGC TTCTTTGA GGAATTTGGC TTAAAACAAT 4500 TATTTATCAT 'rACCCACCTT GAAAATAAAG CTAGCCAAA G AGTTGCTCTT AAGTCTGGAT 4560 TTAGTTTGTT CCGTCAGTTr AAGGGAAGTG ATCGTTACAC AAGAAAAATG CGGGATTATC 4620 TrGAATTTCG GTATGTAAAA GGAGAGTTCA ATGAGTAAGC ATCAGGAA.AT TCTAAGCTA'r 4680 TTGGAGGAAT TACCAGTAGG TAAAAGGGTC AGTCTTCGTA GCATT'rCGAA TCATCTAGGA 4740 *GTTAGTGATG GAACAGCCTA TCGGGCTATT AAAGAAGCTG AAAACCGTGG AATTGTGGAG 4800 ACCCGTCCTA GAAGTGGAAC AATTCGTGTT AAATCCCAGA AAGTTGCTAT AGAGAGATTA 4860 ACGTTTGCTG AAATTGCAGA AGTGACTTCT TCTGAGGr'rC TGGCTGGGCA AGAAGGTTrTA 4920 *GAGAGAGAAT TTAGTAAGTT TTCAATTGGT GCCATGACTG AACAAAATAT CTTGTCTTAC 4980 ***CTTCATGATG GGGGGCTCTT GATTGTCGGA GACCGAACCC GTATTCAGTT GCTAGCCTTG 5040 GAAAATGAAA ATGCAGTTCT GGTTACAGGG GGATTTCAGG TTCATGATGA TGTGCTTAAA 5100 CTGGCCAATC AAAAAGGGAT TCCTGT'rCTA AGAAGTAAGC ATGATACCTT TACCGTCGCG 5160
ACCATGATCA
AAACTTTATC
TATTTGGACT
GTCGTTGTTG
GATAAGGTTA
AGTCAACGGA
TTGCTTGGCG
GCTCTACCAA
GTCATTACAG
GCAGAAATTC
CGAGCAGATG
GGCACGGATT
TCACCAGATT
GATGATAACA
AGCAAGTAT'r
TGAAGAATAT
TGACGGTGCC
TCACGCTTTC
TTTGGGAGGT
TGAACAAATG
TTATGCTGTT
TATGTACAAG
TATTCAAGAA
TGTTGGCCCA
ACTCCGTCTT
GGAAATTGAT
ATGTCAGTAT
AGGTGAAGAA
AAAATCCAGT
ATGCGTTTTT
186 ATAAAGCCTT GTCAAATGTC CAAATCAAGA GCCCTAGTCA TGAGTATGGT TTTCTGAGAG TGGTTCGTAA GAATCGTAGC AGCCGTTTCC GTGTTGTAAC CATGAGAGAC GCTGGTGATA TGTCTCGTAG TCTATTTTTG GTTrGAT'rAT TGATCGCAGA AGACTTTGAA ATGGTACCAG TTGTGACGCG ACGAGATGTC ATGGAGAAGA CT=~'CTGA GCAGATTGGA CAAAAGCTCT TGGAACCCTT TATGCTAGAA AAAAATGGAG TGACCCACAT GACCCGATTT ACTTGTTAAT CTGATCTACT TT"rTGCAGGC TGTTCAGATA ATTCATCATA CGAGACGGTC AGCTATAATT GTTTCAAAAG CAAATGTGAC TGT'rAAAATT TTAAAATCAG CTCGTGAAAT CGAAGCTATG CATATAGGCT TACGTGATTT GATTAAGCCA GTCCCCCGTC GTTGTAAAGA AGAAAATTTC ATGATGGACT ATCCTTATGC TACCTGTTGC CCTCGTCATT ATATCT'rGAA AGATGGTGAT CCCATTGCTA AATCTGACCT AAATGTCTCA AAAAAATACA CTCAGAGCTA TTCTGGTGGT GGTACACCGT CCGAAGAAGT CAAAAACTTG GGTATTGAGC AAGCTGTTGT TGGAAATCGT TACGCTGAAA GTCGTGGTTA CGGTGTAGTG ACTATGCACG AAGAACCAAT GGTTCCTAAC CGTGAAGGAA TGGTCTTAAC CATTGAACCA ACAGATATGA AAAC'rCGTTG GGCGCATAAG GAACACCAAT TTGTCATTAC GAAAGATGGA GGAACTTATTf AATAAAAAGT GAAAAGACTA AGATCTTTTC ATAATAAAAC GCATTGTATC CTGCTTTTAA GATTTTTTCC AACTCTGTTT CTGATATTCT GACAGTTGAG AGACAGATAC AGTTAAAGAT CTGTTATCAA TCAACATCAG AATCACCAAG CACGACAATT CGACAAATAT TGCCAATGTG TTGTTCGAAG CAATCAAACT TGAGCCGTTC CCAAGTT'rCG CTTATCACCA TGATGAAGTA TTTTGGCTAA TGGTGTATTG AGTGGTCGCA ATCTCATTAT GATGATATAT TGCGCATTCA GATTACGATA TTTATCATGG AATr'AGAAAC TAGGAGAAAA GACAAGGCTG GTGATTTTCT GGCGTAGATA TGTGGGAAGT CTTCCACTTC AGATTGGGGT TCTCTTAACG ATGAAGTGGC TTGCTICAAAG TTGATATGGT AAATTAAACT TCAACAATGT TTAGCAGACP CATGTTGGGC ATGGATGTAA CCAAAGAAGC ATCGGTGATA TCGGTGCGGC CGrGATTTGG TTGGTCATCG TArGGTATTG CAGGTCGTGG ATGArCAATA CAGGCGATTG ACCATTGACG GTGGATTGTC CCTGTrATCT TGACTAGCCA CTGGAAGTTT ATTrTGATAA AAGTGTTAGG GGC1'GATATC GTAAGCGCAT CATAACAAAG 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 S. 55
GGTCTAGGAT TCAGGGCTCT ATTTTAGTGT CGCAGTCTAT AAAAAGAGAA ATGGAATTCT 'rATTATACTT' CCTGCGAAAC AACAACTAAC TGATGCACGA AGATGTTAGC TGTATTAAAA CTAAATTAAG CCTAGAAGAC TTATGAAGAA ATTGCGGCTG ATGGGTTGAA ATAACI'CTTG
CCTCCTATAT
TGTTCCTGTA
GTTTAATGTT
AAAATATGGT
TTTAAGCGTC
ACAGCTTATC
CTTCTTATGC
ATTTTGGTAT
TTCAAAGTGG
ACTATTAGTA
GAGATTCCAC
CGTATTGCCA
AAGTAAAACT
AATATTGTCG
ATCTI'AAATT
ATAGTAGTTC TATGAATGAT TTGTTGGTGT TCAGCGTACC AACTTIAAACA CGCAAAAGGT CCACrT'TCA ATAGTGCGAG TCACGAAAGC AACTTTATCC
AAGGGAGGAT
TT'rTGAT'rCT
TACTTTTTTA
GAAGCAAGTA
ACTTTTGAAG
GGACGAAAAC
AATATCGAAC
GTCGGAGCCA
TTTTACGGTT TCAAGAACI'C CTCTCAGTTC TGAGGACACG GTAATGATTG ATGCGACGGA AG1'AAAAATC TAGCGAATGA TTCTGGTAAA AAGAAATTTC ACGCTATGAA GTCAAGGGAG AATTGTTTCT TTGGATATCG CTGTGAACTA TCAAAATGAG TCGTAGAAAT ATCGAACAAG CTGGTAAAAT AAGGGCTCAT GAAGATATAT CC'rCAAGCAC AAACTCCACG CGCTAACAGC TGAAGATAAA GCCTATAACC ATGCGCTATC AATCGCCCTA AAAAAACAAT GGCTCAAGCG ATT'GTCACAA TAGTCATGAT ATGAAGTTGT CTTGCTGAC AGTGGTTATC TAAATCCAGC AAACTCAAGC TAAGGAAAGA AGCAAGGTTG 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700
AGAACATCTT
GTAAACGCTT
TCTAGTTTTG
TCCCAAAGTA AAAACGTTTA AAATA=TTC AACAACCTAT CGAAATCATC CGGATTACGA ATGAATTTGA GTGCTGGTA'r TATCAATCAT GAACTAGGAT CAGGAAGTCT ATTGAGGTAT TGAGCTAGTT TATGAAAAAA TTGGGTGAAA AGTCGAGTGT TTTAGAAACC CACAGTGTAG TATTCTAGTT TCAATCCACT ATATTTTGCT ACTCCCCGTA AAGTTT-CTAT AGTAAGGAAG AGAAGATGAA GTGCAAACGA CAGAAGGTCC GTTTTrGACCC ACCGTATCGC TTGGCCATTA CCTTTACCAA AATCCAGCGA CTCAGGACTG CGTCGCGATG CGGACCATAT CAGCGAACGC TCATGAAACG GAACGAACTA TTTTGGGGAC TATGCTGCCC AAGCTGGCGA TTTCCCTGAT TTCTGAfATA ATAGAALATAT CGCATTATTA AATGGAATGA ATGACCGTCA CTTGCTAATC ATGGCAGGGG CTGGTTCTGG T'rATTTGATT GATGAAAAGC TGGTCAATCC CAAGGCTGCG CGTGAGATGA AAGAGCGTGC TCTGATTGCG ACCTTCCACT CCATGTGTGT TGGCTACAAT CGTAATTTTA CAATTG'rGGA TATrCTCAAA CAGT'rGAACT TGGACCCTAA CATTTCCAAT GCTAAGAATG ATTTGATTGA TATGTATACG CAAAT'rGTGG CCCAGTGTTA
TGACTTCAAG
GGCTGAGGCG
AAAGACTCGT
TTGGAATATC
TTATAGCCTC
GCGTATTTTG
TCCTGGTGAA
AAAATGGAAT
TGATGT'rGCT
TACAGCCTAT
CAAAAAGAAC
CG1'CTCTTTG
CACGTITGATG
TCCCGTTTTA
GGTGCTGATA
TTGTTGGAGG
AAAAATAATA
ATCGTTTACT
GATGAACTTA
AATGCCCAGT
GTTGGCGGAA
188 TTCGTCAGTC TGAATCCGT'r GACTTTGATG ATrrGATT-AT ATCAAAATCC TGA'rGTTTI'G ACCTACTACC AGCAAAAATT AGrACCAAGA TACCAACCAC GCTCAGTACC AATTGGTCAA AAAATATCTG TGTGGTTGGG GATGCGGACC AGTCTATCTA TIGCAGAATAT CTTGGACTT'r GAAAAGGATT ACCCAAAGC AAAATTACCG CTCAACCAAA ACCATTCTCC AAGCGGCCAA AAAATCGCCG TCCTAAAAAT CTCTGGACTC AAALACGCTGA
GCTGACCTTG
CCAATACATC
ACTCTTGGCT
CGGTTGGCGT
CAAGGTTGTT
CGAGGTTATT
TGGGGAGCAA
ATCGTGCCGA TGATGAGCTG GATr.AGrGCTG TATTTGTAGC CAGAACCATC GTCGCAGTCA AAACTTCCTT~ CATAAGGATT T'rGCAG'rTCT CTATCCGACT CCCGTACAAT TGAGGAAGCC CTGCTCAAGT CTAACA'rTCC TTATACCATG CCAAATTCTA CAGCCGTAAG GAAATTCGCG ATATTATTGC TTATCTCAAC CTTATTGCTA ATTTGAGTGA CAATATTAGT TTTGAGCGTA TTATCAACGA GGAATTGGTC TAGGTACAG'r TGAGAAAATC CGTGATTTTG CAAATTTGCA ATGCTGGATG CTTCTGCTAA TATTATGTTG TCTGGTATCA AGGGTA.AGGC ATCTGGGATT TTGCCAATAT ACAGAGTTGG TTGAGTCCGT GCGACTCTAG AAAGCAAGGC AACTTTGATG ACACCACGGA TTC'rTAAATG ACTTGGCTTT GTGACCTTGA TGACCCTGCA GGGATGGAAG AAAATGTCT GAAGAGCGCC GTCTAGCCTA GATGCTTGAT TTGCGGGAC CCTAGAAAAA ACAGG'N'ATG ACGGGTTrGAA AATATCGAAG 'rGTGACAGAA GAGGAAACTG GATTGCCGAC ACAGATTCAG TGCTGCCAAA GGTCTCGAAT TCCACTTAGT CGTGCGACTG TGTAGGTATC ACGCGTGCAG
AGCTAGACCA
TCGATATTCT
AGTTCITTC
GTCTGGACAA
GTAGTCAGGA
TTCCAGTTGT
AAGATTCAGA
AGAAPLATTCT
GCCTAAACT
AAATATGTCT
AGCCCAATCT
CTTAAGCATT
TAACTCCCAA
TGTTACGAAG
ACTGAGTCGT
GACATCAGAA
CTTTTTGATT
TGAATTAGAA
CTATCTGACC
8760 8820 '8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 AATGCCAACT CACGCTTGCT TTTTGGTCGT ACCAATTATA ACCGTCCGAC TCGTTTTATT AACGAAATCA GTTCAGACTTI GCTTGAGTAT CAAGGTCTGG CTCGTCCTGC AAATACAAGC TTITAAGGCAT CATATAGCAG GCTCTTCAAG ACCGTAAACG TTTGGTCAAT TTACAGCTGG GATATTGCTC TCCACAAGAA GCTAGGCAGG AATTGAAAAT TGGTAGTATT TCCT'rTGGTC AAGGTATGAG TTTGGCTCAG CGGTGCTGCC CCAAAATCAA TCCAGTCAAG CGGTCTTCCA CCCAAAACCA GCATCTAGCG AGGCAAATTG GTCCATTGGT ATGGGGAGAG GGAACCGTTC TGGAAGTTTrC AGGTAGCGGT CAAT'rTCCCA GAAGTAGGTT TGAAAAAACT TTTAGCCAGT GTGGCTCCAA TTGAGAAAAA AATCTrAATTT TCCATCCTTC TCACGAATAA TAPAAGTGAGG 189 AGGATTTTTA TGTACAGTAT TTCATTCCAA GAAGATTCAC TA'rTACCAAG AGAAAGGCTG GCCAAGGAAG GAGTTGAAGC GCTTAGTAAC CAAGAGTTGC TAGCTATTTT ACTCAGGACA GGAACACGTC AAGCTAGCGT ~T'GAAATT GCCCAAAAAG TCTrGAACAA TCTr'rCAAGC CTAACGGATT TGAAAAAAAT GACCCTGCAG GAATTGCAGA GTTGTCTGG TATTGGGCG'r GTTAAGGCCA TAGAAT'rACA AGCTATGATT GAACTGGGGC ATCGTATTCA CAAACACGAG ACTCTTGAAA TGGAAAGTAT TCTCAGCAGT CAAAAGTTGG CCAAGAAGAT GCAGCAGGAA 0 TTAGGGGATA AAAAACAAGA ATCCATCAGC AGACCATNT ATTCTTCACT ATGCAATCAA TCAGGAGCGG TAGCGCCTAG TGCGAATTGA TGGGGA'N'GT AGTTATCGTG AAAAGACAGA TTATCTTTGG GACGATTTTC ACATCATCCC TACTCATGAC TTTAAATTTG GTGCTAAGTC TCTCCATAGA TTTCTTGGAG TCAACAGAAG AATCCTGCCA ACGTTAAAGA GTTGGGTACC TCCTTGATGA GGGCCAGTTC GTTTTGGGTT CGCCATAAAA
GCACCTGGTG
'rATCGGGTCT
GCATATGGCG
GTAACTrCGTA GTATCGCTCA ACTTCTCTTA TCTTGGTCCA
ACCCGAGAG
CAATCATCCT
CCAAAATGAT GATCATGTCA TCTCTTGGAC CATTTGATTG
TTTAATCTAA
AAAAAGAAGT
AGCCTCAATG
CTTGATGCTC
AACGGTATCT
ATGGTCTTCG
ACGGCAGACA
GAAGATATCT
TTTTGGATCG
AGTTCATTAA
TCTGGATGCC
ATACCATCTT
TGGTGGTGGA
GGTTCTGTTA
ATATCTTGGT
GAGAAAATGG
CTTTGAAGGT
ACATTTTGCC
TCTTGATCAC
CT1AAACTTGT TAAAGAAGCC TCTCTCATTC TAATTACTr'r CGACATAGTC AAAGAGTT'TT ATTGGACACC GAGAAAGGCG TAGGATCATG AGCCACAACT AGGAGTTGAT ATGAGAGATT CCAAGCGTTG AG'N'G'GTAC ACAAAGTTCC ACCCATGGCA GCTTTTTCTG TT'rAATAGCT GATAGTCATC ACTATCAATG CACCTGTCAA GATGAGCTTG CAATCGGTAG GATGATGGGA CTGCGTAGCT CATCATGATG TAACTGGTTT ?r'rCATAAAA GCACTCTATC TCAATACTCA AAATCAAATC 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 TCAATCAAAC TGATATAGTG GCAGGCCATT ATCCCTCCAG CATCTTTAAC GCCTTCAACA AAGCCTTTTG TCATCATCTG GATGAGTr'rT TTCGTTTCCT GTAATCCCAA TGATTTTCGC TTTCTAATCC TCTTT'rCGCA TGAAGTAGAG GAGGGTr'rG AGTTCACTTG TCAAATCGAC ATACTGAACG ACCACGTCTT TTGGTAAATG CAGATGGACT GGTGAAAAAC TGAGAATTCC TTTCACACCA GCATCAPLCCA AGAGATTAGC AACCTCTTGT GACTTGACGC TGGGAACAGT TAGGATAGCA GTCTTCACAT CAGCATCCTT GATTrTATCC TTGATCTGAG AAATCCCGTA AATGGGAATC CCGTCAGGAG T'ITGGGTACC GACTTCAGGA 'EGGTCGTCTA GGTCAAAGGC CATGATAATC TTCATCTTGT TACGTTCGTG GAAGCGGTAG TGGAGAAGGG 190 CATGGCCCAT ATTTCCAATA CCAACCAGCA TGACATTGGT AATAGAG'rTG TCATTGAGCA AATCGCcAAA AAATGTCATT AGI=T=GA CATCATAGCC AAAACCACGA CGACCAAGTT CACCAAAATA GGAAAAATCA CGACGTACGG TCGCTGAATC AATACCGATA GCCrCTGCAA TTTGCTTAGA GTTGGCACGT TCAATCT7"r CTGCATGAAA TCTCTTAAAA AITCGATAGT AGAGAGAGAG TCTTTTTGCT GTAGCTrTG GAATAGCAAA CTGTTTATCT TTCACAAAA'r CACAACCTTT CTATTCTTCT ATTTTATAGA AACATrGTGA AAAAATCAAC AAAAATAAGA AAAAACTAAG AAAAATCTTA GTTTTGATGT AAAAAATCTG CATGAGATAG AAAACGGTAG AGGTCTCCGA CCAGCCCCTG ATAAACTTTT TTGCCCCTAA AAGTCAGAGA AGTCACATAA AGTGTATCTG GTAAGGTTAC ACATCCTGAC AAAGTCAACA TGAGAGCCTC ATGATCCTcA TACTTGAGAG TACGCTCTAC ATGATAGCAG TCCTTATAGG TCAGTrCAAA CATTTTGGC'r CTATCTTTCC GA?'rrGTAA AGACACCACG TTCTACCAAG CTATCCATGA GGAAGTAGAA TTTTTCCTGA TGAATATGGT GGTCTTCTGA TTTGAAAATA TCAACTAGAC GAAGGCCAAA CT'rGTCAGTG ATATTGATTI' TAGCCCCTGT GAAGCCTTCA CCGCTGTTTG GCACTTTTTC CTTAGTTTCA AAAAAGGTGT TATCTTTGAG GTAATCGTAA CGACAATTTT r'rAACTGAAT AAGTTCCTTG TTAATGATGA 
TTTTGAGTTG CAAAAGGCGA GTCAGT'rCAT GGTGAATTTT I-rAACAGAAG GAT'rTTTTCA AATGCCATAT
AGTTACCAAC
GGCTAAGAGT
GGCTAACCTC
12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 CGATAATTTC TT'rTAAGGTT ACAAACTGAT GCGAAGGGAT GAACATGGCT GGA'rrGGACA CAGCTAAATC 'rAGCCGAAGG
GAACATAAGG-GAGATGATGT
CCAGAAAGGC AGTTTCTAGA CTTCTTTTAG GGCTGCAACC TTTTCTGTTC CTGGTCTCCG GAAAACCGAT TCCCTTAGGA CCAATTCTTC TGAA'rGAATT CAGCAGGGTG TTGCTTGAGT CA'rrATTGAC AAACATGGTA TTTGCGAGGG 'rrTGrAGGTC TCC'rTCAAGC GTTCTGAATr ACGCCTGCAG TACAGGCTGA AGTAAGAGGT CATTTrTCTG TTTCCTCTAT TCAGGTAATA TTTTGTACAT G'rTGAAAATG ATGCCTACAA TGGCAGGCAG CCATGTAGAT AGGAATCAAA CCATGGAATT TGTGGGCAGA GGGArTTTAC CAATAGCCTG ATITTGGCCAA TTCAGCGAT TTCAACGGTA TTTTGTGGCG TGCGCCATAC ATGGCTTCAA GCCAGTfAGAG ATTGAAATTC ACCAGGAAAT CCAATATTGA CTGAATGCCC TCCAGCTCTG TTCTTCTTGT TTTTCTAGGT ATTTTCAGTT CCTGCACGTT? GTCCATGCTA GATGCGTAGA AGCAGTGAGA AAATCAATGC AACTGCATCA ACArGATAGG GGGCAGTAGG TTTCCTGTCT GAAACCAAAA TCGTATCGTC ACGTAAAGCC TTTTGAATTT GCTGGGCTGT GATTTCTTGA Tr'rTCTGGCT GGATAATGGT TGCr'rCAAAC CCAAAGTGTT GAACCAAGTA ATCAATTGTT TCAAGGACAG CA'rGGTGCTC GATGGCAGTT GTGATGATAT 191 GTITTCCTTG TTCTTGGTGA CGAAGACAGT AGCCAATGAT GGTAGTATTA TTGCCT'rCAG TCCCACCAGA AGTGAAAAAG ATATGTTGAG GTTTTGTCCT TAGTAACTGG GCTAGTTCCT GACGGGCTTC TCGCAGAGT T1TGCCAGCTT GACGACCATG ACCATGAATA C'rAGAACGAT TTCCG'rGGGT TTCI'GCATA ACCTTGGTCA TAGCTGAAAT AGCAACTGCT GACATAGGAG TCGTTGCAGC ATTGTCCAAA TAAATCAAAG AATCACCTTA TrTCrTTTTA ?rGTAGGCAA AGAGTGGGCT GACTGG'rTTT CITCGTGAA TACGGACGAT AGCATCACCA AT'rAACTCAC TAGCAGTGAT GTAGCATACA 'rTTTTAGGAG TTTTTCTTT TGTTIGCTACr GAATCAGTCA CAAGAATT~TC TTTAATATTA GTATTGTCAA GAAGCTCAGC AGCTCCCTCG ACGAAGAGAC CGTGGCTAGA AACAGCATAA ATT'rCTGTAG CAGAGAAGGT ACGTCCTGTA TTTAAAATAT CATCACCAAT AATATAACCT TCGTTACGAG CGATAGGAGC ATCAAGATAT TCAGCCAGGC GGCTAACGAC AACAACATCT GAACCAAGCA GGGGAACAGT GAAAAGATTA TCCACTGGAA CTCCTTCACG TTCAACGATT TTAGAAGCTT CATCAATCAA GATAGCTTTC TTACCTTCAA T'rGCATCGTC rTGAGGGTAG TCGATAATGG TACGCGCACG TTTGACACCT GAATTTTTAG ATCCTTTATC GCAGTAATGT TTTGCGAATA TATCAAAGAA ACCTTGAACC TGAACGGCAT GCAAATCAAG AGTCAGGATA TTGCTGTAAG TGGCTCACGA GAAGGACAAC GTTGATACTG ATTCCATTAG GTGGTTGTTG CACGGACACT TTCTTCGATA GTTTTCCAAG TGGGACACCA TGAGTGCGAA AAGTTTCATG TATAAATCCT AGTTATATTTI TTCTATTTEA CCAAAAAATG ACCAATTTTG AAGGAGCT TCC'rGTTTGC 
CTTGCTCATG AAGTATCGGC TCTGATCCAG G'TTGTTAGAC AACCAAGAAC
CGATCAACTC
GGACAAGCAA
TGGGCACTTG
ACAGGGAAAC
TTTACTTGGA
ACAGCT'rGGG T~rT'TCTAT
CAGCCTTAAC
TGCGGTCTTG
CACGCACACA
TTGTTGATTG
TTTCTCCGTC
CAATTTTrfTG
CTGACATTAT
CAGCATATTG GCAACTAGTT ACGTGCATAG CCAAAATATG AGCATCGACC ATGATTAACA GATGATGTAA ACATCATAAC TGAAAATTGA CGTGATGATA TGCAATCTCT TGGTTAGAGT AGACCGTCCT CTGTAAACTT GATTTGTGTA TTIrTTATrTT 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 ACCTTACATA TATGAACTGG GAGATTATTT CAGCTATTTT TTGATAGGAA ATCTGATTTT ATTTTrCCACT TCAAGCTCCA 'rGCCATGAGA CCAATAGCTG CTGCCAGTTC TTACTTTGGA
TCATACTTTT
TCTCTAAAAA
ATTCGTAATC
TTTTCATTTC
TACCATG~T
GACAAATCGA
TTGTCGAAAA
TGTTATATCA
ATAGCGAAGC
CGCCAATTCA
ATCTTTTAGT
TGTCAACTCA
TCCAGTACTA GCCCTTGAGG AAGT'rCTTCC TTACTCAGAT AGTTCTCAGC TGCAATTT1'T GGTTGTATTC CATGTTTCCA ACACTCTGCG GGACTTTGAG
GCCCAGTCTT
CCCGTGTCGA
AGACGATTGT
TT'rCTTACCT C3'ATATGAAT 192 CAAAGGTTCG AATGCGCATA GCGACTTTCT TTTCTCGCAG T'rCAAAA'rCA TGTAGTAAT'r TC'1'TTGAAGA ACAGGAGTGA CACCTGTGAA CTCGTCTTT ATTCATCTTT ThTrCAATAGT GTTTTCAATT CAATTTCTAA ATGTTTCATTr TTTTTTATCG TTGAMAGCGG A'N'TATGGTA TAATXAAGCAT TGTATTTATT CTGGAGAAAA AATCAAAGAT ATrTTGACG GATAATATGA GAACAAGGGA GAATATATGA CC TTAGAATG TTAAAGATTA AACTTCGTGG ATTGAGTTTG TGACCGGTCG CGTGGCATTA CTTATGCGAC ATGGTTCAGT TTGTAGATGA ATGCGAATCA TACAGGAGCG TATCATGTGG TAGTAGAATA GAAATTCAAA TTCGTACTTT TACAAGTACc AAGGGGATTr ATCGCCCATC AGTTGGATGA GCACTTTTTG ATCCT'rTGAG GATGAAGAAT ACAGGTAAAC GGTTT'rCTAT GAATTGCGAG TCCGGATATT GTCATTTCCA CGAAAATCAG CTTGACAAGG GGAAGAAT'IT CTAGATCCTT ACAT'rCAAGC TGTTGGTGAG TATTCGTAAG CAATATCGTA AGCAAAATAA GCAT'rCTCCA AGTC.AAGCCA ATTrGAGAGCA TCAAAGAAAA CTTGGAACAC GATTTGCAGG ATATTGCTGG CGTCAAGGAA GTAGTGGATA TTTTGCACAA
AATGGCTCGT
CT'rACG'rGTG
GCGTCAGGAT
AG.A'XTACAT'I ACTCATAGAA TACGGITGAT ACCATCAATG GGCCATGAAT TTCTGGGCAA a. eaa.
CCCAGATGAG
AGAAATGGGT
TAGAAAATTA
GAATTGATC'T
ATCGTTTGAA
TTGGCGGGGA
TCCGCTTTAT
AT'rAAGAAGC GAAA'rTCGTG
AATGACGGTG
GATAGCCAAT
GAGAAATCAG
TGGTATGCTC
CGGTCTTCA'r
GCTAGTGACT
GAAGGTCTTT
AAGCATCAGG CTATCGTTCC GAGCTAAGAC TATTTTGGCA CGATAGAACA TTCTCTCAAC GACTGGAAAT TACAGCTAGA ATGATATCCA AGAAGCCCAG TAGGAAACAG TGACGATACA AGAAAACCGC AGAGTCAAAG TTTATACTCA ATGATACCAA TTGTCGGCCT TTCATAAGTA ACTrGGACATT TGGGCTTCTA AATTTGCAGC TAGATACTCG CTTGAAAATG GTGAAGTTAA GATCGAACCA TGGTGGCAGA 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580
TACAGATTAT
GGCAAGGGTT
GATTTTCAGA
CGTGATTTTG AGTTGGACAA TCTTACCCTG TTCTGAATGT a. *a a a a a. .a a a a GCACTCAACG AAGCCAGCAT CCGCAGGTCT TATTGTAATA AATGGTGTTC CCTTTGAACC TTTTCGTGGA GACGGGCTAA CAGTTTCGAC ACCGACTGGT ACTACTGCCT ATAACAAGTC TCTTGGCGGT GCTGTTTTAC ACCCTACCAT T.GAAGCTTTG CAATTAACGG AAATTGCCAG CCTTAATAAT CGTGTCTATC GAACACTGGG CTCTTCCATT ATTGTGCCTA AGAAGGATAA GATTGAACTT ATTCCAACAA GAAACGATTA TCATACTATT TCGGTTGACA ATAGCGTTTA TTCTTTCCGT AATATTGAGC GTATTGAGTA TCAAATCGAC CATCATAAGA TTCACTTTGT CGCGACTCCT AGCCATACCA GTTrCTGGAA CCGTGTTAAG GACGCCTTTA TCGGCGAGGT GGATGAATGA GGTTTGAATT TATCGCAGAT GAACATGTCA AGGTTAAGAC AAGA'rTAAGT TTCGAGGTGG CTATTGGACG TTGGAGACTA TTGGAGGCTA 'rTGAGCT'rCC AATAAACCCT ATGGAGTGGC TTTATCAAGG GTTACTATGT AGACTAGA'rA GGGATACTTC CGATTAGACA AGCAGT'rGCA 193 CTTCT'rAAAA AAGCACGAGG TTCTAAGGG ATTGC'rGGCC AGC'rATTCTG GTCAATAATC AACCGCAAA TGCAACGTAT CGTTACCATT GACATTCCCG ATTAGATATT CTCTATGAGG TTCTATTCCT AGTGTCAATC CAAGCAAAAT TATGAAAATC TGGCTTGATG CTCTTTGCCA GAAGAAATCT ATCGAGAPAAC AGAAGGGCAA ATTATTGCTC GGCTAAAGGC GGAAAGTATG TCACTTGGTC 'rATATTCACC TCATATCGGT TTTCCTTTGC TCAACGTCAG GCTCTGCATT
GGAGATGGAC
TCCATTATTA
GTAGCTTCTT
CAAATCCGAG
GGTAGTCTGG
CATCCATT
AT'TTGGAGCC
CCAGACGAGT
ATGGAAATA'r
TCCATTTT'TC
AAGATGGTAT
0..94 4 6 CTGAGAAAGG CrTTTGAAACC ArGACCACTT TCTAGTCIG ACTCTAATAC CATTGCCAAT- AGCAGGT'rCA CATTGTTACC AGCACGGTTA TGCCCATGCA GCTACTTTGC TTTGGTTAAG CGATTGCGCG TGATGAAGAT CCCATACT'rC ATACAAGATT TGCACACTGG TCGAACCCAT TGGGAGATGA TTTGTATGGT GCCAT'rACCT ATCCTTTTAT 'rGCCGGATGA =rTAGTAAC CTCAGAGTAT AATTATTATC GCCAACCT'rG TrGGTAAAAA CTTCAAGCAA CAAAACGCT CCTGAAAAAA 'N'AAAATTTA GACCCTCAAC ATTATCCTCA GGCAAAATGA CTGAAGAAGA GTGATGTTGG TTTACTTGGG GCTTCAACAG T'rCGCCCAGC TCAGGAGCCT TCCTCATGGT AACATCAATC CAGATGCAGA TAGAGCAAGA CTTGCAGT'rA GAAAGTCCCT CT'rAT'rACCC AGTTATCAAC TAATACTCTA TAAAAACTGT 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320
TTAAAGGAGA
TGCTCGTATC
AGTAAAAGAA
TCTTGAAATT
AAACTCATGG AAGTTTTTGA AAGTCTCAAA GTTCTCCCTG AAGGGGAAGA GCCTCGTATT ACAGAAGTGA TTCCTGTTTT GCTTGGAAAT CAAGGAATCA TGGATGGTTA TGAGGTCATC S. *a S S S
CTACTTTGGT
TCACTCAACA
AACTCGTACT
CTGTGCCATT
AATCACAGCT.AAGATTmG GCATCGAACC AGGTTCAGGG TTTGGTGAAA GCGTTGATAA CT'rGCGTCCT GACCTTGAAA TCGATGGTGA AAC'TGCAGCT CTGAAAGCTC CTGGAAGTAC CCCAGGTATC GAGGCAGGAA ATATTGGTTA 194 CAAGATGGCT GAACGCCTGG GTGGCTTTGC GGCTGTAGGA CCTGTTTTGC AAGGTTTAAA CAAGCCAGTT AATGATC'rTT CTCGTGGATG TAATGCAGAT GATGTTTACA AGTTGACCCT CATCACAGCA GCTCAAGCAG TTCATCAATA GTGAAAACTA TAAAGTGATA TACTIGTAGTr ATGAAACTAT GTACGAAAAG CACTGCCATT AATTCCTGAG CTGATTGGTG TCAAAAAGGA AAACTrCCAA GCGATGATAT CCTGTCTATA AGAAATCTGT AATATACATA TCCGTAAAAC ATGAAAAGAG AATTTIrTGG CTCTTTGTCA GAGAAAGGAC AAATTTCATC CTTTCTTT TGAAGTTTTC AAAGTTCCGA AAACCAAAGG TGGTCGCTTC CAGT'rTGGCG T'rAGAATAGT TATCTTTGAG GAAGGTTTTA AAGACAGTCT CCTCAATAAG TCCGAAAAAT TTCTCTGGTT GATAGAGCTG ATAGTGGTGT TTCAAGTCTT CTTTATTGGT TAAGTGCAtA CGAAAAATAG GGCTATCCTG TTGAATGAGT TTCCAGTAGC
TACTATGCTA
AACTAAATTA
CACGACCTAT
TTAAATGAGT
AGCTAAGCTC
ATCCGTI'TT
GATAAATTCC
ACTGTAGTGG
TGATA'rTCAG
CTTTTTGATT
GTTGAAGAAA
AGCGATAAAA
19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20199 CATTGCGCTT GATAAGTTTG ATGAGATTAT GTAGTTGAAG GGCGTTGATA ATCTTTTC'TT GAAAAATAGG ATGAACCTGC TTAAGATTGT CCTTATTCTG GAAGTGAAAA AGCAAGAGTT CCGAATAGCT CAAAAGCTTG TTTAAAATCT
GACGATAAAA
GCTTGATAG
TCGCTTATCA CTCAGTTTAC
INFORMATION FOR SEQ ID NO: 7:
SEQUENCE CHARACTERISTICS:
LENGTH: 19702 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ACCCGATGTA TCAGCGGATA TT'rACTCTAT TTT'rCAA.ACG ATGTTATACC CACAATAAAA GAAAAAAGAC CCTAAGGTCT CCTTP-GCTTT TATTATTAAA CGCGTTCAAC TTTACCTGAT TTCAAAGCAC GAGCTGAAGC CCAAACTTTT TTAGGTTTAC CATCGATAAG AACAGTAACT TTTTGAAGGT TTGGTTTTAC GGCACGTTTT GTTTGGI'CA TCGCGTGTGA ACGGTTGTTT CCTGATACAG TCTTACGACC TGTAAAGTAA CATACTTTAG CCATTGTGTT TTCCTCCTAT TAGATCTAAT ATAGCGGATG TGCTAGCACC ACATACCGTA CTATGTTATC ACATTTTCTT GTTTTTTGCA AGGGAATTGG AAGATT'TTTT ATTTGTGTCT TAAATCAGGT CTTGCGTGAC ATTTcTGCTC TCCACATGCC ATCGTTGATT AACAGAACAC CAGAATTAAA ATI'ATGTGTA TAAAAATCAT CTCTAACTGC AGCTAAGGGT ATAGCCGTCA AGTCCAAATC CCACAGCTCA TCTATCGATT TTCTTACAAC AATATCTGAA TACTTTGGAA TAAAATACCT AAAAAAGCCG AACCTN'CAG AATAI-rACTG TCAATCTAAA GTCTrT'r'rTC CATCAATTGG AACCATTCTC TAAGATTATA ATGATGAACA CAAAGAGATT CATTATCTGC ACCTAAGACA TTTAT'rCCCA TCATATTAT'r TATTTTCGCC TATTCGT'rCG CGGTTTACAA TTTTTACATT TAGTTAGTGA GGCAGT'rAGC TTTAGTTCCA ATTGTTGGTA TGTCCGCAAA CCAGCGGAGG GGTCTTTGGC TAAGATATTG TCCAACGAAC GGTCTTTTTA
ATCGCTTTTT
CCCATCATAT
TAAAACCATA
ACGACACGGA
TAGTTCGCCA
CTGAGTCACA
TCCAAATACA GTACACGAGA CTCGCTTACA CATATGAAAG TCCCTCAAAG GGGAGACGAT CATTCACAAT CTCACTATTC AAAGTCTCTA GCGGAAGGTC ATCATTAAAA ACATAAAACT TTATTGT'rGT TTCAACTTTA TCCATATAAG TCTCTTCTTT CACTTTTTAT CI'CATTTCTT GTTTCCCATC ATATGTT'rCT ACGTAACCAT CCAGTGGAGA TTTTAGATGA AGTCCCATTA GTTT'rACAAA TCGATTTCAT TTGCCAAACG AATAGCGACT AGCGTCCAAC AA'TTTGGAAC TCTTCTCCTC TAACTCTACG TCTGGATACT GCAAAGTCAT TTTCA.AAGAG AAAGACTGGT TGGTCAAAAC CGACTTGACG ACATCCGT'rC ATCCAAGTCC TCAGGCTTGA CCCATTGGGT TCATAACTAC TTCCGCATTG TACTCGCCTT CCATGCGGTG TTTAAAGACT TCAAACTGGA GTTGACCTAC AGCGCCTAGC ATGTACTCAC 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 CTGTTTGGTA ATTC?1ATAA AGCTGAACGG TGTGGAAGGA TTTTTGCTTC ATAACATTCT TAAAGGTTGG CAGGGGT'rCA AATTCAAACT CCTGATAAGT ACCGGTATCG TAAACCCCGA TCTCACGACT CTCCGCCATA AACTGGGTAA GACGAGATT GACACTCATG CCGCGCTCAA 'rACGGTCACG GTGACGAGGG TCCATGTTGG CCTTGTCATA AGGATCCACA ATTTCACCGT CAAACTTGAG GAAGGTr'rCA AGCAAGGTCT AAAAGACAGG CGTCAATTCT CCAGCCAGAA CAT'N'AAAAG CTCAATGTCA TCCTTGACTT TGTCCCCGTC TTCTAGACTG GCAAAACGCT CTCCTTCTTG CACCAATTGC TCAATCCCCT TAGCAGAAAC TTTCATGAAA ATCTCAGGTG TGT'TTCC AACCGTCAAG GTATCCCCAA ThA'rATCACC TGCCACGGCA TTGGTCACAT CATTAGA'AG TTTAGCCCCC TTACCAGTAC ATTCGCCAGA TACGATACGG ACAAAGGCAA CTTGGATTTT AAAGACAAAG CCTGAGAAAT CTGTTTT'CTT GTGACCATGT GGTTCTGGAG GCACACCAAA GTTTGTCAGG GCTGAACCGA TAGCTTCCTC TGAAAACTCA TTCCCGGCTT GCTCGTAGAA AGGATTGCTA CCAAAGAGTT CATCCCCTr'r GTA.AAGCTCT AAACGTTGGT TATAGAGGTC ATACAAGCCC TCAAAGGCTT TCCCCATCCC GATAGGCCAG TTCATAGGGT AGCTAGCAAT CCCCAAGATT TCTTCCAATT CTTGCAAGAG ATCCAAAGGC TCACGACCG'r CACGGTCCAG CTTGTTCATA ACAATTCTT GGTTTGACCC CCACCGCCAT CAACGTACGA AGATATTCAC GCGCTTGCCG 196 AAGGTAAAGA CTGGAATGCC ACGATGTTTC ACAACCTCAA TCGATCCCCT TGGCAGAGTC CACGACCA'rC ACCGCAGCAT TAGGTATCTT CTGAGAAGTC CTCGTGCCCT GGCGTGCTrA TCGTAGTCAA ATTGCATAAC AGATGAAGTA ACAGAAATCC CACGTTGCTT CTCGATATCC ATCCAGTCAG ATITTAGCAAA
TTACCGTACC
TTTTCCCCGC
GAATATTCAT
CATT-TTTACA
TrT'I-CAAI-r
ATAGAACAGA
GAGGTAACAC
AGCCTCACGA ATCTCACCCC CAAAGTAGAG GTCCGGGTGG GAGATAATGG CAAAGGTACG AAGTTCTCTT TC'rTTGAT'rC TCTATTTTTC TTGGATTTTA CCAT'rCCTTT CAACACTCCA TCTATTTCTT TTCACTTCCC CCTCCCTTAT CTAAAAATCA TCATTTCACG AAAGGATGCA ACGTTGCCAA TCTTTCAAAA TTAAGATTCT AGTCCCTGTT TTCTTCCCTT TAACTGCTCA GTGATGG'PTG ACGTTTCTTA ATTTCTTCTT TTGTTTCAAT AGCTGAGAAT TTATATCGGA TTTTAGCA'TT
TTATAGGAAA
AGATGAAAAT
CTGAAGAAGA
ATATGGTAAA
TACCCAAGAA
AACTGCTGCC
TTTGCGACCA CCTTG'rCTAA ACTGGTGTCG CACCTACTAC GCCGAAGAAG GAATAGACCG TATATCAAGC TGCCAGCTAT AAACTATTGA AGAGTTGCAC CCCAAGCAAC ACTTGAAAAT TCGCTGAGGA GCAAGCTCTT ACAATGTCCT TT~CAGGAAT'r TCACAACTGC TGCCTCAAAA TTGCCAATGC AAAAACCAAG CTATGGGTGG rrrCAGGTGAA GATTGTTGAC ATGGTTGAAT TGCTGGGCGA AGTTGACACA GACTATGGCT GACCGCAAGA CTGTACTCCG CCCTGATGTG TGATCGCT'rG TT'rAAAAACG TACCTGAAAA AGACAACTAC CCTAGACAAT GGAGGAGATG CCTAA'rCACT TTTAACAATA AATCTCCTTG TCTCTAAGGA AATTTCTGCA ACAGAATTCA ATCAACTCTC GTGAGGAAGC CCTCAATTrCA TTTGTCACCA GTTCAAGCTA AAGCCATTGA TGAAGCTGGA ATTGATGCTG 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080
CCACTTGCTG
ATCCTCTACA
GGCATGATTG
ACTTCACACT
TTAAGGATAA
ACTATGAGCC
'rCGTTGGAAA
ACGGAGCAAC
GCAAGTTCC TGGTGGGTCA TCAAGTGGT'r CTGCCGCAGC GCTTGTCACT TGGTT~CTGA'r ACTGGTGGTT CCATCCGCCA TCGTTGGTCT CAAACCAACC TACGGAACAG TTTCACGTTT GCTCATTAGA CCAGATTGGA CCTTTTGCTC CTACTGTTAA ACGCTATTGC CAGCGAAGAT GCTAAAGACT CTACTTCTGC TTACTTCAAA AA'rCGGCCAA GACATCAAGG GTATGAAAAT TAGGCGAAGG AATTGATCCA GAGGTTAAGG AAACAATCTT CATCTCTACA GACGGTATTC AATCTTTGAT GCGACAGCTG GACCAACATC GACGAATTTG TAAAAACGCT TGGAACCACA TGTAGCCTCA GGACAAGTTC ACCTGCTGCC TTCAACGGAA CGGTC'rCATT GCCTrTTGTA GGAAAATGCC CTCTTGCTCA TCCTGTCCCC ATCGCCGACT* CGCTTTGCCT AAGGAATACC AAACGCGGCC AAACACTTTG 197 AAAAATTGGG TGCTATCGTC GAAGAAGTCA GCC'rTCCTCA TTTATTACAT CATCGCTTCA TCAGAAGCTr CATCAAACTT GTTACGGCTA TCGCGCAGAA GCCAAGG'IT TGGTGAAGAG CAGGT'rACTA TGATGCCTAC ATTTCGAAAA AGTCTTCGCG CCTATGAC'rT CGATTCTCTC CCATACCTGT AAACTTGGCA GTCTACCTG'r CGGACTCCAA CTGCTGCTGC TTTTGAAGCA GTGACAACTA ATGAACTTTG CAATTCAAAA ATCTTCTCAC TAACGTGATT CACTGGTCTT GATGCAACCA ACCT'rGATGA GTAAAACGTC GTATCATGCT CTCrAAATAC GGTGTTGCCG GCAACGCTTC GACGGTATCC AATCTATC'rA AACAGCCGAA GGGT ACTTTC AGTCT'rTCAT CCGTACCCTC ATCATTCAAG TCCAACTGCT CCAAGTGTTG GTACTTAGCC GACCTATTGA TCCTGCTGGA TTCTCTCAAG
TACAAAAAGG
GATTACGATT
AACCATGACC
GGACTGCCTG
TTGATTGGTC
ACAACAGACT
AAACAGTCAT
CTACTTCTGC
TCCCAGGAGT
CTGGTCAAGT
TGATrrTTGGG
CAGTTGCCAT
GAATTTCGAT
CCAAGTACTC TGAGGAAACC ATTTACCAAG
ACCACAAACA
CGGACTTGAA
CCACTTTGGA
TCTACCAGTT
a* ACAACCCGTG ATTTTTGGAG GTCCACGTAG AGCTCAACAC AATGACCAAA ATGCCAACAC CTCAATAAAG GGGTTGTTGA CACAAAAAGA TGCACTT'rGA CAAATTTCTC AGT'rTGATGA GACGGTACGA CCAAGAAAAT AACACCCA'rG GTACAGATGG TGCCGGTATC AAGGCTGCTC TTGCCCTCAA CATGGACATC CCGCAAGAAC TACTT'CTATC CTGATALACCC CAAAGCCTAC ACCAATCGGA TATAATGGCT GGATTGA-AGT CAAACTAGAA CGGTATCGAA CGTGCCCACC TAGAGGAAGA CGCTGGTAA-A 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 CTACTCTTAT GTTGACCTCA ACCGCCAAGG GGTTCCCTTG A'rTGAGATTG TATCTGAGGC AGATATGCGT TCTCCTGAAG AAGCCTATGC TTATCTGACA GCCCTCAAGG AAGTTATCCA GTACGCTGGC ATTTCTGACG TTAAGATGGA GGAAGGTrCG ATGCGTGTGG ATGCCAACAT CTCCCTTCGT CCTTATGGTC AAGAGAAATT CGGTACCAAG ACTGAATTGA AGAACCTCAA CTCCTTCTCA AACGTTCGTA AAGGTCTTGA ATACGAAGTC CAACGCCAGG C2'GAAATTCT TCGCTCAGGT GGTCAAATCC GCCAAGAAAC ACGCCGTTAC GATGAAGCGA A'rAAAGCAAC CATCCTCATG CGTGTrCAAGG AAGGGGCTGC TGACTACCGC TACTTCCCAG AACCAGACCT ACCCCTCTTT GAAATTTCTG ACGAGTGGAT TCAGGAAATG CGGACTGAGT TGCCAGAGTT TCCAAAAGAA CGTCG'rGCGC GTTATGTATC TGACCTTGGT TTATCAGACT ACGATGCTAG TCAGTTGACT GCTAATAAAG TCACTTCTGA CTTCTr'rGAA AAAGCTGTTG CCCTAGGTGG TGATGCCAAA CAAGTCTCTA ACTGGCTCCA AGGGGAAGTC GCTCAGTTCT TGAATGCTGA AGGTAAAACA CTGGAACAAA TCGAATTGAC ACCAGAAAAC TTGGTTGAAA 'rGATTGCCAT CATCGAAGAC GGTAC'TATTT AAATGGCGCT GGCGCGCGTG AGCTA'rCr'r ATCCCAATCA CTTCAAGTCA GGCAAACGTA AAAGGCCAAG CCAACCCACA AAAGAAAACT AGACAGAACA CCAATAACTA TTTTGGC?1r TTTAT'rAAAG AGGTAAAAAC AAACAGCTGA CGGCAAATTG GAAACACGAT CATGCGTATG GCTACCGTCC AGAGGAAAAA TGTACAAAAT GGATGACACA TCCCTGTAGT CAATGTTGAA TCCAAT'rCTA CCGAACTGAA
CATCTAAGAT
AATACGTGGA
TCCACCAAGI'
ACGCCGACAA
AGTTGCCCTT
AAACCACCC
ATTTCCAGAG
ATGATTGAAG
ATTCGCGTTT
AAATTGCGTG
TT'rGAACAAG
GCATACTTCA
AACGAATTGC
GTGATCGGTG-
198 TGCCAAGAAA (3CTTrrGTCC ATC'rAGCTAA AAAAGCAGGT ATCCTTCAAA TTTCAGATCC CTTTGCCGAT AACGAAGCTG CTGTTGCCGA GGCtT'rACAG GATTCCTTAT GAAGGCAACC AAACTACT'G CACAGGAATT GGCGAAGTTG TAAGGTTGG'r TTTTTCT'rCT CTACCAACTC TATTTTATGG TAAAATGAAG AGTAATAATA CAAGTACCTT AAAAGCTGGT ATGACCTTTG TGGAAGCTAG TCACCACAAA CCAGGTAAAG ATGTCCGTAC TGGTTCTACA TTGACACAA CTATTATCGA GACTGTCCCA GCTCAATACT TGAATACAGA AACTTATGAC CAATACGAAA TTTACATCCT TGAAAACrCT GATGTGAAAA TCACCGTTCC TACTACTGTT GAGTTGACAG CTACTGTTAC AGGTTCTGGT AAACCAGCAA CAGACTTCAT CGAAGCAGGA CAAAAACTCG *b TTGCTGAAAC TCAACCATCT ATCAAAGGTG CGATGGAAAC TGGACTTGTC GTAAACGTTC 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 T'rATCAACAC
CTATGGGAAT
TCATTGCTAT
CTGATACCCT
TGCAGAAGGA ACTTACGTTT CTCGTGCCTA ATCTCTAGAA AGAGGTCATT TGAAGAACAA CTTGGCGAAA TCGTTATCGC CCCACGTGTA CTTGAAAAAA CGCTACTGCA AAGGTAGAGG GTGTTCACTC TTTTTCAAAC AGA'rCAGTGT TTCAALACTr TCACTCGGCC GTGGCATTTA TCTTAAAAAC GTGGACGAAG
AACTCACAGC AGATATCTAT CTCTACCTTG AGTACGGAGT TTrCCTATCCA GAAAGCTGTC AAAGATGCCG CTATCAATAT TCACGTTGCA GGTATCG'rCC
ATCTATTTGA
CGTAAATGCG
TGTCGTTTCG
ATAGACCTCG
CATTTAAAAG
TTGGGAGTCT
ATCGAGCTTG
CGAGGACTTC CTCAATGACT CTTTTCAAGC TCTCATGAGC CCTATACTCA TGATCGTGAA TTTCTGGTGT TCAALGCTAAA CAGGTTGGAC CATTGAACGC
TCCGTAATAT
CAGATAAAAC
AGTCCACTAT
CTTGAGTTCG
GATACGGATG
AAGGAAGAAC
TTAACGCTCG
AAAAGT'rCCT AAGGTAGCGG GGCTGATGTA GAACTCGCTG ACCAAAACCA GAATTGAAAG TAGAATCTAG ACGCCAACTC GTACGGATGT CGAAACTGCT TACAACTT'CC AGCCT'rTTTG TAGATAAGCA AATCACTCAG TGGAGAGAAA CCTCCTTCGC TGG'ITGCTGT TAA'rGAAGCT TTGAAATCAC 'rTCA'N'TGAC ACI'CCTCAGC CAAAGGACTT CTCCGATCAA AAATCTGCCC GTTTrrATCAA TGGACTGCTC 7620
AGCCAGTTITG
GCTAAGCTCG
TCCGTTTT
TGAGATTATT
TCTTTTCTTT
TAAGAT'rGTC
GCAAGAGCTG
CTAAAATCTC
TCAGTCTACG
199 TAACAGAAGA ACAATAAGGC TCTG'TCAA CTGTAGTGGG TTGAAAAAAA AGAAAGGACA AATTTCG'rCC MTTTTTT GATGTTCAAA GCGATAAAAA GAAGTTTTCA AAGP'rCGAA AACCAAAGGC ATTGCGCTTG ATAAGTTTGA GGTCGCTTCC AGTTTGGCAT TAGAATAGTG TAG'rrGAAGG CCTrTGACAA ATCq'rGAGG AAGGTTTTP.A AGACAGTCTG AAAAATAGGA 'rGAGCC'rGCT CTCAATAAGT CCGAAAAATT TCTCTGGTTC CTTATTCTGG AAGTGAAACA ATAGAGC'rGA TAGTGGTGTT TCAAGTCTTG TGAATGGC'rC AAAAGCTTGT TTTATTGGTr AAGTGCA'rAC GAAAAGTAGG ACGATAAAAT CGCT'rATCAC GCTATCCTGT TGAATGAGTT TCCAGTAGCG CTTGATATCC TTGTATTCAT
9*CC C. CC C S GGGATTTrCG ATGAAACTGA TTCATGATTT GGACACGCAC ACGACTCATG GCACGGCTAA GATGTTGTAC AATGTGAAAG CGATCAAGAA CGATTrAGC ATTCGGGAGT GAAACAGTCT GGGAGACTGT TTCAGCC1'GA GCCTAGGAAT TTGAAAGCGA AGCTGTTTAG CCAAGTCATA GTAAGGGCTA AACATATCCA TAGTAATAAT 'NTrGACGcA CATCGGACAA CTCrATCGTA GCGAAGAAAG TGATTTCGAA TGATAGCTTG TGTTCTACCC TCAAGAACAG TGATGATATT GAGATTGTTA AAATCTTGCG CCAAC-ACATA ATCTCAGGAA ACGAATAACA GTTGAAGTTG TTCAATCAAC TTTTGAGCAA GACGATAGAA GTTCAGCGA TCTAAGGAGA ATTCTAGTAG TTGAAAGTCA TATT'rCTTCA CAGTTTGGCG ATGATTTCCT AGGGTCT'rTA ATGTCTAGTA AATGAGTrGT TTTGTCGCTT TAGGCTCCAT AATATCTATA AAAGAAAAAG TGTTTGATAG AA'rGATAATA GATATAAT'rG CAATGAAGCT CATCT'DTCCC TTTGTAAAAG CATACTCATC GACAAGAAAA ATCATGTrTA AAGTGAAAAT CAT-TGAGCTT AGATGGAAAG CTGATGGGCA ATATCAGTCA TAGAAATCTT TCTTTTGGTT GATGATACGA GGGATTTGGT GAT'TTTCTT CCATCATTTT TGAACAGTGA TAGCACTTGA ATCGACGCTT GCATACCAGT CGTTTCAAGA TAAGGAATTT TAGAAGGTT'r 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 ATTGGTTTCC GCACTCAGGG TGTGTGTATC CTTATTGATG ATTTTGTGAT AAAATGTAAT TTCATTATAG GTCATATGGG GGGGATTTAC CCACTACAAA ATATCAAACA CTTTTrTCTT TAAACAAAAA TCCAGA'rAGG
CAAGATGGGG
ATGTCTAAAA
TGTTCCATAT
ACTTTrTTrC
TATTATAGAG
TGCCTCCCAC
T'TTrGCATGA
GGTAGAACGA
TGGATCTCTG
CGTCGTAGTC
TCTGGATATT
GAATCTTTCT
'IACAATAAAA
CCAACAATAA
TATCTAAAAA
TTGAGAAAGT
TAAACAAGGG
TGAGATAGTA
TAAAAAAACT ATGGCAGAGA ATCGTTAATC TCAGATr'GTC CAAAAAAGAA ACCAATCAGA CTATAATATA ATAAACTAAT 200 TCAAATGGCT AATCCCAAAG ATGATAGCAG ATAGGATAAC ATCCAAATAG TACTTGGACT AGGGAAAGAA GGTATT-CATA AAATACCCTC TATCAAGAGT CTCCTCAAAA ACAGGACCGA TGATDACAGG CAGGACAAAA GATAAGATAG TCGATAAAAA GGTTGGTTGT CCATTTGAAA AAAGCACGGT AAAATACTCA TCATGAATAT TCCTATGA'rT AATCAAATGA GCATAGCGTG CCCAAAAAWr ACCGAGAATC TGATAAACCA CATAAGTTGC 7U~TAAGTAG ACCAGTTCCA GCTCTTTTTC TTAATAGAAG GAAACTTCCC CTAACCCTAA AATGATCGTC AA.ATAATAAG CAAAAATATT TTTGAAAGAT TACCCTGCTC CCCTTGCCTC ATATAGGGAG CCAAACAGCA AGACCACCTG ATTTGTACCT GTCATTGCAA GTATATTTCT TTTTTTGTCT TCAAAGATAA AGACCATCTT TTTCTrTTTTT ACTAATCCCA TTGTTAAAAT AAGAGAATAG
ACATACAATC
CCAAATTGTC
GGAAGCCGTA
CAAA 1rCTCT
AAGTTTGCTC
GTACCT'rCTT
ATTTTATAGC
CAATTGTTTG
r'rAGTTTTTT
CTTCCAAGCA
ATAATATAAC
CAAGTCCTCA
AAAATAGATT
CCATCTCCTC
TGGTAAATAG
TGTGTTTCTC
TCTATATAAG
CATC'rACTAT
GTTGAAAGAA
AAGACAAATG
AACCTCCAAA
ACATCAGCTC
GTAGATAGTA
ATCGTACT'TT
AATTAAGTGC
ATCCATCTTC
CI'CTAAATGT
GTTGTAGGCT CACATTTATA AACTGGCAAT TTTTCGACCT CATT'rAGTCA TTCTTAGTAT GAATTACATT TTTCCATAALA AAATGAGACC TTTCTAGTCT 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160
TTTCTAAATC
GCCAdifTCCG GT'rGATAGCG TTCTTCCAGC AACTCTTCTA GCGGIrTTTG TGAAAGTCTA TTTGGAGT'rC TTrTTTrGACA CTCTTAATCA GTTCTTTACT AGAAAGTCCT AT'rTCAGAAA TCACCTTATC CACCACGTCC ATCAGTTCTG CTGCTTCCAr AGCGCGAGTA TCTGGACTGA GAATGGCATA GATAGAATTT AGAGCCAGAG CCCCGCCTGA ACCACCTTCA AGGTCACTCA TTTCCATGAG ATTGCGAGCG CCGACACCAG GATAAGCACC TGCTGTATTG TCAGCCTCTT TCATCAACCG CAGTGCCT ?rCCGTTTGA GGTTGTCTTG CAAACTCTTG 'rGGTCTCCAA GCCALACCAAT ACCACCAACA ATTTCTAACA GTTCATGCGA AGTGATTTTC CCGTCCTTCC ATAAAATGGA AGCAAAGCCT TCCAGCATCC AGACACGGTC CGCGACAGCT CCGATAATAA TGGCGATA-AT AGGAACTTTC ATACCTTCCC CTTGACCACG TTCTTCCGCT ATAAAGGTCA CAACTGGACG GCCAAATTTC CCATGTAATT GGATAAATTC CTCTGCTCAC GCGCT'rCTCr
ATCAAAAATG
GACTATTTTT
CGGTAGCCTT
CCTTT'rTGGA
ACTGCACCAT
CCTGTCGCAA
GCAATATTCA
AGTCTCTTCT
CTCGATG'rGG TTGGCCAAAA TACCAACCAC TGTTACAGCT CATCACGAAA AGAACGGTCA AGTCCAAGGT TGTCAAGCGA TCTAGGACTC CCTCCATGCA TTTGACAATA GCATCCACAA ATCTGACTAG GCTAGCAATC GTATCTGGTA AGCCATGTTC TAATAGGAAr TCTGCCTTrr' GGAAATCCTC AGGCAAGCTT TCACCAACCG 201
TATITTTCAAT
TATCGCCTC
TCAGGTAAAA
GCATGAGACT
CTGGCAATTTr
TACCCATAGA
TAATAAGAGC
CCAGTTTCTT
ATTC'T'TGAA
CACACGACGC CCAGCAAAAC .CAACCAAGCT CrGTGG~rCA G;CCAGAATGA CATACCGAAA GAAGCTGTCA CACCACCAGT CGTTGGATCT GTCAAAATGG GAGACCAGCA TTGAATGGC GTTTAMCCGC CGCAGAGATC TTAGCCATCT CATGATTCCT TCCTGCATAC GGGCTCCACC AGAGGCTGTG AATAGGACAA TTCGACAGTC GCA'rACTCAA ACAAACGAGT GATTrTCA CCTACAACCG AGCCATGATA AAGTTAGAAT CCAPAATCCC AAGAGCCACA GTCTGACCTT AGTTCCTGTC ACAACGGCTT CATGCAGACC TGTTTTrCA CGCATAGATG TTGGTAACCA GGGAAATGCA AGGGATCCTr GCTTTCAATC CCTGTAAACA GGr1rCCCATA TCAATCGTCA AAGCCAAGCG TTCTTGGGCA GAAATACGAA 11220 11280 11340 11400 11460 11520 11580 11640 11700 AGGTATAGCT ACAGTGCGGA CAGATACGTT GCTTACAGCC TGGACACTGG GAAAATAATT TTTCCCTAAC CGAACGATTG GGATTGAT'rC CCATTGATTC CCCTTTTCGG TTTAAACTCT TAGGTAAGAA GGTTTCCATC AAGAAGGAAG CTGAAATGAG GTCAAGCTGG AAATC'rGCAT AGACGGCACG TTGCATrmTC ATCAAPGCCGT TGGCAATCAT ACTArCATAA TAAGGCGGAA CGCGCAAGCC AACTCCACCA CTTGGCAGAT CAAAGTTAAA GGCTGGGTT'r TCTGCATTGA CAATATCTTC TTGCTTAACA GACAAAGGCT CGATATCAAC ACCTGAAACA AACTCTGTTA TCTCCATGAA ATAGAAATTG CTACTTGCTT TCTCATAGCC AACAAACTC1' GCCGCTCGAA TTTrTTCCGAT TGCAATCGAG GGACTTTCTT CACTTCCCAG ATCCTTCTGA TAGATGGTAT CATCTGGAAC CTCTGGCTTA GCTTGAGGTT GAATATACTT ATCTTTT'rTA CTAAATAGAG
TAAAGTCATT
TATCATAATC
TGGTCTGCAC
CAAAACGATT
TGGTATAACC
AGAGATTAGT
TACGACACTC
GACCTGCCGC
CTGGATGT'rC
CATCAAGAAG
CAGCAGCAGC
CCAAPLACCTT
TTATTCTTTT TCT'rGATATT CCCAGCAATG ACATTGCGAT TCCTTCAATT TCTAATTCAT TTrCGCCGTGT ACTATGATT TGGATAAACT GCTGAATCCA AATCTTACCT GGACT'rGGAG GATGGCATGA CCGCGTAGGA AATGCAAATC TGTTCCTTAA TACCTGAACA CGAGTATTCA AAATTCAATG G'rTCCTGCAT ACCTATTTCA TGACGCAGCG TrGGT'rATTC CT'rTGAAGAG ATCACCTAGG ATTTGAACCT 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 AACAATCCCG TTCACCCAAG TGAATCACAT GTCCATGCTC CAATGTGCCG AGCTCGATAG ATAACCCGTT CTATGTACAT GGCACCATrG CCATAATTGG CCTTGGCCTC ACTAGAGGCA GTTTCAAAGG CAGAAACGAG GTCATC'rGGT TTTTCAACCT TACGAATCCC TTTACCACCT CCACCTGCTG AAGCCTTGAG CATAACAGGA TAGCCAATTT TTTCAGCAAC AATCAAAGCT TCTTCAGAGT TATGCACTTC TCCATCTGAA CCTGGTATAA
CAGGCACACC
TAACATGACC
TGGAA'N'TTC
CAGCTGATAG
AAACTGCTTC
202 TGCTTTAATC ATCTGAGCAC GCCCATTGAT AGATGGACCG ATAAACTTGA TACCTACTT'C ACTGAGAAAT CCAAAACCAG GGTGAATAGC AACTGCAI'A ATATTGAGAT AAGACTCTGT ATCTGCCAAA AGCGTATGAA GAGCT-rCCT'r CTTATCCCCC ATCATATCCA TTCACACATG GTCGCAAATT rrCTGCCTCA GTCAAGACTG TCCCTGCCA GGACCAATAC CTACCGTCGC AA'rCCCCAAT TCACGTGCCG
GATTGGCAAT
AAGGGTACCA
GGTGCCACGA
CTTCTTGA.AC
TTTTGATAAC
CATAACTGGG
GATAGCAACA
TAAAATTTT
CTGGCTGCAA
CGTTTTACAA
TTAACCTTGT
CGAAACATGG
GCTTGCCATC
CACGGATAAT
AGAACCTCCT'
CACTTCAGCC
ATCAGCAGT'r
ACGAACCGCA
TAGrTCCCAA
TTTGCTTCAA
GAATAAACCG
ATTTCACCAC
TTGCAAAAGT
CCACAGCTAT
AAGTCGCTGT CATAACCAAT CCATACCAGC GTAAAAGACC
ACGGTGGGGA
TTACCAAACT
GGAGCTGGGA
TTGACACTAT
TCCAACACAC CGGCAGT'ITG TATTGAGGAA AGTGGCCGTT ATGGTATCCT CGCTCACTTC AGAGCTTCTT TGATTCCTTG CAACCATTTC TTCGTT'AGAG TTTCATTCAT GACTTTCATG CACCAACTGT AACGAAGGCA
CGCCAAGGCT
AAAGAAAGGC
CAAGACACGG
TGGTCGCCTG GTACAACTTG ACTTTTCCTT TATTTTCPAGG TCCATA.ATCA CAACACCTGG TCGT'rGATGG TCACATTITT TCCACTAGAA GCATAGGATA AATATCGATC ATTTGATACG TACCAATCCT ACGAGAATTT CCGTTACCAC ACCATCCTTA GCTTCCATAA TTACCAATGT TTGACCTTTT CGTTTATCTG GTCCAGCAGC CAAGTAAACC CCCTCAGTAG CCACACTTGC TrCAGCTGGA GGAGCAGATG TAGGAGCTAC TGGACTCGGT 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 ACTCCAACAA GTGGACTCTC TACAAGATTT GCTGGAACTT CTTCTGCTAC AGTCTCTGCT GTTGCTAGAA CGGGTGCTGG AGCGACTTGA GTTGCAACTT CAGGCACAGG TCTTGCTTCA TTCTTGCTAA ACTGCAACTC ATCCGTCCCA TTTTTATAAG AAAATTCTCT CAAACTTGAC TGGTCAAATT GAGTCATCAA GTCT'N'AATA TCGTTTAAAT TCATACTTAT CTATTCTCCC AACGTGTAA AGCAAGAACT GCATTGTGGC CTCCAAAACC AAAAGTAPTT GAAATAGCGTr ATGGAATTTC TTT~CTCCAAG CCTTGTCCAT AAACGACATT ACTCGATA TAATCTGATA CTTCACTTGT CCCAGCTGTC ATTGGTACAA AGTTATGACG CATAGCTTCG ATGGTGACGA TAGCTTCTAC TGCACCCGCA GCCCCCAGCA AATGTCCTGT AAAAGACTTG GTTGATGATA CAGGTACTTC CTTACCAAGA GAGTTGACGT TCCGTGAGCA CCAAGGCTAG TTTGATGCCC GGTAGGCATC ACAAGTATTT ACAGCTACGA TAGCACCACT TTCTCCTTTT TCATTGGCAG TTGACATAGG CTACTTGCTC TGGAGAAATC TCAGCTTCTT TTGATAGCTC CCTGACCTTC TGGATGTGGA GAAGTCATGT CCGTAACCAA CCACTTCAGC CAGGATAGTA GCTCCACGTT 203 rTTCAGCGTG T'rCAAGACTT TCTAGAACCA ACATCCCTGA ACCrrCACCC ATAACAAACC CATTGCGATC CT'rATCAAAT GGGATCGAAG CACGAGTTGG ATCC1'CTGTA GTAGAGAGAG CTGI'AAGGC TTGGAAACCA GCGATGGCAA AAGGTGTGAT AGAAGCTTCT GTTCCTCCCA CCAACATCAC ATCTTGGAAA CCAAACTTAA TGGAGCGGAA GGCATCCCCA ATCGCATCAT TTGATGAAGA GCAGGCAGTA TTGATAGATr TACAAACACC GTTTGCACCA AAACGCATGG CTACATTCCC AGAAGCCATA TTTGGTAAAG CTTTGGAAG AGTCATTGGT 'N'GACACGTT TGGGTCCTTT TTCATGAAGG CGAAGTACCT GATCTTCAAT TTCCTTGATT CCACCAATAC CAGATGCAAC GATAACACCA AAACGATCCC TATTAAGAGC CTCTACATCA AGATTGGCAT *t GATTTACAGC CTCTTGGGCT TATCTTTTTT TACAAAGTAT CATCAAAGTC ACTATGATCA AACTAT'TCCA AAATTCTTCT CCACTACTCG ArTTTAGTTTC CATCAATGGC AACCACTTGT CTGCAACCTG CTCTGCCTGC TAATCTTATC TGACAGGATA 
TGACTCGTAT ATTCCGACTA CAGCCT'rAGA AGCAGCATAA ACATATTAAT GATAGCACCT TATTAAAGGC ACCAGTCAGA TGAGCA'rAAG AGTATCTTGG TTATCGAACG GAAAATCTTG AATTTTGTAA TGCCACCAAT GGTGTATTTC CGATTGGAGA ATTCTTTTCA CCTCTAGCTT CCAGTTAGAT AATCTTGGCC CCAAATCTT TCATCGGAAT GCGGTCATAT CAGACTCAAT GCGACCTCGC GTGCCACAGA TTAGCTTGAC CAATATTCCC TCTCTGGCTT TCATCATCGG TTGACCTTGA GCACTTTTTC GTAATCCCTG CATTGTTGAC GATTTCTGCC GCATTATGCA GCCGATTTTC CCAGT'rGCTA TGTTACTCCA TAACCTGTTA TCGCTACATA CTTAAGCCAC TGCTAAAAAT ACTGTCAAAT CTGAGCTAGT GTAGCTTCCT CATTCCTGGA GCAATCACAT CTTGGTAAAG CCAATCAAGC CATCAAACCA ACAACACTAG TTTCAAGACT GATTGTGTCA AAAATCTGCT TCTGTCATCT CAAAACATCT ACTGAACCCA AAAATCTGA'r ACATCTCCTG GAGCAA'rTCT TCTGAGATTG AAACTTGTCG GCGATGGCAA GCATACAAGG CATArAAAGA ATAGTTATCA AAACGGTTGG 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 GTTCTGCAAT AGCTTGATCA ATCATACGCT AAATCGGAAC CACCTTGATA CCATAGTT'rG CCCCACGACT GTTI'AAGACA ATGTTGGCTC GACCAATTCC ACGACTCGAA CCTGTAATAA TCCI'?TCAAA ACTTCTACTT ATTT~TAGTCT GATCTTCCAC ATGAGCTAAG TGAGCAGTTT CTTTCCCCGG TCCAATCTCG ATAAAGTTGC TTTCATAGAA ACGAACGGGT TCCTTGACCT
TAGCGTCTGC
AAAACTCAGC
CTGCTTGAGC
ACATATTTTT ATGTTCTAGT TTCATTTTTT AT1-rTTCTAA AAGTGCTACT AAACTCGCTT GATCAATTTT TTTAACAAAA TTATGCCTGC TTCTTGCATG GACGCGTCAA GAGCTGAGCA
CCTGACAAGA
ACCCCAATAC
ATGTCCTCTT
204 TTTGCATCAC AGCAGCTrCT GTATTGCCGA CTAGGGACA AGTAA).ATCT GAAAAACTTA 16500 CCTGAGC'rAG AGTTTCAGCT AGTTTCTGGC TAGCAGGT'rC AAGGAGAGCG GTGTGAAAGG 16560 GACCTGACAC CTTAAGAGGA ATCAAGCGTT TGGCACC'rGC TTCTGCAAA AGTTCAACCG 16620 CTCGATCAAC TGCAACCACT TCTCCAGCAA TGACGATTTG TGCACGTGTG TTrATAGT'rGG 16680 CTGGAGTAAC CACTCCAAGT TCAGAAGCTT TTTGACAGGC TTCrr'rCAATG ACCTCTAC'rG 16740 GCGTATTGAG AACTGCTACC ATCT'rGCCAG AGTCAGCAGG AGCCGCTTCT1 TCCATATAGG 16800 CTCCACGCTT AGCTACCAAG GCAACCGCAT CT'rCAAAATC CAAGGCGCCA CTTGCCACCA 16860 AGGCAGAGTA TTCTCCAAGA GACAAACCAG CAACCATATC AGGCTGATAG CCCTTrr= 16920 GCAATAAACG GTAGATAGCA ACCGAAGTCG CTAGAATGGC TGGTTGCGTA TAGCGGGTCT 16980 GAT'rGAGTTT GTCTTCTrCC GTA'rCGATGA GATAACGCAA ATCATAACCG AGCACCTGGC 17040 'rCGCTCGATC AATCGTTTCT TTAACAATCG GATACTGATC ATAGAAA'rCC CGTCCCATCC 17100 CTAGATACTG GGCACCTTGA CCAGCAAATA AAAAGGCTGT TTTAGTCATT TCTTACAACT 17160 CC'rGTCCAGC GAGAGGCTTC TTCTTGA-ATT TTCTTAGCGG CTCCGTAATA CAAATCTTTT 17220 AGGATTTCTT CAGCTGTTTC TTCTTTAGAA ACAAGCCCTG CGATTTGACC TGCCATAACA 17280 *GAGCCACCAT CCACATCACC G'rGAACAACT GCTTTGGCTA GAGCACCTGC TCCCATTTGT 17340 *..TCAAAGATTN' CTAAATCAGG ATCT'rCTTGC TTAAAGGCAT CTTTTTCAGC CAGTTCAAAA 17400 *..TCTCTAGTCA ACTGATTTTr AATAGCACGA ACAGCATGAC CAAAGTGCTG AGCTGAAA:TC 17460 999GTAGTATCAA TATCCCTTGC TTr'rAAAATT TTCTCCTTGT AGTTTGGATG GGCAT'rCGAC 17520 TCTTTTGCAA CTACAAACCG TGTCCCCACC TGTACAGCCT CTGCACCTAG CATAAAGCCA 17580 GCCGCAGCAC CTTCACCATC CGCAATTCCr C(rGCAGCAA TAACAGGAAT AGATATAGCT 17640 *..GTGGCTACCT GTCGCACCAA GGTCATGGrr GTTAAT'TAC CGATATGCCC CCCAGCT'rCC 17700 ATTCCTTCTG CAATAACAGC GTCTGCACCG ATTTTTTCCA TGCGTTTAGC TAAAGCGACA 17760 CTAGGAACAA CAGGAATAAC GATTATCCCA GCTTCATGGA AACG'rTCCAT ATAC'rTGCTT 17820 GGATTTCCTG CTCCrGTTGT GACAACT'TTA ACACCI'TCTT CAATA-ACGAG ATCCACGATG 17880 TCTTCCACAA AGGGAGATAA GAGCAfGATG TTGACCCCAA AGGGTTTATC AGTCAATGAT 17940 TTGATTTTAT CAATATTGGC CTTGACAACT TCTTTCGGGG CATTTCCCCC ACCGATAA'rT 18000 CCTAATCCTC CAGCCTTGGA AACAGCCCCT GCCAAATCAC CATCAGCAAC CCAGGCCATC 18060 *CCTCCTTGGA 
AAATAGGATA ATCAATCTTC AATAATTCTG TAATACGCGT TTTCATAGTG 18120 CCTCCAACCT TCCTTGC?1'A CGTAATAGTT CGATTTCACC ATAATTTGAC AGTCAAACTA 18180 TTACCTAAAC AAGAGGGAGT GGGTr'rCTCC CTACTCCTC TACTAATATT CTGCTTATTT 18240 205 TGCTTGCTCT TCAACGTAAG CAACCAAGTC ACCAACTGT-r GATTTGGATA TCAAAAGCAT CTTCGATTTC TGAGATTACT TGCGTCCAA.A TCATCAAAAG TTGATrCAAG TGTTACTrCT T1'CAACGATA ATTTCTrGTA CTTTTTCAA.A TACTGCCATG ATAGTTTTTT TATAACA.ATG TGTTCACCAC ATGATTACCT CCCCAGGTCA AGCCTCCACC GAAGCCTGAT AGAAGAACAG ATGAGACCTT GTTCTACACA CTCTrcAAAT AAAATCGCCA TTCAAGTCAT TrTCTGCTTC TGGAACAAGT CCAATGAATC GATGCGTCTT TTCCAAGTTC ATAGGACTCC TTTAAAATAA AAATTGTAAG AATGAGCGTG TCTGGCTACC ATCTAAAGGG TACTGGCTGC ACTGGTATTG CACCAATT'rT TCTAGCCATC AATCCAAGTC TGTCACCTCT CATCTCGAAT GGCAAAATCA
CCATATTCCA
TTATCCAAAA
ATAGGAGATT
'rCATATTGGC TGGAAG'N'TG GCTCGGTCAA.
TACGGTCATT GGCTTGATGA AGTAGCAGAT CATCAATAGT CTGCT'rGATA GACTTGGCTA
AAGACTGTGC GTCCATCCAT GAATGTAAAC CTGAATGCCC CTCTCAGCTA AGAAATGCTC CCAAACAACA CAGCTGTTGA CCAATCACCA AGCCTT~T-TG GCAAATACAA ATCCACTGCA ATATTAGCTT GAACACGAGC AGGATGATAA AATCCAGTTC ACCTCTGTAG CCAAATCACT GTTCGACTTG AAATCCACTC ACCACTTGCT CTGGCACATA AATCCTCCAA. AAATTGGTAA CGCTCATGCC ATCAATAATT CTCAAAAAC GAATCTGCAC TTTCTTGATC TGAAAATGGA ATAAGTTAAA CACTCGCTGC GACTTCCATC GCTATTGAGA TTGCTCGCTA GCTTCTAACA AGACACCACC AGCACCATCT TCGATCCGAC CAATCGACTG CCTrAGAGAG GGTTTCACTA AAAGCGACCA GAAGCGATAA ACTTNTCAGC AGTTGAAAGA 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 195 19620 19680 19702
AGCCGCGGTT
AGCTGTAGAG
TTCTCCTGTT
GGTAGATTCT
ATCATTGGTA
ATGAGCAACC
AGATTAGTCA
TTTTCTACCA
AAGTCAAAAG
GGCATCATCG
ATTCCAGCTT
GTTCTTGA.AA
TCCATAATCT
TGACTTATTT
AACCTTTACC
TGGCCTTGTG
CAAAGGCTTT ATTAGCACCA AATCTGGAGT AATGGTAGCT TTGCCATCAG 'NTCTTAGCA.
TATGCCTTTG TCGTATTCCC GAGCCAAGTC GTGATTTGTA TTGCAAAAGC CATTATTTCA CATGACAGCA ATTTCT'rCCT GAATCAAGCG ACCCTrCT'rT GTCAAATGCA GATG(ACCAC ACGACGATCC TGTTCTGACC GAACTCGCTC AATGTAGCCC GG
INFORMATION FOR SEQ ID NO: 8:
SEQUENCE CHARACTERISTICS:
LENGTH: 6211 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GAAAATTTCC TCTCTTCTCT TGAAAAATTT TGAAAAAATG TTTTAAGAGG AAAGAAAGGG GAATAATGGA GAAAATCAGT GTCGGACCTA GTTTTGGAAA CACTTCGTGA TTTACGAGTT 'rGGTGGTGCG GTTTTGCCTT ?I'TATGA'rGC GATATATAA'r TCTAGGGCGC CATGAGCAAG AAAGTTGGGT GTTGCCGTCG TGCGGATGCC ATGAGCGATA AGGGATTGGG AAGGATGCCT TAAGTACAA' TACCAAGTTC CCATATCGCA ACTACAGGCC TGCT'rAGAA ACAGACTTCA GTTGTITrGCA TGAAGCTGAA TCACTAGTGG ACCAGGAGCA GCGTTCCCCT TTTGGTCTT TTCAGGAGGC AGACATCGTG GTGAGACAGC TGATATTCCG GTATGATAGT AACAAGTTAT TTAGAATCTC CTAAGACGGG GATACCATCT TTGGTTATCC TTTAAAGGCA TTCGCCACAT GGTTATGCCA AATCAACTGG ACAAATGCCA TTACAGGGAT ACAGGTCAGG TGGCGCGAGC GGAATTACCA TGCCAATCAC CGTATCAT'rA CGGAAGCTGT
TCTTGAGCCG
GCCAGTCTTG
ATTTGCAGAA
AACGACTCAC
GTCCAGGGCC AGTTGTAATT GACCTACCAA AACACATATC TTTAT'rCACC AGAAGTGAAT TTACCAAGTT ATCAGCCGAC AAATCAAGAA AATCTTGAAG CAATTGTCCA AGGCTAAAAA GTGGAAT'rAG TTATGCTGAG GCTGCTACGG AACTAAATGA
AATGATATGC
TTAGCTGGTG
CGCTATCAAA
CCACTCTITTC
TTCCAGTGGT AACCAGTCTT TTGGGACAAG TTGGAATGGG AGGCATGC.AC GGGTCAT'rCG
TGCTATGACG GAAGCGGACT TTATGATTAG GGGGAATCCT AAGACTTTCG CTAAGAATGC TGAGAT'rOGC A.AGATTATCA GTGCAGACAT GCAAATGTTG CTAGCAGAAC CAACAGTTCA CACTAAAGAC AAGAATCGTG TTCGTTCT'rA AGCAGTTATT GAAcGAATTrG GTGAAT'rGAC TGGTC.AACAC CAAATGTGGA CAGCTCAGTA
TATTGGTTCT
TAAGGTTGCC
TCCTGTAGTT
CAACAACACT
TGATAAGPAAA
GAATGGAGAT
TTATCCCTAC
CGTT'TCGATG
CACATTGATA
GGAGATGCTA
GAAAAGTGGA
GAGCGTGTGG
GCCATTGTGG
CAAAATGAAC
GCAGCAATCG
GAACGAT'rGC 840 CAGCAAATAT 900 ACCGTT'rGAC 960 TTGACCCAGC 1020 AGAAGGCCTT 1080 TTGAGAAAGT 1140 TTCAACCGCA 1200 TAACAGAC:T 1260 GTCAGTTAGT 1320 GTGCTAAAAT 1380 GACTTCAGGT GGTTTGGGAA CAATGGGCTT TGGAATTCCA TGCTAACCCA GATAAGGAAG TAGTCTTGTT TGTTGGGGAT CCAGGAGTTG GCTATTTTGA ATATTTACAA GGTGCCAATC 'rCATTCACTT GGAATGGTrC GCCAGTGGCA GGAATCCTTC GTCGGTCTTT GATACCCTTC CTGATTTCCA ATTGATGGCG CTATAAGTTT GACAATCCTG AGACCTTGGC TCAAGACCTT GGTGGTTTCC AAATGACCAA AAGGTGGTTA TGCTGAACA-A TATGAAGGCA GAACATCAGA CAGGCTTATG GTATTAAAAA GAAGTCATCA CTGAGGATGT 1440 1500 1560 1620 1680 207 'rCCTATGCTA ATTGAGGTAG ATATTTCTCG TAAGGAACAG GTCTACCAA TGGTACCGGC TGGTAAGAGT AATCATGAGA TGTTGGGGGT GCAGTTCCAT GCGTAGAATG TTAACAGCAA AACTACAAAA TCGTTCAGGA GTCCTCAATC GCTTTrACAGG TGTCCTATCT TTAATATTGA AAGCATCTCT CTTGGAGCAA CAGAAGATCC GAATGTATCG TTATTATTGA TGTTGCTTCT CATGATGAAG TGGAGCAAAT CATCAAACAG AGATTGATGT GATTCGCATT CGAGATATTA CAGACAAGCC 'rCATTGGAG LlITIGGTTAA GATGTCAGCG CCAGCTGAGA AGAGAGCTGA GATTTTAGCG CTTTCCGTGC AACAGTAGTA GACGTAGCGC CAAGCTCGAT TACCATTCAG ATGCAGAAAA GAGCGAAGCC CTATTGCGAG TCATTCGCCC ATACGGTATT CTCGAACGGG TGCAACTGGA TTTACCCGCG AT'rAAAAATC CAAC~tAAAT AGCCTAAAAG GCAATAAATA ATAGAAAAGA GAGAAAAGCT ATGACAGTTC TGAAAAAGAT GTTAAAGTAG CAGCACTTGA CGGTAAAAAA ATCGCCGTTA ?rCACAAGGG CATGCGCATG C'rcAAAACT'r GCGTGATTCA GGTCGTGACG TGTACGTCCA GGTAAATCTT 'rrGATAAAGC AAAAGAAGAT GGNITrTGATA
CG'TCGTCAGG
CGTATCACTA
CTCAATCGTC
CGCGAGGTGA
ATTAT'ICAAC
ATGACGGGAA
CGCAATATI'G
TTATTAAACC
AAATGGAATA
TCGGTTATGG
TTATTATCGG
CTTACACAGT
AGCAGAAGCT ACTAAGTTGG CTGATGTTAT CATGATCTTG AGAATTGTAC GAAGCAGAAA TCGCTCCAAA CTTGGAAGCT CCATGGTTTC AACATCCACT TTGAATTTAT CAAAGTTCCT GTGTGCTCCT AAAGGACCAG GACACTTIGGT ACGTCGTACT TCCAGCTCTT TATGCAGTAT ACCAAGATGC AACAGGAAAT CTGGTGTAAA GGTGTTGGAG CGGCTCGTGT AGGTCTTT AACTGAAGAA GATTTGTTG GTGAACAAGC TGTACTT'rGT CGAAGCAGGT TTCGAAGTCT TGACAGAAGC AGGTTACGCT GCGCCAGACG AAA'rTCAACA GGAAACGCAG 'rTGGA'1rTGC GCGGATGTAG ATGTCTTCAT TACGAAGAAG GATT'rGGTGT GCTAAAAACA TTGCTATGGA GAAACAACTT ACAAAGAAGA GGTGCTTTGA CTGCCCTTAT CCAGAATTGG CTTACTTTGA 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 AGTTCTTCAC GAAATGAAAT TGATCGTTGA CTTGATCTAC GAAGGTGGAT GCGTCAATCT ATTTCAAACA CTGCTGAATA CGGTGACTAT GTATCAGGTC CACTGAACAA GTTAAAGAAA ATATGAAGGC TGTCTTGGCA GACATCCAAA TGCAAA'rGAC TTTGTAAATG ACTATAAAGC TGGACGTCCA AAAT'rGACTG ACAAGCAGCT AACCTTGAAA TTGAAAAAGT TGGTGCAGAA TTGCGTAAAG 'rCAAGAAAAT
CACGTGTAAT
ATGGTAAATT
CTTACCGTGA
CAATGCCATT
CTGGTAALA AACGACGATG ATGCATTCAA AA'rCTATAAC TAA~T-AGAAA TATATAGCGC TGGAGATGAT TTTATGAAAA AGATTATGAG AAAATTGCA TCGTTATTAT TGGTTCTAGT 208 TGTATAATGT AA'N'ACACCG TCGGTAATAG TGCTAGCAGA CCAAAATAAA GCAGATTGGT CGTA'rGATGA AAATGCTGTA ATTAACATr'r ATGATGATGC TAATMrGAA GATGGTAGGT TGCATATGAA CTTTGAACAA TTCTTCAAAT TGGCACAAAT AGCTAGAGAA GAAGGTCTTG AAATTCATTC TCCGTTGAG AGAGCGGTG CGACTAAATC TGCTCGTTAT ATAGCGAAAT GGATTTTGAG AAA'rAAAAAA CATTAACAAA TATAGTTGGT AAATCATTAG GACCTAAATC AGCTGTTAGA TTCGGAGAAG CTTTATCCTA TATI'GAAGGT CCTCTTCGCA GAATAAATGA GACGATAGAT CGCCGTTTAT GGGTTTAAAT GACTGGACTG ACTTATT'rAG GGGTTrGAAAT GTGATGAAAC TGATAGGCAA AATTGACAGA TCAACTAAGA ACACGTTGGA TTTGAATGA'r GTAATGA'rAG TGAAGAAAGT ATGAACTTCC TAAAGAGTTT AAGTTT'TCGG AGATGAATAA AGGTTT'rATA TGTTAAGTTC GTTGTGAATA CTCCACTGGA TA'rTTGAAAA AAGAAAATGC GCCAT'rTCCC AGCTCAGCAA AATCATCCGC AGGGAGTAGC ATCAAATAGA GCAAAT'rATT GCATCTCGAT TGAAAGAATC
CGAAAACTTT
CATATGAATA
CTGCAAA.AAC
AAAT'IAGATT
GTAGAATATA
TTGGTAGAAT
A(GCTTCAC~CT ATTCUTGGGA TATTAGATGT TTACCAAT'rT GTTT'rCTATC AAGACAGGAT TATT'r'r'TCA GTTGGA'rTrA CAATTGGGAG CTAA'N'TrGT TCCTCGTAGT CAATTTTAG AAGAAATTTT AAACTATTTT GGTTATATGA TTGGATT'rCC
ATCTTCCATC
ACAAA'TCGTT
AGTGTTACTG
ATAGAAAAAG
TCGATTCGTA TGGCTCATAA ATACCATGAA CTAAAAAACA GTCAT'rAGTG ACTGTTTrTTT 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 AAAAGATATA ATCAAGGCTC TTACGATCAT TATTTATCGG CCAGCGTGTT CGCTCCTTTA GGAAGAACGT GAACGTGGGG CTATACTTGT AATGAAATGA ACAAGGTCTT GAACGGTGTG AGAAGTATGG TGCTAAGATT AAA'PrCGTGG TGCCTATTAT TAGTCTGCGC TTCTGCGGGA AAATTCCTGC TACTATCTr'r
ATGCCCATTA
GTAACTATTA
ACAGTCTCTG
CTACGCCACA ACAAAAGATT AACTAGTTGG AGATACCTTT AAAATCGTAC CTT~TATTGAT GGTCAGGTTC GCTTTTTTGG TGGGGATTT GATGCCTCAG CCAAAGCAGC TCAAGAATTT CCTTTTGATG ATGCTCA'rGT TCAAGCAGGT CAAGGAACAG TTGCTTATGA GATTTTAGA.A GAAGCTrCGAA AAGAATCGAT TGATTTTCAT GCTGTCTTGG ?1'CCTGTTGG TGGTGCCGGT CTCATTGCCG GGGTTTCTAC CTATATCA-AG GAAACAAGTC CAGAGATTGA GGTTATCGGA GTAGAGGCGA ATGGAGCGCG TTCCATGAAA GCTGCCTTT'G AGGCTGGAGG TCCAGTAAAA CTCAAGGAAA TTGATAAAT'r TGCTGWI'GGG AT'rGCTGTGC AAAAGGTAGG TCAGTTGACC TATGAAGCAA CTCGTCAACA TATrrAAAACT TTGGTAGGTG TCGATGAGGG ATTGATTTCT GAAACCTTGA TTGACCTTTA CTCTAAGCAA GGGATAGTCG CAGAACC'rGC TGGAGCGGCT AGTATCGCCT CTTTAGAGGT T'rTAGCTGAA TATATrAAGG GGAAAACCAT TTGrTGTATC ATGCCAGAAA TGGAAGAGCG TGCCI'GATT AAPTCCCAC AACGTCCAGG AGCTTTGCGT GATGATATCA CACGTN'TGA GTATATCAAA ATTGGGATCG CT'rTAGCAGA TAAGCATGAT TTTGATCCAG CTTATATTAA CTrAAATGGT GGACTAATAA AAAAATATCA TACCTTCATT ACACTGTCTr TAATACTCTT CGAAAATCTC CAAAACAGTG TrTTGAGCAA CTTGCGGCTA AGTATAAGGT ATGATTTGAT TrCrTTTTTGT AAGTAATTAA CTGAGCTTAT CTGTCTTGTC GTGTCTGCTr CTAGGCTAGC ACCTCAATAT GGAATACCTA TCTCTCAGAT GAT'rTATTGA AAGGCTTGGA TTTCTAA-AGG TTAGAACTAT CTATCTTACG GAAATAGAGA AGCATTTT GGTAATAATA CAGTATTTTT ATTAGCAAAT ATATTATCGG ATTTAAAAAG GAAGTAAGAA 209
AT'N'CTGGAG
TATGATGGTA
GAGTTTGTAA
CGAGCTAGCA
TATGCAGGTT
AATGAAACGC
GAAATAATGA TATCAACCGT
TCAAACATTA
ATGATATCCT
AGGGAACAGG
TGATTCGTAG
TTTATAATAT
CTT'rGTGGTC
GGGGCCAAAT
CCCAGTATTA
AATGGAAGGT
GCT'rGTCTGA TTGATTTCCT ATCTATTGAC AAGCATAGTC TTCAAACCAC GTTAGCTCTA TCTGCAACCT GCTTCCTAGT TTGCTCTTTG ATTTTCATrG TGACAAATAT ACTATATTAA AAAGATATAT ATCTCTATTA AGGATGGTI'T AGATAATCGG CCAAAGGAGT GATGAATTTG AAGGACATAA GGAAGAAAGA TAGGAGTTTT TGAGCTAGTG CATCTTCAGT TCTTAAATCG AAGAAATAAG AAGAACTTGA ATAATTTCGC ACCTTAAGAG ATTTATGGTG TAGAGGCTAG CAAAACCTAT 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6211
INFORMATION FOR SEQ ID NO: 9:
SEQUENCE CHARACTERISTICS:
LENGTH: 7939 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CCGGACTCCC CACGATTCTT CAAAATAACT GAGTATATTT CTATCTTGAT TTTCAGATAT AAATTCTTCC TTCTGTGGCC TCTTCTTACG CTTGAGAAGA GCTTCTCCGA CATGGCTTCT TCCTTACTGA GCAAAACCTT GAGCATAGAT AAGTTTGACT GGCAAGCGTG CTCTTGTATA TTTGGCTCC!C TTCCCACTAT TGTGGATAGC GAGGCGTCTT CTCATATCAG TCGTATAGCC TATATAGTAG GATCCATCAC GACACTCCAG AACGTACATA TAAGCCTTAT GATCCATAAT AAATCTCTTC GATTTCGGGC GTATAAGAGC CATCATCATT GTGGACAATC AAAGGAGGTA AGACCTTAAA GCCACTTGTT GCGCCATCCT TGATCGCCTC AATCAAAAGC ATATTGGCTT CCTTTTCTCT TTTTGGATAA ACAAACTGCA GGCGCTTAGG GGCTAGATTA TGTCG'rTTrA ACGTATCCAA AATATCCAGA AGTCGATCAG GACGATGAAC CATGGCCAAA CGCCCATTAG ACTTGAGAAT ACTCTGGGCA CTACGACAGA TTTCTTCCAA ATTAGTCGTG ATTTCGTGTC GACCCAAGAG ATAATGTTCA CTCTCGTTCA GATTAGAATA AGGATTCACC T'rGAAATAGG GTGGATTACA CAAAATCATA TCCACCTTAC TCCCCT3GAAT AATCATCGCA GATGACCTGC CCATATCCGC CAAACGCTCC TAGCAAAAAG CCCCAC'rGCT GA.AAACGTG;G AAATCGTGAT TTTGAATGAT TTTGATATCT ATTGTT-CTTC TTCCATGGTC AACTCTAAAC TACTTCTTCT CATTCGTCGA AAGGGAGCAA AGGAAACTGG TACTTr'rCTT TTCATCTAAA TCCACTACCT ATTTTCA.ATA AATCCTGTCC ATTTGCTCCT CTAATCCATrr TGAATCTCAA CAGACAATAT CCATTCCCAG CACAGAAATC AAGAGAACAC TATCCACCGA GTCGAAAAGA GCTGGTTAAT CTATTATAGC AAATTCATAT TTTTTAAATG GTGCAGGGCT AGCCGTAGTT AAAGCGGTCG GTGAGCAGGC ATATTTTTCA CAAACGGACA GAGCGTTCAG CTGTGCTTGA GTACGAG'rGC CACAATCAAC CCCTTCTTAG ATAGCTAAAA ACCTCTCTAT GCGCTCTCCT GATTTTAATA TAACATTACA AAAAATATAA TCTCCAGTCC AGATTGGTAG CTTGAAAAGC
GTCTCCGTCT
CCTCCAAAGT
GAACTTGAAC
GCGGATAGAA AGACTGGCTT TCCCTGTAAA CTCTrCATCG ACTTTCAAGG TT'rCATGAAT 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 GAATCTCTGA AATGTGAATC AGCCCCGTAT CACCCGTCTC TAACTCAACA AAGGCACCGT AGGGCTGAAT CCCTrGTAATA CGCCCCTTTA GCTTATCACC GATTTTCATC TTAGTCCTCG ATTTCAATAG TTTCAAT'rAC AACATCTTCA ACTGGCTTGT CCATAGCTCC TGTCTCAACA GCAGCAATGG CATCCAAGAC AGCGTAAGAT GCTCATCAG CTAACTGACC AAAAACCGTG TGACGGCGGT CTAGGTGAGG TGTCCCACCT TGATTGGCAT AGATTTCTGC AATCGGTTCT GGCCAACCAC CACGAGTAAT TTCTTTC'rTA GAATAAGGTA GGTGTTGGTT TTGCACGATA AAGAACTGGC TGCCGTTGGT AT1'TGGACCA GCATTTGCCA TGGAAAGAGC ACCACGGATA T'rGTAAAGC'r CTTCTGAGAA TTCATCCTCA AA.AGATTCGC CGTAGATTGA CTCGCCACCC ATACCAGTTC CAGTTGGGTC TCCACCTTGG ATCATAAAGT CCTTGATAAT ACGGTGGA.A ATGACACCAT CATAGTAGCC ATCTTTrGAA AGAGATACAA AGTTAGCCAC TGTTTTAGGA GCATGr'rCAG GGAAA.AGCTT GATACGTAAG TCTCCGTGAT TGGTCr'rAAT AGTCGCAAGA GGACCTTCTA CTGTTX'CAAT GTCTACT'rGT GGAAAATGCA ATTCTT'TTTC TACCATACCA AATACTTCTA AGGCAGCAAA AATGCCATCT TCTTCTAATG TTTTTGTAAT ATAATCTGCT TTTTCTTTGA TTTTATCATG AGAAATTCCC ATGGCAACGC 211 TGATTCCACC ATAATCAAAG AGTTCCAAGT CCTTGAGACC ATCTCCAAAA ACCATGACCI' 2220 TCTCTGGTrr CAAGCCAAGG TGTTCCACAA CCTTTCCAC CCCCGTCGCT TTGGAGCCTG 2280 AAATCGGCAC AATATCAGAC GAATGTGAT GCCAACGAAC CATGCGAAGT T1TGTCTGAGA 2340 GACTGTCAGG CAAGTGCAAG TCATCTCCCT TATCTrTCAAA AGTCCACATC TGATAGATAT 2400 CTTCTTTTTC ATGGAALATCG GGATCTACAT C'TAAGTCGGG ATAAATTGGA TTGATAGCTT 2460 CACTCATCAT A'rCGGTGCGA GTCGACAACT TGGCATCATG ACTCCCAACC AAGCCATACT 2520 CAATTCCTTC TTGCTTAGCC CAAGAGATAT ACTCCTCAAC ATCTGACTTT TCAATCTGAT 2580 GCTGATAAAT GACCTGACCT TTTTTATCTT CGATATAAGC CCCATTCAAA GTTACAAAAA 2640 AGTCAGGCTT GAGATCACGA ATCTC'rGGAA CAACACCAAA AATGCCACGT CCAGAGGCGA 2700 TTCCTGTTAA AATTCCTTTT TCACGCAACT GTTTAAAAAC AGTGGGAATT GTAGTTGGAA 2760 TAAACCCTGT CTTTGAATTC CGCAATGTAT CATCAATATC AAAAAAGACA ATCTTGATCT 2820 TCTTTGCCTT GTATCTTAAT TTCGCGTCCA TCTCACTACC TCT'rTCAATC TAACTCTT'rC 2880 CATTATATCA 
TAAAGTAGGC AAATCCCCTA 'IrTTCAAAAA GTTTATCAT'r TTTATTTTAA 2940 TTTCTTGCAT GAGAAAAGAG ACATATTTAT GAAAAAGCTC CATCGTGCTT TTAATGTGTr 3000 ***CTCTTGTTTT CAAACTCGTA AAAAGGGAGC CACTGATCCT AACTCGCTCT CTCATTTCA.A 3060 *..AGCTTGTGAA AAAAGACCCG TTGGGGTCTT AATTCGCTTT CT'rGTTTTCA AGCTCATGAA 3120 *AAAGAGACCC AACTGGGTCT TTTC I rlAAT CTTCGTTTAC GAA.AGGCATC AAAGCCATTA 3180 *S*CGCGAGCGCG TTTGATAGCT GTTGTACTT TACGTTGGTT TTTAGCTGAA GTTCC'rGTTA 3240 CACGACGAGG AAGGATTr'rC CCACGTTCTG AAACGAAACG GCTAAGAAGC TCAGTATCTT 3300 **.:TGTAATCAAC ATATTCAATT TTGTTTGCTG CGATGTAATC AACTTTTTTA CGGCGTTTGA 3360 ATCCGCCACG ACGTTGTTGA GCCATGTTTT TTCTCCTTTA TAAGTTTAGT TGTCCATTAG 3420 AATGGTAAAT CA'rCATCTGA AATATCCAAT GGGTTTGTTG C'rCCAAATGG AT'rTTCATTA 3480 :~**CGTGAAAAGT CTGGTACTGA A'rTTGTAGGT GCTGAATAGT TTGCAG']TGG TGCAGAGTAA 3540 GCTCCACCTG TGTGACCCTC ACGCACACTA CGGCT'rTCCA ACATTTGGAA ATTCTCAGCC 3500 ACGACCTCTG TCACGTAGAC ACGTTGTCCT TGCTGGTTAT CGTAACTACG AGTC'rGGATA 3660 CGACCTCTCA CCCCGATAAG TGAGCCTTT TTAGCCCAGT TAGCAAGATr TTCAGCC'rCT 3720 TCGCGCCACA TA.ACGACATT GATAAAATCA GCCTCACGTT CACCATTTrG ACTCTTAAAT 3780 GTACGGTTTA CTGCAAGAGT AAAAGTCGCA ACTGCTACAT TTGATGGGGT ATAACGCAAC 3840 TCAGCGTCAC GTGTCATACG CCCTACAAGT ACAACATTGT TAATCATAGT TrACCTTCTT 3900 212 ACGCGTCAAT TTTGACGATC ATGTGACGAA GAATGTCAGC GTTGArTTT GAAAGACGGT CAAACTCTTT AAGAGCTGCA TCGTCATT'rG cTTcAAcGT'r AACGATGTGG TAAAGTCCD CACCGAAATC 'rTGGATTTCG TATGCAAGAC GACGTTITC CCAAGTTT GA?1'CAACAA CAGTTGCACC CTTGTCAGTC AAAATAGAGT CAAAACGTGC TACCAAAGCG TTTTTAGCTT CTTCTTCAAT GTTTGGACGA ATGATATAAA GAATTTCCTA 'r=GCATT GATATGTTCC TCCTrTTrGT CTAATGACCC CAAGAC'rTTG CAAGGGGTAA GTGAGGTTCG CTCACAATAA ACTATTATAC TACAAAAAAT TITTTTACGC AAGTAAAAAC ACTACAATTC GAAAAAACGC CACATGGGCG TTTTCCTGTT CTTATGGTTI' GATACGGTGC AACATACGTG GGAATGGAAT AGCTTCACGG ATATGTTTTG TTCCTGCTGC GAAGGTTACC ATACGTTCGA TACCCATACC AAATCCTCCG TGTGGAACTG TACCGTATTT ACGAAGGTCA AGGTAGAATT CATATTCTGT q. 
0* ACGATCCATG CCAAGTTCAT AGACCCACCG ATAAT'rTCTC CTCTGGATT'r CCAGGAACTG GACAAATGTT GGCACACCAA ATCACCATGC TCAAGATGCT GTCAATGGCT TGATCGTAAG TrCTGTfATCA CGTTCCAAGG AAGAGCTrTC ACATAAGCTT CTCAGCATCC ATCATCCAGA GAAAACTGGA CCAAAGTCAA CTGACCTGAT TGGCTCAAGT AGAATCTT-CT GCCGCATTTC GTCAAAGAAC TCATAAGTTG CTTACGAGAG CGTAgCCACA TGGTGTGAT'r GGGTAGTCTT ATAGCCAAAT TTAGAACGTT TTGGCTCAAG CGTTrGATAA GACAAAGTTT GGTTTAAAAG GAAACGATT TTTCCTTTTC ACCAACATAG TCTTTTACGT CA'rAGCCTTC TGGAGCAAGC AAGTCTGCAC GTTTCATGTA GAAGGCCTTG ATGGCTGCTG AGTGGTTTGA AATCCAAGTT TCGTG'rGGTG CGTAGTCAGC ATCTTCATCA TTTTCATGCr TGATACGTT'r GAATGGCTCT GCAA'rGTAGC ?r'rCCAAGGC TrGAGGCGCG CGGTCAAGAA CTTGCAAG.TC AAGCGACTCA TCATGTGTCA ACTCAGTCAA GTGACGGCGT GTTTTTGATT AGACACGACC AAGAGCCATA GCCCCTGCTT AGGCTGGCGT TCCGAAGTAG TCAGTTTCAA CTGAAAGAAT TGGGCTGTCA AACTCATAA CA'rAGATAAT AGCGTTACGG ATTTGCAACA
AAAGCACGCG
GATAGTTCAT
ACCCAAAGTC
CTrGCAAGAG GTrq'CAAGAG
CACCTTGTAG
AGTATGAGTA
TTTCAGCACG
CTAGGTAAAG
AGAGTTCTGT
AACCGT'rCT'r CAGCTACTrG
CGTGTTCTTT
TGTCCAACTC
CAGACGTTTC
CCATCT'rAGC GACAAGGGCA TCGTAATCTT CCTCACGCAT 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 AGTGACGGTT ATCCATCAAA GAGATTCACC GATCACTTCG CGTCCTCTTT GACAATACCT
AAGTCTGTTC
ATGTCTGTGA
CTCACATAA.A
CATCAAACI'r
CCACACCTTG
CTGATTTGTT
CAATA.ATCGT
CTCAAGTCCC ACTTCTTCAC CAAATTT'rTC AAAGAAGGCT GTTCCATCAC GCAATTGTAA GGCAACCCAA GCGCCAATCG TCACTTrCCTG TACACGTTTT GTCATTATTT TTCCTTTTCT 213 TTTTTATTCT TTATGGCAAA CCACCTCTAT ATTGTTCCCA AGCATAGTAA ATCGGATGCT CACTTCGATA ACCAGGAGCC CTCTAAGCCA GCCTCATAAC AAGCCTGAAC TTCTTCCTTA ATGAACAGGA TCTTGTGTTC CCTGAGTCAG CCAAAAATCA CGGGGATAGA AAACTAATTA GAGAACTAGT CTTAAAAGCCC
TCCAGGTCAA
CCATTTCTC
TTTTCTGCTA
CCACCAGGAT
AATTTATAGT
TTTACCTTAA CTTATGAAAT TTGTAAATCC
GAGA.AAACTC CTATAAAATC ATCAATCATT C'rCACrACCC CTAGGTCTGT CGC-ATAGCTG CCAAGGCCAC T'rCGGCT'rCT CTTTCATCTC CA'rGGCCTT CCACTTCAAA TCCTGGTACC AGGCCTGACG CATGCTTTCT ATTGGGCTAC TGCTGACGGA TAATGTCTGC TTCTCCAACG ACACACCATT GATGACCACT TGAACrCATG ACCA'rTATAA CATTT'rCTAC AGCCCAGTTT TGGGATTAGA TGGCGAATTC ACTGCTCTAC GGTCACCTTA CTTCTGCCAT CTrGACCTGA CATCACCTGG ATTGACCACA CGACTGTCAC TTGATTTGAC CCGCCGCCTT AAGCTCTGGC ATAAATGCTT 'rCAAGCGTTC GACTGCTTCT AGGCGGACAT TTTCTGGTGC TCCAAATCCA TCTAAGATAA CAGTTGTAAA T'rGACATTTG GGAAGAGATA TCTGCAAGGA GGGGATAGAT ACAGTATCTT GCTCACCTGA
GTCTGTCACA
GAAGGCCCC'r
GGTATTPLAGA
TAGAGCCTCA ACTGCTGCAT TTCGA.AGTTG TTTGACCTGC AATCTTGGAC ATGGCAGCGA GCATAACCAA TCCGCCCAACC AGTCATGGCA TAAGTrTAG GTTrTGCTTGC GAATCGCTTC CGATAGGCTA GAAArrCGGTG ACCA.AGCGGC CATAGATATC GTCTGCTAGG ATGAGAATAT CCAATTGCCA AGAGTTCCTC ACGGGTGTAA ATCATACCTG AGCACCAAAA CCTTGGTCTT GTCAGTGCGA GCTGCTTCTA AAGTGATTGT CTTCCTTAGC AGAAACAAAG ACGGGAACGC TCTCCATAGC TAACCCAGTA TGGGGTTGGG ATGATGACTT GCCATAAAGA AGGTATAGAG AGAATATTTG GCTCCCGCAG GCTACAGAAT AGCCGTAAAA GCGCTCAAAG TAGCTATTGA AGACCTGAGG TTACTGTATA AAAAGAAGCA CGCCCATCTC TCfl'GGATAT TTTr'GGGAGT AGTGAAATCT GGCTCACCCA CTACCCTCAG CCTTCAGTGC TTTGGCACGG GCTCCAGCAG ATTTCTAAAA CACGGT'rGGA TAGTr'rCATA GGCCCTCCTT CAAAATCTAC TAGATAAAAA TCAGATCCTG ACTTAACTTC AACGGCCAAA GGT'rATCTTrG TCAATCTCGC CAGCTCCCTT CTTTCTTG TGAAACACCC TGATTTAGCT GATAAACGTA
TCATAAAAGC
GCCCACCTGC
A.AAAAGCAAA
GAGGGCTGTT
CCAAAGGAGC
TCTCAAAATG
TTAAGCG3'GT GCTCCTGT'rA
TCCGTGTAGC
TGCGGTTrGA
CGTTCCTCAA
5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440
GAATCGATGC
AGGTTAGAGA
CCAAAGTCAC
G'rTGACCAAT
CCAGATTGGC
AATGGCGGCA
CAAAATATCT
ACTTT'CTTCC
GC'rCCTGTTT
TTATCTTGAT
TTCCTTAGAA ACCGTT'rCTG AATCTTATGG TCATCTTTAC CAATCAGGAC AGCAAGCGCT AAGAACGCTG TAATAAGATPT CCAAGCCATT GTATAAATCA TCCTGCATAC TGCTGAGCTA ATrTTTrCTCC PTCACTTTrA GCTAAGAGAA ACCATATACA GAAAGGAACC ACTGATAACC TAGACCATAC TGCCACAGTA GA'rrATI-rTr TGCTTITGTTT ATTTTACCAT CTATTAAGCT TTATTACAAG TGAATATAAG TCAAACCACG TCAGCTTTAT CTGCAGACCT CAAAGCTGTG TTCTCCCTTC AAACAAAACC GAT'N1TGAAA GTGAAACAGT ATGATTAGAG TTTGCCGGG
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 9897 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
TCI'GCTGTT
ACCTGATCAG
GCTGTTTGAT
ACAAACAAAA
TGTCTTTT
AATACTCTTC
CTTTGAGCAA
TCTTACTTTT
TGTTACGACC
CCTGCTCTAA
AGGGTTTCAT
TCGTCATCCC
TCACTCGTCT
GAAAATCTCT
CCAATTCTAT
TCAGTCACAA
7500 7560 7620 7680 7740 7800 7860 7920 7939
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCGCTCTACc GTCAAATAAT TACCATTTTG TTTAATACCG AAATTTTTAT TTCAGTTGGT CTGTTGGTAC GTAATCAGTA TCACCTTGTT ACCTGAATCT TTTCTAGTTG CATGAGTACT TGTTTGTTCT GCTGATTTGT AAGTCACGTC ATCATCGAAT GCCAATGTTA ACTTGTAGCA ATATTAATTT GCTTGAAAAT ACCCGACCAT ACCTGTTGTA TCATTAGCCG GATCGTCGTA TACAGTACCA TTCTCACGAA TCCTTAATTT AAGGTAATAA TTACCATCAA CTTCTCTAAA ACTTACTCCA GCAGGCATCA T'rTTTTCAAC AATAACAGAG TCAATATAGG
CTACTGAAAA
TAGTATAATT
TTTGTTTATA
CATCAGCAAA
TTGCACCACC
CACCAACTTC ACGAGGCCAT ATTITTGGTTT AGTCCATGTC
TATTCAAGAA
TGCTAA.AAGT
TATAAATTAA
ATCATGAGTT
ATACAGAACT
ATGTCCAGTA
TCTAATGGTA CTGGCGCAAA TTACCATTAT CATCACTATA.
CCACCGTAAC GAGCGTCAAT GGAATACGGA AATAGTTAGA ACAGCGTTTG TTGTCATCTT 'PTTAACAGTT TCTTCATCCA ATGCACTATT AAAACCAAAC GCCGTTTC CTGCACGTTT TTTAATATCC TTGATGTTTA GGAAATTATC AAAGAATTTG ATATTTTCTA CACTCCCCCA AGCATATAGT CACTTTCTTT TCTACTACTT
GTGTTCCGTT
AATCAATACC
TTGTACCATT
TGCGTATAAA GAATATGTTT TTTTGACTGA ATCTGCTACT ACTGCAACAG TGTTAGTCAC AGCCTCTTGT TTGTACTTAC CCCAAACTGA AGCAGGTCTG GATACTAGGT TATTTTTATT GGAAGAAGTA TCACGCGCTT ATAACTATTT TGTTGACCGG CTCAC 'rGAC TGATTGTACT TAACTCTT'rA GTAACA'N'TT AGGAGTTTCT TCCGCTGTAG
CCATCCCCAA
GTTTGCGAATT
TAATCACTAC
CTCCGCCCCC
AAGATG4GATC
TATCCGATGT
CTCACCAT'rG
AGATATTCCA
AGTAAAGTCA
TGTTAAAGTA
CTTAACAGTA
AGGTTGTACT
TACAGTAACT
TGCAACAGGT
AACTGAGT'rA
AATATTTTTT
TTTCATCATT
TCCCTGGAAG
CTTCGCTAAA
TATTT'rCAAC
GAACTCCATC
TTT'rTCCATC
AATATTTATT
TT'rCTCTTTT
TCCGAAGAGT
TGCACCAACT TTGGTGTTGA TACTTCAGA-A GCAACAAATG CTGATAATAC CACTACAGTA TTCATT?1'AT TTTTCCTCGT GCAA'rGAATC TTTGGTTGGT CAATTCAACA ATTTGATAGT AGGTACACGT GACTGGGCAC AACAATATGA ATATCTAAAT
TTAAAACTTT
GAAGATCTTC
CTTI'GCTATC
GAACTGGGGA
ATT'TC'rTATG TCTCTAAGGA ACACATCTAC AACAGAGCTT GTAAGCC?1-r CCGCTAGTAA A'r~rATCCrT ACATTATTTT TTTCTAAGAC GTTTCAACTG TTCGAGGTTG TCCGAAATCG GAGTCG~rGG GTTTCAGTrCT CCTGAGCTGC CCTAAGGTTA CATATTGT'TT GATAACAAGT TTTTTAACAG TTCAAAACTC ACCAACATAT GTA.AAAAGCA ATATCCTTCT AGTTACTGCC AT'TTTTCAG AGTTTCAAAA ATATCTCCTG ATTTTCCCCG TCAATATCAA AAAATCACAG ACTTTTGAAA TTCAGAAATA ATCATATTAT CGCTTTAAP'r TCTGTAATAT AGCTAGATAA GTCATACAAT TTGCAAAAAC AACTA.AArCT GTCAAATTTG TATTTT'CTAA GACAGAAGCA TATCGTTTAA AATCAGATTG CTATTAGTGA CGAACTTCCC AACTTGAATC 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1550 1620 1680 1740 1800 1860 1920 1980 2040 2100 2150 2220 2280 2340 2400 2460 2520 2580 2640 CATGAATCGT TGTATATTTA GGTGCAGATA CTTITATTTCC AGTAAGAACA AACCTGAAAC TACTGATACA GAGATTGAAA TCAATGAATA TGCCCAGTAG TTGGAGGAAG GAAGTATTTA ATAAATACCA TGACGATGG.T TGATACAATC AAGCACCTTG TTTATTTGCT TTTTTAGAAA CAA.ATCCAAG AATAAATACA GACCAAGTAC AAGTCCCATG AAACTATTGA ACCATTCGTA TGCAGATTTA
GATACAATAT
CTAACAGCTG
AGCGCTGCAT
CCACCAAGTA
ATATCTGAGT
GAGCCATGA.C A.ATGGAAACA CCAATTGAGA ATAAACCTAC TGCTAGAGAT ACGAATTGTG CAATTT'rCGT ACGACGATTG 'rCTGACATAT TTTTAGAAAT GACATCTTGA ATATCCAATG TCCATGAAGT TGCAACAGAG TTCAAACCTG TTGAAATAGT TGATTGAGAT GCTGCATAAA TCGCTGCCAA GATCAAACCT GTGATACCTA CTGGTAACTG GTATGCAATA AAGTACATAA AGATTTGGTC 'rTGAGGGATA TTGCTAGCTG CACTATCTGC ATTTTGTACT TGATAGAATA CGTACAAGCC TGTACCAATC AAGTAAAAGA CTGTTGCAGT TGCAAGTGAC AAAACACCGT TTGTGAACAA CATCTTATTIA AGTTTCTAA 'rATTT'rGTGT TGTAGTAAA.A CGTTGAACCA 216 AATCTTGAGA 'rGAAGCATAG GAAGACAAGA TTGTAAAGCC AGATGGAGTT TGAAAGCAAG 'rTAGGATCGA AAAGTTTT~c CGTTTGCTAA TGT'rTCTGCT ACTGCACCAA AGCCACCTTT ATAAAGCTAA AACGACACCA CTAATCAGAA TCACACCTTG
TGAACCCATC
ATT'rGCAGCA
AATATTAGCA
ACAATTAAAA
ACAAT'rTCC
ATCAGTACAA
CGGATTTTAG ACCACCAGTA AAATATTGAT GTCAATTCCT TAGACATACG 'rCCCAATTGA TAGAATTAAA ACGTTTATCC TAGGTAAGAT AAAACGAATT ATAAAATCCA GCTACCTGCA GCATTGTGGC AAAAATGGAT TAAGAATAAA CAATTGCAAC GTCAATACTG ATAAACCAGC TAAATAATAA ACAAGAGTGC AAGTAATCAT ATGCCGTATC GTCAGTGGAA TAGCTACTAC AATAAAGTCT GTCCATAATA TACACCCATC AAAATAATCA TGATGCGAGG TACATAATGA TGAAATAATA CGAAGTGCTT GATGTCTATC CGTGCAAAGA CATCCCTAAT TGAGCAAACC CAAGAAGGAA ATCGGACTGA 4 4. 4.
AGA.ACTCTTr GTAAkATAAAC
ATCTCCATAT
CTGCTTGT'TT
AACCTAAATC
CCTTACCTGA
TATCTAAATC
CATAAGTACC
CTGGACCATT
GTACAGGCAT
ACAAACTACC
TCCTTTCATC
AATCAAGATA
TGATTT'rATT
TGCAACTTCC
AAGTTTTTCA
TATCATCTTA
TCGTTCTrGA
ACCAATACCA
GAATACAATG
TAAGAGCTAC
ACCGAAGTAA
TCTTTTTTAG
AT'rAAGTCAA
TATTATAAAA
AAGTCACCTT
TTTAGACGCA
TAGATAACTT
ATCAAACTTT
GCTTCTGCTC
TAATCTTCTC
CATACCAAGG AACCGAACCA TCTCCTTTAA AGAAATAGAT ACCTGCAACC AACACCGCAA TTA'rTGTAAA TCCTGTTGTG CCCATAACAT ATTCTTTTCG TGCTTGTTGA ATAAGTTCTG CTGCCAATGC TTCTAAAGGT TGACGAACAG AAACTTCTTT TCTACAGCA TACATATTTG CATTGATAGC ATATTGAAGT TTTTTAGCTG CCAATTTCAA GAACAAATCT GGCATAACGC CCATCAAGCG ACCACCAAGA TATTGTTCAT CACCTGCAGC TACAAACATT 'rGAATATCT'r TCACACGAGG ATTTTGACGC ATTGTTGCAT ATTGTGGAAT ATTATAGArA ATAAAATCTG AATATGCTGC GATTGAATAC TCTGGCAATT CATCGACTCC AACACTTTCT GAATGTTrTG ATGCAATATG GTTGATAACT GTTAATTTAC TTTGTTTACG ATCTTCTACA CTTTGGTAAA
CAGCGAGTCC
2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 AGAAGAATTT TTAACTCCAA AGTCAACGCA ACCCCTGCCA TATTTGACGC AGCTTCACTC ATTGCATTCC TGAAATAAAT AGGTGGGATA GCTGCAATAG
CCAATTCGAT
CTTTAGCAAC
ACTATCTTTC GTGTTATTAC TTCCATAACA GCTTCAATAA TACATTCACC TGAAGAACCA TTTACATAGA TACCTTTTAC ACCTTTGTCA ATGAAATATT GTACCAGAGA TTTTACACGA TCTTGGCTAA TTTCACCATT TTCATCATAG CAAGCATAAA ATGCAGGGAT AACGCCTTTG TATTTAGTTA AATCTTTCAT CAGANTrCTC CTTTATATTG TTTTTTATTT GATGACATTA ATAAATCGCT GAGCAATTTC TTTTGGACGT GTAATCGCTC 217 CACCAATGAC TACACTGGTA ACACCTAAAC TA'rAAGCTr GAATTTTTCt TCGGCAATTA CCGGAATAT2' AAAATCAGCC ATCAGGCTCA TCTGATTGTA CACTTGTACT TGTGTAACCT ATCAACGCCT GATTTAAATG CATAGAGACC TrCATCTAAA CAA'rTGATTC GGATATT'TT CTTTTATTTT- TTTGATAAAT ATATCTTGGT CTTAAAGTTG CATCAAATGC AATGAC'rGTT ATCTACTTCT TTCATCGTAG CAGTAATATA TGGTTCITTGA AATTCCAATT AT'rGGTAAAT CTACTACTTI' CTGAATTGCT TGCGCGAATG CCCACTGCTC CTGCC'rCTAA AGCTGCTTTA AAA'rrCTTCA T'rATAAAGGG CTTCACCAGG TAAAGCTTGA TTGAACTTGG CTTATAAATT T1TrC1TTAGT CCAAATTMG TATGGATAAT AGTTTGATTG TAATAATATT GTCTCTCTGG ATAAGCAGTC TGTAATTAAA AGTATTGGAA ACTGAGGTGA GATGATCGGT CGAAGCTAAT AACAATAGTT CATCAAAGAA TTCT'rG'AGT CATTAAAACT GTTTTAGCGC CTTTATCTGC GTACAATATC AGTTTGACCT GAAATGGATG CTCCAATGAC GTAAGCTACT CCACAAAATC ATATCCTCGT CTGATAATAC
TTTTAATTGT
AA'TTT-CA
GATAATGTTG
TC'rGGATAAT
TTAC'N'CAAA
TACCAACAAA
'I1ACTTACAT CCGCCATCAG TCACTGACAA CTAAGCCA'rC CTTCCGCATT CTACAAGTTC GGTGGATAAT CCCTTTTCAT TTA.ATATCAC GCACAGAATT GCCATAAAAG GCATCAAGCT CALAGAAACAA TGACTCCACC
CTCATTAT
ACTTTCCAGA
TATGCGATTG
ACAATCTTCT
AGCTTTTTGT
AAG4GCAATTT
TTCACCAATC
TATTCCTCCT
TAATTAGAGA
CCATACGAGA
TCGTCAAATT
AGACCTTCTA
TCATTAAGTA
ACTCCGAGAC
4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180
GCATAAATCT
ATACACGCTC
CCGTGTAAGT
CATCTTCATT TCTTGTAAAG CAAGAACAGA ACTTCCTTTA CCGTAGAGAT AGCAGTTTCT ATCATCTCAG CAATACGCTC AAGTTGAACT TCATCAAGAA T'rTTCTCAAC ATTTCCTCAT AGTCGGATAA AAC?1r'rTCT GTTGCCTCTG TATATAATGC CAACTrTTTCT TAAAACCTTT AAAACCACAT GGTATTCGCA CAATGCTTTA TAGCAATGTC TTTTI'CAGCA GTTCTTTTTG CAGTAACATA TAACACTTTT TTTTTrTTTTC CGTTTCATAA CAGAACAACA TATATGTATT GTAAGAACGT TATTTTATTT TGGTTTTATT TTC'TCATGAA 'rCATCTCTTG GTATTTGAAA ATGAATTGTC ?TTTTTCGCAA ATCGAGTCAA TGTTGCT'rTG GATACAT'rAA GATGAATAAT CATTCAGAGG TTGCTGTTTT AAGAAGAATT TATGCCATA'r TTGGTAAGTT AGC'rTCTA'rC ATTGGAATTA TGAGCTCCT'r AGTTGA-AGTA AACGTITACA TTCTTTATTT AATATTT'rTC ATAAATTIAGA AACTAGTTTC CAAT'rTCTTT AACATAAAAA TATAATAGTT T'rTATTCTTT TTATCGTAAT TTATCACTAA TAATATGTTC ATATTA-AAAT ATTTTAGTAA ATTTCTTT1'C GGALATTTCTA TATAATATTT TAT -rCTAA.A AAAATTGAAA AAATAT'rTCT AGTT'rCT'rTA ATTAAAAGAG AATCCCATAA AAACTACAGA AAAAAGCAGC AAACTATAAA CTAAAAAGTT TAAGTCAGAT TTA'rAGCGCA CCATACCTAA AGCTAGAATG GTTCCTGGAT GATGTACTAA AATATCTAA1 TTTCTAACCA AGTTCCATAA ACTCGCAT'rG ATTAAGAACA ATAAAAA'rGA TAAATTTGT'r TGATTCGTGC TTCCTTGAGC CAGTAGACTA GCTAGTCCAA TACCAAGGCA CACT'rGTTTT~ CGTTGACCAT ACATCCATAA 218 TTTTATATAG GTAATATATT TTTATGAGAT AAATCAGGTC CCACACCAAA TGTAACCCCA AAACAT'rCCA AGTGAAACGT GGCAAATAAA ACACTTGTCA AATTTCACGA TACAGAAATT AAACCAAGGA ACTGATGTT ATGAATCAGG CTAAAACATA TTTCATCCTA GTTTTCATAT AAAAGAAAAA AGAGACGCAC CTGTAGTATA GTTAACTCAC GACATTTACT TGTTGGAATA AAATCTTCAT AATCTAAATC ATTCGTTGTT CCATCTTGTA TAGTGAAACT CTCCCGCAkA GCCAAAATAA TGGACAAGTT GATAAATA.AT GTGT'rTGyGC CTAGCCTCTT CCAAATTCAG CCGATGGTTA GTTCAGGATT TCACGGTTCT TAAATCGTAA CTCATCATCT CAATTAATTT ATGCCATAGT TTTGGAAGAA TAGAGATCAA TCATGGGAGA CGATACAAAG AAATTTCAAT AAGTATAGAG TATAAACTGG AATTATTCTT TTCATAGTTA TAATATCTGC ACAATCCT'rT CTACCCATGG GTGGCGAATC TTTTGATATA AACGATTCAA CATTTTTCTG GTTAACTCAA TCCAGCTGAT CTCCCAAAAT CGTTCAGCCA TATTrCTTCT CATGTAAATC AA'rTGTTTCG TATCTCTTGG ACTTGGATAA ACCCGCTTAT TTGAAACCAC TTTTAAAATT ATCTCAACGA AATCCGTTAA TAAATTGGGA GATAAAAACT CAAAACAATC GTCCTTTGTC ATTTCAGAAA CTGAATGACA 
GTCTAAAAGA AGTTGATr'rC TT'rGGCTATT CCTCCAACAA ATTTGCTTCC ATTTGATATTI TTATTTC'rAA
ACCTATTTTA
TACTTCCCCA
ACAGACACCA
AAGCAACTCG
CTTCAACCAT
GAAGGCCAAT
GACTTATAAT
TGACCTTGAC
CATAGAGAAC
ATACCAATAG
CCTCCGAAAT
ACTTTGAGGC
TTCACTTGGA
ATTTCTTTCA
CCTTTAGTTA
CAATAGAGCT
AAAAGGAAGT
'rCTTAGATTG
TGAAGAATAG
AGATACCTCA
TT'rACTTAGA
CTGAGACGAT
6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 TAAGGAATCT AACAACTTTG AGAAGTTAAT CGATTTCTTG TCTTCATCAT AAGCTI'TTAC AGTTACTTGG GTTGTAAGTA TCCCCTCTTT TCCCTCGGCT CGATAGTCTT GTCAATATAA AACAAAAACA AGATTCTGAT ACTTTCAAGG AATTCCATAA rTTTATGG TAATCATCTA AAATATTTGT TCATCCAGCT ACAATCTAGT ATTGATTCTT
TATCATCTAC
CGTirTTGAAG AAAATGTT1AC
GTI'TGATI'TC
TATTTAATGC
AAAGGCATTA
ATAGGATTCA
CTCAAACTCA
ACTCCGTrCT TTATATCCTG TAAAATAGTG GGTAATTA'rG CATGGATAAT TGGGCATCAA TGCATCATGT AATTCTGTTT CTAATTCATC TTTTATCTTT TTCCTCTATT TCTT'rTAATT 219 TCTTGCGAT TGCGGCAATC ACAGGAACGG 'rTACACTATT ACCAACTTGT TATAGAGCT GACTAN'AAT AGAGACTr'rT CTAGCAGCTT CAAAAGCCTA ATCAGGAAAG CCATGCAATC GAAAACACTC TTTAGGAGTG ATTCGTCGTA TTCTCAAACG CTAAAATTGT CCATCTATTA AAACACCAGC TACTTGGTAA ACTTGTTTAT CTTCTCCTC ATAGCTAGCC ACTACTACTC CCATTTGACC ACTAGTTGTT AACGTATTAG ACTGAGAAC'r TGGTCTTTCT AAATTGATTG TTTTrCGT'rGC TTCCCGTACT TTTAGAAATT TTTTATCTCC TCCTTGCATC GTAGTCAGTG GACCTGTCTC CTTAAAGCTA GTCGGTAAAT CTATACCTTT TCCAACTCTA CCACGACGAr- AATCCCCAAT CTCTCTTGA GCATATCCT GGATTGGTT-C TGGAATTAGT ATTTTGGGGA TTGGAGATAA GCCCTCACTT CCATAGACAC CTrCCAACAAC GACAATGCCA TAACGATCCT ?TTTCCTTAAA GCGTC'rCCCA TTTTGTCTCT
GAGTATTTAA
AGTAAACATC GGCTCT'rGAT TGTCTAATC'r ATCTGGTGTC ATACAAGCAA GAACTAAGGT TGGCGCAAGA CCTTCTGAAT ATGGATTCAA ATTTCCTAGT GCTT'rCAAAG TGAAAGGAAA TAAGAGTCTG GTACCrTTTCT CTCTGrTrTT GGGAACGCCA AAATCCTTAC CCAACTCATC AAGTGTGGTA AGTATTGTGG- GGCCTTTAAC ATTTTCAAGA AAAAGAAAAC TTCAAAGAA CAAAGTTCCT CTAGTATCTT ATGCTTGACA AGGGAATCCC CCACAGA'rGA
TCGCAACTTT
AATAGACTTT
TC'rCAGAGTT
TTCTAGAATG
TGTTAAGCAC
TGAACGTCCG
GTGGTTGGAT
CAAATCCCAA
CATCGACTTT
AAATCCTTCT CCTTTACCAC ACCGCTCATT CCACTTCTTG AGTTGCTTGA CCTTCTCGTC TCCGATAATA AACACCCTCT CTGCCACTCA ACATCAAACC TCCCTTATCG TGATTGAGTA- TTG'PTTGGCC GCCCGAGCAA TCGTCTTCCT GCGAT'rGAAA CCCTCTAAGT TTTTTAAATT TCCTTCCGTT TGAAAAATGG 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8'760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 CGTCATCTGA AACATC'rCGT ATGTCATGAA ATTCTATTTC ACTTATAAGA TTTCCTAGCA AATTTATCAA TCTCACAAAA TCCCAAGCAC TCATGCCCT GAGCTTCCAT TCCCATCCTA AAGCCTCCTA TCCCAGCAAA TAAATCTAAA ACCCAAATCA TTCATACCTC TCTCAACTAG ATGTAACTTA CAAAACCCCT GACCTCATGA GCCACTTTCT 'rCCTCCTCAT GAGGTCAGTT TTACTTTC'rG CTGTTCCAGT ATCGTTTrTC CTCGCTAGAT TTCCTCAAAA GGGCAGACTC CTCCCTTGGT TCGTCACACG-..ATTTTTTCAT CTCGACTGTT CT'N'AATGCA 'rCATTAACGA CGCTTTTCTT CTAGGTGGTT CATAAGGAAC AGGAAGATTC AGGTTGACTT 'rTCTAATCCT AGAATAAAGT GCTGAAAACA ATTCGGAATA GGCATAGAGA CTAGACAAT'r TGAGGAGCTG CTTGCGTCCT G'IrCGAACAC ATTT'rCCTAC CACGTGAAGA AAAAGATGGC GGAAGCGTTT GATTGTTAAA GT'1'0GAAGT CACCTCCAGC TAGATGTTTG 220 AGAAAAAGAT AGAGAI-rGTA GGCGATACAG CTCATCATCA TACGAACTCG TTTTT-GATTA AGGTrGAACT ATCCGTN'TA TCGCCAAAAA ATCCCTCCTT CATCcrTG ATGAAATTCT CGGCTTGACC ACGTCCACGA TAAAGCTGAA ACTGGTCTrG GCTTGTTCCG GTACCGA INFORMATION FOR SEQ ID NO: 11: SEQUENCE CHARACTERISTICS: LENGTH: 8148 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: CCGTGGAACA AGCCAAGACC AGTTTCAGCT TTATCGTGGA CGTGGTCAAG CATCAAGGAG ATGAAGGAGG GAT'NTTGG CGATAAAACG GATAGTTCAA AAACGAAGTT CGTATGATGA TGAGCTGTAT CGCCTACAAT CTCTATCTTT
CCGAGAATTT
CCTTAATCAA
TTCTCAAACA
TCTAGCTGGA GGTGACTTCC AAACTTTAAC AATCAAACGC CGTGGTAGGA AAATGTGTTC GAACAGGACG CAAGCAGCTC TGCCTAT'rCC GAATTGTTTT CAGCACTTTA TTCTAGGATT TCCTGTTCCT TATGAACCAC CTAGAAGAAA AGCGTCGTTA CGAGATGAAA AAATCGTGTG ACGAACCAAG GGAGGAGTCT CGAGGAAAAA CGATACTGGA ACAGCAGAAA GTAAAACTGA TGGCTCATGA GGTCAGGGGT NTrGTAAGTT ACATCTAGTT GGGTAAATAC AATGAGCTTG AAAGAAGTAG CAAACTCACC TCAGATGCTG GATTATACCA TCATTGCGCA TGAGAGTTT'r CTACCAGACA GATGATCGTG AAGTGGAAAA TGCTCTGGCT 'rTCCGCCATC TTTT'rCTTCA CTCAAATTGT CTAGTCTCTA AGAAAAGTCA ACCTGAATCT ATGATGCATT AAAGAACAG'r GCCCTTTTGA GGAAATCTAG CCTCATGAGG AGGAAGAAAG GAGAGAGGTA TGAATGATT'r AAGCGCCAAT TCTNTGAGAA GAAATCATCC GTCATTCTGT TTTGAAGTGA AAAATGATGA 9780 9840 9897 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 AACAGACAAG CTGATTCTGT TATTAAGCGA GGATATTGGT GTAGGTGAAA AATTGTGCCT CGTTGACGGA ACAAAAATGC GTGGAAAATG TTTAGTATAT GATAAAATAA ATGAGAGAAT GATTCGCTTG CAGTGCTAGA AATAGGCATT TTGAATAGTG AATATGTTAT AATAAGTATT AcqTAGGAGGT GTTTTAGATT GGAGAAGAAA CTGACCATAA AAGACATTGC GGAAATGGCT CAGACCTCGA AAACAACCGT GTCATT'rTAC CTAAACGGGA AATATGAAAA AATGTCCCA-A GAGACACGTG AAAAGATTGA AAAAGTTATT CATGAAACAA ATTACAAACC GAGCATTGTT GCGCGTAGCT TAA.ACTCCAA ACGAACAAAA TTAATCGGTG TTTTGATTGG TGATATTACC AACAGTTTCT CAAACCAAAT TGTTAAGGGA ATTGAGGATA TCGCCAGCCA GAATGGCTAC CAGGTAATGA TAGGAAATAG TAATTACAGC ATGCTTCTCT TGGGAGTAGA CGGCTTTATT TCTCGTATCA TCGATGAGAA AAAGAAGAA.A CACCGGACTA GCTGGGTTAA AACCAATAAC TGTATCGAA.A AAGGTTATGA ACATTTTCTC ACTCGGATTG AGCGGGCAAG TGGTTTTGTG GCCAGTCTAA CCATTGAAGA TAAGCATACG 221
CAAGAGAGTG
ATTCAGCCGA
ATGGI'CITr 'rATGA'rGCCG rGA'rTACAG
GATGCTTTAA
AAT'rTGGAAC AGGACCGGTA TATTGAAAGc CCTCTAATTT CCGAAAATAT TTGATAGTCA GCTCTATGAA TTTATGACAT GACCCAGTCC CGGATACGAG TCGTTTGAGT CAGATGCTAA TATGCGTCAC AAATTAAGGA A'N'TTTACAA AAAGAAATCG ATCCCGATGA CTAGTCTTTA CCGTTATCAA TTTfGACAATA CGGAGTGGAC TCCTTTGAGG AAGGACAACA CAAGAAGAAA GGCAACAAGT AATGAAGGAA AATGACTTGC CTAGGTGGGA TTATTTGCCT GTTTTTGATC TACTTCTTTA 'rGATGGTTAC GACTAGGAAT GAAAGGATAT GAGGAGAAAG TTTTCTCAAG TCCTTI'ATTG
AAAAACTCTG
AGAGTTGAAT
TTGCTTTTCT
GGCTACAAAG
CTTGGATT~GT
AATCTCTGTT
ATGAAATGAG
GTATTT-ATCC CTAACTGTTG GCCCrACCT TATAAC'rTGC CACAAGTTGG GTTGATTGGT TCTCCAAGTG ?TTTCGACGCT GGTTCAGCCC ATTTTGATTG ACCAGATTGA AGGTCGCAAT AGTGTGAATT GGAAAGAGTC GACTTTCTAA AAGAAATAAA ATAATCCCAC CTAGAACAAG AAATTATGGG AGCAAGCTCC TAAATCAACT a a a. a.
a a. a a ACTACTTGAT AAAAGTTATA GAAGTAGGCC AAACTTGAAA ATTGAAAATT TCCATTGGAC AGTGTGGT'r AAAAGTTGTG AAGAGGGCTG CGTTGAGGAC AGGTATCCGT TTTGATTGTA AGCGCAGGAA GAAAGAGGAG TAGGAGTAGT AAAACTGTAT 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 GAGAAATAGC TCCTGAAGTA AGGGCGAAGA AAAGGAAAAT ACTGATAAAA ACATGAATGA TCAGTAGTCT AGCTAGTGAT TTCATAAGGC ACCTCCTAhAT CCTGGTCTTT TTTAGCTCTT GCAATACGAA GTGAGTCGAC AATATGTATC ATCACTCCGA AA.AAGAAAGC TCCCAGTATA GTTT'rAAAAA TATGTTTTGT ATTTAGAAGA GAACrGA'rAA AArTCTGATT TTCACTTT AGGGTATCAA TGAGTGGAAT AGACCAGGAT AACGTAACTG TTGATTTTTC TTAGTTTTTG GAAAATAGAA CCGAGAATAC GTCCTGACCT TCATCTATGA TATAAAAAAT ATCACTGTTC CATAAATCGA ACCTCCTTTC TTTCTTTTCT TTTTTCATGA GTTTCCTCCT AATCCTCATC CAATGCGACG GGAGATGAGG AACTGTATGC TCGCTCCGAA TTGATACACC ATTTCTTATA GTGAGAAGAG AATGAAAATA GTATCCTGAG AAGAGGAGTT ATAAAAAACA TCCATAGACC AAAGAACAA.A CCTGCTTTCA GACCTGGGTA GTGTAGTTGC TTGCTTTCTT TCTCATTCAG CATATCTGGT TCAATGACTG TGATGCCtGT TTTTTTCATT TGGTAGGTGA CATAGCCAGA 222 AGCGATGAGG GCAATCACTA AAATCAGAGG AGGATAGATI' rTrATAGGCC AGAAGGAGTG GAATAAGATT TCCGAAAATC AAAGACTTGG TTCCCAATAC TATCGGCCTC ACGCCGTTTG AATACCGTAT GTGCGTTTGA TCAGTTTTTC AGTGAAGGTT CCTTTTTTrAA AAATCTTCCT CCCAAAAGAG ACTGTTGAGG GAGATTGAGA CAGAGTTCCA AGGTTGGATT GTACTTGTCG CTGTCTCGAG ACACCGATAT CCT'rGGCGAG TTCGAGCTGG AAATTCTTTC ACACGATTCA TCTGTTCTCC TTTCTGATTT TATTATAGTC TTTTAAACAT AAAGTGTCAA GTATTTTTGA AGAGCCACTT CTTGAGGGTA ATCAGATAAA AGAGGATGAT TATITCGTCAA GGGGACCAGA TCTTIrTTC.A TGAGTTTGCT TCAGTTTGGA GGCTGCGGGC TTI'rAATCA TATTGATAGT GAAATACCCA ATTCCTTGCG ATGTCGTATA TATTr'GACTA CATATTTr GAAGAAATAG CGGATrrGTG GTAAAATAGA TCCACGAAAC GATTGATATG TGGGCGGAGC AGGACATAGC ATGCCTTTGA CCAGGATCAG TTGAGAAGGG AATGGTGACC *00.
*.06 TAGTCTCCTT GTCCTAT'rTG TAAGATATGA CAAAAGAATT CTTGACGTAA AGCCTGATGG GACTATTTAT TAAGTAAATT AATGCCATTG ACAATGCGCA TTTATCAAGG ACAACTTCCG ATTGATGGAA TTTGTTATGA GGTTTT'rCTT ATAAAAAGGA ACAGCCTATG AAGTGGTGAA TATGGAGAGG ACAAATTCTC AAGCCGAT'rG AGACAACGAC TCI'GACAAGT GCAAGCTGGT TCATCATGTA ACGGTCTTAC TATCTACGTT GATGCGACTT AAGTGAAAA.A GGCCATCTCT AAAACGCTTG GCACCTTACA
TCATTTACAG
CTTGGGAGTG
TGCGCCACTG
CAATTATGAC
TAAACAGATT
TGAGTrAGCA
TCCTGCTAAG
GCATGTTTGC GCGAAGC'rGG TGTTCAGGAA TCTAGTCCTC AATTAGACCA GCGTGAGCGT GACATGCGGA TGAATCAGGA TGCTAGCCTG TATCATGACT TGGTTCGTAT TTTCTTCAAG GCGCGTAAGA TTGAGCAAGC GCGTGAAGTG GAGATTATCA AGTTGGTCAA ACCTGCCAAG CAGATTTTCC AGGCTATTCG AATTGAAGTC 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 AATGATGAAC TGGGAGCGGC GATGGTAGAA TTTCAGTGAT TTCAAGGAAG CTTCAACAGT AAGrCCCAAGA TGGAATTGGT AGATGAGTCC ATCCAGCAGG CTATGGATAT GTTGGCTCTG TACCTTTCAT TCCTTAGAAG ACCGCTTGAC CAAGCAA7"2G TGAAGTTCCA AAAGGCTTGC CTTTCATCCC AGATGATCTC GTCCCGTAAG CCAATCTTGC CAAGTGCGGA AGAGTTAGAA GCCAATAACC GCTCGCACTC AGCCAAGTTG CGCGTGGTCA GAAAAATTCA CAAGTAAGAG GGAAAAAGAT GGCAGAAAAA ATGGAAAAAA CAGGTCAA.AT ACTACAGATG CAACT'rAXAC GGTTTTCGCG TGTGGAAAAA GCTTTTTACT TTrTCCArTTGC TGTA.ACCACT CTTATTGTAG CCAT'rAGTAT TATTT'rTATG CAGACCAAGC TCTTGCAAGT GCAGAATGAT TTGACAAAAA TCAATGCGCA GATAGAGGAA AAGAAGACCG AATTGGACGA TGCCAAGCAA GAGGTCAATG 223 AACTATTACG TGCAGAACGT TTGAAAGAAA TTGCCAAI-rC ACACGATI'TG CAATTAAACA ATGAAAATAT TAGA.ATAGCG GAGTAAGATA TGAAGTGGAC AAAAAGAGTA ATCCG'N'ATG CGACCAAAAA TCGGAAATCG CCGGCTGAAA ACAGACGCAG AGTTGGAAAA AGTCTGAGTr TATTATCTGT CTTTGT'Tr GCCAIW1TIT TACTCAATTT TGCGCTCATT A'rTGGGACAG GCACTCGCTT TGGAACAGAT TTAGCGAAGG AAGCTAAGAA GCTTCATCAA ACCACCCGTA CAGTTCC'rGC CAAACGTGGG ACTATTTATG ACCGAAATGG AGTCCCGATT GCTGAGGATG CAACCTCCTA TAATGTCTAT GCGGTCATTG ATGAGAACTA TAAGTCAGCA ACGGGTAAGA TTCTTTACGT AGAAAAAACA CAATTTAACA ACGTTGCAGA GGTCTTTCAT AAGTATCTGG ACATGGAAGA ATCCTATGTA TTG4GAGCAAA GGGAAATGGG AAGCTGCAGA GGTCAAGGGG GACAAT'rTGC TTCTAGTTTT AGAGCT'rGCT GGGAACCTCT ACGGCATTA'r TACCTATGAA TTTCCCAACG AACGATGGAC CCTTTATGGA AACCCAGATG CGACTTTGGT CAGTGCTAAA ATGCAGATAC AAAAGAAGCC GTAACTATGA GCCAGGTTCC
AGAGAGCAAC
ATTACCTATG
ATTGATTTTA
ATCGGTCTAG
TCTCGCAACC
CCAATATGAT
CAACCAGTCC
CTCAGCTCCA
TAATCTCAAG CAAGTTTCCT GTCTATCAA-A AAAGAATTGG CAATCGTAGT TACCCAAACG TGAAAATGAA GATGGAAGCA CAG'rATTCTT GCAGGGACAG TGTACCCGGA ACAGAACAAG GGAATGGAGA GTTCCTTGAA AAGGATCGTC TGGGTAATAT
GGTAAGGATG
GATGCI'TTTC
ACAGGGGAAA
ATTACAGAGG
ACTATGAAAG
T?1'ATACAAC CATTTCCAGC AAGAGAAGGT AAAAGGAAAG TTCTGGCAAC AACGCAACGA ACTTTGTTTG GCGTGATATC TGATGATGTT GGCTGCTGCT GTAGTGAGTT AAAAATTGCA CTGGTGGCAG AACGATGACT CCCTCCTTGA GCAAAAGATG TTGGAGTTCC GACCCGTTTC
CCCCTCCAGT
TACATGACAG
CCGACCTTTG
CTTTACCAAA
ATTGATAATA
GATGCCACCA
TTTTCTCAAG
GGAGATGCTA
GGTTTGACGG
5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540
ATACCTT'TCC
TTCGAGATTG
GTTTTGCACA
CCTGGCTTGA
ATGAGTATGC
GACAAGGGAT
ACGGTGTCAT
AGGAGGAGAA GTCTTTAATA GGACGTTAAT GAAGGAr'rGA CTCAAGTAAC GT'rGGGATGA TTATCT'rAAT CGTTTTAAAT TGGTCAGC'rT CCTGCGGATA TTCAGTGACC CAGACGCAAA GCTGGAGCCT AAA'IrTATTA
ATATTGTCAA
TGATTCGTGC
GTGCCATTTA
CAI'TGCGCAA AGCTCATTTG C2'TTACAGCT ATTGCTAATG TGATCCAAAT GATCAAACTG CTCGGAAATC TCAAAAAGAA ATTGTGGGAA ATCCTGTTTC TAAAGATGCA GCTAGTCTAA CTCGGACTAA CATGGTTTTG GTAGGGACGG A'rCCGGTTTA TGGAACCATG TATAACCACA CCACAGGCAA GCCAACTGTA ACTGTTCCTG GGCAAAATGT AGCCCTCAAG TCTGGTACGG CTCAGAGCc
CGGCTGTATC
AACCTGAACA
CTTCAGCTAT
GTCAACAAAG
224 TGACGAGAAA AATGGTGGTT ATCTAGTCGG GTTAACCGAC TATATT'TTCT GATGAGTCCG GCTGAAAATC CTGATTTTAT CTTGTATGTG ACGGTCCAAC TTATTCAGGT ATrCAGTTGG GAGAATTTGC CAATccTATc TTGGAGCCGGG GAAAGACTCT CTCAATCTC AAACAACAGC TAAGGCTTNA GAGCAAGTAA TCCT'rATCr A'rGCC'rAGTG TCAAGGATAT 'TTACTGGT GAT'rrAGCAG AAGAATTGCG TCGCATCT'r GTACAACCCA TCGTTGTGGG AACAGGAACG ACAGTTCTrCc
ATAAAGCAGA
CTAAGTGGCT
'rGAAGAAGGG AAGAATCTTG GGAGGTTCCA GATATGTATO CAATATAGAA CTTGAATTTC CCCCGAACCA GCAAGTCCT'r GTTGGACAAA GGAGACTGCT
AAGATTAAAA
ATCTTATCTG
GAGACCCTTG
ATGTTCGTGC TAACACAGCT ATCAAGGACA AATATGTTTA TTTCCATCAG CCGGCCTTTA TCCAATTTTA GTCAAACAGC ATCAGGCAAA ACTTCTGTTT TGGTTGCTTT GGAATGATTT TGTTCATCT AAGGTCTTTC GTAAAATCAA CTAGGTGGAG TTATCTTCTA TGCTGGAAT'r
TAGAAAGGCG
AGCTGGGACT
CTTTTrCGCC
GGTCTTGTAT
TGAGGGGCTT
TCTTTTCTAT
AAGGTTCGGG CTCTACTGTG CAGAAGCAAG TTAAAAAAAT TACATTAACT TrAGGAGACT GTGACATTTT TACTAACTTT AGTAGAAATT CAAATTACAG GCCAGCAGAT GCATGAGGAT CCTACAATGG GAGGTTTGGT TTTCT'rGATT CTATTTAGTA GCCAATTCAG CAATAATGTG 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8148
GGCTTGGTCG
AATCCTAAGC
GAGCGCGGTG
GGTTATCCAG
TTT'rCAAACG
ATTAGTTTGT
GTGATTCTTG
AAGGTCTTTA
ATGGCTCTCC
TTCATTTGGG ATTTTTCTAT ATTTTCTTCG CAGTAAACTT GACAGACGGT GTTGACGGTT CTGCCTATGG AGTTATTGCC TATGTGCAAG CCATGATTGG TGGTrrTGCTC GGTTTCTTCA TGGGTGATGT GGGAAGTTTG GCCCTAGGTG ACCAAGAATG GACTCTCTTC ATTATCGGAA GAT TTT TAGA TGACTTTCTC AAAAATTAGC TCTTCAGCTT GCGATATCCT GTCTGTCTTT CTCTTTTCTG GCTAGTCGGT TAGCTAGTAT TTCCGTTGTG GTCAGATGGA TATTCTTCTA TCTTTAACCA TAAGCCTGCC GGATGCTGGC AGCTATCTCT TTGTGTATGT TTTTGAAACA ACTTCTGTTA TGATGCAAGT CAGTTATTTC AAACTGACAG GTCGTAAACG TATTTTCCGT ATGACGCCTG TACATCACCA TTTTGAGCTT GGGGGATTGT CTGGTAAAGG AAATCCTTGG AGCGAGTGGA AGGTTGACTT CTTCTNTTGG GGAGTGGGAC TTCTAGCAAG TCTCCTGACC CTAGCAATTT TATATT'rGAT GTAAGAATGG CACCCTGATG TTTCAGGG INFORMATION FOR SEQ ID NO: 12: SEQUENCE CHARACTERISTICS: CA) LENGTH: 9909 base pairs TYPE: nucleic acid 225 STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 0* TACTCCACCC TTAATATCCG AACTTTTAAA TGCTTGTCTT TTTAAATAAA AACCTTGCTG ATCAATAAAA TAGTGATATA ATTTCCTATC ACGACATCAA TATCAT'rCTA TTCTTr'I-CC TTCTAGCTCA TCCGCAAACA ACTACTTrTT T'rCAATCCAT TTTCTTTTGC TCTAATGTTG TGCCAAAAGA TTCTCACGCG AGCATGATAA GCATCTTTTA AATCCGTTGC AGTTTTCTAT AGTTACGGTA TCATATCTCG TCGTAAGTCC ACATACTCCT TTGCTTTGCG ACCAACCACG T'rGTrCATAT TTGGATTTTT CTTAC'rTGGA TTGATGCGAT CGTAnGCCAA ACAATTGATT AGCCGATTCT TTAGCCAAAC TATTGTAAGC CCTATTATAT TTCCTGTAAA TACTTTACCG CAAGCATCTT TTCCATCCAA GGGTGATTAG TATAGATTTA TCTGCGATTT TATAAGCTTC TCGGCTCA'rC TCTGGCTTCT CCTGTTTCCT GATACGGAGG ATTTCATTTC ACTTTCCTCG CTAGATAGGC GCTCAAAACC AGTCTTTGAT ATGGGTTTTA GATTCTTCTA CTTCTTGGAC AACTCAATTG TTGAGATTGC TTTTGTTTAG CTGAATAAGG CCATCTGAAA GACATTGTAA GAGATAATAG TCGCAATTTC GT'rGATTTCC AGTCT'rAGCT AGATAATAGT CCTCAAAAGT CCAAAAGGAG AGAATCTCCT TGATACTCAT AACCATACGA CAAGTTTATA AAATGTGACT TCATCTGAAA CCTCACGACT CTTTTAAGTTr CATAGAATTG TTTTTAGGAG TTTGACCAGC CAACAAAACC AACTCGCTCA GATAATGGAA TTACCATATA AGGTGCTTCA CCACAAGTTA CAAGACTTAA CGAGCCTAAT TTrCGATTCTA TTGGTGTAAA CACTTCTGCC CTTATTTTTG CAGATCTGGG CTGAATCAAG TTGGCAAAGT CACTTGGAGC AAATCCCTTT CCTAACAAT'r TCTTrGTCGT TCGATCTTTT 
AAAAGAATTT
TTTCCTCACC
CCTCTAACCA
CATATCCATT
TCCGATCTTT
TTCCAGTAAC
CATAAGAATG
TTAATAAGTC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440
TTTCTTCACT
CATATTTTAA
CAATAAATAT
ATTTGTACGA
ACCAAAGCCT
ATAGCTTCT
CTCTCCGATG ACCAACTTCT TAACTCGATA ATTACCAATT TTCCGTGTCG TCTTGGGTCT GCGTATAACG GTCCAATTTT AATATCTATT GTCATCAGCA ACCTCTCTTA AGAATGAAAA TTTACTTGAA AAAAGTAATT AGAGTAGCAA CGACTAATTC ATCATC1'ACA CTATAGCGCC ATTGACCAAC GCGATTACCA TCCAAAACAT TGGTTTGTAA ATAGTTTGTA TTCAATTGCT TGATAAAACG TCTTGTTGGA ACTA.ATTTAT ACAAATTATT CATCCTTCA.A GCCTAAATCA TGCATCATTT CTTCCCAAGT AATGGGTTCA ACTCCTTTTT CCAAGTCTTC TA.AATACTCT TGATAGGCTA AATCTGCCAC ACGAGCATCG TATTCATCTT CGAAACTCCT TCAAACTTAG TAATGTAATA GTAGTCATCr TTTTAGGTGT TCAGTCAATA CAATGGTGTT GCGGTATCTG ATCAGCTTGA GCCAGCTGAT CTTAGATAAA TGCCCAAGGT TTCAGCACCG TCCTCGTTAG CCAAGCGTAA GAACCTGATC ATCCGCATTT TCGACAATGC GACAAALACTC TTATCATCCT 226 CTAGGGCTTC AAGAGTr'rTG CCATTGCTTT CATAAATGTT TTTIGTGCTCC CTTTTTTAAT TAAAAAGAAC ACCTTCTCAG
GTGAGGTATC
TGACCATGGT
AAATCI-rCr'
AAAGGTGACC
GCAAAATCTC
A'rAAACCTrrA
CATATGAGCC
AGTACGATTT
AAGGTCAGAT
TACATCATCG
CCGCCATACG GTCACTGACA TCATAAAGCG ATAGAACTGC TACACCAAAA CTCTCGATGT CGATATCTCC AAAGGTTTTG ATGCTTTTGC GAAGAATCCA CCTTGCCAAG ATATTTACTA GTGCGAATAA GTTCCGAAAG TTATCAGCTT CAGAAACTTT GGTAACACCA T'rGTAT'rACT CG'N'CTTTCT ATATCTCTGT AAGTCTACTC CGAC'TCCCAG AGI-rCCT'rGA TATTGTTTTC CCTAGCGTCC GAATCATAGC AGGATTCGTT GTTTGAGTCG 'rTGGCCTCGA TAAGATAACC TAACCTGTAT CTGTCAAGAG GGTGCGACTG CATCATGGCT GTTTTACCCA TTTCAAAAAT TTTTCCATAG CTTGCCAGCT AAAACGCCTA CTCCATGGAT 'rCT-CTGGCT TACGGTTAAT GCATCTACTA AAAGCTTCTT GACGCTAAAA TACTGTATTT CCATACTTCT TCTTTCACTG CTTGCCGTAT TCACTCTTGG GATAGACACT CCTAGACCTG GAAACGGTCA AAGATACGTG GGATAAAATC ATCTGGTCTT CGAATACTTA ATAGCATTAT AATTTCCATC CAGATAGAAT ft.
CTTTTC-ATTG GCATAAAGAT ATGATCTGAA rGCTCATGGG 'IrCAGCTAGC AGACTGGTAA TTTTGAGGTT TCCAGATAAA AAAGCCTATT TCACTCATTC CATCCTTATC ATAAGGGAGT CCCAAATAAA GCCCTTATGT TACCACCTTG TGCACGACTT GTAAATCCTG CETrAGGAA'rC CAGTTGTCTr CATTCTGACA 'rTAAAATAT'r GTCGACAACC
CCATACCATA
TAATCAAGAT
TTTTCTTGCC
AAGAATTTCC
TAGTCTTCTA
ACAATGGTAA
TGTTTGATAA
CTAGCACGAT
CCCAAACCGT
GTGAT'PTTAC
TGCGTCATCT
CT'rGCGAGCC
GGCATCCAGG
AGACAAGCCT
ACTGGAACCC
CT'rCATCCTC
AGGTTGAACC
TTTCTTTAGC
CCACACGATA
GGTCAGAAAT
CCCCATCTGG
TATCTGTATC
1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 TGATGGGATA ATCTCTCACC AACTCATATT CAAAACGATT GAGGATAAAG GrAATAAAAG GACTGGTAGC ATT~ATCAATA CGTGAAAGAT TTTCTCCrT TTCCTGTCCT TTCATCTTGT CAGTGAAGTT AATCAGTTCC ACATCTAGGT GGAGGAGATC CGTCACCA'rG CGCATCATAC
GGTTGGTCTC
CCTCATCCAA
TAACATTGGA
ATCAAGAGAA ACCTTGATAA AGTCTGGTGC TACAGTTTCA CACAAAGCCC GGCTTCAAGA TAGGATTTTA CGCTAGTCAG AGGAGTCCGT AACTCATGGC AACAAAGAGT C'rTCGTTCGC GTTCTTCCTT CTCCTGCTCC GTCGTATCAT 227 GCAAAACAGC CACCAAACCT GAAATAAAGC CAGACTCTCG CTCGAAGGTT CAAATArrCG CCATTGATAT CTI'GGGAATC GGGTAATCAA ATCACGCAAT TCATAGI= CTDCTATCrr TATTCAGAAC ATCTTCCN'A ACCAACCCCA GTTGCTCr TAATCTGACC CCGACGGTTA GTCGCAAGAA CCCCATCTGT TTAGCCTCT'r ACTC'TCTTGT TCTAGATTTrT CCTGAGDGAG CATTCAAATT ATTGGTAATA TTGGTGATTT CAGACCCACC AATAATCTCC TGCAATCAAA TCTTTAACCT TTTGATTGAC CACGTCTAT'r TTCCAGTAAT AAGAGGGTCA CAACAAGGAT ACGTATCAAG GCAAAGCGAA TAGCAACAAT TCTGGACTrT GAGCAATTCC AAAATGCTTC GCCTGTATCG TTAATCATGA CATATAAAAC AGAATACTAT ACGAATAACC TCCGACAAGT TTGCATATCA AGAACCTTGG TTGCTTCAAC TGAATATTAT GAAACCTAAC AAAATCAGGA TAAAGATAAA ATCTCTGGTA AAAATGGTTr' GTT'rCAG'rAA ATCAAGCATT ATTTCTCATG 0 0 0.
TAATACCCTA CACCACGGCG CGTCAAGATA TTCTrCACGCA GACGTCGTAC AGTCACATCA CCCCAGACAG TCTCAAGCAA GTGTTCGCGC TGATACAAAA GCTCAAATTC ACGATGGGTT ACGTAGGCGT CTGGAACAAT T'rCTAAATCC GCTTCCTGAC CATCTACTGG CATACGPTTGA TGCAACTCAC GATTGGAGAA GGGTTTTG'N' ATAACC'rTAT CAAAT'rCACT AT1CTTTGGCT TT'ACGAATGG TCTTAGCAAC TTCTAAACCA ATAATATCTG GTTGCTCTGC TTCAAATTGC ACAACTTCGT AACCTTCCTT GGTCATATTA TCATCTACAA TTAGTATTTT .TTTCATATGT AATAGTCAGA AGACACAATA GCTAGTCTTG CTGCCAGATT TTTGTTGGG GTTTGGCAAG CCAGCGAACT TCCCTATCTG AAAAATCATG TACATGCCAT TTTCGATGAC TAAAAACATG 'rACTCTGGTC GGCTGGGCGT ATCTTCA-ATC ACTGTACGGA CATCACCAAA ATAGTCATAA G'rGATGACTT GACCTGTATG CGATGCTAAA AAGTCTAGTT CTTCGCCATA TTTTTTAGCC CCAATTTGGA TAGGT'rGAGG TTTACTATCT 342o 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 GAACGACGCA GAAGAGCTTT ACATAGTCAT CTGCCCCAAG GAAAGCATAA GAATGGGCAC TCAATTTCTG GAAGCATCAA TCTAGCGC'rT CACGACCATT AACTTGATAA TATCCGAGAT TCACC'rTTTT CTCTACTA~r GCTACTGTCT AAGTTCG~CTT TGGGTAATTC TTGAATTCTT GAAGTCACTC ACCTGACCTG CTGGACTGTA TCAAAACAAA
AACACGCCCC
TTCCAAACCG
ACTGCTTGTC
ATCCAGAATA
AAAAGCAGTT
TGGT'rTCTCA
ATACCAAAAA
GTGCATAAAC
CTGGTGAAAG
CTACAATCTG
CATCAAGCCA
CAAAGTTCAC
ATCAACATCT AGGTCATAGT CCTGCTGGAA ACTCTCTTCT GGACTGGGAC
ACTTTCTTCC
TTCTATAAAG
GCAACCTGAT GAAAGAGGTC AAACTGCTCT GGGAAATGCC AAAAACCTGC CAAGAGCT'TT TCTTGCGAAA AGTI'ATCAAC TCGCTTTCAT TITTTTTCAAG 228 TAAAAATTGT CCTTGAGAAT TTTTCACAAC TAAGGC'rTTA AGATAAATAG GAACCGGCTT TTTCTTAGGA GATTTAATTG GATAACGGTC CATGGTTCCA TTCTGATATG CCGCACTAAA GTCCTTGACT GGGCTTCTT CAGGTCTGGG ATTTACAGGA GACTCAATAT CAGACCCTA GTCCATCAAG GCT'rGATTAA AATCACCCGG ACGATCCGGA TTAATCAAGA TCTCCATCA'r TGCCTGAAAA A=M1'CCAT TACTIGAAT CCCAATATCG TGGTTGACTT
CGCCAAGACC
GGAAATGGCT
ATTTGGAAAT
CGCATGACAT TACCATCTAC AGCPGGCTCA GGCAAGTTAA
CCTGCTGTGT
TGGCCACCAA
AAGGTCCAAT CCCTT'rCAAG CTGGAAATTC
CAAACAGACG
AAGCAATACT
CTTCATAGGT
GCATATTGCG
CAGGCGCAGT
GGATAACTGT
AACTCGAGAA TAATAGCCCA TGCCAGACTT 'rCGACAGTTG ATCCACCCTG GTCTGCTGAA TCTCCTCCAA GGCAAATCTC AGAAATGACT TTCTCCTCCG TCTAGTATAA CACAGAAGGT ATAGTATATA ACTrTTTCTAT CTAGCCGCAG GTTGCTCAAA ATCATATAcT ACGGCAAGGT TCTTATTGAT GAACTGCTTG CACTTAAAGT CAATTTCAAT AGTCAGTCAT AATCTGCTGG ACCCCTCCCA AGCTTTCAGT GAAACCAGTC CAAAAATCTT GCATGATTTC AGATACCCAG
TTTTGTTTTC
GCCACATGAC
TTCACCTGTC
CTACTTATAC
ACACTGTTTT
ATCATACCAA
GATACCGTAT
TTTGTATCTG
TCAATGAAAA
GAGGTTGTGG
ATtGTGATAAG GATTTTTACT GCGAGAAGTT TCTCACGGAA TCTTTCAAAT CTAACATATC ATTTATPATrA TTTTCAATAG TCAAAGAGCA AACTAGGAAG ATAGAACTGA CAGAGTCAG'r
GCTGCAGCCT
AAACTCTCCT
TCGTAGTAAG
5160 5220 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 GAAGCTGACG TAGTTTGAAG AGATTTTCGA AGAGTATAAA CAGTCTGAGA AAAAATGAGC TTGGATATTA TTTCCAAACT CCACTAGAAC AAGCCTAGTA CAG7-rCCATC GCTTTCAACA TCCATGTTGA GAGCTGCTGG ACGTTT'rGGA AGACCTGGCA TGGTCATAAC ATCACCAGTT AAGGCAACGA TGAAGCCTGC ACCTAATTTT GGTACCAATT CACCAATGGT AAT'N'CAAAG TTTTCTGGTG CTCCAAGCGC A'NTTGGATTG TCTGAGAAAC TGTATTGAGT TTTAGCCATA CAGATTGGCA ATTTGTCCCA ACCGTTTTGA ACGATT'rGAG CAATTTGTGT TTGAGCTr'rC T'rCTCAAAGT TCACTTrTGCT ACCACGATAG ATTTCAGTGA CAAr-TTTTTC AATCTTTTCT TGGACAGAAA GGTCATTATC ATACAAACGT TTATAGTTAG CTGGATTTTC AGCAATTGTC TTAACAACTG TTTCGGCAAG TGCTACTCCA CCTTCTGCTC CATCAGCCCA GACACTAGCC AA?1'CAACTG GTACATCGAT TGAGGCACAG AGTTCTTTTA AGGCTGCAAT TTCAGCTTCT GTATCAGATA CAAATTCGTT AATAGCTACA ACTGCTGGAA TACCCAACTT ACGGATATrTr TCAACGTGGC GTTTCAAGTT AGCAAAACCT GCACGAACTG CCTCTACATT TTCTTCAGTC AGAGCGTCT'r TAGCCACACC ACCATTCATC TTAAGGGCAC GAAGGGTTGC GACAATAACA ACTGCATCTG GAGATGTTGG CAAGTTTGGT GTCTTGA'rAT CAAGGAAT1Tr CTCAGCACCA AGGTCCGCAC CAAAACCAGC TTCAGTAACA GTGTAA'rCAG CCAAGTGAAG CGCTGTTGTC GTCGCCAAAA CAGAGT'rACA GCCATGAGCG ATATTGGCAA ATCGACCACC GTGTACAAAG GCAGGTGTAC CGTAAATTGT CTGAACCAAG TTGGCTTAA TAGCATCCTT CAAAATCAAA GCCAAGGCAC CCrCAACCTG CAAATCACCT ACAGAAACAG GCGTACGGTC ATAGCGATAA CCAATAACGA TATTCGCCAA ACGACGTT'rC AAGTCCTCGA TGTCCGTTGC CAAGCAAAGA ATTGCCATGA TTTCTGAAGC AACTGTAATA TCAAXACCAT CCTCACGTGG AATACCGTT'r AGAGGACCAC CAAGACCAAC AGTCACATGG CGGAGCG'rAC GGTCGTTCAA GTCCACAACG CGTTTCCAGA GGATACGACG TTGATCAArr CCCAGCTCAT TCCCTTGGTG CAAGTGGTTG TCAATCAAGG CAGAAAGGGC ATTGTTG~CA GTTGTAATAG CATGCATArC TCCAGTAAAG TGGAGGT'rGA TGTCTTCCAT TGGCAGAACT TGTGCATACC CACCACCAGC AGCACCACCC 4* TTGATCCCCA TGACTGGACC AAGAGACGGT ATCTTGTTCA AGGCATCCGC A.AGACCAATG GTTGGGTTGA TGGCAGTAAC CAAGA TCAAT ArTTTTATCAA AGCTGAGTTT AGCCTTIGTAC ATACCAAGTT TCTCTACAAC ATCAACAATT ATATCTGTTT TCATTCAAAA TTCCTCTAAC ACAAGATTTT TAACATCCTA AAACTCTCTA TTTTAGAGTC 
CTTTCTTAAA TTTTATATGC GTTrTTTACCA AAAATTTATC ACTTTCAT'TT GCTATGAAAA TTTTAGTTAC ATCGGGCGGT ATCACTAACC AT'rCTACAGG TCACTTGGGG GGGTATGAAG TTTGTTTAAT TACGACAAAA CTAAGTATTC GAGAAAT'rAC CAATACCAAG TCGCGGATAG CAATCATGGT TT'rCTTGCCA GTAAGCGTCG AC'IrTCC'rrC ACCTGCAGGT TTACCGACTG GATTGCTCTC AACTGCACGA TTTCCGTACA ACTCCAAATC GTCATAAGAA GGCTTCAACT CAATACTCTG TGCGATTTCA CTCTTATATG ATAATTCATT ATATCACAAA AACGTTCGTA AATATCTCTG NTTTTAAGAC CTTTATAGTT TGAAAC'rATA ATAAATCTTC TACTTAC6GC 'rTA7-rTTTGT GTACAATAGT ACCAGTGAAG CTATCGATAG CGTCCGCTCT AAAATTATCA CAGAGACTTT GCTT'rCTGCA CGAGCTCTGA AGCCAGAGCC TCATCCTAAC GACCTTrCTA-A TAGAAATGCA AGAACG'rGTT 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620* 7680 7*740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 CAGGA1'rATC AGGTCTTGAT CCACTCAATG GCTGTTTCTG ACTACACTCC 'rGT?7ATATG ACAGGGCTTG AGGAAGT'rCA GGCTAGCTCC AATCTAAAAG AATTTTTAAG CAAGCAAAAT CATCAGGCCA AGATTTCTTC AACTGATGAG GTTCAGGTT1T TGTTCCTTAA AAAGACACCC AAAATCATAT CCCTAGTCAA GGAATGGAA'r CCTACTATTC ATCTGATTGG TTTCAAACTG CTGGTTGATG TTACCGAAGA TCATCTGGTT GACATTGCAC GAAAAAGTCT TATCAAGAAT CAAGCAGATT TAATCATCGC GAATGACCTG 230 ACTCAAATTT CAGCAGATCA GCACCGAGCT GTCCAGACTA AAGAAGAAAT TGCAGAACTC TAGAAAGGAA AACTATGGCA AACATTCTCT
ATATTTGTTG
CTCCTTGAAA.
TGGCTGTAAC
AACAAGGCCA
TGACACTACA
CTGATCAGGT
CAACTGCTAA
AGAAAAATCA GCTTCAAACA AAATrCAAGC CTATCATTCT GGGTTCAATC GCCTCTTATA TCAAGTCACT GTCTTAATGA GGTACTCTCA CAGAATCCTG CAATCATATC GAACTTGGAA CACTATTGCA AAACTAGCTC AGTCGGCAGA TNTrAGTCAGT CTCAGGCTGC TACAGAGT'rT TCCACTTGGA TGTCATGAAG AAAAAGCAGA TTTATTTATC ACGGATTTGC GGACAACATG CAGCTCTAGC CCTACCAAGT CATAT'rCCCA AACTAATAGC TCCTGCTATG TGTATGACCA TCCAGTAACT CAGAATAATC TGAAAACATT AGAAACTACG GATTGCTCCT AAGGAATCCC TACTAGCTTG TGGAGACCAC GGACCAGGAG CCTCACAATT ATTTTAGAAA GAATAAAGGA AACTATCGAT GAAAAAACGC
TCTCTAAAAA
ATCCAACCTT
GAACCCTATC
GTGGTACCTG
GTAACCAGTA
AATACAAAAA
GCTATCAGCT
CTTTAGCTGA
TCTAATATTG
TCACTTATCT
ATTATTGCCA
T'rACTTAGCT
TTCGTACCAA
GGTTTAACTC
8700 8760 -8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9909
CACCCATTGC TATCTTTTT TTAACCTTT'r TCCATTTCCA GCATTATTTA TGGTCCACGA TGACGGTTAA CACGAT'rACG ACGGAAACAT CTACTCAGCr CTTACTTAGT CTATAA.ACTG GTTCcTTGAC AAATACTATC ATAATGGAALA TATCCAACTT TGGTCATTTC TGCAATTCTA
AAAAACAGG
GCTACCATGC TCGTGATACA CTTTCTGAGC ATCAAACCGA CCATTGTTCA TATTCCTGTC GTTGGGGTTA CACTrGGATT TTTGATGGGA ATTCTACCGA CAAGCTACCT CTTCTCTCCC ATCATTGCCA TCGTCCCACG TATTTTGATT ATGAAAAACA AGACTGGTCT TTTGTCCTTG GAGGAATCTT CTTCTGGCAA CCGTTATCTC ACCCTAGCCA TTGTTCCACG
GATTI
5 FAGCT GGAGCCCTTG CTTCCTATTr' GGAAATGTTT AACAAATTCA ATTGCTGAAT ACTACAAACC TTGAAAAAAT
INFORMATION FOR SEQ ID NO: 13:
SEQUENCE CHARACTERISTICS:
LENGTH: 1126 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
TAATTTI'CAT ATAATAGTAA AATAGAATGT GTGATTCAAT AATCACCTCA AATAGAAAGG AAATTCTATG TCAAATCTAT CTGTTAATGC AATTCGTTTT CTAGGTATTG ACGCCATTAA TAAAGCCAAC TCAGGTCATC CTTTACAAAA CAACTCATA TATrCTTT-CA GCAGGTCATG TGAAGATGTC AGCATGGATG TCACCCAGAA 'PTTGGTCATA.
GATTTCAACT GCTACTGGTT TGAAGGTTAC AATATCTTTG GGAAGGTGTC TCAAGCGAGG TGTTCTTTAT GAT'rCAAATG 231 CAGGTGTGGT TATGGGAGCG TCAATCCAGC TCAACCAAAC GT'PCAATGCT CCTPATGCT AGATTAAGAG TTTCCGTCAA CGGCAGGGAT TGATGCTACG TTGCCCAAGC AGAACGTrTC ACCACTATAC TTACGTTATC CAGCTTCATA CGCAGGCTTG GCTCCGA'rGG
TGGATTAACC
CT'DCTCACC
CTTACAGCCT
GCGACCGCT'r
TTTCTGGTTT
TGGGGTTCAA AAACACCAGG ACAGGTCCTC TAGGGCAAGG- TTGGCAGCCA. AATATAACCG TGTGGAGACG GAGACTTGAT CAAAAACTTG ATAAGTTGGT kTATCAACTT GGATGGTGAG ACAAAGGATT
CCTTT'ACAGA
AAAATGGAAC
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1126 AAGTGTTCGT GACCGTTACA ATGCCTACGG TTGGCATACT GCCTTGGTTG
AGACTTGGAA GCCATCCATG CTGCTATCGA AACAGCAAAA GCTTCAGGCA AGCCATCTTT GAT'rGAAGTG AAGACGGT'rA TTGGATACGG TTCTCCAAAC AAACAAGGAA CTAATGCTGT ACACGCGCC CCTCTTGGAG CAGATGAAAC TGCATCAACT CGTCAAGCCC TCGGTTGGGA CTACGAACCA TTTGAAATTC CAGAACAAGT ATATGCTGAT TTCAAAGAAC ATGTTGCAGA CCGTGGCGCA. TCAGCTTATC AAGCTTGGAC TAAATTAGTT GCAGATTATA AAGAAGCTCA TCCAGAACTG GCTGCAGAAG. T.AGAAGCCAT CATCGACGGA CGTGATCCAG TCGAAGTGAC TCCAGCAGAC TTCCCAGCTr TAGAAAATGG TTTtCTCAA GCAACT
INFORMATION FOR SEQ ID NO: 14:
SEQUENCE CHARACTERISTICS:
LENGTH: 2520 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
CCGGCAACAA AAAAGAAAAA ATCAACAGTT AAAAAAAATC TAGTCATCGT GGAGTCGCCT GCTAAGCCAA GACGATTGAA AAATATCTAG GCAGAAACTA CAAGGTTTTA GCCAGTGTCG G GCATATCCG TGATTTGAAG AAATCCAGTA TGTCCGTCGA TATTGAAAAT AATTATGAAC CGCAATATAT TAATATCCGA GGAAAAGGCC CTCTTATCAA TGACTTGAAA AAAGAAGCTA AAAAAGCTAA TAAAGTTTTT CTCGCGAGTG ACCCGGACCG TGAAGGAGAA GCGATTTCT'r GGCATTTGGC CCATATTCTC AACTTGGATG AA.AATGATGC CA.ACCGTGTG GTCTTCAATG
S. 55 S 232 AAATCACCAA GGATGCAGTC AAAAATGCTT TGGTCGATGC CCAACAAGCT CGTCGGATCT CTATI'TGTG GAAGAAGGTC AAGAAGGGCT TTAAACTCAT CAPTGACCGT GAAAATGAAA CAGTTGATGC TGTCTTTAAA AAGGGAACCA A'rCGTAAAAA GATGAAACTG ACCAGCAATA CGAGTAAAGA CI-rTTCAGTA GATCAGGTCG TACCC'rATAC CACTTCATCT ATGCAGATGG GAAAAACCAT GATGGT'rGCC CAACAGCTCT AAGGTTTGAT TACCTATA'rG CGTACCGATT AGGCGGCAAG CT'rCATTACG GATCGTTTTG TCAAAAACGC ATCAGGTGCT CACGATGCCC ATACACCAGA AAGCATCGCT AAGTATCTGG TCTGGAATCG TTTTGTGGCT AGCCAGATGA AATTGTCTCA AAAAGGGGTT CAATTTGCTG TI'AAAcAACC TCGTAAGATC GATATGGACT TGGATCGC1-r GGTAGGGTAT TCGATTTCGC TGTCAGCAGG TCGCGTTCAG TCCAT'rGCCC TCAATGCCTT CCAGCCAGAA GAATACTGGA AACAATTTC-A TCTCCTTC TATGGAGTAG ACGAAGTCAA GGAAGTCTTG TCTCGTCTGA ATAAGAAAGA GCGCAAGCGC AATGC3'CCTT ATGCTGCCAA TAAAATCAAT TTCCGTACTC ATGAAGGAAT TAATATCGGT TCTGGTGTTC CGACTCGTAT CAGTCCTGTA GCGCAAAATG GTAGCAAGTA TTCTAAGCAC GGTAGCAAGG ATGAGGC'TAT TCGTCCGTCA AGTGTCTTTA ACAAGGATCA GCTTAAGCTA TATACCCTITA CAGCGGCCGT TT'rTGATACC ATGGCTGTTA CCAATGGTAG TCAGGI-rAAG TTTGATGGTT 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 .1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
ATCTTGCCAT
ATGTGGTCAA
ATI'CTGAAGC
ACGCCCCAAC
TTATAATGAT TCTGACAAGA ATAAGATGTT ACCGGACATG GTTGTTGGAG ACAGGTCAAT AGCAAACCAG AGCAACA'N-r CACCCAACCG CCTGCCCGTT AACACTGATT AAAACCTrAG AGGAAAATGG GGTTGGACGT CCATCAACCT CATTGAAACC ATTCAGAAAC GTTAT'rATGT TCGCCTGGCA CCCAAACGTT TTGAACCGAC AGAGTTGGGA GAAATTGTCA ATAAGCTCAT CGTTGAATAT
TCGTAAACGT
AAGAGCAGTG
AGGCTGAAGA
AAGTGTGTGG
GCAATTTCCC
CAAGCTGTCA
CTTGCAATCG
GACCTTCACA GCTGAAATGG AAGGTAAACT GGATGATGTC GCGACGGG'rC ATTGATGCCT Tr'rACAAACC ATTCTCTAAA
TTCCCAGATA
GAAGTTGGAA
GAAGT'TGCCA
AGAAATGGAA AAAATCCAGA T'rAAGGATGA ACCAGCTGGA TT'1GACTGTG CAGTCCAATG GTCATTAAAC TTGGTCGTTT TGGTAAATTC TACGCTTGTA AGATTGCCGT CATACCCAAG CAATCGTGAA AGAGA'rTGGT GTTGAGTGTC TCAGGGACAA ATTATTGAGC GAAAAACCAA GCGTAATCGC CTAT'rCTATG CTATCCAGAA TGTGAATT'rA CCTCTTGGGA CAAGCCTGTT GGTCGTGACT GTCCAAAATG TGGCAACTTC CTCATGGAGA AAAAAGTCCG TGGTGGTGGC TTTGTAGCAA AGGCGACTAC GAGGAAGAAA AGATGGCTCT TTGTCAACTG AAGTCAGCTA AGCTCGAGAA AGGACAAAT'r TTGTCCTTTC T'TTTTTGATA
AAGCAGGTTG
TAGTGGGTTG
TTC-AGAGCGA
TAAAAATCCG TTTTTTGAAG TTTTCAAAGT TCCGAAAACC AAAGGCATTG CGCTTIGATAA 2220
GTTTGATGAG ATTATTGGTC GCTTCCAATT TGGCGTTAGA ATAGTGTAGT TGAAGGGCGT 2280
TGACGA'TTTT CTCTTTGTCC TTTAGAAAGG TTrTAAAGAC AGTCTGAAAA AGAGGATGAA 2340
CCTGCI'TTAG ATTGTCCTCA ATGAGTCCGA AAAATTTCTC CGGTTCCTTA TTCTGAAAGT 2400
GAAACAGCAA GAGTTGATAG AGCTGATAGT GATGTTTCAA GTCI'rGTGAA TAGCTCAAAA 2460
GCTTGT'N'AA AATCTCTTTA TTGGTTAAAT GCATACGAAA AGTAGGGCGA TAAAAATGTT 2520
INFORMATION FOR SEQ ID NO: 15:
SEQUENCE CHARACTERISTICS:
LENGTH: 10993 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TTTTCTCGAT AATAACTTCC ACCTTATTAT TTGGGATACC CTCCTCTTCT TCACCACCAC 60
GTTCATAGTA GTCATCGCGA TAGAGAAAAG CTACGATATC AGCGTCCTGC TCAATAGACC 120
CAGATTCACG AATATCAGAC AAGACCGGTC TCTTGTCCTG ACGTTGTTCT ACACCACGAG 180
AAAGCTGACT CAGAGCGATT ACTGGAACCT TCAATTCCTT GGCTAGTATT TTCAACTGAC 240
GAGAAATTTC AGAAACTTCT TGTTGACGAT TTCTCGACC AGTTCCCGTG ATAAGTTGCA 300
AATAGTCTAT CAAAATCAAA CCAAGATTTC CAGTTTCTTG AGCCAATTTA CGAGAACGAG 360
AACGAATCTC TGTA.ATCCGA ATACCTGGCG TATCATCGAT ATAGATACTG GCGTTAGcTA 420
GATTACCCTG AGCAATAGTA TATTTTTGCC ACTCCTCATC TGTCAATTGC CCTGTACGGA 480
TAGAATGTGA CTCCACTAAG CCTTCTGCAG CTAACATACG ATCTACCAAG CTTTCCGCAC 540
CCATTTCGAG TGAAAAAATA GCAACCGTTT TGTCCAACTT AGTCCCAATG TTCTGAGCGA 600
TATTCAAGGC AAATGCTGTC TTACCAACTG CTGGACGAGC TGCTAAGATA ATCAACTCCT 660
CCTCATGAAG TCCTGTTGTC ATATGATCCA AATCACGATA ACCTGTCGCA ATACCTGTAA 720
TATCGGTCGT TTGTTGCGAG CGAGCTTCCA GATTTCCAAA GTTGAGATTC AACACATCTC 780
GAATGTTCTT AAACCCGCTr CGATTTGCAT TTTCACTGAC ATCAATCAAC CCTPTTTTCTG 840
CCTGAGCAAT AATTTCATCA GCTGGTTGTG ACGCTTCGTA AGCTTGGTTG ACAGACTCTG 900
TCAACTTGGC AATTAAACGA CGTAGCATTG CTTTTTCTGC AACAATCTTA GCATAATACT 960
CCGCATTAGC AGAAGrrGGC ACAGAATTAA CAATCTCAAC CAAGTA-AGAC AAGCCACCAA 1020
TATTCTGTAA ATCACCTTGA TT'ATCAAGGA TAGTACGAAC CG7TG=GCA TCTATGGCAT CACCACGATC GGATAAATCG ACCATGGCTT GGAAAATCAA ACGATGGGCA TACTTAAAAA AGTCCCGAGA CTCAA'rGTAT TCTCGCACAA AAACAAGTTT
ACTCTCA'rCA ATAAAGATAG CCCCTAAAAC GGATTGCTCA GCTAAGA'rAT CTTGAGGTTG TACTCGTAAC TCTTCTACT'r CTGCCATCAG ACTTCCCTTC CTTTTACAATX CTTGTCA6AGA AGGTGTAAAC TTATCCTTCT T'rCACACGAA GATTGAI'rAC ACTTGTGATA TCTTGATAGA TT'N'CAC'rGG CACATCA6ATC AAACCAACCG CTCGAATCGG AGCTTGTACT TGAATATGAC GTTTATCAAT CTTrAATTCCA AATTGCTTTT GCAATTC'rTC TGCAATCTTC TTATrGGTAA TAGAACCAAA GGTACGACcA TCTGGACCAA CTTT'rTCAAC AAATTCTACA ACAGTTTCTT CTGCTTCAAG TTGTGCTTTA a S. a a
ATTGCTTTTC CTTCTGCAAT CATCTCAGCG CGAAGTTCAC CTACAGCTTG AGCAGTCGCT TTTTGCGCAT ACCCTG'rrGG TACTTCCTTA TCTGCTAAAA AGATrACTTT CA'TrCTrCTT T'rTCTGTCAG TTTTTCACCT GCT'rCTGACA AATTAAAGTG GCCTCCACCG CCTAACTCTr GACTT'CGAGC TGAGATAGAG ATAAATCCTT CAATACCTGA CATGGCTAAC ATGGCATCTG ATTTCATGTC CTTAGCCTCT GCTATTAGTA TCTTTGGCTA GATTCTTTTT ATTTCGCCTr TTTTACCr'rT TCTCCTTTTC CTTCATTTCA AGGT'rACATC TTTAATTTGA CCATAATCCG TrGTACATTC GTGTATTCTT CGCAAGAACA CTGCCTTACT AATAACAACT CATCTGAACC TAA'ITTACGC
GATAAGAAAG
TCC'rTTAACA
TTTAATACAA
GCTGCTGCCA
AGTTTACTAC
AAACTCGCTT
GTATCATAGC
CCCTGTAAAA
TGAGC=rTT CTTCCGATTT TTGTTTACCA 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 TAAGT'rCATT GACCTCACGA TATTCTTCAA AATCTGTCGC TACTATCACT TCCGCGCGTT CTGAGATAGC TAGCAACATC GCGAGGTGAA ATqrTTAG'rA TCCAACATCA TACCAGCCAT AGCGATTTCC TGGATAGCAA AAATGTCCGA CTAGTrACTC CAAGACACTT GCTTGCATAC GACTCAAACG ATTTTTCTrA GAATCTGGA CACTACTTGC ACCACTTTCG TTCTATGG'rG GTCAATAACA TTAAGGCTGT CTTrGAATGG TTGCATCCTT AACAGACAAC G'N'CAATATC TGGAGACATT TGCTGGCGAA CAACTGC-ATA GACCGACTAC AAAAACCTGA
ATATAAGTAA
ATGGTTTGG
TCTACAAGAA
AACTTCGTAA
TGTTCTTCAT
CCTACAGCAG
TCTACAC1'CC ACTGAATCAA TTCCGTTACC AACTCACTGG TAACCGCATT ATCTGGAAAA TCCTGATCCC TAAATAAATC ATAAAATTCT TTTGATAATG TCAACAAAGA ACGATTGGTrC ACCATCCCCA CTCCTTCTX'r TTCTATGAAT GAAACAGCTC CATAAAGAGC ATAGCTATTT TCAATCACAT AGCCCAAAGC ATCCATGTCT AAATTTTTGT GAATCTTATC TGAAATAGCt GTCATCATAG CGCGCGTACG AGTCCGTGTA CGCTTGATTG AAGCAGCAGA CCCACCACCA AAATAAACTG GAT'N'TTCGT TTCGTCGTTT TCCTTAACAA CCACCTGGTC GCCACCACG1' ACTTCACCA AGTTCAAATT GAGCAAAGCA ACTTTCCCTA TCTCATCATG ATCCCATACT TAAGGTCAAG TAACAGAAAA TTT'ATCATTC ATCGATCCAT ACTTACCCGA AATTAGC'rAC AAAAcrAT'rG CATCATAATT ATCCACACAG TTATGGCTTG TrTCCCTGGAT GGCAACTGTC TCTGTTTCGA ATCAAGCCCT CAAGCACCGT CGAGAAAACA TCATGTGTTT ATTTGACTAA TATCTGACTC ACAATCCCAA TCACTGGTCT ACATCTACAA AATACAAAAC CCAAGCTTGG CATAAGTAGA TCTAAATCAA AA'rCACCATC ATTTCCATCG CCATAAGAAA CTCTTCTCTG AAAGCATCAA G1'AGTCAGTA AA'rAGATAAA TTCTGAAAAC TCTGATATAA AGAAGT'rTCA TCCTCCAAAT ACTTGT'rACC AAT'PCATCTG ACCGGAAGAA GCATCCATAT CGGA'rTTCC'r ACTCAAGCCT TTCCTTGGTC AAAA'rCAATT
GAACAGCATA
TGATAATCGT
CAGCATAGGG
CAGGCATCTG
TCTGTTCTAC
ACGCTC~r-rcA
TTGAACAGCT
A'rTAAACCAC
TTCCAATAGA
ATCACTCCTT
TCAACCTCTC CAGAAGATAA ATTCAATTTC ATAACACCTA GCTGTCAAAC TTTCTTCCCC TTGGTGGTTT ACATACTGTA GTATAATGCA CTCTCAGTTT CTTAAATAAA AAA.ACATAGC 0 CTCCTACAAA AAGAAACAAA ATTAAAACCG TCAACAGATT ATTATTAACA AAAATAATGA AAGTGGATAA GACTCCAAAC GCAATCAATC CTACTAGAAT AGGAAAAATT GGACTTACAT AAAA'NTrTTT CATTCAAAAC CTCTTGGCAC CCATTATACC ATAATACCCC TCAAAAAGCG ACTTTTTAAA AGTGTAA'rCA GTAATTC'rAT CAATTATAAG AAAAAGGTAG TT'1ACAATTC AGTAAACCTA CCTT'rACACA TATT~GAAATT- AAGATTCTTT AACCTCTAAC AAACCAATTT 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 a. a a.
CGCCATCC'rC ACGACGATAA ATCACATTGG TTGTCTGATC AGAAATCATG CCCCAATAAA TCCATTTGTA GAATTCTTC AATCAATTTG TTTTGAACGA ACAACTTTAG ACTGGACAAT CATCTGTAAA TAATTGACCA G'N'GCTACCT TATTTTT'ATT TATTTTTACG AATCTGACGT TCAA'rTTTAT CAGTTACAAG CTTGAGATAC ATCT'rCTGCG CGGAGAGTAA TAGATCCAAG TAGCCGTTTT TTCACGATAA ACTTTTAAGT TAATTCGGGC GGAAGTACTrT TTCGATCTTT TCGAGTTTAG AAACTACATA ITTCAACATCC ACATAGATA.A TTCCAAATCC ATTGGTTTA AT'rTGAATCT TCCACCAAAG TTTACGCTCG ATTTTTGTTT
GTCAATTGAA
CGGAATCGTT-
ATCCAACTCT
CCATACATAT
ACTTCCACrT
TGTTCTGGTT
CTTCTAGGTT
TTT'rGTTT
CTTACAAGCG
TTCACCACGG
ATGATTTTAT
AAAATGTTTT
ATACTATATT
TATAACGCT1' TACATCCT'rA
TAATCATATG
TCATTCTATT
GCACCAGCTT
ATCACGAA'rT GCTTCTGTTA AGTACCTTCT T'rCTAAACAT TTTGCAAATT TTTTCCTCAT CTTCCAACAG TTTCTTAACA 236 CGATTTATAG TTGCTCCTGT AGTATAGATA TCATCTATAA GTGACTCCAC TTTTAATAAA GAAAGGAAGT TCTGTCCCCA GAAGAACTGG CTCTC'rCTTC TCTTTTCTCT AATAAATCCA TC1'ACCAAGC CCTCAACCTG ATTAAATCCT CTATTAGCAT ATTACAACAA ATTGATACTC TTTGTACTrT TTCAACTCCT ACTTT'rCTA ACAGGAAGTC TCCATCAAAC TTATACCGAC TGATTGTAAG TAAAAATCGC TCTATGACTG ACTTCAACTC CAATCT'rGAC ACTTTGTTGA CAACTCTGTT T'rCATACAAT ATTCTCAA AAGTAGAATC ACAGTC'rGAA CAAAGACAAG AAGAGACTAC TAAAAC'rrAA AACAGTCTTC ATAGTCTGCC GTAGGAITT'r TTT1AGGAATA AGCGCTC'rGA ACGArrrT'rA GATACTCAAA GCCTGCTGCC ATCTATCAGG ACTTAGGGGA CACTTAAAAA TGAAGCGAAA TGAAAAAATC CTTCATAGCT CCTCTTTACA CCAAAGTTGA
TTGGACAGTT
AGTCATCATT
CACATAACAA
CTCTTCCCCA
CCTCAGAAGT
GCACrTTCATA GAAGCAT'rTA
CCAACTCGTC
GACCAGCCTC CTTATTCATC ATCTGAATTT
ACCCATCATG
CACCAATCTG
GAACAAAAGC AAATCTCCTG AATCAAACTA GACTTGGTAA CATCCACACA AGGGAAGGTA ACTCCGCGCT GTTCTCCATC TCGAAAAGCT TGTACTTGCT AGCCAATTT CTCATTTGGA AATTGCTCCT TAATTTCTGA AGCAAAAATG AGTAACGGAT ACT'rTAACTT TGGTGACAAA CGATTCTTGT 'rTGGTT'N'G AATAATCAAC GGAT'rTCCAT GTTCTCCTAA ACGGACCTTT TTATCTAACT TCAATCCATT CTCCT'TTACA CTATTCTTGA AAGGAAAAGC ATCTACTTCA TCCACTATCA ACTGATGGGT TGT'rGCAACA ACTAGTGGTG GCAAAGC'rAT CCCGCAAGAA AA.ATCCTGT'r CTATGCGAGG ACTAGCCAAA CACACTGCAC qATAAATCAT TTCTGTCTNT CCAGCTCCTG TGTCTACTAC TTGAAGCAAT CCCTCTGACA GCCATTTGAG AACATCTTGC 'rTTGGAAAAT CACTTCTGAC TCGCTTCATC AGCAAGCACT
CCTTAATCGC
TCGGTCTATC
ACAAACGATG
CCAAGATTGT
CTAATCGATC
GTAAGATTTC
AAGCTGTCTT
CTAAGTAGCG
GAAACCGTCT
CAT'rGGTCGA
CAGCGTGGTA
CTTCTTGAT1'
CATGCTTCGT
ATTGGCCTCT ACTACGAAAA CGTACTGATA AGTATTGTCA CTCTrGTTACA GAAGATACAA TGCTAACTGC TCCCCTT'rCT TCTCTGCTTC TCAATATAGG ATTAAAATCC GATAACCAAA CGGTAA.A'TC AGTCT'rTTA AGTCGCTGTT AAAAAGATTC AAGCATGGGA TTATCAACAT 4620 4680 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 GCAAATCAAA AGCTTGATAA AACTTCAATA T'rCGAAAATA AGGT'rCCGAT TCTCCATGTA GCAGGCGCTT GTACAGCTCC AAACAAACAT CACCCGCATT GATCACTTTA GCCACTACT TTACCGCATG AACTAAGGTT GGCT'I-GCT CCTTCTCTTG AAAAGGAG'rT AATTGGCCGC CCTCCTGCGG AAAATAGTAT AAAGT'rTGAT CTCGACAATA GTA-AGCACCG ATGGGCAAAT ACCATTCTTC TAGAATAGTA CTATT'ACAGC GTTGACAGAA AAGTTTCCCC TTCTCCTTTC TCATTGCTGG AAGTTTCTCC ATAAACGACC GAGATAATCT TTTAGA'rGAT TTNTTAGTAC AAGTCCAAGA AGAAATCAAA AA1GAAGAGGC TCG'rGACTTC ACTGCTCTGC CTTCATTATT AGCCTAGTGG TACTGCTrGGT ATGTCTGTGT GGTCGTGACA TTCGTGC'rTA CGCCGGCAGT AAGAACAGGC TGGCATTGCT TCCTTAAAGA ACATGGTCTC TGATTTATGT TGATAAAGAA ATGGAAAAGT CACITAACT TGTAAACAAT GAATAATACA A'rAAAAAGAG GCGTACCAAA CGTTTTCCTT CACACCTATT.
CTT'rcATCTA TGATATATAG AAGAGGTATT CATATGTCTA TGTTAAACTT AACAACATCG ATTTAATCCT GGTTCATCTG ACAAGATGGT ATTCTGAAAC
GCCAACTGAC
AAATTTACTT
AATTAAATCA
AAATCTCCTI
ATTACTOCCA
GGAGAACGTA
GTTCCCATGC
CGCTACTT'rG
GTCGCCTTAG
ATTCAAATGT
ATGGAGCTGG
GAAAAAGAAA
GACCAAGGTT
GCGTTTCGTT
ATATACTAGA
GTTCTTCTTC TCTTAATTCA TCATACTTCT ?TATTCGTAA TGGAATTTAG GACAATTAAA TTATCTGCCA TGCCAAGCGT TCAAAAAAGA ACACTACAAA GTGAAAT'rAA ACGTACAAGT T1'CTCAGTAA
AAACTAGCAC
GAGGACGGTC
GTTTATAGCG
GCGACACATA
GATGATGGTG
TTCGGGTACT AGAAAATCAC AATCTCACCA GTGGTATTAA AC'rAGGCGCT GGAGGACTAA CTGTCAAAGA AATTGGTATT CTTATGCTCA GTACCAACAG ATACAAACTT TACAGATCAA CTATTAAAGC TGC.ACTTGTG TACGAGAGGT TGAAGTTCCT GACATTCTCA CAACTACTTT AAA'rGAAGCA ATTCAAACGA
ATTGAAATAA
TACAGTAACT
GTCGATACGA
GAGTTTTTTA
GTAAACTTAG
AGCGAGCAAA
AACCTGATAT
6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 TACTAGAATT AGCTGAACC AAATGGTATG GATAGCGTTA TTTATAACAA CATTACTGAA TGCCAGAAGG TGCTGCAGAC TAAAAGACCG TATTGCC&T CTGGTTCTAC TATTGTTGAA TATTGGACTT TCATGGGTAG GTGCTGCTAA AGGGTATAAA AACTATGAGT GTAGAACGAC GTAAAATTAT CCAAGCTTAT A.ATCACTTGA AAATTAATGA TACTAAAGAT ATCTTATACA TTAATCGGTC AAACACCGAT GTCTATATAA AGCT'rGAAGC AGCATGATTG AAAAAGCTGA GCAACAAGTG GAAACACCGG GTCGTCATCG T'rATGCCTGA GGTGCTCAAC TCGTCCTAAC CAAGAAATCG CTGCTGAACG CCAGAAGTAC ACGAAAGAAC TTAGATGCCT TTGTTGCTGG CTCAAATCAG AAAATTCTAA CTATCTGGTG AAAAACCTGG GATACACTTG ATACTAAAC
TCCTGGTAGC
TGATCGTTTC
AACAGGAGCT
AGTAGGTACT
CATTCAAGTT
GAGGGAATGA AAGGTGCTAT TGCTAAGGCT CTTCCTCI'C AATTTGACAA TCCAGCTAAT GAGATACTAG CTGCTTTCGG TAAAGATGGA GGTGGAACGA TTTCTGGTGT TTCTCATGCA TTTGCAGTAG AAGCAGATGA ATCTGCTATT TCCTCACAAA ATTCAAGGTA TCTCAGCTGG ATTTATTCCT 238 TAACATCAGA TGACGCTCTT GCACTCGGAC GTGAAATTGG CTATGATGGT ATCGT'rCGTG TGGAAAAGAA GGCTrcCCTG GGT'rGCCAAA AAATTAGGTA ACG'rrATCTC TCTACAGCAC AATCTCCAGA CTAGAGAACT T'rCTTGTACA ACTTTAGTCC T'rCCACGTT'r GGAAGACATT TTGAAATAAG ATATGAACAA TCAGCTTT~CC CAGACAAAAA
TAGGGATTTC
CAGGTAAAAA
'TrATGAAT
CACGGATAGT
ATGGTAAATA
CTAGAAGATA
ATCGAT'rAGA
AGTCCAATAG
CTCAGCTGCA GCTATCTAcG AGTCC-TTGCC CTAGCACCAG GTAACCGTCC AATAACGAAG
TCCTAATCTG
GGCCTCTAAA
GGATAGATAT
ACATGATGGT
TAAGTCAGCT
GAcTTT ACCTC T-r GT
TTCTCACTAT
AAAGCGTAAT
GACTATCACT
GAGCCATCGA
ATAACGGTGA
TCTATTGAAA
ATTTrGCACTT
TTACGAGAGT
'rTATAATGGA CCCTTGCr'r'r
CTCTAGCACC
TCGCAAGAGG
GATTTCCTTA
CTATAAGAAG TTTCATCCGC ATGAAGTAAG GGCTGAGTCA ATAGTCTCTC TTATAAAGGG GCTCCAAATA GTATTGACTC GTCTTGATAT GCCAATTAGA
CGTGTGATTG GTAAACCCAT AGATTAAACT TCTGATGGAT AAAGGGGCTT TAGGAATAGG TATGGACAAT GCTATATGGC TGCATTTAT'r CCTCCATACA TCGTA~aCGT GGATTTAAGA GGAGAAGGAG GTGATAAAGT
CCTAGCCCAA
GGTGTGAGCG
AGCTTTCACA
ATAAATCAAG
CACCAGAGAT
ATAAAGCCTT
CCATCGTTAA
TCTTCTTCTT GGCGATAATT GGGTACCTTC ATAATAGAAG CTGAGCCAAA GTTATGCGCT AGCTTATCCA GATGATTATC TTTTACTCGT TACCTI'AAAG ATTCCGACTA ATATTGGCTT GAACCCCATT GAACAAGTGT GGAAAGAGAT TCGAACT'TTG GAAGATGTCA TACAAGGACT TCGGAGACGG ACTAGAATGC 'N'TN'GAAAA CAATAGAAAT CACGACTTTC TGATGAATTT AAATCGAT.TT CTAACAATGT TT'rAGAAGCA TATTTGGGGA GTGATAGAAA AGCCCTTCAT TTTGACATCC ?r'TTCTGTAC TGGACCAAGT 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 CAGATGAGTA TAAAAAGAAA GTCCTCATTT ATAGTAAAAT GAAATAAGAA CAGGATAGTC GAGGTGTAC'r ATTCTAGTTTr AAATCCACTA CAGCCAATCT ACTTGT'rCAG GTGCGAGAGC CAGTTTTCCG TTCTCAAAGC GT'rTATATAA AACTTTAAAG CGGTCTTTAC GTCCACCACA a.
TATCCAAAAT
AAAGAGAAAG
CCT'rGACCAT
ACTT'GATCGG
CCCAGTAAAG
AGAAAGGATC
CAATTCAAAG
CTTGCCACAA
CCTTTCCTCC
ACCTTGTTAT
TGGGTTTTAA CTACATAGGC TAATGAGTCT ACAAGGTGAA CTTGACCTAA ATCACTTAGT GATAATTATT TTTTATCTGG TATACTGGAA GACGCGCTTA CTATGAATTT GAAGTATAGT ATTCCCTGCC TCATATCTGT TGAATTATCA TAGTACAATA GTTGGGGAAT TAGGATAGAT CTCCTAAATG CACTTAGCCC TTATTATAGG GCTTTrGTT TTAATTATTC TAATCGAGTG AGACTGGGGA AAAAACAAT'r TCAGGAAAAA TCTAAGCCCT ATACAAAAAA GGAAGCAWITT TGCTTCCTTT CTATTATTAG TTATTCAAGG CTGCTGCCAT TCGATACCTT CACCA.ACTTC AGGTATGcTr CAACTGTCTT GCTrGGTCAA CTrTAGTGTT TCCCAGATTT TrTCTGGT TGAGCAATAA CATCATCAGT GGTTTATTAA CCATTGCACG AACTCATCTT TAACGAATTG GCTGCGATGT GCATTGACAA ATAACACCGA TACGTCCACC TCA.ATCAATG CAAAGCGACG 239 TGTAC'rGCA ACTTCAGCTT AAAGCCAGCA AACTCAACTA
GCTGTCATCC
ATCAAGCATG
GCCTTCTGCA
TAATTGAGCT
GCTI'TCGTTG
CTCATCCAAT
TTGTTTAGCA
GTTATGTTGG
GAATGAGATT
ATGATGTAA
AAGCGATCCA
GCCAATTCAG
TTTGATCCAT
TCTTGGTCGA
CGA.AGTCGTT TGCAGCITTC CCGAAGCGTr AACTGATTCA CTTGTGCAAG AAGTGTGTA.A TTTTACCTGG AATAAT'rTTG CTTTGATGTC AGCTTCAGCT ACITrCAAGTG TGGAAGAGCT TAACGTGATT CAATTGTCC TCTTGTAAG AAAGAACTGT TGGTTTCATC AGTGCTTCGT CTCCACCTTC AACAACTGAA TATGCTCCAA AGTGTTGTGC GTCTGTT'TTT TTCTCTCCGA TAGTTGCTGT TGCAGATACG ATTATCAAAG CAAGAGCTTC TTCGTTGTTA GTAGTATTTA CCAATTCAAC GAATTGAGCG ACTTCAATAA CTGCTGCAAC ATTACCGTTA 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 10993 TATGCAGCTr CAAGAGTTrC ACCTGAAGGC GCAGGTTTTC CTTCAGCAAT GACTTTAGCT TTTTTTGCAA CGAAGTCAGT TTCAGCGTTT ACATAAACAC CAGTCAAACC TTCTGCAGCA ATACCTTTTT CACGAAGCAA TTCAATCGCT GCTTTTTTAG CGTCCATAAC ACCGGCACCA GCTGTAATTT CTGCCATTTT- AATTCTCCTA GCCCCGCCTC CGG ACACGGTCAG CTTrCTTAGC TTTTCGATGT CACCGTCTGT GATTTTTCAC GCAACTCTTT TATTTTTTGA AAATAGGAGA
TGCCTTAGCC
TCTACAAGC
TACAAGTTrA
GCGCGGCTAA
INFORMATION FOR SEQ ID NO: 16:
SEQUENCE CHARACTERISTICS:
LENGTH: 8411 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CGACGGGGAG GTTTGGCACC TCGATGTCGG CTCGTCGCAT CCTGGGGCTG TAGTCGGTCC CAAGGGTTGG GCTGTTCGCC CATTAAAGCG GCACCCGAGC TGGGTTCAGA ACGTCGTGAG ACAGTTCGGT CCCTATCCGT CGCGGGCGTA GGAAATTTGA GAGGATCTGC TCCTAGTACG AGAGGACCAG AGTGGACTTA CCGCTGGTGT ACCAGT'rGTC TTGCCAAAGG CATCGCTGGG 240 TAGCTATGTA GGGAAGGGAT AAACGCTGAA AGCATICTAAG TGTGAAACCC ACCTCAAGAT GAGATTTCCC ATGATTATAT ATCAGTAAGA GCCCTGAGAG ATGATCAGGT AGATAGGTTA GAAGTGGAAG TGTGGCGACA CATG'rAGCGG
AAGTAACTGA
GTAGGTATTA
GAATATGAAA GCGAACGT CTCAGAGT'rA AGTGACGATA ACTAATACTA ATAGCTCGAG GACTTATCCA 1-rCTTAAA'rr GAA'rAGATAT TCAATTTGA GCCTAGGAGA TACACCTGTA CCCATGCCGA ACACAGAAGT TAAGCCCTAG GAAGTCGCTr AGCTT'rAATC TGATGTCGTA GGTTCGAGTC TCCTCTTTTT GTATCAATTT GAGAACTTTC TTTTTTCCA TTAGATAGAT GCTACTATAT TTTGTAAATC TGTACTAAGC TAGTCCCATT TTCAGAAAGA CTTCTAAAAT AGCGTCTCTT GGAATAGCTT GCTTTGATAG AGCTGGAACG ACTAATTCCG AACGCCGGAA GTAGTTGGGG CGCCATAGCT CAGTTGGTAG CTACTGGCGG AGTAAT tGAT GTATCACCAA GCATTTTCAT TGTGCAATCC AAGTTTGGCA TCTAATTCAG TGGTATTTAG ATGATATGAA GTTTATTTCG AGGGCAGCCA GAAGTGGT'rC TTGTGATGAG CATGTTTTTG TGCTCAATCA TATCATACTT GCT=MTCTA CTAATTTGAC GTTGCCCCCT GTGAGATAGG TAGCGCATGA CTGTTAATCA AAAAGGGaAC ACAGCTGTGT AAGGAAGTCT GTTATTTCTT GACACCAAAA AGTGCATGAG AT'rCAGTTGC ATAAATCGCT TAAGAAATTT TGGATTTCTT TGACTCTAGA TATTCAAAAA AAAAATATAT TCAAATGTAT ATCCTTATAG TGAGTATAGA AGTAGAAATT TTATCAAATG TCGCTTTGTT TTTAAGCGTT ACTATATGTC TAAAAATAGA
GCTGTTCCAT
TGTTACTTTC
TTTTTTATCT
CAGTAATTGT ACCATAGCAT TTTCAATAGT TTGCATATTT CCTCCTTGTA AACAAATTAG TGTAATTTAG ATTTTTAAT GTATAATCTA
600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040
TTATATCAAA ATTTTAGACA ATATGTTTAA AAAAGGAGAA ACTAAGTTTA AAGAATGGAA AGCAATTTAA AAAAAACCAA CCT'rTATTAT TGTCATGATC GGGATTTCTC TGTCATCAAT GTGGGATCCA TATGGGCAAT ATGATAAAGA GGCCTCCTAT AATGGTAATA ATTTAAAAGA AAATAAAACC T'rGGATTTTC GATTGGAAGA TGGCGATTAC TATATGGTAG CAACTACATT ATCCAATATT CAATCGACAG AAACTGAGAT AAGTGATTCT GTATCTCAAA CAATTCTAGC TTTAGTACAA GATTTACAGG CTAATCTTTC GACTTTAAAA AATCAATCTA TGATAGGATT GTCAAGTGGA TTAACAGAGA TTATTCCAGA TCTGTACAAT ATCATATTTT TGTCTGACTT ACCTGTGGCA GTTGTAAATA CTATGGCAAT AGGAAAAGAC ATGGTGTCCA ATTTTGTAGA TGAAGAGGAA GGAAAGAAGG TGACTTTACC AAGTGATTTA TCTGAAAAAA CAGCTTATCA ATCATTGACA AGTGAGCAAC ATTCAACTGA TAGTATTCAA TCGGCTCAGT GAAGTTTAGA AAACTTACAA AATCAATCTT ATCAAGTATC ACCTATTACT TCTACTTCTT TACAAGGAGA TGTTACTAGC AAATTAGTTC
CTGCCAGTCA GTCGATTGCA TCAGGTGTAA ACGCATA'rAC TACAGGTG=r GATAAACr'IT 2100
C'rCAGGGCGC AACTCAACTA AGTGAAAAAA ATGCCACCTT GACAGGTAGT TTGGATAAAC 2160
TAGTTTCAGC CTCAAACACC TTGACACAAA AATCTTCTAG ArGACAGCA GGAGTTGGTT 2220
AATTACAATC AGGATCTGCG CAATTAGCAG ACAAATCCAG TCAGTTACI-r TCAGGTrGCTT 2280
CTCCATTAGA GAATAGAGCT AATAAAT'rGG CAGATGGATC TGGGAAACTA GCAGAAGGTG 2340
GAACAAAGTT AACTTCTGGA T'rGGAAGATT TACAGACAGG ACTTGCTTCTr TTAGGACAAG 2400
GACTAGGTAA TGCTAGTGAT CAACTCAAAT CAGTATCAAC AGAA'rCTAAA AATGCAGAGA 2460
TTTTGTCAAA TCCACTCAAT C'TTrCAAAAA CAGACAATGA TCAAGTTCCT GTAAATGGAA 2520
TCGCAATAGC TCCTTATATG ATATCAGTTG CTCTTTTTTT GCAGCAATA'r CAACAAATAT 2580
GATATTTGCG AAATTGCCTT CAGGACGTCA TCCAGAGAGC CGTTGGGCTT GGTTGAAATC 2640
TTGAGCTGAA ATAAATGGTA TTATAGCTGT TTTGGCAGGA ATTTTGGTAT ATGGAGGAGT 2700
TCAGCT'rATT GGTTTAACTG CTAATCATGA GATGAGAATA TTTATTCTCA TCATCCTAAC 2760
AAGTTTAGTA TTCATGTCTA TGGTGACCAC TTTAGCAACG TGGAATAGCC GTATAGGAGC 2820
T'TTTTTCTCA CTTAT'rTTGC TTTTACTACA GTTAGCA'rCA AGTGCAGGTA CI'ATCCACT 2880
TGCTT'rGACA AATGATTTCT TTAGATCTAT TAATCCCTGG TTACCAATGA GCTATtCAGT 2940
TTCGGGATTA CGACAAACAA TCTCTATCAA CAAGTCATTT TCCTAGC'rGT CATACTAGT'r 3000
CTATTTACTA GTTTAGGTAT GCTAGCCTAT CAACATAAGA AAATGGAAGA AGATTAAAAA 3060
AATCGACCGA TTAACTGGTC GATTTTTTAT GCCTI'AGATG ACI'TCGTCT GTGATTATAG 3120
ATTCCAAATA GTAAGAGAGA AGTAAAGGAA CAGAT'rGCTC CAGTAATAAA ACCATTGGGA 3180
ATGAAGGAAA GTGTAATAGT TCCTTTCCCC 'rTGGGAAT GT CAACTTTCAT AAATCCAGTT 3240
TGAGCTTGTT TAATT'rCTAT TTTCTTACCA TCT'rGGTAGG CAGACCAACC TTTGTCATAA 3300
GGAATGGTGA AGAAAATAGA TGTATCTTGT TGGACATCAT ATGTAGCAAA AACC'TTGTTT 3360
TTAGAAGTTG ATACTGTGAC AGGTTGTTCT I-rAATTTrM GAATTGCCTC GGTGAAAGTT 3420
TTGGTATCTA AACGATAGAA GGTAGGAGAT TCAAATGATA CTTGTGAATT TCCAGGGAAA 3480
CTAACATTGA TATTGAAAGT TTTTTTCTCT TTAGTATATC CTAGATTAAA GAAGGAGAAG 3540
ACATTATCAG TTGTAAAAGT CTITTTrCA CCATTTACAA GGATGTCAAC CTTCTTTTGT 3600
TTATCGTrAG AAAAGTGAAG GTTTATGAAA GAGAGATAAA CTTGGCTGT'r TTCTGGAACT 3660
TCAATTTGAT AC'TGGATTGC TGCATCTTCA TTTGAAGAAC TTGTGACACT AATCAAATCA 3720
TTAGTATTTT CTATTTTTTC TGTTTTTTCA TAAGGTATTG GAGAAAAATA ATCAAAATTG 3780
AAATGAGGCC TGATTATCCA AGGTATGTTC AT'rGAACTTG ACGTTAGCAA GTT'GATTTAA ACATCATTGT AAACAGATTG AGGGTAAGAT TATCTTrTG GAGATATTGT ACTGGATACC AGATTGAGAT TAGTCCCAGA CGATTTCGAA CAGATGAAAA CCTGTCTGAG TTTGTAGTTT GATTCCATAG CTGGGATATC GCAATTCCGT CCATTTGAGA ATTAGAATGG CAAATAGATT ACrCCCAACT GCAATCGGAA GAGAGTATTG ATAGATATCT TTAAAGCCAT AAATAAACTA TCAGCCAAAA GGATTTAAAA CCAAGTTTAT TTGAGAGATT CCATTGTAGT TTCAGTACGA GTAAPLTTGAT TCGAC'?ATAA GCACTTCGAG TGAAGCATTT AAACTCATTT CACAGATATA AACTTT~TTGA
ACTTATCAAT
TACTATTATT
CTAAAGTAGA
TGAATNTrCAT
TTCCAATATA
AAGCAAATCC
CAACCAGTAT
TAACTGCAAG
ATTTTCATAT
AGGACTGTCT
TGCATATCGG
GCTTGATGAA
ACTGTCA'ITr'
TGTTGAGAAA
CCATTCCTTA
AAATAAAGAG
GAGTAAAAGA
GAATAGACAA CCAAAAATTC AAGAGTA.AGC TGCGATTTTA GATAGATGGT AGCTAAAAAT AAATTCCAGA CTTTAAGTTC TTTCAGACGC AAGGTAGAGA AAATCCAAGC ATAGCGATGT CAAAATAAGT CAAGAGCTTC TATGTA-AAAG TATATGAGT'r TCACGTGAAA CTTAATAGAT AAGGGAAATA GTCCAACAAA AATCATTGGG CCAATGAATT GCTTAGCAALA GAGATCAAGA ACTTCAGTCA ATTT'rTCCCC ATGTGTCTGT AAACTAGCCA TACCAGCTAA AAAGGAGATA GTC'rTAAAGT CCCACGAAAT TTGACAGAGA TATCCAAAAT AATAATTTTG AATAAATAAC TTCTTTTCAG TTATCAGTAG ATGTAAACCA TCTAGCCAGG TTTTTATCTC TAATTGACTG GATAAGGCTA GTTTTAAAAT CTGAGGGATA GTTGACAGAC CAATCAATCC AAATrITrAAG T'rCGTTAGAT CAAAAALAGTA AACdAGAGGC AGGGCATAGA AGTTTAGCCC TAGACCACTT AGAATATTCA AATCTGTTAA AAAAGAATAA CCTGCTACTA CAAGAAAAAG CGAAACTAAA TTTAAGACTT CTGCTGCTGT GTAAATTAAC AAAAACATGT TTGGAGTATG CATGCCTTGC 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 CTTrGCAATTA
TTCAGCGTAA
ATGGCCCCAT
TACCAGCTAC
GAAATGCAAA GAATATTACA AAAATAAAAT GGTCAAAATA AC'TrGTTGT GTCAAAGGAA TTTCAGTTTG AAACTTTGTA AA.ATCAAATA GAGTGGGAAG AGTCATAATC ACTATGAAAT CAAGAACAGA TGATTTTCGA TACCAGAAAA TAAGAAACAA TACTGTCATA ATTGACAGAC TTGTAAAGTA CAATAGGAGT GTTATAATTA AAGGAATCAA GATAAAAACA ACAGTGAAAC TCATCAGAGC ATAGGAAGTA GATTGAA.ACA ATTTATTCAA ACTAAAAAAG AGAGTTGTCA GATAGATAGC ATCTGGCATA GCGAGAAAAC TACCCAAGTA ATAACTAGAT GTAAAGGTGT AAAACAGATT ACTATTTCCA TGTAGGATAT TTCGTAAGGC TACATCAAAA ATAACGTATT GATGAAAGCC ATCTCCTAAT AGAGGAGAGT TGTCGCTATT CCAGTAGATA CTTTGAGATA GATATACTCC AGACATAATC 243 ACTACAGGAA TGATGAAAGA AATAAAATAG GTTCGATATG TTTrAAAAA TGATTTCATC TTACCTCGTA GAATGATAGA AAACTCAGTT GGTTAACCCA ACTGAGTI-rr GAAGTTTAT
TTAGTCTTTC
GTAGGTTTCA
CAAAGTI-rGA CAATCCA'rCA
TACAAGGACA
CAAAGTTCTT 'rAACTTTTGC TCGATACGGT CAATGACGCC A'rAAATTCGT GGTCATGGCT TrCAAGCTTG AGATAGATTC TTTGATT-TTA AGAGCATGAG 'rTGTACTTCT
ATTTTTAGAT
GGCAAAGATG
CAAGTCCAAG
T'TTrGAAAGC
TTCATCTCCA
TTCTTTACT
TGAGTTATCT
ATAGTCAATA
CCCTGACAAG ACATTACAG GAAGCCACGT AGGAAAGTAT AAGAATI'GAT TCTCCTCCTG.
AGTTGTAACr CCCCACTTGA TAATGCAGTC GTTTGAATAT GATGAAACTA ATATTrATCCA TGTCAAGACA TCATTACCA.A ACTAGATGGC ACAATCTCTT CTGCCTTGAC 'rTAGAAGCAT TTTTTCTTCT GCTTT'AGCAT
GTTTGTTAAC
TGTCATCTTC
CAAAATCAGC
CAGT'rCCTTC CAT'rTrGTCC AA'rAAGTGC'r AGATAGTTTC ACCATCAATC TCTCACGTTC CGCTTTAAAG CTAGCTCAAT CTTATCAAGC GC-ATTTTCTA GGAATTCATC AAGACAATGA TATGGTTAGC ATTGATTCTT TAAAGITTTT 'rGATTTGTTG GATCATCAAC ATGACACGAA C7"=TCTCC GAGAAGAGCA TACGGCCGAC GCGAATTGAC GCAACCAGTC TTTGGTAGGT AAGA'ITGACT TCTCCCATGA TTGCAXCCAAT GTC'rTATCAT CTGGACGCAA TTTACAGTTA AATTTTCTAC TTGATAAATG GATATTTACG ATTCTTTAC GTGAI'GT'rGC TGGCAGAGAA ACGAGCAACA AATTCTTGCA ATTGTTTAAT TACGGTCTGC TAGCAAT'ITA GCAGCAAGCT CAGAAGATTC 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 CT'rCCAGAAG TCGTAGTTTC CGACATAGAG TTTGATTTTT CCAAAGTCAA GGTCGGCCAT GTGAGTACAA ACTTTGTTTA AGAAGTGACG GTCGTGGGAT AAAGTCAATC AAGAAGTCTT C'TAACCAAGT CTCGTCCAAG AGAAGAACAT CTGGTTTACC TTTCACCG TTGGCCAATT CGCTCATGTT GTTTTGAAGT AGTTGAGAGG CTI'CACTCTC TCCTTCGAGT TCGGCAGCAC GAACCCCGTC AGCATCTT'rC TCTTTCATGA TGCTATAAAG GGCACGTTCA TCTTCGTAGT CA.AAGTGATT ACCAAGAGAG ATGTGACCAG TAGTAGG'rTC TGATTTTCCG GCACCATTAG CACCGATTAA
AATCGATTGG
AAAAAGTGCT
TTGGTAGTGT
TGCTTCCCAA
CTCGTCTGAG
TTTTTCATTT-
TTGACGAAGA
GATATCTCCA
TCCGTAAGTA
ACTACCATAA CTGTGTTATC ATATCCAAAC CGTTAGTAGG TTGGCGAGGA GAACCTTTAC AATTCTTCTG GAATGTTTAG CCTCCAAGTT CGGCAAACTC AAATCTTCCT TCATGTAGAT CCCATGATAA CGACATCAAT ACAGAGAGAC GTTCATCTGG CCTAAAATTT 'rrAAAAAGGT TTTCCTTCTG TAAATTTGAT ATTGACATCA TCAAAAAGTT TGCGATCACT AAAACGTACT GAAACA'rCAG ATACTGTAAG 244 CAATGTTTTT CTCCTATATG TGTAATATAT TrATTCTACT AGAAAATACA GAAATATTCA AATTTTTATT TGTCAA'rTTT GTGTAAATTA TATTTACAGT ATCCTTTACA CA.AATCTGTA AAAAGCAAGG CTGATTTATT TGAAAGGACr ATATCGAAGG TCCAACAGGA AAATTGCATA ACAGGAAGAG GATAAGTATG TCATGCCAAA GATCCTCAAA TGCAGTTGGA TTGGATCCAA GGCTGAGTTG TCTATGTATT AACAGTCAAG ACAGAGATTr GGTCTATCCA ATCGCTCAAG TGGGACAGAT CAGAAACCA.A TGCATATAAC TGTGATGTCT AGGGCGTTTG CCTGGTrTAG TTATTTAGCT GATGATGCGG AGATCATATC CGCGTTGAGG AGATGTTTTT GGTCGTCCAG TTGATAAATr ACGGrATr'r CATTAAAAAA ATGCTATAAT AGAACAAAAT GACTAAACCC ATTATTTTAA CAGGAGACCG TTGGACATTA TGTTGGAAGT CTCAAAAATC ATATGT'rTGT GTTCTTGGCT GACCAACAAG CCA'PTGTAGA GTCTATCGGA AATGTGGCTT ATAAGTCAAC TATTTTT CAAAGCCAGA.
ATATGAATCT AGTTTCGTTA GCACGTTTGG CTCAGAAAGG A'N'TGGAGAA AGCATTCCGA GAGTATTAT'r
CCTTGACAGA
TGGATTATCT
TTCCAGAGTT
AGCGAAATC
CAGGATTCTT
CAGCTGATAT CACAGCTTTC TGATTGAGCA AACTCGTGAA
TGGTAGAGCC
ATGGAAATGC
ATACTTTGCG
GGAAGGTATT
TA.AA'GTCT
TAAAAAAGTA
AAGGCTAATT ATGTTCCTGT ATTGTTCGTT CTTTTAACAA TATCCAGAAA ATGAGAGAGC AAATCACTAA ATAATGGTAT ATGAGTATGT ATACAGATCC 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8411 120 180 240 300 ATCCAGGTAA GATTGAGGGA AATATGGTTT TCCATTATCT AAGATGCTCA AGAAATTGCT GATATGAAAG AACGT'rATCA ACGAGGTGGT CTTGGTGATG TGAAGACCAA GCGTTATCTA CTTGAAATAT TAGAACGTGA ACTGGGTCCG G INFORMATION FOR SEQ ID NO: 17: SEQUENCE CHARACTERISTICS: LENGTH: 9064 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: T1GCCGTACTC AAGTACAGCC TGCGCTA.AGT TTCCTAGTTT GCTCTTTGAT TTTCATTGAG TATTAGTAAC CAAAATCCGA CCACATAGCC AGCCCCTATG AATATAGCCA TTAAAGCTAG CATGGAAT'rT AGGAAATTAA AAACCACCGC AGATACAAAG GTTAGCACAA AAACATTAAA AGCAATGGTG TCAGAAGCCA AGACTAGAAT ATAGGGTGTC AACCGATCTA AAGTTTTGGA ATCTAGGAAA AATAAGTGTT TATACATGAT GACCTCCTCT ATGGCTGAA.A AGCAAGCCTT 245 CCTATGTAGA AAAGTGAGCA TTGTTT'rTrT ACCCCAAGAC ATATTATTGA TCACATGCAC CCAGCAAAGA TGATTCCAAC GAAAAATGAG GGAGAGCAAA TGCTTAAAGA AAGCATGTTG AGAAAGAACA GGGCTATATA AATACAGCCC AACCTTCCGC ATCTGGAACA CTAGCACTAA ATGCGGAAGA GATAACCATG
CGCATAGGAT
TGTTGCAAAG
TAAAATAGAA
CAGTAA'rCCT
AATACCTAGC
AGTTGACTGA
TACTGTCAAA
GCCTGTCTTA
GGA'rAAATC
ACGAAGATAT
GGAAGAAGCA
AAAACGGGAA GGTCGCTACA TCTTGGTATA GCGGGTCAAA CTAACAGACT AGGCAGGCTT AATCAAGACC AAATCGCGAA CTATAAATCA ArTTCTTCCAT TC'rGCAAAGT TAGTCCCACT ACATGTrrTAG
ATCGAATACC
ACAAGAACCA
CTGTCTGAAC
AAAGCCATTT
CAATCATGAC
CAGTGGAACC
ATAACCAATC
GTTAAAAGAG
TTMrCTTGGA
TCCAATAAAA
AG'rAAACTCA AGATATTTTG AATCCAGAAT AAATTGCCTA TCTGAGAAGA AAATTGCCAA TAGTTTTGGA CGATAAGCGT CAGCTGAGAA AGACTAAATA CGAAAAATAA GTAAGAGAAG 0 fi 0 00 *t 0 0 0 0 0 00 *0 0 0 ACTGCACTTA TTTTGAATAG TCACCTTGTC AGGCTC'rACT ACCTGACTAC TAGATAATAG AATAAAATCA ACCTCGCATC AGAATTTGAA ACCATAAGGT TCCTTGATTT TTACCGCCAC CCACTGTAAA GAACAAGCCA AGAATATCCA ACACACTACT ACCTCCATTC ATTTATTTCA GAAAAGGATA GAAAGCTACT AAACAAGCAG AGAATACACC ATACCTCTAT ACAAACAAAT TTGGTTCTAG GACTAACCAA AATGTATGTT AGCACTGAAA GGTTAAGTCT CTAAAAAAAT CGAAACTATC TTTTTCTTAT AATCAAGAGC GATTTTTAAC ATATCA'N'GT TTT'rTAA.AAT
AAGTTGATAC
GCTGTAAGAT
TTTTTCATAG
TAAGAAGACA
ATACATTAAG GCATTAAAGA CAAACCAAGA TAAAGTTTGA TTTTCCAAAA ATA.AAT'rTAA CCCTTTATTA GCAAGAAGGA CCCAATAGAT ACGATAGAGA CAAGAAAATA ACAAAAAATA CTAACAAT'rT AATAGAGCCT TTTTATAATA CTTCAAGCCC TATATAAGCG AT'rAGT'rGTT GACAAACATA AAA'rCTGCCA ATCATCATTT ACTTATATTT AGCAAGACAG GCCAATAATA TATCTACTGA CACTACAAGA CCATAATTAT TTACTCCTTT ATAATGTAGC AGCACCCGTT TTTTCATCCA AATCTTGAAT AAATCCTCCC TACTATGACC GTTTGTTTr TTTAAGGCTA CAATGAAAAT ATGTCCATAG TTATCAAAAA GATGAGCAAA AGCGAT'rTCG AATATCTACT AAACTCCTGC TTCAAACAAA T'ITGTAAAAA TGTCCCTAAA ATCTGTATTr CATATTAAAT TCTACTCAAA TATCCTGTCA CACATGAGCA GAAGCGTGAT GATAGAATTC TGTTTCTGAA AGCCGATAAA CATAAGT'rGA AAGAGTATCT CTTTTATI'T TTTAAAATGA ACAGTAACGG AATACTATAC ATATTATACT CCTAACAAAT CCAGCTTATC GCAACTTrGA CAAGTTTAGT TGTCATCGAA ACATCTTGAA 360 420 480 540 600 660 720 780' 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 162U 1680 1740 1800 1860 1920 1980 2040 00 00 *000 0 J*00 0*00 0 0 0.00 00 00 00 0 S 4 T1TG=rAAAAA ATTTAAAAAG TAAGCAI'AA TACCAACTTG TTTGTAGACT TTTCATCCTG ATATTAAAA AACTCCCCTG TAAATTAAGC AGAGAGTACYT ATCCGTATCC TTTTTGGAAG CCATATAAGG ACCAAATATA CCAACTACTA TTACCACCAA CATATTGC'rG CATAGGCTAC 246 AAACATACTT TCCTCTI'TAT ATTGTATTGA CTATCACATA TCATTTTGAC AGGCGAAACA TAGCAAATAC AGGGGAGAAA TTTATTTTT ATrTTGAAAA TAT'N'ITrA ATTAAGTCAT AACCAATAAT AAAACTTTrA ACCTCCAAGT ATAGCTCCAC AG'rrACACCT ATTCCTATAG CCATCAATTG CGCCATATGC TCTCCACCTT CAACGCAAGC TT'rGTATCCA TATAGTGTAT AAAAAAACAT AGGCAATAAA AAACAATTCA ATGATGTTAA AGATAAATAA GATAGGTTTG TTTAGCTTTC ATTTTTATGA CCAATTCCTC C'rATTGCAC GCTGCACAAC CTAAAGCAGC CTCAGCATTG TTTCATTTAT CCACCCGTTrG CCCCTGTTAC CCACATGCTC CCATAAATGG CAAA'rGGTCC CAATAGAAAT GTCAAACCGT AACCCC'TGCT GCACAACTAA TTTTrCTTCC AAGCATTTCA TT'ATCCATAA CTGCAAATTG CACTTTTCAG TTACGGAACA AGTTTAATAT
AAATCCATAA
CTGC-AGCACC
TGTTGCACAC
CCAATCAATA
TGACATCATT
AAAAATTATC
0 o 0 0 o* o GAGAAAAATT AATTTATCAT TTCAATAGTC TTTTGTTTTT AAAAGCGAAG AGAATAATAA TGTAGCGGTA TAGGCTAAAT GCCCCATGGT CCTAGAAGTC AACTACAGCT GCTCCTCCGG ATTACAATAA GTATTCATAC TCCTGCCCAA AGATCCACAC TGCTCCAACA CCACTCGCAG AGATTAGAAA TAATATGACA TATCGGAGAT ACTTATGGAT AGAATATAGC CTTCATAAAA ATCCACAAAC CACTGCTCCT TCCCATATTT- CACTCCACCC AATTACCTCC ATAAACCTCA AAGTCTCCTT TTA'rrAAAAT CAAATTTAGC TCCTATGTAT CACAAATAGC TGTCCCTAGC TAGTTTGCCA ATTATTCTTG GAAATTGTTC CATAATTT CTTGTTGTAA CTTTGACAAG TTGAATAGTC ATCGAAACGT 2100 2160 M22 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840
CCCCAGCCAC
CCTCCTTCAA
GTATCCATGA
CAAAAGCAGC ACCACCACC'r TACTAGATAA CATAGTTATA CAAATACTCT TTTTT~ATTTT
TCTAAGACAT
TCCATTTCAT
TAATTTTTGT
CATCCAGATT
TTTAGTATAT CATCGTTTTT TAAAAITTT CTTGAATTGC AAAAATT'ACA TTAGACTTCC TAATACCAGC ACTCAAATTC A'N'CGTAATC TTGAAAACAT TTTAAACGTT TTTACTTTGG TAGATAGCGC ATGGTTACAG GCT1TTATCTT TACGTGAAGT TTGTGCTTGA GGATATATCT AGATTTTACC AGCTTGTCCG ATATTTCTGC AATAGI'TCAC AGTGATATCC AAAGAAACAA TGCAAAACTA GAATCCTAGT CGAAGCGTrTr ACCATGACTT CAAAGATGTT CTCAACCTTG CAACTGTTAG CGGTTTGAGT TCATGAGCCC TTGATAACCA GACTCATTT'r GALACAACTTC TTCTCC C ACrCTGACA
TCATGA'ITGA
CGATAGGTTG
CTTCTCTCCT
TTGCTGATT
CTGTCAGCCA
ATATCATGAC
ATCGCTTGAG
247 TCTTCATAGC GTGAAATT'rC TTGATTTTTA CTCCGTCGC GAAATCGTAA CACCACTTTG TTGCTT'rCGT GAACACCAAA TATTGAAGAG TGGCCATAAG GCGTGTTTAA GTTGA'rAAGC TGAACACCAA CAAGACGCTT ATAGAACTAT AGTAAAATGA TTCTAACAAT GTTTTAGAAG T'=-~ACCAG AATCATTCGC TAATTC'T T1'TAGGGCGA ATCAATCAT'r ACCGTGTCCT CAGAACTGAG AGGAGTTCT'r AACAAGAGTT ACTTCAACCC ATTGGCTCCG ACGGAGTAAG ATCAGCCGCA ATTTCTNCAT AAGTGCGGTA 'rTCTCGCACA AAGGTC'rTCT AGGCTAATTI TAGGTTTTCG TCCACCTT TGTTTTTAAT ACAGCTAGCA TCTCTrCAAA AGTCGTGCGC AAATCGTGCA TCAGTTAGTT GT'rTACTTGC TTCATAATTC AATAAGAACA GGATAAATCG ATCAGGACAG TCAAATCGAT TAGAGGCGTA CTATTCTAGT TTCAATCTAC TATACTATAC
CATATTTTGT
TT'rTCTTTTA TTCGCAGGGA ATCTATTATA AAAGGGTAAG TATTGCAAAA ACACTTACCC TACTTCATTA AGCTCTACTT TTTATAATAC TTCA.AGCCCC ACATGAGCAG AAGCATGATG ATTAAGCAGA GAACAGCGCC AATATAAGCG TCCTGCTGTG ATACCTCTAT ACAAACAAAT AATAGACATA CATAAGTTGA TTGGTTCTAG GACTAACCAA ATCATCATCT TTCCCTAGTG AGATAAACAG TAACCAAAAT T'rGGAAAACT ACGGAAAAAT TTAAAAACTG CAAGGGCAAC TGACCTAAGA ACAATCTCGC AGTTTTCATT TCT'rTTCTCC TTTCTTTTTA ACATAGGCTA TGGTATAAAA TAGCTGATAC ATGGACATGA TTAGATACAG AACGAAAATA CCTAAATGTG CGATTTATCT TAGTTGAGCA TT'rTGACCTT GGATCACTCA AATCATAAAT TAAAAAAGCA AGCATGAAAA ACATAC'N'TC TGTAGACTT'r TCATCCTGCT ATCACATATC CTCCCCTGTA AATTAAGCTA GCAAATACAG CCGTATCCTT 'rTTGGAAGAT TTTGAAAATA CAAATATACC AACTACTA.AA CCAATAATAA TGTTGCTGCA TAGGCTACAC CTCCAAGTAT
AGAAGCCAAG
ACGAGATAGA
AAGGAAGATG
TTGATAGCA-A
CAAGCACTCT
ATCAGTATTT
AGAACATTTA
GGTCATCAAA
CTCTTTATAT
ATTATTTGTT GGTAGGATTC AAACCTGTCA AGCCGATGAA TCAAACTCTC TTATCCTCAT TTAATAACTA CTAAAAGAAA AATAGATAAG TAGAAACAAG TTCCGTTTTT TAGCAAGAAA AATAGATCAT AACTGCAATC CCCTAAGCGG ATATAGAAAG TTTTCTTCAT AAGATTTCCT CACTGCTAGT ATAGCACTTA ACCTCTTGAA TTGTAAAAAT TGTATTGATA CCAACTTGTT 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 ATTTTGACAG GCGAAACAAT ATTAAAGAAA GGGAGAAATT TA'rTTTTTAG AGAGTACTAT TTTTCTAAT TAAGTCATCC ATATAAGGAC AACTTTTAAA ATCCATAATT ACCACCAACA AGCTCCACCC GCAGCACCAG TTGCTGCACC TTGCCATGTT CCTGTTTTAA TGCCTAGTTG AAGACCTCTT GCTGCTCCTC CTCCAACACC TGCTTTrGGCA WATCTrCCCC ATCCATAACA GAAAATTGTG AAACTAAAAT AAATCAGAAT CCCAATTTAT CACCAACCA.
TGTGCTCCAA CAAATGCACC GTTCCACCAG TTATAATTCC GAGCTATACC CCCCTTCAAC AACATTTT'rG TATTCATGAT AAATT'CAATA AACAAATAGA ACGTCTACTA TCT'rCTTAAA
AATTGCATCC
ACATCATTTT
AGAATCCTCA
ACCTCCTAAG
AGCAAGTCCA
CGTAGTGACT
TTTCGCAAGC
GAATACCTCC
248 GCCACCTTCA ACGCAAGCAA GCATTTCAGT rG'rATCCATG ACAAATACTC CTTTTTAAA TAANTTACT ATAACTCTTA CCAACTTAGT CATG?1'AATC CACCCCCAAT TGCACCAATG GCTACTCCTA AAGTGGCCAA ACCTGCTCCA CCTG3TAATCA GTGCATTTTG AT'rTCAGTA'r CCATAACCTC T'TTrATTTT CAATTTGTTA
ACAATCAGTG
TAACTGTGAC
CCAAAGTCTT
AAAAGTATAT
TTCAATAATC
TTTTTTATAG TATCTr'rTTG ATTT'rCTT'AA GGTAGCAGTA CCTAT7T'rTT AGTCTAAGAT 0 0000* 0000.
0 o* TTGAGTATCT AAAATA'rC'T AA'rTTCGTTA TTCTCCTTGC TATTTATTAA CTTGCAGAAA GCAAAAAA'rA TI'AGTAAATA TTATTCCTAC CAATCCATCA ACTAAGTAAA GCATCAACGA ATAATTAAAA TTTTGCTAAC TATCTTATTC.TCATCATTCT TAAGTAAGTA AATAAGACAG 'rAAATT'AATA GCGATAATAA ATCTTACAAA GAGGACATAA TTCCTGAACC TACACAAATA TATCGGACCA GTCGCAGCAG CTAATAGTrAC TGCTCCAATA TAAATTGCCT C'PrCCTCCAC TAACTATTTC GAGTTCTTCA TTCCATCATT T'rTGTATTCA TGACAAATAC TCCTTTTTTC TGTAACTTTG A'rAAGTTTAG TATATCATCG TTTTTTAAPA TTGTCATCGA AACGTCTTGA'ATTAGCTTNT TTATTTCAAG AAAATAATTT CTAATCACTT TTTTACCA'rT CAGGAAGTTT ATAAAATATG AACTTAGT'rT TATGACATAA TAGACCTATC CAATGACT'rC TTATAAACGT ACA'rTTGTTC CTCAAATAGA CTGCCTTAGC CTCGATTGCT AAATTCTATG GTTCAGATTT AACTTGCAAA GACCAATAAA GAAGGGACGA CTGCTCTTGG AATAAAAAGT TTTACTATAC ATAG'rTATA GTTAAGTTTTr TTACATAAAC GATTGATAAT TAGATAACTT TGATATTTTG TACTATAT1T AAGAATCATA AGTGTTGCTG CTCCCCCAGT CAACCACCGA TTGCAGATCC TTATCCATAA CAGAAAATTG TTTTTTTATT TTTGTCTTGT TTTTTCATCC AGATCTTGAA CCACCTCTAA A'rGTTTAAAA TAATGACTAT PCAAGATTTC CACTATATGA AAGGAATTGC TGCGAGAGAC TGTGGTGTCG TTCTCTAGCT CACTTGACAG CATT'GTAAAA GCCGCTGA'rC 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 AAATGGGCTT TGAAACAAGA CCTGTTCAAG CAGATAAAAC GCTCTTTGAC ATGAGTGATG TCCCCTATCC ATTTATCGTT CACGTTAACA AAGAAGGAAA ACTCCAACAT TACTATG'rTG TCTATCAAAC AAAGAAAGAC TATCTGA'rTA TTGGTGATCC TGACCCTTCT GTAAAAATCA CTAAAATGTC AAAAGAACGC 'TTTTCTATG AATGGAC1'GG AGTAGCTrATT T'rTCTAGCTA CCAAACCCAG CTA'rCAACCC CATAAAGATA AAAAGAATGG CCTCTGA'TT TCAAACAAAA ATCTCTCATT ACTATTATCA ATATAGGTGG AATCAGATGA AATCAACTr'r CAACAAGTCA TGAGCTTCTC ATTGATGTGA TTTTATCCTA ACACGTCGTA CAGGAGAAAT TTGGCTTCTA CCATTCTTTC GTCTTACTGG CACAAAACCC ATGTTCATCA TCTTTCTTT AGTAATTCTA TGGTTAGCTC TCGCTCACGA GTGAAGAAAA GAAAAATCCT TTAAGCTCAG AAATTAGTTC TGAATATCCT ATTTCTATCG GTCAGCTGAT T~rACTAT
AGGAATCATC
CAGAGATTAT
TATTCGCCAT
CATTTCACGA
TCTTTTTCTG
TAATCTCTTC
GCTTACATTG
CTCCAAGGAA
TCAGTTGGTC
C TCCTAACCG
ATTTTTGAAC
TTCACAGATG
GATGTCTA
CT'rCTTTCTC TCTACTAAGC AAGCTTCCTT TTCTCrCAAG CTTATTGGTC TCTGGATGA ATACATTCCA TGGTTATCAC CTATATCCTC TTCTGAGTCA GAGATTAAGT
TTCCCATGTC
CTAACTCTAT
TTCTGATTCT
TTATTTCCAT
TTTCTTTGCG
TATAGATGCC
TGTAGGAGGC
TCCTATATAC
TGTCATGCAA
AACTATAAAG
TATGAAACCT T'rCGAAAAAA TGAACCATGA TGCCATTATC GAAGATATCA ACGGGATTGA TCGCTATCAA AATATAGACA TAA.ATATTCT ATTTTACAAA TATCCTATGG TTTGGCGCTC TACCTTTAAC ACACTTTTTT CA.ATTTGT AGA'rrATTTG CGAGTTTAAA GCAGGGAACA AATTAGTCAT GTCAAGTAAA CTTACTTTAC AACTCCTATG AGGTCGCTAA TAACCGTTTG AAAACCCTGT TCATTCACAT 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9064 GAAAATATTA TCAACCTCCA AACCAAACTC CAATCTGCGA AACGAAGTCT ATCTAGTCGA ATCTGAATTT CAAGTTCAAG TTTTTGATGG GCGATATTGA ATTTGATGAC CTTTCTTATA AGTATGGTTT 'rGGATGAGAT ACCTTAACAG ATATTAATCT CACGATTAAA CAAGGAGATA AGGTTAGCC'r AGT'rGGAGTT AGTGGTTCTG GTAAAACAAC TTTAGCCAAA ATGATTGTCA ATTTC'TTGCA ACCCTACAAA GGGCATATTT CCATCAATCA TCAGGATATT AAAAACATTG ATAAAAAAGT CTTGCGCCGT CATATTAATT ACCTACCCCA ACAAGCCTAT ATCTTTAATG GCTCTATTTT GGAAAACTTA ACCTTGGGCG GTAATCATAT GATTAGTCAA GAAGATATTC TAAAAGCTTG TGAAGTAGCT GAAATCCGTC AAGACATTGA AAGAATGCCT ATGGGCTATC AAACTCAGCT CTCTGATGGA GCTGGTCTAT CAGGAGGACA GAAGCAACGA ATCGCTCTCG CTCGTGCTCT TTTAACTAAA TCTCCTGTTT TAATACTAGA TGAAGC'rACT AGCGGTCTTG ATGTCTTGAC TGAGAAAAAG GTTATAGATA ATCTTATGTC TCTAACTGAT AAAACCATTC TCT'r1GTAGC CCATCGTCTC AGTATAGCCG AACGAACCAA CCGTGTCATT GTTCTTGACC AGGGGAAAAT CATTGAAGT'r
GGTA
250 INFORMATION FOR SEQ ID NO: 18: SEQUENCE CHARACTERISTICS: LENGTH: 7780 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: CTCCATTTTT TTGATTTCAT AAATAAACAA CCTCTCTGTT AATTTTGTAT AATTATAACG a.
ATATCCAAGT TACTTGTCAA AAAAAAAGGA GCCATCAGTT AGTTCGACAA TCTTACCTGT TCACCGATAC GCTCCAAGTA TCTGGATTTT TCTTAATCTC ATTTGCTCAT CCATGGAGGC TCAAGTGCTG CTTCCACAAC ACAGCTGGAA TGCGCTCTTC TGATCCCCCA TACGCTCCAC TCTTGAGAGA CTGGTTGTTrG TCGTATTCAT TTACTTCTGC GTGACAAAAG CACGTACCGT AACTGGTTAT GTAATTTCTC CTTATCCAAA TTTTCCTGTA
GATTTCAAGC
TTCAAAGTAG
GGAAATAACT
TTCAGTCGCA
CACCCGGTAT
GCTTTTAACT
TCCCTTTTAT ACAGAATTAA ACAACCCATT CACAGATATT TGGAAATAAT CACGACCCGT AGGTCACGGA TAGTTTCAAA GCGTCGTCAA CAGAACCATT TCACGTCCCA TTTTTTTAAT GTGTTTTTTA AATTTTTATC TCAAAAATAT
TTTTTCGTTC
ACTATTTTAT
TTTAGCATAG
AACAATGGCT
ATAGTGGTTA
AAGATAAAGA
TTCTTCCTCT
CCCCTTCATA CGGATGGT'rG ATCTGATACA GCCTTAAGGA GAGTGCGATC ATTTCAAATG ATCATCTTCG ATGACCTCTT ACGA'N'GATT TGTGAGAGCA TAAATCTTCT TCAAATTGAG CCTGGGCAAT GGCTACAGCG CAGTCAAGAC TGTACGCAAA ATTTCTTTTC CAGTTTCACT TTGCCAGGTC ACGGTCATGC CrTCTTGTCC CATAGCGTAG ATCGTAACAT CTTTCATCTC GTGTrGGGGA TCAAGGAACA GAA.AAATCCT GTCTTATCAG CATGGTGTAC TTGTCTTTTA ATCCAAAGCC GAAGTTGGCT GGCCACGCAG ACACGCTGCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320
ATATAGTCTT
TCTGCTTGGT ATCATTAAAT TCAATCAAAT AGATACGTGA AGCTTGCTGC ATGGAACGGG GACCATACAA GGTTTCCTCA ATTTTACCAG CATCCAAGAG GATGATTTTA GGACTAGTrG GTTGACCACC TGACAATCCA ATAGCTGAAT TAGAGGOACC T'rGCAAGGCT TTTTCTACGG CATTGATACG AAGCCCGTAG ACAACATTCT
CCGTTTCCTT
CTCCATCTAG
TTACCAGAAG
CTGAAATCGG
CCAAGACACG
CATATAGACG ATCCTTGACC TCATCCCAGA CTTCATCCAG AACCTGCTTA TCCTTAATTC CATAGATAGT CATAGGGAA.A GGATTAGGTT GTTGGAAAAC CATTCCGATT TCCTTACGTA ATTCAACCGT ATCTGTACGC GGACTGTAGA TGTTGTGACC ATT~GTACACC ACGCATCCAG TTGTGGTCAC CTCTGGATTG AGATCTCCCA 251 TGCGGTTGAG iAGACTTGAGG AGGGT1TGACT TCCCTGATCC AGATGGACCA ATCAAGGCTG 'rAATTTCC -r AGGTGGAAA GATAGGGAAA CACTATTCAA AGCCTCI-r TTATTATAAT AAACGGACAG GTCTGATACC TGTAAAATCG CATCTGI'CAT ACGGTTTCCT TTCTAACCAA AGTGACCAGA 'rACATAGTCA TTGGTGGACT G;TAGCTTGGC ATTTTGGAAA ATAGTrGCAG TCTTGTCATA CTCAATCAAA TCACCCAAGT AAAAGAAGCC TGTATAGTCA CTrGCACGAG CAGCCTGCTG CATATTA'rGC GTTACAATGA TGATGGTAAA GTTTTTCTTG AGCTCAAACA rGTCTCTTC TAGTTGCATG GTCGCAATCG GATCCAACGC TGACGCTGGC TCATCCATTA AGAGGAT1ATC TGGCTTAACA GAGATGGCAC GAGCGATACA GAGACGTT'GT TGCTGACCAC CTGATAAGGT CAAGGCTGAC TTGTGGAGAT GACGAAGGGA GGTTTCTACG ATT'rCATCTA CATGCGCAAA GGTAATATTA CGGTAAA'rTG CCAT'rCCAAT GTGTTTACGC ATTTCATAAA CACGATAGAG AATCTGCCCA GTTACTTTAG GACTGCGTAA GTACGTAGAT TTCCCCGATC TTCTTTCAAA TTGCATATCA ATCCCCTTAA CGTCTTTA6AC CTGATCCCAG AGGGCAGCCT GGACTTGCTT ATCCTTAACT CCAGCACGTT ACTTAGCAAA TGGATTGGGA CGTTGAAAAA 4 4 4
CGTTGATTTC
CAATATCA.A'
CCCACGGGCC
TGGATTCATT
CATCCTTAGT AGAPLAGGGCT AGTTATATGT TGACATGGCT CCGAACTTAC GAGCTCCAA6A CCTGCI'GATA CAATGGTTCC ACAGCCAAGG TTTCTGCTTG CAGTTAGACC AGTCAAGAGC TCGCCAAAGA TACGACCAGA GGAATAACAA CATGAACCAC CGTTGGGTAT CGTGAACGTG TTAAAGACTG TCAACGCCAA ACAAAGA'rCA AGTAACCAAA ATACAAGTCC GCACAAAGTT CCAGCTCCCA TAGAAAGAGG ACTTTTTCTT CAGGAAAGGT TCTCCTTTAG GCAGCGGTTA GTTAAAAATC AGGATAAAGA ATCTGGAATA GTGCCrTCAC ACGGAAGATA GAGATGGGGC TGGCGCCGAT TGCCCTGCTG TGCCAAGACG ACACCCGTTA TGTCTCCCAG CGAGAAATCC TGGACGGTTG ACATCAATrC AGTATCATTC ATGCGATTGA AATCAAAGCT GTAATTTTAT TTTACCATAG TAAACATGGA AAGGATATGC TrCTCATCCC ATTTCTTGTG TAGATAGCTT TCAGGAGCAC AGCGGCAGAA TAT'rGACT'rT CCAGATATGG TAGTCACACT GAGGATATTC TATAGATCAG AGCTGCAGCT CAATACCTGG AAGCGCT'rCC CA-AGAGCCAG ACCAGCCTCA TACGCGTCAT CTGAGGCAAG 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 4* *b 4 4 4*49 4 444* 44.4 4 4.44 .4 4* 4 4 4
TTTCAAACTA
GGCACCTGA.A
GAGACCCACC
GGTAACAGGA
TACAGAAATA
TCCTCTACAT
ATGATTGAAA ATCCATACTC ACCACTGATG GTAAAGAGGA
CCTTTTTAG
ATCAAGGTAA
GCTTGAAA.AG
CATATTCAGC
TGACCAATAG
CAGAAGACCT
AAACTGGACT
CAAAATTTCA
CAAGTAAATrC
GAAAAAGGAA
TCCAGTCAAG
TTGTAAAGCT GAATGCCAAT CCCACCACCT 252 AAAGACCAAG AGATATGGGG CAAGCCCCGA ACCAAGATAT AGAGAATCAA GGAAGCCAAG ATTrGTCACAA TGATGCTAGC AATCGTATAG AGGACAGCTG rrGCAAGTTT ATCTAATT'rC TTAGCGCGCA TAATTTCT TTCCTCTT'rC AACTAAGCTC ATCAAGAGCA GTACCAAGGC TCCCATGACA GTGTTCCCAA TTCCCATAGT GGTCAAGGAA GTTGGGATAA CAGCTGAGTT ACCAALAGGCA CGCGCCATCC CAAAGACCAC CAAGA'rCACA CGCCAGATAG TCTGCCAGCG ATAATAACGA GGAACCGCAC GCAAGCTATC CATGACAAAG AGGACGGAAA TCCCTGACAA TTTCGTAATC AATTTAATCA CACTGTTAAA CAGTGACCAG AGAACATTAT TATTTACAGT TAATATAGAA GTTAAAGTG CAGCTGGTGT TCCGACAACC ATCTGGATAG CTAGAGCCTC TGCAGTGAAA ATACCAGAAC GGGCCGCCTT AGTGGCTCCC ATAGCGAAAC TGGCTTCACG CGTTGTCATA AACGTTACGG TCGGCAAAAT AATCCCAAAA CCAGTCCCAC CAAAGACACT AATAAATCCG TACACTACTG AAGGAATCCC CTTCGCCCCT TTTGGTGATA CTTCGGTCAT TGCGATAAGG GCTGAGAGAA TGGTAACCAT
GCGAACAAAG
AACCAGGAGT
AAAAACTGC'r
AAAGGAACCC
GGAACGACGA CTTGCAAGCC TCAATAGCTG GTTGCAAAAT GCACCAATAG CAAAGGGTGT AAAATCATAG GAAGGGCACC AAATTCTTTA CTAGAAGGAT TCCAAGTTCC 9# 4 4 *b
TCCCAAAAGA AAGTCAAAGA TATTCACACC ATTGACAAAG AAGGTCGACA AGCCTTTTTG CGCTACGAAA ACCAAAATCA TGGCCACAAG GATGACTATC AAAGAAAGAC AGGCAAAGGT CAAACCTT'rT CCTAATTTCT CCAGACGAGA AITrCTNTGAT GGAAGCAACA TTTTCTTAGC 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 .5 ~5
TAATTCTTCT TGATTCATTA TTGTCTCCCT TTTTCAACCT TCATTrCCTT AATCGGAATA GTCTCATCCG AGAGAACAAA ATTGAGAAAT GTATACATAT GCTCATAAGA CCACAAGGGC AAGTCATAGC CATTCAACTT CATGCTTT'rG TAAGAGATAG CTCCTGGACT TTTTGATACG TCCTGACTTT GCATGGCAGA CTGACCTTCC CCAGAGCCGG CTGCCCGATT GATAACAGAG CAATTGGTTA CCTCACCTAT GAAGATTTGA TCAACCTCCT TATTGACAAT CAGAGCCAAG CCAGAAGCAT CAATTCCGTC TTTrTCCTCA GCCCCAGACT GAACCTGGGA CAAGCCTGTA
TCCAACACTG
TACTTCAATC
TCTGCAGCCA
CAATTATTGC
ACCGAATCAT
ATTGATTTTA
ATAATCACAG
ATGGGTAAGT
CGAAGTTGCT
CCAGCTACCG
GCAAATACAT
CCAGAACCTC
TCACAGTTCC GGCAGCATCT CTTTGACAAT CCCTTCTTGG ACTCATrGGG CTGCCCCAAT TACTTATATT TTCTGGACTT CTATATAGGT AAGAGATAAA CCGCTCCATT 'rGAATCCTGC TATCAAAGGT AGCACGAGAG CCr'rACCACC AACCrCTTTC CTGTCGTTAG GTTATCAACA
CGACCTTGTG
CTGAGTTTCC
CCCCTTGGAC
GTCAACAAGA
TATATCAACT
ATTGACCGTT
TT'rCCAACAT GGATCGTGCC AAATTCATCT GCCGCTACTT CAACCAAGGG. TTGCAAGGCA 253 G'rTGAGCCAA CACCCTTA'r GGATTCTCCA CGATCAATCC AGCTAGCACA GCCTACrAAA CAAGCCGTCA GCCAAAAAGC GATAAGAGAC AGAGCAAGCT TTTTTCT-rT TTTCACTGT'r TTTCTCCTCG AAAATAAT'rA TGAATACTGT GAAT'T'rA AGTAGrCr'r TATGAGTTGA CGCATGAATrr CTTACCAAAT TTCTGCGCAA TTGATTATT ATATAATATA GGCTATATTA CTCTTTrCCTA ACCTCCI-rTT TTCATATGTG GATAAAATCT CT'rGTCTATC CCTrCCCCCA TTGTCACCCA TTATAGTCAT TTCGTGTCTC TTrTCCCCT T'TTTAATGCA AGGGAAATTA CTCTCCTTAG ATGATAATCC AAAAGCTAGA AAGGTATCTC AAACCTCTCT ACTCTCCCAG ACTAGTTTAC AACTAAAAGG AAAAGATTCT ATTTTATGAG AAATCTAGTT TACAAGCGGT AACAACGCTA ATAACTAAAC TTCTTGTACT CTTTGAAAAT CTCTTCAAAC CAGTGTTTTG AGCTA'rCTAT GGCTAGCTTC CTAGTTTGCT CTTTGATTTT CATTGAGTAG TAAAACTACA TGTAATGGCA ATCAAGATAT CAAGAATCAT CCTACTAAAA AA.ATCCATAC TTTCACTATA ACATAGAATA AGATATTTGA CTAGCATTTT CATTTGAATC TGAGGCCTTT TGGAAAATAA TTTTTCAAAA CATTrTCCAGT AACCTTTGCA AAGCCCAAGC CATTGCCTTT AACCAAAACT TGGTACCAAC CATTTGGCAG ACTTTCTGCC AGCTGAACGG 'NTTCTCCAGC CGCATAC ITG ACAAACGCTT CTTGGCCAAT TTCAACCGAC TGTTCGACCT GAC'TCGGTTT CAAGGCTAAA CCAAGAGCGA AACTGGGCTC AAAGCGTTTC.TTCTTAAAAG TACCCAGATG CAGTCCATTG CGAGCAATCT TGAGCTTCCA TAAATCTGGC AAAAGTTCTG GCAAGAGATA AAGCTGGTCT CCAAAAATCT GCAAGATACC CGGTAGATTG ACCTTCAAAT GGTTTTGGGC AAATTCCTGC CACAAGGCAA CTTGT'rCACG GCTGAGGTTA CTCTTACTTG CCTTAAATT'r AGGAGCTGGA TTGTTACCCT TAAACTGTAG ATGGGCAACA AACTGACCCT CTCCCTTAAA CTGATGAGGA TACATCCGAG CCGTTTCTGG CAGGTCAATA CCAGCTACCA TTCCATTGAT ATGCTCTACT GGCAACAAGT CAAAATCATA CTCTTCCAGC AACCAATTGA CAATCTCTTC GTTTTCCTCG GGTGCCCAGG TACAGGTCGA ATAAACCAGA TGACCACCTT CAGCTAACAT GGTCACTGCA TCCTCCAGAA TTTCTCTTTG CAAGCTAGCA CATTGACTCG GATAATCTAA GCTCCAATAG TCCATAGCAT CAGGTTGCTT ACGAAACATT CCTTCACCAG AGCAAGGGGC ATCAAGAACG ATTAAGTCAA AATAGCCTTT AAAGACCTTG ACCAAGCGGT CGGCAGATTC ATTGG'rCACC 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720
ACGACATTTG
GAAATrcAT
TTGCCCCCCG
TCGCTCCAAA ACGCTCCATG TTTTCAACCA AAATCTTAGC CCGTTTGCTT TGGAAXICAAG TAGCCCCTCC CCTGCTAGAT AGGCTGCCAG TTGAGTTGAT GTGCAGCAGC CAAGTCCAAG ACCTTCATAC CAGGACTGGG TTGGGCTACT 254 TGAGCCACCA TT'rGAGCAGC AG.GTTCTTGC GAATAAACTA AACCTGTAGC ATGCTCAGGC 6780 GATrTC!CCTG AAACCI'rCCC ATAGTGGCCC CAAGGGGTTT GAGTAATGGC ATCAGAAAAG 6840 GAAAGTTGCT CTTCTTTTAA. GGGATrGACC CGAAAGGCCG AAACCGCI-TC CTCCTCAAAA 6900 GAGGCAAGAA AATCTCTTGC CTCATCTCCT AGTATCTCTT TATATN'TTC AACAAATCCTr 6960 TCTGGAAArT GCATTTAAGT TCr'NTCC'rT TCGTAAATAT AGGAcUTG' TTCCTCCrGc 7020 ATCTCAAGAG GCACCA'rCAT GACCGGCTGT CTGGTTTGAA AATCAGGAGC TTCACCAAA 7080 AGGGTCACAA CCCGATAGCC CAGACTTTCC CCTAAAATAC TAGCI'GCGGC ATAATCCCAT 7140 GGTTGCAGAT AAGTGAGArA GGTCAACAAA CGCCCTGACA AAATCTTGGC AAAACTAATG 7200 GCCGCACTTC CATAGACACG AACACCAAGA ACCGCTCGGC TCAAATCAGC CAGCCCCCAT 7260 TCATTGGTT'r CCAGCATACC ACTATTCCCT GCAATGAGAA AATCTCCAAG TGGTTTAGTT 7320 TTAAAAGGAG CTAGGGACCT ATCATTTAGA. CAAACTGGA-A ATTCCCCACC ACCGTGGTAA 7380 CAATCCCCTT TGACCACATC ArAAATCAGA CCAAACTGTC CCTrGACCATT TTCAAAATAA 7440 GCCATCATAA CAGCAAAATC TTCCTGCTGG GCTACAAAAT TATTGGTACC ATCAATGGGA 7500 TCAATGACCC AAACCTTGCC CTCTTGAACC GAGGCTCGCA GACAACCTTC TTCAGCACAA 7560 *A'rCTTATCCT CAGGATAACG GGACAAAATC TCACCAACCA AGAGTTCCTG AACTrCTTTG 7620 TCCAGTCTGG TCACCAAATC TGTTGGAGAG GAC'ITGGTTT CA-ArACGCAA GTCTTCCTGC 7680 :e.ATATGGTCAA GAATGTACTG ACCTGCTTTC TTAACAAGCTr CTITTAGCAAA TTCAAATTTA 7740 :**CTTTCCA.AGA. 
GAAATCTTTC CTrCCCCTTT TTCTTTGGGG 7780 INFORMATION FOR SEQ ID NO: 19: Ci) SEQUENCE CHARACTERISTICS:' LENGTH: 4820 base pairs TYPE: nucleic acid STRANDEDNESS: double 00000 D TOPOLOGY: linear :00.(xi) SEQUENCE DESCRIPTION: SE 0.0,GTAATGATAT ACGAACACCA GGTGACCTGA TGGGACGTCG TAAGCCTATG AACTACTAGC TGCTAAAGGC TTTAAAGATG GTATGGTACC ATATATCTCA AACCAATACG AAGAAGAAGC 120 CAAACAAAAG GGCAAGACAA TCAATCTCTA CGGTAAA6ACA AGAGGTTrGG TTACAGATGA 180 CTTGGTTTTG GAA.AAGGTAT TTAATAACCA ATATCATACT TGGAGTGAGT TTAAGAAAGC 240 TATGTATCAA GAACGACAAG ATCAGTTTGA TAGATTGAAC AAAGTTAC'rT PTAATGATAC 300 AACACAGCCT TGGCAAACAT TTGCCAAGA-A AACTACAAGC AGTGTAGATG AATTACAGAA 360 255 ATTAATGGAC GTTGCTGTTC GTAAGGATGC AGAACAM' TACTACCATT GGAATAACTA 420 CAATCCAr.AC ATAGATAGTG AAGTCCACAA GCTCAAGAGA GCAATCTTTA AAGCCTATCT 480 TGACCAAACA AATGA?1rXTA GAAGTTCAAT TTT TGAGAAT AAAAAATAAGT GTCTACTATTr 540 AGGAAATAAA GTTTAAAAAG GTGATGAAGA ACAAACCAAAG ATTCAAGCAG GAATTCCTAC 600 TGATAAMGAA GTAAGTTATG ATCTTATTTA TCAC-CAGGAA ACrTCCcTG CAACAGGTTC 660 A'rCAACTTCT GAGCTTACAG CrTTAGGCCT ATTAGCTGTT GGTAG=IAG TTCI-rTTGGT 720 TCATAATATG ACGGGAACAG 'r1TTrTGCTC CCTCTAAAA GTCATCAr GATGGCTT 780 TTrCTATATAG GGTAAAAGAT AGGGTAAAAG GCTATCATCG GACAAAATALA AGAAGGCATG 840 ATATAATATA AAGTAGATTT CTATGTCATA A.AACAAGAAC TGTTTGGACA TCATTCATTT 900 GAA.AACTCTC TATG?1YCAAA CAATAGTAAA ATAAAATAGG GG.ATCTAAAT CcrrGCTATG 960 AAAGGAAAAA ACTCAATGGC TACTATTCAA TGGTrrCC'rG GTCACATGTC TAAAGC'TCGT 1020 CGACAGGTGC AGGAGAATTT AANATTTGTT GAT'rTrGTGA CGAT rT ACT AGATGCACGC 1080 TTGCCTCTAT CTAGTCAAAA TCCTATGTTG ACCAAGA'rTG TTGGTGATAA ACCAAAACTC 1.140 .TTGATTTTAA ACAAGGCCGA CTTGGCTGAT CCAGCAATGA CCAAGGAATG GCGTCAGTAr 1200 **.TTTGAATCAC AAGGAATCCA GACGCTAGCT ATCAACTCCA AACAGCAAGT GACTGTAAAA 1260 GTTGTAACAG ATGCGGCCAA GAAGCTCATG- GCTGATAAGA TTGCTCGCCA GAAAGAACGT 1320 *GGGAT'rCAGA rrGAAACCT'r GCGTACTATC ATTATCCGGA TTCCAAACGC TCGTAAATCA 1380 *ACTCTGATGA ACCGT'rTGGC TGGTAAAAAG ATTGCTGTTG TTGGAAACAA GCCAGGGGTC 1440 ACAAAAGGTC AACAATGGCT TAAA.ACCAAT AAAGACCTGG AAATCTTGGA 
'rACACCGGCG 1500 *ATTCTCTGGC CTAAGTTTrGA GGATGAAACT GTTGCACTTA AGTTGGCATT GACTGGAGCT 1560 ATCAAAGACC AGTTGCTTCC TATGGATGAG GT'rACCATT'r TTGGTATCAA TTATTTCAAA 1620 GAACATTATC CAGAAAAGCT GGCTGAACCC T'rCAAACAAA TGAAAATTGA AGAAGAAGCG 1680 *CCTGTGATTA.T'TATGGATAT GACCCGCGCC CTCGGTTTCC GTGATGACTA TGACCGTTTT 1740 0 *TACAGTCTCT TCGTGAAGGA AGTCCGTGAT GGCAAACTCG G'rAACTATAC CTTAGATACA 1800 TTGGAAGACC TCGATGGCAA CGA1-rAAAGA AATCAAAGA6A TTCCTGTGA CAOTCAAGGA 1860 *GTTAGAAAGC CCTATTTTr TAGAGCTTGA AAAGGA'rAAT CGCTCAGGAG TCAAAAGGA 1920 AATCAGCAAG CGTAAAAGAG CCATTCAAGC TGAATTAGAT GAAAATTTGC GCTTGGA6ATC 1980 o CATGCTTTCT TATGAAAAAG AACTTTATAA GCAAGGAT'rG ACCTTAATTG CACGTATTGA 2040 TGAGGTTGGT CGTGGTCCTC TTGCTGGTCC 'rCTAG'rCGCT GCGGCCCTTA TTTTATCTAA 2100 256 AAATTGTAAG ATTAAAGGTC TCAACGACAG CAAGAAAATT CCI'AAAAAGA AAC-ATCTGCA 2160 GATTTTCCAA GCCGTrTCAAG3 ACCAAGCCTT GTCQATTGGA ATTrTATCA TAGATAAC 2220 GGTCATCGAC CAACTCAACA TCTATGAAr.C AACCAAACTA GCCTGCAAG AAGCAATCrC *2280 CCAGCTCAGC CCTCAACCAG AGCACCTTT GATTGATGCC ATGAAACTGG ACTTGCCCAT 2340 TTCACAAACC TCCATTATCA AAGGAGATGC cAAcTCCCTc TcrATcGcAG cA~CATcrAT 2400 AGTAGCCAAG GTAACACGTG ATGAATTGCT GAAAGAATAC GATCAGCAGT 'rCCCTGGCTA 2460 TGATTTCGCT ACTAATGCAG GATATGGCAC AGCTAAACAT CTGGAAGCCC TCACAAAACT 2520 AGGAG?1'ACC CCAATTCACC GAACCAGCTT TGAACCCGTT AAA'rCACTGG3 TTTTAGGTAA 2580 AA.AAGAAAGT TAATTGAAAG GAAATAACAT GGAGGAACAG TCGGAAATAG TCCGTTCTAA 2640 GAAAGAATTC GCCTTTGCAT CCAGCACTAT ACTIATCCCAA GTTGGTCGAG GAATCATTGT 2700 CGGCCTCATC GTTGGAATTA TCGTCCGATC CTCG?1-rC TAATGAAA AGGGCTTCCA 2760 CCTGATACAA GGAGTrrATC AAGATCAAGG GTACTTAGTG CGCAATCTTT TTGTACTGGT 2820 TITTGTTTTAT ATACTCATCT GTTGGCTCAG TGCCAAACTA ACACGGTCAG AAAAAGATAT 2880 TAAAGGCTCA GGAATTCCTC AAGTCGAAGC CGAACTCAAA GGCCTCATGT CCCTCAACTG 2940 .GTGGGGCATT CTTTGGAAAA AATATGTGCT AGGTATTCTr GCTATGCCA GTGGACTCAT 3000 GCTGGGTCGA GAGGGACCCA GCATTCAACT TGGAGCAGTT GGTGGTAAAG GAATTGCCAA 3060 ***-GTGGCTCAAA TCCAGTCCAG TAGAGGAACG TTCCTTGATT GCCAGTGGAG CTGCAGCAGG 3120 TAGCCGCA CCTTTAATG CTCCTATTGC 
AGCACTTCTC TTTGTTGTAG AAGAAGTCTA 3180 TCACCATTTT TCGCGCTTTT TCTGGGTCTC AACTCTAGCA GCCAGCATCG TAGCAAACT'r 3240 TGTGTCTCTA CTCATGTTCG GTTTGACACC AGTATTGGAT ATGCCAGATA ACATTCCTCC 3300 aCATGACCCTA GATCAGTATT GGATATATCT CGTCATGGG;A ATTTr-PCC'rTG GATTTTCAGG 3360 *T'rrTCTCTAT GAGAAAGCTG TATTAAACGT TGGAAGAGTT TATGACTTGA TTGGTCAAAA 3420 AATCCATT'rG GATAGGGCTT ATTATCCCAT CT'rGGC'TTT ATCCTTATCA TACCAGTCGG 3480 *AATC'IrCTTA .CCTCAAATCA ~GCGAAATCAGC'TT GTCCTTTCTT TAACTGAACA 3540 AAATTTTAG'r TTCCAAGTTT TATTAGCTTA CTTTTTAATC CGCTTrATTT GGAGTATGAT 3600 TAGCTATGGA AGTGGACTGC CAGGAGGAAT 'TTTCCTCCCC ATTTAGCTC TTGGTTCTTT 3660 *GC?'rGGTGCC TTAGTTGGTG TTATCTGTGT CAATCTTGGA CTTGTrCAGTC AAGAGCAATT1 3720 .CCCTATAT'rT GTCAT1'CTAG GAATGAG'rGG CTAPTTTTGGA GCCATATCAA AAGCTCCCTT 3780 AACCGCTATG ATCCTCGTAA CTGAGATGGT AGGAGATA?? CGCAACCTTA TGCCACTTGG 3840 TCTTGTCACT CTTGTTTCTT ATATTATCAT GGATTTGCTC AAAGGTACCC CAGTCTATGA 3900 257 AGCCATGCrG GAA.AAAATGC TTCCAGAAGA AGTATCTAGC GAAGGAGAAG TTACACTTAT
CGAAATACCA
CAACGTCCTC
CAGAATGTAT
CAAAGA'N'TG
TGATTTAT
rrTTACATAT
AAAGTATAAA
GAAAAAGAAA
GGACAATCAA
GTT'rCTGATA AAAT'rGCTGG GAAACAAGTT CATGAACTCA ACTrACCACA ATCACAACTC AAGTCCATAA TGGCAAGAGC CAAACAGTTA ACGGCTCAAC CTGGGTGATA TGATTCACCT GGTTATrCCA AAAAGTGAAA TTGGAAAAGT TTGTTGTAGT ATGAGTATTT ACATAAT'ITA TGTTATGTAA ATGATCAGT AGAAAACCGA TTCTCAGGAA TGAGATCGGT TA'NTTTTrAC TGATGAGGAA
AAATAATTGA
GCTAGAAAGG
ACAATAGCAA
ATCAATTTCT
ACTrTATTAA AAATAAGACT AGTTTACTGT ATCAAATCTG TTATATAGAG AAATGAAATA AGCAATGTTT TAGAAGTCCA C S
ATCTATTATA CAATGTGTrT TGTATCTCAT AGCTCCTTAT TATTAACAGA AGTTTAGTGG GTGAGATTTT TATTATTTTC GGTCTAAGTC TTTTTATCAC TTTGAAAAAC TCCTATAACA TTTCTTGAAA AATATACAAG TCTATGCTAT ACTACTAGTA ACATGAAACG TGAGATTTTA CrGGAACGAA TCGACAAACT AAGTTCTGGA ATACTACCAA INFORMATION FOR SEQ ID NO: Wi SEQUENCE CHARACTERISTICS: LENGTH: 21338 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear ATAATTAAGT TAGAAATGAT TACAGTA.AGA TTAAAATCAT GAAATAGGAT AAAACAATCA GATGTACTAT 'rCTAGTTTCA ATAGCTCTTC AGTTATGTAG C'N'ATTCTGT TTTGTTTGTA TCTTTCCGAA AAACTATAAT TACTTACTTA TGGAGAAAJ'T AAAACAACTC ATGCCCTGGT 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4820 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: .CTACGACATC ATGATTAACA GTCATGCGCT ACTACCAACT GAGCTATGGC GGATAAAATA C...GTCCGTACGG GATTCGAACC CGTGTTACCG CCGTGAAAAG GCGGTGTCTT AACCCCTTGA CCAACGGACC TTCTATCTGT AGCAGATATA ACCATTATAT CAATTTCTTG CTAATTGTCA ATCACTTTTG AGATTTTTTC TCTAAAATAT CTTTTAATTT TCTAATTTTT AATCTTGAAA S CTAGGACAACG ATGGTCTTCA TAGAAA&ACAA TTTCTAAGTT TTTTCGATCA ATTTCTCTGA TATTACCTAT ATTTACCAAA AATGACTTGT GAGGAGAATA AAATCGCTGA GTATGTTTGT CCTTTTCCTG AATATCTGTC ATGGTACCAT AAAACTCTTT TGCAAAATTC TTACCAATAA 258 TGCGCAATTT ATGAGATACC CCTGITrGIT CAATATACAA AATATCATGG TAAGGAATTT- TTAAATCATT TCCCTTGTAA TTGTAGTCGA AATAATCTAC AACATCTTCA TTTTCAAGTA ACATACTCTr CGTGTAGAAG ATATr1-rGCT CAATTCTTT CT 'AAACATC TCArCATT'GA TATCCTTATC AACAAAATCT AGGGCTGATA CCTGGTATTTr ATAGGTTAGA GTCGCAAACT CTGATCGACT AGTGATAAAG ACGATAATAG CCTAAGGATT GTAATGACGA ATGAGCTGAG CCACTTCAAA TCCCTTTTTC TCAATTCCAT GAATATCGAT ATCTAGGAAA TAAAGCTGAT T1'AC1-rCATC ATTTTCAATG TATTCTT-CAA ATTCACGGAC T=CCCGTT GTCTTGTATG ATATTGGAAT ATTCGATTCT TTCGAAATTT CATCCAATAT TCTCTCTAGT CTCACTTGAT GTTCAATAAC ATCTTCTAAA ATTAAAACTT TCATTCAAA'r TCCCTCTTAA ATCTAATGAT TTGTCTAAA'r GTACTGCCTT CCATCTCTGT TCTAAAATA ATATTG'rTGT ACTTATCTAG TAGTTCrTTTC ACATTATTTA ATCCGACTCC GCGATTTCTT CCCTAGTGG AGAATCCTAA GGCAAATAGA TCTCCTGAAG GAGTCATCGT CA-rTTACAT GAATTCTGAA TCACAATAAC TGTTTCAGTT TCCATCT'rAA 
TAACTGCTAC TTCCATCTGC TTTTTATAGC TATCAGCCGA TCCTTCGACA GCATT-ATTCA ATAAAACGCT CATGATACGA ACCAAATCCA ATAGTTCAAT TGGAAGCTTG GTAATCGTAT CTTTTACTTC CAGTGTAAAC TCTACACCAT TATTTCGAGC ATAGACAATT GACTGAGCAA CCAAAC'TCG TAAAGCTGAG TCTTCTATGT TGT'rCAAATC AAAGTAAGTG TACTTATCTG AACGCAATTT ATGATTTGCT TTGACTAAAA CTTCATTGTA AATTCTGTCA ATTTCCTGTA A.ATTACCACT GTCAATTGCC ATCTGCATGC TGACAAGCAT TCCAGCATAA TCATGTCGAA AACCACGGAT TTCATTATAC AGACCAACAA TTTCATCTGT GTAATTCTGT AAATGTTTCT GTTCAAATTT CTTCTGCTTC AAAGCAATCT CTTTCTCCAT TTGAACTTTA TGAGAATTCA TTGCAAAGAA GGTCAAAAGG AGAGAGATAA AGACAATAGA TGACAAAATA CTTCCAAAAC TATTCAAATG TTTAATCGTA CTTACCATAT CTGAAACGAA AGATACAATA TGTAGCAATA GTAAAGCAAA AAATAC=rT TTCAAGAAAG GATAAAGGTA GTCCTTGTCA AAATAGGCTA GTTCCAAATG GAAATAGTAA ATGATTrTTTA ATGTAACAA.A ATAGGTTAAC ACCGTCACAA CGAAAAAGAA TGGGAAATGA TA'rTG'rAAAA CAAAATTATC TCCTGTTATA GAGGAGAAAA TTACGGACAG AAAGTTATGA GTGCTCTCAT ATAAAAGAGA TAGTAGTAAA CTTAGGAATA GTCCTCTATC CCTCTCATAC TGTTTCATCC ATCGAAAATA GGAATATAAG CCCAAAGGAA ATAAAAATCT TI'CAATCCCT ATTTATCTA AATATAGAAG ATAAAAGGAA AATTCAAGTA CTATTTCAGT TAGTAATGTA TAAGCACCAA AAACGTATAA TTCTTT'rCTA TTTATTCGAC CTTTACAAAT TAAACGGTAA CTGTGACTAA TAATTAAAAA 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 ATGAACAATA ACTGTcccAA 'rTTTTTCGTA GGAAAAGAAA ATCTTTTGTA AGTCT'N'TTC ATAATAAAAT CTCCTAAAAT 259 ATCCAAGTAA ATCCATTACT CTrTCTCCTT ATTCATTAC ATCAAGGATG ATTCTTGkkA TCCTCATCTC CCCACCTTTA CTTCAAAGCT ACAAACTGTT CCAATTTAAC TGTGTTTTC GT1-rTrTCT'r GTAAGCTAAC TTACAAAAAC CATTATACAA AATGGAATTT CGTTTTAGAT AAAATCTCT CAACTGTCZAT TTT1TTTCTCC CAAAGTGTAC
TTTTTTAAGA
TTCGAACCTG
TAATACAAT'r
ATCCCTGCTG
GATGGGGTAA
ATGATAATCC
AAAAAGCCGG GAAAATTCCC CGACCGTTCG CTTAGAAGC ATTCTACCAA AAATTCAATT AATCGTAAAA GCGCGATAGA GGTTAGGCGA CCAAAACTGA TAAAC'T'CCC CCAATAATAA TTTCTCTAA CTGCTTACTA AATTCTTCTG CAATAACGAA ATCACGGTCA GCAATTTTTG TCTTT'TGATT TTCTCAT'rCA CTGGCCTTAT TTTCAAACTT AGCAAATCTA GAAATTCGTT ACTTTTCTTT CAGTTTCCCA ACTGTTACAA TTCTCI'ATTT T'N'CACATCT TAT'rCACAAA CAATTTCAAA AGAGT'rATCC ACAGTTTGTC CTAGTCAGTT TATACI'TCA GTAATTCAAA AGCTTTGCTA TTATATTGAT CCCAGCAGGA GAATGCTCTA TCCAGCTGAG CTA'rGAGACC AAAAGTCAAT 'rTTCTATr'rA TGGTAGGGGA TTTGTTCAAC AAGAACTAGT CTCATTAACT CAGAAAGATT GGCTCTATT TTTACAGATG AAGTAAGAGT AGAAAATCCT TTTATAGAAG AGAAGAAAGT TTTCCCTTCA ATGGCTAACA ATAAAATTCT CTGACCTTCT ATTTCTAAAA CTGGTGTr TTCATCTGAT AACTCAATCA TTGAATACTC TGCGATACCA TCTT'rTAAAT CTTT'AATTTT CATGACTCTA TTCTAACATA ATAAAAAATA GATTTCAATT AAGAAAATCA TAAAACTTTT GTGTTTAAGT TATAATTAAG CATATGGAGG CAAATATGAA ACA'rCTAAAA GTCGTTATCG TCATTAGCTT TTTTAGTGGA ACTCAAAAAA G;TAGTGTAAA CAACCAA AAGAACGAAA ATTCAACAAC ACAGGCTGTT ATTACTTATT CGGCAAACAG ACAAAATAGC 2280 .2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960
ACATTTTACA
GCCTTGGGTA
AACAATAGTA
AACAAAGTAA
AAAAATCGTT TCAATTATTA GTTTTTCAAT AACTCAACTA CTATTACACA AACTGCCTAT AAGATGCTGT TGTTTCTGTT~ GTATTTGGCA ATGATGATAC TGACACAGAT TCTCAGCGAA TCTCTAGTGA AGGATCTGGA GTTATTTATA AAAAGAATGA TAAAGAAGC TACATCGTCA CCAACAATCA CGTTATTAAT GGCGCCAgCA AAGTAGATAT TCGATTGTCA GATGGGACTA AAGTACCTGG AGAAATTGTC GGAGCTGACA CTTTCTCTGA 'rATTGCTGTC GTCAAAATCT CTTCAGAAAA AGTGACAACA GTAGCTGAGT TTGGTGATTC TAGTAAGTTA ACTGTAGGAG AAACTGCTAT TGCCATCGGT AGCCCGTTAG GTT'CTGAATA TGCAAATACT GTCACTCAAG GTATCGTATC CAGTCTCAAT 260 AGAAA'rGTAT CCT'rAAAATC GGAAGATGGA CAAGCTAT GATACTGCTA TTAACCCAGG 'rAACTCTGGC GGCCCACTGA ATCGCAATTA CCTCAAGTAA AATTGCTACA AATGGAGGAA TTCGCAATTC CTGCAAATGA TGCTATCAAT ATTATTGAAC CTACAAAAGC CATCCAAACT TCAATATTCA AGGGCAGGr CATCTGTAGA AGGTCTTGGT AGTTAGAAAA AAACGGAAAA GTGACGCGTC CAGC~rTGGG ATCAGAAGAC TCAATATTCC AGTAATATGC CTGCCAATGG AAAGAGATTG CTTCATCAAC ACCAT'rAAGA TAACCTACTA AAGAGTTCAG GTGAT'rTAGA GAAAAGATGT GTTAGTGTAG CAAAAAAATC CCTATCAACC
AATCCAGATG
AAGTAATGTT
TCACCTTGAA
AGACTTACAA
TCGTAACGGG
ATCTTAATTG
AATCATGGAA
CCGAAAAGAA
CAGTCTATCA AAGAAAATGG GGTCATTCAA GGTTATGAAA TCcTTGCAGG AGAGAGACGC TCTATCCCAG CTGTTGTTAA ACAGATTTCA GAAAATTTAC AGAGAGAAAA ?r'rAAACCCA GTAGAGAAAG GATTCACCCA TGCTGAAATT GTTAATTTAT CTAATGTGAG TACAAGCGAC ACATCTGGTG TAATT'GTTCG TTCGGTACAA AAATACGATG TAATTACAAA AGTAGATGAC AGTGCTCTTT ACAACCATTC TATCGGAGAC AAAGAAGAAA CTACCTCTAT CAAACTTAAC ACATCTATGT AAAGAAAGCT ~TTACATAAGA AAATTTGAAA TGATTTCTAT CACAGATATA TTTATAGAG AAAA.ACTAGA TGAACTAGCA CCGATTATTG TTCGTCA.ATC TCCTGTTATT TATCGGGCTT CACTTTTAGC TGGTCTACGG GACCAAGAGA TGATGGTCCA GTCCATTATT ATAGAAGAAG CACGCGCCTA TGAATCTCTC GCAGATAAGA TGGGCAAGTC TCGTCCATAT CCAGAACAGA TTCTTTCAGA AGTAGAAAAT CTAGTTGGGT TAAATAAGGA ACAACAAGAC ATTTCTGTAA GGAAATTAGA AGCTCT'rCTG ACTAATCATT TCATACAAAA TGAAGAAAAA 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 ATCAGCAACT CCATTCGTT GGCAAACTAT CACAAGCCCA TATTTCTTTC AACGGATTAT ACAGAGAAAA AACAAAAGAA CAGTTAAGAA AACTACTCGG
ACTTTCCTTG
TGCGCGTTCC
AGAAGAAGAT
ACAGCAAAAA
ATTAGATGTA
AAATCAAGAA
TTTA'rCTCAC
CTTCTTTTAT
TTGTGGAAAA
AAAATCATTA
TAAGGCTGTT
OCTTAATAAA
TGTGGATTAT
TTTCTTT'rTC
CTTTTATTTT
TCAATAA'TTT
TTI'TCACAGC
GAAATTAAAC TATCTAAAAA GAATATAGTA GAATTATCAA AAGGTTATCC ACTATGTTrT CCCCAACCTG TGGATAAAGT TT~CTTGCTAT CTATGGTAAA
AGACAGTGGA
CAGCCTGAAA
TCGATAAAAA
TTGGTAACAT
ATATCTCTAG
TTTTGGAATC
TATGCTATTC
CGCTCTGAAA
TATTAAACTr TTAAATAGTA AAGGAGGAGA AAGGATTGAA AGAAAAACAA GTATATTAGA ATTTGCACAA GAAAGACTGA CTCGATCCAT GTATGATTTC AAGCTGAACT CATCAAGGTA GAGGAAAATG TTGCCACTAT ATTTCTACCT TGGAAATGGr CTGGGAAAAA CAACTAAAAG ATATTATTGT AGTAGCTGGT TTTGAAATTT 56 5760 261 ATGACGCTGA AATAACTCCC CACTATATTT TCACCAAACC 1'CAAGATACG ACTAGCTCAC AAGTIGAAGA AGCTACAAAT TTAACTCTTT ATAACTATAG TCCAAAGTTA GTATCTATTC CTTATTCAGA TACGGGA?1'A AAAGAAAAGT ATACCTTTGA TAACTTTATT CAAGGGGATG GAAATGTTTG GGCTGTATCA GCCGCTTTAG CTGTCTCTGA AGATTTGGCT CTGACCTATA ACCCTCTTTT TATCTATGGA GGACCAGGCC TTGGTAAGAC TCACTTATTA AACGCTATTG GAAATGAAAT TCTAAAAAAT TTATTAATGA CTTTCTTGAT ATCGTAGTCT TGATCTr'rTG CAAC'rCAGGA AGAATTI'TC TCCTAACGAG TGATCGTAGT GT'rTTAGTTG GGGATTGACA TTTACAAAG TAAGACGGAA TAGCTGGGCA ATTTGATTCA TAATTGCCAG AGTAAAAAAA GAGCCCGCAA ACAAGATGTT AAGTTGGTAA CTrr'rATGGT ATATTGTTTT GGCCCGTCAA TTCCAAAAAT TGGGAAGGAA
ATTCCTAATG
CACCTAAGAC
TTAATCGATG
AATACCTTTA
CGCGTGTTAA ATATATCCCT GCCGAAAGCT TTGGGGAAAT GGAAAAGTTT AAAAAGACCT ATATCCAGTC ACTCAGCGGA AAAAAAGTCG ACGCCCTTCA TGACAAGCAA AAACAGATTG CCAAAACATC TAGAACGGCT CAAACTATCA CCCCCCCTrGA
CATTTAGGCT
AATGTTCGAG
ATCAAGGATA
AGCCAAATGC
GTTAGTATCA
GTAGCCATGT
ACAATTTCCA
ATCTTGAGGG
TCACTATTGA
TCGTCATCCC
AAGAAATGAA
ATTTATCTAG
CGAGGAGAGG CTTGTCACGC CTTTGAAACA CGTAT'rGCC-A AAGTGATACT CTAGAATACC AGCCATCAAC GACATCACTT TATTGCTGCA GAAGCCATTA AATTGATAAA A'rCCAAACT(G GGGAAGTAGA CGCCTTCAAA AGA.ACTAACA GATAATAGTC CACAGTCATT CATGCCCATG TTTAGAAATT GAATCAATCA TATCTTTTTT ATCCACATT CTGTTTTCCA CAGAT'rTCAC AATAAAGGAG AA'rCCATGAT AATACTACTA AGAGAGCTAT ATTGACGTGA CCAATGAAGG 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 TTGGGGGAA AAGATCATAC CCAAAATAAA ATCT~TTGATT GATCAAGACG ATAATTTACG AAAAGAAAAT CAAATAAT'rT GTGGATAACT TTTAGTTTTT TTTAAACAAG CTAAAAAACT TGATATGACT TGTTTAAAGG AGACTCTATT ATTACTATTA TCTTTCTAAT ACTAAAAATA TCAT'rTTTCA ATTAATAAAA ATTTATTTCT ACAAGCATTA TAGTTCTAAA AATGCCATTC CTATTTTATC AACAGTAAAA TATTACTTTA ATTGGTTCAA ATGGTCAAAT TTCAATTGAA AATTT'rATTT CTCAAAAAAA TGAAGATGCT CGTTTGTTAA TTACTTCTTT AGGTTCGATC CTTCTTGAAG CTTCTTTCTT TATCAATGTA GTATCTAGTT TACCTGATGT AACTCTTGAT TT'rAAAGAAA TTGAACAAAA TCAAATTGTT TTAACCAGTG GCAAATCAGA AATTACCCTA AAAGGAAAAG ATACCGAACA ATATCCACGA ATCCAAGAAA TTTCAGCAAG CACTCCTTTA ATACTTGAAA CAAAATTACT 262 CAAGAAAATT ATTAATGAAA CAGCCTTTGC TGCAAGTAr-A CAAGAGAGTC GTCCGATTTT~ AACAGGTGTC CACTTCGTAT TGAGTCAACA CAAAGAGTA AAAACAG'N'G CAACAGACTC TCATCGCCTA AGCCAGAAAA AATTGACTCT TGAAAAAAAT AGTGATGATT TTGATGTCG'r AATTCCTACC CGI'CTCTAC GCGAATrC AGCGGTATTN AGAGA?T'C TTTGCCAATA ACCAAATCCT CTTAGAAGC TCGTCTCCTA GAAGGAAACT TACTAT-rACT TTTAATrGTGG AAGTGCGACT CAAAATGGTA TGTTCACTCT CCAGAAGT1'G TGAAGATTTG ACCAT'rAGTT TAGCGAAAAG GTGACTATTA AGATACTGAC GAAGACTTCA GGTTGAGCCT GGCTCGCCTC ATCAAGTTGG AAATTTTGTT GTAAAAAGGC TAATCGT'rGG
ATCCTGATAC
TAAACTTACG
CTG'rGAAAC'r
GTAAAGTAAA
TCAACCCAAC
AGATCGCTG
CCAGTCAATG
TGAAATTAAG
CGAAGAAATC
T'rACTTGATT ACAGATGATA TCGWACTGT ffAUMATTA GCTTCTATAC ATTCCAACAG ACTTTAACAC GAGCGTGCCC GTCTrATC GATGGGGTTG TTAGCGCCCA GATACTGATC AGGTTACTGG GATTCTCTTA AAGCTTTAAA CCATTTACTC TTGTGCCAGC CGTACAAATT AAGTGAAAGA AGAAAAGGAG AGTAGTATGT GCTTTATCTC AGCTGTTCGT TGCAGCTCAT TACACCAGTT TTTTATGATA TAATCGAAAA
GAGATGAAAA AATCACACGC TTGTACAATC AAGTCGACTG GAAATTACAC GTGTAGGAGC AGATATCAAA ATAAAATGTA 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 GTAATTGTGA GCATGTTrGTC ATGATGGGGC GATATGATTT TGAGCGAAAA ATGAATAAAA TTATTGrcACTG AGAACCCTTA GTTAGAGGGT TAGCACTTTA TCCCrT'T-G TGTrATAATA
TTAGGGATTG AAATGAAAAC GGAGAATGAG AAATATGGCT TTTGCCAAAC G~rGGTAAAT CAACACTATT 'rAATGCAATT AGCAAACTAC CCATTTGCGA CGATTGATCC AAATGTTGGA ACGCCTACAA AAACTAACTG AAATGATAAC TCCTAAAAAG ATTrACAGAT ATTGCAGGGA TTGTAAAAGG AGCTTCAAAA ATTCTTGGCC AATATTCGTG AAGTAGATGC GATTGTTCAC TGAAAATGTA ATGCGCGAGC AAGGACGTGA AGACGCCTTT TGATACCATT AATCTGGAAT TGATTCTTGC TGACTTAGAA GCGTGTAGAA AAGATGGCAC GTACGCAAAA AGATAAAGAA TCI'CAAAAG ATTAAACCAG TCCTAGAAGA CGGGAAATCA AGATGAGGAA CAAAAGGTTG TCAAAGGTCT TTTCCTTrG TGTAGCTAAT GTGGACGAGG ATGTGGT'rTC AGAACCTGAC AATTCGTGAA TTTGCAGCGA CAGAA.AATGC TGAAGTAGTC T'rGACAGCAG GTATCGTTGG ACAAAAGCAG GAGCAGAGGC ATGGTGGAAG TTCCAGATGA ACAGTTCCCA CAACATTTGA GGAGAGGGGC TAGGGAATAA GTAGTTCGTG CTTT'TGATGA GTAGATCCAC TTGCAGATAT TCAGTGAACA AACGATATGC TCAGTAGCAG AATTCAATGT GCTCGTACCA TTGAATTTAC ACGACTAAAC CAGTTCTTTA TCTATCGACT ATGTCAAACA GTTATTTCTG CGCGTGCTGA 263 GGAAGAAAT TCTGAA'N'GA ATGATGAAGA TAAAAAAGAG ?PTCTTGAAG CCATTGGr GACAGAATCA GGTGTAGATA AGTTGACGCG TGCAGCTTAC CACTTGCTTG GATTGGGAAC TTACT'rCACA GCTGGTGAAA AAGAAG1'TCG CGCT1GGACT TTCAAACGT GTATGAAGGC TCCTCAAGCA GCTGGTATTA rCCACTCAGA CTTTCAAAAA GGCTTTATTC GTGCAGTAAC CATGTCATA'r GAAGATCTAG TGAAATACGG ATCTGAAAAG GCCGTAAAAG AACCTGGACG CTTGCGTGAA GAAGGAAAAG AATATATCGT TCAAGATGGC GATA'rCATGG AATTCCGCTT TAATGTCTAA AAATTAATAA ATGGTGTCAA TTAGGTTGGA AAAAAATTCC AACCCTTTT'G GCTT'rTGAAA GGAAAAATAA ATGACCAAAT TACTTGTAGG CTTGGGAAAT CCAGGGGATA AATATTTrGA AACAAAACAC AATGTTGGTT TTATGTTGAT TGATCAACTA GCGAAGAAAC AGAATGTCAC TTTTACACAC GATAAGATAT ?TCAAGCTGA ATGGAGAAA AATTTATCTG GTTAAACCAA CGACCT?1TAT TWcATGcTT'~r ATTAACTTAC TATGGTTTGG ATATTGACGA ATCTTGACAT GGAAGTTGGG AAAATTCG'rT TAAGAGCAAA ATGGTATCAA GTCTATTATT CAACATATAG GAACTCAGGT GAATrGGAAG ACCTAAAAAT GGTATGTCAG TTGTTCATCA GGGATGATTA TATCGGTA'N' T'rACAGTCTG T'rGACAAAGT 999999 9 9 09 0 9 9 9.
9 9 9 9. 4* 9 9 9 .9 *9 *9 9 ATTTACAAGA GAAAAATTTT TATTAGATTT ATTCTCAGAA AGAAAAGACA ACTAATACTT GTTTAGAAAA AGAAGATAGG TTGTTAGTGA TCTTATTrCT ATGCTCCTAT GGTGGAGTTT CCTTGCGTTT TTTGACTGAT GTCGATTGAT TTTACCGTCT GTGAAGAATA TGATCAACAC TTACTCAAGT ACAAACTCAG AAATATCCCA GITrAGAACCT GAGAAAACAA TGCAGAGGTA AATGATCAGA TTAAAAAATG GGTTTATCAA CATCTACTAA ATTGTGTTAT TGACGTCAAC CCTAGCATCC 'TrTTTCCTAA GAATGAAAGT GGAAAAGCAG TT'rAC=ATC ATTTACGATG AGGCTCAGCA GGTGGTCATA CTTTAACCGT GT'rAAGArlwrG TGTTTTGAGT AAGTTTGACA TGACGAT'rCT GTAAACTACT TAACGGATAA ATGGTGACCT GCATCAAAAT TTAACAGATA GGCTCT'rGCA ATTGCAAGCA TTATGGAGAA GCAGAAGGAC CTATCCATTT TTGGTAGATG AATTATrrCA CGGGTTGAAG AG'N'TGTAAT ATCGCAGCAA TATTGTAAAA ATCTCAGTTG GGAAAATGGC TATCGAAAAG AGATATTTTA GATATTTTTG TGATGAAATT GATGGTATCA GACAGAACTC ACTATCTTTC AGGACAGTCA GCTTTAGAAA 9360 .9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 99 99 9 4 *999 9 4.99 *9.9 9 **99 99 -9 99 9 0
ATCTTGGGTG
TTGATGTCTT
TCATCTAAGA
CCCAATGCAT
GCGTTTATCC
GGCGAATTTA
TGTCGAATTG
AGGAACTCGT
CACAGGAAAA
AAGGGATT'T
TCAAAGATAG
ATCAGTTAAA
GTCTTCGAGG
AGTTTTGG
AAGAAAATAA
ATTATCAACG
GGTCATTTGA AGTAGAAACA CAATTATCGA CAGCTAGTGA TATGCTTTTG AGAGAAAAGG 264 AACAAATTTC AAAAACI-rTA TCACCTATTN TGAAATCATA CCTAGAAGAA ATTCTTTCAA GTr'rTCACCA AAAACAAAGT CATGCAGACT CTCGGAAGTT TTTATCTr'rG TGCTATGATA AGACATGGAC TGTCTTTGAT TATATTGAAA AAGA'rACTCC AATATTCTTT GATGATTATC AAAAATTGAT GAATCAGTAT GAAGTCTTTG AAAGAGACT AA1-rACAGAA TAGTAAAGCA TTTTCTGATA TGCAGTATfTT ATAAAAAACA AAGTCCAGTG AATTTGACAA AATT'rATCAA CTTTTCTAAA AGAAGAAATT CTAGCAA'rrC AATGGGAAGT TGGATTCTAG AGATAAGACA TCAGACA'rGG 'rM'CArITT TTCAAAAGAA A'rrAAAGCGT AAGATTACAA TGAACTTGAA ACCTTTTTCT CTrAATCTTCA TTCAATCAAT ATCCTATGCA GAACGATATA AAAAAATGGA AAAACAr'rGG AGGATATGTT AATATCTGTA AAGAATCTCT GTAGATGAAA AGATTTTATT CG'rTTTCGAA GACAACATGT AGCGCAGTAC TTACAGAAG TTCTGATATT GAAcAAATCT AAAGGGTTTA GGAAATCTcA GGAA7'rTTrC AATCAGT'N'T TTACACCATT ATTCTGCAGT AGAGGAATAT CAGATTAAAT AAACTTAATA GAGGGTAArC GATAACTGAA CATGACATTT TTCAAATGCA GAGAGArrAA TCATATCCAT GGGATTGGTC TCGCGATTAT GTCAGTGTCC GATTCATCTA CTGTCCAAAT AAATGACGGT CATTTTAAAA TGA'rGATrrA ATCAAACTCT TGATGATGAT GATCAAGATG ALCTTCGTAGT ATTGAGGAAA TTT'AGTTGGG GATGTTGGTT
AAAGGGGACT
AATATCTAGG AAT'rGAAACC ATTGAAATCA AATACCAAAA TGCTGATCAA ATTTCTATCC
ATGTTGT'CCA
AGGGAATTCA
CCGTGGAACA
11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840
ATATTTCAAG
AGGCCAAGCA
ACTCTGAACG
CCTTTGATGA
TGATGGTAAA GCTCCAAAAC TCAATAAATT AAAGGTTAAG AACCAGGTAG AGGATATAGC TAGTCAGTTG AAGGGTTTTG CTTTCTCAGC TGCT1TTCCCT TATGTTGAAA CGGATGATCA TCAAGAGGGA TATGCAGGCT TTGGAAAGAC *TGAAGTTGCT TTGTCATTCT AGTTCCGACG GATTCCAAAA 'Nr1TGCAGTT AGACTGCAkC ACT'rGAAAAA GTGTTTTGTC AAAAGATGT A.GCGATTTGG TGTCAAGCAT TAACCTTGAC CGCTACGCCA ATTTATCTGT TATTGAAACT AAAAGAATGA TAGTGTCATT 'rTTATTATCT TTACAACAAA TCTCAGCCAA TGGATCGACT ATGCGTGCAG CCTTTAAAGC ACGG'rTTTAG CGCAACAGCA AATATTGA'rG TGTTGAGTCG TTGAAAAACG GTCAAGTCGA GTGTTTGCTG ATTGGGCTT AGTcAATGAT CACAA ACMG CTATACGAAT TTTAAGGAAC CTTTAGAAGT AAAAAAGAGC TATTT'rGATT GGAACACATC GATGATTATT GATGAGGAAC GAAGAAACAA GTGGATGTCC GTCTATGCTG GGAATCAGAG TGTTC-AGACC TATGTTTTGG AATGGAGCGT GGAGGTCAAG GGTTTCAGAA TTACAGGAGT
AAGGAAACTT
ATCCCTCGTA
CCGCCGACTA
CGTGATGCTG
GTTGACACAA
TGAAAGAACT
CCCTCCATAT
ATCGCTATCC
TCTTGCGTGA
TTGTrCAGAA WO 98118931 PCT/US97/19588 265 TGATTCCGGA GGC'ITCGATT GGATATGTTC ATGGTCGAAT GAGTGAAG'rC CAGTTGGAAA 12900 ATACTCTATT AGAC'IrrATT GAGGGACAAT ACGATATCTT GGTGACGACT ACTATTATTG 12960 AGACAGGGGT OGACATTOCA AATGCTAATA CrrTATTrAT TGAAAATGCG GACCATATGG 13020 GCTTGTCAAC CTTATATCAG TTAAGAGGAA GAGTCGGTCG TAGTAATCGT ATTGCTrATG 13080 CTTATCTCAT GTATCGTCCA GAAAAATCAA TCAGTGAAGT CTCTGAAAAG AGATTAGAAG 13140 CGATTAAAGG ATTTACAGAA TTGGGCTCTG GCrrTAAGAT TGCAATGCGA GATCTTTCGA 13200 TTCGTGGAGC AGGAAATCTT TTAGGAAAAT CCCAGTCTGG TTTCATTGAT TCTGTTGGT 13260 TTGAATTGTA TTCGCAGTTA TTAGAGGAAG CTATTGCTAA ACGAAACGGT AATGCTAACG 13320 CTAACACAAG AACCAflGGG AATGCTGAGT TOAI'rrTGCA AATTGATGCC TATCTTCCTG 13380 ATACTTATAT TTCTGATCAA CGACATAAGA TTGAAATTTA CAAGAAAATT CGTCAAATTG 13440 ACAACCGTGT CAATrATGAA GAGTTACAAG AGGAGTTGA'r AGACCGTTTT GGAGAATACC 13500 .CAGATGTAGT AGCCTATCTG TTAGAGATTG GTTTGGTCAA ATCATACTTG GACAAGGTCT 13560 **TTGTTCAACG TGTGGAAAGA AAAGATAATA AAATTACAAT TCAATTTGAA AAAGTCACTC 13620 AACGACTGTT TTTAGCTCAA GATTATTTTA AAGCTTTATC CGTAACGAAC TTAAAAGCAG 13680 GCATCGCTGA GAATAAGGGA TTAATGGAGC TTGTATTTGA TGTCCAAAAT AAGAAAGATT 13740 *ATGAAArrTT AGAAGGTTTG CTGATTTTTG GAGAAAGTTT ATTAGAGATA AAAGAGTCTA 13800 AGGAAGAAAA TTCCATTTGA TATTTTTCTT CTATAAAATA GATAAAAATC GTACAATAAT 13860 00 AAATrGAGGT AATAAGGATG AGATTAGATA AATAT'PTAAA AGTATCGCGA AVI'ATCAAGC 13920 *GTCGTACAGT CGCAAAGGAA GTAGCAGATA AAGGTAGAAT CAAGGTTAAT GGAATCTTGG 13980 CCAAAAGTTC AACGGACTTG AAAGTrAATG ACCAAGTTGA AATTCGCTTr GGCAATAAGT 14040 *.TGCTGCTTGT AAAAGTACTA GAGATGAAAG ATAGTACAAA AAAAGAAGAT GCAGCAGGAA 14100 TGTATGAAAT TATCAGTGAA ACACGGGTAG AAGAAAATGT CTAAAAATAT TGTACAATTG 14160 AATAATTCTT TTATTCAAAA TGAA'IACCAA CGTCGTCGOT ACCTGATGAA AGAACGACAA 14220 AAACGGAATC GTrTTATGGG AGGGGTATTG AT'IFTGATTA TGCTArrATT TATCTTGCCA 14280 ACTTrTAATT TAGCGOAGAG 'rTATCAGCAA TTACTCCAAA GACGTCAGCA ATTAGCAGAC 14340 **9TTGCAAACTC AGTATCAAAC TTTGAGTGAT GAAAAGGATA AGGAGACAGC ATTTGCTACC 14400 AAGTTGAAAG ATGAAGATTA TGCTGCTAAA TATACACGAG CGAAG'rACTA 
TTATTCTAAG 14460 TCGAGGGAAA AAGrrTATAC GATTCCTGAC TTGCTTCAAA GGTGATAAAA TGGAAAATTT 14520 ATTAGACGTA ATAGAGCAAT TTTTGAGTTT GTCAGATGAA AAGCTGGAAG AATTGGCTGA 14580 266 TAAAAATCAA TTATGCGTT- TACAAGAAGA AAAGGAAAGG AAGAATGCGT AAATTCTTAA TTATTTT'GTT GCTACCAAGT TTTTTGACCA TTCAAA6AGT CGT'rAGCACA GAAAAAGAAG TCGTCTATAC TTCGAAAGAA ATTATTACC TTTCACAATC TGACTI'GGT ATTTATTTTA GAGAAAAAT'r AAGTTCTCCC ATGGTTTATG GAGAGGTTCC TGTTrAGCG AATGAAGATT TAGTAGTGGA ATCTGGGAAA TTGACTCCCA AAACAAGTT TCAAATAACC GAGTGGCGCT TAAATAAACA AGGAATTCCA GTATTTAAGC TATCAAATCA TCAATI'TATA GCTGCGGACA AACGATT'TT ATATGATCAA TCAGAGGTAA CTCCAACAA'r AAAAAAAGTA TGGTTAGAAT CTGACTT'rAA ACTGTACAAT AGTCCTTATG ATTTAAAAGA AGTGAAATCA TCCTTATCAG CTTATTCGCA AGTATCAATC GACAAGACCA TGTTTGTAGA AGGAAGAGAA TTTCTACATA TTGATCAGGC TGGATGGGTA GCTAAAGAAT CAACTTCTGA AGAAGATAAT CGGATGAGTA AAGTTCAAGA AATG'N'ATCT GAAAAATATC AGAAAGATTC TTTCTCTATT TATGTTAAGC AACTGACTAC TGGAAAAGAA GCTGGTATCA ATCAAGATGA AAAGATGTAT GCAGCCAGCG TTTGAAACT CTCTTATCTC TATTATACGC AAGAAAAAAT AA.ATGAGGG'r CTTTATCAGT TAGATACCAC TGTAAAATAC GTATCTGCAG TCAATGATTT TCCAGGTTCT TATAAACCAG AGGGAAGTGG TAGTCTTCCT AAAAAAGAAG ATAATAAAGA ATATTCTTTA AAGGATTTAA TTACGAA.AGT ATCAAAAGAA TCTGATAATG TAGCTCATAA TCTATTGGGA TATTACATTT CAAACCAATC TG.ATGCCAcA TTcAAATrccA iAGATGTCTGC CAT'rAtGGGA GATGATTGGG ATCCAAAAGA AAAATTGATT TCT'rCTAAGA TGGCCGGG.AA GTTTATGGAA GCTATTTATA 14640 14700 i4760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 ATCAAAATGG ATTTGTGCTA CCAAAGGTGT TTCTGTTAAA ATACGGGTGT TGTCTATGCA ATTATGATAC GATTTCTAAG AGATTTTTTA AATCATTTTC AGCTCTTTCT GGTGGATTAG AGAGTTAGAG ATTGAATTGA TTGGGAAGAA AAGGAATTAA CAATTTTCA GGAGAATTTT GAGTCTTTGA CTAAAACAGA TTTTGATACT G'rAGCTCATA AAATTGGAGA TGCGGATCAA
CAGCGAATTG
TTTAAGCATG
GATTCTCCAT TrTATTCTTTC TATTTTCACT AAGAATrTCTG ATAGCCAAGG ATGTNTATGA GGTTCTAAAA TGAGGGAACC TCAAGAAGGG ATATTrCAAA AAGCATGCTA AGGCGGTTCT ATTCCATGTT TCTATTTAAG GTATTGTCTA CTTATCAAAA TTCTAGCTCA TGTGAATCAT AAGCAGAGAA TTGAATCAGA GGAAGTTGGC TGCTGAAGCA GAGCTTCCTA TTTATATCAG CAGAAGCGCG TGCACGAAAT TTTCGTTATG ATTTT'T-TCA GTGCGACAGC TTTAGTCACT GCCCACCATG CTGATGATCA GCTTGATTCG AGGAACrCGC TTGCGCTATC TATCAGGAAT GAGAGATAGA AATCATTCGT CCCTTC'rTGC ATTTTCAGAA
AGAGGTCATG
GGTGGAAACG
TAAGGAGAAG
AAAAAGACAG
ATTTTTATGC
CAAGTAGTCG
WO 98/18931 PCr[US97/19588 AAAAGACTTT CCATCAATTT TTCACTTTGA AGATACATCA AATCAGGAGA A'rCATTATTT TCGAAATCGT ATTCGAAATT CTTACTTACC AGAATTGGAA AAAGAPAATC CTCGATTTAG GGATGCAATC T'rAGGCATTG ATCTAACAAT ATTAA'rGTGG AAGAGTTTTA CTTCAAACT'r GTTTGCTGAA GTI'CAGCAGA TGGCTATGAA TTGATAAAAG PGATGAAAAG GAAGATGAAC TTTATTTTCT TTTGGAC'TC TGAAACATCC ATACACATTC TAGAAAAAAA CTCAGACGTT GCAATGAAAT TTTAGA'rAT GATTTGGCAA TAGCTGAATT AAGATTTACA GCAGTTATTT TCTTACTCTG AGTCTACACA ATCTGAA'rCG TTTTCCAGAT T'rGAATCTTA CAAAAGCTCA TTTTAAAATC TAAAAGCCAG TATCGTCATC CGATTAAAAA AGTACCAACA GTTCAGAT2 TGTAAAATCA GTCCGCAGgC TTGTGTTACA CTATCAAAAT CAGGTAGCTT ATCAAGGATA CAT'rAGAAGG TGAATTAATT CAACAAATAC CTGT'r'CACG GTCATCGAAA AACAGGAGAT GT'rTTGATTA AAAATGGGCA TATTTATTGA TTTGAAAATC CCTATGGAAA AGAGAAAC'rC
C. S S 5555.5 5~~4 V. TGCTCTTATT ATTGAGCAAT TTGGTGAAAT TGTCTCAATT TTGGGAATTG CGACCAATAA TTTGAGTAAA AAAACGAAAA ATGATATAAT GAACACTGTA CTT'rATATAG AAAAAATAGA TAGGTAAAAA ATGTTAGAAA ACGATATTAA AAAAGTCCTC GTTTCACACG ATGAAAT'rAC AGAAGCAGC'r AAAAAACTAO GTGCTCAATT AACTAAAGAC TATGCAGGAA AAAATCCAAT CTTAGTTGGG AT'IPAAAAG GATCTATTCC TTTTATGGCT GAATTGGTCA AACArATTGA TACACATATT GAAATGGACT TCATGATGGT TTCTAGCTAC CATGGTGGAA CAGCAAGTAG TGGTGTTATC AATATTAAAC AAGATGTGAC TCAAGATATC AAAGGAAGAC ATGTTCTATT 'rGTAGAAGAT A'rCATTGATA CAGGTCAAAC TTTGAAGAAT ?I'GCGAGATA TGTTTAAAGA AAGAGAAGCA GCTTCTGTTA AAATTGCAAC CTTGTTGGAT AAACCAGAAG GACGTGTTGT 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 AGAAATTGAG GCAGACTATA CTTGCTT'rAC TATCCCAAAT GAGTTTGTAG TTTAGACTAC AAAGAAAATT ATCGTAATCT TCCTTATATT GGAGTATTGA GTATTCAAAT TAGAAAGAAT AATCTTTAAT GAAAAAACAA AATAATGGT TCCTrTTCTA TGGTTATTAT TTATCTTTTT CCTTGTGACA GGATTCCAGT TGGGAATAAC TCAOGAGGAA GTCAGCAAAT CAACTATACT OAGTTGG'rAC CGATGGTAAT GTAAAAGAAT TAACTTACCA ACCAAATGGT AGTGTTATCG TGTCTATAAA AATCCTAAAA CAAGTAAAGA AGAAACAGGT ATTCAGTTTT TGTTACTAAG GTAGAGAAAT TTACCAGCAC TATTCTTCCT GCAGATACTA ATTGCAAAAA CTTGCTACTG ACCATAAAGC AGAAGTAACT GTTAAGCATG
TAGGTTATGG
AAGAGGAAGT
TAAT'rAAAAA
ATTTCTATTC
AAGAAAP'PAC
AAGTTTCTGG
TCACGCCATC
CCGTATCAGA
AAAGTTCAAr, TGGTATATGG ATTAATCTAC ATTCTCTATG ATGGGAAATA TAGTAAGGCT AAAGCAGCAA AGCTGAGGAA GAAAAACAAG ATTCACAAAA CTTGGAGCCC ACGTAAAAC'r TTGCTTGCTA CTCAGGTTCT GACTTTGTAG TTTTGAGGAT GCCAAAAAAG TGGACGTCAA CGTGGAGTCG CCAAC'TTTG ATTGAGATGG GACAAACCGT 'rCAGATGTAC AGTATTGGT'r GGTCGTCCTG GAATAAGCCT TTAGCAGAAG 268 TCGTATCCAT TGTCCCATTT GGAATTCTAT TCTTCTTCCT TGGGAGGAGG CAATGGCCGT AA'rCCAATGA GTTTTGGACG ATAAAGAAGA TATTAAAGTA AGATTTTCAG AACTAGTTGA AGTTGTTGAG TTCTTAAAAG GTATTCCAGC AGGTGTTCTT TGGAGGGAC AGGCAGTCGC TGGAGAAGCA GGTGTTCCAT AAATGTTTGT CGGAGTTGGA GCTAGTCGTG CAGCACCAGC TATCATCTTT ATCGATGAAA GTCTCGGCGG AGGTAATGAC GAACGTGAAC ATGGT'rTrGA GGGAAATGAA GGGATTATCG
ATGTTGCTGG
ATCCAAAACG
CTCCGGGGAC
TCTTTAGTAT
TTCGC'rCTCT
TTGATGCTGT
AAACCTTGAA
TCATCGCTGC
TTGACCCTGC CCTTTTGCGT ATGTTAAAGG TCGTGAAGCA
ATGTTGATTT
TGTTGGTGCT GATTTAGAGA ATGTC?'rGAA TAAATCGATA ATTGATGCTT CAGATATTGA TTCTAAGAAA GATAAGACAG TTTCACAAAA AGGACATACC ATTGTTGGTC TAGTCTTGTC TGTACCACGC GGCCGTGCAG GCGGATACAT TCTATCTAAA GAAGATATGA AAGAGCAATT AGAAATTATC 'TTrAATGTCC AAACCACAGG AATGGCACGT GCAATGGTTA CAGAGTACGG TGAAGGAAAC CATGCTATGC TTGGCTGCACA
GAAATTAGTG
TGAAGCAGCT
TGAAGCAGAA
AGAACGAGAA
GAATGCTCGC
GATTGCACTT
GGCTGGCTTA
AGCTTCAAAC
TATGAGTGAA
GAGTCCTCAA
CCAGGACGTT TTGATAGAAA ATCTTGAAAG TTCAIZGCTAA GCTCAACAAA CTCCAGGCTT TTAGTTGCTG CTCGTCGCAA GATAGAGTTA TTGC'1GGACC TTGGTTGCTT ACCATGAGGC GTTGTCCATA AGGTTACAAT CCTAAAGAGG ATCAAATGCT ATGGGTGGAC GTGTAGCTGA GACTTTGAAC AAGCGACACA AAACTTGGCC CAGTACAATA AAATCAATTT CAGAACAAAC GAGGCACGAA ATAAAGCTGC 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 AGCI'ATGAA ATTGATGAAG AGGTTCGTTC ATTATTAAAT TGAAATTATT CAGTCAAATC GTGAAACTCA CAAGTTAATT GCAGAAGCAT TATTGAAATA CGAAACATTG GATAGTACAC AAATTAAAGC TCTTTACGAA ACAGGAAAGA TGCCTGAAGC AGTAGAAGAG GAATCTCATG CACTATCCTA TGATGAAGTA AAGTCAAAAA TGAATGACGA AAAATAACCC TGAGAGAGGC TGGAGCCTCT CTTTTrGTG CAGTTTAGGA G.CTAAAGGGA ACAGAATGGA GAAAATGGAA CAAATGTGTT TTCTAATCTG TTAGACTGTA TCTAGAAAGG GGAAAATTAT GATTAAAGAA TTGTATGA.AG AAGTCCAAGG GACTGTGTAT AAGTGTAGAA ATGAATATTA CCTTCATTTA TGGGAATTGT CGGATTGGGA GCAAGAAGGC ATGCTCTGCT TACATGAATr GATTAGTAGA GAAGAAGGAC TGGTAGACGA TATTCCACGT ?1'AAGGAAAT ATTTCAAGAC CAAGTTTCGA AATCGAATTT TAGACTATAT CCC'TAAACAG GAAAGTCAGA AGCGTAGATA CGATAAAGAA CCCrATGAG AAGT'GGrGA GATCAGTCAT CGTATAAGTG AGGGGGTCT CTGGCTAGAT GATTATTATC TCTTTCATGA ACAAACAAAG TAAAGAGAAA GAGGGCGTCA AAGAGTATrA CCCACTAGTA AGTCATGCAA GAAAAGGCTG TATAATAGTA TAAGACACCG CCTTTT~CACG TTGCGTCAGG ACCACTTGAT GTTGACAAGC GAAAGCAGTT' CAAGAAGAAC TAGAACGCGT AGACACTTAC GCATrGTGTT AAAAAATGAA AAAAATTAGA AGAGTrGAAA ATAACAACTC GCGGTAACAC GGGTTCGAAT GAAAAAAAGTr TTAAAAAAAC GTGATATACT AATATAGT'rC AACACTAAGA GATTATAGAA CTTAAGCAAT GAACGATTrc TAAGGAGTr-r ACTATCCGTA AAAAGTAGTT GACAAAGTTT AGGTCCGTTG GTCAAGGGGT CCCGTACGGA CTATGGTATG TTrAAAAATCT TCAAAAAAGT TCGCTTGAGA GAAGCAAGTG ACAAAGACCT TTGAAAACTG A6ACAAGACGA GTAGTACTGA ACAATGAAAA AAACAATAAA AAACTTTTTA ATGAGAGTTT GATCCTGGCT ACCAATGTGC AGGGCGCTAC AACGTAAGTT TCTGTCAGTG ACAGAAATGA GTAAGAACTC CAGGACGAAC GCTGGCGGCG TGCCTAATAC 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 
20880 20940 21000 21060 21120 21180 21240 21300 21338 ATGCAAGTAG AACGCTGAAG AACGCGTAGG TAACCTGCCT CATAAGAGTA GATGTTGCAT GACCTGCGTT GTATTAGCTA CGACCTGAGA GGGTGATCGG GCAGCAGTAG GGAATCTTCG AAGAAGGTTT TCGGATCGTA GT'rCACACTG TGACGGTATC
GTAATACGTA GGTCCCGAGC
TAGATAAGTC TGAAGTTAAA GAGGAGCTTG CTrCTCTGGA TGAGTTGCGA ACGGGTGAGT GGTAGCGGGG GATAAcTA7-r GGAAACGATA GCTAATACCG; GACATTTGCT TAAAAGGTGC ACTTGCATCA CTACCAGATG GTTGGTGGGG TA6ACGGCTCA CCAAGGCGAC GATACATAGC CCACACTGGG ACTGAGACAC GGCCCAGACT CCTACGGCAG GCAATGGACG GAAGTCTGAC CGAGCAACCC CGCGTGAGTC AAGCTCTGTT GTA6AGAGAAG AACGAGTGTG AGAGTGGAAA TTACCAGAAA GGGACGGCTA ACTACGTGCC AGCAGCCGrG GTTGTCCGGA TTTATTGGGC GTAAAGCGAG CGCAGGCGGT GGCTG'rGGCT TAACCATA
INFORMATION FOR SEQ ID NO: 21:
SEQUENCE CHARACTERISTICS:
LENGTH: 6273 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
TGTI'TTTAAA GAGCCGTGTC TGGATAGACT TTCGGACGCA ACGCTCI'ATT CTGCCTATAC ACAAGATTC TAACCTTACT CGACATGAGC TGAAACCTCT GTAGTTCACA AAATATTATA CACCTATTTT ATGAATAGTC AACrTrCTTT T'rTAGAAAAT CATGAAAATT TITCTCTTTCT TTCCATTrA AGTGACATTC ACATCAAAAA ACCCCAGACG AAAT'PGTCTG AGCATTC'N'T TATCTAGTCG TTGAGTTCAG TATGTTTAAA GTCTCTGTCC CATCATTTCT TCAACAAACC AGAAACTCCT TGGCTACTTG CTTTCGAC TTGCCTTCAA CACCGACTTG TGGCTCATCT GGCTTTCTGT AATCTTACCA GCCAATGTAT TAAGAACTCT GGGTGTTTCT TGAGAAGAGC TTCTTTCATG AGTGGAGCCC CTTGATAAGG TGCTTGTCAT CTTCCAAGAC CTGTAAATCA TAACGCTCCA ATTCCGCATC
AGATAATGAA
TATTrGTTAA
ACAGTAAAAT
AGTCATTCTC
TTTAAGGAAG
TTGTTCTTGG
GTAGTTGAGC
r'rCCAACTCT
TGGGAAGAGT
AGTCGAATAG
GCATCCGTGA
GTCGCTACAT
TCGTTAAACT
ATGGTCTTCA
TGATAAGACA
ACCTGATAAA
GTCACCGTAC
AGGAAGCTTG
ATCAGCAACT
ACCAAGTTTT
AAAGCCACCA
AGTAGGAAGT
CTGGCATTAT
AAGGCCGCCA
ATAACAGGCA
AAG40CAATCC
TGCAAAATAG
TTTGAATATC CCCTGAC'rGA ATAGCCTGAT TGAGATTGAG ACCATACATT GATTGCAAGC CGAGTGTAAA ACCTGCCTTC AACTGCCCTT AGCCATATTC TTGAGCAATC TTTTTCGGAA TGGGTTTGAG ATAGGCTAGA TGATCCTGCT CCTGTTC'rGG TTCATGACTC ACCTTGGGTG CAGTAAATTC AGGATAGATG TCAATATCC TCTTCCCAAA ATTCGGTTTA ACAGTCGCAG TATACATATT GGCCAAAATT TCTGGTTCTG CCTTCTCTTT TTGAACCAAA AGAGCTGGAC AGGCAAAACC TGAGAAAATC GTCCGTAATT TAAAGGCAAT GGCTAGCACT GCAGAAGAAA TACGGTCAAT TCCCAAAAGA ATAA.AGGAAC AGGTTGCCGT ACCGATAATC AAAACAGCTG TGGCGAGTGG AATTTCAAAT TTCTTGAGAC CAGCCTCTTG CAGGTTCGGA TCAATTCCCT GGAAAATCGC ATAAATCACT AGAGCTGTCA AGCGAAGGGC TGGCTCAATG CCTTATTTCC ATCTTCACGG CCAC'TTTTTT CAAGTCTGAA CAGCTACAGC ATAGGTGTTT TAGCAATGCC ATCACGCGCC ATGGTTGAAG CAAACTTTCA CTTTrTTTCAG AGCTTCATAA TCATGCTGGT ATTTTCTTCA GACCTATTTT CCCAGCAATA TATAAGACAG ACCCAGTAAT TTGCTTTTTC CATCACTTTT GTGCCCCAAT CAAAATCAAA CTAGTCCCCC TGCACCAATC CCGTCCGAAT CCCAGACATG GTTCCCATCT GGTCATCCCA TCAGCCCAGT GATAGTATTT AAGCCGGCAA GG'rCCCAATT GGATGGTCTG GAAAATACCT GAAGAAkAAAC AGCCAAGGGA 120 180 240 300 360 420 480 540 600 560 720 780 840 900 960 1020 1080 1140 1200 1250 1320 1380 1440 1500 1560 1620 1680 1740 CCCATCAAAG GGATAAAGAG CCCCAACAAG GCCAGAGACG GCAATCTGCA AGACCCAGTC GGCCAGCTTC TCATGATAGC ATCGCAAGCA AAA'rAGCTAG TAACAAGGTC GCTGTCAACC AATCACTAAA ACGATCCTG.A ACCTCCAAAC AAGTCTGCTA CAAAGTCTGT CGCTACCTGG CGAAT?1CTC CATCCTGCAA TT1CATCCGTA TC-ATGGGTA CAAAAATCGT TGTCAGAACC TGCAACTGTT TTCTCGAAAT GAGGAAAATC ?rGGGCTGAC CAATCATAGC ACCAGATAAT TCACTAGGTA AGCGATGCCC CAAAAGCTCT TCTGrTTCT TCGTAATTTC GAGAGCAATA TTTTCCGCAA CTGTT-AGAT'r AAAAGCGACA ACTGCAAATG TTGAGATAGA AAAGTTGCAA TrAAATTAGT CATGAACACT TGCAGGCGCT 'NTAAAATTG TCTCGGGATT GACAGCAATA CGGTCCGCCA ACT1'CAAGGC TGTCATCCCA AACTCTr'rAT GCAATTCTTT' AGCATCCAAG; GCCGAAAACG TCGGACAATA CCGACCCGT'r ATACTCGGCT ACTGG;TAAAC T'rCC'IGCTC CACCCCTTCA TGGAAAAAGA GCAATAGCCT
GTTCATCCAT
GCTGTTCTCC
CAACCT'rAGC
'NTCACGAAT
GTAAAACATA
CATCCATATA
TCGTCTTACC
AGTTGACATC
ACCAGTAGAA AGACGAAGTT CACGCTCATC ATAGTCTTTG ATGCGCTTCC AATATTTCCA TCAGTTGGTT CCAAAAGACG GTTAATCATC TTGAGCATGG TGACCCAGAA GGCCCTACTA AAACCATAAA TTCCCCATCC TCAATCI'GTA 0 S.
TC'rCAAGACA TCTTTT TCCTCAATTT AAAACTTCCC rArC-CCAAT GCTCCACAA'r GCAACGCCAT CAATCTGAGT TGGAAAACAA AGTCAAAAGT CCTTGTCCAA TATCATGAT'r TCTAGCTCCC CA=TGGAA GCTTTCTTAT CCAAATCCTT TCAGGTGTTC GATAGTAGTC TCACGGAAAG TATCCGTCGT 'rGAACAAAGA CCAGATTGCC TGACGCTCAA AGAAATCTGC TGTAGCGCAG TGCTACATTr TCGATTGGTC AAGTCTTCTA TTTCCCGTTC TC1'AAACGGA
CTGACCATAG
GACACTATAT
ATGCTGAATC
AATTTCTGTC
GATTTCAAAA
AA'rGACATCC
CACCCATTGA
ATCCTCAATG
AAAGAAGGCT
CTAACCACAT
TCAGCCACAT
AAATCGTCTG
AAGAAACGGC
TCT~CCAAAAA
CAATGCTCAA
GCTTCTCCAC
GTGCGGACAA
GCAAATCCTT
TTGTATTCAA TCATTCT1'TG CCTrAGGCAT AACTTCCTTA AGATATCGTA CTGGGCATAA AGT'rTCCTTG TCCTAAGAGT AGTT'rTTATA AGCAGCACTT CCACATAATC ACTCCACTGC GAACCZAGCTT T'rTATTTTCT TTTGATCTAG TTGGTCATTT CAATACAACC ATTCTCATCC CATTCAGATA TrGATGAACA 'rCTTAATCTG ACGCTCTGGA CTTTCCCGTC AGGAACACCT GAGCCTCAGC AACTCGTCCG CTTGTrGTG.A CATATTCTAA CTTGAAAACC TTCAAATTGG CCAATT'rCTG ACTGACACGA 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 GTCGAATGTT GOATATAGGT ATCCCCTACA GACTGGGCTT TCTTGAATGG CATGGATGTA TAGGTTGTGA GCATrTTCA ACCTCATTTC CCTTCTCTTT CAGATTCGCC AAAATTCTTT TGAATTTCTT CCTCTGAAAA TCCTTTGTAA AAGATAGTAT TGCCCCACTT CTTTCTGGGA TCT?!TrCCAC ACGGACTAAC GTCAGCGTAT TATTCGCAAG GTTTCACTAT TCCATAAAAC GGATCTTGAC TCAGTAACTT AGCAAA'rGAC CATCTTTCAT AT'rGTAACAC TCATCGTTCT TATCCACACC ACCATACTCC TCCTCCAGTC TACAAAAGCC TTTCCTATCA GACCATTTAT TTTTATAGTA TATGAAGTT GAT'rTGAATT TCCCAA'rCAA CCATGACTGC TAAAACATTT CTTATT~TCAT TCCGCTA'rAA GCCTATCCCC TACCGTTTGA
CTTGCCTAAC
AATTACAAGC
TCCAGTCGCA
CGCTAAAATC
TTGAAAAATC
AACACCTCCA
AACTGTCAAC
GGCTCAACTA
272 TCCGTTAAAA CTAAATACTT TTI='CCT CTAGCTTTTT AGCGCGA'rA' CTGTCGCAGT TTGCCCTGTT CACCCCTATA CGCCCATTCA ACAAACGAAT
ATTATTTCG
TATTTCGATT
ACTTTTAACG
ATATCGAAAT
TAGAAATAAT
AGAG'rCTA
CTTACGCTTG
TATCATAGTC
TGCGCAGCCA
AAGAGCCTCA
ATGATGGGCT
GAATAAAACA
TTTGATAAT
AACTCCTTCG
GGGACACATT
CCATCTGTTC
GATAGAAAGC
GAATAGGACA
TTGTTrCATAT
CCTTCAAACA
AGAAG'r'TCT TTCCArrCGT ACTATCCTAT ATTTTATGAG TTTAAAGATA GAAGTAALATC ATAATTGCTT
AGACTAGAGC
'r'r'GTTCGTA
CTATAAATTC
TTTCACCTTA
CGATTCCTCA
ACTGTATCT2
TT'TATAGCAT
ATTTAATTTC
CCCTATCTTT
CT'rCGCTCCA
CTAAAACATT
TTrCGAAACTG
CTCAATCAAT
TTCGTAGCAC
CTTCCATTAC
3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280
TCACTACTAT
CGAGATAGAT
TCCTCATACA
AAGCGAGCCA
TTGACCAGGC
GTAGTCTCTA
AGATCTTTTA
TAATAATGTT
AAAGCTTGCG
CGAGCGCAGA
GCATCTGTTC
GTCAAAAGGA
AAGG'CCAT
ACTGCGAGGA
AC~rTTCTTT GGGCTCGGCT GACT'rCTCAT GATTCCTTGT CTTACAAAAA ArGCTTTGAT CCACAATGGA TAAGCGCAGA AGTCGCAGTT CCTC'rGTACT AGTTGAGCAA C'rCAGGTGCT GGATGTTTGG CTGAGAGACG AACTGCCTGC AATTGCTCAT GGAGAGCAGC AACTAAATCT TCACTCAA.AT TAAGGCTC TAGGTr'rGGT TCTACCATCC TAATCAAATC AACCGTTGAA CGATCCAATT TACTACTATT TGAACGCTCA ATCAAAGCAT TTTAAAGAGT TGGCTTCTTC TCTT'TGCACA GATTTAGGAG CAATTCACGA 'rTGTAGTAGG CAGTTTTA CATGTCGAGC ATGATTGTAA CTACCACCTC CCTTATGGTT TC=TCACCAA GGCTTGTAAG TTTGGTGAGA ATGGATATAA TTTTCAGATG AGCTGCACCT TACCAGCTTC TTCAGCCGTT p. pp be p
CTTCTAGGAA GTCATCCATT CACCGATAGT TGTTGATGGG CGCCTTTACC ACAGTAGTAT
GCATAGAGGG
ACACCACCAT
TGGTACTTGA
AATCCTTCAT CCCTGGGAGA AGCAAGTGAC CTGGATCATA GAAACCAATC CTCCAATCT GCCTTGACCA CCGTAGACAT CACCTGCTGG TGAGCAATCA AGACTTCTGG GTCAAACTTG GTTGTAGAG4G TATGAGCGCC ACGCAGACCA GGACGTTAGA ACCCAGATAG AGTTCATTGC CGAGTT'N'G ACCCGATAAA GCTrCAGCTA GCTCGCTTAC ATATT'TTr CATTGGCTG'r CGGATGCCAA AACTTTCTGC GCA.ATGGCTG GCATGGTTGG GAAATCACAG GAATTTCATG ACCATGGGGT TCCAGCCACC CTGACCATAA AACCAACTTC GCTrCTGAAT GTTTGATACC TGCGGTGTCA ACTTTTCACG GCAAGTTCTG T'rACTrCTTr TTrCATCCAT TT'rACCACTT AGTATCTTAG TCCTGCTCTA TCTAGTAAAC ATrCCAAAAT ATCAAAGTTG TTTGGATTTT TTCTTTAA.AT ATATAGTTGG TAGAAAT'rAA TT'rGACTTTC ACAAAACTAG AAAAGGAAAA CATGAGGACA CCGTAGCGGT CAAAATTGCA GAACTATCTG CTCAGCCTTG TCCGCAAAAC TCCCCCCTTT CCACGAGTCA ACCATCACGA GTCAAGAGTT GATTTCTACG ACACGGAAGG GTCCATATGA GAAGCGACCA AAAAATACCA CCCAAGCCAT AAGATAAGCA CGGACAGGCG TATCCCAAGC TTTTGAGATG GTACAATGGT ATCACCAGGA CACCATCAAA AACGATATCG AATGCGGAGG AACAGAACCT TGAAACGTTG GCTGCTAACC TACCATCTGG CTrGATTTCG AGACGCGCGG TGCATCCACA CTGTCACCAC 'rTCATCCACA CTTCATGACC TGAGACTGCA 5340 .5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6273
AATTTTTGAA
TTrATAGGAG
TCTTAGAAAA
TAACTCGAAT
TCATGAAATT
TTGAGTTTGG
CTAATAGAGT
AATAATGT'rG TCATTTCAGT TCCTrCTrTC AAGGATAGTG GGAAGGTGGA TTTC'AAGTT GGATAGTATT C'TCTTGCATG TAGTGCAAAA A'PTTATTTCC AAACAAAAAA ACAATACACC TACAGAAAAT AGTTGACTTC CCTTTCTTCT AATAGTACGC TGTAGCTGCT AAAACATrTC TGTTCATATC TTAT'rTCAAT TTACTATAGT AATCATGACC AGG
INFORMATION FOR SEQ ID NO: 22:
SEQUENCE CHARACTERISTICS:
LENGTH: 28171 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
ACAACCTTTrT TCAAAAACTC ACCTTGGTAC GGAGATGTTr TGCTTTCTGC TATTATTrTC GGTTATATTC ATATCAATTT TGCTTTAACT CCTCTTGC'PT TTTTCATTTA TGCTAGTGGA GGTCTTATTT TAGCTCTATT GTATCGCATG ACTAAAAATC TCTACTATCC AATACTAGTT CATATTCTCA TTAATATCAC TGCCTTCTGG GATGTGTGGT TGCTCCTATT TTCAGGAAGT TAGCTTACTA AAATAATGTC GGAACTPTTCC GGCATTTTCT TTTTTCACAA TTTTTCT?1'T CGATATTGTA GTGGTGTGTA TCCAGTTATT ITTTTTGAATT
ATAGTCAACG
GATTTTGAAA
ATAAGGTGA
TTTCCTNTT
GAAGATGAAC
TCGAACGCT
GGAAAAAGAG
TCGTGGGATG
AAATCAGTAT
TGACTTC'IrG
CCGTATTTTT
GA'rTGGGATT 274 CTrGAGAAAG GCAGATAGTC AAGiATAGTTA AGAAGAATAG GATGTCrT GGAAAACTTC TAAAATATGG TATAATGAAA AGATAAAGAA GT'rGGGGGTA
ATTCAACAAT
GAAAAGATGT
TTGGGCTTTA
CAATTTTATG
GCCAA'rCCTG
CCACCAACTA
GAATCAACTA
TACGCTATGT TGTGGCTATT GCCAATAGTG ATGN'AGTCA GCCGAGTCTG TCTATTTCTG AGAI TTCCG
AAAAATCGCA
AAGAAGAAAA
TTACGGCC?1' C'rGTTCAAA'r
TCGGACCAGC
AGAAITGGTT
AGATGAAI'T
TTCAGAGCC
ATTAGATGAA
'rCXr.GGACTT
AAAGGATTTG
'rCTGTTGCTA
TATCCTGACT
GTGGCGCAAG
GTACTTTTCG
TTCGTGATTT
TCTTGACCCG
ATA'TTTTCA
GCCAGCACTA
ATAAGAACTT
GGCATAGTGA
GGGTTGAAAA
TCCGTGAGGG
ATCTACCTCA ACAATCAAAA TAAAAAGGGG ATTATGCAAC TTTCCATACC CATATTTATC ATTAGGTCTG GAGGTCATCG AATTGATTCC TCATCCTTTA GCCCAGAAAG AGGAATTAGT TCGT'rTCACT CAAGAGAAAG ACCAGTACCT CGCTAGCTCA CAGATGTTTA ATGTGACAGA GACGGACGCC TATGCGACAG GTTCTGGATT AGTTATTCGT CTCAAGGATA ACCTAGATALA GGAGCTTAGT CAAGCTGGGA CTCTCTTCG'r GAGGAAATCA TGAAAAAAAG AGCAATAGTG GATCAGTTGG TCAAATCCTA TATCGTCCAG ATCCCCAATT TCGTTAGCTT GACCTACCTG CAAGATCAGC AGCTG'TTATT CGCTGTCATT TATTTACATA AACACATGGA GGACTCATTC GCGGGTGGTC TTGGAAACTT TATGACAGG CATGGAGGAT T'rAGCGGATT TACCAACGGT TTATTA'rTCA GAGAACTI-rG TCGATACCAG CCGTGCCACC TTGAATGGTA TTTTGGAGCG TTTAGATAGT GACAGTGTTA ATGGCATTAC CCGCATGGTC T7ATGTTAAAC GTGAAGAAGT AGAAGTCATG CAAGAATAIT TTGATCAAAA GCAGTCAT'rG TACTGCTTTT GATTGGGCTG CAGATTCCAC TGGGTGAAGT GCGCTCCTGG CAAAATCGAG GTGCAGCCTT TTCTATC'rTA ACTCTGGTTG TCGTGATAGG TGCCATTTGG TGGATGGTCT TGGGTTTGAC TCTAATAATC GTCAGTCAGG GCTTTGTTGT GGATATGTTC 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 CACCTTGACT TTATCAACTT TGCAATTTTC AATGTGGCAG ATAGCTATCT GACGGTTGGA GTGATTATr'r TATTGATTGC AATGCTAAALA GAGGAAATAA ATGGAAATTA AAATTGAAAC -TGGTGGTCTG CGTTTGGATA AGGCTTTGTC AGATTTGTCA GAATTATCAC GTAGTCTCGC GAA'rGAACAA ATTAAATCAG GCCAGGTCTT GGTCAATGGT CAAGTCAAGA AAGCTAAATA CACAGTCCAA GAGGGTGATG TCGTCACTTA CCATIGTGCCA GAACCAGAGG TATTAGAGTA TGTGGCTGAG GATCTTCCGC TAGAAATAGT CTACCAAGAT GAGGATGTGG CTGTCGTTAA CAAACCTCAG GGAATGGT'rG TGCACCCGAG 'rGCTGGTCAT ACCAGTGGAA CCCTAGTAAA 275 TGCCCTCATG TATCATATTA AGGAC?1'GTC GGGTATCAAT GGGGTrCTGC GTCCAGGGAT TGTTCA ccT ATTGATAAG.G ATAcG2cAGG CATCTAGCA CI'GCCCAAG AACTCAAGGA TGTTCATGGA AATCTACCTA ATGATCGTGG AAAAG.AccGT AAGAAACAGG CTGTAACTGC CGTCTTGGAA CGCTTTGGCG ATTATAGCTT TCATCAAATC CGTrGTCCACA TGGCTTATAT TGGTCCTCGC AAGACTTTGA AAGGACATGG TACTCATCCG AGAACAGGTA AGACCTTGGA GGAAACCTTG GAGAGATTGA GAAAGTAAGA GTAGGCGCTT TTr'rAGGTTTr GTCATGGTAT AATAAAATCC ACTTTATCAA TGT'rCAAGAA TCTTCTCATG ATTGCrAAAA ACGATGATGC TAAAAAGTCT 
CTCCGCAAA'r ATTCGGCGAT TGTAAI'rGAA
TAAAGGGAAG
AGTAGAGrrG
CGGCCATCCA
ACAATTTCTT
ATTTAAAGCA
ATGAAAAAGA
GGGAATGTTC
GGTGGCAGTG
GAAGATTATC
GCGCCr.ATT.G CCCGAGTGA CCTGCAGTGA CGCGTTTTCA CA.ACTGGAGA CAGGGCGCAC GTCGCTGGTG ATGAGGTCTA CATGCCAAGA CTTTAGGTTT GATATCCCAG AGATTTT'rAA AATTAACTAG TTTAGCAC'TT AGGCTCAAGA AAGTTCAGGA ATG;CGATTrAT TCTTGAAAGC ATTTCCCAGA TGGAAGTGA'r AGCATGTTCT AACAGACCGT ATTTATTN'T GGTGACCCAT CTACCTATCC AGTTGACCGA CTGAACGTCT ATGGGATAAT AAGGTGTT'rC AGTTATTCAA 2220 .2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 GGATACAGGA AATGGACATT TTGCCATGGT TCTCGCTATC CATGGACAGA GTCTTTCGTC GTTTGAAGGA ACCCACAGTG3 ATCATATTGG GTCTATCTTA AGAAATATAG CTGTA'rGGCT ATGATAAGGT AATATCACAC AAGGGGATGC TATGAAAATG AAACTGAT'rC AGGAA'TTGAA ACGTCTTATA ATTGGGT1GTC CAAAAACTTG AAATGTTGAT GAATTACTGT TGATAGTCCT ATTACTAATT 'rTACAGACT GCTGCAGAAA TCATTTTCAG TTTGGGGACA TGGATATTCA ATCGGGTGAA TTAAAGAAAA ?r-rGGGATGA
GCTCTATAAT
TCCTTGATTA GCGTGGTGAA AGTCAATGGC AAGAAAA~r ACCTTGGGGG AATGTTCATG; GAGCAGAAGA CAAGTATGGT CCTCTCATTGCGAAALAGTTGA TTAATCATC ACCATGATAC CAACAAATCA AATACCAAGG ATTTCATTAA CCGAGTTTGA TTGTTCAAAC TTCGGATAGT CTACCTTGGA AAAATGGTGT TATGTTAATT GGCTCAAAGA ACGAGGAATTf GAGAGAATCA ACGCAGCCAG GATGCAACAG TTTTTGATAT TCGAAAAGAC GGTTTTGTCA ATA'I=TCAAC CCGATTCCAA GTTmCAAGC TGGT'rGGCAT AAGAGTGCAT A'rGGGAACTG GCGCC'rGATT cTAcAGr.AGA-- G'rAT'rGTC- GGTTGGALATG. AAATCGAAGG CAATTCCAAT 3360 CGATTTAGAT 3420 TTTGATGAAG 3480 AAATTTGAGT 3540 TGATAGTGAG 3600 CAAAGACTAT 3660 ATCCTACAAG 3720 GTGGTATCAA 3780 TGAATGGTAT 3840 TACTTTAACC AAACGGGTAT CTTGTTACAG AATCAATGGA AAAAATGCAA CAATCATTGG 1900 276 TTCTAT'rTGA CAGACTCTGG TGCTTCTGCT AAAAATTGGA AGAAAATCCC TGGAATCTGG TATTATTTTA ACAAAGAAAA CCAGATGCAA ATTGGTTGGA 'rTCAAGATAA AGAGCAG'rGG TATTATTTGG A'rGTTGATGG TTCTATGAAG ACAGGATGGC TTCAATATAT GGGGCAATGG TATTACTTTG CTCCATCAGG GGAAATGAAA ATGGGCTGGG TAAAAGATAA AGAAACCI'GG TACTATATGG ATTCTACTGG TGTCATGAAG ACAGGTGAGA TAGAAGTTGC TGGTCAACAT TAT'rATCTGG AAGATrrCAGG AGCTATGAAG CAAGGCTGGC ATAAAAAGGC AAATGAT'rGG TATTTCTACA AGACAGACGG TTCACGAGCT GTGGGTTGGA TCAAGGACAA GGATAAATGG
TACTTCTTGA
GTGGATTCAA
ACTACAAGTC
AAAGAAACGA
TCAACTTCAC
AAGAAAATGG TCAATTACTT GTCGTGCCTG GTTAGTGGAT ATTCAGAAAT AAAAGAATCC GTCAACATGA AAGTGTTACA AAAGCTCTGA AACGAGTGTA GTGAACGGTA AGACACCAGA AGGTTATACT GTTTCGA'rCG AG-AAATCTGC TACAAT'rAAA AAAGAAGTAG TGAAAAAGGA TCTTGAAAAT1 AATTTTT'CAA CTAGTCAAGA TTGACATCC
AACAAATCGG
AACTCT=rC AGAAGGT'rTT AGGGCCTTCT TTTTCCTATC ATAATGGATA AATATGAA'rA GGTGGGTACT TCTTCTrCTGA TATTACCCAG CAGTGGCrA AGGTGCCATT- GCGGCTGGTT TGATAAACAG GCTTCAGCAG TCTTCTCTTG CGTCAAATCG TAAGCGTCGT TATAAAAATG TCCTATCATC AATGAGAATG CACTCTAAGT GCTCAAGTAG TGTGGACGGT CTCTATACTG A'rCGGAGTGA GACTATGAAA CAAATGAGGA TGGAAGTTTA TGCTGCACGA GGCTGGTCAT TTGGAGCCTT AGGA'rTTAAA CGGTAGGGCA GGGGCTTTTG TT'TCTGCACA AATCTTGCTG AATCAGAACA GTAGTAGAAA TATTTCCTGT TATTCATGTT TACAAACGGA TTGTCTrTAA TCACGTAGTA AGGTAAAGGA GAGT'rGATT'r TGGTGTCTTC AAGCGTCCGA CTAAGATTC TTGGAAGAAT ATACAACCAA ACCCAAGATG ACTTTGTGGA T'rGCTCAACC GTGGGGCAAT CTCAAGGTTG GGGACAATGA CTTTTAGTTT TCTTGACAGA AGAGCCAAAC GCTTGGAGAG GGAGCTGGTT CGTCAAACGG GCGACGGAAT CAGGAGT'rCC 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700
CCCATCAGGC
ATAGTGTCGT
CGGCGATGGT
GAAATCCTAA
TTTGTCGGTT
TATTGATGAG
CCAAGCAGAC
TTCAGATCCA
AATCGAGACC
AACTGGGGGT
TGTTTATATC
GGATGGTTCT
CTTCTATGCT
TCAATATGGA
CGGTGATATC
ATCAATCGTG AGATTATTGA TATGGCTGGT ATGTTAACCA AAATCAAGGC TGCAACTATC TGCTCATCCT TGAAATCAGA TTCCATGATT GAGGCGGCAG AGGAGACCGA TACTTTGTTG CTCAAGAGAA GGGGCTTCGT ACCCAGAAAc AATGGCTTGC CAGAGTCAAG GTTCTATTTG GGTTGATAAA GGGGCTGCGG AAGCTCTCTC AAGAGTCTTC TCTTATCTGG TATCGTTGAA GCAGAAGGAG TCTTTTCTTA GTGACAGTAT TTGACAAGGA AAGTGGAAAA TCACTNGGAA AAGGACGCGT 277
GCAATTTGGA
GATTTACCGT
TTAGAGGTAA
ATCGATTAAC
GCATCTGCTT TGGAGGATAT GTTGCGTTC1' CAAAAAGCCA AGGGTGTCTT1 GACGACTGGA TTTCCATTAC TCCTGAAATC CAACTACTTT TTACAGAAITT ACTATGGTGA GTAGACAAGA ACAATTGAA CAGGTACAGG CTGTTAAAAA ACAGCTAGTG AAGAAGTGAA AAACCAAGCC TTGCTAGCCA TGGCTGATCA CTTAGTGGCT GCTACTGAGG AAATTTTAGC GGGGAAAATC TCAGATG'rcA TGTTGGATCG GATGGCAAGA GGAATTCGTG AAGTGGTTGC AACAAGTCAG CTTGAAAATG GTTTGGTTAT GGC1TAATGCC CTCGATATCGG CAGCGGCTAA TCTTTATTTG GATGCAGATC GTATAGAAGC CTTACCAGAT CCAATCGGTG AAGTrTTAGA CACAAAAAAA CGTGTAGCTA TGGGTGTCAT CGGTATTATC TATGAAAGCC GTCCAAATGT GACGTCTGAT GAGTGGAAAT GCGGTTGTTC TTCGTAGTGG TAAGGATGCC TGTCACAGCC TTGAAGAAGG GCTGGAGAC GACTACTATT GGTGGAGGAT ACTAGCCGTG" AAAGTAGTTA TGCTATGATG CCTTCTCAT'r CCTCGTGGAG GAGCTGGCTTr GATCAATGCA ACCTGTTATC GAGACAGGGA CTGGATTGT CCA'rGTCTAT AGACAAGGCG CTGTCTATCA TCAACAATGC TAAAACCAGT CATGGAGGTT CTGCTGGTTC ATGAAAACAA GGCAGCAAGC AGTGTTGGT'r CCAGAGCGTA AGGAAGCTGG ACTGGAACCA CAAAGCAAGC CAGTrTGTTT CAGGTCAAGC AGC'rGAGACC TTTAGACTAT GTCCTTGCTG TTAAGGTTGT GAGCAG'rTTA TGAATCCCAC AGCACCCATC ATTCGGATGC TATTGTGACG ATACTTTACA GATCAAGTGG ACTCTGCAGC GGTGTATGT'r AGATGGAGGA CAATTTGGTC.TTGGTTGTGA AATGGGGATT GCGTGGTCCC ATGGGCTTGA AAGAGTTGAC CAGCTACAAG CCAGATAAGG GAGTAAGAGA TGAAGATTGG ATTTATCGGT CTTGGCAAAA TCTGTCTTGC AGACTAGGAC GTCAGATGAG TCAAGCTAAG GTAGATGCTT TCATTGCAGA CTTTGGTGGT AATGTTTGCA GAAGCAGA'rG TGAT'TTrCT AGGAGTTAAG CCTTTCTCAA TACCAGACCA TCCTTGAAAA AAGAGAAAGT GCGGCTGCTT TGACTCTTAA TATCAAACAA CCCATGCCAT CATCCAAATG TGATTCAACT AAGGCCAAGG GCTATCTAGA GTG4GTTGAGA ATGCGATTGT GTGGATAAGG ATGCAGACGA CGTCCTTCTG TTTGTAATGC TTCCTTCCTC GCTTGGAGCA ATTCAATTCC GCCTAGATAG CAAGACTTTG ACACCGAGTT GAAGAAGCGG TTGCGCACAT GAAAATGCTG AAGCTGCAGC AATGCCTCAA CTCGTTTCAC TCTACTCAGA AATTGCACCC TATGTGGTTG CCGGTGATGG TTGGGGAATA TGGGTGCTAG ATTCTCCTTG CCAATCGTAG CAGGCTTCCA GCAATGAAGA 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 CCTGCTCAGT TTTCTGAACT CTTCTTTTGA TTTCGATGGC AGCTGGATTG ACCTTAGAAA AAC'rAGCAAG 
TCTTATCCCA AGTC.AACACC GAATT-ATTCG 278 TATGATGCCT AATACCCCTG CTTCTATCGG GCAAGGAGTG ATTAGTTATG CCT'TGTCTCc TAATTGCAGG GCTGAGGACA GTGAGCI'CTT TTATCAGCTT TTAGCCAAGG CTGGTCTCTTr GGTGA.ACTA GGAGAAAGTT CTTTGTCTAT CN'TTATCG AGAAATAGCA T'rGAAAATGG AAGTCAGCAA CATCCTGGAG CGCTGGTGTA GCAAGCCTAG TCAAGCCTAC AAACGAACAC TATGGTG;GCT GAAATGAGAA AATAGAAGTA GTAAAAAAGA AGGGACCAGA GGGAGCAGGC AAAAAGGAGT AGAGGTGI'rG T'rCGGGAAGT GATTTTGGAT TCTATATTGC CAGTCGCAGA GCAAGTTGGT CATCATGGAT GTGGCTTAGA TATTGAAGCC CCGA'rrTGAC ACTCTAT'r ATAGTGACCG CGAGGTTAAT GTCAAGGCTA CCTTTCTCTT GTCTCCCTT'r GGAGCAAGTT TGGCCAAATG AAACAAGATC CCGTATCTTA GAACAAGACC CTTGGAAATG GCGCAATT'T TAATCGATGC AGCGACAGGT CTTGCAGGTT GTrGGACCAGC AGGCCTTGGC ACATGCAGGT CTTCAGACAG GATTACCACG CAGCACAAAC TGTGGTAGGA GCTGGGCAAT TGGTCCTTGA TAT'IGAAAGA CCAAGTCTGT AGCCCAGGCG GTTCGACTAT AAGCGCATGC TTTCCGAGGA ACAGTCATGG ATGCAGTTCA AAGAACTAGG TAAATAAGAG GTAGTTTTGA CTGCCTCTrT GAcACAAAAA GATTGTCACA AACCCCTATT TT'N'TGATAG
AATGAGTTAG
AAGACCAGTG
ACGACCCGTG
CCAAGTCATA
CAGCATTTGG
CG'N'TTATCG
ACATGTCAAA AGGA'TTTA GTCTCTCTTG
TTTTAGAGC
AACCTGGCGG
CTCAGATGGA
TGGAAAAAGT
ATAGTTCTGT
TCTGCTACCA ATT'TTAGAGG AGTCTTGATT GGGGAGAAGA TGCTAAAACA GAGCTACTTC TCTTCCAGCC CTTGAAGCTG TGCCTATCAG GGATTTGGTC TGCGACAGAT GGCCTCAAAC GCTGGCTCGT ATTGCTGCTA GGACTTGCAT AAAAAAGTTC CATTGTCAAG A'rrGATGCTA a.
a a. a a a a. a.
a a a a. a.
a a a a a a a. a.
a a a 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 ATTGACTGGC TCAATCAGTT GACATCGAGG TGGAAGAAGG CGTTTGGATT TGGAAGGGTT CTGGATAAAG AGGGAAATCG GTGGAAACTA CCAAGGCTGT CTTGT'rTGAC GGAATGGGCT AACTAAAGGC 1TGGCAACCA GCTCAGTTTG ACCGTT'TT
AGCTCAATCA
ACCATGTGAG
CTTGATTAAA
GTTTTCTCAA
AATGCATCCC
TTATATTNC
AAATGCCGAA GT'rCCAAGCT CCAGTTAATC AGGTCATTAA GCAGGGATTG AAAGCCAGCA
CGCCTATCTC
CCTCTTM'TGT
GATTGAACAG
GACGGAACGC
ACAGGTCTT'r
TTTTCAGT
ACGGATAAAG
GGAGAATTC
ATTCGAGAAT
ATCATCGAGC
TCTTTGAAAG
TTGGCGTCTT
CCGATGTCAC
TGGTGGGTCA
AAGCGGATAA
AACGCAGCCA
TTCTTGACTA
ATTCTCTGCT
GCGA'rCAGGA CAAGGTCATC GAAGAACCCC AGAGTGAAGT AAAGATGTTA CCGACAATCC GAAGTCGGAC TCAGATCTTC CACTTTAAAA AGCAAGAAGA AAAACTTATC TTACTCTTAG AACAAATGGG ACTTGTTAAG AAAAAAGCGA CTCTTTTAGC TAAGTTTAGT CAATCGCGAG CTGAAGCAGA AAAG'rTGGCT AATCAGGCAA TTGGTTAGTA GCTAAGAAAA AGATGATAAG GAAAAACAGG CCTCTTGCAG GTAAGAGTAA GCAAGCTAAT GTCAGCTTTC TCAAAAATGA ATGA'rAAAGA CGCGCTGGAT GATTTTTCCC GAAAAATCTC AAGAGCCTGG GCGAGAACGC TTGGGTGAGG AAGTGTCCGT CGCAT'rTACC TCGAGAGCAG GACGAGGAA'r CAGATTCAAA AAAGTTTTAA CCGATTGGCA ATCTAGATGA 'rGGATTGCTG CTGAGGATAC ACCAAGCAGA TCAG'rTTTCA TTCTTGAAAG CAGGGCAAAG GACCCTGGTC ATGATTT'AGT CCAGGTGCCT CTGCAGGAAT ATCTTTTACG GTTINACC AAAGATTATC CTGAAACACA GAAAATATGT TAGAAGTCTA ATCTA'rGAAG AATACCAACG CCACTCAAGG GCGAATGTCT GACGAGGAAG ACTTGTTCG'r
GTTTTTGGAC
AAGAAAGrrA
ATCAGGTTTT
GAGTGATTCT
AAAATGCCAT
AAGGAAAGGG
279 CTTGG'rCGAT GAAAGTGAAC GCCTGCTGAC TCTACAGG'rT GCCAAA'rTAG CCAAC?1'GGC ACGGATT-CTT GAAGTTCTCT GTGGGCACGA ACAAGAT'rrA CTAGAAGCTA GAAAAATGTG GGAATATCTG GTCTTGAAAG AAATATAAAC CTG7I-IATG GACAAAAAAG AATTATGA AACAATTAT GGTAACCTTA GCCGATGTGG TAGAGGAAAA TACAGCTCTT CGCTTGGAPA TGGAAGCAGA TGCTCCTGTC GTGATGGATT TCACGTATGT GTATGTTTTG TGACGAGTTG GGGGCAGTCT CCCTA'rGGCA TATGACTTTT CGTGCTATCC
AAGGCCAAGC
AATGATTT
CTATACAGGG
AAGCCATCAA
ATAGTAAGTT
ATGTTCGTGA
ATGGACAACG
AGTAGGCATG
AGCTGTATCT AGTGGCAACG AGACCTTGAA AGAAGTGGAC
GCGCAATACA GGGCTTTTGC TCAAGCArTTT TGAGCACAAT GCCAAGGAAA AAATTCCTGA TATTGCTCAG .GTCTCTGATG CCGGTTTGCC 'rAAGGCAGCT A'rTGAGGAAG AA.ATTGCAGT
TTCTGCCTTG
GAGAAAATCA
GATTTTTTAT
CGGTGACCGC
AGGTACTATC
TC1'CATTrGTT
AGAAATTCAA
ATTGCCAGTG GTTrAGCGcC GGTCAGCAGA AGCAATTTTr GAATCACCTC ATCGTGTAGC TCCGTI'GTCT TGGTCAGGGA TCTGAGTTAT TAGAAAGCAT GAGGGTGCCA GTCAGGGTGT ACCCGCATCC AGCAAGGTGT
TGACATTTCC
TTTGA'rTGGT
TAGCATTTCA
TGTGACAGTT
ACAGCCACAT
TGGCTTGAAA
AGACACGTTG
ATTGACCAAA
TGCTGAAACG
GGAGGAAAAG
GAAGAAAAAC
9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 105 10620 10680 1 0740 10800 10860 10920 10980 C;AAGCTATCA AGGAAGTCGC TAAGATTTAC TACCACGACT GGGAAGAAAA ACAATAAAGG CAGTGGAATA AAAGTCAGCI' CTACGCTGCC GAGACAGGAT GTAATAAT'rC TGTCTG'rTTC TGTTTAACT'r AATTAGTGAT GATAATATAA AGATGTATCA CTGGTATAG AT'rAAGTTTT TTATTAAGCC CATACGGAAT ACCGATGGTT GGAGCAGCAG CTTAGAAGGT ATAAATAGAA AAATAAGGTC ATTT'rAAATC AAAGGATTGA
AAGCTTTGGT
TTATAGCGTT
TAAATCAGAA
AGAAGGTGAT TN'TTGCGAA AACATTTGAG AAATATATTIT AGCTGATGAA GTGACAGAA CTTGGTTCTT AAATGGGAAG TAAATTTAAA TGGAATCAAC TTATCTCTG CAAGAGT'rGA GATTGATTCG TTGAGTGAGG AGCGACTAAA ACAGCGACTT TTTTGGAACT 'XTCAGAACTA 280 CATACGAAAA TAAAGAAGAA CTAAAAGCTG AGATAGAGAA TAGAATTTGA TAATATTCtA GAAAATTTAA AAGATAAGAG CTCCAGCAGA AAACCTTGCT TATCAGGTTG GTTGGACCAA AAGATGAAAG AAAGGGGCTr CAAGTAAAAA CACCATCGGA TT-GGTGAATT ATATCAGTGG T'rCACAGATA CCTACGCTCA AAGCAAAA'rT AAATGAAAAT A'rrAATTCTA TCTCTGCAAT AAGAATT1ATT TGAACCGCAT ATGAGAAAGT GGGCTGATGA GGGAAGTGTA TAAGTTTATT CATGTAAATA CGGTTGCACC AAATCAGAAA ATGGAAGAAG ATAGTATTAT AAATTATAT TTTAACTTTA AAAAATTT'CA TAAAAATGGT TACCAAAGGC GATAGAAGAA AAACTATCGT CTTTT'rTTT GCAAATTTTT AACAAGGGAG GTGA'rCI-GC A'rGGACTTTG AATATTTTTA TAACAGAGAA GCGGAA.AGAT TTAACTTCTT AAAAGTACCG GAGATATTAG TTGATAGAGA AGAATTTCGG GGCTTATCAG CAGAAGCAAT TATCCTN'AT TCCATACTTC TTAAACAGAC AGGAATGTCA TTTAAGAATA ACTGGATAGA CAAGGAAGGC AGAGTATT'rA TCTATTTTAC TGTCGAAGAA ATTATGAAAA GAAGAAATAT CTCAAAGCCA ACTGCCATAA AAACATTAGA TGAGCTTGAT GTAAAAAAGG AATAGGACTG ATCGAAAGAG TAAGGCTTGG ACTTGG'rAAG CCGAACATCA TTTATG-rA-A AGACTTTATG AGTATATTTC AGGTAAAAGA AAATGACTTAtees*: .4 :*,s0 0060.,- 0.
11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780
CAGAAGTCAA
GAACTTCAAG
AGTAAGAGAG
GCTGCTGAAG
AGACTTCCTG
TCGTAATCCG
TACTTTGGCA
TTTATCTTCA
PATATATCTTC
ATTTCTGCGA
AGAAACAATT
AAAACTTAAC TTCAGAAGTA AGGTTAAGAA CCTTGACTCT AATATAGTrr TGGTGAAAAC ATATATCGGA TTTACAAATC CAAAACTAGA ATCCTAGTTC AAGCGTTTAC GATGATTTCG AAGATGTTCT CAATCTTGCTI GCTGTTAGCG GCTTrGAGTTT ATGAGCCCTT GATAACCACT CTCATTTTGA ACAACTTCAT CTCCCTTGAC TTGTGACAAT
AAAGATTTTA
AACTATATAG
GGACTTGGAA
ATAATGAACT
ATGATTGATA
ATAGATTGTT
TCTCTCCTTG
GCTGGATTTA
GTCAGACAAG
ATCACGACAA
CGCTTGAGCC
TAGGGCGATT
GAGTTCTTGA
ACCTCAGAAG TAAAGAAA.AT AGAATAATAA GAGTAAGTAT CATTTCAAAA TGTGTTTTTA CACAGCTTGA GAATTACAT'r ATGCCAGCAA TCAAATTCAT GAAAACATTT TAAACGTTTT GATAGCGCAT GGTTACAGGC CGTGGAGTT'r GTACTTGAGG ATTTTACCAG CTTGTCCGAT TAGTTCACAG CGATATCCAA TTCATAGCGT GAAATTTCTT GATT'rTrACTTCCGTCGCAT AATCGTAACA CCACTTTGAA TTTACCAGAA TGATTCGCTA ATT'r'TT CAATCATTAC CGTGTCCTCA GAACTGAGAG
CAAGAGTTAC
CAGCCGCAAT
GGTCTTCTAG
TT'ITTAATAC
ATCGTGCATC
TGTTT1CGCAG
TTGTGAGCGT
AGGAGATCAC
AATAACAGTC
TTCTTAACTG
GCCTTCTTCC
CAAACAGCCC
TCGTGTCTCA
ACCAAATCGC
CACCCCAGCA
TTCAACCCAT 'rGGCTCCGAC GGATTAAG~r TTGTTCATAA GTTCGATATT CTCGCACATA GCTTAATTTA GGTTTTCGTC CACCT'rTTGC AGCTAATATC 'rCTTCAAAAG TCGTGCGCTG AGTTAGTTGT TTACTTGCTT CATCATTCAT GAAGTCTATT GGAAAGTAAG AAATATTGAA GGTGCTATI' TTTCAGGTAA AATAAAATAT TATGTTrGAAT GTTATGCTGT TTTAGATAAT CCTATCGATC CGTTATGTGG AAAAGATT'r? GAATACTCAC TATCTCTTTA CATCAAGAAA CTTTTrTGT TATACTAGTA GAAGAAAAAA AGTGGCGTGT TTTAATATGG ACTTAGGTCC ATCTTATATC AATGTTATCG GTGCTGGr'rT AGAGCGTGGT ATTCCAGTTA AACTATATGA TAAAACAGAC AATTTTGCTG AGT'rGGTTTG 0 0 0 0 :0.0.0 0 000.
GCTTTCGTGA ATACCAAAAT 'rTGAAGAGTG GCCATAAGAA GTGrTAACT TGATAAGCTG AACACCAACA AGACGCTAA AGAACTACTA TACCATATTT- GCTIGAGGCTA TTAGAAGAAA CACGAAGATT CACAGTTTAA ACGGT'rATAG CAAGAGATAG ATAGAGTAGC ATATAAT'rGA ATGACTAAAC AGGGAAGT'TT TTAGAAACAT TTGTGGGTGT CACCCAAAGA GGTATTAGTG GGCAGGTTCT GAAGCAGCTT AATGCGTGGT GTCAAGTCTA TTCCAATTC'r TTGCGTGGGG GCCTCGCTTG GGTTCTGTTA TGCCCTTGCA GTGGACCGTG CCCCTTGATT GAAGTGGTTC TATCGCTACT GGTCCTTTGA CGGTGCTGGT TTTTATTTCT TATGAGCAAG GTCTACCTCA CCCTATGACC AAGCAAGAA.T ACCGCTTAGT TCT'TTTGAAA GGCCAAACGT GGCATTAAA-A CCCAGACGAC TATACAGGAC ACTTCGTCAG GATAATGCAG CAAATGGGGA GAACAAAAGC TGTCCGTTAT GGTGTGATGC GACTTACCGT TCTAAGAAAC 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 ATGCTTTGAC AAATGCAGTT GGTCTTCTCA AGGAAGAAAT TCTTGGAATC TGCTGAGGCT ACACGTGTTC CTGCAGGTGG ATGGTTTCTC TCAAATGGTG ACCGAAAAAG TTGCCAACCA GTGATGAAAT TACAGAATTG CCGACAGATG T'rATTACGGT CAAGTGATGC CTTGGCTGAA AAGAT'rCATG CTCTTAATGA ACGATGCGGC AGCGCCTATT ATCGArGTCA ACACTATCGA AATCACGTTA TGATAAGGGA GAAGCGGCCT ACCI'CAATGC T'rATGGATTT CCATGAAGCT TTGGTCAATG CAGAAGAAGC AAGAAAAGTA CT'rTGAAGGA TGTATGCCTA TCGAAGTCAT CTATGCTTTA TGGCCCTATG AAGCCAGTCG GTCTTGAGTA.
CTCGTGATGG AGAATTTAAA ACACCTTATG CGGTTGTGCA CTGGTAGCCT CTACA.ATATT GTTGGTTTCC AGACCCACCT GTGTCTTCCA AATGATTCCG GGTCTTGAAA ATCGCAATTC TTACATGGAT TCACCAAATC
ATGCGGAGTT
T'rCTTGAGCA 282 AACCAAATCT CTTCTTTGCT GGTCAAATGA CGGG'TGTGGA CTTICACGCTT AGTTGCGGGA ATTAACGCAG CTCGTCTCTT TTTTCCCCGA GACGACAGCG AACATT'rCCA ACCAATGAAT TCCGTGATAA GAAGGCTCGT AATTTTTGAC TGTCTAATITT ATTGTGATAA AATAGGTAGG ACGTATTTTA ATCAAGTTAT TATCCAAACA GTTCAAACAA A.AT'rGCCCTT GTTATCGGTG TATGGACCGT GTTCAGGCAG GATGGCAGAT TCA'IrGCAAC GCAACAAGTG GCAGAGCCTT TATCGTTATC TTTGGTGCTG CCTTCGTGCA GCTGAAATCG TGTTTACAAT GCCGATCCTA CCGTGACGTT ATCAATAAAG GGACAACGAC ATTGACTTGG CGTATTI'rGGT GAAALATATCG AAGAATATGG CTAACGCAAT TCACTTGCTC GTGAATTTGG CGTGTACATG TAGAATACTA ATTCCAGAAG CGCGTGTTTT GAACGTGCCT TGAACGCTTC CGCTTGGTTA TCCCAGCTCT AALGGTCGGCG AAAATGCTAA GCTAAGAAAC GAGAAAAAGC GACAT'rCAAA AAGTAACAGA GAGAAAGAAC TTTTGGAAGT AGTTTTATTC GAAAGAAGGA AlrGAAGCT TAGCTCATTA G'rCAATTTTG GGATCATCAA TATGAAAAAA TTGCAGAGCG 'TTT'TGAAAGA ATT1GCTCATC ATGAAAGAAG GAGAGTGAAA CAGGTGAAGC CCTTGCCGGT TCGCAAAAGA GATTCAAGAA GAGGAAATCT CTGGCGTGGA ATTACACAGG AATGCTTGGG AAGTTGGGGT TGATACGCGT ATGTCCGTGG ACGTGCCCTT
GTACAAACAG
CGTCACCTTG
GAATTGGTTC ACCTTACTTC TCGACAGATA AAGCAGATGC CATCCTCATG GCTAAAAATG AGAAAGATAA GACAGCTGTT AAGTTTGAAG GTCTTCGTAT CATGGACTCA ACAGCTTCAA TTGTATTCAA CATGAACCAA CCAGGCAACA GAACAACAGT TTCAAATAAT ATCGAAGAAA TATTGAAAAA GCTAAAGAGA GAATGACCCA TGGTATCCGT GC'rGGTCGTG CCAATGCAAG
CTATTGCCAT
AAAAAGGCCG
CAACAGCGGC
GTGTCGATGG
AATTGACCCA
CCCTCTCAAT*
TCAAACGTGT
AGGAATAAGA
GTCTCACCAA
01'TGCTTGAC
TTCAATTACG
GAAAGACATC
TTCTGTGAT'r
AGAAGTGAAG
TATGGACGAA
TCTTGAAAAA
TGCTAACAAA
TTGCTGGCTG
AGGCTATGTT GAGTCGGCGG CAAGGAAGAA AGCGAGGCTA CATTACCCAT GCCGACAGCA GGAGTTGGAA GGCGAGCGTA TGCCCTTGCC GACTTAGAGG ATACTATAAA AATCTTAGAA ATGGCGAATC CCAAGTATAA GAACGTGGCG TAGGGATTGA GTTCATAGCT TAGGTATCGA GAACCTGCAG CAGAAGCAGG ACTGTTATGA ATGCTCTTGT 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 1'5600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 TGGAGTCGAA ACTCCTCTTA GTTGGTAACA CCATT'rGACA TGATATTGGT ATCACACCGG TACAGAAGAA ACTCGTCGTG AGTGGCTGTC CGCAATATCC AAAAGAAATC ACTGAAGACG CGATGCTGTT AAACACATCG CTAAAAATAA ACAGAAAAAC AATATGAATA CAAATCTTGC
ACCAAATCGC
AGTCTTCATT
CTAATGACGG
ACCT'rGCTAA
GTCGCGATGC
AATTGAAGAC
ACGACATGAC
TCAGTTGGCA
AAGTITTATC GT'rGGACTGA 283 TCATCGATGA AAACGACCGT 'rTTTACTTTG TGCAAAAGGA TGGTCAAACC TATGCTCTTG CTAAGGAAGA AGGCCAACAT ACAGTAGGGG ATACGGTCAA AGGrrGCA TACACGGATA 16380 16440 TGAAGCAAAA ACTCCGCCTG ACAACCTT1AG AAGTGACTGC GGGGACGTGT CACAGAGGTT CGTAAGGACT TGGG'rGTCTT ACAAGGAAAT CGTTGTGTCA CTCGATATTC TCCCTGAGC2T AGGGCGACCA ACTCTACATC CGTCTTGAAG TGGATAAGAA TGGCTTATCA AGAAGACTTC CAACGTCTTG CTCGTCCTGC AAAACTGGCC AGCCATTG?'r TACCGTCTCA AGCTGTCAGG AAAATAATAT GCTTGGTTTT A'rTCATCCTA GCGAGCGTTA AAGTATTAGA TGCGCGCGTT ATTGGTTTCC GTGAAGTGGA 'rCAAACCACG CTCCTTTGAA ATGTTGGAAA ACGATGCTCA AAAGCAATGG CGGTTTCATG ACCTTAAATG ACAAGTCATC CCTTTGGCAT TTCTAAAGGT CAGTTCAAGA AAGCTTTAGG AAA'rCAAGCA GGACCAGTTT GGGACAGAGT TGATTTAGGG TTACAC'rTGG CTCATGACCG AGCGCAATCC TAAAAGTAAC AGACCTCGCT 'rTTGAAGAGT CAG.CCTTTCC .AAAACACACA CACTCAGGAC CAA'rTrGGTT TGTGGATACA GGCCTTCCTG CAAGGAACTC TGGCCTAAGA AGACCGTATC TGGGGCCTCT
CTACAACAAC
AACTTTTGTT
CGCAGAGCCA
ATGCAGAACC
TACCTACCAG
CGTTTGGGGC
CCGCACTCTG AACCTCTCCC GATGATTTT'G ACTTATTTGG TCCAGACGAC ATCAAGGCAA TGGTCTTATG AAGGCTGGTA AGGCTTATGA GAAAATCATT AGTCCCAAAG CAAT'rTTGGC GATGATTTTG ATGAGGTCAG GGAGATTTTG ACAGCAT'rTG GGGCTAGTAA TTTCTCCATC TCTTTATTTG AAGGAACATT TCTTTTTGGT TCCAATGAAC TCATGCTCGT ACGGAGATTG TCAAGTTATT CAGGCTTITGA TGTAGTCACT GCGATTAGCA 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 -172&0.
17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 TCGCT'r'r?1G
GCAGGAATAT
CCTCTGCTAT
GAGGAGCATG CCAGTT'rCTC CTAGAACACT1 AGCATTTATT AATAAAAAGA AATAAAAGGA CAATAGACAT TCAACTGAGT CATCCAGATG GCCATCTTCG TTTGATGGAA GAAGAGCTTG TCCAGGTTT GGGAGAAGAG TCTGCCTGTG TGGTCTTGGT AAATCGTGGG ATGACCGTTG TGGTCAAAAA TGATGAAATT GACAAGTTTG ATAATACTGG GAAACCTATC CGTGTCAAAA 'rCA.AACAGCA TGATGTGACC TTTGGAATTG CAGTGACCTT GGCAGTGACT GCCCTTAAAC GTCCAGCGGT GGAAGCGGGA GAGAGTCTTG TGGATCCTTA CCTTCGTCCT GTTTACGATG
TTTAACCTA
CATTGCGTTT
TTAGAGAGGT
ACCTGTTT CA
ATGTTGTGAT
AGGAAGCCCG
GTACGCCAGA
TCGCCCTTTA CGAAGAAGAA ATTATCAAGG CCCTAGGGCA AAAGCTI'TAT GTGGACAGTG GGCCAGCAGG TACAGGGAAG ACCT'rCCTTG GTGGGCA.AGT CAAGCGAATT ATCCTAACTC GATTTCTTCC GGGTGATCTT AAGGAGAAGG CCTTGTATCA AATTCTTGGG AAAGACCAAA 284 CGACTCGTCT CATGGAGCGT GAAATTATCG AAATTGCGCC CCTTGCCTAT ATGCGTGGCC GGACCT'rGGA TGATGCCTTT GTCATTCTCG ATGAGGCGCA AAACACGACC ATCATGCAGA TGAAGATGT'r CTTGACGCGT TTAGG7r'NC ATTCTAAGAT GATTGTCAAT CGAGATATTA GTCAGATTGA CCTGCCACGT AATGTCAAGT CCGGTTTGAT TGATGCTCAA GAGAAACTCA AGAACATCCA TCAGATTGAC TTTGTTCATT mTCAGCCAA GGATGTGGTT CGCCATCCTG TTGTCGCTCA GATTATCCGA GCCTATGAAT ATTCTACTGA AGTTGCACAC GACTGATTTT GAGGAAGTTC GCCTGCAAAA GAATAGACTT GTTCGGTAAC TGTAAAAAGT GTTATACTAT TTTTATGGAA ACAGTATACG ACAAAGCACA AAAACTTAAC TCAAAAAACT TCAAACTATT GATTGGTGTC AAAAAGGAAA CCTTTCAACT CATGCTAGAA CACCTGAATT CAGCCTATCA GATTCAGCAC CGAAAAGGTG GACGTCCACG TAGTCTGCCC ATGGAAGACC AGCTCATTAT GACCCTCCGT TACTTGCGAT ATTATCCCAC TCAGCGTCTG CTGGCCTTTG ATTTTGGCGT CGGTGTAGCT ACGGTAAATG CCATCATCAC TTGGGTG.GAG GATACACTTC GTGCGTCAGG TAGCTTTGAT TTGGACCATT TAGAAGCCCC GAGTGCTGCT GTGGCTATTG ACGTGACCGA AAGTCCGA'rT CAGCGTCCAA ACAAAACCAA AGCAAAAATT~ ATTCTGGTAA AAAGAAACGA 0 0 0 000 0 0. *0 0 0 a. 0 S CACACCTTAA AAACTCAAAT TATGCTGGAT T'rTTCTGACG GACATACGCA TGATTTTACT CCTGAAACGA CGCr'rGCCTT TGTTGACCTA AATACTTTCA TCCTGCTAA AAATTCCAAA T'rAAATAAAG AGATGTCAGC GATACGAATT ACCTTCCAALA TCATGTCAG'r CCCTTATCGT GAATTAATTr GTGCCATCAT CAATTATGAA CTTTTGAGAG AGGAAAATCC AGTTGTATAG TTGACGACAC ATAAAGTCTG TCAAATGGCC CTCTTCAAAG AAAGTATTGG ACAAAGTTTG GGTTATTTAG GCATCTTGAA ATTTCATGAG AATCGCCGCC TGAGTGAGGA TGATAAGCAG GAAATTGAAC ATTTTAACGC TAAATTCAAG 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860
AACCGCAGAA
GTGAACTAGA
GCTAAAGGTT
AACGTTTCGA GTTACGGGCG TTCCGAACAA GTCTAATATA TTATCCAPLAG G'CTCGAGACA ACGATCAGTG CATCTTCCTG 0* 00
ACGATTAGGC ACCATGGAAA GAACTTTTAT GTGGCTGATG TGTCATAATC ACAGGGCACA AGAAAGTAGG AAT'rTGAAAA AGTATTACAG TTGTAGGATA CTAACTGAAA AGGATATTCC AAAGTAATCC TCTGTATTTT CAGCATTGTC CACCAGAGCC AGGACATGCT TTGTCTACCT GAAGGTAAAG CTAAGCCTGA GGAATGGATC TGACCTTGTG GCTGTTrATGG AmTTGTCTA CTGTTTTAT TGGTTTGTTT ATGGT'rGATC AAGCCTATCA
GATGATTGAC
AAGTATTTTA
AAATTTTGCA
CAACTATCTA
TCTTTATATG
ACTGTAAAAG
TAAGTTT'r GT'rCGATT'TT TGCATATCCT GATGAGGAGA GAGAAAAGGG ATTGGTAGTC ATATTG'rGAC AGAAGCACTA GCTTATT'rTG CTAAGAACTT TCGAAAGGCA CGTTTGGCTT 285 ATGT'rAAGGG AAATCCGCAA TCTCAGCAT'r TTTGGGAAAA GATGCGAGGT TAAGCAAGAA CTCTATACGG T'rGTrATCGC AGAAATGGCA TCAAGTAAGA ACTATTTGGA AI'TGTTTTG TGATGTGACT TACCGTTCCA TGATGGCGGGA GTATATTCT'r GCACGGCTTT AAATCAATTG TGAACAGAGC CTAGAAGATT
GAACAATTAT
TAC'N'CCGCG
CAGGATTAGA
GCAAGATTAT
TGGCGGCAT'r TATGACGATC GCTTr'rTAG'r TAAACCCGTG CAAGCAGTCT 'rAGATAAGAT TGACCAATCT TCT'rTTGAGT TTCCATACAA AGGTGCCAAA GAAATGATTT GAGTGGAAGA ACTTGATAAT AAGATGT'rTC TATAAGACCT AATTTTAGCT ATGTATAACC AACTGCCAAC GCCCAAACCT AAAAAGAAAA AGCAAGGGTG AACGAAGTAA AAAAGAAGTC TGCTAAGGCC C1'GTCTTTGC ACGGGTAAAA TTTTATATAT AAAAAGAAGC TGGGACTAAA GAGCTCAGCT TCCTTTGGTT TATATAATTG TCATTACAAG ACGAAGTGGT TGGGCGAAAC TCTGTTGACT T1TATTCAATT TAGAGTTTCT TATGCACAAT TGAGTCTGGA ACGAAAGTCT CCAGTTGCAA AGTATACAGT ACAATAAACC AACGATGTAA TAGCTGATGA ACTTGCGAAG TCACCCTTTT CTrTTTCAAAA TTTATACTAA CACGATTAAG TCCTTGAGCA ACTGGTAGGT TAGTCAAGTA CTTGACGCAA GCCTTCATCT TCAGAGATTG CTrGTGCGAA CGATATAAGG AAGAG-TGACA TTGGTTAGGG CGATGGTIIGA GGATATTGGC AACGGCATAG TGGAGAACAC CGTGTTTTTC TTGTCACACG GTCAGCTGTT TCGATAACGC CACCTTGGTC AGAGCCTGGA CGCATTTGTT TGACCATCTC ATCTGTCACC AGGGATGAGA ATGGCTCCAA TCACCACATC AGCATCTCTC TGAATTAGAC ATAAGAGTTT GAATTTGACT TCCAAAGACT CTTGGAACTA ATATCTAAAA TAGTCACTTG AGCACCAAGA ATGTGTACCG ACGACACCAC CACCGATGAT AGTTACTTTT ACCACCAAGT1 AGAACACCAG AGCCACCAGC TTGCTTAGTA AACAGCCATA CGACCTGCAA CCTCACTCAT AGGAACGAGG GTCACGAACA GTTTCAGTTG TTTTTGCTGT TAACATAGCA GGCCATGTGC AAGTAGGTGA AGAGAAGAAG ATCGTCGCGC CACAA.AGCAC AGTGGGTAGG ATCATTGATA TCAGTGTAGT ACCTTGATAA GTAGTCACAC TICCTTTGCCA GCCAAAGCTT AGTGCGAGCA ACCGCACCAG.
ATAGACGGGT TCATCGTGCG AACAGCAACG TCAACGATAC AATTCCGGTG CTTTTGCACC ACACT'rGCTT CAATGTTGAA TCTTCTAGAA CTGAGAGACG CCAAGGGCGA TGCGGGCArC CCTTTTGGAA CACCTGGTAC AGGAAGTGAG CTCCGATTTG AGCGGTAGTT GTCCTTGATT TCTGCTAATT CTGGAGCAGC AAGTAACCGT ATTCAGAACT 19920 19980 20040 20100 20160 20220 20280 2040 20400 20460 20520 20580 20640 20700 2 0760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 21420 21480 21540 21600 TAAAGATTCT TTTACTTTCA CAACCAACTC TGCTGCCCAA GCTTCACCAG CAGTAGCGAC AATCTCAGCT CCTTGCTTTT GATAGTCAGC ATCAGTAAAG CCAGAACCGA GACCAGCATT 286 TGTTTCGATA AGGACACGAT GACCACGACS' AACTAAGCTA GGCGACACGG T'rTTCGTTAT TTTTAATTTC TTTGGGATT CTACCTTTCA ATTGACGGTC TTGTTTrGGT TGTCACATTC TGACGGTTTC ATTGTATATG AAACCGCTITC AAAAATCAAG TTTTTATGCT AGACTAGTGA AAATCAAGCT CTAATGG.AGG ?TTTGAAATT 'rGCCCAG'rAT CCGTCTATAG AAACGGAGCG CTTTGGATGA TGCGGAAcAA TGTTTGACTA TGCCTCGGAC TTT'ICCAACC AATCAAAGCI' TGGAAGAAAC CAAGAATAAC TAATCCCTTG GGACGT'rGGG GAATAGAACT AAAAAGCAAT TGAACACCTG CAGGTrGTGAG 21660 CCGATTAACA TTGAGATAAC CAGTTCATAA ATCAAAAATG AAAAACTTGT CATCCAAATr GAAAAGTATG GAATCAATAT TTTATTGCTC AGACCTGTAA AAGGGTAATA CACGTTACAC ATTGCTCAGT TCTACTTGGC GGTCAGTTTA TTGGAACCAT TGACTTGCAC AAGATTGATT AAAGTATTGG AATCAAGGAT TGAGAAkGATA GGGATGAATA CTGT'rCTTAA GAAGGCAGCT ATTGGCTACA TTATCAATAA TAACGACAGA AGCCAATCGT GCTGTGATTG AGCTAGrT
AAAGGTCATG
CCAGCATGAA
TT'TGCCAAAT
AATAATCTAA
TCATCGTCAT
CTCCACAAAC
GAGAAATCAG
AAAGGCCGAA
AAATAAGCAG
GAGAGGAGAA
C1'GTACTGGT
ACC'TGTCAAA
AGTTGACTGC
GCATGCGTTT
TCGTGACAAG
TTGAAAAGAA
AATATGGAAG
CTGGGCTTGC
GAGACGAATT
CCTTCACCAT AAGGCTAATC TTCCCATGCA GAACCATATG AGTTCATTAT GTCTTGACCA ATTTTTCGAC TGTTTTTTCT CAATTATCGA GAAAATCAAA
TT'GTAGGAGG'ATTTTTCCTG
TGCAGGCTGA AGTTGCAGCT AAGAAAAGGA AGAACCCCTT
CCGCGTCAGG
CTITGTATGGA
AGGAAGACTA
TCCTCTTACG
GAGTATAAAA
iZTAAA:ACcAG
GTTTCCAAGG
GAACAAGATC
TTGCCTGTAG
21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860 22920 22980 23040 23100 23160 23220 23280 23340 23400 ACTCATCGAC CGAAAAGGAA GTGAAGAAGG TAATCACAGT AGATGTCAAA GGTGCTGTCA AATCGCCAGG GATTTATGAC *o o* G'rAGTCGAGT CAATGATGCI' GTTCAGAAGG CTGGTGGCTT GACAGAGCAA GCAGACAGCA AGTCGCTCAA TCTAGCTCAG AALAGTTAGTG ATGAGGCTCT GGTTTACGTT CCTACTAAGG GAGAAGAAGC AGTTAGTCAA CAGACTGGTT CGGGGACAGC TTCTTCAACA AGCAAGGAAA AGAAGGTCA.A TCTCAACAAG GCCAGTCTGG AAGAACTCAA GCAGGTCAAG GGACTGGGAG GAAAACGAGC TCAGGACATI' ATTGACCATC GTGAGGCAAA TGGCAAGTTC AAGTCAGTAG -ACGAGCTCAA GAAGGTCTCT GGCATTGGTG GCAAAACAAT AGAAAAGCTT AAAGACTATG TTACAGTGGA TTAAGAATTr' CTCTATTCCC CTAATTTACC TGAGTTT'rCT ATTACTTTGG CTTTATTACG CTAT'TTCTC AGCATCTTAT CTTGCTTTGT TGGGCTTTGT TT'ITCTGCTA GTCTGTCTCT TTATCCAATT TCCGTGGAAA 'rCTGCTGGTA AAGTTCTAAT AATT'rGCGGA ATCTTTGGAT 'N'TGGTTTGT TTTTCAAAAT TGGCAACAGA GTCAAGCGAG TCAAAATCTG GCGGATTCTG TTGAAAGGGT ACGGATI'TTG CCTGATACTA TTAAGGTTAA TGGTGATACT CTATCCI'TC GTGGCAAGTC GAGGAGGAGA AAGAAGCCTT AAGCTTTCGG AGCCAGAAGG AAGACTCAGG GAATTTACCA GGCAGTTGGG ATATAGGAGA AAGACGCACT TTCCAGACCC GACACCGACT TTGAGGAGAT GCCCTATCTG GCATGCAGGT TAACGGTCGT GCITCCAAG TCAAGCT~rA AC'rGACCTGC GCAGAGAAAT TTTG4GTGGC1' GACTCTCAAT ATCAAAACAA AAACTTGTCC AGTTTACG'rC TATGGGCAAT TACATGACAG GAATGAGCTT TATTCCAGTC AGGTTTTTTC ATGAATGGAT TCTATTATAA ACTCCAGTCC ATGAGATAGG ACTAGAAGGG TTAATACCA AGCCTATCTG TCCAGTCACT TCAAAAGAT'r GAAAGGCTGT GGTTTGGATT GACTCTTGCT GGGACATCTG TAGGAATTAT CCACCTCT TTAAGAAACT TCTCTTGCGA CCTTTTCCCT TATCTATGCG TGCAAAAGCT ACTGGCTCAA TTGTCCTCTT TATTGTCATG TTGGGCTTGA CCCAAGAAAA GTTGAAATGG CTGACT'rATC GGACTAACTG GATTTTCAGC ATCGGTTATT CGCAGTCTCT 0 CATGGGGTTA AGGGCTTGGA CCAAACTTTT TCTT'GACAGC ATGACCAGCA AAGAAGGGGA TTGGGCATAT TGCCCATTCT TTGACCTTTG TLCTTT~CCT'r TTTGTCCTTT CCTTTCTCTA GGCATTATTC GCTTGGTCTC GCATGGCTTT TAATCTTATT -ATTAAAGGAT TAACAGTATT CCACTGGAAA ATGAAATCAC ATGTAACTGG GAAAACCATT TCAAAAAATG GCAAGAAAAG TCAAAAGTCG AGGAGTAGCT ATGTTGGAGA 
TTTGTCAGAG AAGACAGTCT GAAACAGAAG GTAGTATGAT AGTAGGGGAG CAAGGAAAAT GGGAGATGGA ATAAGCAA'rT TCTCTTCACG TAATTTTGCC TTGACGGTGC AGGAGGAGTC TT-GTCCTGCG GGGGCTCAAG GCTGTTACTA ATCCTTCTAT TTTGCGGAAT TCTTTT'rGAC TTGGTCTTCT TCCAGTCATT CAGCTGAACT GCAGGTGGCA AGGAGACCAC GTTAATTTCC T'rGGCTTTGG GAGTTTATTG ATTACAGGTC CATGCTGGAT GTGGGGCAAG CTCATAGATG TAGGTGGTAA ATGACGACCA GCAATGCCCA AAGATTGACC AGCTAATTTT ATGACCAAGG CTTTCCATGT GAATTTGTGG CAGAACTACA AACTTGCCCA TTTTTGGA.AG GGACACGATG ATACCC.TAGT GGAAATTTGG AGGAGAAAGG
CTTATGCTTT
GTGAAAGTCT
TTCAACCTTG
TACCGCTCTT
TTATCTTTGA
TTGTCTTTGG
TCTATGATTT
TCTTTTTCCT
GAGAAAGTAT
TATCCTGACC
AGTCATCTCC
GTCTATCCTT
GTCTATCTTA
ATGGTTAGAG
TCAACCCAAC
GAGGAAAAAC
TACCAAGTAT
TTTCTACGGG
23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 243*60 24420 24480 24540 24600 24660 24720 24780 24840 24900 24960 25020 25080 25140 GGCAGAATCT TATAAGAAAA GCGAACCTTG ATTCCCTATC GACTAACACG GACAAGGAGC AGGGGAGATT CTAGTATCAA GGCGACTCAA ACAAAGGTGC TCAGTTAGAA GTCTATCTC TCTGTATGGG AAATTCTTGG AGAGAAGGAC TTGCTGAAGC 288 ACTATCCAGA CT'rGAAAGTA AATGTTTTGA AAGCTAGCCA CAAGTCCAGC CTTTCrAGAA AAACTCAAAC CAGAGCTTAC GCAATCGAAT GAAACTCCCC CATCAGGAAA CATTGACACG AAGTTTA'rCG AACTGACCAG CAAGGAGC'TA TACGTT'rTAA TCGAAAGTGT TCGATAGGAA GGATAAATGT TTGCATAATA ATGATAAAAA TGGTATAATG CA'rTAAAAAT CAGCAAAAGT TG TTTTATTA TCCAGTGTAT CTGCTGTGAC AGTCACTAAA GAATATAGTA CCAACTATCA CGACCATCAG GACAGCTATT CTGAGTATAA AGTTGGCGGA AACTATTACA GCGGAGGCTA TTAATTCT'rA TGCAGCTTAC TC-ATGTGACC TTAAAAACGC
TGTAGATTAG
AAAACGTATT
GTTAG'TTTAT
AGTTACAAGT
TATGCTTGGA
GGCTGGAACT
AAGAGTGAGA
GACAAGTCAT
TTGCTATCAA
TAAGTAGTGG
ACA'rCGCAA'r AAAAAATCAT TCTTATCTCA GTTGGAAAGA ACTGGAAGGT ATCAATAGCA GGGGTTGGAT AGTTGGAAAA TGAAATAAAC TAAAAATTTG CAATATTGAG GATATAAAAT AATCTATTGG TCTTCTTCAG ATGATTGGAA TACGGT'rTGG TTCCGTCATG GTCTCGTTAT
ACGCTCGTTA
AAAAGGAGGG
CTTGCAAGAT
TGGCTCTGGA
AAATATCGCA
'rGAGGTCATA CTAGA'rATGT
GTGGATTTCA
AAGACGACCC
GCCCCTCCT
a CCTTTAAAAA GGGTAGGGTT 'rGTTCCGTGC CATTAGCAAT CTTTATTr'rA TTATGAGAGT 'rrCGTCTTAT CAAAAACATC GGGAAATGTC TGACTATATC GCTTGGTGAT TGCCATGTAT CAAATGGCTT AGATGAGTAT GACAAGAACA GCTGGTI'CTT ATAGAGTAGT AACCATTCAT GTCTATTTTT ATTGAAAAAA TTGTATCTGT ACTCGGTGTT TGGAGAGCAG GTTGGAAAGT AGAAACTCTC CCAAATGTCT TAGACGTGCA AAAAAATCTT GGCGCTGGAA AGAAGCCTAC TATCAAATGA CCCGACTGCT
TATGGTCTTC
TTAATTCCCA
ATTGAATGGC
TGGAAGTCAG
AGTCTTCCCA
TTCCTCAGTC
TATCGACAGA
TTAAGTTCCC
25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 26640 26700 26760 26820 TGGATGGAAA CTTAAGTGGG ATGGACTACC GTCTGAAC'1r GAGGGATGAA ATCGCCTATT TTCGCAAGTA TTCCTTAGGC ATGA:AGCAAC AGGCCAAATG CTGGCTCATG GATGAGATTA AGTTT'TTTGA TAGGCTAGCA CAAATCGATA ACTATAAGGA AGAGTTGGTT GATGTCTGCG CAGGCAGA TAGAAGAGGT T'rAGTTTATG AAAGATGTTA GTTTTCAAAA GCCGCTTAAA CTGGATTGTC TTAGCTTTAT ACCTTTTATT TAAATAGTCA GACTGCAAAC TCACACAGCT CGCATTGCAG CCAACGAGAG GGCTATCAAT GAAAATGAAG GATACCAGCT CGGAGGAATA CCAGTTTGCT AAAAATAATT TTGACGCGAA AGACAGAAAT TCTGACTTTA T1TAAAAGAAG TATTTGCAGT GGCAAGATGA AGAGAAGAAT TATGAATTTG AGCCCTGGCT TAAAAATGGG GGTTGACCGC GAACGGAAGA TTTACCAAGC CCTGTATCCC TTGAACATAA AAGCACATAC TTTGGAGTTT CCGACCCACG GGAT'rGATCA GATTGTCTGG ATTTTAGAGG TTATCATCCC AAGTTTGT'rT G;TGGTTGCTA 26880 26940 TTT'IAT GCTAACACAA ACTTATATCC TGTTTCAAAA ATGTAACTGT GCTGTTTATC GTGTTGG ACAGTTAGAT CTATTGGGAA AATACAAGA'r TCGTCATTGT GGAAGTTGTG TCT'rTCTI-TC ACTCA'rTGGG TTCAAAGGAT TGCACATCTG GAAGATTACC TAAGCAGAT'r TT~CCTTGCCT GATTATCTrT CACAGAAAAA AGAATTTr'rT AAACTAACAT AGAGAGGGAA AATACAAACA CAAACTTTTC GTATCATATA AAAGTTGAGA CTA'N'TGCAG AAAGATATCA AAATCATCTG GACACAGCTC GTGACATTTG CAATATCCTC TCT TGGAGTT GGAGTGGGAT GGAATCTGTG GCTTTTCTr TCTAGTGGGA AGTCTGATAA TATCCCTACC CAAT'rTATAG CTrAGTGAAT CAAGAAGTAA GTATrATTrC CTGGCTTGCT CTAGCTTTC TTAGCC?1-rA TACTrGATTG CTTACrT'r ATTGTTGGCT TATTGTTTGG ATTCCCTTTA CTTACTTGCG GATAATGTCG ATCTAAATTG TTGCTATTGG GAATTCTATT AATAGATTCT AGCTTTCCTA TCAACTTGAT TCTCTCTTT'r CAAGCAAAAA ATGCCTGTCC TATCCAAACC ATTrCAGCCTC TTCAGTGGAG ATTTTATCTG GAGCATGGGA ATGGTCTTAC TATTGAAAGA TGGGGAAGTr TAGGTAGGGA AAATAAGTAA TGAT'rCGAAA ACCAAACCAA 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28171 AAAAAATAAC TTTTTATCTT GACAAGAGCT AGAAAACTTG AAAGCAGAAG TGAGAGCTTC TCGCCT'rGTG ACATTAAGTT GCCTGGCCCT ACGGATGAAA AGr'rTCGAAG AAACGCTATC ATAACGTGCG GCTcTGTATA 
TTTACAAGTC CGCTATTGTT TTTCTCTAAT AAAACAAAAG AGGTGAAAAC CATAGCAAAG CAAGACTTAT TCATCAATCA GGAGAACAGC TAGGTATCAA GTTGACCTAG TATTGATTCA GGTAAGTTCA AATTTGAGTA TGAGATTCGT GTACGTGAAG T TCGCTTGAT TGGTCTTGAA GCCACTCAGT GAAGCCCAAG CTTTGGCTGA TAACGCTAAT ACCCCAAGCC AAACCGCCTG TTGCAAAAAT TATGGACTAC CCAGA-AGAAG CAAAAAGAAC AACGTAAAAA ACAAAGCGTT GTTACTGTGA AAGAAGTTCG TCTAAGTCCG G
INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 7147 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CCGCTCAACT TTTGCAATCA AGGCTAAGTA GACAGCAGCA AATTTCATAT TGTATAATTT CTGACTCATA CTTCTCTCTT TCTATGTGTA CTAGTATAAA TAAGAAAAAG AAGGCCGTCA ACCCTTCTTT TGATTTAT'rC TTCTGCT'rCA TCN'CTGTAA ATTGACTATT GTACAAG'rCA GCGTAGAAGC CACCTTGCGC CA'rCAGTTCC TCATAGrGC CTTGCTCGAT GATATTTCCA TCTISTCATGA CCAAGATCAA GTCTGCATr'r CGGATGGTTG ACAAGCGGTG GGCAATGACA AAGGATGTGC G1'CCTTCCAT CAAACGGTCC ATGGCTTI"IT GGATCAATTC CTC'rOTCCGT GTGTCAACAG AAGAAGTCGC CTCATCCAAA ATCAAAAGCG GTGCA'rCCTT AAGAAGGGCA CGAGCAATAG TCAATAGT'rG T'1-rTGTCTT ACAGACAAGG TCACGGTGTC ATCCAAGA'rG GTATCATAGC CATCTGGCAA GGTCATAATA AAGTGGTGAA TTCCCACAGC CTTACTAGCT TCCATCATTC GTTCATCACT AATCCCTATT TGATTATAGA TGAGATTGTC TCGAATAGTT CCTTCAAAGA GCCAGGTATC CTGCAAGACC ATTGAAAAGG CATCATGCAC TTCTGAACC GTCATAGCCT TGGTATCCAC ACCATCAATG CGAATACT'rC CCTTA'rCAAT CTCATAGAAT TTCATCAAAA GATTGACAAT GGTTGTCTTA CCAGCCCCAG TCGGCCCAAC AATGGCAACC TTTTGACCAG CATGAGCTGT CGCAGAGAAG TCATAGTC'rT GAACATTGAC ACCGTCCACC AGAATTTCTC CTGCTGACAC GTCGTAGAAA CGTGGAATCA GATTGACCAG AGTTGATTTA CCAGAACCTG TTGACCCAAT AAAGGCCACT GTTTGACCAG TTTCTGCTTT AAAGCTAACA TGTTCAATAA CTGCCTCCGA ATTTGCCGCA TAGCGgAAGG TCACATCCTT AAACTCGACC TGACCTTT~GA AG'TTCATC AGTCAGCTGC ACTTGAACAG GGTTTTGGAT AGAAGAA'rcC 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 AAATCTAAAA CTTGATTAAT AGTGCTCCCA 'rGAGAAGGAA ATGTCACTAA AGAGAGGCAG ATCCAGTAAA TCGCCACACT GCCATAAGAC GGTTGACAAA TT'rTCATTTT GATAATCCTC TCACGAGTGA TACTGTTCAG GCTAGCGTCA TCAAAACGGT CAGAGCCAGT A'?TCTGAATG CCACGCGTTA CCACTTGCAA GTAGTACGCG TCAAGAGGCT AAAACTCGGT TAAAAATATC GCAAAAAATC CAACTGCAAC CTTGCCGACT GCCACAACTC CCGCT'rAGCA GAGACCATAG TTCGGGGAAG ).ACGATGAAG GCCCATGACA ACCTACATGG CATA.AGACAT GAAAACAATC ACGCGCTATC GGAGCAGCGT CGTTAATCAC ATAGGCCCCA CAAACCACT GAAATCCCCA TCATGATAGG ATTCAAAATA
CAAATTCAAA CGGGTCAATT TGCATTGTAG GCACGAACGA TTTATCTGTC AGCCCCtGAA CGTCATCAGG ACGTTGATAA ACCTAAAATC TTCCCAATAG GCCCATAGTA ATCAACATTT AGGAA'N'GAA AATTTCTTAA ACTTCTCAGC CTACTAGTAT TACGGACAAG AAGGCAAGAA ATCTAAATTA GTTTCTTGAC CATCATTTAC TGCTGCAAAT CACGAATACC TGTTAAACTC TCAAGGACTG T'rTTGGAAAG
TCACTGCCAC
CCCAGATAGC
GAACTTGAGT
AAGTACGGCC
CATAATTGAA
AATGTCA'N'G
TCTCTGTCTG CGAGTAATCC AAGAAGCCGC CACTCGGGAT AGGACATTCC CATCATCATG TACCTAGCAA ATCCGTAATT TTCGAGATAT AGGTCGGCAC TTCCAACTCT AGATAGACCG AAAAGCAAG'r AAAGAGAATG GCTAGTAAAA TCATCCCCCA TTCTTTTCTA CTAATTCI!T TGGCTAA'N'T C71'TATTCTC TCCTCCTATT CCCTTGATAT TTTGCCTGTA GTTGACCGAG AACCTTCTCA AAAATC-ACTA AI'CATCTTC ATCAATGTCT TCCATCAACT GCTTGTCTAT GCGTTCAAAA AAAGCCT'rAA CCTGTTGCATI CTGAGAACGT GCTT'GTCCG TCAGACGAAC AAACTTAGCC CGCT'rATCAA CACGACTCGC CTCCAATTCC ACCAAACCAT TTTGCACTATr ACGCTTAACC AGATTACTAG CAACAGGCTT GGTAA'rATTG AGTTCCTGCT CGATATCTTT AATCAAGACC AAGTCTTGGT TTTTCTCGCG ATTATCCAAA AAACGCACAA CCTGACCTTG CGGCCCACCC ATAAATTCAA TGCCGCAACG TTTGGCTTCC ?TrGCACCA TCAGGTGAAT TTGATGACCA AAACGCTTAA AGACTAACAT CGGTTTATCC ATAATCTCCC CCTTCTAAAT AAAAATAGTT CTCTCGAGAA TAATTAAATT TCTATGAGAA CTATTTTCTT GATTAAAAAA ATCCCA-AGTG ATTTTCTCAC TTAGGATCAT GTTCTATAGG TrAAATrAAA ACCCATCTAC GTTCGTATAA ATCTTTTGGA CGTCTTCGTC GTCTTCAAGA ACGCTGTAA.A GT'rTTTCAAA GGT'rTCAAGG TCTTCGCCTG *S as S S
ACAATTCCAC TTCTGACTCA GGAATCATTT CCAATTCAGT CAGACTCACG GAGGGCAACG TTGTACCTtC*TTGTGCTTC'r AGACTGCGTC CGCArCTTCA AAACAGAACC TGAAGCGCCC CTGCTGTACG GTTGACGTTA CAAAACCTTC GTAACGTCCT PCGCTTTATC GATAATGTGT TCAAAGCTGA GTTTGATTCT CACCAAATTT TGCATATACT TATTGGCCCA TTTACGTCCC ATAGCCTTGT GAAGGTCAGT ACGTCATCCIA CATCCACATC CCTCCAAATA CAATAACACC CACTTGGAAT TCTTCAATAC TGGCGCTGTG TA.AACTGTGA CGCTTCGAGC AATTGC'rCAA TTTGTTGTCA AAGAGGTAAG 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660
ATGTTTCCGC
GAAGTCAAAG
TCTGTAAAGG
TTTGGCACTT
GGATCTGGAT
T'rAGAGTTAG
ATTAGGAATC
CGT'rTTTACC AAAGGCTGCA CGGACATTGG TATCCACAAT TAGCATAGAG CCATTTGGCC TTTCGTCTGT GTTTCCTTTG GC'rTTATCAA GCGCTTGTTT AGCACGGTCG ATAACGAATT CACCTTTTTT AGCTGCTACA TAGATTTCTA CTCCATC'rTT AGCCGTTTTC TTGGCTACGA TCCTTTTTTC ACATTTTAAT CTTTCTTATT ATAACACAAG TTTTTTTGAT TTTCACTAGA GGATAGCACT TTACCTGCTA AGATGGTCTT CCACATTCAA AAAACAAACT AGACCATTAT CAAAGTCAGC TCAAATTACT GTTTGAAGTT TGCACCTTTT GTTGACAGTC TACTCCAGAC
GGAAATGGAT
GCCTTTCTAT
TTTATTAGCA AATCAAGCTA CTTTATCAAC AGGCACTCAT CTGCAAATAG AA.AGTTTCAG CCAAGTTTGA TGTAGATATA AGCGACAAAA ACAATCATAC ATATCATAGT TCAAGTAAAT ACTTTGAAAT TCAACAGTTC TTATAGGCGC ACCTCTAATA CTCAATAAAA 292 TATTGTAITC TAAGAAA'rCA ATCAAALGAGC AAACTACAAA GATGGGGCTG ACATGG?1'TG GGAGAAGTTA GACTACTACA ATTTAATTTC ATTTTTTCCA
AACACTGTTT
AATT'rACGTG
CTAGCAATTG
ACAGCAAAAT
TGAGGTTGCG
TTCCCAAGAT
ATTTGTTfCAT
ATTTCCGATA
ATAGAAGAGT
GCTAGCCTCA
AAGAGATTTT~
CTGGCACTTC
TAAATGGGTA
AATCATCTAA
TTTTACT'TT
TGGAAATCGC
TTrCTAAGC.AA
GGTTGCTCAA
CGAAGAGTAT
TAAAACATTG
TTAGATATAA
AACAAGTAAA
TTACACA'N'C
TAGGCAAGAG
CGTGTCGTTC
ATCAAGGAAT
ACATCACGAG
GGATAATCAA TCCCCTGTAT TGTTTGATAG ATTCATTTTA ATAAACTTTC AGATATCCGC TCCTAGTCAT TTTCTACCTT ATCGTCCGCT TACGCACTAG CAGGCAAGCC CGGTACACTC.
GGAATACTAG TGGTAAAGTG GTCAAGACT'r CCTTACCTTG GCAACATAGG CACCACTATG GTCTCAACAT AGTCTGTACT
TTGAATT-TCC
TGGCTACCCT
CATACTCCAA
AGAGAGATCA TCGCCTCT' TTGTCGCAAG CATTCTCCTC ATCTTCTACC TGAGGATAGA GAGTTGTTCC CCAAATAGAA TGGCAAATCG GTTTTTCAT AAACCGTACG CCACCAT'rCC TCTAATTTTG ACAGAGAGAT TACGAACATT CCCTTTAPA AGCCGTTAAA TCCTGCCCAT TTCTGTCCCA AGCCTTAGGA ATGAT1CATAG GATAATTCAT TCCAAGTAAT ATAATATTGG ATCCAGCAGT AAATCTCCGT TTCTGTAAGC TGTAACCTTA ATTTGALAAG GTCGCALACTA CA'rTGTCACG TAAAAAAGAA
*5 5 0 59 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 46 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 Gr~rGTATAGG AAATCGGCAA GCCTGGATGA TCTGCTGTAA 'AGCGACTGCC TTCTTGAATC 0S 55 S A 0@0O 0 5@5* AAGTCCTCTA CCATATCCAC CTrGCCTGTT ACA.ACTCGGG CCTAAAATAA CCGCCTTCAC TTCTGTATTG TCCAAAATCT GCTACCTTGA CTCCTTTTAT CAAAGCTTCA AAAGCAGCCT GTGGTTTCCA ACTTGAGATA GACTTGGCGC CCATAAGCAA GGACGCTCTG CAGAAATTCC TCTCTGTTT'r AALATCCTCTA ACATCTCCTG GATTAAC AGCATCTACG CTGACTGTAT ACAATCTGAA TCTGCN'TTC GCCTGAATGG ACAGAGTrAA CCTGTCTTTT CAAAGTCAGA ACCAAACTTG ACCTTGAGTT AITTTTrTCAT ACTGCATTCT AGCTGGGACA TTATTGACCT TTAGCCAACA AATCGT1TTAC CGCTCCGCGA ACACTTGAAT AGAAAGCTA'r CGCTACTTGC CAAACCAGGC AAATCAATAC TCGACCGCAA GAAGAGTGGG ATTA1'TCTCT AACAAGGTCT CCAGGATAGA GGCGACTGTC GTTGGTAGCT GI'ACAGAAA
CACTCGAAAT
CCGTTACAGT
AATAAATCTG
AATCAATATC
GTTCCATGCT
GACCATAA'rC
TGCTGGGGTC
TATAAG'rCAT
CATCCACTAC
TATCACTTGT
ATAGACCAAA
A'rCTTGAAAC
CTTAAAATTA
AAGAGAATTC
GTGAGCCGTG
TTGATGCCAC
TTCCACTTGG
CGGAGCACGA
GAGAAGTGCT
A'N'TGTCGAC
CACCCGAACT TGGGTCGCCC GTTTCCACTC TGTCTGACGA CTACTTCATC ACTCTTACTC AAGCTCCGCT TCTTTCTTTC GATAACAACA AACTCATCGG GTAGCTGATT ACCCTCTrrG ATGAAACGAT ?I~rCAATACT TTCTCCCTGA TGGGrCAAGA GTTTICTI"N'T ATCGTAATTC ATAGCTAGTA TAAAGTCATT TGGATTGAGT TGCATAAACT ATCCAA.AATT TCCTCGTACA AGTCGGAATG ACCGTATTAT TGTAAAAGGA TTACGAGCTA GGTCAAGAGA GCTGGTrrTGA GTTGTCAATA CAGATATACA AGGCAAATAA AGAGCTACAA CTAATGTAAC ATAGACAGGA GAGTTTGGCG CGCTGTCAAT CAATAACCTT GACAATATAA GTCTGTCCTG AGAACTCAAG CAAAAACAGG AGCTAACTCC CTTTGCGAGC TGATACTTCT
TACTGCTTTA
CAGACTTGTT
ATTCTTCTGA
CCAACATAAA
GATCCGGCTC
AGGTCTGATT
TATGATTCTT
CATCTCCTCT
TTGACCAAGT
TTAGTAGCAT
TCCTTCTCCA
AGAGAAGGAT
TGCCAATCTG
TGGTTAGCCT
TTTGCCATCT TCTACCTCCT AATAAGTTCC
CAGCGAAATC
GACATTGCGT
TACCTTATCT
TTCTATCATA
TTTAACCAAC
CACAGCCAAA
CTTAATCAAG
CT-rCTGATTG
CTTGTCTC'TT
AAGCTCACTG
AGCCGTGGTT GGACTAAGTA CGCCGTCTGG CTAAATAAGA AAGTCAATCA AGGTTGGTCT
AAGTTCTTGA
TC'rTGTTDT
TCGCTACTAA
CCAAACGTCT
TAGTCATGCT
TAGTCGGAAA
CAAGAGCACC CCCTTTTCTC AC'TCAGAAT'r TCCAAAGTTT CAATACAAAA TGCTTGTCGC GTAAATCCAC ATCAGATGTT TTTCAAGCCT CCCGATAGTC ATTGGCTAGT TGTCAAGGCT
TGATAAAAAT
AGAAAAAGGG
5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7147 TGAGAGTTAC TTTCCCCTCC AAGTTTTTTA GAAATCGGGA AACTCCAGAA AGCAAATTTT TCTCTAACTG CGAGAAATA.A AAACCTTTCG TTCCCAGACA TAAGTCTTTC ATGTCGCTTT CTCTAGCAAA TAAGAGCTCA AACATTTGAT AGTAAAAGAA AAATATCTGG CACTGGGTCG CGCTCATCTT TTCCTTATCG GCTTCTTTT 'rTAACCAGAG CAAGGGCGAC AGGTAGCTGG ATTGAGACAT TTCCTCTACC TCCTACTCTT TTTTAACTGG AGCATCTGCA CTAGCTGCCA CTTCTTTTGA CTGGATACTT TCCCACTGGT TAATCTCCTC TGAGATAAGA CCTTCGCATG TCTTGACAAA TAGGGCAAAA GCCI-rGGTC!T TTCCTGCATA TTTCTCCGTT TGGCATTGAT AGAGGAATTT TTCTTTCTCC AGGAGTTGCG.
CAGTTTTTTG GTAAGAAATC CAATTTTCCT TTGCATTATA CAAATTGATA ATCCCCTCAC ACAGCAAGCC GAGACTGGAT AAGGCAACCG AAATCAAACG GTAGCGATCA CCTGGCATAG GAATAGCACA AAAGACAGCT ATGAGGAAAC CTGCCACGAT TTCTGTTATT TTTAATACCT TATAGCGCCT ACGATGTTGA ACGCTTTTCT TTAAAAAATG AGCTATCTGT ACGTCTAATC GCTCTGCAG (7PACA'ITTCT TCTGGCGTCA TATTCGTAAC TCC'ITTCATT TACTTTGATA
ATCAGGG
INFORMATION FOR SEQ ID NO: 24:
SEQUENCE CHARACTERISTICS:
LENGTH: 755 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCGCATGGGA TTGGTGTCCT TTTGGGCAAT CTCTTTGACC GCGCCTGCCT TTACTGCCCT CGCGCTTTGG AGCCATTACC AACACGGTGC TGGTTCCTTC CTCATTTAGG AAAATACAAG TTAGTACAAC AGGACCAATC TTCTGGCAAG AGGAAAATCC GAACTGTCCT TCTATTTATC GACAAGCCTT GAGTAAAAAA CTCTTTTTTA TTTATGGCTC AGATAATCAA AATGATAATA ATCCATAAGC GATGGCTACG TTtcCCAACA AGGGaAtCAA TGTCGGCGGT ACGTCTATAT ACTATCGGCC TTGTCATTGC CTTCCTGGAA TTATCTGTGG GACAAAACAA AGAACTTCCT TTGCTTATGT GGATTGCGCC CAAGAATATA TCGACCGTAT GCAAGTATTG TCATCGGAGC AAACTGGAAA CATG'rTTTAT GATCCTAGTC GCAAAAGTTC CCTCTTTTTC TTGGGAACTA CCTCCTAGCA GATGGAGTAG TTCTTTCATT ATTTTCGCCT CAAAGCCTAT ATGGCTACTC CATGGTCGCT CCAAACCCTG CCTAGTGGGT GCCTTGATTG AGTTAAAAAG AGCCACGCGG TCCCAAGAAT TGGATTGCAA ATCGTGATTG TAGCGGTTA:A AACCGCACCG GCCATAGCTG
TTTGCCCAGA
AATTTCTTAG
AAATCTGATC
TCAAGAAATC
ATGGTTGCCA AGATGGTCAC TTACCGATAC C-ACCAGCTCC GGTcACAGTC GTCAC
INFORMATION FOR SEQ ID NO: 25:
SEQUENCE CHARACTERISTICS:
LENGTH: 3010 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
TTCAATTGGT ATCTCAATCA ACGGTCTTCA CATGGTTTCA ACTGGTTTGA AGCGAAAGCT GCTGGTTACA ACGCAACTGA AACAGGCTTT AACGATCTTC ATTCATGAAA CATGACAACC ATGAAGTAGC AATTAAGATT GTCTTTIGACA TGAAATTCTT GGTGCCCAAA TGGT'PTCACA TGATATTGCA ATTAGCATGG GTTCTCACTT GCTATCCAAG AGCATGTGAC AATTGATAAA TTGGCATTGA
CTCTTGAAAA
AAAAACCAGA
AAGATAGCCG
GAATCCACAT
CAGACCTCTT
CTTCTTGCCA CAC1-rCAACA AACCATIACAA AAATTAAAAA 'rGAATGAGCT ATCTGGCCCTr TGTCCCCATA CAATTATAGT
AAAGGTAGCT
TAAATAAAAA
AACAGGCTCG
ACCAATACAA
CTTGGCACAG
TATCGCAGAA
CGGCTGATAT TCGTGCAGCT AA6ATCCAAGA AGCGGTTAG'I AAGCTCAGAT TTT'AGAGGC'r CAGCTGATGA CCGTTTCCAT CTAAGGATTT GGGTGAAGCC AAGGAGAACC AGGGACAGGG AGGAAATTCG CCGCATTCAA TGCAAGTCCC TGTAGAATTG ATTTCGCTGC TGGAGGTGTT
TTTTTATCT
ATGATGAGGA
ATG.CTCAACG
GCTC.CTGGTrG
GGAGGAGTTI'
ATCCAGTAA
ATTGAA6ATTG
GTGGACAAGA
TTGCGTCGTA
GA'rATCGTCC A6ACTTACGTG GTCCA6ATATG
GCAACGCCAG
CTACATCACA
AAGTTAAGGT
TG1'GCTTCAT
TAAALACAAAT
GTGGTGTTAT
CGGCAGCTfGT
CCCGCATGAG
TGCCTAAGGT
ATTATATCGA
AAGAATTCCA
TCGCTGAAGG
AAGCTGTTCG
CAGATAGmT
TCTGTTCTGA
GACTGAAAAT
TATGGATGTG
GATGGCCTTG
CGACCCAAAG
CAGAATCGGG
CGAGAGTGAA
AGTTCCT'IrT
TGCTTCCATG
TC:ATATGCGT
ATGGCTGCCC TTACGGCTGA TTAGCTAATT 420 CT'rAAAATGA 480 CGTTATGAAC 540 CAGAATCCTG 600 GAACGAATTC 660 ATGATT'AAGG 720 CATTTTGTTG 780 GTTCTATCTC 840 GTCTGTGGTG 900 ATTCGTACCA 960 ATGATGAATC 1020 GCCAAGGATT 1080 CCAGTTGTAA 1140 CAATTAGGGG 1200 AAACGAGCGA 1260 CAAATCTCTG 1320 CTCATGGCTG 1380 GCAGAACATG 1440 AGGACGAGCT TTATGTTGCT 'rTCATGAACA TGGAAAATTG CAGATGCTGC GT'rAATGATG CAGAGGGGGT cT~rGTCGGT 'rCAGGTATTr TCAAGTCAGG AGATCCTGTT GTGCCATTGT TAAGGCTGTG ACTAACTTCC GTAATCCTCA AATCC'rAGCT AAGATTTAGG AGAAGCCATG GTTGGTA'rrA ATGAAAATGA AATCCAAATT AACGAGGAAA ATAGATGAAA ATCGGAATAT TGGCCTTGCA AGGGGCCTTT CAAAAGTGCT AGATCAA'rTA GGTGTCGAGA GTGTAGAACr CAGAAATCTA GATGATT~TC AGCAAGATCA GAGTGACTTG TCGGGTTTGA TTTTGCCTGG TGGTGAGTCT ACAACCATGG GCAAGCTCTT ACGTGACCAG AACATGCTAC TTCCCATCCG AGAAGCCATT CTATCTGGCT TACCAG'rGTT TGGGACCTGT GCGGGCI'AA T'ITr'rCTGGC TAAGGAAATC ACTTCTCAGA AACAGAGTCA TCTAGGAACT ATGGA'rATGG TGGTCGAGCG TAATGCTTAT GGGCGCCAAT TAGGAAGTTI' CTACACGGAA GCAGAATGTA AGGGAGTTGG CAAGATTCCA ATGACCTTA TCCGTGGTCC GATTATCAGT AGTGTGGTG AGGGTGTAGA ALAT'rTrAGCA ACAGTGAACA ATCAAATTGT TGCAGCCCAA GAAAAAAATA TGTTGG'rAAG TTrCTrTTCAT CCAGAATT'GA CTIGATGATGT GCGCTTGCAC CAGTACTTTA TCAATATGTG TAAAGAAAAA AGTI'GAGATT GAATTCTCA ACTNTTAC ATGTAATAAA CAATAGCGAT GTATTGAAGT GCGGACCAG 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 296 CTAGGATAAA GAGATGCCAA ATCATGTGGA AATAAGG'N-r TTTCTTGGCA TAAAATCCAG CTCCAACTGT ATAACAGAGT CCGCCAGTTA CCATGAGACT GACTGATAAT GGCAGGAATG ATAGCCAGAA CCAACCAGCC GGCTAAATTT CTCATTrGACC TT'N'TAGCAA AGATTTTATA TTCCCCATTG GATGACAATA ATCAGATAGC CGGGCGTGTA TGAGCCGGCA ATGGCAACGT CATA'rTTGTG GGTCGAACCA. TAGGCCATAG GAAAGAGACT GATGACGAAA ATGGAAACGC AACTATAGAT GGATGAAATA GGCAGCAAGA TCACGCTATT AGCAATCTCC TCTCCAAAAC 'rCATTGGATT ACCTCCTCTT GAGTATGATC AACGGTTTGG CAGCTGGTTT GGATAATAGG CAAACCAGTr
AAATCATAGA
AGTGATAAAT
CGATAGAGGA
TAAGCATGAT
TGAGTrGTTT GA'rTAAGTCT
GTTAGCTGGG
CCAGAAAACG GGTGTCGTT'r CATAATCAGG TAAAGAGCAA GAGAATACCA AAGATGGTCG ATTCATCAAG GTCAAGACAA ATGGTCA.ATG ATTCGCAAAA GGTGGATGAT AGGAACATGA TA.AAAATC:CG TGTCCT'rCAT GACTGCACCC ACAGCATGGG GCTGAGTTTA AGACTAGTGT AGAGTTTGAT GATAGAGTTT 'rCAATTCCTT GGTTCATGTA GTTTGTAGAG TATTAAGTGT GTGATAGCAA TCAAACGGGC AGGTAACCAT TTTTCACATA GGAGAGATAG GGGCGCAGAC GTCCACAAAA GCATCGTAGA GTTGGTCTGA ACTTGCTTGA CTGGGCTATT TCTrGAATAG AAAATACAGA CTTGAGGGTT AATCTGTTGG CGTTGGTATT TTTTTTTGTC AGGCTTTGTC ATTGTTGACC ATAGATGCTG TTAGGCCCTT GTCTTTATTA.
CTGATTGACA
INFORMATION FOR SEQ ID NO: 26: SEQUENCE CHARACTERISTICS: LENGTH: 15213 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3010 120 180 240 300 360 420 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: CATAAATCGG TGCAAATAAC TTAATAGTGA AGTAGCCATT TCTTTCGTAT TTACCTGAGG CATATTCCCT AGACGAAAGA ATATTATTAT CAATCAAATC ATTGAATGAA CGTAGTCTTT CAACTTCTrrC TACTGTTAGA TTTCTGACAA CATTTGTTGC ATAGACCTTA TTTCCATCAG GATCAGGATG GTACTCATTT GTAACTTTTC TAAGAAGTTG TTGTTTTTGA TTCGTATCCA ATTTAAGAAT TGAATTTCCT TCGAGATATT CCAACATATA AACAACGTCA AACATGTTGT GGACAT ATTG CTTCAAATCA TCTGCATTAT TAAATCTTGT AGTTGGATCA AGTACTTGTA ATCGTCGACT TTCTGTACTA TCAGATTTTG AATGTTTCAA GATGGAGTTG ATGGTAATGG 297 TCGCATCATC TGGATGGTCT GGTGC1'TGTA A'rAATCC?1'r AGCAAAGAAC TCTGGTrCCCA AGCCACT'rC' TCGACCATAT CCTCCAAGAT AAATGTCCTG ATCTGAGTCA TGTGTCATC'r CATGCGTATA AGTAATAGCT CCATCCTIAT CATCACCTGT ACCATAAGCA CCGTGTTGAT AGAAATGTTG CATTGCAGGA TTGGATTAT CGGTATTATC ATCGCCAAAT T'rATAAGCAT CACGTGCATT GTCGTCTAAA ATACGATACC CTGT1'TCACG CGCA'N'TTCT 1'CAACAAAAT TACTGCGGTA GCGATCATAA GCTCCAAATC CGGATCTCTC TGGCAAGGTC AGGAGAGGCA TCGTGATACG ATCATAAACA CCGATAGAAT TCACCTCTTC GATAGTGGAT TTTTCTTCGA CCAACATTCG ATAACCCATA TAATAA.ACTG TATGCCCAAC TTTATTITCCA ACAGGTCCAA CAAAATCTGC CACTTCTGTA GCTTTCCCTA CGTAAAGCAA AATATTTCTA TAAAGT1,TTT A.ATAATCGTA GTGATCTCGC TGACGTTTGG CATCAGAGC CTTGCCCGCT TTA'rGc'CAC CTAGACTAGA CATGGTCGAG ATGACAAATA AGACCATATT GCGGTATTTC CATGTGGCAC ACTTGGTGCC AGCTAACCCT TGCTTCGTT'r CAATGTAAGC CTTAGTCTCT GATTTAAACC AGTCATTATT GCTTGTATTT GGTAAAAAGA CTTTTICGGTA ATGTTCCAGC GTGCTAAACA AATCTGTCGT TCCATGTTGA CTGGCAAGAC TGATACCATA AGTATCGACA TTATTCTTAG CTAGAAGA'rT GTTAAAGCCA GATTTACCCA ACTCAATCAG TCCCCTTACC AAAGAAGTCC AAATGGTACA -GAACTAGGTC.
AGCTAAAGTT ATACCACCGT TCCAGATAGG GTT'rGAT'TT ATCTACAAGA TAACCTTCAG CCGCTGACAA GACTTTTTrTC AAACTGTCTT CTTCTAGATA GAGCI'CAGTT TGCTTGACGT CTTCTGAATG ATAGTCAACC TTTTGTAAGT GGTCATACAG GAATTGGTTT GGCGTATAGA CTAATTTGGC GAAATCATTC TGGTATTTGA AGTGAAGCAA CAGTTTGTTT GCAGTCTGTT1 TGTCCTTCAT CATGACTGCT GACAAGAGTT CCAGGTTTCC GTATTTGACG ATGGT'rGCCT 'TTTTATAAGT CAAGT'rGCGC TTAGCTTGAT
TCAAGCCAAG
TGACGGGGTT
CCAGTTGTTG
TTGGAGAAAT
CAGGTAAGAC
GAAGTCCAGT
GATCCAGCTT
AGTATCTAAT GGTGAAGCAT T'rTGACATTC.ACCTGACCAT TAGCAAGGCT TCCT'rGT'GC AGCACTAGCC AGTCCAGCAT TTTTGTTTTG GCGAACTGGT ACCCAGCGTC TTTCTGATGG TTGCTTGATG ATAGAGGTTT AT'rGCCCAGA CTATATTCTG CTCAGATAAA TCATCCTTGT 1200 1260 1320 1-380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 TGTTAGAAAC AATGTCTGTG ATGACTTGGT C'TTTTTGATA TAAAAGACTG 'rrCTCATTGA TGTTGTAGAA AGGTAGCAAT TTTTCAATGT AATAGGCCAC CTTAGAAAAA TCACI'GTCTT TTI'TGCCACT TGTTGAAAGT1 GGCTCCACTG TTGGTAAALAT GAGAGGATTG ATTTCTGCTT TTTGCTTGC AATTTGAGAA GCATCTAGCA TTGTTCCTCr TTCTTCAAAG GATTECCTTGC TGACGACCTC ATCCTTGACC AAGGTGACAT TGTCCTTTAC CTTCAT?1'CG TrATAGTGGT TAACATCGCT GAGAACATTG GTCAAACTTC CCCACAAATT GCC'rGCCACT CCAGCGACTC CTTCAGCATA GCTATCTTGG ATCTGTGCAT CAGTCTGATC TGAAGTATTT GTGTTAGATG GTAAAGCCTT GTCACCTGTC AAA'rGACCGA TTTCATAAGT GTTGATAATT CTTCCCTTGA CCTTAGCCAG CAAACCACCG ATACCACGTT 298 TGTAGACTC'? GTTGGCCTTG CTGCTGAATG AACCAGTGAT GGCATTTrCCG TTCGTTACAT CAGCATGCCT AACATCACCA GAAGTTCGAT TACCAAAGTG C7TGACATTG ?I'GATATCAC CTCGGTCTAC 'rAGGCCTGCA AGTCCACCCA AAATGGCTAC TGTCGCTTTT GACTTAGTAA CCATACCACC GATATTGTAG GCAGCAGTCG AACTGCTCTC TGTGATGC?1' GATTGCTCAG CACCAGCCAG AACACCA'rCG ACGTGAACTT TTGCCAGTGA ACCGATATCA TCTTTCCCTG CTACTGTAGC ACCACTCAAG TTTrCAAACA TCTT'GCCATC TTTT'rCACCG ATTAAACGAC C-ATCAGGACC AAGCTCCACT TCGTTAGCAT
GCTTAATTTT
AAATAGCAAC
GAGGT~t'rTTT
CAGTAAAGGT
TGTGTTATTC TCACTTCAT ATTTTrrAGA C'rCAGTr'rTT CAAATTATAG ATAGCATAAT GTCCTTGATA TACGATCTTT a..
a. .a a a TCAGGCTGGC CGCTAAATGA TAGGTTCCAG. AGGGATTTTG TACTAAAGGA AGTAAAGTTT GTTGTTrCTT CTGTTCCCTT AATTATCTTT ATATCTGCTT TCTATCTCCT GCTGAAGCTT TATAAAGGAT TTTATCATTI 'T CTTrCCT CTGATATTGA CTTTGAATGA AGAAGATTTC ACTTTAACAA AGTAGCTATT CTAACGAAAT GTGTTGTTTA TAAGTACCAT TTGACAAACT CAT'IrCTTAA TNCAAGTGTT TTCTCTGGTT CTTCTACCTT TTTCTTGTTT AATTCTCG TTrCCATTTG AATTGGATGT CCTCAGTTGA AT'rTCCGTTT GATGGTTCTG G'1TCTGTTTG GTTTATAGCT TTGACCAGAT CTTAGCTAGA TAGAAGGTAA CTCTACTTTT GCTGTGATTT TGCTACTrGGT- AGGTATACAT TGGAT'rGCTT GGAACTTGCT GTATAACTCT AGGTCGGAAA TT'rATCAGGG TCTAGTTCAT GTTTGATTCG GTTGAA-ACAT TCCATTCTCT GATGTTGTAT 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960
TACCTGAATT
CTGGTGGTTTr
TCCCAGAGTT
AGGTACCTTC
AATTTTCATT
AATCACTAGG
TCTCTGAATT
AGGTTCCTTC
TTCTGGTTTT
TGAATCACTA
TGTTTGTGTT
TACTGTGCCT
T1TTAGAGTCA
ATTACTGGAC
CGTTGTTGAT
TGTAGTACCT
GTTGCAGTTC CG~TrTTC GGTTTATTGG ATACTTCTCC TCT'rCTGCAG GMGAACTGG
TGGTTGATT
AGTATT'rTCG TTT'rTCTGTT
GATTCTTCAA
T'rAGC'TATTT
TCTTGATTTG
TCATTTGGAT TTACTGGAAC TTCTTCTACA GTTTTTTCTG TTATGTTCTG GTTTATT'rGA IrCTCCAACT GAGG'N'GTCG ACTTCCCCAG 'rATTrTTGCT AGATGTATCT GGTGATACTT TCTTC'rGCAG G?1TGAACTGG ArTTTCTGCT TCTTGAATTG TCATTTGGAT T'rACTGGTGT TTCTTCTGTT GGTTTTACTG GAACTTCTTC AGT=TTTCT TT'ACTTGCTC AATATTACCC TATCACI'AC CACAGTATCT CAACTGCTTC GGGTAATGTA CGG.GCAACTC AGGCrGAATT CTACACCAGT CTCAGGTTGT T'rCTTGGACT AGGCGCZAG'IC TTACAGAGAA TATTCTGACG 299 GGACCTT'GTT CTrTGGTCT CTCAACCGGA GTr'rCAGGTT TTATATTCTG GAAGCGGTGC TACCTGCTCT GGTTC-ACCT GGCGACTCTG GTTGAACCTC AGTCTCACCT TTGTCG.GTCA GGTTGAACTT CTGGTTCGCC GCGGGTCAA CAATAGCTCC TCCTTTATAA CTTGAG'I'NT GTTGAAG TTO AAACAATTC ATTTCAACTI' 'CTTACCTAA TTTGTCACT'r ACTACAGCTT
AGACTGTACG
TrTACTACCT
TCGCCAAACT
TCCTI'ATGTT
TTT'rCGAC'rA TCT'rCCTTGT 0 *t 0 000 0 a. ta
TTACAGT'rCC TTCAGCTAAA TCAGGATTTT TCTCCATAGT TT~CCTCACGA TATAAGAGTT CATCCTGTGG ATTTAATGTA TTTACCCCAG TCTTCGTTTC TAGATTCTTA TGTTCGGCTA CTTCTTTTCT TGGATTGATT AATTCAGTAG TCGGCTTAGT TGAAGAAACA GGTGTTTGTT CTACAAAATT CGGTGTAACA T'TATAATCCA AACTCTTTTG A'rrACTTACT TCAGACTC?.G TATAAGTGTA ACCTGAAATC TCTTTAGGAA AGTCCGTATT GTAATTTAGC AAAAGATGAT CTTGAAT?1'C
CAGGTTTGTT
TCTTTTCTT
ATTGTTCTTG
AGAAAGGTTT TTCAACTACT TCAACT'rCTG CCTGAATAGC TTGTACTGTT GATGGATGGT CCTTTTGTTG TTT'rGTAGGA GTGGCAACTG AAGTCGTTTT TCCCTCTTTG ATATATrccAA GAGGTAA'T TTCTCCAGAG OTCAATTCAT TTTCTAALAGC ATGGACTGAA ACTAAGACAC GTAATACCGT TTTATTCTTA ACCTTTTT'CT CAGCTAAGCT AGCACCAGCA ACTAGGGCTT TTTACCTTCT TGTTTTACTC T'rCTTGAAAA 'rCTATTTTTG CAATTGACCT GATAAAACT'r TGGAGAAATC TTCTCC'rCTT AGAATCTGAA GATTGTTTCT 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 CATTTCCTAT CCCTGCAACC TGGAAACAGC AAAAATTAAA GCCTCITCATT CTTGCTTCCA TATAAACAAG ATAATAAGTT GC'TGCTTCTT TTCTTCTGAT CAGTTTCTTrG AGCATCCACA TCGTTGCAGA TACAAGTCCT CAAAATACTT TTCCATTATTr ATTATATTAG TGTATTATCT
AATACTAAAT
GTATTTGGCA ATTCCGCCAG TCATCATCAT TCTCCACGTA GATAGCTCTG AATCTGCCAC GATGAACTAG CTAATACAGA ACTGATAATT TTCTAAATGA CCTCCTTGAA ATAAAATTTA ATTATCTATA GAAAAGGCAG TTGATTTTGA GAATTTAACT TGTCGGAATA TCATAGACAA ATAT'TTATAG TGAACTCCCG CATAAAAAAT AAACTTGAAA AAAACGCTCT TGTTTTTCAC TATATG'rTAC AAAGACCTTT TATACCTTA.A TTATACTCTT CTTTGTTTTA TATTATTTGG TTACrCATAT CACTAATCGT 9.a@ a. to 0 0 AATTTACAAA AAAGTCTTAA AATTGAGATG CGCTTTCATA AGGTACAATA ACACCTACCA TGAAATTTAC ACGGTAGGTG 300 TCTAAAAATG GTTTGAGGCA GTTGAGGAGA ATTCCTTCTA TCCAGCTTCC TTGTGCTGAT GAGCGATGGT CTTCCTGCAG GCTTrMI'T AGAAAATCTC GGACTTGTTC TGGTGCGATT 'ICAAATTCAA AGGCTTTCAT TTTATAGAAA AAGTCGATGA GATGATCTGA CAGGTATTCA GTTGAAAAGG GTACTTCACC ACTTTTTCTA TA'rrCTAATA AGAGTCTAG.A AAATCGAGCT TTTTCTTCAG GAAGCTCACG AAAATAGGAA TTGAGGATCC AA1~rGCT'r CTGTTrCTT TCAATTGGAT CCTGACTGGC AATTCGTTGG GCCTTGATAG CTCGTTCTGC TCTATTTTTA TCTTGAGTCA GG'1'CTCTGT AXAGCCAAAG ATGGCTAGAA CCAGATTGTC TGCATATTGC TI'CTCTGCCT GGATACGGAG TTCTTG'rTCG AGGTAGAGT'r GGTCATCCGA TTTCCCAAGT TCTTCCATCC GAGCCTTTTT CTTTGGTTCC TCTAGGAAAA GCTGGTAGTC TCTCTCAGGC ACACCTITTT CCCAGAGCCA T'TTrAGAAGT
TCTTTTTCCA
CCAAAAAGAA
TAATCTTGAT
TTGGCGAT'TT
TAGTCAATTT
AAAAAGGGTT
GCCTTGGTCC
GCTCTTTTG GTATTGTT-TG T'TTTTTCCCA CTTGCG'1'CT AACCACGCTC TGCGGGTCCC TATCCCTCTT CTTGCGTTCT TCTCCTTGCC TAGCTTGACA TGATACACTT TTCAAGGACT AACTTCCTCC CTGAAAGAC'r GCAAA'TGAT TGCCACCATTr GGGTTTGAAA CGCTCGTCAA AGT'rACTITr ATTGACCTTG ATTrTTTTCCT TTT'rCTGAGC TCCTCTTCCA ATTGCTGGTC ATTGGAAAGA GGCGTTGGCC TTTCCACAGA CACAATTGCTr TCCTrGGTAG CTTGTTCCCA GATTTTCTAA CTGGAAGTGT CAAACTAGAA AGCTAGCCGC GACGAAGTCA GCtCAAAACA ?rITTCTGGTT AGATTTTCAA CCTTTCTGAG CAGTTTTTCT AAGGGACAAT CGATGAAAAT GACGAACACA GTCGCTACCA TGTGACACCG TTAAAGAGTT CATAAGCtGTA TTTATGGCA 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 ACGGCCGATA CCGTTAAAAA TAAAGGAAAC AGTATCCGCr TT1CCAAGCCT GTAAAACTGC CATGAGGTCT CCTTTCTAAT ACTCAATAAA AATCAGCTCA AAACACTGTT TTGAGGTTGT CTGTTTTGAG GT'rGTGGATA GAACTGACGA CATATATACA GCAAGGCGAA GCTGACGTGG TACTTTTACA ACTTGAACCT CGTCTTTACC AATCGAATAG GCTCGTGATA AAGCCTCI"TC TACGAGTTGA CTTGGTTCAT CATAGCGGTC TTCGTAAAGC AGATAGCCAT CAAGGATACC GTCTCGTTTG AGCATGGAGA AAGGTTGCTC TTTGAAGAGA TTTTCAAAGA GAGTAAAATC AAGTATTTTT GTATAGAGCT AACTGACCAC TGTCTTGTAG TCGAACAGAA ACGGACAACA AATCTTCCP 'rTCATTCCAT
ATCGTGCAGG
AA'rCAAAGAG
AGATAGAACT
AGTCAgTAAC
GTATAAGTTA
CAATATTTTC
GATAGCGGTC
CAATTTTGTT
GACTCTTTTG
TAGCAAGAAT TGGTCGGTAT GCGATAAAGA TTCCTGACCG AG'rACTG1'GT CAAAGAAAGC AAGAATTTTA TCAAGATTGA TCTTGTCTCT GACAGCTTGG CTAGTTTGAA CTTGTTTGAG TGTTTCTGTT AGGCTAGCAA GGGTTAGTTG
CTGGCTGACG
TACCT'I-r'CT
ACCTTGACCA
TTGACTTGGG
'rTCCACCTCC
T'TGGGAGCTA
'rCAATTCTC'r
TTGCGTTGAAA
GCAATCTCGA
GTTTGAACAC
TTCAGCATTT
'rCTTGGAGAG GCATGAGITC GTGAGTAGCA AATCTGGCAA A'rCGAAGCTr.
CACCr'rCCAT ATCCATAACT CTACCAATCT CAGCTCCAGT ATr-rrCTTGC CTACTGACTG GGTTCG;TAGA ATTTCTTGAT TAGGAAGTTC AATAGCTrGCG3 CGGTGAAGAG
CCAGAGCTTC
GACTCTTGGT
'1-TGATGGTA TTCCA6ACTCT
TCTGACTGAC
CCGATAGCTT
CTGATCTTCG CCAATAAAAC GATAACTAAA GTTGAGCTTG TCCTTAGTA.A GATAGCCCAA AGCCAATCTT GGAAA'rTCCG TGCTTGCAGT CTAGTATTGC CCCAT'rTTTG GCTGCTGGGT. ATTCCTTGGA TTCCAGCTTT TCACGAGAAC AAGATAGAGC TTTT'rCTCAG CCCGCGTCAT AGCAACATAC AGCAAACGCA ATAGCTTGCT AGCTGTAATT CC'rCTTCGTT CTGCCTATAG GTCAGACTAG TTTGATGGTT TTAGGATAG'r GGTCTTCTAC TGCCCCTGTC TCCATCTTGG GACACCAAGA CCATTCTCAC GACTGAGAAT GACTTCTGAC ATAGAGTCT ATCTTGATCC ATATTGAGGA TAAAGACGTA AGGAAACTCC AGCCCTTTAC GGTCATGAGC TCTACTGCAT CTrTTTGGCGG TGCGACGGCC ACGCTTGCCA TATTATAAAC 7740 GATTGTCTGC 7800 CTCTGGTCAA 7860 ACACTTTACT 7920 TATTTAGTrr 7980 CCTTrGCCGAC 8040 TCTGCTCAGA 8100 GAATGGAGAG 8160 CAA-ATArrT 8220 GCTTGTTGAA 8280 TCrrGTGGAT 8340 AA'rCGTGCTG 8400 AAT'rGCTCTT 8460 CAGGACCATT 8520 AAATCAAGTC 85e0 TGAATTGCTT 8640 GGCTTCTAAG A=rGGTCAA TCATACGAAT AAAACGCGAC T'rCAA.ATTGA TCAGCACGCA GTGCTAGGGC ATAGAGATTG CGGCAAAGCC CCAACATAGT CATAATAAAA ACGGTCGTTG ATAGAGAGAG TGGGTTTTGG CATACAAGCG CCAAGAAGCT TAGTTTTTCA GCTAGAGCTG TGTGAATCAA GCCTTTTTGA AAAccrTTGA
GCCTGCCTAG
TAAATCTTCC
AGGATATCCA
CTACTTrGCCA TTTTTTGTGC 0 0 ATTGACCAGT TTCTCATAGA*GATTTTCGTG ACGTGCTAGC TCATCC'rCAT CAAAACCAAA CTAGTCTTGC*AGGGGATTGT GAATGACACG GGATTGGAGA TAATTGTTTT GCTCTCCGTC GGCGAGGAGA ATCTGGTCAT TACGACTGCG Tm:r 1 ATCC! 'TCTrCT GAAGGGACAA: CATTGGAGAC TTCATA-AGGG CAACCAAGGC AAGAGTGTCT AGCATGACTT GCACT'rCTAG AGNTTTGACA GGAATTCCGT ACTCAGACAG GCTGGAGGTC AGAAGGGCA.A TTTCCTTAAA AATCTCCTTG ATAACTAAGC GCATTTCGCC TTCCTCACCT GTATCGTCCT TGTCGTAGAG AGTCAGTTTG GTATTGGCAA AAACAAGCTG CTCTTGGTCC ATGAGACGTT CAAAGACATC 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 GGCAACACCT TTTTCTTGAT TGTTAGTTTC GTTTCTGTTT GAGAAATGCT GCCTTGTTGT GTGCTTGTTA TCATAGTTGA
GAAGTTTCAG
GACTCTCTTC
CTGGATTGGG
TTTCGCCGAC
A1'TGGTTGCI' GACAGCAC3-r GCCTTCTTGG GGATTTTGCG CTGACGGAAA CGATAGATGG AGACAACAAT TCCAGCATCC GACTTCATGG AAGCGCTCCT AATGGTGTAA TGGC'TGATAT ATAAGCCTCT ACAAAATCGC TCCATGATAA CGTTCTTGAT AGCAAACTGG GTCTTTCTCT CGGCTGGCAT TAGTCAGAC AGCACTrcCCT GATAAGCCTG ATTAACTGAA CAT'N'TCTAA TGGTAACGGA AAAAGCTTTC TCTTTTTCAC TGGTAAAATC TTTTGAGGAT TACTGGTGGA CCCCGTTCGT CCTTGCCACG TGT'rTACCTT GGTAATGCGC TGCTCGCTTT GGTTGTAA TTGCCAAGGA ATTTTTGTGT GCCAACTGGC GACCCAAGTG CTGATTTTTT TCTCTAAACG ATAAAGAGTT GAGAAATTTC 302 CTGAACTACT ACGAAAT TCCTTGAGGA TAATGAGCCT CATAGCGTTG GAA'rTTCTCA 'rTGAAAATCT GCGGG'rCTGC ATTGCDGAT ATCTCCCACC ATAAAGCGAT TGTGGCCATT GTTCTTGAAT ATGGTTGGTA 'rCCTGATACT CATCG.ACCAT GATAAGAC'rC ACGAACTTGT GCCAAATTCT CTAAAATCTC CAGCGAATTC GAAGGCAT TCCTGTCG?1' T'rCTCTGACG TCATGAAAGA TTGGAAGGTT TTAGCTAGTT TCCAAGTGTC AGTCGAGAAT CGCTATCTG.G TCTGATAATT GTCCTAGTTT CTTCGT'rGTA GGCATCAGCC AGGGGCTTCA AATCAGCCTA TCGACCGTTT TTCTCCTTAG AGATGGCGAC AACACGCGCA ACTATCGGAC TCCTGATTrA GGGAGCCAAT T'rCATCCAGA ATAGGCAGCC TTTGCAPLACT CCTTGGCATC GTTATCCAGA CAAATCCCAA AGGGCTTGTT TGATTTGCTC GGTCAGTTTT AGCTTTCTCA AATCCTTTGA GGAAAGATTC ACTCAGCCAC TTGGAGGAAG TCATAGATTT TATAGACCTG CTGGCGCAGA CCCAGCAA.AG T'rTTTCAGCA AATGACTAAA GGTCTCTTTC TTCAAAGACC TCATGAAAGA C-TTCGTTTTC GAGAATAAGT AATACGGAAA TTAGGTdCAA TATCAAGCAG ATAACCATGT GAAAGAATCC ATGGTTCCAA TGGCAGCCTT GGGTAGGTCT TTGTTTGAGG TCGACATCAT CTGTTTCTTG GATTT'rCTTG TTCTTTAAGT TCAGTTGCAG CCT'rGACGGT AAAGGTTGAG GACACCACGC GCCAATTGGT CCAGAATGCG CTCTGCCATG 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040
ACAAAGGTCT
ATAGCTTCGA
TGCAGT'T'T
CTCTTATTTT
CTAGGTCCAA
TAATAGCCTG
TGATGGCGAA
AGTCCAGTAG
TTCCAGAACC
TTTGCTCGGC
GAATCTCCTC
TTCAAGCCAA
CT'rTTCTAGG
ATGTTGCTGG
CCGGCCTGCT
GAGCTGAAAT
AGCCGATGCT GAGACCAGGA TAT'rCTGGGC AGTTTTCT'rC TGTTCCT'rGC TCGAATTTGC CTCACTTAAA AAGGGAATAA GCTTCATCGA GCTTGCTTGA GTTTTCTCC GACCAGACGC AAACGGGCTT GGCCCAGATG GTAAT'rGGCT ACGTATGGGG CAATGCTTCT GCCATTTTCA AAAATCTT'CT CAGCAGCTTT CTTGTAAAGA TCCTCATCTG TCAGTTGATT AGCCTTGTTT
AGAAGTGTAG
'rTCTGC'rTCT
TTCAACTCCT
TTGCCATCAG
TCAAAGCCTG
GTATAAGGAT
TAGGCATTGT
TTGTTATAAA
ATTCGCCTAA ATAACTG.CrT TCTTTCCA AGAAGAGCCC TTGGTATTC ATAGA'rTGC 'rGGCTTCTAC CACTGCTCCT CCATTTCCAA GTACATGGCG GA'rAGGTTGG TAACTGAGAA GACTGGAT'r? GTAGTCTACT CCACCTTGCC TCGTACAAAG TTCCACCAAA ATT'rGCTrT GTCCAGTTGT CCGTGCAACA CTTGATAAAT AGCr'rCAAAT GTTGGTCAAA GGAATCTTCA AATI'CCCGTG ACTACGGGCA AGCGTAGGAA ATAACTGTAT* GCCAACTT TTCCCCATCAGAGATTGG ACAGGTTCAG CCGAAAAAGT TCTGCTCCCC TTCTCTr= AGCAAA TTGAGCCCAT TAAAGAAATG AGGAAACTGG AACTGAGTCA ACTCCTATCG CTCCATTAGC 'rTTCAAACGG TCAATCCGGT ACACTGCGTC CATTGTCTAA T'TGAATAAAG GCTTGGTCT TCTTI'GATGG TTTCGATGGC TGGATTC;TGT CGGAGAATAT TCAAGCAAAA CTTCCTTGGT AAACTGGCCT TCCA.AACTTr TCGCGTTCTT GACTGGTTTC TTGAATAGC'r TGTTCTAGAC TTAGGCAACT GTAAGGCGCG rTCAAAGATA CGATGCAAGA TCAGGATGCA AACGTAAT'rC CTCCTGCAAG CCTAAkAACG'r TCATTGCGAT AAAACTWTGT CAAACCCGAC GTAGACAGGT b
AAAACTCCTG
T'rGGACTGGT
CACGCGACAG
GGTGATAGGC
TTTGGCAGGA TAGAGAGCT'r GCAAGGTGTC CTTGGCTAAG GTCTTGCTC TGGGATAGCT GGATrTTTCCA GACCTTGCTG ATCTAGTTTT TTIACCTATGA AACCTTGACA AAAGTCAALAT CTTGCTCAGT ATCGCTCATC TCACCCTGC'r AAcr-AGAC'rA GACAAAAGAC TGTGATAGGA CCCCATATCC TCCTTAGACA 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 GTCCTTTGTG ATTCATCCTC TTCTCTCTCC GCCTAAATCC AAAA'rGGATC AACTC'rTGAA
GATAGGCAGA
GCTTACGAGC
TTCCTTACTT TCACTTTCGT TAAAAAGGCT AGAATTGACC AAGGAAAGCA 'rAGTGTAGCG TGCTGGCAAT CAGTAATTGA ACGCCT='CTT CATCTGTCAG AAGACTGGTG TT'rTGAGAAA TAGCATAGAC AAAGTCAGCA GTCAATGGTG TGTCCACTbT TGCTGGAATG GTACGG'rAT'r CTAGGAAGTC TTCCAGACTA ACCTGTGAAC q4TGGCAGAA AG3CCTTCCAA ACTTCGGCTT TTGTCAAATC TTGTALACTGC TTGGTCACAG GTAGGAGTTT TTCAGCCTTT TGT'TTCGGC TTCTCAGGCG GAGGACATTC AAACGC'rCAA
CGGTCGCTTG
TITTTGGTAA
CAATCAAATC
GGGACAAACT
TGGAGCCOAC AAGAACAACT AT'TTTCTTG AGATTTTCAC GTTT'AGGTTT TGCCTTTCTT ATTCTCCTGA GTTAGTCCAA GTAACTCTGC ACCAGAACAG CATTCCAGAA TGGAGCAAGG AGTCGCAAAT TGTTCTAAAA TTCTACAGCT TCCAAAGTGG TAGAAAGACA CTCCA'TTTT
CAGCAAAAAC
GTCTTTCCTG
CTCTCT'rr TCGCAAAGAG GGTT'rCAAGA GGTGCTAAAA GATTAAATTT TCCATGGTGG GAT'rTGGTGA AGGTTTGCTG AAAGGCTGGC AAGCCATTGA TACCAAGATA GCGGATATAT TGCTCAAAAG 304 CATCAATATC AGACTGACTG AGGTCAGTAT ACAAATCAGT TCTAAGAAGA 7-rAATCAAAT CCTCCTGACG AAAACCG'1AA CGTTTTAAAG C'rAAAATAGA CTCGACAAAC TCAGTCAAGG GATGATGAGC CATGGCT'rCG CTTCTACCAA GATAAAAAGG AATCTGATAC TGGTCAAAAA TGGTTTTGAG AGATAACTGG TAAGAAGCTA CATCCCCCAA GAGAATACGA AAATGCTTGT AGCTCAGGTC TGAGTTCTCA TGTAATT1'CT GACGAA'rACT ACGGGCTACT AGCTCCAACT
CCTCCTTTTG
CCAAAGCGAG
CGTCAAACAA
TTCTGAAAAG
CACCAGArTTT GTAAATTTT'C ACGGTCTTTC TCATCGACAT TCATAAGAAG ACTCCAACAA ACGAGAGGCC TTGTCAAAAC TATCCATCTTI CTCATGAGTT GATGATGGAG AAATTTACG AGGCTTTCTT ACTAGCATAA CCACAACCCG CTCTrCCTCA GATTAAAATC ACTACTTACC TTTCCTGGGC TAACTGACCT AATCCGCCCT CTTATCCTCA TGAGAACAGT CCTGAGCAGG CTGGCTTGGT AGACATTGCC CG'ETTGGTAT TTAGAAGCCA CTCGCTAAAA GGACTGGTAT ACCTT'rGCCG TGAAGTALAGT
GCCCCGATAA
GCAGAAAAAC
TTGTCATTCT
TGATTAAGAT
TCTGTTAAAT
TCAATTAACT
TTGGGATCGA
CAATCTCAAC
TGGTCATCTC
CGCCATAAAC
GAGTAAAGCC GTCAATGACC AAGGCGATTT CAATAGCCTC AATCAAATGG GACAACTGAC AGGCTGTTAC TTTCTCAAAA ATCAAGAGTA TCTCCAAGTC CAAAAAACTC ATCTGAGATT GCTGGATCAA TTGAGGATCC TGCTTAATAG GTTCGGCALAG GCATTTGTAA AAGGCCAACC CTGGTAAATC AT'rCAAGACC AGATAGCGAG AAAAAGAAGC CTCCTGGGAC AAGTATTCCA
ATGGTAAAGC
ACGCAAGTCC
12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 CAAGACCGAT ATCATCAAGA GTAGTT'rTAG CCATTTGAGC AAAGCGCGTG ACGGTAATCG
ACCCGCTTGC
GCACGGCGCG TTCCTTTTCA AAAGAAAGAG AGTTGGGGGC AATGTAGAAG CAGCTGCAAC TAGCTC'rTCTr GCCTCTCT'rG TTAGAATTTC TGTCAAAGAA GTCCGAATAT CAGTATAAAG TAATTTCATC TCAGCCTCGT TGGAATTTTT CATCACCCTA TATrATACCA
.5 S C TGATTAGCCT CGTAAATCTG CTAAATCTTA AATACTTAGC CACATAATAA AAAAGCCTCG GAATACACTT CAATGTGTTG TCGCCATCAA CATCGGACTC ATATATTCAA TATTCTTGAT ACCGTGGATG CTAAAATATT TC'rTATT'rCA ATCTACTATA TCTTATCGCA TAT'rCAATAG TTAAAATATT TAGGCCATCC TTTACTTGTA TTAGATAGAA GTAACAAGGC TTTGAGTTTT TCCCAGTATC TTAATGTCGA TAATTCGATA TCAGAAGAAG AGAATGATTG AACTATAGTA TCTAGAAATT AATTTGATTT ATAAAATGAA CCAAAAATAG ATTTTCGTAA AAAAGTTCTC TTTCTTTTCT TCATCATCTG TAAGTCTGGC TACTGAAAAT ATGATTGTTT CTTAGGTACG CTGGTAGATT GTCTGATTTA TTTTAATATT ACGTGCCTTT AATTGAAACT ATAATAGTAC CCCTAATCAA GCTATTCGTA
TACACAATGT
TCTTATTGTG
GGT1ATAATCT
AGCGAACAGG
305 TAGTATAACA GAAGCATCAC ACGTTTrCCA AATCTCACGT AATACCATTT A'rGGCTGGi-r AAAGCTAAAA GAGAAAACAG GAGAGCTAAA CCACCAAGTA AAAGGAACAA AACCAAGAAA AGTTGATAGA GATAGACI'A AA ACTATCT TACTGACAAT CCAGATGCTT AT1r'GACTGA AATAGCTTCT GACTTTGGCT GTCATCCAAC TACCATCCAC TATGCGCTCA A.AGCTATGGG CTACACTCGA AAAAAAGAAC CACACCTACT ATGAACAAGA CCCAGAAAAA GTAGCCTTAT TI'CT'rAAGAA T'N'TAATAGT TTAAAGCACC TAGCACCTGT TTAGATTGAC GAAACAGGAT TCGATACTrA Tr=rATCGA GAATATGGTC GCTCATTAAA AGGTCAGTTA ATAAGAGGCA AAGTATCTGG AAGAAGATAT CAGAGGATTT CTTTGGTTGC AGGTCTA-ACA AATGGTGAAT TAATCGCTCC AATGACTTAC GAAGAGACGA TGACGAGCGA CTTTTTTGAA. GCTTGGTrTC AGAAGTTrCT CTTACCAACA T'rAACCACAC CATC( ATAGGGGGGG GGGGGGAGGG GGGGGGAGGG AGA INFORMATION FOR SEQ ID NO: 27: SEQUENCE CHARACTERISTICS: LENGTH: 6004 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear GGTTAT TATAGTAAAA TGAAATAAGA 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15213 120 180 240 300 360 420 480 540 600 660 720 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: TTATTACCTG AAACATTAAA TTTAATTGGA CATCCCGTTA TCAAT'rTTAT AATATCATCA AGATTTTTAT TATCTGATTC AGGAATTTTA TCTGATATAA CAACACCATT TTCAAGATAG TTCATTAAAT TATTTGATTC ACTAACATTA GTGTTTTGAT CTCCATCAAG CCAAAAATAA TGGTTATCGG AATCTAAATA CGATGAGTTT AAAATATTAT TACAAATTAT TTGATT'rGCT CCACCAGGAA TATATCTCAC TACTAAATTC TGTTTAAGAT TCTCACTACC TGAATGAGTG ATAACAAACT CTAGAATATA TTTAGCTAGT CTATCTTCAA CATAAATCAT cTTccTAG.AA TGATACACAT CACCTAATTC AAAAAATGCA TCCTGATAAT CAATATTTTC AATAACATCT ,NCCTTTTCTC CGTTTTTCAC TAAAAGTTTC ACGGCTTCTC TAGGAAAATC TTTTATAAGT TGTGTAGAAT GTGTAGTGAT AATAATTTGA TGVT'rTAT T TAAACAC'rC T'roAAGAAA AACTCTTTAA ATTTATAGAT TGCACTCGGA TGAAGTGAGA TTTCAGGTTC ATCTATTAAT ATTAATGAAT TTGAT'rGCGC ATTrACTATA TCATTTACTA ACAAAATAAT TCTAGCCTCA CCTGTTCCTG CAAAAGCCTC GGAATATTCT T'rTCCAGATT TTTTCATCCA AATAGTTTTG GAAGCTTr'rA TATCATCACC TAAGATTCAT CCATTATTTC TCTATAACCA TTTCTr?1 GCAACTGGCT TAGATCGTAA TTAAAATTAT AGTGATAGAA AAGACAACAT Tr'rTTTCAAT TTTTTGGGTA ATCTTTCCAT AATCCTTTTT TTGTTCCAGA TTATAATATG 
AATAGATTAA GTACI'AAACC AATATTCAGA ATTGAAGTTT TATTTACTCC TTTTAACAGC TACGGAATCA GGCCACTTGT AGTGCCACCT
TTTGAATAC
ACTAATAAIT
ATAACGCGTA
AACTTATGTG
TCACAAAC'TT
TAGCTACTTG
T'rAAAATr-r AATGTCTGTA 'rATCA'rCAC TTTAACATTA TATTATTCTT TAAAATATCA TCT'rATAAAA TCI'rGTT'IAC AAATAAATCA AAAGCAGAAA GCCATCCCAT C1'GTCTGTCG CTCATATTGT TTTTGAGGAG ACGGCCTTTA AGAACTTCTA TACGTTGAG'r
CATATTCTTI'
AAGAACTTCC
CATATGGTTC
CATTTCTAGA
ACATTGTTTC
CACACTTTTA
ATATCTATTA
ATTTCAACAG
GCACCGTTAA
CCATCCACTT CATCTAT'rTG TTGGCTGGAC AACCATATAA
AGAAATTTTT
ACAATCACAA
AA'rATATTTA
CCAATAATCT
AGCTTTAATG
ATCAACATTT
AGCTTGTAAA
AACTTATAAG
GAT'rrTTTAA
TTCCCCATAG
CCAACCCGAT
GTGCGACGGT
CGACTTCTGG
CCAGT'rC'r'r TATTCTCCTA AAGTTTCTCC TrTTTrATTAT AGGGTTAGGT TTTTAACATC ATTTCACCAA CGCGTCATA'r TT'AGCTT'GGT AGTCACCTTG 'NrTGGCGTTG GCTACGAAGC GTTCGrAGA
CAGACACCTC
CAACTTGAAC
ATAAATCAAT
AACATTATCA
CTTCTTCATC
TTTGTCGCAT
GAGTTTCCTTA
AGGATTAT'N'
ATCTATGCCT
AGCAACAATT
AATGTAAAAC
TCATCAATAC
TCTTTTGGA
CCAA6CATGT 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 TTGCCATTTA GCAAGTTCCT GCCAGTGGC AGGTAGATTT AGGGATGGTT GATGCGATTT ATTGCTGTTA AAGAAGGCTT CTTGCrTrGGT GCTACATTTA GAGACTTTCC ACACCAGTGT TGCAGCTGTC ACGATAGAAC CATGATTGGG TGAAGGAGAC -AA'rGACCTTA TCGTCTTCAT
TGTCGAGACG
CTGCT1CCTGT
CCAAGTGTTC
CCAAGTCGCT
CTTCCGCACG
GAGCCGCAAG
CTTCTGAGAT
GAAGGATCT
TGTCGCTGTA
GGCCAGTTCT TCTTCAACAT TGAGGAGATC GATGACACTT GACATAGCCA GT'rCAGGTGC TGGATTTGTA AAGCGTTTGA TATAGTTGAC ATCGCTTGTC TTAACAAGGA TGGTGATAGG CGCATTCCGA ACAGCACGAA TCAAGTCTTT GTCTTCAAAG GC'rAGATTAA CAGTTGGGTA 'PTGTCCAAAG ATTTCCTCTG TCACGAATGG GTCCAGCGTA TAGAGGAGAA CAGATCGAGT TAGAACTTCC TTGGTCAACT CAACATACCA AAGGATATGA CCAGCCACAC CAAACTCGAA GGTTTCGTTG AGATTGTGGA GAATCCAGCG AACTTI'TGTG ACATTGTCAT GCGCCACATC ATAGCr.AGAA ATGTTCCAAA TTTTGTTAAT GTTGGCAAAT TCTTCCCAGA TGAAGTTGTA CTTATCAAAG TTTTCAG'rAA CTTTTGCAAT GTCCGTCACA TTACCAGCCT CACCTGTTGC CAGCGTCAAA CCTI'CATTGT TCATGAGGAT 307 AAAGTTCCAT GAAGCATCCA TTTT-CTCGTA AGAGAAACGA ACCGTTTGAA AGGAACCAAC GAAGGGCATC AGCACCGTAT GTCAATCCCG TTACCGAGAG ATTAGACAT CI'rGCGTCCT GTGGATAAGC ACGTTTTGGA AT'rGTCG ACCAGTAAAT ACGAGACACC CAGAAGAAGA TGATGTCGTA ACCTGTTACC ACGTTTAAAG TCTTCTGAGT CGACTTCAGG CCAGCCCATG AGAACTGAAC CAAGTATCCA AGACGTC~rC GTCCTGAGTC TTCTTCGCCG ACATACATTT CACCATCAGC A'1TTACCAG CCAAAGCTGA CGAGAGATAA CCCAGTCGTG GACATTTT'CC GTTGAAACGA GGTGGGTAGA ATTCGACCTT GTCCTCTGTG CTTAGCCAAT TGGTCCAT'r TGACGAACCA TTGAGTAGAC ACCTGTACGT TCTGAGTGAC CAACACTGTG GACACGTTTT ACGTCTTGAC CTGGTGCCGA TTCCGATGA CATCCATTGG TG.CTCGTCAC GGA'rGAGACC TCCAAGGACT GGAAGA'rCA'r AAGGTTGAAG TTGGGAAATA GTTGAAAATG GCCAGAGGGC CATCCGTCAC CTTCTGGAGC GCAGGGATTT GGTGACCCCA ATCCATTGAA GGAAGGTATC TCTTGGTTAG CAATGCGTr AAGCGTGGCT CAACT1ACGAC TCGA=TTTGA CAAGGGCACC 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 GATTTCTTCC AACTTAGCAA CGACTGCCTT ACGAGCTTCA AAACGATCCA TGCCTGAAAA TTCAAAGGCA AGCTCATTCA TAGTTCCGTC ATGACGTTGG CCAACCAAGA AGTCATTTGG AGTACCAAGC TCAGGATCTG CGTGCTCATC TGGAAGGATG ACGTTTTTAC CAATCAAGTC AACCGCAACG TCCCCAALACA TAGTCTCAGG ACCATCTTCC AGCATGTAAT TCATGTGGTA CTCAATATCA GAAAGGGCTG TGCGAGCTfGC ATACATCCAG CCTTTCTTGT AAAGGTTCAC TTCATCAAGA GTGAAACGCT CACGAGAATA GTCGTTCATG ACGTTGACTT GTGGCAAG'TT ATCGTGGGCA GGTGTGATTT TCACGACACC TCCA.ACGATT GGGA'rGAGTT TATTAGCGAT CTTGTAGCGC GGGTCTTCTG GATTAACCGC ACGAGTTGTA GCAACTTCAA GGGCGCGTGA GAAGGCACCT TCTACATCCT TGTGAATCAC 
TGGGTCCCAG TTGATGATAA ACTCACCACG AA.AGACCTTA CGAACAGCT'r TTGACAAACC GTCTACAGAA AGCCCCATCT TGCCCCATTG CCATTCCCAG ACCTTCGTCA AGAAAGACTC ACCACGTAAG CGCTCCTCAA CCTTAGCCTG AAGCCAAAGG GTATCAAAGC CTTGCATGCG CGTATCCCAA GCGTGACCAA GGTGAAGTTT TGAATAAGGC TI'AGCCTTTT GATCGCCTGA TTGGTAACCA CCAGCCTCAA CCTCGGCTGG GTGTGTGTCC TTTCTCTATT TTrGTTTrATTT
T'TCCTTGATG
ACGACCTAGG
AGTCGCAATA
TTTTTGACGG
CCCAG'rTACG
AGGCTTGAAA
ATTGTAT'rrA GTAGTGGCAT AT'rCGTCTrT TCATAACGCG TAATACCCTC CCAGCGTGGT CCATACCTGG ATGATGATAT CCTGCAAAGT TTGGTGGTG CAATCACGAT ACATCCGCAT CAAGCCATTT GGTGAAAGTT CTTTAGACAT TATTTTGAAT TTGCTTAGCA AGTGGCAA CTCATTCGGT AAATTTCTCT TTCAGATACC CCGTTCGATT TCGACGTA'rc CTTGATGAAA G'rTATCCAAT TTGAAAAATC AAT'rACGTCTr GGAAAGAAAA GGTAATTCCA TCTCTAGTGC AAAATCAATG 308 GCI'CTTCTG CAGACAAATT TGATGTTGGG AATTTAATTG TCAATATGTC GTTTTAAGG CACTCTTCAA GACTrGTTTC AAATCCTGAA TTTGCTTTAA GTTAAAAT'rA CTGATCGCTG CGAACALAATT CCCACATCTC GCTTGATGGT TATAAAATAG ATAATTC'Tc ACCAATCGTC ATTTTTCCTG ATGCTGGAC TCAAATCACC TCCCACTCAC TCCTCAGCAA GCCATATCTC a
TTGAGCATCT
TCGTTGACTT
TTTAAAAGCT
GTCTGGGTGC
AACAGAACCA
TTCCTTTTGA
CtTGACTGGA
ATCCTCAGCA
TGGCAGCTCA
CTACACGCTT
GGGCAAACAG
GCTCAAGAAT
GACAATC'rAA
CATAATGCAC
TCGCAAGCAT
GCTCTGCGC
'N'GCGGTCTC TTATGCGAGC TTrCGAGGGTA TGAAGGTTAT ATCCAAAGCA AGTTAGTTCA AGATCAATCA AGGAACACGC TGCTTCTGGA AAGGTATAGC CAAGCTCTAG CACATCATCC ATGACTTTAT CGGTTCCTTT' GACGACAATC GTACGCTCCG GAAGAATGTG CTCCAGATAA GGAAAACCTG CTGGATAGGC GACCTCTGGC TCCACCACTG TGCGGACTrCG TAAAACGAGA GTTCTTGCCA TCCTTCTTCC TCGCTTTTTT CTCCAGATAG CGATAAACGC GCTGATATCC T'rGATTT'rTA AAATGTCCCT TTTCATCCAG CTTGGCAAAG ATGTGGCAAA TACCGTCTTC AACAAGTGGA CAGGCGTCTA AAATAGGAGT CGTATATT ATTr'rAAAGT AAGTGTTTCA GCGGTCTCTA T1rTTGCTTT AATCGATTCT CAATrCAACA AACAGAATCT ATACATCAGC TGGTACTGAr ATTTCTTGCA CTTGCTCCAA CTCGGTATAA TCC'rGATAGA GGTAGCATCC GTCAGTCGAG ACCAATGATC AAAAGATGCA AAATCATCAC AGCAGTTGCC AAGCCAAGCT TTTCCGAGAC ATC=GTGAA GACC'AAGTTC ACATAACCTC GACCCCAATA GCATGAAGAT GGTTGAAGTC CCATAGCCAG CTCGGAGATT TAAATCTCAT CTTCCAAGAT AAACTAGCGT AGGTATGGAT CGTTCTGTTT CGATTTTATC GATGAAACTG CCCTTCATAT ATCTCCCATG AAATAGGTTG GAGT'rCTGGG GCAACAAGTC CTCAACAATC CTATCTACCC CTGAGTTCGT TCAGAAAT r' GATAAAACCA GCCTGCTCCA TPTTTGATAC TGTTTGATCT CCAATCTCCT AGACTATAAG TTGATATCCT AAAATAAAAA CTrATAACA ACCCCCTTTG GGACGAATGT GTTTATCCGC 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6004
TCCCAAACGT
AGAAGTTTCA
AT'ICTCTCTC
TCCAATTTCT CCTGATGACT TCAGAAATA'r TCACAGTAAA GCAACGACTC CAATCACAAC TTATAGCCAA AA'rTCTAATC TTACTTGTGT TAAAAGATPG CCCTGCCATC TACATGACAG AGGAACTACA GGTCGTGATG CAGGAAAACC ATAATATAGT ACTAAGACGT AAAAGAAAAG
INFORMATION FOR SEQ ID NO: 28:
SEQUENCE CHARACTERISTICS:
LENGTH: 5857 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
TGTAGAATTC ACGACAATGC TTCGTTGArT TCTGGGTrGA TTTCGrCGCG TTCTGGCAAG CGAGTC.AATG AACCAAAAAT AGTACACAAT AGATTTTCGT AAAAAAGTTC TCTCTTATTG ACACGTTTTC CAAATCTCAC GTAATACCAT
GTGGTATAAT
TGAGCGA.ACA
TTATGGCTGG
CCTTTTATGG CATATTCAAT GGTAGTATAA CAGAAGCATC T'rAAAGCTAA AAGAGAAAAC AAAGTrGATA GAGATAGACT GAAATAGCTT CTGACTTrGG GGCTACACTC GAAAAAAGAA AGGAGAGCTA AACCACCAAG TAAAAACTAT CTTACTGACA CTGTCATCCA ACTACCATCC CCACACCTAC TATGAACAAG TI'TAAAGCAC CTAACACCTG AGAATATGGT CGCTCATTAA TCAGAGGATT TCTIGGrrG CGAAGAGACG ATGACGAGCG ATTAACCACA CCATCGGTTA AGAACTCTTG TGTGAAGAGT GTACAATCCT ATTGAGAAAA AAGTTGCAAT ACCTTTTATG ATTGTCTAAG CGAAACAACC ATACAGGAAA AACAGTTCAT TAAAAGGAAC AAAACCAAGA ATCCAGATGC TTATTTGACT ACTATGCGCT CAAAGCTATG ACCCAGAAAA AGTAGCCTTA TTTAGATTGA CGAAACAGGA AAGGTCAGTT AATAAGAGGC CAGGTCTAAC AAATGGTGAG ACTTTTTTGA AGCTTGGTTT TTATTATGGA TAATGCAAGA T'rGGGTATAA ACTTTTACCT CATGGGCTCA TATCAAAAAG
TTTCITAAGA
TTCGATACT'
AAAGTATCTG
TTAATCGCTC
CAGAAGTTTC
TTCCATAGAA
CTTCCTCCCT
CACCTCAAAA
AT'TTTAATAG
ATTTTTATCG
GAAGAAGATA
CAATGACTTA
TCTTACCAAC
TGGGGAAGCT
ACTCACCTGA
AGGTATTACC
GACTATATAA
TTTGTTACCA
I AAACGAAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 AGGCTTTTT GTCT'PGTTCT TGTTTCAATT GATAAGAATT GGCACAAAAG CGACCGTATT AGTTCTATCT TGAGCAAGTC TCTCCAGCGA GCCTTAAAAA ACCAATTCCC AAACATCTGT CCCCTCACAT CTTCAGACAC ACCACTATTA GCATCTTATC AGAAAATAAA ATTCCTTTAA AAACAATCAC GGACAGGGTT GGTCATCCCG ACTCTGAAGT CACTACTTCC ATCTACACCC ACGTCACAAA GAACATGAAA GATGAAGCAA TCAATGTACT GGATAAAGTT ATGAAAAAGATTTTTTAAAA AGTTTTGTCC CTTPTTTGCC CTCTAXATAC AAAAATAGCC CTTCGGATAA AATCCGAGGG GCTAGAAACG TTGTTAAATC
AACGGCCGAA
T??rAAGGTT
TTTACCTTTTT
CTTTrTGAATT TCATGGTTCG ATCATAATAT CAAATAGTTC TcATTcTAAA ATGTAAAGTA AAAAAAGGTA TTAAATCGAT GAGT-rCAGCA TTTTTTAATT AACGCCACGT 'rAACTTTTGA CATTTCACGG TAAACATCGA TGAAATCTT TATT'TTTGTA TTAA'rAACTT TTTTAGTATC CTCTCCTTTC TCATCCTGTA ATCAAGATTT 310
GGATAAAATA
AATT1AAATAC
CAAACAATTA
GGCAAGAAAA
T'rGATGAATT
TCCAACATTA
GA.AAGAA'rGG
TI'ATCAATGT
GTTCACTGAA CTATTTTATT GCTAAATTAC TAATATACTT CAATATACTA GAGGGGGAGT TAGCACCTTT ACGGGTGCTA 'N'ATTGTTT'G GCACTTCTTT TTT'rTGGAGT TAAC'rGCA'TT TTTAAGAAAT CCATAACTAA CAAAATAGTA TTTTCTATCA CTACCGGACC TCCTACTCT'r ATCCAAATTG GTCCTTCTCC TCAAGAGTGT TGACAATTTT CTCGCCATTT CATrAAGTGG rT'rTGCTGTA TAGATGATAT TTCTCCATAA CTGATGATAC TTCATAATCG TCTCAACCAT TGATTTATCA CACTTTCATT TTTAGAAATA GCAAGTACAT TCTCTTAAAT GAAGTTAATT CAATAAATGT TTTAGCTGTA TT-GCATTCCA ATAAGGTCTA TTATAGGATT TATATAATAT ATTTTCAAAT ATATTCTCAA 'rTTCATCACC CAATCCATTT TTGCTCTGCG ATATATACAT TTA-AGTTAGG ATCTATACCA CTC'TGACTGT. GCAAAAGGGA T'rATATCACA AGTTT'rATGA AATAACTT'rC CAAATTAATC GTT1'AGA-AAA AATTrCCATAT 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180) AATTCAATTN GTCTTATAGA TGGAAATATC TCGTCTGTAC CATAACCTGC CCAGTTATGT TTGTTGAGTC ATATCCAATG AAAATCGCTT TATATAAAGA ACTTCAACCT CATCATCAGT ATGAGGAAAG GATTTAAAAA ATTAACTCTA ACTCAGCTTC AAAAAATTCA AAArTACTT'r TCTAAACTAA AATTAGT'rAT AGCATTTAAT AAAATTTTAT GTTrCACCAT TAGAAACTCT TAAA'rCAGCT GTTTCTTGCG AAAATACTTC TTGTACTITCT GACAATAT1AA TTTCTTAATA TTAAAGGAAA TTAAAAATTC TATTAGCTTT TCAACGTATT TCTGTGCdAA TAGCCTGCTT AAACTCATTT AAAATTACCT GAAGCGTTCC CATATATCA'r GATCCCCACG GAATGTTCTT' 2CGGGCGC1'AT TAAAAACTTT TGAATTTTTC CCGTCTGATA CCAATACAA CACCATTT'rT ATTTAATATT CCAATTTCTG CTTTCTAAAC CTGCTCATGC TCTAATGGTA CAACAGCTAA ACACTTTTAA TAC'TGTATCA AGTTGTGGGC TTGTCTTTCC
CATCGTCTAC
CAGCTTCTAC
TAAAATCATC
CTTCATAGGC
AATCCTCAAC
GGGCAGTATT
CCCACGGAAT
TATAACTAAT
TTTACCAATA
AATGCTTTTT
TTTTGAAATT
TAGAGTGATG
AATGCTGTCC
TTGTAGATGT
ATCTAATAAA
TTCCATAAAC
TTGATAAAGT GAATAATTTT AGGTTACAGC GCTATCAGAA CTGTCAAAAT ATCACCTAAA GGTCTTACCA AGACTTGCCA TGTTTCCATT CTAGCGATAA CTGGCTGACT AACACCGCTC ATCTCCTCTA GTTTCTTCTG ACTAATACCC TTTTCAT'rTC TAGCCTCGAT AAGCTCACTC A'rGATAGCCA CGCGCATATC ACTTTCCAAA ATTTCCTCTT TGCTGAA'rAA TTCAC~CTT ACATCTTTCC AGTTACTACC AATAGCATTA VTTTCATTG TCTAAACCTC TTTCTTTlTAA ATCTGCAAGT TCACGTTTAG CrCCCAAT CTCTCTTTTG GGTGTmCT GTGTCCTrr-1 CATAAAATCA TGCAGTAAAA CAAAACTACC ATCCATCCAA GCAACAAATA AAATTCTATC TCTAAcGa'r CTCAGCTCCC AAATrrCAGC ATCTAA6ATGC TTAATATATG GTTCGCCTGC GCGTGT'rCCA TGTTGGCTTA ACA.ACTCAAT ATAATCATTA ATTTTATTAA GCTTAA'1TCT GCTATCTTTC CCTTTTTTAC TGCTA6AGCTC TCGCATATAA TCAAAAACAG GCTCATTGCC GTTTTTATCC TrGTAAAAAT AGATAT'IATG CACTATTAAC ACCTCTTCCT AATAACA6ATT ATAACCTAAA AGTTATrG=r TGTAAATACT TTTAAGTTAT 'rAAAATAAAA AGCACCTAGT TTCCTAGATG CTAGCACAAT GACACGGATT CGCACCGTGG 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 CTACCTCTAT CAAGGTGTAC TCCTCTATA CTATCCC?3'G TGCT7rACAA TATTATACCA A*TTTT'GGGCA AGGGTACAAC so 000.
CACAATCAAC TAGATACCTA CCATCTCATG ATATACCCCC GCTAAAATAC AAATCAGAAT AGATATTAAA CCACTTATTT TTGACTGATA AATAATATCC GCTGACAAGC TCCGATAACA AACCTCT7r- ACAGCCTCTA AAATGTCAGC CTCACTrCT'r CTCCTTGATA GT'TGCGTATT CATTTTACGC TCAAGGACAC GAGCCATAAA ATCTTTTCTC TGCTACCTCC TAAATCATCA TAGATCTTGT TCGATGTATG ATCTTATCCT CATGAGTAGG TTrGCACCAAT CATCAATAAT AAAAC'rTCAT TATCACTAAA TTTGATAAGC TAGCATATCT TATACTTAGG TTGTTCTTTA GGTGTk'rTAc AGAAATACGC ATTTAACAAT TCTAACCACT
AACTTATCAT
TTCATGTGAT
TGTACCCTAA
TGATTTTTAG
TCTCGCATGA
TCAA?-TTTCT
AAGCTGGTGA 3960 TGTACACATA 4020 TATCTGTTAT 4080 CAGCATCAAA 4140 A.ATACCACTT 4200 TCTTTGTCAT 4260 AATAGTTGCA 4320 AATATAGTTC 4380
CACTTTAGA
ATACAAAGGT
AAAGTATAGT
AACTGGCACT
TAGCTCGCCA
TCTAAATCTT TTTCCACCCT ATTTCCGTTT CATCCTCGTT TAGGATACGA TCCCACTCAC GCCATTTTTT AAGGTTTTCT TCTATTTGGA AAAAT'TCCCC TAAGTCATTG TTTCTATTAC ITAATAACTC CI'GAGTTCT CTCAATTTCC AACCTCAATT CTTCAATCTG TTTCCTTCAA CAATAATAAA CTCTGGCATA TGTAACTCTr TGATTTCCTT TAGATACTTC 4440 4500 4560 4620 4680 4740 4800 4860 4920 CCT'rACTACT
TTGAGTTCTT
AGCGTTAGAT
TATCCTCAAT
CCAAAAATTI' CATGGGTCTT ATAAGATTGT TCAAGTATAG CCTTTGCTGC ATAAACGGGT TGACCTTACT GTCCATCATA ATATCATTGA GTACAGAAAC GATGCTAAAT AAAGCATTTG AGTTGTrTA TCCATCATCT CATCTTGCTr GTCTTTTTAA CCGCTGCAAC TI'TTAGATAC TTATGACCTG TTGCGCGTGA 312 TACCCCTGCT TTTTGACATG CTTGTCTAT CGTTGGCTCG TTTAALTTrGC TTGGACGTAA GGTTATCATT TTCATT'rCCT TATCAAAATA AAGGGTTGCC CCTTTATTTC CCTATGCTAG GTAAGCATGG CATCTATGAA
GCCATCTATT
ATAATTCTGC
CCATTGCCTC TGAATTGCCC CGTTAATAAG TAAACCACCG TTrCAAGGCG VrTGCTGTTGG TATCAT'rATC CATAATATCT CTAGGTATTC TCTCATTTCT CTGCATCATC TGCTGTAATA TCTCTTT'rTC TAGT'rGCTGA CTTTTTTATA TTT'rAAAAGT AAACTAATTG CCCATCTTGA ACTTAATTAT ACCATTTI-I- TATCATCCTT GTTTTCAGTC CCCCTAGTCT ATTTATTTCA TCAACAATCA TTrCATGCTG TACTAAATCA TGGAAATAAT CAATTr'Tr'CT WArC7AGGAAA CTGAATTGCT CCATGTCAAT TTCGATATAA TCTAATTTTC TAAGAGCTrAG AGGTTTATTT GCCACTGTTA ATTTGATACT AGATAATAAA GGCTCTTCTr TTGATTCATG GT TTGCTAGT TACAATAGCT GAGCAGTATT TTGGGAATAG TCTTGCTCTG CATACACTTT CCCGATAATC GCTTTTAGCT TAATACTCCC ATGCTCTGGA GAGTATAAAA CAAAGCCTTT CTCCATCATT ATGCTI'TTCT CCTTATTTC AT'NTTATTAT CTAGGTTTTT AGGGTTCGTA TGCTAAAATA
ACCTCCTCAT
AATTCTGCAT
ATCTTATCTC
TGTACTAGCT
GCAAGGGTAG
T rATATTTTT
CTTAGTTCAG
TCAGCATTTT
TTT'rCGCCCT
ACTTCCTTAT
ATTTCAATAT
TTTAATAATT
AATCTGAATA
CTACCCTTTT
4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5857 120 180 240 300 360 420 480 TGTGTACCT'r ATGGCTGACT TT TCAAATTG GTTAGTT
INFORMATION FOR SEQ ID NO: 29:
SEQUENCE CHARACTERISTICS:
LENGTH: 10254 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
AAAATGATAG CAGGAGAGTT T'rCCCGTCCA TCAGACCCAG AACTGAGAGC CTTAGCTCAG GCTTCTCGCC AAAAACAGGC CGCCTTTAAC AAGGAAGAGA ACCCCTTGAA GGGAGCCGAA ATCATCAAGA CTTGGTTTGC CTCAACCGGG AAAAATCTTr ACATCAACAC TCGC'N'GATG GTGGACTACG GTGTCAACAT CCATCTAGGG GAAAATTTTT ATTCTAATTG GAACTTGACC ATGCTGGATA TCTGTCCCAT TCGTATCGGG GACAATGCTA TGATTGGTCC TAATTGTCAG TTTTTGACAC CCCTCCATCC ACTAGATCCA CAGGAACGCA ATTCAGGTAT CGAGTACGGA AAGCCTATCA CAATCGGAGA TAATTTCTGG ACTGGTGGTG GCGTCATTGT CCTTCCTGGA GTGACACTGG GAAATAATGT CGTTGCAGGA GCAGGGGCAG TAATTACCAA ATCTTTTGGC GACAACGTTG TCCTAGCTGG CALATCCTGCG CGCGTGATTA AGGAAATACC TGTTAAATAG AAGTAAAAAG GAAr-AGCTGG GGTTGTTTCT rrTCTAGc TTrTcrATT TTTTAcccAG TTCACATTTA CCTACTCrAT CTCTTAGCAA GTC'rGTTT'CA TTAAGCAAGT TCGTAAGTC GATGTTTrTC TCCTCAGTTC ATCAGCTTCC TCCT'rGACAC TTTGATACAA TAGTACAAAA TTAGAGGAGG CAGGCTATGA TTCAGAAACA ATTT'TAGAGT TTGATGACAA TCCTCAGGCG GTTATCATGC TTGCAGTTGC CAAAGAACTG TGTTTATGCA TTTTTAGGTG AGGGAAGTAG GGGCGAACTG TGTTGGCGAA TATGTCGTGA ACTACAAGGA CGAGGAGGTC CCAGCAGCCC AGTTTATGGA T'rGGTTGATT
TTTGTTCTG
TGTCTGGCTC
GGCTATGGTG
CCAATCACGA
AGGAGATTGA
CCACCAAGAC
AGGC'rCCTGT
TGGACCAGAT
TTCTAGTCCC
TCAAAGCATC
TCGGTCAGAT
TGCGATTCCT
GGGGCTGGAC
CCGCTATGCG
CTATCCAGTT
TGGCTCCGCT
TATCTrCTACT
TGTTCGCGCT
GGCACC'rGTG GTGTCCTrAC CTGCGAGATG AAGGAGCCAG CCAGAGGCTA TTGCTGCTAT GTCATGACCT GGACGACAGA AAGGAAGAAG GCTGTCGT TTGCGTGGGG TTCTC.TGGGC CAGTACGACA GTCGTGACTG GCAAGTGTTrC ACCACCTTTA GCTCATACTT TTCAAAAATA TGATATAGAG GAAAATGCCT TTACCACTAT GTGGCACCTT TGAGGAAGTT TTGCAAGACA CGGTTTTTAC CGAGAAACCG GTCGTTATAT GGAA.ATGCAG GAGGGATTCC TTATGAAGAA CTGAAA.AGGT GGCTTATCCT TGTGGAGATG GAGTGT'rCTG CTCTTGCGGC AGTAGCTCAA TGAATTGTTG TTCACAGCAG ATTC'rCTAGC GGACTTGGAC GGGCTCGGAA GCr'T'AATA AGGCGCTAGA ACTGAG'rTTA 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 GITTA(CIG, CAAAGGA'TTT TGTTTAA-ACG AGGTCACCTT GAGGTTGGGA AAAATCTTTA AAATCAGAAA AACGTATCAT ACACTATGCG TTT'rATGTCG ATAAGATTTA GAGTGAGATG CTCTTCAAAC CAGGTCAGCT TCACCTTGCC GTAGGTATAr TATCCGGCAA CCTCAAAACG GTGTTTTGAG C'rGACTTCGT AAACAGTGTT TTGAGCAACC TGTGACTAGC TTTCTAATCG TATAATCAAA AAGAGAAATrT 'rC'CCTGAA AAGCATATAG CCTGTCTTGC TTTTGACC TATAGTCACA TCTATCAAGT CAATAAAAAG GTGGCATTTT TTAGGCTTGG TGTTAGTAGA GTCATrTCGA ACTTTTTATG GTACAATGGA AACATGTTAT CTTTTATCAT AAAATGTCTA CCTCr'rGTCC TAGGCATGTT ATCACGTGAT GAAAACTTTG AAATGATACT CTTCGAAAAT GT'rACTGACT TCGTCAGTCT CAGTTCTATT TGCAACCTCA ATCCTGGT TTTCATTGCC AGTAGCTGGC G'rAAAAGCT AT'rGTTCTTG CCTAAGCTrAT TTTTGCCTTA TCCTATCTAA TCAAATTATC TAAGGAAAAA ATACAGCTAG GCTTATCTCG TT~TATCGCCA GCCCGTCGTA TTTTTTTGAG TTTTGCCTTG GTCATTTTAC TAGGCTCTCT GCGACTTATT TTGATCATCT ACCCTTCCAG TAGCTCACAC CAGATCGGTG GTCTAGGGCT AAGCTTAGTC TTCGTAGCCG TCTTTGAGAA AGTTTGTCTA GCTATTTT'GC TTAGTTrrCG TCCATTTTTC TAGCGATCTC AGTTTATTTG CTT?1'CAGAC ACAGGCGGCC T'rGGTTTTAT AAAGGACGTC TGCACTTTCA TTTGGAACAG CAACTACTCT CCTGTTGCCG ATAAGGT'TTT TTTTCTACGA TAGATTA'rAC ATGTr'rCTAG-GTGGCGCACC GTCCTCTTGG TCTTTGCACG 314 TCTTTTGAGC TTGCCCTTTG 'rCCAAGTTGA TTTCACTGCT GTCTCTGCAG TCTGTGTGAC CTATAATATC TGGGGTCAAA TAATC'rG'1-r CATGACCT ATTGGGGT TCTATATCCA TGCAACTATT CAGGATAGTT TTAGTTATGG T'rCTATTTT'r CI'CACGACCT =rrWTTGA CCTTA'rTCCT CAACTTCGCT GGGGACGTGG AGCCTTCTGT AATGCCGGTT 
TTGATAATTT CGATTTACTG GTCAATCTGG TGATTGCAGG
AAGCTCACGA
GGGTrCTCTCA
GCTCTTGATT
GAGCAAGCAA
AGAAACTCGA
GAGCTTGGGA
TCTTTrTTAGT AGGGAGCAC6
CTTGATTATT
GGTCTGGTTT
'rACGAAGCTT CTTTrCTTGAG
AGTTAGCTTT
TCAGGCTCAT
TGGAGGAACA
AAGTGAGCTT
GATTTGGC'rG GTCATGTAGG AAGAAAGAAA G'rACTATTAT TGACTATAGG TTTGTTGTTA TGGAACAATG CTGGAACGAT TGGCAATCTC T'rTCAAACAG
CCTGTGACTC
GCTGGGGGAC
CTAGGCT'rGC CGAACGATCG CGCCGCGAAC GGTTCAAAAA TCCTTTAGTG AGCTTCTTGA TAGGATTGAT TCTGCTAGGG ATAACAGCCA CACCTCGTAT T'rGAAACCAT TTCAGCTCTT AGTACAGTTG CCTGACCTTG GGAAATTGGC TCTCAGTGTT ATCATGCCAC GGTCCCTrTGA-CCTTGTTTGT TAGCTTGGCA GATTACCATC CACTATATGA AAGCAGATAT TAGTATTGGT TAAGAAAGGA GATTGGAATT TTGGGCTTGG GAATTTTTGG GAGCAG'rGTC GGATATGAAT ATTATCGCTA TTGATGACCA CGCAGAGCGC TTTIGGCGCGT GGAGTGATTG.GTGACATCAC AGATGAAGAA T.GATACCTGC GATACCGTTG TAGTCGCGAC AGGTGAAAATI GGTTATGCAC TGTAAGAGTT TGGGGGTACC GACTGTTAT'r TGACGATGCG AACAGCTGGC TTTTGATTA TATCTTACAG TCAAGATTAC GACAT'TTT CTCATGCCAA TGTTGCGAGA TCTTTATTAT CTTTTTGATG AAGGCAATCC TCCCTTTATC GTGTAACGGC AAATCTGACT TTATGTTTAT GGGACGAATT' CAGAAAAGAA AGATATGATT AAGAGCATGT CAGATCGTAC CTAGCTGCCC TAGCCAAGCA ATCAATCAGT TTCAGCCAGT TTATTGAGAT CAGCAGGGAT 2280 2340 '2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020
CTGGAGTCGA
GCTAAGGTCA
CGCTAAGAAA GTGCTAGAAA AGATTGGAGC TGACTCGGTT' ATCTCGCCAG GGGGCAGTCT CTAGCACAGA CCATTCTTTT CCATAATAGT GTTGATGTCT TAAAAATGTG TCTATCGTGG AGATGAAAAT TCCTCAGTCT TGGGCAGGTC
CTGTGCTTGC
AAAGTCAGAC
AGTATGAAAT
TTCAGTTGGA
AAAGTCTGAG
TAAATTAGAC CTCCGTGGCA AATACAATCT GAATATr1-rG GGPTTCCGAG AGCACGAAAA TTCCCC-ATTG GATGTTGAAT T'rGGACCAGA TGACCTCTTG AAAGCAGATA CCTATAT'TTT GGCAGTCATC AACAACCAGT ATTTGGATAC CCrAGTAGCA TTGAATTCGT
GACCCCTCTT
AAAAGTTCTT
TACTCAATGA
TTTGAGGT'rG CGCGGT1TrGA
TATGAAGTTA
TNGATGCC TAAGATGGCA AATAGAGACA GAAGCCCCTT CAAAGGCTGG ACTTTATGCT AAAATAGAAA GAAGTGACAA AAATCAAAGA TCAAACTAGG AAACTAGCTA CAGATAGAAC TGACGAAG'rC AGTAACATCT AGAGATTrrC GAAGAGTATA AGAAAAAATC T1'GTCTATCG CAATI'TCTAG CTATAATGCA CTAGTGATTG GTGGTGAGCA AGTTGGGATT
CGGGCTGCTC
ATACGGCAAG
AGTCCCCTAA
GCAGCCTATC
TTrGAT2'ATCA*
AAAGAGGGAT
GTCTTCTAGT
GAGAGAGTAA
AAAACACTGT
GCGACGT'rGA
AGGAGTAGAT
TTCATTACTG
ATGACGGGTC
TGTGGAGTCG
TCACGATCAG ACTCAGGAAA TCGCTGAGTG AGCCATCTAT CAGGAAAATA AATGCCATGG TTCTGGGCGC TATTTTAAAG TAGTTGACAG GAAAATTCTT GAAACCTTGC AGGAAC'rTGA GACCAATT'rT GTCTATGAAA AGGAAGGGCA AGTCTTGCCT GI'CGGCAGA TTrTTGGCTG GTATACCATG ATGCACTCGC TGAT'rTATCG AC'rGCCTGAA CATACTTTT ATGTCGATAA CAAGACCATG TACTATCTGC CTGTCGATTT GTCTGTCAAT GAGCAAGTGA TGATTAAGTG CT'rGATAGAC CAACT'rGATT TGTCCCAAGT GAATCATATT GAACTCACGA. CGGTGATT-TC GGAGCATCTG GCAAAAAAAC GCCAATITG CTTTCAGGCT ATTCGTAAGA CCATGTTGAG TCCCAAACTG TCCAATGTCG TCTATCAAAT
TTTAGCTAGC
CGGTCGTC
TGATGACTGG
GAGCAAAGGT
GTCTCGTAAG
GGACCAGGTC
GACACATTTG
AAG'rA'CCTA ATATCGTTAG A.ATCGTGGCT TGGTAGAGGC GTGGATCCTC GTGCCTACTT CAAGAGGTGG ATGTCTT'rGT AAGAGTATGA GTTACGATTC GGAAA'TrCT CCAAAGGCCA TTGCGTGCTA GCCAGTTCTA 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 49 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 TCTCTTTGTC TTTACGCCCC TTCAGCAGGT CTATCGTTAT TTGATTGGGC GTGAGGACCA CATTGACCAG CAACTCAAGG TCAATCGACT GAGTCATCCC AAAATGCGAG AATATCTGCT CAGTACCCTG C'rCAACCGAT CTGGAACAGC GACCTATATT CAGCAGAAAA ATCCAGAAGT
CCGTT'TGACC
CACCAAATCT
'rAAGTGTTTT ATAAGACGGGA TTTAAGAAAA ATTTTAAC'N' TCAGGAGATT ATACTACAGT CATCAAATAA AGAAAGACTC TCAATCCAAA TCAAAGATATACTCGTTGGT CTATTCGCCG CAGTTGTTGT GCT'AGTGGC TTCTT'rGTCC TAGTTGGTCA AAACATTCTG TCTTGCCAGA GTTTATGGAT TTAATTAATA TTTCTTAGTC CTTTTI'AATT TAAGGAGAAT CCTATGAAAT TCTCAGTGTC GGTGTTGCCT GCCAAGTTCT GTACG'rGCCG ATGGGCTCAA TCCAACCCCA GGTCAAGTCT G'rGACTTATC
CTGGAAATAC
GCCCTTCTAG
AGAAAAACCA
GAATTCACTT
TCTGGATACA
TAACAGATG'r CTTAAAAGAA CAGCAGAAAC TACTCCTGAA TCGATGTTCC TGC'TGCTTAT TAAACCAAGT AATTCCTTAT TAAAAGCTr'C GGATAATGC'r CTCCTCTTGA AGGATTAACA CTGTTGGTAA ACAAGGTCAA
GGAGACACCG
CCGACACCTA
CTTTTTrGAAA
ACTGTAGATA
CAAGTAAAAG
CTTGAAAAAG
GAACTATTCG
CCT'rGGTCTG
AAAGGGAAAT
GCTTTAAT'rG
TACCTGAAGA
TTCTCACTCA
CAGAAAGAAC
AAGATGAAGA
CAGCTGATGT
GTGGAG'rGAA
CTGAAGGGAA
CTGGTGATGG
ACAATGGTAC
ACTTCTATGA
ATCAACTTCG
GACATCGGGA ACGAAAGAGG AGCGAAACCT GAGGGCGTTA TCAA AGC GAGGAAACAA AGCTCAAAAA AATCCAGAGC GGATGGGACA CAAGCAAGTC AGAAAATACA AAAGACAGCA AGGTCCTTTC ACTGCCGGTG TATGTTAACT CGTCTATTAC TGdTAAAAAT CCTGCTTTAC AG'rAGACTTA AATGGCAATA CGCTAATGGT ACTCAAACTT ATAAAGCTAC TGTTAAAGTT TACGGAAATA AAGACGGTAA AGCTGACTTG ACTAATCTAG TTGCTACTAA AAATGTAGAC ATCAACATCA ATGGATTAGT TGCTAAAGAA ACAGTTCAAA AAGCCGTTGC AGACAACGTT CCAAGGGTGA AGGTCCAT'rC CAGGTGATGG CATGTTGAC'r ATAACGGCGA CGCTAAAAAC G'rCAATACTT CTATCAAGTA TCATTGACCA GTTCCGAGCA GTAACAAAGA CGGTAAACCA ACATAAACGG TTTAA'rrTCT ACAGTATCGA TGTTCCAGCA CAGGTGTCAA CCATGTGATT 'rCTTGCTCAA GGCATCTGAC CCCTATCTCC ACTAGGTGAA TGGACGGAAA TGTAGCTGGC GTACTCAAAC TTACAGCGCT TGGACAACAT CGTAGCAACT AAACAGTTCA AAAAGCCGTT
AAAGACAGTA
ACAGCAGGTG
CGTCTCTTGC
CCAGCCC'rAT
GCCTTGGACG
AAyGGTACTC
GACTTGGACA
AAAGAAACAG
GCCTACCI'AG
CCATACGAAC
AAGGCACCAT
AACGTGAAGA
AAAGAAAAAC
ACAGTCAATG
AAAAAAGTCA
GCAGACAACG
TCGATGTTCC AGCAGCCTAC TCAACCATGT GATTCCATAC TCAAGGCATC TGACAAGGCA CTCCACTAGG CGAAAACGTG GAAATGTAGC TGGCAAAGAA AAACTTACAG CGCTACAGTC
CTAGAAAAAG
GAACTCTTCG
CCATGGTCAG
AAGACCAAAG
AAACAAGCGC
AATGTC1'ATC 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 ACATCGTAGC AACTAAAAAA GTCACTATTA TTCAAAAAGC CGTTGCAGAC AACGTTAAAG
AAAAAGCCAA
TCTTCGtAGG
GGTCAGATAA
CCAA.AGGTCA
AAGCGCTCAT
TCTATGGTAA
CTATTAACAT
TTAAGGACAG
GGGTGAAGGT
TGATGGTATG
CGGTGACGCT
ATACTTCTAT
TGACCAGTTC
CAAAGACGGT
AAACGGTTTA
TATCGATGTT
TGTCAACCAT
CCATTCACAG
TTGACTCGTC
AAAAACCCAG
CAATTAGCCT
CGAGCAAACG
AAACCAGACT
ATTTCTAAAG
CCAGCAGCCT
G'rGATTCCAT ACCTAGAAAA GGCCAAGGGT GAAGGTCCAT TCACAGCAGG 317 ACCAACTCTT CGCAGGTGAT CACCATGGTC AGATAACCGC TGAAGACCAA AGGTCAATAC AAAAACAAGC GCTCATTGAC TCAATGTCTA TGGTAACAAA AAGTCACTAT TAAGATAAAT CTTCTAACTC 'rGGTTCTGGC ATAGCATGCC TGCTGACACC CTGCTTCTGC TAACAAGATG ATACTGGTGA GACTCAAACA G'rTTACTCGG TGGTCTAGGT TAAATGATGG ATAGTGGGCT TTCTTTCAAT AGCAGAT'rAA AGTATAGCAC TGTTTTTATC GACGAGGTAG AAATCACAGA TTGGTAGCCC ATGATGGACT ATTATCACAG ATGTCATGAT TACTTATCAC CAGAGCACCC ATTTACGGCC TGAGCTTGGG CTGGTTTTGC GTGTCCACAA ATTTCCCTTG GCAATCTAAA ATGCTGGAT'r TAACTGTTAA a. a a GGCATGT'rGA CTCGTCTCTT GACGCTAAAA ACCCAGCTCT TCTATCAAG TAGCCTTGGA CAGTTCCGAG CAAACCGTAC GACGGTAAAC CAGACTTGGA GTTAAAGAAA CATCAGACAc GTGACTCCCA TGAATCACAA ATGACAAGTT CTACCAACAC TCTGATACGA TGATGTCAGA TCAATGGCAA GTATTGGTTT TTGAAAAACA AAAAAGAAGA GACTAAGATT AGTTTAACAA AATCATCGTA AAACAATAAA AAAGGAGAGA CAGATGGGAA TATTCATCAG AGATACTTAA GGAAGCGCTA GAGCTGTTCA GCCTCGCATG GATGGTTATG TTTCCTATTT ATTACTGCTA AGCAGATGAT T'rTA'rTGCTA TATTTTGCGC CGCCTTCATC AATGAATCAT AGTAGTCATG ATCATTTGAA TTGCTGTGGA CCTCTATGAA AAGATCTGGA TATCCATGCT CT'rCGACAGG GACAGTTTGG GGGTTGGGAT AGTTATA'rTT TGGTTGGATA GCTGTTCAAA AAATGCTGAT GTTGCCAGCC 'rTGTCGGTGC 'rTGGGCAAAC TCAAGGAGCA
GCTC.AAGGCA
ATCTCCACTA
CGCGAAATGTA
'rCAAACTTAC
CAACATCGTA
AGCAAATGGT
TCATGCTACA GGTAC'rACAG GATGGCAGGT GAAAACATGG GGATAALAGCT ATGCTACCAA CCTTCGGCTT GCGCTTGCAG AAACTAATCA GCTAAGGAAA CTCAATCAGC AATCAGGACT AATAGTG'rrA TACTTAAAGC AGACAATTTT ACTCGTTGAC TTCAGGCAGG TTATCAGGTC AGAAAAAACC GATTGATTTG ATTrAATCAG TGAGGTTCAA AGACCAGTGA ACAGGACAAG ACCTTTTAG CCCACGTGAG GTGGGGGCGA AACAGAGCTG AAGTTCAAAT AGGAGAAGAA
TCTGACAAGG
GGTGAAAACG
GCTGGCAAAG
AGCGCTACAG
GCA.ACTAAAA
TCATTATCAC
7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300
CGAGTTTTCT
ACCAATACCT
GACCAAACTC
GGACAAACAT
CCATTTTGGT
CCAAGACAGA
TGAATGTGCA
CCACTATTAA
GAAACTAAAA
'rGTTTTr'GG
TTTTAGCTAG
AAGAAGACTA
AGCTGGCAAA
ATAAGATAGA
TATTA'rTTCA 'rGCGAAAGGC
TGGGATTAGT
TGCCAAGCGG
TAATCCAGAG
CGTGGATGAC
ATATAGTAGT
GAAACCGAGA
ACCCTCTTAA
GAGATTTACT
CTCTTTCTCC
GTAGCGGCCA
TTTTGCTTGG GATGACCATC TATTGCCAGT CTT'rACGTCG
AGGATTTTCC TTCAAATTTG GAGGTTCAAG GTCCTGTAGA ATTCAGCAA TTAGGGCAAA 9360
CTTTTAATGA GATGTCCCAT GATTTGCAGG TAAGCNTTGA TTCCTTGGAA GAAAGCGAAC 9420
GAGAAAAGGG CI'GATGATT GCCCAGTTGT CGCATGATAT TAAGACTCCT ATCACTrCGA 9480
'rCCAAGCGAC GGTAGAAGGG AT'Tr'GGATG GGATTATCAA GGAGTCGGAG CAAGCTCATT 9540
ATCTAGCAAC CATTGGACGC CAGACGOAGA GGCTCAATAA ACTGGT'rGAG GAGTTGAATT 9600
TTTTGACC!CT AAACACAGCT AGAAATCAGG TGGAAACTAC CAGTAAAGAC AGTATTTc 9660
TGGACAAGCT CTTAATTGAG TGCATGAGTG AA1'rTCAGTT TTTGATTGAG CAGGAGAGAA 9720
GAGATGTCCA CTTGCAGGTA ATCCCAGAGT CTGCCCGGAT TGAGGGAGAT TATGCTAAGC 9780
T'TTCTC!GTAT CTrGGTGAAT CTGGTCGATA ACGCTTTTAA ATAr'rCTGcT CCAGGAACCA 9840
AGCTGGAAGT GGTGGCTAAG CTGGAGAAGG ACCAGCTTTC AATCAGTGTG ACCGATGAAG 9900
GGCAGGGTAT TGCCCCAGAG GATTTGGAAA ATATTTTCAA ACGCCTTTAT CGTGTCGAAA 9960
CTTCGCGTAA CATGAAGACA GGTGGTCATG GATTAGGACT 'rGCGATrGCG CGTGAATrGG 10020
CCCATCAATT GGGTGGGGAA ATCACAGTCA GCAGCCAGTA CGGTCTAGGA AGTACCTTTA 10080
CCCTCGTTCT CAACCTCTCT GGTAGTGAAA ATAAAGCCTA AAACCCCTTT ACAAATCCAG 104
CTAT'rCATGG TAGAATAGAT TTTGTGTGAA ATATCAGCAG GAAAGCATGA AGCTCGTCAA 10200
CAGGTGTCT'r ATGACAAGTA ACCTTGGCTG TTTAGGCGAA GGGCATCTGC ACGG 10254
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 9769 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCGGCGACTA TCGATAACAC TTGACTTGGT AGCCCCACAT TTTGGACAAC GCATCCTTTC
CCTCCTTATC GTITTCTTTT CATTATACCA TTTTTTAAGC GATTCCCAAA ACAATTCTTC 120
TTrTTGCTTG ACAAGTTTTT TGTTTTGTTG TATTA'N'TAA TTAAGACAAC AAGGTAAAAG 180
AkAAGGAGACT AAGATGTCCT GGACATTTGA CAACAAAAAA CCCATCTATT TACAGATTAT 240
GGAGAAAATC AAGCTTCAGA TTGTTTCCCA TACACTGGAA CCCAATCAAC AACTTCCAAC 300
CGTGAGGAGC TAGCTAGCGA GGCTGGTGTC AATCCCAATA CCATCCAAAG AGCCTTATCA 360
GACCTTGAAC GAGAAGGATT TGTCTAC!AGC AAGCGAACAA CTGGACGATT TGTGACTAAG 420
GATAAGGAGC TAATCGCCCA GTCACGCAAA CAATTATCAG AAGAAGAATT GGAACACTTC 480
GTTTCCTCCA TGACCCATTTr TGGCTATGAA AAAGAAGAAC TACCAGGCGT AGrCAGTGAT
TATATTAAAG GAGTTTAAGC C'rA'GTCATT ACTAGTATT GAAAATGTAT CCAAATCATA
TGGAGCAACA
CCTTCTTGGG
ACAACCAGAT
CCAGCCCTTG AAAATGTTTC TCTTGACATT CCAGCTGGAA AAATTGTCCG CCAAACGGCT CAGGAAAAAC AACCCTGA'rT AAACrAATTA ATGGCCTT CAAGGACGTG TCCTCATCAA CGACATGGAC CCAAGCCCAG CAACCAAGC CGTTGTAGCT TA'rTTGCCTG ATACGACCTA TCTCAATGAG CAAATGAAGG TCAAAGAAGC CCTAACCTAC T'rCAAGACCT TCTATAAAGA TTGTCAGATC T'rGAACGCGC CCATCATCTA CTTGCAGACC TGGGCATTGA TGAAAATAGT CGTCTCAAGA AACTA'rCAAA AGGAAACAAA GAAAAGGTTC AACTGATTTT GGTTATGAGC CGTGATGCTC GTCTCTATGT TTTGGACGAA
CCCATTGGTG
TACTCACCAA
TTGGATGAAA
ATTCGCTACG
CAAAGGAGAT
TGGTATTTAG
CAAGGCTTTA
ACAGTCTTTG
CGCTTCAAAG
GAACACCATA
GCTG'rATTGG
CTTTCTTATG
TCCTTCCTAC
CAGCTTTTCA
GTCATTGGAT
GTAGGACTCA
TTCATAGCTA
TAAATAATTT
TCTTTAAAAT
GGGTGGATCC AGCAGCCCGT GCTTATATCC CTTCTACCGT TTT'GATTTCT ACCCACTTGA TTGTCTTCCT AAAAGACGGA AAAGTCGTCC AGTCAGGTGA ATCCATTGAC CAACTCTTCC TATTTATGTT TTGGAAT'rTA GTTCGCTACG CCCTCTACGC AGCCGTGCTA GTCCTTTCTG AAAATCTACC TTACCAAGAA AGTCAGGCTA GTGG TGAT GCTTACACTT GGGAT'rTCAA GTAGTGTCTA CGACCGACAA GGCTATCTGA TCATCACAGC CAAACTAATC GGTGCCTTTA CTCTAAGTGC TGTTATTATT CTGGCTTTAA TGATTACATT TGTAGAAACA CATCTCCCTC TAAATACTAT TTCAGGAATC CTCTGCATCT ATGAATACCG TACAGCACTC GCTGTTGCAG TTATTGAACr TTTC=TCAAT CTTAGTTCTA ATGACCATTT CTATATGGGA GCAGGTATAG TCTTTTATCT CGGAACCTAC TACATCTTGA TTACCTAGAT ATGTAACATA CTCATAGAAC TAGAAAACGC ATAGTATCAG GTGTTGAATA TCAATACCAT TATCAACAAC TT~TCTGATAT CGAGCCAATC GTCAAGGAAA TGTAGATGAT GTCAGaATTT AAGGCCTAAG AATTTrAAAAA TGTTAACAAG CCCTCATCGG AATACAGACA CTATGCTACT TTrTCTAGCT CCATTTTCTT GATTAT'rAAA CT'rTGACCTT- GCCAGTTT'CT TCTGGTCATT GATTAGCACC CAGCTCCAGA ATGGATTCCT AGATCTTTCT TACAGGTATA ACCTGGCTAT TTCCAT'rGGA TCTACATTGG TATCCAAATC ATTTCTATGT CAATTCACTG CCAT'rGTTGA AGAACTCATA GAAATAAGGT TAATTTGCTT AAAAGAGACC AGGCAAAAAG TGTACTGCcC CCCAAAAGTT 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 AGATTTTTTC TGTCTAACTT TTGGGGGCAG TTCATAAGAA CCTTGGTAAT ATGCGTT 320 TGTGAGCTGA CTTATT1TCCT TTCACTATAT CGCAAAATGA ATI'TGGAAT TCAAATCAAT TTrATAAGAAT GTTTTAGAAG TTCAGTI'CAC TATACAATTG AGTTTTCAAG CAACCTCTTr GGTTCGTGAT TCCACCCTTT TCACCTTAA AAACCTCGCT
ATAAGATAAG
ACTAGCATAC
AAGTGAAATC
GAAGGTCGAA
ACCTAGAGCT
TTTCCCGAG
CATGAAAAGA
ATCCAAAATA
ACCACCTACA
CGCATAGAAA
TTTTTCACTG
GCACGTTTAA AGGTTTTCCA AATCCCTAAA ATGCGTCCGA TAAATCCTGT TGCTACCACC
AATAAGAACG
TAATATTATC
ACATAATGTG
T'rCGCAAGGC
TCATCCGTTT
GCAAAAATCA
ATCGTTCGAA
AGGAGATTGT
ATCATCAAAG
GAACGATGGG
CTATTCCAGA
TACATAATTA
TC'rTCTATTT GAAGAAcGAG
CTGTAATAGC
ACGGCATAAA
CACCAGCTGC
GCGACAAGGC
CATGCTTCTG CTCCCCCCGC ATAAAGGGAA TATAAGAACC GTCACTCCAA AAAAACCACC TCCTCAGGAC GAGAAACCAT CTGATCAAAA TAAAGAGCAA CCAGACTGAG CCAAGAATGG ACATAGATCC CAATATGCGT TAGTGACTTG CCCT'rATGCT GCAACTTCCT GAGCTGTTAC
ATAGTCATTA
AATCTTCAAG
CATAATCAAA
9 9 9 9.99 9 9999 9* 99 9*99 9 999W 9. 9 9 AAAGAATCCT AAGGCACCTG CTGCAATTGT ATCAATCT 'r CTGTGAATT GAATTGTCTG GGAAGCAG1'T GAACGA'rTAA GCTGATT'rTG TTTAATTCCA TTTTCAAGCG ATGTTTCGCC TTGATCAA'rG GTCAAATAAC CTTTTAATTT TTCGTCTTTA 'rAGTCGAAGT TAACACCATT CACTGTTGTC ACTACTGCCA CTTTATTATT AATTCCTACA GAGATTCCTA AAAAGAGGAA TGACTCGACA TGTCGAAGAT AGGrrCCTT AGATCCTAGG AAGGCTGCCA AGACTACGTA GGTATTCAGT GAGATAGCAT CTCCCAAGTG CAAATCTTITA AAGAGCAAAA CGGCAGCCAG TAAAATCACT AGAAACAGAG CCATCATCCG AGAAAAALACG ACTTCCATAA TT IrGGTGCC ACCCGCATAG GTAATCAGAA TCATATAAAG TTGAATAAAC TmTTATTTT CCTTGGCTTC CGCTAAGCGT TT'rTCCTGCT CTPGAGACAA" CAGTTCATTG AGTGTACCTG TAACCTCAAA ATGATAAACT GCCT'rTAGAA CACTATC'rTC T'rCTTCTTTA ATTGCTTCTT TGGCACTTGC TACATTCTTC AGTCCTTCTG CTACAGATGG TTTAGCCATA GAAGAACCTT GGAGATGCCC CGGCGAAATC ACCATAAAGA AGAAACTCCA GAI'TACAACC CACATATTTC TCATACTTCC 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 ACTCCTGATT CTAGTTTAAA GATTTCATCG ATAGI'GGCG CTTGT'rGGTC AAATGTTGCG ATATATTGAC CTTGAGTCAA GATTGAGAAG AGTTCCCTTC CAGCGCTCTC ATCCTCCAAA ATCAATTTCC AACTGCCTTG TTTGGTCAAG CTCACCTGTT TGACATGAGG AAGATTTTCC AATTCTTCCT TGCTTCGTTC ACTTGAAACA AAGAGACGCG TTTTCCCGTA T'rGATTGCGG ACATCCTGAA CTGGTCCGTG CAAGACCACA CGGCCATCTC GGATCATCAG AATATCGTCA CAAAGTTCCT CAACATTGGT CATGACATGG TCAGAAAAGA TAATGGTTGT CCGCGCTCT 321 TTTCCTGAAA AATGACTTGT TTGAGCAATT GCTCATCCAA GATAATcAGG TCTGGTTCAT GCTGAT TrCC TTTGACAGA CTCTTGATTT TCTTCATCCA ?TTAGGGAGT TTTT'CTTTGA CCAAGTAGCG AAcTTG'T~cA AGAACTGTCA ATAACCAATC CGAGCATAGG TCTCCTGACG
CTGTATTAAC
GAATCAGAGT
TATCTGTCAG
CTTCTTTGGC
ATTTAGGCAT
AATATCCTGA
GAAAATCGTT
CGCTTGAAAG
TACTTCrAGC
CTTTACAATG
CACTTTTATA
TGGCTCCAAT CCACTAAAAG AATAATGAGC TGAATCTTcT C'ITTCCTrTrC
ATCCATGCCT
GAGA'rGCGTT
ACITCCAACC
TrrAGAGTCG
CTCZAGGCAG
CTGATATTCT AGGAAr'rTCA TCCGACTAGT CCCAAAATAC CTTGGATCCA AACT'rTTCT CTTGCACTCA TTATACTCCT GACTATTGCT GTGTAAAATA AAACTAGGAA GCTAGCCGTA ACGAAGTCgA CTCAAAACAC ATCTACGGCA AGGCGAACTG TCCATTATAC AGCAGCAAAC CCTGAATTGT TA''rGAGTA TGGAAAAAGG CTAATAGTT'r CAAGAAGGAG TAATCCTT1'A ATGTT'1CTA AGGATTATAT TCAAATTGAT TTCTAACAAT TATATCTAT'r ATGCACACCC ATGGTTACCT AAGCCTAAGG GACTGCTCAA AGTACAGCTT TGTTTT1GAGG TTGTGGATAG ACGTGGTTTG AAGAGATTTT TTAATTTATA CCTTCCGCTC ACTCCT'1TTT CCTCGTAAAG CAGACAACAT TTTTATAAGA TCTACTAATG GACGGAACAG AGTAAAATGA AATAAGAACA
AAATACTATG
GACCTGGTCG
CTAGACTTCT
TGGCCTGGAG
CCATCCAGAC CGATTTC'rCC GTTTCCAG CACCATTT TCAATACCAA ACAAAACTTG ATCTTTCACC TCCGAAATTT TrTT'r'rGTCC ATTTTTAGAA CTCAATGAA-A ATCAAAGAGC TGAGGTTGCA GATAAAACTG AACTGACGAA kCrTAaCTAT CGAAGAGTAT TAGTGATAAA CTCAACTGTC TATTTTTAAT TT-CT'rCCT CTAAAACTTC AACAAGTTrCA TCTGTCATT'r AATTCAACCG CTTGTCCGAT GGACAAATG ATCACGACAG 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980' 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760
GTTTTAGAAG
CTATAGGATC
GAACTAAGAA
TAGA'rGTATA CTATTCTAGT T'rCAATCTGC TAATGAAAAT CACAACAGGC TCATTCATAG AACGACTACC AAGGAAGTCG CATTCATCGA CTTGAACTAC AAGTCCCCCA GAGAAGACTT CAATAATTAA TTCACTATCT AACTATATTT AATTAACTrAA CAATTCAAAG GATTCATACT
AAAGTAGATT AACAACTATC CTAAAAAATG CTGGATGACT AACTTGAACT TGAAATTTAG AGTAATTATT TCAGAACTGA TTAATATTAA AGCCATAAAT TACGTCCATC AGAGAGAGAC GCTTCAGAAT ACATCTAA-AC TTTAGGGAAA ATTATCTCAG ATAAGCTATT CGAAACTTAG TTCGAAACCT AGAATGCATA TAACC'TTAG TCTTACTACT TTTAGATTTT ATGACTATTC GAAAGCGCGA AATOCTTTTA, AATTTATGGA 'rTGACAGACC TATTCTAAGT
AGTCTTTCTA
ATGCCTCAAA
ATTGCGATTA
CTCGAAGGGC
TATTTACTTT
TATAGTAGAA
TGT'TTCCC
TTTGATTTTC
TGCC'r'rrATC
ACAGACTGTC
CTATTCCTTA TCAAAAAAGA ATATACTATC TATGAGGAGT CI'CTGCAGC GAGTTATCAA TGTATTTGGG CT'rATCAGGC CAGCAGGCAG GCA'rCTGGGG GTCCCTATCA TTCCAGGGGC CGGGCACATC ATCGGGACTA TCTACAACTA CTT-TThTCTA GTGCGCCTAT ACGGAGCTGC CTACGACAAG TACATCGACT GGCTAGATAA TATGATGATT TGGCCCATTA GCCCAGCTGA GA'rGAGCTTC AAGCGCTACA 'rGACCATCAT TTATACCTAC GGTCTGACCT ATATTATTGA AAATCCGTTT GGTTTCCCAA GTGGATTT'I AATATACTTT GGTATGGAAA TCATGCATAT.
CCTTTCCGCC GTGATAGAAA CACCTGAAAT CTCATTCCCC CTTTCTCCTC TTACATGTCA CAGGATAAAC TATCTCATCG ATTGTCGGTG TGGGATTr'rA CAATCCAAGG TCCACCTCTC TIrATCTTTT CTTGACCTCG GTGGCTGGGG TATCGGCATC GTGATT-GGCT CTTTGTCCAG TCTG1'CGTCA GGGCAATCG'T TTTGACCGCT CI'TTCTCTGT ATGCTGGCTG CATTCTGACC AAACCCTTTA CTTTTTCTGG CAAATGCT'TT AAAGCGTAGA TTAACT~ATAG TTTTCGATAG TGAGGCGACG CTAATGGTTT CAGGTATTCG CAAAATATGG 5820
GCCTAGTGTC
CTTAAGGAAA
GATTTTGTGC
CCATTTCCTG
CCATTAAGAA
ACCCCTCCAC
TGAGCGATAT
CGTACACCTG
TGGAAAGGAC
TAATGACCCA
GCACCGATAC
TTACGGGCAA
ACGTTTAGGG
GTCGTCGTGA
TCAAAGTTTA GGTATGGAAT TTTGAAGAAA GTCGCI'ACCG GGCTCAAAAA TATTGTTTTC AACCACAAAA TCCGT'IrGGT TTA'ITTTGA AACTTCTTTT GCAAGAACAA AGTTCCCAAG CGACTGCTGG CGTCACGAT1A TAGTCACGCA CATCTGGTAC AAATGAAAGC 5880 GGTTGGGAG 5940 AAACCCTCTC 6000 TACAGATTT'r 6060 TCTTTATCTA 6120 GTGCCATTAT 6180 GCAAGCGCAC 6240 TCTTTATrTT 6300 CCCTGACCAA 6360 CCCTCGTGGT 6420 GACACGTAAA 6480 CTTGATACTA 6540 ACTTACCTAG 6600 GAAACTTTGA 6660 TCCGTAATCA 6720 TTCCCAAGCG '6780 TGTGGCAGAA 6840 TGGTAGGTAA 6900 TTGAGCCATG 6960 AACCGCAGCT 7020 AGTTTCCCCA 7080 ACATCCCTTA 7140 AGCAACATAA 7200 GATGACGCCT 7260 ACCAGCATTG 7320 GTACATTGGC 7380 TGGTTTTGGA 7440 CCCAAATGAA 7500 TTTATCGATT 7560 GAGATGTAAA 'rTTCTCACGG CAAAGACAAT CACGTCTGGG AGTAGGCTTG AACATCCCAA TACGAGCTTC CAAACTTGGA AAACACCCTT AAACTCTTTT T'rTCAGGGTG ACCCACACCA CTGTACCGAT TGTGTAGTAA ACACGGTCCA GCATATGTTG CGGAAAGTCA CTGTCGCATT ACAGGGTTGT TGAGTTCAAT CCAGCTGCAT AACCTTCTAG TCAATATCCA TTGGGTGTCT CCGATAAACT CACCACGTTG ACCAAGTTTT CGATACGACC CCAT 'CACC
CGCGACGAAG
TAAAGCCATA
GTAAGCAGAG
GGCACCAAGC
AGTTTTITGAG
CTGTTTACGT CTGTTGTGAA AAGTCTACAT T'rGCCCAGTT T'rqT'TGTCAA TATCAATCGG TTTGAGAAGA ACTCAATGGT CCAACTGCAA GACCAGCAAG GTTATCGAAT GTTTCGATTG GAGTTGTTGT CCGACACCAC AGACAAACTT ATAAACCTC'r TGTTTTATT T1'TIrAGAT TTTCCTCTGT GTTCGTTTTC ACATI'AGAGA AGCAACTGAA CCTGTTGCGA AACCATTGTT GTATCATCAA AACATCGCCA GCTTTCACCT CATAGATACA GTATCAATAC ACCAAAAGCG TGCCCTGTTG CACGCCTTGG CTTGGTTTCA GTCATTGACA TCAGCAAGAG TTGAAGAGCT GCTGGCGCAA TGGAAATTGT GTTTTTTCTA TGTACCGCCC GCTTCCAAGC TTCT'rrATTA TAGCATAC T AAATCTTACT ATCTAATAAA GGATTGATTA GAT'rTTCACT CTGGAGCTAC TGAAGCGTAG GTCCAGCTGC AGCGA I-rr'G TATTACCTTG AGCAACTTTT CAACATGAA'r
GAAAGGCAAT
CAACGATACC
CGACAACATC
CTTCTTCTTT
CAAAACTTCA
TGAAACTTCA
TT1GTCCCATA
ACCGACGATA
TTCTTCAGCC
AAACATGTAA
TGGAAGTTGT
CAACGTTAAA GTTTTCATCA TTCCATATAA TTTGTCATC CGAAAGTCTA AATGTCTCTA AACGAACAAA CATGTCATTT TCGATCACAG CATCCCCCTr TCACCTGTAT TTGTAACGAT TTTGACTCAA ATGTTCCAAG GTTTCAAAAC CGTCACCGTT GCACCATTTC TTGTTTTCAA GCATCAGCTG GTGCATAGAC GCTCCACrrG AGAAGAC'rGG GGAGTTACAA GTGTTTCATT ACTTCAGCTC GTTTTGCAGC GTAAGAGCAA AACCAAGGGC CCCTTACCAA CATAAAGCAT AGTCCAAGGA TAGAAGCCAA- GGTTTACGGA AGCGCAAGTT GAAAGAGCAG CCGGGAAAGC GCA.ACAGTAG CAGCACCT'rG 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 TGCAGTTGCG TCTACT'rCAT Cr1'CGTAACC AAATGATACA GCTACCATAA GAAGGTATTG TGTACCAGGG ATGATGGTGA TACCA7TDACC AGTACCAGCA TCCACCACCG ATTGCACCAG CAATCAATGA AAGGAAGAAT CACCCCGAAG ATAGCAGGCT AAGTGTTTTC AGTTTTGGA'r AGCTGTCATA GCAGCTGTGA TTGCACTTCA AGCAAGTI'GA CCCACCAATC AAGAAACCAC CTGTAATACC TAGGAAGGCA M=TTGTTTT AACACCAACC TGATAGCGTT GAATGGGTTA GCATGGTCAG CAGCAAGTAA AGATGTGGTG CACACCTGAC ACGACGATCA ATTGGTGAAC CAAGACCAAA TGGCATGCTA AGAATCGCT'r TTGTAGCAAT AAGGATGTAG TTTTCAACAA CGTGGAAAAC TGGTCCAATG ACAAAGAGTC CAAGGATAGA CATGACCAAA AGTGTCACGA ATGGTGTTAC CAAGAGGTCA ATGACATCTG GAACAACTTG CGGACAGCTT TTTCAAATTT AGCTCCGACA ACCCCGATGA TGAAGGCTGG AAGAACGGAA CCTTGCAAAC CAACAACAGG GATGAAACCA AAGAAGTTCA TCGCTrGTTAC TTCACCACCT TGAGCAACTG CCCAAGCGTT TGGAAGTGAG CCAGAGACAA GCATCATACC AAGAACGATA CCAACGGCAG GATTTCCACC AAATACACGG AAGGTTGACC ACACAACCAA ACCTGGCAAG ATGATGAAGG CTGTATCTGT CAAGATTTGT GTGTAAGTTG CAAAGTCACC TGGAAGTGGC 324 ATTTCAAGAG CGTTGAAAAG ACCACGCACA CCCATGAAGA GACCTGTCGC TACGATAACT GGGATGATTG GAACGAAAAC ATCACCAAAA GTACGGATAG CACGTTGGAA CCAGTTCCCT TGTTrAGCAA CTrCTGCTTT CATGTCATCC TTAGATGATG TTGGTAATCC AAGTACAACA ACTrCATCGT ACATTTTGTT AACTGTACCT GTACCAAAGA TAAT'rTGGTA TTGCCCTGAG TTAAAGAAAG CACCTTGAAC TTrTTTCCAAG TTCTCAATCA TfCTTTATT GATTTTCTCT TCATCT'ITGA CCZATGACACG TAGACGAGTC GCACAGTGGG CAACACrATT GACATTTrCA CGTCCGC!CCA AGGCATCGAT GACTTTTTTT GCAATTT-CCT GATTGTTCAT T'rGCAAAAAT CTCCT-rATAT AACATTTTGT TCTTGTT TGA AAGCGATTTT ATTCGCCGG INFORMATION FOR SEQ ID NO: 31: 
SEQUENCE CHARACTERISTICS: LENGTH: 3149 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: CGCTTGAGTG CTAATTCATA GT1'CTATTGT ATCACTTGGT CAGAAATAAT CAAGAAAAAA GTCTGACTTT CTCAAGATAA AAAGCCTGAG ACCAACTCAG ACTTTTTAAT TCTTAAAATG GCAATTCTTC CTCTTCCAAG ACCAAATCTG CCAAATCTTG GCCTGCATTA TTTTCACGCA TAGCACGTTG GGCACGACTT TCCA.AGAGTT GGAATCCTGT GACAAGTACT TCGGTCACGT AGTTCATTTG GCCATTTTTC TCAAAGCGAC GGGTACGCAA TTCTCCATCA ACGGAAATGA GACTACCTTT GGTTGCGTAC TTGCCAAAGT TTCTGCTAGT CTGCCCCATA GGACCATATT 9360 9420 9480 9540 9600 9660 9720 9769 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 GACAAAATCA GCTTCACGTT CACCGTTTTG GTCTTTGTAA TGCTCGCGCT ACCGACTTGT CATTGTTGGT TTTGTGCAAT TCCAATCAAG ATAACTTAT TATACATATT TTCTTCCTCC ATCAAAAAAA GTTACAGAAA TTTGTAACTT TTCGAGAAAA ATGAAACCTG TCGCCTGTTG ATTGGCCATA ATGGTCATAT GGTTGACTAG TCACATAGAC TACTGTATCT GCAATATCCT CCTTGGTAAA CGGACGCAGC TCGTTCTTTA TCACCATGAA GTTTCGACAA TTCCAGGCTG AATGGTCGTC ACCTTGATAT CGCAGTCCAT CTGAAAAGGT CTTAACTGCC GCCTTGGTGG GCATAGGCAT AAATTCCTGC GGTTGACCCC ATATTGATAA CGACGGTTCA CAGCGATAGT TCTGGTGTAG ACGTTAAACG TACTTATCTA TTCGTAGGAA TTTrIrTATTT TTTATGAACC CTGTAATCTG AACACGACGA GAGCTTGCAA AGCTTCTATT AACGCACTGT AGAAAAATCT CCGTTGCGAT GGTATCAATT CTGAGTAAAC AGCTGCACCA TATGACCTTG ATTGGCTTTT ACCAT'rGCTG CAAGAAACA GCGAGTGAC'r ATGGTCAGCA TATCCAACTC TTCATAGTCT GCG'1TATTGA CCAGGATGTC AATCTGACCT TTTACCATTG TCATATCCGT GACATCTAGG GTTCTGCAA ACTCCGCCTT AAGAGCTTCT ACATCCTCAC CCTGCTCCAG ATAACCACC CCTGTAATCA CAACAT'N'TT CAACTTCTTA GGCAGTCCAG CTGATAATTC AAGCACCCCA TGCACATCAT ACCAAAACTC TTGGCATCAT AGCTCCAGGA CTCCTGCAAC GATTTG'rACA GGTCACTATC TGGATGGGCT TATCATTAAC AATTTCTTCT
TGCCATCTTA
TGT'rTCGCTG CGT7TTTTGTG
TTTTCACCAC
AGCGCGACA.A
GTCTTATCAC
ACCATCTCAA
GTAAAACCTT
325 GCCATCAAAC CITrTGACATT GGTATCC TGA'rAGGGAG CTAAGCCAAG AGCCAGTCC'r ATCGTTTCTA AAATATCAGA GCAGACAGTC AGAAAAGTCC AAACTGTTTG A'T'rGGAAAA AGTCTGTCTA TCCGTCGTCC TGTTAGAACG GCAATCGCTT CACCGATTCC TGATCTCGCT ?TTCCTTCTA GCTGGTCTAT CAGATATTAA GGTCGAACGG TGTTCCGACA ACTTGGTCT'r GACCATTTGG CAGATGCAAT TCACGAGGAC GAAGTTCACC TGGGAAAA'rG AGATTCCCTT TGGT'r'r'CAA CCCCACACGC GCATTGGGAG TTGCGACTGC AACTTGGCAG ATGTTGAGGT CAATTTrCACC TACAACAAAC TTAGGTTCCT CCGCCTGCAA CrCTTGGT 'C AAACGAGCGA CTTGCTCATC TGTCAAAAAG ACTTGACCGC GCTCTGCAAT TTCAAATAAA CTTGAAACTT 'ZGAAAATATT CCAAGCCACT-GTTTCCCCAT TATCTTTGAG AAAAACACGG GCTACCTTGC 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220' 2280 2340 2400 2460 2520 2580 2640 2700 CTTTGCGCTC C.ACATCCAGT TTGGCA'rCTC CGCTA'T'r'rT CACGATGACC CACCGACATG TTCTT'rATTA TATGTAAAAA TCATTGTTTrC CTTTTTCTCC
ATA.AGGACAT
TATTTCAGTC
CTGCTAAAAA GTCATTGAT'r TGTTGCTTGC TTTTACGGTC -TTTCCTTGTC CTTTCTAGA ACAACAAGGC TAGGAATTCC CCAAATCCAT ATACTGATCT CGGTCCATTC GAATAAAGGT CAATCTCTGG TAAGGCAGGA TAAATATAAC GACAATCGCT TGAAGACCT'r CTTGCCCGCT TTTTCCACTA AAGATGC1'AA GTA'rCATAAG ACTTCCTCCT CATAGACTAG GTCTTCATTT ACGGCCATCC TCAAAAATGA CGCCACCAAC CAAGCTCTCC AACATAAAGG GTCGCAATTT CCCCCATGTC GGAAAAATGG CTCTTCCTGA GTCTTCATGA GCTTACGGTC ATCTGCAACT GCTTCCGATA CCTAGCAGAG CCAAGCCTGC CATCCACATT CATTTTAAcA CAAAAAAGGC TTCAGGACAA ATGAGGAAGC GCGATTGACA AAACGACCGA GTAAACATCC CAGAGTTTGG GAACTCTGGA TTGGTCTCCT ACACCAGTCT GCCACAAAAA T'rCTTCTAAA CTTGCTGGCT TCATAGACAA AGGTATAATG AGACTGCTTT CGTAAACTTG TCTCGCACAA TCTCTGTCAA TTTTTCGTAG CAAGAGCAAG TTTTTAGCTT TCATACCAT'r AGCAGAAAAG CAAGTAAAAA 326 GCCTCTPTCCT TTAAGGAAAA GGACTrCTrA TACTCAATGA AAATCAAAGA AAGCTAGCCG CAGGCTGCTC AAAGCACTGC TrTGAGGTrG TAGATAGAAC CrCAAAACAC TGTTTTGAGG TTGTGGATGA AGCTGACGTG GTTrTGAAGAG AGTATTATTC TTATrGCCAG GCACCTAAGT TGCCAACGTA GTAACrATCA TATTGCGAGC ATCTTACCTG ATGAAGCCAG ATAATACTAC TTGCCATTGT ATCATTCGCA ATCATGGAAC CAGAAGAACT TACATAATAC CATrCTCCCT CCAAGTACTG ACTTTCATGG TPCCTGAGCA AT'rAAAGGCA AAAAAACTGT TCGTTTTTTA AAAGCATTTG ACACTACAT
CCAAACTAGG
TGACGAgTCa
ATTTTCGAAG
GGTGTGTAGG
CTTTGACCCA
TGTCATAAAC
CCAATAACAT
2760 2820 2880 2940 3000 3060 3120 3149
INFORMATION FOR SEQ ID NO: 32:
SEQUENCE CHARACTERISTICS:
LENGTH: 10240 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CCAAAAATTC AACCTTTAAG GGGAGTCCAG AGAGACTCAC AAGGTGTCAG ATAAAAGAAT GGTGCAATTT TCTAGAGGAG GAGGCCTTGC ACTAGCAAGG AGTTATGAAT CCCACATGTA GGTTGCACAA GAGGAAATCG GAAGCCATGC GAGCAGGCCA CGTCGTCCTA TTTCAATTTC
ACTTTTTGAG
TCTT'rTCTTT
AGAAGCGTTT
CGCCACAATC
ATTTCTTCAT
GTCTATTGAC
TGTGCTCTCT
ATCTGGTCCC
GGGTGTCATT
TTTGAATTAG
CTGCGTGTAC
AAGGCAAACA
TCAACCTTAA
TTGTCTGACC
TGTGTTGTAC GATTTTAACT CTTAAAATTT AAGGAGGAAA CGGTTGGAAA CCATGAAGGT TCCTAGAAGG AGAAATGGTT CGGACGATGC CCATCTCTTA AGCAGTGTCA CCTCATTTAT GTCAGGGAGA CACTCTTGAT TTGATGAGCA GAATCAGGTT CGGATTGACG GAGCTGGGAC TGCAATTTTT GTGATGGGGC CTCAGGGAAA TGG'I-r?1GAC CTCCTTGTTG GTGGTGGGAT GAACGTGGAG TGAAAGTAGT AAAACGGAAT TGGCTCAGTA.
ATCAAGGGAA ATGTTTCCGT TCGTGTGGGG CTCCAGGAAT GCCTATTTAT CTCTGGAATC CTAAAAGTAC CAGAAAACGA
TGGTGTTCCA.
GACAGTCCTC
TGGTCAGGTC
TGTTATCAAT
GATGAAGTAT
CCCTTGCTTG AGGTGGCCAA GGAATTGCAT GGTTTTGCTA. ATAAGGATGC TGTrATTTTG
TTTGTAACGA
GATTTAGACA
ATCAATCAAA
CAGATGATGG TTCTTATGGC GTCAGTTTGA TGCTGTTTAC CCTTTGATGA TCACCCAAGA GAGCTTGCTA TGCCTGTGTT GTGAAGATGG TCCTGT1'TTC TCGTATGGCT TGTGGGATGG GACGGTCAGC CAACGCGTCT CGCACAGGAA CAGTTGTA'rr ATAAGGAGAA AATTATGACT ACAAATCGAT TATI'CCAGC-A TCAGCCTGTT TCTACCTG TTGGATTGA ACAAGAGTAT GCCAAGTACT AACCCTTGAA CCACG?1'?rG GCTCAATGCA A?1'GGCTTGC TTGCCTGGAA AGAGAATATC ACAAGAGTAT GCAGCTGTT1 GCTCAATATT TCTGTCCCA TCCAGATTTG GCTTATGATG TGTCAAATTA ACCCCGAGTG GGGAGCAAGT GGCT1TGACCA TAGAAAACCA ATCTTGGCCA
AAAATCCCAT
ATGATTTAGA
GGAATCCAAC
AAAATCCTCG
CAAArC~TTCC
CTCATGGGAT
CCT=~AGGT
TCCAAGAGTG
TT'rAGAGGTT
TATTATTGCCC
TT1CCAAGGCA
TCTATTATGA
GCAGAGACGC
GT~?rGGCTG TACAAG'N'Tc
TTGGCTTTGG
TCAAGGCGAC
CTGCTGGTAT
AAAAGCTACC
ATGTTGACCA CTGTAATCAT TGGTGAAAGC AGCTGTGGAA TCACCGATAT CGTTACTGTC TGATCAATAC TCTGGTTGGA ATGGAACAGG TGGAATGTCT AATGTAGCTG GTTTTTCAAA ACTA.ATGTAA AAGCTATCGA GGACTTPTGA TTGGTCAAGA GCCTCAGALAG TGCCAGTTTA GCAAAAGCTG CAGAAGA'rGC ATGCGCTTTG ACCTCAAAAC GGTCCAGCAG TCTTTCCAGT CTGCCTATCA TTGGAATGGG GCTGGGGCAT CTGCTATCGG GACATCATCG AAAA'rTTACC CTCCGTCAGG AAGTAAAAGA TATTAGTT'rG TAATATGAAT
AGCCCTCAAA CTCATCCGCC AAGTTGCCCA AACAACAGAC AGGAGTGGAT TCGGCTGAAG CTGCCCTAGA AATGTATCTG AGTrGAACA GCTAACTTTA CCAATCCTTA TGCCTGCCCT AAAACTCATG GATAAATACG GTATTAGCAG TCTGGAAGAA GTCTCTGAGG TAAACTGCAA TCAA~TCTGTT CTTGATTTTT 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700
TTAGGAGAAT
GTAAGATTTA
AGTGATGGTA
GAGTGGGTTT
GCTGAAAATG
ATGGCCAAAT
AAGATGAAAA
ATAGAAGACT
TTTGGTACAA TAAAATAAAT AAGAACAGAG GAAGAAGGTT AATGA-AGAAA TTTTT'rTAGC TCTGCTATTT TTCTTAGCTA GTCCAGAGGG TGCAATGGCT CTTGGCA.AGG AAAACAGTAT CTGAAAGAAG ATGGCAGTCA AGCAGCAAAT TTGATACTCA TTATCAATCT .TGGT'rCTATA TAAAAGCAGA TGCTAACTAT AATGGCTAAA. GCAAGGTGAC GACTATTTTT ACCTCAAATC TGGTGGCTAT CAGAATGGGT AGAAGACAAG GGAGCCTTTT ATTATCTTGA CCAAGATGGA GAAATGCTrG GGTAGGAACT TCCTATGTTG GTGCAACAGG TGCCAAAGTA GGGTCTATGA TTCTCAATAC GATGCTTGGT TrTATATCAA AGCAGATGGA CAGC-ACGCAG AGAAAGAATG GCTCCAAATT AAAGGGAAGG ACTATTA'I1?- GGTTATCTAC TGACAAGTCA GTGGATTAAT CAAGCTTATG TGAATGCTAG
CAAATCCGGT
TGGTGCCAAA
GTACAGCAAG GTTGGCT=r TGACAAACAA TACCAATCTT GGTTTTACA'r CAAAGAAAAT GGAAACTATG CTGATAAAGA ATGGATTTC GAGAA'rCCTC ACTAT'rAT'rA TCTAAALATC 328 GGTGGyTACA TCGCAGCCAA TYTGATGGGA AAATrGCTGA TACTTCAAAT CCGGTGGTTA TTTTACCTCA AATCTGATGG CAAGCTTGG'r ACTACTTCAA TATCAGCTTG GAAGCGATGG TACTATCAAG TAGTGCCTGT TATATATCGC AAGCTAGTGT TTGGCTATTA C'TATTTCTGG GATGCTAGTA AGGACTTTAT GTGGCTCAGA ATGCTAGTAT AAATATTATT CGGCAGATGG TGAATGGATT TGGGATAAGG AAAAGAATGG GTCTACGATT cATGACAGCC AATGAATGGA GAAAATAGCT GAAAAAGAAT ATCTGGTGGC TACATGGCCA TAAATGGCT1T GGAGGAAAAA TACAGCCAAT GTTTATGATT CGTATGGCTA GATAAGGA'rA T'rTGTCAGGC TATATGAAAA CCCTTATTAT GAGAGTGATG CCCAGTAGCT TCTCA'rCI- CCTGCATTTT GATGGTTTTA AATCTTGGTT TTrATCTCAAA CTCATACTCA AGCTTGGTAC TTTGGGATAA GGAATCTTGG GGGTC'rACGA TTCTCATAGT AAAATGAGAC AGTAGATGGT CTACAAATGA AAATGC'rGCT CAGATGGTGA AAAGCTTTCC GAAAAAGTGA TGACAAGCGC CAGAAGATTT ACAAGCGCTA GCCACCGTTT TTATCACTAT CTGATATGGA AGTAGGCAAG AGCTTGAGAA TCCCTTCCTT AATTGGATAA GGTATTTAGT CTACT IrTAA GGAAGCCGAA GTGCCCTAGA AAGTAArCTGG
TTCAAAGATT
'rTGCTAAACA
GAACATTACC
TAACAGAGGC TACAAACTAC AGTGCTGAAG TTAACAATAG CCTTTTGGAG AACAAGGGCG ATATCAATGC TCTTTATCTC CT'rGCCCATA 2760 2820 *2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 1720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 GGAAGAAGTA AAATTGCCAA AGATAAGAAT AATTTCTTTG GCATTACAGC CTATGA'TACG o ACCCCTTACC TTTCTGCTAA CAcATTTrGAT AAGTGGATTA AGGAAAATTA TATCGATAGG GGTATGAATG TGGAATATGC TTCACACCCT GATGTGGATA AGGGAATTTT GGAAGAACTT TCCTTGGAAA TATTGGGGCG AAAAAATTGC ATCAAAATCA ATGAGAAGCT TGAATAGTAA GTTAAAAATC TTTTTTGGAG TCTGGTGTGA TTTTCCACCA GT'rGGTTTAT CGACAGGCGC CC'rTGAGAGA AGATGGGTCA AGATTTCCTG CCAAAGTTCC TCCTTGCGGG TCCGACATAG ATGATGGTTT CTGGATTTTC TCCCGTTGTC TTCAGCGATT TCTTCCAGTC AATAG'rGGCT TTGGTTTCT AGGTGGCAAA GATTAGTACT ATAAGTGAAT CTGATTTCAA GTAAAATCAG GATTTTTTCA CGCGGAGGGT CTTTTGTCCT GTGTAAGTGA TGAGTTTTTT GACTTCAATC ATATCTACCT AGTAGGCAGC TAACTCTGCT GCGTCTGTCT AGATGACAAC ATGCCT'rCCA GGAATGTCCT CCAT'N'TAAA GGTCAATTCC TCATTTTGAA TGCCATCGCT TOCTAGATAT TGTTCTAGTT TTCTGCGGAT AAAACCTGTT TGAATCAATT
AGGTGCAACC
CAAGGCTTCT
TAGTGTGATG
ATGATTTGAG
TGGATGCAAT
CAAAGCCGGG
GCAccAGAr
TGACTGCATC
TAGCATGGAA
GATTGTTTCG
TTTTGCGTTT
CTTCACGGAT
CAGCTTGGTT GAGGACGGTT 'rCTACACTTT CCAGATAGAG CAATCAAATC AGTCAAGTAT TTGACAGCTT CrTTGAGTTT 329 CTGATACCGT TAAAATAGC GT'rGGGCATT CTGGTTGGGA AAPCATGATA GGT-rGGTTGG TATAGTAGTT1 GTCTAGGATA CACTGGTGG AGGAAGGTTG TCAGCAATTC TCCTTTTTGA GTCAGAGCCT TATCAAGCGC ACCTGGTCTT GGTCGTTACG CGAAATTCTT CAGCGTTGTC TGTCGCCAGT AACTCTTTTT TTCAACACGA CGAATCAGTT CTTATAGTAG GTGTCCAACA AAAAGGAACT GGACTGAAGG ATTTCGGAAA GCGGAAAGTT GCGTCCCAGA CCTTGAAAGA CA'TTTCAAAG AGCT'rTTCAT AGCGATATAG GTCGATCCTG TTTGATAACT TCGAGGATT CCCCATAATT TCGATAATCA AACTGTAATT TCCACAA'rAC CTGCAAATAC TT'rCTCAAAA CCTGTTrTTrT GAGTTTGTGT CGGTTTTTCT GAAGTTCAT-r CACTGGCCTG CTGTTTGACG CGGTCGCGCT CAGCCTTATC AATCAGAAAG ATrTTCAAAA GGCTCTCCCA AAGTCTCAGT CAAGCATGGC T'rGGTIT'r TTTCACTAAC CAGTATCCTT' TCCAATTCAT GGCTTTGAAG ATTTTTTGCT GTTAGTTCTT CCTTGATAGT AAAAGGATTG AGAGA'TTTG GAAGTAAGGT GCGGTAGCTA ?1'TTGTGAAA
CCTGATTTGC
GATTGAAAAA
TTGCCGTATC
GGGTTTGCAG
TACTTGGCGG
AGCCGACGTG
TATGACTGCT
AGGTAGCCTG
GG'rCATTTTC
CCATGATAAA
CGTT'rCGCTC AGCTGAATGC GTCCAAAAAC TTGGCGATTG CTGCGGATTT GCAAGACCAA GATGCGACCA TTCACTAATT CGCTTCGCAA GTCAAATGAC ATCGTTCTCT CCTTGTGATT AATAAAATCA TGGAAATGTG GTATAATAAA A'rAT'rGAAAT GGTAGATGAA ACTGGTCAAG AAATT'rTGGA ATTTGCAGCC CAAAAATTAG TTGTGACCAA TGAGCGTAGT CATGAACTTA CAGATGTCAT CAGCCTTGAG TATAAACCAG.
TTrTATCGACC AGTAGAATA'r TACTGTCTTT GATATGGTCT CCAATCTCGT T'rTTATTGGA CACTTGCTCA ATCGACTCAA TCAGGGCCCC GGTAGAAGGT TGAGCTGGAT TTTCAAAAGT TCGATGGGCA GAA.AGGAGCA GGCGATGGCT CTCTTG'IrCA AAAGGCTCAT TGATTTTCTG TTCCTCAACT ATGTGGTGTA AAAAAAATCC GTAT'rCCATA GTATTATATC AAAAAGGTAG GCCAAGTAAA GAGAAACGAG AAGCACA'rGT 1TTCAAAAGA AATGTTGCAA CAAACCCAAG GAAAAGAAGA CAAGGAGATG GCAGTCACTT P.TCTGGAGTA CCGTAACACC GACCGTCCGA ATTGGAAAT TGCCTTTGAC GAAGAGGATT TGTCTGAGTT TGATGCCTAT ATTGGGGAAT ALGGCCGAAGA ATATGGTCAC AGCTTTGAGC L'TTTACATAT TAACGGCTrAT GATCACTACA 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240
TGCTTGAAAA
TGTTCATCTC
GTGAGATGGG
T'rCAGAATTG
TATCGATAAG
CTTC'rTGGCA
GCAGAGATGA'
GCTCATGAGC
GTACACGGCT
CTCCGGAAGA AGAAGCGGAG ATGTTCGGTT TACAAGAAGA AAT'rTTGACA GCCTATGGAC TCACAAGACA ATAAACGAAA ATGGAAAAAT CGTGACTTGA TATCCAGTr AGAA'I'1'GCT 330 TTGACAGGTA TTTTTACTGC TATCAAGGAA GAACGCAATA GCTCTAGTGG TCATCCTTGC AGGTTTTGTT TTTCAGGTGT CTCCTAT-rGA GTA~rTTT GGTAGrAGCC TTTGAGATTA GTGGTGGATr TGGCCAGTCA CTATCACTTrT 'rCCATGCTGG GCGGCCGGCG CGGTATTAGT GGTTTCTCTT TTCGCAGCCT CTCCCACGAA TCTGGGATTT ATTAITTTTAA ACAGTAAGAG AGGCTTTGTA GCCATTTTAG GACGTCCCAA TGTTGGGAAG TATGGGGCAA AAGATTGCCA TCATGAGTGA CAAGGCGCAG TGCGAAAACA CGCAGTGACG CACGAATCGA ATGGCTCT'TT 'rCAACTCTGC TATTGAAAA'r CTAAAAATGC CAAGGATATG TAACAGGCGC A'rTGATTTTT GAAATTATGA CTTNTAAATC TCAACCTT'rT TAAATCACGT ACAACGCGCA ATAAAATCAT GGGAATTTAC ACGACTGATA AGGAGCAAAT GCCTAAAACA GCTCTCGGAG ATTTCATGGT GGACACTGTT CTTTTCATGG TGCCTGCTGA TATCGAGCGT CTCAAGGCTG CCAAGGT'rCC GGTCCATCCA GACCAGCTCT 'rGTCTCAGAT GGAAATTGTT CCAATCTCAG CCCTTCAGGG GAGTGAAAAT CTGGATGAAG GTTTCCAATA AGAACGTTTC TTGGTTTCAG AAATGGTTCG GATTCCGCAT 'rC'GTAGCAG TAGTTGTTGA GGTTCACATC CGTGCAACCA TCATGG'rCGA TAAAGGTGGC GCTATGCTTA AGAAAATCGG GCTAGGAGAc AAGGTCTTCC TAGAAACCTG AAAGCTAGAT TTGGCTGACT TTGGCTATAA TGTCTT'rATC GACACACCAG GGATTCACAA TGAGTCTGCC TACAGTACCC TTCGCGAAGT TGAAGCGCGT GGTAAGGGGG ACCATATGAT TGTGATTTTG GTGGTGAATA AAATCGATAA TGATGACTTC CGTAATCAAA TGGACTTTAA AAATAACGTG TCTCGTCTAG TGGATATTTT TTTCCCGTCT GATCAAATCA CAGACCATCC 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920
CGAGAAAGTC
CTCTATGAAA
GCGCGATAGC
TAGCATGGCC
GGTCAAGGTC
TGAAAGAGAA
ATGCCTGCTT CTTGTT'rTrA CAGAAGGAGq, ACTTATGCCT CGTTTGTCGT GGCTTAGAAA AATTGATTAT AGGAAAGAAG CTACCCCAAG ATGATTAAGA CGGATTTGGA AGAGTTTCAA TATCGAGTCA A'rGGGACGTC GTGGAAAATA TTTGCT'TTTT GATTTCCCAT TTGCGGATGG AGGGCAAGTA TTTTTACTAT CAAGCATGCC CATGTTTTCT TTCATTTTGA AGATGGTGGC TCGCAAGTT'r GGAACCATGG AACTCTTGGT GCCTGACCTT TTGCACCTAA CTCGTGAAGA
CGAGACGAAG'AGACAGACAA
CAAAAAGGGA TTATCATCG CGTCGTGATA TCGAACTCAT AAGAAAAACT GGCGCGATAA TACTAAGTAG AGGTAGGCTC GAATTACCTG AGGTTGAAAC ATTTCGAGTA TAGAAATTCG AGGGAATTGC CTAGTCAGAT TATCTGACAG ACAAGGTCTT CCAGACCAAG GACCTGAACG ACGCTTGTTT ATGAGGATGT TTAGACGTCT ACTTTATTTC
TAAAAAATTA
CCTTGCCAAG
GGTCCTGAAC CA.AGCGAACA AGACTTTGAT TTACAGGTCT TTCAATCTGC TCCAAAAAGC CTATCAAATC CCATCTCC'rA GACCAGACCT TGGTAGCTGG 331 ACTTGGCAAT ATC'rATGTGG TTCCCAGACT TTGACAGCAG GGGCCAGGCT GTTGAAAAAG AGATGGAAGC ATGCAGGACT CTGTGGTACC ATCATTGAGA CTGTCAA.AGG AGGGACTGAT AAGTCAACTG TGACAAATTT GTCGTCCACC AACTACAGAA GGGCAAGAAA TCATTCTTGA TT'rTCAAATC CTGATGAACG GAACTGGCTA CT'rTGAGAGA CCCCTACTTT TTGAGCAGGA GACCGAGATG CCCAAGTGGA GAGTCTCGTC TGGCAGCCCA CTTGATAATA ATGGCAATCA ATGAGGI-rCT CTGGCGAGCT CAGGTTCATC CAGCTAGACC AAGAAGCGAC TGCCATTCA'r GACCAGACCA T'rGCTGTr'rs GTGGCTCCAC CATTCGGACT TATACCAATG CCTTTGGGGA TTCATCAGGT CTATGATAAG ACTGGTCAAG AA'rGTGTACG AAATTCAACT AGGCGGACGT GGAACCCACT TTTGTCCAAA GGGAAAAATC ATCGGAATCA CTGGGGGAAT TGCCTrCTGGT TCTAAGACAG CAAGGCTr'rC AAGTAGTGGA TGCCGACGCA ACCTGGTGGT CGTCTGTTTG AGGCTCTAGT AAACGGAGAA CTCAATCGCC CTCTCCTAGC AGAATGGTCT AAGCAAATTC AAGGGGAGAT ACAGTTGGCT CAGACAGAAG AGATTTTCTT CTACAGCGAT TGGTTTGCTG AGACTTGGTT ACGCTTAATG AAAAGGGACC AGTTGTCCAA GTGGCCTTTA GAAAAAAAGA AAGATTTGGC GAACCAGCTT CTTAATCAAG TGCATATCCT GGTAGGCA-AG ATGACA 'GAGA TAACTGGAA GGATAATCTG CGCATTGCCT TTTTCTGACA GGAGCCAGTA TTTCTTTGGT TGTACCT'r ATGCCCATCT TCTAGGTGTA GGGAGTCAGC AAGTCGCTTT 'rTATGCAGGC TTAGCAATTT TATTTCCGCG GCGCTCTI'T CTCCTATTTG GGGTATTCTT GC'rGACAAAT ACCCATGATG ATTCGGGCAG GTCTTGCTAT GACTATCACT ATGGGAGGCP CCCAAATATC TATTGGTTAA TCTTTCTTCG TTTACTAAAC GGTGTATTTG
ACAGCACTTT
TAGTCTCATC
TATCCGTGAG
CATGGATATT
GGTC'rATGTG
AGATGAAGCT
CAGCCAGGTT
TCTTGAGGGA
GGTTTGGTAA
TCGTGGAAAA
CTGTCTCTGC
ACGGCCGAAA
TGGCCTTTGT
CAGGTTTTGT
GCTCTGCCTT
T'rGGTGGCI-r
TTCTATTITTT
8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 TCCTAATGCA ACGGCAC1TGA AGGTACTTTG TCTACAGGCG TATCGCAGAA TTATTTGGCA AGCTGCTATT 'rTGACTATTT GGCTATTCCA ACAAAGGAAT CTTrTTTAACC AGTTTI'GTCA TTATGTACGC GACTTAGGGC CAGTATGGGC TT'rTCCAGCA TAGCCAGTCA GGTTCCAAAG TAGTTGCAGG TACTCTAACT TTCGTACAGT TTTCTTACTG
GAGAAATCAG
GGTCCCTTTA
GTTGGTAGTT
GCTTTATCAA GGAAGA'TTT CAACCAGTAG CCAAGGAAAA TATTTACCTC GGTTAAATAT CCCTATCTTT TGCTCAATCT TCCAATTTTC AGCTCAATCG AGACAGAGAA TCTTCTTT TGATGAGTGC AGGAGTCATG
ATTGGCCCTA
GTCTCTGGT'r
GGCAAGCTAG
TTTTGGCTCT
TGATTGTGTC
GTGACAAGGT
GGGCAATCAT CGTCTCTTGG TTGTCGCCCA GTTATTCA GTCATCATCT ATCTCCTCTG TGCCA.ATGCC TCTAGCCCCC TTCAACTAGG ACTCTATCGr TrCCTCTTrG GATGGGAAC CGGTGCCTTG ATTCCCGGGG TTAATGCCCT ACTCAGCAAA ATGACTCCCA AAGCCGGCAT TTCGAGGGTC TT'rGCCTTCA ATCAGGTATT Cr=ATCrG GGAGGTGTrG TrGGTCCCAT GGCAGGTTCT GCAGTAGCAG GTCAATTTGG CTACCATGC'r GrCr'1TTATG CGACAAGCCT TTGTGTrGCC TT'rAGTTGTC TCTTTAACCT GATTCAATTr CGAACATTAT TAAAAGTAAA GGAAATCTAG TGCGAGTAAA AATCAATCTC AAATGCTCCT CT'rGTGGCAG TATCAATTAC CTAACCAGTA AAAATTCAAA AACCCATCCA GACAgATTGA
9840 9900 10020 10080 10140 10200 10240
INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 13206 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CGCTTTATCG
TTGGCGATAA
GTATCGCCTA
TAACAATCAA
GACGCAAGCA
TTTATTCTAG
TGGACGTGGT CAAGCCGAGA AACGGATAGT TCAACCTTAA CAATCTCTAT CTTTTTCTCA ACGCTTCCGC CATCTTTTTC GCTCCTCAAA T'GTCTAGTC GATTAGAAAA GTCAACCTGA GAAAAGCGTC GTTAATGATG CATTAAAGAA CAAGGGAGGA GTCTGCCCTT TTGAGGAAAT GAAAGTAAAA CTGACCTCAT GAGGAGGAAG AGTTACATCT AGTTGAGAGA GGTATGAATG CAGGAGAATA GTAACGATTT TTTCCTTTTT ATTTAATCAT GTACCTAATA TTAGAATTGT AGCTATATCC TTGTTTTCTA AGTTCATAGA ATTTCATCAA GGAGATGAAG GAGGGATTTT TCAAAAACGA AGTTCGTATG ATGATGAGCT AACATCTAGC TGGAGGTGAC TTCCAAACTT TI'CACGTGGT GGGAAAATGT GTTCGAACAG TCTATGCCTA TTCCGAATTG TNTTCAGCAC ATCTTCCTGT TCCT'rATGAA CCACCTAGAA CAGTCGAGAT GAAAAAATCG TGTGACGCAC CTAGCGAGGA AAAACGATAC TGGAACAGCA AAAGTGGCTC ATGAGGTCAG GGGTTTTGTA ATTTGGGATT AATCATTTCT TGTTTTAAAT TGACGAACTC TATTCCGTAA CGATCAATCA TTATCCCAAA TTTATTTGAA AGCTTCTCTA TCTGAACTTT ATCATCATAA GTTAGTTTCA TAATAAAAAC ACCCCAAAAG TTAGATTTTT TCTGTCTAAC TTTGGGGGG CAGTTCATTC AACACCTGAT ACTATGCGTT TTrCTTATTT GAAATACTT'r TTACTCAACC TCT'rTATACT CAATGAAAAT CAA.AGTGCAA ACTAGAAAGC TAGCCTCAGG CTGCTCAAAA CAGTGTTTTG AGGT'rGCAGA TGGAACCTGA CGTGGTT'rGA AGAGATTTTC GAAGAGTATT ACTTAATCTr 02 1020 CTTGATAC'r?
CGGTAGAATT
TGACTAAGAA TAAATCCTAC TCTGGGAGGG CTGCTGCCCA AATCATCCCT ACCATATT'rT GCATAAAATT GCCTCCTACC ATGGCAATAG TTGCTAAAAT TCCTGCGAAA AATCCCTGCA AGCCATGGTT GCCTGA'rAAG AG'?CAATCA AGAAACTTGC ACCAAAGTAA AAGGCCGCAA AGAAGACACC TGTTGGGATT TTTAAGAAAT AACCTAGAAC GCCATTCA'rC AAAGCAGAAC AAGGCCTAAC CACTGACTTT GACCAAGCTA AAGAACATCC TAGTCCTCCG ACTACCGCTC AGCATCTAAA AGAGTTAGAA CACAGAAAGG CGTTAATA ATTGTCTT'AC TCCATACTGA T'rTCTACTGC TGGCAAAAGT AGGTACAACs TGCACCAGCA TCrGTCCATC ATAAAAGACA
CCAAGGCGTA
TTCCTTTAAA
ACTGAGGGTA
CTI'CACGACT
TTCCTGTAGG
GGCATACAAG
TCTGCT'rGTG TTATCACCT'r
TTTTGGCCTT
TCCACAGCC'r GGCGATTTTA GTTG'N'?TTG CAATAGCACG A'rAAACGAAA TAACCAGGTG ACTGGCAATG GGATAACTGG ATTTTCTAGG TGTCCTGACT AAGACGATTG TTTCTGCGCT GCAGTTTTCA TGCTTCTGGG AGAT'rAGGCG GCAGAGCTCA C'rACAGCTA CACAGGTACT CCTGGGCGTT AGGGAGAAGA CCAATCTTAA TTGAAAAATG GTATCATCAG ACAAGTCACT GCTACAA.ACC T'rTGCTTCAT GCCT'rAGAGC CTAGAGsCAA
ATAGTAAAGG
CCTCCCTTGA TAATGACTG L TGTCTTCCAA GGTTrTAATT GGCGCTCCTA AATCATGCAA TCCTGACCGG ATAATAATTC TAATCACACT GACATAAGGG AAAAAGCGAA TCAACTCTTG CATCATGCGT TTCCT'rGCAG ACCAAGACAG GATCCAACAC GTTTGATAAA GTCCAAGGCC TTCTCAGCCA CGCTGACAG'r TTCCCCCAAA TTCCACATCA CGCAAGCTAT CTAATTCATG TTGGAAAGAC TTCAAATCCT TTTTCTGTCA AGGCTGTCAA CATGCAAGCC GTTCAAdGTA TAGGTAGCCA AATCAGCTGA TATCATTTCC AGAAAGTGCT AAAATACGAT TATT'CTTCAT AAACCATTTG GTGC'rGCAGT GGGACC'rGCA AGTTGCCTGT TCAATCTGCT CTACTGGCAT GCGGTTGTTA CCGAT'TTTGA CGAATC'rGT'r TATACAAGAA ACCATTTCCT GAAAAGrnYAA 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 192.0 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700
CAGTCCACCA
CCACTAAAA.A
AACGAATCTC CTTTAAATAC CCTTCTTCTC CAAGATGAGA GAAGAGTCCC CACCATATTG AGGTCAAAAA TTGTCCTGTC TATCCTCTAC ACTAGTCCCA 'NTTTcATTGC AATCTGCATT GACGGCGCAT CGGATTTTTG GCTTGGC-ATA ACGGCAATGA rCATCGACTA
GAGGCTGTAA
CGTTCCACAT
GGACGTCCTC
AAATCATCTG
TTAAACTAGC TTCTGTGATG AACCGGTAAA ATCATGGGTT CGAG'rGGGTA GGGAAAGTGG TATCCACAGT AAACTCATAG CCACAAGCTC AATCGAAATC
GTGCGAACCT
CCCTCTAGCT
GTGGCATAGT
GTCTTGCTAT
ACATCAATAT
334 CTTICAGGAGA CTGGGTATCC AAGGCAAAAC GGAGT'TTCc CTCATCCATC GGTCAAAATG AATCACCTGT CCCAGGGCAT GAACCCCACT ATCTGTCCTA GA6ACAGTAAT GGCTTGCCCT TTATTrAATC TGCTCAAGGT ?TTTrrCAATT CCCTACCCGC ATGAGGCTCG CGCTGAAAGC CAGCAAAGGC ATAACCATCA TTGCTTTATA TCTCGTCATA GCCTCTA~r TATCAAGAAA TTAGTCTGTA TAAAACAAAT ATTGTA'rGGG TATAAAAATC TCATACTCTT CGAAAATCTC GTCAGTTTCC ATCTGCAACC TCAACACACT ATTTTGAGCA ACCTGCGGCT AGTACATTGA AATAAGATAT GAACAACTCT ATTAGGAAAG TCAAATTAAT ATTTAGCAG CTACAGCGTA CTATTCCAAA CTCAATCAAC TATAGTTTGC TCATTGAGTA TCAAAAGAAA AACTTAGGAA TCAATCCTAA GCTCTCT'rCT CATGACAAAG ATAGAGATTA CAATCAACCA ACCTCCTAAG ATACTAAAGA
TGATAAGGCA
CCAGCACCGT
TCTTCCTGAA
TAGGAAATAG
AACAAGGACC
TCAAACCAC
AGCTTTCTAT
TTCTAGAAAT
TCTTTGAT'TT
GAAGTAGGTA
CCAACATCCC
ATTGTGAGTT AGTAAGCCAA T'rGCACCTAG ACAGCCTAAT ACAGCAAATG AAGTTGCTTG GACAAGTTGA AAGACCGTCG TCAAGACTAC TGCTACTACC ACCCACAAGG ATGAAGACAA AATACCAGAC CAGAGGAGCA GTTTCTCrTT CCCAGCCACA ATCCCGATCA ACTGCATGAT CCCCAATCCT CTTTCCACCA TCAAACTTGG AACTACAACT GCCGCTTCGA TAGCTAAGGT ACGACTTGCT TCCTTCGCTC T'rTrCTTGAC GAGCAGATAA ACGGCAGCA CCAAAAATCC ACCAAAGGCC AACAACTGAC CGACGGCCAA AGAAGCGCGT ACCCTrAACA TCTGAATTCG AGAAATGGCC TTGGCATTrGA TCATCCCAAG GACAAAGGGA TAGGCTTGGT ACCAGAAGGG GCCCAAACTA ATCTGTAAGC GCTCAGGAAA CATCATCATG ATTCCAAAGG AAGGCAAGCT ACCCTGATAA TAGTCAAACA TGGCTGGTAG AACGAGCGAG AGAGCCAAAA TGCTGGCCCG TATATTTCTC TTAATCTTCT ACTN'TGA A6ACGAATGGG GTCGTAAAGG CTCCGAAACT ATTGAGGAGT TTAGCTGGAA TTCGTTCAGA ACTATAGGCA AATCCAGCCA GAACACTTCC GGCAATCACG ATTTGCCCCA AGCCAAAGGT AAAGATAGAA ATCAACAAAG AAAAACTCAC ACTAAGAACA AAACTAGATA ACTGGGCATC AATACGGATG GTAATAGCTG TATTGGTACA AAAAATCAAG CCTT-rCATTT CTCGAGTTAA TCITCTTT GATTTTCCAT AAGGGACAAA AGCACTATAG GCTAGAAACA TAGCTGTCCA GGTAATGAGA GAAGCTCCAA CGACCTCTGC cCTTrTTTCCT TGGTAGCGTT CACTGATAAT ACCCAAACCA AAGAGAAGCC GTGTTCCAAA AGCTGTACCG CTCAATGATA AAATCAGCAA TATTT=rCT AAGAAACCAT TTAGCAGTAA CACCAAGAGC TCAATT'rGTT CCTTAGAATA 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500
GGCACTCGAA
TTCTAAAA.AT
TAGTTA'rCAA ATGGAAAAGG AGGTAATCAA TGTTTCATGA AATCTCTTTC ATAAGCA.AGA AAAGAAGAAG CCTCATTGGT TTrGTAGACTC CTTC~rAAAT TCGAAAATG-A ATCCCTTGTA AATGAAAATC AAAGAGCAAA CTAGGAAGC'r AGCCGCAGGT 'rG=CAAAAC GG7TGCAGAT GGAAACTGAC GTGGTTTGAA GAGATTTTCG AAGA~7rATTA CTCTTGATrr GCTTGATAAA GTAGAAAATA AATCC'TGCTA CCATATAGGC ATCAGACACC ACTTAAACAC AACATTCCALA CCCTTGTTCA CATICAAAAA AAAGGATTAT CCr'DGGCATT TG.GAATAT'rG AGTTTTAGAA CCAAGCCAT'r AACATCATAT ACAGAAAGGG TAAALATGGTC CACACrGCTG GATCCCAAAT CCCTGTTTGT CAAAAAAGAG GGTATCCGCT AAAAACCAGA TGGGAACGAT AGGAAATTTT CTAGGGTATA GAAATTAGTC GCAATGGGCG CCAAGAGGAA
TCTTATACI'C
AGTGTTTGA
GGATGACT
AACAAAGATA
GAAGTAAGGG
AAAAAGAGCA
CTG'rATGA
ATAGTGGCAA
ATGGTAAATC
000 0 0* 0 00 *0 0 0 00 00 0 0 0 000000 0 0000 0 0*00 00 00 0 0 0000 0 0000 0000 0 0000 00 00 0 0 0 ACACAGGTAA TCATGATACT TGCCAATTTT CACCI'ACACG AAGAGGTTGG ACAGAACCGT TCAAGATAAA CTCCCGTAAA ?TATAGTGT'r TTGACA'rGCT CAAAACCATC AA'rCTTACAG CGCGCGTCCC TTGTTTCAAA TTACTGTATC ACCATCAGCC CTACTTCTGC AACTTCAGCA CACCGTCTTC GTAGACATAC TTTCTCCTTA TCATCATTCA CA'rGGTGACC CCACCTTTTA AGCGCAAGAG ACTTGGCCTI' GCTCATAACC TTTAGAAGAT AAAGGGTAAA AATAGTTACC GTAATAGAGA AGCATCCCAA AACCACCATG CTTAGTAATT AGCCGCTAGA AACAAGALAGA TACGGCTATA AAATACAAGT TAAATCTTCC TCACAAACTC TGATTTAAGT T'rCATGGCAC TCGATATTGT GGTCCCCTTC TACGATGCGG ATATTTTTCA TCTTTTCGCG CACCrT'rAC TTTCAAGTCC TrGATGAGAG AATTATTC CGTTGGCATC GATAGCGACA AGACCTTCTT GGATTCCACT CATGAGCACA CTCTGGGCAA ACCAGTAGGG TCTGAGTTAC AT'rTTGGACA ATTTGGTAAA TTGTTCATGG CTATTCTTTG AAAATCAAAA TTTCTCGAAC AGCAACTATT 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 ATACCCTAAA ATCAGCATTr TGACAAATTT AGAAAAAAAC CGATATCAAT CTATCGGCTT TTCTACATTr ACATTCTTTT T'rCAGCTTCT GCTTTGATTT TT'rCAACTAC TTCTTGAATG TTCAAACCAG TTGTATCAAG GTAGACAGCA TCCTCTGCrr GTTTGAGAGG AGAAG'rCTCA CGATGACTAT CCTTGTAGTC ACGCGCAGCA ATTTCCTTTr TTAGGGTTTC AAGGTCTGTT' TVAATTCCCT TGGCAATATT TTCCTTGTAA CGACGCTCTG CTCTCTCATC AACAGAAGCT ACTAGGAAAA TTTT-CAATTC TGCTTGTGGC AATACAACAG TTCCAATATC GCGACCATCC ATGACAATCC CGCCTTGCTG GGCAATNTCT TGTTGcAGAG AAACCAGTTT CTCACGCACT TGAGGAATTG CTGCAATAGC AGAAACATGA 'N'GGTCACTT CATTTTCACG GATAGGATGG GTAArATCCA CATCTCCTAC AAAAACAAGC TGGTCTCCAG TTTCTGAACG TCCAAAGCTG 336 ATTGGATGCT GGTCA GGCTAGAAGG GCTTCGACTT CT'rCAACTCC TAATTGGTTC TTAAGAGCCA TATAL3GTCGC TGCACGATAC ATAGCTCCTG TATCAAGGTA GGTGAATCCA AAATCCTTAG CAATAATC2-r TGCGACCGTA CTCTTACCGC TGGAAGCAGG ACCATCAATA GCAATTTGAA TTGN'TTCAT ATCGGCTCCT ATTTT'ATTTT TATAACATCA CCTGGAT'IAG CAAACCAAGA TCCTGTAGCC ATGTGCCCAG GATTCAAGGC CTtTAACTGA GCAATGGAGA TTCCTGCACG AGCGGCAATA GCTGCTTCCC CTTCTCCTGC GAGAACTTTA ATCGTTCCTT 9*O**S
CAGGATTAGC AGCTTCTTCT GCTGAGAACT ACTrTGAAGAT TTA.AGGC'rGC TG'rGCGATTA CCACCACAAT TACAAAGAAA CAGCCCCTCC GTGGTTr'CGA CTTC'rTGCCA CGGTrCTTrT CAATATAAAT ATGAACATGA TI'GCCAAACT TATTCTGATT TGACCC'rGAC CAACTGGAAA AAATTGCCCA ACTCGCGCCC ACTCC CTCTA GTTTAGCACA TGTAAAATCA GGAAGGTCAC TAAATAAATC TTACTGA'rAT AAGAAAGAGA ATAACAGCCA GGTCCGAACC ACAACAAAAA TACAAAGGCT ATAATCAAGG AAAGAGTTGC CTAAAACTCC CAAATAGTAG GTT'rTGAAAC GGTCAAGCTT AAAATGGCAT CGACGGCAGA GCTACTGGGC ACTGCGAAAG AAGGGTTCTC CTCCCCCCCG ATGATAGATA GATGAGA.ACG ATACTAGCTA GGATCGTCAA AATACGATTA TGCCGACGCT CTGCTCr'rGA TTCTrCTrGA GCCATACCTT ACTCCTrGrr TTTTTTTACT AAATCACACT TATACCTGAA CGATGTATCG
ATGACCATCA
GCCATCCTAT
TCATAGATAT
TTTCTTATTA
CCTGTGGGCT
TATTTGATTA CCACGATAAT AAGAAATTTC TCCTAGTCAG
TGATTGGAAA
TTTCCATGTA
TrTTTrCTTT
ACTTGCCGAG
GAACTACTAG AAGTAGAT'?C TGGC'TCTGAA CTCTGCTCAG GAGATTTGTA CTACACTGGC ATCAGAA'rCA TGAAAGCCTT
CCAGGAAGCC
AAATTATAGT
GATAAGATAA
AATCCCAATG
GGAATCGTGC GT'rTTTACGA GATATCTTAG AGGCTGTTAA TAAATCAATG GCGATAATCC CTTTTCACTT TATTTTTTTC AGTGGTCTTT TTTTAGTCTC GTCAAGAGTT GAATGCCTCC 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 TCAGAGAGGT CCAACCAGAT GTCGGATTGC CCAAAATGAG AGGTCATCAG CAGAGAAAGA AAACAAGATA GGAGACCAC GAAAATCTGA AAAATTAATA ATCCCTTCAA TGGAGTACAA AACTTGTCTT GCCAGCCTGC CTT'rCGACAT TTrCCATAGC
CCACCCAGGC
CGACTACAGA
TGATTTTTTG
CCTCCCGACT
GAAGAGCCCC
CCTTCTCATC
CATGAGGCGA
?TTGAAAAAC GA'rTGGACTC ATACGAAAAT CACGGACACC TAAAAGAGAA CAGCACAGAA TACATTTTT'G TCTAATACAT 'TTTTCATCTC TGCATCCATC ACCACCACCA AATCTCCTGT TTCTTTGCCA AAATGTCAG AGAAAGAAAT AGTTCTCCCT GTCCCAACGC AGTCCAAGTC AAACAACATA CTAGGACGAT CTTGGAGGTC CGCATATTGC AAGCCTGCAT AAAGGGCTGC ATAATGGACT GCCGGATTTT GC'rCCCGATA
GGCCTTTAAG
TTCrGTCC
CTCTTCGTTT
AGTTCCAAGG TCCCATCACT TGATCCATCA TCGACAAAGA CATACTCGAT AAATCTGGAA GTAMAGCTTC CAGAGCCTGA TAAAAAAGAG GAAGTACTTc AAACAAGGGA CGATGATTGA AATCATCATC T'rAGTCrrCA AATCCAT'N'G GATGCTTGCT TTGCCAAcGc CCCAACCGAG TTCTGCTTTA GGCGACGTTC TACGATGCGG TAATTTCAAG AACTGAGTAA CT'TTrTGGAT TrrTCAAA AGTCACGAAC ACCTGTTCCA CTAAI-rrTCC AACGGCTACT GATTTrCTCC CAAATCACCA CAACATTCCA TTCTGAGTCT TAGTACGACC GTATGGGTTG CATr.CGTCTT CACACAN'G GGTGATGTCG AGTTCTGCTT GCTTTGCCG GGTC'rGAGTA GCAGGCAGCG ATATCACCTG TAAGGAATAG GACGGCCCAC CGCTTTTTCC ATGTTTTGGA
CCTTTACC.AG
GCTGCAACGT
TCTI'CCGTAT
TGAGTCACAT
CTCTCATGGG
GCTTTGTAAA
GTCACTGAAA
TTCCAAGGTT ATAAACGTTT AGTCCTGAAC GACCCTTAGC CAAATCGACA ACGTGGATAT CGTAATCGTC TCCAAACACT TGCACTTGCT ATGGCAAGAG ATTGTTTGGA ATACCGTTTG CTCCGATTGG GTTAAAGTAA CGAAGCA.AGA TATCAGTCAA AATTTCCTCT AGCATGAGCT GTGGGAAATC T'rCCAAGATG GGCACTGTGT TGAAGATGAT GTTTTTACAG TTGTTTTCTT CCATATTGTT GTCATAGTAG GCAAGAGGGA CAGCAAAGTG AATGACACCA GTCGGTTCGT' TGTCACGAAT ATCTGCCTCA TAGAAAGGAA
GCGGATCCCC
CCATGGCTTT
TACGTG'rTGA GTAAACTGTC GCAGAAGAAC CAAAAGGCTG ACAG'rTCCAG T-TCGCCAACA GCCTTCAAAC 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 CCTGCTTGAA AATATCTCTG AGGGTATCTG TCTCAACTCC TGTGATTCCT TCAACAACTT CTAAACTCTT ACGATTGCTA TTGACAAGAT TATCCACCAC AACAACTTGA TGACCTGCTT GGATCAATTC AATAACAGTG TGGGTTCCAA TAAAACCGGC ACCACCAGTT ATTATTTTTT CTTATrAC TGA'rAAAATT TCAGTAAAAT TCCCTTNGCC ATGGGTATGG GTTTTGAGCT GACTTCGTCA CGGCTAGTTT CCTAGTTPGT TGACATAGTT TTCAATGGG GTTGGGCTAG ATGCTAACCA TGGCTGCATA GACACCAGTT TGTAGGCACG GATTCCA.ACA ACCAAAATCT TTTCTTCCAT CTTTTTTCCT CGATTCTCAG
CATTTTTGAC
GCTTATACTC
TTACTGACTT
GTTCTATCCA
TCTTT~GATTT
TAATTAGAG
ATGATAGGAC
AAGTCAGGCA
CGCTCAGATT
AGGGAATGTC ATTTGCCATC CTAAACTACC TTCGAAAATC CAATTCAAAC TACGTCAACG CGTCAGTTCT ATCCACAACC TCAAAACAGT CAACCTCAAA GCAGTGCT GAGTAACCCG TTATTGAGTA TTATTCGCTT T'N'ACTCGTT GGTCCAAGGT CAACTCCTTG TCTTGGA'rCA CAGTVCTGAG GCCTGATGAA CCTAGTCCAC CC'rCCCCAAA GAAAGGAGAG AAATCACTGG TTGAAGTAGC TTCAGCCAAA ATCAGATAGT 338 GAGTCAAGGT GGCCTCCTCC ATTTC?1'OGA GCAAGGTTTC ATCTACCGTC AAATCAAATC 9840 CCATGTCATT TTCGTGGGTA GCGCCTAAGG ATAATTTCCC ACCTGCAAAG GGAATCAAAT 9900 CCCACrCCCC T'rCTGGCATG ACAACAGGGT AATCT'rCCAT GTCTTGGGCA AGCTGATAAT 9960 CTCGTAGTTG TCCTrTGA GGACGGACAT CCACTTCATA ACCTAAAGGC TCTAACATGT 10020 CCCCCAACCA AGCTCCCGTC GCCAAAATAA CCTGCTCAAA CTCCTCTTCA CCAATCTGGT 10080 AGCCTGATGC TAACGGTGTC AGAGTCACTT TTTCTTTGAC CAGCT'rGACA TGACTGACTT 10140 CCAGCAAACG AGTCACTAAA AGTTGGCCAT CTACTCTCGC TCCACCAGAA GCATAGAGCA 10200 GGCGGTCAAA TCCCTGCAAA CCAGGGAATA ATTCATTAGC TGAGGCTTGG TTCAGA.ATGG 10260 CTAATTGCCC TATCAAGGGA GATTCTTCTC TGCGCTGGAG GGCCAGTTGA TAAAGTTCTT 10320 CCAAATTGGA TTCATCCTTT TTCA-AGAGAA AGACTCCCGA ACGCTGGTAA AAGTCGATTT 10380 CTTGTCCTGA TTTCTCTAAA TCAGCTAATA AATCCACATA AAAATCAGCC CCCAAGCGCG 10440 CCATCTTGTA CCAGGC'rTTA TTACGGCGTT TGGAAAACCA AGGACTGATA ATTCCTGCTG 10500 CGGCCTTGGT GGCTTGACCT TGCTCATGGT CAAAAACGGT CACCTCTAGG TCACTT1'CTC 10560 TCGAGAGGTA GTAGGCAGCT GTTGCTCCCA CAATTCCTGC TCCAATAATG GCAACTTTTT 10620 *TCATTGTCTT CACTTTCTAA CTAGATATGA TGGAALAGGAT TGGTTGATGC CTGACTAGGC 10680 *.*AAGATATCAA TAGACCACCC CTTATCTTCC TTCCATTGAC 'rAAGAAGTGC TGCGATTTTT 10740 S* -TCTACAAAAA TCAC'rTCGAT ATAGTGACCT GGGTCCAATG CAAGCAACCC ATCAGATAGC' 10800 *ATATCCTGAG CAGTATGGTA GTAGATATCA CCAGTGATAT AGACATCTGC CCCCTTTGCC 10860 AA.AGCATCCT TATAGAAAGA CTGCCCGCTT CCACCACAAA TTGCTACTCT TGAAATAGGC 10920 *TTCTGCAAAT CATCCTCTTG ATAATGCACC ATTCGA-AGGC TATCTAGGTC AAAGACTTGC 10980 TTGACCTGTT GGGCCAATTC CCAAAATGTC TGAGGCTGAA TATTCCCAAT ACGTCCAATT 11040 CCACGTTCTG GACCTGTTTC CTGCAGATAA GTCGTCTCCT CGATTCCTAG CATCTGACAA 11100 AACCAGTCAT TGAGCCCATT TTCAACGATA TCAATATTGG TATGGCTGAC ATAAAC'rGCG 11160 *ATATCATGCT 
TAATCAGGTC GATGTAAATC TGATTTT'GCG GACGGCTGGC AAGCAAGTCC 11220 TTGATAGGAC GAAAGATAGG CGCGTGCTTG ACGATAATCA AGTCCACACC CTTTTCAATG 11280 GCCTCTGCCA CTGTCTCTTC ACGAATATCG AGGGCAACCA TGACCCTTTG GA'rACCCTTG 11340 TCTAAAGTGC CAATTTGCAG ACCACGGCTG TCTCCCTCCA TAGAAAAT'rC CTGAGGGCAA 11400 a *AAGGCTTCAT AAGCTTGGAT CACTTCACTT GCTAACATGG AGCACCTCCT TGATAGCTTG 11460 AATCTTATCT ACTAGAACTT GACGTTCTTC CAGATT'TTTT TCTGGGATTT GTCCGAGGGC 11520 GAACTCTAGC TTCTCAGCTT C'TTTTTGCCA TTTT'TGGACA AATACTGGAC TGACTTCTTT 11580 339 GGACAAGAAG GGACCAAAGC CACCAAAATC TCATAAAACT TCCATGATCC TGTAGCCAGA ACGCTCTACA TTAGCTAACT ACCCATGCCA GCAATGGTAA ATTGGCTIAAA CGGACTTGGA AGACTGATAG GGACCTTCCA AACCAACTCG ATAGGCAGAT TGACACAAAG GAAGCTACCA ACTCTATTAC CTCTTATTA'r TCTGGCGAAA GAAGTTTCAA ACAATTTGCA AAAATGTCAA TATAAAAACA CATGGTAGAA TTTGTATTTG AAGGTGGTGT TATGCTTTCT GGAATrrTATC TTAGAACTAC TTCCAGAGGA
GAACATCACT
TTCCAC=C
TACGCAAGTC
TCCCCAAACC
TGACAGACAC
TTTT'CTCCTT
CCACCTCACC
AAGCATGGTC
ATTCTAATCT
ACCACATTTC
TGTCCTAAAG
AAAATAAAAA
TATAATTAGA
TCAGATAAGA
ATAACAGGAG
AGATATCAT
GGCTGATAGC TTCAr'rTGTC CTGCTTCCAC TTCTAAGATG CTTTCTCCTA CAATCTCGAA GTCTTCACGA TTATTGGGCT GGAGGATCAA TTCTTCTAAA ATCCTAGCAA TCAAACGACC TTGGTCAGTC TCTCAAAAG CTGCCAAGCC TAGGCCGTGA GCCTCAACAT ?r'TTAACCGC TGCAATAGCG CTTrGAT3-r GGCCTC'TCTC ACTTCCCACA TCTAGTAAAA TAGCCCCCTG CTTTGAAATC ATCTTCTCTC ACTT'rCCAAA AATCTTCAAC TTCCCAGTAA TATAAGCACC TAATAAGTGA ATCCAATTGA AAGATTTTAA A'rAAACAGTT TATTCAGAAA ATTCTTGACA AAGTTAGAAA AAATAAAAGT TTGACTAAAA AATTTAGTCA GACGAACCAC GAA'N'TGCTC GATACAGTCA TGGAACAAAC ATTGTTTGAA GTCACAGGTC 'rCCCTAAGTA TTGTTCTTTT 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13206
ACTTGTTTAA
AATAGGCTAG
TAGAGAAAGA
ATTATTTATC
AAGACGCTGT
TTA.AAGTTGT
CTACTGCTAA
TTTG'rAATCT
A.ATGTGGAAT
ATGATGAAAA
TTACAGGTCG CTAGTTATAT TTTATATAAA ATAAGTAGC'r TT~ACT'rACG TGCTGTGTCT CTAGCCTAT'r TTAATAATTA GGAGTTTGTT ATGGATTTAT ATGTTTAAAA TGTGATAAAA AT'rTCCAACA GGGTGATATT TGGAATTACT AGATAAGATG CCTGCACAAG GGTGGAAAAT ACACATAAGC TCCCAAATAA AAATATTTT AAGATTGTGT ATAAACTATC CCAACTAAAT AATT'GTAGCT TAAAAATTTA GAGGAATTAA AAAAAXI'TAA TTCCCCTAGG GAAATGAGCC CAAA'rTTATA ACTCTATATC CTAAGTCAGA ATCTGAAGCT AAGAGTATGA TACGAATAGA CTGTCAGAAT TTAAGGCTCC AAAAATACTA TCTGACTATC GCATTCTCCA GTTCATTATA GATATGGGGC TrTTTTAAAA AAACAAGCT'r AAATAAAAAA GTCATCTATT TA'I GCTAGA TGAAAAAAGG AAGAACTATG TAGAAGATAA GAGACAAAAT TTCCCTAGTC T'
AAGAAG
~CCTAGCTG GAAAATGGAT TTATTTTCAG
INFORMATION FOR SEQ ID NO: 34:
SEQUENCE CHARACTERISTICS:
LENGTH: 13104 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
CCGGATCCAG CGAAAAATAT GCTGCTAACG ATGCTGTAAC AAAATTTGTT CAAGGGGGGT AATAA1-rTAG CrI'TCT'rATA AACAACCTAG TAAAGCAGCC AACAAAAAGC CAAAGGTTTT TAGCACTTGC AACCCCTGAA ATAATTCCCT CTTTATGC'rG TACTCTTTTA TrTCTCAAAT GAATTCCAGT TCCACGCACA CTrACCTCTT GATCATTCCA TCGTAACCTT GATGATCGCC TGTTGGACTG GGTTGGTTTG CTGCCTTTGG TTCTGT'rCTT AAATCGTAAT TGGTATCTTC GTATCTT'rGG TGTTATTrTC CATTCTCAAA CATGTTTAAC TGGCTAAATT CCTrCCTr'rC GGACTAAGAT TGCCTTGATT GGAAATCAAA TCCCCCTT'rG AAAAAGTAGT ATCCTATGAA CTGCTGTCTA TCATTCCTGG ATCTTCCTTG GTGTAACCAT TrGAGCAACC TCATCACTCT AT'PCGTGGTG CCT'rCCATCT ATCAAAGATG CACATACGAT CTCAAAGACA TGATCAAAGG TCTTATGTTG CCATGACCTT TTTACCAACT ACGACTTCCA ACCAACTTTA CAAACATTTG TCTTGGACTA TCATTTGGGC ACAGCTATCA TTGCCAACCA CTTCTTCCTT GGGCTGTCCC GATAGTGTCG GTGCTATCAA CTTGATGGAG CTCTTATTcc ATGATGCAAG GTTGGCTCGG AATT'rATCAA TAGAGACACA AGGAGTTAAT ATGGAAAAGC GTTAGGACAG ATrTACAATA CGTATTTGTC C?1'TACTTCC TGGTGACAAA CCAGGTCGTG AATCTTTGTA ATCGTTTATG TGCAAAACGC ATTAACAATG GATI'TATGAA AATGGCTTCC CGCGATT'rC TTCCCAGT'rA GCTCTPGAT GCTGTAAGTG GTCAAAAAGA TGCTAAAACA ATTGATCAAA GAAACAATCA AACAAAAATT TGGTGAATAA
ACACTTGCCA
GAGCTTGAGT
TTTGGCAGCT
ACCATTTATC
CCAAACAAGT
ACCTTCCGTT
TCTACTTTAC
AAAGGAAAAC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 AGCCTTCATC ACTATCTTGA CACTCAAGTA TTGCCAATCT TTGGAAAACA GACCCAAC'1T ATTCCCATAC ATCTACGTTC TGACCTTGGG TATCTTGCAA TCTATTCCTA ACGACCTTTA CGAAGCAGCT TATATTGACG GTGCCAACGC TTGGCAAAAA TTCCGCA.ACA TCACTTTCCC AATGATTTTG GCTGTTGCGG CACCTACTTT GATTAGCCAA TACACCTTCA ACTTTAACAA CTTCTCTATC ATGTACCTCT TCAATGGTGG AGGACCTGGT AGTGTCGGAG GTGGAGCTGG TTCAACCGAT ATCTTGATCT CATGGATCTA CCGTTTGACA ACAGGTACAT CTCCTCAATA CTCAATGGCG GCAGCTGTTA CCTTGATTAT CTCTATCATT GTCATCTCAA TCTCTATGAT CGCATTCAAG AAACTACACG 341 CATTTGATAT GGAGGACGTC TAAGATGAAT AACTCAATTA CAAAGCCTTA CTTACCTTA CCTGATTGGT CTATCAATTG ATTACCATTA TGTCAGCCTT TAAAGCAGGT AACGTCTCAG ATCGACCTCA ATTTTGATAA CTr'TAAAGGC CTCN'CACTG TACCTCAACA CTTTGATTAT CGCCTTAATT' ACCATGGCTG CTTGCTGGTT ATGCT'rACAG CCGTTACAAC TTCTTGGCTC TTCTrGATCA TCCAAATGGT GCCAACTATG GCCGCTTTGA CT'rATGTTGA ACGCCCTTAA CCACAACTGG TTCCTCATCT ATCCCGATGA ATGCTTGGCT CATGAAAGGC TACTTCGATA GAATCTGCAA AACTAGACGG TGCAGGACAC T'rCCGCCGCT CTTGTTCGCC CAATGGTTGC CGTACAAGCT CTCTGGGCCr TACATCCTCT CTAGTTTC'rT GCTTCGTGAG AAAGAATACT CAAACCTTCG TTAACAATGC GAAAAACTTG AAGATTGCCT CTCATCGCCC TTCCAATCTG TATTCTCTTC TTCTTCCTAC CTTACAAGTG GTGGCGACAA GGGATAATT1' ATCCCCCCCA TTCGAAAATC TCTTCAAACC ACGTCAGCTT TATCTCCAAC AACCTGTGGC TAGTTTGCAC TTTGATTTTC ATTGAT'rATT
AACTCAAACG
TAATTATCTA
CCTTTAAAC'r
AAACCTTGTA
TTCAAACAAG
GTAAACAAAG
TAGACTGACT
TCCACTGTTG
AGATACTAAT
CGGTACTTGG
'rATCATCGTA
TTTGGTCTTC
G-AGCCTTC'rT CGTTATCGCG TCCTCTACGT TGGTGGTGGT CAGTGCCAAT GTCT'rTAGAC TCTG~GCAAAT TGTTCTACCA TCATGGGACC TTTCGGGGAC TTACTGTT1GC CGTAGGTCTC ACTTCTCAGC AGGTGCTATC AAAAGAAC'N TGT'rTCAGGA CCCTTTTTCA TTTTATACTC CTCAAAGTTG TGCTTTGAGC AGCAATTGTC ACTGTAAATA AAAGCGCATT TCTC1'ATATA TGCTTCCATA TCCATTTTCC AACGTTTCGA GCTCAACTGG CTATGGTACC CA?1'GCTATC TCGATAATGT CTATGAACCT CAATTGTCGA TGGCACA'N'A TTGGTCCAAG TCAAATCAAG 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 ATATCCTTGT AGCAAGCAAT TTTTCTCCTA ATAATACTCA TATAGAAAAC ACCTTTTAGA TATTrTTTCAA GTAT1'TGGGG GGTTCGTAAG TTTCAACTTC TCTTACCAG 'rATCTTCCTT CAAAACAGCT CCCAGGAGAC CTATCCGCTA CTGACAGATA AGGTTGTCCA GGATCTCTCT ACTTATACTG GAACAGCTAG TCAAGCCCCT GAATTACCTA AGGACI'GCA ACTGCATTTC AGCAAGGAAC TGACCCGCAT CTCTTACCGA GACAGCTTGA CCCAAGCAAT TTCTAAAGAC CTCTTCCTAG TI'CTCGGTGC GAGCTTCCTC GGAGCTAGCT TTCTCCTTTA TATCACCAAA
GACI'GAAAT
AAGATACCTA
CCCCTGTCCA
ATCAGCTTGT
GAAACTT1TTA
GAACATGCTA
TCTGTTGTGA
GATACAAATG AGCTAGTCAT CAGCAAGGAA GCCATTCAGA CTGAGAGTTT CAAAAGCAAA TGGTACCAAC AAAATCGTGT CTATATCAGC TrTGGTTTGA ATTTCTTTAT CGTCTCTCTT AGATCACGCC TCTTTTCATT TAATACCTTT 342 AAAGAGTGCT ACCATTTTAT CTTGAACTGT T'DAGGATTGC CGACTCTGAT TACACTTAI-r TTGGGATTAT TTGGCCAAAA TATGACAACC CTGATTACTG TACAAAATAT TcrTTTTGTT CTGTATCTGG TCACTATCT'r TITATAAAACA CATTTCCGTG ATCCAAATA CCATAAATAC GAGATT'TTTA TGCCCGTTAC ACCGTAACCC GTGTTATTCA CGCAAAGCTA riGAAGCGAACT AGCTATACTC ACGTTATCGG CCTTTCTTTC CATCGGTTCT AT'rCAGATAG CAACAGGGAA TACGGCAAGC GTGTAGATGG AAACTCGTCG CAGAAGAACA ATCCCACTTG TCGACAACGA AAAAAAGGCT GCAAACGCATP GAI'rAAAGAC GTGGCCAAC.G CTGCTGGTGI' TTCGCCTTCA AAATAAATCA ACCA'rTAGCG ACGAAACAAA AAAACGTGTT CAACTACCAC CCAAACCTrCA ACGCTCGTAG CTTGGTAAGC ATTAGTTCTr CCTGATGACT ACG'rGGCATC TCTCAAGTCG AGA'rGAGAAG GAGCGTCTCA GCTAATTTT'r CTCTATGCCC GT'rCCCCTTC CTTATCTTAG CAATGTTCAA GCTGGTTT'rG TGCCTTTATC GGAGGAAGTA ACAGGCGCTT AAACAr'rACA GTTTCTGGAA GAAAAGGGCT TGATGCTATC ATCACAACCG ACACCAGCTG GATGTCCCTG AGCC'rATGTC GATATCAATA CAGACGCCTT CTACCAGAAT CATCTGAAAA CCACTATGCC ACGCTAT'rTC ACAAATGGTC AAGAAGAAGA CCCTCTCGTA GTAAATCTCT ATCTCCTTTC ATGCGACTGA ATATTT'CATC AAAAGCTCTT CGTGACCAAA AACTTACCAC TGACAACAAT ATAAArE'TAG CAAGCGA'rTA ATAGCCTCCT ACCTGAAGGT TTCTCAGCTT TGACTCGGTT GTTTAGAGCT TGGTCGTGTT a a. a a a a. a.
GACCGTTTAA
CGCATCTACT
TTCAAGCACG
GTTTGTAACT
AATCCCAAGC
TCCCTTGAAA
CGTCAATTGA
CAGGCTATGA
TTGCCGACGA
ATCCACAAAT
.ATATTGCCAA
TCA.ACTTGGC
CTATTCTCCA
TCGCCCACAA
3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 GATTATTAAT GATAATAAAA ACAATAAACA AAT'rTGTTAC AATTATCGAA AAATAAGAGA CTGGGCAAAA AGTCGTTAAA AGCAAAAACG CATACTATCA GGTATTGAAA AAACTTGATA CTATGCGTTI' TATTGTGGGA AGATTTACTT CCTTT1C'AC TGAAATTGAG TCTTTTCCCA AGATCtT'rTT ATACTCAATG AAAATCAAAG TGCAAACTAG GAAGCTAGCC GCAGGTTGCT CAAAACACTG TTTTGAGGTT GTAGATGAAA CTGACGAAGT CAGTAACCAT ACCTACGGCA AGGTGAAGCT GACGTGGTTT GAAGAGATr'r TCG.AAGAGTA TTAATCACTA ATTATCTATC TCAACAAATC TTCCTAGAA'r ATGAACATTT TCCGAGACAG AGACAAAGGA GCTTGGATCC ACTTGTGTCA TAATCTGTrr AAATTCATTA AACTCTGCAC GTGTAATGAC AGTGATTAAA ACTGCCTTTC TCTCGTGATT ATAGGTTCCT TCTGCATCGT GGATCATGGT TGCTCCGCGG TGCAA'TTTT TATGGATTTT TTCAATTACC TTCTCTGGAT GA'TrGTCAC AATCATGGCC TGCATACGCT TTTGCTTAGT AAAGACTGCG TCTGTCACAC GGCTAGAGAC AAAGATGGTA ATCATAGAAT AAAGAGCGTA 343 TT'rCCAACC-A AAGGTCAAAC CTGCTATCAG CATGATAGTr ACTACCGACA TITCTTACCCG TTrCTrACG AATAGTCAGG ACTGGAGATA TTGITCGAA GAGCAAAACC AATCCCCAAA AAGGGAATTG ATAATGGGAT CCTCTGTCAA GGTTGCCACA GGAACTCATA GATACCGTGA TAAAGGTAAA GACGG'TGAAC AGCTAAGACC ATCAAAGGGA AGr'rAATGGC GTAGAAGC'TT ACCAAACCAG TGATTAC'rCA AGGCAGAGAT AATCTGTGCC ATACACATGC CCTGG~rGGA AAPAGAAATr AACTGCTACT CAGAGAGGCC GAAA'rCr'CT CATCATACTT TTCTCGAGAG AA'TTTTATC TGATAAGCAA AGCGGCGCAG ATAATAGCGC TTGTTTCATC TTCTTCTACT TGTAAGC1'GA GTTrCCTCTAG ATGGAGCT'rG TGTCATTGGG TCAGTTGCCT TGTTGTTCTT GGATATTrC T'rCCCAGCA AGCAACATGA CAAAACGGTC CGTGTGGTGG GAAACCATAG TCCATGGCTT CAAGAAGGAA CTTCAGTTGA GAAACCAAGA GCCTTGAACA TGCGTTCTTG GAAGGCTACC kCCACCAAGC TcATrAACCGT TCAAGACGAT CCTTAGCCAA ATCACCT'rCT AATTCATGAG CAGTCTCTTC GGTGGGCGCT CATGTAGCGG CCTTCTTCTT CAGACCATTC CCATTTACCA AGAAAGAAAT CTGACGATAT CCGTCCCACC
CCCATAACAA
GGGACAAACT
TTATGGCCAA
AGCGAAATCG
AGACCTGTTG
GCTGATAAAA
CACCCCCAAA
GGATAAAGAA
TCTGATACCAr
GAATATGAAA
CACCACTCGA
AACCATAGAC
ATACTTTGTA AGACACGTAA CACCGCTTAA T'TCGTTGT
T'rGTTTGAGA
AGGAAAGGCA
AAGCCCGA'rA
ACCAAACTG
GCGACTGTTG
ATGACTTCAC
GCCAAACCAC
TCATTGGCTT
AAGGTCTTTT TGGTTGATAC A'rCGTAAGCA ATGGCACGAA CTGTGGAAGT GTGAAAGGAT AAACATCGGC CAGTCAACCA 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780
CCCAAAGGAA GTTGAACTTA TCATTATCAA TCAAGCCAAG CTC'N'TAGCA ATACGTCCAC GAAGGGCACC CAGTGTTGCA TTAGCCACTT CAAGCGTATC CGCCACAAAG AGAACCAAGT CCTTATCTTC AAGAACAAGC GCTGTTGTCA ATTCTTCTTG GATACCAGTC AAGAACTTGG CAACTGGTCC GTTT ATTCT CCATCAACCA CC'rTGACCCA AGCAAGACCT TTGGCACCAT ACTGTTTGGC' TACT'rCCGTC ATCTrGTCGA TGTCTTTACG TGAATAGTTG TCCGCAGCTC CTGTGACCAC AATCGCTTTT ACAGCAGGTG CTTCTGAAAA GACTTTAAAG TCTACACCTrC GGACCACTTC TGTCAAGTCC TGAAGCAACA TGTCAAAACG. AGTATCTGGC TTGTCAGAAC CGTAAAGAGC CATAGCATCA TCGTATTTCA TACGAGGGAA TGGTAGCGTT ACTCGATGC CI-rTTGTr'rC CTTCATCACG CGCCGATCA AGCITITCTGT AATATCTTGG ATrTTCTTGCT CAGTAAGGAA GGACGTTTCC AAGTCGACC'r GAGTAAATTC AGCCTGGCGG 'rCTCCZACGCA AGTCCTCGTC ACGGAAACAT TTAACGATTT GGTAGTAACG GTCAAAACCA GCATTCATCA AGAGCTGTTT CGTGATTTGT GGACTT'rGAG GAGACGGCAC TAAATAATCA CGCGCCCCTT CCACGTCGAT AAACrCCAAC TCATCCAAGT GAAGTTTAAG ATTTCCAAC ATrTCTGGAC GTGTATCG'rC ATTTGCCTCA ATGCCATCCT TAAGCACAAT AAGAGCTGTC ACGTTTAACT CTTGTCACGC GCAGCGACCT GACCAGTCAC AGCTGTTGCC ATAACCTCTG CAGATACTTT TTCACGGTCA CGAAGATCGA TAAAGATCAA 344 GAAGAGCGTA AAAATGCCCC TTATTAACAC CAGGCGTTGA CTTAGAAAGG AATGGTGTCT AGTTGCGGA'r AGAGTGGGTC ACCTTGGCAC GACGAAGGTC AAGGTAACGG TAACGCAAAC TAATCTCAAA TGGTGTTGTC TTAGCTGTGT CAACCGCACC AGTTGGCAAC TTATCA'rrGG CTCAATAACA AATTCGCTAC GAAGGcrrTC ?rCAGGGTTG ATAACCAACT GCATGATTCC ACCACCAAGG TCACGACGAC GGCCAACCCA TTCCTCACGA ACACGACCAG CATACATACT TGTTACTATT TTACCATAAA AGCGCAGCTC TCTT'rAAAAG TCAGGTGAAA GCCCTAAAAA AAACCACGTC AGCGTCGCCT TACCGTATGT AACCTCAAAA CCATCTTTTG AGCTGACTTC 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500
TCCTTTCAAG
ACGTTTCAT'r
TTCATGAAAA
TTAGCGCTAA
ATGGTTACTG
GTCAGTTCTA
TTGCTCTTTG
TATTTTGAAT
ACAATTTCCG
ACTGCTAGAT
GTTATTTCTT GTCCGATGTG ATTTCTCTCC TCTTTATTC T1CATCAGAAA AGTTTGCCAG TACTCTTCGA AAATCTCTTC ACTTCGTCAG TT'TCATCTAC 7560 7620 TCCACAACCT CAAAACAGTG T'rTTGAGCAA CCTGCGGCTA ATT'rTCAT'rG AGTATAATAC AAAAATCCGA TGAACTTCAC TTrTTCCTGC TTTACGCTrTT TCAGCGATTT CGGCTGCCTT TTATGTAAGC CGTCCCAAAA CGCAGTACAC CTGCAATAGG AGTTATAGAA GAAATCGCCT TTGAAGGCAT AAGCTAGCGC AAAAATAGAA CGACTGCCTG AATCACTGCT AATAAAATTA CTCGTTTCAT GACTCTATTA TAGCATGAGA A'rCATCAAAA AGCCGACTAA ATTATTCAAA AAATACTGTA GACCAGACCT TTTCTGCTAA TGTAAGCCAA ACCCAAACTA TAAAATAGAC AAAAAATTG'r TGCACATCAC CTGGAAAATG AATCAAGGCA TAGATACCAG AAGAAAAATC AGGGTTCGTT TACTATTGTC CTGCTTAGGA GTGCTAACAT CCCTCTAAAA ACAATCTCTT CCGTCAAAGG AGCAAAAATA AGAATGAGAA AAGTGGTTGA GACAAGGTCA AGTCTGTCGC TATTTGCTGA GATCATCTGG CAAGAAGAAT TGAACGACCA GAGATAAGAA CCAAACCAAG GCTTCCTAGT 7680 CGGACTCTTT 7740 TCGAGGCAAG 7800 AGCAAAGACA 7860 TCCAATGATG 7920 GTGACCTCCT 7980 GCGTGAAGAG 8040 AAACCAAGGC 8100 AATAGAAGAC 8160 AAGAGATAGC 3220 ACCACAGCAA 8280 TTTACTGAAG 8340 ACAGGAAGCC 8400 TACCAT'rTGT 8460 GCATAGGGTG 8520 ATAAATAAAA 8580
AAATAAATCG
AAATGCCGTA
AAGCGCCTAA
ATTAAAGCCG CTCTTCTCAA TATGAACAGG AGCCTTCTGA CACATATACT CCAGCCAAGG CCACATAGAG TAGAGTAACA AGCAAGCCAC GCAGTCGCGA GCCCCTGAAT AAAGCCATAG AGGATAGAAG GGCTAGAAGA ATCCAGCCAA GGTTTT1TAAG 'rAATTTCATA GATAACTcCT I TTArTTGAAA TAACGTI-rA CCATAGGTAA CTGCATCACA TTGATATAAA TCCTACAAGC AAGAAAGC'rA GTAACTGAAT AAAAATATAT AAGGCTGGTA AGACATATTG TTCCAAATTA GCCTGACGCT TGGTGTAGCG AAAAA'N'CCA TTTTGAGCG ACTAAGAATA TGTATGCAAT ATA6ATCACCT TGTAATATTA AAAGCAAMGG
CCTCATC
AATCAAACTG
CCACAAAGAG
CTCTCCTGTC AAGAAAGAAA GTG'rAATTGG AATAAAAT'rC A'rAAGAATrT ATATAGT'rCA ACGAACAATC GCAATGGTTT TAAGAAAGAA AGGAAAAATC
TTCATTCAGA
TAAGAAAT'rG
TCGTCATCCA
AAAGTCAAA6A CACTTAATGA AATAAAAATA GCCAATGGAA TTCCAAACTC AAGATTCCGA TACATTTGCA TTTCCTCTTG ATACAAAGAA TGAAATTTTC ACATACTAAT GAAACCTATC AGTA.AACAAA CATCTAAAAT AATTTCGTGG GATTCGACAC AGAACCCCCC AAGACAATAG ACATCAAAAA TCTCCTTTTT CACTTGCTAG ATTTTTGGAT AGAGCAAAGT AGACCCAA.AC AAATTGGTCG
CATGGATGGC
TGATAATAAG
GAAAACTCTG
AGACATCCVT
TAAAAAGAGA
TTTGAGGGTT
TCGCTACACC
CATAATAGGT
TGCTTTTCTT
TAGCTGATAT
GTGCCTTAAA
TAACAATCTA
TTCTTTTCAA
CTTTGATACG
0 o .0 0 0 0 0 0 a 0 CCATGGCATC AAGGC'TTTTA CATCCCTACA AACATGCCCA CGT'rTCTTTT TCATATTCAT TCCAT1'CAAT TACTGGGATG GATTAAACCA GCT'rAGGTCC CCACTACATA ATAAATCACT CGATTCGACT GTT-rCGT'rGA GTACTCATCC CGTTCTAGTA GTCGGTTTGA TTrGAGACCAT GTTACACACC TACTCrCCGT GAATT'rTTCT TrTTCCCGTAT 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 ATCCCAATCA GTAGAAATAC -GCTGAZ-TAAT AAAGCTATGA TTATACTTGT TCATCACTCG TCCTCCTCCA AACGAAATAC AAATTTGAGA TATTTTCAGG GCAATGATAA TGGATGGGGT GGCTAATGGT CTGTCTGGAA ACCCCTGCCA GTTTGGCTAG CGCGAGCTCG AAGCTCTTTT AGACGATT'rT TTAGTTGCAT CAAATTCAAC GG'TrGGATA TCCTCAATAC GTTGCAACTT TATCTACACG TCGTAGCTTT ACCCATTCCT CATCAACATC CACAACTTCC CAGTTATCTC GCCCAATATA CACTCCCGTT CATT'rCTTGT AATAATCTCG ACATTTCTGC GTTTCC'rTTC GALTTTTATTC TCTAGTTTCT TGATTTTTTT AGAATTATTA TAGTATAAAT CCTAGTACCC ACATTATAAC TCCTTTCTGC TTCATTGTAA CATATCTr'r' TCTTTTGAC AAGTATAGTr TGTCAT'TTTG CAAAAGAAAA AGGTCAGGAG TAGGTTCCTG 'rACTCTTCTA AAATCTCTTC AAACCACGTC AGCTTCACCT ATAAT'rGGTT CCTTTCCAAT TC'rTTTCGCT CAAGTCTTTT GAATAAAAGA AAATCATAAA TTCCTA'rrrC TTAACTTGAA GTCAAAAAAA TTATGATTTT' ACCACTTTAT CTATCATTAA TGCCGTAGG'r ATGGTTACTG ACTTCG'rCAG TTTCATCTAC AACCTCAAAA TCCACAACCT CAAAACCATG TTTT-GAGCTG CCATGN'TG AGCTGACrrC GTCAGTTCTA CCTGCGGCTA GCTTCCTAGT TTGCTCI'TTG CAAAGATTC 'rGAGAAGTrTr 'GGCTGATTG TTTGGTTGTT CTTGACCGTC ACT'rGTCCGC CGGTCTTAGC CGCAAAGACA TCGGCTGACT AATCACGCTC TGCTTTGAAA CCTTGTTGGC TATTTGCCCC TTCGCCCAAG ACTGCGATAT TCACACCTTG CT'TTrCAAGG ATGAGAAGCA CAGCAGTTTC AGGGCCTCCZA AAGTAAGCAA 346 CCATGTrTG
ACTTCGTCAG
TCCACAACCT
ATTTTATTG
TCTCAAGTGA
TTTCGACTTC
TGAACTGAGC
GAAGAGCCTG
ACACATCTAG
ACCTGACTTC GTCAGTTCTA TTCTATCCAC AACCTCAAAA CAAAACAGTG TTTGAGCAA AGTATAAAAT CCTAGTTTTr CACTTGCACT TCTTCTCGGG GCTCTCTCCT AGGGTGATGA 'TTTTATTTA CGGTTGAGGT TACCAATTCC AAGGCCTTGA GGCGTTTTCG ATAGGGAGGG GGCGCTCTAC ACCAAGTCCA AAACCAAATC CCAAACCA'rC GTAGCGACCA CCCGCACAGA
CGGTCAGGTC
CCAGACCACC
GACGCACAC
CATTCTCTAC
ATTGCCCTCA ATCTCTGTGA TAAACTCGA.A AATGGTGTGG CACCATATTG GTATCGATGA TGTAATCTAC TCCA.AGATTT ATCAAAATGA GCTrGGCTT'r CI'CATCAAG AAAGTCCAAG TGCCACCTTG TCTTCT'rTTT CCTTAGAGTC CAAGACACGA CCTCCAAGCG ACGTTGGCTA TCCTTAGACA AGGTCTCCTT GAGCGG'rGTC 'rCAAGGCTTG GCGGTAGGCT GCACGGCTCT CAGGATTTCC AAGAGTGTTG TGACACCTTG AATACCGAT'r TCCTTCAAAA AATGGGCTGC CATAGCGATr' CGGTAGCTGG ATTGCTAGAG CCAAAACACT CAACACCAA'r CTGGTGGAAT GCCCTGCCTG TGGACGCTCA TA-ACGGAACA TAGGTCCCAT GTAGTAGAAC TTTGCACTTC TGGGGCGAAA AGTTrTATTT'T CCACATAGGA ACGGACAACG CTTCTGGACG GAGGGTAATA TGACGGTCAC CCTTGTCATA AAAATCGTAC TTACGATATC CGTTGTATCT CCGACAGAGC GACTGATAAC CTCGTAATGC GCGTGCGCAC TTCTGCATAG TTGTAGCGTT 'rGAAAATCTC ACGGGCAAAG
TTGTAGTAGT
TCCAACATCT
ATAGACGGCG
AGAGGATTTT
AAATAGTCAA
AGGTGCAATI'
GTTTCCACAT
TGGCGCAAGC
T'rGCTTGGCT
GGTGCAGTTC
ATTTCCTTGG
TCAAAAATAG
CCCTCAACGT
TTTTGTAATT
GGGATTAAAA
TCAAAAAAAT
ATACGGCTAC
CTAAAAAATT
10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120
ACTGCCACTT
TCATAGGGAA
CAGTAAGAAA
AGCTAGCAAG
TAGTGAAAAA
ACTAGTACGA
AGCAGACTCA GtAGGTAAAA TATCCTGCGT TCCTTTTGGT TCCTCTTTAA ACTTAATACT CTTATTTTAC CATAAATAGA AAAATTAGGA TTTAGATA'rC ATTNTTGAGA TTAAGAAT'rG GAA.AGACCAA CAAATAGCAT CCAAGTCAAC TGTATATTCC CAAGCTGTTC CCACAGGTAT GGATAAGGTA AACAATAGAC GCTAGAACCT CI'GGAGCTAG ATTTTTCATG AGCATGGCAC TAATC~TTGG TTGAACPTA CCAGACACAT ACAGAGTAAA GAAGAGAAAT AGCAAACCAA GCACGACTTG ATTGAATAAA TTAGCCAAAC AGGCAAGGAC TGCTTCCCAA TAAAACACAG AATTGATTGA GCTCAAAAAG AAGATATTAT GACAGCAAAG AGCATAAAAC ATTTCGGTAA ATTGTrAGGA ACCATCAGCA GATGACAT'rG TGATACTAGG AAAAAGCAGG AGCTAAGAGC CAGACTCCGA CAACTAGACT AAGTCCTACG GTCTCCCACA TCATCAA'rCT AATAATCATT GCCCGTAAGG CTACTGATGA TGACTGATAC TAAATAGTGC CTCTGTATAA GAAAAAT'rCA AGAGAGAATG AAATTCCACC CAAAGCGCCA CCCAAGGAAT TAATAAGCAA CAAAGTTTrT CTGTCCACTr TTAAGAAAAA CGAGACGTAA ACTGGTCTTT GATAGAAAGC TTCTCATrTT'vTA" T
ACAGGCTCAA
CATTGATTCC
AAGCTTGACC
TTTGCTTTTT
CGCAACGAGA
ACCAATAGCT
CCTA.AAA.AGA
GAAAAATTGT
GAAATATAGG
GGATAGTGGC
TGACCGATAG
TGATGAACTG
TGAAAAAGAA
ACGCAGGCCA
AGAAAAGGCA
CTGAAACAAA
CCCTGCAAAA
TAAGCCTCCA TCAGATCATC CCTGCAAAAT CACTGATGAT AAGAGACTAG CTTGCTGAAC CCGCTATAGA CCATCCATTT ACTGTAAAGA GGGTCGGAAG TTCAGCTACT TTTTCCTTAA TAAGAGGCAT ATCACrAATG ACATTGATCA AACACAGGCT AACTAGGGCT GCTAGAAAAA ATAGAACCGC GACCTTGTCC CTCGTGTAAT CTGCCCGAAT AATCATGACA ATATTCGCCA TAGCAACAGC 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13104 120 180 240 300 360 AAAAGATGCT TGTGACAAGG TCGATGCATA-GACGATAAAG ACCAGGTTGA AAATCGAAAC ACCAAAAGCA TTGAAGAAGC GTGG
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 19250 base pairs
(B) TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCGGGCAAAT AGTTTTGAAC TTTTCATCAT TTTCTCCTT AAAACTTTCT GACTCTTTTC AGAAAGTTGT CAACAGAATT TTCAGAATTT TTGAAAATTA AACATCTTTG CAAAAAATAT GAATATCGTA AGCGCGTCAT AACAAGGTAT TGGAGCTCCT CCTGTATACT ATTAGTAAAG TAAATATTGG AGGATATT CCTATTGTTC CTGTAGAGAT TCCACAATCT CGTCGTTTTG ATTCTAAAAA ATTCTrCTTA AAATTCGTAT TGGCA.AGCTT GAAGTAAGTT TTTTTCA.ATC
CTCCATTATA
T'N'TTCAAAC
CTATCATTCA
AATGCCACAA
GAGAAATGAT
TCTCAATCTC
348 GAAATGATAG AACAGCTTTT GGATAAGGTG TTGCTCTATG ACAATTCATC TATCTAGCCTr AGGGCAGGTC TATCTCGTGT GTGGGAAAAC TGATATGAGA CAAGGAATCG ATTCACTGGC TTrATCTCGTT AAAACCCACT TTGAATTGGA TCCTTTCTCC GGTCAAATCT TTCTCTTTTG TGGTG-GACGT AAAGACCGCT ATATAAACGC TTTGAGAACG TCTCGCACCT GAACAAGTAG GTAGATTGAA ACTAGAATAG CTGTCCTGAT CGATTTGTCC TGATTTCTAT TGAAATGAGG GTCCATCTCC GATTAACGAT TGAAGTTGAT TCATGACATC CGAATCTCTr TCCACACTTG GCAAAGCCAA TATTAGTCGG ATAACGAGTA AAAGATAATC TTTATAACCT CT'rGCGAGAG CCTATCGGGT TCTAGAGAGT TTAAAGTCCT TTACTGGGAT GGTCAAGGAT TTTGGCTACT GCAGACTGAC TTGGCCCAGT ACAGAAAAGG ATGTCAAAGC ATTGGCTGAT GAAAGCTT TCTATCACTC CAAAAATATA TACACCTCTG CTTCTAAAAC AITGT'rAGAA ATCGATTTTA 'rGTTATTA'IT TCAT'rTTAC'r ATAAATCCAT ACTTTC=T TATACTCATC TGCTT'rCAAA GGACTTTATC ACCTCCTTCT CCAGTCCTTG TTCCAAAGTTI CGAAAGGCTT TATTC=TAAA TTCAATGGGG TTCATCTCTG GTGTGTATGG AA'rCTTTAAG GTACTTGArrr TATGCCATAT ATCTGGATAA .GCTTGTGAAA GCTCCTATTC
CAGAAAGTCG
AAGCACTCTA
TATAACATCT
TCCACGTTTA
AGGAATAAAT
AGCATTGTCC
CTAAAGCCCC
AGACTATTGA CTCAGCCCTT ACTTCATGCG GATGAAACCT GATAGCCATC TGACCTACTA TTGGACTT TTGTCAGGTA AAGCACAGAA ACAAGGGATT ACGCTTTACC* ACCATGATCA GTGTCGAAGT GGTTCAGTAG TACAAGAATT CCTAGGAGAT TAT'rCTGGCT ATGTTCATTG TGATATGTTG CGGCAGTAAC TTAGGACTTT AGTCCTCTAC TTCTGCCTAT GCGATAGCAG TCCAAGGTTT AGGAGTAAGG CGACGCTAAG CTTGGTAAAC TGCGAACAGC TAGAAGCTTA TCGTCA.ACTG GAAGAAGCTG 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 CACTTGTTGG ATGTTGGGCG CAGATAAAT1C ATCCTTAGGA AAAGAGACTG GGAGGCTTTG AACCCCTACT GGAAGACTTC AACTAGGAAG GGCAATTGAA AAGACGGACA TCTGGTCCTT TGGGACCGAG TAAAAGAGTC CATGTGAGAA GGALAGTTTTTr TGAAGTGCCC CCCAAGCAAG GCTAAAGGTT TAGCCTATTG TGATCAGT'rA T'rTTCCTTGG CCAGCTGATG AACGGCTACA GAAACGTCAA TTTGCTTGGT GCCGTCGTCA GTCAGTTTTA TACAGCCTCA AGTATGAAGA AACCTTTAAG TCCAATAATC TAGCTGAACG CGCCATTAAA CAGTGGACTC T'ITTAGCCTA AGCTCAGTT
GAACATCTCC
TCGGGTTCAA
ACCATTTTAA
TCATTGGT'rA
AAAAAAACGA
AGGAGCTAAA
ATAGTGCGTT
TTCCTTTCCT
GGGTGGTTAT TTTTAAAAAA GCGAGGGTGG TTATTTTCTC AAAGTT'rTGA GCAAGAGCTA TTATTATGAG TTTGT'rGGAA ACAGCTAAAC GTCATCAATT GAATCTATAA CAGTACGCAT CGACTGCTAA AATATTTCTA TAAATCAATT
AATCGATTTG
CTTCTAGAAT
TTACCGTGGA
ATATATGACT
CGAAACTTTG
ACGAAATTTG
CCTTTATTrCT
CCTCGACTAT
AGAATTAATC
ATTTATCCA
TCCTCAACAT.
349 'rrCATATCTT ATTACAATCC ATTATAAATA GCGAG3AAATA TCTATCCTAT GTCTTCCAAA CGAGGAAACT CTCGTAAACA AAGAGGN-r AGAGGCCTAT CTAAAGTTGT ACAAGAAAAG TGCAAATAAG AAATCTCCAG ATTAGGAAC'r TCTCTAGTCT GGAGATTTTT CAATAGACTT CGI-rATrGGG CGGTTACT AAAACTTCAA AAAACGGATT TTTATCGCTC TGAACATCAA AAAAGAAAGG TCCTTT1CTCA AGCTTAGCTT TTCTCAACC CACTACAGTT GACAAAGAGC ATCAAACATG AAGCGCAAAA ACAAGCCAPLA AATCCGATAG AATCGCTATC CAAGTAAGAC ATTTCCATCA AATACG'N'CA ATTTTACTCT TGTTC'rACTA ATC'rCGTTTT GATTTATTAA AAATATACAA TTCAGCTTTT CCTCCAAACT CTATCCCTGT ATAGCTCTGT ATTATCTTAA. CAACTTTAGT AGAGACATTT AATCCGGAAC CCGTAATCCA AAATCCTCAT CTTGTGCCAA GCTAACAGCA GTTTCAACTG CTTGAAGAAG AGAATTTTCA TCAATCCCTG CCAAAATAAA TCCTGCCTTA TCTAAGGACT CAGGACGTTC TGTACTTGTA CGAATACATA CAGCGGGAAA AGGATAACCT TGACTAGTAA AGAAACTACT TTCTrCCGGT AAAGTTCCCG AATCAGATAC TACAACAAAT GCATTCATCT GTAAACAATT ATAGTCATGG AATCCTAGTG GCTCATGCTG AATCACACGT TTATCTAGTT TAAAACCGCT CTCT'rGTAGC CI'TTCTTTG ATCTAGGATG GCAAGAAkTAT AAGATTGGCA TATTATACTT TTCAGCTAAT TGATTAATTG CTGTAAAGAG AGAAATAAAA TrTTATCTG TATCAATATT TTCCTCACGG TGACCTGAAA GTAAGATATA ACCTCCTTT TTCAATCCCA AACGTTCATG GATATCTGAA GACTCAATAG CAGATAAATT TTTATGTAAC ACTTCTGCCA TAGGAGAACC AGTTACATAT GTGCGCTCTT TAGGTAAACC ACACTCATGT AAATACTTAC GTGCATGT'rC AGAGTATGCT AAGTTAACAT CTGAAATAAC ATCAACAATC CGACGATTAG TCTCTTCCGG TAGGCACTCA TCTTTACAGC GATTGCCAGC CTCCATATGA AAAATTGGAA TATGTAAACG CTTGGCAGCA ATAGCTGATA AACAAGAATT TGTATCCCCT AAAATCAATA AAGCATCTGG TTTAATTTGA TTCATCAATT TGTATGAAGT ATTAATAATA TTCCCTACAG TAGCACCAAG ATCATCTCCA ACAGCATCCA TGTATACGTC CGGAGTGTCT AACCCTAAAT TATCAAAGAA AATACCATTT AAATTGTAAT CATAGTTTTG TCCAGTATGT GCCAAAATAA CATCAAAATA CTTTCGACAT TTAGTGATAA CACTACTTAG ACGTATAATC TCTGGACGTG TTCCCACAAT AATCAATAAC TTAAGTTTGC CATTATCTTT AAAGTGAATA TCACTATAAT CTGTCTTAAT TTTCATTTAT TTCTCCACTT GTTCAAAAAA AGTATCTGGA 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 'rGTCTAGGAT CAAATGACTC AGATTAATAA TATTATGTGC 
GACACTTCAA AGTTCAGAAT TCTTGTATTA AAGcAcGAcc TGTTGCCcTT TGGTAATGCC GTTTTTAATA ATTCCGTAAA AACTTATCTA CTGGTAAATA TGAGGAATTT CAGGCATAAC ACAATCI'CTC CTAAGGTTGC CTAGGTAAGA TT'rGTAATCC AGACACTCTT GTATCAAATC ACTTGAATAG GTAAATCGTG TTAGGACGGC ACCACTTCCC 350 ATTAGcCCAC ATGACAGTAA TTAGATT'ITC ATAGCCCGGT ATCATATGTA TTGCTTCAAT AGGATACTCT TGACCGTTTT CATCCAGCCC AGAAACAACC ATGAAAAATT CCCACTTAGA AGGTTTAGAA ATATTAACAG-7T~GACC ACTACCTCGT TCATCTATAT TCATTTTTAG AGATAGGTAG GTAGAATACA ATTrCTT'r'r' TAAACTATCA GGCTGTT'Tr TAAATGTTC
TGTATCAGAA
CTTATCGCCC
TATCCTACGC
ATGATGCCA
CGTATT'NTCT
AGGAAACTTA
AAACGATCCC
TAATAGAGAG
TCCTGATGGG
CAATGCAGTT
TGGATCATT'r ACGATGAGTC GTTGGTACGT ATCTAGATTA CAACGATGAG ATCAATATAC AGCAACTCCA AGCTAGATTA TAACAGAAAG
AGCAGTAGTT
GAT'rTCCTT'C
ATTCTACACT
TTGCTACAGC AGAATTGTAG ATAAAGATTC GGGAAACGGT AAACTAAGAC AGGTGCTCCC
GT'I-r'CTT1TC CATATTCAAA GAAGAGTTCT TCCCCTGCTA GCTTAGATTG TCCATATATA CAGTTTGA.AA ATCGGCCTTC TAAACTAGCT TGAGTAGAAC TTGAGAGTAG AACAGGACAA 3960 4020 '4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 GTGTTT'rCAT ACTTTTCTAA AATCTCCAAT ATGAATTCAT CAGGATTCTG TGGACGATTG TTCTTACAAT ATTCATCTAA 'rAAAATCGGA CCAATCTCTA AATTAGGACG AGTCCTATCT ATAAGATTTT TTCCTACAAA TCCTTTCGC'r CCTCCTTATT TTATATGCTG TTTTAATAGT ATATCCTTGA TAATTTTAAT GTATCTTAAA TCACGAATTG CTGTCTGTAT TTCATCTAAT TCCATCAAAT CGGTATTATT ACTATTGAAT TCTrTGAALAT ACTTATCATA GTTAAGATTA CCCAA.ATCAA TTGCATTTGC GCACTCTTCG CCGTGTCTAA TACCTATAAT CTTAATATCT TTAGCCAACA CTTCAATCGT ACATGCTGGT CCTTCT'rCAA ATGCAAATAA AACCAAGTCT GTCATCCTAG GTTCAGTAAT rGTAAGAGCA AATCTACTTG AAAAACCGTA ATTITCCCTCC ACACCAGCTA AATGGAATAC CAAATCGGCC TCTGTATCAC GATCATACTG AAAAATCTCT CGTCCATCTT TCAAAGCTTC CAGAGTACAG CCTGTGATTA AAATATrTTTT TAACTCTCTC GACAATACAT
AATCATGCCC
GATACATTAT
AGATTTTACA TCTCTTCGTC TGCTACCATA TCTAGCAACT TTCTTTTAAC TTGCTCTACA TCTGTCAACA AATTTCTATT CTACTACCA CGATTATCAC TAGGAACTCT ATAAAALATCA TTAGTTAATA GT6TTTCATA CCTTTTTTCT TGTTCTGAGG CAAAAATTTC TGATACAGCC GCTTTCTGAA CTAGTATATC TCCAGATTTC ACTGCTTCTT CCAATGTCAT CACAAAACGT TTTCCTTGCT TAATTTGCTC AATCCAAAGA GGAACGACAG ATCCACGCCT ACACAGAACA TTCCCATAGC TGCTCAGGAT TTACCGTCCT GGACTTAGCA ACAGCAATCT GTTCCCATAG CATTGACAGG ATAAGCCGCC TTATCTGTAG GAGTCACACA TATCTTTGTA ACACCAGCTT CGATAGCCGC AGTGAGGACA GCTTCTACAG GGAAAAATTC ACAAGAAGGT TAA'rCCACAC CATGCATAGC ATTTTTACC
TTCTCCGTTC
ACTTGTT'rAA
GAAGCTAAGT
TTTCCATCAT AGCCTTGGAT AAAGACAGAT AACTTGCTTT CCAAAATGTT AG 1 11"1TACC GAGCAGCAGC GTGAAAAACA CACGCACATC TCCAAGGTAA ACTCATGACG CATATCATCT CATCTGTTTC TAAAAAACGC TTAGGAGAGT TTTTCCTGTA CAATTTTCTT ACAAGCCGTT AAACGGAT'rT TCCCAGCCAC TTCTGGTACT TTTACCTGAA TGTTTCTT'rT CATCTCGCGA AAATATACGA ATCTCTGAGA TTGAGAACCG CATTCCCAAA TGAACCTGTC CCTCCTGTAA AATTGTGACA TATATTACAC TTCTCCTTCT AGTATGTCTG CCATCTCCAT ATGGATTTGA AGCTTGACTC AT'rGCTTGAT AATTCTTTAA AATGCCTATA AATATTATTT TCATCAGCAC GCTTCAATTC CCTCTGGACG TTCAGTTGTA TCTCTCATAA AAACTGAATC AT=~CTAAT CTACAAGTTT CAAAGTCCCT CCAAAACAGG TTTTCCTAAA
CTTGGAGCCT
AAATTGTGAA
CTTCCTGAAT ACCACCACTA TCTGT'rAAAA rTTAAATAACT TCTTGATAAA AATCTAATAC TTCTAAAGGT TCGATCATCT TGATACGTTC ACAGCCACTT 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 AGT'rCTTCCT CAGCAATTTG GCGAACACGA GGATTCATAT GGATAGGATA. AATAGCCTTG ACATCTGAAT ATTCTTCAAT AATCCTTCTA ATTGCTCTAA ACATATGTCT CATCGGTTCA CCAAGATTTT CACGACGATG AGCTGTAATT AGAATAAACC TGCTTTCTCC TATCCATTCT AACTCAGGA'r GCGTATAGTC CTCTTGAATT GTAGTTTGTA AAGCATCAAT CGCCGTATTA
CCTGTCACAA
TGTGTTGTG
GGATATGGTG
TGTAAATAAA
ACCAAATCAG
ACATCAAATA
AATGTGTCCA
TCAATATTCT
CGAGTTCCAA
ACGGrTTCTTA ATATGC'rCTC TGGAGTTTTT TAAAATGATA CTGAGCCAAA AATAGATATC -GTAAGTGCGC AGGCCGCCAG TGAACTAGCG GTTTTTCTGA CTCTAAAATA AAGTTTGTTT ATCTTTCATA AGACCTGATC CAACATTTGA TACGTGTTCT TAACTCTTTG ATACTACAAC TACTTTTTTC AAATAAATTA GATAACGGCT CCTTCTCTTA AAAGATTATC TTTTGAAAGT ACCCCAACTG CTTGACGATr' AAACTCTTCA AAACCAGCTT CAACATGACC AATTGGAATC AAGGTCGTAC TTGTATCCCC ATGAACTAAC GCCTTCATTC CTTCCAAAAT GCCAATGGTC ATAGACAAAT CAAAATCGGG AATAATCCCA CGGTGTTGGC CCGTAACGCA AACTAATGTT ACCAAAGGAC ACATCTTGAT GGCTTCTGGA ATATATTTAC TTACTCCTAA CAAATAATGA AATCCATAAC ACCACCTCAG ACATACTTGA ACAAATAGCT AATGTTACTA AGTTTGGACA ATCGAAGCTA TCCTAAGACA GGCCATCCG2' GTACTTAAGA AAATCTGC1'G ATTTGAAAAG AATAAAACTA AATACTCTTA ACCGATTGTA TATAGGATTA TACAATCATT TATCTCTTGA CT'rTGTAAAT AGTTGCAAAA 'rTGGATAAAA CAGATAAGAA AATGATAATT AGCAACTAAA TTCCCAATTC TACTACAATA AATGTCAAAA TGCATGCATC TTTT'CAATTC AGCTAACAAA TAAAAAACTG CAATATGGTG TAAATTAGAA TCCAACTATC CTTCCAATCT TGATATCATG AAACCAAAAT 352 AACTAAAATT ATCAGACAAG ATATAGTTGT CATTGTAGTT AA6ATCATAGA ATAAAAACTA AAACGGTATA TTNTTCACCA TCAAAACTCC AAAGATAATA TATCTTTAGT ACCTATCATA TTGCTGCTGA AAGCAGTTGC AAAAACCCGA AATGACTGTC AATAAATAGA GGATTCCTTT TAATTCCATA ATAA'rGAAGG CTTCCAATAT AGGAATCCAT TTGTAATGAT AGTTNTAGAA CACGAAATAA AAAGTCAAAG AAAAAAGAA'r AT'ICTCTCTC TCGAAATGAT AGATAAAAAT 'rTGAGGGAGT AGTACAAGCA CAATCACCAG TTGAACATAA
ATAAATATTC
TCTTTrCACTT
GCAACAAAAG
CCAA?1'ATAG
GGAATAAACA
TGCGGATA'rA
ATTGCTATCC
GTrAAAGACcC
CTAATCCCAA
TATCAATAGC
CGGGTAATAA
AAAGAATTTG
TAATCCGATT
AACTAT'rCGC
CCCAAAAGGC
CAAAAATAGT
a a AAATCT'rrAA CCCAAACAGA AATCTATAAG AAACTACTGC AAAATAGAAG AATCATCTTTI ATAATATAAG GAAT'rGCAAC ATAAAAATAT TGGTCACTGT ATTATTGGGA TTT'GCCACAT A'TTTTrTCAA CTAGAGTATC TTTACA-XTAT TTTTTGTAGC GCTATTAACG CTTTA.ACATA AAATACGGGA GTGTTAATAA CTTGTATTTT TTATAAATGA ATCTA.AATAG TCCAAACTAC TTTTAGATGA AAAGTTTCAA TT~CACAACGT TGTAACTCTC TGCAATGGCA ATCACAGAT r 7500 7560 76 0 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 AATAACCATT CCATACGCGT CTAGCGAAAG CACCCTTGTC AGGAAATAGT AATTAACAA TATTCAGAAT ATAGAGAGAA AATTCTATCA ACTT'rCACGA TTCTCGCTT'r CAACACCAAT GTT'rCTTTAC AATACTATTA CAATTGCTCC ATAATAACGT TATTAAAACA TGTTGCCACT T'TTTCAACAA TGATGTCATT T.TCTAACTAA TI'rTCTAAAT AAGTTACTAC AACATAATAG ATTTTTCAAG AGCTTTTTTG CTTCTCTTTT CAATTGACAC T'rAAATCAAA CGTTTCATGC ACTAGTCCTT CCAAAAAAAG TC'rGAAGGTA TTGTTATCGG
ACACTTGAAT
GCTGTTTTTT
ACCCCAACAT
TCTGCAGGAG
TCCTGATAAT
TTAGGATTGT
GAATGATTAT
AGCTGCTCTG
ACTGGAGCCG
CAAATAAAGA
CTGGATGGCA
GTAATTTACA AGTTAAAACC AATGATACTT GAATTGAAAA AAGCATCTTC ATAAGGTAGA TTTCTAGAAA AAGACTAATT AGTGAACAAG AATTATCTTC
ACATCTACCA
CAATCCTCAG
ATGGAATCCG
GATTCCGCAA
T'rATCTTTAG TTTTrTTCTTC TCTTAATTTA CTTGAAATAA AAGGCGACAA ATGCTTCAAA GAATCAAATG 353 ATTCTCGATC ACGAACTGTA ATAAATTGAG CATGA'rTAAT AA'TTCTT TCATCAAAGA A'rCGTTATTA GGCCCTGCAC CAATACCTAA TACTCCTATA AATATGAAGC CCAAATTCCC AAAGGTAAAA ATCGTTTAAA TTGGATTAAA AACCTGCATI' ATGCCCTTCC CCAAAATATC CTCCCGGGAT.ATACAAAATA ur rTTTAGT AAAACTTTGT T'N'TGGCGAT A?'rCTTCAA GTACATTT'GA ATGGATTATA AAAAGAAACT TCATATCCTT TAGATTCTAA TAAATCATAG CGTAAAGATA ATrCACCGTAA TTACTTGAAC CATAATCCG;T TGCACCATGT
ATACCATAAT
GGCT'TrrAA
TTATCACGAA
GCATCTGCTT
AAGAAATCTG
ACAATCTCAC
AACATAATTT
TTTTCACCAC TATTTTTTCA ACCTCCTAAA AGGACGATAA ACATCTATTG AACTACTTCT AAAATAAATA ACrTTTTGAGA TTTTACTTGT ACTAAATATT CCCAAAACAA AACTCCAAAA TAATTCTTCC ACAAAAGAAG AGCCTACAGG CGCTGATGCT TTATCAAAAA AATCACCAAC
AATAAATATC
CACTAAAAC
ATAATCAAAC TATACATAAT AATAGTTGAG AAATTACCGA AGTGCGTAGA AATGTCATCC AGAAGCTACT GTTAGGCTGG TACAAAATCT AATGAAGAAA ACGGTGCCCC CTAATAACAA AAGTACTCTT TTCTTCGAAA CAAGATTGAA AACACCTGGA AAACAACATT ACCCAAAAAA CAAGGAGAAC ACACCAGGAA TGCCTCTGAA TATGCTGAAT AGTTAT'rACC CTAGAAATAA TTTCTCTTCA TTTTCCTGAT ATAAAAAA)AT AAATAGCCTA ATGAAATAA'r TCTTCAT1'AT CCCCATATAC TCATTGAAAA TACAGAAAAA TCATA'rATAC TAGAAAGCAA ACCCATATAA
CATATTCATA
TCAGTCCCGA
AG.AATCAAA
CTAATACATT
AAGTAATCCC
TTTTACGACT
TAGTACGCTT
GCATAAGTAC
AGCTATTCGC
AGCCCACTCC
AATTTTGTAC
CAGAATAACA
TATAGAAGTT
TTAATCCAAA
T'rGAAAAGCT CTGAAAT'rTA ATCGCCATCC AACACCACCA TAGTAACCAA ACTTCCAAAA TAACCCCAAA AATTTA'rTAA TAACAACCGT TAACCATCCA ATAGGAAAAA TTGATAGGAT TGGAATGCTA CTAGGCACAA CAGTTACAGC CTCTGAAAAT ACTTCCCCTA GTATATTCTT TAAC-TATATA CCTATAGTAT TCAAGTCGA.A TAATAGAAAT ACACTTACTA TTAAAAATAC TAAAGATTGT GTGTATACTA AAACCAACC TCCTGTTAGG ATCATTATCA AAATTAGGTA TATAACTCGG GACAGCTTAT CTGAATAAAA TCCTAAATCA TCTATTATTC CTGAACTAGC CGCTCTAACT GCTAGTACTG TTTTAGAATC TGTTAAAATC CTACCCGCAT TGTACAAAAT TTCTGAATGA TA.ATGTACCT TTCCATCACT AAACAALAATC CAAATTATAA AAATATATGA ACTAGGGCTC CACAGCAGAG TTGTTTGAAA CATAAAAAAA TAAGATAAAA TCAGATACCA 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 0 0 000* 0000 0 0000 00 00 0 TAACTTTTTG TAAAATAAAA CCAGTAATTT GAAAAATAAT ATATAGACGG AACATAATTA GATATAAGAA AACCATTALTT 10980 354 CCAATTATCG AGAGTCCAGA ACAAGTAACA GAAAGCAAAT ATAAAACTTA ATGTCACTAG
TGTCACTCTA
ATAACGATTC
GTTATTTATT
TAGCAACACA
TTGCGTCACT
CAAATATACT
AATAATTTIAC
TICAAAACGAT
GACTCTTCGT
GTATCTGACG
TTGTCTGCAT CTATATCTCC !TTrATTACAC TAGCTTGATA ACAAATATCA TAGAGTCCAT TGCATTCCTC AGATGTrAAA GACACTACTT TGATAGGTAA GTAACTAATG ?rNTTGTCA
ACATTTCTTG
CTGTCATACT
TATCTT'rCCA CATCTACTrc ATAAAATTTG TAATCCCGAT GCCTGAGCCT CTACTAGAGA CAAAAAAACA TCCATCGCAG ATAATAAA'rC AACAGGCAAC CCCTCATATT TAGACGGAAG AGAAATATCA GTCCTTCTCC CTAAAAATAG TTTCTGTTTT AATTTCTGCT CATCCTCACC TTTGATTAA.A ATGAGTTCT'r TTAAAACGTT TAGGCGAGCT ATATTflCCTA ATACGAACTT TTCTCTAACA TCTGACAAAA ATTGATACTT AATTTTTCCG TCTTTATACG CTTTCTCTCC TGCAAACCAA TGAGTTGCTA AGATTTTTAC TTGAAAACTG T'N'TCTGTTA CATAAGCCAT AATTATTTTA GATAAGATCA GACCAATTGC CACATATGGG GTCAGATTTA GTTCTAAAGC ATTACCAACT AGGAGTAAAA TAACATTTCG AAATAAATAA CI'TGGTTTT ?I'TGATCTGA ATTTGACACA TCTAATTCTC TACGACATTT TTTCAAATCA A'rTGCATTAA AAATAATTTC ATATAACCAC TTAGCCGAAT CTTCCCCACA CAAAATTrGTT ACTAATTTAC GCAATACTT'r ATGACTATGA ATAATTCTAA 'TrTACAACC AGATTITATAG CCATGGCAAT GAACTATATC AGAGAGAAAC TGATGTAGAG GCTTTTTCCT CAATTCTT'TC ATTTTATCCT CTAAAAATCC 4, 0 goo.: .0 0 &too0 S S 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 ATAATCTCCT rrCTTTATTA T'rCTAGCAAG TAATAGAGGC ACATGATAAA CCTTTGCACC TTGTTCTTTr CCAGGCACAA TAAAATCAAA TTGAAT'IrTT TTTCTATCAA TGTGAGAATA ATAGTTGAAT AGAAAACTTr CTACTCCACC ACTATCTAGT GTTGTAAATA GA'rGTAATAC TT'rAATCATr CTTCTTCCTT AAGCTTAAGA TTCGCTrCTC TAAT'rCTATT TCTGT'T'TT GTT"=T~AA ACTAAI-rCTG TCCATGAAGT TATCACAATT CTTAATTAGC TGTTrCCTGT CAAGGTTTTG AATATACAAA GCCAAACAAT CTTTTTCCGA TTCATCCTTC ATAGGTAAAA CGAAACCAAA ACCATTCTCq' ATTGACACTT TTTCCATATA AGTATCTTCA CAAACTAAAA TAGGTTTATA CAACAATGCA GCAAAGTAGA GmTATTAGA CAAAGCATAG TCTAGTAAGG GAGTGTGAT-r CCCGTATAAA TTCAAAACAA CATCTGTA'rT CTTATAAAAA GACA'rGGTAT CTTTAGGCTG GAATGTGTCC ACCAAGTTAA CAT'rGCTGAT AT=r'aTTCT TGACAAAATT CCCTTAATTC TCCTGCATTA GTACCTATAA AATTCAACTG AAA'rCGACTG TCATTrGCAA AAAAATCGAT TATTTNTTTh Tr'rTGTTCT1 5 GAAAACGAAT TAAACCAATG 'rAGGAAAGTT GAATTGGAAA CGTACTATTA TTTTTTAACT GCTr'rACCTC GTTTAATTCT ATCATATTGG GTAGGTTATG GGTAGTAAAA TACTCTCCCA AAACGATATT CATTAAAGAA TTTTTCACCA ATTTTI'CATA ACTGTAATCA CGAATATCAT GAAAATCTAC 
TAAAATGAAA GACACAATAC TTTCTTTTAG CrrCTTTITTA ATTTCTTTTC ATTTTCCTTT ACCAGAAAAA GAAA'rACGAT 355 TTGGThAAAAA
ATTGTTTCTG
AAATATATCT
TATG1'AACGG
TGAATTTTAC
AAATTTATAG CCGTCTCAAG AACCAAACGA TAAACCAAAA ATTTTAAAT GAAAAGAGAA CAATATCATA TCATAATCAT ATAACCTAAT ATCTTACTT-A AGTAGTTTTG TTTTGTAATA ATCTCGTTAA TATTCTTIATC CCAATATATA ACATCGTAAC 'rAATAGACAG AAAAATTGAA GTAAGGAGT'r AGATATATAT TATCAGATAG AATTAT'rTT TCTTACTI-rC CCTCTCTAAA CATGTCTCCA TTTGAAAAGT GATTCATA GTAACAACGA GCTTT'CTTTC ATAGATAACA TAC'TAAATTT ACAAATATTT TTTGCCAATT CTAACATATC CACAATTTGC TTCTTCTACA ATTATTTTAG ATAATTGGTT TGCCTGCCGC CATATAAGAk TGTACCTTCC ATCGAGTCTC CTATTAAAGA AACTAACATA GCATCTGATT TCCTCCAAAG AACGTCTTCC ATAGAAGGAA ATATTCT?1'A GCTTTCATGC TTAACAATTC CGTACCATCT CCAACAAAAT AAATTGGTAT TCTTCTCTAT CAAACTGGCA GCTTTCAAAA TTGCCAATAT TACCAGCAAA AGTTAGGTCA ACACT'ETCTT TTCTCAAT AATTCTTTAT
TATAAACAGT
GTTCGAGCAT
CTAACTCTCT
ACTCTCATTA
AAACTGCTCT
TTGTCTC'ITA
GTTTTACATC TCGTTCGGGA CATCTCCTGA AATTGCACCT CAGGTATAGT ACGAGAAACT TTTT'ATAGAA GGATGGCATT~
ACTCCAATTC
GAAAATGAAT
TAGTTTCCAA
TATTAACTAT
ATGAGCTAAT
TTTCTTGGGT
ATTTTGTGCT
AGATTCATCA
GGATATGTCA
ATCACTAGCT
TTGTTTCACT
CnaTTTCTTA 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 GGGATAAA-AA GATC3-rCTGC ATATTGTGGC AAATATGTAA TCTTrTGTTC AATTGCTTCA CAAAATAATT TrI'AAATGAT GGACTAGTGA CAAATATATA CGGTAAAC1'T TTTTTGAGAT AAATTTAAAC AGCTTGAAAA TCAAGCCATC CCACCTACGG TTAAACTATC TGGCCAAACA TCCATACAAT ATAGAAACAT TATTTT'r
AATACACAGT
GTAGAACTAA
GGGATTGTAT
TGACGATAAT
ACTTCATGCC
TTATAATGTT
TATAAGCCAT ACCAGCCCAT GCCATCATAA CTGGAGACAA TTGGTTAACG CAAAATTCGA TCCATCTTTC GTTTTATACC TCCCCAATAA AACTCCTAAA TTGCA.AAGCT AAAATAATTC AACAATCGAA ATACAACACT TTTT'TTTCTA AACAACGATA TATCGTAACA CCTTCTATAA TCTCACGTCT 7TTTTTATTA CTGCATATAT CTTCCC?1'CA GCGTAATTAG GAATCCCAGC CAAAACAGAG CTTTTCGAAC TAAATCTTCA CAAATATCTG ACAACCTGAA TGGTTCTGGC GGCAAACAAA TAGTATTTTC AT'rGTCCAAT TTAACTTrCT TTCTTACCAC 356 TACCCTCTAC AATACCIT' CGTTTCAGTA CGTAAGGTAT TGTCTTAACT ATACATCTAA TA'rCCATTAT CAAAGACAGA TG'TTAACA'r AGTAGCCATC TAACTCCGTC TTCATCTCAA CAGACAAAGT ATCACGCCCG TTAATTTGTG CCCATCCAGT TAACCCTGGC AAGATATCAT TTGCTCCATA CTTATCTCTC TCT~GCAATCA AATC'rAGTTC ATTTA'rACCC GCTGGTCTAG GACCTACAAT AC'TCATATTA CCAACAAGAA TA'N'AAACAA TTGTGGTAGT TCATCCAAAG ATGTTT'rTCG CAAGAAAGCC CCTAC'TTTTG TAATCYATTG CTCTGGATTA TATAAGTTTC GAGGCGCCAC ATTTTTAGGT GCATCTATTT TCATAGACCT AAA'ITTTCAAA ATATAGAAGT ATTCTTATG AATACCAAAG CGTI-rTGCT TAAATA'rAAC CGGACCTTCT GAA'rCAAGTT TAATCGCAAT TGCAATTATC ATAAAAACCG GACACAATAT TATTATCCCT AT'rAAAGATA ATAATATATC ACCTAATCGT TTTATTATAC CGTACATAAA CAACCTCCAA CTATAAATTC TAT'rTCCATT TTTCATTCTA TTTCCATTTG ACAAATTAAA TCACGCAGTA CATGCAACTA CAGAAACTCA ATATATATTT GGTCACTCAA TGATTTTCAG AAATATAATT CTTTTATCCT CTACGTCAGA TAAAACTTTT CTCCATCTAA ACAAAATTTA TTTGTTTCAG TAATATATGA GTTCTCAATA ATGAATTAGA AGGTCCAGTI' CAAT'rATTCT TCCAAATAGA CCGAATATTA TTTGAAGACA TATCGGTTT'C TGAAATTGCA ATCAGTACAT AAGCTA-ATAA ACTGATAAGT ATGCTCTrGTA AGAA'rGCCAG AGTTATATTG TAGTCCCCT'r CCATACTATA TTCATTTTAT TPT~IrACCAT AATTTCCATA GGAACCGTAA ACTCCATACT TATTAACCGA GATATCCAAT- TTATTTAAAA CA.ACTCCTAG GAACAGTT'rC CCTGTTTGTT TTAATTG'rTG TTTCGCTT TGGATATCAC GTTT'ATTCGC CTCACCTGTT GCTGTTACCA AGATGGACGC ATCACACTTT TGAGTGATAA TTGCCGCATC AATAACAATT CCAATAGGCG GTGTATCAAT AATGATATAA TCAAAATATT TACGCAATGT TTCAATCATA TCATTAAAAT TTTTACTTTG TAACAAGGCT GTAGGGTTTG GTGATACAGA TCCCGATTGA ACTACAAATA AATTTTCAAT AT'rTGTATCA CATAAACCGT GAGATKAATC AGCTGTCCCA GATAAAAATT CTGTTAGCCC TGTAATTIrT TCACGAGATT TAAAAACTCC TAACATAACT GAATTTCGAG TATCGCCATC 
GATCAAAAGA GTTNTATAGC CTGCACGCGC AAACGACCAT GCTATATTTA TGGAAGTAGT TGTrTCCT TCCCCAGGGT TAACAGAAGT AACGGAAAT'r ACTTTTAGTT TATCTCCGC'r CAACTGTATA TTTGTACACA AGGCATTGTA ATATT1CTTCT GCCT'rCTTAA TGAACTCCAC 'TTTTTTTT GCTATTTCTA ATGTCGGCAT CCTTCTCTCC TATTTCAACT TACCCAAGTT TGGCACAACT CCCAAAAGTG TCATCTGCAA TGTATTTTCG ATATCTTCCG GACGTTTCAC ACGAGTATCC AAAAGTTCAA GATGAAGAAC TATAACACTA GTTCCAATCA CCCCTGCCAA AAAACCAATT 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 .15540 15600 15660 15720 15780 15840 15900 .1-5960 16020 16080 16140 16200 16260 16320 AGTGTAT'rGC CI'TAATATT TGGCGAAGAC GTTGTCACGT CAGAAACACG AG'rAATACTG GAGTTAGCG.A TACGGCTTGC CTCTTCAGGA CGGTATCAA CTGGTACTGT CACTTTAATT 357 GGGGATATCG CCGGCCTTGC ATAATTmrr GAGCAGCTAC ACTCGATCAT TAACTGAAAT TTATTAGCCA AACCTTTTGG
CTCCTCCAGT
TTCTCTCAAA
AGAGACAATA
CGTCAAATCT
AGTTTCAAAT CAGAAACAAC TTCCTCCAAA ACATCCTGCG TCTTTTACCA GATAAGTTCC TGCCTGCAAA TCCTGATTTG TGA'rTGCGAT TCAC'IACGTA AATTCGCGTG GTACTCGTAT GTGCTATATG CAAAAGCCCC CGCACCTGTC ACAAGTGCCA CGTTTCCACA AGCT'N'TAAC TAATTGAAAT Ar-ATCGATTT'
ATCATTTCTC
CCTGAGCCTT
GAGGTCTACC
AAAAATACTG
CTAAAT'rAGT TGATCCATTA CAATTTTTCG CGCTTCTCCG TATT-TTTGGG TAACAAGGTC GTCTAGATTG TGCATATCAC T'rGCAATGAC AGCTCTTl' TTCATGAAT'r TATAACGTTC GGACATGTGA ACTATTTACT TGCGTGTAAC TTTCATTATT TTCAAGAGCA TCATAGCGCT ACATCAACAT CTTGCTCAAG GCGCTATGAA CTATCAAGGC ATAACGACTA TCATTGAGGG GAACATCTGG TGTGTAATAA ATTTCAGCCC CCTTAGCTAT TTCCCGAACC TGAAGAAAGT ACATGCCCTT GCGACGGTGA GAGGTAGAAA CTGCCAAGAG AGCCTTGCTT TCCTCTCTTG TATGCGAATG GATGTCTATC ATTTCATCTA ACTACAGCTA AACTACTATC ATCTATTTCC TAAGAAGGAA GATCCATCCG ACCTGTCCCT CCTCCACTTT CTAACTGAGC ATTGACCAAA TGGATAGAAT CTTGCAAGCT AI'AATGATC GTTAATTTTr GAAGGATAGC CACAATCACC CCATCTGCTA GGGAGTAGCG CTCACGAACA ACATTGCCTG CAGGGTAATA CTrTCCATTC
AGCCCATATC
CAATGTGGGC
TATCGCGATA
TCGGAATCCG
CGTAAGCAA'r
TTTCTGCTAT
CAATGGTTCG
AC'rTGGGACC
CCCTCCATCA
ATCACATAGA
TTTAAATCTT
TTTATCATGG
GTACTATAAT1 T'rTTGTTGAT
AAACCGAGAG
GTATGGGCAG
AAAGGATAAT CTCACGGTAG TCAACCCCGG CTTGTCTCCT ATTCTGGCTr AACAATAAA CTATTAAAAT CA'rTAGCTTG CTATCGTATT TTGTTCTTTC AGGATTGTCT ATAAAAAGTT ATATGCI'TCT GCCATATGAG ATGAACCAAA TCCTGCTCTA GCCAAAAAGT TTGGGTTTGA GATCAGTTCT CGAACGCGTT AATGACTGGA GTAATTCCCA AGGAGTGTTC ATAC'rAAACT CT'TTTTTCC AGCTTATCCA GACCAAGTCA CTCGCCACTT CTTCTCTTCC GGAGTTTCAA CACCCCCTGT CTGTAGGATT GTCATCTACA TCAAAAACGA CATCCTGTAT AGCTGCTTTA GGTTACTGTC TGGCATTGCA GAGAATTTAC TTtATAATTC TCTCAAGTGG CATATTTGTT TTTTCAGCAC TTCGGTTGAC GGCGCCCGCG GTCACGATCG CCTGTTCTGA ATCAAGATGA TAAA'rTCI-rG ATCATTATAA 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 ACATCAATrC CACCCAACAA ATCAATCAAT TAGTAATTGA TATCCACTCC ATAGAGATT'r TAAATGCCCG CATGAGTCAA T'rTATCTTTT TAGGCATCAC GTGGCGTTGT GGTCAAGAGG AGGATGTTGA CATCTGATCG CGACACCGAA ACATAGATAT TGAAAGAC'rG ACTCTTAGAC CCCTTAGTAT AAATCTTTTT TATCTTCGAT TTTTCAAAGA CACTATTTAG GACAATGGCC GCTGCCAAGT AAGACGAACT CTGGTTGACC TCAGCTAGTA AT'rTCTGAAT ATTT'rCATrrA AGTTGCGTAA CATTTTCGAT CTCACTATCT GAGTAATTAG AAGTCGCATT TAAACGATTG AGCGACACAG AGCTGACAAG GATAGAGAAC TPTTTTATAGA TAATCAAGAG TAGCCCTACC ACTAGATTAA GATATCTAAA AGCAAGGATA CAAACTAACA ATAAATAAAT AGTCAGCAAA GAACGTGATT TTTTAAAACG TCTACTCATG TTATATCACT TTTTTACGGT AATGTCTACA ATCTTAGTAG ACTTCCCGCG AAACAAAAAT CGTTCAGGAC AGTCAAATCG ATTTCTAACA 358 TrCAAAAACG AAGTGAAGTr CAATCGCACA 18120 TCTAAGGTGr GA.ATGGACGA ATCAACTCCA TGATTATTNC CACCATCTGC GATTGGTACA AI-rTCTrGG
CTAATAGGAC
GTCTTAGGAG
TATCTCGATT GACAGTCATC CATAGGTGTC AATTCCACTA CT'rCTACTTT TTTAGTGAAT GCGTAGTCTG GATACTCTGA CTCGATGATG T'rAGTCTCCC CTGCAATCAA ACTCTTGTAA GTCAAATCGG TATTCTGACT TGACTTGATA TTAGTCCCAG TCGGTGCTGT CACACTCGTC GCTAAAACAG CGACACTGAT TGAATATTCT 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19250
GTCAGTCCAA
ACCAACAGAA
AAGGCAACTA
TrGTACTTAA
ACTATATTAA
ATTAATACCT
CAAACTGCTG TACTGCAAAG AAATAGTAAA CTTTTCAGCT GTAGGACTAA CGCAGTTACC AGATTAAGAA CAATAAAAAA CACTTCGCTT CACTTrCTGT ATACATTGAA CATTATACGA CCTrTATTTT TACTATCTGC ATCTTTAAGT ATAGTAAAAT GAAATAAGAA CAGAACAAAT ATGTTTTAGA AGCAGAGGTG
INFORMATION FOR SEQ ID NO: 36:
SEQUENCE CHARACTERISTICS:
LENGTH: 21706 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
AAAGTTGAAA GACTGCTAGC TGTTTTTGAT ACCAATCGTT TCCAACTACA GAGCAAACAG TATACAAAGT TTGTTTTTGG ATGTAAGCTT CTTGATGGAC AATTCCAAGA AAATCAAGAA ATTGCTGACC TTCAATTTTT TGCCATTGAC CAACTGCCGA ACTTATCTGA AAAACGCATT ACCAAGGAGC AAATAGAGCT TCTTTGGCAG GTTTATCAAG GTCATAGGGG GCAATATCTT GACTAAGAAG ATGATTATCG 'rATTTCTAAA ATATGCAGGA AAATTTTGAA TATGAGGAA ACGACCAGG CAACCGAGCC GCCCAAAWr CTCTAGCCT TAACCT'TA TACAGCCCAT TCCATTT'rA ACAACTAGCA TGGTATAATA GACTAGATGA ATTTATGGGA TATNrCTTT GACCTT GGTATGTTAG CCTATTTACG CGCTATCGTG AAAAGAAGGT TTACCAACGA T'rTTCCAAA TCTTGCAGAC CATATGCCAC TGTCAGAAAG CTCTTGCTTC CTGGTCAATC ACATTAGCAG CCTTTG'rrTA CTATCCTTTA TCTTGGTCA CAGTATAATG CGCGATTGCT GCCTTGATTT TTGTGGTCAA CCATTGGTTG GGGATCACGG
TGTTCAGTTA
CCTACCC7Tr
CAAATATAAA
TCCAGTGCCA
TTTAGCACTC
GGATGTGAAG
TTTGGTGACA
TCTrAGTAGCT ATCCTTCTTT ATGGTrGGTA CTGGGTCAAT TACCATTGCC CTATGGCTAT GT'rTGTGGTA CAATACTTTG CA'N'ATTGGG AACATTTGGG GATGCTTACC CTTTTCCACA TATCACCATT TTGGGGAACT CTCTAGTTTA TCTATTGAGA GGAATTTTTC TCATGACCTr TGCCCTAAAT GGTGGCGATT ACGGAT'Trr GACAAAACCG AATTATTTAC TTGTTTCAAT TGTGCTGGTA GAATTCTTTT TAGCTCAAGA AGCAGAAAAA T'rCTTTTTTG CTCTTAGAGA GTTTTTACAA ACAAACTCAA AAAATGGCTG GGAAATTTAG GTTATAATAT TTTGTGAATA GCTATGCCTA AACTTGGAAG ATAGAGAGGA AGCGATGTAA ATATTGGAAC AAGCTCTGTC AAGGTGCTTG GCTACTATCA GTTTGACTAA GAAAATCTTA ATGAT'rGCAA
GCAGCTTATA
GAAAAAAGCA
TGTrTAGCTA AGGAAGCTTA ACACAGAGCT AAATAAGAAT TTCTGAATAG AGCACGATTA AATTTTTTGT TGGAATAATA CGAAGTGCGA 540 600 660 720 780 840 900 960 1020 1080 1140 320.0 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 TGGCTAGAGA AGGCTTTTTT TGGCCGAGCA GAGAAATGGT GTGTAAAGGA TGGAATTATT TTTCCCAAGC GGAAGAAA.AG GTAATCTTTT GCAGGTAGAA AAATTACGGA TCAAGATGTT CTGACCGTGA AGTCATTACC TTCGTGACCC ACGTGGCATG GACCTCGTAC TATCTTGCAC AAAATGTTAT CAT1TTCACCA AATTTGGTGC TACAGTGATT
ACAGGTCTAG
GAAT'rAAATG
GTTGATATTG
GCAGGCATTT
CCAACTCAGG
GAAAATGTTG
TAATTGGCGT
ATGCACCGC
CA'TAAATC
GGATGATTCC
TCAAATCAC
GAGTAATGCC AAAAGTAAAG AACTGCTATC AAGTCAGCCA AGTGAATGTC GGCTTGCCTG AGTAACATCT GATACTAAGG TTTGACAAAG AGTATGACAC TGTGGATGGT TTCCAAGGGA GCGTGGTTTG CTTTATACAG GCGTGCAGGT GTTCAGGTTG TTTGAACGAA GGGGAACGTG GACTGTCGCT ACAATCCGTA TTTATTCCTG AAGAATTTAT ATGGGGGTTC GCCTTGAAAT AATTTGCGTA AGACGGTTGA CTAGCAATGG TTCAGTCTGT GATATGGGGG CAGGTCAAAC ATCAAGAACT CCAGT1TCACA CATATTCTCC AAGAAGGTGG AGATTATGTA ACTAAAGATA 360 TCTCCAAGG'r TTTGAAAACC TCTCGCAAAT TAGCGGAAGG CTrGAAACTG AATTACGGGG 2040 AAGCCTATCC GCCTCTTGCA AGCAAAGAAA CCTTCCAAGT AGAGGTATT GGAGAAGTAG 2100 AAGCAGTCGA AGTGACCGAA GCCTACTTGT CAGAAATTAT TTCTGCACGA A'rCAAGCACA -2160 TCCTGAACA AATCAAGCAA GAATTAGATA GAAGGCGTCT ATTGGACC'rC CCTGGTGGTA 2220 TTGTCTTAAT CGGTGGGAAT GCCATTTTAC CAGGTATGGT TGAGCTTGCT CACGAAGTCT 2280 TTGGCGTCCG TGTCAAGCT'r TATGTTCCAA ATCAAGTTGG TATCCGTAAT CCAGCCTTTG 2340 CGCATGTGAT TAGTTTATCA GAATTTGCGG GTCAAT'rAAC AGAAGTTAAT CrrTGCTCc 2400 AGGGAGCGAT AAAAGGTGAG AATGACTTAA GTCATCAGCC AATTAGTTTT GGTGGGATGC 2460 TGCAAAAAAC AGCTCAC'TTr GTACAATCAA CGCC'rGTTCA ACCAGC'TCCT GCTCCAGAAG 2520 TAGAGCCGGT GGCGCCTACA CAACCAATGG CGGATTTCCA ACAAGCTTCA CAAAATAAAC 2580 CGAAAT'rAGC AGATCGTTTC CGTGGATTGA TCGGAAGCAT GTTTrCACGAA TAAAGAGGAA 2640 AAATAAATTA TGACM-r'rTC ATT~TGATACA GCTGCTGCTC AAGGGGCAGT GATTAAAGTA 2700 ATTGGTGTCG GTGGAGGTGG TGGCAA'rGCC ATCAACCGTA TGGTCGACGA AGGTGTTACA 2760 GGCGTAGAAT T'rATCGCAGC AAACACAGAT.GTACAAGCAT TGAGTAGTAC AAAAGCTGAG 2820 ACTGTTATTC AGTTGGGACC TAAATTGAC'r CGTGG'TrGG GTGCAGGAGG TCAACCTGAG 2880 ::GTTGGTCC'rA AAGCCGCTGA AGAAAGCGAA GAAACACTGA CGGAAGCTAT TAGTGGTGCC 2940 *.*GATATGGTCT-TCATCACTGC TGGTATGGGA GGAGGCTCTG GAACTGGA~c TGCTCCTGTT 3000* *ATTCCTCGTA TCGCCAAAGA T'rTAGGTGCG CTTACAGTTG GTGTTCTAAC ACGTCCCTrT 3060 **.GGTTTTGAAG GAAGTAAGCG TGGACAATTT GCTGTAGAAG GAATCAATCA ACTTCGTGAG 3120 CATGTAGACA CTCTATTGAT TATCTCAAAC AACAATTTGC TTGAAAT'rGT TGATAAGAAA 3180 ACACCGCTTT TGGAGGCTCT TAGCGAAGCG GATAACGTTC T'rCGTCAAGG TGTTCAAGGG 3240 ATTACCGATT TGATTACCAA TCCAGGATTG ATTAACCTTG ACTTTGCCGA TGTGAAAACG 3300 
GTAATGGCAA ACAAAGGGAA TGCTCTTATG GGTATTGTA TCGGTAGTGG AGAAGAACGC:T 3360 :*.GTGGTAGAAG CGGCACGTAA GGCAATCTAT TCACCACT'rC TTGAAACAAC TATTGACGGT 3420 GCTGAGGATG T1TATCGTCAA CGTTACTGGT GGTCTTGACT TAACCTTGAT TGAGGCAGAA 3480 GAGGCTTCAC AAAT'rGTAA CCAGCAGCA GGTCAAGGAG TGAACATCTG GCTCGGTACr 3540 TCAA?1'GATG AAAGTATGCG TGATGAAATT CGTGTAACAG TTGTTGCAAC GGGTG'rTCG'r 3600 CAAGACCGCG TAGAAAAGGT TGTGGCTCCA CAAGCTAGAT CTGCTACTAA CTACCGTGAG 3660 *ACAGTGAAC CAGCTCATTC ACATGGCTTT GATCGTCATT TTGATATGGC AGAAACAGTT 3720 CAATT2GCCAA AACAAAATCC ACGTCGTTTG GAACCAACTC AGGCATCTGC TTTTGGTGAT 3780 361 TGGGATCTTC GCCGTGAATC GATTGTTXCGT ACAACACATT CAGTCGTNTC TCCAGTCGAG CGCTTTGAAG CCCCAATTTC ACAAGATGAA GATGAA'rTGG ATACACCTCC A'TMNTCAAA AATCGTTAAG TAAATGAATG TAAAAGAAAA TACAGAACTT GTTTTTCGAG AAGTTGCAGA GGCTAGTCTG AGTGCTCATC GAGAGAGTGG TTCGGTCTC GTCATTGCAG TTACCAAGTA TGTAGATGTA CCGACAGCGG AAGCCTTGCT TCCGCTAGGT GTCCATCATA TCGGTGAAAA TCGTGTACAT AAGTTTCTGG AAAAATATGA AGCTTTAAAA GATCGAGATG TGACTTGGCA TTTGATTGGT ACCTTGCAAA GACGTAAGGT GAAAGATGTC ATTCAATACG TTGATTA'TTT CCATGCATTG GACTCAGTAA AGCTAGCAGG GGAAAT'rCAA AAALAGAAGTG ACCGAGTCAT CAAGTGTTTC CTTCAAGTAA ATATTTCTAA AGAAGAAAGC AAACACGGTT TTTCGAGAGA GGAACTGCTG GAAATCTTGC CAGAGTTAGC CAGACTAGAT AAGATTGAAT ATG'rTGGTTT AATGACGATG GCACCTTTITG AGGCTAGCAG CCAAGATTTA CAAAGAGAAA T'rCAAGAGAA AAGTATGGGA ATGAGTCGTG ATTATAAAGA TATAGOTACA TCATTTTT'rA AGTAGGAGAG ATTTATAGAT TATTTTACGG GCCTGTGTTT ACTTCAGTAA ACAGTCGGCT GGCACAAAAG GGCAAATCAG AGTCAGCGTG ATATGAGGAT GCAACAGAAA TTTTCAGTAT ATGACAGAGG TCATGTTTTA GCTGGAAATT GAACGTTATT GTAAATGTTG CGGTTTTGAT ATGAAGCGAA AATGCAGTGG ATATTTACTC
AGGATGAGGA
ATTCTTCACA
AGAACAATAT
CAACGGATAA
TTGT'rGATTT
TGCAGGCTCG
TGAAAAAGGT
AAGATATCCG
ATAGAGTACG.
CCTGATTTTG
TGAGCAGTTG AAAGAGATTT ACAAATTCCA AATATGCCTA AGCGATTCAA TTCGGTTCCA AACCATGTCT TTAAA.AGATA TTCAAGTCTC CCTTATGAAA GGAACCGGCT CTCCCAATGA CACCAGACTTr CATGCAAGAC GGTCATTATA GATGTTCGTT ATTGGCAGGA AACGAAAGTA TCGTTGTTTG GACTAT'N'GG AGCTTCTACC ATGTATTTGT
TCAAGGCGGC
TGACCGAGTT
CTTTTGTTrCG
GATTCGATAG
AAAGAGATGA
ATCPLACCT'rC
AACAGGAATT
ATCCTAGAAA
TCTTGATTGA
ATGGAGCTTG
TGACACCAGT
3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520
TTTACCAGAT
ATAATGATTT
GTAGCCTTCG
GGTGCCTACG AATCCAGTTT AGGTCGTTGG ATTGTAGCGT GAAGATCAAC AGGGTGAGTT TTTTAATTCG TATGATTTAT CTGTCATGTC TTGGTTTCCA TGGTGAAACC AGTGCTTGCT TATCTGTTTG GGTTGCGATT TGGCGATGAT AGGATGAATA TCTTG.ACAAG GGAATGGAAT CCCTTGCAAC GCCTGCCTTT ACAGATAGCG GGTCTTGATT GTTTTGGTTC GATTTTTAGG AGAAA.ACCTA GTGCGTT'N'C AAGGGATrTA TCAGCATTTC TCCATAGAAG ATCGTCCATT GGATAAAGAA GGTAGAAGAT AGCTA'rGCTC CTTTTTTAAC TCCTTTrATC AATCCTCATC AGGAGAAGCT ATTAAAGATT AATTCGTCTC GAGTGAGTAT ITTCAGAT'rr 'GAAATATCT 362 TTGGCCAAAA CCTATGGTCT TGCTTGTAGC AGTAGTGGGG G='CGAGTTT 'rA'TATACCC AGATTATTTC CAACCAGAGT CTCCAGGAAA TTGTGTATTC CAATAAATTrT GAAcATTTAA CGCATGCTAA GATTTTAGGG ACAGTCATCA GAGATATCCT AGTAGATGAA GAACGGGCGC TCTTTCAAGA TGGACTAAAG AAAATTGGTC TCACCGAGAA AATAGATAAG CTAGAACAGT 'rTCGATTIAGA
I'GATTGAAAA
TTCAAGTTGG
AGGGACAAAC
GAATAGAATG
AGGTTTTGAT
TGTTCN'TTA TCAAATGT 'r GAAACTTGTC CAAGTAAATT AGACTTGATT AGTGTGAGAA GAAAAAAGAG AAGAAAAAAA CCAATTACAT CATTAGAAAT CCAGAAGAAG TCGATGAATT ATCAATTAGG.GAI-rGAACGG
AAACTTTTTG
AGATTATGAT-TAATCAGCAG
TTTCTTCTTC
GTATACCTGT TTrCGCTCGAG GAACGTCCTT ATCGAGAACT GGATTTATCT GTGTCTAGTT TGAAACTATC TAGGAATCAA GCAAACCAGT A'rCA'rCTGGT AGACAAA'rCA GATTACACTG AA'rTTGGTCG CTTGAGATTA CTTCAAGATA TAACCGTCCA GTTA'IrATTA AGTAAGTGAG AAAGGACAAG ACTTTTGGAA CTCGATTCAG TTTAGATATT GTGGTTCGTG ATTACGAAGA a. a.
a a. .a a a TCTTGTGCGT GCGAATCATG TTACTTTGAT GAAA'rAA.AAG TGAGAGAGTG AAACAGGCGG AGATGCGCAA CGc'rTGI-rGG AACTCATAAT GCTAAGAAAG CTTCCACCAA CGTCTCAAAT TTGGGAAGAT ATTCTCCGTC AGAAGTGGT'r AGCGAAGTAC TGATATGACA CGTCAGTTCT ATAAAAATTT GCGTATTAAG AGTTTAGAAG AGCGTTTGTC AT'rCATTGAG CCAGTCTGTA TTGATTGCTC AGGATACAGC CGCATGAACG T'rCAAACAAT ATCATTCATC AAGCAGAGCA AAGAAGCTAA ATATAAGGCA AACGAGATTC TTCGCAAGC TCGCTGTTGA AACAGAAGAA TTGAAGAACA AGAGCCGTGT CTACAA'rTGA GAGTCAGTTG GCTATIrGTTG AATCTTCAGA 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6 540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320
CAACAGCTAC
TTGGAGAACC
CTCAAGCAGA
AA'rTrGAAGC
TTGAAGAAGA
AATGCATCCG
T'rATCTTCAA ACCAGTGATG GATTCCAGCT CCAATTGAAG AATGGCAGAA TTACAAGCTC TCAGATTAAA CAGGAAGTGG GCCTCTGCTC ATCCAGTTGG ATAGGTCCAA CACCAGCTAC
AAGCCTTTAA
AAGAACCAAT
GTATTGAGGT
AAGCTCCAAC
CCCAATGTAT
AGAAACTGT2,
AGCCGATAAA
TCCTGTACTG
GAAGAACCAG
GATITCAATAC
TGAGAACAAT
CCCTATTGAT
TCCCACGTTA
ACGATTTTCG
GAATTGTCTG
ACTCCTCAAG
AAGTAGCTCC
CGGGATTTGA AGCACCGCAA ATCTTATCCTr TATA?3'TCCA AAGATTATCC TCTCAAAAAC CGGGATAAGA GGGACAAAGA TCCTTTTTGG AAGTCGTGGT
GAATCTIGTI'A
GCGAGCAGGA
TCAAGTCTGA
CTAAATCTT
TTTTAATTTG
CAATTTTATA AGAAATATTC GATGGTGTGA GTCCTGTAAT AGCTAGTAAG ATTTGACGTT TTCCGAATAA AGGTGGTACC TITATTATTTA TAAAGGAGAT 363 ACCATGAAAC TCAAAGACAC CCTTAATCTT GGGAAAACTG AATTCCCAAT GCGTGCAGGC CTTCCTACCA AAGAGCCAGT TTGGCAAAAG GAATGGGAAG ATGCAAAACT TTATCAACGT CG1TCAAGAAT TGAACCAAGG AAAACCTCAT TTCACCTTGC ATGATGCCCC TCCATACGCT AACGGAAATA TCCACGTTGG ACATGCTATG AACAAGATTT CAAAAGATAT CATTGTTCGT TCTAAGTCTA TGTCAGGATI' TTACGCACCA TTTATTCCTG GTGGGATAC TCATGGTCTG CCAATCGAGC AAGTCTTGTC AAAACAAGGT GTCAAACCTA AAGAAATGGA CTrGGTTGAG TACTTGAAAC TT'rGCCGTGA GTACGCTCTT TCTCAAGTAG ATAAACAACG TGAAGATTTT
AAACGTTTGG
GAAGCAGCTC
GCTAAGCCAG
GTGTT'rCTGG TCACTGGGAA AATCCATATG AAATTCGTGT ATTTGGTGAG ATGGCTAATA TTTACTGC ATGGTCATC'r GAGTCACCAC TGACCTTGAC TCCTGACTAT AGGGTTATAT CTACCGTGGT 9 9* 9 9 9.
99 99 9 9 *9 99 9 9 9 TACCATGACT TGGTTTCAAC TTCCCTTTAC GTTCTAGATrA CAGATACTTA TATCGTTGTC TCTCGTGGTT TGACGGTTGG TGCAGATA'TT GCTCGTAAGT TTGTCGTTGC TGCTGAATTA GCTGATG'ITC AAGTTTTGGA AACTTACCGT CACCCATGGG ATACAGCTGT AGAAGAGTTG TCTGGTACAG GTAT'rGTCCA TACAGCCCCT ATTGCTAATA ATCTTGAAGT CGCAGTGACT GCTCGTCCTG AAT'rTGAAGG TCAATTCTA'r CTTGGTAACC TCCTTCTTGC CCAAGAAGAA
TATGCCAACA
TGGACAACGA
GATT'ACGTTT
TTGACTAGCT
GGCCAAGAAC
GTAATTCTTG
TT'GCT'GAAGC
AGGTAAAAGA
CTCCATTTAC
TGGTTCAACC
TGTCTGAGAA
TCAACCACAT
GTGACCACGT
AGAGAT'rGAA
TGGCAA.AGGA
CATCACAGCT
TGCTGGTGAA
ATTTGGCTGG
CGTAACAGAA
TACGACTGAC
CAATGTTGGT
7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 GGTTTTGGTG AGGACGATTA GTTGATGAAC GTGGTATCA'r GATGAAGAAT GAAAAGGTAG TTCCAACTGT TATTGAAAAA ATCTCTCACT CATATCCATT TGACTGGCGT 99 99 .9 9 9 9 9999 9 9999 99 9* 9 9 9 ACTAAGAAAC CAATCATCTG GCGTGCAGTT CCACAATGGT CGTCAAGAAA TCTTGGACGA AATTGAAAAA GTGAALATTCC CGCT 'TACA ATATGATCCG TGACCGTGGT GACTGGGTT-A GGTGTTCCAC TTCCTATCTT CTACGCTGAA GATGGTACAC ATTGAACACG TAGCTCAACT TTTTGAAGAA TATGGTTCAA GCCAAAGACC TCTTGCCAGA AGGATTTACT CATCCAGGTT AAAGAAACTG A'rATCATGGA CGTTTGGTTT GACTCAGGTT GTAAACCGTC CTGAATTGAC TTACCCAGCC GACCT'rTACC TTGCCTCAGT TTC'TAAAT'rC ACTCAGAATC GGGTAAAGTC TCTCTCGTCA ACGTCrGG CTATCATGGT AGCTGAAACT GCATTTGGTG GGAACGTGAT CACCAA.ACGG CGAGTTCAAA CATCATGGAA TGGAGTGGTG TAGAAGGrTC TGACCA.ATAC CGTGGTTGGT TTAACTCATC ACTT'ATCACA TCTGTTGCCA ACCATGGCGT AGCACCTTAC AAACAAATCT TGTCACAAGG TTTTGCCCTT CTTGGAAA'rA CTATTGCTCC AAGCGATGTT CTCTGGG'?AA CAAGTGTTGA cTC-AAGcAAT CAAGTrrCTG AAACTTACCG TAAGAT'rCGT TCTGACT'rTA ACCCAGCTCA AGATACAGTC TACATGACGA TTCGCTTTAA CCAGCTTGTC GAATTCTTGA CGATCTACAA GGCCTTGGTG TACCTTGATT- T'rGCCAAAGA TGTTGTTTAC CAAATGCAGA CTGTCTTCTA TGACATTCTT CTTCCTCACA CTGCGGAAGA AATCTGGTCA CAAT1'GTCAG AA'rrACCAGA AGTTCAAACT 364 GATGGTAAAG GTGAGA.AGAT GTCTAAATCT GAAAAACAAT TCGGTGCTGA AATCTTGCGT GACGTGCGTA TCTCTrATGGA TATCTTGAGC AACACTCTTC GTTTCtTTAT TGCCAATACA GCTTACGATG AG r'rCGT'rC AGTTGATAAG AAGACCATTC GTGATGCCTA TGCAGACTT'r AACTTTATCA ACGTTGACTT GTCAGCCTTC ATTGAAGGTG CCAAATCACT GGAACGCCGT GTCAAAATCA CCAAACTCTT GACACCAATC TATCTTGAGT TTGAAACAGA AGACT'rCGTC T'rTGCTAACC AAGAAGAAAT CTTGGATACA 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 IS C.
TGGGCAGCCT
GCAAAAGT'rA
AAAACTCTAC
ACCATCGCAG
GTTGAACGTG
GAACGCAGCT
GCGGAAGCAG
AATTTGAGAA
GAATACCTGA
TCATGGACTT TCGTGGACAA GCACAAAAAG TCGGTAAATC ACTTGAAGCA CACTTGACAG TCGAAGCAGT AAACAGCAAT GTAGCACAAC AAGGACCAGC TCCGGAAGCT GCCCT'rAGCT CTACTGGTGA AGTATGTGAC CGTTGCCGTC ACCAGGCAGT TATCTGTGAC CACTGTGCAA TCGCAGAAGG ATTTGAAGAG AAATAAGATT GAAAAGACAA CTAATTTTAT AGTCTATTAA TATGATGCGT TrTTTATTTA TTTTAAAAAT CCTTGGAAGA AGCTCGTAAT TTTATCCAAA 'rGAAGTTG'rG T7TrTGATCG'r G'rCTGAGTTG TCGAAGATGT AGCCTTCACA GTATCGACCC AACAACAGCA GCATCGTAGA AGAAAACTTT' GAAAAGTCTA GGCAAAATTC ACGCATTGTA TCACGTTT TTGCGAGGTA TGACTTTTTA TACTCAACAA GAATCAAAGA TTGATATTAG GGATAAGATT GTTAT'rTCTG GTTTCTGAAA ATAATATCAG AACATAC'TTT AAAAAAGGAG ATTTTATTAT CTTGTGGGAG GAAT'NTTACT ACAGCTGCTA CAACTCTTAA CTCT'rGAATG CCTTGAAGAT AAAGGAGATA AGCGTGTAGG AGTCTCATTC TATI'CCGTTC GAAACTTAGC AAGCTAACAG TAGTAAGATA AAATAGGAAT GGTAAATAGT GTAATATTTT TACAACAATA AATTTATATA AGTATTATAT TTTATTTCAT ATTATACAAA TTTTTATr1rr TTTTAAAAGC AAATATGATA CAATTTTATT 'rGAAAAAAAT AAAATTAAAA AGACTTGCTT TAATTAGTGG TATCGTCGGT 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860
TCTTATTGGT
TGGAGGAGCT
TGCAAATCTT
TGCAGCTCCG
TTAGGATGGG
CCTTTTGTCT TGTTGGGAAT AGCGGTAAAC ACTGCAGGGG CTTTTTCAGG TGTAGCCTTA GTTCTTGGTA TCATTGCTAT TGTTTACTAT TCTGTACTAA TGATTGT'rTC TGGTGCAGTT TTGGGGGGAT TTTTGCTATT ATCGGAGGAT 365 CTCTATT-CCT TTCAACATTG AAGAAATTCA AATCAGAAGA ATAAAAGGTA TTNTAGCATC AAAAGAACAA AAAAGTTTAT CGGTATAGGA GTAGCTCTAT TATCTCTTTC TCTXrCTAGTT GCATGTGGAA CATAAAG'rrC AAAGAATACT TCAACAAGTA ATGATGAGAA GACAGTACCA ACATCCAATA GTTCAAAAGA AACAATCACT TTCGATACAC CGGTTGTAAC AGACGATGCG ATTGAATCAA TACGCACTTA TGCAGATTAT ATAGATCTT ATAAAAATAT TATTTTACTA AAGCTGAGGA AGG'N'TCAAA GGCATAGCTA TGGAAAATL ACTAAACTAA KAGAGTCAAC GAAGATAGAA TAGAAACAAC TCCTTTTGG TTTTGACTAG AGGGGAAACC TCGGAAAATT GCCACGGTGA GTCTGAATGG TGTCTGAAAA AGGTACACAA TCGAATTTGA CCAAGCTTAC C'TCTTGAAGC ?rCTGACCAA GTCACTACGG TGGTTTGACT AGCAAGTTC-A CATCTGGCGT ATGAGCACTC AGCTCACACA ATGCTGAAAA CTTGAAACTG TCAAAAATTA TTCGATGCGC CAAAAACAAT GTGATTGCCA CTTTTTTGTG AAAAATTGTG
AGAAAAAAAG
AACATTGTCA
TAAAATAGAA
?I'rTGATGAT TGACTCGTrT GT'rAAATAAT*
AACAGTCCTT
TAGATAAACG
G'IrTTTGCTC
GATGTTGATT
GAAGCTGGTA
ACTAACTTGG
T'rGAACGAAC TAAAGGAGAA TCCATCTAAT GGTAAAATTG AACAAAGCTA ACCT'rCAC TGGTTGGGCT
CAAGCGATrG ACGCTGGTAA ACTTCAGTAT TGAAACGTGC TTGTGGGTTC CAGTTGAAAA GGTAAAAACA AAGCTGAAGC CGTTCATrACG ATGTAT-TGCC GACCGTCGTT ACGCTTCACT AcTTTGG.AAC GTGCTCTTCC
ATTGATCAAA
TATCAAAACA
ATCATGGCGC
TGCTGAACAA TTTGGTGATG TCCAAACATG GACCGTGATG TGACGACTCA GTTATCCCAG ATTCTGGGAA GATAAAATCG 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 CTCCAGCTCT TAAAGATGGT AAAAACGTAT S. *s S S 0 05e5 5
CCCTTGTAAA ACACATCAAA GGTTTGTCAG ACTTCCCACC ATTGGTATTC GAArTCGACG TTGGAAAATA AAAAATTGTA AGTCTAGAAT AGTATGATAA GGAATAAAAA ACAAGATTAT ACTTACATGG GTTGGGAAGA AGAAGCTTTA GTATTTGGCC TTATAGGGGC TGAACGGATT GGTCCACTrC CTGATAGTrC AGATTATCAG TTAGCTGAGA TTCGGCAGGC TTTGGAGAAG GAGCAGCACC TAATTGGGCC AAAAACGAGT TCGTAGGAGC TCACGGTAAC TCAATCCGTG ATGACGAGAT CATGGACGTG GAAATCCCTA AAAAATTGAA CGTCGTTT~CT GAATACTACC TGATTTCTAG GCTTrATG TTAGTATGGA GTACTGGCCT ACAAGCAACC AGCTTCAACC CCGATAGGCA ATGGTTCTTT AGGAGCAAAA CAATTTAATG AAAAAAGTCT CTGGTCTGGA GGTGGAAATC TTCAGGATCA GTATGTTT AGAGATTACA ATCTGGCTAA GGAACTGGCT CAATATGGGA CCTATCTGTC 'rTTTGGGGAT ATTCACATTG AGTTrCAGCCA GCAAGGTACG ACTTTGTCTC AGGTGACGGA CTATCAGAGA 366 CAGCTGAATA TTAGTAAGGC AC~rGCGACG ACTTCTTATG TCTATAAGCG AACGCGArT GAACGTAAAG CTr'rrGCGAG TTTTCCAGAT GATCTCr"rGG TTCAATGTTT- TACTAAGGAA GGGTTGGAAA CTCTAGATT'r TACTATAGAA C'rATCCTTGA CCTCIGATTT GGCTTCTGAT GGAAAGTATG AGCAGGAAAA ATCTGA'rTAC AAGGAGTGTA AGTTGGATAT TACTGATTCT CATATCTTGA TGAAGGGAAG AGTTAAGGAT AATGATCTGC GGTGCTAG ?rATCTAGCT TGG4GAAACGG ATGGAGATAT TAGAGT'rTGG TCAGATAGGG TTCAGATATC AGGAGCCAGT TATGCCAATC 'rCTTCTTGCC CGCTAAGACG GATT'rTGCCC AAAATCCTGC TAGCAATTAT
CGCAAGAAAC
GGCTATACCC
CAAT'rGGATT
TAGATTTAGA
AATTGAAATC
TGGAAGCTGA
GCAACAG4GTG ATAGACTTGG AAGGCATATC GAGGACTACC TGTTGACGCA TCCACTACAG AAGCCACAAG AAGGGCAGGC TTTGGAGGAG ATTAGTTCGT CCAGAGACTG CCCAGATGCT GCGGTCGACA ATCCTCCTT-G GAATTCGAC TATTGGCCAG CCTATGTTAC CAATCTCCTA GATGAT'rTGC GTGTCTATGG TCGTCTAGCG AAAGGTGAGG AGAATGGTTG GTTGGTTCAT
CTGTTCTCC
CTACCAGCTA
TATCACTTAA
GAGACGGTCT
GC'rGTAAAGT
ACTCAAGCGA
TGGACACAGC TAAAGAAAAG AAGCCTTATT CCAGCGTGTT ATGAT'rTGTT AAAAAATTAT ACTATGGACG GTATTTATTG ACCTACAGGG AGTC'rGGAAT ATGTCAATCT GCAGCTGAAT TTCCAGTCAT CAACTATGTA ATGCAGGAAT CGTCTCTCAG CTCCCTTTGG 'rTGGACGGCA 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 CCTGGTTGGG ATTACTATTG GTTTATGAAG CCTATT'rATT ATGTTGAGGG AAACGGTTCG
GGGTTGGTCA
TTATAGGGAC
'N'TTTGGAAT
CCAGC'rGCCA ATGCGTGGAT GATGCAAACC CAAGACTATC TCAGGGAGAA AATTTATCCC GCCT7*MTAC ATAAGGATCA GCAGGCGCAG
CGTTGGGTGT
'rATGACCAAT
GGACTGGATG
CAAATCACTC
AATGAGAAAG
CTTCTCCGTC 'rTATTCCCCA GAACATGGGC CGATTTCGAT CTCTGATTTG GCAGTTATT'r CATGATTTTA TCAGGCTGC AGGACTTGTT GACTGAGGTT AAGGAGAAGT CTGATr'rACT AATCTGGTCG AATCAGGGAG TGGTATGAGG AGGAAGAGCA TCGAGGCCCA GCATCCGCAC C?1CCCATC TAGTGGGACT
TGGCAATACC
TCAGGAATTG
AAATCCTTT-G
GTATTTTCAA
CTATCCTGGC
CCTCAATGAT
GGCGCGTTTG
CACCTTGCAA
AATCTCTrTTA
CGTGGAGATG
GGAGATGGCA
GCTACAAGGG ACAAGAGTAT ATTGAAGCG GCGGCACAGG CTGGTCCAAG GCTAATAAGA ATCGAGCCCA TAAA'rTATTG GCAGAGCACT
CCCGTCCTAC
TCAATCTCTG
TAAAGACATC
AATCTT'rGGT GTAGCCATCC TCCTTTTCAG ATGGCAGAAA TGT'rACTCCA GTCTCATGCA GA'rCC TGGT CAACAGGTTC 'rG?1TCAGGC ATAGATCGTA ATTTTGGTGC TACTAGTGGC GCTTATC1'GG TACCTCTAGC 'rGCCCTACCT TTAATGGCAC GTGGACATT~T TGAAGTGAGC ATGAGCTGGG AAGATAAAAA ACTCTTAC.AG TTGCGAGTTT CTTATCCAGA TATTGAGAAG AAAGCGAAAT GCA'rGGGAA AGATTGTATT CAATTTTATT 'rrTAAGAAGA TCTTATAAGG; TTTAAGAATA TAAGCAGTTr TCAACTAGTT ATACTCAAMG AAAATCAAAG AGCACAAACT 367 TTGACCATTT TATCAAGGAG TGOAGGAGAT AGTGTGATTA AAATGAATCA AGAAAAAATA TCGGTGGCAA CAGCAGAAGG TGATCT'rGr- CAGTAATTTG AAACTGCCTr TTAATAAGGA GAAAAAACGT TATAATGATA ATAGGAAGTA
TGTTTTGAGG
TTTGTT'rGAT
TATCAGGTTG
GGGGAGCTGG
CTTGGGGGAA
GATTATAGTT
TTGCAGATGG AAGCTGACGT AGAGGGTGGG TCTGATGGCT GGGACACGGA GATTG'rGGCC TTATTATCCT TCGTGCTTCA 'rGGATACCAA TGATGAAGGG AGGAAGCTAG CCGCAGGTTG GGTTTGAAGA GAGATTTTCG TATATTGAGA TGAAACACTG AATTGTGATG TGAATTTTGA GGTGCAGGCA AGTCAACAGT GAAATCTGGA ?rGATGGTGT
CTCAAAACAG
AGGAG'rATAA
TTACAAGCGT
GATTGAAAAG
TCTTAACCT'r
TAATATTGCG
CCCACCAGCG CACCAATTAC CGTAGAAATG TTTTrATAATC TAGTTTCTAA TCTGACAGCT AAGGAAAATG GTGACAGATG CCTTGAATCC AATAACTTTC CAGCCCAGCT GTAGCCAAAA ATCC'rAAAAT ACGGGCAAGC AGGTTTTGAA ATCATCGTGA CTCATAATGG GATGCCAGTG TCAAGGATGT TACTAGCATG ATCAAGCGAA CAAGGGGCGT TTTTTATCCA CCTCAAAGTA ACCAGTCCCA AACCTT'GGA'r TTGGCAGTCA AAAACAGACG GAGGGCGCAG TGGGCAGGAT GCCATTCGGC )LAAGCGACGA CTTCCTCAGT ATACAGCGTG GGACAGGAGA AGACCATAC'r TATACCATTA TATGGGCTAC GCAGGAAGTG TGATCAGGCC TTGACAGATG TTCTGGAGGG GAGCAACAGC TCTCCTTTGT GATGAACCCA AATTCTCCAA GACATGTCTC AGCTTTGGCG CCCATTGCTG GGTGCTCAAC CAGCATCCTC ATGTGGGGI' TG'TTrTCAG TGGAACTGGC TTCTGAAATT TAGGTCTGGC TCATCGTCTC GAGTCTCCAT TGCACGCGCG CTGGAGCCTT GGATTATCAG GTCAAXAGGG AGCGACGGTG ATCGCGTGAT TCAAATGCAC AGGATATTGA CAGTTTGGAG GTTCAGTCCT TCACAGGCTC 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15420 15480 15540 15600 15660 1572 0 15780 15840 15900 15960 16020 16080 16140
AAACTTATTG
TCTTGATCCT
ACATGGAGC
TGTCTAACTA
AGGTCGAGT'r
TCTACTCCAA
CAGACAAGGA
TTAGTTTAA
CTGGTTTTGT
GAAGTGGGAC
GAAGGACTTA
GATGATGTTG GGATCTCTAG GACAGCTAAT GCTTATTTAA
CCTTAGTAGG
CAACTGCTCA
TGGCTTGGAT CAAGCAGACC AAGAAGAACT TGGCTATTTG ACAGATGTGA CTATGGATAA ACCAGAGCGA ATTTCAACCT T'rCAGCTAAG AATCGCTTTG GCCACTCAT'r TGCAAGGCCA AGAAAAAGAA GAGGGTCATT CCTCTTTAAA GGATTCGGCT GAAATCCTCT CCCAGCGAGA TCTGACAGCC TATGGGGTGA TTTTACCTAG
TCAATTTGAT
AAATGCCTTT
AAT'NATCA
TCTAGACAAG
TCGTTTAGCA
CAGAAAGTCT
TCATCAGCCT
GATAATGGCA
GGGCAAGAGA
GCTGCTCAAG
368 ACAATATAGC TCG'TT'TGAAA TATCAAGATT TAGCGGGTTT ATGAAGAAAA ATCCAAMCAA CATCAAGAAG AGCTTGAACA AGGTACGTCT GCAACTrG AAAAAAGAAG GACAAGAGTC CCCTTG.ACAA GGCTCAGACT AATTTGCAGG AAGGCAAGCG CTCGTATACA GGCTCAAGAA AGTCAACTAG CCTTGICC CTAGTGCTCA ACTIACCCAA GCCAAGCAGG AA'rTGGGCAA AAGCTGAACA AAA'rCTAGCC CAAGAAAAGG AAAAATTAGA ATGATTTGGC GGAGCCAAGG TATCAGGTTT ATAATCGTCA GCTATCrrTAT GTATAGCAAT GCTTCATCCA GTATTCGAGC TGGTACTTT'A TGCCGTAGCA GCCATGGTGA CCTTTACGAC TCAAGTTCAG AGAGAGCAGG GGAAGAGGAC AAACTAAAGC AAAACATCAG CAAGTCTTGG GACCATGCCA GGTGGTCAGG AGTGGGCAAT ATCTTTCCTG CATGACTCGC TTTGTAGACG AAGAGCGAAC TCATGCAGGG TCGTAGTAAG GATATTATCG CCAAGTT'rCT CCTTTATGGA AACGGCTCTA GGTAGTATAC TTGGTCATTA TTTGCTAGCC TACAAAAGGC ATGGTGGTGG GAGAAACTCA GATTCAGTTC AGCT'rTTGTC TTGAGCTTGT TGGCGAGTGT GTTACCAGCC ATTTTTAAGG CCT'rGGGTTA CTAGTAGCTG GGACTGTCGG AGTGTAATTr CAAGTGTCAT TATTGGACCT ATAGCTTACT TATCTGGTGG CTTGGAGGGA CCTGTCAAAG GAGCTAAAAT 0 0* 0* 00 0 0 00 0 0 0 *00000 0 0000 0 00*0 00 0 0 0000 0 0000 0 00 00 0 ACTTCATGAC GAAGCAGCCC AGCTTCTACT CTTATTGGAG CGTATCGGTT T'rATCTGGCG CCGCAACATC TTTCGTI'ATA AGCAGAGAAT TGTAGCTCTG CTCTTTGCAG GTTTGGGAAT
TCCTAAACCT
16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 L7580 17640 17700 17760 17820 17880 17940
ACAGTTTCAA
TCAGGACAAG
AATCTATTCT
TCTTATGATG
CAAATCCAAC
GTAGAGCTAG
AAAGCGCTAT
ATAGAGAAGG
AGTATCAGAT
CAGAAGTGTT
ACAAGGATTT
AAGATTTGAC
GCATCGTTAT
AAATTGAAGG
TTATATGAG
TCGTCTCAGT TTTACTCATA AGGTAACAGC GTTGATGACA ATCTTTGGTG TGGCAGGTTC CCAATCTTCT GTAGCAGGAG TT~CCGTCTAA GCTTGTCTCT GAAAATCCTA GTGCGACCAA GAAAGGGCAG GAGATACTAG CCTACCAGAA CAAAGGCAAA GCTGGTCTTC AAAACATTAC TCCCTTTATC CATCTTCAAC ATCATCAGCA TACAGCTAAA CTCGCCCAGC TGGCAGGTGT GGAGCTGACA TTAAAAGATG CAAGGTTGGG CAGACTTTAG GAACTACGTT GGTC-ACTTTA GCTACCCcAA GCCAACACTT AAGTCAGGCG GGCTTGCTTA AGCCATTCGA CTCTTCGACT CATCGTATCG GTTCTATTAG TAAGGAACTA AAGGTCGTTG CTATTACTGA 'rCAGGCTAGC TATGAGCAAC TTTACGGACA ATCTGGTCTC ATTAAGGGAT ACCAGTGCAA CTAGTATCGA TGAATCAATC TGCGGTGTCC AGCGTTGTCC AAAATGCTrc CTATCGCTAG CTCACTCAAT CAGACCATGA CCATCTTGGT CTATTGTCAT CCTTTACAAT CTGACCAATA TCAACGTAGC 369 TGAGAGAATC CGTGAACTCT CCACTATCAA GGTICTTGGT ccTCTAcATr TACCGTGAG.A cGATTGTGcT GTcccTTGTG AGCTGGT ?TC TATTTACACC AATTGAT TCAAATGAI-r TTATCCGCAG GTAGGCI'GGG AAGTCTATGT AATCCCAGTG GACCTTGCTT GGTTTCTTCG TCAATTATTA TCTGAGAAAG GAAATCTGTA GAGTAAGGTA GTTATTNTA GCTGATGAA AAAATCCTCC GTTTCAAAGA TTAGCAGCT GCAATTGCGG AACGTAGCGA ATCGTATTGT TTCGTTGATC AAGAGGGCAT AATGGCCATTG TCAAGGCCTT TGAAACAGTC AATACGACCG TTrGAGTTGAG CAGATGCCTG ATCAAAATCA GCCAGAGATT CTTGTCGCCG ACTTGTAGTT CATAGGATAC TCCAATCTT'r ATTTGACCGG AAATATGGGC TAGGTTTATT TTGCCAATCC TAGCTGTCGC CTTCTTTACC GGGATGTTTG TGTCTTGAGA AAGAGTGATG GAACTTGT GTTCCTTCTT CTTCAGTATT GAGAAGTATT TGAATGCTCC TTAAA'rTrAT CCT'rACrr'rC TIrT='CAT AGAATTCTTT TTAGCAAAAA TAATACCGTT TCATTTTGAC CT'rCAAGGTA TTT-TCAGTr'r TCTTGGCATT AGGTTGA'rAC CGTTATAGAA GCAGGGAACT CTTTGTGACA C'TTCGAAGTT TGGCTCAGAA CAGTATCGAG GACAAAGACT AATCGCGCCC GAAAGAATGG CAGCACCGCA CCAACGT'rTr TGTTGTCCAG TCCAGCCAAT TATCGATAGA AGGAACGACA TTTCATAATA ATGAAGTCAC GGAATCGTAC TrGGTCTGAT TCGCCTGCGA CTATTCTCTr GCAGCAGTAA GCATCATTrr GTTGATATGT TAGAAGCCCT CI'CTATTTA CTAATATTCA GAGGATT=r TCTATAGGGC TTGATATTAT CCACGTATTC GCGCGTGCTA ATAGGTGCCA TCAAAGTAGT CTGAAAGCAT TGAGCAAAAG GTAGGTCCAT TCT'rCATTAA AACGACGTGT CTCAAGACTT TTTTCTTGCC GTAAGAGAAA AATCAAGCGC GGATTTCCGA GAAAAGTTAC ACAGTCGGAA 7rTTCCAATG TGTGAGAATG ACGTT'rTTCA TGTATTTCAT CATGCTGTAG 
TAAAGATTTG AGCGTAGATT CATCCACACT TGATTCTACA AGGTCTTGAT TT'GTTCAGGA TTGGGACACC ATAGGCTTTA TCTTTTCAGC AGGCATCTTA TATCAGTA'rA TTCTTTGAGA TCAAT'rGT'rT GGCGATATTT GTGGGTCTTC TTTTCCTTr
TTTTAGAAAG
GTT'rACCTGT
TTCTTCCAT
ATAGAAAAAA
TTCAGCAATC
TTGTTCTGCG
AACAGTTTTC
TTGGCGAAGT
GATTTCCCAG
TTCGCTGGTT
TT'rATCAAGT
ATTGTTAGGG
TTCAAGGTTA
GATAACATCA
T'rCTACCAAT
AATCAAATTA
ATCTGTTGTA
AAAGCTCACA
TTTAGCTGAA
CGCCAGCTCA
TTGTCAAGGT
ATAGAGTCAG
ATTGGACCGT
TTTTCAACCA
ATGTAAGCAC
ACAATGAGTT
TTGTCTAACT
TCTTTGGCGC
AGCCAAGCGT
18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 ACGCCGTCGC TGACTGCGAA TTTGTAAACC AAGCATTGCC GCCTCAGAAG TTTTCTTAAC
GTAGTCTTTG
ACCTGTTTCA
GTCTTCAGGA
AGTGGTTCGT
TCACCAGCAA
TTTTGACCAG
AGAAAGAGAA
TGCCCCTTAT
AAAGAAGCTA
ATTCGTGTGG GTCTTGCCCA TATN'TTAGT AATATCAGCG AAGTTGTATC TTrTTCCG CGAGTAATGT ACCTAATTT T'rTAACAAAT GTTTATTTTT ATGAGAAAGA AACTAGCAGC 370 ATCGGAACGA TACTATGAAG GTCAATTTTG ATGATTGAGT TTGTAGCAAC AACTTNTAGT CTAGCACATG CTACAAGAAT GATTGCAGAA
TTCATTAGAT
CAGI'TCAAA
TGTAAGCACG
0 *0 00 0 a 0 0s *0 0 0 0 *b *0 60 0 ATTAAAACTA TAGCCAATAA AAGGAAAATC ATACTTTTCA CAGCATGGCT ACAATCAGGA CAGGAGTACC ATGAGAAGGT GAGTTCATCA AAGGAAGTTA AGCTGCCCCC ACACCCATAG AAAAAGGATA TGGAAAAGGT GGCTAAGAAA GAAGAAAAGG TTTGATGTAG GTAATGATGA GTCAAGGCCC AAGATGAAGG TCCCATGAGT GACATCCCGC GACGACAATA GCTGTTATCA AAATTCTGCA ATCATAGGTC AGATTGGT'rT CGGTAAAAGT AGTAAGACTT GATCGAAGTA GTCTTCCCAG CTTTTTTCAA TCAATCCCAG CAAAGGGTTC GCAATCAAGA CCCGCTGGAA TAGTTCCGAC ACTTTGCATG AGTGATAGA.A ATTGACAGGC TCAAGAGTTG CTTGAAGAAA TAATAAACAT ATCCGTATCT CAGTTGAACT TTTAGCGACA TAATGCCGAT GGCGGTATCG TGGCAGCTAG CAATCCAAAG ATAGGGCTAC ACC'rGGTAAG AGAGTCCCAA AACTGAAGCA GACTATTAGC ATACAGATAA CCTCCAATT'r A'rTAGGGCTT~ TATCGTTGTT TGGGAGCGAT ATACTAGAAC CTGCCGCAAC GTAGCTCCGA AGCTTGAGGA GCAGTTGCAG CTGGGGTAAT GCTGTCACAG ACACGAGAGT ArlwrCCCATGG CTTTAGCCAA ATCCAGATTA ACAAGAGGAT TGGACGGCCA GGATATTACC CCAATCAAGA TGATACCGAG CT'r=TGATAA TCGAGTTTCC ACAATGGCTC CGATAAAGAA ACAGCATGTG AAATGGCATC ACAGCTCCAG CTACAATCCC AATr-TTCCA ATCCATCGAT GATTACCGTA AGCTTCTT CAATCACTTC TCGATTGACA CGTGGTGAAC GATGAGAACC TGATrTCCTC ACTGACAGAG CTTCCTGCAC CAAACATCTG TAATTrrGACG TTCAGCGTAG 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 21420 21480
GTAGAATAAT
AGGCATTTG
ACCTCCATTG
TTCTTTTGTT
GTGGGGAATC
ATCTCTCAGC
ATCCAAGAGG
TTGACCTCCA
GAAACATCCC
TAGGAAATGG
AAAAAGAGTT
GGACCAAAGG
TTGCTGAGGT
GTATTCATGA
ATATAGTCGG
GACAGTTGAC
TCAGCTACCC CGACGATTTC AAGGGCCTCT CTTICGAAAGA GAGGAATAGA GGGAAATAGT GGAAAGTTGT AGTCGATATT GATTTTTTGT TTAACTTCCT TGTCATCGAG AAATGCCTGA TTTAATAGTG TTGATTTCCC AGCGCCGTTT TGGAGCACTA GTGAAATATC CTTAAGTGCC TGCACTTTCT TCCAATGTTT AGCCT'rTAAA CCTAACGAGA CGCATTCCTT GACCTTGATG TCGACATAGG CAATTCCGTG TAAGGA'TTTT CCTTGATGTG GCATAATTCC CAACATACCT GGACCAATGA TGCCGGTAAT TGTTGGTCCA AACGTT'rCTT TGTAGGAGAC ACTGAGGTTT TCGATACGTA TCATAAACTr GTATTCCTCC TGTCTCTTAA TATACATTAA AAAAAA'r AAGTCAAGTr AATTTTTGAA AAAATTAAAA TAATAACrGA AAAATAGAT'r CTAAAGATAA CTTTCAGGAT AAATTTCTAA ATTATAAAAC GCATAGTATC AAGTGTAAAA AACTTGGAAT TATGCGTTTT ATCATGGAAA GA 1-N1-ITr AATAGCTAAA AAATAA
21540 21600 21660 21706
INFORMATION FOR SEQ ID NO: 37:
SEQUENCE CHARACTERISTICS:
LENGTH: 6171 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GATCCCCAGG AAAAACCGAG TTCTAAAAAC CTATTTC rA AATTTTCAGA AAATTTCTCC AThTTATCTTC AGTAACTACT TGTCAGAAAT GAAGCTTCCA TTTCATAGAT TGTTCCTTrA CCGTACTGAA GAAATCGCCA GATGCGGTTT GTDTCAAC CTAGAATTTC CTCAATGGAA ACGAAGTATT ACAGATAAAG TGACACCTGC TGCCTTAGCA GTTrTCCCAA TCAATCGTTA TATTCTACAC TATT'rTCTA AATAAAAACC AACTCTTAGA TCCTGAAGAT AAGCGTCAAA TTGCTAGTGC GTTCTGACAA TTGGATTGGA CAAGCAGAGT ACAAATCCTT GCTCTGCAAC TCACGCAAGA CTTGTAATCC CTGTCATATT CCACTCCTTA AAATAGCAAG TATATTTTGT ACTGATTCTT CATTTCACTT AACTTCTTCA. TCTGAAATCG GTTCAAGTCT TGCAATCGGC
TTGGTCGTTCACATCCAC'TT
TGCTCCTGCC AAGPAAGACAC TCGTTTGGCA CGGCTGGTTG ACACGTTTCA AGCTTCCACG CTGGGTCAAG AGGTAGAAG CCAGATTGGA GGACATCATC TTCTTTCAAA TTCATAGCCT CCGACAACCG GAACCTCTTC GATATTGAAA CGCAGGGCAT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140
AACCATTTTG
GATCTGTATC
GAGTGAATTC
TTGTCGCATC
iGTTTGTGAT
CATGGA'PTGG
TCTTGGCAGA
TGGAAGCCGC
ACTAACCAAG ACAACATCAT CTAGTTTA.AT CGGAGCCACT GTCTTTGAGC TTAGCATACT TGACAGAC?1' AGATC'TATAG TTTTCGCTCT ACCCGTTTGA TrrTGACCA.AG GCGAGTCACT GTCAAACTGA TCCAGTACTT CCACATAAAG GATTTCTTCA GGTTTGGCTC AGATGCTCTC CGATGTCCTT CCAACGAATA TCTGTAGATG ACATTTCCAA GACTTGTGAA CATCAAGAGG TTGAACAAAA ATCAAACGGT CATCATCACG CTTGCCAATT AAAGGAACGT GGACTGGTAC GCTTGATGTA ACCTGCCTTG
GCTACAATCT
GTCCGCCATG
GCAAAGTAGG
TTCGTTTCAA
TCTGCCAACT
TGCTGGGTTG
TCTTCCAAGG
GTCACGCTGA
372 CGTAGGTATC TTCCTCAGCG ATAAGACTAG CTGTATCAAT CTCAATTGCT TTCGCAGTGT CTTCTAAAGA ACTCAAACGA GGAG'TTGCAA ATTTCTTCTr GACCTCACGA AGTTCTTTCT TCATGAGATT GTACATAGTC CTTTCATCAC CGATAATAGC CGCCAGCATA GCAATCTTCT 1200 1260 -1320 CACGAAGCTC TGCTTCTTCT TCCTGCAAGA CAACCACATC GTTGCAAAGT TACGATAGCC TCAGCCTGTT CTTCCGTAAA TTTCCTTGGC GTCCGCCTTA TTCTCAGAAG CACGGATAAG GGTATTGGTC AAACGGTACA ATCATAGCTA ACTr'rGAGGT AGCAA'rGACT TCATCCAAAA TCGAAATCAC ACGAATCAAA CCTTCGACGA TATGG.AGACG 7rTCTCAGCC CAAAGCGTGA ACGCGCCAAA GAACAATCCC AACCTGACGA TGATTTGTAG GTCGGTGTAC TCTTAAGTTC GATAGCGATA TCCCAGCTAC CTTGTTAT'rA TGATTTCATA AGGAATCTCA TT~TCAGTCTT GGAACGAACA CATCACGACC CTGAATAATA ATCACTTCTC GACGGTGAGC GGTGTGAAAT TGTCAATCGC
TTAAATAAGT
CGAAGACCAT
ACACGAACAT
ATAATAACGA
ACCACGCGCC.
GCCCCTGTAG
AGTTGAGAAC
CACGGTCAGA
CATCGATTTT
GATATAGCTA
CACCATATTA
AAGCTCAGTA
CTCATCACGA
CTTGACTAGA
TTTTCTTTGT
GACAGGATTG
AAGTTGTAGT
TTAGCGTCTT
ACCTCAGCAA
TTGGCCTTAT
AGCTTTTCAA
TTCTTGATTT
TTTGTTCCTT ACC-ACCTTTT CTTTCCCAGT CTCATAAGCT GGAAGTCTGG TCCAGGCAAG AAT'rCCATGA TCATGTAAAC TGCAGCATCT ATGACCTCAG GTTTATCAATr CTTTGCAGTT GGGTGGTCAA CTAAATTATG GGGAGGAATG TCTGTGGCAT AACCAGCCCA AATCCCAGTC GAACCATTGA CCAAGAGGTT TGGAAAGGCT GCTGGCAAGA CCGTTGGTTC TTTCTCCGTA TCGTCAAAGT
1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940
TCCATGCAAA
ACAAACGTGC
TACCGTGCAT
CATCATAGAT
AGGAACTGTC TTTTTCTCGA CTCAGTATAA CGCATAGCCG TTCAACTAGA ATCTCACGAT AGAAGAATCC CCG'rGTGGGT TGGCCGACTT ACGGTAGCTC T'rGTCAAAAG TACGGCGCTG AACCGGCTTC AACCCATCAC TGTACTTGGA GTACGACCA AAGCGCTCTC TGTTAGACAT AAGATACAAA GCCCATAAAA AAGCAAACTC ACAAGAGAAT TTATCTTrTTT TTCAAAGAAT GTAGAGTAGG TTTTTATGCA TGTTCAGTTA CGATAAGTAA CCAAACTATC TTTTCCAAAA TTAGTCTTAG TTTGTGTCrT TATCCTGAAG AAGGTACCT' GCAATTTCAG CAGGAGGATC TCCGTCCATA GAACCGTTAT TT'rTCCAGTT CTGTGACATA CGAACCATGG GGAAATTCCC CATCATGTTC CCGACTGACT TATTGCTAkTC CTTATTCATA GAATAAAGAA GAATATCTGG CAAAGCCCGG TCTTGAATAA CCATGATGTC CTCCAGGGAC ATGTTTTGAA TACCAAGTGA AAATAGAAAA TTCTTGAAGT CACACAGTAT CTAGGGCGTG TTCAACTCCT
ATTCCTCCCG
GTAAAAGATA TTTTACGGGA CTGTTTGTAT TrTTCAATAT GAAAATCTGG AGCCGCTCCC TTAAGCGCCT CTTTGAGATA AGCACTCA'A GCAGAT'rCTT CATTAA'rAAT CCTGCAATTT TTTCAAACCA A0ATr-rTA ACTGCTTTT CACATAGTCA TTTCTTTCAT TAAAACACTG TTTACGGCGT GCT'rCTACCT TAAATCTTCA A'IrGTGACAC CAGCTGGTCC GCAT'rCATCT GAACrG~rrA CGGAGTTCTT- CTGCCTTTA CCTTTGGACA CTCGACTAGC GGACCCATGT ACCG'rCGGTA TCCGCATCCG GAAGTCTGCT CCAACACCCG GAGGATATCC GCCAT'rTGG 'rTCACATCCG ACTCTAATTT TCGTTCTTC TAGCGTAAAC
TATCTCCCAT
GGATGAGGGT
CACCAAGTCC
GAGAACATTG
ACCIGTTTrCT TTTGTA'rCGT C'TAGT'rCTCC GTCCC'TCCAA TCT1TGTAAAG AGGTGGGAGG
AACGGTAGAA
TCATGATAAT
CACCAAT1GGT
CCTTGGCTGT
AAATGTCAAG
GATCTTATCA
ATAAATCATG3
ATTGACAACC
GGCAGAACCA
AGATTGGGCT
ATTTCCGCTC
CCAG~lrTACT AACATATTAT 'rTGACATTAT CTTCAATCCA ACGCGGCCCT CGGCGCG;CGC CGGTTCATGG TTG'TTT1CCCA TGGAGGGTAG CGCCTTTACC GCG'rAGGCCA CTTCTTCT -r GCAATATAGA CATGACCTGC AGCAAGGTCT GGATATCGGC TAGTTGGCAT CTTCAATAGA GTA'rrGATCT CTTCATTTrT TTACCACGAA GAGGTAGAAT CCCGCAGAGT CCCCCTCAAC GGGGTCAATT TCCCAGACAA TCATCACGCC CCTTACGTGC AGGTTAGAAG CTAATTCCCC CCATCCACAA CTGGGCGAGC TGCAAGTGTT CTTCAGGAAC AGCCTGGAAC TTGCGGTCAC GACCT'rGTTT TAGATAGAGT TCATTCTTAG CAGGATTCTT CAAGCCCTTA TCTTTCTTGT TTTTCTTCCC 3000 306o 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 41.40 4200 4260 4320 4380 4440 4500 4560 4620 4680 TGCTTCACGA GCATCACGGG CCTTGATAGC CTTGCGGATG ATTTTCCATA AGGAAAAAGG TCAACTTATC AGCCACTATT TAGGGGGCTT CCTAGTTTAT CCT'rGGTCTG TCCTTCAAAC TAAGATAGAA AGAACGGCCG CTAGTCCCTC ACGATAGTCT GAACCTTCAA GGTTTTTATC TTTTTCCTTG AGAAGACCTG TTTTACGTGC ATAGTCATTC ATGACCTTGG TAATGGCA.GA CTTGAGTCCT GTCTCGTGCG TTCCACCGTC CTTGGTGCGA ACGTTATTGA CAXAAGATAG AATGTTATCT GAGAATCCGT CATTGTACTG GAGGGCTACT TCCACTTGAA AACCATTGTC TCCCCTTCA AAGTAAAGAA CTGGCGTCAA GATTTCCTTA TCTTCGT.rGA GATAAGAAAC AAAATCTTGT ACTCCATTCT CATAGTGGAA CTCAATCGCT TCATTTGTTC GCTTGTCCGT TAAGACAAG GTCACATTTT TCAAGAGAAA GGCTGATTCA TTAAGGCGCT CTGAALATGGT ATTGTACTTG AAATCTGTCG TAGAAAATAT AGTCGCGTCA GGCATAAAAG TAACTTGGT GCCTGTTTA GACTTGGGTG CTGTACCGAT TTTCTTCAAA GTCGTGACAG GTTTTCCACC ATTTTCGAAA CGTrGCTTGT AAACTG;CGCC ATCACGGGTA ATTTCAACTT CTAACCAGCT AGAAAGGGCG TTAACAACGG AAGAACCCAC TCCGTGAAGT CCACCTGATG TCTTATAGCC ACCTrGACCG AAT'rTCCCTC CGGCATGAAG TCCCATAGCG TGCATACcTG TCGGCATCCC GTCTTTATTG ATAGTTACAT CAATACGATC ATTATCAACG ATTCCCAAA CTAGGTGATG 374 AATGGTAAAG ATAACC'rCAA CAGTTGGAAT ACGTCCATGG TCTTGAACCG TTAGACI'ACC ACCAAACCCA GACAAGGCTT CATCGACTGC AAGACCAGCG CCATCGGTCG ATCCAATATA CCCTrCTAGC A=tGAATAG CATCATCATT CACAAGGAAC CTCCTAT'rCG TrCATCTTTA AAATTTTCT TTCTCCGATG TGACAATTTC
CATCCCTGGA
ATAATTGTTA
CTATrTCTACA CGTTTTCGGA. CCGCATCCAA ATATTGAT?1' CC~wr=GA GGTT'rTCCAA GGATTTTGCA.
AGCAGAGATT CTCTGCT'rTT CTTTCCCA.A' TCATGATATA GTTTTAT'rAA. TCCTAGCCTA TCTGCTGGGT TCGATTCCAT GTATrCTr'rC AAATCAATCT ACGCGAGCAT GGTTCTGGTA TTCCGCMTr TAGGTAAGAA AGCTGGTATG GCAACCTrTG ACCCTAGCAA CGCTGCTTCC GATTATTTTT CATCTACAAG GGACTrVTGG CTGTTATCGG CCATACTC CCTATCTTTG GCTGTCGCAA CCAGTGCTGG AGTGATTT'TC GGATTTGCGC GCGATTATCT TCTTTGGAGC TCTCTATCTT GGCAGTATGA GCATCGAT'rG CGGCTGTTAT CGGGGTTCTG CTCTTTCCAC AACTAGACT CTCTCTTCAT CGCTAI'rATC TTAGCACTTG' CATAAGGACA ATA'rAGCTCG TATCAAAAAT AAAACI'GAAA AACCTAACCC ATCAAGATCC TAAAAAATAA AATGCCAGTT TAGACAAATA ATTTATCCAA AGGATTr'AGT TCTGTACTGC TTTTACCTrA. ATTCGTTTGT TGTTGTAGTA ATCAATATAG TTGATTAAGT GATTTAAATG TTTTCTCATA GCCATAAAAC AAAGAAAGAT TCCATCCTAC CGTTGTCTTG GCTGT'TGCCC ATAGGAGTAT GATrACAATA CTGGTCTCTG GATTGGACAA ACACTGGAAC GACCAACACC TGATrGACTT TTTCAAAGGA GCGTTTCTCC TCTCATCTTT CAGGATTTAA AGGTGGTAAG CTATCTTCTG TCTCTACCTT TTTCACTGTC TAG'rGTCACA TTTGT'rr TATCC7GAGT CTAGTTtGAT TATCATTCGT AT'rTGGTCCC TTGGGGATTG CTGTACTGCC CCCAAACAGT ACAGGACTAA GTCCTTTTAG 'rCTATAATGG CTrGTTCCAA A'rTTCGGATT TTAAAATGCC TTACGTGACA TGGATGCTTG 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6171 AATTCCCTTA C'rCTCTAGGA ACCGATGATA AGAATCGTGT TGGTATTGCC ACC'TGGTC ACTATGGAGA ATCGTAT'rCT CGTAGTGCTT CTCTOTGAAT GCCTGTTCCA A
INFORMATION FOR SEQ ID NO: 38:
SEQUENCE CHARACTERISTICS:
LENGTH: 18475 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TAT'rACAAAT AAAAAAACGG AGGAGTGCP ACI-rGCTTCT TTITTGATG TAGACAAACC GCGTATTGTA AAAACCACTA TTTGTGGAAC TACTTGCCAA AGTGGTACCA TTCTTGGCCA GGAAGGAGTT TCCAACTTCA AAAAAGGTGA TGGTAAATGC TACTACTGTA AAAAAGGAAT GATTTTCGGT CACTTGA'rTG A'rGGTATGCA TAATACTCTT TACCATACTC CAGAAGACTT CATTCTGCCT ACTGGATATG AAATTGGTGT CGTAGCCATT ATTGGTTCAG GTCCAGTTGG TTCACCAGCT AAATTGATTA TGGTAGACCT ATTCGGTGCG ACTCATA.AGG TTAATTCTTC TGATNTGACA GATGGTCGTG GTGTGGATGT ATTTGATTTC TGTCAAAAGA TTATCGGTGT TGGTAAACCA.
GTTGAATTCG ATTTAGATAA TGGTTTGGTA TCTACAAATA CGACTCCACA TGAACCGGAA AAATTGGTAA CTCACTATTT AGTCTTCAGT AAGGCAGCAG ACCACCATGC AGAAGCCTAA GTAGTAAAAA TATTTTTGTA AGATCGCTGG ATTTTTTATC AAAAAATTAA GGAATTGGTT ATAATATACC CTACAAAGGA GTACGGAGAT TTTGAACCGT GGTGGTTCTT TAGAAAATTT GACCAGTATT ATGATGCTCT GGAACAAGAA TCGCCTCTTT ATAAAAGTAG GGAAGACCAA CGCTGGTGTG ATGAATGTGA TCTTTGCAG GATGAGCAGG TTATCCCAGA AACCAGTCAG GAAAGGAATC GTTCTTGCCG TTTGGAGT'rG CTTTTTTTAT TTTTCTAACT
TIATGAAAGCC
AGTTATTCGC
AGACCTCCAX
CGAAGGGATT
CAAGGTCT1'G
TTATGCTCAC
GGCTGAATAT
GTCAGATGA-A
CT'rAAAAGGG AAAG'rAGAAC CTGGTTGCAG ATTGGCTGCT CTTTTAACAG CCCAATTCTA AGACGATAAC CGC'rTGGAAA CTGCCCTATC AGACCCTGAA AAAGCCATTA AAGAAATTTA CGCTATCGAA GCTGTTGGTA TTCCTGCAAC AGACGGAACG GT'rGCCAACT GTGGTGTGCA ACTTTGGATT .CGCAACATCA ATGTAACAAC ATrGTTGAAA GCACTTGAAA GTCATAAGAT CAAACTCAGT GAAATTGAAA AAGCCTACGA CATTAAGGTC ATTA'rCGAAA ACGATATCTC CATAAG'TAAA TAGAAATTCA GTCATCCATC GAAATGAGCA TATTTCTTTC CT'rGTCTGGC ATGAATGAAT ATGTATCGTG TTATAGAAAT AGAAGGTTGG GAAGAAGATA TTGTAGCAAG CAAATACTAC AAAACTTGCT GGTTTAGATT AAGCGACTTG ATGACCA'rTT TTTGGGACCC TGAGTATTTA CAACAATACC ATTCTTTGGC CGAAAAACTA CGCTCAGGCT ATGAAAAACA TATGAAATTA AAATAGAGAA AAGTAAC'Tr CTTTGCGAAT AGTATAGGTG AGGAGGTAAG TATACTTATG TTAAACCAGG AAGCCAACAG ACGCTATTGT ATTATCAAAG GGGATGTTCC GGGAT'rGTTG AAGAAGI-rGG ATTTT=GCG TCTGTGCCTG TGTGAAGACG AAGGGGGCTG CTACGTGTCC CTCATGCAGA GCTTTGG'rrA TGCTGTCAGA 376 TATGGT'rCAA GAAATTGCAC AAGAAATCAT TCGTTCAGCT CGGAAAAAAG GGACGCAGGA TATC'rATTTT GTCCCTAAGT TAGACGCCTA TGAGCT1TCAT ATGAGGGTAG GAGACGAGCG CTGTAAAZATT GGTAGCTATG ATTr'rGAAAA GTTTGCAGCC GTTATCAGTC ACTTTAAGTT TGTGGCGGGT ATGAATGTGG GAGAAAAAAG ACGTAGTCAA CTGGG'NCCT GTGATTATGC C1'ATGACCAT AAGATAGCGT CTCTACGTTT ATCTACTGTA GGCGATTATC GGGGCATGA GAGTTTGGTT ATCCGTTTGT TGCACGATGA GGAGCAGGAC TATTGAAGAA TTAGGCAAGC AGTACAGGCA ACGGGGACrC
TGGGAGTGGT
AGTTATGTCC
GAACGAAGCA
AGATCTCTTG
TAGT'rrGACA
TGAGCGTCTG
AAGACGACCT TGATGCATGA ATTGTCCAAG ATCGAAGATC CTGTCGAAAT CAAGCAGGAC ATCGGCCTAA CCTATGAAAA TCTA-ATCAAA ATTATCCGAG AAAT'rCG'rGA CAGCGAGACG GGTGCGACAG TCTTTTCAAC CATTCACGCC CTGGAGTTGG GTGTGAGTGA AGAAGAATTG CTGCATTTTT GGTTTCAGGA TATCTr'rTTG CTGGTCCGGT TCACTCTr'rA AAGGACAGCA GACATGCTTC AGTTGCAG'TT CTTTCCTTGC GTCATCGACC GCGCGTGCAG TGGTCAGAGC AAGAGTATCC GAGGTGTTTA GCAGTTG'TTC TGCAAGGAGT TT'TGCAAGCA GAGATTATCA CTTCTTAAAG ATGGACATAT TAAGCAAAAA AATATCATCA CTGCTACCAG AGATTAATCG AGAACACCAA GCAGCCAAGT CACAAGTCTT CAGGCTGAGA CCCTATTTAA CAATCTCTTT ATAGGAGTGC TTTGTTGGAC GGAAATCATT CTCAGAAATG TATCCCTAGC TGAAGTTCAT TGGACAATCT -GGCTAAGGTC TGCTGGGTTT TCTTCTCTTA ATAGTAGCAA TAI'GCCACC TAGGGCTTGT TTCCGTGCTT TGAGTGTCTT TTCTATCT'rA 'rGACAGCCTA TTATGCACGT AGATTTT'rCA AATGATrGCAG GGGGAGGAGG AATCGTTGAC GGAATGAGCA AATTGACCAG CGGAAAAAAT TAGCTACAGC 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 TCTAGCGGTT TTCATCTGGT GGAGACTATC TCCTTTTTAG AAGCAGTGTG TGACCCAGAT GCGTGTGGGC TTGTCTCAGG ATGGAAAGTT TGGGATGTTC AAGTGCTATT GTCACTCAGT GGCAATCTCC ACCTGAGTr'r GGGAAAGATA GAAGAATATC AAGAAAAAAT TGATTGAAGT AGCGACCTAT CCCTTGATTT ATTATGCTGG GGCTACGGAA TTACCTGCTC CCACAACTGG CAAA1TTATCG GTAATCTGCC CCAAATTTTT CTAGGCATGG GCCCTTTTAG CACTCACTTT TTATAAAAGA AGTTCTAAGA GCACGCCTTC CCTTTATTGG AATCTTT'GTG CAGACCTACT GAATGGGGGA ATATGATT'rC ACAGGGAATG GAGTTGACGC GAACAAGGTT CCCAGCTCTT TAAAGAAGTC GGTCAAGATC TGGCTCAAAC CCTGAAAAAT GGCCGTGAAT TTTCTCAGAC GATAGGAACC TATCCTTTCT TTAGGAAGGA ATTGAGTCTC ATCATAGAGT ATGGGGAAGT TAAGTCCAAG C'rGGGTAGTG AGTTGGAAAT CTATGCTGAA AAAACTTGGG AAGCCTTTTT TACCCGAGTC AACCCArCA 377 TGAATTTGGT GCAGCCAC1'G CGGCAATGCT CATGCCCATG ATGACAT'rCT TGAAAAAAGC TTGCTGATTA TCAGCGTGCT GCAGTCAATG ACAAAGGAAA G'rTTATCT
TATCAAAATA
TAAG4GTTAAA
TTTCTTGCTC
AGCAGCTG?1T T'rGTGGCACT GATITATCGTT TTACTTNATG TGGAGGTAAA TI'TTAAAAT GAAAAAAATG TATAGCTTAG AAAAGAATGA AGATGCTAGC ACG4GAAGAAC AGGCTAAAGC TTATAAAGAA AAAGTCAATG ATTAAGGCCT TTACCATGCT TATCCT'rGCC TTGGGCTTGT CCGGCTCTGT GATTTTCTTT ATGGAGI-rTG AAGAACTCTA TCAGCAAAAG ACTAGTCTGA ACT'rAGATGG GCCAGTCCCT AAAGGAATTC AGGCCCCATC GGGCAATTCG TCCCTGGCTA AGGTTGAA'TT ATTATATC'rA GGAAATGGAA AAATTAAACG 'rTTTACTGGA AGCAGTAGTC GCTCTAGCTA GACAAATTCA AAAAAATAGG CAAGAGGAAG GCTrACAT TTTGTrACCTA
GTTAAGGTGG
CTAAGAAAGT
TACAATGATA
GGAAAGTCTC
CCAGTCCACT
TCGGGAAACC
GCAGACGCTT
AGGCCAAAGT
TCAGACCAGT
CATTrAAGGAA
TCTTTGCCAG
CAAAAATCTT
TGGTGGAGAT G'rTGG'rGGTC ATCTGACCAA GCAAAAAGAA TGGAAAGCCA GGCAGAACTT TACAAGCAGA TGGACGCATC AAAATGGG AGCAAATCGT TTGGTTTTGG GACTTGTGAC TTTTCAGCGG TAGAGGAACA CAAAAACGCA GTGTAGCCAG AGCAATGGCA GTCAAAAGTT ATTACATTTG ACCGAGCTGG AAAGGAGCGA TTCGCTATCA ACAAAAAATT AGGGCAGTGA CATTGCGACC CTCCTTTTGG GCAAAAGCGAA GAAGTC'rTGA AAGCATCAAC GGAGTTGAGA 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220
GGGTAGCTAA
TTCAGTATT
CAATCAAAGA
GTCATCAGTG
CGCTACCAGC
GAATTAGACC
GGCAAGGACA
GGTCGAGGTT
CAACTGGTTC
GTGGAAAAAG
CTTTAGTCTT
'rTTGAATAAA GATGGCCCTG CAGACGGGGC AAAATCAGGT TTCTAGTGAA AAAGGATTGG AGGTCTACCA TGGTTCAGA.A CAGTTGTrGG GCCATAAGGT CAAGGCTTTT ACCTTGI"TAG AATCCCTGCT TGCCCTCATr GGGGATTACT CCrTTTrTCAA GCTATGiGTC AGCTCCTCAT TTCAGA.AGTT AACAAAGCGA GCAAAAGGAG TGGCTCT'rGT TTIGTGGACCA ACTTGAGGTA GTrCGCAGTT. CGAAAAAGTA GAAGGCAATC GCCTATACAT GAAGCAAGAT TCGCCATCGG TAAGTCAAAG TCAGATGATT TCCGTAAAAC GAATGCTCGT ATCAGCCTAT GGTTTATGGA CTCAAATCTG 'rACGGATTAC AGAGGACAAT GC 1'TCATTT CCAGTTCCAA AAAGGCTTAG AAAGGGAGTT CATCTATCGT AAAAAAGTTA AGGCACGTGT TCTCCTCTAC GCAGTCACCA TAGCAGCCAT TTG'N'GCAAT TrrrATTTGAA CCGACAAGTC GCCCACTATC AAGACTATGC GAAAAATTGG TTGCTTTTGC TATGGCTAAA CGAACCAAAG ATAAGGTTGA GCAAGAAAGT GGGGAACAGT TTTTTAATCT AGGTCAGG'rA AGCTATCAAA ACAAGAAAAC 378 TGGCTTAGTG ACGAGGGTTC GTACGGATAA GAGCCAATAT CAGTTTCI'GT TTCCTTCAG'r CAAAATCAAA GAAGAGAAAA GAGATAAAAA CGAAGAGGTA GCGACCGATT CAAGCGAAAA AGTGGAGAAG AAAAAATCAG AAGAGAAGCC TGAAAAGAAA GAGAAI'CAT AGTCAATTCA ACTATAATGC GTTGAATCCA GAATAC'TCCA CTGTAGTrTrC TAGAAAATTG CTGGAAATGG ATGTTAAGCT CCAATTCATT TG'rTTATA'rC TT'ATTTCAGT TITACTATACT TTGTGCTAAA 'rTAAAGATAT GAAACATGAT T'rTAACCACA AAGCAGAAAC TTTCGATTCC CCTAAAAATA TCTTCCTCGC AAACT'rGGTA TGTCAAGCAG CCGAGAAACA GATTGATCTT CTATCAGACA AAGAAATTTT AGATTTCGGT GGTGGCACGG GTCTATTAGC CTTGCCCCTA ACCCCTAGCC AAGCAGGCTA AGTCAGTCAC TCTTGTAGAC AT'rTCTGAGA AAATGTTGGA GCAAGCTCGT C C 0 *0O C C. 1*
C C a TTGAAAGTGG AGCAGCAAGC AATCCCTTGG AGAAAGAGTT GATTTGGATG CGGCTCTCTC ATTGCTGATT TTACCAAGAC AACAAGCTAA TTrGAGCATGG GACCTGTTTC AAGGAAATCA TAGTCAGGGA GTGAT'rTTTC AATATGGATT TTGAAAAAAT ATCCAAAGTG AT'rTGGCGAC CTGGATGGTG AAACTGAGCT GCACTACGCA AAGAAGAATG ACAGAACCCT TGCAGGCCAA TTTATTGTGG AAGAGT'rGTT ATGGGAAT'rC TAGGCGCTAT ATGGA-AGTGG ATGATTTGCTI CAGGCTGGCT TTGTCCAAGG QTCATCAGTG ACTTGCCTGT GTTGCTTCTA GCCAAGAACA TACCTCAAGT CAGACGGATA CAAAGTGATT TGTTAAAAGA CTGCCTGAAA ATCTCTTTC AATCAAGAAT A'rCCAGTTTT TGA'N'GCCI'T GCTGTTAGTC ACTGTTTCAT CAACATTTGA AGAAGCI'AAT CATCATGGAT TTTTTCATCT GTGCATAGTC CTCAGAATTIC T'T'r'rAATAG TATAAGGATG GAAAAAAGAA TGAACAAGCI' TATACCTATT CAACTTTTAT GACGCCTTGG AAACCAGGTC AAGGAGAACA GCTCAAGACC TACCAGTTTC TCACCAGTTT ACACCGGATG TGGAGCAAGA TTTACCGAAA GCGT'rCTTCA TCATATGCCT AGGAAGATGG GAAACTCATC TTGAT'rTAGC TGAACTGGAA AGATTCTCTA TAGTGCTGAA TAGCCCAAAA ATCACTCGCC GGGAAATTTG GTAAGATAGG TACTAGAGAA TGTCCAAGTC TGGAGCAAAA TAGCATCTAT ATCAAACCCT TAAGCGTTTA TCTTGATGAA GGCTGGGCAA CTATTGCTT'r GCTTTTGG'rG 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 TAAAGAGGAG GAAATTACTA TCCTCGAAAT T'rTCTTGACC TCGCTTACTA AAAAGGTGA
GGGTTCTGGG
TTACTGGGA
GATTGATCTG
AGATGCCGTT
CGGCTATTAT
TACTTACGCC
CGCTATTTTT
GCAGCTAGCA TGGCAGATGT AATTGGTTrG CGCCCACAAA TGCTCAAAGA AAGCGATGTG CCTGATGATG CCGTTGCGTC GCGCCATCAA CATCACTTGC TCATGGAACA AGGGCTTAAG CTAGCTCCGA G'rGATTTGTT GACCAGTCCT ATGGCTGAAA GAACAGGCGA GTCTGGTTGC TATGATTAGT TAATGCCAAA CAATCTAAGA CTATTTTIAT CT'rACAGAAG 379 AAAAATGAAA TAGCAGTAGA GCCTTTrGI' TATCCACTTG CTAGCTTGCA AGATGCAAGT G'N'TTAATGA AATTTAAAGA AATTTTCAA AAATGGACTc AAGGTACTGA AATATAAAAT AGATTTTGTT ATAATAGTTG AAAACGCT'rA AAAAGGGGTA TCATGTTATG ACAAAAACAA TTGCAATCAA TGCAGGAAGT TCAAGTTGA AATCGCAATT AAGTATTGGC GAAAGGTTTG ATTGAACGTA TCGGTTTGAA AATTTGACGG CCGTTCTGAA CAACAAATTT AAA'=TATT GGATGACTTG ATTICGT'rTCG GTGTTGGACA TCGTGTTGTT GCTGGTGGAG GAGATGTTTT AGAAAAAGTT GAAGAGTTGA ATGCAGCAGG TGTTCGTGCC TTCAAGGAAT TTGATACTTC CTTCCACACA AGTATGCCAG AA'ATTACAC AGAAAACAAG GTTCGTAAAT TAGCAGGAGA AGC'TGCAAAA CTCTTGGGAC
TCGATATTGA
ATA'IrATCAA
AATATTTCAA
GTTTGTTGGC
TGTTGCCAGA
AGAAAGCTTA
ACGG'IGCTCA
ATACTTAATrG CCAGAAGAAA AGATTCAATT TCAACTGTAA AAATCATATA CAAGCCGT'rA GGCTTATGAC GAGATTACAG AGAATCAACA GTTGTTGAGG TCCTCTACAC AACCCGGCCA CATTACCAGT GTAGTTGTTT TCGCTACCCT CTACCAACAA rGGTACAAGT CACCAGTTTG 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 GTCCATTAGA AGACTTGAAG TTAATTACCT GTCATATTGG TAACGGAGGC TCAATTACAG CTGTGAAAGC CGGCAAkATCT GTAGACACTT CTATGGGGTT CACTCCTCTT GGTGGTATTA TGATGGGAAC GCGTACAGGG CAGCTATCAT TCCTTATTTA ATGCAATATA CAGAGGAT'TT TAACACACCA GTCGTGTTCT TAACCGTGAA TCAGGTCTTT TGGGAGTTTC TGCTAAT'rCT GCGATATAGA AGCAGCTGTA GCAGAAGGGA ATCACGAGGC TAGCTTGGCT ATGTTGACCG TATCCAAAAA CATATCGGTC AGTACCTTGC AGTGCTAAAT
GATATTGATC
GAAGATATCA
AGCGATATGC
TATGAAATGT
GGAGCAGATG
GATGTAATCT
GGCGTTACAG
GA'rGAAGAAT CCAT'rGTTTT CACAGCAGGT CAGGGATTTC GTGGTTTGGT GAGACATCTC AACAGAGGCA TAGTCATTGC CCGTGACGTT TACAAGGAGT TGGGAAAGTT TTGCTA'rGAT TGGCTTTN' TGAGTTTTGA AGAAAA.ACTT GAGGGGACAA GTTTTGCTAA GTCGGTGAAA ATGCAGA( AG TTTCCGTCGT TGTGATGTTG ATGATGAAAA GAATGTCTTT GCTAAAATCC GTGTCTTGGT TAT'rCCAACA 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 GAACGCTTGA AAAAATAAGT GAAACTAAAA AAATATTCAA ATTTTTCCAG CTTCTTTTTC TGATGAAATT GTCCAAAACC GAAAAATATG GTATAATAGT AGTAATTTAA TAGATGGAGT TCGTGTAAAA AGAGAGAAAG ATTTTAAGGC GATTTTCAAG TCGCAAATTT GTGGTCTACC AATTAGAAAA CCAGAAAAAC
CGTTTTCGAG
ATTAAGCGAC
TAGGTCTATC AGTTAGCAAA AAACTGGGGA ATGCCGTCAC TAGAAATCAA GGATTCGGCA TATTATCCAG AATGCAAAAG GGAGTCTGGT AGAAGATGTC
GACTTTGTTG
AATCTACTCc
GAAACTAAAG
TAATGGGGTA
CTTCTTTGCG
TCATTGCTCG AAAAGGAGTC ATGTATTAAA ATTATCAAAG TTGACTAGTT 'rGCTAGGACT ACTAGCGATA TTACAGCCGA GAAATCATT'C GCTTTTTATC 380 GAAACCTTGG GATACGCAGA GATGGAGAAA ATTNACCGGG. AAGGAAATGG GAGTGAAAAA GTCTCTGTTA ATCATGACAG CCTGTGCGAC ATCGGCTGAT T'N'TGGAGTA AATITGGTTTA G7TTTGATATr AGTATCGGAG TGGGGATTAT TCTCTTTACG GCTTGATTC T1TCTAGGAAA ATGCAGGAAG TCGAGATATG GAAAGCAGAA GGGTGTCAGA CAGTCAGACT CCTGT1TCCAA GCCCTATCAA CCTTGGTAGT GTGGATACAA AAGTACTTGG TTGTCCAACA GTATGGGAT'r CCAGTCTTGA ATACTGGACA GTGTCTAATG G'rACAGTCCT CT'rGCCAG'rC TTTCAGGTGC AAATGGTGGC CTCAGCCACG CA'rTAAGGCG CTTCGAGAAC AATATCCAGG CCAAACTAGA GCAGGAAATG CGTAA.AGTAT TTAAAGAAAT CTCTTTGGCC GATTTTGATT CAGATGCCGG TTATTTTGGC GAGTTGACTT 'N'TAAAGACA GGTCATTTCT TATGGATTAA
CCCTTGTTCT
AACCTTTGTC
TTTTTATCTT
CTTATCAAGT
CZAAGAT'rATC
AAGAAAACCC
AGTATTTACA
TCCAAGAArc
ATTTGGTAAA
ACCAAATCAA
GAAGACGGT'r GCAGAGCGCG AGGCCGTAGT AAGAAAAAGG CTCAGAAAAC GGTTCAACTG TTGAAGAAGC AAGGCTCATA TCAAAGTCAT AAACCAGCCC AAGTGGATAT CAGGTAGTAA AAGGCGTTCC AGTGAAGAAA CCGTTGACCT TCCGATTTTA GCAGCAGTAT TCACCTTTTr' TGAGCGAAAT GGCGCTACGA CTGCGATGAT TGCAGTTTAT GCGCCAGGTG GAGTCGCCCT CTTGCAAACC TATT'rCT'rGA ATAATCCATT ACAGGCACAA AAAGATTTGG AAAATAGAAA GAAATAAATA AGGAGGAATC TGGTAGTGGT AATCCAGAAA GGATTGAAAG AATTAGATAT TTCTAGGGAG AAAAAAGGCT TTCTTGGTCT TGAAGCGAT'r AGTGAAACGA CTGTTGTCAA GAAAAAAATC AATGATTTGA ACGAGCCTGT TGGTCATGTG GTTGATGCTA TTAAAAAAAT 8820 8880 *8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 0000 0 o* AGAGGAAGAA GGTCAACGTA ACATGCCAGC ACTATCI'AG CCAGGAAGCG ATGAGGGAAG TGAAAGTCAA CAACTAGAAG AGTAGCTACG GAAGTAATGG TACACT'rTCA AATGATTATA AGGTCGTAT ATCGGCTACC TTATCTTTAC AACCGCTATT CGAACACCG'r GCAGAAGTCT TTTr'CTGATGA AGTCAAGGCT GAAATCTTAA AACATGAAAG AAGAAACTGG TCACATTGAG ATTTTAAATG AACTTCAAAr AAGCAGGCGC TGATGACCTT GAAACTGAGC AAGACCAAGC AC'N'GGGCTT GAAAGCTTCAA ACGAACTTTG ATATTGAACA CTTATGTTCA AACGA'rTATT GATGACATGG ATGTTGAGGC ACC&TCGTAG CATCAATCTA CAAATTGACA CCAACGAACC ATGGTAAAGT CTTGAAGGCC TrGCAAC'rGT TGGCTCAAAA CCAGAACCr'r CTACGTTACA ATCAATGTCA ATGATTATGT TGCAGACCTA TGCGCAAAAA TTGGCGACTC GTGTTTTGGA 381 AGAAGGGCGC AGTCATAAAA CAGATCCAAT GTCAAATAGC GAACGCAAGA T'rATCCATCG TATTATTTCA CGTATGGATG GCGTGACTAG TTACTrCTGAA CGTGATGAGC CAAATCGCTA TGTTGTTGTA GATACAGAAT AAGTAAAATC AGGTT'ATCC 'rGATT7M'IG CTAGTTAGAG GAGGTTAAAC TCATGTTGAA CGTAATCCTG ATAAAGCGGG CAAGAGGCCC GAAAGGTT TGGCAACTCC AACAGACTAG TGGGTTTATC TACAGAGAGA GGGACATCTA CTGACI-rTGG GAGCAAACAC TGGGCAAGCA TATCTAACCT ACTCTAATGG ACTTTACGCG AGAAGGTGAG CC'rATGACAG AAAATTCGTC AAAATTCTTC CCTATTATCT CATACAGTCC AAGAGTGAAC GCATTA'rATA ATAWACTTAC AAGTGTTCTA AATTGGTT ATTGGTGTTG ATAGGATATG TGTTAAAGCA ACAGTAGGGT CTTTCGTCCA ATCTTAGCAG CCCTTACTTr GGACTTGCTG TGGAACTGCA ACTACAGCTC TCGAAAGATT ACGAAGGTAA TGCAACAGTA TCTCTTATCG AGCAGCGATT GGTATCATCT 
GGCAACTCAA CGCTTGACTG AATCTGGTTT GTAGATAAAG TCTTAAATTA CCTAAGT'rCC GATGCTCGTA TTCTTCGGAG AATTTCTTTG GAAGTCAGTT GGCCAAAGTT TTAGACATTC TCAAAGTCAA CGGTGGGAGG AAGTCAAGAA GTTCGAAAAG TGAAGAAGAA ATCGTAGAAG
AGCTACGAGA
AGTCCGCTGT
AAAAACAATT
CTAGCAATAT
CACTTTTGAA
ATATGTTGCT
CTCTTAACTA
CAGCAAACAA
TAAGATAAGA GACTATAG AC?1rTGCI'GG TTTGCAGTAC AGCAGAGCGA GAGAAGATGC TGGCATTCCG CCACAAAGGA TACAGAACTG GCCAAAGCCT TTCAAGCAAG CCATCCAGAA CCAGTGGATG AATCAGGCCC AGCGTTTGAG ACCACAT CGGACAAGTG ACAGAACCTA TGATGGCCTT ACG'N'TGTAT
AAATAAGATA
GTAATTCTTG
CAAPLAGGAGA
TTTGCAGAAT
AAAACCTGCC
TAACGTGGGT
CAAATTCtAA
CAAAATTGTA
TCATCGAACG TAAGAAGGAT CAACCCTTAA AGGGATTTAT CGAATGAAGA AAAGCGTCGT rT'rAGTGA.A GGTAGATGTT GCTTATTGAA GTCTTATTCT ATTTGTAAALA CATCATAAAT GTCT'TTr'GT TTGCGCTTTC ACAATTATGG AAGTCGTTTC CCCGCATTTT TCGTAGGTTT CATGACGTTT 'IrTCAGGGTT GCTGGTGGTT TGGTTACAAC A'rGGTGCAG CGGTTATCGA GCAGAGTTTC CAGATTTTGT 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 TATTGATTGG TTTTGGAATA AATATCTTGC TCGTAGCTCT GAACCCTCTT TATTACTGGT CACATCATGG TACAACAAGC TTCTAT'rCTT
GTGGACTTTA
GTGGTGGCGG
TAGCAGGACG
TCTCAATCTT
AGTACCACAA
CTGGGCAGTT
ATTTGCGATT
CT'GGTAAG
CCACGATACA
TTGCGCAATG
AGTTCAAATA
GGTCACCAAC
AAAGAAGAAA
GTTGTTGCAT
CTTACGGTAC
TGACTGTTGA
AGCAAr'rrGC
GTTTAGACAA
CTGCTACCTT
CCATTCTTTT AATCTTGGGT CCAGACATTA TGTCTAATAA AGAAGTCATC ACTTCAGGAA TATCCAAACA GCCTrACCT GTTCGTATCT GAGTTGACAA ATTCCCAGCG GTTGACGTTG AGGAThCC TTTGG'r'r'GA AAATCCGATT CTTATTATTA GGTCTACGCT GATAAACGCG TGTCCI'CAA GTTGCTCTAG TGGCTACCAT GGAAATATCG ATACCTTGGT ATTGI-1'GGT'r TCAATTTGCC AAAGCAAAAG TTAGTA'rCTA GAAAAGGAGA TGGG'rTCA'rC AATGGTTA'rC CAGATTTTAC AGTCAATTCA ACATCGTAAT CGCTTCTCT'r TAATI'GGGCT TGATAACT'rG TACAGTAAAA GGTTGGAGGG CCTC'rTrAAA TAAAGGAGGC ATCCGACTAG GTTTAGAGGC TTAATTGAAA GTGGGGCAAT 382 CTCTATTCAA TCCTGCTAAA CAAGATTTCT TCTCAGTTTA CITrGTTCGTT TTGATGCAAG ACGCCTTCCA AGGTATr'rCA AACAAATTGT CAGCTTCT'rA TGGATTTGGT TCTCCAAATG
TTGGTCAATT
CAGGATTTGT
GCGGATGGAA
GAGCT=TT
ACTTTGAATT
ATGTACTTGT
ATAAAGAGAA
AATAAAATGG
AAGATGAAGG
TGCAGTGTCG
CA'rrTGATTC
ATGGATIGATA
GGCTGGACAG
AGATATGAAT
TAACAATTGG
TTTGCCAGAG
GATTACAATT G~r1-rGCTCA ACCAGTGT'rC TTGACAATG AGCGGCTGTT ATCCTTTCCT TGTGGCCCTr CTCGATTTGG CCCATGGCTT GGATTTCGAT GTGTCTCTrTC TTGCTTGTTA ATATTACAAC GGTGAAGTTC TTAAAGTATT AGCAGCGTGC TTGAAAATGC TCTCCGTAAG GTGAAGCTAA AGTTTAGCA AAGAATTGGA AGGGCGAACT AAGAAATCAC CGAAAAACTC
TTATGTACAT
GTGTCCGAAT
TGCCAGGTTC
CTGTCTTGTC
TCGTCTT'AA
CAGCCATTGC
TTATATCAGG
CATCTTATGG
ATATCT'rCAA TT'CCTCAAC-r
AAGAAGAAGC
GGAAATGGAA
CTTAA'rCAAA
GTAGGATATG
AATGGGAAGT
AGTCAAGCAC
TGTCCTTCTC
CAATGACTCG
AGTAGATCCC
ATCGACTGAA
'rAGACCTGAA TGTA'TrTCA 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100
AAACTGAGAG
T'rAAA.ACAAG
AAAGAAGCAG
TATTACGATG
GGTATGGCTA
ATTACCTTAC
TTATCGTTTC
CTTTA-AI-GA
TCAAGGTAGC
CTATCATTGA
TGCCCCACGC
AAAATCCTGT
GAGTATGGGC CTTACTATAT CTTGATGCCA GCAGGTGTGC AAAGTGATGC CTTTTCATTG GATGGGAAAG AGGTATCTGT AGTGTAGCCA TTCCACAAAT CAGGCTTGCC AGACTAAAGA TATCTCGAAG GATTGGATTT ACCTAAT'rTA CAAGTTGCAT TTC'rGTTGGT CAGGAAGTAG AAGTGAACTG GCTGAAGTCT AAAATGTGCT GATGQTrGGTG
TTTGTTGGCA
TATTGCCCTA
ACATGTCTTG
GGAAAGTTAG
TAGACCATTC
ATATTATCGA
TGCGTAGCCT
CTAGCAGCAA CAAGTTCAAA AATTCACACA TTTGAATTAG AAGATTCTAT TGCACGTTTA
GCTATGATTG
AAAGAGGAA'r
AGACTTGCAA
AGCTGGAACT
TTTCCCAGAT
AAGAATC'rAA
AAAGAAATGA
GGAGCGNI-rA
GGATAGCCCT
CAAAAAGAA'r
AAGCAGCTGT
GTTTGCTTGC TTCAAGTTGG AAGATTATTG TGGCAGACAC GAACAGTTGC TAAAAATAAT GCGGTTCGTG GAGCAGACTG
GATGACTTPGT
GACTGAACGA
ACAAGCTCAG
TGCTCrcTT 383 ATCTGTTGTG CAACCATCCC TACTATGGAA GCMGCTCTAA AGCTATCAA GGAGAACGAG GCGAAATCCA GATCGAGCTT TATGG.CGATT GGACTTT'rGA CTTGGCTAG ATGCAGGTAT CTCACAAGCT AT'rTATCACC AATCTCGTGA GCTGGTGAAA CTTGGGGTGA AAAAGACCTT AATAAGGTTA AAAAACTCAT TGACA'rGGGC TTCCGTGTAT CTGTAACAGG TGGTCTAGAT GTAGATACTC TCAAACTCTT TGAAGGTATT GATGTCTTTA CCTTTATCGC AGGTCGTCGA AGCAGGAGCA GCGCGTGCCr 'rCAAGGA'rGA AATCAAACGA a a a a a a a. *a a a a ACGTCCALATT GGAATTTATG AAAAGGCAAC AAATTTTGCC AAGGAGTrAG GCTTTGATTT GCGTTrACCA AGACTTGACT GGAGTAAGGA TGAAACTGGT GTTCGTATTC CTTCTA'rCTG TTCAAAAGAT CCAGTTCTAG AGGAAAAATC AGCTCAAGAC TTGGGAGT'rC GTACGATTCA AAAGTCACCC CAGACACGCC AACGTTTTAT TGAAGAAGCT CAGGTGGTAC TTGCTATTGA CGAAAAATAT TTGGCTATAG AAAAAGAGAT TATTGGTAAT GTGTCTGCAT GGCATAATGA TGCCATCGCA GCTCTCCA'rC TCAAGGATAC GT'rCCGAGAT GTACCTTTCG GGCAAGG'rTG AAAGGAAACC AA'IrATAATG GACCTTTCCT AGTAGAAGAA ACACGCGCAG CCATTCAAGA GAAAGCAGGT TTGATGTAAG ATGAATCAAG ATGCCAATCA ATCATTGCCA AAACATGGAC AAGTTAATCG CGAACTCGGT GTCATTGI'A TGACACCrGA AAACATGGTA GTGACTGATC GACCATCTTC CGACCTCCCA ACTCATGTGC GTGTGGTrTCA CACCCATTCG ACAGAAGCTG CTTTCTACGG AACAACCCAT GCAGATTATT
CCCAACACAC
TGTCGAGATG
AGAACGcTTG
TTTTTCAGGC
TCTAGAACTC
ATTAGCTGGT
CAAAAATTTG
AATTATGGAT
TGACTCTCCC
TATCTATAGT
TTATGCAGTG
TGTCAAATGG
AATCGAAATG
GGCGCAAGCT
TAATCAATGC
TTGTCAAAT1'
TCAAACCATC
TAGATGGTAA
AATTATATAA
TTGGTTGGGC
TCTACGGTTC
ATTACAGAGG CTGTGGATCC ATTTGGGGGT AAATCATGG'r TG'rACTTGGC TAGAACGTTT TCTATrGACG AACGTGACGA GAAGTTGTCA AAGCAATCrA CATCGTCGCT ACCCA'rTGGG ATGAAAAAAT GTATCGAATT TACGATGTTT ACTATGAGGA AGAAAAGCCT GTGACTGGGC GATCCTrCA TCAGTAGCAT T'rCCTCTTTG TATATCCAGA GAGTTTrATC TTGGTCATCA ACAGAAAGTT CAAAGGGCCA GAAGAAGCTT TCGATATTTT TGGTCTGAAA ATTGTGAAAC TTTCTCTATC CACTCATTAA TATGCGTAAA CGAGTCTGTG TACCTGGGGG AATGTATCTG AGGCGTGGAT TATGACGAAT GATCCTAGAA GGGGATTTAA GACTTGGTCA GAAATTGG;TA TCAGGCAGGT CGTGATATrC AATCCCTrGC GCCCGTAGTT 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 a. .a a a TGACCAAGGA CGAAGTAGAA GTGGCCTATG AAAAAGATAC TGGCCTGGTT ATCGTAGAAG 384 AGTTTGAACA TCGCGGACT'r AACCCGGTTG AAGTACCAGG AATTGTTGTA CGCAATC-ACG GTCCATTCAC CTGGGGCAAA AATCCAGAGA ATGCTG^TA TCACTCTGTC GTACTAGAGG AAGTATCAAA GATGAATCGC TTTACAGAAC AAATCAATCC AAGAGTGGA CCTGCTCCCC AGTACATACI' AGAAAAACAC TACCAACGTA AACATGGACC AAATGCTrAT TATGGTCAAA AG.TAAGAACG ATGAAGGAGG AGAAAAAGAT AAATTTAGCT CCTCTTrTTA CAT'rTGArTTr TTATTGAGAG TAAAGTTGGA GTTGAAGTAA TTTAAAAGA TTrTTTTAGAA ATAGCGCTTG ATATATATAT GGTAAAA.TAA AAAGAATTGC TGTGATATCA ATAGA'NTrGG GGGATTT=r AATATGGTAC TGGATAAGGC AAGTTGTGAT TTGCTrCAAT ATTrTGATGGA TCAAGAAACG TCCAAAACGA TTATGGCGAT TTCGAAAGAT TTGAAAGAGT CAAGAAGGAA AAT'rrATTAT CACAT'rGACA AAATCAATGC TGCTCTGGGT GACGAGGCGC T'rCACATCAT TAGTATTCCA CGAAT'rGGTA TTCACTTAAC GGAAGAGCAG AGAGATGCTT GTTGTAAACT ATTATCGGAA GTAGATTCGT ACGATT1ATAT CATGAGTGCG CATGAACGTA TGATGATAAT GTTACTATGG; ATAGGTATTT CTAAAGAACG TATTACGATT GAAAAAT TGA TAGAGTTAAC AGAGGTATCT AGGAATACTG TTCTCAATGA TTTGAATAGT.ATTCGTTATC AACTAACTTT GGAACAATAT 09 9 S *09 4 9. OG 09.
9 CAGGTGATCT TGCAAGTGAG AAAATTCAGT ATCTTCAATC GTATCTATTr TAGAAGA'rAA GAAATGAACC AATTTTT'rAA ATAAACCATC ATGAAATAAC CATAATGTTG AACAGTATCA AGAAAAAGAA TAGAGTATCA GAAATTTCTT TGTCAGGACT CAAGTCACAG GGATACAACC TTCATGCCCA CCCTCTTAAT GCTTCTATAT CATATTTTTA TGGAAGAAAA TGCCACTTr'r GATGAAAGAG AGGTTAGATG ATGAGTGT?-T GCTTTCTGTT GGAACAGGTT. CCTTTAGTTG AACAAGATTT AGGGAAGAAA TTTTATGTTG CAGGTTCTAC CTTATTTrGCT GTTAAGCTGT AGAAAGACAT CAGGATATAG AGAAAGAATT TTCTTTGATA GGTGTCTAAG AAATTAGGAG AACGGTTGTT TCAAAAGT'TT TGAAGTTTCT CTTGTAGCTG TTCTCCTCCT CTCCTATCGT 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 L7280 17340 17400 17460 17520 17580 17640 AAAGATTTGG ATATTCATGC AGAAAGTGAT GATTT'rCGGC AA'rTAAAACT TGCTTTAGA GAATTTATCT GGTA'N'TTGA ATCACAAATC CGAATGGAGA TTGAGAACAA GGATGATTTG TTACGAAAT'r TGATGATCCA CTGTAAAGCC TTGTTATTTA GAAAGACTTA CGGTATTTT TCTAAAAATC CTCTAACAAA ACAAATTCGA TCCAAGTATG GAGAATTATT TTTAGTCACT AGAAAATCTG CGGAAATTTT AGAAGGAGCA TGGTTTATTC GGCTAACAGA CGATGATATT GCCTATTTGA CGATTCATAT TGGAGGATTT TTAAAATATA CACCATCATC TCAAAAAAAT ATGAAAAAAG TTTATCTCGT TTGTGATGAA GGTGTTGCGG TI'CGAGACT TTTGCTGAAA CAATGCAAAC TT'rATTTTCC AAATGAGCAA ATTGACACTG TATTTACAAC AGAACAATTT 385 AAGAGTGTGG AAGATATTGC AGCAGATTTC CGATTmAAG
CTAGACTATC
TCTAGTCTTA
GTTCAAACAC
TTAAACACAA
TTTCGTCTTA
TTATAAATCA
ACAGTCCAAT GATGAACACA TGGkATACTA ATCTCAAAGA AAAGAAATTA CAAGAGAGTC AACGAAGAAA TCGAAGAAGA AACTGTGGTA CTTGGATTAA AACCGTGGAA AATCAACCAA ATGGCAGGTG TTCGTAAGCT TTTGCTATCA ACGAACTAGA CCATACACAG CTGCAGCAAT ACAAGTTGAT GTAGTGATTA CTACTAATGA TGATTTGGAT CG=.AATCCT ATCCTrGAAG CAGAAGATAT TTTGAAAATG TATATTTCGT AATAAGAGCA AAAGrrTCAG TGAAAATCTT TATTGTAGAC AGCAAGTTGG CTAGTAAGTI' CCAAGAAGAG AGAAATAGTA GTCAAGCTT TTTTGGAAGr TATrTGAAGG AACCTGTGTk TTTCsTGGTC TrTTtTAGTG TTTTGAAGGG TAACAATTAT ATCCAAAGGA GGCAACATAT GCCAAACGTC ATGGAT'rTTA GCCACT'rTCC CAGAGTGrGG AACATGGTTG AGTCGTACCT GAAGGCAACr TTGCCATGTG GTGGCTAGGC GACACCAGOT GGTGCTAACG T'rGTCATGGA CCTTTGGTCA AAAAGTGAAA GATATGGTTC GTGGGCACCA AATGGCAAAT GCAACCAA-AC TTGCGTGTTC AGCCAATGGT TATCGATCCA CTATTACTTA GTTTCACACT TCCACAGTGA TCATATCGAC TCTCAATAAT CCTAAGTrAG AGCATGTTAA GTTGG 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18475.
INFORMATION FOR SEQ ID NO: 39:
SEQUENCE CHARACTERISTICS:
LENGTH: 7186 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CCAGGATTTG GTACCGTTGC AAGTGGTGTG CCTTTCCTCC TAAAGGAAAA TGGAGGAAAA ATCAATCAAT CAGCACATTC AGATATCAAA GTTGCTAAGG TATTGGTCAA GGATGAAGAT GAAAAAAATC GCTTGCTTGC AGCAGGGAAT GACTTTAACT 'N'GTAACCAA TGTGGATGAT ATTTTATCAG ACCAGGATAT TACI'ATCGTA GTGGAATTGA TGGGGCGTAT TGAGCCTGCT AAAACCTTTA TCACTCGTGC CTTGGAAGCT GGAAAACACG TTGTTACTGC TAACAAGGAC CTTTTAGCTG TCCATGGCGC AGAAT'rGCTA GAAATCGCTC AAGCTAACAA GGTAGCACTT TACTACGAAG CAGCAGTTGC TGGTGGGATT CCAATTCTTC GTACTTTAGC AAAT'rCCTTG GCTTCTGATA AAATTACGCG CGTGCTTGGA GTAGTCAACG GAACTTCCAA CTTCATGGTG ACCAAGATGG TGGAAGAAGG CTGGTCTTAC GATGATGCTC TTGCGGAAGC ACAACGTCTA GGATTTGCAG AAAGCGATCC GTTAT1'GA GCCAATTTGC GGAATCCGCA ATATCACACC AAATTGGTTG GrrCTATTGA TTCCTACCTA AAGCGCACCC GAATCTATCG GTATTGGTGA GCAACAAGTG TTGTAGC1'GA GGCAAAGACT TCAACCAATA GACGAA'rGAC GTAGATGCGA TTGATGCAGC CTACAAGATG CTTTGGCATG AAGATTGCCT TTGATGATGT AGCCCACAAG AGAAGACGTA GCTGTAGCTC AAGAGCTrGG TTACGTACTG GGAAACTTCT TCAGGTAT'rG CTGCAGAAGr GACTCCAACC ACTTGCrfAGT GTGAATGGCG-=XGAACGC TGTCTTTGTA GTCTArGTAC TACGGACCAG GTGCGGGTCA AAAACCAACT TATTGTCCGT ATCGTTCGTC GTTTGAATGA TGGTACTATT TAGCCGTGAC TTGGTCTTGG CAAATCCTGA AGATGTCAAA
GCAAACTACT
GAAATCTTCA
GACAAGGCC
TCAGCTGAAT
GAATAAGATG
ATTTCTCAAT
ATGCTCAAGA
GTGTCGTTAT
CTTGGCTCTA
TATT'rCCTTT
CATCACACAC
TGAAGAAGGT TTCAGAATTC AAGATTATTG TACCTGCAAC GACTCAAAAG GTCAGGTCTT GAAGTTGGCT AAGCAAATCC TTCAAGATGG CAAAGAGGGT AAGATTAATA AAGCCCAGCT TGAA.AATGTC GACCTCT'rGA ATACCT'rCAA GGTGCTAGGA CAGTGCCAAT ATCGGGCCAG GTTTTGACTC AATTGAGGTC TGCGAAGAAC GAGATGAGTG TCCACATGAC GAGCGTAATC TCTTGCTCAA ACCAAGACGC TTGAAAATGA CCAGTGATGT CTCGGTTATC GTTGCTGGGA TTGAACTAGC CCATGAAAAA TTGCAG'rTAG* CGACCAAGAT GGTCGGTGTA GCTGTAACCA AGTATCTTCA GCTGATTGAA CACCAGATTGCGCAAATGGAT AATCGCCTTG CAAATTGTAC CAGACTTGCA CCCTfTGGCG CGCGGTTTCG GTTCTTCCAG CAACCAACrC GGTCAACTCA ACTTATCAGA 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 TGAAGGGCAT CCTGACAATG TGGCTCCAGC CATTTATGGT AATCTCGTTA TTGCAAGTTC TGTTGAAGGG CAAGTCTCTG CTATCGTAGC AGACTTTCCA GAGTGTGATT TTCTAGCTTA CATTrCCAAAC TATGAATTAC GTACTCGCGA CAGCCGTAGT GTCTTGCCTA AAAAATTGTC TTATAAGGAA GCTGTTGCTG CAAGTTCTAT CGCCAATGTA GCGGT'rGCTG CCTTGTTGGC AGGAGACATG GTGACCGCTG GGCAAGCAAT CGAGGGAGAC TCAGGACTTG GTAAGAGAAT TTGCGATGAT TAAGCAAGTG TGCAACCTAC CTTTCTGGTG CTGGGCCGAC AGTTATGGTTr GCCAACAATT AAGGCAGAAT TGGAAAAGCA ACCTTI'CAAA AGTTGATACC CAAGGTGTCC GTGTAGAAGC AAAATAAAGA ACTCTTGACC AGAGGGCTTC ATATCCTTTT TGTGAAAAGA CAAAGAGCAA ACTAGGAAGC TAGCCGCAGG CTGCTCAAAA TAGAACTGAC GAAGTCAGCT CAAGACACTG TTTTGAGGTT CTCTTCC-ATG AGCGCTATCG ACCAAAGAAA ATGGGGCCTA CTGGCTTCTC ATGACAAGAT GGAAAACTGC ATGACTTGAG ATAGAAGATA GGATGGGGAA AGTTTATACT CAATGAAAAT CAGTGTTTTG AGGTTGC-AGA GCAGATAGAA CTGACGAAGT 387 ACGTGGTTTG CAGTAACCAT ACTACGGTAA GGTGACGCTG AAGAGAT'TTT CGAAGAG'rAT I TAGTTAAAAA CGTGATAAAG GAGAAATAAA GATGGCAGAA ATTTATCTAG TTTTGGGGC CTAGAGGAAT CTACGCTAAT GGTCAAGTCG AGAAACGGTC CAAGTGATTT T'rATTTCCGA GTTATCGATC ATATCGAACT GGGATTTATT GCAGGAGCAG GAACGCATGC CTACATTCTG GCTGAAGACT TCATATCGAT GTGACCGATG TAGTCAAGAG GTGT'rGAAGG TGCTACAGAG GCTCCATTTA AGATATTACG ACAGGTGAGC TTGGCCAAGT TTTAGCCGTC CCATGGAATG GAGCGAATTG TTTCACAGAT GGACCGCGGG ACGCTTTGTG GCCAAGGATG AAACAAATAA AACAGAGAGT
ATTTTTCACG
AAACGACCAA
ACGATGAGAA
CTCTATCTAT
ATCAGGATGA
TGGGTCGAAA
ACCACCAAGA
CTGATAAGCC
CCAGTCTATC
CCAATGCCTA
CATTTCTGGA
TTACC2AGTI'G
GGAAGTGTCA
CAATCAACAA
AGCAGATTTG
GATTGCAG'rA
CTATCTCAGG
ATTGATTGAT
TGAAGAGTCT
TGACCAAACC
GTGCTAGAAA
CTCAAGGAAA
CTCAGAGAGA
CAGGTGGT'rG
CCAGTGTT-GG
CAGACCATGC
TTTTACTTTA
GGGAA'rGACC GTGGTCGCCA CCAGCTATCT ACACAGTCGT GAAGTGGAGC AAT'rACGCCA AAGAATCCTr CAGGTTACTG GCAGCAAACT ATGAAAAGCC TATCGTGTCA CACALAGAAGC TTTGAAGAGG GGATT'IATGT AAGT'rTGCTT CAGGTTGTGG CATTATI'ACA AGGATCTGAG AGTGCTCACT TGGGTCATGT 'rG'ATCAATT CTGCTTCrT GGCTA'rCTAT TGCCTTACTT TTTCTAGAAT ATGAATAGAA CACTCTTTTT TGCCAAGGA'r CGAT'rTCCAA AGAGTTGATT AAGTTCGTTC TCGTTCAGGC AGTTAGGCGG CCTCCGTTAC AGATGGAAAA AGCAGGATAT GGGGCTTCCC ACTTTCTTCA 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080
GGGATTTATG
CCCCTTGTTC
GATTGTTGAC
GCTCCTTATC
AAAGGCAGCA
CTTGCCCAAG
GGATACCTAC
AAACACCTAT TATCTTACTT CAAACCCTAC ATCAAGGAAT CAATTTTAGC AAGCTGTTAG AAGCTGTTTT TGAGCTCTTG GTTCCCATGG TGATTGCTGG CAATCTTTAC CTCAGGGAGA TCAAGGTCAT CTCTGGATGC AGATTGGCCT TTTGCAGTAA TITGGCGTT1'? AGTGGCCTTG ATAGCTCAAT TTTACTCAGC GTAGGTTCTG CTAAGGAATT GACAAACGAT CTTTATCGTC ATATTCTTTC GACAGCAGAG ACCGTCTGAC AACTTCTAGT TTGGTCACTC GCTTGACTTC CAGATTCAGA CTGGTATCAA TCAATTCCTG CGTCTCTTTT TACGAGCGCC CATTATCGTT T'NTGGTGCCA TTTTT'ATGGC TTATCGAATC TCAGCTGAGT TGACTTTCTG GTTCTTAGTC TTGGTTGCCA T'rTTGACCAT TGTCATTGTA GGGTTATCTC GATTGGTCAA TCCTTTCTAC AGTAGTCTCA GAAAGAAAAC GGACCAACTG GTTCAGGAAA CGCGCCAGCA ATTGCAAGGG ATGCGGGTTA TTCGTGCTTT TGGTCAAGAA AAACGAGAGT TACAGATTTT 388 ATGCTAGATT ACAAGAAAAG TCAAACCCI' AACCAAGT ATTAACACCT CTGACCTATC CTATATTTCA ATTCAAGGAG CCTCTTACAG ATTTTGGTGG G'rCCTATATC TCAGTCAAGC TTCAGAGTTA GAACAAAAGC CTTTACCTAT CCTGATGCGG AGGACAAATT CTAGGTATCA CTTACTTGGA CTTTATCCAG TCCTCTTAAT TTGGAGCAGT CTTTAAAGGA ACCATTCGTT GGAACTCTGG CAGGCCTTGG ACTCTTGGAT GCTCTAGq'TG ATTGTCTATC GCCCGAGCAG CTCGGCACTG GATACCATTA AAACACGAGC TTAATTTTGA TCTCCTCTTG GAAAAAGGTG CAGCCAAGTC TATTGTGAAA GACAAACTGT AAACCAGACG TCCTTTTCCT AGCCTTTCTA
TTCTGATTGG-GCAGGTCATT
AGATTT'rTCT CCAGATGCTC CTCTCCTCTA TAATCGTCTA ATAAGCTCCA TCGTTTACCG GTCGTGTAAC CACGGACATC TTTCATrGG TGTrMTATG TCATGACTCT CTTAGTCTTG CCAAGAAATC CTATCATCTC TGATTGAAGA ATCGCTTAGT T'rATCCAAAG ATTGCGTGAG
TGATTGTCAA
GAGTGC'rCAG
AATTGGTCAA
GAATCGAGGA
AAGCTACCAG
CCCAGCCT'rC
TCGGGGGAAC
TAGACAAGGG
GGCGGTCTTG
CCAACTTGAC
TGGAACTCTT
TCAAGGTGCT
GCTAGCCATG
AGTCTTTGTT GAGGCTCCAG AGGATATCCA AGATAAGGTT TTACAAGTCC AAGAATTGAC TCTGAGATAC ATTTCCTTTG ATATGACTCA TGGTTCTGGT AAPATCAAGCT TGGTGCAACT GAACATTGAC C=TATCAAA ATGGACGTAG GATTGCCTAT GTACCTCAAA TCTAGGTTTC AATCAAGAAG
AGATTGCGCA
AGGCAGGGGG
TCTTGCGCCA
CAGAGTCCAA
TCTCTCAACG
AGTTGCTAGC
TCAATGCATC
CTCAAACGTT
GGAACTATTG
GACCAAGTCC
TTGGTGGTAA
ATCTTCTCTT
ATTGCCTTTG
GAACAGTTGG
ATTTTGGTCA
C'rGTTGACGC
TTCCAGAAGC
CAGCAGACTA
GCTCATGACA
AGCTAAGGAT
GCGAAATTTC
GGCTCCGTTT
GCTCTTGAAA
AACCTCAACT
TGTTGGCAAG
CCAACATGGA
TAGCCGTAGA
CCCAAGTTGG
TAGTGGCTGG
TAGGAAATAC
7TrTGTCAGTG
TCAGGTGGAC
CTCATCCTAG
GCTATTAGAG
TTACAGATGG
CACGATGACT
AAGGAGGACT
TTTAGCAAGC
CTTATCAATT
ACAGGTTTCT GGTCTAGTTT CTCGTTATTA TCTGGCAAGC CTCATTGCTC TTATCAATTA TTGATCAATT CCCTCAACCA
AGGTCGAACT
TATCTGACCA
AAAAGGAAGG
AAAAACAAAG
ATGATGCAAC
AAAATTrCC
CGGACCAGAT
TGATGAAATC
AGAATGAAAC
CATCCTTTCC
TACCTACCTA
4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 TTCATCACCA GT'rTTTTGGC TCTGGTACAA TGGGCCAATC ATACCAGAGA TTTACGGGAG CGAATCATCC TAGATAGGCA AGGTAGTGGA GAGATGGTTA
CAGCTGGCTT
GTATTCTAGC
CACTGTCCAT
AAACAGAGAC
TAATCCAGTC
GACCATGATT TTTAACCAAT CATGCTCCAA ATTCATCTCC GGTGATTTCA CGCTTTATTG GAGGGGAATT CAGACTCAGT CTTCAATGCT CAAACAGAAT ACTACTCAGG CTATTCTCAG TCAGCCATCT
TTTATTCTTC AACGGTCAAT CCTTCGACTC GCTTTGTAAA TGCACTCATr TATGCCCTTT 5940
TAGCTGGAGT AGGAGCTTAT CGTATCATGA TGGGTTCAGC CTTGACCGTC GGTCGTTTAG 6000
TGACTTrT GAACTATGTr CAGCAATACA CCALAGCCC'rT TAACGATA'rr TCTTCAGTGC 6060
TAGCTGAGTT GCAAAGTGCT CTGGCTTGCG TAGAGCGTAT CTATGGAGTC 'N'AGATAGCC 6120
CTGAAGrGGC TGAAACAGGT AAGGAAGTCT TGACGACCAG TGACCAAGTr AAGGGAGCTA 6180
TTTCCTTTAA ACATGTCTCT TTTGGCTACC ATCCTGAAAA AA'ITTrGATr AAGGACTrGT 6240
CTATCGATAT TCCAGCTGGT AGTAAGGTAG CCATCG1-rGG TCCGACAGGT GCTGGAAAAT 6300
CAACTCTTAT CAATCTCCTT ATGCGTT'TTT ATCCCATTAG CTCGGGAGAT ATCTTGCTGG 6360
ATGGGCAATC CATTTA'rGAT TATACACGAG TATCATTGAG ACAGCAGTTT GGTATGGTGC 6420
TTCAAGAAAC CTGGCTCACA CAAGGGACCA TTCATGATAA TATTGCCTrT GGCAATCCTG 6480
AAGCCAGTCG AGAGCAAGTA ATTGCTGCTG CCAAAGCAGC TAATGCAGAC TNTrTTCATCC 6540
AACAGTTGCC ACAGGGATAC GATACCAAGT TGGAAAATGC TGGAGAATCT CTCTCTGTCG 6600
GCCAAGCTCA GCTCTrGACC ATAGCCCGAG TCTrTCTGGC TAT'TCCAAAG ATTCTTATCT 6660
TAGACGAGGC AACTTCTTCC ATTGATACAC GGACAGAAGT GCTGGTACAG GATGCCTTTG 6720
CAAAACTCAT GAAGGGCCGC ACAAGTTTCA TCATrGCTCA CCGTT'rGTCA ACCATTCAGG 6780
ATGCGGATTT AATTCTGTC TTAGTAGATG GTGATATITGT TGAATATGGT AACCATCAAG 6840
AACTCATGGA 'rAGAAAGGGT AAGTATTACC AAATGCAAAA AGCTGCGGCT TTTAGTTCTG 6900
AATAAGCCAT TCTCTTTrGA AAGTTTA'rGG ACGAAAAAAG TTGCCTTCGA GTGACTTTTT 6960
TGTTACAATA GCTAGAAAAA TTGTTCACTG TAATACTCAA 'rGAAAATCAA AGAGCAAACT 7020
AGGAAGCTAG CCGTAGGTTG CTCAAAGCAC AGCTTTGAGG ?TGTAGATAA GACTGACGAA 7080
GTCAGTTCAA AACACTGTTT TGAGGT'rGCA GATAGAACTG ACGA6AGTCAG CTCAAAACAC 7140
TGTTTTGAGG TTGCAGATAG AACTGACGAA GTCAGCTCAA AACAGG 7186
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 14273 base pairs
(B) TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CTGAAAATTC TAAAAAATTT ATAAGTAAGG AAT TAATTAG TTATTTTTGT GATAAAGTTT ATCATGAAAT ATTTGTTGAA TACTTCTTAT
TTTACCAGCT ATGATTTAGT TACAGCAACT TGAACTCTAC TATGT1GGGAC TGGGACATAT ATrrTTAT T TAGATTCAAC ATGTTCAATG AAAATATTGT CCT'rGAAGAG ATGGAAAATT CATCAATCGA GTCGTTGTTG CTGTTGATAA CTGGATGTTT TAAATTATGC GATAATACAA TGATTAATAA TTGAATGGGA ACTTCATT'rC CTTGATGAAT TTAAAAATCA AAAGCAAATT CAAATGAGAA GTGGCGTCAT TATCTGAATT GTTAATGGTC CCGCATTTTC 390 GAGGTAGTTC CGCACG71r TCTGCCATAT GAATCTGACT ACGGCAAATG TGATTGGCAA AATTGCTAAT GGTATTGCTG GTTAAACT TTAATAAAAA AATAATTTTT TGTCCCAATA AATCACATAG TCAAAGAAA TGTATCAAMT CTAAAGGAGT GAGTCTAAAA AAACATATGA GGTAGGATTG CG'rAAAGCAA 'rTACAACCAC AGTCG'N'AGT GGACATTAAA AACTACTrGAG TGTAGATATG CTGTTGGAAG GGACTGTTAA TT'rAAACTTA AGGCAGTAGC CTTTTATTAT AAAAATACAA CAAGTGTTT GCC'N'GTGTA GTCTATGAT'r AATTCCTGTA ACATTCAATT GATACTTGAA TTCTTAGTAG AGATGTGGCT AAAAAATTTT T'rAGAA.ACT TTATATAAI'C AAAAGAACTT ATCAAATTAG AATATTAATG AGGGGAAAAA AATTGACACA AGAAGCAATG 'TCGCAATATA TGAAAGGTTA ATACAAATGG ATAAAGTAAG TGGAATCATT ACAGAATTTT TTGGCrGCT GGAAACTGTA ACCAACTTTT TTATGCCGTTI ATAAAATTGA TGGAGTTGAT TCCCACCAAC TCAAATrrCT TGATTAAAAA ACAATATAAA CAAAAGAATC TGTTGGAATA GATT'rGGAAT AAATATTTGC CACGACTGCA TTTTCA'TTTT TATTATTTAA GTTACAAGAA GCGTTTAT GAAATTATTT AAATCTATAA AAAGTACGAT CAATTACTTC ATTTTCTGCC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 G'rTGATATTA A==T'TGGA ACATCTTCAA AGAG'rAACGG AGCCAGATGA ACTTAATAAT AGTGATAATT GGACTAGTAA TTTACAAAAT GGAGAAAAAG ATGATAAATT TATI'GTTAAG ATTAATAAAC TTAGAGAGGT TAGAGAAATA ATGGAAAATC GTTTGAAAGA ATTTTrCTA ATTGATAGTA CTGTGACTAC AATAATAGAA TATATGATTA CTAGCCCAGT TAATGTTAGT GACACATCAA TATACACCAA TACATTATGG
CAATTT'TCCC
CGTATCAGTC
CCTITTAATTA
TTAGATAAAA
AATCTTGGAG
TCACTTATGG
CCAGGTAGTG
T.CAAGTAGTA
AAGTATAAAA
CCTGTAGGAG
GAGGTATATT
AAGAGCGTTA TAGTACTATT TGAAAGT'rTC GCTTCCAAGG CTTATATAAA AAGCAATCAT TGTGGACAAA AAACTTGACA GAGGTGAATG AGCAGCAGGT TATCACGTTA TTTGTTCCGC ATATTATTAC AACGTTAAAT CTCTATGGTA ATACCTGTTA TGAACATGAC TTAACTCAAA ATAGCAGTAT AGTGTTTTTT TCAAATTTAC ATCGTAATGA TTTTATGCGG TATTAGATAA AGTTAAGGAA TATTTAGGAA ATAAALACTAC TCAAATTCTG GATAATCAAT ATAAAGAATT TTTGAAACTr AATGATATA.A CGCGAGCGTT TCGTATTTCA AATTTAATGA .=NAATTAAT ACGAATGGAG AAAAAAATGC CAGATGATIT TCTTAACACG GATATrATC GAAAAGTTCG CTTTAACAGC TAATGGTATG TCTTCAAGAA GAAGATTGAT TAAATAATCT TGCTAAATCA AAGATAAATT T'rATAA'rCTA AAACTGATTT CAAGATAGTA TTATTTTAGA TATTTCATA'r TTAAATTTGC GAATGTAATT GATTAGAATT GACAAATGGG TGAAAAATTT TAT'rGAAGAG GCCTCTATGA ATACTGTTTG CTGATTTAGT -TATGAAATTT AAAATTCTGA TATTGAAATC TAGGTGAAGG TAAAAAAGAA AGCTTGGAAT CACATTGTAT ATTTTGGAAT TATTGGGACA GGTTAAGTTA 'rTATTTGATG CTACTGATGA GGTTAATAGA TAACTTAGCA GACAGTGTAT TTCGAGCATG TTCT'rAGGAA CTTTTTTGGA CCAGTTAI'G GGTTCAATTA GCAGTGGCTG GATAATGAGT CTAGTGTTTA TGTGTTGATT CCTCAAGTGG TATT1'CGTAT AAAGTA'rTAG
TTCATGATGG
GAATATTTTG
CAAGGTTCTA
ATTTTTGTGG
GGAATAATAG
GAATTCAATA
CTTGATAATT
ACGAGTAATT
ATCCATGAAG
TATGAAATGT
GCTAGAAATA
GAAAGATATA
AAA'rTTTTAT AACCCAATCG AGCCGATTTA ACATTACAAA TTAAGTATAA AAATAATTCA ATAAAAGTCG ATTT'rAAATT AGTAGAATTT T1'AGAGAAAT TACGATCTTT GATAAAATTA GATCAAATGG AAGTGTTTAT TCCTAATCAT TTGAGAAAGT AATTTAGAAA TTCTCACGGA GCTAATCTAA CTTTAACTTT AAAAAATGAT TGGAACTATT 391 GAAAAAGTAT TAAACAATTC TTTTAATTTT- ACGAGTAAAG AACGAAAAT ATTTATTCGA ATATGCATGT AGAATTAGAG TT1TAATCATT CTTATCGTTT TCTATGCTCA CCTATAATTA AAGACATTGA GAAGTAGCCA AATTGAATAT AAATATGAGC ATAGGCGATA GAGCGGTTGA TGGCTTTGTT TCCTTCAATA TCTGCTATTA AACTATGTCT TGAGATATTA AACTCTATTT TTATTATAT'r CAACCGGATA TTATGAAACA AGATTTTTAT GGTATTAGTT GCTATGAGGT AAGTAATTGT GAATTGGATA 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120- 3180 3240 3300 3360 3420 3480 3540 3600 T'rrATCCTGA
AGGGAGTACC
TTTTTCAATG
GTTTTGGGTT
TATTTAAGAT
TAAAATCTTT
TATAAAAGAC TTG'rTCATGG TTTTGTATTT TTAGATTTAA GTTAAACTTC TTTTACAAAC TCGGAATCTA ACAGTAGAGT TGCCAGGT GTTTATAAAG TTCAAATGAA TATTTAAAAA TGAAAAATTT GATAAAGTTG CTAATAATTA GATTGATTGT TTTATATAGT AGCA'rTGTGG CACGTTAGCA ATAATTATTC 'rATTTATTGC
ACAGAGTAAA
TAATAT~TT
TTTCAGTAAT
TAGAATATGA
AGTAAATTAT
TCCGCAAAAA
ATTATTATTIA
GGCTAGCTCC
TA.AGATTGTA
CTACCGGAT'r TGTTACTAAT ATTCTTATAA TA'rCAATTTT AACCAAATAT CATTTTGGGT ATAAGTTACG TGATAGAAGA TTTGCAAATT CTCTTTTTAG ATTCTATTTT TAATTCATTC GCATCATTTT TACAGGTGGC 392 AGTAGGATTr ATTTTATTGG TTAAGATAGA TATAGGCATA TTTTI'ACTTG CTCTA'rTTAT ATTTGTTrG 'rTAAAATTTA GAACTAGCAA TGCGAATATA GAAAACTrCT CTcAAATA TTACAAGAGA GAAGTGTTGC AAGGTACAAA GTTTATTTTA AA'rAATAAAT TATTATTTAA AACCAG'rA?1 TCTTTAACGC TTATAAACTT TTTTTATTCA 7rCAGACAG TAGTTGTACC GATT'TrTCT ATTCGA'rATT TTGATGGTCC GATTTTTTAT GGTA~N'TT TAACTrTGC TGGTTTGGGT GGTATATTCG GAAATATGCT AGCGCCAATC GTAATAAAAT ATTTAAAATC GAATCAAATT GTTGGTGTAT TTCT7TTTTT GAACGGCTCA AG'rTCGTTAG TAGCAATTGT TATAAAAGAC TATACTTTAT CACTTATTTT ATTTT'rCGTT TGTTATGT CTAAAGGAGT CTTCAATATT ATTTTTAATT CGTTGTACCA ACAAATACCT CCACATCAAC N'CTTGGTAG GGTAAATACT ACCATTGATT CTATTATTTC TTTTGGAATG CCAATTGGTA GTTTAGT'rGC AGGAACGCTT ATTGA'rTTGA ATATTGAATT AGTGTTAATT GCTATTAGCA TACCT'rATTT TTTG'rrTTCT TATATTTr'rT ATACGGATAA TGGATT'GAA-A GAATTTAGTA TATATTAGAA ATGTTTATGT TCATTCAAAA AGGT'rGTTCT TCTTGGTGGT GCATAATCAC TATA.ACTGAA AAAGAAAAGT GATATC'rTTA GAGATTCGTG AGACAACCCA AGCTTTTGTC GGAAAGATTA CCAATGCTT'r GATGGArAGG ATGTACTTTrA GCAAGATGTT GGATGGACGT GTAATAACCT CTTCTTTCGA GGAGTAT'r'1' GCGTTCCCCA GAAACGGACT TTCATCTGTT CCTTGAAAAA TATGAAGAGT TAGAAATTTA AGTTGATTAG TATTGACCTA GTGAGATTGA AACTT'rGTGG TATCGGACAT CATCAGTTCT ACCTTCTTGG AATCGGTATG TATAAACTCT CCGTCTACTr AATCT'rCA TAGACT?1'CT TTATTGAGTA TACCACTATT TGGGCTCCTA TGTGGTGGAG CAATTTCTAG GAAAATAGAT
TACTCATTGA
AGTCACAGCA
TACGCAGGTG
GGTGGTACAA
AGTATTACAA
ATTAAAAATA
GGAAGTTGCT
CTCTTCAAAG
GTCATCACAG
ATTGTATAPA
ATATTAAGAT
CAGATA:CPAG
AATTGACCGA
CATCATACTT
TTTAGTGGTA ACGGTATCGT ACTAAAAAAC TAGCCTTGGA ATTTGGGGAG AAGATTTTGC ACGATAGTAC TGAACAAAAC GGATCTGGAA ATAGATAAGA TACTGTCrr TCAAATGACG TGAGAAAGGT TCTCAAATTA ACGGAATATT CCTGATAGCG TCCr'rGTAAA TCATAGGGGC 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 GTCCTGCAAC AATTGAAGTC CCCTTTTAGG AGCCTAGCTT
TTACTCCCTC
CTTTTCTGTT
ACAGAAACGGG
TGGCAAGGGA
CTTTCTGAAA
GCTGAAAGAT
TGCTCAAAAC GCCGTCCGCT TCTAGTTTGT TCTTTGATTT CTTTGTCTAT GTGGAGGGAT TATGGTATAA TAGCACTAAT GTCTCATATT ATTGAATTC CAGAGATGCT GGCAAACCAA ATCGCCGCTG GAGAGGTCAT 'rGAACGTCCT GCCAGTGTGG TCAAAGAGTT GGTAGAAAAT GCCATTGACG CGGGCTCTAG TCAGATTATC ATTGAGATTG AGGAAGCTGG TCTCAAGAAG AGGTGGAGTT GGCCCTGCGT TTCGGATTCG GACGCTTGGT TCTTGACTCT GTTAACGGC
GTTCAAATCA
CGCCATGCGA
TTTCGTGGTG
GTGGATGGTG
ATCCCAGCGA
CCTGCCCGTC
GTCAACCGTC
GGGGTGAAGT
AGGATCTCTT
TGTCTCATAT
GCTTGATTAG
CAATCGCAGG
TGAGGAAGTC
TTTCAACACG
CArrGATAT'r 393 CGGATAACGC TCATGGAATT CCCCACCATG CCAGTAAGAT AAAAAATCAA CCAGATCTrCT AAGCCTTGCC TTCTATT'GCG TCTG'N'AGTG CTAGTCATGG AACCAAGTTA GTCGCGCGTG CTAGTCCTGT GGGAACCAAG GTTTGTGTGG TCAAGTATAT GAAGAGCCAG CAAGCGGAGT TGGGCT'rGGC CCATCCTGAG AT'rT'rTTA GGACAGCAGG GACTGGTCAA TTGCGCCAAG TGATGGCAAG GAAATGACGC GATTTACGGT TTGGTCAGTG CCA-AGAAGAT
CCTTGCCTGA
ACCTAGATTr CGAAATTTCA GGTTTTGTGT
ATTATATCAG
TTTTGGATGG
TCCATATCGA
TTTCCAAGGA
AACAAACCTT
AGAAGGTGGA-
AGCCGTCAAG
CCTCTTCATC AATGGCCGTT ATATTAAGAA TTTTGGAAGC AAGCT'rATGG T'rGGACGTTT~ CCCTTATCTA GCGGATGTCA ATGTGCA'rCC AAAAGAACTG ATGACTCTGG TTrCAGAAGC GATTCCAGAT GCCTTGGAAA ATCTTGCCAA GCAAACTATT. CrCCCACTCA AAGAAAATAC ACCTAGTCAA ACTGAAGTAG CTGATTATCA GGCAGGATTT GACCC'rCTTT GCCAAGGAAA CCTTGGACCG TGCATTTTGC AGAGAGAAAG TT1GCTAGCAT CGATAAGGCT AGTTGGAGTT TTTCGGACAA TTTACATCAT AGATCAGCAC GCATTGGCAA TGTTGACCAA CTGCGGATGA TGCCCTGCGT TTCTAGCAGA GTACGGAGAA A.AGAAGAGAT TGAATCAGGC TrTTCTATCAA GAAATACCGA AGGCCAATCA 'PCGTATTGAT GTGACAATCC CTATAACTGT GATTGAAATT GAGAACTCTG GTTGACTCGG GCTAACCGCA CT'rCCTGCTC AATCGTGCTA TCCACTGGCT GTCATTCACA AACTAAGCAA GAGGTGCGGA TATTGCAAAT AGTCTCAAGG ATCGACCGTG CGCAATCGTG GCTCTACTAT GAGA.AAACrG.
GGTAGAATTG ACTGATGAAG ATTGACCAAG CCAGCAAAAC AGACCATCCA GAGTTAGATC AGAAGCATCC AGCTrCCCAG TGCCCAAGGG CGAGATGGAC GTACGAGGAG TACCGTGAAA GCCCTATATC TTTGAATTTC CTTAGAGGAA GTGGGCGTCT ACATCCTATT TGGATGGCAG GCTCCTTTrG ACCAAGGAAG GTCTTGCAAG CGATCTATCA CCTCTA'rCAG CTTTCTCAAT 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 CCTGCTAACT ACGACCAGCT TATGACAAAC TGGAGC6AGA ATGCACGGGA CTTATCTCTT GCTGCTCAGG AACGGGTCA-A
AGCCAGCAGC
CTCAAGGAAA
AATCAATTTA
AACTCCTAGT
GAATGCCTCT
TTCTACGTGA
ATC'rATGAGA TGTGCGACAT GCAGAGCTGG C'rATCATGAT GATCATTCAG CTAGACAACr CCTCACGGAC GTCCTGTTTT GGTGCATTTT ACCAAGTCGG 7140 394 ATATGGAAAA GATGTTCCGA CGTATTCAGG AAAATCACAC CAGTCTCCGT GAGTTGGGGA AATATTAAAA GTATAAAAAA G'rCTGGGAAA AATTTTCAAA ATCAAAAAAA CGCATAAAAT CAGGTGTTCA AAAACCTrGA CCTAATTCI-r TTCGAAACTC TGTT-TTGAGC TAkTTGCC CGACTGGTr CCTAGTTTGC GTT'rCTTTAT TCGTCTAAAA ACCATT'rCAA TAGCTCCTTG CGTTCTATGG CTTGTTCAAT CTATTTAAAC TGATT'rGGGC 'I-NTATGCGT TTTATCATGG AAA'rAGTTAC TTCATTTTTT' ITTTrAAACG ACGTCAGT'rT TATrCAGTAAT CTCAAAACAG AGTI-rTGTCT GTAACATCGA AGrrGTGTTT TACCACTCTG TCT1ATGATTT TCACAGAGCA GTAGAGTCTG TTCTATGCGT TGCACACT~CA GAACCCTTAT TGTATCTGTC GTTAGCACAC GATTCCCTTA GATACCTCGC AGCTCCCAAG CAGATAATTG TGGTATTTCA AAAGCTCCTG
CTTGTATTCC
CCATTTT1TG
TCTTTCTCGT
GT'rATAAATT
TTACCTTCAT
TTCTTTGTTT
CTGGAAATGG
AAATGAAGTG
AAATCACCTT
AAATGATAG
AGTAAGACAC
CA.ATCGCAAC
AATACATTGG
AGTTTATC2'A
GAAACTCGTT
CTCTAATGAC
ATGCAATCAG
TTACATTCTC
CATTAAATCT
AAGTGTTCAT
CTAAATAAAA
TAATTCCATA
ACTGAAGTCC
CAAAGCCTAA
TCTI'TTGAGA
CGCTACAACA
TTATTTTTCC
ACTATCGTAA
rTTTTTCTAAC
CAGATCTTTA
TGCAAGATTG
TTATCTAGTG
ATACCTATTT
TCCATATTTA
GGATTGGCTT
TGTTCAACCT
AGCATTTTTG
GCATCAAGCG
CCAATTCCTC
ATTTTCATAG
CCTGTTAAAC
ATAATAAGAG
TTAAATTGCG ATTTTGCCAA CTAATGTACG AATCAGGTTG TTCCTGCTTT AGTACCAGCT CAAACATAAC AGGAATTTCG TACATACATA ATCATAATGA CA'PCATA'rrT T'rTACTTTT GAACCCAGGC TACCTCTATA CTCCAGATAA TAATTrrGAA TAATATTGTT TGCTACTAAA AAAPGTGACC CA N'CGATTT CTATTTCGAT TGATATTCTA TGTCAGGATT ATTTGTCAG'r CTCCAATATG ATATTC'rCTT TATCCATGCC TTGATCTTGT GTCCCTCCTG TCGCAAGTAA CCTTATCGAA TTGCTGTCCA ATTCGGAGTG GACCCGACAT CAAGATGATG TTCCCCATTT 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160' 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 CT'TTAATTr ATTGATAAGT CCGAACCATT CTCAACAATC GTAAAGAGCC TAAAACATCT CTTCATCCTC TATATTTCCC TATAGCTAAT TGCTTTGAAA TTACCGTATC TAGTAGGCAT CTACCAGCTG ATCATATACT TTTCTATATT CTTGTAATTC ATTGACAGT'r
TTTGATGGTA
ZATTAGTGGAA TGTTGTGTT'r TI'TCGAGAAC CCATCATGAT TCATTATTTC ACAACATAGG AAATCAACAG TTGC'rTCTGT GTGTCCATTT AAAGGAAACA TGTGTCCTGG CCTGCGAAAA ATACGTGCGG TCAGTCCTCT TTCCTCGGCA
TGAATTAAAT
CCACACTCTT
CTTTCTAGGA
TCAGAGGGTG
GAAATACCTG
CATCTGTTCT CATCATTTTG TTAGTCCAGC TAATmTAAT CACrZACCTTT TTTTGCAATT TTATATCTTC AGCTACACAC TGGTCGTTTC TTTATAATCA ATTCAAACTG TAAAAGCAGT CTTATGATTA TCTGTATTGT TrrrCAACCAT AGGTGAAAGC A'N'AATTGAT TAGCTAAACT TTCGCTCATA GGCATACAAA, TTAATCCTN GGCATAAGTA GCCATAAAAT TAACATTTTC 'rGTTGTAGCT GCTTG'rGCAG TTTTC'ICTAT CCTTGTGTC GCTTCTTGTA TNTCGATA ATATAATAGT TCCTTAGATA TTTACCTAAG ATATCATN'T GGT 'rGTTCC AAGGTATGAG GACAGTCAGA CTAATGCCGT TTCTTTTTGT GTGTTGATTT 'rTTTCCTGTA CCATCAATGT TAAGGCTCTT TCTAGATTCA CCATGT'rTCA TTCATTACAT ACAGATACCA TT'rACTGCTA TATAACAAGA ACAACTCGTC 'rTCCATTGAC TGATTATCCT
?ITCTGATTT
CAAGATTTAC
GGATAACAGA
TCGAGAGTTA
TGTACTCCCG
TACTGAAAAG
CAATTGTAAT AGATCCTTTT GATACCATAC AGCATTATCA AACAAATTAA GTCTCCTTCA CCTTCTGCAA TGCTTCTAAT TTCTGCTAAA ATCCATTc TCCATCAGTT TTTGCACATA ACTTGTTTAC TCTTAAGAAT TTTACTr'rGG AGACTTTAGC TCAACTATTA AATCTAAAAT TCTTTTTTTA TTGACGAGAT CCAAGTCGAC CGTTGACAGA AGAGTAAGAG CTGTTCGACT TTGA.AATGAG TAACTGTAAG TCCGTTAATA TTTTTGAGGC C1TTTCAACTT TTCCCATTTC CTATGAGATA GTCATTTCCT rTGGCAAAGA AATACCTTCA
GTCCTGTAAC
CCTCACTTCC
CA.ACTGTAAA
TACTATCGCC
GACGI'GACCC
ATGTTTTAAT
GGATTGATGA
TAAATGGATA
TTTAATTGAT AGTTr'ACAAT TACGAGAGTC TTTCTGTATT TTCAATTATT CC'TGTGAACA TGGATAAATC ACTTCACTTT ATTTGAGAAA ATGCATAAGG TTTCAATCTA ATAGCGTCAT 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 CCTCCGACAG GAAACTTGGC ACTACCTCCA AAAACTTTTG GTGCAATATA TATT'N'CAGC TCATCAACAA TTTGTTGTTC CAAAGCACTC CAATTCATTA GACTGCCCCC TTCTAGAACT 0 0 0000 0 000 00 00 0 0 0 AGGCTATCAA TCTGCATGTT TCCTAGATGT TGCATTAAAC CCTTrTCT TTATGGAAAG TATTTCACA.G CCATGATTTT TTGTCTTCAG AGGAAGTGGC AATGTAAGTT TTAATATCAT GAGGTAAGAG GA=TCCTAA ATGTGTATCG CATATGATAC TCCAATCTAC ATGTCAGCAA AGGATCGTCT TGAATAACAG GCACTAACAT GGTGTCGTAA CTGATGCACA TGCTT'rCTTG TCGATAAGTC TATATGATTG GATATAGCTT CATTTTATTT TTGCTGTTTT TACGATTTTA GGATAGGATT TTTCCCTTCC TArrGACTCC CACCATAATT CTTCTrCTrC AGTAATCCAT ACATTGCATA TTTCATAAAA TTATTAAGTT AAGACACTCA CAAGTATCTT TACTCCTT CTCTTGTAAT ACCACTATCG TTGGATTGAT TTGTTTTAGT ACATAGGGTA CATGCTGGGT N'TTCTAAAA TTCCAACAGT CCAGATACAA TAGGATTACA GGCTATTTTT CCATCCATTG AATATACTTT CTAAAACTT AACTTGAAGA TI'A7MTCCT GTCTAGGCTT CCAATGACTA 396 ATTATAGCAT CTATACAGGG AGGTIGTTTlTC CCGAAGTGAC AACAGGGTTC AAGTGTTACA TAAAGCGTCG CTCCGACAGG GGAT'rCTCTA CACTTTTrAA GAGCATTTCT CTCACATGT GGGCCACCAA AAAACTCATG ATAACCTTGT CCGATAATGT GATTATCTTT TACAATAACT GCGCCGACCA TAGGATTGGG ATTGACGTAA CCAGCCCCTT T7TTGTGCCAG ?T'rATTGCT AATTTCATAT ATTTTGAATC GCTCATCTCG CTACCTCCAA AAAAATATAC CT'rGAATAGG GGACTACTCA AGGCATACAA AAGAAAACI' ATGCGATTAA CAAAAATGCT CTGAAATGAC AAGTAATCAT TTCAGAGCAC GCAAAAAGCA CAAATATACT T IrATCT'rCT 7?rCATCCAGA CTATACTGTC GGCTTrGGAA TTTCACCAAA TCATGCCTTT CGGCTCGTGG GCTATACCAC CGGTAGGGAA TTTCACCCTG CCCTGAAGAT AGTTATTCAA TTACAGATGA TTATAGTACT TAATTTTGAA TATGTCAACA GATAAATACC GATTGTT'rrT
ATCGATTCTC
AAGGAATACT
TGTTCTTGAA
AGGTCAGGTT
TTTGCTTTAT
CTCTGGGATT
GGTTCAAGCC
GAAAACAGCC
CCTTCCTGCC
GGAAGCCATG
TGAAGGAACG
ATAGGAGCAG
CGCCTATCAT
GCTCCTCGGA TAAAGAAAAT ATGATATACT ATGTACGCAT ATTTAAAAGG AATCATTACC ACCAATGGTA TTGGTTATAT CCTCCATGTG AATCAGGAGG CTCAGAT'rTA TGTGCATCAG GGATTTCGCT CAGAGGATGA GAAAAAGCTC GGTCCTGTAT CAGCTCTTGC TATTATCGCT ATTGAAACCA AGAACATCAC CTACTTGACC CAGCAGATGG 'rGCTGGACTT GGAAGGCAAG AAGGTCGCAG TGCAAGCAAG TGCTGAAAAC TTGGCTCTGG GCTACAAGGC AACAGAGCTC ACAGATACAG CTGAGAACTA TATCAAGTCG GATATACTGT ArT'GTGATA AGATAAACGA AATAAGAGAG AAAATTACTG CCAAATACAT GCCAATCCTT ATGCCTArTC GT'rGTGCGTG AGGACGCCCA TTTCTTAGTC TGATTTCGGT GCTGATGACA ATGCTGGCTT AAGT'rCCCTA AAATTGGCAA GTAGTAGTrG CAGGAGATGA CAAGAATTGG AAGAAGC'rAT AAGAAAATCA AGAAATTCTT GCCCTAAAA TGTTGGTCAA 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 AGAATGACAA AACGTTGTTC GTGGGTCAAG ATGACCAACC CGCTCTACAT GATGAGGAGT GGGGCCAGCC CCTCCATGAT GACCAACTAT TGTTTGAGI'r GTTGTGTATG GAAACCTATC AGGCAGGCCT GTCTTGGGAA AGCTT'rCCGA GAAGTCTTTC ATAGCTATCA AATTCACTCA T.GAATTGGAA GCCATGCTGG AGAATCCAGC TATCATTCGA TACACGCGCT AACGCCCAAG CCTTTCTACA GTTACAGGCA CTATCTTTGG TCTTTTGTTG AGGGGAAAAC TGTCGTTAAC AGCGCCAGCT AAAACACCCT TATCTGAGAA ATTAGCCAAA CAAGTTCACA GGCCCAGTCG CCGTATTGTC TTTTCTACAG ACGGTACTCA ACAAACGCCA GTCGCAGAGA TGACTGACAC AATAGAGCCA AGCTTTTTGC GAGTACGGCT CTTTTGATC GATGTTCCTG ATTATCGCCA GATCTCAAAA AACGAGGCTT GCTGCAGGGC TAGTTGATGA 397 CCACGAGAAT GATTGTGAGT GGAAAGGTCT TAAATGATGT CTGATTTTTG CGATTC'rCTA TACAGTCCTC TTTATGTTTG TCTTTAATGC CATCTGCCAT TGCAAATTAT CTTGT1-rATG TCCTTCTTGT TCAAGGATAG ATTGATCCAA CAATGG.AAGG AAAI'TCTTCT TTGGAGTCTT AACAGGATGG CTTTTCTCA GAATTTGTAT CAGAGA'rCTT GAAGCAGTTT GTGGGAC'rAG TCTAA'rATTC AAAGTACCTT TCAAGAACAA CCACTACTGA AT'rGGACCTC TGGTAGAAGA ATrATTTTTC CGTCAGGTCT CGGTTGTCAG GT'rTACTAAG CATTATTCTG GTAGGACTTG CACAGTTTGG CTCTATCAGA GTGGATTGGT GCAGTTGGTT TTTTCTrATTA TTTATGTGAA AGAAAAAGAG AATATCTACT TTAAGCAACA GCCTCTCCTP AA'CA'T'TTrA GCCTA~TCG CTAACAAAAA TAAGGAAATT ATGGCGT'rAA ATTGCTGGCT TAGTTTTAGC TCTATATGGC 
AGATTAGAAA GACTAAAAGA TTCTGATGAC TGTTGTCTTT ATGGACAAGG TCTAAATCAG TAGCTGTTTT TGCTTGTGTC 'rATTGCATTA CT'rGCAGGAA TTTTTGCTCT GACTCA'rATG 12540
ACTTAGGTGG
ATCCCCTACT
'rAGTAAA.ATG
AGGCCTTGCC
TGTTCACATG
AAATGAGAAC
AGGACAAATC GATTTCTAAC AATGTTTTAG TACTGTAATA TGTGATGAAA ATGCCAGTAA CCCAGCTTTA T'rTGTTATAG TCAAAGAGAA GGACCCCAAA GGGTACAAT'r GCTC.TTGGAG TCGGGATATT GCTGTCAATG CATCCATAGG TTTTCCGACC AAGTTAGCAA CATCTTTTTG AGAGGATTTT CCCTTCCCAG CCGTGCTATC TGCGTATCGA CGAGGAT'rTr' CCCClT~CGA
ATATCCAATA
AAGAGTCCAT
AAGTCTTCTG
TCTGACAAGT
GAGTTGTCAA
TGGGCACTTC
AAGTAGAGGT GTACTATTCT AGTTTCAATA TGATACCGAG AAAAAAGCTG AGAAACTTTT TGACTTGTTC CTGTGCATCT ACATGAGCAT TT-GCGTGGCC GACATTCAGA TTATAGACAA GTGCCTCTTT ATAGTCGTCA TGGAAAGTT TGATGACCGC GAATATGCCA GTGTCCTTTA GCTTAGGCTT TTCTTCGCTT GTTTCGAGCA CAGGGAAAAG TTTGTATTTT TGGCAGAG'T AGATATCGTA GAGGGATTCG AGGCAACCAC CTGCAACAA GTCAAAACCT GTATTTGTAT GACTAAAATC ACTTCGTTCC TCATACCAAA CCGTCTCAAT CAATTCTTTA AAGTAGTGAA CACAAATGTC TGCTAAAAAG GATTGACCAT TGAGGTGGTT CATGGTTGTA TCCGAGAAGC GGAGTTGGTC ATTT'rCAAAI. AGATAAGGTA CACATAGGAT CATGTCGATG CTATCATCAG CTTCAGGATG GTCCTTGATA AAGTCTAATC 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 GACTGACACG AGGTGTTCCC AGGGCCGTGG C-GTCACTAGG GCGGATTTCT GAAATTCTTC GGCTATAGGC TAGCATTTCT TTGTCTAATT AAAAAGTCTT GATTCCTAAT TTATGCAACA CAAGAAAAAT TTTTTGCTTG ATAACC'N'TT GCAAGCGATA GGTATCGTCT CCACCGATGG AAAAGGCATG AATCAAATCC TCTGCACGAG
CTT'IT1AACGA ATGGGGCAAA AAGATGGGAT TGGTCCCAGA TCCTTGAGAC GTT 14273
INFORMATION FOR SEQ ID NO: 41:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9828 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GTGAAGTGCG GCAAAAGGTG CAAGTGATGA GCTCAGGTTC TTGGCTCAGG TGGT'rA'CCT GTAAGACAAC GGTTGCCCTr CCTTTATCGA TGCGGAACAT TTGACGAATT GCTCTTGTCT AATTGATTGA CTCAGGTGCA CTCGTGCGGA AAT'rGATGGA TGAGCCAGGC CATGCGTAAA AAGGGACGTA TCATCGAAAT CATGCAGTTG CACAAGCGCA GCCCTTGATC CAGCTTATGC CAACCAGACT CAGGAGAGCA GTTGATCTTG TCGTAGTCGA GATATCGGAG.
ATAGCCATGT CTTGGCGCCT CTATCAATA-A TTTAGCTCTT GACATTGCCC CTATGGCCCA GAGTCATCTG AAAAGAAGGT GGGATTGCTG TGCGGCCCTT GGTGTCAATA AGGTCTTGAG ATTGCGGGAA CTCAGTTGCT GCCCTTGTTC TGG'NTTGCAG GCTCGTATGA AACCAAAACA ATTGCCATTT AAATCCAGAA ACAACACCGG TGTTCGTGGT AATACACAAA AGAAACTAAG ATTAAGGTTG TGAAATTATG TACGGAGAAG TTTGGATATT ATCAAAAAAG TTATCAACCA ATTGCGTGAA AAAGTTGGAG TGATGTTTGG GCGGACGTGC TTTGAAATTC TATGCTTCAG TCCGCTTGGA TTAAGGGAAC TGGTGACCAA AAAGAAACCA ATGTCGGTAA TAAAAAATAA GGTAGCTCCA CCGTTTAAGG AAGCCGTAGT GAATTTCTAA GACTGGTGAG CTTTTGAAGA TTGCA.AGCGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380
CAGGGGCTTG
AATACTTGGC
TTGGCTTGAT
CAAAGAAAGA
TCGAAATTGA
ATTCAAGTTT
AAGTCAAATT
AACTATAATA
GATTTGACTT
GGAAAGCTGA
GTATTCTTAC AAAGATGAAA AAATTGGGCA AGGTTCTGAG AATGCTAAGA AGAGCACCCA GAAATCTTTG ATGAAATTGA TAAGCA.AGTC CGTTCTAAAT TGATGGAGAA GAAGTTTCAG AACAAGATAC TGAAAACAAA AAAGATGAGC AGAAGCAGTG AATGAAGAAG TTCCGCTTGA CTTAGGCGAT GAACTTGAAA AGAATAAGCT GTTAAAGCAG TGGAGAAATC CGCTACTTTT TCGATTTTrG TTAGATTATA TATAGTAGCT TGAAATAAGA TATGAACAAC TCTATTAGGA AATTTCTAGA AATGTTTTAG CAGCTACAGC GTACTATTCC AAACTCAACC GATCGAAACT AGAATAGTAC ATATCTACTT CTAAAACATT GTTAAAAATC TCCTTATr'rC ATTCCGCTAT ATATAGTTTG CTGTTTCTTG TCGCTCCTCT TATAATAGCT TTATGAATAA AAAACGAACA GTGGACCTGA TACATGGTCC 399 TCCAATTTG CTATCAAATA TT=NCAACA GATTrCTTCCC TCGCTCTTAA GCTTCACCTT 1440 GCTCTATAAC ACI'GCTGATG TGCAGTAGGA GCGACGACAG CAATGGCATG CGGATTGTCA GGAAGCACTA GCAGCCACCT GGGCTTTCTT' GGCTTGTATC TCAATCTTAT CAATATATTT
TCTTGATTGT
CGATTTTTGA
TTGCTCGTTA
GGA'rrTTAGG
CTCTCTTGCA
CTATGATTGT
TGGACGATTT CTTGGTCA.AG CCTGATTGTA GGTTTACAC TTATGGGGCT CGGAATTTCA TCrCTTTTG AGCATTCTAG ATACTTAGAT ACTCCTGCAG GACCTGTGTA GGTGTCAGCT TCTTTTTGCA GGCNTGTTGC GGTCTATTGG TGACAGTCTA GCAGCCCTGG T'rTCTCTGCC TTGGTTAATG TGGTTCTGGA TCTCTATTTT AT'rACGCAAT
AGTTCAATCC GCAGGACTTG TTATT'A'rATT~ CGTAAAAGTG CAAAAGCT'rG TACGCGGATC TGTATCTATC GGCAGTGTGA TAGTGCCCAG ACGGCAGCTC TrCTGCATCA ATGACGACC'r TGTTCAAGGT CTTCG.AATCG TTTCCTCTTT TTTGCCAGTC CTTGATAGAA AATGGAAGTC CCTCTTGTTG ATTTATCGCA *TTCTAGCTTT ATTGAACTAA AGGATATAAG GGTGTTATCC GTACTTCTCA T'rAT'rCCGTC AGTGCAATCC TAGTTGGATT ACTTGTGTTC TCTTCTT'rAG CTACCAT1'AT
TGCCAGAACT
TCTTGGAGCA
TTTTACAGTT
GACGCATTAT
T'rGCT'rCTCA TTCGCAAGGT TTATCAGCGG CTrGCCACAG TTTAAACATT AGGTTTGGCT ATGGGCTTGA TTCTGTTAAT ACATTTGGTG GACCTTTGCC CTTCTTCCTA GAATCTAGGA GC'rAAGCGAC AATCCTTGGc 1500 TI'GGTGTTGG 1560 CTAAAATCAA 1620 TTATGTTGCT 1680 AAATTCTTCC 1740 TTGCTTATAA 1800 GATTTCTGAT 1860 TGCATC'rGGG 1920 TTCTCTGCTr 1980 TCAAATGGGA 2040 TGAGTTCAAT 2100 CAGTGATTAT 2160 TGACCGCTAT 2220 CTGACCGTAT 2280 TTGTTTGTAT._ 2340 CAGATGGTTA 2400 CCATTTTGAG 2460 TTCCTCTAGT 2520 TTCCTTGGGC 2580 .GCAGTCGTTT.AAGTATATCC TGGGCAGTTT.
CAGCTGGT TTCCTTCTTG GCTAGTTCGA TCTATCTGCA AATCAGTTCA ACCT'N'TATC ATTGCTTGCA GGGCTTGGGG CAAAAGATCC TCGGAAAAAT CGTTTTTGTG GTTTTGATTA TTTGTGAACC TCTTATCTGG GTTGCCATGA CAGTTCAACT ATCCCTTGAT AAAAGAAGGC AAGGCAATCT TGGCAACCAA TACTGAATAA AATCCATTTC CTCTAGTGAA AATCGAAAAA TTTGGTGTTG AAAATAGTTT AACAGACTTT TGACTTCIT TATATGATAT AATAAAGTAT AGTATTTATG AAAAGGACAT ATAGAGACTG TAAAAATATA CTTTTGAAAA TCTTTTTAGT CTGGGGTGTT ATTGTAGATA GAATGCAGAC CTTGTCAGTC CTATTTACAG TGTCAAAATA GTGCGTTTTG AAGTTCTATC TACAAGCC'rA ATCGTGACTA AGATTGTCTT CTTTGTAAGG TAGAAATAAA GGAGTTTCTG GTTCTGGATT GTAAAAAATG AGTTGTTTTA ATTGATAAGG AGTAGAATAT GGAAATTAAT GTGAGTAAAT TAAGAACAGA TTTGCCTCAA GTCGGCGTGC GCATTCAACC GTACAGAATG TTTCTCGCAC ATTGTTG-GGA CTGGGACGTTI CGGGGCGGTT CCATTCAACC AAAGAAGAGT TCTAGCAGAT GAAGCAGGI AACGCACGAG TATTGCACGA TCCATATCTT GCTAAATGGG CTTGACGATT GAAACAGGCT CGGCTCT'rAT CCAAAAGACA TTCAGGCTAT ATGCTrGCAG CGACAACTCA GGCGAAATGG CAACGAAGAA GGTGCCATGA AGACCCTAAA GAAGGCGCCA CTGGTACTAC CTCAAACCAG AGATGGCTTG ATTACAGTAA TATTAGGTCT TGAAAAAGCT
AACCATATAG
AAGCGGATTA
ACGGTTGCAT
GGAATGCTGA
TCATGACGGA
TGCCGAAAAC
ATAACCAACC
GCATTAGCCCG
GGCAGAAGAA
AGTTTGAGAA
ACCGCTGGAG
CTACAGGCTG
AGACAGGCTG
TCGTATCAAA
ACGGAACACT
AATAATAATG
TAATAGTATG
GCAAGTACAC GCACACTrCAA TCACrrGCGG AAAGACCCAG CATGCAGGTA GGACCTGTTG GACCTATGCA GCGGTGA.AC CTACCGCCTT TATATCGA.AC
CTGGGAATCC
AATTAGGTTT
ATAATGCTGC
TGATTGAAAG
TCTTACGCAA
GCTTGATACA
AAACAACCAC
TGAGCAGTTT
TGACACTGGC
AATCAATGGC
GGGAGTTT'AG CTGGAATTAA TCAGACCACG TTGACCCTTA AAGCATGATA TTGAGAACGG TACTGGTACG TACATTCAGA ACTTCGTACT ACTTTGACAG GAAGCACACA GACGGCAACT GAAGAAAATC GCTG.ATAAGT GGTCAAGTAC AAGGACACTT TGCCTTTATC CAGTrCAGCGG
GGTACTGGTT
GGTACTATT'r
GGTACTACTT
ACGGAACAGG
TTGCTACTAT ATTAAACAAA AA'rCAAAAAG 'rAAATTTGAC AATGTAAACC GAGTCGGATA CGATTTTTGA TAGGAACTCA TCAATTTTAG TTG'TTTTGTC ATAATrTTT'rT TATTTAAAAA GAGAGGTGTA TGAATATGAT AAATGTATGT AAAAGAACTT' AGAATATCTT CAAATCTTAC GAATCAA.AGC ATGATTGCCA AAATGGAAAA GTAATAAAGC 'rTCAAATCTT AGAAAAAAGT TTTTTAAAT'r GGTACAAAAG ATAGAAAAT'r ATATTAGCGG AGTC'TGTTAT AAAGGAACTC TGCCATTAGG AAGGTAAAGA TTTTCN'TAG AGGTGAAGAA GAGTTCAGTC AGTCTTATCT GGCAGACAAG CCAGAATTCA CAGTAGAGCC GAATGTCTT CAAATCAGAA CAGCGCATAT CT'CTTrG .TGGAGATATT TCCTTCAATT CAAACrAGAA AGTTATGCTC AAATAAAATC GCTTTAAGTA CTGT'ITTGAG GTTGAAGATA ATr'rrTAAGC AGCATCAATA AATTGCTTCC ATTATGAkma GAGTGTGCTA TTCTTTTTAT GATAAATGTA TGTGATGTTG GAAAAAGAAT TCAAGATAAG ATTGCTGAGT ATTTGTCTT AGGTGAAAGG AATATCACGA ATGGATTTAA TGGGAGCTGA TGGTGAATCG CCGATAGATA TGACGCTGGT AT'rTrATGGA CTCGGAAAGA AGTTCAGTCT CATTGCAGTC AATTCAGACA CACATGGACT GTA'rCATCTT TATTATGA'rG TCATTGGTGA AGGAGATGAA ACTGAAAGAA 3180 3240 3300 3360 3420 3480 3540 3600 3660 3.720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 AAGCGGATCA GTTTGCTTCT TATTTTTTAA -rT'rCCCATC TTCACTGTAT AGGATGGTTG 42 4920 AGGAAATCAG AGAAAATGCC
GTCAGTITTTA TGGTATCAGT TTGATGCAGA AGAAATTAAA GCTATGATAC AAGTTTATAT AATATATTAA T~cAAcTGAA AGGAACTGTT ACTAGATGCT 401 AATAGAAcTC ATCTTCAAG1' AGAAGATATT ATAAAATrrGG CATAAAGCTA TGTTATATAG ATTGAGGAAT GPLTGGATACC AATATGGATA TTAGTGTTAT AGAGACAGCT TCAAGATTAG CGTCCTTTGT CAGAAAGTAA AAAAGAAATG GCATTAGGAT CAACTTTTAG AAAATAACAG AATTTCGCAA GGGAAGTATG TTCAGATATG ATATTGTATA TGGGCTAGAT GAAGAGGGGG GAGTTGTCGT TTGACTAGTC GTGTA'TTAT GGTTGGCACT GAACATCTTT TAGAAAAGCT GGTGTATGAT GAAATCAATA TACCTACAAT GGTAGCTAAG GGTTCAGCTG AGATTGTGAG ATATAGAGAT TTAACAAGAA ATCATGATAG ATCTATTTCC TTAGCGAAAA AGCATAATGG TAAATCATAT GTAGAAGAAT TTTCT'rTAGA AGCGTTT1AAA GCGTAATTTA TTACTGAATA TGATGCAGAT TGTATTCAG TATTTTTATG CTATTTGGGT AAAATTGT'rA TTCCACAAGA TCCCCATTTA AAATCTAGGA TAGATCAGTT CATAGACATT GGAACTGAAG AATACGCATT TA.ACAAGATT A~rGGTAAGG GAGAAGGGQC GATATTAGGA AGTAATAACC TAAGAGATGT ATATATGACA ACAGGAGATA TACTGATTGA AGAGGGCAAT CATATCTGGA ATAATATGCT TAAAAAGAGA AGGAAAATTG GTGCAAATTC AT=rCAGAC TATCTTCGTG GAAGTATTCA TCAAAATAGA CAAAAATAAA TTTGGATAAA TCGAACTCAC TATTCAGGAG GCATATGAGC- 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 AATTCGAAAA AGAAAAGTGT CAAATTGAGC CTATAGGAGT TGCATAGTGG ATGAGAGAAA AGTTCTCCTT AAACGATATG TAGGGTAATG TGAGAGGGGA AAACTTATAT TTTATTATAC CGAATGATAA AACATGAATG TCAAAAAGAT AATGTCAATT GAGGAACTGA CTTTGACTTT ACCAATCAGT ACTTTTCATA AGGAATCATT TTTATTGATT TTTGTTACTC AGGCTCGCAC TATGGGTGAA GAAGTrI'TCC
TAGCGAGTAG
AATATAATAA
TTTCAATCCT
TTTGTAAAAA
AAAGAAAAGA
AAAGCCAATA
AGAAGTGAAA TAGTAAGTCC TGAACTATCA GTCGCATGTC TTTTTGGTTA TTTTATCAAA AAATGATAGA ATAAGGAAAA TTTATGTTGA TGTCAGTATT G;GTTTGAGTA TACTCAAATG GAAGGGGGAG TTTGAGTTCA TGGATGTTGT TTTGGTGTTT TCGAAGTTAT CAGACAGTGA AAAAAAGCAA TTACTTCAAG CTAGAGTTCC GTTTGTAGAC TTTAAGGGAA ACCTCTTCTT GTCCCTAAGG AATTAACACC GGTCAAAAAG TAGTAGATGT ATTTATAGGT GTTTGAGGAC CCCTCCATTG GGACTAGTAC TCAATGCGAA TGATACTGAA TAGCGAACAA TTAACGTGGA TTGCCTTTT ATTGACAAAA TGATTTGCTT TCACAAGTCA CTGGACTTCC AAACTCAACA TTTrTAAAGCT TTATATTGGT TAAACAAGCA AAATAAGCTT TACACATATA CGGTGTCAAA CCCATCAAAA AACGGATT AACCTTCTA'r ATGGTGGTGC GAAAATATTA GCTATGTCAT C-AGCATGTTT TAAAATGAAA TTTTGGAATG ATT'N'AAAAA TTGACCTTAA AAGATGATGA ATGATATTAC AGTATCTGGG AATGTTTGCG GATTTTCAGA 402 GAAAGAATTA TTCTTAAAAT CCGTGTCATG TTTATTTAAT ATTGCCAGAT GGCGA'rATAA AGCAGATAAA ATCTGT'TrcT TTATGCTTTG TCGCATTCAA CT7rTIMAGC TGAAACGGAT ATGGCAGAGA AAATTCAAMC AGTTA'rCCTT GCcACTTTCT GATGCTAGAG ATATGGAAAT ATCGTCCTTT TGTATCTGAG TAATCATGAT AAACAATTTG TAGATCCGAT TTCTCTTTAT TGACCCACGT ATAGAGGAAG AGAGTGAAGC ACTAGAAAAT AGAAGATGAT GCCAGCTAAT ACGAAAGTTA TTTTTCAAGA ACTATTATGT TCTGATTGGG GGAACTGCTA CCTCTATCGT
ATTGGATTCG
TGAAGTAAAA
TCAAGGAAGT
GTTTCCTTCT
CAAGGATTTA AAAGTCGCAC AACAAAAGAT TATGATATGG TCATCATTGA AATAAGGAAT TTTATACTAC CTTGAATCAT TTITTTAGAAT TGGGAGAGTA CAGAAAGATG AGAAAGCGCA GCTTTTTCGA TTrACAACAA CTAATCCTGA ATGA'rTGAAC TATT'rAGTAT CT'rACCAGAA TATCCATTAA AGAAGGACGG a a a.
a a TCGAGAAATT CCCTTACATT TTGACCAAGA AGAT'rATTAT AATATATTGG TGCATGAAAA TAATTGTGGT TTATACTCTT CGAAAATCTC TCAAAACAGT GTTTTGAGCA GCCTGCAGCT GAGTATTAAT TAIM=TAAG GCTAAAGCTT GTGCTCAAGG TTTAAGTAAG TCCATTAAAA CTTCCTTGCT AGGAGATGAA AAGTTATCGG ACATGCACCG CTTTGTGATA GAATTAGAGC TGCTACTTTA TCAGCCTTAT TATTGGATGA AGAAACCATT CAGGGGTATT CGGTATTGAG TTCAAACCAC GTrCAGCTTCC ATCTACAACC AGCTTCCTAG TTTGCTCTTT GATTTTCATT 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 GGCTGGATAT GGGAGCGC AGCATTTGAA TGACCTTACC CTATAACATC AAGTAGTGCG CTG'rGAAGTC AACTATTCTT AAATTCTGAA AAATTTTCTC TC'rGCCACAG
CGTTTGACAG
GTAAAAGCAG
CAAAATAATG
GATGGTTAAA
AAAAAACTTC
CAAGGTTTTT
TTCGTTTTTA
ACATTTCATT GGATCAAAAT ATAATTGTAG CGAGATGGCT AATTTCAGGA GAAAATGAAG CAAGACCTGA TATTATGCGT TCCTCTAGTC ACTTGATTTG AGGACAAATT GATCAGGACA ACTATTCTAG 'N'TCAATCTA GATTGTAATA CTCTTCCAAA
GAAATTTTTG
ATATTGAATT
TCAATCTTCC
TTTTTGCTTT
TTTCAGGTGG
CGTCTATATC
CACAATCAAA
TCAAAACTTT
TTTTTTAGTA
TGGAAACTAG
CGTA'rAGTAT
TTGCCCAGTC
GTCAAATCGA TTTCTAACAA CTATAGTTAA ATCTGCGGTC ATCTCATCAA CCACGTCAGT TAGTAGAATG AAACGAGAAC TGTTTTAGAA GCAGAAGTGT AAGTCTACTG GTGAATCTAT CTTGCCTTGC AGTCTGTATC TTACTGACCA AGCTAGTGAT GGATTTAGAA TAGGTGATTT GGAGCGTCCT ATTAGCTAGG 403 AA.ATGCTGCT CATAGTCCTr TGCTGAGGCT AGGGTG'N'rC AACATTCAAC ACTCAACTGG TTGATCTAGT TGATAGGAAG GGAGTTACTA TAAAATACTC AGGCrTCCAT CATATTTTTr- GAAACGATTG TGTAATCAA.A ATGTACCAAT ATrGTAGTAT TGGTACAGAA GATGI'GTGA ATCGATAAAr ATATCATAAC TGCTATCTCA AAAAGATTTC ATATGTCTGT GCATATATAA TAGACTrCCT GCAAAACTAG AATCCTAGTr CATGA'FrGAT AATACCAGCA ATCAAATTCA TTCGTA.ATCC AAAGCGTr-rA CGATGAFTC GATAGGT'rGT 'rGAAAACATT TTAAACGTTT CTACTTTGGC AAAGATGTrC TCAACCTTGC TTCTCTCCrT AGATAGCGCA TGGTTATAGG CTTTATCTrC AGCTGrI'AGC GGCTTGAGTT TGCTGGATTT ACGTGGAGTT TGTGCTTGAG GACATATCTT CATGAGCCCT TGATA.ACCAC TGTCAGCCAA GATTr'TACCA GCTTGTCCGA 8520 8580 8640 8700 8760 8820 8880 8940 9000 TATTTCTGCA ACTCAI-r TG AACAACTTCA AAGAAACAAr TCTCCCTGA. CTrGrGACAA TTTTACCAGA ATCATTCGCT, AATTCTTT TCAATCATTA CCGTGTCCrC AGAACTAAGA ACAAGAGTTA CTTCAACCCA TTGGCTCCGA TCAGCCGCAA TTrCTTCATA AGTGCGGTAT TTTGCGTGTT TAAGTrGATA, AGCTGTT'TTT CGCTGAACAC CAACAAGACG CT'rAAATCGT TTTCGCAGGG AGTCTATTGA CTC'rTTGGTA AATTATTTC CCGCCATTTG TATrTGCAAA AAGATTGTTT TTAGCTTTTrT TGTATTCTAA TTTCCTTrAC ATCTG'TTTT TGTGGTTCTG GAATTGAATT TCGAGAGTTT T'rACTCAGTT
TATCATGACA
TCCTGAGC
ATAGTTCACA GTGATATCCA CTTCATAGCG TGAAA'TCr TTAGGGCCAT TGAT'rTTTAC TTCCGTCGCA GGAGT'rCTTG AAATCGTAAC ACCACTTTGA CGGATTAAGT TGCTT'rCGTG AArACCAAAA TCTAGGCTTA ATTTAGGTTT~ TCGTCCACCT AATACAGCTA ACATCTCTTT AAAAGTCGTG GTATCAGTTA ATTGTTTAC'r TGCTTCATAA GGTGTCAATG TTTTTTTCAT CTATCCCGAG TGCTGAGTAG GTTTCCCAGA AAGACI'CTGG ATCAACCCCT TCAAATTTTA AGTCCATATT GTATTTGTTC AAGTTGAGTG ATAATATAGC A.ATTTCT1TT TrAACCCACT TTAAT'rGCTT* 9060 9120 9180 9240 9300 9360.
9420 9480 9540 9600 9660 9720 9780
TTTTA.ACACG GGTTAAAAAA GAAATTAAAG TGGCTTAATT 'rTTCTTGA 9828

INFORMATION FOR SEQ ID NO: 42:
SEQUENCE CHARACTERISTICS:
LENGTH: 3369 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CCGCGAAAGA TA'TTrTGAA CAAGAGTTTG AAGT'rGACGA GTTTTTAGAC GATGTCATCA
GACGTGAGGT
AGGACTATGA
AGGAAGAATT
CAATTACAAG
AAGTTTGG
AG'rCACTTCG
CACCAGTTCA
TTTTGAAACG
CAGAT'rTT'rA
TTTCTCATGA
CGAAACCATA
ATAGGCCAGA
TCAGGAAATT GCGGATTTGA AGCAGAACCC CT'rGAAGCGG CCTGAATAGA TTGGAAAAAG CCGTGGCTAT AATAAAGTAG AACCTATGCT GCCTTGGTCA AACTCCTAAA CCGAAACCTT 7?rCTA'rGACG AATTTTGATA TAAACAAA'N' TTAGATAACT TAATCGCGTG AGGAGAATTG CCTGTAGTGT TTGTrGCTAGG TGAAATGGCT AAGTCCT'rGG CTTTCTGGAA ACAGAGAGAG AGTAGTrrAT'r 'GAGATGTGC AA'rTTTTGGA GGAAAGTCCA TGCTAGCACA GGCTG'rGATG AGCCTAGGGA CGAGAAATCG TTACGGCAGT GTAGGCTTGA AAGTGCCACA GTGACGGAG'r
TGGAACGCGG TAAACCCCTC AAGCTAGCAA CCCAAATT'rT GGTCGGGCA TGGAGTACGC GGAA.ACGAAC GTAGTATTCT GACTGCTATC AGCTAGAGCT GTTAGTGGTA GACAGATGAT TATCGAAGGA AGTGGTCCTA GTCACTTCTG GAACAAAACA TGGCTTATAG AAAA'rGCAT ATAGGTTGGG GCTGAGAAAT TTTCTCAACC TCATTTTTTA AAGTGGACAT ATAGAAAGGT CTTGCAAGAC 'rGTAACATGA AAAAAGAA'rT TAATTTAATT GCAACTGTGG CAGCAGGGCT TGAGGCTGTC GTTGGTCGTG AAGTGCGAGA GTTGGGC'rAC GATTGTCAGG 'NTGAAAATGG ACGTGTTCGT TTTCAAGGAG ACGTGAGAGC TATTATCGAA ACCAACCTTT GGCTTCGGGC AGCAGATCGT AkTCAAAA'r'rA 'rCGTAGGAAC GTTCCCAGCT ALAGACTTTTG AAGAGCTATT TCAGGGAGTT TTCGCTTT'GG A'rTGGGAAAA TTATTACCA CTTGGAGCTC GGTTCCCGAT TTCAAAAGCT AAATGTGTTA AGTCCAAACT TCACAATGAG CCCAGTGTTC AGGCTATTTC TAAGAAAGCT GTTGTCAAGA AATT'GCAGAA ACACTATGCT GATGGAGAAT GGCCCAGAGT TTAAGATTGA GGTCTCTATT CATGATTGAT ACGACCGGGT CTAGCCTCTT TAAACGTGGT CGCTCCTATC AAGGAAAATA TGGCAGCAGC CATT'rTACAA CAAGCC'TTTG A'rTGATCCGA CCTGTGGTTC GGGGACTTTC TGCTAGAAAG A'rGGCGCCAG GTCTTCGTCG CTCTTTTGCA CGCCCAGAAG GG;GT'rCCTCT CTCAAAGATG TGGCAACTGT TATCGTACCG AAAAAGGTGG CTTTCTAACT GGTATCCAGA 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800
TGTATTGAGG
TTTGAGGAAT
AAAAAAGTAG
CAGTTATGAT
GGAACTGGAT
ACCGTGAGCT CAGCGATCGC TTGATTCAAG TGAGCTGGAT ATCATGGGCT 'rGCTCAGGTA GCTGGTGTTG ACGTTCCGAT AAAATCAATG AGATGATGCA GGGGTGACCA AAGTGCGCAC AGAAGCGGCTr GTGATATTGA TGCTCGCATG CAGGAGACAT TACTTTTAAG GAGTAATCAT TTCCAATCCG AGCTCTATGC TGAGATGGGG GTGGAAATTG CTAAGGCCAA CAGATGCGCG TGCAGGATTT CCTTATGGTG AACGTT~TGTC CAAGTATTTG CACCGCTGAA AACTTGGAGC AAATI-rATCC TGACTAGTGA TGAAGCCNTT GAAAGCAAGT ATGGTAGCCA AGCAGATAAG AAGCGTAAGT TATACAACGG AACCTTGAAA GTGGATCTAT ATCAA'rATTT TGGTCAGCGT GTCAAACGGC AAGAGGTAAA ATAGAAAGGG ATACTCATGA GTAAAAAAAG ACGA.AATCGT CATAAAAAAG AAGGTCAAGA ACCGCAATTT GAT GA'rG AAGCAAAAGA GCTAACAGTT GGTCAAGCTA TTCGTAAAAA TGAAGAAGTG GAATCAGGAG TCTTGCCTGA GGATTCCATT TTIGGACAAGT ATGTTAAGCA ACACAGAGAT GAAATTGAGG CGGATAAGTT TGCGACTCGT CAATACAAAA AAGAGGAGTT CGTTGAAACT CAGAGTCTGG ATGATTTAAT TCAAGAGATG CGTGAGGCTG TAGAGAAGTC AGAAGCTTCT TCGGAGGAAG TTCCATCTTC TGAAGACATC TTACTACCCT TGCCTCTGGA CGATGAGGAG CAAGGCr'rGG ATCCTCTATT GCTAGATGAT GAAAATCCAA CAGAAATGAC TGAAGAAGTG GAAGAGGAGC AAAACCTTTC TC!GTCTGGAT CAAGAGGACT CAGAAAAGAA AAGTAAAAAA GGCGCTTGTA TCAGTAATTA. 'TTGTGTCAG 'rGCTTATTAT TTCGACTAAG GAAA'N'GAAA CTTCTCAATC AACTACAGCC TTTTAATACA CT'rTATGACG CCAGTTTGAT AAACTGAGTC ACATACGCTT GCCAAATCTA TGTCAATGCT CAATTTGAGA AGCCAAATCG GATGCTAAAT GCTAGATAAG GCTATCAGTC AAGTCAAACT AGCAGCTCAA ACCAAGTAGT TCAAATGAGA CCTTTTACAC AGATAGCAAT AACTCAAGAC TTTAC'N'GAT AATATGATAG TCTAGCAACG AACCAGCTAT TGTGGATGGT TTACGGATAT TAAAACTGGA TTGGTAAGAG CCAGCAAACA GGCTTTAXTrT TGACCGTT'TT GTCTACCGTC AAGTGGCTCG AATCAATCCG ATGTGGATGA AAAACGGCTT TGAAAAATAG AAGCTGGAAG GTAGTCGTGA CAAATCAAGG CTATTCAAGA GTGTTGGATA CCAATGCCAA AATACGGAGC TTGATAAAGT AGTACTTCTA GCTCAAGTTC 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 j580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3369 GTTCAAGTCA AGCAAGTTrCA AATACGACTA GTGAGCCAAA CTAGAAGTAG TCGCAGTGAA GTCAATATGG GTCTCTCGAG TGCAGGGGTT GCTGTTCAAA. 
GAAGTGCCAG TCGTGTI'GCC TATAATCAGT CTGCTAT-rGA TGATAGTAAT AACTCTGCCT GGGATTTTGC GGATGGTGTC TTGGAACAAA TTCTAGCGAC TTCACGTTCA CGTGGCTATA TCACTGGAGA CCAATATATC CTTGAACGTG TCAATATCGT TAACGGCAAT GGTT1ATTACA ACCTCTACAA GCCAGATGGA ACCTATCTC'r TTACCC'rTAA CTGTAAGACA GGCTACTTG TCGGAAATGG CGCTGGTCAT GCGGATGACT TAGATTACTA
AGCAGTCGG
INFORMATION FOR SEQ ID NO: 43: SEQUENCE CHARACTERISTICS: LENGTH: 9713 base pairs 406 TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: AAGTTTACAA TTAAATGAA TTACAATTT TCCCAACTAA AAGCACTCCA GTTACCGCAA CGTTTGTACT GAATGTACTA AATCGCATTC CATCAACTrC ATC'rGTTTCG TCAACTTGAA 120 CAGATACTAA TTGAAGATTT AATACTTCTG CTGCCATAGC TAGCTCCTCC TATTTAAATT 180 TTTGGGATTA AGTACTTTAT CCACCCTCAT ATACTCTCTC CACCAGTAAA ATGCAAGCAA 240 TGATACAAAA TAGATTTAAC TATT-TTATAT AGCGAAAACT TACAAATTTT TAAGAAATAA 300 T'NTTTGCATT CTTAAAGATA AAATAGGAAC TTTAGTAAT AAATATTAAA ATAAATAAAA 360 TAATAGATAC TATAAAATTT GGAAGTATTA ACCCCAAAAG ATTCATATCA TCTA'IrAAAA 420 TATCCTCTAA AGAGTAGTAT ATTAAAGCCA TAATTTTAAT GTTAAGTWA AATGCAATTA 480 ATGAAGTAAC AAATGTCAAA AATATAGCCT CACCAACTTT AATCTTAACC ATCTGGTAAT 540 TAGAAGTTCC TAAAATTTCA AATTGCTGAA TCTCAATCCT TTCTTGATGC GATGACAAAA 600 .ATGCAATTGA AATAATATTT GCAAGTACTA TCAAAATTGG TGCTCCTACA TAGACAATAA 660 **.ATGCTACTTT TAGCTCTAAA TCACTGTCAT CTTGAAATTG AGATAGTATA TTCTGAGAAA 720 *TCATTTGAAA ACTAGAAATT AGTAATATAG CTCCTGTAAT TGCAGCACTG* ATAGATTTTA 780 TATAGCT ACAATATAGT AATCATTCGAAACAAT GAACATAAAA TTATITCTAA 840 ATATAATTAT AGAAAGTAGT TTGATAAAAC ATGACTGTAT AAAAGGAGAT AATTGATAAA 900 TAATCACAAT ATCTAAGATT ACAATATTGA ATATTATCTG GGCCTTCGCT AAAATTGTGC 960 *TATCTTGGAA AATTTGTTGC AAAGAAAGCA ACCAGATAAC ACTAAAACCA GCCAATAGCA 1020 GTATTCTTTT TACTATTGAA AGAACATGCC TTATTTTAGA ACTCTTCCTA TTTCTAATCT 1080 TCTTGAACGT ATAAAAGCAA CCACTTAGAA AGGCTAAAAA TGAAATCAAC ACTACTGTAA 1140 TGATACATCC AACAGCACTC GTTTGAAATT GGATATCAGG TAATATATTT TCCCCGAAAA 1200 AGTATTGTAA AAAATAATAA TAATrrTGACG TAACAAATAT AGAGCATAGA TATGCAATAA 1260 A.ACTAATAAT CGAGGAAATG ATAAAAATCT GTCCCCCCAC AAGAAATGAT AG'N'GAAGGC 1320 GACTTGCTCC CAACACCTCC AGAAGTTCGT AATCATCTCT AAAAATTTCA ACCAACATAT 1380 TTATTATGTT AGAGAGCACA AAGAATAATG TTACTCCTCC GAATACTATC GGAAACATAA 1440 AAATTGGTTT AGGATCTGGA AGTCC!GACAA ATACTTGCGA ATTATTCTCA ACATTAATTA 1500 CCCCA'ITAAC AGCCAATCCc ATAACTAAAC TCGAAACAAA AATTACTGGT 
GAAACGCCTA 1560 407 ACCATTGTTT CTTATTATGT AAAAATTGAT AGTAAACTAA TCTGAGCATC TCTAT'rCCTC CGTAGTTGAT TGTACCTCTA AGATTTTATA CAACTCTTCC CCGCTAGGTC TATAGTTC rITTGAAAATT TTTCCATC1-r TCAATAT'rAA TGCACGATCA G7TCGAGG CCAATTCrAT ATCGTGCGTT ACCATAAT'rA CACACTTACC CGCCCCTACT AACTCTCTCA ATAAT'rCA;A AATTACT'rCA CGAGAAACGC TGTCTAAAGC CCCAGTTGGC TCATCAGCAA ATATTATATc ACTATCAGCA ATAACCGCTC TAGCTATAGC AACCTTCTGT TGTTCTCCAC CAGACAGAGT TCCAACAAAA TCGT~rAAGC CAGCATTAAA CTTCATTCTT TTTAATAGTT AATTTTTT GTGATAATCG CAAAGGAAGT CAGGGAAGGT ATTAAATTGT ATGCTTGAAA TATAAAAGAT TGACAATTTT GCATTTCTGA TTTTATAGGG GTTGATTCCA TGTTGGTTCA AGCAAACTAG AAATACATTT TAATAAAGTT TCCTAGAATA CTTATAAATT CTCCTCTCGA AGCAGAAAGA CAACGTTTTA TTATTTCCTA GTAAAAATTG ATGATACAC ATCTT'rATCC ATAT'rCTT-GC CTCCAATCAC TTAATTT'rGA TTGAGTAAGT T'rrCTACATT GCTA'rATTTT- CTATTACCGG ACT'rCGTTAC GTCTTATACT TTAAAATTA CTTCCCCACT GACTTTCCAG ACCACTAAT GAAACATTTT TCAGCACTTG CCTTTCACTT TTAATATATA AAAGTGTTCC ATTTTTCCAAT *n.
TTATATATAT CAGTGTATCT CTTGTCATT'r AAGTCA'rAAT GATGTGAAAc T'rCAATAAAT GAAATACCTA AATTGAACAG AATATCATGT ATGGAAT'rTG AATTATCATT ATCTAAATTA GCTGATATTT CGTCAAATAA GTACACTTTA TTATTTCTA.A TCAGACCTCT AGCTAAAGCT ATTTTT'rGTT ?r'rGACCTCC AGACAAATTA CTACCATTTT CACCACATTG ATAATTTAGT 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 ATATCTATCT TTTCTAATTC TTCATATAGA TTTACCTTTT TCATCTGAAA AATATTCATT TTGAAATAAA GTTACGTTCT ATATATGGTG TCTGATCAAC TGTTGGTATT GAATCTGAAC AAATTTACA'r AACCTTTTG TGGCTTTAAA GAACCA'rTAA TTCCCACTAC CAGAAGTTCC TGTTAATAAT ACCCTAAATG ATACTTAATT 'rATTTTCTGG TGTAATAGAA TATACAACAT ATTGATGAAG TATACAGTCC GTTATTATCA TGTTCAGCGT CTTAAGTATT TTAAAAACGG TTTCCTTAAA TCTTTGGTTG TAGGCAAT'rG ATTGTATCGG CCCTAAAACT TTATCGTTTG TTAACACCTC AATTATCTGA CACGAATAGT AGTGTCAAAA TCT'rTTTCCC ATGTGATAAC T'rAAATTTAA AATCGTTGTT GTGACTTAAA TGAGAAGTCA CTTTCATGTG TATCTCATCT CTA'rAAAATT CTTCTCTCCA TATT'rATCTT ATTTAATGAA CTAAGAAAAT ACCTATCAGT TCAC'rAAAAG AAAGGCT'ITTT AGAAAGCAAA AACCTGAAAT ATGATAAATT ACAAAATAAC ATCCTACAAC CAAGGGAACT TAGTACTGCA ACCAATTTTG AAAGAACCTC TGATCG'TTC 408 AAATTAAAAG TAGAATCrC TAGTTTATCC AACTTTTAT CCGACAAACT AATPTTCT TTAGTAACAG AATAAGATTT TAATGTCTTA AAACCATAA AAATTTCTr'r TATTATGTA GTATACTC'rG CATTGCTGTT AGACTACTCA TTAGCTGAAT TAGACAACAT CTTCTTCATA AAGACAGGTA CTATAATCGG CAATGCTGAT AATACAATA), ATATTATTGA nACTAGGAAG 'N'rAAATAAA GCATAAAACT TAGAGAGACG ATGAACAACA ATATTGMAGA ~A1-I'TICA AAAATTTGTC TAAAATAGTT 'rTCTTCGATT AATCTCAAAT CATTTGACAA AACTGAAATA ATAGA'rGAGT AATCTTTAAC CATTTCAGAA GAAAGATACT GTTCTCTAAA ATATCM'rcT TTAATTT-TTA CATTTATATC TTTAGTTAT'r GATGCI'CCG T'rACTTCTAA ATAGTAATTT GATATATAGA TTGCTGACCA ACCCAGAATA CTTATAGCAC CAAATCTTAG
AATGAGGAAG TCTGATTTAA ACTACCTGCA TTAAACGAAG ATAGAAATAT TAAAATCCCC TT'rAAATAAT 'rCATAAGTTA TTCCTTCCCA TCATTAAGAG AACATCTGAT
GGAGTAAAAC
ACAAACTrTT AACTCCAATA GAATTTAATT TAATATATGG GTCTr'TCTCA CCCAAAATAT TTTCAATATT TTGCATAGGC GAATrATGACT TATCTCCCCA CTC'rGCTATT TCGGTCTTAG CATAAGGATTr TATAAATGGC GAAAATAAGA TCAACACCGC ACCAGCAATT ATTCCACCTG TGTCAATTAC ATCATTTTCC CTTAAATAAT TCCATTTGTT TAACGCCTTT CCTGAGCGAT TTACATGAAC TATAGCATAA ATAAAACCTG AATCAGAATT ATCATTCTTA CCATAAGCCC TTGGGAGCTC ATCCGTATCT TCACTTTTCC AACTGr&rI-r TTTAGTCCAA ATCACCTTAG
TATACAATAA
ATTAATATAA
CTTCTTCAAA
CTCCATGACC
TCTTTGACCA
TAACTATAAC
TATATAAGC
AAAGAGGATC.
GAATGCTTT'G
CACTAGAAGT
TTCCTGAGAG
GTrTAGTCTT
GAAATAATTT
AGCTGCTTTG
CTCTATCACT
AGTATTGAG
CTTTACTTCA
ATTrTGGATT.C
CAATAAATTT
AACGTCAGAA
CAAGACACCA
T'PTTATAAAT
AAAGTATCAA
TTTAAATACA
TCGTTATTAT
TCTCGTGCCT
GGGTCTCTAA
TGAAGTGTAT
TrTTTCCTCGT 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 TAT'rAAACCT AATCGCTTAC TTACTCCCTC AATAAAATCT CTGATAGAAT ACCATTCACC ACCCAAATAG CCTCCACCTC CATCTATTAT AGATAACATA ATT'rCATCTA CATAGACACT TAGAATACAT TTTTTTCTTC AAAATAAAGA AATCGGTATG CTTACATCAT AAAAATATTT AGTATTATTC GATTTTATGA S. 55 5 0 0
TGGGTCTTTC AAATTCAGTT TT'TAATGTAT TTTCTATTAA ATCAAAACTA AGTAT'TTTT *CGTAAAAAGT TCTCCTCTCT AAAAACAGAA GAACACGATC AGAAAATGAA TTTTCATAAA GTGTTGTCTT TTCATCAAAT GTrrATCTTAT TAACACTCAA CTCCCTCAAA CTATTATTTT TAAATGTAGC AAGATAAAAG ACGGAATTCG CTGCGTTTGA ACAGTCTAAA AGGATATA.AC GTCCTATACA GTGAACTCTT CTAGCCCTAT CTTGATATGG TATAGTAATA GAAACTCTGT
CTCCCGAAGA
CAAGAAAGTA
409 AGTTTCCCTT AGAATTAGTT GATCTTTCT'r 'rCTCAGTT GAAGAGACC CTGTGCTTTT TCTGTACI'AA ATAGAGCGAT ATCTCTAGGT GTIrGGGGCTA CCGI'TTCTGT GTAAGAGTGT CTAACAAAAC CCGTCCGGTC GAAAC1TGIAT AGAAAAATCC TGCCTCTG AAAGTCTACT GACTTTACAA AACAATTATT GCTATCAATG TGGACrAT TTAATCGAAA AGAGCAT'rCG TTTTCTTCAA ACAdTTCCTC TTCTGTAAAG CTATCAAAAG ATTTATAGAA TAACTTACT GGCCTCCCGT ACTCTTTGGA GCGAGTATAC ATAACACCGA ATTTACCCAA ATAGAACGAA CTTTCTACTG AAATATCTTC AATGATAAAT AACTCTTCCA TAGTATATTT rTTTATTCCA ATTAAATTAG TCGTACGCAG TGAGGATACA ACCAAAACTA TATAACTCTC ATCAGATGAA ATCCTAACAT CCTGTAAGAT ACTATCATCT GGCAAAGTAT ATTTTTCCAC ATCAAAGACA ATTTTAAGTG AATTTGAAT GTCTAAACTG GAAGAACTAA CCTTAGGAAT CCAGTCATTA TCTTCGACAT ACCATTCCTT TATTACACCA GTATTGGGTA TACTCCAATT ATCAAATTGG TACCAATATC GCCCTCTCCT AAATATCAAA GAAT'rCCATT TTTTTAATTC CTGAAATGAT GAAGAGATAG ACCTC'TTATA GTGTGTTTTT TCCTGTATTG TATTTAAAAA TA'TTCATTA CTCTGATTCA CAAGTATGAC CCCTTAATAA TGGTATCTAA
ATATTATATT
AACACCCGCA
AACAAATCCT
TCCTGAAAAG
TCCGTTAA.AT
TAAATATTTT
GCCAGCACTT
ATTATTACTA
TGAGGAAGAA TCGTCAATT'r AT'rATCCATT A'rTGATACCA ATCCAATTGC .AATCCCGAAG CAATATCTGT TGTTATCTTT AAACCATTAT CTCCCGCAAT TCTTCAATTA CACACAAATA TCTATAAAGT TGTTCAATTA ATTTCTTTTG TTATCATCGA TATCACTATA TATATTATTA GCAACTTCAA GACCACAAAA AAACCTGGTA ATACACAAAA AACTACATCA GT'TGCCCTCT CTAAAGAAGT AAGTATTTGC TTGACAAGAT T'rCTTTATTT CTATTAATAA GTAAAAGCAG CCAGTTGCTA GATATGGTAG TAATCTATGA CC'TTGGCTGT ACTGCAATGA TrTArTTTAT AAGCAACTAA TTCTTTATCT ACAGCCAAT'r CTAGACCATT 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 .6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 TTTATAGATA CI'rTCACCAG TTAATTTATA AGCTTCACCG AAGAGCCAAG CTACCCCTGC GTGACCATAT AGTAATCCAC CAAAATTCTC ATAAGGATCG TTACTCTGAA CATCACTAGC GCCAACTTTA CAAAAAGTTT CTGGATTITTC TATATAATTT AAAGTATATT CTCTAAGCCT AATTAGTATT TCTTCTCCTA GTTTATTATC AAT'rCCCCCT TTACTAAGAA AATACAGTCC AACCAGTAAA ATTCCAGCCT GCCCACTATA TAAATTTTTA TTTTGTGAAT TCTCAAATAT CTCTATAAAA TGAGTTGTAA AAAGTTCAAC TGCCCGATCT ATCTCCCCAA ATTCATAAAT GAGCCAGATT GTACCAATTT TACCATCAAA AAGACCAGAA AGGGACGATT TCTTAAAATT AP'rTACTGCC TCATTAATAA TTTTACTTC TTAGCTAGT'r TCCTTGATTA AAATTCAGAC ACGCCACTGT CCATAAGTTA ATCTTGCTCA C1TCTCGATAG AATATTCTCA ATCTGTGGAC TATATGTTCT CTTGTATAAC ACTTACATAG 'IrGTATGTCA TATT'rGTCTA ATACCATACC ACCAGGGGCT GCAACTTTAT GTCTATTATA CTAATCTCAT ATGATA TACA TTTTCAGAAT TGAAACTCTC AATAGATAAT GGTAACCCAT CTAT'rTAGTG
CCTGTGTTCG
GTTGATAACT
CATAATAATG
AATCTCATAA
CCAAAGGATA
AACTGGGAAG
GCGTAAACCC TCTCAATAAT TTCTAATCTC ATGCATGGGT AT'N'AGAATC TAGATATGAC CCAAAGACTC AAATAGTT'TT AATCCGATGT AGTTACTAGT AATCTATCTC AC'rGGGAAGT GTACAACTTT TTCATCATTT CTTCATCCTT AACCAAGATA GAAACT'rATr CGTTAAATCG CTTTGGTCTT ATCAACAACT GAACGCCCTT CATATGTTCA TAGTCATCAA ACTTGAAATT GCTAAATCTG AAAACGCAAT AATCTTGATT GAAATTCTTr TTATAArA-A AATCTTGTAT T'N'AAAACTT TTTTCCTGGA AATAAACTTT CTACATAATC TTTCCT1TCTA TCCTGGTTTG GGCATGTATA AATAATGAGC GTTTCTCGCC ATGCTCTAAA GAAAAGACAG CC'rGTTCCCA rTTCCTAAAT GTAAATCTTG ATGAGTTTT'r CTACTATCTT TCATATAAAG GAAAATTATT ATTCCTAAGA AGGTGTGCTC CATTCATT'rA GAATTTTTAG CCATCAAATC CTAGACCTGT TCTTTTAGCC TAGCATTTA' ATAAAGGGAA AGTCTCCCTG ~GGGGCACTA TACCTTrTTC TCGCCAACAA TTAATTCATC ACACCATACC TCAGATATAT CCATTATATT GCTTTAAGGC CCAGATCTTA CCGTGCCAGT ATATTDTAGG CG'rCTCACTC TGCTTTGCAC TCCGAAGCTA ATTNCTCTGA AGAATAAGTA ATACGGTCTA GCCTCTTTTA AAATTATTTT ?r'rCCCATCT TATCCCACCA CTGTTTGAAA A'rCTAATTGC ATTATCTATA TTTTTTATCT T'rCTTGTCAA GCCATTTATT CAAAAAGTCA AATTTTAAAT ACTGGTAAAC GTTCATCTTT AACAACTTCA AATAGCAACC TTCTTTTCAT CATCCCTTGA CGGCCTA AAC TGGTGCTTCA TCCCAACGTT TATCGCTTAA AATATATGGC 6900 6960 7020 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860- 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 ACTTTCTAAC CTTTGCAAAA CCGACTCTA.A TTCATTTTGA TTTGGATAAC A'rGTAATAAA TTTACCAGAA AATCCTCGAC TAACCAATTT CCCGT'rTCGC ATGATAAATT TGTCTTCTG'r ACTAAGATGT TTAAATGGAA TTCGCATTTC ATGGCAAATT TTTGCTACAT CTTGTAACAA TTCATGTGAA CTGTTATACT CTGAACTAAT GTGTATTTTC CACCCTTGTC TTTCAACAAA TN'TCCAATA GGGTAT'rGA'r AAACCCACTC AGGCAGACTT ACTrGGTACT 'rTATGCTAGT GAAAGGATGC TCCAAATTGA AA'rTATAATC ACTT'AATATA TCTATAGCTA GACAGACTTA ATCATTATTC ATTACTTCGT ATCTGTACTA TAATCATT'AT CATAACAAAA TCTCCAAGAA GCCAAT'rAAA
TAGTGAAAAA
ATTTTATCAA
TTTAAATAAA AAGGGAGAAT CCTTTGGATT 411 CTCCCCATAT AAGCACTAAC ATTCCAACGT GAGAATCTCT AAAGTTTACA AT'rTAAATGA AGTTACCGCA ACGATTTGTA CTGAATGTAC CGTCA.ACTTG AACAGATACT AATTGAAGAT CCTATTTAAA TTTrTGGGAT TAAGTACT GCACATArTG GAACGACATC CATAACTCCA ATTAACAATT TrCCCAACTA AAAGCACTCC TAAATCGCAT TCCATCAACT TCATCTGTr'T TTAATACTTC TTCTGCCArA GCTAGCTCCT ATCCACCCTC ATTATACTCT CTCCACCAGT AAAATGCAAG CAATTATACA ATGTTGTCAC ATAGAAAATA ATGTTrCCGT AACTTTTCA.A AGTAACTTCC ATCTCTCTCC! CAAAACTGGA AG;T'AGTTTT AGAAGTTACC TAAAAATCAG
GTCACCTATT
CCATACTGCC
CATACAAACA
TCAAAGCAAC
ATTCTTCAAC
TTAAAAAAGC AGCAAACTAT CCATAAGTCA GATTTATAGC CCAAGCTAGA ATGGTTCCTG TCTGATATCT AATTTTCTGA CATACTCGCA TTGATTAAGA GTTGAAGGCC AATTAAGT'rT GCTTGATTCG ATAGACTTAT AATCAGTAGG CTAACAAAT'r TATTGACCTT ATGCGCTTGT TTGCGTTGGC AACCATAGAG AATCTGTAGT ATAGTTMACT GAGrTACCAA TAsGACATTT ACTTGTTGGA TTACCTCCGA AATAAATCTT CATAATCTAA AAACTAGTAG GTTCCACACC AAATGTAGTC GCACCATACC TAAAAACATC CCAAGTGAAA TATGATGTGC TAAGGCAAAT AA.AACACTrG CCAAATTCCA TAAAATTTC'r CGATACAGA.A ACAATAAAAA TGAAAACCAA GGAATrTGAT TGCTTCCTTG AGCATGAATC AGACTAAAAC CAACACCAAG CCATTTCATC CTAGATTTCA CATACATCCA TAAAAAAGAA ATGAGTGACG .PAC!;GATACA. AAGAAATTT.C AATAAGTATA ATATATAAAC TGGAATTATT CTTTTCATAG ATCTAATACC TGCACAATCC TTT INFORMATION FOR SEQ ID NO: 44: SEQUENCE CHARACTERISTICS: A) LENGTH: 8657 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: AAAGAAATTG TCAGAGAGTG GCTAGATGAA GTAGCAGAGC GGGCTAAGGA CTATCCAGAG TGGGTGGATG TTTTCGAGCG TTGCTACACC GATACCTTGG ACAATACGGT TGAAATCTTA GAAGATGGTT CAACTTTTGT CTTGACTGGG GATATTCCTG CCATGTGGCT TCGAGATTCG ACAGCCCAAC TCAGACCCTA CCTTCATGTA GCTAAAAGAG ATGCCCTCCT GCGTCAGACC ATTGCAGGTT TGGTCAAACG TCAGATGACC TTGGTACTCA AGGATCCCTA TGCTAACTCC TTrCAACATTG
TGGATCTGGG
CTCCTCTGGA
AAGGAAATTC
GTCCGAGATA
GCAGTGACAG
TACTTGATTC
412 AGGAGAACTG GAAAGGGCAC CACGAGACTG ACCACACAGA CCTTAACGGC AGCGCAAGTA TGAGGTGGAT TCGCTT'rGCT ATCCTTTGCA GTTGGCTTAT AAGAGACTGG CGAGACTAG'r CAGTTTGATG AGAT~n'TT CGCAGCGACT TCCATCTGTG GACGGTGGAA CAAGACCACA AGAACTCTCC TTATCGTTTT' CGGACCGTAA GGAAGACACC GTATGACTTG GTCAGCTTTT CGTCAAATAT GT'rTGCTGTA GCAGCATTAA ACCTAGCTGA TAGCCAGAGT GAAATCCAAG AAGGAATCAA AAACTACGCT TACGCTTTTG AAGTGGATGG CCTAGGAAAT AG'rCTACTAG CTGCGCCCTA TCTGGGCTAC ACTCGTCGTA CCArT'rGAG CTCTGAAAAT GGTCTCGGCA GTTCTCATAC CTTCTATCGC GGCTTGACAA CAAGAGATAA GGCAGAGAAA GATGGTGGTA CAGGTGTCAT GCACGAAAGC CGTGAA'rGGT TCTCC'rGGGC TAACATGATG xiTTcGCAAG GGGCTCGC'rT 'rAGCTCAACC AAAACGTTAA AATTTAAATT TAGAATGAGG ATTATCTCAC ATAGTCACTG GGATCGTGAG CAGTTGGTGG AATTGTT'rGA CAATCTCTTT AGTTTCCACT TGGATGGACA AACTAT'rGTC AATCGCGACA AGGTCCAACG CTACATTGAC ATCTTGCAGG ATGACTACTT GATCTCCAGT CAACAAGAAG CTGCCAA.ATG GGGTAAATCA GGAAATATGG GACAAGCGCC TCAAATTCTT GGTCGTGGTG TGAAGCCGAT 'rGGATTTGAC TCTCAGTTTT CAGAAATGTA CTGGCAGGGT TTTGCCAACT GGTACAGTAA CGGGAATGAA TTCTGGAAAC AAAAATTGTC AGATGTGCGT ATGAACGGC'r GTGACCACCA GCCTGTACAG
TATATCTGGC
AAATTCTTGC
CAATCGCCCT TTCTATCCAA TGGATCAGCT GGTTGCCTGC 'N'GGTAAATG -ATMCTTGG ACCTGACT CG'TCCGAGTG ATGACTGTTG CCAGTA'rAGTI GTAGTCTTGG GTTATGTGCA AGAAATCTTC GTTATTGCTG ATGCCAAGCG TCTTCAGGAT TACACCACCA ACAGCAAGGG CGAAAAGATT GCCAGCATCA TGGATGA'rCC AAATGTACCA 'rGTTCCGTCG ATGATGAAGT GTATCAAGCT CCATACTTCT ACCAAGGAGA ATACGCAAr.C TTTCATGTAG ATGATCCGAC CCTCTACTCT TTCTGTGAGT TGGTCTTGGA T'rACTTGGAT GATTCTTATC AGAATCACAk GTTTACATTI' TTTTACTTCA TGGAAAATGT TGTTGTACAT TGGTACTTGC CTTTTGAAAG CCATCGTATG GATCTCTTTG AAAATGACCC TGAGTTCAAG CTTGA'rGACT ACTTACAAAT TCGCCCTGAA GAGGGCAAAC TTAAAATTGG TCCCTTTTAC GAAGCCAATG 'rCCGCAATAC CTTGATTGG.' ACCCAGATTG GCTACTTTCC AGATACCT'TT CAAAAATCAG GCATTCACGT GGCGGCCTTr AACCAAGTCC TTGAAGATGA GCAGTTTACG GTGGATGGTA GTCGTGTTTT AGGTATTCTC 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 AT'rCCAGTTG
GCCTACGCTT
AAAAATCTGA
ACAAAGATGA GGCCTTGACC CGACCAACCA ATGGTTGATG GCGAAGCCAT TCGTGTGGCA 413 AATGAACTCT TCCCGGATGT AATCTI'rGTT CATAG'rTCTT T'rGATGAATA TGTTCAAGCT GTAGAAGGTG CGCTTCCTGA ACAGATGGC'r GGTACACACT TTCCAAGAAA ATACCAACCT GGACACAACC ACAAGGACCA CATGATAGTA TCTGGCTG CCCAAGGTCA ACCAAGTAGG AAAATTGCTA CGGATAAGGC CATGATAAGG TCGA'rACTGT T1TGCACCCAA CAGAAGGCTA GAGGACTTGG ATGGTCGTCC ACACTTATCA ACTGI-rACAG TGCCAACACT TCTTCATCCC
CCTAGAGCAA
GTTGACCTAT
TAGCGTGGAC
AAACTTTGT
TCAAAGTGAC
CAGCACAGTG
CAAAAAGATG
TGTAGAGGCT
TATAATTTAC CAAAACACAA GTTCCCCCAA ATTCCAGT'rC ACCTAGCGCC GCTTTCTTGG GAACACCGTG AGGGTATTTA CCAAAACGGA GTGGATGACA ACATCACAGT CTATGACAAG CGCTTTGAAG ACCGTGGGGA CATCGGAAAC GAGCCAATCT TTCCAGAGCT TAAGGGCCAC AAAATCTTGC TCAAACATGA ATTGACCGTG GAGCAACAAG GTATCATCGA G'rTTATGA-AG A.ACATTCCTC TGGAAACTGA GTTGACTGTC ACTCGCTTTA C'rAACACTGC CAAGGATCAC
GTGGTAGAAC
GCTTGGAA.AA
GAAGTTCACC
AAAAGTAACT
TATCTCTTA
ATTGA'rGTGG GCTGC'rCTT'A
ACAATCGAAG
GCTCGTATTG
ACAACCTTCC
GTGATTGATA
ACAACTCACG
GAGTATATC1'
GAGGTCTTGG
CCTGTCAGTG
CGTGAGGCTG
TTCGTTGACA
CGTATCCGTC
GCGAGTTGAC CAGTCAGGAA GCATr'rACCT AAAACAAGCC CCTTGACTAT TATCACTGGT CACTT'rTGCA GAATGCGCCA GCGAGATGGA AACGCGT'TTT TGCTCAACGA GTGGAAGGGT CTGTCATTAA CACAGGCTTG CGACTTGTGA TTCAAGGAA TCTTGCCAAG TTACCGTGTG ACCTCGGAGC TAATTTTGAG CTCGTCAAGT GCGCGTGACC AATTGCTGGA AGGAAAACAA CACCAT'rCGT A.ACGGTGAGT AAGCCTATGA AGACTTTATC AmTTCCAACC .AAAAGGAACA AAAACACAGC T'rGCTATGCT CGGATGAAAA GCTAGAAGAA GACGGTCAGA AGAATTGACA ATCCACAAAT CCGC~T-CAAG TCTTGGTCAA GACTCATAAC 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840
ACGCGTCCAA
GCTGCTTCA'r GCAATGAT'rC TGAAAGTATC TATGAGG'rCG TGACACGACC AAACAAACCA GGGAAAACCC TGAAA.ATCCT CAACACCAAC AAGCTTTTGT CAGTCTGTAT GACGATGAAA AAGGGGTGAC GATAACACCA TTGCCGTGAC TTCCCAACGC CAGAAGCACA CACCAAGCCC AAGAACGCT'r ACCAGCCTTC AGCTTGCTAG CATTCTGTTC TCAGCATACC TGTATCCAAC AAGGGATTGA ATGAATACGA AAT.CCTTGGG CATTTTGCGT GCATCAGGTG AGCTAGGTGA CTGGGGCTAC ATGCTTGCGG GAGTTTGAAG TCGAGTTTGC ACTTGAATGC CTCAGCCTAT CGTCGTGCCA AAGCCTTGCA GACACCGT'TT ACAGGAAGGA AGCGTGGTTG CGACTGGTAG CCTCTTGAGC GCAAGTTTGT CCAACAGCCT T'rAAGGTAGC TGAAAATGAA 414 GAAGGCTATG TGCTTCGT'rA CTACAATATG TGTAGTGAAA ATGTACGTGT GCCAGAAAGT CAACATCTC'r TCCTTGACCT ACTrTGAACGA CCATACCCAG TTCAN'CAGG ACTATTGGCT CCACAAGAGA TITCGTACAGA ATTCATCAAA AAAGAAGAAA 'TAATTTCA AAAAGTAAAC ATCAAAAGAA AGGAGGGGCG AAAAAGTAAG AACTAACTGC TGATTCGCCC CTIrrATGGT 3900 3960 4080 4140 4200 4260 4320 AAAAACAATG ACCATTGCAA CGATTGATAT CGGAGGGACT GACTCCTGAT GGGAAAATAC TGGATAAGAC AAGTATT'rCA TTTACTAGCG TGGCTAGATC AACGCTTGTC AGAACAGGAT CGTTCCAGGT GCAGTCAATC AAGAGACAGG TGTGATTGAT CATCCATGGC TTTTCTTGGT A'rGAGGCGCT TAGCTCTTAT GGGATTAAGT TTGCCAGTCT ACGCCTGAAA ACTTI'GGAGGA TACAGTGGGA TTGCTATGAG GGCTTCAGTG CGGTGCCCTA
AAATGATGCC AACTGCCTTG AGCCTGTGTC GTGA7I'GGA TCGAGGTCGC CACCGTCTGG AAAACTTAAT AACTGGTCGC AAAATCTGGT CATACTGAT'r TATCCTTTGT CAAGAAGCCA TATCCAGTAT CTGATCGATC AGATTTTATC CAAGGTGTCA CACGGTCGCA CCAGTTATCC TCTTGTCAAC TGGCTACAGG CAAACGCAAG CTA'rrGAGGT GTCACTCAGT -CTGACCAAGC ACCTACCGCA )'ACCTCACCA GAAGCTGATA AAGTAGAGAT GACTGTTCTC GAAATGCGGT GCTCTCATGG GCTACTCAAC CAGCCT'rACT TTGGCTATTT GACTCZAGTGA ACTACTAGCT CAGGGATTGG CGGAGCCATG GTGGAGAATT TGGCTACATG AACTAGCATC AACTGGGAAT GGGACGGTCG CAAGAT'rTAC T'rGAGCGCAT GAACCGCAAT CAGGTG'rCAT CAGTCTGGGT AGAAGGCTGT TGAAGACTTT AGCCCTGCAC CTATCACGCA AGGAAAAGCA ATGGTAAGAT TTTAAAAGGT CACATTTCTC ATCTATCTCT ATCGAGGGTG ACTTTATCGT GCCTTGTCCT CAGCTACCTG TCCATTTAGA 4380 CATCCAGAGC TTGAAAATGC 4440 ATTATCAA'rG GTAGACTTCA 4500 ACAACCCTTG CCCCTGCTGA 4560 ATGGTACGAT ACGTGA'rTGA 4620 CAAGAGGCCG CAGCTGGTAA 4680 CTGGCGCAAG GCTTGCTCAA 4740 GGCTCTATCA GTCAAAATCC 4800 GTCGATGCCT*ACGAAGAATA '4860 GATGCCAATC TCTACGGTGC 4920 TTACAGGACT TAGTCTCAAA 4980 TACCAGATGT GGAAGTGGCT 5040 AGGAAGGTCA CTATCAATTG 5100 TGTTGGTAAC AGV1'CTAGCA 5160 AAGATTTGGC TTACATGGTr 5220 AGCAGATGAT TGAGATAT'rG 5280 ACACTTACCA GATTGAAGGG 5340 AGGAGTTGCA GGAAATCGAA 5400 TCCAGACCTT GGCCCACTTG 5460 TCCGTGATGT AGAGGACATT 5520 GCATGTrTGC CACGTTGTCT 5580 TGAGGAACAA GCAGCTTACG GCTGAA'rGTG GCTTCTGCCA
GCCTATGCCC
TCGGCCTTTG
CTTCTCATTG
AACAGTTTGA
TCAAATGGGG
GCGAAGAAAA
CTTTrGAGCTT
CCGTGGAGCT
CGTGACCT
TGTCAAGGAA
GGTTTATGAC
TACATGGAAG
TATTCAGCAG
GTACCATGCA
GTGCAGGAGC
TTGATTGATG
AAACTGAAGA CTCGCAAGGT CAATATCGGG ATGGACGAAG CCCACTTGGT TGGTTTGGGA 54 5640 415 CGCTACCTGA TTCTGAACGG TGTTGTGGAT CGTAGTCTCC TCATGTGCCA ACACTTGGAG CGCGTGCTGG ATATTGCTGA CAAATATGGT TIrCCACTGCC AGATGTCGAG TGATATGTrc TCAAACTCA TGTCAGCGGA TGGCCAGTAC GACCGTGATG TGGAAATTCC AGAGGAAACr CGTGTCTACC TAGACCGTCT CAAAGACCGT GTG.ACTCTGG 'TTTACTGGGA TTrATTATCAG GATAGCGAGG AAAAATACAA CCGTAATTTC CGCAATCATC ACAAGATTAG CCATGACC 1T GCATTTGCAG GGGGAGCTTG CTAGTGCCTA TCGAGGCTAA ACCGG7rGGG GAGACAATGG ATCTGGGCAG AACTCAGCTA AATACTGGTC TAACGGTTGA CTACCAGGCA ATCTCAGCGG TGTCCGATTC TTGATCA.ACA GCTGAGACGC TTGCTAACAT CAGGCCCAGT TGAATGC'rAT
GAAGTGGATT
TAAAGCCTGC
TGGTGAAACT
TCGCAATGAC
GGATTTTATG
'rATCAATCCC GGCTT'IrACAC CTCACAACCA TTTAGCCGT CGTGCCAATC AGATTAAAGA AGTC-ATCGTA GCCCAGTTCT CTATCCTACC AAGCTTGCAA CTAGATGGTT 'rGTCTGCGCA CTTCAAGACC CAGATTGACC TTGCCA.ACCT CTTACCAGAC AACCGCTATG T'TTTTATCA GGATATTCTT CATGACACCT GAACAGGACA AACCGCACT'r CGCTCAGGCT TAAAGAAAAA GCTGGAAACT ATGCCTATCT CT-TTGAAACT TTTAAGTAGC AAAGTAGATG TGGGACGACG CATTCGTCAG
S. V GCCTACCAAG CGGATGATAA AGAAAGTTTA CAACAAATCG CCAGACAAGA ATTACCAGAA 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 ?260 7320 7380 CTTA-AAGCC. .AAATTGAAGA CTTCCATCCC CTCTTTAC-CC AAGGTCTTTG GTTTGGATAC AGTTGACATC CGTATGGGCG CCAGCAGAAA GCCGTATCGA GGTTTATCTG GCTGGTCAGC ACCAATGGCT GAAAGAAAAC GACTCTTGCA ACGCATCAAA TTGACCGCAT CGACGAGCTG AGGATTTCGC AGCAACTACA ATACGACTTA ATATTCTTCG AAAACAGTGT TTTGAGCAAC S. SC C. C
S. V S GAAGTTGAAA 'rCCTACCATT GCCAACCAGT GGCATACCAT AAAATCTCTT CAAACCACGT CTGCAGCTAG CTTCCTAGTT TGCTTGGCGC AGGGTGT'rTC TAGACAAACT TATGATAAAA TATGCTACAC TTAAAAN'AG AGCTTGGATC ATTTCCGTCA AACAAAAACG TTAGCCGAAC TCATGCTCAC GCATTTTTCA TAGCTTTGTT AGTGACGATG TACTGACTTC TACGCAGACA TGCGACAGCG TCGACGATTT CAGCTTCCAT CTGCAACCTc TGCTC'rTTGA TTr'rCAT'rGA GCGTGAAACA GAAGAATTAT TAGCAGAAAG TGAATGTTC
GTATAAAAAC
CTGGTTTCAA
CTAAGAGCAA
AAGAACACCT
ATGCTACAGT
TTGGAGGTAT
CATCTGTAGA
TAAAACAAGA AATAGAAGCT GAAAAGCCAG AA'IrTAAAAA AGGTTGCTAC CGACATATAT AGATTCCAAA TAGCAGATGT GATTTTATGG AGTT'rTGATT TTGCAAATGA TGGATAATG'r TGAGTGGAGP CATGCAGATT CTTACTTTCG TTGAAGAACG TTACACAGAA AATGTCTATC TGGATAGCCT 416 AAGTGTCAAA CAAAAAmTA AG?1'TATTTT CGACTrCGGT CCAAGTGCTG AGAGAAATCG AGACAGAGGA CGAAGAAGCT AACGTCGCCA GAACAATATC CAGATTATGA. TGGTN'TGAC AAATCAGTCT GTGTAGGCTT AGTATTTCAA TAGACTTCCT CATGATTGAT AATACCAGCA ATCAAATTCA TTCGTAATCC GATGAATGGC GTTTTGAATG TATCTCGTAC GTTCGGTTGG TATGAAGAAT GGTAAAATTG GCAAAACTAG AATccTAGTT GAAGCGTTTA CGArGATTTC AAAGATGTTC TCAACCTTGC AGCTGTTAGT GGCTTGAGTT CATGAGCCCT TGATAACCAC ACTCATTTTG AACAACT'rCA 7440 7500 7560 7620 7680 GATAGGTTGT TGAAAACATT T'rCTCTCCTT AGATAGCGCA TGCTGGATTT ACGTGAAGTT TGTCAGCCAA GA'1rTACCA TATCATGACA ATAGTTCACA TCGCTTGAGC CTTCATAGCG T'rAGGGCGAT TGATTT'rrAC GGAGTTCTTG AAATCGTAAC CGGATTAAGT TGCTTTCGTG TCTAGGCTTA ATTTAGGT AATACAGCTA ACATCTCTTT GTATCAGITA ATTGTTrACT GGTGTCAATG TTT'TTTTCAT TGCTGAGTAG GTTTCCCAGA ATCAACCCCT TCAAATTrTA GTATTTGTTC AAGTTGAGTG AATTTCTTTT TTAACCC TTAAACGTTT TTACTTTGGC TGGTTATAGG CTTrATCTrrC TGTGCTTGAG GACATATCTT GCTTGTC!CGA. TATCTGCA GTGATATCCA AAGAAACAAT TGAAATTrCr TTrTACCAGA TTCCGTCGCA TCAATCATTA
TCTCCCTTGA
ATCATTCGCT
CTTGTGACAA
AATTCrTTT ACCAC?1'TGA
AATACCAAAA
TCGTCCACCT
AAAAGTCGTG
TGCTT1CATAA
CTATCCCGAG
AAGACTCTGG
AGTCCATATT
ATAATATAGC
ACAAGAGTTA
TCAGCCGCAA
TTTGCGTGTT
CGCTGAACAC
TT'rCGCAGGG AATTATrTC
AAGATTGTTT
TTTCCTTTAC
GAATTGAATT
CCGTGTCCTC AGAACTAAGA CTrCAACCCA TTGGCTCCGA TTTCTTCATA AGTGCGGTAT TAAGTTGATA AGCTGTTTTT CAACAAGACG CTT1AAATCGT AGTCTATrGA CTCTTTGGTA CCGCCATTTG TATTTGCAAA TTAGCTTTTT TGTATTCTAA ATCTGT'rTTT TGTGGTTCTG TCGAGAGTTT TTACTCAGTT 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8657 INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH- 11384 base pairs TYPE: nucleic acid STRANDEDNEss:edouble D) TOPOLOGY: linear (Xi) SEQUENCE DESCRIPTION: SEQ, ID NO: TCTATTTTGG GTATAGACTT ACCTATAAAG AAAAATATCT ATACACTGCC TTACTAGCTA TACTGAACGA GTCAACAAAA ACGATATATA TTGATGATAT AAATACAGCA AGAT'TTTTA 417 ACTTC?1-rGG CAATGATATT CCTAATT'CG'r CTTAAAAAA AATTGACTAT ATCGCACCTT 180 CAGAAADGT TT-CATTTAGT ACGTACGTTC GACAACGTTC TGGAAcATAT ATTAAAATCA AGTTrTAT TAGAGAATAT TAAATATT AGAAGATCAA TTAACAAAAC ATAGAACAAT TGGTTGATC'r CATGTATAAA 'rACcTAACALA AACCACGCGC AAGGTACAAA TACATGAATA TCAAAGAAAA AATCAAA-AAG TGCTAGTGTT TATCTAGGCG TTGACCAACT AACGGGCAAA AGCAACCACI' AAAAACGGCG TTAAAGTAAA AGCGCGTGAT TAATGGCTAT ACAGTTAAAG ACAAGCCGAC AATTACAACA TTGGTGGGAT AGTTACAAGA ATACAGTTAA GCCAAATACT GGTTAGAGTG CATTTATTGC CTGTATTTGG CGATTACAAG TAAAGTAATT CCTAAAxr'rr AGATGTTTCT GGT'rACACTG CAAAATTAGT AAAAAC'rAAC CTTGCCTGCT GATGGAAAGA- AATGGCCAAA GAGTTNATTA AAAGCCCGTA CAACTGTTAC GCGATCAATA CTTTTGCTGC TATAATGAGC TTGTAAAAGT CGCCAATCCA TGGAGGGATT CTATCTAAAC TTACTACGCC TATTCTTCAA CAGCAAGTAA ACAAATGGGC ATTTGC'rAAC TACTCTTTGC TCCATAACAT TATCCAGGTA A'rACAATACA ACCCAGCTAA AGAAAAGGCT GCTGTCAAAT AGATGCTCTG OATCAATCAA ATTGGCCACT GGTTGCCG'rA
ACTTAGACAA
ATTATGAGAA
TTAGTGAGGC
AGAAAGCG~t GTTATCAGCA TCAATAAGAC TAAATCAAGC GCTGGTTATC GTGATATACC ACAATACAAA AACCGTCAAC AAATTCAGTC ATTCTCTGTA TTTACGGAGA AATATGCTTA GCATTTTGAT GCTGCTGGAG TAACTAACGT TACTATGATG CTCTATGCTC AGGTTAGCCC TAAT'rTAATG ATCACTGAAA ATACTTACTG CGTCTCAAAT TATGAAACAG CTATCAACAA GGCTACCCTC TTACTATACC AAAAATTAGT TGACAAGGCA AATAAAGGCG AAAAAGGGGC GAATAAGCGT AT'rTTGAAAT ATGGCGTAGC TGATGTCATC GTTCCACGCA AACAGCAAAA CAAAGAATTA AAACAGT'rTC TTGATTArTT CT'rATTTC-AT GTTGT.TCTGT. ATAAGACT.TT TCTGGCTCTT GAATGGTCTG ATATTGACC'? ACTAAACCGC TATCAGGAAA TAAACTCACC AATAGACAAA GCCACATTAC TTT'TACTGAA TTGGAAATTA GGCCGATCTG AAACAGTTGT TGCT'rGTAAC TTACGCAAAC GCCTAAATAA ATCATTTCAT GGTTTCCGCC ATACACATAC GAAAGATGTT CAGTATAGAT TAGGCCACTC GCATACTAAC CAAGAGAATG CAAAAAAAGC TTTATAAAAA. ATAAGGGTGA CCCATTTCCG AGGGGTAGTA AAAACGGTAT TAAATTATAA TTAT'rTCAAA GGCTTTATAG CCTATAATCA ATGATTTCAA TCCACGATAT TCAGCTACTT GGTTGTATTT AGCGATGCGG TCTGTACGTG 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 AAAGCACTAA GGGAAAGCGC CATAAAGAGA TTATTTTTTA CACCAAGTTG GTCTTCGATA
CCCAAAGTGC
AGGTTGTAGA
CGAAGCAATT
AAAGTGAACC AGTCTTGATT TGTCCTGCGT TAGTTGCAAC TGCAATATCA GCGATTGTTG AATCTTCAGT TTCACCTGAA CGGTGTGATA CAACAGCAGT GTAACCAGCT TCTTAGCCA
S. TTTCGATAGC TTCAAA.AGTT AGrrAGCAG.C ACCTTCTIGG CGTCACCAAC AAGTTGTACT AGTCGTTTTC ATCCATACCA CAAGGTAGTC GATTTGTTCTr TGTAGTCGTA AACTrrACGT TAAATACGTC TTTACCTGGT CACCATCTTC AGTTCCTTCG CCAAACCACC TGATTTAAGG GGGCTTCTTT AAATGTTGGC GAGCGTCAGA GTGAGAACCA TGTrGAATCC ACCAAGATAG CTACAGCGAT AGACACACCG CGTCAAGTGC GATCATAGCA TAGCT'rCAGC AATGA'rGTTG AACGAGATTT GTCACCGTCG ATGGAACCAT ACCACGTCCG GGTTACCGCG TGAGTC1'AGG ACTCTCCTTA TGAGTTAAAT AAGAAAAAAC GTTATCTTTG TGTCTGTTTT ATTCTAACTr TACTTGAAAA AGCTCATGGT GTACTTTTGA GGAAAGAGTT TTCTTACCAA GACGTTCAGT AAGAGCTTrC TCTTCAATAG TGATGAT'rGG GTATTTGTTA GCAGATGTAC GAACAGCAGC ACCTTCACCT TCTTT'ATCGT AGAATTCTGA TGAAGCACAG ACATATCCAG CAGCTTCAAT CGCAGCAAGG AAACGAGGAG CGAATCCACC TTCGTCACCT
CAACCATCCC
ACCAATTCTT
TCAAATTTAG
TCAAATCCGA
ATAGTTTCAA
ACGGCAGT
ATTTTCTTAA GAGCGTGGAA GCACCAACTG GCAAGATCAT CCGT'rGATGA TGTTCATCAT CTGTAAAGTG GGAT'rTCAAG AGGATTGCAT TCGCACCCAA
CGGTCAATAG
TTTACGTTGT
CCAAG'N'CAA
AAAGCACCTG
ACTTCGCGAG
TTTTTACACC
TGCAACTT'TT
TTATGATATA
CTTGTTGATC
CAACAGCT-rT
CTGCTTCGTG
ATTCAGTGTA
CGTAAACATC
TCTATAATAC
CC~rAACTTT C'rGTTTTCAT TCAGTAAGAG TACCGAITG GTTAACTI-rG ATAAGGATTG ATACCACGTG CAAGGTACTC AGTGTTTGT-r ACGAAGAAGT GATTTCAGCA CCGTAACGAA GAACTCTTGG AAAGCGATTG TGGAGTTGGA AGAACTAG GTAGTCAGCA GCAGCACGAG TTTACCTTTG TTAGGAGTAC ACGTACATCG TAGCCAATGA TTGTGTACCA AGACCACCGT TTCACCAGTA GAAGCTCCTG AACTTCTACT TCAAGTGTTG AGTAATAATT GACATTTTTT CTTAAAACCC C1'CCTTTTTC ATAAAGTAAT CGCTTITCTTT GACAGAT'TTA TCAAAACAAT TGAGCAAAGA CCCTATCTTG CACACCAAAT AGCCCTCACT AAAAGCAGAG CCACTATTTG CTACTTAAAA GAAGCAAAAG TACTTCTCCT TTTGGCCTGG CCTTCGACTr GCT'rTTCCAA CTTA'rCGAAG AAATGGTTTA 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2?60 2820 2880 2940 3000 3060, 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 GGGTTAAAAA TAAATCCGGA CTGGATATG TAGATAT'rGA TAGAAAAAGG CTT'rATTr ATTTTAGAAA TGAAGATTTC ACCAACTATC GAATTTGATA
AAACTGATAG
TTATTCATAG
AACTTTGGGA
TCAAGCCACC ATAGTATCTG CAATGCACCA GTTCAAGTAG AGTTAAAAAG GAAGAACCAG
ACCICAGGA
AGCAAGTTT
AAGAGCATAT
AAGAAAAAGA
CCAAAACATC
CCTAAATCTT GCACATATrTT AACTGGCATT TTTCAAACTA CTTGGATATT 'rTCAAATGGT TAACAGGAAC TCCAACAGGG TI'TGACTATC AAGTCTTCCT CAAAGTTGAT TAAGTCCAAG CACACAAACC ACCAGCAAGG GTGAGAAT'rT TTCATAAAAA GTAGACCTTT TTGATTGCCA CGACATCTGC TAAAATATGA AAACAGAAGC TAAGTTTAAT CATAAAAT'rG ATAACCTAAA 419 AATAAGTGCC CAATATTGGC TCTTCTAAAG GTTCACNr AGGGGAGGTA AATCTTCAGC GTTCI 'TTTG CAACACCTAT TC'rCCAACAA CAACCAAGTC CAGGTATCAA TTCCAGACAC CCTCCACCTG CTCCTGCTCC AGCCGTGCGC TCCAGATAGA CTCCAAAATA GAAAAGACAG AAGAC1TACCG CAAATAGCAA AGGCGCTTTC CCACCAAAGC AGCATCTGAA ACT'N'CTTAT GATACTTGCC TGAGCAAAGG TTTA-ATTTCT AATGTTGCAG GACTGCAAAC ATAGTCCGAT ATGACCACAT AAGGGACTCA TTTATAGCAA TTTI'CTGTTG CAAGACATTT CCATCCCTA'r TCCATCATTA C'TGGCCGTGC
TCTTGGATCG
AAAGTGTAAG
ATTTGAACAC
AAGGATTGAC
CCAGCAGCAA
CCTCATCTAC
TCGCACCTTG
CTTCAGGAAT
CGGAAGCAGG
TCCCCAGTCC
CACCAACACC GATATAAATA TCT'rTAA'rCC CTTTAGAGAT GAGATGAAGA A'rCAACTCTC CAATACCACA AGTT'rGGATr TGAAGTGGAT TTCGTTTC'rC TAGCGGAATT TTTCCAAGAC CAACCAAGTC AGCTACTTCA AATAGTGCCA GTTCCCCTTT CTTITTTGTCC AAAAGGTCT GTCACTTGGA TCCATTTTTC TTGAAAATAG CGCATGGCTT TTTTAGGTCA. AGAGAATGTC GCAGAGGAGA CATrCTACAT TACCTGTTGA GCTGTCAAGC 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 GGATAGCATC TACAG'TACCT CTGCTATCGA TTGTTGGAAG TTTCCTTAAA CGAATCCGGT AATCAAAGGG AGAACTTCTA GAGCACTTCT TTGGCACAAA ATTAACCCCA TCACCGATTG TTTTTCCAGA GTCTCTTTTT TAAAAGACCT TCTTTGACTT TGCTAATCTC TCCAACTATT TCTTTTGGAG AATAGAGATG CGT'rATCAAA GACCAAAATA TTTCAAAGAC CAACTCTCCT GACCTGCCTC TCTCCCTAAA TCTCCCCCAT CACCAACAGG CCTCTTTTTA TTCCTTCAGC GCAAT'rACAA TCTTCATATT TTCCCTCATT CTAAACAGTC AAAAATCCC'r CTTGTCAACA TGATGTGGTA TTTCTT'TTTT
AGGCGATTCC
CCACCGTTCT
TGACCTGGGG
CAAGCTAGTTI
GGTGTAAATC
TAACTTCGCC
TTCTTTAGAA
ACTTATAATT
GGCAGTGAAA
CACCAGACAC
GACTTCAACA TTAATAGATT AGTTTTAGTT TCTTTCTCCA TGTCCAACTA ATTTTrCCI'GT TAGGCAATAC CAAGGGA'TT CAGACCAACT AGGATGCCAT TAGATGAATT GAGTTGAAGA CACTCT'r'r'T CTTAAACTGC C'rGCGAAATT TCCGCCTCAT GATTAAGG'rT CCATCTACAT AACTCTGGGA CATTT'AGCGA GGAAGACCTT CCAACAAGGA CGCATTGCTC GACTTGTAAT AGATCAATCA CTTCTTCTAG
CCAAAACACA
CGTATCAAGT
420 CAAGCCTT'rT ACTTGAGACA 'rCAGTTCTCC CATCATACGA TTIATCTAT TAATTAACTA TCTC1'AAACA GCCTAAAAAT AACTATGGTA CAAGTCAAGG TATGACTTGC AGGCTGTATC
CTCCAAATGT
CAATAAGGAG
CTGGATGGTT
ATAGATTGTC
GCGCGAACTG TATCTCGCAA ATAGCCATCA TTI'GAGACAA CAAGCAGCGT AGATTTCCTG TCTGCGTAAA TCTGAGCAAA AAAAGAACAA TCTGAACATC AAACCTr'rTT GACGTGTCAA GGCAAGCCAA CTCT'rTCTTG TGAT'rGAAAT GATAGTCTAA CCATTCACGA TACCAGATAC CCAAACTGAC TAGTCATAAT GCATAGAGAA TACCTGCCTT GCGTAACGTT CAAAGCCAAC TGGAATTCTA AAT1YATGAAT CGGTATTTTT CCTTCAACAA AGATCAGGAA TAAAGTCAAT CCATGAGAAG TCACTCTCCA TAGCTTrTTT AGTT TCTATACAAG TCCAATGCTG TTTGGAAAG'r CCAATTTAAC AAAGCTAAAG CCAGTACCGC 'rTCC?1'CGAT TGGA?1'GAA.A GCCTCCAACT TCATGGACCA ATGGCAAGGT TCCATAACGC GCCACACGGT TCAAAACGAC TTGGCATGAG GAAGAGGTCA AGCAAGTTTG ACATCAAAAG TGATATTrGT TGATAGCTTG CCATGAGAAA GCTCCTI'CAA AGGCTGGA'rC GCCAGTTCCC TTCTTGCAAG ATATGGTGAA GACTTI'CGAC CACCACATCA ACGAGAAACA ATTCCCACCA GTGGAACGTC TGCTCTAACA CAATTTTGCC TTA'ITrTTGG CT'rTCCCAGA CAAATCTTCC AAGAGCATCC GTCTGAGGAT TATAAAGATC AGCATCAATC TTTACCAGAC TCCATTTTAA GAATCTGATC CAAATTACAT TTCATGAGCA TAGCTAGGTG AAACGGTTGA AACACGGTTC CATCCAGTTC AGACAGTTGT TCr-ATCGAAG GGTGCCATCA TCCAAACAAA TCACCCAACA TTCCTTCTGA AAA'N'GTCCT 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6~840 6900 6960 7020 7080 7140 7200
GGTTAAAACT
GAAAGGAATC
CCTTTCCATA
AAGCGTTCTC
TCAATAAAGT
CGCCAACCAA
TCTACCATAT
TTTGGAAGAG
GCTGCTACAA
CCACAACTGG
TGTCCAAGAT
CGTCATCAAA ATCACCGTAA AGAAGGTTAC ACCATTTAAT CGCTCACCTC AAAATGAAGC CATAGTAGGG TAAAATCACT CGCCAATGAC GTCTCCCAAA ATAAAATTTT CATGAATGAA ATGTTCTGCA GTTCCTCGAA AGCATATTCG ACCTGACCCC GTTTCAATGT CCTCATAGGC TTGAATCCAA ATAGCTGTAT GGTAGTCATG AACATGGAGA GCC'rCAATGG CAGCCAGTTG GAAAAAGGCA ACATGACCAC GGAAGAAATA ATATTGATTG ACTG'ITTTCT TAATTCCACA ATACTGTCTG ACATCTTCAA TCTGATTTCC AAATTTAGCC GCAACTTCGT GCCCAGCTTT TACCAGTGAT CCACCTGTTT TTGAAAAGGG TGCACCCTCT TATCCTCTGT TACTTTAGCA CCTTCTTAA TCACAACACC ATGCTCAACT TCAACCCCTT CTTCTCCAAT AACAACACGA GGGAAGAGCA CATGAATATT ACGTGATAGA ACAGAATTAG CAAACTGAGA AGTGCTTACC TTAGATGTAT GGCTATCTTT AACCAAGCTA TCCTTATGGA CTACTTGACC TTCAATAATA CTACCAGAGG TAGCATAGTA AGTTGGCTCT AAAGAGAATA GAATrTGT CAGAGTGAAT ATTIGGCrAGA CCAAATCCCG TAAAACATAG AGTGT'rCAAT CAACCAAGGT CAGCTGTTGA CTTGCTATCA CCAAGATTGC ATTTACTrT TAGGCTCTT'r TGTTGTACTA CATCGCAGTT GAGGGCAACC GTTGGTAGTA T'rCTTTTCCA AGTAATGGCT AAGAAGGGTT CTGAGCTGAT ATTATCCTGC 421 TCGTTI'Tr'GA CCTT-rGTATA AATC???TGG GATTCAAGCA TATCGATA'rT CGCTI'GATAA 'rAGCCCGTGT ACTCGTAGGC GAAAGCTCCC CGCAATTTCT CTGGATGTTC 7rrTAGCT GTATCAACGA CAAAGATA'rC TGTAGACATA AAGAGTrrAT GAGAAAGAAC GAAATATCTT 'rCTTAGCTAG TGTAGGTGGA AAACTTGCTT GTTTGGTTI'G AGCCAGAACG
ATGGTCTGTT
?TTTTTATAA
CAAATCAATG
TTTCAAATAA
GATTGTAA
TGAACGAATA
TTGGTGAGA
TAAGATTTAA
TCTTAcAc
TTTCTTCCA
'rTGAACGTT
TCATCTACAT
ACTACAGTGA
TTAATAAGAA
GTAAGAAGCT
ATTCCTACAT
TGGTCAAATA
ACTGTACTAC
GATAAGCCCC
TGGAAAATAC
GGCTTGAAAG
GACGGTGGTC
AATATTTATC
ACTTCATCTG
GCACGT'rTAA
ACTCCGCTC
TCAACGAA6AC
TTTGGTGGTG
GCAT'rrTCTG TCrTTCCAAT
GGAATGACAT
ATATTACGAA
TGGGAAGTCA ATCAAACGAT CGTCAATGTC GACATATTGT AATCTTCATC TGTTGCTACC TTCCATCAAT T'rCGACACCG TCTTAGCTCC TTGACCAATG CTrCGCGAAC TTGCGCGCCT ATCCGTCTAC AACTAATGAG AAATCA.AGTT TCTTGAGTAA GAGAAATATA CTCCATGT'rC AACCACTAAA TTCGTAAGCA
TTTCTACACG
ACTCGCGTCC
CAXAGACACT
ATTTCCCACC
GAAAACCAAC
CCCACTACTT
TCAGAAATAA
ATAGCTCCAC
GTT1GAAAGGA
TCTTCCACAT
ATCTTCCATT
GCTTCCCAAA
TAAACACTr'r ACGAACACCT GCATTAGCAA AAATGGCAAA CTTGCTACTG TGTATTr'CCT AAAATGGCAG CATTATA'rCC TACAACTTGT TCGCACCTTC ACCAAI'AATG TCATGATAAC TGAATCAAGG TAGAATGTTT AACAGTTCCA GAGCATTTGC CCCGAGGAAG GACGGT'rACG ACTATCCAAG GTGACrCAAT AGTACCAACA CACCrGACTC AAGGTAATTr 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460' 8520 8580 8640 8700 8760 8820 8880 8940 TTTTACCAAA GTCTGACATG CCAACCTTGC TCTTrTTCAGC AGCGACTAAC GGCGTTGCCA ATCAAAAATG TAGA'N'CCCA TAGAAGCT'rT TGTAGATTTA GGTTGAGCTG GT'rTrTCTTC AAATTCAACA ATACGATTGT TAGCATCTGT GTTCATGATA CCAAAACGGC TTGCTTCTTT AAGAGGGACG TCTAAAACTG CTACTGTCAA GCTGGcATTA TTATCCTTAT GAG.ACTGGAG CATATCATCA TAGTCCATTT TGTAGATGTG ATCCCCAGAC AAAATCAAGA CATACTCAGG ATTGACACTrG TCGATATAGTf CGATATNTTG GTAAATAGcG TGACTAGTCC CCTCAAACCA ACGATTTCCT TCACTTGCAG AATAAGGTTG AAGAATAGAG 422 ACACCTGAAT TAATACCGTC TAGTCCCCAG CTTGAACCAT GCAAGTGG'rT GATACTGTGT AACGACCCCA ACATTGTGAA 'rCCCAATATG GTTGTTGAGA TCCCTGAGTT GGCACAGTr'r GATAGGGCAA AGTCAATGAT ACGGTAGCGC CTTTGAGTGA GTIACCGAG ACGAGTTCCT TCATTTTCA TTTTCTACTC GCGACGTTTG AT'rTTCCATA CTCATAATCT TTCCATAGTC GCCTCCCCAC TCTTCCAACT TCCGAT'rGTA AAATCTTTCC TCCCTTTTTA CCC'NTACGAA G.ATTTCAATA CCATCATAGC CTGGTtTAGC 'rGAGAAGCGA CCATTCCAAC TGTTCT'rCAG GAGCAATTTC T'rACCAGGGT C1TTTGGTT
CACTTGCTCC
CTTCTI'GCGT
CAGTATTCCA
GCTCAACAGG
TAAAGGAAAG
TGGTATCAAT
AATACTTCAT
ATTTCCATTC
GACAAATTTG
CCACCAAATT GCACAGCTGG 'TTTTGCGATG TGCCCACCAG CAAGAATCAA AGCTAACATr TTTATTTGTG ACGGTTTTAG TAGATCAA CATAGCCGGT AGGGTAAAGG TTAAGGTCTG TTGAACAGTT TGATTATGTT CTTTCCAAAC TACTTCTTCG TAAATTCCTG CAACGGGTAG TACCATATTA AAGATACAGA CTAACATTTC AACACTCTGG TCTCGATTAT CCGCATCAAT TTCCCACAGA CAGCGATGAT CTTTGTAAA.A CTAGCATTC ATTGGGTCTr CTAGGTTAGA TAGGAATTGA CCGTATTCGC TACCCATGAA GTACGTATAG AGATTGCGCA AGCCTGCGAA CATCATACTC TflCTTGCCAT GAACCACTTC CTTGAAAACA TACATAAAGC TGAAAGTCAC ATCTTCTTCG 'rAGAAACGGA GGATATCATT TCCTAGACCA CCAATCTCT'r TCATTCCCGT AATCATCATC ACATCTGGAT ATTCTIAACTT ATAACCTTCA TAGTTGAGAT TTCCGCCATC 9 TTGATTGTAA CGATCTCCCC ACATCTTATG ATCGTGCGAG AATGGCAAGA GATAATTCTC CAGGTTAAAG TCATATr'rAC GATAGATCGG CATCCAGCCC ATGTTCCATT TGTAGTCAAA AATCTTGATC GCAGACGAAC TTTCTTCTGC AATAACCTCA TTrCAAGCGCT GAAGGAAATA 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 TTTATTAGGT GTCCATGGAG CACACGAATA CCATCCAAAT GGACTGGACT TCATTTTTTC CTTAT'rATGG TCTTGGTATr GTTGATGG'rA AAGTGACTG TCCTCGACAA AATCTTGAAA CCCATAAGCT GATACCCCCA ATATGAGTAT AGTTCATTTC CTATAAGGAC TGCCATCAGA ACAGGACGCT CTTCAAACC CATCATCATA GTCCAAATAG AGCATGTTGC TAACAGCATC GATAGACATC AATCCAATGC TTAATGCAAG AAATTAAGAA 9 CAAGGTCAAA ATTAAGGGCA CAAAAGTCGG TGTCCCATCA TACCCAGTCC ACAATAACCC CTCC'rCTGGT CGGCCATAAG ACTCAAGCCC AAAGGATGGG AACGAGATAA GGAATGAGT'r ATTTCTTTTC CATGATCCAG CCAACGTTT'r CTTCGTGCCA CCCCAACCAT GGTTATGAGC TAATAGGCTA AGGCATCATC CAATATTATG GGTATGACAC CATGCTCTAA AGCGAAGTAA ACATCAAGGG CATAAACTCA CATCCTTGAG CTGGGCAAAA CGTGAACTTC ATAAATATTG GCCAAAGTCC ATCCTTCCAT 423 TTCTTCTCAG GAAGCTCTGT TACGATTGCC CCTGTTCCTG GACGAGCCTC ATACCTGACA 10800 GCAAAAGGGT CAATCTTCAT CAGTTGATGA CCATTTTGAC GTGTGACATG ATATNTGTAA 10860 CTTCCT CTGAGCCAT ATTGGTAAAG ACTTCCCAGA CCCCAAAATC ATTTCTTACC 10920 ATTGGAATCT GATTTTCAAT CCAGT'rGGTA AAATCACCAA CCAAGTGAAC AGCCTGAGCA 10980 TTAGGTGCCC AAACACGGAA GGTATAGCCA TGCTCTCCAT TTAGTTCTTC CCTATGTGCT 11040 CCTAGA'rAAT 
GrrGGAGATA AAAATTTTCA CCCGTCATAA AGGTTTTTAA TGCTTCTCTA 11100 TTATCCATAT ACTCCCCTTC TCCTGTAAGC GTTTTCTATG ?rT'1ATTAT ACTACCTr'rT 11160 TAGAGAAGAT TCAAGTAAAT TACTATAC?1' C'TTA-ATTAT TTTGAAAATC TACAACAAGT 11220 TCACTTACTC GTTCAA'PTGT AAATCAATAT TT1TTTCAAAA AATTGCGAAA ACGCCTTTCT 11280 TTTTCTACTA TAGTGAA.ATG AAATAAAACA TGCGCAAATC GATTAAGGAA ?T'TAATCTAA 11340 TTTCTAACAA TGTCTI'AGAA ATCAAAGTGT ACTATTTTAA CTCC 11384 INFORMATION FOR SEQ ID NO: 46: SEQUENCE CHARACTERISTICS: LENGTH: 7577 base pairs TYPE: nucleic acid STRANDEDNESS: double C TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: *TGTTGATTTG TTACTAGACG TTGACCAACG TCCTTCGGCT GGAAAAGGAA TTCTCCTTAG TTTCCAACAC GTTTTCGCCA TGTTTGGTGC GACCATCTTG GTACCATTGA TTTTGGGAAT 120 GCCTGTATCT GTTGCCCTTT TTGCTTCAGG TGTTGGAACA CTCATCTACA TGATTGCTAC 180 CTGGTT TTAAA GTTCCAGTTT ATCTAGGTTC TTCATrTGCC T'rTATCACAG CTATGTCACT 240* GGCTATGAAA GAAATGGGGG GGGATGTATC TGCTGCCCAA ACAGGGGTTA TCTTGACTGG 300 TTTGGTCTAT GTCCTTGTTG CTACCAGCAT CCGATTTGTA GGA.ACAAAAT GGATTGATAA 360 ACTCTTGCCA CCAATCATTA TCGGTCCTAT GATCATCGTT ATCGGTCTTG GACTTGCAGG 420 TTCAGCTGTT ACCAATGCAG GTCTTGTAGC AGACGGAAAT TGGAAAAATG CTCTGGTAGC 480 .CGTTGTTACT TTCCTAATTG CTGCCTTTAT CAATACAAAA GGAAAAGGCT TCCTACGAAT 540 CATTCCATTC CTCTTTGCCA TTATCGGTGG TTACCTTTTC GCACTAACTC TTGGCTTGGT 600 TGACTTTACA CCAGTTCTI'A AAGCCAACTG GTTCGAAATT CCTGGTTTCT ACTTGCCAT1T 660 TAGCAC.AGGT GGTGCCTTTA AAGAGTACAPA TCTTTACTTT GGTCCAGAAG CCATCGCTAT 720
CTTGCCAATC
AATCTGTGGT
GCTATCGTAA CAATTTCTGA CGTCAAT'rCT TAAAAGAACC TATCGCAACT TCTGTTTC 'G 'rACAGGGGTT
CATCGCGATT
CGCTGTACTT
AGTCTTGAT
TATGTTGGTT
TACTGCCCTT
AGACTAAGAG
ATAATAAAAG
AAAACCTTGA
GATTTGCATC
CATACGTTTC
AAACATCTGG
ATCGGTATGA
GCCCTCAGCT
GGTGGTATGT
AAAGAACGTG
CTTGGACTTG
TCAGCCATGA
TCTAAATACA
TGTCTTAACA
TACTGTACGT
AATCTCTTGT
ATGGACTGTT
CTCATATCTG
CCI'CCTTGG
CTCGTATCGC
TCCTTGGTAA
CAATCCTTCT
TTGATTTCGC
ACATATCGGA
AGGTCTTCAC
TGGACCAGCC
TTCTGTCTCA
ATrCACTGCC
CTATGGGGTT
TCAAATGCGA
GACCATACTG rrTTGGGTCA CGTACTCTT-C 'N'CGTGACGG AATACAACTr ACGGAGAAAA GTIACCGTA ACGCTGCCTT -7N1GAT=CAA CTATTCCAAA ATCGCCAGCA ATGGTTTGAA.
AACCTCATCA TCGCAAGTGC GAGGAGCTAT CCTTAAACTT CACGAATCAT CTTGAACTTG CCTAA'rCCAC TCAGACAGCT AAATTATTAA AATCAAAAAA TTTATCATAG AAATTTTTAC CTTACTTGCG TTTCTTCr'rC TCATIGGCAAA. TTCACCAATT GCATTCCTGC TCCTCCGAGA CAAGGGCAGA CATATCCATT TAATCCCCAT TTGCTTCATC GT'rTAGCCTG GTTAAAGTCC
GGTCCAGTTA
ATCTTGCCAT
GAGTGGATTT
CGTATAATAT
TTATTTCT
CACTT1'CAGG
ACGAAAATAA
TTCGTATACC
CAGATATCT
CATCAAATGA
GCr'TT=CA TTT'rGTTAGC TTACC'rTTCA AACCGCCACC GCTGATAAGT CAGGCATACC CCTCCCATAT TTGGCATATT ATTTTATTCA TATCCCCAGA TTGATGAA'rT TATTGACTTC 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 GCCTTGTCCC ATCATT-CCTT T'rTAGGAAGG TTATTTGGAT CATAACACCC 'rGCATGAGCT GACGAATGTA TTTCCAGAAC TGGGTTTTCA CGCTCTTCAG GCGTTCATCC ACCTTCATGT
GAGCAAGTCT
AAAATCAAAG
TTCC1'GAGA.A CATGCGGTCTr -dATTGGTTTT
AATCTTGGTA
GGCTGCTTCC
TGCTTTCACA
TCCATCGGCC
GTGTTTTCGC
GeTr'rCTCAA
GGGTGGAAGG
CCAGTAATGT
AGGATGACCC
TGACCAATCA
TCACGAAGCT
CAGCAGCAAT ACGACGGCGA CGGCTTGGAT TTAACAAATC GTGTCA'rCGA AGACACAA'G GCACGTTTAC GAGCAATCTG TTTGAAGGGC TGGATTGTTG GCCATACCTG GAATCATCTT CCATATTTTG CACCTGATCT A.ATTGATCGA TGAAATCATT GCATCTTCTC AGCCATTTCA AGGGCTTTTT GTTCATCGrA TCAAAGTGAG CATATCCCCC ATACCAAGGA TACGGCTAGA TTrTCAATGTC CGTAATCTTT TCACCTG'rAC CAGTGAACTT GACGAACAGA CAGAGCAGCA CCACCACGAG TATCGCCATC CAGTCACTTC CAACTGAGCA TTAAACTCAC GCGCAACATT TAGCATCAAC GACAAGCAAG ATTCATTTG GTTGAGCCAA CATTCATGAG GAGCTCATCA ATCTGCAAAC GACCCGCAGT ATCAATCAAG ACATAGTCGT TATGAT'rAGT TTGGGCTTGC TCCAAACCTT GACGTACAAT CTCAACAGCT GGTACTTCTG GGTCTrAAC TGGTCAATGG AGCATTTCT TCTTTCTGA CCCTTGTAAA CCA.ACCATCA
TTCCAAGTGC
CAGCTGGACG
GTrGTTGC
TGATGATGGT
CCTATCAGAA
AGGATTAAGT
GTCCTTTACA
TGCCTCTTGG
CTGCAAACG'r CCTAAAACGG CTGTC.AATTC GTATCAATGA CCTC-ATGCCC ACAGGCAAGG CAACGTCGGC ACATCAGATT CAGAGAT"TTT TCTGTTAAAC TTTCAAATGC AAAGACAGGC ACATCAATCT GTTGTCCCAA A'rAAATATCC CCCGCAATCA TCAAAGGACG CAATTTACCA GCAAAGGTTG TTT-rACCAGC TGGAATCTTA GGTGACTrrGA TrAA'ITr1CCC C'rCATCAACG ATTTTAATAA TCTGI-rGCGC GACTGCACGC TCACGAACTT TCTTGATAAA CTCGAGCAAG GCCAAGCGAA ITTTCTTTGGT TCCTTTA CGTAGATTrTT TAAAGACGTT CATTTTTCTT CCTCTflATTC TCTATTATCA AGAAACTCAT CCTTGGGATA GCGCTCCAAA TAGTCCCAGTI ACATGTGCAA TTTCAkTCTCA ATGCTTGTTA AAATTTCTAT CTGCTCCTGC ATCTGATCAA AAATCTGACT GCGGACAATA TAATCT'rCCA
CCGAACTCCT
GAATCTTTC TGTTCGCTTG ATATTGTCAT AGAr-AGCCTG CCGCAATTTC AGCAAGGCTG TAATCATCAG CGTAGTAGAG TTC-ATTTGCT TATCTGTCA-A AAGCGCCGCA TAAAATTCAA AGAGCGCATT GT'TTTTTCGA TrTTCCATAAC TTTTATTATA CCAAAAATTA GCCTAATC?A AAGCCGATCC AAGAAGATAG ATAGCTIAAAT TTGAAAAAGA CATGACCT'A ATTTCCAA'rT GATAGCTGGC AAAGGATGT CCCTCTTGAT TTTGTAGTTG TCAATCTTTT CCCTATCAAC 'rTGATAATGG CTCGTTTGGA TGATAAACTC ATAGGTGTAG GAATATAGGC TAAACTrATCG CTATCCTTTA GAAACCGCAT
ACGACTGACA
CTCGATATAA
CATACGATTG
CCACACTAGG
GCCCCAAGTA
ATAATCTAGT
CTGCATGCCC
AATGGTCT'rG 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 GGATTAGAAA ATCGGCTCAT TTTTCCTCAT TATAGAAAAG TAAAGC'rGGT CAATCACTTC T'rCACATCAG GCCCTCTTTC TTGCTATAAT GGTCATATCA TGTCAATAA'r CAAATCATCT CATCAAACAA C3'GGGAACTT TCAC'rGTCTA GGAGTCTATG TCCTGAGGAA TGGAATCCTG CCTTGGCCAT GGTGCCTACT CACAAGTTCT TGACCATGAA ATTTA-ATCAC TACTTT'rTCC CAGGTAGCTA TAATCTCCTT TTTrCATGCAC TTCCACATCA CAACTGCTCA TCAAACTGAA TCGTATTTCG CATCCGAATC TTGTCTCTTG TCCTACTATT TTACCAAAAA GAGCAGGAT'r ACGAAAAAGT ATTCCGTGAC CCTGTTCACA ACTACATCCA A'rGACT'rGAT TAATACAAAA GAA'TTCAGC GTTTGCGCCG CCAGT'rATAC CTTCCACGGT GGAGAACACA GTCCTCTC AAATTGCACG ACGCATCACA GAGATTTTCG AAGAAAAATA CCGAGTCTCT CTTGACCATG ACCGCTGCTC TCCTACACGA CCCATACTTT TGAACA'rCTC TTT~GATACAG ACCATGAAGC 426 CATTACTCAG GAGATTATTC AAALATCCTGA GACAGAGATT CACCAAGTCC TGCTACAAGT 4320 GGCACCTGAT TTCCCAGAA.A AGGTGGCCAG TGTCA'rTGAC CATACCTATC CTAATAAGCA 4380 GGTCGTGCAG CTCATTTCTA GTCAGATflGA CGCAGATCGC ATGGACTATC TCTTGCGCGA 4440 CTCCTATT ACAGGAGCAT CCTATGGGGA A~I'GACCTG ACTCGAATCC TCCGAGTCAT 4500 TCG'rCCTATC GAAAATGG'rA TCGCCTTTCA GCGCAATGGC ATGC-ACGCCA TCGAAGACTA 4560 CGTCCTCAGT CGCTACCAGA TGTACATGCA GGTTTA~TTTC CACCCCGCAA CACGCGCCAT 4620 GGAAGTTCTC CTACAGAATC TTCTCAAACG CGCCAAGGAA CTCTATCCTG AGGACAAGGA 4680 TTTCTTTGCC CGAACT'rCTC CACACCTCCT GCCTTTCTTC GAAAAAAATG TGACCTrGAC 4740 TGACTATCTG GCTCTGGATG ATGGCGTGAT GAATACCTAC TTCCAGCTTT GGATGACCAG 4800 TCCTGACAAG ATTCTTGCAG ATTTATCGCA TCGCTTTGTC AACCGCAAGG TCrTTTAAATC 4860 CATTACCTTT TCACAAGAGG ACCAAGATCA ACTTAC.TAGC ATGAGAAAAT TGGTTGAGGA 4920 TATCGGCTTT GATCCCGACT ACTACAC'rGC CATTCATAAG AACTTTGACC TCCCTTA'rGA 4980 .TATCTATCGT CCCGAATCTG AAAACCCACG GACACAGATT GAGATTTTAC AAAAAAATGG 5040 AGAACTGGCC GAACTCTCTA GCCTGTCTCC TATCGTCCAA TCCCTTGCTG GCAGTCGCCA 5100 ***CGGAGATAA'r CGCTTTATT TTCCAAAAGA AATGTTGGAC CAAAACAGCA TC'rTTGCAAG 5160 CATTACCCAG CAATTTTTAC ACTTGATTGA GAACGATCAT TTTACCCCAA ATAAAAACTA 5220 *GAAGAGGAAA 
TTTATGAGTA TTAAACTAAT TGCCGTTGAT ATCGACGGAA CCCTTGTCAA 5280 CAGCCAAAAG GAAATCACTC CTGAAGT'r'N TTCTGCCATC CAAGATGCCA AAGAAGCTGG 5340 *TGTCAA.AGTC GTGATTGCAA CTGGCCCCCC TATCGCAG GC GTTGCCAAAC TTCTAGACGA 5400 CTTGCAGT'rG AGAGACGAGG GGGACTATGT GGTAACCTflC AACGGTGCCC TTGTCCAAGA 5460 AACTGCTACA GGACATGAGA TTATCAGCGA ATCCTTGACT TATGAGGA?1' ATCTAGATAT 5520 *GGAATTCCTC AGTCGCAAGC TCGGTGTCCA CATGCATGCC ATTACCAAGG ACGGTATCTA 5580 aTACTGCAAAT CGCAATATCG GAAAATACAC TGTACACGAA TCAACCCTCG TCAGCATGCC 5640 TATCTTCTAC CGTACCCCTG AAGAAATG.GC TGGCAAAGAA ATTGTTAAAT GTATGTTTAT 5700 CGATGAACCA GAAATI'CTCG ATGCTGCGAT TGAAAAAATT CCAGCAGAAT TTTACGAGCG 5760 CTACTCCATC AACAAATC'rG CTCCTTTCTA CCTCGAACTC CTTAAAAAGA ATGTAGACAA 5820 GGGTrCAGCC ATTACTCACT TGGCTGAAAA ACTCGGATTG ACCAAAGATG AAACCATGGC 5880 AATCGGTGAT GAAGAAAATG ACCGTGCCAT GCTGGAAGTC GTTGGAAACC CCGT'rGTCAT 5940 GGAAAATGGA AATCCAGAAA TCAAAAAAAT CGCCAAATAC ATCACCAAAA CAAATGACGA 6000 ATCCGGCGTT GCCCATGCCA TCCGAACATG GGTACTGTAA AAGTATCATT TTTCAATAAG 6060 AATTGAIrrAG CAATAAAA'rC CATAATT TTr'rAGCAAA CTA'rN'AATT TAAAACAAAA TAA'rCATAAT AGAGACACAA ATrCTGAT'rG TAACAA'N'TT TACCTAAACG AATTAGAATG TGGCCTrACT CCTGGGCAAC TCATACTCA'r AGATTGGACT CAAAAAACAG GGAGAAATrA TAATTTCCCA AGATATTTflA AATACTCTCT TCAAATTGAC CCTGAATCTA CACACAA'rCA ATrATACAAA, TrAGGATACT TCACTAAAAA TAAGACI'TA TC-ATATCTrA CAGTAGTAGA ATrAAAAACT ATAT'rATCTA AACATAA'1rT AGCTACTrCTr GGAAAAAAAG CAGAATTAAT TACAAGAATA A'rTAATAATG TrAACATrGA CAATTTAGAT AACAAAAGAA GCACAAAATC TTATTATCGA ACATAGTGAC TAAAGACATA ACTATGGAAG A'rTATTGTAA AGAAAAAAAC TTrTGGTGAT ATAAAATGGA GTCTCTTAAA TAA.ACAAGCT AGATTTTGGA TGCTTATCTA ACACACGAAA GGCTCAGGGA TAATATTAAA CATGCTTTAA TATATTACAT AGAATCTTTG AGAAAACAA'r TTTTCAGCCA. 
CTGATrATCC AGTATATTAT CTCACTAAAA CATATTCAAA CATTAATGGA ATCATTATCT ATTCCGTTCG AATTTAAACT TArATCAAAG CATACTATGA AATATCTCTT TTAAAGCAAC CATAGGAATA CTGTATCAGG AGACATTTGG AACAAGAAGG ATAATTACTA TTTCAGGATT CCCGATTCGA TACCTGACTA GATGACCATT ATGATTTTGC AATCATTTTT- TATCTA-AGGA GCTGAAGAAA TAAACAATTA GAACTTGACG ATTTTGAATA ATTTTATCTG ATGATTGCCC TCCCCTATAT TATGCGTTCT AATTACATGC CCCCTGCAGG 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7577 TTTTGATGAA GCATTATTTC GCTTCTCAAT AGATATTGAC TATTTAAGAG CTTAAAGAAA TATGAATGT'r AATTGACTAT ACAAACATT AAAT'N'TTCA ATAATAAAAC TTTAATATCA AAGACTTTTT AATCGAACCT GCAACTACTC AGATAAAAAC TCTGCTAAAT ATACGAGCTG CTTTACCTTG CCGTAACGAA CAACTTCGAT ACACCTACAC CGTTAGAGAT
TTAATTTACC
ATAGTCCTTT
ATATACTCGA
GCATAATATT
GACAAACTTC
CTTAGGAGGG
GAGCAGAGTT
AAGAGCACGC
TTTTTCAACA
TTTACGAACT
TTTGAATGCA
TCGTTCCACT
AAATAATTTA
TATAGTCTCA
ATGGAGACAA
TTTGATATCT
AGTTGTTATA TCCATTrGAAC T'TTTAGTCGA ATTAACGACG AAGTAGTACA ATTTCGCACG CGTGGAGTGT GGAT'rGGGAA GTGTAGTTTT CTGAGATTCC
TAAGGGAGCT
GATTTCTTTG
ACGTACTTTA
GATACGCTCA
AGCACCTTTA
CGTGCGA'rAA CAACACG INFORMATION FOR SEQ ID NO: 47: SEQUENCE CHARACTERISTICS: LENGTH: 4945 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: CCTCGCTGAT GATTGGTGCT GTTrATTTG CTGGTCCAGC CTTGGCTGAA GAAACTGCAG TTCCTGAAAA TAGCGGAnCT ATGAAGCTGA TAAGCAGAAT AAGGAGTAGC GATAGCATCT AAACTGCAGA AGCAGC'rAGC AAACACCATC TGCAGAAGCA CAACTAACCA AGGGGATGAG TCCAGCCAGA TGTCCCTAAA ATTCTTGGGA AGAATTGTTA GCGGATCTGT TGTCCTCGCT AGGAAGCAAA AGTTCAAGCC AATACAGAGC TTGTTTCAGG AGAGAGTGAG CATTCGACCA GAAGGGGAAC ATGCTAGAGA AAACAAGCTA GAAAAGGCAG GAAAC'TGCTT CGCCAGCAAG CAATGAAGCT GCAACTACTG GCAGCTAAAC CAGAGGAAAA AGCAAGTGAG GTGGTTGCAG AAACCTAAGT CTGACAAGGA AACAGAAGCA AAGCCCGAAG TCTAAACCAG CAGCAGAAGC TAATAAGACT GAAAAAGAAG AATACAGAAA AAACATTAAA ACCAAAGGAA ATCAAATTTA AAATGGGAAC CAGGTGCTCG TGAAGATGAT GCTATTAACC TCACGTCGGA CAGGTCATTT AGTCAATGAA AAAGCTAGCA TTArCAAACA CCAATTCTAA AGCAAA.AGAC CATGCTTCTG TTGGTGGAGA AGAG~rCAAG GCCTATGCTT TCTTCTGGGA AGGTCTCGTA CCAACTCCTG TTCCTGTATA CGGTACACTC TTCT'rCAACT TTGCTGAAGC TTTGAAGCAA GACGCAGATG ACATGGCCAA GTATTATGGC TATGATGGCT TGGTTAAACC TCTTGGAGAA AAGATGCGCC CTAAGGTAAA CCATCCAATC AAGTATTCTT TTGACTATTG GCAATATCTA GATTCAATGG
ACGTTATTGA
GGTCTAATAG
GTAGCTTCCC
ATTTCATCAA
AGTTTATGCT
GGTACGATGC
TGCAGGTCAC CGTAACGGGG TATTGCAGAT CAAGAAAGAT AATTGCCCGT AAATTGGTAG CCAAGAAACA ACTGGAGATT CTATAGCAAG GAATATGCTG CATGACCTAT AACTATGGAC CATGCAACCA GAAGGAGATA TAAGGCTAAA AATGATTACA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 GTTATCATCA AGATGGTTTG AGGTTCCGGC AGATAACTTC CTATTGCAAC TGCCAACTGG TGCAACAGGG TGGTTCCTAC dGAAATTGCG CCTTTCTCTT CTGGTGAAGA TTATCATAAA CTGGCCAAAA ACCAGGTGAC CGCCAGCGGT AGGTAATACT TCGTAGATGG TAAGGTTTCT GGAGAATACA ACTACCAATT TTTGCTAACT TTAACTZGGGA ATTGGTCGTA ATCCTTATGA TGTATTTGCA GGTTTGGAAT AAGACAAAGG TTAAGTGGAA 'rGACATTTTA GACGAAAATG GGTTTA'PTTG CCCCAGATAC CATrrACAAGT TTAGGAAAAA AATGAAGATA TCTTCTTTAC AGGTTATCAA GGAGACCCTA AAAGATTGGT ATGGTATTGC TAACCTAGTT GCGGACCGTA TTTACTACTT CPTTAATAC AGGTCATGGT AAAAAATGGT AAGGATTCTG AGTGGAATTA TCGTTCAGTA TCAGGTGTTC TTCCAACATG GCGC'rGGTr.G CAGACTTCAA CAGGGGAAAA ACTTCGTCCA GAA'rATGATT TTACAGATGC CTATAATGGC GGAAA'rTCCC TTAAATTC TGGTG.ATGTA GCCGGTAAGA CAGATCAGGA TGTGAGACTT TATT-CTACTA AGTTAGAAGT AACTGAGAAG ACCAAACTTC GTGTTGCCCA CAAGGGAGCA AAAGGTTCTA AAGTTTATAT GCATTCI'CT ACAACTCCAG ACTACAAATT CGATGATGCA GATGCATGOA AAGAGCTAAC CCTr"TCTGAC AACTGGACAA ATGAAGAATT TGATCTTAGC TCACTAGCGG GTAAAACCAT TCGAGCATGA AGGTGCTGTA AAAGATTATC AGTTTAACCT ACAATCACCA AGAGCCACAA TCGCCGACAA GCT'rTrCTGT CTATGCAGTC AAACTATT AGGACAATTA ACTATCTCCG AGTGAAACAA TCTCTTAA.AA ATGCCCAAGA AGCGGAAGCA GTTGTGCAAT AAGTTTATGA AAAAGATGGA GACAGCTGGA TTTATCTACC AAAAGTTAGC TTGTrAGCAG'r CGGTAAAAAT GTATGACTGT AAAAGATACC CAACAGTTAT TGATAG'rACT TGAACGGTAC CATTrACTAGC TGGATA'rrCG TT'rGACCAAG CTGGTGGTGA GTCTGTTAAC AAGATGCAGA TGGTGAGTGG CAGATATCAC TCTTGATAAA CTGACAATGG AACTCCATGG TTGATAC1'GA GAGTGTCAAT ACAAGGTACA AGTTGGCTT'r ATCCAAATTC TCAAACTCCG; GTGCACCATT GGATTTGACA CCAAGGAAAT TAGTAA'rGTC CAGTCAGCCT AGAAACAGGA TTAGAGGTGG TG'rTCTTCGA TAACTCACGC AGGTGTATCA TTACTCTCCA ATATTTGGGA
CGCTCAGCAA
GGAGTTCGTT
AGCCTACCAA
TTCCCTAAGA
TTGTCAGATA
CCACGTACCG
GATGGC?1'GA AAGCTAGC'rA TTAAAGGCAA CAAGGATGCA GATTTCTATG AATTACI'AAC TCGCTCATCT TCTACAACTA GTGCTCAGGG TACAACTCAA GAACrGAAGG CAGAAGCTGC AACCACAACC TTTGArTGGG AACCACTAGC TGAAAATATC GTTCC-AGGTG CTGAAGG'rGG AGAAGGTATT GAAGGTATGT AATGGTCTTC AGCTCAGTTG AGTGGTAGTG 7'rGT'TAGATG GGTCATGGAT CATGCACGAG TGAACACTAA AGACT'rTGAC CTr'rATrATA AGGAAGTCCC TGGTAACAAA GCACACGTGA CTCAAGACTG GCGCTrCAAT GTTGTCACTT GTATCTATAA CTGGAAAATG TATGAAAAC CCAAGGCTGC AGCCCGTrCr CTAGGCAATA CGGC'rGGAGC AACTATTACC GTTTATGATA TGAAGAGCGA AGTTGGAGGA GACCTAGCAA GTC1'TCTTTA T'rATCGTACC CAGTTGCCAG CCGTTCCAAA AGATGACAGA AGAATCAAGT CAAGCTACGC CGAAGGGGAG CATT'rGGACC 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880' 2940 3000 3060 3120 3180 3240 3300 3360
CCAATCACTG
AAGGCTAT1'C
ATTCCGATG
GCAGATGTAC
CTCGCAACCT
AATCAATCTG
CTAGCAGTTT
CCTAAGAAAA
GTTCAGTATG
GTATCAGGTT
AAGGAGGAAC TGAGGACGAA TTGATACGCA TCATAAGGGA
CTCATTCGCC
GAACAGAATC
GTCACTGGCC CAACCGGTAA ATGCTAAT'N' GTCAGTGACT 430 AAGACGAAGC AAGTCCGAAA ACTAT1'TTGG GAATTGAAGT AAGTCAGGAA CCGAAAAAAG ATTACCTAGT TGGTGATAGC TTAGACTrGT CTGAAGGACG CTIrGCAGTG GCT'rATAGCA ATGACACCAT GGAAGAACAT TCCTTTACTG ATGAGGGAGT TGAAATTTCT GG?1'ACGATG CTCAAAAGAC TGGTCGTCAA ACCTTGACGC TTCAT'rACCA AGGCCATGAA GTTAGCTTTG ATGT TTGc? ATCTCCAAAA GCAGCATTGA ACCATGAGTA CCTCAAACAA AAATTAGCAG AAGTTGAAGC TGCTAAGAAC AAGGTGGTCT ATAACTTTGC TTCATCAGAA GTAAAAGAAG CCT'rCTT'GAA AGCAATTGAA GCCGCCGAAC AAGTGTTGA.A AGACCATGAA ACTAGCACCC AAGATCAAGT CAATGACCGA CTTAATAAAT TGACAGAAGC TCATAAAGCT CTGAATGGTC AAGAGAAAT'r TACGGAAGAA AAGACAGAGC TTGATCGCTT AACAGGTGAG GTTCAAGAAC TCTTGGCTGC CAAACCAAAC CATCCTTCAG GTTCTGCCCT AGCTCCGCTT CTTGAGAAAA ACAAGGCCTT GGTTGAAAAA GTAGATTTGA GTCCAGAAGA GCT'rACAACA GCGAAACAGA
S S GTCTAAAAGA TCTGGTTGCT AAACAGGTGT TGAAGTACAC TAGAGCGTGT TCAAGCAAGT TCTrTGAAAT AGAAGGTTTG TTGTGAAAAT CCCAATTGAA GCAAAGAGGC AGTAGAATTG CTCACTT'rAC TCATTATGCC CAGCACCACA AAACACAGTC AGGCTCCTAA AT'rGGAAGTT ATACTGAGAT GCTAGT'rGGG TNATTGAAAG AAGACAAGCC TTCI'CAAATA AAGAGAAGAC GCTGAAGAGA AGAAATACTT GATGAAAAAG GTCAAGATGT AAAGATAAGA AAGTTAAGAA GCTTTr'GAAC 'ITT'GT'rTATG
CTCCAAAAC
CAAGAGGAAA
GAACAACGAG
AAACGGATAG
AATCTGCTGA
CTACTTATCA
AGGTTGCCTT
TCATCATACA
AGCGTCGTCT
GAACAAAAGT
CAGCAGTTAA
AGCAGTCTTT TCTGATAGTA TGTCATCAAG GGTTTGAAAG TGCTGGAGAA GATGCTCATG TGATCTCTC'r TATGCTTCTA AGTATTT'rrC TTACCTGAAG TCATGTTATC TT'rACAGCAC AAAACCACAA CCTGCTAAAC ACCGACTTCT GATCAACAAA TCATCGTCAA GAGCATGAAA GGGACGAGAT GGAC'rGTTAA TCGTTCAACA GAAGTCATCC AAAAACAGTA CCAGCAGTAG ATCAGAAGAA GCAAGCAAAC 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4945 GACATGTCTT TGAAGTTGAT GAAAACGGTC
AAGAAGCGAT
TAGCTACACA
TCCAGAAATT GTTGAAATTG GGAAAAACCA GCTCAAAATA AA'ITGCCAAA TACAGGAACA GCTGATGCTA TTGGTCTTGC TAGTTTAGCC TTGACCTTGA CGAAAAATCT TGTGAAATCT TTCCG ATGAAGCCCT AATAGCAGGC TTAGCCAGCC GACGGAAAAG AGAAGATAAA GAITrAAA'rAT
INFORMATION FOR SEQ ID NO: 48:
SEQUENCE CHARACTERISTICS:
LENGTH: 25002 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GACAACTCAA GTAGCTTTTT AAAATCACAA TGGGGGTCGA CTTGGTTCCG TTGGAAGTT CTAATCTTTC TCA'TCCAC
CTTGGCCGTA
AAGTACAACT
AGTGTTATCG
CTTGGTGGAA
GCCCTAGGAG
CAAAAAGGGA
TTTATCATCG
AAACCAGACT
TTrTGCCCTCT
ACCAATCTAG
GCAGGTCTAG
AGCCTGCTCT
TACGTCCTCT
GAAATCAATG
ATTTTAGGAA
GCGGATGTTC
GTGCTGGCGT
TTrATCGGT'rG
GAGGATGGAT
CGGGTGATTA
CTCAAGCGGC
TTGAAAGAGC
GTCGCTCTCT
TTTCAAAACT
CACTAGGGGT
TCCAGTCAGG
CCA'T'ITCCA
T'rATCGTCTT
TCCTCTTGCT
TAGACAATAT
TTTGACCTT
ACATTTTTGG
CTTATTTTGA AAAAGGAGAT CAGAGTTAA CTA'rGTCAGA AACTTGGTTT TATTCTAGCA TCTGCTGGCT GGCCATCGGG TCCCTACATG ACTGCTGCTA ATGGCGGTGG AGGCNPTrA TATTTTAATC GGTTTCCCTC TCCTGCTCGCC TGAGTTTGCC TTCCGCTATC AAAACC7TTG GAAAACTGGG CAAGAATAAC GATTGGCGCC TTGCCCTCT TTATCCTCTT ATCrTI1-AC TCTAGTCTAT CTAGGTATTG AGTTTGGGAA ATTGTTCCAA TGCTCAGTTA TTTACTTCAA TCA?1TCAAA TCCAGCCA~T CTTTATCCTA TTGAATATCT TCATTGTATC ACGTGGGGTT TTCGAAAGTC ATGATGCCCC TGCTCTTTAT CGTCTTTGT CAGTTTGCCA AATGCCATGG AAGGGGTT-CT TTACTTCCTC GACTAGCACT GGTCTCCTCT ATGCTCTCG ACAATCTTTC TACAGTCATG TTGACCTATG CTTCTTACTT AGACAAGAAA AATCTCCATC GTAGCCATGA ATATCTCGAT ATCCATCATG AGCTCGATCC CCCTTCAATA TCCAGTCTGA AGGGGGACCC GCCTCAACTC TTTGACAAGA TGCCTTTTGG AACCAT'rTTC ciTrCCTTTTT GCGACAGTCA CTTT'rTCTGT CGTGATGCTG CACCAACCAG GATAACAGCA AACGTGCCAA ATCGAGTGTT TGTCTTTGGC A'rTCCTTCAG CCCTATCTTA CGGTGTCATG TAAGACCTTC TTTGACGCTA TGGACTTCTT GGTTTCCAAT TCTCTACCT1T TCACTTTTTA CAGGCTATAT CTTTAAAAAG CCATCTCGAT GAAAGAGCAT GGAAACAAGG ACTGTTCCAA TTTCTTCGTT TCGTCATTCC AATCATCATC AT'rGTGG'rCT TCAAAAAGGA CTTGAGTAGT GAACTCAGGC CCTTTCTTT TTCCAAACCT TGCCCTTCCA GAGTCCAAGC TTCAACATCA GCCTTTTTGA ATTGCATAAT TTTTCCCGTC AACAGTTAGC a. a a a a *aa.
GCTCTTGCAA TGGAGGAACT GTCTGGCTCT TCCTTCTTCG TCATTCCCCA ATTTATGTAA
TATGGATGGC
CTTGGTAGGA
TAACAATCAA
TAAAGTGGCT
TrGACCTTAC
TTTCCAGTAA
CGCAAATCAT
ACATCGATGG
AAG3TCATAGA
GCCCCGATAG
ACTTTGGTCA
GACTTGGCAT
CATTC'rGT'TT
CAGCCAAGAC
TTTCCCACT1'
CTGCTTTAAC
ATTTTTCAAG
CGCGATAGGT
CGTGCATAGT
ACAAGTCATC
TGTGACCGTA
TTCCGAGTTC
432 ACTCAATAAG CTGTAGTCAG GTAAACTGCG AAGAAATCAT AGTI'ACAGGA CGGCTAT 'IG ATGAAGTTCA CGCAAGTTC GGTATCGCTA GAC'TGCTGGG CCCGCTTGGT ACATAGAAGA CCAGTTCTTG TCCTCGATTT GATAATCTCT GAACCTTCAT GCCTTCATGC TCGAGTCCAT CTGTCTmC AAAGTCAACT TAGATACAAG G.AGAGTGGAA CTGGCTCACC AA'rGTTCAAG CTTTGTCATC CTTGCGGTCA TTTCAAGGAT TAAGATACCC AATCTCCAGC CTTAACAGGG GCTrGGCGGAG TTCTCTT'r CCGCTGCGA'r AATGTACCAG AAGCATCGTC TGGGTGAACT TGGACACTGA GCCAGTCGTT GGCATCGAGG ATC?1'GGTCA AAAGTGGAAA TACAGGTTCT GCACGATTGC CAAATAATTC ACGGTGTTCC GCATACAAAG TAGCAAGATC TGTTrCCCTCG TAACGACCAT TGGCAACTTT AGAGACTCCA TTTGGATGGG CTGAGATGGC CCAATATTCT CCGATTTTTT CACTTGGGAT GTCGTAGCCA AACTCATCAC GTAGCTTGGC TCCACCCCAG ATTTTTTCTT GCATAACTGA TTGTAAAAAT AATGGTTCTG ACATGTCGAT CTCCTGTCTG 4. 4.
ATTTr'TCTCC CCTCATTATA GCAAAAAAAG AGTTCGAATT TAAAGCAGGG AGAAGATTTT ATAAAAATAG TAAACAAATG CCATTGCTAT AAATGACATC GCTCCGTTTC CTACA'rTATC CCTG'rGCCAG CTGAAACCGG CCGTCTTGGT AAATGGCATC TAGTCATCTG CTACATAACC ACGATAATAC TAAACrAAAA GAAGAAATCC AAACCATCAA CGCCTTCAIAA TCGTTCTATA CAAATCGATT TCTIAACAATG CTTGTACCAA TAGAAGGACT TCGATCTACA TAGATAAAGC A'rCGATACAG CCCCAAGTCG TCGCATGGCC TTGATGTGGG ATrCTCATCC GGTGTATCCA TCAAAAAGCA TTATATAATA AACACTTTrA AAAGACTCTC GTAAAATGAA ATAAGAACAG TTTTAGAAGT AGGGGTGTAC GAACTCTTTT TTACATCTTA TGCTCTACCC GATGCTTGCA TCTTCT'rGCT ACGTTTGAGA CATAGCGCTT AT'rCAT'rTCC TATAACCAAG CAAGTCAACC CCTCTAAGTA AGTAATCCGA TAGCACCGAG TCCATTTTCT GTGATATGAA ATCAACTAAA GTACAGCTA-A ATATCATAPA TACAAATCGA TCAGGACAGT TATTCTAGTT TCAATCTACT AA=TTATA GTAGTTTGAA 1620 1680 1'?40 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 *ATATTTCGTC TGATGGGCAA ATCTTATAAA GAGATTATAG ATAAGATGTG AACAACTCTA TCAGGAAAGT CAAATTAATT TATAGAAATA TTTTAGCAGC CAAGGTGTAC TG'rTATAGAT TCAATACACT ATAGACTGTA ATCAAACAAC GATTTGGCGA AATGTAAAAA AA'rATGAGGA GTTCGGACTC GACTCTCTCC TTCAAGAAAt ACGTGGTGGT CGTAACCATG CATATATGAC AGTTGAGGAA GAGAAAGCCT TTCTTGCCCG CCATTTGAAG
GCTACAGAGG
TTAGGTCGTT
AATATTACGC
AAAAATAAAA
TAAGGTTTGC
TTGTTGGGCT
CTATTGTTAT
ATGTAA'rAC'r 'rCGT'rAGGA GTT'rrCcAIr
AGATTCGTAA
CAGGAGAATT TG?1'ACAATT CCX'ACACACG TGATGCCTTC CACGTCCAGA ACATCCTAAG TCTCAATCCA AGAAGGCAAG TTGATGTACC AAGCTGAAGC CCAATACGAG TAGCCACA GGAGCTGTTG ATGCCTATAC GAGTGGATGA ACGCCTTTTT CAATGCTATA TGGCATAAAT 433 GATGCCr-rAT 'rTCAGCCrA TAAAAAGGAG TATCAACTGT TGAAGCGCCA TGGTTGGCGA
AAAGCAGACG
AAAGCGTTTT
TGG rCGGT
TATCCATAGT
AGGCGAATCA
AGAAGAGCTT
CAAGTACCTT
CTCAAACCA'r TGTTGCGTCT AAATATAG'rA GACGTTCG AGA6ATCACTA AACTGGGATC
CACTATATAC
TTTTCT'AA
TCACAAGCTT
AAAG3ATTCCG
GAGAATTTCG
TAGCTGGTAG
ATCCTTTTAC
ACTAATATTG
TAT'rCCTCCA TACACACCAG AGATGAACCC ACGTGGATT AAGAATAAAG CCTTTCGAAT AACTCCAAGA TGTCATACAA GGATTGGAGA AGGAGGTGAT GATGGACTAG AATGCTTTTT GAAAGCAGAT GAGTATTATA AGACCGGATT GCTCCGATCT TTCAATAGT'r CATATTCTCA TAAGGTTAAC GTCAAATGAC TACGCGACCT ATTTCATACG C-AGCA-GGTCC TTGAACTAAT AAGG.ACTCTG TTCCCCAATC AAACCTTTAT ACCAAGCTCG TTCAACCGT TGTAGTTCtG TATCAATAAT GATTCCTGAC TAAGGCAGTG ACTACCAATC TATCCAAGA'r ATGCTCCAAG ATTGATTTAA ATTCATTTAC GTTCTGCTCT ATGAAAATCA GAGGAATTAT TTAATTGcGc TACGAGTTCT TCTGCTTCGT TTTTIGGATCC AATTCAAGTA GATGACAACA GTrACCTT GTCTTCGATA CGGTCAACTG GTCAGCGTTT GTGAAGTCTA GATGAAGCCG TACTCAGCAT TGAAAGTTT'r CCAAAGGAAC GTAAACTGAG CAAAACCATC TCAGCACAGA CT'rGGGGAAC TCTCCTCCAT AAAAAGACCG AAGAATAAAG TCTACAAGTT G'rGATTGCAA TCCTTCTTCT ATTTTTCCTT GAAGGC =TG CTTCTACTGG AAGTGGACGG CrTTGTTCAA TTTAACAGCT' TGAATCCAAC AGCTCCTTGA CACCAAACAA GTGTT'rGTTT TTGAGAAGAC AAGGTTGATA CAT'rGAACAA GTGTGGAAAG TTTGGAAGAT GTCATGAATC AAAGTCCATC GTTAATCGGA TGCAATTTCT TTATATAAAA ATT-rCTATTT TAAAAATAGC ATAAAAATCA AGCACTAGAC GGTTACAGTT GGTCCGTGTA GTCTACATCC TCAACCTCGA CAAATGATTT TGTGACAACA ATTAGCATAA TCTGCCTT ATTTGAAACG ATAATATCTA GATTGCTCCG ATCTTT'rAAA TCATAT'rTGA TTI'TCGGCGA TCCAAGAAGA GACGGAATG ATAGCTTCTT CTGAGTGAAG TGTTGAGTGA TGCGAGCATC TCTGCAACA6A CTGCATCGAT GCT'rCCGCAA TrTTTAGCGTA GTATCTTCGT ATT'rCTT ACTGGAAGGT CGTATTGAAC 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 G'N'TGTGATA ACGTCTGGGT TTGGCGATCT GGATTGTC'N' CGCAAAGAGT GGAGATGTAC TGIrTGAGTA GTGTTACCTA GATTGCAT1TG 'rAAACTI'GAT CATGTAATCA CGCCACTTTTI AACTGGG'TT ACTTTGTCAA AGCCTCAAGG GCA'rGACGTT T'rCAGTGTTC TTGAATGCTT AACTGTGTCT GCTTCAAAGA ACCTGTCAAA CCTTCATAGT TGGTGCTT'rG ATTTTACGTG TCCAGCATAG ATAACTGGGC AAC'N'CGTTC AAAGCAGGAG GT'r'rTCATCG ATTTCTTGGA TTTAGAAACT GCAGCACGGC TACACGTTTG TTGTAAACAG AGCATCCATG TTCAATTCC'r ATCCA'rAGCT GCATCGTAAA AACTGCAACC CCGATTGAGC CTC'rTCGTGG CGAACTTGTA 'rGAGCTGAGT GTTCCTGATG GTTAAGCATT GCTGCAGATG ATTTTTTTAA ACTTGGAGAA
AGCACATGTT
TCTTAGCAGC
GCCACATGTT
CGTCGATTGA
ACAATTGCAA
GGTTGTTCT
GcATAGCTTT
TACCAAGTIT
CGTAAACTTC
CCACT'rCGT'r
GAATGCTCCG
GATACCACCA
CTTAGGTG'rC
GTAGATACCG
TTCACCCTCA
AACGTr'rGCA
AGCTGCTTGA
GTAAGGGTCG
AGCAAA'rGGG
GGCTGGTI'TC
TCCATTCGAA AGCTTCAAAG ACAATTCAGT AATCACTTCA GTTCAGCATT GTTCAAGATr
CGATGAATGA.GCGTTCGTAT
TCACCCATGA TGTTCCATAC GGAAGGGCAA TACCCATTGT ATGTGAAGGT GACGAGTAGA 'rCT'rGATCA.G CATGN'GTr GTTTTACCTT CGAGTT'rGT'r CGCC-ACCATG GAGTTGA'rTC CCAGCATCAC CAAGCATTGA ATATCGACTT CGATGAATTT AAGTTTrGAAC CAAGGAAAAG CAACCAACAC GGTAAGCAGA TTTTTACCAG TTGTGATGAT CCAGCTTTAA CACCACCAAA TCAACAGCTT TGTCGATTZTC GAACCTGAAC CGTAGTATGA AACCGAAGTT TACTGAATT TCAACAACAG AGGCTTCGTC AATTACITTT GGCAATTGCT CGATACCGTT GTACATTGGG rTrT'rGGTTAA TAACTGGACG TGATCCA AGG ATCGCTAGGA CACCGTTAAT CAAGTGAGTC GCACCTGGAC CGCCGAATTT AGCTTGCATA ACCGCTGCAA
CTGGACCT
CAGCGTAAGC
GCTCTTGGAA
ATGGAGTGTT
CACCTGAACC
GAGCACCTGT
5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 AGAAACGGAT ATCTTTGTCT GGATACCGTA GATTGTATC'r CAGTAATTTT CCCTTGAGTC TACGATTACA TAGAATTGGA TCAGCCAAAG CGTCCATCAA ACGCCCCATG TTTTCAATAC ATAATGATAA CTCTCCTTCA AACGITrCTCC AAATTTTTAC CTGC'rCAAAA CTC'TCTATTC 'rTATAACAAA GAAATCTAGT TATTCCACTG TATCATATTT ATGCTGACTr TTCTAAAAAT TCTATTCTAA TACAG=G AAAGTTCTGT CATTTCTGTT CATTACTTTT AGTCTATTI'T ACTAAAATTT AACAGAAGGG AACTGGTCAG AACAGATACA
GAACTAAACG
AAGACTCCTG
AGTAGAATTC
CCATGGCTAG ACCTGCCAAT TCTGGGTTGA CTGCAATCGG AATTCCGACA ACATTGTAGA GATGAAAGGT 'rTTCTTACTC ATATCAAAGG GAGCCAGTCC AACACCTGAA TAAAAGCCCA GAAAAGATTG CACGAACCAC TCCTAAAAGA 435 TTATTGGTTG TCAACACCAA ATCTGCTGAC TCGA'rGGCGA TATCTGTTCC AGCI'CCCATA GCAATCCCCA CATCTGCTAC ACTAAGGGCA GGAGCGTCAT TGATACCGTC CCCAACAAAG GCTACTTTCC CTGACTGTTG CAGTTTATGG ATTTCATGGG CTTTPCTI'C TGGCAAGACG
CCTGCAATGA
TCTCCTGTCA
GCATTTTCCT
AAGAACACAA
CCTCTTCAAT TCCGAN'TGA GCATGACTGTI TCGGAGACCA TAGGAATATC TTGCAAAGCA CTGTCTTAGC TTCTTTTTCT TCTGCAATAG CACGCGCCAC CT'T?1A GCTGACTGAT AGCAAGCCI' 'GATI'TCATT AGT'rCTTCTA GTTTATCTTG 'TTTCCAAGTA AAACT'TGTT'r
ACCACCATTG
GGCTAGCTTA
GTCAACAGCT
ATAAGTATTA
TCCATTGArr GAAATATCCA TGCCATCCAG CGCCCTGAAA CACCTTTCCC CCAGCTTCAC TCGCTCGCTT AAGGAGGCTG CCAACCCAAA TTCCCTTCCG TCAAAGTCCC TGTAAGACAG TTCCATTTT ATAAGGGCTG TCGGTGTTGC ACTCCGTAGA GAAGAGAGGA AAGACGAACC AAACCCAAAA
CATTTI'AGCA
GTGCAAGGAC
AACGATAGCC
CACTTCTACT
TGAAAAT
TCAGCCAGTG
TCGTCGCCGA
GGTCTTATC-A AAGACAAGGG GAGGAGAACC CCCATCTTGG AAGTCCCAAG GCACAAGGAC CACAAAGCTA GCTCCAAGCA GGTCATGATT CCTAAAATGA CCTGAAATCT TATCCGTCAA GTCCTGAATC GGCGCACGAC AAATCCACAA TCTGAGCCAA AACAGTCTCT GAGCCAACTT GT'rCCACTAT GATTGATGGT TGAGCCAATG ACAGTATCTC AGACTCTCAC CTGTCACCAT GGATTCGTCA ATACTAGAGA TCAACAGCAA TCTTTTCACC GGGACGCACT CGAATCAGGT AAAGGAACTT GGACATAACT ATCATCACTC AAGACTTCTG AGTAATTTCT CCACAGCTTG GGACGTATTT TTTCTCATTT AAAAGAACGA AAAAGAGGAT AAATCCAGCA CTTTCGAAGT AGAGCAACTA GGCTATAGAA ATAAGCCACT AGAGTTCCCA TTGGCATTGT GCTTTTTAAA ACTGGCCCAA GCACTCTGGA AACATAATAG GCGTTGTTGC TAGAAAGGTT CCCCAATGCA CCTGTCAACA TCCCAATCAT GAGAATCACA AGAGGCACAG AAACGTTGCA GGAGAGATAG AGATT'1TCGA GTCTTCTCAA CAACAGTTTG AAACTCAAGT GGTGTTGAGA AGCATCTTCC TGACATCTGT TACCACAGGT TTTGAACTTT CTGGAITCC CACTACGTCC TGTCCCCACC AGGCGATAAT CAAAACCGCC CAACCACACT ATCCCTGAGC CAACTACTGG GACAAAAATC.
TTGTCTGAGC TTTCTTCACA Tr-rCTGCTCT AAAGACAAGC CAACTGTCTT GTCCACAGC CACCTTCTAC TACGACACCA CGCCTACCT'r GACTTGTTCC CGGTTTTAGC 'rTGCAAGTCC TTTCCTCAAA AACTGCTCCC AAACAGGGAG ACCAGCAAAG GCGCALACCAA GGTATCCATG TATATGGCTT ACCTGCAACT TGACTTGATG ACTAATGCTA TAAAGATACT AGTAATCCAA CGACTGTATA GCTTCCCTTT 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740.
7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 TOCATCTTCA TGCCACAAGA AC'rTTCTCC'r CATCTACGCC TTATAACAGT TTGAAGGAGT AGCTGGATAT GGGCTGGATG ?TTCTAAGC TTGCTTTCAC CCGTGCATCA TGTTCATACC TCCACTACAT ACTCTTCCCC ATTTGATCCA GACATGGTGA TTCTTGAGGA CAATCAACTC TATCCG'TTT TTT1GCTGGGC AAGATAAACC CGATAAGGGC
AAATTCATGT
GATTGGTITCC
AGCACGATGA
ATAGCCTTT
AATTTCTGTC
ACAAGCAAAG
CArTG4GCAGG 436 CGCCCTAAT'r CTTGAGGCGT AAGATACCTT C7rT=CAAA AAGGTAATCT CACCTGGAAT TCAGCTCGGA TACGGATTTT ATAGTCTCCA CCTA'CTCTAC CCAAACTCTC CAGCCTGTTC TTCGCATGTA CACCAAAATC AAAACGAA'rG CAGAATT'rCC
TCCCTTTTGA
?1'GAATCTCCA AATCATC~rG AGGCGTGAT'r
TGGAAAAACA
AGGATCCTTG CGGTCAAAGA CAATGCGTGC TGGCACTGAT AGGAGTATAG CC1'CCCA'rGA CTTCCACTCG AATCTCTTGG ?TTTTGTCCA GA7=1'CAG GCI'TTTGAA AAACCAAAAC AATACAAATA ATGGTTACAA TACTATTTAA CATGACGTCT CCTTTACATA CAArrACATC TTACTTCTGT TACAGCACTT GAT'rrCTT~CT CTGAAATCAC AGCTTCCAAG TCTTCCAAGT CAGTCTGAGT AAATTCACAT CAAATTCCTA ATCCTACGGG A.ACAAACCTT GTCTTTGATA ACTTTGGTCT AGAGTTAAAA GGGCTGAATA AACAAAGGAC CAAACACTCT T'rATCAACCA GACGAGCCAA AAGTGTCTGA AAACCGCTCT GCCAAAACCC TAATCAAATC TGTACTGGTC AATCTTCATG ACCTGCCATT CTGCATCTGA AATCTGCAT ATTTGTCAAT TACACTCATC AGTATACTCT TAAAATCTAC ATATTTTCTT CGAAAAATAG AATTTTAATC ATTTGAAAAA TCTACAATCA AGT1CAGCCAA TCTTGGACAA GTAAATCCCG TTGCCTT~CTT TTTTCCGAGT ACCGTGGACT TGGACCAGTC TGCTCCCCCT GCATCCAAAT ACCATACCTC CAAAATCTAC ATTTGTCAAT TATAGAAATA CGATTTGCAG TCAAATATTA AAAACAACCA CTAGGGGTGC ATCTAGGTAA TGCTAGCTGA GCCTTTCTAA AACAGAC'rAG TTAGATGACT TATTTACCCG TGGCATTATG GCAGAT'rTAA AACCAGTCTT GTCGCTCAAA 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 CTATA'rAAAC AATAAAAATA 'rGCTATACTA GTAAAGC'rGA GATTAACGAC TGGAAGTGGA AATGATAATG C?1'GTTCTTA AGAATACAAA TTGCTTTGAC CATTGCAGGG AGTCATTCCA AGCGAGAGAT ATACCAGAGG TGTTCAC1A AGAGTGTCTT TTCTGATATT
TGTTAGATCC
GGGACTAGCA
CTTCAGTTGG
ACTGACCCTA
GTCTATGGAA
ATCGAGCACG
CCACCTCAGG
AAGAAAAAAG
CTCTGAC1TCA
GTCTTCTATT
TTGGGAGGTT
GTGGTGGTGC
TGGCTGTTGT
TTTCTCCTCA AATGTTGAAA GCCCAATTGG CTGTAAPAAC TGGAATGTTG GCTACTACTG AAAAACTGGA Tr'GTCCCTAT GTCC'N'GATC TGATTGACTC AAATGCTAGA GACTATCTCA AAATCATGGA AATCATCCAA CCCTATCTTA CTGTTATGGT TGCTACAAGT GGAGATGCCT 437 CCAACTATTA ?1'ACGCCAAA TCTTCCTGAA GCAGAAGAGA to* 00.00* o* 0 AAACAAACTT ACTACCTCTA TTGTTGGTTT TTCAATCCAT AAGAA'ITTGG 'rCCTCAGTCT ATTTCCrTT TACCAAGAAT ACACCCATGG TACTGGATGT AGAGTCT'rTA CCAGGCAG'TT CCCCTCAACT CGGTCATGGT AAAAACTCTC TAGTT-CCCAC CTTCAAACTA CGTCAGC'rTC AGTCTTATCT AAAACCTCAA CTCTTTGAT'r TTCAPTGAGT CTTGCGTTTT TGCCTCAATA CTATGCCAGT GCCCATAAGC CTGGAATGGC AACCGTTTGG TTTCCTTGGT GGTGGTTGGG CCGCTTCCAG AGCT(CrTTA AGACTTTGCG AGCTACCGAA CAACCCCAAG ACAAACATCC TTATI'TCCTT GACTTGTTTT *GCAATTGGAC TATGGTAACC CCACGGAATC TTGATAGCGA ACCTCTCCTT TGATGGTATC AGTACTTGGT AACGAAATTC ATATTGAGAT AAGAGACTGC AAAACAGCTG TTAAGGCTCC CAGCCZATTTC CCAGTACAGC ACTAAGAAAG ACATACCAGG TCCGT'N'CTT GATCTTTAC AGAC'rTCGAA TTTCTGACAT GACCCCGAAG ACATGCAGCG GTGGT-rATCA AAGGCCGACA GAACAATTTG TCTGGGAAAG ACC?1'TGCTG CAGTGATTAC GATAAGGCCA AGGCCI'TAT TCTGGTCCAG TCAACCATAC TTTAAGGGAA TTAGAGAGTT CATCTGCAGC CTCAAAACAC GGCAGTACTT TGAGCAACCT GCGACTAGCT TrCTAGT'r'A ATT-AAT'rAGG AAAGAATGTT ATGCAACTTT TTTAAAAAGG TGCTGGTCGC CTGATTT'rAA TCrCAAAGGT GGTGCTAAAG CCCACGAATT CAAACCTGTC TGCTGAACTA GCCAAGGGCA CACAAAAGCT ATTCAAGATG AAC7=TAAA GATTAAGAAA TTTATACTCT TCGAAAATCT TGTTr'rGAGC 'rGACTTCG'rC TCTTCTGCT'r GCATCAAATC TGATCAATAT TCTCCGAAGT CAAAT'rGTT'r TCAAGGTCGA AAAATGGCTC CTGTACCCAA ACCGTT1TTAG CGGTGACACC ACTGGTAATT CATCATCTCC AACCGATCA'r CGATTATCAA GCCAGTTGAT AATATTGATT CCTGAACGGC AGGCCGTCTC TT1GGTTACCA GATATAGTCT TAGCCAATTT TCATCTCTTC ACGTACAACA GCTACACCAG CAAGCCTCCA ATAGCAACTA 'rATCACAGTA ATGGGCGCAT GTAATCTGCA CCTGATTTCT GAGGATTT'rT 'rCAGGACCCA GATAPGCACA CCTGCTGCAT GGGTACCTGA TAAGCATCTG GGTTGTGAGA TTTTTTTC'rC 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 114 0 0 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180
AACTTTTGCA
AAGTGCTTCT
TTAGGAGCGA
AGAAAGCTTT
CTATTCATAA
AAGCTGATTG
CTCAGCAGCG
TTGGCTGAGA
TTCCAA'rCCC ATTCCTTGAA CAACTA'TTT TAAGCAAGAA GCTTCAAAAC CAGTCTTTCC AACCAAGTCT CCTGTCCCTG TTATCCAGTC TAATTCAGTA GACCTGATTT TTCGAAACGA CGAGGTCCTT GGGACCTGTG ATAGGTCTGA CACCAGTCTT TICAAGACTTG ACTCGCATCG.ACCCCAACGC CGTGGTGCTT GTTTCCTTTA AGGACCGTAG GTCTATAGTC
AAGCAAATICC
TAATCCAACA
TAAAAGGTCT
438 TTAACTAAGC TCTTACGAAT GGATGAAGTC GTTACGCCAA CCGCATCTAc AGAGAAGA?1' GG'I'?1GCATA CGAAGCTGCC ATGCGGATTG cTTTTccTT AAATGCCCCA AATTGATGAA GAA'rCATCTG CCATGACAGG CAAGAAATCT CAT'rGGTAAT GTAAATTCCT GCATCAGTCT ATTCCTGCTT GA'rTAAAAAT CTCCGAAAAA TCGAGGCGTG GTACCATAAC AGGATAGGAA GAGCAGAATA GTGAAATTTT' AAGTCGCTCC TGTGAGAGCT TGACTGTAGC CATAGCCAAG CTACACTGGA CATCGGTGCC GGGCAATCAT CATAGATAAA 'rTCTATTCTrr CTCCTTTTTC GCCATTTCCC AAAACI'GGC TGCTTGTC'rG TCTrCGTCACT ATCTGTTGCT CTAACTCATC GGTGATGGTT TAAGATTAAG AAGCTTGCAA AAGCGATGGC GAGAGCCTGA C1-rTGCT'rAG TAAAATCAAG TTTGCATCCC AGAGCCAAAA TCCCA'r'r'GC
CCAGTGAATG
ATCCTrTTCAC
CGAAAGGCAA
AGGGAACTAG AGCCTATAGG TAAAGAAATA TCCCTGCACT TAAAGGAAAT CGCTGTACCA TAGATAAACC AGCTAAGCTTI AGCAGCTGAT ACAATGGAAC CAATAATACC TGTTCCCAAA
CGACCGTACT
AAAGGCGGAA
GCATAAACAG
ATTCCCTCAA
ATGGTTAATT
TAAAGACTGT
TTCCATATGA
TT~CTCGATAG
CGTAATATAA
TGATTTGCCT
TAAGTTCGGT
TATAAAAGAG ACCTGCTAGA TCCCTTGAGT CG'rCATACGG
GTCCCATCAT
TTCGAAAGAT
TGTGAACTTIG
AAATCGCTCT
ACACTGATGT
AGCTGATTGA
GTTTCAATCC
ATATCATGGT
TCTGCAAATT
GATTCCTGCT
AGGTGTAAGG
TAATTGGTGC
TCCATGTCTG
GGAAGGCATC
CCAGTGCTCC
ATTGTTGATA
ATAACCAAGG
GCCTATAAAT
TACCATCGGG
CTCAGCTGAC
AACTTCACGG
CAGCATCTCA
AAAGGGATr
TTTTTAAAGA
ATCAAGGTrG
CCTGTAAAGA
ATCTCTCCTA
AGGGCTCCAA
ATAAAGGCTG
AGAATATTGA
ACTACATCAA
TTTCTCATGC
GTGTTGGTAG
TAGCATTTTT
CTCCTCTCTG
GAGAGGATTr'
ACAAGGAAC
ATGAGAAkATC 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 TAATGATAAC -AGGTTGGAGC GATTGGATGT TGCTCCATTT CCTGGTCCCT GATTTCCAAT TCCTTGAAAA A'rTGTTGGCG AATAAATAAC TCACCCTCCA CTAAACCCTG AGCATTTTGT TTCAAGAGTC TTTTCATCTC GAGAAAGCCT TCAGATAGTA AAATCCCCT CTTGTAATTG TTGGATAA'1r CCATCGCAAT GGTTTGAAGC AATAAAAAGA AGTCCGATAG CTATTCTTCA CAGGTAAAGA TGGCGCCACC CCAAAAATGT AGTTCTTGAC TTGGTTTGAA GTCTTATCAG CCAAAAGATG ATAGATTTCT GGCATCCTGA ATCAGGTAA'r AGCGGAAAAT GGCAGGTTCT 'rAAAA'rAAAG GGA'rGATGAA AGGAAGCCTG ATCTGTAAAT TCCATAATAA CTCCT'rTATA AAAGCAGGTA GAT'TAATTTT GTTTTTTTAG ACTGTGCATG TTCGTCATAT CCGTGAGCAG TAAAGAC'rCT CATCAGAACC TTACTGTAAA CCAAGCT'rTC
AAAATAGACT
GAATATAAAA
ATAGAGCTCT
TCAAGGGCGA
CACGTAA'rAG GCAAGCTTCT TTGAGGGACT TGATTTCTTG 439 CTGAATGAGA GGAAAAGAAT TGAATACCAC AATCAAGCCA CCCCTTTTGA GCCAAG'TACA AGAGAAGCTC TTNAGTGAA GCCGATACAA ACTGTCACAA AGGCCCTCGT TCCAACCATG .0 0 4.
0 GTCTAACTGA ACTGCCCAGT AGCTAACATr TrAAATCGAC AAGAGTCAGA GCAA'rCGAAG ACTGATAATC GG1'GTCT1GGG TGCTACTCTG AGATGTAAGT GACTAGTCAT CTCAATCCAA TGAGCAGAAA ACTTCTrCCT AGGCTCTATC ATCCAAACCT TCAAGATGGT CAGGAGCTGA TATCGACTGC CTGCTCCAAA CAGAATCAGG TCCATC'rAAT CAGCTTCCTG AACAACACCA AACCCTTCCA AGTAATGCTC TTGATTTCCC GACACCATTG CAGCAATTGA AAAGAGGGGG GTTCCGGGCT AGAAGCAACT GAAACAC'rTC CCTTAGTTTr AGTCAGATAA ATCATGGTCG GAATCTCCAA TATCTCGATI' GATAGACCCT AGGATTCATG CTGATAAGTG ATGGATGAGA CTGCTATrTT AGAATCAATT ACTCCTCAAA CAAGC'rCTCC CCGTCTGGAC ACGT'rCGACG GACCGC1'ATA GGGAAGAGAA ACCCACTACT CCCAAC'rAAC AGTTGGCAAA AGATGGTAAA GGTAATAGAG CATAAAGAGA GAATGAAAGA TGTTTCCAAG TTGCTACTT GACCATACTA GGTTTGGTAA TGGTCACTTC TAATCAACCA CAGAAATCAA TGATTCCTC'r CCTCCACAAT GCAAAAGGTT CATCTAGCAA AGAATTTTTT GCTGACCACC TCAAAATATC GTAAAGCTTG TGAAGCTCCT CTCGCAGACT GTCAGATCAC GATACAAACT CCCTTATACT TTTGAAATTG TCACCCAGGA TACAGGAAAT CGATTACCAA GCTCACCAGT TCCTTTGAAG CAACCTGTGT CCGTCTCTTA GCTCCACCAT CACAAAATAA cTGTCTTCCC TAGGACCAAG AGCG;TGATAG ACAGAGGAAA CAAAGACAAG ACTGCCTGTG AAGCATCTCc ATGGCAAGTA TGATCATCCA ATACAAA6ATG CGACTACCGA GATAAAATCA GCAAGAAGAG TCrCACCTCC CCI-rGGGTAT TTTCACATGC CGAAGACCCT AGGGrCTAAA CGATGACTPA CCACTTGCAA AAATAATGGC GATCACGGAA GCCTTACTGG ACTTAATrGA TAGGGACTCT AAAAA'rCCGC TGATTTCTT GACTCGGATA AACTGCTICT CTTTTCTTr TrCAGGACCC AAGAATAGAC CGAAAGAGGG CCCTTGATAG AATGTGAAAT CACACGGTTC ATATGGAATA CATCTCATAG GAAGGGATr ATGGTCGATA TAGGCTTTAT ATCATAGACC AACTCTr'rTA 14040 1.4100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 C'rGCTCTTGC GGTCA.ATGGA AGCGAAGGGC TCATCCAAGA GCAAAGAGGA CAGCCAGCGC TGCTTTTTGC TTTTCCCCAC CGGTGCAAGA TGTCCTTGCA ACGACA'NTGC TGGACAACCT TCCTGAAGG.T GATAGCCGAT ATTTTCCATG GTAAAAACCA ATGGTAA6ATIT GATGATTAGG ATTTTGCAAG AGAATACCAA ATAGAAAGCT GACTGACCTC GCTCCCATCT ATCAGGACr'r CTAACTTGGG CAATCATTTG AAAGAGGCTG GATTITCCAG AAGGTAAAGG CTTGCGCATG AAAAGTAAAA TCAAACGGCT CAGAGAAGAT TGGGGACTGA 
TTGCAAACTG ATGATAGAGT CAGAAATAAA ACGTACCACA CTAACTTAAT GTATTCATAG GAGAGAGCCA ACTTTCATAG CTTGAACAAA GCCAGACAAA CAAGCGCAGC TAGCACTTCT CAATGGGCGC AGCCATACAC CAAGAGGTGT TAAGAGTAGA CACCAAAAAA GATAGACAAG ACATAAAAAA CTCCT'TTTTT 440 ATCGCTCGTA GTTCCAGACC TTGACAATGG CACGAACCAA AGCAACGAAA GGACAAACGG ACAAAGCTAA CAAGCGTAAT CGATTCTTAG TTACGATAAA AGAGCTCCTA GACCAAATTG CCAATCGTTG CACTTCCGAC CAGAGACCGA AGAGGAT'N'C CTGAGAATAT TATACACATA AAAGCAAGCA AGATAACATC TAAAGAAAAG TGAGGCACTC TAGAACAGTA CACAAGAACA TTTGTTCGCA TCTTATTTCA ACTCTA.ATTA CAGATACAAA TACGAAAAALA ACATTTCACA CATCTATGCT TTTCCTCCAG GATGGTACAG AAGAAATAAA AAGGGAAAAG GCGTAGTAAC CCCAATACTA TTAGCAGTTA ACCAAATTCA CTTCCCAAAC GCTACCATAA AGGACTTCAG TCTCGGAACA AAGATGGCAG ATTGGCAAAG GCCTGCAAAC TCCTGAACCA ACCAAAACCC TTTTAACTGC CATTTTTTCA AAGAAGACCG ACCTAAATAC CTAAAATATT TCTAGAAATT ATCTACTATA TCATCTTCAT TTAGAGTTCA GCTTACAAGA TTTCCCr'rCG CCAGTCTTALA
TTTGTATACC
AATTTGAATT
CCAGTTTCGT
TTAGACACTT
AGACTGAATT
TTCTAATTGA
AAAAGAAAAA
C'TTN'CAC!A
15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 CTGTATCAGG TTCAATGGGT AATTATGTGA TTATTATAAC AGCCTTGCAC TCACAAAGAC TGAATTAGCC ATTCAAGCTG CTTAGAATCC AAGGATAGAT TTTTTGCATA CGATATTCC CTTTAATATG TTTCGTCTGT TTCTCATCAA TTCAAGACCT TACCAAATAC AGGCCCAACA
ATCATCTCAG
ACACATTTTA
AGCAGATC'N'
AATCTGGACA
ATCTATTGTT
GAAATGATTG
ATCCCACCAA
TCATAACCTA
CCTACATAAT
AGACCAATTA
CCTACATGTA
AATGGAACAT
AAGCCTTCTA
AAGGCTTCCT
CCTAALAGCAC CCCAAATGTC TTTATTATTT TACTAGTTCA AGAAATTGAA CTGAAATAC TCTTTTGCAA AAAACAAATG ACCTGTTTGA TAGCTTTTTA AAAAAGGAAA ATCCTACTTA
CACTCATTTC
AAACGCCATC
TTGCAACTAA
TAGTACtACC
CTACATATTC
TTTTATCTC
CGCCATCGGC
CCGAACAGTT TTTTCTATAT CATATTGGTC TTTATAATGT AGGCAT'rTGT GGCAATAGTT AGCATCATCC TTTGACTGGG AACTTITTGAT TGT.TGAAATT CATCAATTTT CTAATTTCAT GTCAATTTCC ATTGCTAAAT
CTTCTTCGTT
CAACACCAAT
CTATATCGTC
TAGCTAGCTC
ATAAGGTTAT
TCTTATAGAA
ATCATCTTGA
ATTAACGATA
AAGTTTTTCT
ACCACCTTTT
TGTATTTr ACAAAGTTCT TTAATTTGGA AAGCACCCTC ACCTTTTTCT CGAAATTGAA CAACGACTGT ATATAGATT'r TTTCCTTGGC TTAGTAATTC TTTATGAAAC ATCTTACTTC AAGTAGTCGT TCCACAAATA AAATATAGTT
ACTCTTICA ATTCCITAC ATCTTCATCT GTAATCTCGT ATAAGGCATT TATAAATTCA ACTTTAAATG TCCCAGGAAG ATGTCCATTT GGACGTTTr CTGCTATTTC TCCAGCGATA TTGTAAACCA ACACTGCTGT TTTTAATGAT TCAATTCTT GACC~rTrC TAGTCCGATA AAGC 'TGCTA CTACAGCTCC TAATAAGCAT CCTGTCCCAA TGACrTCGG CATCATAGCA CTACCATTAT GAATCATTAC CACTTCTCCA TTAACAGCAA TGGCATCCAC TTCACCTGTT ACTACTATTG GAATATTGAA CTTCTCATTT GCIG.CTAGAG CAATTTCGTC AATATTATCT ACGCCCGCAC TATCTAC'rCC TTTAGATGCC ACATCTATTC CTACTAAAGA GGCAATC'TCG CCACCATTTC CTCTAATCGC TCCTAGTTTA TTTCTATATT CTCCTGCTCC ACAGGCTACA TCTGCAATTT' TCAGAGCAGC TT-GGTATA.AT TTTATTAATA AACCACCAGC ATACTTTAAC
ATGGCGG
TTrGGTTATAC
AAATCCTTCC
TTTTT'CATTTr
AACCGCAGGT
AGATTTTCGA
CGTAACCAAT
GTTCTGTCAA
CCCACTCTGC
CAGGTAAAGC
TGACCAAAAG
CrrTTGTTAC
TAAAAGCCTT
AGGCGCCCAG TGCTACTAAT AAATGACCAA TGGTGCTTTT TAATTG'rTGA TTAGATCATC TCCTACTTT GGATCTAAAA CTGCTGGGAC A'rTA'rATTTC TTCCAATTTT CATCTGTCAA TGITrCCTATC AAATCCTCTA AATCTGCTGG AAACTCACTC CCATTTGCTG TGAAATTTTT TACTACATCA 'rCTTT'rAATA ATTTTAAACT TGTCATAT'1G CTAATTTCGA TTTATCTr'rA GTTGAGAATT AATGAAAATC AAAGAGCAAA CTAGGAGGCT GGTTGTGGAT AGAACTGACG TGGTTTGAAG TTGTAA.ATAT CATGAGCCTT CTCTPAGACAT TTTTC-ACTTT ATACGATCTA
ACATTGAATG
TGCTCAAAAC
AGAGTCTTAC
ATCAAAAAAA
ATCTTCTCGA
AAAGAAAGGA
CAAATCTT'rA
ATGGGCATAG
ACGGTGCAAA
TTGAAATAA
ATTTATACTC
ACTGTTTTGA
C'TCATCAAAT
17580 1.7640 1.7700 1.7760 1.782 0 17880 17940 1.8000 1.8060 3.812 0 1.8180 1.8240 3.8300 18360 1.8420 1.84 1854 0 3.8600 1.8660 18720 3.8780 3.884 0 3.8900 3.8960 19020 19080 19140 19200 19260 GCTAATTCTA AAGCACTGC TTTTTACCGA CACGATTGAG CCTCTGTGGA GATTGATCCA GAACCCCAGT CTAAATAGAG TCTGATGAAG CCACCGCCGA GTCACTTCTA GA'rAGTCATT- CCATCTTCAT CTGCTTCAAG TTATCTGCGT GACTACGCAG CTTGAA'rCAG ATAGTCCTTT CAAAACGATG xr'rGATrGCA CAGTCAAACC TGGCTGAAAT CTTrGTTAGTA AT'rAATAAA.A TTGATTCCAG CGTTGCTGAA TTCGTCAACC AGAAATTGA.A TTCCGAATGA ATATAGACTT ACCN'CTGCA ATGACCAGCA ATACATTAGA TCCTGAAAGG CTCTGCTACT TTTAACTCTT AAACCCTAGT TGCTTGGCAA GCACCCAGCA TGGATAAGAA AACGAAGCTT GGATTCAAGT GGCATCAAAG AAGTGATAAT CCCCGCAAAA AGTTCCTTAA GCTTTTTAAT AATTCTCCAA ATTTACTTCT CCTC'rCTTTA
AAGACCTTAT
GCCTGCCAAT
GCATAG;TCTT
CACC.AAGAAAL
TCTCAATTGT
CCT'TCTGACT
GTGTTTCCAT
TATCAAGCAA
AA'rCGTAATT
AACAACCCTA
ACTTACAAGA
CAGTCTTAAC
CTATI'ATA
CCACTTGATC CTTTTAAAGC ACATCGAGAG CAITTGCAGA TCCAGTTTAT ATAAACAAAA AACTCCAAT ACAATCAAGA TTAGACCGTT CATTTCACCA TACGAAAAAA CTGTwrCACAT TGTATCAGGT TCAATGGGTA TT-ATCCAGC CTAAAGCACC ACTACTGAAC tAGTA'rAGCA AAAAATGAAA GCCCTAGCAA CGAAAAATAT CTTTATATAT AATATATTGA AACTAGAATA CATTGTTAGA AATCGATTTG ACTGTCCTGA TTGAmTGTC TATAGT'ITTC GATAGCAATT TATTCTTCCA ATACACGAAG GAGGCAATCT GTTTTATCAA TACAATTTTA AGTCACGACG TGTATGGATT GTGACGGAGC TTGAAGTGTT TGACATCTTC
GTACACCTCT
CTArrC"NAT
AAAAACCTCC
GTCAACTGGG
AATGGTCTGA
to 90.0o ATTGCATAAC TGTCTTCAAT TCCGCATTCA AGTGT'rCAAA GACTTGACGC TACCACCGAG AGCCAAGCCA TAGATGACAG GGCGTCCAAT AGCAACCAAG ATGCCAAGGC TTTAAAGACG TGTTGACCAC GACGAACACC AGAG-rcAAAG CACGTCTATC AACTGCTTCT GCCACTTCTT GAAGCGAGTC AAAGGCAGCT CGATTTGACG ACCACCGTGG TTGGTTACCC AGATACCAGA AGCTCCTGCA GTTCAACGTC CTCACGGCAT TGTGGTCCCT TGACATACAC AGGAAGACCA CGATAAATTC TACATCGCGT GGAGACAAGC GTTGTTTAGC TGATTTGTAA TTGATTTACC AGCACCTTCT GGCAGGTATT CTTCAACAAT CGGCATGCCA CAAAACCATT ACGCTTATCC ACTTCACGAT TCCCCCCTAC AGTAGCATCT CAATCGCTTT ATAACCTTCA GCCTTCACAC GGTCCATGAT GTGGCGGTTG CCTTAC2'AAA GTAAAATTGA AACCAATGAG GTGTCCCT'rG GAGGGCTTCA GAAGGTCAAC AGTAGAGTAA GAACTGGTTG TATAAAGAGA ACCAAACTCA GCGCAGTCGC CACTTCCCCC TGTTCATTTG CCAATTTATG AGCCGCAACA TGAT'rGGAGA AGATAGTTNT TCACCTGCAA ATrCAATCTC TGTACTTGGA TGCAAAGTGT ATGAGGAACG ATGAGCTTGT GGTTAAAGGC ACGGATAT'rC TGAAAGTATC TTCCGCCCCA CTAGCGATAT AGCCAAATGC TGCTTTAGGA GCGCCATTGG CTCCAAA'rCA TAGGTATTGA TGAAZTCTAC ATGACCT'rCT TTTTGTATGA CATAAAATGT CCTCCTTAAT AAGTAAGCGT TTACTTTGTG ATATCTTAAC TCrM~CAA AACTTTTAAA ATATTTTGTT TGGAAATTTC 'rGTCTATGAT AAAAATCCTT ATAACGGCAA TAAAAAATAG ATATTATCCA GAGCrAACTA
ATTAGAGTTG
TTCCCTTCGC
CCAAATGTCT
GATATTTGAC
ACTTATAAAA
TTC-ATT'rTAC
ACATTCAGTG
AAGGTTGGGT
GTTCCAGACA
AC-ACCGACAC
TCTGCTCCTG
ACAATCGGCA
GGTCCACCGT
GCAAGCGAAC
GAGTATTCAG
ACAAAGTCCA
ACTGGGAAGA
GCCGTCAAGA
ATACCGTCAT
GAAATCTCTG
TGCACACCAC
GGTGCCATAA
TTTTCTACAT
TCTCTTAAAG
ATAACTTGTT
GCATTGCTTG
TATTACAAAA
AGAAATTTTA
AAGAAGATTT
19320 19380 i9440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 443 TAACTGCTAC AATAACTGTA TTATTTCTAG ATGGGAGGTT CTATT-rrGG ATTGATCCAT TGTTGAACA6A TATCTACCAC TATATCAAAA CGCATTCTT CTGACCI'GC ATATTGCAGr TT1GGGGAATT T'rGGGATCCT TTCTrCrCCGG NTAATCGTT AGTATCATCC GACA'rTATCG AATCCTGTT TTGGCGCAAG TAGCGACAGC CTACATTGAA TTGTCACGTA ATACGCCCCr NTTGATTCAA CTCTTCTTTC TCTACTTCGG TCTTCCCCGA ATCGGGA?1'G TCCTA'rCTTc- AGAAGTCTGT GCAACGCI'G GGCTTGTC1-r TTTAGGAGGC TCCrTATGG CAGAATC?1T CCGAAGTGGG CTGGAAGCCA TCAGTCAAAC CCAGCAGGAG A'I'GGCCTCG CTA'rTGG'rCT GACACCTrCTA CAGGTCTTTT ACTATGTGGT TCTTCCGCAA GCAACAGCGG TGGCACTCCC CTCCTTAGT GCCAATGTCA TTTTCCTTAT CAAGGAAACC TCI'GTTTTCT CAGCAGTGGC TTGGCCGAC CTCATGTACG TCGCCAAGGA TTTGATTGOT CTCTACTATG AGACAGACAT
C
0 0 *6 0 TGCGCTAGCT ATGTTGGTAG TAGCTGGATA GAAAGGAGGC GAAATAATCT CCTGAGAA'rC CTGTCCTCTT A'rCCATGATG GAATCATACG A'TTI-rAACA TGCTACTCTT CATCGTTTAC AGACTTCAGC TATrATCGT'r GTGGAGCTAT CAC'TTCTCTC CTAATGTTCA ACTT'rACTAC TrGCTTATCT AATCATGCTG CTACCCATCT CACTCGTCT'r TCCGCCATGC AGGAT'rCGGG AATCCA6AGTA CrCTTTCAAG TTIACAGGGAT TGGGCGTTAC TTCAGAACAG TCATGGGAAT CGATTGTATC TGGAATITAT TTTGGCTTGG CTCGAAMCTT~ TTTACCCTCT GGGGAACAGC CCTA-AACATC AGTTTGAAAG CACATCATCA TCCCACAAGT GATTGGGATA TCCATCCTGT CATCATGACC TCCCATTCTA CCGTATCATG CCCCAGCTGG TAATATCAAT ATCTCAGGTG TGAAATGGGA GACTTGGTAC TGGACAGGCA CTCGGCTTGA CTrAAGAAGA CTGCTACCGC ATTAGTTGTT TTGATTGGGG 21120 21180 21240 21300 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 AGGC'rATCAA 'rCTTGTCACT CGGATGATTA AAACCACTTC *&to 0-00 -060 TTGTGGAAGT GACCAAAGTT GGACAACAAA TCATCGATAG CAATCGCCTG ACCATCCCAA CTGCTTCATT TTGGATTTAT GGAACCATTC TAATCTTATA TTTCGCAGTr TGCrACCCTA TTTCCAAACT ATCCACTCAC TTAGAAAAAC ATTGGAGAAA CTAAATG'rCl GAAACTATCT TAGAAATCAA GGAACTAAAA AAATCCTTCG GAGACAATCC CATCCTCCAA GGACTTrCTC TAGAAATCAA AAAAGGGGAA GTTGTTGTCA TCCTAGGGCC .ATCTGGTTGT GGGAAAAGTA CCCTCCTTCG rTrGCCTCAAC GGCTTAGAAA GTATTCAAGG TGGAGATATT CTTCTGGATG GTCAGTCTAT CGTTGAAAAT AAAAAAGAT2' TTCACCTAGT TCGCCAAAAG ATTGGCA'rGG TCTTTCAAAG T'rATGAACTC TTTCCCCATC TGGATGTC'rT ACAAAACCTC ATCCTAGGCC CTATCAAAGC TCAAGGAAGG GACAAGAAAG AAGTAACGGA AGAAGCTTTG CAATTAC'rAG
AGCGTGTCCG
AGCAACGTGT
AGGTGACTGC
TGGCCCAAGA
TTACTGACCG
CCTTCTTTAC
T'rTGCTGGAT AAACAACATA GCTTTGCCCG TCAATTATCT TGCAA'N'CTC CG'rGCCC1'CC TAATGCATCC AGAAATCATC TTCGCTGGAT CCAGAAATGG TGCGTGAGGT GCTGGAACTT AGGCCGACC ATGATTAG TAACCCACGA AATGCAGTTT' GATTATCTTC CTCGACCAAG GGAAAA'rCGC TGAAGAAGGA CAATCCCCAA ACCAAACGAG CCCAGGAATr TTTAAACGTC
GGTGGACAGA
CTTTT'TGACG
ATCAATGATr
GCCCAAGCCA
ACAGCTCAAG
TTTGACTTTA
GCCAATTCGG
TGTTTTAGCA
TGGTTCATCC
CGGTGAAC'rG
TGGTTCTTAC
GGTGTCAAGG
AACAAGGTAG
CTCATATCTA TAAAGGAGAT TCTTATGAAA CTATTCAAAC CACTCTTAAC CTrGCCTTTG CCCTTATCTT TATCACTGCT TGTAGCTCAG GTGGAAACGC TCTGGAAAAA CAACTGCCAA AGCTCGCACT ATCGATGAAA TCAAAAAAAG CGAATCGCCG TGr'rTGGAGA TAAAAAACCG TTTGGCTACG TTGACAATGA CAAGGCTACG CTACGATATT GAACTAGGGA ACCAACTAGC TCAACACCTT 'rTAAATACAT TTCAGTCGAT GCTGCCAACC GTGCGGAATA CTTGATTTCA ATATTACTCT TGCTAACTT1 ACAGTAACTG ACGAACGTAA GAAACAAGTT GATTTTGCCC TTCCATATAT GAAAGTTTCT CTGGGTGTCG TATCACCTAA GACTGGTCTC ATTACAGACG TCAAACAACT TGAAGGTAAA ACCT'rAATT'G TCACAAAAGG AACGACTGCT GAGACTTATT TTGAAAAGAA TCA'rCCAGAA ATCAAACTCC AAAAATACGA CCAATACAGT GACTCTTACC AAGCTCTTCT TGACGGACGT GGAGATGCCT TTTCAACTGA CAATACGGAA GTTCTAGCTT GGGCGCTTGA AAATAAAGGA TTTGAAGTAG GAAT'rACTTC CCTCGGTGAT CCCGATACCA TTGCGGCAGC AGTTCAAAAA GGCAACCAAG AATTGCTAGA CTTCATCAAT AAAGATATTG AAAAATTAGG CAAGGAAAAC TTCTTCCACA AGGCCTATGA AAAGACACTT CACCCAACCT ACGGTGACGC TGCTAAAGCA GATGACCTGG TTGTTGAAGG TGGAAAAGTT GATTAGTCAT TAACTCTTAA AAGGAACTGG ATTTTrAAGCT CCAATCCCTT TTTAAGATTT TACCTATAAC ATCCTGAGTC 'rATCTAAGAT GTTCAATCTG AACACAGTGT ACATACTTTA TCT'rCTATTG CATATACTTT ATCACATAAG ATACGAATAT CCTCTICACT ATGACTAGCA ATCAAAATTG T'rGTCCCT'rT TTCACTAGAG AGCTTTCTAA ACAATGTTCT CATATTTTCT ACACTTGATT TATCCAAGGC ATTCATAGGT TCATCTAGTA AAAGAATAGA GGGATCTCC ATAATTGCTT GAGCAATCCC TAGCTTTTTC CTCATACCTA GCGAATAAGT TTTAACTTTC TGGTCTTTTT GCTCATATAG ACCAACTATT TTCAGTGTAT CATTCATTTC CTGATTACCA ACTACTCCTC GTATGCTTGC CAAATATTGT AAATTCTTAA AGCCACTATA ATAA'rTTATA AAACCAGGTT CTTCAA'rCAA AGCTCCCAAA TTAGCTGGAA TTTTTCTCTC AGGAACAATA 22860 22920 22980 23040 23100 23160 23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 24480 24540 24600 445 TTTTCCCCAT TGATTAACAC TTCTCCATAA GACGG AACAMTACAC TTTTCCTGAL GCCATTCGCA CCAGTJ CAACTAAAGT TAAGGTTTTG AAAAACACAT GTCFTT AATGTAATTA TTTCATTCAT TC'TATAA6ACC TCCTC r.'TGAAAAAG AAAGAcTAAA AATAGCP.ACT GAAGAJ TCCCTCGATT CAAAATATAA AATAGATAAT TAGTT( AC ACAATCAT GAGTAAAAAG AAACTAACGC AAGCA INFORMATION 
FOR SEQ ID NO: 49: SEQUENCE CHARACTERISTICS: LENGTH: 11443 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear kCAT rrA
M.ATTA
ATAAACCAGC TATTAATTTA CTATAAT'rTC CCCCTGTTTA ATrTCA6ACTC AATATTNTTT ACGAGTGAAA TAGAAAATGC ATCTCGTCCT ATATCTCCAT TCCTACAAAT AGACCACCAA 24660 24720 24780 24840 24900 24960 25002 k.AGTT CC (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
CAGGTACGGT
TTATGATAGT
TAAGTATAAA
GGAGGGC'TGG
TTAAAAAAAA
GAGAAGTCAA
CCAGCACTTG
CCATTGCTTT
TCTGCTTTAA
AGCTCAATCT
?1'AATCCTGA C: AGTGATGC
AAAGTATT
GTGGGTCAGT
TGCTTGAGGC
TGGCTGGTTT
GAGGCGCA.AC TAAAATATAA GAGCATTGCC ATTGATGGAC ACATGCGATC TCCTTCGATT CAAACTTTTC CCTTGACTAG TGATATAATA GAATTTATGIG GGAAAGACAG GCTGAGGGTT GCAAATCGTT AAACGAAATG GGCTCTTGCC TTTGTGCAGG CGCTTrTTTCT GGGATTGTGA CATGACCAAG GAAAAGGTCA AGAATTAGTG CTAGGAGATG CTTGGNTTTG GAAGGCTI'TG GGTGCAAAAG GAAG'N'GACG TTTATCTCAA GTrrcACcATG.
TAAGACCGI' AAACCCATCA- TACTGGGAAG ATTATCATTC TTTTCATCTT GATTAGGAAT CATAAGAGCA ATACAACTAA GT'TTTCTTGT TATTATTATA ATACATATTT AGGATGAAAT ATAAAAATAA GATTATGGGA TGGTCAATGA CTTTACCGCA TCTTTACCCT TTTTAACGCT CTTGGAGCAA TCTGGTCTTC CCGAGCTACG AGCCAAACAC AAACCATCCG TGATGGTCAG
TTTATCAGTA
TCCACGCAAA
CCTTATCAAA
TAGAATTCTG
TTALACCCAAA
TCAGCCAGTA
TTGAACTTTG
TTTGC'TGTTA
ATGGTGGACA
GAAGTTGCTC
TCATTCGTTT
CGGAAGTCAA
GTCTGCAGGA GAGCAGATTC TGAAGCCATG TTAACGGGAG GCTTrACTT GTCAGGAAGT TCGGTGCAGA CAACTATGCT ACTCCCGTAT. CATGAAATCG CCTTTGGTCT GGCTCTCrG
TTCCTAGCCA
GCCAAACTCA
CTGGACAAGT
CTGGAAGCCT
446 TGCTT'rTAAA AGGCCTGCCT CTCAAGTCAT CCGTTGTAAA CTCGTCGACA GCTCTT'rTGG GAATGTTGCC TAAGGGAATT GCCCTTTTGA CCATTACTTC GCTCTTGACT GCAGTGATTA AGTTGGGCTT GAAAAAGGTC TTGGTGCAGG AGATGTACTC TGTTGAGACC TTGGCGCGCG TGGATATGCT CTGTC3'GGAC AAGACGGGTA CCATCACCCA AGGAAAGA'rG CAGGTGGAGG CTGTTCTTCC GTGACGGAA ACGTATGGTG AAGAGGCTAT TGCCAGCAPC TTGACTAGCT ACATGGCCCA TAGTGAGGAT AAGAATCCAA CTGCCCAAGC CATTCGCCAG CGT'rTTGTGG GAGATGTrTGC TTATCCTATG ATTrrCCAATC TTCCCTTCTC GAGCGACCGC AAGTGGGGGG CTATGGAGTT AGAAGGCTrG GGGACAGTT TCTTAGGGGC ACCTGAGATG TTGCTTGATT CTGAAG'rCCC AGAAGCTAGG GAGGCCTTGG AGAGAGGATC ACGTGTCTTG GTCTTAGCTC TCAGTCAGGA GAAATTAGAC CATCACAAAC CACAGAAACC ATCTGATATT CAGGCrCTAG CCTTGCTGGA AATCTTGGAC CCCATTCGAG AGGGAGCAGC AGAGACGCTG GACTATCTCC GTTCTCAGGA GGTGGGACTC AAGATTATCT CTGGTGACAA TCCACGrTACG GTGTCCAGCA TTGCCCAGAA GGCTGG'NTrT GCGGACTATC ACAGCTATGT AGATTGCTCA AAAATCACCG ATGAGGAATT GATGGCCATG GCGGAGGAGA CAGCTA~TTT CGGACGTGTT TCCCCTCATC AAAAGAAACT CATCATCCAA ACGTTGAAAA AAGCGGGACA TACA-ACGGCT ATGACAGGGG ACGGGGTTAA TGATATCTTG GCCCTTCGTG ACCGGATTC TTCTATCGTG ATGGCGGACG GGGATCCAGC AACCCGTCAG ATTGCCAATC TGGTTCTCTT GAACTCAGAC TT'rAA'rGATG TTCCTGAGAT TCTCTTCGAG GGTCGTCGCG TGGTCAATAA CATTGCCCAC ATCGCCCCGA TTTTCTrrGAT AAAGACCATC TATTCCTTCC TGTTAGCAGT CATCTGTATT GCCAGTGCTT TACTAGGTCG GTCAGAGTGG ATTTTGATT'r TCCCCTTCAT TCCGATCCAG ATTACCATGA 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760
TTGACCAGTT
CTGTTGAGCA
TCGTCTTCAG
AAATCTCAAC
CCTGCATGCC
TACCCACAGC
TGTGGAAGGT TTCCCACCAT TCGTTCTGAC GAATTTCCTC AGAAAATCCA TGCTTCGTGC CGTCCTGTTT GTGAAAATGT TTGGCGCGAG TCTACTCTA'r TATCTCTTGG GGTCAAT'rGG A'rTTACCCTA TGGCGTGTCC TCTTGATTGT TCTCTTCCCA AGAATTCAAA AACTGCTTGA TTTTGAGCGA AATATCAAAC CCTACCAAGC GCTCTCATGG TCAAGGTTGG TCTGAGTTAG TTTCTTATCC GTATTT-AGAG TTGGTCAGTA GGAGGTTTCC AAACGN'GCC TGT'rTATGGT GTCGTTACCA AGCGAAAAAA GGCTATAAGC CGCTTCTACC GTCATGATGT TGGTCTTTAC TAAATCAAAA CCACCAGTGT GGCCAGGGCC AAAGGCCCAC AATTr~CAACC CGT1GATTTTC
GAACTGGTGG
TTAACAGAAC
ATCCTGACCA
TTTGTTCTGC
CGAA.ATAGCT TCCTCGCGCA CTCTTATTTA TTTCGCCCAGT CCACTTT1CCC GAGCAGGTGC TAAAGCACCT TAGTTACTTC AAACGGATCT ACTGACTCGA A'rAACGTGAG CTGGTCTGCT ATTCTrGAATA TATTCAGCTA TCACTCTG ATTACGGCCT TCTACACCAA AACTTGCGAT TGCCATATTT GTATT'IAAA CATCAAACrG CTCTGCCCT TTAAATAGCC CATAAAGGAC AATACTGATA AGCATGTGAA TATGGTCTGA ACAAGCATC CTACGCTCA CATAAGTCAC GTATGATICT TCCGATACTA GATTTGACGA CGATATTTGG GTGCAAAAAC AATATGATAT TGATAAACTT TGAT'rATCCT CTCTCATGAG GTACCTCCTG GAGAAACCAC 'rTCTATCTTA TCATr'rTAGG AGGTTCTTTT TATGGAAcCA CTAGCATAGC TAGTGGTTTT CGGGAGACAA GGATTGCAGT TTTTTATACG ATGGATCTAT CGTAGATCTG GATCATC'rAT CGGTGAACCC AAGAGCGACC CTCAAGCCI'G TCAAATATCT GTACTTAGAC TAT'TTGAAGT TTGATGTAAG
ACTCTGTCTI'
ACCGTATCTA
TTCGCATGCT
GAAACACTAA
GCITCATGGA
GCTTTGTATC
TTACAATTCC
TATGATATGT
TGTTACCACG
cTGrA.ATTG
CATAATACCC
TATCAAAAAT
GTTTCGGAGG
TTATTACACC
TGCCATAAAT
ATGTGGTATG
TGTAGTGGCG
CTAAAAGCTC
CAAGAAAGAC TGCAATCTGT ATGTGCAAGG CCTACGTGCC CTTGGATTGA GGTAATAGAT AAAGAGAAAG CGACAGATTG 9 9*9*9* 9 99 9.
9 9 9 9.99 9 AAGTAATTTT AACTCTCTTC TATTGCTAGA AATGAAGATG CTATCTATTG TT-AAATGGAA TT-TATTTCT ATCAAATACG AAAAGCAACT AGAAAAGGTA AAATGATT'TT GGCATAGTGA ATAAAAACAA AAATGTCCAT TGCAAAGGAC GATATAATGG ATTCATAAAG GAGGTGTATC TTGTTGCTCC TTrTGCGAGA TAGTAAGGAT TTAALATTGCT CTGATAAAAC GGTTrATCGC GTAGAAGCAT TCATTTTATC TGAAAAAGGC CTCGTGGA CG TTGATGGGAA TTTTACAGAG TTACTAGAAC GTCTCTTGTT GACTGC'TCCT GAATTCTACG TAAGCGAGTC AGTAGTACITA ACAAATGGTC GGATAGGTTG GTAGTTrGAA CA'rAGTGTTA TTrATTAGAA AA'rCGTr'rGG TAAATATTTC AACTAAAATA GATGrTATGA GG;TrCTGr TC TATTTGATAT CATTrTTG AAAATGCGAA GTATATTATT TTTrGAAAGC 2820 .2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3560 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500
GTGTCTAGAA
TATATATCTG
CTTGTCAAGG
AGAGGTTTCA
GCTTTTGATC
AAGCCACIAT
AAAGATCGTC
AACAAGAACA AATGGAAACG CTAAAGTATT GGGAGAAAAA GAATCAACAA AGATTGTCCG AATTAAATCC AAGAAGT'rCC CTCAAGTAAG GCGTGAAAAA 99 99 9 9 9 GCAAIYTATG GGTTAGATTT AAAAATGAGA CAACGAAAGC GCTCAAATTC GTTCAGCCAT TCTAAA'rCTA CTGCCAATGT CAAATTACAC AGAATAAGGT TCAGCCTCTT GACGGAGAAC
C'TATTTATGA
AGATATTACA
TTTTTATTGA
TTAATCAGTT
TTGCTCACTT
TTTAGGAGAG
AGAGAGTCTA
TGGGGATGAG
GGATTTAGAG
TTGTTTGGGA
TTACTGATTA CACTTGAGAG AGAATTGGGG GTAAACATTC CCTATCCATA TAATATAAAT 448 AGGAATCGTC GTAGTACTAG 'rATTCATGTT ATrr-rCTCTC ACCTGTATAT TTTTATCAGT 4560 GTAGCACCTT CAAAACC'rAC ATTCAAGAAA TTGAACAATA TATCAATACC TTGTATCTTC TCTCAGCGAG 'TTTTAGATGT GAGATTGAAA CGACAGATCC AGGAGATTAG ATAATAGAGT TATCCTAATC 'rGGTTAAAGA TTTGCTTCCT 1'GAGTCTGGA GAAAAGCGAG CACGTCCTCT GAGCTTTTAC GAGCACGATT
TATTGTTGAT
TTTTAGGATG
GAGATTGCAA
CACTCATTAC
TGACT-rTGT'r
ACAGATTAAG
GTTAACAACT
CGAGATTGGT
GAGAAAATTT ACAGTGTCTG AAGGTTGATG CAG7rGAGAT AAACCATTTT CTTCCGGGAA TATTTTAGCC G3TAM=TGAT GACTTGGCGA GTCATATCAG AATAGTCTTT TATCACAAAT ATTTCTA.AAG AAGTGAGTCT TTTCTAGTC'r TATATTTTGC TCAAAAAATT 4620 TGACTATCTT 4680 GCTTCCTTTT 4740 GGACAATAGA 4800 TCCCTTACTG 4860 TCTTTTAACC 4920 AGTATTTGGT 4980 ACGGTTTCAA 5040 CGGAACTTCA 5100 TGATGTAGTT 5160 CATTGTGACG 5220 TTTTCTAACC 5280 AAAAACAGTA GTGATGTGTA CATCAGGTGT AGAAAAGCAA TTTTCTGAAT TGGATATTAT GCTTATCATC AATTAGATGA GCTGATAAAT 01'ATATCCAG ATTTAGATTT ACGGTAGCTT TGCAGGAACC AGCAAGTGTC CCGTT'rGTCC TAGTTAGTGT 0 0 4 96 09 S GAGGGTGATA AACAACGTCT TCAAGCAAAA ATTCAGGAGA TAAACTATGA ATAATCTTI'C GCTTGTCCTT ATGGATATAT CTGTTCAAAA TCGTCAAGAA GCCTACAAAG AATTAGCAAA TCAAATCAGC CTTCTTGTTT CTGAAGATAC AGAAAATA GAAGAGCTTC TATATTACCG
TGAGAGACAG
CTTTCAACAT
GGAAGTATAG AGGTTGCTAA CATGTCT'rAG TGATTACTAG GGATATCCAG TGTGT'rGACC ATGTATTAAA ACATTGATGA GTTAACAAAA GAAGAATTAC GTCTGATCCA GACGGAT1'TA AACTATTGGT TGAGACGGGT AGAGAGAGGC AGAAGGACAG GTTCTGCTGT GGAGAAGGCG AGACCATTGA TGGGAAAGGG AAGCTGCTAG GGAGCATTTG
TTATTATCGG
GAAGACTAGC
GGGAGATAAT
CAGATGCAGA
TATGTGTCTG
ACCGGTATTG
GGGGTAGTCA
GTCAAAGTAA
AAGACCTTAT
AGGTGTTCTT CTACCACATT GTGAAGGAAA ATTAAAATCA CC'rATCAGAG AATGGTCGAA TTTGGCCATT GCAGTATCAC AGGACAAGTC AGATGAATCA TTCATAAATC AAT'rAAAACA ATATGGAAAT CAAAGATATT CTTAATGTGA GCAAAGAAGA GGTTTTTGAG GCATTAGCTC ATAGAGACCA ATTTATCGAA GGTCTTTATC 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 .5940 6000 6060 6120 GGAATTATAT TGCTATTCCC CATAGCAAGA .0 9 6 TAGCTATAAA TCACAATGAG ATTCCTTGGG TTGTACTCTT TGCAGTTGGr GATGATACAG CACTCTTTGC TCGAAAACTT GGTAATGACG AAGTTGTTGC CAAATTAGTT CGGGCTCAGA CATCTGATGA TGTGATTGCA GCTTTTGT'r AATAAGAAAA AATTTTGGAG GGTATCCGTA TGAAAATTGT TGGTGTTGCA GCTTGTACTG TGGGAATTGC CCACACTTAT ATTGCACAGG AAAAATTAGA GAATGCCGCA AAGGTAGCTG 6300
GACATGTGAT
AGCAGATTGA
AACGCTGA
ATAAACTGAT
449 TCATGTTAG ACTCAGGGGA CAATAGGGGT TGCAGCGGAT GTAGTTATTT TAGCAGTTGA GGGTAAAAAG ATTATCAAGG 'rTCCAACAGA TGCTAAAGCT GTTGAG.ATTG TTACGAAATA AGAA.AA'rGAA TTGAGTCAAG TGTTAAGATT TCTGGTATGC AGTGGCAGTC AAATCTCCCA ACTGAAAATA ?1'TAAGGAGA TGACAGCCAT 'ITCCTATATG GT'rAGCAAT GGGGGTGGT ATGCTTTAG.C AACTATGGGT AAATATATGT TGAAACACT ATTCCAATTG TTGTGGTGC GTTCCTGACG CTCTTGTAGC GGTAAAGCCC TTGG'TCTCTT GC'rAAGCCAG GGATTGCACC GGTATCG GTGGTATCTT AAAAAGGTCA AAGTACCAAA GTAGCCTCTT TGGTAAGTAG TTTACCAACI' GGTTGACGAG GGGGCAGTTA TTGGAAT'rC' AAACTTAAAA GGTCACTTAT AGGATTCTTA GTTGCCATTG AGGAAAATTC ACTATCTGGG
GCCAGTTGTT
AGTTr.TT~
GGGAGGTTAT
CTGGATTAAA
TTGATrATG
CTTATTACAA
CAGTGCTGTT
ATTGCTACAG GTTTG'rCTTA CTCGATTGCT GTTGGTCTAA r'rGCCAAITrC TGTTGGTTCA ATAGCTGGTT TCTrGGTTCA AGCGATTATT CGTTTAATGC CAACC~rGAT TATTCCTr ATTTATATTA T1'GGAGCGCC TATCGCACCC AGCTTGGGAA GTGCTTCAAA TGG'TTTGATG GAC'N'TGGTG GCCCACTTAA TAAAACArTr 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 TATGCGTTG TG'rGACTTT ACAGGCTCAA GGTGTGAAAC TTGGTGAATA CTGCTACACC AGTTGGATTT GGATTGGCCT AAAAAAAATA TCTATACTCA AGAGGAAATC GAAACATTGA ATTGTCAATA TTGTTGAACG TGTAATTCCG ATTGTTATGA ATTGCAACAG GTATCGGTGG TGCTIGTTGGT GGTGCTGTTT TCTGCTGTGC CATTTGGTGG AGTGCTTATG TTACCAACCA ATTTGTGCCT TGT'rAGCTAA CATTGTAGTC ACAGGACTTG CCAATAAAAC ATGCAGAACC AGTTATGACT GTTGAAGAAG GAAATTTTGT AAGAGGGTAA CGATGTCAAG AATTGAAT GGATTTGGAC AAATTCAAAG AGCAGATTAC ?rTTTTTGAAT T ATCGATATT ATGGATGGCC ATTTTGTTCC CAATATTACC AGAAGTTCAA AAAATTAGTG ACACACCTTT ATCAGTTCAT AACCATTGAC TGCTT'rACA.A ATrTrATCGC GAAATTACTC AATCGGCTGT TCCTATGGGG ATAACTTGGT TCCAGGTCTC CTTTGACAA'r GGGTGCTGAT TGACTCGTCC AGTAGCTGGT TCTACGCGAT TTTGAAAAAA AGAT'rGATTT GTCAGA'rAT'r 'rCACCATCTT 'rGATGACCAT GATAAAGTAG CATCTTATCA TTGTCTCCTT GGrrCATTCA CTGATGGTCA CAGACCCAAC CN'TTGGGTA GATCAAGTTC TCTGAATGGT CTTGCTTTTC TGTTGTCCTT AATCCTGAAA TCGATT'rACA ATGTGAGTAT ATTTGTATTC ATGCTGAAGT GT'rrGATTGA TAAAATT'CA'r GATGCAGGTC TAAAGGCTGG CACCTGTTTC TACAATCTT'T CCCTACATTG ATTTA=GTA CAAAGCAACT ATTATGACTG CTTGTATAAA ATCCAAGAAC TGAGATGCAT GGTTCTTCGA TATTTATGTT ATAGGTCGCA GGATATCTGT TCTAGAGA'TT GTTTGAGAAG AAATTTATTA TCGTTTTCTT GGCGTAGATG GGGTGCTGCG CCAATGGCTT GCCAAACTGG ATflAACCGAG 450 TAGATCCAGG TN"IGCAGGA CAACGCTTTT TGGAGTCTAC TCCGTCAGCT TAGAGTTCAG AATGGTTATC ACTACATCAT GT~CGTAAGAC TTTCAALACAA ATTGATGTGG GTGGATTATT TGGTrTGGAT GACGATA'rTG ACGAAGAAAT GACCGGAAAA ACAATGCCAA GTTAGGAGGA ATATATGTCA CTACAATCAG CTATTAACAA ATCTAATTCT GGTCACCCGG ATAGCCTATT TACAAAGCAC CTTAGAAI-rA ATCGCTTTA'r CTTGTCTGCG GGTCATGGA'r GTATGCTCTC TTGCA'rTTAA CAGGGTATAA GGATGTATCC ATGGACGAGA.
CCGGCAATGG GGA'rCTAAGA CACCTGGTCA TCCTGAAGTG ACGCATACGT TGCGACATCT GGTCCGCTTG GTCAGGGGAT TTCTACTGCC GTTGGTTTCG GCGTTTTTTA GCTGCTAAGT ACAACAAAGA TGGTTTCCCT ATI'TTGACC TGTTATCGCT GGAGACGGTG ACTTCATGGA AGGAGTGTCT GCGGAGGCGG AGGTCATCAA GCTTTAGATA AGCTTATCGT CCTCTACGAC TCCAACGACA TGGTGAGACC AAAGATACTT TCTCTGAAAA TGTTCGCGTC CGT'rACGATG GCATACAGTT CTGGTAGAAG ATGGAACAGA TTTAGCAGCA ATTTCTACAG GGCCAAGTTT TCTGGTAAAC CGAGTTTGAT TGAAGTGAAA ACGGTAATTG
CAGGACCAGA
CCAAAGCCTG
TCAAATAATG
TTAACGCCAT
GAATTGTCAT
CACCTGAGCA
CAATGCTACT
TTAAAAATTT
CTGGTGTGGA
CCCA'AGCAGA
ATTATACTTA
CTTCTTATGC
TCTGCT'rGGA
CTTATGGTTG
CAATTGAGAC
GTTACGGCTC
AAGAAACAGG
AGGAAGTATA
CT'rGGGCTAG
ACGCTATTGT
8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 ACCCAATAAA AGTGGTACAA ATGCTGTTCA AGCAACTCGT AAGTTTTTGG TTCTGATTTC- AAGACAAATG TTTGGT1GTCT GATTACAAGG AGCTGGAAAA TCCCCTGTAA CTCTCAAGCA ACTCGTAATT CTTCTTAGGT GGATCGGCAG CTTACAAGAT AAATATAATC CATGGGAACA ATCCTCAATG C'rTCTTTGTT TTCTCTGACT GCCTGTAACT TATGTCTr'rA TGAACCAGTT GAACATTTGG
GATGGGATTA
TAGCGGATCG
TTGCTTATCC
CCATTACTGA
CGTCCCAAGA
ACr'TAGCTCA
CATTAAACCG
GAATGGCTCT
ACGTCAAAGC
CCCATGATTC
CAGGTTTrACG TGGTGCACCA CTAGGAGCAG COATCCATTT GAAGTACCAG TGGTCAGGAG GCATACGATG CGAAGT'rGCT AGTGAGATTG AAAAGACTTC CCTGTCTATG AGAATGGCTT TGCTATTAAT ACAGCAGCAG TTTTACCAAC CTCTAACATG ACCTACATCA AGGCAGATGG CAATATTCAG TTTGGGGTAC G'rGAATT'TGC TC-ATGGTGGT TTACGAGTT'r ATGGCGGAAC TGCTATTCGG CTATCAGCCA TTCAGGAGTT ATTGCCG'IT GGTGAAGATG GTCCAACTCA CTCAATGCCA AA GACTG TTATCCGTCC AGCGGATGCC CGTGAAACTC AAGCGGCTTG GCATCATGCC TTGACCAGTA CCACCACTCC AACTGTCATT GTCTTAACCC GTCAAAAC?1' GGTAGTTGAA GAAGGGACAG ATTGT GGTCGCTAAA GGAGCCTACG TCGTGTATGA TACCCCGGGA TTTGATACTA TTATCATTGC TACAGGATCT GAGGTCAATC TAGCTATCAA AGCTGCTAAG GAATTGGTTT1 TACAAGGTGG TAAAGTACGT GTGGTATCTA TGCCCTCAAC CGAACTATTT GATGCTCAAG ATGCTACCTA CAAGGAAGAC ATr'rTACCAT CTAAGACTCG TCGTCGTGTG GCCATTGAAA TGGCAGCGAC CCAAAGTTGG TACAAGTATG 'rTGGT'rTGGA TGGCGCGGTC ATCGGTATTG ACATCTTCGG TGCGTC'rGCC CCAGCTCAGA CTGTGATTGA TAAT'rATGGA T'rTACGGTAG AGAATATCGT TGCTCAAGTT AAGTCCCTAT AGAAACCAA'r TACAATGAAG ATACAGCTrGT TGTCAGACTA GCAGATGTAG TGATAGACAC TAATCAGATG ATTGGTTATT TAAAAACTGT AATGAAAATG 0 00.
0 TAATAAr'rTA
GGTTGCGAAG
ACAGCATCCA
TCTACGAAAG 'rrATAGTAGA TAG;TATACAC TACGCTAATC ACTTTGCTAC TGATCTAGAT CAGA'TCACT TAGGATATTG TAAGT1TTTTT CTAAAATTAA AAAACGCATA GTATAGGATG TT1GAAATGAT GACAGAAAAA AATCTAACTT TTGGCGTGTT TTTATTATGA GTTCAGTTCT ATGAACTTAG AAAACAAGGA TATATCT'1AG GGGATAAATA ATTCTAATC'r TAGGTACATG ATTAAAqTrGA TTCGTCAAA.A AAGGGAAAAA TCC'rrACTAT TCTCCTGATT AAAGTCTGAC ATGAAGGCTG GACTAAAGAT AGAGmTCTC AATAGAGTAT ACCCTGAAAC AGTTTCTr'TA ATCAATAAAC GAAAGCTAGA GAGAAGGTCT GAACTGCACC CCAAAAGTTA AATTAACTTA TGATGATAAA AGAAGCTTTC AAATAAATTT TTGATCGTTA CGGAATAGAG TAAAACAAGA AATGATTCA'r TTGAATACGG TCTCCCAAGT ACGGGTATAC TATTGT'rGAG CTAAAAAAGT TAAGAGAACT TTrGTTCAAGA ATTAATGACT CTCGTTGGAC CTACTACTAT TTAAAGCTGA AATTCAATCC TTCATTTAGA ACTAAGAAAT TGAAAGTACT CAATTTACAA 9900 .9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11443 CGTACGATAC TCTTAACTG AAAACAAAAG GGAGAGTACC CCGATTGAAG GAGGAAAAAG GAGTTTTCGT TAGATCTTCT CACTTGAAAC AGCTAGATAA .ATCTTTATCG AACACAAGGG CGTGCTTATC TCGTAAATCA GCTAGAATGC GACAGnAACG GCTAGCACAA TACAGGAAAA TGAGAGCCGA GAATGCCATC AGAAATAAGA AAGACAGAAA TCTAAAAGCC ATTAAACTAG ACCAGATAAG GACCAAGAGC AGATTATGCT TATCGCCGGG TAAAAGAGTT CAAGGCTTGA AAAATATTCT TCTCATAAAG GAG INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 5338 base pairs TYPE: nucleic acid STRAMDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: CCAATTACAT TA'rATTATCA ACTCTTTATC ACTCAGCCAA CATTGAGATC TTTATCCGCA TCTCTCTTAT GCCCGTCAGG TCCTGTCGCC CACCGCGAAC TGCCTTTGTC TCTTTGCTCA AACTCGGACT TGGGAGATTA CTTCTrAAAC AGTTACAACC AGCCCACCAT CTCTTCACAG AAAGAAAGAC AAGGTGAAAC AGGGACGCAC AACTCCTTCT ATCCATTGTG GTCAGTGACC TACCATTGCG ACAGGGATTT ACTGGATATT GATGACCCGA AAATCGTCGA AACTGGCTCC ATGAATGAGG GTCTCTCCAA TGCAGTGCGA GATTTGGAAA ATCCCAAGGG AATCACCTTG ACCCGTGATG TTGTCGAGCA GACCCAGCTT CTGGAGGAAC TCTTTAGCGT TTCGTCTCAA CACTATGCCT AGAAAAGCGA TATGGAGAAA TACGAACTCT
CAGCCAAGCA
ATGAAATGGG
GCATGGAGTT
GCTATAAAAA
TTGTGGTCAA
TCCTTCGTGA
a a TCGACGACGT CAAGAACTTC GTGATGTTTT AACCAAGATG CGCAACCGCA TATCTTTGTC TGTCTGATTT GGAGAATTTC ACTTTTCAGA AGAGATTCTT GTGCCACCCT CTTTAATCTC TGAACAGCAA CCTAAACGGA TCGAGCTGGT CTATATCCAG TAGACTATCT CCTAGAAGAA CGCAGTGAGG TCGGGGTCCT CTGGATGACA ATCACCTGCT AGCAAGACCA ACCCTCTGGC CCTTACCTCA GCTATGACCA TCTCAAGAAC ACCACAAGAA TT~GATTGGTT TGGATGGTTA GACAATATCG TTTCTATCCC CATGAGAAAA CCAGCCTATC GTTCAGTTTG ATAGrTGAGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 GAACGCTTTA TAAGATGGGC
AATGATAAGA
ATTAAAAGTT
GGCTAGGTTC
AATCTTTTTA
TCCATCTGG
CACATAGACT
.AAAC"ITGGTC
GATGTTGATG
ATTCATCAGG
ACCAATATGT AGGCTAGCAA TCCCCTGCCA ACTTATCAGC AACAAAATCG GGAGATTGAG GCCACTGCAT CTGGTTCTAG TCGGCTTGGT CGAAAAATCC CCATAGGGCA TAAGTTCGAG GCTGAGTAAA GACTAGACTT ATATGCCCTT TGCTGCTTTC GCAAAGGTAT TGACCTCAAA CALACCTGCAC ATTGGTTCTT TNTTACTTATA TAGCTTGGGA AAGAGAGTAT AAAACTTATG TTCTCGTTTG TTTTT'rCCTA TAATCTTGAC CAGGAAPGCGA TCAACCGATT TAAGATAAGT TGTACGGATT GGTCCTGGAT TGACTGTTGT TCGCAGAGCA TTTGAAAAAC CAATAGCCGC GCCAGTAGCT AT'rAGACCTG CCATGCTGAC C'IrCATACGA GCCGCAAGGT GACGAGACAG CATCTGGTGA ATATCTTTAT CAGCAATCTG GTCAAATCCC TCAAAAATCC CGTAACCAGC GTTGTTAATC AAGACATCAA TCTTGCCATA GCGGAGATAA AGATCAGTTA CCAGAGCTTC TAGGGCTGAA TCGTCGGTAA TATCAATTTC AATCAATTrCT GCATGGGAAT AAGCAAGATG AGTTGGTrCAT AGCTCCGG1A ATGAGAATAG TCTTCCAAGT CTTTGACC-AC 'rTGCTAATAT CTTTTGAGAG CCTGCTTCTA CCGCTACI'G ATTTTTTCAT CACCCTTGCC AGACGCACAC TGGCACCCGT CGTGGCGCTG AGATATAGTC TGGCCGTI'T TGATTTTACC TCAGCATCCA GCGTCCCT'rC GTGTGGTCCA GCTCCTCTGC GAATCTTGGT CAALACTCATG AAT'rTCCGTA
TGGGCAGGAG
'IAGGCATACT
ATGGACA'T
GAAACGGGCA
TGCAGCTTGC
ATAAGTGGAC
TTTTCGAGTG
TGCTGCCTTG
GAGTTGGGCT AAT=MN'CCT TATTTCTACC TTTGACCATT TC1'TGAGCTA GACCACCGC7' TATCCTTTCT GTGACTGCTA GA1TIrCACT TCAAAAATTG 'rGGCAGCGTC TTTCTTGAGT CTGATATGGT TGAGTAGGAG GCG'TTTGGCA ATATTAGTTG AGTGACCATG GTTACGAGCA TCATGAACTA GGACATCTGC AT'rGACAGCC TCTCCTAAAA TAGTCATAAT CTTACCTGGA ATTTCAGTTC CGTCTTCCAA AACAAGATCC AAAAAGCGGG CCCAACGGAA CACCAGCAGC CTTGAGTTTr TAGATCCTTT TGCATGAC-AC GATAGCCAAC ACAGAAAATA
ATACACAGTG AATTTATCGG P'rTCAAGAAT GAAATGAATG CGGTAGGGCA GACGAGAACC
TTACCCAGA
TGACACACGA
ATCTGTCTGC
AATGTGGTCT
AGGCTGGTTA AGACAAATGA CTTGATTCCT TCTTCATTGG CCTGAAAGGC ACGGCTAGAA CCATGCAGAT GGGTAATAAA GATTT'rGCTG ATGCGATTTT GCGTACCTTC TCCACAGTCA AGTTTCAGGG CGAGACTTGA AACGTTGCGG AAAAATTGAA TATCCATTCG ATACTTTCTA TTTCCGATCG GAAATAGCGT TTGCCAGAAA GGCTGTAGCC TTTGAGACGT TTTCGACCAT TGAGACTTrC TAGGCTGATA ACTTCCTCGT ACTCTTCAAA GACTAATTGA TGGGGGAAAA AGATr'rTCTC CTCA'r'r'TCA AAGAAAATCC CAAAGGA'rTT ACCAGACAGA TTAAGCCGAA GCTCAGTATT GTAAAGATTC AG'ITCCTGAC GAGGACTCT'r 'TTCTGATAG AGTTCTGCAA TAATAAGTGT TAACTGCTCC TGATCTG~T TGTCCAGACT TGTATAACCT CCATACTTTA TGAGGTCCG'I AGATTTCCAA AGGAAACCTG GCAAACCAAA 1620 .1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 ACCT'rACGTG GTCGAATTGT GGTTTCCAGA AAGAGCCAAA CTTCGTTAA'r CTCATCCAAA GCTTTAGAGG GC-TGACCAGC CCCCGTTCCT ATTAATCAAT ATATAACATG GCTGTGCGGT AAGCAGCAGC TTCTTGCAAT AAATCCTCT CAGCCAATCT TTCCAAATCA GTCAAAGCTG CCTCGACAGG CT'rCATGTAA ATCTTACCAG TT'rGCGCAAT TTC.AAAGAGC AAGTCATCCG GACCAAGGCC GTCACTCTCA TAACAAAAAC TAAAAGGCTT ATTTTCTAGG GTGAAAC'rTG TGAGTCTGC AAAATAATCC GTCGCAGCCT AGTAGGCATT AACAACACTr GGCGGAGGTG TACCAGCTAG AAGCTGATCC AGATAGACCT GAGCCAAAGT TTTAATATCA GTCATAAAAT TCTTCTAACC TCCATTTATT TTCTCGGAA TGATAATICAC GTTCTTCCAG AATTGCAACA GAAAAAGGCA CTCGCAGGGT AAATGCTTCA GCTTGCAAGT TTTCACGACT GTCCTCAGAC CTAGGCGTGA AATCCTCCAC TCCATGTCCA GGTCTTTCAT TTGCTAGCAT CGATAACATG
TTGAAACTGG
GTTACTTGGA
AGCTCATCTG
GCATTAG'rAT
ACAGTCGCAC
CGAACGCTAC
CCTgCCTGAC
ACACCAACTC
GATTGCCTCC
CTTCATACTG
AACCAATCAA
GATTITTTC'rC
GACGGTTCAG
GGCTGAGCAT
CAAATCCGCT
GATGGAGAGA
AACCAGAAGG
TGTCGGCAAA
CAGATGAATA
GGTCTTACTG
ACCAATCTTA
AACCACCTTG
CTCCAGCTGG
AATCCCCTGA
CTTCCCTTCA
AATGACCTTA
GTTGACGA'rG
ACCACAGCCG
ATGTAGCCTG
CTCTCTAAAT
AAAATCCT
TTGGCAGAAA
TTA?1'ATAAA
ACCGTTTTTTT
TCCACATGCT
TCTGGATAA
CTCTTGGTTG
GTCAAGATGT
AAAGTGCTAG
TAATCACTTC GCCGTCTITCC CATGAATCTT GTAGGACT TAATCTTA'rC TAGCAATAAT TGAGGGrTATA TGGCGTTrGG GCG'rCAAGTG AGGAATATCT CATGCTCCTC GTGGTAAGGA TGC rlr C~ CAAGGT'rGAC AGCCAACGGT ATCTGTCAAA TCGCATCCAG AGTCGCAAAG TCATGATAGT TGATT'rCCCA ACTCCAAACG TTTTTCTCTG AGC'rCGCGCT CGATATCCGT GATTTGATTG CTTTCACCAG GACCACGGGA ACCAATTCCC 9 9* I. 9. 9.
9 *9 .9 9 9 9 9. 94 9 TGGGCTAGGT GGACTTGGAG AAAATCAACT GCATACGGTC TGCCTTG6 TCAGACGATT AGCGCAATCT CTTCCAACTT
CCAACCAAGC
TGGCTTCGAG
ACACCGAGAA
ACAdTAGTGA
ACGAAGGTCT
GAGGCAAAAG GTATTTGAGT CCCGCATGGC AAAGATATCC CTTCCTCTAG A'rTGACATTC TTT&TTdC ATCCACCATA TGGA6ATCATA TTTrTCACGT TCGCTAAACT AGCCAA'rrCT 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 TTTTGTCTGT AGC'rATCTAC AACGACTGCC CCTGCCGTTT TCCATGGAGA GGTCAAAACT GTCCATACCC TGCAATTCCA CACCAATCAG CAGGACTCGC TCCTCTTTTT TCTCCGTTTC AATCATCTAA AAACTCCTCT ATCTGGCTTA AAATGCGGTC TTGTACACCA GAT'rCTCCAA TCTGATAAAA GGTGACCTGC ATGCGATTAC GGAACCAGGT 4CAGCTGACGC TTGGCAAAAC GACGAGTCGC CTGTTTAAGA CTCTCACTAG CTTCCTCCAA GGTCTGCT&r CCACGGAAAT AAGGAAAGAG TTCCTTATAG CCAATTCCTT TAGCAGCCTG TACATTAGGG G;AATGGTCAA ACAGCCACTT GGccrcA'rcc AAAAGCCCAG CCTCAAACAT CAAATCCACT CGGTGGTTGA TACGCTCATA AAGTTGACTA CGTTCATCAT CCAAGCAGAT *AATC-AGCGGT TCATACAAGG TCTCTTGATT TTCCAAATCC TGACCAAAAT GGGCAATTTC TAAGGCACGC ATAGCACGAC GACGATTAAA CTGGGGAATC TCAAGGCCTG CTTGATCCAC CAAATGGGCr AATTCCTCAT CTGAATATGG CTCCAAACTA GCTCGATAAG CTAAAATCTC CTCATGAGGA GTCTCCCCAC CTAGGTGGTA ACCTTCTAGC AAGCTCTGGA TATAAAGTCC 455 AGTCCCACCG GCGATAATGG CTAGCI'GCC ACGGTTGTGA ATACCCTCAA TAGTCATCTT AGCTTCTGAA ACAAAATCAA AAGCCGAGTA AGACTCGGTT ATCTCTCTAA CATCGATTAA ATGATGAGGA ACAGCTGCCT GCTCTTCI'GG ACTAGCCG GCCGTCCCAA TATCAAGTCC TCGATAGACT TGCTGGCI'AT CTCCACTAAC CACTrCGCCA TTAAAACGcT TTGCGGGG INFORMATION FOR SEQ ID NO: 51: SEQUENCE CHARACTERISTICS: LENGTH: 19446 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ 10 NO: 51: CGGAAACCCA TCTAGTCTCC ATCGTTTGGG AGACCAAGCA ACACGAATCr TAGATGCTTC 5160 5220 5280 5338 TCGCCAACAG AT'rGCAGATT TAATCGGTAA GAAAAGCGAT TGGAACAGAA GGGGATAACT GGCTTATCAA GGGTGTGGCC CAAGCACATC ATTGTTrCAG CCATTGAACA TCCAGCAGTC GA.AAAGTCAA GGATTTGAAG TGGATTTTGC TCCAGTTGAT TGAGGCGTTA CAGGTTTGAT ACGGCATGAT ACAATCCTCG AATGAAATCG GCTCTATCCA ACCTATTGAG GCTATT'rCAG ACTATTTCCT TCCACGTTGA TGCGGTTCAG GCGCTTGCCA CTGACAGAAC GGGTGGATTG CGCGACTTTC TCTAGTCACA GTTGGCT'rTG TCTATATCAA ATCTGGCAAG AAGATTACAC 
GAAATCTTCT TTACCTCGGG TTTGAAAAAG CTCAGTTTGG AAAGAGTCAG CCCTCTGGTT AAGAAAGGCT TGGTCGATGT TTrCCATCAT GGCTGTGAAC AAT'rCTTGGC AGACAAGCCG AAATTCCGAC TGAAAAGTAT AGTTCCACGG GGTTCGAGGT CTCTTCTrAC AGGTGGTGGC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 CAGGAGCGAG ATTATCGTTC GCCCTCCGTT TGTCTATGGA GCAGTGATTC GCCAAGCTCT AACT'N'GCAC CTCATATTCT CACGCCTTTG AAGACTATGA GGAAAACCAG CCGGTACCTr GTGCGTCTTA GCCTAGACT'r TTAAAATTGA TTTACAATCA.
AAATTATGAT TCrrTACGA GACAACTGAA AATGTGGCAG GGATTGCAGC GACAGCCAAG AAAGCTAGAT ATCTTTAGGA GCAAGACTGG GCAGATGAAG TCTGA.ACTAT CCGGATATTT TTGTCTTTTC AGATGAGGAA GACTTTTGGA ATCAAAGGTG TTCGAGGTGA. AGTCATCGTT TATTTTCATC TCAACAACCT CAGCTGTTC ATCTAAGGCA GATTGCCATG GGAGTGGACA AAGATAAGGC CAAGTCAGCT GGAAAATGAT ATGAGTCAGG TCGAGCAGTT TTTGACCAAG AACTAGAAAA GTAAGATAGG AGCATTCATG CAGTATTCAG GAGTTGTCAA CCAAGGGTAA AAACCGTATG CGTTTCATCA ATAAACTTCG TAATAATATT CAGATCGCGA CCGTGCCCAC CTCTCAAACA AG7"rTTTGGA TAGAAG?1'TT GAAGTCTTCT CCTTTAAGAT TTCTAGCAAG ACCAAACAC1' TGGAGGGGCT GTCCTCACAT CAATCTTCAG CCA'r'CGTGG GGCTGGTGGT CAGGAGGGAT TGACTCACCT AGGCAGTTCA CT?1'GCTAGT ACTTGACCCG TAAATTGACC CAGAGATTCA AGAGGAAATC GTCGCTTTAT GATCGATT TCAATGGGGA AAGTCrAGGT ATGCTGTTAC CAACACTCCC TTGACATCGC CCAGGAAATC GTACC'ATTTTr TGCACCAGAT ACGAACCGCG TATGGATGTT CTGAAATCAC ACCTCAAGCC AATTCAGAAA ATCCAAAAGA AACAGGTAAA AAACTAACTT AGAGAGTTTT CTGACAATGA CAAGGTTCCT TATACAGGTA GAAAGATAi:A GACCGTTCCT TAGCAAAGAG TCTT'rCA'TG CGATATCAGT CGCATGATTG GTATGCGGCT 'rGGAAGTTCC TGTGGAGTAT GCTGAGTTTG TACAAAAGCA GACTGCCAGC CCAGTTTATC GGTTTGGAAT 456 TCGGACGTTT 'rGTCTATCTA TACCCAAGTT GCTITACCTCA ATGGAGCTGA TTACACAGCA ATTCAAAACT TTTCTCCTGT TTATAAGGTTI GTCCAAGAGA TTATGCGGGA CATCTACAAG CGTAGCGACC ACAACTTTGA ACT'rGATAGT GTATTCGAAG CCATTCCAAA TGTGCAAGTT GTGGAGAT'rC GTGAAGAAGC AGCCTATCTT TTGCCAGT'rG GAACT'rCAGG TAA-AGGGATG GTAGCAGGTr ATCTTGCTCT TAAGCGTGGG GTGGATATCG CCACCATATA CTAGTCCTGG TGCCCTCAAG AAAGCGCAGG AAGT'TGCGCG GAAATATCCA GTTTATAGAG GTGCCTTTCA AAAGCCAAAG CGCCAGAAGC TTATPTGATG ACTCTAACTC ACTGACCGTA TTCGTGAGGT ACGAAATGGT TTGGTTATCA CP.AGTAGCCA GCCAAACCCT TGAAAGTATG AAGGCTATCA ATCATI'CGTC CTGTGGTTAC CATGGACAAG TTGGAAATCA GATACCTTTG ACATTTCAAT CCAACCGTTT GAAGACTGTT CGTCCAAAAA CAAATCCTAA AATTAAGAAT GCGGAGCAGT GAAGGCT'rGG TTGAGCGAGC AGTGGCTGGA ATCATGATTA GAAAAAGATG AAGTTGATGA C'rTGATTGAC AATCTGCTCT ATAGCGAAAA TCAGTAAAAA AAGTTAGTTT T'rTCTCTAAA TT'TTTATTTT TATGATATAA TGATATAAAA TTTTGAA'rAT ATCAATCCTA C'TTTATCTA AAAATGAAAG AACACAAACT AGGAGCGCCG 
TGTACGTATT CTTCTrCCTA AAGATTATGA A'rCCTGTTGT ATACTTTCAT GACGGGCAAA ATGTTTTTAA GACATTCATG GAAGATTATC CCAGCTATCA AACGAAATCC TCGTTGCTAT TGACAATGAT GGTATGGGGC GGATGAATGA AAGAATCTCC TATCCCAGGG CAGCAGTTTG GTGGTAAGGG TCATGGAGGT GGTCAAGCCT 'IrTATCGATC AGACCTATCG ATACGGC'rAT GA'rTGGTTCC 'rCACTAGGAG GCAATAT'rAC ACCAAGACCA AATTGGTTGC TTGGGCGTTT TTTCATCTGC
AAGGTAACAG
GT'rGCAGAAT
GAAAAATCTG
GAAGGTATGA
CGTGAACTCA
CAAATGAAAA
TCTTATGAAA
CTCATGTTGT
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 457 AAACTGGCrC CACCAAGAAG CCTTTAACCG CTATTTCGAG TGCCAGAAAC TATCGCCTGA CCAGCGCATC TTCATCTATG TAGGAACAGA AGAAGCAGAT GATACAGACA AGACCTTGAT GGATGGCAAT ATCAAACAAG CCTATATCGA CTCGTCGCTT TGCTATTACC AGCAGGGGGA GTACATCTGG ATAATCTTGT GCTAAAAGTT" CAGTCTGGTG TGAAA'rCCCT TGGTCAGAAA ATCTACCAGA TTGTCTGAGA T7rTTGCAG AGTTAAGAAA GGAAAAAACG AAATGCATAT TGAACATCTT AGCCACTGGA TAACCGTGAA ATGTACCTTA ACCGTTATGG ACATGGTGGG ATTCCAGTTG
ATGATTTGAT
CCATCCATAG
AAAAATGGTA
GTGGTC-ATCT
TGGTCTT'rGC
CCTGTGCTTC
GTGAGAGCTG
ACGAACGTTA
TGGCATGATG
GCATCCAGAT
00.. 0 0 0 0 .e 0 TTCATCAGGT GGTAGTCACA ACGAATACTA TGATTTTGGC CTTTATCGAG GAAGGCCTTG TCCAGTTCTT TACCCTATCT GTTGGCTACT TGGAAAAATG CTCATGACCA AGCGGAAATG TGTGATTGAG GAGGCCATTC TTTTATCAAG CACAAGACAG ACGACAGGTT GCTCTATGGG AGCCTATCAT GCACTCAATT GTCTTTACCA AAGTGATTGC I'CTCAGTGGT GTTTACGACG TACTACAACG ATGATGCTAT TTACCAAAAC TCGCCAGTAG GACGGCTGGT TTATTGACCG TTACCGTCAG GCAGAGATTG GCCTGGGAAC AAGATGGTTT GCCATCCTTT TACAAGCTCA CAAATTCCAG CC'rGG'rTTGC TGAATGGGGA CATGATGTCG CGTAAACAAA TGCCTTATTT CCTCGGTAAT CTCTATTTAT
ATGATTGATG
AGTTTGGATA
CACCGTGCCT
GTTGGTTTGA
TCTTCCTCCA
CACGTTTCTT TGTCGGTGAT ATTATATTTG GAACCAAAAC TGCTGTGTAC GGGGCTTGGA AAGAAGCCTT TGACAAGAAA CCCATGACTG GGAATGGTGG AAA.AGGAGTT ACCTATGAAT 3000 3060 3120 3180 3240 3300 3360 3420' 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 TACCTTGTTA TTTCTCCCTA CTATCCACAA AACTTTCAAC AGTTTACCAT AA'rAAAGGCA TCACAGTCTT GGGAATTGGT CAAGAGTC TT ACGAGCAATT TTGCGCAATA GCTTGACCGA GTATTTTCGT GTTGATAATC AAACGTGCAG TTGCTTTTCT CTTTTATAAA CATGGTCCAA AATGAATACT GGCTTGAGCT AGACGCAACA CTCAGAGAAC AAACCAGAGG ATCTCAAAAA G;ACGAAATA'r AAGTCTGAAA GCAGGTGTTC CTGTGGTACC TGGAGCTGTT ATCAAGACGG CTGAAAGAAA TCGGTCTTCC AATGATTGCC AAACCTGATA ACCTTTAAAC TTGAGACAGA AGACGATATC AATCACTTCA
TTGAGAACAT
TTGGCCGCAT
AATTCAATGT
TGAAGAAACT
AAGCAGATGT
ATGGAGTGGG
AGCAAGAATG
CGAACTAGCT
GGATGACCCC
AGATGAAGTC
CGAGTCTCAZC
TT'rTGGTGCC
TTTCAAAAAA
TGATCAAGCA
AGCAGCCGCA
GGACCATTCA
TGACGGGCTC
TACACCGCTT
ACCCTTT-ATT TCTTTGAAAA ATTTGTCACT TCCAGCGAAA TCTGTACCTT GTGGACAAGG ATGGAAAGAT TGTCTTCTCA ACAACCTTTG ACTACGCCTA 458 GACCTCATGA TTATAAGAT GGACAATTCT TAN'ATGTGC TCAAGATAT CTGCGCAAGT ATGGGGAAGC AATTGTCAAA GAATTTGGTA TGAAAGAACG
GGATCCTAAJ
GTTTTCCA)
ATTGAGTTCT TCCGTGAGGG GGACGATTAT GGTGGTTA CCATTGATGT TTATAACTTT GCAGCTATTG TCGCAGGAGA GGAGTTCCCG GCTACTTCTC GCCGTGCAAA TGCTCACTAT TATAGCCAGC AGTTCAAGGT TAAAAAAGTC
ATT'ACCATCG
GCTCA'IrCCT
GCGTCAGACT
G~rrATTCAG
ATGCCAGCTG
GATTACCTGT ATATGCTGAC TTCGGACAAC GTCAAGAATA TGTTTTGTCT GATAAAAAAT ATAACTATT'r GAAGCAGGAT ATAGTATTGT GTTGCGTAT ATATAGTTT'r CAAGATACCA TAGGAAGCTA GCCGCAGGTT AGTCAGTATC ATATACTACG GTATAAAATA TTCAGGTGAC AAAATAATGA TATTACTAAG AATTAAAAAA TAAAATCGAT CTTTAACAAA TATTGAACAG AGC'rTGGACC TGAAAGTGAT AAGATAAACC TGAATACCGT ATCGTTTAAT GACTCAGGAG A'rACAGCCTT TTCACGATAT T'rCAAAAGGC AATTTCTATC ATAATGACAA ACAAAGTATT TGTCGACCGC AGGTAAAAAT TTGAGTrGAT GCAACCGACT GCTTC -rAGT ATCTGCTAGC CAGATAATAT CAATCATTTT TGT'IGAGACT TGGGGCGATG CACTCCGAGT CGACAAGAAA AGAACTATCG GATTAAGGAA AAGAGCATCC CAACAAGGTA TAGGTGGTCA GAAATTAAAT CCT'IAAATCA GCTAAAAGGA AACAAGTCTA TTAATATTCA TCTCAAAACA CTGTTTTGAG GCAAGGTGAA GCTGACGTGG GCATAGATAT AGT'rAATTGA TTTTAAAAAC TAAAGAAAAG AGTACAATAA CCGCCCTGCJ TGGACCTTTA TCGTrGGCrA9 TTGAAACrCA GTATTGTTT( AAGAGGATrT GCTTGCCAA; CCTTCGCGGA ACTTICAAGG; TGGAGCAGAT GATrGCAGA9 ATTAACTCCC TTAATCCTT9 GCTATCATAA AACTTGTI'C( TTTAATATTT CAATTGAGTC TCCATGACGA CACCTATAC( ATGAAAATCA AAGAGCAAA( GTTGTGGATA GAACTGACAC TTTGAAGAGA T'TTrCGAAGJ AGCTTTGTTT GAAATCTGAI GGAAGATATG ATTACAGGCC 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 CAGCTGTGGG AAATTCTTTG GACAGAAGGA AACGCAAATC TTGACTTATC TCTTATTTAT GAAAGATTTG GATAGTGTCG GCTGAATTTC TAGGGATTCC TTATGAGGGA GTTTTTCCAA TGGTCAACTT TTAAAAATAT AGGAGATGCT CAGGAAGTTT ATTTTTCCGT TTA'ITAAAAA TCTCAAGGGG GATACAGATG ATGCGAGA.AG CTATTTT'rCA AATAAA'rAAA CCTGCTACGC TTAGATGTTT TTCCAACTAG GGGATTAGAT GTAGATTTTG ACTGATATCG GAGATATCTA TGAATATCTG TTATCAAAAT GGACAGTTCC GTACACCTCG TCACATCATC GATATGATGG ATCAAAGATA TCATCTCAGA TCCCGCTATG GGTTCTGCTG CGTTACTTAA AGCGTAAGAA AGATGAATGG GAAACCAATA CATAATCAGA TG'N'TCATGG AAATGATACG GATACGACTA AACATGATGC 'rACA'rGGAG'r AGAAAATCCA CAAATCAGTT 459 AAGCCGATAA ATATAC'rTTG GTTTTAGCAA ACCTTGACTC GCTGTCTCAA GATAATGAAG
ATCCTCCTTT
TAAAAACCAA
GTGGACGAGC
AAGGAAT-rCG GTGGTGTGT'r
GTAATGGTGG
ATGATAAGCG
ATCTTGAAAA
TAAGCGCTCA CTTGACTACA AAAAACAGAA TTACTCTTTC AGCAGTTATC GTACCTGATG.
TCAGGAAATT GTAGAGAATC CAAGCCTTAT GCTGGAGT TACTGACAAA GTCTGGTTTT ACAACCGATI AGCGACAATG AGAAGCAGAA CGTCAGAGAA ATTCA6ACCTC TAATGACCTT TTT'CTCTTTT CTTGCGAACT
GTGTCCTTTTGGTTCGTCT
ATAAGCCTGA TGCTGTAATC CAACTGCCAT TCTCATCT= ACGATATGAA AGCCGATGGT ATATTCCAGA TATTATCGAA CGGATCAATC TTTCT'rTGTT
CTTGCAACCG
TTAAAACCAG
AAAGCTCATA
TCAATGCCTA
ACAAAAACTG
TTAAGT'rrGG
CGCITTCATC
CCAGTTGCTG
6540 6600 6660 6720 6780 6840 6900 6960 7020 AGATAAAGGA AAATGATTAT GATTTGTCTA TCAATAAATA TAAAGAGATT GAGTATGAAA AAGTTGAGTA TGAACCAACA GAAGTCATAT TAAAGAAAAT CAATGATTTA GAAAAAGAAA 0 a 0 0 00 000 TTrCAAGCTGG CTTGGCTGAA 'rTGGAAAAAT AAGTGAAGTT GGGGGAAGTC TTATCTCTAA AACAAACAAC TCTAAGCCAA CGTTATATTC TAAAATTCAC TGAAAGTTTA AATATGACTG WGGAtG.GAGC TAATGCAGGA ACAGTTGGTT TTACGGTCTT AAAAAAGAAT GAGCGATACA TCTT'TTTGGA AAGTAAATCG CAGTATTTAC ATTTAAACAA GAATATATTA CTTGATTTAC AGAACATTAT CTGTATTCTT AATACGATTA TAGATGAACT AAACTTGCTC GTCAAATCCC TATTTGAAAG CATTGATAAC TTATTTGATA CTAAATCAGA TGAGTTGTTT AGTGAGGAGT CTAAAAACGG ATT'rTCATTC GATACAAAC TTCGAAAAGG CAAACTTGAG CGTTATGATA ATG'rAGCGTA CTACGATGAA TTAATAAAAT TAATATTACG TCCCAAGACA CCAAATCTAA ATAATAATTA TAGTCGAGTG ATATCAGGAA TAAAAAAAAT ACTTCTCCCC CTCCCCCCAC TACTCAAGTA GGGAGGTGGC TGTATGAAAA AAAAAGGCAA GAAAGCCACT GTACTTGCTG AAA'rAGATGA TTTAAGAAAT AATAATAAT'r AAGCACTCCC AGATGATATT CTGATAGCAT ATGGATTATC GGGAGCTGTT GGTAGTACAA AAGAAAAAAT TATATCAGAT TACTTGGGAG GAGATCATTC AACAGGTGCA ACAATTCCTC AATTAGAATT GCTAGGTATC GAAGAACAAG AAAGGCTT AT TACTAAA.AGA AAATTTCAGT GATTTAACGA GATGTT'rGGG GAAAATAAAA 'rrATAGATGG TGATAGGGGC AAAAATTATC ACTGT?1'ATT TTTAAATACA A.AGAATGTTA AATTTATCAC TAAAACAAAG GATAAATTAC TAGTCTTGAC AACAAGAGGT ACTGTTGGAA ATAAACATTT ACGTATAAAT TCAGGTATGG ATCAGA.AATT TATTATCCAT GT'rTTAAGGA GTGCTCAGCC TCAGTTACCA ATTACAAAAT 'rAGCCCTCCA AAATGAGTTC GCAGACTTTG 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220
TAGTCCAGGT
GCcrGAAATT' TTTAGCTrTr 460 CGACAAATCA CAATTTGCTT GTGAGATAGC TA'rAAAAGTG TGGAGAAATA TAGTATAATA TAGCTAAACT ATTTGTTTAA AGTGAGAAAA AAATGGGAAA CTTTAAAAA ATGACGAATA TGAATCNTTT TCAAAACCTr GCAT'rGAAGc 8280 8340 8400 TGAGAATATG ATTGCTACAT CAACTGTGGC TACTGCCTTT ATGGCGCGTC GTGCTTTAGA GCAGGC'rGTC CATTGGA'rAT ATAGTCACGA TTCATATTTA GAAGCTCCCT ATCGTGCTAC TCTATCTTCT TTACTA'rGGG ATGATGATTT TAGGGATATC GTAGATTCTG AACTCCACAA GCAGATAG'TT CTGTTGATTC GGTGCGGAAA CCATGCTGCT CATGGTGGTG AAATTAAGGA ACGAGAAGCG ATTTTTAGCTT TGCATCATTT GTA'rCAGTTT- GTTAATTTTA TCGATTATTG TTACAGCA.AT GAGTTTGTGG AGCGTI'ATTT TGA'rGAGAAG TCCTTACCAC TTTCAGCAAA CATCAAATAC CGAGAAACTC CACAATCTAT GATAAAGTTA CAAGACAGTT TACCAGAACT GCCTGATTrTT CATGAACAGA TCGCTGCTCA GTCCGTAGAA GTTCAAGAGA CT'rATACTGA AAAAcarGAG ACTGCAGCGC AACGGCAAGA TGTGCCTTTC CATATTGATC A-TTCGA GGCAGAGACA AGAAAGCTCT TTATTGATAT CGATCTCCGT TTAGCAGGAT GGATATTTGA AGAAAACTGT CGTGTTGAGA TAGCCGTTGA .TGGTCI'CAAG CACGGTTCAG GAATTGCT'rA CTGTGACTAT GTACTTTATG GTAAAAATGG GAAAATTI'TA GCGATTGTGG AGGCTAAAAA AGCCTCTGTC AATCCAGAAG TAGGGGAAGTr ACAGGTCAAA GAATA'rGCTG AAGCTTGGA GAAACATATC GGCTATCAGC CAATTTGCT'r TATTACAAAT GGCTTGAAGC ACTATATACT TGATGGTCCG AACCGCCGCC AGATTGCAGG CTTT-TACTCT CAAGAAGAA'r TGCAATTAGT GATGGATAGA CCTCATCTTC AAAAACCGCT TGAGGATATT TCTAGTAAAA TTAGGGACGA TATTTCCGGG CGTCACTACC AAAAACATGC CATTGCAAGC GTTTGTGAAG CTTrCTCTGA TCATCGTAGA CAGGCACTTT TGGTTATGGC AACTGGGGCG GGGAAAACTC GTACAGCAGT TTCTCTAGTT GATATCTTAT CACGTCATA.A CTGGGTAAAA AACGTTCTCT TCTTACCCGA TAGAACTTCC TTGGTTAAGC AAGCATATGA TTCGTTTAGA AAATTACTCC CAGATCTTTC CGTTTTAAC TTCTTAGAAG ATAAAGAAGG AGCTCAATCA AGTCGCATGG TCTTTTCAALC TTATCCGACC ATGAT-rGGAG CGATTAGTGG TCAAGAAGAA GTAAATCAAC GCCCTTTCAC TGTTGGGCAT TTTGACCTTA TCATAATTGA CGAATCTCAC CGTTCTATTT ATCAGAAATA CAAGTCCATT TTTGATTATT TTGATGCAAG AATTGTAGGC TTAACAGCTA CTCCGCGTCA AGATTTAGAT AAAAACACCT ATGGATTCTT TAATrTTGAG AATGGGGTTC CAACATATGC ATATGATTTG GAAGAGGCTG TTAAAGACGG ATATTTAGTA GCCTATCATT CTATCGAAAC CAAACTGAAA CTACCTACGG ATGGTCTACA TTATGATGAT TTGTCCCAAG 
AAGAAAAGGA 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 ACATTr'TGAT AGCAAATT'rG TAATTCCTTT AT CAATA AGGAA'rTCAG ACAGCCTCGG AAGACAATAG CTGTGAAAAA GATA'rTGATG GGAGTGTATr AAAG'rACAGT AGAAATTGTT TTAAATGAAC TCATGACAAG GTGATGAAAT TGGTAAAACT ATTATI-rTG CTAAAAATCA.
TGATCATGCG GAATATATCA GAGGTAr'N'T TAACAACCGC 'rATCCTGAAA AAGGGAGCGA CTATGCTCAC GTGATTrGAT-r ATAGTATTrAA GCATTATCAG ACCTTGATTG ATCAT'TrA AATTAAGGAG AAGTATCCTC AAATTGCGAT TTCTGTCGAT ATGTTAGATA CAGGTATTGA TCGrACCAGAG GTTGTTAATT 'rAGTCTTCTT CAAGAAAGTA CGCTCTAAAA CTAAGTTTTG GCAGATGAT'r GGTCGAGGAA CCCGTCTATG TAAAGATTTA TTTGGACCTC AGCAGGATAA GGAAAACTrC TT'GGTATTT~G ATTATGGCGA CAATTTTGAT TATTTCGTG CAGATCCAAC AGATGGAGAG GGTCGTCACA TTGTTTCGCT GACTCAGCGT TTATTT'AATA TCAAAGTGGA CTTGAT'rCGA GAACTTCAGG GACTCCAATA CCAAGAAGAT CAGTTTGCGA GAGCATACCG TCAGCAGC'TT GTCTCGGAAC TTCAAGGTCG TA'rAGAGACC TrA.AATGAGT TGGACTTCAG GGTTCGTATG GTTT'rACATA CAGTTTATAG CTATAGGAAA TTGGAAAGTT GGCAGAATCT A.ACTGCTGTT ACAAGTGAAA CCATTCAAAA AAATCTCTCT CCGCTTTTAT TTGATGAAGA TAAAGAAGAT GAGATGGCGA GGAGATTTGA TT'TGTGGTTG CTTCATATTC AG'rTGGGGCA ACTGACACCT AAATCTTCCA CTGTTCATAT TTCCCAACTG ATGA-AGACCG CTAGAGCTCT TTCTGCTATT GGCAATATCC CGCAGGTT TGAGCAGCCT GAAATTATCA GGAAAGTACA GGAGCCTCAA TTTTGGAAAG AAGTTAACTT GTCTGATTTG GAAA.AAAT'rC GTCTTGCTAT TCGAGATTTA TTACAGTTTT TGGATAAAAC AGACCGTAAA. 
CCCTACTATG TTAACTTTCA AGATCGTA'rA CTCTCCACTG TTCACGAGAC CACAGCATTT TTIrGAGGTCA ACGATCTTCG GTCTTACAAT GAAAAAGTTG AGCATTATTr GAAAACTCAT CTGGATGAGG AGTCCATTTC TAAGCTATAC CATAATAAAA AGTTGACATC TGATGATATG CTTGCACTTG AAAAATTGCT TTGGGAAAAA TTAGGTAGTA A.AGCAGACTA CCAAAGTCAT TATGAAAATA AGGCAATTCC GAGATTGGTr CGTGAGATTA TTGGCTTAGA TAGAGAGTrCT GCCAATCGTA 'ITTTTCTAA A'rTTTGTCG GATGAGAATC TTAATGCCCAG GCAGATTTCA T7rGTAAAAT TGATTCTAGA CTACATTGTA GAAAATGGTT TTT'rAGAGAC GAAAGTGTTA ACGCA.AGAGC Cc'rTAAArC TTATGGTTC'r GTTCAACTAC TCTTCCAACA CCAACTACCA GTACTTCGTA ATATTGTTCA AATCATTGAA CTTATCAXI'A ATCGAGCTGG AGAAGCGGCT TAAATTCTAA AGTGATTGCC ATGCTGAGAC TCATTTAAAA TTAAAAAGAG TAGAAATTTA TGCTATATAT GACAAGTr'rT 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 ATAGGAAGA ATGTCATCGT TTTCAAAGTA GATACTTCTA 'NTGlrAT?1-rA AAGAAATAGA GAAATAATAC TCAATGAAAA ACACTGTI-rT GAGGTTGCAG AAATCTTCCT AGGATAAAGC GCGTTTrcTc ATGTTAAAGA GTCATTTATT ATTCTTCAAA TTTTGAGCAG TATCTGCATC GCGATAGCGT CAGGGAACCT GGAAGGGTTT TTAGTAGGAC 462 TTTCCTAGAA TACAGTATCA GTTG-rAAGT GGI-rGATAAA CCACGATGTT TG?1'GATCGA GTTA'rAACA AAAGAGCTAC AAACAAAAAG CCGAGCAAGA ATTCAATTGc AGGAGAAAAT TCAAAGAGCA AACI'AGGAAA CTAGCTGCAG ATGGAAGCTG ACGCGGATTG AA7=ATTTT AAAACGCATA GTATCAAGGG ?I'TrCAACAC CTTTCTACCA GG~rTTTTAA AAGCATAAT
GAAAAATGGT
TTCACAGATG
CAAAGTATCA
TTCGTGT'rGA GGGGCGAATT TTTTCAGTTC
ATAAGACAGA
ATGATAGAAC
CATCATTACC
CAAAGGAT'rG
GCTGCTCAAA
CGAAGAGTAT
TTGATACTAT
GTTAGT'IGTA
TTCAAAGCAC
ACAAAGGGTA
AGCCAGTCCA
GGCGTCTTCC
AACATAATAA
TGAGAGGGTT
CTGTGTATGA
TTTATTCATA
CGACAGTCAA
ATGTAGAGTT CGAGACGTTT TTCCCATTTT GCGCTATCTT CTTCGCGGAC TTTTGATAGG GCCTGGGT'rA CTTGAATGTC GTTCTCAGCA ATCTTGT'rTT TTGTGATAAG AGCGCGTATA ATAATGTAAC TCCTT'rTAGC AAGGTAAGGT GtdGTiACTCT GTA'rTGTCAC GGATGGTGAT GAGAGTGTCT CCTTCTTTTA GCTTGTAAAT CCATTCATCT GCAGTAATAT CCTCTACTAA GTAACCTTTT GTCACGACTC GTTGTGCCTC CAACATGTGA TAGCGACCGT TGTCAATGGT TAGGTATTTC TTTTGCCCAA ATTCTAGCAG GGATTTGATA CGAAGAT'rGA GCTGAGCGCG AAGGTTAATA GTAATCTGAT 'rGGCTTTCCC TGTCTTACCA TAG~tGCTCAA AATCCTTATC TTGACCAAGT GAGAAGGTTI' CACAGTTTTC AGTCTGTGGA GCTGTATTGT CCTGCCAGAT ACTGGGCGCA TCCAGACAAG GAGATGGAAC CATTGTTAAG TTCATATTTT TGATGTCGCG AGAAGGGCTT GCAACTCAGC AGTTGGTGGC GGTGTTCTCA AAGCATGGAC TGAGCGAGGT 11820 11880 1:1940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560
TTCAAAGTCA
AGTTGGCTGC
CAGTAAATCA
TGGTCTAAAT
TCTAGCACTT
TTGGGCAGAT
ACCGTT'rAGG GTAGTATAGA GGACTAAACG AGTTCAAATT GAACGTCCAT TT1TCTATTTT GTAAATTAAG GGCAATTCAC AdAGATTTTC AAAATAGCTG GATAAGGTTG AAGAGCCCCT TGTTTGTAC'r TGGAGATCTT TAGTCACAGG TTGATAGAGC TCTGTATTGA AGGTTTGGTA TGGGTACTGG TTTrGAATAG CT'rGCTCTTC TTGCCCACCG AAGTT'ATCAA GTGATAACCA AACAGTAGGA AGTTGAAAGT CTGTT'rCCTG TATGGACTCA CGGAAGTCAA T'rGAT'rGCCA TCCTAGTAAT
ATTGTTCATG
AAGAGCATG
TTCrTGGTCA ATAAGGCATT TAA6ACATGGG CACCATTATG AACATCTGGT AAACATGAAG
GAAAAAGAGA
TGGTTTGACA
TGCTTGTGTA TATGAGTAGG TTCCALATC!CT GAGAACCATG 0 *0.0 .00* *0 0* 0 0 0 *0*0 000* 0* 00 0 0 0 463 AGTAAAGACA ACCTCTGCCT I-TACTTrTATG GGCATTGAGC AGATAATTGC GGTCATGCCA AAACTGATTG TAGTCCCCAG T'1rTrCGGTC TAG-CTGAGCT T1'CACTTTTT CTAAGTCAGC TTGGTGAGCT TCM'TTGCCAC GGATATAGTC GCCAGC'rAAG AGATTACGAG AATAGGTTAA CTCAGCAAGG GAGTCAAAGT CCTCACCTGG ATAACCACCT GCGCTAGTCA CCAGACCGTT TTCACGGTAG TAGTTGTACC ATGATGAAAT TCCTGCCTCG GCAATGA'rAA CTrCTAAACC ATCGAC'rCCT GTAGTCGCAA GACCATTGGA CATGGTACCT AGATAGGAAA GTCCTGTTGT AGCAACTTTT CCGTTTGACC AATCAGCCTT GACTT'GACGC 'EGGCGCGTGT GATCAGTAAA GGCACGGCAA CGACCGTTAA GCCAATCGAT GACATTTTTA TAAGCCTCGA TTTGCTCGTA GTCTCCATTA GTCATGAAAC CTGTCGAGTC T1TTGGTACCA ACACCTGAGA CATAGAGATT GGCAAAGCCT CTCGGAAGGA AGTAGTCGTT TAGTGTATAG CTAGAGTTGA TGTGAGTrAG CTTTTCCTCA GCCTCTGCTA TAAGCTCAGC TT'rACCT'rGG GGTTGGACGA GATI'TAGTTG AGGTTTCTCT AGCTCAATCT TGTGAGGAAG CTAACCTCA AGCTCGCCCT CCATCTGTA GAGAGCCTTG TCACTAGCCT TGTCATT'GGT TCCCTGATGA TAAGGGCTGG CTGTCATGAT GGCAGGGATT TTTCCATCAA AACGAGGGCG AATAATGCTA ACCTI'TACTA GGTCTGATAG CCCT'TTTTCG TCAGTATCGA CACGAGACTC AACGTAAACG ACT'rCACGAA TGACATCCTG GTTAGAAAAA GTAGCCAAAC TCTTGCCGT'r AAAGTAGTG.G TAGTCATTAT CCTCCGGAAT AAGACCATCA CTAACAAGTT GGTCGATAAG AGTAT'rTCC'r TTTTTGGTGC GAGTAT'rGAG 'rAACTGATAG AGATTTTCAA TCAAGTCACC ATATATAATG GGAAATCCAG TTTCTrTACG AAAAACGTCA CTATCTTCGA AGTCAACCAA ATAAGAAAAG CCTAAAAGTT GAAAAGCAAC AGTATAAAAA ATATCTGCTG TCAGT'rCATC TTCTGAT'tGA AAAAATGTCA GCAGGTCTGT TTTTT'rATCA GCTGCTAGGA TAGAAAGTGG GTAGTTGGTG TCTTGATAAG TGAAAAAGAA ACGACGTAAA AAGGTTTCAA GTGAGTCTTT GTGATTGGCT GTATTTTGTA AATCAAAGCC ACATTTTTTT AGTTCAGATA AGACATTTTC TTTTGGAAAA TTGATATAAC 'rA'ATTGATT AAAACGCATA GAACCTCCAT ATAGAATGAC AGTTAAGGTT ATTATATCAA AAAAAAAGCA GAAAGGGAAT TGTT'AACTTC AAAAGGAAAT AATCCAATAA AAATGAATAA AGTACTAAAT TCAATATAGA GAACAGAGTA ACAATAAGAA TAAATAGATA GGGTATAAAA GTTCTAGGAG ATTTATATTA TATGCT 'TCT ATTTTTATAT ACAATATAGT ATAAATATAA AAATGATGAC AAAALATACAA ATGAATAGAA AATAAATTAG TAAGCTGATG AAATTTTTCT CAAGAGAAGC CATTTATAGG TGAAAATGGT 
ATAATATAGT GAGAAGGATA GAGGAGAAGT GTAAA'N'GAT 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520, 14 580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 464 CGCACAACTA GATACAAAAA CAGTCTATAG T'N'TATGGAA AGCGTCATTT CGATCGAAAA 15360 GTATGTGAGA GCAGCTAAAG AATACGGCTA CACTCATTTG TCT'rTA'rG~C CCTTTCGACT TTCTAGAGAT TACAAAAAAA AGGGCTTGAA ATGACAGTGT TTGTAGATGA TCAGGGAGTG ATCTAGTGTG GGCTATCAGC AGTTGATGAA GCTTTCGACA AACTTGGTCA GTCC'rGTCCC AGTACCTGGA GGATATCGCG TAGAGTTGAG TCGTTAGAAC TAGGCTGTGA TTACTATATA AGCAAGCGAA TTTCATCATC CTATCTTACC TCTTTATCGG GGATAGAGAA GT'rCTTCAAG TTTTAACAGC GATTAAAGAA TCCC'TTGCGT TCGAGACAAG ATGTCTTTAT ATCAGCAAGT AGAGCGtTT CCGCAAGCTT TGGACAATTT AGAAAAGCTI' CTTGGATACT AGTCTGAAAC TGCCTCGTTT TAATCCAGCT GAGAGAGCGT GCTGAACTGG GGCr'rGTTCA GAAGGGGTTG TAGACTAGAC CAAGAATTGT CTGTTAT'rCA TGATATGGGC TGTTTGGGAT TTGTTGCGTT TTGGACAATC GAATGGCTAT TTCTGCAGTA GGCAGTTTGG TTTCTTATGC CTTAGACATC GAAAAATCTG AII tlT GAAC GCTTTCTTAA TCGTGAA6GC TATTGATATC CCAGATATTT ATCGTCCAGA TTTTATCAGA TAGTAAACAT GCGGCACAAA TCGTTACTTT 'rTCAACCTTT AGATGTCTTG AAACGCTT'rG GTGTGCCAGA GjTATGAATTA CAGTTTTCGT GACAATCTTA AGTCGGCCTA TGAGGGAAAT CAATAGTAAG TTAGAATACC AAAAAGCTTT TGAGATTGCT AAGGCAAACC TCTGTCCATG CGGCTGGTGT TGTAATtAGT CATTCCTCTA AAGTATGGTG ATGAAA'rTCC ACTGACTCAG GGCTAGCGGA CTTTTGAAGA TGGACTrTTCT GGGACTACGA GATGCAAGAG T'rGCTTGCTG AAACAGAAGG TATTCATCTG AGAAGACAAA GAAACGTTAG CTTTATTTGC CI'CTGGTAAT TGAGCAACCA GGTGCCATTC GTCTGCTTAA GCGTGTGCAA CGTCGCGACT ACTTCTCTAA ATCGACCGGG TGCTAGTGAC GCTATGATGG ATATTGACAA TACGGCATC ATCCTTTGCT AATTTGCGCT TTTTAGCTCT GCCAAGATGC AGGGGGAGAA GTCATTGTGC CTATTTGA GGGGTTTATC CAGAAACACT GTCAACGCTT TTGAAAGCAG AATCTACCGC TCAGAGAAGT TCTTTAGAGA AACTA'N'CCA ATTTCAGGCA TTTCTTACGA AGACCAGCAG TAGAGGAGTT ACTAGTAAAG AATATCAAGA TTTGATGATT ATTTCTTGGT TATATGGGAA TGGGAAGGGG ACGGGGATTG ACCCAGTAGA 'rTA:CCtATGiC CT6ATATTGA TATGTTGGTA ATAAATATGG GGAGCCAAGC AAGCTCTTCG TCTGCAA'N'A CTAAGAAAAT CTCCAGTTTC GTCAGCAAAT TGCAAGATAG 
AGGGCTATCC GACCAAGATT TAACCAACTA TATGATGCTC ATGGAGTTGA AATTTGACCT TTGTCCACAA AAAATTGAAG AAATCGATTT ACAAAAGGTA TCTTTCAATT CCAGTCTGTT TTGAAGATGT TATATCAATA ATTTTGTGGC 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320' 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 AAGAAAGCAT GGGCAGGAAG AAGTGACTGT TCTGGATCCA GTACTGGAGG ATATTTTGGC 465 TCCAACCTAC GGCATAATGC TCTA'rCAGGA GCAGG'TTA'rG CAGGTTGCCC AGCGACTTC CGCATI-rAGT C7"TGGGAAAG CCGATATTTT CCGTCGGGCT ATGGGGAAAA AGGATGCCTC TGCCA'rGCAT
GGAAAAAGCA
GTCACACGCC
'rCCAGCCATT
ACTTGAAGCA
AATTGCCA.AC
AGCTCTCTGG
ACCTGAGAAT
TTCATTTGAA
GAAAGAGTTG
GAGATGAGGG C'TTCCT'rlAT GAGCAGGTCT TTGATGI'TAT TATGCCTACT CAGCCTTGGC T"TTTATCAGG TCATGTTAAA GGTITGAAG TAGCCTCTCT AAGGCCATCT ATCTAGGTT'r ATTATTGAALA ATAGACCTTA TATCTGAAAC TTCCTCTGCT AAAAATCGTC AAAAAGTATT GGAAGTTTGT 'r'GGAGATGC TCALAGG7"rCA TTAAZLG= GTCATACTGT GGAGAAGr'r GCAGGTTATG G1TTTAACAG CTTCCAGTTG GCTTATTTCA AAACCCATTA TrcrCCAAC AGTGATTACT TAATAGATGC ATCCATCAAC ACCATTCCCT? ATCACGATAA GAAATCCATT AAAGGAGTCA GTAATGAT TTCTAACATT GAAGArA TACCTAAATT AGAACCTTTG GTAAAAGTTG GTCTTTTCGA TAATAACN'A GCTAATCTAT I-rGAATTrT TArrTATAGT TGGCAGGAAT CGGAAGATTG GACGGAACAA GAAAALATTTT ATATGGAACA AGAGCTTTA GGGATAGGTG TCACCAAACA TCCACTACAA GCTATTGCAA GTAAGGCTAT T'rACCCGAT'r ACCCCAATCG GAAATTTCTC AGAAAATAGC TATGCTATTA TCTTGGTTGA AGTTCAGAAA ATAAAAGTGA TTCGTACCAA AAACTCAA AATATGGCCI' TCTTrACAGGC AGATGATAGT AAGAAAAPAT TGGATGTCAC TCTCTTTTCA GACTTATATC GTCAGGTTGG ACAGGAAATA AAAGAGGGAG CCTTCTACTA TGTAAAAGGA AAAATACAAT CACGTGATGG CCGTCTGCAA ATGATTGCAC AAGAAATAAG AGAAGCAGT'r GCTGAACGCT TTTGGATACA GGTGAAAAAT CATGAATCGG ATCAAGAAAT TTCACGCATT TTAGAACAAT TTAAAGGCCC AATCCCAGTC A'rCATCCGGT ATGAAGAGGA ACAGAAAACC ATCGTTTCTC CCCATCAr-TT TGTAGCTAAA TCCAATGAAT TAGAGGAGAA AT'rGAATGAA ATCGTTATGA AAACGATTTA TCGCTAAAAA 'rACGGAA.AAT AGAAGAATTT TCAACGTAAA TGTGGTATAA 'rCAGTAAGAA 'rGTT'AAAAGA AAAAGGAGCA TAACCAATAT GAAACGTATT GCTGTTTTGA CTAGTGGTGG AGACGCCCCT GGTATGAACG CTGCCATCCG TGCAGTTGTT CGTCAAGCAA TTTCAGAAGG MATGGAAGTT TTTGGTATCT ATGACGGATA TGCI'GGTATG GTTGCCGGTG AAA'rTCATCC CCTAGATGCA GCTTCAC'rAG GGGACATCAT TTCTCGTGGT GGTACTTTCC. 
TTCACTCAGC TCGTTACCCA GAGTTCGCTC AACT'rGAAGG GCAACTTAAA GGGATTGAGC AATTGAAAAA ACACGGAATT GAAGGTGTAG TTGI'ATCGG TGGTG.ACGGA TCTTACCACG GCGCTATGCG TTTGACTGAA CATGG=TCC CAGCTATTGG 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 TCTTCCAGGT ACAATCGATA ACG.ATATCGT AGCGGTTACr ACTGCCATGG ACGCTATCGA TCGTACTTTT GTAATCGAAG TTATGGGACG TATTGCAACT GGTGCTGATG AAATCATCAT CGTAGCAAGC ATCAAAGCTG GTTATGAATG TGAAGGTGTG ATGTCAGCGG CTGAATTTGG CGACCTTCGT GTAACAGAAC TTGGACATAT CCGTGTTTTG GCGTCACGTA TGGGTGCAC.A TGGTGTTGCG GTTGGTATTC GTAACGAAAA AGAAGAAGGG GCATTGTTTA GCCTTACTGC 466
TGGTACTGAC
TAAGATTCGT
TTTACAATCG GTTTTGACAC GATACATCAT CAAGTCACCG TAACGCTGG'r GATATCGCTC TTTGGGCTGG CCCTGAAGCA GGCTTCAAGA TGGAAGATAT TGG'rAAAAAA CACAATATTA TCGTCTTAGC TCAAAAACTT AA6AGAAGCTG GAGATACAAG TCAACGTGGT GGTTCTCCAA CTGCGCGTGA TGCTGTTAAA. CTTCTTAAAG AAGGTATCGG AATGGT'rGAA AATCCAATTC TrGGTACTGC AGAAGGTAAG ATTGTGGTTA ACAACCCAGC 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19446
TACAAA
INFORMATION FOR SEQ ID NO: 52:
SEQUENCE CHARACTERISTICS:
LENGTH: 16593 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TCGTAA.ATAT GCTCTGTTTT ATAGAAATAG GACCACACAT TTAAGATAGC CGTCTTTCGT GCATAGTTAC GGAGTAAATC TGGATTTTGT TTCTTAATCT ATAGACGGTT GCATGTTCGG ACTGTCGATT AGATGGAGTT TAGGTAGACT GCATTT'rCAT GTTTGGCAAG TGCCTTCATC GCACTCTTT TTGTTCAA.AA CAAAATTAGG ATTTTTCTGA CTCCACGGAA GCTATAGTAG AAGAGATGAA GGGGGTGATC AAGTGAACCT GTTTATCTAA AATAGGATGT CCAATACCTC CAGCAATCCA AACC'TGATTT TAAGCTCTGT CTAGGGTTAC TTTGCTGCCG TGGTCGCCTG AAGTTTTAAC AGTAAAGTAA AAGGGATGCG GAGCACTTTC AAAGCCTTCT GATTGATAGT TGAAAGGTCT GCTAAGATGG CGTTTGAGAT GGGTAATTTT CCCTAGATAG ATATAAAAAC CAGCTAGTAA GCCTAAAAGG TTAAATGTAA GGAGACGATT GCCCATTATC
TCACGGATGT
TCTCGTCCTT
GCTTGAAGAT
AGAGTTTGAC
TGGAAAATCT
ATTTGAATTT
GGGAAGGAAA
GCATAGCTAC
ATGTAGATGT
CTTCTATGAT
TATCATAGAT
CATGACCTCC
TTAGAAAGGC
CTCTAGTATC
TCTTTTGATA
CAACAAGAAA
GAAAGAGTCC
CATGTGTCCG
ATTCTTGGTA
TGAGATAGAA
AAATTGTCCT
GTGA'IrTAAG
TAGAAAAATG
ACTTAGAAGA
TAAAATATAG
GCTAGGTAAA
TAGG.CGACAA
AAGCGAGAGC
AGAAAGGCTG
AGTAGTGGGA
GGAATCATGA
ATGATAAAGA
TT'rGTTGTAA
CTATAAATAT
467 CCAGGCGGTG AA'rCCATCGC CAAGCTTCGT ATTGATGTA TTTGCCTAAA GGATCATGCT GGCAAAGATA TAGATGGCAA GATTGCCAAA CTGAGCAGCT CCCACAAACC GCCCATACTA AAG'N'ATGAA AGATTAGTAG GA'rGATTGAG TGAATTTGTG GACGGTGTAG ACCTTCTCCA AACTGTGAAA CCACCTTTCT GACGAGTGGC TAGGATAAAA GTCAGAGATA GGCTTGTTAA AGCTrAG'rCCT ATTGGGGAGA AGTGrrCATC CAAGTCAAAA GAGTCAAGAT AAAACTAGCT GTAGTCCTTT GACTGATTTC ATAGAAAATT CCATrrCATT TAGAT'rTCGA ATAAATTTGT TACATTTTAT CATAGAAAAT GTATGGTGTC AAATTGAGGT CTACTCTCAT CAAAAAACTC TCCAATTGAA CTGGAGAGTG GCTGTTTATA ATCAAAGAGC AAACTAGGAA GCTAGCCGCA AGTTGCTCAA AACACTGTTT GATAGAGCTG ACGTGGTTTG AAGAGA'rTrT CGAAGAGTGT TArrCTGCAG ACGTTTGGCT AGCATATGAG ACAGGCTAGA AAT'rCCTAGG TTAAAGCTGA GGCAATCAGG ATCTAAAGAC TGAAGACCTG CTCTGGTTCG AAATAACGGC TTGGCTGGCT CCAAAGAGTT CTTGTAGGGC GATA.ACAGAG TAGAGGAGAC CTrCAATGAAA TGAGGT'rGCA
CTTGTTGCCA
AGTAGATGAG
CCATGAGAAT
TGGTATCCTr AATCACGGTA ACAAACTGAG AAATGATGGC TGGTAGCATT C7rGTGGGAG AA'rGATGTAG CTTCGTACTG TCCCTTGTCT CTGATGTAAA GAGAGTAAAG TAGAGGATTT GGGCTGAGGT GAAGCCTTGT ACGGCATTGA GACCGCCTCG AATAATCTCA GCTGTAATAC CTGCTGGTGT GGATI'CATT
TTGCGGATGG
GACATI'CCTG
GCCAAGGCTG
TTGAACACCA
840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 AAAAGATAGT AAAAATCCAG AGAAGGTTGG GAACGTrGCG TGGAA.ATAAT GCGTAAGACA GGATTTTTGC CAT'rTCTCGT TGATAGTAGA GAGGATGATG GCAATCAGAG AAATATAGAG AGATAAAGAC TAGGTTATCT GGGGTTAAAA CTTCTAAAAT AAAGTGAATA GGCTTTTTTG AGCATAGAGC AAAGTAGAGA GAGCCGACCA AGACTTAGTC AGGTGTTCT'r GATGAGGTTA CCTGAGGCAA GATAATCAAG CCTCCATCTG ACCACrAGGA CGCCGTGATA GAGTCCCACG
TTGGCTTGCT
AGAGCAGCAC
ACAAACATCA
ACAATTTGGT
CGCATGGCAC
ATAGACTGAA
CAGAGAACGG
CCATCTTGCG
CTAAAAAGGC
AGTCTACTCC
TGGTCAATGG
TGATATAGGT
TCCCTGAACG
CTGTCCAATA
CACAAACTCG ATATAAATAC GACAGCTAGC ACCGTACCGA GGTCAAGCCA AATCCTTTAA AGATTCCATA GTAACCTCCT ACCAAACTGG GCAACAGGGA TGGTATATAG TTTCCGT'rGA AGAGATGATA GCTACAGTAG AGGGAGAATG ATGCGGAAGG AAAACCTTGC GACAAGGCGG AATAACCTCA GCGATATAAG AATTGGAATC ATGATGATAT GGTCACTGAT AAGAGGTAGG TTTGGTAAAA TTCAACAAAG ACATGCCACC AAAGAAGATG CAAGGGTGAA GAGGAAACCA ATGATAAATC TGTCATGGGG TGTAACGGTC ATAAAGTTTC CAAGATAGTC GTTGAGCTCT 468 CCATAAAAAA CAATAACAAA C'TGCACCAAG ATGCGAGCTA AAATGCGTAA AATIGGACG'r CCCAAP.ACCA TACCGAGGAT AAAGGAACCA TTGAAAAA'rT GTCCAAAATC CTGAAAATAG TGTCCTCCTrT AATCTGCAGT ATGGCTAGAT TGCAAACTAC CATCCTTGCT CCATI-rAGTA
AGGGGAGTAT
TTACTGGTTG
ACCGCTAGGG
GCTCTCCAAG
GGTTTGAGCT
ACCAAGTTAT
TGAAACTATC
CAACGGAAAA
CAAGTTCGAC
GGGTGATAGA
TTTTGGCAGA
TGTAGAGTTT
CATTGTCTAG
ATCTAGTAGT
GGTATCGATA
GAA'N'TAAAC
GTATTTGATT TCTPCGTAAC GCTGTCCGTT TACTAGTGTA CGATGACCGT GCAGGGAAGT TTCAGACCTT TCTr'rTTACC ACCTTGGGCG ACTCCGATGG TTTTGCCGTT T'rTAT'rGACC AAAAATCCAG AAGCGTCTGT TTTGCGTTCG TCCGTGATGG TAAAGGTCGC AAGGGGGCCG CGGGTTTGTG CTGTAACCGG AATACCGTAG TCAGATGGCT GCCAGATAGA ATAGAGCGGT AATCAATTCT GGGTAGGAAC CAGTTCAGTA ATCAGGCGrT TAGGTCCTCA ATCTTTTTGA CTAGTAGGGA CTGGTAAACT GATATCCATA TCGACCTGTr CACATAGCGA A'rCTTGACCT ACCAGAATAA GTACCGGTCT ACCGACAACC AG'rTCGCCTC TTTGGCAGCA GCAAGGCCGA TGAGTTCATC AGCTACCATC T'rGGCCAAGT TGGGATCTTT GTAACCAAAA TTGGGAACGT TTTTTTGAAT GTCTGCGATA CTTGTATCAG AAAGGCTAAT CAATAATGCT GATAAAAAGA CTTTGTCACT TTCGTGG'rTG ATAATTTTGC CGGTITrCGAT
CTTGTTTGAC
CCTGGAc&PGG 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 GATTGTCAAA AAAGTTATCG AGATAATGCG GTCCGCAACC TCATCCCATC ATGCGCCAGT CAAGAGCAGA TGTTGGTTCA CGATGGCGAT CCGCTGTTT CCCACATATT TACAAATTCC TTCCTAGAAC TTCAATGGGT GGTTAAAATG TTGAAAAACC TGGCACCAGC AACTTGGTGC CATTGATCGT ACGGATAAGA CTTGTCCTTT TTCAAAACGG
ACATCTGTCG
TCTCGAGCAA
TTCTGCATAA
TCAAAGAGGA
ATTTT'rvrTC ATAGGCGCCT TGAGGAATTG 'rTGGGCACGA TATCTAC'rAA AAC=CTCCG AGCCCATTTC GTGGGTAACG CrGCTAGAAC A'rCTCCGATA GGAGTTCCGG ATGCATAGCA CCTTATrTGA
GGTTCGCTTG
TCGGCCATAA
ATGATCATGT
GTCTCAGGAT
AGACCACGAG
TCTTTCTTGT
TTTT-rATCAA
TGTGGATAAA
TCTTTCTGGC
GTCTCTAAAC
TGTCCACCAG ATAGCATGGC GGGATAGGAA AGATATTTTT GGGCGG'T'T TTCAGCTTCT GCAAGCGTTA CGTTTTCTAA CACAGCTTNG A'rGCCGACI-r CCTTGCGAAG AGGTACCAAA CCATTGACTA GGAGACTTCC TTTGTCAACA GTGGACTTCC CAGAGCCAGA AGGTCCAAGC AGGACAACAA ACATTGATGT TGCGGAATGC GTGGTAGTCT CCGTAATATT TT~TCGACGT'r T1rTAAATCT ACACGGTTCT ACAATAA6AAG TGAAATAAGA ACAGGAAAAA AAGTAGAGGT GTACTATTCT GATAGAGAAA ACGTCTAAAT TCGTC'rrTTTG ATAACACCTA TTTTACAAAG ATAGCCAAGA GAGAGGGATT CAGAACCAAG ACATGATGCT TATCTGCTCC 469 ACTAAAGCCA TGAGAGATCT CTATTGTGTT ATATTTTATA AATGTTCTTG TCAAATCATA TCTGAAAAAA TTCACTATAG TCGATCGGGA CAGTCAAA'rC GATTTCTAAC AATAT'N'TAG AG7rTCAATA TACTATAAAA TGTTATAAAA AAGCAATCTG CATGTTATAA TGAAGCAATA
CTTATGAATG
AGGCTGGACT
AAAGTCTGAA
ATGATATGGA
TGAAGAAGGG GAAAATATTC TTGTTTATGG TTCTArTTTG AAGGAAAGTT TGGAACAACT
TCGTT'ITACC
AGGGATTTCC
GGCTCAGTCT
GCCAGATGCT
GGAATTGCTT
AAATCGTATC
GCAAGAAATG
TTTCCAGATT
GATGGCTATG
TT-GATTGTGA
ATGGGAGTAG
TATGCTATTG
TGTGGAGTTG
GATTTGGTCG
GCCCTA-ATGC
CGGTGGACAA
ATGTCATTGT
TCCATCCTGA
CTTTCAAGTT
CTATTGGAAC
GCAGTTTGCC
GGGTCCTGAG
GAAGT'rrrTA
CAAGGCAGTG
AGACTATGAT
TG4GTGCTGAG
TAGTGTTTAT
TGGGGTTGCT
GACAGACCAT
ACATCCAGAT
GGCTTGTGCC
TATTGCAGAT
AATGTTGGGT
GAATTCTTAG AAAGAGTGGA CTGCAGGTAG AAGATGCGGA GTGGCTCGGT TATTGTTTGA GAACCTTCCT TGGAGGACTT GAGCGGATTC GTCAGGCTAT GCGGATGGCA TGACTTCGGC TGCCGAGTTT ACCTGCCAAA AAATACTTTA TCGAGCAAGA GGTCATGAGG CTATTGCATT CATTCCATGC CTGAAACCCT GCGGATTATC CTTTTAAATA CTGTTAGAAG AAGTGCAAGT ATGGTGAGTC TGACGGATGA CATACCCAGC GCATTGGTCT 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 TTAGTTCAAT ATGGTCTGGA CTGGACATGG CTGGGATTGC GCTCCTCGTT TGAATGCCTT TGCCAACGAA GTAACAGAAG AAACGGTTCG GGGTCGCTTG GATGATCCCA ATCCTGCCAT TGATTTGTTG ACTGGATTTG ATGATGAGGA AGCGCATGAG ATTGCCCTTA TGATTCACCA GAAAAACGAA GAGCGCAAGG AAATCGTTCA GTCTATCTAT GAAGAAGCCA AGACCATCGT GGATCCTGAG AAGAAGGTTC AGGTCTTGGC CA.AGGAAGGC TGGAATCCTG GGGTTCTAGG AATCGTGGCT GGTCGTTTAT TGGAAGAATT GGGACAGACA GTCATTGTTC TTAATATAGA AGACGGTCGT GCCAAGGGCA GTGCTCGTAG TGTCAAGCG GTCGATATTT TTGAAGCTCT GGATCCCCAT CGAGACCTCT TCATCGCCTT TGGAGGTCAT GCAGGTGCAG CGGGTATGAC GCTGGAAGT'r GAGCAACTCT CAGATTTATC TCAGGTTTTG GAAGATTATG TTCGTGAAAA AGGTGCAGAT GCTGGTGGCA AGAATAAGTT AAACCTAGAT GAAGAGTTGG ATCGAGGC ACTTAGCTTG GAAACGGTCA AAAGTTTTGA ACGTTTAGCT CCTTTTGGAA TGGATAATCA GAAACCTA'rT TTrTATATCA AGAATTTTrCA TAATGCCCAT CTAAAGCI'GA AAATTTCCAA TGGTCAAGGC AGATGGGCGA CAGAGTrTN'C ATTGTCTGTC AACCAATGGA ATGGCCAAAC AGTGGAAGGT GTI'CAACTTT TTAACATTCG TCCAGTCT'rG GATTTTICCTG GAGAACTGCC AAAAAACATT CCAGAGGATA TTACTCAGCT TGCTGTCTAT TTCAAAAATG ATATTGACAA AGATCAGTTT GCCAAATTGT ACAAGAC'rAT CAAGCTGAAA GATTITGGCTG CATATCTTAA TCAAGTrA'rT GAAGAACTAG GCTTTGTGAC 470
GGTCGAAAGT
GGGTGAGGCG
TCAAACCAAG
GCTCGTACTA TGGGGGCAGG AGTTTGAAG TGGTAGCC3-r AATCTAGAGT TAGCGGT'rAA TGCCCTCCAG TTGATGATGG TGGATGCGCG TGGAAAAAAT GCAGTCTTGC CAGAAGGTGT AAATCTTGCG GCTAGTGAAG CTGTTGTCGT GAAGACCATT TT'rCAGGAAC AGCATTTCTC CGCTTATTAT CTGACAGGTT ATGGGAC'rAG TTACCAGTTC CCAGAGTTTG ATATTCGCTA TAT'rCAACAA ATCTTGCTGG TCAAGATGAT GA'DAAXAGAT GGTGTGArGA CAGTCAATAA TCAAATTAC CAAAATCTCA AACAAACCGT GGTGCAAGAA ATTTATGATT TTTTGATGGA AATCAACTCT TTTTTGAAAA CAGACCT'rCA GGTAGGAAAA GATTCGGCTG AAAGTATCAG TATAATCAAG ATAALACTAAG ATTTTGGAGG
AGAGGCGCCA
TAAAGACCAA
AAAAGAGTAG
TTTGAAAAT
AACT1'TTAGA
AAAAATGAGT
GTACATTGCT
AAATGAACAA
CGACCGTATT
AAGCGGGAGA TAGGAGAAAG GAAATGATGG CGCTGGGTAC AAGTTAGGAA AGAGTTGGGA CATCAAAAAA ATGGTATAAT ATAAGAGGGT AGAATTGCCC AATATCAGTT TAACAACACT GAAATTGGAG AGTCCATTTT TNAGGGGTCG ATGTGGTGAT GCTGGGGTTT TCTTGACCCA 6120 6180 '6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860
TGGTGGTGTG
TGTTTTGAAT
'rCCAAACATG
CGGGCATGCG
GTATCTCTTG GCAGAGGCTA AAGTTCCTGT ATTTGGGTCT AAAGCTCTTTr GTCAAAGGAA ATGATGCCGT TAAGAAATTT TGAGAATACG GAGATTGATT TTGGTGGGAC AGTGGTTTCC CGTTCCA6AG AGTCTGG.GAA TTGTCTTGAA GACATCGGAA TGACTTCAAA TTTGACCAAA CGGCTAGTGA ATCTTATGCA AGAGATTGGT CGTGACGGCG TCCTGGCTCT CCTCAGTGAT TATTCAGGTG GCTAGTGAAA GTGAAGTTAG GGATGAAATT GGAAGGTCGTr A'rCATCGTTG CAGCTGTTTC CAGTAATCTT CGTGAGAATG GAAAAAATAT GTAGGG~TAA AATATCCTGA GATTACCTTT TT'GAAAATAG GATGCCATTG GTGCTCTACC GAGTTGACCA TTGAGTTGGC AATGATTTCC ATGTCATTGA TTC=TCCCTA CGACTTACTC GGAAGCATCG TTTATACAGG ACTGATTTTG CTCGTTTGGC
TCGGCCAATG
ACCCAAACTA
TCTCGTATTC
CAGACAGCAA
TTGCTGACTG
AGCAGATTTT
TGACGCTGCG GATAAAACAG GTCGACGTAT CGTCTTGACA CGATTTGATA TTGAAAATAT CGTCCGCACA GCGA'rTCGTC TTAAGAAGTT GTCTTTAGCC AACGAAATTC TTrTGATTAA GCCTAAAGAT ATGTCTCGCT GGGTGAGCCT ATCAATGGAC CAAGGATGGG GACCTAGTCT TGCGCGTGTG GAAAATATGA TITACATGTA TCAGGGCACG ACCTAAGTAC CTCTrCCCTG TGCCATGGCA GTTGGGATGT GGCTTACGAG AATGGAGACT TGATGGGAAT GCCATTGGTG AGAGGATGGA ATrCATCG TAGGGC'rCGT GTTCACACGC TGAAAGTTCA GAATTGATTA CTGGGCAGAT CTCAAAGGTA CA.AGCGTCGC CCAGCCATT AGAGAGAAAG TCGAGTTTCG 471 TTGAAGACCA TGAGTTGATT ATTCNTGAGA CAGGTCGTAT TTCGTAAGAT GTCGATTGGT CGCCATCGTr A'rGTAGAAAT ATATTGCTAC GCCTCCGTCT ATGCTAAAG AAGCCTTTGT TTTATCAGGC AGGTGGGGTT GTCAAATTGA N'ACCCAAAG GAAATGTGCG TGATTTGCAG C1'GATGATCA ATCT'TGCA TCCAACGGGA GTATCGTGAG TTGGATGCTC ACGCTAAGGC TGCCAGAACG CATCTTCATT CCTAAAAAGG GGACGACCAT TTGTTCCAGC TGGATCGGT'r TCACCAGGAG ATATC'rTGAT ATGTTGGAAA TGT'rGTTCTT CGTGACCGTA AGGTCTTGTC TGGCTATTAC AGTCAACCGT CGTGAGAAGA AAATTGTGGC
GTGGATTTGT
ACCAAACGGT
AGGTTCGTGA
TACCAGTAGT
GCTTTTTCTT
TTATCTCAAG AAGAGTCGCG ATATTCTCCG AGAAGAGTAT CTTCAA.GGAG ATGACTTTGA CAATC'rGACC AAGTACCTCT TTGATCAAAC CATGGAAGCA AAATAATCGT TGAAATAAAC ATAGAAAAAT AGAAGGAGAA AATCATGGCA 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120, 9180 9240 9300 9360 9420 9480 9540 9600
GTGXI!GAAAA
TACCCTGATG
TTGCACGGGA
CTTCGAGGAA
ACCCAGTATG
CGCTTC'rTCC
GGAGGCTACG
TTTCAGGTG
GCCTACTGGA
GAAAGTCTGG
TCGAGTATTA, CTCACAAGTA CCAATCGAGT GGAAGAACCA TGTCTGGAAA TCATAATAGT CTAATCTCAT CGTTGTTATG GTTTTGACTA CTACACGGCT CTAATATGAC GAGCAAGCGT GCTGCTTCAA ACTGGCTCTT CCCTCAGCTT TCAAAACTTT GAGGTGTTT TGGAGAGATT C'rAAAAAATC GGATAAAAAG
GAGTGTGAAG
TGGCTTAAGC
CCCAATACCA
AGTGGGGGGT GAATGTCCTC.
ATATTCCCGT CTTGTACCTT GGACCAATGT AGAACGCTTG GCAATGGTTG GTACACCGAT CTAGCAGAGG AATTGCCACA GGTTCTGAAA GAAAAGACCT TTATCGCTGG TCTTTCTATG ACGACAAATC GTTTTTCTCA TGCAGCTAGT TCTCCTGAAA GTCAAAATCT GGGAAGTCCA AGAGACTGGA CAACTAGTCC CTATTCTCTT ACCAAACTTT GGGCGTGGTG TGGCGAACAG GTGAAAAATC TCAAAAAACT AGGTTTTGAT GTGACCTATA GCCA'rAGCGC TGGAACTCAC GAGTGGTACT GTTTTrTTTAA CAACCCTACC AATT'GATTTC AAATrAGAAG CTTCAGCATA GGGGGAGTAG AACTAAAATA AAATATGTTT ACTGGGAAAA ACAATTGGAA AGAGACTGAC TTAGTTTGAA TCACTAGACT TTITCAAACGm 472 AAGTAGTAGA ATAGTAATAA AATACTGGAG GAAAGAGAGT AGGAAATGTA CCGTTATCAA 9660 ATTGGCATTC CCACAI'AGA ATATGATCAG TTCTCAAAG AACATGAATT TTACAAAGTA GTGCTTGGGA GGAAGTTAAG TCTAATTGGC AACATGAGAA TACAGGGAAG AAAAATTACT TATAAAATGT TTTACATCCC AATTTTGCCA TTCAGTCTAT TTTGACCCAA GTATCCCT GAAAATCTGG CTATTATTGA GAGGAAATGG GAGACACCAT GAAGAAGATA AACTTTCCAA CTTGAGATTC AATATGGTGG ACTGAGAAGC GAAAAGAGAT AATTTTAAGG ACAAGGCCTA GAGTTAGAAG AACAGTTAGC CGAACTTCAA AAGTAGAAC GGCGACAGCT AGTATTrGA TTAGAACTCT AAGAGGACCT ATATTGGATT ATGGGdATAA
TAAGTCCTAT
ATCTCAAAGT
TAGTTTGCAA
TCAACCTCGT
GTCAACAAAA
GCTCGCAGTA AGAGAGCGGT TTAATCAATC AGGAAAAGAC
AGCCAATGTA
GTTTGGTGT'r
TCCGCTAGGC
AGAAC'rCTTc
TTTTGTGACT
AGAATTTCCT
AGGAAAAACG
GGAA.AAT'r
AAACA-AAGGG
GATGAAAAAA
AT'rGTTAGAT
CAAATGGGAG
ATTCAGGCGA
CAGGCTAT'rC
TAAGGTGGTC
AAATATACAA
GAACAGCACG
ACTGGAACTA TTAGATTCAT TTTCGGAGTT TCATTTGAGG AATGAAGCCT AT'rATAAAAA
TATCACCTTG
GAAAAATAGA
GCAGAAGAAG
TTCTTrGCAGG AATATA'rAGA TGTAGGTCAA TTGGAATTTG GTACTACCTC TGTCAATArA TACAATGCAC CAATTTTAAC 'rTGGTATGPA ATCTGGCAAA ATTTAGGTGG TGTTGAAAAC GAAAAATTTA ATCCAACGAT TGAAGAATAC CTCTATCCTC TGTTAAGACT TGCTCTTGAT AAGTAAGTAT ATGGCACTAA CAACACTCAC GGTTTCTTCT CGTTCCTTTA TGCAATCTGT GCCACCTTGG ATGTTTCTAA ACGTTCGCAA .GCCTTGGAAG AGACCTTTAC TGAGTCGACT GAAAAAGAAC GTT'rCTTAGA GGAATTGACC GCGAGAGTTC CTrTAGCGGC TACTTTGAGT TATG(C"TGGTA TGGATGATGA TTTTAAACGT ACGGCTCGCT ATGCCTrTGA ACGAGGTATG TCTCTCAATG GTGGACTTrA TCAT'rTTAAG TTGGGTGAAT TIACAATGCC CACTCATCCT TTCCGTAAAA CArrrAAGAAA AAAACATAGA GAAAGAAGAG TTTCAGACTT ATTCTGATCA CCAGATGGGG GATTTGCTAG AAAAA.AGAGG 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400
GGCTCGAATT GTTTATCTTG CTTTGAAACA AGAAGGAGAA ATTCAAGTTG CAGCTCTGGT TTATAGCCTG CCCATGCTGG GTGGTCTGCA TATGGAACTC AAT'rCGGGGC CGATTTATAC CCAACAAGAT GCTCTTCCAG ~T'ITATGC AGAGTTAAAA GAATATGCCA AGCAAAATGG TGTATTAGAG TTGCTTGTAA AACCCTATGA AACTTATCAA ACTTTTGATA CCCAAGGTAA TCCAATAGAT GCTGAGAAA6A AAAGTATTAT TCAAGATTTG ACTGATTTAG GTTATCAATT TGATGGCT1'A ACAACAGGTT ACCCAGGTGG AGAACCAGAT TGGTTATACT ATAAAGATTT AACTGAATTA ACTGAAAAGA GTTGCTTAA AAGTTITACC AAAAAGGGTA AACCCTTGGT GAAAAACGCT GAAACCTTG TTTTAAGAAT ATAACAAAAG ATATTATGAG CATTTTTATG AAATTTTTCG GACTATATGA GGACAAGTTG CGACTTGATT GAGAGAATAT TCTAGTCAAT GATTGAAAAA TATGGAGAAG TCAGGAAACG ACTTATCTCT TGCACTGCTT CAAAAATATG GCATTCCGTT GAAAAAGTTA AAACGTGAAG AACTATCGAT AAACCTCTGA ACGTAGAGAA 'rATAGTGATA AAAGTTITAGA ATACmGG AGAACAAGCG GAGTTTCTCA TAGCAAGCI' GCAAATTGCA AGGTGAACAA AGTAAACTAG AAGAAAACTT 11460 11520 11580 11640 TGAGTrAAAAA TTGAAACGT'r
AAGATATTGT
'rTAGTGGTTC
TTATGTTGGA
TCCTCATTCT
TGAAGTTCGA
TTTAGCTGGG
CTACACTGAG
AAGCATAAAA
CTTCCTAGGC ATTCAAGGGA TTTTTGATGG AAGTGATGGT ?TTAATGGC TATATTGTAC GCAAAGCAGG TACTTTCCGT ATACAAAGCT ATCCAGTTAC TCAAAAAAAT AGTAGGACGT TTAGATTTCT TTTAGCTTCT TTTAGTAAAA TAAT'rCTTAT CATGCGCTGG CTTTTTCGTT TGATAGGGGC TT'rCTTTTCT GCGTCTGGTT 'rGGATAGTTG TGCTCTTA'rG TGTGCTTGCT
GAGAAAAAAC
AAAGCAGAAG
AGTTTATTTG
TTTAATAAGT
CGTGGAATAC
GTrTTTGCGTT
TACCATCCAT
TAAGA'rGAAA
TTGCTAGAAA
TTTGTGTGGC
TTCGGACTTC
AAAA'rCAACT
CGCGAGACTT
TTTA'rATGCC
TCTATGCCCC
CTAAATACAA
'rTAAACAGAA
CGCCTTTAAA
AAGTCAGTAT
GGTGGAGAGA
GTTT-GTTTTG
TCTGGTATCT
11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 .12360 12420 GAACGGAGAT T'rTCAAGGAG CGCTAAAGCA AGCAGAACGG. TCAGTAAAAA TTGGTCAACA AAGTATTGAC CAATGGGAGA AAACAGGGCA ACTGCCTAAG TTAAGCCAGA CAGATAGTCA CCAGCATTCT GAAGGAAGGT GGATTCACGC TTTCAAGAGG ?r'rTAACTTT GAACTCGTGA CGACGGAGGC ACTCCTGTGG ATTCTTGTCC GTAACGGTGC CTCCTATGAA CCCTT'G'CC CCATACAGAT GAGAAGTCTG GGATCTT-GCA AACCTCCGAA CTTTTTTTGC TATAATGGAA GGGCACAGGC CTCTGCTCGT ATTTACCTGG ATCCGCAGAT CTTATTTAGA AGCAATCCAG AACTGGAATC AAACTGGTGC CTGAGTCTAG TAAGGCGCAT AT'rACGGCTA CGGAGATGAA CAGGAGAGGC GGAAAGTCAA ACTAATCTCT TAACAGGGCA GGTTGAATCA TTATTATTTG TCCAATCCAT ACTATGGCTA ATACGGCAGA ACATGAGTTA GGTCATGCGA TTGGCTTGGA TCATGCAACC AGCAGGTTCC TTTTATGGTA TCCAGGAAGA AAATATATGA GACTAGTGAG TAGGGTACTA TC'rTTCCCTA CTATGAACAA CTTGATTAA.A TCAAAACTAG AGCTCTTGCC 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140
GACCAGCCCT
GGCTAAAAAT
AGAGGCTCTG
GGTTGCTACA TTCATAAGGA CTGCGTAATC GAGTACGGTC GTGTCTGAAA TTG'rGGATTT
TAAAAATGGC
CTATTTTCG'r
ACCATTATCT
GGAAGTCATG
ATGTAGGAAA
ATACCAAGAC
CTAATATTGA
TGAATTTATT GTTACGGAGT GGCACTTCTC CTAGAAATCA ACCTGATCAA CAAGGATGAC AAGTCCTATC CTr'rCATCAA TATCACTCGT CAGGTCAAAA AGGACGGAGG GGCAGCCAAT GAAATCAAGC GGTTGCTGGA CCCGCCCTCT AAGGTCTGTT TTTATTACCA 474 GGAAAACAAG CCCAAGTACA ATATCATGCT AATCACCAAT GAGCGCTATC CACGCTTGAT TAAGAAGGAT GAGGCTTATT TCAGGATGAC AAAATCATCG GGAGTTTGAA CGTGCGGCGG CAAGCAACGG GTCATGGCGA TAAGGGCTGG ATGTGTGTC GTCAATCTCT TCCCCTACT'r TTCTATCAAG AAAAATCTCA AAGAAGCTGT CA AGGCTTTG AACAACTGGT CAATCTrAGCC TGCTAGAAAA ATCTGTCGAA AAATCCCGAC CCCAGTACGT CTGTTTCGGC TATGGTGGTC ACAAGATAAA AACGGTTGTTI GACGCTATGG TCGAGTACAG GGGGGCAAGG TCAAGTCAAT
TCAAGTCTAT
ATGATCTCAA
AATACCGTGA
AAGATTGCA
AGGTTTTCTT
CAATGATCCA
TCTAGTTCCC
GTGGATTCCA
ATAAAAAATG
AAGACTCAAG
A'rCGAGTCCT
TTTGTTCAACG
GGACCAGACG
TCTATTTT
TCGGATATTC
TATCGGCCAG
GGCCCAGGAG
GAGTAA.AATG
CCTGATTCAG
AAATCGCGAT
TGTCCGTXCAG
GGACCCTATC
CCTT'N'CGTA
TCTATWCCC
GTGTCTGATT
GCAGTAGCAG
GCTATTGGAA CGCTTCGAAC GTCTTTGGCT ACTATGTGGA GtAAGCTCAT CGAGCGCGAT GATGAGGATT TTTTGACCTA TGTAGGACAA AATGAGGTAC TGATTCCGCA GATATTGACG AGATTCTTAA GCCTCAACGT CGAGAGAAAA CTCGTGTTAG TCTAGAGCAG AAGTTCAATC GAGCTATTGA AAATCTAGGG CG~TTGCTCC TCGATAACTC TAATATCATG GGAACTAGCC GTAAACCGAG TAAGAAGGAT TACCGTAAGT ACTATGCCAG CATGAGAGAG GTCATTCGCA
CCGATGTGGG
AGTGTACCAA
ACACCATCTG
TTCTGAAAGG
CACAAAGTAT
13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 CGTGAGGCTT TGACTCCTCC AGAT'IGATT GTGATTCATG ATCGCTAAGC AGGTTATCCA AGAGGAACTG GGCTTGGATA TT'CCAATTGC TGGGCTGCAA AAGAATGATA AGCACCAAAC CCATGAAT'rG CTCTTTGGAG ATCCGCTTGA GGTGGTGGAT TTGTCTCGCA ATTCTCAGGA ATT7-CCTC CTCCAACGCA TCCAAGATGA GGTGCACCGC TTTGCTATCA CT1'TCCACCG CCAACTGCGC TCCAAAAATT CTTTCTCATC TCAATTGGAT GGGATTGACG GTCTGGGACC TAA.ACGCAAG CAGAATCTTA TGAAGCATTT CAAGTCTTTG ACCAAAATCA AGGAAGCCAG TGTGGATGAG ATTGTCGAAG TTGGGC'rACC TAGAGTCGTT GCAGAGGCTG TGCAAAGAAA GTTGAACCCG CAGGGAGAAG CCTTGCCTCA AGTAGCAGAA GAAAGAGTAG ATTACCAAAC GGAAGGAAAC CACAATGAAC CATAAAATCG CAATT'rrATC AGATGTCAT GGCAATGCGA CGGCGC1TAGA AGCAGTGATT GCAGATGCTA AAAATCAAGG GGCCAGTGAA TATTGGCTTC TGGCAGATAT TTTTCTTCCT GGTCCAGGCG CAAATGACTT AGTCGCCCTG CTAAAGGACC TTCCTATCAC AGCAAGTGTT 475 CGAGGCAATT GGGATGATCG TGTCCTTGAG GCTTTAGATG GGCAATATGG CCACAGGAAG TTCAGCTCTT GCGTATGACA CAGTAT'ITA TGGAGCGAAT 0 **0 000*.
ACGAT'rG'rC' GGCTACGAAG CTTGCCTC CTGGAAAAGA TTTTCTATCI' CTCATAATTT ACCTGACAAA AACTATGGTG GATACAGAGA AATTTGACCA ACTGCTAGAT GCGGAAACGG GTTCACAAGC AG'rTGCTTCG 'rTATGCAAGT CAACGGCAAC ATTGGCATGC CCTATTTI'AA TTGGGAGGCG TTAAAAAATC ATAGAAGTTG AAGATGGGGA ATTACTCAAT ATCCAATTTC GAAGCTGAGT TAGAATTGCC CAAGTCCAAG GGGCTCCCT CTGCGTCGTG ACGATAACTA TCAGGGGCAC AATCTGGAAT AAGCATGGGT ATGTACACGA TGTGAAGAAT TTTTTTGATT ATAGCCAATG CAAACTAAAA AAGCGATTTG CTGGTCCAAT CTCAATGAAA ATCAAAGAGC AAACTAGGAA GCTAGCCGTA TGAGGTTGCA GATA.AAGCTG ACGTGGTTTG AAGAGATTTT GAGATTGATC TGGGAGGTAA GAACCACCTA GATAGGTATT CGTCTTGATA GAGTTCT'rTG AGCC-CT~IrA'r CAALATTGCTC TTGAGAAAAT GATATAATTG CTGGGGCTAT CTGCAGAAGG CTAAACCACG GTCCTTGATA ATCTTTTGAA. CGGATACCITr ACTCTCCGTT AGCAAGGTCT AGGATTCGTT TACCAATATC TAGCGGGATT ATCAGTGTCT TT'CTGATTCC AGTTATTGAT CGGTA'rCCTC TFGTG'rTGTT TTACCAGCGA TCTGGTCAAG TGTTGCTGAC AAGGACGAGG GGATTGTTGG AAATTGGAAG CACGCTCTTrT TGTGTAACTC AAGTTATTGG CCGC-AGCCTG C'rGGGAAGAT GCTCTCCCAG GCGGTTCTTT GGAAT'rGAAT CA.TCTACTGC CTTTAAAACT TCGATATCAA AGCCTGTCAG AAGAAAT'rGA
GTGACTTGCT
ACGTGGCAGT
AAATCATCAA
ACCGTTCCCA
GTAAAGTTGC
TTATCGAAAT
TATTAGCCAG
TT'rTGTAAGA GTTTCCTAAA CGCITTTTAGT ATATCTTATA GGTTGCTCAA AGCACAGCTT CGAAGAGTGT TATTGTAACT CT'rAGAAGAC
GGATCCTGCA
CGGATTGCGC
AGTTGAGAAT
TA'GGTCAT
TCCAGGGTCG
GTATGCCGTG
TTATGATTAC
GTATGAAGAA
CT'rAATAGAA 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16593
GCTGAGTTTT
TTTAAACTCT
TAAATCAACG
GTCAAAAACT
CTCACCAGAA
GAATTGAGCG
AGAAGTCAAA
TCAAGGGTTC
TTTTGGTCC
ACTGAGAGGT
AGGAAATCAA
AAATTAATTG
TTAGAAGTTC
GGATTTTTCT
CGAGTAAAGG TATTTTrCAG ATAGTGACCA GAATCAAGTC
CTCGTAGTCG
A'PTGCCCTTG
CTGAGTTTTT
TCTTCGTAGT
CAA.ATGGTGG CACGTCGCCA GCTGTAGCAA GGACGATTGT CTTTTGAGCG CTAGTCTC'TT TGGGTG'rAGC TTGATTC'rCA CAGGCAACCA AAAATGGTAG ATrTT'CAT ACTGTCTCCA TTCAAATGTA AAC
GATAGCTAGT AATAGGCTAA
INFORMATION FOR SEQ ID NO: 53:
SEQUENCE CHARACTERISTICS:
LENGTH: 3510 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
GGGATATCCT TATATCCTTG CCTTGAATTC CTGGTGCAAT AGTTCCATCA ACTCTGGCAT AAGTAAACCT TI'TACCCTT AATGTGAAGT CTGGAGCATC GGCTGTTGTG CTGCI'GAGT AATGACAAGA GACTTAAGCC AATTCCAGCT AGAACANTTA GAAACCACCA ATTTTCTTTA GACTAGACCT GAAGCTAGTG AGAGTATAAA TCGCTCCTrG GAACTTAAAA CTGGACCAAT.
AAAGCTGACC AATAACGATT TTCCTGGAAC CATTGTGGGA ATTGCTCAAC AGTTTTTTCA GACAGTAAGA ATTTCGAAAT CACGATCTGG TTTCGCCGCT ACPTTCTTG CATGGACCAC ACCATGAAGC CCAAAACTTC AAAATCAGAT AACTTAACTT CTTTGCCATC CATGGATTGC TTTTCCAACA GCAATTTGTT GTACAGTCGT TTGTTGTrTT CTTTTTAGTT TCTTCCTCAC CACAGGCCAT CAATACAACT AGCAAACATT ACTTrTTTTCA TTTGTCCTCC T'rTATTCAAA CTTGTCCTAA TAGTAACAAA ATTCCCATTA AAACAATGAG GTAGCATCAT ATGACGCTTG ATTTTACTAA AATATGGCAT CCAATACCAA GAAAGGAAGG GCCATGCCaG AGTGTAAATG CCAAGCGCCA TTGCCTCCAG AAGCCGCAAG TGCTAAAACA ACAAGGTGTC CAACCAAAGC TAAAGGTAAT ACCAAGTAAA AGAATCTGAT TTTTTAAAGG TAAAACTTTT TTGAACTTCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 AATTTCTTCA AATGAAAAAT TTCCATCTGG TGAAGACCCA AAATGATAAT AATAGCTCCC ATGCCATATC. GAAACCAATT TGCATAGAGA ATATGACCAA AGTAACCAGC ACCAAAGCCT AGAATAAAGA AAATGAGAGA GATACCAGCG ATAAAGCAAA GTGTTCGAAT CAAGCCTGAC CAGAGAACCT TTCTCCCAAA CAAAGAAAAG CTTTTTGCAC ATCCCAGCAT AGACTGGCAG GCTAGAAAAA CAGAGATTAA ATTCTAATCC TATTTTACTA TATCGGGCAC TATTGGACCA AATAGCCAAA AAGCAACGAC TTAGGTCTGA CAT 'TCATAA CCTCTACTAG TTTAGTTTCT TTTCTTGGAG AAGGTTTTGG AAGAGGAAAA ATACAAGGAG AAATACTATC GTTTCCAATA TATTCAATTT TATTTGTAAG ATCTTTTCTT TTGCTAGTCA AAGGATTAC'r CATCGCTGCT TTTCTTrGATC ATCCAATAAA AAAAAAAGGA TAAAACACET AAGAACCAAC TTTCTTAATA CTTTCTGCTA CGCAAAATCG AGGCGGATCT TATCCCCCAA TTTGTGAACG AAAATGTCTT GTCAAGGATT GCTTTAAGCT TTCTTTCACG AAATCAAGGG ATCCGCTTI'T TCTGCATCTA
ATCATGTTTT
GTCTCTGCTG
GCTTTGGCAA
ACTTGAGTTT
AGCCATTTTC
GGACTTTTTT
GCTGTCCTAC AACCTTCATC GGCGAATCAG CCAGAAGGCA CTGCCGGAAC TGAG'm'rGTT AATTCCGTGC TTAATTGCTG ATCACGCCTA CGAGGGCAAG TCAGTCATCT TATCTGAATC
GATTTCTGAC
TAGACTGACC
CTTACTATCT
TCTTTCTTAC
ACAATCACTC
TCCGrrCCT'r *96.9* 9 99 c, 99 99 9 9* *e 9 9 *t .9 99999* 9 .9.9 .9 @9 9 9 9 0 9900 ee 9 9 9 GTTTTGCATC CT'rCTTGTCC TGTGCAGGCT TGAGAGAGTC CAAGGCAGCC CAGCCTTCAC TGTCAAGGCA CTATCTTCCG GAGC?1'TTTG GATTTTCGGA TCAGATACTG TTGCCAALAGC GGCACGAAGT TCAGACTTGT CAAC'rTGCrC AGCGCTTGCT ACCACTCTAG GATCTTGAGT TGCAGGTTGA CTAGGAACAG TTATGGTATA AAAACATTG'r TAGAATTCGA TTTI'ACTGTC TTACTATAAT AACCGATGG'r GTGGTTAATG CAAAAAAGTC GCTCGTCATC GTCTCTTCG'r TTAGACCTGC AACCAAAGAA ATCCTCTGAT ACTGACCr'rT TAATGAGCGA CCATATTC'rC CAATCTAAAC AGGTGCTAGG TGCTTT1AAAC TTTCTGGGTC TTGTTCATAG TAGGTGTGGT AGCGCATAGT GGATGGTAGT TGGATGACAG GCrTCTGGA'r TGTCAGTAAG ATAGTTTTTA GTTCCTTTrA CTTGGTGGTT TAGCTCTCCT ATGGTATTAc GTGAGATTTG GAAAACGTGT CAATAAGACA GAACTTTT ACGAAAATCT ATTGTGTACT ATTTTTGGTT CATrTCAC'rA TGCTGC'rCT AGCATTTGCT ?I'CACATCT AGACTCrACT GCAGTATGCA GACCTrACTrC AGCATCTAGG AGGACAGCCT TGGTTGCATC TTCAAGCGT TGGTCTAACT CTTGACTCAA TTGAGCTTGT GTGCTCGTTG AGCTAGCCGA CGGAGCTGAG CTTGGAGCTG GGACAGGGCI' 'rTGAAACTAG- AATAGTACAT ATGGACTTCT CTGATCGATr TGTCCTAT'rC TTATTrCAr'r TGGTAAGAG AAACTTCTGA AACCAAGCTr AAGTCATTGG AGCGAT'rAAT TCACCATTrG ATCTTC'rTCC AGATACT-rTG CCTCTTATTA GATAAAAATA AGTATCGAAT CCTGTT'rCGT TATTAAAATT CTTAAGAAAkT AAGGCTACTT TCT=-'T1TC GAGTGTAGCC CATAGCTTTG CCAAAkTCAG AAGCTATrTC AGTCAAA'rAA AGTCTATCTC TATCAACTTT TC'T'GGTTTT GTTTTCT CTT TTAGCTTTAA CCAGCCATAA GATGCTTCTG TTATACTACC TATTCGCTCA ATTGAA'rATG CCATAAGAAG ATTATACCAC TAACACAAAA TAGATTATTA TTACATAACA 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 -2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240
AAAAAGAGGT
TCATGATTAA
TTCCCTATCT
TGTTCGGCAT
TACTCAAAAT
CTTTGGATAA
CTAAACCTCT TAACTCAATT ACTCCCCCAG TAGGACTCGA ACCTACGACA CAGTCATGCG CTACTACCAA CTGAGCTATG GCGGATTAAA GCTAAGCGAC CACAGGGGGC AACCCCCAAC TACTTCCGGC GTTCTAGGGC TTAACTTCTG GGGTACAGGT GTATCTCCTA GGCTATCGTC ACTTAACTCT GAGTAATACC TGAATATCTA TTCAATTTAA GAAAACCGTT CGCTT'rCATA TTCTCAGTTA GTCCTCGAGC TATTAGTATT AGTCCGCTAC ATGTGTCGCC ACACTTCCAC TTCTAACCTA TCTACCTGAT CATCTCTCAG GGCrCTTACT GATATATAAT CATGGGAAAT CTCATC'N'GA GGTGGkTtCA CACTTAGATG CTTTCAGCGT TTATC!CCTTC CCTACATAGC TACCCAGCGA TGCCTT'rGGC AAGACAACTG GTACACCAGC GGTAAGTCCA CTCTGGTCCT CTCGTACTAG GAGCAGATCC TCTCAAATT'r CCTACGCCCG CGACGGATAG GGACCGAACT GTCTCACGAC G1-rCTGAACC CAGCTC1GCGT
INFORMATION FOR SEQ ID NO: 54:
SEQUENCE CHARACTERISTICS:
LENGTH: 20986 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
CGGAGAAAAA CATGGCTAAG TCAAACTTTG AAAAAGTAGA ATCAGTTGTT GGCTGGGTTC GTGATAAGAA AATCACAGGC TACCGTATCT CTAAAGAAAC GAATGCGCGT GAAATGTCTA TCAT'rGCTCT GGCGCAGGGT CGTGCAAAAG TAAAAAATAT TTCATTTGAA ACAGCCCTAG GCCTAATTGA TTTCTATGAA AAAAATTATG AAAAATTTGA AGATTAATCT TTGGATA.ACG 3300 3360 3420 3480 3510 GCGGATTCTT GACCTTCAAG CAAAAAGACT GCACGGTTGA TCTTTTGTTC GGTCTTCTTT ATCACGCCCT TATCCATAAA TGGGTTACGA CAATCATGGT TCTCCAACCA TTTCTGGATC TTCATGGAGA GGGCACGAGC GGTTTGGCTT GCCAGTAGCG TTrTTCAGCTT CTGTGCGTTC TAGTAGAGAT AGAGAATCTG CCTTTTCATT TGCAGCCT?1' TCT'r'ITTATT TGAGATAGCG AGGATTGGTG AAGAGGTCTT CTGGTTTACC GATAACACGG TGAGAGACAT CACGGGCAAA CAAGCCTTCC TGAGCCAGGT CCTGCATGAT
TTGAGGACAG
TTGAAGGAAC
TTCTTCAGCG
TTCCATTTCA
TTTGAGGACT
AGCGTCCGGA
GAGTTGTTTT
TTTGGCAATC
GAGAGCTGAT
GATGGCCACA
TTCTCCCATG
GCGTTTTAGG
GTTGGTTCAT CAAAGAGAAT CGTTGTTTTT GACCACCTGA CCGACCTTTT CCAGGTTTTC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 AGAACATTGA GATT'rTCAAA GAGGTTAAAG TATTGCGTGA GGTCATAGCC TTTTTCGAGG TCAGTTGGTG TTTCAAGTAG GTTAATGGAG CTTCCGATGA TAGAGATGAC CTCTCCCTTG ACAGTTGTCT GAGCGACGAT TGTGTrCA GATTGGAAAA CCATCCCCAA CTTTTCACGG ACGTTTTCTC CATGATAAAG GATTTGTCCA CGTAGGAAGG TCGATTTTCC GCTTCCAGAG TGGACAGTGA GTGAAATGTC TTTTAGCACT TCGTTTTGTC CATAGGATTT TTTGAGGTGT TTAATTTCAA TCAAATCCTC CGTFN'GCATT TGGTTAGCAC CTGTAGTGTA GGATTGCTTG TGTCATTATT GGTATCCATG TCCATTCTGC GCTCGATAAA, GCGTAGGATA
CGTGTTACG
TGATTGTAAA TG;TCTGGAAC TATTG.ATAG4 AAAGTTCGAC AACAGAGATA
ACGTTCAAT,
CATTACCAGT TGCAGGTAGG ATGTTACCGG TCTGGTTAT GGTCATAccA
AGAGCAGTCC
GGATACCACC ACGGACGAT'r
TCAGTCATGI
TAGCAGCCAG TGTACGGTCA
AGGTTGATCC
CcATCGA~G AAcAATcATT
GC.CGTACCAC
AGCCGACTAG T'ITTTGTAGG
CCGTAAATGA
AGACACCAAT GGCAAGTCCA
ATAATGAGAC
GAGTGATACC AGCACCACGC
AAGAGTTGTT
GGCTAAAGA ACTACTGCTA GTCTC'rrCAG TCATACGATC CATCAAGGCA ACT'rGGTCAT TTTGGCTAAT AcG.ATTGTCA
TTTTTACGAA
CAGT-r'GA ACCAGGTTCT
ACTTGAATCA
TCAGTGC'rTC
TGGACGAGCA~TAG
GCATTrGAGC GAAGTCTCCC
ATGGC'TGTTT
AGTTATAAAG GTAGACCCCT
TGTTGAGAAG
ATTTAGCACT TGCGTAGGCA
GAATCTTTTT
AACTGCTCGA AAAGGCAATT
TCTTGTTTGC
TC-ATGTCAJAT CTTACCAGA
GTAAGGGCAG
G TCAAGGTGAG GACAAAGTAA
ATC-ACGGCG;A
TTTGTGTTrCc CACGGTATT'r
CCTCAGAAAT
CAGATGTATC TT-TGATATT-G
ATGACAAATT
CTACCTGACG TAGGACAA'rC
TTACGCATCG
CAGCTTCAAA 'NTGTCCCTTC
TCAACTGCTA
AGGCACCGGT ATTGAT'TGA
ACGATCAAGA
CGAAAC -rG GGCAGTTCCA
TAGTAGATA
GGAAAATTC AATGTAGACA
TTGAGAACCC
CTTTGTTTTC AGAGAGAGGA
GCAGTACGGA
CTATGATGCT TCCGACGATA
GAGATAAA
CCAGTT'rTC AGAAAGAATT TTAGCAACTr TTGTTGTAGC TTCGGCAGGT
TGTTCCTTGA
CTTTTGAAAT GGTTTCAATG
CTGGCATTGA
GCCCGATAGC GATAGCTGTA
TCTTCTTCCC
TCTTGAACrT. AGAG?1'CGC-A
GCTTCAGCAG
CATCAATGAC ACCAGCCTCA AGAGCTTG'rC CTTTTTTAGC ACCTGGGATT
TGTGCAATCA
TGATTrTTTGC ACCGTTAAAG
TCATCCAAAG
TGACAAGCAA AACTGGTTCG
CTAGTATAGT
GTTCTGCAdT TGGACTCATA
CCTGCG.ATAA
GGACTAGACC TTCCCACTrG GTTTTAACAA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 CCAAAGTC TTTACCTAAG TCCTTAGCGA TTTTCTTGGC GATTTGAACA
TCGTATCCGT
TGGC-ATACTG ATTGGTCCA TCGATTTTGA CAGCTCCGTT GCTATCATCA
TCCTGGGTCC
AGTTAAAGGG AGCATATGCT GCTTcC-ATAc CGATGCGTAA ATATTCATCG
GCTTGAGCAA
CATTGACAG TCCTAGCATC AGCAAGAGAC TTGTGAAAAT AGATAAGTAY
ATGTGGCTCA
TG-ATTTCTCC TATTCTGATC TATTAAAAAA TAACTGTCTC CTATTT-rATC
GAAAAATGCG
TAATTTTTCA ACATAACTAA GTCTTTACTT.ACCAAAAT GCTATAATGA
TAAGAAAGAT
AAAAAGGGGG CTTAGTTGAT GAAAAAAACT TN'TTCTTAC TGGTGTTAGG CTTGrrTTGC CTTCTTCCAC TCTCTGTTTTr TGCCATTGAT TTCAAGATAA ACTCTTATA
AGGGGATTTG
TATAT'rCATG CAGACAATAC GACTTTAAGG GCCAAATCGT ATTGACCCTC ATCCAAAGAT AGCGAAGTAA CAGAAGAAGC GGCGACATAG TGAAGTTGA CATATCGCTO AATTAAATT'G GAATTTCATG TAAGGGGAGA TTTAGAGAGG GAACGATTGA CCGGCTAAGC GTGGAGTTGA AGGGATCAGG GATTGAAAGG GTTAGAGAAA AAGATCAGAG ATCTCCTTGT TATTGAGTG'r GTCAAATATG CCAAAAATCA 480 GGCAGAGTT AGACAGAAGA TAGTTTACCA GGGACTCA CGTGCTGGTA AGATGCCTAG TCAGGCCGCC AAAAACGGTG CAGAACTAGC GGATGG'rTA'r ACTGTGAGAG TCTATAATCC CCTCGTCTGG AACTTAAAAA ATTTACTTTT
GTTTGAGGAG
CGGGTTTGAC
AGATGTGAcT
AGGTCAGGAG
CCTTTATGAT
GCAACCTCTG
CAAGGGGGCT
AAAGAGTAAC
GTTGCATGCC
GAATCGTTTA
TAAACAACTC
CTGCTTCTAT
TCGTCTCTAT
ACAGATAGTT CAGAGTCTAT TGAAAAGTT GAAAAACTCT 'Ir=CCArAC AGGGAAACTT CTTGATTATA CTATCCGTTT~ AGACAATCT TATTGGCCTC GGACCGATTT TGCTAGCGCT GAAGAGN'TA ATAAGATAGA AGACTCGATT GTTACTTGCG TCCTCCCTTC GATCCTTTCC TTTATT-TATA GAAGAAAGAC CACTCCTTCA GAACCACCAA TGGA.ATTAGA GCCTATGGTT GAGGAAGTGA GTCCCTTGGT CAAGGGAGCT GCTACCTTGC TAGATGTGAT AGACCGTGGG GTTGGTTTGA GGCTAGTAAA AGAAGATGGT CTAGCTTTTT CAGGTAAAAA AGAAGAAACT TCTGATAGTC TTTATCGTAG AGCCAAAGTT CTTCAACTCA AATCTTCTTT TG.AAGAGGTA CGAGTTTCCT TCTGGGGGCT CCCAGATTAT TTATCAGAAG CAG'rCTAC'rC GACCTCCTTG GGAAAATTCA CCTTTGATCA ACTTATTCAA AATGTCTC'rA TCATTTCAGA AGGAGATGCA TTGTCAAGCT TTGAGAAAGA CTGCCTAAA'r CTTTCCAATT TGTTTGCGGA .TTACAAGGTA TCTGATGAAA AACGGATTCA AGCAAGAGGG TTGAACCAGA TGCAAGAAGG AGTGACAAAA 2940 3000 -3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 TATCGTCCTT 'rAACTGGTGG GGAAAAGGCC T'rGCAAGTGG GTATGGG'rGC CTTGACTATC CTGCCCCTAT TTATCGGATT TGGTTTGT'rC TTCTACAGTT TAGACGTTCA TGGCTATCTT TACCTCCCTT TGCCAATACT TGGT'r'r'CTA GGGT'rAGTTT TGTCTGTTTT CTATTATTGG AAGCTTCGAC TAGATAATCG TGATGGTGTT CTAAATGAAG CGGGAGCTGA GG'rCTACTAT CTCTGGACCA GTrTTGCAAAA TATGTTGCGT GAGATTGCAC GATTGGATCA GGCTGAACTG GAWAGTATTG TGGTCTGGAA TCGCCTCTTG GTCTATGCGA CCTTATTTGG CTATGCGGAC AAGGTTAGTC ATTTGATGAA GGTTCATCAG AT'rCAAGTGG AAAATCCAGA TATCAATCTC TATGTAGCTT ATGGCTGGCA CAGTACGTTT TATCATTCAA CAGCACAAAT GAGCCATTAT GCTAGTGTCG CAAATACAGC AAGCACCTAC TCTGTATCTT CTGGAAGTGG AAGTTCTGGT GGTGGCTTCT CTGGAGGCGG AGGTGGCGGC AGTATCGGTG CCtTTTAAAG AGAGCTACCA 481 TAGACTGAAA AAGTATGATA TAATGGAAGA TAGAAAAAAG ACAAACTATA AGAAAAGTCA ATAGrTAT CTAAACTATT TCTTATTTCA ATTTGATGAT TTGGCGATGA TTTAGACCA CGGCAAAAAG CCCTITGAAAA AG'rCCATrTTT 7TCAAAGCTA ATCCTGTGTT' AATTTCAGAA ATTACATCAC TTTTTGTTCG TCAAATGGCA GCTCTT'TTT AGGATATAAA ACACGGTTCG GATAAGTTTT TTTGCAAGGT GGATGATCGC TACATTGTAA TG=r'CCTT ATTCTAACTT AGTCT'rAAGA TAGGCCTTAG AAGCAGGTGA AAAGCGAGGG CATGC'rTrGG CAGCTTGTAT GAGTGCCCAC CGCAGATGAG GGGAACCCCG 
TrTTGACCAT'r CTTCCAGCTA AATCAATCTG
ACCTGACTGA
AAAGGCATGA
GTAACCGTCG
TAAATAGAAG AATCCAGTCC AGCGAAAGC'r 'GTAATTGAG ATATTTCGAA TCTCGGCTAA AATGACCGCC CTAAACGA'rC TGATGACCGA GTTGAACTCA GCCATCGACT CATTGATACA
CAGGATTATC
CCCAATCCCA
TGTrTCCGCC 4* TTGTCAATGA GCCTCTTGTA A'rGCTTGATG AT'rTCGAATT CACGAGCAGG AGATGTTG'TT CCGATAGAAC GAGGTGCGAC TGAGAGGATA TCCTGAATTT TAGAAGCGGT CAATCGCTTA ATTTC'rATCA GCTTATCAAA TCCTGCCTCA ATCCTTTTCT GAGGATTAGG GTAGCGTGTC AAGAGTTGGT AGGTATATTC TGAATGCTTT CCAACGATTT TATCCAACTC AGGAAAGATG ATATCAAGAC AACGAGTGTA T'rGTACTrTC CAATCAGACT GTTTTTCTTG AGACGATGAA TATGTCTAGC CAGTATTTTT ACGTCTACTT GCCGATTATC GTGTTGAA.ATTGTT.CACG.AT TGGGGTCAGA AAGAAGTTTA AGAGCGATGC CATGAGCGTC TTTCT'rATCC GT'TTTAGTCT TGCGAAGTGA TAATGATTTG GCAAAT'rCCT TGATGAGCAA AGGATTGTAG GTGTAAACTT TATATCCT'rG TTCATGCAGG AAGTTCAGTA GATTAAAGGC ATAATG'rCCA GTATC'rTCAA GAGCGATGAG ACAGTCTTGG TTGATCTGTC GAATAGACAG ATCTAAGAGT TCAAAACCAG CTT'rAT'rATT TGAAAAAGTG AGTGGTTTAA GAACAGTTTT TCCTCGAACA TTCAAGGCTG TAACATCGTG TTTATTTTTA GCGATATCAA TGCCTACA'rA AAGCATGGGA GTACCTCCAG ATATAGTAT1T TCAAG'rCTAC TTGG?1'ATCC ACGAATTTTT TGCCTTGTTA CCTTAGACGA GATCAAACG'r CTATGCGTTA TCAAACTCAT TACCAAT'rGA AACAAAAGCT GTGGTTAGAG CCTTTCGGAA ATCGTCAAGC GATTGGAGGA AATGAACTAA TCCATAGTGG CTTATTCCAA 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 GTATACCACT TGGGCTTTGG A'rGTATCTTA TTGAAATTT'r TGGTTGCCGA TTTCCAGTAC AATCAAAATG AAGCCTTTAT
CAGTAGCTAA
AAAATCTATC
AGGTCACTTG
GTCCATGT
CTGCGCTAAA TATAATATAG TTCTTCGGAA TTGTTGAAGG ATTTTAGCAG AGGAATTCAT AATGTCGTGA TTCAGCTTGG
GGAGTAATCT
AATTrACGGA.A
CCAATACCAA
TGCTATTTTA
482 GCAGTTATGG 'rGATI'TATTT TAACAAGCTC AATCCTI'TTA AACCGACCAA GGACAAACAG GAAGT'rCGTA AGACTTGGAG ACTATGCTTG AAGGTCTTGA TTGCTACTTT ACCTTTACTr GGTGTCTTTA AATTTGATCA TTGG~rTGAT ACCCACTTCC ATAACATGGT TTCAGTTGCT CTCATGTTGA TTATCTACGG GCTATCGACC CAAGTGTAAC GGACTCTTCC AACrrGC GGTGGTTTGT TAAATGGAAC ATTICCTGrrA TG'TTTGGAGC CTCrrGAGCT TTGGGCAAT'r AGCATGGTGG CTATTCGCT GGTAAATACC GTATCGTGCT GTATAAGAAA AACCTTGAAG GGTTGCCTTC ATCTATTTGG AAAAGCGCAA TAAAGCGCGT AGAGI'GGAC AAGCTTCCT ATACGACCGC TIrCTATATC TCTTTTACCA GGGACTAGCC G7rCAGGTGC AACGATTGTC CAGTCGTTCA GTTGTGACAG AATTTACCTr CTATCTTGGG TAGTGCCTTA AAGA7"rTCA AATTTGTGAA AGCCGGAGAA GTTTTTGCTC TTGGTCGCGA TGGGAGTAGC TTTTGCGGTC CTTGACCAGC TATGTGAAAA AACACGACTT CACCCTT'TT TGGTAGTGTT TTGCTACTTT ACAGTTTTGT CCGT'N'ATTr GGGCAACTCT TCAAGGTTTr ATACTCTTCG AAAATCTCTT CAAACCGCGT CAGCTT'rATC TGCAACCTCA AAACAGTGTT CCTCCTAGTT TGCTCTTTGA 'TTrCATTGA GCTTTAAAAT TAGGCGGACA CCTCTTTCTT TCTTGCTTAA TTCTTCATAG TATCTGACTA GCATCTTGTG TTTrTTGAGC AAGACTTTTT GTCCTCGTAG CGGATTTTCA AAATGACAAT TTTTCCAGCT GAGAGCGACT TTTTCTGATA GAAGAGTCAG CTCTTTTTTG AATCT'rCCCG TAGGTTTTCT CCTTGCCGAT TGATTTACGG AGAGTTGTGA ATGCCACGAG CCTTTCGATA CAGATCATAG TATTAGGGTT ACCTCAGGAA CTTCAAGTAA ATCAGCACCA TTGAGCAGCn
CCAGTCATGG
AGTTCCAGGG
CGTTTGGTAA
TTTTCTrGTT ATATC7TTCCT
ATGCGATTGG
CCTAGTCTAC
GTAAAAACGC
CTGCGGCTAG
TAATCCCCAA
CTATTTGGCT
GAGTTGAAAA
GATGTAGATT
CAGCAAGGAG
ATTTGACTGG
CAAAACGGTC
CCAT'rTGATG TT'rGTTTGAG 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 AAGACGTTCT ACTGTCTTTT TTCCTACTCC ATGAAATTTG GAAATATCCA AAAATCCTCA GCCTGTTCAG CGCCATTrTA CCTAAGAATT TTGCCAGATA TCI'rTTTGAA ATTTTCTGTC ACATCCAAAT CCGCI' AAAA ATAGCTCGAA GATAAAGACA GCCTGGGGAC ACCAAAAGCT CTTGCCTCAT AGGGTCGCT'r CCAATAATGA GTAGAATCAC TGTCAAACCA TGTGGTTT'rT GATAATCACT TGTTGTAAGA AACGCCTGCG GAAGCAGTTA GATGGAGTTC TGAGGCGAGC AAT'rTTGACC GCTGACTTGA TACCGAGTTT AGGCTTCGTC AA'rGCTCATG GGTTCAATCA AATCTGTATA TC'rGGAGTCC CACAGACTTG TATTTCTCAT AAr'rCCCTGA AACGTTCATA AGCTTCCTTG GAACTCATGG CAGAATGGAC AACTACAG4GT
CAGGTTTTCC
AGAAACGACT
TCTGAGTTTA
CCCCGTCCAC CTGTTTGCCG GGATTATCCC TGATTTCCAC TGCAGCAAAAL AAGGCATCCA TGTCAATATG GATGAIrTTT CTGACAAAT CATTTAACAA ACGAAAAATC AACATGCCTA GCACCTTTTT ATACTCTTCG AAAATCI'CTT CAAACr-ACGT CAGCTCTATmn TGCAACCTCA AAACAGTGTT TTGAGCAATC TGCGGCTAGC rrCCTAGT'IT GCTTTT-CGAT 'rTCCATTGAG TGTTACTGCT TAT7~YrCTTT TA?rATACCC TTTTrCTGA AAAAAAGAAA AAAGGACTr'r AT'rTTTTCAA AAATA'rAATA CAGIIGAAA TAAAATATAG ACTCTI' AG AAAAGAAAGT TATGGTATAC TTGTCT'rATG ATATGTGT TAAGACAGTT TCAAAGGCGT AGATTGGAAA CACCTTATGA TGGAGACGAA AGAAAAT'rGT AGAAGAAACT GTCCAACATC TATCGCTGA'r TCGGTATCCA AAACGATGAA TGGCTGAAAC TACTTTGAAA TCACTAAATA TGTAACAACA GTCGCGCTCG TCAkCGC.CANC TCA'rCGGTCT TTACGCACGT TAPLATGACTG GAATGCAATC TAAACCTTCA ATACCAAGCA GTAAAAATAG GCAA~rCA AATGTAACAG ATGACTGTTA GTTGA-AGCAC AAGATATTTT GAAAAACCAA GTGTA'rCACC ATTTGTACAA AGCT'TCCTTG CAGGACCAAC AGAGCGTTCA AAAGCACACT ACGAAGAA.AC TCGTTTCCCA ATCCCTGCTG GATTTATCGA CAAAGAAAAT CTCTTCAAAT TGAACTTCAT GCCAAAAGGT GAAAATGGAT ACGAACCAGA CCCAGCTGTT GTTAACGACG GTATTTTCCG TGCCTACACT ACTGTAACTG GTCT-TCCAGA TGCATACTCA CTTGCTCTTT ACGGTGCAGA CTACTTGATG AAAGAAATCG ATGAAGAAAC AATCCGTCTT TTGCAACAAG TTGTTCGCCT GGGTGACCE-r
CTTGTGAAA
CTAGAAAAAA
TGACAAAGCT
TCGGTTACTA
GAGGACATTA
TGGGAAGGCT
GCTAACTACA
CTTCACATCA
ATGGACACTC
GAAGTTATCT
GGTATCCGTA
CACGAAATCT
TCAAATATTC
CGCGtACGTA
CAAGAAAAAG
CGTGAAGAAG
TACGGGG'rrG 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 ATGTTrCGCAA ACCAGCGATG AACGTGAAAG AAGCAATCCA ATGGGTTAAC A'rTGCTTTCA TGGCTGTCTG CCGTGTGATT AACGGTGCTG CTACATCTCT AGGTCGTGTA CCA6ATCGTAT TGGACATCTT TGCAGA.ACGT GACCTTGCTC GTGGTACATT TACTGAATCA GAAATCCAAG AATTCGTTGA TGATTTCGTT ATGAAACTTC GTACAGTTAA ATTTGCTCGT ACAAAAGCTT ATGACCAATT GTACTCAGGT GACCCAACCT TTATCACAAC TTCTATGGCT GGTATGGGTA AC GACGGTCG TCACCGTGTT ACTAAGATGG ACTACCGTTT CTTGAACACT CTTGACAACA TCGGTAACTC ACCAGAACCA AACTTGACAG TTCTTrGGAC TGACAAATTG CCATACAACT TCCGTCGCTA CTGTATGCAC ATGAGCCACA AACACTCTTC TATCCAATAC GAAGGTGTAA CALACAATGGC TAAAGACGGA TATGGTGAALA TGAGCTGTAT CTCATGCTGT GTGTCTCCAC TTGATCCAGA AAATGAAGA-A CAACGCCACA ACATCCAGTA CTTCGGTGCT CGTGTAAACG TrCTAAAGC CCTrCTTACr AAGTATTTGA TATCGAACCA ACTr'rGAAAA ATCTCTTGAC ACTACATGAC TGATAGGTAC AACGTGCCAA CATGGGAT'rC CTATCAAATA CGCTACAGTT AAACAATCGG TGACTACCCA AATGGTTGAT CGAAGCTTAC AAGCTACAGT ATCACT'rTTG ACTCACCAGT TCACAAAGGT TTrGAATTCTr CTCACCAGGT ACTTGAACTC ACTTTCTAGC CACAAGTATC ACCTCGCGCT CAATTCTTGA TGGTTACTTC TGAACGATGT TTACGAAAAA
GGTTTGAATG
ATCCG'IGACG
TGGTTGACTG
AACTACGAAG
GGTATCTG1'G
AAACCAATCC
484 GTGGTTACGA CGATGTTCAC AAAGACTACA AAGICTTGA ATTTGAATCA GTTAAAGCGA ACACTTACGT AGATGCCTTG AACATCATCC CTGTTCWAT GCCCTI'CTTG CCAACTAAAC CAT'N'GCTAA CACTGTTGAT ACA'ITGTCAG 10020 10080 10140 10200 10260 GTGACGAAGA 'rGGCTACATC TACGATTACG 10320 CGCTGGGGTG AAGATGACCC ACGTTCAAAC GAAMTGGCAG ACAACTCGTC TACGTAGCCA CAAACTATAC AAAGACGCAG ACAATCACAT CTAACGTTGC TTACTCTAAA CAAACTCGTA GTATACCTCA ACGAAGATGG TTCTGTGAAC TTGTCTAAAC GCTAACCCAT CTAACAAAGC TAAAGGTGGT 'rGGTTGCAAA CTTGACTTTA GTTATGCAGC TGACGGTATC TCATTGACTA CTTGGTAAGA CTCGTGATGA ACAAGTTGAT AACTTGGTAA GAAAACGGTG.GACAACACGT TAACT'rGAAC GTTATOGACT ATCATGTCAG GCGAAGACGT TATCGTACGT ATCTCTGGAT 10380 4 0 09 0 000 0 00 00 0 0 00 00 0 0 0 09 0 ACTGTGTAAA CACTAAATAC CTCACTCCAG AACAAAAAAC TGAATTGACA CAACGTGTCT TCCACGAAGT TCmTCAATG GATGACGCCT TGGATGCATT GAGCTAATCA AG TrCTTGAA TAATAAAAAG GAACCCTCGG TCAAACGACT GAGGGTT'TTG TGCTTGGGAT AGTATGAGCA ATTCCTTCGG CGCAATATGC AATGTTTTTG GGCTCTTTGT CAACTGTAGT GGGTTGAAAA 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 AAAGCTAAGC 'IrGAGAAAGG ACAAATT'rCG TCCT'rTCTTT TTGATGTTC AAATCCGTTT TTTGAAGTTT TCAAAGTTCC GAAAACCAAA GGCATTGCGC 00 0 0000 o* 0 .0000 TGATGAGTTT GTTAGTGGCC TGATGTAGTT TTTATAGCAA GAGGTAACGT GTCTTGAATT ATAGGAGTAG TTGATACAGG TCTTCAGACA CTCCCTAGGA GAGAGAGTTT CCGACTATCT CCAGAGATTT ATCATCAAAT TCATGTGTTG GACAATGTGG TAATGAGAGG GATATAACTT TCAAGTTTAG CGTTAGAATA AGGCAATTCA ATAAATGTGC TCAAAGTGGT TTTAAAGGTG AAGCCCCAAA ACTGGTCAGT ATTCTTCTCT TCATAGTAAT CT'rTAAGTTC AGGTACTAGA GTTAAGGTCT CTCTGAAAGT TCTAGCATAG
AGGGCGATAA
TTGATGTCTT
ATGGCGTTAG
CGGTTGAGAT
TGTAGATGAA
GTAAAGATTT
AAAGGCTTAA
GCTCTGTATT
AGAGCCCTGC
AATAATTTCT
TTTTCTAG
rrTAGGATAA
TGCTTCATGA
ATTTCCAGTA ATATTTAAGA TGTTGATTCT AGTCTGATTA AAACGATCGA GAACAATTTT AGCATTGGGA CCAGACATAT CAACAGTGAC GACTTTAACT
C 0 CTTC~rrCGA GTACTTGAAG TGGTCATGAT TTTCTTAGTG AGGAGAATTC ATCCCAGGAG AXTGCTTGAG CTTACGATAG AAGCGATATG TGTAAGAGCC TrTCCGAGAT TTGGCAA'1r7 AGGACTTGCA TrGAAA'rCGT CGATAAAAGG GA'N'TTAGAA GT?3'ACATTT AGGTGGGTGA GAATGGCTTT AT1TTAAGGTG TGATGTG'r'rC CATAAGATAC ATGGGACTT'r TTTGATACTC AGAAATrATA GAGCCAGAAA GTTTTTCTGT TCAGATTTAC AGGAAGCCCC T'N'TTGTGTG TAGCAATCAT TGCGACCCGT AGATGAGACG CTGTTGGC'rA CTTTGGGATA CTGCTTTTA GCTCTGTCAA ATGC'TCCTCT AGGCAACTTT GTCAGTAAAA GATT'ATrTAA AATGATAAAA AGTAGAGTGA GAAAAGGTAC TCATCCACTC TTGAACAATT CATAATGTTC GTAATAATrA TTGTTAAGAA AGTCAGTGCT TAT'rCTTGAA CTTGGATAGT TATGAGTCTT TTITTCTTGA
AAATGATTTC
TGAAATCCT
AGGATTTCAG
ACGGTAGAGG
TCTCTGTT1GA GTAGGAGTTG GGCAATTTrC TrCTGAACGA GAGTTGI"NC AGCTACAGTG cCTTrTCA AATGAATGAG GCTAGGGAAA GGCT~rrGGA
TAATCAAGTG
ATGT'rTTTGT
TTTCTAATGA
AAAAAGCCCT
AAACACTTTT
CCAAAACTGG
TAGACAGTAC
TTGTCAAAAG
ACATGCAAAT
ACGTAAGGCA
GAAGGAGGAG
TTCCGTAAAA
ATCGGGATAG
AGTCGTATTT GAT'rTGr= TAGCGAAGAC T'rCGATATGG CTTTTATTCC GATGAGTAAT GTTGTTTAGG CGCTTTTCAT ATAATCTCCA CAGTGGGATT GTTCACTAGC AGAAACTAGA GAAATATGGG GATAAGAATA GATGAAC'rTA TAACAAATAG CCTCTN'TCG GATATCTACA CTAAGGCAAT CGTCAAAAAG GGTATrCTTT CGTTGTAATA GACTAATTAG AATATrGTAT TAATGGACTT TATTAAGTTT CAGGTAGTGA GGAAAAGATG 485 GGATGGTTGT TTGACGTCTG GAGCAATGAA AGCCAATTC GCAAAGTGGT GTAATCCTCT 'rAGAGGTAGA GGTAGAGATG TrATCAAGAA 11820 CCCTTrCTGGT 11880 TGGAAATGAA 11940 GCTAATTTAG 12000 TGTCTCACCA 12060 ACTTTCCGAC 12120 CCACCAATCT 12180 CCI-rTACAGT 12240 GTATCGTGCT 12300 GTGGTATGAT 12360 TATAAGTCTT 12420 TACCCACTAC 12480 GAGCAGAAGT 12540 GAGATGGCTr 12600 TGAGCCTTTT 12660 ATTGTCTGAT 12720 TGATGTTTCC 12780 ATAATCAATG 12840 CCTGTAACAG 12900 ACATCTGCTT 12960 G'rrTCTGTCA 13020 TTCTGCTCAG 13080 CTTTCAATCC 13140 AAAAACGrTC 13200 TTTCCTAAGA 13260 AGTTGAATAG 13320 GCAGAACAAT 13380 'rGTGGCTCTA 13440 AGCCGATGCT GGTCGATA6AC TCCTTCAATC GCTTTCGAAA TATGATACAG TGGCTrqrCG TAATAGGGAA CTAGATTTTG TAAACCAAAC GTTAAAAAAG AAAGAGAAr'r CGAAATGTCA AGATCCTTTC CTC'rTGTATG CTGAAGAATC TTCCATTTGT CCTTGGAA-AA CGAAGAATTA AAACCAAAAA GATATAATCC AGTTCTTCCT GAGTAAAAGT CATGTTGGCA AGTAAGTTTG GCAATGTTCC A'rCAAAATCG GATACATAAA GAGG=1TTTT AATTTTTCAA 130 13500 486 ACTC?1'rGGA CTCAGGGAAC TCAAGTGGAA ATTCCCGACG TTTCCAAGTG AGTGCCACTA 13560 GTATGCTAAA ATGAACATAC 'rCGTCAGGTG AATTAGACTG CACAATCA'rA TGTGTGACCC TCTCAATACC AAAATGAAAC TGGAGGAGTG CAACTACTTG ATTTTTCACA AGGTCCAAAC GTCGTACGCG G'rAGCC'rGTT GCGATGGAAA TTTGATTACC TTGTAGTAGA AAGAAGCGGA GCTGATGGAA GTAATAATTC GTTTGATGAG TATCTAAA?1' AAATGTCAAC TCTTCCTCGA GGAGACTTTT AGATTGTAAT GAAC'TTAAAG TATCCAATAA TATATTTAAA ATGGTAAT'TT TTIAGCATAGT TACCGAATCT TAGTTGCATA AAACTAATTG TCTTGTCAAA AAGGTTGTGG TGATTTCTAA CAGTTCATGA CTGAGTTGAG AATCCATACT TCCATCATTC AAATCATAAA CAATTAAAAA ACGAATGCGA TATTCAGGAC CTACTGAACC TAGTAACAAG CCACACTr'rT TATACTCT'rT TTGTGTAAAT TCGTTAAAGC GTA'TTTTTAA AATAGT'rGAT TGGTTATAAA AATGGTGT'rC CATTAATTGA ACT'rG'TGCG ATGTTTCT'rG TAATTCCTGC AAAATGCTTA TAGACAGTTC ATCTAGTTCA ATAGACCGAA 
TATCTGTAAT TCTTrTTTrCA ATGTATTTGT TAGATAATTT TA.ATTATTAT AATACAAAAG AATTTCCGAC TTTATTGATA AAACAGCATG 'rAATAAAAGG CATTTTAAAG TTATTAGAAA ATATTTTT TATCTATTG'r GATGGAAGTTI 'rATACATAT TGTATACAAG TTATACCCTT T'rGCTACGTT GAACATAAGT AATCCGTTTC GATTAACTTG TTCTATGAAT ATAT'rTCAGA AGGCAGTTGC ATAGTTGCTT TAGCAGAAAC ATAGTAATGA GTATTGGTGG AGTTTATGG CTTATTTrTTT ATCAAPATATT GTCGTTCTAT AAAAAA.ATAT GTGATAAAAA GT'ITTAATTT ATACTAGGAT AGTTAATAGT AATACTATAC TGTGTCATTG CCAGGTTGAG AAGATAGCTA TAACGCAC'TT TGTTAGTGAA CGGATTAACT CAGTGAGATA AATTTTA'rCA TTCGTGTATA CAGATTGAAA GTACCTATGA ATCATAGAAG AATGCTTAAC AGGGAGACAC ACATGAAAAA AGTAAGAAAG AGGACTGTGC TCTATATCTC AGTTGACAGC TTTTTCTTCG GCCTGAAACC AGTCCAGCGA TAGGAAAAGT AGTGATTAAG GCTTCTAGGA GATGCCGTCT TTGAGTTGAA AAACAATACG AAGGACAGAG GCGCAAACAG GAGAAGCGAT ATT'rrCAAAC CTTGACAGAA GCCCAACCTC CAGTTGGTTA TAAACCCTCT AGTTGAGAAG AATGCTCGGA CGACTGTCCA AGGTGAACAG TCTATCTGAC CAGTATCCAC AAACAGGGAC TTATCCAGAT TATTAAGGTA GATGGTTCGG AAAAAAACGG ACAGCACAAG 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 i4520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300
GAGACAGGCG
GATGGCACAA
ATAAAACCTG
ACTAAACAAT
GTAGAAAATC
GTTCAAACAC
AAGGAGGAGC
CTGTTTCGCA
GGACATACAC
GGACTGTTGA
GAGAAGAGGC
CTTATCAGAT
GCGTTGAATC CGAATCCATA TGAACGTGTG ATTCCAGAAG GTACACTT'rC AAAGAGAATT TATCAAGTGA ATAAT "TGGA TGATAACCAA TATGGAATCG AATTGACGGT TAGTGGGAAA ACAGTGTikTG AACAAAAAGA TAAGTC1TGTG TCAAATAGTA TGAGTAACAT TCGAAACAAG GCGACACGTT CTCTTATTGA TAAAATTACA AC?1'ATGCTT CCACTATC'rT TGATGGGACC AAAAACGGAA AGCGATTGAA TGATTCTCTT ACCAATACCA AAGATTA'rAG TTATTTAAAG TTAAAAAATA AGCTACCTAC CGAGGCAGAA TTCGGTGCCA CTTTTACTCA GAAAGCTTTG GCGAGACAAA ATAGTCAAAA AGTCATTTTC 487 CCGCTGGATG TCGTTATCTT GCTCGATAAC AATGCTCGAC GI'GCGGAAAG AGCTGGTGAG 'rCTGAr'rCAG AAAATAGGGT AGCGCTTCTG GAGT-rTACAG TAGAAAAAGG GGTAGCAGAT TNTTGGAATT ATGATCAGAC GAGTTTACA CTGACTAATG ATAAGAATGA CATTGTAGAA GACCA'rGATG C.AAATAGAT'r GATGTACCAA ATGAAGGCAG ATGAGATTTT GACACAACAA CATATTACGG ATGGTGTCCC AACTATGTCG GCTCCATCAT ATCAAAATCA ACTAAATG3CA TATCCGATTA ATTTTAATCA 'TTTTTTAGTA AATCTCCTAA ACTAGTGGAG AACATACAAT AAGACAGTT'r ATGAAAAAGG GAAATGAAGG CGGCTGGTTA C'N'AATTGGA GAGAGAGTAT AATCAPGGTG ACCCTACAAG GTCTTTACGG TAGGTATTGG AGTTT'rATGC AAAGTATTTC TAAAGATGGA ATACTATTAA GTGATTTTAT TGTACGCGGA GATGGGCAAA GTTACCAGAT TGCTCCTGCA GCTTTCCCAG TTAAACCTGA TGCAGTTATA GGCCATCCAA TTAATGGTGG TCTGGCTTAT CCGTTTAATT CTAATACTGC ATGGTACTAT AACGGGAATA TTGCTCCTGA TATTAACGGA GATCCTGGTA CGGATGAAGC TAGTAAACCT GAAAAC'rATA CCAATGTTAC TCGTTATTTC CACACCATCG TAACTGAAAA TCCGATGGGT GAGTAAT ATTTGCAATT TTACACTI-TA ACTGCAAACG ATGGTAGTCG
TACGCAAGCA
GTTTACAGAT
AAAATATTCT
ATATATT'rGG
TAAAATTACC
TGGGTATGAT
AACGGCTACT
TGACACGACA
GAAATCAATT
GGGCACAGAT
CTTGGAGAAT
TGCTACGTTT
15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 AAAATATTGG AACAGTPGAA GAGAATGGTA CGATTACAGA GGAAGA'rTTG
GGACAAGCTG
TATGATACGA
CTTACGTrrGA
ACCAATGGTC
CCGATTCCTA
ATCCAGCAGA
TAGGTGCTCC ACAAAATGAT GGTGGTTTGT CTGAGAAAAGCGATTCGTGTA ACAGGTCTGT CCTACAATGT TCGTTTGAAT GATGAGTTTG GAACAACCTT ACATCCTAAG GAAGTAGAAC AGAT'rCCTGA TGTCCGAAG TATCCAGAAA TAAAAAATGC AAAAGTGCTC ACCTTGGAAC GGATGAAAAA TAAGCAATAA ATTTTATGAT AGAACACAGT GCGCGACTTC TCACAATTC AAAAGAGAAA AAACT'rGGTG ACATTGAGTT TATTAAGGTC GCGGTCTTTA GTCTTCAAAA ACAACATCCG CAAAATGGCA CTTATCAAAA TGTGAGAACA AATAAAAA'rG ATAAAAAACC ACTGAGACGT GATTATCCAG ATATTTATGG AGCTATTGAT GGTCAAGATG GTAAGTTGAC CTTTAAAAAT 488 CTG'TCAGATG GGAAATATCG ATT-ATT'rGAA AATTCTGAAC CAGCTGGTTA ThAACCCGT- CAAAATAAGC CTATCGTTGC CTTCCAAATA G;TAAATGTGAG AAGTCAGAGA TCTGACTTCA ATCGTTCCAC AAGATATACC AGCGGGTTAC GAGTWTACGA ATGATAAGCA CTATA1'TAcc AATGAACCTA TTCCTCCAAA GAGAGAATA'r CCTCGAACTG GTGGTATCGG AATGTrGCcA 'rTCTATCTGA TAGGGCcAT GATGATGGGA GGAGTTCTAT TATACACACG GAAACATCCG 'rAA.AGTGTAG AAATGATAAT ATCTATGT'rC TGAACGATAC TTTAAGAAG TAGCACTCAA GAAGAGATTT AAGTTTACTT GGTGAAACCT GTTTTATTCG TAAGTAAACT ATCATTGAAA GGGGAGATGT TTTCGAAAAC TTGCACAGAA AAAGGATTAT TATTGTCATG TGTAATTCAT TACATTGCTC ACAGTTGAT'r TTAAGAGATA TGAATAAGGA GAAA'rCATGA AATCAA'rCAA CAAATTTTTA ACAATGCTTG CTrGCCTTATT ACTGACAGCG AGTAGCCTGT T'rTCAGCTGC AACAGTTTTT GCGGCTGGGA CGACAACAAC ATCTGTTACC GTTCATAAAC AGATGGGGAT ATGCATAAAA TT1GCAAATGA GTI'AGAAACA GGTAACTATG AGTGGGTGTT C'rACCTGCAA ATGCAAAAGA AATTGCCGGT GTTATGTTCG TACTAATAAT GAAATTATTG ATGAAAATGG CCAAACTCTA GGAGTGAATA AACATTTAAA CTCTCAGGGG AGGAGCTAAA TTTAACACGG CAGTTTATCA ACTTATGTCG AATTGAAATT GAAT'rACCAT AGAAGCAAAG CCAAAAATTG TGTAGATAAA GATACACCTG TACAAAAATT CCAGCACTTG AGGT'rTGGCA 'rTCAACAAAG AGGTGATTAT GCTCTAACAG TTTAGCTAAA GTGAATGACC ATTGAATGAC AAAGCAATTG TAATAATCCA GATCACGGGA GACATTGACC AAGACATCGG AACGTTCGAT TTGCTTAATG AGACAAAAAT ACAGTTACT1G ACGTAGTATA AAAGGGTATT CAATGCCGGC AACTGCA-ATG AAAAAATTAA CAAATTTACC AGCTGCTAAG TATAAAATTT GTGAAGATGG AGCAACCTTA AC-AGGTTCTA TGAACCATGTI TGTGGATGCG CATGTGTATC ATAAAGATTT CAAAGGTAAA GCAAATCCAG TGAACCACCA AGTTGGAGAT GT'rGTAGAGT CTAATTATGC AACAGCAAAC TGGAGCGATA GTACAGTGAA AGTAACTGTT GATGATGTTG 
AAGTAGCAAC TGGTTTTGAT TTGAAATTAA AAAACGCTGA AAAAACTGTG AAAATCACTT TAGAAGTACC AGAATCTAAT GATGTAACAT ATACTCCAAA GCCGAATAAG CCAAATGAAA TTGATGCTAC AGGTGCACCA ATTCCGGCTG CTCAGACTGG TAAAGTTGTA CAAACTGTAA TTAACGGATT GGATAAAAAT ACAGAATATA CAGCAGATTA TCAAGAAATC ACTACAGCTG
TATTGGCAAC
CTGGTAATAA
TTTGGACAAA
TTGATCCACA
CAGAAGCTGA
ATGAAATTCA
AAGCAGTtCC
CAAAAAATAC
ATACACCACG
ACGAAATTCT
GAATGAC'rGA
CACTTGAAGC
CAGATGCTGG
ATVCGGCAAC
TAACrATGG
ACGGCGATTT
GAGCTGAAGC
CTTTGACAAC
AATTCGTTGA
GAGAAATTGC
17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 489 TGTCAAGAAC TGGAAAGACG AAAATCCAAA ACCACTTGAT CCAACAGAGC CAAAAGTTGT TACATATGGT AAAAAGTTTG TCAAAGTTAA TGATAAAGAT AATCGT'rrAG CTGGGGCAGA ATTTGTAATT GCAAATGCTG ATAATGCTGG TCAATATTTA GCACGTAAAG CAGATAAAGT GACTCAAGAA GAGAAGCACT 'rGGTTG~rAC AACAAAGGAT GCTTTAGATA GAGCAGTTGC TGCTTrATAAC GC'rCTTACTG CACAACAACA AACTCAGCAA GAAAAAGAGA AAGTTGACAA AGCTrCAAGCT GCr'rATAATG CTGCTGTGAT TGCTGCCAAC AATGCAI-r'G AATGGGTGGC AGATAAGGAC AATGAAAATG TTGTGAAATT AGTTTCTGAT GCACAAGGTc GCTTTGAAAT TACAGGCC?1' CTTGCAGGTA CATATTACTT AGAAGAAACA AAACAGCCTG CTGGTTATGC ATTACTAACT AGCCGTCAGA AATTTGAAGT CACTGCAACT TCTTATTCAG AGGCATTGAG TATACTGCT]G GTTCAGGTAA AGATGACGCT ACAAAAGTAG AATCACTATC CCACAAACGG GTGGTAT'rGG GATT-ATGGT AT'rGCAGTGT ACGCATATGT TTAAGTAAGA GAGAAAGGAG GTATCTTCTT TGTTATGGCT CGCAAGAAGA TCACACGT'rG 'rGCCATCTCG TGATGGTCAT ATGATCGGGT GCAAATTGTA TCAAAAAGAC TTCGTTTGAG CAAATGGTCT TTACTATGTT
CCATTGATGA
CTGTGTTT
G'rCTTGCAAT TACAA'N'A'C ?TCCTGTAG TAAAAACAAC AAAGATGAGG CAATGCAGAA AATGCAGAAA CTCTTGTATG GGGTGCACAT TGGAGAACTA TCAGGAGGTG CGGTTGCAAG TATGGAAGTT GGATCATTCG AGAGACTTGC ATTCGTGGGA TGAGAATAAA ATGACCTTCC TTGAGAATCA GATTGAAGTA CGCTCTATTA TCCAGACGGA TGCGGTTTCT
CGACTGGACA
TCAACAAAAA
CGGGGGCTC
ATCAACTTGC
ATGATTAGTC
GCAGTCCAAG
GTTAGTCAAT
TATTCCTATG
CTTTCTTCTT
TCTCATATTC
TATCCAGCTG
GCGAAAAAAA
AATCGCTTGG
GAGGTTCCCT
TATACTGATA
AAGGAGGTGG
GTAGATCATC
GACTTTATGA
ATGAAAGAAG
ACATCAGGGA
18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 AATTTCrI-rT
CAGATACAAT
AGGGTGTCGG
TGATTGGAGA
AAAATGGAGA
AGCCACTGGC
AGCTGGTGAC
AGGTGGATGG
AAAGCGGACA
TGAAATGACA GATCAAACGG TAGAGCCTTT GGTCATTGTA GACAACAAAG GTGAACCTGA TAAAGGTGGA TCAAGACCAC CTTTAAATTG GTATCAGTAG CAAGAGATGT TTCTGAAAAA ATACCGTTAC AGTTCTTCTG GTCAAGTAGG GAGAACTCTC GATTTTTGTG ACAAATCTTC CTCTrGGGAA CTATCGT,C AGGCTATGCT GTTACGACGC TGGATACGGA TGTCCAGCTG GATTACGGrrr GTCAATCAGA AATTACCACG TGGCAATGTT TCGGACCAAT ACCTCTCTTC CTATACTCCT GTTCT'rCAAA
AAGGGGCAAT
ATGGTAAGGA
GTTCAAAGTC
AGTAGTTGTA
AAGATGGTCG TTTCCGAGTG GAAGGTCTAG AGTATGGGAC ATACTATTTA TGGGAGCTCC AAGCTCCAAC TGGTTATGTTI CAATTAACAT CGCCTGTTrC CTrTACAATC GGGAAAGATA CTCGTAAGGA ACTGGTAACA GTGGTrAAAA ATAACAAGCG ACCACGGATT GATGTGCCAG, ATACAGGGGA AGAAACCCTT GTATATCTTG ATGCTTGTTG CCATN'TGTr GmTGGTAGT GGTTATTGTC TTACGAAAAA ACCAAATAAC TGATATTCAA TGTACATCAT TATGAATAGG ATAGCAGGCT GAAGGGAAGA CCAGAGTACT CTGAGGTGAT GTTAATCAGG AATCATGGTG ATGTGGCATG AATCATCAAT AACGGATATG AGGCTGGGCA GATrGTGCCA GCCTCATrGT GGGTTATPTGT TrGTAAA6ACG ATAGGACTGG TCTGGTAATC ATTr'rA
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 21040 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCCAGCAAAA AGCCATCCGA AGATGACTTT TTTGCTATTT AATTTCTGTA TAAGTTACTT 20640 20700 20760 20820 20880 20940 20986
CCAAGCCACG
GATAACT'rGA
GTCCATACCA
GTTTCTGAGC
GGTTAATAGG
CTAGAAAAGG
CAGGATTACT
ACTCAAAGAC
CTTAACAGCT GGACGATTGG CAATTN'TTTC GGCATCCAAG AATTTTGCAG AACCTTGGTA AGACCAGATA GCAATATCTG CAATCGTATA CAA'N'CCTTA TCCAATAA.AT CCAACTGGCG ATATTCCAAT TT'mCAGGAG CATAATTGAA TGCTGCACCT GCTTGCCAGA ATAGCCAATT TGGTAAAAAG GCTCCAAATT TCTCAGCAAG TCT1'ACGTTT TCAGTACCTG ACTGGTCCAA TGCCCATTTT ACTAGATT'TT AAGATTTCCT TGAACTAACT
GTCATTGCCT*GCAATATAAG
TTTCACTTCC ATCGTAAAAC GAAATGTCCA AATCCCCCAC CAAAACTTCT ACCTTTTCCA GTAAAGAAGA ATATGAGCAG TAAGGCTGGA ATCTTGGAAT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 TTGGATTGAG CTTCACAAAG ACAAGTCGTA AGCCGCTTCC CCTTCACACC ATTTGGTGTT AGTTI'GTTC GAAACGGGCA TACTAGCTTC ATCCTGCCAT AAATCGCATT CTTGTCAAAA TAATCATTGA GAAGCTTCCA CGTGGAAGAT TCCTTCAAGC
TCTGATCCGA
TTAAAACCAG
CCCAGTGAAT
CCTGCTGTTG
ACGGTCGGTA
ATTGATCCCC ATCCATGATA GCAATCTTAT CTTCTAGTAA TTCTTCCAAT AAGATAGTAA AAAGCTGAAA AGCTTGT'rCT CCTTTTGGCA GTCTGTTTAG CCCCGTAAAA GCTCCTTGAT ATTGATATGC TGACATCCGA AACCTCCCTT CCGAGTTTGC GTTGA.ATAAA CTTAACGATT TCGACGATGA GCCATAACAA TTCCCCATTG TGACAAGTCT AGTTTGGTTA GGTTCTACAA. CGATTGTTGC CATGAGAAGG ATAAAGGATA
CCAAGATGGA
AGACAGACT'r
TCGTTAGGGC
491 CCAGTTAAAG CGTCTTACACT TGAATGGGCC AACTGTCAAG ATGGA7"rrGT GAcATTGTAG GCATGGAAGA GCTCAATCAA ACCAAGGGN' GCAAAGGCCA ATCTGCATCA ATAGCA'rGAT TGTCACCCAC ATGAACTGGG TAAGCAATCG AACACTCATA ACA.AGACCTG CTTGGAGTAC ACCTTGATAA ATGATAGAAc
CAAGGCCATA
TCAAAACACC ACCTGAGAAG AAGCTTGCCT CAGGTTCCGC AGGTTCAACA CCAAGAGCGA TCCACAAAAG ATGAACCGGC TGTAAGACAT TTAATACTTC AGCAGTATTA GCAGAAAGTA AGACCTTACG TCCTTCTTCC ACTGCGACGA TCATATCAGA AGCCCCCTTA GAAACCTCTG CGGCTGTTTr CAGAGCTGGC GCGTCATTGA CTTGTTTT-TG CCAAGCCTTG ACGATACGAA CAGAGTAT'rG ACCAACGACT TT'rTCAAATT TGCGTCCACG 'rGGTTTATGA 'rTCATG.ACAC TAGCTC-GGAA GGTATCCGT ACCAAGTrGA CCCAACCAAA CAACCMGAT AGGAACATGG GGTACTGAAT AGTCI=?GA ATGTTTGAGA TAATAGTCGC AAAGTTATCA TCTGCAAGAA TACCAGTCAT TCCCATACCG ATACCGATAT CACCGTCACC TGTCATGGCA ACGACTTTAC CCTTGTGTTC TGGAGACACA CGGGCATAAA CT'rCATCTGA CAGTTCATTG AGTTCAGCAC CAGTTAAAAc GTGACCTTCT CT'rCCGCTGT GTCTTGGTGG CCACACGA.AC AGCCTCAGCG TAAAAATTAA ATCATTTTCA TCTTATAAGC ACCTGCAAGG CAATGAGATTI TGTAACCT'rC GACAACGTTT TAAGAGTTGG GCAATGGGTG AACTGTTGAC GAGGATATTT CTCTAAGAAA GTATCG'rTTG CGTCAATGAT TCCCAA.ACGT TTGGCAATGG TCACC'rGTAA TCATAATTGG ACGGATTCCC GCT'rCCTTAG GCTTCAGGAC GTTCAGGGTC AATCATCCCA ATCAAACCAG AGCTCTrrCAG AAGTGAGATT TTCTGGAATA CTATCGATAA ACACGCAAGG CTGATGAGC CATTTCAGAA TTGTTTGTAC TCATCAATCG GAGCAATATC CCCAGCCTTA TCACGAAGAA TCTGGCGCAC CCTTGAC1'GC TACALAGGAAA CGACCATCTG ATGAGCTTAC GGTCAGAGTC AAATGGCAAT TCAGCTACAC CCTTTGACAT CATAGCCCTT GTCCAACGCA TATTGGATAA CCAATCAAGT TACCITCAC ATCGATTTTC GTATCATTG PGTGGCATTT CAAGACCTAG TTCAATATCA TCAGCTGAGT 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 AGGCTGTTTC GGTTGGGTCA CCAAGACAAC 'rGAACGAAGT CA TGTAGAAC CGCA'rCGTAG AAGACTTrI'T CGACTGTCAT CTTGTTCATA CAGTCTTATC AGAAGCGATG ATTTCAGTTG AACCAAGTGT TTCAACTGCT GAACGATGGA ATGTCGT'rTG GCCAAAAC'rT GAGTACCAAG AGAAAGAACG 'rAGCAGGAAG TCCTTCTGGA ATGGCTGCAA CGGCAAGGGC AACAGAAGTC CAAGTGGATT TT'rCCCTTGA ATGAACACAC CCACTACAAA AGTAACAAGG
GTCAGCGTAC
GGCAACTTAC
ATGGTAACGA
AACAACTCAC
GCAA'rGACCA AGATAGCATA GGTCAAGACC TCTCATCCGC ATCTTGAAGC TGACAACAAC ACCCATCCCA CACGGTCACC AATACCAGCA TTAGAAAGG1T TGTTCAAATT- ATACCAGCAA TATGACCAAC
CAGATTCACC
GGTCCGCTGG
TAGAGTCAAT
ATTTGAGGGC
'rGATGATAAC
CGACTGACAA
TGAATTTGAC
CAAGGCGCTT
TGTCAAGGCT
TACCACGTCA
CTCTGCCATG
TTCAATAGCT
CACAGCTAGG
GATT~ctGCC
CAAGATT~GAT
TTCCCCCTCA
CGACCATAGG
TCTGTCGCAA
Gd'TTCTTCAA
CCTGCTTCAA
TGTCCATCAC
TCTTCAGCTT
ATGATAATGG
GCAACTAGGA
CGTTTCTCGC
CTTGATGACA
TTACGrrTGA
GCTCGACTGA
TTTAAGAGA
GGGCAACGAT
GAAGAACGCG
TTcC7rCTTG
CATCTGCGAT
TGATAATCAT
TG?'MTGAGT GGTGTATCAG T'rCAGTGTAC ATACCTGTAT GTTTGGAAG GCCATGTTGA CAAGTCTTTT TCGACTGGTA GTTCGCTTCT ATCAAACGTA ATCGCCTGGT ACCAATTCTT GGCAACTGGA CTAGACATGG GTAAACACCA AAGGCAGCGT ATCTTCCCCA CCAGAAGTCA CAAATCCTTA AATTGCTCGA CTTCTTCGAG TTCATTGTGC CCAAAT'rCGG AACCTTGCTC GGTCGCATCC ACArCCTrGCA CTTGGCGTTT TTGTTCTTTT GACATGTGTC CTCT'TCTGT CATAGCTTTT CACGACAAAC AGACCTCTTC AGGGCTCTGA GTATAAAACG TCCTCCTTGA CATTGTGTGC AAAACAGACT AAAAAGAAAC CTGTTAATCA TAACAAGTCT TTTTCAGCAT AAAATTCGGA ATGACGACAC AGTA:GTACCA TTATACCAAA TTTTGGGGAG ATTTTTCCTT GAAALACCAGT ATAATGGTAG GAAGCAATCT ATCTCAAATC TCAAGTTAGC CTATTTGATC TTGTCTGCAG CCAAATTAGC GGTGGCCGAT GGTTT'rAATA ACGTATCGGA GATTCGGATG GCGCGCCACC TGCAGACCGT CGCTGTTTAA GATAGGGCCG TATCACAGGT TTCTGCCAGC TTT'rCAAAGA GTA.AA:AACTG AATGCTATGT GACTAGAAAG TGAGCGTGGA GCCATTATCA AGCTGGTCAT CTCCtTCATT CATCATTGGA AATGTGGCCC GACCACCGTT TTGGTCATTG
GAAAGCATAC
TACTCCCTTG
CCTTATTTGA
GAAGTTGAAT
GTATTTCGAC
CATCCAGTTT
TCTTAATCGG
GAAGATTGAA
TGTTCTAAGA
TGGTGCAACT
TCGCCTCAGT
TGACGCTGTT
GATTGTGGAT
TATCTTCATC
CTACCAAAAG
CACCTACGGT
2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 GATTTGGCAA GCTTGATCAC TI'CTA'rCATC ATGTTCTA'rG TCGGTTTCGA GATACCA'rTC AAAAGATTCT CAGTCGGGAA GAAACGGTCA TTGATCCTCT CTAGGAATCA TTTCTGCAGC GATq'ATGTTT GTGGTCTATC TCTACAATAC AAGAAATCCA ACTCCAATGC GCTGAAGGCA GCTGCTAAGG ACAATCTTTC ACCTCACTTG GAACCGCCAT TGCt:ATCCTA GCTAGTAGTT TCAA'N'ATCC AAACTGGTTG CTATCATCAT CACTTTCTTT ATC~rGAAGA CTGCCTATGA GAGTCTTCCT TTAGTCTTTC AGATGGCTTT GACGACCGCC TGCTCGAGGA GCTATCATGG AAATTCCCAA AATCAGCAAG GTCAAATCGC AAAGAGGTCG AGCAACATCT ACCTGGATAT TACAcTAGAG CATGAAATCG cGGATCAGGT CGAGTCTATG GATGTCCATA TCGAACCAGC ACCTATCCCT AAATTGCTTA TGCGTGA6ACA ATTGATTGAC GATGAITTTrG TCTATATTCG CCAAGATGGA AAAAAAGAGT TAAATTCTGC TATCAAGGAC AAACTCATCT GCTATGAGTT AGATGGTATC ACCTGGCAAA ATATCTTTCA TCAAGAAACC CGG3ATTTTT CTATTCTTTT ATACTCAATA ACAGGCTGTA CTTGAGTCGG CAATGTGAAG AGTCTTAACT ATCAAATTCA CTGAGATACT CATT'N-rCTC ATCCAA'rTCT TTTTGGAGAG CCTGC-ATTTC CTCTTCAATA, GCAGCGATAC 493
ATGAATCCTG
CTGGAGGAGC
GAGGATGAAA
CAAGGAAACC
GAGCAGATGG
ATTCAAATTA
ATCCATACCA
ACTTGTCTGT T1TTTGAAAGC GTGGCGT C?1TrGATACC TTTTAGACAA TGTCTATAAA AACTAGAAGA ACTCTTGACT ATAAAGAGGC TTATAAGACC CTTCCATCAG TCAAAAAACC GTATCTGGCG TCGCCACtGAA AAAAAAGAAT AGAGAAATCC AAAATCAAAG TGCAAATTAG
TTTCATGAGA
GAAGCCGGTC
CCGACATAGT T'rGCACmTG ATTTCGAAT CATAGCGTTC GTATTT'CA AGGAGTGCTT TAGCCAGCTT ACCAAAGTCA GAGCCGTTAG GTTTTTCCAA GGTTTCAATA TCACCTTCAA 00. 0*0 00 0 *00 0 00 *0 0 0 00 *0 00 0 0 00 00 *0 0 0 000000 0 *000 0 0000 00 00 00 0 0 0000 0 0000 *000 0 0000 00 00 00 0
TACTTGCCCA
CCACTTTTTC
CATCAAGATA
CTCCTGC'TTr TCTTGGTAGG TCATGCGTTT CTTGTCTTCT CGAACCTTGA CTTTTCGGCC TTTTGCACTT GATTGGCCAT ATC1TGTTTCA AAAGCTrT'rT GTCCGTGTAA TGACCAAAGA AAGGACGAAT CTTGCCATCC TCAAAAGCGA 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 GAATCTTGGT CGCTACCTTA CTGCAAAACC TTGCAAGAAA TTGGCTCGTC TAAAAGAAGA GTTTTTTCTC ACCCCCTGAC GGAATTGCTC CAGCAACTCA CTGCCAC'rTC CTGCAGGTAA GAGAGAAATA GGCGATGCGA GACTTCCTGC AATCAGGTTA TCCAAGAAAT AGCGGTCGTG ACTGACTGTT AAAACGGGAC TTCTCTAAGA CTGTCAAAGT TGCAATATCT AGGTCATTGG ACATTTGGT'r
AATTTCTCAA
GCGATGGAAG
TTGATCACAC
ACAGTTTCCC
AGTAGGGTI'G
TTTCCAAAAG
TCAAAGTCCC
TCGTAGAACC
GCrTGCTT'rC
CAATCACAAC
ATTTTCCAAC
TAAAATTTTG
CT'TCTT-ccc CTGAAACTrC CAGTTTGAGG AGATAAAGAC ATGCGTCGAA CGTGGGAAGA ACCAC'rGG'rC TTGACCTCCT ATCCAAACCC TCAATTTGTT TTGTCCTGCT G'rCGGCTCAA ACCATTGTCC CCAACAATTC CAAAATGGGC TTATTI'TCAI, AATCCGACTG GT'rTCAAAGT CTTTTTCAGA TCATGGAAAC CAATACGGTC TTTAGCCTGA ACTAAGAGAT AGGCAAAGGA AACATCCTGA AACTCGATGA TCATAGTCAA GTCTGTCTCA GCACTACTGC GAT'rGATACG AGCTTGTTGC TTGGTCGCAC ATTCTTGTTT GTAGAGTTGT 'rCTTTTTTGT GCGCCTGCGG TTGTCTGCGC ATCCAGGCCA GAAGAAGAGC CGCGTCGCCC TCATCCTGT'r 494 CCGCCTTTAG GCGAACATAG TCCTGGTAAT TTCCCTGGTA CTCGGTCAAG CCTGCACGAT CCAACTCGAA AATCCGTGTT GACAAAGCGT CTAAGAAATA ACGATCGTGA GTGATAAAAA GGACGGTCTT CTTAGAArrT TTCAAAAAGA GGGTCAGCCA CTCAATAATC GCAATATCCA GATGGTTGGT CGGCTCATCC AAAAGCAAGA GGTCCTrGGT'r GCCAACTAAG AC7rGTGCCA ACTGTACCCG TCTTCTCAGA CCACC'rGACA A'ITCCCCAAC AGGAGTAGAT AAGTCT'rGAA TGCCCAATr'r GCTAAGAACG CCATCTCTGC CATGACACGT TCAATTCATA CTCACGAATG AAACTGTCTT TCTATCATCA TTTTAGCTGA AAAAGGACTG AAAGGGTGGT CTTGCCAGTC TAA'rAAAGGA AATATCCCTA CGATAAAATC ACTCATTTTT CAATTCTCCA TCGACAATGG TGG.CTGATAG CCATATTCCT
GTCTTGACCT
TCCAAACGCG
AGCTGGATTT
GACTTTCGAT
CCTGCTTGTC
CCTTGAGTTC
TTCCCAAGCT TGGAGAGAGT CTCACTATAG TCGAGCATAA ACTACATAGA ACCGTATCCA
5. 5* S S AAATCAGGAT CCTGAGTCAA GTAACCAATC ACATCCCCAT CAAATCCAGA AACACCAGAA CCATTGACAC CGATrAAACC AATTCTGTCT AAAACGGTCT 'rGTCACCAAC GGATTTACTT TCTCCCTCAG GTAAGCATGG ATGGCT'rCAC CAAACTCAAT CTCTGT'rAAA ATCTCTCCCA
TGGTAATCAT
AGGACGTCCA
AAGTCATGGA
AGTTTCAA
GATTATTCTC
AGTCTGGGCC
TGATCAAAAT
ATGGATAGTC AAGCTTTGGT ATTTTTCTGT AGCTTGACGA .AGAT'rTTCAG CCTGTAAAAG CTTGCT1CAAT TCTCCATTTT CACGCAGAGC ACCGCCATrA ATCTGAATCT CTTTCTTGTrC GATGGCTTGT GGGTTGACTT CTTTTCCTTG CAAATCTATG TCAAAGCGAt AACAATCTCG CAAA.ATAATC AGCAAATCCT GAACTTGCTr 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7326 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100
GGCAAACTGG
AGCCCATAGT
ATCAAACAGT
TTGACTCTCA
AAACTCGACG
AGCTTTAAAT
CATAATCCGT
TTGCTTTTCC
GGCAAAGGCG
GGAAACCGCA
CTCATACTCC
TGTTCGCGGA
CGTGAGGTCT TCCAAGATTT CAAAAATGAC TGCGCATTTT CAATCI'CCAA AAAGCCGCCC AGGCTTGTTC AGAGGATTCA AAAGTAAAAT CAGTCTCCAA CTGTTGAGCT TGTCCTGGCT AGATGCCATA TCAGGGAGAT AGTCATAAGC ATCATGGAAG CCAAGCCCCT TCTCCAAAAT GGAGCCAGCA AGAGTTTATC AAGGTACGCT CTACAGAAAT TT'rCTCCAAA AGCGGCGTCA AGGTCTTCAT GTTCTGGCT CAAGTGCAAA ACCAAGACTA GCCTGAAAAC GGAAACCACG AAAGCATCTT CGTTGAAACG CTCACTAGCC ACTCCAACTG CTCGCAAGAC AAATCTTCTA AACCATGGAA CAAGTCAACG ATTTCTCCTG TCTCATCCAA
TTGACTGTGA
CTGGGTCTGC
TCATCCCCAT
AAAATCTGCT
AATCACGGCG T'N'GAGGTCT GATAGTCCAC ATAGACATCC CTAAGACCAA GACGGTTCCA TGGTCTCTTC TGGATAAGAA
TCTTCTACCG
TCTGTCCGAA
TGCTCGATTC
ATCGTACAAA
AGGTTGTTAC
CGATATCGGC
GACGTCGCAA TATCCACATC 495 GTGGATAGGG CTATGGAGAA GGGCATCTCG AACAGAGCCC CCAACAAAAT AAGCCTCAAA GCCTOCT'rCT TTAATTTTTT CTAATACTGG TAAAGCC?1'C TGAAATTCAG AAGGCAT?1'G CGTTAATCTC ATAATAAGTG 'rrCTAATCCA TAGACAAGCT CATGACGCTr GACAACTTCT 'rTAATTCCCA AATTGACTCC TGTCATGAAG GAGATGCGAT CATAGGAGTC ATGACGGAGG G;TCAACCCTT CTCCCTCATT GCCAAAGATG AC7TCCTCAT GAGCTACCAA GCCTGGCAA-A CGAACTCAGTr
TCCTCATCTG
TTAATGGCTG
ACATTTGGGA
GCAAAGTTAG
GCAATTTCTT
GCATGCGCAT ACCATCAAAG TCAGCACCAC GAGCACCAGC CTGCACCTTG CTGAATTGAC TC1'CGAACCT CTGCCATCAA TTCCACTCGG AGCATCCT'rT TTCTTGTCAT GATGGAGCTC AATATTTGGC AGCCTGCGTC GCAAATTGCA 'rGAGTAAGAC GGGCAATCAG GCCACCCAAG TCTTGGGCAC GAGAAAATTC CACTCGTGAA ACCAGTCG'rT CCAACTACTG GAGCAAACC
AATCAGCTCT
CTCAGCTGTT
AATAATCTCC
AGCACCCAAG
TTTTAGCTCT
ATTTTCAAGA
GACATCCGCT
GCAAAACGTG TATTTTCGTA TCAAAACCAG CTAAATCAGC GACTCAAAAG GATCCAAAAC CAAGCAGCCT GGCCCATCTT ACTCCTGTCT AAGATACAAA GAAGTGTCTA CTTCTTGGAA ATCAAGATAC AAAGGACCT GAAAGGCAAG ACAGGCATCT ACAAGGCATC TTAACTAGCC ATACTTCCTG CTCCTAGGTG GAACCCAAGC CAAAATCAAG CCATGAATGA CAATGACCCG
GGCAACAGCT
CTTATCCTTG
TGCCACCAAG
TCCCTTAAAA
GTCCGTAAGA
GAACTATCTT
AGCTGCCTCT
TTTTTCAAGA
TAGAAGCGCC
CGTTCCAATG
CAAGTGCTGA
GTATTGACCT
GGAGTAGTAA AATCTACCCA
AAAACAGGAA
TCCAAGTCTG
CCGGCAATAA
ACACAAAC-TG
TTTCACACAG
GAAAAATAGG
GGCAGGTAGT
AACTAAAT CA TACCCTGCCA TTCTGACTCA GATCAGTCAA TACCATCTGA TTACTCGAAT ACTCATCTCT AAA:ATALGGAA' TTCCAATCAA GGTTCCAGGC GTGTTCAATT GAATGGCACT GACTTTCCAC CCGTGTTCAA TTTCTAAGAT CTGGALATATA ACCCAGACCA 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 55 S S 0
50 S S 0 ACACTACCAA ATGTAGCAAG TGAAACATCC CGCAATTCTT CAGCCTTT'rC AGGAGCATTC GAAGCCGTTG TTTCCTTGAT AATTTCAA'TT ACTTTTTCGT AAACTTCAAT CACACCTTGA AAGCGCTTGG TGGCCTTCT'r TTrCAGTACGA TCGTTAAAAT AAAGGATTGG CTTAATGCTA AGCAAATTGC CCAAAATGGC AGCCCCA'TTT GAAAGGCGTC CACCTTTTAC CAAATGATCC AAGTCATCTA CCATCATAAA GGCTGACGTA CGGCTGAT'rT GAATGGCTAG CTTATCCTGA CAAT'rAAAGA CGCTTTCAAC CATGATGCCT AAAGcAATGG TTAAGCCCTC ATAGTCATCG ATGCTGGCA-A AATCATCGCC CTGA'rCACGC AGGGGAGCAC TTGTAATCAA AGTGTCTGGG ACCATATACT GGATATTTTG GTAAAAACCT Page(s) were not lodged with this application 497 GGrCACTTGCT CCTTCAGTTT CAACCAAGTA ACACTGCT TrGTATACG TGAATCCCTA CTGAAAgGAG ACCTGCCACC GGTGTTGCAA GGACACCAAG AAGCCCGA'rT CCMACATT'rC
CCCTGAAATA
AAGAACATAG
AGCTTCTCCA
CGTGTGTCAC
CCTCCAAAAC
CGCACTCCAT
TATTT'rrAA'r TCGTTTTT-GA CTTGTACTTG AATTTGATGT ATTGTATCAA ACGGAGTGAT AA.AGGTACTG AACCACTGTA ATCTrTATCAA TTCTATCCAA ACCATGACAT TTTCAATTTG ATCT'N'ACCT TATCCTTCTG GTCCTACAAA GACTTTCGGC GCTTCCCITr CATGTGACT GTCCTAGTTT AAAGGCTAAT TCTGGTrA GTTCTAGGT? CAGTrCCCAAA ATATTTACCC ATTGTATAA AATCCTTTC ACTAGTTGCT TTCGTTGACG AAGATGTCTC CGATGAACTG GCTTGAACTT GG'rGCTACTG GTr'N'GTAGT CACCI-AT'r AACTGCCGGT AAGACAACAC CATTGCGGTC GATTGCCTGC ATTACCTGTT ATACGTTCGC TAGTGGCAA AACAGCGATA TGTCTCTTGG TCACTCGTAA TAGACACTTC TTTATCTGAC
TACCCGACTA
AGCCTTCTTA
GCCACAGCGG TCAGCCCATT GGGTAAATCT ACACCGGCAT CTGTTAGGTC AGCAGTAACC CTAGCTAGCG ATACGCGATT TGCACCAGTC CTAATAAAAT ACTTATCACT ATTATAGCGT GTATAGGTTT CCGTTTTTAC CTGCCTAGCA GCATAGACAA ATAAGACACA AGCAAAAAAG TTCATGTTTC CATCCTCCTA GCAATCGTTC AAGTAAGATT TCACGTAATT CTGTTTCAAA TCCATTATAG GTTATCGAA.A TTCCT~CCCCT TGAGACTTCT GATAAACCGA 'rAGCCCCCCG TGTGTTTT'rT GTCAAGGGCA GATAGGCAGA TCAA'rTTGAC TAGGGTCAAT CTCTGGTACA CCAATCT'rGA CTGTAATTTT TTGCGGACTC TCAATGCTCA AAGGAACTTC AATCGTTCCA TrTGAATTTAC GTGTACTTTC 'rTGCATTTCA AAGACCACTG A'rACTTCTGA AGCAAAACCG ATGTCAATAG GGACAI-rG' TACTOTATTA CTOCTACTGT TTTGAAAATT CGTCGCCGTA AGTGAGGATA TGATATATAA ACTATTTN-r TTTAAAACTA AGACCCAC?1 CCTCTTGG TTCATCAXGT GTTAGGTTGT GCTTAAACCT TTCCTCTGAT ACGACAAAAG TCAAGGCATC GTGTCTGGTC CCAAATTCCT TGGAAATCCC CGTCACAGCG ATACGICTT CTTTGATAAT AATAAAAATC TTAATGAGAA GTTCTGCAGA A'ATATACTCC TGCAAGGTAC GTACACGCTG 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380
CACCGCACCA
AATCTTAGCA
AATAGCAACC
TCATGTAGGG GAGTGTTGG TCCAAGGGAA TTCCTGTCGA AAGGCCCCGA TTTTACGAGG ACTCAMGTAT TCAACAGACT TAACAAAGGC ACGAATCATC TGTTCCTCAG CACTAATAGG GGCATTGGAA AAGAAATCTG TCGCTCTTCC CAAACGTTCC AAACCAGTCC GAATCTCTGG AGAGAAGATA ACAACCGCCG CAATAACCCC ATAAGTAATA ATTTGATTGA TTAACCAAGA AATCGTAGTC AAACCAATCA TATTTGCAAG Page(s)... 2 were not lodged with this application 499 CAACTGAC TTGCCTGACT CAGGGTCACG AATGCTCCCA TTTCGCCAAGA AACCCACA GAGATAGGCA CGACCGCTT CCTCATCCGA TAAAATCGCC TCATCAATAC CTGTTTCCAG GCCAAAGAAA GACTCTGCCA AGTGCAAATC ACTTAACAAA TCCTGCACCT TTTCATCTGT AAAAACGGTA TAGAcGCGAT TcTTGCGAAG ATTGCTCCGT TrGGTGGTGAC CAATTTCAGA ?N'GATTTCA TAGAGATGGA GAAAGGACTC ATAGACGGTGA CGGCCAGTT TGG.CAT'ITrc TGTCACAACT GACAAAC'TCA AGCCCGAAG;T CGAGAGACCG ATGCTACCAG ACATTTTGA'r AATGGCAGAT AATTCATGCC AGCTCAGATG TACTGTGAAA CTCAT~rrT= CACCTGTATA CCATCGTGGA AGGCACCGCC ATTTTCCAGA ACTTGCTTAC AAAGACCTAC AAAATCGTrGT TTGGAATTCA TGTATTCCTG AGGCACTTTT GGGCGACCAA GGTGACGATG CAAGACTTCC TCCCCACGTT GGGTCATGAT ATTGCAGACA GCCCGCCCAA TTTCCI-rAAT CACGATATTG GTGTTGGCCC ACGAT'TTCTT CTTTAcTGc ATGCGCATCA ACTCGTCCAC AATCAAAMCT
CGAAGGAACT
TCCACTTGCA
TCAATATTCA
ACGTGGTCGC
TAGGCAATTT
GGCAAAATAG
GTCTGCACTA
ACATTGTCAA
TCCTGAAAGA
TAGATGAAAT CACGCGCGAA CTAAGTATTC ATCAAAACGG CCAAGACAGT GTCGATAAAA TATCTGTAAA GTGTTCCG'TC CTGCC -rGGT TTCCAAAACA AGGTAAAGAG GGAACCTGGC CTCGACGGCT GGCC-AGAGGC TTATGCCTCC ATGGTCTACA CTGCATGAAG GGTCAAAGGA
CCTAGGACAA
GTATCATCGT
ATATGACTCT
TGGTCACTGG
TCATGTCACT TTCAAGGATG TTAGGGCATT GGTCACATAG CTCCAGCCAC TTCTGTCCCA AAGGATAAAT TTTCCCTGTT 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140.
162 00 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 GTATGGAAAA ATTTGCTCAA TAACTGCATG GCATTATAGG TTGAACCCTG CATTTCTGAC AAGCCAGCAA TGATGAGATT 'rCCCAATGGA TGGCCAGCAA AGGCTCCGGC ATCCTCAGAG AACCGATACT GAAAGACCTT CTCATAAAAC TTAGGCATAT CCGACATGGC CACAAGGACA TTACGAAGAT CACCTGGCGG TGTCAACTGT TGCATATrTT TTCGGAGTTC ACCTGAAGAA CCACCATCAT C'rGCCACCGT CACGATAGCT GCGATTTCCA CATCTTTTTC CCGCAGACTT TT'rAGAATGA CGGACTTCC AGTCCCTCCA CCAATCACCC T'rATCTTTGG TTITCTCATG AACGGTTTAC CGT'rrCCTTT CTGCGGTCTT TGTCGCGATG CCCTTCATTA ACAGACCAAT TCTTGGATAA GTCCTGCGCC AAGCGTTTAG CAAATGCCAC ACTACGGTGT TGTCCACCCG TACATCCCAT GGCAATGGTC AAAACCGACT TACCTTCCTT TTGGTAACTT GGCAGAATCG GCrCAATCAA GGCCAATAAA TGTTGATAAA AGTCTTrCTGA CTCAGGATGG TTCATGACAT AATCA'rAAAC AGGTTCATCC AC-ACCCGTTT GGTTTCTCAG TTCTGGTAAA TAATAGGGAT TTGAAA ACGGACACA ACCAG %oo Page(s)., were not lodged with this application 501 CAACATCATC ATAAGAAAGC AAGGTTTGGT ATTGT=TCAC CAAGGCA'rTT CTTGGCTCT'r TCAAGATGCG AACCAACTCA TCAACGGTCA ATTGCTCAAG AGCCGCAAAA ACAGGCAAGC GTCCAATCAA CTCAGGGATA ATACCAAA'rT TTTGAATGTC TTCAGCGATC ATTTC!-rGCA TGTATGAGCT GTTTCGTCA ATCGCCTTAT TATTTTGACC AAATCCCATG ACTr'N-rCAC CCAGACGTTG TT'rGACAA'rT TCTTCAATAC CATCAAAAGC ACCACCCACG ATGAAGAGGA TATrr=GT ATCCAC7TGA ATCATCTCTT GTTGTGGATrG TTTGCGTCCA CCrAGGCG GTACGCTAGC AACAGTTCCC TCAATAATCT TGAGAAGGGC TTGTTGCACC CCTTCACCAG AAACATCACG TGTGTAGAC ACATTCTCAC TCTTCTTGGC AATCTT-G'rCA ATTTCATCCA CATAGATAAT GCCACGCTCT GCACGTTCGA TGTTAAAG'TC AGCAACCTGC AAGAGTTTGA GGAGGATATT TTCCACATCC TCACCCACAT ALACCAGCCTC CGTCAGAGCT GTCGCATCCG CAATAGCAAA AGGTACA?1'C AAGCTCTTAG CCAAGGCTG GGCAAGG.AAA GTTTTCCCTG AACCAGTTGG GCCAATCATd AAAATGT'IrG ACTTCTGCA-A ATCCACATCT TCTGACTCTT CGCGTGTATC GTGGAAATTG ATGCGTTTGT AGTGGTTATA AACCGCCACT GCCAAGGCAC GCTTGGCACG ATCTTGACCA ATTACATAGT GGTTCAAGAT ATGGAGGAGT TCAATTGGTT TTGGCACCTC AGACAAGTCT GCCAAGACI' CCTCAACCAA TTC1'CTCGA ATGATTTCCT 18780 1.8840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 19680 19740 19800 
19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 GAGCTAACTC CACGCAT'rCA TTACAAATAA AAGCATrrGT CTTCTTCT'rG GTTTTGCCA CAAAATGAGC AATAAACCAT
TAGACATGAT
GCATGAATAC
TAACGCTTTT
GTGAATAGAC
TI'CCTTCCAT
TATTGACCAG
TACGAAAAGC
CAAATAAACT
TCTATACTGT
ATTCTAAAG
TTGTGCTCCT
CCGTTCCATT
CATTCTATCT
GCATTTAACC
GCCAGAAGCC
AGACTTCCTT
GCCAGCAATT ATTTTTTGTA CATATCATTT TTCTATTTG AAAATAAGGT CATGTAAAAA AAAGGAGGAT AGAAAGCCCG AGATGAAACA CAGAAAAGCC TCTCTTGCGG TATTGGATGG TAAAATCATA AGGA'rTCTTC GAGACAAGTC AAG'rTCTTCA TGACAATCAC TTCATCAAGG TGTAGAGATT CrTTTCTTGA TAGCAACCCC ATCTATCTT'r GAAGCAAGCG ACGCCCCA'rC GAGTTGTTTC TITAAAGTGC TACCAATCAC ACCCTCTTC-A TCATCTTTGG CGTAAAATTT GCTTGAAACT GTCTCAAAAA GGGAAATAGG TATCTCCTTC CACCCGAGCA TGAATGTGAG TAAGGTTCAA AAGCCTGAAA GCCTGATACC AGTCAAGAAC TCTTCCGGA'r
CCATCAAAGG
TGCAATTCTG
TCCTGGGCCC
TACGCGTCAA
TCACACGCCC
CTGGCAAATG
AATTTCTC CCACCGATAA AGACTGGACG TCCTGAAAAG AATCAAGT TCCCGTTrTG CATCAAGATA GCATGATTCA CCAAGGCAGA CGATTCCT AAATAGCTAC GATTTTCTTA GTCATGCTTC E02 were not lodged with this application 503 CAAGTGG TGA GATTATCTTT ACTCCTGAAG AATTGGGGCA GCAGGTTTCT TATGTATCTG 900 ATGATGCCTT TGACTTAAAT TTAGATAAAA TATTTGACGA ATACGACGAT GTTTTCAAAG 960 C7rMGrGGA AAAATGACAA *TCTATTTGAC AGAAAAGCAA ATTGAAAAAA TAAATG(-L11 1020 AGCAATTCA.A CGGTATTCTC CAAATGAGAA AATTCAAACA GTrAGTCCTT CTGCCTTAAA 1080 TATGATTGTG AACTTACCAG AACAATTTGT CTTTGGGAAG CCTCTTTATC CAACAATTTT 1140 TGATAAAGCA ACGATACTAT TTGTCCA.ATT GATAAAGAAG CATGTTTTTG CTAATr.CTAA 1200 TAAAAGAACT GCTTTCTTCG TTTTGGTCAA ATTTrTACAA TTAAACGGCT ATCGTTTTTC 1260 TGTAACGGTA GAAGAAGCAG TAAA.A.ATGTG TGTAACCATC GCAGTAGAAG CTTTAACTGA 1320 TGAAAAAATG ACAAGCTACT CCAAATGGAT TTCTGAACAT TCTGTTAGAG AAAAGGTCAA 1380 AAAGTAACCT AGTATGCTGG ATTTGAATGA GCACAAGAAA ATAAATGAAC AGACAATATT 1440 AGAATTCTGT AATGCAGAAA CTGATATTGT CTCTTTTTAT TGATGAATAA GAAAGTGAGA 1500 AATTATGGAA TCAAAAGTTA CAATTATCAT GCAAGAAATG TTACCTCTTT TAAATAATGA 1560 ACAATTACTA GCGTTGAGAG AGAGTTTAGA ACATCATCTA GTAGACGGAA AAAAGCAGCA 1620 GAACTATTCG AATAATAACC TGTTGCAACT ATTTATTACC GCCAAGCAGG TAGAGGGCTG 1680 TAGCTCAAAA ACAATTCGTT ATTATCAGAG GACGATTGAA AACTTGTTTA ATGCTATTAA 1740 AGAGTCTGTG ACACAACTCA CAACAGATGA TTTAAGGAGT TATTTAGCAA ATTACCAGTC 1800 TGAAAAGGAT TGTAGTAAGG CAAATTTAGA CAATATTAGG CGTATATTGT CTTCTTTTTT 1860 TGCTTGGCTT GAGCAAGAGG ATATATCATT AAAATTCCCA TTCGACGGAT ACAGAAXATT 1920 AAGACTGAGC AAAATGTGAA GGAAACTTAT ACTGATGAAC ATTTGGAXAT TATGCGTGAT 1980 AACTGTGAAA ATTTGAGAGA TTTGGCA.ATA ATAGAC TAC TAGCATCGAC AGGTATGCGT 2040 GTAGGGGAGC TTGTACAGTT GAATCGTTCA GATATTGATT TTGAAAACAG AGAGTGTGTT 2100 GTCTTTGGTA AAGGAAAGAA GGAGAGACCA GTATATTTTG ACGCTCGTAC GAAAATTCAT 2160 TTAAGAAATT ATCTTAACGA CAGAAAAGAT AGTCACCCTG CTCTTTTTGT AACGCTAGTT 2220 GGAAAAGTCC AGAGGCTTGG AATTGCTGGT GTAGAGATTC GCTTAAGAAA GTTAGGAGAC 2280 AAACTCGGCA TACAAAAGGT TCACCCACAT AAGTTCAGAA GAACTTTAGC GACTAAGGCA 2340 ATTGATAAAG GTATGCCTAT CGAACAACTC 
CAAAAACTGC TAGGTCA 2387 INFORMATION FOR SEQ ID NO: 57: SEQUENCE CHARACTERISTICS: LENGTH: 10669 base pairs TYPE: nucleic acid STRANDEDNESS: double Page(s)7 were not lodged with this application 505 CAA'rGAACTA ATAAATTAGG GTGGAACCGC GTTTCTGACG CCCCTAGG'rr AAATCAACCT AGGATTGTCA GAT=GTGC T'TrGCTI'AT TCAGTCTATT GTCTGAAAGA AAGGAGAGCC CTGGACAACC TTTATCTTGT AAAAGACCA'r AGTCAACTAG CTACATTTCG TGA7=TGTA GTAAGAAATA CTGAAAAGTT GAAAGATTAT CAATCTrT TAAAGAATGA ACTTGCAGTC TG'TGA'IrrAC CGCAAGCTGT TATIwrGGTCA GATTTTAATG CTG.CTACACA GATTATTAGG GAAAG'rGCTG TTCCAACCTA TACAAATAAT AGACGAGTGG TTATGACGCC TGATTrAGCTr GTTTGGAAAG AATTGTAT'rT G'TATCAGTTG ATCGACTACG AGTGTTCTGA GCAAACTCAA GCAATAGAAA GTCACTATCA TTCTTTATCT GAAAArrTCC 'rCTTACAGAT TGTAGGACAT GAGTTAGCTC AT'rGG'rCGGA CATr'T'TAG ATGATTrrGA TGGTTATGAC TCTATATCT GGTTCGAAGA GGGGATGGTT GAATATATTA GTCGCAAGTA TTTCTTGACA GAAGAGGAAT 'N'CAAGCGGA AAAAATTTGT AATCAATCTC TCGTAGAACT TTTTCAGAAG AAGTATAGTT GGCAT'TCATT GAATGATTT-T GGTTCTTCGA CTTIATGATAA GAAC'TATGCA AGTATTITT
S. S S ATCAATACTG CGCAGC'TTT AAGCGGTCTT AGATTCTTAT ATTGGTTITGT TCAGCAGAAA TGTCTAAGAA ATTAACATTT TCCACGCTCA GCACTAGAGT GATCGACGAG GCAGACTCGT TAAGAAATTA ACATTrCACT CGCTCAGCAC TAGAGTGCCT GACGAGGCAG ACTCGrGTCG AAATTAACAT TTCAAGA.AAT ATGCTrATGC- AGGCTTATGA CI'CGTGCTA TCGGACCTGA T'rGACAGTAG ATAAGTTGGT AGA.AAATTTA GGTAGTGTAC CATTATGGG CAAATACAGA A.AAAACTN' CCCTTGTTAG TTAATTGAAA AAGAAATATA AAAACTAAAG GAQTAAACAA CACTGCATCA GTGGCAGAGA CCTCCrTACA-GTCGGGCTGC GCCTGAGCTA GACGCAGTAC TAACTCGTCT TGCCTCGTAT GTCGCAAGTA ATTATTTTTr ATTAAGGAGT ATTCAATGTC GCGTCAGTGG CAGAAACCTC CTTACAGTCG GACTCCCrA GAGCTAGACG CAGTACTAAC TCGTCTTGCC TCGTATAATC CAAGAAATTA TTTTTTATTA AGGAGTATTC AATGTCTAAG TATTTTGACT TT1GCAACAAT TTTGGAATGA CCAAGATTrGT TAATGAAAAA GGTGCGGGGA CAATGAGTC C TTACACTNTC GCCATGGAAT GCAGCTTATG TAGAGCCATC ACGTCGTCCT 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 GCTGACGGTC GTTATGGGGA AAACCCTAAC CGTCTCTACC.AACACC-ACCA A'rTCCAGG GTCATGAAGC CTTCTCCA'rC AAATATCCAA GAACT 'TACC TTGAGTCTT'r GGAAAAAT'rG GGAATCAATC CTTTGGAGCA CGATATTCGT 'TTGTTGAGG ACAACTGGGA AAACCCATCA ACTGGTTCAG CTGGTCTTGG TrGGGAAGTT TGGCTTGACG GAATGGAAAT CZACTCAGTTC ACTTATTTCC AACAAGTCGG TGGATTGGCA ACTGGCCCTG TGACTGCCGA AGTTACCTAT Page(s). 
were not lodged with this application 507 TGCAGCAGCC ATTTACAAGT CCAAGGAATT- ATGGGTGAAA TGCTATTCGT GAACACTACA CGGCGCAGI' CTAGCCATTG ATTGATTCCA TCAGCTA TCGTATCTI'G GATGCCTTTG TGCATTGAAA rr'rGACAGT GGCTCGTGTr GATAAGATGA AGGTTCAAAC TrTTGTGG CAAGGAAGAA GAT'TrAAAC GAAGGCACAA GGGGTTGCTA TTGGCAGAA GCAGTAGAAA ACTTT'IrGCG CTTAGCCCAG AGATCAGGCT GTCCGTCAAA TAAGTTTwGCT TGTTTTAACC TATTACAAAG GAGAAGAAAT AAGAAAAAA.A CAGAAGGCTT GAGTACATCG AAGGTTATCG GACGAAGAAG GAAACGATGT TTACATGGCC G'rAGTCTTGA AAACCACGTC AGCTTCACCT TTGACrGTT GACAGGTA'rG AATACACCCT TCTTGCTGGT 'rGCCTACATC AGCI'GAAGGA CAGACAAATT GGATACCAT'r ATGACCCTTA 'rGCCCTrCGT GTTGGCACAT TGCTATGGAT TGACTTATGA AAATAAAGCA TGGCCTCTAC TCCAAAAGAT CAGATATGTT GGAAGCAGCA CATCTGTTGA ATCACTTTCT CGGT'rGATTC AGCACTATTT 9 .9 9 9 9*9 9 *9 *9 9 9 .9 9. *9 *9 9 9 CACTCATTT'r
TC-ATGATGC
ATCGTTTGGC
AAATTAACAC
GGATCCGAAA
AACACCAGAA
CCGCGCTGTT
TACACCAGAA
TGATCCAAAT
TGCCGTACTT
ATCAGGACCT
TTTTTGAA
AATCI'GTCA
TAAATAAAAT
AAAATGCTC
GAAAAAGTGG
CGTCACCACA
AAACTACGCC
TCATAATAAT
AAG'TACACC
CTTTCTTTTA
GTCTTTNTTTC
ACCGGCTGTA
AGAAGTAGCA
CACCAAACGT
GATTTCATTG
GT'rGATAACA
GCGAAGTGTT
GTrGGTGAT rrGAcCAACT GAAACTCCAG CGGTGGCAGC GAACTrCCAG AGAGCAAGGT ?flGAGITTrCT TCTCAGTAGG CGTrGCAACTC AAGGTGTGGT GAGC'TGATTG ATAGCCTTrA GAGGTTATGG ACTTTATCAA ATCAAGGAAG CAGTTCr'rGC AGTGCTCTCG TAGAAGTAAG CGTGCCTrTA ACCTGGCCGA GAGA6ATGACC AAGAAAAAGC GCAAGTCAGC AATTCA.AACA AATACTATGG TAATGGCTGA CAACTA6ACCA AGAAAGCAGC TTGATAAACG GACTTTATCT GTATCAATGA GCTTGCTAAA.
AACAAGCCAA AC'rACGTGAG T'rGAAGGAAT CAAAATTGTG AAGTACAACG TGAAAAAGGA ACTCT'rCGAA AATCAAATTC TGCGGCTAGC ?1TCCTAG N'T ACAAAGATAG ATGAAACGAT GCTAAAAAAG GAACCATA6AT GCGTICTGCGT CACCACCGTG TCGGCGTCTC CATGACCTCC TGACCAG'rCT CTTTTAGGTA ATACCAGCGT ATGATCCATC CCTGCAGGAG AACCTGGAAC CCGATAACCC ACTCTACGTr 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 99 99 9 4* 9 *99* 9 9.99 991~.
9 99 *9 9 9 fi GCTCTTTGAT TTTCATTGAG TATATGTATT AACAAAGAGA CTACCAGTTT GTGTTTGCTA GGTTCCTAAA AACTATCATT AGTAACTTGC GCCTCCAGCA TCCCCrGAAT CAGAAGCGCC GGCAGCAGGA GCAAATGGTC CGCTACCACC CCAGTCAAGC CZATGGTTGGA AGTTAAAGAC AGGATAGTAC A'N'GCTTGGT AGTTGTGAGT GATCGTACGG ACGTATTCTT GGTTTCCGTT Page(s). were not lodged with this application 509 AAAATGTGAT GGTAATCTT'G TCCATCGACC GTCAGGTGAC GI-rCA'rAAAT GCCTGAACTC ACGACAGATT TATTGACAAC AGGGATCGTC ATAAATGAT rrCCCCTAGG ATTGGCTGGG TCTTCGAATCC CGATTTGCCA TGGGTTATCC CCTCI'GCCT GA?1rI'TCC AATGGTCAGG ATATTCCCTC CCAGATTGAT CAAGGCAGAA GTCACCCCCT ACCI-rATCCG CACTGTATCC 'rTTGGCTAAA CAACCTAGAT TTTAAAAACA CAGTAGAAGT AGAAGAATCT AGCACCGATT CAATTTCTTG AGGC'rGGGCG GTTTGAATTA AGGGACCAAT GC'rGATATTrG AACCCAAGTG AAATCAGCTC AAACACGTCT TGATAATTGA TTTCCATCA.A CTCAGATTCT TCTTGAGCA AGTCAAAGGA TTTTTGGAGA ATAGTGATAG TAGTCCCCAT TAGCCGTTCA CCTTTCTCTT ATAGAAAATA AGTTGTAATA ATTTCATTTT CATGTTATTA TAATACCATA
AACTCGATAC
ACCTTGGCAT
AGGTGGCTAG
GGATGAACCG
TGACTATTGG
CTTCCAAG AAATTGGGCA CG.ATCI-rCAT TCCTTTCTGT CATGAGGATT GATAGACGC C TGA-AAA AC C GATACGCCAG AGAGCGCTAG GCTATGCTCT TGACGGGGGC TATTCCTGCT CCGTTCAAGCG GTATTCAAGT CTTGCTCATC CACTAATGAA GAAGAGTCAA GCTACCAACT
AAGATATCGG
GAATGTGAAC
TCAAATAATC ATCTAAATTG AAGCCCTAC AAGTTAGAAT TTTCACAAAC AAAATTTGGA AAAAGTCAAG AAATATGCTC TA=TrTTAT A.AAAAATGCT GTATACTAGT AAGGTAAATT AGTCGGTGCT AACCACGCTG TGAGAACGAA ATTGTTGTAT GGCTCTTTGG ATTGGTGAAC AAAATTGGAA GCTAAAGGTG TGATAACAAA GTAGTTACAG ATTGATTTTC GCTACAGGCT ATAAAATITCA TCAGGCTTGA AAACAGGATA AATGGGGAAT GAAATAATAG TACCCCCCTT TAGAATGAAG GCAGGAAATT GTACAGCATG TATCAATACC TTGACCAAAA CTCTAACATC.
AAATTGACGG TGCTGAAGGC
CTAAAGTTTA
CGGAAGTTGA
CTACACCAAT
CATGAACTCA
AGGAAAAGAG
CTTGCCZACCA
TGAAAACGTA
TGACAAGAGC
ACTTGCTGAA
GTAAACGCTA ACGGTAAATG TTTATGAGTA AAATCGTTGT ATGTTGGATA ATTTTGGAAA TCTTTCCTAG GATGTGGAAT TTGTTCTATT CTGATAAAGA CCTGTTCTTT CAATCGACTA CACAAAGAAT CATACGAA-AA ATCCAAGGTG TTGAAATTGT CAATTCGTGA AATTGTACCA CA.ACACCTCG ACCGTATCGC GCCTTTGAAC GTCTTGGAAA 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 1.0080 10140 10200 10260 10320 10380 10440 TAAAGGAAAC CGCGAA'rTTA AAGCAACTCT AAAkTGCTGAA GAAGTTATCA ATAAACM-TC CGTTGTTGGT GGTGGTTACA TCGGTGTTGA AGAAGTTGTC CTTGTTGATA TCGTTGATAC CACACAAATG ATGGCGAAGA ACTTGGAAGA TGTTAAAGCA ATCGAAGGTG ACGGTAAAGT TGTCTTGAAC GGTTACTATG ACAAAGACTT TCACAACATC CGCTTGGCTC TAGGTCAAAC TGAACGCTTG ATTACTGACA AAGAAAGCTT Sic Pgs. .were not lodged with this application rrTCAAACAA ATCCTTGACC GCCTCCTAGC CATCCGAAAA GATrTTCCCT ATCGAGAACA 1260 AAATGACTAC T'rrGACCATG CTAACTG'rAT CGGTTGGGTA CGTTCAGGTG CTGAAAATCA 1320 ATCCCCAATC GCAGTCCI-rA TCTCAAATGA CCAAGAAAAC AGCAAGTCAA TG7TTTGTCGG 1380 TCAAGAATGG ACTAATCAAA CCT7TGTAGA 'rTTACI-rGGT AACCACCAAG GTCAAGrrAC 1.440 AATTGATGAG GAAGGTTATG GACAATTCCC TGTrCrCAGCT AGATCCGTAA CTGTCTGGGC 1500 AGTCAATACC ATCTAATAGC TCATAATAAC CAAGCTAGGT CCAAGCGGAT TTGGCTTT-TT 1560 TGTATTCACA AAAAGACCTA CCCAAATGGA TAGATCTrA CTT AT1TACA ATTTACCTGC 1620 TACTGCATCC AACAATTCTT GGATCI'TAGG TTGGT'rGCTT CCTCCTGCCA TGGCCATATC 1680 TGGTTTACCA CCACCACGTC CATCGATGAT TGGTGCTAAT TCTTTGACAA GGTTTCCTGC 1740 ATGAAGG;TCT TTTGTCTTGC TTGCTACAAG GACATTGACT TTGTCZACCGA TAGCGGCAAC 1800 TAGGACAAGA AGATCAGAGT AGTCTTTrG TTTCCAGTTA TCTGCAAAAG TACGAAGGGC 1860 *ACCGGCATCG GATACAGACA CTTGACTAGC AATGTAACGA TCACCGT'rGA CTTCCTTAAC 1920 .ATCTTTGAAG ATATCGCCTG CGGCTGCAGC TGCGGCTTTr TCTTTCAACT CAGCATTTTC 1980 L I TrrGAAGT TGACGAAGTT GTTCTTGA.AG TCCTTCTACC TTGTCAGGTA CTTCCTTGAC 2040 *TTGAGGTGCT TTCAAGGTTG CTGCGATAGC TTTAAGAGCA TCCTCTTGTT CACGATAGGC 2100 ***TTCAAAGGCT TCCrrACCAG TCACTGCCAA GATACGGCGA GTTCCTGAAC CGATTCCTTC 2160 TTCTTTGACA ATTTTGA.AGA GACCAATCTC AGAAGTGTTG TCAACATGAG TACCACCACA 2220 o.ooo. 
AAGTTCAATA GAGTAGTCAC CGATAGTCAC GACACGAACT TCCTTGCCGT ATTTCTCACC 2280 AAAGAGGGCC ATAGCTCCCA 'TTTCTTTAGC AGTGTCAATA TCCGTTTCAA CTGTCT'rCAC 2340 TrTCAAGTGCT TCCCAAATTT TCTCGTTAAC TTGCTCGTtCA ATCGCACGAA GTTCCTCAGC 2400 AGTTACTGCT TGGAAGTGGG TAAAGTCAAA CCAAGGAAT TCAACTTCGT TAAGAGATCC 2460 TGCCTGTGTT GCGTGGTTTC CAAGGATATT GTGAAGGGCA GCGTGAAGCA AATGAGTCGC 2520 AGTG'rGGTTT TTCATGACAC GGTGACGGCG ATTGCTATCA ATTGCCAAGG TATATTCTTG 2580 GTTCAAGGCA AGCGGTGC.A GGACTTCAAC TGTATGAAGG GCTTGACCAT TTGGGGCT'rT 2640 oos.
CTGAACATTG GTCACAGTAG CCACAACCTT ACCTGACTCA TCCAAGATTT GTCCGTAGTC 2700 *AGCTACCTGT CCACCCATTT CAGCATAAAA TGACGTTTCC GCAAAGATAA GAGAGGCAGT 2760 TCCTTCTGAA ACAGCTCCTA CTTCTGCATT GTCAGCAACG ATAGCTACCA ATTTAGAAGA 2820 CAA'rGGCTA GCATTGTAGT TGAAGACACT TTCTACAGTG ATGTTrrGAA GAGTTTCATT 2880 TTGCATACCC ATTGAGCCAC CCTTGACAGC TGACGCACGC GCGCGTTCTT GCTGTTCTTT 2940 Page(s)., were not lodged with this application a.
513 CAAAAATCAA AAGACAAGCT CATArCACGA AGGGCGAA.AA CAATGAACTT GTCATTCTCT TGTTCTTATG CAATTGTATG AGCTTAGATG GCTCGCAGCA CCGCCATr'rC TCTGGACTAA CAACTTTCTT ATTATAACGT TT'N'TTAAGC TT'GCGTCAAC TTAGACCAAT TCCCTACATC 'rCTGATTACT TTTT'CAGGAT TTTTCTTTTT ATCCCAAATT TTCATATTAC TAAACACAGC TAAAGGTGCC TATCACCCAA TATATGGACT CACTrGTTAG CCTTrAAATG GAATAGTATA GCAGTT'rGGT TAACAATCAT TTTGAuAAAA AGTAGACATT TTCAT'rAT'rT GTTGCCGCTT ThAAAArCAA AAAGCAAAC1' AGGAAGCTAG CCTCAAGCTG ACGCTGACGT GGTTTGAAGA GTATAGGCTr AG'rATAC'rAC TAAACAACTA GAA'rAGAAAA AGATAGGGCT CTAAAAACTG AACCAGCTTG ACTGA'N'CGT CTTCT'rACGT TTATCTCCTA GTAGGAAGAG GTCGCTATAT TTCCCTG2'CC ATTTATGGTC GGTGT'rTCAT GGTTrCAACA TCGGGATAGA AGGCCTTATC GCAATTCCT'? CGCTGGTAGG TTTGGTGTTG AATAGCCGAC CATTrTCAGG TTTCA.ACATA AAGTTGATA.A AGGCATAGGC TTTTGGGAAT GACCATATTG TCAAACCAAA GATTGCTGGC GTAGATTrTTC AT'TTTTrCT AACATTTGGC TGGCTTCACC CAACATrAT'r CTGAATCATA TAGCCCTrCA TCTCGTCCGC GAGTCAGTTr GTAGAGCTTA TCCACTGTCT CTTCCAACTG GGCTGrAGCC GAGGGAATT"G. AGTCCTAGTC CCAGCACCTC TGATAGAATT CTTATACTCC GGC'rTCCAAA GGTCATCCCA CCATGGTTTrC GTTGTAGACA ATTCCTAAGG TT~CCCCAGAA TAtCCTGGGTC AAAGGACTGG TTGAGAAACT CTGGTCCGAT TGAATAATC AAGCGGAACC AAGAGGTCTr CGTCCTTCAT T'rGGAATGGC AATATCGTAG GTCGTTCCAC CCTGCTTTAT TGGAGTCAAA AGTCTCGTAC TGAACTTGAA TTCCTGTTrC GTTCAGGATC GATATAGTCT CCCCAGTTATI AGATAACCAA ACCGCGGTAC CACCTTCATr ATTGAGTAGC ATGACT-TCCT GACA.AGTGAA AATCAAT'TCT TGGAAATGAT CTCCGTTGAA ATA?rrTC TTACTGCCAT TACTAGAATA 'TrCCAAATA GTATTG'TCGA TCCAAGCCAT AAAGG'N'GGC CAGANA~CT TCTGTAAGGT TAATACTCAA TACTTGAGTA CGGCALAGGCA TAGGCA.AGCA AATAAACAAA ACT'rCTAT'rC CTTAAAAACG CTTCCCATAC ATTTTAAACT AAATTr'CTCA TAAACTTCTA TTCCTTTGTT TCCTCTGGGA ATACrCCGCA TTTTGGAGAG TGAGTTTTGG TT'TTTAACTG CTCTGTCGGT ACCACATAAC AGAGAAGGTC ACGCCGATTG AACGATAGCC TTGATATT~TG CTGCAGATCC TTGGAGTTGA ACGCGCCCCA TCAAAGAGCA ATGCTCAGGC GCTTCATCTA GTAAGGGATG GAGAAT'rTAT ATTTTCGATT CCTTCAAT'rr C?1'GTTAATC ATGTATTCAC CTTAGTGTAC ATGGCrCGT TTCTGTAAAC TGAGTCAAGA TTTTTGACTA TCTCGACTAT 4800 4860 4920 4980 5040 5100 5150 5220 
5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 Page(s).,. were not lodged with this application 515 a.
a. a a a a a a TAACCTACCT GTAGAAATCA AAGTCCTAGC AGAGGATTAG TA'rATAGAGA AAGCCTT'N'T TCCTTTAACA ACGGACATTC CTTGCAAATA GAGTTTGAAA ACGTTTGCGT AAAATTTGAA TG3AG'TGCTAT TTACCATAGG CCTGAGTCGG TCC-ATATTCG AATTCGAACT AAGAAAGGGG ACCCTTTT~AT CTT'rATGGAG GAG 'TTTATC CTGGTACCTT ATTrGACCAT TGGCAGG'rTG ATCTCTTrGA GCTCAGAGAT ACAGAAGGTC TGGAAAATrC TCTAGAAAAT CTTCATGCAA ATGAGATTGA TGCCTGCAAG g'rTCCTGACT TTCCTGAAAG ATTTGCCAAT GGCAATGCTC ATTCATCTGT CACACCTAAG AGCGATGATT ATCA'rArGAA T'rACTrGCAA GACTTGGGTA AATCTACAAG CAATCACAAG TACAATACGA GAGACAAGGA GACCTTTCGG GAACTGGTGG TGCTGGATGC GGTATTTAAT CATATTGGTr AAAATGGTGA ACAGTCTGCT TATAAGGAT'r CTGAAAAGCT AGTTAATAAG AGAGACTTAC TGCCTAAGCT AAATACAGCC AATCCAGAGG ATTGGATTGA AGAGTTTAAT ATCGATGCTr ATCAGTrCTG GAAGGATTTT CGTAAGGCAG TAGGAGAAGT CTGGCATACA TCTCAGCCTT TCAGCGAGTA GAAGTATTTTa TAAGGCTTTT TGTATACT GTTTTACAAA AATAGTATAC TGAATACTTT AGGAGACAAA AGTATGACTA TCrrATAAG ACATTGAAAG CATCAACTTG AGGATACAAA AGAAATGG'rC AAGTGTC-AGT TGACTTGCA AAAATAT'rrr GTATGGCGAT AAAGGGTGTG TTGGGAATGG ATTTAAGTTG CCTTAGCTTC GGGTTTCAAA TACGGTATGG TATTAAACCC AGAAGGGACT TCIarGGTGG TGATTTACAG TTACTGGACT ATATCTTTGT CAGATTACT TGAAA'ITGAC ATCA.AGCGCA TCATCGTGGC CGCAATCTCT TCAATGGAAA GGTTCCATAT TCAACAATTC 'rATCAGATAT
TTAGACTGG
GGGATTATTG
CCCATCTTTG
CGTCATrTTG
ATGAAAGTCA
AATGTCGTCA
CCAGTGACAA
CAAAACCGAA
AAAAGATAGT
TGGATTCATT
TTGATGGAAT
GATAAGAAAC
CACTATGGGG
AAGATAACTr
CGTATCCAGT
420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 CCTATCATCT IlrCGTTTC GAGGACTATA TCAAGAATTA TCTTTTAAAG GTTGCGACTT GGCGTTTGGA TGTGGCTAAT GAGATTGACC TTTTAGCTAA AAATCCrGAT CITrATATCC GGCTAAATGG AGATGAGTTC CATGCCGTCA TGAA'rTATCC
ACCAGTTCAT
TCATGTTTAA
ATGTTCAACT
T'N'ATTACGG
TGCCTTGGGA
TTTATCTGAT AGTATCAAGG ACTATTTCTT ACGAGGAATT AAGAAGACAG CGATGAAATC AATGGAGAGT CTATGTATTA CAAGCAGCAG ATTTCAGAGG TCTCTTGGAT TCACATGATA CAGAGCGAAT CCTGTGGACG GCCAATGAAG GGTTAAATCA GCCTTAGCCT 1-rCTCTTTTT ACAAAAAGGA ACACCGTGCA AACCGAGCTA GCCTTGACTG GAGGACCAGA TCCAGATTGT CGTCGTTGTA ACGTGTATCA AGTGACAATG ATATGCTGAA CTTTATGAAG AGGCTGATTA Page(s)., were not lodged with this application TTTGATGAAA TAGAGACATA TCTGAAAAAC GGATGGTGGC ATOCN'TAGC AAGGAATCTT TAGCGA'N'TT GCTTGGAGGG AGAAAGAATT TCAAGGAAAT 517 TAGTGGCTAT r-rGGATGGAA CTAATGGAAA TCCAAGAI'A TTGAACGTAT GTTAGAGATA TACTGATAGT GAAI'CAAAC
CG'N'CACTGA
CAAAGTGGTG
AT'rGTTATCA
CAAGAGGAAA
CCGGTAAGAC
TAGATGGTGA
AAAGTCCAGT
TACAATTCAT
TAGT'TrCGT
AAGCAACCTA
CGTATTAAAC
TCTCAGCATC
CACACTATTT AGAACTGCAG CAAGAATATG GCAAAGACAG TGTAGAATAT ACCAAAGAT ?TGC-AMCAAA AA~rCTAGAC; TrTTTAG;TAA CAAAATTGAG TAGTTTGAGA TACAATCTrT TGATAGAGGG AACTTTACGA ATAAGGGATA TGAAGTACAA GTACTCTTAT CCGTTATGAA CAAAAGAACA TCATGATTTC AACTAGCTAT CTTTGAAAGA ACAGTTGA'rG 'rTGGCCTTAA
GAACTGTACA
ATTGTAAATC
ATTCAAATTT
TTCCAAAGAA
TTGCGACAAA
TTATCAATCC
ATCTAGTTGA
ACCAACGAGA
AACAGCACAA
CCCTGAATTG
AAATCAAGCA
TAACACACGA
TAGAAGTTGT
CTCTTGAAAA
TCGTATCTAA
CCCAACTC
AAATTGGAAG
GTATATGATT
a. CAAAAGAAAA TACAACTT'CA GCAGCAGATG TTCTTCAAGA GTTACTCTTT GGGGAGTGGA GTCAGGTAGA GAAGGAGATG T'TGCAGGTGG GGGAAAAGAG ACTTAATGAA TTACTTGAAA AATAAACAAT TGATATNT CTGTATCA AGAACTAACA TTCkITTTGCA TACACCTGTT AATATAGAAA AATTACTGAA AAGTTGATAA GACAATTTTT AATATATTAG TTTTCTTATG AGGAGAATAG AAATGAGAGG AATTCCAAAG AGAAATTCG TCT'rATGAT'r ACAACCTATT GATGAACTAT ATGATATATT T'rTAGTAATT TTGATAAGGT TTAGCAGAGG CAATCATAAA TAGTAACTGA TCATAATACT ACCAALAGGTA T1'AAAAAGr TAATGAAAAA TTATCCGATr TATGATATAC ATCCTCATAT GTGCAGCAGA TAAATTGCAT A'rTGTATGTA TATATGATTA ATCAATGGTT AAGTGAAAAT ATTATAAGTG AGAAAGATGG CTATAATGAA GGATTTCAAT AATCAAAAAA TAGTTAACTA ATGACATTTT GAAAAAAGGT TCTCACTTAT CAGGTGCATA AAGAAAATAC ACGATTTTGG AGTTTAATAT TAACTCGAAA TATTCTCTAT AAAGAAGTTG GTGTATTAAG TTTGGGACAA T7ITTATTA GCATATAGTG ATTATTCTAA AGACTTCAGA GTTTA-ATA.AC AAGATAAAGT TAGCTT.rCAC AAGACTTTAA TTCTAAT'rGG ACTGCAACGA TTTTGAA.AT AAGAAAATAA TGTTT'TTTCT AGTTCAAAAG AA.ATGGAATA GAAATAGTTG ACAAATGGCA GTCTCAATCA TT'rACATGGA GTAGAAATTA TGAACAAGAA TCATGGGTTA AAGTTATCAA CATTCACTGA TATTrGCTCAT 'rTCAATAGT'r TAAACGAAAA ATTT'TTTCTA GAATCTTCGC AACAACT'rGA AAAGTTGTAG CCATGCTTGA CCATTGATTA TTGATCAGCC 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 Page(s). were not lodged with this application 519 TAPATCTGCAA GCGCAGAAGC TAAAGGAGTT CCATGGTCAT CACTAAACAC TTCCCATTC ACGGAAAGAG TTACCGCAGA AAGCTTATCA ACTACAT'rCT CAA'CCTGAG AAAACCAATA ATCTTGCCTT GGTGTCGGAC TATGGCA'rGA AGAATTTCT GGACTTTCCI' AGCTATGAGG AAATGGTGCA GATG'TATCAT GAAAATIICA 'rCAGCAACCA TACGCTTTAC GATTNTCCCC ACGACAGGAT GGAAGAAAAT CAACCAAAAA TACACGCTCA CCACATCATT CArCPrrTT CGCCAGAGGA TCATATCACT CCTGAACAAA TCAATCGGA'r AGGTTATGAG ACTGTGAAGG S S
S. 55 AATTAACTGG TGGCAAATTT ACAATCACAT CATTATCAAT ACAAGGTGGA GCGAAATCTT AAATCATTGA GAACCGCTAT AGTA'rGAACT CAAGCAGCGA TCAAAAAGAA TGCTCCGCTrA TTT'rTATTAC GGACTCA6ACT AGCCTTACAC AGAAGAATTT TGGAATTTTT ATGCTGAAA TTGGACTAAC TATCAATCCT TAAAGGAGAC AGAGCTAGAC TTAAAAATAG AAAAGAT1'GG CGTTTTATCG TTGCGACCCA TGTTGATAAA GACCACCTGc TCAGTAGATA GCAA'rTCTGA CAAAAAGCTC AAGTGGGACT CGCATCATTT CTGACCGTrr TTCTAAAATC GCAGGTGCTA TCTCACCAGC GGTATGAAGT CTATCGTA6AG ACTAATCACA CTCTAT'rTTT TGATGGAACA TTCTAGGGAC TTTGAGGATT CTACATGTGG AGATGGATT CCGTCACAAG CATGCCACCT ATGAAACAGG TGGTGCGTGG CAAGCAACTC AATCGCAAGC TTTAAGAACT ACTTTGCCAA AAGAGAAATA GAAAGTCrA GTTGAGAA'rA TGGATGATTT ACTTCAGAAA GCAAAACTTT 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 AAACAAAAGC ATGTCTTT TCAATTTGCA CAGAAAAATC TT'TATGATGT AGAGTTTTTC CAAGCTCCAG AAACTGAGGA 'rTTCGTTCAA
GGAGTGGAGG
CAAGATTATT
CTTTATCAAG
S. 55 AAGAAAAG;TT ATCCAAAGAA AAAGAACTTC CAAGCGATGA GAAGTCTGG GAGTCCTATC AAGAGTTCAA GAGTAACAGA GATGCCGTTC ATGAATTTGA GGTGGAGTT~G TCACTCAATC AAATTGAAAA AGTAGTGGAT GATGGAATTT ACGTCAAGGT CA6AGTTTGGT ATTCGTCAGG AGGGACT'rAT CTTTGTGCCG AACATGCAGC TTGATATGGA AGAGGATAAG GTGAAGGTTT TCATCAGGGA AACCAGCTCC TACTATGTCT ACCACAAAGA CGCTGCCGAG AAAAATTGTT ATATGAAAGG TCGAACCTTA ATTAGACAGT TCAGCTATGA AAATCAAACC ATTCCATTAC GCAGAAAAGC GACAGTCGAT ATGATTAAAG AGAAGATTGC GGAAGTGGAT GCTTT1GATTG AACTGGAAGT AGAAAATCAA TCTTATGTCA CGATTAA6AGA TGAGTTAGTG CATGAACTAG CAGCGTCTGA ATTGAGAATC AATGAGTTGC AAGAACGAAT GTCALACCTT-G AATCAAGTAG CAGAATATCT ACTGGCTTCA GTTGAAAGTA AGCAAGAAAT GAAATTAAAT CTITCAAAAC TGAATATAAC TGAGAATATC AGTGCTAATA TTGTTGAGAA AAAATTGA.AG AGCCrGGGGA 9180 were not lodged with this application 521 GTTTTTCCTG GTTITGATAAA GTCAAGCGCA GCCTTGCGAA TCGCTGCAAT CTCTGCCTCA TGAGTTTGTrG GAAGCAAGTT CAAACCTGCA- ACTGAAATCT CATCTTCAAA ACCGATACGA ATGACAGCCA AT'rCATCACG ATCAGCCACA GCGATGATAT AGCGAGAGTC TGTCACrAAG CCGTTTGAGC CCCAAAAACC ACTCAAATAA CCATCTAGTT CTTTTTCTTG C-ATTTTAGCT TAACTTTCCT ?TCAAATAGT GTCCTGTrATA AGTTCCTG'r'r ACGATGATGG TTCCACCACC ATGrGTcTGcc GTcrrATA.A CATCCAGATT GTCTACAA.AG CGAGCTAAAA CCTTGAGCAG CGTCGGCTCA TCCAGAATGT AGAAAGATTT TAACTTCA'rA CGTTGGGCTT CTCCCCCAGA ATAGCCTAGC CCTACATCCT TGATGGTCTG GAAAAATTCT A-CCCATCGT TGACCGTCAT GTAGTGAACT TCTAGGGTTT CACTGTTATA ATAAACATCT GGCAAGAAGT GCATCTCAAT ACAGCGACCT CCCTTGACGT TGAAACTGAA TTCATTTGTC TGAGCAAAAA GGTCACGTAT GTTAGACCTC GGCGTCCGTC CGATAGGGCT CTCAATCCCT GTAATAGTCT TAAACTTACC GGCAATGGCT TTTTT'GAGAA TGCTG~rGAT TGTCACTGCG ATAAATTTTC CTAGTGGAAA AqGCGCTCCT ATCAC?1'CAA TAAAACGACC GATGACACGT TTGCCTGACA AGTACTGACC
TCGCGGAAAG
TCCTTALATCA
AAAGCTGCCT
GTCAAGCCCA
ATCTCAAAAC
ACCTGACGGT
TAGACG=T TAAGATTGTT AGAAATGC?1' GTACGCGTTT
GCTGGCTTCG
GACACCGCCC
GTGCTCGATG
GCGAGCAATG
TCCTGTCGAT
AAGGG'rGGTA GAGT'TTGCG'r
ATCCAAGACC
GCGGGTTCCG
CTTGATAATC
GCGCCCCTTC
ATCGTCAAAA
CTGGTCAATA
AGGTTTGTCT
TAGAGTCGAT
GCGAGCCGTG
ATTTCCGACA
TGTGATAGAC
ACCAAAAACA
TTCGTCGTGT
AATCAGGCGA
TTGGCAGCTA
TCAGGTCCCA
ACCAGGACTC
TCCTCTGTAT
CG'rTrGTGGA
GCTCGCTGTC
TGAATTTTCG
TGCCAAATAT
TGGCA-AACTT
CCGTCACCTG
TTGTAGCCTC
ACTCCTGTAT
TCAATCAAAC
GAAT'rACGGT
'TCCCTGAAC
ACATr'rrGCA CGGCGCTC ?1'GCTGTTGC
CCCZCACCAG
TCCACCACGA
TCArrGTCCC
CTTGGTCTGA
TACGAAGACC
GCATACGGTG
TGTCTAAC
CACTGGT -rC CACGACTGA'r
GATAGAACAA
TTCCACAAAC
GTAATAAGAC
AATTCCTGCA
TTGC'rrAGCT
AAAGACTGTT
GA'rGATGATA
ATI'CATGATG
CTTCTTCTGG
AGTCAATGAT
TATGCCATC
GAAGCCCTGT
GTTCGCTAGC
CCAAGGTCAC
GAATGTGTTG
TCTTTT~CCTT
CACAACCAC
AGCAAGCTTC
GAATCTTGGC
AGGTAGCTGG
GGTCGACATG
TGAGCTTCTG
CCGACACACC
AGrrGTT~CTrC
CTGGTACTG
GAGCCACTTG
GACCAACGTC
TAAGAGTAT
TCTGGTGAAG
1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 CAGTGTA CCTGCTGCAA AATCAGATAA TCAGCCTCAC GCCCAAGTCA CGCATCTTTT
CAATCTCACC
GCATGGTATC
TCAGACTGGC
E-22 Paes,. were not lodged with this application 523 TccccAAcAc cI'cGA'rATT GATATTGCTG ACCN'CGAGC TGTrCTCGTAT TACCA'?rGAA GACGAGTATA CCCTGATTAT AGGAAAGAAA TAACCGCACC TACTACGTAA CCATCCCGCT AAACCATTAT CACTACGTGT TTGGAACCAC TACCTGTCCr GATTGCGTAA TTTCTATACC TTCATGCGTT CACGTTM' ATGCAGAGCT 'rTACCTAACA GCCC1'TCGTT CAATCGACCG G'rCAACTGCA TCAATCAACT CGTAATGAAG AATTGATrGA CTATCGTCTA TTTCAAGGCC TCCCTCAAAA CAAATGAGCG GTTCAACCAG CAATA'rCAAG AAATACCTTG AGGACGAACA T'rGAAACCCA ACAGGCCATC GAGATGGCAG ATATTATGG CAGAGACCTT TGCCTCTA'rC ATTTCTA.ACA ACCAGAACAA TTGTCACCAT CGTCATGTCC ATCCCAACCA 'rGGTCT'N'TC
ACCGCTCGAT
CGTAGaACGTG
GCGGAAGAAA
CCGGTCACCG
TGGTATTATC ATCACTGACG TGATGTCTTT ATCAACCGTC CTTTCAAATT CTTTATCGCA CAAGAGTGAA CAAATCGAAA GCTCATGGAA TTGGAAAAAA CGTGATTAAG AAATTGACCA CCTGCrTTAA GACACCCTGA AAACGTCTG CATTCTATGA cATCATGAAA ACC?1'GGCCC TGCCTACGGG ATGAACTTTA CTGGTTAATC GTCTTTATCG TAAAAAATGG TTCTAAGAGG AAAAACCAAG AGTTTGTCCA GCTGAAATCC AGACTATTTT 00 0 000 0 00 *0 0 0 0 00 0 0 00 00 0 0 0 000000 0 *000 0 00 00 0 0 0 0 0000 0000 0 0000 00 00 00 0 0 AGGATAATGA AATCCCCCTA AACGGAGAGC CCTITrGCTAT GAGTGTCTCG CTCAC'rC'T AGTTCCTATG TCTCAAATTG ATCTACAAAA CATTGCI'ACC CAACAATTCA TCAAAGATGG TGAGGAAGTC ATTCCCCAAA rCCTTGAGGA ATACGGCGCA CCAACTCATT CGGCTCATAG GCATCCAAAA GAAAATGATG ACCCAAAACT TAGCCTCTTT GCCCTTGTCA GCGCCCTCAC CTATGGATTG ATTACTCTTC TATTAGTTGG GTACTACTTr GTTTACCAAT ACTATGGACC CTGGAAATCT GTACTAGTTA TCCTAGC'N'C AACAAGCTTC CTACCAGCTA GCCTTAACCC TGAGCAGCC CTCCTAGCCC 1'rCGCT'rCTA CAAATGCCT'r
ATCTCATCCA
ATtAACTAAG
GAAAACAGAC
4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5a80 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 GCAATCTAAA GGTACAACTG CTTCACTGTC AALAGAGCAGT GATGATTATG GACTCAGCTC AACCTTCTTT GCGGCAGACC
CCCGTTCCCT
ACGAAAAAGA
TTTCATCAC
AAGCTTTCGG
ACTGGTTGGT GGATTTGCCT TCTACTTGAT AGATATGGAT CGCAGTCALAC GTCCACCTT TATGT'rCCTT TGTTG(ZrTG TCTTCTTTGC AGTACTGGAT CCATTGCCAC TAGCTATTAT TCTCAAGAAA CGCrGAATA TCCGTAGTGC AAGTGCAGGA CCAACACGCT ATCAAGAATA AGAAAACGAT AAAAGCAACT GCAGGTGCGG TTGCTTTTTC ACTTACTTT TTGAGTTATA TTCAATGAAA ATCAAAGAGC AAACTAGGAA GCTAGCTGCA GGT'rGCTCAA AGCACAGC?1' TGAGGTTGCA GATAAAACTG ACGTGGTTTG AAGAGATTTT CGAAGAGTAT TAAAAGTATT CTTCTGAAAT CCCACATAGC mCTCTTrAT were not lodged with this application
GTATTCTGG
~rrGGCATTG
GTGAGACAAG
TCCCATG.CrA
GATACCATTT
AGCGATAATT
TGCCTGCTCA
GTGCCCCATC
AAAATGATAT
TCGTAAGGCA TGAAGTGACG CTrCATATTCG GCCCAACCAC GACCTGTCCC ATCTCCAATC ACATGAGACA ACATACCTGT CGTTGAAGTT ACAAAGTCAC GCATAAAGCT ACCTAGAAAC TGGTCCGCAT AGGCAATrTC GACGTTGTTA TCCATATCAC CGTCTAGATT TTm-TCATCA AGACCGCGTT GGGTAAAGTA GTACTTTTCA 'rGGTGCAACA TCAAGGCCAA GGCACTCATC G'TCTTTGACA TGTTTTCTCC CCTATTCTGT TCACATTCAA AGACAAAATA AAGAAGCTGG TATCTGTAAT TTrTCCATGTG CTCCTGCTAC TC-ATGGTAAC GTTTrGTAGCT TCT1GGACGAA AGGCAT -rCC AAAGTGAAGAA TGGTAATTCC ACATCTGATC CCTGAACCTT CCTGATCCCT TAATTCCGAT CATTCTGGTC AGATTCAACT TTTATTGTAG ATTTGGCTCT 'rGACACTTC'r TCATTCTTGG a.
a a a a. *a a a a CTTGGGCAAC CCGACGTTCT TCTTIAGAAA ATCATAATTG TAGGGATAGA ATGAACTTCT ATI-rCACAAA TTCACCAGGA TAACTrAGACT GGGCTGAGAA CAGAGCGT'rT CTTTTCAAG TTC=CTTT TTTAAcCCCT GTITTACTGG TTTTTCTTCA CACCCTTGAT ATTACTGATC TCAACATGAC CTCGTCATCT1 =TCAACACT 'rCATTATAGC CAAAATTrTCT CTAATTTCTG TTGTGACACC AGATGTTTCT CTTGGTCACC AAGCGATAGA AATGGCTTGC GCTrCCACCG ACCTGTCAGG TTCAGACCGA TCCTTGGTTA AAGGCCNTGG TTTAAAGGAA GTCACGCCCA GAACrGAGAG AAAATCAAGA GAGACTATCT AGT'rTGCCGC TGTTCTG3TTT GTTTACTTTT TTTTCTTG GAGCAGGTGC
TCCCCCAAGA
TTTTCCTTrT
TGACGTCTCG
TCCGCACGCG
AAAGGAGCCT
GCAATAGGAG
AGATCAGACT
GACACCAATG
CTATTGTCTT
CTAGGGTCAG
TTTGTTCTTG
CCTCAACCGT
CAGGATTCCA
CCCCACCAGC
TCATGTCTTG
AGTCTGGCAG
CACGGTGTCC
TGGCTCCCTG
TATAATGAGA CTGGGTCAAT TTr'rGGCTAT GGAAAGGAGC TGTCGGTTGA TTGCCCTGTC CAAGGCTGA-A ATCC'rGAGTT AGGTAGTTAG C'rTCTTCACG CGCCACCTCC GCATAGCTCT TTTTAGGTTT TTCGACTTGC TTTTCAATCG CCCATTCTAA ATAATTTTTA TCTCGATACT CATCATAGAG ATTCATGACT GGCATTTCAG GAAATCGTTC TTGTTTCATT TTCTATTTCC too* *too a00..
GATTTTTCAA
ACTACCACOT
GAGTTCCTGA
TTCTTCCTGA
CCAAAGGTCA
CTrGAGGGAA
TCTTTCCTTG
T'rCTTGTTCA
GCCGTCTGCC
ATAATCTTCC
GTGCTGGCTT CAGAAATTCC GACTCTGTGC CGTCCAATAC AT'T?1'CTT CAATGGTTCC CCCATCCGAT GGGCACGGCC ACCAAGATCA CTGTATCTGC ATCAGAAAGG CATCTCTTTC GCTGGGGTTG AACCCGTAAT ATTTTTTCCA ACATTCCCTT ACCTGTACCA GTAGGTCTCG ATAAACAGGG CAGGAGTGTC
527 GGTTTGGTAG CrATCCACTT CACACCCTGC TCAAAAAAGG GAGGTCTGGA ACCAGCCCIT GGCATCATCA AAGGCTTCCC CTTTCTCTGC TCGACAATCC GCTATTGATT TGGCCGATC
GATGGTCAAA
TCAGAGGGAA
CCTCCGTTAG
AAGAAAGAGA
TTAAAAAAAG
TCAGACTCCA
AGGCGCATTC
TGGTCAAC
CACGCTCATC
?rGAAAATGG CGCAACGCCA TCAGTAAATT AAAGAGGTGC CGACCGT= TTGGAAAAA TCCGTGCAAG AAAGTCAAAA GTTCTTGGCT CTCCTCATAA ATCTTGCCAA TCATATACGA TGGAATATCC CGAATGACAT AGTATTTTTG CAAGATATGA TTGG=rCCTG CT'TCCACCTG TGATTTTGGA GATAAAATTC GATCCAAAAA AGCCTCTTr1' TCTTCATGAC CTTCTTCCAG ArrTTCAGA AAATGCTCTA GCGCTGCCAA ATCACAGGCA CAAAAAACCA AATCATCCI'C GCGAGCGTAG AGCCGATTGT TCTTCCTT GGCAACACCT TCGATACGAA T'TTrCCCCGG
ACCCACAGCT
CTTGCCACCC
ACTCCACAAG
ATGCACACAG
TAAACTATAG
GATAACTCAT
AAGGTCACCT
ATTTCCTGAC
TACCCCTCT TTTGAAAAAA CGCAGTCTT CT'rCTGCAAC :.go.
6 a 9 00*0 GATGATATCA ACCTTACCAG TTTCATAAAG AATCAATTTA GCCATATTTT CACCTTTACC TTATCTT'rTT ATTATACCAT
ATGAAAATAG
AAAGTAGCCA
ATGCGGGCAT
GCT'rCATTTC CCT'rCTAGGA AGACTTTTCT CCTAGAAGGC TGGATTTTTA CAATCCGCTG ACAGACTCT TGCAACAGAG AT'rTGGGCAT AT'rTTCGCCT ACGT'rTGGCA
AGCTATATTG
4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 GGAGACTTCC TTCCTCTCCA AAATCCAAAC CACGGTTGAG GATAACCTTG TCAACAACTC TTGCAATGTT TCATCAGTCA GGTCATAAGC 'rGAAAAGTCA AGCCAAATCA AGTAGGTACC TTGCGGTTTC ATGACC'rTGA TTTTAGTCTC TTTTCCAAAT AGATCCATCA CATAATTGAT GTGGTCTTCA AAGACTTGCT 'rGAGTTCCTC TAGCCAATCT
TTACCGTATC
TTATTGGCCA
TAGGAATTT
AAATTTTTGA
AAATCTTGGT
ATCTTCTCCA
ACATAGAGT-r
AACAGACTAT
CTGCGAGCAA
GATAGGCAGC TTCTGTCGCC AAATAACCCA AGCCTGAAAT TrCATGCTGA ACAGGCGTTr CTGGAAAGCC AGTCTCAACT TAGGATTTTC AATGACTGCA
T'TGTTCCAGC
AGGCAGGATT
GAATCTCATC
AC-ACTTCTTr'
TAACCTCCTC
AATATTAAAT
GATGGTA-TTG
CGAAACTAAC
TTCCCAAACA
TTCCACCAA.A
GTT'rTAGTGG CACTGCTCAA GACGATAGCA AAAGACTGGT GTTGTGACC AAAGAGGGTC AAAACACCGT CTTTTTGGCA CGTCCACCAG GATTGTGAGG TCCTTTTCAA GTTGGTCAAA GAGT'rGGCCA
GTTGCAAAGA
GTCAATCTCA
CC'TTTTCCAC TAAGGAATTA GTAATCAATC TACGATTATT CAACTTGACA AGGGTGGGTA GACAGGCGTG TTAATTAAALA CCGCCTCGCC T'rCTTTTGTA AAGGT'rTGAA TAGCTGTTGA GATGGCTGGT ACCACACCCT CGATAAAGAC AAGAGCCTCT £29.
.were not lodged with this application 529 TTTCGT'rCGC CCCCCTGACA AGAGTCGTCC GCGT'rCACCA ACTTCAGTAT CTAGTCCCTC 8400 TTTCATGGAG CGAATCTCAT ATCAGTTACT AAGCGATTCA TGCAT'rAT?1' TGTGAAACCC TATACTTGAT TGCTCCATTA ACGCACAATC GTTGATTTTC GAAAATGAA CAAGTAATAT ATGGTI'AAAA TTCAACCCTC AACTGCAAGC AAGTTCTCCA AAAATTAGCT ATATTACTAA CAAGGTTCCC ACAGATATAT AGCTATAGTC GCAAAGATAA TGATTTCAGT GAATTGTTTT TTTCTCTGCT TGGTTAGTCT TGTTAAATTT CCTCCTGTAA ATAATAAACA TCATACAAGG CTAAA'rAAGA CTACAATGGA GTCGTAATTA AAAACTCACG CCACTTTGGC TCTTATCAAA TTCCTGATTT TTGCTATCTT CACCTAGTGA TACTAAGTCT AGCACTTTCA AACCCAGACA AAGA'rTGTCA CGAATACTGC AAGCCGATTTT AC1-rCTCCAT TCTTTAAGT GAATATCTCC TGAAAGCGGT TTATAAAACC CTGATCCAGA TGGTCCAACA AAAGCA.ATTT
TCAATTCATC
CAGATAAGAC
TAAAATCATA
GCTCTAACAA
TTTGCCCCTT
CCTTTAAGAC AGG'TCGATTT TCATCATAAC CAAAATAGAC GTCCTGATAC CGATTTTCCT CCCTCAAATT TTTCTTTAGG GTGCAACTGA AGATCCCTTG CTCCTAGAAT AAACAGTTAC TAGGATTAAG TAATTGAAAG AGGTAAATCA AAAACGAAAC ATCCTGCGC'r GACCCGATAA CCCCCATAGG 'rTAGCATCAC ATAAGAGAGC AAACGGGGTC TCAAAAGAAG TAACCCTATC GTACCCTTTC AATACAATTA TCCAAAACAT CCTGTACACT TAATTAATTC ATGTTCTTGA ATCTTTTCAG TCAATTGCCC PCGACGACTA TACTTTTCAC TGATATTGGA AAGGGGCAAG
AAGAGTGATG
PCCAAGTACC
AATGACACTC
GAAGGATTTC
AATAAAAGTA
ATAACTAAAC
GTGTCATTGA
TCTACATAAA
8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 GAAAACAAGG CTTGACCAAT TCAATAGAAC TCCCATCTAT CTAGACAGAA CAAGTAAGAA AATTTCATAA ATACTCCTTA GGATAAAATC TAATAAATCT TGATATTATG CGTTTTTAAG TATTTGACAT G'N'TTGCCAA AACCATTTTT CTCTGGCTT AGGTATTTAT ATGATTCAAC TTTTTCACCC GATTGACTAA A.AAAATCAAA AACGATTGAA TAAATCCTTT AAGATAAGGG ACTCCCCATA ATCACCTTAG TAATATTTCA ACGGATAAAG TCCTATAACA AAACGCATAA GAGAAAGATT CCAATCAAGA TCAGAATAAT ATTTCGGAAA CAATGGCAGA AGTCAACTCC TCAkACCCCTC TATCACTT'TT ACAGATAGTA ACCAATAGAA ATACT'rTGGA GCCTATATTT GAAGCAACAA AGCAAGTAGA TATCTACTCT TAATAATT TCGGGAATAA CTCAATTTGA CATCTAGGAT TTTATATACC ACTTATCTAC AATTAGATTT CACAAAGACT TCTTACACAA TTCTTCTTCG GCTTTTTTAT TGGATTCTTC TTTTTCTTTC TGCATATTCG TCTGTTGTGA CAATCTTATC TTGTACTTTG CCCTTTTGTA CCGGTTAAAC CATAGGCAGC AGCAAATGGT ACGGTTCTTC TCAATGATGG
TCAATCAACC
TTATTAGCTT
TTGGCCTCAC
ATATCGAGAT
TAATCTTTCT
AATTGCTGAA
ATAGAACTAA
ATTGG43AATT
AAGCTTGAAT
CTTCCAACAT
CAGGCTCTAG
AGGTTGACCG
GAGCAGCTGT
TGTCAATCAC
CTCCTTGTAT
GAACACCCT
GCT'rTCTCAG GA'rTGTAGTA TTACCATAGT TGACCATCr'r ACAAAGTTTG GAGGAACCAC GACTGAGCCC CATAAGATGT TTGAGAACTG CT'rCCTGAGT TAAGACTTCC TATCTAGGT'r ATGATATTGT TTTTGTATTT CGAGCCGTAG TATAAGCACC TCATAGTAGG 'rCAATTTCAC TTTT'rCTTAT ATI'CAATAGC TACAAAATAC TAGATGGATC GCATTAACAG GAAAAAGTAT ACCAAAGTAT ATTGAACCGT CTTGTTTTAC CAGTGATATA GCTTCTGATT TTTTATCAGC GGCGCATATT CTTCTCCCTC TAGGTCAAAC CGTCCTGAGA TATTGGTCAT TTTCTrAATAA CG7?I-rCTG CTAGATAGTT TTTGACGCCG TGCTAGAATT CCTGCCAAGG TTAGATATTT 530 TGTTCCCCCA CGCGAAACAC TTGGAAGAAC TAAAGAACTA ATCAGCATAT TTCTCATAAC GTI'GGCCGG ATCTTGCTCT TTGAGTATAG ACATCCAGTC CAACTGCCTT AGCC?1'GTCA 'rCCAAGATTT TGCAGAAATC CTCCACTATT AGTATTAAAA GTCTTGATAA TCAGG'ICCCC AACCGCCATG ATATAAATCA TTGAGCAALAG TAGCCTGAAC TGTCAAACTC ATCTCATGTT TACATTATCA GAACCTAAAA CAGATTCAAT TGATTrGTTTG GCCTACTrTTA TCTGTTACT CCACAGTCT'r ATCCAAGTGG TGCTTCGAGT TCTTTCTTAG CTTCCGCAAA CTTAGCCTTG AGGGTCTTGA CCATCCGCAA AG7TGATACC TTGCCAT'rCC AGAGGCTACA ACTTCACCAA AGTC-=TCC CTTGATACTG TACGTTACGC AAAATCTTTG TTGCACC'!rC TTTCCCTTCA TCTGTCAAAA GCAAAATTGA TAGCCTGACG GAAGTTTTTA CGAT-TTCTTT TCAATGTCAC T'rGTTT'rAGA AGTATAATTG AAAAT'rAAAG AAATATGAAG TTGAATTTTG CATACTA'rAG TTCTTTAATC CCTTCATAGC TGGAGCTGTT AGGAAAAAGA AGCTG'rAAAA TTACGTT'CCA GTGATTCTTG GTCGCTACCA ATCGTCTACA AAGACATTCT TAGCATCCCA GTAATTAGGG AGA'rrTTGAG ACAAGTGCTT TCATCAAGAA AGGTCCATTG CGCCTrTCCCA AAATCATCCC CTTTTGATTT CAGGAAATCT CGTTGCAAGT G'TTTTTGAAT TCCAGTAAAG T'rCTGGTTTA TTGGTCATCA AGTGCCTTGA CACCGACAGT TGAAAAGTCG GTCATCCAAA CCAGCAACAG AGTCCTGCAC TAGATACAAG TGCATATTGC AAACCTGTCA CAAAA'rCCTG GGCAGTTACA AGAAGTAAAC CACTTGGCAT CCTTACGAAG TTTGTAGGTA AACAGTCCAA TCCTCTGCTA ATGATGGAAT AATATTCCCA CCCGTCTACC AAATTTGCAA CAATATCGGA TGTTGCTGCG CAAGCTAGAT GGATCACITrG AATAAACATA GTTGTAGGTT TCCACACGCG CTCAATAAAA CTCCTGTACC CAGGACAAGA GCTCTTAGAC TTTTTCATT'r CCGG 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11864 INlFORMATION FOR SEQ 
ID NO: 62: SEQUENCE CHARACTERISTICS: LENGTH: 2412 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: TAACTGCACT AAACATAATA TAAGGAGAGA AAATGTCTGC AATAGAACGT CTGCTCACTT AATTGATATG AACGATATTA TCCGTGAAGG GAATCCTrACT TTGCTGAGGA AGTCACTTTC CCCCTATCTG ACCAGGAAAT CATCCTAGGC TGCAATTCCT TAAACATTCC CAAGATCCTG TCATGGCTGA AAAAATGGGA GTGTTGGACT GGCTGCTCCC CAGTTAGATA TCTCAAAACG CATTATCGCT CTAATATTGT TGAAGAAGGC GAAACTCCAC AGGAAGCCTA CGATTTGGAA ACAATCCAAA AATCGTCTCT CACTCTGTTC AAGATGCTGC TCTTrGGCGAA GCCTGTCTGT TGACCGTAAC GTGCCTGGCT ATGTTGTTCG CCATGCCCGC ACTACTTTGA CAAAGATGGA GAAAAACACC GTATCAAACT CAAAGGCTAC
ATTACAAAAG
CTACGCGCGA
GAA-AAGATGA
CTCCGCGGTG
GTT'TTGGTAC
GCCATTATGT
GGAGAAGGTT
GTTACTGTTG
AACTCCATTG
TTGTTCAGCA TGAAATTGAC CACATTAACG GTATCATGTT T'rACGATCGC ATCAATGAAA AAGACCCATT TGCAGTTAAA AGACGGGGTT TTGTGTTATA CGCCGTTACC GAAGCCCTCC CCGAGGTAAG AATGTTGAGA TTGGACATCA AAAAAATCTC TCTACGAGTG TCTGAATTTG AGAAGAAAAT CCACTTCTAT
GATGGTTTAC
ATAGAGGCAT
TTGCAAATAC!
AAGTCAAGGA
TCTCTGAGAT
CCTATAGCGA
TGATTCTAGA
TGATTCTTGA ATAAAGAAAA TCCCGTTGCA GAAAACAAAT GATA'N'GTCT ATGGTGTCCA AGGAAACAAA CTCTACCTCC AAGAAGATCT ACTAGCTACA GAAAAGAAGG TGTCCATTTC TACTGAAGGT GCTGTTCATC AAGGTTTTGT GCTAGATTAC ATCCTTGCAA AAACACGCCA TGGTCTAACC GATCCCCATA ATCTGGGTTC TTCAGGTGTC ATCATTCCCA AGCACCGTAC AGCCACAGGT GCTATTGAAC ACGTtCCAAT AGGATAAACT TAAGGATGAA GGTTTCTGGA 120 180 240 300 360 420 480 540 -600 660 720 780 840 900.
960 1020 1080 1140 1200 1260 1320 1380
TATCTTGCGA
TGTCGGAGTA
TGCCCGAGTG
ACAGCCGATG CGACCAATGT ACTCCTGTCG TTGCCAAAAC ACCAACCTCA GTCAAACCTT CCTTTGGAAC GGATATGAAC CCCTCATCAT TGGAAATGAA AAATGATTAC CATTCCGATG GGTACTCCTT GCCACAAGTG GAATACAAAA 6GGAAAATCG' GGAAAAGGTA TCTCTAGCAA CATCAAAAAA CAGGTCGATG AATGGACATG TTCA.AAGCCT TA.ATGCCAGT GTTGCTGCCG 532 TTCCGAAATA GACTATAAAA AAGTTNCCAG TCATCTGAT'r CCATTCTCAT GTACGAAGTT GGAAACT'r TTATGATTAA TAGCTCCATC TCCAACCGCT CTGCAAAGAT ACCGTCGACT GATCTTGGAT AT'
AGACACCACCGA
ATTCTACTCC CT' TTrTCATTCGC A
GAACAATGGTAA
CTCCACCACC AA( AGTAAGAAAC AC( TAGAACCAGT TG( TCTTAAAATC AC( ATTTTCAAGT CC' TGGGTAATTT TC( CAAAGCTACT AGCACCGATA AT~ ATTCTAACTC TG rcAAT'rCT kCC''CTTG rT'CACCC k.GGCGCA
:AGTCTTA
TACCAAT
CACGACTG
CTACGATA
CATGGCTr
T'TCAAACA
CTATGTTCTG
G?1'GIACTT CCACTT'rCA
TTAACAAAAT
TCTGTCACTT
T'rGATT'rCCC TCT'rGTAAAA
GCAAAACGAG
AAATCTTGGT
TTCACTTCTTI
ACTGTACGTG
CGACATr'rTC TCT'rTTCAGC TAATGAATTT ATAGGCTrCT TGAccA~CGA GCGAAGGTC TTITCAACCGA ACATCTCCAA TCTGTATC TGTCACAATC CATCCTGCCT CCCTAAGAGG GTCCAAACCA ACATAGATAA GACCTGT'rTT CACANTTCA AATACGACTG TTACTACACA ATCCCACATA AAGCTCATTT CCTTTrGGCC ACGAAGTTGG TCACCACCGT TCAAGAAGAG GGCTTCTTCA ACAGCTGAAT CACGGAAGAA AGCACCATCA CACACACCAC CTTC'rCCAGG CACTCCCAAA GGACGGTCT'r TTI'CATATGT TTCGTCATCA GTCATCAC'TT AACATAACCA TAAATCTGCT CAACACCAAG CAATTCAGGT CCACTAATAT TAGCGTATCC CATCTGACCA CCTGGCAGAC CACC'rTCAAT ATACAAGGCC GCAGTCATCC CTGCAGGTCC 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2412 GATATCAG ATGTATTATT ~GATTGC TTCCAGCAGC %ATAGTAT CGTACATATA GAT'rCCTITCT TTCTTGGTGT AACTATCTTT INFORMATION FOR SEQ ID NO: 63: SEQUENCE CHARACTERISTICS: LENGTH: 7760 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: CCGATTTCGT CGAAT""TTC TCTCATCATT TACAACCTGT TGCAAGAGCA GAGT'rTACCT TGCTGCTTCA TACCAAATTG GCACAACCCT CTGTIrTGGC AAATATTGTA GATGTAAACA AGGATGAATG CATTTTAGGA ACACTTCCGCTCCCAATAC CTTATTGGTT ATTTGTCGAG ATCACCACGT TGCCAAACTC ATGGAAGATC GTI'TGCTACA TTTGATGAAA CATAAGTAAG GTCTTGGCAC TTGCTCTCAA GACTTATTTT TCAAAAGGAC ACACAGAAAA TCGCGATAGA AAAGTTATCA CCCGGCATGC AACAGTATGT GGATATTAAA AAGCAATATC CAGATCCTTT TTrTGCTCTTT CGGATGGGTG GCAGATTCTG GAAATTTCCT GGCGGGTGTT CCCTATCATT
ATTTTTATGA
TAACGAGTCG
CTGCCCAACA
TAAGGTGGCT ATCGCAGAGC AGATGGAAGA AGAGGTTGTI' CAGGTCATTA CGCCAGGGAC GAATAATTTT TTGGTTTCCA TAGACCGCGA TTTGGTGACG GGTGACTTTT ATGTGACAGG AATCCGTAAC CTCAAGGCTC GAGAAGTCGT ACAAATCCTC AGCCGCCAGA TGAATCTGGT CCT'rCATrTA TTGGATTT-GC GAT'rGGCAAC A'rrATTr'rAT GAGGATGCGG CAACAAGAAT GCCGACAATC GTATATCGAT GTCTTGATTG TCCTAAACAA GCAGTTGGGG AGTGGTCGAT AGCPAGTAAGC AGGCAATCAA TTTGGCCTAG TCI'rTGGAT T'rCACGCTGG GTTGGGTTAT GACTTGTCTG ACTCTCTTA'r GAAAAAGAAA GGTGGAGCAA ACGGCATCTA
TCAATGCI'GC
CGATCCC'rAT
AGCAGGGTTA
TGTTAAACG
CGGACAGTCA
CTTATATGGA
TTTGTCGGGA
AGGAAGAAGA
GCTTITGAAGA
GTAAGCTGC'r a a.
a a a a a. *a a a CCAGTATGTr
CGAAATTAAG
GAATGCTCGC
GGCTATGGGG
AATCGTCCAA
CTTGACAGAC
TGGCAAAACC
GATTCGTGCG
CATCGGACTC AGA'rGAGGGA A'rTGAACCAC CTCAAACCTC TrATCCGCTA GATTTCT'rGC AGATGGATTA TGCGACCAAG GCTAGTCTGG ATTTGGTTGA TCAGGTAAGA AACAAGGCAG TCTr'rTCTGG CTTrTTGGATG AAACCAAAAC ATGCGTCTCT TGCGTTCTTG GATrCATCGC CCCTTGATTG ATAAGGAACG CGTCAAGAAG TAGTGCAGGT CTTTCTCGAC CATT'rCTTTG AGCGTAGTGA AGTCTCAAGC GTGTTTATGA CATTGAGCGC T'rGGCTAG'rC.GTGTTTrCT-T AATCCAAAGG ATCTCTTGCA GTTGGCGACT ACCT'rGTCTA GTGTGCCACG ATTrITAGAAG GGATGGAGCA ACCTACTCTA GCCTATCTCA TCGCACAACT 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 GGATGCAATC CCTGAGTTGC AGAGTTTGAT TAGCGCAGCG TGTGATTACA GATGGGGGAA TTATCCGGAC TGGATTrGAT TTGCGTTCTC AGAGAAGGGA CTAGCTGGAT TGCTGAGATT CTCTGGTATC AGCACGCTCA AGATTGACTA CAATAAAAAG CACCAATTCG CAACTAGGAA ATGTGCCAGC TCACTTrC CTCAGAACGC TTTGGAACCG AAGAATTAGC CCGTATCGAG TGAGAAGTCA GCCAACCTCG AATACGAAAT ATTTATGCGC G'rACATCCAG CGCr'rACAAG CTCTAGCCCA AGGAATTGCG TCTGGCGG'N' GTGGCTGAAA CCCAGCATT'r GATTCGACCT AATTGATATC CGGAAAGGGC GCCATGCrGT CGT'rGAAAAG TAT'rCCAAAT ACGAT'rCAGA TGGCAGAAGA TACCAGTATT AT'rGCTCCTG AAGCTCCTCA GAGACTTTAG ACAAGTATCG GAGGCTAAGG AGCGAGAAAA GATGGCTACT ATrTTCATGT CGCAAGGCGA CGCTGAAAAA GGAGATATGC TTGAGGCGCG ATTCGTGAAG AGGTCGGCAA ACGGTTGATG TCTTACAGAG GAGTTTGGTG ACGATTCACA GTTATCGGGG CTCAGACCTA CAACTGGTTA CAGGCCAAA 534 CATGAGTGGG AAGTCTACCT ATATGCGTCA GTTAGCCATG ACGGCGGTTA TGGCCCAGCT GGGTTCCTAT GTTCCTGCTG AAAGCGCCCA TTTACCGAI' TNTGATGCGA TT"TTTACCCG TATCGGAGCA GCAGATGACT TGGT1rCCGGG TCAGTCAACC TTTATGG'rGG AGATGATGGA GGCCAATAAT GCCATTTCGC A'rGCGACCAA GAACTCTCTC A'rTCTCTrrG ATGAAT'IGGG ACGTGGAACT GCAACTTATG ACGCGATGGC TCTTGCTCAG TCCATCATCG AATATATCCA TGAGCACATC GGAGCTAAGA CCCTCTTTGC GACCCACTAC CATGAGTTGA CTACTCTGGA GTCTAGTTTA CAACACTTGG TCAATGTCCA CGTGGCAACT TTGGAGCAGG ATGGGCAGGT CACC'PTCCT'r CACAAGATTG AACCGGGACC AGCTGATAAA TCtACGGTAT CCATGTTGCC AAGATTGCTG GCT'rGCCAGC AGACCTTTTA GCAAGGGCGG ATAAGATTTT GACTCAGCTA GAGAATCAAG GAACAGAGAG TCCT1CCTCCC ATTTCACTCT 'rTGATAGGGC AGAAGAGCAT 
GTGTATAATA TGACACCTAT GCAGGTTATG TAAAACCAAG ACTCACTAGT TAATCTAGCT ACTTTT'TTCC TAGAATAACA TCACACAAAC.
CCCI-rTTGTC rATTTTTAA GGAGAA.AGTA GGCAGGCCCT GGCTTCGCTC CTGATGACAG CGCGTTATC'r GCAGGAAGTC TTAGGCGCCC GTATCGGGGC TTGGT1'GATT GGTGTGGCCG TTGTCCTCGC ACCCTATATT GCCCAAGGAG GTAAAATTCA AACCT'N'rCT TATGCTGATA TTCGAATGAC AAATGATATC AACCAGATTC TTTTCAGACT 'rCCCCTCTTG TTCATCGGT'r CTCTGTGGTG GGTGATTGTT CTCATGGTAG GTATCAAGGA GACTTCTTTG AGAATGAAAA GGAGCTGACG TGC'rCATTCA GAAAATAAAA GCTTGArCGT TGCTAGT'rCA TCCTTACTGG GAAATATGAA TAGTCGGTCT AGTTGCTGGT TTTCATCCGA CCT'rCGGGAG TTGAACAATT TAATGCGGGA AGAACGTTGT CATGATGACC CGTTTATCCT AGCGGTTCAA TCTTGAq~r'r TGTTr.ACT
ACAATTCTCC
CATTGTCGCT
ACCTACAAGT
CTTCTGCAAC
GCTATTTATA
GGACTCAATG
GATGCCTTCC
AATCTAGTCG
TTCCAAATTC
ACCTTACCTT
GCTGTCATGA
ATCAATGCCA
GAAAAAGAGC
TACATTGGTT
GTCTTCCTCT
TCCA'rCGCTTr
TTTTTGGGAA
ATGAGACAAA CTAGTGCTGT CACTGAACAG CCTATCCTAG CAGAATTAGC TAA.ACTGGAT AATGTCTTAG TAGAGTTAAA ACAGAAACTA 2160 2220 .2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 TGGGAATGAT GGGGCCTCGT 'ITGCCAAGGA AAATTTACGT AATTTGCTAA GTTTACAGAG ATGCCTTTTC AGrAGTGGAA CTATTTGGCT GGTCGCGGGA TTT1GCCAAGT
GGCGTTCGTG
GTCTCAGACG
CCC'rTTA'rGA
ATGGTTCAGT
TTCAAACCCT
TGGTCAAGTC
AGCTTCTTGG
TGTTGGTTGG
CGGATCCGC
TTACCATTGT
TCTTGAGCGC
CTTGTCCAA
TCAAA.ACCTT
TTACGGGGCG
TGTTGTTGGT
TATGGTTGGA
CTTTTGTTLAA TrACCAAGC CAGATTATCT ATTCTGTCAG CCGTGCCATG A'ITTCCA'rGC GTCGTATTCG AGAAATTCTT GACGCAGAGC 535 CAGCTATGAC CTTCAAGGAT ATCCCAGATG AAGAGT'rGT TGGAAGTCTT AGCTTTCAA 3960 A'rGTGACCTT TACCTATCCA ATGGACAAGG AACCGATGCT GAAAGATGTG AGCTTTACTA 4020 TTGAACCTGG TCAAATGGTT GGTGTAGTrG GAGCGACTrGG TGCAGGAAAG TCAACCTTGG 4080 CTCAATTGAT TCCACGTCTC TTTGATCCAC AGGACGGGGC r-ATI'AAAATC GGTGGCAAGG 4140 ATATTCGAGA AGTGAGTGAA GGAACCCTGC GTAAAACAGT TTCCATCGTT CTCCAACGTG 4200 CCArrC~rTTr TAGTGGAACG ATTGCAGATA ACTTGAGACA GGGGAAGGGG AATGCTACTC 4260 TATTTGAAAT GGAGCGCGCA GCCAATAT'rG CCCAGGCrAG TGAATTCATT CATCGTATGG 4320 AGAAAACCTT TGAAAGTCCA GT'rGAAGAAC GGGGAACCAA T7rCTCTGGT GGACAAAAAC 4380 AAAGGATGTC GATTGCGCGT GGG.A IGTCA GCAATCCACG TATTCTGATTr TTTGATGATT 4440 CGACCTCAGC CTTGGATGCC AAATCAGAGC GCTTGGTGCA AGAAGCTG AATAAGGACT 4500 TGAAGGGGAC GACAACCAT'r ATTATTGCTC AAAAAATTAG CTCGGTTGTC CATGCAGACA 4560 AGATCTTGGT TCTAAATCAA GGACGAT'rGA TTGGTCAAGG TACGCATGCA GACTTGGTTG 4620 CCAACAATGC CGTTTACCGT GAAATCTATG AAACACAGAA ATGAALAGACA AACTATAAGA 4680 .AAAGTCAATA GTTTTA'rCTA AACTATTTCT TATT'rCAATT TGATGATTTG GCGATGATTT 4740.
TAGAGCACGG CAAAAAGCCC TTGAAAAAGT CCATTTTTTC AAAGGTAA'rC CTGTGTq'AAT 4800 TTCAGAAATT ACATCACTTT TTGTTCGTCA AATGGCAGCT CTTTTTTTAG GATATAAAAC 4860 AGGGTTCGGA TAAGTTTTTT TGCAAGGTGG ATGATGGCTA CATTGTAATG TTTTCCTTGT 4920 TCTAATT'rAG TCTTAAGATA GGCCTTAAAA GCAGGCGAAA AGCGAGGGCA TGCTTTGGCA 4980 GCTTGTATGA GTACCTACCG CAGATGAG.GG GAACTCCGTT TGACCATTCT TCCTGCTAAA 5040 TCAATCTGAT CTGACTGATA AATAGAAGAA TCCAGTCtAG CGAAAGCTTG TAATTGAGCA 5100 GGATTATCAp. AGGCATGAAT ATTTCGAATC TCAGCTAAAA TGACCGCCCC TAAACGATCC 5160 CCAATCCCAG TAACCGTCGT GATGACCGAG TTGAACTCAG CCATCAAGTC ATTGACACAT 5220 *GTTTCCGCCT 'rGTCAATGAG CCTCTTGTAA TGTTTGATGT TTTCATTACA CGAGATAAAA 5280 CGTCTATGCG TTATCAAACT CATTACCAAT TAAACAAAA AGCTGTGGTT AGATCCTTTC 5340 GGAAATTGTC AAGCGATTGG AGGAAATGAA CTAATCCACA GCGGCTTATT CCAAGTATAC 5400 CACTTGG.GCT TTGGCAGTAG CTAACTGCGC TAAATATAAT ATAAGGAGGA GTAAAATGA-A 5460 *GACAGTTCAA TTTTTTTGGC ATTATTTTAA GGTCTACAAG rTCTCATTG TAGTTGTCAT 5520 CCTGATGATT GTTCTGGCGA CTTTTGCCCA AGCCCTCTTT CCAGTCTTTT CTGGACAAGC 5580 GGTGACGCAG CTAGCCAATT TAGTTCAAGC TTATCAAAAT GGCAATCCAG AACTTGTATG 5640 536 GCAAAGCCTA TCAGGAATCA TGGTCAATCT TGGCCTGCTG GTTTTGGTTC TATrTATCTC TAGTGTAATA TACATG'rGTC TCATGACGCG CGTGATTGC-A GAATCGACCA ACGAGATGCG CAAACGCCTC TTGGTAAGC TGGCGATATC CTGTCTCATT
AAGCTTGATT
rCGAGAAAT
GCTGATTTTC
GAAGCTCAAC
AATTCAAGAG
CTTTAAAGGA
CAGGTCATGA
GTGACGCTGG
ATCGTGAAAA
GCCTATATGG
GATATGATGG
AGAATGTTCT
TTCT'CAGTT
'rTACCAGTGA
GCAATATTGT
C'rCTCATCAC
TGGCACGCAA
ATGAGAGCAT
CAGGAT'rTCT
CAGGAATTCT
GACGGTI'TCT TTCTITrGACC TTTGGATAAT ATCCTCCAAG TT'rATACATTr WTCTGATTC CATTGCCAGC ACCCCATTGG ATACACCAAC CTCCAGCAGA CTCAGGCCAA AAAGCCGTGA TGAACAAAAT GAGCGCGTC TTTCCCTGTC ATGAATGGGA
GTCGACAAGA
CCTTTAACGA
TTGTCATGI-r CTrCCTTAT
AAGAGGTAGG
TTGTGCAAGG
GCAAGGCAAC
TGAGCCTGAT
TAATACAGCC ATCGTCATCT T'TGCTGGTTC GGCTGTACTT TTGAATGATA AGTCTATTGA AACAAGTACA GCCCTAGGTT.. TGATTG~rAT GTTTGCACAA TTTTCACAGC AGTACTACCA GCCTATTATC CAAGTTGCAG CGAGTTGGGG A.AGCCTTCAG ACGAATTCAG GAAATGTTTG ATGCAGAGGA GGAAATCCGA CACTAAGTTG CAAGAAAGTG TTGAAATCAG TCATATCGTT ACCTATTTTG AAAGATGTCA GCATTTCTGC CCCTAAAGGC GCCGPZAGGT TCAGGAAAAA CGACTATTAT GAACCTCATC TGCTGGTGGT ATTT'ATTTTG ATGGTAAAGA CATTCGTGGC TTGGCCTT'TA CTGGAGCTGA CCTGAAAAGG CTCCAACC'TT TTTTCATACT TGCCTGATAA CAGATGACAG CAGTT-GTTGG AATCGCTTTT ATGATGTTGA TATGACTTAG ATAGTCTTAG AGCGGAACGA TTAGAGACAA GAGGTAGCAG CAAAAGCAAC GATAC'rCTTA TTGATGATGA ATCGCTCGAA CCCTGATGAC 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380
AAGCAAGGTG
TATCCGATTT
CCACAT'rCAC
CCAGAGCATC
AGATCCAGAA
CAAGATTtAG
CCGCTTGAAA
GGAATTGTAT TGCAAGATTC GGTGTGCCAG ATGCTAGTCA GACTATATCG AAAGTTTGCC TTTTCAACAG GGCAGAAGCA GTTCTCATTC TCGATGAAGC CATGCCATGG AGGTGGTTGT ACCATTCTCA ATGCAGATCA
GGTCTTGTTT
GGAAATGGTT
TGATAAGTAC
ATTGATTTCA
PLACTT1CAAAC GTAGATACG AGCAGGTAGA ACTAGTTTCG GATTATTGTC CTTAAAGATG GCTAGGTGGC TTTTATTCAG TTCTCCTATG TGGGCAGCTT .TGAACGTGGT AACCACCATG AACTTTTGAA CAATCAATT GTTTTCGA.AT AAGAAAGAAG
TGACAGAAAG
TCATTGCCCA
GAGAAGTCAT
AACTCTATCA
T'TTTGTrC
TTTTGAGTGA
CGGT'rGTAGG ATAAAAAATG T'rrATCACAG CCTTAAAAAA AACATAT'rAG ACGAAAGTCA TATGATAGGA CTATCGTTAG CATTCGAAAG GAGAGGCATC ATGGCTAGAA AGTTGCTGCA AATCTATGTC CCGTAGACGC AGAAGGCAAA. ATCATTCATT CATCTGTATC 537 TrGTAGA'N'C GCAGAGATCA TTCGTCAAGT CGGTGGTCTC CL;1-TAGTCA TTCCTGTrGG TGATGAGTCA GTTGTACGTG ATTATGTGGA AATGATTGAC AAACTCA'N-r TGACAGGAGG CCAAAATGTr CATCCTCAGT TTTATGGAGA GAAAAAGACC GTCGAGAGCG ATGATrACAA TCTGGTCCGT GACGAAT'rTG AATTGGCACT CTTGAAGGAA GCGCTTCGTC AGAATAAACC AATTATGGCA ATCTGTCGCG GTGTCCAACT TGTCAATGT'r GCCTrTGGTG GAACCCTCAA TCAAGAAATC GAAGGTCAGG INFORMATION FOR SEQ ID NO: 64: SEQUENCE CHARACTERISTICS: LENGTH: 2723 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 7500 7560 7620 7680 7740 7760 GAGGTTTTAA TTCACTrACC TATTTCTTGC AAAATC'TTTT TAATATCATT AAATGATGTA CACCATCGAA TCCATTATTT CATATTCGAT ACTAGTATTT ATAAAAACrG ATAGTGGACT GATAAACATr TCCATCCATA TCATTCCCCA GTCCATGATG TGTTATCAAT GTGTAAATCT TCAATTCCCA GTTAAAACCA TCTsCCGTAT CTTTATTTAA ACAACAATCT TAATGTTTAG TATTCTTTTC CATTTATATA CTN'TATCAT TGATGTTAA.A CCCTTAGGAT CAATGTTTA.C CCAACTGCTT TAGCATTCAA TCTGTTA.CCT TATCTGGAAA TCACCGTCTT TAACATTCAG AATGAATTCT TTTACGGTTG TGTCTTGTCr ATTATTTGTT AATATGTTGT TCTTGAATCT GACTACAGAT TTTCCATCAG TTCGGGTTTA ACATTATCAT ATCGCTA'rAG CCAGTTTGAA TCCGTrTGCT TTATAGTCTT CTrAATATTA AAATCTCTAG TACAACCGAT TCATTAACTC CCGTAGATTA
CCCTTATCAG
AATAATTATC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 TTCGTACATA AATTTTATTA GATTTAGATG TTACTGGTTT GGTATCAATA AATAACATGG ACTTATCCTC TCCAGTGTAT TCTTTATCAT CTGTCACAAG ATTTCCGTCT TTATCGATAG ACACTTGATA CCTTATAATG TTAAAGCCGT TTCTTCCATC TTCAACATTT CTACTATCAG AAATCTTACC TCTTAAATAA GATTAAAGTA GTTCTTATCC AGCCATCTTC TTTTATAGCT CCTTACCAAA TAATACAAGT TTAGAAGAAT CTTCCCCT'r-T ATCGTTCATT TTAAATGTAA CCAAAGCCGA CATTAATACA GATTGGGTAC CATAAATTGT TGTTTCTGAA AGGGCTC'N'A
AATTCTGGAT
ATTGAAAGGT
TCTACATTGA
GATTAGGATT GGCCTTTTGT ATTTTTGCTA TATCI'CCTT GCTATAGACT CCATTTCCTT CTAACATATC CGTTN'CCA GAATGATATT ATCCTTTAAC ATTCCATTN AGTTAAATTG TGACATTCCC ATAACTTGAT GCATTCTXAAT TTTAAT'rTCA CTCCTGCAAA AGCTAATCCA TCGTTTTTGA GTGGAAATCA CATAATATAC C1'TGGCATTT CATCTTTAGA CGGACTTAGA TTTC-ATATTC TAGATCAGTC TATAATCGTA TTCCTCCATT ATT'rCTTTTT AATGAGTTTC TTCTATCAA'r AGTAAAACTA GTTCTACCCC AGTTGTATCT ATTTAGATITC TGCAATCTCG CTAGATAATA GGAAATCATC 'rCTTTGTGAA A'N'GCTAGAA
GGATTATAGG
AGATATTGTT
TCTGGTTTGT
ACATAGGGAT
GCTTTTTTCT
TCCACAATAT
TACTTGGTCG
TTAGAAACA'r
ACACCATAAA
CCATCATCGT
CTCTTACCAG
TTTAAGTCTT
GATTTTTCTT
TTTTTTAGAC
CCAAGCTTTrTT CCCT'rTTCA'r
CCATCTAATG
538
TAGTCACTTT
GTTTT'rCTGA
TTTTGAAAG
A'IrCTGATTT
GATCATTATC
CATTAA'rATT
CTTTCATTTC
TACTTATCTT
TTTTGA1-rT
AGGCTATTAT
T^TTCACTTGT
TAT'rTTCAAA
TAATAGACTC
TACCCTCTr'r
GATATTTAGA
TACTGCATAG
A'rCAGAATAG
ATCTCCTTCC
AGTITrCCTTA
T'TAACAAAT
AGCGTATAGA
TATAGATTTA
TCCAAGAA'N'
CC~TC',"A
ATT'rTACCAG CCTAArrA
ATTTTTTCAG
AATCTCATAT
TCAAATGTCA
TAGTTATTCC
TCAAAGTGTC
GATTTCG'rCA AGT'rTCTCAG ATTTCCTr'rA TCATCGTATT AAAATCATCA ACTTCTCTAA GTCTCTA.ATT GTTGAAATA'r TTCATTTTCT TGATGATGAT TCCATTTCCT AAATTTTTAA TGA-ATCTTGA TCAGGATCTA 0 0 0 0~*0 00 *t~ 0 0 00 0 0
et 0 0 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2723 CAGCCTGATT AGCAAATTTA AT'rCTATCAA CAATCACTTC AATGATTTTT CCCCTTAAAT CTCCCGCACC TTlTAAT'rTCA TAAATGGTAT TTCCGTCTTT ATCAAGTTTT CTATTTCTTC CTTGACCCTC ACCTGCGTAA GT1TACTTCAA GATT'TTTTTC AACCTCTCCA TCTTCATTAA CAAGAGCGGC GCCAGCATAC CAAACTTCGT TCGCAATCTC GTCAAAT'rTT TCAGGATGT CTTTTTG'ATC .TCTCGCAAAT AGCGTTTCAT 'rCTATACTG ATCTTTTACC TTATGATAAG 0* S.
TATCCTTTGT
GGGCGGTGTT
AATCAACTTA ATT'TTTTCAG GATTTGAAAA ATCAATTTTT ACAGGAATAT AGGAAACCTG ATCTATATTT AAATTTATAG TCTTAGTAGC AGGATCTTGA GAT'rATAGAT GAGTCCATCC TTCCCTTTGC ATCAT'rCCTT TGACTTTTAA GTCTCTTTGA
AAATATTGAC
TTATCCTTAC
CACTTCAAGT
TTAGAATTA
TTTTC'rCCCT
CTTCCCCAAT
TTTCTGGTGC
CACCCCAAAC
ATCAACCGAA
CCATGGGTAA
CGGTTCAAAT
ATTT'rCT'rC'
TTTTAGTI'TA
ACAATCTTAG
TCTTTAGTTA
TGACCTCTTA
CTACCTCTAG
GATGATTTGA
AAATTCCTCT AATAAAGTGT TCTCTCGAAA CTT'rATTTGT ATTTACTATT GAAATCAATC CTTCTTCTGC ACTTCTTAAT ACA INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 11831 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: AAAAAAGTGG GAATGACTCA ATTGAAGCAA CTCCAAACGT GCTATCCAAG TTGGTTTCGA CATGTAGCGA AAGCTAACAC GGCTTGGAAG TTGGTGCTGA GTAACGGGTA CTTCTAAAGG CG'rGGACCAA TGGCTCACGG GCACCTAACC GCGTATTCAA ACAATTCAAA ACCTTGAAGT GGTAACGTAC CAGGTGCTAA AAATAATAAA GAAAGGGGAA GGTAAAGAAG CTGGCCAAGT TCAGTTGTGT TTGATGTAAT GTTAAAAACC GCTCTGCAGT GGACGTGCTC GTCAAGGTTC GGACCAACTC CACGTTCATA AAATCAGTTT ACTCTGAAAA TTTACAGCTC CAAAAACTGC AAAGTTCTTG TTATCCTTGA CCAAACGTGA AAGTTGCAAC AAACTTCTTG TCACACAAGC TGTATGATGT TATCAAAAAA GAAAATATGT ATTTGAAGTT AAGCTGCTTT CGAAGGTGTT AATCTTCACT GAAGCTGGCG AAT'rGATCCC TGTAACAGTr TGTTCTTCAA GTTAAAACTG TTGAAACAGA CGGATACAAC TGACAAACGC GAAGTATTGA GCAACAAACC TGCTAAAGGA GGCTCCTAAG CGCTTCATTC GTGAATTCAA AAACGTTGAA AATTACAGTT GAAACATrCG CAGCTGGAGA CGTTGTTGAC TAAAGGTTT~C CAAGGTGTTA TCAAACGCCA CGGACAATCA TTCTCGTTAC CACCGTCGTC CAGGTTCTAT GGGGCCTGTT AGGTAAAAAC CTTGCAGGAC GTATGGGTGG CGACCGCGTA TGTACAAGTT GTTCCAGAAA AGAACGTTAT CCTTATCAAA GAAATCTCTT ATCACTATCA AATCAGCAGT TAAAGCTGGT ATCAGTCACA ATGGCAAACG TAACATTATT TGACCAAACT TGTTCTTAGC GATGCAGTAT TTGGTATCGA ACCAAATGAA CATCAGCCAA CGCGCAAGCC TTCGTCAAGG AACACACGCT ATCAGGTCGT GGACGCAAAC CATGGCGTCA AAAAGGAACT TATCCGCTCA CCACAATbGC GTGGTGGTGG TGTTGTCTTC CGGCTACAAA CTTCCACAA.A AAGTTCGTCG CCTAGCTCTT AGTTGCTGAA AACAAATTCG TAGCrGTAGA CGCTCTTTCA TGAATTTGCA AAAGTTCTTG CAGCATTGAG CATCGATTCT AGAAGGAAAT GAATTCGCAG CTCTTrCAGC TCGTAACCTT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 TGCTACAACT GCAAGTGTTC AGCTATCTCT AAAATCGAGG CCTGTCATCA CTGAAAGCTC GACACTCGTG CACACAAACT AAAGTTGCCA ATG'ITAACAC
TTGACATCGC
AGGTTCTTGC
AATGGCTCAA
TTTGATCAAG
AATCAACGTA
AAATAGCGAC
ATAATGAATT
CTTrGAAGCAG CAAGCTGTTrG
AAACCAAAAG
540 CTAAACGTGT TGGACGTTAC ACTGGTTTTA CTAACAAAAC TTACAGCTGA TTCTAAAGCA ATCGAGTTGT TTGCTGCTGA AAATATCGTG GGAATTCGTG TTTATAAACC AACAACAAAC TTTGGATTrTC GCTGAAATCA CAACAAGCAC TCCTGAAAAA GAGCAAGGCT GGTCGTAACA ACAACGGTCG TATCACAGTT CAAACGTTTC TACCGTTGG TTGACI'CAA ACGTAATAAA TAAAACAATC GAGTACGATC CAAACCGTTC TGCAAACATC CGGTGTGAAA GCATACATCA TCGCTCCAAA AGGTCTTGAA AGGTCCAGAA GCAGATATCA AAGTCGGAAA CGCTCT'rCCA TACTT'rGATT CACAACATCG AG'rTGAAACC AGGTCG'rGGT
TAAAAAAGCT
AGCTGAATAA
GGTCGCCGTA
TCATTGCTTG
CGTCACCAAG
ATCATCACAC
TCTAAGGAGG
ATATGACTrc 'rTGCATTGAA
GTGGTGGACA
GACAAcGTTG AAGCAGTTGT GCTCTTGTAC ACTACACTGA GTAGGTCAAC GTATCGTTTC CTTGCTAACA TCCCAGTTGG GGTG.AATTGG TACGTGCTGC TGGTGCATCT GCTCAAGTAT TGGGTTCTGA AGGTAAATAT GTTCTTGTTC GTCTTCAATC AGGTGAAGTT CGTATGATTC ACAACATGGA CTTGTAAACC AACAGTTCGT GGT'rCTGTAA AGCACCAGTT GGTCGTAAAG AACTCGTAAC AAGAAAGCGA ATATTAAACT AGTCGCT'rAA GTGCAAGCCG CTGTGGTACA AAAAAGGACC TTTCGTCGAT AAAAGAAAAA AGT'rATTDAAA GTTACACTAT TGCAGTTTAT 'rGGTAGGCCA CAAACTTGGT TTGGAACTTG CCGTGCTACA GTTGGTGTTG TCGGAAACGA TTGGTAAAGC AGGACGTAGC CGTTGGAAAG GTATCCCCCC TGAACCCTAA CGATCACCCA CACGGTCGTG GTGAAGGTAA CACCATCTAC TCCATGGGGC AAACCTGCTC TTGGTCTTAA 1500 1560 -1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 AATCTGACAA ACTTATCGTT GCAACTAGTA AATCCCCCAG ACATTTAAAG GAGAAAATAT GAGCA-tTTGA TGAAAAAAcT ACTTGGTCAC GTCGTTCAAC GACGGACGTA AACACGTACC GAATTTGCAC CAACTCGTAC CGTCGTCGCA ACGAGAAA'rA CTCGGTAGCG CTCCATAGGA AAAAATGGGA CGCAGTCTTA TGAAGCTCAA GCTAACGACG GATCTTCCCA AGTTTCATTG TGTTTACATC CAAGAAGACA TTACAAAGGT CACGCTGCAG GCAGAAATTA CTTCAGCTAA CGTCTTGTr'C TTGATAACAT TTCACTCCAA ACAAAGCTGC ACGACAAGAA AACACGTAGA AAATAAGGAG AGCAATGGCT CGTACAGTAC GTGTTTCACC CCGTGGTAAA AGCGTAGCCG ATGCAATCGC TGAAATCATC TTGAAAGTTT TGAACTCAGC
AACATAAATG
TCGTAAATCA
AATCTTGACA
TGTAGCTAAC
GGATAAAGCT AACT'rGGTAG TATCTGAAGC ATTCGCAAAC TTTCCGTCCA CGTGCGAAAG GTTCAGCTTC ACCAATCAAC TGTAGCTGTT1 GCAGAAAAAT AAGGAGGTAA kATCGTGGGT TATGCGTGTC GGCATCATCC G'rGATTGGGA TGCCAAATGG
GCTGAAAACA
GAAGGACCAA
AAACGTACAG
ACTTrTGGTTT
CTATGAAACG
CTCACATCAC
CAAAAAGTAC ATCCAATTGG TATGCTGAAA AAGAATACGC 3240 541 GGATTACCTT CATGAAGA'rC TTGCAATCCG TAAA'rTCGTT CAAAAAGAAC TTGCTGACGC AGCAGTTTCA ACTATTGAAA TCGAACGCGC AGTAAACAAA GTTAACGTTT CACTTCACAC TGCTAAACCA GGTA7TGTA TCGGTAAAGG TGGTGCTAAC GTTGATGCaC TCCGTG.CAAA ACr'rAACAAA TT'GACTGGAA AACAAGrACA CATCAACATC ATCGAAATCA AACAACCTGA 'TTTGGATGCT CACCTTGTAG GTGAAGGAAT TGCTCGTCAA TTGGAGCAAC GTGTTGCT= CCGTCGTCCA CAAAAACAAG CAATCCAACG TGCAATGCGT GCTGGAGCTA AAGGAATCAA AACTCAAGTA TCAGGTCGI' TGAACGGTGC AGATATCGCC CGTGCTGAAG GATACTCTGA AGGAACTGTT CCGCTTCACA CACTTCGTGC AGATATCGAT TACGCTTGGG AAGAAGCAGA 'rACTACATAC GGTAAACTTG TCGTAAAAAC ACTAAAGGAG TCGTGAGTTC CGTGGAAAAA TGAATACGGT CTTCAAGCTA TCGTATCGCC ATGACTCGTT ACACAAATCA TACACTGCTA TGAAGGTTGG GTAGCACCAG TGAAGAGATT GCACGTGAAG ATTCGTAAAA CGTGAAGCAG TGTTAAAGAA CTTCGTGGTC AAAAGAATTG TTTGAACTTC CTTGAAAGAA GTTAAAAAAC ATAGACTAGG GAAGGAGAAA GTGTTAAAGT ATGGATCTAC CGTGGTGAAG TTCTTCCAGC GTAAATAACC AATGTTAGTA CCTAAACGTG TTAAACACCG TGCGCGGTGA AGCAAAAGGT GGAAAAGAAG TACCATTCGG CAACTAGCCA CTGGATCACT AACCGCCAAA TCGAAGCTGC ACATGAAACG TGGTGGTAAA GTTTGGATTA AAGCTATCC TGTGCGTATG GGATCTGGTA TTAAACGTGG TAAAGTGATG TT1CGAAATCG CGCTTCGACT TGCTAGCCAC AAATTGCCAG A.ATAAGGAGA AGGCATGAAA CTTAATGAAG TTTCTCAAGA AGAACTCGCG ALAGCGCGAAA GTTTCCAAGC TGCTACTGGT CAATTGGAAC AAATCGCTC-G CATCAAAA)ICA GTTCAATCTG TTTCAATGGA ACGCAATAAT CGTAAAGTTC
AAATCTTCCC
AAGGGGCACC
CTGGTGTATC
TTAAATGTAA
TAAAAGAATT
ACGAATTGAA
AAACAGCTCG
AAGCGAAATA
TTGTTGGACG
AACGTAACCA
ATGAAAACAA
CAGCTACAAA
3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 .43 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980
TGTTGTATCT
CCCAGTCTAT
TGTTGCCAAA
ACGTTTCCGT
AGAAAACTGA
GACAAAATGG ACAAGACAAT CACAGTTGTA GGTAAACGTA T'rAAC'rACTC TAAAAAATAc GAAGGCGATA TCGTACGTAT CATGGAAACT CTTGTAGAAG TTGTTGAAGA AGCGGTCATC AATGATTCAA ACAGAAACTC GTTTGAAAGT
GTTGAAACAA
AAAGC'rCATG
CGCCCGCTTT
GCGAAATCTT GACTATCAAA GTTCTTGGTG GTTCAGGACG ATGTTATCGT GGCATCTGTA AAACAAGCTA CTCCTGGTGG TTGTTAAAGC AGTTrATCGTT CGTACTAAAT CAGGTGCTCG ATCTAATCAA ACCTGAAAGG CGCAGACAAC AGCGGTGCTC TAAATTTGCA AACATCGGTG TGCGGN'AAA AAAGGTGACG TCGTGCTGAT GGTTCATACA 542 TCAAATTTGA CGAAAACGCA GCAGTTATCA TCCGTGAAGA GTATCTTTGC CCCAGTTGCA CGTGAAT'rGC GTGAAGGTG TTGCTCCAGA AGTACI'TAA ?r'rTTAGGAA CAAACTAGTC GCCCTTATGG GCGTAAGAAA AATCAACGAG AAACCTAATG AGTTCGCGTA ATCGCTGGTA AAGATAAGGG AACAGAAGCT CAAAACTCCT CGCGGAACAC CTTCATGAAG ATCGTG'rCAC CCCTAGCT'rC AAGCTAGGGT 7=TTAAAAA AAGGCGAcAA GI'rT'CCTTA CTGCCCT'rCC AAAAGPAAAC AAAGTTATCG TTGAAGGTGT TAACATTGTT AAGAAACACC TAACGAGCTT CCTCAAGGTG GTATCATCGA GAAAGAAGCA GCTATCCACG
AACGTCCAAC
TATCAAACGT
TAGACGGTAA
TCAAGTTTTG GACAAAAATG GTGTAGCTGG AAAAGTTCGC TACAACAAAA AATCAGGCGA AGTATAATGG CAAATCGT'rT AAALAGAAAAA GAACAATTCA ACTACTCATC AGTGATGGCT ATGGGTGTTG GTGAAGCTGT ATCAAACGCT GCACT'rA'CT CAGGTCAAAA ACCACTTATC CGTCTTCGTG AAGGTGTTGC GATCGGTGCA GAATT'CTTrr ATAAATGTACATTC TCGTGTTGGA TACAAATTTG AGTGCTTGAT TAATCACGAA GGAAAGGAGA TATCTTAATG AAGTAGTrCC TGCTTTGACA GTGCCTAAAG TAGATAAGAT TGTTTTGAAC AAAAGCCTTG AAAAAGCTGC TGAAGAAT'rG ACTAAAGCTA AAAAATCAAT CGCCGGCTTC
CCAACAAAAT
TTCCCAGAAA
ACAACTGCTA
CATTTGATGG ACGCGGGAAC TCAACTTCGA TGACGTTGAC ACACTGACGA AGAGTCACGT
AAAGTTACCC
CTTCCACGTG
TACACACTTG
AAAACTCGTG
GCA'rTGCT'rA TTCGTGGTGA ACGTATGTAC TACGTGACTT CCACGGTGTC GTGTGAAAGA ACAAT'rAATC
GTCTTGACAT'CGTTATCGTA
CAGGCCTTGG AATGCCTTT AATGGTAGCT AGAGAGGCTA TGCATTAAAG GCGGCAGGGG GACTCGTTTA CATAATCGTT 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 GCAAAATAAT ATAGGAGOTA AATCTAATGG CTAAAAAATC 0 AACGCCAAAA AATTGTTGAC ACTACGAAGG TTTATCTAAA GTAGGGTTAC GGGGCGCCCA TTCGCGAACT TGCGCATA.AA AGATATCAAG AGCGTCAAAA TTCTAGGAAA GTTTATCTTT TTTGAACACG AGCTACAGCT CATTAAATTG TCTATTTTTG CGTTATGCTG AAAAACGTGC TTACCTCGCA ACGCCTCACC CATTCAGTTT ACCGCAAATT GGTCAAATTC CTGGTGTAAC rCrCCAAGTAA AAATAGGA.AA TTCACACAGA GTTTAGCCCG TTGGCAAAAA AGACCAAT' CTCGTGCTGT TACGCTCTTT
TGGTCTGAGT
AAAAGCATCT
CTTGACGAAG
GGTTCAATTG
GCTTTGGAGC
GTATCATGTA
CGTATCGCTT
TGGTAATTTA
AAACTAAAGT
GGCTTGCCAA
ATTGCTTCTG
TTAACTAGCA
CTGACCCAAT
TACTTGAAGT
AGTGCAACTT GCAAACTACT AGTAAGAGGA CGCAGACTTC CTAACTCGTA TTCGTAATGC GAAAAACAAA ATGGTTATGA TAACCAAGCT AAACACGAAG ACCTGCATCA AACATCAAAA AAGGGATTGC TGAAATCCTT AAACGCGAAG GTTTTGTAAA AAACGTTGAA ATCATrrGAAG ATGACAAACA AGGCGTCATC CGTGTATTTC TTAAATACCG ACCAAATGGT GAGAAAGTTA TCACTAACTT GAAACGTGTT TCTAAACCAG GACTTCGTGT CTACAAAAAA CGTGAAGACC TTCCAAAAGT 'rCTT'AACGGA CTTGGAATTG CCATCCTC AAC'N'CTGAA GGTTTGCZTTA CTGATAAAGA AGCACGCCAA AAGAATGT'rG GTGGTGAGGT TATCGCTTAC Gr'rrGGTAAA ATCAAGATAC AAAC'rCGTA AAGAACAAAG CAAAATTAGG AAGTTGGAGA AGTTTGmTA CAAACAAGCC AACTTATCTA TTTTGCACAG TTCTTAGAGC GTGITrCAGTT CAGCTCTTGA ACTAAATAAG TATCTGAACC CCGTGAAAAC TGGCCGTTCT GGCCTGACAA TTTAACAGGA GAAAATAAAC ATGTCACGTA TTGGTAATAA AGTTATCGTG *4 0 0 TTCCCTGCTG GTGTTGAACT CGCTAACAAT GACAACGTTG GGAGAACTTA CTCGTrGAGN' CTCAAAAGAT ATTGAAATCC ACTCTq'CACC GTCCAAACGA TTCAAAAGAA ATGAAAACTA CTTTT'GAACA ACATGGTTGT TGGTGTATCA GAAGGArrCA GGGGTTGGTT ACCGTGCACA GCTTCAAGGA TCTAAACTTG CATCCAGACG AAGTTGAAGC TCCAGAAGGA ATTACT'rTTG ATCGTTGTTA GCGGAATTTC AAAAGAAGTA GTTGGTCAAA CTTCGTTCAC CAGAACCATA TAAAGGTAAA GGTATCCGTT 'rAACTGTAAA AGGATCTAAA GTGTGGAAGG TACTGAAATA TCCACGGAAC TACTCGTGCC AGAAAGAACT TGAAATGCGT TT'rTGGCTGT TGGTWATCT AACTT-CCAAA CCCAACAACA CAGCTGCTTA CGTACGTAGC ACGTTGGTGA ATTCGT'rCGC.
6840 6900 6960 7020 7080 7140 7200 7260' 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 G*e 0 0 0660 CGTAAAGAAG GTAAAACAGG TAAATAATGT TGAGTGGT'rG ATCATCAACC ACCAACCTA'r T'rTCCAACTT TGTGCATAGC ACACGATTTA AAACTAAAGA GGTGAAAACT GTGATTTCAA AACCAGATAA AAACAAACTC CGCCAAAAAC GCCACCGTCG CGTTCGCGGA AAACTCTCTG GAACTGCTGA TCGCCCACGT TTGAACGTAT TCCGTTCTAA TACAGGCATC TACGCTCAAG TGATTGATGA CGTAGCGGGT GTAACGCTCG CAAGTGCTTC AACTCTTGAT AAAGAAGTTT CAAAAGGAAC TAAAACTGAA CAAGCCGTTG CTGTCGGTAA ACTCGTTGCA GAACGTGCA ACGCTAAAGG TATTTCAGAA GTGGTGTTCG ACCGCGGTGG ATATCTATAT CACGGACGTG TGAAAGCTTT GGCTGA'rGCA GC'rCGTGAAA ACGCATTGAA ATTCTAATAG GAGGACACTA GAAAATGGCA TTTAAAGACA ATGCAGTTGA ATTAGAAGAA CGCGTAGTTG CTGTCAACCG TGTTACAAAA GTTGTTAAAG GTGGACGTCG TCTTCGTTTC GCAGCTCTTG TTGTTGTTGG TGACCACAAT GGTCGCGTAG GATTTGGTAC TGGTAAAGCT CAAGAAGTTC CAGAAGCAAT CCGTAAAGCA GTAGATGATG CTAAGAAAAA CT'rGA'rCGAA GTTCCTATGG T'rGGAACAAC AATCCCACAC CAACTTCT'rT CAGAATTCGG TGGAGCTAAA GTATTGTTGA AACCTGCTGT
AGAAGGTTCT
GGCAGATATT
TGTTGAAGGT
GGAGTTGCCG
ACATCTAAAT
TTGAAACAAT
544 CTGGTGGTGC AGTTCGTGCC GTTGTGGAAT CACTTGG?1'C TAACACTCCA ATCAACATTG TGAAACGCGC TGAAGAAATr GCTGCCCTTC AGTTTCTGAT TTGGCATAAG AAAGGGGATA AGTCTCCAAT CGGACGCATT CCATCACAAC AATTGAACAG CTCTGTTATT' AAAGAAGATA TATCTCACTr AG-TAACAGTT GAAGAAGTAA ACCATCCCCT AAAACTAGAT ATAGTCATCT GGAGACAACC TTTTCTCCCT TATCGGCGCT AAATGGCTCA AATTAAAATT GTAAAACTGT TGTAGCACTT ACGCTGCTATI CCGTGGTATG ACTAATGAaG TNTTAGGGGA ATGATGACAT CGTATAGGCG
TGGCAGGTGT
TTCGTGCAAC
GTGGTATr'rc ACTTGAcTA
GGACTTGGCA
ATCACAGcAG
TGTGCACTGT
AGTTGATGGG
8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 AGCA'rTTTAC AAAAGAGGAG AAAATAAAAA TGAAACTTCA TGAA'rTGAAA CCTGCAGAAG GT'rCTCGTAA AGTACGTAAC CGCGTTGGTC GTGGTACTTC ATCAGGTAAC GGTAA.AACAT CTGGTCGTGG GTAGCGGTGG CGGAGTTCGC TTCCAAAACG TGGATTCACT AATTGAACGT CT'rTGAAGAT TTGTTAAAGC TGAAAAG'rCA TGACTGTGAA AGCAGCTAAA GTTCAGTAGA AGTrCATCTAA 'rTAAAGTCA.A GCAGGTTCGA GTATCGGAAC TAGCATTACA CTTGGTTTTG AAGGTGGACA AACATCAACG CTAAAGAATA GGTGCTGAAG TIAACTCCAGT GGTATTAAAA TTCTTGGTAA TTCTCTAAAT CAGCTGAAGA TCAAAAAGGT CAAAAAGCTC AACTCCAT'rG TTCCGTCGTC CCCAATTGTG AACCTTGACC TGTTC'rTATC GAAGCAGGAA CGGTGAGTTG ACTAAGAAAT AGCTATCACTr GCTAAAGGTG TAAATTATTA iGAGAAGC'rC TTTTATCGTT TTGGTCTTTC TAGCTTGAAr GCTTAAGTG TGCCCTAAAA AACTTr'TCGA TGTTGTCCA-A CTCTTGCAAA GGAAGTAGGT CGAAGAAAAT
GATTATCCTT
TTTTTGCCCT
TGGATATTTT
TGAATCAAGC
CAGCTGGTT
CTTAAACATG
AGGAGTTAGT
ACCCAAGTTT
TACTCGTTAT
TAATACCTTG
GAGAGGTGAC
TCAAAAATTT
GTTCCTGGTG
TTGAGCTTGG
CCCTATATCA
GTAGAGTGGG
ATTGCTCTAG
GCTGGAGCTC
CT1ATGTTTTT
TATTTACAAT
TGAATGCCAA
TGTCGGGGAA
CCGCTTCTAT
GTAAACAAGG
9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 TTTCGAC GArTGGTATC ATCTTAACAG A.GCAAATTAC AGATAAGGGA TACGGAAACG TTCCTCAAT TCCAGAGATG XITTCAGGGCA GTAGCCG'rAT CACTTCATCr ATCATTTTCG TTATTTACTT TACAACTTAT GTTCAACAAG AGGTTGCACA AGGTGCTCCA TCTAGCTCTT TT~CTCGCTTT TGTGCAATCT ATCGGGATTA AATTGATTAA AACTGCTTA ACTCCACAAG CTGGTAGTAT GATTGTCACT TGGTTGGGTG GTGTTTCCir GATTATCTTT GCCGGGATTG TCTATGTGGA CTACTTTGTG AACGTCCCAA TAATCATTTT GATTATTACT GTATTGTTGA CAGAATACAA AATTCCAATC CAATATACTA ACCTTCCGTT AAAAGTAAAC CCTGCTGGAG 545 TTATCCCTGT TATCTTTGCC AGTTCGATTA CTGCAGCc'rG CGGCTATrT TCAGTTTTrTG 10380 AGTGCCACAG GTCATGA~rG GGCI'GGGTA AGGGTAGCAC AAGAGATGTT GGCAACTAC'r 10440 TCTCCAACTG GTATTGCCAT GTATGCTTTG TTGATTATTC TCTTTACAT'r C7.rCTATACG 10500 TTNGTACAGA TTAATCCTGA AAAAGCAGCA GAGAkCCTAC AAAAGAGTGG TGCCTATATC 10560 CATGGAGTTrC GTCCTGGTAA AGGTACAGAA GAATATATGT CTAAACTTCT TCGTCG'rCTT 10620 GCAACTGTTG GTTCCCTCTT CCTTCGTGTG ATI'TCCATTT TACCGATTGC AGCTAAAGATr 10680 GTATTTGGTC TTrTCTGATGT TGTTGCCTTI' GGGAACAA GTCTCTTGAT CAT-rATCTCT 10740 ACAGGTATCG AACGAATCAA GCAATTGGAA GG'rTACCTAT TGAA-ACGTAA GTATGTTGGT 10800 TTCATGGACA GAACAGAATA AAAGTATTTA CTGAATCAGT AAATACTGAG GGAGTGGAGG 10860 TTTAAACTCT GACAT'N'GTA AGAGTTGGAT CTCCCCTCTr CTATTT~TGTT TTTAAATCGG 10920 GGTGAAAAGA CTTrTTTGCTT CTAT-rTAAAA ATAAAATAAG GAGATCAAAT CATGAATCTT 10980 TTGATTATGG GCTTACCTGG TGCAGGTAAG GGAAC'rCAAG CAGCAAAAAT CGTAGAACAA 11040 TTCCATGTTG CACATATCTC AACAGGTGAT ATGTTCCGCG CTGCAATGGC AAATCAAACT 11100 .GAAATGGGTG TTCTTGCTAA GTCATATATT GACAAGGGTG AA -rGGTTCC 'rGACGAAGTT 11160 SACAAATGGAA TCGTAAAAGA ACGCCTTTCA CAAGATGATA TTAAAGAAAC AGGATTCTTA 11220 ***TTGGA.TGGTT ACCCACGTAC AATTGAACAA GCTCATGCCT TGGACAAAAC ATTGGCTGAA 11280 CTTGGCATTG AACTAGAAGG TGTTATCAAT ATTGAAGTGA ACCCTGACAG CCTTTTGGAA 11340 CGTTTGAGTG GGCG'rATCAT CCACCGCGTA ACTGGAGAAA CTTTCCACAA GGTCTTTAAC 11400 *CCACCAGTTG ACTATAAAGA AGAAGA'ITAC TACCAACGTG AAGATGATAA GCCrGAGACA 11460 GTAAAACGTC GTTTGGATGT TAATATTGCT 
CAAGGAGAAC CAATCATTGC TCACTACCGT 11520 GCCAA.AGGTT TGGTTCATGA CA'rCGAAGGT AATCAAGATA TCAATGATGT CTTCTCAGAT 11580 ATTGAAAAAG TATTGACAAA TTTGAAA'rAA AGCGTTTTTC ACACTTGCAA AAATCCGCTA 11640 *CAAATGTTAT ACTGAGATAG TCTGACTTAT AATTGTTGTC TCTGTGTCTA GACGCATCGA 11700 ATCGAAATI'T ATGGAGG'rGC TTTTGCGTGG CAAAAGACGA TGTGATTGAA G~rGAAGGCA 11760 AAGTAGTTGA TACAATGCCG AATGCAATGT TTACGGTTGA ACTTGAAAAT GGACATCAGA 11820 *.TITAGCAGG G 11831 INFORMATION FOR SEQ ID NO: 66: SEQUENCE CHARACTERISTICS: LENGTH: 10726 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: CCCGCCATTT GAAAGCTATT CGTGAAGGAT 'PTATGATGGC AATGCCTTTG ATTTTAGrCG GCTCT1TTATT TCTTATTCTA ATCAGTTGGC CTCAAGAGGc TrI'TACAAAT TGGCTGAATA GTGTTGGATT GCTAAGTATC TTGACAACTA TCAATCAGTC TGGTCGCTTG TTTCGGTATT GCCTACAGGT TGTCGGAAGG CGGCAGGGAT CATAGCCTTA TCCAGTTTTG TATTGATGGC TTTATGATAA AAATGGGGAG GCCTGAATGC ATCTTCTTTG ATCGTATG'PT TATCCAGCGC TAAGTAAATC ATT'PTCAGCT TCTTAAAAGG TCTTGAAGCG TTGTTGGAAC ACCGCTTAAG TTGTAAACTC ATTCTTTTGG TAGACCCAGT TTGG'rTACAA CACTifCAACA CATTATTACA CAGGTCAAGC AGTTATTTGG TT'rATGGCGA TTACTATTGG GGAATTACGA TAAAAATGCC CTTTTATC!TG GTTTACTAC GCAGGAGTTG CAGGAGGTCT TTAATTGCAG .GAACGCTTCC TTCTGTGGAG TTAATGGGGG 'PTTACTACAG AAAACCAAGA TTACCGTTTA AAGATTTATT GAGCGACTAT TGGTCTTGCG ATTTCTCTCT TCCTATTTAG CATTAGGTAA GCTAGCTATT ATACCGTCTA TTTTTAATAT CGTTTCCAAC AGTTTTAAAT CCGATTATGC TGATTCCGTT ATGCC'PTGAT TACCTATGTA TCAATGGCTG TAGGATTAGT TCCTTCCGTG GACAATGCCA CCGATTATAG GAGGCTTCCT GAGGAGCTCT ATTACALAGTT GTTTTGA'rTT TGGTTTCTGT TCAAAATTGC AGATAAACGC AATCTTGAAA AAGAAAAAGC ATGGTTATCA GAGTATTTGA TCAACAGAAA AATACTTATT TTAAGTTACT ATATGAATCG GGTCT'r'IAAG ACTAACATAG GCGGATATTr TTGTAGGATT AGTCAATAAA GAGGACAGAA TTAGACAAGG GTAAGGGGAG AATTGAGTCT AATACAATTG TACCGAATGT T'rCATGAATT TGGGGTTGTG TATACTAGAC GTTCCAGAGT TACGATTTGA AGATTTTTTA GATAAACAGC AACAGTAGCG ATTATCTCCT ATATGGTACA GATGGTCCGT ACCTCGTTTT TCGAGTATGG CGGCGCAATA CCATTTTCTA ATTGGTTACA GCAGAGATTT AAG'rGGTGTC CCAGATGTAG TTTTGTTTTG 
TGGGCTTTGG CAACGGACTC CTAGGTGCAA AGGTATGATT CTATGTGTTA ACAAGTTTTA AATGCTTTTG AGCTGTGGCT GCAGGACAAA TGTATTTATT bGTGGCGGTG TAAGAGTCGT GCGAATAAAA CAATACAGCT ATTCTATTTA TATTGCTACT CCTACAATCA ACCCTATACA ACAGGTGTAA TGCAACAGGG GCTAGTTGGC AGCAATTTAT TATCCATTCT TACTGTTGGA GGGAAATAAG CTAGCTTTGC CTTAGAGGAA AGCTTGTCGA GGAGAAGGAA AAGACCATGT TCTTATCTCA TAGGTTTACT TATTGGAATT CAGGGCGCAG ACATGACTTT TATCTATAGA TGAAACAGCC 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 AGTTACTATC ATAGGGGAGT ATGTAT1AGAG GGAGCGGAT'r CATrGAAAA TATACTAGAT TTCATTGATT GGCTACCTAA GATTGGGATG AACAGTTT TACTCT'rTTT TGAAACGTTG GTATGAACAT GAATTTAATC TTCAAATG AATTAGTACA AG.AATTGAGT GkTAGGTTGG GGTCTTATTC ATCATCGTGT TGGTCATGGA TGGACAGGTG AAATTTGGCT GGGAATCACG ATAAACGGCA AACGAGAATT AATCCAGATG TAGCTGATAA GATGTTAACT ACT'rACATGT AACTGTAGAC AAGAATTGGT GCTTAACGA GTGAGGGATT TGGGCACCTC AGAAAGAAAA ATTACAAGAA CATTTGAAAT CCTAAACCTT ATATGCGTAA TCTTAGTAT'r TCAGAGGAGA TCATCCAGTT TGAAAATCCT C.ATATCTAAA TAAAGAACAA ATAAAGAATT GCAAAAAAGA AAGTTTTAGG TTACTCTTCA AGAAACCCTA TGTCGCTGAA TAACCAGCCT GGATTTTTCA ATTATGCCAA GAAAAGACCT ATAATATTTG TGAATGCGAA TTCTCAATCA ATTGGATAGG GTTTAATACG GCTCCGAT T GATGGTAGAA ATTATCAAGG ATGGTTGTCG GATGC'TCGTA TTCGGATCAG TATATTCGTA .000 o.*
AGATACAAAG
ATTAGATAAT
GAGTTATGCA
TAAAATTATA
ATT'rGTTTTC TGCTTTATCA TGAGT'rGTTA CCTGAACGCT TTACCATGAT GTTTGCACCG GATGTAGATr TTGACAATTC CATACCTACG CTTCCGAATT CTCTTGAGGA AAATTTATCT TATCTTTTTG AGTGGCAAAA AGCATTTAAA GGAGATAG?= TCGTATATGA CTATCCTTTA GGGCGTGCTC ATTATGGCGA TT.TAGrGCTAT ATGAAAATTA GTCAAACTAT TTACAG.2GAT GTATCTTATC TTTCCAACCT ACATTTGAAC GGGTACATTT CGTGTCAAGA ATTACGTGCC 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 GGATTCCCTC ATAATTTTCC TAATTATGTC ATGGGGGAAA TGCTCTGGAA AGTTATGAAG AATTGATTGA AGAATACTTT TCTGCTTTGT ATGGGGAAAA GTTGTTGAAT ATTTAGAAAA ATTATCCATT TATTCCTCTT GTGAT'rATTT GGCAGCCGTC AAAGTGATGT TTTAGCGAAT CATTATTATA TAGCTTACAA AATTTITTTAC CAATTATTGA GGAAAATATT TC'rAAGTTAT TAAATAGTCA TGGAAACAGC TCAGTTATCA TCGTGA.ATAT GTTGTTAAGA TGGCGAACGC CAAGCAACTG GAA.AAACAAG GCAAGCTCAA GATGAATGGA GA.AATGTGTT CGTGGGCACG AATTGCTATT TCAATCTAAT TTGGATGTTT ATCGTGTAAT
GAAGACAAGA
TTGGCAGTCT
TAATGCAATT
TCTAGCTGAT
AAAGGATGAA
TTTATATCTT
GAATTATiATC
TGAAGTAGCA
AAAAATTACG CTGGTTTCCA CTTATAAATC ATAAGTATAG AAAATGAACT AAGGTATTCA GAGAAGATTG ATCCTAAATA TTATGAAATT TAAGGATTTT TAAGATATTT AGGGTCAACT TTCTATTTAT ATCGTAGCGA AGTCATTTTA ATAATGATGT GTAAAAGATG GATCAAGATT GAGGAGGAAG AAAGATGAAA TCAAAAOAAG AAATAAATAT GCTTGGTTTT ACAATTGTCG 548 CTTACGCAGG AGATGCAAGG TCAGATTTGA TGGATGCTTT GGCG=rGCG AGAGA'rCGAT 3420 ATT'rMAACA GGCAAGAGAA GAGAACAGAC TAATrrTATTA TTATGATTCA TGGTCAAGAT TTTTTATTGA TGAATATGAA GATTACTCAT GGAAAATTTA CTACCGCTGC TTATCAAGTA GGGATGTTTA TTTGCAAGAA ATCGTTACGA AGAGGATATA CTATTTCTTG GGT'rCGTATA ATTATTACCA TAGAGTTTT TACATrCATT'r TGATTCGCCT ATATTGATCG TTTCATACGA ATTGGTTTAC AATCAATGAA TTCCTCCAAA TCATCATTTT TGGCGCATGC TCTTGCAGTC TTGGTTGAc( CTGCAAACGA CTCAATAGTG TCTGCCCATC GCGGAGGACG CATATGGAGA TAA'rrrTGAA GTGAGCTTTA ACTTTGATGA CAACGATGCT ATTGTATGAT CAGGTAAAGT CGAATTCGAA AGATTGAAGA ALATATTGGT TTGCAATGAG CAGGTITAAAG CCTTACCGAA GGAGTTTTT1A TTAGGAACTG CAGGGTGCAA CTAGGGTAGA TGGCAAAGGA ATAAATN1'GT AATAGTCCGT TCTTACCAGA TCCAGCTAGT GAT -rTTATT GCTTTGGCGG CAGAACATGG TTTGCAGGCI' TTGCG'TrAT TTTCCTGATA TAGATGGGGA TGCTAATGTA TTAGCTGTTC CAGTCTTGCT TAAAACATAA TGTGATTCCG T'rTGTTTCTT CAGAAAATGT TAGAAACAGG GCATTGGTTG AACAGAGAGA TATGCTCGCT TTTGTTTCCA AGAAT'rTACA GAAGTCAA~c CTGATGTCTC TTGCTGCAGG TCAATATATA GGAGGTCAGT CAATTATCTG AAGCAATTCA AGCGAATCAT AATATGTTGT CTCGAATr'rC ATCAATTAGG GATTGAGGGA AAGG'rAGGTT GGCTATCCTA TTGATGGGCA AAAAGAAAAT ATTTTGGCAG AATAATAAAT TTCTA'rTAGA TGGAACTTTT T'rGGGCTAC'r CACTTGAATC AAATATTGGA AGCTAATAAT TCTAGCTTTA 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160
GTATTCATGC
CTAAACGGTA
ACAGTGAGGA
TTTAAAGCCA
TGATGTTTAT
CACGCTTTTT
'rrATTGAAGA TGGTGATTTA GAAATTATGA AGAGAGCTGC ACCTCPAAT ACGATGTTTG GGATGAA'rTA TTATCGTTCA GAATTTATTC GTGAATACAA AGGTGAAAAT AGACAAGAAT TTAATTCAAC AGGAATAAAA GGACAGTCTT CTrTTTAAATT AAATGCTCTA GGTGAATTTG TAAAAAAACC TGGTATTCCG ACAACAGATT GGGAr'rGGAA TATTTATCCT CAAGGGTTrAT TTGATATGTT GCTTCGTATC AAAGAAGAAT ATCCTCAACA TCCGGTCAT'r TATTTAACTG AAAATGGTAC AGCCCTTAAA GAAGTTAAGC CAGAGGGCGA GAATGATATT ATTGATGACA GTAAGAGAAT CCGTTATATT GAGCAACATT TACACAAAGT TTTAGAGGCT CGAGATAGAG GAGTCAATAT TCAAGGCTAT TTTATATGGT CTTTGCAAGA TCAATT'rTCT TGGGCGAATG GCTACAATAA GCGATATGGT CTTTTCTTTG TTGATTATGA AACACAGAAG AGATATATTA AGAAA.AGTGC TCT'rTGGGTA AAAGGGCTAA AACGGAATTA AGGTTAGCGA TTTGACTGAT GTTTAATATG TTTTAAATAT GAGGTTGAAT TTTTTATAGG AGGACTTTTA TGGATAAGCT 549 AGTCGCTGCC ATTGAAA6AGC AACAAGGGAA ATTTGAAAAA AT'N'CTACTA ATAACTATAT GATGGCTA'I' AAAGATGGAT TCATTGCTAC TATGCC'TTTA ArrATGT'rr cAAGcT= GATGATATr ATTATGATTC CTAAAAAI-r? CGGAGTAGAG TTACCGAGTC CAGCTATTG;T CTGGATGAGA AAAGTGTATA 'rGTAACCAT GCAGTTTTG GGTAI-rATTG mTCAGGGAC TGTTGGAAAG TCATTAGTTG GAAATGTTAA CAGAAAAATG CCTCACGGAA AGGTAATAAA TGATATTTCT GCAATGTTGG CAGCCATATG TAGTTATCTG GTA'rTAACTG TAACGCTTGT AGTTGATGAG AAGACGGGAT CTACAAGTTT GTCGACAAAC 'rAT'rTAGGAT CTCAAGGATT GATAACTTCG TTTGTrCAGTG CCTT~TATTAC TGTAAATGTT 'rACCGATTCT GTATTAAGCG AGACATTAC'r ATTCATTTAC CTA6AGGAACT TCCTGGGGCT ATATCACAAG CTTTAGAGA TATTrrTCCCT TTTTC? rTTG TTTTACTTAT TAGTGGT'rTG TTAGATATTG TATCTCGGTT TAGTTTAGAT GTTCCTTTTG CCCAAGTATT TCAACAACTA TTGACTCCTA TTTTTAAGGG a a.
a a GGCAGAA'rCA 'rATCCTGCTA TGATG'rTGAT TGGAATTCAT GGACCATCTA TTGTCTTACC GGAAGAGAA'r GCTCAACTTC TTGCAAATGG TTTCGGGAAT TATATCGCTG CTAT'rGGAGG TTrGATr1-rC TTTATGCGGT CTAAACAATr TGTTrTATr GCGGTAAATG AACCTCT'rCT TCr'rTrTGTC CCTTTTTTGA TGACTCCACC TGATTTCT'rT GGAATGAATG GATTTTATAT GGGATTGTTA ATTGGAACGA ATTTTCAACT AGTTGTCGAC ATATTGATTT ATTTGCCATT GAAAGAAGAT ATI'GCAAGCT CAAATGATAT TCCTGGTGAG ATAGATGAAA TAAAAAGTAA GTCTGGAACA AGTGCGCAAT TAGCCAATGC TAGAGTGATT GCGAATrCAG GAGCGTACGG TTAATTATT CTGGCCCCAC AAGTTCGGAG AAGATTAGGT ATTCAGATAG TTGCTACCAG TCCAAGTAAA GCCTTACAAT TTGTATTGGA ATCTTTTATT TGAGTAAAGA TTTTGTTTAC TTGGTTTATG TGTGC'TTTGC TTTGGT'rTGT TGCTGTTACA GCTTTGCAAC GCAGTTrCCCT TATCATTCFT AACGGGGGCT ACCT'rTGTTG AAAATCGGTA GGTAAAGCTA ATTTGGTATG CCTGTTATTT AGTGAATGTA TTI'CTAGGAA CCAGTTACCT TGGACCTTTC TATCTCCfTTT GTATTTTTAT CTGTAGAGCG TATGATAGAC TATTTrAGAG GAGGATACAA GGAGTTGAAA GTACTGGTTC AAT'rAACGAG GGGGCTAACT AGCTCATTAT GATATTATGG
TGAGCAATAT
TAACACCTAA
TACCA'N'TAT
CAATTACTCC
TGAATCCCTA
AGGTCTTTAT
CTGGTCCCTT
CTTTCATTTT
ACTTACTGGT
GTCAAATAAT
TTTGTGCAGG
TAACAGAGGT
GTGTTTATGA
TGGATGCAGA
TAACAAAGAG
AAGTTTTTCC
ACGTTCCCCC
5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900
TTATTATAGA
AGGAATGGAA
GCATTACCAA
AGATAGGCTT
GAGATGAAGG
TATATTCATT
GCTGTGTAGT
GGATTTAAAA
550 TTTTTTAATA TAAGAATCCC TCTT1CACAA TTGTAAAAAG AGGGAT'rTrG TATTTTATCT CTTAGACCAA GTrTCTTCA TAA.AGAGAAG GAGGATTGGG TAAATCTCCA AGCGCCCTGC AATCATTGCA AAGGATACCA GAA~TTGA GATGGGACTA AAGAT-rGAGA AACTAGAAGT GGTTCCTAGA ATAGGCCCGA TATTAT'rGAA ACAGCTAAAG ACAGCGCTGG TCACGACCAG 6960 7020 7080 7140 AAAATCATTG CTATCTAGGC TGACAATAAA GACAAAGTAC TTGAGAA'rCT TATGCTGGGT GAGGGTCAAA ACACGGTGGG GCGATAGGAT AAGGATGAGG CCTCGAATAA TCT'rGAGTCC TGCCATGAGG AAAAGGAGGA TAAACTGGGA TCCAAAACCA GTTGTTGTAA TGATGTTGGA TGAAAACCCT GGGTAGAGGT AGAGGGTGTT AATGACCAAG TAAGCCCTAA GCTCTTCATC GAGGTAGTAG TAGAGGTTGA AATTTACTCC GGTAATCAGT GAGC'rGCCA'r AGTGGGCAAT TCCCGCTGTC CCCATAGCAA TA.ACAAAAC'r GATGATGACA AAGAGGGAGA AGAGAGCTAG GATAAGCGCT AGCAAAATCA TAGCATAGAT ATCrTG'CA ATCACCGTTT TATTAACATG TGACAAAATT TGGTTmTGG CAATTTTTGA
ACCTGCAGTT
GAAGAGGGGC
AACCTGGAAG
GAGGCTAATC
TCCAAAGAAG
AAAAACCAGA
TCCGTCGTTA
ATCGTAGAGA
ATAAAGGAGA
AGGACCTGGA
AGCAAGTGCA
GAGGAGGGAA
TCCAGAACTA
TTTTAGTTTG GATACAACCT 'rAGGTGGCTA TTrTTTGGCAT TCCAATCAAG TGGGTAAAAC GTTCAAAATA CTTGCTCCAG
TGCCAAAAAC
TGTCCATAAT
TTCGCCAGAA
TAGTTGTAAA
GATCCAGCAG AGCCACCGAT CAGTTGGTAA TATCTCCATA AAGGTCATTI' CAAAGCTCTT AAGCCTGTAG AAACCAGTAC GCCTTGATGC GACGGAGCAT ACTCCGATAC TGACCAGATA TAGACGGTAA AGCCTCCAGT GGCATACCGG CTAGATAATA TAGAGAATCT GGGCAGTGTT ACCTCAGCCT 'rCATCACC'rC AAAACAAGCA CTCCCATCCC CGGCTGAGAA CCGAAACGTC ATTTCAAAAA AGGCATCAAT CCAAAGAAAG ACCAAAGGAT ATCCGTTGAT TTTTVGGCTT ATGGTCGAAA AGAGGGCTGT ACAGGAACCA AAAGAAGAAC ATACTTTTAT TCATTTCTTA AAGGT'rGTTA CTAGGAGCTT 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700
AAGGCTGGCG
CCAACAGAGG
CTGTAAACTC
AAAGACTTGG
AGCTTCAATC
CCTCGCGATC
ATTTGCCCAG AAAAGACAAA GGGGAGACCA GCAACGATCA AGACTCCCTC C'rTGGCATAA CCTGAACCGC CTAACAATAC GAGAATCCCT CTCGATTCAC GGTAATAGAC AGCAATCGCA AAAAGTAATT TTGAA.AGGAG GTAACGAATC AAGTCATAAA TCTTGGTGAT GTTTGGCAAC GTCTCCAACT TCCAACATAT GGCTGCAATA AGAACCCCTT ATT'GG CC TTGATATGGA ACTT'GAAGG TCTGAATACT CCTCCCCAGT TGGGAAAATA GTCTTGCCCT TTCGAATAAT TTTTCAATTT' CAGTTGAGAA AGAGGTTTGG CAGTCATTT AT'rGCAGGGT TTCGATTTGG CCATTGGCTA GATGGTGCAT GGGCATTAAC TCGACCACGA ATAA-AGTGCA TAATCGTATC
TACAGCGATG
GAGACTGGTA
AGATGTAATC
AGCACTTCT
TGGGAATTTC
CTTTTAGGTG
CGATTGACCT
AGATTTTCCT
TCCAGCAGGA
TCGCTAAAGA
GATACCACTA TCTTTGAGAA GAGAAGGCTC TTCACCCCGC GTTACCAGTG ACAAAGATTC AATTTGATGA TCCCTCTCTA AGAAATGGGC ATT'rGGCAAA
GCGTCCACCA
AGCGCGGGCA
CTTGAAATAA
CATTTTCTTG
AAAGATATCA
AAGGATACCA
AATCAGCAAA
TCCCCCTCCG
TACCCTTTTT
CGCTATAAGG
ATATGGTATG
ATCAGACTTT
GTCTCCAACC
GGGGAGCTAT
GGTTGTCGAT
TCAATACAI'
TGTAGCCATT
TACCTTCCAC
GCAAAGCGTT
GCCAAGAGCT
GAGTTAGAAT
GCTAGAACTG
CAATCTTGGA
ATGATATCAA
ACATCATGCT
ACAAGGATAA
CAAGAAAAAA
TAAAACTATA
ATAAAGGATA
TGATGATACT
TAGTAATATT
CATCGACTCC
TATCTTGC
AGCTGGCGAT
TACCAAGTAG
GTGATTTAAA
TATCTTTATC
TCGCACAGAC
GACCGCTGGT
CGACAGACAG
CAGGATTAAC
ATTCAGGGTT
CTGCAATCAT
CGCTGGCTTG
AGCGACTGAC
TTTCTGCAAC
TTTTCATAAT
TGCACCTACT
CCCTAACCAA
TACAAGGAGA
GGCGTTGGGG
GATAAGAGAA
CCGCACCCGA
GTTGACTTCA
CTCAAGAATG
AATATGATTG
CAAGGAGCGA
AAAACCTACT
AGCTAATAAC
TT4GAAA'IAGC
TAAAATGAAT
AACTCAATGA TAT'rCGCGA'r AAACCGAGAA 'rATTCTrTTC ACGATAGTTT CTTTAGCTCC TCGTGCTCAG TCAGGGCGAT GCAAAATCGG CCCCGTTACC AGAACAGCTT CGTCTrGCTC CAGAGGGCAA AACCAACT TTI'CATGAT GTAACTATCA AAGAGTTTTT AGTGAAAATT TATTAGCGAC TTTCTCTGAA AATAATTTAC TGGTATTACA ATTGGAGTGG CTTTAGAAGA ACGCCTTATA ATATTTTTGA GAGGG-AACGA CGTTTGTATC GT'rGCCAAGA CTGCAAAAAA ATCAAGAAAC ACGTTGGCAT CAAAACACAG AGCAT'rCTTA AAACTGGCCA GTGGTCACAT TGAAAAATCA CGCGCATTGA TAATCTCGAG 'rTTCTGTACA CCTACCCTGT CAAGGAACAT 'rGTTAGAGTC GCAACGGCAT CATAGTGTrG GGTTCCATC'r CC'N'GAACGA TGTAGAGATT T1TCAGGATTG ATTT~CAATGA CTrTrGTATc ATAATAGGCA ATTCTACCTG CCCCAACGAT ATAAT'rA'GG AAGAGTA'rCA TATCGACACG CTGTACAGTC ATGTCACCGC TTGGAATGAT AATGACATTA CCAAATTT'rT TACGAAAATC GGACT'rGACG ACAA.ATTCCA TGAGGCTAAC 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440
GGTCTGGTTC ATGGTGCGGT ATCGGCTATG TTAAAAATAC ATCACTTGAC GCACGATATC
CGTCTCTTTC
CCAGGTGTCG
GTCACGCCAG
CGTGAGATTT
GGTCGTGATG
AGACGGTGGA
GTTCGAAACG
ATAATGGGAC
CTGAGGTGGC
TCTATGCCTA
TTACTGGCCT
TAAGAGTGTA
GCITrCCTTT
CAATAGGCGT
TACTGGTGCT
TACTTTT"1GAG GAAGTAGGGC CAGAGCTCAG TGTGGAACAG ATTGTAGAGC TTCCAGTCGT 552 AGCGACCATC ATAGAAGATC ATCTGGTGAA GGGAGCCATT GATATTCTGG ATGTGCGTTT CGGTTCGCTI' TGGACCTCTA TCACACGGGA AGAMTTAC AAGCTGGAAC CAGAATTrGG TGATCGTrT GAAGTGACCA TCTATCATGC TGATATGCTG GTCTATCAAA ATCAGGTTGT CTATGGCAAA TCATTTGCAG ATGTGAGAAT TGGGCAACCs ATCTTTACrc TCAGCaTCTt CGATTAGCTG GCAATTCGT TCTAGTTGGA TTTCGTCAAT CAAGGT INFORMATION FOR SEQ ID NO: 67: SEQUENCE CHARACTERISTICS: LENGTH: 7163 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: TTATCTTTAA CGATATCAAT CAAGATCTGG TCAATAAAGG GATTGGGGCT TATCGTGAAG TTGGCATCCA AGCCCATGGA TATGTCTGTG ACGTGACAGA CGAGGACGGT ATCCAAGCCA TGGTCAAGCA AATCGAACAA GAGGTTGGTG TCATTGACAT CCTCGTTAAT AACGCTGGTA TTATCCGCCG AGTTCCAATG TGCGAAATGA GCGCCGCTGA TTTCCGTAAG GTCATCGATA TTGACTTAAA CGCACCATTT ATCGTTrCAA AGGCAGTTAT TCCTTCTATG ATAAAGAAAG GGCATGGAAA GATTATCAAT ATTTGTTCGA TGATGAGCGA ACTGGGACGT GAAkACAGTTA 10500 10560 10620 10680 10726 GCGCTTATGC TGCTGCTAA.A ACGGTGGAGC CAATATCCAA CAGCACCTCT TCGTGAATTG TTGCAAAAAC ACCTGCTGCA TTCTCGCTAG TGATGCCAGC TCTTAGCCTA CATCGGAAAA CATTAATCAA TGAAA.ATAGT AAGCGACAGA TAAAAAAGGC AAAGTCAATT AACTTATGTG CAGrI'GAC'N' TGTTGTTACC GCTrCCCTGG TGTTGTCTGT AAATCAATGG TGGTAACGCC AACTGACCCT CAAATTGATG GGGGGCTTGA AAATGTTGAC CCGCAACATT GCGTCTGAAT TGTAACGGAA TTGGACCGGG TTATAT'rGCC ACTCCTCAAA CAAGAAGATG GTTCTCGCCA CGTTGGGGAA ATACTGAAGA AATTTTGTCA ATGGCCACAT CAACCTGAGT AAAAATAGAA CAAGCTAGCA AGAATCACAT TACCAATTAT TTAACTATGG CAGAACGGAC TAATGGCTGC GGCTGTGGTA CGGGTGTAGG GGTCTAGCAG TGGACCCAAC TTGTCTATCC CTTATGCCAA TTTGAACGCT TATTTGCTGA CCCATTTGAC CAGTTCATCA TTTGATGGGC CCTGCTGTCT CCTATATGTA GATGGCGGTA AGAAGATCTT ATGAAAATCG TATTTACGAT AGTCTAAA.AG TATGCGTGGA GAAGAAGGAG CATCCT'rTTA AATACAAAGG GGCTATGCTT GCTTTAA.ACA.
TGACGCTPTAC CTTTATTCTC AGGATTTGGC TGGGGGGCAG AGAAATGGGC GGTGGCTACC 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 553 CAAGAGAACG TGTAATCCCT GAACAACGCA ACGCTCGTAT CTTAAACGAG GTGAAACAAA TCACCCACAA TGATTTGATG ACCATCCTTA AAATAATCGA CCAAGACTTC CTCAAGCA CCATCTCTGG CAAATACTTC CAAGAATACT TCTTGAAAA CTGCCAAGAT GATGAAGTTG CTGC'rTATTT GAAAGAAGTA 'rTAGCCAAGT AAAGCTATTC TAAACCAGAA AGGAACTAAT GGATGACGAA AATATTACTG TrTTGCGAAC CATTAATTCG AATTTCACCA ?1'AGATGCCA CCAGTATCGG CGATCATGTT GCCAGTCG.A CTTAT'rTTGG CGGATCAGAA CTTGTAATTT GCAAGCCCTG GGTATCTCAA CGAAAGT'T TACCGCACTC AGATTGGAGA TCGTTTTCTC ACATTCTTGA
GTCGGCTTGG
GTGAAGTTTT
CGATCGAATC GGCCTCTACT CTACGATCGT AAGCATACGA AACAGCACCA AATCGATACC ATTTGGAGAA CGGCTTTGGT GTATCAGCCA CATTCCCCA ATTTTCATTT TAGTGGAATC AT'rAACATCG
CCTGCCAACG
AGTTCAATCT
TGTCGTCAAA
AACATGCTAG
ACCGTAGCTA
CGCCGAGGAA
GAAGCCAAGT
CCTCTCATGA
ATATGGATTC TCTCTTTCAG TCGGTCAAGA GG'rCCGTGCG TTGTCGTTTC AATGGATC'rC A'rGAATTTTC TAAGTTTGCA TTGATGACCA AAATC'rAGAG ATCGCATGCG ACTTTTAAAA CTAGTGATGA GCAAGACAAA AGTCTGTCCA ACTAAAAACT CTGGTGCCCT TTACCA.ACTA
GGGATTACCC
ATCCTTCTCC
AATCTGAGAA
CGTTTTACTG
ATGTTTCCAA
TACTCTTGGA
CAAAGATGAT
ACTATTGCTT
CAGACAGTGC
AGAAGCCAAG
TTCAGTCCTA
CGGTATTGAT
GCAAGCCTATG GTTTCAAGGC AATGTCTATC AAGCCTATGC GCAGTCTATC AACGAATTGG TAGCCTAGAA GAGGTGGAA.A CAT'rT'rCCAT ACCCTCCCrT TCTAGAAGAA CTATTTGAAG TAGCGGGGAT GCCTTTATAT 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 CTCCATCATT CCTCCCTAAA AACTACCATT GACTTTGCAG TTGCGAGCGC AACTCTCAAA TGCACTCTTC CAGGAGACCA TCTCTCCACT TCCTCAACTA GTATTGAAAA TTTACTGGCA AATGCACAAG A'rATCATTCG TTAGGAGAAT TACATGACCA
AATCAGATAC
CAAAGGAAGA
GATTATTGAA CTAAAAAAAC AGGACTACAA GCCTCGATTG AAATCGCCTA TACCAATCAG TATGCAGGAC AGGACGATCA GAGTGTTTGT ATCGGTGCAG ATGCCATTCT AGCTGGAGCA AATTACGTTG AAATGTGCAA TCTCTACAGC ACACCGTACA CGAC'TCCACT TGAAGCCGGT AGTGAAATCA CAGCATATAT CTCTGCAGTC AAGGCACCGA AAAAAATTGT CGCTGTTATT CGAGGAAATA CTTGTATCAA GGGCGG'rATC AAAGCTATTG A.AATCATCAA GGAACTTGTA GACTTGTATC GTACTGTGCT TGATGCCGTA ACTGCTAGAG TTTCTCCATC TTTCCATGCT GAAACTGCGA TTCCAGGCTG TATTACCCTC ACAGAGATCA TCAAACTCTT CCCAGGTAGT ACTCTCAGTC TCCCACAAGT TTCCGTAATG GTAACCGGAG 554 GAG'rCGGCCT AAACAACATC CCTCAATGGT TCGCTGCTGG TGCAGATGCC GTTGGAATTG 2940 GTGGCGAAC'r CAATAAACTC GCTTCCCAAG GCAACTTTGA CCCCATCAGC GAGATTGCCC 3000 AACAGTATAT TACACTCAGA TAAAATCATA ACTACCCGTC TAACGGGTGG TTTATCTCAG .3060 AGCTATAAGC CCAAATCATC AGCCAGCGCC TAAAGACGCT GGCT'rTCACG TTGTTCAAGC 3120 CTTATTGCTC TTGACTCGTC ACIGCCTCT TTAAGAGACT qTGrGTATTAC T'rACCACTAT 3180 CCCTAAAGGG ATCCTrCATAT TCTTTTACAC TCAATI-rATC TAGTGCTATA GTAGATTGAA 3240 ACTGGAATAG TACACCTCTG CTTCTrAAAAC ATTGTTAAAA ATCGATTTGA CTGTCCTGAT 3300 CGATTTTGTC CTGTTCTTAT TTCATTTTAC TATATATCAT ACTTTACTCG TTCTCAAATT 3360 TTCATACTCA TGAAGAAATC ATCCACTCGA TAATTTCTTT AATCTTGACT ATATTTCTTA 3420 ATTGTGGCTT CATTAAGCCC TACTGGACTT ACATAATAAC CTTCCTCCCA GAAATGCCGA 3480 TTCCCAAACT TGTACTTGAG AT'rGGCGTG'r TTGTCAAACA TCATGAGTGC ACTrTTGCCT 3540 TTTAAATACC CCATAAAACT TGAAACACTT AGCCTCGACG GAATACTGAC TAACATGTGT 3600 ACATGGTCTG GCATTAAGTG ACCCTCGATC ATTTCAACAC CTTTATAACT ACACAAGCGA 3660 *TGAAATATTT CGTCTAAACT ACTTCTATAT TGATTATAGA TGACTTTTCG TCTATACTTA 3720 *GGGGTGAACA CAATATGATA GAACACCTCC ACTTTGTGTA TGATAAACTA TGAGTCTTTT 3780 ***GTGCCA'rATT TTT'TCTCCTT TCGCT'rTACA ATTGGATTGA ACACC'rTTAT TGTATCCCGT 3840 **.TTGGAGTTTT TTTGGTATAA CCTTCGACGC GCACCCGTAT AGCGGGTGGT TGrTTTGTCT 3900 *.*CCCACCTCAC GGAGCGAGAC GGACTAATA'r AGTGGAGTGA AATAGGATAC GAACAAATTG 3960 ATTAGGAAAA TCAAATGAAT TflATAGAAAT CTNTTAGCAG TTATAACGTT CTAT'rCTAGT 4020 TTCAAAACGC TATAGTCACA TAATAATGAA GTAAAAAAGG A'rAAGTATCA ACTTATCCTT 4080 '1TTAAAAGA AAAATCCGAA GATATTTGGC 
CTTCTTCGGA TTTTTTCTAT TT'rCCACAGT 4140 TTCATGTAAT TCATCTAGAT GATGAACAAA 'rTAGTTGTTC TTTCCTCTAC GGAATAGATA 4200 :**AAATGCCCCA AGTAGCAAGA ACCCTAGACT TGCCAAGATT GACTGACCTT CTCCTCTCTG 4260 AGGGAGATTC TTTTGATCCG AATGGTTCTT TTCCTCTTCA GATTTTTCCT TTTCTTTTGA 4320 ATTCTGTACT 'rGTGGCTGAG C'rGCTTCCTC TAGCTTTTTA AAGACTTCCT GATCTGGAGC 4380 TGATTCCTGG G1'?rCAGGAT TATAGTAGGC AATCTTATAT TCATCCCCTT CrTrTCGAAT 4440 GGTATAGACT CCACGTTTCA AAACTTGGAA TTGGTTGGAA ATAGTAGAGA CAGAATCATC 4500 *ATATTTCACA ATGCCCCAAA CTCCTrGTTT AGCATCATAA ACAGACTGAA GGGTTTCGTT 4560 ATTTTCGATG AGGCTACTTT CTAACTCTT'r TATCATTTGA TTGAAGGTGG CACGATCCAC 4620 GTTAGGAATG AGCATATAGC CATAAGAATC TCTATTTTGC TTATGAGCCT GACTAATrCGT 4680 555 AAAATC TTTACTCTGCG CTGTCCTTCA rrGATATCCT TCCAGGCTCC 4740 CNTGCAAA GCCTTACTCA TACTGATTGA ACTCTTCTTA AAGAAAAAr.T AACCAATA'rr 4800 CT'TTTTCGAA TCGAACGATT CTAAAAAGAC AC7?TGGGrr TCAGGATAAT CCTTTTCTTG 4860 TTCTGTrAAGG GAGGCI'CTT TATCAT'rGAC ATAGACTTTA TATGGATTAC CTGA?1'CCAG 4920 'N'TTCTCTrGG TCAATTGTAG 7rGCAGCAGT ATCTGTTGAA GTGTTTGGA TATTGCTrC-C 4980 TAAAAAGGCG ATCT'rATCCT TTAGCATAAA CCACCTCTTA rAGCAGTCA ATGTTTGATT- 5040 CCAGTTGGTG AAATCCATGG TTGCTGTCGC ATTGGCATCA TCTAGTTTGC TCGTTCCAAC 5100 GAAAGCAGAC GGTAAAACTT TACCTGTATC GCTATCCGCT CTCTTAGCAT CCGTCTCTGT 5160 TGTACCAGGC ATCTTATATG GATTAACTGT TGGCCAGTAG CCA'rCGCTAT AGTGACTCAA 5220 ATCGCCA'rTG TAAAGATAGA ACATCCCATC ACTCGTATAC CAACCACGTT TATTTTCCTT 5280 GTTCATGTGT TCGTAAT'rCA AGGTACGACT GGAAAAGAGT GACAAGCCAA ATCCAAACCC 5340 TTTCTCTGCA TTGTACATGG CTGTTTTATC CATCTTGTTA AAGGCAGATA GGTAACTTGG 5400 TCT'rGGAACA C?1'GCGACTC CTGCATCACT TAACAAGGAT TGCATCAAAC TGATATCC'rT 5460 *ATAAGTCT'rC AAATTCTTAA AGACATCATA ATAACTATCC GATTGAACAA TGGTCTTCAC 5520 *.*AAGACTCTGC AAACATTG'rT TGGTTTCTCC TTCAGACATA TCCGCTATTC GGTGAATCCC 5580 *TCT-rAGTACT TCTACTGCGG CCACGTGCCC C1'CGCTATTT GCACGACTGA TCGAGCGTCC 5640 *ACGACTCATA TCCATCAACT CTCCATTCAC CAGCAAAGGA GCAAACGATT TATCAATCCA 5700 *GTGGTACATG GTTITGCAT'rT TATCTTTATC GATTGGATTC TTCGTCTTTT GAATGACTGG 5760 CAACAGTTGA 
GACAGGCCAT CAATCAAAAC ATTCCCATAA GCACCCGTAT AGGCAACATT 5820 *GGTGTGGTCG ATATAGGATC CATCTTGATA AA.AACCTt~CA CCTTGGTCTA CCAACT'rGAA 5880 .*CACTTGCTCA ATCGAGCGAA TGGTAGAAGA AATTTCTTGA TCATCCTTAC GCAGTAAACC 5940 AGCTATTACT TTTACCCTTC CCATATCAAC TAAGTTTCCA CCTAGAGCCT TGAATGGGT'T 6000 *ATCAGTCGTC TTTCGGAAAT GTTCGGGATC TGGTACAAAT TTTTCAATCA CATCTGTATA 6060 *TTTTTTAATT TCCTrCATCAG AGAAGTATTC TTTCATCAGA GACAAGGTAT TGTTGATGGC 6120 ACGAGGTGTA CCGATTTCAT AAMCCCACCA GTTCCCAACA ATGCTCTTTT CACTATTGTA 6180 *GACATGTTTA TGCATCCATT CCATGGAATC CCTGACTGTT CGAACGACAG TTTCATCTTG 6240 ***ATAATAACGA GAAGAAGGAT TGGTCACTTG CTTGGCCATC TCCTCCAATT TCCGATAAGT 6300 GGCAGTCAGA TTTGCAGACG T'TTTATAATT TCAAAATTT'r TCCCACAAAT AGGTGCGGTC 6360 CGCCTGACTT GAAA'rACTGG ATAGCCTATC AGCTACCTT CCTTCCAATT CCTGGTTTAA 6420 556 TTTGGCCATC TGTTCATT'rT TAGAATCATA GTATTGATTC CCAGCGATGA TGCCATTCCA 6480 GTCATCCAAA CGGTCTGTGT ATGCATCCTT AACAGAGGCC AGAATCTTCA AAGGAATCTT 6540 TTTCACTTCC TTGCCATCTT TACTGACAAT GACATTGGTT- GTCCCTrCCT TAAGAGGTTC 6600 TAAA.ATTCCA TTTTrGACTG AAGCAACGTC AGGATTT'rCT ACCTTATAAG TATAGTCCGC 6660 AAGAGAAAAA. 
ACATGTT r TTCCAATTGG TAAATCAATC T'rCCTCAA GCTGTTATC 6720 TGTTTGAGAA TCCTCAGAAA GCTGGTCTGC TACCTCTACC AGCTCAATAT CCTTAAAGGA 6780 AACAGTCCCA GTTCCTGTTT CATAGAATAA CTCCAGCT'rG ATTrTATCAA CATCTAAAGT 6840 CGGGCTATAG TCTGCI'rCAA TGGTCTGCCA GTCCTr'PGT'r CCTGACGTCG TTGCAGAATT 6900 CCACAATCGC TTGTCCTTAC CACTTTCCTC AATGATACGA ACTT'rGGCAA TCCCGATTTT 6960 ATTATCTGTT TTAATCTTGA AACGCAGTTr ATACTTTTTC TrAGCTTCAA TAGGAACCAT 7020 ACGGTGAAGC GCTGCCCTTA ATTTCTCATG GCTTGAGATA GTGATAGCCC CATCCTTAGC 7080 CTCAATG.ACT CGAGTI'GAGG CATCTGCACT AT'rCTTCTGG TCTACCCAAG CTGACCACCC 7140 CCTGAGCTTT GCT'rCCTGTC CGG 7163 INFORMATION FOR SEQ ID NO: 68: SEQUENCE CH{ARACTERISTICS: LENGTH: 9244 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: **CGTTATAACA TACATGTAAG CGGTACCCAA AATGGTGCCA AGTCAAAATT TTTAAGGAGG AAAATACATG TCTTCACATC CAATTCAGGT CTTCTCAGAA ATTGGGAAAC TGAAAAAAGT 120 TATGTTGCAC CGTCCAGGCA AGGAGTTAGA AAACTTGTTG CCGGACTATC TTGAAAGGCT 180 TCTTITTGAT GATATTCCTT TCTTGGAAGA TGCTCAAAAA GAACATGATG CATTTGCCCA 240 AGCTCTTCGC GATGAAGGAA TTGAGGTTCT CTACCTAGAA CAACTCGCTG CTGAATCATT 300 GACCTCTCCA GAAATCCGCG ATCAATTTAT CGAGGAATAC T1'AGACGAAG CCAACATCCG 360 TGATCGTCAA ACCAAGGTTG CTATTCGTGA ATTGCTTCAC GGCATCAAGG ACAACCAAGA 420 ATTGGTTGAA AAAACAATGG CTGGGATTCA AAAAGTTGAA TTGCCAGAAA TTCCTGACGA 480 *AGCTAAAGAT CTAACTGACT TAGTTGAATC AGAGTATCCA TTTGCAATTG ACCCGATGCC 540 AAACCTCTAT TTCACTCGCG ACCCATNTGC AACAATTGGA AACGCCGTAT CGCTTAACCA 600 CATGTTTGCA GACACTCGTA ACCGTGAAAC ACTCTACGGT AAGTATATCr TCAAATACCA 660 557 CCCAATCTAT GGCGGAAAAG TGGArGGT CTACAACCGT GAAGAAGATA CGCGTATCGA AGGTCGAGAC GAGTTAGTTC TTCTAAAGA CGTCCTTGCA GTAGGTATCT CTCAACGTAC AGACGCAGCT TCTATCGAAA AACTTTTGGT CAACATCTTC AAGAAAAATG TTGGCTTCAA GAAAGTrG GCCTTTGAAT TTGCTAACAA CCG'rAAATC ATGCACTTGG ATACTGTCTT CACTATGGTA GACrATGACA AG '?CACTAT TCACCCAGAA ATCGAAGGCG ACCTTCACGT TTACTCAG'rT ACTTACGAAA ACGAAAAACT TAAAATCGTT GAAGAGAAAG GTGACTTAGC TGAACTTCTT- GCTCAAAACC T'TGGTGTAGA AAAAGTTCAT TTGA'rrCGI' 
GCGGTGGTGG CAATATCGTA GCAGCTGCGC GTGAACAA'rG GAACGACGGT TCTAACACTT TGACCATCGC ACCTGGTGTG GTAGTTGTTT A'rGACCGCAA TACCGTGACC AATAAGAT'rT TGGAAGAATA S S
CCGGCTTCGC TTGATTAAGA TTCGCGGAAG TTGTATGTCT ATGCCATTTG AACGTGAAGA GAAAATGTAA AAAATAGAAA GAGGAAATAA CAGCTTCTTA GCAGAAAAAG ACTTTACCCG AGCTCACTTG AAAGATTTGA AAAAACGCAA TATCGCTCTC CTAITGAAA AAACATCTAC 'rATCGACCTT GGTGC7TCACC CAGAATACCT AGAATCTAC'r GAAGATACTG CTAAAGTATT TAA.AATGACA AATTCACTAT TGCAGAGTTA GAATACCTTA TATTCAACAC CACTACCTTG TCGTACTCGT GCAGCCTTTA CGGAGCAAA'r GATA'rTCAGT GGGACGTATG 'rTTGACGGGA
TCCAAGGACG
TTGGTCTrTrC
CTGGCAAGAA
CAACTGCGGC
TGGGTAAAAA
TTGAAT'rCCG
CAGTATGGAA
CTGTTCAAGA
TGAATTGGTT CGGGGCCGTG GTGGACCTCG AGTGTAATCG CTGTTCGATA TTCGTCAATA 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 16.20 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 CGGATTCAGC CAACGTATGG CGGTC'rAACT GACGAATGGC TTGAAG;ATrT GGCAGAAMTC TCAGGCGTTC ACCCAACTCA AATGCTCGCT GACTrACTTGA AAACTTCGGT CGCTTGGAAG GCT'rGACATT GGTATACTGT GGTGA'rGGAC GTAACAACGT TGCCAACAGC TTGCTCGTAA CAGGTGCTAT CCTTGGTGTC AATGTTCACA TCTTCTCACC AAAAGAACTC TTCCCAGAAA AAGAAATCGT TGAATTGGCA GAAGGATTTG CTAA6AGAAAG TGGCGCACAT GTTCTCATCA CTGAAGATGC TGATGAAGCA GT'rAAAGATG CAGACGTTrCT TTACACAGAC GTTTGGGTAT CAATGGGTGA AGAAGACAAA TTCGCAGAAC GTGTAGCTCT TCTTAAACCT TACCAAGTCA ATATGGAC'N' AGTTAAAAAA GCAGGCAATG AAAACTTGAT CTTCCTACAC TGCTTGCCAG CATTCCACGA TACTCACACT GTTTATGGTA AAGACGT'rGC TGAAAAATTT GGTGTAGAAG AAATGGAAGT AACAGACGAA CTCTTCCGCA GCAAGTACC TCGCCACTTC GATCAAGCAG AAAACCGTAT GCACACTATC AAAGCTGTTA TGGCTGCTAC ACTTrGTAAC CTTTATAT1'C CTAALAGTATA ATTTTAGATA ATAAACCGTC TACCAACAGC 4. *4 S S S* @5 S S S 5
C
a TATGAGGGCT GCGACTAATA TTTCTTATAA AATATGTGAA GATAAAGGAG AATTTATGGC CTTTCTTCTG ACCCATCAGC CT'rGTAAAAT TGATTAAAAA GTTGGGAATC TCTTGCTCCA CTCGACTCAC T1'GTCGCTAT CAAAATGCrC 'rCTrGGATGA GTCGTAGATA AAAArGATCC TCAGAAGAAG AAGCAAAAGC GGCCGTGGCT GGCG'rAAGGT ACCATCCGTA CTCTr'IrAAA CCCGTCGTCA AAGAAAACAA TTCGCTTCCC AACGTTTGGC GTAGATTATG TATTTGTTAA GTTGCCCAGC TGGAAGAATA AAAGTAGAAG CAGCrATCGC TCCCT'rGAAA ATCTAGGCGC TAAGTTGTTT TACTAATAAG TATTCTT'GAA AACATGTACA AACAAATATA AATAGAAAGC ACACTTTGTT AGACATCAGG GATGCCTTCA TCTTACACCG GT'rTATCCCT GCGGGGGCCT AGGGATTGG GATGTCCTCA AGGTTCGCTC ATTAAAGAAA TGGTGGT'rrC CTTGGCATTG CGTGAAGAAG TATAAGGGCC CCTCGGTGGT ACAACTTATG GCCAGTTATG ATGGCCGTTG 558 GCTTTAGTCC GGTCCTCTTT AAATCATTAA ATTGAAATCT AAATCGTAAA ATTGTAGTAG AAAGGCTCAA CAAGAAGCr7 TGGAGATGAT CTGATrATCA ACATTTGGC-A TCAGACrCTG GACAGAAGGT AGCATCGGTr AGGCATCGAA AAAAATGTTG AGCTTTTGTT AACTTGAGTA AGAAGCCGAA AAAAGCGGAG CGTTGCC'rCA CCAAAACCTG TAATGGTCAA GTCGTCGTAG TGGACAr'rG ACTGGTGTCG AGAAT-TGGTT GATGCAGACC CTACAACAAG CCAAACCAGG TATCAAACAA GATCAGTTTG TTTrGTCAAT GGTCGTCCAG CTTGATTGAA TCTGAAAGCG ATGTATTCTA Tr'1CTAGTAT ATA'rTTCAAA AGATACTAGT TATGTAATGG TAATCTATTA AAACGCATrC TATTGAGTGT CTTGGGAGG AAATGCGAT'r TAGTTGAAAC AGCTAAGCAT CTCACGGTAA TG-GACCTCAA
A.AAAGAACCC
TCTGGTTGAA
CCTCTGTTGT
AACCAATCCG
CGACTT'rCAA
TTGACATCAA
CTGCAGGTGG
AAGCGGTTAT
TCT1TCATCGT
AAAAATTGGA
CACCAGGTAG
AAGGAAAAGC
GAACAATTAT
CTTTATATCA
TTTAGACTTT
TGCCTTCCCA 2760 AAATGCTr'rG 2820 AACGCAAGT'r 2880 TCCTrTCTAT 2940 GGAAGATGCT 3000 AGAAATTGAA 3060 TGGCGGTATT 3120 TGATAAAGAC 3180 T'rTGACAGGT 3240 ACATGTGAAT 3300 CATGCT'rCCA 3360 AGTTATTACT' 3420 TGAAAAAGGA 3480 AATTAGAAAT 3540 AATATGGTAA 3600 T'rGGTTTTTT 3660 AAGGGTTTAA 3720 TGCTAACT'rG 3780 AAAATCCACA 3840 2460 2520 2580 2640 2700 GTTTTCTTGA ATGTTTATTT AAGAAAGTAG AGGAAAAACA AA'rGAGTGAA A.AAGCTAAAA TATTATTGAT AATCATTGCT ATTATGGCAG TTATAGAAGG TATTTACGAG ACTCAGCCTC S. *S S S
TCGCACCGAT TCGGGCTATG CTAGGTACTC ATCCAGAGGA CGAGCGCAGC GAT'rGATGTA GCCTTCTTCA TCCTTATGGT TCAACAAAAC TGGTGCTCTT GACGTAGGGA TTGCCrTAT GCGAAAAAAT GrAATTTTG GTACTGATGC CTTTGTTrGC GTATGGCTGA AGAAACAATG GCCTTCTA'rC CACTCCTTGT GTTTTGATAG CCTGACTGGT GTTGCAATTA TrrTGCTCGG 39Q0 3960 4020 4080 4140 4200 559 TTCTCAAATC GGCTGTTTGG CATCTACTCT GAATCCATTT GCGACAGGTA GACTGCGGGA G'rTGGTACAG GGGACGGTAT CGTAC1-rCGT C1'GATCTTCT GACTGCTCTT AGTAC?1'GGT TTGTTACCG TTATGCGGAT AAGATTCAAA TAAGTCACTG GTTTATAGTA CTCGCAAAGA AGATTrCAAA CACI-rAACG TTCATCTGTA GAATCTACAC TTAGCAGCAA ACAAAAATCA GTTCTCTTCT GACATTCATC TTGATGGTAT TGAGCTTCAT TCCATGGACA GACCTTGGCG TGATGACTTT AATACTTGGT TGACTGGTCT TCCAGTTATT GGTAATATTG TACTTCTGCA CTAGGTACTT GGTACTTCCC AGAAGGCGCA ATGCCTTTG TATCCTGA'rT GGTGTTATT'r ATGGTCTTAA AGAAGA'rAAG ATTATCTCTT TGGTGCTGCT GACTTGCTCA GTGT'rGCCTT GATCGTAGCG ATTGCTCGTG TATCATGAAC GACGGTATGA TTACCGATAC AATCCTCAAC TGGGGTAAAG TTGCTTCAGc
GGCT'?ACCTT
AAGATCCGAC
TAGAAGAATC
TA'NrGTGTT 4260 4320 4380 4440 4500
TTACCATT
TCGGTrCATC
CCTTTATGGG
CCTTCATGAA
GTATTCAAGT
AAGGCTTGAG
CTATGTCATT
CGGTCTATCT TCACAAGTCT TTATCGTTGT AACTTATATC T'rCTATC'rAC CTTGATCCCA TCTTCATCTG AGAATT'rGTA AATGTCCGTC CTTGAACTTG ATTGCACCAA CAACATTGGT ACTTGGTGGA CATCGCCCTT CTTCTCCTTG TTCCA'rCAAA ATAGATATAA CTTGA'rTTCC TATCC1"rCAG AATCCAAGAT GTCCTAGAAA TCTTGACCC'r AAACGTTATT CAT'TCTCTGT CATTTGGATG ATTTGAAGCA ACTATCAAAG CCCTTCGCTC GCAGCTCTCT AAAGCGCGTA CGCTT'rATCT CTACAATACC ATCGAAGAAC GACCTA'rGCT GAAAAAGGGC AGAGCTTGAA GTAGGAGGCG GTCTrGCCAG CGCAACTATG GGTATCATGG CTCCACT'rGG CTAGCTTGAT TATCACTGCT TACCAATCTG CTTCAGGTGT CATCTGGTAT 'rGTGATGGGA GCTCTTGCAC TTGGACGTAT AATTCATGGG CAAACTCGTA GTCGC'rATTA TTGTAGTGAC GAACCTTCCT TCCATTCCTA TAAAATAGTG AGTGAGGTGA CAAATCAAGT TAAAGATGAA TTTCTTATAT CATTAAAAAC TACTCAATGA AGGAGAAAAT GGAACACCTT TTGGACAAGC AAACTTTACA GATTTGT6GA GACATAGGTT TCACTACCTA 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460
ACGGATATC
TTGTTCCATC
ACGGCTGGGT
ATGCAGTAAA
TTGGTACCGA
AGGCCAGTAT
TTCTACAGGT
CCTT-TAACGT
AGAAATCGGT CAGGGAGCAG AGGTGATGAA GCAGATTGGC AGACACCGCC ATTCGGACGT GGTGTCCMG ATGATAAAGG AAGCTTGCTG GACCAAGGTA TTCAGTTCAA TGAGGAAACC CTCTGGCGCT GCATGGCACG GGGCTTTGCA CCTGACTCAT CTT'rrCCTCT CAAACTTCAT GGCCCTGGAT CGGATCAACT TGTACCAGAC AAGGCCAACT ACCAAGGTCT 5520 5580 5640 5700 5760 5820 5880 5940
AGCTTCTGGC
CCTCTATGAA CAGGTTTGTA ACGGTCTCAA AGAAGCTGGT TATGATTACC AAACCACTGA 560 ACAAACCGTA ACGGTTCTCG GAGTGCCAAA GCATGCTAAG GATGCTAGTC AAGGTATCAA TGCTGTCATC CGACTAGCTA CCATTCI'GC TCCTCTCCAA GAACACCCrG CTCTCAGTr'r TCTTGCAACA CAAGCAGGTC AAGACGGCAC AGGAAGACAA ATCTTTGGTG ATATAGCAGA TGAACCTCT GGTCACCTAT CCTTTAATGT CGCAGGTCTC ATGATCAATC ATGAACGTC 6000 6060 6120 6180 TGAAATCCGT ATTGACATTC GGACTCCTGT GC'N'ACAAGA TGTGCACAAA ACTACCAACT TCTATACGTC GCAGAAGACA GTAAACTCGT GACTGGCGAT AACAGTCCTG AAATTGTGTA GCCTTCGGCG TGAATGTGCC GTTC'rAGAAG TCGACTTGCA ACTTAATCAG TGCACCCCAA AAGT'rAGACA GTTATGAAGA TAAAGTTCAG TTCAAAAAG ATTTGGTGTG C'rTTATAAAT CGCTTTTC CCGTGGTATT GCCTGATTTT AACATTT'rTA GAAATCGATT ACTATATTTG AGCCACTTCG TACTATCAA'r TACTTCTGAT ATTTCTCGAC ACGCTCATTA TTTTTTCAAA AATATATTTA AI'TAATTTGA AACATAAGGA CTANTrCATC
CCTTATTCCC
ATTTGTACCG
CCAACTGTTT
GAATAAATCT
ATCTATGAAC
GATGTTTCTG
AGTTTTTGCA
ATAGTATATT
CTTAGCTGAC -AGAGAAC TAGTAGAGTT CCGCTACGAA GAGTTTGACT ATCTAGCGCC TAGCACACTG ATGCAAA'rC' ACCAAGAAAA CGGTGGTGCC ACTTT'rGCTC GCACCATGCC AGGAGCGAAG CAGACAGAAC ATCAGGCAAA TGCTATGGA'r ATTTATGCCG AAGCCGTCTA CTACCAAAAA AAATCGACCC ATTAATGAAC AACTTTTGGG GTGTTTT'ATT ATGAAA ITGA TAAGAAAGCA AGGACAAAGC TTCAAACAGC
GTCTAAAGTC
CTGGTGTTTC
GAAACTAGAA
TGACTGTCCr GATCGATTTG TCTTTAACGG CTTTA7TCAT TTTCCGTTGT AAT'rTATTGT ATTTGA'rCTT TTTTGAAGGC TCAGATAGCG GTTTGTCTTC ACAAATCCTT CATAGTAACC ATCTGAATCT TTGAGATGAG GATAAACTCA AACTTTTTAG TAGTACACCT CTCCTTCTAA TCCTGTTCTT AT'TCA'T'r'r AAGCTCTTGT AATTTTTCTT AATAGGTTTT AACTTACCTA rGCTTATGTT T'rTCCTAAGA TTCTTCAGCT TGGTTTTTGT 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740
TAATGCTCCC
AAGCTTGTTT
CTCATCCAAA
AGCACCCATA
ATATGGCCTC
CATCATTCAT
TCTAATTCAA ACCATTGCAA CTCAGATTTC AGCTTTTCAG TAATGACTTG AAATTAGTGC TGAACTCGTT TCTGTATCCT CCAGCAAAAA ATAAACTCGT TCCTAGCAAG ACCGAACAAG
ATAAGTTCAA
ATAA.ATCCTG
GTACAGGCTG
CTCCTATTGC
AAAGAAAAAC GCTGCTTTCT CTCAAATTGA TATTACTGTA TATTTTG'rAT ATCAGAXATA AATTC'rTCA TCCCATCTCC GTTTGTATTC ACAAATCTTT CTAGTTATTC CCTTATCATT CCTAATTAAG GGAGATAACA TACAATAATT TTTAGTTAAA TGTATATCGA TGTTTTTTGT TTTTCTTAAT AAACGCAATA CAAAAAGAGC CTGTTACCAA GCTCrrTGTA CTCAATGAAA ATCAAACAGC AAATTAGGAA ACTAGCCACA GGTTGCTCAA AACACCGTTT TGAGGTTGCA GATAGAACTG ACGAAgTCAG TTGCAGATAG AACTGACGAA GTCAGTAACA TCTATACGGC TGAAGAGATr ?TCGAAGAGT ATTAGTCTAD TATTrCTTCT AmTGTGI-rC GGATATCATC CACACCA'rrT GGAGTATTTG CCTTTAGAGG CAAAGGTATr CAAGGTATCC AAATACTGGT CTCAAAACAC TGTTTGAGG AAGGCGACGC TGACGTGGTT CAGCGCGAAG GGCTGACAAG GTAAAAAGAT AGTr'rGATTT TGGTCAAGAG GATAGACATG 0 :0 .0* 0 0 0.
0 ATTTGTTCTT CTGTCATGCC CCATCCACAA TCGCCTTACG GCTrCTGCT'r CAGCTGCAGT GCGACCCGCT TACGTTGCGC GGTTCGACCT TGGTAATCAA GCTACTTGGT GTTGAACTTC GTTAAr'rG GAACAGAAGA GGACGTATGA GTTTATAGTA GCTACATTCA TCATAACGAA TGCAACAAGC GCAACTGAAT TGAATACCGC TATTAGCAAC TGCTGACGAA CCACATAAAC ATCAGAAAAA TCATGAAAAA TATAGCACAT TTAAAGAAGG ATTAGAGGTG AATTGTCCTA AACATTGGCT TCCTTGAGTT CGGTGATAGA CTC'rGCCAAT TTGTTGGGCA ATCCCCACAC CATGAAGGCG GTCTTTTTCT GACAATTTTA ATCTTGTCAG CTTCCGCCAA TTCTTGTGCT CGCATTGATT TCATTCATGG ATTGCTTAAC TTCTGCATCT GGTTTTCACG ATAATGTAGC CGTA.AGTGGT CA'flTCTTCT AAGGGCAATC TCATCTTTTT TCTCAAACAA TTCATCCAAG GCGAAGAGC-A TCTTCGATAT AAGATTTAAT CTGAGATTCT AGCATCTGTC ACGCTC'rGCT CGTTGACACG GTACTGAGTC CACATrGTCC TTGGTCTTAG TCTCAACCAC AATATCACTT CCGTGCTGCA ATCGAGTCAA TCCCAAAAGG CAAGCGAATA.
7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9244 CTrTTGGTAT TTCCCAAAC TGTACTCAGT GTGACTATCA TATTGCCATA ATGGAACCTC CTGTGCCGTT TTTACTGCGA TTGTCGTCCA ATCTCTTGCT GTTCAATAAT CGCCACCGAC CCAATAGGAG CACACAAACA CACAAGTATT TTTCTAGTAT TTTTTCCTGA AATGTCAATA AAAATAACTC TTTATAAAAG GTGCGCAGGC ATCAATCAAG CAATACCAAG TTCAGTCAGT GCAATCGTTT CTTCTAAGGT TGGCATAAA'r GGA? GCATTCTTAG AAAGGTATTC AAAGTCGAAA TCT TTCTTAGGAA TACCTACTGT CTCAGAAAGC TTCT CATTCTTGAT CTGATTTACC TTCTACATGA AGTC' GCTTCTGGTA CACGTTTAGC ATTTTCACGT TCTA
ACGG
INFORMATION FOR SEQ ID NO: 69: (i) SEQUENCE CHARACTERISTICS: LENGTH: 8898 base pairs TYPE: nucleic acid STRANDEDNESS: double
TTCCTG
IrTCTT
CAATCT
CCAAGG
rAACTG CAGCAATCGC ATAATCGGCA CTTTGGCAAC ATTGCGGAAA GTAGCAACAT GGCACAGCAC TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
GATCTGAACT
Tr'rCTGTCTA
TTTTGATTTG
TGCTAAGACA
TTGTAGGAAC
ACTAGAAAGC
AAATCCAGGT
CGCTCCGCCA
GACGTTGATT
ACCAAATTGA
ACCAAGTAGA
TGGGAAATCA
TTATCATCAT AAC?1'AATTT ACTTTTGGGG TGTAGTTCAG ATGTACTTGA TACCATCTGC AGAATTGTCA AAACATAAGG GGCAATTGAG AACCGATAAC ATAGCACCGA T'TGGATTCCA CCAACAATAG TTGTCACTGA ATTCCACCTA GAAAACCTGA CCCAAGGTAT CCGCTGCTTG GTCTTAAAGA GAATAAACCA CTAGTTGACT TGAAGAAGAT AAGCGTCCAA AAGTTTGACT CATAATAAAA ACACCCCAAA AGTTAGAT'TT TCATTGGACT GACGTTTTTT TGTATGCTTA TTTNGGTGCG ACTGCTTTTC CAAAGAAGGC TGCAATTTGA AGATAAACCG CTGGCACTCC AGCCAAACTT TGTGAAAGTC CAAAGAAGAG T'rTCCCA-AAG ATCATCGCAG CAAGGGCGAT GAAGTTAACT CAGATTGATT GCGCATAAAT AATAATAACC CCTAAATATC TCATCT'rGTA AGGATGTTCA CCGACAGAGC GGAC-ACGAAG AGCAAGGAAT GAGAAGGCAA TCGCCAGATA ATCACCAATC ACTGGGATAT TTGCCAAGAC TAGGTTGTCG GTTTGTCCTT TGTTATAAAG CGCCATCA-AG TTCAATACCG TACCGCTOAC TGCTGCGTGG ATGATAGAGA AAACACTACC TGGAGTTGCT GCTCCAAATT GTTCTGCAAA ACCCATAACC ATAATTCCTT CAAGGCCAAC ACCACCGATA CTTGTAAAGA TGAGAGGTGC GGGGAGCAAG GTTATAATAG ACATCTTTAC ACAAAGCGTT CGATAAGGTA ATGAACACTG CTGACAAGCT CAGATGGTAC CTGCGCCGCA CCAAATAGGA AGGCTGCAAA GAGTATACCA AACTTTAACT AAGAAAACAG CCAAGGCAGG AAdATGGTCT GCACGGAAAT GAACCGTCGC AACCAATCCT GCTACAAGCA AGGATAGCCA TTCAAGGTTA AAGACAACTC CAGAAAAGGC GTTTACCACA CCACCACGTT CAGAGAAAAC TGAGTAA.ATC AGCATAGAAG ACACCAAGAG TTACCTCCTT TAACTTGTTT TTTCGGTTTG ACAAAGAAGA TAATAGACGC TGTTACAATG TTCATACCAG GAGCCCCAAC TTGGAGAACG ATTGGTGAGT TGGCCGCAAG CAAACTAACC -GAACCTTGAA CATAGACGTT CTGGAAGGTT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 GCCAAGGCAC CTGAAATAAT TA'DTCTGAAG CATGTGGATT TTGAGCATGA ACCAAATAAC CGTGAGTTAC CAGTCAACTC
CATAGATAGG
AAGACCAACT
TGCAACGGCA
GCCATTCCGT TAAATCCGAT CCCAAACCTT CAACAGCTCC ATAATAGTCC GCTTGGCAGA GCACGGATTT CAAAACCAAG ATGATGGCAA AGAAAATACC
AGCTAATGAC
ACCAAGACCT
AATACCAGCA
AGTTGTTTTC
AATATTCATC
AGCCAACCAA GGTGTCTGAT AGGTTGCATT AGCCCCAACA 563 CGAATGGTCG AA'rCTGTACT TTGCATGAAG TCTTTAGGGA AAGCATGGAT AAAGGCATTC CCTACATACA AGACAATGTA GTTCATCATG ATGGTTACAA TAACCTCTGA CGTCCCTAGA TAGGCCCTAA GAATACCTGG AATCGCTCCG ACAATCCCAC ATGGTTGCTA GAATCATCAA GGGACGGGGC A'rATCTGGAT CTGAGAAMCC AACCTGCCAA AGCCTGACCA GGAAGTCCGA CTGGCAACGG CAAAACCAAG ACCAATCAAG ACCAGAGGAC CCAATCCCAC GCAGACTGCC AAAGGCTGTA TAGAACAATT TCATAACCGA AGATCCACAT GACAATGGCT CCGAG'rAAAA AAGGGAACCG AAATTTGTTG TAATT1"r~rA GACATCACTC ACCAGCCATC AAGACACCAA G~rCTTGTTT ATTGGTTGTT AATCTTACCA TCGTGGATAA CGGCAATACG G'DCTGAGACG CAGCAATCAA GGCAATcACG GCGACAGGGC AAACCAACCA CGTTAAAGAA ACCAGCTCGA CCATAGCACG GAAGATTT-CT CTTCGTAGCC CCAAATAGCA TrrCCTAGGAA TACAGAAATC T'TCTCCT'rC CCAAGTTTCC TCTGGTGATA CAATACCTTG AAAGCTGACA ACAACGACAG CCTTGCCATT ATACTCAATG GCACCGACAT CCAACCCACG ATCTCGATCA ATTTCACGAG CAATAATTGC TGCAGGAACT AATTCACTGG CAGCGCGAAC AGAAGTAATA TTTGAATAAT TCAAAATTCC TTGAAGGGCA ATAT'TrCAG ATATCATCAT TTCTGGAACG TGCCCAACAC TTAGTTCTGT ATCTCCTTTT AGCTCAATGC TACCAGATTC CAGTTCAGAC TGACCATTTC CATCAATCCC TTTAAAA'rCI ATCACGCTCT TCAATCAAGC AGTTGGCTGG CTAACGATAA TTTT'rGT'rA TTCCTCCTG
CATCCAATTC
GTTTGTGGAT
GGAGATCAGG
AGAGTGCAGC
TTTTAGCATA
TATAGTAGGT
GGTGACGGTC
CTACAATTGA
ATCAAACTCT
ATTTTTACTA
TTCCAAAATC
AATCTGACGT
AACCTTACGA
CGCAATAC CA
TCCATCAGCT
TG.TGGTTCTT
AAGCCA'rCAC
GGGTGCAAGC
1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 ATCCAAGGAC AGA'TTTTAA CAGCTGGAAC ACr-ACGGTTT GATAGACAAA ACCACTTCTT TTGGTTTAGA GGCTTGCTTC ACGTCCTACC ATCATTYCCG CCAAATCAGC ATTGGTAGCC AGACCTG'rAA TGGCTTGAAT ACAATCTCTC CAGCACGAAC TCATTGACCA CCAAATCTTT TCTGTTT'rAA AGGAAACAGA CCTGCAATTT CAACGGTTTC AATTGATTT~C CCACGACGGA TAACTGTAAC TTTGTGGGTA ATCAAGATAA TTGATTTTCC CAACTCATCA ATTTCTGATG GAGTCAAAAC AGCCCCCCGA TAAAGTGTTT TTAAAAT'rTC TGCTACCTTG GCAGAAGGGT CAACAGCTAA TTTGCTAGCT CCAGCGATAT CTAGCACACC ACGGTCAGAA ACTGCTCGAA TT'rCATCCAA TTCTTTGACA AGATTTTTCA TAATAGCCAT AGCCGTTGGT TCGTCAAAGA TAAGGATATC TACACGTTGT TGGGCTCCAA CTGAGATATC GCCATAACGT TCAGAAAGAG CCTTGATTTC ATTTTTAGTC AATTCACTAC CTAAAATGAT GTTTTCAGCC ACrGTGAAGG CTTCAACCAA CAAGCTAGCT GCTTTAGATG ACCACTAGrI' GGTTCAAGAA A'N'TTCTCCT AAAAGTGCAT GGCAACAAAT CCACCAAACA CATGTGCTCT TCCTTTCAGA AGCAAGCTTT ACTTAGACAA GC'NCCTAAG AAATGAC1"?C ATTTTAGCTT TTGCATCTTC AAGTCAACCC CTTTATCCTT CTr'rCTGCCT TGTTAGAAAT AGAACAAAGT TTGATTCTT'r CGATCAACAC CGATAACCCA GCCTCTGCAA AGACACCTGC
GGGAGTCGAG
GGCCTGCTAA
GAATTTCACC
CATAAAGTGC
ATTGACAACT
CATGrrCATT
TTI'TCGTAGG
TGGTGAACCA TCCCGATTCC TGACCGTTGA CCGCGATTTC AGCGTGGACT TACCAGCCCC 7rGCAAGTTGA TTTGTCGTT TCAATGACAT TTI'CGTGTGC AACTTGCTAG TTTGTCTAGT TAAAAAAGCG GCCCTTGGCC CCTTGGTAAT ATCACGCATC GTCTTATTTT ATTTCAATAA AATGACTTTG TCTCAACTCT CATCCATTAT TrCAGGAA CTTTTACGCT GACAGCTTTT TTACCTTCT'r CTGAAAGGTT CAATGAGTAA ACGATCACT'r GACCGCCAGG ATCTTTTACA GTTGTACCAA CTTGTTTCA-A GCCATCTTTA GAAGTGTATT TACCTTCTGC AACTTTTTCA ?TTTrCAGGAC GGCTTTCGTT ACCTGTACCA CCAGCTACTT GGTAAACAAT
TCCATCAAGG
TGTTACTGCCC
GAATTCTCCT
AGTAGA'rACA
TTCTTGGTCA
GAGAGA'N'TT
ATCTGCACCG
GCTGCGTATT
TAGTCAACTT
TCAAAACGAG
GTTGTTTTTG
CTCGCAACAT
TGTTCTTTTG
TTGTAACTTC
AAGTAAGTGA
GTGCGGCTGC AA'rTGrTTTA CCTTTAGCCG CATCACCAAA TGAACCAGCG GGACTTTGAT AGATGGGTCT ACTGACGCAA CACCAGCCTT GAATCCTGCT AGATAACTTC AGATrCGATA CCACCTACAA AACCAACT'rG TTI'rG;TCTTA CTGCAGCCAC ACCTGCAAGg TAACCTGACT CATTATCAGC GAAAGTrACG TCTTTTGGTC 'rrrAATCACA TCATCAATCA AGACATAGTT CAAGTCAGTG CTGCATCTTT AACTGCATTA TTAAGGGCAA AACCAACACC GAAGATTAGG CAGCCGCTTG TrGCA.AGTrG TTAGCGTAGT CAGCrTCACT TGTTGATTGG AACCGTTATC TTTTGAAAGA TTGTGTTCTr TACCCCA.AGC CTGCAA.ACCT 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 TCCCAAGCTG ATTGGTTGAA TGATTTGTCA TCAACACCAC CAGTATCAGT GCTTTTGTCT TCACATCAGA AGATGAAGCT GCGTTACGAG AAGAGCGGT'r GCAAGTCCAA CrGCTGCCAC TGCAACTAGG CCAAGACCTA GCCATTG'rTr ACTGAACCTC CTAAATAAGA TGTGCAACGA TGT'rGCAAG'r ATGGATTGGT GACCGTGCCA CTCAGAGAGC GACTCAGACT AGTTTAAGTC TGTAAA.AGAG
GACGATTGCT
ACCACATGCA
CTTGTTCATT
TGGCCACAAG
TATGGAAGTA
ATTCCCCGAC CGTCATCTCG ACCGTCGATT TATCTTTTGC GACTAAGGTC ACTTTTAGAT CTTGTTCAAA AAATTCAGCC ATCACTTGGC GACAAGCACC ACATGGCGAG ATCGGTTT CAGTTTGACC ATAGACAATC AATTCTGAAA ATTCTCTTTG GCCTTCAGAT ATAGCCTTAA 565 AAATAGCTGT TCTCTCACCC CAATTGGTCA AAGGATAGCT AGCAVMrCA ATATTcAcTC 5220 CCGTGTAAAC ACTTCCGTCT TTAGCTACTA AAACTGCTCC GATAGGAAAG TGAGAATAGG 5280 GGACATAGGC ATGMNGCTG G?1'TCAATrG CCAGTTCAAT CAACTCAGTA GTCGCCATCT 5340 GCCAATTCTC CTTTTAAAAT AGCTACCCCA GCTGACGTTC CGATACGGGT CGCACCTGCT 5400 TCG.ACAAAGG CAAGAGCATC TGCATAAGAA CGACrCCAC CGGCGGCCTT GACACCCATA 5460 TCAGATCCAA CTGTTI'CACG CATTAATGTA ACATCTGCTA TCGTAGCACC ACCAGT'rGAA 5520 AAGCCAGTAG ATGTTTTGAC AAAGTCAGCC CCAGCCTTTT GGGCCAAT'rc GCAAACAACA 5580 ACTTT'rTCTT GGTC1'GTCAG AAGGCAAGCT TCAAMAATGA CTTTCACTAA CTTATCACCA 5640 CTTGCTrCCA CTACTGCGCG AATATCTGAC TCAACCAAGG CTA.AATTACC TGATTTGAGA 5700 GCTCCAACAT TGATCACCAT ATCAATCTCA TC1'GCACCAT TTTGCATAGC TTCrTT'TCTC 5760 TCAAATGCTT TCACGGCTGA AGTTGTTGCT CCCAAAGGGA AACCTACTAC TGTGCAAACC 5820 TTAACATCTG TGCCTTCAAG TCCTTTTTTA GCATGTTCAA CCCAGGTCGG ATTAACGCAA 5880 =ACAC'rGGCAA AGTCA'rACTC TCTAGCCTCA GACAACAA6AC TATCAA'rTTG TTTrTTCTTT 5940 *GCATCTTGTT TTAAAAGCGT ATGATC'rATA TATTTATTTA ATTTCAPTTC CGTTTTCCCT 6000 CCATTTAGGA GATGATTTCT ACAATTTCAC GGA'rTTTTT CACTTCATCA CTTATTTTAA 6060 *CACA~1TTTTG GAAATCTGTA ACTAGTTGAG GTGGAATTTT TTCA'flTGTG TATACTmG 6120 ***CAACAATTTc ACCCTT'N'GA ACGGA GTCTC CAATCTTCTT 'rTCAAAAACA A'rTCCTGTTT 6180 CATAGTCCAA GGCATCAGAC TTAACTGCAC GACCAGCACC CAGCCTCATG GCATAAAGAC 6240 6CAAAGTCCAT AGCTGGAAGA GCTGAAATGA CACCCGTTTC CTGAGCAGGG ATTTCCACCA 6300 66sCATGAGCTAC ATTTACAGGA CGATAGAGGT CT'rCCAAGTC TCCACCTrGG GCTTGCACCA 6360 TTTCCTCAAA CI'AGCCAGT GCTTGACCAT TCTCAAGATG TTGGTGAACT TCT'rCAACAG 6420 TTTTGTTAAC ATTTGCCAAA CCAACCATAA TTTGAGCCAA TTCACAAATA AAGTGGGTAA 6480 TATCCTGACG TCCTTGACCT TGCAAAATCT CCAATCCT'rC AAGGATTTCC AGACGATTTC 6540 CAATCGCTCG TCCCAAAGGC TGGCTCATAT CCGTAATCAC TOCTACTGTC TTCCGTCCAA 6600 CAACC'N'ACC AAGATCTACC ATAGTTTGAG CCAACTCACG CGCCTCATCA ACCGTCTTCA 
6660 I..TGAACGCCACC CTCACCGACA GTCACGTCTA GCAAAATAGC ATCCGCCCCT GCCGCAATTT 6720 TCTGCCA CCCAATCGCAATCAAAG GATGTT GACAGTTGCG TATC 68 GAAGGGCATA GAGAAGCTTA TCTGCTT'rGA CCAGCTGGTC TGATTGCCCA ATGACAGATA 6840 CTCCAATATC CTGAACCTGA CGAATAAAAT CCTCTTGACT ACGTTCTACT TGATAGCCCT 6900 TAATGGACTC CAATTTATCA AT'IGTTCCGC TTCCTACAGG CACACCCAAG CGACACCACC AGTAGAATGC
CTTGCCCAGT
TAAAATAAAC
AGCCTTC'rAT
GGATTAAATC
TTTAAGGATT
T'DGTTTTTTC
CTCGATGATT
GTCAAATCGC
CTTAACCATA
AGCCATAGCA
CAGCCATTCA
AACTGCTCTC
TCACAATTGC
TGGATGACGA
TCATGAACGA
CCTTGAACTC
CTAGCAACAA
TTTCAACT
TTCATCGTTA
AAGGCAGACA
ATTTCACTTG;
ATTCTTTCAC
CAAACACATC
TG4GTCAAATC
CTTGCTTGCC
TTGCATAAAT
CTGTATGGCC
GAGGAGCTAA
'ICACACCATC
AATCACAGAT
TCTGATAATC
AAGTCAGTTC
ACTTCTAAGG
TTCCATCTTA
TCCACCAATT
AATCAAGGTT ACCTTATCGC AATGGCTGAC AGGTCAAACT TTCTCGAGTC GTCATTCCTT AGGAACAGT'r CCTGATACAT TTGACCGTCT CGTrTTTTTT ATATAGTATC CCTTGTCTr'r GACTTGGCAC TTGGAGCTCC 'rCCAAGAAAT CTTTACTTTT AAGACCACGA CCACTCATrr 6960 CCCACGGATA GGAGGATTGG AAATGACATG ATTAGATTGA AATATCGTCG CINTTGCATT CAGGGCACGA GTGTTAATAT CAACCATGGT ATTTTTTTCA GCAmTCrCT GAGCTAAATC CGCCTGAACT CCGTAAACCT TGACCAAGGA CAAACC'rAAT ATCTAGGACT GTCTCTCCTT GGTTGACATC CAGACACTTG GTCAACCATT TTCTTGCTAA AAACACCCGC ATCTG'TCAAA CAAGTCCACT CTCAACTCAT GAATGTCGTG AGCAGCGTCA TTTACTCATG ACACTATTTT ACCATAATTT GACTCAAATT TAATAAAACG AAAAAGACCG AAGAAAGCAA GTCACGAAGC AACACTTATA AATAATAAAC CATTTAGAAC AAGTTTATCA TCTATAATCA GGCAGATTAT ATTATCAACA AAATTCCTAA CAAAATGTTT AGGAT'rTCCA AACTTGTCCA AAGTCGTATC CCCTCTGTCG CAATCCGTAG GACTAAAAAG CATCGr'rT TAGTAAGAAA GCAATTAAGA CATGTTCCAT CAAAAAAGTA AAACCGTAAT CATTGGT'rCC TTTTATAAAA GGTAAAGCAA
TATAAATATC
TATTTCTATT
AGATAAAAGC
AAATCT'CTA
CAATAACTAC
ACGAACAAA'r
AGGTTTCCAC
AACTTAAAAT
TATAAGTTGT
TAT'rTCA'rCA
CCGCTTTATT
GGACCATAAC CACAGCCTAC AGCAAGAGTT CACTTCCAAA AAAGTCATTT TTTCTCCCAA GGAT'rTTCTG CATAGTACAT GTAAATCGTr TACAAATTGA CATTTTCTTC AATCI'CT'TC ACAGTCCAGA TAAAAACAAA GCT'rAACCr'r AAAATACTT CCAACTGATA CGTTTATGTC GTGACATGTG GAAGAAATAA CCGCAGCAAT CCATTTCGTC AAAGACAGCT GTTACAATAG AAAGCATCTA CCATTATCTG AAAACAGAG'r TCCAATATGT GAACTAAAAA GCCCCCTTTA ATAAATTGAT CACTTATACT TTTTTAGCT'r TTTATGGTAT TTTTTACATT T1'GAAAAAAT 7020 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 AACGTTTTAA GATTTTCATA TAAGCTTA'rA AATCAGTAGA ATATACCATT GACTTACCAC GATAGACAAA ATATCTAGGG
GTACACCTCC
ATCTATC1'CC
ATTCAAGAAA
GAAAACAAAT GACCAACGAA 567 CAGCCGCCAG ACTTGGCAAT CTTrACATCG AAAGACAACA CCTCCTTTGA CAGAAGAAGA ATTGGAATCT ATCAAGAGTT TTAATGACCA AATCAGTCTC CAAGACGTTA CAGATATCTA TCTCCCCTTG GCTCATTTGA TTCAGATTTA CAAGCGAACT AAGGAAGATT TAGCCT'rTTC AAAAGGAATT TTCCTCCA INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 13188 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8760 8820 8880 8898 TATCTTAACG aGGATTGGGT TTAAATCATT TTGAAAATAA GAGTTTGAAC TAAAGAGCAA GCAACGATGA TGAAGCGTGT ACAAAAAATT TTGAGGTAAA TCGATTATTC AAAGAAAACC AACATTGATT TATCACAAGT GTTAGAGATC CAGAAAAAGT TCTCTGTCAT CAAGAGGTTG TCAGAATTTA CCCTTGAAGA AAGAACAATC ATATCAAAGA ATAATAGAAT TTAAAGGTAG CATCAGAAGA GGTGTATGAT GTTACGATAA CCTAAGTTTA TATCTGAAAA TATGAAATTT GGTGGGAAGT AGGAGATAAA TAATAGGTGA ATTATTGGAA AAACTTTATC ACGTTTAGTA TAAGAGAAAA GTATAAGAGT
TCGGCCTGTA
AAAAGGAAAT
GCAGGCAGAT
TAACTTTCTT
ACTTGCACCA
ACCTTCTAAA
TACAAAAGAA
GACAATAGAA
TATGTATCGT
AAAGATTAGG
GCAGATTTTT
TT'rFCATCAA
AATAATCCTA
GTCCTTCCGA
ACTGCTAGAC
GGAAGGATAT
TTTAAGCAAG
ATTCTAAATT
TTTGAAAGTG
CAACAGCTTC
TTATCGTCAG TCTTATTGCC CTAATTGTGG
ACTGTAATCA
CAATCAATGA
ATTTCTTTTT
AGCAATTTGT
GAGCAGGTTG
TTCTTGTGCA
GTTTATTTTT
GTATAGATAA
ACCTAAAAAA
AAATATTAAG
GAACAATCCC
TTGTAGTGAG
TGGTGCTTAT
TTTAACTTAC
TACACCGAAA
GATTGGTTGT
AGATGGACAA
AAGGAAGAGC
GATAGAGGGT
TATCTTTGTT
AGACAAAGAA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 ~~L~CThJP ('tt~AAAT'PA'T' (~AAAAC(~KAA TTCTTAAAAG TCATCTGGCC TGATTATGAA ACTGAAAGCC ATCGTCTGTA CCTTATCAGA TCCCGATTGT GGTGACGAAA AACAACTAGC TTTGATGAAG TTGCCAGAGT GGCTACATAG CTCCTATCAT AGCAATCTAA AACTGAAAAA GTATACAGTA AGTATAGAGG CTGAAAATCC AGATGAAGCC
GTGAGATGGT
GAAAAATATG
AGATTATTGT
GAAATTACAG
GAACGACTTG
TGTGAAATTG TTCTTGATGC AGATGATTTT CAGGACTATG 568 ACACTAGGAT ATATGAATAG GTAGATGTTT TTAN'TTGTC AACAAAAAAG ACGCTCGCAC CTCTTI'TCr TATTTCTTTT TATGAT'TPAA TACGGCATTG AGGACAATAG CGAGTAGGCT GGCTACGACG ATTCCG~TTTG AGAAGAACAT TTGGAAGGCT GTCGGCATGC TGACAAAGAG ATTACTGTTG TTGAGACCGA CACCTGCAGC GATTGAAACA GCTGCGATAA GGAAG~rGTG TTCATTGTTA GCAAAGTCAA CACGGGCGAG GATTTGCATC CCTTGAATTG ATACAAAAcc AAACATTACC AGCATGGCAC CACCGAGGAC GGAGCTTGGA ATGATTTGGG CAAGGGCGCC AAACTTAGGA AGCAGTCCAA GGAGAACCAG GAAACCAGCT GCGTAGTAGA TTGGCAGGCG TTTTTGATG CCTGACAATT TAACCAAACC AACGI'rTTGT GAAAATCCGG TGTAAGGGAA GGTGTTAAAG ATTCCTCCGA GAAGTACGGC CAAACCTTCT GCGCGGTATC CGTTGCGAAG GCGCGTGCTG TCGATTGGAT CCTTTGTGAT ATCAGACAAG GCCAGATAAA CACCAGTTGA CTCAACCATA GACACCGTTG CGATGATACA CATCATGACA ATAGATGAGA TrrTCAAAGGT 'rGGCATCCCA AAGTAGAGTG GAGTTGGGAC ATGGACAAGT GAAGTCCACC AAGCCCATAG TAGCAGCAAT GGCAGTTCCA AGAGATAGAC TTGATAAATC CTTTGGTAAA GATGTTGATC GGAGCTACCG CAACAGGAGA ACAACCAGAC CAATCAAAAT AAGAGGATAA TCAGAACAGT AATAGCTGCA AGCAAGAGAC TTTGACCAGT TGGCTCTGGA AC!CTTATTTr CCATATT'rCC AATAGCGACA GGGATCAAGG TTAAACCAAT TGGGAAGAGA.TTGGCTACTT TTGAGAAGAT TGCGATAAGG GCACCAAACA TAGCGCCACT AGCGACCGAC TGGAATGCAA CTCCAAGAAC GAGTTGGAGT TGGAGGAAGG TTGCCACCCC GGTCAACTGC TCAGCTGAAT AGCCAAGGGC 1200 1260 '1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 CGTGGTAATA ACAGATCCTG T'rACGATAGA GCCTGAAACA AGAACCACGt AAATCCCAGA ACCATGGCTT TGCCCAATCA TAATCAAGGG GACTGGGAGT CCAATCCCAA AGTAT -rGTT ACACATGAAG ATATCTGTAG AAATCAGGTA TGTCGCAATC ATGATGGGAA CCAGGATAGA GCCAAGAACG GCTGCTTGCG AGTGT'rTTrC AATACGACTT GACCATTTTC AAAACAATCC GCTT'TTTCAA GCAAATCACG ACCATCTTGG TCCTGAGTAC ATGGCTAGTA TTGAGTTTGC ATTAGAGATC AAACGAGCAA G'rGA'rAGGAC AAGGAT'T'CT CAATCACGAT -TGAATCAAGC CTTTAGCAGC TCCTCTGGTG AGAGGAA'TTT GAGTAGACTT GAGCAGTTAA AAAATCATGG GAACGTTTAA
AGTGCTGCAA
TGCCTCCTTA
AGGGTAGCCT
ACCGATAGCT
TTGGCC-MTTA
TGGACTGTGG
GCAAGGAAAT
CACCGGCCTC
CGTCGATAAT
TGGTCACCTG
TGTTCTTAGC
TTCGATGATT
CAA.AACCTTG
CTTGGTAAAG
TTTTTGGCG
TTCAGCGATA GAAACGGTGC GATGCCTTCG TTCATGGTGA GGCTTCAGCT GTAAAAACGG CTGGGGCAAT ACCCGACGCT TCAATGGTTA CGACCTTGGT AATGCCAGTA GTAGCAAATT TTTCCGCAAA AACCTTACCA ATCTCTCGCA TCAAGCTAAA GTCAACTTGG TTATCACCCA AGATATGCCC ATCCTTGAGG CCTAAAGTCT AAAAGTTAAT TTACTTGrrG TAATACTATA TA'N'TGATAA AACTATTACG TTGTAGTGGT ATCATAGACA ATAATCTTGT AATAGTTCGG GGAACTATTT TAGCCTAAGC TTAGCGATGA AATTCCCTGG A'rTCCTGAAA GCACTAATTA AGGGAAATAT TAAAAAAAGA TAGACATGGC AAAAACCGCC ATATCTCACT ACCTTGTAGA AACGTCGTGC CAAT'rCACGA AATAGGCTTG AGCCAATGTT TT'rATTTTAC TTAGTGTTTr GGTTTAAAAA ACGAACAAAA TGGGTTAAAA AGGAATCTAC CTTGAGGATG ATGCGCTCTT CTAATAA'rTT CATAAGACCT TTTAAATATT TCTATAGTGA TCCCTTTTGC AGCGAAGCGA GTCTTATCAA ATATTTCCCG TATTGTCTAT GACGGGATTT TTGAGAGTAA CTAGAAATGA AAGAGCTAGG GGCTCAAAAA TTATTCACAG GATAATTTCA CCTCCCGTCC
CCTAC=AAT
GCTGACTTAC
CATAAACAAG
ACTAAATAAC
AGAAGAGAGG
CTCTAAGTAA GTCCCCTAAA 'rTATTGTTAG GTGTTCCGGC TAAAACGATA TITCAATTTTA TTTAGAAATC AACTATTTTG GTGAACAAAA ACTCCATTGT AGGAAGCTAT CCACAACCTC AGTATCATAT ACCTACGGTA TTAGAAGATT TTTCCATCAT CCTTrTrTCTA ACTTTAAAGA AAATAGCCAT CTGGATCCAA
AAGCTAACAG TTATACTAAA TGAAAATCAA AGAGCAAACT AAAACACTGT T'rTGAGGTTG TGGATAGAAT TGACAGAGCC AGGCGACGTT GACGTGGCflT GAAGAGATTT 'rCGAAGAGTA 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620
AAAAGGCATA CTATCAAGCT CT'rTTCCCAA TrTTTTATTAT AACTGCAAAT T'rATGAGGAT GGGACGATAA ATAGGATAGT TATGCCAAAG GAGAGATTGA CTTTGTTCCC TCCTCTAACA GGACGTTGGT ATTCAATCCT ATATTCGATA CAAGCAACTC AGTCAATTTC CAAGACAATC ATTTAGTGAC CTTGTCAGCG TATTGTTGAT TTTAGAAGTT CGCCATGAAC ATGGCGGAAG CACGTTCCTC GTCAGTAAAT AGATATAGGG ATCACTGACA CGAAACTTTC TTTTGGTCAA 'rTGCCTTCAT CACTCTTTAA TAGAGTTTTG AAACATCCTT CTCCACGACC AAAGGGATAG GTC-AGTTCAG CTAGTTGATC TTAGTTGACA CTCTTCAAGA GAAAGAGAAA GTTTTCTTCT AAAACCCAGT AAACCACAGT AGAAGGACCG GGACTGTTCG GGGAATGACC GCATTGTAGT CCATATAGAA AATCCT'rACA GGTGTATGGT CTTGGCGAGC ACCTG.AGTCA ATCATATCAG ATACGGTTAC.TTGTGAGCCA GTAGTCGATT' CTCCAGCCTG TTGCTGCGT'r GTGCCCACCA AGTGTAGCGT TCAGGAACAT GTGTCTGTAA ATCCAGT'rGC CAAAAGGTTG GTAAATCCAG CCAGGTGAAC GGCGGT'rGCT AGCAGGATTT GCAAGGTCGA TN'AGACACC TGACAATATG TCTACTCGCT AAATCTTAAA TT'rCATTGTG GGCTACGT'rG TAGTCACCGG TCGCAAGGAC TGGTTTTTCT TTGTCTAGTT CAGCCAAATA CTCAGCATAT TTGGCATCCC CGTCACCAGC GI'TGGAGTG TAAACTTGGG TGATACGACC TTCCAAGTCC ATGTAGAAG CTGTAAGTTC TTTCTTATAA AGGAACATGG GGGAAGAGCG CCACGTGTTT TCGTAGCCTG TCTTTGTAGG TCCTr'rGGCA GAAAGC=TGG CAGCGACCAA GGTTTGTAGG A =CTTrGGG GGGCAGCC'rT TAGGGAATCA ATATTCCATG TCAGATTATA GATTTTATTA TACCAAAAAA 570 AGAC3-rGGCG 'rTCTPCCAAG CGTTTGAGAC TTAdGAAAAA TGCATCAAAT TCTAGAGTGA GGGCACCGA'r TCTGGGAAG CTGATAGTAG TICCAGCATA CCTTTACGG GCAGGCTCTT GGAAGAGTTC TTCTAAAATT' 'rCCACGTGTT TTTCTTGGAT AGCAATGATA TCAGCAT7'rT ACAAT'rTGGC ACGAGCTGAG TCACTAGTTA AGATAAGTTT CATAAAGTTA CCT'TTCAT AGATCTATTT CCCCAACGTA TGGTTTGAAA 0 0 0 0 0 000* 0 AATTACTCTC TTTCGTTTAT AATTCTAC'rC TTATGACTAT GCTTTGGGAT TGTTCTATTA AAAAGGGAAG CGAATTTCGT CTTTTGGTCA GCATCACGAC TCACTTCATr TCATCGAGGT GTTAATACTT CCACAAACAC GCCCTAAATG GAAGTGAGCC GACGCAATTG AACTGGTGGA AAC'rGGTCr'r GACTCTCAAA ATGGTAAGCC AACTCTCCTT AATTAAGAAT GATTftTATGA AAGGGAGTGA AAATACATGA GTACTCAGCC AAATCGGTCA GCAAAATGGT ATCATGGTTG GCTGTGACAG TTrTTTTTTGC TTTCAAGGCA TACCATAATA GAGTTGGTCA TGATTTCAGA TCTGGCCTTA 'N'TAGCTCTG
TTATCAAAAC
TGTTTCCAAA
AGATGGCGCA
AGACAAGTAC
'rGTGAACAAA
TAGCTCAACA
ATCAAAAATG
AATCAAGTTT CTAACAATAA ATTTCAAACT GATTTGTGAG TAGACAAGTC AGAAGTCTAT CTTATCAAGG TIGGGAGATCG CTATTATCGT CTGTTAGAGA AAGTCGAATT GTATAAGACA TGACACPAA TTATATCGAA ATTTTAATCA ACAATGT'rCA CTTTGTGAAA CGTTTGATTG GGAATATTGA CCCAGAAGCC TGTCGTTCAG AACTTCGTAG CCAAGGGATT TTCCAGATGA ATGGGCAACT CATCGTTGTG CAAATGGGAG ACGGTGTGAT TCA-AGTAGAT GTCTTGGAAT ATAACCTCAG TAAACAAGGG CATGACAATG 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480
TTGGTTTGTC
AGCAAGTCAA
ATGAAAATCC
CGAT'rGGTCG
TAGCCAATAT
*AAGAAAAACC
TGCATCGGAT GTATCCCTCA ACGAGCTGTG CAAGAGCAA.A TAAGTATCCA GTTGTGACTG TAGCGAAGAG 'rGG'rTGCTTG CTTTATTGCT GAATATGACA TGGGGTCTTG TACTCTTCGA
AGGGTGCTGT
AAA'TCTrCT
TCTATCTACA
TTGATTTTC
TCCATTATTC
TGCCGTATGT AGGTTACTGA CT'rCGTCAGT GCAGCCTGCG GCTAGT'TCC TAGTTTGCTC ITTCCATTTGC AATCAGAAAG GGATTTTATG AAG'rTAGAAA AACGCCGTTA TCTAGTCGGA TACAGTCGTA ACTTATGAAT AAACCGCGTC AACGTCGCCT ACCTCAAAGC AGTGCTTTCA ATTGAGTATT GGCCTCAGGT AAAAACTTTG GTGGTTTTTC ATTGTGGCCC 'rGATCr'rGGT TTCCGTCCTC AATCTCATTC CTCCTATGGT TATGGGGCGG GTCATTGATG CCATCACATC GGGGCAATTA ACCCAGCAGG ACCTCCTTCT TAGCCTATTT TACTTGCTAC TTGCAGCCTT TGGTATGTAC TA'rTrGCGCTr ATGTGTGGCG TATGTATATC CTTGGGACCT CT'rATT~GCI-r GGGACAGATC ATGCGCTCC GCCTGTTTAA GCATTTCACA AAAATG'rCGT CAGCCTNTTA TCAAACCTAT CGGACGGGTG ATCTGATGGC ACACGCALACC AATGATATCA ATGCCTTGAC GGTGGCGGTG TCATGTCTGC GGTGGATGCC TCTATCACGG CTCTGGTGAC ATGCTCTTTA GCATCTCATG GCAGATGACT CTTGTTGCCA TTCCCCCCT GCCTATACGA CTAGTCGCCT AGGGAGAAAG ACTCATAAGG CCTTTGGCGA GCTTTTTCTG AACTCAA'rAA CAAGCTACAG GAGTCCGTAT CAGGTATCAA TCGT'rTAGCA TTTGT'rGACC
ACCTTTCATG
ATCCCAAGCT
AGTGACCAAG
ATTAACCTTC
TCTCTTGTTT
GGAAGGGCAG
CTGCC'CTTT
CCAGCGGATT
TCTT'rCGGTT ATCAGGCAGA CCAGTTGAAG CAAAAGAACC TGCAAACCAT GAAATATGAT T=T'TCAGG CAGTCAATGA AGTCTCTTTG ACCCTATGGT GTTCGTTCGT CCTATGTTTT ATTACAGTTG GGAATCTAGT CTGGCCATCG GTTTCCTCTT GAAAATCTTT TGTCTCACGA GAAAATGGGC GTTTGGAGTA ACGGATATTC ACTTPAGTTT TCTGGGAAAA CGTCCTTAAT ATTTATCTAA ACGGTCACGA GGCTATGTTC CTCAGGACCA GGCAATCCTA ACTTGCCCCT CAAGATATTG TAGACATGCC CTTTCTGGTG GTCAAAAGCA AACGCT'rTTG GTTGGCTCCT CACCTTATC AGCTATTTGG TAATACTACT CAGCGAGGGA ATCTCCTGTA CAAGACCCTG TGCCATTGAC AGCTTTGCTY GGCAAAAGGG CAAACACTGG CAAGCTCCTC TTGCGTGALAT TATTCGGGAC TATCC'rCTGA
TGATGGTTCA
ATATGC'rGGT AGGT'rTCTTA AGTTTCCTCT GGATGGTATT TTGAAAATGA- GGAAACACTG GCTTGGTTGG GCAGACAGGC ACGATGTGGA TAAGGGTCC CAGACCTTCG CAGTC'rCATG 6540 .6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 GTTTCTTTTD GCGACTTCAA 'rCCTAGACAA 'rATCCGCTTT TTCAGCGGTC GAGGAAGCTA CTAAGCTAGC CCGGGTTTAC TCAAGGATTT GATACGCTGA TTGGTGAAAA AGGAGTCACT ACGGTTGGCT ATGAGrCGGG CTATGATT'rT AGACCCTGAT ATCTTGATTT TGGATGA'N'C CTTATCCGCCC GTAGATGCCA AGACAGAGTA 'rGCGAT'rATC GACAACCTCA AGGAGATGCG AAAGGACAAG ACAACCA'rTA TCACTGCCCA TCGCCTCAGT GCTGTTGTCC ATGCAGATTT TATTTTAGTT CTACAAAATG GTCAAATTAT CGAACGAGC ACGCACGAAG ACT'rGCTAGC 'rTTGGATGGC TGGTATGCCC AAACCTACCA GTCTCAGCAG TTGGAAATGA AAGGAGAAGA AGATGCAGAA TAAACAAGAA CAATGGACTG TATTGAAGCG CTTGATGTCT TATCTCAAGC CTTA'rCGACT CCTGACCTT TTGGCACTCA GTTTCTCCT AGCGACGACG GTCATTAAAA GTGTrCATACC TCTCAGCAAT CTTAACCAAC TAGCCGTTAC 572 CCTCGTGGCT TCCCACTTTA TCGACCAGTA CGT-NrTGCTG GTCrACTATG GTCTCTACAT TCTTCTCTTT GCGCGCGTGT CTTACAGTAT CAATATGGAG AAACTGGGCA TGTCTTACTT CCTACAAACT GTAGTTCAGT TGTTAGGGA'r ATTCGTCGGG TGACAAGACG CCAGCAGGTT A'rGTCGGCAA ATGCCTr'rGC CTATCGTC TCGT'rTGACC AACGATACCG 0 0 0 000 0 0* 00 0 0 0 0 6 00 ~0 0 0 *000 0 0000 00 00 0 0 0 0000 0000 0 0**0 00 0* 0 0 0 TGATATGTTT TCTGGGATTT TATCCAGCTT TATCTCAGCA CC'rTTATACC ATGTTGGTGC 'rGGATTTTCG TTTGACGGCT TTTGATTTTC CTTITrGGTCA ATCTCTATCG AAAAAAGTCA CAGAAGTC'rC TGTCAGATA TCAATAGTAA GCTGGCAGAG TAT'rCAGGCC TTAATCAAG AGAAGCGCC'r GCAGGCAGAA ACACTTGGTC 'rACGCCAACC GTTCTGTAGC CTTGGATGCC GAGTTTGCTG AAACTTCTAG GCTATGCAGT CTTGATGGCC TTCTATCGGG ATAACGGTCG GGACCATGTA TGCCTr'rATC TGACCCCTTG ATTGAGG'rGA CGCAAAACTT. TTCAACTCTG AGGTCGTGTC TTTGCCCTGA TAGACGAGAG GACCTATGAA AGCCAAAGTC CAAGAAGGCA ATATCCGTTT TGAACATGTC ACATCCCATT CTGGATGACA TTrCTTTCTC TGTTAATAAG AGGTCATACA GGTTCAGGGA AATCGTCTA'r TATCAATGTC CCAGTCAGGG AGAGTTCTCT TGGATGATGT CCATATCAGG GAGAAAAAAC ATCGGTTTGG TCTTGCAGGA ACCCTTCCTC GrTTTTATCT TTAGTC1'TGC
GTGAAAATCA
AATATCGAGG
TTTGATGAAA
CTCTTTTTGA
TACTTTGGCT
CAGTACATCA
CAAACGGCTA
CC'rCTCAAG TG -rTCTCAT
AGACGATTAG
TT~CTGACAAC
TCTTTCTTCC
TCGAGAAAAC
GAATCAGGAT
TCAACCAAGA
GACCTGCCAT
ACCGTGGTTT
ACCGCCTTTT
TGGTTTCTGC
AAAATGGGCA
ATGACGGTAA
8280 8340 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 GGTGAAACCA TTGCCTTTG'r CTCATCCGCT TTTATGAATT GATTTCAGTC AAGAAGAGCT TATCATGGAA CTATTAAGTC CAGGCTGCGG CAGCCTr'rGT CAATATCGCC ATGTACCAAG GGA'rGCAGAT TCCTTTATTC TGGTTCGAGC TTCTCTACTG CCAGCCTAAA ATCCTGATrT CTTGGTTCAA GCTTCTCTGG CCGCCTTTCT ACTATTCAAG CGAGAGTGGA ACCCATGAGG TTTGCAGGCA GGGGCCATGG ATCTGCAATC TCAAAGCTG'r TTACAGATTr CTTTCACCGC AAACCAGTGA TGAGCAGGTT AAGAACTTCC TCAGGGGTAC GACTCCCCTG TTTCCGAGCG GGCAACGCCA GCTTCTTGCC TrTGCTAGAA CAGTCGCCAG TGGATGAAGC GACAGCCAAT ATTGACTCTG AAACAGAAAG CGAAGATGAG ACAGGGCCGA ACAACTATTG CTATCGC'rCA
ATGCCAACTG
AACTCTTGGC
CCGATACTCT
AC1-rTGATTT Cl'IrTTCCATT CATCTATGTC TTGGATAAGG GACGCATTAT TCTGGGAGGA ACCTATCACA AGATGTATAG 'rrCAAAATCT CTTTAAACCA TGTCAGCTTT TCATTGAGTA CTAGAAGGAA ATCCT~TCAAA 'rTGTGGTATA ATGAAAAATG TTGACAAATA
GTATAATAAA
AATCAAGCGA
TCAAGCGTTG
TTCTCATCTT
AGCGTGTGAC
573 AACAAAGGAG AACAGCATGC TGAAATGGGA GGTrGAG'rCT TACTACCAGC TTGTCT'rAA CTTGGACTGG GrrGGCCT TGGTCTTACT GAGC-ATTrrcG ATCAAGTTGG ATAGCAAAGG CCAGTACAAC CGTCGGTTCA AGATTTGGAA ATGCGGATAA AAAAGGAAGT CTGGTGACTT C IGCTAACGA GAAATT'rCAT CCGACGTGTC CGTTTGGACG AACTGCCTCA GTGAGATGTC CP'rTCTCGGT ACACGACC'rG AAGTGCCACG CTGAAATGAT GGCAACCTTG CTCTTGCAAG CAGGGATTAC ACAAGGATGA GGACACAA)wr ATCAGTCAAA TGACGGAGAA
5505 00 55 *6 ~'5 0 000* CCTATGTGGA GCATGTTCTT TTAGTTTICTT TGGGGACATC GTAGTCATAA GAAAATGAGT CATTTTCACC GCCTGATATC CTGGTTGGAT CACAACAGGT CACAGACACC TAAGACTGTT GCGTTTTGGA AGTGGGACCT CATGTAGTGT CATTACGCAC CGTTTGAGAT GGACTATGAC TTCCAGTAGA GCTCGCAGGG AAAAACGTGA CTTCTTTACC CCTGAAAAGA TGCGCTATAA AAAATCATGT TTCAAACCGT ACAGATAAAA GGAGCAAATC ACAGAAGCAG AAATTACTGA CCTAAA.ACAA AAGAACTGGA AGACTrGCCT G'rGGAAATGA AAGGAAGGGT TCGCTG.ATTr GGTTCTGACC TCTCCCATCT GCCAGTGATT TACAAGCAAG GTTTCGTACC ATGGTGACGG TAGCCGCATT ACCAAGGTTG GT'rGGTCAAT GTCCTT~AAAG 'rTATACAGAG CAGTATAGCC CTCTCCAGCC AGCATCAACT AGGTCTGTCA GTTGATCAGG CCTCGCCTAT CTCCGAGAGT GTTTGAGGTA CTAAAATAAA AATGCCAALAT TACA.ATATTC AGTAGTGGAT ACCCTGCGTT GCGCCGCTTG TCTCTTTACA CGCTCTGGAG TTGATTTTAC AGCCATGACC TA'rACGGCTT GGTGGATATC CAAGCAGATA TGAGAAXACT AAGGTGATTA T-rTGTTCCAA GTCGTGGAGA GGCCTTTAAC CGTATTGTCA AGGACAACCT TCTGGTTCTA CTTTACAACG GCAGAAGGTG AGAGATGTAC AAGGAATTCC CAAGATGCAA CTGGGGTCAT 10080 1.0140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760
TGTCTCAACT
GGTGATGAAG
GTGGGAGCAA
CTTGCTTGAC
ATTGTTTGCG
GCTTCAAGCA
TTGTCTCTGA TAGTGCCCAC GCTTTGGGAT TCGCTGACTT TACTTCCTTC TCATTCCA'10
CTGCGACAGC
TCATCGTTCC
CCCCTGTCAT
AAGCTATCAC
ATTATGACCG
AGTGGCAAAA
CTATTTATAA
CAGTTAAGAA
TTGATGACGA
ATGCTCTTGC
AGTGCAACAT
GTTTGI'GCA
GCA1'CCATCC GAAGTGCGAC TTGGAAAGCC AAATCCTTTC CCT1'CACGGG GGGAATACGA TA'rCGTTACA TTGGTTTGGT ACAATTGGAC ACCGCTATGA TAGTGGTTTT
AATCCAGTGA
CAAACTAAGG
CCAGCCTATA
CGCTATCCAA
GCAGGTCTC
GACCGATATC
ACCCCGTAAG
TTTGGCACAC
ATGGCTTCAC
GACATTGTGG
AAGACTGAAA
CTAGAAGAAC CTGTCGAATC TTCACGCCAC CTCTACATCA CCCGTGTAGA AGGAGCAAGC
GCAACCTCAT
CGCTTCCTCT
CCTATGCCTT
AAGTAGACTA
AAAAATGACA
TAAAAACTGT
CAGAGAGAGA
ATGTGAGCAA
GTCTTGTTCT
ATTGTGGAAA
CATCCAAGAA TTGGCTAAAG C?1'GACAGCC TATAAGAATC CTTTGAGAAT GAAATTACCC TATCATTGAG ACTTI'CAAAA AACTACAGTC AAGCGAAAGT TGTTTTCAAT TGATA.ATAGT ATTTrTATAG GATTTTCCTT TTTAGTGTAG CATTTAGAAT 5.74 CAGGAA'N'GC AAGTAATGTT CACTACAAAC TTGGATTTGA TATGACGAAC TATCCTAAGG TCCCTCTTCA TACTAAATTA AGCGATGAAG CAGTI'TCTGA AAAAGTGCTA ACTTTATcAA GATCCTGCCC -CTAAAAGTC TAATTGAGTG
TTACACCTGT
TCTTGTGGGA
CCTTACTAGA
AGTTrGAGGCC CCTrTCTCCT GTCCCGTGG'T TTGAAATAAG CATCATTTAG AAAATCTAGT AGTTTTCAAT TCACCCTATT TTTTGAAAGA CGTGAGTTTC CATGAGTGAG CTCGCGTCTT TTTTTGI-T' CAGAATATTG TTCAAAATTT TGTGCCTGTC 00..0 00* 0 0. 0 TT'rCATGTrC
TACAAATAT'?
ATCTATAAAA
GCATCTATTT
TGTAGTGTTT
TGCTTGTATG
AATATAGTAA
AAGGCTAAAG
GTAGATATAG
CTAACAATGT
CAAGTCAGTA
TAGTCATTCT TTTGCATGAT CTATATGTTT AGTGATGcTr CACTTGTCTA CGATTACCTA TTATCGAGGT rAAATCTAGC TAGTTTCAAT CCGCCATATG ACAAGGTATT TGTTCTTTCA ATGGGATATT TTATArrCAA AGCAAACTAG GAAGTTGGCC TAAAATGAAA TGAGAATAGG TTTAGAAGTA GAGGTGTACT ACCTAGACTT AGGGCAAGGC
AGAATTTATA
GCTATACATT
TATGCCCTAT
TTTTATAGAA
AGCGATATTC
TTTATAATTT
GCTAAGAAAG
ATAGATAGCT
ACAAATTGAT
ATTTTAGT1TT
GGCACTGACC
GCATGTTGAT ATTATAATAA ATTAGATCTC CTGCGAGACA TCCAGTA N'TI TAGAAGCACT GGTCTAT'rTA AGAAATATAT AGGTAAATAT CCCTGGCGAA ACAACATATC AACAAATTTA ATAGCATCAC Tr'ITGAATGG CAAAACCCTG CTTTGAGGTT CGGGACAGTC AAATCGATTT CAGTCTACTA TAGAACTGAC TAGTTTGAAG AGATT'rCCGA AA'rCA.ATTT GGAAAATATA AGAAAAGTAA TAATCAATTrG 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12.720 12.780 12840 12900 12960 13020 13080 13140 AGAGTATAAA TTTTAATATT TTCTTGTGTT ATTCCT'rGAC TGATAAAGAT AATGACAGCG GTGTCATTCT ATCTATTTTA TTAAAAATAG TAAAAAAATT GGAGGTTCTG ATGAAATATT TTGTTCCG 13188 INFORMATION FOR SEQ ID NO: 71:.
SEQUENCE CHARACTERISTICS: LENGTH: 32768 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 575 CAAGCACCAG TGCGTCGGCC AACCAG'rGCA TCAGTCTCAG TGAATCCGCA TCAACCAGTG AACAAGTGCA TCGGCTTCAG TGAATCGGCA TCAACGAGVG AACAAGTGCT TCGGCTTCAG AGCCTCAGCA AGCACATCAG GACAAGCGCC TCAGCTTCAG AGCCTCAGCG TCGACAACTG TGCGTCAGCC TCAGCAAGTA CCTCAGC -rC
CAAGCACAAG
CCTCCGCTTC
CGTCAACGAG
CTTCTGAATC
CAAGTACCAG
CGTCGGCCTC
CTAGCGCCTC
AGCAAGTACC
TGCTTCAGCC
AGCAAGTACT
TGCCTCTGAG
TGCATCAACC
TGCGTCAGCC
AACCAGTGCA
AGCCTCAGCA
TCAGCAAGCA CCAGCGCGTC TCAGCATCTG AATCAGCATC 'rCAGCAAGTA TCTCAGCGTC AGCGCCTCAG CATCAGCGTC TCAGCATCAA CGAGTACGTC AGTGCGTCAG CC1TCAGCATC TCAGCAAGTA CCAGTGCTTC TCTIGAATCGG CATCAACCAG TCAACGAGTG CGTCCGCTTC AGCAAGTACT AGTGCATCAG CATCAGCATC AACGAGTGCA TCGGCTTCAG CAAGTACCAG CGCCTCAGCT TCAGCAAGCA CCAGTGCGTC AGCCTCAGCA AGTACCAGCG CCTCAGCCCTC :0 0.
o** AGCAAGCACC AGTGCCTCAG TGCGTCGGCT TCAGCAAGTA AGCATCAACA ACTGCTTCAG TGCGTCCGCT TCAGCAAGTA AGCGTCAACG AGTGCGTCTG AGCTTCTGAA TCTGCATCAA AGCAAGTACC ACTGCGTCAG TGCGTCGGCC TCAACCAGTG TACTAGCGCC TCAGCCTCAG AGCATCAGCA TCA-ACGAGTG CACCAGTGCG TCAGnCrCAG C'rTCAGCAAG TACr-AGTGCG CCTCAGCGTC 'rGA-ATCAGCA CTTCAGCAAG TATCTCAGCG CTAGCGCCTC AGCATCAGCC AGTCACCATC AACGAGTACG CCAGTGCGTC ACCCTCAGCA CCTCAGCA-AG 'rACCAGTGCT CATCTCAATC GGCA'rCAACC CATCAACGAG TGCGTCCGCT CATCGGCTTC AGCAAGTACC CAAGTACCAG CGCCTCAGCC TCAGCCTCAG CGTCGACAAG TCAACGAGTG CATCAGCTTC TCTGAATCGG CATCAACGAG TCAACAAGTG CTTCGGCTTC TCAGCCTCAG CAAGCACATC TCGACAAGCG CCTCAGCT'rC TCAGCCTCAG CGTCGACAAG AGTGCGTCAg CCTCAGCAAG TCAGCAAGTA CTAGTGCATC AGCGCCTCAG CTTCAGCAAG TCAGCA.AGCA CCAGTGCCTC AGTGCGTCGG CT1'CAGCAAG TCAGCATCAA CAACTGCTTC AGTGCT'rCAG TCTCAGCGTC TCAGCAAGCA CCAGTGCGTC AGTGCGTCTG AATCGGCATC TCAGCAAGCA CATCAGC'rTC AGTGCGTCGG CTTCAGCGTC AGCTTCAGCA AGTACr-AGTG CGTCAgCCTC AGCGTCCACA TACCTCAGCG TCTGAATCAG CATCAACGAG TGCATCAGCT -APCT-TCAGCA AG TACCAGTG CGTCGGCTTC AGCATCAACG AACCAGTGCC TCTGAATCAG CATC AACAAG TGCCTCGGCT GGCTTCAGCA AGTACTAGTG CATCGGCTI'C AGCATCGACA AACGAGTGCT TCGGCTTCAG CATCAACGAG TGCGTCACCC TGAATCTGCA TCAACCAGTG CGTCCGCTTC AGCGTCAACC GACAAGTGCT TCGGCTTCAG CATCAACGAG AGCGTCAGct TCCGCCTCAA CCAGTGCGTC AGCAAGTA'rC TCAGCGTCTG AATCGGCATC TACGTCAGCC TCAGCAACCA CATCAGCTTC TGCGTCGGCC TCAGCAAGCG CAAGTACCTC GGCTTCAGCA AGCACAAGTG CGTCAGCCTC AGCATCGACA AC3CGCCTCAG TGCGTCGGCC TCAACCAGTG TACTAGTGCA TCAGCTTCAG GGCTTCAGCG TCAACCAGTG AACAAGTGCT 'rCAGCCTCAG TGAATCAGCG TCAACCAGTG AACCAGCGCC TCGGCCTCAG GGCCTCAGCA AGCACCTCAG AACGAGTGCT TCGGCTT'CAG
CTTCAGCAAG
CATCTGAATC
CATCAACGAG
AACGAGTGCG
TGAATCTGCA
TACCAGTGCT
GGCATCAACC
TGCATCGGCT
TCTGAGTCAG CATCAACGAG TCAACCAGTG CGTCAGCCTC TCAGCCTCAG CGTCGACAAG AGTGCGTCAG CCTCACCAAG TCAGCATCAA CCAGTGCCTC AGTGCTTCAG TCTCAGCATC TCACCAAGCA CATCAGCATC AGTGCTTCAG CTTCAGCATC TCG;GCCTCAA CCAGCGCCTC CGTCAGCTTC AGCAAGTACC CATCGACAAG TGCCTCGGCT CTTCGGCTTC AGCAAGTACC CAAGCACCTC AGCTTCTGAA CTTCTGAATC GGCCTCAACC CAAGCACAAG CGCCTCGGGT CTTCAGCCTC AGCATCAACA AGCTTCAGCG
TATCTCAGCG
AGCCTCAGCA
GACAAGCGCC
TCAACCAGTG
TCTGAATCGG
AGCACCTCAG
TCAGCTTCAG
TCAGCATCAA CGAG'rACGTC AGTGCGTCAG CCTCAGCAAG TCAGCATCAA CGAGTACGTC AGTGCGTCAG CCTCAGCATC TCAGCGTCGA CA.AGTGCGTC CATCAACGAG TGCGTCTGAG CTTCTGAATC GGCCTCAACC CAAGTACCAG TGCTTCAGCC 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820.
2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 GGCCTCAACC AGTGCATCTG AATCGGCATC TGCATCGGCT TCAGCATCAA CCAGTGCCTC
AGCAAGTACC
TGCCTCGGCT
AGCAAGTACC
AGTGCTTCAG TCTCAGCATC TCAGCAAGCA CATCAGCATC AGTGCGTCAG CCTCAGCGTC 'rGCATCAGC'r TCAGCATCAA CGAGTGCATC AGCAAGTACC AGTGCGTCAg CTTCCGCATC TGCGTCGGCT TCAGCAAGTA CTAGCGCCTC AGCA.AGTATC TCAGCGTCTG AATCGGCATC CGCCTCAGCC TCAGCGTCAA CAAGTGCATC GGCATCAACG AGTGCGTCCG CTTCAGCAAG TGCATCGGCT TCAGCATCAA CGAGTGCGTC AGCGTCAACA AGTGCATCGG CTTCAGCGTC AACCAGTGCG TCAGCCTCAG CAAGTACTAG GGCTTCAGCG, TCAACCAGTG CGTCAGCTTC AACAAGTGCT TCAGCCTCAG CATCGACAAG TGAATCAGCG TCGACAAGCG CCTCAGC-TC GACAAGTGCG TCAGCCTCAG CAAGTACTAG GGCTTCGOCG TCAACCAGTG CATCAGAGTC AACAAGTGCC TCGGCTTCAG CAAGCACCAG AGCCTCACCC TCAACCAGTG CGTCAGCCTC AACGAGTGCG TCCGCTTCAG CAAGTACTAG GGCTTCAGCG TCAACGAGTG CGTCTGAATC TACTAGCGCC TCAGCCTCAG CGTCAACAAG CGCT'rCAGCA AGTACTAGCG CCTCAGCCTC AACGAGTGCG TCTGAGTCAG CATCAACGAG 577 'rGCGTCAGCC TCAGCAAGCA CATCAGCTTC TGAATCTGCA TCA.ACCAGTG CGTCAGCCTC AGCATCGACA AGCGCCTCAG CTTCAGCAAG 'rGCGTCGGCT TCAGCAAG'rA CCAGTGCGTC AGCCTCGACA AGTGCGTCCG AGCCTCAGCA AGTACTAGTG AACCAGTGCA TCAGAGTCAG GGCTTCAGCA AGTACTAGCG AACCAGCGCC TCGGCCTCAG GGCTTCAGCA TCAACGAGTG CACCAGCGCG TCTGAATCCG TGAATCAGCA TCAACAAGTG TATCTCAGCG TCTGAATCGG- AGCATCAGCG TCAACAAGTG AACGAGTACG TCAGCCTCAG
CCTCAACCAG
CATCAGCTTC
CAAGTACCAG
CCTCAGCCTC
CAAGTATCTC
CATCAGTCTC
TACCAGTGCG 'rCAGCCTCAG AGCCTCAGCA AGTACCAGTG TGCATCTGAA TCGGCATCAA AGCATCAACG AGTGCATCGG TGCGTCAGCT TCCGCATCAA AGCGTCAACA AGTGCTTCAG AGCGTCTGAA TCGGCATCAA AGCAAGCACC AGTGCGTCGG
CGTCGACAAG
CGTCAGCCTC
CCAGTGCGTC
CTTCAGCATC
CAAGTGCCTC
CTTCCGCGTC
CAAGTGCCTC
CCTCAGCAAG
CATCAACCAG TGCCTCAGCT TCAGCAAGTA CCTCAGCATC a a a CCTCGGCTTC AGCAAGCACA AGTGCTTCAG CCTCAGCAAG CATCAACGAG TGCGTCCGCT TCAGCAAGTA CTAGCGCCTC CTTCGGC'rrC AGCGTCAACG AGTGCGTCTG AGTCAGCATC CAAGCACATC AGCTTCTGAA TCTGCATCAA CCAGTGCGTC
AGCCTCAGCA
TACCAGTGCT
GGCATCAACC
TCGACAAGCG CCTCAGCTTC AGCAAGTACC AGTGCGTCAG CCTCAGCAAG TCAGCCTCAG CGTCGACAAG TGCGTCGGCC TCAACCAGTG CA'rCTGAATC AGTGCGTCAG CCTCAGCAAG TACTAGCGCC TCAGCCTCAG CATCAACGAG 3600 366o 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 TGCGTCCGCT TCAGCAAGTA CTAGTGCATC AGCGTCGACA AGCGCCTCAG CT'rCAGCAAG TGCGTCCGCT TCAGCAAGTA CCTCAGCGTC AGCATCAACG AGTGCATCAG CTTCAGCATC TGCGTCGGCT TCAGCATCAA CGAGTGCTTC CGCATCAACA AGTGCCTCGG CTTCAGCAAG AGCT'rCAGCA AGTACTAGCG TACCAGTGCG TCAGCCTCAG TGAATCAGCA TCAACAAGTG AACAAGTGCT TCAGCTTCAG AGTCTCAGCG TCAACCAGTG CACCAGTGCT TCGGCTTCAG AGCCTCAGCA AGCACATCAG
CCTCAGCCTC
CGTCGACAAG
CGTCGGCTTC
CAAGTACCAG
cCCTCTGAATC
CGTCAACGAG
CTTCTGAATC
TGCGTCTGAG T CAGCATCAA TGCATCAACC AGTGCGTCAG TGCTTCAGCC TCAGCATCAA AGCGTCAACC AGTGCCTCGG
CGAGTGCGTC
CTTCCGCATC
CCAGTGCATC
CTTCAGCAAG
AACAAGCGCC TCGGCCTCAG CAAGTACAAG AGCTTCAGCC TCAACAAGTG CTTCAGCCTC TACCAGTGCG TCAGC'rrCAG CAAGCACAAG GGCTTCGGCA TCAACAAG'rG CCTCAGCATC TGCGTCAGCT TCAGCATCAA CCAGTGCTTC AGCATCAACG AGTGCGTCAG CCTCAGCAAG TACTAGTGCA TCAGCATCAG CATCAACCAG 578 TGCATCAGCC TCAGCAAGTA TCTCAGCGTC TGAATCGGCA TCAACGAGTG CATCAGCATC AGCATCAACG AGTGCATCGG CT'rCAGCGTC TGCGTCGGCT TCAGCATCAA CGAGTGCCTC GGCATCAACG AGTGCGTCAG CCTCAGCAAG 0 .0 .o 0 TGCGTCGGCT TCAGCATCAA GGCATCAACG AGTGCGTCAC TGCATCGGCT TCACCAAGTA AGCAAGTACC AGCGCCT'CAG 'TGCGTCAGcT CAGCATCAAC GCATCAACGA GTGCGTCGGC GCTTCAGCTT CAGCATCAAC GCAAACCATT CGAACTCACA GAATTGCCTA ATACAGGTAC GCTGT'rACAG GTATTGGATT AACCTGTAAA GTTAGGCTAA GATGATAAAA TAACAGTCAT CTAGATAGTA TTATTACTCA TCTACGGATG CTTCAGGTGA TATATAGAAC AAGAAAATGC TCCGGAAATT ATGTGACCTT ACTCTATATA AAAAAATAGT TTCAACGAAA GTGAAGGAAT GTATATGATA ATGTTTCTAT GC1"rTGATAT CTG3CTTGGCG GACATAGGTA AATTIAGGAGA AAGGTAATTT ATT'rAAATAA AGAGTTTGGA CAGAAAAGTG CTACTAGCTA ATATGGGT'rA GAAGTCAGTC TCGCCAACGG
CCAGTGCCTC
CCTCAGCAAG
CCAGCGCCTC
CCTCAGCAAG
AAGTGCTTCA
AACCAGTGCA TCAGTrCTrcAG AGCCTCAGCA AGTATCTCAG TACTAG;TGC-A TCGGCTriCAG AGCCTCAGCA AGTATCTCAG TACTAGTGCA TCAGCMTCAG AGCTTCAGCA AGCACCAGTG CACCAGTGCC TCAGCTTCAG GCTTCGGCCT CAACAAGTGC TTCAGCAAGC ACCAGTGCCT CGGCCTCAGC AAGTGCGTCA GCTTCAGCAA GTACATCAGT AGTTGGAAAT ACTTCTGGAT CGACAGGTAA
CAAGCACCAG
CGTCTGAATC
CAAGCACCAG
CGTCTGAATC
CATCAACGAG
CGTCAGCCTC
CAAGTACCAG
GTCAGCTTCA
AAGCACCAGT
TTCAAATTCA
ATCCCAAAAA
TCTCGTCA ATTGATCTG TGT'rACT'rGG AGTTCTAGCA GGTTG3CGAAA CGCCGTAAAC GTGATGAAGA AGAGTAAGAC ACTAACTCGC GCACATAAAT CAAGGAGAAA ATTGCTAGTG TGTACCAGTA TACAATGTGG AAAACTATCT GAGGALAGTGC AACATATAAA AATATTGAGA TTGTTGTCGT TAATGATGGT AATTTGTAAA GAATTTTCAG AAATGGATCA CCGAATTCTC TGGTCTT'rCT GCCGCACGAA ACACCGGTCT GAATAATATG TGTGGACTCG GATGATTGGA TTGAGCAAGA TTATGTAGAA AGAGTATCAG GCTGATATDG CAGTTGGTAA TTATTATTCT GTTCTAC'Tr CATATATTGG GAGACTCCTA TTATGAGAAA 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 CTTTGAGAAC TTGTATGAAA CTCAAGAAAT TAXACTCTAT AAGGCAAGAT TGTTTGAGCA AGATGGTTAC CTCAA'rCAAA AGGTATATTT AAGTCTTTAT GCTI'ATCGGA TTAGAAAAGG GATGCACGCT TTAGTTGATG CTATGTCTGA TCCTCTAGAG AAACACTTGG CAGTTTATCG TCAAGCTAGT GGTTTATCTG ACACAGCAAC
GAAGAGTTTT
GTTGCGCTTT
ATTATCAGAA
TAGTTTATCA
ACGTA?1'ACG
TCAGATGTTG
GTATAAAGAG
TTTGAAATGA AACAAAGGCT TTTAAATCAG CTATCGAGAC AAGAGGAAAG TGAAAAGAAA GCCATTGTCC TCGCAGCAAA ATTTGTTIATC ATAATCGTTC CTATGGC'rAT
GATTCGTTTT
TGGATTAAGC AATTAAATAA GCGCTTAGAG GTAACTTCTG AGCAAATTTC ATCTTATAAA TATTTCATAG CTGATTTCGT GCAAGAAGAC GTAACGAAAA ATCTGGATGA CTrGTTTGCT GTTAGAGA'rT T'rGGGGGCAG AGCTTATTTT TTGGTAAACA ATGCTTTTTG GAAAAAAGAG AATGAATGGC ATGATAAGGT GGATCAGGCA 5.79 GTAGACCAAG 7T1rrAACGAC AA'rCAAGTCT TArC'TGATTC ATAGCGAT'rT TCCAAATGAA AAGTTTGACT CAGAAATTAT TAATTCTCGG TCGGATATTA GTTACACAGT CTTTTACGC AAGGCCCTCT ACTTGGACTG TGATCTAGT-r ACACACTTAC AAGATTATCC TTTGGCTGCT GGTCAAGAAA TCTrTAATGC CGGTGTTCTC AATATGACCC AAAAATTAAT TGATGTAACC GATCAGAGCA TCT'rGAATA'r GCTTTTTGAA
CATAAATGGT TGGAATTGGA CTTTGATrAT AATCATA'rTG TCATTCATAA ACAGTTTGCT GATTATCAAT TGCCTGAGGG TCAGGATTAT CCTGCTATTA TTCACTATCTr TTC1'CATCGG AAACCGTGGA AAGA'rTTGGC GGCCCAAACC rATCGTGAAG TTTGGTGGrA C'rATCATGGG CTTGAATGGA CAGAATTGGG ACAAAACCAT CATTTACATC CATTACAAAG ATCTCACATC TATCCAATAA AGGAACCTrT CACTTGTCTA ATC'rATACTG CCTCAGACCA TATTrGAACAA AT'rGAGACAT TGGTTCAATC CTTGCCTGAT ATTrCAGTTTA AGATAGCAGC TAGAGTAATA GTTAGTGATC GATTGGCTCA GATGACAATT TATCCAAACG TGACTATATT TAACGGAATT CACTATTTGG TAGATGTCGA TAATGAATTG GTAGAAACCA GTCAAGTACT TTTAGATAT'r AATCATGGCG AAAAGACAGA AGAAATTCTC CATCAATTTG CTAATCTTGG CAAGCCTATC TTATCCTTTG AAAATACTAA AACCTATGAA GTAGGTCAGG AGGCATATGC TGTTGACCAA GTTCAAGCAA TGATTGAAAA ATTGAGAGAA ATAAGCAAAT GAAGAAAAAT CATTTAGTAG GAGATGCTCT GATTTTGACG GTTAGTGATC AGATTGAAGA GTTGGATTAT TT'TTTATAAA ATTTCTCCGT TCATCATATA TGAAAGTTGT TCAAACATCA GAGTGCTrTA TAAAATATAA ATAGACCTAA AGATATTTAA TATGAACTGC ACCCCAAAAG TTAGACAGAA AAAATCTAAC TT'TTGGsGT CAGTACAATA TTAGGGTGTG ATNAATTATC TTTTAGGTG AAAATGATTC TATATTATAG CTGTTTGATA CGAAATTTAT TATAAGGAAA TTATGTTAAT GAATACAAAA TCTATAGTTT TTAATGCAGA TAATGATTAT GTAGATAAAT TAGAAACTGC AATTAAATCT AT'rTGTTGTT ATAATAAT'rG TTTAAAATT1' TATGTATTA ATGATGATAT TGCGTCAGAG TGGTTTTTGA TGATGAATAA GCGATTGAAG ACTATACAAT CTGAAATCGT TAATGTAA.AG ATTGTAGATC ATGTTCT'rAA AAAGTTTCAT TTACCGTTAA AGAATTTAAG TTATGCCACT 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820
TTCTTTCGTT
GACATCATTG
TTGGCAGCAG
TTATTAGTTA
ACCAATCAAT
GATAGATGGA
CACATAGAAG
AGTGTTh'rAC 580 ArTTTATACC TAATTTTGTC AAAGAAAGTC G'TGCTTATA CCTAGATTCT TTACAGGAAG TTTAGACTAT TTAT1-rGATA TAGAACTAGA 'rGGTTATGCC TAGAAGATTC TTTTGGTGAT GTTCCrrCTA CCAATrTTAA CTCCGGAA'rG ATGTAGATAC TTGGAGAGAT GAAGATCT GTTCGAAACT GTTAGAACTG ATCATGAAAC AGCATATGGA GATCAAGGAA TTTTAAATAT GT'rATT-CCAT AAAGATTAGA CCGAAATTT AAT'N'TATGG TGGGGATGGA TAGCGTCGCA GAAATCATAA ATGGTATGAG ATTTCTGAGT TGAAAAATGG AGATTTACCT ATTATACTGG GGTAAAACCT TGGGAAATAA TT'rCCAATAA TCGCTTTAGA 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 GAAGTTGGT GG7T'ATAA TCTGTTAGAA TGGTCTGATA ATTAGTCGTA GTTTCGAAGA ACTTGTATAC AGTCCTAAAG GCTAGTTGTG AGATGGAGCA TGTAGAATAT TT'GATAGAAA TCTATACTAG CACATACATA TT'rTGCGTCT AGTGTCGTTG GT'rACGATTT ATCCTTGTTT TTCTCCATTT GA?1'ATCGAA TTTTATTTAG ATATTAATCA TTATAAAGAA GTGGATAATA CTATCTAAAC CAATTTTTAC CTTTGAAALAT ACTACTrATr TTrTwATTGAG AAAAGACATT CTCATACAGC AATTTTTACA ATTACCAGA GGTACATTTT CTTTATTAAG ATATAGCAAT AAArrTTGGA TAATTTAGAT T'rGTATCCGT TGTTCAACAA ATATAGGCAA TCAAACTAAT
ATATTTTCTT CAACCGAACC TAAGTTTATG GCAGACGAAC TTTGCGAATG TGTTTGGATA AATCAATGAT GGCTCTCCAG TTCrCGTTTC AAATATr'TTG TATTGAATGT TCGGGGGGGG ATGATGCTTT AGACCGATTA GGCGTTATAA TTCTTATGAT ATGATTC TCT AGAAGTGATA AAACAAAATG GTAGAGGCTA TAATTAGTAT TGTAGT'VCCA GCATTCAGAA TCAGACGTAT
TTAGACAATT
ATCTACAACG
CAAAATT?1'G
GAAGALATTTG
TCATCAGCTC
CTCTGATGAT
ATCATTCATC
AGAAAGCAAA
GCGTACATTA
TATGGTGCTT
GAAACACGCT
GAAGGTAAAG
CAAAATATGT
CGGCGGTCTT
CTTTTGTAGA
TATAGGAGAA 9780 TTGAGAATTA 9840 AGTGTTTATT 9900 TAGAGAAAGA 9960 GTAACCTACG 10020 TGGTTGGAAC 10080 ATTAGTATCG 10140 ACGGATCCAG 10200 GTCGAAGAAG 10260 TTACTACAAG 10320
S S S 5 TGAAAAAGGA AAACGCAGAT ATGTGTATAT GACTTATGTT CAATTATGGA TAGGGAAGGT TGAAGTTATT CAAGAGAGAG TCAGAAATGG GAACTGGACT GTAGCTGTCT ATTTACCATT TCCTATAGGA AAAATTGCAG TAACACTC GAGGATAGTC TATTTGAATC AGGATACTTA CTGGACATGG AAGGTACTTC GTTGTGTTTA CTGG'rACCGT GTTGGT'PTAT CTGATACTTT ATCGAATACA TGGAGTGAAA AGCGTATGTA TGATGAAATT GGGGCTAGCC.
AAGAAAAGAT AGCTATTTTA GCAAGTTCAG ACTATGACTT GACCAATCAT ATTTTGATTTr ATAAAAATAG AT'rACAAAGA G'rGA'rAGCAA AATTAGAAGA ACAAAATATG CAGTTCACAG 10380 10440 10500 10560 10620 581 AGATTTACAG AAGAATGATG GAAAAATTGT CTTT-ACTTCC GTAGATAGTA ATAAAAAATG AGATAGCGTA ATATGAAACT ACATTTAACA AATTT1ATACG GCATCGCTGG TGATAGTACG GTTATCTTAG CTCAAAATGC TGTTCAAAAG ATAGCrAGTC AACTGGGATT TAGAGAGGTT GGTATTTATT TTACAACAT TGCTTCAGAT AGTCCTTCTG AAATGAATAA GCGTCTGGAT GGTATTATGG CCAG1'ATCTC TATTGGGGAT ATTTTAGTCT TTCAGTCTCC AACCTGGAAT GGT'rTTGAAT TTGATCGTC'r CTTGTTTGAT AAGCTAAAGG ATATGCAGGT GAAAATTATT TGCTT'rATCC ATGATGTTGT TCCCCTCATG TTTGATAGTA ACTATTATCT CATGAAAGAT TATCTGTATA TGTATAATCT ATCAGATGTT TTGATAGTGC CCTCAGAGAG AATGAAAACA CGCCTGATGG AAGAAGGAT'1 GACGACTAAG AAGATTCTTG TTCAAGGGAT GTGGGATCAT CCTCATGATT TATCCTTATA CACCCCTGCT TTTAAAAAAG AAC'N -IrTr TGCTGGAAGT TTAGAGCGTT T'rCCAGACTT AATAAAGGGG AAGCTAGTTC GAGGAATTGT TGCTAGAATT AATGAGGGAG AAAGTAACCA CTAACAGCGG GCATTCCAGT GATCAAGCCT TGGGCTTTAT ATGAATCTAC AAGAATATCA AAAGAGGGCT ATTTCACTAA TAAGGGAATG AAATGAACAA TT1AGAAACAA CGATAAAATC ACAAAA'=~G TCTCAAGATA CGCCTTTGAG AGTATTTTCA TAGTGCTAGA AGTCTCAGCA TCGAAGGATG GAAAAAAGAT ATCAAAGGGT GGATTTGGCC TTGTCTGGGG AACCcATCAA ATACTATACC TTGAATATAT CATTGTACCA AGTAGCTT'GT GGCGGATAGT CTGGAAGAGG AGAAATGACG AATCGTATCA AAAGTTATTG GTAGATGCAA AACAATTGTA CTAGCAGGGG TA'TTTATAC CACAATC6AG CTCATAACGT GAGTACCTAT CAACTGCTAA ATr'rATAGTA TTCATGAGA'r AGTTGATAAA AGACCTTTAG CTATTTGTTA TCTATCACTT GGGAATTGAT ATCGCAATTA CACCAGGCAG ATGTTAAGAT TTATArrrTTG AAATAGCTCG CATGTTAGGT TTCAAGATT'G GGAAAAGCAA CAGATTATAT CCAAGAAGAT 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 AATCAAGATA TCATGCCAGA TTGGTTTCGC AAACCACGAA AGTGAGATTA TCGATGTTAA ACTACCTGAA CAAACTGTG'I GATCACATTA GTAGCATTAC TTA'rGCTAGA 'rATTTTATTG AAGGT'N'TAT ATT'rAGACAG TGATTTGAT'r GThAATACr'r CTTTAGAGAA ATTAT -rAGT ATTTGTTTAG AAGAAAAATC ACrCGCAGCA GTTAAAGATA 
CAGATGGAAT TACATTTAA'r GCAGGTGT'rT TATTAATCAA CAATAAAAAA TG.GCGTCAAG AGAAATTAAA AGAACGACTA ATTGAACAGA GCATTGTTAC AATGAAGGAA GTTGAAGAAG GCCGTTTCGA GCATTTTAAT GGTGATCAAA CGATTTTTAA TCAGGTC7rG CAAGATGATI' GGTTAGAACT AGGTCGAGCT TATAATTTAC AAGTAGGGCA TGATATTGTG GCTTTGTATA ACAA'rTGGCA GGAACATCTG 582 GCTrAATC ATAAACCAGT GGTGATTCAT T'TTACGACCT ACAGAAAACC CTGGACTACC TTGACAGCCA ATCGTTATCG TGATTTATGG TGGGAATTCC ATGATTTGGA GTGGAGTCAG ATTTACAAC ACCATA'rGGG AGAATTTGAA CTAATATCGC CTCTAGATAA GGAATrTTCT TGCTTAACCT TAACGAATTC CCAAGATTTA GAACGAATAG AAGAGCTAGT TACAGCTCTA CCTAGG'rGG TA'rrCA'rAT CGCAGCT'rGG ACCCATA'rGG GAGATAAATT GCTGTATATA ATAATG'rGAG AAAAAGTCAA CAAATCTATA AAATCTTTGC AAGAACAAGA TTAGGACAAA TCCTTTTCGA TTTAAGAAAA ACGGACATCT TTTACGGCTT CTCAGTATAT GTTT'rTCAAA TTGC1'GCTTG TATCCTAATA TTCAGCTCTA AAGATGGATG CT'rAT'rTAGA ATGGCTCATC TATCTAAACC ATTGCATCCA CAAATTGTTC CACCGGTCTT T'rTGGATATC AATCATGGTA GTGCAGATGA AAAAACGCTA CTAGCTTTTC AATCGACTCA AAATGGGAAA GTTTCCTTTA TGATTGATAC
AAAAAAATTA
AGATAAGCTG
GAACTTTTTA
GCACGGAGAG
GATTAAAGAT
T'rGTTTAACG GCCAAATGT'r GTCTAATCG'r TACCTGTT'rr CGACAACTTC CGAACAAT'rG GATTACTTGG GACAC'rATG GGGCCAAAAT
CAAGTTTAAC
CTGGACAGTT
TATATGATTT
TCCGGCAATT
TATCAACCTA
TATACTAGCC
TCTAGAGATA AGCTAGACGA GTTGAAGGAG CTGACI'TCAA CATCCGATAT CGT'rGCAGAA TTTTATAAAT CTCAAAATGG GAATAATGGC CGAATGTTGG CTGATTTGCA AAAATTGATA ATAATCCAGG TGAAAGGGAT AGATGAAACC TTAGTTCGTT TTGGAGATGG GGAAATCAAT 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160
CAAAGGTTGT
ACTAAGGATA
TTGGATTATA
ATTCAAGTGA ACATCCTGAA TGCTAGAAAA ACCGCTTGAT TTATTGAACA CAACTCTTCT ATGCTTGCAG GGCATTCAAT TCCCTACCAG GATTATGATG AAGAGTTGGT TTCAArCA'rG AGGGACATTA TCGGCCAAGA AAGTCGAGAA GATTTAGTAG TGTGCCTTCC TGATGC'TTTT ACAGATCGTT TTAGGTTTAC ATCGTGGGCG ATTCCATTTT GGAAAGATCA CATGGATCAT TATATGGATT TTTACAGAGA GTTATGCAGT GATTCATGGT ATGGCTCAAC CTTTGTATrCT CGCCCTTATA TCGATTTTGA AGCATTTGdG AAAACCGTGA GGAAATGATT TATTCGATGA GCCTT'rTCTA GAGTTCATGA ATTTTATGTA TGCTTGGACC TATCAAGTTT- TGGATGTAGG AAAACTAAGG TTAAATTTTC GAATT'rATTG ATGA'rcAAAC AGACAAGAGT CAAGCTAAAG CTCAATTTGA CTTACTGATA GTCGAAGGTG CGACTTCTCG GGCAAATTCT ATTAAGCGAA TTATCTGTCC ACTTGAACAA GAAATTGAAA AGTATGCTG TACAGCAAAA GTTCTGAGTT ATAATCTATG CCATATTGAC TCAGAGTATG AATGGATGAA TCATAAACAT AC'rGCAGAAC ATAATTTCGA CTATAACAGT CAGATTCTTG CACGAATATT
AAAATTGAAA
TTCAGGTGTC
TTCTCATAGT
TGGTCGCTTG
CCAGATGGGC
AATGGGAGCT
CCAAGATATT
AAACTAGACT
583 ATTTAAAATA AATGATAAGG ATTTAAAATG ACAAATACCA AACGCGCTGT AGTATTTGCA GGTGATTACG CTTATATTCG ACAAATCGAA ACGGCGATGA AGTCACTCTG TAGACACAAT AGTCATTTGA AAATTTATCT GCTAAATCAG GACATTCCTC AGGAATGCTT TAGTCAAATA ACAATATATT TACAAGAGAT GGGGGGCGAC TTGATTGACT GCAAGTTAA'r TGGCrACAG TTrTCAAATGA ATTGGTC'rAA TAAAI'TACCT CATATCAATC ATATGACAT'? 'GCACGCT TI-1-ATTCCAG ATTTTGTAAC AGAAGATAAA GTTCTCTATC TAGATAGTrGA TTTCATTGTG ACTGGTGATT TGACCGATTT GmTGAATTA GACTTAGGTG AAAATTATT'r GGCAGCAGCT CGTTCTTGCT TGGAGCAGG AGTCGGCTTC AATGCTGGTG TTCTCTTGAT TAACAACAAA AAATGGGGAT CTGAAACTAT TCGACAAAAA TTGATrGACT AATGTGGAAG AAGGAGACCA GTCAAT7TNG AATATGTTGT CTTGAAGATC AATATAATTT TCAAATAGGA TATGATTATG CAATTCATTT TTGA'rATTCC GC1'CGAACCA CTGCCACTAA GATAACCCTT GGAA'rCAATT TTCTGT'rGGA CGTCTAAGAG TTGA'rCGATT GGTCTGTTAT TTTAAATGAA TGGTTTTCAA AAATCACAAA TATTTAAGTT GCAATGTGTT AATTTAACGA ATCGATTATT TGGCGGAGCA ATTGCCAGAA GTTCATTTTC TAACAGAAAA AGAACATGAG TTAAAGATCA A'rATAGTTCC GGGCGGCAAC CTTTAAACAT 'rTTTACACTA TATTTCTCAG AAGTTTCG'?G GGAATACTCT AGAGTGTGAA GTACCCTAGT A'rTCTTGGTG TGTCGAGAAA ATATTGTTGC TTATACAAAT 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 i5060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 ATGGCAAATG AACTACTAGC TTTAACCCGT TTTCCTAATG TTACCGTATA TCCAAAT'rCC TTACCAATCT TATTGGAACA AATAGTAATA GCTT'CAGATT TGTATTTGGA TTTGAATCAT
GATCGAAAAT
TTCGACAATA
TCCATTICCGA
GAAAATCAAT
AGTC'rGTAIT
CTGAATGGTT
TACACATTGA
CTTACTTTAG
CCGATATCAT
TAGAAGATGC ATATGAGTTT GTGCTTAAGT CTTGCTCTGA AAATCTrTCT GAGATTT-CAT AAAAAATGGT TGCAGCAATC AGATCTTACA AGTATTAGCG GCAGATAATG CCTATCTTAT GTrATCACAAT AGAGATGTTG ATTITTATAT rAAATT~ATTG GGGAGAAAAA TGGAAGTTGT rAAAGAACTT TT'rGAAAGCT ATAAAACAGG A.TTTTT'rGCG ACAGAAGTGG TTGAATCTGA ACAAAAAACC AATGATAGCT ATGAAGGTAT CTATCCAAGC TGAGGTAGAG AACAGTATGA TCCTT'rACAG ACGACTATA
TCTCAACAGT
GA6ATTCTACA
ACCTCATATA
TAGGGTATTG
GATAGATCTC
GATATAGCTC
ATTCGCAGTG
AATTATGCTT
TATCTGGATT
AAAGGATATT
rGTAACTGGG GAACTAGCTA CT'rTGT -rGA CAATTGGTGC TGTTGATGAT GTCTATGCCT ATGAAGGACG AAAATCTGGA TTTAATACTG GTATGTTACT AATGGATGTT GCAAAGTGGA AAGAACATrC TATTGTCAAT AGT'rTATTGG
AATTAGCGGC
ATTTTGAGGA
TTTATCACCT
584 CGAGCAGAAT CAAGT'TGTTC ATCT'rGGGGA TCAGAGTATT TTAAATA'r'r' TAATTGGCTA GCCTT'AGATA AAACATATAA TTATATGGTG GGTATI'GATA TGCTCAAGAA TGTGAACGTC TAGATGACAA TCCACCTACA ATTGTTCACT
ATGCTAGTCA
GGGTTTATAG
TTGAAAGAAG
AACATTTAGA
GTGATTGTTC
ATGTATTACA
ATACAGGTGG
TCGCTTTTGA
TGGAGAGACC
ATTAATTAGT
'rGATAAACCT TGGAATACAT ATAGTATATC TAGACTACGT AGATTTGGAT TGGTCAGAGA TTGCTTT'rCA ACGTTCCGAT CAATCAGTCT AAAAAACAAG TG;ATGCTTGT GACA'rGGAGT GTAI'TAGTA CAACG4GTTAC CTGATTGGCA T'rrTCATTTG TGAGGAGCTG ACCTCTCTAT CACAGTATAC GAATGTAACA TAGTAGAATT GAT'rGGCTAT TCGACGATTC TATAGT'rTAT AGAGGTTTI'T AATGTAGTTA CAAGGGCACA AGAAAGTGGC TATCACACGT AAAAGTATGG ATGATGGACT CTATGACGGT GAATTATGG'r
TTAAATTATT
GCAGATATAA
GCTGCACCGT
GTATATCAAA
TTAGATATTA
AAGAAAATCT
ATTTTTTCTG
AGATGATTTA
GTTGTGGTAC
GTGGATAGAA TGAAGAA'rAT CCATATACAA TACGGGAAAA AGAGATAGAG TAATGAGTGA TATTTAGTGG AGTGTGTCGA GCATATCTG AAGCAAACCT ATCAAAATAT. AGAAATTATT TTAGTTGATG ACGGTTCTAC GGATAATTCT GCGGAAATTT GTGATGCTTT TATGATGCAA GATAATCGTG TGCGAGTATT GCATCAAGAA AATAAGGGGG GGGCAGCACA AGCTAAAAAT ATGGGGATTA GTGTAGCTAA GGGAGAGTAC ATCACGATTG TTGATTCAGA TGATA-TCGTA AAAGAAAATA TGATTGAAAC TCTTTATCAG CAAGTCCAAG AAAAGGATGC AGATGTTGTT ATAGGGAATT ACTATAATTA TGACGAAAGT GACGGGAATT TTTATI'TTTA TGTAACAGGG CAAGATNTT GCGTCGAAGA ATTAGCTATA CAAGAAATTA TGAACCGTCA AGCAGGAGAT TGGAAATTCA ATAGCTCGGC 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 CTTTATATTG CCGACATTTA AGTTGATTAA AAAAGAATTA TTCAATGAAG AAATCGTCGC CGCTTTGATG AATCGTCTTT ATAAACGATA AACGGAATTT GATCTTTCCT GGATTGTGTC TITGGCTGGT'r AAAAGATTAT AAGCAAACTTI TA'rTTGTTTC AGATTAAAGT TAAAAGAATT GTTATTTACC AAGATTTATT CGTCAATAC TATGTATTGG GAAGTCGTAT ATGAAGCAAC TATGCATCGC TTTTATCTTr' ATCTCTATCT GTATAGAAGA CGTTCAGGAA GGGCA-AGAGA TATTGT'rGA-A GTGTTTTCTA TGGATGTCTC CGTTCTGCGT AT'rCGATT'TG TAGAATACCA TCAATTAACA GATACTGAGG TGTTTTTTGA TGCAGAACAA AGAAATGGTA ATATCACAAA CAATGAAGGT GAGGGGAGTG AGTAAAAAAA GGACTATTTA CCTCATTTCT TATTCTCCCT TTTGTTGACC TAAATACTAA
TTCACTTTTC
TAGCCTCTAA
GCATCATGAG
AGAAAATATC
TCAATCTTTT
AATATAAAGA
AAAGTTGAAA
TTTTATGACT
ACTGTTTATC
AGATTTTTTA
Page(s) ~l were not lodged with this application ACAAACTCCI' CTGATTATTG TGATACACTT GTAACAACCT GGAGGTTTGG CTCACTACTA 'rTTATACAAG GAAGAGCATG TAAGCTCTTT ACTAAAGATA TAAGCGAACA GGGCGTCTAA TGAAGCCAAG GAACATGTCA TCAGAGTC'rT TTTAAGATGT GGAAAAAGAG TTTATTGAAA GAGACAACGG ATTGAC'rATC 588 CGGGTTCTCC TCGTGTTCAG TCTAATTACT ATGCGATCAT TGGTCGAAGG AGAGGATTAT ATCTTrAAAG AGGAGAAAGA AGGGGGCCAA GTCTGCTGAG AA'ITrCCTAG GGATrGATAA CGTCTTGC TCGTCATTTG GTrrA'rGCGA TTCGAGCTcA AGGACrArAT CA'rTCGTGGA AAPGAGATGG TACTGGTTGA TGGAAATGAC TAAACTTCAA GGAGGTCTCC ATCAGGCTAT 23040 23100 23160 23220 23280 23340 ATCCTITGGAG TACATCAAGC CTCAGTTGAA ATGTCTCAAC TGTCCTAAAT GCTAATAATG GGGGGCTGTG ACAGTGGCTA AATTATCTCC TGAGACGCGG GCTATGGCCT TTALATAAGAT ATCTGGTATG ACAGGOACAG CTTACAATAT GTCTGTAGTA CGCATTCCAA CAGATAATCT ArATATCACT TTACCTGAAA AATACCATGC TAAGGGAAAT CCTr'rACCG TCTAT'rCGTC TCTCTTGTTT CGTGAAGGGA CGGCGCGTGA GGCTCAGATT ATCTCCGAGT CCTCTATGGC AGGACGTGGT ACGGATATCA
S. *5 5 0 9* AGGAGTCGCA GAGCTTGGGG GCT'rGAT'rG' GATCGACCTA CAAATTCGTG GCCGrTCTG TTTTGTATCC TTAGAGGATG ATGTTATCAA GTACAAAGAC TATCAGGTTC AAGATATGAC CCGGAAACTA GTCGAAAAGG CTCAGCATGC TCAGACTCTG GAGTATGCTG AAAGTATGAA AAATCGTCTA ATAGATGGTT CTCGTGACTT ATATACAGAA GAGGTAGCGG CTGATCACTA TGTGACCAAT ATTAGTTTTC ATGTTAAAGA AACTGCAGTT CGTAGCTTTA TGAAGCAGGT ATTACTTAAT CAACATGACT TATATGAACA -TtATGACAAC TGGGTAGAGC AGGTAGACTA TATTGGGACT GAGCGGATGG TCGTCAGGGA GATCCTGGTA GAAATTTGGT CCATCTTGGG TCAACCGGAA GTATTGAAAG CAGTGATAGT GCTGGACGTT TATACAACGG GATATAGTCT AGAGGArGTT GTTGTGGATA TGCTAGTCGT GAAT'rATTGT GGTTCCAGAT TATATAGATG GAT'rGATAAA GAACTrTCTG GTTTTTACGA CTTTCACTGC CGATCACCTA 23400 GrAAGGTCGC 23460 CCAATCGTCC 23520 AAGTGTATGC 23580 TTTTTGTAGG 23640 TTGCCCATAA 23700 CAGGTCAGAT 23760 AGCTTGGTAA 23820 AAAGTCAGCG 23880 rCAGTAAATT 23940 TGCATAAAAA 24000 GTCGTAAATA 24060 CAGCACGTCG 24120 ATAAAGAGAG 24180 rCATTGAGAG 24240 rTCACTTrTAT 24300 rAACTGACAA 24360 AAAAGAAAGA 24420 TTAAAGCCAT 24480 CTATCGGTGG 24540 ACGCGGGCTT 24600 LGGGGCTGGT 24660 NAATATGACA 24720 'GCTCAAGCC 24780 55 5 5 5
TCTACAACAG CTATCCATGG TCAATCTGCT AGTCAGAAAA ATCCAATCGT AGAGTACTAT CAAGAAGCCT TGAAGCTATG AAAGAACAGA TTCATGCGGA TATGGTGCGT AATCTCCTGA TGAGGTCACT CCAAAAGGTG AAATCGTGAC TCATTTTCCA TAAAAGGAGA ATTTACAATA TAAATTTAGG AATTGGTTGG GCTAGTAGCG GTGT'rGAATA TATCGTGCTG GTGTTTCG ATTTAGCCC A'rAATATTCA ATCTGGCTT ATAATCATTT GA'rGTCTTGG CTTACTTTGG CGTGTATTCT '1?rTTGACCA CACI"MGTTC AACATGCCGA TCTTATACGC GTTATTGTAG CAACGAACTT TTTATAATGA AAGGAAGAAG TTTATCATT GCCTTTATGA AATCTTTGAA
GAAATTAAAT
GCACT'rAAC.A
CACAGATATC
TGrGAAGAA
AGATAAGTTT
GTATGrTT
CGAGAT=
AGACGGGACT
CAAGGATAAG
CTGTCCTCTA
GCCAATATTG
AAAATTGvCAC
AGTCACAGAG
cGrAACCTGTT
AAGGGAACC
GCTCCCAAGG
CCAGTCTATG
ATTT'TCTA'rG AGTTTATCTT TACAGATATG GTTTTGATGA TAATCAGGTT1 CTACTAGCGT CACAGTGGAT AAAAAAATGG CAAGGTTTTA ATTrCGGTGA TGAGAACAAG TGA'rrCGGAA GGATTACT-TT ACAATGTTGC AGCTTATAC ATATCT'rGAT GAATCAAGGG GAAACCAAGC TTTTGTGCGT TcATTrcTcGA TAGGGAGACA ATCTAGCGGT AGT'TGTTCA'r TCCTTTGGAA TAACTATTAT TCGTGTCTAC TGATAGACAA ATCAGCCAAA GATTGTTACC T'TTGAATAAG 'rCTGATrTTGG GGTATTGGAC AGGTTGTGTT TGAOGAAGCA CAGACAGCAC GCGGAGCATT ATAGTGAAAA TG;CTACAAAT GAGGACTATA GACTATCAGT TTACCAATGC AGATAAGGTT GACTrCTTTA AATGAAGTTC TACAAGAGCA ATTTGCCAAA TATACTCAGC A'rrCCTGTAG GCAGTATTGA TTCCTTGACA GATTCAAGTC AAGGGCGCAA ACCAT'rTTCA 24840 24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520
TTGATTACGG
ATTGAAGCTC
GATTCTCTGC
GGGCATGCGG
AGCGAAGGAT
TTTGATGTGC
CTTCACGTCT TGCCAAAGAA AAGCACATTG AT'rGGCTTGT ATAAGGAGTT ACCGGAACTA ACCTTTGATA TCTATGGTAG TTAGAGAAAT TATTGCAAAT CATCAGGCAG AGGACTATAT AACTTTCGCA GATTTATAGC CAGTATGAGG TCTACTTAAC TTGGTCTGAC CT'rGATGGAA GCTATTGGTT CAGGTCTACC CTATGGTAA TCAGACCTTT ATAGAGGATG GGCAAAATGG CCAAGTTCAT CTGACCATGT CAATTGTAT C AAGAAAATcG.
GGCTTCTTGA CCAAAGAAAT qATTGAACTT TATGATAGTT TACTGGTCT'r TCTCAACTTG GCT'CTCCT NrTACCTATT AGTTCCCGTT TCAGATTTT TGTGACGCAG GAGAGGGCTG AGAAGACCAA ATCAAGCAAG CTTATGCCGC TTTGGAAGCT ATGCGTGCCT ATTCTTACCA TTTAGAAAAG TGGAAGAAAA CAGTAGAGGA ACAGTCAAGA AAGTCGAGAT TTACATGAAA GAGTGGTCAT CCATGCAGAT GG7rTTCTGC ATCTAGGTTA CGAGGATGGA AAACCTCTCT GGGAAATTTr .AGGAGATAAT CAGTCTGCTT TCATTCATTA TGCTGATGGA ATGCAGGCTC
GAAAGCTGTG
TGGTGGAGAA
CCAAC'TCAAG
GGCTTCTACC
TCTAATTGGT
TTATTTGATT
TAAGATTTGT
AATTGCAGAA
GGTGCTCCAT
GTCTAGGCGC
CTGATGGTCT
ATTTTAATCA
GTATTGAAGA
GCTTGGTTAA
ACAGGTAGAC TGGAAAGACC TAGAAGGTCG AGTACGTCAG G'rrGACCACT ACAATCGC'TT CGGAGCTTGT TTTGCTACAA CGACr'rATAG CGCAGATAGC GACCCGATTA TGACAGTr'rA CCAAGATGTC AATGGTCAAC AAGTTTTACT GGAAAACCAT GTGACGGGTG A'rATCTTAT'r GACTCCCA GGTCAGTCCA TGCGTTACTT TGCAALATAAA GTTGAAmTA TCACCTTCT'r
TTTGCAAGAT
CT'rGc'TTCC
TCTCTATGAT
TAAGAAGATC
?TGGAAATAG ATACCAGTCA GC?1'ATCTTT TTCCATCATC CAGATAAATC 'rGGCTCGGAT GCCATTCCAG GTAATATGCA GTrGATTTTG ATCATT'CCAA ATAAGGCGAC TTArGAGCGC GAAATACCAT GATCAGTTTG TGCACTTGGG CCTAAGACGA GATGCCTTAA TCTTGACCAA CGCAGGACCC TTGCCTGATG TCACTTTCCC GCTCT'rAGAC ATGCT'N'GCT ATCCTAATGT GATTCAGGAG CTGTATCAAC TGTCGGATAT GCTACAGGCA GTGCGTCAGG CCTTTGAGCA GGTGCACAAT AGACTTTATA TCGCTCCAGA TT'rGGTTGAG ACCAT'rAAAT TGGCCCTTTC CAAACAAGGC CAACATGCAA ATTATGTTGA TGTTTTAGGA GGCTAACA'rG TCAGAGGAAG
TTATCATTAC
TTCAGATCAG
TATTGCAGCG
GGCCCTTTAC
TTACTTGGAT
CAATCTCTTG
CCATCTAT'TT
AGATG'rTGAT
CTTGGTGAGA
ATTTATTTTA
AATACTCTAG CGACTCCTTT GTCTTGGTAT GGCAGGAACC GAAAGTGATA ATGTGCGTAC GCTTTAGAGT TAACTGACGA CAGTTCAAAC GTGATAATTT ATTGAGCAAG TAGAAGCAAT GTGACAGAGA TGTCTTCTAA CAGAACGCTA GTCCACAGAA ATAAACCACA GTAATGAGTT ATTCTTGGCT TTAATCAGAC GAAAGTAGTG AAGTTGCTGC CAAATGCGTC AGGCACTTGG TATCAGGAAA CCATGCAAAC 26580 26640 26 700 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28200 28260 28320 CAAAGACGTT GAAGGCCGCA TGGAAGAGTT GAAACAAAAA GTAAGACTITT TTCACTTTTA TGGGAATTTT GAGC'rAGATC CTTGCTTTGA ?TGTATTGGT GCCACTAGTA ATATTGGTAA ACTTTATTGG TGAGT'rTGGC AAATAAAAGA AAACTTCAGA TACTCAATGA AAATCAAAGA TTTGAGGTTG TAGATATAAC CCCATCAAGA AGGAAAAACA AACCCGAGGG GAAAAGATTA CTGGGTTTGA TGATTCTGAT TGGTTTGCTC TTTACTTTGC TATGAT'rGAA ATACTAATTG TTTTAGCTAT TATCCTATC'r AACTATACAA CCCCGTCAAA ATCAACTATT TTCCATGGAT ACCAAGCTAC TGGCAGAGCA ACACCTTGGT CAAGGTGCTC T'rTATT'rATT CTACTATTAA CCT'rTATGGT GATTAC'rrAT TATT'CACCTT TTGTGGATTG GTCTGAAGTT TTCTTT~TTA GCAAACTAGG AACerAGCCG CAGGCTgCTC AAAACACCGT TGACGAAGTC AGCTCAAAAC ACCGTTTTGA GGTTGTAGAT AAAACACCGT T'N'GAGGTTG TGGATAGAAC TGACGAAGTC GGTTGTGGAT AGAACTGACG AAGTCAGCTC AAAACACCGT TGACGAAGTC AGCTCAAAAC ACCGTTT'TGA GGTTGTGGAT
ATAACTGACG
AGCTCAAAAC
TTTGAGGTTG
AAGTCAGCTC
ACCGTTTTGA
TGGATAGAAC
591 AGAACTGACG AAGc tCAGTA ACA'rATATAC AGCAAGGCGA CGCTrGACGTG GTTTGAAGAG TATTACTGTC TATATrTTTTG GTAAAAATCA AC=TAC?r GGATGAAGGT T'rTGGC~rCA CGTAGG.AGTT GAAGAAGGGT CGGTTCCGGA AGACTTCCAG GTCTGGCTCA GTTGACCTGC GCGGCAAGGT CACAAGTGTT TGTCTCATCT CAAAGTAGTG TCCAAATAGA CCAGTCI'GAG GCTCGGTTGC GTCCGTCTCC AACTTTTCTT CCACCAGCG'r AGATTAACAA GCAAGGCAAA AGGTCTGGAG AGGTTGACTC ATGCCAATTT CCCAGCCCAT AGACTCCAGA TATGAAATCC AGCATAGAAA AAAGACGATT CTCAAGAGAG CGATAATTCC AGGCAGGCAA GACAGGTAGC ATTTCCTAAT AAGTTAGAAT GAGTTATTTT TTACGGGTTG AGAGTTAAGG GCTGTTTCAC AATTTIGTATG GCAATATCGT GAGGGAACCT CCGGCAATAC GACAAAGGCT GTGGCAAAGT AAGGTTAATG GTAATCGCTA AGAA'rAGGTA TCTGGGTTGA AGTCGCAGAA CTACGAGTGA TAGAGGGTAA GGATTGCGTC AACAAAAAGC ATAGTCGTTA GACTCCCI'TG TCAGAAATGG GGCGCGGGTT TCAAATTCTT CTCTTGTCTT ATAACGTTCA Ar'rTCATCTA GCAAATCAGA AATTTTTGAA AAGAGTTGCG CTAAGATCAG AATCTGTTGG GCCATGTTTC TCAGGATACG GATATGGTAG TCTGTCTGGT GAAAGAGGTG GGCTTCTTTC AAAAGCGTG;T CTAAT'rCTGC TCTGGATAAA TAGTATTTGA AGCGC'rGGAG GTGGTAGTGC TGGA'rTTCCT CTTCTCGTGA TCCTGTACCA ATAGCAAAGA GAAGGAATC TTGAACCAAG AGATGGC 'AA CCAAAACAGT
GGGCAGACTG
AGCAGGATTG
GCTTTrCACTG- AC?1'TGTCGC
GTCAGAGTGA
TACCAGCTGT
GATA'rCrrr'
AGGCATATAG
ATTGACTAGA
GC'rTGGTGTG CTTGTAGGCT AAAGGAACGT AGAAGGCCAG ATAGAGGCCC GCTCAAGTGA AAAGCTAGAA CACCGATAGC CAGAGCTAGA GCGAGCCAGT TTAAAGTAC TTCTACGCGT ATCAGATAGG AGCCGAAACT GCrGACGAAA GATTGAGAAA AT.ZAGCAAGC TAAGATGAGC 'rTGGTCGTAC GTTGGCTAAT AGACATAAGA AAAAGCGTAA AAGACAAGAC ATGAGCAGGC TTGCCTTGAT CTGCGTATTC GGCAACGGCG GTAAAGAGGA CATCTGTAGA ATGAGTCTTG GATGACACCA ATCACAAAAC CAACCCCAAC TAGAAATACC GAAAAGGCTA CAAGCAACTG GGATAAGAAG CTGAAGCATC ACAGGATGAG ATAGCTGCTA CCACACTGAG CAACAGGAAT TCCAAGAGTG TTAACTGCAG CAAGGGTCAA CTCCAGCCAT ATI'GATAGTA GAACCGAGTG GGATAGAAAC GTCCAAGGTC ATGGCAGAGT TTCATGTTGA CAGGAATGTT AAAAGGCTGT CACACCGCTG ACACGGAGGC AGTTCCAAAC TCATAAAGAA GAAGGCA.ATC AAAGGGTTGA CCACAGGGGC CTAATAGAAC CAATAAAATA CCGTAGTTGG CAAGGC'rTCC TTTTAAAAAC AAGACCAAGG ATTCCAAATG GAGCCAGATT 28380 28440 28500 28560 28620 28680 28740 28800 28860 28920 28980 29040 29100 29160 29220 29280 29340 29400 29460 29520 29580 29640 29700 29760 29820 29880 29940 30000 30060 592 GATGATCCAT 'rCGACAATrT TAGAAGTCAC GTCAGCGATA GTTTAGCA ArrCTrGACT A7TTTTACTG GC1-rCTCTCA TAGCGATTCC AAAAATGACT GCCCAAGATA AGATTCTAAT ATAGTTAGCA GTAAGCAGGG CGTTGACTGG GTI'GTCAACC AGTTNGAGCA AGAGG?!GCT GAGAACCTGC CCAATCCCAr CTGGTGGTGC AATTTCAGTA TTGGCACTAT TTGGGGTAAT TTCAkI'AGGG ACGATGAAAT TTGCTAGTAC CATAGGA'rAT ACAAGAAAAC AACAGTTTTC AGCTACAAGA GCAGCGCGA AAGTCCCTAT
ATATTGCTAT
AGCAAAGACT AGGATAGGAG
CTTGTCCCTT
CAACAGCTT
GAAAGGGCAT TGGCAACGAG ACGAATAAA'r CCTCGAGTAG CCCCACAAAG CATACCAATC TGATT-CTTTT CATAATAATC GACAAAATCA AGAATTTTCT AATATGGTAT AATGAAATAA AAAGCTAGAT GTGACAACCA GC'IrTAATT GTATTTTATA CAAACCTTCT CTAAAAATGT ACTAGAAAAT GTGTTTAArT TTTAGGNTTG CCAGTTTCTA TATGGGACCC CAAGGCTT'rC TCAACTGGAT GTGGGAGATG GGTTGTCAGT GTGGGAATTC TGTCCCTAAC CGAAATATCA TTTTAAGTAA TTGTGGTAC AAGCAGTTGG GCAATGGTTG GAAGACCTT'r CTGAAATTTA AACGAGCGCG CCCACA'rGGA GTCTTAGACG TAAAAAAGGA GAGCATCGTC GCCTGATTAC
CCCAATCCCT
AAGATACGCT
TCCTTTTTGT
GTCTATTTTTT
AGGAGTTTTA
TTATCGAGAA
TTGCTAAAAA
CTCGTCATGA
ATACGCTATA
GTTTGCTGGC
TGTCTGATGT
GAGAGATTAG GAAGGGTCAG TGACAAGGCT TGCCTTATTC GTAGTGATTA TGAT'rATAGT TGA.ATATTTA TGGAGAATGA TATGCAAAAA TTTATTCAGG TATTCTAACC AAGGTCATTT AATGCTTCAT ACCATGGTGC TGTTGGACGC CAAAAAACCA TTTCTflTTTA CTCTACTGCA TGGAGCTGGT ATTGCTGGGG CATCAATGGC TTTTTCATCC
TTGATGTTGG
TAGACCTCCA
TCCTAGGATT
CAAGCATGAA
ATAAATGATA
GACTGATGAA
CTTATATTGA
CTCTrACT
AGAGAATTGT
TCTCACGTT'r
TTTTGTCGAT
TAGCGATTGG
TCTTTGAACG
TATCGGGTAA
CCCTTCAC'rr
AGACCTGTTA
GGTTT'rAGAA
GAATAAACTA
30120 30180 30240 30300 30360 30420 30480 30540 30600 30660 30720 30780 30840 30900 30960 31020 31080 31140 31200 31260 31320 31380 31440 31500 31560 31620 31680 31740 31800 31860 AGGTCGTTCT GACAAA'rGGA CCGATTACTG GTACGACACA GCTTCGTAGC GAGGAGCAAG CAGTTG'rTAG CAATTTCTCA CGCACAGACT AATAGAGGGA GTTTAATAAG GAGAAAAGAT TACCTGGATA GACCTAGACC TAGGAAAGTT CGGTTTGGAC AAGGAAACCA TTGAATACGC ACTGGATAGA CTACCACCGT GAAAGTGAGA CGGTTACCTT TATCTATALAT CAAGGCCTAC TATGAGACTT TTCCCATGAC CTTTATTGTC CATTAGTAAT ACCAAGAACG CCTATGTCAT TGAACAGATG
ACTCGTTATC
GAAATCATCA
GTCAATGACC
TGGAGAACCA TGACACGCTT TCGATTTATA AGTTTCTCTT TCCA GTCTG GCAATGCCTA CTATCCTGTC ATTGAGCAGA TGGACAAGAG TAGGGATGAG TCTTGCGCCA GCGAACTACC AAGAAAAACC TCTTTGTCCT GTCTGATTTG GAGACTGGTA TGGTTTATCT GACGGCAGCT ATTCAAGGTC ATGCCTTGTA TCGTAGTTTT GCCATGATTG AGGCTCATCA GCrGGTATCC CAGCTTTCAG CCTCTTACAA CAATATTCTA TTGACTATCA TTCAGTCTT GCTAGCTGTT AATGTTCCCT TACCTTTAAC AGATGAGCCC GCAGGTT1'GT GGATTGTT'rr ATCCTTGTTA AAGGAGCCAG AATGGCGATT GAAA.ATTATA ATCTGACAGT CCCAAGCCTG CAGGCGCAGG ATACCCTCAT TGCT'rGGAAC AACCCTGATG ACCTrCGGGA CGCGGGTA'PT GGCATTATCG 593
GCCAAACAAA
GATGAGATTG
ATGACAGACC
AACAATAATC
TGGCAGTCG
ATCGGATTTT GTTAGAGCAT AGAGAGAACA GTTTGATGAT TAATCTCTCA GATTTTACAG TGAATGACAA TTTGACAACC TGACAGGCTT TTTCGGAATG CATGCT'rGGC TCTATATCAG TTTGGCTAGT CTAACCAAA-A TTGCGAAAAA AAGTTAAGAA
TACCAGATTT
GAATAAAGGC
GAACGCCAGA
TGCTGTGGAA GCAGTCTATG TGTTrTGGTC GATTTGGATA GATGAAGCAA TGGCTACATG 31920 31980 32040 32100 32160 32220 32280 32340 32400 32460 32520 32580 32640 32700 32760 32768 TAGTGTCAAA TAACACCAAA AAACGCGTTC .000 *0 0 4.4.
00. a C AACGAGCAGT TGAGAAATT'? GGGATTGATT ACGTTTACTG TTGGTATTGA CCGTGCTATG AAGGAATTCC ACTATGACAA GTGACCAACT CATGACAGAT ATACGAGCAG CCCACCGTGC TCAAACCCTT GGTCCAACAT GACTCAATCA AAACGCAGAT
GTGTTATG
INFORMATION FOR SEQ ID NO: 72:
SEQUENCE CHARACTERISTICS:
LENGTH: 14872 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
GGCCTTGAAG CCCTTCACAT AAAGGAAGTG GTCATGGTTG AGGGAT'rCGG TCAATTTTAG TAACCGAACT CGTGAGCGTC
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
CACAA AGAAATTGAG CGCGTTCAGc TGAGGATGCA CTATGATGCA AGCTACATTT CCAGT
CATTTGATGG
AAAAAATTTA
TTACCGTAGA
CTCTTCTCAA
GAACTGACAC
GATTTGTCTT
GATATTAAGA AAGGAGAI'T TCATGACACT TTTAGATGTA AAACACGTTC TAAAACACGT TTTCAGGGCA ACCAAGTAGA AGCCCTCAAG GATATTCACT AAAGGGTGAC TACGTTGCCA TCATGGGTGA GTCTGGTTCT GGTAAATCAA TAT'TCTAGCT ATGTTGGATA AACCAAGTCG TGGTCAGGTT TACTTGAATG CGCAACTATT AAAAATTCAC AGGCTTCTAG TTTCCGGCGT GAAAAGCTAG CCAAGACTTT AACTTGCTAG ATACTCTGTC TG'N'AAGGAC AATATCTTGC
TTCCGCTTGT
CTGAGAATCT
AGAAACAGCG
ACGAGCCAAC
AAATCAATGA
GGGCCAAGCG
AGAAGACAGA
AGGTGAATTA
CGCAAACTCT
TTTTACTCCC
GCAACACTTG
CCAATAGTTT
TGGAGAAGCG
CTGTTGGAGC
TCAAACrAAT CTTGTCAAGA AGACCTATAA GGGTATTAAC CAATTGCAAG TGTAGCAGrA GCCCGCGCCA AGGAGCCCCT GATTCCAAGT GCGTGGGCAA ACCATCCTCA TGTTCTCTTT ATCAAAGACG GCGTCAGATG TTCCAAGAAA GTATGTTTCG AT'rAACCAAT ACTATCCCTT TGCACTGGCT TAACCTTCAA TCCAAAGATTr GATTTGGTAT GT'rrGTCGTT TGTCATGAAA AACCGTTCCA CCATCTAATC AGTATGACCT GGGTATCGGT ATTGGAGCCT GAAACTGAAG GTTGAGCTGG 594 CGGAGATGAT GAAGAAATTG AGAAGTACCC TTACGAGATT TCATCACAGA ACCTGAAATT CATCTGCAGC CTTAC'rTGAT TGGTAACCCA CTCAACAGCA GCA'rrCTTrA CAACCAAATC
GTGGTGACAG
'rCTGTGGTC
CTCCTTGCGG
GTCTT'rAATG
GCTGCTAGCA
TACCGTGGAG
ATGGCAAGCG
GATTAAAAAC
CACCTATCTC
CACCATTCAA
GTCCTCTATG
TCTCTGATAC
AAGTTAGCGG
GTTCTCTTGG
CCGAAATCC
ACCCTTGCGT
CTTGACTGTC
TATCGAACTT
CAGTCACCAT
GTGGAGGAAC
CAcCATTATC
5* S S AGGAACTGGG TATATATGGC ATGTTAGGCT TTAAGGAGTT AGTGGTA'rrT GGGATTCTAA TGTTTGACAA GTTAATTTTC GCTTTCCTGC T'TGCTACCTT CCAAATGAAT GT'rGTCATTG 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 CAGTACTTGT TGTCTTTGGA TTGATTTTCC T CGCCCGTAT GAATGCCCTC CAGCTCTCGC TAGGCCTCAT GTTCCTGAA'r GTGAGAAAGC AAGCGGAGAG GCTTCCT~ACC TCTCCAAACG ATTCTTGGTT CCATAAGTTT AGGGATTGGC CCCTTACGGT AACCGATCCT CTTACAGCCC TAACAACTTT CTTCCTAGCT TTATCTTTGG TACTTATCTA TTGTT'rAATG AGAAAAACAA GAAATACTAT TACCAACCTA TCCGTATGAA GAAAAATGCG GTTGGACTAG TGGTAACCAT GTCAGCAGCG ACAAGCATTT TAAATCCTCZA TGATTrTGGG. GTTTCAGGGC TCTTGAGCCA GTTTGCAAGT GACAAAGGT'r AAGTAACTT TGGTATTGCA AATCAAGAAG AAAACCGTGT CCAACCCACA ACAGTTNTCA TGACTGGTCA AAAACTGTCT CTATCAGGAA GACTGAAAGG ACAGAAAGCT CTAACTCTAA
CAGGGATTAC
ATAACCTCAT
CAACCATCGC
TCAATTCCGC
AAAATGT3'GA
AGTCTTCCTA
ATCTGTTTCC
TATTTTGTCA
AGAAAGCTTT
AAAAGAAGAT
GCTCTTCGAA
AAAAGAGGTC
TATTATCTTG
GTTTTGCTGG
CAAATCTTAA
AACTTGATTT
ACAATGGTTT
AAAAAAGTTC
TTGGACAAAC
ATAGTGTCAA AGAGAAAGAA GTACTTCGTT GAACCAAGT'r AACTAT7=? GAAAAAGGAC
TGGTATTTGA
ATGAGGTCGG
ATGATCATCA
CCAAAAAGAT TATGAAAATA 'rCTCTrTGCC AAAAATGACG ATTTTCTGTC AAAGAAGAAT TTAATAAAGA TTTCATTGTG AACCATGTTC CAAATAAGTT TAATATCTTG ACTACTGATT 595 GATTTrACAAG CCTTTTTGGA TCAATTCCCA GATrcGGccA ACAATTACCT TGTCCT TCTATAATCA G?1=ACGGT TCGCTGAGGA GTATGAAAAC GCTA'IGTTTA TGGTAGCAAT GTGTCTTCT? TATCGGTATT TCTACTACAA ACAAA'rrCT AAGTCGGTT'r GGACCAAAAG TCTTCCTTCC 'N'TGCTCT'T TGATT'rTAAA AGTGATTGGT TCTGCGCTAT CTTCCTCATC GCAAGATrGT GCAAA'rGTAA TCTA.AATGCr GAAAACTTGT GGTATGAATG TAAATGTCAG TGAAGAAGAA 'rACCTCAATC AAT~rAATGC TC.AATTAGAC CTAGCAGATG CrAGrrCTCA GATGAGTGCC TTCCTATCCA TTATCTTTAT GGTCGGAACT GAAGGCTACG AAZACCGTGA ACGCTTTATT
CAACTCA.AGG
ACAGAAGCTA
CTCTT-rGGTG
GTTC'TGCTCA
ATCTTGCAGA
TTAACTG?1-
ATGCTTAGCC
ACCTTGTCTA
CAAATCAAGC
GCCTTCATAC
GTACTCGATA
GCCTATGTGC
AAAAGATACC
AAACCATCAA
ATCTCGC=~
CGACTATGAT
TGAT=CAT
TCGACTTCAA
CAAACAGGT-r TGCCTrACCAT
GTTGATTGTG
GATTACTTCA AGAAGTTATC AATCGAGGTA 'N'TCTTGTAT 9*
CAAGGTTCCG
CAGTCAGAAG
GAAAAATGGT
AAATGGCAGC
CCAGAGCCTT
TGAATTCTCC
GGTC1'AGCAA
CGGATACATA
ATAAAAAGAT
TGGAAATAAT
AATCATAGCT
AAGGGCCACA
CACTGACAGC
CCGAGCAGGA AGGTAACTCC CATGGTCAAG AGACCAATAG GT'rTrGGTTG GGGCTTTTCC AAGTCTAGCA CTTGTGTAAC CCGACAATAA GGACCGTAGC AGGGATGCGG TAATCACTTG ATTGGAGGCA AACTTCTAAG GAAAAAGGCA ACGAAGCTAG 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 GTGCCAAGGA TTGGTAAATT CT'rCATACTC ALATCCCATAT TTTTCCTCTA GAGTGGATTT TTAAGAAAGA 'rCTATTGGT CAAGAGTTGG GCAGAAGTTT ATTTTGGATA TAAGCACCAT AGAGGGATTT T'rTGGCTAGT TCCCTATCTT GAGTTTTTCT CGCGAAACC CAGC'rTCCTC GGTATCTTTT GGAGTTGAAA TTCTCCACCA GCCATTGAAA AATCCAGATA TTGGTCGTGG AGGCACCAGC TAAGATAGCC GTAAAACCTG CACTGGCAAC TCCGATAACC ACACCAGCAA TCCATCGT'rA GCATCAAGAA CACCCGCACG CAGGATATTT AAACGACCTG CAAAA'TTGA ATCAA'rTTCG TGATTTGTTT CTGACGCTAA ATTTCAAGTT CAAGTTAGCC ATCAAGAAGT CTTCTCTGGG TGACTTGTAG TCCAAGCATT TTTAGGATA GTTGTTAATC CACTTTTCGA TGAATGCGAC TTCTTTGGGA GTCATTTTCT TGGTTCCCT-r AGGTAACCAT CTACGAATGA GCCTGTTGTG ATTCTCATTA GTTCCCC~T'1 CCCA.AGAGGC ATAGCGATGT GCATAATAAA TGTGCTCC'rC AGAAAATACA TTAGACALAGC GATTGAATTC CGTTCCATTA TCTGCCGTGA TGGAAAGAAT CTTGTGTTGT 'rTTAAGATGA GTTTAGACC CTGA'rTGACC ACATCAGCAC TTTTATTTGG AATCAATCGG ATGA'rCTGAT GTCTACTTTT TCGATCCGTC 596 AAGACAAGCA AGCAGTAGTT TTTCGCTCTC GTAAGTAGAA CTGTATCAAT CTCATAATGC CCATTCTCCA AGCGAAAATT GATAGCTTCA AGCCGCTGTT CGATGGA'rTG ACCAGCAGGT TTrAAAGTTGG TGCTGGCCTG TTTCTAAGC CCTN'TCCTT TTCTAGGGTA AAGCAGATCC TGTTTGCTTA ACCCCAATTT TCCATGATGA ATCCAATAGT AAATGGTTGA AATTCCCACG TTAACCCCTT TAGCCATCAC CATCATTTCA GGCGAAAATT TTTGGTTATG ATAGTGGAGA ATCTTTTCCT TTAGTTCCTT GGTCAAGCTT GATTTCTTGA CCGAGCGCTT GCGATTGTTT TCATAAGACT GTTGAGCATA GTCGGCAGAA TAAACCTCTT TGAAGCGCCC TNTTCCAAGA CATTGTCGGA CTGTCCCACG AGAGAGGCAA TTTCTCTATT CTATCGATTG TCAAATGTTC 'rTGTGTTTGT GGTTGAACAA TATCGGCTAG TTGAACCATC CTTTCATTTA CGAACATATA AAAA'rCTATT TCTAACAATG ATAT'rTrTGT TTTATCAA AAGCTfTAGAA AATGAGATGA GGACAGGCAG GCTTGAATGC ACCCTACTTT TATAAGTTGA TGAAAGCACA CGTCATCTTG GTTAGGAAAC ATCGGGAACA CTTGATTTCA GTGTGGATAG TTTGACGAAC Tr'rTCCAAGC TGATTTCCCT 
TCTTTTTTCC ATCTTTCGAT 'TAAGCGACOG GCCTTTTGTA GTATAATGGT TTTGCATCTC TGTGCCTTrC CAAGTATAAC ACAGAGGTGT TTTCTTATGC CTACA-AGAGC TAATI'TTTAG GAGGGCTGGG TGGCTAACTr CATTATAGAA GTAAAATGAA ACAAGAACAG TTTTAGAAGC AGAGGTGTAC AAAATACTTr ACAAGTrT TGTTTTCTAG CAAATATAAA CGAAGCGTGG TTGAAAAGCC TGTTAGGACA CTTGTCCTAA TGAAACGATC AATAAAGTAC GACATACTCA ACAGAAACCA AACAAATCGA TCAGGACAGT TAT'rCTAGTT TCAATCTATT AAAAACATGA TATAGTAATA CCCGAGTAAA AAATGCCTAC ACATTATTGA 'rAGGGTTAAA TTCATAAATT TTTAGTGTGG GTAATATTTG C'rACTAGAGA AAATAAACAC GTCAGAAGAT CCTTTAGTTC CTTAGTTTTC 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 TGCAGAGCAG GTGAAAACCT GCTCTTTTTT CATGAGTCAA ATAAGGTCCT AAAAATATTG AAAGGAGTAT CCAATTTATG AGGCCI'TGGT GAAGTTACC GGTCACAAkGC GTGGACGGGG AAATCCAGAA GGCATTGA1'G TCAATTCGAT GAAACCTr'rG CGTGATGCAG AGGAGCTGGC TGCAGATGCT GGTGGAACAA CTTCATCGGT GCAGACTATG ATTATTCTGC CACGAAATGT CCATAAATCT ATTCCCATCT ATATCGAGAT GAGTGTAGAT AATGACCGAG TAGCACAGGC CATAAAGGAC GTTTTGAAAG AGTTAGATCA AAACCAAGCC AAGAAAAGGA TTGTTCCCTT TGATGTTCCA
CTTGTCGAAC
GATAATTTAG
TTTGGAGCTA
ATTCTGGCAA
GCTATCAATG
TCTTAGGAGA
GCCATCCTAT
GCCATGCCTT'
CCTGCAAGGC
CGTTGGT!'CT
AAAATGTGTA
TTCGATTATT
TCTAATGATT
AGGAGATAAG
ATGTGGTGCC
AGGTCT'rGAA CCTAAGA'rTG GTATCGCTTT CATCCAGATG CTAAGGCTAT CCTAATCAAC AATCCTACTT ACTACGGCAT CTGTTCAGAC CTAAAGGGGT TGACAGAAAT GGCTCATGAA GCTGGCATGA TGGTTTTAGTI AGATGAAGCC CACGGAGCGC ATTTGCM-1-r CACTGATAAA CTTCCAAT CTGCTATGGA TGCAGGGGCT GATATGGCAG CACNTTCCAT GCATAAGTCT GGTGGGAGTT TGACCCAAAG CTCCATTTTA CTTATCGGGG AGCAGATGAA TTCTGAATAC GTTCGTCAGA TAATTAACCT GACCCAGTCT ACATCTGCCT CTI'ACTrT-rT GATGGCTAGT 1-rGGATATTT CACGTCGCAA CTTrGCCCcTT1 CGTGGTAAAG AGTCGTTTGA GAAAGTCATT GAGCTATCTG AGTATGCCCG CCGTGAAATC AATGCTATCG GTGCCTACTA TGCCTACTCA AA.AGAGTTAA TAGACGGTGT T'rCGGTTTGC GATTTTGACG TAACTAAGCT GTCAGTT-TAC a. a a. a.
ACTCAGGGTA TTGGCTTAAC AGGTATCGAG ATTCAGATCG AGTTTGGTGA TATCGGCAAT ATCCAAGACA TCGAGCGCTT GGN'GGTGCT GATGGAAAAG ATTTGATAGC AGGAGAATAT GAAGCCTTCT ATTCAGAAAG AAAAAGTTTA GGAGAATTTG TTATGTGTTA CCCTCCAGGT ACACGAGAAA TTGTCGACTA TATCCAATTC ACGGAAGATC CAGAGGTCAA TCATATCAAC GTTTATGACC TCTTGCGAGA CGAATACGAC ATCTTGGCCT ATATTTCCAT CGGCGACCGC CTGGCTGATA TrAAGAGACT CTATTCAAGA ATTCAGCCCC AGTTAGTGCT GTCTCCGCAA ACTTTCATG ATTCTGTTGG ACAGGTCTGT ATTCCTATCT TGGCTCCTGG TGAACGCATT GCCAAGGAAC GTGGTTGCTC CCTCCAAGGG GTTATTAAGA GAAAGACAAA CTATAAGAAA 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 AGTCAATAGT TTATCTAAA CTATTTCTTA 'rTTCAATTTG ATGATTTGGC GATGATTTTA GAGCACGGCA AAAAGCCCTT GAATTAGAAG CGGTCAATCG CTTAATTTCT ATCAGCTTAT CAAATCC'rGC CTCAAGCC 1r TTCTGAGGAT TAGGGTAGCG TG3TCAAGAGT TGGTAGGTAT ATTCTGAATG CTTTCCAACG ATTTTATCCA ACTCAGGAAA GATCATATCA AGACAACGAG TGTATTGTAC TTTCCAATCA GACTGTTTTT TCTTGAGACG ATGAATATGT CTAGCCAGTA TTTTAGTTC TACTTGCCGA TTATCGTGTT GAAATTGTTC ACGATTGGGG TCAGAAAGAA GTrrAAGAGC GATGCCATGA GCGTCTTTCT TATCCG=rT AGTTTTGCGA AGTGATAATG ATTTGGCAAA TTTCTTGATG AGCAAAGGAT TGTAGGTGTA AACTTTATAT CCTTGTTCAT GCAGGAAGTT CAGTAGATTA AAGGCATAAT GTCCGGTATT TTCAAGAGCG ATGAGACAGT CT'GGTTCAG CTGTCGAAGA GACAGATCTA AGAGTTCAAA ACCAGCTTTA TTAT'rTGAAA AAGTGAGTGG T'rTAAGAACA GTTTTTCCTG GAACATTCAA GGCTGTAACA TCGTGTTTAT TTTTAGCGAC ATCAATGCCC ACATAAAGCA TGGGAGTA'rC TCCAGATATA GTA'TTCAAG TCTACTGGGT TATCCACGAA CTTTTrTGCCT TGT1TACCTTA GACGAGATAA AACGTCTATG 598 CG'rTATCAAA CTCATTACCA ATTGAAACAA AAAACTGTGG TTAGAGCCTT TCGGAAATCG TCAAGCGATT GGAGGAAATG AACTAATCCA CAGTGGCTTA TTCCAAGTAT ACCAC'rTGGG CTNTGGCAGT AGCTAACTGC GCTAAATATA ATA'rAAGGAG AAATAGATGG ATTTATGT T'rCGAAGTT CATACTCCAG ATGTCAAA'rT GTCTCTGAGA ACAGCCAAGC AACTTTACGC TGGAAAAAGT GAATGGCAGG ATA'rCGAAGT CTTGGATACG -CCACTTT1TG GGAAAATACT GATTTTAAAT GGCCATGTCT TGTr'CTCAGA TGCGGA'rGAT 'rTCGTCTACA ATGAAATCAC CGr'rCACGTT CCCATGGCTG TCCACCCAAA TCCAAAGAAA GTATTGGT'rA 
TTGGGGGTGG TGACGCCGT GTTGCCCAAG TATTAACCCT CTATCCTGAA CTGGAGCAAA TTGATATTGT GGAACCGGAT GAGATGT'rGG TCGAGGTCTG GCTAGATGA'r CCTCGTGTTA CCATTTACTA CGAAGATGAT TACCATATTA TCATCAACGA ACTCTTTACC AAGGAATTCT ACGGCAATAG GATTTACCAG CATGGGAGTC CCTTCTTTGA
TCGTGAGTAT
CCAAAATGGG
TGCGACAGAT
TTCCCAGACT TTGCTGCAGG C-rACGCTTTT TGCGAAACTG CCATTTGGCC ATACGGAAGG CCGCAAGGTC AATCAAGCCT
TTCCAATCAG
CCCAGCTGGC
TGACAAGGAA
CGTGGaAGCC
AAAATGAGTC
ATTTGTCAAG
TGCGATGACT
CT'rGATGCTG
GTTTTGAATG
TATTGGTTGT TTGGATTTGC GGCTGGAAAA AACGCCAGCT T'rTATGTTG;C CCAAGTATGT GTTTACTAGT TATTGGTTGT ATAGCGAAAC ATTTACAGAG TGAAAGCGAA GCTAGAAGGC ACAAGGTTGA AGAAGTGATT TAGCTCTGCC TTATCAAGAT TTATCGAGCT CTGAAGGAAG ACGCCATCAT CGAGGATGAG 'rCGGCCTGCC GAAGCATGCA TCGGGTTTAT CAGGCCCATA TTCCAACTAG ATCGAAAAAA TACCACCCTC TCAAAGATTT TTTCACAGAA TACTACACTG CAAACTTACA TGAGGACATT TTAGAAGAAG AGGAAGGAAA GGGGGCGTTG CCCAAGTTCC TATTTCAAAG ATTATGATTG CTAGCCGTAC CAAGTCAAAA 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 AAAACAAGTA CTAAAATPGA AACTGCAGCA GCCCTGATTG AAAGCTACAA ACCAGAAGCT TTAACCATTA TGGATGCTTG TTTGGCAACA GGTGTTCACT ATATCGATAC AGCCAACTAC GAAGCAGAAG ACACAGAAGA CCCTGAGTGG CGTGCTATCT ACGAAAAACG TTGTAAGGAA CTTGGTTTTA CAGCCTACTT TGACTACTCA TGGCAGTGGG CTTATCAAGA GAAATTCAAA GAAGCAGGCT TGACTGCTCT TCrTTGTTCT GGTTTTGACC CAGGTGTAAC TAGTGTCTTT TCAGCTTATG CCCTCAAACA CTATTTTGAT GAAATCCATT ATATCGACAT TTTAGACTGT AATGGCGGTG ACCACGGTTA TCCATTTGCA ACCAACT'rTA ATCCAGAAAT TAATCTCCGT GAGGTTTCTG CGCCAGGTTC TTACTGGGAA GATGGGAAAT GGGTCGAAGT CGAAGCTATG TC'rATCAAGC GTGAGTATGA TTTCCCTCAA GTTGGACAAA AAGATATGTA TCTCCTTCAC CATGAAGAAA TCGAATCATT GGCCAAGAAC
ATTCCAGGTG
ATGAAATGTC
GAAATTGTTC
CGTACAGTCG
AAGACTATCT
CAAGC'rATT'r
GGAACTTGGA
GAAGCTTTGA
TAATGAAGTT-
CTAATTGCCG
AGAACGCATA
CAGCTAGTGG
TCAAACGCAT TCGTC'rrT TTGAAAATGT TGGACrCCT CAATTCAATT TTTGAAAGCC GAAAAACCAA TATTGGATGT ATATCTACAA TGTCTGCG.AC CTTATACGAC AGGAGTTCCA AACAAGCTGG AGTGTATAAC ATGAGTATGG TTTGCCATGG AGAACAAGTA CCAACACCAG CATTCTACAA TATGTACAAG TTCCCTCtAC AAAACTTATC ACTCTATGAG GCCAAATTCG 599 ATGACT?1WTG GTCAATCTTA CTTGACCCAC CGTACGGATA CCATTAACTT TAACGGCCAA TTGCT'rCCAG ATCCTGCCAG TCTTGGGCCA ATCTTTACAG GTGTCAAAGA CGGTGTCAAA CATCAGGAAT GTTACGCAGA GTGGTTCG GCCATGATTG GGACAAAATT AGTCATGAAC CTTGAGGAGT TAGATCCAGA TCCATTCATG GTTGTGGTTG AAAATCCACA AATGGTGGAC CCTATGTTAT TGACTTGGCC AAGTTAGAAG AACAGGCCGG TTGCAAGGTC TrGCTTGCCC CCTTCATTAG CCAGTATrCTA TCAGGTACGA CAAGGGAAGA ATTTCCTGGT GAAGTCCATG 0* a a
TATTTGCGCC
TAGTCTTTAA
TCAGTGTTGG
TGCTTTCAAG GATGCAGACT TGGAGGAATT GCTAGAGATA ATGGACCATA CTCAGAGAGA CAGTTGCGTA AACACGGTCC GCGTTGTCGA GAGGCTGGTG TTTGCGCCTC AACCCTCAGT GTTCAACTCA AGGcAGATCA CGCGCTCTAT 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 30680 10740 10800 10860 10920 10980 11040 a. *a a a
GACCCTTGTG CACCAGGTTC CTAGATTTGG TTGACGGACT CAAACAACTT TGAAAGCAGT CTCAATATGG GTGGTGGTCA TCAGAAATCA AGCGTATCCG GCCATTGCGC ?rT.AATGCGGG ATGGAAATCT 'rGGTTTTAGA CCCTATCGTC CACCTTTGAG CTTTCTTCTA ATACCTGTCT GTCCAAATCG GAGACAGACT AATACCTTTA ATGGTATTGG AGCTTACTCA AAGCTTTTGG CAAAAAAATT ACGCTATCAC TATGGCCGAC TCGACCAGGA TCGCTTTGGA GTTACTATAG ACAAGATTCC GAGTGATTTG TCATTTTCAT ACCCTTTGCG AGCAGGGAGC AGATGATT'rA AGAAGAACAG TTTGGTCCCT ACTTACATGA GGTAAAATGG TCATATTACA AGAGAAGGTT ACGATGTGGA TTTGCTGATT AAAAACTTAC AATCTTGAAA TCTATATCGA GCCTGGTGAA TTATTTAGCA ACTGAGGTAT TAGATATTGT AGAAAACGGT CGCCTCTGCG ACCTGCCATA TGCCTGATGT ACT'TGAGATG AAATGGCTTT GAGTCACAGG AAAAAGCCCA TACCTACAGA GACGGGCGAT GTGATTGGTG ATTATAGTTT TGAAAATCCA TTATTTTCAA GACATGGCCA TTTATTCTTT TGTCAAAAAT ATTGCCAAGT CTCTATCTCA TGGACGAACA GGGAGACTGT CTATCAAGAC TTTAAAGGGA GATATCATG ATGGACAGTC ATGCCAGCAG AGTACGAACC CCATCATGGT ACCCTCATGA TCATGGCCTT TTCAAGGAAA GGCTGCTAAA AGAGCATTTA 600 CTCAGATTAT CGAGACCATA GCAGAAGGGG AAAGAGTCTA TCTTrGGTG GAGCAGGCCT ATCTATCTGA AGCCCAATCC TATCTTGGAG ACAAGGTTGT TTATTTAGAC ATTCCCACCA ATGATGCCTG GCGCGTGAT ACTGGCCCAA CCATTCTCGT CAATGATAAA GGTAAGAAAT S. 0.
99 S 0 TAGCCGTGGA TTGGGCCTTC ATGAAGAGGA TGACCAAGTA ATGCTAAACC TTTTGTACTG TCGTAACTGA AAGTTGCTTG TTGAAAACAC ATTATTAGAA TTTwATCAGGA TGAAACCAA'r AGCT~TGTTTT GGCTTGGACA ATCTCGAACT CTrAGAACAG TGCCTATCCC TGCAGTTCGA AAGAAGGAGA AGAAAAGCGA ATATCGCCAA CAAGGCTGTC TAGATArCCT CAGCAAGTGT TTCTCTTAGG TGGTGGCAAT AGATGAGAAA TGTAAGAGTT ATATCCAAAC CGCAGAGCGT TCTTGCCCGA GT'rGTTTGA.A AGTATGCCCA ATCTGTAGCG AACTACAAGT -TGTTr'rACCA CTATTGCCGT CATTGATGCA
AATGCTTGGG
GCCAGTCGT'r
GAAGGAGGCG
CT'rAGTCCTG
AGTCTTGGTG
GAGGCACCTA TGATGGTCTr TATCAAGAI-r TTGCTGAGGC CTTGGAAAGG CCTGTCTATG CAATCCATAG CGATGGTCAA GGAACTATTC GTCGCAATCC TAACTrGACT AAAGAGGAGA CTGAAAAAGT TATTTGGCTT CCTTATGGTA GAACACGTCG ATAATGTTGC TGCCTTTGrT GATGACGAAA ATGATCCCCA GTATGCCATG GAAACAGATG CAAAAGGTTG TCACTTCACC CAAGTrGTGA CAGAAGAAGA TTTGCCAGGC GGTCCTGCTG 11580 TCAAAAGCAG 11640 ATTCATAAAT 11700 TACATCTATG 11760 G'rAAAC'TTTT 11820 11100 11160 11220 11280 11340 11400 11460 11520
TACGCAGGTG
TTGGTTCCAC
TTCCCAGACC
ATCCACTGTA
GCAACCAT'rC
AACGACTAGC
AGTTTGAGGA
GTAAAGTTGT
TCACCCAACA
AGATGCAATG
AGCTTCCTAC
T'rAGTACGTC AGGCTGCTGA CATCCCTATT TCTGTCAGGA GAAAATACTG CCATTCAGCA ATCAGTTTCT ATGAAAAAGA GATGGGGAAG TGCTGGGCGT TGTAAACGAC CAAGTGGCCT CGGAA'rACCA GCCAGAGATA AATTCCAGAA TAGGAGAAAA CGCTAAGGAT GTGGCAACAA GCAAGGAGCC CAAATrATTC ACGTCAGrAT GACTACTACC TTTTAAGGTG ATTGCTAAGG TGGTAATGTC TrGTATAAC'r TITATCGAAAG ACCCATATAC TGGTAACACr GGTTTCAAGG TTGGGATCAA 'rGGTTCCCTG) CTT'rATCCT ACAGCTATCG GCAACGTACT ATGCAAGGGC TTA'rGGTTTA GAGGAGGTTA 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 11880 CAGATGACCA TTATTATCAA GAAAAATTCT ATTTCACGCCC TCTGGAATAC TCGCTATGCT AAGATTGGTA TCGGTATCTG AAACAGCGCG CTGTCTTGCA TTGAATGGTG CTGAATTGCT GT'rCAGAGCC AATTTTGGAT ACAGATAGTT GTGGTCAC'rG ACGC-AGCAGC GAATATTGTT CCAGTCATCG CAGCCAATCG CTCCTAGTGA GGAAAATGGC GGACAGAGCT CCAGTCTTGA CTTCTACGGT TCCTCCTTTA TGACGGATGA AACAGGAGCT ATTCTAGAAC GAGCTGAAAG ACAAGAAGAA GCTGTTCrGT TAGCTACTTA TGACCTAGAC AAGGGAGCAA GTGAACGCCT AAACTGGGGC TTGTTTCGAG
ATAGAAGACC
TTCTGCTAGA
T'rTTA'rGATT AGAAATGTAT AGACAAATTA cAG.ATTAGTG TGGGAGAAAT GAGAGATTCA CTAACrCTT' ATTAGTAACT ATAAGATACT ATGGCATCTA G3TAAATCGAT CGCTAT GTCTATTGAT TAGTCCGTAT TTTAAALATAT TAGCAAAAAA GCAAATAGCA GTAAC?1'CTG TCTATT'rGCT TTTCTTT? GCACGCGCAA CGCCC'TCTTC TTCGTTGCTT GAGGTAACGG TAATCGCTGG CATTTCCCAT 'rGCAATCCCA AGCCCTGCAA TTAT'rAGCAT CGCCCATGGC CATAATCTCT GAGGAATCAA CGTGAAAGAG CAGTAGCCTT TGTCGTTCCA AGCGGCATTG GAACGAACTC CACTGAATCG TTGGCAAAGC TCTTCAGCAA ATAGAATATA ?1'TCTCAATA CATCCGCAAG AGATTTGATA ACTGGAGCAT TTCGATATCG TCTrCAAAA'r CTCAGCTAGT CTTCATAAAT GACAGGCTGC AACGC'rGCTC AAAATCGTCT GGAACTTTCC ACTAGTCGCT TAGCATCATr TTCAATAACT CAAAAAGTGT CAACTGAACA GTTTGTTCTT TTGTTCCTAA TCTTCAAGAG A.AATTTCAGT TGAT'rGGGCT TGTCACCGAG
ACACATACCT
CAGGTCTGAA
AACAAAATAA
TGGAACATCC
AATACTAGTT
TGTGACTCGT
TCACTCTTTTr
TCAACTAGAC
TCGTTCTGGA
CAGCAAGGTC ATAGAGGTAT TCCAATCACT GGTCTGGTGA GGTCAAGCTC CAGTTrTTTTG TCGATGTCAG CTGGACTCAG 7rCTTTCCAG G'rTGAACAAC CGTTGT'rAAC AATAATATAr TAGTAGGGGA GGACACCGAA AAGGGGGCGA 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 CCCGTACAGA GAACCACTTT GACACCTTTT TCAATGGCTT GCTTGTGGGA TTTCCTTGGC 'rTCATTGAGG AGGGTGCCGT
TGTGAATAGC
CCATATCCAA
'rTAATCATAG GTCTTCCTCT TGATAGAAAG C'rTGAGACTA ATTGAAGAG GATAT'TTCGC CAAATTAAAA AACCAACTTA ACTCAACTAA TCTGAAGAAT AAATCTAACA AATGAAGAAT TCCTAATGGT AGGTACGAT'r TTGTCCCAGA GGCACCTATG AAGTGC'rGCG AGATTT'rAAG TTATCTTTGC TATTATTATA GCATATT'TTC ATTGATTTTA TAGTTTAAGA TGTT'rTGATG AAAGATATGC TATACTATGT TTGTCAATGT ATATAATAGT TTTTTTGTAA GTAGGTATGA AATGGAGGAA ATATATCATG ATT'rTAATGA TAGAGCTGAT ACAAGGTGGA GCAGATCCAT
AGTAATGTGT
GGCTAGTAGT
GAGAAGAAAT
ACAATTCATG
TGCAACTAGA
GTAGCAGAT'r
CAAAAAATAT
ATGGTAAAA.A
GGGAAA'rAGA
ATTCAGGATA
TAAAATTTAT
ACCAGTATTA ACTCTGCTGG TTCATGGATT TATTGGAGGA GGTAATCATC TTTGCAAAGG TAGGAATATG AAGAAACAAG GGGAGAAAAC CAAATTTAC TAGCAT'rGTC CTTAATTTTT AATTATCTCG CAAAGCCCTA TCTACCAGCA ATGGAGAGCG GAGT'rCTTTT CTTAGCGGTC AGAGGATTTA ATATGAAAAA ACGAGCTATT TACAAATCAA CTTGGTTTTG GAGGCTTTTC AGTCGTGAAT TTTTCAGAT TCTGC'TTTTG ATCTATCTAC TGGITrTTGC AGGAAAGAAA ATT'rTCATT TCA.AGTGGCA GCTGAGGTAC TTCATCTACC TTTTACTGGG CTACATCATT TCATATATGT CTGACTTCCT CTrTTTCGTAT TTCA'FATCCC TGTCTTCAAA TCAGATTTCT TTGAATGAAA CGGTAGAAAT GATGGGGAGA CAGGAGTTCC CTTATGTCrT GCTCAT,-GTT TGCTTCATCG CCCCTATTGC TGAGGAATTG ATTTATCGAG GtGTGCTTAT GACAACCTGT TGCAAAAAC'r CACCTTGGTA CG
14640 14700 14760 14820 14872
INFORMATION FOR SEQ ID NO: 73:
SEQUENCE CHARACTERISTICS:
LENGTH: 10223 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:
CGTGCTATCG
TCTTCTTGGA
TATCGGCATT
GTAGGTATAT
GACTTCGTCA
ATCTACAACC
GTCTCAAAAC CAATCTGGTC GCCATCTGCT GGATTGCCAT TTCTAATACT CTTCGAAAAT GTTACTGACT TCGTCAGTTC GCTATGGTCA AATCCAGTTG GAAAATCCAT CATCCTCACC ACTCTTGGTA TGCAGACCCT CTCTTCAAAC CACGTCAACG TCGCCTTGCC TATCTGCAAC CTCAAAACGG TGTTTGAGCT GTTCTATCTG CAACCTCAA.A ACGGTGTTTT GAGCTGACTT CGTCAGTCGT TCAAAACAGT GTTTTGAGCT GACTTCGTCA GTTCTATCTG CAACCTCAAA ACAGTGTTTT GAGCAGCCCG TGGCTAGTTT CCTAGTTTGC TCTTTGATTT TCATTGAGTA TAACACAAAA GGTAGCCCAT CAGCTACCTT TTTCT'rATGC TTCCTCAATC AAGCGAGTAT GTTCTCTCTT GATACAGCGA TTCATCACGA CTTTCGCTTC TAAACTTTCA AGTCCTAGCT GAAAATCACG CGCCACATCG GGCAGAAATT CAGGAAAAGG AATTTCAGCG AGGCTAGCAT CCGCCTTGGG ATTGACTGGG ATGATTTTAT GATTGCTGGT TGTTTCTTCA CGGTCAGACA CGAGATACTG ACGAATCACG CCATCACTTG TCCTTTTTCA TCAGTATAGC ACATTTTGAA AGGACTAGCC CCCTTTTTAT TTAGCCTCGT TAAGAGGAAC ACTGAGTTGA ATGGCTTCTT
TATCATCACA
GTGCCCAAAA
CACTGCGACG
AAGCCT=TC
AGCCCCGAGC
AACCCACCAC
GATTGATAAA
AAGGTTTGCA
ACCAGGTTGC
CCATGGTTTG
TCCACCATCA CGCAAAATCT AATCTTGGCA TCAGCTTTGA ATAAACATTG ACAATATCTA ACCCAAGATT TCGCCACCIG CTGCATTTCC TTTGTTACTC AGCAAGGGTT TTACTCGTTG TTCTTGACTC ATAGAAATCC GAATTATACT ACAAAAAAGG CCCTTCATTC TCATCTGCGA TTTCACCAAT TIVITTCATCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 CTACCAATTC AGATTTAGGC ACTTCAAGGA CGATTTCATC GTGCACTTGT AACAGCATCT 'rAGTCTGATA ACCACCTGCA ACCAAGGCTT TATCTGCTGC CGAACCCTGG ATAGGTGACT TATTGAAGTT GCGCGAA'ITG ATATCTGCCA CATAGCCCCP ATCACGCGCC TCCCCCACCA AACGTTCAAA GTAGGTATCA ATGTAGGCTT TAGACAAGCC AAAGTCTGAA ATCCCATAAA GACGGTCGTr TG3CAGTCACA TCATCAGGAC AAGTATGGAT ATCTGCCCCC TCTTGGAAGG 603 TATCCAGCTG AATCATGGCA ATCTTGAGAA TGATAGCAGT TCGCTCCGCA AAACCACGAA ACTCACGGCG ACCTAAAG AGGGTCTCTA CTTCA'rCCAT GTAGTTTTTA ATACCTGGAA 'rCGC?1'CCTT ACGACTAATT CCCAAATTAT CCACTCCAAA GTTAACTGCC TTGGCATTGC GCTCAATGCC AAAGACCCGC ATGGCTGTCG CCTTAATCAA GTGCTCATCC 'rrAGAAATAT GCGCCAAAAC GCGCAATTCA ACTCTGGCAC AAAAGCCTTC GCAAGTTT'GG ATCCACACTA ATCTGTGAAT AGTCAGAGCT CGAATCAAGC GCCCCTGTTC GACAAACGCC CGGTCTGGGT GAGTAGCACA CTATCCTCCC CAATCGGGCA GGAATAr'rT CAAATCCTGC ACATAGCGAG 9 9999*9 9 9 9 9 9999 9 *9.9 9. 9.
TATGAATCTT
TCTTAGCAAT
GCTCTAAAAC
GAAGTCCCAA
CCTCACCAGC
CCTGCATCTC
TCCATCAGCC -AAAATCCAGT TTGACGGTAA TCCAGGATTT ATCCACTGCT GTCGAATAAC TTTCTCAAAG AGAAGCACGC CAGC'rCGTAA ATCTCTTGAG AAGCAAGGTC TCTTTCTTGA CCTGCAAGCC AATTACATAA GTAGATTGAA TCTTAACAAT CGGAGCAATA GGAGCGAGAC CTGTCTTGGT TTTCTTAGTG CCAACTGCTT AGGCGAGTTG TCAGTTT'rTC AATGACAAGC CCATAATCCC AGCAATT'rCC
TATTCTAGAG
ACATTAAACT
TCATTTTCAG
ATC'IrGGCAA
TTTTCGCTGA
GCTAAGTGTT
AAAGTTTCAT
TTGTCCTCCA
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 GGACAAAAGC CAGAGGTTGC GTT'TTTCAAG TAAAATAGC CCAAGAATTT CTCACGTTCA CATCAACCAA GTAAGTCTGA CAG'rCGAAAG GAGGTATr'TA CAAAACGTTG CAAAAGAACT CTAAGAAATC CTTGAAAATC TATCCCCACA AGACCAGACA GCTCAAAGTG GAAGATAGAC TCCATATCAT AAAGAAGCTC TAATTGCCCA TCTGTTTCTA CCAAAACAGC AAGTTTACAA GGAATGGCCT TTTTAAC~kCC CTTACCGTAG CCATAAAGAC TAGCGATGGT CGCAATT'rCA GCCAAACGGA TGTCAAAAGC AGGCGCCTGC AAATCCACAC r'rAACCTTCT TAAAGTCATA AACTCTCAGA GATGTTTTTT GGGTCTTGCA ACAGCTCAAG CTTGTCTGTG GCATAGAGCT AATCCA6ACCA AATTATCCGT ATGGTAATTC TCACCAAAAA TCTTCACTCA GCATATCTTG ACTGATTTGG TCAACAATAG TAAAATCCAA ACTCTCAGAC ACATCAGCTG ACGACACATT TGAAGCCCAT CTCATCGTAG AATTT1CCCAA GATTTTCAAC AGTCCTC'rAA ACCAATCGCA ATCGGTGCCT TGCTATCAAT TAAAGCC'rGC T'rTAGCTGTT ATCTGGACCA CTATAGACCA GGTCGCTAGT GTTTTAGACA AAAAGGCCTG TTCCTGTCA TTGATGAGAT CAATATTTC ATAAATCCCC TITTCACCGAC TTTGGTCACC GATCGATAAA CTGAGCTGGT CCTCAAACTC ACCCACACCT GAATCAAA'rC CTTGTCCCCA TA'rCCAGCGT CCCAATGATG CCATATCATC CAGCAACTCA TGGCCCGACC ACCCTTATAG CAAAAGCCAC CAAAATATGA GAAAACCATA AATCGCATTG GATACAGCGC AAAAALACGCC TC'rTATCCAT ACACCCATTA GCAAGTATTT TTCAAACTTT AGATTTAAAA ATGTGCTATA AAAATATTTC TATAAA'rTAA TCACTA'rAGA AGGAGGAATA TIr'rGATTrCr AAGAAAATAT GCTGGACTTG CCAGCAAAAA CTCCATGCCT TACTCGGTGC TAACGGTACT CAAGTAATTA CTTGTGGGCT TACCTGTGCA
TCAAGCGAAC
CCAGGGATAT
GTGAGGCCCA
TTCTTGGAAA
CTGACAATAG
TCATCCCCCT
TTTCCTTCAT
CATGCTCCAG
TATCCGACTT
TTTCTTCCAT
TTTCAACCAC
TAATATCAAA
CATACTGAGC
CGAATGAAAG GAAATTGCTC TCCCCATACA TCTCTGTCCG CTAGAAGTC TCATTCCAT 2940 CAAGAGCTTA ATACCCGTCT ATCACCCATG AGCGCCTTGA GAGGTAATCT GGCGTAAAGG CGTATGCTCA TCCGTCAGCT ACCATCCTGC TC'rCCTAGCT CAGA'rCATAG TGACGAATCC ACGAAACTCA TCAGGAGTCT GAAGGTCGTC TTTCCCGCAT TAAA'rGACTC AACA'rCAACT CT'rAAAACGG TCCAACTGCT ATCAATCAAT AATAATT'Tr TACCATTGGG AAGAGCTAGA CCAGAGAAT'r TAGTAAACCT 0 0 0 .0 .00.
CTCGGCTCAA
GTATGCAAAC
CGAAAAGCTA
TAAAGGAAAG
TTCCGAA'rAA
ATATAGTATA
TTGACTTTC
GGAGGATTCT
TCAGCAATCC
ATGTGACCAT
AGGATTTTTA
CCCGCTCCAA
CAGCCACATT
CAX3AAGACCC
AATCAAAAAA
ATAGATAGAG
TTGAATCTAT AATAGTACAC CTTGACTGCT CTGATAGAGT TAT'rCACATC TTATTTCAAC CAGACATCCG GGCATCAGCC'CAACTAATGA AGAAATCACT TGTCAATTTA TTCGCGATAT TTTGGAGGGA AGCGATATTC ACGTATTACT 'rACCAG'rATA GACGTCTTGG CGGAGTTGGA 3000 *3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 TTGAGATTCA AGTCCATCAT CAGAATTTTT TCATCAATCA GTCAGGTTAA TCAAAATCTT GAAAAAATTC GTCAGCGAGA AGGTGATACT CACTAGAGCT ACAAACACAT CGCTCCTGTT TACGCCATTG CTATCGTGGA TAGTAATTAT TTCTCAGATG ACCTGGCTTT TCATAGCTTT AGTATGCGCG AAGACACAAC AGGTGAGGTA TTGGCGATTA CCAACAATGG ACAGGAAAAC CATCTGGTTA AGATGGCATT !ZTGGAAT'rA AAAAATACAG AGAAACCAGC AAAGACAAGG TTCGCAAGCC ATGGTTGGAG TTTTTCGGCA ACAAGCCCTT TACCCAGCAA CCGCAACGAG CCAT'rACCCA AGCAAATCAA CTGCTGGACT ACAAGAGCTG GTCCGAGGAG GACAGGAAAA TGTTTAGTCA ACTACATATG CGAGAAGAAC AAGTCTTIGTT ACCACAGGAC TATGCCT'rGG AAACTGCTAG GGCTGAAGGC CITGAACAAG GACTAGAGCG TGGGAAAGTT GAAGGAAGGG CAGAAAGGAA ACTTTTTGCC TTCCTAGACA TAGTACGCCA AGGTCTTC'rG ACTTCTGAGG TTGCCAGCCA GCAA'rTAGGT ATGTCAGTAT CTGAATTTGA GGCACTGTTG TAAAATGGCT CCATAATATC CATAGTGGGT AAATCCCCTA TIGGATATAT 'rAATGAAAAG CCACAAAACA GGAGCCTATT TTGTGTAGAA AAA;ACTCCC ACTCArrAGA AAGAATCATA TGGAACAAT ACAAAATTAC TAGACATTAA AGACCC'TAAT ACACACAAGG AAATCATCGC CAAACTGGAC AACCAATTGA AGAAATATGA CTr'rCAAAAA GG'rATGCCTA CAAGAATTCT CCTTAGAAAG ATGA'rGGTCG CTGAAACTTC TGATGACGTA ACAGTT'rTAA ATCTAGC'TTT ACTAGATT-CA CGAAAAAACC TGAGAATCAT CTCAGGCTTG GTGGAGAAAG TGGTCGTi-rr TCA'rGAATAC TTGAAATCTG CTCAATCTTA TCAATCAAAC CAACCAATTT CTTAATAGCT GATTTTTGGA GGTGCGTACA GC'rTGCATAG ACTTCAACAG GACAAATCGT TCCAGCGGTA TCAATCATAT CCTTACCGAT GATATTCATA ACTTCACTAG GTCCAGATTT TAAACATCAT TACGACCCCC CATCTTGCCC CCTTCTAAAA T'TCCTTATCT CGTCGAT'rCA AGTGCTATCA CAGTCATATT TCTTCTCTTT CCGCTACTAT CTATTTATTC GTCATTAAAT TTr'T'CTCA
ATATGACCTA
ACATTTTATC
CAATAAGGAT
TGAGTGCGGA
TGAAACGACT
CTGTTCAAAA
TTATTATATC
GGAAAAAAGA
ATATCGAAAA
GTACGATAC
GCTCTTCTGG
A'rCCCCTAGG AGATGAGCGA CAGATAGATG GTATCCAAAA TATTGTCCGT AGCAGGACCA GAAAGAACTG CACCAGCTTC CGCAAGAGCA TCTGCCGCAT CATCAATCAA GATACAAGTC TTGCCTTCAA TATTCATCTT ATCAACGCTA CGACGTTTAT CTGCCAACTT ACGAGCACGA GTCACCCCTC 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 CAATAATAGC GATAGATGTT CATGGTCCGG GCTGACAACC CTGCAATCAG AGGAGCACCC GCGCAGCATC CAAGTCGATG CGACAAGTTT TGAAGTGAT'r CATAGTAAGG CATGACAACA TAATCAAAAT TTCAAGCAGA CGTGTTTCCC ACCGATTGAT GAACACTTGA TTTCCCCAAC
TTCAAAAATT
ACATAGTCAG AACCAACCAT ACCACGACGC ATCAAATGAT CCACAGGAAT ATCAAAGAAT GTCAATAAAC GATCCACTCC AGCTACTTCA CGCTCACGCG CTCTCGCCTT TCTATCCTGA TTGACAGATT CTGCACTCGC ACGCTTCAAA TTGTCATTTA CAGGCGAACT AGTTGATTGT TCTTCAATGT TGACCTGAAT CTCTCCATCT
TCAAAATAAT
CCTTGAATTT
AGCATATTTG
CGTGCATACC
GCATCTACCA
AAGATAAAGA
GAAAATTGGC
TCTATCCCAA TCTCCTGCGC
TATTAGAAGA
ATGTATAACT
TTTTTCAATC
AAGGGCAAAC AGCTTrAAAT CAGAAAAAGA TGTGCT'ITTC ACAACATTT'r CCATCTACCA AAAAATAAAA GAAGGGCACC ATATT'rGTAC CACACGTTCT GCCAATTCTT CATGATTTCC TCCGGTATAT TTGTAGCGCT TTTTGCACTA CCTTGCZATCA TTCTTTTGAA 606 AAATAT'rCyA GGTCATCAAC TCATTGTGTT TCTCAACAAA CATAGAGAGC AATAGCCGTA ACCACTGGAA TCGC1'AAAGG CAAAAGGAGA GTTAAACAAG AAGTGAGTTC CCAAGGCTAA GTTTCTTGCC AACCICTGT CCTTTA'rAGG CTCTGTAAAG CTAGACCTGA AAAAGTCCAG TGAGAGGCAA TTCCTGAGAT AAATAGTAAA GTCAAACCCC TCTGGCAAAT CCGTACGAAT 'rTTGGAATCC CAAACCGGAA GCAATTCCAA GTAAAAACAA GAATCAAAGC CAAAACAAAA ACAAGTGACA ATAATTTCAA CCGCAATAGC ACTT'rCAAAGCGCATTTAAAA ATGGACTATC AATCATGGAT ATAAGTATTA GCAAAACTAG ACAACCAGCC ATAAAGACAG AATCAAAACC TTCTTTGGCA ATTCCCATTT ATAAAGAGCC GGAATCATGT AAAAGAGAGC 'rAGAAAGATA TTCCGCACCT GACCTCGAAC CGTCCGTATA GTAGATGGT'r TAGCAATAAA ATAAAAA'rAA ATAAAA'rA'rrTT r'rCTTC AAGTATTTAT AATTCTACGA CTGTCATACT TCCTGTATCA GATAATGACA TCGTCTGGTA TTAGGGGAGA CTCGATAAGC ACTCTCTTCT ATATCT'rGGA AGGGCAAGAA GTCCTGGTCT TTCCTTCAAA TACTCCTGAA AAGGTTCATT TTCAAAGGTC ATGCAATACC ACAATr'rCTC TTGTCCCCAT TTGTTGCTGG CATATCCTCA GTCACTCGAC CACCTGCATT CCGCAAAATA ACCTAGAGCT TGCGCAACGT GCAAACGTGA GTCCATACAG 9 9 .4 ~6 9 *1 9. 9 9 9. *9 9 9 9.
9 9 GCAATAAGCA TGATAAAAAC CAACTCTGTT TCCAACTCCA ACCrAGAAAA ATAAGGCCCT CAAGTAAACA CCTACTACAG GATACGCTCI' AAAATTCGCG ATAACCAATA TCCTTAATCA AGATTTAAT TT'rCGC-ACAG GGGTTCTrCT ACCAAAGGAG TGGGAAAAGA ACCCCCAGTA TGAAAGGAAC ATCCCTCCCA TTCCCAATAC GGAAGAGAAA GAAACTCCCA TTAGrCCATA TCATACTGTA AACCAATACA ATACACTTTC TTTCTAAATG ACATTGTAAA TGGCACC-AGA AGTTGCATAT CCTCGCGTAC GACACATCGA CACCCAATTC TGAGCACCAC AGTCTGTATG GAAATAACTA GAGAACGAAT TGAGCATCCC CAAGTGCCAA GTCACAATGG CTACTC'rGGT ACATAAGCCT GATTGGCTTG ATTTGATAGT CAAATATTTC GCAAAGACCT TTTTCAAGAC GGCAGAGTGA TTCCI'GTCAA GCAGCTTCGT 'rGATGCGTrG AGTTCTCCGA CAAAACATTC GCAACTGCAA CAGCCAAGTC 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 72320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 *9 99 9 9 9 9*.9 9 *999 999 9.99 99 99 4. 9
TTTAGGTTTA
CATAAACTGT
TccTATcrTTA
TTCCTGAATC
GGAATTCTTA
CTCAATACGA
C'rGAGGATTA AGTCCAGAT TTAACTGCCC ATGTAGGGCA TCAAAATACG ACACGATTCC CTCCTTGAAA TCATTTTTAA GAGAATTTGT CACGGATTAT GTTGTCACGC CAATGACCTG AATT1'CCTTA GGTACATAAA TCTTAGTAAA GCCCAGTTTA TTCACGCGCC GAATCTCTCC TGTCAAGCCC GTTGGCTTGT CTTTGTAGCT CGAAGCAATA AATCGCAGGT TCATCCAATT TAACACCACC AGCAGATTTG AGATAGGCAT CCTGATTT'rG CAAGAGAAGC CCTGCCCGTT TT~TCCAAAAC AGCCATAATC AAGCTAGCAC CGTTAAAATC AAG'TCCTGTC GTAGTACGCT TGGCATTTCC AAACATCGTC GGTGTTACCA AAGCCTGAAC CTCCGCCAAA ATCGGACGCG TCCCTTCCAT GGTTACAACG ATGGAGGAAC CAGTCGCCCC ATCCAAACGC TCTTCTAGGA AAACTTGACT CGGAT'rGAGT ACCTCAACCA AGCCGCCCGA CTGCATCTCA AAAATCCCAA TCTCAT'rAGT GGAACCAAAA CCATTN'GA CCGCTCTCAA AATACGAAAG GTGTGGTGAC GCTCCCCTrC AAAGTAAAGC ACCGTATCCA CCATAPGCTC CAACATACGA GGCCCAGCCA AGGTTCCTTC TTTGGTCACA TGACCTACGA TAAAGATGGC AATGT'rATTG GTCTTGGCCA ACTGCATGAG TTCAGCGGTC ACTTCACGCA CCTGAGAAAC AGACCCCTGC ACCCCTGAAA TCTCAGGAGA CATGATGGTC TGGATGGAAT CAATAATGAG AAAGTC'rGGC TGGATACGCT CCACTTCTGC ACGAACACTC TGCATATTGG TCTCTGCATA GAGATAAAAC TCACTATCAA TATCACCTAA GCGCTCTGCA CGTAGTTAA TCTGCTGGGC AGACTCCTCC CCACTGACAT AGAGAACTGT CCCCACTTGG GACAACTGGG TTGAGACT'rG TAGGAGAAGA GTTGATTTCC CAATCCCAGG ATCCCCACCG ATA-AGGACGA GACTTCCTGG TACCACTCCG CCTCCAAGCA CACGGTTGAA T'rCCTCCATC TCCGTCTTGG 'NrCGATTGAC ATTGATGGAA GTCACCTCAG CTAGTTTCAT GGGCTTGGTT TTCTCACCTG TCAACGACAC ACGCGCATTC TTAACTTCGG CAACCTCAAC CTCTTCCACA AAAGAAGACC AAGACCCACA GTTGGGGCAA CGTCCCAGAT ATTAGGGGA AT'rATACCCA CAATTTTGAC ATACAAATGT CGCITrTTTC TTTGCGATGA CAAACCTCTT TCTATATCTC TAACTCACAC TCAATCACTT GGCAAAAATC AATCITCTCA TTT'GGCACAA ACTGGCGCAT GAGCATTCGA TGAGCAACAA CTACCACAGT CTGATGTTCT CGATACT'rAG ACATACA'rTC TAGAAACCGA GACTTCATTr CCGTAGCTGT CTCATATTGA ATAGGACTAT TAGGAAGCAA CTCCCCCTTG TTTTCTAAAA 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 ACAGTCTTCT AGCTGT'rTCA AAGTTT'rCTA TTCCTGTTTT ATAGACCTGC ATAAAGGCTC TAC'TCTTAAA GGAAGACCCG TAGCACAGAC CACATACGAA AAGCTCTTGT GACTGCAGAA GATACGATTA TTTCAGCTGA CGAGAGTAAA TCAATTTCTG GACTTGCTGC CGTCCCATCr 
CAGACAAGGG TGCCAAATCT CTATATAAGA ACGCTCCTCT AACTCACGGT AATCTGGCTC CCCATGACGT
CATTCATGTA
GCCGTTTCTA
GGATTTTTGC
ATCCCAAATC
ACAAAGATAA
TCTTCATTCT AGTGCCCTGT CGATCCAAAT CCACCAGTTC GAACGCCATC AGCTGCATCT CCATCTGCAA TTAAGAAAGT ACCAAAAACA GCCTGGACAA TACGCTCCCC AACTTCAAGA ACAACCTCTT GGTCTGTGAT ATTCTTCATC TGCGCAAAAA TATGCCCTTC ATTTCCAGGA TTTCCATAAT AATCCCCATC AATGACTCCA ACTGAGTTAA TTAAAACCAA GCCCTTCTTA 608 CGAGGAT'rTG AAGAACGATC ATAGAGGTAG AGAACCTCAG TCGGCTGCA'r ATAAGCCT'rA ACCCCTGTCG GAACCAAGAC AATCTCTCCT GGCGCAACAA CTGTACGCAC AGCAACCTIT AAGTCGTAAC CAGTCGCATG CGCTGTCTCA CGCTTGGGCA ATAAATTTTC ATCTGrAAAA CTCGAAACCA ATTCAAAACC! ACGAATTTTC ATAATT'IrCT C7rr'CTATr ATCATTTATT CTAGATTATT CTATACTTAT TTA INFORMATION FOR SEQ ID NO: 74: Wi SEQUENCE CHARACTERISTICS: LENGTH: 16535 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: TGGTTCTGTC CTTATCGGCG CCTTGTCrTG CTTGCCATGG CTACACCAAC TATCTCATCC GACGAAAGTA CACCAACCAC TAACGAACCC AACAACAGAA ATACAACCAC CCTTGCCCAA CCTCTTACTG ATACAGCAGC TGGCTCTGGT. AAGAACGAAA GTGATATTTC TTCACCTGGA 10020 10080 10140 10200 10223 AATGCAAACG C?1'CCCTAGA GCACCACAAA CTGGACAAGA ACTGAAACTA AGGCAGAAGA CTTCCTGAAG AAAACAAGGA TCTGAAAACT GGCCAAACGG TATTACCTAG ATGTCAAATT AATACAGCTG GAAAAAATCT AACGAAGCT'r GGTTAGACCA ACTGTTCGCG TCAACTACTA TGGGGAGATG TGAAAAATCC GAAAACAGAA GAAAAACCTG CTGCAAGCCC AGCCGATCCA TCGT'rCA.AGT GAGCCAACTA CTrCTACTAG TCCAGTAACA GCCCATCGAA GATAACTACT TCCGTATCCA TGTCAAAAAA TGCTCAAGGA CTATGGACTT GGGACGATGT TGAAAAACCA AGCTTTGTCC TTCAAGGATG CCAAGAAAGA TGACTACGGC AAAGGGAGAA CAAGCCAAGA AAATTAGCTT CCTCATCAAC AACCGGCGAT AAATCTGTAG AAAAACTAGT TCCAAAAATG AGATTACAAG GTT'rCTCTT ACGAGCCACA GCCTGCAGGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140
ACAGGCAAAT
TTTTTATTAC
AAGTTCACAG
TACACAAATC
TCTAGCATTG
ATGGCCGCTA
TAGATGAGAG
ATTTGAAAAA
CATACTATGT
AAAGTAGCTT CCGCACAGAT GGCAACTATG AAGTAGCGCT CAATCGCCTG TATCGACATT CCTCTTAATG CAAACAAGGA GACGACGTGA TCATAGCCAA ATTTTCCTAA
CCATGATATC CGTATGACAG TTCAACACTT GTCGGTGCTA CCTAGGAAAC AAGGTAACTA ACAAGAAATC TCTCTGGTAC ACGGAACAGA CTTTACGGCT AAGCCGCAAG AGAATTTGGA
AAATCCGTAA
AAGACGATGA
GAGCCCAACA
AGAAAATTAT
TGAATCGATT
CGTAGGCACT
AAAAAGAAGA TATCCTCAAA TTACCGATGT TGCAATCGAT CACTCCAACA TCACTAATCA 609 GAAGCTGGTA AGAAAGTGAC CTACAGCGGA GATTTCTCTG ACACAAAACA TCC?1'ATACT GTTAGCTACA AT'rCCGACCA ATTCACTACC AAAACAAGCT GGCGCCTGAA AGATGAGACA TACAGCTATG ATGGCAAACT GGGAGCTGAC CTAAAAGAAG ACCCTTTGGT CACCAAGTGC TGATAAGGTT TCTCTTGTTG GACAAAGTAG TTGGAACTGT CGCTCTTGAA AAAGGGGAAA CTAGACAGCA CAAACAAACT CGGAATCACA GATTTCACTG ATCGAGCGTC AAGGTAAAAC TGTTCTTGCA CTCGATCCTT TGGAATAGCG ACGATTCCAA GATTGACGAT GCCCATAAAG AAGGAAAACA AGTTGATTTG TCTACGACAA GAATGACCCT GAGGAACTTG GAAACAAACT GCTACTATTA TCAATACCAA ACGCTAAATC TCTTGCTGCT TGGCTAAAGC CGCCTTTGTA AGA'rTCACAA TTTCAACACT TCACTTCAGA TCCTGCCATT a GATCCAGCTA AACTCGGACC CGTGAAGACG CCGTTATCTA GCAAAAGACT TGACCAAACC CTCAAAGACT TGGGTGTAAC AATGAATTGA AAALACCATGA TGGGGA'rATG ACCCTCAAAA AATCCAGAAA AACCAATCGC ATGGGAGCTA TCCTAGATGT TTGGAACCAA ACTACTACCA ATTTGGGACT TTTGAAGCCT TCATTGAAAA ACTAGACTAT CCATATCCAG CTCCTTCCAG TCTTGTCTTA CTACTTTGTC TCAAGACTTG ACTTATGGTA CGA6AGCTCAT GTGCGTGATT ACGCT'rGTCT GACTACGCTT CTACTTCTCC TTGACTGGTA AGAATTTAAA AACCTCATCA CGTTTATAAC CACACAGCCA CT'rTATGGAT GCCGATGGCA CAAGCAACAG CAACTACAAC TGTACTCAAG CGATCCTAAG ACGAAATCCA CAAACGTGGT AAGTCGATCT CTTTGAAGAT CACCTCGAAC TAGCTTTGGT TCCTA6ATTGA CTCTATCAAA ATATGATGGG AGACCATGAC TCALATCCAAA CCTCATCATC 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880
GGTGGACGCT
TACCTAGTTG
GCCGCTTCTA
TGGGGACAAC CCACCATATG ACCAA-ACGGC ATACCTACAA AGTGGATGGC TTCCGTTTCG TCGAAGAAGC 'rTACAAGGCT GCACGCGCCC CTTGGTGAAG GTTCGAGAAC CAAGATTGGA TGAAACATAC CTCAAATCTG GTTATCCAAA GTCAACACCA TCTTTAAA.AA GGAGATGTCA TCCAATACAT C-AGTCTATCA AAAAAGACCC CGACTTGGAA ATCTCATGGT GAATATGGAC GTACTAAACA AAGGTTCCAA ACAAATCTCA CTATGCCGGT GATGAAAACA TGCCTACTAA AGCTGCTGAC CGATACTGTC GCTGTCTTTT CAGATGACAT CCGTAACAAC CGAAGGTCAA CCTGCCTTTA TCACAGqTGG CAAGCGTGAT TCTCATTGCT CAACCAACTA ACT'rTGAAGC TGACAGCCCT CCCAGCCCAT GATAACTTGA CCCTCTTTGA CATCATTGCC AAGCAAGGCT GAGAACTATG CTGAAATCCA CCGTCGTTTA CTTGACAGCT CAAGGAACTC CATTTATCCA C'rCCGGTCAG ATTCCGTGAC CCAGCCTACA AGACTCCAGT AGCAGAGGAT CTTGTTGCGT GATAAGGACG GCAACCCATT TGACTATCCT 610 TACTTCATCC ATGACTC'TTA CGAT'rCTAGT GATGCAGTCA ACAACTTTGA CTGGACTAAG GCTACAGATG GTAAAGCTTA TCCTGAAAAT GTCAAGAGCC GTGACTATAT GAAAGGT'rTG ATTGCCCT'rC GTCAATCTAC AGATGCCTTC CGACTTAAGA GTC'rTCAAGA TATCAAAGAc CGTGTCCACC TCATCACTGT GGCTACCAAA TCACTGCTCC AAAGCTCGCG AATTTAATTT GCAGATGAAA ACCAAGCAGG GAAAAAGGCT TGAAATTGAA ACTAGCCATG AGTCAACTGC CAAAATGAAG CTTCTCACCC ACTAAACCAG ATGCCAAAGT TCACAAGCTG AACAACCAGC AACGAATCGG TAGAAAACTC GAACTTCCAA ATACAGGAAT CTTGCGCTCC TTGGTCTCGG CC'rATAGAAA AATCCCCCAA CCCAGGCCAA AATGGTGTGG AAAAAGAGGA AAACGGCGAT ATCTACGCAG TCTTTGTCAA GGGAACTGCC TTTGCACATC TAAGAAATGC ACCAGTCGCA ATTGCCAACC CGAAAGGACT TGCCCTTACA GCTACTGT'rC TTCGAGTCTC AGAAGAGAAA CCAGACTCAA CCCCTTCCAA 'rGCACATCAA GACCCAGCTC CAGAAGCTAG AGCTGATGCG GAAAATAAAC CTAGCCAAGC ACAAGAAGCA CAAGCATCAT CTGTAAAAGA TAGCAAGGAA AATATACCTG CAACCCCAGA CAAAAACGAA AACAAACTCC TATTTGCAGG TTTrCTTACTA AAAAATAAAA AAGAGAACTA TGTAGTGArr
TGCGGATGAA
GGAAGTTTTG
TGAATGGACT
TCAAAATGGA
GCCTGAACAT
ACCTGATTCT
TACAGCTGAT
AGCGGTTCGA
TAAACAAGCT
AATCAGCCTC
AACTAGCCCT
2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 41320 4380 4440 4500 4560 4620 GCATTATAGC TCGGGGGATT GCATTrTTTTA TTAAGCCT1CT AATTTTTGTTA CAATATTTGT TTICATAGCAA'AATAJAGCTCd TvGT=CTkATA
TACTTTGGGT
CATAGAACCA
AATCAAT'TT
CTGCTCCAAA
ATCCACCACT
AATCCACCCC
AACTTGATTA
GCAACTTGTG
AGCGGTAGAT
TCGGCCACCT
TTCCGAAGAG
GAAGCATGAA
GATCTGGATT
TATTGCTCAT AACGCAATTC TGCTTAGTCG GATGGAAATA CAAGAATGGG CTCCCAC'TTT TTCAATAGCT C'rCAGAACCT GGTCATGAGG GCGGTCCAAT CCTAAATCCT CTATCATGCG GCCAACAAAC ATGGCGCCAT TTGGCCCTAC CTGCCAGTGC GGACGGTCTT TGGAAATAGC ATCTTTCACC GCCTGCTCAC CATCTTCCGC CAAGTCTTTG TCAGCATGGC CCCTTCGCTT TTAAAATAAC GTCGATTACC ACCAATAATA ATCTTCACTG TTGATTCGAC ATGACCACC'r ACTGGACGAG GATAAACTTC TTTACCAGCA AACTTGGTCT TTTCAT'rGAC TAACTGAAGC TAGTCTTTCA AGTCATAACC AAACAGAGGG ATAATCTCCG ATCGTCCATT TGACAAAGCA CCAATCTCAC GATAAGCCTG AATCAACTTTr GCATATACAA TCGGTAGACC AGCCTGACCA GTAGCTATCC ACAAGGGCAA TTTGTCCTGA ATCGTTTGAG TCAATCGACC TTGCCAGTCT AAGTCTAATT TCTCATCAAA AAGAGAGTCG AAAGATTCCG TGAAAGAGCC CCTTCCAGCC TCGATAGTGG CATACTGTTG GAACAAACGA ATCGGGTCCA TGCTTGACAG AATGCTGACT C. *e 611 GCACTGGTCA AACGGATTS'T CTTGGTATTG A=TCCCAG GCTGATACTG CAAAATCCGC CCGATGGTGC TCACCAATCC TTGTCLGCCA GCTcAATCTC TGCCACCAAC TGGCGAATGC TGTCAGTCC C'TTCAAGCrC CGTTATTTCC CCAAATGTTG GTGAT'rCTCC TTATCTATCT CTGTACTTCA ATTGAAAAA AGTACAAGCA ACCGA=MTC TCATTAGAAA AAGCCTAGAT TTCTACCGTT ACTGACTTGG CAAGGTTACG 'rGGTTTGTCC GTGCAAAG 'rAAGCGAC'rA ATTGCGN'GG TACGACCATT ATGTACGGT1C GTAAGGACGA TATCGTCGGT ATCTTTGGCT GAGGACTTTG GCACC-ACGGG CTGCGACCTC TTGGATATTT AACTGGATCT GACAAGAGAG CCAAAACAGG CGTTCCTTCT GTGCTTGAGT TCTCCTGCAG CAAAGCCT'rC ACACTGGATA CAGAC'TTGCr TCCATGGCTA CGTAGTAATC TTGACCACGT AGTTGTTCA AGAAGTTC-AC GAACCTTGAC 'rrCAATGGTT TTCGATAGAC TGAGCTACGA TTGACAATTC ATGAACCAGG ATTACCATTT~ GCTTCrCCGA CTGCTTT'rGC AAGGAAGGCA ATAGGCTTTA GTTGATGCCA CGGCAATT'TC AGGACCTGCG GGCTCACGT GAGAGG'ITG AACCTGGAAC GTTTGTCACT TTCATTAGCC TTGACCAAAA CTTGACGACT ATCCGCTGTT.
GA'rGAAGAGT GGTTTCTTGC TGAGAAGTGG CATACCGTAG AAGTTCAACT CGTGTATCTG TCAAT'rCTTC CAACATTTTC GTAAGATGTT CCAGCTGCAA GGATGTAGAT GCGGTCTGCG ATCTGGGTCT ACGACAACTT GACCAGCCTC ATCTGTGTAG AACAGTTGGT TGCTCGTCAA TTTCCTTGAG CATGTAGTAA A T.CTGACAAG TCAAGTTCAG CAGTGTAGCT AGCACGCTCA TTGAACTTCC ACACTA'rCAG CCTTGACGAT TACCAACTCT TTGGTTAGTT TCACGAATCA TAGCCATGGC GTCTGAGCAG AAGACCAATC AAAAGTGCTG ATT'rATTTr-r AGCTACGTAG GTCAACCAAG GCAAAGGCAT AAGAACCACG GATCATGTGA
TTATTCTAAC
AACTAGACTT
ACATCGAGGC
GAAATTGGTG
ACATTCTCTT
ACGAATCTTG
TTTTrAGCTTA
CACGGTGGAG
AGAGGTATGG
CTGCGATAGT
CCACGAGTAT GATTGGCAAG TCAATCAAGG CAATGGTTCC TAAGAAATCT CTTTGAGTTT CCGATGTAAA AGGCGTTACG TCTTTCTCTG AAAGAGTTGA TCAAAGGCTT GCGCCTTAGC AGGGCTGCGA TTTGCGCTGT TGAAGGAGCA TGGTATAGTT GTTAAGCTTG GAATTCCCAT TCACCAGA'N' GGCTGATAAA CCCCACTCAG ATGAGATTCC TTAGAAGCAA ATCCTGCATG TCTTGAACAG CCT'rAATGAT GCTTGGATGA GTTTCCGCAT GGGTAAGTTC CCTTACCGAT CGACGATN'C CATCATAGTC TGGTCATGGA TTTCCATGTA CGGCCAGAAC AATCTCTGGG CATATACATC CAAACCAACC GTTCAGCATG ACTrGTAAG= AAATTCCCAA TrCTACCATT 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420
ACCATGTTAT
ATGACTTCAG
AGCGCTTTTT
AGCCTTCTCC
GATCTTGTGA
TGAAGGCTTC
612 AAGAACTGAG AGCCCTTCTT CTTCCGCAAA TTTTX'CCAATC ATCTGTC'rGC CCCTTGAAGT GGTGACCTGC CTCAATCACC CCATTATGCA CCAAGACAAA ATTGTCCTCA GTGGTTC CGTGAGTAGC CTCAACACCA GCTGTCTTGG CAGACAATTC GTTATCAGCA CCA'rCTAGGA CAAAAATTCC TTCAAGCCCT TGAATCAAAA TATCAGTTGC ACACATAGTA TATACGACAC AGGCAAGCTG T'rCATC'rrTT ATAGAATCAG CAAAAACAGT AAAAATTGGT ATAGTTCAAA 'rTAAGCTCCT AATCAGTCAG AGTCCTNTTT AAAATCCAT'r
AAGGTATTCT
ACGTTCCGTC
CCAACGAGTA
TGCAATACGA
CGCAGAATCA
AI'TGTGTTT
TGCTTTCTCC
ATATACTTGT
AAATGAACGG CTAr'rTCAGT 'rCCTTGATTT CAAGATAGTT TCAGAGCGGT GTGGGTGAGC TGTCCGATAC CAGTTGTTCC CCAACCGCCT TCACCAAATG TAGCCACGGT A'rTCAAGCT'r CCAACAACAC CAACAATTCC 'PrAAAA'T-GG TATAGTCTAA TTCTTTCACT TGTCAAGAGT GTAAGCATAA AAACTCTGAC CGATTGGGAT ATTATCCT AATTC'TTrGA ACCAGTGGCC 9*.b*e 0 4O 0 000 00 00 0 0 0 00 0 0 4 *0 0 0
TGATTTCTTC AGACGACGTT TTTATAGCTG TTGAGCCATG GTTGGCACCA TCTTCAATGG ACGGTAATCA TCTTGAATCA ACCATTCTCA GTCAACATCC GTCATAAATC CCTTGCTCAT CATCACATAA GGCTCGTAAA TCGAGGAGCC ATAACACGCA ATCACGAATG AGTTCCAACT GATTTCTACC AACTCCTGTG CTGAAAAAGG GCCGCAATAC AGCCGGTGTC AAGTTGAGGA CTTAACAGCC CGGcTGCTG CTTGCGTTTC CAAGTCTAAT TCGACCAAAC ACCAGCAGTC AATAAAGGTC CAAATCAAGT CACGGTGAAG TTCACGAAGA TGACCTTTTA CATAGCGATT
AGCCCTTACA
CAAAGTCAAT
TTCCATCT'rG ACGGAATTT'r TCrTCCCCTT CAACACCCAT ACTCAATATT GCCATAA'rTT TCCTTGATAT T'rTGGGCGAT AA.ATCTCCCA ACCACGTGA GAATTGATTT TACGTCCAGG AATGTTCTGG TAAGAGTGGA CTCTCTGGAT GCTTAGCAAA AAGGTTGATA GTAGTTCACA CCAAGGAAGT CCACCGTATT CTTCCTCTGT AGCATCAGGT AAAAGACCGT GTTCATGCAA GATAAGTCCC CAAGACAGAT GGATCTAAGA AAGATTGGGC GAGCTGCCTT GACATCAGCA GGATGCTGGC TACGTGGATA CAATCCCAAT CT'rGGAATCA GGCAAAAGTT CATGGCAAGC CCAATTGTGT ATGATAGGCT ACCT'rAACAG CTGCCTCTGC 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 ATCCACCTTA TGTGGATAAT GGGCATCATA CTCGTTAAAG GTAATCCATT GATCCACTAA ATAGTCTTCA TAGGCTGAGA CTGTCGCCTT GGCAAAAGGT AAATCAAAAT GATAGAGATT AGCCTCAAAG ACCTTACGAT AAAAATCCAC TGGAAAAATC CGTGACCACT GAATAGAAGT AAAATAACCA AATTCTACAG GAACGATGGG ATCTCCATAA GTCTCAAAAC AAAAACGAGC A'rTTTCCCAA CCATCACCAT CCTCTTGAAG GACTAACAGA CGAATTCCTT TAGCCTTAAT ACCTTGAGTG T'rGACTTC CACAGCCTTG CCGAAAGGCT GTGTGACCAG TCTCTAACAA 613 AAGCTCAATA TCCCCCTCCC ATTATAG'rAA CGATTTGGCT GTCACCAGCT ACACGTCCTT AATTTTCATA AAAAGTCGAT CCACTTGGAA CCAGTAATCC CTGTCTGCGG TCCAGAAGTA c MXGAAT
AAAACAAGGT
TTCCTTGCTTA
CCAGTTGCCT
CGGATTGGAA
TCATCTGTCA
GCTTGCGCTT
TCTTTAGGGA
CTTAGCATAC ATTTACCTCT TTATCTACTC AAAAACTAGT TACA~rTTT CCTTGTTTT GGATTTCAAG CGTTITCAAGC ACGTTATCTG TGATCTTAAC TTC'rACAATG CCATCGGCCG GACCAATCAA GTCACTATCG CTAAATTTAA AGACTTCATA ACCAGCTCCC ATCAAGCTTG CTTCATCCTT GACA'rTGACA GTAATCAAAT AATTGATTCC CCAAGCGTAA CGGTAT'rCAC GTCT'rATCTG AACCAATCCC CAGAGATTGT C'TCCCTTACC GAGCA'rCCCC AGACAAAA'rC ATTTCTCCCA T'rATACAGAA CTT'CTGATTA 'rAGTTTT-TAT CATGAACCTC AATGGTGTCA CTTTrTTTACC AACAGTGATA CACCGACACG TTCGTTACGG CTTCAAGTTT TTCTGTCAAG GCACATCAAA TGGTGCCAA'r CTTTTGGCGT rGTTAACA AAGAGGCGAG CGTGT'rGCTC CATCAC'rGCT GAAAGAAGAC GGCTGACACC GATACCGTAA CATCCCATGA TGATTGGCAC AGCACGACCA =TTCATCCA AGACATCTGC TCCCATGC 1' GCTGAATAGC GAC'IrCCGAG AGGACACCI' GTCCATCTGG TCTGCAGTAA AATCACGGCC CCGACAACTG CATTGCGAAC GGCAAACCAA CTGGTCCAAG TCGCTAGCAA CGTCAAAGAA TGGTCATTTC CAACTAGAAG TTAATCGT'r GTTCTTCTGG TTTGAAAATA TGACCGATCT CAATACCACG GCAAATTTCA CCCTCACGAA CTTCACGGAT TGGGTTCACA CCAGTCAAGT GGTAGTCATC ATCTTGTACC TTACGATCTG CAATAATTTT TGAACCAAAT CCTGCTTGAA CAACATTCGC ATCTGCTCCC AAGTGATTTN GGCTGCAACA ACCTCACCAT AACATTGAGG AAGGCTGCAA
TCAACTTGAC
CTGCAATGTA
CTTCATCAAT
CACGGTTGCT
CACTTGAGTA
TTTCTTCTTG
CGCAAAGTTA
ATCCACATAT
TTCGTTAGCA
PLATATTCTCT
CACTTCTTCT
TTCGTTGAGT
GAAGAGGGTT
TGATTTAACA
TGGTTTGTAC
AGCAATGGTA
CACTTCTGCA
8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960
TCTGGCGTTG
TCGTTTGTTG
TCTTCACCAG
GGAATTTCGT
CGAGCAGATG
CCAATAATAG
TACTCATCAT
ATGATAAACT
CAACACGAG-T _AAC'rTC'rTCT TCAGCGACAA CCATTTCTAA GTTAGCTGCA TAGCTAGACT AGACTATCCA TTTGAGCAAT TCTGCCTTGA CAAATGAGGC AACTGACTTG TCCAAGACAA TAATGGCCAT AAATTCTTGG CTATCCT'rAC CCT'rGAAGTC TAAACCACTA CGAGTGAAAA AAACACTATC CAAACTATCA TAGTTAGCGT CACGTGTACG AAGAAGTCCA TTACGCGGGC CCCAGCGGTC AAGGTCTGTA CACCCATGGC TCCACCGTCA TACGCTCATA GGCTGCTTTG GGAAACTATA AGCATCCTTC GTTTrCATC ACGATACTTG 614 GGCTGAATTT GATAAAGGT'r GAGTGGCAAT TGCTTGTAAG ATTTAACAGA ATCACGGACA ATAGCTGTAA AGGTTTCTTC GTGAGTTGGA CCTAAGATAA AGTCI'GATTT TTCACGCT Tr AG'rTrGl
GCACTAAGAA
ATGATGTTTT
GCTGAAACTT
TCGCTTGGCA
GATTATCTAA
ATGATGACCA
CGGATGGCTT
AGATTAAAAA
CCATTTTTAG
AAATCTGGTT
GCAGTTGTAA
CCTAGAAGGT
TTTTCAGAAA
GCTTGGATCA
ATTTCTGGTA
AAAGGTCTTrC ACCATAGGP GGGCTGGAGC CAACATCTCA TAGCTTTTTC AATCACACGG CGCGAACATA ACCAGCACGC TTTCGCGAAG CGTTGGGATA AAAAGAGTCG CATAATGTCA CTCCGGCCAA GGTGACATAG CTAGGATAT'r GAGCACAATC TCCCAATATT GATGGAAATC CAGCATCACT ACTTGCC'rTA GGAAAATCAG ATTTTTCAGA AACCACCTAC AAACATGGAT TCGTAACGAC CTGAT'rCACG CCACAATTCT ACAGCACCAA TCTNTCGAA 7CTTGGCGC TTGGCAAGTG GTAGATAAGA ATAAACACCr AACATAAGAG CATGGCTGAT AACTTGAGCA GGCA7TrrTAC TTTGTTTCAT AATA'rCCTC TTCCAAGTCA CAGCAATCAT CAAGACAACC GTTTCAATTT CTTGTTTCAA TGGTTTGCGG TTACCACCAT CCAAGGCTGG AATCGGAATA ATTGCCAAGA AGTACAAGAT ATTT'TCAATT AAGATAGCAA CAGGTCCACC CAACTTGTTC GCTGACAGAA TTCGGAGAGC.TGAGTCAGCA AGAAAATCTG ACT'rAACCCC CG'TGAACA 0 *0 0 0 00000 000 AACGACCTTG ACTATCTTTG GGTGTAACAG TGACTI'GTTT TAGTCACATC CAAAGTCGGT GCCGTCTTAT CTTTGGTTTC AGCTTTCCCA GTTGCTAACC CTCCTACCTT GGCCAAGGCA TCATGTGAGC CAATC=GGT C GGGGCA TGATATGGAA TCAACATCTC TGACACCACC CTGCATAAAG ATTAAAACCC AAAAALACAAC ATAAAATTGT TCATAGGACC TGCAAAATTG GTAATCAGTT 'rGCCCCAGAT TGATATTGAA CATCTAAAGG TGCAATCCGA ACCTCAGTAC CATCTGCTTC GCATCGTGAT CCACTGCAAA TGTTT'TTTCT TCTTCCAGAA CCAATCC=r TTGTCTTCAA AATCAAACTG GGTCACCTGC ATAGGGAGGG CTGTTTGATC CCTGAGAGAkT TGATGCGTTT AACCTTACCA TCATCAGCAA GTGTCAAACT CCTGTCTTGA TT'rCAG'rTGT ATCATCACCC CAACCGGCCA TGCGGACATA GGCAAGATTC GAATGGTATA GGCCGTTCCA TCCTTGCCAA TGTGAGCAAA CCCA'rACCGA TGGCAAATTC ACGTACTAAA ATCCCTGATT TCTTGGCAAA CCGAACTCGT GCACCACTAC AATAATCCCG AAAACCAGAA TAAAGGTTAA ATAGCGTTTC CTCCGTCTTT TGAT'rAAAAG AGTCCAAATA AGTGCATGAT AGCAACATAC TATCGAAACG ATCCAAAACA CCACCATGTC CAGGGATAAA
GTCACTCCCC
TGTTTCCACA
AATTTd- CCC
CTGATTGGTA
GACACCTAAG
AGTCGCATTT
CACAACCGTT
GATAAAGAC
CAATTTTTTA
AACAGGCGTT
GCCACCCAGA
AATTTTAGGT
GTAGAAGTGA
A.MTTCCGAGC
TGGAAATACA
TTTCCCACAA
10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 400 11460 11520 11580 11640 11700 11760 615 TCCTTAACAC CAAAATGACG 1TrGATCGAA CT'1TCTAGTA AATCACCAAA ?TGTCCAGCA ATGCTAAAGA AAATAGCAAA GACTGACATC TTGTAAATrC CATA'rGGAAG AGCAACTGTA CTGTCAACTA TCATAAGGAT CCCTCAAGGG TTTTATTACG CCAACAAGAT AGGCACCACT TCCAAACCTG CAACACGAGC ATAGCAAGAG GGAAAACCGC AACATGAT'rG AAATCAAAAC AGGTAATTCT CCAAGGGAAT ATCGTCATGG TCTCTAGACC ATTCCGATTG CTATCTGAAG AGGGCAATCC CTGCAAACAA- CTCCTCCAAA TCGGCGATGA CGTCAAAATC AGGCCATAAG GAAGGAAATT GCTCAAACGT AGTCCTTAGG CkkATGCTGA AATGTACT AAAATTGCTC CTAAAATACC ACCCAAGGCA CGATACCCTr GGTGCTAACT TTCC'TTTCCC ATAGTTrCATC GTCTGTCGCC CAGACGATAC ACAAGGCTAA GAGAGCCTTG ATCTAGTAAA GCAT'rAAATC CAAAGCCCAC GTAGAAGCTC ATCCTCAATC GTATAAGAC'? TGCTAAAAAC GGTCGTTCCT ACTATAGGCA ACCACATTCC CATCAACTGG CAAAAAAGTC GGTCAATGCA AAGGT'rGCAA AGAGGGTCAA GAGGCCCTCC 'rCTCATCTTC AAAAGTTCAT GCATGGCTAG CATGGCTATG CAAGAGGCCC CCAATCATTA AAATTGGTAG GAAAATACCC GGTTCTTTTC TGTAAATCCT GGGTCATATT 'rCCTCCTAAA CGACGATTAT AGGCAAGAAT AGCTTCC'rGC AAGGCCGCT GTGTCCGTAA AATAAAGCTC ACTATAGOCT CCCTGCCATG AATTCTCCAC TAGTACGGAT AATCAAGTCT GGGTCTCGTA 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500
GTAAAGAGAT
CTGGGTTGAT TTTGGCATCT AAAACATCCT CAGCACGTCC ACCATAGTTA AGAGCAAAAT ATTCCTCAGC CTTGGTTAAA GCTTCAAAGG TCATTTGAAT CTTAACATTA TTCGCATGTA CTGGCAAGTT CATGATAAAC TTGACTTCCT AAGCATAGAC CGTA.ATAACC TTGACGCCCA ATGCT'rCCAT GCCCGCCTTA TGTCCAAAAA GGCCATTGCC ATCCATGATG ATGCCGATAT CCACAGCCTT ATCTT'rCTTA AAAAATCCAA CGTTTCATTA TACCATATTT CCCCATTTTC CCAAGCCCAT TTTTCAAAAA AATAAGCCGC ATTATTATGA AAAAGTTTTA GGAGTTTAAG TACACTCCCT AGCTTAAAGT TTCCTTAAGT AGTTACCAAT CAATTCCTCT GTGATGTCAC GGGAAATCAA CTTAAGCGCC TGTGTAATCT TAAGAATCAA TCCTGTGTTG TTCTTAGTCA TTTGCTTAGG CAGGCGGTCT GTCTCCCCAA GTTCCGGGAC ATAATTATCA TAAAACTCTA GATCTGGACG GGTCCAGTTT TCCGTAGAAA GTTTGTTGGC TGCCTTGGTC ACGGTTTGCA CTCGCGGTTG CATACGTTT TTAGCCCAAC GAGCAGGAAC CTGTGTCGGA ACATGATCTT ATTCCTATTC TTCTA'rCACT AAGCTATTTA CTGATTGGGC GACTTTATTT TTAAGGTCT-r CTTAACTTAT ATTTTTAAAA ATCAAATTTT
ACCTCTACTT
AAAAATCTAT
TTCTCAGGCA
TTATAGGGAG
GAAC'N'AGTG
TCCATTTCTC
CTGCCAATT'r TTCTTGGATA AACGTGTTTG GAAATGAGGA GTTGGACGAA CTrTGAAAATT GAGTTCAAAA TCTAATTCTT CATTCAAGCG CTTACTCATA GCA'rCACGAG AACTGGTATA ATAAGACCTGA TTITTTCAATT CCCACTGGTA CAGTAATAAG GTTTCCTCAT AAGAAAAATC TTCATGATCA AAAGGGGA'N' 'GTCTGACAA TGCTGCCAAC CACGAGTCTT TCAAACC.AAG 616
ATAGAGTTCC
CAAAATATCC
CAGTCCAACT
GGAAGCAGTG
CTGAGGAAAA
TGGATCAAAG
AAGATTCCAG
GTCTCGGATC
TTCTTGCTTA
CCCGTCCAGA
CTTCAACTGT
CTCCACTGTA
CATGGTCAAC
ATT~CGGTCTT CATTTTCTAA TCCAAACCAT AAGGTACATA GCCGTACACC GTTCTGGATA TGAGGACTGT GCTGA'rGCAT TCCTCTCTCA GCTTTrTC'rC AAAA'rCACAT CTATATCTGT ATGAAATTTC TGACAGALACC ATCGTCAGAA TGGCCATCAT ?TTTCACI'G TATTCATAAC CCGTGTCCGC ATGATAAAAC CTCACGCTCT TCGGATAT CATCTCAATC ATGGTGCGAA ATCCAGCATA GTCAAAGCCT CTGGGCAAAA TCGCGCACGC ACGTAGGGCC AACTCAGATrC *9
ATCTGGACTT
CTAAGTGCTC
CTTTTTGAAT
TCACAGAAAG
GGATTTTTTG
TATCGGTAAT
GCTTGAGGAG
CTGCCTCATG
AGTCAGCATG
TCTCTAAAAG CCTCTAAGAT ATATGCCTTA GCAGTCGCCA CAAGTAAGGC TCATACATGT AGTTCCTAGA CCAACAGGTC ATCCACATAG TCCAAACCTT AACATCATCG ATAACCCCAT TCCCCATTAT ACGATTGGCA ATACGAGGGG TTCCACGACT 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 GGTGATTTCC ATCTCAAAAA AGCATAATAC TCCATATGAC TTGAGAGCAT ACCAGCCCGA GTCGTCGCAC GAACACTGCG ACTGCCTTCA CCAGCCCCAA CACTATAAAG CACTTCTTCC ACTGACA'1GG CATCTCCAGG CTCTAAATCA T'rCAAAATCG GACCAGACGT TTGCTTGAGA TTGACTCCCA TTT'TCCCAAG CCCTGGAGGG CCAAATAAGA TAGCGGCTTC GATAAAGATC TGAAGTTGAT GTAAATACTG AGGACGGAGC GTGCGTTCTA TATCTCCGCT CCGCTCGACA ATTTCTGTCA CTGTAATCCC AAAACGTGCC CGTAGTGGAT CAATCAAGGT AAAAGCAGGC AACTCCAAAT TCATAATATC GATGTAGAAG TCCTCCATGG GTAAGCGATG AATCTCGTCA ATAAAGAGGA CTACCAAATC ACCCGCT?1'T TCGATA.ACAG GTTCATTGGC AATGACAAAA GCCATCGG'TG GCACATGATC CAGCGCTTCA TCCCGCATTT CCTTAACCTT ATCCTGACCA ATATATTCAC CTAACTCCTC ATCACCCATC ATCTCATTAT C?7ALATTCT ACTCATGGCT1 CTATTATATC AAAAAAAACA AGCCACAAAC AAAAAAGCCA CCTGATTGGG TGACTCCTAA GTrTAGCACT TATGTGGTAT AATATTATAC GGCACTTCTA CACCGCCTAC GAAAGGAGGT GAGATAGCCC ATGATGGAAT TAGTACTCAA AACTATTATC GGACCAATTG TGGTCGGTGT CGTTCTTCGT A'rAGTCGATA AA'rGGCTAPLA CAAGGACAAA TAGTGTCAAA AAAGACCTCA AGCTITATT'rG GTCG'rGAGCT TGGGGTCTTT 'rCTACCTAT 617 GATATAGAAC TAGTACTCAA TTCCTN'T"A TrATCCCATA GTTCACGAAT TTTGTCAA)A CTTTACATrT TCTAACCG CTGTACGACA AGACGGTrAA GATrAAGAGA ACGTTAGGGA
TTCTATCAAT
TCTCATTTAA
CAATTAACAG
TrCATAGAAA TTTrGATTTC TACGCCACTA CTAGACAAGC TCACTTACAA TCAAATTGAG TTTCTTAGTC ATATTCGCTA AAAAAATCCC CCCCGACCAA AATCCGAAAA. ATACCGAAAA AAAAATCCTG AAATAGAGCT AAAAAAC'TCC 'rATGAAAAAG AAAAGTTTAG GATTTTATTA TACATGATAC AAGACGAAAC TTAAAACTAG AAAAAATTrC TATCACCAGC ACCTCACCAA ACACCACCGT ACGTGCCGTT TGGCATACGG CAAGGTAATA ATCCAAACAC GAAACCAGTC CACGT'rTAAG TACCGACT'rC TGAGCTACPA TATCTGCTAT CCATTTAGGA ACTCCTAACT TCTTCCATTG CTTCCAGATA ATCACTCGTA CGACTATACT TTTCATATTT CCCAATGAGC TCAGTTGCTC AATACGTCTr GTTAGGTCTA ATTTAAACTT AAATCTCCGA ACACTATCTT ATTTCCAGAA CCCAAAACCT AGATATTTCA AGCCGTTTCT CAATAAACGA CTGACTGAAT INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS, LENGTH: 8136 base pi TYPE: nucleic acid STRANDEDNESS: doublE TOPOLOGY: linear GTAA.ACGAAG AGACAATCTT ACATGTCACT AAAATCATrA TTACAGTAGT TCCCTCT TTGAACTAG CTGAAGCGAC CACAGACC'PA CGCCAAAATC TCAAAAAGTC CCCGCCAATT ATATCGAAAA ATTATTTTTA GA.ATAGTCCC ACCTGAT'rCG GTGGAGTTAA OGGAGATTAT AATAAAGTrA GGAGGTCTTr ATTTAATAAC CTrAACTrTT CTAAAATrTT ACTATrMGC TCGAGTAGGG GATAATCTCr AGCCCCTCTC CGGTTCAACT AACTTTTAAC GCATGTCGTT CACGTTTTTC CAGGACTGGT TTGATATAG 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16535'
ATTGATAATG
TAAGCAATCC
GGCGAGTACG
AATAGTTTAT
TACTCCATTT
GATGTGQACG
PCTCTCTTGG
A.CATC
GTCGCCCCAG CCAGATACCT CCATAATCGT CTCGATTTCT CAAGCGCTCA TCTATGCTGG CCATCCTCGA ATAGACAAAT CCTCTGTGTT AGTTTCTTCA.
GCTI'TCCAA CCATCTGATA TCATGTTTAC T'rrCAAACCT
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CCAGAGCGTT GCGTCCGAAA GTCTATCCAG ACACGGCTCT TTAAAAACAA AAGGAGAAAT GATGCATACT TATTTGCAAA AGAAAATTGA AAATATCAAA ACAACCCTAG GTGAAATIGTC AGGTGGTTAC CGTCGTATG TTGCGCTAT GGCTATCTGG GATGACCTCT TTGCCCATCG TTTAGGAAGT TTTCCTCTCT GGCTGGAGTT TGGGATGATT TGTAGCTTGA CAGGGATTAT AAGTAATTAT CTTTTTGGCT TGATTAACTC AGGCTTTTAT GGTGAGGTGC TGACGACACT TCTAGTTTGG ATTTATCAGG CACAGTTTAA ACTGGACGGC AAGGGCTGGA CAAAGTATCT GGCTGATTTA GGATTTTCAG GAACTATGAA TAGTTGCC CAGTGGATTT ATTTGCTGGT GGTTTACGAA CATCGTATTG TTGACTGGAT CTGTGTAATC TTTGTATCGG AAGGTCGAGC TGI'TATTTAC CTTATTTTGG CCCTACAGAA TTACTTCACA GTCATGCAGC CAATTGGACT GAAGGAAAAG CAGGAGTTTG TCGCGCGTAA TTCCATTAGT GTGCTTTGGT GGTTGGCCTT TCGTCCCTAT CGTGATTCAA TCACAGATGC AGCTGN'TAC CGTGAACAGT GGATATTCTG CTGGTGGGGA GAAAGCCTGC AAATTCAAGG AGTI'GGTTGG TATCAATGGA GCAAGGCAGC
TGGCTTCATT
AACCAATGGG
GGCGGCTACC
GAAATATCTA
TAAGCAGAAT
TATCAGTCTA TTGGTGCCAA GTAGGGCAAA TCCTCATGAC AATGTCTTTT CAATCTATC'r ATTTATCTCA TTAACAGTCT ACTGATTTAC TTAACTAGGA AAAGATGTTr GAAAGTGCTG TCGATTAAAA CAGATATAGT TGATAATCAA GGATTTATAG TATGAAAAAG GGTCCTCTTT TGTTGTTGAA AAGATAAAAA ACTCAGTAAC CTAGAAATAA GCTTTACTCr ATATTCAATT TGGTCTGTGA AATCTTGACC ATGACCAAGT GTrGGAGGTA GCTAGCACGG AATAGACATG T'rTTTGGAAA ATAGCAGACT TGGGCGAGGT AGTCCTCGGT TAGCGGCAGA GCTGGTCCAG CGTCCCACAT CTGCGAGATC TTCAAAAGTT GTATAAAAGC ATGTTAGTCA TATGGACACT CTTCTATCCT TTCTTTGAGT ATTATAACAA AGGAATGAGA TAAATTTGAC TTATGCCATT TTCTTGCTGA C'rCTGTGCAT TAGAAACAAT CTCCAATCGT TTTAGGAATG AGAAGGTCTA GATAAAATTG TTTTTTCAGC CACCAGGTCA ATGTCTCGAT AGAAGTAGGC AGATTAGGGT GGGCTTCTTT GTGTTCTAGC TCTT'rATGGA GTTGACGGAG
TTTTGAGATT
AGGATCGGCG
GACAACTGAA
GACAACTTCC
AAAGTTGGAC
TAAATTATCA
GAAGTAGTCA
GAGAAAGAGG
ATGGAAGAGG
AAAGGTGGAT
GCCCTTAGZA
ACGGCGGTCA
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 GGTGATATGG TCTTGGTTTT TGAAATGGCT TGCTCTCTT AAGAAGCTCC TTACTCTCAT AATGATATCC TGAACAGTAG TTGATAGATG GCTTTTTTGG TAAGGCAAAT TGTTCAGAAC TTTAGTGGAT AATGATAATG AATATGAAGG CAAAATATGC GTTGAGTTTA TTGCAGGTGG GACTTGGGAG ATGCGATTGC GAAGAAGACA ATCAGTACAC
TATGAAAATG
CAAAAAGATG
AGTGACAGTA
TGGCCTCGTA
TTTTGCTGAT
TGAATAAAGC TGACGTTTTG AACAAGGTGT TCATAAATCT TGTTTGGGTG GCTTTTTTCT AGTATTTGGT TCTAGCGCTG AATTGGAATA TCAGCTTTTC CTTGGGCTAT AAGCGGTTTA 619 GCC'TGCTAGG AGCCTTGGTA ACAGCrGTGA TCTCGTAAC GGGCTCTGTT CTAGTCATT'r 1980
TGGCAAAATGT
TAGGAATTAT
CAAAGAATGA
?ErATCCTGAT
CCCTTGTCAT
AGATTrrTT~ AGCCArrGA
CACGAAGATT
TGCGATTACT
CTCTATTCTG
GCGATTGTT
TrTCTTTCT
GGATGCTGTG
CAA'rGTGGCC TTGCATCCGC AACCAGTCAA TGATGAGGGG ATTCTC'rCGT ATCAATCTG.T TAGCGAGTC'r C4TG 3 G' AAGGGAAACA AGTCTGCATT- TTCTGGAAGA TACGCTAGGG TGGGTAGCTG CTTCGATTTA CGGACTGGTA TATCCrAGAT CCTCTTTTGT ATTCTT1'CAA AAGCCCTTCC ACCTTTTGG TCTACACTCA CCAGAAGC TITGATATCAA GCAAGTAAAG AGTGGCCTGG AGCCTTAATC AGCTTAATCT CTGGACTATG GATGCT??GG GTT'TGTCTAA AAGAAATGGA ACATATGGAA ACTTGTAAAG AAAAAAATGC CATTGTCCAT ACTCTAT'rCG AATTTTCCTA CTGACCTAGA AACTCACCAA AGCATCAACA TTAGAAAAAA ATTTCTTTAT TATTTAAATA GATATATGAT TG'I-rAA'rGAT CAACTCAAAG TTArATAATA TGAGAGGAGG TTAGCGTGTG ATACGATTCG GTATTA'rGAA GGATTCGTGA TTTTCAAGAT CGGCGGGTGT C'rCTGTAGAT AAACGAGAGA GGAGAGCTT 'rGTCTCAGCT ACAGACAGCT GAAAATTTTA AATGAAATCA AAAGATTGTG cGr'rTCAAAA TATTACCATT ACCCATA6AGC GAAAGGTGTG TGACTTGGAA GTGAAAAATA CT'rGGGTACT ATCTTATTTG TTTCAAAAAT TGGTAAGAGA AGAGCATrTCT AAkAAI'rr'rT CGATTAGATA CAAAATGCCT AGATAAGTGA GTTACAATAG AATATTAAAT CTGCCAGTGA CGGGTTGGTC TTGTGCCACC CAGGATATCG AAGCGCTGGA AGTTTAGTTG ACTATA'rGTC GGTATTTTAG AAGAGGAAAA TTAAATCGTT TAAATCTCAA GCAGTATATA CAAAGGCAGG
CGTGAATTCA
T'rrTTGGGA
GATTACTCGT
ATTTATTAAG
GCTCTACCAA
GCAAAAATTA
AATTAAACTT
TCACGTTGGA
TCGTGTGGTT
GAAAGCTGGA
GAAATTGATG
CGGAGTTATG
GAATAGAGTA
ATAAACTCCA
GACTTGGAGT
GTGAATGAAA
ATT'ICAGcG
ACTGCTACTG
TGTTTTCGTT
AAGGGAGATG
GAGGAGCGCT
TATAAGGAAG
CTTGCTAGCA
CGTGCGTCCG
CACAAAAATA
.2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 TTGAACGTC CGCAAATAATA GAAG~CGATc ATGTGATTAT TTTGTGGTTC AGATTTATGG AGGTACCGTA ATCCAGAAAC GTGGACACCA AGCGATTGGG ATTGTTGAAG AAGCTGGGGA CAGGTGATTT TGTGATTGTC CCTTTIACAC A'rGGATGTGG CTGGATTTGA CGGTTCTTGC GACAATCATA TTGGCAATAA CAGAATIATAT TCGCTTCCAC TATGCAAACT GGGCGCTCGT CTGACTATAC AGAAGGGATG CTCAAGTCCC TTTGACTCT AGCCATTACG ACGGTGAAACG TGAGTGTGAT GCCTGTCTTG TTGGGGGG'r GATTITTCACG TAAAATCCCT GGTCAACCTT TGCAGATGTC ATGCCGACAG 620 GCTATCA'rGC CGCGCGTGTT GCAAATGTTC AAAAAGGGGA ATGGGGCTGT TGGTCAATGT GCTGTCATCG CGGCTAAGAT TCCrTTTGAG CCGTCATGAA GACCGTCAAA AGATGGCTAT 'rTGTTGCAGA ACGTGGTCAA GAAGGAATTA CCAAGGTGCG CAGATGCAGC ACTTGAATGT GTTGGTACGG AGGCTGCTAT T'rCATAATGG AGGGCGTA'rG GGCTTTGTAG GAGTCCCACA GTTCGACATT TATGCAAAAT ATCTCTGTAC CAGGTGGGGC ATAAGCAA'N' TTTACTAAAA GCCGTCCTTG ATGGTGATAT CTTCAAGTTA TAAACTGGAA GATATCGACC AAGCCTATAA CAAGGTTGTT GTTATCGGTG GCGTGGAGCA TCACAAATTA GGAGTCAGGT GCGACAgcTG 'rGAAATCCTC GGTGGAGGAG AGAACAGGCG CTAGGTGTTC CTATAATAA'r CGTGCTCTTG AGCTrCTGCT ACAACATACG CAATCCAGGT CGCGTCTTTA AGATATGGAT GAACGTAAGA CAAT'rAAGTC
TTTTTTATGT
AGCATGGTCA
CCTGCTCGCC
TTGGTCAGAC
TATGATTGTA ATCGAATAAA AAACGAATAG GAGTTTTAGA ACTCTATTCG TATCCTATTC TTGAPTTAGG GTACTTTCTC TTAATGTC-AG TCTGGTTCCC GGCTAGGGAT TTTCCGACCG TGGAGGACTT CCTTGTTAAG AATATCCATA
CCATTTCTTC AGTA'rAAACT GTAATACTAG AGAGGGGAGG TAGTGTCGTT AAAGGAAATG AGGCTGACGC GATCTGGCAG GCTTCTTGGA GGGCACGGAG GGCGGAAGTT GGTCTCCCAA GCAG'rAAATC TTCCTrGAAA TT'rTTGAAGT TTTCTAGACG AGGCCTGTTA GAATCCCGAT TTCATAGCAG TGTAAAAATC TCTAGAAATA CAAGAGGCTT TTTCCGATGC AGAGAATCCC TAGCGCAAGA TATCATAGTC TAGTAGTAGA GGTCGTCCAG TGCTrGGGTT TGTGGGACTC ACGGTTAAAA TACGGTGTCT ACGCGGGATA CGGTCGCGAT GGCACCGA'rA GCTAAACTAT CGCTGGCTGC GCTCTGAATG GCCTCCT'rCA TTAAGTCATA GACCAGTTCA TCATGATAGA TTCCCCTCGC CTTGTCCTGA ATGATTTCTT CTTGGTCTGT
ATAGACCTGT
GCTGATTCCA
GAAAAATGCT
GCCAGACTGG
TTGACTATAG
TGTTTCTTCA
3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 ACGGTCCATT CCr'rGACTGA CGTGA'rAATA CAGGTATGTC TTGGTAT'rCT TCAAAGGCAG AATCACTTCC TCGCTTAGGG CAACTCTTGG GCTC'TTTTTT CTCCCCTTGT TCGCTGACCC GCCTGTCT'rG AGGTGCTTGG GGTTTCTTCT GTAACAGATA AGAGACAGAG GCTAGCTGTG GGAAATAATC GACAACCTGT CCAGGGAAAG TGTATCGCTG AAATCTGAGC TCGACTAAAC TAAAAGGGTG GTCATTAAAA CTATTCCTAG GCGAATCTGG ATTGGATAAT GGCAATCTTT TGTAGCCCAG CTCTrCAGCA GGCTCTGGTC GCGGTTGAGG CAATGTCTTT TAAGGTAGCC TCTTTTTIACr GATATTTTAC ATGTTTTTCT ATAAATCCTC C'rrGATTAGG TTAGTATATC TAAAATTTT'A GTAAAAAGGA TTGACCTTGG AAAATTCCTI' GGATATAATA GAAAGAAAAC GATTACACGT TAAGATGGCT TAACGGACAG TCAAAGGAGA ATTCATATGG CACAACATCT 621 TACTACTGAA GCCCTTCGCA AAGACTTTCT TGCTGTTTT GCTCAAGAAG CAGATCAAAc CTTCTTTTCA CCAGGCCGCA CGTTrCCT GCTGCTATT AGTCTTGCG'r TTCTACTCAG TGACCTCAAG 'rTrGAAAAAG CTTGCAAGAA GCTGGGCACG TCCAAA'rGGT GCTGGCTTGT TGAGCATCTC TT'rGATI-TAA AGAAAACAAC TTTATCGGAG GGCAGACCAA CGTGCTATT'r TGATTTGAAG GACAA'rGTCG CTCTAAATAC AATGAACGTC CTTGGATATTr CAGACTCTGG GATTAAAGAT GAAAATCGTT CCTCAAAGCT CAAGTAGCAC TGCGTCACAC GTT'rCTCTGG TGTTCACACA GCT'rGGGCAC TGGTGGCTGT GCcATTGCCT TTAATTTGAT TGGTGAACAC ACAGACTACA ACGGTGGGCA CCTTGGGAAC TTACGGTGCA GCTCGTAAGC GTGACGACCA CTAACTTTGA CGACAAGGGC ATTATCGAAG TGCCTCTCGC AGCACAACTG GACCAATTAT CCAAAAGGTG TCCTTCATTT TGATTGACAA AGGTTT'rGAT TTT'rATGTTT ATGGAAATAT CTTCTTCTGC ATCCTTGGAA CTCTTGACAG GAGTCGTGGC AATTAGAGCG TCTCGATTTG GTTAAAATCG GCAAACAAAC TAAACTCTGG CATTATGGAC CAGTTTGCTA ACCTAGATAC TAATACTTTA GAATACGAC'r TTGTTATCAT GAACACCAAC AAACGCCGTG
TTGGTATGCG
TGGTGCCACT
AArTrGGCGGA
TGCAAGTTTC
ATAGCTATCT
GTGCTGAGTG
GTGAATTGGA
TGAAACGTGC
TCCAAGCAGG
TGAAAAAGCA GTGGAAGAAT CGAGTGGGCC GTTGACCAAT
TCGCCATGCT
AGATTTGGAA
GTGC'T'GAAA ACCAACGTAC ACAT'rTGGAC GCTTGATGAA 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 AGCATGATTA TGAAGTAACT GGTTTGGAAT TGGATACCCT AAGAAGGAGT TCTCGGTGCT CGTATGACAG GGGCTGGTTT TGGTTCAAAA AGATACTGTT GAGGCCTTTA AGGAAGCTGT 95~
AGGCAAACAC TACGAGGAAG TAGTTGGATA AGGTGGCACT CGCGTCCTTG ACTAGTCAAA ATTTGTAACA CATGTCATTT CTGAAAGCTC CAATCGTGTT TTGGCACGAG TGGGAGAAGG ATTGATTGAC CTCAAGGACC AGCTGGTTGA TAGTCAGACT GCGCGTGAAA TCCTTGGTGC AAGTCAGGTC AATCGTGATT TTTGGGCAAC GGATT'TTTAC CAACTCAGTC AGAAAAATGA TATCGCTTAT CGTGTTCCAT CTGACTACGG GCCTGAAAAA GATCCCAAAG AGATTGTGGC CGCTCCAAGC TTCTATATCG CTGAAGTTGC AGGAGGCT CT
ATTTGAGGAA
TGTTTT'GGAA
AGAAGCCGTT
TGAACTGATG
ATAGTGACCT TAGTAAATAA ATGGATCGAA TCTATCTGAC
GTTGAGACCA
CCATTAGAGA
GATTT'GGTGA
ATCTGGATAA
CGATTGAGGA
CTCCTTGTCC
CTACGCCCAC TCTCCAGAAC AAGCGATAGA CTACATCAAA CTCAAGGCCA T1'GCTAGAAA AGAACTTGAA ATTACCATCA ATCTCTCTAA AGCCAAGTTG GTGCAAGCTA GTAATTATCC TCAGTGTCAG CTTTGTCTAG AGAATGAGGG CTACCATGGT CGAGTTAACC ACCCAGCTCG TAGCAATCAC CGTATTATCC GCCCTATGCT TACTTrAATG CATTAGTCGT CAGAGTI'TG TGCTGGATC!T AATGCCGACC TCAGGGAGGC CGTCACGTAT TGCTGGTTTT GAGCAGGTCA GACTTCGGAT TCCAAAGAGG CCAGTATTCA GATCCTGCAG TATCACACCC ATTGCCCGCA CAATCAGACT TCAGCAGAGT TATCAAGAAG GAAAATATCG TCTGAAAGAA GAAGTGGAGC CGATTA'rCAT CAGGAGTGGG GAAAAAGCCC TTGCAATCGT GATGCAGGAG TCTACAAGCA 622 GTTTTGAAAT GGTTGGTCAG GAATGGGGI' TCCAGTATTC AGCATTGTAT CTT rTTAGAT GGCCAGCATC GTCCCATGGC AACGTCTGTT GGCTATCGTA GACCAGTTTC CAGGATATTT TGCCGATTGT GGGGGGCTCT ATTCTAACTC TrCCTATGGA. ATrGGCTCCC TTGCAAAAGG AGGCTGGAAT TGTCA.AGTGG CCCATGTCTG
ATTTGATCAA
TGCAGATTTT
TI'GGCTGAT AAGATTTTGC GGCAGAGACA GACAGGACAC AACGCGATGG ACAGTTTGAG TTGGACTTGG ATCCTGATGG TATCTATCAT CCCCACAAGG GCT'rGAT'rGA GGTCATGGGC TTGGCA-ATCT AAGTCGCTAG CTATCTTGTA GGAGAAGCTG CAGACCAACT CAA.ATCCCAA CATCCAGACT CAAGGACTCT .GTGGGTGCTA TCTTT'GCGCG GACAGAACAA GGGCAGACAG CCTTTATGCG
ATGATCATTA
CCTTCCGATT
TCCTACGTTT
AGGAATGGCG
CGCATCACAC
TCTTGCGACA
ATCTCCAACA
TGCCACCACG
TACAGTTGC
AACGGATAA
TGTAC'rTGAG
CTTTGTGGAA
7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8136
CAGGTCGGAA TT'rTTACTAGA CTAGGAGCTT TCTCGG
INFORMATION FOR SEQ ID NO: 76:
SEQUENCE CHARACTERISTICS:
LENGTH: 10011 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
CCCATAGTGA AGAGTGGCCA TAAGAAGGTC TTCTAGGCTr AATTTAGGTT TTCGTCCACC TTTTGCGTGT TTAAGTTGAT AAGCTGTTTT TAACACAGCT GAACATCTCT TCAAAAGTCG TGCGCTGAAC ACCAACAAGA CATTTAAATC GTGTATCAGT TAGTTGTTTA CTGCTTCAT PATTCATAGA ACTACTATAC CATGTTTTGT TTCGCAGGAA GTCTAATATT GTCAAATACT GGAACGCTCA TTGCTGGGAT GGTTCAAAAC CAAGGTCTGT TCTTCGTTTA CATCTTTCAC
ACGGAATAAG
TGCAGCGATT
CATAACTGCA
ATrGGCCCCAG CTTCGATAAC TGGGATACCT GGTGTAAAGA. TATCGTAACC TTTCATAAGG TCACAGTGAA CATCGTAACC ACGGTTTGAA TGACTTGAGT TAACACCTGC ACCGCAGGCA AGTTCTTCTT CTAGAGCACT TTTAATTTGG 623 GCAAGAATTT TAATCATT'rG GATT'TCCTCC GATTTTATTT TTTAATAGAC GGTTGCTTCA GCAATGTAAG CATAAAGC TTCTGCI'rCA GAAATTTTG AAGATGACCA T'rTCCTGTGA AGAAGTCCAT TAACTGAGCA AGAATGTTCG ACTTGAATTA TTGATGATAA AGAAGACCAA GGATACTrTCT CATATTATGG AAAGTCACCG ATGAACAATA TCTGTGTGAG TATAAACCAG TTGGAAATGA ACAATTTCTC GTTC'rTCCAA GCTTCTAAGC AAAACACAAG CTTCTTTCCA TTAACTTTAT TTCTATTCT'r TTTTGTCTCT AATAAACACA CAAACAAATA AGAGCAAACTr AGGAAGCTAG ACTGACGAAG TCACTCAAAA CCATACATAC GGTAAGGCGA G'rTTCTCTAA TCGAACAACC GAATCATTAC ATTTGCAAGT CT'rTTCACGC GTGATCAAGG CAAGCT'rGCT ACCTGATCAA GTTTTTGTCA AAGAAATAAT GCTATA-AGTA TAACACTATA ?TTTTTATAT TTTTGTTTTG CTCCAAGCA'r TTTTCTGTTC CCGCAGTTGT TCAAAACACA CATGGTTTrG AGGTTGTAGA CGCTGACGTG GTTTGAAGAG
ACTTCCTTAC
ACCACTTTCT
CCTTTCCTAG
CTTCACGATA
AAAGTTATTC
AAGATTAAGC
ATAG-GTCTTC
TTTGACTTGA
CTGGCGCAAT
CAGCTAGATT
AAATTCCATA
AGTTGGAGTG
TTGArTTATCC
CTAATACCAT AAGGTTTTrCC TGAAATCGTT GTTAATTACT TTTATAGTT'r GTTATATAAA TAATACTCAA TGAAAATCA.A GTTTTGAGGT TGTAGATGAA TGAAACTGAC GAAGCAACAg ATTTTCGAAG AGTATAAAAA 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 CTAAAJAAAGC AGACCATCTA AGCCTCCTTT ACTATTGATT CTTATATAAA TTTCCTGTGA ACAAGGAAAG GCAT'rTCTGA TAACTTATTC TTCATCCATA CTCAAGACGC TGAGGAAGGC TTCTTGCGGA AC'rTCAACTG ATCCGATGGA TTTCATGCGT T'rTTACCAG CTTTTTGTTr TTCAAGGAGT TTACGCTTAC GAGAAACGTC ACCACCATAA CATTTAGCAA GTACGTTCTT ACGAAGGGCC TTGATATCAG AAC'rTCAAAT TGTTGGCGAG TTCGTAGGCA AAGTCCTTGT AAGAATATCC ATTTCACCA TGCATAACCA CGTGTCGAAG AGAATTTGA TAGATAACAT CCCACGCTTA CGCTGAGC'rA TTGCGCCTTG ACATAAGGCT TGGGTTAGAC ACATCCATAG AGCTGTCATG ATGAGGTCAA TACGAGCGAC AATCTTGTGT CCAATAGCCG GGATGATT'rT CTTGAGT'rTA TCAACGATGA GAACGATAAA GCTGAGGGCA TCCACC'rTAT GCTTAGATGG GCGATATTCT GACAATTCGT ACTTAAGTT ATCAAAGAAG TCAAAGACAA
CTTGGATTCG
GTTTCCCACG
CTCCATTGAG
AGTCAAAGCT
TTTCAGCAAG
TCACACGGTT ATCATCAATrA GCTCCATTAC TGCTCCGACG CTTCAATGGT CGCAATCTTA ACTCACCGTC GGTCAAATTA TATTGAACTC ACGCTCTAAA 'rAGTCCATAG TCACAAAGTC AACTCCTGTG GTACCATCAT GTTGGGTCTG GAAACTCAGA ACTTTGTAAA TAACAGACGG CGTTCCTGGA TAACATCCAT ATGGAGAAGT CCAAGAAATC TTCAAACTGA AGACTAGCAT GTACTTGTI'T GATrcGATTG ACCATGTAAT GTCTGCCG ATCCTGAACC GCTTGATAG 624 CACAACGGAA ACCAAATCCA AGTGCCTGAG ATGTr'rCTGG CATTrCAGTTG CAA'rTTTTCA AGcGCTTCAC GCAGGTCAT'r GGTAGAGACC CCCAAAGACC ATAGGA'IrCA TCTGCTTATA CAGGATTGGT TGCCAACGTA ACGGTATCAC CCACACGAGT ACGCCGCAAT GTAACCAACA TCACCAGTCG CAAGGAAATC t.
ACGACCAACC GCTTNTGGTG TAAAAA'rACC GCTCATGAGC TGAATCTTAT CACCAGGTT'r GATAACCCCA CGGTAAGCAT CGTAAACAGA CACATCACCC G'rGGTGCTG GTACTTTTTC AATACCAGCC TTGGCAGAAG CCAAAACTGC AATCTCTGTA CGCACGCGCT CCGGATCTGC CATGATTTCC AAATCATTAT CCAAAGCCAG TCCTTGAGCC GCATCGACCA CCAAAATAGC TTCATAGGTA AAGTCAACGT GCCCTGGTGT ATCTT'rTGCA GTGTAATTCA ACTCGATGGC CTCTAGCTCC ATGCTATCCA AAAGCTGGGC TTTTTCCAAA ATGCGGTCTG CI'AGAGTTGA GAAGr'rACGG ATCTTCTCCT GTCGTT'TT TTCAGGGTAT CTATTTATTA TAAATTGTT GGAGTACTAA TCTTCAGCGA CAAAGCCGTC TTGGTCTGTA AAGACAA'rCC CGTGAAGGAC AATCTTGCCA TCTT'GTAG TCCAAAGCTC ATAGACCGGT GTATGACCGA AGACAATGGT 'rGG'TTTTCTA ACCCATACTT TTTTATAATC ATCAATACCT GCGTGAACAA AGATATACTT JAPGAATTCG ACCAAGTCTG CCGCTTCAGC AACTGGTGCA TCCAAGGGAC GACCTAGGAT ACTATAATGG TCATAACTTT CTTCTGGGTC GTTTCCGGAC AAACAGATAG CCCCTTGATT ACGGTGACTA TCCTCACCTC TGTCAATCAA
GTCGAAAATC
TACAATTTGC
TTCACTGGCA
AGCCGGCAGG
ATAAACGT'rG
ACCCTCACAG
GTCAATCAAG
ATTCAACTTA
CTGCATTTCA
TTTTCCGTGC
AAGGCCTTAA GTGGCGCCGT TCGAGGATTT CTTCAATCCC TCCAAACCAA TCACATCTTC TCAATTTTAT TAATGATAGG GCAAGAGTTT GAGCCTCAAT GCAGCTAGCG AACGTGAAAC TGGAAAATAT AAGTTTCCCC ATAGTAATTC CACGTTCCCG CGACT'rGAAA CCGTCTCTGT TCAATATGGG CGATAATAGA GACTTCGGCC ACATCAAAGG TCTTAC'rATT GACCACTCCG TCCATGACAC GCACTTGGAG 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 CAATTC'rTCT AAGTTCATGA TTCTCTTCCT TTGATATTTT GACAAGACCA TACCCTGCTA ATTTTCGATA AAGTGGTGTT CTrGTCATTCC ACCACCATAA ACAGCTCCTC CATCCAT'rCC AGATGTACCC CGTTCTTGCT GTAACAAACC TTTTCCAGTA TGATTTTCAG CTCCGTGGAA TGTTGTTTCA TGCCAGTCGT CCAAGGTCAA GTCTGTCTCT ACTACAAATG GgCAACCCGC T'rGGCATCTT AGAGTTAATG GTTGTATCTC ATCTAGCCAA GTCAAAAACA GTCCACCAAG TCCTTGACCA ATCACCTAGA AAGAGCAACT
GCATTTGACG
CTACTCCATC
CACCATTGCG
TATACTCGTG
TTTCAAGAAC
GGGGCTGACC
ATCCCAGGTT TTGAGAAGGT ATAATAATCT GTCATCTTAT CTGCTTCTGT CACATCATCA CCGTCAAGAG ACGAACAGTC ATTIGATAATC TGCAATCGCA GCTGACCAAT TTTATGAAT'r TA'rCCACCTC ATCAAAGACA TGGCrAACAT GAGACCAGAT CT'rCTCCAGG GTTGGTTGAA 625 CTTCC.AGCAT CCCAGCrTT CCGTGAACAT CTCCAATTAC TTCTCCCTGT TTCTCAACAA TTrCTTGCr TGCGTCAGGG CCTCC.AACA TCTTGGCAAC GAAACCGTTG AATGGTCATT ATTACTrGTG GCAAA'rGGGA TTCTGAGCAA TAGCTTGAGC ATGCTAGTCT TGCCTTCT'T AATTCCCCTC CAGAAGCAAC ATATAAAACT CAACCATT TGAAACTGGG CTTTTTCCAT TGCTGCCCA AATTATGACG TCCAACTTCT TAAGCTCTGC
TTCCCTTACT
GTTTAATCTC
CCAAArrGAC
GATTATTGCC
CATCAACAGT
CCTGCATCAG
TAATGTCTrC
GATCATACTC
AAAAATCTTC
AAAACGAACC
AGCTTCGAGT
AAGATTGACT
TTCCTCCACT CGCTCTTCGA ACTAATCTTC TCAATAAAGA GATAGCCAAA ACCTGACCAT AACACGACCT GAAACTCCCC ACGTGAAAAG GCAGACTTAA CTTAACCAAG GGTTTAAAGT ATT'rCCCTCA CGACTGAATT ATAAAGATCT TGCZAGTCN' AGCAGAAGCA AGTTGACCTG T'rCCATGTCC TCAGACGAAA GGCAAAATAA AGCAAAACAT GAGG'rCCAAA CGATTCTCAA GA'rAGCTTCC AAACGTTTGC TGAAATTTCA CGGTATTCAG TGTCAAGAGA TTGTATTCTr CCGTAATCTT CCC-ACCATAC 'N'ACGAGTAA TAGTATGAAG GCGATTGCCA TCAAAATCAA GGTCCTCAAT TAAAACATAG TAGGTCTCAG ACAGATAGCT 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4960 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 TTCGACACTT TCCATGTCAT TCATAGCTGA ACGAACATTG GCCAGACTTG ATTGTCCAAC ATACTGTAGG CATTGGTCAG TGTATCCGCA ATA'IrTTTGT 0 o* GGTTGAGGAG 'rTTATCTCGC TCTTGATTGA GAGCCAAGTC TTCTCCAGCC TGCAAGTTTG CTGCCTCAAT CTCTGCCATT TGAAATTCCA ACATTTCGAT ACGTGCCTTG TGTTCCTGTT GGTrTTTCTT GACTTCCAGA ACCTGCTGC GCATTTTCCG ATAGGCATCA AAACTCGTTr GATAGGTTTC TTTCA.AGTCC CAAAAAGCGG CATCACCAAA TTCATCCAAC ATCTGGATAT GCAGTTGGGG ACCCATTAAC TCCTCATGGT CATGCTGACC ATGAATATCT ACAAGATGTT GCCCAATAGC TCGCAAAACA GACAGATT-AA CCATCTGACC ATTTACACGG CTGATACTAC GACCATTTTG CAAGATTTCC CGACGGATGA TAATTTCATC ACC'rAATTCT AAACCrrGCT CATCAAAAAT TTCCTGTAAA AGACCACTAT TCTCAACTGA GAAAAGCCCC TCAATCTCTG CCTTTGGTGC ACCATGACCA ATAACATCTG TCGTCGCACG AGCTCCCAAC ATCATATTCA TGGCATCAAT GATAATCGAC TTCCCTGCAC CCGTTTCACC AGTCAGGACA GTCATCCCCT T??CAAAATT GAGGGAAATA GCCTCAATAA TGGCAAAGTT TTTTATCGAA ATTTCAAGTA 626
CAAAGATTTC
ACATATAGAC CTACCAATT TACTTGTT CTCTGCTAGA CTTCCACTTC 5820 TGGCAATGAC TAAAATCGAG TCTCGATTAA CTGAGCTT TCTTATCCAT CAATTCAGCC
TTGATTGG
ACTCTT'rGAT
GTTCTACAAT
CAATTCGTAG
ATCTCGGGAT
TTCTTCTTGC
CTATCATCAG
ACAAAAGCCG
GATTCGATAT
ACATAGGTGT
ACCGTCGCCT
GTGCCGATTT
ATTTTTA-AAT
GCTTGCTGAC
GATGGGAGAA
CAAGTCTCTC TTTrATTC ATCTCAGCAA GAATCTGATT ATATITrCCAT GTCCACCTTG TCAAACAGCI' AAAAATCTTG TCTGCAAAAG TATTTCCTGG AATAACTTGG AGATTGATCA TGTCTTCAGC CAGTTGCAGA CN'TTACGA TGTCTCTCAA AGGAATTTTG ACAATACCTA GAGTGGCAGT GATACCTGCT TCTTTCAAAT GATAATCTGT CACCA.ATCTT CTAATT7TTT TGACTATGCG CCCTCTCTAC TGCTTCTTTA TrTTCTrT TCAAATACGC TAAAAATTCA AAGTCCAAGC CAAGGACTGA AAAACCTACC ACATTCTGAT GAACCTTAGC ATCTCGAATA GCCTCAAACT GAGGTTTGAC AAGTGCTACC AAGGCTGGCA AAATCAGACT AAGGGAAATG TCCTGCTCGA AATCAGTCTT TTCAGCATAG CGTGGGTCTT GGCGTAATTT CCAAGCCAAC AACTTGGCAC TATTCTGTAG CATGACATCG ATCGTAGTCG CGCCATCCAC CGACAAATCA CCACCACGGC TGACATACTT GAGTTTCTCC ATTTTCTCTC CTGGC1'TGTC AAACCGTTCT 00 0.00.
TCTACTGCCA
ATTCCAT'TTT
ACCTGACCTT
AAACTCACAT
CGGAAATTGA
TGATTGGTAC
6TAAAACCTC
AAGACCTGCA
TAGCTGTTAC AGATTCAAGG TCCCAA'rCTG CTCACGTCCT GATCAGCCAA GACACGGTGC CAATACTGGC AAAGCTCGGC ACTGCTCCAT GCTGACAACT CAACATCGAC TGCAAAGACC CAGTAGAGGC CCCGATATCA AGGCCTrTTC CAGTTTCAAA 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 '7260 7320 7380 7440 7500 7560 CCCTTGAGTT TTAATTCGGT GTCATCTGGA CCATTAAGGA CTGCTACGAC TAGGCCAGCC TCAAACAACC CCTGTTTATA AGCTAGTACA ACTTTCTACT ACACTTACAA TCGATTCTGT TAATTT'ITCA TTrAGCTTGAT CCAGGGTTTG CAACAGGGCA GGATAGGTTG ATTTCTGC
ATCACACCTC
TCCACTCTTT
TTCAAAGGGA
GTTACAAAAG
GCTTGGCCTG CTCTCTCGTT CCTTAGCCAT TGA'IrCTCAA AGCTGCTGGG CAATTTCTTC GCAATGGACT CrflCCAACCC CTGCAGATCC TTTTGAGGTG TCTTGCCGAT
CAAGTCCAAT
TTCCTCAAAA CTAGCTGTCA CATCCAGTAC ATCATCTCTG ACTTGAAAAG -CAATTCACCC ACAGTTTTCA GCT'rCACCTG CATTCAGGT GACAATTCAG CTATAATAGC TCCCGCTTGG AAGGGATAGG CTAGTAACTT CCCAGTCTTA TTGGCATCAA TAGTCTGAAG T'rCTTCCAAA GACAAGTGCT GGTGTTCGCC CTCCATATCC AAAACTTGCC CTGCTACCAT ACCCAGACTA CCTGAAGCAA GGGATAAGTT GGCAATCAAG TCCACCTTAA TCTGACTTGG CAAATCTGCC TGCGCAATCA AGGCATATGA GTCTAAGAAT AAGCATCTC CACCCAAAAT 627 GGCCATAC TCACCGAATT ATCCATAGCA GGAAGGTCAT AGCTACCTGC GCGTrGAGCAG GAGAAAAGGC CGAATACGCT ACTAGAGCCA AACTGCTGGT TTrTTCrTGc TTN-rCATTC AAGGTCTTTT CAGCCTTGTC TGAAAGGCAG TAATCGCATC GTTT'CCAGT'r CTGCTAGATT T'TCTACTTGA CCATCTCGCA CGAATCTACA ACGGACTCTT AGTA'rCCAAC ATGAGCAALAG AACTAACGCC ATTTGGCTAC TTGGATTTTG GTAACAGGTG TTTGTTATCA GAAATCCGAG GCG'rTGCAAA TAACCGTCAT TTTTTCAAA GCCTCTTGTT TTCCTGATTT TGCAAATGAG CGCCGCTGTT GGCGTTGCAG CTCATGCCCC ACACTAGAGA TTCTTCGTTA AAGGCCCAGA ATCCAAATCG TCCCGTTGAT TTCACCTTGA ACCTTGGTCG GG'rCGTGATA ATATCTCGAA -AGAAAATTGG GGCAGAGCTT TCTTGTGATT~ GCTTAACCGC CCTCTTCGAT CGTGAATCAA GCTCCCTGTA TGAATCATCT GTGrATGGT AACCTGCAAG GCTTCCAGAA TGCCACCAGC ATGAATAGAA TAGAGAACAG CTCCATAAAA A'rCTTCCAAA GCCGACTCGA AAAATCACTT TCTGTTCCGT CTTCTTGCAT CAGCGTAGCT TGGAGCTCTT TTGACAAGAC TTCCAGAGCA ATTTCACCAT TI-TCCAAACr TTCCTCAAAT TTCTTTTGT'r T'rGACATCT'r TCAAAAGCGT TACTTGGTCT T'TTTCTTCA CTTTrTTTGAC AATAGCATAA CCACGCGCCA CTTCCGAAAG TCGCTTGGCC TCAGCAACCT CTAAGAGCTT GTCCAACTGT CCTAAACGGT ATAATTGTAC TAATTGATGA GTTCTrGCTT TTCGCAAACT T'rGTT'rCAAA CGCAGTTGCA ACAAGCGCTC AGGTTGTCTA AAGATrAACAG TCTTAGATAG AACATTTCGG ACTGCCGTTA CTAATACATC CAACTTGGTC ACAGGTGTTG
AATCGTCATC
CTAAGGCAGT
CTTCTAACAA
ACTCCCGTAA
CAAGAGCTAA
GACCTTGACC
CATGCCCrI-r
TTGGACAATG
TAACCTCTAA
AACTCTCAAC
CGATTCGGCT
TGGCGTCATA
CTTGATAGCG
GAACTAATTG
GTTGGTCCAA
ACTGACTGCA
CCATCCGTI'
CCAGTTCAGC
7520 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 CGCGTCGATC TGCCACA.AAA TCTGCCAAGG TCACA'rCCCT TAACTGGCAA ACGAGATTCA AAAATAGCTC GTACCACAAT GATCCTCAAT AGAACCACCT CCACGACCAA TAATGAGCAA TAGCACGCGC AATATTCTA GCAATNTCCT CCGCAGCCCC GATAAAGAAG GATGTCAACA CCTGGGAATC GCCTrGCTGAC 1'AACGGCTCC ACTACGGCTG GTTACTACAC CAATTCTCTT GCTTGAAGCG TTCTTGAAAC AGGCCTTCTT CTrGTCAATTT TTTCTTAAGT TGTTCAAACT GAATCGCAAG CGCCCCAACC AATGATGATG GAGTAGCTAC CACTTGGTTC ATAGACCTGT CTTCATTCCT -TCTTCCAGGT CAAACCCTAA TTTCTGATAA TTGAATAACT GCATGGTCAT CCTTTAGGGA GAAATAT'rGG CCATCAGGCT CAGCTTTTTC ACACCCCCAA TCACAT'rGAT ATCCCAGACC AGATGGTCGC TGAGTAGGTC GTTTACGAAA 628 GTTGGAAACT TGACCAGT'rA AATAGACCCG TTCCAAGTAT GGGTCTTTAT CGAATTTCAT TTCAGATAC TTGGTCAAAG TTGTTACCGA TAAATACTI-r TCCATCTCCA CCTACTATT-C ATTTACTrGC TCTTTCATGG GTATTATTAT ACCAAAAATA TGCCTAAAAA TCTCCATTTA TGTACCATTA TGAGGGAAAA ATAGAAAAAG GAGGCAAGGC CTCCACATGT GATTATTTGC TGTTTCGAGC 'N'CTTCCAAA ATCTTTGCAA. 
TCTTGGTCGT CA.ACAGGTCG ATAGCCACGG TATTGCTAAC CCCTTCAGGA ATGACGATAT CAGCATAACG CTTAGTTGAC TCGATWACT GGTGGTACAT TGGTTTGACC ACACCTAAGT ACTGGT'rAAT AACGCTATCA AGGCTACGGC CACGCTCCTC CATATCACGC TTGATACGAC GAATAATGCG CACATCGTCA TCCGTATCCA CAAAAATCTr GATATCCATC AAATCGCGCA GACGCTTGTC CTCCAAGACC AAAATACCCT CAACGATAAA GACATCTTGA GGTTCCTGAC GATAGGTCTTr GCTACTCCGT GTATGCTCTG TATAGTCGTA GGTCGGGATG TCCACCGGAC GCCCTGCCA.A CAA'rTCCTTA ATCTGCTCGA TCATCAAGTC TGTATCAAA G GCAAAAGGAT GGTCATAGTT GGTTTTGACG G INFORMATION FOR SEQ ID NO: 77: SEQUENCE CHARACTERISTICS: LENGTH: 5365 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10011 CGTGTGGTCT TAAAAATAGA AGACAAAGAA CAAACTGTTG TCAGCCCAAG AAAAAACCAA AACAGCTCAA GTTGTGGCTA TTGAACGGTG ACTTGGTTGC TCCAAGTGTT AAAACTGGAG CACGCAGGTC TTGATGTCAA AGATGGCGAT GAAAAGTACA TTTGGCAATC ATTGAGGAAT AGAAGGAGAA AGTAAGTATG ATCAGATdCC CGTTCAGCCA TGGTTCGTGG TGTCGATATC AACCTTGGGA CCAAAAGGTC GCAATGTCGT TCTTGAAAAG -TACCAATGAC GGTGTGACCA TTGCCAAAGA AATCGAATTG GGGTGCTAAG TTAGTATCAG AAGTAGCTTC TAAAACCAAT TACGACTGCA ACAGTCTTGA CCCAAGCTAT CGTCCGTGAA AGGTGCAAAT CCAATCGGTA TTCGTCGTGG GATTGAAACA AGCTTTGAAA AACA.ACGCCA TCCCTGTTGC CAATAA.AGAA GAGGCTTTGT CCTTGCAGGC CTGGACAAGG TGTTCGTACC ATCGTGTCTT AGTTGAAGCC TCATCGTAGG CGAcTAACAT TCAAAAGAAA TTAAATTTTC CTTGCAGACA CTGTTAAAGT TCATTCGG7TT CACCCTTGAT GAAGACCATT TTGAAAATAT GATATCGCAG GTGACGGAAC GGAATCAAAA ACGTCACAGC GCAGTTGCCG CAGCAGTTGA GCTATCGCTC AAGTTGCAGC 629 CGTATCTTCT CGTTCTGAAA AAGTTGGTGA GTACATCrCT GAAGCAATGG AAAAAGTTGG CAAAGACGGT GTCATCACCA TCGAAGAGTC ACGTGGTATG GAAACAGAGC TTGAAaGrCGT AGAAGGAATG CAGTTTGACC GTGGTTACCT TTCACAGTAC ATGGTGACAG ATAGCGAAAA AATGGTGGCT GACCTTGAAA ATCCGTACAT TTGATTACA GACAAGAAAA TTTCCAATAT CCAAGAAATC TTGCCACTTT TGGAAAGCAT TCTrCCAAAGC AATCGTCCAC TCTTGATTAT TGCGGATGAT GTGGATGGCG AGGCTC'IT1CC AACTCTTGTT TGAACAAGA TTCCTGGAAC CTTCAACGTA GTAGCAGTCA AGGCACCTGG TTrGGTGAC CGTCGCAAAG CCATGCTTGA 
AGATATCGCC ATCTTAACAG AGATGCGACA ATTGAAGCTC GGTTATTGTA GAAGGTGCAG GTC'rCAAATC GAAACTACAA CAAATT'GTCA GGTGGTGTAG AGAAATGAAA CTCCGCATTG TATTGTTGCA GGTGGTGGAA ATTGACAGGA GATGAAGCAA TCGTCAAATT GCTCACAATC TGCTGAGCTT GGTATAGGAT AGGTATCAT'r GA'rCCAGTTA CAGCTTGArIT TTGACAACAG TCCAGCAA'rC GA'rCCAAGCA GCGGAACAGT TATCACAGAA GACCTTGGTC TTGGTCAAGC AGCGAGAGTG ACCGTGGACA GAAATCCTGA AGCGA'I-I-CT CACCGTGTTG CTTCTGAATT TGACCGTGAA AAATTGCAAG CGGTTATTAA GGTTGGAGCC GCAACTGAAA
TTGAGTTGAA
AAGATAGCAC
CGGTTATCAA
AACGCTTGGC
CTGAGTTGAA
AAGATGCCCT CAACGCTACT CGTCCAGCTG TTGAAGAAGG CAGCTCTTGC CAATGTGATT CCAGCTGTTG CTACCTTGCA CAGGACGTAA TATTGTTCTC CGTGCI-rTGG AAGAACCCGT CAGGATTTGA AGGATCTATC GTTATCGATC G'rTTGAAAAA TTAACGCAGC AACTGGCGAG TGGGTTAACA TGATTGATCA AAGTGAIGTCG TTCAGCCCTA CAAAATGCAG CATCTGTAC 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 AAGCAGTCGT1
TGATGGGCGG
AGCCAATAAA CCAGAACCAG GATGATGTAA GCTTTCTATA TATAAAAAAC ACAAAAGGAG GTAGTGGGTT GAAGTCAGCT GTPCAAAGCG7ATAAAAATCC GCGCTTGATA AGTTTGATGA TTGAAGGGCG TTGATAA'rCT AATrAGGATGA ACTTrGCTTAA ATTCTGAAAG TGAAACAGCA ATAGCTCAAA AGCTTGTCTA ATAAAATCGC TTATCACTCA GGAATGACTA ACCCTTCTI-r TTATAGGCTC AAGCTCGAGA AAGGACAAAT TTCGTCCTTT GTTTTTTGAA GTTTTCAAAG TTTCGAAAAC GATTAT'rGGT CGCTTCCGGT TTGGCGTTAG
TAGCCCCAGC
GAAAACAACT
TTTGTCAACT
CT'rTTTTGAT
CAAAGGCATT
AATAGTGTAG
CAGTCTGAAA G'rTTTAAAGA TTTCTTTATC TT'rGAGGAAG GATTGTCCTC AATAAGTCCG AAAAAITrTCT CCGGTTCCTT' AGAGTTGATA GAGCTGATAG TGATGTTTCA AOTCTTGTGA AAATCTCT'r' ATTGGTAAA TGCATACGAA AAGTAGGACG GTTTACGGCT ATCCTGTTG'r ATGAGC'rTCC AGTAGCGCTT
GATAGCCTTG
ACTCATAGCA
CGGGAGTGAA
TGTTTAGCCA
630 TATTCATGGG ATTNTCGATC CAATTGTC CGOCTAAGAT GTTGTACAAT G2'CAAAGCGA ACAGTCTGGG AGACTGTTTC AGCCTGAGCC AGTCATAGTA AGGACTAAAC ATATCCATCG A'rAAT'N'GAA CACGCACACG TCCAACACGA 'rTTTAGCATT 'rAGAAATTTG AAAGCGAAGC TAATGATTTT CACT'rGACAA CAC 'GTGT TCTGCCTTCA CGAACGGCTC TATCGTAGCG AG3AACAGTGA TAATATTAAG GTGAAGGCAT ACTCATCCCA TGAAAGTCAT TGAGCTTGCG TCAGTCA'rAG AAATTTTTTC ATTTGGTGAT 'rTTTCTTTAC CACTTGAAAC GACGC~rrCT GGAATTTTAG AAGGTT'rTG GATGGGGCGT CGTAGTCCAC TCTAAAATCT GGATAI-rAGG TCCATATGAA TCTTTCTAAT TTTTTTCTAC AACAAAATAG AAGAAAGTGA TTTCGGATGA
ATTATCAAAA
AGACATAATC
AATGACAGTT
AATTAACT
TCTTGCGCAA TGAAACTCAT CT'rTCCCTTA T'ITCGAAGCC GAGAAAAATC ATGCTCAAAG GAAGTTGAAA TGGCCAGCTG ATGGGCAATA TGAGCAATCT TTTGGTTGAT GATACGAGGG CAGGGGAG'rC TCAGCAACCA TCATTTTTGA ACAcrrGATAG AAGGAGAATT CTAGAAGGCA TACCAGTCGT TTCAAGATAA AAAGTCATAT TTCTTCAATT GGTTTCCGCA CTCAGGGCAA TTTGGCGATG ATTTCCTrGT GTGTATCCTT ATTGATGATG 0 *000
GTCTTTAATA
GAGTTGTTTT
GCTCCATAAT
TCGAGCAGTT
GTCGCTTTTC
ATCTATAAGG
TATAGAGCCG AAXATTCACA TCTAATATAT GCAGACTACT TATr'AAAGG ATGACACAAA AGTI'TTTGAA AAATCTACAT AAAATATACC TGACAGAATC 'rAAAGAATCT GGAATTAAAC TAT'ITTGAGT TTATTGAATC TAAAAGTATT GCTTTATATT CTGATAGATT AAATAGCATT TTCTCTGTTG AGATATTGTT GATTGATGCT ATGTGGAAAT ACAAAAAAAT GTTTTTGATA TATACTAATC ATTTTCGTAT 'TTTrGTATT AAACGATATA GGAATAAAGA CATTAAAAAA TAACAGTATA TCTATTTGTT GCATAAATCT CTTTCTAGTA ATGTGTTGTA ACTCTGCTAT TGT'rTACACA ATTTATTrTA TAGTACCAAA AAAGGTCAGG CAACTTTACC GATTCTTTAG TTCTACATAG CGCTTGTACC GAGAAAGGAC CACGTCCATT GTTAATCCAA TCAACAAGAA ATATAGTCCA AGTCATCAGA ATAATTCATT TTGCGTTTGT TCCAAGAGAC GTTTTCCCC ATCTGWAAAA ATTTTAACAT TTGTGATAAA ATGTAATTGT ATTATAGGTC ATATGGGACT GATTrAcCCCA CTACAAATAT 'rrGAAATGAA ATTAA AAA AA TCAAATT'rGT AGAAGGATAT AAATGGACAA TGTCATAAAA TTCAAAAACG ATTAAATGAG 'PTTAAAATAT TGTACTAAAT CGAAGTTGAC CTGTAT'TTTT AGTT'rGTTGT AAACTTACAA TTATATATr TACGAATrCT AATAGATTTA TrCCTrTTTG ATTTTGTTCC TGACCTTTGA AAATGTTTAC ATAGGCTTCT 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3960 4020 4080 4140 4200 4260
TTTGACATG
GACGC1'CGTA
CCAAATCGTA
TTCTTTTAAA
CTCTTCAACG
ATCAATATAC
TTCAGTGCF CTTCATCCAG TTATCACGAA TCATGGCAAT GG'N'CTCGAG TGACCCAACG CCAATAATGG CGTFTCTGT CTCCCATCAT GCTTATAACT ACTAACCAAC TTCTACAAT AAAATCTGTG AAmTCACTr CGAAAAATCG TAGCCCTTTC TTrCATACTTT CGGGCATACT AGTCGTTTCT TCATCAACTT AGAGAAGCCC TTGTTAGTCA TTTTCCCTCA TATTTTT'rCA AAAATCATTC AAGATTTCTT AGTCAGTACA TAAGGTCCCT GTACTGGCTA TCATTrAATCC ATAGrAGGGG CTAGCCATA'r TGCAATAGTA AGAAGTrCCA GATATTAAAC CAATATTCT TGTGAAAGTA AACAATAGCC ACGACCATCA CTTrCGGTAA CAAGTGTATG ATCGTTGACA TGTTT"1AGT ACCATGGTGT CCCGCCAAGT TCGGTGGAGA TTGAATrGTA ATAAAGTCGC CTrCTTTT-GG AAGCTTCATA TrATAAGTrT ATCATrTACT ATrGTACCAT AAAATTACCC GGAAATATTA AAGATATTCT CTAAGAGCGC TTGCTATATC GTGCTAAAAC T'rGAGTTAAA CGCTIGCTTCA GTTCGTATCC
TAGTATATTG CTTATCAAGT GACTATCCAA 1"rCGTCAAAG AGTTCTGGAT AATCTTATCT ATAGTTTA'PT GGCTACACGT CTATAGTAGA TrTTGAAATT TGTCTCCTGA AAGTTGATTG ACTTCTCTTC TTTkAGATTA TCCTrGAAGA TGAGTTCCTG GCAATTTTAG CATCAAAATA TGCAGGGCAC GAGCTGGAAG TGAGCAACTT CCGAAAAATC CCTTTTTGTG CTAATT'rCrG GCATrGATGA TAGCATAAGC GCAATGACTr GAGAAACGAT 'rCTTTTT-CAG TACGTGCTr'r AATTGAGCAA AGTCTTGAAT AACATAAAAC GAACAA'N'GT 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5365 GTTTTCATTA ATATCATATT TTTTCAGATA. TrCTCTGACC AAAGGATAAG TGGTAGAGGG CCAGATTCTT ACCATAAGAA CrCTTTCAAT TCCTCTTCGC TTATCACCTT ATCTCTCGAT GTCTTCGGTG ATATAGCATT TGTCG INFORMATION FOR SEQ ID NO: 78: SEQUENCE CHARACTERISTICS: LENGTH: 3636 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: TTTCCAGAAA GAAGTTGAGT AAAGTCTTTA TCAAAGAGAA TGACTI'CCGT ATTGGAACTG ACATI'AGGTT TTAT?1'CTAC TTTACTAGCG TCCCCCCTAG CATTTCTAA ATCTTTAATC TCTTCTGTrG CCCTATTTAT AGCCAGCTGA ATAACTGCTT GAGGATTTI'C ACTCAGTCCA TGAAGCrI'AT CGTCCACCGA AGTATAAAGA CTCGAATGCA TGACTTGTAA AATAATCAGA
632 GTCATTGTAG AAAAAATCAG GGTGAAGACA CCGAAGTTGC GGATAAAATA ACTAAAGTCA TCCGCATACC ATGTT'rTT AAGTTTACTG AACATCTT' AAAAGATACC CAACACTACG CAAAGTTTGC AAATTCTCTG CAAAAGTGGT TCCCTTTAAT TTCTTACGGA C1'TTTGAAAC ATAGACTTCG ACAACCGAAA TCCTTGTATC ACTATCAAA'r CCCCATAGAC GGTCAAAAAT CTGCGTCTTA GGCAAAATCA CAP'N'GATT TrGAAGGAAA TAAACTAGTA AATCGAACTC TTCCCCAGC AATTCCACAG GAGTATCTTC AACTTTAACG GTArrGGTTG ATAAATTAAC CACGATATTC CCATAAGTCA AGCTGTTTTC ATTAAACTTC CCTGAACGTT TGAGAAGGGC CTGAATCCGC ATTTTAAGTT CT'rC'AGGTA GAAAGGTTTG GTCAGATAAT CATCCGCTCC CAGT'rCAAAT CCATGTCCCT TGTCATCCAA ACTTTCCTTG GCAGTCATAA TCAGAACTGG TGTCGTAATT CCCTT'r'ICAC GCAATTCTTTr TAAGACTTGG AAACCATTT'r TTTCTGGCAA CATCAAATCC AGCAAAATCA AGTCATAGAC ACCACTCTCA GCTTCGTAGA GACCTTCTTC TCCA'rCAAAT ACCTGCATAA CATCCGCAAA ATCGTCTAAA AAGTCAAATA CTGAATTTGA CAGACCTAGG TCATCCTCAA ACTATTATAC CAAAT'rTGCC TGAGTT'N'CT TTTTATTTTA GACTGCAGCT TTT'TCACGGC ACCGATGTrA CGGCTAAGjAG CGCCAAGCGT TGCTGAGTCT CATATCTCCT CCAAGGGCTC TGTTTTCTGA GTTGCTTGAT TT'rGACCACA AATTTGTCCT ACTGTATGCA AGAGCTGCTG AGTATCAAGA TAAAGTGCTA AACTTGTGAG CGAATAGCTG TTGGCTAGCG ACTTGACTAG CCAATAAGAT TTTTATCATG AGAAACTCCT CCTTATTAAA TTAAAAAAAA CTCAACTCTC TGCATTT'rAC ATGAGATAC GGCT'rATTTA TGCAT'rTCCG TATTGAAGAA CAACTGCT'rC
TAATCAAGTC
CAAGGTCAGA
TAAATACATG
CCTTAATCCA
GAGCTAGGAA
TCCAAGAAGT
CCAACTGGCG
ACAAGTCTTT
AACACGCGCT GCAATTTCCT TGATTCCCAT AAGTTGCGGT TCAAAGAACT CCTTGTATTC AGCAGGAAGG ATAACAAAGC TATCAAAGCT ACCCCAGTTT TCACGCGCCC AAGACCAAGC TTGGTAATAC CAAGCAGACA AGTCCTGTGG AATCAGGTTT TGGATATTAT CCGCATCTGT TTTAAAGACA GCATCTGTTG CGTGAGTATA AGTCTCATGA TGTTTCATCT CATTAATCAG 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 CTGGGAGTCC TGCAAGATTC CTTCTGCATC ATTTGAGCGA qIGACGAACC
ATAGTTATGA
TTCATCAATA
AATTCATCCT
CGAGCCAATT
AAGCGCTCAA
CATCTGATTC
TAGCAACCAG
GGGCTGAAAT
TCCGTCTTTA
TCCTTTGAAG
CACTTGAGAA
TCCTTGTGTG
ATCATCATCG
GCTTCAAPLAC
GCTGTTTCAG
ACAGCTGAAA
TTGCGAAGAT
AAACAGCCAG
CAAGACGGTC
CATCCGTTCC
CCACCAGATA
AAATGTGCCC
TATCAAGTGT
AGACTCTrCC TTAGCAAGTT TATCAAGAAC TGCCTCAGCC AACAAACGAC CTTCTTGAAC TGGA.AGCAAG TCTGCATAAG AATTTGCAGT 'rTGCTTGTGT CTCTAGCTCA GCAAGAACAG
AGTATTTTCA
AGGGATTTCG
GGGCACCACC
AATCTTCAAG
CCAAGAA'rCC
GTGTTGAGAC
ATACTTTCAG
CAGAGACOGT
ACATCATTTT
ATGAAGGCTG
633 CTGCTAACAA GTCTCCTTGA. TAGTCGGTAA GAAGAGCTCC TI'CATTTTCA GCAAGAAGAG TTCGAGTGT ATCAGGCAAG CCWTCCAGT TCTTGTCTTC GTTCTCACCG ATGAAGAATT CAACTTTAAC AGTAAGAACT GGGTAACCAG CGACATCACG TCCTGACGCT TGACCAAGGG TGTATTGGTG TTTT'rCA.AAG TAGGCGTGCA AACGGCGAAG CATGTGCATG AGACGGCTTC GTGTATTGAT TTCATCTGGA TGTTTAACTT GI'CACTACCA ATGGTGTTGC AAAATCAGCA TCTCCTAGCC GACGATAGCG CCGTCAAAGA AGACTGAACG CCATCAGTAG ATCTTCAAAG ATATTCCAGC AGCGAAACTT TCATTGAGCC CCATTGGTGA GCCAATTCAT AGAGTTCTCA TCGACAACCA AGCACCAGCT GAGAAGTCAG TCCATAGTAA TCTTCGTAAA.
AT'TTGAAAGT GGATGTGCTT AGCGGTCACC CCTTGCAAAT TGTTGTCTCA AACTTCCAGA GTTTGACAAG GCCAAT'rCAC TTTGGCTTCA OGCTCATCCA AGTAGACAAG ACCTCCT'rCT CATGTTGTCT GTAATNTTAC CAATTCGATA TGAAGGGCTT AACTTCTACA GAGGTGATTT TTGACCAGTG ATGGTCACTT TAAATCATAA TGTTCAGGAA
CGTCACGTTC
TTGGTTCGAT
AAGAGCA-AGA GGTACTCCAC CGTATCCACA CAGACGTATT AAAGGTCATC CCACCATTTC GGGCCACAAC AAGGGCAACT AGTAAACTTC ACGGTAGGTC GAAGGGCGAT GTGGAGAGAT ACTCGATAGA GCGAACAGCG TGGTTGAGTA GACACCTACC CACCAGCAAC AAAGGCCAAC TACCTGTTTC CTTACGGTTT CTTCTGCTTG GTCAAAGC6A
ATAGTCACGA
TGT'rGACGGC
ACAAGACCCC
TGAGGAATTG
.ATATCCAGTG
AGGGTACCA'r
AAGTAAGAAG
TATAGTGGGC
CTGCGTAGCC
TGCTATTGAG
GTTTTTGTGA
GCTGTTCCAA
CATCCCAAAG
AACCTTTAGC
CTTTGGCATA
CGACGTGGAC
CTGTTTGGAA
CCATCATATT
GGTTCCCAAA
TAGCAAATGT
AGTTTTCCAT
GGTACTTAAC
AGAAATCAAG
TTTTAGTTTT
ACATGCGAGG
2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 300.0 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3636 TCAACATCGA TTTCTGGCAT AGAGAGAGGT CAAAAGTTGC CACATGGGAA AGcTI'CGCGC GCAAA.ATGGC TCTCGA.ACTG TGACTCCATC AACTGTATAA TAAGAAGGGT AAATCCCTGT
CAGAAAAGGC
CATTGTCATG
CCAAATCTTT
TCCCAGAAAA
CAAATTGCTT
AAGAACCAAT
GTCAACTGTA
TTGGTGGAGG
AGTCTTGGTC
AATGGG
TCAACTTGAC
AATGGACGAG
GAGATGCGGT
CAGCCTCAGC
CTTGACCTGC
CACTCTGTGC
TCACGACTCA AATCTAAAAA
INFORMATION FOR SEQ ID NO: 79:
SEQUENCE CHARACTERISTICS:
LENGTH: 5066 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
ATAGCGTGTA ATAATCGATT TTAGAGGTAC CATAAGCCAC CTCCTACAAA TATAAATCAA TGCCTTCCAC CCTTAGACTT CCCTAGTTCC 'rGTCTCAAGC TTTGAAACAG GAATAAGTTA ACCAATTCAT ACCAATAGCT AGCAGAATAA AATGCCCCAT AACTTGATAT CTGTCACATT TCTCAAGACG GTATTGAAAA AACAACTGTC CAAGCAAGGC TAAAAAGAGA ATAGAAGGGG ATGTAAAACC
TAGAAACCGA
GAAACATTTC
AAAGAAACCA
ACAGAACTGA
AGTAAAAATA
ATAAAAAATT GGAA-AAAACT TACTATTTCT GTTGGCCTTT AAAGTACGGT GCTAAAAGTA AGAATT'rAAA CAAATGTTCC ?TTGATAGC GTTTTCTATT ATTTTATTAT ATCAAAAAAA TTCTACTTTT TrATTTGCGT TTTCTTGCGA TGAGATGAAT AGGCCTTGCG GATTTOATTT TCCAAGAAAC GCAGGTAAGA CATTGACAAA GATGACAAAG GTTGGTGGTT TGGTTGCCAC TGAGACGTTT TCCTTTGTCT GTCGGTGTTG GGTTGATGGC CGTTCAAGAC AGCTGATGGA ATACGTGTA'I TTTGACTTTC CAGGAAGTTT GTGGAGACGT TGCTTGGTTA AAGCTGATAC GCAGGTATTG GAACTGCTCA CGGATATCTT CTTCCCAGTT
TCCGGAACTG
CGGTGTTCCC
AAAGTGCATG
TTGGGTCGCA
AATGGCATCC
GCTGATTTGC
AAAGATAATC
TT'I'CATAGTG
TCATTCCAGA
TCAAAAACAA
AGTTCITrCTT
TAGAAAATCT
ATGATGACAT
rTAATCATCT
GGTGCGTAAG
TGGTTATCTT
TCATGGGCAA
ACCATCAAGA
TCAGTAT'TT
TCTTGACCAT
GGACTAGCAA
TTAGGACGAC
TCAATCCAGT TATCAAAATA ATCACCGACA TCCCCCCTTC TTTCAAGCGT ATCCCACTTG TTGACCACGA TAATCATCCC TTTACCAGCT ATCCTGCGAT ACGCTT'GTCG TACTCACGAA TGCCTTCTTC CGCATTGATG CCACATCTGA ACGGTCAATA GCACGCATGG CATAAACCTT ACCAGACTTA CGCATACCAG CTGTATCTGT AAAGTGGGTA TC.AA'rGGCAT TAACACGGTC TTCTCCCAAG ATAGCATTGA CACGCATAAC AGAGTATTTC CCGTATCAAT CATGGTAAAC CACGAGTTGT TCCAGCAACA TCAAGCTTGA TTTTCCAACG
CAATCAAGCT
CTACGATCGC
GTTCACCCAA
CCTTGTTGAC
CGTCTGCATC
AAACTTAATG ACATCTGGAT ATCTAGCACA TCCCCTGTAC ACCGAGAGCA TAGAAATCAT TOCGAGGATA ACTGGTTTGT AGTAATTCCT TCCTTACCAG TTrTCTrTCCTC ATATTCATTT GGAAGATTTr CGATTCCATG GACAGATGAG ATAGGCAATG ATATATCATT TCTCATCTCA GGGTTGTCCA GGGTCTTATA AAGCTTACGA GCTACGTATT ACACGACAAA AACGATAACA rC'TGCTrCTT 1320 1380 1440 1500 1560 CCATGGCAAT TTCTGCCTGG TGCTTGATTT TTCCTCCTGT ATCAATCATG CTAAAAGAAC GGTCACGTGT CACrCCTTCG ACATCTTCTA GTTCCATGAA AGGAGCATCG ACATCA'rCAA GATTGAGCCA CI'CACCCGTT GCATAAATAc CAATGGACGAT TCGC'TCACCA GCGATCCGAT TAAATAGGG'r
TAATTI'CTCA
TCAGCTTGAC
GCATAGTCAG
TCT'rCCTCAA
GGCACACCCA
GCCACTTCTT
TCAGCTGCCA
TGATTTCCCA ACATTGGGAC GTCCTACAAT CT'rTCTACAA TAATTTCTTC TGTTCAAGAT CAAACTGTTC TGCTACGCGC TGACTCCAGC CCTGAACACG GTCATAAGCT TGGA?1'GCCT AGACAACATT CTCTAGTGGC AGTCTCGGTT GTGCCATCCC AAAGACAGAA TAGGTCTAGT CAGACTTGTA TCGAACCAAA CCGA'rAATCA ACAAGGCGTT TTGTCCAGCA AGAGCTGCAT GGCAATAGTT GGTAGGGCCA TTq'TTCTAGT TGAGCTTGGT T'rCTGGTCGc
CAGTTGACTG
'rCATATCATG
CAGGTAGGTT
CACCACCATA
CGACCGAACT
CCTT-CCACAC
T'rCAAATCTC
AATTCATACA
AAATTCTTCC
GGAATTTCTT
TCTGTCATCG
CCATGACCCG
TCTCCAAATC
GATAT'rGGAG
CAATATCATC
TGCCATCACT
ATGAAACACC
CCACACCAAA
APGCTAGTAG
AACTAGCTTT
CTTCTTTTTC
GATTCTTAAC
CTTGGGGTTG GAAGGTGTCG GTATGAAGTC GGGCTCCCTT CGACAAAGAG AAGGAAAACA GCAGACTGGC GAATGGC'rlC AGGCATCTTT CTTCTCTTGA CTTCGTACCA CAATCACAGA AAGATGATGC CATCTGGGCT GCTGTCAAAA TCTCATTTAA GCTCTTTAAA CCTGCGCACT GAAGTATGAG CCTTCATCAA ACGGTTTACT' CCTTCTAAAC GAG'rCTCCTC AGCCAAATAA TCTGGCTTCC CAGGTTTCGT CATTT'CCATG TTTCACTTTC TTCAAAGTTG AAGTTGGATG TGAAAAAGGT CGGTAAATTT AATGACCTGC AGGATTTCGT CACCCACCCA AACGGTTGAT
ACGCGCCCCA
TTCTrGGTAT
ATGTTGATTT
AAAGAGCTC'r
GCCCAAGC'TT
AATCAAGAGA
TTCTGCTCGG
TTGAGGTACC
GTAGGATTGG
GTCTACTTGG
TTTAATGGTT
CGGATGCGTT
GCAAAATGCT
TCCTGCATCC
TGCTCGGCGC
1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300
TAAAATCACG
GATAGCATTT
ACGTTTTTCT
GTCTCCA'rAT
CTTTTCAAAA
CTTGAGACTG
PAGCCGCGATT
CAAAATCGGC
ACCTCAGACA
TTGACATCAA
GATAAATCAT
GCTTAATCTC ATCCACCAAG GTCrrAACAT TGACAAAGCT AGGATAGTGG AGGAGAGTTG GAGCTAAGGC CCCCACCATG APLACTTTTAC AAGTAAAGAC CTTTTCGAAT AGCTGGATAT TGCTCCACGA ACTGGTAAGC GCCCCAAA'rC ATCCAAGTCA ACTTGAGCCA GCTGGTAGAT TGAT'rAACTT GAGACGGTTC TTAATAGCCG AGCTCAGGAG TTTCTTCATA TGAAACATCT GCATAACCAT TTGTAGCCTT TGGCAATATA ATCCGTATCC CCACGGAGAA ACTTGTCACG CTCGGTGATG TACTGAT'rAA ACTTGGAGAT ACTGCGATTT AATTCCTTTG 636 GAGTTAACGA TTCTTGCTGG ATAAAGGCCC CAACATCAGG GTCCTTCATG ATTCTGGA ccAAATCTTG ATAATAAAAA cGCTGGGTr GAcGTTTGAG TAcGTCTCCG ACACTTTCCA TCTAATCTCC TCCTTrTTCT AATCGAGCTA ATACTTCTTG CTCTTACGT TCTAGTTCCA GACGAGTTTrC CTCGCTGGTT TCATTCTTAT ATTCAGGATT ?TTTTCTGG GGCAGTCTGA TTCTGTTTTT GTGTTTTTGC TTCGTAAAAC CGCCTCTTCT TCATGGCATA TT'rCTCATTG TAATATTGAT GACTTCGTCC CTGTTTGGGT AATGGTTCCC TITTACTTTT AGCTTCTTTG GTT'rTTGAGC AATT??CA ACTTGGCCAA TTGATAGGTT GGTTITAAGAC ATCGGACTGC GAAAATGTTC CAAGTCAAAA TGCCTAGT'rC TGCCAATTCT GCCGAATGAA TCTTTTGATA ATATTTGCCG AATCCACCTT AGTAAGCCCA AGCCAGCCAT 'rTGCGTGTTT
ATAATGGTCG
CGCATGCGT'r
TCAAACCAAG
TCATCCGCAA
TCATTGGCCA
GCT'rGATT~TC C'rrCCTTAAG
TGGTTGAAAT
TCCATT'rCTT
AACGAAGTCC
CTTTCTTCTT-
ACTCCATTTA GGAACA'N'GG 7TTCGCCCT CGATCACGAA GGCATAGTCA TTGGCTACCT ATTAAAGGTC AATAAGAG;A CTGTTGCAAG AGTTCTCTTT TGCTAAGAAC TGCAGGGCAG ACTAAAGTCA GAGGAAACTG AACCTGCGAA ACAGCTGTTG CTCCTCGGCA ATAGCAAAGA ATCTCGAGCC ATCAGCTGGC GAGACCAAGG TCTTCTTGAC GGAAAGACTT GAT'rGAGTGA GACAGGTATT TCTTCACCAT CAGCACTTTC AACTTTCAAA TCCTCCACAG CTACATCGCC AATCTTTTTC TCTAAGAGTC TGCGAT'-AAAC AGGA~tGCCCC AAGAAGTCT GACTAGATAG AGGAGCATGG AGGGCTAGCT GATAAACATC ACCCTTTTGA TAGAGGGTCA AGAGATTAAA AGCAGATAAG ATTTTCAATG ATTTT-ATCAG TCTATCCATC CCAAAGTTGA GATGG'rTGAG AATGCTTGAA AAAAGATATT CCTTTCTACC ATTATCCCAA AAACTGATTG TATAAAGATA AAGGCTCAGT GCCTCCTGAC 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5066 CGATAATCGG GAGGTAGCAC TGTACCAGAG ATGAGGTATC TTAGATAAGA AAAACGGTCA ATTGGCTTCA TTTATCTTTC GGGTGATTTG TTGGAGCAAG CTCTCTAACT CACTGACATC TAGCAAAACG TACATAGGTA ATCTCGTCCA ATTCAGCCAA CAATGTCCTC ACTTTGAAT'r TCA7rr'rCAT TTCGACCACG TGACTACCAT GTTGA?'N'CA TCACTTGACA CAGGACGTTT CATTAAAGAT T'rTATCTCTG GAGAATTGTT CCCGTGTGCC AGGTTCTTTC TTCTACTCGT TCGTAGGTT-G TAAAACGGTG GTCTTCTACG AATGGTGTTC CCTTC'rTCTG CTTGGCGACT TAGCCCCACA TT1TTGGACAG GGTACC TTGCGACACC CGATTATTCT CTTTTTCTTT TTAGAGGACT CTTAAAACTA CGATAGACAC CTCCTCCATG ACGAGTGAAC
GAGTTTCTGT
CTGGGCTGAG
ATCTTTTTTA
TTGGCATTCG
ATCGATAACA
TCGATACGAT
CGGATAATCC
ACAACCACTA
TCGCACTCAC
CTTGACT'rGG
INFORMATION FOR SEQ ID NO:
SEQUENCE CHARACTERISTICS:
LENGTH: 9607 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:
CACTTGAAGT ATTTGAAACA GTCGTGTTGG TGGTTCTAAC CACTTGGACT TCGTTGGTTG ACCGTCTTGC AAAAGAAATC GTGAAGATAC TCACCGTATG ATAGGATGCG AAAGCGTTAA TTGCAACCAA TGAGATTCAT GCTATGGAAA ACATCATGCC TGTACTTGAA GTACGTGCAC TACCAAGTCC CAGTTGAAGT TCGTCCAGAA CGTCGTACAA
GTAACAATCG
TTGGATGCTG
GCTGAAGCTA
GAAAGTCCCA
CTI'TTTCTCC
CTCGTCTTCG
CTAACAACAC
ACCCTGCATT
GAGAAAATAG
AGACTTTTAG
AAATAGGAAA
TCCCGTTCCA
0 GATGCTAGGA ACGGTAAGGA TGCAAGGTAA TACAAGGAGT TTTATCTTTT TCACGCAGCA TGGTGAACAC ACAATGCAAG TGGTGCAGCA GTTAAGAAAC CGCACACTrC CGTTGGTAAG GGAATCGAAG CAGGTTGCGG CTTGAGCTCA ACTAAATCAT CTGACGCAGT ATTCGACGAA GCTCACATCG GCTAACTAAC CTATAACTCA GCTTACCATC CGGGTAGGTC CTGCCTATCC AATAAATAGG AGAAACAAAC TCGGTATCAT GGCTCACGTC TPTTAGCCCGG GTTCAAATTA GCTAAATCGA TTAGTATTAG TCGTAAGTTG AAACCAACAA TAGCATGAAA ACATTGAGAA GTTTTTATTA AAATCGTGTT C 'CATGGCAC GCGAATTTTC GATGCCGGTA AAACAACAAC ATCGGTGAAA CTCACGAAGG GGTATCACGA TCACATCTGC ATCGACACAC CAGGACACGT GATGGTGCGG TI'ACCGTTCT TGGCGTCAAG CAACTGAGTA ATAATAGAAT AGAAATCAAA ACTTGAAA.AA ACTCGTAATA TACTGAGCGT ATTCTTrACT ACACTGGTAA AATCCACAAA TGCGTCACAA ATGGACTGGA TGGAGCAAGA GCAAGAACGT TGCGACGACA GCTCAATGGA GGACTTCACA ATCGAAGTAC TGACTCACAA TCAGGTGTTG CGGAGTTCCA CGTATCGTAT ACAACCACCG CGTAAACATC AACGTTCTCT TCGTGTATTG AGCCTCAAAC TGAAACAGTT TTGCCAACAA AATGGACAAA ATCGGTGCTG ACTTCCI'TA CTCTGTAAGC ACACTTCACG ATCGTCTTCA CACCCAATCC AATTGCCAAT CGGTTCTGAA GATGACTTCC GTGGTATCAT AAGATGAAAG CTGAAATCTA TACTAACGAC CTTGGTACGG ATATCCTTGA CCAGCTGAAT ACCTTGACCA AGCTCAAGAA TACCGTGAAA AATTGATTGA
AGCAAATGCA
TGACTTGATC
AGAAGACATC
AGCAGTTGCT
GAAACTGACG AAGAATTGAT GATGAAATAC TTrGAAAGCTG TCAGCCTrCA
AGCCCACTTG
CGTCCAGCAT
TTCGTAGGTC
GTATTGAATA
AACAGCCGTC
AAAGATACTA
ATCAACGTTrC GTATCCGTAA AGCGACTATC AAAACAAACG TGTTCAAT'rG ACATCCCACC AATCAAAGGT CTGACGAAGA GCCAT'rTGCA GTTTGACATT CTTCCGTGTT CTTCTAAAGG TAAACGTGAA AAGAAATCGA CACTGTTTAC CAACTGGTGA CTCATTGACA CAGAACCAGT TATCCAATTG
CTCGAAGCI'G
AACGTTG-AAT
ATGCTTGATG
AT1'AACCCAG
GCTCTTGCCT
TACTCAGGTG
CGTATCGGAC
TCAGGTGATA
GATGAAAAAG
ATGGTTGAGC
AAGAAATCAC TAACGAAGAA TCTTCCCAGT ATTrGTGTGG'r CGGTTATCGA CTACCTTCCA ATACAGACGC TGAAGAAAT'r TCAAGA'rCAT GACTGACCCA TTCTTCAATC AGGTTCATAC GTATCCTTCA AATGCACGCT TCGCTGCTGC CGTTGGTTTG CTAAAATCAT CCTTGAGTCA CAAAATCTAA AGCTGACCAA ATCCAACATT CCGCGTTGAA GACAAGATGG GTATCGCCCT TCAAAAATTG GCTGAAGAAG
ACAAACGTTG AAACTGGTGA AACAGTTATC TCAGGTATGG GTGAACTrTCA CCTTGACGTC CTTGTTGATC GTATGCGTCG TGAGTTCAAA GTTGAACGA ACGTAGGTGC TCCTCAAGTA TCTTACCGTG AAACATTCCG CGCTTCTACT CAAGCACGTG GATTCTTCAA ACGTCAGTCT GGTGGTAAAG GTCAATTCGG TGATGTATGG ATTGAATTTrA CTCCAAACGA AGAAGGTAPA GGATTCGAAT tCGAAAACGC AATCGTCGGT GGTGTGGTTC CTCGTGAATT TATCCCAGCG GTTGAAAAAG GTTTGGTAGA ATCTATGGC'r AACGGTGTTC TTGCAGGTTA CCCAATGGTT 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 GACGTTAAAG CTAAGC'rTTA TTCAAGATTG CGGCT'rCACT CTTGAACCAA TGATGCTTGT GGTCACGTAA CTGCTCGTCG ATCGTTCGTG CTTACGTTCC GCATCTCAAG GACGTGGTAC TCAGTACAAG AAGAAATTAT GAAGGAAGTC ACTTAGTGGC TAGTAGAAGA ATAATGTGAG ATTGGGGATG CCTGCTGAAA AGTTGTGGTT CATAAAAT'rA ACCGATTGAA ATCTTTTTAG CAATAAAGCT G7"rTTTGAAA TGATGGTTCA TATCACGATG TCGACTCATC TGAAACTGCC TTCCCT'rAAA GAAGCTGCTA AATCAGCACA ACCAGCTATC AACAATCACT GTTCCAGAAG AAAACCTTGG TGATGTTATG TGGACGTGTA GATGGTATGG AAGCACACGG TAACAGCCAA ACTTGCTGAA ATGTTCGGTT ACGCAACAGT TCT'rCGTTCT ATTCATGATG GTATTTGACC ACTACGAAGA TGTACCTAAG TAAGAAAAAT AAAGGTGAAG ACTAATCCGT CCTCACTCTA TTCCTrTrTGT CTTTAGAAAA TACCTCTAAA TATGGTAAAA GAAAATGAAT GTCAAATAGT ITTGAAATTT TGATGAATCA TGAGACAGGC TCCTGCTTTA GCACAGGCCA ATATTGAGCG GTAAGGTATG GGAGTTTCAT T'rCGTATTTT CTAATAT'TTT
AATTAAAGAA
TTAAGGCTCG
AGGTTTGAGC GAAGAATTTr CTAAGACAGG GTCTCAAGAA TTTTCAAATC AGCTCTTGCA GTCCTACTAT AGGGAGGCTT T'rATCAAAAT TTGCAAGTTC GATTGATAAG GAACA'NTTA GTTTGGTTTT CCAACTTrrA GGAAGAGGCC TrCATGCTG
TCTCTGAAGG
GTGCTGAGGG
AGAAGAATCA
ACTGTCAAGT
AAAATGAGCA
AGCAGATGGC
CTGCAGCTAA
AGGAAAATCG
TCCATGTGCT
TAATCAGCTA
'rCTTCCTAAT AGTCAAGGTT TTAAGTCCCT TTTATTGAAG GATCTGAAGC TTAGCCAAAC AACTTGAAAA
CCGTGCTATG
TGATTTTCAA
TATGATICGAA
GGAGCAAAAA
TTCAAGTTTT
CATCAAGAAG
GAACAACTGG
GCGAAAAAAG
GTGACGACAG
GTGACTAGAA CAGGTCGTGT TCTATGCAAA AGTGGGTTAA AAT'rCTTCGC TCCGAGTTCG CGAGAAGAAT GATCTCCTGA CCCAAGAGCA GATTGTTCAA GCTGCCAATG AGGAAGCGCTI ACCTCCTCCA GCGGAAGAGA AACCAGCCTT ACCCAAGCTG GATAAGGCGG AGATTACTCC TCTGGTATTT GAAGGGG'N'G TTTTTGATGT T'rTAATCAAC TTTAAAATGA CGGACTrATAC AAACGAGGAA GAGGCCCAGA AGTTTGACCT AGGGAATGTG GAGATGAATA ACTTCACACG GGAAGTTGTT CACTATGAGC GGAAGGATTT TCATGC'rCAT ACTAACATGT CGACTATGGA AACAGC'rGCT AAGTGGGGAC ACAAGGCGGT CTTTCCACAT GGCTATAAGG CGGCTAAGAA 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 CGATTTGACT ATGAACGTAC AGGATCTGCA GATGCCAGAA GGTGAGCGTC GGGTTGAGTT TGCTTTGCCA GAGGTCGAAG AGATTGTTGC TGCTATCACG GACCATGGGA ATGTCCACTC AGCGGGAATC CAGCTGATCT ATGGGATGGA AGCCAATATC GTGGAGGACC CGTCTATAAC GAAGTGGAGA TGGACTTGTC AGAAGCAACC TACGTGGTCT AACCACGGGA CTTTCAGCTA TCTATAATGA CTTGATTCAG GTTGCGGCT CAAGGGGAAT GTTATTGCTG AATTTGATGA ATTT'ATCAAT CCTGGGCATC CTTTAC'?ACA GAGTTAACTG GAXITTACAGA TGATCATdTC AAAAATGCCA ACAAGTTT'rG CAAGAATTCC AAGAATTTTG CAAGGATACG TACCTTTGAC GTTGGCTTTA TGAATGCTAA TTATGAGCGG TCAGCCAGTT ATTGATACGC TGGAGTTTGC TAGAAACCTC TGGTTTGGGG CCTTTGACCA AGCGTTTTGG TGTGGCCTTG CTACGATGCG GAAGCGACTG GTCGTCI'GCT TTTCATCTTT ACATGGTGTG ACCGATTTAG CTAGACTCAA CATTGATCTA AAAAGCTCGG ATCAAGCATG CGACCATC'rA TGTCAAGAAT CTTTAAGCTG GTI-rCCTTGT CTAATACCAA GTATTTTGAA AACGGTTCTA GATGCCCATC GAGAGGGCTT GATTT'TAGGT
GTCCTAGTTG
CATGATCTTC
TATCCTGT
GAACATCACC
ATCAAAGAGG
ATCAGTCCAG
CAGGTAGGTC
GGAGTGCCAC
TCAGCCTGTT
GTGTCCCTAT 4140 TTGACGTGGA 4200 CTAAGATGTA 4260 CCTTGTCAGC 4320 AACCACTAGA 4380 CCCACAATGC 4440 CAAAGATTAG 4500 A'rAAACCC-A 4560 ACATGGCCAA 4620 TAGCAGAAAA 4680 ATT'CTTACAA 4740 TAAAAAATAT 4800 GGATTCCGAG 4860 CAGAGGGTGA 4920
AGTTTTT'GAC
TGATTTTATC
CAAGGATATG
TGGCAACCCT
TCGTGAAATT
TGGTGAACAT
GTTGGATGAA
CAATGCCTTG
TTTCATCGAC
TTATGGAAAT
ACTGGGGAAT
GTGGTCGTTT CTCAAGGTGT GAGGTCATGC CACCGGCTAT GAGGAACTCC AGACCATTAT GTTCTGGCTA CGGGAAATGT ATCGTCCGTA GTTTGGGACA GCCCAACCAG CACCACTTCC TTTGCCTTTT 'rGG4GAGAGGA GCAGAAATAT TTGAATCCGT AAGGCTGAAG AAACAGTTGC CCGCTGCCAG ATATTGTTGA GGATTTGCTG TGATTTATCT
GGATGCGGCG
CTATGCACCiZ
CAAGAGTTTG
TCACTATATC
GGGTGCGATG
AAAGGCTCAT
ACTGGCTCGT
TGAAGTCGTT
TGAGTTGACC
TTGCGGAT'r
GGCATCGCAG
GTCTGTCGGA
TCC'rCACTAT GTTGAGGTGG CCAAGTATTA TI'GATTGCCA AAGAGCAGGT ATAGAGGTTG GAGACCGCCT GAACCGGAAG AAGAGA'N'TA ATTAATCGAA CTATCGGTCA TTTCGAACGA CTAATGAGAT AAACTGGT'rA TTGAAAACAC AAGGG'rGACT TGTATACGCC TATAAGAAAG CTTTTGAGAT GAAAAAGAAT TAACATCCAT ATGCTGGTGC AACGT'rCTAA TCTAGTTTCG TTGCGACCAT GTCTGTGGTC AGTGTCAGTA GATATGCCCC ATAAGGACTG ATTCCGTTTG AGACCTTCCT TGAACGGGGT TATTTGGTTG GT'rCTCGTGG GATTGGGATT ACGGAGGTCA ATCCTCTCTC CAGTGAGTTT ATCACAGATG GTT'CGTACGG TCCAAACTGT GGTCACAAAC TCAGTAAAAA T'rCAGGATTTr
CGGACAGGAT
TGGTTTTGAT
TAGCGCCCAC
GGTTGGTACG
TGGCAAGTTT
GCGGACAACA
CGATTTTACG
CTTTAACTTC
TCCGACTATG
GGATGACGAA
ACAAATTGGA
TGGAATGGTA
GTCCCACGGT
GGACCTATCG
TCTGGAACCT
GGGGATAAGG
TTrGGATGTGC
GTAGCTGCCA
TATCGTGATG
GGCCAACACC
CCTGTCCAGT
CACGATATCG
ATTCGAAAAC
GGCGTGATGG
ACGCCTACGG
GACGAAACCC
ACTGATGTTT
AC'TGTTATCG
AAGATGCCCT
TTCCTGATAT
GTGATATCTT
AGACTGCCTA
TGACT'rGAAC T'rCTCGGGAG AAGATCAGCC TGGTGAAGAA TATGCCTTCC GTGCGGGAAC TGGATTTGTC AAAGGTTACG AGCGAGATTA CAGAAGTAGA ACGCCTCGCT CGGGGGGAAT CGTTGTTATT ATCCAGCAGA TGATGTCACG ATGAGAACGT CCTCAAACTC TTCAGGATTT GTCTGGTATT CACTCTTTTC TGGGACTGAT GTATGTTGGG GATTCCAGAG ATCCGACAAC CTTTGCGGAA GGTTGGGGAA TGCTCAGGAT GTTGTCGCGA CGACATCATG TTACCATTAT GGAACGGGTA ATG-GCTATAT CGAAGCAATG CAAGGAGCGG CGGGTGTCAA CCGAACTACA TGGATGTCTA GCTGAATGGC AGACCACTCA GATGTACTGG GACATGATGA GACCCTAATA AAATTCCTAT GTGCTAGGGG TAACACCTGA TITTGGAACAA ATTTCGTACG TGCCTCAGC TGTCTGGTCT CTGATTAAGC AAGGAATAGC GTTTACCTCA TGCATGCGGG CGTAAGGGTT TGTGGCTAAA AAGGCTAATA AGGTGCCAGA GAT'rTCAGAA GAGGAGAGAA 641 GTGATATC GAATCCTGTG GGAAAATTAA GTACATGT'rC CCTAACGCCC ATGCCGCAGC 6780 CTACGTTATG ATGGCCTTGC GTGTAGCTTA CTTCAAGGTT CACCATCCTA ?rTATTACTA 6840 C1'GTGCTTAC TTCTCCATTC GTGCTAAGGC TTTTGATATC AAGACCATGG GTGCGGGCI-r 6900 GGAGGTCATC AAGCCCZAGAA rGAAGAAAT CTCTGAAAAA CGCAAGAACA ATGAAGCCrc 6960 TAATGTGCAA ATCGATCTCT ATACAACTC'r TGAGATI'GTC AATGAGATC'r GGGACGc- 7020 TTTCAAGTT'r GGTAAA'rTAG ATCTCTACTG TAGTCAGGCG ACAGAGTTCC TCATCGACGG 7080 GGATACCCTT ATCCCACCAT T'rGTAGCAAT CGATGGTCTG GGAGAGAACG TTGCCAACA 7140 ACTGGTGCGG GCGCGTGAAG AGGGAGAA'N' CCTCTCTAAA ACAGAACTAC GCAAGCGTGG 7200 TGGACTCTCA TCAACCTTGG TTCAAAAGAT GGATGAGATG GGTATTCTTG GAAATATGCC 7260 AGAGGATAAC CAG'rTGAG'rT TGTTTGATGA GTTGTTTTAA AAAATTGCT'r AATAATCTAT 7320 TAAAAGAGGC TAACGTATAT CCAATAGATT TACATTAGCT TTCTTTrTG ?1'AAAATAGT 7380 C'rATGGAAAG AGGGTGAGAG TATGTCAAAG ATGAGTATAA GCATCCGTCT GGATAGTrCAG 7440 GT'rAAGGAGC AGGCCCAACA GCTGTTTAGT AATCTGGGAA TGGATATGAC AACAGCTATT 7500 *.*AATATTTTCC TTCGTCAGGC AATTCAA'rAT CAGGGATTAC CTTTTGATGT TAGACTAGAC 7560 *.*GAAAATCGGA AGTTGCTCCA AGCGTTAACG GATTTAGACC AAAATCGTAA TATGAGCCAG 7620 *TCTTTTGAAT CAGTCTCAGA TT'TGATGGAG CACTTACGTG CTTAAGATTC GTTATCATAA 7680 *ACAGTTT'AAA AAAGATTr'rA AGTTGGCTAT GAAGCGTGGT TTGAAGGCAG AATTATTAGA 7740 AGAAGT'TTTG AATTTTCTGG TTCAAGAAAA AGAACATCCT GCCAGWATC GTGATCA'TC 7800 ATrlGACGGCA TCCAAGCATT TTCAAGGAGT 
TCGTGAATGC CATACCCAGC CAGATTGGCT 7860 *TTTGGTTTAT AAAGTAGACA ACTCGGAATT GATTTTAAAT TTGCTGAGGA CAGGCAGTCA 7920 CAGTGATTTA TTTTAATCTA TTTTAAGGGG GTTCTCATGA AACTAAGAAT ATTTGCGGAA 7980 GATAAGCCGG CTAAGAAGGT ATTTGAATAT CAATTAGAAC TTGCTrGATCG TACAATTCT-r 8040 CTATCGACAG-CACTCT'rGTC AGGTGCTATT GCTTTAGCAG GAATCN'rC TGCTTTGAAA 8100 GAAAAATAAA AATAGAAAAG AGAAAACAGA ATGGTTTTAC CAAATTTTAA AGAAAATC'rA 8160 GAAAAATA'rG CGAAATTGTT GGTTGCGAAC GGAATTAACG TGCAACCTGG TCACACTTTG 8220 :**GCTCTCTCTA TTGATGTGGA GCAACGTGAA TTGGCACATC TAATCGTGAA AGAAGCTTAT 8280 *GCCTTGGGTG CGCATGAGGT CATCGTTCAG 'rGGACAGATG ATGTGATTAA CCGTGAGAAA 8340 TTCCTCCATG CCCCGATGGA GCGTTTGGAC AATGTGCCAG AATACAAGAT TGCTGAGATG 8400 AACTATCTCT TGGAGAATALA GGCTAGCCGT CTTGGAGTTC GTTCATCTGA TCCAGGTGCC 8460 642 TTGAACGGAG TGGACGCTGA CAAGCTTTCA. GCTrCTGCTA AAGCTATGGG ACTTGCCATG AAGCCTATGC GTATCGCAAC TCAATCTAAC AAGGTTAGCT GGACTGTAGC AGCTGCAGCA GGACT'rGAGT GGGCTAAGAA AGTCTTCCCA AATGCTGCGA GCGACGAAGA AGCAGTTGAT TTCCTTTGGG ACCAAA?1r'T CAAAACTrGC CGTGTCTACG AAGCAGATCC TGGGAGGAAC ATGCAGCCAT TCTCAAGAGC AAGGCCGATA TGCTrAATA.A TCAGCCCT'rC ACTACACAGC GCCAGGAACA GAT'rTAACAC TTGGTTTGCC GTTTGGGAAT CAGCTGGTGC ACAGAAGAGG TCTTCACAGC AAACCGCTTA GCTACAACGG CAAATCGTAG ATATCACTGC AATGCGGGTG CGCGTGCCTT CAGTCAGGCA TTACCTrCTT- ATCGGTGCAG CCTATGCGAC GAAGCTGCAG GGCTTAACCG ATGGATATCG ATGGTATTCG TGGGCAAATT AAGGAGATAA TTAGCAGGTG CTATGACCAA GGTTGGATCG GAGCCTTTCT TGTCAATGCA CAGGGCGAAG AATTCTTGCC GCCTGACTTC CGTCGTGCAG ATGGTTATGT AAATATCATT GAAGGCATTA AGGTGACCTT
TGTTAAGGCT
GGAGCAATTTr
AAAGAACCAC
AAATATGCCA
CACT'rCTACA
TAAGGATGGA
TGTCTTTGAA.
TCCAATTTCT
CCACTTGGCT
TGAGAAGGGT GATCAGG'rTA GGGTGA.ATGT GCCTTGGTAC TAACACCCTT TTCGATGAAA TAGCGTTGTT GATGGAGCGG TTCAGATGTT CACGTAGACT TGAGGATGGA ACGCGGGTAC TATGTTAGGA AGTATGTTCG TCGTGGAGAG CGAATGGGAT AGGTCACTTG CTCTTrGGAA
TGAAAGACCT
CAGATCCAAG
ATGCGTCAAA
AGATGAGCGA AGAGGAGCTT TTATGATTGG TTCTAACCAA CTCTTTTCCG TAATGGGAAT TrGGTCTCCT AGTGGGATTT GTTTTGGAAA AATGTTTC TC CTTGGGGGCC AGTTTTATCA 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9607 120 180 240 GGAACAGCTA TTATCCCAGC GATTTTAGGA GCCATGATTG TTTTAGCTAT TTTTTGGAGA.
CGAGGAA
INFORMATION FOR SEQ ID NO: 81:
SEQUENCE CHARACTERISTICS:
LENGTH: 14231 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:
CTACAAGATA ATTCCAGCTA TAACATCCGC TATAATAGTA AGAGCGAGCT CTATGATAAG GCTCATTAGT TTCACCTCCT CTCACGAACC CATAGGAACG TAATCGGTAA CCGATGACAA AAATAGTATA CCACAATACA TTTAGATCAT CAAGGTCACT TAATTCTTGA AATATCAGAT CTAAGAGAAA AATCTTTAAA ATCAGAAAAA CGCATAATAT CAGGTGTGCA AAAACTTGAT ACTA'rGCGTr TTTGTGGG AAGGTTTACT CCATTTTCTC CTGAAATTGA GTTTTTGTCC AGCCTCTGTT TTTAGGGTTG TCAGCAGACA GAACGATACT GTATATATGT GACTGACTTC ACTTGATCAA TTrTTCAAATC TCATTGAA'rA TCAGAAACCC ATCTGGCACG GTGTC'rGAAA TCCAGTCTTG TAAAATTTAG AATATTCTTT TTTGAAATAG CTAAGAAAA'r AATCATGT CGTGAATATT TGTAAATCAG CT'rCGAAAAT
ATCAGTTCTA
TGTACTTTGA
ATTCTCCATC
TAAAGT1TGT
AACTATCAAT
CT'rGGCTGAG CTAAATCGA'r
CTCTTCACAT
TCTrACAACCT
CCAAGCTGAG
AAATAATTCG
GTATT'rGGAG
CATGTC.AGCT
CAAAACAGTG
ACTAGCCc
ACTGCGTCTA
AGGGGATTAA
GAACAAAATG ACCTCT'rcCT CAGTAAGATG GTTTCATCG AGAAGCTTCA TAAACATA'TT ATNAAAATGA TCTAATAAAG TTTGACCTCG CCAGATCTGA TTGTGCACTA TGTAAACCAT TCGTCTrTTCc T1CGAGCTG TAT!TrGATTT
ATAATTTTTG
TTTTAAAAAA
CrrTGTCAAT
GGTCATCAAT
AATTTTCCTT
TAAAGAT'TTT
TTGTAAAAAT
TACTAGAATC
ATCATAGTTG ACCACGGAAC AGGATTGATG GGAGCTATCT TTAACAGrrT CAGATAGGGT 0 0* *0 0 0 00 04 0 0 0 *000 0 0000 0000 0 *000 00 .0 0 0 0 AATCAAATTA TCAAGTTCAG AAAGATAGGG ACAGAGTTCG TAGACAGTAG
TAGATAGATA
TTGCCTAGTA
CAGGGTCACG
CACATACCAG ACCGAATAAA GTCTTTAGCG AGACTAGCGA TTAGTCTTTT CTTTCTCCTT CACGTATTTG ATGAGAAAGT TCAATTGTGT TCATAGAGGA TATCCGTGCT TTCTTTTGAT AAGACCTTGA TTTI'CTAAGA AAATTAAATC 360 420 480 540 600 660 720' 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 ACGACGTAAG GTACTTGTGC TTTTTTCTTT TTGATAATTT CTCCTAATTT ATCATTTCAA ATGAATATTG GGTTAACATT AAACGTGGGG ATTATAATAA TTTAGCAATA GATTTAGGTG TAAACTAGTA ATGGAAGAAA TTTATCTTGG GATATTGACT TACTAGTTAC AAGATTTTAT TGATAATGAA GGTAAGCTGT AGTGTTAAAG GAAATATCTG TCAGATTATG GAGATAAATA CTCTTTCTAT AAGACCAATA TGGAGAAAGT GATTTCTGCC AGCTCTTTT-A CGGCAATTCT CAATCAATTC AAGTACACGT TCATCTTTT-A TCATAAGCTC CTATATTATA GCACAAATTG GAGGAATTTG AATTATTTTT TGAACATTAT TCAAGTAAGC AGTTAATCMnA GGACGAAGAG CGACTTCTGG AAGAGCAATC TAAATCGCTT TTCTAATCTA TTCTACTAGC TAAAATTCrTT CTATCGGTAT TGACACATGG TATTACAACC TGTTCATTAT AAATGACTGA AT'rAGAAAAA GTTCACATAT TGAAAALAATA AGAAGAAAAA TGGAAGCGGT GTTGGTTACC TTTCTGAAAA CCTATTAGAG TAAAAGGGCA GAAAGTATCC GCTTGGCTAA GGAGT-rGAT'r TTGGACTGAT CGTGATGAAA GAACAAAGGG CTGTATTCAG AGACAGGAAA CCTTGTTT-CA ACTCTTTAAG GCACGTCAAG AATCTCCTGA AGATTCTT1'T AATGCCAGAT TTGT'rTAATT ATCTCTTGAC
AGGTAAGTTT
TrCAAAATTGG
AATTGTTTCA
TCCTGTTGTG
AGAAGGTAGT
ACCGATTCTT
AGTGATTACA
ATTTGAACGA
AAAAGAAAAT
GCACAAGACT
ACAACTATTT
GCTACAGAAA
AATCAGAATA
GAGGGAAATG
644 AAAGCATTGC TTCAACAACT CAATTATTTG ATCCTAGGAG TC'rTAAAACT ATTTGAATTG GATT1CATCTT TACTTCCTGA TTCTTGGAAG OATAAAAGAG GAGTATGGT'r TAGGCGATAT AATGTTTGTA GTCATGATAC TATTTATTT CATCAGGTAC ACTACCGAAT CCTTCATA TTTCGAAGA ATTGTACAGG AGAGGGAAAG CCTATTCT'TT CTTCCTCTGA TTGATACTGA TTGACAGAAT ATCTAGCTTA AAGATTGTTT ATGAAAGCC'r AGCAACCCGC A'N'GTCTCAG TACCTAAGAC TTGGTC7'rTG GTTGGAGTGG AACT'rACTTC TGGATTTACA AATGAAGTCG GTAAAGATCG GTTGTGGATC ATAGAGGAAC TAAGACGTTC TGATGATATT AGGACAATGG TGGAGAAAGA ATCAACTGAA TTTGCAACAG AATCTGATAT TCATCATGAA ACTAGAGAGT GGACAGATGG AGC'rGAAACG TATAGGAAAG CGATAGAG'T ACTAGAAGAA CTAACTCATA AGGTTTATAA GAGGATATAT GTGATTGGAG GAGGTGCTAG AGCCAGTTAC TTTAACCAAA TGATTGCTGA TAGAACTGGT AAAGAGGTTC TTACAGGTTT 6 6 6 0 f6 6 6 0 00 06 6 000~ 6 .0.6 GACTGAGGCT ACAGCTGTGG AGGGATGGAA GAGGCTCACC CCAAAAGAAT TAAAAAGATr TGTGCAGGAA GGGGGGATAA GACAAGGATG TCAGATGTAA AGATTTGACG AAAGGAACAG GGCAATTACC CCGTCGGGTA GGATATTAAT GGAAATGTTG TTTGATTCAA TATCAAACTC TGCAACAGTA TTAGCTTGTC GGCAGGGAAA GATGTTCGGG GGAATATTGT TGTGCAGCTC ATAGCTATGG GACAATTAAA ATGTTATTGA GGAGT'rTCTA CAATTAGAGA GTTATTACTC GAGAGTTTGT AAATT'rGCCT CCCTCCCCCT TC'rrAGCTTT TTGGTGAATT GAAAAATATT 'rAGTGT'rTrG ATATGAG(GAG' AACAAGAATT AATTAAATAT GGTAAGAAGC TAGTAGAAAC GTGGGAATCT CAGCGT'TrC GATCGTGAAA AACAATTGAT TTGATTTCTT TGAAATCAAA GAATCCGATA TTGTAGTGAT TAGAGGGAGA ACGCTTGCCA TCTAGCGAAT GGTATATGCA GTGATGATAT CGATGCAATT ATCCATGCTC ATACAACTTA TCAGAGAACC ACT'rCCAGCG AGTCATTATA TGATTGCAGT TAGCTGAGTA TGCAACATAT GGCACGAAAG AATTGGCTGT 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 6*66 @6 0* *6 6 6 GAATGCAGCT AAAGCAATGG AAGGTCGTAG AGCAGTTTTA CTAGCGAATC ATGGAATTTT AGCAGGTGCA CAAAATTTAT TGAATGCATT TAATATTGTT GAAGAAGTTG AATATTGTGC AAAAATTAT TGTTTAGCTA AGAATTTTGG AGAGCCAGTA GTTCTTCCTG ATGAGGAGAT GGAATTGATG GCAGAAAAAT TTAAAACATA CGGTCAGAGA AAATAGGGAG GATATTAATG TTAAAACATA TACCGAAAAA TATTTCTCC-A GATTTATTGA AGACTTTAAT GGAAATGGGA CATGGAGATG 
AAATAGTATr AGCTGACGCG AATTATCCTT CTGCCTCATG TGCAAATAAG CTAATTCGTT GTGATGGTGT AAA'rATTCCA CCATTAGATA GTTACGTCGA TAGTrCAATT ATTCCTAAGA TATGGGGTAC CTATAGACAG ACGATTACTT ATCTTAGAAG AGAAGACTTT GTTGCTACAG GAGAAACTTC ACTITATGCT GAAAGAGAAA ATGTrCAATA GAGGAATT AGCTCTCATG ACCGCAGAAA TGT'rTGTrGG AGGTTTGATG CCTGGAGAGA ATATTGTAGA TGAACTGTTA GACTCAAATC AAGAGGTTAT TAATAATGTG GCTTTGTCAC GGTTTTTAAA TAATATCCCT CTCCTAGTGG AATTAATATC AATTGT'rCAC AATGCTCAAA ATAGTTTGTT GGAAGAAGAT TTATG'rCTAT AGAGTTTGTT GTTGTCACTA CGTGGCTAAA AAAGTATGAT ATC'rCAGAAG ATAAAACACG ACAATCTATT ATTGTTTTC1' TTAGTG'rAAA ACGGTTTGTG AGAACAATGC TGATATATAC AAATCCAAAA AAATTGGAGT ACCTCAATGT AGGACAGATG GGAGGTGTAG CTCTAGGTGA AGAAGACAAA 645 GAATTATTAG ATTCCATTCT GTATTTAATG CAGTTTATGA ACGTTGTTTC GGGTGA'rGAT ATGATTGAAG GTCA'rGGTrAC AGATCTTAAA TATGAACGTA GTAAGAAAGC 'rTATGCTATT AATATrATCC TTAAGAAAGG AGTAGTTGTr 3840 3900 3960 4020 4080 a a.
a a AGTTGCCAGT CATGGTAATT TGAGACAACA AATGATAGAG GTTTGAGCAT TATTTrAA.AA CGr'T=GACT GACTTGATTG TTTGGATTCA G'rTGATATTG AAGTTATGAT TCAAAAATCA TAATGTTAAA CAACAAC'rrA CGTATrGATG ACCGTCTGGT ATTGAGCAAG TTATCATTGT TTAAAGATTT CTGCACCGG'r GAAGTr'rTAA ACTCTGTGCC GATGTGTATG ATTCTATTGA AGTAAAACGG AGGAAAATGA TATTATTTTA AGAAAATAGT TTGCTAGCGG 4140 TTAGGACATT 4200 ATCAAGTGGA 4260 GAGGAAGTCC 4320 TAACAGGGTT 4380 ATTTAGAAGA 4440 ACGTAGAGGA 4500 ACATGGTCAA 4560 TAATGATCGC 4620 AGGTTTAAAA 4680 AATAAAAAAG, 4740 AGGAAATTTA 4800 AAAGGTAACG 4860 TGATAAGGGA 4920 AAAATTTTTA 4980 GGTGACATTA 5040 CCGTCCGTTA 5100 a a a. a.
ACGAGAGTTG AAATTCAAAT TAAAA.ATAAT TTAAGGAGGT GTTGGGATTA TTGCCACTAT GTTACAAGTG CAATGGTTGG TCAGCTCTTG AATTAACTTG ACTATT'rCAG GTGCGAPTAT GCTGGTATCG CTATAGCAGT AAAACI'TAG ATGT'rTATTT TCAAAGATCG GT2TTTATCA CCAATTTTCC TAGCTATTAT GGTTCCTAAT GATAAA&TA ACAGTATATG CTATTCACAC TGACTATAAT GGACCGTTAT CTTAG'rATTA GGAGATTTCA GCTCGGTGTA ACAGGTATTG TGGTACTGCA TTTGGTATT'r TCCAATTGCA GTTGCTACCC TGTGAAAAAA GCTGATAATG TTATTCAAGT TTGGTTTTAA GCTTGGAGGG GAATATGTGG
CAATGTTGGA
AAGCATTACT
TTATGATTCA
CCCAAGGTGT TCTTATTGGT GAGGTTATAC TCCACCAGAT TATCTGGTCA AGGAGAAACT AACAGTTGGA TGTTCTTGCA ATGCTAAAAA CGGAGATTAT TCACGTTATT TAAAATTGTA CAGACTTGTT TGCTAAGGTT 5160 5220 5280 5340 5400 5460 5520 CCACCAATCG TTATGCA0GG GGTATGC1TN' TAAATATGAT ATTTGTTCTG TGTATGGAGG GCATrACTrCT ACGATATGAT GAGGAGGATC TTGATCTATG 'rCAAACGAAG T'NTrATGTAT TTrCTATATAC AArrCTTCCA CTGCALATGAA ACGTCACCTT TTGGAGTTAC ITCCGCTATG 646 ACTTAACTCT GCAGGTGC'TT GCTCAAGAAA AATATGTGGG AATGTCAACC ATGGGATCT TACTACCTTC M'ITrGGTTTT TATTCTrTT GATTGGATTC CACTAG'rl'GG TATTGCGGTA TGGAAGCAAA CCACAAGAAA CAACT'rCAAG ATGAATAATA AAGTAACTAA AG71rGAACTr GGTTCTTCAT GGAACTATCA GAGAATGCAG GTATTGAAAA AACTATACCC AGACAAAGAT GAGTTTTTCA ATACTCATCA A.ACAGCGGCA GAAGAACAAG AAGGAAATGA AGGTGCAGCT
TAGTGATGTT
AAAAAAGT
AACCTAGGTT
TCAGCTCTC
CCATTTATTC
TCAATTACTG
TTCTGGCTGA
GGTGCTTTAG
TATTTCGGTT
AAAGGAACAT
GTATTAAAGT TGCTTGATG GGCCACTGG CTGGTCTAGG AGATAGTT'rC CACTAGTTCC TATCTGTTTT AGTATTGGTG CGTCTTATTC GTATCTTTAT CGCCTTAATA TTGTTTAATA T-rATTAATAT TGAA.ATATGG GTATACTAAG GGTTCTAGTC 'IDATCCAAGA
TAAAGACGGC
TCCTGTTAAA
AAATAATACA
TGAATCGCGT
CATCAATGGT
TTCAAGAAAT
TGTGTAAATT
TACGAGTATG GCGACAGCAT TGGTATTAAT TTTGGATTAG GATTACAAAA TTAATTCCAG AATTAGAAAA GGAAAGAATC TTGGAGTTAT TCTAG'ITTT TGGGATATCA CCTCCATT GCCGCAAATT GGAATGAAGC GGTTCAGCTA CAAAAGAA'rG TCAGATAGAA ATAATCCAGA GATGGAGAAA TTCAGAAAGC GATCAAAGCC ACTATGAAT'r TGTGCTCTTr CATTGTATGA -TTTGAGCCTA ATAGT'rGTAA AATATTTTAT GTTGCCGTAT TTGGGTAGAA ATAAACGGTT ATGTCGGC CTGCTGGAGG
TTAGGAATTT'
GGAAGAGAGG
CTTGCCGAT'r
TATTCAACTA
CTCACTATTG
CGAAGAAT'rG
ACTTGGGGAA
AAGAGAGCTA
TAGGGCTAGT ACTAGTGGGT GGTTTGATTC AATTTAAGCA GGGGGAACTT CTTATTTC'rG GATTTATCCC TATGGCTTTG ACTTTATTAA CGTTGTACT AATCTTTAGT GTTATGGCTA TGAAGTAGTA GAAAGTGTGG AGGTGGTATT TAAAGAGTGA AATTATGGTA TAAGAAAGCT GGGAACGGTC ATTTAGGTGG TATGATTTAT AACGATGAGA CTATTGGTA TAGAGGAAAG CATCTTAAAA AAATTCGGGA ATATCTTTTA ATAAAGTTAA CAGTGTTTGC TACCCCAAGA CTTTACATTG AGCATATAGA TAT'rCAGTCT GATTTAGATA CAGCTATTTC TAATGT'rGTG 5580 5640 -5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 TTTACAAATA AAAAGAGAAT ATrTTACGAG TrTTThATAAG AGTGTCATCA GTTCAAAACA CAT'rAAATTT AAACATTAAT TAATGACGAA GTATCTAAAC TCGATTCAAG TACAATTTTA TAGAAAAGGT GTTCAGTTTA AAGTAGTATG TCATTCTAAG GTTACGGATG GTGAAGTAAG TGTATTGGGA GAGACAATAG TTATTCGGAA TGCTACAGAG GTATTTCTTT ATCTCAAATC CAGGGAGAA'r TTAGTAGTAT CAGGAGCAAT TTAATAGAGT CCAACGAATC TACTTCTTGA TTTCATTATG GAAGATATCT CTCAAGGAA TATGGTGTGA ATTAATACTC AAATGAAT'rA CCATTATTTG ATATGCTCGA TATCGAGCTA GAGGTTTTAC CAATCTCATG CCATGGGGGC ATTTGGGAAC ACTATTTATA ATAAAAGAAG CATTTCTTTT
AATGACGGAT
TGATTACTTT
TGATTTTrAAA
AAACACTAAA
GTI'AATATCG
TGAATTAAAT
TATTGGGGAA ATATAGATAT ACAGAAAAAG ATGAACATGT CTAGACTATA GTAAAGGTTG AAGTATAGTA ACTACTTGAC TCTAGTCAAC CGAATGTTT CCAATTTGGG GTTCTAAATA TTrGGATGGTA GGTCCATGTG ATTTACCAGA AAGAATGAGA GAACCGGGAA GACTAACCGC
TTCTTCTCTT
AAAAAAATAT
TC'N'AGCATT
TAACTTGTTA
ACCTGCCAAT
TACGATTAAT
AGTAGAATAT
TAAGAAAATG
TACGGCTCCC
ATGTACTCAT
TTTTGAAATG
CTACTTGATG
TGAAGGAAAT
AGCACATCAT
TGCAATTTGG
TTTCCAAGAT
CTTTGAAGAT
AATACGGATG GT'rTTGGCGA GTATTAACTA TTCCATGGTT GAGCGTATTC TTACGGAACA TATTTATT'rG AGGTGGATGG TATCGCTTAA AAAATGGTAT ACAGGTCCAA
GCTTGTCTAT
GGCATTGCAA
AAGAAACTAC
TATGAAGAAG
AATGAGATTG
AGGAGATTAT
TGGTTAGTAA
CATTTTTTTG
GTGTCTCACC GGAAAATAAA CATCTACAAT TGATAATCAA ATTCTAAGAT ATTTTTGTGA TTCATGCATT AACAATTAGG AGACAATTCG GATTT'rATTA GTCGTGTGAA GGAGTTAAAA CTAAAACAAA. AATAGGTAGTAATGGGCAAA TCCAAGAATG GTTAGAAGAT TAGAGCCTGG GCATAGACAC ATTTCACCTC TATTTGGGCT TTATCCTTAT ATATTCATAA AACTCCGGAA TTAGCAGAAG CAGCTAAAAT CACTATCAAT CAAACGCTAA TTTTTTATCT TCACAGGAGA GGGAGCAAGC GATTAATAAT 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280, 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 a. a a GTGGTTTGCA TGCTAGTACA CAAACAGGTT GGAGTGCTGC CGAGACTATA TCAAGGTGAA CCTGC'TTATA ACCAGATTAA AATAATGCGA CTCTTGGCAA TTTATTTCTT GACCATCCAC CATTTCAAAT TTAGGTTT.GG TGAGTGGAAT TTGTGAATTA TTAGTACAGA GCCATCATAA CTATCCAG CTTTACCTTC TGCTTGGTCA GAAGGAGAAG TGAAAGGTTT
ATGGCTGATT
TGGTTTGTTA
TGATGGTAAT
TTGGTTATCA
CAGAGTA-AGA
a a GGAGGATATA AGGTATCGTT TGCTTGGAAA AATGGGGATA TAACATTCCT AAAATTGGAA GGAGGAA6ACA AAGATCAAAA AGTAAGAGTA AGAATATATG GCAAAAATAC TGATGTACAA AATATTGAAT TGGTATT'rAA TTCAGAAAA.A ATTATTGAGT TAAATTTTTA GGTATAAGTC ATGAATAAAG AAAAAATAAA AAGAAAATTA ATCACAATAT TGTTTGTATG TATTGGGATG TTATGTTTTG GATTGTTAGC AGGAGTTAAG GCTGATAATC GTGTTCAAAT GAGAACGACG 648 ATTAATAATG AATCGCCATT GTTGCTTTCT CCGTTGTATG GCAATGATAA TGGTAACGGA TTATC.GTGGG GGAACACATT GAAGGGAGCA TGGGAACCTA TTCCTGAAGA TGTAAAGCCA TATGCAGCGA TTGAACTrCA TCCTGCAAAA GTCTGTAAAC CAACAAGTTG 'rATrCCACGA GATACGAAAG AATTGAGAGA ATGGTATGTC AAGATGTTGG AGGAAGCTCA AAGTCTAAAC ATTCCAGTTT TCTTGGTTAT TATGTCGGCT GGAGAGCGTA-ATACAGTTCC
TCCAGAGTCG.
TTAGATGAAC AA~rrCCAAAA GTATAGTG'rG TTAAAAGGTG TTTTAAATAT TGAGAA'rTAT TGGATTTACA ATAACCAGTT AGCTCCGC-AT AGTGCTAAAT ATTGAAGT T'TGTGCCAA6A TATGGAGCGC ATTTTATCTG GCATGATCAT GAAAAATGGT TCTGGGAAAC TATTATGAAT GATCCGACAT TCTTTGAAGC GAGTCAAAAA TATCATAAAA ATTTGGTGTT GGCAACTPAAA 9120 9180 9240 9300 9360 9420 9480 9540 9600 S. S S AATACGCCAA TA6AGAGATGA TGCGGGTACA GATTCTATCG GGCTTATGTG ATAACTGGGG CTCATCAACA GATACATOGA ACAAACACAT TTGAAACTGG AACAGCTAGG GATATGAGAT TCAATGATTG CTATGGAAAT GATGAATGTA TATACTGGGG GAATGTGCCG CGTATACAT'r TATGACAAAT GATGTACCAA ATTAT'rCCTT TCTTTAGACA TGCTATACAA AATCCAGCTC AATAGAACAA AAGCTGTATT TTGGAATGGA GAACGTAGGA TATCA69GGAC rTTATTCGA.A TGATGAAACA ATGCCTTTAT ATTCTTCCTG TAATACATGA GAAAATTGAT AAGGAAAAGA GCAAAAATTT TGACTAAAAA TAGTGAGGAA TTGTCTAGTA CT'rTATCCAA AACT'rIATGA AGGAGATGGG TATGCTCAGC ATTTATAATA GTAATGCTAA TATCAATAAA AATCAt3CAAG AATAATACAA AGTCGTTATC GTTAGATTTG ACGCCACATA AATCCAAATA ATTTACATAT TTTATTGAAT AATTACAGGA TTAGTGGATT TTGGTTGAGT AATGGTGGGA AAAACATTAT CCTATGCATC GGAACCAGAA GAGGCACAGT TTATAATTTC CTCCAGCATT TACTAAAGGT CAAGTAAGGA AGAAGTTGTA TTAGTTCATT AAACGGATTT ATAATAATGG G AGATATCAT' TTTCATCTAT ATTCCCTAAT AAGTCAACTA TTTAAACTCG GTGTAGGTAA TTCCTGGTAT TAATGTTGCC TATGTATACT CTTACGCTCT TGTTAAAGAA CAGATAAGAC AGCTATGTGG 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 GCATTATCAG GAAATTTTGA TGCATCAAAA AGTTGGAAGA AAGAAGAATT AG;AGTTAGCG AACTGGATAA GCAAAAATTA TTCCATCAAT CCTGTAGATA ATGACTTTAG GACAACAACA CTTACATTAA AAGGGCATAC TGGTCATAAA CCTCAGATAA ATATAAGTGG CGATAAAAAT CATTATACTT ATACAGAAAA TTGGGATGAG AATACCCATG TTATACCAT TACGGTrAAT CATAATGGAA TGGTAGAGAT GTCTATAAAT ACTGAGGGGA CAGGTCCAGT CTCTTTCCCA ACACCAGATA AATTTAATGA TGGTAATT'rG AATATAGCAT ATGCAAAACC AACAACACA.A AGTTCTGTAG ATTACAATGG AGACCCTAAT AGAGCTGTGG ATGGTAACAG AAATGGTAAT 10860 TTTAACTCTG GTTCGGTAAC ACACACTACG TTGAAAAAAA TGGATAAAGT TGGGCTTGTT CAACGTCTAT CTAATTT'TGA TGTGATTCTA AAACATGTTA ATAA'ITTGTC GGGTGAATCT AGGTATATTA AAGTTAAATT 
ACTAACGAGT GI'TTTAGAG AATCAGATGG 'rAAGCAATCT AAAGTAGTCT CTACAAATAA GGTAGCTACT GCTT'rAGCAG TTGATGGTAA TAAAGATGGA AAGGCAGATT CTAACGCTTG GTGGCAGGTC GTTGATATTT ATAATAGAAC AGATGCCGAA TTTCTATCT'r CATCAGGAGA AGAAGTTTTT TTGTTATCTrT TAAAAGTACC TTCTGTAGGG 649 GCAGATAATC CCTCTTGGTG GGAAGTCGAT AAAATTTATA ATCGCACAGA TGCTGAGACT TATGACAATA ArAGAAACGA AGTTGCTAAG GTTAGTCTAG ATN'TCAAAGA AAAAGGAGCA CGAGTGCCTT TGAGTTTAGC AGAAGTAGAG GAAGAGGATA TAGATAAAAT AACAGAAGAT CAAAGTTCAA CCAATTATGA GGGTGTAGCT GAT'rACGGAC ATCATTCGGT GACTCA'rACT GATCTGGGAG AAGAGI'TAC CCTCAGCCTT TATCTAATTT AGAAGACATT~ TTGATAAAGT GGT'r'ICTAAA
TGATGTTATT
AGTTGATGGT
GCTAAGCTAG
GCAGCTATTC CGTTAAGT'rT AGCGGAAGTT GAAGTCTATG AAACTTTCTA ATATTGCATT A6ACAAAAGAA ACTCGACAGA TTTTCTCGTC TAGCAGTTGA TGGAA6ATAAA AACGGAGATT CATACCAAAG AAGATTCT1CC TTCATGGTGG GAGATAGATr GAAAAGTTAA TTATTTATAA TAGAACAGAT GCTGAAATTC ATTATTATAT ATGATTCAAA TGA'rTATGAA GTTT'rTACAC AGCAATAATC TATCCATAGA CT'rAAAAGGA CTGAAGGGAA AGAAGCGCAG GAATTCCTTT AAGTTTAGCA GAGGTAGAGG TAAAAATTAT CACCCAGGCT ACCGTAAATA TAATGGAGAT AAAATAAGAG GAAAATAGTA TGATTCAACA TCCACGTATT TGGTCGTCGT CAAGGTGTAC GCGAATCACT TGAAGTGCAA TGTGGCAGAT TTGATTTCAA GCACATTGAA ATATCCAGAT GATTTCTCCA TCTAC'rATTG GCCGTGTACC AGAGGCTGCA -AAAATCAAAT GTT'rGCGCAA CAATTACAGT 'rACACCATGC TCAAAATAGA ATrAAAATCA GTTCAAAGAG AACTCCGAAG GTTCAACGGA TTACAATGGT ATGGTCATCA TTCAGTGACT TAGCACAAAC CGAAGAATTA AGAGATTATC AAAT'rTTGAT AACATATTGA CAGTTTAGAA AAAAGGTTAG AATTTCTTTG TTTATACTTA TAAGTAATTT GGTAGTATGA AAGAAACAGA GGGATTCGTC CGACTATTGA ACAATGAACA TGGCTAAAAG GGGGAACCTG TGGAATGCGT GCTTCCCATG AGTTGTTTAA TGGTGTTATG GTAGTGAAAC 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 TATGGATATG TCTCCAGATA AGGAGCTGTC TATCTTGCAG TGGGATTTAT GGAAGAGATG
TTCCTCATGC
CTGTACTAC
TTCAGGAAGC
TATTTGGGGA TTTAATGGGA CAGAACGCCC TTCACATGCT CAAAAAGGGA TTCCAGCCTT TAGTGACACA GATATTCCAG AAGATGTCAA 650 AGAAAAACTT TTACGCTATG CGCGTGCAGC TCTTGCAACT TTACCTATCA ATGGGTAGTG TTTCGATGGG GATTGGTGGT CT'rCCAAGAA TACTTAGGAA TGCGAAATGA ATCGGTAGAT TATGGACCGT GGTATTTACG ACCCTGAAGA GTTCGAACGT AAACGTAAAA GAAGGATTCG ACCATAACCG TGAAGACCTT AGATAGACAA TGGGAATTTG TTANTAAGAT GTTCATGATT 'rAACCCAAGA CTTGCTGAAC TTGG'N'TTGA GGAAGAAGCG AGCTGGTTTC CAAGGTCAAC GTCAGTGGAC AGACCATTTT AACTrTCCTC AATACTCAGT TTGACTGGAA TGGTATT-CGA AGAGAATGAT TCACTAAATG GTGTGTCTAT GCTCTTTAAT ACAAATCTTT GCTGA'rGTGC GTACT'rATTG GAGCCCAGAG ACATACTTTA GAGGGTCGTG CTGCAGCTGG CTTCTTACAT GGCTTGATGA GAGACACTGC TCTA'rTGTAA ATCCGGATTT ATGACGGAGT TCACGCGCCG GCGCTCAAA'r GGGTGAAAGA GTTTTAAGCC GTGAAGAAAA GGACGTGACT TAATGGTTGG GT'rGGTCACC ATGCTTTAGT CCAAATGG4GG ACTTTATGGA AAACCATTTG TATTTGCGAC TATCTATTAA CAAATACTCC GCTGTTAAAC GTGTAACGGG CTAATCAACT CTGGTTCTTG TACATTGGAT GGTACAGGTC AAGCTACTCG ACATGGCAAA CCTATTATGA AACCATTC'rG GGAGTTGGAA GAAAGTGAAG TGCAGGCTAT CCGCGAATAC TTCCGTGGAG GAGGATTC'rC AGTAACAATG GTACGTC'rCA ATCTTCTAAA AGGTTACACA CTTGAACTTC CTGAAGATGT AGGA'rGGCCA ACTACTTGGT T'rGCTCCACG CTATGACGTC ATGAATAATT GGGGAGCTAA AGCAGACTTG ATTACCTTGG CTTCTATGTT TGAGGAAGAT ATCTTTAGAC CTAAAAATTG AGCAGACTAT CGTGCATGTC AGTTGTTGGG GGAGGTGAAC TTACGTCCCT CCTATCCTT'r ATTGAAAACG AA'rACAAAAA GTAATATAAT AGGAAAATTA TATGGCTATA TTTTATGTTC GCTTGAAAAT ACAGACTTCC CACCAGCAAA AACTCGTTTC TTGACGAAGG GGGATATGCC AGGGGTTGGT CCAGTGCTAC AAATTGCAGA TCACCATACT TTAGATAATC GTACAGATCC TTTGACAGGA AAAGGTGCTT TCAAGTCTGT TCACGGAGCC ATAACATATG GACACATTOG GAGAATTCCT GTCAATATGC ATAATGTACC GTCCTTATTT GGAACAGAAG ATCTAGAATC GCCACTACAT AAATAAAACT TGTTTATATA TAAAAAGATT 'rGTTAAACAA 'rTCACAAArA, GATGTTAAAT AGATAGC6CG GAGG;CGCAG CGGCAGTCAA CCTTATTGGA AAAGGTcGTTG 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 i~4 040 14100 14160 14220 14231 TAAATGAAGT GGGTCCTTAT ATCAACGAAC TTGGCTATAA AAAGGCACTT 
ATAAGTACAT CGAAGGCAGT GATATTTTAC CTAAGACTTT AAAACCACTG GAATCGAATA T
TTGGTGACAG
GATACAGAAG

INFORMATION FOR SEQ ID NO: 82:
SEQUENCE CHARACTERISTICS:
LENGTH: 16995 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:
AG.iTLCTTA ACTTTTTTAG GATGGCATTC
GACGTTCTAA
CTCTCCCTCT
TTCTGTCCTC TCTTCAGGTC TGTTTTCTCA ACAATAGTAT
*00 0.* TAAGAAGTAT CGTACGACTr GGGAGACCGT CTTCATGTCA GAC'NTrATTA CTCATTTCTT TTCCTTTTTT GACGAACTCT ATTCCGTAAC TAGAATTGCT TATCCCAAAT TTATTTGAAA GTTCATAGAT CTGAACTTTA TCATCATAAG TAGATTTTT CTGTCTAACT TTTGGGGTGT ATAATTTTTA AGCCTrTTTG CCCAGCCTCG GACAAAAcTr TAGTTTrAAA GGTTTTTAAC TCCGCTCTC!A GGTACTCATT TrCTGCTgAA TCGTTTTTGG CTTACGTCCC ATTTTAGGTA ACCCG7T-r CCTGTATTGT GCTAGCCAGT ATTCAAGAGA AACTCTATCT TAGTCCAGC GTTTrAAATC AGGAGAATAG TAACGATTTT GATCAATCAA TTTAATCATG TACCT1AATAT GCT'rCTCTAA GCTATATCCT TGTTTTCTAA TTAGTTTCAT AATAAAAACA CCCCAAAAGT AGTTCATGTA CACCTGATAT CATGCGTTTT TCAAAAGTAA TGTTTTGACA CAAAATCTGT TTTGTATATA CTAGT'rTTAA GAAAAGGAGG AAGTCAGGGT TCAAAAACTA GGGACATCGC CATTTATTGC TTGGGGAGTA TTGACTGCCCC AACAGTTAGC TACTGTTGTT GGTCCTATGT ACACAGGTGG ATATATGATC CATGGCcAAC TTGGTGCA' CACAGGTTCT AGTGTTCCTA TGGGAGGATG GACTATCAAG AAATTTGATG TTGAAATGTT AGTTAATAAC TTCTCAGCTG
ATGATCTAAT
TTTCAAATAT
TCTTTATCGC
TAACGTATT
GTGGTGCCGT
TGTTTATCGG
AGAAGTTCCA
GGAAGAAAAA
GGTTATGCCC
TGATGGCTAT
ATTGCCAATC
TGTAGGAGCT
AGCTATGGTA
GGAAAAAATT
GTATCATTGA
AATATTGGAG
CTGCCAAATG
CTGATTGGTT
ATTGCTACTG
ATGGGCCCAC
CGTCCCGGAT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 GTCTCGTTGG TTTTGCATTA TTGCTTTTGG CTTTCTACGC AATCGGTCCA GTCGTATCGA CTCTTACTGG AGCTGTTGGG AATGGTGTTG AGGCTATTGT CAATGCTCGC CTCCTCCTA rGGCTAATAT TATCATCGAA CCGGCTAAAG TCCTT'N'CCT CAATAATGCC CTCAATCATG CCATTTTTAC TCCTCTGGGA GTAGAACAGG TAGCTCAAGC TGGTAAGTCA ATTCTCTTCC TATTGGAAGC TAATCCTGGA CCAGGTCTGG GAATTCTATT AGCTTATGCT GTATTCGGTA AAGGTTCTGC TAAATCTTCT TCTTGGGGGG CAATGGTTAT TCAT'rTCTTC GGAGGGATTC ATGAAATTTA CT'rTCCTTAT GTTATGATGA AGCCTACTCT A'ITTTTAGCT GCTATGGCAG
GAGGTATCTC
CACCAGGTTC
ATGTTC'I-r'r
TTCATGCAGA
CTAAGGC'rCA
ACTCAGTGGA
CTAGTATTCT
CAATCTCAA-A
CAAGAGCTAA
CCTCTCGTTA
AAGGAGATAT
652 TGGAACTTTT ACTTTTCAAC TCTTAGACC TATTATTGCG ATTATAGCTA CCGCGCCAAA AGGTG'N'TIA GTGGCAGCAG TTGTTTCT CAAGTCAACT GAGGATTCGC TCGAAGCTGC GTCTAAAGG'r CAGTTAGTAT CAACDCTGT AAAAATCAT'r TTCGCCTGCG ATGCTGGTAT TCGACATAAG GTTAAAAAAG CAGGTCTAGA TTrTGCTTGAT ACACCAAAAA CATTAATTGT AGACAAGAGT CCAAGTGCTA TTCATGTTTC TGATGAAATT GTAGCTTCAT TAACAGGAGC ACCAACTTCA GCACCAGTAG ATAGTCAGGA 'rGG'rCTrAAA TCTCCAGCr AGGTGTTTGG CCCCATCTAA CCTTGTAGCA GCCCTTATTC TCAGGCGGCT ACCCAAGCAG TGATGCAG'rT G~rrCGACAG CGGAAGCTCT GCTATGGGAG GATTCCAGTA TCTAATCAGG TACTCAGGAA GAACTGACAC TGTTGATAAT TTCTTAGCGT TTCTCCAATA GCAGAAArrG AAGTGACC= AACCATATTG AACTATGGGC TGTGAAACGA TTCTACTGCC AAAATTTCAG AACTATTTCT TTACAGGCAG ATGCTGTAGT AGTTGCTTAT GGTAAAGCAC TTCGGGCTAT 7MTAGAAAC AAGAATATTC AATTAGGTGA ATT'rAAT'rCT AAAAACATAA AAGTGCAGCA AGCAGCACCG AATTCTCAAT CAGAATATGA CAAAATGGCT GCTAGAATGT AATGCTAT'rA ACCAAACGAG AAGAACAATT TTCAATGCAA GATATGACTG AAATCTTACA ATCAGATTTG ACAGATAGCA TGGAGCAATA CTATAT'rTTG ACTGGAGAGT TGGATGATTT TAGTCCCCAA GAAAGACAAG AGTTGATTAC CACCAATGAA GCATTGCAAG AGTGCACGAA TTCAGATATT GATAAGCGTC TTTTAGACTr TCGGATTTCT GGTGATTCAG TTGGTAAGAG TATCTCAGTA GCAGATTTTT CAACCGGTAA TAGAAC'rGGG CTGGCCAGTC AGATTGTTAA TGCTAGGATG AAGATGTTTT T3YGCGATCTT TGAAAATTCA CCTAATACTA GTAAGCAGGC TTACTCTAAG CAGACTGCAC AATTTTATAG CTTCGATGAA TTAATCATTA AACGTCAGGA
AGGGAACTGC
GTATTCCAGT
TGATTGTAAC
TTCTTATTGT GGATAGTTTA GTAACAACAC ACAAATAGAA CTAGAGGTTT CTAAATTACG ATTGAAGGCT TTCCTACATG -TAGGGfiAGCT GGTTTCATCT AGAACAA'IrT ATCGAACTrT TGGAATCGAA ATAACGAAGC ATGGGAAATA GCCGACAGAA CTTGAAGTGT TAGTTGAGTA 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2500 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 CTATCGCCTT CTGACTGAGA AGTCAGTAAT GTAACTATTA TGATC~tGAAA ATTGAACGAC AAGATTTTTG GCTATTTTAC TTTTGGGAGC TTTGATATTT TAAGCAACTG TCAGGTTTTC GTTATCTCTT ATAGGTCAGG TrTGGAAA'rT TCTCAAAAAA TATTCAGGAA ATTATCTATT CAATCCGCTC TTTACGGAGA
GTGGTTTTGT
TTCAGGATAT
AAAAAGGTTA
TGACAAACTG
TAGAAGCAGA
CAGATATGGA
AGCAAAACAT
TTTTrTCAAGC
TTGCGAGCAT
AATTTGATGG
653 TGAATr'rTC TACAATATr'r CAAATCTGAT TGATACGG'rr TCCATGTATA CCAAGATTGA 3360 Cr'rTTTTAAG GACAAGGTTT TATTCAArrT TCtTT'rCCAT CATATTCGGC TCACTTTACG 3420 CCTCCCTATC CTTTTTCAGG GTGAAAATTT GCCAGAATCT ATCCAGATTT TAGTTGAAAG 3480 GAATAAATTT CTrTTATACAG TCATCAGTC'r TTTAGTGAAT GATATTTTTC CGAAATATCT 3540 TCATACAGAG TATGAGTATG GCATGATTGC CCTACATTTT ATCTCTAGCT TAGGCCGTAG 3600 TCCAGAGA'rr TA'rCCAGTCC GTGTTTTGCT TTTAACGCAT GAACGTCGGG TCACTAGAGA 3660 TTATTAGTC AGTAAAA'rTA AGAGTGTTGC TCCTrTGTA GAGTTGATAG ATATTCAATC 3720 TCTAGTAGAT TACCACAGTA TTGATCTCAG TCAGTATGAT TATATTTTAT CTACCAAGCC 3780 GCTGACTAAT CAGGAAATCG ATGTAATTTC TAGTTT'rCCA ACCCTCAAAG AATTGCTTGA 3840 ATTACAGGAA COACTTCAGT ATGTACAGcC ACATCGTACA ATTGTCGCGC GTGATGCTAT 3900 CGCTCCAGAG AAAAGTTATG ACTTGCAAGA TTATT'rAATA TCTAGTAGTC AGCTTTTGAG 3960 T *'CAATTCGAG T'rGGTTCAA'r TGGAGAATAA TCAATCATTT GAGCACACGG TAGAACAPLAT 4020 *CA'rCCAATAT CAGAAGAATG TGAGTGACAG AGCTTACCTA ACAAGAAAAT TGTTATCTCA 4080 *CTTCCAGAAT AGTCCTATGG CTrATTCCTAA TAC'rGGTCTG GTGCTTTTAC ATAGTCAGTC 4140 *TAGCAAAGTA ACAACAAATA GTTTTACTArr GTTTGAACTC AAACTACCTA TCTCCGCATT 4200 *GTCAATGAAA CGJAGAGGAALG AAGAGGTCAA AAGGTGTCTG CTAATGCTAA TGTCTAAAGA 4260 AGCTAGCGAG GAAGCTAGAG ATTTAATGAC AGCTATTAGT CAGTCGATTA TTGAAAATCA 4320 TCT'rTATACA GAGATTTACA AGACGGGAAA TCAATCCATr ATTTATCAGA TGCTAAATAC 4380 .TATTTTTAAC GAAAAAATTA AGAAATTGGA GAACTAATAT GAAACTTGAA AAACATTTGA 4440 IrAAGCTTAA TAAACAATTT TCTAACAAGG AGGAAGCTAT TTGTTATTGT GGGCAAGTTC 4500 ***TT'rATGAGGG TGGATATGTr AATGAAGACT A'rATTGAAGC CATGATTGAG CGAGATAAAG 4560 *AGCTATCTGT TTACA'rGGGT AACTTTATCG CCATACCGCA TGGAACAGAT GCAGCAAAAA 4620 ATGATGTCCT CAAGTCTGGT ATTACAGTCG TT'CAAGTCCC 'rAGAGGGGTT GATTTTGGGA 4680 ATGTATCTAA CCCTCAAGTG GCAACGGTTC TTTTTGGTAT TGCTGGTATT GGTAATGAAC 4740 :.*ACTTAGAAAT TATTCAGAA.A ATTTCTATCT TCTGTGCAGA TGTAGATAAT GTTCTTAAAC 4800 TAGCAGATGC TCAGTCAAAA GAGGAAGTAT TGCGCTTAT'r TGATGCTGTT GAATAAT'rGA 4860 ATTTAGTCAT TTGTCATCTA GTATA'rATGT CCCTCA.AATA GGAAAAGGAG AAATTGAATG 4920 AAACATTCTG TTCATTGG TGCCGGTAATI 
ATCGGTCGTG GT'rTTATAGG TGAAATTCTA 4980 TTTAAAAATG GTTTCCATAT TGATTTTGTG GATGTCAATA ATCAGATAAT TCATGCTCTG 5040 654 AATGAAAAGG GCAAGTATGA AA'IrGAAATT~ GCACAGAAAG GACAGTCTCG TATAGAAGTA 5100 ACTAATGTGG CTGGCATTAA TAGCAAAGAA CATCCrGAGC AAGACGGATA ?1'ATTACTAC TGCAATCGGA CCTAATATAC AAGTCATTGA AGCGATTCAA CTAGCCAAAG GAATCGAAGC GCCTGTGAAA ATATGATTGG AGTCCCGAAG GTTTGACATT AGGATTGTTC CAGCACAAAG GAATGGGTCG TCGAAACCAA TATGAAGAAG ATTTAGAACC GCAACTTCAG CT'rACATTGG AATCCTAATA TTAAATCTCG GCCAAATGGA ACTTTGATAA CTTGAAAACC CTTTCATAGT TTAGGCTATA ATGAACGATT TATAAAAACC TACTTAAAAC GAAAGTAT'rC GATTAGGTGA TCGCCGAG'r GCAGGAAATA
CGGTCTCAA
TGCTGATAAC
TCACGAAGAT
GCGTCTTAAA
CT'rTATTGAG
TCCCATTAT
GATTGAATCT
AAAAGAATTG
TTTCTTTATC
TACATAGGTT
TCCCTTTT'rG
TCCCTTTTAT
CACAGGCATT
AAGAAGTCAA
TTCCAAATGC
TTGTGGTCGA
CGCCGAACTTI
GGATGTTATG
GAAATATTTA
TGCAGTAGAC
GCCCTTTAAT
AGATGTGCAT
TTCTGGACAT
AGCTCTTCAA
TCTCTTGATT
TATAGAACGA
AATCCAGA'rT TACGTCTAAA CGAAAACTTT TTTCAGTCAA GGTGCCAAGA CAATTTTGGA G3TATTAGCTG AAATTCGGAG GAGAATTATC ACAAAGTCAT 5160 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 000.0k 0 00 0 000 0* 0 4 0 0* 00 S S 0t 0.
GGACGAGGTT AGTCGCGTAG CTCGTACTCC AATCCGAAAA CATCCGGCCG ATACGTGAAT TGAAAGAACT CAGTTTGTCA AGTTGGCTAT GTCTTTGACT ATCGCGATGT AAATGATGAA ATTGTTGGCT AAACAATCAG TCAAAGATGT TGTTATACAA GT'rAcAGGrT
TCGAAAATCT
ACAACCTCAA
TGTTTGAGTT
GGTAAAAGAA
TAGACGACCA AGAATTGATTf GAGCAAATTG TAGAGTATAT TTAATCTTTT CTTCAAATCA GGT'rAGCATC GCTTTGTCTT AGGCATATGT TGTTCTATCT AGCAGTGCTT TGAGCTGACT CCGTCAGTCT TATCTGCAAT CTCA-AAACAC ATCTGCGGTA ATCTTTCTAG CTTGTCTTTG ATTTTTGTTG TTATTTATAA GCTGGACAAA AAGTCTT'CAA AATCGGGAAA AGGCAGCCTA TCGGGTGTTC
0 S. OS 0 0 4 AAAAATCTTG ATAGGATGTC CTTTATTATG GAGTTTTTGA TCAGCTTTAT GAGATAGGTC TTATGGACAG TGGGAAAATT GTTGAAAAAA GAGAAGAACG AACCAAGCAA TTTTGGAACG ATGAAACAAG AACAGGACAA ATCGATCAGG GAAGTAGAGG TGTACTATTC TAGT'N'CAAT AGTTTTGGAA AATGACTAAC CAAAAGATAT TGAGTGTTTT AGTTAGGAAA AAGGCTTGTr TATTTATAGA AAATGTTATA ATAGACTGTA GAAAGCCTTA TTGGATTTTC TTGCTAGAGA TGTAGCCCAT ATAATGCCCA TCAATTCTTT AATTCTTTCG AATGCGATCT
TCCTCAGATT
CATGTTArr
AGTCGTCCAA
ATATAGTAAA
ACAGTCAAAT
CTACTATATA
CCAAAGTAGT
GTCTATAATT
?1"rAAAAAAT CGATTTCTAA AAATGTTTTA ACTGAAAAAT TAGATAAATT CTAAAATTGT CTATACTTTA GTCTGCAT'rA GTCTAGATTT TNAAGGAGA AATGACAGAA 6360 6420 6480 6540 6600 6660 6720 6780 6840 655 TGTCTGTATC ATTTGAAAAC AAAGAAACAA ACCGTGGTGT CTTGACTr'rC ACTATCTCTC 6900 AAGACCAAAT CAAACCAGAA TTGGACCGTG TTCCAGGTT CCGTAAAGGT CACCTTCCAC AAGCTCTTTA TCAAGATGCA ATGAACGCAC AAGAAGCTGG TCTTGAAGTG GTTGCCCAAC TCTTCAAGTC AGTGAAGAAA GCCCTATCTr CGACCAAAAA 'rTTGCCAAA CGCTTATGAA CAAAAATTGA CG'TAACTTCA TTACAAAACC TGAAGTAAAA
TCTCTTAATG
TTTGGTGAAG
GCAGCTGTAA
ATGGAAAAAG
TTGGGTGACT
GTCAAGACTG GGTTATCACT ACAAAAACCT TGAAGTATCA AGCGTATCGA ACGCGAACCC AAAACGGCGA CACTGTTGTG GTGGAAAAGG TGAAAACTTC AAGACCAATT GGTAGGTCAC AAGACTACCA AGCAGAAGAC AAGTAA.AAGC TAAAGAAGTT AAGTTGAAAC ACTTGCTGAC
GCTGAAGTCG
GTTGATGTAG
AACAACCTGG
ATCGACTTCG
TCACTTGGAC
TCAGCTGCG
CTTGCAGGTA
AAAAAGAAGT AACTGACGCT GATGTCGAAG CTGAATTGGT TATCAAGGAA GCTGCTGCTG TTGGTTCTAT CGACGGTGTr GAATTTGACG TTGGTTCAGG TCAATTCATC CCTGGTTTCG AAACCGT'rGA TGTTATCCITA ACATTCCCAG AAGAAGCTAA AT'rCGTGACA ACTATCCACG ACGATGAACT TGCAAAAGAC ATTGATGAAG AATACAGCAA AGAATTGGCT GCTGCTAAAG CAGCAATTGA TACAGCTGTA GAAAATGCTG ALTGAAGAAGT TCACCGT'rCA, GTAAATGAAT PCCCTGACAT GTACTTCCAA ATCACTGGAA AAGCAGAAGC TGAGTCACGT ACTAAGACTA
CCGGCTCTTG
AAGAAGC'rTA AAA'rCGTAGA
TCCTTGGGAA
CTACTCAAGA
CAAAGATGCA GTTGAAGGTG ACTrCCAGAA .GAPATGATCC TTTGCAACGT CAAGGGA'rCA AGACCTTCAC AACCAATACC 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 ACCT'rGTTAT CGAAGCAGTT GCCAAAGCTG AAGGAT'rrGA TGCTTCAGAA GAAGAAATCC AAAAAGAAGT TGAGCAATTG GCAGCAGACT ACAACATbGA AGTTGCACAA GTTCAAAACT TGCTTTCAGC TGACATGTTG AAACATGATA 'rCACTATCAA AAAAGCTGTT GAATTGATCA CAAGCACAGC AACAGTAAA.A TAA'rCTTAA'r AAACAGAAAA CCCACCTGAA TTGGTGGGTT TT'CTGATGCA CTATTTCCA AAAATCTCTT TGAGGTCTGT GTCTGTAATC CCAATCATGG CTGGGATGCG GTCCCAGTTT TCTTCGGT'rA GGATGTAGGA TTGTTCAGAG GCACTTGATG TGACTGTTTC AGAGACAGC'r TGTTGCTTTT CTTCAACAT'r CTCCAGTAGA TCACTGAAGC GTT-CAATCAG ATAGGT'r'1r' CGGGCAGTTC CGATGTGTTG GGTAGCATAG TCGAAGGCTT GTAATTCGCC TAGTAAGATG AGT'rTGCTTT TGGCACGTGT AATGGCTGTG TAGATGAGAT TTCGCTCCAG CATACGTCGG CTAGCACTAG TAATCGGTAG GATGACAACT CGGAACTCAC TTCCCTGAGA CTTATGAATA CTCATGGCAT AGGCCA-AGCG AATCTTGTAC CATTCGTTAC 656 GGGGCTAAGA GACTTCAT'rA CCATCAAAAT CAATGACAAT CTCGTCTTGT TTCGATCCGG TGTATTACC AGGAATCAGG TCTGTGATAG CTCCTAAA'rC CCCATTAAAG ACA'rTGATTT CAGCATCGTT AACCAAATGA ATGACCCTGT CTCTCTTACG ATAGTGACAC TGAGGAGCT'r CAAAACTGAG TTGATCTTT TGTGGGCGAT TGAGCAGG'rC TTGCATGAGC TGATTGATAG CATCAA'rCCC TGCCGTCCCT CGGTACATAG GAGCCACAAC TTGGATATCA CGGGCGGGAA TACCATTTCT GAGGGCGGCA CCTAAGATTT TT'rCAATGG'r GGCAG-GA.ATA CAATTTCAAA GTAGGAACGG TCAGCTTTTT TTTGGGTGAA ATCAGCTGGC
TGGCCACTAG
AAGATGCCCT
GTCGAATCTG ACTAGCTAGG GTGACGATGG TTGATTCTTT GCTTTGTCGA TAAATTTTT CCAAGCGAGT CTGAGGAATC AAAGGAATAT GAAGTAGATC CGCTAGAACC TGTCCAGGAC TGACAGAACG TAGCTGATCA CTGTCACCTA CGATGAGGAT CTTACTGTTA GAAGAGATAT TCGAGAAGAG TTGATTGGCC AGCCAAGTAT CTACCATAGA GAATTCATCC ACGATGATAA AGTCAGCATC TAGGTAATCT TCCAGATGAC TGGTATCATC GTCACCTGTC A'rTCCCAAGT GGCGATGTAT GGTCGCGCTA GGCAAACCTG TCAAT'rCAT'r CATGCGACGA GCAGCTCGAC CAGTTGGAGC AGCAAGAAGA ATGGGCAGAT TGCTTTTCTT CCTGAAGTCA AGTCCTTCTA AAAGGGCATA AACAGCAATG ATTCCATTGA TAACAGTTGT CTfTACCAGTA CCAGGCCCAC C'rGTCAGGAT AAAGACCTTA TTCTGGATAG CATCACAGAT AGCCTGT'=N TGAATGTTAT .0.
8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380
CATACTCAAT
CATGACTCT'r
CAGCGAAAAA
CGATCAGGTA
ACTCAAGGAG
TTTCCATACA
GACTT-rCGAT
TATCCTCAAC
AGTCTTGAAT
T.CATCI'CCGT
AGAGTCCTGC
CGCCATAGGT
TTGAAAAGTA
GCAGTTGT'rC
TCCCAGTT~CT
CTGrI-rCCT TGCTCGACAG TAGTGATATG TrTTTTGAATG GTTTCTAAAT TTTTCAAGGA TACGAACCAA GTGACTGCGG ATGCCTTCCT GAGGCTGTTG TCAAAGATCT TGGTATCAAT CTGCTGAACC TTGTCTTCTT GGAGAGCTCT TGGCAACTT GGCTGGGGTC TAGTTCCACG GGACGGGAAG AGTAAGGGTT TGTVCCAGCA AATCCCGTGC TTCAACATAG GTGTCCCCTG GGCCTGAAAA AGACTGTGAA CTAGACCGGC GCGGAAGCGT TCAGGAGCCT CCCTAGTITCC TCAGCTAGTT GGTCAGCAAT GGTAAAGCCC AAACCCTTGA CAGCTGGTAG GGATAATTTT CAACCACATC AAGGGTTTCT TCCTTGTAA.A CTGAAAGGCT AGTTTTGG GAATGCCGTA GTTGGCTAGT TTCGCCAAAA
TCCGTAGTTG
GATGCCTTCT
ATCCACGATT
CTTGACCAAG
'rCCATACTTG AGACGGAGAG 'rGGAGACGAA AGCCTCGCGA AACTTTTCTG GGTGTTGCAA AATTTCGTCA TT'CTGAGCTG TCTTGAGACC AATCCCCTTG CCCTTACTAG TTGGT'rTTGC GCGATCATAA GAGTGCTGGA CAATT'rGCCC CCAAAAAGTA TT1TTTGGCAG ATGGTA'rrTT
AAATGGCTAC
CGACTGATTT
TAGTCTTCGC
CCTCAAI'TAC ATCAGCCATG CGTCCGTATC GTCGATrC'r TAATCCGTTC AATAGTTCCT GAGAG -rCGG ATTGTmTA CCATCTGCAA CCTCAAAACA AAACACTGTr TTAAGCAGCC TATTTGTAAA TAAACAATCA TGCCTCTTAG GTTTCTTAAA TTCCCC'rGTG ATATCTTVMr CCGGTCATCT CCGAGGAGAA CTAGTTGACA TCAACTGTGA TTTATTTCCT TCAAAGCCCT TT1'GATATAG TCTGC'rAGAT ATTTTCG'rAA CGAATGGTGT GCCATCTTCC TCATGGGCCA GAAGAGAATT TCGCCATCCG 657 GTTCCTGTGA CAATGA'N-rC AAAATCATCA AAATCCTCTG AGGAGGAGGA TGCGATAAAA ATTGCTGGGA TT'ITCAAAAA
GAAAAATAAA
TTTATACTC
GTATTTTGAG
CTTCCATAAA ATTCCTTTGC ATGAATAGGT TTCGAAAATA TCrCAAACC ACGTCAGCTr CTGACr'rCGT CAGTTCTATC CACAACCTCA TACGGCTAGC TTCCTAG~T CTTCTCACGA 'rAGAAGAAGA ATGTTCCGAT ACGGGTGA'rr CTTTGAAGGT ACCTACGTGG GGTATTCTCC TTCTGGAACA AGGCTTGAGC TI-rTTGAGCG TGCCTGAG'rA AG'rGCTr-TGG AAGGCTCGTC CGTTTCTTI'G CGCCAGGCAT TCCAATCACG a.
a GTTCTTTGAT TTTCATTGAG GGCTGAGATT GG'rGA'rTCTC GGCCATAAGC GGAATTTAGC CGGCTGTCGC TCGAAACCAA GTAAAGCTAA AGTTGGTGTT ATACTTCTAA AGAAAGTTCC AGTTTGTCAT CCTTGAAGCG TCATTGATGT AGAGTTTA'rC CGCTTGACGA TGTCCTTATT ATAGGAAGGT GTTTrACAAC TGTCCTTCTA CGCGAACATT ATTAGGAGGA ACAATCCCCA TCTAAGCGTT TTTTCGC'T'T CCCTGCATAC CATAGGCTTG CCACTT-GGGA GCTGATAACC TCACGCGCAA TGACAGAAGA AAGCTGATAG GATTGCTGAA GCACTGGTAA AGGCATCAAT
CCACGATATC
CTAGGGTCGG
GCTCCAAAAA AAGATACGAC TTAAAGCTAG CTCTTTTAAG AAATT'TrAA ATGAATTCAT TTCAGTGTTT TTAAAGTGCA ATTTGGCGCA CAAAATCTGG CTAGCCACCT TGTCAGAAGC CAGTTCTCGT CCCAAATTTT CAAGATTTTC AACTGCGACA GACAAGTATT TGCCCTCAGC
AAAACGGTCA
ATCCATG4GAA
TAATGACAGA
AACTTACCTT
GAAGCTGAGT
CGTTCCAGCT
CAGAAAGAGA
CTTTTCTTCT
10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 a
ACGATTGGCC
CACAATTTTC
TCTTGTGCCA AGTACTTGTC ATAATTTTTA TCAGGCTGAA CACCTTTTTG AAGGAGGAGA TAGATAGCCT a aaa* a. a.
GGCAACCTTA ACCGAAACAG CGTTGTAGCG GTCTCCGATG ACCTCGTTGT TGAGAGAAGG AGTGCCTGGT GCTGAATTTT TTCCTTGAGA ATAGGAGTAA CTTTTGGTCG GTCAGAGTCT TAGAATCCCC CACACCGAGT TTTCGTAAAA GTCAGGTGTG ACAAAGGCAG CCACAACTGC AAGCCCACCA AAGTAGGAAC CTCATCTGTC CCAATTAAAG GA.AGATTTTG TCCGCTGGTT TGCTCTACAG
GATTATGGAG
ACTTGCTGGG
TCTGACGGAT
AGTCGTGCTG
CATTTCCCAC
CTTGATAGCC
658 AAAGAAACTG GCGTATTTTT CAGCCCCTTC ACCCTGAAGC AAGATTTTT'C CAGAAGTATA GATAGAAACC GTTGCTTGAG GTAGTTCAA AAAGTAGCGG ATATAGGCAT TCTTGCTAGG AGCCAGACTG GTTTGATAGTr TGTGATACTT GCCATAGT AAAATTAGCG AAT'TTTGGTA TCAAATTTAC ATTCGGGAAA TGGAGGAAAT CGCTAAGGTT GCGCAGATGA TGAAACAATC GCCGTGAGAT TGAATTTGAC TCACTTGTAA GCAAGAACAG ATTGGTCTTG GTTTGGGCAq' GTTCAAGAAA AGCCTGAATA TCCTTTTCGC TTGGTGTGAG CTATTGTACC ACAAAAGCAG TAATA'rCqTG AGGTGAATTT AAATCGTTAA CCTTGACAAG GCGACAGAAA AATACCAAGC GCTCTTTTGT TGGCAGTCAA GATAAGGCC A.AGAGCTAGA AGCAAGATTG AGGATTCCTT TTTATATCGG CTATCGGAGA CATCGGCTTT TATGGCTGGC TCCCTTATGC AAATTCGCAG TAAAATTTGT AAAAACTGAC
TATGGCAAAT
CGAACATGAC
AATTAAAGAA
CTGTTTATCA
CTAAATCGAT
AACCTTT'rTA
CAAATGCCTA
ACTCAGCTCA
AGAACTCCGT CACAAGCTTG ATGA'rTTCAT TCCTTCTTCrr GGCCTGCTCT TACAGGTTTA CAGTTTTATA AGGGGCTTGG GAAGGTCAGG GGACTTTCT'r 00.
TACCTGATT
AGAGCAATTC
TCAGCCATGG
CATTTA'IrGC TTTCCCATCG GA'rCAACTCT TTCAGCTGGA GCTTGTATTT GGGATTGCT' ATAGCATTGG TCCTAGCAAA AAACTGGGTG GTAAGT'rGTT GGTGACCT'rA TTTGTCTTGC AAATGGCCTT TATACAA.AAT CCTCTTGAAA AGAGTATCGT TAAGGTCTTT TATGCAGGTA TCGGCTACrT TCGT'rTACTT GGTCTTCTCT TACACT~TGAT CCAAGTTTCA GCAGGTATCT TGTCCATGTT GACAATCTTG GCGACCATCC CCATGGCAGT CGCAAAACAC ATCATCCAGA GCATACCGGT GACAAATTTA ATCGGATAAA AAGGGCAGGA CTCGAATCTA TCAGAATGTA AAAAGCTACC AGATGAATAA GAAAATATTA GAAACATTAG 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 00 00 00 0 *000 @000 0000 00 00 0 0 AACAACCAGT TGGCTCAAAC GTTTTCCTAG CCCTT'T'T ACACCTAGAC ATTICAAAGAC AGTTCGATAA GGTCAAGGCC AATTGAGACA ACTGGCTCCG 'rGAAGGAAAT GCAGGCTCTT AAATTGCAGG AGTCTGCAAG TACTCTTGAA ACGCGTGCT'r AAAATGTCAG CTTGGAAGAA TACAAGGAAA TCTTCAGGCC AATTGGCGCG AATCCGTCGA
AAATCTGGGT
ACAGATTTGA
AAGGAAATAA
TTGTTTGAGC
ACTGCCAAAG
CTCATTTGTT GACCGAGCAG CAGATAAAAT CAAACAGGCT
GGCTTGGAGC
TTTGCTGAGA
TTCGTCGAGC AACCCCATTT AGGTTGGAGA TGGGAGCGGA CT'rGCCAGCC GAGAACTTCA TTrAGCCCTTT GGTTTGAGAA TTTAATGA'rG CGGGTTTCAT AAAATACATG ATAGCGAGAG TACTAT'rCTC TCAACTAAGG TCTCAATATC GAGGAGTTCC AAATTTITTAC ACCAATCTGG ATTACATGAT TTTCCGCAAT TGAAAATTTT GCCAGTGAAG TCAGGTACGC GATGTTTTAC AACACTTGCT CAAGCAAAAA GCGCAGCTGT 'rGACGGAAGG AATTGTTGCT AGCAGAAATG 659 GCCG'rCAGGT TTTACCAGTC AAAAACACCT ACCGCAATAA GATGCAGGT GTCGTTCATG ATATTTCTGC TAGTGGAAAC ACCGTCTATA TCGAACCCCG TGAGGTAGTC AAACTGAGCG AACAAAT-TGC TAGTCTGCGA GCAGATGACC GCTATGAAAT CTCGCATT CTCCAAGAAA TTTCTGAGCG TGTCCCCCT CATGCGGCTG AGATTGCTAA TGACGCTrGG ATTATCGGTC ATCTGGACT'r GATrCGTGCC AAGCTTCGAT TTATCCAAGA AAGACAAGCA GTCGTGCCTC AGCTGTCAGA AAATCAAGAG CCGTCGCAAA TGATGTCTAT ATACAGGTGG GAAGACCATC CAGGATTGCC GATTTTAGCA CTGATATTGG AGATGAGCAG CCAATATCGT GGATATTCTT TGGGGGCTGG TACTGATCCC T'rCGCCTGCG TCAAATCAAG GTATTGAGAC AGCCTTTGTG CGACCTATCG CT=ATGCAG AT'rCAACTGC TCCATGTCTG CCATCCTTTG TTTGGTCAAG ATTTAACAGC TAT'rGTCA'r ATCCTCAAAA CTCTGGGCT GACACAGGTC GACAAGGGAA GTCGTGTrGG TATTTTrGA.A TCTATTGAGC AGAGCTTGTC TACCTTCTCT
GTCAAAAATG
ACAGGTCCCA
ATGGCCCAGT
GAPLATCTTTG
AGTCATATGA
0 0 4 4* 4* 4 0 0* 04 4 0 4. *0 0 GGCAAGGTCA ACCAACATTC ACTCTTACTT TTGGATGAGT CAAGAGGGAG CAGCCCTTGC CATGGCTATr CTGGAGGACC ACCATGGCGA CGACCCACTA TCCAGAACTC AAGGCCTACG CAAAATGCCA GTATGGAGTT TGATACTGCA ACTCT'rCGCC GGTGTTCCTG GCCGAAGTAA TGCCTTTGAA ATTGCCAAAC 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 .4 4* 0 0 4 4**0 0440 .4 4 0 4' GTCTAGGCCT ATCTGAAGT'r ATCGTAGGAG ATGCCAGTCA ACGTCAATCG TATCATTGAG CAATTAGAAG AGCAGACGCT ACAATATCCG TGAGGTGGAG CAAGAAAATC TCAAGATGAA ACAACGAGCT TAATCGTGAA AAGGAAACCG AGCTTAACAA AGATTGTGGA TATGGCCCTA AGTGAAAGTG ACCAGATPCT CCCAACTCAA GCCCCACGAA ATCATTGAAG CCAAGGCCAA AAAAAGTGGA CTTGTCTAAA AATAAGGTCC TTCAAAAGGC AGGTGGGAGA TGATATCGTG GTTCTCAGTT ATGGTCAGCG TCAAGGACGG TCGCTGGGAA GCCCAAGTTG GCTTGATTAA AGTTTGATCT TGTTCAAGCC CAGCAAGAAA AACCAGTCAA TGAAACGAAC TTCTGGGCGA GGACCTCAAG CTACACTGGA AAGAAGCCAT GAATGAGCTA GATACCTTCA TCGACCAAC AAGTTGATAT CATCCATGGT ATCGGAACAG GAG'rCATCCG TGCAAAGAAA CAAACATGTC AAGAGTTTCG GCTATCCCC GCAGATCGAT CAGGACAATG GGAAAGCCGC AAACGTTTGG CCGTGCGCTA AAAAAACTCT GGCGCGTGAA CAGGCTGCTG CAAAAATCTC CACAGTAkAT GTTGAAAAAA TTGGCTCCTG CAAGAAAAAA CGAGCTCCAA TGGTACCTTG ACCAGTCAAC GATGACCTTG GAAGAGAAAG GAAGAAACAG GTCAATCTTG TCTTCGAGGC AAGCGCTATG CTTGCTTAAC AATATGGCTC TGAAGGAGTT ACCAAATACT ACAAAATGCT GGAGGCAGTG 660 GTGCGACTAT TGTCACTTTT AAAGGATAGC AGTATTCTGG ACTrTATAAA GTAAAAAC'rG TTGAACTAAT TTTTACTAAT AAACACATTG ACAAAAGCCA ACATITTT'TG TAAAATTAGA ATCAATTAAA TACCAACACC GAATGAAGTT TAATAGAAGT GGGGAATCGT TTGATI'rTCC ATGACTGTAA ATGGACGGAA CTCTGGAGAG ACCGTAAAGG CACCGAAGGG CAAGGCAGGC AACTGCTCAA ACTCTCAGGT AAAAGGACAG AGCTAGGATA G2=CGCTT TAGCATTTAT CTAAGCAT'rC CAGAGTACAT GTATCTTGCA TGTGCrCTTT CFTTGGGGT TGAAACGATA GGAGAAGGAA ATGTrAGAAT TGCTTAAATC AATCGATGCT T'rTGCTTGGG GACCGCCCCT CTTGATTT'rA 'rTGGTCGGAA CAGGGATTTA CCTAACTATT CGGCTAGGAC TCTTGCAGGT TrTTCGTCTA CCCAAGGCCT ATCCAGTTTT GCAGCTCTGT AGGAGTTGCG ACGGCTATCA GGCT'rTCTTT 
GGA.ATGGCTA.
CAAGGACGAC CATGGTGCAG AGAAAAGTGG CGACCACT'rG GGGAATCGGA ACCTTCACCC GATTTCGCCA GCCATCACAG TGGACTfCAAG TCTATrTCTA TATCTTAGGA ACTCTTACAG TTTAGTCTT'r ACCTCAGCTT CGTTCGGATG GC'rATTCAA.A GGGTTCTGCT CCTATTGCAG
AGGTTGGTGG
CCAAGTATGC
TAGCGGGAGG
CTGTTTTG'TT
AAGTCAACTC
CTCTCGTCTr
AGGTTTCAAC
TTATTTTCTT
TTAGTCCCCT
ATGGTGTGGC
CTGCAGCTGC
ACCAGGAGCT
GGAAGGACTC
TCCCATGCAT
TGCAGTAGCA
GAT-rGCAGAA
GTCTGTCTTT
TACTGTrGTT
TAATATCGGA
TGCTGCGGTA
GCGTGG'rGTG
CAAGACAAAT
TTCAGCT'rA' TTTTATCCAG GTACAGCCTT GGCATCAACT GATAAGGGAC ATGGTGATGT GTTGGAACAG GAAATATCAT CTATVTTGGA TGTGGATGGC T'rGGCCATCA AATACCGCAC TATATCCTTC TAGGGATGGG GGAGTATTGG 'N'GCTCTCTT TCTATCCAAA ATACAACGAC GTAGCGATTG CAGTCTTTGG CCTTTTATGG CCATCATTTA AA.AATCCCTG GCACAATCGC GGTGGATTrG CTGGTGCTAG TTCTC.AAACG AATCTGGTCT GAACCAGTAG AGCAAGGTTT 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 16995 GATTTCCATG ACAGGAACCT TTAT'rGATAC CATCTTGGTA ACTGG CCTCATCATT TGTACTCTAA CTGGTTTGAC
INFORMATION FOR SEQ ID NO: 83:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28473 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:
CCGGGGCTTT TGTAGTATAA TAGAGATACG TTVI'GAAAGT AGGAGGTATC TATGGACTTA
ACTAAGCGCT
CAGGCTATTT
ACCCCAGACC
ACAGGGATGA
TACCAACTGG
TTAATAAACA
CGGAGATTCC
ATGTCAAGGA
GTTAGATAAA
TGGGGTCTTG
GGCGGCCAAG
AI'CAACTTT' CGTTGATTCG CGTT'rGACCr TGGGGGAACC CGAGCr.ATTG ATCAGAACCA CAGGCAGCCA GTCACTTGT ATCrTGGTTA CAATTGGGGC GAGGGAGACA AGGTACTTTT
TCAGTTTGAC
TGATTTTACA
ATCCTACTAT
TAAGGAAAAG
GACAGAGGCT
GCCAGCTCCT
GTGGTCTGCT GACTCTACGT ACTATGCTCC TGAAAATGAA CTTTGACGGC TATTTTGGAA TI'AT1CTGCGA GCTTATCCAG GCTATGAACC GATTGT'rAAC TTAGTTGGGG CAGAAATTGT TGAGATTGAT ACGACTGAAA ATGGTTTTGT CTTGACTCCT GAGATGTTGG AGAAGGCCAT TTTGGAGCAG GGTGATAAGC TCAAGGCGGT AGTCGAGAGC AGTTAGAGGC TGTGATGAGG Tr'rACTCAGA TG~TGAGAGA CCAGGCTATT GGCGTTTGGG GCTGAT'N'TC AGTACTTGGT CACTGCCGCA CTGGTAAAAA CGATGCGGAC CGAAAAAATG ACTGCTCTTG TGCTAAAATT CCAGCGGGCT TATTCTCAAC TATCCAGCCA ATCCGACAGG CTTGGCAGCT GT'TrACGCA AGTACGAAAT ATTGACCTAC ACAGGCGAAG CCA'rGTGTCT ATTATCAATG GTTTGTCTAA ATCGCA'rGCC GCTCCTGCGA CCTTCACAGC CCAGTTAATC AATACCATGG CGCAACATGC TGCGGTAGAA CCATGAAGAA GGAATATATC CAACGTCGGG GTTTTGAGAT TATCAAACCA GACGGTGCCT ACAATCAAGA CTCCTTTGCT TTrCAAGG TCCCTGGTGC AGCCTTTGGA CGTTACGGGG GCATGGAGAC TATCAAAGAA GCCATGAAAC TCAGTCTATC ACGAGTCAAG GCTTGGTGCT GCTCGTCAAA ATTTTTACAG AGCAGGTTGG
AATTACCTAC
TTTTGTTGTC
CTAGGAACGA
ATGACAGG'TT
AAGAGTCACC
GCCTTGACGG
ACTATATCAT
TCTATATTTT
A'NTTTGCTCA
AAGGCTACGT
GACTTGAGGA
TTACAATCGC
CAAACGCATG
GAAGAAGGCC
CCGCCTATCT
GTACATGAGA
AATTTTCGTG
TTTTTTG'rCA
GCACGATTTC
GTCATGACTT
GCAGCTCTTG
TTGCAAAAGA
'rTTGAAATTC
TGCCATCGGG
GAGCATTATC
GTTGCCTTTA
TATGCAGCCA
GAAGCATGAT
AGGATGACAA
AACACGCTGG
TCTTGCGAAT
TTCCCAAGAT
CAGATGCTAG
CTTTGGAGTT
AAAT'TTTGAC
TTGGTCAGGC
ATGAGGATAA
600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800
TCAGTCTAAG
CAATGATGAC
TAATAGTGAC
TTTGCAGGAC
GATGGAAGCA
TCGATTTCTGA
TTT'rGACTTT
GAGACGTTGT
CTGGCGCCTG TTATTCAGCC CTTGGTGCTG GGACTCAGTT ACATCGAAGA CTATCATGAG CTCTTTGTCA TGGCCTATGC GACCTATGTG AATCAGCAGG ATGCTCCCTT GTTTGCTT'rr GGCTTGGATT ATCAGGTTTT GACCAATATT ATCAGCCTCA ATTrTAATGA TCTTI'CAAAT ATGGAGCCTG CATCTCAATC CCAATATCCC
GTGTGTCTTC
CCTCTGTCCA
CTATCTGCTC
AATCAATTTC AAGCI'ATTGA TTTTGAGACT
TTGGAGACCA
TTATATGAAG
TTTCGCTCAA GCCTGGAATC AGTACGTTGG GATTCACCTA
AAGCAAGAGC
AAATCAAAGA
AAATGAAAAA
AGGGTGTCAA
AAGCTAAAAT
AGAAGATTGA
TGGTAT'rGGC
ATACAGGTGC
'rACGCCAATT TATGGATCAA AATT-TAT'rGA TTCCCTAGCA AATCGCAGTA GATGCCATGG TCAAGCCCTA TCTGACTTTT CAAGCAATAT CTGACAGCGA TTCGGATGA'r GAACCTACGA AGCCAAGC GTCAAAGATG CTTGTTGGCA GCAGGATTCT GACTGGGGAC AATTACTWA AGAGGAAAAG GGGGCGATTA CGCACCTCAG GCCATTGTTG CAGATATCGA GGTTCAACTr TACCGAGATG CAGAGCGCGT CAGCAT'rATC CATACGGATG GAGCTATCG GAATAAGAAA AATGCCAGTA GTGAAGCAGA CGCTGTCCI' TCGGCTGGGA TCATCGTGCG TCGTATCAAG AATATCGACC TTGATGGAAA AGGI'rTTGAC ATGCTAGACC ACC'TCCATCA ATATGCGGTT CTAGGTTCCT AACCACGCGT TGGTTTGCTC AACAACGGAA AGGAAACTTA TGAATTACTG GCGGCTGATG CGCGTGATTT GATGAATGGC GTTGCAGATG GTCCTGGACT CATGTCTACC TTGCCTACCG 'rTGGTGCCAA TGCAGAAAAT ACAGCCCAGC TCTATGCTAA AAATGTCCGT CAGAGAGTAG CAAGGGCGAC AAAGTTTGAA CTTTATCGGA TTGTTGTGGC AGATGGTTTC TGGCAATCAT GGGCTTGCTC GTGCCCTCCT TCTCAACGAC TTGGTGGAGC GGTCTTGTTT ATGCCAACGC TGTrTATAGT TTGCCCAGAC TGCGCGTGAA TGACCGTATT GTGACCATTA CTTGAGTCTG AAAGACGATT TCTGGAAGAT GAATTTAGTA AGGAGATGTG GTTAAAATCA TAGATGG'rTT TTAGAAATGA TTATCAAGGG CTCATGAAGA CAAGCCGCTA ACAGTTGAAG CGTTGAGAAT ATCTTTGCCA TCATCGTAAA CGCTTICGGAT AGGATTCTAG TTr'rCCAGGA
GGCATTGCGC
CCGCITrCGTA
AACGTGGAAG
ACGGGAAACG CTGTGCTCAA ATCCATCGAA GGGACAGCTA AAGACAGCTA T'rACAGGTGG TGCTCTTCGA GCGAAACTAG AGCCTCANGTG GTTTGAAAAA ACAGCTCAAT TATTCAGATG GGTGTTAAGG CACCTGTTGT CAAGACTCAT GGCTCAAGCG ACGATTCGTC AGATCCGTAC CATGCTAGAA ACAGACGTGG TTTTCAGGAG AATAAAAGAG ATGACAGAAA AAGAAATTTT 1860 1920 .1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 TCCAAGAGCG ACAGGGAGAG TGGATGCGGA TTCTGTTGAC TCGAAATCAG CGATGAAGAA TTCAAGGAAA ATAGCAATCG GAAATATCGG ACAAGCTGGT TATATCCTCA AGCACAAACT ATAAAGCCTrG TAATCATGCG AAGTAAAAAC GTTTAAAATG TACGAATGAA TTTGATTGCT AGTCTAATAG TAAAAAAGTG
GACTTTGTCG
TTGATGGAGT
ATTGACCAAC
GAGT'rCCAAG
AAAATCTTGG
CCACGTAAAT
CTATCTAAGG
TTTTCAACAA
GGTATTATCA
ATTAGAAAAC
TGACAGA.ATC
TTATCTTGAC
TCCAAAACG7
TCAACGGAAG
CTGACAGTGG
CCAGCAAACT
AGATAAGCAA
CCTATCGAAA
ATCATGAACT
ATCT'rTTTTA
AAAATAGAGA
AT~rGGATGA
CAAATATTAA
AAATTATGGA
TTGCCAAAGC
GAACATGGCA
ATGCAGCGAC
TGGTATATTT
GAAGAAAACA
AAATTATT'TT
TGAATCGTAA
GCTTAGAAAT
TATCAATGAT
TGATTTTCGAA
AAAAAATTA
AAATAAAAAA
TACTGAGATG
AGGTGT'rCGA
AGGTGCAGCA
ATGTTGGTGG
GGTCCTTATA
GAAAAATCTG
GCGCATGTAA
TTTAGAACGG
ACAAAAAAGC
TAGATACTGT
GAGGTATTCG
CTTGCTTGCG
GGAGGAGCAG
ACTGGTGCTG
TAATTATGGA
TGGATGATTT
TTAAAAAATA
TAATTCAAGA
TATACTAAAC
TTATGAATAC
TTGAAGGTGG
CACGAGGTCT
TGGGAGGAGC
TTTTAAAAG'r
AATTAGAAAA
ATCAAAACTA
CGTTTCGATG CCAATTCAAG TTCGTCAAGT TGTAACAAGA AAAAACGATG TCACAATTTG CGG3ATGCAAT TGGGGAGATT TCAGCTAGGA ATTAAAACAA TATACTTGGA GGTGTGGCCT TTTATTATTG GTTTAGTAGT AAATTTTTAA AGTCTTCGGA TAAATGATGA ATCI'GAATCA AGAGGAGTCT TATAGTAACG AGTCAAAAAA GGAGTAACTA TGTTATCTAT TCTGACTAGG AATAGATCAT ACCAGAGGTA AGCAGAGACA TTAGAAATTG AAGTAATAAA TAr.GATGTCG TAAGTGTTAC TTATTTGTTT CAAGCTTGCC TAGGGTGACA GTAAAAAATC AA'IrrCCTTT CAATAGCATA TTTTTAGTGG CAGTTGGGAG GGAGATAGGC GTAAGATAGT TGTTATCAGA ACGTCGCCTT GCCGTATATA GTGTTTTGAG CAGCCTACGG GG4GAAAAGGA GATGAATATG TGGACTGCGG TGTAGCTTCA TGGCTCACTT GCGAGAATTG TCAAGGTGGC AGAGGAGATT TTGACTTGCC GGATTTAACT TCCACTACTA TGTGGTGACT CCGGGGTGAA GTTGACTAAA CTCTTTTTAT GGCACCTAGT TCTCTTTTAT CCCTATATTA
GCAGGACTCT
TCATTTGGGA
TGAGTTAATA
TGTGACTGAC
CTAGTTTCCT
AAATTTGGGA
TTAGCCATGG
GCTAAGACGA
GGTTTTGAGA
TGTTCTGCCT
AGGAAGTCCA
CTCTTCGAAA
TTCGTCAGTC
AGTTTGCTCT
AACGTCATTA
TTTTTGGCTA
CCATGGATGG
ATTTTT1TAT GTTTTTGTrT
ATCAAATTCA
CTATCTACAA
TTGATTTTCA
TCGTCCGCAG
CTATGGTAGT
GACGACGGCT
CCAAAAAGTG
AGTGAT'rGGG
AACCACGTCA
CCTCAAAACA
TTGAGTATTA
GTGGATCAGA
TATTATTTTT
TTGGGCT'rGG
ATGACGCTTT
GGGAAATTGC
GATCCAGATC
ACAGGAGTGA
AATGGTCTGC
GTTTTGGCAA
ATCATTGATA
CTAGTCATCG
3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 CGCGAGCCAT TAAGGCAGAT 'TTCCTr'rTG TTGCCCATGT GCTTAAGGAA GGGCAGGATA AGGATAGCAT TCATfATTGCC CTGCCACGTG AGCGTTTTGA GGAAGAATGG CCAGACTATA AGCCTCATAA GGAACAAAAA GTGAAGCAGC GTGGCTTGAT TGCCAATATC CACTCTTGGT AACCGTGATT AACATTG;TGG GTTCTTATTA TCTGCAGTCT CCTATGTGCC AGATCAGATG CGTTCGACAC TAGGGATTAT TTCTATTGGG 664 TC'rACATCTrr CCACCAAATC TTGTCTTACG CTCAGGAGTA TCTCTTGCT'r GTTT'rGGGGC AACGCTTGTC GATTGACGTG A'TTTTGTCCT ATATCAAGCA TGTTTTTCAC CTCCCTATGT CCTTCTTTGC GACACGCAGG ACAGGGGAGA TCGTG1TCTCG TTTTACAGAT GCThACAGTA TCATCGATGC GC'rGGCT'rCG ACCATCCTTT CGATTTTCCT AGATGTGTCA ACGGTTGTCA T'rATTTCCCT 'rGTTCTATTT TCACAAAA'rA CCAATCTCTT TI-TCATGAC'r TATTGGCGC TTCCTATCTA CACAGTGAI' ATCTTTGCCT TTATGAAGCC GTTTGAAAAG ATGAA'rCGGG ATACCATGGA AGCCAATGCG GTTCTGTCTT CTTCTATCAT TGAGGACATC AACGGTATTG AGACTATCAA GTCCTTGACC AGTGAAAGTC AGCGTTACCA AAAAATTGAC AAGGAA'N'TG 0 TGGATTATCT GAAGAAATCC AAAAGGTTGC CCATCTCTTG TGGATGGCAA GATGAGTTTG CTAATCCTTT GGAAAATATC ATAACCGTCT AAATGAAGTG AGGATTTGAG CTTGATGAAG ATGGTCGAGA TGTCTTATCG TTGTGGGGAT TTCAGGGTCA
TTTACCTATA
CTTAATGTCG
GGGCAGTTGA
ATCAATCTGC
TATCTAGTAG
GTCGAGCAGA GAGTrCAGCAA AAGGCTCTGA GCATTCTCTG GATGGGGGCT GTTCTGGTCA TTACCTATAA TACCTTGCTG GTTTACTTTA AAACCAAGCT TCAGACAGCG CAGGTTGCCA CTTCTGAGTT TGAGGAGAAG AAAACAGTTG GGAGATATGA CCTTCAAGCA GGTTCATTAC AAGTAT3GCT GATATCAATT TAACCGTTCC CCAAGGGTCT AAGGTGGCTT 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140
GGTAAGACGA
ACCCAAGTCA AGGGGAGATT AGTCTGGGTA CCCTGCGCCA GTACATCAAC TATCTGTCTC TGGAGAATCT TCT'N'TGGGA GCCAAGGAGG TCGAATTGGC AGAGATTCGA GAGGATATCG TGACTTCGGA TGGGGCAGGG ATTTCAGGTG CTCTCTTGAC AGATGCGCCG GTCTTGATTT TGACAGAGAA GCGGATTGTC GATAATCTCA CTCACCGCTT GACTATTGCT GAGCGGACAG TTGTCGAAGA AGGAAAGCAT GCTGATTTGC -TCAATAGCTA GAAAGAGGAG AGGATGAAAC ATCGTCGTTA CCATAATTTT TCCAGTAGTG CTTTGGCCAA GATGATGGTT AATTTTTACG GTGTCAArCT'CAATCAGATT
GATAAAAAAG
AACAGCCCTA TGTCTTTAAC GGAACGAT GGACGACACA GGAAGATATC TTACGGGCGG AGCGCATGCC ACTGAATTAC CAGACAGAAT GTCAACGTCA GAGAATCGCT TTGGCGCGTG TGGATGAGGC GACTAGCAGT TTGGATATTT TTGCTTrGGA CAAGACCTTG ATTTTCATTG AGAAGGTAGT TGTCTTGGAT CAGGGCAAGA TTGCACAGGG TGGCTTTTAC GCCCATTTGG CAGAATTTTT AGAAAGTGCG GAGTTTTATA TGATTGTACC CATGGCCCTT CTGCTTGTGT TTTTACTTGG CTTTGCAACT TCGAACCTAG TCGTATCCTT ATCATTTGGA AGAAAATAAG GTTGCAGAGA AGGAGATGAG TTTGTCCACT AGAGCTACTG GCAAATATCC AGTCAACTAG CAACAATCGT ATTCTTGTCA CTGGTTAAGA AGGGGGATCT TTTGGTTCAA TACCAAGAAG 665 GGGCAGAGGG TGTCCAAGCG GAGTCCTATG CCAGTCAGTr GGACATGCTA AAGGATCAAA AAAAGCAAT-r GGAGTATCTG CAAAAGAGCC TGCAAGAAGG GGAGAACCAC Tr'rCCAGAGG AGGATAAGTT TGGCTACCAA GCCACCTTTC GCGACTACAT CAGTCAAGCA GGCAGTCTTA GGGCTAGTAC ATCGCAACAA AATGAGACCA TCGCGTCCCA GAATGCAGCA GCTAGCCAA CCCAAGCCCA AATCGGCAAC CTCATCAGTC AAACAGAGGC TAAAAT'rCGC GAT1'ACCAGA CAGCTAAGTC AGCTAT'rCAA ACAGG'rGCTT CCTTGCCGC TCAGAATCTA GCCTACTCTC TTTACCAGTC CTACAAGTC'r CAGGGCGAGG AAAATCCCCA AACTAAGGTT CAGGCAGTTG 7200 7260 7320 7380 7440 7500 7560 p a.
pp p* p p pp-p p .ppp CACAGGT'rCA AGCACAGA'rT TCTCAGTTAG ATGCAGGTTC AGGTACCCAG CAAGCCTATG TTAAATCCCA ACACTTGGCA AAGGTTGGTC TGGAGGCAGA GTCAGGTAAG AAGGTACAGG CGAGTGAGGA TGGGGTGCTT CATCTTAATC AAGGTGCCCT AC'rAGCCCAA CTTTATCCAT CAGCTTATCTr AAGTTCAAAA TATGTAGCAA CTACGACTCA TGATGCCGGG AATCAACTTT CGACAGCTAC TAAGACTGAG AAAGGGAATT AATCTAGTCT TGCTAC'rTAC CGTGTCCAGT CGTCAGGGTT AAGCAGTCAA TTGGAATCCC AGGAATTGAC CCT'rCTAGCC CAGAAAAT'N' GAAATCTTTT AGACAAGGGG AAAGTTACGG CTGAGACCAG TGATTCTAGC ATGGTTGCAG CTT'TGGAAAG AGAAGGGAAA GCCAAACTCA GAATCAAGGT CGGTGATCT GTTCGCTATA TCCI'AGATTC TACTATTACA AGTATTGATG TCTTTAAAAT CGAGGCCGAG ACTAATCTAA CTTCGGAGCA GGCTGAAAAA CTTAGGTACG GGGTGGAA.GG CCGCTTGCAG ATGATTACGG GCAAGAAAAG TTACCTACGT TATTATTTGG ATCAATTTTT GAACAAAGAG TAATG'IrCGT 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 GTrTTTAGAG TTAAATAATT ACAATTTTTG AAAAACATCT GTGGTAAAAT GTGCTCAAGT TATTCGGGAA AAGCTAAAGA TACAAGGACC AGGCGACTGC GTCTTGAATA ATCAGATCTC ACTCACTTTG TGGAGAAACT CCTTTGGAAG TCGTGCTCCG GATGAGGGAA TCGCCTTGGA TTTAAACTGT GAGAAAGATT CTTCTTGCAG TTTTTTCTTT ACTATTTATT CGGTTAAATT AATACGAAAG GCGAACTTTA TATCTATACA ACTGAGGATG TTTCAACGCT GTCAAGAAGG ATCTTTTATT TTTGAGAAAT TTCAGACACG GAACAACTCA CAACTATACT GCTGGTTCCT GACTCCGATT GTCGAATTTT
CTTGTGTTTT
AAATGTCAAA
T'rGGTTTTTT
ACAATTGATC
P.pp pp p* I. p p AAAATCTTAT TATTTCAACT AGCAGATTGC AGGTAAGGGA TAAATGTGGC TGGTGTGGCG ATAAAAAGGT TAAGATTATT TTTCAAAACG TTTTGGTGTG ACTACAAAAA TGATGATTTG GATGATCCAT TTATCAATGA TGAGCATGTG AAAT'rCCTAC AGATTGCGGG TGACCAGCAG ATTGCCTACT TGAAGGAAGA AACGCGTCGT ATCAATGAAC TATTGAAAGT CTGGTTTGCT 8820 8880 GAGATTGGGC 'rTAAATTGAT ATTATCTTrGG CAGACGAATI' 666 TGACTTTAAG CTAGAGTTCG GTTTTGACAA GGATGA T'rCACCAGAT AACTGCCGCT TGTGGGACGC TGATGGCAAC CACATGGATA AGGATGTTTT CCGTAGAGGA TTGGGAGAAC TAACCGACGT G?1TTGGGAAA AGTTGCAGGA AACTAAAAGG ACTCAGGCTG TGAACTAACA GATG'TTACG TCAACGCTGT TT1GGGAATAT CGTATTrTTG TTGAAAAAAA CTCCAGCACA ACTTGGGACT GTATTTGACT TGGCTGAGGA GTAACCGACC A'rGTTTTAGA TTTGCCATTG AAAGTCTGCC TTGCTTTTGT TGGGAAGTTC AATAAAGA'rA TT~GATGCGAC ATTGAALATAA TCTG'TTTGCA ACGGAAA.ACC AAAAGGTCCC CCAGACCTTT TCACTCTGTA AAATTGTCTG GGAAAAGTTG CAGGGT'ITAA TGCAAGAGCT GAAATAAAGG AATAAGAA1TT GGCTGATTTT CAGGTCAAGT CAGAGAGTTT
TTATGAGATT
TTCGTCTCTC
GAGAACTAGG
AATAACAACC
GATGGATAAA
GGTTAGAGAG
AGTATATGAT
CTCTGAGCAG
GTCAAGCTTG
CTTGTTTGCA
TGAAGTATCT
AGGGCAGTT'r
GAGTGACGTG
TGAGTTGGAA
AAAAGTATTC GTAT'rGTGCA CCTGCAGAGA AGCACATTTT GATTCTCGTT TCAAGGATAT CACGACAGGG AAGACCATTC CCAAATTGAC AAGGCCGAAC AAGGGATrGGC AAGTCAATCG GGCGCGTGCC GACCACTGCC GTCATACGAC AAATT'rCAAA AGCAATI'GCA GGTCGGTCTG AAAAACCACA CGTGCTAATG GACGATTGGA GAAATTGAAG TGGACGTTGA ACCCACAACC ATCCAACAGA GCTATTCGTG ATCCGTTGTC GCTGGTGATA TTACAGCACC ATTTCTAAAA CAGC-AGCTCA ACCTACGTTC GTGAATACTT GTTGTTGGTG CGACTCCCAA
TTTCTTTGAA
CATGGAAGTG
AACTGAGACT
TTTTGAGACA
GTCAACCTAT
AACCTTGATG
TGATATGGAA
TGGTGTCAAG
AATTGAGCCA
GTGCAGGCGG ATCTT.GC'rAA CTATGCTI-rC GACCAGCGTG CAGCTTCGTC ACAGGAAGCC ACAGTCAACA CAGCCCAACT TTACTTGGTG GCTGTCAAAA ACTACCTGCT CAATCCAGTT ATTGCCAAGC AGGAGTTTTC AGAGTCAGAC AGCTATGCAG CAGAAGACTT TGCTCCCTAC GATCATTTGC TCTTTATCCA AGACTACTTT GAACTCAAGG TTT'rGGACAC TTACTGGTCT GAGTTGAAAC ACATCGACTT TTCAGCTTCT GACAAGTATA TTGCCATGCG CGAGGAATTA GATATGGCGA CTATTTTCGG TCGTTATGAG GTCTCTGACG AAATCAATGC CTGCTCAGTT GAACCTTGGC TCCTCATGTT TAAAAACG.AA rTTGGTGGAG CGGCTACCTG TA'rrGGTGGA 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 AGGCCGTTCC TATGTTTACC A.AGCCATGCG GATTTrCGGAA ACTCGCGCTG GGAAATTGCC TGGTTATTCT TCATA'rGGTA ACCAGATTOG CCACCCAGGC TTTGTAGCTA AACGTATGGA GGGCAATGTT GTCCGTGAAA AACCTGAAGC
TATTTCAGGT
ACAACAAGTC
GCTTGCAACA
ACTTGGTGCC
AGGTGATGTG
ATCATCCTTC TCGGAGGCAA AACAGGTCGT GATGGTGTCG GTGGTGCGAC GGGCrCTTCT AAGGTTCAAA CAGTTGAGTC
ATCGAAGAAC
AAGTCCAATG
CTTGAAATCG
GCCATCTCTG
TT1CGTTGCCG
AAACCAAATC
C'TTGACACCA
CTCCCAGAAG
GCAAGAT'rCA
ACTTTGGGGC
ACCTCAACAA
AATCACAAG.A
AATGTAACAA
TTGTCATGCA
ATGGTGTGCG
AGCGTCAAAC
TGTAGAGACT
GCGCCTCTTC
AGGCGGTGTC
GGTGCCTCTT
ACGGATGGCG
AGAAAATAT
GCTGGTGCTG
CGTAA'rGGCA
TGTGTGGCTA
AAATACCAGG
GrrCGTGGTTC
GATGCTGTTG
CTGGAATGGT GAGACAATCG CGTGGTTGTC GATGCCAAAG ATCTGCTGAA ACACTGGAAT TCAAAAAGGA TTACAGACTA ACTTGGTGGT CGTTACCAAC TCTGACCTCA ACCATGCAAG CGCTCAACGG TTAATCACCC 0 0 0 0
GTGCAGAAAT
TTCAACCCAT
GCAACTGCTC
GAGTATTTCG
CTAGGTTCTA,
ATGTCTGGTA
ACGGCAGATA
TACATCCCAG
GCTCAGTTTG
GGT=tTGTAG
ACCTTGCCTG
CCTGAAGAAA'
GTCAACGGTG
GAAGTTACC
TCAGATGTTG
TGCCAGT'rCA ACACGGTGTG ACTCATACTG ATGTAGCTGA ATGGTCTCCA TACCACGGTG GTTTGGTGGC TGCTGGTGCC AACTGGT'rCA AGCGTATGGA TAAACAAGCA GAGCGTTTCG TTGAAGCACA AATTCAGCTT G.GCTTGCCAT CCTTTGAAGA ATTGACCGTT CCGCCAACCT GCCGTAAGGT GCTCTCTCCA GAATTTAAAG GTCAAGCCCT CTCTGCAGAG ATTGATTTG AAGCCATCCA AGCTGACCAT AAAGTGACAT TTGAAAGTTT GGCTCTTGCT ACCTTTGGAA AACTTGAAAC AGCTTTGACA GCTCAATTAG TTGCTGGAGT AGAGAAGGTT GGACAAACGA TcGAAGCcAGA TGGACACAAG CTTGACAGTG AGGTTCAAAA AGGAAATGCC ATGTCACTCG TCTGATCAAG TCGGTGAAT'r GGCAGACGGT GCTTGAATGG TACAGAAATT GTCCTGAAGA TGTGGATGCC- TGGTGGCGAC AGTAACTGAA TTGACTTGGA GCGTCGTTTC TTGTGGACAA GGATGTCAAA CAGATACCCT TACGGTTCTA TCTTTGACTG CTCTGTTGGA TCACACCAAC TGAGGCATCT CGTCGGTCAT TGCTCAAGGT CTGCTTATGC GGTTATCGAA AGGCTCGTTT CTCTTACCAA GTCAGCCAGT AGCTGCTCTT CTATrC.GGTGG, TAAGGACTCC TGGCTGCCTT TGGGGTGACG CTGTTGGGGA AAATATCTAC ACTTGATTAA GAAAAATTTT CTGCATCAGC TGTCAAATAC ACTATATTGG TGCAGAGGTG GCGGCTTTGT CTTCACATCT AAGCAGACTT TACACTGACT CATTTCAAGG GACATTGGAA AAGAAGTACC AGCTGTGGCA CTGTGGTTTA CATCCCAGTC TCGAAAAAGA AGGTGCAGAG TTGTCAAGTC AGTTGAAACT CTGGTGGATT CTCGGCTGCG 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420
CAACAGAATT
TGATTAAAGC
TACCCAAGCG AAAGAACTAG CAAAGAAAAG GTTGAAAAAC TTTCCAGGAA CCAACTCAGA ATATGATTCA GCTAAGGCCT GTCAAT'N'GG TGCCATTCGT GACCTTGAAT GAAGAAGCTA ATGGTTGACA ATATCGACAA GACTAATATT CTCTTCTITTG GATGAACCAG ATGGTCAGC TAAGTTTATC GTGGCTATTG ATAGCTTTA'r CGCCCGTGGT CAAGCCI-rAG TCAAATCGGG TCTCCTACCC AGCCCAACCC 'rCTrCTACAA TGATGCCAAC
ATTGCCAATA
CCTGTTTCGC
GACAATGGAC
AAGTACAATC
ATCATCGGTA
GGCAATAAAG
TTACAGATTT
TGA'PTACAAA
AATGTGGTGT
TCCACAGTCT
AACTGAAGCG
TGGATAAATT
CTTCTGTAGA
CTrCATAATGG
CAATTTTCAG
ATCCTAGCCT
CCAACTCACC ATGGTTGGT'r ACGGTGAAGG GAAGTCTC AAA'N'TTCAG CCAATACGTT CGAATGGTTC TCTCCATGCC AGATGGGCCA CTCAGAACGT ACCAACACCT GT'rCGCATCA TCTAATAGAT AGTATCAGTA TGAAAATTAG GTATAAAAAA TTTCGGTATT TGGGGACATC TCAACACCGT CGTCAGGAGG CCATCGTGAC ATGGGGCTTT GACAGGAGCT GGTGCGATTG 668 GTCAATATCC TGCTTAATGA GGTTTGATTA TCGGTATTTrG TACGGAAACT TrAAGCTGC CAACACGTGG CCAAGATGGT GGTGTGCAAG TGGG.CGATAT GTGACGGCTG AGGAATTTGC GACTTTAACG GTAAACCAAG ATCGAAGGAA T'rACCAGCAA TATGAGGATG GTCTTTTCCA GCGGTTAAAC ATTTCACTGG ATGTAAAAGT CATGTAAATC TGACATACGA AGTAAAATCT CAGATGCTGC TAAGTTGACC GGGCAGGAAT CCTCTCCAAT
AAAAGTGCGT
TAA'rGGAT'rC TAACAG'rACT
GGAAACTCGC
CCACGCTATT
AGAGCTCCGT
TATGGATrrCT
GAATGGTCAA
AAATATCCCA
AAAATAAGAC
TAGCTCTTGA
CTTAATGAAG
TATTTTGGAC
GATCAAGGAC
TATCAGAAGT TTTCAGAAAT CCAGCTAATD GGCATGTGCG TTATGCGACT GCTGCGAAG 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 TAACATCCAG CCCTTCCTCT TCCGT'rrTCA CGATATGCAG AAATCTGACC AATGCAGCCT CTCTCAAGAA AGA.ACTGGAA CGCGACTT9CG GACTCTGAAA TCTTGGCTCA CCTCATTCGT GATGGCCAAA ATCAAGGAAG CGCTC-AGCCT TGTCAAAGGT
TTTGGTTTGG
CAAAGAGGAG
CGCAGTCATA
GTTTTGCCT
ATATCTTGCT GTTTGAGGAC AAGTTGATTG CGGCTCTTGA CCCAAATGGA TTCCGACCGC TTTCGATTGG TAAAATGGCT AATGGAGCAG TT'GTTGTATC T'TCGAAACC TGTGCTr'rTG
AGGTCATTGG
ACGAGGGCAT
AGTATATCTA
GTAAGAGAAT
GTGTGCCCAA
ATGAAATGGG
AATTGCGGGA
AACGTGTGGT
TGCCGAGTGG ATTCGTGATT TCAGTATGAC AGCTATACAG CTTTGCTCGC CCTGATTCTA GGGAGCGCAA TTGGCGCGAG TTCTTCCCTA AGCGCGGCTA TCTGATCAAA AACCAATACA GCAAGGAGTG CGGATGAA.AC CATGGTGGAT GATTCCATTG TGAAGCCAGG TGAGATTGTG ATCATACCCA GTTGGCGT ATATCCACGG TGTCAATGTC
ATCATTGATG
TGTTCTATGG
CATACGGCAC
AATTTAAGCA TGAGGCAGAT ATTGTAGTTG TGGCATTTGC GGAAGAATCA GGCTTACCAA CCCAGCGAAC TTTATCCAA CCGACTCAAG TGTCTGCTGT ?TCCGGTGr'r GTCAAAGGCA TACG'rGGAAC AACCTCTCGT CGTATCGTTC 669 AGCTCTTGAA AGAAGCGGGT GCCACTGAGG TTCACGTTGC CA'rTGGAAGT CCTGCACTAG 14280 CGTATCCATG TTTCTACGGG ATTGATATCC AGACCCGTCA GGAGCTGATT GCAGCCAATC 14340 ATACGGTrCGA AGAAACTCGC CAAATCATTG GTGCGGACAG 'rCTGACTTAT CTTTCAATTG 14400 ATGGCTTGAT TGAGTCGATT GGTATCGAAA CAGATGCGCC GAACGGTGGT CTCTGTGT~CG 14460 CTTACr'rTGA CGGTGACTAC CCAACGCCTC TTTATCACTA CGAAGAAGAC TATCGTAGAA 14520 GTTTTCGAAGA AAAGACCAGT TTTACAAGT AGGCGACAGA TTCTCCATTA AAGAAAAGGA 14580 AAAAATAAAT GACAAATAAA AATGCATATG CCTCACGTCT CACTACTGAC TAAAGGCTTA 14640 AGCATT'rAGT CAGTAGACGC TTTGTCCTAT AGGATCAAAG CTAGAGCCCT GACTAGTATT 14700 TTTAGATAAA AAGATGGTTT ATCTAAAAAT ACGTCGCAGT CTTTCTCAAA AAAAGAAAAG 14760 GAAAAATAAA ATGGCAAATA AAAATGCGTA CGCTCAATCT GGTGTGGATG TTGAAGCGGG 14820 T'rATGAAGTT GTTGAACGGA TTAAAAAGCA CGTGGCCCGT ACGGAGCGTG CACGTGTCAT 14880 GGGAGCTCTT GGTGGCTTTG GTGGTATGTT TGACCT'rTCC AAGACTGGGG TTAAAGAACC 14940 *CGTCTTGA'rT TCAGGGACTIG ACGGTGTCGG AACCAAGCTC ATGT'rGGCTA TCAAGTACGA 15000 *CAAGCACGAT ACCATCGGGC AGGACTGTGT GGCCA'rGTGT GTCAACGACA TCATTGCTGC 15060 *AGGTGCGGAA CCCCTCTATT TTCTCGACTA CGTAGCGACA GGGAAGAATG AACCAGCTAA 152 *GCrAGAACAA GTGG.TTGCTG GTGTG.GCAGA AGGTTGTGTG CAGGCTGGTG CTGCCCTC4T 15180 ***CGGTGGGGAA ACGGCTGAAA TGCCGGGCAT GTACGGCGAA GACGACTATG ACTTGGCTGG 15240 TTVTTGCGGTC GGTGTGGCTG AAAAATCTCA AATCATTGAC GGTTCAAAGG TGGTAGAGGG 15300 AGATGTTCTT CTCGGACT'rG CTTCAAGTGG GATTCACTCA AATGGTTACT CTTTGGI-TCG 15360 TCGTGTCTTT GCGGATTACA CAGGTGAGGA AGTCCTACCA GAATTGGAAG GCAAGAAACT 15420 TAAGGAAGTT CTACTTGAGC CGACTCGTAT CTATGTCAAG GCTGTCTTGC CGCTCATCAA 15480 .AGAAGAGTTG GTCAACGCCA TTGCCCACAT CACAGGTGGT GGCTTTATCG AAAATGTCCC 15540 TCGTATGTTI' GCAGATGACC TAGCTGCTGA AATTGATGAA AGTAAAGTTC CAGTGCTTCC 15600 AATTTTCAAA ACCCTTGAAA AATACGGTCA GATTAAACAC GAAGAAATGT TTGAAATCTT 15660 CAATATGGGT GTGGGACT'rA TGTTGGCGGT CAGCCCTGAA AATGTAGAGC GTGTAAAAGA 15720 
ATTGTTGGAT GAAGCAGTCT ATGAAATTGG TCGCATCGTC AAGAAAGAAA ACGAAAGTGT 15780 CATTATCAAA TGAAAAAAAT AGCGGTTTTT GCCTCTGGTA ATGGCTCAAA TTTTCAGGTG 15840 AT'rGCCGAAG AATTTCCAGT GGAGTTTGTC TTrTCAGACC ATCGTGATGC CTATGTGCTT 15900 GAGCGTGCAA AGCAGCTCGG CGTTCTGTCC TATGCTTTTG AACTCAAGGA GTTTGAGAGC 190 15960 AAGGCAGACT ACGAAGCAGC TGCCTAGCAG GCTACATGAA ATTGTCAACA TTrCATCCAGC GCTTGGAATG CTGGCGTGGG GATACAGGCC AGGTCATCAA AGATTTGAAG CTCGCATCCA CTATT'rACAG ATTGACTrT GTGTTTGTTG AAGACGGCTT CTCGATATGT AT1'rTATGTC AAACAAA'rCC TGAATTTA'rA TTTCAGAACC ACTAATAGTC ?TTTCTTTAT TCCATrTAGGT 670 CCTTGTCGAA CTCTTGGAAG AACACCAGAT TGACTTGGTT AATCGrTGGA CCAACC'N'AT TGTCGGCA TGAAGGTCGG CTACTTGCCA GAAT'rTCCAG GAGCTCATGG GATTGAGGAT TCAGTCTGGT GTGACCAT1'C ACAGGTTCGT GTCCCACGAC TGAAGCAGAG TACAGGCTGT TGATGATTCA TATGATATCT
CAAACGGAGG
'rATCTGATGT
GCATTTTTCT
GATGGAAAA-A
GTTTGTTTCC
TATTGTAAT
TATTAACTTG
TAGCTCCAAG
TTGTTTT'rGC
CTTTGATAAA
GAATAGTCGG GACAGGTT'rC TTGAT'rAGTT TATTGTTTGA CAATAGGTAT AACAGATATA ACGGATTTGA CTTTAAATAC TACTGATTTA TCAAATTTTT ATAAGAGTGT TCAAATCACA TCTTAGGTAT GCTTAGCCTT GGTTTTGCTT ATCTTGTTTT GTGTTTAACT AATGATTIAAA AAGGAGAATA TAATGACTAA CAGACAAAGC GGGCATTGTT GAATTTGCCC AAGAACTCAA TCTCAACAGG TGGAACTAAG GTTGCCCTTG ATAATGCTGG ATGATGTGAC TGGTTTCCCA GAAATGATGG ACGGTCGTGT TCCACGGAGG GCTTCTCGCT CGTCGTGACT TGGATAGCCA ACAAGATTGA GCTCATTGAC CTTGTGGTGG TCAACCTTTA TTAAACCAGA TGTGACTTAT GC'rGATGCAG 'rrGAAAATAT ACTGGGTGGA TTCGGGTGTG TAGCTGATGA TACCATTGAC ATCCGGAAGT AGTGAAGGCT TTGATT'rTAA A'N'GGAGTCA GTTAGAATCT AAAAAAACAA GGGAATCTTA 'N'TAAGTr-rG GTATA'rCAAT TGGATTCCAT TGAALATG?1'A TTTAATCTGA AACTAATTTA TCTAGTTTAA GTGCTTACAG TATATTrTTAG GCTAGGTGTC TGTGTAGGC'r GACTAGAAAA TGGATCAATA ACTGTTACTG CATTTACTTA ACGCGTCTTA ATCAGCGTCT AAAACTTGGT TGGGAGATTA GGTGGATACC ATTGCTATCG GAAGACCCTC CACCCAAATA CTTGGAAGCG GCTAAGGACA CCCATTTAAG GAAACTATCC CGATATTGGT GGGCCATCLA TGTGGTAGAT CCTGCTGACT CTCTTATGAA ACTCGCCAAC 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 ;74 00 17460 17520 17580 17640 17700 17760 TGCTTCGTTC AGCAGCGAAA AA'TCATGCCA GTGTTACAGT ACGCTGTGGT TTTGGATGAA T'rGGCAGCAA ACGGCGAAAC GTTTAGCAGC CAAAGTATTrT CGTCACACAG CGGCTTATGA CGCCTrGATT GCAGAATACT TCACAGCTCA AGTGGGTGAA AGCAAGCCTG AAAAACTCAC TTTGACTTAT GACCTCAAGC AACCAATGCG T'rACGGTGAG AATCCTCAAC AAGACGCGGA CTTTTACCAG AAAGCTTTGC CTACAGACTA CTCCAT'rGCT TCAGCCAAAC AGCTCAACGG GAAAGAATTG TCA'N'TAATA ATATCCGTGA TGCAGATGCT GCTATCCG'rA TCATCCGTGA CTTCAAAGAT AGTCCAACCG TTGTGCTCT CAAACACATG AATCCATGl'G GAATTGGTCA AGCTGATGAC CTTGGGACTA CGCTTATGAG TCTGACCCAG TGTCTATCTT TGGTGGGATI' ACCGTGAGGT GGATGCTGCG ACAGCTGAGA AGA'rGCACGG CGTTTTCCTC TTGCACCAAG CTATACGGAT GAAGCGCTAG CCATTTTGAT CAATAAAAAG GTATCCTTGC CTTGCCM'?rT AATGCTCAAG AGGCTAGCGA AGTGGAAGCA GTGTAGTCGG TGGACTTCTC GTGCAAAATC AAGACGTGGT CAAGGAAAGC GGCAAGTGGT GACTAAACGT CAGCCAACTG AGACAGAAGC GACI'GCTCTT GGAAGGCTAT CAAGTACGTC 
AAATCAAATG GTATTATCGT GACCAACGAC TI'GGTGT'rCG rCCAGGTCAA ACCAACCG'rG TGGCTTCTGT TCGCCTTGCC
ATCGAGACTG
GTCGTCCTcA
GAAATCATCA
AAAAACTTGC
GAATACACAG
CCAGCTGACT
GAGTTCGCTT
CACATGACAC
ATTGACCAAG
CCAAAGATCG TC'rGGACGGG GCGGTCCTTG CTTrCAGATGC ACGTGGAAGA AATCGCCAAA GCAGGAATTA AGGCCATCAT GTGACCAAGA A'rCCATCGAA GCAGCGGATA AATACCGCTT TGAGACATTT TAGACATTIAA GAAGATAAAA GGGAAGAAAA CTTAAAATAC TAACTGAAAC AAGATTAAAA CGAACTwMTT ATTCGCAAAA GAGGTTGAGG AATGAAACTG CTTGTTGTCG GCGAT'rGCTA.A-AAAGTTACT TGAATCAAAA.GACGTGGAAA AATGATGGGA TGACTCTGGA TGGTTTGGAA TTGGTAAATA AAATTGA'rTG ACTTCGCAAA GACCAATGAT GTTGCTTGGA CTTCTTCCCA TTTGCGGATA CCAGCCCGGT GGCTCrGTCC CACTATGGTC TTTACAGGTG CAGTTTCTTT CCTTTTTTGG TGATA'rAATG TTGGTAAATA GTTCTGGTGG TCGTGAGCAT -AAGTCT'rTGT AGCTCC3'GGG TCTCTATTTC CGAACATTAT CCTTrTATCGG TCCAGATGAT GACTTAAGGC CTTTGGTCCG CCAAGGAAAT CATGGTCAAA TCGAGGAAGC CAAAGCCTAT 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600.
18660 .187120 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 GCCCTTGCTG CTrGGTATCGT GGATGATTTT ACTAGGGCTG CAGCGGAGCT TACGGCGTTC CGACAGCAAC ATCGAAAAGC ATGGTGCGCC GTCGTCGTTG CTGAGACGGT AATAAATTTG GTGACTCAGG, TTTTCACTCT TTGCCTTTGT CACAAACGTG CCTATGATGG CCACTCCCAC ACTTACCACA GTTCTAGAAG GGGTGATTAA ATCCTGACAG CTGATGGACC
GGAGTGGTCC
ATATGGCACA
TATCGTAGTC
TGAGCAAGCG
TGCCCGTG
CAATCGTGAT
CGACAAAGCG
GAGTGTAG'TT
AGAAGGTCGC
GAAAGTCATT
AACCAAGCTG
AAGGATT'rCG TTT'rCAGATT AAGGCGGATG GCTTGGCACT TGGGAAGGGT GTCGAAGCCG CTCATGAGAT GCTTT'TGGAC GTTATTGAGG AATTCCTTGA AGGAGAGGAA AAGTTCTACA TCATGCCAAC GGCTCAGGAC CCTAACACGG GTGGTATGGG TGCCTATGCG GATACAGCGG TT'GACACCAT TGTCAAGCCA CCTTA'rCTGG GAGTTCTTTA CGCAGGGCTT GAGTTCAACG CTCGGTTCGG AGATCCAGAA ACTCAGATTA TCTTGCCTCG
GATAGCAAGG
GCATCCAAGG
GGCGATGTC.A
TCAAACGGCG
GCCAGCATAT
ATCGGAAGCA
ATAATGGTCG
GAGACCTTAG
AGCCAAATAT
GCT'ACCCGCT
TCACCTACTA
GACGAGT1'TA
ACCAAGAACT
AGGCAATTAA
672 CTTGACCTCT GACTTTGCTC CATGTGGACG GACAAGGGTG AGACTATGAA AGGGGCGTTG TGCAGGGCT AAGT'rTGCGG TATGCTCGTT ACCAC.AGCAG ATACCAACAA AAAATAGAAG GTAAAGATAT AAGAATAACG TCCTGGTGAA AAGACCAGAA CAGTGAATGT GCTCAAAG'rT TAGGAATGAA ACCGAAGGTT *4 9
AAGACCATTA TCAAAAAGAA AAATAAAAAT TCACAAAATA GCGAGCGTTA GCGAGCTAA'r ATAGAACAAT CACCGCCGTT TAATCCAATC GT'rCAGGGAA A'rTGGAAGAC C'rTGGGTTTC rTTTGGTGGCT GCTGCCGTCC CTCACAAGCT AAGGTGATTG GAAGAAATGA AACCAGTAAT TTCCATCATC ATGGGCTCA.A CAAAAAACAG CAGAAGTCCT AGACCGCTTC GGTGTAGCCT GCACACCGTA CACCAGACCT CATGTTCAAA CATGCAGAAG AAGATCfATCA TCGCAGGTGC TGGTGGCGCA GCGCATrrGC ACAACCCTTC CAGTCATTGG TGTGCCAGTC AAGTCTCGTG AAAATATCAC AGATATCCTG TGACTCTGGG TGTGGTTGTC AGTTGCCAGC CAAGACAGAA AAAATAGCAG AGCACTGCrC ATACCGTCAA AGAAGCCCAA GACTCTTCTA CCGAACAGAT CGCCGTAGTC GCCAAACACG TCrGGTCAGG GGGAAACTTG TGCTTCCGCC TCCATCACCT CGTrAATGA'r CGTATGGTTT GTGAAAGAAC GATTGGATGA CAATTTAGGC ATGAGACACC TTGAAAAAGA GGAAAAAGGA AATCCGACTG GGCAACCATG ACGAAAAGAA AGTTGTTTCC AAGCCCGTAG TCGTGGCATC CAGGCATGGT AGCTGCCAAK' CTCTTAGTGG AGTGGATTCA CGACCATGGC TATCGGTIGAA TCTCTGTAGA AGATAAGTCC AAATCGCAGA GGAGTCGTCA CAACTGGGTC AGATGATGGC GATCCTGCGG CGGATTGCCC 19560 19620 1-9680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 CTCTATTCTA TCGTTCAGAT GCCGGGTrGG GCTGGAGCGA CTAACGCAGC TCTCTTTGCC ATTGCGGATG CACTTGCCAA CTr'rGCTGAA AATGAGCTCA TCTAAAACAA TCGGAATTAT G'rGCCTGTrTG
CTCCGTCTCC
GAACAAGGAA
CGGTGGCGGT
TATCGCGCTG
ACCTTATAAC
TGAGTTTGAA
CATTTCTGCT
GGTCTCTCGT
GTTGGCAGAC
GGATGCCGTT
AAATCGTATT
ATCTACATGG
GTGGCGGAAA
CGTTGCGATG
ATCAAGGATG
TTTGAAAAGG
GCCACAAGG'r
TCATTGTGGC
TCCTCACTTA
GACAACTCCC
AC.NTrTTGTC GATGTAGACG CCCTCCGTCA AATGTCGACG CTGACGGTTT CAAGGTCGTG ACTTCTAGCC TAGACTTGGC CAAGACTGCG ACTGGTGGCT ACGATGGTCA CTTGGAAGCA GCCTATGCGC TAGCAGACTC TCAAGGAACA GATCTGCTCC GCATTTCGCA AAACAAGGCT CAAGTCACTG TGGCACCCTA AGATATCGAC T'rGTCGAAAA ACTATGTCC'r TGGACAAAAG G1'TATTCGTT CAGAAGCAGA AGCAGACTGC GTCTTGGAAG AATTTGTCAA 673 CTTTrGACCTT GAGATTTCTC AGTTCAGGAA AATATCCACC TTCTGAAAGT CTAGTAGACA CTTGTCTGGA ACTCTCTGTG AATCGCCCCA CGACCACATA GT'N'GACACC CATAT'rCTGG GCCAGCCGTT ATGCTTAATG AGAAAATCCA AGCGCCCACC GATGGGACAT GTGACTTTGT GATTGATT'rT TAGGACAAGT TAAGACTGGA GCAG=TTG TAAATATGAA GAACTAGCTG TCATCGTGTC AGGAAATGC GCAACAATAT CCTGTCTAAG AGGCTAAAGC TA'rGGCAGTG TGGAAATGTT TGCGACAGCT ACTCTGGGCA CTATTCTATT GTGTTCTCGG AGCACCAT'rA AAGGAGGTGA CG=TTTTCCC ACCATCGTAC CAGCCCGCAT CCAATCGCAG AACAACTCAA GATGACATCA TTGTCAATGA GAAGCCTGTG ATTTCTCTCA CCAGTCATCA AACTCCATGC TCCTCGGTCA GCATGTCGAG GCTGCTGAAA AATATGTCAC TCCACArG'rA TGGTAAAATA GAAGCAAAGC ATAATCGTAA TTAGTGATGT GCCGGATAGT GTGGAAGAGT T'rGGGGAAGG CTATGATACA AATTATCGTT AATACATITTA TTGAAAAGTA AAGTGTTGTA 'rGCCAGT1GCT GACCAAGATA AGGTACAAC CACAATACCC CGAAAATTAT TTAGCTATCT ATAATGTACC GCTGGATACG GATTTGAATA CACTAGATCA TTACCCGTCT GTGTTTATTG *9
GTTTGAGTAG
AAAAATTAAC
.GAAAATAAAT
TTGGGGGAAA
GACCGTATTT
GTTTCTGAGA
AAATCTTGGT TTACCTAGAT AGCT'rATTCC CAACAGCTTA ACATGATCAA CCGTTACTCT CGCCCTGAGA TGGCGAATAT ACCGTGCTTG CT-TGAGGTG GAXATCCTCT CT.GACCAGGC TCCCTAAGGA AGATGTGGCT TTGATTCGCA AGAAGGCGGA '1GGAAATTGA GCAGGAGACG CGCCACGATG TGGTGGCTT CTCTTGGTGA AGAGCGCAAG TGGGTTCACT ATGGGTTAAC
GAAAAGAGGA
AGAAGAAAGG
TTGGAGTGAA
ATGGGCTGAG
CTTTGACATC
CACGCGTGCG
TTCTACTrGAC 21360 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 .222 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860 22920 22980 23040 *GTGGTGGATA CTGCTTATGG TTACCTCTAC
CTTGAAAACT TCACTAATAT ATGGGGCGTA CTCATGG'rGT TGGTACAGCG AAATGAAACG GCTGGTAAGA TTTCTGGTGC TATGTCTGCG ATAAACTTGG GACCTTCACG C'rGAGTAC?1T GCGACTGAGA TTCGTGGTCT AAAGGGCAAA AAGGGTCTTC ATGACTCGTC 'rGGCGCGTGT CA'rCGCTGAC
GCACGCTGAG
CAATATCGAG
GGTIGGGAAC
CATCCGTGCC
TGCGGTTCTT
AAGCAGG&CA
AAGGCCAAGG
CCGACAACCT
CGCTTCGAGC
TTTGCCAATA
CAAGAAATCT
CCCAGCATTG
ACGACATCAT CCGTCGTGAC AGCACAAGT CACCATCATG TTGGTCTTAA AI'AGCAACT ATGCGGCTGC TGGTGTAGAA TCCCACCATT TGTAGAGGAG CTACACAAGT CCTTCCTCGT CGACTTCAAT CGAACGTATG AAGTAGAAGA GTTCTTTGCT ACAAAAATCT GAGCAACGCG AGCAATGCCT CACAAACGCA ACCCAATCGG TCTGAAAAT CATTCGTGGT CACATGATTA CGGCTTATGA AAACGTCGCT 674 CTCTGGCATG AACGCGATAT TTC'rCACTCA TCAGCTGAGC ACCATTTTGA TTGACTACAT GCTCAACCGTr TTTGGAAATA 'rTCCCAGAAA ATATGATCCG AAACATGAAC TCGACTTTTG GCTATGTTGA CATTGATTGA AAAAGGCATG ACCCGTGAGC CAAAAACAGC CTACTCT'rGG CAGAAGTAAC ATCACGTCTC ACACCAAACG AGTGGATGAT AACAGCGAGC TTCAATCTCG TTAGTGAGTC CATAGGCTGC TCGTGAGTTC CTGT'rTCAGG AGAACTTTGC TTTCCTCAGC T'rTGGT'rTTT CTTCAGCAAT GTTCCGACTT CGACTATTTG TTCACTTCCT TACCATCGGC CACAACCAAG TAGACTTrAA ACACAAGAAG AAATCGATGA ATCTTTGAAC GTCTTGGACT CTrGTTTATTT TTTATCGAAA TAGTGTGGAC ATGAGTCCI'G AAGTT'TTTTC TCTGTTACCA AGGAGCAGTT GATGGAGCTG AGCGGCTTGT CCGTTTTCAT AGTAACGGCT TCCTGTGCTA AGAAGTGCTC ACAGAGTAGA GTATCATCAC ACCAGATACG TCGTCAAGAA CTTGACAGTC GTCTTATCT' 'rAGCCAACGG AAGCCTATGA CTTGGTGCAA ACCACTTCTT GAGGCAGATT AATCTTCAAC CCAGTTATT AGGTGATTAA TTAAAAAATA AGACTTAGTC TTCTTTTCTT CGACTACTAG TCCTGCAGAA CAGGAGCTGG ATCTTGAGGA GTTGGCTTGG GATTTCTAGT CGCCTACATG TGTTACCATA CGACACTATT TACAAGTG'TT AGTTGCTACG ATG'rCCATTG ACGCCCTTAG TAATGACTTG TGTTTT'rCCT TTGAGTAAGA GTGGATTT'rC ACAAGTCAC7 GTGGTAAATG GAATTTCTTC TTCTTGGATA TCr-AGTCTAG GTTTTACCTC AGTAGTTGG1 23100 23160 23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 24480 24540 24600 24660 24720 24780 24840 GCAAGACCAC TTTCATCACC ATAACTTCTT TGGTTACCTG cI-rGTGAGT'r ACAGGAGCGC CAACTTCAAC CACTT'GGTT'r GCTATCAAGG ACTGTTTCTG TTGTTTTTCC ATTTTCAGTG ACTACAGAGA TGTAATGAGT CCGGCTGGGA GGTTAGGATT GTGATGAGT'r CTGGTCTGGT GAATGAGTTA CAGCTGGTTT TCAAGCTCAG CTTGTTTATT AGGGCCTCA-A GACIrCTTT TCGCGGAGAG CATTATAATT ATGAGTTCAA AGATTTCCTC AGCATACTTC CGACTGTTGG AATGGTGTTT TTCCAGTATT TCGTTCACCT TTGACTCCTG TTCT'rTCTTG ATAACTTCAA TTCAACATTG GCAGCCACTT GAGGCCTTGA AGAGCGGCTT ACGGTTGAGG 
TTGTAATTTA ACTATATCCT TCTAAGTI'TG AGCACGAAAG TAGTCI'TGT TTCCTTGTAT TCAGCGCTTG AAGATCTACT TCAGGATATT CTCAATAGCT TTCTTGAGGA CTGTGATAAT ATTTTCCTGA ATGGAATTTC TTCAGTTCTT CATTTTCATC TAGGCTTCCT TTAGGTTGGC TACAAGCGTG GAGCTGTTTT AGCTGCGTCA TAGGAATTTT AGCTAATTCT TGTGGTCTGC AAAGGCAGTC GTCTATCTGC CCAGATTGAA TGGTAGAAGC TAGTTGATTG AACCACCACC ATCTTCTGGT AGAATTTATA GCCTTTGCTT CTTTAGACCA GTAAGAAATC TTTTGACCAA GAATGTAGTA CCAGTCACCG TTGGTATTCA GCTAGGTATT GAGGTGATGC GAGGTTATAT CCCCACCAGC 675 AACATCGTCC TrTGTCTTCAT AAGACATCTT TGTCAAACTG GCCATTGGTT GAAGCCCTCT AAT7TTGCCAT AGAGTTGATA TTGGCGTATT~ CGTCAGTACC TATTTACCGA TGAGGGCT GTTTTrTGAAA CTTTATCAAA ACCAGCATAG CATCCATGTG GA'TTAGCGT ATTCAATTAG TCGTCGTAGT AAGCTTTAGT GTTTTTCCGT TGGCAGTAA'r AGAAGGAGAT GGACATCAGA
TTCTTTGGCC
CCACTI'GAGG
AAAGTTGAAA
TACAAAGTTC
GTGGGCTTGA
ACCTGGACTG
CTCTrGTTACT TCCT'rCGATA
ATAGCTGCGA
TAGTACCAGC
A'rCTTTGTTT
ATCGCTTCTT
GGAT'rTTTAA
TTAATAGCTG
TCTGCCTGTG
ATAGCTTTTT
AGTAGA.AGCC
GGGTG'rTGGC
CTTGGGCACT
TACCTGCAAA
CGT'rTTTCAA
TACCTAATTT
GGATGAGACC
TTAGTGCAGT
TAACGTCATC
ATCGT'rGAAG ATATTCG4GCA
AGTCGCATCG
GAAGTCCATG
GTCC-ATAGT'r
TTCCATGGCA
GATGTCCTTA
ACCGTTTGGA
ACTAGCA'rAG GGTCATATCA TCGAGTAGAA AGCGAAGTCC GTCATTTCCT ATATCCGAGC TCACTGGCCT TGTCTACGAT GCGTTTGAGC a. TGGrTrCAGAG TT'rTCAACCTr
TTGACAGTTT
AGTTGCTAT
GCAGTATAAC
TCAGCTGCGA
GATGGTGAAT
GGGATATCAG
AGCCCC'rGAT
CAAGCATCAT
ACATCGTAC
CCAAAGCTTG
TGCGATTTTA
ACAGGGTAGC
TCTGTCGCAT
AAATAAGCAG
TAAAGTATTT
CACGTTTAGC
CTTGAAGTTT
CGAGAGAGCG
GGTTCAGGTC
AGTATTCAGC
AACGTGCAGA
CTCCTTCTGT
CGAGATTGTA
GCGTCCAGCA
TTCTTCT'rCT
AGCAATGGCT
AATAGCTTTT
TTTW.GGTACC
GTTGGCATTT
TGGAGTGTCA
TTTTGGTACA
CCAGCCTTGG
TCGATTGAGA
TT'rTGAGCT'r
TGATCAATCG
TCAGCTTCT
T.CGTTAAG-TG
GCAAAATGAC
GCCCAAGCAG
GAAGTGATTG
CCATCAGCGT
CCTTTTTCAG
ATAGAAACCA
CCGTCGTTAA
GCGTAGGCAA
TCAGCTTGAA
CCGA'rGTTGA
TTTGTAAAAG
TTACCTTGTT TTTGGCAAGT CAGGCGTGAG GGTCAAGTTG TATCTTGTTG GGCACGGCTA TTACGGCCGT GACGCTTTCT CTTGCTCTGC AGATTOA-TAA GCATGAGTTT GAAGAGGCGT CTACCATACC ACCGATGATT GTGTGTTTTT AATACCATTG TrTCGTCCAAG AACGTAGTAC CTAGTAGTTT AGAAGAAGCG TGATGTCTTT GTCAAAACTA AAGCCATTGG TTTGAGACCG TAAATTTTTC ATAGCCTTT 24900 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 TGGTATTAAG GATTTGGTGA C'rCCCCAACC ACCAGTCCAC
TGTCGCTATT
CAATACGAGC
CTTCGTTTGG
CATTGGCATA
CATACTTGTC
GTAGTAGATA
GAGGTCAT'rG
ATAGTATTTA
TTCATCAAGT
GATAAGGGCT
GCACACTCCA
AGATTTCAGT
CGACAGCTTG
ACCT'rrAGCA
CTTPTTTCGCG
TTCGTTGTCA
AGATCGACAG TACGGGCTGA AATTCTTTCA TGGCATTGAG TTTCTrCCCA AAATAGCTAA AGTTAGGGTT TTGGATTCCC AATCGCATCC ATGTGTCCAG GACTAT'rTAC TGTCGGAATG 676 AGACCGATAC CTT'rATCT'rT GGCATAGI'A ATCAGATCTG TCATTTGACT TTCTGTTAAG
TGATTGCCGT
TCGTCACTGC
AGTCCATCAT
ATGATTTCCT
?rTTTTTCG
GGTTTGTCAG
TCTGTATTG
TCTGCCTTTT
TCACTAGGAG
ACCTTATCAG
GCTTTAGGTG
1-rGGATCGTT GTAATAATCA TTTGTACCTT T-rTCAATGGC GCGTTTGACA CATAGGTCTT GCCGTTAGCT GTGATGCTCA TATCGTCCAA CATGAAACGG TTCCGACTAA TAGGTGTAAA 'rCAGTrAGC CATAATGTN' CGCTTrATCG TGAGCTGTT~C TGGTGAGAAA TA'rTTACGTC CAGCATCAAT AGAAACAA'T CTAGTTTTTC ATr'rACAGTT GCAGCACGTI' CCTTTCCTGC CTCTGTTGCC CCTCTGCT'rT CGCTTCATCT TTT'IrAGCTG GTT-TATCCTT GTCAGTCTTG ACTCTTTAGA ATCAACCTCT T'TCGCTTCTI' CCTTTTTAGG GC'rAGCTTCT TATTAGCAGT TTCTTTTTCA GCAGAAGTTG GAGTTACCAC TTCTGCTTTA
TTGAACTAAC
TAGCTGGAG'r
TTTCCTCAGT
TTCCTCTTGT GGTTTTTCTT TTCTGTTTCT ACAGTTTTTG CCGATT'rTCG GATGATTGAG ATGGTCGGTT1 GGTTTTCTGT AGTAGTAGGA GTAAC'rCCAT TGGAAGGCAA ATCCAATTAG AACAGAAGCT CGCTGTTGTT 'rTTCATGTTT CATTGCAAAA ATTATATAAA TCAACGCCTT TATTTTATrr CAAGAGGAGA TGACAAAAAA CTATAATAAG T'rTCAGATTG GTCGGAAAAA ATACGTATAT ATAAAATATT CCACAAATTA TAGAAT'rTTC TAGTATAATA AGCATAGAAA AAGCCCAAGC CGGTCACATG AGATAAATTT AATCTTGTAG
GCTCCTACAG
CCTCCTGAIT
CTTA'rATAA
TATAAAAAAA
ATATCTAGTA
CAAAAATAGG
CTGTTTTTG3G AAGACTAGCT GAGCTTCTGG TTGAAGCACT GGGAATCAGA AACCG'rATCG CGGCTGCAAC AGTCTGTGCT CGTATPTACG AATAGAAAAA GCATTGTTAT ATTGATAGCG TTTCTTATAT TAACGAGAGT TATAAALATTT AAACTTAAGA TAArTTTGG TTCTATTTCT 'rAAGCGCTAC CTTTTTGGTG 26640 26700 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27 660 27720 27780 27840 27900 27960 28020 28080 28140 28200 28260 28320 28380 GATTAGCTCA GGTTTTCTTC TTAGTI'ATCA TAATCAGATC GTTTGTA.AGT TTCACTGTAT
TCTAAAACT
TGT'rCATCGA
GGCCAGTTGA
CTCCGAGGAC
TTCGAGTTTG GTGATTTTAG TGAAGCTGCA TGTTCTGGAG ATTTCTTCAA AGTGT~r-ATC ATTCATGTGA ATGTGGTAGT ATAGAACTA'r AGTATTCAAG GTTTGGATAA ?ITGCGTTGA GATG'rATGGT AGATATAAAC GACACCGTTT GATTCGCGGA AAT1'GATCGC CGCGTAGACC CAATTTTTCC AAGTAAACAA GAAAGAACAG TTACCTTATC ATCTTTAGCA TTGAAGAGTT1 ACTTGTGTT TGCGTGCACG TGAA.ACGAAG GTTCCTTTTC CCATCTTTGG CAAGGTCGTT TAAGGCGCGA ACAACTGTGA TTTGTAGGAC AGTAGGGAAT TTGGAAAGAC TATTTCGTTG CTAACTTGAA ACGATTAAG TATATTGTTC TGGGATGTAG TACGTTCAAT CTTGTAGTAG GCTTGTTICC GCGTTCAATT CAATATCTGA AAACTCTACA CTTGTTGGCG GACAATATAG TAGAGCTGAC ATCGTACATT 6*77 GAAATGAGTT CTGCTTCAGT GTAAAATTTA TCTCCACTGC TAAACTGCCC AGAGATGAITr 28440 TTATTTTrTA ATrCGTCTTT TATGTATrGA TGG 28473 INFORMATION FOR SEQ ID NO: 84: SEQUENCE CHARACTERISTICS: LENGTH: 6749 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: CCTGATGGGT GGTATGCGAG GATACAGTTC TGAAAATCGC CGTTACTTAA TTAATGGACG CGAAGTCACA CCTGAGGAAT TTGCTCACTA TCGTGCGACT GGTCAATTAC CAGGAAATGC 120 0* a a a a a a AGAAACTGAT GTGCAAATGC CACAACAGGC ATCAGGTATG AAAACTAGGT CGAAAC'rTAA -CAGCAGAAGC GCGTGAGGGC ACGAAACAAG GAAATTCAAG AAACATCTGA AATCCTCTCA TGTTTTGGTC GGAGATGCAG GTGTTGGTAA GACAGCAGTT CATTGTGAAC GGAGATGTTC CTGCTGCTAT CAAGAACAAG CTCAGGTCTT,,GAGGCTGGTA CTCAATACCG TGGTAGCTTT AGTCAATGAA GTGAAAGAAG CAGGGAATAT TATCCTCTTC TCTTGGTGCT GGTAGCACTG GTGGAGACAG TGGTTCTPAA AAACAAGGCG GTGTCCTTGC AAGTrGGATC CTGTTATCGG CGCCGCACCA AGAACAATCC GTCGAAGGTC TAGCGCAAGC GAAATTATTT CTATTGATAT GAAGAAAATG TCCAAAACTT TTTGATGAAA TTCACCAAAT GGACTTGCGG ATATTCTCAA GCCAGCTCTC TCTCGTGGAG AATTGACAGT GATTGGGGCA ACAACTCAAG ACGAATACCG a a a a a a. a.
a TAACACCATC TTGAAGAATG TCCTTCGGCA GAGAATACTT CCACAATGTC ATCTTGCCAG CATTCCTCAA CGTAGCTTGC CTTGGCGGCT CAACATCCAG AAAAGACAAG CAAGAAAAAG AACACGCATT GCAGAATTGG TGCAAGTGTC AACGATGTGG CTGCTCTTGC TCGTCGT'iTC AACGAAGTGA AGGTCAATGC TTAAAATTCT TCAAGGAATT CGTGACCTCT ATCAACAACA ACGAAGTCTT GAAAGCAGCG GTGGATTATT CTGTTCAATA CAGATAAGGC TATTGACCTT GTCGATGTAA CGGCTGCTCA TAACAGATGT GCATGCTGTT GAACGAGAAA TCGAAACGGA CAGTTGAAGC AGAAGATTTT GAAGCAGCTC TAAACTATAA AAAGGAAAAT CGAAAACCAC ACAGAAGATA TGAAAGTGAC CTGAATCTGT GGAACGAATG ACAGGTATCC CAGTATCGCA 720 780 840 900 960 1020 1080 1140 1200 1260
CATCGCTTGC
AATGGAAGCT TCAGATATCG AACGTTTGAA AGATATGGCT
AAGACAAGGT
GATTGGTCAA CATAAGGCCG TAGAAGTTGT ACCTC TCTGAACTCG AGCTCGTGCT ATCCGTCGTA ACCGTGCTGG TTTTGATGAA GGAAATCGCC TAAGACGGAG CTTGCTAAGC CCGTTTAGAT ATGTCTGAAT AGCAGGC'rAT GTGGGTTATG TCCATACTCT ATCATTCTCT TCTCCTCCAA GT'ICTAGATG 678 CAATCGGCAA CTTCCTCTTT GTAGGGTCTA CTGGGGTTGG AATTGGCACT CGATATGTTT GGAACCCACG ATCCCAT'rAT ACAGTGACCG CACAGCTGT'r TCTAAGCTAA TTGGTACAAC ATG.AcAATAG CAATACCTTA ACAGAACGTG TTCGTCGCA.A TGGATGAAAT TGAAAAGGCT GACCCTCAAG 'rTATTACCCT ATGGTCGTTT GACAGATGGT CAAGGAPLATA CAGTAAACTT CAAGAACACTr GTCATTATTG CGACCTCAAA AGAAGATGCG GATAAACCAG AATTGATGGA CCTCAACCGC TTrAATGCAG TCATCGAG IT GATTGTAGAT TTGATGTTGG CTGAAGTTAA GGTAGTCAGT CAAGCGGCTA AAGATTATAT TGCTGGAT'r'I GGCTATGAAG CCAACTTGAC CCGTTTGAAA CCCTTCTTCC GTCCAGAATT CTCACACTTG ACTAAGGAAG ACCTTTCTAA CCAAACCTTG GCTAAGAAAG ACATT'GACTT CACAGAAGAA GGTTACGACG AAGTCATGGG 0 4 4* 4 4 4 44 I;.
4 4 4 GGT'rCGTCC'r CrCCGTCGCG CTTGGATCAT T'rAGATGCTA TCGTGAGAAA GTC'rAAGACA CrGGTTCCTT TTTAGGTACG AATATACTAT AGTAGCTCAG AAAAACGTGG ACTGGTTrTCG GTTGAACCGC CGTATGCCGA CCTACTCGAT TITTAAATCAC ATGAGTTTTG CCCATTCI'T TGG'rTGAACA
AACATCTGGA
GAATTTTGAG
ACAGGCATGT
AAGTCGGTAC
AGAAATTCGT GATAAGGTGA CAGACT'rCCA AGCAGATATG GAAGATGGCG TTTTfGGTTAT GATAAAAAAG ALAGGAGCCAG CTGAAAAAAA CGTATAGTAG AAGTGTATTA TTCTAGTTTC TTAAACGTGC TATATCAAAA CCAGTCCTGG 1320 1380 .1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 TGTTrGGA'rT ATTACCTTGA ACGACATGCG-TTAAAAGTTA ATGGTACGTA CGGTGGTGTG AGAGGGGCTA GAGATTATCC ATGACGTTCA AAGGCATCAT CTGAAATCCC TTGTTCCAAG AGCAGAGAAG AGGCTGTGGT CCTTGTAGTT TCCGCAAGAT TCGATGGTTG TCCCTrGGGAC ATCTTCCCAA TCCTTGATAA CAGCTGCGAT TITTAGCACTG CCTGTGCGGC AACCAAATGG TGAACAGTCA TTGGCTAAGA GGTGCTCGAT AGTGTGAAGG TGCACCAAGC GAATATCATA ATTGGAGATG ATCAAGCGAA CATAGGGTGC TTTGACAATG ACTTCTTTTG ACATGGTAAA TCCTTTCAGT TCCTGAGACA GAGAGAAAAC CTCTCCGAGG GTAGTAGTTT CAGCGATT'rC CTTGAGCGAA
GTGTGACGTC
ATCATGCCGT
CCGGCAGTAG
ATGTCTCCTT
GTGTGGTCAA
T'rTCTTCTCT
CTGGAGAGGT
CCCACATAAT CATGTGGAAG CAATGCGGGT ACGGATGAGT GGATAGAGTC TTCGrTTGGT TTGGTCCTGT TTCTTCCCCA G1'TCAAA.ACT TTCGACAATA CATTATATCA TAAAGGTTGC TGAAATCTTT ACTTACGATA ATACCCAGTT TACGAAGGAC TCGCTTCCTT TGAGGTTGCC TAAGCGGTCG TATTGGTAGT ATGGGTCAAA GGTTACGTTG ATTrCTTGTCT TCATCAGTCA AGATGATGGT TGAGTGGGCT 679 GAGTTCI"TCC ATAGCGCGGG CAGCATCAGG ATTTTCTGTA GCTGTGATAG CAAGTGCAAT CAGGATTTCA TTTGAATGAA GGCGTGGAIT GCGGCTACCG AGATGATCGA TTTTAAGACC TTGGATTGGC TTAACAACTT CAGGCTCGAT TA, LrrTTACT TCTTTAGCGA TGTCAGCTGA TTTTTTGATG GCGTTGATCA AG'rGATGATr TCCCCATTr'G TTTTTGGCGC GCAACGACAG GAGCAACTCA ATTTTCTTGA
AGGCAGCGGC
GCAATTCAAA
CAACCTTACG
CGGCAGCTTC
TGTAGGACCA
GGCTAGGGCT
GTCTGCAGGT
GCCAACTTTT
AAGAGTTCTG AGTTCTTACC GCTCCACCAG TTTCTTCTGCG GTGATACCGA GGTCGTTCAT TCAGCT'rTGA AGTCAAGAAC TCGACAGCGG CCTCGTCATC GGTGAAGCGT ATGGTGATT ATTTCGATAT CACGGTTGTA TCAATCATGT TGACATCATC TGTTTGATAG TAACGGCGGA TGATTTCTTG TT'rAGAAGCT TGTAATAGCG AAACCAACCA TGTTGACACC CATATCTGTC TCCGAGAATA CGTTCCAACA TGCGTTTGAG CACTGGGAAG GTTGACAGTG GI-rCTCCAT AGGTI'TGAAG ATGGAAGGGG AAGGTCAGCT GTGGCAGCTT CATAAGCCAA GTTAACTGGA TGATGAAGGG GAAGATTCCA AACAGGGAAG GTTTCAAATT TAGCGTAGCC AGATTTGATG CCATrGATTT GGTCGTGGTA CATATTGGAC ATACACGTTG CCAATTTTCC AGAACCAGGT CCAGGAGCGG TTACGACAAT CAAGTTGCGA CTGGTTTTGA TGTAGTCGTN TTTGCCCATG CCTTCTGGGG AAATGATGTG ATCCATATCC GTCGGATATC CTTTGATTGG ATAATGAAGA 'rAAGAATCAA TTCCGTTITT CTCAAGTTGA TTGCGGAAGG CATCTGCAGC GGGTTGGCCA GCGTATTGTG TAATGACAAC GGAACCAACA AAALATCCCTA ATTCATTGAA TTTATCAATC AAACGAAGAA CT'rCTTGGTC ATAAGAAATG CCTAAGTCGC CACGTGCTTT GGAATGTTCA A'rGTTGCTAG CATTAATGGC AATCACAACC TCAACCTGCT CT'rTCAATTC TTGCAAGA GC TTGATTTTGT TGTrCAGGTTC ATAACCAGGA AGGACACGAG CAGCGTGGAA ATCTTCTAAC AT'N'TACCGC CAAACITCTAA GTAGAGCTTG CCGTCAAAT'r GGTTAATGCG CTCCAAAATA TGGTCGCGTT GTAAATTCAA ATATTGTTCA GAACTAAAAG CTTGTTTT~TT CATr=IA CCTCTGGACI' CTATTATAAT AAAAAATTGG AAG1"TAGGAA ACTACGGAGC TAAAAAAGAA ATTAAAAAGA TTAAGCAAAC GCI'TGCACAA AATTTTAAAA AGTGCTATCA TAGACTATAG ATTATGAAAA TAATGAGGTA AACAGATGCA AGAAAAATGG TGGCACAATG CCGTAGTCTA TCAAGTCTAT CCAAAGAGTT TTATGGATAG TAATGGAGAT GGAGTTGGTG ATTTGCCAGG TATTACCAGT AAGTTGGACT ATCTAGCTAA GC'rAGGAATC ACAGCAATTT GGC'TTTCTCC CGTTT1ATGAC AGCCCTATGG ATCATAATGG CTATGATATT GCTGATTATC AAGCGATTGC GGCTATT'TTT GGAACCATGG 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800
AGGACATGGA
TGGTGGTCAA
ACAGCCCTGA
TTAGTGGGTC
GCAAGAAACA
TGATGAACTT
TTGGCAAAAT
AGGAAATGAA
680 TCAGCTGATT GCAGAAGCTA AGAAGCGTGA CATTCGTATC ATCATGGACT TCATACCTCA GATGAACATG CTTGGTTTGT CGAAGCCTGT GAAAATACTG GCGAGACTAC TATATCTGGC CCGATGAACC CAATGACCTA GATTCTATCT TGCTTGGGAA TACGATGAAA AGTCAGGTCA ATACTATCTC CACTTTTTCA GCCGGATCTC AACTGGGAAA ATGAAAAACT TCGCCAGAAA ATTTATGAGA CTGGATTGAT AAAGGTATTG GTGGmTCCG TATGGATGTT ATTGACATGA TCCTrGACGAG AAGGTAGTCA ATAATGGTCC TATGCTCCAT CCCTATCTCA TCAGGCGACC 'rTTGGAGATA AGGATCTCTT GACAGTAGGG GAGACTTGGG GAGCAACTCC AGAGATTGCC AAGTTCTACT CTGATCCAAA GGGGCAAGAA TTGTCTATGG TCTTCCAGTT TGAACATATC GGTCTTCAGT ATCAGGAAGG TCAGCCTAAA TGGCACTATC
AAAAAGAGCT
GAGTTGAGGA
CAATCTGGGG
TTCATCTCAT
ATCCGT'rTGA
CTCTTGAAAA
ACA:ATGCCCG
AACCTITGGTT
GAATATCGCT AAGTTAAAAG AAA'rCTTCAA CAAATGGCAG ACAGAGTTAG CGGCTGGAAT TCCCTCTTCT GGAACAACCA TGACCTCCCT CGTATTGTCT AAATGACCAA GAATACCGCG AAAAATCTGC CAAAGCCTTT GCAATCTTAC GAGAGGAACT CCTTATATCT ACCAAGGTGA GGAGATTGGG ATGACCAACT AACACTGGAT CAAGTAGAAG ATATTGAATC TCTCAACTAT GCGCGTGAGG AGGTGTTCCG ATTGAAGAAA TCATGGACAG TATCCGTG'N' ATTGGACGTG TACCCCTATG CAATGGGACG AGAGCAAAAA CtCTG=TTTC TCAOAGGTC GGCGGTTAAT CCAAATTACG AGATGATCAA TGTCCAAGAA GCGCTGGCAA 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 ATCCAGA'N'C TATTTCTAT ACCTATCAGA AACTGGTCCA AATTCGCAAG GGCTAGTTCG AGCTGACTTT GAAT'rGC'rTG ATACGGCTGA TAAGGTCTTT GTAAGGATGG CGACCGTCGC TTCCTAGTTG TGGCTAACTT GTCCAATGAA
GAGAATAGCT
GCTTATATAC
GAGCAAGACT
TGACAGTAGA AGGAAAAGTC AAATCTGTCT TGATTGAAAA CACTGCGGCT AAAGAAGTAC TTGAAAAACA GGTCTTGGCT CCATGGGATG CTT'rCTGTGT GGAATTACTA TAAATATTT'r
TTGCAGAAAA
AAATCCTTTG
T'rACAAATTrC
ATTTAAAA'TT
TT?1TTrATAA
CCACTATTAA
GAAATCGTAT AAAAACAAGG GAGGACTGTA CCAAAGTTTA TAAACTCA TTCTTGAAAT GGAGAAAGAA GATGAACATA AAGAAGCGTG CTTTGCr'rTT ACCCAAATCA 'rTCATACCTC GACCTCATGA GCCACTTTCT TCCTCCTCAT
TAAAAGACAG
'rCAA1TrAACT
TCCTTAGTGC
TCTCAACTAG
GAGGTCAGTT
AGGCCTGACT TTTGCATCTG ATGTAACTTA CAAAACCCCT TTACTTTCTG CTGTTCCAGT CTCCCTTGGT GCGTCACACG ATCGTTTTTC CTCGCTAGAT TTCCTCAAAA GGGCAGACTC Aq'rTTTTCAT CTrrArTrTT rTTTALATrcA TCATTALACA 681 CGCTI'TCTT CTAGGTGGTT CATAAGGAAC AGGAAGATTC AGGTrGACTr T'rCTAATCCT AGAATAAAGT GCTGAALAACA ATTCGGAATA GGCATAGAGA CTAGACAATT 'rGAGGAGCTG CTTGCGTCCT GTTCGAACAC ATTTTCCGG INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 1842 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: TCTACCCATG GACTTTGAGG CATTCATTGT TCCATCTTCT AGTGGCGAAT CTTTTGATAC 6660 6720 6749 AAACGATTCA ATTCACTTGG ATCCAGCTGA TATTTCTTTC ATAT'rGCTTC TCCTTTAGTT GTATCTCTTG GCAATAGAGC TTTGAAACCG CAAGAGGAAG AAATCCGTTA ATCTTAGATT TCAAAACAAT CTGAAGAATA ACTGAATGAC AAGATACCTC CTTTGTCTAT TTTTACTTAG CATTTGATAT TCTGACACGA AAGTTAATCG GTTTCTTGTC CCCTCTTTTC CCTCGGCTCG TCATCTACAA AGGCATCAAC TTTTGAAGAT AGGATTCGTA AATGTCACTT CAAACTCACA CAAATTTGAA TTGGAATAAA CAAAGACAAC TCCAACCGAT ATTCTTATAC CGTTCACTCT CTGATATTGA TTAGCCGTAA ATAGTGAAAC TCTCCCGCAA ACATTTTTCT GGTTAACTCA AGCCAAAATA ATGGACAAGT TCTCCCAAAA TCGTTCAGCC AGATAAATAA TGTGTTTGCG CCATGTAAAT CAATTGTTTC TCTAGCCTCT TCCAAATTCA GACTTGGATA AACTCGCTTA TCTGATGGTT AGTTCAGGAT TTT'TTAAAAT TATCTCAACG GTCACGGTTC TTAAATCGTA ATAAATTGCGG AGATAAAAAC GCTCATCATC TCAATTAATT TGTCCTTTGT CATTTCAGAA TATGCCATAG TTTTGGAAGA AATCTAAAAG AAGTTGATTT ATAGAGATCA ATCATGGGAG ACCTCCCAAA GATTCGGTTC TTAAGGAATC TAATAAATTA AGGAATCTAA TAAATTTGCG TTCATCATAA GCTTTTACAG TTACTTGGGT TGTAAGTATT
ATAGCCTTGT
CCCATTCTTT
AAATAGTGGG
TGGAGAGTAA
TCAAATAAAT
CTTTTAAAAC
CAAATAAAAG
CCATATAAAA
ATGTCTTGAC
TAGTITATGTT
TTTTGACTTT
AGCCCCATCC
TGAGTAAACC
TTTGGGGAGC
CAAAAACGAG ATTTTGATGA TTTCAAGGAA TTCCATAACG TTTTATGGTA ATCATCTAAA GAACAGCCTA AAAGTGCCAT TCATC.AATCC AACCTTTGCT ACCTTAACCT CCAGTTTCAT TTATAATAAC GCTCTGATGT TACGCTTCAT TATTGTCCCT CCAAGACTAA AATTCCAACA TrTCCAAATT CATCAAATCG GATTAAACCT ACTTGTTCCA TTTCATCAAC TAACTGAGTrT 1260 GCTTTTACCC AAATCATTCA TACCTCTCTC AACTAGATGT AACTTACAAA ACCCCTGACC 1320 TCATGAGCCA CTTrCTrCCT CCTCATGAGG TCAGTT?1'AC TrCTGCTGTr TCCAGTATCG 18 TTTTrTCCTCG CTAGATTTCC TCAAAAGGGC AGACTCCTCC CTTGGTGCGT CACACGA'N'T 1440 TT'rCATCTCG ACTGTTCT T AATGCATCAT TAACGACGCT 'N'TCTTCTAG GTGGTrCATA 1500 AGGAACAGGA AGATTCAGGT TGACTTTTCT AATCCTAGAA TAAAGTGCTG AAAACAATTC 1560 GGAATAGGCA TAGAGACTAG ACAATTTGAG GAGCTGCTTG CGTCCTGT'rC GAACACATTT 1620 TCCCACCACG TGAAGAAAAA GATGGCGGAA GCGTTrGATT GTTAAAGTTT GGAAGTCACC 1680 TCCAGCTAGA TGTrTGAGAA AAAGATAGAG ATTGTAGGCG ATACAGCTCA TCATCATACG 1740 AACTTCGTTT TTGATTAAGG TTGAACTATC CGTTTTATCG CCAAAAAATC CCTCCTTCAT 1800 CTCCTTGATG AAATTCTCGG C'FTGACCACG TCCACGATAA AG 1842 INFORMATION FOR SEQ ID NO: 86: SEQUENCE CHARACTERISTICS: LENGTH: 19390 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 86: TCATCTTTAT CTCCTCGAAA TTTTCTAATA TAGCCATTAT AACAGAATTT TGTGAAAATT CCTATTATAG TAAATCACTA TTTCAGTATA AAAAGAAAAA ACGAATCAGA CGATTCGCTC 120 TTCTTAAAAT CTGAAAATAG CTTTCCAGAA AGGATTAGCC GATTTTTTGC AGATTGAGCA 180 CTGCATCGTG ACTCATCAAG ACTTGACCAT ACTCTTGTAA GACTGAGCGA CTGATATCAC 240 TATCGTCTGC AAACTCGCGC ATACGGGCCA ACAGCCAAGC TGGATATGGG CTTGGATGAT 300 TTTCAATATC CACTAAAATG GTCAAATAAT AGCGCTCGTT CATTTTGTAG AGTTCAGAAG 360 TTTCCATTTC AAAAGTCACT GTCTTGGCAA AAGCTACCAA GTCAGCCAAC TTAGCAAAAG 420 AAAGGATGTA GTAGATGTAA GGTTCTTTCT TACTCTCAGC TTC'N'GTTCA GCCTGCTCTT 480 *GCTCTTCTTC CTTGACTTCA ACTTGCTCAA GAGATTGAAT GGCTTCGATA TCATCCTTGG 540 *TTTTGTCTGC GATGCTTTTT TCCAGGGTTIT TGATAAATTC ATCTGGAGAC ATTTGAGCCA 600 ATTCTTCCAT ATCTGGCAAA TCCGATAAGT CTTCAAAATC TAGATTTTGG TCAATCTI'TG 660 ACTTGGTCAC AAAGACATCT ACCTTATCAG GTTTTGGAGT CACACGGAAG CTCAACATGC 720
CTGTATCCAG AAAGCTATCA GGCATCTCTA GCTCATCCAA GATAGCATAA AAGAACTCTT 780 C1'TTTTC
CTAAAGATAT
ACCTCATACT
TCCTCTATAC
CTGCAGACAG
CATCAGAATA
AGAATTTGAA
TTGAGGAACG AGAAAGTCAG CGTGATT'TTT AAAGTTGTAT TTCAGTTCTA TCTAN'ATAC GGATAGA'N'T TCCCTAGGGT ATAGAAAAAG CCTTCAAAAT CTATAAT'rT AAAGGT'TGCT GACTITGC'rC AAATTTCTTA CAATCTCCAT TCCACGATCC ATCAAATCCT CACTAATTTG TTTCAT'rTTC A'rTGCTAGTA TAGATT'A CGATTTTATC AAAAGAAGGC CTTTCTATAG GAGACTCCPLA AAGAAAATr CGGCTAAGAG CCGACTTTGA AGACCTTATA ACACCGAGGA TAGAACGAI-r TAAGTTTC'rG TAACGAGTCA CTCCGTACTC TTCAACAAGA AGGACTGTAT CTCTTTCCAA AAGAGATGAT ACATCCTGTA TTTAAAGCTT CTTGACTCTG TTCAATTTA TCTAAGATAG GTCAA'rTCCI' GTCCAGTATT TTTGTATGAC AAAACATCTG ATCTCTG'rTA CAAAATCAAT TTGATACTGA GAAAAATCAC AATCTACAAA ATGCATTCCT CTTTATTTGA GCTAACGATG TTAAAGAGAT AAACTAACAC ACAACTAAAC GAAGAA'rCAG TTCAATTCTT TAGAGTTGAT TTA.AATTTCT CAATCAACCC GATGT-CAAAA 'rTTTCACACC TTCCACCAAT AGTCAI'CGT GCAATTTTTT TCAACTCACT CCATCAATCA ATTTCCCA'rA TTGTCCGACT CTTTTGGAGG TCTTTGACAG TGTTTTTATT ATTTCCCATC ACAACCAAAA ATTTTTCACA T'TTAAGCCAA GGTTTCCAGT TT'rCAA'TTT ATCAATT'TTT TTCTCTAACA AACCCCTGCA TCGTCAATCA TGAAT'rTTTC AAGGTTGT'TT
CTAGGTTAGC
CTACTCTATT
TCACACAAAC
GCGCTGTTTC
TCACATTTGC
AGTTATTGGC
TATAGTAGAC
CTGTCGTGTC
AATTGTTGTA
GATTGTTGGA
CACTCCAATA
ACCATTTGCG
ATAGGCATGT
ATCTT1'ACTT
GGTCAABTTTT
TAATTCACTG
840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 174,0 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 GGGTTCTACA TCATTGAAAA GATAAGCTCC ATTCAAATTA AAAATCACTA TAACCACCALA TTTGATGATT CAAAATCGTT AGTGATTTTA TAGATAAGAT AAGTTGAATA ACTTGTTGTA CCTrAACTGCT TTAATTGTAA ATGGTACAGC AATGAGAGCA
AATAAAGCGA
TCAGCTACTG
TTCCTTTGTA
TGAGAGCTAA AATATTTGCT AATAATGTTC AAACATGATT AGCATTTTTG CTACAAATAT TTTCGCTTTT TATAAAGATT TGCAAACAAA TT~TTTCTCCT TTGTTTAGTA GATACTAGTT AATCACAAGA ACAATTCCCC AGAATTGCAT GCTGCTTCTT GGCATAAAGA ATAGATTATT TGTAAATAAA TTGAAGAAAC TTTCTGAAAA CAAGATGAGT AGGGATAAAG CAAATAGGAT TGTCCTTGAG CGATAGGCTA CTTGCAGCAT GGCTATAAAT AATACGCCGA GTAAGAAACT AAGCAGAAAG ACTCCAATCA TACCATAGTC GGTATACAAC TCCATGATAT AACTACTTCC GATACCATGC CCTTTCAAGT ATTCCTTGTT CAAGACAAGA TAGGATAGAT TGTGGGCATA ACTATTACTA TCAATAGCTA GTTCCACACT 684 ATTGGTTGTA TGT'rCAAAGG CTTTTCCTCC GAAAATGGCT CCCAAAC1'CC CCCT'rGCAAA ATAATCAAGA ACAGGACCAA AAGTAAAATT ACGGAAATCT CCGTAAGGGA GGCTACTGTT AAATAGAAAA CCTCGAGCCA GAACACCAAA ACTAGTCCCI' TGTTTATAGA TAAAGTCAAG TAAGATATCC CAGAAACCTG TATGGGAAAC TTGGACATTA TCCCGTACAT AATTGAGTAC 'rCCCATCGCT AACATGAGAA TAGGAGAACC TACAAAAATC AATCCATTTT CCT'rTTTCAG TTTGCTCCCG CA'rAAAGTAA ACTTAAAATA AAGGGATTTC GTGTCCCAAT TGCCAAATGA GGAGACAAGC ACTGCTGTGG CCTGCAATTT CTTTGGCTTG TGCATAGACC GTAAAGGTAG ACAAAATGTA GGTAAAATAA TGCATAGTAG GCATAGTAGG AAGTCTGCAA ACGATACAAG GAAATAGAAA GGATAAGTTA GAAGAAAAAC TCCTAGTGAT ATAAACCTCT 'rrTAG.AGAAT TTCCTATATP TGCTACTTTT GTAACGAGCC AGAATGCCTC CTGTGGTCAA GCCCAGAATC GGCAAAACGA TAGGCTATTG GATGATAGGT ATCCAAAGCA GGTCGGTCTT GATACCAGAA ATACAAAAAT GG'rrAAATAG ATACTTGATA TCATTCCAAC AAGCAATTAA GCTACTAACC AAGTAAGC'rA ACATTATTAt TATTAAACAG ATACACAATT ATAACTGACT ATGGTCAAAC TAAATAATAA TCGTTTCCCA G'rTCTAATGT AATTTTTTAG ATTTTTCAAT ATTTTTCAGT GCTAACTTTT CTT'rAAACcC TAAACAAAAG CAAATmnAAAT ATAGTATTAG CTGCAATAAA GTTGCCAGAT ACATACACAT GGCAGTT'rAC TTTCAAAATT AGCCGTTCAA ATAACCGAAT ACAAAGCGTA ACCGCTTGAT AT'TrCTTCC TAGCTATGAA GAAATCATGA CAACTATAAA CCATCCCTAA AATAATCAAT AAAATAAAAT GGATTAAGTA AACAAGAACA ATAAAGTAGA CCACTTACTA GCGTCAAG6C TCAATCACTT GGTCACCCCC 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 35-40 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 AATATT'rATG AATAGGGCCA AGAAATAGCA AAGGCATGGC TTTATACTGA TCATCTAGCA GAAAAAAGCG ACCTCTTTCA GTAGATATCA 
ATATAGGCTA TCTATGGAAA TAGTAATAAT TAAATCAAAA TGGTAATAGG TGCAATCTGT CTCTTGATTA GTATTCCTTT AGAAATGTTT TGCTTTCA TCATTAGCAT AAGCACTAAT TCTT'CTCCCC TTACGGAAAA rrGGATTCCT CTTTTAAAAA ACGATGAATC TGAGAATAGG CTTCAAACTG ACATCTTATC CAGAATAAAG AAGTGGGCAT AGGCCAATCT AGTCAGGATA GTTTTTCACA ACTTCATTAT AAAACTTITTG AATCCTTCTC TGCATAGGGT T'rGGTCGTAA TAC'rATCCCC AGGGTTTAGT ATTAACCACA TACTTCTTGG CCAACTTGAT CATCTTCGTA AATCAACCCC TTAGGAAAGG ATAGGGCAGT GCTTATTGCA AATCGTCCCA CGTATTTTTT CACCTATGAG GAGAATCACA GACAAAATAG TCATCCTGAT TGGCTGACTG AGACATTCAT GACACCACAG CTCGAAACAT CCGCATC TTGAACTAAT TGCTCATATA AGCTCTGAAT CATTTCTGGA TGGATATAAT CATCTGAGTC 685 A6ATAAAAATC AGATAPLTCCC CGTGAGCCTG CTTCATCCCA 1'CATTTCGTG CTTGCGACAA TCCrTCGTTC T'TTTTATGAA GCACTGACAC CCTGTCATCT TGTTCAGCGA TTGAATCACA CAAGCGACCA. CTTTCATCTG 7TGCACCATC ATCAACAAGA ATAATTTCCA GA1MrTGATA GGTCTGCTTC TGAATGGAAG CrATCGATTT TTCTAGGTAC TGCGCCACAT TATAGACTGG CACAATCACA C'rAATTAATG CAGTTTCCAT GCTACTCCTC TAATAGTN'T TCTACT1TGTT CGATTT-GTTT TGTAATTGTA AATTGTTGAA 'rGAATTGGCT AGCCTCATCG ACATCAAAGT T1'GAGGCAGA AGTCATGTAA TTAGTAATCC CCTrGAGCTGC CTCN'GATTG CTC'rCAATGA TTTGTCCAAA TCGTCCrCT TGGGATAArr CCTCAGCCCC TCCAACGTCC GTAGAGATAA AAGGGAGTCC CAGACTCAAG GCCTCCACAT ACACTCCAGG AAAACCTTCT TGT'flAGACA TAGACAAAAG AACTTTCGTC TGAGATAGAT ACTGATAAGG ATTTrTTTTGA TAACCAAGGA AATGTACATA GTCCTCAATC CCATACTCTT TGACTCGTTT TTTCAGTTCC TC'IrCCATAT ."fl.
.0 tO 0O 0 0~ 40 S 0 CACCAGCCCC GATAAAATAG AGATGATAGT TTTTTCCCTC CTTCCACTAC ACGGTCAGAA CCCT'rAT'r'r' CCTCAATCCG GAGGAGCAAT CTCGATATCG ATCTTCTCTT GAGAT'1-r'TC ATCCATTGTA GATTGTCTGT AATTrAGAAG TATAA'rCTGG TGC'rGGTCTr TTT'rGAAATC CCr-ACAATTG TAITrCGCAGC ATTCTCTTTT AGAGCTATCC TTAAGAAGTT CTTCAATACT TCTrGACTTC TCT'TCTTTr'A GAGAACAACA GTGGTGGATT CAACATCATA ATCATCTTTT ACAAGCAAAC GACGAGTCAG TTGGTGTAAT AATCGTATC-A TCCGATAGTA CAGATACTTT TAGAATAGTC TGAAAATCAT ATAAACTTCC TTGATAGAAT ATCCAACTGG CTTCTATG3TG TCCATGAATC CAAGATATCT CATAATGGTA AAAGAAACTT TCTTGGAAAA TAAATTCTCA 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 0* 00 OS 9 0 0 09 5* 00 IS 0
TTCTCCACAA AAAAGCTCGT AACCATCTGG TTTGGCGATA ATCTTGAAGG GATTTTAAAA TGCGTACATG CTTTGGAACA GAT'rCATATC CCTTGTCAAA GTGCTCCATT TCAAGAATAT CAATATCATA CTTTTCTGGA TCCAGAT'rTG AAACAATGGT TGATAGAATC TTCTCTGCAC CACCTCCAAG AGAAAAAGAC CACATAAAAA ATAAGATTTT TTTCTTAGCC ACCATATTCT CCCTTGTATT CTGTATAAGA CTTATCCATA TCAGCGATGA CAGCATCATG ATGCGGTACC TGCTTGTCTG CTGGTGGAGG CGTCATATAA TCCCCAAAAG CAGTTCTGAG ATAGACATCA TAGCCGATTG GAATAGGCAT CTCTGT1'CCT TCAAATGGCA AGAAAAGATT GTCTTCAAAA GATGTGATTG GGTACT'rGTT TCTCATGTAG CCAGGACCTG AGCATAATTC TGTAATGCCA TCACAATCAG CCAAATCATA CTTAGTCAr'r TCTTTCTCAG CTTTrTCCA GATGCGATAA CGGAGAGATT TTGGAGTCAA*ACCCAGTAAA ATGCGACT'rC CCCATTTCAT GAGATCACCA
TGCTTTTCTG
TTTTCCGCT
CCATGTGGCA
ATGGTAATAA
TCATCTGCAT
686 GAATAGTTrTG CGCACAAAAG AGTGAATAAA TCAAGGCCCA ACGAACCTGT CAGCTGGATT TTTCGGATAA TAATCCAAAG GCAAAACATC C;ACGCCAGA AATCCAAATC CTGCTGATAA GGCTTGATAC AGGTGGNTT CTTGTCACGA AAAGATTACG ATCAACAAAA TCCTrGTGAC TCTTTGACAA GAAATAACGT AACGAGGCCA TAATTCTGCT AATTTCTCAT'r AATTACG AGGCATAAAA AAGTCTAGGT CGTCGTCCCA AGGAATAAAT CCCTTGTT'rC
CCGCCACAGA
ATCTCCAGAC
ATCGTATCGT
TCAAAGCCTG
CGAAAATCTC
GATAACAGAG CAAATCATGT TCTTACAAA TACGAGCCTG AATT-GCTTTT AAATCAGTCA TTrCATTATAC CACAAACAAG GGGTGAAAAT ACTGCTATCC AAATAGCTA'r CAAACTTTGA TTCAAACCAC GTCAC'TCA CCTTGCCGTA
TCAGTCTTAT CTACAACCTC AAAACTGTGT TTTTAGCAGC TGCACTr'rGA TTTTCArI'GA GTATTATCTT ATCTTAAGCC GATATTTGTT T'rGATCAACC AGCAGGCCCA AGCCCCCATA AGTCAC!CC!Ar 'rTC!TrrAATC GTCAATTTTT CALATACCATT
ATAAACCGTT
TTCCPAAGAGC
CTGTTAAAAC
AACGAGCATA
GGGTTGGAGA
GACGCCCATT
CAATCGCTGT
AATTATCTTT
ACTCGTAGGC
AACCATCTAT
,AGCCCCCGAC
TTCGACCATC
AGACTTCTAC
CACGCAAAAA
TAAAAATTCG
TGTTAGGAGG AAAGTATAGG GTACGTTGGT TTCAGAACCA GTGAAAAGTT TAGTGGGATC TTGTTGCTGT CTTTTTGTAC GGCCGTAATC ATTGAGCAGG GTCGAGCCAT TCATCTGCTG CTCAGTICTCG GTAGCGTATA AATCATCTCC CAATGTTGAA AAT'rGAGCAT CAATCGTCAC GGCAAAGGCC TGGAAATCAA CCAAGGCGTA CAAGACTTGG CGAACCATTT CTGCCCCTTT TACGTTTAAC TTGTTATCTG TCTG'rrrTCT ATAGACCAAA TTATCACGCA TGAAACTGAC AT'T'AATACC ATAATAGAGT CAGTTCGTGT AGTACCCATG ATTIAAAATAT TAACTCCATC TTGAGCTGCC CGGGCATCAG CAGTTTTCTT CATGAATACC ATdGCCAAAG CCACACAGAC TTAAGACG AGCTTCCGTC TTTTCTTT GAAGGGCACC AATAGCGCCT AGGCCACAAA ATAT'rCAGCC TAT'rGTTCAT TATTC'rTTCT CTArTGCAGA CTGTAAAAAA TTTTTCTGTC TTATACTCT GGTATAGGTA ACTGACTTCG CTGCGGCTAG CTTCCTAGTT CATTTGAGCG AGCTTGGTTT AACATCATAG GCATCTACCC 'TTTGCTCCA TCCAAAACAG TGAGGTCATA GCAAAAACCT TTTAATTTGC TCTAAAATTG TGCCTCATCA TCATCACGGA T'rrTCCGACT TTAATGGTTr GACTGTAGCT TCTGTTAGGG CCCATCAGGG AAAAGCGTGT GTACTTAATG TCCAAGTCAA TTGCCCCTCT TGTTCTCCTA ACCATTAATC ACTTGACTAT TAGCT'rCATT TTCI'ATCTG CTCAACACTG TTCTGGCCGA TCTAGTGTCC TGACCATTAA TGCGCTAGCA TCTTGGTAAC CAAAAGTGAA AAAATCACCA TGGAGGGAAA GAGAGTGCTT 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 GTGATTTGGA TTGTGAGCGA TTTCTTGTTC CAAGCTAGAG AGGCAAACTC ATTTTTCT CTCCCGTMCG CATAGCTTCG CTACTATTTC CCCTAGCAAG CTCTCATTGA GATAGTGAAT
AACCAACTG
CTCTGATATT
TAATTCTTTT
CTCATGATGA
GGATAAATAG
TGGGCAAAGT
CTTAAGGGAT
GATAGGCACC
GGACCAAGTC
TCTTGAGTGG
GGGCCTTATA
CAGTTTGTGC
TTrCTTACT
CAGAATTTTA
CTTATCGGTA
ACAACT'rTCA
AAGTGCACCI'
TTGCAAGGGA
TAAGTCAACC TGCTCTTCTC AGTTAGCTTT TCTTGCAAAT A7TTTTTAGC AAATAATCAT CATCTT'C'rCT CCTTTCCATG TACTGGAT'rC CAATCGCTTC TAATCCACAT CGATAATGAA ATTTTTGTCA AGTCAATTCC GGAAGGTTCT CAGGTAATGT ATACTAGGCT TTTCAGCTCC AAAGTATTCA CCCAGTGCTG TCGCCAAGCA AAGG'rCGACA CAAGGCCAAA CTCATCTTT'r TAGAACCCAG AAACGTGTGA AATTGGCTTC CATTTCCTGA AACTTCTAGG TGCAACTCCT CAATCCATAT TCTTCAGCAG TTCGGAAATA AAACGGGCCG CATAAGCTGT ATATTGTTCA TCGATGAATT TC'r'rTCCTTG TTCAATCTTA GTATGGCC'rG GAACCACCAT TGCTACTGCT TGGATGTGAG CCTGATGAAA CTCAATAGAA TTTTCAACTC GCACCACA GA GATGACATCT GTAATGTTGG CAAAAGCCTG AACGTGG'rGT GAAAATGATC CCTTGGGACC ATAATTTCCT CTGGGCTTAG CT'rGGTCACA CAAGCCTGTC TTTCTTGGAA AATAGCTACT GGTCGCTGAT TGTCCTTA'rC AGCTGCGATA ACTAGCTGT'r
AGCCAAGGCC
CAACTGCTGA
AAGATAGTCC
ATAGTTCACT
CPLATTCCTCA
ATATCCTCGG CAATCAGT'rC CCAAAGGGCT GGTCTGGATG ACCTCGATTT CAGCCTCTGG TGTGGATGTG AAAAAATCT TGAATACGCT GAACGATTTC AAGGTTTCAT GAACACTACC AATCCTTGCT CATAAGCrT'r TGAGGAAAAG CTGTCTGCAC 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 -~8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 TAGATAAGCA ATTTTCATCT TAGTTCCTCT TCCAAAACCC GACTAGCCAC TTCCTCATAC AGTTCr-rCCT TGCTATTATT TAGAAAAAGC CGTTGGTAGA GGGTTTCAAA ATC'rGCTCTC ACCTAGATGT TATCTGTATT AGTCTTGAGT AAGTCACGAT TTCTCTGAGA AATAACCACT CCTCCTCC.AG TTGACACGAC TTGGTCTGTT TGTAGTAAAT CAGCTAGGAC TTCTGATTCT ACCTGACGAA AGGCTGTTTC TCCCTTTTCA GCGAAAAAAT TCGCAATGGA CATACCTAGG CGATTCTCAA TCAGAGCATC CATATCAAGG TAAT'rAGGGT CCAAGCCTCT TGCAATAGTC GATTTTCCAG CCCCCATAAA CCCTAATAAC ACCTTAGCCA TGAATCAAGC TCTCCAAATC ATCAAAGAAA CTAGGATAGC-TGGTATTGAT GGCTTCTGCA CGGTCAAGCT CCACCTCTCC ATC'rGCAACC AAGAGGGCTG CGATAGCTGT CATCATGCCG ATACGGTGGT CACCAAACGT ATTGACTCTA GCACCGTGAA GAGCTGA'N'r TCCTT'rGATA ATCATCCCA'r CTGCCGTAGG ACTAATATCT GCTCCCATAC CTTGACCTTG AGCTCCTCAG CAGGGCAATA ATGGGCAATT TCCNi~'rCAAG TCAGAAGACT AGTTATTT-CC AATTTTCCAC GTTGATCCCC ACArTCTGCA CCAAAAGGCT GCACTGGAAA TGGCCCCTGG ACTGTGATTT CATATCTTCA GTATGATTAC TTGTAAGGCT GCAAACATCA ATCAATAGGT CTTAGGTTTT TTGCCCTGAA ATGCTGACGC TTTGGAAAGA CTATCATCTC TGAAATCAGG CGAATCGAGG TAAGCCAGCC ATGCCTACAC ACCAAGGTCA CGAAAAACCT AACCTTGGTC TCACCCTCAG GTCACCTGGG ACGCGGATAC GGACCTCATA CTTGCAATAC AATTCCTGAC TTTAGGATCG 688 TATTAAGGC GTCTGCCACA ACCTGAATAC GGrTTTC CATCCTTGAT AACTGMACA CCrGGGCTT GGGTCGCAAG CATCAATCAA TCGTGGAATC AAAGCGCCAC CAATCTCTGT
CAACAATCAA
CCATGGCACG
GCACTAGACG
TA'rCTCCTGG
TCTTACCATC
GGGTGTACTC
AGGCTGACTT
TCGTCCCTT
CCATNTTT'T
CAAACATCTC
GGTAGCAGAT
AATGACATCA
AGAATTTGGA
TACGACCACC
CACACT'rAAA TT'rTTCGATA GACT'rGGGCA
TA.AGCGAAGG
CAGTGGAAGG
TACTTCGAAA
TGCCAGAATT TCCCATATTA CTTGAATGG'r AATAACCCCA GCATGGTCGA AAGAACGTCT CCAA.ACTTCC AA.AGATAATG TACCATGTA.A ATGGCGAATG TTT1TACCTAT TTTATCATAA TTAGCGACTG GATCGA'rrTC ATAATACCGG TGcGAGTIT~C GCAATCAAAC CTGCGACTAA TTCTGTCCTG TCAATTTTTG TCACCACCAA AT'rGTTTCAA ATAACTGACT CCCCCTTAGC GAGCCAATTG GCAACTCATA GGAGGCAAGT CTCGTTCAGT GTCACACGGT CCATAGGACG TCTGCACCAG CAAGGACACC AGGGCAT'ITTT GTGGCGCTTT TCTTTA'rCCT CAATTTCAAC TCACCTCGCA GAATATCATA GAACGGTGGC TGATAGACTT- TTTGTTTTTA GTTTCATAC'r AAAGCCAGAA ATTCCTTAAA TTCTGAAACT GGT'rCAAAAA TTGCT'rGGTA AAGAATTCTG CATCTTAGCT TGGAAGGACG CTCTTGAAAG GCAAGATAAT ATCTI'TAGCT GCTTTAACAC- GTTTGCACTA TCGTAAATAT CAGTATTTTC TGTTATGAGA TTTGTAGCAT 'rCCGTGAATA 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400
TTCTTTTCTT
CAATTTTTTC AATATCAGAA AGGTAAATGG ACAAGAGGCT ATTTCCTTGA ATCTGTTTAC
ATTTCAGCAA
CCAATTGTTG
CAAAGCCTTC
CATCTGGCAT TTGACCTGTC TGTGCTAGTT TTTGAATTTC CTGTAAAGAT TTTGC~rGCC TCAGCATCTG CTGCAATCGC CCTTGTATTC TGGTAATCCG CGTAGACCGC GACTGAGTTC TTGACATGTT CTTC'TCCTTA TTTGATGACG ACTGTATAGT TGCTCAGCTC TTTCCAAGTC TTGAGCATTT TTAAATGAAA TCCTCACGAT TT~TCCTCGTT GATGTGGATA TTAACCAAGG AAGTTCCACG TAGCAGTT'CC AAAATCCGCA GGATGACATC TTCTTCATCA GGAACGTCAA CATAGAGGTC GTAAGAGCTA TCCACACCAC CACGCT'rATG GATTTCCATG GTC'rGGCGTT GTTCACGCGC TTGGTTAAAA
AAGT~TCCAAA
TCCTTGAAAT
CACATTCCrG
CGCCTTGCCA
AAAATATGAG
TCGATAAAAC
TTTGCTCTTC ATCTCCCT'rA CCTCAATTCT ATCCAGAATG GCTCGCTr'rC CGCAATTCGG TCTCATGCTC TTGAGCATAG GAAAATGGCT AATCTGAGAA GAGCATGAAG ACCTGAAAGC 689 CATGGCCr GACCAATCGC TTCCAAACGT ATCTCCrAT TGGACAAGAG AATGGAGGTC GTCATATCTC GAAAACCACC TGCCGCAAAG ACCGCAGTCT GCTCCATGAG ACTAGAAGCC GTGACACGAT1 CATGCTCCTT GGCATCAATC AGATCCTTCA TTTCCTTAAG CGTGTCCTGA TAGGCATTTT CAAAAAGATT GACATCTCC CTTGTCAGGC TTGAAGGTGT AAAGATATAA GAAGCAGCCC CTGTCT'rGTG ACTACCAGCC ATGGGATGGG TTGCCAGCCA AATACTGCTC CGCCGCATCC ACAATGGTTG GAAATAATAA CGCCTTCTCG CAAATCCAAA TITGGCCAACT TGTTTGATTG GCAAGCTGAG GATAATGACA TCTGCCAAAG TCCGTTGCAC GGTCAATCAT'ACCTTCTTTC AAGGCGATAT TTATAACCTA AAATTTCA'rA ATCTGGATGA TCGCGTTTGA CCAATCAACC CAAGACCTGC GATATAGATT GTTTTTGCCA TTTGTATAGT CTCGGTGTTT GGCTACCGCT TCTTTrAGTT AATTTTTCGA GGATTTCTTG GCAGCTGGAA GAGCAGTCGG TCGATATCCA CACTrCATAAG ACAACGATGG GTTGCCCATT CGAGTATAAC CGTCTTCTTT TAACCAGCCT CAAAGCCAAG GCTTGAGCCA ATCTTGCATC GGAACGCCTC CGACGACTGT TGGTCAATAT AGTCCTTGAT TGGGCAGCTC TTTGCT'rAAT CCACCAAAGA CCACGACATG GCTACTGCAC CAACTGCCAC TTTCGCAAAT CATCAAAACG GGATGAGTAA TTTTCCGCTT CGCCAGAACC GTTGCTACAA ATCACTTCTC TCCACGGTTG AGGTTTATAA AGAGTAGGAA AGTCATACCA CCTTCAAAAC AGACCAGAGA AT'rTCATCCA ACCAAAT'rCC ACCCCTTTAA CAATTTTCTA TCCCATTGGA CTCCACAACC CCACCGATGG TTrCCTGTTCT CGTTCT'rGGT TTCAGCGACT GTCAGATTTT GTTGGCAATC TCCATATCCA CCGCATGGTG GTTTCACGAG CCCCGACAAA GCGAACAGAC ACTTGGTCGA ACCAGCATCT CCTTAATGAA AGCAATAGTT GAGCAAAACT AGCAAAATCA CTCTCGAAGC TTGACTACGA TACCAAGTGC CATAGAGGCT TAGGAACTCC TTAATAGTTC CCTCAAGATT ATCTGATGAG CTGCTTCCAT GACCATTCCT CCTTGTAAGG TTCGTGGGTT 'rGGGTTTCAT GACCCCACGA CACCTAGATT ATTGGTACGG TAACTTGGCT GCCTTTACGA 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660- 12720 12780 12840 12900 12960 13020 13080 13140
AGGCATTGAT
CATAGGAACC
AGAGACAACA
AAGACCAACT
TATCACCATC ACGTTTGATT TGACAATAGA AACTTCAGAC CAGGAACATC GATTTCCTTG GCTCAGCCAA GAGGCGTTTG CTGATGAACG CTCCAAAGAA GTACTTAATC CCCCCAACCA AATCGGCATG ACCTGGGCGA GCTT'TTAAGG CGGTCT'rCAA TGTCCTCCGC AGACATGATG
TCCAGCCATT
'rTCCCGTGGC
CCACGACCGT
GGAAGTCCAG
TCTGGTGGTC CTTATTrGATG GAACGCCCGA AGTAAAGACA AGCCACCCTG ACCGCGTCTA CTGGAATTCC CTCAATAATA 690 ACATCCATAG TAATAGGCC
GCAGTrAAAT ATCTCATACA CTCTCCTTAT GAAACTGGGT GAATGGTCGC TGAACCAAGC CCACGCGCTT TCTTGTCATG AGTAAGACCC TAGTCAACAG GCAAACCGAA TTTCTGACAC GGCATGAGGC CTTTTTCCTC AGCAACCTTG TCTCCA'rGCA TGACCTTGCC ATAACCGGCA CCAAAATTGA GGTAAAGACG AATACCAT'rG TTCACCTGAC AAGAATGTTC AATCAAGGTC CCArTTCAGTC CCGTCAAGAG AGCCCACAGT ACTTCACCCA TCCCTCAAT CAACTCTCTT ATCAGAACCC CATCTGGTTG GGCAAAGGTC ACGCCTGTC'r TrCCACCGAT AGAAGAATCA ACAAAG1'GAA TACCCCGCAT ATAGGTAGAG CCACCACCAA GAGCAACGAT TCCATCGCTA TAGACTTI'CT GAACAGTAGT TAAATTC~ITTT GCTACCTGAA AACCAGCA'rC TTCTAGGCTG ACATGGT'rAT CTGTCACAAT GACTACCTTT CCAGCCTGGG CCATACAACC TTTT'rCAATC ATTCTGATTT TCATAGGAGA GCC'CCCTTT ACCTGGTCA'r TCTCAATCTT AGGTCCTCAT TGATATCCTC GCTGTTAGAC GGGGGCCGTG 'rrTACCAAGT AGTCmTCAT TCTGGCACCA AGACCAATTT TGATAAAGCT TGCCAACTTC ATCTCTGTGA TAGATTGGGT GAAATCTGTA CCATTCCCAT GTCGCTTCGA 'rGGCATGGCC TCCAACTCAT CTTCAACCAC TICTGCATGTT CCAAAATACT CTCAACAGAA TCTGGATCCT CAATCAAGCC ATACTTGATA TTTCCGAGGG TTTCAAGAAC AAGTGGATCA CCCACCATAT 'rTrTAGCAAA TGGTGTATTA ACCTGAGCTG TCAAACTAGT CGGAATCTGA GCTACAAATC CAGCCAGGTC CCCAACAACG CGAGTCAGAC CT'rGCTTGAC TAGAAATTCA
CCCTGTCGTC
CATACGACCA
AGCTGTcAAT
TGATTCTCCT
CTCTTCCAGA
CAAGGTGTrA
CCAATTTCA
AATGCCAGCT
GGCAACACCC
AATAGTGTGG
13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940
CTTTCTTCAC
AGCTTGACCT
TGCGGTTGCC
TGAATATCAT
CTTTATTGGT
CTTCTAAGAA ATCAAAAACA TCTCTGCATA GAGAGAGGCT AGAGTTCTCG CAACCACTGA AAGGATGGTG AGGAATATCG ATTTTTCTGT TAAAGACTGC GAAAAGCTTC TGCAGCTTGA GACTTCTAGC CCATTTCAAA AAAGAGTTTC TGGTAAGACT CACTGGTGGC ATTAACTAGC
CAAATCTCTT
TAGAG'rAACA AACGG'rGTTT
ATGTTTTCAG
AAATCCGACT
ACT'rTAAAAC
CTGTCGGCAT
TTCCCAGACC
CAAAGGGTTG
GAACAGGAGA
CGGCAATCCT
CTGTCTGCTC
TTCCT'rGCCT GTCCACAGTT ATTGACTGCT GGATTGCCCT GTATATGATA TCTGCAACTA GGATTGGCCA TCCATGCCCA TGCTTGCAGT TCAGAAACA1' CTGTAACTTG TCTAGGTAAC
ATTCTAAAGC
GTCTTGTTTT
ACACAAATCC
TTCCATACAA
ACGGAACGAA CAAAGACCGA AATCTGACTG ACGCCATCCA AAATACCG 'rCCCAAGATT 691 GATTTAGCCG CACCACCTGC ACCCAGCAGG GTCATCTT TACCTGAAAT TGTAAAAGAA GCCAAGCACTr TAAAAAA'rCC CTTGCCATC1' GTATTATATC CAATTAAATT GCCATTCTCA TTGACAACCC TATTAACCGC ACCAATCAAG CGCGCTChT CGCTCAGC'rT ATCCAAATAA GGAATCACCT GCTCCTTATA GGGCATGGAC AGATTGATGC CAAACA'rCTG GTAGCGACGA ATATTGGCCA CTGTTTCTAC CAAGTCACTC GCT'rCAATCT CCCAAGCCAC ATAAGCACCG TTGGTAGCTG TCGCCTCAAA GGCTCTAr'rG TGGATGAAGG GAGAAATAGA ATGCTTAATA GGATTGGCAA CAACTGCAGC 'rAAACGTGTA 'rAGCCATCAA GCTTCATCCA AAATCTCCCT GATTTTTC ATGCTAGCTA GAGAAA'rCTG CCCAGGGGCA AAAAGACCAA CTrCGAACCAG TCACA'rCCGC AGTGATACGA CATAGAAATG GTCACATATT CCTGTTCAGG ATTGAGGGTT CATCAAGTCT AAGACATCCT GCTCCGTGTG AGCCATCACC ATTTAGGATC GTCAACTC'rG ACAAGATTrC CATCATGTTC ATGGTAACTC AAAACAAGAT TTGGGAAG'rC CAGCATr-rCC ATAGTACTCA AAATCAATAT AGTCTGG'rTG ATAGAGTTGC GATATACTCq' TCTGGAGAAA GGTCGATTTC TCCACCTTCG AACCAACTCA CGGCCTGCGA AT'rrITCAAA AATGGCTGGA TTTAGGCAGA TAGTCGGCAC GCCATTCAAT GATGTCGGCA CAGAGCCTGA GCCTCCTCTA AACTTCTTGG CATTACTGAA ACCT'rCATAC TAATCACCTT GAGGTAATTA CTACTTTCAT TCTGCTGGAA GACCATATT'r GTTTAAAATC TGGTAACTTC ATTTGTTCTG TAAATTTCTG ACGGGAAACA TTGGCAGCAT CCTCCCGGAT TTAAAATCTC AAGACTCTGG GAAA'rCAACT GAGAAAGTTT GTTT'TTTATT CCGAGCAAAG CTAGGCGGAT GTCAACTCTT TGCGTTTGGC TA1GA TACTCAAAGA CTAACCTCAT CCAGACTGGC GAGACCTTGC CC-ACCTTACC TTAAAGCCTC GTGTATAGTT GCAACCTTAA CAAG7'rTTGG TCAGGTGTrr CTTGGAAATr' TCAAAAACAT CCTTGTAGCT GCAACTTCCT TGATTAGATG GAGCGAGTTC GTAGCGTGAA -GCTACCTCA AATCGCTTC.
TCCAGGTACC TCGTGGCATC ACGATTAATr TCATTTACTA CTTTTTTATT ATAGGCAAAA TTCCTGCAAA ACCTTTATCA TGGTACTGGC AATGATAATC TGTGATAATC CTTGGCCACA CTAGGACAAT CACATCGTAG CATCCATGAC TATAAAACGA 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900.
15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 TGCTCGTCTG TGCTGAGCCC ATTTGCCTGA AAATGCGCTT GAGACAATTC TCGTGAACGT TTGGCTAGAT CAACAGAAGT TGTATGGCTA GCTCCTCCCA TGGCCGCAGC TACTGAAAAA GCCGCTGTGT AGGAAAACAT ATTGAGTAAG GATTTACCCA TAGCCAAGCC GTCAACTAAA CTACCGCGAA CCTCATGCTG GTCTAGGAAA ATTCCTGTCA TCAAGCCATC ATTCATAAAG ACTTGATACA GGACACCATT TTCTAAAACA TTGAAAAAGT CAGGTGCTTC TTGACCATAA ACATGGGCAG ATTCATAGTC CAAACCCTTA TCAGGGAAAA CCTGTCTAAA GGC'TTCTGAT TTATACCAAG AAAAGACGGC GTAGTCGCCA TCTCCCTCr'r GATTAAAGAG ACGAAAGGCA CTCTrCTr TGGCTTTTCY AAACAACGTT TCr'rTCCTCA TAAACCAGCC CAAGCCCTTG AACTTTCCTT CCTGACCCTG CACCTCTACT TCACTGGC?1' C'rAGTAAAAC TAGCCCCTTA CrTATTCTAT TCATAACTAC CATTATATCA ACCCT1'GCTT TTTTACTCTT CTTTTAAAAA AAGTAAATGT GTAAGAATCA CAGTAAAAAA TTATCA-ACGA AAATCAAATA GCAAAC'rATG AGGTAGAGAG GTTGTATCAG CAATATGTGT 692 AAGCGGATTT TCTCATAAGC ATAGTCTGAC GAATCTGATA TAAAGGTCCA CTGTCAGACC GTTGTCAAAT CATCTTGATA TCAAAGAAAG CTTGATTGAA TNTTGCTGAG AAAGGTAGGC TCCTGATCCT TAAGATTGAC GCAAGCTTCT TTTCAACCCT
TCCTAAAACC
AACATAAGAG
CCCAAAGCCA
GTAGGCGTT'r
GGCCACCTTG
AGTCCCAAGA
ATTCTCAAGA
TrTTCTGACT AACTTTTAGA CAATTCTCAA AAAAGAAACT ATGGTATACT AGACTTCCTG CAAAACTACG TGCTCTTCCG TCTTGGACGA GCATTTCTTT AAACTAGCCT CAGGTTAACT GTGAGATTAT CTGTCAAATT TAGTGACAAA GGTAGTAGAA GAAAGATAAA GAAATAAATC AGCTTCAGTA GGTATCTGGA AAATTTGATT TTATAGAGAA GCCTTTTGTT ACAAACTCAA TATACTATCA ATAAATAATA TTATAGAAGC AACAATAA'TT ATAATT'rCAC CTATCTGCAT ACTrGGAACA CACACATrAT TAAACTATAA ACAACAAAAA ATATATTTTT GATTACCAAT CATTCTATTT CGAACTCTAA ATATATGTTC TATCAAAAAT AGGAATTAAC GTTTTTGAAA TTGAAAAAMA TCCAAATAAA TAGAACTATG TTATATTTCT TATTCAAAAC ATTCCTCCCT CTTAATCATT TACAACTACA TTCTAACAAA CTATAAAAGC CAAGCAAGCG ACCAACCAGT TCATCTTTTT TCTATTTCTG TAATGATAGC CAAAAATAGC AAGAGCAAGC AAGACGATAA 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 i7700 17760 17820 17880 17940 18000 18060 1.8120 18180 18240 18300 18360 18420 18480
GTTTGTCGAA
CCAATATGCG
GAGCTCCTAC
TCCCAATTAC
TGCCATTTGA
CGTCCTTAAA
GGTAAATCAT
CTGGTAAAAA
TTGAATTTAT
TGACAGGTAA
TCCCAAGCTG
AAGGGCCATC
ACGATTCACC
GTAATGGTGG
CGAAATGATG
ATGGCAAGGA
AGGATTGCAC
AGGTCCGTAA
CAAGAAAGGA
GGCAAGCCTA
TAGGGGAGAG AGACTGAACC AAGAATATGC TATAAATAAA CAATAAAACT ATGGCGAC CA TGCTACTCCA ATTGGTTGAC AGATTTTTAA TGACACTGGC AATGATCCAG ACTACAAGAA GATATAGAGA AAGACCAAGC AAAGTCAGAA TCCAAAATTT CACTTTCACA TAACGAGCAA CATTTTCCCT CTCCAAGGAC AAGGCAATTG CTGCTATAAA GAGAGCTATA AAAAACAAGG CTATCTGAGA ATCTCGGATT TTGGAAATCA GGACTGGACA GCATATATAA AGTCAAAGGG TAAACTCTTA AGAAAATCAA AATGCAGGCT GGTGATATI'G TTATTGACAA GTAACCAGTA TGGAGGATGA ATGTCTGGAA
GACCGATCAT
TCCCCTGTCC
TGGTTGAAAT
GGATCACGTC
TTAGTCCTAT
TTTTGTAAAA
CATGAGATAA GGAAGGAAAG CAAGAGGGTG AGGTGGTAGC TCTCrCCT'rG CTGCGACGTT ATAAAAACGA GGAAGGACCT CCAAGCGAGC AGACCCACTA GA'rATGAAGA GGATAAAGGA 693 CACTTGTAAA AAGCACTGTA ATCACGCCAG GTAAAACCAT GCGAAAAAAT CCCTT'rAG C'rT'I'GAC CTI'CTCCTCA CrATTAAGCA TCTTTrTGGT CAGATAAAGC AGGAAGAGAG AGGCTrCTGT CGAAAAAGGC TCCACTGCTA GAAATGGAAT GTCTCTAACT TTGTCAACAA TAAATATTAA AGGTATGAGA ACTCCTATCC ACT'N!CTGAA GACCCTAGTT TGAGCCAAGA CAGAGACAAA TAATAAGGTC AAGGACAGTA GAGCTAGCCT AATGTAGAGG ACCAGAAAAT CTGGCAAAAG GACAGACACT. CCTr'TAGCAA CAATCATCAC ATTCGAAAAA ATAGACTGAT 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19390
AATGCACTGC
GCATCAAAGG
AAGCTAGGAT
CACTACCATC ACTAGAGCCA CAAACCCAGC CATAGAGAAG TGGTACAATT CCAGTTAGAG
TTATAATCTC TGATTCTTTA AAGGCATAGG GCCTATACGA TACCAAATCC AAAAGACATT GTAAAAGGCC GTTAAAGAAG TTGAAAAGGC AATCACTAGT TCATCGAGCT AAAATAAATA GGTATTTCCTr CAAAAGGAAA ATGAATGGCT AACAGATGAT CATCAAGAGA CTGGAAAAAA TGTAAGAACT TAAGACTCTA
TTACTTTTTT
INFORMATION FOR SEQ ID NO: 87: SEQUENCE CHARACTERISTICS: LENGTH: 18436 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear TTAC'rCTCAT
AAAATAGCAA
ATATTACTAA
GCGGAAACAT
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: CCGAGC!G1CG TTACAGACTT TATCAAGATT GGACGCAAGA AG AAAATATGGC ACAATCTCCA TGGCATAC'TC ATTACCATGT TG~ TCAACGACCC AAATGGCTTT TCTTACT'rTG ATGGCA.AGTG GA! TTCCTTTTGG TGCAGCCCAC GGTTTAAAAT CTTGGGCACA GC TTCACTTTAA AGAAACTGGA ATCAAAGTTT TACCAGATAC TC( CCTACTCTGG TTCTGCCATG CAATTI'GGCG ATAACTTATT CC' T'TCGCGATAA AAACTGGATC CGTCACCCAT ACCAGATCGG TG( AAATTCAA CATATAAAGG PGCCAAAA ACAGGACTTC rCCTCTTT rAGAAAGT
CATTAGAT
TACCAGAATT
GATGATTTGA
AGCCACGGTG
TATTTTAT ACAGGAAATG :TTTGATG GACAAGGAGG
G'rAAGATTAC AAAGA'rTGAC TCCGCGATCC ACAAATTTTTr ACTTGGAGAA AAAAGGTTTC GGCAAGCAGT TCGCGACCTT CTAATTTGGT CTTTGTAGAG AGAAAGTTCT AGACTACGAT ACCCTAAAAA TGCCAAAATG AAGCCTATGC AACTCAAGCC TTGGTTTGCC AGATGTTTCT TGGTCAAGGA ACTCACTATC AGGACCT'rCG TGCTTCTGAA AACTTGAACT CAACTTGGAA AAGGTAAGGG ACTTTCAATC GCCAGGCTGG AGAACAGTAT ATCAGGCTAC TACTGCTACA AAGGAGAAAA AGTATTTTCT TTAAATCTGG AAACCCAACT GATGTCGCCA AACTTGCAGG GGGTATCTAT CTGAGAAAAC AAACCCAACA ACCTGGCTCG TTCCCCAATA TTTCCAATGT TTCAAAAATG GTTACAAGAC GAATACATCG AAATGTTGGA CTAGGAATCG AAGACTACAA TCGCCAGACA TCCCTGTCGT ACCTTGGTCA AGACAGGTGC TCGCCAACCG GACTGCGCCA AATGTTTCCA GTGACTTTTC CGGGAAAAAC CAGATGCCAT ATCGCTCAAG AAT'rGGGCAT 694 AAGATCTTGA TTGACCAGCC AGCAGACTCT AACTTTCAG GTCAATATTA TGCCATTGTC GTTCGTCTCT ACAAGGCTGT CAATAACGAC GACTT-rGCTA ACGACCGTAC TGCCTACATG GAACAACCTG TCCTTCTCTA CTGCCACAA AATA'rCT'rTC CAAATATGTA TAAGATCGGG GCTTCCTr'rG GTAGATGTGT CTCAACT'rCA AAACATGGAT TACGGTTTCG TTCAACGCTC CTGATGGGCG TGCTCTAGCA GTTAGCTGGC TACCCATCTG ACCGTTT'rGA CCACCAAGGA ACCTTCTCTT AAAGACGACA AGCTCTACCA GTATCCAGTC GCTGCTATTA GAAGCCTTCT CAAACCGTTC CCAAACCAAG AACACTTACG GCTAATAGCC AGAGCGAGAT TGTCTrrACTT GC'rGATAAAG AACTTTGACC TTGTAAACGG TCAAGTAACA GTGGATCGTA GCCCAAGAAT TTGGGACAAC TCGTTCI'TGC CCTATCGAGA ATC'rTCATCG ATAACTCTGT CTTTGAAATT TTCATCAATA GGTCGTGTCT TCCCACATGC GGACCAAAAT GGTATCCTGA GGAACTTACTr ATGAA-TTAGA TTATGGTCGC AAAACTAACT CGTCAGTCCT ACTACCGTTT CTCGGGTTAT CAATAAAAAA CATCCAAAAA GTCAATGAAG CC-ATGCGAGA ATTGGGCTAT TAGTCTGCAA GGAAAATCAG CTAAGTTAAT CGGCTTGATT TTTCTATGCA GAATTGATTG ATAAATTGGA ACACCAACTC CATCATCTGC AALCAGTGAAC ATGATTCTGA GAAGGAACGC AGCCAATCAG GTGGACGGCA TCATTTCTGG TAGTCACAAC TCGTGTGACA GCGCCGATTA TTTCCTTTGA CCGAAACCTA CTCCTCTGAC AACTATGCTG GTGGGGTTCT TGCTGCCCAA CCAGTCTATC ATCATGATTA CAGGGAATGA CAATTC'rAAT CGCTGGTTT'r GCATCCGTAC TCCCAAAAGC TCCTATTATC TCCCGTCAGA AAAGAAATGG AAATCAAGAA TATCTTGACC' TTT'rGCTTCG GA'rGATTTGA CAGCTATTCT GGTCATTAAA TTfCTGTCCCA A;LAGAGCTCA AGGTCATCGG CTATGATCGGG
ACTGACCACT
GGCGGACAAG
TACACAAACT
ATGGAATGTC
GGATTGGATA
695 ACCTACTTTA TCGAAAATTA CTACCCTCAA T'TGGCTACTA TCAAGCA.ACC TTTGGAAGAG AT1TGCT'TGTC TCACTATTGA TCTTCTCTTG CAAAAGATTG AAGGCAAGGA AGTCGCCACA ACTGGTTACr TCTTACCAGT TACGCTATI'A CCAGGAAAAA GTATTTAAAC TCAGACCGAT TCGTCTGAGT TTN'rATGATC T'rAAATTTTC GAGATAGCGC CTAGGTTAAA GGTT TTATCT GAGATGAGGC GCTCTACTAG GGGAGCAACT
TAGCCCCAC
CCGTACTTAC
TCTGGGCTAA
GGTTGAGGCC
CCAATTCTTC
GACTGGCAAA
AAATCGCATC
CCTGCGCAAA
GGCGGCTCAA
AGTTGGACAG
AGGCTTCCAG
TATAAACAAC
CACCTCCACG
TAGGAGAGCT AGGGATTTGG CCTGTAGTTT CATGTGGCCT CAAGGCTTTG AGGGCTGCAA AATTTTGAGC AAGACCGATG TTCTCTGGCA GAAGGATTTC CTAGTAGATC ATGACTGAGA GATAGAGCCA CCCTTAGTCG CTACAGGCAT GGGCAGGGTC TCTTTCAAGG TCCAGCGTCC AGCAGCTAAG ACCTTGATAG GGCATGGGCC CCAGCTTCGA TGGCACGCCA GTCATTACCA AATACCATTA AAAATTCCTT TATTATGAGT AGCAGCTCGG CTGACTAGCC AACGCAATTT TCTCCGCAAT CTCTCGTCCT GTAGCGAAAG GCGATGCGAC AGCT1TGCAGT CACCAGAGAA GATTCCCATG AGACTCTGTC CCTGACTGAG TTCTTCTAAG CATGGTGTTG AGCATATTGG CACCCATGGC TTCCTGGGTA, GAGAAAGTCT GGTTCGCCTT TTATCTCCTC GACATGCAGA TTTAACGATA GAAGGATAGG CTTGATTGGC AAGCTCCAAG
ACAAGAAAAC
TGGGCTGTCT
TCAGATTCAC
TGCTGGATGC
GACACGATAA
ACTACACGTG
ATCTCACCGA
CGTCCATCTC
GTGGCAATCA
TAAGGATCAG
TGATCCTTTT
TCGGTCGCGT
ACTGGTTTCA
TCGACATGAA
TCACGCGCCC
AGCTCCGCTT
AGGGCTACCT
TTGATGATTT
GGAACGGTGT
AAAGTTCCCA
TTCTCCAGAC
CGCTCTTGGT
TGCTATAACG
2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 TCTTGCTGGC AATCTTCTCT TGCGCTAGTT TAGGATTAGC GCCCAATCAT CTGTCGCTGA TGGACTTGTG CAGTAAA ACC
TGCTGGCATA
ATTCCTGACC
CTACATTCTC
TAGCTTGTCT
ATGATTTTTT
GCTGGCCGCC GCAACCACAG AGGGTTCTTC GTTGACAAGT ACCTCCGGAA CCAGTGAATA ACTCAGCTGG TCTGCCACAG TCACGCTCAT CTCAGGACTA AGGAGCGCCT GAGCTTTTAA AGAAAATCCA TTCCAACTTA TCTTCATTA'r
AACTTGATAA
ACCTGCACGC
TGTCACATAG
AGGCAGAGAA
CTGTTCATCC
CAGCTCGAGG
TTTTCAACCT
GCGTTGGTGG TCGAGAATTT CAACCAAGGC AA.AATCTTGA TTTTCATAGC CAGCAAACTG GGCAGAGTTA GTTTCATCCA AGTTTACTTC CTCAAAAAAG ACCTTTTCAT AGTCTGCAAC GGATAGGGCA GTTCGTTGGT TGAGCTTGTT CAAACGGTCT TTATCCAA.AT AAGCTTCATA TCCTTCAACC AATTCACCAC TGAAGAACTC AGCCACAGCT CCACTTCCGT AACTATAAAG GGCGATTTTA TCCCCAGCTT TCAAGCTATC GAAAAGTGAA CCTGTGTAGA TATTCCCCAC ATGCT'rGT AAGAGGTC'rT TTTT~CTCTTG GCCTTTT'AGC GCTAATTTAG GATAAGGCAA AGTAAGCTGG TAGCGTr'N-r GATATTCAAG 'TTGGGTAGAA TACACACCAT TTACATAAGG CATGATGTCA CGTCTGAG CTACA'TCTC TGTAATCAAC ATAGCTACAC TTCCAGCACC GTATTTGGCA ATATCACTGG CAATGACCAA CAATTTGGCA TAATGGAGGG CAGCAGTCGC ACCAGCAAAG GGCTGGATGC CCAGCAAGCC GTCAX'TCCT GACTCGGTCG CCACAATGAC TAAAATAGAG TCACTAGCAC TGGCCGCCAA ACTCAATTCC TrGAGTAAGA GTCCTTTACT TGCTAAGTCT TGTAATT'rCA AGACATATTG GATTG'rCATA TTTACCTCTG TTI'TATCATT 696
TCTATTTTCC
CTTTGACTG
AGGCAGGCTC
GTGGAAACAA
CCAAGTCGTT
AGTTGTCGAG
ATTATTAAAG
TTGAGTTGGT
GACCTTGGAC
TCCGTAGCAG
ATGCACAAAG
CATGTCAACT
GGTCACGATA
TAATTTTTCA
AAGAGAGACA
TAGAGAATAG
rATCCATGA
ACAGCCGCAA
TTCAAACTAT
TAATrrGGTC
GCCATCATGC
TCTCCTGGAG
TCCGGAGAAT
GCTTCTTTAA
ACGGCCGCAG
TCTTGTCTTT
AAAGTCCAAG
ACTGGTCAAA
TTTTT-TTCAA
AATCATCCAA
CCAAGTATTG
GCCAGAAATC
GTGGATTTTG
TTTCAATACC
TTTCCACATG
TCTCGAAACT
CCTTACTCTG
CTTGCTCAGT
TCCTCAGTTA GGGGCGCAAT GGGTCAATTC CCCTCGCTTC ACTGGTCGCA AAACCAATCT TATCAATACC CATGTAAAAA ATCGTTCTAT ACTATTTTAT 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 CACAAATGGC AGTAAAAGAG AGAAAAAAGA CTTGATTCAC CAAATCAAGC CTCTrrATTGG TCATCATTTT AAAGAATGAT TAGTTGCTAG AGAGTTCACC GATATAAGTA GCT'rTA'rAAG CTCCATTCAC AGT'rATCAGC TCCTGGAGGA TCAAATTTCC TGAGTAAGTC CATCTACAAA TTTTTGATAA AACTGACTGG TCGGAATTTC TCTGACATCC TCTTATCAAG TGTTTTACTA ACCTTCTCAG GAAACTCTGA GCCCGAACTA GAAACCATGA CAGACAATAA GGAAAGTAGT AGACTTCCTG ATACCAGCAA TCAAATTCAT TCGTAATCCG GAAAACA'ITT TAAACGTTTT TACTITrGGCA .GATAGCGCAT GGTTACAGGC TTTATCTTCA CGTGGAGTTT GTGCTTGAGG ATATATCTTC ATTTTACCAG CTTGTCCGAT ATTTCTGCAA TAGTTCACAG CGATATCCAA AGAAACAATT TTCATAGCGT GAAATTTCTT TTTACCAGAA
CAATCAATTG
CTGGGATAAA
CAAAACTAGA
AAGCG'NTrAC
AAGATGTTCT
GCTGTTAGCG
ATGAGCCCTT
CTCATTTTGA
CTCTccTGAC
TCATTCGCTA
ATGCTCT'rGC
CAACAAGGTC
ATCCTAGTTC
GATGATTTCG
CAACCTTGCT
GCTTGAGTTT
GATAATCACT
ACAACTTCAT
TTGTGACAAT
ATTGITTTT
CTTCCCATCT
TTATCAAATG
CATCCAC3TTT AGTAGA'rTTA A'rGAT'rGATA
ATAGGTTGTT
TCTCTCCTTA
GCTGGATTTA
GTCAGCCAAG
ATCATGACTA
CGCTTGAGCC
AGGGCGATTG
A7'IrTAC'rT CCCTCACATC ATCGTAACAC CACT'ITGAAC CTTTCGTGGA TACCAAAATC AGAAAGCGTT ATCAATTTAT CTACGATACT CGATGTGTTT CAT'rCCAATA GAGATTACCA ATTATAGTCC CCTGTCACAA CGGTGCCAAA ATGGCCAAGA CTGGCTGACA CAAGAACCAA TTCCTTGAGC AAGAGATAGA CATAGACCGT TGCACATTGA TGCTTGTAAT AAAAAGGTTG TTCCTGCCCC TAAAGCGAGG 'rTATGTGGTT GGTCATAATA GCATGACCGC ACCAGCTATA AAAACTGGGC AATTATCCCA GGATAAATTT TTCCACATAG CAAGTGCGTC GTAGATATTT CAGCCAGAGT TACCTGCAAC GCTTAAAGGC TGTTTCTAAG CATCGCAGAC TTTTTCGATG GGTATTTTCA ATAGAGAAAA CCCCTGCGAA TGCGCGATTC TTTAAGGAGA A'rAGTCCCCG TTCTTCCATG CTTTCCTCCT AAAAAATCTT TGCCATGAAA CTT~TTAAGGA GTAGCTGAAG
TTATATAATG
AAGCCAACAT
AAAAGGCAGT
TAAAGACCAC
TAATGGCTGC
ATAGAGTCTG AGAATCACTG GACAACCAAG GTCGCACTTG TGTTCGGTAG GAGAGATAAC ACCAGGTGTC TTATAAAGAA AATGAAGGTA GCTACAATGA TTAGCCAGAC AGTCATGCCC AAAATCCCTC GTACGATTAA AAAAGTGATA ATGGCAAGAA 697 AATCATTATC GTGTCCTCAA AGCTGAGAGG AAGAGTTACT TCAACCCATT GGCTCCGACG AGCCGCAAT'r TCTTCATAAG TGCGGTArC TTATCTCATT TTTCAGAAAA TTCTTrATT TTAGTGTCAT AT'rAGTTCAT GTAATGAGCA GGGATTCAAA TCACGGACCG CATTGGTCAA ATCAAATCTG CCGTTGAAGG AAGACAAAGG CTCCAGCAAA AGGGAAAAGG CAAAACCAAA CCGCTAAACA TAACTGAAAA TTAGTATAGG GAAGGGGTTT TCAATCTGCC CCCCAAdTAG TTATAAGAAG AGGAGGTCAC AGATAGCGGC AGGCATGGCA
CAATACCAAG
CATCTTACTC
GGCAATACCT
AAAACCTGTG
GGCTGTCACA
TAAGGTCGCC
GAAAGGAGCA
GGCTTGCAAG
CTGACGAGAA
GCGCTTCATG
AGGACATTGC
AGTTCTTGAA
GATTAAGTTG
TCGCACATAT
TCTGTAAAGT
TTCCGCTAGC
CCAGTGCTTT
CTGGAACCAG
TACT'rAAAAT
CATTGGTCGG
CAGGTAAGAG
AACTTGCTAC
GCGACAGAAG
ATACCAGAGT
GGTACAAACG
TAGCGAGCCC
AAGGGAATTC
ACTCCTGCCC
CTAAAGGTCG
GCCGTCAATT
ATCTGGTTCA
CGCAAATATT
AATCCACAAT
5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500
GAATCATGGT
CTAGCATAAT
TTTATCAACT
TCATGACAAA
TTGTTTTAGG
ATCTTCTACA
CACATCAATG
CCCTCTATTC
GATTGATTAC
T'N'GTAGATT
CGATGGATTT CTGAGCCACT ACGGCATTTA ATTCTTTTGA TATCACAAAT CCGGACTCAA TCATTTTGAT TATCCATCTG GAAATCTTGA CTCTAGTCTT ATTGAGGTCT ACCTTTTCAC CTGCTCTAGG ACTTTGTTCA ACAACCATGC CTTCTGCACT ACCTGCAGGC GCTGTCGTCA CTTCTACAAC TTCTATATTA GCTTCCTTAA TCCCAACAAT TTGAATCAAA 'rrGTTCTTAG ACTTGTAACT TTTTTAGCTA CGT'rCCGGCA CC'rGGAC=r 'rTCCTCTATC ?1TAATCAAAT TGTAGAGTTC CGTCCAATAT CAAAACAATT TGAG'N'GCCT CAGGACCGTT CCAGCCTCAC TGGAACTTTT TTCTCTTTTA ACTAATTTGG AAAGAT'rGCT TCGACCAGTT CCAGCGCCAG AGCCTCTGTC TTCTCCTCAC TGCAACTGTC TGACCTGCCA CCAAATAAGA GAAGCTGCCA AAATCTATGT TTTTCGGTG GTTTTTrGATT GATTrrGTGTT CTGAGAAACC TTCGGCAAGG ACTTTCWNTT CTACGATTGT CGAGCGGTAG CGATTGGTCA CTGAGGTACA GATGGATT'TT GGCAATGGTC ACCGCGCTAT AATAATCCCC ATGGCATAGA TGGTGACAAG TAATGAACTG AAAGGCTACA GCAATCCCAA ATTTTGAGGT TTCAAGTCCC GAGAATTTGT CCCATGATAC ATAGCGTTTG AGGTCCAGTC CTCGCCAATA TCTGTTATCC TTCACGCTGA AAACGAGCTA 698 TAAACTCCAA GCTAGAACCA ATGTAACTCG CTGTCAAGAC AATTTGAGTA GGTTTACTCA GTTTCATAAT CGT'rCCTGGT TCGCTTTCGC TC'rCAGGAAC CTTCTTCTGC TTGAGTTCTG AGTTCCCTAA TTGAATCGTCCGTCITTTT TGCTCAAGTC ATAGGTCGTA CCTTCTGGTA TCTCATTCGA CTCTTCTTCC TCAATTTT'AA ATTCCGCAAT GACATCAGAG TGCCTGATGA GACAACCAAA GATCTGTACG GATAATCCGC CAATCTCAAA ATTGCT CATCTGGAAT GGCAATGGTT CCAATACAAG GC'TGGCCAAC CT'rGTGGTTG GTAAGTTTCC CTGTTTGCGC T'rGAACCTTA TCTTGGTATC TGCCTTGCTC AGGACAAGCT ACTAGACAAG ACTTTTTAGC AGTTGCCTTG CTGCA.ATAAC GGACGGCAGG CCCCGTCATA AGGGATATGG TATCACTCTG CACAGTCGCC AGCCCAACAT CGAGTTACTC AGTCTGTGAC CTTGGCAGTC GATTTCCGAC CGACATAAT'r TTCATTTTCG TTCCTTCTTT CCTTCTTCCA CCTTTTCACT T1'GAGCGTTG CCTTGGCCTC GCAGGAGTTC TGGATAGTAT AAAATCAGGT AACGCATCT'r TC'TGTCACAG CCTGGCTTGG GGA.ATAGATG TCAAGGTACT GTTTCATCAA AGATTAACTT TCCACATACA TCTCTGAALAC ATAATAACAT TTTCTAAAGC GGTTTCTGGA AATGCTGGAG CCTGTCAGCA TCTCATAGAA TTCGAACCAC GCGCCTGCTC TGGGTCAGAC TTGTCTCTGC CCATCTGGTG TCAAGAGGAT GCATGGCAAc CATCATAAG'r TGGACTCTrC AGATTACTrc
TAGCTACTGT
GACTT'rGCTT
TCAAATTATC
7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 TGTGAACAAT TCCTCGAGTA TGGGCCAAGC GCATAGCCAA GGACTGCT'rC TrrCATTAGAA AGAGGATAAT GTTCCTTGAT CAGCCACATA CTCCATAGCT GAACGATATG AGGATGGTCT CAGCTATCGG GTCCGTCTGG
AGGTACTGTT
AGATCTGCCA
TAGTTGGTCC
GACCGTCTTC
TAGCTCTCGC
TCAGAACCTT
CACTGCCACT TCTTCCCCAT CTAAGATTAA CTCTTTGGCT AGGTAGACAT CCGCCATACC TCCTCGACCA ATCTGTTTGA CAATCCGATA GCGTCCGGCA AAAATCTTGC CGATT'rGGAT
699 CATTCTGCAT CCTCCTC0TT CATAGAAACA AGGGCAACCG TAATGTTGTC TAAACCTCCT GCATTGTTAG CAAAACGAAC AAGTGTCTCC G TrATCTG CTAAAGGAAT ATCACTGGTT ACAATATCAC GAATCTCACT GCCTGAAATC ATG7"=G'CA AGCCGTCACT ATTGAGCAAG AGATAGTCAC CTGACTCAAG GATAACTGTC CCAAAATCAG GCTGAAWrC ATC7TTNTGC CCAATAGACT.GGGTGATAAT ATTTTTTTGC GGATGAGCTI' CTGCCTCTTC TGGTGTCAAT TGACCAGCCT TGAGCAATTC ATTAACCAAG GAATGATCGC TCGTCAACTG ATrGGTAT'rCT TCTCCACGAA TCAAGCCGAT ACGCGAATCA CCAATATGAG CATAGATAGC CTGATTATCA ATAATAGCAA GGACTTCCAA AGTAGTTCCC ATGCCTCTGT AAGCTTCATC CTGACCAAGC TGGTGAATCT TTTGA=rTC AATTTCTAGG TAATGGGCGA ACCATTCACG CACTTCATTG ACTGTATCGA TCTGGGTATC AACCCAAGCT ACACCCAGGT C'TGTGACCGC CATTTCACTA GCGATAT'rCC CTGCGCGATG ACCTCCCA'rC CCATCAGCTA AAATAATCAT GGTACGTCCA GCTCTATTGA CATAG'rGGTT GACATAGTCT TGGTTATTTG TTCGT'rTCTG ACCAACATCT GTTAATAATG AAATTTCCAT GTGTCAG'rTC CTTCCTAATC CGATATCTTG CGAAATTGAC 0* TGATGAAGAA TCCATCACTT CCATACAATT CAGGTGTAAT GAGGATACAG TGATATCCTT ACATTCATGT TCTAGTTTTA CCTGCTCGAA CTCGGGATGA AGGCCTTAAC GACTTGAAAA TTCTCCTCTG AGACGATAGT GCAGGTGCTA TACCACCTTT GCCTAGTATT AGGACGCGAA ATCTGCCGTT CGATTCCTGA ACAAGGAGCA CATGCACCTTr TCTGGCATCC GGGCATTTTC TTGAATTAAA CTGTCGTAAG ATAAGAGGCT GCACTCGCTC ATCACCTTGT GGATGGTAAT GGCTCCATCC CCAGACCAGT GGTTGCTAAA TTCGACTTAG GTCTGTTACA TGGCTT'rTGC TCTCTCCTCT GAGAATAGGC AATGGAGTCA CTTCACGCAA GA'rACGGCGA TGACAAACAC TACCTAATAT TTCTAACTGA TCTTTATTGT ATTTGATATC TGGTTTTCGG
CCGTCTTTCA
CTCTCTAAA-A
TAAG7,rATTA AT'rTCCTGCA
CGCAAAAGAC
TCAAAAAACT
CCCAGACGTT
GTAACCTGAC
CAGGCATCAA
GACTCATCTT
9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 TCCACCAAAA TCTTATCAAA AATTTTTGAG TTTGAACCCG
TCCAACTTGT
ATATGGGCTG
AAATCAAGCG
GCAAACAAAT
AXGGGAATTAT
GGTCGTACAA
TTTTCCCACC
TCGGAGCAAC
TATGCCCTGC
'rCGCCTCCAA
GGAATCCTGG
ATCTTCAACT
GTCCAGAGCA
TGGAGCCGCA
CAGCTGACTG
CGAATACTGG CTTTGTTTCG CCGTATTCTT1 CCTTGAGI'T CGCTTGTTN' TTCGCI'TGAT AGGACAGCGT TGACCAATTT AAAATGCCCC TGCTCCTTAA CAAGGCTTGG ATTTCCTCTT CACTAACAGG CTTTCAAAGA GGCAACTAGC CAAACTGGGA GCTAGCAATA TCTGGCCAGC TTCACTGCCT TTTTTACGGA
55.5 GTTTGGCCAA TTCCACTGCT GGAGTTGGTA GGCACTCATG CTTCGATAAA GTGGGATAGG CCAGCTCGGT CACTAAGCCC TTAAGGCGA'r ATTTGAATAT AACTTCTAGC CGTTTCTACT ACTCCGTTGA GGAAGGAAGC AGGGATAGAG CCCCTTCAGC TCACCTGGAT TTCCCTGACC TTAAGGAAAG TATGGGCAAC CGATTGGTTT TGTTCCAGTC ACCTGACTCG TATCCTGCGG TCCAAAAGCA AATCACGACC TCATCTGTGA TCGGAATGC'r ATTTCCATGA TGGTCACACC GCACCACCAC GGTGTCTAGG TCAAGGAGTT TGCTTGGGAG AGCTTCATAA GATCTTCCAT AGTCCTGCTT CCTTGGCAGC ACAGCACGGT CTGGCTGGGT CCTTTTAAGA CTG'rTGCTGA TC1'CCTTCTT ATAAAAAT'rG TCCCGT'rCTT GAGTCAACC CGGTATTAA TTAAAATCTG GTTGGCCCCA GAA'rGGGAC1' TAGGCACGTT TGAAAACCTrC TAGTAAGGTG GATAGCCGAG TAATCTTGAT CCTTGGCAAA ACTTGACCTG CCTTTTrCAGC TACCATTCCA GAGTCAGTTT TTGTCTGCTG CCAAAAG'N'G GCTTGGTTCA CAAAAACATC T'rAGTCACCA AATCGTTCTC AATGTCCATC T1'AGGCTTAC CGTrGCGACA ATCAAT'rCTr TTCTACTGGT AGGGCTTCAT AGGCCAGGGG TTCATTCCAC CAGTTTTTCT TCCTCTGGCT TTCAGGTTTG ATA'rCACCAG ACGGGCTACC GTTCCA'rAGA ACTTCCCTT'r AGATGCTTAT CTCTAGCACT GCTAGAGCTA CTACAGTCAA TGTACGTCCA CAGCTGGCTG CACTTGTTTG TCTrGCCGAT AGAGAGAATC AAATCTTAAA GCGGTCGCCC GAALTTTGGTT AAAGAGTTGA 'rTATATrTGG AGAGAAGGTA CAATATAGGC AGGCAGAGTG AACTAGCGCC AATTTTTCAA ACAAGGTGCC AACATTGTCC GCGACGAGA-A ATCATATCTC CTGCATCCAT TTCCTTAACC AGCTTCCTCA TCCCCTTGAA TCAAGGCATA ATGGATAGGC AAGGAGGGAG GCATGAACGT 'rGACAGCAAA GTCCATGCTA 700 TCATTAACCA CAGCATGATC TGGAATCTTG TCCAAATAGC AGAAGAAGGA CATAGAGCCA GCTGTCTAAC TGGTCTCTGT 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 AAACTGCCCA AAAGCAGCAG CTCTGGACTT CCAGATAATT CTGCTTGACT GGGGTTTCTT CACAACGGCT AGAATTTCGT AAAGTCGGGG GTCCCCATAA CTGCGGCTCA TGGTCAA'rGC TAAAACCTGG TTGAGGGTCG GTAATGATAG AGGTTGTGGG GGTCTCTGAC AAGCCTGACC T'rCTTCTTTC TTGTGAGAAA TTGTCGTCTG ATTCCCATI' TCGAATAGCA TAGTG3CTGCG CCGACCTGAT CGACCTGCCA TCACAATTCC 'AT&TGCTCCT TTTCAGGTTG GTAGATAGAT GGATAACTTT TTTACGACCA AACGGTCATC TGTCAAAAGT AGATTAGTTT TGTCATATCT TGAGACGGAG C'rCACTATTT ACCCCAGCTC ATCTTCI'AAA 
TACGGGCAAT CGGTTTTGGC GCAAAATGTT CATGACTTCA GGGTAATACC AATCGTfGAAA CATAGGCATA AAAGCCTTCG GATTGTAGGA CTGTATCAAC CCTGAGTCAA GAGCTGGAAG GTTCTCTrCAG AAGAACGGAA ATCAGGCAGA TTCAAGGCCG TATCCGCA'T- TAGAACTCCG ACTAGGGTAA CATTGGGAALA ATCCAAACCC TCCGCTTCCC CTCGCCCAAA CTGGTCAAGC GTATCCACAT CCATCCTCAG AATGCGAGCT GCCrrCTGAG TTCCCG'rCCC ATAGTAACGA TGAGGAATAT CCTTCGAGAA ACCACAATAA AAGGTCACAG AAATATCGCA OTTGGGACAA ACAAAGCTAG AATAACCACG GCCATTGAGC CGGTCTTGGA TAGCCTCTAG CAAAGGAGGC TAGTCTCGAA AGTCAATCAC ?TGAACCTCA GTTAGACGTA AGTG'rGATA GACGCCTTTG GTTGCAGATC CAAGTACCAG AGTTGCTTGA ~701 TTTGCAATCA TCTGAGTACC AAGG3CTTGGT GACTGCCTTT TGGGGAAAGA GTTCTGCTAG ATACTGCGGC TCTTACAGTT 'rGGCAG?'rCA TAGTCTTGGT GTATCCACCG TCCCACACTC
AAGTAAAATA
CTTTCGAGTC
CTCATCATAA
AGGACAGACC
ATCCATATGC
CCGACACATG
CTGGCATGGT AACGGGGATT ATCATGACAC CCAGATTTTT TGGGCATCGC CACGCTCCAC GAGTGAAGAA TGGCTACC?1' GTCAAGGAAA TCTCAGGTAC ATAATCTGCA AGTAAACCTC GGTTGAGAAC TGCCAATAGA AACTCCAAAG GTCTACTTGC TTTTGGACTA TGGTAACAGC GACTCTAACA AGCTAGCCAA TCCAACTrTT TCTTGGCACG TCATACCAAG ACTCGGTCTT AGGCCTTTTC TAGVCAAACG GCTACCGAAT CT'rCTGAACC GGATAGAGAA TCTTGTCATA GAGATT'rTGT AGGAGAAGAC GTGAGAACAG GAGAAAAATC TCTCCATCTG ATTGGGACTT
GCTGTCCTGC
CAGAGGAGCA
CTTGCGCCAT
GTCCCCAAAA
CAGCAAAATA
GGTCTTCCCA
ACTCACAACC
TTCAATTCCT
ACCTTGATCC
GGAAGCGCTC
TGTAGAAATC
GACCTCT
CATCATTTCA
AAACAGGCGC
GCTAGAATTC
AGATI'GCGT
ATGAGAACCA CCTGCTCTTT T'rTAACCAGA GTAAAGTr'rG ACGTCTCATT T'rGTCCGATA GGGATTGTAG CCAAAGGATT GGCACGTTGG CCAGCACGTG CCCGGCT-CTC TAAGCTCGGC TTATACTGAG CCCGTAAAAT AGCTACCTCT TTATAAGCCG CTTCATGCTC TTCATCAATA AAGATACCAG ATCTGGCACC AACAACAACT TCATCATACT TTTCACCATT GGATAATCCT CGTGcqATAA AACGCTCGGT CATCTGAGGA CCTGTCTTGC CCTTATCCAG GGCACCTTGG CTTCCTGTA-k TCCCTTGAAG TAGAAAGGGA 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 141.60 14220 14280 14340 14400 14460 14520 14580
GCATCACCCG
TCAAAATAAC
ACAAACAAGT
TCTGGATGAC
TCAACACCTT
TGATCCACTC
GCTTGCTTGG
ACTCGTTCTT
AGAAACCCTG
AACTCCTCAG
CCTGTC-TTG TTCTGGATTTr CAGCCCAGCG TTGAACTTCC TGACTTGCTC TCGCGAGTAG ACACCAGATA ATCTCTCAGT CTAAT-TGAGC ATGGTCAACC CCTGATA-TC CAGACCA.AGC CAAGGTCTAG TGAAGAAAAG CCTGACTCAA GCCTTCCAGA GAAGCATGGC CTTGAGGATA CCAGCCAGAG TTGTTCTGGC TTAAATCTTG CTCCATCTCT GAATC-AGGCG ATTACCCTTA CAGCACCTCT GCAATATCTT CAAACCAAGA ACAATCCCTT CCAAAAGGCA CATGAACCCG CTGTAACTAT AGGGCTGGTC GCCATCTT'CT CACCTCCTCC CCCCCCAACC TTAAATTTT TCTTTTCTTC TT'CTTCTTTG GTTTTCCTTC TGGATCTGGG GAAGAGTTGA ?rTr'rCAGAC GGGCACGTTT TGC1'TCCAAG 702 CATCCCAACT TCCAGCATTC CCTCAAATrc CGrTCGCATC AAGGGCACAT CTACGATAAT TTGTCAGTAC ATTCTTGCAA TAGAAAAAAT TCACCATCTT ClrTTTrCTTT AGCAATTTGC
C'ICCGGAATC
CTTAGCTACG
AAGATTGAGT
TCTTTrGATTT
CGGCGTTTTT
TGAATTGTAA
TTGAAACCTT
ATTACGAGTG
TATCAATAGA
TTGGAGATTT
CTCTCAAT
CATAATCATA
TCACTTCTC
GGGTTT'TAAC ATCATTTGCT 'rGGTAACATC TCCTGATAGT CACACATTTG ACACGTTCAG CTCACGCATG AGGGCAATTr ACTATCTGTT CCACGACCTA CTTCTTCGAT ACGGCGACGC ACTGCTTCAC CGTTTCCTGA TTCGATTTCT TCTAAAGCC GAGTTGCTGG GGCACCTGCT TCCAATTCGT AATATTTA AGGAACCTTG TCGAGCAAGG TGTACCTATT TTCTAAATTT TATCGGGTAG GACCAATGAC ACGATCCACA CAGAAGTGTT CAGCTAGGGG TACCTGATCG TTGACAATCG CTTCCTTGGC CTTTTCGATT CGrTCGCAA CCAAGCGATC TTGCAATTCA TCCAAATCTG GAACCTTTTT CTTGACCTGA AGAGCACCCT GTGGTGTCAG GAAGATAAAG ACAGCATCTG GAACTTCAAT TTCAAGGAAA ACATCGATTC GAGGAGT'rCC ATAGTAGTTA jCCGACATATT TCAGCTCTTC AkATTCTTCA CGAGTACGGA CCTTGTCCAA GGTTTCATTG CTGCGTATTC CAACATCTGT AGAAATAGTC AACACCGTCC AATATTGAAA TTGGTTTTCA CCCCTGAAGG ACCAGAAAAA
ACATAGGTCA
CCTTGACGAA
ACTTCTCCAG
GAACTCTCAA
ACGATTAGTA
14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 GACGTTGTGC GCGTGTCGTC AAATCTCTCT1 TCTAACCGTT AGCCTCGGTC TGCCATTGTG TAATGGCAAA AAGCCAGATT TAATTTTTCA ACTGCTCTT TCACGGTTAC C'rTGATATTT TGTGAGCCAA GATTGTGACT TTTCACGTCC TGCTTGAAGG *CTTCCAAATC APGGAGACGC GGGCTGCGCT CAAGGCATCT
ATCGATACAG
CCTTTTCCAA
TCTCCr'rTTA
ATCCTTTACA
?GTCAATCTGT
GTCTTTCTAT
GAAATAACAT TTCTCTAGAA CTAG'rGTA-AC AAAAAAGCAG TACTGCACGA AGCTCGCGAA TTCAATTTTC TTACGAACTT CTTAITrTATT TAGCATAATC CCTGGATAAT CGAGArTGTT TTGTCGTCCT TGATTTTTCC GCAAAGCTAG TTI'GCACTCC TTGATGTAGC TTTCAAGAGA GCTGCAGCGA CGATAACTGC TGGATTGACC ATGATACGAA TTCAAAGCCG TTAGCAATTT CTCACTACGA GCACCTGGAC TATCACGCTC TCAGCTTCAA GGGTTCCTTG TACTTACGGG ATGGTCAATG GCTTTCCCGA T1'CACCAAGT TCGCTCGCCA
CATCTCCGTG
CCAATTCCAT
TATCGTGAAG
GTGACTAGCA ATCGTATTCA CCACAACTGG ACCGATTTCA ACGTGGCTAC CTTCAACCTC GAATCCAGCA CGACGGGCAA GAGCCGCATT TGA'rACCAC
TACGGAACTG
TTTCATAGGC
TCTCAACCAA
TAGTCATACG
GTGTATCGTC
CAACT'rAGCA ACCTCAATCG AA'rGGCGCAA CAAACGTCCC ATAATCTTCA TCAAGTCTGG AGCACCTCA CCG'rATTCAC GAATCTTATT CTCTTCGATA CGAGCTGGAT GTATACGACC GGCAATCTCA CGACGAATCG GATCAAATCC GATAATCACA TCGACCCCTG TCAAACTTTC
AACATTTT'GT
ATGAAGGTTT
G'rCAATCTCT
ATCTTTGAOC
TGACA.AGGTC
AAAG4GTACGA
CCATATGAAG
GGCGCACCAA
TGACGGTTr-r
AACATTCCA
ACCACTTCTG
ATGTTACGAC
CTTCACGACC AA'rAATGCGT CCCTTCATAG ?1'GACTCCGC TACATA'rTCA CCAGCGATAC CCATTTTGTC AGAACGTTCC TTGACCTCTT CCCTGGTCAA GTTTTCCTCT GTCTGAGCCA GCGCACCAAT ACGCTCTAGT TCTGCTTCTT TATCGTCTGG CAGATGAACT GT'rCAGTTTG GTTGCATAGC TTGAACCAAG ATGTCCTTGG GCTCAGCTTC GCGAATGCGA CTGGCAATCT AGATAATATC TCG'rGCTTCT CC7GAGACA
TTTGTCTITTC
CACGCGCATC
GTTCTTTACT
GACmTCGA'r
GGTATTTTCT
CACGTTTGGC
TTGCTTCTTG
TAGCTGAGAT
CAAGCGACAT
CTTTTACATT
GAATTTGGAA
CTTGAGAAAA
AAGGTTrC GCTCTATCAG AAATACTTTG CGTCAAATTG TCGTCCTTAC GGTCAAGGCT TTGTTTGAGT TCTTGACGTT CTGATTTGAA GGCTrCTTCT TTGGCCTCCA ATAGTGCrrC TTCATTAACA AGTAAATCCG CTTCACGCTC TTCAGCATTT AAAAGCATCA ACTCTGCAGC GCTGACATAT CCAATGACTA AACCAATGAT GATTTCCATG TTTT'rACCTC ATTTTATTGT CTACCATAAA AAAGTGATT TCACAAACCT CACATTTNACC AAAATAAACT TGTTGTT'rAG AGCCTACCTT TCAATAGACT TAGTAATGAT GACTTCCTCT AATTGCTCTT TTCTTTTTGT TCAAGTGTT'r AGTAGCTCTC TCTGTCAAAC TTCAGCGTCC ACTTCTTrCAC 'TTrTTAAGA GACTTGCTTT AGCTTGTCCA CGTAAATTAG TTCCTGAGAT GAT'TTCATCT GACGGCAAAA ACAGCAATCG TATTCCGAAT GACATACATI' AAAATAGAAT ATGTTTTGAG AAATAGTAGT TTAGTAGAGA CTTTA-AAGGA CAAGAAAGCC CATCA-ATACG CCGTGACCAG TTCCTGTAAA GGGATCACGT 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17100 17160 17220 17280 1.7340 17400 17460 17520 17580 17640.
17700 17760 17820 17880 17940 18000 18060 18120 ACGCTATCTC CATCCATCAT ATAAA'rCAAG CGATTTTCTG GCTCCTTGGT AATCATA'I-r' GAGTGGTTCT GGTTTACCTA TGAATATCCT TGATTAGTT'r ATTGATTCI TAGCAATAAT CTGCCCAGGC ATCTTCTGTA ACATGGACCT GAGTAC'rTCC AGCACGAACT AGTTrCCTTAT TTTGAGCAAT TCTCAGGGTT AGGACTACAA TGTCCTCATC TGGATTTTTA TTTAACGTTT TCTTATCCTG ATTTTGCCAG AACTTGAGCA GCATTTCT'rA CTCCTCAATA TGAGCCATTC C1'CGCAAAAC CTTArCAGAA TCTT1GGATAC TATCCCACTC ACTCTTTGAA TTCACCACCG TCAAAGGCTC AAATTCATCA TT TACCTTCT TCATGTAGTC CTTrAAATGA TITCGGAATG TTGAGTAAAG GACTGCTTCC ATAACCATAC CTCGTT?1'AG CTCI?1'TCCA CTATTATACA CGAAAAGAAA GAAATTGTCA GGAACTTGTA CAAGATTTrC TTTTCTATCT A'N'TATACTC AATGAAAATC AAAGAGCAAA CTAGGAAACT AGCCGCAGGC TGTACTTGAG TACGGCAAGG CGACGTrGAC GCGATTTGAA TTTGATTTTC GAAGAGTATT ATTCGTAAAA AATCTCAAAA AGCCTACCTT TCGGTAGACT TAGTTTGTTT CTATTC 18180 18240 18300 18360 18420 18436
INFORMATION FOR SEQ ID NO: 88:
SEQUENCE CHARACTERISTICS: LENGTH: 7001 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:
ACGTAGAAAA
AAACGTCCTA
AGAAATTTTG
AAAAACAGCT
ACTATTTCTA TCACAGATAA TATTCCGTAT GGTATCTTTC TCAGTCTATG TGACTTACAA AATGAAGATT ATCCTATTTC AGAAATCAAT ATAGAGTACG TCCCTTTGGA AACAAAATTA ACCTGGACAT TATAACAAAG TTTTIAATT'rA TTAAATGTAA CGATTGTATG GCTATAGGAC TTTCCTACAC GCTGGATTAG CTCTAAAAAT CCTTATATCG TTGTGAATOC GGAAAAAAAG ACACGCCCAA TTATTATTTA TGAAAAAGAC ATTCATTTAG ACGTCAACAA ATTGATAAAG CATCAATCCT GCTGATAAAA TCATGAAGTC AGGGATAAAA AGAAATTGAA ATTTTACCTT ACTCCGGAAG TATCATTACA
TTAAAAOATT
AATACCT'PTT
GTATTTACAC
GAGAAATTGG
GTTGTTTACA
AAAATTCCCA
CAATTTTCCT
TAATCCACAC
TTCCTTTTTC
ACACACCATT
AACATATATT
ACTAACTGTA
GTTGTTGGAG GTATTGAAAT GGGA.AAACTC TTTTCGAGAC TCCACCATTA CCA.ATATGAT CTTGGATTTG GCTTATCAAT AACAACCCCA TATGGGAATC' TTTATTGTAA AAAATAATAT AATACCCCCG ATAACTTTAT ACAAAAGAAA AAATAGGAGC GTCGAATTGA ATGGGCAATA TCGGATGCTT GGTTAATCAA CTAAAA.AGCC T'rGTAAAGAC AATTTAGGCG ACTCCGCTTT TCTATTGCAA ATCTCCTCCT CTTAATTATC AACCTTTCAC GTTCCCTTTA CTCGTAATAT ATAGGAGCTT GTGCATTAGC GATATTATTT CACCTTAATA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 ACACCCTTTT AACGGCTTAT GAGTCAATT1T ATTAGCCACT
TCTATATCAA
TCCAAGACCA
ACAACAAACA
CAGTCAATTG
GCTCCACTTC
TCGTGGAAGT
TATCGTCGCT T TTTTCATAG AACATAGCAA TGTATTACAA TATTAGAAAT CTATAGACCT CTTTAAATCA ACTATAACCT GTAGTAGATA TCTCGTATITT AGACAATATG AAAACAAGAC GACTTCCATA rAGGAAACCG A'N'TATATTA AAATAACTr'r TCTTCTAGCT GCATTTAr AACCCCCAGA ACTTAAATAA CAATTTTTAT TCAAGATACA TATGAAATTC TCATTTTTGT TTTTACAA'rT CTCCTTAGTT TTACATATA GTATITAGCG TTTGAATAGG GCGTTATCAC AACAATCAAT TGATGCTAAA GTAAAAGTGG ATGAAATTGA TCACAATATA TAAGTTCATC AAAAGAAACG TCAAATGCTC AATCCCTAAA TTACTAAAG TAAAGTTACA AAGATGTACG TGATAGCATT TCTACAAATA GTGCTTTGGC ATTCCTAGCC AATACAGACA CTCGTAACAA CATACAAGAG TATTAATCCA AAACGATTGA TAATAAGCAA AATACTAGTG TAACTATCCT CCACATAGTIA CTGAACTCTC AAGAAAAGCA TCTCCACGTT ACCTGTACCT AGATGmTCG TAGCGATAGT CAAATCAAGA CTCGGAAAAA TATCATTCTA ATGAAACAAC GAATACAGGT ACACTGCTCA ACTTTACACC ACCACACTGT TGTAA-ATCAT CAAGTAAAGA GAGCGGATGA TTCATATCAT TTAAAGAATT CTGCTTCATr TTTCTACCAA TAGCTCAGTr' CTATACCAA1' ATAGATTCCA AATTTTCTCT TCCAGTCAGA AGCTTGTCAA CCTTCTCGCT ATGTTGAGTG ATTATAAAAA CATTCATCAT TACTCCTAGA ATAAACTTTA AAATCTTrGTT TAATATATGT TCCAAAAACG GTTATTCCTC TCAACTTCAT ATGGCTCAAA GTTCATAAAA CCATGAAACT GGCATCATAA CTCTAAAAAG ATTGTTGAAA TGCCTACATG ATCAAAACTA TGACAAAACA TGTAAATGGT TGTTGTATAA AGTGTTCGCG AATATATTAC GATTCAAACG AAATATGTCA CTATAGACAA AATTTTTTCC CATATTTAGG AACAGGATAA 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 AAAATT'rATC TCGTTAGGCA ATCAAGCAAA AACTCGACGA AGGATTGACT TCCTAAATTA TATACTTTAG TAAGGTTTTC TTTACATTTC TAAACATTCT TTTCTAAGAT GAAAAACAGA AGCAACAAGA AGATTTTCAG TA'rCATCCTA TAGATACGAG TTTTGAATAT AAACTACAAT AATATAAACT AAATTTTATA TACGATTATA TGATACAGGC ATCCAAACAA TCACAATTCA TATTTGCGAA AAGTTATTTT TGAAGACTAT TCTTATr'rAA TTGCTAGACT CCAAAGAACT AACCCGTTTT CAAAAAATTA GAGCA'rACTC CAACTCATAA ATATGTGATT TCATTAAATA GTTCAAAAAT TGATGGAGAA ATACAAACAT GGATAAAATG AAATAAGGAA TTAATTCAGG AAAATCTGAC TTTAACAATT
CTAAGACAAA
ATCTGCTCAT
ATCACACCGA
TAGTACAAAC
GGATAAGAAA
ATTTT'rCGAT
TAAGCTAAAA
TTTAATAAAC
AAATTCTTCT
ATTATCATrAC
AAAGGTTCAT
TGTGATI'TAA
CTAATTAAGA AAAACTACAT GGAGGAAGAC AATGGATTGG ACGCAAGCCA TTGGTTTCGC CAAACCAAGA TGTAGAAAAG GCTTGAAGTA TGCCTTTCAA AACCTGCTAA GTTAACCAAT AAACCGGTCT TCCAAGCCCT ATCTGTGTCG GTGGTTATGT 706 CTTAGAA'rA' CATGGTTTAC GTGCCACACA AGATGT'rGA'r GC7=NATGG CTCTATAATA
TTTGTAGTGG
CCCATATGAC
A'rTACATTTT
CATCAATAAG
CCCTGAGTGC
CCTCGAAACA
TCACTGTTCA
TCGTATTATC
TGCTCATCAG
TGAGCATGAT
AGTGACTGTT
GATATCATCA
GTAAATCCCC TATGCATATT CTATAATGAA AAGCGACAAA ATCACAAAAT TACTAGACAT GATACACACA AGGAAATCAT GGAAACCAAT TGAAGAAATA ACTGGTATGC CTTCTAGAAT AAAATGATGG TCGCTGAAAC AACCAAAAAA TTGCGCAAAA CTGGCCATTT CAACTTCAAC TTTTCGCGTC TTCCTGAGAT TCAATCGGGA GATGGAGATG CTGTTCTTGA AGGTAGAACA ATGGAGCCTA TTTTTGTGTA GAAAAAAAGT ACAACTCATT AGAAAGAATC ATATGGAACA TAAAGACCCT AATATCCAGA TTTTAGACAT CGCCAAACTG GACTACGACG CCCCATCTTG TGACTTTCAA AAACCGTCTA AGATCCCTTA TCTCCTTAGA AAACGCCGTT TCAAGTGCTA TTCTATCGTC AAGAAGAATC ATCAAATTCC GTTGAT'rGAG AAGATTTCTA TGACCCATAT TGTCATTCGC AAGCTCAATG ATTCTCACTT TATGTCCTGG GACGTTGAAA CAGTCCGGGG AGCTTTATTG CGCAAGATT TGAAAAGCTC CAAGCTGTCA TCCGAGA'rCA CTTTCTTA-AA 4S 4 4 .4 4.
4. 4 TATGATAGAG CCGTCCGATG TCGCG'rCAAA ATI'ATTACTA TGGATATGTT TATGACTTAG CTAGACAACT GTACAACATC TTAGCCGTGC CGAAAATCCC ATGAATACAA CGTAAACTCA GCGATAAACA GAGATTTTAG ACAAGCTTT CAACTCTTGC TGTTTCACTT GACAATCTTA AGCAGGTTCA AAAGAAAAGG 'rTATCAACGC AATAATCTCA TCAAACTrAT AAAAAACGGA TT'ATCGC CGAGCTTAGC TTTTTTTCAA -TAGCTTTCCT TTCATTTCTT AAACAGGATT CCCAGAAATG AACACGCGAT ACAGATrGTC TGGATCTGAT TrTTTGATAAA GGTAGGAATC GGATCGCAGG TTCCCGTGT GCTAAAATCG TTCTTGATCG TATGAGTCGT GTGCGTGTCC AAATCATGAA GGCTATCAAG CGCTACTGGA AACTCATTCA TTTTTATCGC CCTACTTTTC GTATGCATTT
TAGTCCTTAC
CTTTCACATT
TCAGTTTCAT
ACAGGATAGC
AACCAATAAA
2880 2940 .3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 GAGCTATTCA CAAGACTTGA TCAGAATAAG GAACCGGAGA TCCTATTTTT CAGACTGTCT CCTTCAACTA CACTATTCTA CAAGCGCAAT GCCTTTGGTT TCTCAATATC AAAAAAGAAA CCCACTACAG TTGACAAAGA AACATCACTA TCAGCTCTAT AATIrTTCGA ACTTATCGAG TTAAAACCTT CCTCAAAGAT ATGCCAAACT GGAAGCGACC TTCGAAACTT TGAAAACTTC GGACAAAAT'r TGTCCTTTCT GCCGGAAAAA GGAACAGCCT
TTTATTTCCC
CCACTACCAC
CTTCTTTCGT
CAGCGACTCG
TTATCAAGGT
TCGTAGTAAA
TAGCCCACGG
AAAAGCAACC
TTCAAAATTC
CATGATATTT
CGTGCTAGCT
TACAACCATT
CTCGCAACTG
ACTAATAAGC
TTAGAGCTAA
TCCACAAAAC
GAGAGGTTGC
CAGCTGTTTG
GTTTATTAAA
CCGATTCTAA
N'TTTCCCAT TCCGACGGTA AAATAATCTC TGTGTCCATC ATCTGATATT CTACPAA'rTC CTGGCCAT'rA TCATAATAAA GAGCATCTCC AACTTTTAGC TGATCCAAAT GGCGGAAw GACATGGCTT GCTCCAC GGTGCCCAGC AATCACTGAG CGAATCCCTG TACCATCCAG AGGCAGCCGT GTACCATCCA CATGAGCCAA GCCCATCCCT AAATGATGAT AATCTGCTCC CAAATAAACC GGCTCCA'rGA TTTCCAAACT TGGAA'rAGAC AAGTAACCAT AGACTGCATC AGGGTCGTCA GACACTTGGT GCGATTTTGC GAACCCAAGC CATTTCAGTT GTCATGGATT CAACACCATC TGTCCAAAAC TCCCA6ATAAG GCTCGTAGTr CTTTGACAAT CCGTCGATTA CTCCTAGTAA TAACCACAAC CAATTGGTGC CGTATACGGA TATACGGTGT ACAAGTCAAC CAAAGTCATT CGGCTCCACC AAACGTGAAG ATAAAAGATA CCGTTGGCAA TCCTCTGTGA GCGAAGCCCC TTCTAACAGC AATTrGACC'?C ATATCCCTCC GTTGATTGTA GGCGAGAGAA TCACAAATGT AGCATGACCT AATAAATAGG AATCAAACAG TAGTCCT'rGA CATGACCCCC CGATACACGC GATACAGCAA CAGAAT'rGCC CACGCTCTCT ATCCGCTTCC CACGTACCAA
AAGGTCGCAT
GTCACTATCT
AATCTTCCCC
GATCCACTTG
TCCCCTTTTT TCATC=ATC GCAGTGATCA CTGTATGGGT CCTGCCCCTT TCTGAAGAAT GCCAAAAAAG GATCTACAAT TGGTTCTGTr GTTCTTGGTA T'rCACCTGTC CAAGAGACTG GCTACCAACA TCAACAAGTA CTCCAAT'rGC TTTTCTAGTC GAGAAGGATG ACCGCCATCG CACCGCTCGA TTCCGCTCTG CAGACGATGA CTGTTAATCA ATGTTGAATC AAGACAGGCT GTAGGCCAAC ACCTGATCTA CAATTGACTG AACAATTCTG ATTTTCACCT CCAACAGGCA GTCCTCACTC CTTCCGACAT CCAATCCGC TCATCGACCT TTGCTCTGTA AAAGGATCAA CAACGCCAA CGCTCCTCAA T?3'AATAACC TCGTTTGACT AACGGCGAAT CCTACTAAGA GCCTTTTTTT CGTGAACGTC CTTTCAGCGC CTTCAAAGCC TCCATAAAAG AATCACAATC TGAGTTTGTT TGCTGCAAT1A CCCGAACCAA TAGACGATGG AATCATGACC TGGTACAATC 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 ACATCGGAAT1 TTCCTGATCA ATCGCAGGAA TTTCCACATA TTAGCATAT'r GGCATATTCT GACGCCTT TCTTTI'TCTC GAATTTCAGA TGGTTTCAAG GTCGCATTGA AGGC'rTGAGC GTTCTGCCTT A'rCCATCTGG GAAACCGTCT CATCAAACTC CAATACGATA ATAATAACGA GACACCAATG GATATATCGC AAATCAGAAG AAGGATCAGC GCA'rGTTTCT TCTTTTTGT TACTGTTGTC CATCCTCCAC CTTCACTTCC TTCCTTGCTG TTTTCCGGTr GTrT=TTT CTTGCGCAAG CGTCGAATAA AAACCAACTG CCACATAAAA CAGGTAGCGA TAGAGATGAC AATTCTTCCT CAACCTCTGC TACGTACGGT ATCCGA'rGCC GTATTGATCA TGTA'rGGCGT ACAAGTCAGC AAGG.TCACA'r AATAAATCAT CAAAGTTCGT ACTTCCTTGA TATTGTGCAC AACATC'rTAG CTGTrGGCAA.
CCGATCGGCA GAGAAGTTCC GTACCAGCAT AAACCGGCAA TCATGGATTT CTAACATACG CAAGGATCGC CACTCACTAC CGTTCATCAA TGTCAGCCTC TGATTTGATT CCACTCGATA CCAATGAAAA ATACCACTCC ATTTTTATCA GCATCCCTTT 708 CGGCTCAATC ACCTTTACTT GATCCACTTG ATAAAACTTA TCCCCAACTT TA.AGTTTGGT ACCTGTATGT GCCGTAATCA CCGCATGGGT CTCTAGATGC CCAGCCCCTT GCTGCAATAC ATCCACGTCA ATAACGGGGA T'rTCCACATG TGCATACTCT GCTCGCCCTT T'TT'rCTTCAT ATTATTCAAA GAGTCATTGA AGGCTTGTGC ATCCAACGTT GCrTTTTCCT TATCAAAGTC ATACAAGCGA GACACCAGCG GATACGCCAT TAATAGGAGA TTATTTCGTT TT'rGCTTTTT ATCTTCAAAC TTCAGGGTAT C
ATAGGCCATC
CAAATCCGTA
CGAATTGCCT
CTCTTCAGCA
CCCCATCCC
TTCTTCCGAC
CAATCATT
AGCAATTTGT
TACCGCCATT
TGTTTTTIACC
6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7001
INFORMATION FOR SEQ ID NO: 89:
SEQUENCE CHARACTERISTICS: LENGTH: 10411 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:
GAGGGAGCTT AAGAAGTTAC CACCGTCCTC TAGCGCCTTA TCCGCATCAA GATATTTTTA AAACTGTCGC CAGCTTGTGA TACGATGCTT TGTTTAAGGT TTTAGTGAAA TCTGCATTGC TGAGGATATC ACTCTTrGAG AGATTCAAGG
AG'N'AAGGTT
CATTTAGGGT
CAAAATTGAT
GATGATATTG ATC'rGGT'rTC CTGTTATGAC CTGATCAAGT TTGTAATTTT TTAAGGTATC TTCAACAATC TTGCGGATAT CTTCTTCTGT CAGATTTCCC TTACTTTCTr TAGCTTTGGC
GAGTCCTGAC
GTCCTTGTTT
ATTAGCTTGT
ACTTTCTCCT
CGT'rACTGCT
AATCTTGACC
CTGTAAGCTA.
CT'rGAGTTCT
TTGATATCAG
TCAGCATTGA
GGCACCTTGG
GTAACTGGAA
GCGTTTCGGT
TCAAGTGGCG
GAGTCATTGG
TTGGTATCTG
CTAGGGCAAC GTTTAATTTA TTAGCATCAT TATCTGACAA AGCTTTTAGC TCTTCTTGAG C'rCCATTAGC CTCTAGCGAA TAGTAAATCC TAGGGGCTGC TACAGTGATT TTGGCATGTT ACATATCCTG AGTCACCTTA GTGATATTT AGCCTGAT7T
CCAAATCTTT
CTGCTAAAGC
CCATACCCAG
CTGGTGTTTC
ATTTGTCACC TAGCTTTTGA ATCTTGGCTG ATGAATACAA CCACATTCAT GATTTTAGAA TAAACATCAG GTGTCATGGT TTGAGGCATT GTAGCCCAGT TTTAAGAG TTTGATTTTT 709 TTGGTCTTCA GATAGGGAGG AACC'rAGGAC ATATTCAGGT TCGACATAGG TTTCATCGAT AACTTrTGA ACATCTGTTG CTGCATGGAC CCTATTCATA GCTGTTACTG CCCACAAGAT CGCAGCGCTA G'rCAGAAAGA GTTTCTI'TCT CATAGGGAAT TTCCTCCT-rT ACTTCTTTAG AGTAATATAT CTATCrTAAA GAAAACTTAT AACAAAAACA CCTGGTCTAG CCAGAPGTTG AAAAGAGAGT GAAACATTTG ATGATGTAAA GGTTAAGTCG TACCTG'rCTA GAATAATAA'r ACTTTCCTCC ATTTACATAG AGT'rCAGCAC CGTGAAAAAT GGAAATGGGG TGAATATAAC TATAAGTCTT TCCAGTCCTA TTACCAAGCA AGGGGGCAAC AGTCTCACCA GAGTAC'TGTT TGGCTAGAGC CAGGGTATTT TCCTTGCCAT TTTGGGCGAT AAAATCGATA TAGGCAGGTC CAAAATTATA GGCTTGAACA GCTGTCCAGA TATCTACCCC CTTCTTCTGC GCCAGATAGA GA'rTGCCTGT CAGAGrTGA ATGCCTTGCC GAATGCTAGA GGCATTATCA TTGATGGTGT 840 900 960 1020 1080 1140 1200 1260 1320 1380 TGGTGGAACC ACTTGCAGAC TCACrAGACT GCATAACATC *4
4 9 4 *9*4 CACTATAAAT CATAGCAAC T'rTCTCGCAC CATGGGTrrCA CTTTATAGCC AGCAAAAAGG TTATT-TACT'r TGGATATCCT TTGGATAAAC TCAAC.NGACT TCCATTTGAT AAGGAGAGT'r TTCATCAAAT TGAACAGGTT AGCCGTCAGA TTGTTGTCAA TTCTTCTAGG TTGTTGAAAA GTTAAAAGAT TTAGCAATTC AGAAATCTTA GGAGCGACAG GTACTTGA'rT CGTTrTTTCAA CTCATCAGCT CC'rGTTCCAA ACAAGCTCT'r CGTTTGCTCG TAGGTCATGA CTTGTTTGAC AAGACTGCTA GTACAAGCAC CGATATT'TTT GATTAAGATA CGGCGTCTTG ATAGACGTTA TTTGGTT'rTC AAATTTCTTT CTGGTACGGC TTCTTGACT AAAGGTCATT AGCCAATTTC TAGCTGAT'rT CACCTTG6AT TCTGGGCTGT TTTTTCCAGT CAAGAAGATT ATCTGAAAAA GCCTTCTTN' CCTTTTGTrr GGTGTCTTGT TCACTCAATA ATCTrrGATGA ACGCGGTAAG TCTTCGAATT CGTTTAAACA GAGTAGGTTC CATTTTCGTT TTGGGAACGA TGAGCTCAAT AATTGGCGAC TGGCATCAAT TGGTCAATAA AGCTCAAACG TCAGGTGACA ATTCATTGCT TGAAATTGAA AATCATCTGT TCCTTGATAG ATTTTTTAGG TAGTTCAAAA AAGTCCCGTT 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520
TCAGGTGATA
ATCCAGGC.AG
CTTGCTACTC TGAAGATTGA GTTATTCTGA GTTAGCTTGA *64 AACTTCTCCT CCGAGGTGGG TCAAGGTCTC CCGCAGGGCA TTCTACACCT TCTrrTAGAAA ATTGCACAAA AATCAAGTCA AATGCTAAAC TCCTCTTTCC AGAGATTAGC CACTTACT TGTAATATGA TTGAAGAAGG GATTTTCTTC TTCGAAAATC ATACACATGT TCAATTTTTT TACGCAGGTA TTCTTCGATT
ATTCGCAAGA
TTGGTCTTGA
GATGTCTCCA
CCACCAAGGC
TTGGATTATC
AAGCGAAATG
GATTTTCAGA
ACAAATCGTC
CCAGTCTTGG CTTCATCTGA TTTGGAGTAA TATTGAGAAA CTTATCTGCT AAGAACAGTT CGGTATCATC AATA'rAAATG TCCATAAAAG TTr'rAGTCCT TGACTGCACT TCTCACTTCT TCTAATACAG TGATGAGTTC AGCCACTTTG CGACTTTCTT CTGCTCCGAT ACGAATCCCA CTTGTCTTGA T'N'TATTTAA GC'rAATATTG ACTTCADCCA CAACTr'rAGT CACATCAACA AGGAAGAcGAT 710 CGGAC'rGAAC TGGTrGAATAA TGGCCTTCTT CGTA'rAATGG GAAGGCATCT GTCAATTCTT CCTCATTTTC TGAATTCTTA AGCGTTTTAA CTTCACCAAA TCCACGTGCA CTAATGGCTG ATGGTCACAA-GCTI-rCGTAA GGGATTGAGT AATCAGGGTC T'IGCAACAAG CATATTCCTT GAAGGCTGGA CATCCTCTAA AGGACCGCCC GTTCTTCGTC ATTGGTCAAA TTGT'rGTTGT GATATGAGCG CAGCGATATG GGCCATGTCC ATTTTGAAAA ATCGATAATT
ACATCTGCCA
TCCAAAACTT
'rGAATACCTG
ACAAGTTTTG
GGTTI'TCAGT
TAGCCT'rGCT
CTTTGAAGGA
GGAAAATAGC
ATCAAACCAC CACGAGGTCC TATGGAACTG GGC'N'GGATG ACCATGAGCT TCGCACCGAC AGCAACTTTG CCGT'rTTCTA TCCACCTGAA ATAATACGGA GTTCTTA.AT1' ACATTGGCAG AACTGCCTTA GCCGCCACAA TGAATTGATT TTTTTAGCAA ACGAAGGGTT TGTGGGTCG AAGGCCAGCC GCAACCAAGC AGCATCTGCG ATTTCACGGA AGCTACAATC AGTTTTGGTT GAGTTCCGT'r TTAGGATCAA AACAGGAGCC CCATCAGTCA ACCrGGCTCA ATCAAGGACA TGAGAATAGG CTGAAGCACC
CAAAGTCTAA
'TACTTCTTG GGCTTGTTT'C AAGATAGCA'r 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 CACTATAAGA AACAAAGTTG TAGGTTTGAC CAGAGAAGCT AATGACCACC TGATGCCAAA TCCATTCCCA TAACCGTATC
TGTAAGCCGC
CGAAAATTTC
CACCATAATA
GAGCTGCCAT
ACAGTTAGCT TGGCTTCCTG AA'rGTGGTTG AACATTGGCA AATTTAGCAC TTTrGCGCGT TCAA'rAGCAA GAGTCTCTAC AACGTCTACT ACATC-AGTTC ACGGCGTCCT GGGTAACCCr CGGCATAT'r.r ATTTGTCAAG ATAGACCCTT AACAGCCT'rG GAAACTACGT TTTCCGAAGC AATTAACTCG ATATTATTTT G'N'CGCGTTC TTCTTCTTTG GCAATAGCAT TCCAGAGATC CATCTTTGTC AAAAATCATA GGTCTTCTCC TTTA'rTGTGT TTTACAATAA GAAAATCAAA CTAACAGATG CGAATAAACC GTATAGCCAA CTTTTTCATA AAATGCATGA GCACCCAGAC CGGATAAACC CATAACCACA TCTTTTTGCT TCTTCTTCCA CCAATACCTT GACCTTGCGC TTGAGGTGAA ACTGCTAAAG TTGGAATAGA GTGATTCGTA AACTTCAGCG TGGACATATC GCATCCTCAT AGCCAAGTAG GAAATGATGG GAATCCTGAG GCCGTTTCC'r CTGGACTAAA AGTATAACCC AAAGCCTCTT AGCATCATAT GCTTTAAAAT GACTAGTCCA 'rrAGTTTGAT GTTTCTGCAT T'N'ATCACAA GATGATTGGC AGAATTTAAG ACCCTTGTAG TAAAC 'rTTA CTAAGATATT AAATCCTGCT CAAGTAAGAC ATGATTAGCT ACAGTCTAGC rAGTTGGCTA GGTTGATGTC ACATATACCT TTCACATCAG TTTCTCTTAA ATCTCITTAGC CAACCGAGCA AGAATATCT TCCCGACAAA TTCAAAATAG CAGAACCTCT TGGTCAAAAT TGAGATATTG GCAGACGGCC GGGATGACTA GGCATCCGAA
CTCGCTTAAT
T'rGAATCCTG
CCTCTAGAAT
CAATCAAGGG
ATCCAACACT
TAATGGTCAA
TCTT'rCAAAA
TACGTCGACG
TCGGTCATTA
TTGAAGATAA
GAGATTGAGC
GCTTCGAGAA
GTTGGCTGAT
GCCTTGTCTC
TTCGTCTAGC GCCTTAGCAA AGAGACCGTA ACCATTTTCC AACTCTTGTC TrAATCCTGTC ATCTCATTCC TCCTCAAAAG GGCCCCT'rGA CGTAAGATT TCCAGTTAGA AAAGCATCGT TTCATTAAAG GTCACTCCAC ACCTGTCTCT CGAATCAAAT TGCAAGGCCA GAATTGACCC GGACCTG4GT AAAAAGATCT GTACAAGATG TCCTCTAAAG TTTAAGCTGG TAAACATGGT AACTGTCTCT GTAGGCAAAA CATCATCAAC GACAACCATC TTTCAGGAAG ATGTTTCCTA CAAGGTAAAT C'rTACCACCA AAATAGCTAG GCCATCCTCA TCAAGCCTAC CTCTGACTCA CATATTTTC AGAAATTTCT GATTrr'rAGC ATT'rCGC'rA CTGACCAATC TGGTCTGTTT TATCTAGGAC CATAAGATTT CTGTTTCTGG ACGAGGAATC
AAATCTTTGG
TCACCTTGTC
CT'rCCAGACC
TCGCCTGACC
CAAGG'rAAT
AATAGGGAAC
C'rACAAGT'TT
AGGCAACATT
CAACTGCTTT
CGACAGCTCC
CTATCTTGAC
AAAAGTTCAG
TCTTTGAGA'r
TCTGCAAAGA
TCTTCACGAG
GTAAAACAGT
GCTACATCTA
TTTGCTAGAG
TTCACAGGAT
AAAACCCGTT
CAAATTGGTC CTTGAGTGTT GAACACTTTG ACCTTGCTTG AGTCTrTTTGC ATCTTCCGCA GAGCTAGATG AGGCTCCGAA AGATA'rAGGG TGGAT'rGGAA CAGATTTTTT TAAAAATATT AAGCATCTTG GGAAATATCT CGAGAGCAAT AGCTCCACTA TTTCAGCCAG GATAAGCTCC CATCCACCTT TAAATGCATT TGTGAGCTGC TAGT'rGCTGG
CGTACTCGCT
TATCCAATTT
ATTCTACGGT
TACAAGACAT
ACAATTATAT
TGAAGATTTT
GCTGCCGTCA
CCTGTTCCGA
ACCAACTCCT
4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 CCATAAAAAT CTGCCTGTCC AATGATGTAC TGAGCTGGCT TAAATATCTT CTACAAATTG TTTTTCTTCC TCTGTTGTCA CCTCCTGCTG GAGGGCAAAA ATAAAGTC1'G CAAAAGAGAG GCTTTCCGCT TCCTCTCCTT
ATAATTGAC
CACCAAGGCA
GGTCAAGCCG
TTCTGAACGG
AATCTGAGCA
TAATI'rCATT ATTTGTTTAA TCCACAACTT CC'rCCAATTT ATACGGTGGT CTGTGACACG TAAPACATAG ATTTTTCAGA CTACGATAGA GTCITATCAA CTCTTCTTCA AAATTTGAAA TTCT'rCTAGT TTNTGTGTTT GGTCATAAAG ACCAGACAAA ATCGTATCTA GTTTTTGGAG GTTTTGTGGG AAGTTATAAG TTCGGATCCG TCACCAGTAC CGATTGTCGA CTTACGCTCA GCGTCCTGCT CATCTTGAGC AAGTGGTCAG CAACACGGGC ACGGATGATT TTrCATGGCCT TCTCACGGTT C?1'CTGCTGG
ACGAACGGCA
712 GTACGTTCTT CCTGCATCTC AACCT'rGATA TTGGTTGGCA AGTGAACGAT GTCGCAACCT TATTGACGTT CTGTCCACr-A GCACCAGAGG CGTGATAGAT 6120 6180 GTCGACACGA AGGTCTTTTG AAGAACTGTC CCTGTCGAAG CACACGGTGG GCACCTGATT AGCAACCACT TCTTTAAAAC CCAACC'rTGG GCTTCCGCAT CGCTTCGTCT CCACCAGCTG ATCCTTTGGA AGGAGCAAAA ATCTTTGAGT TCTTGCTTGG CA'rCTCT'rCG GCATCGACGA GGTGTCACGA TTGGAAGCT'r GACATCAGGG TCACTCAGCA TTGATCATAG ATGrrCATTT TACTTCATTC TCGGATATTT TGCCTGTTCA TTTTCTGGTT 'rGTTCGAACA TAGTCTAGTA GATCAATG1'C GTATTCAACC TCTTCAACTT CTGGCATAAC TATGAACACG GCCTrGGCr'r TCTGTCACAG GAACACGr'rG CATACTTAAG CTTAGAGTAT ACAGACTGAC CTGAAACCAT CACCGACACC ATTCATAGAG GCTTCCATGA CTTCAAAGCG ACTTTTGGTA CATAGTTAGC AAATCTCCAG CGAAAAGTGC CTCCACGGA'r TTCAACGATG ATATTCTTGT CATCGTTTGG T1-rTCAGTTT TTCTTCATAT TCTTCTTTTT CAGCCT'rGGC CCAATTCTTC CAAGTCCGCA TCTCCGCCTG ATTCCTTAAT TATTTTGAAG GACTTGTTTA TACTCACGGT AGGCTATTAC CTTCTTT'rGA AAGCTCCATA AAACGCTTGG TGTCTGAAAC ATTCTCCTAA TTCTTCATAA CGGTCrTC'rA CAACTTGTAG TTTCTCCTrTA TTTCTCAATT GTTAAATCAT AGATTGCTAC CCCCAGTTTC TTTAAATCCA TAACTGAGGT AACAAAATCT CATAXGACAA CCAAAGTTTA 'TTCCTrAAAC CTGCTGGCGC CTITATCCAT AATTGGTTTA AAATATCCTT GATTTTGAAA *6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 ATTCTTATCA ATCATAAAAC GAAATAGTAA ATAATTTCCA ATCATAAGCT ATCATCACAA AACCTATAAT TGCATCATTA TACAAAATCT CCATTTTTAG AATGAATTGT. TTTTGATATT TGTAACATCT CTTAGT'rCA.A CAAAATAA'rG TT'rACGGCAA CTCCATCGTA AACGGGCAGT AATACTGACA GATGGTCTTG AACCTTCGAA CAATTCATTG TGTAGACCTA TGCT'rCACCT CTACTAA'rrC CGATCTTTTT TCATAAACTG CCAATGGAGC AAACTAATTG CGTTGGTTGC ACATCAAAAT AATTTTCCAT GTTAGAGGTT A'rCATTGGCG
CCTTGACATC
TTGTCATAGT
ACTGAGATAT
CCATCCTGTG
AT1TTCGTCAA
CGAAAGTCAT
CAAATTTAAA
TTTGCTCCTT
AGGTTTCGTT ACCACCAATC TGGATCTG2-T TTCGCAACAC CATGGT CGCC TTTTTCT'rGC TCTTGTCTGC TAA.AAGCAAG AGATATTTGG 'r'rCAAGCC AAAAGCCATG ACGGGTATGT AAACA'rGGTG GCGTTTGAGA AACTGGGCTT GTAGGTCTCG GATATAGCCA AAGATA'rCCG TCATGCCAAT TCGACTCGAC ACATAGCCAA TAATCACAAC ACCTTTTCCT TGCTCCTCGT CTAACTCGTC CACAACACGA GCTAGGTCGT CATCGACCAA AACACAGTAA GGTTTTTCTG TTGTTTCCTC AATCGCAAgG GCAGGGCGTT CGCCGTCACG CGTATCCAGA GCCGAGGTCA 713 AGI'ATAGGC CACT1'TGAGA ATCrCAATCG ?ITACCAGA AGTACAACTG TGCCATGTTT CT'rGCTTCAC GTCCATT'rCT TATATCA'rAA
ACTAGTGGAG
CGAGCAAAAG
TrTTC'TAAG CTTTAAACGG CAAAATGTGG GAAGCTATTA TGCCATTTGT ACGCATCGAT AAAGCTCTTG CTAAGGAAGT AACGGAAGCA CCCTCAATCT GCTGTCCATG 'rCATCATCAA CGACATGCCA AGGGGAAATG CGTACTAAAT AACCTAGCTT AAGCAGAATT CAAGTAGCA'r TCATTGAAGA AATATCCTAA Ar'rTGTTACA AATT'rCCAAG AAAAGAGCTA TTAAT'rAAAG GAAACATTAT CCATCGCTCC TATCTCTACT CCACTAGGTG AAGGGGCTAT gAACAGACAG T'rTTGCTATT GCGCAAAAGA TTTTAAAGG CCAGCCACAC 'rCTCAACTAC GGTCACATTA TTGATCCTCT
AGGTTATGGT
TTAACACCCA
GGGCTCGGTT
ACTTGACACA
ACATTGCGGT
AAATCCTCAA
T'IGAGGAAGC
CCAAACTCCT
TCATTGGACG
AGGCTATCGT
TCAATGGTGT
TGGGGCTATG AAGTCTCCAA AGACCT'rCAC CGGTGGGATT GCGGTGACCA ATGAAATTCT GGCAGAACCT GGTGAATTTA CCAAACGTGC GGCAGAGGCT GTGATGGATA TCATCCGTGC GTrrCATGGTC CCATAACGAT AAATTTTGC TACATTCTAG TAAAATAGAA GAAATCAAAA TTATTTGAAG GACGCACGCT GTTGTCCGCA ACACTGGAGC GAAG4GAACTI' ACTTCCCACA GCTTAGGCTT TTTCAATCTC ATTT'GAAAAG AAACITGGAG GATTACACGT GAATT'TGATA TGGTATTGTC CGCCTGAGCG AAAAGACTTG AACAAGGTTG GACTGGTAAA GTCATGGACG TCGTGAGGAT ATTATCGAGA CCAGCTAGCT ATTCGTGAAG TTTTTTAAAC GGTCGCGTAG CAAGACTGAC A.AGGCCATGA CA'rTAACAAT ACCCGTCAAG CTATCCTGAG TATGACGATG GGAGTTTGAG CAATTACTAA TGAAGGAATT TCAACGGCTA CAACCTCTTG CGTGAGGACA CATCGAAGAG TACGTCAACA TCGTGAAACG GATGATATCG GGAAGCTGAC CTAGTTCTGC CCAACTCCTA GAAATCAGTC GCCTGAAACG ATTGAAACT'r AAATCAAAAC ATCGATAAAA TTTGGTTGAG CAAGATGCTA GGCCG'N'GAA AGCCTACAAG 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 CAAACAATTA GACGGCTCCC TACACTTGCC CAAGTTGAGG CACTACTGCT GTTGTCCGAG TAGGACAGCA CGTCGTGGTA TCCCAACGTT GGGAAATCAA AACAGATATC. GCTGGGACAA ACCTCTCAAA TTGATTGATA
TT'rCTGACCT
TCAATATCGA
AGAAGACAAT
AAATCCTTCG
GCCI-rCTCA-A
CACGAGATGT
CAGCCGGTAT
AAGCTCTTAA
CCCAAGATCG
AAACTGACCT
CAGTTCTTAA
AAAATGCTGG
TTGAACAAAT TGGAGTTGAG CG?1'CGAAAA TAGTACTAAA CGCTAGTGAA CCACTAACCG AGGAGACTAA TCGCATTATT CTTCTTAACA CGGAACTACC TGAAGATGTC ATCCGCAT'TT TCGAAGAGAG AATCAACAAC CTCTTCTTTG CCTACTTGTC AAACGCCCGT CACATTTCCT TGATTGAGAA 714 CTGTTAACCA AGGTCIGAA CTAGGGATGC CAGTTGACTT GTACTTGGGA AATTCTAGGA GAAATCACTG GAGATGCTGC AACTCTTTAG CCAATTCTGT T'rAGGAAAAT AAGAAAAATC TGGATTTTAG GTTCTATAAT ATTGTAGTG GGTAAATCCA ATTTTATTGT AGAAAAAAAG TCCCATATGA CCTATAATGA TAGAAAGAAT CATATGGAAC AATTACATTT TATCACAAAA TAATATCCAG ATTTTAGACA TCATCAATAA GGATACACAC GGACTACGAC GCCCCATCTT GCCCTGAGTG CGGAAACCAA AAAAACCTTC TAAAATTCCT TATCTTGAAA CGACTGGTAT GAAAGCGTCG ATTCAAGTGC TATCACTGTT CAAAAATGAT TCAAGAAGAA TCACCAAATC CCTCGTATCA TCAACCAAAA AAAAGATTTC TATGACTGAT ATTGCCCATC AGCTTTCCAT GTAAGCTCAA TGACTTTrCAC T'rTAAACATG ATTTTTCTTG GGGATGAGTA TGCTTTTACA AAAGGGAAGA T INFORMATION FOR SEQ ID NO: GCTTCAAGTT GACTTGACCC TCCAGATGAA CTCATCACCC CATGATCCTT CATTCGGTCA.
CTATAGATAT TATGGAGCCT AAAGCGACAA AACAACI'CAT TTACTAGACA TTAAAGACCC AAGGAAATCA TCGCCAAACT TTGAAGAAAT ATGACTTrCA GCCCACTAGA ATTCTCCT'rA GGTCGCTGAA ACTTCTATCG GATTGCTCAA AAGTTAATTG CTCAACTTCA ACTGTTATTC TrCTT CCTGAG ATTATGTCTT 9660 9720 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10411
SEQUENCE CHARACTERISTICS: LENGTH: 2393 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: GTTTTGGGIT CTGGAAATTA TCAGATGGTT GGAAAAGCCG TCCACATCAA GATAGTGTTIC GGAGA'IrTAA GTI-rAAATTG AAGAAACTAA CACAGAGGAA. ATGGAGTATA GACCTAACAA GACGTATTGA GCAACTGAAT T'rGTCTATTC GAGGATi3GAT AAACTATTGC TCATTGGGAA ATATGAAAAG TATAGTCGCC AGCATAGATG AGCGCTTGCG TACTCGCCTA CGAGTGATTA TC!TGGAAGCA ATGGAAGAAG AAATCGAGAC GATTATGGGG ATTGCTTAAG TTAGGAGTTC CTAAATGGAT AGCAGATAAG GTATCTGGCT GGGGCGACCA TTATCAATTA GTAGCTCAGA AGTCGGTACT TAAACGTGCT ATATCAAAAC CAGTCCTGGA AAAACGTGGA CTGGTTTCGT GTTTGGATTA TTACCTTGAA CGACATGCGT TAAA-AGTTAG CGGCACGTAC GGTGGTGTGA GAGGGGCTAG AGATTATCCC AAATTTATTr TAATTATGCA AATTTCACGT ATTTTTGATG TTGA-ACCGCC GTATGCCAAA CTACTCGATT AACTCCCCTG CTGAGACGAC GATCCTGGGA AC=1TCAGA TATTTTTTTG ACTATCTAAA GGATTTGAGC GTTrTTCTGA TTITTTAAGAC TAATTATTCT ACTAACTAAC TAACTTCTTA TCCTAAAATT AG3CAATAATA AAGGCAATAG TTTGTI'TTCT GCTAT'rTTAT GCTAAAATAT ACTAAAGGGG GAGCGCTACA TGTCTAATTC TGCAAATTrTA GCAGATATTT TCT'rrAGAGT AAAATCAGTA ATTGCCACAT CACTAGTTCC GAGTCT'rTTA GTTCCGTTGG TTACTAAAAG TCAATTTGGA AAGACTATAT 'rATTGGCGAT CGTAGCGCCT TTGGTGACCT ATCTATT'rGT AGCACCCGTT TCCTATGCTA TTGTGCCACG
TCTATCATTA
TTTTTrCCAGT
GTACTAGCCA
TTTGTT
'rAAAAArCAA
ATTGTCAAG
AACAATCATT
GAAAAGCTTA GAGCGCCAAA CTCNTTCG A'N'GAAGATG ACA.ACGATAA 'rCATAATTCC 7rCATGTAAA AAACCTCACT ATTTA6ATTCC AAAG1'?rGTA TTGTTAGTCT CTCAATTATT GCTAACATAT ACATTATTTC TATCTTAATA GGAATATCCT CTTTTGTTGC GTTAGCGCTA AATAGCGTTT TATCTTTATC ACTGGTAGGA ATGTT'rACCG TAATGCAATC TGTTGCAATT TCCATACTAG ATGGTT'rTGC CTATGCGACC GATTTGGGTA AGGCTAATTC t.
AGCCTTATCA ATGACTGGTG AAGCTGTTCA ATTGATAGCT TGGGGATTAG GTGGACTCTT GTTTGCAACA ATTGGTCTGT TACCTACCAC GTGTATCAAT TTAGTCTTGT ATATCATTTC TAGCTTTCTG ATGTTATTIrC 'rrCCrAACGC TGA-AGTGGAG GTGTTAGAGT CAGAAACTAA TCTTGAAATT TTGC'rCAAAG GTTGGA-AGTT AGTTGCTAGA AATCCTAGAT TAAGACTTTT TGTATCAGCA AATTTATrGG AAATTTTTTC AAATACGATT TGGG I1TICTT CCATTATACT TGTTTTTGTA ACGGAGTTAT TAAATAAAAC GGAAAGTTAC 'rGGGGATATT' CTAATACAGC ATACTCTATT GGTATTATAA TTAGTGGCTT AATTGCTTrT AGGCTATCTG AAAAGTTCCT TGCrGCTAAA TGGGAACCCC AAT'rATTCAC CCCAA.ATCTA AAAACCATCC AGAATCCTTG CCTTAGCTTA GATCCTGGAT GGTTTCTTTT TTCACCCAAT GGGTGTTTTr' 'rACTAGACAA AAAAGAGTTT CCCCTTTATG GTATAAGTGT AGAAAAAAAC ACAAAAAGAA AGGAAACTCA CATGAACAGT TTACCAAATC ATCACTTCCA AAACAAGTCT TTTTACCAAC TATCTTTCGA TGGAGGTCAT TTAACCCAGT ATGGTGGTCT TATCTTTTTT CAGGAACTrTr TTTCCCAGTT GAAACTAAAA GAGCGGATTT CTAAGTAT'rT AGTAACGAAT GACCAACGCC GCTACTGTCG TATrCGGAT TCAGA'rATCC TTGTCCAGTT CCTCTTTCAA CTGTTAACAG GTTATGGAAC GGACTATGCT TGTAAAGAAT TGTCAGCTGA TGCCTACTTT CCAAAATTGT 'rGGAAGGAGG GCAGCTTGCT TCACAGCCAA CCTTATCCCG TTTTCTITTCC AGAACTGACG AGGAAACAGT CCATAGTTTG CGATGCCTCA ACCTTGAATT GGTCGAATTC 'rTTTTACAGT TTICACCAGCT 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 716 AAACCAACTC ATTGTAGATA ACGATTCTAC CCATTTCACA ACTTATGGCA AGC INFORMATION FOR SEQ ID NO: 91: SEQUENCE CHARACTERISTICS: LENGTH: 4762 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY- linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: TTTGTATCTT TTTAGGTCTC TTTCA.ATCCA AACCCTTTAA CTGCAAGTCT TGTGGTAATTr TTAGG'ITTGA TTTTACTTTT ACGCTTCTrA TT'TGATGGTC GTC?1'CCCTG TTTTCCTACT AGAGTCAACA GAGGGGGCGT AGTGCTAGAA GAAGCCGAAG GGAGTCGTTT CTTCAAAGGA AATCTATATC TGCTAGTIT= CTGTTCCTTT TTTGATGAAG T'FTGTCCTTT ATCCAGTACC TTGCTGATTT GGTAAAAGAG GAGACAAATA CGGAAGATGC ACTGCGAC'rC TTTATCGTAA GAGTGAGCGC TTGTCCCATC ACTATACAGC AACTGAGGAA AATCGTAATA AGTTACTTAA CTAAGGTGAT TGTGGTAAAT GATAAGGTGG TAGTCTGGTC AAGAAAATTA CCAACAAGTA 
AAGACTGATT ACTCAGAGTT AACCAAATCA ATATCTTGTG TATTTTTAAA AATI'TTAGGA TTTTTCTTTT TAGAGTGGTA TAATACTTTT TAGAAAGAAC TATGATTGCA CTAGAAGAAA AAATTACAAT TTTGCCAACT TGGG~AGACGT GTTGTATTrTG ATGTGGACAA GATTGACAAG ACTATACGTC ATTTCGGTTC CTTTrCACAA GAGCCTCTGC TTTATTGGTA ACCAATATTA AGAAACGCCA TTATGCCTAT TGGGTTTGTC TATCTTTTGT TTATCAAGAA CGTAATCGTC TATCTCATGC ATGGGATGAT GCCATTTTG TCCCCGTTCC TGAC TTGAAA
TGAAGTGGAA
TAAAGTCTAT
TTTTTAACAC
ATTrTAGAAA
CTCTTCGTCG
GAAAAACAAC
ACACTCTTAA
AAAATTAAAT
AAGATATTGA
AGAGCATGCA
AGAAACGAGA
120 180 240 300 360 420 480 540 6 00 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 GCTCTCCACA AGGCGGCTGA CAAGGTTATG GATGTGACAC CCCTGGTTGA TATTACAGjAA ATTCATAGTC GCTTTCCA-A.
CGTAGAACAT GAACTCCTTG AAGCCAAAGA -TCGGACACAG AGGGATTTTG AGCGCTCAAA ACTTCTCAAC AAAGACCAGA CAGTTrGTCAA TAACACTCAG CGTGATTTGA CAGCAGGGAT TCCTAAGCAC GTAGCCAATG CCCACCAAAA CAGTCCCTAT ACCCCTATGA CCAACTGCTG AAAATGCCTC AATGATCTGA CTGAGCGAAT GGGAATTAAG ATTTACGAAA TTCAAAATAT ATATGCGCTG GCTGAGGAGT ATATTACTTA AGCGACGGAT ATCAACTTTA GTATTCATAA TGAAAACGCT AATAAAGACA GTGATGTCTT TGTTGGGAAA TCAATCGGAC TGCAAATGCT GGGGGATATC CACTATCACG ATTTGGACTA TTTGA?1TGAT TTTAAGGGTA TGTTGGAAAA 717 TGGT'rTTAAG ATTGCAAATG CAGAGGTAGA GAGTCCCAAG ACAGATTTCT CAAATCATTG CCAACGTTGC TTCTAGCCAG CCGTATCGAT GAAATTTTGG TGCAGAAGAG TGCCTATTGC GGACATCTAC GATGCCATGC 'rGGACAAACA CCTTTTACr1' AATTCAAAAA GCTATTTTAA TATCTTTCCT AAACTTATCT CAACTATGAC ATCAAGCAGT C1'TGTCTTAT GATAAGAT'rG TTCTTTCCTT CAAGGGTGGA TCTGGGTGTT GTGACGGT'rA TAAGT'rCTGG GAAATCTTCA TGTCGAACGC ACTAAAGAGG TTTTGGCCAT CGTCTAGGTA GACCGTTTCG CTGGGCTATA CTGGGAAAGT AATCCAGATG CCGTGTAGAA GAGTGGTCAG CGAAAGTCTG ACAGACCGTT TATCACAGAC AAGGAATACT ACCGTTTGAA AAATTGGACT CATCCATTAT TGTGAGTATC CCCCTTATGC AGAGAAGAAT CTGAAAAACA GGAAGATTAC AATCTCTTGA GTATGAAATC CGTTAGGTTT TGGTCTGGGA ACATTCGCAT CAAGGGTCr'r T'rACGCTTAA AAGAGGCCTC 'rGGCTCTAGA GTGTGCAACC TCTATCCAGA CTGCGACAGC TACGGTGGCT GTTCAGCTGA TATCAAAAAC ATCTCAAAGA GCTrGGAAGA AAGCGCAAAA AATACTCTCT TCACTTCAAA ACCAGTCGT TTGAACGAGA GGTTCAGAAC ACCGTACC TTrGATTTGAC
AGGATGAAAA
ATCTGCCTCG
ACGAGCGAAT
CGACACCAGC
AAGAAGAAAG
TCGGCTTGTA
AGGTTCTTTC
TCGTGTAGAA
TATTGCTCT'r
AACI'TAGAGG
AAGCCGATGT
AAG4GTGCCTA
GTCAATTCAG
GAGTCTGAAG
AAGGAACTCC
ATCCAGACGT
TGGCCTGCCG
GTCGCATGAA
GTGATATGAA
GAATATCGCA GAAGATGCTC TTGTTTACCG GAATGCTCCT ATTCTTTATC AGTACGGTGC TGTTGACCAG CTCTTAAGA TCAAGTAGCG ACAGTTTTCI' CTAAGGAATT CACGCTAGAC ATCATTCACG ACCAATATGG CTACCA'N'TC 'rCTATCTACT TCTGCCGACT AGATATAGAC AAGTTTGCT ACACCAACTC TTTccACTAC GATGTTCGTA TTGAGAAAGT CTATCCGGAA GCAGGTGCGT
ATCGTCGTGC
TTGGTAACAG
ATATGAAACG
CAACACCATC
CTATTCCTGA
AAAATCCAAC
CAGGTGGTTT
1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 282,0 2880 2940 3000 3060 3120 CAGTCCTTCA GCAAAATCCA AAGGCCTTGG AAGCTGTCTG GGATTATGCT TATGACCGTG TAGCCTATCT AGGCACCAAT ACTCCGATTG ACCGTTGCTA CAAGTGTGAC TTTGAAGGGG A'NTGAACC AACTGAGAGA GGGTTTGCTT GTCCAAACTG TGGCAATAGC GACCCTAAAA CAGTAGATGT GGTGAAACGA ACTTGTGGCT ACCTAGGTAA TCCTCAAGCA AGACCGATGG TCAACGGGCG TCACAAGGAA ATCGCTGCGC GTGTCAAACA TATGAATGGT 'rCAACGATTA AAATAGCTCG GCATCAAGTA ACAAATTAGA AAGAAATGAA ATGGGAAAAT ATCAACTAGA CGATAAGGGG CGCGCACAAG TGACCCGTTA TCACGAGAAA CACTCTAAAG GTGGAGCTGG TAAGAAAGAA CGCTTGCT'rA GCTTCAGAGA ACAATTTTTA AACAAGAACA AGAAAAAATA AAGGATGGAA TTACGCAGAC AGAGTTTGAA AAAT~rCAGT GTATGAAGAC TGGTTAGAAA ATGGGTTTCT GCAATTCAGT TAATCTCCGG TTGCGCCTCA CATTCGTCCA TCTGAAAGAG AGTTGCTAAG GAAAAGAACA TAGCAGAGCA GTCATTCTAG GCGT'rATTGG ATAGAGGTAG AAC?1'AGTCA AGGTCGTATC TGCGCAACTC TCTCTATGTA CGACT'rCGTC TTTTAA'rGCT CAGACCTTGC CCAACCCTAT ATACTGGGAT TC'rCTTGCCA TCTGGTCCTG GACCGGCTAC 718 AAAGTGAGAG CCAGCTCTCG CrTTTCTCAT AGTGGGAGGT CAAGATTAGC GGATAAGAAA GCTGTTTTAG ATATGATGAC CGCCTCACGA CGGCGGTTTC GCAATCAGGA ACAGGAAATG TAGTGGCT?? TTCTGAGAAA GTAACTTTCT ACTAGAAGAA.
GCAAGGGTTA TGCAAAAGAG TCAAGAAAGC TCTGGTGACC CAAATGGTGG AATATr'rGAG CGAATGAATA ATCCAAALACC ATTGACTACA AGGCCTr'rAA TCAGGCTGTA TGTTTCACTG GGCATTCCCT ATACAGCAGA 'rGGGATACAG AGAACTTTGT GGGAT'rAATC TGCCTGAAGG GGTCAAGCAG TTGGATTTCT GGTGGCCACA TTGGCTACTC
ACTCTCCGTC
TGTAGTGTGA
GATGCTCGCA
ACAAGAATGG
CTTTGTGGAC
AGGGCTTGCA
ATAATCCTGC
ATGGAGTCGA
AAAAGCGAGG
GGCGAAGGCG
GTTCAAGGCT TGACTTTGCT CTTGTTAAGC GGATTCGGAA ACTTGGGAAG AAATGATGTT
AATTCTTGTC
TTATGCTCCA
AAAGTGGGCA
TGAAGAGAGA
GAAAGTCAGG
CGAATTGTAC
CACCTCAGGA
TCTGCCAGTG
ATGGATGTCT
CCAATTTTGA
ACCATCTGTT
ACTGTGAGAA
ACTGATTGAC ATTCTTGTCG GTTTCGAGGT TCATCTAACC AGTAGTGATT TGGGACAAGC ATGAAGAAAA AGGACTTAGT ACACTGGGAA TATACGGTCA CAAGCT1TrAG ATTCTACTAC CGCCA'rCTGG TAGTACCCAA GCGCATGAAC TGGAGAGTTT TGACAATTGA AGAACCTTGG TTGTCGAAGG GATGTCTGTT TCTACACGGA TGAGGAGACC ATCGCGATGC GG
ATGGAAGATA
AACGAATTAT
TCAATGACGG
AGACCAACTA.
'GGAGCTTCA
AGTAAATTTG
GGACGCGCCG
CGAGGGATGT TATAATGTTG ATTAGAAGAG CAGATTATGG GGGAGGGGAG CCTTTTCTCA GGA6ATTGCCA GACAAGGACA GGAAACTCCA GATAAACTGG TGATCGAACT AAGAGAAATC CGATGTGCAA AAATCGCTCA AAAAGAAAGC TATGAACAGG GTCTCAGAGA TCGAGACGGG CGTAAATCAA CCTTTGCACA CTAGAGACAG ATCCTTATAT AA'rCAAAAGG TGACAGCCAG 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4762 GCAGAGAGAT ATCCTTGCTT GCAGGCGGGT AAGGCTAGTG AGGTCTTGTC TGGAGCCAAA GGCTTTCTAC CCAAGGAACT C'rrTCAAAAA GAAr'rAAAGC GACGCCTTGC TAGAGATACG INFORMATION FOR SEQ ID NO: 92: Wi SEQUENCE CHARACTERISTICS: LE1CTU 3832 bae 719 TYP e 3e z', C) Spnucei acidPar a a r-AGCAGTT'rC;ACCAC TAPTON: SEQ ID NO: 92: G G T T TC GGACC;C A7 ATC M A AAGCCT7pQC ?I'paTTccTT 2 'rc.GAAAG A ATAAT GGTAGACACT12 GG T TT A A GGAA
C
T T CA- GAJTA G AGAT ~AC 120 GGTAr~C.C. AGC:TCT AATA P TA T~ _GGT ACCTTGCTTT24 TA C A C C~ CA" TTAAGG A-ACTCTITCC TTCTA;GAC24 TT rCoCAAT TTTC ATTC CAAAATACGA rC C T C A A T G C GATTTTCTGGJ~ AA6AGGGACTGCGCCCTAAA GTC ?CGT CCAACCCACA 360 ATACCACT CCCTGTAA CGTACTATCG AATTAAG AACGC 2 AC CGTTTT GGCCT; A CGAAGCATGAA TTGCTCT .TT 0 480 CAGCT M6C ACCCAAA C:CG.T 1 AGCAGTTCAT TC-A(3TTCAC CACTGTCACCTCAG ATTTGGTGGA G 540 CCCTc AGACCCCTTTT. AGATTGAGG AT" AACCTTCCAC CAGATTCG; C ACTC G~ CTT C G G CT TTTGATC A AA260 T TGA A G T A CA A C T GA GA TGAA A G G AA AC T 7 0 CCAT CATAAGCCT GAAGGG CAGCAATCC TCAGTTGACA C-ACGCATCT, 4 CAACA AACIGGCTYC.ICCA ACrTA CAGAC CTGAT 84 CC AGTA GA
T
T TGCAA GCT CTTAT AATT GTGATAG T A (3AA c M TA A TG 9 00 AAACAGGTAT GTcAA.ATGTC ACG;TACCA TTTTGI ~kCGAATCAA 6 AATCCAGMAG AACA A TC
G
CT G T A C CT C T I AA G CrTC CAT G T G2020 CA A TCC TGCCCAC T ATTA CAA7 TACA T 1000 AA AGG2TCA A T C TC p14 AC Ar cccpAA A c CCTT AAT CC GC A ACT T CT CA ACT TTCT TGCTGTTGCA 1 4 C AAAT AGrrWGCT. AAC G1AT A TOACC AGGCT 1200 CA T AA C GCC ACTA,~~ A? C T CT AC AC GT ACG T 26 C CCT G C G C rATCA A CTA TA G A T A TAC A A T A G GT 1
CCAATCT.TT
0 -CGAACCTAT GCTCA ATCTITT G TCA~ AACA 1320 AAGCTG A AA GC T C A "rrCT GCTC MGCTAT CTGCTG cJp CTCATTG 1 TTCcTCC C TCA G T' ATC GCTTAGCrC CAATCTG GGC T ATTA 150 1560 1620 1680 1740 1800 720 GGTAAATCAG CTTCACAACA CGATGCACAA AAAATGTGTA AAGTTGTTCG TCACGTTGTA GCTGCTGACT TTGGTCAAGA AGTCGCAGAc AAAGTTCGTG TTCAATACGG TGGTTCTGqTT AAACCTGAAA A'IGTTGCTTC ATACATGGCT TGCCCAGACG 'rTGACGGTrCc CCTTGTAGGT GGTGrGTCAC TTGAAGCTGA AAGCTTCTG GCTTTGCTTG ACMrTG'rAAA ATAATCAGTA AGTAGCAAAA GCTACGTGGA ACAGCATTCA GATGTCTGTT ACArTTrrA TAGGAGAGAA AGATrrGAAAA CAAAAATTGG GTCGCTGCAA ATGAAAC'rGA AGTTCAGAGC AAAATCAGTC ATTAGCAAGT ATCTGrI-rAC TAGGCTTGGC AACTAGTCAT AGTAGCAAAA ACTTCGCAGG ATACAACGAC ACTCAAGT TTCTAATAAA ACGCAAACGA GCGCAGAAGT ACAGACTAAT
GCTGCTGCCC
GAATGGATTT
TCGCAGAATG
GAGTGGATCT
GCTCATCAAG
ACTGGGATGG GGATTATTAT G'rAAAGGATG TTGACAACTA CTATAAGGC'r TGGTTT'rATA AATGGCA'rGG AAATTACTAC CTGAAATCAG ATGACAGTA.A TTACAAGAGT TGGTTTTATC AATGGCAATr GATTGGAAAT AAGTGGTACT ATGGTTCTAA AGCTCAAAGT TTAATTCAGA TGGTCGTTAC GTGGATATAT GGCCCAAAAC TCAAGTCAGA T1GGGGCTTAT ACTTCAAGAA GTGGGGTTAC GTCAAGGAGC TATGATGCAA ATCTAAAATC CGATGGAACT ACTATTTCAA GAAGTGGGGC CTGGAAGTGG TGCCA'rGGCG CGGCCTCTGG TGAGCTCAAA
ATGGCTAAAA GCCAATGGCA AGGAAGTTAT. TTCT'rGAATG AATGAATGGC TC'rATGATCC AGCCTATTCT GCTTA'TTT TATGCTA.ACC AAGAGTGGCA AAAAGTGGGC GGCAAATGGT TATATGGCTC GGAATGAGTG GCAAGGCAAC TACTATTTGA ACTGACGAAG TGATTATGGA TGGTACTCGC TATATCTTTG 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300
GAAAAAAAAG
AATAGAGAAG
AATGGTCGTA
GTTCGTCTAG
AACCGTCTCG
GCTGAGAGTG
TACCCTATCT
ATTTGAATGT CGGCTGGGTT CACAGAGATG GTAAGCGCTA TTTCTTTAAT AACAAGTGGG AACCGAACAT GCTAAGAAAG TCATTGATAT TAGTGAGCAC TCAATGATTG GAAAAAGGTT ATTGATGAGA ACGAAGTGGA TGGTGTCATT GTTATAGCGG TAAAGAAGAC AAGGAATTGG CGCATAACAT TAAGGAGTTA GAATTCCTTA TGGTGTCTAT CTCTATACCT ATGCTGAAAA TGAGACCGAT ACGCTAAACA GACCATTGAA CTTATAAAGA AATACAATAT GAACCTGTCT ATTrATGATGT TGAGAATTGG GAATATGTAA ATAAGAGCAA GAGAGCTCCA GCACTTGGGT TAAAATCATC AACAAGTACA TGGACACGAT GAAGCAGGCG ATGTGTATGT CTATrAGCTAT CGTAGT 'rAT 'PACAGACGCG TTTAAAACAC
AGTGATACAG
GGTTATCAAA
CCAGATATTT TAAAACATGT AAACTGGGTA GCGGCCTATA AACCCTCATT ATTCAGGAAA AAAAGGTTGG CAATATACCT ATCCAAGGGC GCGTAGATGT CAGCGTTTGG TATTAAGCGA CGAATGCTTT AGAATGGGAA CTTCTGAATA CATGAAAGGA TGAT'rDGAAA GAGGGATGTG 721 ATAGTAGCAC CCTC -TTTC ?1'TGTrTAT GATAGTTCAT CCTCGAGTAA ATTCAAG1"rC 3360 TT ?GC!TCGGAA ATGAAGCTTA TATAGTAGAT TGAATATAGA CAAATACCTT GTGATTGGrA 3420 AAACATTTTA GAAATTCATT TACCTTTCCr AATCGACTrG GTTTCATCTT AN'TCAATCT 3480 ATTATACTAT TGGGGAATTT C1'TCAAACCA CATCAGCTrG GTCAGTTCTA CCTGCGACCT 3540 CAAAACTTGT GCTTTGGTCA AGCTGGGTTT~ AGTTTCCTAG T?1'GCTGATG GATTTCM- 3600 GACTATAAGC ATCCAACCCT CTTTTTGTCT TCTAAAGAAT TCTTAAATTA TCAGTCrATT 3660 GCAACTTTTC TCATATAAGT TCTr'rGTCTr GCTAT-rGGT'r TTCCTTAGTA GTATACTAAG .3720 GTAGTAATCA TTAAGAAGTG GTTACAAAAA ATAATGAATG AGGTAAAGAA AATGGTAGAA 3780 TTGAAAAAAG AAGCAGTAAA AGACGTAACA TCATTGACAA AAGCAGCGCC GG 3832 INFORMATION FOR SEQ ID NO: 93: SEQUENCE CH{ARACTIERISTICS: LENGTH: 10690 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: TGAAAAAATC.CTCATGAACC TGGCGCCAAT AGACAAGTGT CTTGTTTCCC TCACCTTCCT *TATAGGCATG GTCAGCTGAC ACTCGATTGA AGGGTTTAAC AGAAACCTTT GTAATTTCGA 120 *CAATGCAGAC AGCCTGATTT TGACTATCTA AAATGACATC GAAGGTCCCT ACTTGGGGAA 180 GTGGTTCGTC T'rCTAGCACA TAGAGGTCAT AGGCTGATGC TGTTGCTGTC TTTTCTCCTT 240 *TAAACACCAA ATCCGCTAAA AGGTCTGGTT CAACTCCAAA AGCCCAGGCA TCGATTTCAT 300 CTCCGATCAA AGGATTGATT TGCTTGTATT TATTCCACAT TrCTTGCGGT ATCATGGGTG 360 CTCCTTTGTA ATTTTTTACT TTCTTCITT ATGTGTTTAA GATGATCTGG ATGGTCAATC 420 TCTAAATCAA AXATCTCTGG AATAGAACTG TAGTGGATAA TGCACTTGAT ACCCAACTGA 480 *TTCATTTTTT GTATGAAAGA AGTATTCAGA TAGCCTGCTA CAGCAAAATC AATCTTGTTC 540 TTTCTTGCTT TATCCTGCAT ATCTCTTAGC ATATCTAACA TTATTGGACT TTCCATATCA 600 TGCCATTGAC TGTTTCTCAT AGTCGCA.AAA ACAAAGGAAG TCAAATCATT CATTCCAACT 660 *ACAATCTTTG AAATGCCCGT TTCCAGTATA CTAGATAAGT CAAA-ATACGC TGACGGTAAT 720 TCAATCATCG TTCCGACTTT CCCAGTAAAA CCCTGCTGAC GCAATACTGT AATAGCTTGT 780 TTTAATTGGT CGGCATCATT GACAAAAGGA AAGATAACAG ATAGATTGGG 
GTTGGTTTGA 840 722 TAAACTTCTG TAACGACATG TGCTTCAGCC TGAAATTCAT CCAAACACGC CAGTP.AACC CTAGTTCCTC TATAGCCAAA CAAGGGA'rGC CCTDCGTCAA AAAACTCTTT AGTCCCCACT AAACAATTGG CTTCTGTATT TAAAAGGAGC AAATAGTATC ATATTTTGAT TGAGCWCT AGTTGAGGAT AAATN'TTTC A?'rCACCTTC CAATTCATGT CTTAACCTTIC AAGACTAATG AATATATCCA AGTGAATCAT 'rGCACTCTCA TTCATT'rCAA CcrrAAT-rCA GTAAAACGAT ACCAAACTC CTTACCTAAG AAGATAATCT TTCACAAATT CCTGACAACI' TGTAATAGT CAATAAGTAT TCCCCACGAA -PeATGCCGAC GTGGTGAAAT AAGAATTTTT TCGCCACTAA GCCGCAAGTTG ATr'rCTCATC AACAAGTCI' GTCCAGTCT GG2AAATCCTA ATAATTCAGA
GCGATGCATT
TCGCACCATC
CATCATACAA
TTCTTCTGTA
AGACACAGCT
GGCI'M'GACA
TCGAGGATTA
CGTTTCCGTC
ATCTCTI'GAA TATCCATCCA TCCCAAATCG TAACTTGAGG TGGTGAACCA TAAAATTTTT GAGTAGCTTC TAACAGTAA.A AGTCCTTCAC CAAGTTGCTC 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740
TAACTCTTCC
TCCCGTCTCT
CTGACGAAAA CATCGTAGAT TCCATAACTr CTCTAG'TCAG ACTGCCTCCA GGTAGATCAT ACCGATGTTG ATAACGCCT CTCGTTTTTT CAATGCAAAG TAACTTTCCA 'TTTrCAAAGC AAACACAGTA GACCCCAAAG TrATTTGA TTTCCATCCA ACTCCTCC'rA CTTCAAAGAC CAGCCACCAT CTATTGTCAA GATTTGTCCT TGCATGGCGC TCGCT'rTTCC ACTTGCTAAA AAAAGACTAA GCTCTGCTAT TTCCTCTGC GCTTGATTGG GGTCACTA GCCACCCAGT CAGCCAAACC ACCTGGTTCA CGGTCATAGC TGTCTTGAC'r CATAGTCTAG AGCCAACTGC CGTGACCACC TCCACCTGCTr TTTTATTTTC CAGCATTTG'r
GCTCCTGGAG
TTGG'rGAAGC
AGGCTAGAAG
GTCAAATAAT
ATGTCCTGCG
CGATACCAAA GACCTCAATC CAGCCAAGGC ATGCTTGGAT CAATGGAACA CATATTGATG ACCGAGTCAA CTCTACTGGA CCGTTTGTTC CAACAGTGGT TCAATCCAGC 1800 AAATCCGCAG 1860 CCAGCTTCAG 1920 GAAGTATAGG 1980 ATGATTCCCT 2040 ATAATGTAGT 2100 TTGTAATCAT 2160 AAAATAGGTT 2220 AGTCCCTTTT 2280
TGATTTCAAA
CCAAAACTCC
CCAAGTCCAA
AATCTCT'rGA
AGCAGTATTA
GGTCAAATCT
CACAAAACAT CCACC'rCAGG GCACCAGTCA CTCTGTAAAA AGCGAAAATC ACCCTCTAAG CACCTTGGTC AACTCCATAA ACTTGATAGC A'rCCGATCCC TCAACTCACT CCTGTAATGA TCCGTTGCCA AAACATCACA AACTGTCGGG
CCTTCTCTAA
GTACACGT'TT
CTCCACATGG
AA.AGAGGCGA GC'rTGAGCCA AGTCATGCAC TTCTACCCAA AAAAACCTTC TCCTTCGCCA 2340 2400 2460 2520 2580 2640 GAAACGTTrGA TTAGGAAATA AGGTGTCATr TCAAGTGCAA GCCCATN'TG CTCGATGGTA TCAAAGAGN' GGACATAGTT TTCCGCACC'r CCCCAACCAG TTCGTACATA TTTTCTCTTA GCCTTTAACC CAGGCAGGAT CTCTCAAAT GTCATGTT'rT 'rCTCCTTTAA TTCTACATTC 11 .u JfAu -h 723 TTCATl'TAAT TATAGCAAAA AACCGCTTTA TACGGCTTTT TGAATGTGAG TirAT'CAAAc CTGCTACTAC TTACGGCAiA r-rATTCCCTC3 CAGCAAGATA AATTTCATAC CATTCTTTTC TTGrrAAGCT AAAGTTTGCC GCTCGGCTAA CI'TCTCTCA.A GTGCrrAGGA TTTGTTGTAC CTACGACTGC CTGCATrI-r? GCTGGATAAC GCAATATCCA AGAAATGGCA ATAGTTGAAG 2700 2760 2820 2880 AGGTTACTCC ATATTTAATA GCTAAACGAT CAACTACTTG ATTTAAAGCT CATTTCCAAC AAAATTCCCT CCACATCGTG TAATTGCCAA CTTCCATATT AACATGAAAA GCTCATTAAC AGCTAACGGC TTTGATTAGA AACTCCAAAA CTGCTACTTG GTCAGATTCC GATCAATCTT CAATCTTTGC AATCAAAATA GGTAAATTCT CTCTTAAATC TGGACGAT'TT
TTAAAATACC
TATTCAAAAA
GCTGATTCAA
TGCTTGACAT
TCTCGAACTT
ATCAAAGCAT
AAAATACCGT
TCAATGCGAA
CGAATTGTAA GACACACCAT TGCTGCCATC TCGCATAGCT ATCCTGGAGT AAA.AGCCGCA
TGAAATTTCT
CTGAATGA
GCTTGACTAT
CTCAAT'rGTA
CTTTTTTAAG
TACCTGTTT
CTGGTCGATG
CTACTGATTT
TGCCACATTT
TT'rAGGACAA GACCTAACAG
CATAAATATC
CTTCAACTTC
ATTCTTTGTC
AACATTCT'rC AGCCAAGTCG AAGGCATTGA TTCCAACAGA TrACAGAT TTATCTTTTA TTCTCATCAT ATCTTGACCA AGAGTTATGT ATCTCATCAA CCT'TCAT'rAT AACAAAAAC CGCTT-rGCAA CAACTCCATC ATCATAGGAT ATAAAGGACA TTAAAGGCrI' AAGGAGCAAG CTATCTAGAT TATAATATAG TCCTTAGAALA GGACTGAATC CACATCTT TCTrCACAA CGACCACGAC AAGTGCTGTT TCTACAAGCT TCCGAGAACA ATTTCTGATA ATTTTTCTCC TTTAATTTCT CGACTTrrTG ACTATACTTC ATTCCAATAA AGAGGACAGC ATCCACTCAT AATAACGCAG AATGGACGAA AGACCGCTTC ACAGACAAAC TATCCCAGGG ATTCCTACAT ATCCTAAAAC ATTTrTCATTT TAGGAGAATC AATCCAATCA GAAAAATAAG 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4026 4080 4140 4200 4260 4320 ACTCCATTTT ATCTTCTTAA ACCCACGGAA CAAGACAAAG a a a a a. a a a TAAAGGAATA ACTTTTGTAA GGAAAACATT CAGAGAACCC ACCACAAGAT GGGCAATAAT TTTCCAATTC CAAATACCGA AGCCGGAATA TAAAAGGCTC AACTAGAAGA ACTATAGTCC TTGGACTAAA CTTCTTCGTA AAGTTGCCCT AAAAATGTGA TAGTAGTTGA TAATAACGTA GATAGAACTG CCAGGTGCTC
TAACTAGCGA
CTTCTTGTAT
CAACAACAAT
AA.ATTGTGGC
GCAAATTGAC
ATACCGCCAG
TGAAATTCCC
CATACTOACA
AATCGTAA.AG
GAAGCTTGCC
GTAAGTGCCA
CACAAGTCCA
TGTTAAGAGA GGACCTTTAG AAAAATCACT GACAAGAATT GGCGTCAAAA GGGACTCTTT CCTTGAGAAT CTCTTTCATT ATT".TTTTAG GATTCTTACC TAGATAATCC TCTGCACTCA TGCCATCTCG TTCTGCTTCT GAGAAATCTA GCATCATCAA ATAGATCTGC TCTCTGAGAT CCACAACTCC TCAAAA'rACT r?CN7rGTAA TACTTCATT'r 'IrGACrCAAA TCGTCCCATT 724 AGTCTTCATC ATAGAQAAAT CCAGCAAGAT TAAAACTTTC TTTGATTCTC CTCAGAAAAC TCATGTAGCA AAGCGCTTGT TCTTCATGGT TTAACCCCCA TTCTTAATCC CTTCTACTr GTTGCCAAAA GACTGAGACA CGCTCTTCTC CTTCTTCAT 4440 4500 4560 4620 TAATGAAAAA TACTTCCGAT CTGGACCATC TGGCGACGGG CGCATGTCGC TTGATTTTr TCTAACTTTT GCAACAAAGG ATAAATAGTT CCTGGAACGA TCCAG.CCTCT CGCMAAGTCT CAACCAACTC ATAACCATAC CGCTCTTr ATCCAAGACA CAACCTTCAA GAACACC~r TAA'rAGCTGA GTrTTTrCA CTTCTAATCT ATTrrGTAAT ACCTACTAGT GACTTCACCT ATAGTATATC 'rAGTTTGTAA AGCATAA'rAG TTAATACTCT TCGAAAATCT CTTCAAACCA GCCCrACCGT ATGTATGGTT ACTGACTTCG TCAGTTTCAT CTACAACCTC TTTGAGCTGA CTTCGTCAGT TTCATCTACA ACCTCAAAAC AGTGTTTTGA CTCTTATCCA 4680 TAGTATCAAA 4740 GACCAATCAT 4800 TCACTTCTCC 4860.
ACTTCTACAC 4920 CGTCAGCGTC 4980 AAAAACATGT 5040 GCrGACTTCG 5100 TTCATCTACA 5160 TCAGTTTCAT CTACAACC'rC AAAACAGTGT ACCTCAAAAA CATGTTTTGA GCTGACr'rCG TT'rGAGCAAC CTGCGGCTAG C'TTCCTAGTT AAAAACAGAA CTAGCCTGAA CTAGTCC'rGT TACAGCTGGA TCAACTGTGA GAAGGG -rAA ACCCTGGC'rG ACATATTTIT TCATCAT'N'T TTTCTTGCCG ACCAATT~CTT CTTCATTC TCAGTTTCGT CTACAACCTC AAAACAGTGT TGCTCTTTGA TTTTCATTGA GTATAAATAA CTACTTAC CCAATCACAC TTCCAT'rTGG TTT'GCCATCA 'rGT-rCAGCTG AGAGAATCAT ACGTGCT'rTG AGGTTAGCAA CGATTTGAAC TTTGAGCTGA CTrTCGTCAGT ATAGTATTTT GCAATTCCTG
ACGATCTTCT
TTTAGACACT.
CCATCACCAG CATCCAA.CG TCT'rTGACT-r CTGCGACACG
GAATTGAAGC
GATTTrCAACC
TTCGTCCGGA
ATAGGCGATT
AACTTATCTG
TTGTCAAAGT
TTCCATT'CTT
TCTTCT1'CCA GATTTCATCC TTGTTTAGTT TGAGCTCAAC TGT'TATTG CCTTCCATTT GTTCCTTGAT
AAAGAATCTG
AACCTTCTAC
CTTCAAACITr
TTTCGACTGC
TATTTAGACG
CAGCCAAACT
CACGACTAGT
CCAAGTGGCT
ATGGTGCGGT
5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 TGGAAAGATA GGTGTTCCTr TGGCAACTAC AGTCACATCT GCTGGGAAGT CAAGT -rTCA AGACTAGAAA CTTCTTCCAA ACCAAGTTGA GTCAAAACTC 'rrCCATCATA AATGGTTCAA TCAAGTGAGC AACTACACGA ATGCTGGCTG CATGACACTT GCCAATTGGT CACGAAGAGC TTCATCCTTG GCCAAGACCC CTCATCGATG TATTTATTGG TACGAGAGAT CACAGTCCAG ACTGCTTCAA GCCCACGTGG ATAGTCAACT GCTTCCATGT GTGTATGGAA GTCTGCGAr GATTG~wCTG CAACCTCAGC AAGAACATGA 'rCATATTCAG TCACACCTTC TACATAGGCA GGGATTTGTC CATCAAAGTA 725 CTTATTAATC ATGGAAACCC TACGGTAAG GT'rGAACGG CCC.ACATAGT CTTCAGGAGT ACGCATGAGG TAGTAACCAA GTGGATCTAG AACGACATTC CCI'TrGACT TAGACATTTT A6ATCAAACGA TCAGGTAATT TAACATCCAA GTGGAAGCGA AGGATATCTT rCCTACCAT AAAGTTACCA TGTTCGCTT~ GAGCGTACCC AATCCAAACC TAGACAACcrr GTTTTGGCAPV GAGGTTCCCA AGGTcATTAG CCAATTCATA AAAGGTTCCG TCTGAACCAA CTGGAAGGTT TCCATAACGC TCTACCAACA TTIrCAGGGTA TCCGTCTTTC ATGACAAACC AACCATGGGC CATCATAACIA AGGATTGGCC AGTAGATAGA ATGGAAGACT GTTCCATrCC AGAACTTGTC AAGAGCrC'rC GCATAGTTrAA GAAGGGCATC TGATGGGACA GGCACTCCCC ATGTAAAGGT TGGCTCGATG AAGTTGCGTA GCATTTCATT ATGAGCCTC AAAAATTCGA CCAAACGGTC TTCT'rCAGAA AcCCA'rrCAA CCTCATGACC AGCTTCATCA CGGAAAACTT CTCCCAGCTG TGTACGAGAT ACCCCCAAAT CTTCCAAGCCC AAGGCGACCA TCTGGCGTGA TAAATTCAGG TTGGTATTTG CTAAGGCGAA GGAAGTATGA TGATGCAGCA ATACCACCAG TCArZATTTCC GCTTTCTGTA AAG ATCT-r CGTCTGATAC TGAATACCAA CCAGAGTATT CACCCAAGTA GATATCATCT TGAGCAACTA AGCGTTCAAA GACTTGTGCG ACAACTTT CATGGTAGTC ATCAGTTGTA CGGATAAAT'r TATCGTATGA GATATCTAGT AATTGCCAGA GTrCTTTAAC TCCAACCCCC ATTCCATCAA CATAGGCTTG AGGTGTAATA CCAGCTTCTT CCGCTTTCTG CTGGATTTTC TGACCATGTT CATCAAGACC TCTCAGATAA AATACATCGT AGCCCATCAG GCGTTTGTAA CGTGCTAGGA CATCACATCC GATAGT'rGTG TAGGCAGA.AC CGATATGAAG TTTCCCAGCAT GGATAGTAAA TCGGCGTTT AATATAAAAA TTTTTTTCAG ACATAATTTT TCCT'rTCCAG GCAA.ATGAAA CCTGTTTTrC TAACACTTCA TTATATCACA Tw[rTAATGA ATTTCAATAG GGAAATCCAT ACAAAAACAA GATAGACGAG TGTCCATCTT GTTCATCTCA TTCATAACGA AGGGC'rTCAA TTGGATCAAG TTTCGATGCC 'rTGTTGGCTG GCA.AGACTCC AAA.ATCATA CCAACAC'rAG 
CCGAAACTGC AAGACTAAAT AGGGCGACTG GGATTGATAC TCCAACTTCT ATACCTTCTA TTAAACCTTG CAGTAACAAA CCTGCTAAGg CAGTTAAACC ALCTTGCAATT GTCAACCCAA TTAAGCCACC TAACAAGGTC AAAATCATGG ATTCAATCAA AAACTGAATT AAAATATTGG CACGTGTTGC ACCCAAAGCC TTACGAAGAC CAATCTCACG AGTGCGCTCT GTCACCGAAA CCAGCATGAT GTTCATGACA CCAGTTCCTC CAACAAAGAG AGAAATCCCT GCGATGGAAC TAATAATCGT CGTCATAAAA CTA.AACGATT GTTGAATTTC TGCAAATACA ACGGACTCAT CTGCCACCTG GTAT'rCTCCC TGTTGTAAGC CTGCAAGCTC 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440' 7500 7560 7620 7680 7740 7800 7860 7920 726 TGTCATTTT CGTGCCAGTT CTGGACCCAG AGTTGGGGTT AAACTGGrAT CArrCACTCG AAAGACAATA TT-AGCTA~rr CATCTACATT AAAATTCGCA GCAAGGGAGA TATTGGTAGT AATACGCAAG CCACCAAACC CA'PATA'TT TCATCTTTTTA GCCTCCGGAC TAGrATAAAC CCCAATG.ACC CGGTAACTAA ATCCATTGAC TCTACAACC TTGTrAATAG CCrTTGAGG AGATTCAAAT AAACTAATGG ACAATTCCTC ATCTAGCAAA ATGACACTTG CAAACTCTr'r GAAATCTTGC TCTCTCAGAC AGTTCTGTTT CCACCTGTCA CGCATTCGTT GAATTGGTTA GACCCAGGAT TCTTGCGGT'r CGTAAAAGCT GATTGTrTCT GACGCTAATA TTTTTCTGAG ACCCAAAGCC ATAATCACAA CAAAGAACGC ATCTTGTGAG TACGACCTGC AATAATTTCA TCTTAACAG CCTCCATGTA AATTAGCATr CTCAACCTTT CATAGTAACT ATCCACTCCC TTGGCGGTTC AACAGGAACT GAGTAAAAGA CCCGTCTTTA A'NTTAGTCAT ATCTI'ATTG CAACTGA'rGA AACACCGATA CCATGATAGA TGAAAAGGCA 9 9 .9 9* 9 9 9 .9 99 9 9 AGTI'TCCTC CTTTCCTAAC TGAGCACTGT CAGACGAAAT TCTGACGTTT GGCATAGCCA GCAATCTCAG GC'rCATGCGT C1-rCTTT-ATT CAAATCAACC AATAATTGCA TAATT'rGGTT CTCCTrGTCGG TTCATCCGCT AGGATAATAG AAGGATTGTT CTACACGTTG C'N'TrGACCA CCAGATAATT CTGAAGGTPA ATTCAACCTT GTCTAAATAT TCCTCAGCCA ACTTGCGACG.
CGTAAATCAA GGGCAATTCT ACA'r-TrrCA GAGCATTGAG GCTGAAAGAC AAAACCGATT TGTTGGTTAC GGACCT'rAGC CAGCCACTTC TTGACCTTCA AGATAATATT CTCCACGT 'rCGTATTCAT CAGAGTGGAC TTACCAGACC CAGATGGTCC CCTCA'rTCAC r'rCTAGATTG ATATTTTTGA GAACCTGCAr.
AACTTCTGAA GATATTTTTT AGACTAATTA GTTGCTTCAT 'rCTrCCAAGG AAGATGTTGG ATTACTGATG ACCTTAGCAC ATTTCTTGAT ?I'TCTGCGTC AGCATTTCCC AATCAAACCT TGTTCATCCA CAATCCAGAC ATAATTTTTA CTATCATCCA ACAAGAATAG CCTTAGT'rTT GCTTTTAACC TCAATGTTGA TTATCTrGAT AGGTCAAGA'r TTCAGTTTIAG CTGCCTCTTG TCCTCTTCCT TTCCAGAAAC CTTTTTTTAG GAGAGAAAAA ACTTGACGAG ATAGGGAATC ATAATCCCAA TCATAGTAAG AATTCAGAT TCTGCATCTT GACCCCATCC CGAATGACAA TACCATGATA ATGGTTT'rTC ACCTGTTTTG GTATCCAAGG TACCAAGGCA CGCGCAATGG ATGGTGACTA CGTTCTGTCA TTTTGAAGAC GAAACTCCTG CTTCGATAGA AGAAAGAACT TAGTTGTTr TCACCAAGCC TCGTATCC AACATGCCAA CATCATGCCT ACAAATTCAC TTCTTGGTCA CCATTACGGT CAGCCTTCAC CTCTTTTCCT CGT'rCGTTAA ACCAGAAGTG CAACTTTTTT AGCCTTrGT TTACTAGACT GCTAACAGGA CAGA-AAAACC TTGTTTCAAA 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9 9. *9 9* 9 I 9 TCACCAACCT CGCCTGTCAC ATCAATAGTA TAAGGGTATT TAGAACCTGT ATTATrCCCG GCTGCTGGAC TAGCTGCI'TC ACCATTGrr 'rTAGGATAGT CAGAAATATA GCT'TAA TTC CCAGTCCATT rTATCAGG ATACACTTTA GAAGTAAAGC TTACI'CTTG ACCTACA.GAA AGTGGCrA GATGTACrC AGACAA'TCT CCC1'TGAC?1' GTAAATTTTC ATTGC'rGACA ATATGAACCA TAACTTGACT CGCCCCTGTT GGAGATTTAG AAACATrGCT ATTGACTrcG ACCACAGTTC CCTCTAGGGT ACTGAGAAC.A GTT'rTGCAT CCAATTGACT TTGAGCCTTG CTTAA?1TGCG CCGCAGCATC TGCACGCGCA TC.ACGGGCAT CACCCAATTG AGCGTCAATA GAAGCAACAG AATTrCCAGC CACTGGAGTT GGGCTTTrGCA CCGTGCATC ?rTCCTCCT ACTGGCGCTG GTAACTGTGG AGCCGGAGCT GAAGCGGCT-r CATTTCGTGC TTGATTGAGT TCATTGATAT GACGATCTGC GCTTCTGAAC TACTGTACTT ACAAGGATTT CATCTAAATC GCTGTTACTG TCCCTGACAA AGATGAGTAG GCTCATCTTT CCAGCACCCA ATACAACTAC GCTTTACCAT TCTTTCTT TATACCAAAT TTCCCTCCAG
TGCTTTTCGG
CCTAGCTACT GCTCGACTAG CTGAATCATA GGCCGCCTGC GACTAAAGCC TGCCCT'rCGC TGACCTTATC GCCCACAGAA ACCCTrACTA GCATCAAAAT AXACATATrG TTCAT-TTT 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10690 TAAXACAGAG GAGGCCACC TAGAGCAGTC TGAGAAGGTT ACTCGCAGCA CCGATTGCTG CATA6ATGAAA CTCCTT'rCT CAAACAATAC AG'rTCAGGAT 'r-CCT-rCCTT GGCAACAACA GTCTAAAGAG TAAAATCCCC CATACAGTTG CCACTTTTTA TTTTrACA.AT ACTTTGCTAT TAAACA.ATCG TTCGGAATTT INFORMATION FOR SEQ ID NO: 94: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 8195 base pairs TYPE: nucleic acid STPANDEONESS: double TOPOLOGY: linear (xi)'SEQUENCE DESCRIPTION: SEQ ID NO: 94: GAGAAAGCGC CCACGTTTCC CCGAAGGGAG AAAGGCGGAC AGGTATCCGG TAAGCGGCCA GGGTCGGAAC AGGAGAGCGC AACGAGGGAG CTTCCCAGGG GGAAACGCCT GGTATCWTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA TTTNTGTGAT GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA CGCGGCCTTT TTACGGTTCC TGGCCTTTTG CTGGCC'TTTT GCTCACATGT TCTFCCTGC GTTATCCCCT GATTCTGTGG ATAACCGTAP TACCCCCTTT GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC GCAGCGAGTC AGTGAGCGAG GAALGCGCAAG GATTCATTAA TGCAGCTGC CGCAATTAAT GTG.AGTTAC GGCTCGTATG 'rTGTGTGGAA CaTGATTACG AATTCGAGCT AGAAGATGGT ATAATACTAA CAAAAGGAGA GTCAAACTAT 728 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGrGGCC ACGACAGCTT TCCCG.ACTGG AAAGCGGGCA GTGAGCGCAA TCACTCATTA GGCACCCCAG GCI'?rACACT TTATGCTTCC TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CGGrACCCGG AAAATCCAGA AAATGCTTGA AAAAAATCCT ATTGTAAGGG T'rATCACATA TAACTCAAAA AAAGAAAGAA GGCTTCTAAA GATTTCCACG TAGTGrGCAGA AACAGGTATT GTTGGTACAA ACTGCTAGCA AATT'rGCTTC AGATATCACT AGT'rAACCTr AAATCAATTA TGGGTGTTAT GAGTCTTGGT AACTATCTCA GCTGAAGGTG CAGATGCAGA TGACGCTA.TC GGAAAAAGAA GGATTGGCAT AAGGGAAATC ACAGAAATGC GACGGTGTTG CAGTTGCAAA AGCATATCTA CTCGTTCAGC ATTACAGTCG AAGATACAAA CGCAGAAGAA GCTCGCCTTG 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 CACCCACGTC CAGCAACATT CTTGAGTACA AAGGTAAATC GTTGGCCAAG GTGCTGACGT GCTGCAATCT CAGAAACAAT TTAAAGGAAT CGCAGCATCT CGGATTTGTC A'N'TGAGACT ATGCCGCTCT ACAGGCATCA CAAGACGAGC 
TTTCTGTTAT CGCTCGGTGA AGAAGCAGCT CAAGTTTTTG ATGCTCACTT AAATGATCAG CCAAATCAAG GAAACTATCC GTGCGAAGAA TGAAAGAAGT TACAGATATG TTTATCACTA TCTTTGAAGG TCGCGAGAA.A GCAGTAGGTA AATGGTTrCTI'
AGTGAATGC-A
CATGGAAGAC
TGCAAGAACG
GTAAAAAATT
TGACTCCTTC
CGCAGcGGAT wTCCGCGACG TGACAAA.ACG TGTATTGGCA GCCAAACCCA GC~rCTATCA ATGAAGAAGT GATTGTGATT AGATACAGCT CAATTCGACA AAAACTTTGT AAAAGCTTTT GCTGACCCAG 1260 GAAGCAGGTC 1320 AACCCATACA 1380 AACCTTCTTG 1440 GCGCATGACT 1500 GTAACCAACA 1560 GCTGCTGTAT 1620 GTTAACGGGA 1680 TTGGTGGACG TACAAGCCAC TAGGTACAAA TAACATCACT TCACTGGAGA AGTGATTATC GTGAAGCCTA TGCGAAACAA CTGCTGACGG TAAACACTTC GTtTTAACAA CAACGGTGCA ATTCTCAAGA CTTCCCAACT GAATGAACGG TAAACCTGTT CTTACTTCGA TATGCCTCAC CTATCTCTGA GACTGGAGAT TCAGCTATCA TGGCACGTAC ACTTGAAATT GAAATCGT'rA AAGACGGTGA CATCCTTGCT AACCCAACAG ATGAACAAC
AAAGCTGAAT
GAGTTGGCTrG
GAAGCTGTTG
GAAGATGAGC
GTCGTTCGTA
GAAATGAACC
GCTATG'ITCC
GGGCACTTTT
CTAATATCGG
GACTTTACCG
AGTATGAAGC
CAATGGATAT
CATTCCTTGG
GGCAGAATTT AAAGCAGCTG GAAAGATGCT CAAACAGTGA TACTCCAAAA GACGTTGAAG TACAGAGTTC TTGTACATGG ATACA.AGGCT GTTCTTGAAG CGGTGGAGAT AAGGAACTTC ATTCCGTGCT CTTCGTATCT 1740 1800 1860 1920 1980 2040 2100 2160 GCACACAAAT CCGTGCTCTT CTTCGTGCGT 729 CTGTCACrGG TCAATT-GCGT ATCATGTTCC CAATGGTTGC GCTCrGAAA GAATTCCGTG 2220 CAr.CGAAAGC AGTC1-rTGAT GAAGAAAAAG CAAACCTTCT TGCTGAAGC? GTTGCAGTTG 2280 CGGATAACAT CCAAGTTCGT ATCATGATCG AGATTCCTGC AGCGCTATG CTTGCAGAcc 2340 AA-rTGCTAA, AGAAGTTGAC TTCrrCTCAA TTC.GACAAA CaACTrGATC CAATATACAA 2400 TGGCAGCAGA CCGTATGAAC GAACAAGTTT CATACCTTTA CCAACCATAC AACCCATCAA 2460 TCCTACGCI-r GA'rTAACAAT GTGATCAAAG CAGCTCACGC TGAAGGTAAA TGCGCTGGTA 2520 TGTGTGGTGA GATGGCTGGT GACCA-ACAAG CTGTTCCACT TCTTGCGGA ATGGGCTGG 2580 A'rGAGTTCTC TATGTCAGCA ACATCTG'rAC TTCGTACACG CACCTGATG AAGAAACTCC 2640 ACACAGCTAA GATGGAAGAG TACGCAAACC GTGCCCTTAC AGAATGCTCA ACAATCAAG 2700 AAGTTCTTGA ACTTCAAAAA GAATACGTTA AT TTGATTA ATCCAAAACT CCCTGCAACT 2760 CAGTTACAGG GAT'TTT'-rGC ATATrTTTAAA AAGAAT-C AACXAA.ATCT 'TcT =ATAGA 2820 .AAGTCCAACC T'TGAAAAAGT AGTGGTCAGA ACAAAAAATA CTTAAATGGT 'rCATAAAATT 2880 .CTTGACAAGT TGGATATTTA GGAGTAAAC'r ATTAACCAGT TAACTAATAG AGAGGAGTTT 2940 CTGCAATT'rA GAAATGALATT GCAACTAGAA ATATCAAATA GAAACAGAGT TTCGATGAAA 3000 ***ATTAATAAGA AATACCTTrGT TGGTTCTGCG GCACTTTGAT TTTAAGTGTT TGTTCTTACG 3060 ***AGTTGGGACT GTATCAAGCT AGAACGGTTA AGCAAAATAA TCGriGTTTCC TATATAGATG 3120 GAAAACA.AGC GACGCAAAAA ACGGAGAATTr TGACTCCTGA TGAGGTTAGC AAGCGTGAAG 3180 GAATCAATGC TGAGCAAATC GTCATCAAGA TAACAGACCA AGGCTATGTC ACTTCACATG 3240 *GCGACCACTA TCATTATTAC AATGGTAAGG TTCCTTATGA CGCTATCATC AGTGAAGAAT 3300 TAC'rCATGAA AGA'rCCAAAC TATAAGCTAA XAGATGAGGA TATT-CTTAAT GAGGTCAAGG 3360 GTGGATATGT TATCAAGGTA GATGGAAAAT ACTATCT -rA C=TAAGGAT GCTGCCCACG 3420.
*aCGGATAACGT CCG'rACAAA.A GAGGAAATCA A'rCCACAAAA ACAAGAGCAT AGTCAACATC 3480 *oo GTGAAGC7GG AACTCCAAGA AACGATGGTG CTGTTGCCTr GGCACGTTCG CAAGGACGCT 3540 ATACTACAGA TGATGGTTAT ATCT1TAATG CTTCTGATAT CATAGACGAT ACTGGTGATG 3600 *CT'1ATATCGT TCCTCATGGA GATCATTACC ATTACATTCC TAAGAATGAG TTATCAGCTA 3660 CGAGTTGGC TGCTGCAGAA GCCTTCCTAT CTGGTCGAGG AAATCTGTCA AATTCAAGAA 3720 CCTA'rCGCCG ACAAAATAGC GATAACACTT CAAGAACAAA CTGGGTACCT TCTGTAAGCA 3780 ATCCAGGAAC TACAAATACT AACACAAGCA ACAACAGCAA CACTAACAGT CAAGCAAGTC 3840 AAAGTAATGA CATTGATAGT CTCTTGAAAC AGCTCTACAA ACTGCCTTTG AGTCAACGAC 3900 730 ATGTAGAATC TGATGvGCC -r GTCTTTGArC CAGCACAAAT CACAAGTCGA ACAGCTAGAG GTGI'GCAGT GCCACACGGA GATCATTACC ACI-rCATCCC TTACTCTCAA ATGTCTGAAM TGGAAGAACG AATCGCTCGT ATTA'TCCCC TTCGTTATCG TTCAAACCAT TGGGTACCAG ATTCAAGGCC AGAACAACCA AGTCCACAAC CGACTCCGGA ACCTAGTCCA GGCCCGCAAC CTGCACCAAA 'rCTTAAAATA GACTCAAATr CTTCTIrcGT TAGTCAGCTG GTACGAAAAG 'rTGGGGAAGG ATATGTATTC GAAGAAAAGG GCATCTCTCG 'ITATGTCTrT GCGAAAGATT TACCATCTGA AACTG~rrAAA APLTCTTGAAA GCAAGTTA'rC AAAACAAGAG AGTGTTrCAC 0 0e *4 p 0* 0 0 00 *t 0 ACACTTTAAC TGCTAAAAA.A GAAAATGTTG CATATAATCT GTTAACTGAG GCTCATAAAG ATTTCCAAGC CTTAGACAAA TTATTAGAAC AArTGGTAGA TGATTTArrG GCATTCCTAG AACCAAATTC TCAAA'rCAG TATACTGAAG AGTATACAAC GTCAGA'rGGT TACATTXTTTG ATGCATATGT AACGCCTCAT ATGGGCCATA ATAAGGAAAA AGTTGCAGCT CAAGCCTATA CAGACGCAGA TGTTAAAGCA AATCCAACTG TGAAAGGGGA AAAACGAATr CCACTCGTTC AGGTTAAAAA CGGTAAT'rTG ATTATTCCTC CTTGGTTTGA TGATCACACA TACAAAGCTC CGACGATTAA GTACTACGTA GAACACCCTG CTCCTCGTGA CCAAGAATr'r TATGATAAAG CCTTGTTTGA AAATAAGGGT CGTAAT'rCTG GCTGAA'rCA TGAATCGACT AATAAAGAAA CACCAATTAC CCATCCAGAG CGACTTGGCA ACGAACTTCG TATTGCTCAA T'rAGCTGA'rA ATGAACATGA TATAATCAGT GTCACTGGAT TGGAAAAGAT CTAAAGAAAA AGG'ATCCTA GAGATAGTGC AGCAGCTATr GACTTCCATA TATGGTTGAG ATAAGGATCA TTACCATAAT CAAATGGCTA TACCTTGGAA ACGAACGTCC ACATTCTAAT
GATGAAGGAG
AGCCTTTCTG
CCTCCATCTC
TACAA'rCGTG
CATACAGTTG
ATTAAATTTG
GATTTGTTTG
GATGGATGGG
3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 @0 04 c~ 0000 0 0000 *01,0 0 0* *0 0 0 0 GCAATGCCAG TGAGCATGTG TTAGGCAAGA AAGACCACAG TGAAGATCCA AATAAGAACT TCAAAGCGGA TGAAGAGCCA GTAGAGGAAA CACCTGCTGA GCCACAAGTC CCTCAAGTAG AGACTGAAAA AGTAGAAGCC CAACTCAAAG AAGCAGAAGT TTTGCTTGCG AAAGTAACGG ATTCTAGTCr GAAAGCCAAT GCAACAGAAA CTCTAGCTCG TTTACGAAAT AATTTGACTC TTCAAATTAT GGATAACAAT AGTATCATGG CAGAAGCAGA AAAATTACTT GCGTTGTTAA -;AAGGAAGTAA TCCTTCATCT GTAAGTAAGG AAAAAATAAA CTAArGAAAA ATGAAAGTCT CGATAA-AGAG GCTTTCATTr 'rTAI-rATGTA TATATG'rAAA AT'rCTTGACA AGCAATATTA AAAAGAGTAA ACTA'rrAACT AGTTAATTAA CCGGTTTATT ACTTTATAGT GAATCAAATA TACTTAAGAA AAGAGGAAAG AATGAAAATT AATAAAAAAT ATCTAGCAGG TTCAGTGGCA GTCCTTGCCC TAAGTGT'rTG TTCCTATGAA CT'rGGTCGTC ACCA-AGCTGG TCAGGTTAAG 5700 AAAGAG'TCrA ATCGAT'rkC TTAATAGAT GGrGATCAGG CrGG 'CAAAA GGCAGAAAAC TrGACACCAG ATGAAGTCAG TAAGAGGGAG GGGATCALACG CCGAACAAAT CCTCATCAAG ATTACCGATC AAGGTATGT GACCTCTCAT GGAGACCAT ATCATTACTA TAATGGCAAG GTCCCTTATG ATGCCATCAT CAGrGAAGG CTCCTCATGA AACATCCGAA TTATCAGTTG AAGGATTCAG ACATTGTCAA TGAAATCAAG GGTGGTTATG T'rATCAAGGT AGATGGAAAA TACTATGTTT ACCTTAAGGA TGCAGCTCAT GCGGATAATA TTCGGACAAA AGAAGAGATT AAACGTCAGA AGCAGGAACA CAGTCATAAT CACGGGGTG GTTCTAACGA TCAAGCAGTA GTTGCAGCCA GAGCCCAAGG ACGCTATACA ACGGATGATG GTTATATCTT CAATGCATCT CATATCATTG AGGACACGGG TGATGCTTAT ATCGTrCCTC ATTCCTAAGA ATGAGTTATC AGCTAGCGAG TTAGCTGCTG AA~cAGGGAT CTCGTCCTTC TTCAAGTTCT AGTTATAATG TTGTCAGAGA ACCACAATCT GACTGTCACT CCA.ACTTATC ACGGCGACCA TTACCATTAC CAGAAGCCTA TTGGAATGGG CAAATCCAGC TCA.ACCAAGA ATCAAA.ATCA AGGGGAAAAC ATT'TCAAGCC ?TrACGTGA ATTG1'ATGCT
GATGGCCTTA
CCTCATGGTA
ATTG;CTCGTA
GAACAACCA
CCTCAACCAG
GTAGGCGA'rG
CTTTCAGCAG
CATAAGCTAG
TTTT-CCACCC AGCGCAAATC
ACCATTACCA
TTATTCCCCTI
CTCCACAATC
CTCCAAGCAA
GTTATGTC?1'
AAACAGCAGC
GAGCTAAGAA
CTI-rATCCCT 'rCGTTrATCGT
GACTCCGGAA
TCCAATTGAT
TGAGGAGAAT
AGGCATTGAT
AACTGACCTC
AATTCACCAA
CCTGTTGGAA
TGCCTTCTTA
CTACACTGAT
TTATATCTTT
AAACCCTAT CAGAACGCCA TGTGGAATCT ACAAGTCGAA CCCCCAGAGG TGTAGCTGTC TATGAACAAA TG'TCTGAATT GGAAAAACGA TCAAACCATT GGGTACCACA TTCAAGACCA CCTAGTCCAA GTCCGCAACC TGCACCAAAT GAGAAATTGG TCAAAGAAGC TGTTCGAAAA GGAGTTTCTC GTTATATCCC AGCCA.AGGAT AGCAAACTGG CCAAGCAGGA AAGTTTATCT CCATCAGTG ATCGAGAATT TTACAATAAG 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 GCTI'ATGACT TACTAGCAAG GATTTTGAGG CTTTGGATAA AAGTTACTGG ATGATATCT AAACCAAATG CGCAAATTAC AAGTACACAA CAGAAGACGG
GATTTACTTG
CGACTCAAGG
GCTCCGATTC
ATAATAAAGG
ATGTCyCA.AG
TCGACAAGTT
TGATAAAGTC
GTCATCCAGA ACGTTTAGGA GATCCCTATG TAACTCCACA TATGACCCAT GAAGCTGAGA GAGCGGCAGC CCAGGCTTAT ACAGACCATC AGGATITCAGG AAATACTGAG GATGAGATTC AAGTAGCCAA GTTGGCAGGC GATCCTCGTG ATATAACCAG TGATGAGGG AGCCACTGGA TTAAAAAAGA TAGTTTGTCT GCTAAAGAGA AAGGTTTGAC CCCTCCTTCG GCAAAAGGAG CAGAAGCTAT CTACAACCGC 732 GTGAAAGCAG CTAAGAAGGT GCCACTrGAT CGTATGCCTT ACAATCT'rCA ATATACTGTA GAAGTCAAAA ACGGTAGTr-r AATCATACCT CATTATGACC ATTACCATAA CATCAAATTr GAGTGGTTTG ACGAAGGCCT TTATGAGGCA CCTA.AGGGGT ATACTCTTGA GGATCT1TrG GCGACTGTCA AGTACTATGT CGAACATCCA AACGAACGTC CGCATTCAGA TAATGGTTTT GGTAACGCTA GCG.ACCATGT TCGTAAAAAT AAGGTAGACC AAGACAGTAA ACCTGATGA.A GATAAGGAAC ATGA'rGAAGT AAGTGAGCCA ACTCACCCTG AATCTGATGA AAAAGAGAAT CACGCTGGTT TAAATCCTTC AGCAGATAAT CTTATAAAC CAAGCACTGA TACGGAAGAG ACAGAGGAAG AAGCTGAAGA TACCACAGAT GAGGCTGAA.A TrCCTCAAGT AGAGAATTCT GTTATTAACG CTAAGATAGC AGATGCGGAG GCCTTGCTAG AAAAAGTAAC AGATCCTAGT ATTAGACAAA ATGCTATGGA GACATTGACT GGTCTAAAAA GTAGTCrTCT TCTCGGAACG AAAGATAATA ACACTATTTC AGCAGAAGTA GATAGTCTCT TGGCTTrGTT AAAAGAAAGT CAACCGGCTC CTATACAGTA GTAAAATGAA TG.GAGCATAT TTTATGGAGA AGTAACCTT'r CGTGTTACTT CTCTTTTTTA GAAAAACGTA ACAGA INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTICS: LENGTH: 2004 base pairs TYPE: nucleic acid S(C) STRANDEDNESS: double TOPOLOGY: linear
7500 7560 '7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8195 120 180 240 300 360 420 480 540 600 660 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: TTTACTAAAA GGAAAAAAGA ACTGA'ITTCT CAGTCCTTCA TTAATCTTAT ATAGGTATGG GTAAACAGGT TGTTGACCTT GGTGAATCTC GACTTCAACG CTTCTACGAT TTCTTGAGCG ATTTCATTGG CAAGTTCTrC GCTTCCGTCT AGAAGGTTAC GATTTCACTG TCTTCATCCA ACATATGTTT CAAGCTTTCA GGTGCATATC AGGGTTTGAC ACAAGAATTr TTCCATCCAC CATACCTAAA CATGGATTTC TAAGCCATCG ATCGTTGTAT CACGCACGGC TGTTGTGACG CGACATCGCT AAGAGCAGCT GTCATACGCT CTTGGTTTTC TTCA.ATGGAC CAAAGGCAAG AAGACTTGTC ATACCTTGAG GAAGAGTGCG AGCCTCTACC GTTGCTCCAA AACTTCTGCC GCAGATTGAG CTGCCATGAA GATGTTCTTG AGAAGAkTGAT GTTACGGGCA TTAACCTGTT CAACAGCCTT GATAAAGTCT GGTTCATGGT TTGACCGCCT TCGATAACAT AATCCACGCC TTGAGAACAG
TCCACACTAA
TCTTCGAATT
TCACCTACAT
GTCAATGT I'r
TTATCGTTTT
CTTCCGCTAA
TTGCTTGGAT
ACTACCGCTG
TTGTTTGGCA
TCTGTTGAAG
AAGATATCTG
733 C'TAGACCTTT ACCAGCCACC ACAGCAATCA TAACTTGAGT AGCTTCTTTC TCAACCTGTG TTACCTTGAC CAAGCTACCA TATTGAGAC TATGAAC.ATG GACTTTGACA ATTTCATCAT CATCCAAGTA GTTACGGAAT TCATCGTAGT TAAGAGCTAC CATGATTTCA GTACAGTAAC CAGCTACAGA CTATGATGC TCTACATTGA CAAAGTCCTC AGATGCAATA TATTCGCCAG AGACCAATCC T'TGACCACCT GAGTCCACAA CTGGTGTTT-T AGCTAGACCT GT'r'1TAGCAC CGTCATCTGT TTGCTCAGCT TTrTTCTTAG AAATCG'N'CC TTCAACAGGT T7CATCACTG AGGCCAGAGC CAAGTCN'GA CCTGTTAACT CACGGAAAAG CTGAGACGTA ATCACTCCTG TGGCAAGAAT GCTCGCTACT TCTCCAACTG CACCATTTTC AATGGTCATT1 CCCATATTTG CGTTTAATGA ATTGACATAT TCAGCTTGCT TTTCTTGAAA TAAGCTAGTA GTAATTTrG.
TTGAATGTAG ACATr'TACAG TCTGAGCAGT AACACGCTCT TGAATGTTTT TTGACACTTC
ATATACATCA ACTGCAATAC TGCCATCTTC ATTTTCC'rTA CCTAGCAGGG CTTGGAAATT GACCACACCA GAAATCTCAG TTGC AAGCATACTC TTTT-CTrCA GCCGACTTGA CNCGTGGTTG GTTACGCATA TTGTCAACTT CI-rCTrGCAT AACAAGTCCT GGATCTTCTG C=?AACAAC AAGGAGAGAA TCrCCAAGCT CAAAATCT AGCATAGGTT GGACCTrGCT CAAACGTGAT GTCCTCAGTC GCTACGTGAC TCATCTCACr CZATGTTGGCA GGAGTCGCTA TAAGGGCTGA A.AGGAAACCT TCGTAGATGA CGCCAACT-rC TTTCAATACT GGAAGCATGT CTTCCA.AGGC rGCGCGCATG ACTrCAACAG CACCGATAGC AGCrCCACGA GAAACrGTTA CCTTATAGGC AACTrCCACA CCTGATTGGA CGTCTTATC CTTGATAGCT TGGGAAAATC AGTTCCCACG CGCACCCATC AAAAGCCCTT TACAAGCTGG CTTGTCTGCA ACTTCTTTAG TCCCAGTATC TCCATCTGGA ACTGGAAAGA TATTCAAGCG AGTTGATGCA GCCTGCACCA ACACGGTTAT TCTCCrACAA CTTTGATATT AATTCCA AGC TGGTTTT1CCA AGCTAAAGGC ACTAATCTTT GTTCCGTAGC TTAACACGGT GGCTGCCTTT ACGACGACAC CTTTAGAATA PCTTTGAGG GCATTTTTAC TAGCCATACC 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 162u 1680 1740 1800 1860 1920 1980 2004 INFORMATION FOR SEQ 10 NO: 96:.
Ci) SEQUENCE CHARACTERISTICS: LENGTH:. 11915 base pairs TYPE: nucleic acid STRANDEDNESS:- double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 734 CCCGG?1'GGG CTGTTCGCCC ATTAAAGCGG CACCACAGCT GGGTTCAGAA CGTCGTGAGA CAGTTCGGTC CCTATCCGTC GCGGGCGTAG GAAArrI'GAG AGGATCTGCT CCTAGTACGA GAGGACCAGA GTGGACTTAC CGCTGGTGTA CCAGTTGTCT 'rGCCAAAGGC ATCGCTGGGT AGCTATGTAG GGAAGGGATA AACGCTGAAA GCATCTAAGT GTGAA6ACCCA CCTCAAGATG AGATTTCCCA TGATTATATA TCAGTAAGAG CCCTGAGAGA TGATCAGGI'A GATAGG'TTAG AAGTGGAAGT GTGGCGACAC ATGTAG.CGGA CTAATACTAA TAGCTCGAGG ACTTA'rCCAA AGTAACTGAG AATATGAAAG CGAACGGTT-T TCTTAAATTG AATAGATAT'r CAATTT'rGAG TAGGTATTAC TCAGAGTTAA GTGACGATAG CCTAGGAGAT ACACCTGTAC CCATGCCGAA CACAGAAGTT AAGCCCTAGA ACGCCGGAAG AAGTCGCTTA CTCTAGGGA G'TTTAGCTCA GGTCAGCGGT TCGATCCCGT TAACTCCCAT CAGCCTTCCA AGCTGTTGTC GCGAGTCA TGTACCAAGT TTTTGAC2TG GGCGCGTAGC GTGAGGTCGG TGGTTCGAGT CCACTCGTGC GGAATATTAT CTGTTCACTA AGAGGACACG ATTACCCAAG TCCGGCTGAA GGGAACGGTC GTGGGTTCGA ATCCCACATC CTCCTTTTAT CGTCCGGCTC ATAACCCGAA GGTCGTAGGT AGCTCAG1TG GTAGAGCAAT GGATTGAAGC CGCCATTTAT ATATTT'rGGA AGGGTAGCGA CTCCTTCGGG TTCGGGGGTT CGAATCCCTC TAGAACTAAG GTCTCCAAAA CCTTCAGTGT AATTATGGCG GGTGTGGTGA AGTGGTTAAC TTCGATCCCC ATCACTCGCC TATTTTATAT CTTTGACTCC CTCATGCGTT GGTTCGAATC -G CGGAATTGG CAGACGCGCT GGACTCAAAA CCCCGGCCGC CGGTATAG'rA TAGTGTTAGG TTATTTTTGG TATAATTATA GTTATTCAAA TATGTCTTGT TCTATCGATT TATTAAAACA TGA6ATTGTTT GTCGGAATTG AGTTGGAGTA 'rAGTTGGGGG TTGCCCCCTG TGAGATACGGG GCTGGGAGAG CATCTGCC -r ACAAGCAGAG TTTAGCGGGT GTAGTITAGT GGTAAAACTA TTCTCGTCAC CCGCTTTGAA CTTTGTTCTT TCAGGTGGTT AGAGCGCACG CCTGATAAGC CCATAG'rGTT TAGTCCATTA CTAGGGGATT GGCTTGTTCC CGTA'rAAACT ATTTTGGAGG TTGAAAACCG TCAGGCGTGT AAAAGCGTGC ATTAACGCGG GATGGAGCAG CTCGGTAGCT TCAAATCCTG CTCCCGCAAT AAGGCTCGGT TCCATGTGTC GGCCGT'rCGA TTCCGTCTCG AGAGGCTAAA CGCGGCGGAC TGTAAATCCG CCCTTCCATT TTACGGGCAT AGTTTAAAGG GCGTTCAATT CCTACTGCCC GTGTTAATAG ACACCAGATT GTGGCTCTGG CATGCGTGGG TGGCGTATAG CCAAGCGGTA AGGCAAGGGA CAGCTACCCC AG7rACTATT- TGCCGGCGTG 
TCCAGTGTCC GCAAGGACGT GCCGGT'rCGA AACGTTGTTrA TTCTTCGTTC CTTTTTTATA TTTTATTTAG ATTAAGAAAG 'rGTAGGGGAG TCGGTATTTG AAAAATATTA AAGAAAATCC TCCTGT'rGCA AGTTTAGAAG GGGATGCTAC 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 735 AGATGTcGAA GTTATGAAGG ATCrTTTCA TTATTTAGTT TCTACrTTGG ATCTCACCGT 3ccAAAcGA GATGATTTTG GC;ATrCTGAT CCAGTTAGTA GATCCGATAA GTCAGGATC TATTTTATTT GAAGTCCT ATACAACG.AT TGAGTrGCA TTTGTrAAGG CTGAAACGAT TcAAGAGGTc GAAAATcGr TcAATAATT-A TATGAATGTA AT~cAAGAIA AGTTAncTGA ATCAAATCAT GCTATTGTTG GC1'GTGGTAT CCATCCCAAC TGGGA'rAAAA ATGAGAATTG TCCAGGcr TATCCAcGCT A~rCAGA'rGTT GATGGATrrAT TTGAA'rrrcA GTAGAAATAT TATTAAATCA GATTTACATC A'rTrCCCTGA ATATGG'rACT 'rrATCTGTGC GAGCCAGGT
TCAGCTGGAT
AGCGGCTAArG AAT!-rCAAGG
GGTCAATGCT
GATTT-TTACT
CTATTTCGCT
CCCCCAAGAG
AGGAACAGTrr ATrTCAAAAA
GCTTATTTAT
CCAACTACTT
TTGCAAACTC
ACCGGTGAT
TGAATTTTCG
AATGCTTTrA
GGTGCGGAPT
ATCTATCCAG
CTCAAATTGA
GGGATACGAA
AGAATGTTGG
*0 ~0 GATATTTTCT GGGAAGAATC TATGCATGCT AGACTCCTTA ATCATCAAAC TGATTr r CCGGAACGTG ATGGGCAGAC CTAT'TATTTT ACGTCCGAAA TCCAAGCATT TGCrCTGAAT AAGGATTTTG AAACTCATCG TAG'rTACCAG GAGTTTCGTA GTGTGTGTAC ACAGCCACTT GACTATC?AA ATCATTCTGC TATCCTATTC AGGCTGGGGA CGGGATCAGG TTATT-ATTTA TACCAAGATT TAACGACTCG GATAGGACTT TTGCTTCTGC 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180' 3240 3300 3360 AGCTTTrPCAC TT-GGGATDAT TGGTTAA'rrT AGACAAGTTA GALAGCTTACT TAGAAkACAGC ACCTTT1CTTT AAAGTATTTG GT'rATCATTA CAAGTCTTTA AGGAGACAAT TTTCTAAGAA AAATCTTACA GATGAGGAAG AAACrACGAT TATTGAATTT TCCAAAGACT TACTCCTACT AGCTGAGGAG GGACTAGTGG AGAAGAATTG AGCCTATAA'r AATGGACCAG ACTATAGATA TGATGCCAAC AATACAGTAA CGGTT=,rT ACACCGGATA TGCTTCTTAC CAGGAAGTTG TGAGGAGT'rG GACTACTGTA TGCACCATTA GTGAAATTAG TGAGAAATAA GGAAGAAATG ACCTATTTAC TTCTCTTATA AAGGGAGAAT TTTCTGAAA.A AAGGATAGAG AGTAATGACA CTGCCAGCCA AGCAATTTTG CTTATCCAAA GGTAGATTTG CTAAGCTAGT TTTGTCAGCA TCAACAATGC CTACGATAGC ATGGGCALATA CAA'rTrGGAA 'rAGTTTATC
CAAGGTTTGG
AACTTTGACA
AGCCTTTGAC
ATCATGATAT
AATCPLACGCC
CGACGGACG
AATTGAAAGA
TTTTTAGATG ACTTTACAGT AAATTTGATA CTCCAGCTA1' CTTTTCC.ATG GTTCAACGAT TCCCTTAAG GATATCGCCT TGTCTATT GCCATACTTT ATGACGACTG CTCCTAAGAA ACATGTTTG GACAACAAGA ?rCT'rATCTT GACACACA TCTGGTGACA CGGGGAAAGC TGCTATGGCG CGGTTTGCGA ATGTGCCTGG TACTGAGAT'r ATCGTCTTTT ATCCAAAGGA 736 TGGTGTCAGC AAGATTCAAG AGTI'ACAAAT GACCACTCAG ACTGGCGACA TATTGCTArr GATGGTAACT TTGACGATGC GCAAACAAAT GTGAA6GCACA CGTGGCTCTT CGTGAAAAAT TGACTACCAA CAAGTTGCAA ?N'TCATCAG GAACATTGGr CGTCTGGTGC CACAAArrGT TTATTATrr TATCrACG
ATACTCATGT
TGTTTAAcGA
CTAACTCTAT
CTCA6ATI'GGT TAAGACTGGT GAAATTGTAG CTGGTGAAAA GGrrAACTTC TGGAAATATC rrGGCTGCCT TTTATGCCAA ACAAA'XrGGT CTGTGCT-rCA AATGACAACA ATGTTTTGAC AGACTTCTr AAAACGTGAG TTTAAGGTAA CAACCAGCCC ATCTATGGAT GGAGCGCTTG A'TTTCCATC TT-TTGGGAAA TAATGCTGAA TGCCTTGAAC ACGCAAGGAC AA'rATAAGTT GACAGACT'TT CTTTGCAGCT GAATATGCGA CTGACGAAGA AACGGCAGCA GTTAGATTCT TATATCGAGG ACCCTCATAC AGCTGTTGCT CCAATCGGCC ACTGGAGATG TAACTAAGAC ATGATCT ACAGTACCAA CAGGAAACTr TTGCC-AGTTG GTAAATTAAT AAA6ACACGTG T'rTGACAA ATCTTGGTAT CTTCAAACTT AAGACAACTG AACTTATGAA GATGCAGAGA TTTrGGACCT GAGATCAAGC GTGTTTGTGA TCAGCAGTT'r ATAAAAAATA TCAACAGCTA GTCCATACAA 9.
4~ S 9 GT'rCCCAGrA GT'rGCAGTAG AAGCTGTAAC TGGAAAAGCA GGT'rTAACAG ACTTTGAAGC CTTGGCTCAA TTACATGAAA TCTCAGGCGT TGCAGTGCr-A CCAGCAGTTG ATGGGCTrGA AATAGCTCCA ATTCGTCACA AGACAACAGT GGCAGCTGCT GACATGCAAG CAGCGGTTGA GGCTTATTTA GGACTTTAAG ACAGAGGGAG CAAACTCGGT 4 rGGGAAACCA ACTGAGTTTC TT'rTCATCAG GAGGAGAGAT TGTTTAAGAA AAATAA.AGAC ATTCTTAATA TTGCATTGCC AGCTATGGGT GAAAACTrTT TGCAGATGCT AATGGGAATG GTGGACAGTT ATTTGG 1
N'GC
TCATTTAGGA TTGATAGCTA TTTCAGGGGT TTCAGTAGCT GGTAATATTA TCACCATTTA TCAGGCGATT TTCATCGCTC TGGGAGCTrGC TATTTCCAGT GTTA"TTTCAA AAAGCATAGG GCAGAAAGAC CAGTCGAAGT TGGCCTATCA TGrGACTGAG GCGTTGAAGA TTACCTTACT ATTAAGTTTC CTTTAGGAT TT'rTGTCCAT CTTCGCTGGG AAAGAGATGA TAGGAC'riT GGGGACGGAG AGGGATGTAG CTGAGAGTGG TGGACTGTAT CTATCTT'rGG TAGGCGGATC GATTG'IrCTC rTAGGTI-rAA TGACTAGTCT AGGAGCCTTG A'rTCGTGCAA CGCATAATCC 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340
ACGTCTGCCT
AGCTATTTTT
TTTGGTTGGT
TGGT'rTAGAT
GAGGGCTGGA
CTCTATGTTA GTTTTTTATC GTTCTGGA'rA TGGGGATAGC CTTGTGATTr TGTGGTCACA AAGGAACTGT TGACCTI'GGC GATGTAGTGA TCATTGCCTT CAATGCCTTG AATATTCTTT TTTCAAGTCT TGGTG'rrGCT TGGGGGACAA TTGTGTCTCG ATAAAACTG CCTTATGGGA AGCCAACTTT TTTACCAGCA GCTGGAGAGC GACTTATGAT GGTCGTTTCT TT-TGGGACGG AGGCAGTTGC 737 TGGGAATCCA ATCGGAGAAG TCTTGACCCA GTTTAACTAT ATGCCTGCCT TTGGCGTCGC 5400 TACGGCAACG GTCATGCTGT TGvGCCCGAGC AGTTGGAGAG TAGTTTGAGT AAACAAACCT TTrGGCTI-rC TCTGTTCCTC TATATATGTC TTGGGTGTAC CATTAACTCA TCTCTATACG C-GCTAGTGTT CTAGTGACAC TGTTTTCACT ACTTC-GGACC CATCTATACG GCAGTCTGGC AGGGATTAGG AAATGCACGC
TATAGGAATG
GGGCTTGCCT
ACGCTA'rCGT
TATTTGGGAT
GGAGACTTTT
CAACTATTCG
GGTGCTAAAT
GCCAGGTGCG
TGGTGTATCC
GGTATTTGGG
TACCAGCGC'r
TTAGACGGGA
GCTCAGTT
GTGCAAGAT'r
CAGGTGCGTG
CGTGAGGTGC
GCATTGGGAC
CAGGGTCTCT
ATATGAGCTT
AGGATATCTG
CTTGGATAA'r
GAAAGCATAG
CTTTATTGGA CTCTTACGAA CTATTCCTTA TGATAAGGAG TGCTTGTGCG GGTGGCAGALA CCCAGAGTCT GGCTGAGAAG TAGCTTGGGC AGACGAATCA CTMACCAT 'rCTCAAGGAC AGAGTGGCTT TGTGCGGAAG AGTTGAATTC TGATAATACT AGAATAGTGG GATTCAAAGC GATIGATTGGA PA.AGAGTTGC ATGTrrGCCCC TGTCCr'rrAG ACTGATTCTC TAGCGGTGGA CCTATGACGA CACGAACAGT CTCCCTTTTT ATGCGACAAG ATGGGATTG TGCTTGGTTG GGTTTTCGCT GGTTATTTCT GAAATGCAAA AAACAGCTTT GCCATTTTAT CAGGGATTGA AAGGTGAGAG AGTTTATC'r? GATAGAAATC TGGATGTTGA AATGCTCAGG TAGTTrTTAT GG.AATTC-AGC AGTTTATA'rA TTGG3GGGTGG AATCCTATTT CCAAGTCCAG AAGCCCCTAC TATTATATAG GGGATCGGAC ATCAACTTTT TAGAGTCTAC TACTCATAAG GGGAACAACG TACAGAGATT TTAACCAGTC CTATCTGCTA GATP.AGTATC TCTGGATGTG GAATTTGCCC TTATGAAGGG AATCACAGGA GTGATAAAAA GATTGTGTCA AGTTTGTTAC AAGGAATAGA TTTTTGTGT TATCATAGAC AATGTTATTA GCGTCAACAG AGAAGTTCTT TGGACTGCAC CAACAAAACA AGTTATACCG GGGTGTAGAT GTCACAGTGC CCCAGAAACT GTTT'rGACAA CCAAACACCT CAAGCAGACT TAATCAAGTG ACCGTTGATG 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720' 6780 6840 6900 6960 7020 7080 TTCAAGCGTT AGCAGATATT TCCCGTATTT TTGAGACTAA GT'rTTGTGAC AGAGACCTAA CAAACTATTT CAAGTAACCT CAGTTCTGTT AAATAGGCCC AGGTACTCAT TTGAAAGGAA TAGCCTTGTC ATTTGCCCCA GTAGTG?1'GA GCAAATCCAA TACAGTATGG TGATAC I'T G TTrGCGAATCT GAACAAAATC CGACTGTCAA TGAAOCAGAA CTAGTGAAGA AGTGACAACT
GAGAGGGCTT
TTTGAAAGAA
GTATTGGCAA
AACGATTTGA
T'r'rCTACA
TGAAGAAAAG
CTCAAGCAGA
CTAAAACGGA
AGCACCATTG CAGAAGCCTT ACTAATATGG ACTTGATTTT GAAGTAACAG AAGTTGAAAT GCGACAGCAG ATTTGACCAC ATCAAACTGT TCAGGTTGCA GACCTTTCTC AACCAATTGC 738 AGAAGTTACA AAGACAGTGA TTGCTTCTGA ACAAGTCGCA CCCAGAGGAG CAAACGACCG AAACAACTCG CCCAGTTGAA GACTCCAGCr GAGAAGCAGG AAACACAAGC AAGCCCrCAA AACTACAACA AGTTCAGAAG CAAAAGAAGT AGCATCATCA rTACTTAT CAACCAGAAG AGACGAAAAT AATTTCAACA GCCCGATTAT GCTGGACTTG CAGTAGCAAA ATCTGAAAAT AGCTGCCTTT AAAGAAGAAA TTIGCTAACTT GTTTGGCATT TCCAGGAGAC AGTGGAGATC ACGGAAAAGG TTT-GGCrATC
CCATCTACGG
GAAGCAACTC
GCTGCATCAG
GCACTTCTGT
CTCAGGAAAC
CAGTGGAAG.T
AATGGAGCTA CAGCAGCAGT ACTTACGAGG CTCCAGCTGC GCAGGTCTTC AACCACAAAC ACATCCTTTA GTGGTTATCG GACTTTATGG TACCAGAACG AATATGGCCA GCCGTGGCAT GATAGCAAAT ATGGGCCAGC GAAAATCACT ATGA'rCACGT ATTTTGACGA ATGAGATCTA GTCATACTCT TCGAAAATCT T'rCAGAATTA
TAGTTACATC
TAACACCG
TCACGTTTCA
GCTTTCGTGA
CT'rCAAACCA
AGCTTCCTAG
TATCATCTTG
AATTAGT1CAA CGGGATAAGA TTGCGGAATA TGCTATTCAA ATCTGGAAAC AACGTTTCTA TGCTCCATTC AACCCAATGC CAGACCG'rGG ATGAATGGAT AAACCCGACT TGGAAAGCGA TTCTCGTTCG CGTCAGTTTT ATCTGAAACT 'N'TGCTrTTT GATTTTCAT'r 'rAGTGTGACA
TGATAACATC
TTTTTTCTrT TCAAAGCTGT GCTTTGACCA ACCTGCGACT GAGTATCA-AT TTGAATGGAA AATGGAAAGT 7140 7200 '7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 TAATGAGTTA AGCAACATTC T'rGCAATCTA TTTTACT1'TA TATCACAAT'r ATATTGATAA ATCAATAAAA AGAGAGGGGA AGAAATGCTA GAGATTCAAG ATTTACTGTA TCAACTCCGC TTGTCTGAGC AAGCGAGTAC GCAATTGTTT GAAAAAAGC TTGGGATTAG TTTGACACGG TATCAGATTT TACTGTTTTT' GCTGr.AGCAT 'rCTCCTTGTA ACCAAATGGC GGTTCAGGAG A.AATTTTGGA AACGGAAGGT TGTTGGTAGA GGCTGCGAAG ATATCAGGGT TAAGGAAGAG GCCGTTTATT AAATAAATTG A'rGTCAAT'rA TTTTAACAAC -AGTAT'rGCCA CCCAATCAGA GCTCATCCGT CAGTAAGTTC CTCTTTCTCT TCTATGTCAT TTATTTGTGA TTCGTGCTGC CGT'rTGAAA.A
TTGGTGGAGC
TATGCCAAGG
ATAGAAAGTA
GTTTTGGGTA
GATCGTTGCT
'rGCGACTAGT TTGATCAGGC TGC'TTTGACA CGGCAI'TCA GTCATCGTAA TCCTGAAAAT CAGCGGGAAG AGCAGTTAGT GGTGAATCCC CCTCTGCAAC TCTTAACAGA GTTTGAGAGA ACAGAACTCA TTGAAAA'rAT AGAAATTTAA GGAGAAATAG TTGGAGCATT TTTACATrTrT TTATTGrGAA CGTrGTATT'rA ATATGGAAAA GGAAGAATTG ATTGTTCAAA AATCAAGGAA TTTATAAGGC TCTGCTAGGA TTATTTCTCA CAGAATTTAG AAATTGTGAC TATTTGTC GACTTACGGC TCTTITAACAG CGGATAAAAA AATTATTTTG AAACAAGGTG GATCAGCTAT TTTGGCCTTG ATTAGTATTT TACTCTTTAA A'rACACTTGA A=GCGATTC TAATCTCCCT AATCCTTTTT AATCCAGAAT AAGAAATA TGTTATACrr GTTTAAcA AAAAAGTCrC ATTGAATTGG TTTTGAAGCAG ?TAGAAATGA AAGTATTACT GACAGGTTTT GAGCCCTI'TG GAGGGGAAAA GGGCAATCCA GCTTTGr.AGG CCATTAAAGG TTTACCAGCT GAAATCCATG GTGCTGAGGT CCGTTGGCTA GAC4TGCCGA CAC71=~CA CAAATCTGCT CAAGTATTGG AAGAAGAGAT GAATCG1-rAT CAACCTGACT ?rrCClTTTG TATTGGGCAA GCTGGTGGAA GAACTAGTTT GACACCTGAA CGAGTGACCA TTAATCAAGA CGATGCATGC ATTTCTGATA ACGAAGATAA TCAACCGATT GACCG;TCCCA T'rCGCCCAGA TGGTGCTTCG GCCTACN'TA GTAGTTTGCC GATTAAAGCG ATCTCA.AG CTATAAAAAA AGAGGCTTA CCGGCCTCTG 'NTTCCAATAC GGCAGGGACT TTTGTCTGCA GCCATTTGAT GTATCAGCT CTCTATTT-GG 1'AGAAAAGAA ATCTCCATAT GTTAAGGCAG GTTTTA'rGCA TATTCCI-rAT ATGA'rGGAAC AGGTGGTGAA CAGACCOACT ACTCCACCTA TGAGTTTAGT GGA'rATTCGG CGAGGGATAG'AAGCACCAAT CGGCGCrATA A'rAGAACATG GAGA'rCAGGA ACTCAAGTTG CTAGGCGGAG AAACTCATTG ATAGAAAAAA GCTTGAGGGG AAAAACCTTC AAGCTTTTGG ACGT'rTTCGG GCCAATACTG CTCGGTAAALA CATAAr -ITA GTGCATTGGA TATAAGGTAG GAGTGAAAAA CTAGCAATGC CAALAGGTAAT CCAA'rrGAGG AAG'rACCAAG GALAGAAGCTG TAAATCTAGG ACAAAGTGCT GGA.ACTTGTA GCCCTTCATA AAGGA.ACGGC TAGTTTTTAG GATTCGTCTT GGTGGGACCT GTCCTAGGTC TAGACTATAA CAGAGAAGAA ATTCCACCTG TGAATAGGCA 'VAATACTGTG GA.ATATAGAG; GATATTTCCT ACAATGATCA AGATGAGACT TGCAAGAAAG TAGACTCCAA AGACCATGAG GAAACGCTCG GTTTCAACTG ATGAGAGATC TAGATTTGGA AACTCAGGAT CTAGGGTGAC GAATT'TTTG GCTAAAAAGC TACTATA-AAA GAGGAGGTAA ATCCCAAGTA AATTAGGGAT ACTCCATAAA AAGAGATAGA AACGTTTGAG AAGTACGGTC AAAAAGGTTT GAGAAAAGCG CTCCTCATCA AAGAGAGCTA GGCTGTTTTT TACAGATGGC TCCGTTTTAG AATCTTNCAT GAGTGTCAGT GTTGCATAGA CGGAACTGGT 
CAAAAGAATA GTCCCGATAA AGGAGACTAG TAGAGGAAAG AGGTAGGTTT GAAGTATTTG GCCAAGTATG CTGAAAA.ATG GCTGTTCTAA AACAGTCCCG TGGATCCGAG ATAAGGGATT AAGAA.AACCA GATAAGATGA CCAGCATACT GGGAAGGATA TAGAGGAGAA AGAGACGGGG GGTGTCAGCC TGAAAATGTI' T'rGACTCCTG ACGAATTGTT TTTAAATCAA TTTTTGGATA GTTCATTC'rC TTATTATACC A'rAGTTCTTA TACATAGTT~C GTGACAGTTC CTACTTr'rT TGATAAAATC ATACAGTGTG TCCTTGGGCA CACTGTATCA ACTGGGACTC 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 102 10320 10380 10440 10500 10560 10620 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 TCTTTCCCAG CTTCGGAGGT GAAAGAAAAA CACACAGGAG GACACC!TATG TPTrATGCCAG A?1'GAAGGAG A'rGGGTrCGG TGGAGATGAA CTCATTGCAC TATCT'rGACA GATAGTGGTG AGAAGAAGGA GTAACCTTA AAAAGCCATC TCTATTCAGA TCCTCAGTTT TATCAACCTT GGCTGAGCGT GGTTTGAAGG 740 AAAAAATGTC AGATTCACCA ATCAAATATC GTTTGATTAA CTCGTCTGGG AGAAATCATC AcTCCCCACG GTACC Tcc TTGGGACACA AGCCACTGTc AAAAcTCAGT cACCTGAAGA GAATTATCCT ATCAAACACC TATCATCTCT GGCTTCGCCC GCGCTGG;TGG TCTCCACAAG TrCATGAATT GGGAcC-AGcc GNTTTCAGGT TTAT'PCTTTA GCAGATAGCC GTAATATCAC AAAATCATCT AAATGGTTCT ATAATCTGGG TTCAGACATC ATGACTACGT TAAGAAATCG CTC!ACCGTCG TCCACATGAC AAGATGTrCC TATCCCCAGA ATGATGTCCT TTGATGAATG ATCGAGCGTA CCAGCCGTTG CAAGGTTTGT TTGGAATTGT CATGATCTTG TCAGCATGGA ACCCATGAAG AGATGAATGC
GCAAGGTGCA
TTTC!TCAGGC
GGTC'ITGGAC
GGGAGCGCCA
GGATTTGAAG ACCTTCGCCG CCAATCAGCT TACTCTATCG GTGGTTTGGC AG'rGGGAGAA 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 1820 11880 11915 TTTACAACTC AACTGC'TGCC TGAAAATAAA CCTCGTTATC GATAGCTTGA TCGATGGGGT CATTCGTGGG GTGGATATGT
TGATGGGTGT
T'rGACTGTGT CTTACCGACT CGAAT'rGCTC GTAACGGGAC TTGTATGACC AGTCAAGGAC GTTrGGTTGT GAAAAATGCC CAGTTTGCTG AGGACTTTAC GCCACTGGAT CCTGAGTGTG ATTGCTACAC ATGTAATAAC TATACACGCG CTTACCTTCG TCACCTGCTC AAGGCTGArG AAACCTTTGG TATCCGCTTG ACTAGCTACC ACAATCTTTA CTTCTTGCTT AACCTGATGA AGCAAGTGCG ACAAGCCATC ATGGATGACA ATCTCTTGGA ATTCCGTIGAG TA~TTGTTGG AAAAATATGG CTATAATAAG TCAGGACGTA ATTTCTAAAA TGGAATTGAT ATAAAAAAAT CCTAAGTTTT CTCTTAGGAT TTTTCTTCTT TrTTTGCATAG AATAAAGTGT ACAATGAAAG GAAGAATAAA CTCGTATGCG CATTAAATGG TTTTCCTCGA TTAGG INFORMATION FOR SEQ ID NO: 97: (i)SEQUENCE CHARACTERISTICS: LENGTH: 9069 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: GAGAGGGCAA CAGTTCTATC GCTTCA.AATT TTTTCTTGGT TTGCAGATAT TCAAGAATCG GGAGTTTTTC TATAGTATTC GGCAGATTTA TTACAGCCAA GCATCTCAAA AATACGGACA GCATCCTCCA TCTTTTTCTG GCCTTCCTTG ACTCTACCTT GCTTGCrATC AAGGAGAccTr TcZTGCCCACA GATAAACAAT TCGGAAATAG GCTCATr' CcTTGTAGAA ATGCTCTTcc ATAACACGrT TAAAATAATA GGCATTGGTA AATTC1-rCAC ACTCAATACT AGCI'AAAAAG
CCATTCAATA
GATTTACGTA
CTGACTGAAA
TGAACGACTT
GACTTGATAA
TATACTCCCT
TAATTTCTTT
TCTCCTCCAC
CACTCTGACC
GTCTCAGTGA
TAGACTTCCC
AAAACTTGCC
TGCTATGAGA
GTATAGTATG AAAAAGGTTT cGATTGCCAG CCATTC~jCG TACATATCTA GTAAAAAGAG ATAAATTCAG TTCATAGATT CCCCAGATCT TTCCTTCCTC TGCTGTAAG TCTACCCTT TAATGGCATT TAGAATATGT TTCTGTTGT GCGATATAAG TCCTCAAGAG GTGCTATATT TCTCAACTCA GAATCTGTAT CATACTGGAA ACTGGCAGAT ATATTTTCCA GAGCA.AATAG TGTTTCAAA.A CGGGACAACA TAGACGGCGA GATA'rrTCTT- GACTC-rCGTA AT'rCTCTAAA ACAT'rTCCAT TAGAAAATCA AAACAGATAA AAATGGAGAA CCGTAGAAAA CAAATAATcA CATCTATGCT cTrcATATAA TGTGAGAATC GGCATGCTT CTTTrGTI-CC AAGACATCTG ACC'TCTTGCC AGAAAGAGGA AAACTTTTCC ACCGAAAGCT AAATTGTCCT CCGGTTGC'TT GAC'TTT-rCCA ATCTGCTCCA ATT-TTrCAGA AAATTCATCA TAT1TrATCAC GTGGAGGGAC GCCTTGCAAG ACCAAATTAT a. a a a. ta a a. *a a a
CTTGATTCCG
AAATTGTCAG
GACGATATCA
TATTTTCTTC ATTTTATCAT AATTATGAGA AAATAGAGGA AAA'rCAA'rGA CCCTGCTTTG 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 CGAAAAACTA GAGAAAGTTT TTGATACAGA TGTGGAATTG GATGTTTACA ATCTAGGTCT GATTTATGAA ATCAATCTGG ATGAAACGGG GCTCTGCAAG ATTGTCATGA CCTTCACCGA TACTGCCTGT GATTGCGCCG AAAGCCTGCC TATTGAAATC GTGGCACGTC TGAAACAAAT CGAGGGTATC AAAGATATCA AGGTTGAAGT TACCTrGGTCG CCTGCTTGGA AAATCACACC AATCAGTCGC TATGGCCGTA T'rGCCCTTGG ACTACCACCT CGTTAAGCAC ACCAATCACT TTTAAAGATC AAAATCAAAG GGCAAACTAG AAAACTrAGCC GCAGGTTGCT CAAAACACTG TTTCAAGTr ATGGATACAA CTGACGAAGT CAGCTCAAAA CACTGTTTTG AGGTTCTGGA TAG;LACTCAC GAAGTCAgCT CAAAACACTG TTTTGAGGTT GTGGATAGAA CTGACGAAGT CAGCCCAAAA CACTG'rTTTG AGGTTGTGGA TAGAACTGAC- GAAGTCAGTA ACCATACCTA CGGCAAGCC ACG=rGACCGT GATTTGAAGA GATTT'rCGAG; TATGAG=rA TT=TACC1' GACTTGTCCA TATTCCAGAA GTCTGTCACG GCTCCGCGTG AAGCAGATGA TACGATGTGG GCATATI-rAC CGAGGACACC ACGGCTGTAA AGTGGTGGCA AGGTTGTTTC TGCCTTGCGT TT'rTCA-AGTT CT'rCTTCGGA TACGGCCATA GAAA.TTTCTT 'rGGTATCTTG GTCAACCGTA 742 ACGATATCGC CGGTACGGAG ATAGGCAA'rr GGTCCACCAT CCTGAG3CT'C AGGAGCGATA TGTCCAACAA CCAGACCATA AGTACCACCA GAGAAACGTC CGTCCGTCAA GAGGGCCACC TATCTCCCT CACCTTTACC AACAA'rCAT-r GAAGAAAGTG ATAGCATCTC AGGCA'rACCA GGACCACCTr TAGGTCCAAC AAAACGAACA ACGACTACAT CGCCATCAAC GA'IrTCATCT GTCAGAACCG CCTGAATCGC ATCTTCTTCT GAGTCAAAGA CCTrAGCTGG CCCAACCTGA CGACGCACTr TAACACCTGA TACCTTGGCA ACTGCACCGT CAGGAGCAAG GTTCCCGTTC AAGATGATAA GCGGACCATC CGCACGTTTr GCAT-rTCAA GTGGCATGAT AAC~nrTGa CCTGCAGTCZA ACTCTGCAAA GTCAGCCAAG TTTTCAGCTA CAGTCTrACC AGTACATGTG ATGCGATCTC CGTGAAGGAA ACCAT'rTGCC CCGACTTCGT AGAGGTCTTG GGCACACGTT CTTGAPTCGT GCAATGGCGA GCAAGTGAAG ATAGCATCTT CAAAGGCTTC
GAAGACATAC
ATTGAAGTCC
AGTGGCGTTT
ACGAGTCAAG
.0 0 ATCTrAACAA CAGCACGTCC TGCTGCTTCG GGGTGAGAGG ATGACCCTGG CAAACTCATC TTAGCAGTAT ACATACCACC ACAACCACCA TTCACGTCCT CAGCTGTCAT GTCACCGTGG ACCAAGTCGA TATCTrTACC ATCAAGATTT ATAGCTGGGA TATCCATATT AGCAATAGCA AACAAATACT TCATAACCGC AGGGACACCA TGACCAGATG GTTTCAAGTC GGCCAAGTGA TCAAGTGACA AGTCAACATT TGCGGCATGG GTAGAACCAC CGAGAGCCAT CGT'TACAGTG ATA'rCTGATG GTTTGAGACC AAGTTCCAAC ATATCT'TCTT TCT-rATCAGC 'rGATTCAGCT CCTAGAACTT CGATAGCAGT TGCCATGGTA GGGCCAGGGC AGGCATTACA TTCAAGACGT TTCCATTTTC CGATACCTTC AAAGACAGAA CCCGGTGCAA TAGTTCCACC ATACGCGAA.A ATCATAGATC CAGGCATGTT CTTGTCACAG 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 CCACCGATAG CGACGAAGGC ATCCACGTTG TGACCACTCA TAGCCGCCTC GATGGAGTCC *0 50 C, S S
GCGATGATGT CACGAGATGI' TAGAGAGAAA CGCATACCAG TCCGCTACGG TAATGCTTCC AAACTGTACA GGCCAAGCGC TTAGCCAGTT TCCCGAAATC ATGCAAGTGA ATGTTACATG GAAATCACTC cC-ACAATCGA TGTTTCA-AAG TCCTTATCTG ATAGCACGGT TAGGTGATTT AACCATGCTG TCATAAATGC AATTCAGTCA TCTTATCCCT CCCATTTCAG 'IrTTTACTAT AAGAACAGAA TAAAATTCTT GAATTTTCAG AAAATrCTAT ATTAAAAACA.ACAAAGCGGA TTAGTGCACT TTCTGATGAC GCTT'rCTrrA AA'rAACGTAC TGTAATTTrT ACAGAAATTC GCGT'rCCCAT AGCGATCCCC CTGCAGATTT GACACCTTCT GTGTATTTTC CGCCCAAGTC TCATACCAGT CCCACGAAGC TACTGCGGTG ACGTTTATCT TATAGCACAA TTTTCGCATG ACACATGTGA AATATTTAAA CAGA.ATATGC TTrAATCC TTCAAATAA GTGTATTTAA CATCTATCTT GCATTATAAA 'rTrCTAGAAC CTTCTC=Lr ATATTCCATT CACTCAA.ACC 36 3660 743 AfrACTCATTA AGAAGATAA'r CCATTTCCC CCGATGAATT ?TG?=ATTC CATCATCAGA ACCrATCTGA TTGTGGarr- CTACTA I-G-T1TCGGT ATCGGTTTGA TTCTAAATAA AGACAGTTCA TCTGCAACTC GAATATTGG AAGATCTCA CCATGCCTA ACTCAATGTA CACAGGAACT GGAGCTTTTC TAATTGTTCG CTGGTTCA6AT ATTTCACGAA AMrGGATATC AAT'rAAACG'r AACAATCCAA TTTCTTCAAA CCrrACTCCT GCATrC'GATC CAA'rCACAGT AAATAATTGA TCAAATACTC TTCGTGAAGC AAACCCCTGA ATAGACAAGC CTGCTGCAAG AT'rCACATAA CGGTCTCCAA AGTCCTTTTC ATCGGCT'rCT AAGACTACTA TATCAGAATC ATATACATGC CGTAATTCTT TCGTACTTCT AATCTTTCTA CAACTGAAGT TAACA'TrGT TTGGATTTCA TTTCTTCCAG CTCTTGAACC CACT'rAGGTG A'rGAATTATT TGACTGTTTT ATATCTGA-AC CCTTGACCCT AATGCATTCA TCACCTGGAT TACAAATATC CTTTGTAAAA AATACAATTA AGTTGGATAA CTGTTGAPGA CCCTCATTTA ACTCACCATC TCCAACAATA CTCTGACCAT ATGCAAGTCC AGTTGCAACA ATATCTATGC CTGGCGTTAG Ar'rTCTATCA TTTAAAGAAT ATAAGAATTC ?TCTCAAAG TACrrGACCG AATcTTTCTr GAATAXTTCA CATAA6AGCAC TATAGT'TTTr CCACrrAACA ATCTATCACA CCrAC2'GAAT AGCA.ACCArr ATGccAG2AAG GCC~rrAGAA AAATCTTCTC CACCTTGATrA AATATATTTT ACTCC=rrA AGTCTAATGT ATCAG'rTGCl' TCGAAAATCA TTGAT'rTAGG TGGCATATGT GTTCCACCAT TCATCTCTGC GGCATCCA6AT TGTGCGTATC CAAGAGAAAT AAAAGGACCA AATGTATGAA CATAP.GGTCT GCCGACCATT TCTCCTTCCA TAATCCCAAC AAGA'rTATTA GTAGCCATCG kACrrGACAA ACTr-rGATTA GCCTCTAAAA GGAAGTCTCT CATCA'rTCTG TTrCC'TCCAA rrCCTGACTT TTCTCCTCTA CAGTAGGGCG AACATGATGA 
CCTTGACCTI 'AATAGTATC TAATACAATG AATTCCACAA TCC=~CATA AATTTCTCTA AATCCAA.ATG. C'TGAAAATTT TTCTACGAAA CCATCTAATT GTTT~rTGTT ATCA'rCAACA GAAGCAAACT GTATAGCCTC CCAACA'rTGT GCGTAAGTA'r A.AAAGGGACT CTTTCTTA'TT CTAA'rTCCTT GTCCTAA-AGA CCCCGTTGTC CGATCACACG GTAATrTTCT TCCATTTGTA AAACCATTCA AA'rAGAGTCT ACTG'rATAGA
GAACACCCAT
TGCCAATTCC
TGT?'AT
AACCTAArr cAACGATTrAc 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 GCTGGTCCTC CGTGACCTTT TGATAATATG AAATA.ATCTC TATCTCGTGC TGCAAATATT TCTGGAGTCA TTGGCATTAT TTCACCATAA AGCACCGCTA A.AACTTCTAC GATAGACACA CTTCCTCCGT AATGTCCGAA TCCAAGATGA TTCAATGTTC TAAGAGTATT TAATCGGATG TTAGTCGCAA ATTTTCTA CCCATCr'TCT CTATTTTTAC TTAAAATCAT CCCTTATTCC 744 TCCGTTGCAG ATGGC?I1'r1 AA'rAAAGGAT ACTCCAAACA AGACCAATCA CAATGCCTGC TTGTGAGCCA AATTGATTTA GATAGACCAA AATCTGCATC TGAGAAAGTT GATCCTTGGA GGCATTAAAA AGACrGGAAG AAAACTGATT AAAATACCTT GCTCCACGAA CACCACCAGA TGCATTCCCA ATGACACCTG TGAGGCACAA CACCTGGTAA GATAACAACC GTTCCTGAAG ACTAAACCAC CAACAAAACT AGAGATAAA'r CCAATI'AGAA TAAACAATCG GACAATCCAA AGCAGGTT'N' GAATTAGCTA TAACTGCTAG AATAAGAACA ACATTCCTAA AATAA'n'CCT AACCAAGTCC TCCCAAAACT GTAAAAATG.C TCCAATAGTG CAGTCGCTCC ACAGAAGAAA CAATCATAAT TACCATACTT CTGC*ATTGGG TGCATAAGTA CAAGACGCTC TGAAATACCT CTGCTAAAAT AACAAATACC GACCACTTGT ACCACTACTG TTAAAGGCTG GAACAATTTC GCCCAAAATA CCTGCTGCAA AT'rGACCTGC TAATT~GTAAA
AGGCGAACAC
GCATAAACTA
ATTTCTTTTr CTATA'rATTC TGACCCTGCA AAGATAGCTA ATGGATAAAG TAATACTAAC AGTACTATCA CGTAAAAAAG ATGTCCTCTG TTGATTTTGA TT'rGTCACCG ATAAGGCTAC TATCCCAAAG AACTGAAATG ACCTAAAGCT ACCTTGTCAT TAT'rTTTGCA CAAATGCTGG GGAAATACTC ATAATAATAC AAGATCAGAG GCAAGCTAGT AAAGCCACCA ACTGATAAAA ATATATAGAG TGTGGTGCCC 'rGTTAAAAAA ATATATTTAA AAGATATTGA ACACCATGCC TGCAAACATA ATCATTGCAG AAAGC'rACAG CTACAATTGC TTCATTATTC GGCACAACGC TCAAACATGG TACCAAATGG ATTTAA-AGAA TTTTGTACAA ACTALAGAAAC CAACAAAGGT CTTAATTCCA CCTTTAATAA TGAAGAACTA ATCCTAAGAT TGCAATTAAA GCTACTAAAA TCCAATATGA ACTTCATCAT GACGCTAGCC TCCTATATAA TTAGTAATT A ATTCTCGTAG TTCATCCATA TCAATAATAC CCAAGATGAC TAGCTGAATC AGCTAGATCA CGACCAACAA GGATCTGCTC CACCTAAATC ATAATGTTCA ACTTCTACAT AATACAGATT CAATATTCAT CTGTACCATA AAACTTGAAC GTACCAATTT TTAACATTAT CTAATCCTCC TGTTTAATTA TTTTTTGATG ATATTAAAGT TGAACATGA TTTTTATCTC CAATAATGTA AATAACTGCC CTAAACTCTT 'rGGAAATTTA CAGTAAAACC ACTCAACCAA TTCC-AGTTAA TTGAACCATA CGAGTGCTAA TCCTCCTAGT TGACCGCAAT CATACATGCC ATCGAGTAAA ACGAGCGATT TAGCTGAGCC ATATGTTGTT CAGATAAATG AAAAGCATGC TTCCTGCACC ACCAGATACA TATCAGGTAA TTTCTTCTTC TAGCTGGTGT ACTAACAATA GTCC7T"TTC TTCACAAAL;T TATTTAAGAT ACGAACATCT TCCAAATATC AGCTGCATTT CCGAAACATr CAAATCACTC CTAATCCTGA ACCACAAGCT TCATTT'AAT GTCATCATAG- TTAAAATTGT TGTTAAATGT 5460 5520 '5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 GACAAAGCCT TTAAATGACT CTCATTATCA ATGGCTGCAA TACAAATCAA CAATCTTACC 70 7200 745 M TTGTTCTG GATTATCCAA TAAATAAATC GGTrcI'TCCA AAACTAACAT TGCATTCC'r ATTTCATTCA CACCI'rCATC 'rGGCCGAGCG TGAGGAATTG CTACTCCCI-r CCCTAAATTA ATAAAAGGTC CAAACrCT1'C TAC7"rTTTGA ATCAT'rGCCT CAGGGTAG~r CTCAGTTATC TTATCTTGAT cCAAAAGCGG =IACCTG3CT AA6ACGAATCG CCTCCTTCCA TCCTAATT TGCGAACTAA CCTGATAGCr TTCTTGGTA ATAAGTTGTT CTAGCACTGG TACAATTTCC TrCrATCAT TTTTTrGCTA AAGATAATTC TTTIAACGCCA ATCTTAATTC CAATTCTTGT GTAATAArC CATATCMr~ GACAATATTC AGGATTTGTT CAATCTCAAA ATCTCCATAC TCTAAATTCG GAAAATCTTT TAACACTAGT TCTACrAGTT 
GTATTGCTTG CTCTCrAGTC ATCATAACCG AAACTAGATA ATT'rGGuI I TCTGTCTCCA CCTTTATGGT AGAAAAAACC ATATCATAGT CACTACTAGC TTTCACCTGT AAATCATCA.A TCtTCTAGGT TCCTATAAAC TCAATTTGAG GAAATAATGC TAATAGATTC TCTT?1'AACA TCAATGAAGA ACTA.ACACCA TTAGGACAAA TGATTGCrGC TTTATACCAT TT=TAGGCA AAGTATCTGC TTTCTTTAA.A TAACCTCCGA AATGCATAAC AAAATATGCT GTTTCACTAT CAGCTATGGG ATTGTCAATA GCGTCCATCA ACGGCATCAA AGAATCTTTG ACTAATTCAA ATAAATCAGG ATAATGT'rCT TTAACATGCA ATACATATTC AT'IrGAACTA CC'rAGGCCGA ACTTTAATCT ATAGTAAGCC GGTATAAGGT GGCGGCGAAG AT-TTTCTCTC AATCCTTCCC T!-GTTTAAA ATGTAACAAA a.
a a a a. *a a a a 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460' 8520 8580 8640 8700 8760 8820 8880 8940 CAAATATCTT CCATTCTACT TATAATAGCC TCTGTTAATT ACATCTACTT CACCTTCAAA GCAACTTGAT AATAAAACG3 TCAGAAAACA CCGTATCTAT AATTCCCAAA TCAACCACTG ATATCTTGA.A TAACAGGAGA TACTAATGTC TCTGAAAGAC TGATACCTAC ACAGAATGAA TACTAAACCG AAAAGGTAAA GGTACTAGCT GTACCTTCTC ATAATAATCT TTAACTACCT AATGAATACC CCCAACTGGA TAAAACATAA TCCAAACCCC AGCAACTCAC TAACCATNTG AAAAGCTAAG CGGTCTTA GA'rTAAAGTA AACCGGAGCA TGATATAGCG ATAATCATCC TATCCAATAA AATAGTGGTT ATACTCTTTC AACATCCCTT CTTTTAATTG ATTAACAATA GATCAATCAA ATCATAAGTT AAATCCCTAT GGAGGATTCC TCCACTCTGA ACCGTG'rAAA a. .a a a a a a. a.
a a GTAkTAACCTr TTG;CTCTACT GTACCCTAGC TCCAAATCAT TATCTAACAT AATCTrTCrT AATGAT'rGAA TATCAGATAA GGTTGTATTC TTACTTACTT TCXAAACaTC TTGGTAATGA CTATTCCATA 'rAAAATCTAA TCGGCAAAAA GTGTAAAGAT AGATTAAAGC 'rAAGCGAGTC GAC?1'TGGTA AAACCAATTC ATCCGACTTA ATAATATCTG TCAAAGACTG CTTCGTACGA TTTGATAAAC TATAGCGACC TTCCTTTTTA TCCAGCACTA TCCCTTTATT AGCTAGATAA 746 GGCACrAAAT AATCTATTCC TTCTrTGACT TCCTrTATAG GTAAGCTCAC CTTAACAGAT AATTCATATA ACGP.TAGCTC ACAATGATCC ATCAAAGTCA TCAAAATAAC TAGTGCTCTA TAATCA AAC INFORMATION FOR SEQ ID NO: 98: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 8654 -base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: CGAGACAACA AGATGAAGAA A.AATTTGCCC TATCGTTTGT GGCGCTTGCA AGTGTAGCAC TTCTTGCAGC CTGTGGAGAA GTGAAGTCTG GAGCAGTCAA CACTGCTGGT AACTCAGTAG AGGAAAAGAC AATTAAAATC GGGTTTAACT TTGAAGAATC AGGTTCTTTA GCTGCATACG GAACAGCTGA ACAAAAAGGT GCCCAATTGG CTGTTGATGA AATCAATGCC GCAGTGGTAT CGATGGAAAA CAAATCGAAG TAGTCGATAA AGATAATAAG TCTGAAACAG CTGAGGCTGC TTCAGTTACA ACTAACCTTG TAACCCAATC TAAAGTATCA GCAGTCGTAG GACCTGCGAC ATCTGGTGCG ACTGCAGCTG CGGTAGCGAA CGCTACAAAA GCAGGTGTTC CATTGATCTC ACCAAGTGCG ACTCAAGATG GATTGACTAA AGGTCAAGAT TACCTCTTTA TTGGAACTTT en a.
9000 9060 9069 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260
CCAAGATAGC
GAAAGTTGTT
CCGCGAGTCA
CTTCCAAGCA
TTACTATAAT
AATCGT TGGT
AGCATCAAAC
TAAAGCCTTC
TTCCAAGGAA AAATTATCTC AAACTATGTT TCTGAAAAAT TAAATGCTAA CTTTACACTG ACAATGCCAG TGACTATGCT AAAGGGATTG CAAAATCTTT TACAAGGGTG AAATCGTrGC AGATGAAACT TTCGTAGCAG GTGACACAGA GCCC1'TACAA AAATGAAAGG GAAAGACTrT GATGCTATCG TTGTTCCTGG GAGGCTGGTA AAA'TGTAAA CCAAGCGCGT GGCATGGGAA TTGACAAACC GGTGATGGAT TCAACGGTGA GGAGTTTGTA CAACAAGCAA CTGCTGAAAA ATCTACTTTA TCTCAGGCTT CTCAACTACT GTAGAAGTTT CAGCTAAAGC CTTGACGCTT ACCGTGCTAA GTACAATGAA GAGCCTTCAA CATTTGCAGC CTTGGC'PTAT GATTCAGTTC TGAAATCAAG AATAACCTrG CTTCGATGCA GACCACAACA AGTTGAAGCA GCAGAAGTTG' TTTGACTCAC TCCCTGTTTC ACCTTGTAGC AAACGCAGCA AAAGGTGCTA AAAATTCAGG CTAAAACAAA AGATTTTGAA GGTGTAACTG GTCAAACAAG CAGTCAAAAC TGCTTACATG ATGACCATGA ACAATGGTAA TAAAACCATA ATAGAAAAAT GTTGAAATAG GGAATGAGCC GATATTTAAT ACTCTTCGAA AATCTCTTCA AACTGCGTCA ACG1'CGCCTT GGA1TrAAA TGTGACTGAC TTCGTCAGTC TTATCTACAA CCTCAAAGCA GTGC?1'TGAG CAACCTGCGG CTAGTTTCCr AGITGCTCT TTGATTTCi T'rGAG'?ATAA GAACCTATCA AAAAhGTGAGG GAAAACCCTC GGAkAAP ATACAAAGAG GCTCCALACAA CTCGTAAATG G1-rTGATTCT AGCTAGrGTT TACGCGCTGT ATATACCATG GTrACGGAA TTATCAAGCr CATCAACTTC GCCCATGGTG GATWGGAGCC TTTATCCGTT ATTTCTTGAT CAATCTC CAAATGAATT GCTTATrT=A GCTATGCTAG CGACAGCTAT 'rCTGGTGTC GTGATTGAGT CCGACCTTTG CGCCACTCTA CTCGTATTGC TOrrGATT ACGGC'rATTG
TGAATCTTAT
TAGCCCTAGG
ATAT -rATA'r TCTTTrGTAGC TrrGCI-rA GGTTTCTT1-
TCCCTCAAGC
AG'rTAATGAT
*000 CCTATTGGAG TATGGAATGG TCTATCTCGT GATTCAAACA GI'CGATATG ATTTGGCACC T'rrCGCCATT TCCTTGATTT TGATGATTTT GGGGAAAGCC ATGCGTGCAG TATCACTAGA TGGTGCCAAT ACCCGTGCCT AA'rrAGCTTA ACA).ATGTGC GTTACAAGTC ATTGTCCAAA AGACTAAGAT TAGCGACGCG GCGCAATTGA 'rGGGGATCA.A TGTAAACCGT ACCATTAGCT TTACCTTCGC rI-rGGGTTCT CCTCTTGCGG GTGCGGCTGG TGTTCTGATr GCTCTTTATT ATAACTCTCT TGAGCCTTTG ATGGGGGTTA CTCCAGGTCT TAAATCTTTC GTTGCCCCAG TACTTGGTGG TATCGGAATT 'TGGCTTTGTG AT'rCGTCTAT TCGAAACCTT TGCGACTGCC TGATGCCATT GTTTATGGAA TCTTGTTGTT GATCTTGATT TGGTA-AGAAT GTGAAAGAGA AGGTGTAAAC GATGAAGGAA ATCCTGGTG CGGCTCTrG ?TTrGGGATG1' CAGATTTCCG GTCCGCCCAG CTGGTATCCT AATTTA.AAAC 'rTAATATTCT 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 ATGGTTACTC CTT'rTGTTAG CTGGCTATAG ACr'rAATCTA TTCTA'rGTAC AGAT'TTTACA CTTGATTAGT GTAC'rGGTTT CAGTCGGAGT ACAAATTGGA ATTAATA'rTA 'rTTrGGCTGT TGGTCTCAAC TTAATCGTTG GTTTTTCAGG ACAATTTTCA GGCGATTGGT GCCTATGCAG CAGCTATTAT TGGTTCTAAA CTTTGGAGC'r ATGCTTGTAG GGGCTTTGCT TTCAGGAGCA TCCAACCTTG CGCTTGAAGG GGGACTrATCT TCGTAGCA CTTGCTCATG CTGGTTTCAT TCACCAACCT ACGGTGCCT GT'rGCCTTAC TTGTCGGCAT ACTCTGGGrG TTCTGAAAT 0 0 TATCCGTATC ?I-rATCATCA ATGGTr.GAAG CCTTACAAAT GGTGCGOCAG CTATCTTAGG GATTCCTAAC TTTACAACTT CGCAAATGGT TTACTTCTTT GTCGTGATTA CAACCATTGC AACCTTGAAC TTCTTGCGTA GCCCAATTGG TCGTTCAACC CTCTCTGTTC GTGAAGATrGA AATCGCTGCT GAGTCAGTTG GGGrTAATAC GACTMAAATT AAAATCATCG CTTTTGTCT'r TGGTGCCATT- ACTGCAAGTA TTGCTcGGTC ACTTCACCC-A GGA'rTTATCG GGT~CTGTTGT
ACCGAAAGAT
ACTC.GMC
TCTCCAAG.AT
TACACCTTCA TCAACTCALAT ATACAGGTG CGA'rrGTC GTrGCTAGTG TGCGTATGAT CAACGTTTTG ATTA'rrGTTG 'rATTTGGTGG GGCTATTGTT CTGGGAATr'r TGAATATGCT TATTTACGCT TTGGCCTTGG TATTGGTAAT ATGGGAACTG AGCCTATCAC GTTTCTTTAA GCATTACTTG-A*TAAAACA GTTAACCAAA GATTTTCAGA CCAC.GTGG.AC TCCTTGGAAC AAAATCTAAG AAGGAGGAAC AAAACTAATG CATT'rTGGTG GTCTAACAGC TGTTGGAGAT GTrGGATTAA TCGGTCCAAA CGGAGCTGGG GTTTATGAAC CAAGCCAGGG AACAGTAACC CCTTATAAGA TTGCCrTTrT GGCACTT'rGA GATTTAACAG P'rTTAGATAA TGTT'N'GATT ?TTTACTAGrr TCTrACGCTT ACCAGCTTT GCTTTGGAAT TGI'GAAAAT CTTTGATTTA CTTTCCTACG GACAACAACG TCGTTTGGAA
GTGACTCTTG
AAAACCACCC
CTAGATGGTC
AAT7*rAACGA AGGGGAAC'rG TTTTCAACCT TTTGACCGGT ACCT7'rTGAA TGGGAAATCA CGTACTTTCC AAAATATCCG TCTCTTTAAA GCTTTTGGAA ACCATCACAA ACAGCATGTT 0 0 *0
TACAAGAGTG
GATGGTGATG
ATTGTTCGTG
AAAAAGAATT
CAGAGACTCT
CCCTTGCTAC
AAAGCC1'AAA
TGCTAAAAAT
GGAACCTAAA
ATTCTCTTCT TAGATGAACC AGCAGCAGGT ATGAACCCAC GAGTTAATTC GTCGTATCAA AGATGAGTTT AAGATTACAA ATGAATCTGG TCATGGAAGT AACAGAACGT ATCTACGTAC GCTCAAGGAA CTCCAGACGA AATT~AAGACC AATAAACGCG GGTGAAGCCT AATGTCTATG TTAAAAGTTG AA.AATCTTTC AAGCACGrrCG TGATGTAAGC TTTGAAGTTA ATGAAGGAGA CCAACGGTGC AGGTAAGACA ACTATCTTC GCACCTGTC CAGGAAAGAT TGAATTTTA GGTCAAGAAA TCCAAAAAAT CAAGTGGTCT TTCACAAGTT CCAGAAGGAC GCC-ACGTCTT AAAATCTTGA AATGGGAGCT TTCTTA.AAGA AAAATCGTGA AGAAGCTTrT CTCACCCTTT CCTCGTCTTG AAGAACGGAA TTCAGGGGG GGAACAACAA ATGCT'rGCCA TGGGACCC TT~CT'ICTTTT AGATGAACCA TCAATGGAC TTGrCCCCAAT ATATCATTCA AGATATTCAG AAGCAAGGAA CAACGGTCCT ATAAAGCACT TGCAATCTCT GACCGAGGAT ATGTACTGGA CAGGAACAGG AAAAGAACTC GCTTCATCAG AAGAAGTCAG AAAACAATCC AGTGGATTGT TTTAGTCGGC AGATGGAGAT AGGAA.ACAGC CGAATTGACT TCATGTTGAT TGAACACGAT TTGAATATGG CCGTTTAATC TTATCGAAGC TTATCTAGGA TGTGCATTAC GGTATGATCC AG'rTGTTTCC CTTATCGGTG AGGTTTGGTT CGACCAAGTT GCCAGCTCAG AAAATCGTGG TCCTGGCTTG ACTGTTATGG AGAAAATCAA GCTAACT'IZA GAACCAAGAT GCAGCCACTC CCTCATGTCA ACACCAAAAC CTTTATCCAA GAAA~rrTTG CTTGATTGAA CAAAATGCCA AACAGGGAGA ATCGTCCTAT AAAAGCATAT CTrAGGTGGCT TACGAAGTAA TCATCAATAT 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 A4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 AGTCCGGGGGW ACCTTTITAG AAAGCTTAA'r TrCTAATAAT AGCTCGAT CAGAICrCrT AATGTAAGCG T'rrGATAGAT AGAAGTCTCA TGGCAG?=AA ATAACAGTAT CTCATGCAGC ATCGAAAATG ATCAATTAGT TCTAAAGCAA C.AACTCTTTC AAAGATGTCA 'rGATTCGCGA ACTTATCTGA 'rGTTGAAAAA TACCGAGTTA TT~ACTCACCG GAAGAAGGGA TTCGTGTACG GTTTCTTTGA ?I'GTAGAAGA GATGGTAAGG TGATTATCGA GAAAAATTTG AACCAAATGC TTGTAAGAAG GGAAGCCCAA GGAAAGAAAT GATAAAATAT T'rTCAGTAAT TCGTCACAAA CTCTTTGACC TCGAAGGGCT CC'rGATTTr'r GGAACGATAA AAAAACACTT ACAATACCTT TTGGATTTTT TGGCCTGAAGA CTTGATAAGA TAATGACCAG 'rCGGTAGATT GAGATTGCAA ACAAATCTGC ATCTACATTG TGSAAAAAATC GAATGAAAAA T?1'CrTACCT TCATTCACAG ?I-rGCTAGCT TATTCATACT ?TCI'GAATT TCGAAAAAGA TTACAAAAAG ATTGTATAAT AGGGATAAGA ATAGAAAAGG AGATTTTATG ACCCGCAAGG 
'rAG=rATAT TAC1'CCAGAT
AGATTTGATG
TGGTTTGGTG
TATCTATGAG
TGTTGTCACT
TAAGATTAGT
TGACGTTTTC
CTTTGTTACA
AAATTTGAAT
AGTGCAAATC
TATTCA.AGTG
AGGCTTCTTT
GCTATAATGA
ATTGACGCAA
AGAGGAAGAG
TA'rTGCGGCC
CCATAAGATG
CGACTCAGTG
CTACGAGATG
CCATCCAGGT
TACTCGTTAT
'rGAGGCTGG'r
CAAGTCAGAA
TCGCCATACC
AGAGAGCAAG GTT-rGCACCG TCTGCCTGTT ACTGAGGGAA CCATZGCCACA AGCAAGTCCA ATGAArrATC
GTCTCAGGCT
ATTCTCCCTG
T'rCTCAATAA CACAAAAGTA ATGCTAG=C AGAACATGCA 'rCGTAGALTAA CCATCA6AGTA CAAGCCTTTC TrGAATTGC GAAGATGAAG TITGC-GTTCT ATCTCCCATA CAGTCAATAT GATGGATCAA TITGA'-rACC GAACGAAATCG CTCGCACTTC TT'ICATGAAA AGGGGA'rVAG AATAATGTAA AAAAZGAGTA ATCGTGAAAA ATTAGCCTTCT ATTGCCATCT -TGGAAAACAA CAAAAAACGT CGCAAGAATT 4860 4920 4980 5040 5100
AGGTTATGC
TOGAAAAATT
TCCGCGTAAG
AGCCTTGAAA
AGCAAAAGTC
AGCAAAAGAT
TTTATGGACA
TTCAGGGGGT
GATGACAGAA
AAATGAATTA
CGAAATTTTA
GTTAGCCGAA
5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 AACAA'rGCCA
GATATTC
TTGGAT-rACC
TCTTGGAAAT
TTCGTATGTA
AAGCAGGTGA
GAAGAGTTGC
CATCATGAAC
ACTCTACTCT
TCTCGTGGTA
GGTAATGCTA
AT'rAAGTCGG
ATGGTGT'TC
AGGATGAAGT
TGGAGCGCA
.TGTCAGAACC TTATCACCAC CTGAGCGCGCA CGACTGGGGT
AAGGCTTTAA
TAACrTTATC
ACCGCTTAGT
AGTGGAAGTG
ATTTGAAGGG
CCAATCTCA
CCTAATGCCT ATGGTCTCCT CCATTTGACT CTGCCAAACG TCTTTCACAT CTCGTAGAAGT GATGCCAGAA GAAGATGATA TCAAGATGGA TACCTTCCGT TTGGATGATA CTATTGAAGT GGAAATCCGT 750 TCAGGTGGTG CCGGTGGACA AAACGTCAAT AAGGTTTCAA AI'CCAACTG GAATTGTTGT CCAATCAACA GTAGATrCGTA CGTGCCATGA AGATGTTGCA GGCTAAGCTC TATCAAATGG GAGGTAGATT CTCTCAAAGG TGAGAAAAAG GAGATCACTT TATGTCTTCA CGCCTTATAC TATGGTAAAA GATCACCGAA CAGGTGTACG TrTAACCCAC CCCACTATGG AAATAGAGA'r AGCAAGATAA GAAGGCTGCG GGGGAAGCCA AATCCG 'ICT CrAGCTTTGA GGTTGCTCAG ATGCTTA'rCT CAAGTGCGA AATGAGAGAT GTCGTTAAAA TAGCGTTCAA CCCGGGGAAT GTAGATAAGG TTATGGATGG GOACCTAGAT ATrAGCTAAG ATAGAAAGGA ACTCACATGT AATACGACAA CGGAACAACT GCTCTACGCG TTGCTTACAT CGTAGGACCT TCAGGAGCAG GTGAAGTAAA AATCGATAAA GGAAGCCTAT GGTI'MATCr.
CAATTATTGA
GTGTTrCGIGT *9 AAAAGAAAGA TCTCCCGCTT TGTTACCAALA GAAAAC'rGTC ATCGCCGTAA TATCAAAAGA AGGTT~CGTTC TTTCCCAAAT GTGCAATTGT AAATAATCCC CGGATAATTC ATGGG-AAATr TTTTGATGGC GACTCATAAT TTGAAAATGG CCGTGTCGTT AGATTTTTTC GCCATTTATT GTAGCTGCTG TCAGTTCAGT AT'rTrCAATA CAGCGAAACT ATCCGAAAGG ATG'rGGAAGA AATAATGACT ACCACAAGGT
CTACGTCGTA
TATGAAAATA
CGAGTGATGG
GAACTCTCAG
AAAGTA'rTGA GGAAGTCAAC T'rTTATTCGT TCTCTGTATC CAGTTGCTGG TrrTAATCTG GTTAAGATCA GTGTTGGGGT TGTCT'rCCAG GATTATAAAT TTGCTTACGC TATGGAAGTA ATCGGGGAAA AAGTTTTGGA C~rGGTTGGA TTGAAGCATA GTGCCGAGC-A ACAGCGGATT GCGATTGCGC TAGCTGATGA GCCAACAGGA AATCTGGATC 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 ATGAATCTCT TGGAACGGAT AGCCAGATTG TAAATACCTT CGTGACGAAT CAAAAGGAGA TGAAGCCTTA AAAAGTTTGA CATGATTACT 'rTGACCTTGG AGCTACAGAT ATTGAAAATA TAATAGTCAG ACAATTGAAA ATATGATTCT TTGAAGAACA TAACyTACAA GGAACAACTA GCGCCACCGT GTCATTGCCA GTATGGATAC GATGATTAGT AACGAAATGG TTGGATGACA TGGCALATATT TGCATCTGTT ATGTCCGTGT AGTAGTTTAT AAGAAGGTCA AACTGTTACA TGTCTACGGT TAAAAGTGTT AGATAATGGG AGATAACTGG ATATTGTAGA GGCAAACACT TTGAAGGTGT CTCTGAGGTT CTTCATTTAT CCGTGTTTGG TCTTGAN'TC AAATACCATT TGCGCTTGGT CGGAGCTAAA ACCTTTTCAA GTAAAGAAGA ACAATATGAA AAATTAACCG AAAATCTTTG AAGGAGATGC CAATCCTCTC TATGATGCCT C-CAATGATrG TAAAAACTAT AGCCGAAGAT GCTAAAAAAA CAAGATGGCG GTGCCAATAC AGAAAGACTC TTCAAGTTAG GGACTAGGGA TTGCTGCTTT GTTAATTTTT ATCGCAGTTT CGTATTACCA TTATTTCCCG CAGTCGCGAA ATTCAAATCA AACAGTTATA TCCGTGGACC GTTCTTGTTA GAAG;GAC;CCT TTATCGGTTT ATTGGGAGCT A Al ulj. 
ATCGCACCAT CTGTTTTGGT TCGTTGGTAG GGCAAAATCT GCCCTACTAT TTGTGATrGG CGATI'CTTGA AGATTTAGGr TTTGCTACAA GAGTTTTTGA CTTTATTGTT TATCAAATTG TPTACCAATC TGTCAACAA ATCCATGATT AGTCCAGA?1' TAVTrAGTCC GIGATGATr GGTTT~rCATT GGTTCATTGG GATCAGGAAT ATCChTGCGC AAAATAGCTG CTrTTATGAG GAGATTGTAA AATCTCCTr AAAGAGATGC GCAGAAGAAA AGACTCCA AAGAAGTcc CCAGAGAAGA CTTC
INFORMATION FOR SEQ ID NO: 99:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19718 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:
TGTCGCGTCA AAATCATTAC TATGGCTATG TATACCCCTT ACTATGACTT GGCTAAACAC GTTCGCTrTC AAATTTCTAG GCTCAGGCTG AAACAGTCTC CCAGGCTGTT CACTCCCGAA TGCTAAAATC GTTCTTGATC GCTTTCACAT TGTACAACAT CTTAGCCGTG C'rATGAGTCC TGTGCATGTC CAAATCATGA ATCAGTTTCA TCGAAAATCC CATGAATACA AGGCTATCAA 8400 8460 8520 8580 8640 8654 120 180 240 GCGCTACTGG AAACTCATTC CCCTACTTTT CGCATGCACT AGAAGACTTG AAACACCACT AGACCCTGAG AAATTTTTCG TCAGACTGTC 'rrTAAAACCT ACACTATTCT AATGCCAAAC TGCCTTTGGT TTCGAAACT CAAAAAAGAA AGGACGAAAT TTGACAAAGA GCCTAATTTC TAGTCCTTTT TGATAACGTG AACAGGATAG CCGTAAACTG TAACAAATAA AGAAATTCTT ATCAGATCTA TCAACTCTTA GACTCATTGA GGACAATCTG TTCTCAAAGA TAAAGAAAAG TGGAAGCGAC CAATAATCTC
AGTGATAAGC
GACAAGATTT
CTTTTTCACT
AAGCACGTTC
GATTTTATCG
TAAGCTATTC
TTCAGAACAA
ATCCTCTTTT
ATTATCAACC CCCTTCAACT ATCAAACTTA TCAAGCGCAA TTGAAAACTT CAAAAAACGG ATTTTTATCG TT-GTCCTT'rC TCGAGCTTAG CTGACTTCAA CATAAAAATT GACATGGAAA TTATAAAACC CCAATTCGGC TTGGTTCGCC CAAACATAGT
CTTTGAACAT
CCCACTACAG
ATTACTAGTT
GACCTGGACG
CGTAAACCTT
GATTTCTACC ATAGATGGCT TATCAGTCTC ATAGTCGTGT TGACTTGGAT CAAGACCTTC TTACGTTCCA AGATTGGATC AGTATATGGG TGAATTGGAT TGTTAAACAA TGGGATTGGT ACCGCTGAAA GCAAGGCTTG L-CTTCTGTT TCTGCAACCT CTACAATAAC 960 1020 ACCCTTGTAA ATAAC~rGCGA TACGATCTGA AATAAAGCGA ACAACCGACA AGTCATGGGC GATGAAGAGA TAGGTCAGGC CGAGCTCTr GGCACGTACA GAAACGTCCA AGGCTGAAAT CATGACCAAG GCACGGGCAA TAjCCGATACG.
TTGGAATTrT TTGAGCAAGT TCAAGACTTG TGGCTCATCT GCAATAACAA AGTCTGGTTG ?'GAcG.TTGA cmccfrAG ATTcATGAGG GTAACGAGTC AACTGCTCAG CAAGAAGACC TACI'CACGG ATAATATTT TTTACGTTCT TCTTCATCCT TAAATAAMCG CAT'rGTAA AGACCTTCAG
ATCAACAGTC
ACGAA'rCAAT
AATGATATCT
CCCACTACCG
ATTTTTAACC
GCACGTCA'r TrCCGCACcT CCA'rTACTTG
GACTCACCTA
GCGACAAACT
TCAAACTTGC
GTTCACGCGA
TATCA'N'TAG
GGCAGGGTCT TGGAAAATCA TTTCTACCA TTAATCTT ACCGATGATA GCACGACCAA GAAC?1'TCTC
AAATAATATA
TCTGGATTCG
GACCATCAAA
TAGTTGTTTT
CAAGCGAGAA AGTTTCTCCC TrcGrGATAA AGAAGTTAGC TT~AC'TCC TTCACCGAAG GAAATTT'C'rA AATCTTTGAT
TTCTACTAAM TTTTCAGACA 7wrTCACGGAT CI'ATCATGG CCTCATGAAG AAGCCAAGTT TTTGTTCGAA GTCAATCTGC GGTCAGTATA AAGTGACGGA
TTTCCTTCCT
ACATTTGCAA
TTAGCCCAAT
ATTGCGTAGT
GGTGTTCCTG
CCTAGTCAGC CAGATGGGCA TCACAGC'rGG TTTTCTACT GTGTCTCTGA TACTGAGAAT CAGAACGCAA GGCAAAAGCA GGATTGAGTA AAGATCCCCT ATGTATATGG ATGGCGAGGG
AATCCCATTT
TTCGGAGCAT
'rGAGGAGCTT
TCCCCTTTCA
'rTATCATCAG
TCATAGAAGA
1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 CAAGCTGAGG CAAGCTAGAC AAGAGACTCC CTTCCTCAAC CGTTCCATAC TCAACGATTT CTCCTCCATA CAATACTTGC CACCACACCA AGGTCGTGGG TAATAAAGAT TTTGTAAAGA TTTTAGCAAA TCAATAATCT GAGCTTGAAT TTGGC'rCATC ACAGATCAAG ACATCAGGTC GGCAGGCAAG CATAACCGCT ACCTTATCCG TGTTGTGAAA TGATACTCGT AGTTACATCC AAGGCAGTTG GGCAATAGCA ATAACGATAC boo* 4 GTTGACGCAT TCCTCCAGAA TATTGGAATC GGTATTCATT AAAACGTCTA TCTGCGTCTG GAATGCCAAC CTATTCATG TAGTCAATGG CCAATTCTTT CGCTTCTTTA GCTGTTTNTC CTTGGTGtTT TACAATAACT. TCTGTAATCT GACTACCAAT TGTTTTAATG GGGTCCAAAC TAGTCAI-rGG GTCCTGGAAG ATACTCGCAA TCTTAGCACC ACGAATTTGT TCCCAATCCT TGTGAGAAGA TAAAGCTGTC AAGTCCTGAC CACGGTAGTC AATACTACCI' TGGGCAATAC CACCATTr1'C T'rCGAGCATA CCTGTGAACC TCTGTCAA AACAGATTTA CCTGATCCTG ACTCACCTAC CAAGGCTAA'r ACTTCTCCTT CGACTAGTTC A.AGGGAA.ACG CCGCGAATGG CTGTCAATAC TTTGTCACGA ACGTCAAATT CCACGACA.AT ATCGCGAGCA GTCAAAATTA CATTrTTTTC TTTTGTCATT TCTACTCCTA TCTATGTGTA CGTGGATCAC TAGCATCCGC TAAGTT'GA CCAAC=ACGA AAAGGACAA CCAGAACAAG TAAGCATTG0 TTGTTACGr ACTrGGCACT GTAATCGGTA ATCCAAGACC AAAGCTTGGA AGCA'ITITGAG TCATGGTTGT ATTTTGGCA ACAATCTTCA AGGTTGGTGT CAAGTCACGA TAGCGCAAGA ?rrGCACACG GGATACCAAG ACAACGG TCAATGGAAT 'rTG'IGAATAA TCCG;AATCA AACGACCCAA C-AAGAAAGAC AA~uAAGGCTT CGTATGAGAT CACAATAACA GATACCAATT GAGGCATGAT TCCCAAAGTA CGTGACGCCA AGTTGTATTC GATCATGAAG GCAATACCAA TCCATGTTGT TCCAGCTCCO ATrGAGTAAG TCAAGACAA'r GACGTTGTAA ACTTCCATCA TGACACGGTC GACAAAAACA CCGATAACCA AGTTAATCAC
TACGCTCATG
AACAA'rCAAA
AACTGATTTT
GCAAAAATCA CATTCCAGAA AGAGGTGGGA TGTIGAGAT GAAATACCCC AAATACCACC TGTCGCAATC ACAGAAATGA GGATGGAGTr ACGAGCTCCC CGATTTACCG TTACTGTCAG TACCGAACCA ATGCTCCGCA ACTAAAGTCG T'rrACCTTGC TGACATCATT GAAATCAAAC GAAACTTATC AAAATGATGG CTACCAAGAT TCCCAALCATG CATAAATTGT TTAAACACTG ATT-TCCAGTA AGAATATGCX' GGCAAAATCG TCACGT'rTrA CAAACTGAAA TTC11TA CCTCCTTTCT CAGTCAATTT AATACGTGGG TCAATAATAG AGACGTGAGA AGATACAAAT ACATGTAAAG ATGAAGACAA TTAGATGCTT TrrACAGAGTC AATCAACATT TTACCCATAC TCAGTAAGGG TTGCACCACC GATAACCCCA ATAATGGCAG GGAACCATGG CATTrTTTAAA GATGTGTTTG TTTGAAATTT GCACGAGCGA AACGAACAAA GTCTTGAGAT TGCAAGTCAA ATGGCTGTAC CAGGAGCACC CAACAAACCA AGGATGACTG CALATCTCCAG CTCCCAAGAT AGGGAATGAA TCTGGAAGGG CAACGATGT AAACCAAGGC AATCGTTGGA AGAGCAAGCA GAi3AGGCTAT CPLATCCAAGT GT'rCTTGAAA CGAGCCATGG AGAGCATAGG CAAGAACCAA ACCAATCAAA CCAGTAATAG; GGATATTGGT AATTACTTTC AGTCGCTGTA TXAGGATCAT TCACGAGAGT CAGCCTGACT AGCTGACTTG TAGGTTCT GTTTTCTTAC CTGTTGGGAA CTCAACTTGGCGCAGT'rTTGG A.ACCAGACAC CGTCAAAGAG TTTGGCTTGA TATAACCAAC TTAC-NAAACA TTGGGTAGAT ALCTACAGTTG ATTTTTTCT-r GGCGCATCAA TAGTTTCAGA TCGATTGTAG ACATTATTTG TCATCCAAAT ATCTCCCAAA GACCAACGAC CATAGAGTTA CTGGAAGGC GAAGACTGTT CAGGAATTCC TGAAACCAC CTTTTTCAGA CAAACCTTTT TCATGTAACG ACGA.ATCCAA CrGGTAAAAC GTAAGAACGC CAATAGATGA TCCAATCAAT A-A.AGGTCAA AGCCCCTGTT CTGAACCAAG TGGCACGGCA CACAGCTGAC AATCATAGAT 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080- 4140 4200 4260 4320 4380 4440 4500 4560
CTTTCCCATA
AGTAAATATT
1TT-rGTCCTTG
GCTAGCTACT
TACAGAAGAC
ACCTTGAGTA
A'rAACCTGAA
ACAAATTTT
GrccTGAAC
TTCAAGTCTG
GCATAGAAGT
Cr'IPCAGTAA 754 GAACTGGTGT ATTAGCATAG G~rGGG'rAAG AGTCACCTAA ATTCAAGTTC GATGAACAAA TGMMACrGA CTGT 'AAAGT ACAAGAGATA TrTATGTTTA CGACCAATGA CCATCCGATA GCTGGATCAT TTTCAAAACG AAGMG'ACGT GATN'TCAGG GTCTTGGA'rr TTATTTGTAT GGTCAATGTC AATCAAGTTA GAAAAAC.ACG ?I'CAAAAATT GGAATITCAC GACTAGCATA GAAI'GACCA ATTCTCCCAA AGTCCAACCA TGACCTAATT GATTGATGTA C7=rCATAA ATAGCTTTAT TGGTCGCATT TGCTTCTACT GTTACAGAAG AATCCATGCT 'rCTTGCALACT C~rrAGTATC GTAATACTCA ATGTAGCCCA TACGCTCAAA ACTTGCCr
CACAGTATTT
ATCCTGCTTG
CGT'rACTAAG
ATTATTTCCT
TCATAGTTAT
AAAATCAATT
AAAATCGAAA
TTAAAAATCC
'IrCACATGAC CATrTTTrCAC
TATTTGAACT
ACAATTCGTG
CTATCTGTCA
CACGTTTATC ACCCGTTGTC GCAATTTTAT TTCGAGGAAC CAAGGTATAG ATAATCGTGT CCAATGACCG CAAA.ACACCC ATAAAAATAT CAAAAGAACC TTCTCCTCAT GGAGAGAAAG ~TTCCAATTC TTTTTGAGCT TTCTCAT'N'G GAGCTTrC ATACTCTTCC TTAGTCACCA AAACATCTGA CCCCI'AGAG CCTGTTTGCG AAAGCACTGG TGCTGCACCA GAAGAAGCCA ACCATGCTTG AGCCGCTGCA TATTTTTCAT GTCTCTCTGG CAGCTTCATC AAC'rAA=rA TCGTATTCTT
TATAGTTAGG
AGGTCAAAGT
ATTrTTTTCAT GAAGGGCTAT TTGGATTATC AAATCCTAAA TTTAAAATAT CCAGGTAAGT AGATGGGTCT GATACATCCC AATCC'rCAGA TGAAGCATTG TCACTTGTCA TTTGTTGAAT ATCAACAACG GATTGTTTAA AGGACTGAAT ACGAGATATG TCCAGA'rGAA TAGGAAACTG AACGCCGTCT TCTGCCTTGG CC'rTGTCAGC ATTGAATAAA -TT'CCACTCAT CACCATAAGC AGGAAGTTCA CCAGCTGAAA CAAAGTCTGG TTTTACAAAT 'rTACCATTGA TTTGAGCTGA GTAAGCTGAG AAATCTTTGT TAAGCAA'rGC CT'rCTTAGTA GTATAGTTGT AACTTTGGCG ATCAATATIC
TATGTTTTTG
'rGATAGTCTG
GCAGCATAGT
ACATTT'TCAA
'rAGTTTTTG
GCTTCTAAAG
CCATCCTGCC
GCAGCGACTA
AAATTACGAA
CGATCAAGAG
GCTACT'rTCT
ACACCCAGAC
TTCTATTAGA AATT-AT'rTAC ATTCAGCTTT TTCTT'rCAAC CrTTTATCT'rG TGATTTCAAA CAGAAGCTCC AGTAAATGGA TAGCAGGAAT AAAGAGTGAA AACGGACATT CAAGTCGCT TCAAACCAAC TTGAACTACT TAGTTTCACT GCTAGTTGTT GCCCCCATGA AACTCCTCCT AAGTAATATT AAGGAATTCA CACCAAGAAC TGTTTCTACA ATGCTTGGTC TACTGGAACG CTTTCTTAGC TTTCGCAAAC CATCAGCTAA AT'rCACACCT AATCACCAAA GGTC'TTCTCA CTGCTAAAGC TGCTCCATCT CAAAATTCAA GGCTTGACGG CTCAATCTGT AGTTTTAGAA CAGCAATCCC AGAGCCTGAT 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360
TGTGTGTAAT
GGAAAGAC
TCTGATCCAT
TA71r=CCAT GGACCA'rrAT
GTTTCGAATT
AGATATTGTC CTTGTATTCr TCGCAAC GGGcATAAcT ATAAGCTCCA CrAZMGAG CATAGTAA~c TAGATTGATA GTATCTAGGT TTTrACAAA
AAAGCAAGGA
CTTCATTCAG
GGTrCAGGCT GGTTCAAAGTI GTTGAAAAAT C'TGTTGAAGT GCTAAATACA TAGCTTCTGA TTAGCCGTCA CCTCTGCATA ATCTTATAAG TGTAGGTCAA GCAAGATTAC CGTAATTATC GTTGTACTAT TTTTACTTGA TAGCCATAAG CTT'rAGCGGC ACACCTGCTG CTA.ATAAAAC TACTCCTCI'G TTTATGTGAA ATCAAACAAA TT'rTCAGAAT ATATGATTCA AATTG'TCGTT CAGAAACTTT GGAGTTTAGG TTTTGGAA'rT CAAATTAAAT TCAGTTTACT ATGTCTTTTC CATATATAGA 'rTAAGACTAT
GTTAGTGAAT
AATCAGGTAG
TGATGAATCA
AAGACCTGCT
TTATAGATTG
ATTTAGGC?1'
CGAAGTGTCA
AAGACATACA
TATAACAATA
ACACCAACCT
CTTTTATACT
CTATTCATAG
CTCTACAGAA GATTTTGCAG TGTCGGATCT GTTGGTr'rAG AGGCCAGAAA ATAG.AATAGG GTATTGTAAC GTATAATCAT TCCTGATAGA TAATCTGCCA TTTTTTATCI' GCTGCGTGTT TTCTTCTCCA TCAGAGGTAA ACCATCCTTA GAGACTTCCC
AAACCATCAA
TCCAAGGTTT
GATGATTTTGM
GTAGCAAATA
ACAACCATTA
GTTGGCACAA
AAGACTACAG
GTAAAA'rGAA
TTGTAGAAGT
TATCCCGAAT
TTAA.AATTTC
CCTACCATTT
TAGAATAGTT GGAGCTGGTA TACGCTCrAG CGACTCC1'GA GGACATTTTC TTTATCCCAA TCAACCCTTT CAACAAGAAT CAAAATCGCT TCCTTTTGA'r TCAACTTAGA G~rCCAGAAC CAACCGCCTT GACACCAACT AGCC~rTTAAC CCAA'TTTCA TTAAACCG -r CACG.AAATCT ACCA'TT1AAC CCCTTTACGA AATCCTCTCC AACTGCACCA TCCCATTTGA AGTCAC1'ACT CTGGGTCTGC TGTATAAACA AAGAACTGCA TGCTGCAAGT CACGATTrTTT TTTCATTTTC TATCACATTA TCCATTAAAA ATTIrCATT TTTTTTGAAT TGAAAATAGG AAATTTGACG ATACGGACGG AACAATGTGA ATCATTCTAG TATTCAAGAT TCATTrACTT TTGTGATTTA TCGCTACTT ATCCACTATA CACTATTGCT TTCTCTGACA AGACCATTNT AATCCGCTAC GGTAGI'GGAG CCCTACAGAG AATC'rCCTAC CCAAGGTAGA CCAACATTCG ATGCTCCAAT GAATCCAGAA GAGAGCCAAG GATAAAGGCC GATAATCGGA 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100
TGCTCCTCGC
ATTCTTATTT
CTCACGATAG
CACTTACACC
AGATTTCAAA
GGAATACTGC
TCAAAAATTC
TATCACGTTT
CCTGCGTCAG
TCAGGCTTrG
TCAGATGCAC
ACTTAAAACG ATCTATCCCC GGAGCGCTAT TGTATTCACC GACATGCCCA TCGTATAAAA CT'rATAAAAC TTAATCCGTC ATGTCCCATA ACATAACTAG CALAGAAAATA AAGCCTGACT CGTGCACAGC AACCACTGTA AGGAAAGATA 756 CGTTTCCCCG ACTCCTGACT CATATCCA'rC ATCAAGCGAA CAGGAGCAAC AGAAGACAA.A 8160 ACTAATAAAA TALCTCCCCAC AATTCCGTAA CTCAGAATCG TATCAATATA AAGACTG'rGG 8220 G CATG7TCAT GATAAGGAGC ATGTATCCGA GGATAAGAGT TCATATAGGT CAATGGCCCT 8280 TCACCCCAALA AAGGATTTTG CTTAAACAAG GCCATCCCAG CATCCCAGAT AGAAATGCGT 8340 TCTTCCATAG AAGAGTCTAA AGTACCCATT CGAACTCCCA AATCACTAGA AAAGAGGAAA 8400 CTCAAACCAA TCGCGAAGAC CCCAATACTA AGCCAAAAGG CCNrCCAGTT TTAATAGTC 8460 GTAAAGAGAT AGATAATTGC TCCAGCGATA ATAGCAGGAA AGGCAGTTCG ATTTGAGTA 8520 AAGTTCAAAC CAAAGACATT AACAAAGCCT GCAATCACAC AGAATACTTT CAACCAATTC 8580 AACTTGGTCG TTGTAAACAG ATAGAAAGCA ATCATAATAC AGAAACAACA AATAATTCCA 8640 TAATAATTAG GATTAAAGAA GGTCACTTCT GCCCGGTTCT GATGCCACAC CTGCATATTG 8700 GGTGAAAGAA AAGCATAGTT AAATTTCTTC ACAATTTGGA AATG=rCTAA ACTGGCAAAA 8760 GCAGCTGACA AGACACTACC AAACAAGACA AACTGCAAAA TCAATCG-AAA GAATTTATGG 8820 GATAAAATCG ACTGATAGTG CAAAAAGAAA ATAGTAAATA GAAACATTCC TACTGAAGCC 8880 .ACAAGACCCA TCCAATTTTG TGCAAGAATG GATATAACAG TACTATAGCT AAGAAAAAGA 8940 :*.AGCAGCATCG GATGCTCCCC CATTTTCTGA AGAATACTTT 'rCATGTCTCC TGTAAAAATC 9000 :.*AAACTGATAA TATATAAACA GAGTACAACT ACAAAAAGAT AAAAGGGTAA AAAGA'rACTC 9060 V00AGGATAATTC CCAATAAAA'r CAGCTCTTTA CTAGACAACC CCTTCAGCIT TTCAA'rAAAG 9120 CCTAT-rGATT TCAAAATGAA TCCTTTCTCT CCAAATCAGC 'rGATTCAGAT AATAGTAAGC 9180 TATCCTA'rAT TGTACCACTT TTTTAGCAA'r TTGAAAACAA AGGAAACGTT TTCCAAAATA 9240 *AAAACCCTAT TTTATCCACC ATATCAAGGC TTCAAAATGA TACT'rCAACT CCATTC'rCAA 9300 oosTTACCCGATA ACTCTGATTT TGCAAATCAA TTTCTACTAC TGCTGTTACC GACT'rATCTT 9360 TAT'N'TGACG TTTGATTACA ATGCTGTGAG CTGTTGGTGT CTCTATCTCA GTAGTCCCTT 9420 0 CTAGATCAAA GGCTTCTGAA CGGTTACGGA AAGAAAATAG ATTGAGAAGG GCCTTCACPA 9480 o.o.o CACGTCGTTG CACT-rC'TT GCTAT'rTCCT CG~rGCTATA GTAATGACGA TTAATAT'rrC 9540 GACCTTCTTT AGTTTCTTCT AATAATTTCA AGTCATTCTT GCCTGCTAAT AGACCCACAT 9600 AGTAAATCTG 
AGGAATACCT GGGGCAAAAG CTTGAATTAG ACGAGCGAGA AAATACTTGA 9660 *.*CATCATCATC TCCAAGCGCT GAATAGTACG TTGAATTGAT TTGGTAGATA TCTAAGTTGT 9720 TATACTCGGC ACTAGAGTAC TTACGTTTGA CATTGGCTCC AACCTTATAG AGTTCATTTG 9780 AAGCATAGTC AATCTCCTCA TCGGTCAGGA TATCCr'rGAC ATCTACTACT CCAATCCCAT 9840 CATGGGTATC TAGCGTCGTA AATTGCTTCA TCGGGCTCAT CTTTAACCAC TTAGCCAAAC 9900 GC 17 TX' GGAAcTGTAA AG AGTATAAA GTGTCACCAT CATAGTAATC ATGGTCTGCr ATrTAA.PCT GAATCGAATA AAAGCTCTGT CCCA'rACTCA GCAGCGATAT CTCGAACTTT TGGAAGAGCA AAATCATAAA GTG'TTCATGA ATCTCAGGTA CTGGTTCCAC AAAGAAATCA TTAGTATCCA GAcGAATcAA ATcAcAcccA TTAcTTrccA 'rAGTTACTTC TTTGG'rCACA TCAAGATCAA ATTCTrCAC AGTrGCTGAA'r
GTCCAATAAA
TCGCATAACCA
GGCTCTTACGG
GTTCCACTGA ACCA'rCTTCA AAATTAAATC TACATCAGAC AAAAGAGAGC TTTAAATT'CA ATTGACGAGA AATATGATTA GCTTCACATC CTCCCAATCA
ATCCACGATC
CAAAATCCTC
AGGTAATCAA
CCAAGTCTCA
AAACTCGTAC
ATTATCAACA
TATACTCTTC
AACTGTTGAT
TTCCAAA'IrA
CATGGTTTTA
TATGATCTGG
TAATATAGAC
AGCTAAACTC
GAAAATCTCT
AACACAATCT
TGTGTCGGAC
CTGGCTTCAT
ATCATAAAAT
CCAAAAGC1TG
GGGAAAAATG
TCATATAAGT
TT'rTGAATTG
CTTCATAAAT
TGTGATAAAC
TTCCTCTGTG
TCAPLACCACG
ACAACCTCAA
TTCATTGAGT
AAGAGATAGA
TAGTCGTTGA
TCTGCTCC1'C ACCAAAGGTA CTTr.CTTTGG TGCACGATCC GGTTTTCTGG CCAAAACTTA GTT?1TrCTTG ATAGTCCTTA CAAACATAAG ATAATATTC
TCCCAAATAT
AAGGCATCTA
ATAAATMCA
TTCCACAAAT
T1'ACGCTTGT
TCCCAGTTTA
TAATACT'rGG
TCACCTAAAC
AGTCCACTT-C GTCG;TACrA ACTGGCGCAA GTAAAAGGTG AACTCCTCCA ATAGCA'rCTC CTTI'AAGATT ATTTCCAAGG CTATCAGAAT GCATCATTAC TCTCCtr=r CTAATTGAAG AAAATTCATT TTA-AATCTCT ATTTATCATC AAAGTACTAC T-rTCTT-GTT-r TCTGCATAGA TCAAAGACTA TAGATTCCAT GAGCTCT-rCT TCAGCTTCAC Cr-rGCCGTAG GTATG=AC AACAGTGTTT TGAGCA.ACCT GCGGCTAGCT ATTACTTCAC TGCCCCGTTG CTCATTCc-TG CAATGGTGAT ACTCATAATG CCGACCACGT AATATTGGCC TGCGTAGTTG TATTGGAACA TCAAGACAAG CAGTGGCXAC ATGAAGTCAT 9960 10020 10060 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 TGACTTCGTC AGTTTCATCC TCCTAGTTTG CTCTTTGATT AAATGATATG GCGTTGGAAG AACGGCAAA GCTTGGTCCG AAGGCAG.AGT CCACATTTTG TCCAGAACCA AAGGGCAT'rG AqATGATG~CG GAAATAGGTT GACTTTCTGG AATCGAGATT CATAGGTCAA GTAGAGCA.AG TAACCGTAAT CGGAATCATG TATACATGAT GGTAAAGGCT rArc~cC~r ATGATCATGG TTGTCGCATG GTAAAT'TGA'r TAGCCCCATC 'rrGATATAGC CAACATAGAG ATCAAACCAA AGGTATTAGC ATGACTTGGA AAGGTACGAA TTTCTTTTAC TCATATTCC CATCGGTTTC ATCA'rrGGGA GATCTCTGCT GCTTCA'rCCA AAAGAGGGTC TGTGGAATCG CAAACCGAGT TTACTCATICA GATTCCGAGG ATTAAGAGGG AGCGATGGAG TACGCTGCCA
5 TAGGGATAAA GATCAWrACT AATAGCCTCC AATCCCATCA GAAAGCCAAA GAAATTATCT CAAGGAGCGG, CACTAAAATC GGGCT'rCT TTCATCTrGT ACTCTCAATT GGATGATCGA TTGGCCATAAC CGAATTGr= GTTG'rGGCAT TGTTTGGACC CCACCT'rrTA GGGCTAGGAT ATGTTCCAGA AAACTTGCT'r GTrGGAATAG ATTGCAAACC AGAAGGACAA AGACAGCCGC AAAAATTCAA TATGAAGGC AAGATCAAAG CCACTGTCAA AAGGTTTGGC CTTTGATTr'r TCACCAACCA CCATGGCAAT GGA'rCCATGA AGAGGAGCTT CCTGTCC-AGT TGGTAAAACT GCTTGTAACA AGAGGGGGAT TTCATAGTC'r CTCTACTCCT ATCA'ITGACC ATGCCTTG'TT GTCTGCTTCA CTGGTCCAGT TTCGGTCATA CCAGCAAGCG TGGAGATCCG TCCACATCGT AAAGGCATTG GCT'rCTTTTG CCCACCAACG GTTAAGCTTT AAAGTTCGGT TTTrGTTCAT GACATCCCCA CGTGCGAAGG TTGCTTAGAT CCATTGA'rGC ATCCGACAAT TTAATGGCAT 758 GCAAGTAAAG ACAAGACAGT GATGACGACA GCTAAGAGAC GCcTAAAGTT GTCCCATGTG ACAATATCCT TAGTGGGTTT GAAGGAACTA AGAACCGATC CTAGAATCAA TAGAATGTAT TTCATCATGC TTCTCCTCTT AAATI'CAAA AATCACrACA ATTAAGAAGA ACAAGATTAC GTTrAAAG GCATATTAT AAACCAAGAG ACr-ACCGGTC ATGGC-AAAGA CTTGGTCAAA AAAGACCATA GAGACACTTG GTAGCAAGTA AGGCALATTCA GCITAGTCGCA CCATCAATCC TTGCTCCCTC TGTAATCTCA AGCTAGGAAG ATGATGATGG GCA'rAGCCAC CCCTGCCAA AAAGATTGCT CCCCACTTAG TCCCrAAAAG ACTGGTrTGG ATI'TCCAATC GCTGGAAGAC CGTAGTTGAA GACT'rGCTTG ACCAGATAAA ACAGCTGGGA AGAAGAACCA AGCACGGAAG AGAAT"TCAAG ACACGCGCAA TGAAGATCCC GAGTGCAATC CGCAATGATT GCGGTAAAGC CAATCGCATT CATGAATTTT AAAGTTGTTT AAGCCAACAA ATTTGTAGTT ATAAGTCAAT GTAAAAGGCT CCTTGAAACA TCGGCACATA GAAGAAAAT'r GACCACAAAA GCCCATGCCC AATATNTTTTG TAATACTTT'r AATCCACATC CGCTTTCATC GGGTTAAAGA AGGCATTCAA TATCACCGGT CAAGACATAG TrCATGGTCA AGGTATGGAA ATTGTTGCAA CCAGACCAAG TGACGATCCG TAAAGGCATA GTGAATCTrC TCCTGCTTGT TTGACCCCTT CGATCGCTGT AGTATIwTTTG CA'rGACT'rCT GGACGGGTCA TATATTCCAC GA'rGTT'rGGT GGTGGCTGAG ATAGACCATG CCAAGTCTCC GTCCTTTTTC 'TTrTCCTGGA ATCATGAAGG TCCCAATCTT TAATCGCTGT GATCGCCCAA GACCCATTTG GTGTCATGAG CTCCGATAAC ATCGGTATAG CCAGCACCTT CCCAGTTCTT GAAGGATGT'C CATGACCTTG ATATCATCTT 'rCATAATCGG TTGGTTGAGA ATAACGAAGG TATTGATTTG CTTCTTTTCC
GAGTTCCAAT
AAGTrGGTTG AAGAGGGrAG
TTGCCAATCA
?rTTCITAGAT
GGCAATGGCA
CCCAAGTGAG
GGCAGTCAGC
11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12 660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 11 J )7uJ AtJ 1-1- TCCACG'? GCTGTCGCAA AGGCTAATTG ACCTGCAATT CCAAATGGTG TTTGTCCTTT TTCATCCCAG GTTTCAGGAA CCTTCAACC AATTCCA'rAA GCAT'rACC 7rCAGCGTAG CCATTTTTCA
ACCTGCTITT
CACATCTCCT
TTTGACCTTG
GGTCATTTCT
ATAGTTGGAG
TTTTTTATAC
CACCCCAAAA
TAAAAATATA
GCCCATTC?1'
GCGAGAACGC
ATCTTAGGGT
TTTTTCTGGT
CAAGCGCCGA
CATTCCATTA
GTTAGACAGA
CTGTCTACTC
TAAAAGGAAC
CGCGTTTCAG
GCAGTTCGAT
GTGTCTTCAA
TTTCCTTCTC
TGAAATACrC
GCCCAAACAA
GAA.AGCCTCC
ATAAATCTAA
AAAAAATCTC
AT1TGTAACCA TTGAGTGTCC AAGCATCTGc AGCAACGA'rA TCTTTGACTA ACTGTTCAAA CAGTTCTTCG AATTrATCrr TGTTGTAG7TA G=IG'AAACTrTTrCGTTrA CAGCATAN-r GTAGTCTTTG TTGCTCAAAT CTTCAAAAAC GGACTGTGGG TAAATATTGA CCACATCAGG 'rACTTCACCA GC-ATTTGGTA CATTGACGAC AAAATrCACGA GTGATTTCTT CCAAGGTTTT GATGGTCACT GTGCCATCCG CAGA'N'rACC AGCrAAACCT GTAGTTGCAA GAAGTCCGAT 'rrrATAAATT TATACACCCT TATTGAACTG CTTTGGGGT CAGTACATAT CATAGTTTTC CTTGGGATAA GA'rAACAGTT AAGCCCGCAT CATT'rTCCrG TAA'rrATAT AGTCCCTCTT CCATGG'rC'C TACAACAGAT AAAACGCGAA TAAAT'TGTAC AGCTGCTTCA TTGGATACAG CTAACTGAAC TACTGGTCGT AA'N'CTTTAT TCTCTI'CATC TGATAAATTT GTCAAATCAA CAAGGCCACG 'rGTTTC'rAAT GGTGTCATTC CATGAGCCCC CATAGAAATG GTTGGATAGA G'rGCAATGGC ATCAGTATTA TCACTAGCCC GATCATTITCG TCCACCACCA CCAGAGCAGG TCAGATAAGA AACGAGTTCA TAAAGCCCCA CTAGATAAGT TAATCCATTC CC'rAGCT'rAG AATCAATATC ATGATAAAAT AGGAGTrGAT GAGGATTGGC AAGATTAAGT ACTAATTGAT ACATTAGTTC TGCACCTGAG TAALACTTCGC CATCCAAATC TTTTAATTTT AAAGrTGTTT CGTAGGTTAC AATCGTT'rGA TrTCCGTAAT TATCAGGATT' AATTAGTCTA TACTGCTGTC 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 0 0 *000 *0 0 0 0 ACAAGT'rCAC
GTTCATAGCC
GTCCCATCTC
GATAGGATGA
AGACTTGTGG
ACTCAAAGAG
GCATGTACTG
TGA'rAT'rGCG CTAAGACACr
TCCGAGAATA
CTGATTAGCA ATCGTAGCTT CAAATTTCCC ATCATTGCTA ATGATTCGGT ACTCCTGACA ACCGTATTGA ATTGGTAAAC GAAATAGCGC ATCATACCAA AATATGGCTG TGCTTCTCTG ATGAGATTGC ATCTGTGTCT
GTTCATATCC
TTTCAAGTAT
AGTATGCTCA
CATT'rAATGT 'rCTACTACCT
TAGCCACAA
ACAAATCACT ATCTACAGAA ATCA'rTTCGG 'N'TCATGGAT ACCGAATC AGACTTTCTA CCTGAATAGC CCAGTCAGGA GTTCTAACCA AAGTCCAAAC GACTTCCACC CAGTTTTTCC
TGTTGACCAT
TGCAAACCTC
TCATTAACAA
CCCAATCACC TAAAGCACGA TTATCATCAA AACGAT1'GCC AAACCAACCA TCATCTAATA CI.AAAAGTTC AATGCCAACT TTCTTAGCTT CATCTGCTAA CTCTAACAGT ?I'TTCTCTCT GAAAGTCAAA GTAAGTAGCT TCCCAGTTAT TGATTAGAAT TGGACGTTCr TTTTAGAAA ATTCACTTAG CATAATGTGC TTCAGTACAA AATTCTGACT TI'CATGACTA ATACCAGTTA ATCCCTGATC TGAATGAGTC ACTAAAGCTA CCGGTGTTTC-AAAATTCC TCAGGAGCTA ACTTCCAAGA AAAGTTTTCT GGATTAATGC
TTTTTTGAAC
CACCATCCTC
AAGCACCTCG
TCTTTTCACG
CAGCCATAAA
TACTGTAGCT
GT'rGAGCCTT
GAGAAGGTAA
AGTCTGTTAC
AAAAGCTTCA AAGTTGCCAC TGTGACTCCT TGTTCGCATA GTTTGAACTA ATCGAAAAGA AGCATAAGCA CCCTGCAGAG AGAAAAATCT TTATGGATGA AGCAATAGTC GCATCATTAT AGAATCTTCT AACATTAAGA CAATAGCCAC CCGAACTTCA TTCAATTGAT TATACATTAG TTGAATAGCA AACACATTCC GTAGAAGAGC TGGTGTTGA GCATGACCAG TTCCTrGTTC rTACTA'rTTC
CAA=TCCTO
TACCTGTTGA CGTCTAACAG GTAATCTGCA GCTGGAAAAT ATTACTATTA TTATCTAATT TAAAACTAGT ATAATACAAA CAAGAGTCTC TGTATCGTCC
GCCCTGTGGA
T'rCAGTTACA AGCCATGTTG TCCAAAAATC TrGGXTTTCC TGAAAAGGCA TAATAGTCTT TCCTAAATGT TTAGAT'rTTT ACTCTCAACA CCTCATCACT TTAT'rGATTA TCTCAAGAAT ATGGTAAAAT CAGACTGGAA CAATCGACCT TACTCTTrTG GTCCAGCCAT GGAAAATTr'C ATTACAAGGG AAACCAGAGG AACTAACCTT TTAGGAATCA CTGGAGGGAA TCCTATCTCA TCCAATCTGA ATTGTCCGCT TCGCTCAGAT CAACTTCATG AACTGATGTT CCATTCTGAC C7MTAAAAT CTATGCTGAA CCTGTATGGT TGTCGCTGAG TATCTAAACT TGGTCTCGTT CATAAACACT TTCAAAAGTA AGTAGCCATT TAAAATAGAT TATTCTCTAT TA'rTTTATCA CCTGAAATCG GTTAGGTAGG AGGTAGCACA TGCCCTAAGC TrTTATGGAT TCG'rGATACA TACGTTCTAC TAAAATTGTT GATTTAAAAG TTATCAAGCA GATAGTAAAG AGCCCCTGAT TATTTTGCTC AACTTGTCAT ACCCAGACTA TACAAAATCA AGTGAATTAG TCATCTGGGA ACTATTGCTC CTTrGCTTCT
TGGTTTCCTA
AAAGGTTCGA
AT'rGGAACCT
TCGATTTTCA
CCTAACTCCC
CTTr'CCAAAA
GTCAGACTAA
ATGCTATGTG
ACAAATCGAA
AAATCTCCTA
TTAGTAGCCG
TrATAGTT-CT
ATAATCAAAC
ATTTACTTCA
TAGAAAAATG
15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 16320 16380 16440 16500 16560 1 6 6 2 0 16680 16740 16800 16860 16920 16980 TGTTAGTTTT TTCAGAATAC ATGAGGAATG CACACCTAAT ATTACATTAC TAAAGGACIAA AAGGAGATT'r CTTTCTATTA AACCTTGGGC CTACTACTG TTTCCCAAAT TTCTGATCAA CTGCAAAACT CATCTCAGAC CTCAACTCCA TATCATGGGA CCAATCAGAA AAAAAAGAAT ATTTCATCAA CCCACCAACT CTATCTTGAA TGCAAACGAT TAATTGATAG CCACTATCCT CAATCACrrA CAAITCAAGA AGCGTAITrCA AAGAATTTAA CACCGAGCTA GACAACTTCT GTAGGTTT CAGATCCACT CCAAG'rCATA CAAGAAAAGA AATCCTACCA AGCTGTCTAC AAAT1CGCAGA AAAACTATCC TrrAGCAAAA GAACI'ATCCG TTCACAGAAG CTACTTATCA TACCDATCA CCCAAAGAAT ACCTACTCrA CGrMMAATG CGAAAATACC CAAGAGTCCA TCAAGGTAAT TGCATACTCG CCA'rrTrTCG AAACTATA AACAATACTT TAATCAGACT ATACTCTCAA TACCAAC'rAG TAAGA).AGGC AACATTATCA CAAATCCTAT CrAAAGAAAC CGACTATATC AGCGGAGAAA CTAAGCCGAA CAGCAA7TG GAAAGCCA'rC AAGCGACTAG GATAGTATCA AAAATACAGG ATATAAACTG ATGAATGGTG CTACIAAGAA ATCTT-CCAAT TAAAGTCAGC TTTAAACCCG
AACAAGAAGC
ACCTTATTCT
AAACAAAATC
CCCTCTATCT
CATTGAAATT-
TCCAGAGATT
AACACAACTA
AGCTTCCTAT
GATGCAAAAG AAGCAATTGA TTTAGGCCAT GAAGCAAATA CAAACAGCAG GCCGAGGCC TTTTCAACGr TCCTTCTACT ATGACACTCC ATCTTIAAACC AAATCTCCCC TATGACAA.AT a a a CACCACAAGG TGGTATTTAT TACCATCC'rA CACACTACTT GrACGCTGGAG C'rGTCTACAA AGCCATTAAG AACCTAACtTr TAATAGATGT CGACATAAAA TGGGTCAATG ATATCTATCT AAACAATCAT AAAATTGGAG GAATCCTTAC TGAAGCAATG ACCTCTGTAG AAACTGGCTT AGTCACAGAT GAGTAGGTA'r CAATTTCACr A'rTAXAGACT TCCCTCAGGA ATTIAAAAGAA GCTTATTTAA ACCTACAGCI' CCTATAACAA GGAATGAATT GATCATAGAA CTI'TCTTCGA AACACCAGCA GAAGAGCTAT TATACCTATA CAAAAAACAG
ATCATTATTG
AAACCTGCCA
ATCTGGCGTG
TCATTCATTC
17040 17100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 TAGGAAAAGA AGTCACTTTC ACACTAGAGC AAkA.AGAC'rA CAAGGGACTT GCTAAAGACA TCTCAGAAAA TGGAAAACTT TTAGT'rCAAT GTGATAACGG AAAAGAAATC TGGCTAAATA GTGGCCAAAT TTCTCTCAAT AAAAATAACT TCAGATTAGT TAAAAATAAA AAAGAGAGTT TTGCGCCAGT TACCCGACC'r TCTCCAATTA ATGGGCACGA CTCTAACCAC CTGAGCTACG AGCGGGTGAC GAGAATCGAA ACTACACCCG CATAAATACI' AGTTGGAAGT AAAATAACAC AA'TATA.ATA TAAACGATP.T AATTCAAT'rA AGTTTTACGG ATCTGAAGTT TTATTGGCTC ACAGACTCTC ATTAAA.ACGG AGAA'rAAGGG ATTCGAACCC AACGATTTAG CAAACCGTCC TCTCAGCCT CTTGAGTAAT GTGGAcTcC.A ACCACCGACC TCACGCAT CAGGCGTGCG CGCCCAAGTT AAAAAACTTG CTAATTrTGAA CAAAGTTCAA CTCGCGACAA CAGCTTGAA GGCTGTAGTT TTACCACTAA ATCAATAAAA TGGCGCGAGA. CGGAATCGAA. CCGCCGACAC ATGGAGCTT'C AATCCATTGC TCTACCAACT GAGCTACCGA GCCTATTGC GGGAGCAGGA 762 TTTGAACCTA CGACCTTCGG GTrATGAGCC CGACGAGCTA CCG;AGCTGCT CCATCCCGCG 18780 TTAATAATAT AAAAGGAGGA TGTGGGATTC GAACCCACGC ACGCTrTTAC ACGCCTGACG 18840 TCAAGA CCGTTCCCTT CAGCCGGACT TGGGTAATICC rCCAATATTC AAATGGACCT 18900 TGTAGGACT GAACCTACGA CCACTCGGTT ATG.AGCCGAG AGCTCTAACC AGCTGAGCTA 18960 AAGGTCCGAC AAGATCATTA. 
TAGCGGCGAA GGGGATCGAA CCCCCGACCT CCCGGGTATG 19020 AACCGGACGC TCTAGCCAGC TGAGCTACAC CGCCATGAAT CGGGAAGACA GGATTCGAAC 19080 CTG.CGACACC TTGGTCCCAA ACCAAGTACT CTACCAAGCT GAGCTACTTC CCGAGTTAAA 19140 TAGAAAAATG CACCCTAGAG GAGTCGAACC TCTAACCGCC TGATTCGTAG TCAGGTACTC 19200 TATCCAGTrG AGCTAAGGGT GCTCCATATT ATGCCGAGGA CCGGAATCGA ACCGGTACGA 19260 TCGTTACCAA TCGCAGGATT TTAAGTCCTG TGCGTCTGCC AGTTICCGCCA CCCCGGCCTC 19320 TCTAAGCGAA CGACGGGATT CGAACCCGCG ACCCCCACCT TGGCAAGGTG GTGTTCTACC 19380 ACTGAACTAC GTTCGCACTG TTT'rCTTCTA TCTAAAAATG CCGGCTACAT GACTrGAACA 19440 CGCGACCCTC TGATTACAAA TCAGATGCTC TACCAACTGA GCTAAGCCGG CTCATTTGTT 19500 *ATATCTAAT GCGGGTrAAG GGACTrGAAC CCCCACGCCG TrA.AGCGCCA GATCCTAAAT 19560 CTGGTGCGTC TGCCAATTCC GCCAAACCCG CATATATGAC CCGTACTGGG CTCGAACCAG 19620 *TGACCCATTG ATTAAAAGTC AATTGCTCTA CCAACTGAGC TAACGAGTCT AAAATAACTT 19680 .GCGTTACCTT AAACGGTCCG ACGGAATCGA CCCGGTAC 19718 INFORMATION FOR SEQ ID NO: 100: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 4117 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPT'ION: SEQ ID NO: 100: *CCGTGGAAAA. GTCTGGATAG TGAATGGTCT TCACACAATG ACCTGAAAGA AGCCTGAGAA TAATTATGGA GAGTAGCATT CTGAGAGGTG TTAGCAGAAC CATATGACAG AGCTGTTTGA 120 AGAGGGA.ATA TTGAGGAGAA AAATCCTGAG CCTACCAGTr GGAGTI'GGAA AGAGCTGACT 180 GTTACATCAT GGTTTATTAT CCACAACCTG TGGATAACTIT TGTGAATAAG AGAAGTTGCT 240 AAAGAAGGAG ATATATAACG ATGAAGAAAA TCAAACCGCA TGGACCGII'A CCAAGTCAGA 300 CTCAGCTAGC TTATCTGGGA GATGAACTAG CAGCTTTTAT CCACTTCGGT CCTAATACCT 360 TTTATGACCA AGAATGGGGG ACTGGACAGG AGGATCCTGA GCGCTTTAAC CCGAGTCAGT 420 TGGATGCGCG TGAGTGGGTT CGTGTGCTCA AGGAA.ACGGG CTTCAAAAAG TTGATTTTGG I TGGTCAAGCA CCACGM'GGC ?TTGTCCTTT ATCCCGACAGC TCACACAGAT TAT1,CGTTA ACTTGCTCCT TGAAGTWTCC CAAGCTGCCA AGGTCAGTDCC
CAGATTGA
ATCATGTCGA
TATCAAATCC
GAGGAGAGGG
ACCTGCAGGG
ATGAACGAGG
CAGAAGCAGA
GAGAGGCAGA
TTGGAGGAGA
TATGGATATG
CCGAGAAGCG
TAACTATGGG
CGCGCAAAAG
CGATTGCTTG
GTATGCAGGT
GCTGAACTAT
TGTTTCCATC
GGAAAGGGCG
CGGGTCTACC
GACTACAATG
AATGCTGGTA
G~rrAATTATG ATTrTTTTCAA
TGVCACCGTG
CCTA!TTATCT
AGTTCGCTGA
AATTTGAAAA
CAGAAGGCAC
GGATGCCCAT AGTCCCCTCT GGCTCAGTTG AAGGAAATCT GGTTTrGGATG GATGGTGCCA ATGGTTGAA ACCATTCGTG CAGTATCCGC TGGATTrGCCA GAATCCTGA'r AAACTAGGAA GGGCACGATT TTTTCAATCG GATCCACTGT GGCAAAAGGT CTTCAGCACG GGGATCCC'TC CCTCCACGCT GG'rTCTACCA TGAGGATCAG GATCCTAAGT CTCTCCAGGA GTTCGTCGAA ATCTACTrrC ACTCAGTAGC GCCAGGAACT TTAATATTCC GCCGAATCAA GCTCGGCTCT TTGATGCAAA GGATATGAA AA'rTTGCGAC CTATCGCAAT GAGCTCTATA AAGAAGATTT GGCTCTGGGA CTGGTCCAGC TCT'rTCCGCA GACTTrGCTT GTCGCCATTT GACAGACCC GCTCTTGGGC AAGCGATGCA GACTTGCCCA TCCAGTTAGA ACTCGACTTA AAACTTTTGA TGTAATTGAG TTAAGAGAAG ATTTGAAGCT AGGGCAACGA TTCATGTGCA AGTAGAGGTG GATGGTGTCT GGCAGGAGTT TGGTTCGGGT GTTACAAACG TCTCTTACGA GGAGCAGTTG TTGAGGCACA. GAAGATACGT CCAC'rCTTGC
CGACTTTATG
GCTGAGGTAT
CTTGAGACCA
GGTTCTCCTA
ATCGCTGCTT
CATACTGTTG
GTAGTCATTA
540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800.
1860 1920 1980 2040 2100 2160 CAGAATCACA GGCT'rTGCCT CAAAAAAAGA AGTTGTTCAG GAGAAA.ATGC CTATTTTACA TTTCGATTCA ACCGGGGACA TTGCGTTTCA AACTGGTGAG GAGATAAA.AC CTTGGATTTC TTCAAG'rCCA AGTTTCATAA TAGTGACTTG GTAACCAGCT TGTTGAA'rAG T'rGATACGAG TTGTGGTTTT GT'rTAGTATG TTGTTGACCA AGATT'rCCCT GAACTAGCAT TTGCAGAAAA GTTAAGCGCA GAGAATGTAG GGTGTCCATG GTCTCGCCTA ACTGAAAAAA GTCTGACGCT TATCTGAACC TAACGGTGGA AAGAAGAACC TTTGCGCGAT GAGGGTGAAA GTTAGTTGTT TGT'TrGTCC AGTCGGCATT GATATCCAGC CATTTATCTT TTATAAAACT CCTGGATTrAT AAGCCTAGCT GTGGCAALAGG TGGTCCTTTA GAAGCTAAGA TCAGGATGAG ATTCAAGTCC ACCAACCTTG TATTTCGCAG TGGTCAGCTr GTGGATCAAC GCAAAGGTTC TN'TGGTTAT CAGCT TTAA GAGGTCTTGG CTTTGACAAA GTTAAAATGG CTTTAGCGAG GTAGACTCGT 764 AGATGGTCAA AGAGAGGGAT TCCGAGGTCA TAGCrTGGTT TTCCTGGACA GGTTGGATAA AATCCGAGAG CTGACCAGAT GTACCAAGCA GAGAGACTAC CATTGTCTTC ATC2'CCAGGA TAGGCTTCCC AACTTGGGTG TAGTrCAGGGT AATCGcTGTA ATGGCTATTT GTCCAAAAGG TAGCCTGTTG TTTCAAAGAG CiTrCT'1-I-C CACCCATCAG AAAAGCTTTC TIGACGGAGCG TCITGATAAG AAGGGCAGTG ACGGAAGALGA TAAGGAATGT GGAAACTMGG CTGGTTGGAA AGCAGTAGCC ATCTCGCTCA TTTCGTGAAT TCGTAACCA
GGGAGCATCT
TTGGA'rTAAG AGCGrAGTCT TGACAGGCTrr
CCAGGGATGT
CGCCCCCAAC
GCTTGAATGG
TGAAAGTTTC
CGGTAATTTT
CAGAGCATTC
CTTGATTGTC
GTGAAGCAGC
TCGTGCTCGC ATGTAACCTG CTTGTAGGTT TCAGCGATTT TCAAAAGATA GTTGCTAAAG CGTGGAGAAC GCCTAAAGTA TATAAGGAGA GAAGTCAGGG TCTCAGCGTC AAATAGCTGG CTATGTTCTC 'rAGTTTTTG GAG'rATGGCT AACACT'rCG CTAGTCCGTG GCGGCCArrG CTTGGAAGAG T'rCTCCTTCT TACCGTCTAA AAGTGTACCT GGAAACCAGT ATCGCGGTAG GCACAGCTGG CGATACAAAA GTCACTATAG .GCATAGTCTA TGGTCGTCGG TAGAGAGGTA ACCTAGTTCT TIGGTATTGG ATGCCGAGAG GGTCGGCN'T GCTGGCTGTT TCGAGCATGG AGGTCGGGGG TCATGTCCTT GCAGGCGCTA TC1'GCGATAA GGCATCATAC CCCGTTCATC TGGAGCCAGC CATTTTGGAA 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 CTATTGAGGA AACCT'rC'AA AAAGCGTTGA TAGTGCTCCG GTATGATAAG GGCAAAGAGG GGGAAGGTGG TGCGCAAGGT ATCCCAGAAA CCATTGTTGC TAAAGAGGAC ACCAGGCTTG ACAGTACCAG TAGCCAGATC GTCTGTGGGA AGAGGAAGAG CCTGTCTCTA TAAT GTCAAA TTACAGCTAT CAAAATCTTC GAAGTGGCTA GTTGCATCTC TCTTGGCTGA TAGCAAGAAT CATGTGGATG GCTTGCCCTG ATTCATTAAT CTCATAAAAA TCTGTAGAGG CAGTGGTCAA AGAAGGTT-CG GTCAGCCTCT ACCATGGAGG AGATTTTCCC AATCCACTTG GGCACTTGAP TTGAGGTAGA TTGATTAGAG CTTGAGAAGG AGAGATGAAA GGTTTGACTA CTTGCTAAGT CAATTCGCCA GTCTCCAGCT ATCCGTGTTC ATTTGCAGGG CAGTGAACAT CCTTAGCGAA ACCTTCTTGT CGCAGGGCAA GAGTCCGCTT ATCTACTTGC TGCGTGAAGA TAGAGGGACA GGGCTTTGCC TTGCTTTTGA ATAGCAAGTC GGTGTGAGCT GGGTTTCAAT CTGATAACC
TTTTTGTTAG
TCTACTGTCA
TTCAAACGAA
TTTCAGTTTT
GTTCATCTGC
TAGAAGCACC
AGAGAAAAGA GCTTCAAATA GTGAGGCTGG AAGCAAGCTT TGGCGGTGAA AGAGGCTGTC 'rCCCCCCAGT TGACTGGTGA GAGTAGTCCC CAATCCAAGG ACTGGGCTGG TGAGTTAATC AGATGTGGAT CAAAAAACCA AGATCCATCC TGGTCACTGG
TATCTATATC
CAGGTGTCAG
GAATCCCCI'G
ATAAGAAGAC
AAGGAGCCA-A
AAAGATAGGC
TCTGGGGCAC AAAGTAATrC ATCCCAAAAG GCACGCCTGT GTATGGCAGG GTAT'TTCCCC GAGAAAAGGC ATGC77GTTG GTAGITCCAA AACGGG1'ATC GATGGTATCA AGTAGTGGTT TCATAGTCI' TCCTTrAGCT GPTTTCTAC ATTATATCAG TAATAGAGGG CCTTTAG INFORMATION FOR SEQ ID NO: 101: SEQUENCE CHARACTERISTICS: LENGTH: 2727 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: CTGGTTCAAT TATTATTCAC TCTAAGTAGT CATATGTTCT TTATTTATGT GAGTTTrTTAC CTTTTAAAGG ATCTTGTTAG ATGGGAGAAG G I t IAAAAG TGACAGATGA TAATACAAGA 4020 4080 4117 AAAGTTCGTr TATTAGTAGC CTTTTTTAGC ATTGTCATAG TTTATTAGCC TGTATCATrrT GTGGCAAGAA GCGCTTAGAG
TAAAGGAAAC
GCGTGACGAT
CAAGTGAAGG
ATCAGGTAGT
AGGTGGATGC
GCGCCTCCAT
TGCCAGGTGG
GAGTATGGAT AAAATrGTGG CGAGGGAGCA AAAAATGCAG AAAGACCGTC TTGCAGAATG TGGTGGTTTG AATGCCAAGG TACTGGCGAC ATCACTGAGG CGTTGTATTA GGGCCAATCC TTGTACGATT GGTAGCCGTC
TTCAAGGTGG
TCTTACCCTT
TTCCGATTTT
TTGACTTTGA
AAGCCCCTTA
TTGCCCGTGT
CTATTGATCT
GCTACATCCT GAGTrCTTTC GATTATTATG AAATCAAGAG CGATAATCGT CTGGTAGGA.A GTTGGCAGCG ACTATTCTAG GTCGGATGTC TTTATTATGA TGAGGAAGCT CATCTTGTCA CAAGTATGTC AGCAAGATGC GGCTCATGCC ALAGGTATCCA TCATTTGAAA GGTCTGGAAG 180 240 300 360 420 480 540 600 660 720* 780 840 900 960 1020 1080 1140 1200 CTATGGGGGT TAAGATTAGT CAGACAGCTG ATGGTGCTCA TATCTATATG GACTTTCCAA CAGCGACTCT GGCTGATGGG GTGACAGTGA TTGACTTAGC CATTCTCCTT AATGAAATGG CTATAACCAT TACTGGTGT'r GAGAAACTTC GTTACATCGA AGCCAAGGCA GAACGCTTGC GTGTTGGTGC A.ACGCAGAAC 'N'GATGATGG
TTGAGAATGC
GAGCCAAGGT
ATGGTACGAC
TGCGCGTGAG
CAAAGGTGCT
TCACAATGTA
CCTGAGATTG
GGTACAGAGA
GTCCAAGACC
GTATCGAAGC AGGAACC'TrT ATGGTAGCTG CTGCCATGAC GAGACGCTGT CTGGGAGCAC AACCGTCCCT TGATTGCCAA AAGTAATTGA AGAAGACGAA GGAATTCGTG TT-CGTTCTCA TTCATGTGAA AACCN'GCCC CACCCAGGAT TTCCAACAGA TGGTGGTGAT GTCTTGATTC GTTACTTGAA ATGGGTGTTG ACTAGAAAAT CTAAAAGCTG TATGCAGGCT CAATTTACAG CCTTGATGAC AGTTGCAAAA GGCGAATCAA TCCAACACCr AGAAGAGATG CGCCGCATGG
CTCGTATTGT
CCAGTGCGGC
TGGTrCACTr
AGATTCAGCG
CGTTTACTTr
ATGGTAGGTT
TGGTGGACAG
CTTGATrTTG
GGATAGAGGT
GATTGAGGCA
TAGTCATCAT
ATGGAATCTT
CCrTTGCAGG
ACAAGTTTGG
TACTACGGTI'
AGTGATGAAG
AGTACTGATT
GGGCAAGGGT
AAATGGCAGG AATTGATrCA TAAATTTACA TAAAGATAAG GAGAAATATG AACAAAAAAA
TATTGCTT
CCAAAACCAA
AGAGTGTCITT
CAGGTGCTTT
CCTACOCTGA
CCCTCTTGTC
CTTCTTGGAC
GTCTACAGGG AGCTATTATA TCTTAGTCAG AAAAAACAAG AACAGACGCA GTCAAGAGTC TATCGTCAAT GGTAATAAAA CAATAAAACA AAGACAGTGG TAAGGCCACT CGTCAGTACA TCCTCCAGGT TGGCATCAGG 766
CCATGGTGGA
GCTTGCA'rTC
GAGCASAAGT
TAGCACAGGG
TCCATGAGAA
ATGAATAAGA
TTAGGTACTC
CAAGATCCA'r
GGAAATTAGG
CAAGACAGAC
'rCAAGCAGAT
CGTCTGAAGC
AAATAAAGGG
CAAATCTAGA
GCAAGGAAAC
AGAATCGTAA
TCAAGAATCT
CCTTAATCGG
TTCAGACAGC
AAAGCAAGGT
ACTACGCTTC
ATGGAGAATT
GAGTCTGGAG
TGCCAAGGTT
TGTTCCAACC
AGAAACTGGG
AAAGGGCTCT
TGGTTTGGAT
C'IGGGCAAAT
GCGTAAAGCC
AAACGAGGAT
GGNATTCAAT
CAGTCGATAG AGGTCATTTG TrAGGCTATG CCTCAACAAG CAATCCTAAA AACATTGCTG CCGAGTATTC GACTGGTCAA AACrACTATG ACAAGCGTGT CCGTTACCGT GTAACCCTTT CAGCTTCACA GATTGAAGCC AAGTCTTCGG CCAATGTTCA AAAGGGACTT CAACTGGA'Fr GA.CAGTTTTC GAAAATCGTr TG-AGATTATC CGTGATACAG TCrI'TCAACT GACCTTCGTG AC-AACTGTG GTCGGTAAAT GTTGGCGCAG CTAGGTGCTA AATCAAGCTA CGTAGTCAAG TGGCTCTAGG AATCGGTTTA GGGCATCCT GTCTCCAGCA CTGGAGAACC AGCCTTTTC ACTAATCGGA CTGCTAGTGT GCCGTCGGCA CCTAATAGTC TCCTAGTCAA GCATTGGCAG
TGGAATGGCT
TCAAGTAAGC
GTAGCTAATG
AATGGTTCAA
TATACCCATG
GGTTTTGATG
CAGGCACAAG
T'rGGACCAAA
TTAGTTCCCT
GTTCTAGTTC
GTAACTCAGT
1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2727 ACCGAACTGG AGA.AGTAACT AAAAGATA CG CCTACACTCC 'rATGTCACTr ATGGATGTAG GAG'rrCTrTT TACTAGTTTA AGCAGGACTA AGACAGGTAC TAAGACAAAA TAGCMACTTC TAAAACTAAC TTCCAGTTTT GGGAGAGAGA TGGAAGTTAC TT'rGAGA INFORMATION FOR SEQ ID NO: 102: SEQUENCE CHARACTERISTICS: LENGTH: 5717 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: TTrTTGTAG ATI!AAGTCG ATGTTAGCAO GAATT-GCTTC TrCACATCGT ATAAATTGCT ATTAATTCAA TTri L-rrI AACAATAATA ACTTGAATr GGTGCAATTC CTAAAAAATA AAAAACAATT TTTGAAAAIT AAATTCGATT TTATCACTTA CAGGTITACT TGT'IrrATTG TGGACTCTTA TT1T=ATCA 'rTAACTTAGG TATGATTTT 'rCAGTATGAT AGCGTATTA ?IrTAAGATA Cr'TGAATTCT TCAATATATA GTTCAACTT TAATAGCATT TGTTATTAT ATTACAGTTT TTGACACCCA ATATTATTGT TCG'rAGTATA ACTGCTTTCT A'rAT ATATA TGATAGGAAG GACGAAATAT TTTTTNGCTTA TAAAAAAGAA GGTATAATGT ATTTACGTGG TATT'rTCCTT
GGAGGGGTGG
GTTCTACGTA
TGAGTCAACT
TTGTTT'CTAT
AATAGTTATG
TCGGAGTATG CGGGAXATAG CTATAGAACC TCAAATTTrCG CAAGAATTTA TCAACGA'rCT ATTAATAGT TGTAAGGAAC TATTAGAGAT TGAACTATAA ATGAACAAAT ?ETrAATTTCG AGAACTAAAG GT'rACTATTC GTTATTTTAT ACACTATTAG TATTAGGAGC AGGAGTCTTA TATGGAGTGA GGAAACTTAT TGTCGCTGAT AGGCAAATTC TTTATACAGA GTCGGATGTT AAAATACACA AGTATAAT'rC AGATGTTCAG GTAGAAGAAT TAGAAAAAAT TGTTGCGGAA ATTGATACGC CCX7'rGATAT TATAAAAATT TCCTACATAT CAGGAGGGTT TAATGGATGC ACCATCGGTT CTTCCTTTGG TTGTCGGAAT TCTGATAAGA CAAAGTGGCC GACTACACCA ACTAATTTAA TAATTAAAAT ATTT-C'rGGGA *TACGTrTATA ATA'rGAGAAA TCATGCTCTA GAATGTCCAA TTT'GTAAAALA AATAATAAAG TTATrCGTrC AGTTTGTTTT TGCTTATTAT GGCAGTTCAC TG'T-ATAGAA ACACAATTAA TATATTATGC 'rrACTACAAT AAAAATA'rTC AGAAGAAGTA TT-AGGCAGTA AACTAACATT GATOAGATAG ATATTGATAG TAGATATTCT AATGAAGAGT ATALATAAAAT ACAGAATAAA GGATGTTATA TA'rCTCTAAG TCTA.AGTATG TACGATATAA 'rAGAACCATC GG1'AAGGAGA AGATTAATGT GTAGTACCTA TT-rCTATTAA TA'rGGGAGTA TAGATTTTAT G'rCAATCAAT TTGCTGTATC TATCrTATTA T'TGATAATAT ATAAACAAAG ATATAAATAA GAGATGCCTG CTATTTTGGG TGTTATAATG AAATCCTAAT AGTCAAGAAA AATATGTTCT TGAAAGATAA CAATATTAGA CAGGAGGAGT AGCTTTTTTA TAGTATrGTI' CTTGGGTATT AAAcATCA'rT GGAAAATATA
AAATTTAAAT
TCTITrCTGAA
AGTTTCTTCA
CGTTAAAGCA
GCATAAGATA
ATATATCCCT
GTACACTTTA
AGGGATAATG
AGATAACGCT
GGAAAACGGA
GCGAAAACAT
TCTGCTA'ITG
AT'rTTTGCTA
GTATGGCTTT
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 768 TTTCATCGTT TGAGANA TTTTTGCTTG TTAATTrrAG 'rGGATATTTTr TATTGGTATG ATAATATTTT TAATCCTGTG TAGTGTGTCr TAGTTAIAT ATAACTTTAT TATTTAGCAA AAAATAGGTA TAATGAGCAT TTGCATATAT ATAATIAmT CGTATrrATCA ATAATATTGA AACAATCT1TG CTAACGGTTA TTTTAAGAC GTTTA'N'TGA TAAAGATAAG TAAATAGATG TAAAGGAGGT GCAATGAGTA 'rGATTGAAGT TAGCCATTA AATAGCTTTA AATAATATAA GCrrCACTGT TAAAGAAGGT AAr-ATrrATT CAGI'ACCAG GATATTTATT ATGTTAGGTA GGAGGCTTAG TATGTTTAAA TATACTGCTT GAGAATGTAT TATGCTTAAT GTTATTG~rr 'rTAAGTAAAA ATGTAGAATA TCAAAAAGTT TTGGTGATAA TAGATTrTTTC GA'TTTTAGA ACTGGGCAGT TCCTTCCCGA 'rTAACAAGCG GTGAATTAAA AAAATGTCTC TGTATAACAA 9* ACCATCTGGT TCTGGAAAGA TAAAGGACAA TCTATTATTT GAGAATTGGA TTGGTTAGCG TCTTCTTTTT TATAGTAAAT
GCGAGTAGGA'TTATATGATA
GCAACGAATG CTTTTAGCAC ACCGACCTCA GGTCTAGATC GAAAACAGCA GGGACAACGA ATGTGATTAT GTTGCCTTAT ACTCATTCAA AGATATAATA GATAACTTTT GATTTTACAT TTCAATTCAT TCATGTGAGC GCT1AAATGCT TAAACGGTTT ATAAGAGTAT T'rTATTGCAA TTATGGAAAC ACAGGGGAAG TACCTTTTTC rrTTTCTrrG AAGAAAAGTA CAAT'TTACAA TTTA'rAATAT
GTCGCAAGAT
GAGCTCTI'AT
CCACAACT'rC
TTTTTCTAAC
TAAATAAAGG
AAGATAAAAA
CACTAGAACA
CTACTT'IAGA
CTGGCTTTGG
GTTTTAGTGC
GTCAACGATC
GCTrGTrGAA
ACTCTTCTGT
TAGTAAATCA CGTGTrrATA ATT-TGTTAAA GG'rAGCAGGA AAATTATCCA C'rGGAATGAG CAACAACCCC GCTGTACTCT TTCTGGATGA TICGAACAATT CA'TGAGTTAA TTTTAGAATT GACTCATGAT ATGAATGAAG CAACTCTTTT GAAATTAGTT GAGCAAGGAG CTCCTTCTGA GATTAAGGTT ACAGATTATA ATGGGAATCA GGTATCTCAG ACTrGATCI'GG AAAA'rATT?1' AGATATrrTT ATCACAT'rAA CAGGAGGAAA TATGGTTGCG TTGTCAAATC ATCCTTTCCA C'TTTTGCTTT CACATATTTT TATAAATATC AACAGGCATT AGTTCTT'TTG ATGATGTGTT GTCCTATAAC TATTATCTTG TCTGAAGAAA TGAGTGGTGT TAAAGGCTCC GAATACATTT CCACAACGAT TAATA'TCTG TGGGACAAAA ATCTCAAAAT ATACAAGTGG ATTTTATGAG 1680' 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420
TATcAAcTAT GTT~crTccT TTTTTGCTAA CTTTGTGAT TATGGGAACT
ACTCCTCTTA
TTTAGGAGT TACAATTGTA CATACTTTTA AT'rATArrAC AATCG=CTr CTAACCTCT-r TATCCATCAT TTTATTCTAT TTATTGATAG GTTTAACCGC GAAGAGCCAA GTAGTAGCTC AGGTTATCAG TCTTCCTGCT A'rGATTTTAG TTGCTCTT ACCGATGCTA 'rCTGGT'rTGG ATAAGACAGT rCGAAGATA ACAGATTATA GTIrATGGG ACTATTTACT AAGTTT'N'CA
CAAAATGGGA
GGATTGTCT
TGAGTATT
CAGCAATTCT
AA'rAACAATA
GCAATAAAAA
ACCTGTCTTG
GGAATTTTCA TGGAATAAAA cTCTAATTcc TAATCTAAcA cTACTTATTT TCTATTAACT TrAM-PACGA TAACTAI-rAG GAAAAAGAAA ATTTCTTAAT TAATGATTAT AAACACAACT GGGAAGGAAA AAATGAAcTG ATCTTTTTGA ACAGAATAGT CTTATTGCTA TATTTTGATr TGAGTGTACG AAAAAAGAAA GTGCTCATAC TAATrrCCAGA AGTTTTGGGT GATAAGATAA CTGATAAAT'r ATGCAACATT TT-TAAA'rCTC CM'TATAAGT GCTTCAAAAA GTG.CTTCAAA TAATCCAAGT A~rMTGGGG ACGGTGATTA AT.AAGCTAGC AAAGCATCAT 3480 3540 3600 3660 3720 3780 3840 TAAGGATTTT TTCGGI'AAfT GTTGCCAAAT CGGT -rAAGA AT'rCGCATTC TCATTACTTC CCCTTTGCCA AGAT1GAATAG AATTCCCATT TGTTCAATTA AAGGGTAACA AGCAAACTCT AGTCTTTAAC TATTCTTTTG GAA.AGAGTCT TGTGAGGTG3T AAATACTCAC GAAGAAGTCC GCATCCGCAA AATAAAACAG TT'rrCTCTGT CCGAAGTCGAA *go* es. S S .00.
S. 99 TTTAGCTGTT TTTACTT-GAC AGCAGTCCAG AGAGGCAGCT TGGAACCTTG CTGTTGGCAG TAGTAA.AGGT AAAAGGAGAA CC'rAGTTT'rG AGGCGGTCAA CTCAAGGTAG GGATGGAGCT GGTTTGGGTC ATAG'rGTCTT TCAGCCATGA AGATCTTGTC GGTGTAGAGA TGATGAAGGC AAGTGCTAGT AGAAATAATA AAGGTTAGAC GGTGAAAGGG GTTCCTTTTT TCGTGGCTrTC ACCTATGCGA GAACATCGTC
TCAATAGCAG
GAATAGTAAA
TOGAGACTAC
TGT'rGGCCAG
CAATCATTGC
TCAACATGGA
AAACCTTT-AA
CCATT'N'TCG
ACTC'rCTCAC
TCTTGATTTT
AAGCCTTTAT
CTACTTAAAA
GGAATTTTTA GCTCTTTTCC CAGCAGAAGA TTA'rTACGCA GCGGGGCCTG AGATTCTGTC TTTGGATCTC AAACTTCATG ACATTCCTAA TACAGTCAAG TCAGC'rTGGT GTCGATA'rGA CTAATGTCCA TGCGGCTGGT GGCGCGTGAA GGTCTTGGGA CTCAAGCCAA ATTGATCGCT 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 GTAACTCAGC TCACATCAAC GTCAGAAGCT CAGATGCAGG AG?1'TCAAAA TATCCAAACC AGTCTGCAAG AG'rCTGTGAT TCACTATGCC AAGAAGACAG CTGAAGCTGG CTTGGATGGT GTTGTTT-rCT CGGCTCAGGA AGTACAACTC ATCAAGCAGG CTACCAATCC AGATT??ATC TGTCTGACAC CAGGGATTCG TCCAGCTGGT GTTGCAGTTG GAGATCAAAA ACGAGTCATG ACACCTGCTG ATGCCTATCA AATCGGCAGT GACTATATCG TAGTGGGACG rCCCATTACC CAAGCTGAGG ATCCTGTTGC AGCTTATCAT GCCATCAAGG ATGAATGGAC ACAGGACTGG AATTAAAGAA CTAGATTAGA AAAATAAA.AG GAGAATACCA TGACACTTGC TAAAGATATC GCTAGCCACC TCTTGAAAkAT CCAAGCCGTT TACCTCAAAC CAGAGGAACC CTTCACTTGG GCATCTGGTA TCAAGTCACC GATTTACACT CATAATCGTG TGACACTAGC CTATCCAGAA 770 ACTCGTACCC rAATrGAAAA TGGTrrI'GTG GAAGCTATC.A AAGAAGCCTT3 TCCTGAAGTA 5220 GAAGTGATTG CAGGAACTGC AACAGCAGGG ATTCCACACG GAGCCATTAT TGCTGATAAG 5280 -ATGGACTTGC CrTTTTGCCTA CATCCGTAGT AAACCAAAAG ACCACGGAGC TGGTA.ATCAA 5340 ATCGAAGGTC GCGTAGCTCA AGGrCAAAAA ATGGTAGTGG TTGAAGACCT TATTTCAACG 5400 GGTGGTTCAG TTCTTGAAGC TGTAGCAGCA GCCAAGCGAG AAGGAGCAGA TGTACI'TGGA 5460 GTrGTAGCGA TTTCAGCTA CCAATTGCCA AAAGCAGATA AGAACTrrGC AGATGCTGGT 5520 GTrAAACTTG TGACGCT'rTC AAACTATAGC GAGCTrATCC ATCTAGCCCA AGAAGAAGGT 5580 TACATCACGC CAGAGGGCCT TGATCTTCTA AAACGCTTTA AAGAAGACCA AGAAAATTGG 5640 CAAGAAGGTT AGGTCAGTAA GATAAAGAGA GACGAGGCTA CCGAGTCTCT rr'rACCATTIT 5700 TATTTAAAAT ATGACAG 5717 INFORMATION FOR SEQ ID NO: 103: SEQUENCE CHARACTERISTICS: LENGTH: 5558 base pairs TYPE; nucleic acid STRANDEDNESS:,double TOPOLOGY: linear .0 (xi) SEQUENCE DESCRIPTFION: SEQ ID NO: 103: o CCTGGACTTT CTAAAATGAA. 
ATCTTGCGAC CTGGATCAAG"CCCTTCATGA GCATTTTTCA GAAGAAGAA'r TAGCTGGTCA CTTTCATGTC CTTCTATGGA CTTTTTTTAC AATGGCATTG 120 ooo:CTATCACACC CAATACCTAT CTAAGCGCCT GGTTCGTAA.A CTTTATTGCA GCTCTTCCTC 180 TAAATI'CCT AATrGTTGAA CCA.ATTGCCC GTTTTATACT AAGTTCr'rTT CAGAAACCAT 240 TTACTGGGGA AGAAGTTGAA GATTTTCAAG ATGATGATGA AATCCCAACT ATTATCTAAG 300 :.~oCCAGTTCTGT AAACTACTAA TATTTGAAAT CCACTTCCTT T-rAGGGTGCA ATGGTTATAA 360 00.0ATGAATTTTT GAGAGGATCA GAATGAAAAA ACTAGCAACC CTTCTTAC TGTCTACTGaT 420 0 0 AGCCCTAGCT GGGTGTAGCA GCGTCCAACG CAGTCTGCGT GGTGATGATr ATGTTGATTC 480 CAGTCT'rGCT GCTGAAGAAA GTTCCAAAGT AGCTGCCCAA TCTGCCAAGG AGTTAAACGA 540 o* TGCTT'rAACA AACGAAAACG CCAA'rITCCC ACAACTATCT AAGGAAGTTG CTGAAGATGA 600 AGCCGAAGTG ATTrTrCCACA CAAGCCAAGG TGATATTCGC ATTAA.ACTCT TCCCTAAACT 660 CGCTCCTCTA GCGGI'GAAA ATTTCCTCAC TCACGCCAAA GAAGGCTACT ATAACGGTAT 720 TACCTTCCAC CGTGTCATCG ATGGCTTTAT GGTCCAAACT GGAGATCCAA AAGGGGACGG 780 TACAGGTGGT CAGTCCATCT GGCATGACAA GGATAAGACT AAAGACAAAG GAACTGGTTT 840 11 1-1 CAAGAACGAG ATTACTCCTT ATTGTATAA CATCCGTGGT GCTCTTGCTA TGGTCAACCA AACACCAATG GCAGCCAGTT CTTCATCAAC CAAAACTCTA TTCTAAACTC CCTACAAGCA AGTATCCACA GAAAATATT GAAGCCTACA AAACCCTAGT CTAGATGGCA AACACCCAGT TGTGGATAAG A?'rGCrAAGG CCG.AAAAA AATCGACAGC ATCGAAGTGG TGAAAGACTA CAGTATCCAC ATTCGGTACT GTATTTCTT TCCCA'rATTT GGTCTATCCA GCCTTCATAA GATCCCCTAT ATTCTTTGAG AGCGCCTTTC TTGGTCGGCT CGTCCAGCAC TAAAACGTTG ACCTTGGCTT GCTCTCCCCC TGATAATACT AAACCACAAC GGGCAAGGGC TGCACGGACT CAGACAGCTT CAAGAGGAdT TTGGCGATTA ACTTCTAA.AT AATCTCCACG CTrCCACTTCC CTCTTCAACA CAGTTGT'rTT TCCAATACCA CGTTrCGAAGG TAAGATTTAA AGGC'rTAGTA TTGGCTTGGA AGATAAAGCG CCCTCGTGTA GGTTTCTCAC TTTGGAGTTC GATAATATCC GCCATATTAC GAGTTGCAAC ACGGGCTTTA ATCTCTTTCT GCTGGCGTTC GTAGGCTGCC 'rCTTGGAACT GGTAGTAGTC ACCAGAGTAA ACAATATTAA 'rAACGTCATT GAGGAATGGA TCATAGTTTT GGAGATAGCG CTTGAGCCAA GGCTCGTCCA ACAGCAAGAT ATCACGCTTT GTTCTTTGCC CACCTGACAA AGAAGTTACA AGAGCACGCG CTACT'rCGTC AATCTTAGCA CGGTCT'rGAA GTTCTCCTAC TTCTTCCATG ATTTTCATAT ACAGGTCATT GATACGAGCT CGGAGAACAT 
CACGCACCGA CTGTCTI'CA CI'?rGGTCA6A GTGATTGACG TGAAAAAGAC AAGCCAACTA CGA'rrTTAAA TCTTAAAAAC TACTCTCATT CTTAAGTTAA AAGTCTGGCT CGTGGCAGAC AGCTCATCCT TTGCATCCAC rTTTCACGAT TCATCAAGAG ACAGAAACGA TGAATCTGGC TTTCAATATG TTTGGTTGTC TCTGCTTGAT 'rAAGGGCAGG AAAGGCATTC CCGCCTTCTA CT'TCCTGCTC AAAATAACCA CCAGCGATTG GCGAGATAA'r GCCCA.AGAGA TTAGCACCAA TAATCGCAAC CTT'rGATTG AGAGGACGGT CGTAACCAAT TTGCAAGTTC CGAGCTG;GTT TGAAATCAAA GGA'rGGTT ATCTTATCCA ATTrTTTTTG ACGAGACATA 'ACGAGCCA CAAAGTCCTT GAGGTCTGCA TCTAGCTGAG ATT"TCTCAT AGCATAAACT CGCGTCAGCT GTTGATTTTC CACATGATAG ATATCGTGCG AAATGAGAAC AAAGGCATTC 'rCAATATGCT CAGCAT(;CAA GTAGTTGGTC TCAAGGAGAA GTTTTGCCAA AAGCACCTTG TCCGTATCCA TGCCAAAGTC CATAACACCA TCCAAGGTAT AGAAATCACG ACTCTCCAGA AGAGCATCAA CATCCCCCC GTCTTCAGCC TCAGCTTI'GA AAAGCTCATC AAAAGCCGTA GCAAGGACAG AGTGCTGATC CAAGTAACCA TGGCrAATAC
CAGATACCTC
AAGAAGGTGG
G;TATGGATGT
CTGCTATCAC
CAAAAAAA'rA ArArrAAAA
CATAAGGATA
ATCCAAATGG
900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800- 1860 1920 1980 2040 2100 2160 2220- 2280 2340 2400 2460 2520 2580 772 GCCGTCACAT ATTGGACCA CTCAACCTTr CC'TTCA'rCTG GCAGCATTTT ACCAGTCACG ATACTCATAA AGGTTGATTT TCCTTCACCA TTGGCACCGA CCAGGCCGAT ATGTrCTCCC 2640 2700 TTGAGGAGAC GGAAGGACAC ATCTTCAAAA ATTGCACGGT TrTTAACTr CTAAAATACT CATTTTAATT CCTTACCTTG AAGGAGCCAA GCCAGATAGC CACCCAAAGT GI'TGGTCCAC
GCGATTGAAA
AAGAAAACTA
TCAAAGAAAA ACTCCAAGAT AAAAGAAGGA CCTTrTTTrGT
TTGGAAAATC AGAGGAAAAA ACAAGAAGAC TAATGTCCA ATGTCACTCA CTE'CGCCCAG A.AAAACCAGG CGTCCAAGAT GCTGAATACC TTGAGGAGTA AAGCAAAAAC AGACAA'rGCA AATATGATTA TAAGTCTCT TCATCAT'rAA TCACGTTTCA TTGTGTTAGA GCGCAAT'rGT TCTGACGAA CCACACACr'r GACCCCTTT'r CACTCTTTGG GACTACGGCT ATATTGGTCA TTTACATAAG ACAATTrCTT GATGTTCTTG ACACCGTCGr TGACTTCATT AAGCATGATA TCAATGTAGT ATTCAATAGC AGCAAAGAGT TAA1'rGCG'rA
TTTCCGCAAA
ATGAGGATA
TTTCCAGAGA
TGGAGT'rCC AATCrATAG GGAT1'TACCG
CCACAAGCTG
CACCAAAACC GTGACTCAGA TTTrTATGTA ATCGrrTATA AMETCTCAA TCTCAAAGAC CACTCGATrC CAAGACTCAC 'N'TGGAAATA GATAAAGGAG TTTTGTAAAA AAATCCAACA GAATTGAAAG GAGTCAAAAG ACTCCCACGG TAGATTGTTC AAAATGACTC CCCAGACCAA CTCCACTGC CTTCTGGCGG CGTCAATATC TGTACCATGC TTCTTAAGCG TATCATAGAA AGCCAACACG TGCTCACTAA CTGGGTTATA AGGAATCA.AG AGCAATTCAG TCAATTCCAA GGCTTGTTCT TATTCAAAGG TTACACGACG GT'rTGTTGTC 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200
I=TCAATCG
ATGATACI'TG AACGAAGTTC ATTGTTAGGT GCGTGAAGAG ACCCC'rTCAT CAGCAAAGTC ACGAATTTTA TGAGCCAAAC TGACGAGCAC CGATAGCCAT TCCTTTATCA TCATTGATAG TTGTTGTAAT TATCAAAGGG CTCACCGATT CCCATGACAA TCCTGACCAC GCTCATCAAA GTATTTCTGA ACCAGCATGA TTATTGAGdT CACG'rTGCTT CTTAATCAAA CCAGAGGCAC CAGCCGACCT GACTGGTCAC ACAGACAGAT AA.ACCATAGT TCAATTAACA TACCGTCGGG CAATTCAAAG AGATATTTCA TGCACAATAC GTTGTTTCAA GGGATTGACC ACAAACTGGT GAAAGGCACG GTTAATCTTC ACACGGCAAG A'rrGACCTGA CTGAGGTTGA AACCGTGATG TACGAAAGAA A?1'CAAGACA CGATATGGCT GATGCGT'rCA TTTGCGCTAC GATTTCACCG AGAAGGTACA ACCGATATI'A GTTGACGCAT GAGTACAGTC CTGTACCATC AGCAGACTC'r CATTGAGCTT AGCAATCAAA TCCTTGGAAA GGTTGGTCAT TTCTTCAAAT GACTGCACAC GTTACGGTA GAGCCATTCC CAGATTTGAT CTGCACGGAA TTTCTTTTCT CCCTGCTCCA ATACCCATTC CTGCATGGTT TGATGTACCA AACTATGAAT TGAGGGTTTC ATT'rCT'rCTC CTrATTCTCT ACTCACTTCT 4260 4320 4380 GACGAATGAC AAAATGACGT TGTCCCTTGT CGTCTTTCTr. ACGACGTCTA TTTTTCrrAT CTGCATTCGA CTTrCGTI'TA C?1'CCGTCrI' ACGCATTI'TC CAAAATAGGC ACAACCATAA T1rCAAGTI-r TrCTTCTGTr TGCTCCAGTC CCCCACGATA AAGTCGTCAC ATCAAAGGCA TTrCGACCTT GTCCCCGTGT ATTCACGTGC AATTrCTTTT- TCTACCATAA ?I'TCrAGCAG TTGTCAGACC AGAATCTTAT TrATrTAGCA GCrGCGTACA TTTGATGTAG TCAGGACGCA CAAGCCCAAG ATTGGTTTTT CACTrCAAGT TTCCCTTCTT TGTTGCTGCT GCAGTGAAGG TGCTGCI'GCC AGTTCTGCTG AGCGTGGTTC AAGTGTCCGC AGATTCTACA TCAGCAAGCA -AGCTGCATTG GCA'rTGTTGA
GTTT'GAGTCG
TTGCAAATG
CTACAATACT
CGGTCATCCT
TAATCAAACT
TCCTTGATAT
GTTTCTTTrCC
ATGCTCGCTT
CTAAAAGGTA
TGTAAAAACC
TGG?1'AATAC
TTTCAACCAA
TTTTCAGAA
AGGGGCTTCA
GrCTrTAAA
TCGTAGGCGA
TTCTGAAAAA
GGAAAAAGCT
AAACTTGTTA
ACGATTACTT
ATGAAAAAAG
ATTCTTCCCT
TTGATTACTG
GGTGT1'TCTT
TI'TCTAAGA
CGACTGA'TTT
AGCTGTTCGT
CGCTGATTAA
ATCCCTTCCG
TAGTTGTATA
AATACTTTAT
TCTGACGATT
CCAACTATCA
AAAAGAAAGC
AAATGGAACT CCGGACCAGG CGCA'rAGATA TCCTTTTTTC TTAGCACGTTr TC'rCATAAAA AACCTAAAAA GAGAAGAACA ATTCATCTAC TTTATTCCAG 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5558 CGTTGCGGTA TTTCACGTAG TAAGCATGTT CCCAAACGTC TACCTTCTGA GATTGGTGTG TCTTGG'rTTG CTGTTGAAGT TGTTGACAAC CAACCATGCC C GCTTGGAA 'rTCTTCAAAT AAGGAGCTGT TTTCTCGGGA.
CACCATTGTT GATAAGTGC'r AGGCTTCAAG GTCTTCACCG- CATAAGTTTG ATGGTGTT CAACCTGAAC CAAAACGAGT GAACCAAATG TGCATCGAT GTCATCAATT CCCAGAAAAG TGACGGATAT CAGCTGGGAT ATTTCAGGGT GTTTTTCTAA INFORMATION FOR SEQ ID NO: 104: SEQUENCE CHARACTERISTICS: LENGTH: 6735 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ 10 NO: 104: GGAATTGTAA ATATCATATT GTTrTTTGCAC CCAAATATCG TCGTCAAATC AITTATGGCA GATACAAAGC TAGTATCGGA AGAATCATAC GTGACTTATG TGAGCGTAAG GGTGTAATAA TCCATGAAGC GAATGCTTGT TCAGACCATA TTCACATGCT TATCAGTATT CCTCCGAAAC TTAGI'GTTTC GTCCTTTATG GGCTATTTAA AGCATGCGAA ?TTAAAATAC AAATATGGCA TAGATACGGT AGGCCGTAAT CAGAAAGTGA AAGACAGAGT AGCAGACCAG CTCACGTTAT 774 AGGGCAAC2AG
ATCGCAAAGTT
TAGCTGAATA
TCGAGTCACT
GCACCTGCTC
GGTAGAAGCG
AAAAACTTGA
CAGTTIrGATG ATrTGATA TTGGTGTAGA GGCTATTATG TATTCAGAAT CAATTACAAG ACA'CCGT'rr ACTGGCGAAA GGGAAAGTGG TGCGCGAGGA CTT-ATAGCC GCAGAACAAA TACATAAAAA TAAAAGTCTA
TAAATAAGAG
AGCTATTTCG
CCACCAG?1'C GAAGTAACTA AGGTGCTTTA GTGGGCCTTT GGCCCTGGCC ACACTGGTGG ?1-rTGATTTA TATAAAGGAT GGTAAAATTC CTGTTGTCCG TATCGTCTAT ACTTNTCTT AGGAGAAAGC GAGGATCATG ATCTGCCCCA AAAGCAAATA TATGCCAAAA TTGAACGGGG TGAGCATGCG GAT?1'CTATG ACGTCAGTAC AGACTATTTA CGCTTTAGAA AATAATCTCC TCAATTTCAT TGCCCTr'rGA CAACTGAATA GCCTAAAATG ATGGCTCGCC ATGATAAGAG CGATTTTAAA TGCCATGATA CAPLATGATAT ACAATGATAC AGCAGCAAGT GAAA'rTCTTA TGATGACTTC TGTTAGATAA ACGCAATTAA 'rCCTCAAAAG CA'rCACGTGG AGTGTG'rAAG CTTGTTGCTA AATAGACTTT CTGCGAAACA AAAATATAAT AACAATTGAG.CGATAGCCGT TTCAAGATCC AGATGTTAGC TGTGTTAAAA ACAGCTTATC GCAAATTAAG CCTACACCAT CTCCTTATGG CTTATGAACA AAT'TGCGGCT GATTr-rGGCA AATGGTTGA AGCAACTCTT ATTCAAAATG TG'rAAAAACA GTAAAA'rTCG AAGGATTGTA GGTATAATAG CAATCAAAAC TAGAAAATAA CTAGTAGAGT GGTGATACTA TGAAGATTAG GA'rTCCCTAC TT-GCTT'rrAT CTATTTTGGG ATTTGGACAA 'rATCCTAAAT AG'N'ACAATA TAGATGTACA GACGTTTGAG AGAT -rGAGG GCTACAATAC TTTCGTTTAC AAATTCAGCT 'rTGACGGCTG ATGTATrGGT TAAACTCTCA TrGGGATTAA CTGA'rT'CC TCATAAAATT AGAGTTTGAA AATGAGTGAG ATTTTTTATT~ GTACTTr'CCT CATTrGTGGA GCAAATTTGA ATCATCAATA AAATAGAGCG ATACTTTATA TTCTGACCGT TCAGCCTGCC AACGTAAAAG 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 ATCAGTCATG CC-ACGTTGAA TGTGTGAGTT 1200 GTTCCCCGAA CCTTTTGAGT TCTACAGACG 1260 AAAGCGTAAA AACCTTGCAA CGAAAGGAAT 1320 ACAATAAAAC TATGAATGAT GAAGCAAGTA 1380 TTGTAGGTGT TCAGCCCACG ACT'rTrGAAG 1440 AACGTAAACG CGCAAAAGGT GGACGAAAAA 1500 TAACTAT'rCA ATACATGCGA GAATAGAGCA 1560 TTCACGAAAG CAACTTAATC CGTCGGAGTC 1.620 GTTTTAcG.AT TTCAAATTCT GCCTTAATTC 1680 AGGTAAGAGT TTrrTCTTT CTGAAAAAAT 1740 AACGGAATTT GGAACAGATT TGTCTGTATC 1800 TAAGAGGCAC TTATTAAATT ATTCCATCTT 1860 CTTGATTGTG GTCTATTCGA CCACCAGTGC 1920 TATTTTAATT GAAGAAGGCA AGAGCGCCTT GCAGTTGGTr CGAAACCAAG GAATCTTTTG 18 1980 775 GAITGTTAGT TTGATACTGA TTGCCTTAAT TTA'AAA'rrG AGACTAGATT TTTGAGAAA TGAGcGAcTA ATcATrTAG TTATATTAAT AGAAATGCTT TTATTGTrcT TGGcTCGTr-r TATTGGTAI-r TCCGThAACG GGGCA'rACGG TT=GTTTCG GTTrGCAGGAA TAACrATrCA GCCAGCTGAG TACrrAAAAA TCArrATI-AT 
T1-GCTA'TwrTA GCTCACCGAT TCTCCAAACA GCAAGAAGAA ATAGcTACTT ATGAT1-rTCA AGTrTr.ACT CAAAATCAAT GGCTTrCCCCG TGCTTTTAAT GATTGGCGAT TCGTTCTCCT TGATTTAGGA AATGCGACTA TTTTAGTCT'r AATCGCTTAT CGCTGGTTT CAACCATTCT CT'rGACCACT ATCAGCCTAA TCGGTGT'rGA TGTAGCCAAG CGCTTT1AGTG CCTTTTAA AGTTCTGATT GGAAGTTTGG GAATTTTCCC GGmTCCTTG ATTATG'rATA CAGTTAGTGG GGCGCTCGTA TCT~GCCGCTT CTGTCTGT GACCT'rTTCA AAAATTCCAG TATTCGGCTA TCCTTTTGCC CATCGTGCTG ATGCAGGTCA a a a a a a a. a.
a CCAGTTAGCT AATTCT'rATT AAACTCGATT GAAAAACGAG CGTGATTGAA GAAr'rTGGCT GATTTTGCGG ATTATCTTGG ACTCGGTGTC GGAGGGATGA CTTGATTCCA .TCTACAGGAG AGTCTTATCA GTGGCAGTAG ATTGTACCGA GAATTGGAAA GATAGTTTAT GTCTCr'?CAA AAGTCTTGAT TCTAACAGAA CCTTTCAAAA AATAATACAG ACCGTCTAGT GACTAGCTTA TCTTGCCTCT TTTGATTAAT AAAATAATAT TGATCAGGAC AAAAGGAAAA TGCCGTTGAG ATCCAACACA AGTGCAACGC TGCGTAAATA CCG'rGATGTT 'rrGCCATGGT CAATGGCGGT TGTrGTC TAGGTCTTGG GTTATTTGCC AGAAGCTCAT ACAG.ACTTTG TCTTTTCTAT TTGTTGGTGC CAGTCTTATT TTAGCTCTCT TGTTTTTCAT
TCGGTATCCG
TGTTGGTTCA
TGACTT'rCcC
CCTTTGTCTT
ATCAACCAAT
AAATTAGAAA
AGCGGAGAAT CCTTTCALATG GGTATTTC;TC AATATCGGAG
CTTCTTATCC
AAATATrGAT GAACC'rTCTG
ATTATAGTAA
CAGC-GTGGAA
GCCAGTGAA
TTGA.AGTAGG
TAAA.AGTGTT
CCATGGTTGC
GGATTTCGG
ATAGTCI-rCT
AACGCGCTAA
ATAA.AGAAAG
GTGCAAGAAG
2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3 360' 3420 3480 3540 3600 3660 3720 TTACTGGAAG A'rATTAC'rAA AAATATGCTT CCCCCAGAGA TTGAAAGAAT TATCAACGCA GGAAGATTAT CAAGGTCTAA
TCAAATGATG
ATI-rCAGAGG
TATTTAGGTA
ATCCTAGAAC
AAAAGTATCT
AAATCGTCTA
ATGTGGATTT
AATTATCTAC
ACTTGAATGT
TGGATTTAAC
TATTTCACGC
AGCTTATGAA
AACGATTAAA
TGTCCCTGTT
AA.ATCA'rATT
TATTTCTCTA
ATCAATCATC
TTGGTAGCAG
TTGACAGCCC
CATAGTCTTT
TACAATGAT'r
AAATTAAAAG
A.AAGCTGTAC
AAGTTGGGGT TGATCAATAA AGATAAATCG TGCGTCGTTA CATCGAAATT ATCATGCAGA CAGACATGAT TCGTGAGAAA TGACTAACGA AATCACCAAT GC'rATGGAAT ATTATAACAG CTCCTT'rPG 776 CTCATTTGAC GACGGAGTAT AAGCGCTTAG CGCAGCG TGG'rCrGAA'r TTAAAACAGG CTAAACCAAT CACCATGCGT ATGTGGATAG GTGGTGACCG TGATGGAAAT CCATTTGTTA CAGvCAAAGAC CTTGAACCAG TCTGCACTCA CTCAGTGTG.A AGTCATCATG AACTACTATG ATAAAAAGAT TTACCAACTT TATCGTGAAT ?r"TCTCI-rTC AACTAGCA'rT GTCAACGTCA GCAAGCAAGT CAGAGAAATG GCTCGTCAAT CCAAGGATAA CTCG-ATTTAC CGCGAAAAAG AGCTTTACCG TCGTGCCTTG TTGATATTC AATCAAAAAT TCAGGCAACT AAAACCTATC TGATTGAGGA TGAAGAAGTT GGGACTCGTT ATGAAACCCC CAATGATTTC TACAAGGATT TCATT-GCCAT TCGAGATTCT CTACTAGAAA ATAAGGGCGA CTCCTTCATT TCAGGTGATT TrTGGAAT'r A'rTGCAGGCA GTAGAGATAT TTGGTTTTA CTTAGCATCA ATTGATA'rGC GACAAGACTC 'rAGCGTCTAT GAAGCCTGTG A'rTCTCGTTA TAGCGAGTTG AGCGAAGAAG AAGAAGATCC CCGAATTCTT TCTGCGACTC AATTAGCTAT TTTTAAGACG GCTCGTGTTT GTCAGACCAT CATTTCACAT GCAACCAGCC TAAAAGAAGT AGGACTGGTG GATACGGAAA AAACAATTGA AGACTTGGAT CATTCAGAGG TTGCCAAAAA ATGGATTGAC TCACGAAATA ACAGTAATAA AGATGGCGGT TACTTGTCAT TGGCAGAACT CTTGAAATCA GCAGGAATTC AAAAGTGTGA CCTTCTCTTG; AAAGAATTAG ACGCACAAA.AA ATCAGAATTA TTAGCAAAAG TGAAAGATAA GTTGGGAGAT GATGTCATCC TTTCTGATA'r GCTAGAATTA GCTATTCTGT GGGCGCGTGT 'rCAGATTGTT CCCCTTTTTG AAACAATGAG AAAATATCT'r TCTCTTAGCC ACTACCAAGA AATC-ATGCTT GGCTACTCTG CATGTTGGAC CCTCTACAAG CTCAACAAC 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520
AATTGACTGC
GTACTGTCGG
CTATCAAGGA
ACAAAGACGC
TACTCAGAA
TAGTGGACCG
ATTATTTCTT
CAGCCGCTCG
CATGGTCACA
AATTTATCAA
CTTTCT'rCCA
TTGCTTTTGA
TATTGGAGAT GAATTTCGCG TTAAGGTTAC CTTCTTCCAT GGTCCTGGTG 'rCGTGGTGGT GGGCCAACCT ATCAAGCCAT TACATCTCAA CCGCTCAAGT TCGTATCCGC TTIGACGGAGC AGGGTGAAGT AATTGGGAAT AAATACGGTA CGCTTACTAT AACCTTGAAA TGCTAGTATC GGCAGCTATT AACCGTATGA GAAGAGCGAT ACCAATACCC CAAATCGTTA TGAA.ACCATT ATGGATCAAG TAGTTACGAT ATCTACCGTG ATTTGGTCTT TGGTAATGAG CATTTCTATG CGAGTCAAGT CCAATCAAGG CTATTTCAAG TTTTAATATT GGTTCTCGTC TAAGACTATT ACTGAAATCG GTGGT'rTGCG TGCCA'rCCCT TGGGTATTCT GAGTCGTGTT ATGTTCCCTG rAAAAATCCA GAGAATATTG ATCGCTrCTT TCAAATGTTG ATATCCTAAA CTTTG.TGAAG
GATGGTACGG
CTATCTTACG
ATATCGTTTT
ACGAGCAAGT
CGT'rGGTTCA AGCr'rCAAGG AGATATGTAC CAAAATTGGC GTCAAAATCA AATATGAATA TAAGGCCATC TA'rCAGACTA 777 TTTAAATGA ATGGCAAGTP ACTAAGAACG TrATCI-IGGC TATrGAAGGA CATGACGAAC TCTTAGC1!GA CAATCCATAT CTAAAAGCTA GTCTGGATTA CCGATGCCT TACTrTAATA ?1'CTCAACTA TA'rTCAGTTIG GAG?1'GATTA AACGCCAACG T4CGTGGAGAA TTG'rCCAcG ATCAAGAACG A'?rGATTCAT ATCACCATCA ACGGAATTGC GACAGCATTG GNTGATAATT TTCAAGAGTG AATGCTAAAA TGACAAGTAG TI-rAAAAATG ATATAATTTA TTAGAGAGTC 'rGTGGrAGCT GAAAACAGAT GTGAATATCA AAAAAAT-rCT ACCATTCAGA AAACTAATCA AAGTGGCAAT GATGAAAATT CGTAATTCAG 5760 AATAGACTAT 5820 TACAAACTTT 5880 GGGCTGAATG 5940 ATCTCGTTAT 6000 ATCGACGTCC 6060 AGAAAACGAT 6120 AAACAGAATG 6180 *fl 0 0* 0 0 *0*0 *0 0 0 0 CTATTTAGAA 11 TAAATTA TOCOAGACTO AGCGATAGGG TCACACAAGT TTrTTTGTGTG GGCGTCGCI'T GTTTAGATAA AAAAATAAAC GTTTAATTGG ATI'TATTCTT CAATGAATAA AAGATTGGTG TGCTTCAATT CAAGATGGAC TTGCAGAAGA TCAGAAGGTG ACCAAAGTAA GACCTrGTGG TTGGTATCGC CTACCGGTTA TCATGGCCGC TAAAAATTCG GTAAGCACAC AAATTCCCTA TAATTGAGGT AGGATT'TTTr TGATGGAGGT GTGAAATATC TTAAAGGAAA AATTATTGCT GCATTAGCAG ATCAGAAGCT CAGAATAATA TGTGAGCCAT CCATCCCTrG AGGATATAAA GATGATCAAC GGTTGCGACA ATGAGTAA.AC AACACCAGCA GCCCAAGGGT TATTACAGAC CCAATTGGTG
CTTACAGTGC
GGTACCGCGC
TACTATGGAA
TAAAAAGGAG
TCTTAGTAGC AGGAAGCTTG AGGATGAGAA GAAAATAACC ATTTGATTTA TAXAGGGATC TTAAAATTGA TTTTATGAAC A.ATTGGTrGC AAATGGGAAT TGGCTAGTGC AACAAAAGAC CTAACTTGGT TAAAGATTTG ATCCACCTCA ACA.ACAAGTT AAAAAACCAG GTGGCAACGT TACAGGGGTA TCTGACCACA GAACTCATCA AGGCTCTGAC ACCGAATGTG AAAACAATCG GAGCTCTTTA CTCAAGTAGC GAACACAATT CAAAA INFORMfATION FOR SEQ ID NO: 105: SEQUENCE CHARACTERISTICS: LENGTH: 6516 base pairs B) TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105: CTAGAGGATC CCAGCAGGTA AATTGGCTTC AGCTGGCAA.A AAAGTTGCCC TCGTTGAACG CAGCAAGGCT ATGTACGGTG GAACTTGTAT CAACATTGGT TGTATCCCAA CTAAAACCT 6240 6300 6360 6420 6480 6540 6600 6660 6720 6735 120 779 GCTAGTTGCT GCrGAAAAGG ACTTGTCTTT TGAAGAAGTC ATTGCTACTA AAAACACGAT CACTGGTCGC CTCAACGGTA AAAACTATGC GACTGTTGCr CGTACAGGCG TGATGCGGAA GCTCACTTCC IrCAAATAA AGTCA'rCGAA ATCCAAGCTG GAA6AGAACTG Ac-rGcTCAAA cAATCGTcAT cAAcAcTGGT GcTG~TcAA AATCCCTGGA CTTGCTACAA GCAAAAACAT CTT-rCACTCA ACAGGTATCC
TAGATATCTT
GTGATGAAAA
ACGTCTTGCC
AAAGCTTGGA
AATTrGCCGG
CATTCCTACC
CAAATTACCT GAAAAAcrG GAATCCTTGG TGGCGGAAAT CCITrACAAC
TCGTGCAGAA
ATTCCTTCAA
AACTGAACAC
TGTAGAACCA
ATCGGTCTTG
GCCTTGGATA AAACT'rGGAA GCAAGGTCAC AGTCCTAGAT CCTTCCATCG CAGCTCTTGC TAAACAATAC AATATCCATA CTACTGAAAT CAAAAACGAT GAAACTTACC GTTTCGACGC CCTTCTCTAC CTTCAACr AAAATACAGA TA'rTGAACTA ATGGAAGAAG ATGGCATTGA GGTGACCAAG TGCTTGTCGT GCAACTGGAC GCAAACCAAA ACTGAACGTG GTGCTATTAA GCAGTTGCAG ATGTrAACGG GTTCTTACA GCTACCTTGC CCAAATACTA TGTTCA'rCAC 4
AGTAGACAAA CACTGTCAAA CAAACGTTCC TGGTGTCTrT TGGCCTTCAA TTTACT'rACA TTTCACTTGA TGACTTCCGT
TGGAGATGGC AGCTATACAC ACCTGCACTT TCACAAGTTG CGCTG'rTAAG GAAATCCCCG CdGTGCC'rTC AAAGCTGTTG CTCAGAAGGT TCTCAAGAAA TTACACTTAC TTCACAAAAC CTTGTTTGCG ATTTAAGTTG ACTTCTGCGG AATCTCAAAT TTAAAAACTA CTTTGGGCCT GGACGTGGAA GTACGCTCC AACCTAGCTA AGAACTACGA ACTGCCCTCA CTGTCGGCAT GGTGCCAACA TGAT'rACAGG GGGAAAAATA TTGCCGTCCT ATCCAGCCTA GTCTTTTTGT GAAATCTATA CTACCTATAA GTTTrGACTGA AACCCAAGCA GCTGA'rTGA AACT'rCCATA TTGCAGCA6AT GCCTCGTGGT CACGTAA.ATG GAGACCTTCG TCAATACTGA AACAAAAGA A ATTCTTGGAG'CAAGCATCTT TCATCKACAT CATCACTGTT GCTATGGACA ACAAGATTCC AAATCTTCAC TCACCCAACC TTGGCTGAGA ACTTGAATGA AGATT'rAATC GTATCGAACA GCCCTCTTTG GGCTGT'TTTT CTGTCTTTCT CCTCrTTTAT GATATAATAG AAACATGAAC TCTTGCTGGG CGTTCTTCCC ACTTCGTTTT AAGCCGTCTT TTGAAGACCG TCTCA-ATGTG 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920
AGGGAAAGTC
GATGCTCGTT
TTTAAAAGAG
GATTGCAACA
CGAAATTGAC
GCCCTTCAAT TTGATAAAGA TATTTTACAA GTCACTGGAA CAAATGGAAA AACCCTGACA GTTTATGGTC AAGTTCTAAC CAACCCAAGC ACCTTCCTAA CAGCCAAATC TTCTAAA.ACT GAAGCCAGTC TATCTCCTAT CTGTGACTAT CATTACTAAT ATCTTCCGTG ACCAGATGGA CCGTTTCGGT CATGATATTG GATGCCATTC GGAAAGTTCC AACTGCTACT GTTCTCCTTA ACGGAGACAG TCCACTTT'TC TACAAGCCAA CTATT-CCAAA CCCTATAGAG
779 TM-ITTGGT.-T TTGACTTGGA AAAGGGACCA GCCCAACTGG ATTCTCTGrC CTGACTGCCA AGGCATCCTC AAATATGAGC GGTGCCTATA TCTGTGAAGG TGTGGATGT AAACGTCCTr.
AAACTGG'rTG AGTTGACCAA CAATCGCTCT CGCTTTGTCA ATCCAAATCG GCGGGCT~rA TAATA'rCTAT AACCCCCTAG 'rTCCTAGGTG CCGATTCGCA ACTCATCAAA CAGGGATTG GGACGCCAAG AAACCTTTCA TATCCGTGAC AAGCAATGTA CCAGTrCGGTG CAACCCAAGC TATCGAAATG ATCAAACTAG TCTGTCCTCC TTAATGCCAA CTATGCAGA'r GGAATTGACA CTCACTACAA TACCGAAGGG ATAATACCTA TCAAACr-rG ATCTCGACTA TCGTTTGACA TrGCGGCCA AGAATACGGT CTGCTGTGGC CA'rCGCCCG'r- ACAALGAGCCG 'rGCTGTC"=r- CCCTTGTC?1' GATTAAAAAT CACCTTATCC ATTTAGCCTA CTAGCTGGA'r C'TGGGATGCA GACTTTGAAC AAATCACTGA CATGGACATT TCTGAAATCG CTCCTCGCCT CCGAGTGACT AGTAATCTGG AGCAAGTTCT CAAGACCA'N' CTGGCAACTT ATACTOCCAT AGAAAGGAGA TGAACTAATG ATCAGCTCA.A CATTGCCCAC CATCCTCATC CTCAAGTATG TTCTCTCCZAT GATGACTTTG AGACTTTGAA CAAAGTATCA CTACATCCAA AACGACGGTG ATATrATGTT GAAGCTTCAG GCTCAACCAG ACCA6ATAACC
GCTCCGAATTT
GTTTATACTT
CTCTACGGAA
TGGCTGAAAA
ATGAAAATCA
CCTGAAATCA ACGCTGGCGG GGCTATCCAG CTGAGAAAAT GAGAATCAAG ACTGCAAGCA CGTGAACTGC TGGCTAGTCC CACTTTCCTC AAAACATGGC ATCTCATGAA TAC tACGGGG AC-TGGGAGCC CATGTGACCG CTACGACATC GCCT'T'rCG
TGTTCGTCAT
CACTGAAACG
TGCCTATATT
TCAGATTGTT
AATTACCCCT
ACAATGGAAA
'rTGACATCGT
GTGGTGGTCA
GCATTGACAA
1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 TTGCAGACGA CCTACCTGCT AAAAA.AGAGA TAGTTCTGGC TATCTGCGGT GAAAACGTAT CGAAGGGCTA GTTTTATCGG TGACATCAAG .000 TGAAACCTAC TATGGATTTG AAAATCACCA AGGTCGTACC ACCGCTGGGA CAGG;TTGTCT ATGGAA6ATGG AAACAACGAA TCAT'rATAAG AATGTCTTTG GTTCCTACTT CCACGCGCCT TCTGGCTTAT CGCCTAGTTA CTACTGCCCT CAAGAAGAAA.
GGTTTCCA6AC TATTGGGTCA GGGGTCATGG GACACTACAC ATTCACAATG AAGATTTCGA TTCCTCTCTG ATGACCAAA6A GAAAAkGGTCG GTGAAGGGGT ATCCTCTCTC GTAATGCCAA TATGGTCAGG ACATCCA6ACT CCCTGCCTAT GAGGACATTC TCAGCCAAGA AATCGCTGAA GAGTACAGTG ACGTCAAAAG CAAGGCTGAC TTTTCTTAAA CAAAGGAAAA TGATATCAAA GAACTCCGTT ATCTTGTCGG AGTTTTTTGT CTTTTCTTTT ACCCTTCTCC CTTGCATTTT C'rCTCATTTT TTGCCAAAAT AGAGGGGTAG AAAGAAGGTA GCATATGTCT AAATTACAAC AAATCCTAAC ATATCTTGA-A 780 TCAGAAAAAC TAGACGTCGC TGrCGTATCr GACCCCGTCA CAATCAATTA CCTCACTGGT r-rACAGTG ATCCCCATGA ACGCCAAATG TTCCTCTTT1G TCCTAGCAGA TCAGGAACCT CCCTCTTTG TCCCAGCTCT TGAAMTAGAA CGTGCAAGTA GCACCGTTC CTTCCCAGTA GTGGGCTATG TCGATTCI'GA AAATCCATGG CAAAAAATCA AACATC-CTCT TCCACAACTT GACTTCAAAC GTGTCGCTCT TGAGTTTGAC AATCTCATCT TGACCAAATA CCATGGTTTG AAAACAGTTT TTGAGACTrGC TGAGTTT~GAC AACCTCACTC CTCGTA'rCCA ACGCATGCC CTCATCAAAT CAGCTGATGA AGTGCAAAAA ATGATGGTTG CAGGTCTrA TGCTGACAAG GCTGTTCATG TTCGTTTrTGA CAATATTTCT CAAATCGACT TTCCCATGAA ACGTGAAGGT ACTGGTGArA ArCCTGCGAA TCCACACGGC GCTCTTCTCC TCTTTGACCT GCG'rGTTCTG ACAGTCGCTG TCGGCAAACC AGACCAATTC GCCCAACAAG CTGCTCTTGA CTTTATCAAG GCTGCCCGTG AGGTCATCGA AAAAGCTGGT CA'rGGTATCG GTATGGATGT CCATGAATTC ATCGAAGAAG GCATG'rGCTT CTCTGTTGAA GTTCG'IATTG AAGACTGCGG TGTTGTTACC AGCAAAGATT TGCT'ITATTT TGATTAAACT TCTAGGGGCT ATTTTATTG'r CATTrT'1CTG TCTAACCCTA AGTGTCTGGA ATGATAACGA GATGAAACCC CAAGAAACAA CAATGGAAAT GATGTTAGAA AAAGTTCAA'r TCACTAGAA ATAGTATCAG GTATTGTGTA CTGACCCCAA TTAGTTCTGT ACTGCACAGG ACTAAGTCCT TAGTAATCAA TATAG'rCTA'r AATGACTTGT CTTGATAAGA CTGAGACAGA TATCATCGCA TATGAAATGA GCTTTGATAC CATGGTCTTG ATTCCAGC.AG CTAATAAGGT TGAAAATGAT GTCAATGGCT ATGCGTCAGA TATGACTCGT AAGAAAGATA TTTACAACTT GACTCTTGAA CCAGGTGTGA CTGCrCATGA AGTGGACCGC TATGGTGAGT ACTTCAACCA CCGTCTCGGG CCATCTATCA TCGAAGGAAA CGACATGGTC CCAGGTATCT ATA'rCCCTGG TAAAGTCGGT AAGGATGGCT TCAACCTCI' TACAAGCACC ATATAGCCCC TATGCTII'CC TTTCAAAATA CTATTATGCT AAAGAAATTG GCTGCAATAA GGGTGCTCTC. 
CGCTTTT'ATC AAAGACAAGG GATAATTGAT TAAGAAGTCA TCTATCAAAA AATGAGGAAA ATCTCCCCAC AATAAAACGC ACAGTI'AGAC AATTAATTTA TCCGAAGGAT TTT'AGTTTTA CCTTAAT'rCG TT1TGTTGTTG TCCAATTGGT TAAGTGATTT AAATGTTTC 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 51.60 5220 5280 5340 5400 5460 .TCATAGCCAT AAAACATTTC GGATTTTAAA ATGCCZAAACA AAGATTCCA'r CATACCGTT'G TCTTGGCTGT TTCCCTTGCG 'rCACATAGAT GCTTGAATTC CCTTATTCTC TACGAACCGA TGATAAGAAT CGTGTTGG'rA TTGCCAGCCT TCGTCACTAT GGAGAATCGT ATTCTCGTAG TGCTTCTCTT TGAATGCCTG TTCCAACATT GTTTGTACTT ATTCTAAATT AGGCGAACAA GAAAGA'IrAA AAGCAATAAT TTCGCTGTTA AAGCCATCTA AAACTGGTGA TAAGTAAAGC 781 TTTGAGTAC TTGCTGGAAT GGCAAA7TCA GTCACATCTG TG;TAGCACTT TTCCATTGTT TTAGAGCCTr CAAATTGGGC TTGAATGAGA TTCTCTGCCT TC~rrACCAAC GTCTCCTrTA TGAGAAGAAT ATrrTrCGTTT CTTTCGCATI' TrAGCTTGTA AA?1'GAGTAC TTTCATCAAG CCTrGAACTC TTTTATGAI-T TACCAGATAA CCACGATTTC TTAGTrCTAA ATGAACCCGG CGATAAGCAT AATTTCCCI-r GTGTrCGATA AAGATGGATT GAATTTCAGT TTrAAGCTCT TGGTCTTTAT CTG'TTrTGTC TAGCTGTTTC AAGTGATAGT AGTAGGTCCA ACGAGCTAGT TTAATGGCTT CTAGAAGAAG ATCrAACGAA AACTCAGTCA TTAATTCTTG AACAATTTCT 5520 5580 5640 5700 5760 5820 5880
GTCTTrCTTC TTTCTC7 rLT CATTCTCCGC TCTCAGGTAC CTGTATTGTG CTAGCCAGTT ACTCTATCTT TAGTCCAGCC TCTAAAAACC ATCCAGAATC CAATGGGTGT TTTTTACTAG AAACACAAAA AGAAAGGAAA GTCTTTTTAC CAACTATCTT TTTTCAGGAA CTTTTTTCCC GAATGAmCAA CGCCGCTACT TCAACTGTTA ACAGGTTATG
TCCTCCTTCA
TC'TCCCTCTT G'TrTTCTCAA CAATAGrATA AAGAAGTATC GTACGACTT-G GGAGACCGTA TTCATGTCAG ACTTTATITAA CCCCAATTAT CTTGCCTTAG CTTAGATCCT GGATGGTTTC ACAAAAAAGA, GTTTCCCCTT TATGGTATAA CrCACATGAA CAGTTTACCA AATCATCACT TCGATGGAGG TCATTrTAACC CAGTATGGTG AGTrGAAACT AAAAGAGCGG ATTTCTAAGT GTCGTTATTC GGATTCAGAT ATCC-TGTCC GAACGGAATA TGCTTG ATCGGAG'rTC TCTTAACTI-r TTTAGGATGG 5940 CCCGTTrTC 6000 TTCAAGAGAA 6060 TCACCCCAAA 6120 TTTTTTCACC 6180 GTGTAGAA-AA 6240 TCCAAAACAA 6300 GTCTTATCTT 6360 .TTTAGTAAC 6420 AGTTCCTCTT 6480 6516 INFORMATION FOR SEQ ID NO: 106: SEQUENCE CHARACTERISTICS: LENGTH: 14654 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear fee.t~ 9-05 so S (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: TTTTCAACCC ATATCGTGGC TCCTGAATAC TACTTACTGA CA.ACTATGCT ATCAGAGACT TCTCTACTTG TTTTCTATAT CATTTTCATC CATAGAAAAC AACTCATCCA CTTGGGACAT ATCTTTAOCT ATACTGTTCG ATACTCTCTC TTTTCACT-r CCTTTGTAGC AATTTATTTC CTGATTAATT TCGTGTATCC TGTAGATATG GTCATTAATT TGCCATTTTT GATTAATACT GGTTTGATTG TCTTGCTATC AGCTATCTCT TATATTAGTC TAC1 TGTCTT CACAAAAGAT 782t AGCATTTTCT ATGAATTTTT AAACCATGTC CTAGCCTTAA AAAATAA6ATT TAAAAAATCA TAGGAGTTTA:IAAATGAAACA ACTAACCGTT GALAGATGCCA AACAAAT'rGA ATTAGAAATT TTGGATTA'rA ?1'GATACTCT CTGTAAAAAG CACAATATCA ACTATATTAT TAACTACGGT ACTCTGATTG GGGCCGGTCG ACATGAGGGC TTTATCCCTT GGGACGACGA TATTGATCC TCCATGCCTA GAGAAGACTA CCAACGATTT ATTAACATTT TTCAAAAGGA AAAAAGCAAG TATAAGCTCC 'rATCCTTAGA AACTGATAAG AACTACTTTA ACAAC'TTTAT cAAGATAACC GACAGTACGA CTAAAATTA'r TGATACTCGA AATACAAAAA CCTATGAGTC TGGTATCM~ ATCGATATTr TCCCTATAGA TCGCTTTGAT GATCCTAAGG TCA7TGATAC TTG=rATAAA CTGGAAAGCT TCAAACTGCT GTCTTTCAGT AAACATAAAA ATATTGTCTA TAAGGATAGC Cr'rTTAAAAG ATTGGATACG AACAGCCTTC TGGTTACTCC TTCGACCGGT TTCTCCTCGT TATTTTGCAA ATAAAATCGA GAAAGAAA'N' CAAAAATATA GTCGrGAAAA TGGGCAATAT ATGGCT'TTA TCCCTTCAAA A'I-I-AAGGAA AAGGAAGTCT TCCCAAGTC TACCT'rrGAT AAAACAATCG ATTTACCCTT TGAGAATTTA AGCCTTCCTG CACCTGAAAA ATTTGATACT A7I'=ACAC AATTATCG AGTCATGAAT TTCACGCTTA AATTAAAGAA ATTCAACTAG TATTCCTTAT 
TTTCTCAGTT TCCTGGGAT GATGATATTG GATTATTGAA GAAGAAAATC GTACTTCCAT AATTTCGCAT AGATTATATG ACCCTACCAC CAGAAGAAAA ACGCTTCTAC TAAATTGGAG CATTACATG CAATATTTAG AAAAAAAAGA CCCTGCTGGA CTATATTGAT GAGACT'rG'A AGAAACATGA ATGGAACCAT GCTTGGAGCC ATCCGCCACA AAGGTATGAT ATATTTCCCT TTATCGTGAG GATTATGAGC GTTTACTGAA ACCCTCGCTA CAAGGTTCTT TCCTACGATA CATCT'rCTTG 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 CGATTTTGGA CACTTCTACT 0 .0 GTACAAGCGT CATGATACCA GCCTTTTCAT CGATGTCTTC CTTGAGCATT GTrCGACA6AGA GCTATA6AGTrA TGTGGCTCTT AAAATCACGA GCAGTTCACG GTGATAGCAA ACTAAAAGAT GTACGCTCTC CGATTTGTCA ATCCTCGCTA CTTTTACAAG AAATGCTGTA ACCAACACTC CTCAATA'rGA AGGAGGAGTT GAAAGA.ATC rTCCCAGTTG ATAccTT'rAA AGAAcTGA'r'r GTTATAGAAG ACCATGTTAA CCAATTGATC GATTTACAGA CGTCA6ACTAG CTTATATCAA TTTCTTAGAT TATGTAGCTG AAPAATTGATC AACTAGTCAA GGGATCGGTA AGGAAGGGAT TTA6ACTGAGT TTGAGGGCCG CAGATGTATG GCGATTATAT TATGT'rGCCT GTTCCCAAAA AATATGACCA ATTTTTAACC GACACCACCA TCAAAAGAAA TGCAAGAGTG GTATAGTCAT AGCATTAAAG CTTATCGCAA AAACTGATTG AGGGGGATTA TACAPLACTAC rA-AGATAGAG GTTATTCAAA AACATAA'I-r TAGTAGAAAA TGAAATACAT ATTCCCACAA TAAAACGCAT CATATCAAGG rT?1'GAAAA ACCTTGATAT GATGCG1TTTr ATAATTTTAA ATGCCAACAA ATCAATTAGA AAATTCAAAT GTACTGTTCT AAATTCAGTC AAGCTACGAC TTTCAAGTAC GGCACCCCGT GTTCAATGGC TTTGT'rCCTA CTGTGTTAAT TCCTTATCCT CATCACCAAT AAGGCTGCTG TCCCTTCTGT AAGTTCAAGG CTTCTTCTTT CGCAAGTGTA GATAAGAAGC GAACCCATAA CTTCACCTGT AGTTTGGTAA AGGAGAAGAC CCATTTTGGT AGCCAAGTTC GCCATAGGAA TATTGGTTAC ACCTCAA'rAA CGTAGACTTI' CAGTGAAGAC CGATTGCTAA GACAAGGTT GTGGTGGGTA ATATGC'rCCA TGATACCAGG CACTCTTGCC CAACGATATA GCAGTTCGCA TGTAAGAACG CCAAGTACA'r AAGATGGGCG GCTTCTTCTT CATTGGTAGC GCTTGCTCGA AGAGG'rCACC ATGTCACAC CTGCTTTTGC
TGCTATATCT
CTTAAGCATG
TGAZACGACGA
GATAGCTTGA
CTTACCAACA
CGCAAGGATT
GGCATCATCA
TTCAAAGGCT
TGACTTCA'N'
ACGTGCCITG
TGATAAAC'N'
783 AGACTTTTT CTATACTAGA TTGAAATAAG TAArrATAG AAATATTTTA GTATTCCTGT TATTTTTCTA ?I'rAAATCGC TTCTGTAAcA GCATTAGCTG TATCTAGCGC TGTGAAGAGG ATITGCTCAC CATCTI'CGTrC AGCAGTTCGT ATCTTCCTT TGCGTACAAA ACTTGGGATA GGTTGG~C~r GCAAGCCATG ACTAGCAAAG CCATAACCAA TGTTTTGGAA ACGACGAGCC GCGATGGTAA AGACGACATT ACCAAAAGTT 'rrATAGAGAG CrT=-CCAA AGTAGCATCA TCAGGACCGA GCAAGCTG;TC TACCTTAGCT ATATGAACAC GGGTGCTT'?C AGGGTAAAGT TGACCAAGAA TGAGTT'rCC? CGCTACTTGA CTTAGATAGG AATGGAACAG TACGGCTGGC ACGTGGATTG TTCATCCTTG ATAACAAACT GGATGTTCAT CATTCCAACG 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840
GCGTTTGGTG
AACAGCCATT
AATGAGTACA
AGAGTCGACA
TACGTCTGCGA TGGTCTCCTG AACCTTTTGC GAGTCACCTG AGTGGACACC.AGCACGTTCG TTTTTACCAT CTGAAATGGC ATCAACTTCG AGAACTGGGT GGTCTGGACT AGCCT'rAACA AAGGTCTTCT TCGTTTTA GACAAGAACT GGGAAGCCAA CGTTGTCCT GGTGGCTGTG GTCTTCGGCA CGATCTAGGT CAATGGCTCC CCAAGGTTGA CGATTTCCAT GGCACGTCCA TCTTGCGAGC TGCAAGAGCT GAATATCCAA TTCTTGAGA CAGCAAgCTG TGTACCAAGG TGGCTG=TG ACCACCGAAC TCATAACATC TTCGAATGTC TTGAAACCT CTCTGGTTT CCTTAACAGA GTGAACCGTT CI'GAACCTAG GACAAGTACA AGGTTGAATA GAAATA'rGGC
TGAACGATAA
AATGGCTCAA
GAGTTCATGA
CTCCCTTTGG TTGTTCCAAG TCAATGACGT AGTAA.AGCI' ATCTGATACA GAGAACTCTG TGATAGCTTC ATAACCACCT CCCTGGATAG GCGTAGTCA.A ACTCAACCCC T'rGACCGATACGGATTGGAC GATTCTTTAT CAGATCTGAT AGATTCAT'T TCCCAACCAT
GTTTCGGAGT
?TTCCAAGC
CGGTCTGAAA
CCCAATTCrr
TTTGTAAGCT
AAGAGACGGT
GCAAGTTCAG
784 CGAACTCTGC CGCACAAGTG TCTACCATCT GAAGTTGGCG AACTITATCA TCAG.TCGTTC AACCATrAAG TTTGGCTGTT~ TTCAAAACrr GCTCAATTTC AA6AGATATGC AAGAGTTTAT CTGCAATTTC TTCAGGTGTC TAGCCACGAC CATCTNGGGC =rrGACAACC TTTTCAATCA GTATT'rCATT GTGGTGCACC CCAATTTCAA TATAAACTGG AACAATCTTO CCCALGAGTTC AGCAATCTTA CTAAATCTTG TGGATGAGCA CAAGATAGAA GATATCAA'TT GATCGCTI'C TGATACGTAG AGAGATTCCT CGATGTTACG ACCGATTGCC ATGACTTCTC CCGAGACGGC GTTCACCCTT TTCAAACTrG TCAAATGGGA ACG'rAGTCAA GCGCTGGTTC AA.ACATGGCA TAGGTTGAAC
AGGCATCATC
GGGAGCGGCA
CAGTCGCCTT
AACGTGGAAT
CTGTAACTGG
TAGCAATCGG
TTACTTCGAT
AGAAACTGC'?
GGCCTTGAGA
CATTTCTGTA
CTTAGCAACT
GTTTATAACC
ATATCCTGTC
AACATAkATAC TCATCCAAGG 'TCAAACCTAC TGCAATCTTG GCTTTAGAAG CAAGGGCTGA CGAACGTGAT TTGAAGC'rGT TAGGATCAAG AGCTAGCTGA CGAATAATGC TCAAGCTCGC ATCACGAAGC
GCAGCCAACT
ACACCAGGGT
ACATTACATC CACCTTCAAT CTTGAGGGCA ATTTGGTT?'r CATAGTCTGA CATGGTI'TGC GCAGGGGCA.A ATACA.ATGGA ATCCCCTGTG TGAATCCCAA CTGGGTCAAA GTTTTCCATG 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 TTACAAACAA CCAAGGCATT AAAtCCZAA TCGAACGCTC TCAGTGATT'r CACGCAATTC GTAAAGGCTG GACGAACGAT TCTACTGTGT TAACAATN'C T'rAAAGAGGT CACGGTCCTC ACGCCAAGCT CGTCTAGGAT TGACCACCGA GTGI-rGGTAG AACTCAAGTd TAATCGGTTC GTTGCAGGAT TTGAGTTAAC GTCAGCTCAG TCACGCATCA CTTCGTATTC AATTTCCTrG AATCAAACA'r 'GGGTAACAG GTGACAATTT CAAACCA'r'rT TTTCTCGTTG GCACACATAC CACCACCAGT ACCACCAAGG GACTGGGTAG CCAATTGTCG CTGCAAAGGC AACTGC?1'CT AGATTCTGGA ATGGGTTGTT CAAG=~CTTC CATCAATTGT CGCTTGGTCA ATGGCAGATA ATTTGGTACC CAGAAGTTCA ACCATTTTTA GATAATTCCA TGGCC-ATGTT GAGACCTGTC CAAGGCATCT GGACCTTCCT TACGAAGAAT ACGTGTCACA AATGTAAACC TTGTCAGCAA TTTCCI-TGTC CGTCATGATG CAAAACAACC TC-ATAACCTT CCTCr'rTCAA CGACAAGCAA CCTGAGTCC CAGCGTAG'rC AAACTCAGCA GCCTGACCALA TAATAATCGG ACCAGAACCA ATCACCATAA TrTTGAAT ATCAGTACGT TTAGGCATAT ATAAGATATT AAGGGTGTCA AGCGGACAAA GCT AAAATAG GACT'rATGAC GAAGAACTGT r-AGTTCTAGG AATAACTATC 'DTTTTAGCAC CGTCCGTAGC CCGTATTCAG TTrCAGCAAA'r ACGGAGCACC CTTCTrCCT CTATTCGTCG CCTCTCAGGG CGACAT'rAA.A TAAGATACAA AGGACGAATA GAAACGATT 785 GAATTTTAGG AAATCAAGGA AGGATTGACA ATCCAAGTTG GT7CTCTAC ATrCTGAGCT 5700 TTCCGTCCGT GT'rCAAGrTAC ATAAArrCTC CGACGAGCTT TTACTCGTTC TTAGN'GAT 5760 rGrrAAAAA CTTCCATCAT CTCGATAAAC TCGTCAAATA GGTAGCTAGC GTCOTGTGGC .5820 CCAGGAGCTG CATCTGGGTG GTATTGAACA GAGAAAGCAG GTTGGTATCT GTGGCGCACA 5880 CCTTCCACTG ACTTGTCATT GATTTCTTCG TGGGTAATAA TCAAGTGCTC TGGCAAA'rcc 5940 TCGCGGCTGA CTGCATAACC ATCGGTTCTGG CTGC'rGAAGT CTACTCGTCC TGTTGCGATT 6000 TCACGTACCG CATGGTTGAA TCCACGGTGG CCAAACT'rCA TCTTATAGGT C'PTAGCCCCG 6060 TT'rCCATTG CAAAG.AGTTG GTGTCCCATA CAAATACCAA AGATTGCAAT TTT'rCCTTG'r 6120 ACACCGCOAA TCATGTCGAG TGCTTGTGGA ACGTCI'TCTG GGTTACCTGG ACCATTTGAC 6180 AACATAACTC CGTCACGATT GAGATGGAGA ATTTCTTCAG CCG'rrC'TCGA 
ATAAGGAACA 6240 ACTCTCACGT TACAGTTGCG TTAGAAAGT TCACGTAGGA ?rrAGTGC -r GAGACCAAAG 6300 TCCACTAGCA CCACGCTCAA ACCAACTCCr GGAGCTGGAT AAGACC'T1-I- AGTAGAAACC 6360 TGTTGATA'r TGTCTGTCGG TAAAACTGTT GCTTGGAGCT GGTCCGTCAC ATGGTCCATA 6420 .CTGTCCCCAA CATGGGTCAA GGTTGCACGC ATAGTACCAT GCTTACGGAT AATCTTCGGTA 6480 :..AGACCACGCG TATCAArrCC TGAAA'rCCCT GGAATTrrCT TGGCTTrTCAA AAATI'CATCC 6540 :..AAGGTCATTT GGTTGCGCCA.G'TTCCTAGCT CTACGCGCT1' CI'CAAAAAC AACGAC'rCCC. 6600 TTACAAGTTG GAATAATCGA TTCATAATCA TCACGA'rTAA TACCATAATT 'rCCTACCAAA 6660 GGATAAGTAA AGGTCAAGAT T'rGTCCATTA TAAGACTGGT CTGTAATGGA TTCTTGGTAG 6720 CCGGTCATCC CTGTATTAAA GACGATTTCG CCTGTTACAT CAATATCTGC TCCGAAGGCC 6780 0TTGCCTTCAA AAACTGTGCC ATCTTCTAAT ACTAGAATtC ?N'TTGTCAT ATTTTCACCT 6840 CTCGTGGACG CTCACTGGCG TCTTTTAACG TCTTGTGTTT TAGTTGGCG'1 TTCTACTCGC. 6900 TAGTACGGAT TCTAAGATTG CCATT~CGAAC AAAGACACCA 'rTGGTCATTT GTTGGACAAT 6960 CCGTGATTrIr GGTGCI-rCAA CCAAGTGGTC TGCTATTTCT ACATCACGAT TGATTGGAGC 7020 *TGGGTGCATG AGGATTGCTG TTTCTTTCAA ACGATCGTAA CGTTCTTGAG 'rCAAGCCATG 7080 TTGGGCATGG TAGTCTTCTT TTGAAAATAC AGCTCCACTA TCATG;GCGrT CGTGTTGCAC 7140 *ACGCAGAAAC ATCATGACAT CAACCTrGATC AATGATrrCA TCAATGGTTA CAAACTGTCC 7200 ***ATAGTCTGCA AACTCTTGAC TTCCCATTC CTCAGGTCCA GCGAAAAAGA GTTCAGCTCC 7260 CAAGCGT'rTC AAAATCTGCA 'rAT'rGGATTT GGCAACGCGT GAGTGGTCCA AGTCACCTGC 7320 AATAGCAACT T1'AAGACCCT CAAAGTGGCC AAATTCCTCA TAAATCGTCA TCAAATCAAG 7380 786 CAAGCTCTGG CTAGGGTGTT GGCCCGAACC ATCTCCACCA ?1'GATGATGG AAGTCGTAAT CGTrGGAcTA GCAATcAAMT crcrATAr.TA ATCCACTCCT AAAGCAGACA GAGTCAAAAT CGAGCTAGTC TTCACATCAA AGTCAAGTCG GTCGACCTCT GGATGGCGAA TCACACAGAC GGTGTCATAA ACTGTCTCAC CC~rATTAAC TTCCAATCCA AGTTTAATCT CTGCGACTTC AAAGGACTTA TGTGTCCGTG TAGAATCCTC AAAGAAGAGA ?TGGAAACAA 'rCGGATGGTC TTCATAGGGA AGCTGGGCTC CATTITTTAAA CTCAAT'rCCT CCCTTGATCA ATTTCATTAC TTGATCGACA GTGACZTCTT CCATGGACAC CACATGGTTC AATGCTTGTT GATTTTCTGA 0 0 0 0 0 .00.
CATGGCTACT CCTTTAACTT TTAAGCTTC TTCAGTAATC AGAACTCTGT AACTTCTGTC ATCTCTACGA TGATTTCTTC AGAACGACTG GTTGGGATAT GTAATCTGGA CGGATTGGCA ATTCTCTA'rG TCCACGATCG ACTAGAACTG ACGCGCAGGA CGACCATGAC CGACAATATT ATCAATAGCA GCACGGATGG ATAGAGCACA TCATCCACCA AGATAACTTC GCGGTCTGTC ACATCGACAG AGTATCTTCT CCACTTTAA CATCATCACG GAAACGTTTA GTATCCAATT AACTGAAAGA TT'rTCTAACT GCTTCAAACG TTCTTGGATT CGGTGGGCAA ACGAGT'rTTA ATACCAGCCA AGACGATCTr ATTCAAATCT r-rGTTGCGTT ATAAGTAATA CGCGTAATCG CTCGTTTGAC GGTCAA'rTCG TCTACAACTT CATGACAAAC CTCCAAAAAG AAAAGTCTCC TTAAACAAGG AGACTTGAAA AGCGAGCCCT ACTGCACACA GTATAGACTT CACCCTTCTA CTTTATCGCG CCCTCACGGG ACAGGTTTAA AGGAATATTT AGTTATCATT TACTATAGCA
CTTGGTCATC
'rTTTCCAAC
CTAAACTCAC
TACGACCTGT
AAACCAAAGA
CCACAACAGG
TAAAGACACC
CGA'rAATCTC
CTTGTTTT
TTTATAGCCA
CTCCT-rGCCT
CAAAGCATGC
CATATAATTG
AATAAATCAT
GCTTT'rCCAC 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180
TTAAAATCAA
TGGGTACTGG
GGCCTGATGG
CTCAAGTGGC
GCAAAAAGT
TCACACTCTG
GCAGCTAACC
GTCCCTGATT
TCAATGTAGC ATCTTACAAA GATTTTTTGG ATGGCAAATG ACTGCTCAGG CGGCAAGATA TTTTGACAAT A'rCGTGGTGT CAATTCCAAA 'rCCTACACTC TCTCCA6ATTC TTCACCI'CTC CACATTTTTT AAGGAATTTA CAATCTCACT CTCTGTCGCT CTGGTGTGGC CTI'ATTACC
TTGCTAAAAT
GCTCTTCCAA
'rCCATGACCC AGTATCCACT GCAAAGGCTG GCGACCAACA CCTGCCAAAC ATCGTCTAGT AACTGTTCGGG CAAGCGAGAA ATATGTGAAG TGGAAAGGCA ACAAAGAGAC 'r GCAA.ATAC GCTCcACATrG ATGACA.ACAT TGGCTGTCTT TGAGGGACTT GACCATCAAA GCTTrATTCC GATACAATCC ACAGACATAG CTTGGGGTGT GCTGCATCTG TCGTCTGGGC AAATCAAGAC TAGGCTTGGC GCTCGT'rTTT TTIGACAAGAC TGATAACATG ACCGCAACCA GGAGTTCAAA ATGATTGGTA ATCTGGGAAG AGGGCAATGA TTTCTTCTAG CACCTTTCGT 787 CATTATTCAT CTCCGTCAAA TAGTCCTTGT AAGCCAGCAA AACGACTGTT TTCTTrCTTTC 9240 TTTACTGCr'r TTTGA~CCTG GTATrCTTCC TCTGTCATGA 7ITGCCAGTC Al'=Ce'r~c 9300 ATAAATCCTT GACCAGCTTC TT'CTCACCC GTCAAGACCT TG.ATAG.GAAT G~rrAGC-AGc 9360 ATATTGTCTG ATACACTCTC ACCAAGGTCA AGCTCCCCAT TrI'CGATGGG CAAGACCAAG 9420 'rCATCATCTA AAAC7TrCG ATCTAGCTGG TTAGTTGCGC C?1'CCATGAA AACTTCCGTG3 9480 ACTGGATAAG ATTCAACTAA CTCAACTGGC TCCATACTGC GACTCGACGC A.AGAACAATG 9540 GTATAAGATA GTTGATAATC TAAGAA.ATAC A'rACGGTCTT CATATT-GTAC TTTCCCAACT 9 600 GCAAGGATAT CTTTTACATC TAAAATTTCT TGATTACGTG CACGCAGGTC ATCAACTAAA 9660 TCTAACCTTr =TCAAAGTT CAAACCTTCA GAC'rGCTTAC GAAT1TTCr'rG AATATTTAAT 9720 TTCA'rACTTC CTCCATAAAG ATTTACTCTC TTGATTATAC CATGAAAAGG CTACAAATCA 9780 GCACACCAAA CTTTGTAATT AAAATTCAAA ATTTAACAT ATTTACTATG ATAGTrrrAT 9840 TTTTTAGTGC TATACTATAG GGAAAGAGTA CATCAGATCA AGGACGATGC TCACATGGAA 9900 *GACAAGAAAC TCATTCAACT CCTATCCAAG TTAAATAAAA GCTACCAAAA CTGTAAACAG 9960 GGTACGGCAG ATGATATTCG ACTACAAGAG CTGCTAAACA CTACTATGCA AGAGCTCAAA 10020 AAAG*A AGTTGAACAA LA1Tu..-i-A ATGTTGAGAA.ATTTTA CCAACCTACC 10080 *AGTCTTCTGA TTGGACTGGG TAGCCTAAAA CTAAACGATC AAGCACGCAC TGCTTGGCGA 10140 *aAACTATCATA AATTCCATTA CGATCATGTC AAACACGTAC TAAGTCTCTA TGGACCTGTT 10200 T'rTGAATTTT AGAGCATAGA ATTTCCAGTT TTCTGTTGAC AAAATTTCCT TAAAGGTATA 10260 too..:ATATAAAGAT ACTAATAC'TC GGAGGTAAGG GAGACATGAA CAACTAAGTC TATCAAATAA 10320 6 a AGAACCTTTA T'rTAGTAGAT CTTGTTTTTO TCTCTTTTTG TGTGCTCTTT TATGCTCTTT 10380 .64.
TTCTCGCATG TTAATAGAGT TTTTTTGACA TACACTr'rGG GCTCTACTAG GTAAAGTAGA 10440 CTTTTrCT'T ATGCACTATG AACATTCTAG AAAGGGAAAT CATATGATAA AAATCAATCA 10500 TCTAACCATC ACACAAAACA AAGATTTACG AGATCTTGTA TCTGACCTAA CCATGACCAT 10560 o* oo CCAACACGGG GAAAAGGTTG CrT'TATTGG TGAAGAAGGA AATGGCAAAT CAACCTTACT 10620 TAAAATTTTA ATGGGGAAG CTr-TGTCTGA TTTCACTATC AAGCGAAACA TCCAATCTGA 10680 CTATCAGTCA CTGGCCTACA TTCCTCAAAA AGTCCCTCAG GACCTAAAAA AGAAAAC~TTT 10740 *ACACCACTAC TTCTT'rrrAG ATTCTATTGA TTTAGACTAC AGTATCCTCT ATCGTTTGGC 10800 GGAGGAAT-rG CATTTTGATA GCAATCGTTT CGCAAGTGAC CAAGAGATTG GCAATCTATC 10860 AGGGGGCGAA GCTTTGAAAA T'rCAGCTTAT CCATGAGTTA GCCAAACCCT TTGAGATTCT 10920 788 ATTTTTAGAT GAACCTTCAA A'rGACCTAAGA CCTTGAGACA GTTGATTGGC TAAAAGGCCA GATTCAAAAG ACCAGGCAAA CCGTTATTTT CATTTCCCAT GATGAAGACT TTCTTCGA AACGCCAGAC ACTATTGTTC ACTTGCGACT GGTcAAACAC CGTAAAGAAG CGGAAAcGCT AGTAGAGCAT TTAGACTATG ATAGCTA'rAG TGAGCAGAGA AAGGCTAATT ?PGCCAAACA AAGTCAGCAA GCTGCTAACA ACCAAAGAGC CI'ACGATAAA ACCATGGAAA AACATCGGAG AGTTAACGCAA AATGTAGAAA CTGCCCTCG AGCTACCAAA GATAGTACTG CCGGTCGCCT ATTGGCTAAA AAGATGAAAA CTGTCCTCTC ACAAGAAAAA CGCTACGAAA AGCCAGCTCA GTCCATGACT CAAAAGCCAC TTGAAGAGGA ACAAATCCAA CTTTTCTTT CAGACATCCA ACCAT'rACCA GCTTCTAAAG TCTTACTCCA ACTGGAAAAA GAAAATI'rGT CCATTGACGA CCGAGTTTG GTTCAAAA.AC TACAACTAAC TGTCCGTGCC CAAGAAAAALA TCGGTATTA'r CGGGCCAAAT GGTGTTGGGA AATCAACTCT GTTAGCCAAG TTACAGAGAC TTWTGAATGA TAAAACAGAG ATTTCACTTG TTTATCCCCA ATAGCCTATC ATCTCACCTA GCTAGTCTCA ATCTGGCGGA CAACAGGGAA TCTCCTGCTG GATGAACCCA ACTCTTTGCT ACCTATCCAG AGAAGTCTGC TCGATCATCT AGATTTATAA ATTTGCAACA TGTTTTAAAC GTTCAATCCG CCCCCACCTr TCTTAGGATC CTCATAGGTT TGCTATTGTC GTCACCTCGA CACTAGATGC ACCAAGGTTG CAGCGAGAGG AT'rTCAACAC CATTTCCATC -CACATCATAC GACCTGCCAT ATAGTCGAAA TACGGATATC TCTAGTTCTT CACGATTCAT G'rN'TATGCC ACAAGATTAC TCAGTAAAAC TGGG4GAAAAA ATTTCAGTTA TCCAGAAATG AACTCCTGCT TTTGGATTTA CACGAAACTT 'NCTCCCACT GCGGTCTCAT CACTGTTTCG ATCGCATGAC AGAACACGGT TAGCAAAAAT CCAGAGACGA TTCTGAGATA GGTGGGTGGG ATTGATATAA AGGGCACTGC 
CAACTrATCT AGGGCATTAA ATCTGCCAGA AA'rTCCCTCT TGCCAGTACA ATAGCTAGTA TCGGTCATCA TCACTTCGTC ACTAGAAAGC ATGGTGATAG ATAATTACGA ATATGACTGA CACA.AAAAAC TGCAATrGGA GAGGAACTAC AGAAAATCCA CAGCATCAAA TTCGCTCCTT GTCCTGCGCA AACCAAACT'r TCTCAACCCC AAATCAGAAA CATGACCGTC GTTrCTrAAA TTGAAGCTAG TTAATTTAGA CCTCTGGATT CTTTTACATC TATAAAAGAG T7"rTTGGAAC TAGCATCATC GACGTGGCGA TCATTCCCTG GGGA'rrGCGA GACGAGAAAT AGCGAGCTGA 10980 11040 '11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 GGGAA.ACCAC TAGCATAATG TGCGACCTGC TCCACCCCAC CACTAGCAAG GGCAACTGCA CTTCATGTCC CATAACAGCT GATAGCTAGT AGACCTCAAG TCGCAGCAAC AGCCGCATTT TGAGGATTAG AACCTGTCGC AAAGGCATTT AAGGCTGGAT CATCALATGAT GAAAACACG GGCATAGGAA TCTGAGCGAC CAGAGCCATA TCTTCCACTA CATGGTAGAG CTCTGGTGCC 789 GTTTGCTCAT CCACCTCACG CGCTCCATTC ATGGACATGA CAATCTCTG CGA~rGAAA ATCATAGACA AAGCGTAGAT AAACCCGATA ATCAGTrGCAA TAACCAAACC ACCAAG'rCCA GATCITATAA AGAGATAACC AACCCCATAA CCAACAAGAG CTAAGAGTAG GAAAAATACC A~CAAcAAAA TccAGTTT TcGTTTATTG cTTccAATTT aATcAAAcAA cATcTTAGTc ACCTAAACCG CrAAAATCAA CTTTAGGAAC
AT~CTGCCC
TACATTGTAG
TGTGTTTG'rC TAAATCCA-A AC-ATTCCAC TTGCTGACAA CACTGTTATA AACTCCTCTT GCAAT'rTAAC CGACTTTTCC TCr-CAGGTG TTGAAGGAA GA'rAATATTG CTCGGGAAAG TTT'CTAAI-1- GAGTTGACGA GAGTAAGAA TTTAIT=C AAAGTTAGCA CTAGCTTTCA AATCTGGATA CTCACGAGTG AGGGCATCAC TGGCTCAT TTCGTTACGT AGTTCTGCCA CC=rTCAAG TACAGTCTCA ATCAAGTTTG GCAAGAGGTC GC1'TTCTGCA ACTGCAAAA.A TACCTGAAAC AG=~CT~CT GGTGAAGTCG CTGCCCCCAC GG'rAGAACCT TCATATTTGG CATAACCTTT ATTGCGACGT r'rCAACTGAA CATCAATCTG TTTAACCAAA CCGTTATAGC TAACAATCAC ACTCCAAGCC TCCTTGG'rrT AAAAATAACA ATAAGAGCGA
GCATACGATT
TAACTCCAAG
TATATCAAAT
GAAACCAAAA
AA'rAA'rCCAA GTCATAATAT AAGTCCTTTC TGCTTTTAGA TTAGTACCAG TTTCTATGAT TGTGGTAAAA TAAGATGATA CTA.AAGAAGG AAATAACTAT 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 1.3440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 .14100 14160 14220 14280 14340 14400 14460 Ar-ATTTTACA ACTTGCTTGC CGAGCAGAAT CTTCCACTTT CGGACCAGCA AAAAGAACAA TTTGAACGTT ATTTTGAGCT CTTGGTCGAG TGGAAT'IGA AGA'ITAA'rTT GACGGCGATT ACCGACAAGG BLAGAAGTTTA TCTCAAACAT TTTTACGATT CG;ATTGrCACC CATTCTTCAA GGTTTGATTC CCAATGAAAC TATCAAACTT CTTGATATCG GGGCTGGGGC AGGATTTCCT AGTCTACCAA TGAAAATTCT CTATCCGGAG TTAGATGTGA CCA'TTATTGA AAGCGCATCA ACTTCCTACA ACTCTTGGCT CAAGAACTGG A'TTTGAACGG TACCACGGAC GTGCCCAAGA T'TTTGCCCAA GACAAGAACT TCCC'rGCTCA GTAACAGCTC GTGCGGTTGC CCGTATGCAG CTCCTATC'rG AATTGACTAT AAGGTTGGTG GCAAACTAT'r AGCACTCAAG GCTAGCAATG CGCCTGAGCA
TTCACTCAAT
AGTTCATTTC
ATATGATTTT
TCCCTACCTT
ATTATTAGAA
GCTAAGAATG
CGAATAGAGA
CCCTCAATCT CC?1TTTTAGT AAGGTCGAAG ACAATCTCAG TACCCCCTAC TCCGCGCTAT ATCACAGTGG TAGAAAAGAA AAAAGAAACA CCAAATAAAT ATCCACGTAA GGCTGGTATG CCAAATAAAC GCCCACTTTA AATTTTTAG TAAACAAATG TTTACAAAAT CAGCCTCGCT CTr'rTATTTC TAGGCTCGGG AAAAAA'rGAT TTACAAAATC ACCCTCGCTC TTTTATTTCT AGGCTCGGCA AAAAATGAT'r TACAAAATCA TTTTTTTCTC 790 CTATACTATC CTAAGCAAAG G?1'rTAATG TCATCCCGTG ACGTGACGAA GACGCAGWA TATTTAAAAC TCTTTAAAAT CTAAA'TTrA AAGAAGTCTT ACTCTGAGGG CCTATTGCTG TAAAATAATG GGCTCTI'TTT TGATGCCCAA AAGTGAG'r- 'FATATGAAAC AAGAATCAAC TGTTGATTTG TTAC INFORMATION FOR SEQ ID NO: 107: SEQUENCE CHARACTERISTICS: LENGTH: 6405 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107: 14520 14580 14640 14654 AGAAAAATCT GCTrTACAGA AAATACGAAG GTGTCATTCC CCAGAACGTA CGCGTGCCTT GTCAATGGTT CTTCTGGTGA GAAGAAGTCA TGGCGGTAGC ATACTAAAGA TAGTATGGAA CAACGATTCC ACCAATNTAT ATATCAGTTC TGCAGCTCCA GGGT'PGCTTT- GACTCCAAGC TGAAGAACTC TTCTATGCCA ACCATATCGT CTTT'AATGGT GGGCTGGTAT CGGTGGTACT TGATTGCGGA TAAGGACCTA TTGGTAAACT CACT'rCTGCT AAATAAAAAT AATAGGAGAA AATCTATGTC AGATTTGAAA AGCCTTCTAC GCATGTTATG ATGATCAAGG AGAAGTAAGC GGTTCAATAC TTCATTGATA AAGGTGTTCA AGGTCTTTAT ATGTATCTAC CAAAGCGT'rG AAGA'rCGCAA GT'rGAT'rrTG AAAGGTAAAT TGACCATTAT TGCCCATGTT GCTrGCAATA CTTGCTCGCC ATGCTGAAAG CTTGGGAGTA GATGCTATTG TTCCGCTrGC CAGAATACTC AG??TGCCAAA TACTGGAACG AACACAGACT ACGTGATTTA CAACATTCCT CAATTGGCAG CTTTACACAG AAATGTTGAA AAALTCCTCGT GTTATCGGTG GTTCAAGATA TCCAAACCTT TGrCAGCCTT GGTGGAGA.AG CCTGATGAGC AGTTCCTAGG AGGACGCCTC ATGGGGGCTA TATGGTGCTA TGCCAGAACT CTTCTTGAAA CTCAATCAGT GAAACAGCGC GTGAATTGCA GTATGCTATC AACGCAA'ICA CATGGAAATA TGTACGGTGT CATCAAAGAA GTCTTGAAAA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 TCAATGAAGG CTTGAATATT GGATCTGTTC GTTCACCATT GACACCAGTG ACTGAAGAAG
ATCGTCCAGT
AATCTAAAAG
CAAGTATGGT
GGCGCATAAG
AGAAAAAGGC
TGTAGAAGCG GCTGCTGCCT TGATTCGTGA AACCAAGGAG CGCTTCCTCT GAGGTATTTA TGACATATTA CGTTGCAA'PT GATATCGGTG GAACCAACAT TTGGTTGATC AAGAGGGGCA ACTTCTTGAA TCGCATGAA.A TGCCAACTGA GGTGGACCTC ATATCTTACA AXAGACCAAA GATATCGTAG CTAGTTATTT CCAGTAGCAG GTGTT~GCCAT ATCTTCTGCT GGGATGGTGG ATCCGGATAA GGGTGAGATT TTCTATGCTG GGAAATCGAA GAAAAGCTTTA TC rGCTGAG GCP.GTACTG TGGAACCGGT ATCGGTGGTT GGCCGCAAAT CCCTAACTAC GCAGGCACCC AGTTCAAAAA CTATTCCTTG TGAGArGAA AATGATGTCA ACTGTGCAQG GTTCAGGCAA GGACAAGT GTGACACTTT GCTTGAccAT GCTTGATTAT GGATAGGAAA GTCTTCCATG GTTTAGCAA 1260 1320 1380 1440 T1'CACCCTOT GAAGTCGGGT ATATGCATAT GCAGGATGGA GC?1N'CAAG ACTTGGCrTTC TACAACAGCT TTAGTGAAAT ATGTAGCTGA AGCCCATGGA GAAGATGTTG ATCAGTGGAA TGGCCGTAGA ATTTTCAAAG A.AGCCACTGA AGGAAACAAA CCGTATGGTT GACTATCTAG GAAAAGG'rCT GGCAAATATT AGTGGTTAT'r
TACAGCCTTG
CCATCACCAA
CTAGrI'CGC CTTGGTGGTG GTATCATGGG GCAAGAGGCT AAAGAGGCTT TGGTACCAAG TTTAGCAGA AATACAGCAG GGATGTTGGG TGCATATTAT ATCTGCATGG AAGGTATTGA TGCTACGrG CCAATCCACA ATCCTCAAAC CTAAGATCCG AAAACACCAT TAGAATTTGC CATrrTAAGA CAAAACAATC TCAGCCAAAC TAGGAT~rrC T'rACACGTTT TTGTCTACGA TAGCCGTTGA GTTTTTTATT TTCCCAGTAG CTATAAAGA 'ITTTrCCTT GCTTTCGCGA TTGA~TTCCA AAAAGTACGC ATAAATCAAA TCGATAAAGA AGAGCATAGG AAGTTGAGCG GATATTCGT'r GGATATAGGA CGGTTCGCTG TGGGTGGCTA CAAGAACAGT CTCTGTATAG GTCTGGCTAT CTTTATTGGG AACACTTGTA AAGAGTACAG TCTTTGCCCC CATCTCCTTA GCATCTAATA GACTATCTAA AATAGAAGGA GTTGAGCCTG AAAGTGAGAA GCCCACTACT1 AGACAATTTT CATCCATGAT GCTGG'TTGTC CAGGCAAAGC CGTCTTGGTC TGTCAAAGCT TCGCACACCA CACCTAGTCG CATAAAACGT AATT'rCATTT CACGGGCCAC GAGGCCAGAA CTCCCTGTTC CAAAGAAGTA GATACCrCA GCATCTTCGA TTAGCTGGGC AATTCGTTCT AGTTGGATTT CGTCAATCAA GTCTTGTGTT TGTTCCCTCA TATTGCTATA ACTTCTGAGG ACTCGTTTGG TCAGTGGACT GTGCTTGGAG ACTTGGTTGG CT 'GATTTTC TGCCTGATGT TGGTATTGGA AAATAAATTC TCGGTAGCCA GTAAACCCAC ACTTTTTAGC AAACCCGGTC AAACCAGCTT GAGAAATATG TAATTTTTGG GTGACTTGTT GAGAAGATAA ATCATCTGTA ATCGTT'rCAG CTTGCAAAAA ATAGCGAGCG ATTTC 'rGTr CTAGGTCTGT CATTTC~rCA AAA'rCTGAAT CAATGATAG'r TGCGATATCT GGTTTGTCCA TACGGAAAGC TCCTTTACAT GAGTCATACT GGAAGACTAG ATCAGAGAAT ACTCACACTT CATTATAACA CATAATATAA GGATAGATAA ATAAAAACGC ATCTCTGTT'r 'AAAAACGAA AAAATCGAAA AAGCrrCTCT.CTTTTCCATA ATTTTCTACT CAAATTGTGG TACAATTAAG AGTAAGATTT TPAGTTAGAA ATGAGACTGA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 792 T1-rGTATGAG AAAATT'TAAC AGCCArrCGA TTCCGATTCG GCTTAATTTA TTGTTTTCAA TCGTCATTTT1 ACTCTTTATG ACCATTATTG GTCGTTr GTATATGCAG GTrTTGAACA AGGATT-TTTA CGAUAAAAAAG CTAGCTTCAG CTAGTCAGAC CAAGATTACA AGCAGTTCAG CCCGTGGGGA AATT-TATGAT GCTAGTGGAA AACCTTTGGT AGAAAATACG TTAAAGCAGG TTG?1-rCCTT TACGCGTAGC AGTACTGAC TTATGTGAGC ACTATTTGGC TGATCCTGAA GCTTGGATTC AGATGGCAAr GTGTACAAAC GAGTCAACTA GTCAGTTAAA TGCTGTTGGA ATTCTCAGGT GGCTGTTAT AATAAAATGA CGGCTACAGA CTTAAAAGAA ACACCTAAAA 
ATCAGTTCTC CAAATTTCAC AGAACCCCAG CTGGCGGAr ATCTATAAAA AAATAGTGGA AGCTCTCCCA AGTGAGAAAc CGTCTATCCG AATCAGAACT GTATAACAAT GCGGTCGATA AACTATACAG AGGATGAAAA GAAAGAAATC TATCTTTT'rA AACTT'rGCGA CAGGAACCAT GCCTCTATTT CAAAGGAGAT C'rTCTTGGGA TAGAAAG=T TTGGAAACTT GTGAAAAAGC TGGTCTCCCA GCGGAAGAAG TAAATCACCG TGTAGAACC TCCTATTT'GG AACGCTCGGT AAAAGAAATC CATCTGGATA T'rGAGGAAGG TACTAAGGGA AACAATATCA GCGTGGATGC TTTACTGAAA AGTTATTT'CA A'rCTGAAGG TGTCTATGCA GTCGCCCTTA CAGGGATTAA ACATGACTTG AAAACGGGAG CCAATGTCTT TGTTCCAGGT TCGGTTGTCA ATGGAGTCr'r GTCAGGAAAC CAGACCTTGA CTCCCATCAA TTCTTGGTAT ACTCAGGCTT CTCTGGAGTA TTCATCAAAT ACCTATATGG CCTATCAACC CAATATGTTT GTCGGCACCA GTTCAACCTT TGGCGAATAT GGCTTGGGTA -CTACTGGATT TGTTCCCAAA GAGTATAGCT AGTTTGATAA CTATACGCCC ATGCAGTTCG
CCCTTTCTTC
CAGAAGCCTA
AAAAGCAATA
AATATGGCAA
AACTGACCAT
ATTCTGAGCT
ACCCAAAAAC
AGTTGACGCC
AGGCGGCGAC
TGCGACAGAT CCTCTAAATG GCCTGGCA'N' AGTATTTCTA TATAGTTGGG AGTGTATCCA TCTTAAAAAA GGCTATTCTC TGAAGAGACC TTACAAGGAA TATGGAAAGC GTGGATACAA TGA'N=TGGCT TITCCAAGATA AGAAAATGCt GGAGCCAAGT AGGTGCGGTT TTGTrCTATGT TGAT'rCCTTG GGAACGCTAA CATCAGCTCA GGTTGGGAAA 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 CAGACCAGTC CATTGTCTTC CAAGGTTCAG ACGGTTCATT CCCTATCACA GCGGTCCAAG
TCCAAACAGC
GCAATCTACA
CTGCGACAGG
TTGCTAAT'rA
CTCAGTATGT
CTTAGGTCTT
GTCTGCTATG
AATTGACCTA
ATGCGGCAAA
GAGAAACTGC
CCAGATGAAT
CATTACTAAT GCCTTTGGGC AGCAACTrATT GCAA.ATAA'rG GTG?1TCGTGT GGCTCC'rCGT ATTGTTGAAG GCATTTATGG TAATAATGAT AAGGGAGGAC TGGGTGACTT GATTCAGCAA CTGCAACCGA CAGAGATGA-A TAAGGTCAAT ATATCCGACT CCGATATGAG CATCTTGCAC CAAGGTTr? r ATCAGGTTGC CCATGGTACr AGTGGATTGA 793 CAAC1TGGACG TGCCTTTTCA CCGAAAGCTA TGTGGCAGAT C1ATCTGATAA 'rCCCCAAATC ATOGI'GTAGG ACCTTCCATT TGAATTAGAA AGGAAATTAT TCTAAG 'rAC CAGGTATCG4G AATGGTGCCT TGOTATCCAT TAGCGCAAAA ACAGGTACAG GGTCAGCAAC CAACCAATAC CAATGCGGTG GCCTATGCCC GCTGTCGCAG 'rGGTC?1,TCC TCATAATACC AATCTAACAA GCGCGTGACA TrATCAATCT GTATCAAAAA TACCATCCAA GCTTTATCCA ACACCTATTG CCAAGTTGAT TGACAG=rAT GATTAAGACG GCACGCGTC TGGCC rr A TACGATTGGG TGAATTTGCA AA.AA.ATCTCC TTTCTGC'rAA GAGAGAATTG ACGTTT-GACA GACGACGATC CTTGT'rCTAT CTGTACTGAT AATNTTAGI' CTTGAGGATA GTAGAGATGT GGCAGCCATG
ATGTCTGCTG
ACATATTGTT
CCCACTCGTG
ATGATGTCAA
CTATrTTTGG
ACCAGACAAC
0 0 00* *of* 0 GAAAATATCC AAGAATACCA TGGACTCTAT CATGTCCTTC AATGGTATCA GTCCGGACGA TATCAATCTC AAGAGCCTTA GAGGTTTCAG AAGTGATTGT GGCGACTAAT GCTACAGCGG TATCTTTCAC GTTTGCTCAA. GCCGGCTGGT ATCAAGGTTA GCTGTGGGAG CCGACATTGA GTA'rGCGGAC GAAGTGACAC CGGACAGAGT TGTAAGTGTA GGCAAATTTA CGAACTCCAT AGGCTGAAAA TCGTTCCTAT CGGCCCTTT TTGTATAGTG AAGTTTTAAA AAACCAAGCA AATATGATAT ACTAAAGACC GACAAATAAT ATGAAACAAA CGATTATTCT T TrATATGGT AGTCTCTGTC CTTTCAGCTrG AGAGTGTCAT GCGTGCGGTC CAAGACTr'rC TTTATCAGTC AGTCAGGTGA CTTTATCAAA TCCCGGGGCAA GAAGACCGI'C TCATGACCAA TGAAACCATT ACCAAGTGCT ATCTACGAAG AAGGTGCAGT GGTCTTTCCA AGAAGATGGC TCTGT'rCAAG GATTCTrGGA AGTTTTGAAA CAT'rTTGTCA TCAAGTCTI'G CCATGGATAA AATCACGACT TGCGTATTGCC CAAGTTCCTT ATGTGGCTAT CGTTGAAGCC CGCTGAAGTG GAAGAAAAAT TGGCTTATCC AGTCTTCACT TAGTCTCGGT ATITTCTAAGT CTGAAAACCA AGAAGAACTC CT'rCCGATAT GACAGCCGTG TCTTGGTTGA GCAAGGAGTG ATCGCCTCAT TTCTCCTATC TGACTCGTCT TATGATAGT ATGGTGAAGC GACTTCCATG CGCGTCTAGC ACGAGGTCTC 'rCTTACGAGC CATTGAAAAT TCATTTATAA AAAATCAAAG TGATGAGTAG GCTCAGGT'rC GAGTATTCTA GTAGAAT'rAG GGACGGAGTG CGGAACGCGA GA'rTACGACC GTTTCACAGT ACACAGGAAT TTAGTCATGC GATTGGGATA AGAAAGTTGC GTCCTTCACG GGCCAATGG ATGCCTTACG 'rTGGT'rGCAA AAGCGTGTTC TGGA.ATCTGC GATGATGTGA CTGCAAAAT 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000.
6060 6120 6180 6240 6300 6360 6405
AAGCCGTCAA
CGI'CAAGCCT
AATGC
ACATGGGCTC
TAAAACTrC

INFORMATION FOR SEQ ID NO: 108:
SEQUENCE CHARACTERISTICS:
LENGTH: 11309 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:
CGAGCTCGGG TACCGGGATT TTAAGGAGTT TGATATGTAT AACCTATrAT ATTAGTATTA TCTGTTGTGA TrGTGATTGC AATTTTCATG CAACCAACCA CAGCAATGTA TTGATCCA GTTCAAGGTGA TTTGTTGAA CGCAGTAAAG
TAACCATTTT
AAAACCAATC
CTCGCGGTTT
TGAAGCTGTA ATGCAGCGTT AGCATTGACG GTATTATCAA T'IITATTT-r TAAAGGATGT AAAGAAAATA TGAAAGATAG AATGAT'IrGG CTCAGGCT ACCTTGTCC'r TAA'rGGAAAG TTAGAAATTA AGA.AAAAACA GGCTTTGGCT TTGTTAGTCT TGACAGGGAT TrTAGTCTTT TTCTGGCTAG CCATTGCCTT GTAGATAAGA AAATAATGGG CAGGACTAGG TCNTGCCTC TrGAGAAGGT TTACAGTAA AAGAAAATTA AAAAATCTAG AATAAAAGAA TATTACAAG ACAAGGGAAA GGTcGACTGTT GGGAAAAGAC AGTTCCAAGG AAAGCACCAA ATTCGT'rrrG TGAGATTACC CTCAAGGGGA GGAAGGCGAG GAGGACGACC ATTTTCGTGA GTTGATTAAA AAGAAGATGG TAGTCTGACA TT'rTTCATGC CCATAAAAAT TTNTGTAGG GAAAAATGA'r TTAAGAAAGT CGCTGACCC TAGAACACAG TT'rGACAACA ATGCTGGCTA TATTCGTTCA CAGCCCTAAA ATTAGAAGGA GTCAACTATG CTA'IrGATGG TGATACCGTC GAGGTAGTGA AATAAGGGAA CAGCAGCAGA AGCCAAAATT AT'rGATATCC GTTGTCGGGC AAATCGTTCT GGATCAGGAA AAACCTAAGT AAAAATCAGA AAATCAGTCA ACCGATTTAT GTTAAGAAAC 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500
S S ACAGAAGTTC TCAAAGTCTT TATCGATAAA TACCCAAGCA AGAAACATGA TTTCTTTGTC GCGAGTGTTC TCGATGTAGT GGGACACTCA ACGGATGTCG GAATTGATGT TCTTGAGGTC TTGGAATCAA TGGACATTGT ATCCGAGTTr CCAGAAGCrG TTGTrAAGGA AGCAGAAAGT GTGCCTGATG CTCCGTCTCA AAAGGATATG GAAGGTCGTCTGGATCTAAG AGATGAAATT ACCTTTACCA TTGACGGTGC GGATGCCAAG GACTTGGACG ATGCAGTGCA TATCAAGGCT CTGAAAAATG GCAATCTGGA GTTTGGGGTI' CACATCGCAG ATGTTTCTTA 'rrATGTGACC GAGGGGTCTG CCCTTGACAA GGAAGCCCTT AACCGTGCCA CTTCTGTTTA CGTGACAGAC CGAGTGGTGC CAATGCTTCC AGAACGACTA TCAAATGGCA TCTGCTCTCT CAATCCCCAA- GTTGACCGCC TGACCCAGTC TGCTATTATG GAGATTOATA AACATGGTCG TGTGGTCAAC TATACCATTA CACAAACAGT TATCAAGACC AGTTTTCGTA TGACCTATAG CGATGTCAAT 795 GATATCCTAG CTGGCGATGA AGAAAAGAGA AAAGAA'rATC ATAAAATTGT ATCAAGTATC 1560 GAAC1TCATGG CCAAGCTTCA TGAAACI'TA GAAAACA'rGC GTGTGAAACG TGGAGCTCTC 1620 AATNTTGATA CCAATGAAGC GAAGA?1TYTA GTGGATAAAC AAGGTAAGCC TGTTGATATC 1680 GTTrCCGGC AGCGTGGTAT TGCCGAGCGG ATGATTGAGT C??r"ATGT GATGGCTAAT 1740 GAAACAGTTG CCGAACATTT CAGCAAGTTG GATT'rGCCTT TTATCTATCG AATTCACGAG 1800 GAGCCTAAGG CTrGAAAAGT TCAGAAGN' AT'rGATTATG CTTCGAGT?1' TGGCTTGCGC 1860 ATTTATGGAA CTGCCAGTGA GATTAGTCAG GAGGCACTTC AAGACATCAT GCGTGCTGTT 1920 GAGGGAGAAC CTTATGCAGA 'rGTATTGTCC ATCATGCT'rC TTCGCTCTAT GCAGCAGGCI' 1980 CCTTATTCGG AGCACAATCA CCGCCACTAT GGACTAGCTG CTGACTATTA TACTCACTT'r 2040 ACCAGTCCAA TTCGTCGTTA TCCAGACCTT CTTGTTCACC GTATGATTCG GGATTACGGC 2100 CGTTCTAAGG AAATAGCAGA GCATT'rTGAA CAAGTGATTC CAGAGATTGC GACCCAGTCT 2160 TCCAACCGTG AACGTCGTGC CATAGAAGCT GAGCGTGXA TCGAAGCCAT GAAAA.AGGCT 2220 GTATATGG AAGAATACGT GGGTGAAGAG TATCATGCAG TTGTATCAAG TATTGTCAAA 2280 T TCGGTCTCT TTGTCGA.ATT GCCAAACACA GTTGAAGGCT TGAT'CACAT CACTAATCTG 2340 *CCTGAATTT ATCATTTCAA TGAGCGTGAT TTGACTCTTC GTGGAGAAAA ATCAGG'rATC 2400 *6*ACTTTCCGAG 'rGGGTCAGCA GATCCCTATC CG'rGTTGAAA GAGCGGATAA AATGACTGGA 2460 GAGATTGATT TTTCAT'rCGT ACCTAGTCAG ?r'rGATGTGA TTGAAAAAGG CTTGAAACAG 2520 'rvo CTAGTCGTA GTGGCAGAGG GCGTGATTCA AATCGTCGTT CGGATAAGAA GGAAGACAAG 2580 0 0 r AGAAAATCAG GACGCTCAAA 
TGATAAGCGT AAGCATTCAC AAAAAGACAA GAAGAAAAAA 2640 GGAAAGAAAC CTTTTTACAA GGAAGTAGC1' AAGAAAGGAG CCAAGCATGG CAAAGGGCGA 2700 GGGAAAGGTC GTCGCACAAA ATAAAAAGGC ACGCCACGAC TATACAATCG TAGATACGCT 2760 AGAGGCAGGG ATGGTCCTGA CTGGAACTGA AATCAAGAGT GTACCAGCTG CTCGAATTAA 2820 TCTCAAGGAT GGCTTTGCTC AAGTGAAAAA TGGAGAAGT'r TGGCTGAGCA ATGTTCATAT 2880- CGCGCCTTAC GAACAGGGCA A'rATCTGGAA CCACGAACCA GAACGTCGTC GTAAACTCCT 2940 RGCI'CCATAAA AAGCAAATTC AAAAATTGGA ACAAGAGATC AAAGGGACAG GAATGACCTT 3000 *AGTTCCCCrr AAGGTCTATA TAAAAGATGG CTACGCTAAG CTTCTTTTAG GACTTGCCAA 3060 AGGGAAGCAT GACTATGACA AACGGGAGTC TATCAAACG;T CGTGACCAAA ATCGACATAT 3120 CGCGCGTGTG ATGAAAGCTG TTAATCAGCG ATAAAAAGAG GAATTGAAAA TGGAAAAATT 3180 AGTTGCCTAT AAACGCATGC CTTTGTGCAA TAAACAAACA ATGCCTGAAG CTGTTCAGCA 3240 796 AAAGCACAAT ACAAAAGTTG GGACTTGGGG GAAAATTACT GTTTATTGAA TTGACAGAAG AAGGGGAACT TCTAGCcTGAA AGACAATCCA ATGCCCCAAC- CTCAAGCCTG GCACCGA=T GTcrGAAGG GAGcTCTcAA CACCTCTrTG AAGCAGGGGC GAAGCTGCCA CAGATGATGT TTGCTAAAA AATAcAATAC GTGAAACAAG CGAAAGCT -r GCCCAGCAAG ATTrC.A'TG GGAATGGTAC TTGGAAT1?1T ATTGTAAACC CAATCCTGTT CATTCAGAGG TCCTAGAGGC GGATT'rGGCT TGTGGTCAGG GGCGTAATTC GACGGCTGTA GATCAAAATG GACTAGCTCT ACATTTGGAC ATGCC'rGTTG GCCTTTACGA TGATTTTATC GTTTCAACAG TTGTTCTCAT TATTCAAAA'r ATGCAGGACA AAACCAGTGT
TGAGGATTAT
CATGCAGACA
'rCTTTTTrCTA TGAAATCTTG CAAAGCA'N'G TATCAATTCA GCTAGCATTG GTTTCTACAA GCGGACCCCA TGGTGGTTAC AACCTTATCG CT'rCCCATTC ACCI'TAAAG
T'GGAGCAGGA
AACAAGAATA
TTCCAGCTA'r TT'rGTGCCA'r
AAGGAGAACT
S. S S GGACACGGAG GATTATCCTT OCCAGACTAT TACAAGGATT- CCGTCGCGAT GAGAATGGCA AATCAAGTAA ACACACATGA GATATAGAAA AGGAGGGAAT TGTGTrAGAG GATAAACTrG CCATTTGCGT CAAGGACCAA TGATTTTrTC TTTGAAAATG GGCGAATTG4GT TAAGTACAAT ATCGTATTCA AC'rACGCT AGATTAGGAA r 17CCTGAT TCATGTTTGT TGCGAGAGAT
AGAAGCAAGC
GTGTACGGAC
AAAGTCCAGA
GCTCGGTTAA
ATACACCTGC
GCATTTTGCC
ACACCTGGCC
AGAGTACCCG
TCTAGA.AGTT
TCGTAGTCAG
GAAAATCCAG GCCATTTGCA GCGACCTrAC TAGC'rAAGAA CTTTTTT'rr TrTTTACGAA'r GCTAGGGGAG AATTGGTAAA CCAGCTTGTG GAGGCCAGCT CATAAArCCT tA.AAAcr ALATAAGGAAT CCCTCTATCA CTI'CAGAAC TTAAACAGAT CAGTGTAGTC CCTTGCCTCA GGT'rACCAAG TACTGTGGTT 3300 3360 '3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040
CTGGTTGAAA
TGCGGATGTA
GAAAGTCCTT
GCTGGGTCAA
AAAGAGACAA AGGTTCAATT TTTGTAAATG GCAA'rCTAGC AAAGAGCGAA GTGAGGGCTA AAACTGTGGC TCAAGGAGCG TTTGACTCGT CTACAGCAAG GITrTCTTTA TTTCAGTCAA AACATGGGCT TTTATGTT'rG GGAATTAGAC ACTCAAATAC CTGAT'rTACC AGGATCTCCG CGGTAAACTC TTCCTATGGT CAAGGTAGTT TATTGGAAAT AT'rGCGTCTT ATCTCATTTT ACAGTTTCTG AGGACAAGGA CATCTGTCGC TTATCAAAAT CTCTTTTGGA TGAAAGAACA AGCAGAAGCC CCTGACTrA'r GGACTGAAAG AATGGTATCC ACAAArTCGA CCAGATTGA-A CAAGACTTGA CTAGCTATTA TCAGCACTTT 'rCCTCAAAAT GATTGGCAAA AGCTTTATCC ACCAGCCTTT AAGGAAAAAC AAGTTTTAAG CATTATCAAA TCAAGGAATI' CCCrATAAGA GACAAAAAAT TATATCCGGC AACAACT'rTA TATCAAAAGG GAGAAAATAT CCAATAGTGG GCAAATTTTT TATACCTATT ACCAAAAAAA TATCAGCAAT ATTTCTTGAA AAATATGGTA GAATAGAAAG GATGGAGGAA TCTAATGGTA TACAAAGAA TGAAAAAGAT ACATGGGATC TATCAACGAT CTACCCAACT GACCAGGCTT CTTAAAAGAT TTAACAGAAC AATTGGAGALC ACGrAGCCCAG TATGAAGGCC TAGTGCGGAT AACCTACTAG AAATCACTGA ATM-TCTCTT GAAATGGAAC GAAGCTT"TAC GCTTATGCTC ATATGAAGAA TGACCAGGAT ACACGTGAAG ACAGTACTAT GCCAAGGCCA TGACACTCTA CAGCCAGTTA GACCAAGCCT
ATGAAATAAA
GGGAAGAAGC
ATCTCTTGGA
GCCAGATAGA
CTAAGTATCA
TTTCATTCTA
TGAGCCTGAA TTATGGAGA TTAGCGAAAA AAAGCTGCAG GI'TATCAAC ACTATTT-TGA TTrCACAACGT
AACCTTCGCT
TAAAGAATT
GGTTCGCCGT
TGCCAAAACC
CAAGAGTGCT
GAAGAAGAAT TATTGGCTGG ATCTTGGACA ATGCGGATAT CAGCTATCTC ATGGGACTTA GGTGCCTATC AACCCTTA TTGCAAACCA ATGTTAAGGT CGTCATGCAG CCCTCGCAGC CCAGTATGCTCrAC?1N'AG AAGCTCAACC CAAGCTrZrM CAAGGCAAGG ATCACG~TCT AGCTGGAGAA ATCTrGGTT CAGCAAGTGA TGTGT'rCCCI' TATGTCC'rAG ACGATGATGG CACACGTTG ATGGAGTCTA AAAAACGTGA TTTGGTAGCA GCAGTTCGCA AGCATTTGCC AAAAATCTTG GGGATTI'CAG ATCTCAAGAT TGAATACAGT TTACCTACC AAGAAGCCT'r GGGTGAGGAT TACTTGAGCC GTGT'rAAACG CGA.AAATCAA GGCAAGCGTT CAGGTGCCTA 'rATGCTTCTC AACTGGCAAG ACAATCTGGA TCACAGTATG CATTCAAGCT ATACTCGTGA TATCTTTTTC GCTGAGAT'rG CCTCAACTAC GGAAGAAGTC GAAGACGACG CAACACGCtTT
TGCGACTTAC
GCAAAATTAC
GAATTTTGTT
ACTCTTACAT
GTACGATGTC
GAAAAAAGCA
TGCCTTCAGC
CTCTGGTGGT
CAATCTdTrT
AACTCAGCCT
CAATGAAAAT
TGCTATTCTC
TGC'rGAGTTT
TTTCCTAAAT
CAATCCTGAA
G~AACAATTCC AACACACCTA CGTGCTAAAG TTCGTAACTA CCAGAAAGTG TTTATGACAA CGCTATCTTG AGCTTCGTTC TACACACCGC TTTCATCTGT GAAGATGCTT TGGCAGTCTT GAGCGTTGGA TNGATGTITA TCTTATGATA CCAATGCCTT ACTC7'rGT'rC ATGAAACAGG TATGTTTACG GGGATTAC'rC ATCTTGACGG AGAAATTATT AATAACTTCC TAGATGGTTT GAACACGCCA TTCACCAAGC AAACTCTACG CAGACTTGAA ATCCAATACG AGTGGGerCG 'rCAACTGGCT TTGCGGCCC 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6,000 6060 6120 6180 6240 6300L 6360 6420 6480 6540 6600 6660 CCGTGGAACA GTTTTCCGCC AGATCAAAAT GGGGAGGTCT CCAAGAGTAT TA'rGGTTTGA
AAACTCAATT
TGACAAGCGA
GTAAGGAAGA
CATTCCACAC TTCTACTATA ACTACTATGT ATATCAATAT CTCAGCCTTG GCTGAAAAAA TTGTCCATGG TAGTCAAGAA GACCGTGACC GCTATATCGA CTACCTCAAG GCAGGTAAGT CCGACTATCC ACTTAATGTC ATGAGAAAAG CTGGTGTTGA 798 TATGGAGAAG GAAGACTACC TCAACGATGC CTTTGCAGTC TTTCAACGCC GTrrAAATGA GTrTGAAGCC CrGTTGAAA AATTAGQATT GGCA'rAAAA'r GG?'GA.ATCG TATAGTAAGA ATGCTAACCA TAACATGCGT AGCGTCAAAA GCAGG'rCACA ATATTCCTAT TATTCCCCAT
AGCCTAAAA-TATTCTGGAA
AACATGCGCC AAATGCTAAG CCAACGAAAA 'TTrTGCCCAG CGGTGGATGT CTTATCTACA AGTCTAAATA CATCGTCTTT TTGTCTTGGA TGATA'TTT GTGGTCAGCG AACCATTTAT CGTCCTGTCG TCAAAGAAGA AATTGTAGAC TTGATGCGTC GGTT-TCTTGA AACAA'TTGGA AGACr GCC CCAGGAA GAAACGGTTG CTTATTTCCC-TrTTCTTATG GAAACCATGC ATTGGGACGG CTATCCGTTT TTCAGCTCTC T'rGATGGCTG ATTACAACTA T1'GATCGTAA TCCAGAAATG ATTGGT'17I1'G TTTI~GACAGTC GCAAGCAAAT CACTCTCCTA GAGGGAGATC CTGACAGAGT CTTATGAT'IT CGTCT-rTATG CTGCCAGAAA TCCTCAAACA TTTGGAAGTT CAAGGTGGTG ATGTTGCCAA GGATAT'rATG
GATTCTGCCA
GGrTGGTGTGG
GAAGTCCGTC
CGAGGCCTTC AAAAATTATT CAGAACTCAC CGCAACATTA GTGCCTTTAG GAGATGGTAT TAGCAGA'rGT TCAACTGTCT GAAAGCGAAT GATTTTCAGA AAAATAGATA GAGTAACACT TATCTCAAAG GAGTAGACAT GTGCCATCAC ACTATTATCA GTAGCAACTT TAGCAGCTTG CAdACCTTAT CAGCATGAAA GGGGATGTCA TTACAGAACA AAAGCAACCC TTCAGCCCAA CAAGTCTTGT TAA.ATATGAC AACAATATGG CTCAGAGCT'r GATGATAAAG AGGTTGATGA TGATGCAACC TTAGACAATC TCTCATGCTT CGTAAAAATG AAATTTAAG AAAAAATAGT GAAGAAAAAA TTATTGGCAG TTCGAAAGGG TCAGAAGGTG TCAATTTTAT GAGCAAGTGA CATCCAAAAA GTTTTTGAAA TACTATTGCC GAACAAAAAA 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580
AACAATATG
GTAAAGCTCA
CTGAATTGAC
CGAAAACTAC CAACGTGTCT AATTCGTACAL AGTAA.ATTAG AGATGAAGCC TATAAGAAAG CTCAAATCAT CCGTCTTAAT CAGAAGGTGC TGATTTTGCT AAAATGGTGG AGAAATTACC AAGCCGCTTT CGCTTTAGAT AAGCCTACAG TAGCCAATAT ATATTGATGA CTACAAAGAA
AATGAAGATA
CAATTAGCCA.
TTTGAT'rCTG
GTGGATGGTG
TACATTGTAA
AAATTAAAAA
TGTCACAAGC AGGTATGACT CTTGAAACAC TTGAGTTGGC AGTTAAGAAG GTAGCAGAAG CCTTTGATCA GTACACTCCA GATGTAACGG AGGCCAAAGA AGTTCTCGAA AAAGCCI.ACG AAGATAATTC AACTGATGAA AAAACAAAAG CTTCAACAGA AGTACCTGAG CAAGTCAAAA TTT-CTGATGT GATTACAGCA ACTGGCACAC P.ACTCACTALA GAAAACAGAA AAATCATCTA CTGTTATCTT GACTCAAAAA CAAAATGATT CAACATTTGT TCAAAGCATT ATCGGAAAAG AATTGCAAGC AGCCAATATC AAGGTTAAGG ACCAAGCCTT CCAAAATATC TTACCCAAT ATATCGGTGG TCGAGATTCA AGCTCAAGCA
GTAGTACATC
AAAATGAAGC
AAAGTTAGAC
17 AGTTTTA
TCCAATTGAT
ATGCCAAAGA
GCTTGAATTC
TGG'DCACTAT
AAACGAATrAG AAACATrcCC
AATTAATTTA
TCTTAATTCT
'rAA.GTGATTT
AAGATTCCAT
CCTTACTCTC
GCAGAATCGT
799 TCCAAA'rCAA TGAGTCAGGG AAAAAACTCG ACTTCAGGAA AcAATAAAAC GCATAGTACA AGGTTTGTAC TGCCCCCCAA TCCGAAGGAT TTIGTTCTGT ATTGCACAGA GCTAAGTCCT CrATTGTTG TAATAATCAA TATAGTCTAT AATGGCTCGT AAATGTTT'rc TCATAGCCAT AAAACATTTC GGATTTAAA CCTACCGTTG TCTTGGCTGT TGCCCTTACG TGACATGGA'r TAGGAAGCGA TGATAAGAA'r CGTGI'GATA TTGCCAGCCT ATTCTCGTAG TGCTTCTCTT TGAATGCCTG TTCCAACATT 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 AACGATCAAT CAATTTAATC ATGTACCTAA GATTAGAATT GTrrATCCCA AATTTATTTG AAAGCTTCTC 'rAAGCTATAT CCTTGTTTTC TAAGTI'CATA GATCTGAACT T'rATCATCAT AAGTTAATTT CATAATAAAA ACACCCCAAA AGTTAGATT TGTACTTCAT GTACACCTGA TATGATGCGT TTTATAA'N'T TCATTTT AACTTGATAC TrCAGTGAAAA GCAAAGATTA GCTGCTCAAA GAACAGCTTT GAGGTTGTAG ATAAAACTTG ATGTGAAGCT GACGTGGTTT GAATAGATTT TAGAAGAGTA Tl'rCTGTC'rA ACITTTGGGG TAAACACTT TTGACCAGCC AAC'rAGGAAG CTAGCTGTAG TCAGGTCACC AACATATATA TGAGTCTGGA AGTTTTAATG GATAATGCAA GATTCCATAG AATGGGTAAG AAACTTTTAC.CTTTTCCTCC CTACTCATCT ATAACTGCTT CTAAAACATT CTTATAAATT TCTTATTTCA TT'TGCTTCTA CAATCCTGTT TTGGCAAGTT GCAATACCTT TTTrACGAGGC ATATCTCCTA 'rGGTTCTAGT TCAGAAGGC3' AT'7CCAACTA 'rTGGGAGTGA ATGT'N'CAAA ATTTGCTAGA CCCCATC?1' TACGAAGAAG ACTGGCAGTA AGATTGGCGC CGTGTCCGAC CTAGAGTTCT TATGTGAAGA GTTTGGGCAT TAGTATAGAA AAGTGAATC'r GA.AATAGTAC GATTTAAATT CTCAAATCAT ATTATTCAGT GAGAAGACAC GTG-TTCATAT CAAAAAGG'rA TCTGTTdTCT
AGGCTATA.AT
ATCATGGGTT
TGGT'TCTTTA
AATTAGAATA
TATrTTTGTT TCAACTGACT TATGATTGAT AAGAAGTA'rC TCTATAATGG TCAGGCTGGC TAGCCTAGGA GAGTACGAAG CGTTCAGCTG GACTATCTTT 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 TAATGATTTG ATAAATTGGA TGGTCCGTTG AGT'rGTACTA TAGAGGGArr CATTCGAGTG TCAAATTGAG CAAGA'rrGA ACGAAAAGCC TGGATTTGTT AGCrrCCAAG GT'TGCAATTT TCAAACCL'C TAACTTCCCA AGTTGCCATT
CGGCTCCGAA
GCGGGTAAAT
CACGGAGATT
AGGAACGA'rT TCTAAAGAAC AGGGGGTATA GAGT'rGACT'r TGGATAATCT CAGCAGATTT GACCGCTCGA GGTAAATCAC TTGA-ATAAATI CTGATCAAAA GGAATTTCCT TGAGATACTG
ACCAAGTCGT
TrGAAAACGA
TACTTGATGT
GGCGCGACGA
AGAGCAA1TTT
TTGGCTCCAT
TTTAGGGTTT CAATGGATTC CCTTCTTGGT TCCACP.GGGT CC!TCCAAAAT ATCTACAAAG TAATGCTGrG TCCGACTTCG TATCGATAAA AATGGGATAA GAATGTAACC AGT 1GTTTTT 800
AGGAAGAAGA
ACGACCGTGG
TCTGCCTTTA
CCI'GCAGAGA
TTGTGTrCT
TCTAAGTCCT
TTTrCAGACA
AAAAGCGCCA
TTTTTATGC CAGAAATTTr ATTCCGATAA TCGGTCCGGT
AGCTAGTTTC
CT'rGTCTCCC GGAGAA'rCAC CACTAGCACC CGGACAAAGT AGAGTTrCAT CAAAGCTAGC CAAGTCTrGT CAATCATTTG ATCCAAATCT GACGAATTCC GACAGGATTA TTTGTGGAAT CATGCTCACr AGTGCTGAGT GATAGGGACA AGGTTrTGAA AATCTGATCT TTGAATCCC CTGATGAGGG TGATTrTAAC N'TTTTGCC AAGCCAGATT CCCAGCGCAA TGTGTATTGG ATACTI'TCAG TAGGACGATA CCGATGATGA GTCCGTCCA.A GCGTAGGCCA AGGTGGGAGG GCAATATAGA AAATGTCAAG AGCAAGGTCT
CGTTCATAAC CTTGAGGAAG CTCTCCTrCT AGGGCATTGA ATAGCTGCTT TAGATAGGAT TTGTTCCACC AATG~wrTT AT'rATTTATA T'TTATCCTCC AATTGACTCA TCCAAATACC AGAAGAAGGC GATGATGACA TAACCGACAA GTGAA.AGTCC CGTTTCCTGC ATTTGGAAT'r AAGATCAAAA GGGTACTTGA AATGATAGAC GAACTGTTTA CGGAGTTCTT CTAGTTCTCC CTTCTTCTTT CTrGCCTrTA CCTTrGGACA TCTTGTAA.AG CATGACCTGC CTCGACTAGC GGACGCATGT AACGGTAGAA -GGATATGGGC ACCGTCGGTA TCCGCATCG

INFORMATION FOR SEQ ID NO: 109:
SEQUENCE CHARACTERISTICS:
LENGTH: 5548 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11309 120 180 240 300 360 420
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
CCATAGTCTA ACAAGTCTTT GTAAAGGTTT ATCCCTGAT'r CATGTAAAGA T1'GTGTAAAG AATCAAAAAA AGCCACTTTT GAAAAATGGC TGCTCCTAAA AATAGCTTTA AAA.ATTATTA GTCCTGTGCG AAAGATTGGT TAGGAAGAAA. AATCGTGAAG CAACTGCCTC TGCCA.AGCTG ACTCGTCACC GTGACTTGGC CACCTAATAA TTGACTGAGT TCTTTGACAA. 
TGGCAAGGCC AAGACCAGTG CCACCAG?1'T GTCTGCTTCG ACCT'rTATTA ACTCGGTAAA AACGTTCAAA AATACGATCC TGCTCTAATT GACTAATACC AATCCCTGTA TCTGATACAG AAATCTTAAT GCCTTCGTTC ACCTTTTGGG TCTTGACCTC AATTTTTCCC CCTTGTTCAG TGTAACGGAT GGCATTGGAT AAAAGATTGA GTAAGAIrG GGAAAGTAAT TGACTATCTG ATACGAGGGT GACATCATCr GvGCACCTGCA CCTTTAGCTG TAAATCCTTC TTCTTGAGCT GAGG7?GCAA GCTTTGAGTC AAATCCTGTA CAAATTCTGC CAAAGAAACG GTCGTCCATT GTATAGGCAT TTGTTGAGcc TTAGATAAGG TAAGAAGATG CTCAACAATA TGCTCAAGAC GCAAACTTC TITGTAAATA ATGTCTAGAA AGTCATCCTT GAGCCrTCT TCTCAGCTG ACATCCCCL' AATGGrrCA GCAAAGCCCT TGAGACAAAG GCTAAATrTA TAA'DCGAAGT AACTGGTGTC CTCAAT-rCAT GGGAGGCATT ACTTTTCATA ACTTCTAATC A'r1GGGTGOG CAAAAACG
GTIGTTAAAT
GGAACTGCTG GACGAGCACA
AATCAAGTCA
GGCTTrGTCA
GCCGTCCACA
AGCGGAAACT
TAGATCCTTG
GCTTCCACAG
CCCTCATGAA
ACThAATTCC
TCGGGAAA.AT
AAAAACGTCC
TATTGCTGAT
CATATAGCAA
TCACTTCTA.A
ACCCACTTAC
GAATATCCAT
AATGAGGCAG
CCATGGTTAG
CCTGTTGGGA
TTCTTGT'r-r
CCGTTTGAGG
AGAGCGACTG
GTCACAGA
GACTTTGGTT
TTCTAAAAAG
AACCTTGTTT TTTGA'rCAAA TCATCAAGTG AACTTATGTC GATAATA-ACA TCTGACCTTG AGAACCTCCA TT-GT'rTCGC TTTACGCCCAG ACACATACTG TA'rTCACTAC TGTCAAGAC 00.9* 0 *.w .Poe ACCCAAAGAC TTTAAGTCTT CTTGCCCTTT AGGTTCGTGC AAGCTCTCAA AAGCAACTTC CCATTTCCAA ACCAAPLAGA GCCAGTAGCC ACCTAGTCCC AAAGAAAGGG CTAGAAGAAA GAGACCGA'rG CCTTNACTGA TCCAAGTTAA TGCCATCCCT GCAATCAGAA TGAGGCTAAC ACTTAGATTG ACTAGCCAAA AT'rGAAGGTA 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 GCGT'rTCATC
AGGGGCTTTA
ACGTGTTTCC
TGTCATGTTG
CAGTAACTTA
TAGCCAAGAA
CCTGAGGACA
G;TCATCAGCC
CATCATAAT7 TATAACTCCT TGAACTTATA ACCATAACCC CCAATCGTTC GAATAA.ATTG GGA?'rGTCTT CAATTTr'rTC CCTCAACTTA CCAATATGAA CGTCCACCAA TGCCCAAAGT CATACCCCCA GATACGTTCC AAAAGACGCT CTCTAGTCAG
GGATGTTTCA
TTCGCCTTGT
TCGTCAGCGA
GCCTTGACAC
CCTAATTCCA
GGAGTTNGA
TAAGATAGAG CAAGAGT3'CA AATTCTTTTG AGACTTCATG ACGCTCAGGG 'rATACTTTCA TATTATCTGA ATCATCTCCT TCTTGT'rCTC GCGCCAGCAA TTCTCTAGGC CTAAAAGGCT
CGGTCAAACT
AGGTCCCAAA
CT'rTAGTTCG
TGGTCAGGTA
TCGCAGAAAC
CCATGCCATC
CTGCCAALAGC
AGGCCAAAAC
CGCCTTTGGC
CTTATCAAAT
TCTCAGCCGC
AAAATCAAAG
TCATCACTTT
TTACAAACTT
GGT1'CTGT TAATTGTGGT AACATGATAT CAAGCAAGAT TAAGGCCTTC CGTCCATTTG TCACCAATTG AGTAGAAAAG CCTTCCTTAC TTAAATGGTA GTCAAGCAAT 'rrCAGAATGT GTTCTTCATC ATCCACTAAT AAGACTTGTT TTGTCATCTA 802 TTATCTCCTA TTGGTAACAT TATAACACAA TTATCAGAAA TCCrAACA -r GCTAAATCAG ATrrAAATTTG CCTATCAAGA CTAGTATCTG GTCAAKAC=C CAATCATCTC CDGTGCTCT GGATACGTCG CCAGTAGATC TACCCrrTCA AATAArrCAA AAMCCTCAAA '1rCAAAACCA GGAGCAACAA GACAAGAA6AC CACIAGCATCA TCCTTATCAA CTGTTGATCC CCAAATAGGG CCCTTAGGAA CACAGTAGTG AAGTTGTTGC CCTTTGGA'rA TG;TCCAGGCC TAAAGTGACT GCTTCGTAGT GACCATCTGC TGTAATCATG TGAACAGTAA GTGGGGATCC TGCATGAAAA TACCAGATrr CATCTGCTGT CPLATCGGTGA AAATGTGAAG GA'rTCGTTTC TTCTAATAAG AAATAAATAC TGGTATAAAG CGCCCrrCCC TTACCAGCAA GGTTTATAGT GTCTGAAGCT TrrTTGTTT GTCTAAAATA GCCACCTTCA ATATGGGAG CTAACTCTAG AGTTCTTATC AAGTCTTC -r TATCCGTCGG CGAT -rCAAG AACTCCTCTC AAA'rCAATCG CCCTAAAAAA TGAAGGCTCA AGCGATrTC ACTATTGTCA TGATATGAAG AGCCAATGGG TTGAAGTA.AC AGTCGAGG ACACGGTAAT AGAATTAGCG AATCATTCTG ACAAG'rCAAG GGAGAATTGT TTGTTCAAAA TGAGTCGCAG TCTTGT'rCAA AGTGGTTTTA GA'rTGATGCG ACGCAAGTAC GTAAAAAAAA TGCCACGCTA TTCTTTGGAT ATCGCTGTGA AAATATCGGA CAAGCTGCTA ATATCCTCAA CCACAAACTC TAAAGCCTAT AACCA'rGCGC AGTAAAAACG TTTAAAATG;A ACGAATGAAT TTGATTGCTG AAATCTTGGC TGACAGTGGT TATCAAGGGC TCATGAAGAT CACGTAAATC CAGCAAACTC AAGCCACTAA CAGTTGAAGA 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 TATCCAAGGA GAGAAGCAAG GTrrGAA.CA TITTCAACAAC CTA'rCGAAA'r CATCGTAAAC GCZA'TATCAA TCA'rGAACTA GGATTCTAGT AATTAG'rGAA GCGTTTAGGC AAGTGTCTCT ATATTTAGGG GTCATGACTA GTGAAGCAGT TAACAATTAG GAACTr'rAGT TCCAATAACT GATCATATTT ATGTCCTAAA ACTAGTGAAG GTTAGTTALCT 'rAGATTGCTT TGCAATCAAC
TCTTTGCCAA
ACTTCGGATT
TTTGCAGGAA GTC'rATTATT TGGTTAGCTG GGTTACGACG TCATGGACTC TAAATCGATT TAGCTAGTTC GCATATAAGC GGC'rAGCGTC TTAAGATTAC GACGT'rTTAG GACATAAATC CGCCTAGCCA AAGTCCGAAT AGGATTTh;GC TAACTTTGGC GATTTACATC TTCTCTGGCG CTTCTACTCC AAGCAAGCGA AGGGCTTCT'r TGAGAACGAC TGCGGTrGCG TAGCTGAGGG CTAGACGGCT GTCGCG'rTCT GGGCrrTCAT CCAAGATACG TIGTATGTGCA TAGTATr'rGT TAAAGGATTG AGCCAGGCTA ATTIGCAAATT TAGCAATCA'r AGAAGGTTCA AAGTTATCTG CCGCACGGT'r GATAATACGT GGGAAGTCTT GAATCAGTT'r AATGAT'rTCC CAGCTTTCAG TATCATTCAA GCTATAGTTG CCAGCTGTI'T CTCGTTTGAA ATCGGCTTTG CGTAAGATAG ATTGGATACG AGCGTAGGCA TATTGAACGT AAGGTCCAGT TTCACCCTCG AAGGATACCA
TAGCCTCTAG
TGGCTCCAAT
CCTCGATTTG
CATTCCCTTT
GAGTAATGTC
AGTGGGCAGA
TACGGTAGAG
TGATGAGGGC
ATTCAAGAAG
AGAAGGCTTC
GTCCGbAGMCG TA'rCCATTTG TACGGTCGGT T?1TGAGGTCA TAGAATTrAA CCCAACP.GCA TGTGCTACL-r GGTCTTTG= TrCTAGTrCA GGATTTTTAG GACC?1'GGCA CCGCTAACAG CCTCTGCAAC AGTAGGCTCT AGCAAGATGA ACGAGTAGAG AGTTTCTTCC CTrCI--rG AACCAAACCA AAAGGAACGT GTCACTCCAG TCGTAGCCCA TCTCTTGCAA GACAGCTTTG AGCTGTTTAA
TTGTTCTTGA
GGCTGCAGCC
TGGATGTTCA
TAGTCCTTTT
TCCGTTATAG
CCAACGACAT AGATAGATTT AGCAAATTGG TATITCGTTr'r AAGTCACGTG TGATATAGAO AGVI'GCACCA TCAGACTTCT AT'rCCATAT TCTCAAGATr CACAACTTGG GCACCTTCTG TCAGAAAGAA TGTCTACAAC TGCATCCATC TTATCATTGT CTGTCAAATT CAACCTTCAA TTCATTGTAA AGGCGGTTAA ATrCCACTAA ACTTTCATCG CGGAACCA'rT GCCAAAGAGC GACAGCTrCC TCATCTCCAT TTTCAAGTT'r ACGGAACCAT TCGCGCGCT'r CTTCA'ICCAA GCTAGGGTCA TTTTCAGCTT
CAGCGTTGAT
CTTCGTCGCC
CCAAATGGTT
GCGGACATAG
CCATTTTrTG
GACCTTGACC
AGTTTAAGGA
TAGGCAACAA
GTTTGATAAC
CTCCGA-TAAC AGTTGAACGC AGGTGGCCAA ACATGTCGAT AACAACATTT TCT'rGTTrAC CAGTGGTA.AC AGCTTGCAAT ACTTGAGCAG CGTAAGGTCC TGTTGCGACA ACTTTTTCAA CAGCCGCAAT CATTrGTGGT GCTTTACGTT CA.ATGTCTCC CATTTCTGAG T'rTTTAGGGG GTTCA'rCGAT
TCAACATCCC
CGATTTTTTG
TAGAAAATGG
CAATATTTTG
AAATGGCAGA
AGGCTTGGCT
CGACTTTTGC
TTTCCAGTAA
TGGATGAGCT TT'rACAGCTT AAATrGTTTA CCCCAGTCTC GAAAATATGT GACAAGCTAT 'rTTAGCGATA TTCGGACTAG GTCAGCATAG TGTTCTr'r'r TrTATCAAGG AAAAAGTTA.A GTTCATTTTT TCAGCCAGTT AAGAGAAAAA GCAGGGAAAG CTTAAA.ATA GCCTCTTGGTI
4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5548
CCAGGCTATC AATGATGCTA GATAA'rTCGC TAGCAATCAA TTCTTTTGTA 'TTCATTAAGA GCTCCTTTTT GGACTTTTCT ACTATTI'TAT CACAATTTTA AAGAAAGAAG AAAAAAT'NTr TGAAATCTCC TGTrTTTTTTG GTATAATATG GTI'ATAAATA AGAGGATTTT ATGAGAAAAA GAGATCGTCA TCAGTTAATA GAAATTAAGT ACACAAAAAG AAATTICAAGA TCGGTTGGAG GCAGACAACC TTGTCTCGTG ATTTGCGG

INFORMATION FOR SEQ ID NO: 110:
SEQUENCE CHARACTERISTICS:
LENGTH: 3132 base pairs
TAGTTATAAA
AAAAAAATGA
GCGCACAATG
TATGCACGCA
TTACTGAGGA
TTTGTGTGAC
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
TACCCGGTAG TCTTAGCAGA CACATCTAGC TCTGAAGATG CTTTAAACAT cTCTGATAAA GAAAAAGTAG CAGAAAATAA AGAGAAACAT GAAAATATCC ATAGTGCTAT GGAAACTTCA CAGGATTTTA AAGAGAAGAA AACAGCAGTC ATTAAGGAAA AAGAAGTTGT TAGTAAAA.AT
CCTGTGATAG
AAATCCCAAG
GAAGATAAAG
AAGGAACTAT
GGTAGTGCCA
TCATCGG'TTG
GGAGTTGAGG
GATGGTAGAG
ATGAGAATCG
ACTGATAAAA
ACAATAACAC
GAGATTATAC
TTGTCTATAT
CCAGTCTTAA
TAGAAACAAC
AAAGGGCACA
AAGCTATTGA
GTATGGTCAT
ATGATGATGC
ATTATrGT TAGCAATGAA GAAGCAAAAA TCAAAGAAGA AAATTCCAAT GGACTCATTT GTGAATAA.AA ACACAGAAAA TCCCAAAAAA TGCTGAATTT AAAGATAAAG AATCTGGAGA AAAAGCAATC GAATACAAAA GTTTTATATA ClTl'ATGATAG AATTITTTAAC 'rCCAGATAAC TTGGACAAAA TTAAACAXAT AGAACGTATT AAAAGTCCAA CCCATGATGA ATCATGCCAG AAAGGAAATT TTACCTAA.AG TCTATCAATG 'rTCAAATATC GATACTGGAA CAAAGCCTCA ATGAGATTTA GAGTGATAAA ATCCCTCATG CTCCGTrTGG GAAAAATTTT CAGATT-ATAG ACATAAGGCT AAAAAGAAGA CTTAAAAGGC CG-TCAATTA TTATAATGGT AT-TTGACCC ACATGGGATG ACATCAAAAA CTTTAACGGC AAATGTATTC TGACGCAGGA GGCAAAATCA CTGTAGAAAA ATATGATGAT CATATTGCAG GGATTCTTGC TGGAAATGAT ATAGATGGAA TTGCACCTAA TGCACAAAT'r TCTGGGTTTG CGGGTGATGA AACALATGTTT GTTGATGTTG TTTCGGTATC ATCTGGTTTT TGGCAAGCTA ?TCGGGCATT AAGAAAAGCA TATGCGACTT CTGCTTCAAG TTCTTCATGG ACCGACACTG GAAATGTAAC ACGAACTGCA GCTAAAAATC AAACAGTrGA GTTTGATAAA AGAAATATAG GGGCCTTTT CGA'rAAGAGT GCTCCTAGTA AATTAAAATT TGTATATATA TTGGATCTTA GGGGCAAAAT TGCAGTAATG GCTTTTAAAA AAGCTATGGA TAAGGGTGCA
GGAAGGGATT
ACTGAACAAG
TTCTCTTACA
240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 CATGCTATTG AAGATTCTAT ACAGGAACAG GTC=rGTAGG GGCATTCCAA TGG~rGTrCGC GATTTAGTAG CAAATALATCA GCACATGAAG ATGCGATAGC
CAAACACAAC
TGAGAAATAT
TACGGGTAAC
TCTGAAAATG
GGTCGCTTCT
TTTTAAATAC
TGGAACAAAA
TT'rGATAGGT
GTTAACATAG
AAAATCACAA
GGCAAGGGGC
GTGGAGAAAG
CAA.ATGA.AGA
AAGACCAAGA
GATAGAATTT ATACAAAGGA TTTAAAAAAT CGCGCCA'rTA TCG=rTAAA TACTGTAAAT
TACTACAATA GAGATAATrG GACAGAGCI' CCAGCTATGG ACTAAAAGTC AAGTGTTITC AA?1'TCAGGA GATGATGGTG AATCCTGATA AAAAAACTGA AGTCAAAAGA AATAATAAAG GAGCXATACT ATCCAATTGA TATGGAAAGT TrrAA1-1CCA GATATGAAGC GCATGAAGGT TAAAGCTATG GAACATGATT AAGATTTTAA AGATAAATTG ACAAACCGAA TGTAGGTGAC GAAAAAGAGA TT-GACTTrAA GTr'rGCACCT ATCATCGTrC CAGCAGGATC TACATCTTGG GATGTTCAG CACCTGGTAA AAATAT-rAAA AC'rTATGGCT ATATGTCAGG AACTAGTATG TTGATTAGAC CGAAATTAAA GGAAATGCTT CACACAGACA AAGAACTCTA TAAAGAAGAT GGGCCAAGAA TAGATTTACT TAAAAccc TCCACGCTTA ATGTTA'I-rAA TGGCAA.ATCA GCCACTCCAA TCGTGGCAGC TTCTACrGTT GAAAGACCTG TATTGAAAAA TCTrAAGGGA AAAATTGCCC TACAAAATAC TGCGCGACCT AGTCAATACT TTGCATCACC TAGACAACAG TTGAGAAATG AA TGTAGC AACTTTCAAA 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640
GATGACAAAA TAGATCTTAC ATGATCGATG CAACTrCTTG GGAGCAGGCC TAATTAATGT AACACTGAT'r CTAAAGGTTT GGTGATAAAA AATACTTTAC AAAGTTTCAG CATCAGCGAT ACATATAAAG ATGAAA;AATC
A.AGTCTTACA
GAAAGAAAAA
GGCCAATGCT
GGTAAACTCA TATGGTTCCA T= CTCT'rAA AGAAATAAAA ALATCAAGCTT CACAATACAT CAAACAGACC TTTGACTTTT1 AACTACAGAT TCTCTAACTG ACAGATTAAA ACTTGATGAA TCCAGATGGT AAGCAAATTG AAAGTCAAAG GAGCAAATAT CACATTTGAG CATGATACTT AGCTTTGA'rT TGAATGCGGT TATAAATGTT GGAGAGGCCA TTCCAGAAAT TCACCCAGAA TCACTATAGG CGCAAATTCT AAAACAAAAA TAAATTTGTA GAATCATTrA TTCATT'rTGA GTCAGTGGAA GCGATGGAAG CTCTAAACTC CAGCGGGAAG AAAkATAAACT TCCAACCTTC ?I'TGTCGATG CCTCTA.ATGG GATTTGCTGG GAATTGGAAC
CACGAACCAA
GGTTATGATG
GAACATGGTA
ACAACATCCC
TCCTTGATAA ATGGGCTTGG GAAGAAGGGT ATGATGGTAA ACCGAAAATT CCAGGAACCT TAGATAAAT'r TAATCCAGCA GGAGTTATAC TGGATCAAALA TCCAGAATTA TT'TGCTTTCA.
CAAGATCAAA AACACTGGGA TAAATAAGGG AATTGGTGGA AAA.ATAGAAA AGATAAAAAT ATAACGAAGG GATCAACGCT 2820' 2880 2 940f 3000 3060 3120 3132 CCATCATCAA GTGGTTCTAA GATTGCTAAC ATTTATCCTT TAGATTCAAA TGGWA.TCCT CAAGATGCTC AACI-rGAAAG AGGATTAACA CCTTCTCCAC TGTATAAG AAGTGCAGAA GAAGGATTGA TT INFORMATION FOR SEQ ID NO: 111: SEQUENCE CHARACTERISTICS: LENGTH: 14672 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: CGAGATTTCT ?TrAAATGAAC TACGI'GAAAT CTACCCATCA TCCAGATCTG GATATTCTCT CCTATCTATA AGTAAAGTTT TAGGAGATTT TAATATAAGT TCTCATGCTrT TAAAGCTTC GGTAAGAGAT TTAAAACCGC TCAGTCCC ACTCATNTGC 'rTCTGGGAGA GTTCTCATTT TA'rrA'rCTT GAAAAAATTA GTAAAAACAA GTPATATT TTAGATCCTG CAAAAGGCAG GCAGAGAATG TCAATAAGTG AATTTGAAAG GCATTATTCA AATATCATTT TAACATTTAA AAAGTTAGAT AGCTrTATGT CTCGTAAAGA TAATAAGAAG TCGCCTGTr'r TAAAGTATTT
TTTTAAGTAT*AGGAATAAC
ACAATCAT'rA GTACCTATAG GTATTCGTCT AGAATGTTAT GTATTTATTA AGACAGATAT CTATGATTrT ATGAAACATT AGGGGATATA CTTTTTAGAG TTTrATAGCA GCTATACTTG CTTTTCTAAG TACATGGTAA GTATCCAATC ATAAAAATCT TGTTCAAAAT ATTACTTCCG TAGGGATTTT ATTTTTTGTA ACAGCATTAT TGTATGTAAT CTAATAGATA CATAATTGAC ACGAATTrCA AGGACGATTC TTPACTATATT ATTTATATTT ACTCTTTCAT TCTCACTAAT ATGTTGCATC CTTAAAATAT ATAATGGATA AAGAGATTAG TGATATA'N'T ACCTTACAGT TTTTATGAAA AACGTACTr'r CTAACTCTAT TGTTTATATA AGAGAAATAC TATCAAATAA ATTrGTTAA'r GATTGTGGTT TATGCtGTdG TTTTATTTAG TC'TTTTTAAT ATCACTAAGT CTAGCTCTAT CTATTGTAAT CAAAAAATTT AATTGATAAA AATATAAAAG AAGTAATT'rC TA.AAAATAGT GATATTAAGC AGAGGAATTr TGGATTAACA AATGGGATAA TT'rTAATACA AAACAGCTCA AAAACTTGAT ATACATTTAT CAATTGTrAG TAGTATAACG AATGTTTTAC CCCTGTTTrG ACCC'TTATTG TAGGTGTAAA TATAAAAACA T-rCGAACAAT ACAAATTGTA GCAATAAGTA CAGTCTCACC ATACTTTATT TCTCCTATA.A TGATAACTAT ATACAATTAA TGTrATTAAA GGGATATTT TTAAGAATAG TAATACTAAA TCCGAATTAA TTCCAGAAAG AGTCAGTCA A GATATAAAAT AATAGAATTA AAAGATATTT GGTATAAATA TGGATTATTT GATGATTATG AATAAATGTT ACTATTAAAA AAGGAGAAAC TGTTGCTA~r GTTGGAGAAT AAAAGGT'rAA
TAACTGGAGA
TCATAGGTCG
AAATTATTCr
TGACGTTAGG
TTTCTTTAAG
AGGATGTGTT
TTGATAAAAA
TTTGAAAGG
CAGG'rTCAGG- 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 TAAGAGTACA TTAGCTAAAA TTTTATTAGG TTTATTAGAA CCTAATATTG GTTCAATAGA AGTTGATGGA GTAGAAAAAG AAGAAATTGG TCAAACATTG TATAGAAAGA TN'TGGAGC -1-1 1-1- AGTGTTACAA AATTCAACCC TAAGrATGG TACCTTAAGA GAGAATTTGA CATTTGGAcA C1TTGTTCA GATGAAGAAT TAATGACAA6A TCTAAATI'CA ATTGGTCTTA GCA.ATGTACT TAAATCTTTA CCTCTTGGAT TAGAGACAAT CATCGCTGAA GAAGGTAATA ACTTrrCTC .AGGGCAGCAG; CAA6ATGATAC 'I-r'rAGCTCG 'rrGTC?1'TTG TCGAAACCTT CGGTAGTTGT ?ITGGACGAA GCAACAAGTA GTI'TAGATAA TrrATCTCAA CAAATTACAA CTTCTTAC1'? AAGTGAAATC CGTACCACTA AG2ATrAAT TG4CCC.ATCGA CTAGATACTA 'rCAAGTCTGC AGATAAGATC TTAGTAATGC ATAATGGTGA AATTGTAGAG ATTGGGACCC ATAGAGAACT TCTTGAACTA GGAGGCATTT ATAAGCAATI' GTATTCAAA'r AATTAGTTT TGATTAAAAG G4GTAAATTTA TGAAGATTAT GAA.AAAAAAA TATTGGACTT TAGCGATATT ATTCTTTT1GT TTGTTCAATA A'rTCTGTTAC CACACTCAGA CTrAGCGAAAG AAAAATCAAG AAGAAGTAGA TTTGTAACAA CAGATAAACA ATAAATAACG ATATTGTTAA AATAAAATTA AGGAAAATGT AAAGATAALCT TAGAATCGTC GGTGGAGGCG AAAGTTATAA TCCTCAAGAA ATACCTAAAA ATCTTGATGG CAATATAACT TTTTTCTGAA TCTGATGAAA AACAGGTTGA CTATTCTAAT CCAAAATAA6A TTTCGTATTC AAA'rCGATAA GACAGAATTA TI'TAGAAAAA AACTCTTGTA AATTGGA-ACT 'rCAACCACAA CTCTGAAAGT AATAAT'rTAC TAGGCGAAGA TAATTTAGAT TTCTCATCTA GATAATAGAG GAGGAAATAT AGAGCATGAC GATTGTAAGA AAATATGAAT GGGATATAGA TA.AAGTTACT ATTATATTCT AAAAGTAATT CTAAAGTTTC AATTGCTATT 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 .2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 TTAGArTCAG GAGTCGATTT ACAAAATACT GGA'rTACTGA AAAATCTTTC AAATCACTCA AAAAACTATG TCCCCAATAA AGGATATTTA GGAAAAGAGG AGGGAGAGGA AGGAATAATA TCAGATATTC AAGATAGATT AGGTCATGGT ACGGCTGTTG TAGCTCAAAT TGTAGGGGAT GACAATATTA ATGGAGTAAA TCCTCACGTT AATATTAACG TCTATAGAAT ATTTGGTAAG TCGTCAGCTA GTCCAGATTO GATTGTAAAA GCAATTTTTG ATGCTGTAGA 'TGATGGCAAT GATATTATCA ATCTTAGTAC TIGGACAATAT TTAATGATTG ATGGAGAATA TGAGGACGGA ACAAATGATT TTGAAACATT- 'rTrGAAGTAT AAAAAGGCTA TTGATTACGC 
GAATCAAAAA GGAGTAATTA TAGTAGCTGC AT'rAGGGAAT GACTCCCTA-A ATGTATCAAA TCAGTCAGAT TTATTGAAAC TTATTAGTTC ACGCAAAAAA GTAAGAAAAC CAGGATTAGT AGTTGATGTT CCAAGTTATT TCTCATCTAC AATTCGGTC GGAGGCATAG ATCGCTTAGG TAATTTATCA GATTTTAGCA ATAAAGGGGA TTCTGATGCA ATATATGCGC CTGCAGGCTC AACATTATCT CTTTCAGAAT TAGGACTTA.A TAACTTTATT- AATGCAGAAA A.ATATAAAGA AGATTGGATT 808 TITCGGCAA CACTAGGAGG ATATACGTAT CrrATGGAA ACTCATTTGC TGCTCCTAAA GTTTCTGGTG CGATTGCAAT GATTATTGAT AAATACAAAT TAAAAGATCA GCCCTATAAT TATATGTTTG TAAAAAAA'rr CTG.GAACAAA CATTACCAGT AAAAAATGGT ATAAAAGTGT TAAATATACC AAACGTATTG AGATATGATT TGAATATGTT ACAArPAGAA TATAAAAATG AACAAAG3?rG GGATAGTTTC ATAGATAATG rrAATTTAAT TGAGTTGGAA GAGAGAATTC AAACTACTAT TGGAATTAAA CAAATAAACA CACACAATAT TATTACTATT GCCCGAGAAG GGTACTCTCA AAATTATTTA CCTAACACTT CAGAAAATAC ATATAATTCA TTACAAGTA GTTTAGTTGG AGTATTACTA Cr=rMATAA GTATGGTAAA TATNTATGG GCTAAAAAAA CTAAATGAAA ATAAAAT'rTG GAGCCCTCTG AAAAAGTAAG TCCTACAGTT CAACTAAAAT GAGTCAAAAG ATGAATCACC TTGATGTAGG GGAGTTTGTC TTATTGCTGC CTGAACACCT CCGTTCAGAG GAAGAACATT ATAALATCTGT 'TTrTGAAGAC GACTTAACCA GTCGC-ATATC TAGTCAAGA'r GAACGACAGC AAATGACTGC TACGGTAGGT TATTTIAGAAT CAGGTCAGGA TCGTTTTGTG TATAATACGA CcCCCTATTTC TTACCAGCAG TTTTTGAAAG A'rCCAATCAT CA'rrGTTATA ACACCCCAAT CAACTGGTCC ACAGTCCATT TTGTTTT'GGA TAGACGCAGT ACAGAACTAC GT'rTCTCTTA A'rCAA'rTGTC 'rGATGCCCAG GAGCTTATCC AGAGACAAGG CATTGAAAAT TGGGTCTCAG AAATGCAAAC ACC?1'ACCAC AACTACATCA CATTAT'rGGA *V 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 TAA'rA'rCAG ACGGAACGTT AATCTTGTTG 'rTTAACACTA TATCAAACGC ATTGCAGGTC ACTGGGTGTG TTI'TTACTGG GGGTAATGCT AGCAGGAGCT GTGCTTGGGA
TGAATAGGCT
TCACGGTCTT
GATTTGTTGC
CTACTTTGAkA GATTTAGAC AGAAATCCAT CGCACTTATC TGCTTTCTTA GTCTTGTTAC TCTTTACTGG GCAGAAAGAA AACAAGATGT CCATGCTTGT
GAGTGTATTT
TCTATCTCTT
TTTGA.AGGGA
AACAGGTGAG TAAATCTTTT AGGCTGGAAA AGTCTATGCC ACATGATTGG GAAATTAGAA CCAAT'rATAA ATCAAGTGAT GCTTAATTGA AAACCAAAGT TGAGTCCGTC CGAACAGCGG ATCTTGACCT AGATAAGCGC TGGCAAAAAT TATCTTAAAG GGAGAACCAG AGrrTTTC TTAATTGGTT CAAGTGGTAG
CCTTATGATG
rTTTTTCCGTC
ATTGAAGAAA
GGACGA'rrrT
ACGAATTGGG
ACCTTAAGCT
CTTCAGGTAG
TTACAGTTAC
GGTTAATATG
GAATCTT'rCA
CGGAAAAACA
TTACCCAGGT
CTACCTCTTC
AGGTCTCATT
AGAACAGGTC
AGAATCGCAA
AGATGAGCCA
T'rGCAACT'rC GTGCCATTTr
TCTTTGCTCA
AGATAGGAGT
ATGTCCAAAT
ATTrGAACTTA
ATGACATTTG
ACCTTGATGA
AAAGACTTGG
CAGAACTTTG
GGTCAAAAGT
GGCCTGGTT
CGGGTTGCCT
ACAGCTTCA-A
T'rCAGGCAGA AGCAGGCTTT A'rCTTTGAGT TATCGGGCCG AATCCACCCT TTATrCTGGC TAGACCCAGC AACCTCTCAG TTGATTATGG AGATTrTGCT ATCTCTTCGA GATGATAATA GGCTAATCAT TATCGCAACA CATAATCCGG CAATTTGGGA GATGGCTGAT GAA-GTGTTCA CGATGGATICA TCTGAAATAA AAATCCTTGT TT1TAATTGC ACGATGAGTr ACTGAAATA'r TATCATGAAT CAAGAATTGG AGTrAATTTA GAATTGTACT TAATrrAGAA TTGTACTTTA TTAATATTGA GGTAACTT TCTTGATAAA GGAAGAAATA ATGGAGAGGA AGTTAGAATG AAAAAATTCG ACAATTATAT AAAATCTTAA TAATTGAAAG AATAGTGTAG GAGAGATTTT TGTTATTTTG AGGAAGATAT TTAGAATTTC AAAAAT'rTAG CTATGGTAAA AAAGTAGCTT TTAACACAAA AAGAAATTAT TGGGAAAAAC TTGCGATTTT GAAACAAGAA CAGGATAAAA TGGAT'rGAAC ACCTTTAN'C ACCCGCATAG CGGGTGTT ACTTAATCAA AAAACGCACC TAAACTAATT GACTATACITT TATTGAGAAG CCTTGCGA'N' CTAATTCAGA TAAACTGCAA TTTGGTAGAT GATA=rTTC AAT~IrTCTCT CAGA6ATCAAT CCTCCTACAA CCGTI-rTAAA AGAAAACTAT CrTTTATTCCA TGTGAAAGTC AAAGATGATG ATAAAGTTGA GTGGAATTTG AGCATTTTT-G GC?1'AGTAAT CTGTGT'rGAA GGC'rCAAAAC TGAAAACGTA TTGCCTCCAA TGAAGTTCTG GAAAGATGT'r U 0 0* 0 S *n *0 *s S 0
CACAGAGAAA
GAACCTTTTG
TATCGCGT
TTTTGTCTCG
ATATCAAAAA
TCTATTCAAA
GGAAGAA.AAA
TGCCATATTT
GGAGTTTTTT
CACCTAACGG
CTAAAAAGT-r TGAGC'rTT'rA AGATTTAGTr AAATAATGAT GTTTCAGTAT TGAGAAAAGG GTATAGAAAT ATAGTCAAT TTCTCCT'rTC GCTTTACAAT TCGTATAACC TTCGACGCAC ACGAGACAA ACTAATAGTC TGATATCATG CGTCATGTCT ACCA.ATTGAT TGAGCCAATC CTTCTGA6ATC TGAACCATGT CAAAATCACC TCGAATAGTG 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940.
6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 CACTCTTAAA ACCA.AAGAGC AAT'rTCTCCC TT'AGCTGACT ACAACATTTT GGATAATCTC ATTTTCTCCA GCAGCTTTTG CCTGGTAAAG CTTCTTCTGG ACGAGTTGCA CCCATCATGG TCCGCCAACT TTCGATTACT TTGGGACCAG AAATCACACC CACAAGAACT GGACCTGAAG TCATCAATTC ACGAATCGGT GGGTAAAAAC TCTGACCAAC CAAGTCCTGA TAGTGCTGGT CAATCAACTC TTCTGAAACC TGTGAACGAA ACTCCAATTT TTCGATTGTA AATCCACG;TT GTTCGATGCG C'TTTAACACT TCACCCACTA GCCCTrCTTNr TACACCATCT GGTTTCATGA TAAAGAATGT TTGTTCCATA CCCGTCTCCT ?TGTCAGCTT CTTTCTTTTA TTTTACCACA TNTCGTGGAA AAATGGAGAA AGTTTTCAGA AGAGAGAATG AGAGAACCCT CGGGTTCTCT C-AT'TCTCTCT TAT'rCTAC'rG TTCTTCCAC AGTTTCAACG GCAG'rA'CCA CALACTACTTC TGTTGTTTCT TCATTTCCTT CT'rCCTCTAC TGGAGGATTA AGGTATCTT CTTCGTTGAC AGCATGTGGT TCAAGGTTAC 810 GGTAACGGGC CATACCA~TA CCAGCTGGGA TGATCTI'ACC GATGATAACA TI-1-CTTTAA GTCCAAGGAG ATGGTCI-rTC TTACCACGGA TAGCTGCGTC AGTAAGGACA CGAG7rG?1"r CCTGGAAGGA ACCCGCTGAC TAAGGACTGG GCGACCTGTC CTGTAAAGTC ATGATATCC AAGAAACTGT TTGTT7TCAAG TGAGCTTTG GTAATTCCCA GCTGGAACTC CACCTGCGAT AAGGACATCT TTGTTGGCAT ATGAGGGTAC CCATG-AGAA-ATCTGTATCA CCTGGATCCA ATTGACGAA CCATTACCTC GATGTGTTTG TCACCGATrT ACT?-TA CTTCACCGAG AAGGTACGTr TCAACTGACA
TGACACGGAC
CTACCCCTTG
AGACATCACG
CACGCGCTAC
CACCTTCGCC
CGATAGCAGT
CTTCAAAGAT
CACCTGTGTG
CGATTGTACC
AACAGTGACG
CTTCCACACC
CAATKTCAC
TI'ACGGATC
GCTACGGTAA
AACTGCAAGG AGACGTTTTG GTTGGA'rAGA ACCTTCTGTC TTCGCCCCCA ACTTCGACAC AGTTTCACCC TTAACAAAGA AACTTGTCCT TTAACCTCTG TrCTTGGACA CCAGGAAGAC GAAGGTACGC ATTGTAAGCT GCATACGAGC TGTAALATGGA CTTTCTTGGT ACGAGTTGAT TAATA.ACCGC TTCCCCNTTA CCTGAGTGAT ATCGGI'ATTT GTGTACCAGG TTCCCCGATA
AACTGCTTCA
GCAGACACCG
AGCATTGACA
TGCACCAGTT
CCAACTTCAA CCCCATCACC AGTCGCCAAG TGACGAGTGT TACATGTAAA TACAGAACGG ATTTCACGCG CCTTGTCTTC TGTAATCAAT TCTGGATGTT TAACAGTTTT CTTAGTGTAA
AGAGCAGCAC
ACGACATATT
GCATCT'rTT
GGATTGCGGG
GAC-GCAACCC
GATGGGCAG
TTGATACCGT
ATAGTCACTT
TCATTTGGAC
CGACCGTTGA
ATCA-AGAGAC
TCGACCAAAC
CCT'rTACGAG *7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 GACGCTCTTC GAGAGACTCG ATCATCTC'rT TT'CCTTCTCC GATAGAACGG CACGGTCAGT TCCACAGTCG TCCTCACGGA TGATAACGTC TTCvGGCAACG GACGACTCAA GTAACCTGAG TCGGCTGTCT TAAGGGCCGT ATCGGTCATA CACCGTGAGT TGAGAAGAAC AT'rTCCAATA CCGACAAACC TTGGCAATTC CATGATACGT CCAT1'CGGAG CAGCCATCAG GTGAGAAGTTr TGAGATGTTA CCACGGGCTC CAGAGTCCAT TAGGATCTTG GTTAGCAATC AAGCGTTTCT CAAGTTTTTC CTGTAACAGC ATTGTAACGC TCGTCGTCTG TGATCATACC TTTGTTCGAC ACGTTGTGT GATTCTTCAA TGAT'rTCAGC TATCGGCAAT ACCCACTGTC AATICCTGCAA GAGTTGAGTG TGCGGTCAAG TAGGGCAGAA GTTTCTGTCG TACGGAAACG TA'rrTCCAAG GTT'r'rCTTC TTGAATGGAG GGTTGAGCTC TTCGCGGAAG TTTGAAA.GGA ACCACGCATA CCGGCAAGCT CATCATAACG ATTGGCJ'CT ACGGGCAGCA CGCCATTCAG ACGACGGAAT TGTTTGGTGA CTTGTCATCA ACGACTGGGA GTCGTAACCG AGGTTCTTCA 'r-r'GAAGATT TCAGCGATGA AAGATTGCTG ATAGCTTCCT TGATATCTCC ACCAAGTGGC AAGAAGTATT TAGCTGGAAC ACC?1'CTGTC AAGTTGGCAT TGTTTGGTTC TTGCAAGTAT GGTAGCCCCT CTGGCATGAT ATCGTTGAAG AGAATTTTAC CAACTGTTGT AAGCAAGACC TTATGTCWTT GCTCTTCTOT CCAAGGCTTG TTGAGGCTGT CTGI-rGCGA'r ACCAACACGT GAGTGGAGGT GAACATAACC ATTGCGGTAA GCCATAACCG CTTCGTCACG GTCTrGAAG ACCATTCCP CACCI'CGCG ACCAGCI'TCT TCCATGGTCA AGTAGTAGTr ACCCAAAACC ATGTCCTGAG ATGGAGTAAC TAAZCGGTTTC CCATC ~CG GGTTCAAGAT GTGCTCAGCA GCTAGCATGA GGATACGAGC I-rCTGCrrGT GCTrCrTC AAAGTGGTAC GTGGATGGCC ATTTGGTCCC CGTCAAAGTC AGCATTGTAG GCTTrCACAGA CAAGTGGGTG CA.AGCGAAGA GCCTTACCAT CAATCAACAC TGGCrCCAAG GCTTGGA'rAC CCAAACGGTG AAGGGTCGGT GCGCGG'rTCA AAAGCACTGG GTGTTCTTTA ATCACTTCTT CAAGGATATC CCAGATACCC TCATCTCCCC GTTCCACCAA CCGTrTAGCT GCTTTCACGT TTTGCACGAT ATCACGGGCA ACGATTTCAC GCATGACAAA TGGTTTAAAG AGTTCAATCG *CCATTTCACG CGGCACACCA CATTGGTACA TCTTAAGAGT TGGACCAACG GCGATAACTG .:AACGTCCTGA GAAGTCAACA CGTTTACCCA GCAAGTTTTG ACGCAAGCGT CCT-rT1-rAC *CTTTAAGCAT GTG GCTCAAT GATTTCAATG GACGGCTACC TGGTCCTGTG ATTGGACGAC *CACGACGACC A'rTGTCAATC AAAGCGTCAA CrGCTTCT'rG AAGCATACGC TTCTCATTTT 
*GAACGATGAT-ACCTGGTGCA ?r'rAACTCAA GCAAACGAGC CAA.ACGGTTG 'rTACGGTTGA TA6ACACGGCG GTAAAGGTCA TTCAAGTCAG ATGAGCCAAA ACGGCCACCA TCCAACTGCA 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 ACATTGGACG AAGATCTGGT 'rGTTTCCAGA CTI'GTAAAAG 'TTTGTCCAGT AGCTGTTTTC CTTGCTTCAA AAGGTCTTGG CATATTCACG CAAGCGCTCT GTGTATCCTT AGGATCAATC GAGGGCTCAT ATCAAGGGTC GA GATACAGG AGCTTTCAAT GGGATAACCG GAAGGATGTT GCATCCAAAA CATCCAAACG ATCTTCTT TCAGTTCAGC ATGGCTTCCG CACCCATCTT CGGTATTCGC GCTCTGTCAT ACCACATAAG CCCCAA.AGTA AAGCCCATAC GGCTTGGAAT AAGAATCATC CA'rrCAGGTT ACGGATCGCT TTGACACGCT AATTTCTTTT TCAAGATCTA GGCAJLCAAAT GAACCATAAC GATACACTTG TGCTCAACTG
GATAACTTCC
CCCCTGAAC
ACGACGAACT
TCGAGGGCAC
TACCAGATCT
TTCGTACGCG
TCGATATGTC CCATACGCTC 'rTACTTCAAC CCCACAGCGG TCACAAACAA TTCCTCTGTA ACGAATGCGT TTGTACTTAC CACAAGCACA TTCCCAGTCT TTTGTAGGAC CAAAGATCAC TTCATCAAAG AGTCCTTCAC GTTCTCGTTT CAAGGTACGA 'rAA'rTGATTG TTTCAGGTTT TTTCAC'rrCT CCATAAGACC ATGAACGGAC TTTACTTGGA GAAGCTAGGG TGATTTGCAT ACTTTTAAAA CGATTTACAT 812 CAACCACTAT TTCTTCCCTT TCTATTCTAA GTGAACTrGCT TATTCTTGTT TTCTGTTGCT TCCGCCIG TTGCTTTCTC AGC?1'CTTCA GCTTCA~A~A CAXGCAGC1'TC
CTGCTTTAGC
10440 10500 CTCTTGGGCT GCTTTCGC CATTCCTTCA TCCAAGTCGC GTCAAGACCA AGAGM'rGCA TTTTGGAATT GGTTTGCCTr GTCCGACT'rG TAAGTCAAGA CCAAACCTCC A'rCTCACCGA TTGGGTAACA GTTGAGTATG GTGGACTTTG ATCATGTACA ACGTCCATCG TAAAGGATCG CCAAAGATCT TCAGAACT'rG GGGCTTTTC AAGGTCATCT AcGTGGATGA cCT~cTCT GAAGTTCCAC TTCTrGGTCA TCTTCGTCTA GGACACGCAT ATTC?1'TGAC A GAACTCGG AAGGA'TCTG GAACACCCGG TTGTAATAGC TTCATAGGCT rrCAAACTCc CGTTGATATC TT'rCTGAAG
AACGTTGTCC
GTCCCACTGA
TGACTCCGAC
TTTTGGCATrC
CTCCATCAAA
GACATTTGAC GCACCGTAGG CTTCAAGAGC ACCAAACTGA GCCTTACCTC CGAGTGGTTG ACGCGCGTGC AATTTATCAT CAACCATGTG AGAAACACGG TTATCA.AACG GTTCACCAGT GCTATCCATA CCTGCTTCTT TAACAG~rGA GACTGGTGTA GCCATGTGAA TACCAAGAGT 6.00 0 ACGAGCTGCC ATACCAAGGT GAAGCTCCAT AACC'TCACCG CCCAAGTGGG TTCAACATGA TGTCGACTGG AGTTCCGTCT
ATATTCATAC
GGAAGGTAAG
TACAGGAACG
GACCTTAATC
'CAACTCA'rCT
GTGTGGTACA
CAAGAGACGT
AATATCACCT
ATACGAGAGA CAACCCCTrrT GTTCCGTGA CGTCCGGCCA TTACGT'rT GAGCGATGTA AACACGAACC AACA'rGTTAA CCATTrrACAC GTGTAAAGAT CTTAACATCA CGAACGACAC CGAAGAGAAG TATCACGCAC TTCACGAGAC T1'GTCTCCAA TCTTCAGCTG AAAGATCTTT CTCACCCTTA GGTGL ACTT TCTTTAACCT CAGCACCAAT ACGGATAATC CCCATTTCGT
GTGATGGTAC
GCATGTCTTC
TTTTATCTCC
CACCTGATTG
CAkTcGcATc
AGATAGCGTG
TACCTACAAG
CAAGGTCTTT
CAAGCTTTGT
CGTCCTTCAC
TCATGTAGGC
-10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 GAGGGCATCT 'rCACCAACGT TTGGAATTTC GCGAGTGATT ATCGCGCGT'r TCTGATTCGT A1-rCT'rCAAG GTGAACAGAT CAAGCGTTCG CTCATGATAA CGGCATCCTC GAAGTTGTAA
TCTTCAGGCC
GTGTAGACAT
CCTT-CCCAAG
AACGATTGGG TTTTGTCCAA GCCCCATTTC GAAATCGCCT TTTTCAACGA CATCACCAAC ACCTGAGT'rT GAACGACGGA ATTTTTGGAT ACGAACTTCTr ACCTTGTCAG CATCTGCGTA AGCCGCACCA GAATCGTGGG CTGCTTGGTA AGGATTAATC AATGGCACAG CCTGACGT'rG GTCATCGTTT TCCAAGAAAG CAATACATGC TCCATTTTCC ATAGAACGTC CGTCAGCGAT TTTTACGAGA CTGCGTTGCT TGTAAGCAGT GTGGTAAACA TCCAATGAAC CATCTTCACG AGTAACTTTA CCATCATACT GAGCAATCAC TTCCATACCA GTACCAACGT AAGGTGCCTG CATATTGGCT CCCATGAGGG CACGGTTGGA TGTCGCAACG GCAACTACC'r G'rTTGCrGA 813 AACGTCCATG TAGTCAACAA TATTAGCrGG ATACTCTTGG TTGACCCTT GGTGACGTCC 122 CATGACAATC T'rCTCAGCAA AGGTTCCATC TTCMTTCAGA CGAGAcTTAG CCTGAGCTAC 12300 AGTATATTCA TCTTCTTCA'r CAGCTGTCAA CCAAACAATT TCGTTCGTGA CAACACCTGT 12360 TTrCPCGGTCA ACCTTACGGT ATGGTGrTG AACAAAACCA TATTTGTTCA AGTGTCCATA 12420 AGATGACAAG TTArrGATCA AACCGATGTT AGGTCCTTCA GGTGTCTCGA rrGGACACAT 12480 ACCACCATAG TGAGTG.TAGT GCACGTCACG TACTTCATAT CCAGCACGGT C-ACG.AGTCA.A 12540 ACCACCAGGT CCTAAGGCTG ACAAACGGCG TrrGTCAGAC AACrCAGAAA GCGGGTTGTG 12600 TTGGTCCATG AACTGTGACA ACTGTGATGA ACCAAAGAAT TCTTTAACTG CZAGCTGTTAC 12660 AGGACGGATA TTGATAATTT GTTGTGGTGT CAAGACTTCA TTGTCCTGAA CAGACA'rACC 12720 rTCACGGACA T'rACGTTCCA TACGAGAAAG TCCCAA-ACGT ACTTGGTGG CAAGCAATTC 12780 ACCAACCGCA CGGATACGAC GATTTCCAAG GTGGTCGATA TCATCTACAC GGCCAAGTCC 12840 TTCAGCCAAG TTGAGGAAGT AGCTCATCTC AGCAAGGATA TCTGCAGGAG TCACCGTACG, 12900 **..AACCTTGTCA TCTGGGTTAG CATTACCA.AT GATCGTTACG ACGCGATCTG GATCAGTTGG 12960 *AGCAATAACC TTGAATTT'rT GAAGAACAAC AGGCTCAGTC ACAACGCTG CATCGTTGG 13020 *GATGT AGACA ATCTGTTCA AGTCGCCATC CAAATGGCTT TCAA'rGC-rr CAATCACGCT 13080 *ACGAGTCATA' ATCGTACCAG CTTCTACCAA GATTTCTCCA GTTTCAGCGT CTACCAATGG 13140 CTCTGCAATG GTrrGGTTGA GCAAACGTGT TTTAACATTG ACTTTTTTAT TGATTTTG'rA 13200 *ACGACCAACT GCTGCCAAGT CATAACGACG TGGGTCAAAG AAGCGAGCTA CAAGCAAGCT 13260 *ACGTGAGCTT TCAGCCGTCT TAGGCTCACC TGGACGAAGG CGTTCGTAAA rirTCTTTCA.A 13320 GGCTTCGTCT GTACGAGAGT CCA'rTGGATT CTTGTGGATA TCTT'I-TCAA CAGTGTTCG 13380 *.*AACCAATTCG CTGTCACCAA AGATATCAAA GATTTCATCA TCACCTGAGA 
AACCAAGAGC 1344a~ *ACGAACCAAG GTTGTAAATG GAATCTTACG AGTACGGTCG ATACGAGTGT AGGTGATATC 13500 TTTTGAGTCG CTTTCAAGTT CCAACCAAGC TCCACGGTTA GGGATA.ACAG TTGAACCATA 13560 GCCCACCTTA CCATTTTTGT CTACTTTGTC GTTAAAGTAA'ACACCTGGTG AGCGGACCAA 13620 -CrGAGAAACG ATAATACGTT CACCACCATT GATGATGAAA GTACCCATTT CTGTCATGAT 13680 TGGGAAA'rCA CCAAAGAAAA CTTCTTGGGT CTTGATTTCG CTGTTTCTT TATTGA'rCAA 13740 ACGGAAGGTT ACAAAAATTG GTGCTGAGTA GCTAGCATCG TGGATACGAG CTTCTTCTAG 13800 CG'rATATTTT GGTTCCTTGA TTTCATATCC AACAAATTCC AACTCCATTG TGTCTGTGAA 13860 GTTTGAAATTr GGCAATACAT CTTCAA.ACAC TTCCTTAAGA CCGTGGTCTA GGAAAGCTTT 13920 814 GAATGAGTCA GTI-rGAATTT- CAATCAAATr TGGTAAGTCA AGAACTTCTTI TGATrCTTGA 13980 AAAACTACGA CGGGTACGAT GTITCCCGTA TrGAACGTCA TGTCCTGCCA AGATGATTCT 14040 -CCI'TGTAAA TAAGTrCCAA GCCTTGTCAA TCAGGCTTTT CTAATCGTCA TATGGTTGTA 14100 AACCCCTTAT CACCGTGTCC TCI'GACGAA ri TCAGAAT CrTAAGCCT CTGPACAAA 14160 TGCTCAAAAT CTTGAAAAAA AGCACAAAAA GAGCAGCTAA ATCTGACTr'r TTCAGAAGAT 14220 TTAACTGCTG TGAGCCTrGT CTGGACAATA. 
TTrCAGACAA AACCrACGAC AAATGATTAC 14280 CCATATrATA CCCTATTTAG CrAGATrTTTT CAAGGGGI' CAGTAGGTTT ?rGGTAAATT 14340 TTTTCCCATA GAAAACTTGG CATCACAT'rC GAATCACGCT ATGGTACAAA AAACTGAAAA 14400 AACTATTGAC TGAAAATCAT T'rTCAAGGTA TAATAATAAA CGTTAAGGCG GTATAGCCAA 14460 GTGGTAAGGC ACGGCTCTGC AAAAGCTTGA TCGTCGGTTC AAATCCGTCT ACCGCCTTCT 14520 ATAACTTGAT TTATCAGGTT TCAAATGAAC AGAAAGCCCA ATTTGAAGGG CTTTTTTTAT 14580 TTrCCCTCGA ATAAATACGT ATAACTTTAA AAACTTTTGG AGCGAGTTTG TGGrAGAGTT 14640 CTTTCCATGG CATAATTCCC TTTTGAAATC AG 14672
INFORMATION FOR SEQ ID NO: 112:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7902 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:
AGGAGACTAT TCAAGCCCAA ATTgAGTAGC CCAGCAAAGA CTGTATAGAC TGTGATACGT 60 TTT'rCATAGC CATTGGTAAA GAGAATTTGG GAACCAAGAA TGGTATCTAA GGCCAGGATA 120 ATCGTACGAA AAGCGAAGAG AGAGGTCAAG ATGCCGCCTC CGATATATTT TTCACTACCG 180 TAAAGTAGGA TGGCATT'rGG TCCTAAAACC ATGAGTCCAA AACTCAGTGG AATGATAAAG 240 AAGTTAAAGA TTCGACTACC TCTATTAACC AGAGAAACAT AGGCTCTTT- GTCTCCTTTC 300 CCCAGATAGT AACTGAGACG AGGCACACTC ACTCCAATTrG CACCTGTTAC AACCCCAGCT 360 ATAACGGTCA CAATTCGCTG AGCTATGGTA TAGTAACTAA CGTTGACATC AATCCCrGT-r 420 TTAACGAGGA AGAGGCGATC TAAAAAAGTG AAGAGCATAT TGGCATTGGC AAAGACTAAC 480 ATGGCTGTCA GAGGGAGAAA GAGTGGTTTA AAATCACTTA GGTGAATTTT AACAAGTTTG 540 ATGTCTCTrT TAATCCAAAA ATAACTAATC AGGTAGTTAA TCAGCGTCGA TAAACTCATC 600 ACAAGTGTAT AGACAACAAT ATCGTGTTCA TTTTTAACAA ATAAGAAAAT AGAGACCAGC 660
0~.0 A'rCAGGATAC GGATGAAGGC TTGACCCATT CGATTGAAAA T 'N'TGACGA TTC-GArTATC GTCAAAATCG TACAAGCGAT TT TTGTTAT CCTTGACAr GCAA.AGGGCA AGAAAAATGA CGGTCCAAGA CACGCGCGAC CGAATTCCCA TGTAAGATAG TTCATTATAT CATAAAGTT1A TGAAGTrTrCA AATCTTAAGT TCTAAAATrT CGCAGCCATT TGGACTAGTT TCTTTGAATA AGTATTTC'rT TrATGATAGG AAGACAAAGG AGAAAATAGA CATGGGGATC AGTAACTTGC TCATACAAT'r GAAAAATTTG TGGAGACTGG GTTCTCATG ACGTA'rCATC ATTACAAAGG AGCCATTGAT GCTTATCGTC TCGTCCATTT1 ATrACACrTC CGCAGTGGAC ACAGTGGTAG TATTACAGAT AT'rCCAAATC TGCAAGGAC TTCATGGACC AGATGCATGT AAAATCTTTG CTCAAATCTG AAGATTACAA AGACTAGTAA AATGATI'AAT AATATCAGGA AGAGGCrATT CTGTCrGTCA TGCGGATCAG AAAAGCTTCC AATGGCAATG AGTrGTCTAA
AATCTGGGCA
AGTAAAGAAG
GCACAAATAA
ACrGATAGCC
CAAAATAGTG
ATAGTTCCA
AGCATTTAAT
GCTAATA.AGA
TTTAAGtTT
TATTAGTAAT
AAGAGAAAAC TGTATTrTTC ATGAGTrAA TCCCCATAAC AGAGGATAGG CTAGGATATA AAAAGACTAG AAAAGGTTCT CTrAAACcGT AGTTATAGAC TCGACTGAGT TGAAGTAACC
GTTAGGATGG
TTrATACTTT
AATGAAGGGC
GAAAAATA.AT ATTCAAGACA TCATTCAATT TACCTCGTTT AGTAAGTCAA GTAATCACTT CTTTA.AGGAA AG-ATATTAT TCTGAAGGAC TGCTACAGAA TTCCTAGTCA TTACTAGAAA
CAGAGCTTCA
AAGGTAGACC
GACAGCAGTG
GTTAAGATCT
ACCATXAGPT
ATAGTCAGTT
ATAGPAcTGC ATAATTCTCC ACTAGATrTGT GGTATAATAG TGATTTATGC AGGAATTCTT CAAAACAATT TTTAGAGCTA TCTTGGAGCC AAGTATTGAA CAGAAGATCI' TGTAGATAAA GTGGTGCTGA CCGCAATACA CGCTTACTCC AGAGGATATC GCATGATTCA GGACAATATC AAGCGGTTGA TACTATCGTT rrc.CTCACCT TTATCPLAGGA TATTCTAGAA GGGGAGGACC AGAGAATAAG TTTTrTrTAGT GCCGG;TGGAA CTGGCACACG GGGATCGAC CTATTTTGAT AA\AAT-rGTAG TTGGTGTTCA TATCTTCCTC TTTATALAGGA AGTATTAAGA ACATCATTGA GTTGCTTACCC ACGATTCTGT CAACTTGCCC AAAATCATGA GAAAGTACCA ATGGTCAATT CAAACACCTC AAACAT'rCCG GAAGAGAAGG AAATCTTGAC GCTTTGGCCA AAGGTGALATA GCAAAAAGTA TGAT'rGAGAA GCCTAAGTTT ATCAATGTCA TATCCGTCCC AACTACATGG TGATCCCAAG ATr'I-GAATA CGTC-ATTTCT GACCCGACCC 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400
TTTATGGATC
TGATCAAAGG
TCTTTCTGAT
AAAAGATGTG
0009 0 0000 06 *0 00 0 0 CCG;TAACAGA TTTGAAGATT CAAATTATC AACTAACTAA GACCAA'GAGA ATCATATCCT CGTTACTATC AGGGAAAACG ATTCACGAGT CATGTGGAAC
GAACCTACGA
ATGAAGAATTr
GC-NTATGAG
AAGA'rACGGT
TATTGACTCT
CNWTTGTGGT
GTCGTCATTG
ATATTCCTGA
GACCAGCTAT
GAC'rrAGCGA
TGGTTGGC
TCAAGAAA'TT
TTAAACATAT
GGTTGGTCAA AAAGrTGTCA CTATGAAAAC TACATGACAG AGAGrrTGTr TCTCTCCCA TGCAGCCATT ACAL2AGTTTG TGCTCATAGC AAGCGGGAGC TGCCAATAT ATCAACTATA GGAAAAGTrG CAACTCTTCT AGATTTCCCC 'TTTGACCATG
TGATTCCCAA
GGACCCATr
AAGATCGTGT
TCAGGGG
GG.ATCGCCGT
CTTTGCCAGA
CATTTrcCCAA
CTTTTGAATG
TCAGTCTCCT
CTTGICTAGT
GGTGGCTTAT
CATGCACGCT
TA?1'GGAGAT
AGCAGAGATT
AGAATGCTAT
TTrGGTGGT
ATGCAGAGTG
GGATrI'GATG
GATGCTATG
ATGAATCGTC
GGAAGTTTAG
GTGGTTATTG
ATTACGGATA
GATGGTACTG
TAATGACTTG ATTCGCTACA TTrCGTCCTCA ATATAAAGTC AATCTCAATA CTCGCGATGC ATCTCG rTCT GG'rCGCATTG ATTTTGAAAA TGCCAATCGT CTTAAAAATA TCCTTTATCT TCATCGTGTC TTTGCAACCG A'rTTAAACAC GGGAACGATT CTCATGATG CTTAGAAAAG GGCT'rGATTT TGCTATCCAA ATGATGGAAG AGAAGAACCT GTAAGAGAAA AGCCTTTAAA ACAGTGTTTA AGTGGGAAGT ATAAGTACTG GAGGTrAATT GTGGAGAAAA TCATTAAAGA AAAAATTTCT TCCTTACTTA GTCAAGAAGA GGAAGTCCTC AGTGTTGAAC AACTGGGTGG AATGACCAAT CAAAACTATT TGGCCAAAAC AACAAATAAG CAATACATTG TTAAATTCTT 'rGGTAAAGGG ACAGAA-AACCTTATCAATCG ACAAGATGAA AAGTACAATC TGAACTACT AAAGGATTTA 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 GGCTTAGATG TAAAAAAT'rA TCTTTTTGAT ATT'GAAGCTG ATCGAATCTG CGATTACGCT TGATTCAACG 'rCAATCAACA CCAATATTAC AAACTATTCA TACGTCTGCT AAGGAATTAA GAAGAAATCA AAAAATACGA ATCCTTGATT GAAGAACAAA TCTGTTAGAA ATGCAGTCTT CTCCTTAGAG AAAAGACTGG AAATCTTGTC ATATCCATTT GGTGCC'rGAA AACTTTATCG TATTTGATTG ACTGGGAATA TTCATCAATG AATGATCCAA TTTTTAGAGT CTGAATTCAC TTCCCAAGAG GAAGAAACTT -dACCAAACAC CGGTTTCTCA TGAAAAGATT GCTATTTATA GTATCAAAGT AAATGAGTAT CCAAGTTCCA CAAAAT'rACT GAGGAGAAT'r TGCTCCTTTT TTCCTTATGC CAACTATGAA CTGACTTAGG TGTTGACAGA AATCACCTCA AGGACGACTT TGTCGGATTT GGCTGCCCTC TCTTATCTCA CTATGAGAGT AAAT'TTrACA AGATACrATT TGGAGTCTAT GGACTGTCTA TAAGGAAGAG CAAGGTGAAG AT'TTTCGTGA CTATGGTGTG AATCGTTACC AAAGAGCTAT TAAAGGTTTG GCTTCTTATG GAGGI'CAGA TGAAAAGTAA AAACGGAGTT CCTTTTGGCC TTCTCTCAGG TATTTTCTGG GGC'rrGGGTC TAACGGTTAG TGCTTATATC TTTTCdATTT TTACAGATrT G;TCACCCTTT GTGGTGGCTG CAACTCA'rGA .817 T-rrTTGAGC ATCTTTATCT TACTAGCWI' TCTCTTGGTA CI'CAATrC TTAAATATTC GCAAT'GTCAG TGTTATCATC TATCGGTATG CAGGCCAATC TrrATGCAGr TAA~rATATC TGTATCGGCT M'?flACCCTG CGATTTCAGT GAT-rTCGAAA AATACTGTA'r 1-GGGATTGT CTATAAG? GAACACC'rTA AT'rCTTTCTA TATTGCATGG GGAAGTGAGA GTGTTCTTAG AATCGAAGCC CTCTTAATCC GTCAAGTAAC CTTCTCTCAT CAGTCATTTA CTGCAGTAGC TT'rrGCAGCC Tr'TGATATGA T'TTCCTACTT
TCTATTGGCT
CTTGATTA'TT
CATTGGGATT
CrC'T'TGCC TTCGT'rCTTG
CAATGGACAA
GGCTTATTAT
CTATGTAGTA
GACCATTATG
AAAGAAGGGA AAGTTCGCcT GGACCTTGC TAGCAGGCCC GGAAG3?rCI- TAGCTTCATC T~rcrc'r TG.AAGCAcAA GGAGGGATTA TrGCTCAr.AC CTrrGTGCrT TGGTTTGTGC ATGGAAAGTG AA'TTCAGTGA TCCTATCTTG TGATTGTGCT TTGCTAGGTC TCATGATTGT ATCGCTATCA ATCGCTTGCA TGGACGGTCT 'rGTTGCAGT ACGTCACTTG 'rCGTCATTGC rGAAAGCCAT TATCTTAGCA :CCCTAAAGC CTTGGTTrCAG ACCAGCCAA C CTACAGGCT TGTTTTCTTG GGTGCACCGC TGGAGTTTAT ATTATTATTA GCGGGATTGG GAACTCGCTT GTTAATCAAA AACCTTTGAT GACATCATCA TCATTGTMrG GGTGTTCGTC TCGTTTTCAA CTTGTA-AAAG AAGAATTGGC AATATGTTCC GCAATGATTT ACCAACGAAT GGTTCTTGGT AGCAAGGCAG GTCGCATCCT
TGAACGTGAG
TAGATATGCT
AAGAATAAAG GAGATTCGTG GCGTCCTATG ACTGAAAATA TGAGTACCAA ATTGAGTTTC TTATCTTAAA GAACAATTCG TGATAAATAC GCTGACTACA CAACAGCTAT GTTATTGATC GACACGTTCG ACTTATTTTA TTATGGAGAT CACTACAAkGG TAGTGGTGTA TCCTTCTCGGG TCAAAGAAAA AGGAATCAAT
ATTACTTGAA.AGAGAAATAC.
ATAACTTTTA CTCTCTCTAT CTGACAATTA TCTCTTTAAA GTGTTTATCG TCAAGATTGT TTCAAGACAT TATTGTT-GAT ATCCTCCAAC TGCAGAAAAG 4260 4320 4380 4440 4500 4560 4620 4680- 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 ATTCTCACCT TTATCGACA-A CGCTTATGTA AGTGGTGAAT TTGTTGATCT AATATGGTTA AGGATAATAT CAAAGAGCTA GATGTCTATC TTGAAGAATT AGCATTTATG AGATCGATAG TGTCCAAGAC TATCGTAAAT GAAAATTAAA GATTCCAACA TCTGACAAAA TAGTCGGATG TTTTACGAAT AGATAGATGA GTAGAAAAAG AAATGGAGTT
TAGAAGAAAT
TTTI-rTGATT
ATTTATGAAA
CTATTGGGAC
AGAACGCAAT
TCTTAAA.AAC
TTTTACGAAC
ATCACAAACT
AAAGTGCTAG
ATCTCAGGTT
CATTTGTCGA
ATGAAATCTA TAACTTAAAA AAATCAGGTT TGACCAATCA ACAGATTTTG AATACGCTGA AAATGTTGAT CAGCAC'T TGTTCGGTGA TATTGCAGAT CCCTAATCC AGCCGTTTTT ATGGAACCTT ATTTTCAGAT AGACGATGCG
TGGGATTTGA
CTCCTGAAAT
AAGTCAGTTG
GCCAAGGGCA
GCAGTGATTG
TACATCCGCA
TTTCATTC
GAGGCTAAGA
CATC.TCTTTG
CTATTCCTGG TAGCATI-rA GATGGACTAT CAGACGGT'rG CCATCAT 'rG ATTCAAGAAG GAGCAAAAT'r GCTCACCAGT GGGCAAGATG TTCTTGCGGA ATTTGAATTT TAAAAATGAC CTAAGCTAGA ATTCTAAGAA AAAATCAA'rT TTAAGACAAA ATGAACCCAA CATTTCCATA ATAAAACGCA TATTAGCAAG TTTTTAACAC 'rTGATAATAT CCCI-rTTTTC TAAGTGGA'rr AGTAGAGTAG AGGATT'TTTC TCATATAATA CTCTTCGAAA ATCTCTT'CAA ACTACG'rCAG 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 '6960 7020 7080 7140 7200 7260 7320 7380 7440 CTTCrZATCTG CAACCTCAAA ACAGTATTTT GAGCgaCTtC CAAAGCAGTG CTrTGAGCAA CCTGTGGCTA GC'TTCCTAGT
AGTATAAGGG
AATTAATTTA
TAATTTAGTG
AATGATTCCT
AAAGTATAGT
TAGAAATATT
'rAATTGAGAA
ACCGTCTCAA
GAATTGAAAT
TTAGCACCCA
AGGAGAAATG
ATCTTGTCAG
AAGATGTGAA
AGGTGTACTG
ATTGTGATTG
TAACGAAAAA
AGACCTTGAC
TACT'NTTCGA
AGCTACGGTT
TCAGTAGACA
ATATCAGACT
GTCAGTCTTA TCTACAACCT TTGCGCTTTrG ATTTTCATTG CAACTCTATC AGGAAAGTCA TTA'rACATTC AATTACACTA ATG?.TGGCTA GGTTATGTTC TAAATTrCTTC AAAAGTAGAG AAATAAAAAT AAAATGGTTA AAATCTCTITC AAATACGTCA AGCTTCCGAG TTTGATT7":'C TTTGAGTCAG GATATTATGG ACTAAAGTAT TGAGTrrGT'r ATTACAAGGC 'rTCTTTAAGA AAGAATTCAA TTATAAAAAA TGGTCTGAAA TAGATGATGA GCTCAGC'TTT GCCTTGCTGT GTTTTGAGCA ATTTACTAGA AATGAAACTG ATGAGAGATA AAAA'rGATAA AAAGAGCTCG TGAGATTGGC AGGATTTTAG CGACTAG~rA GC'TGGGAAAG GAAGATATTi GTGACAAATA TTCGTTGATA GAATAGAA ATAAAATATA TGAAGAATTA GAAC'rTrCCA ATAAACTGTA 7500 GAAGTGATTT 7560 AGCGATTT'rA CTATGTGCCA TGCTTATCGC CTCTATCGGA TTAAATA'rGG ATTCGACTCC CGTGATTATT GGAGCCATGT TAATCTCTCC TTTGATGACA CCTATTCTGG GAGTGGGGCT CTCTCTAGCT ATATTTGAT'r TTAAATTGTT AAGAAAATCT TTTAAAATAT TAGCTATTCA 7620 7680 7740 AATTCTTGCC AGTCTAA'PAG CTTCAACACT TTATTTTTIAT CTTCTCCCA TTrCGTATGC 7800 TAGTTCGGAG ATTGI'TGCTA GAACCTCTCC GACTATTTGG GATGTTCTCA TTGCTTN'GT 7860 AGGAGGGATA GCAGATCA TTGGTGCTAG GAAAAAAGAG AC 7902
INFORMATION FOR SEQ ID NO: 113:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18627 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:
GAAGTTGAAA TGGCCAGCTG ATGAGCAATA TCGGTCATAG AAATCTTCTC AATCAACTTr' 60 TGCGCAATTT TTrGGTTGAT AATACGAGGA ATTTGGTGAT TTr'rCTTGAC GATAGAAGTT 120 TCAGCGACCA TCATTTTTGA ACAGTGATAG CACTTGAAAC GACGCTTTCT AAGTAGAATT 180 CTAGTAGGCA TACCAGTTGT CTCAAGGTAA GGAATCTTAG ACGGTrTTTG AAAGTCATAT 240 TTCTTCAATT GGTTTCCGCA CTCAGGGCAA GATGGGGCGT CGTAGTCCAG TTTGGCGATG 300 ATTTCCTTGT GTGTATC'ITT ATTGATGATG TCTAAAATCT GGATATTAGG GTCTTTAATG 360 TCTAGTAATr TTGTGATAAA ATGTAATTGT TCCATATGAA TCTTTCTAAT GAGTTGTTTG 420 GTCGCTrTTC ATTATAGGTC ATATGGGACT 'TrTTTCTAC AATAAAATAG GCTCCATAAT 480 ATCTATAAGG GATTTACCCA CTACAAATAT TATAGACCCA AAAATCCTTT GTTTACTAAA 540 CAAGGGATIT TTCTTTTGTC TCTGCTCCTT TTTTGATATA ATAGTTCrAT GTTAAAATCA 600 GAAAAACAAT CACGTTATCA AATGTTAAAT GAAGAATTGT CCTTCCTATT GGAAGGCGAA 660 ACCAATGTTr TGGCTAATCT TTCCAACGCC AGTGCTCTCA TAAAATCACG
TTTTCCTAAT 720
**ACCGTATTrG CAGGCTTA TTrGTTCGAT GGAAAGGAAT TGGTTTrAGG CCCCTTCCAA 780 GGAGGTGTTT CCTGCATCCG TATTOCACTA GGCAAGGGTG TTTGTGGTGA GGCAGCTCAC 840 *TTrCAGGAAA CTGTTATrGr TGGAGATGTG ACGACCTATC TCAACTATAT TTCTTGTGAT 900 a*AGTCTAOCTA AAAGTGAAAT TGTGGTGCCO ATGATGAAGA ATCGTCAOTT ACTTGGAGTT 960 CTGGATCTGG ATTCTTCAGA OATTGAGGAT TACGATGCTA TGGATCGAGA TTATTTGGAA 1020 CAATTTGTCG CTATTTTGCT TGAAAAGACA GCATGGGACT T'rACGATGTT TGAGGAAAAA 1080 TCTTAATCTA TCAAGCACTr TATCGAAAAT ATAGAAGTCA AAACTTCTCC CAGTTAGTTG 1140 GTCAAGAAGT TGTGGCTAAG ACTCTTAAAC AAGCGGTGGA GCAAGAGAAA ATAAGTCACO 1200 820 CTTATCTI-1- TTCTG(7rCCT CGTGGAACGG GAAAAACCAG TGTGCTAAA ATCTTTGCCA AGGCTATGAA CTGTCCCAAT CAAGTGGGTG GCGAACCTTG CAATAACTGC TATATTI'GTC AAGCAGTGAC GGACGGTAGT rrAGAAGATG 'rCATGAAAT GGATGCAGCT TCTAATAATG GGGTAGATGA AATTCGCGAA ATTCGTGATA AATCTACCTA TGCGCCTAGC CTTGCTCGTT ATAAGGTTTA TATCATAGAT GAGGTTCACA TGCTGTCTAC AGGGCCTTTT AATGCCCTCC TAAAG.ACGCT GGAAGAACCA ACACAGAATG ACAAGATTCC 'rGCTACTATT CTATCCCGTG CACAGGATA'r TAAGGAACAT CAGAGGCTGT GGAAATCATT TTTTGGATCA AGCCCTGAGT AAATTACTGG CACCATTAC AGGATGTTCC CAAAGCTTTG CTCGTTTTGT GACCGA'rCTT GAGCAAATAC TCATCATAGT
ATTCACTATA
GCCAGACCGGG
TTGACACAGG
CTATCAGCCT
TCTTGCTTGA
TACTCTTTAT ?1'TGGCCACT ACTGAATTGC TCCAACGTr TGAGITAAA TCAArrAAGA TCTTAGAAAA AGAAAATATC AGTTCTGAAC CGGAAGGTGG AATGCGGGAC GCCTTGTCTA GAAATGAGCT GACGACTGCT ATCTrCTGAAG
TGGATGATTA
ATCTTCTTrT TGTGGCGGCC T'rGTCTCAAC TGACAATGGT AAGAGCATGA TTGCACTATT TAAGAGACT'r GTTAATTGTT CAAACAGGGG TCAGTCTTTG .TAGAAAATTT GGCACTTCCT CAAAAAAATC
TGTTTGAAAT
CCAAGAT'rTA
TATCAGGAGC
GATTCGCTTA GCAACACTGA GTTTAGCAGA TATTAAGTCT AGTTTGCAAC TGCTGAAATG ATGACCGTCC GTTTGGCGGA AATCAAGTCC GAACCAGCTC GGTTGAAAAT GAAATTGCrA CCCTGAGACA CAAGTTGCC CGTCTCAAAC 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 AAGAGCTTrC TAATGTAGGT GCGGTTCCTA AACAAGTTGC CTACGGGCAA AACAGTCTAT CGTGTCGATC GCAATAAAGT CCGTCGAAAA TCCTGATTTA GCACGTCAALA ATTTAATTCG AGGTAATTGA AAGTCTAGGT GGGCCGGACA AGGCTCTGCT CTGCCAATGA ACACCATGCT ATTCTTGCTT TTGAGTCTAA TGAAACGAGA CAATCTCAAT ACCATGTTTG GTAATATCCT CACCTGAGiT TTTAGCTATT TCCATGGACG AATGGAAAGA CCAAACCAA ATCTTCTCAA ACTGAAAAAG AAGTAGAAGA TTGAATTTTT GGCTGATAAA GTGAAGGTAG AGGAAGACTA ATAAGTTTA'r GAATAAACAA CAATT'rATTA TTATGGCGCT ATTTTTTCAA TGAAGCCTGG ATGACTGGCC GCTATATTAT ACCAGCTCCT AGTCGACCAG GCAATCTATC TTACAAGAGG TTTGCAGAAT GCCTGGGGAG AGTTGGTTCT CAACCGGTTG CTTCAATGCT GGTCAAACTA CAGTCAGGCG GCAGGT'TTTT AGTTCGCGCA GCCTr'rTCAG AAGCCTGATT CCAGAAGGAT
AAGAAAGATT
GTTTACAGCT
CGCAGCCTTT
'rCATGATACA
GCTGAGACCT
TGGGCAATTT
TACTCTTTAG AAATTTCCGA GTCAGTTATG TGATGGGCAA A.ATCGTTGAT GTCATCGATC AGCATTTTAA TAGGAAACAC TACC!CTCAr- CTTCCAGACA AAATCAAAC CTTTTAGGCT 'rrN'I-r GTT ATACTAGAAA AGTATATTTA TAGAATTr1'r GCTCTATrrC TGGGGAAATC MACGTTTr'r CTACTAACTA CTGTAAAAGT TTTGAAAAAG AAAGGAACTA TCATGTCAG1' 3060 3120 ATTAGAGATC AAACATCTTC ACGTTGAGAT CCTGACCCTG AAAACAGGAG AAATTGCCGC GACTCTTTCT GCCGCTATCA TGGAAATCC GTTTGATGGC GTAAACA'rCC 'DTGAGTTGGA CCT'rGCTATG CAATACCCAT CAGAAATCCC CGCTATGA-AT GCGGGTAAAG AAGATGATGA TGAAGGAAAA GAAATTTTAA AAGGGGTTAA TATCATGGGA CCAAATGGTA cAGGTAAATC AAACTATGAA GTAACTAAAG
GCTAGATGAA
CGAAGGCTTC
GCCAACATTT
TGTGTCTAAA
CTACCAACGT
TGTTGTCCTT
AAAATGGAAT TGCTCAACAT TCTGGTGGTG AGAAAAAACG GCTCTTTTGG ACGAGATTGA GGTGTCAATG CCATGCGTGG CTTTTGAACT ATATCACACC TCTGGTGGTC CAGAATTGGC
AGTGGATGAG
TGGAATTACC
GAAGATTTCA
GAAAGAAGAA
CAATGAAATT
CTCAGGTCITT
TGAAGGTTTT
TGATCGGTA
TGCGCGTG
CGTGCGCGTA
AATGCTGAGT
GT'rCGTGAGT
ATGGCAGAGC
Cr'rCAACTTT' GATA'rrGACG
GGTGCTATGA
CACGTGATGA
G'rGAAGT~rr TGGGACTTr ?rcCTcCGTGC
TTATTACTAA
GTTACCTCAA
TGATGTT-GGA
CTCTTAAAGT
TCATCACTCA
TGGAAGGTCG
ATTAGCTCAA GAAC'rTGGCr ACGACTACAA GGAAGAATTG GGAGAACTA.A ATGACTAGAG AAAATATTAA ACTTTTTTCA CTGGTTGGCT GATCTCCGTC AAAAAGCTTT TGACAAGATT GAACGTGAAG GATACGCAAA TAATTCCCTC GTATCTTTTA GAAATGCACG CTGAACCAAG GAGACTTTGG AATTACCAGT 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380- 4440 4500 4560 4620 4680 4740 TATTGAGTGT GTCAAATTCC GCCATCAGCA AATGTTCCAG AGGAACTCAA ACTGTTTTCG CACAGACT'rT CACTCAGCTT ATCTGTTAAG TATGATGATG TGCTGTACTC TATATTCCAG CCAAGA'rAGC CATAGCAA'rG TT TAAGATT' AGTTATCTGG TGCCAATATC ACAGTGGAAG CGACCGTCTA GGTGAAAACG ACCGTTGGAA TrCTGGGTGAT GGAACGATTA CAGAAAATGA ATTTCACAGC ?1'TAGATCAT CAC?'rGAAGT TGGTGCAAGT AACAAACTCC AGTTGAGTTA GCTCA.ACAGG GTGT'rGTCTT TAGAAGAAAT TCCAGAGCTG A'rCGAAGAAT TCTTCA'rGTC ACAAGTTGGC GGCTTACCAC ACAGCTTACT TTAACAGTGG ATAACGTAGA AATCACAGAG CCAATTGMAG GAATTTTCTA TGCCGTTTAA CAAGCATAT'r ATGATTATCG 'rTGGTAAAAA AGCGTTTAGA GTCACCGT GAAGGAAGTG ACAAAGCAAC TGATTGCACG T1TC'rGGTGCG CAAGTCAAGT TTGCTGCTAT TCACTGCCTA CATTAGCCGT CGTGGTAAAT TACGCAACGA TGCAAGTATT GACTGGGCTA TCGGTGTCAT GAACGAAGGA AATGTCGTTG CTGATTTGA TAGTGACTTG ATTGGTAATG GTAGCCATGC TGACCTCAAG GTTGTAGCTC TTTCAAGTGG
TCGTCAGGTA
CATTCTACAA
CATCATCAAG
AGACCAAGCG
AGGCCATGCA
TGGCTTGGAT
822 CAAGGGATTG ATACTCGTGT AACTAACTAT CATGGGTA TCCTTGSAAAA AGCAACTTTG GGTGCTAAGG GAGCAGATGC GCAACAAGAG CGTTCAGATG CTAACCCAAT TCrT-r'GATT GGC1'GCAACT CAATCGGAAA AC~rrCAATG GTATCGGCCA AGCCGTGTTC TCATGCTTT'C GATGAAAATG ACGTAACTGC ATGACTACC TCATGAGTCG GTTCCTTG GA'rCTGTTAT
GCCTCTATTG
AAGGCAACTG
GTrCAGGTAGA TCCAGAAGAT CAGAGCGTTT GGTTGTTrCGT CGTGGAGATT CCAGTCAAGG AAGTTCGTGA TGAAATGATT CCAACTATCG GTCAAAACGC TAAGGGGCAG CCTATG.TTAG ATGTAGAAGC GATTCGCAAG TTTTAGATCA GATTG'rCAAT GATCAACCTC TGGTCTATCT GGACAATGC' AAAAACCACT AGTAG~rCTG AAAGCTATTA ACAGCTACTA TGAGCAGGAC TTCACCCG TGTCCA'rACC TTAGCGGAAC GAGCGACAGC T'rCTTATGAA AAACCATTCG TAAGTTTATT AA'rGCAGGCT CTACAAAGGA AGTTCTC'TTT CGACAACCAG CCTTAACTGG GTGGCACGCT TTCCTGAGGA AATTCTCACT AGGTCT'rGAT 'ITCAGTAATG GAACACCA'rr CTAATATCAT TCCATGGCZAG
AAGAG.AAATT
GATTTTCCAA
GCGACGACAC
AATGCCAATG
GCTGCTCGTG
ACCAGAGGAA
GAGGGAGACC
GAAGCTTGTC
4800 4860 4920 4980 5040 5100 5150 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 5000 6060 6120 i180 6240 GAAAGACTGG AGCAGAGCTT GTCTATGTCT ATCTTAAAGA A'rrTGCGAGC
TTCTTGGTGT
TTATGGTAGT
ATCTGGACTT
TTTACGGCAA
TAAATTGACT
GGTCAATCCG
GGATGGTGCT
'1rTCGCCTTT
AGAAAAGTAT
GATAAGGTTA AATTTGTTTC A'rCAAGGAAA TCACTCAAT CAATCTACAC CTCATATGAA TCCGGTCACA AGATGGCTGG CTTGAGCAAA TGTCTCCAGT TTTCTAGTT GGAAGGAATT GCTATTGGAC TTGCGACTC GCTCATGAAC AGGAATTGAT CCGTGCCTTG GATATGGAGG CCTAGC1'CAT GCCTCCAATG AGCCCACCAA GTTGGGGCAA GATTGATGTC CAGGACG TCCGACTGGT ATCGGTGTCC AGAATTTCGGC GGCGAGATGA GCCTTGGAAA TTTGAGGCTG AGTTGA'rTAT CTGGAAAAGA TGCGTACG'rC TATCCAAAAC TTGAT'1?TGT CTACGAGCAA GAACGCCAAA TATGGCAGGA TTGGTATGGA TGCCGTTGAA TGCACGCAAT TGAGGGATTG TTA'rTGCCTT TAACCTAGGT ACCATTTACG G'rrCTCAGGA TTTGGCTCAA CG'rTCGGGTG GATCTCCATC CTCACCATCT TGCGACGGCT CTGGATTATG AAGGAGTGGC TGT'rCGTGCT GG'rCACCA'rT GTGCGCAACC CTTGCTTCAG TATTTGGAAG TCCCAGCAAC AGCTCGTCCA AGTTTTTATA TCTACAATAC CAAGGCAGAT TGCGACAAAC TAGTCGATGC CCTACAAAAG ACAAACGAGT TTTTCAATGG CACTTTCTAA ACTAGATAGC CT'rTATATGG CAGTGGTAGC AGACCATTCG AAA.AATCCAC ATCACCAAGC GAAGTTAGAA GATGCTGACC AAATCACTCT CAACAATCCr AC'TTGTGGGG ATGTCATCAA CCTrTCTrTr AAGI'TGATG CAGAGGACCG TTTGGAAGAT TCAACTGCrT CrGCTAGTAT GATGACAGAT TTAGAACTGG CGACTATTT TICTGAAATG CAACT'rGGAG ACGCGGCATT CTTGTCAG GCAACCCTAG CTTGGAATGC CCTTAAGAAA AAGTCTrr TGTCTTATGA ATTATTAGAA AAAGAGTAGA ACCAAAACCA AI'GACCTTG ATTGCTTC 'rAAATrCAGG ATGCACGATT GCCGI'ITAG GAAAAACCAA ACAAGAAATT GTTCAAGGGC AAAAAGATGA GCGTCAAGAC GTTGCCAAAT TCCCTCAAAG AATCAAGTGT ACAATTGAAA ATCAAGAAAA ACAGrAAGAc ATGAAGAAAG AAAGGATACT ATGGCTGAAG GTGAATATAA ATTTrGTlrC CATGACGATG 6600 6660 6720 6780 6840 6900 6960 TAGAGCCTGT CTTATCGACA GGAAAAGGAC TCAACGAACG TGTTATTCGT CTGCTAAGGG TGAGCCTGAG TCGATGTTGG AGTTCCGTI-r GAAGTC?1'AT AAAAAATGCC CATGCAAACT TCTACTACCA AAAACCATCT TTrAAAGAAAC CTr'rGAACGT CTTCTGCCCA GTACGAGTCA TAGGTATTAT CTTTACAGAT AATACTTTGC GAAGTTGGTA TATGGTCGGG TGGAACT =1 AAACTTATTT CCGTATCAAT TTGATGAGGG AGCAAGCGTC TGGGGAGCAG ACTTCTCAGA GAT'rGACTTT CACAAACCAC CCCGTTCTTC GGATGATGTA ATCGGGAT'rC CAGAAGCTGA ACGTGCTTAT GAATTATCTG 7020 GAAACCTTCA 7080 CATGACTTAA 7140 CCTCAAAAGA 7200 TTAGCAGGGG 7260 TTCCAAAAAT 7320 TTATI'TAAAC 7380 AACTCAGCAG 7440 ATTCCAC-rC 7500 TTGATTATCG 7560 GAAGTGGTT'r ACAGAT'rCCG
CCGCCGACAG
ATCTACGTGC
AACGAAAATA
TACTACGTAG
ACCACAACAT GAAGGAAGAG
CACTCAAGGA
ATAACAAGTT
CAAAAGGTGT
TAGGTCAGTT
AAGGATGTAC
T'rGC'TTGA_
ATIAACTGT
GAAACI'rGGG
ATACCCAGAC
CCCAGCCCTC
CAAGGTAGAT
CGAACGTACC
ATAGC?'rACA
CAACTATCCA
AAAAGGATGC
ATCCATCTCT
C'TAATGCAGG
CGCTGCCATT GTAGAAATTT AAACTGGTCT GATAACGTCT C-ACTGT'rGAG TGGATT'GATG AGCACCAACA TATTCAAGCA CGGAGCTTAT ATGCGTTATA AACAAAGCGT GCTAAGGCTC TGCCAAAACG ACTATGAAAT CATGCTCTCT ATCGCCTTT TCACAATGC1' CCACATACCA TTACCTTGAT GGAGAAGGAG CGCGTGGTAC GCAACACCA.A CACACGGGTG CTAAGATGAT 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 GCTCGTCTAT TGTGTCTAAA TCCATCGCTA AAGGTGGAGG AAAGGTTGAC TACCGTGGAC AAQTCACCTT TAACAAGAAC TCTAAGAAAT CTGTTTCCCA CATTGAATGT GATACCATTA TCATGGATGA CTTGTCAGCA TCAGATACTA TTCCATTTAA TGAAATTCAC AACTCGCAAG TGGCTTTGGA ACACGAACCC AAAGTATCTA AGATTTCAGA AGAGCA.ATTG TATTATCTCA TGAGCCGTGG ATTGTCAGAA TCTGAGGCAA CTGAAATGAT TGTCATGGGA TTTGTAGAAC CCTT'rACAAA AGAACTTCCA ATGGAATACG CAGTTGAGCT GAACCGCTTG ATTAGCTATG 824 AAATGGAGGG ATCAGTTGGA TAAAAT?1'GA TIwrTATACTC TTCGAAAATC ACGTCAGCAT CGCCTrACCG TATGTATCGT TWCTGAtTCG TCP.arrTCAT
AAAACAGTGT
AGATTTACrC
ATCGCGAT
AAGAAGAAGG
GTGATGTAAC
CCAGCTTCAA
TTTGAGCAAC tGCGGCTAGC T'rCCTAGTTT GTTCDTGAT AAAATcAAGG AvrTTGAAr-A TGAACTTGTA TCAAAAAATC TTTATAATTT CTCGTTAACA AAGCGGACAA ACTGAr'rCCA CTTTTTCAAT TTT~CTTG'rCT GCTACCATTT CGAAACTAGG CTTGACCAAT CAAGTCCTTG TCTTCATAAG TCAAATGGCC GTGGTGCTGG GATTGCTTTG GAATCAGGTG TGAAT'rCAAC
TCTTCA.AACC
CrACAACCTC
TTTGAGTA'TT
GCGGTTTAAA
CCAAACTTTT
GCGCTCTGTG
AACCACTGTT
ACATTGGGAA
tOSS GATT'GATTCC CAACACGTTC GATTAGATAG ATATCCrCTG GAGCCACTGC AGTTACTGTA TCTTCTTT'rC CATCTTGTAC AGGGGCTTTG CTATCTTGAT AGGCATCGCC TTG'r'GAACG ATT'IrGCGAA GTIGTAAATGT AGAAGAAATA TAATCCATTA GGGAAGATG'r AGCTGTAAAT CGAGCGTAAG GATTATTGTC TTGATGATCT GCATTTAAAA CAACTGTGAT GACTCTCATG CCTTTTTCGA CAGTAGTACC AACAAAAGAC TCTCCAGCCT TATCTGTTGT TCCTGTTTT'r AGCCCA'rCAA AACCACCACG GTAAGCAGGC ATACCTTCTA ACATGTAGrr GGTTGAAGTG ATrTGTCATCC CAGCAAAAGT AGAAGAAGGT TTTTTGGTGA TTTCTAAGAC TTGTGGGTA'r TTTTTGATGA GGTTGCGAGC AACGATAGCG ACATCATAAG CACTAAGCTT ATTrrCCTCA TCTTTTTTAG AACCTGGGTA AATGTTATCC CCTAGAGTTT CATTGIITAAG ACCTGTCGTA TTGACAACAG TGGCATCCTG AATTCCCCAT TCCAAGAGTT TTGCCCGCAT CATATCGACG AAATCTTTT CTGAGCCAGC AATTTTCTCA GCTAGGGCAA TAGCG;GCGCT GTTGGCACTA 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 GATACCAGAG TTGCTTCAAG TTACTGGCTT CAGAATTTGT GAGAGGGTAA TACTTCCGTT GTTATGGAAG CAATTTCGAC TTTGCCTCAA CAGCAATCGC GAAGCACCCC .CTAAAAGAGA TTATTCTATC ATAAAGAAAA GAAAATATGG TAAAATAAAG TTTGGACCAC AGAAATTrA CTTTGATTAA CCCTTTTGTT CAACTCTTCG ACAGTATAAT CGTCAATTGA TA.AGGATAAT TTCCAPAAGCT TCATAGACCA AGGTTGCGTT GCATCCTTCT ATGTTTAGCG GCAATGGTAA TACGGGCCTC CATAGGAATA CAGAAATATC TACAGGAGTG GATAAACAGT AATCAATTTT CATAGAGAAT TTTACCAGTA AATCTTGAGC AACAGCAGTA ATAT=rTT CATAGTAGTC ATTCATCTGT TAAGCTTTTT 000 t 0 *0
GACAGTTAAC
AAAATATTCT
TAAGGGAGGT
CTCTTAACCA
AGCGTGCTTA
A.AAGTTAAAA
TGCTTTAATA
AACTCATGTT
TCATCTTTTA
ATACAATTAT
'rCGTAGAAAT
CCTATGGAGA
GATTCCATTT
AAATTATTTT
CAGATGGGGT
TTATTAGCG
GCTTTTTTTA TTATTTGACA AACCCTATTG TTACTTT'CTT AAATAAAGTC 'rGTAAACTCA 825 ATCG'TGCT TGGTAMrA kl-rACCTTGT ~TTATcT cr'rAccTATT TTGATTAATC CrTTATAG TCGA~rACAA GACTTAATCA ATTGGATGT AGAAGCTACA ATTCAGCAGT ATATCCTAAA 'rAGCGrATCA AATAG'rCTGG ~rlr 3A'IT GATTATGACT CC-ACTTTTr'r TCTTGCCCAT GCTTGAAAGA ACGATTCTAA CTACTTGTr
AGTTATCTAG
TAGACI'ATC
TAAAC~rATC
GGAGCGTCTT
TGGTTTATTT
AGAGGGATCG
ATATTAGTGG
GTATTATTGG
CTGGGGAATG. GTCATAGGTG TTTGATTATA TCTAGTCAAA 'rAATTATCCT GCGCTCCAGA CTATGTAT ATTrrCAAA GTCAGCTC'TT ATCAGTAC'rG CTTATTAGAT GGACATAAAT CTTGCATATT GCAGGCTTrAT AGTTTCGATT GACGCAATCA ?TTAAAATAT GCTrTAG'TTT GGGGCCAAGT ATTGGTTTCA ACTGCrGATT GCAGTGAI-TTT 'rCC'rCGAATC CTAGGAAGTG TAAAGAATT~T AAATCGrACG TTATAGGT'rG TTTGGCTTAT TTGCCAT'Tr TTCTCGTCTA TTCCTATCAT CA'rCGCAAAT ATATGCTTGT TGTTCAGCAG TTATGAAGGT TCATCCAATC CTGTAGTTGG AATCATTGTC TCTTATCCCA TTTrGTATGAA AGTAAA.AGTC AGGAGAACCC TGAAAATAAG TTAAAGTCTT ATAAAAATCG AAAAAACCAG 'rCTATTAAAG TTCGAATGAA AATTCTAAAA CATTGTTAGA TTCATTTrAC 'rATATTTTGG GCCAATTTAA TTCCTTATGT ATATTCACTG ATCCCCATAG GTAGATGGCA ATAT ATA AT*=cCGCT
ATTGGCTATA
ACGATTTTAG
GCAGTGCCAA
AATCATAAAA
TGArrTT
AAACTAATTT
TGGAATTCTG
ACC'rATTTAT
AATCCATTTG
TCCAATAAGT
T=rACTTTT GTTGTCAACC CCTATTCTAT CTGAAAGAA TAATGAAAGA ACGAGAAAGA TTAC'rGGAAC TGGCC?1'TAG TCACAGCTAA GAATAGTAGA 'rG'CAGOGTA AGTTCCACL'G
AGTAGATTIGA-AACI'AGAATA
ACTGTCCTCA TCTATirCGTT
AATATCTATG
ATTTCTAAGT
GAATTAGCTA
ATTACAAGAC
AGTTAATCTG
GTTTTCATAG
CTACACCTCT
CTATTCTITAT
10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820
GAAAAGTAGT
?TTrATAGT'r TTTAAACTCA AATGAATTGA AATAAAGAGA AAGTATTTTA GAATAATTCT CTTCGTGAAT TTCTTCAAAA ATCTGATTTC TT'ITTTGCATT TTTrGACTrAG ATAAGGTATA GGTCGTTACG GATTCGACAG GCTTATGAG GCATATTG( ACGCTCAGTT AAATATAACT GCAAAAAATA ACACTCTA CAGCAGGCGT GACCCGAT'rT GQATTGCTCG TGTTCAATGA ATACGATTAA GCCTTGTCTA GCGGTTAT AAGAGATTGA TTGAGT'rATG 'rGTCGAGGGG CTGTTAAAAT AATACATAAC CCGAATAATA TAAGGATTGA GTACGAAAAT TCTCATCTGA CAGATAGCTT CATCTTAGGT ATGATrrTAT TGTCT'rTTGG CGACTCGTGT GGCGACGTA-A CGCTCTrAGCT GCCTAAA.AC CACGTCTTA'r TATTAGCGAG TAGACTCGCA GTTTCTAGAC CTATCTTGT AGACAAATAT 826, GTTGGCAGGT G'TTGGACGT GGTCGACT CCCACCGCCT CCATTATTCC 'rTTGCATTCT ?TTCA'rrCC TTGGTAAAAC GTTGrrAAA'r CAACGTTTT TATr'N-rATC TTTGGTATTC CTTTrGCATTC TTTTGCTAAA AAGGGAGTCA CAAACAGACC CTATr1'TAAA AAAMGATAGA AAAAAGGATA CAACATrTGT CGCATCCTAA AAATAATCrr TTTTCGACGG AAGACATGGG ATTCGAACCC ACGCACGCTA TTACACGCCT ACCGCGTTTC CAACACGGCC TCI'AAGCCT CTTGACTAAT CTTCCAATAC TTACTCAAAT AGTCTACCAT AAAiGGCTCTT ATTrGcAAT AAAAATTCTrA GAAATAAGAA AAATGATAGA TTTTCAAAGA AAATGATAAA AAATGCTTGA CTTCGAAAGA AAG'rATGATA GAATGAA'rAG TGTAAACCAT AACAGGAGGT GATTCAGTGT TAAAAACAGA ACGTAAACAA CTAATT'rTAG AGGAGTTAAA TCAACATCAT G'rAG'TTTCTC
TAGAAAAATT
AGT'rGGAAGC
TACAGGAAGA
AGTTAGTTTG CTAGAAACGT GGAAAACAAG CTTCGTCGTG AGAAACCATT CAAGAAAAAT TGGCTCAGAA AGCAGCCTCT CTCATTAAAG CAACAAC'TGC TTTrTrTGATT CATGAATTGG CCATTCACCA TGCCGCTCAG TTGGTTGAAA ACGTCAAGAC GGCGACAGAT GCTAGTATCG 'rGCACTT'rcA CCGTGCCTTT ATCCGAATAA CTGATATGGA GGAGGGAGCT GTGAAAAGAG TCTTGGTGGA TTCGTCAAAA A'rrGGACAAA GCGCTATCGT TATCACTAGT CAAGGGCATG CAGAATCAAC GGTTCGAAGA GAC'rTGGATG TGCATGGTGG AGCAGAACTC CCCTACTCCT CrarCAAAAA CCTTCAAGAA AAGAAATTGC AAAAAGATGT CATCTTTATC GA'rGCTGGAA TCAATAAGAA 'rGTTACAGTT GTGACCAAC'r AGCAGAW'rCC AACTGTCATG GTTGGAGGAA GGGGCGTTGC TCTTAACCAG ATTAACCAAT 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 ATGGTGTTGA CGATGGCTAT CTATTTGGA GAATGCCAAG C'TTGCTTTGC CAAGGTACC AGCTCTTGCA GGTTATTAAG
TATACCACTC
CAGACCTACG
CCACTCAAAC
GAGAAAACGG
AGGTAATAGA
TTTGCACCAA
TGGGAAAGGA
GGGATTTATC
AGTATGATTT ATACAGTCAC ACTCAA'rCCA TCCkI'TGACT ATATCGTTCG GTCAAAGTTG GTAGTGTCAA TCGTATGGAC AGTGA'rGATA AGTTTGCTGG ATCAATGTCA GCCGTGTCTT GAAACGTT'TG AATATACCAA ATACAGCGC GGTGGCTTTA CTGGTAAATT TATCACAGAT ACTTTAGCAG AGGAAGAAAT CGAGACACGT TTTGTCCAGG TGGCAGAAGA TACTCGTATC AATGTTAAAA TCAAAGCAGA CCAAGAAACA GAAATCAACG GAACGGGTCC AACTGTTGAA TCGGTTCAGC TAGAAGAATT GAAAGCTAT'r TTA'rCTAGTC TGACAGCAGA AGATACAGT'r GTCTI"TGCAG GTTCAAGTGC TAAAAATCTA GGCAATGTTA TCTATAAGGA Tr'rGATTTCC TTGACGCGCC AGACTGGTGC GCAAGTGGTC TGTGAC'TrG AAGGACAGAC CTTAA'1TGAT AGTTTGGACT ACCAGCCTCT TCTTGTAAAA CCAAACAATC ATGAACTTGG AGCGATTT'r GGGTTAAAC TCGAAAGTTT~ 827 AGATGAAAT GAGAAATACG CTC3TGAGTT ACTGGCTAAG CTCTATGGCT GGTGATGGTG AATCAAAGGA ACAGTCAAAA AGGTGAATr GTCAAATCA6A AACGGCAACr ACCrrCTCAG AAAAGTTGAG CTAGAAAAAC CTAGATTTGC AGGCAACTGA CCCTTCTTGT CACATCTGAG
ATTCAGTTGG
AAGACG.TAGT
ATGACTTGGC
GATGAAAATT
AAAAACAGCT
TTTGAAACA
TGATGGAATC
TGCTAAGTC-A
AGCTGGTGAT
AGAAGCCTTC
AACGCGGAA
CAAGACCTATr
GTCATCGACG
TTTAAAGAAG
GCAATGCCTC
AATAACGGTC
GGTGCTCAAA ATGTTATTAT GGAGCTTACT TCGCTAAAcc TCTATGGTTG CTGGATTcAc AAATWGGGAG TGGCTTGCCG TTTATTAAAG AAACATATGG TGAGAAAAGA TGTCATGrrG AGATGATTAA AAATTTGACA GAATT -rGGC CTGAAGC'r ACAGCAAAAA CGCTGCTGTC TTGACTACGA GAGCI-rGGAT
GACCACGGTT
TTGACTTCTA
AAAXAAGCGA
ATGTAACAGA
CTCGTTTGGG
CAGTTCTATT
GGACA6AGCAA CTCACCTCTT CTTCATGATT CCAGCTCCAG AAGGTGCCAA TTCGCAGCC'r TGGCAGAATT GTCTCAATAC TTGATGAAAG ACCGT=TC CGTCAAGCAA CATCTGCAGA CCAAGTTATC GAACrrMG ACCAAGCTTC CAGGAACTTG TTCAAGCACC TGCTA.ATGAC TCTGGTGACT T'rATCGTAGC TGTACAACAG G'rATTGCCCA CACTTACATC GCCCAAGAAG CCCTTCAAAA GAAATGGGG TTGGTATCAA GCTCGAAACC AACGGTGCTA GCCGTGTTGG ACTGCAGAAG ATATCCGTAA GGCTAAAGCT ATTATCATTG CAGCAGACAA 'rGATACTCAC
AGACAAACTT
AGAkAAAAACT TG7'rACAGCT
AGTAGCTGCT
AAATCA.ACTA
GCCGTTGA
13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15006 15060 15120 15180 15240 15300 15360 ATGGATCGAT TTGATGGAAA ACCATTGATC AATCGTCCAG ACAGAAGAGC TAATTAACTT GGCTCTTTCA GGAGATACTG GGTGCCAAAG CTGCAACAGC CTCTAACCAA AAACAAAGCC CACTTGATGA GTGGTGTATC GCCC -rcCCT TCTTGAT'rGA.
GGTTCTTAC C ATGAGTTAGC ATCCTTCCAG TCTTTGCGGG GCAGGr'rTCG TGGcTGGTGC GCCCCAGGTG G'rGAAGCAAC CTTGTTCGTG GATTTATCGC CCTCG'rTCAC TCGAAGCTGC ACAGGATTTG TTATGCTAGC TCAAATGTTA CCATTCGTTA CGGTGCTTTG GGTGTTCCAA TTCTATGTTC ATGAAAATTG TTATGTTCCC TACTC'rATTG TATTGCCAAA GAAGGTTT'rG TTGCTGACGG TATCCGTAAG AAGTCTACCC TGCCGCTAAT 'IrGGTGGTGC CTTGTACAA TCGGTCGTGG TA'rCATGATT ATGAAAACCT TGGCAATCTT GTGGAGCTGC CTTTGGTTTG C'rGAAAAACC GGGTTTGGTA CCT'TTCCTAA AATTCCTTAT CATCTGGTTT CCTAGGTGCC TCAAGAAA'rA CGTTAA.AGTT CACTTCTTGG AACAATCTTG
TTCAACTCTT
AGGTGCCTTG
TAA.ATCAATC
GCAGGTGTCT
GTTCTrCCCA
CTTCTATTGC
TGTGAATATC CCAATGGCTG CAATCAACAC TGCTATGAAT 828 GACTrCCTAG GCGGTCrTGG AGGAAGTDCA GCTGTCCTTC TTGGTATCGT ATGATGGCTG rrGACATGGG TGGACCAGTT AATAAAGCAG CT1'ATGTCTT ACGCTTGCAG CAACTGTTTC TTCAGGTGGT TCTGTAXGCCA TGGCAGCAGT GGAATGGTGC CACCACTrGC AATCTr'rGTC GCAACTCTTC TTrTCAAAGA AAGGAAGAAC GTAACTCTGG TTTGACAAAC ATCATCATGG GCTTGTCATT GGAGCG.ATTC CATTTGGTGC CGCTGACCCA GCTCGTGCGA TrCCAAGCI-r TCAGCAGTAG CAGGTGGACT CGTTGGTCTT ACTGGTATCA AACTCATGGC GGAATCTTCG TTATCGCCCT TACTTCAAAT GCTC'rCCTTT ACCTCGTTTC
CCTTGGTGGA
TG(7rACAGGT
TATGGCTGGA
TAAATrrACT
TATCACTGAG
CATCCTrGGT
GCCACACGGA
TGTCTTGGTA
1.5420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 16140 16200 16260 GG-AGCAATCG TAAGTGG'rGT GAAAA.ATCAA AAGATTGGAC AATATGGTAT AATAGAAGAA TAAAGCAGAA CTGGAAAGAA GATTATTG ATTTTCGCAG GGTTTATGGT TACCTACGCA AACCACAAGC ATAAAAAATA CGTTTrGGTGC AGTCTTTTTC TGGCAAACAA GAATACAAG;T AAGAAGCGAT TCAACGAATG CCrrCAAATT AGGGGCTGCA rCTTCCCGAA ATGCCTGTGA ACAACAAGAC GGAGACCGTC TTGATrTCGT TAGGAATTGC GGTATAACCC I-r'ATAA'rrT AATTCGCTTG CTAGTGGGTA GCCTAGCTTA TCTGGCGATA TrCGGCCTAT CTTCTTTTTC AAGTGGATAC GAAAACAGGA AGGACI'CTTA 'rCTGGCTT'TT TGCTGGCTTA CTCTTGATr-r TTGAGGCCTA CTTGGTTTGG AAATATGGTT CGTTCTAAAA GGGACCATGG CTCAGGTTGT GACAGATCTG ACTGGTTTTC CTTrTGcTGGA GGGGGCTTGA TCGGGGTCGC TCTTTATATT CCAACAGCCT AAATATCGGA ACT'rACTTTA TTrGGTTCT'AT CTTGAT~rTA GTGGGTTCTC CCCTTGGTCT GTTTACGATA TTGCTGAATT TTTCAGTAGA GGCTTTGCCA AGGGCACGAG CGTCGAAAAG AGGAACGCTT TGTCAAACAA GAAGAAAAAG GGCTGAGAA-A GAGGCTAGAT TAGAACAAGA AGAGACTGAA AAAGCCTTAC TCCTGTTGAT ATGGAAACGG GTGAAATTCT GACAGAGGAA GCrGT'rCAAA TATTCCAGAA GAAAAGTGGG TGGAACCAGA AATCATCCTG CCTCAAGC'rG CCCTGAACAG GAAGATGACT CAGATGACGA AGATGTTCAG GTCGA'TTr AicccTrAA TACAAACTTC CAAGCTTACA ACTCTTTG;CA ccAG.ATAAAC GTC'rAAAGAG AAGAAAATTG TCAGAGAAAA TATCAAAATC TTAGAAGCAA
TAATCTATCT
TCACCATATT
'rGGACAAGTC 16320 GAACGACTAG 16380 TTCTCITTTC 16440 TCCTAGTCAG 16500 AATGGTGGGA 16560 CTCGCCAAAA 16620 TCGATTTGCC 16680 ATCTTCCACC 16740 AACrTAAATT 16800 CAGCCAAAGA 16860 CAAAACATCA 16920 CCTTTGCTAG 16980 CT'rTGGTA'rT AAGGTAACAG TTGAACGGGC CGAAATTGGG CCATCAGTGA CCAAGTATGA AGTCkAGCCG GCTGTTGGTG TAAGGGTCAA CCGCATTTCC AATCTATCAG ATGACCTCGC TCTAGCCTTG GCTGCCAAAG ATGTCCGGAT TGAAGCACCA ATCCCTGGGA AA'rCCCTAAT 17040 17100 17160 829 CGGAATTGAA G'rGCCCAACT CCGATATTGC CACI'GTATCT TrCCGAGAAC TATGGGAACA ATCGCAAACG AAAGCAGAAA ATTTCFGGA AATrCCTrTA GGGAAGGC'rG TTAATGGAAC CGCAAGAGCT WTGACCT1-r CTA.AAATGCC CCACTTGCTA GFGCAGGTT CAACGGGTrC AGGGAAG'rCA. GTAGCAGTTA ACGGCATrAT TGCTAGCATT CTCATGAAGG CGAGAccAGA TCAAGTTAAA ?I'rAGATGG TCGA'PCCCAA GATGGTTGAG; TrATCTGI-rr ACAATGATAT TCCCCACC!TC TTGATCCAG TCGTGACCAA TCCACGCAAA GCCAGCAAGG CTCTGCAAAA GGTTGTGGAT GAAATGGAAA ACCGTTATGA ACTCTrTGCC AAGGTGGGAG TTCGGAATAT TGCAGGTTr'r AATGCCAAGG TAGAAGAGTr' CAATTCCCAG TCTGAGTACA AGCAAATTCC 9* 8* 0 *000 0* GCTACCArTC ATTGTCGTGA GGA.AGTGGAA GATGCrATCA GATTCTTGCA ACTCAGCGTC TCCATCTCGT GTAGCATTTG AAATGGAGCA GAAAAACTrC TCATCCAGT'r CGTCTCCAAG CTrCATCAAG ACTCAGGCAG TGAAAATGAA GGAGAATrr
TTGTGGATGA
rCCGTCTTGG
CATCTGTTGA
CGGTTTCATC
TTGGTCGAGG
GCTCCTTTAT
ATGCAGACTA
CGGATGGAGA
GTTGGCTGAC CTCATGATGG TGGCCAGCAA GCAGAAGGCG CGTGCTGCAG GTATCCACAT TGTCATCTCT GGTT'rGATTA AGGCCAATGT AGGAACAGAC TCCCGTACGA. TTTTGGATGA AGACATGCTC TTTAAACCGA TTGATGAAAA CTCGGATGAC GATGTTGAGC GCATTGTGAA CGATGAGAGT TTTGATCCAG GTGAGGTTTC TGCTGGTG'rGATCCGCTTT TTGAAGAAGC 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 184 18540 18600 18627 TAAGTCTTrG GTTATCGAAA CACAGAAAGC CAGTGCGTCT ATGATTCAGC AGT-rGGATTT AACCGTGCGA CCCGTCTCAT GGAAGAACTG GAGATAGCAG TCCAGCTGAA GGTACCAAAC CTCGAAAAGT G'N'ACAACAA TAAAAALAATA AAGTTTGGAG GGAAGCTATT TrrAGTGGCTA TTGATTGCTT TTATTTTCTG ATTGGACTGT TTrTCGTTTT CAG'rAGCAGG TTTACTTGAA GCAGGAGTAG AGTTGCTGTT TTCTGATCTT CTTTTTTCTC TTCCTTGACG C'rAGATTTTG TTGCTGTGTT TTTTCrTGAC TAGTGTTAGT CTCTTTAGTT GGACTGGTGT GGATTCCTTT TGGATTTCTT TGACAATGGT TGTCGTCTGG CTTGTCGTAG AATATTrTTT TTATTATCCA AGGCGTT INFORMATION FOR SEQ ID NO: 114: i)SEQUENCE CHARACTERISTICS: LENGTH: 2560 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
GTCGTTTATC
GTGTCATCGG
GCTTCTTTCC
AAGTTGGCGC
AAGAGTCCTG
GTGTTTCCTC
TTTCCTTAGG
GTTCTrTTTTT
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
TAAAA'rACGT TACCTTGCTT CTGCACGTC AGCAGGTAAG TCA?1'GAAAT TPTAAAGATCA AGATATTACA ATTGAAGAAA CCACTGAAAC AGCTTTGAA GGAGTTGATA T'rccCCTCr TTCAGCAGGT AGTCTACAT CAGCTAAGTA TGCACCATAC GCAGTAAAAG CTGGCGTGG'r AGTAGTAGAT AATACATCTT ATTTCCGTCA AAATCCAGAT GTrCCTrGG TGTTCCAGA GGTCAATGCT CATGCACTTG ATGCTCACAA CGGAATCAI-r AATTCAAATG ATGGTGGCrC TTGAGCCGGT TCGCCAAAAA TGT'rTCAACT TATCAAGCCG TTTCAGGTGC TGGTATGGGA TGAACTTCGT GAAGTCTTGA ATGATGGTGT GAAACCACGT GCCTTCAGGT GGTGACAAGA AACATTATCC 'rATCGCCTTT TGTTrTCACT GATAATGATT ACACGTACGA AGAGATGAAG A6ATTATGGAA GATGATAGCA TI'GCAGTATC TGCAACATGT AGCTCACTCT GAGTCTGTTr ATATCGAAAC AAAAGAAGTG GCCTGCCCTA ATTGTrTCAAC TGGGGCTrGG ACCGTATCAT GCAATTCTrG AGACACAACG GATT'rGCATG AACGCrCTrC
ATGACCAAGG
G'rGCG'rATTC
GCTCCAATCG
CGGAAATCTT 480 CACAAATTGA 540 AAACTPAAGAA 600 CAGTCTTGTC 660 AAGAAGTAAA 720
AGCAGCTATC
TCCTCA6AGCT
CTTGGATGCA
TGC'rTGGAAC
AGCCGAATTG
CTTTGAAATA
CAGCCTTTAT
TGATTGAGCA
GCAGCCTTCC
ATCAATGCAG
GAAAAAGGAA
TCAGTTCAGA
AAAT'rTGAAT
GAGAGGTGTT
TACCCCCTTC
CAGGTGCTGT TCTTGAAGAT GATGTAGCTC ATCAAATCTA TrGGTTCGCG TGATACCTrTT GTTGGTCGTA TCCGTAAAGA TTCACATGTG GG'N'GTTTCA GATAACCTTC TCAAAGGTGC TTGCTGAAAC TCrTCATGAA CGTGGATTGG TTCGTCCAAC TAAAATAGTC ATATCGTTTA GGAGTTCAGA TGAACTCCTT TTCGTGTCTT ArCA6AGATTT AAAAAAATGT AAAATCATTA CATCAGGATG GTTCCATTAA CTTTGATGCT ATTCCAGCCT 780 840 900 960 1020 1080
TTTATTGGCC CATCATACGG ATGGAATrrCT
AGAGTCCAAC 'rTTGACCCAC GATGAGGAGT TGGAGTTGTT TCAA'rGGACG CGTrCCTTTG ATTGCGGGTG TAGGTACTAA AGTTTGTCAA AGAAGTAGCG GAATTTGGTG GTTTCGCAGC ACTACAACAA ACCTTCTCAA GAAGGGATGT ATCAGCACTT CTGACCTACC AATTATTATC TATAACAT-rC CAGGGCGTGT AAACCATGCT TCGCTTrcT GACCATCCAA ATATTATCGG TGGCTAATAT GGCT'rACTTG A'rTGAGCACA AGCCTGAAGA TCTCGCAGGA ACGACTGCTG TGCGGCTGTA CAAAAGGTTG; TGATACGCGT GACTCTATTG TGGGCTTGCT ATTGTTCCTT TAAGACTAT'r GCAGATGCTT AGTTGTCGAA TTGACTCCAG 'rGTCAAAGAA TGTACTAGCT GTTCTTGATT TATACAGGTG 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 AGGATGGAGA TGCTTTCCAT GCCATGAACC TTGGGGCGGA TGGGGTTATT TCTGTTGCCT 831 CTCATACAAA TGGGGATGAA ATGCACGAGA TGTTTACTGC GATTGCAGAA AGCGATATGA 1740 AGAAAGCwCG AGCA.AFICAG CG;TAAATTCA ITCCTAAGGT rA.ATGCTCrC TTCI'CTTATC 1800 CA.AGTCCTGC TCCAGT'rAAG GCAATTCTTA ACTATATGGG ATTTGAAGCT GGACCCACTC 1860 GTCTACCTCT TG?1'CCAGCA CCAGAAGAAG ATGCCAAACG CATTATCAAG GTTGTCGTAG 1920 ATGGCGACTA CGAAGCAACT AAGGCAACTG TAACAGGGGT CTrAAGACCA GATTACTAAT 1980 AAAGACAATA AAATCCGGCT CTTTGTCAAC TGTAGTGGGT TGAAGTCAGC TAAGCTCGAG 2040 AAAGGACAAA TTTTGTCCTT TCTTFTGA TATTCAGAGC GATAAAAATC CGTTTTTGA 2100 AGTTrCAAA G'FrCCGAAA.A CCAAAGGCAT TGCGCTTGAT AAGTrTGATG AGAT'rATTGG 2160 TCGCTTCCAA. 
I-rGGCGTrr GAATAGGGTA GTTGAAGGGT GTTGACGATT TTCTITTTGT 2220 CCTTTAGAAA GGTrTrAAAG ACAGTCTGA-A AAATAGGATG AACCTGCTTC AGATTGTCCT 2280 CA.ATGAGTCC GAAAAATTTC TCCGGrCCT TATTCTGAA.A GTGAAACAGC AAGAGTTGAT 2340 AGAGCTGATA GTGATGTTTC AAGTTTTGTG AATAGCrCAA AAGCTTGTTT AAAATCTCTT 2400 :TATTGGTTAA GTGCATACGA AAAGTAGGAC GATAAAATCG CTrATCACTC AGTTTACGGC 2460 *.*TATCCTGTTG AATGAGTTTC CAG'rAGCGCT TGATAGCCTT GTATTCGGGA TTTTCGATGA 2520 AACTGATTCA TGATTTGGAC ACGCACACGA CTCATAGCAC 2560 INFORMATION FOR SEQ ID NO: 115: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 11303 base pairs TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: TAT'TGGATT'r CCCTTGCAAT CAGTTATGG GACAAGCACC CGGCAGCGCA GAGGAAATCA ACGCCTTCTG TAGCCTACAT TTTCAAACCA CCTTCCCACG TTrTGCCCAAG ATTAAGGTCA 120 ACGGTAAGGA AGCAGACCCT CTCTATGTCT GGTTACAAGA CCAGAAATCC GGCCCACTAG 180 OAAAACGAGT CGAATGGAAT TTCGCTAAGT TTCTCATCGG TCGACATGGG CAAGTCTTTG 240 .AACGCTTTTC TTCAAAAACA GACCCAAAAC AAATTGAAGA GGCGATACAA ACTCTACTAT 300 AATTCACAAT CTCACTATGA TTAGGTTTCC TTTAACCTGA TGAATAGTGA GATTT1-rTGA 360 TGGGCTTTGA CTTAAATAGA AAAACACCCC ATGATATGAA ACATGAAGTG TTGTAAAGTC 420 TATGTTGTAG GTGCTTATTT CACAATTTCA ATGTGACCAG TGATAACGAA TACCATACAG 480 AATCTTCATA TACACTAAAC AAATGACr ATATCATI'TC CAACAAACGC CCTCTCAATT AGTCATGAGA GTTTTTCGCA TTGATGAATT GAAGAGAAGC TGTCTTTAGT AGTCTAAAAA
CTTTCCATTC
TTCCTTTAAA
ATGCTAACTC
AAATTTAGCT GTATCATTCT ATGAATTCTA TGTrTrrCTAT TACCAI'CCC ATTTCCCAAT 832
CTAATTAT
CCTrATCCTG
GATTTAACAA
CTTCGTCATT
TATTTGGCAA
TGCTTGGAAC
CGA'rrGATAA
GTTGTTTTCC
TCTCAGAACC
TGTCCCTATA
CAATTAGTTT TGGCTAGTAA ATGATGCAAG ATATTCAI-rA TCTATCTTTT AATTCATATG TAAAGATGTC CTITTATTAT TTC.AATTATA G.ACACATTCG GATACTAGAA TCTCC'TTGTA TT-GTTTT~A TATCTTTGAC TGG4GAATACA TACCAATCTA AACCAAGGCA ATGATTGCAC A.ACACTTAAC TTCACACCAC CA'TTTGATC TTCAAGCATT TCAAAAGAAT CAACTTCAGG 'rAAATCAACA CCCA'rACCTA CACTT'TTTGC AAACACAGGC GTAGTCGAGA CTGTGTATTT TTTCTCTGAA AAGAACTCAT ACCA'rTCACC TTCAGGGAAC CATACATCTA CTTTTGCAGA TTGGAATGTC AAATCCATCT TTTCTACAAT GGGAGCCACC TTCCAAAAAA GTATT GGTTT GGAACATTAT AGCTCTCATC A'rrCTCTGGA AGATTGGACT GATTAATGGG GCACCTTCCT CATGTGTCTG TACATTCATG AGGGAATCAT CTGATGTCTC AAACGAAGGT ATTTCTTCAT AATCTTAGAT AAAAAAACCA AGGTTCTTTA CTAT'rAAAAG GACTTCTAGA ACTATG;TAAT GACTAAAAAC ACCAAACTGT AGCCATCTAG TTTGTAGCTC TTCGTCATAA TATGTCCACC GATATCATGA CTCCACCAAC TATAACCGAT ATTAGATGCT AATAGGGTTG AAATCTTAAG GAATTCCAAC TAATAATAGT ATCCCCTGAA GGTAGCGGTG ACTACCAGGA CC'rGCATATC TTGATAAAAT CAAACCACCT TACAACTATC CTGATAGTGA TAATGGTTTA AAAGCCAAAG TGGATCTAGC TCCCTTGTTG CCAGTCAATC CACCAAAAAT CTACTCCCTG CTTTTCTAGT CATCTTTAAA GTAGGCTTCC CTAAAAGAGG GATTAAAAAA ATCAAAAATA CTAGTTCTAC ATTTAACCCC AACCGTTTTG CGATTTGAGG A'rAAGCTTCT GTATCCCATC AGCAGGATGG ACATTTAAGG AGAGTrTTTAG CTTTCTATCA GCAATAAcTG TrCTGGATTT GGTATTAAGT TTCTATTCCA ACTATATCCT 'N'CCAAAGCG AGCTCGAATG TCAGTTATAT GCCAATCCAT ATCTAACACA ATGGAATTTT CTCTGmTCA AATCTGTCTA TTAAATCCAA GTATTCATCC GCCAATATCT ACTCCACCAA TTGCCTAAAG CATATCTTGG CAACAAGGGT TCAAATGGTA AAAATCTCTG ATTGCTCCTC TATAATCATG CCCATAGGCA
ATCAGTTCTG
TAGAAATAAT
GTATATAGAT
GTTGTTTCTG
CGAGTAATCG
TCCCCCAACA
GTCGCTGTAA
AAACCAACAG
TCTGCATTTT
A'rACCTTGTG
TCATAATGAA
GCAGGTTCfT
TCATAAGCCC
TGAAGTTGTT
GTCCAGCCAC
CCCATAGATA
GACGTATAAG
GTTGAACCAG
AAGAAATACA
540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 833 GGTCAATTTG ATTCTCTC TCAATATAAC CAGATTGTTC ATCCCAAATA AATCATCCAA TAAGGCTATA CCATTTCGGC TAATAATTCC ATCTTCTAAC CATCTGCCTT ATCCAXGAGTC CGAGCTG?1'C CTTTTAACGT TTCAATAGAT ACCAGCGACT ACCATATACG GCAAAATC C?1TFTAATTC TATAAATAAA TAAATTCTCC 'rTTATTAAAG TGCAGATGAA AATAGTCCGT CATAATATCT ATGTCTCGAT ATAATCTAAC GAAATTTGGC CAAAATCTCT A'rTATAGATA
AATCCTTGAG
GAGATTGCTC
TCACCAAAA1 TTTrcGGCGT
AGTACGTTG
AGTTGTGTCG
2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 ~T=ATCCTC AAAACTTCCA GTTTGAGAGT ATTCTAACCT TACTAGCTTG TCTGTTAATA CAGAGATTCG ATAAAACTCT CCCTTAAAAA TTTTCAATTT OTTTCCTCC TT=1ATGGTA GCATAAAAAC AGAACGCACC ATTTGATr. CGTTTTTCAT TATCTGAAT GCAATGTTCT A'rCTGTTATA TCTATGACAA ATAATAGTCA ATTGAAAAALA 'rGCAG'rGGAC AAAATATCTT 'rrAACAAACC
TTCTGAAACA
TCTTACTAGC
AAGACTTTAT
AACTACTTT
AATCATACCA
TAAAGAGTTA TCACTTTrA ACTTTTCTAA GCTTATGCAG AAACTATTAA CTAAGATAGG ATTGATAAAT AA'rTTCAAAC TATTCAAGCT CACGTGCTr'r TTT-CCTTCCT GCTTATTTCT CGGTATATAA ATTATCCGGA TCAACATAGT CATAAGATTC TAGAACTGAA GAACCCGGA'r ATAACAGTTG CGCT'rATTA AGTCATCCCC AGAGCAAGAG ACATCACTAA CCGTAGGTCG CCA'rCCT'rCA ATCATATTTG CTCTTAAAAA CGGATCGGTT TTCAAAAGCT ATTCCCATGA ATATCTAAGG ACATA'rGCTA TAAAAATTTT1 ATTTTGTTTG CCTCALAACCC AAATATAAAA GATAACATAC TTTACTTITAT TCGAACTACT TITCTATTTTG AAGAATATCT TAAAAAGACC GGA'rTCTATC ATCGAGTATA CCTCCTTTAG ATACATTATA TCATATCTAA GTTTTCAGCA CGCATTCTTT TTGCTT'TTTT TGTTTTTTTA AATAACAGCA CTTCATACCC TACATAGCGA TCTTCACTGC TAA'rATTAAC CTTTTACT TATTAGATAC CTTC-ATCTCG TAATTTTTcA TACTTAAAGC ATACCAAACA TTGTCATCTT TTCTTTATCT CCATGTTTCT CTGTAGCTTT CGCTTATCCT ATTTTATAAG ACTATTGTAT CGTATTCTAC GTTCCCTGTT TATCAACTAT AAAAATATGA AAAAGCAGAG ACTCATTATT TAAACTATAT CTTGCTCTTC T1"TCACCAAT ATGCTGCAAA TGCACATACT AAGCTCCAAT CCCTACTGGA TTAGAGAATr CGCTACGTAA TAGCCAAGGC TGGATTATAG ATAAGGCTGG AACTACAGAT 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 T~rTGATCAT ATACACGGAT GAATGGAAGA TAGACTAGGA AGAGCAACTA ATACAGCTCG AAGATCTGCT GTCCCTAAGA GTTGCCCATG GAACCTGTGC GATAATTGGC TTAATAAAGT TAAATAGTAG CAGTAACCAT TGGTGCT;AAA ATAAATGGTA ATAATAGGTA ATCCAAAAAT TAATGGTTCA TTAATATTAA 834 GCTCGTrCCTA TTGCTTTAAG CTGTTCAGAT TTAGAGGCAA AAGCAATATA TAAACATACT CCrAAAGTTG
GCGAAGTGCC
GCAAAAACAA
ATAATCATTA
ATGACTTTAA
AATGCAACAA
GCTTCAGCCA
ACAAT'rGCCA CACCAGAACC ACCTGCAA'rr ACAAACATAT TAGAAAATTC CGCCAGCAGC A7M-rCAGCC ATGTTAGCAA GAGCAATTGG TGTTCGCACC GTGGATACCT ACAATCCAAA GTAOI-rGAGT AACCAATCCA CGAATTAGTC AGATTGGATA t=ACCAAA AAATATCTGT TCCCATTGCT ACAAGAAGAC CGI'GATAAA CAAATCCCGG AACCAAAGCG GTAAATCCAC GAGAAACTCC TTTAATAAC CCAATTATGT TTAACACACA 'rACr.ATAAAT TAATGATTGC GGTAAAAATC CCTGTTGTCC CAAAACGTGC
ACCTGCAACA
ACTAACAAAT
CAATAGATAA
TGGAATTGCA
GATAACAACA
TCTGGAACA
AAGAACAGTC
GACTACATT'r 0 CCCATTGCCC ATCCATCTGC ATTCCACCAT CAAAAATGAT GCACCATAA GAGGATTCAT TATGCAAGTG ATAGAACGAA ATGTAAAGTG ATGTGAAT'r-I AATGAGAAAG CTTGTGGCAA AATT'ACTGCA CCTTCTTTTTA TTGCCGTACT GTCATGACAA ATTGAG?1TCT TCTTCCTCTG ATAAAGAGAT ACAGAACCCA ATCAAATGAA GCAGAGAAAA GACTTGTCAC AGTCTTCATC A6AGCCATCAA GGCA.AGCAAG CATAAATNTT TGTCAATTCA TAG'rCGCA'rA GrTTTGCAACC TATCrGCCAC AATTGGCCAA
GGTACAGCAG
CCCATTGGTC
AATAGACCTC
CCATACCTGC
CCATAACATG
CTTAATAAAA
AATACTGAAT ACCAAAAACA TTGATCCTAC AATAGTAAAT AGCCGTGATA GCACGTACTA CTTT-AAACTG AGCAAGTTTG GTTTTCAAGA AAACCAAACA AcccrTT'rTG 'fTGATCOATA CATAATAATT TTTACTT-rCT AA.AGACTAGr TT~CAAATACA 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 AATTATACTA GATCAGGAT'r ATAAACTAAG TGAGTTCTTT TCCAATTGGA CAAATTGTTG A'rAACCTTA TCTGTTCGTT TATAAATTTT TTTAATTCTT CTA6ATGTCTA ACAAACTCAG AACTAAACCT AATAGAAGAA CTACAAAAAC AAATAAACGT GCTACTTGGT TArr'rTCAAA AATCGGAAAA AGATTCTTAA ACCAACT-rGT CCAAG'N'AAA ACAAGTAATC CTATTGAAAT AAGCATTTGT ATTCTAACAA ACATTAGTGT TATTCCCAAC TTTTCTTTCC TATTTCCATA AAGTTTAAAT TGTTCAACAG T1'GCTAAPLAT AGAAAATACT ATGACCATAA TG;GGGAAAAT AATAATAGGC GAGGGACTAA TAAACTGACT CAAAAGCCAA TAAATATTCC CAAAAAAGAA GAGTGCTATr GAATAACGTA GAAGAAGATA TCGATTGAAA AAAGTAT'rAG TTAGAGCCAT CTCTCGACGT TCTTGTTCAA TCTTI'TGTCG TTCTTTTtTA TCCA'rATCAT TTCCTCCI'TA TATAACAACA CATATTTAGT TAACTTTCTT ATAAAGAGCr AACATTTCCT TTGCTACTTC TAATAATGTC ATAGTGGTCA T'TAAATGATC TTGAGCATGT ACCATGA'rAA 'rTTCAATTTT AATTTCCACT CCACTTGCGT ATTCTTGCAA GAGTTTGGTT TGTGCATGAT GCGCTTCA6AG IF ,Ud AUj.JA 835 AATTATCTCA TTT'GATTGAT TTAATT1ArcI TTCTGCATCA TCAAAACTAC C~rCTCTCAT 5880 TTTTGCAAAT GCTTC;LTGTA TTTCTGACCT TGCAWCCC GAATGCAGGA TAATN'CAAA 5940 TGCTGcAAcc TGcAT~ccT cTTGATTCAT ATAAACCTCC TATTTTATCT TCTCAAATAT 6000 GTTAATAAAA TCTTCAAAGT TATTGCAAGA TATTAGCTGA TTTTGCAArr CATCAT'rCTC 6060 TGTCAGAGAG ACTATCTI'TT TAGTCACAGT TGCCAAACCT TCGT1'CCCAT ATATGATGG 6120 AGATAGAAGA AATACTAGCT GGACATCTGA ACTTTGATTA TCCCAGAGTA ACGAATCrTr 6180 ACAAA'rTGCA ACCGAAACCT T'rCCCTCTGT ACCAAAGGGC TGAATAGGAT GCGGAACTGC 6240 AATT'rTTCA GAAAAAACAA CTGAACT'rAA T'rCTTCGCGC TGTTTAATrC CATAAAGTAA 6300 AGATTGTTCA AACTCATTTG ATTCACCAAC AGATAAACTC 'rCA.ACCATCT TTTCAAGTAA 6360 ATTTACCTTG TCTGATTCAG TACATAT1'AA AAAGTTTTCT TTACTAAAAT ACTGTCTAAA 6420 GCCGTTGT'rT TCAAATTGT TAATCT-rTGA TGAT'rGTACA TAACTAGAAA C'TTCCATCTA 6480 *ATCCATAGCT TTTCTA.ATCA TTTCCATCTC ATCACTCTTA 
AGAAACACAC TAACTrTrAAA 6540 *AACTGGGAT'r TGAAAATATA GA1-rTGATAA ATCAATAGCT GACACTATAA AATCTATTCC 6600 *TTTAAG'T'TT TCTTGATTCA AT'rCATAGTA GCCTATTACA TCAACAACTT CTACTCGCTT 6660 .CCCAAACTCC GTTTCCAAAC GATTTCTTAA CATrrGGGCT GCACCAAATC CTGTTGCACA 6720 *AATAGCAAGA ATATI'AAACT TAG'rACTCTC TTTGCTACGT 'rCCATAGCAG CTAAAAAGTG 6780 AAGACTTACA TATGCTACTT CATCATCTGA TATTGTCCAC TCCAAGAACT TGTCCATATT 6840 TGCAAGAATT TCTCTAGTCA TAAAGAATAT ATCACTATALA TTCTGTTTAA TT'rCATCTAC 6900 *CAAAGGGI-rA TTTAAGGTAA TCCGGCTTTC TAAACG'TACT TGTAGTGTCA TTAGATGAGT 6960 TATCAATCCT 'rCAA'rTAGTT GGAAATCTGA AGAAAAGTTA TACATATCAT C'TAATCCTAA 7020 ***ATTCTGAAAT GTTTTAAATA AAGATTTTTT TAAAACTTCT TCAGAAATAT TCTTCTGAT-r 7080 TTTTTGACA'r TGTTGACTCT TAGCTAACAA ATGCAAAGTA ATGTAGTCTA TTTCCTGAAC 7140 TGGAAATTCC TGATTTGTTA CTTCTCTTAC TTTAGAAAGA ATrTTTTGGG CAACCTTITCT 7200 CrCTATTGCA TCATCAGTCA TCTGACAGTC TA'rATTTTTT AT'rTCAAATC CGGA'r'r'TAA 7260 ***ACGAATCACA GACAATGCTA TGTGAACTAC TAAAT'rCTGT AGTAC.AAA.AT CAGATAGTTT 7320 TAGGTTGGCC TCTTGGCATT CATCCAAAAC AATTCTAGCA AATTCTTCTA ATCGAACAGT 7380 TTGATCAAAA AAGTTAAATT TTACATAGCA ATGTATTGTT TTAAAAAATT GATTCTCTAG 7440 CAAATAATTT ATGATAAAAC GTCGTTTATC ACGTTCCTCG CCTGAGACAT AAACTCCTr'r 7500 ATT-CGCCCTA CTCTCAATGG ACAAATTATA CTCTGATAAC ATCACTCGTA TCTTTCTGAA 7560 836 ATCATGAGAT AATGTTGAAC GACTAACGTA AAGTTCATCA GCTAAATCAT CAAAAAGAAC TGCGaACTTG.C TCAAATAATA ATTTATTTAA GATAAATACT AAAcG.ATCAT cAccI'TTGA AACCGCAGTr TTCrGTATAG? 
CTTCTCCAG 'rrCATAAGTT TGTCTAAACT CCTGGTAAGc GCCTTGATTC TCAAAAAATA ?rTGATACCC TTGACCTTGT TTGAAATCA ACCGGACTCC TTGAATAATC ATTGTCTTCT CAATTAATTT CAGTACATTA CGGACAGTTC TATCTG.AAcA GGATAAATAT TCTGCCAGTT C=GTCTTGT AACAAAACGT TCCTTAT'N-r TTATTAAAAA TTGAAGGATA TCTTTCTCTT TAATGTTTAA CACATTCATT CCCTCCTAAA ACGTATGTT TCATATATTG AAGCATATTA TACACTAAA TCAGmTATA TCAAACTCAA AACAATTTAT CTTAACCTAA ATATTTATTG ACATTTCA'rG TGTTCATCAA ATATTCTCAA GAATCAAATT AGCCATTTTT 'rCAATTCCCA TTGGAATAGG AATATAGGCT TGAGGAGGTA TTTTACAAC TGGTTTT'CCT GCTTTAGAAC CAGCCTCTTC AAATTGCTTA AAGTACATr'r TTGTTTGACG ACTGACAAGA TACAAATCAA AAGCTGCTGC TGCGATAGCT TTCCCTCCTT CAGTAGCACT AATAGCATCA ACTACAATAT CTT~TCCCTTT TCCTTTT~AGA AACTCTGTTG TTTTCTGTGC CATAAGTGAT GAAGACATTC CTGCTGCACA AATAATT'AAA GCT'rTTGCCA TAATATTTTC 'rCCT=CTT AAATCCAATC AAAGCTGTGC TAAGTTGGCT TATT'rGTTAT CTATrTTrAT TATAAAATAA AGCGTTTCCA ATGACAATTC CCTCATTTTC CTAAATGATA TGGAAAAAAA TTATTTATAC TTCAAT'rrAT AAAATAAAAT TATTCCTGAG AGTAGAAATG AAACACTATT TGCTAAAATC AAAGGCAAGT CTCCTATACG AATACCATGA GCAAGCCACA ATGCAATACC
7620 7680 '7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 AATAACT'rGC ATAACATACA TACCTAGAGC AATAGATCCT
ACGAAAAACT
ATGTCACCTC
TCATTCACCG
ATTrATAACTT TGTGGTAAAA ATGCAAATGT TGTTAAAATT AATATGCTAA ACAAACTGAG AATAATCTCA TTAGATGAAA TAACTTCCTT ATACCAGCCA CCCTTCCCAT TATCATCTTT ATCTACATAA
GTGTCCTr'TG TCTTAACTAC GCTGCAATAC TTCCAATCAT G7"=GTTAT ACTATTCTAC AAAGATTTTT TCGGGCAACG ATAAAGCCAT AACGTTTCCG CATGG;A6TAT AACCCATTAA ATATGGGCAC CTAGA'rATTC TGATCrATAG CTCCAAPLACC CA'IrTCACCG GTACCAGCTG AAACCAAATC ATCAACACCA TCTTCAACTA CAGCCTrTTTT AATTCTATAA TCATCATGTA CCATACCATC
AATACATCCC
CATTTCACGA
TGCTGCAACT
ATTTCAACA ATAAAGAGTG GTAAGTGATA GTGGTCTGTA AACCAATTTA ACCCATAACC CAAACCTTC'r GGATCAATTr CCCACTCCCA TTCAGAAGCC TTAACATAAT TA'TTrCAC TAAATCTT-CT GTTTCAAGA'r AATCAAAATA AGGATTATTT TCACGATGAG AGTCGATAGC AAAGGACATA TAGTAACTGA AACCAATGTA ATCTACAGTC CCACCAAGTA AATCTTCT 837 ATCCTGGGCA GTAAAATCAA CTGAAk1,ACC TT?1'CGTTCC CAATACrrGA AAATATCCTC AAGGATATTTrA CCTAAAACAT GCACATCAGC AAAATAATAA cCCI-rCTGCA TAGCTTTCAT TGCCATrAAG ArA'rCCTTAG G.ATTGCAAGT AACTGGATAA ATTGGACACA TCGCAATCAT ACAACCTATT TGAAAATCTC GATTAATCTC ATGACCAATT TrrACAGCTC G7WCACAAGC AACTAATTCG TAA=CrG CTTGATACAT AATTGCTTCT CrTATCAC CTTCCTCATA TACAATACCT GAGTTAGTAA ATGG'rGCAAA ATCTTCCTGA TAATTCGCTT GATTA'rrGAT TTCATTGAAA GTCATCCAAT AT 'rAAcc -I ATCTITGTAA CG'rTTAAATA CGACTTCTGC AAAACGAGCA AAGAAA'rCAA TCAATTTCCT ATTTTTCCAA CCACCATANT CGGTCACTAA CTGATAAGGC ATTTCAAAAT GAGATAGAGT GATGACAGGT TCAATACCAT TCTTAAGCA ?rCATCAAAA AGAT'TATCAT AAAACTGTAA TCCTTCTTCA rrTCGGCTCTA ACTCATCACC 9420 9480 9540 9500 9660 9720 9780 9840 9900 9960 0 "boo: *800 *00.
0.0 T=GGAAAC ATACGTGTCC AAAAAGTGCT ATATCTTCTT ATA ACCC TCTAAAACTC ATGCAATAGA GGTACGGAAG CA GAATC CCATT'rCAGC TATAACGTG ATAAAAATCT ATCGCCTCAT GA'T'rGGATA CCAAACTAAT TTCACCAGCT ACTCCATGAC CATAACATCA GCAACACTAA TTCCCTTGCC ACCTTCTTCC CATCCACCT AGCAGCA.ACA GCACCACCCC ATAAAAATCC ATCTTTAAAA GTAGTCATCT TCACTTTGAT ACTCTTATTA TATTGTTAAG GAAAGA.AGTA TAAACCTTAA ACCAAAAGAT GAAAACCCAT ATTTTTAATG GAAATAGAAC AATATCTCT
GACCAGCAGT
CAAGTTGATC
'T-rTTCCTCC
TCTTTCCT
TGTATTCTCG
AATAAT'TCTA
CTTCTATATA
TCTGCATTGT
TAAGGAGAAT
TAATGATATC TTTACGATTT TCAATACT ATTCCCTGTG TCTATAAACC ACTTATCGCT ACGTTCAACT TGCATCTGCA AGTGATATTT CTTTGATTGA TAATGT'rTAT CTAAAGTTTC AGTTCCCTCT TTTTCAATTG GTAAAAAATA AATTTCTTrrA ACPLAGGCCAC TATCAAGCAT ATCACCTI'TA TAATATACAT GAATAGTCAA CA.AACTACAA AAACTCTCAC TTCTGGCATC CCAGAATCAT TTTTCTTAAA TCTAAGATTT TTGATTTATC CACTGATCAA 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 CAATTTTAAA AAAACAAGTT TAACGTTTCG CCATTACTIT GCGATT'rATA GTCAAT'rGAA CTGGTAATA CCTTTTTGAG TACTCAGGTG AGTAGGGAGG
TAGATGAGGA
CAATCACI'TC
ACAAGACCAG
GTCTTTTTG
AAGAGGTA.AA
TTCGTATTTC
TTCTCTTGCA
TGTCATCTTA
TCTAAACTT-G
TTTATACCAA
CACAAAAGAG
ATATGACCCC
AGTTTATACC
TATCCTCCAA
rTTTTATGA
TAAAATGATT
CCTCATAAAA
ATGTTTTCTC
CAAACTCTTC
AAGTTACCTT TTTGATTTCT AACTTT'ATTG CACTATCTCC
AATCATCCTT
ACTAATTATC
TTTTCTTATA
GGTATTGCAA
AATAGGATTG
ACACAAGAGT
838 TCTAGCTTCC CCATTCTATG GAATCTrGCA TTATCCATAA TAATAACCGA TGGTGTGGN 11160 AATGrGGTA AGAGAAAcTT cTGAAAccAA cTTcAAAAA AGTCGcTCGT cATcGTCTCT 11220 TCGTAAGTCA TTGGAGCGAT TAACTCACCA TTTGTAGAC CTGCAACCAA AGAAATCCTC 11280 TGATATCI'TC TTCCAGATAC TTT 11303 INFORMATION FOR SEQ ID NO: 116: SEQUENCE CHARACTERISTICS: LENGTH: 3112 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: CCTTAGATTT CCACTTGCCA GAGGAATTGA T'rGCCCAAAC GCCCCTTGrAA AAACGTGATG CCTCCAAACT CCTCATCGTC AACCGTGAGA CAGGAGAAAT GCAAGATAAA CATTTCCACT 120 *CTATTATTGA TATGCTGGAA CCTGGTGATG CCCTTGTCAT GAACGACACC CGAGTTCTCC 180 *CTGCCCGCCT CTATGGTCAA AAAGTGGAGA CAGGAGGTCA TGTGGAACT'r CTCCTCCTTA 240 *AGAACACTAG 'rGGAGACGAG TGGGAAGTTC TGGCTAAACC TGCCAAACGC CTCAAGGTCG 300 **.GTACTCGTAT CAGCTTTGGT GATCGCCGCC TCAGCGCTGT CG'rrACAGAA GAATTGACCC 360 *.*ACGGGGGACG CATTGTCCGC TTTGAATACC AAGGAATTTT CCTAGAAG TC T GGAAAGTC 420 TGGGAGAAAT GCCTCTGCCA CCTTATATCC ACGAAAAATT AGATGACCGT GAACGTTATC 480 AAACCGTCTA CCCCAAGGAA AGTGGCTCTG CTGCAGCACC GACTGCTGGT CTI'CACTTCA 540 SCCAAAGAACT GCTGGCAGAA ATCCAAGCTA AGGGTGTTCA TCTAGTCTAT CTGACTCTCC 600 ATGTCGGACT CGGAACCTTT AGACCrOIGT= CTGTGGATAA TCTGGACGAA CACGAAATCC 660 ACTCAGAGTT CTATCAACTT TCTGAGGAAG CTGCTGCCAC CCTTCGCTCT GTCAAAAAAA 720 ATGGTGGTCG TGTCATCGCT GTCGGAACCA CTTCTATCCG CACCTTGGAA ACTATTGGTT 780 CCAAGTTTGA TGGGCAAATC CAAGCAGATT CTGGTTGGAC CAATATCTTT ATCAAACCTG 6~40 GGTATGAGTG GAAGGTCGTG GATGCCTTCT CAACCAACTT CCACCTGCCA AAATCAACTC 900 -TGGTCATGTT GGTTTCTGCC T'IrGCAGGCC GTGAATTAGT CTTAGA'rGCC TACCACCATT 960 *CCATCCAAGA ACACTACCGC TrCTTCAGTT TTGGTGACGC CATGTTTATT TATTGAGAAA 1020 GAATTTCTCT AAATCTTCTA ATACCAATAA ATCGCTAAGA TATTATT1TCA AAGAACATCT 1080 ACAATTGAAA C'rCTAGCTAG C'rGTAGAAGA GGCCTAGTAC ATTGAAATTA AAATGCTTCC 1140 CCCTAGCTTC GAAAATA'rrc CCATAGATTG CGTTGACTCT CCAAATTGAT TCATCTrATAT 1200 839 Tr'rATTTCAG CTTCCTATAC TACCACCATA rr'DGTTACTC ATCATTCACT ACTTTGACCC AAAAATCTCT TCAAACCGCC AGTTCTATCT ACAACCTCAA CTCTTTGATT TTCA6TTGAGT 
AAGACTAGGC 'rTGTrTCACTT ATATATAATT ATATATTCAG ACGCACCCAT TCACCATTAT CTCACCCGAT GTATTTrACAT CATAGAACCG TTGCTGTTGA CATGGAACCA TCACTATTAT 'rGGAGCAACT GCTTTAGCTC TACAGGATAC GAAGGAATAG CCCTCTCCTT ATACGTCTTT AATTGTTCCA TCACGTTCAT CCTACrrCTr CTACTTCTTG ATTGGAATTT CTTGGACAAG GAGTCTTTTT TTTGACTTAA TAAGTTATCG TTGTAACTGT GCAGAAAGGT CATTGTCTGC TTGCTTTTAG CTGGAACTTT TGTTCGGTAA CATAACCAGT CCATCCTCTC CATCAArl'GT .TAACGAATTC GCGTACTTGA ACAATAATAT CCCCATTGTC TCATCTCTGA CATATTGAAT CTAAAAATAA AGTTAGCCCG TrTC 'rCGCT GTTTGTAAAT C.AAAATGCAA TTATCTGTCC TCTCAAGAXGA CTATTATGAG TG.ACTCTCCT TAGTCTCAAA Aa'CAAAGACT TCAACGTCAC CTTGGAI'AT ATATGTGatC AGCAGTACTT TGAGCAACCT GCGACTAGTT AT'rAAACAAA AAGTGAACAA ATCTGAATDC TTTTATAG;TC GCTATAAGAT GACCTTATCT ATAGCwrPr ACATACTATT ATCAATTTTG. TCGCAGGGAG GAATCTGTTA CATTGACTCT ATAGCCATCT ATACTTGTAT TGACCGCTAA AATACCATTT ACCACCAACT TGGAACCATT GATTGACTT GGTAGTACCA TGAACTATTA ACTTGTACCC AACCTGTTGC AAAAATACCA CATACCATI' TCTTGTTTCC AGTCTGTTGT GTTCTACTGC TACATCTGTT CCTTGGTTAG ATGTAACAGA ATGATTGCTC AGGAACAACA ACTTTTTCAG GTTCTCTCGT TTACCATCTC TTTAGTAAT'I TGACGAGAAG TAGTTTCTTC CTACAGTATA GATTGTAGTA AGAGTAATTT ACCAAT"T'CT ACTTTTATCA AGAGTTGGGC CATCGAGATA TTCTGTTTCG AACTTGGGGC TTGGT'rCTTT TTTTAACAAC TCTTGTTTGA AGTACTCTCA GTTAC~TGTC CACTC=rCC ATCTACATTA TTTCCCATTC TT'TCCTAGAG TAATCTCTTG CTCCTGTCCT TTCATATTrTA GTAGCAAATG GAACAAGAAC TTCTTCAACC GATAACTGTA TCCGTGGCTT CTTTTCTATC AACAGTAACC CTCTGGATTA ACATCGTAGG TCCTTGTCGT AGTTACATAG AACAGGATI'T TCACTACGGT CTTTTGTTC ATCTT=rCA AAT TCTTG GTTACTACCT TAGGTTTAGT CGCTACTTTT AGCGTCATCA TACTCTATTC CC'rCTTCTTT ATCTCTAGTA CCCATCAGCA GCATGAACAA AACT'rGTATT CAGATTCCTC ATTACCOCAG AACCAAAAAT CTTTCCCAGT TTACGTATTG GATTTTGCCA TTACATCCTA CTTCTAGTAT AGCATCTT'TT
GACACATGAG
TTATTTCAGA
TATACTCTTC
T~aCTTCOTC
TTCTAGTTTG
TAATGTACAG
CATAGCGCTT ATTAGTATTA CTATCAAACG ?TAAACAATA TACGT'PATAT ATAAAATAGA CTTAGAATGA TATATTGATT ATTGAACTAA CACTTTA6ACT ATATCGTAAT CAATCTCATh TATAAAGGAT TGCAGACATC TTATCTAAAT ACATGCGA.AT ATAI'TAGAT ACAAACATTC CAACTTGATA AT
INFORMATION FOR SEQ ID NO: 117:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 4327 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:
CCCAAAAATC TCTrCAA.ACC ACGTCAGCTT CGCCTTGCCG TAGTATGGTT ACTGACTTCG TCAGTTCTAT CCACAACCTC AAAACAGTGT T'N'GAGCATC ATGCgGCTAG CTrCTTAGTT TGCTCTrTTA TTTTCATTGA GTATAAAAAC AGATGAGTTT CTGTTTrCTT TTTATGGACT ATAAATGTTC AGCTGAAACT ACTTTCAAGG ACATTATrAT ATA.AAAGAAT TPTTGAAAC TAAAATCTrAC TATATTACAC TATATTGAAA GCGTTTTAAA AATGAGGTAT AATAAATTTA 3000 3060 3112 CTAACGCTTA TAAAAAGTGA AAAAATAGTA GTAGCTATGC GGGAA'rAATA TGATATTrAA CTACTTTTGA CAGTTTTT ATTGTCCCCT TTAAGCTGGA TCTACAAGTA TTTATGCTTA GGAAGTATGG TGTAAATAGC AGTTTAATTG CAGCGGTGTC TAGAATCTAT T'rTTATGTAT ATTTAAAGAT AGATTGCTGT GAAATAACAG ATAGAGAGAA GGGATTGAAG CTTAGAAAAG GGCATTCAAG ACAAAAAAGC AGAGAAAAAG ACAAGTTGAA CGACAGTTTT CTGATTGATT TATTTCTTCA CTTATTTGGG TAAGATTCTG ATTGTGAGCT TGA'rTATATT- TCCCATTATT TGAAAAGCTA TTTGAAAAAG TGT'rCGATA.A GGATTGAGCA ATAGGCTGAT GTCCATCATT TGCTTATAAA GAGATATTTT CTGGTAGATA AACTAGATTG GCAGGAGTCT GATITGGAGAA CCAATTTGAG ATAGTTTGTT TAGTTCATTT TTGTCATTTA AGTTAATAAA AGACAAACTA AGTGCATTTT CTGGAGTAA.A GATATAGATA TAGAGAGGAT CAGTATGAAT CGGAGTGTTC 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260
AGGAGAGGGG
AATGAACTGT
TGTCTTATTT
AAGAACGTAA
AAAATrGGCA
AGTAAAAGAA
CAGAAATCGG
GTGTCGTTrAT AGCATTAGGA AACTATCGGT AGGAGCGGTT TCTATGATTG TAGGAGCAGT GGTATTrGGA ACGtCTCCTG T'TTTAGCTCA CTCTGGCAA.A TGAAACTCAA CTTTCGGGGG AGAGCTCAAC GCCAGCCTTC TTCAGAGACT GAACTTTCTG GCAATAAGCA AGCAAGAAGA AAAAATTCCA AGAGATTACT ATGCACGAGA AGAAGGGGCA AGTGAGCAAC CCTAACTGAT ACAGAAAAGA AGAACAAGAA AGGAAAGATA TTTGGAAAAT GTCGAAACAG TGATAGAAAA AGAAGATGTT GTGAACTAOA TAAACTAAAG ATGCCAAGGC CCCAGCATTC AGTACTTCAC TATGGCAG'TT GGAAACAG?1' TTACAATAAT 841 GAAACCAATG CT'rCAAATGG AAACI 'GAAA ACGCAACAGT TATAATCTCT 'rTrCTTC TACAATAATA CTGCTACTCT TACAACGATG CACCCTTAAA TCAGAGAGTT GATTTATCAA TCACATCCAG TTTAAGCCAG AAGTGCTACT AAAAAAGATG; AGAGGGGCGT GGTTCGGATG AGTTAAACCA GGTCAGTGGA ATTCTGTGAC TTTCACAGrT GAAAAACCG-A CAGCAGAACT ACCTAAAGrC TCTACG'rAAA CGGGGTATTA TCTCGAACAA GTCTGAGATC TGGCAATC TGCCAGATGT AACCCATGTG CAAATCGGAG CAACCAAGCCG TGCCAACAAT GGTCAAATC'r ACAGATTCGG AATCTCACTG 'rGTATAATCG TGCTTTAACA TACAAAAACG TAGTCAAC?1' TTTAAACGCT CAGATTTAGA AAAAAAACTA
CGACTGCGCC
ATTAAAGATA
ACGGTTTGCG
CCAGAAGACG
CCTGAAGGAG
CCAAATAAAG
ACTTTGATCG
ATGGTCATCA
CGGCTTTAAC AGAGAAAACC GACATA'rTCG ATGGAATCAA GAGTTATCGT ATTCCAGCAC CAGGTGCAGA TGAACGCCGT CTCCATTCGA GACGTAGTGA AGATAATGGT AAAACTTCGG ACAATCCAAA AGCI-rCTGAC CCATCGATCG TTCAAGA'rCC TGAAACCAAA CGAATCTTT'r GAATCTTTGG AATGTCTTCA CAAAAAGAAG ATCAAATCCT CTACCGTGAA GGAGAAAAGG TCTATACACC AGATGG'rAAG GCGACAGACT CCTATAGCGA CAAGGGTGAT CTATACAAGG CAACAAACAA AACTTCTCCA T'rTAGAATTG
AAAGCGGGCG
TT-CCAAGAC
GTGACTGGGG
GTGACCGAGT
GTTCACCAGT
TAACGGTAAC
AGATAAAGGA
TGATATCGGT
AACCATTACC AACTTACGTG GAATATCGAT ATGGTGTTGG 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 .2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 CTATCTATGA CATGTTrCCCA .GAAGGGAAGG AACCTACAA AAAAATCGAT GGAAAALACCT GAGCTTATAC CATTCGAGAA AATGGTACTC ATCGCGTTGT TGTAGATCCT GTTAAACCAG G'rGACCAATT
CCAAGGATAG
AC=AGGAAAT ATCTACTTCA CTATCTATGG ATGTCCTACA TACTCCGATG GTCAAACCCG TGTACTTCGG AATGGGCCTC GTGATGACGA CGGGAAGACA ATTCATGAA ATTCTTGGGT ACAAGGGACG GATTTTGA TGGTCAGCTC CTCAAGATAT G'rAGGTCCTG GAACAGGAAT CCGGTTTATA CGACTAATAA TGTATCTCAC TTAGATGGCT CGCAATCTrC TCGTG'TCATC TATTCAGATG ATCATGGAAA AACTTCGCAT GC'rGGAGAAG CGGTCAACGA TAACCGTCAG GTAGACGGTC AAAACATCCA CTCTTCTACG ATGAACAATA CACGTGCGCA AAATACACAA TCAACGGTGG 'rACAACTAAA CAATGGAGAT GTTAAACTCT TTIATGCGTGG TTTGACTGGA GATCTTCAGG TTGCTACAAG TAAAGACGGA GGAGTGACTT GGGAGAAGGA 'rATCAA.ACGT TATCCACAGG T'rAAAGATGT CTATGTTCAA ATGTCTGCTA 842 GCACGAAGGA AAAGAATACA TCATCCTCAG TCCATACGAT
GTGAAAATGG
AACACAATCC
CGGAGTATGG
TTAGAAAATT
AGAGAGATGG
GTAT'rGGTCA
GATGGTCCAC
AATCAAAAA
CATCTTGTAT
TAATTGGGAA
GCAAAGGAGA
ACAAGGCTCC
?I'GGCACGTG. TCGAAGAAAA GGAGAGTTTG CCTATAATTC GAACATACTG AAAAAGGACA TTTTTGAGCA AAAATCTGAT GATGGGCAAA GGAGTTATTG AACCCTTCAA TTGGCAAATG TAATGCAGGT GGACCGAAAC TGGTGAGTTG ACTTGGCTCA GCTCCA.AGAA ?TAGGAAATG AAATGCCTAT ACCCTATCAT
TTCTCCTACC
GCTTGGAGTT
GTAAAACAGC
ACCCAGTATG ATAGCAAGAC ATTATGGTA TAGCTAAAGG GGTGCCAGAG TTCCTGGCGG TTTACAGGGG GAGTrAATGG TCTGATTCCC TTGTAACTCT CTTGTGT GC.AGTAGATA AGGAAGATAT AAGCATCGAA AGTATGCATA ATCFTCCTGT AGTAA.ATGGT AGCAAAGCAG CGGTGCATGA TACAGAGCCA GCTGTTCATG AAATCGCAGA TACTACAAAA AAAGATTATA. CTTACAAAGC
GAAGCGAACT
CGACTCAGAA
GACTTTCCTA
CGGACAGGAA
AAATCTAGCA
AGTTCCAGAA
GTATAAGGGA
TCCTCTTGCT
CAGCAGGCAC TTCCTGAAAC AGGAAACAAG GAGAGTGACC TCCTAGCTTC ACTAGGACTA ACAGCTTTCT TCCTI'GGTCT GTrTACGCrA GGGAAAAAGA GAGAACAATA AGAGAAGAAT TCTAA.ACATT TGATTrTGTA AAAATGGCTC TTTGTCAACT GTAGTGGGTr GAAGTCAGCT AAGCTCGAGA AAGGACAAAT TrTGTCCTTT CTrTTTTGAT ATTCAGAGCG ATAAAAATCC GTTTTrTGAA Gr=CAAAG TTCCGAAAAC CAAAGGCATT GCGCTTGATA AGTTTGATGA GATTATTGGT CGCTTCCAAT TTGGCGTTAG AATAGTGTAG TTGAAGGGCG TTGACGATTT TCTCTTTGTC CTTTAGAA.AG GTTTTAAAGA CAGTCTGAAA AAGAGGATGA ACCTGCTTTA.
GATTGTCCTC AATGAGTCCG AAAAATTTCT CCGGTTCCTFr ATTCTGAAAG TGAAACAGCA AGAGTTGATA GAGCTGATAG TGATGTTrTCA AGTCTTGTGA ATAGCTCAAA AGCTTGTTTA AAATCTCTTT ATTGGTTAAA TGCATACGA-A AAGTAGGGCG ATAAAAATGT TTATCGCTGA
GTTTACG
INFORMATION FOR SEQ ID NO: 118: (i) SEQUENCE CHARACTERISTICS: LENGTH: 3521 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: CTCTGGCCCT GCCACTCCAA CGTTTTGTCA GGGTGCTTTT TTCATAAAGG AGTTCTTATG 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4327 TTAGATATCA AACGTATTCG TACAGATTTT GAAGCTCTCC CAGAAAAATT AGCTACACGT 120 GG1'GTAGATG CTGCTGTCTT GAATGAAATG AA6AGAAATCG ATGCTAAACG 'rCGTAACATC 180 TTGGTCAACG TTGAAACTCT CAAAGCAGAA CGTAACACAG ?1~1CTCCTGA GATTGCCCAA 240 GCTAAGCGCA ACAAGGAAAA TACAGATGAC AAGATrGCTC CCATGCAAAA TCTATCTGCT 300 GAGGTTAAAG CCTTGGATG4C 'rGAATTGGCA GAAATCGATG CTAAATTGAC AGAATTTACA 360 ACGACTCTTC CAAATATCCC AGCTCACAGC GTTCCTrGTTG GGGCTGACrA AGACGACAAT 420 GTGGAAGTTC GCCGI'GGGG TACTCCACGC GAG?1-rGACT TCGAACCTAA AGCTCACTCG 480 GATCTCCGTG AAGACCTTGG TATCCTTGAC TGGGAACGCG GTGGTAAGGT AACAGGCGCT 540 CGCT'rCCTCT TC'TATAAAGG CCTCGGTGCT CGTTTGGAAC GTGCTATCTA CAACTTTATG 600 TTGGATGAAC ATCCAAAAGA AGGCTATACr GAAGTCATCA CACCTTACAT AGTCAACCAT 660 GATTCTATGT T'rCGTACTGG TCAGTATCCA AAATrrAAGG AAGATACTTT TGAACTCAGC 720 GATACCAACT TTGTCTTGAT TCCAACTGCT GAAGTrCCTC TGACAAACTA CTACCGTGAT 780 GAAATCTTAG ACGGCAAAGA TCTTCCAATC TACTTCACTG CCATGAGTCC GTCATTCCGT 840 TCTGAGCCTG GTTCTGCCGG TCGTGATACC' CGTGCCTTGA TCCGTT'rGCA CCAATTCCAC 900 AAGGTTGAAA TGGTCAAATT TGCCAAACCA GAAGAATCTT ACGAAGAATT GGAAAAAATG 960 ACAGCCAACG CTCAAAACAT TCTTCAA.AAA CTCAACCTTC CATACCGTGT CGTTGCTCTC 1020 ATATGGGCTT CTCAGCTGCG AAGACTTACG ACTTCGGAAGT GTGGATTCCA 1080 GCACAAAACA ATTACCGTGA AATCTCAAGC 1'GTTCAAACA CAGAAGATTT CCAAGCCCGT 1140 CGTGCCCAAA TCCGTTACCC TCATGAAGCA GATGGCAAGG TGAAACTCCT TCATACCTTC 1200 AACGGTrTCTG GACTTGCAGT TGGACGTACA GTGGCTGCAA TTCTTGAAAA TTACCAAAAT 1260 GAAGATGGTT CTrGTGACCAT CCCACAAGCA CTTCCTCCAT ACATGGGTGG AGCTGAAGTrC 1320 ATCAAACCAT AAAAAATAAG GTTTAGCTAT TTCTAGCTAG ACCTTTrTC GTAACCAAAT 1380 cAGATAAGCA ccTAG.TAcAA AcAATAAAAT AGI'AGGC-AT ATAATGGTTr CAGCCAATAC 1440 CAGGTAATCC
AGAAATGGAA CGrrCAAAAT TCCCTGAGCC ATCT.TGAGCG AGGTCGCTGT 1500 GATAATGGTT GGGAAGGTGA GGGCTGAGAA GGCTGGTTGA AAACrTGTT- TTAAAATTT 1560 C GGCAr.ACGA GTTAAAACAA AGAAAAAGAA GGATTGAGAA GCCAAAATCA TGACAATCAA 1620 GACCCA.AC'rC GGCACGCTGG T1TCCTCCTAC TCGAACTAGA GAAGCCAAGA GTAGAGAGAA 1680 AGGAGCACAG TAGATTCCT'r CTTGTCCAAG C;ACGCTAGT GGGAGTGGAT GTTTCTITAA 1740 ATCGCTATAA ATAAGGGGAT AGAGA'rAGAA GTCAAGAGA AAACCAAAAC TCAACGTCGC 1800 844 ATAGGCAAMT TCGATAATAC CTACCAGAGG ATAGGTCAAG GCACCCACI'G CTATCCCCAC ATAGAGAACC GTCCAGCTTG GAGTGGCATG AACCCTCCGC CCTGGACAAG CAAAC7rGAT GGTAAAACCA GCAATcAAGC TACCAAAGGA AG2ATAAGAGA AGGAAAGGTr GCCATTCCTC CCAATrTAAAG AGATOCAGAA AAGATGGGAT AGAACCGGCA CAAACAACCT GAAAATACTA TTTCAGTATA ACATAAAAGC GACTTCAmT TACTCAGATA TCAAATCCAA GAGAAATGAA AACCACCAAA TCCCTTGTGC ATACGCGAAA GACATAGGTC ACAAAAGAGG GGGCTTGGTC TTAGAAAGTA AATCCA'rAAA ACGTATCTAA AATAAGATTT AGGGGAGTTT T1-rCATCCTA GCTTAAATGA CAr'rTAAAA TGAATTAGGC ATAAG1TGC GATAAAATCA 'rcccAGcCAT AATI'CTTGCT TGG'N'TCTTT ACCAAACCAA TCAGACTAAA CCAGCTCCTG CCAAACCTAG ACCTCCAATA ATCATGTTAG AAACCGTCC GCTTATTTCA AATTCTGOAT TAATTGGTGT CTGACACCAA AAACCACATC GACAAGGCTr CATCTCCTAC AT'rAGCTAAG TTGTTGGCAT AGTTACAGAG GATTGCTAGG CAAGGCATTT TGTTGAGTGT AGCCAGCTTC TAAAAACTCA 4 0 4 Se 4 0 4 4. .4 4 4 4 4. *4 4 44 4* 4
ACGACCCTTG
TGTTTCALATT
T'ITGATGGAA
GTATTGATAA CTGCCAAGGT AAACTTAGCT AGGG'rATCCA ATTTAGGATC GGAG'rACGAT 'rGCGAAGAGC TrGAATCAAG TCATCATTCA TCTGGATTTG AAGGCTGTGT GACCTGCGAC ACAGAAGGCA CAACCATTGG TCACGGCTGC 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3521 CCTGATTTGC ACCACTTCAC GACAATr'rGG TAGCCTTCTA AATATAGCCA TTGTTGCTT~ TGACTCTACT GTATOGATAG CTAATATTAT TGGAAAATCT TATTCTCTCA TAAGATT TTC ATATATATTA TATATCAGGC TGTGATTCCA TTATTTGTCA CCAAGAGGGC AATCAAG'ITTT GCTCAACGGG TGTCAGGCTG AAACAGTCGG GGCATTGGCC TTTCTACTGT TrCAAGAATT TAAATGTTGT CATAAGATAC TATAAAATCC TGATTC.CTAA GT'rGTTATAT TAGT'rTATCA TGATAAAAAT TATTTATAGG AAATACTTTT TAGTTTCAGC GGCAGAGCCA TCAAGGCGTT TTGCGACGGT GGATAGATGA AAGAGACCGA TTAGGTrrGGG T=IrCACTT CTGCTGGTGC CTCTTTTCTT ATTATTGACA GTTTATCTAA GATAAAGCTT CACTTCCAAT CACTTGTATA CAAAAAAATC ACACGAGCTG AATAACCACT GGCGACAAGA AACGATATCT GCGATAATCC GAGCACAAAA ACCACACGCT AGACCATATC CAACTCGATA AATCCTCCTA ACAAGACCAT 4444 4 4 4444 44 *4 4 4 AGAGCCAGAT AAAGCGAACC CCAAAGAGGA ACTCAAAACA GCGTTCTCCG TAATAGTTCC AACCTAGAAT CGTTGTAAAG GCAAAAAGTA CAAGGAAGAT GGTCAAGAGA GCAGGCCCAA AGTGTGAAAA GTTTGTTGAG AAAGCTGACT GAGTCAAGGC AACCCCATTC AAGTCACCGC TCCAAACTCC AGTTACCAAG ATGGTCAAAC CAGTTAGAGT A INFORMATION FOR SEQ ID NO: 119: Mi SEQUENCE CHARACTERISTICS: LENGTH: 1968 base pairs TPE: nucleic acid STRANDBDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119: AACCTGGGCA AGCAAGCTAA AAGCAATGGG TGGAGGTTCC CACATI'GTAT CCTCAACTCA TCTCTTGGTC ATCTTAGCCA ATGTCTrTAA CACAGCTGAT ACTGGAAAGA CTrTGGTTGA CTGGATTTTC 'PTTATCCTCA ATGTCTrTTC TCTGTGCTCA GCTATCATCG CCAGTGCCTT GTCCCTCATT CTCGTTGCAA TCATTTGGGC ACCTGGAATC CTAATGGCAA CTGCCGCTGT AGCTGGCGGT TCTrrACCGTT GGTCTCTACT ATATCCATTr T rCCGTTTTG GTGCTGAATA AGGTTATGCC GAAAAAGGAA AACTCTATCT GGCTATGGTC AACACGGCTG GTGTTGCCAT CCCAATGATT GGACTTAGCA TrACTCAGTG TATGCTACTC TT TGGAGGCT ACAAACTTTT AGACGGCATG GTCAAATGGA TTATGTCTGC CTrAACCArr GCGAC'rGTrC TTGCAGTTAT CATTGCGGCC GTCAAGCATC CAGAATACAG 
TTCTGATTTr' GTCGAGAAGA CACCTTGGCA AATGGCAGCT CTGCCCTTCA TTCAGCCATC AATTCACT AGACGCTCTG TTTGACTTTA TGTGGCACTG GGAGCACTGA CAAATACATC TCTCAATTCG C?1'GATTACC TTTATTGCCT CTrATTCTCCC GTTAATCAGG TAAATCT'rTG AACATCTGGA CTrCGCTGGT CAGGTTTCAA ACCTTTCTTT GCTCTTTTGA T'XGGCTCAAA CACCTTGCCA ATCTACGCAC TCGCAATCGG TTCATrTCAA GAGAAAATTC 'rGAAGACCTG AKACTATGCT TCAACAACTT CAACCACCTC
TCGTCTCCCT
GGTCACCTGA
ACACTGGTTA
TTCAGTATCC
TGGGCATGTA
TCCTCTGTAT
AATCTCTCCG
TGACCATCAC
CCATGCTCCG
CCTAGGATGG ATGCCGGCTC CTATTGAAAT AAAGAGAAAG ACCGTCAACTI TTAACACAGA TATTGGAACA GCTATCCTAG CCGTCTTCTT 'PACAGGGCAG GCGGTTGAAO CTGCTTCAGC TGCCTCTGTT CTTGCCGAAT. GGTCCCGTrA CTT'rGGkACA GTTATAACTG 'rrATCGATGG
ACTGCTAATC
TGCTATCATC
CTTTGCCATC
AGTCAAAAAG
GGTATCGTCA
ATTGGCTCTT
AGGACAATCG
TTATCAAGTT
TCCTGACAAC
AI'ACGCCTT GGTAACGCGT GAA.AACAAAA ATCTTCCrTC TTGCGGGA'PT GMTTTCCTC TTTGCTTCGC CATCTTCTT 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 AAAAGCAGGG TAAGGGACAA AGCAAATATT TCTATGATAA TTTTTACGTT CTTAAAGACT TTATAAGAAC TTTATACTAT GCGCGAGATG AAGATAACGT AAAGCATAAC AACAAGGTTT GTTTATACTC AAAAAACAGT TCGAGAATCT CTTCAAACCA 846 CGTCAGCTCT ATCTGCAACC TCAAAGCTGT GCTTTGAGCA ACCTGCGACT AGCTrCCTAG 1500 TTTGCTCTTT GATTCATr GAGTA'rrAAT TCTCCTTTrC CAACTCATAC AAATCTGCGA 1560 TAATAGCTGC GACATGTITrG ATATCTTCCA GCATGCCTCG CATTTCAAAG TCAGCCAATA 1620 CAGGGAAGCC AAAGCGTTGA CTGTATTGCT TGGCTGTTAG GCAGTATTGG TTAT'rAAAGT 1680 TACGATTTCC TGACCCAACC ACACCAAAAC ACTTACTAGC ATTGTrTACCA TAGGCAATAA 1740 AATCTCCCAC CGGTGTCGTC AAAATCI'CAA CATCTCCGTT ATCCACGCCA TTCCCACCTP 1800 CGAGATAGGT CGGCAAAAAA GCCACATAGG GATGGTCCAT TTCATAGAAA TT'rGCCTT 1860 CCTrGACCAA ATCCTTGATA TGA.ATCTTTT GAACCTCAAT CCCTr'rGTAC TGGGACAAGA 1920 GATAGTCT'rT CAAGCGCGTC ACAAAACTTT CAGTGTrGCC ACTCAACG 1968 INFORMATION FOR SEQ ID NO; 120:.
SEQUENCE CHARACTERISTICS: LENGTH: 7172 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 120: CCGCATTT'rT TATCACTAGA C'rCGAGACAT CTTTTGAGTG GCTCTTGCTC TCTGGTTTAA *TT'rTCTTCCT TGCTCAAGGA CTCCTGCTAT TTCTCTTGGT CGTCCGACTC AAACATCAAT 120 TCGCTGAGAT TTATCCTCAA ATCAATAAAA AGATTCGCTT CTACTATTTA GGGGTTCTCA 180 CCATTGATTT TCTATTTTTT GTTCTCTTAG CCTTCATTAG TITCTCACGT TTTTCATCTC 240 *TTATGCCAAT CATCACTGCT TGCCATTCTA CT'TTATTA TATGACAGCT GACTACCTAA 300 GAGAAAACTA TCCAOACTTT TACGACAAAC ACATCTCTT'r ATGGGAGTGT CTCTAAAGAA 360 **.AAGGAGGTTT TAGCATGAAA AAAATCATCT TCATCAAAAC CATTCAACTC CTTGTCATTG 420 ATGGAATCAT GCTGGCATTT TTGACA'N'TA AA.AGGGGGCT TACTTGGGAC TGGATTTTGA 480 TTTATAGCGG TTGGCTCATT TTCTTTCATC CTGTGCTATT GACCTATCTT TCAAACCAAC 540 ?PI'GTGACCA CmTTAGTTAA CTCTATTCCC AGATrrAGACC GAGAT'TCTGG CGTTTrGCTT 600 *TACAAAT'rCT CCTATGGGAT AGCCTGATGA TTCTCTCCTT GGTGTCTTTA AGTGATATTC 660 CACTTTTCCT TCAGGGAACT CTCCTCATCC TAGGACATCT CATCCCTTCC TATCGCATCT 720 GCCAAAGCCT aAAAAGAGAC 'PTCCCCCAAG CATATCAAGA ACCGATTTCT TTTTGGAGTA 780 TTTATGATA GATGAGAAAG ACCAAGCCGA CTGGGCTTGG TCTTTCTTAT CTCTTTTTAG 840 TATCTAGGAT AATGGTAACA GGTCCATTAT TAACCAGCTC AACCTGCATA TCTGCTCCAA 0 900 847 AGATGCCTGT CTGAACGCC AC~rCTGCG CrAA7"TT-TG ATGAAAGCA 'rCATAGAAGT 960 CTGATGCCAT ATCA=-MIA GCTGCCCCTG TAAA~GCTGG ACGATTGCCT CTCTACTAT 1020 CCGCAAAGAG GGTAAACTGA GAAATAGAGA GGATTTCTCC TTCAATATCT TTGACAGACA 1080 GGTTCATCTT GCCTTCTGCG TCTGAAAAAA TCCGCATAT'? 
GACCAGTTTTI CTCACAGCAT 1140 AGTCCAAATC TTCCTCTrCG TCCTCTGGTC CAACACCAAC CACCAATAAA AGTCCCTGAT 1200 TGATTrCC CTGAATCTGG CCTTCTATAC TCAC7rGGGC ?IWITAACC CGTTGGATAA 1260 TG.ATITTCrAT AATAGCCTTT CTAGTAAGAG CTAGGACAAC TAGCCGTTGG TCCGTCAC 1320 AGAGTAAACT TCTGGCACAC TCTTAATT'N ATCGACAACC CTGGTCAC'TG 'rAGAGAGGT 1380 CCCAATACCG AAGgACACAT GGATATT'AGC AAACrrrCATA TCCTTGGT'rG GTTGGGCATT 1440 GACCCTTGAA ATATTCTTGG TTGTATTTGA AAGAACTrGC ACTACATCCT TCAACAGTCC 1500 TGTACGGTTG AGACCGTAGA TA'rCCATATG GGCCATATAC TCCT'rATTTGC AGCTAGGGTA 1560 CTGGTCTTCC CATTCCACAT CAAGGAGACG TTGCTCGTAG TTTTCTTGGG CACGCAGGTT 1620 *CATACAGTCC ACACGGTGAA TAGCCACACC ACGACCCTTG GTAATGTAGC CAACAATATC 1680 GTCACCAGGC ACGGOGTTAC AACACTTAGC AATCCGCACT AGGAGACCAG AAGCACCTTC 1740 :.*AATAACCACT CCCCCCTCAT GCTrTGACCTT GAGGGTTTCI' TTATTTCAA CCTTGACCTC 1800 .GCCACCTTTG ACAAGCTCCT CTGCCTCAGC T'rTGGCCTTG GCACGCTCTT CCTCACGGCG 1860 TTCCTTrTTCA GTCAGACGGT TAAAGACGGT AATCGCACCG ATTTCCCCAA AACCAATGGC 1920 CGCAAAGAGG GAGTCTTCTG TCTTGTAACT GGTCTTTTGC AGAACN'GAT CCATGTGGCG 1980 *CTTGTCCA1'A AATTTATTTG CCACATAGCC ATTTTCTTrGG AACTGAGCCA TCAGCATCTC 2040 ACGACCCTTG TTGACAGACA ATrccTTATC TTGGTT'rTTA AAGAACTGGC GAATCTTATT 2100 GCGCGCCTTG CTAGTCTTGA CCATATTGAG CCAGTC-ACGG CTAGGTCCAA AGGAGTTCGG 2160 ***GTTGGCGATA ATTTCAACCT GATCCCCTGT CTTTAACTTG GTTGTCAGTG GAACCATGCG 2220 *GCCATTGACC TTGGCACCAG TTGCTTTTTC ACCC-ACCTTG GTATGGATTT CGTAGGCAAA 2280 ATCAATCGGT CCTGAATCTT TGGGAAGGGA ACGG;ACAGCT CCATCTGGGG TAAAAACGTA 2340 *AATCTCCTCA GCCAAATAGT TT-TCCTTAAC AGAGTCCACA AA'rrCCTTAG CATCATCAGC 2400 **CTGGTCTTGG AG-CTCCATCA TCTCCTTGAT CCAGTTCATT CCAATAGCTG ATTCCTTGCT 2460 GTTAACTTGC CCCTTTATAC CTTTCTTATA AGCCCAGTGA GCCGCAACCC CGTACTCAGC 2520 CACCTCdTGC )LrTccTTGG TTCGAATCTG GAATTCAATC GGCCCTTTTG GTCCATAAAC 2580 AGTCGTATGG ATAGACTOAT AACCATTGGC CTTGCGGTTG GCGATATAGT C7TTTGAAGCG 2640 848 ACCTCGGCATC GGTI'TCCAAA ATTCATGCAC CTAACCAAGC ATGGCATAAA CA'rCAC?1-rG GGTATCTAAA ATACA.ACGA.A TAGCAATCAG ATCATAArr TC=TCAA-ACC GTT~I-CT=~ CTCCTGCATT rTGCGGAAAA 
TTGAGTAAAT ATGCTTGGGA CGACCATAAA TCTrCCCT-TT CAAGTGACGT TCTGTCGTAT ACTCCTCTAA ?TTrGTGACT ACCTCATCCA CCAACC.CCTC ACGCTCCCTC CGCIwrrCCT TcATCATATG G(;TAATCTTG TAAAACTCCG TTGGA~rGAG ATAACGGAAA GACAAG;TCTT CTAATTCCCA TrrGACACTG AAGCGGGCCA TAGAT~rCC-A TCGGTTCTTT GGAAATACGC ATGTTTCAGG GTCCGCATAT TG'rGCAACCG GTCACACAGT GTCCTCAGAC ATGGCC-ATGA GC-ATC~rGCG AT.AT rMCC 7T1GTACTCG ACCTTGCCAA GCTTGGTAAC TCCGTCAACA AAACCTCTT TCCAAATCGT CCAAACTCGC ATCT1GTATC1' TCCACAAGCT ACrOTTACAG CATCCAGCTT TAGCTTAGCT CAA.ATCCCCA AACGATGGGC TCCTGrCTTGT CTTTTCGAAG ITCACCAAAA TAACGCGGAT GCTAATTGCT CCTCGATCGA ATCATCCGC-A CATCACGACC TCCACCACAT CATGCAAGA-A AAAATACCTG CCACT'TGGAT CCACTGTGGC ATTCAACAC GTTAAATATT CTrTGGTrAA ATCTCTACTC TCCAAT'rCTT AGGGTGAATG ATATAAGGCT ATAGACCA6AG GCCTTATGGA AGCGACAACT TCT-rCGCCTG CCTACCATTT TATCACTN'T AATAATTCAA AATTGCTTGA ATTTTAAACT TAGGTGATAG CGCCTGATTT GCGATATTGA CAAAATGAAC ATCCTCTTCC TTAAATTCAC TTCTTTCGC 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 360 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 TTAAGAATAT GAAAACTAGA TTGGAACAGA ATAAGAAAAA TAATTCTGAA TTATTGGTCC GTAATATACT ACGAAGTTAG AAGGAGAGAT AQAAGAACGG AAACCATATT GTAACCCAA6A GACTTTCTGA CTTCCCCAAT TCCATTGAAG ATACGAAAGA TAA.ACGGTGG AACTCGTATC ACATACACTG GTACCTTGAC TGGATTTTGG AATTAATACT AAATGAAAAT CAAAGAGCAA ACTAGGAAAC TAGCCCCACG TTACTCAAAG CACCGC'TrTG AGG'TGCAGA TAAAGTTGAC GCGGTTTGAA GAGATTTTTG AAGAGTATAA AAATCCTCAA GATACTTTCT TCTATCCTTT AGTTTATAAG GAGAATACCT ATGAAAAAAA CTGCTrATTTC TATCT'rTGCT CrCCTAATGT TAGGAGTr-rG CTGCCTGTTC CTAT'rCAGCC AGCAAAGCTA TAAAAAACAG TCGTTCAATA CTATGCTAAC GACCAGAACC TGCCCAGTAG GATAACTTAT AGTGAATATA GCGACAAATG AGAAGCCAAC TACGCTAGCA CTCTAAACAT CACGTCTATC AAACAAGCTA ATGACGGAGT TTATGCAACC TATGAAGCGC AATTGACACC T1'rCCAATAT TG.ATAAATTG ATAACCAGcc TGTCTTCATC TGTCATGCT GG?7TTAAG TTCAT'I-r'AA ATCCTTACCT ATTCTCCCTA ACTGTGCTAT ACTTAATTTA TACTCAATGA AAATCAAAGA GCAAACTAGA AAGCTAGCCG CAGGCTGTTC AAACCACTGC TTTGAGGTTG CAGA'rAAAGT TGACGCGTT TGAAGAGATT 
'rTCGAAGAGT ATTAGTACAT TCTTTGAGAT TGGAGCTAGT ATGAAAATCC ATAAAACCGT GAATCC CVr GCCTATG.AAA ATACCTATTA TCTAGAAGGC GAAAAGCACC TCATCGTCGT CGATCCTGGT AGTCATTGGG AAGCCATTCG TCAGACAATC GAGAAGATCA ACAAACCGAT CTGTGCTATT CTCTTGACCC ACGCCCA?1'A TGACCATATC ATGAGTCTGG ACTTGGTTCG CGAGACGTTr GGCAATCCTC CTGTCTATAT CGCAGAGAGC GAAGCCAGCT GGCTCTACAC TCCTGTCGAT AATCTCTCCC GTCTCCCTCG CCACGATCAT ATGGCAGATG 'rGGTCACAAA ACCTGCAGAA CACACCTTTG TC~rTCACGA AGAATACCAA CTAGAGGAAT TTCG=AA GGTTCTACCG ACCCCAGGGC ACTCTATCGG TGGTG=~CC C!'AGTCTc CTGATCCTCA TCTAGTCTrG ACGGAGATO CTCTATTCCG CGAAACTATC GGACGGACCC ACCTTCCGAC TGGTAGCATG GAGCAACrCC TTCATAGTAT CCAGACCCAA CTCTTCACCC TACCAAACI2A CGATGTCTAT CCAGGACATG GTCCAGC'rAC TACTATCGCT CACGAAAAGG CCTTCAATCC CM=TCTAG CA.AGATGATG ACAATCGAAA ?T=AAG;TAAA CTATCCAGCA AATCTCTA TTACAAAACG CATCCTATCA AGG7=rTTAC ACATGATTGG ATGCC'TTTT TCTGATGACT AGA7TTT=rT CATTACCAAA TAATCACGCG TTCI'rTGACA TCATAGGTTG TAAAGAAA'rC GAGTTTGGCT GGTGCGTGCA CATCGACGCT TTTCATGCGC CAGATGCGAC CGAAGTTGTA TCTCTTAGCT GCTTCAACG CTGCTGCGAT AGTCAATTTA CCGTTAATGG 'rTGCTCC-ATA TT'rTTGTGTT TT1CTccTTGA AGGCAGCATA ACCATTTTCG 'rCAAAGGAAG CCCCGTTAGT CTCCTCTGGT GAACGCCACA TTCCGTCTCC GTCGAAGTT'r GGTACTTGCA CATTGACACG AGCCAAAAGT TTCATAAATT CTGGTCGACC GAAGAACTCT TCTGCTGAGA AGTCTGCTTC TCCTCCCAAG TCAGCCACCT TTTCTGATAC AGAATCCTGT CCATCAAATT GGTCAATGAC GTCGCTtTCT GTCCACCAAT CCTTGAGGC'r ATCAAAGGCG TGGGA.AATTT CATGGGCAAT 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 CACTGCCCCA ATACCACCGT AGTrAGCAGA AGA'rGACTGA TGCAAGTCAT AGAAACCC CTGTAAA.ATG GCCGC'rGGAA AGACAATCAG GT'rCTTCTGA GGATTGTAGT AGGCATTGAC CATATGAGCA GGCATGCCCC ATTCCTTATA ATCTACAGGC TGGTrCCACT TACTCCAACT GTGCTTGATT TCCACACCCG CAAAGGCTAG AGCATTCTCA AAAAGACTGG CAGTTTCATT CACTACCTA 'rCCTTGTAAC GTGCAGGCAA TrCTTCTGGA TAGCCAATAT AAGGTTTGAT CACATTGAGC TTCACGATAG CCTGTTTACA GGTTTCrGGA GTGAGCCAGT CATTCTTAAG CAGACGCTCC-TTATAAACAT cAATcATGrGT TGCCACTTTT TTC'ICCACAT 
CCCCTTGGC TTCTGGAGAG AACTTCTCAC GGGCGTACCA AAGACCCAGG GC1-rGCTTGA AACGTTCTTG 850 TGCTAGATGA TAAGCTGCP TGACCTTATC T?'N'GCCTCT GGAACTCCAG AAAGGGCACG 6240 GCTGTAGGCA CCAGACAAAA CACGGA1'ATC CTCTGTTAA.A TAGCTGG N'G AAAGATTGAC 6300 -AACACTCAAA ATCA)AGTG CTrTAAGGAG AGACCAGGCT TCC!TCACTGT AGAATTGCTC 6360 TGCTGC'rPGC CAGAAACGTT CCTCG'rCTAC AATAACCTrG TCTGGTAAMT GCCCAATAAC 6420 TGC?1'TGAAG AAGTCATCCA AAGGTAGGGC AGGCGCGAAT TTICTTGAAAT CTrCGTAAGA 6480 ATATGGATGA TAGAGI-rTAG CATATTCTGA ACrTrCTTCA 'rTAGAGAGCA CCACTGCCGC 6540 AACTCGGCGG TCCAATrCAA GTCTTTIC TAGCAAGTCT TCAATTTCT CATCAGAGAA 6600 ATCATAAGCC TTGAGGAGAT TTGCGCTGCr TTCTTTCCAA AGAGTCAAGA GCTCTTCGCG 6660 CTGAGGATGT TCTTCTGCAT AGTAGGTCGT ATCrGGCA.AG ATTGTGCTTG GAGCGCTAGC 6720 CCATAGAACA T'rGATTCTAG CATCCATAAA GTCTGGCGAT ACACCAAAAG GAAGGAAGTT 6780 TCGCTTTCCT GCAAGCTCAA ACTCTGCTAG TTTAGCTGTA AAATCCGCAA AAGTCTCCAA 6840 'rrCTrGGAAT TCT'rTAAGGA GTGGTAAGAC AGGTGTGATA CCGTCAGCTr CTCTCTTGTC 6900 AAAATCACGA ACTAGGCGGT GGTATTTGAC AAAGTTTTCC AAGATAGCAT CCTCAGGCAC 6960 *TTCTTCACCT GCTAACCACT TGTCTGTTGT CGCCAGCATC AGGTCTTrCAA TTTCCTGGTC 7020 .TAAATCAACA AAACCTCCTG TTTGAGACTT ATCTGCTGGG ATTTCAGCTG TCTGTTGCCA 7080 T'rCCCATTG ATAGCATCAT AAAAATCATC 'r rGATAACGT GTCATCTTGT TCTCGC'rTTC 7140 GCATT'rATCT TAACAAAAAT CG 7172 INFORMATION FOR SEQ ID NO: 121: Wi SEQUENCE CHARACTERI STICS: LENGTH: 4518 base pairs 9444(B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID'NO: 121: CGGGAAGTTA TGCGATCTAG ACTTCGTTCC TGTACAGCTA CTTTCCAGG TGGTCTTGTT *GTTTGTATGA GT'rTGTTrAG AGAGGATCTr TCTATGTCTT TCTTTCTTAT TTrTGTTTTA 120 'rATGCTTTTC TGATTrC'rrA TCTAATTTAT GGTTATTTCA GACTAAAAAG GAAATACCGA 180 0GTAGATGAAT AGCAAGGTTC TAGGTCT1TCA GATTGATTTT TAGCACTCTT GATAAAAGAG 240 TGCrAATTT_ TGAG71rTT'r GTCTTGACAT TCTCTrCTAA GGGTGATAA TAGAATCATG 300 AGTTAGCACT TGGATGCAT'r GAGTGCTAAT TGATCAGACA GAGAGGAGTG ATGAGATGGT 360 TACAGAGCGT CAGCAGGATA, TTT'rAAATCT GATTATTGAC ATCTTTArCA AAACGrACGA 420 ACCTGTCGGA TCAAAAGCCT TGCAAr.AGTC TATTAAC'rCT 
AGCAGTGCAA Cr-ATCGTAA TGACATGG;CG GAACTAGAAA AACAAGGT GCr-IGAAAG GCTCATACr'r CAAGTGGTCG CGATGCCAAGT GTTGCTGGTT TTCAGTACTA TGTGAAACAC TCZACTGGATT TTGACCGGCT GGCTCAAAkr GAGGTATATG AGATTGTCAA AGCCTTTGAT CAGGAATTCr TCAAATTGGA GGATATTCTG CAAGACCC CTAACTTACT AACAGACCTG AGTGGCTGTA CGGTAAGTGC ACTG;GATGTT GAGCCGAGCA GGCAACGTTT GACAGCCTTT GATATCGTrG 'TrGGGCA ACATACAGCC TTGGCGGTAT TTACCCTAGA CGAGTCGCGA ACGGTTACTA GTCAGT~rCr GATTCCAAGG AACTCTTGC AGGAGCATT GCTGAAACTG AAGAGCATCA TTCAGGAACG TTTCCTCGGT CACACCGTTT TAGATATTCA CTACAAGArr CGGACGGAGA TTCCGCAGAT TA'rCCAGCGT TACTTTACAA CAACGGATAA TGTCATCGAT CTC=rGAAC ACATCTTTAA GGAAATGTTC AACGAAAACA TTGTGATGGC CGGCAAGGTC CATCTCTTGA ATrTGCCAA TCTAGCAGCC TATCAGTTCT TTGACCAACC GCAAAAGGTG GCCTTGGAGA TTCGTGAGGG GTTGCGTGAG GATCAGATGC AAAATGTTCG TGTTGCACAC GGTCAAGAGT CCTGTTAGC 'rGACCTAGCG GTAATCAGTA GTAAGT'rCCT CATTCCTTAT CGGGGAC?1'G GAATTCTAGC CZATTATCGGT CCAGTTAA'rC TGGATTACCA ACAGCTAATC AA'rCAACTCA ATGTGGTCAA CCGTGTTG ACCATCAAGT TGACAGATTT TTACCGCTAC CTCAGCAGTA ATCATTACCA AGTACAT'rAA GATTGAAATC AT'rAAAGGAG GCGAACATGG CCCAAGATAI' AAAAAATGAA GAAGTAGAAG AAGTTCAAA AGAGGAAGTT GTGAAAACAG CTGAAGAAAC AACTCCTGAA AAGTCTGAGT TGGACTGGC AAATGAACGT GCAGATGAGT TCGAAAACAA ATATCTTCGC GCTCATGCAG AAATGCAAAA TATCCAACGC CGTGCCAATG AAGAACGTCA AAACTTGCAA CGTTATCGTA GCCAGGACTT GGCAAAAGCA ATCr'rACCAT CTCTTGACAA CCTTGAGCGT GCACT'rGCAG TTGAAGGTTT GACAGATGAT GTGAAGAAGG rcTTGGCAT GGTGCAAGAA AGCTTGATTC ACCCTTTGAA AGAAGAAGGA ATTGAAGAAA TCGCAGCAGA TGGCGAATTT GACCATAACT ACCATATGGC CATCCAAACT CTCCCAGCAG ACGATGAACA CCCAGTAGA'r ACCATCGCTC AATCTTCA A)AACGCTAC AAACTCCATG ACCGCATCCT ACGCCCAGCA ATGGTAGTGG TGTATAACTA AGATATAAAC CCCGTAAAAA CCTCGCAGTA AAAATAGGAG ATTGACGAAG TCTTCGATGA ACACAAGAAA ATCTATCTTT 'rTTACTCAGA GCTTAGGGCG 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 TGTCATTC GdGcAATTcTr. 
AcGGTAGCTA TGGCCTTTGC CTAGC'1rCCT TACTAACTCG AACCAACTCG TCAGAAAACG GCAATCGC1'A TCCTCGAAAT AAAATCGATT TCGACTCCTC 11 GTGTCGCAAT TTACATAATA AAAATATGTT1 TGGCTTTGTA GGCGCTATTT' GCGCAAMTTT AGTCAAGCTC TGACGGCGTC AGAAAATTAA CTAACA(GrA 852 GAAAACTTGI' CCGAAACGAC AATAAACTAr GAACAAAGAT ATAGTGACCG AACCGAACr-A AACACGATAC TCTTCGCCGT TGAGACTTA GGCTCAAAGT r'rACTCAAAG AGAT1TGACGA GCCACTGTCG CCACTTAAGA AGAGTATCAA AAAGAAAAAT GAAAAACACA TGTCTAAAAT TATCCGATT GACTTAGGTA CAACAAACTC AGCAGTTGCA GTTCTTGAAG GAACTGAAAG AAGGAAACCG CACAACTCCA TCTGTAGTCT CA'rTCAAAAA ATGCTGCAAA ACGTCA.ACCA C'N'ACAAACC CAGATACACT TGGG7AACTTC TGAAAAAGTT TCTGCAAATG GAAAAGAATA CAAAATCATC GCAAAcccAG CGGAGAAATC ATCGTTGGTG TATCTCTATC AAATCTAAGA CACTCCACAA GAAATCTCAG CCTTG;GTGAG AAAGTAACCA ACGTCAAGCA ACAAAAGACG CGAACCAACT GCAGCAGCTC CTTG4GTATTT GACCTTGGTG CTATGATCC'r TCAATAC TG( AAGCTGTTAT CACAGTTCCG CTGGTAAAAT TGCTGGTCTr TTrGCTTATGG TTTGGACAAG GTGGTACAT1' CGACGTCTCT CTGC.AGGGGA CAACAAACTT TAGCAGAATT CAAGAAAGAA GTTTGAAAGA TGCGGCTGAA TCAGCTTGCC ATTTATCACT CTCG;TGCGAA ATTTGACGAT AAAGGCTACG CTGAAGACTA GC TTACTTCA ACCACGCTCZA GAAGTAGAAC GTAMrTTAA ACTGACAAAG AAGAAAAA.AT ATCCTTGAAT TGGGTGACGG TGTCTTCGAC GTA'N'GTCAA GGTGGTr.ACG ACTTTGACCA AAAAATCATT GACCAC~rGG AACGGTATCG ACTTGTCTAC TGACAAGATG GCAATGCAAC 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 AAAGCGAAGA AAGACCTTTC GCAGGTGAGG CTGGACCTCT TTGACTrCGTG ACCTTGTTGA GTCAAGCCCT TTCAGATGCA GGTTTGAGCT TGTCAGAAAT GTGGTT CAAC TCGTATCCCT GCCGTTGTTG AAGCTGTTAA CAAACAAA'rC AGTAAACCCT GATGAAGTAG TTGCTATGGG TGATTACTGG TGA'rGTCAAG GACGTTGTCC TTCTTGATGT TCCAAACAAT GGGTGGAGTA TTTACAAAAC TTATCGATCG CTAAATCACA AGTCTTCTCA ACAGCAGCAG ACAACCAACC TTcAAG.GTGA ACGCCCAATG GCACCAGATA ACAAGACTCT TGGTGTAACT TCAACACAAA TCACTGGAA ATGACTTTGA ACGTACAAAA GTTCCAGTTC CGACGAAGTT ATCCTTGTTG AGCTGAAACT GGTAAAGAAC TGCGGCTATC CAAGGTGGTG AACGCCATTG TCACTTGGTA CAACACTACA ATCCCAACAT AGCCGTTGAT ATCCACGTC TGGACGC'ITC 
CAATTGACTG ATATccr-AGC TGCAccTCGT GGAA TTCCTC AAATCGAAGT AACATTTGAC ATCGACAAGA ACGGTATCCT CMCTGTTAAG GCCAAAGACC TTGGAACTCA AAAAGAACAA ACTArlrGTCA TCCA.ATCGAA CTCAGGTT'rG ACTGACGAAG AAATCGACCG CATGATGAAA GATGCAGAAG CAAACGCTGA AGCCGATAAG AAACGTAAAG AAGAAGTAGA CCTTCGTAAT GAAGTAGACC 853 AAGCAATCTr TGCGACTGAA AAGACAATCA AGGAAACTGA AGGTAAAGGC 'rTCGACGCAG 4020 AACGTGACGC TGCCCAAGCT GCCCTTGA'rG ACCTI'AAGAA AGCTCAAGAA GACAACAACT 4080 TGGACGACAT GAAAACAAAA, CTTGAAGCAT TGAACGAAAA AGCTCAAGGA CTTGCTGTTA 4140 AACTCTAcG.A ACAAGCCGCA GCAGCGCAAC AAGCTcAAGA AGGAGCAGAA GGCGCcAcAAG 4200 CAACAGGGAA CGCAGGCGAT GACGTCGTAG ACGGAGAGTT TACGGAAAAG TAAGATCAGT 4260 GTATrGGATG AAGAGTATCT AAAAAATACA CCAAAAGTTT ATAATGATT TrGTAATCAA 4320 GCTGATAACT ATAGAACATC AAAAGAT'rTr ATTGATAATA TTCCAATAGA ATATTTACCT 4380 AGATATAGAG AAT'TATATrA GCTGAACATG ATAGTTGTAT CAAAAATGAT GAAGCGGTAA 4440 GGAATTTGT TACCTCAGTA TTGTTGTCTG CATT'rGTATC GGCGATGGTA CCGTATCTGA 4500 CGAACGTTCA GCTrATAT 4518 INFORMATION FOR SEQ ID NO: 122: SEQUENCE CHARACTERISTICS: LENGTH: 8145 base pairs TYPE: nucleic acid STRANDEDNESS double D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: *TGCTATTTTC GATTCCC 'G GGCGTTTTGA TTCCTTTGC CTTGCAAGTC CATTGGAAGC CCC'TCCATTA TCTGATTAAC ATTTACATCT GGGTTATGCG AGGAACCCCC TTACTCTTGC 120 AACTGATTTT TATCTATTAT GTGCTCCCAA GTATTGGGAT TCGTTTrAGAC CGCCTTCC G 180 CAGCTATTAT TGCCTTTGTT CTCAACTATG CAGCTTACTT TGCAGAAATT TTCCGTGGGG 240 GAATT~GACAC TATTCCAAGA GGACAGTATG AGGCCGCCAA GGTC'N'GAAG TTTAGCCCTT 300 TTGACAGAGT GCGCTATATT ATCTTGCCCC AAGTGACCAA GATCGTTCTT CCTAGTGTC1T 360 'PTAATGAACT TATGAGTTTG GTCAAGGATA CTTCTTTGGT CTATGCTCTC GGAATTTCAG 420 ACCTTATCTT GGCTAGTCGA ACAGCTGCTrA ACCGCCATGC TAGTCTAGTT CCTATGTTCT 480 TGGCAGGAGC CATTTATTTG ATTTTGATTG GGATTGTGAC AATTATTTCC AAAAAAGTrG 540 XGAAGAAGTA TAGTTATTAT AGATAGGAGG CTCCATGTTr AG.AATTACGA AATATC-AATA 600 AAGTCTTTGG AGACAAACAA ATCCTGTCTA ATTTCAGTCT AAGTATTCCT GAAAAGCAAA 660 TCCTGGCTAT CGTTGGACCT TCTGGTGGAG GTAAGACAAC TCTTTTACGT ATGC-jTGCAG 720 GTCTTGAAAC 
CATTGATTCA GGGCAAATCT TTTATAATGG ACAACCTTTA GAGCTGGATG 780 854 AATTGCAGAA GCGCAATCTA CTGGGATTG TCT'rCCAAGA ?r1-rCAACTA TTTCCTCATC TATCAGTTCT GGAAAATTG ACTTTATCGC CTGTCAAGAC CATGGGAATG AAGCACGAAG AGGCTGAGAA GAAGGCGAGT GGACTCTTGG AACAGTTAGG ACTAGGAGGA CACGCAGAGG; CCTATCCTTT CTCACTATCT GGTGGGCAAA AGCAGCGGGTr GGCTTTGGCG CGTGCTATGA TGATTGACCC AGAAATCAN' GGCTACGATG AACCAACTTC GTTTGGAAGT CGAGAAGCTA ATCTTIGCAAA ATAGGGAACT TTACCCATGA TTTGCAGTTT GCTGAAAATA TCGCAGATGT AATAGGAGGA AAAATGGATG AAAAAATGGA TGCTTGTA?1 TGTTCTTAGT AGCTTGTGGG AAAAATTCTA GCGAAACTAG TGCCCTGGAT CCAGAA'rTAC TCGGATGACC CAGA'rTGTGG ATrATTGAAA GTAGAACCTA AGTCAGTCTG ATGACTGCT-r TCGAGATAAT TGGTCAAAGT T TTGTTCCA ATGGGATTTG AGCTACAGCT GTTTrTGAAA TTGAAAGAA GCTGAATTGA 'rACAGACGAA CGCCGTGA.AA
ACC-AGTCTAA
CTCAGAAAGA
AATACGGAAT
CAAGTCTATT ACTATTGGAT TTGATAGTAC TGGTTCTTA'r
CACGGTAAAT
CAAAAGGAAC GATTGATCTG AGGTGGCTTT CAGTAACTCA CTGGTATCAC GACTGCAAAG CTGGTTATGC GGACTTTGAA AAGCGAA1'CA ATACCAAACC ATGGTCTAT'r GA'rrGACCGr ACGATTATAA TCT'ITACA GCAGGAT'rTG
TGGCAACCGA
ATTTGGAATG
TATATGAAGA
GATATGACTG
GCAAATCCAG
TTTAATGAAG
GTCTATGCAA
GTTGGACTAG
ATATTGATT'r
TTGATTGGGA
GCTATTCCGC
ATGAGCAGGT ATTGGTTACG AAGAAATCAT GAAAGACATT ALGGAGCTCAA GCTGGTTCAT AAATTTTGAA GAATATTGTC GCTAATAAGG CCTTGATTGA TTTGAAAA.AC GATCGAAT'rG ACTATTATTrT AGAAGCAGAA GG.TGTTTTAA AAACAGAAGC TTTTGCCGGTT GGAGCCCGTA 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 AGGAAGATAC AAACTTGGTT AAGAAGATAA ATGAAGCTTT TTCTAGTCTT GCAAGTTCCA AGAAATCAGC C.AAAAATGGT TTCGAGAAGA TGTAGCAACC AAGAAGGACA GTAAGATAAA ATAGTGGCTG AAACTrGCGT'r TTGATTAGCA TTTTr'rGTAA TCTAGGAAAA CGATAATAGC GATTGAATAT GGATAA'?TGA GCCCACTGTG ATTTCTAAAA CATTGTTAAA AATTGATTTG ACTTCCAAAA CTGTAATGAA ATACTGATGT AACTGTTTTA GGAACAATAA AACGCATAAT TTGcAccTTA cATTATGcGT TTTTGTGATT TTAAGAcTG TTAGCTG.ATT CTGCGAAATC TTTGATTTC'r TGTGCTGACA TTGAAGAGTC GCAACGGACG CATCTG1?AAT-ATGAACAAAA CCTGGTAC-AG TTGGGATTCC ATAGCGTGAG
TACAAGGACG
AAAGAAGTAA
AAACGTAGTT
ATATGGAATA
TTAAAATGTT
ATCAAGGTTT
TTTTACAATC
T'rGATTT'GTC
CGGAATGCTT
GCAAATCATT GAGTTGCT GGTTCTTCAC TATI'GATGAA GTAAATGTGA GCTTGCGTTT CAGCTACGAC ACCTGACAA'r G'rACCTGCAA ATTNACGGC GTAAGGGCAA GTTGCGAC 855 CGATAAAGAA GGTTGCAGTT TCTTTTTTAT CAAGAGCTTC I-rGCCCACGC ACAACTGTAG TGAC7TCAAG GTCTTGATG TTATCTAAAA ATTGTTrCCAT GAGAI'ACCT CGCC-rTCATT GATAAGTCTA GTATGCCATA AAGTTTCTAA AhTTGCTTAG ATI'?GATACG AAAAAAGATG AGGTTGGTTG GTCTCA'rCTT 'rTATAGGTCT TTATTTTACA AATGCArrGA TTTCTGC'TTC GATGTTAGCA ATCTTAGCTT GTGATTCTTC GTTGGTTTCC CCTACAACTG CAATGTAGAA CTTGATTrTr GGTTCTGTAC CAACGGCG AAcGGCAArc CA'rGAACCGT CAGCAAGTG;T GTATTTCAAC ACATCACTTG CAGGAGTTGT CAAG~TCTA ACACTACCOT CAGCAACAGT 2640 2700 2760 2820 2880 2940 3000 AGCAGTTTGT GCCTTGAAGT CT'rCTACGAC AGTGATAGCT GTTGCCTTCC AGCATTGT'rG CGGAATTTAG CCATAATCGC ?I'rGAT~rGT AAGAGTAACA GAGATTGTTT ACCGTCAGCA AGTGTCAAAC AACGGCTTGG ATGGCATCTT TTCAAATCCC ATCATGTAAG GATAAATTTG AAACCTGTCA CGTTACCAAG TCAGTTGAAA AGCGTTTTTG TGAGCTTCCA AAGGTTGAGG TAGCTACCAT GTCAGTTGCG ACAAGAAcAT TTTCTGCGTA GTAGCCA'rAT CACCACAACG GTAGTAGGCA
TATCACGTAC
TGTGGT'rGTG
AGACGTTGAA
CGATAGATTT
AGATGTAT
CTTTTTGAAG
CTGCACCAAC
AAATGGTTTA
TTrTCTTCG
CATAGTTGCG
GCAGAGAGCG
ATCAAGTAAC
AA~TCTTGGA
CCCTAGCTTT
CATTTTCAG
GCACC6AT'TT
CCAACACGGT
AGTTCTTCAG
TCAGCACCAT
TCTTATACA
GCAAGTTCAG
ATTCTGTTGG 3060 CGACACCTGA 3120 TTTCTTCGAT 3180 CAACTACAAG 3240 CGAAGCTTC 3300 ,rrrTTTTCAGC 3360 CAGCAATCTT 3420 GAAGAGTTCC 3480 GGTTACCTGA 3540 CAGCGTCTGG 3600 CAAGCGCAAA 3660
ACCCATCATA
AACI-rCAACA
'ITGACGACCA
GGCTGCTTGG
TTGCGCT'rCA
CATTTCACCA
AATCAAGCT
TTCGCCGATA
TGCAAATGGG
CATTTG.TCCA
GGCTGTGACC
CTTTCTGGGT 'rTGGAGATGT TACAGTTGAA AAGTCTGGGT CAGCAGTTGC ACAACTTGAA CAGAGTCAAA TCCTGCTTGG GCA-AGAGCAC GACGAGCCA-A GTACCATGAA GTGGTGTGTA GACAATCTTC ATGTCTTTAC CAAATTCTT~C GGGTTGATGT TTATCTCCTT AACCTCTTTA ACCTATTCTA TGTCAACACC ACTTCAATCA AGCCAGAAGC TTTTCAGTT TCCACATCAG CAACTTCAAC TTTTCGATTC CACGGATATA AGTAGTCAAA GCCTCCGCAT CGTGTGGAGG CCGTCTTCAC CGTAAACCTT GTAACCGTTA AATGGAGCAG GGTTGTGCCT ATGATACCTG CGAAACAGTT GAGATGACGA ACTGCAAATG ATAGTTCTGG 3720 3780 3840 3900 3960.
4020 4080 4140 4200 4260 4320 AGTCGGACGA AGGCTTTCAA ATACCTAAGA TTTGATGCCG ACATTCAAG GCAAACTCAG GTGAGAAGTG ACGGCTATCG TTTCTCG TTTCCACCT'r TTGACTCAAT CAAACGAGCC TGT'r7AGCAA
TAGGCAATTG
AATCCTTCAG
GAACTGCCC
CTACACCGCG
TAGCTTGGCG
AACAACGTAG ATGrTGATAC CGTrTGTACC AGCACCAACC AAGCCACGCA TACCT-GCAGT 4380 ACCAAATTCA AGATTTGTAT AGAAGGCATC TCCTTAGTT 'TTrCGTCCA TATTrTCCAA 4440 -ATCTTGACGA AGGTACTCAC GAAGCTCCAC AAAATCAACC CATTTCT GG'r AATTrrTG 4500 GTAAGACATT CAAATTCTCC TTTATTTTTA AAACATTTAA TCAGTNTAAT TATATCATTT 4560 TTTTATTT TACTAAAACC TTATCTGC'rT CGAACATCTC TT'CAAACCAG GTCAGA7TGA 4620 ATTTGGGGT TATATGATGT TGAGGCTAGG AAAAATTCA.A TTCATAAA AAAAGTAAGT 4680 CTTCTCATAA CAAAACATTG ATATAGTTAC TTA 1TrTrAA ACAAGCATAT TATAATAAAG 4740 CTA'rGGCATA TACTACTGAT 7rAAACAGC GAGCATT-AGA TTACATCAAA GAGGGGCACA 4800 GCCATGTCGA GGCAGCCAAG TTrTTTTGGTG TGGCGTCAG AACTCTTTC ACGTGGGAAA 4860 AGAAAGACCT GAACAAGAAC ACATAGAGAG GAAAAAGCGA GTCGTCAAAA ACCGAAAGAT 4920 TCCTTAGAG GAATTGAAAG CCTTTGTAGA GGCTCATCCA GATGCTTTr 'rACGGGAAAT 4980 TGCGGCACAT TTTGATTG'rG CTGTTCCTTC AGTATGGGCA GC!--TAAAGC AGATTAAGGT 5040 CACTrrAAAA AAAGATGACG AGCTr'rAAGG AACAAGACCC AGAAAAGTAG CCTTATTTCT 5100 *TA6AGAATTTT AATAGTTTAA A1GCACCrAGC ACCTGTTTAT ATTGATGAAA CAGGAATCGA 5160 SCCGCTATCTC TATCGTCCTT ATCCAGGGGC TCCTAGAGGG GAGAAACTCT ATGAAAAGAT 5220 AGCGGACGT CGTTTTGAGC GAACT'rCAAT TGTTGCAGGA CAACTAGACG GAGAG'rrAT 5280 .AGCTrCCCATG ATTTACAAGA AAAGCATGAC AAGCGATTTrC TT'rGTGGAGT GGTTCAAAAC 5340 GCA6ACTCCTA CCTGCTTTGA AGACACCTCA TGTTATTGTC ATGGGCAATG CTGGT-rTCA 5400 *TCCCAAGAAC ATTTTGGATG AACTCTGCAT CCAAGATAAA CACI-'I-CT TACCTCTACC 5460 ACCTATTCA CCGGATTTGA ATCCTATTGA GCAAGCTTGG GCTATCTTGA AAAAGAAAGT 5520 GACGGATGTA TTAAGGGAAG TTCCAACTAT T'r'ITGAATCT TTGGAATGCT TTrTrAAAAC 5580 TAGATGACTA TAACGGTTCT AAAGGAACCT ATCGAGTAGT CATT-AAAACT AAGGATACTG 5640 *CTGGTTAAGA GAAGACGGTA TACAATCAAA CCATTCACCG TGTAGCCGAA ATCGTTC-AGA 5700 ATGAAGACTT GTA'rCAGAAT GAAGACTTGT ATAAGAAAGG TTTGAATGTT GAACTTGCGC 5760 ACCAACAAAT TAAGGGATTT rrGAAGCAG AGTTTAAAAA TCGTATTAAT GGAGTTCTTA 5820 *ATACTAAAAT AAAAAATAG.T AcATTAAATC GTGTAAATAA AAAAACTATA cAccAG.AGcA 5880 *.*ACAAAAACTC CATGATCAAT TTGAAGCAGA AGCAACGGAA GATGCI'AAAA AACAAGGCGA 5940 TATTGTGTTG AATGTTGACC AGGA'TrCAT GAGCATATCT AAGTCTAATA 
AAAGTGGTTC 6000 AGACTGGAAG AAAACTTTCA CAG'rGAGGAT AACCAATAGG CTAGCAAATG ACTTGAATAA 6060 TGTCTGAAA CAGGTTGATA AAGATACTCC TAATACCCCA ACTTGGCTAA ACTCAGCTGC 6120 TCTAAAGcT AAAGATGATG ACAGACTATA TAAACTACTG AAGACTCTTA TACCAGGAGA AAATrACCTA TCATGTrAAG GATAATCAGC TAGAAGTAGA AACAL2ATAAA TACACATA'rA CTGCCGCrAG AAATGGTAGT AAGGAAGTI'G GTATrCAAGA GTCACA'rATA GCAGCAACTC TAAGTGCCGA TGAATATAAT TCTAATCGCC AAACTTTGA GAGAGAATAC AAATACAAAA GcAAATGccc ?TAATAATGG TTGGGCTAGA TCTGGTrTCG AAGAGTTCAA AAACTTCTCC CACT1'TGTAG GGGTAGACAA TCTGATAAGA TTAGGAAAGA TTCTCTACTG GGGATGTTCT TCAGAGAATA ATGAAAGAGT CCAAT'rCTAG CTGACTTTTC GAAAAAATCA TrCCAAAACT ATCAAGAAAA AGAAAAAAAC CGCTATGT'rG ATGAACAAGG AAAGAAAGTG ACGGAACCTA GTCACAGATA AAAAACTCAG AGGGATTGTG CGAACGAATG TACTGACTGG, TAAAAAACTA AGTGGGCTCT GGAGATAGCA AACTAGGAAA AGGCCGCTAT AT'rAGGAAAA GATGrGT'rr CTATACCGT ACAAGTATTT AGGAGTAAAC ACTCAAAGTC ACCGTGTTC-A GTATAATCrC AGTCATCCAA GATACTGTGG AACCATCACG AACCG'rrGTT AAATATTCCC GAAGAAGAGA AAGGAAAAT AACCGAAGAA CTCAGAATTG GCAGAACTAA TCTCAGAAAA TGTGAAAG'TT GCGTTTGCTA TCATrGAAAA A'rCATACTGG AATTGGAGAA CATTACCAAT AAAAAACAAC TGATTGGTAC CAGCTATAAT TAGCATGACT ACTACTGACG GAAAATATTA TACTTTTAAA a.
6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860
GAAGCAGATA CAAATTCTGC AAGTTTAACT GGGAATATTG TAAGCGAAGG TAGAACAGTG ACCrTAGTT-r ATAGAGAAAG CGAAGCGCCA ACCACTGCTA CAGTAACAGC CAArTACTAT AAAGAAGGTA GGCAAGAGAA GTTGGTAGAG TCTGTTATAA AAGC-GAT'rT AGCGATAGGT TCTGAGTATA CCACAGA.ATC A).AAACTATT GAAGGGAAAA CAACAACTGA GGACAAAGAA GACCGAGTTA TCACAAGGAA AACAACATAC ACCTTGGTAG CAACrCCTGA AAATGCGTAC
CAGAAGACGG TGCAACAGTT GACTATTAC'r CCAAAACAGC AACCTCTACT GAGACGAAGA AAGTTACGAA CCAAAATGTA AAAGAAGATG AAACTGAGAA CAAGGTCACG GGAGTTGTAA ACGAGGTTAT ATCTGGTAAG ATTGACAAGT CACAAGAAGT TACGTCAGAC TCTACTGATA
ACCGTGAGAA
CTATAACGCG
TTGTTCAACC
CCTACGGTGA
ACAAAGATCC
AAGAAATAAC
ATCCAGAGCA
TGTTGAGGAA ACAGTGGTTC TATCATTCAT TACGTTGATA TGTAACCTTA AGCCGTACAA ATGGACAACA GGAAACTGGG AGATATTCCA ACAGTTGALAT GGTAAGGTAT GACCGTTTAT TCCAAGTGTT CCGACACCAA CAACACCAGA AAAACCAATC CCACAACCAA ACCCAGAAcT ArCCAAATcAA GAG.ACTCCAA CTCCAAAAAC TGAAACTCCA GTGAATCCAG CACCAGATAA ACCAACTCCA GAACCAGGTA ACCCAGAAGT TCCGACTTAT GAGACAGGTA AGAGAGAGGA ATrGCCAAAC ACALGGTACAG AAGCrAATGC 'rACCTrrGGCT AGTGC'rGGTA TC.ATGACCTT GTTAGCTGGT CTAGGATTAG GATTTTTCAA GAAAAAAGAA GATGAAAAAT AATMAATTT AGAATCTAGG AACCAGGAAA AGCTrCAC.AGA TGTGGGCr'r TTrCcTGG=r 'rTGAGAACCA GGTCTrTCGT AAAGAATAAA AACGCTITACA ACTCTGTTGA ACTGGGAAAC TATGAA'rCCT ATT~TTTAA AAATATTTCC ACAAAPCAGT TGCGG
7920 7980 8040 8100 8145
INFORMATION FOR SEQ ID NO: 123:
SEQUENCE CHARACTERISTICS:
LENGTH: 8697 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:
CCGTACCGGG AACGATACTT AGTCTAA'rrr TGCACCTTTT CCATGTATCG TAAAGGTTTT
4 4 TCI'TTTTrA AAAAGGAAAA CGAGAAGAGG AGGTrCTTAT GAAAGCAAGC AAGTTTTACC CCTAGTACAG GGGATrGATC, ATCTGCAWAC TCAAGAAGTG ACGATGGTAG AGTTTGATGA GCTTATGCGC ATTCTAAAAG ACAATGTCTT TGCCAATGTC AAAATAAATG TTGAGAAGTA TACTGAGACC ACACA'rTAGT TGGCAGTTAG CAGGTTTTCT TAAACTTCTC ATTCTCCAGC CCTTTGT'rCG TGACAGAGAA AGAGTGGCTT TACTGGGGCT GA'rrTTGGGA ATGGATAGTr TGACTTGGCT CAATGACCTG ATTCCGACCA 'ITGCCATAGC TCCTATCCTG AAGATTGTCT TGATTATCTr AACGACAACC TTTAGGCATT GCGACAAGGA TATGCTGACC GGATAGCTGT TATTrGATCAG TGACACCATT TGAAACGGTC AAGCGCTCGA AGTGGC.AGGG TAGGAGAGAT TTTAAGTATT CTATTGGGCT TTCTCGGAGT CCCAAGTrTA TCCTGCCGAC TTTCTCTGGC ACCATAGCTG GTrTTGATTG CC-rarCTTA'r A'N'TACCCTA TGATGGTGGT G'rCTrGTGGC TAGG'ITATGG T'IrCCCATCA TCGT'rAGTAT ATrGCCTTGC 120 GTCATTGCTT 180 TTGGAAGGGG 240 CAGGAGGCAG 300 GATGAGAAAC 360 ATTGTCAATC 420 ACCTCTTGAA 480 GGCGACCTTG 540 GGCTGTGCTC 600 CATTCAGACC 660 GATTTTGCCC 720 TTTGGACGGT 780 TTGT1TTAGTC TGATGCGGGC CAAGCCTTGG :dAAATCCTGT GGCATTTTAA AATCCCAGTr AGCCTGCCTT ACTTTTATGC AGGACGA GTCAGTGTCT CCTACGCTT TATCACAACT GTGGTATCTG GGTCTTGQTG ='TATATGAT TCAGTCTAAA AAACTGTTI'C AGTGGTrGGG AGGTTTTGAA AGTATGATAC CATGTTTGCC 840 900 960 1020 1080 1140 ATTATTATTC TGGTGTCGAT TATCAGTCTT TTGGGTATGA AGCTGGTCGA TATCAGTGAA AAATATG'rGA TTAAATGGAA ACGTTCGTAG AATrAGAATG TTTCTGAAAA AGAAAAGAGG AAATCJLAAAT GAAGAAAACA TGGAAAGTCT ?rN'AACCr TGTAACAGCT CTTGTAGCTG TTGTGCTTGT GGCCI'GTGGT CAAG-GAACTG CTTCTAAAGA CAACAAAGAG GCAGAACTTA AGAAGG'rlGA CTTTATCCTA GACI'GGACAC CAAATACCAA CCACACAGGG CThTGI-rG CCAAGGAAAA AGGTTATTC AAALGAAGCTG GAGTGGATGT TGATTTGAAA TTGCCACCAG AAGAAAGTflC 'rTCTGACTTG GTTATCAACG GAAAGGCACC ATIrrGCAGTG TATITCCAAG ACTACATCGC TAAGAAATTG GAAAAAGGAG CAGGAATCAC TGCCG~rGCA GCTAITGTTG AACACAATAC ATCACGAATC ATCTCTCGTA AATCTGA'rAA TGTAAGCAGT CCAAAAGACT TGGT'rGGTAA GAAATATGGG ACATGGAATG ACCCAACTGA ACTTGCTATG TTGAAAACCT TGGTAGAATC TCAAGGTGGA GACTTTGAGA AGGT1'GAAAA AGTACCAAAT AACGACTCA.A AC3'CAATCAC ACCGATrGCCC AATGGCGTCT TrGATACTGC 'rTGGATTrrAC TACGGGGC ATGGTATCCT TGCTAAATCT CAAGGTG'rAG ATGCTAACTT 
CATGTACTTG AAAGACTATG TCAAGGAGTT TGACrACTAT TCACC-AGTTA TCATCGCAAA CAACCAC3'AT CTGAAAGATA ACAAAGAAGA AGcTcrcAAA GTcATcCr.A CCATCAAAAA AGGCTACCAA TATGCCATCC AACATCCAGA AGAAGCTCCA GATATTCTCA TCAAGA.ATGC GTGACTTTGT CATCGAATCT CAAAAATACT TGTCAAAAGA AATCGGG'CA ATTTGACGCA GC'rCGCTGGA ATGCTTTCTA GTATCCTTAA AGAAGACTTG ACAGACAAAG GCTTCACCAA AGAAATTAGA CTAGAGCACG TrCAG~rATGC CTATrGGTCAG CAACCTACAG GTGACr'rCAG GCCAAGTGT~ TTCCATCCTA GACCACCCTC TTAATCrAA TCGCTGGGAT 'ITAGAAGTT TGATGGTGAA GAAAATCCCA AGGGGCGCGT GAGT'rATATC CGAGCACAAG ACGGTGCTTG GAAATATCAT TCTGCCCCTC GGCACAAGCT ATTTCCCGAG CGGATAAAAT TCTTGCGACC AGACAAGTA'r CCTCATGAAC TTAGCGGTG GATGCGCCAG CTACCTrT GGGCACAAGC 'rC'TrrCTCTT AGATGAGCCC GACAAAGATG GAACTCCACG CTTGGTA'rCT TGAGATTCAC CCTGATCATC ACGCATAGTA TTGAGGAGGC CCTCAArC'rC GAAAAATCCC CCTGGGCAGA TTGTTTCAGA AATTAAACTA CAAGGAAGTC CAAAAGATTG CCTACAAACG TCAAATTrrG ACCTGAACTC AAGGAAAAAC ATACCCA-'AGC GACAACGAAA CAAATG4GGkr AAAGAAAATC CG;ATTTGTG AAATAATGAC GAGGATTT TAGAGGATAT GGCCCAAGTG GTGTTGGAAA CAGTCAGGGA GAATTGTCCT TTGCAAAAGG ATCTGCTCI-r T'rGATTCAAA AGGTGGATAA 'rrCCAGCTGA CAGCTGTAAG CGTGTAGCCT TACrCCGGAC =rA~CCCCT TGGATGAGAT AAGCAGTTGC AGCTAACAAC AGCGACCGTA TCTATATCTT GAT'rCGTCTG AAGATGAGGA GCGGAATTAG GCTTAGATAA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980.
2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 860 GTAGAAAAAT AGGGAGTTGG TGAAGATTAT CCTTTACCAG CGCCC=~ CTTTAAAAA TGAGAAAATT 'rCGGTATAAT AGTCAAACAA GGTCAAGGI-r TAAAGAGAGA CGGTGGGNrG TTATGAGA?1' TAAAAATACA TCGGATCATA 7rGAGGCCTA CATCAACGCG A=M~AGATC AATCTCG;TAT CGTGGAG 'rG CAACGGAGTC AGTTGGCAGA TACCTTTrCAG G?1'GTTCCTA CTCAGATTAA CTACGTGATC AAGACACGCT TTACGGAAAG- 'rXCAGGCTAC TTGGTGA;A GTAACCGTGG TGGCCGAGGC TACATTCGTA TAGGACGGAT TGAGmrCT AGTCATCATG AAATGCTCCG CC-AGCTGCTT TACT'CCATTG GTGAGCGAGT CAGTCAACAA ATT'rATGAGG ATATTcTccA GcTTTTGGTT- GAGCAGGAAT 1'GATGACCA.A CCAGGAGLATG AATTTGCTAG AATCAGTAGC ITrrGGATCCC GTTTTAGGAG AAGALAGCTCC AG7rTrTCGA GCAAACATGC TACGTCAGAT CATACAAGAG GTAGATAGAA AAGGGAAGTA AGATGAACTA TTCAAAAGCA TTGAATGAAT GTATCGAAAG TGCCTACATG GTTGCTGGAC ATT?1'GGACC TCGTTATCTA
GAGTCGTGGC
T'TAAATGATT
ACGGACTATA
ACTTGTTGAT TGCCATGTCT AATCACAGTT ATCCGTATGA GATGGACCGT TTAGAAGAGG
GCCAGGATGA
C7=rTGATG AAGCAGAGTA CACGTCCTCT ATGCGAT GCTGGTTTTT cTTArAAGA TTAGAAGAAC GGGCAGGCTG ACAGTAGCTG ACAAGCAAAA CGTGGTCTCG AGGATTATAC CCAGTCATCG GTCGGGACAA AACCTTTACG GAATTGCCGT TGTAGCGTC-A GTGGTCCATG GCQNTGATAGC AATGCCTTGG CAAGAAAGAT CAGGTCAAGA GACTCGTGAA GATCTCAAGG TTCTATGGCC AATATCATGG GCATGATTTG ACAGAGCAAG ATAGTGTAGC AGGCGCAACT TGGC-TTGCA ACTCACTGAA TCTCCCGTCG TTTGCAGGTT CTAAGGTACT AGGOACAGAG CGACTCGTAT CTTGGAGAGG TTGC-TGCTCT TCGTCGAAAT CTT'TACGCCA ACGCCATCGT GCATGCCGCA GACTCCTAGT CGCGTTCTrGG CAAGTTAGAA 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 GGAAATCTCA CGTATGATTC AAATCTTGAG CCGGAAGACT A.AGAACAACC CTG=CTTGGT TGGGGATGCT GGTGTCGGG;A AAACAGCTCT GGCGC'rTGGT CTTGCCCAGC GTATTGCTAG 'rGGTCACGTG CCTGCGGAAA TCGCTAAGAT GCGCGTGTTA GAACTT'GATT TGATGAATGT CGTTGCAGGG ACACGCTTCC GTGCTGACTT TGAAGAACGC ATCGAATAATA TCATCA.AGGA TATTGAAGAA GATGGCCAAG TCATCCTCTT TATCGATGAA -CTCCACACCA TCATGGGTTC TGGTAGCGGG ATTGATTCGA CTCTGGATGC GGCCAATATC TTGAAACCAG CCTTGGCGCG TGGAACTTTG AGAACGGTTG GTGCCACTAC TCAGGAAGAA TATCAAAAAC -ATATCGAAAA AGATGCGGCA CTCTCGTC GTTTCGCTAA AGTGACGATT GAAGAACCAA GTGTGGCAGA TAGTA'rGACT ATr'rTACAAG ='-TGAAGGC GACTTA'rGAG AAACATCACC GTGTACAAAT CACAGATGAA GCCGTTGAAA CAGCGGTTAA GATGGCTCAT 861 CGTTATr'rAA CCAGTCGTCA CTTGCCACAC TCTGCTATCG ATCTrGGA TCACGCGGCA GCAACAGTGC AAAATAAGGC AAAGCATGTA AAAGCAGACG ATrrcAcATTT GAGTCCACCT GACAACGCCC TGATGGATGG CAAGTGGAAA CAGGCAGCCC ACCTAATCGC AAAAGAAGAG GAAGTACCTG TCTACAAAGA CTTGGTGACA GAGTCTGATA lrrGAccAc cTTGAGTcGC TTGTCAGGAA TCCCAGTTCA AAAACTGACT CAAACGGATG CTAAGAAGTA TTTAAATCTT GAAGCAGAAC 'rCCATAAACG GGTTATCGGT CAAGATCAAG C7rCAAG CA'rTAGCCGT GCCATTCGCC GCAACCAGTC AGGGATTCGC AGTC-ATAAGC GTCCGA~rGO TTCCTTTATG TTCCTACGC CTACAGGTGT CGGGAAAACT GAATTAGCCA AGGCTCTGGC AGAAGTTCTT TTTGAcGACG. 
AATCAGCCCT TATCCCTTTI GATATGAGTG AGTATATGGA GAAAT'rTGCA GCTAGTCG'rC TCAACGGAGC 'rCCTCCAGGC TATGTACCAT ATGAAGAAGG TGGGGAGTTG ACAGAGAAGG 'IrCGCAATrAA ACCCTATTCC GT'rCTCCTCT TTGATGAGGT AGAGAAGGCc CACCCAGATA TC?11TAATGT TCTCTTGCAG GTTCTCATG ACGGTGTCTT GACAGATAGC AAGGGACGCA AGGTCGATTT TrCAAATACC ATTATCATTA TGACATCGAA TCTAGGTCC ACTGCCCTTC GTGATGATAA GACTGTTGGT TTTGGGGCTA AGGATATTCG TTNTGACCAG GAAAATATGG AAAAACGCA'r GT7rGAAGAA CTGAAAAAAG CTTATAGACC GGAA'rTCA'rC AACCGTATTG ATGAGAAGG'r GGTCTTCCAT AGCCTATCTA GTCATCATAT GCAGGAACTG GTGAAGAT'rA TGCTCAAGCC 'TTrAGTGGCA AGTTTGACTG AAAAAGGCAT TGACTTGAAA TTACAAGCTT CAGCTCTGAA ATTGTTAGCA AATCAAGGAT A'rGACCCAGA GATGGGAGCT CGCCCACTTC GCAGAACCCT GCAAACAGAA GTGGAGGACA AGT'rCGCAGA ACTTCTTCTC AAGGGAGA'rT TAGTGGCAGG CAGCACACTT AAGATTCGTG TCAA.AGCAGG CCAGTAAA.A TTTGATATTG CATAAAAGAA TA.AAAGTATC AGCATCTGAC CATA.AGTCAC ACTGGAGTGA AATTCAATGA AAATCAAAGA GCAAACTAGG CAGCTAGCCG CAGGTTGCTC AAAACACTCG TTTrGAGGTTG CAGATAGAGC TCACGTGGTT TGAAGAGATT TTCCAAGAGT ATGAAACTAA AACCTATAGC TTCTAAACGA TCCGTGGTTI' TCATCATTCA ACACAAAATT CATATGTTTA TTACCCTCCG TCGTATTTGT CTTAGAGCGT GTGTAGTAGA AAAAGACCAG TCTTATCTGA AATTTNTATT CTTTCAAAAG ACACCTGTTT CT TTTGCA TGTCAAATCC GTTCTAGCTG GTA'TTTGAAA AATCAAACTA ATATTCAATG AAAATCAAAG AACAAACTAG GAAGCTAGCC GCAGGT'rGCT CAAAACACTG TTTTCAGGr GTAGATAGAG CTGACGTCCT TTGAALGAGAT TTTCGAACAG TATAAGCTGC AAGATGAATG ATTTCTTGT ATTGACGTTG TTGTTGACAA 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 862 AAAGTAGCGG ATAAATGAAA TCCA'ITCCAT TATCATAGAT GATAGGCTGG TAG CATTTr TCAAATAGCA TACAGGAAAT AGATGTATGG AGTTCTGGTA GTAGAAAGGG AGAGAGATGA ACA=MTAGT TGCAGATGAC GAGGAAATGA TTAGAGAAGG AATTGCAGCA TT-TCTGACAG AAGAGGGTTA TCATGTCATT ATGGCTAAGG ATGGACAAGA GGTCI'rGGAA AAATTTICAAG ATCTCCCTAT CCATCTCATG GTACTGGATT' TAATGATGCC TAC-GA6AGAGT GGTrTTTGAAG TGTTAAAAGA AATCAATCA6A AAGCACGATA TTCCTGTCAT CGTCTTGAGT GCTCTG3GGAG ATGAAACTAC TCAGTCACAG. 
GTA'rTTGATC TCTATGCTGA TGA'rATGTG ACAAAACCT'r TTrCTTGcT ACTGCTTGTC AAGCGTATTA AGGCGCT'rAT CAGACGTrAC TACGTCATAG AGGATCTTTG GCGATATCAG AAAArGAAGA AATTGATCTC A'rAAAAATCA AG=rTAAGT ATTTACCTI'G TGATAGGGTC TAGATTGTAT CGTGACTGTG TAAATTATTA ACCAAGTCTT CATTCATATA GCTATTTATT GTTTlAATGAG AGCGCAAGAG GATGTAACAG TGGATTTTAC CTCTTACAAA GCACATI'ATA AAACCAAAGG AATTACTGGT ACTAAAG'rGT TTGATTCAGC AGAGAGCAGA TATTGGAAGA AATNTCAAAA GATGTAGCTG GTTGATGTCT ATATT~CGTAC TCTTCGCAAA AAATTAGCTr AAAAATGTTG GGTATAAGAT TAGCTTATGA TAAAAAATCC TTTAAGAAG TTTTGCAATT CrAGTGGTG TTGGTC'rAGT TGACCTTTCC TTTTTATTAT ATTCA.ACTGG AGGGGGAAAA TGTTTACGGA GTATTTAAAG ACTAAGACAT CTGATGAAAT 6480 6540 6600 6660 6720 6780 6840 6900.
6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 TCCAAGCTTA CTCCAGTCrlT ATTCAAAGTC CTTGACCATA TCTGCrCACC TTAAAAGAGA TATTGTAGAT AAGCGG=TC CTC'TTGTGCA TGACTTGGAT ATTAAAGATG GAAAGCTATC AAATTATATC GTGATGTTAG ATATGTCTGT TAGTACAGCA GATGGTAAAC AGGTAACCGT GCAATNTGTT CACGGGGTCG ATGTCTACAA AGAACCAAAG AATATTTGC TTTTGTATCT CCCATATACA TTrTTTGGTTA CAATTGCTT TTCCTTTGCTT TTTTCTTATT TTTATACTAA ACCCTTGCTC AATCCTCTTT TTTACATTTC AGAAGTGACT AGTAAAATGC AAGATTTGGA TGACAATATT CCTTTTGATG AAAGTAGGAA ACATGAAGTT GGTGAAGTTG GAAAACAGAT TAATG;GTATG TATGAGCACT TG;TTGAAGGT TATTTATGAG TTGGAAAGTC GTAATGAGCA AATTGTAAAA TTrGCAAALATC AAAAGGTTTC CTTTGTCCGC GGACATCAC ATGAGTTGAA AACCCCTTTA GCCACTCTTA GAATTATCCT AG.AGAATATG CA.AAGATCAT CCAAAATATA TTGCAAAGAG TATAAATAAG ATTAGAAGAAGTACTC4AGT CTTCTAAATI' CCAArGAGTGG GACTGTTAAG CCACT=rAG TAGATATTTT ATCACGTTAT AGTGTTACA AT'rGAAAATC AATTGACAGA TGCTACCAGG
CAGCATAATA
ATTGACCAGA
ACAGAGTGTC
CAAGAATTrAG
GTCGTCATGA
TTGGAGATTA
TGAGCCACTT
GTGAGACCTT
C'TCATrCAAT GTCT'rAGGCC ATTGGATAAC GTrTTGACAA ACCTGATrAc TAATGCAATr AAATATTCAG ATAAAAATGG GCGTGTAATC ATATCCGAGC AALGATGGCTA. TCTCTCTATC AAAAATACAT GTGCGCCTCT AAGTGACCAA GAACTAGAAC ATTrATI-rGA TATATTCTAT CATrTCTAAA TCGTGACAGA TAAGGATGAA AGTTCCGGTT TGGGTCTITA CATTGTGAAT AATATTTTAG AAAGCTATCA AATGGATTAT AGTTTTCTCC CTTATGAACA CGGTATGGAA TNTAAGATTA GCrTGTAACc AGATTAGTTT TT1TATTAAAG TTCATATAGG G?1'AACATAA GTGTGNTATT CTTTGTGTAG ATAAAAGAAA GGATACTAAT ATGGTATTAG CGATTATTTT AGTAACATTC TTTATrCGAT TGATNTTTTr AAAGCGTTCG ATAGAGAATG AGAAACGAAT CC'TTAGCAAT GGCGGGG
8280 8340 8400 8460 8520 8580 8640 8697
INFORMATION FOR SEQ ID NO: 124:
SEQUENCE CHARACTERISTICS:
LENGTH: 4317 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:
AACCATACAT ACCGCAAGGC AAAGCTGACG CGGTTTGAAG AGATTTTCGA AGAGTATTAG
TTGCCTTTAA AGGCATCCAC CATCGTTTGA AATTCTrAT TTGAGAGAGT AATCCCTTTG CCCATTTTAG
TATGGTCTGG ACTCC.AAGCA CGAATATCAA ACTTTGCAGG GGCACCATTA
120 180
.0: 0 AAGCTCACAC GGTTAATTTC CTTGGTCCAA CCPTTTTCGT TCCTCTTCGA TTCAAATGT AAATTCTGCC ATTTTCTTCT TTATTCGTAA AATCTTGTAG ATT TTAG.GAA AATTTTATAT GAGGCCAATA TGAGACATAA ATTCCAGCAA GTTCTAAATA GGATATGACC AACCTGACCA GACTGAAACC AACTCCCTTA ATCCAGAAAC AAACCGCTCT TCACCTTATC TTGTCTGAGA ATCAAATATG ATCAGCAAGG CCACCAAATT ATCGTGAAAA CGGATTATCC GTATAAGCGA TATTCAACGC CTGCGATTTG GCCCAAAAAA ATAGATTTAA GAAAGAGTGA GATGTAGTTG TTAGCGAATT TGTTCAAAAT GTAAATGAAC TGCGATATGA CAAGTCACGT TGTAAACGAT AGGAAATGTA GTGTTCTGCA CAPLTAAACGA TGTTCAACCA TAGATTGAAT CATACTGATA TTTCAGAAAG AGTCAACAAG CCTTNTAG TTTCATTAGT AATATTGATA TAALAAGAAGG AAATACATGA TT?rAAAT CAGCCACTAT TGAAGAGGCT CAAGCTTTAC AGGIGACATC AT MrTCCAA AAATGTGAGC TCCCCTCAAC TGTCCAAACA CTTCATCCCA CTCTT'TN'C TCTCCATAAC CACTTCTTTC ATGGTAATGT AACCTGCGCC GTCGCACGTr CCACCTTGC 864 'ITCTGTAAA TCCAAALACTA CCTrTITACT GACTTGAGCA AGATTTGAC GCAAATCATC TCTrCAAAACA TAAACAGTTTI GCGCTGCCTT CAAGATGGCT TCCTAAATC'r TATCTGG.ATT AAATTCAGCA ATTT-CGCCAT TACGTGAT TACTTGCATA GG7=CT~ TTATTCTr-rc 'TTTTrrrGA TTrTGCCAG CAT?T=CT TCTTCTACTG TCAGTTGATA ATGTrAACGr
AAATCCGGTC
TTTTICATGGT
CCTGTGTACT
TCCTCCCAA
AATTCTCCAC
ATCCGCTCAT
AAA'rCTTCAG TGCGCTCGTA cGTTCT GGCCACTCAT CAATACATCr GAGGATATTC TAAAAGACCT TC-ACTTCTGG AATCAGGCGA CAGTCACCAC ATAGTCACCT CATAACCCTC ATAGTGCCCA CATAAGCCTG ATCAAACC AAACTCTCG;T ACAATCGCCA CTGACCAATC GGCACGACCA TGCCTCGATA ATCATACGGA GAAGAAAAAC TA'rCATCTTG GTGGCTAGAC ACTGTAGCAT CAATrCATGGT CATAGCTCC AGGGAAATCT CATCTGTTAC CAAGCGTCTTA CAGATAAAGA TTAGCTCTTC CTCTTGAGCC T'rTCCAGCAG GATCAACCAG AATAACGCGC TCGAAAATAG GTTCTGC-TCT GAGCAACATG ACATGACGGG CCTITrCAGC A=NPCTCGA TTTTCTCGAG CCTTCCAAC GATTGAGTGC GTTAAAATAT CAATCTTCAT CGTCTAACCC TGGAATATCA ACATTGAGAA CCACTGGTGG TTTTCGTTTG ACCACCCAGA CATCATTAGC 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 GGATTTT'r~CTI 'TTCAATAGC ATCAAAGGAA CCCTGACCCC CTCCGTAGGG CTCATCATCT 0 9* a.
AAATTATGAT ACTGGATATC TCCAGTGGAG AAAACATCTC
CAAGAGCCCT
TGGA.AAGAGG
TTC'1'AAGAT'r TCCACATCGA CCCGTTTrAC GATATAAGGT AAAAGCAAAT CACGTTTGCC ACCTGGTrCC AGGATTTCC1' TGATGGTTCC AACCAAGCTA TCACCCTCAT AGACTTCCAA ACCGATAATC TCGTGATAGT AAAATTCACC ATCGTCTAGG TCATTCAAAT CTTCCTCAGC GACCT'rGAGA CTGTATPCCCT TT1TAATAATG TCAAAGTTCT Tr.TACTI'C GATAGTATTG ATATGGTACA TCTGTTTACG GTGGCTAGCG ATGGTC-ACTG TATCTTGAA 2100 TTTGGACAAA 2160 CTTrCTGCAAA 2220 CGATTTTCCC 2280 CTGATCTT TCATCAAACA AAACCAGCTC AGCTCCTTTT 'rTAAACCGTT ATCCGTrCACA GACAAGACITC GCATCTCCCC CTGTAATCCC TGCGTATrAA 9b~a a.
a AACA'rTAAAG TAGTTcZATCT TGTCTCCTGT AATCTCCTTT TTTCCATCTT ATCTAACAA TTCTCGAATA ATAGCCGCAA TTTrCCGA TTCTGACCAT TGTAAATAAT GGTGATTCCC -TCCTAAAATG AG7rTAGTAT TGGAAGTCCA ArATTCTGA'r TCTCTGTACT Cr=~CTCT ATAACGCTGA CAAAAAACAA ATAGGAAT ATGAGCTTCT ATAGATACAT CCTCAAAATC TTCCTCAGCTA ATCTCTCCAG ATATCTGAAA TCTGGATCT TGAmTTrTTCCA ACTCTAAGCC TTTTTCTTGC ATTAATTCCC AGATTTrTTT ATTCGTTTCA GGACTAAATG TrTciCTAcG TAAG'rTCT'rA AAATAAAGTT CAGGACCACA CTCGTCAATC AGCCTCATCT GCTCTTCCAT TrCTGGATAA GGAT 'TTCTG AA.AAATCAGC TGCTACTAAA GTCTGACGCT TAATTGGTI-r CCAACTATGT GCACAAAGTA TATATTCAGA CGCA'rCTGCA AGATTATCTA GATTTTTCC GTrCGGAAAA TCAATTGTCA AATAACCAAT AAAAT'rN'CA TAACTrGGTA GCAAACCTGC crirGATAA GTAACAGAGA GGCTACCAAT GAAATCCACr GATTCTATAT AATGAATTAT AAACATGACT TIrrAGr TCGGTTCAAT CTCOAGTAAT 'rTGCAAGCTA, AAATTCCACT AATT'CCrAAT TCTTCAAGTA CTTCATAAAC AGCTrGGTCA TGAA1'CGGAC TCCTACCTOT TGTAGGAGGA G'rrl-TCAA GTATAAGTGA TCCGT'TAAA CA.AACTAGCA CTr'rCTrTTG TTCTGTAGAT ACI-lCAAACC TCTTCATAAA TAAAAATCCT TATCCTTTAT TCCAAGGAT'r TrCCAAGTr CTrCCAAAI-r A'rOTrTATG'r TCTTGGTAAG GTCAATATGA CCAGTTCTCT GGCAAGTTcC TTCAAAACCT CGACTACTTC GTCCGAGACT GCTGTTAATG TTTATAGTAT ATrGAXAACTA ATTTGACTGT CCTrGATCGAT ACACAAAAAA GCGAGACATC GGAGGAAOGG GACAATATCT TGG'rAATrAA TIGGCTGCGGT TTAATAGCTA CGATTGCGAC
CACTTTCCC
TTrTGTCTTTC
GGTGTAGTAA
TT'rATCACGT
TTCAATAATC
TCAAAGACAG
ATOATATCAG
TCGTTCGAA'r TCCGACTrCC
TTCGATTAAG
GAATAGTACA
TT~GTCCTGT
CGTCCCGCCC
CCTATCCCT'r C7'1=rGACC AGAGCGCCTA TCCACTAACT TC.ATAAAGAG ACCTTCATCA TAGTCTTGGA ATTGCCTAAA CGT'rCAATCT CCT?1'ACTTC TAAAACATTG TTAGAAATCG C'rTGTTTCAT TrTACrATAT CTTCTATTCC TTCTTATTr TCG'rCAATAA CGATTCTTAC AATCGTTCTr ATCGCAGAAA TAGTGCGACC TTGATCAAGA 'rrCAAATGAT ATTCCAAAAA GGCATCTGGT TGTGAAAT'rA AGGGTT'rCAC CATCTGTCAA CCTACTrTAA ACT'rATTTrG CGCCrCTrT TCAAAGGATG TTACGTACTG 2760 2820 2880 2940 3000 3060 3120 3180) 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4317 ?r'rTTTG'rAT TCAGTTOGGA CAGAGTAGAC CTTACGACCG ATTACACOAC CCACATCGCT TTCTGGTGTA TCCTCAATCT TGATAGTTAA AATCGCAATA ATGAGA'rTTT CAATCGTATC AAAATTrAGA ATCGTGGAAT TTTTTCAATA TGTCTGAAGG TTCAGC'rCCA TTAGCCAACC ATGCALAGAAC CTrCGTTTC AGCAACAAGT GGGTrGTAAG TTCCAACTGT GTGGTGAACG TGAA'rCTGCT ACGTTGATAC GGTAGAAAG GAGTCAAACG GATTTTAACT GCCATTTTTA AAGTCTCATr TGAAATAGCT GAGCTATTTA GCACATGTTC TATrATAGCA INFORMATION FOR SEQ ID NO: 125: SEQUENCE CHARACTERISTICS: LENGTH: 4881 base pairs GCCGTCTTCT TTCAAAGTTA TTCGATGAAA CGTCCGTCAC T7"=TCTTA GAACCCATAC TCTTTAATTT TTTATTTCGG OATTTCTGCC ATGTGTC TYPE: nucleic acid STIRANDBEDNESS: double TOPOLOGY: linear (xi) SEQUEN~CE DESCRIPT'ION: SEQ ID NO: 125: AATTTATTTG ACTGGAAATT G1'AGAGGGTT cTCGAAATTT cTTGAATCGT TAAAATAAGG ACAAGAGAAA ACATG'GATAT cTATATccT GTGccAAAAA AAcC-AcTGcc CTCCCCAGAC CAACCTGAGG AAAGCAGTGA TTCTTATT'rP AGCAGTTAGG AATGAATACA CGAAATCAAT TTAGCTGATT ATI'TTTGTT TTTCA.AGAAT TCATCGTAT? 
GT'rTNrGCAT TTCGTTCAAT' ACTI'TTTCGT ACGCACCTTC AGATTTCAAT TTTTCCATCA ATTCTGGAAT CGCTTTATCT GGGTcTAcAG TAccAGTG-TT GATAGCTGTA TCAAATTGTT GC.ATTGTGTT AGCAATAGCT GAGATTTCAG ATTTCACATT GTCAGTATTG AAGATAAATC CAAGCGCTGG AGATTCTTTA GCTTCTGCCA ATTCT?1'C~r AGAATTTTCG ATrTGTTGGT CTGTAACGTT TTCGTTGATG TAAAGGATCC AGTTGTTACC AGTGTrTCCAT CCACCCATGT GAGTGTTNCC TTGTAGCCA TCAAGAACGC GAACACGGTT T'rCTTTACCT TCAATTTT CCCAG?'TCTT GCCTTCTGGA CCGTAAACAA GACCGMCALA GAGTTCTGGG TTCGTATTCA AGAGGTTCAA GA11TTCCATT GAT-TTTCTT TGTTCTTAG-A TTTTCTTGA .TGAAGTTAGT AGCAAGCTGT TACCGTAGTC TGTTGAAGGT CAAAGGAAGT TAGAATTTGT GAAGAGTCTT ACTTTAGTAG TATCGCCTTC TCAAAAT'rAT r-ArATr=~AT GTTGTTTrGAG ATGACAAAGT
AATTGGTTTG
AGCTGGTCCT
ATCGCTTGTT
CA6AGTGTTCT A6AGGTCGATA
=AACTTA
ATTTGGATAT
ACTGT'rTCTT
GCGACCTCITT
TTGAAACGAG
ACGAATGGAA
CrCkATAr-CA-A TAGCAACTTG TGTTGTTTGG CTTTGTrrGGC AACACGTGAA CACGAACGAA CCAAGTATCT TTGGAATGTA GCCAGCTTCA GCAC -rCGTA ACGGTTTACA GACCG'rTTGC TACTGGGTAG ATGCTACTAC GTCTGGAGCT AAGAAGTAAC ACCTGAAATA 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 TITTTCTTTGA TTTGTI'CAA GACTGGCTCA AGAGTTTCGT TCGATACCAT ATrI'AGCA6AG GAGAGTTCCG TTGAAGGcAA AGTTTGAGA TGATGCAACr.
TTGGCTGC.AA CTGGAACAGC G'TAAATCTPA CCATTTACAG TA'rPACCCTT GATGTAAGCT GGGTCAAGTG CTPGTAAAG GTCTTTACCT TAAGCACCTT TTTGAGCATT TACAATATAG CCAGATG.ATG TGATAACTGA CATrITrCTTA TCTTTTTTGT ACAATTCTGT CAAGTCAGCG TTATCTGCAA AGGCAATATC ATAGTTTTCA CCATAGTCAC CCCAGCCAAG GTA'N'GGATA TCCAATTTGG CACCAACTTT TTCTTCAATG ATTTTGI'GG CATTTGCTAA CAATTCA'PCC AAGTTGTCTG GTTTGTCACC GATTGGTAC ATTTTGAT.AA CAGGTTTGTC ACCTGAATCA GCAGCCTT1-rT 'rGCTGTTACC TGTCAAArr ACTACACTAG CAGATGCAAA AGCATATTT T'N'ATTTTTA AACTTATAAA CAATG'rAATG.
ACTAGAAAAC TAGCCGCAGG CTGCTCAAAG GAAGTCAGTT ACATATATCT ACGGCAAGGC AAGAGTA'rrA ACTTCACACA AGGGAAGTTG ACTATTCTTT CACACCACCG ATACTCAAAC CCACAAGCAG CAAGACCTGC AGCCACAGCG TTCCAGTTTT TCATGATAAA AACTCCTr ATCITA'rACT CAATAAAAAT CAAAGAGCAA CACTGCTTTG ACGTTGTAGA TAAGACTGAC GACGTTGACG CGGrTTGAAT TTGA'rN-CG GGAACTGAGA AATC'rTATTT CTCAATAAC CI=rACAAA GTAGCG""GG AAAAATGG.AT ACAAA.ATCCC GATTCAAGG GTTGCAACCA CAACCATGGC CATACGACCT G CTTTCC GTAGAGCAAC TCCCAGTTGA CCAATCAAGC CCACCGC=T GCCAATGTAG TCCArA'TT GTTGGATTTG CATGAGCAAA TATTGCAATG GATACAAG'rT GTCACTCTTG ATGTAAAGAA GGGCGTTGAA CC-AGTCATTC CAGAAACCAA GAGCTG=TAA GAGCGTGATG GTTGCGATAC CTGGTAGTCA CAATGGCAAA CAGATrMGA AGAAAXrCC GCCCTC-ACTG GCACCATCCA TACGAGCCGA TTCTAGAATG GCTTCTGGAA TGGTCTTCTT GAAGAAGGAA CGCATCAAGA TCATGTTAAA TGGTGAGAGA AGCATTGGAA CAATCAAGGC CCAAAC-AGTG TCACCAAGCT GAAGTACACG GGTCACCATG ATATAACCrG GTACCAAACC AGCGTCAAC AACATACTGA 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 29r40 3000 3060 3120 3180 3240 3300 GAAGGACGAA GATGGTAAAG AATCTGCCGAT ACTrTAAAGGT TGTCCGTGA.A ATAGCGTAGG CATAG'TGCT TGTGATAAAG ACATTTGTCA ATGTCCCAAC TACGGTTACA AAGACAGAGA TGA.AGAGGGC TTGTAGGATT T'rATCCTTA6A ACTGTGCCAA AAACTCAAAA CCGTCTAAGC CAAATTGGGA TGGGAAGA6AG CTATAGCCGT ATTGGAGGAG GCTPTTCTCG TcT'CACTG AAATAATGAT AACGAATACA AAAGGTAGGA TACAAGACAG GGCAATCAAA CCCCAAATCA TACTGAAGAA GATATCTGCT TTCTTACTGA AGGAGTGAAT GCCGACATTA TCAA'rTTr CTTTTTTAAT TT-TCTT=rr GCCATAT'rCT CCTCCTrTCTr AGAACAAAGC TGAGTTTGGA TCG;ACTCGTC TTGCAAGCAA GTT'rGATAGG ATAACCAGAA TCAAACCAAC AACGGATTGG TAAAGACCGG CTGCTGCAGC CATACCGATA TCTOCTGTCT GAGTCAAACC ATI'AAAGACA *TATACGTCCA AAACG 'TGGT TACATTGTAA AGCTGACCAG CATTGTGTGG GATTrTGATAG AAGAGACCGA AGTCTGCCCG GAAGATAT CCGACTGCAA GGATGGTCAA TACAGTTACA AGCGGAGTCA ACTGAGGA.AT GGTTACGTTG CCAATACGT GCCACTTGCT ACCTCCGTCC ACTGTCGCTG CTTCGTAGTA GGT'rGGA'rCA ATTCCCATGA TCGTCGCA'rA GTACATGACA CTGCTATATC CAAAGCCTTT CCAAATACCT AGGAAAAGTA GGAGATAGGG CCAGATGCCC 868 AGGTCAGCGT AG2AAATTGAC 
TI-CTrTGAGA CCAAGACI'T CC.AATAGATG ATTGAACACC C;r rATCAA TATTAGGAA GGCATCTGTA AAGAAACTGA TGATAACCCA AGACAAGAAG TAAGGCAACA ACATAGAAGT T'rGAAAAATC 'rrCACCAT1-C rCTrAGAACG GAGCTCGCTG AGGATAATGG CAATCCCTAC AGATACAACT AAACCTAGAA AGATAAAGCC AAGATTGTAG AGGACACTAT TTCCTGTGAT AATAAAGGCG TCTCI'rGAAC TAAATAAGAA TCTAAAATTA TCGAGTCCGA CCCATTTACT ATTrATGATA CTATCrATGA AACCATrACT GCTCATGTGG 'rAGTCTTTGA AGGCAACCAC GTTCCCAAAT ACTGGAATGT AAAAGAATAG AATCAACCAG AGTGCCCCTG GCAA.CCAT CAAGAGAAAG ATCCAGTTGT TTT TTCATAA TTTCCTCCCT TTTTATTTTG ATATCCATCT TTGAT?.ACGA TTACATTATr AGTATACTCC TATTTGCAGG TAGAAAAAAC TCCACAAATT ATGTAGCACA TTTAAAACTI' 'rGTCCTAAAT CAATT'.rrrA TTTTATCTCT ATTAGCCCAG XAGCATCCAA CAACGGGGTA TACTGAAAAA TCTCCAGACT CTAATCTGGA GATTTTTAAT ATGTTATTAG GCGTTTGCTr 'rTAAGATTAT CAATCAACTC TGCTGCAGTA TGCTCAGAGC CTCTCAATGT TT-rGAAAAC AAAAA'IrCTT TTTTAGACTT 1-rAGGT'rAAA C1'CCTAA'TTA TATCACCACT ATCAAACAAA TGATGGCGTC ACTCTGTrAT AGGGAACTCA GCGATAGTTC TCAACTTAGC AATAACCTCT C7TTrTTCATC TCCCAAGAAC 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4881 AA.AACTGCTr TTTGAAGTTC TTTTTGAGAG T=rCAAGGA CATCCTTATC TACTG'rTTCA AGGTTTGAGT CTTTAAGAAG TTTACTTAAT TCCTTGGCTA ATTTCTTGAG TTTGATTTGC AGACTCATCT TCTCCTGCTG TTrCTrTGCC CGCTGT'TTGT CCrCCATCCT TAGTTGCTGA CTGGCTTTCC TTAATGGACT CTAGGGAAGC AATGGCA'rCT TTrCAC'TGTr'r GCAAGATATC ACGTAAACCT TGCTCTGTCA AACTATCATC TGCAAAAGCT TTAT'rAGCCT CTGCCAAAAC CAGACGTGCT GAATCTIGTGG TAGGATTCGA TACACCTGTC AATGATCTCA AAAGATTTTC TAAGGTTTGA GTCTGC'rrAC TAATACTAGA CTAAAATCAA AA.AGTATTAT ATAACAGTGA TATGAAATCA ACTAAAGAAG AAATCCAAAC CATCAAAACA CTTTTAAAAG ACTCTCGTAC AGCTAAATAT CATAAACGCC ?I'CAAATCGT TCTATTT'FGT CTGATGGGCA AATCT-rATAA AGAGATTATA GAACTTTTAT AGTAGTTTGA AATAACATGT GAACATCTCT ATCAGGAAAG TCAAAT'TAAT TTATAGAAAT ATTTTAGcAG ccAAGGTGTA cTG.TTATAGA TTcAATAcAC TATACTTGGT GGTTTAGCTC G INFORMATION FOR SEQ ID NO: 126: SEQUENCE CHARACTERISTICS: LENGTH: 13121 base pairs TYPE; nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:
AGGATCCCCG GAAAAGGAGA CTAAAAATGA AGAAAAAATT TCTAGCATTT TrGCTAA1-r TA'rrCCCAAT T1'TCTCATTA GGTATTGCCA AAGCAGAAAC GATTAAGAT'r GTrTTCGATA CCGCCTATCC ACCTTGAG TTTAAAGATT CAGATCAAAC ?1'ATAAAGGA ATrGATGTTG ACATTATTAA CAAAGTCGCT GAGATTAAAG GCrGGAACAT TCAGATGTCC TATCCTGGAT TTGACGCAGC AGTCAATGCG GTTCAAGCTG GGCAAGCCGA CGC TATCATG GCAGGGATGA CAAAGACTAA AGAACGTGAA AA)AGTCTrCA CCATGTCTGA TACTTACTAT GATACAAAAG TTGTCATTGC TACTACAAAG TCAC-ACAAAA TTAGCAAG'rA CGACCAATTA ACTGGCAAAA CCGTTGGTGT TAAAAACGGA ACTGCCGCTC AACGTTTCCT TGA.AACAATC A.AAGATAAAT
ACGGCTTTAC
GTGCCATCGA
AAGACCTCCA
AAGGAAGTAA
AAGATGGTAG
CAACTACAAC
CCAGCGATTC
TTGATATGGA
TATTAAAACA TTTACACTG GTGXI-rrAAT TGCCATGATG GATGACAAAC CTGTrATCGA TATTGAAATG GATGGTGAAG CTGTAGGAAG ATACGAGCAC CTGGrTACTG AATrTAACC-A TCTGATAAA ATrATCAAGA AATGGAC'TGC TACTCTCGCA GGAT'rAAAAG CTATTCCTGT TTC7TTTGCC CCTrTGTTT TCCAAAATTC ATTGATTAAG GCAATCGCTA AAGACCAAGG
GAACAACAGC
ATATGCCATT
TTTTGCTC
AGCCTTGTCT
TTGAGTGCTG
AACCAAGGTC
GGTGTCAAAA
GAAATGAAAA
TTCATCATCT 'rCAGCAGTGC TAAGGCTAAA TATATCATTG AAGCAACCAA TACACTGGTA TT'rrGAAATT GAAATCACCA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 120 1260 1320 1380 1440 1500 1560 ACCCTGGTTT TGATGCTGCT ATCAGTGCTG CTGGTATGTC TGTCACAGAT GCTCGTAAGG CTGCTAATAC CATTCTTGGT GTCAAAGAAT AAGGAA.AGAC AG'rCG--TGT AAAAACCGAA AAACCAAATA CGGCTACAAA ATCAAAACCT .tAAACACTGG TGCCATTGAT GcccTATGG GCCAAGGTCA AAAATTGAAA ACTCCAATCT CCGTTAAAAA AGGAGCAAAT CCAGAACTGA TTAAAGCAAA CGGTGAATTC CAAAAGATTC TCCAAGCTGG TCAAGCCGAT GGTATC.ATCG CAACTTGA CTTCT-CAGA.A TCATACTACA
CAAGCAATAT
CTGCTTCTCA
TTGCTGATGG
ATGATGAACC
CTGGAACTCC
TGCTTCTTAT GAAGATCTAA AACCTTCCTA ACAGAAAATC TTCTqrAATG TATGACAGTT 'rGTTCTCAAA TAT'rCTATCA AATCGGTGAA ACAGCCTTTG I-TGAAATGr CAACAACGGA CTTGCAAACC TTGACAAATA CCTACCTAGC GAATCTTCAA CTGCTrCAAC AAGTACTGTT GACGAAACAA CGCTCTGGGG CTTGCTTCAA AACAACTACA 870 AACAACTCCTr TAGCGTTr GGTATCACTC TTGCI'CTAGC TCTrATC1'CA rr=CATTG CC-ATTGTCAT CGGAATTATC TTCGGTATG'r TTAGCGTTAG CCCATACAAA TCTCTTCGC'.
TCATCTCTGA GATTTCGTT GAcGTrATTC GTGGTATTcc ATTGATGATT cTGcAGccT TCATCTTCTG GGGAATTCCA AACTTCATCC AGTCrAT'rCAC AGGCCAACAA AGCCCAAfI-rA ACGAC=rGT AGCTGGAACC ATTGCCCTCT CACTCAATrGC GGCTGCT-rAT ATCGCTGAAA TCGTTCGTGG TGGTA'rTCAG GCCGTTCCAG 'rTGGCCAAAT GGAAGCCAGC CGAAGCTTG40 GTrATCrCTTA TGGAAAAACC ATGCGTAAGA TrATCTTGCC ACAAGCAACT AAATTGA'rCT CTCTTAAAGA TACAACTATC GTA'rTGCTA
TGCCAAACI' TGTCAACCAA ?I'CG'rTATCG TCGGI'TGGT TGAACTCTTC CAAACTG'rA TCAAGATGTA TGCAATCCTT GCTATCTTC'r TAGCGAAACG CT'rAGAAA.AG AGGAT'rCCTT TTTACACAAG CACTATGGAA AAAATGAAGT AGGAGATGTT GTTTGTATCA TCGGTCCTTC CCTCAATCTT TTAGAAGAAG TC-ACTAGCGG TGAAAAAACA ACCAATGTTG ACCACGTCCG CAACCTCTTC CCTCATATGT CTGTATTGGA GTTGATGACT AAGGAAGAAG CTGAGGAATT AGCAGATAAA GCTAATGCCA ATCCAGATAG CATCGCTCGT GGCCTAGCAA TGAATCCAGA
AGATATCAT
ATCTrGTAAT
AATGGCAAAA
CCTAAAAGGA
AGGTTCTGGT
TCACATCACT
TGAAXATATC
CAACATCACC
GGGAATGGAG
CCTATCAGGT
CATCATGCTC
TTAAAAATTG
ATTACCACTA
AAGTCAACTT
GTGAACGGCT
GGrCATGGTAT
TTTGCTCCTA
TTrC-cTAAA
GGTCAAAAAC
TTCGATGAAC
AAGGAATTGG
CGTCAGGT'rG
ATGTAAATGA
AGTTCTATGA
TCCTCCGTAG
ATGATTTAAC
TCCAACACTT
TTGAGCACAA
AGGTTGGACT
AACGTGTGGC
CAACTTCTCC
CTGAGCAAGG
CCA.ACCGCGT
TGCTCGTAAC TACCAAAGTT TATCACACT'r TTGAC'rAGAC 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 29-40 3000 3060 3120 3180 3240 3300 3360 CCT'rGACCCT GAGATGGTTG CATGACCATG ATTATCGTAA TATCTTTACT GCAGATGGCG CCCACAACAC CCTCGTCTGA TGTAAGGA'rT TCCTTGCAGT .TATGTTAGAA TTAAGTTTAT AATAGAAATT AGGTAGCTAG GAGACCTGGC TAAACGCAAG TTccAGccA-cTTrccGT GAGACGTACT TAACGTTATG CCCATGAGAT GGGATTTGCT AGTTCCTTGA AGACGGAACA AAGACTTCTT AGATAAGGTC TTTTrCTACCT CGTATTCGAA GAAATGAGGT T'rCCTCATAC CCTGACCAAA 'rCTT'rGATAA TTAAACGTCT AAACTCAAAC TTT=GATTT TTCGGAAAAT CTAGCAAGAC TAGGAATAAA ATGTrCATCTA AGGTTAT'rGT TACAATTTTC GGTGCGAGTG CTC'rACCCTT CCCTI'TTAG ACTATATCAA TCCGGCAATC ATTGGAACTG CCCGTAGACC TTGGAGTAAG GAATATTTTG AATCTGTAGT TGTCGAGTCC ATCC?1'GATT TGGCAGATAG TACCGAGCAA CCCAAGAAT TTGCTAGCCA CTTCTACTAT CAALAGCCATG ATGTCAATGA T'rCCGAACAT TATATTGCT TCTCAATT ACAAGCTGAG TCTTGTCTAT GGCACCrCAG TTGCGATGG CAAAGTT CTTAATGAAA AATACCAAGC TGAACACAAT AAGCTCT~c 'rTCTTrGGAA CCATTGCCAA ACACCTCAAA TCTGAAAACA GAGCGCTTCA TCG~rGAAAA ACCAITTGG ACAGATTrACO CAATGCAG AAGiTTGAAT GACcGAACTC.C TA(CCAACATT TGACCAAGAA CAAATrCC GTATCGACCA TTATCTTGGT AAG4GAAATGA TCCAAAGCAT CTTTGCAGTT CGCTTTGCAA ACTTGATT TGAAAACGTT TGGAACAAGG ATTTTATCCA CAATGTTCAA AI'TACCTTTG CGGAGCGCTT GGGTGTAGAA GAACGTGGTG GCTACTATGA CCAATCCCGT GCCCTCCGTG ACATGGTCCA AAACCACACT CTACAACTTC 'r-I-CGCTCCT CGCCATGGAC AAACCAGCAA GCTTCACAAA AGACGAGATT CG.TGCTGAAA AGATTAAGG'r CTTTAAAAAC CTCTrATCATC cAAcTGATGA AGAAcTcAAA GAAcACTTTA TCCGTGGGCA ATACCGCTCT GG.TAAGATTG 0 ft.
0 ft 0 00 09 0 ATGG.CATGAA ATACATCTCT TATCGTAGCG AAACCTTTAC ATCTGGTGCC TTCTTTGTAG r =TCCGTAC AGGTAAACGA CTGACTGAAA AAATGGATTC TATCTTTGGA GAACCACTTG CAACAGAAGG CTTCTCTCTT AGCCTAAATG CI'CCTAACTC ACTTGATTAC CGTAC.AGArG ACGAAAAATT GATT'rATGAT GTCCTrAAATA AAGTTTGTGC G'rCATGGAAG TTGAT'rGACC CCCCACTTCA TGACTATAAA cCCGGAAGCA AAAAA'rTCGG TGCCAAATGG ACTTGGCAAC AGCCAAA'rGT GAATCCAGA-A TCAACAACTG ACAGCGATCG ATTCCGTGGT GTTrCCTTT-Cr AAGGAACTCA TGTCAACATC GTCTTAAAC CTCCAAATAT TTTGACCATC TATATTCAAC GGAAGCAACT AGGAGAAGAA TTTAACT'rGG CGACTGCAAC TGGTGCT'rCT CCAGAACCAT ACAACTCAAC TAACTTTAGC CACTGGGATG GTATTGAAAA GCTCrGGGCT GAAZAATGGTG TGGAcCTCA AGCCAGCTTT GACCTAcTr CAGATATCAC CTATCGTCAA GATGGTCGCT 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 0~ 0 0 0**0 0 0000 900t *00.
*0 00 0 0 TAGAATAAAA AAATTTCCTG CAAGTTTATG Cc'rTGCAGGA TTTT'TCC TCATTAGA'II AAACCTTCCA AGAGACCTTT CATAAAGTTT TCTGAGTTAA ACTCTCCAAT ATCATCGATTI TTTTCACCAA AACCAATCAA TTTTACAGGA ATATTGAGTT CTTCACGAAT GGCTAGAACC ACACCTCCTC GAGCAGTTCC ATCAATCTTA GTCAAAACAA TTCCCG'ITAA AGGTGTGATT TTCGAAAATT CTTGGCCTG TACTAGGGCA TT'rTGACCTG 7rGATGCATC AAGTGCCAAG AAGGTTTCAT GTGGTGCTTC TGGCACAACA CGTTTGATAA TACGACCAA'r CTT'rTCCAAC TCAGCCATAA GGTTATCCTT ATTTTGCAGA CGACCAGCAG TATCAATCAT GAGAATATCG ATACCTTCAG TCACGGCACG TTCCATACCA TCAAAGACCA CGCTGG3CTGG ATCAGCTTTT TCAGGTCCAG TTACTACTGG AACATCTACT CGTCGGCCCC ATTCAGCTAG CTGAGCTACT 872 GCACCCCGCAC GGAAGGTATC 'rGCTGCAACC ACCATCACCT TCTTACCAGC TTGTTTGTAG CGGTGGGCTA GTTTCCGAT AQAATTTT TTCCCAACAC CATTCACACC AACAAAGAC ATAACTGTCA AGTTATCTTG GAAGTGCGATG CTrl'CATCGT AGCTACCATC CTTT'TCATAA AGCTCAACCA AfZTTCTCAAT G.ATGACACGA CGAAGTACAT CAGGrC~r GGCATC~rA AGCTTWGcvr CGTAACGTAG TTCCTCCCrr AAGTTAGAAG C=tTGGAC ACCAACATCA CTCATAATCA GGAG'rrCTTC CAGTTCCTCG GrAAAAAGG C-ATTCAAGCG GGCACCGAA6A TATTTrCCT GAACAG=TC TTCTGtTTGrA TTI-TCT'CTA CAGTTCC'TTC GTGCTCALAGC GGTAA'rTCTT CTATNrCTTC TTGAGAAACC TCAACTGTGT CTTGGA'rrTC CTCrrCTTGG GCTTCTTCCT GAGAAACTTC CTCAACTT1C1' TCAAGAT-Tr CCACAGCTTC 7T1rACAACr AATAGACGGT CAAACAATCC CATATCTTAG CAGCCGACAG CT-rCCTCATC GTTGGTCATC TCAGGAACAG CGTTTGCAT AGCAACACCA TTGGCCTCCT CACCACAAGC CATCACTTGA T'rGCCAAAC CTGTTGCTTT- ATGAACATTC GATTTAAAGA TrCATATTG GTCAAACAAT AACACAGCTT GTTCAACAAT G'rCAAGGTAG GATCAACATC TCTTCCAT-1r TAGGTTCTT~C Tr CCTTTA GCACATATTC GGCGTCACTA CAT-TGCGGC AGACCTGCCC ATTCAATCAT CTT'rGGTCGA TTCCAAGATG TTTGGTGACC ATTCTAGCAA TCTGGAGAAA TCTTCTGAAT
TTC.AACCTCT
TT-CAGACAAA
TrTTT'rrCCG
TTCGATAGCC
TGCCT'rTACT
AGAGAGGTCA
G=TATTAGT
CAT'PTCACGT
GGCTGCATCC
AGATA6AGTCT
ACTTTCTTGG
'rGATAATTTC
ATCAAGGATT
A'rC?1CCTCA
ACCAGTTGTC
ACGATCCGTC
AAAAATCTT CGTrCAACAGA GCGGAAGTTA CCTGTGCGAG TT'!rCTAAG ACTGCGGTCA GGACCTTCTG GTTCAAGCAC TTCAGAATTA TTCI'CTTCCT CTGGTAAT'rC TT'CTrAGTr'r CCTACAGCTG GC-rCTGAATC CTGACTr'rCT 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 AAGGGI'CTT GAGCAAAGGC AGTCACGCAT TTGTTGTAGG TCAAAGTCCA CTGGAACAAA GGTCAAAGCr GGATTGAATT, TCCGATTGGA TrTrGATAAAC TGTTCCT'rCT GAGATGGCAT TCTGT'rrCTT CATACAAACG TGCCACATCA TCATATGAAA TCTCCTGTAT TITTTCTGAAC TAA'rCCACCA TTAAAAGTAA CCGTCAGTCC CTAACTCATG GAGAAAGAAA TCCATGGCTT AATACCACCT TGATACCACG ATCACGCGCA gCTTGCAAGG
TCATTTGACT
TGGCATAAAG
CAAGAGGCAG
AGACTGT'TTT
TGGTATACTC
TTA.AGGGACG
'rrrCCTTGcT AGCCTTTTAT CAGTAGTCAG GcC-ATrATAA-GcccrccATA TCrTTGCTAA T'rCTAAAATT CAAGGTCCCC TCCAAGTCCA TAAGCTATAA CCGACCGI-rC TCAGGTCGTG CATTTTCAGG ATGCAATCAA 'rrTTATATCT CTTATGGTGA CCAATCACAG ACTACAGGA TGTCCCACAA CATGACCTGA TCCAT'rGTGA CCTGCATCAT A'rGTAAGTCA TTAAGATTGT CTrCCAAAAGC TACCAAGrMr TTTAACIAAT TCAACAATGG CCACTCCCTT ATCGACATAG TCCAGAACAA TATCAATGGA ITCAAAGCCA CI'GTCA'rGG AAGCC'rCCCC ATCTTCCAGC GTTTCTTCTG CTG;TGATATC T'rCCAAACTC CCTACT7TT 'PCAAATAGGT CTCATCAACC CTATCTAGAA GTTTArTTGAT A'rCTACATAA GGTGAAGTTT' CACGAGACAT AGTCGCTTCA TACAAGTCCT CCGCGATGAA AATAATGTCA TC-ACGAACAC CCCGACCCGA AGCTACCGCA AAGTAAATCC TGAGACGATC CATATCAAAG CGTCr-ATTCC CTACTAGTT'r AATCTCATC CTTCAATACT CTTTGAAACA CCCGATT =1 GCA'rGGTCAC CCTTAACACC AGGAACG=r TCGTTTACCC TCAAGTTGGT TGTAAATT-Ir AAAATGTCAT GGATAlw'-rC ATTATAGTGC TGACTCACTT CATATCAACC CT'rCTTACCC GTCAAGANGCA TCAGCrrT'rC AAAAGTTGCC AGATAAAAGT GACCrrGATA C'TCTACCAAA CTGCCAT~rr CAGCAAATAA 7'1r7rCTAGA GACAGAAATC
CATCTAGGAA
TTCTAAATCT
GTAGGAAACC AAGAGAGACT GGTTrCCGTCC ATATCCGTTG rTTAACTAA CTGAAAC-AAT TCCCTTACGG 'rGGGT'rACGA CCCAAAACCT TTAACATTrG TGGAATAGTC TrTGACACGAA
CGATGAACTG
CTTCATCCAG
TAATGGAGAA
ACCACCACTC
AACCCCAGCT
CATCTGCTTG
TTcCCTGAcC ATGAGArrAA GAGACTGGA'r GTCAGCAAGT CTCCTTCAGT AAGGTC-ACTT TAAAGGACTC TCATCATTCA TCTCTGTAAT TCCATAGATG CAATCAGCCG CTGCCATGGT GCTGTCCTTG TCAAACGCGGT TGAGCTAATC CGCAGCTTCC ACCTCATCCA AGATAACAAA GAGCAAGGCA AGAGCCGATA GGGCTTT1TTC TT-rGCCT GGTGGTTGGA CAGAAA'rTTC CAAAATGAGG TCAGCCTGAC CTCCACCAAA ACGAATGACC TCAAAGGTTG AT?1'AAAGCG GGTCTCAAGG AGCAGGTTTT TCGCAGACAA CAGACGGTTG TGAACTTCTr CGTAC'TGTTC GCGTATAGCC IT-rCCTAAAT CCTTAACT.TC 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 828CF 8340 8400 8460 8520 8580 8640 AATATCATCA CGTTGGCTAT AATAGCGTCT AAATTGACAG TTGCTCTGCC ACATTGAGAT GATCTGGTAC TGGTCTGTTA TTTCTTGGCT TCAGCACGAG AWCCAAATGA CTAGCAATAT GCGAATCAAA CCTTGTTGGA GAGCAATTCT GTATCAACCT TrCCTCTTGT TCAAAATCAA TTCATAACGT TTTTGCCCTT
'ITAGGAAATC
GACCCAGTGA
TTTCCAACTC ATGCCCCTTT TC1TAAAGCTT CTGTGTAGCT ATTGCACTTTG TAGATGGCGC AAGC=CCGC TAACCTr'rTC
TTTGCTTCCG
CATCCAGTTG
CA'rTTGTTTT
TCTCAAGATT
AATCCACTCT TC-AlwrCTGCT GGCGAGCCTG ACCCTCAATA TCATCCAACT CAAACTGCT TTGAG=MrG GATTCTrCCG CC'TGTTGACT ATCAATCTTr TCT-GAAGAA GGCGCTGGAT GATTGTCCAA 'IrC=CrTCCT AAGCGTTCAA T1ATCAGCAAC GCAGTTCTGT CTTAAGCAAA CGAGCTTGCG CTAGCTCTTC 874 CTG-CAAGTTT TGATACGT CTGGATGGC ATTTTTGTTA GAC=AATCT CrICAATCTC AGCTTCCAGA TI-rrCCTMT CACTG-GAGAT TGCAGCAAGA CCrTCTrGGC AGTTTTCCTT ATCCGCTTGC CAATCTCCCT CGGAAAGACG ATCTA'N'TCC TCTCTTGGA GTTTCCAAAG AGITCCAGT TCI-rCAACITr GCTGACTAGT TTGCTGAT.AA GCGAGGAACA AGCCTTGCTC CTCAATACGT GCCTGCTCTC CrrGAGATTT AATAGCTTCT AATGACTCGG TCAATCTGGC CATCTCATCT TGCAAGGTCT 'rCAAACTCCC CTCTTCTGAA CCCAAGCTTG C7TTCTTC AGCAATTTCT TTTTGTAATT G3CTCCAGTC TGGCTTCATA AAAA'rGCTGT TAT1'CTGGCG ATTGGCACCA CCTGCATAAG ACCACCTGT GCGCA.ACTCT GTCCCATCCA ATGTCACCAT ACGAACCTGA TAACGAACTr GGCGAGCTGC TGCACGCGCA TGTTCTACGG TATCAAAGAT AGCCGTCGTA GCTAGCAAGT TCTTGAAAAT GGCTTCCAGT CTAGTATCAA AAGTCACCAA CTCATCTGCC ATCCCAAGGA AACCTGGCCT 'rACAGCCATA GCATCT'rGGT TCTGACrAGA AATCGTACGC GCCTTGATAG TCGTCAAAGG AAGAAAGGTT GCACCACCGG CTCTrGTTCCG TTTAAGGAAG TCAATAGCCT TCGTTGCCGA CTCI-CATCT TCTACGATGA TATGCTGGCT ACTTGCCCCT AAGGCAATCT CTAGGGCACT TTGATA.ATAA ACTGACTGCA CCALATAATCC CACCTAGGCG ATC'TTTTTCT
TGCATAAAAG
GT'rTTTGAGA
CTGCTCCTCT
CTTGGCAGTT
TTAC'rATGAT TrCTCAGGA'r TTATCCAGAC GGTCAAAGAG 'rGCTCCTrGG CAATAGCTTG TCAAGCTCTT CCTTTTGCTG A7TrTTCCAAA
TTGGCTTTGT
GTAGTCAGCC
ACTAGCCTTC
C'TCTTTCAGC TrTTCTAGTT CTCATTCTCA ATACGGGTCA AAACCGTTCA CGTAAGAGCT AGCTTCTAAA CGATTGAGTT AGAGCTTrCT TTATCAGACT CAAACGGGCT TGTGCCTCCT CCCTAATTI-T CTTTCTAAAT 7"rTGGCCATr TCAGCCTGTA TAATTTTTCA -GGCTTTTGGT GATCTGCTTG TTTTTGAGAA ACTGGTTTGA GACATCCGCT CAATCATCTG ATCAGCATCG TTTGATTATT TTGGACTAGA TTTCTTTGCT GAGTGAATTT GTTGATTCAA GGCCACTTGC CACTAA'rCAG ACTAGTCALAG AATCTTGGCG TTC7"TTT~A AATAACTCAT CAAGAGTTCT ACATCAAAGG TCAGATGCTC TGGAGAACAC TCTTAACACC C'TrWGAGCTC TGCCCTG=~ TGAGCT'GAT AGGAAGI=T AATAATTTCT GAACCTGCTC T TAGCTA TAGCTAATTG AGCTGACGAC TATTTTrCCAA TCTTCTTGTA AAAGAGCTAC TCTGAGAA.AG CCAGCAATTC TTTCCCTCTA ACAGCAGCTAA C1'CTTATCCT CCAAAGCAGC TCGGAC1'CCA G'rTTCGATAG 'rCCATCAAAC TGCCTTGGTC AGAGTrTGAT TTTCTrCTTC TGAACCTGAG TCAACTCT'rC 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 TTCTGTCGAC TCTAGTTCAC CCTTATTr'rC CTTGATTTGA CACCAGAA CATCTAAATA AATAGCCTTA CGTTGTCCTI' CCAAGTCTAA AAAC'rTACGG GCATTCTCAG CTTGCTTCTC AAGAGG.CrrG AITrGATAT CCAACTCGTA GATAATGrCC TCTA).GCGGT CCAGA'rTATC CTGAGTTTGC TGCAGTrTAC TCTCGGTITC TTTrCTGCGA GTCTrGTATTr'r AAAACTCC AGCAGCICT TC.AAAAATAG CTrCCCTrGG GAAATAATAG ATGAATATCA CGCAGACGGA ATAGACATGG CGT-rCCACCC ATCCAGAGTC ACAACTACAG GATGA'rATCC GGCATCTrGC ACGCAGACTT TCTGTAATAT ACCTTGGTC-A AAAACGACCT TTCCTTTAAA TACATGAATC CrAATTTC'r TAGAACGACC ACATCAAAAA CCTTATCGTrG ACATCACCAT TGACCTGAAG AACTCGCCTG CTCAACCTr ACATCCTTAT CCAAAAGAAG TCACGATTGC GACCACC1'GA TGGCAATCAC GCGCAAAACC GATAGGTCAC CTTCAGGCTT CTCGTCGT1'C
AGAAGGAATC
CTrTCTTGCC TGAr'rrCTrG AAGCATAATr
CCCCACGGAG
TGGACTTTCC
CTCAGCCI-rG 'rCGTCCCAA'r GTCAATCTTrG
ACCTGCATICC
GAGCGGTTflG GAATrAAAAA TCTCCTCAAC CCAGTATCCA AGAAGAGGTC TATrCGCTAT CTCCACTACG TrGArAAATC CGTCATGATr CGACrCGG TTCCAGCAAA ACTC-rTGACA CrAC-ACTCCC CCAAAGCCCA AGATCCATTG GGTCCAACAA CTGCCGTrCAC TGG'rCTTATC AGCAAAAGAC CAGCCCCTTC TCAACGGCAT TTGGCCTTGA CCCATGCTCT AGC-AGGCCCT GTTCAGAAAA CAACTCTTCG AGATGGGTTT AGGAATCATG ACN'GATAGA GGCACCAAGA AAGGCTrCAA TTTTTCr'rCC CCTTrACCCA AGCTAAACTC TCCTCACGGA TTTAGGATAT TTrTTATATA TTGAACCCCr ?TTTGccAGC TACCTTrCAAC
TC.ACCTGATA
TATAGTCTGT
TAAA=CCTT
AGGCATCACC
ACr'rGATAAA
CAATCATAGC
GATATT'C'rGA
GTGAAATITTT
CTAAC7'=
GAATTTCGAT
TrCCTGCTCT
AAGAACTTCT
ACGAArAGCCC
AATCATCTCA
GACCTTGGCC
AAGAATGGTG
CTGGTCAAAC
ACGGAGTTT
AATC.AATAAC
TAAGAGGCGG
GTCTGCAAAT
CTCTTTCTAA
10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 TGTAGAACAG CGTCTCCTAA AAATrrCCAAG CGTTCA'rrGT TGCTCATrGG CATAACTCGT ATGAGTAAAG GCAGT1TTCCA TCGA'rTGCAA AATGATTCTT TAGTACAGTT TGTAATT~CTT TCATACCAAC
CTGATAATAG
TGGAAAGCAT
GGATAAAGA
ATCGAACCCC
ACTTGT'rTTA
CTTAGTGGGA
TAAGGTCAGC
TCCTTTTTAT TATATCAAAA AAAGCCCCCT. GAGT-CACTCT AAAACGGGAC TTCGGAATTC TTTAGACAGA GATTCTCAGT TTTAGCCCA AATTTGGGTC AAAAAGCCCT ATAAAGGCT TTTTAGGATG TTTACATCCA CCCTGAGGGA CATCTCAAGA ACCGGAATCT TACGTGATAT CCA'rTACACT AAGGGTGGAA TTATAACAGA AAT'rTGCTCT AATAACAAGT T'rTTCCGTCA AAGACCCCCT AGCATCCCCA TTCCAGATGG AGTTrCAC GA'rCACATAA TCAACGTGTT AACCrACGT CC-ACCTGCAT AAGAAATAGC ACTTTGAAGG TCTTGTTCCA 876 TCTCAGTTAA AGTGTCTTGC AGATGACCTr TAGCAGGAAG CAAGATACGT TGCCTTCCA CATTTTTGTA AGCACCI1rT TGATATTGTG AGGCTGA.ACC CACCATCG.AC TrCAATCGTT TTCCCTGGAC TTTCA.ATGTG TCATGATCAT GCTAGCACCG AAGCGGATAG ACTrAGCXAT CTCCATCAGC GATAATCGGT TTACGCGCAG CCTTrGGCAC.A GCCAACCACC TGTACCAAAA CCAGTCTTAA CCTTG=GAT TTrCCGACcTr AGTAGCATCC GCACCAGCAT 'r~rCCAATTC CCACATTTCC AGCAATGACA AAGGTATCrG GCAAT'rCTTT AAATCACGCT ATCCGCATGA CCATGAGCAA TATCAATAGT CCTTGAGCTG GCTAACAAAA 'rCATACTCAT AATCCTTAAC TG.AGCCCTrG ATTGTGCATT CGI-rTAATAA AAGGAATGCG GC.ATAATGTA GAAGTAACCA CCTTTAGCCA GT'rGCTCTGC TCTGCATATr CGCTGGCACA ACAGGTAGTr TAAAGGTGTG ATAATATrCT ?1'GAACTGTr TCCTGCAAAG AGGGAACCAA ATCACCGTGA GTACGAATTC CCAGCGTAGA GCAGCCAACT ACAAACCTTA CCAGGACCGA ACGCACAGCT TCTGGTGTrC CTTGATGTGT TGAATCATAG GATATACTCA GGAGTATCAG ACCGACAGAG ATAGAAGCAA TCCTrGCCTCA TCAAAACGGr TACAr-rTrCA TCCAAAATCG ATTTCCTAAA GTGACACTTG
TATCCGCT rC TGC.ACGGCTT Tr'AATGACAC ATTTATPrGG AATCAATTGA ATATCTTCGT AATCAAAAAT TGGAAATTwCA TTTAACZATAT CGATGTCTCG TTTCTT rGT AATGACCTAC CTATGCTCTr GCATC-ACTAC GCCTTTTCCG ACGTTTCCTC; G
INFORMATION FOR SEQ ID NO: 127:
SEQUENCE CHARACTERISTICS:
LENGTH: 9578 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:
CCGAATOCAA TGTTTACGGT TGAACTTGAA AATGGACATC AGAT'rrTAGC AACAGTTTCT GGTAAAATTC GTAAAAACTA TATTCGTA'IT TTAGCGGGAG ATCGTGTTAC TGTCGAAATG AGTCCATATG ACTTGACACG TGGACGTATC ACTTACCGCT TTAILATAATC GAAAAACTTG GAGGGATAAG AAATGAAAGT AAGACCATCG GTCAAACCAA TTTGCGA.ATA CTGTAAAGTT AT'rCGTCGTA ATGGTCGTGT TATGGTAATT TGCCCAGCAA ATCCAAAACA CAAACAACGT C.AAGG.ATAAG ATAGAAAGGA G;AAAACATGG CTCGTATTGC TGGAGTTGAT ATTCCAAATG ACAAACGCGT AGTAATCTCA TTGACTTATG 'TATGGTAT CGGACTTGCA ACATCTAAGA AAATTTTGGC TGCTGCTGGA ATCTCAGAAG ATGT'rCGTGT ACGTG.ATCTr ACATCAGATC 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13121 120 180 240 300 360 AAGAAGATGC TATCCGTCOT GAAGTGGATG CAATCAAAGT TGAAGGTGAC CTTCGTCGTG AAGTAAACTr GAACATCAAA CGNTTGATGG AAATCGGTTC ATACCGTGGT ATCCGTCACC GTCGTGCACT TCC7GTCCT GGACAAAACA CTAAAAACAA CGCCCCCACT CG;TAAAGGTA AAGCTGTTGC GATT-GCTCGT AAGAAAAAAT AATATAGGAG CTAAAAGTCT 'rGGCTAAACC AACACGTAAA CGTCGTGTGA AAAACAATAT CCAATCTCGT ATTGCTCATA TTC-ACCCTAC ATTrTAATAAC ACTATTCTA TGAT'rACTGA TGTGCATGGT AATGCAATTG CTTGG'rCATC AGCTGGTGCT CTTGG=ICA AAGGT'rCTCG TAAATCTACA CCA'rTCCTG CTCAAATGGC TTCTGAAGCT GCTGCTAAAT AAAAGGTCCA GGTTCTCCTC AGTAACAGCA ATTCGTGATG TCGCCGTGTA TAATCATCC
CGAGTTTGAA
CTGCAC?.AGA ACACCGTCTT AAATCAGT G AAGTTACTGT GTGAGTCAGC TATTCGTC CTTGCTGCCG CTGGTCTTGA TGACTCCAGT GCCACACAAT GCTGCTCGTC CTCCAAAACC ATTACACTGC T'rTTCGI-rTA AGAGGGAGTA ACTAAATGAT TA.ACAAAAAT TGATGAAAAT AAAGATTATG GCAACTTTGT GCTACGGTAC AACTCTTCGT ACTCTC'rC GTCGTGTACT CAGCTGTCAC ATCTATCAAC ATTGATGGTG TGTTACATGA TTCGTGAAGA CGTGATGCAA ATCATTCTGA ACA'rTAAACG a.
AAACCAAATA'
AATCGAACCA CTTGAACGTG TCTAGCTTCT CTACCAGGAG GTCTACACA GTTCCAGGTG AATTGCAGTG AAA'rcGTAcG TGCrGAAGTA AC-AGCTGGTG TCATTATCrC TT'rACAATCG TGGTCGTGGA TATGTACCTG TGCTGTAGAT TCTATTTATA
TTG.AAGACGA
ACATT7TTGAC AAAAA'rCATC GAACTGGATG TTGAAGGTCC AGATACCCAT ATTGAAATTC GTGAAGGTTC TTCTCrAAAA GCGAC'TATGA CTGATGAAAA TAAAAAGGAT AATGCACCAG CACCAGTTAC AAAAGTCAAC TATCAAGTGG 'rAAATCCAGA
CTGTTAACAG
TTGGAACACT
AACCrGCTCG
ATGGAACAAT
TTGATTTGTT
ATACTGAATC
960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860.
1920 1980 2040 2100 2160 2220 TG;TAGGTAGC AATGATGCTT TCGACAAAIT AACCCTTGAA ATCTTGACA.A 5 5 t
TATTCCAGAA GATGCTTTAG GGCTTTCAGC TACAAATCTT ACTGAGATTG CTAAGTCAAC TGACGACCGT ATTTTAGATC GTACGATTGA -C1GTAAAA CGTGCCCGTA TCAATACTGT GATGATGAAA GTACGAAATC 'rTGGACGCAA TGATTTGGGT CTTGGATTAA AAGATAAATA CCACCCACTA GCTC-ACAACG TAAAGCAATC AACCAATCAA TCCTCACAAC TGAAGCTCGT ACGTATTTTG ACAGAACATC TGAAGTGATG AAAGAAGCTC G;GAACTC;GAC TTGTCTGTrGC GTTCATACA.A GCATGATTTG ACAGAAAAAT CTGAAGCACA GAGTTTGGAA GAAGTGAAAC TCAAACTCAT AAGGAGCAAT ACATGGCTTA CCGTAAACTA CTTCGCGATT TGACAACTGA CCTTTTGATC GCTAAAGAAA 'rCCGTAAAAC TGTTGAAAAA ATGATTACTC TACGTAAACC CCTAATGAAA TCGCATCTGA GCACTTCAAA AATTGTTCTC ACT'.CGTATCC 1'rAAAACTGA TAGTATAAA ATCATCAATT TAGCrCTGGT CTACCGCTAG TGTGATTrG AAACTA'rGAT
AGAAATCGCA
ATCACGTCGT
TTGTGAGTG
GATTTCGGTC
CATGCACGTC
GAAGCAACTG
G'rCAAGCAGC TGCTTT"CGTA ATAAGTACAC ?ICTACTACA CCTCGT'rATG CTCAACGT'AA CCGTGGATAC GGTGATGCAG CGCCAATG-GC GATCATCGAA 'PTATGATGAT GGAGTCTTGTI CCTTrAGTC CTAGCGGGAA CACTCATCAT AAGTTGGGA'r TCTTAAGAAC AACTTCGTAA GCAGGCGTTT TATTGAAAA GAA'rCCTGTT TAATGT'rAAG CTTATGAAAG CCATATAAC TGTTGTTGGT 'rCTGGTAAAA TTGCAGA6ATT AGGATTGAAT 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 AGTAGACGCT TGTr'rACGAA ATTGTTTITTT TTGACTATTT TCG=rAGAAT TATGC*.ATAC G rrCTTATT TTAAGAAGAA 'I-TGGAGTTTA AAAGATAAAT CrGGAATTGT TGCAGGTGTT AT'TGACGATA TCTCTCAAAC TGTCTrCGAT AGTGATGAAA AGCAAGATTT TACCTATCTT TTGAATG'rAA AAATCAATAT TCAGAGTGCA GAGGTCATCA TGGATATTAG ACAAGT'rACT TTCGATATrA GAACCATTAC CATGGGGATT AATCGTGCTG CGGAGAAAAT CTATCAAAAA GTTGGTGATG AAATTGCGGC TGAGTTGGGA ACACCTATI'T CTCTGATrG GGCAGCGACA GCGCTTrGATA AGGCTOCCAA AGAGATTGGT GTACAAAAAG GT'rATCAAAA GGGACATCAG GCTGAGACGG ATAAGGTCTG CTCGTCAGTC
GAATATTTTA
CGTAATGAAT
GCGATTTTCG
GAAACCATCG
TCTCTTTTGG
ATTACGACAA
ATTCCTATCG
GATGCGACGG
GTGGACTTTA
ATTCTCATCA
AATATCGGCT
CGATGATCGC TGTTGTATCT 2880 T'rCAAGCrrT TGGGCAAACT 2940 AAGCTATGTA TAATATCTAG 3000 CCATGATTGA GGAGCAAAAC 3060 ACTGTATCGA TCCACATATC 3120 AGGCGGCTAA TTTACTAGCT 310 TTAATAAGCG TGTATCGGTG 3240 ACTACGTGGT TCTGGCAAA6A 3300 TTGG;TGGTTr TTCTGCCT'rA 3360 A'rTCCATTCC TCGCGCTTTG 3420.
CAACCAAGTC TGGTATTA6AT 3480 CAGCAAATCT '1-rAGATATG 3540 ACAATCCATT TATGGCGGGT 3600 TCGGAGTTTC TGGTCCTGGT 3660 TTGATGTAGT AGCCGAAACA 3720 GGTTGGTCA APATGGCCAGT 3780 TGGCACCAAC CCCTGCGGTT 3840 AAACAGTTGG CACGCATGGA 3900 ACCGTGGAGT GATGGCCTGC 3960 ATGACGGCTG TGGCAGATAT GGG.ACGAATT A'rCA6AGGAAA GGAGTGGCCA AGTTGGTTGT ATTCGCTAMT GCTGTTGAGG GCCTTTCATG GTrGTTrCGA AGCAGATGTT ATCATCAATG IGI'GTGAAAC GTGCTTGGA AAAAGTTCGT GGACAGAGCT GTTAAGAAAA CTGCCTTTAA AATCACTCGT ATCGGTCAAT GAGAGAcTGG GTGTGGrAGTT TGGT ATTGTG GACTTrGACTI GGAGACT CTG TrGGrCACCTGT CCTTGAGGAA ATGGGGCTAG ACGACGGCTG CCTTGGCCCT CI'TGAACGAC CAAGTAAAA AACCAAGTCG GTGGTTTATC TGGTGCCT?1' ATCCCTGT CTrGAGGATGA AGGAATGATT 42 4020 GCTGCAGTGC AAAATGGCT TC9-rAATTTA GAAAAACTAG AAGCrATCAC rTCTTCCAT 'rGATA'r.AT Tacr-ATcccA GAAcAT1Acrc cTGcTcAAAc ATGATTGCGG ATCAAGCAGC ATTCCCAAAG GAAAAGAAGG GTTATGAAGrG TTAATGGGC GCACCAATTC ATAGTTTTAA TTAGACGTGT ATACTATAA'r CACTGAATTA GT""rGAATT TT'IrCATAAT AATCTCC~rC GATATAATAG AAGCAAACGG CAI'GGCAAGA CCATGTT'TAA ACTGCTGAAG G2'GAACGAGG AATCGGTGTT A'rCAACA'rGA CCATATGATT CAGTTTGGTG TcCTCTGTC GACTTCATCT AAATTAAGAA AATAGGAGAA CATTAAATAA AGACCTCCTA GA'TTTCATC TAATATCTT AAAAGTCGCC TGTATGCCTG AGGACGGAAA ATGGTAAAAG CACGATTGGT CCCGCGCAAG GATTCAAGAG TTAGGAATCG
AAACAACAGC
GTCTATTAGG
ATTTTAAGTT
ATATTATTTG
ATAATGAA
GCT=rATTr
TACCATIGTA
GTTGGAGCGA
GTT?'GCGAGA
GGCTATCTGT
TArrGCcGCT
TGTTCGTATC
AACTGCACCC
ACAAALTCCr-A
CTATTTAAGA
AAACAGATAA
CTCCTAAAC'r
TATCATTCAT
TTTGGTACGT
TAC1TCCCTTA ATCrGATCTA CAGTTTGAGC GTGCTTATrC GAGTGATTCT GGTCGTACCA TTCAGACCAT GGGAATTATC C1TTGAAGAAC TTGG=?GCA GGGGGAAATC CCrrATCGCA TGGACAAGCG TATCAGAGAA TGGTGTTTCG CTAGTTGTA TCGAGCCTAT GATGCGATC '-T-rCATGGG CATTATrCCT CCTATCTTTA ATGTGGACr-A CGTiTCACCAA TTGTCTTATG CTCAACTGGC TGAGGGCTTG GTAGAGGTCC ATACAGCTGG TTGGGCTGAA GGCTGGGAAA AACTCAGTGG CCGAATCAAG GAAGGCTG AAATGA'?GC AAAAGAAATG CAAGATCAAG GTGGAGGTAA CGCCCTTGTT 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 GTCAGCCATCG AATGACTAT TGGAACCATT GGTCTGGATA ATGGTAGCGT GACAATCCTT GrrGTCGGTG ACCGTACTTA CCGAGAGCTA TAATCAGTCT AG-ACTTGCTT CCCA'rGAGCT AAAACAGCCC AGGGCACTCC TTT1CGGCTGT ATI'GCTTTTA GAT'NTTCA TAAACAAGAG AGTTG.AcAAG G1-rGGGrA cAccGTAAMT AGTCAfTTGTT CTTGI-rrGA CA'rTGTAGAG TGTGACTCAA GATAAGATGG CTCCAGAAAT CTTAGCAAAG GTATTCAGAC GACTACTTCC TGCTATTTCGT TGTAAGCTAG CAACAGATGA cGr'rATCTGA TTAATGGCAT GCATCCGCAT GAATATGAGG ACGGCCAGTT TAGGGTTGAA GCACGTGAGA AGATGG.AAGA AGGC'rCTATT AGGGATTTGA TAAGAATATC AAGATAAGAA TTTTGATGTG GAAAACTAAA GTGTAATCCT CAAGGAACCT ACTG1-rAGAA CAGTCAGGAT TCCTCTGAGA ACCTCTGTAT AAATAGCTAC GAGGATAGAA GTAGAGACTT TrGAAATCAT GATACCAGAT ACCATCATTG GAGTTGTAAT TAAGC-T-TCA GCAGCTTCTT CAATACTTGG GCGAATAGTA TAAGGTAATC TTCTGGCAGA 880 TAGAGACATA ATCAAGATGA AACCAGTCCC TGTAATCATA AGAAATCCAC TTCCAAATAG ACCAGTA'rTG AAGGAAGAAA TCAACGCAAT CCCTAGAACG CTTCCTGGTA CAATATAAGG TACCATACrG AGCrTGTCA.A TTAAGTTI-CT AAACAAATTC CGI-rTTCTAA CGGCTAGGTA GGAC.ATAAAT GTCGCAAATA CAACAACTAG AACTAAGGCA ATCAAAGGGA TACGAATGGT 5820 5880 5940 6000 ATTGAAAATA GCAGAI'CCCA TACGATGGAA TTTIAACAGAT ACCATACCTG ATGrTT'rAG 'rAAAACAGAG ATAAAQATAA TTCCGTAGAC TGTAGTTT TTAGCCTCA.A TGGATGGAG AATGTGTT'rT TGCATAAGGA AAATTGCCAA AGAATTTCC'r CCAACCTCGC TAATAAATTG CCCTTCGCCA ATICAACATAG GCGTTCCAAA GGAGCTGCTA GTAAGGTTGG AACrAGGAGA CCGAACGACC CCATGCTTTC AGCTGCTTCA AGCCACC?'rG TAACTGTr'rG GAGAATAACC GAAAGAGGTA TAAATTAAGT AGATTrrGAGG TGTTGCATAA ATGCAGCCA TTTCCNT 
r-AGATTCATC
CCCAATGATA
GGTATAAATC
GTCTGAGAAT
GGTAAAACA.A
AGTAGAGA.AT
CTGAAACTGT
ATCGCCATAA
AGGACAGGGA
GCTCT-CATAA
CCGTTACCAT
TGTCAATACT
AGCGGTTTGC 6240 TTGCAAAAGC 6300 AAGTCCGATA 6360 ATACAAGCAA 6420 AGGTTTAPLAT 6480 GT'rCATTGTT 6540 TACAATTCCT 6600 TTTAGTGATG 6660 AGCTGACATG 6720 CATAGAGAAG 6780 CCAGCAACAT A'rAGAAATAC CAGTGGGAAT AGTTGCAGTG TAAAGACAAG TTGA;ATCAAM AAATATCGAT AGCTCAAGA TAAAGGGCAT -TGTCAAAAA ACCTCATTT-C GTCCTAGCAA CAACCCAG GAGTAGGCTC CTACGAAAGG GAA~CAATGA TAATCAATAT TTGTAGAAAT 'rrCTTCCCCT TGAAGTCATA AGATAAGCTA ATAGGGTTCC TACAACTAAG GAAGTGA'rAG TAGCGGTAAT GGAAACCTG AAACTGTTGA CTAGTGTC1'C AGAGTAGTAG CTTTACTAA AG.AAAGTGAC AAAATTAGCT AGTGAGAATT GTCCTTCATG TATAACTGCT TGCTTGALGCA CGGTA.ACGAT AGGATAAACG AGAAAGATAG GATAGGTAAG AAAGAGGAAG AAAGAGGAAA CTG'PCCAAAT A'IrTAGTTTT TTACGTTCCA TGGTTGACTC CTTTTATCAG GrTTITGGGAA CCATCTGCAG AAAAGATGTT TAATTTT.TGC GTA'rrGATTC GTAGACGAAT ACGATTGCCT TTTTGTAGAT CT'ICTTCAAA AGTTGATTCT TCACTAACTT GAAT7"rrTGA GGCAAAACCT GTCTCAATGA AATAATCCGT ATTAGTCC-A AGATAGACGC TATCTCTAA'r AGTT~CCTTCA ATATCTCCAG ATTCATCTTT GATAAACTCT TCGGGACGAA TG.CTTACATG AATAGCTTGC TCCTCAACCT GATCAAGAGC TGGCATTCGA ACGGCATAGC CATCTGAAAA GACGA'rATAA GCGCCGTCGC TCCGTTTTTC AAr.ATTGcA-GcGATAA'rAT TTGTGCGTCC GA'rAAAGGT'r GCCACAAACT CATTAGCTGG TTTATGATAG AGTTCrTTG GTCGGCCGAT TT-GTTGGATC ACCCCATCT'r TCATAACAGC AATTTGGTCT GAAATAGCCA TGGCTTCTTC TTGGTCGTGG GTTACATAAA CAGTTGTAAT 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 TCCCACTTCG 'rCTTGGAT~r CAGATTAC1'A AGTCZ-CTCGT CAAGGTGACA CGTGTGTT AGCAATTTGC ATGAGTTCAA CTTCTTTTGC ATAACACCAA GTAGTTTTGG AAAACCATCC ATCATCCAAG TAAAATTCTC GCATT'rrC CCACATCCTG AATGTTCAAA TTCTCAATAA CTCCGGATGGC TTGACGCATA 'rCCAAGCGAA GTTTGGCCTC CCATGAGGAG AACACTTGCA rrAACCGCrA AGGCCCATGC G'rCCACC.ACT GAGTTTATCG GGCN'TCCAT CCGCATAT'rG GATACTTGTT GCTCTG'rTGA ATCAATT~CTT CTTrGGAAC AAGCAACGTT GTCTCGGACA CTCAAATGTG GGAAAT'AGC CGATATTGCG TTGCGGGT TCCATATTA'r TGA=rGTrc CACC'?rCGAT ACT=~CAAA CCrGCAATCA TACGAAGAAG AAGCTCCAAG AAGGGTAAAG ACACTTCCI-r 7'rGGAATTGT CAGGGACATC GTGGTAGATT TTrTGGCGT TAATAATTTT 7620 7680 7740 7800 7860 7920 7980 8040 8100 GATCTCACTC ATAGTGAACC 
TCT1TrTACTC TrTAGAT'rGG ATATCTGTAA AGACTTCGTr' GTATTTCTTA ACGATATCTC ATT-rTrT GATCACATAA TCATAkATCTT CAGTaAGTGT TTTGATTrTT TCAATTGGTT TCZATGTTTTC GTTAGTAGTG GTTGTACCAA GTGTATCN'G 'TTTCTTG.GCA TTrTTCCATAT TrTAGATTT GACGGTTCCT TCTTTTGGAT AGACTACCTr TGCTGGATCT TCATAAGAGA GACCAACAGC ACTAGATCAA CTTGAACCGA TTTTACCATC CCAAGCCT'rA TCATCT=TT AACCACC'rTG GGCCCTAGAA GAGTTT'GCTG GCTCAGCAGT AAGATCGT2'A TATCCTTCGA TGTrTCATGCCC ACTACCATCT AGTGTATAAG GAGTAGAGTA AT1'ATCATTT TCTTTTGAAG TATAGTTTTC ATAAGAAC CA CCAAAGATAA CATCAGCTAC GAAAAGTTCT CCAGTACCAG CTTGAArCAG CT!GrTTTTA GC-ATTTTI'AC GAACAGGACC TACTTCTTGA GAGATAATAA AATCGATAAA TTAACGATA CCAGCACTAG CAGGTAGGAA AATGTrTAGCT CCGTCATTTA AGAGTTTAAC CATTTCTCCA TCAGCGACTA CTTTATACAC AATAAGTGTG AAAAGATCTT TTACATAAGA AGCTTGTAGC ATATTTGTTA ATTGAGCAAA
TGCGATTTTT
TTTAG'rTAAA
GCCAGTTGTG
AAAGAGTTCT
CCT"rAGI-r CAGGTTTGAA TC-AGGGTTGA CGATTAAAAC T=TGATATT CTTTGATAAC CCGTGGGTAG TATATTGTGT 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 AGGAACTTCT TTTTCTGACT CTAGr=rT TTCTACT-TTG ATACCATATT TTTCTTCAAA GQ3CAGGAATA GTTGCTCCAA GCCGTCTCCT TTATCAGATG ATACATCCAT TTCTTTTTCA
TTAAGCCCTC
AACTGTCATC
TGATCGATAC
TGAGTTTGGT GAATAAACGA GGCAGATTCA TTAGAAGAAC CTCCGTTGTG TTAT'AAGT
CTAGCGAACC
AAGCAGCATA
TTATTTAAA
TCTATAACAG
ACAATGTAAC CGT'TTTAAA ACATACAATT CTATTCTATA GTGTATTGAA TACACTTTGA CrGCTAAAAT ATTTCTATAA ATTAATTTGA CTTTCCTGAT AGAGATGTTC
ACATCTrATT TCAATrACT ATATTAGAGT AAAATTCTCT ACAAAAAGAA GAATAGCCTA 9360
TVI'TACTATT CTTCTGAGTG ArrTCAATTC CTr'rGGGGAA ATATGGAGAT ACTN'TTAAA 9420
TCCTGAC.AAA TGGTTG'rTrC TT?1'TCTAAA TCGGTGATAC TGTATCGGAG AATGCGCGTG 9480
AGGTCACAAA GGCTGCGATA GAGCTTCTAT GGAGAATTTC ""-TTGGAGA GATT rM-AA 9540
AGGAATGAGA CATCCGCTAC CTCCTTGGAA GGTTTTTG 9578
INFORMATION FOR SEQ ID NO: 128:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 13440 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:
CGGGCTGTrG TGACGATTCT TATTTCTATC TGTGTTATCT T=rGGGAAC TATTTGGGT
GTTGTCTTGG CTTTTGGGCA ACGTTCAAAG TTTAAACCGC TTGT'TGGTT GGCCAACTTG 120
TACGTTTGGA ?rrT=CCGTGG GACACCGATG ATGGTTCAAA TTATGATTGC CTTTGCTCTT 180
ATGCATATCA ATGCTCCGAC TATTCAGATT GGAATT?1AG GTGTTGATTT TTCGCGTCTG 240
ATTCCAGGGA 'N'TTGATTAT CTCTATGAAT AGTGGTGCTT ATGTTTCGGA GACTGTTCGT 300
GCCGGAATCA ATGCGGTTCC AAA.AGGTCAG CTAGAAGCGG CTTATTCCGCT AGGGATTCGT 360
CCTAAAAATG CGATGCGTTA TGTGATTTTG CCACAAGCAG TCAAAAATAT CTTGCCAGCA 420
TTGGGGAACG AATTTATCAC CATTATCAAG GACAGCTCCC TCTTATCAGC TATTGGGGTC 480
ATGGAGTTGT GGAATGGGGC TACAACAOTT TCTACA ACAA CC-TATCTACC TTTAACACCA 540
CTrTTATTTG CAGCATTTTA CTACTTGATT ATGACCTCTA TCTCACAGT AGCCTTGAAA 600
GCTTrTGAAA AACATATGGG ACAAGGAGAT AAGAAATAAT GACAGAAACC TTGATAAAAA 660
TGAAAATTT ACATAAATCC TT'rGGAAAGA ATGAAGTATT GAAGGGCATC AACCTCGAGA 720
TTAAAAGAGG AGAAGTTGTC GTrATCATCG GTCCTTCAGG GAGCGGGAAA TCTACCTTGC 780
TTCGCTCTAT GAATrGTTG GAAGAAGCAA CCAAGGGGAA GGTTATCTTT GAGGGAGTCG 840
ATATTACGGA CAAGAAGAAT GACCTGNTTG CCATGCGTGA GAAGATGGGC ATGTTTC 900
AACAATTCAA TCTCTTTCCT AATATGACTG TGATGGAAAA TATCACCTTG TCCCCTATCA 960
AOACCAAAGG-TGACAGTAAG GCCGTTGCAG AGAAAAGAGC TCAGGAACTT TTGGAAAAAG 1020
TTGGTTTGCC AGATAAGOCA GACGCTTATC CACAG-AGI'T GTCAGGTGGC CAGCAACAGC 1080
GGATTGCCAT CGCGCGTGGG TTGGCTATGG AACCAGA'rGT TTTPGCTCTTT GACGAGCCAA 1140
CTTCALGCCCT AGATCCTGAG ATGGTTGCAG AAGTTCTGGC AGrCAGGAAT GACCATGGTT ATCGTAACAC ATGAGA1TGG ATCGTrGTCAT CTTTATGGCA GACGGTCTGG TTGTTGAAGA TTGAACAAAC CCAAGGACAA AGGACTAAAG ACTTCTTGAG TGTTGCAA GATCTAGCCA ATTCCCcT
CGGAACACCT
TAAGGTTTA
GACCTGG.CAG
GAGCAGATr'r
TAAGTTAGCT
*999* :4,60 TTGTMAGCT ATTTGTAGCC AGCTTTAALAC GTTAAAGAGA AGATTAGTGA AAAGCTCAAC CAGAGCT' TCTrATAGTr TAAAGCTATA GGATGCCTA GGAAAGAAGT GTTAGAGCTA CATTGTA'MTr TTrGGTATAA TrAAAGATAT TrGTAAGAAA AGAGAAGTGA TATGACACAG ATrATTGATG GGAAAGC~rT AGCGGCCA6AA TrGCAGGGGC AG=rGGCrGA AAAGACTGCA AA.ArTAAAGG AAGAAACAGG TCTAGrGCCT GGTrTGGTAC TVxrrT'TGGT TGGGcAcAAT CCAGCCAGCC AAGTCTrACGT TCGCA:ACAAG GAGAGGTCAG CCCTTGCCGC TGGTT'rCCG;T AGCGAAGTAG TACCGTTCC AGAGACCATT ACTCAAGAGG AA'TCTAGA CCTGATTGCT AAATACAATC AGGATCCAGC TTGGCATGGG A~IrTGGTTC AGTT-GCCATT ACCAAAACAC ATTGATGAAG ACGCGrrCT A'rrGGCTATT GACCCAGAAA AGGATG;TGGA TGGTT'rCCAT CCTCTAAACA TCGGCGTCT TTGGTCTGGT CATCCAGTCA TGATTCCTTC GACACCGGCA GGAATTATGG AAATGTTCCA TGAATATGGG ATrGACTTGG AAGGTAAAAA TGCAGTCCTC ATCGGTCGAt CCAATATTGT CGGAAAACCT ATGGCCCAGC TTCTTr'rGC AAAGAATGCA ACAG'TAACCT TGACrCACTC ACGTACTCAT AATCT'rCCA AGGrGccTc AAAAGrCAGAT AT'rCTGGTTG 'rTGCAATCGG TCGTGCCALAG TrTGTGACTG CTGACITTGT CAAACCAGGT GCGGTAGTCA TTGACG'rTGG GA'rGAACCGC GATGAAAATG GTAAGCcTc TGGG;GATGTT GATTATGAGG CGGTTGCCCC ACT-rGCTAGC CACATTACGC CAGTCCCTGG AGGTGTCGGT CCTrATCACCA TTACTATGCT GA'rGGAGCAA ACCTATCAGG CAGCACTrAG GACATTGGAT AGAAAATAAG ATAAAAATrr TCTGAGGAAA GTGTATT'TC TATAGCTATA TCTAAAATGA TAGZAATGAA TATTAAATTT TAGAAATAAG TrTATAAAAG GAGGTflTCG CCTCCT7=r 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980.
2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 4 9 GTTGTATAAT GGAGTGAGGT T ACAATGGGA 'rGGAATATAC AATTAGAAAA AATTGCTAAA TTGAGTGTGC TGATTCTTGT TTCTGCC-ACC ATTTACTCCT GATTAGATGA TI=rAAAAA'r TT ATAATGGG GAATATAGTT TACTTAGCAC TAATTGATTA TCCAAATATT CAAGAGTGGG TTTATAGC?1T ACGAAAAACT TCATAAACGT CAAACAAGTA TTAAAAAAAG AALATrAGA TTACATCTGT CAGCATCCCT Ar-AGATAA.AA GAGTAGCCTC GAC?1'ATGAC CTACATAAGA GGTTAGTGAC TTCAGACI'AC TGTAGTCATA CTACGACTAT AGATGCAGCG ATTTCTATTT TTAAAACTGG TCGTCTTrA TrTTTCGATAG TCGAAATGCT CGTGGrCAAA TACAAGTr CTTCAGAGAA AGAATTACAA CAGATISGAT TAAAGTTCCT ACA'rGCTTAA TTATAAGT AATTTGAAAA TATTATTCCA GAGA.AGACTT AGAAGAGTG AAGGATAGTT GAkGGAAAAAA TATTGALACGT TCTTGTACAA GACTTATCCT TATGGTGG.TG AGGATTGGTA ACCGTTGGA.A TGAGGCTA'rG GCCTTTrTC AGA4AGTTGTC T'rGCTGGGGC ACAGGTCTTT GCTAGCTTAA 884 TCTGCTGTG-A AAGCC~TTCG GCGuAGATGC-T GAGGAGTTGG GCATCTGATC CGATAGATI'A ?1'TTGACTAT GTrCATGITrAG GGI'TATCCAT TGGCG-ATGGA GCG'N'1ATTA GGTCGAGCTC GACAAGITrA T'rCCTCGArT AAGI-I-CAT rl'rATCTATA GGTTATATTT TTGATGGTA-CC'rGTGA AAAATTA6AGG
GAGTTGTATA
ACCAAAATAC
ACTAAGAAAG
CGATGAAAGT
GTCATAAAGG
CCATCATCAT
CGGACAGA
TGCAAGATCA
CTGGTT1TACG
AAAAGAATCA
TTTGCAT'rAT
AAGATAGGT
TCTATCAAGT
TCCAACTCAT AATAAGAGCC GTATT'ATCTT GACTATGCTC TG-=TTAAAA CAATCAGATA GATTGATCA.A ACCTTACTAG AAAAAGTCAT AGACTACGcr? CGTC=GCTCGT TGCTTGGTGG GGCTGCTTTA GCAGCTGTAA AAAGCGGTGC AAATATCCCT GCIT1ACACA GCCATTTGCC GTAATTG7TTA CAAGACGCAA'r TGGAGAAGGC AGiACGATACG TTTGGAGAAA ATC?-TGTAAA GATTTTGA?1' GTAGATGGAG GGGCCTTAAC ACTAACCAG CTTATCTTAA CTCCC.CACCA TATTGAAAAG CA-AAACGAAG GTACAACATC AATTTTGGTA CAGAAAGGTC CAGCTACTCG CCAGTrAAAG GTTGrGGTC CCTATCAGGC AATGATirGcA GGATTTGCAG GCCAATTTCG PCAACCCAT CTTCATTCAG CCATAGCCCA GCCGACGGAA ATTAGTAATT GTCTTCCTAA 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 CArCCTGCT AAAAGaATGGr
TAGTGCCCTG
TATTTGGCAA
GACTGGTGGT
AC-AGGCCAGT
AGAACTATCT
AGGACAAGTT TGTI'GTTTCC GAAAAACTGT CTGGTATTGC ACTTCTTTCC CTCAAGGAAC GTTGGCCAGT CTGATTATTA ATGGGTGATA C-ACTCGCTGG CTCTACCAAC GTGTGGCAGr.
CAAGAAAATT ATGTGGTCTT AGTAATGAAA AGATATGTCT AAAATAGTTA GACA.AAAAAT GTTGATAATT TGTATCATTA TCTAAI-rC ACAAAAAACG AACGTTTAG;T ATTCTTCTT'G CTAAGAAAC'r AAATT=GTC GTTTTTTTAC TCTTGTAAAT CTATTTTTGT TAr.AGrrGAT TTGGTTTACA TCCGTACT-rA AATTGA?1'rG TTAGAGCTCT ACTr?2'ATTA MAAAAATTCA A'ITCAAGGA TAAATAAGCA GTATrCTAAA GGTACTTTTA GATGAAATAA AAGCCTTTAC A'rGGTATAAT AGAGGTAGCT CTT1TAATCGA IMGTTTGAG TGGAAAATCT GAAGAAAATG GCAGGTATCA CGGCTGCTGA ATTTATCAAG GATGGGATGG TTGTAGcGc AGGAACACGT TCTACTGCCT ATTATTTTGT CGAACAAATC GGTCGTCGAA TCAAGGAAGA AGGCTTGCAG ATTACAGCTG 'rGACGACTTC 885 TACTGTGACC AGTAAACAGG AG.ACTTTGTC GATGTGACAG CAAAGGCGGT GGTGGTGCCC CATTTrGGGTG G'rGGATGAAA CTGAAGGGCT CAATATCCCC CTCAAGTCTA TTGACCAAGT 'rCGACGGGGC GGATGA6AGTG GATACrCAGT TTAATGGAA'r TTCTCATCGA AAAGGTGG7C CCAACACCAT CAAAAGAATA CAGCTGGT CGAAAAACTA GGTG,-TTTA AATTGCCACT AGAAGTGGTT CAGTATGGT r-AGAGCAGGT ACCAAGT'rrC CGTGAAAAAG ACGGCCAACG TGACCTCGCC TTGGATGTCA TTGAAAATCC CGTTGCGTT GTGGAGCATG GTTTATTCAA ACGAGATGGA GT-rCAGATTT- CAACTTCAAA CTTTCGTCAT TrGAACCAG CrGGCTACA T'NrGTGACC GATATGCAGA ATrTATCAT AATTGC7rr GGACAAGAAT TGGACCATGT CCA.AATGGTG GATAAGGTAA TCCTTGCTGG AAAACGAAPLA TAGAAGGGGG CATAAGATGT TCGATTCTGT AGGAAWGGCT GCAGCACCAG CAGATGCACC TTCTGACACA CTGGGACACA ACATGGCTAA AATAGG'TCTT GGAAATATTC CTGAAAGCAA TCCAACTCGA TATGCAACAA CTAAATTTAA TCGTATTCAT ATGCTAATAA CTTTGTCAAT TTTCAAAAAC AGTTGGTTTG CTCGTCAAAC TCCTrCTTAAG AATTAGAGGA AGTATCTCTT TCAACATTAC TGAGCrrTC AAATCGAACA ATTCTCAGGA CCGCTGTTAT CTATG.ATT'T
TTGGTGGTAC
GCAGGGGT'rC
AATGTCCCAA
ACTGTAGC-AG
GGTAAGGATA CTATGACTGG GATACTTTCT CGA.ACGGATT CGCAAGGTTA TTCGTGAAGC GGACCACGTC AGATGGAAAC CTTCAGCTGA CCCTGTrG CAGATTGCTG CCCACGAAGA TGTACCGTAT CTGTGAATAC GCTCGTTCGA TTACCCTTGA GCATCATTrGC TCCCCTTAT GTAGGTGAAC CAGGTAAC'TT GTGACTTGCC 'rGTATCTCCA TTTTrrCCCAA CTGTTTTGGA TCGATACr'rA TGCTGTGGGT AAAATCAACG ATATCT-rAA ACATGGGTCA CAACAAGTCA A.ATAGTCATG GAATTCATAC ACACTGGGAA ATCATGGGAC CCCAGAAGAA ATCCTGACAA CAACAAAC-CT TATTCAGGAA TGGACAGTTG A-rATCTATA CATTA'rTCCT TTGGATGAAT GCGTCCTGCC CTTCTTGGTC CACTCGTACG GCAAACCGTC TAAATTGAAT GACGC'TGGTA CGGTGCTGGT ATCAACCATG ACTATTGAAG ACTATGGGAC 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420
TTGCTGAGTT
Gcc-ATcG.Tcr
CTGAAATTAT
TGAAAAAGGA
TAATGCTCAC
CGCAGCTATG
TTCTCATTCA CAAACCTAGT 'rcACr'TGAT GGTTACCGTG ATTGCTCCA TGAG -rGAT AGAGAGAATG ACCTTCTCTT GATTACTGCG ACGGATCACA CTCGCAATA TATTCCATTG
GCCCTTTACG
GAACGCTTAC
CACCATGGAA
TTGGCCTATA
ATGACCCAAC GTATGCAGGA GCCCTGCCTT TAAAGGAAAT CTGTTGCCGA TAACTTTGGT GGTCTCATC CAGTAGGACA 'NTTTGCAGAT ATTTCAGCGA GTGGAAACTG CTATGATTGG GGAAAGTTTC TTAGATAAAT 886 TGGTATAAGA TGACGCGCTA TGCTTTGCI'G GTGAGAGGTA TCAATGTTGG TGGTAACAA'r AACGTCGTCA TGCCAGCT TCGTCAAGAA ?TTACAAACT TGCGACTCGA AAAGGTTGAG AGCTACATCA ATACTGCCAA TATrTCTTT ACTTCGATAG ATTCCAAAGC CCAATTGGX-r GAAAAGCTAG AGACTTTC= 'rCCAGTCCAT TATCCATTTA TwrCAGAGCTT TTCTTACTG AG1TCTAGAGG ACI'TCAGGC GCACGAAAAG ATTTTCTCTT G7'TGAAAGTT TAGACTGAA GGGAAATTTT~ CTGAAGAATC CCT-rCTACC GCCACATTAC CTAAAAAAAT AATAAAGGAG ACIrTCCrA AAGAAAAGGG CTTGGAGAAT TGGCAGAAGA AACTCGGCC GTTCAACAGT GGAACTTGAA AATCTACCAG CTTGGTGGAC CAGAGACTTG T'rACACTGAG GCT?CGATG TGGACCAAGT CATCGCGAC:A AGATGAAGTr, CTTTAITrTrG GAAAACTTGG GATTTTCTGG C'rATTCTAAG ACTCCCTATC ATAAGTACTT GCTGAAGGTG TAT"TCGTAA'r GCTAAAACCT TTGACAAAAT TGGTCAAATC ACACACAATG ACATTTTTAA ACAAAATCCA TGAAACTGCT AATTGCZAGCC CCTGAGTTCG GTCTAATCCT TGGATCACGA AATCCAAAAT CCAGTTrTAG TAGACATGC TGAGATTCCA AGTCGGTCAT GCTGGTAAAT TGGTATATGG TCAACTGGCA TCAAGGGCGT TTCCATTTCT ATGAAGGGAA TCCTrTrr.AA S.
GGTCGCAAGG TCTTCGCTCr GTGGTGACT'r TCCCAGTTCG TGTGATGAAA GTTCTTGGAT GTGAAGGTGT TATTGTAACC AATGCAC-Tr
AACATGACGG
CCAGATATGT
GCGGTATCGG ATTTrGGTCCT GGTACCTTGA GGCAAAATCC ATTGATGGGT GAAAACTrTGG CITAGGGCCTA CACACCAGAA TACCCTGCCA AAACTTAATA TCAAGCTTGA ACACCACCAG AAATTCGTTC GTTCCTGAAG TTATCGTGGC ACTAACT'rTG CGGCCGGTTT GAACGTGTTA AAGGrGATTT AAAAGATTTA AAACGGGGAG AGAAGAXAACA GAGGAATACT AATAGGTrATG TCTGATCTGA TGAAGGTGTC 'rATA'rCGGAG CTATAAGACA CTGGGA GCAG AGCCCACTCT GGCTTGAAAG CCAAGA.AGAA CTCALATCACG .?GGCTATCTC AGACCATATC ATGAC=rTG CCCACGTrC CTGCCCATGA AGTGGCTAAA TTACTGGTCC GACTTATGAA ATGCAGTTGG TATGTCTACG TTCTGGGAAT TTCATGTATC AAGAAGTTGT AGAAGTGACT 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 CAAAGGCTTG CTTAAAGCGA TTCTGCTGA ATTGTAAGAA TGCCTCTGTT T'rTCAGGAT TGACTGCCTA TCCGCATTAA ATGAGCTTCT TCCTGCTCTr ATAACTGAAA GAAGCGGAAG TAGCCACCAT 'rGTGAAAGAC AAGA=rCAG GATACTAGCA ACTAGTATGA TAAGGAGAGA TGAGAATGAA TTGAC=rCT TTAGCTTCCT AGCCAAGCAG GAA'TTTCA -C=rATCAT ATATAGCACA ATGAGATTTC GC~rGAGTCT GCTTGTAAAT AAACGAAAAG AAAGATAAGA AATAATGAAA ATTGGTCAAC GAATTATGCG CTTTGGCATA AAAAATTAAG TATCGGAGTT GTATCTGTTG TAGTCGG=r TGATTTCTAG CTCr-AGCTGG AATT'rCAGCC AATGAAGTAA AGCAAGA'rGT AACATCTGAA G'rGGTAATAG GTGTCTAGA TTCTAAGGAG GAA'TTGAAAG ACTCAGAAAA TGATGCTCCA AAACTAGAAA CTCCTCTTAG AGAGGAGCCA AGACTAGCTC CTCAAACGCTr AACCGAAGAG TCAAAACTAG AGATAACATA TG=rGCGGAA T'rAGCCAAGG ATATAAGTAT TGGATGTTAC CTAGCTCAGA =rGCAGCCGCT TAACCATGTA TGAGGAAAG TGACAACT 'rGTTTGGAXAT GCTAAGAATG G~m.TCTGGGA TAGTGGAAAA ATTAAGCA'rT ATGTGGTTGA TAATAA'rCGT GATAT'rGTTA CATTTACAGG CTATTTGAA AGTCCAATGA ATGATATTCC 'rCeCWAAGCA AGTGA.AGrC TTGAAAACA ACCAGCTCA6A CCCCATGATA 'rCCGCA6AGGT TACrAAG'r'G TATATGACAG GTCATrCrT TGAACCTTAC CAAAAATATC CTCATTI'TrA CAGTGCTCCT AAAGTGA1-rA CTTCCAGAAC TGTTGGTrTC GAAACTC TA AATTAG=rT TAATGACAAT GrrGTCACTC CCTTGATTCA TAATTCACCC ITrTAAACACC G'TCTCGTrCc TAACrTTA.AT ATTGGTAAAC AAC=rACCTT GGATA.AACAT GGTTATCGTG ATCCGAAATT GG3ATAAAGTG Cr-ATTCTrTA AGAAACAGGC TCTGCCTrCGA TCTTCTAGTC AACCAAGCGC TGAACCAATG ACAGGTTrACT CAAAGT-TCCA r-AC TT'r-a~ A=cArATCC!T 
AGTCGATGGT AACTATGGT ACAATrCTGT GTGGCAAC-TA GATTrCCTA A.ACAAG-AAAC AGACACTGCC CAGGATAGAT TGGCAAACTT AGAAATTGAG TGAAAACGTA TAACATCTCC CCATAAAAAA GCGCGCTATG TTCGGATTGA AGAAGTTGAA GTTTTCTGCT TTATAGCTAC GCCAGTTCAA CCAATC-AGTC AGACTCCTGT TGGAGCTTAC ATTrGCCCGCrT ACTCCATAAC AAACCAAGTT GTTCGTAGTC ATTCTTGGGA TGTCCTCAAC CTCCCAATCA AAGAAAATAT
CACTCATACA
CATTCGCCAA
TGATGTCATT
TAAACATGTG
GCTAGAAGGC
GAATGCTC.AA
GAAGGATAAA
TTGGGAAGAA
AGGAAGC=G
GAGAAATCTG
GAAAATATTG CCTCAGGAAA AGAAGAGCTG TGGATGGCAA AACT'-TCCAAT CTAAGCCTTG ATC AATATTT ACAAccGAAr- CT'rTTAGACA GTTCTGGTAA 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9130 9240 9300 9360 9420 9480 9540 9600- 9660 9720 9780 9840 9900 9960
TCACCACA.AA
TATAATGCCC
ACGGCGACAC
TTACGATTAA
TCAGTCTTGC
AAG=r'CTAA ACA'rTGACAA T'rCAACACAG GTTCCAGTAG ATAAAGATGG CGCA6ACCAGA CTGCAG1-T CGAGTTAAGA TTGAGAAAAA GACGGCCCTA CTATGGAATA GATC-GCAAAC AATCTATGAA AACAGACCAA TTTAGCTCA ACCCCACCCT AAAATTACCC ATTGGCGTAC GACATTGAAT TCCAAGGTGA GTGACGATGA TGTCT'rGTAA TCTGATGGTA GAATCACACT TAGITTGTCT AGTT'TATA).G AAAGTACTAC CTGAGCTTGA ATAGGACTCA GGTAGCTCTC TATGAAAGAA CAAAATTAAT AC1'C ATGAA AATCAAAGAG CAAACTAAGA AACTAGCCGC AGGTTGCTCA AAGr-ACTGCT TTGAGG'rrGT
AGATAAGACT
GAGATTTCG
GAAAACTTTT
CGGrGCCACTC 888 GACGAAGTCA GTCACATATA TAATCCAAGG AAGAGTATAA ACAGAAAGGT AGCCGCGTG CTAAAAACAA AAACGAAAGC ATGGGTAAAC TCCTCTAAAT CAAAATTAAG AAAGGAATTG CGAAAAAGAT AGTTGATCTA CCGAGCA'rCG GC'?'rTTTGAT GGTTTGTA TCTTTCTCAA GTCTTrT ATTTGAACAC GAGCGGAAAA CAGGAT'rTCT CCTCAGTTC CTAArTTCA ATTATGTCTA TCCATATTGC TGCTCAGCAG GGGGATCCTC TTCGTGCTAA GTr'I2TTGCG AACGAAGTGC GTAACATGTT TGGrrACAC'r ATGGGAACTG GGA'rGGGAAT GCCA'rCTATT
CTCACTGCC
'rATAAAATA'r
CTCGGAAAAT
GTCGTTTTCT
GGTGAAATTG
CGACGTTGAC GTGCI-r'GAA TTCTAATTTG AACACCAGTA TGTATTCGCT GAACTGAATA ACCCCACCCT AAAAGTAGTG CCAACTCCTA TTTTCCCTTC AAAA'rAAAGA AAGGTAGAGC AGATAATCrG ACTGAAAAAT TCTCGC-,CT1' TGTATCATAA CTGATAAAAT TCTTCTTCCT GACAATTTCC TTGATGA'rGC TGTTTGTrTTT GGTACTTACA AGGGTCACTG TGTATCTGTC TCGATFTATG CGCGTGAGTT AATCGTAGAC TACGGTGTGA AGAAATTGAT TCGTGTGGGA ACTGCAGGTT CTTTGAA'rGA AGAGG'TTCAT GTTCGT43AAT
TGGCCACACT
ATCGCCAAAA
TACTTT'TGGC GCAGGCGGCT GCAACCAACT ACCATTTTCC ACAAA'rTGCT AGCTTTGATT AACTTGGTAT GACTAC'rCAC GTTGGGAACG TACTCAAATT ACTTTGAAAA GAATATCGAG CTTGGTAAAT ATGGAAGCAG CAGCTCTTTA ATGACCATCT CTGATAGCTT AATACCTTCA CTGATATGAT CCAAAAAGGG GCTCTTrGTC CAAATTTCGT CCTTTCTTTT CAAAGTTCCG AAAACCAAAG CCAGTTrGGC ATTAGAATAG GAAAGGTTTT AAAGACAGTC
CTATCTTGCT
GGTCAATCCA
GAAGGTTGGT
AACTGTAGTG
TTGATATTCA
GC-ATTGCGCT
GCCCAATACC
GACGAAGACA
'rrGGAAACCT
GGTTGAAAAA
GGGATAAA
TGATAAG'TTT
CAAACATCGT TCGTAATGAC TGC7rrGATAA AGCCTACCAT ITTTCATC TGATGTC=r GGGGAGTCA.A GGCTGTGGAA ATGTTGATGC GCTAGCTATC CAACTGCAGA AGAACGTCAA TGAT'rGCAGA ATAATTATAG AAGCTrAAGCT TGAGAAAGGA AA'TCCGTTT'r TTGAAGTT GATGAGA'rrA TTGGTCGCTT 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 TGTACTrGAA GCGCGT'rGAC GATTTTCTCT T'rGTrCTL'rA TGAAAAAGAG GATGAACCG CTTCAGATT'G TCCTCAATGA ~rCCGAAAAA ITrTCCAGGG TCTTTGTTCT GATAGTGGTG =rCAAGTCT TCTGAATAC GAAAGTGAAA AAGTAAGAGT TGATAGATCT TTAAAATCTT GTCAAGAATT TCTrrrATTTG TTAAGTGCAT GCGAAAAGTA GGGCGATAAA AACGTTTATC GCTsAxITTA CGACTATCCT GTTGGATGAG 'rrrCCAGTAA CGCTTGATAG CCTTGTATTC ATGAGAIM~T CGTTCAAACr GATTCATAAT ?TTAACACGA AAACGACTCA TGGCACGGCT GACATGTTGG ATAATATGGA 889 AACGATCTAG AACGATTTTA GCACACGGAA AAAGCI'3"rr AGCCAAGTCA TAGTrAAGGAC 11020 TAAACATATC CATCGTAATG A'IrrTCACTT GACAACGAAC GGCTCTATCC TAGCGAAGAA 11880 AGTGA'rTrcG GA'1'AcAGCT TGTGTTCI'GC CTTCAAGAAC AGTGATAATA TAAGATTAT 11940 CAAAATC -ra CGCAATGAAA CTCATCTC CCTTAGTCAA GCCATACT~CA TrCCCAAGACA 12000 TAATCTTTGG AAGCCGAGAA AAATCATGCr CAAAGTGAAA GTCATTGAGC TTGCGAATGA 12060 CAGTTGAAGT TGAAATGGCC AGCTGATGGG CAA'rA'CAGT CATAGAAATT 'TTTCAA2-rA 12120 AC~wrTTGAGC AATT'rTrTGG TGATGATAC GAGGGATrc GTGA?!r"rC ?TTACCAGGG 12180 GAGTCTCAGC AACCATCATT TTTGAAsAGT GATAGCACTT GAAACCCCGT Tr'rCTAAGGA 12240 GAAI-rCTAGA AGOCATACCA GTTG'TrCGA GGTAAGCGAT CTTAGACGGT 7TTrGAAAGT 12300 CATrT1'TrT CAT'rAGAC~r CCACAATCAG GGCAAGATGG AGCCTCATAA TCCAGCTTAG 12360 CGATAATTTC 'rrTGTGGGTA TCCATATTGA TGATATCTAG AATCTTGATG 'TrGGGTC'rr 12420 TAATATCGAG CAGTT1'TGTG.ATAAAATGTA ArrGTTCCAT ATGATTCTTT CTAATGAGTT 12480 GTTTTG'rCGC TTTTCATTAT AGGTCATATG GGACrr'rr Tr TCTACACAAA AATAGGCTCC 12540 ATAATATCTA TAGTGGATTT ACCCACTACA AATATTATAG AGCCCAAAAA GG.AACCCCTT 12600 TATGAATTGT AGGACTTCCT TTTCTTATCC ACAAATTGAT CTAGCTCTCT CTGAwrCCA 12660 AGAATAGTGA CTTATGTGA ATATrCTTGG CAAAGTTTT GGTAATTTTC TT1TGTAGTT 12720 
TTGCGGACGC CCATCCCAAA GAATCCATCT GATAAACTCC CACTCAAAGC GTTCAGGGCA 12780 ATCTACCGCC ATACTTTCTC TGACTTTTCC ACGGTATTTA AGATAACGCT TAAAGGCTCT 12840 *AAAGAGACAG GTCAATGGCG AAAAATTGAG AAAGATGATT 'rGGTCAGcrr CTTGCATTCG; 12900 TTCTTGGTAG TAGCACCAAG AATAA'N'ACC ATCGATGACC CAAGCTTTAT GCTTGGTGAG 12960 AAAGTTTTTT ATCTCGGTTA ACATCCATT~C GCAGTCACTG TCTTGCCAAC CAGGTTGAAA 13020 *TTGGAGTGTG TCCATGTGCA GTTTTGGAAT GGAGTAGTAG TTAGATAACT T'rTCTGCTAT 13080 *AGTTGACTTA CCAGAACCAG AATATCCGAT AATTGCGATT TTCATTTTCT ACCTT'rTCCT 13140 ATTTGGAGAC AAAAAAACAG CCTCTATGGA CTGTTTCTTA TTTAACAAGT TTAGCTGAAA 13200 ACGAGCT'rT ATCGcGGcr GcTTGTTTT TGTGAATcAA AccTrTAGTT TCTGCTTTAT 13260 ***CGATAGCTGA GCTAGCAGCA CGGAAAAGTr CTTCAGATGG GTTTGCTTCG AAAGCTTTTA 13320 TAGC-AGTACG CATAGCTGAT T=~GAGCTG AGTTCTTTTC GATTCGTCTA ACGTTCAATT 13380 CAGCGCGTr'r GATAGCTGAT TTAATGTTTG CCAATGGTCT TACCTCCATA ?TATCTAACT 13440 INFORMATION FOR SEQ ID NO: 129: Wi SEQUENCE CHARACTERISTICS: LENGTH: 8512 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129: CCTr 'rTTCA AAAACTAGAT ACTAGTCTAT CAAAAGTAGG AAAGGG?1TrC ATTGGAAATI- TIT1TGAAAAT CATAGAACTA TTAGCTAATC CCTAGTATTG AAZAAAMTrG
AAAAGACTGG
ATAGCT'rCTT TCAGGTCATC TTGTAAACTA TTTCTCTGGT ACCAGACAGG ATCTAAAGTT GGAAAATTTG TCAAC.AGTT TrATCCAAGA AGCTACTTGT TCATAGATCA CTCTTGCTAA ACGCCAATCC TTAAATAGTT GGCCAAGTAT ATCAAATACT
TAAAAATCCT
TCTTGCTCCA
TCATCATCTG
TCATGAAC'rC
CCATGTGCAA
TTTGAAAGAT
CCCAAATGAT
CAAGTTGGAC ATAGACrTCC CCCTTTCTTC TATCGQAAAA ACTTCCCTrG TAAAATAGGT TAAAGCGAAT CGACATTCTT TGTTTTTAGG AAAGTCTGGA A.AGCGTGAAC CCAACCATAC AGTGCA.AGCC TTGA'PrTAAA CAAACAACCA CTGCArTCT .too9 too o to e TGACAAACCA CCTCTGTCAG TGACTTGAGA AACCCCTrGT AGGACATrAC GAATTTCTCG TCCTGGTTAT AATTTGGTTT AATACCTCTA CATTTCTAGC TTCTCA.AT'rA AATACAATCT AGTAAGTTCA AATTTAACAT GACAAATAGA ATGACGC'ITfA CCTACTCTCC CAACTCAGCA
TAAATCGGCT
ATCCTTTTCT
AGAAGGATT'r TTCTTCTGC'r ATTrCTTA GTAAATCT-G ATACATGGTC AACTGTI'CAA AAAGGCAGTC TTAAATGACT CAATATTGAA GATATAAAAT GACGTAAA'rA ACTATCAATA CCAGT'rCTAC CACGACCTTC AACGACATTr' TTGAAAATAG CTACAACTAA ACAAGCCCAT AAACATCATT CTAAAAAATT TTTCTATTCC CTATAGGAGA TAATCTGGTC AACTGTGTrCA GACAAGAATT 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 14'40 1500 GGATGGTATC ACGGAGTGGT TTGTCTGTTG AAATATCAGC CAAGTGGTGT CTTGCTACCA CCTGATTTGA GGAGATTGAG CAGAATGTTT TAGATGAAGG TAACCAGCAG TCGAGATAAC tAACTATACAA GCCCATATAG TAGTGAGCTT GGCGCATCCA CAATTTCAAT AGCATCTCCC CAGAAATCCG TCAAAACTTC TGCTTGCTCC AAAGGTCTCC CCTTCT'rCAA TCAATGTATA CTTCCA6AGAG-GTGGGTGATA AAGTTATGGA AGTAGGTGTC CGAAGCGT'rT TTGACGTGGG TCATTAGACT GGTTCTCCAA ACCGATAATC ATGGCTGACT CCAGTCTrCA GCTCCAGTT TAGTCCTGCT GAGTAAGTGT AGTCAGAGTT GCATCATCGT CTTCATAATG CTGTTGAGCT CACCTTACGC TCGAAGGCGG TGTCA.AGCGA TGAGCCAGAG GTAATCACTG AGTAGCAATT CATTGAAGGT TGACGGTGCT TCAACATAGT AGGTCGACAT ATGGGCATTG AAGTAACTT 891 GATGATTGTC TGAAAAGATG AATTGACCAG AATGCCCGAT TTCATGAATC AAGGTATAGA CATCGCTCAA ACGGCCTGTC CACCTCATGA GTACATAAGG GTGTACCCGA TATGCGTCCG CCGCATAACC ACCGGAATCC T'rGCCACTGT 'rAGCAGCAAA CTCCACCCAG CGCTC1-rCT GGTAACGAGC AACTTCCTGA CAATA'N'CTT GCCCCAAAGG TTCTACCGAC TI'CATGACCA AATCATAGGC ATCGTCAATA GCTACl'TCAG GArrCAGGCC GCTGTCCAAG TCCATTCe AGTCTGCAAA GGTCATCTr GACACTGG TGCAAAGTCC CCACTTCTTG TTCAGCTrAGA AGATTTTC AGACTTGACC GAAGTCCCTC TGAGAAGGAA GGTAGAAATT CTCATAGGTC CAGCCATTTC AAAATCCCCA CTTCACCGAG A'rTGTCAAG TTTTAGCCTG ACCAATGGCA TCAAGACCAT 'rrACC7TGGC AACATG~r~G AGGTATCTCT T'rCATG.ATGA GCTCAATCTC GCGC;TCAAAC ATGACACGGT AGATAGTCAA AGACAGAGTC GTATCCCTTC ATATCAGCCA TGAGCCAGAT AGGCTGCTGC AGCCGTATTT TGGTGCTTAC CGGAAGGATT TCTCACGAAC C'TCAGCATCC 'rCATGGTTTT ACAAAGCTGT TTTTCTAGGT CTTGCCATGG GCTTCAAAGT GCTCGCATCT TAGTATAAAT GTCCTGCG3GA CTGTAGAAAA GCCTTCTCCA CATCTGCCCC 'rAAGTAGTGG GCTTTTTTGA 0 0 0 00.
GCTGTTAAAT GTGGCAArrT ACCCA.AACCC CCTCATCTGC TGCCACCAAG GCATCGTCAA AGAAGGTCAA GGCTACGCTC CAAATTCCAT CCCAGCTTGG GCAATATTGC CAAATTCGTC ATTGCTATAG GAGGCATAAA ACCATAGTTG CCAATATGGC TCATCTGA.AT GTAGATCTGT CAAAGCCT CTCGAAATCC TCAAAAGTG'r GAAGATITGCC CTTGTAATCA
TCCAAGACTT
GCATCTGTTT
TCCGTCGrCT
TCCAATTCCC
CGGCTAAACT
1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 GGTTGATGTC TTCGCCAGCT 'rTCTCGATTG GGCCTGTrAA GTCCCAGAGT TCCT'rCTCTG
TCTTCCTCTT
GCTGTTTCTc AAACr=CCA GCTCCTG'rTr
GATTCTTCAT
ATI-TCTCTAA TTC'rACTAAA CTCGCAAAAT ACGAGGACCT TTTCTGCAGG TGAGAGACCG CAAGACCAGI' GACTGCTTGC AGGCrACTAT GATAGAGTCA CACCCAAGAA ATCCTCACGG TC'rTGGTATA GAAATTCTG3A ACGGTGTTTT TGTrCCATr'r ACAC'rAAGGG C'rGATAAAGC GTAAAGCGGT AGGCCTGCCA AAACGGCTCC TTTA~CvCTA CCTTCTGGAC CAAACATAAA GAGCAGTTTG AGAAGCGCAG CGCTTCTCC TTC'rTTAGCT AACTGGTCCA GCTGAGCTAG AAAATCTGCT TrTTTTCCA AAAGTTAAT ACTTGGTACA ATATTACGCT TGCTrTrGCTC GGCTGCTCr-A AGGCAATrr T'iTCTAGTTT TTCAACTTTT TTACCCAATT TC1-GCCATC CCACTTGGCA ACTGACCAGT CTGCAGGAALA GGCCCAGATT TGGCTAGCCC CCAGTTCGGT 'rACTT'rTTGA GCGATGAACT CCAGCTGTC TCCC!TCGGGA AATCCAGATG CGATGGTCAC TTGGACTGGT 892 AOTTCCACAT TGTCATTTAA 'rICTGGACC AACTCAAACT GACGATTT'rC CATATCCAGC ACGCGCGCCA AGCGCTTGAT GCCATCATCA AAGACTAAGG TAACCTCATC CTCTTCTTTC AAGCGCATAA CCTGAAACA1' ATC-CTTACTG GTT'rCCTTGT CCTCGATAGT GACAGGAG2AG ATAGCACTGC CrI=ACAAA ATACTGC2G ATGC'rAGCCT CCAATCACAC CAGAGATATC C'TGG~rrcTTAAAGACAC AGC'rATTCCA TTCCCCTTGA ACCATGTGAG TrrTCGAGCAA AAATCCAGCT CACTCAGCCG ACTGGCGCAC CATGTCCAAC CATGATCAGG TAGCCT'rAT CCTTACCAA GCCATAAGCA GATATCCCCC AAGA'rATTAG CCACAATCAC ATCTGCCTCA ATC'TCCAGCC GCTACATGGA TATTTTCCAT GCCAGGGTTG CACACGAACC GCCACATCAT CCAGGTCATA GGCGAAAA'IT 7rrGTrCCTTGA TAATGCCACT TTCTATTA GATGA.ATGAG ATTTCCACAC CCTTAAGCAA AGCTCAATAT T'NrCCTGAGC TCTTTAGCCC CCAGAAGCA AGCACCGTTT CCCCACCACG GGCGTGCTTC CAGTACCAAA GTCGC--TCA' AGTCTGTCCA GC'TGGCAATA GAGAGAACCC CTGAACCAGT AAGAACCTGT TCCAAGGCAA AAAGGCTCAT AGCCATGCCA GGATCCAGCT TGATAATCAT
AG.AGCCAACG
GTCTGCCCAG
AAAATCTGTC
ATGGTCAAAT CATGAGTGAT TCTTCCTCAG CCAACGCAGT AATTCrGCTA GACGAGCCTG
CCCCACATCT
CTTGCTAGT'r
TTCCCCCGCA
ACGAGCAGGT
CGTACCTA -r
CAAATCCGCC
TCTTCTTGC
GTCCATACTG
C.AACTCCTrCT
TACCAAGCCC
TC-ATAGTATT
rTTAACTCTC
TCAACCACTG
TGCTCCACCT
TCTTCGATTG
TCTTCCAGTT
CCAAATCCAT
TCACATCCAC
CTGGGAAAAT
CGACTCCTTG
CGTGTCAGGG TAGTAGGCTG TC.ACTACCAT
CTCTCCAAAG
CGCTCCCAGC
TTTTAACTCT
AGGAAATTCT
CGGTCCACAT
TCAATCAAGA
TGCCATGTTT
CTGAAGACC
TTCCCACA-TA
GATTCGAAAC
CC.ATTATTAA
3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 CCCTrCACGCT TCACTGTAAC GTAAA.ACACA AAACCAAAAT GGGCGGGTTC AGTTTAGAAA ATCTTTTAAT CTCTCTTGCA ArTC-ATCAGC TGATAGCCCT TCTTAZCTGA CCAACCATAA CTTGCTCCAA AGCAGACAGT CCGAGCGCAT TCAAAAGGGC AT'rGGCCCAG GGAATACTTG AAACAAACCA ATCTTGCAAC CTAACATITC CT'ITTCCAGC TTGTGTCTAA GAGAAGTTTA TCTTTGGC ACAGTGTTTA TGTAACTGAA CCATCCTTC TAA'rCACrTA CTTTTAAATA ACTGAGGCAC AACTTGACTG GAACrA.AGAA ATTCCTCAAC GTCCTTCATC TCCGAAGATG ATATrGTCAA ATTGTTCTTG AGACCGATTT C'TTGCCTTTA AAAATTACGC TAGCATAAAT CTCATCTAA ATGAATTCCC AGTTCCI'CAT AAACTTCACG TTTCGTCCCC TCACGGCCA CCACCTGCCA GTrCCCACAT CCTTATCATC GCGTAAGATA GTCAAAAGCT TATCCCCACA CTGTGAAATC AGAAATTTICT AGTTCCATCT TCAGTCCTT 'rCGGCTAACC AGTTTTCATA ATATCTTTTC TCATCCCTCA 893 ACATTCGACT ACTATCCATT TTCTGTCTAG CAATCTr.G CATCTTTCTT CACCTTT1AAT TGATACCAGG CrrGTATCAC GAGACAGAAA CGATTTGACC TGTCGAATAC 'rAGCATA-rG CTTCCAAC.AA CGCGATATG.A AGCACGGATA CTTGGGCAAC 7rGTCCTCTC AAGTAATGCT TGAAACTGCT GrAGCTAC ACCCTTACGA GTTCGATCTA TTGAAGA=T GACAGTITGA CTCCGCTrC TCAAAATCTC TG~rTCCATC ATCGGAGTAG ?TCTTCCTTC CC'TTCCAAAA ACTCCCTCGT GCATCCTCAG AGCTAGCAAC GTCTGGGCAG GTGTTCTAGT AATTCCCAAA TAGGAAGAAA GATAAATTAA AAATCCAAAC AATATCGCC-A TGGAAACTTC ACCTTGCATA CCTAA'rACAC GAACTGCTTG AACA.AAGTCT AATCTCTCI.A TACCAGGTAG TTTTTGCTCC ATCCGTCATG GAAAAATCCA AGTCCCTGCA
TTCAAATCAT
GCGTATTTGG
TCGACG~CA
AAATACCAGC
CATCCGCAAA
ATTCTGAGG
TATT'TTCAGG
AAATGTTAT1'
TCTGTCAA
ATAATATCAA GCCGAGATGA ACCATCAAGC CTCCTGAAAG CATCAGGATG ATTCTTTGAT CGCTTTCATC CTCTTTAA.A CCA.ATGTATT GAGCACCAAC A'N'TCAGA ATGGCTGTTIC TACTAAGATG AALACCTGCCT GAC=rTTGG TCAAAATAAA ATCTCCTAAT CCAAAAGCCA CCAGCCGATA GCCTGTCAAG TAGCCACAAA AAGCATGACC CAGCTCZATGA AGAATAAAGA TTAAATACAT GCrrAGAAGA ACCCCAACTC *TGCA-AATiGCG CTAGCACATA AACCAAATAA AT=TGGATA AGGATAAAAT GCGAAGGCAT AACCAAAAGT ArtTGTTCCAC AAGCAAAAGC AAAGGCTAAA ACTGCGGAAT TAr.CATAATA AAGC;AACAG 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 G'rCCCAATTT TCTTCZATAAC ACCTCCAACC AACTCCTAG'r TC'rCCCT'rTr CCAAGCCAAT TTTTC=CrT CAA.AGACTT CTTGGTTCCA TTCCATGACA AATTCCTCTG CTTCTGGGTC TTCCAAAAAG TCCATGAGGA CATCTAGCCC AACCTCAGCA GTATCTTTAA GGAAAAGCCC AAAATAAGCT AAAAATTCAC GGG-AAAATCC TTTTTTAGGC AGGTAAGGAA TAACAGTCAA ATAGTCTTCC TCATTGACTG TTGACTrrGGC AGGATTGTAG AAAAGGACCG CTTCCTCAAA AAGAATGTCA TCTGATGA;A CCTCTCCG'rC TTCATCCACC ATCTCCACAC CGCAGCATTT TGCGCTrCCA ATAGAAAACT CACTTCTACC GCATGGTTGC GTTTGTCCCA GCTAATCTCA AAGTCAAAGG GAAAGTTCTT -GTCCAACTCT TCCTCTAAAA TATCTAAAAA TCCGTATGTT GCCATTTTCT CCTCTTTCTA TGCGACrCTT TAATCGCCCC GATTGCTCGG AAATA'rGCTA AAATAGATAC TACCATCTTA CCACAAAATT ATTTATGTC CTAATTATAC CATATTACCT CATTT-AAACC CTTGGTATCA GTGATTTrCT TAAAAGTCTG. 
AlTTTCCTCAT T'rCTCATAAA AATCAATATA AAAAGCCCTC GAAAGGGCTA ATAAATCTAT AAAATCAATA GGCGAGTAAC TAGCACAAGT GGACGTGCTT 894 'IMrATTGAC TAT'rACCACG ATACCACGCT 'rAATCrTAGG CTGAAC~Trr CTTATC1'CCA A'rAGCG=TT TCAAAGTCTG AGAAAAGTTA AGCCCCcATTT CTCGTCCCAA CTTATCTGCC CATT'TTGGTA TGGTCAAAGT CTTTTAATG GGT'rCCTGAC TTC=TAGGTA TTCTGATACA TCAACAGATA CCATACAAAT AAAAGAmTA TCAAGGTCAT AXGGACAC GAAATCTTCA TCATCTAA AAGGATCATT ATCAATTAAA GACAAOCTAT TGATATCTCA TCCCTGAGCT AACTCrCCAT CACTCTCTAT CAAATCTCCA ACAGTTATCC CTAGCAZCTC CGACCCCATA GCCAAACCCT CAGAAATCCC CTCTCCTTGT GTAGCTGAGT A~rCAAAATC TGGGAAATGG ACAAAATAAG TCGCTTCTGT 'TCCG'TCTGTG TCGTCATAAT AAAATAAAGC TGGATACGTA ACTAACAT'rT CACTACCTCC ATATCAAAAA GCAGGGACTG AATTTTACAA CCCAGCTTGC TTTC1'TATCC CTCTTrTCAGT GTACTTATTC AGCTCACCAT GAAGGATTGT GATAGGTCTT TCCCCTTGCT TTTCCA'ITTT AATATGGGAG CCTTTACCGC C=C-ACTT~ TATCCAACCA 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 TGGGCCGTAA GGAGTAAC CATCTCTT GACAACACCA TTA'rAACACG TGTTACACCT TATACATIAAA AGCCCCTAGA TGTGGTTCTA CTAATGAGTA GTAAAAACTC CTCTTTATC AXAAGTA'rc CGATr-TAAGA CAGATTGAGG TACTCTATAT TTTT-TATCTC CAATTTTTAC ACCACGTCAA CGTCGCCTTA CCGTACITCAA TCTTrTT~T TCAT'rGAGTA TCAT'TAACTC ITTATTrACAAG GAGGCGATT ACTACTTCAA TTCAACCCAT 'rATTAGAATG AACI-rCTTGG TCGTACAAAT CTAAAATCTC TTTTGGAGTA GAAATAACTr GATCAC1'AAG AATCCAATGA TGACTTGGTA GTTCCATCAT TCCATTGC AATGACATTC AGCAATATTA TT1'CTC.AAGT CCTATTTTGA GCTTTAAGGA AAATCAAATC AAATAGrAAA AAAGAT'rCTA T'rCACTCCC ACTAATAGGC TrAcAA'rGAC ATAGTAAGAT CG?1TAGAAAG GACTGCTATG CCAGACAATC TGTGTCATAG GCATAGCGCT TTTACCTCCT ATTGTAAAGG AGTGATACTT ATTAT'rCTAT AGGGAAGCCA ATT-ATTCAT ACCTA'T'r GAGCAATTCA TCATCTGTAT AGTCAATTGT1 CGGAGTTGAA TGAATCATAG GAACACTGCG AAACTGATAC TCTTlCGAAAA TCAAATTCAA GTACACCCTC CC=AG=~ CCTAGTTTGC TCAAGTCT'TC GAAATCAGGA TTTTCAACAG AAACATCAAT TATTCTA'IrT TTCATAT=r TAAGCAAAAT CAAGTTTAGA TCTrCCCGGA AGAAAAGT?1T CGAATTITGTT TTGTAAAAAT TATCACCTCT CTTTTCATTA
TTTAATGTTT
TCTTTTCCCT
CAAAAT'TTCC
TAGTTCATAC
7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8512 CAGCACTTCC ACTTCTTTAG GCTCAACTAT TCTCATGCTr. ATACCTCTCC TCATTAAA'r TGAT'rATTAC AAAACCATTG AAATATCACA ATAGTAGATG AG'rCATTCTA CTCAAATCCA TCCCGTTCG CATGCGCC~n G
INFORMATION FOR SEQ ID NO: 130:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 2569 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:
CrCGrI-TCAA GGTTGAG'rCT C?'rGC.AAATC TTG?1TCGCGT TCTTCCTTTT CTCTCCCATG GTTGGTGCCA GCCATTGTrG GAATCTTGCT AGCAAGAAAG CGATGTTTTT GAAATGGAAT AATCACTTAA TACAGGAGTG ATI'kTCTTTT ?1-IArCCGAT GATAAATGTG GGTGAAGAAA TGAATCAAAC AGTAGAATAT ATCAAAGAAC ACAGGCTrTA CTCGTGAGAT TGCGGACTAT TTAGTCAAGA CTCATTGGTr
ATCACTTTTG
TTATAATAGG
GCCAAGGCAT
CrACCAAACA
TAGCCAAGTC
TAGCGAAAGA
CAGCCGGTTC GCACATCCAA GGGCGGTGTC CAACATCGCT ATGTGACTGC CCATGTAGAT CCAGACGGCC GTCTCAAAAT GCACCGTATC GAAAACTGTA CCATTCATGT GGCTAGCACA
AATCTAACTA
ACGCTI'GGTG
GGTGGCTTr'C TGACAGCCAT TGC~tCGCCA CTCTAGAACG TTTTGGTTAC TTAAAGGTCA, AAATGATGAG CTATTrGTCCG TGCTGTCAAA CTTGGAACAT GATTGAAGGA GGTGAAAAAG TATCAGGAAC CATCCTCATC GCAGGAACTG CAGAACGCAC GC AAGACAAT CACCAAACTT CTTGCCATGT A'rGGAAGTGC GT'rTGGACGC GAGGTCGGTG ATTTTATCAG AAGTCTCCCC ATTrGGATGA TATAAGGAAG AGAAGAT'rGA GAAGTGGGAC ACGGTGCAAA GATATGGGAG CCATGGGAGA AAGGATGCTT CTGGACCTTA GAGCA.AGATA TTCCATTTAA
CTATAAGGAT
CAAAGTAACT
TTTTGACCCA
CAAGGTCAGT
ATTGCCCGTA
C1'CTAACATT
AGTGAAAAAG
CGAACTGTCG
GCGGCGATTT
ACAACTCATT
CCTGCTCAGG
AAACTCGTCC TC-TGGCATT TGACAGAGAC AGGTrTTATC TGCrCAATC'r CCTTCGCATT TTGCTTTTTC AG'rCI-TGAA TAGTAGAATA TCTGGCTGTG 120 1.80 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 TGACCAGCAA ACAGACGAAr ATACAGTGTC TATCTGrGTC TCACTATGAC TTCCCTCAAC ATTTGGTGGC T'rTGGCGAAA GCTGGATATC TATCCATTT GCTATGTCTG CAGGGGCAGA AGTCAAACAC GCCCTTCTCG CATTCCTATG AGCGTACCCA TATTGACTCG GTGATCOCAA TATCPAAGA GCACGTTGGT GGAC'TAATAT GTCCTTATT
ATGGTTCGGA
GTGCTGGTAT
CAGAACGAAT
TGTCAGAGAA
CGCTTCAGCG
AGAGTCTAGC
GGTCGATGCT
?1'GACCTCAT CAAGAAGGAA GAAAATCCTT ACT'TTGTCAA AGAGTTGGAA ACAGGCTATC 'rrGTGGTTGG AGACCACCAG TATTTTGAAG GCTATAGTCT CT'rTCTAGCC AAGGAGCATG TCAGCGAATT 896 GCACCATTTG AAAAAGGAGA CAAGACTCCG TTrnCTAGAA GAAATGAGTT TAGTCCAAGA GGCAGTTGCC AAGGCCTTrG C2'GCTGAGAA AATGAATATC GXACTGCTAG GAAATGGCGA TGCTCATCT'r C-ATTGGCATC TGTTTCCACG ACGGACAGGT GATATGAATG GTCATGGTCT CAAGGGTCGT GGACCAGTCT GGTGCGTTCC CTTrGAAGAA ATGAC.AGCAG AAACCTGCCA AGCAAAACCG GATGAGATTA AAAGATTAGT CAAACGTTTA 'rTCAGAAG TAGATAAACT ATTAGAAATA AAGGAGTAGA AA'rGAAGAAA AGATACCTAG TCTTGAC.AGC 'N'FGCTACC TTGAGTCTAG CAGCTTGTT'C ACAAGAAAAA ACAAAAAATG AACATGGAGA AACTAAGACA GAACAGACAG CCAAAGCTGA TGGAACAGTC GGTAGTAAGT CTCAACCAGC TGCCCAGAAG AAAGCAGAAG TGGTCAATAA AGGTGATTAC 'rACAGCAT'rC AAGGX-GAAATA CGATGAAATC *00.
ATCGTAGCCA ACAAACACTA TCCATTGTCT GCCAAGGCAG AGTTGGTCAA ACTCATCAAA GATCATTACA G'rGGTTTTAG AAGTTATGAA AACCAAGATC GAAAGGCAGC AGCTCACC0T CAGACAGGCT TGGCCTTrGA TGTGATTGGG GCAGCCCAAT GGC'rCTTGGA TCATGCAGCT GGCAAGGAAA AGGAAACAGG CTATATGGCT GAAGCTAAAG AAATTGCTGC AAGTGGTCTC GGAGACTACC TCGATTAATA CTCTTCGAAA CCTACTGACT GCGTCGGTTC TA'PTCACAAC GTTTTATCTG CAACCTCAAA GCTGTACTr'r TTTGATTTTC ATTGAGTACA AAAAGTAAAC ATAATGGATG GGTATGTGAA AAACATACTT AAACCACAAA. GGAGGATTTA AAAATGGCTA AAAGACTATA ATCCAGGGGA AAATCCAACA GCGATGCAAG AGGCAGGTTT CCCTATTAGT ACrCAGACCA AGCTCTATCA AGATTATGTC TAC'rCTGCCC GTCCTGGCTA TAGCGAACAC ACTGATGGTG ATT TGGTGAC AGAAGAAAAA GAT'rATGGCT I-rG~rTGTCCG TTATCTCAAA GAAGAATGGC ACCTGCGr'rA TGTAGCAAAA AGTTTGGA.AG AATACTATGG cTrTTGAAGGC ATCTCTTCAA ACCACGTCAG CGTCGCCTTA CTCAAAACAG TGr=~GAGT cGATTICGTCA GAGCAstGCG GCT-AGCTTCC TAGT'rTGCTC TNTTCTCTTG CAATrCCAGA TAAATAGTGT GTGGGAGGTA AAAATCTCTA ATTACCGCCA AAAAAGTCGA AAAACTTGTA A.ATTGCAA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2869 TCCCTGCTGG TAAAGCTACA CCAGCTCCAC CGGtrGACC TGCTCTFGG
(2) INFORMATION FOR SEQ ID NO: 131:
SEQUENCE CHARACTERISTICS:
LENGTH: 6186 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:
CTGAATCC-.r TATAGGAGTC TCALACATCAT TAAAAAAAGA CACTAACrTT TTACCCTCrA C=rTGCCTrC ATAGGCAGCT ArGCACTGAA GCAAGI-rCTT CAGTGCrCCA CGACAAATCT AGrGGGTAAC TATACTGTr'r =~CATTAAC 'rAATACCAC GTTCTTGCTT ACGATAACTA CGAGGGAGAA AAGCACGAAT 'rTTGCATACG CTTGGCATCA ATAATAATTG TCAAATGAAG CAATTCCGAT ACCGAAATAC NT"7TGAACG AG1'TGAATG ATGTCALTCAG GCAATTCTTT TC11'GTTAAA GGACTGGTCA TTTTTCTAAC CACGCCTTAG CCTTACGACA AATCATGCTT T-rCTCTrCTT ATCTATACTr GCTTGACTAG GGAT'TTTAG AAAAAAACC AACGATTGAC ACAATTGGAT TTGTAT ATTTATAT ATGTAAATTA AAAACATCTT AGATACCAA AGAACTTTTC ATAGAGGGCA GTCTCTGGCA AAATC-ATAAT A.ACAGGATCC
GACCACGCAA
TCTCTACATC
TACC-ATTTTC
TAATATTGTG
TGATGTACAG
TCTCArrc= GCTTCTTTTA CTCATCTTCA TTAAAACCCA AAGACTAGGA TACTCCTCAA AATArrCAAT T=TGAAAAA GGTCA6AGGAA AGGATGTGTT TTCCACAAAG GGAACTTATG CTCGGTGATA GAAATAGTCT TGCTACTTCT ATT-ATACAAA AAAATAAAGC TA==~rCA AGAXAAAATAG GCTIrTGCG CACTCTTAAC GATGG=rA AACCATATAT TCCTTTCACT TCCTACGACT TTTCAGATAC AAAAAGAGGA GGAAGGCATG TTCGCTGGAT CAAAAAGCCA
AAGAAAGAAG
GGTATCATCT
CAAAACTCCC
ATAAAGAGAC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 CCCACAAAGA GAATTTGATG GAAAACAGTA AAGAATTCG-T CA.AAACCAAT CCAAGTCCAA TCATCACAGG TAAGACTACT AGAGCCAGGA GAC'-T-TTrCG AAAAAGTCCT 7"r'TCACAAT CCTATTGACA AAGACATAGA AACTTGCAG TGTCACTAGA GCTACTAGCT GAACCAAATG AAAGAGATTC TTGACCACTG CGAAATGGTG CAGACCAGCT GCTGACGAAC GAAAATCAGG ATCAAGATAT GAAAATrTGTA TTTAGCCACT GAATCTCCAT GAGAGGGAGA GGAGAAAGAG CCACTCCGCA AGGCTAGAA6A TCCTTAGTA AAACCTGTCG AATA'rCACTC AAATAATTGT CATCTGTAAG ACCTGACTAA AAGGATTGGT CAGATAATTC TTGAATGGTT TCTGGTTTTA GATAGACTCG ATTCGTTAAG AGGATAGAAA ATCCAAGCCA GATAAATGGT CATAGAGCCC CAAAAGATCA ATTTAGTT CCACATGTGT CGGTGCGATT GGCAGGCCAG
CAGA.AGGATG
CATCAAAATC
CTACTTCTTC
TCACCAAGAG CGTTGGAATG CCATTGTCAA TCCCAGCCCG CCCCAACCAT GATTAACTCT TCAC CCA AACCTAAGTG CTCAACCGCC TTGTCCATAA TGATGGCATT TGGTTTTCCG ATATAAACCG GCT'rCACTCG TGTCGCTACT TCAAGCAGCG TAATCAGTGA GCCAGCACCT GGCAAAACAC CGCGTTCCGT CGGGATGTTG AGGTCAGGAT TGGTTCCGAT AAAATGGCA CCCTTTTGAA TAGCACACT 898 T=~GTGGCA AATTTTrCAT A~rCGACTTG CCAATCCAG-A CCAACTACCA CGTAGGCAGG 1.800 TTVTTcr. TCTTCCACAT AACCAGCCCC C1TGrATGGC TCCTTGAGTC CTGCrC'TCC 1860 GACGACATAG ACGGTCTrTrr CAAGCCCCAA ATCATTCATA TACTCG.ATGG TTGCCAAACT 1920 CGCTGTGTAG ACAGTCGATA GGGGCG'TATC GATAI'AAAA TrCTG.AGCCA ACATCTCC~r 1980 AACACTCTCT GGAGTGCGGG TTTA7TGT GGTTACAAAG AGATAGGGAA TGTCCCGCr'r 2040 TTGCAATTCA TGAACAAAAG TCTCTCCAGC AG=A'NGG TCTT'TCCCCT TATAAATGGT 2100 TCCGTCTAAA TCAATTAAAT AGCCTTTATA TTTCATCTAT TCTCCCTAA CCTTTITrA 2160 T'rrCTGCCA AGTAATGATT GCTTGGGCAT TGATAACCCC ATCACTTGTA ATTTCATGCT 2220 TGCTTTCCAG TCCAG'rCCCT TCAACAGCCG ATGTAATCAC CCCACCTGGT CGAACTTCCT 2280 TGACATACTT GAGGTI'CATT T-rCTTGGGAA TATAGTGGGT CAAAAAATCC GCTCCCATC;A 2340 CCTCAAAAAT CCACTCCAAG TATTTACTGT TATTGACATG ACCATTCATA TCCAAG'TCG'T 2400 AAAAACGAAC ATGGTAATCC TTGCrQATCC GTTCTTCCAA GGACrCATAC TTrCGGTCc-AC 2460 P :GGATAAGTTT 'rTTATCAAAA TC-AGACTGGTr AAGGAGCCAC AATCTCAGGr TCAACAACAT 2520 G GACTTTTCG ACTGTCGCGG 'rCCATGAGAA CAAAGGTCGC CATCATGTGG ATGAGCTCCT 2580 *aaGCTCCGCTTC ATTATAAATA GTAAAGCGAC GGTAGCAAAA AAGTCGATTG TAGCTCAAGC 2640 *aaCT'rCCGTC GATGGTAATT TC-TTCCGCAA PAACGAGGCAA ACGAACCACC TCAATATCAT 2700 ***ATTCTACGAT AATCCAGACC AGATTATATT CTTCCAAAAT GGCCTTATCA CTAACTCCCA 2760 GTTCAATCGA CTGCATCCCT GAAACTTGCA GTGACAGCAA AATCACATCT GGAAGTTTGA 2820 *TATGACCGTT CATATCAGCC ATATCAAAAG GAATTN'CAT TTTCATGA TAAGTTAAGC 2880 *CCATGATCCT ACTCCAAAAT AAATCGTTCT GCTACAGTAT CTCCCAAAAA GAGACCTCTC 2940 TTTG'rCATGC GAACGTGGTIC ACCCTCAATC TGCATGAGGC CTTGTTGAAC CAAATC1'CTG 3000 ACAATTTCTC CATAAAGTCC AGCAAAAGAC Tr.TCCAAA~r T1'TCCTCAAA TCGCGCCATG 3060 *GAAACCCCGG ATTTCTTGCG GAGTcccAAG. 
AAr-ATTrcTT CTTCCATTTG CTCCT'rcA 3120 CTCAGGTGAT CTTCTGTAAT ACAACCATTG CCTTCCTCAA CCGCACTGAG A'rAATGACGA 3180 ATGGGACCAT CATTTTTAT1A GCGTACTCCA TTGACATAAC CAGATGCCCC TGCACCAATA 3240 a..CCA'rAc'ATT CAGCAT1TGTC CCAGTACATG AGATTATGAC GACTTTCAAA ACCGGGTG 3300 'GAGAAATTAG AAATCTCATA ATGCTCAAAA CCCGCTCGCT CCAGCTCTGC AATGATGTAC 3360 'rcAAACazTcr -cccrcTcAG TTCCTccTTA GGcAr.AGccA ATTTCCCACG TCGCA'rCCGG 3420 TTCATAAAGA CCGTATGG'rT TTCTAAAA TC AAACTATACA AACTCATGTG GGGAATATCC 3480 AATCC.AATGG C 'I-AGCCAC ATTTTCC'rTr ACr'rGCTCCA TGGTCTGACC AGGCAGAGCA 3540 TA.AATCAAAT CAATGGAGAT TAAATATCCT TCTCCA.AATG TGGACACCTA GCCAAACACG TCCAAATCGC CTGGA'rTGGC TT'AGTCAACC CATTCACTAA CCACCGATAT AAAGGGTTGA TGCTCTAAAT AGCTGTCCAC TAACAAATCT GGGTACAAAA GTAATTATTA 'TACCACAAAG
AGTCAAAA
ACTGCGCCCA
ATTGACAGCC
T'rCAATGTC
CACCTCCAGT
CAACTTTTCA
'rGGCTGATTT
TGGGATGTGC
ACTAGATTCC
CCAGCCAGTT
ATCTTTTTCA
GAATTC-A
AACI'CTTCCA
TGCGGAGCCG
ATATCATAAG
TCAGGCGATC
ACAT!CTTATC
AAACAGCAT
AGACAGACAA
ACAGGGCTOT
AACCAAACTC
GATA'ITN'CA
ATCAAAGGTC
CTTATCCGCA
ATCCAAG'Tr CGgTGTTrCCA T-rCCAGCAGA 'PTGATIGAAGA CCT-rrGAAAA ATCACAATAA ACATAGGCTG ACGTTGG=TT 1TCTGCATA TCCGrCCGGA GATGGTGATG GTTTATTCTT CATCAAGAGC TTCGGCTTTT TCTTGCCATT
AGATAAAAAT
CTGTTATATC
GCTCCTTGAG
TAGTTTGAC
TCAGAACAAA
ATGCTTCTGT CCT'TGAAA.A TTTCACCTGA CTCAACCTGT CTTCTGTCAG TTTTTCACAG CTAGAAAATC TTGAGCCTGA CAAGAGCTGC ACCTGAAACG TCCTTTTTTA TwTTTT~GAAA TCTTGAGTGT TTTTGAACCA TGAGGTCTGA AACAGATAGA CCACCTTGAC ATTGATATCA GATGCAAGGT CACATCCACA TGACACGACC AAGCTrGT.
GGGCAACAAG AATATATGCA GGGCCTTCTT TCGATTTTCA A.CAGCCATTG GGACACrCGA GCATAGGA1TT 'rCTTrTGIr ACT'TGCTCCT TGGCATACTC C'TGCAAACTT GTTTGCCC'TT CTTCCTAAAA GGATT'GGA AATTTACTE'G CAAGACGAAG GCTGATGAAC CT1'TCTTGCT GATAAATCTG CAACAGCACT TCTGCCAAGA CATTGACCTT TCTGAAGTCA AGGTTNTAAT -ACAGTAATGA TCAGATAGAC AC1'TCTAACA TTTAGTTTTC 'rAAATAACGA TCATTATACC AAGCCGAAG.A ACATGAGACT CACCATCCCC AGATACATAG 4080 AATCACAATC TCTTCTGAGT 4140 ATTATTTAAT TGATTTTTITG 4200 AAGTATACTG TCCACAGTGA 4260 ATCTGTACCC TGCrCCTTAA 4320 CGGATCTTCT CTCAAATCAT 4380 ATCACTTGTT AAAAACAAGG 4440 TAAT'rTACCC ATAAGGATTC 4500 AGCTGACAGA CCACCAG 4560 CA.AGACACCC GCATGGTCAT 4620 GAAGAGTGGA TCAATCGTAG 4680 AGCCAACAAC TCATTGGTGT. 4740 CGTCTT'IrCT GTTrrCATCGA 4800 CAAAAAGACA ATCAAAGCTA 4860 CTCCTCTGTA ATATACTAAG 4920 GACACCGATA AGGACAACTG 4980 ATCTGTTCGC ATACCTTCGA 5040 CGTGATA'rGA CCTCGTrCTGA 5100 GCCAAGCAGA TTCCATAGAG 5160 ATACATCTGG TCACGCATAA 5220 AGCTTCTTGG TTAAAGAAAT 5280 got* ese.
TAACCATACG ACCGAAACCA TACCAAATCA AGTAAAAGGC GACTCTTCCA T1'TCCCTCTA AAAATCAGAA TCAAGGCAAA ACTCATAAAG GAAAGTCGGT TGACGGTAGC TCCCCTCAAT AGCCAGGTAG ATAATCCAGA T'rATCCACTC TTGCACCATA 900 TACCCCAACC CCCCAAACTT TGAGCAATCA TAACGCTAGG CGCCGCAATA TCTAGAAAAT CCCAAGTATT GATGAGTTTA CGGTCAGCAA AGATATAGAG CACAAGAGCC CCAGMTATCA AACCACCGTA AATGGCCAAA CCACCATTCC TATAGTAATC AAATCGGAAA ATAACATAGT AGGcrAcTAA GATAAAATCT AAAATATCG.T TCATGGTCAA ATAAACCGCA AGAATCAAGC TGGCTAGGGG TCCTAGrTGA ATAGCAATTG ATTAGACTTG TC-AGTCGTTc GTCGAACAAA CGATAATrCA. TGGCAGCTGC CTCAATCACA ATACGAATAC GAGGAATGtA CGCCAGAAAC ATCAAAGGTC TTATGCGTAT CGTAATTTTC ATCCTTGACA GCACrCGCAC CGTAGAGACT ?I'CAATCAAG TGTTTCAAAA TTTCAGCTGG AAATGGCAAA AATCTCTCCT AAATTC'TGAC AGAGACGAGC TCCTAAAATA GCCXAGGGAA
CTGGTATGAT
CTGTCACAAT
GATCAAGCAT
CGGGTCGCAT
ACAGAGATAT
TTCAAGTTC
CAAATAGACA
CATAACATCG
CTrCTTCTA GGTCCC ACATAAGGCA TACCAACGA TTI.GCACCrC ATT'rCGAGCG CAAAGCCCAT TT~CCTTGGCA TACGACCTGT TTTAACTGGA TCTGCAPTAT T'rCCAAGACG GCAAGCTGAA CCTGTGAAGA ATAATACCAA CCCCACGAAT 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6186 TTCACCCCAG AGAGTAATCT CATCCTTGGC AAAGATATCG ACACGGTCAT CGGCTACCAA ACGGTGACCA CGTTrGACAA GCTCAAGACC TGTCTCGCTC TTACCAAT'rC CACTATCTCC CTGAATCAAG ACGCCCATCC CATAAATATC
CATCAA
INFORMATION FOR SEQ ID NO: 132:
SEQUENCE CHARACTERISTICS:
LENGTH: 9541 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:
GAAAATCACA ACCCTTTTTG CAAAATTTTT GAGATTAT'TT TCACAAACTT GATTTTTCAA AGTATACTCA ATAAAAATTA AAAAAATCCA CTACGTCA.AG GCGAGGCTAA TGTGGTTTGA AFAAA'rTTTC GAAGAGCGTG AATGAGTATC ATCTATAGTA AAATAAAAAA ACTGAACAAT 'GGTTGGGG ACAGCCAAAC CAATTTCTCA CAATGTTTCA GAAACAAGGG TGTGCTATTC CAATTTCAGC CTACTATAAC TGTCATAGAT TGCTGAAACA AAGTCTAGGT AAAAGTCTTC ATAATAAAAA GACCTrCCTAT CAAGTGN'CA AAAACI 'TGA TAGGAGGTCT TGTTTTGTGA AAATATTTAT CAAATTTTCT ATACAAGTGA GCTGTTAGCC AGGTTCPTTTC TA'rrCTTTCA ATTTCAATGA ATGGATTTTI' TACTAATACT CATAACTGGG AA~1'TGTCTG TGTAAAAATA GCGAGATAGA TGGTAT 'TAT AAAACACTCA AGACAGCTAG ACTAATATCA T11TAAAACA'r
TATCTTCT
GATAGGCTrC
AGATAAGCTA
TGAGCCACG
TGATTTACCA
GCCGCI'ATGT
1-rGGTTA.CCA ACATAGCTAA ATTTCCTGCA rTrCAAATT TTCACAACCA CCAA.GGTrG TTCTTTGCCG TGAACTTCAT TCAATCGCAG AATr.CACAAA GACATGATGG TAAATTTCAT CATAGCTAGA GTAAGAAAAG ACTCAATAC1' GTCTTGACGC CACCATTATA GCTATTCATC CAGTCGCAAC CACTTTGGTA GCATGAGCAG ATTC-ATGGCT GCACCAGTTT- TTGTCTTCAA TCATTAATCA AGTTCCAGTT GCACGCTCTC TATCATCATG CGACCAAATT CTTGACCCAT GTGGCAGT'rT CGACCTTGCG TCGGATGuAC TGACCGTAA CACTTGGTrC ACACTGTCAG CGTCAACTCA CCATTTTCAC TCCATAAAG GCCATCCCCT CATGA'rTTGC TCTGAAC L 1 GGTCTGGATG AAGGGTTGCC AATAAATCGT GAAGATTGTA ATTGTCATCG GCTTC'rACAT AGTTA.AGCAC CTGATTTGGA TGTGTATAGC TTCCTAATTC ACGACTTCCT AGGATTGCTT TAGCTAGAAT TGGCTCTGTC GC.AGCACCAC CTTCTCCCCC TTTGACAGCA TCGCGCTGAT GGTAGGCATT GTCCTTCTTG GCCTTATCAT CT'rCTCCATA GAGGATAATG TTGGTCGA TGGTCTTGAC ATCATGAATC CCCATCAAGT GCACCCAGTA TAGAAGAGAA TCAATCATAT TGACAAAACC TCACTTGATA GCACCATAAA TGTCATTAAA GAAACCAATA TTTGGCATCT AAGGGGCAAG ACCTGT'rCCC ATATCCCATC 'rrTCATCCAA GCTTTGACGA ATCATCTGCA CAAAACGGAA GCCGTCAATA TTATATTCCT ACT'rGCGANAA CA=rTCGTGT TCACTGGCTG 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 TTTCATTT1CC AACACCCGTT CCATTCTGGA ACGTACCATC TGGATTCATA CGATAATAGT AATCAGGGAC TGTTGTTTGG AATGGTGCAT CAACAACTGA GAAGGTATGG TTATAGACTA CATCCATAAT GACTCCAA'rA CCCGCATCGT GATAAGCTTG AACCATCACC T1-CAAA'rCAC GAATGACCTG AGCTCGATCA TCTGGATTAG TTGAAAAACT AGTTTCTGGC GCCTTATAGT TTGTGCATC ATAACCCCAG TrGTAGGTTA CATTTCCATC CTCATCGTAT TCTTTATGAC GGTCTGCAAT TGGTTGCAAT TGAACATAAT TGTAGCCCAG CTTCTTGATG TAATCAAAAG CAGTTGACTC GCCGTATI'GG TTAACT%.rTC CAGCCTGAGC AGCACCCALAG AAACTTCCTC GAAGATGTTC ATCTACACCC GATGTAGGTO ATTTAGTCAA ATCACCAATG TGCATTTCAC AGATAACTGC CTTACATGGA 'TT=CCA6ACC GCCAAGTAGC CTCCGAACCG TGCTTAACCT CGAAGTTTTC AACTTGCTTT TCTACATGCC TCAGAATAGC TGAACCTTTG CCATCAAGGGC TGGTCGCGAT TGTATAAGGA TCACGTGTCA GTGTTTGGTG ATGAGGGAAT TGGACTTGAT ACTGATAAGT CTTACCTACC AAATCTTCTT CAACA'rCCAA ACTCCAGACAL CCCATTGTAT 902 TGTCCTTATG ATTATAALGAG TACTATTGC CTCTTI'TCAT CTCAAAACTC 
rrCCAAACGG GTGCATCATT AGCAGCTGAT TCAThAACGA CAACTTGCAC TTCTGTCGCT GTAGGTGACC AAGAGAAAA ATGACCCTGA 1'TGTCCTCTA CACGGCAACC CATC'TCCT TGGTAACCCC AATGA'rGATC A)AACTACCA CTGTTAATGG CCTTATCAAA GGCAAAACGA TTTTGAT=r TATAGAAAGG ACTGGCAATA GCAGGATTT CAGAGTAATA AATCCTATCA TCGCCTTCCA AAATCCAGAC CTCTCI'TAAT AGGGGATACT GA'rTAAAACG GATAGAATAT TCTTTACTAG TTrTGACCTGT ATGAACCACA AAATTCAAGC TTTCTATAAC ATrGAACTT GGGTGTTCAA AGCTAAATAA AGCTCCAAAA GACTTTTTAC AAAGGAXGCAA TAGGGTAGTT ATACATTTTr T~wrTGTCAAT CTCGTrCTCAA CGATCCA'rTA CAAACTTTCT CTTTrTCCAG TTACrTCCA TACTACCTC'T CACTTTGGCA TAATCTTCTT TGTAGGTTAG GTGTCATAT'r CTCCATTCTT TATTI-TTCCr TTTTACTTTC TTAACAGACA TAGTCATATT AGCCATGCCT CATCTCTGAC GCG'rCTGCAC CTACCACGAC CCCAGACIT CGAGTTTACT GTCGTTCCAA TTrTCcTGTC AATACCACGC TCAAACCTGA CCGTCTGTCC TTTATAGTCC AGATrOACCC CAGTV'CTTT CAGAGCCTTC 'rGTCGCAAAA TAAGTCTGAA GACTTTTGGC CAATACTAGC CZACTrCCTCT GAATCTGCCT GAGACAGATT GAAGTAAAAG CrGACTAACC TTGCTTCCGA CATGACGAAT TCTCGGCAGA ATTCCTTT GATGCrTrGGA TAGCCTGATA CCTTAACTCC CTCTAAAAGG AGGAAATCCT CNTCTTGC.AA CCTTGACTAA ATTAGGACCA AAAACCTTC'r CAACAATAGA TCATAGCATC ACGAGAAGCA AAGTGAATCA AGCCTrCCAT GATTGATACA ACGTAGGGCC ACTrCATCTr CAAAGTGCAA .GACAGTTrGT AGGGATATCT AGTTTTTCTT CAGAAACCCG AAACGGCAGG GATGATGTCA Cr-AGCCTTAT ATACAATGAC CAAATCAATT CGTTGATCCT ACGATGGTAA TGA.ATGCGCA TTTCTA"TrrC ACTAATAAAT CTCrAAACTC TGTTrITTAAA CTGGATACCA AGT'rCTTGTG GAGGTCGGTC TTrTTrAGAAA TTTrAGCTrCT GAGCGCTTGA CAAGGCCGCA TCCGCrACTA CAA7-CTCTC AGCAGAATTT AATCACGCCA CCTAGACTTT TTCAATTGAA TGGAAATATT TCCCAAACCA AATAAGAGCT CAGTTTAGCA GCCGACrrT ACGATAAATA TCCGCCACAT TGGACCAAGG CCTGTAATAT GATTTGAGCA GGGCAACGCG CAAGTCAGAG TTACAACrTT TTTGGACTCT ACCACACGTA CGTATCGTC'r 'NTCGGATAT 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 CTTNTCAGC AATATAATCT ACATTGTGCA GTTGTACTGG TGTTACATTA GCAGTTGGAG CAACTCATAA GAGTTGAGCT TCTTTTTCTT GGGTCGCACG GCTAACAGTC GTACCGGCA.A TTACAACACC GGTACGGCCA AC1'GTCCAGT 
CGGCAGGGAA CTTGTAGGCT ACTGCCCACT TTGG.AGCCTT AACTGTAAAA CCAAGTTCTT CTTGAC 1'GC TAGGTCGTTG ACCTTGATTA CCACTCCATC AATATCGTAA GGCAANr CCCGT-rCCTG TCCTACTITCT TGGATAAAAT TCCAGAT'rrC ATCTATGTTT TCAGCCAAGA TTCCTT1AGG A~rGACCACA AAACCrAGTT GTTCTAGGTA C?rCAAACCC TTTTCTTGGC TATCACC.AGT TGAAGGGCTG GCTI'CTTGAT AGAGAAACGT TGCAAGATTrA CCTTGGCAA CTACTGCTGT ATCCAACTCA CGCAGAGTTC CTGCTGCCGC ATTACGAGGA TTAGCAAA'rr CAGGCTCTCC ATT'rCTTGC CGCCTGGT TAAC7rGGTC AAAGGAAGCG CGTGGCATG;T AACATTCCCC ACGA.ACEGTG ATATCTAGTT CTTCTGGCAA AGTCAAAGGG ATGTCCTTAA CACCC?1GAG G~rTT=rGTC ATATTrTCAC CAA'rTGAACC ATCTCCACCT GTTACCCCAG CAACCAAAAr CCCC-TTrCA TAAGTCAGCG AGATAGATAA GCCATCCA'rr TT'CAGCTCAC AAATATAGGT CGGATGAGCC ACTrCCTTAC GAACACGCG.C ATCAAAAGCA TCTAGCTCCT CACATCAAAA AGCZATCCrGC AAACTATAA.A GAGGATACTG ATGACTG'TAT TTITrCAAAAC CATCTAAAAC C'TTGCCACCA ACACGATGAG TrGACTGTC TCAGCACT TGCTCTG.GAT AAGCAGTTTC TAACTCGACC AACrACGGT AAAGGCGGTC ATACTCACTG TCTGAAACCG AGGGATTATC GCTGGTATAG TACTCAGTCG CA'rAGCGATT GAGCAAACC ACTAACTCAT TCATTC=--- ATTCATAAGA CCA'XrTACC ATAAAACAAG CCCTCCTCAC AAACGAGAAG CGCGGAAAAA ACAC-rACTTI TGAAATTATT 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 TTTGAAACTC AAGCAACCT TA.AGAAATAT AAGGCTACAA TGGCAAGACT GTCA'rAAATC TGTACTCAGT ACTCTTCCAA GTGAATATTA AAAATCG'rCA AGTTGGAAGr GGTAATCCCA AAGATTATAA ATATTIAGCTT AGCCCCCATA GTIAAAATAC GGGAAGTAGA AATTCAAAAG ATATCAATTT TTCAAAATGA GTTCCAACAT ATCCGAGAGC CTCCAAGTCC AATAATCAAG AAAGAATAAA GATGGACACT CTTTTGCAAT AGGCATAAAT AGAATAGCTA AGGTAAAAAT GAAATTCGCT CTCAACC.TTG CTTTGTACTT GAGTAAAAAA TAAACAATTC ACAAACTAAA TTTCCAGCAAA AGGAAAGAAA TCATAAAAAC TCCGACACCT GTCAA.AGCCA GTAAAATCAA TAATTTTACT AGCTAGAAGA GCCCCAATGA TGGAACCAAT TTGCA'rAGGC TCCTTCTGAC CCGTAAAGCT CATTCGAAAA CTGCAAAAAA GAAATTAACG CTGGAAGCTA CCAGCAAAAG GAAGAAAA'r
ATCTCTCCCA
CACTAGAACA
TCCAAACTGT
CAGCTCTAAG
TCTTGCTGAT GCCAGATATA G3'CTAACCCA TCCT'rGATAT CTACAAAAAT G;TAAAAGCCT TTTTCTCTTG AACTTTTGCT TCCTCTTTTG GAAGGAAAGC AAAGCAATGA AAAAAGTCAG CGAGTCTAGC AGTAGCGTCA TATGGAGACT AAAACAAGGA AGGAAAGAAC AGGAGAGCTA ACACCTACAA CCTGCAAAAC CGAGAATTAT AGATCACAAT CTCATCTTTC TCCACCACTT CAGTTATGAT 904 AGCTTTATTG GCTGTCGAG AAAAGGCAAA AGCAATAGCC TGCACAATGT TAGCAACAAT CAAAGCGCCA ATCATCCAGC TATCATTCCT TATC-AAAGAA ATAGCCAGAC AAAGAATCCC ACAAACAAGA TCTGCCGTCA ~T-AAAATCrT GCCAAAGGGA TTGACGAGAA TAGATGTGAC TCTCTGTCCT ATAGTCCCCA TAGAAGCCAA
ATTTCCCATT
AAAGCTCCTC
CCCTTATTTC
TTATTGATAG CCCCACGGCT TCAAATlrrG AAACTATT'GT GGAAAATTAA CTTTTGACAA ACGACCAGAA AAACGCTCTG AAATAACTCC GAGCTCAGAA ATCTGATACA 'ITCCTAAAAC CCAGACACTA TTTCCATAAT CATAGAGCAT AATCAACTGC ACTGCATAGC GATTCATATT ATCAAAACCC AAAGGAGCTT TTTAT-rNM ArrTTTCGTA GTGTCCTGA TAATAGGCTA TATGCATrCCC CTCTTGCATC TGCTTACAGC CAAAACCTT-r TAAA'rAATGG AAAACATTAC CTTCTCTGG AAGACCTAAC CTTCTTTAr-A CTGTCCTTTT GCTCATAALAG CTTAATACCT CATTATAAAA AGAAGAATC CTAATCTCTT ATGCGACTA
ACATCAAAAA
TCATATAAGG
TTrGTCAATAA
TCTAAACAAT
ATCL'CTTGGT
TCTTCTCTGT
GCTGGTAACA
AAAATTCCTC
ATAAGCCTCA
ArrrAGGGCC
CCTCTCCATA
CAATCCCAGT GACA'rAGCT= ACATCCTAGA r-ATACATGGT CCATTCTTCr GTTCAAG.A GCTTCATATC ATAACTCGCA TCTCTTTGAC GCTCAAAATA AAGGGGAGTC GTCGAACTCTr AACTATAGAG GTTACCGAAA GATAATCTGC TACCTTACCC AAATCAGACC TPGTAGCAAA TAGACTTTTC AAG"TTCT
AAATAGTTGG
AAAATCAACA
ACTTCTCTAC
AGAATCAACT
AAATCA'rCCT ATCCAGTrCA
CTTTGAACCT
ATCATAGACA
GTTACATGAA
AGCTCAAAGC
5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 T'ITGAAAACC TGCAATATCG TTTG3AATAGT AAAGTGGGAT AATC7CTGCC CATGTTCATG ATTATGAAAA TTCCTrGCCCT TATCCATCAA ATTTTCGATT TGT'rATCCAA AATCTCAAAG AAACGGGAGA CTGCCAGGTC AGACTCCCCA GAGATAACTG AGAGGTAGAG CAGGATTCGC CTGCTGCTTC TTGTTCGAAA TTCACGAAAT ACTTTTCCAA GATGT-TCCAT TTcTTCCCAC TCA6AGCATAG CTTCT'rCCTG ACGATGGCTG TAATTCCATG AGTTTGTCGG CATCGTTTGT TTCCAACATr TTGACTTTCT AGCT=~CAA TTTCAGCTTC TAGACTTTCG AACTC'ITrT TGACTTTCTT 'rCTCGGCCTG ATACTCAT'rG TGATTGCTA GTTGAAGCTT CCTCAGTCTG ACTCA'rTTCT ATAGTAGTCG TAATCTCCAA GGTAGAGAGT TWGACCATTC AGTTGCCACA CGATTGATAA AGTAACGATC ATGACTGACA CI'CAATCAAG GCATTTTCTA CCACTTCMr ACTATCAATA CTTTAAAGA-A TAATTTCCAC CTTTACACCT GCTCTGATAA ATTTTGTCCA GCTCAGCCTG T=~CAGAAA TGGCTTGGCT ATTTGTCGCA TGAGTTTGCG ACTGGACTTG CTTCCTTTGC GCTGTTGCTT TCTTCTCAAC TcAGAC-AATr ccAA:AAcATG AACAGCAAGG TTCCATCAAA TCCAAGTGGT TGGTCGGCTC 905 ATCCAGAATC AAAAAG=rAT TGTTrX'CCAT AGACAATTI'A GCTAAAAGCA AACCAGCTT T7CGCCACCA GATAGCATGC CGACTGATTT TTTA.ACATCA TCTCCTGAGA AA6AGGA.AGC TCCAAG.ACCC TTGCGGATTT CAGCACCGTA TTACrrGGTG ATTAGCGCCA AAGCCrTT AAAGTGAC TTCCCGATAC ATCTAGGTTA ATCGGTTGTG AGTCAAAACA ACATTGCCCG GCCAGCrTCA GGCTGTCCA CAACTTCTGG 'rGTCAGTTTG AAATCATTCC AGAGT'rCATC TCAGCTTGCr TTGGCTTTGG- TCATAGTAAC CAACCTCAAC CTCCCTTGAT AAAAGGAATC CATrTGGACC AACGATAGCG ACAAGACTTC CCCGTCATAG ACGTTTTTTC AGACTGGA6AC AACGTTCCAT 7"rTN'CCAGT TGGTCCZACAA TAGACT'rGAT ACAGCATCA TCTTACGAAG CCAACAGCTG CATT'rTCAAC GTCATGTTGG CTGAT-rTCTT TGT?!'rACGGC GAGATTGAC ACGTTTAGTC GTTGAAGCAC GAACTAGATT GCGATTGACA AAG'TCTTCCA GACCAGCGAT TTCCTTCTGT TCCTTTTCAT AGTTT-rTGC CTCAGTAACT AGCTTTTGCT CCTTCAATTC GACAAAACCA GAGTAAT'rCC AATTGTCCCA ACCTTGTCC-A ATAGTr'rACC AAGTAATCr CTCGTCCAAG ACCAAGAGAT CCACATAGCCG ATCCAAGGA.A TGCTTGGTCA AATCTAGCGT AGAAATAACG GTCGTGGCTG ACCATAATGA GGGCACCGCT CTAGCCAGGC GATGGTTCA ATATCCAAGT GGTTAGTTGG TCGGGCTTTTC AAGGAGCATT TTGGCAAGTG CCAAACGAGT 7620 7680 7740 
7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8946 9000 9060 9120 9180 9240 9300 A7TrTTGACCA CCAGAAAGCT CAGCAATTD? CATCTGCC-AC ATAGACTCGT CAAACT'rGAA TCCATTCAAA ATCGCTCGAA TATCAGCTTC ATAGGTAAAG CCACCTGCTT GGCGAAAAT'r CTCAGATAAG CGGTCATAAT CTGACATCAG TTTATCCAAA TCCTCACCAG CATCTCCAGC TCCATCTGAC GCAGTTGTCT CTCCGTCCGA CGCAAI'CAT AAGCATTTCA TCGTAGATGG TATTTr'rAGA CTCAAAACGG CTATCTTGGG CAGAGAAATA TCTTrI'Tr TATTGATTTC TCCGCTAGTT GGCTCCTCT'r
ACTTTTCACC
TAAAGACATG
CTAGGTA-AGA
C'rCCAACTAA AATCTTCAAA AGAGTAGACT 'rACCTGCACC ATTTr'rCCCA ACAAGAGCAA TCCGATCTCG TTCATCAACC TGCAGGTTGA TATTA'rCGAA AAGAACCTCT CCTGCAAAAG AACG'IrCAAT rTTTAT'rAGCT 'rGTAAAATAA 'rCATACA-AGT AGTATAGCAT GTTTCCCTAA GGCATTrCAAG ATAATCGTAA GTCTTTTAGT ACAAC'rrTTA TAACATAAAA TAAACTAAAT TATGTA'rATTr TTATATTAGA TTACTTCACT ATCTTGTGG ArTTrrCTAAC CAGCTAATCT TGT-rrCAAAT AGTTATCGCA CAACTCTA'rr A'1-I-TATTCT TT'rCATCATT TACGTACGTA TAGCAGATTG.
AAATAAGATG AGAACAAATC GATTGGGAAA GTAAAAT'TAA TTTCTATAAA TGT?1'TAGCA ATTGTTTCGT ACTATTAG ATTCACTCTA CTATATACAA TATI=CGGA ACATTCAACT 906 TTrTA.ACTCT ATTTATT1ACT AGATTTCATA ATrAAAAAAC CTACTGACCA AGCTAGAAAG CTTGATACAA TAGGCr?1r AAAGACTGAT TATTrAACAG CGTC?1TrAAG AGCTTTACC-A GCTTrGAATG CTGGTACTTT~ AGAAGCTGCA ATTGTCATT C3TTACCAGT TrGTGGGrrG CCACCTTTAC GTTCTGCGCG CTCACGAACT TCAAAGTTAC CAAAACCGAT CAATrGAACT
T
INFORMATION FOR SEQ ID NO: 133: SEQUENCE CHARACTERISTICS: LENGTH: 3502 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133: TTGACT-ATCC TATC.ATGCTT TCTAAGGTCT ACTCAAGAAA ATCA"TTrCA AGT=~CACA CCPTCTCAA. AAAAGTTAAA AAATTTTCTC AAAAACGCTr GGTTIATACTA TCATTGTAAG GAGGAAATCA TCTACC-ATAT CAGCTGTCTC TGTCAAGACC CTGCATCACT ATCACAAGAT AGTCGGAAAA CGGCTATCGA ACCTACAGTC AAGAGGATT TITACTACAA ATATCTAGGC TTr'rCTTTAG AGAAAATAGC GGACAGATTT ATTGCCCCAT TTGAcTAGGC AGTTGGACTA ATCTGGATAC CTTGATTTCC ACCTTGCAAA AAACTATTCA AAATGACCAT TGAGGAAAAA 'PTCACGGGAT TTAGCTATCA GACTCTGACC TAAGGCGAAG AAA.AGAAGCT CGCAGCITrT AGGACTCTTG G'rCCC CTT AA GGAACGCCT'r CAGGTCATTC AGACC-CTTA AAGGAAGAAA TCTAACTCGC GAAAGGCAAC AGAACAAAAA GGAGAAAGAA AGACAATCAA AAATACCACC AAGAAGCGGT AGAGAAATAT GTCACGAAGA CGAGGCTACG TTCAAGTrGG TrrACCTGCA AAGCCATTCG CACTTATGGA GTTAcGTCTA cAAccCAGAG AGTACACGTC AGATGCCATT AATTTCCTAG CCTATTTTTT AAAGAATTCA QTGAGATCTT TGC.AAGT'rTT AGATGAGAAA AGACTGGAGT TCACCAGTAA GGTC.AAGAAG TCATGGGACA ACCGCTCGAA CGCCAAAAAG GCCGCC1TTTA ACCAAGTCTr 'rCAAACTTrG GCACAAAATC ACAGCAACCG AAAACCAGGA GCAAGCAGCC AAGCTCTTGC TT'TGACTGCT CTATTGAGGT A?1'CGGTCAT ATCGGTAAAG TTTAAGGAAA ACATrGACAA GTTTGGTTCT GAAACAGCCC GCGGTTTACG T'rCAGACAAA TGCAGAATAA ATAGGCTAGG ACTTCAAATC ATAAAGCCAG TCGTCACCGT TTTTGTAGTA CTTCTAGAAA CACACGAAGC A'rATCAGACA TATCATCGGT' GATTTTCAAA GTCCTCCCAC CAAACTTTCC CTTCGTCTGA AGTGTTCTGT CTTGTAAAAA AGGACCACAT AACGATAATC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 CTTGTCGTCA 'rACCAG~rTr TrCTT'Cr-I-C ACT'rCACGAA ACCAGGAAAA GTAATGCCAG TCCGrTTTTA ATCATACACA CAACCTTTAA TCITAATCA AAGTGATGTT AAAGCCACCC 907 TGA'rACCACA GAG'IrCGGGT ?TGGAAATGA TCAGACCAGT TGACAGCATC GACAAAGGAT T'rCCACG~r CAACATGACC ACCAGrcCGGG A'rrAACTCGG TCTTGGACCA GGACCTTATC TGrAACAAA TTCGACTGCC TCTCTTCTGT TCATTCTTCA TAATGCAGAC TTCCCGCCAC CCAGCCGGTIA CAGACGGCAG GTGTGGGCA'r TGATATCCAT AACTrC=CTr GCAAAGTGGA GGCCAGCGTAC CACCTTACTT TCAAGGGTTT TAGGATTGAT TTCCI-TGAGA 
CTGACTCCAC 7r'=CCACT 'rACAGGAATT TTAAGTTCTTr CCTTGGTAAC AAAGGACTr GCAAGGGACA TAA'rCGACTG CACAAGTTGT TCTCGTTCCT ATCCTTGTAC AAAAAATTCG GCCAAGCGT AGGATTTTTC CCAT CT TCTAGAAATG AAACATCGAG TGAGAGA.ACC TCCCCACCTT GACCTGACAA ACCAAAGTGG GTAAAGAGTA TTAGGGTCAC ATCGTCCAGA GAAATACCTT 'rTTCAGTCAG CTrGGTrAACAA
TAACCAAGTC
TGACAAAGCT
AATCATCAGT
GTAAGGCTTT
TGGTATCCTT
AAGACTTACC
TAAGGACA.AA
TTGI-IrTGAC-r TTTCAGGAT GTrTTAAA GCGTrTCA CTCTCAGAA AGTTGACGCA AGACATGCGT ACGGCAGCAG GATGACA.TGC TTACCATAAC ATGTG.GAAAA TCTGTTAATA AAAATGGCGA GCAATCTCGT ACCTGlrTG ACAATGAGTT CTGGTCATCT ACr7r1rTAA AAGGACTC ACCAGCCTCA GACCAAAACC AGTCGAACCA TCTCACAAGT GAAGG'N'TGA CAGAAACGAT TTCTATTTGA AAGCTTCGAT AATAGTCCGA CCTTAAGTr'r AACACCATTT AGAAAACACT GTAAAGAAAG TACCATTGTT GCACATTG TCCGAT'TTr TTCGATGAGG TCATACCAGC AGGTCCCCCA .ACCACAAAAA AACAAGAGAT
AGATCGGTGA
GTCGAACGAT
TCCGCTGACT
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 GTAGCAACTT GACCACCTAG TTCGGTGATT TTCTTTTCCA GACTTGTCAC TGGCTGGAAA GACGCGTCCG TGCTTTCGA TCTGTAAAAA AGTTGATGAT GTCA'rGAT'rA TCCAACTGGG CGTCCGTTTC CAGGAATTCC AGCTAGCAGG TTGTCTAAGC CAACCTCCCC CACCAGTCCC ACCTAATTTT TTTCCA.AGI.r AGGG~rI'CT GTCCATAAAA GCTACTGGAA ATCGTAGCCA CCGATGACAA 'rAGTA'rCA.AA ATSTTTCATA GCTCTATTGT GATGGTCACC TCTTGTCAAG AATrCAATTA ATCAATTTCA TAGCCCATCA GCAAACCCCC CTCTTCrGCA TAGAAACTGC AGACACCAGA GGTTGGTAGA ATTI-rAATAT CCGCTTGTGG GAAGGTTTCA CGGATTCGCT CTGAGAGCTG TTGACAACAT TTTTCGTTAT TGCGT'rGGGC CATGACAATA CGGCCACCAG CATATCCAGC TN'TACTAAC TCATCATAGG CAGCT'TGAAC TGATTTCTTT GATCCCCTTG CT~rTGTAG CAATTCr-AGA 908 GTCCCAGTl'T CACTAGCTTT TCCGACCATA CGAATGTTGA GAAGGCCAAC GACCGTACCG ATAAGCTTGC TCAAACGGCC GTTCTTCACC AAGT-rATCGA CTTrGGCTAG GACAAAGAGC AACZ'rAGTTT TTTCrTGATA GGCGGTGATA GCITCAACCA CTTCTrCAAA AGACAAGCCC TGGTCAATCA AGTCA'I'TCAA 'I-I-TTCTACG AGTAGGTCAA CTrCACCACC AGCAGATAA CTATCAATCA CATGAATCTT AGTGTCAGGA TGGTCTTCCA GATAAATATT CTTrGCTAGT TGAGCACTAT TGTGACTGCC AGAAAGGGTA CCTGTGATGG 'ITACTAGGAA AATGT'TG GCACCTTC.AA ATGCTCGCAA ATAGTCATCT GGGCT'rGGAC AAGCCGATTr TGAAGCTrCT GCAGTTGCAT ACATGG'TTrC CA'rCATrTGG TCPATATCGA GACTGGCGTC ATCAACAAAG ACCTGATCAG CTACTTGAAT GG'rTAAGGGG ACACTTACAA AGGTT-GTGT-r AATAGCTGGT GTTGGCAGTT GACGATAATC ACAACCAGAG TCAGCAATAA TCTTCCAAGr CATAGAAATT CTCCATCTTT GTCAGGAACG AT INFORMATION FOR SEQ ID NO: 134: SEQUENCE CHARACTERISTICS: LENGTH: 12665 base pairs TYPE: nucleic-acid.
STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3502 CGATTGATTT TTTTAAAGCG TTCGATAGAG AATGAGAAAC AAAGAATTTG GAGTTGAGAA TCTTGCTTGG TTGAGGCAAT TTAGTCTTGC TTATTTTT'rC ATrTGGACAG TGAAGCTTAT AGGACAGTAA AACACCCTAA TTGTTGAGTC ATGCTTATGT TATCGACGAA TAGCTGAAGA ATAAAGAGAT AAATACAAAA GTTAAAGAAA AAATATAGAA CATTATTCAA T'ICGTAAATT ATGGGAAGTG TGGTTCATGC AATAGGGCAA ATGAAAGTCA TACAAAACGA TTAACTATGG GAATCC-TAG CAATGGCGGG CTCATATTGT TrTTTrATCTC
GGTGCACAAG
TATGCTGATG
GCTTGTCAAT
TTACTTTTTA
GACTr'rrGTT
GGAAAAGCTA
TTCGATTTAT
GGAAATAAAC
TAGTGTTGGA
GACAGAGAAC
GGCAGAACAA
ACAATTTTTG ATGGCATGGG CATGGTTGGT I'TGATGTTGG TGAT'rCACTT GTTGGGAGAT AATCACAAAT ATGTAGATCA TATCTTGTTT AATATTCTTC CTGAGT-rGAT TGGCTTGACC TTAGTTTTTC CAG TTTATGC AGTTATTTTG TTACATGAAG TTATAATCCC AAATGGAAGC ATACAGTTCA TATTGAAGTG ATATAGTAAG
ATGTTTGCAT
GTAGCTAGTG
GAGGGAGCTA
GGAGAACAAC
CAAAAAGCGA AAGAAAAGTA TAG-TGTTGC CAGTCTTGTT CCCAAGTACC CACTTCTTCT CTAAAAAACT CGATTCAGAA 909 CGAGATAAGG CAAGGAAAGA GGTCG-AGGALA TATr.TAAAAA AAATAGTGGG TGAGAGCTAT GCAAAATCAA CTAAAAAGCG ACATACAATT ACTCTA~CrC TAGTTAACGA GTTCAACAAC ATTAAGAC ACTATTTGAA TAAAATAGTr GAATCAACCT CAGAAAGCCA ACTACACATA CTGATGATGC AGAGTCGATC AAAAGTAGAT GALAGCTG=G CTAACTTTGA AAAGGACTCA TCTTCTTCGT CAAGTTCAGA CTCTTCCACT AAACCGGAAG CTTCAGATAC AI.CAAGCCA AACAAGCCGA CAGAACCAGG AGAAAAGGTA GCAGAAGCTA AGAAGAAGGT TGAACAAGCT GAGAAAAAAG CCAAGGATCA AAAAGAAGAA GATCGTCGTA ACTACCCAAC CAT1TACI'TAC AAAACGCr'rG AACT'TGAAAT TCCTGCTCC GATGTGGAAG rAAAAAAGC GGAGCTTGAA CTAGTAAAAG TGAkAAGCTAA CGAACCTCGA GACCAGCAAA AAATTAAGCA AGCAGAAGCG GAAGTTGAGA GTAAACAAGC TGAGCCTACA AGGTTAAAAA AAATICAAGAC AGATCGTGAA GAAGCAGAAG AAGAAGCTAA ACGAAGAGCA GATGCTAAAG ACCAAGGTAA ACCAAAGCGG CCGGCAAAAC GAGGAGTTCC 'rGGAGAGCTA GCAACACCTC ATAAAAAAGA AAA'rGATGCG AAGTCTrAc ATTCTAGCGT AGGTGA-AGAA ACTCTTCCAA GCCCATCCCT GAAAcC-AcAA .1 0
AAAAAGCTAG CAGA.AGC'rGA GAAGAAGGT AAAGA-AGAAG ATCGCCGTAA CTACCCAACC GCTGAGTCCG ATGTGGAAGT TAAAAAAGCG GAACCTCGAA ACGAGGAAAA AGTTAAGCAA GAGGCTACAA GGTTAGAAAA AATCAAGACA CGAAAAGCAG CAGAAGAAGA TAALAGTTAA.A GA.AGAAGCTA AG-AAAAAAGC CGAGGATCAA AATACTrACA AAACGCTTGA ACTrGAAATT 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 GAGCTTGALAC TAGTAAAAGA GCAAAAGCGG AAGT'flGAGAG GATCGTAAAA AAGCAGAAGA GA.AAAACCAG CTGAACAACC CCCGCTCCAA AAGCACAAA.A CCAAAAGCAG AAAAACCAGC GAAGAATATA ATCGCrTTAC TCTACTCCAA AAACAGGCTG GGTTCAATGG CGACAGGATG CGGCGCTATGG CGACAGGATG GGT'rCAATGG CAACACATG CGTTCAATGG CGACAGGATG ACCAGCTCCA GCTCCAAAAC CAGAGAATCC TGATCAACAA GCTGAAGAAG ACTATGCTCG TCAACAGCA.A CCCCCAAAAA CTGAAAAACC GAAACAAGAA AACGGTATGT GGTAC?.TCTA GCTCCAAAAC AATGCCTCAT GGTACTACCT
GGAAGCTAAG
TAAAAAACCT
AGAAGCTAAA
ACAACCAGCG
AGCTGAACAA
TAGATCAGAA
AGCACAACCA
CAATACTGAT
CAACACCAAT
GCTCCAAAAC AATGTTCAT GTACTATCT AAACGCTAAT GCTCCAAAAC AATr.CTTCAT GGTACTACCT AAACGCTAAT GCTCCAATAC AATGGCTCAT GGTACT1ACCT AAACCCTAAT GGTTCAATGG CGACAGGA'rG GCTCCAATAC AATGCTCAT GCTACTACCT AAACCCTAAT GGTGATATGG CGACAGGTTG GGTGAAACAT CGAGATACCT GGTACTATCT TGAAGCATCA GGTGCTATGA AAGCAAGCCA ATGGTTCAAA GTATCAGATA TCAGGTCCC TTGcAGTcA.A cACAAcrTrA GATGccTATG AATCCTACTA T=rc.LATGC GAGTCAATGC CAATGGTGAA
TGCGTAAACT
TATTCCCTAC
TAAAAAATAA
AAACCTAATA
AAATACCA'rA
CTCTGTAATC
TAACTA=?A
TCCTTTCAGT
TCTAGCCGGA
CATTCAAGAG
AAGAGGG
GCGCCAGATG
ATACTGACrr CCT.GTAAGAA CTCNTTA.AAG AGATAATATA CCC-.TGTAGG AAGTTTAGAT 'rTTATAGCCC TA6*ACTAC GGAGTT~TTrr GCTCTT3'AAG AGAGTTACCG GTTTTAAACr CAATCTCTGC CAC-TGCTG GCTTGCGAAA TTCCACGGAG ATAGrGAGGA GCGAGACCGC 'rGATGAGGAA AGAATGGCCG ATITAAGCCTT CTCCAATTGC TGGCTCCACG GAGTTTGGCA GGAATTCACG AACTGCGACG AGGCGATCTT CATCTrTC1 TT7'rCTCCT'r TGAGGTTAAT CAATCCTTTC AACTGTTCGT TCAAAGGTCA AATCAGGTAG GA=TCCT GTTTCAAAGT AATGGTTGAT T'rGGTTGAAG AGGTAAGCAT T-rCCCATCCC AGCTCGC;CCA CGTCAGCACC AACTTCTTCG ATGCCTTGCT TGGCTTCTrG GACAGTACCG TGGCGATCAA TGGAA'rCTG GTTAGAGCTT GCGCAACCTT GTAAAGGGTC CGTGGCC-AGT ATACATTTGT TC-ACGGGTAC GGCCATGCAT GGCG.AGGGCA
ATCATCACTG
ATATCACCGT
TCAAGGTCTG
GAAACACCTG
CCGGTACGCA
GAGTAAATCT
CAGCTTCAGC
'rTTTGACAGT
TGTCT=ATC
AGCCCATGTT
AGCGAGAGCA TTTTCTACTG CAAGAGATCG CTCCGCCCAG AAGTGGGATA TCAAGGACAG ACTGGACCTT GTPG-ATGATG 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 CTTGAGCCAC ATACACCAG CTTCGTTCTT CACGATI=~C TTGACAGGGC GATATCGACG ATATC GGTCT TGGTG7rTTC TTGGATCAAT TCTGCTGCGC GTCCTAGGCT GTCTTCATCG CTACCAAAAA GTTGGA'rAGA GACAGGGTTT TCGCCCTCAT CGATATGAAG CATGTGCAGG GrrTTTCc'r TGTTGTATTG GATTCCCTI'G TCAGAGACCA TTTCCATTAC AACGAG1'CCA GCTCCGAGCT CCTTTGCGAT AGTACGAAAG GCTGAGTtGG TCACGCCAGC CATAGGCGCT AAAACCGTAC GATTGGGA.AT CTCAATAT'rG CCAATCATAA AAGGTGTATT AAGATTTGTC ACGAATGAGT TCCTCCAGGT CCT-.rTCATC AAAGTTGTAA GTAGN'I'GGC AGAATTGACA AGTGATTTCT GCCCCGTGGT CTTCCTCTTT CATI'CCTGT
AAGTCTGAGC
TGGAAACGGA
AGGAGGGCTT
ATGCGTTTTT
AAACCACCTG
'rTGGAAGCCT GGCAAGAGCG TTCTTCTrC AGAAAGACGC CGATATGGTC GTCG.CITICG eAAAGCGAGC AATCTCTTCT CAACCTTGAC CTTGTCTTCC TTCATAAAGC GTTCATGGCT ACAGTCACAT T1'GTAGGCTT CGTCCCCGTA GATAGCCTTG AGAAGAGTAG AGATAGCTGG CATrrrCTTGG TTCTTGGCTC CTGGCAAGAC TTGAACTAGG TCGTCCAAAA GGACATTGAG GCCGACCGCT GAAGGCGTTT GTTGGCTTTC AGTAAGGTAA AAGGCAAGGT C?1'CACCGAT TCTCCAGAG 911 A'rGAGCGGGAG TTATAGAGTr GTAAGGATTT CCACGTACCGT AGTCTGTGAT AACGAGGAA'r 4380 TGACCATTTC CAACAAAACG TCCCACTAGG ACT'rCACCAG ACACCACGA'r T7TGAACATA GCCc1-rGACG rrCCCCrGG GCACCTAGAG AGC"TAGATCC CGCr-GAGAA TcTcGcrAGC G'rGA'rG'rr T?1'cTTGAGc AAG~c'CCGC 'TrrCGATAT AAATGCCCAA AGGGGGAGCC GCATGAAAAT AAAAAGAGAA AGGTAGAAAA TG.AAACCCAT TGG'rCAATTT TCAATAC?1'T AATGGAAGAA ACZAGGA~T CAACAC=A ACTGCTAAGr GATAAGAGTT CGACCAAGCG AGTGCGGCG GI-rTCAGTGC AcITTTAATA A'1rTATCCA CGTrGTTTAC TGATTTTCAG ACAGA'rrATT TTACCATTTG TCGCAGCTT rrTGATGTcA TATCAGCGAC GGTGATAATA TGGTArrrCC TTTTTCATrG CTrACAGTTGA GCTAGCTTGG TATCAAGGAC AAAAGC-ACGA TACCTACTAT TTTAGCA'rAA GATAATGGAC CAGGAAATCA TCAGAT'rAT GCTATGCTTA AACAAATGTA TTTAGGAGAT TTGATCGAGA AAGCCGAGTG CCTCTATT ACAAGAGTCT CAGACGACCG TrCAAGCrG'r CAAAAGCAAC CCTAACCAAA TATGTCACCC TGCTCAATGA 0 0. **0 0000 0 CAAGGCTTG GATAGTGGCT TAGAGCTGGC TATTCACTCA GTCTATCGGT GCAGCTACCA AGGGGAGAGA 'rA'TCGGAGC TAAATACCAG ATTTGT'rT ATCT'rCTC'rA CCACCAACAC TCAAGAATTC; 6TGATrACG AGGCTACGCT TGGTCGTCAC TTTGTCAGAA TTGA'rrAT CCATCCAAAA TGGCCCTrG TCACTATTTC TAT'rTCTGTC TTTTCCGAAA GGTCTCGTCG CATGCAGAAA CCAGAGAGAA AACAGGAGAT TGCCAAT'rTA TTTGTCTGCG GGGCAGAAAT TGGACTTGGT TCTICTGGGCT TCCGCTCAAT GCTTGCcACT TTC2LAGTCAT AGAAGAGAA.A- TATCT'rAT CTTCGTTTGC TGAGAAACCT 'rCCCTCCTTT CAAGATCAAA ATCTGCGTCT 'r=rrGG ACAGTGCTGT rTr-rAGCCCC ATCAGCTGGC TTGGCTGTT TAAATCAGAT CCAGGTCCAC AGCATCAGAT ACTCAGGAAT GGGAAGGTCA GAGGAAATCT GCCTGCAAG CACATCAGTC AACAACGTCT ATCCGAGGGT ATITrGACAA TTTGCTGGGC AACATATTCC 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 AAcrACAGTT aAGGATGG.TG Ar.ATGATGAT. 
ATTCTTCTCT TTTCTCCTAT CTCATCGCAT TCTTCCTCTT CATACrATGG AGTATATTCT TGGTT'rTGGA GGC;CA G CAGATTTACT r-ACGCAAT'rC ATITCAAGAAA TGAAGAAGGA GGAACTATTG GGGGATTATA CAGCGGACCA TGTCACCTAT GAACTCAGTC AGCTTTGITGC TCAAGTCTAT CTCTATAAGG GCTATATT ACAGGATCGC TACAAGTACC AGTTAGAGAA TCGTCATCCA TATTTACrGA TGGAACA'rCA TETTTAAAGAG ACAGCAGAGG AGATrTTTTCA TGCTCTACCT GCT'IrCAAC AGGGGACAGA TTAGATAAG AAGA'rTCTCT GGGAATGGCT CCAGTTAATC GAATATATGG CTGAAAACG TGGCCAGCAT ATGCGGATTG
GGCAGCCATT
CCCTAGTCCG
ACCAGTCTAT
TTGAAACG.GT
CATTATGATT
TATTTAAAAA
912 GTCTGGATTr GACATCGG ATTTGGAATA CAATCGT? TGCTGGTTAC CAA'rAACCCG ATGACTNGGA TATGGAGGAT GGTTAATCCA GGTCTTrTTTT AAAAAAGTTG ATAAACAAGA CACCTCAAGA TCTTTATC-AG TGACCAGGGA ACTACAAGTI' CTCGAGTCAA AAAGAGTTTA ATTATTCACT TAAAAGGC= TC-ACAT'r= TTAAA.AA'rTA TAGTAAGTGT AAAGAGGAAA GAAAAATACA TCATGGCCAT AAAAAAGGG AAALAGGTTAG TT-CrTGTCT TI'TCAAGGAT ATTACCATTG AAGCTTATGA ATTCATAAGA AGGAACAGAC TTGGTAGCGA TTCGCC:AGTT GTGAAATrCA CACAATCTCC AAGCCrTTA T'TrTGTATAC GAGGACAGTA CATGTcAcAA CTCGTCCAT CATTTTCAAC CCCAGATTTT CCCTCAGGCA TTCAGTCAGT TATTGCGGGT CAATCGGGAT TACCAACCAA GGT'rGGGTTG AGCACAATGC CAATGAAA'r TGGAACTCTG GCTTrCATCG AAAGTGGTGT CAAGCCAAAT CAAATCGACG
CGTGAAACA.A CGGTTGTCTG TGGCAGTCZAC GCCAGACAGC GGATAAGAAA ACAGGACTrC CTA'rCTACAA TGCTATCGTr ACCTTTGGCT GAGCAACTAA AA.AGCCA.AG TTrATGTGGAA AAATTCCATG AAAAGACTGG TTTGATTATT GATGCT'rACT TC'TCTG;CTAC CAAG(3TTCGT TG;GATTTTGG ATCATGTAGA AGGTGCTCAA GAGCGAGCAG A-AAAAGGGGA AT'rGCTCTTT 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440' 7500 7560 7620 7680 7740 7800 7860 GGTACThTCG ATACTTGGTT CGG-wrGGAAA TTGACTrGACS GTGCGGCTCA TACTCAAATG CAGCTCGTAC CATGCTT'rAT AACATTAAAG AACTCA.AATG ATTTTGGAAA TCCTTAACAT TCCGAAGgCT ATACTTCCAG AAGTTCGTTC ATCTACGGCA AGACAGCTCC ATTCCA'NTC GCTGGGCACC AACAAGCAGC CCTCTTTGGA AATACTTATG GAACAGGCTC T'rrCALTCATC GAA.AACAACC TCTTGACAAC CATTGGTTAC GAAGGTTCTA TCTTCATCGC AGGAAG1TGCT GTTAAAATT CACCAGAATC TGAAAAATAC T ATGTCGTTC CAGCCTTTAC AGGTCTAGGC TCCGTCI'TTG GTTTGACTCG TGGAACAAGC TCTATITGCTI' ATCAAGTGCG TGATATCATC A'rTCAAGrAC TCAAGCTGGA TGGTGGTGCA GCGGATATTT TAGGCATTGA CATTGCACCT GCGGCCTTCC TAGC-AGGTTT GTCAGTAGGG TACGGTGGAG AGGTGCCAAT CAGTTGGCTT 'rTGAGCCAGG ATGAATACTCG GGAAGACAT
CGTGACTGAC
GGATCATGAG
'rAACTCCGAA
CTCAGGTATC
TATGGT'rAAG
GCAGTTGTCT
TTATGCCTTG
TCTCGCATG
CGATGAAGTT
TGCTCGTGGT
GGAATCAACG
ATTCAGTGGC
GCTCGTGAT
GCTCCATACT
AAAGAAGACT
GACACCATC
GCCATGAACA
GCTAAAAACC
GTAAGGTTA
TTCGTG.ACGG
CTCACAACAA
GGAACCAAAA
TTATCAAGGC GACTTTGCAA AAGTGGATAC TCAGACCGCC ACTTCCTCAT GCAGTTCCAG T=~AAArAAC ACCTCTAGGA TACTGCAAAG ACTTCGACGA GTTGAAACTC 913 TTGAACGAGA CAGGAGAACT CTr'rGAGCCA TCTATGAACG. AATCTCGCAA CGAACAACTc TACAACGGCT GGAAGAAGGC TGTGAAAGCA ACTCAAGTCr TTGCGGAAGT AGACGACTAA TACTGGCAGA ATAAAGCGAT TTATTTAGAA AGTGTGTAAA TATGGAA~rT TCAAAGAAAA CACGTGAATT GTCAATTAAA AAAATGCAGG AACGTACCCT GGACCTC~rG. ATTATCGGM GAGGAATCAC AGCAGCTGar GTACCCTTG.C AGGCGGCAGC TAGCGGTCTT GAGAC'TGGTT TGATTGAAAT CCAAGACTTT GCAGAAGGAA CATCTAGTCG TTCAACAAAA 'rTGGTTCACCG CAGGACrrCC TTACCTCAAA CAA~rrGACG TALGAAGTGGr CTC-ALATACC GTTCTGAAC GTCCAGTGGT TCAACAAATC GCTCCACACA TTCCAAAATC ACATCCAATG C.PCTACC-AG T1-rACGA'rGA AGATGGAGCA ACCTTTAGCC TCTTCCGTCT TAAAGTAGCC ATGGACTTGT ACCACCTCT'r GGCAGG'rcTr AGCAACACAC CAGCTGCGAA CAAGG-rI-rG AGCAAGGATC AAGTCTTGGA ACCCCAGCCA AACTTGAAGA C 4 4*4 .4 4 4 4 44 4* 0 4 4 *4 4 0 4 f- .4 .4 (4
4 44, C *s.4 44 4 4 TTGAC?1'CCG rAAC.AACGAT GcCC'rCTCG ACGGcCCr CATTGCCAAC CACGTGAAGG AGATTrACAGG TGTrGTAGCT CGTrGATCTCT GTCTGGTTAT TAA'rACAACA GGTCCTTGGA GAACCCAATr CTCACAAATG CGCCCAACTA AAATCAAGGT TTCACAGCCA GTTTACTTCG r'rGTTC'rCCC ACGTCAAAAC AAGACTTACT ATTTGGAGCA TCCAAAAGTA ACTCAAGAAG AGGAAGGCTT GGTACGAGGT GGAGTGTATC TGArTGAAAA CATCA.AACC'r GCCAACCAAG CAGAAGGCTT CCrC--rGAC GAAAGTGGCA 'rGACAGACCA AGrGrMrGAA ATCAAGGCCC GTGATAAAGT ACCTAA'IrM TCTAATAAGG AGGGAGTrCA CTTGGTAGTA GATTCAAGCA ACACAGGTTT GGGTGACCGT CGTATGGTCT TTGGTACAAC TGATACAGAC TACACAGGTG ATGTAGATTA TCTAcTTGGcC ATTGTCAACA 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 ACCGCTTCCC AGAATCCAAC ATCACCATTG ATGATATCGA GTCCATTGAT TGCAGGGAAC AGTGCCTCTG ACTATAATGG GTGA'rGAAAG C~rrGACAAC TTGAT-lGCGA CTGTTGAATC CACGTGAAGA TGTTGAGTCT GCTGTCAGCA AGCNTGAAAG TGGATCCATC TC-AGTT'rCT CGTGGGTCTA'GCTTGGACCC CTC?1'GCTGG TCOGTAAAATC ACAGACTACC GTAAGATGcr- TGGTTGACAT CCTCAAAGCA GAATTTGACC GTAGCTTTAA ACCCTGTTTC AGGTGGAGAA TTGAACCCAG CAAATGTGGA CGCAACrTTG ACTATCACGr GGTTTGGATA GCAAGGAAGC ACGGTTCAAA TGCACCGAAA GTCTTTGCAC TTGCTCACAG AAGCAGC'rGG GCAGGTCTrC TGGAAATAAC GGTACCATCA TTATCTCTCC AAAGAAAAAA TAGCACATOT GAGAAACATr TGATGACAAT GGTCTCTTGA TGAAGGAGCT ATGGAGCGCG ATTGATCAAT TCTAAAACTr
TTCAGAAATC
TCACTATCTG
CTTGGAACAA
GAAGCCTTTG
GCAAATCTTT
GCGCCACGAC
TCAGCTTGGC AGATACrTTG TCCCTTCACT ATGC?.ATGCC CAGTTGACTr CCTTCTTCG;T CGTACCAATC ACATGCTCTT
GTATCGTTGA
AAGCAACTTA
GCCAATTTG GATGAAATGG GACCGATTCrA CCGTGCTGAT CTCGAAGCAG CrCTCGCTAA AAAATTAAGA AAAAATAAAA GAGTGGAGG GCAGCATTCC TTAATGCAGA C.AGAAAGATG ATGAATGAAT TATTTrGGAGA TGATTCTTCT AGGAAATGGT GTTGTTCCAG GTGTGCTrcT GCTCAGGTTG GATTGTGATT ACTATGGGTT GGGGGA-rGC TATCTGGCAA GCTCAGTCCA GC-TTA'r'I-AA ACCCAGCTGT CAA'rGAGTTG ACTCTTAGCC TATGCGTGAT AGCTTGGATA TGACTGGACA GAACAAGAAA CAACGA'rTTA CCAGAATTAA TTCTCGCCCG TCCrCTTCT ATTTCTAGGG ACTTTAATCC TCCTAAAACC AAGACCAATA AGT'rGCGGTT GCAGTrCTTTGr GACCATCGGT GTGGCCTTAA AGCCCACTTC GCAGGGGCCA CTATGAGGCA CAAGAAAA'rG CAAQCATACT GTATCAAACT AATCTr'rGCT TTGGGTCTTT T1'rGATTGTC GGTATCGGTC AAGGTGGTTT CCC=rGCGCT TGCTGGrCA GATTTTr-GG'rr CAGGCAATAT CC1rGGCAACC TGATTAGCGA AATCCTTGGA ACCACTTTCA GGCAGGTATC TATCACTAGG TGGGACAACA TCATGCACAC CATC-TTGCCA TTCCTGTTGT AGGCCCTGT AGTI'TATACT CTTCGAAAAT TACAGCTTGC GGCTAGCTTC TTATGTTGAT AGACCTTGGG3 TCCGTTTTGC CTTATATCTT TGGTTGCAAT TCAAACCTCA TTCAGTACTG GACCAGCCAPT AC=---GTTT- TrGTGrrCAC GGAACCTTTG CAGTCGGAAC GGTTATGCCT TGAACCCAGC ATTCCAAACA AGGGAGACG ATCGGAGCAG CCTTGGCAGT CAAATTCAAA CC-ACGTCAGC CTAGTTTGCT CTTTGATTTr CAAGAGCCCA ATTTCAGCAA
TCGTGACCTT
AGACTGGTCT
CCTGTATTC
GTCGCCTTAC
CA7rGACTAT
AAAATGAAGT
GTCAACTACA
CGACCTCGTA
TACGCT'rGCA
TCACTTTTCT
CGTACTCAAG
'rAGAAAACAA
AAATCTTCTC
GTCAGAAAGC
9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 ATAATAAAAC GCATCATATC AAGCACGAAA ATTCCACGAG TGAACAACAA GCCAAALACGC CCAAAAAAGG CGGCAAAAAG GCCGAAATGG 'rCAAATCCTG ATTATGTCAA CGAATTAGAC AGTAGAATTTI C-ACAAGTCAC AAGGCACTTT GGAAACTCCC CAAAAACCT GAAGAAPLTcr AGCAAAGGAG A~cTG.AGCTT TTTGAACCGC 'ITGAACAAAT AGAGTTCGC AAGTATTATG CAAGCACCTG CAAGCAACGT CCAAAAATCG 'rTGATATGCT GAGCGCAAG CAGAAATCC GAGGGTAAAA AACAAGAGCT C~rACAAATT ACTI'GAGCAA TTAAcTAAAA TATAAAcccT Gcc'1'TTATAT CTAGrcAGGG. TTTATATTTT AGAAATrcAc CGrAGGT'GTT KtGGTrrI'TA cATAccr-AGT ATAGTITGAG -TTTcTATAcT ATrAGTcAT AAACTTCCAT TTTCTTTGAG CAACATGGAT ATAAGTACTT GTTATGTAGT ATGGATATGG GCTTTGTGAA TCCAAGTAAG ACTGATAAGC TTGTATACCA AAATATCCTC CACCAATTAT 915 TGCACCCCAT GGACCCCCCA ATAAAGCACC TATCCTACCA ATCATATAAC TGATTCCAGC ACCAL2TCATG AAGTT1AGCGA ATGTGTTAGC TTGTTFAT'rC CCATGTATTG TGTTGACGTA A'rTCCAAACA '1-AGGA'PCGT ATGATCTAAA AGATATATT AGGTCGATrT
ATAAGCCATA
TACACTTCCA
TAAAATGCCC CATTGATA'rA
TCTGGATTGA CAACCTCAAG
GACGCCGTCA GCACGTCGTT AACTTCATCG CI'AAAATAT AAACTGAGCC 'rCACCAGATA
CATTCTTTTG
CAATAGTGTC
TTACTTGCGT
CAACTTTAGA
ATCTCCGAAC CGCACTGATG AGCCATTCrC GTTTGCCGAT AAGCrATCAT CAGCAAAAAC AGAAAACAGA, CATAACTACC AAACACATGC CTTTGTAPTT TTGATTTTr TCAACTTTTA GGTAACAATA TCCTTAATGC GAAAATTTTT ACTACAGGCT TGTTGTTG'rr GAAAAGAGGT
AAACAAGCGA
ATTAAACAT
CGGGGAAATC CTAGACATAC CT'rAGACATA ACGGAA.ACTC TTATACAATA AAACCA.AATA TATATATTTT TATGTTTGAT CTCGAA.ATGG GTTATTTAGA
AAAAGAAAGC
CGTTATCGAA
CACAGAAGCT
11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12665 ATTATCCTCG CAGTTTTTTC ATT'rGCTTTT TACAACCTAT GTTCATTCGC TrGGGTCTGC TCTACAATAA AAAACAATAA AAAATAAATA GACGTATTTT CAAAAAAAAC inaAATGCATA TrTATATrAG CAAAACGACG ATTTAAATCG TCGTTTTTTT GTAGTACGAC GGGCATGTCG TATATCTGAG GTGTAAGTCC TCAGCCTGAC TATCGTGAGG TAGCAGGGAG AGGAAGGGAT AGCGAAATCG TGGCTCTACG AACAGGAACG TGATAGTAAG GCbTATATAG CGGATAAGGA GGCTTCAAAC TCTAA.AGTCC AAAAAGGTAG TCGTAACCTA. TATGTGTAAA. TCACGAGAGT AATTGAAT'rC GGACTAAGGT TTGTGTGAAA. AAGATAAATC TTTCTAGAGT CTAAAGACTC TGCGTCAGAT T'rCCTATTTT CACTGTAACC T'rTTAACCTC C'TCATATCTT GTATAAACGA GGAAAGATGT ACGACTTATC CCGTGAGGTT TCATGAGCGT GAA.AGCGTAG TAACAACGAA TCATGAGAAG TCAGCCGAGC CCATAGTAGT GAGGAAACTT CCGTA.ATGGA AGTGGAGCGA
AGGGG
INFORMATION FOR SEQ ID NO: 135: SEQUENCE CHARACTERISTICS: LENGTH: 5305 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135: CGCTAATCAC TACAATCATT TTATTGTACT TTT-TCACTCT CAAGAAAAGC AAGAAGTATT CATTT'rAGTT TCATrTAGTA 'rrATTTGCA TACCTAAAAT ACAGTAAAAA ATCACTCATC TTCGTATGCT CCI'GCTTTCA CTATTCAACA CCrTTGAC TATACTAGG CTCA'N-rCCA AAAGCATTAT ATAATAGTGA TATGAAACCA ACTAAACTAA ACAACAA6ATA TAAGCAATAA AAATTCGTTT AAAAGATCN' GAAGrwrATrT TCAAACAACC AA'rTAATGGG GTTATAATAA ALGATGAGTAT TAAAACTATA CTCCCGTAGA GCGTGATTTA
ACTAAAGCTA
TAAAATACTG
ATAGCTGATA
AGGAGGACGA
ACACA6AGGTC ATACTAAATA AAAATAAAAG AGTAAACrAG ATTTT-CGCCr GAAGATAATA CTGGAGTGCA GCTTGTG GI I I GGATT AGr.TGCrAA AAATITAAAA
TTTAAGAGT
TTAAAArrAG
AGACTATTGG
'rTGCAGATG AGATr-rTATT TTTAATAGAG GCGGGAAAAT TrAGGGAAA ACCCTAGACC ATTCTCAATT ACTAGACGAA TCTGAAAGTT TCTATATCTG CA'PTGCTT'rC GCACCAAGCC I L'T AATTC TAATATGCTG TTTrrCAATC GTGCCCGTTA TCTGGGAGTG TTGGGTTGTT ~T=CCTAAC CTTGGCTATT TCACATAG AAATTTATCA GCACAATCCT AACTGGCAGA GGCTGTCGGG GTGACACGCC ACAATCCCAG TCTCTCCCTC TGCCAG'TCTA AACTATTrrG GG-AGGAAGA6A GATCAAAAAT AGAGAAGAAC AACTGTTCAA 'rAAAGCGGGC TCGCTCCTAT CTTATATCAT TTCAGTATTA CTAATCGTTA TCATCATAGG GACATTTTAC ACCTACTATG CTCGTTT'rA TTTTACCATT ACGGC'rCTTr TGATGTTGCA CAATTATCAA GAATTTTA AATACCTGTC TGCTTGGGTC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 ATTACTTATA TCATTTACCT TCCGTGGATC TrTATTGQCA ATCL'GGTCT TAAGAGCTAT GGCGAATGGG CTCAGAAAAA ATTTGAACAA GATATGGATG AATTGGAGAG TGGAGAATAG CTTGT'rACTC TTTTCTCAAT CCAGCTAAAA TGTGATATAA TAGTACTAAT 'rTAT'rGGAAT ACATGAAAGT TCTTGAAAAT TTTCATGGGT TCTAG3CTAA GGAAGTAGGA AAAGTATGTA TCCACATGAT AGTr'rGACA6T TGCACACGGA L-TTGTACCAG ATCA.ACA'rGA TGCAGGTTA CTTTGACCAA GGGA'rTCACA ATAAGAAGGC GGTCTTTGAG GTGTATrrCC GCCAACAGCC TTTAAGAAC GGCTATGCGG CTTGCGTTrTT TCAGATAGTG CTTGGATC CTTCGCAATT T'rTGTTTTT GCTAATGAAC GGTCGAAACG GCTCrTTTGA
TTTTTGCAGG
ATATAGCCTA
'rCAAG3TGGA
CGATTGTGCA
ACATCGTCAA
TTAGAAAGA ATTGTGAACT TTTGGAGTCG CTTGGTTATC G1'TGACCG'rr CGCrTrTCCCC GGTGGAAGGA CCTCTAGCCC CTACCAGACT ?TGGTGGCGA
ATCTGAAGA
ATGGGGCGTT
AACAAGGGGA
AATGTCAGTT
CGAAGGCAGC
TCCTATTCGT TCGGTTATCG AAGATGAACC CTT0ATGA AGAAATGGAT GCGGCCATCT GGGGAACACG CGCAGCTGTG TT'rGGGACAC GTcfGGGCTcA ATTGGTGGCG CCAATGGAAC CAGCAACGTG CGTGCGGGTA AGCTCTTTGA CATTCCTGTT TTGGGAACCC ATGCCCATGC 917 C7rCGACAG GTNATCGCA ATGAC1'ATCA AGCTTTCAAG GCTTACGCTC CGACCCACAA AAATTGTGTC TTTCI'rGTGG ATACCTATGA CACCCTTrGC ATCCGTGTAC CAGCTGCCAT TCAGGTGCCG CGTGACCTGG GTGA'rCAGAT TAACT~rATG GGTGTGCGGA TTGACrCTGG GGATATTGCC TACATITrCTA AGAAAGTCCG TCAGCAACTG GATGAGGCTG GATTTACAG.A GG='AAGATr TATCCT'?CTA ATGATCrAGA TGAAAA'rACC ATCC"-TAACC TC.AAGATIGCA AAAGGCCAAG A'rrGATGTCT CGGGGTGTGCG TACCAAGCTG A?1'ACAGCCT ATGACCAGCC GGCTC'-TGG GCCGG=ACA AGATTGTTGC AATCGAACAT GAAACTGGTC AGATGCGCAA TACGATTAAG CTGTCTAA'rA ATGCTGAAAA AGTTTCTACG CCAGCTAAGA ACCAGGTGTG GCGCATTACC AGTCGTGAAA AAG4GCAAGTC AGAACGCGAC TATATCACTT ATGATGGTGT CGATA'IrAGC
GACGGTTCGT
AGTTTACAAC
GTTGTGGGAT
GACATGACAG AAATCAAGAT GTTCCATCCC AATTTTGATG- CCGTC=.r~ CNTGGTGGAT TTGCCTAGTT TGACTGACAT TCAGGATTAT GAGTATAAGC GTGTGCTCAA TCCGCAGCAC ACC'rATACAT ACA'rCAAGAA ATC?1'CAAAG
GCCCCTAAGG
TATCCAGTGG
CGCA.AGGAAG
TCATGTATGG CAAGATAAGA TGGACTTGAT 'rCATAAGATC
A.AGCGAATATT
AATTGACAA
ATTTGGCGCG
CCCTTGGTGA
CAAACCAGTG
TCTGAAAAAA
AACCTTGGC-A
CCATAGCTAC
1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2540 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 AGGAGAAGA.A GAATGAGTTT ATTGATGCCC AGGAAGAAAT CATCCCTTCC TAAAAACCTT GGACGTTTGG CGCAArrAGC AAATTTATCG CTGTCCGCCT GCCCTAGCCT TCATCCACCC GCCATGAC-AG CTGCAGTTGA ATCA.AGGCAC GTTGCCGTAT GTCATTGGAA CAGACCACGC GGCGGTGCGG A'rATTCTCCC CAGAAACTTG GCGCAGAGCC GATAAACCAG GCCTAGCTGA TACCTAGAAG GCAAAACAAT GCAAGAAACG ATTATCCAAG AGCTGGGTGT CCGTCGTTCT ATTGArMCm TAAAAAcGA'rA TGTACTAGGG ATTTCTGGGG GACAAGACTC TATGGAAGAA CTGCGAGCTG AAACG;GGAGA GCCATACGGA GTGCAAGC:TG AGATGTCAGC TTGCT'rGTGA ATGA.AGCAGA TGCTCAAAAA ATATCAAGGA ATCAGCTGAT AGCCACAGGT AGTCCTGTTT CAGACTTCAA CAAGCGGAAT GATTCCTCAG TATGCCCTTG CTGGTTCCCA TAGCCGCG CGCGGAAAAT ATC.ACAGGTrT TCTTTACCAA GTTTGGTGAC TCTTTACCGC CTCAA'rAAAC GCCAAGGAAA ACAGCTCTTG AGCCCTTTIA'r GAAAAALATCC CAACGGCAGA CCTAGAAGAA CGAAGTCGCA CI'GGAGTCA CCTACGCAGA GA~rGACGAC CAGCCCAGALA GCTCAAGCCA CCAr'rGAAAA CTGGTCGCAC AAAGGCCAAC ACAAACGCCA CTTACCCATC ACCGTATTTG ATGACTTTTG GGAGTAAAA.A GGTCCGGGGG ACCTTTTT'AG CTTCTTGCCC TGAAATrrAAA AAGCAAGAAA AACCTCCACT 918 GGAGGTTTrC AGCCTCTCAT CTTraAATAA GAAAGTGAGA GAAGCrCrCTG GGGATCTTGA ACCCCGAGTT TAGAAATAAG AAAATGAGGC AGATTCAGTA ACTCGAAGAG TTCGATTTCA TCOTCTTACC CCTGCAACGA 'rGACTAGGTT 'rGAAAAAGCC TGCTAGAGCG CATTT-CAAAc CAGGCAGCAA CTGCGTCAAG AAATTAGAAG ACAAACTCCT TTTCTAGCTG TrACTGAGTT GAGCQTT-r ACTACCAGTA 'rAGAAATAAG GAAGTGAGGT AGCATCATGA AATCTATCCG TACGCAAATA 'rTACAGACAG AACG1rTTGAT TTAAGAAGA TTGTGGAGA ACCCA'rGTTr CAAAATrGG CTTCATCCGC TGAGAATCTG ACCTATGTTA
GTGATGCAGA
CCTGGGATCC
CCATCC'rCAT GTCGAAATCA C'rCGAAACTC TCTCAACTAT TATAAATGGG CCATTrGTCT TATCAGCATT GTTAAGATAG ACGAGGCTGA CAAGGCTTAC 'rGGGGAAATG GTATGATGAC TTTTACTCAA GCAGGTTTTC AAAAGGTCAG GATrTGCAAT TGGTGCTT CCrATACTAA AAAACAAAAC CCAGAGCAAG TAATAGGAGA ?ITTAAGCTCTI GAAAT'rCGCT ATGTGTrAGG AGAGACTTTG AAAGCTATCT TGGACTTTTG AGCACCTTAT GCCACTCTCA ACCCAGCTTC CTATCTACAA ACCATTGTTA ATGGTGTAGA TrATGGTATA AGTACGGAAG AATGTrGAAT TTTATTGTAA ATATAATAAT TAGCATTCA GATTAGTACA AGACAGATGT TCTAGTTCCT AGGTCGTGTC ATGGAAAAGG GAGAAAAGGC TATCT'rGCG TCTATI-rTCT GTTTCTATCG AGTTTATr'TG AAACTTrAAA TC7rI AATCT GGTTTAGTGT 'rAATCGACTA GATTCTCCAG CTGTTGTTAA AT'TTAAAATA TATTCATGTT TTTTCTATCA CTATTCCCAA ATCATTCATA ATGAGCCACT TTCTTCCTCC
CI'GGAATGTC
ATCTTATTrA
AAGTCAACTA
ATAGCATATr 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5305 TAGTTAAAAA ATCGCTTTAA GCTGTAACT AAGAGGGAGC CCCAACAGG7 GGTAATGTAC l'NrTTATAGT GTAATCCTAG GAATCCTCTA TCG.AGTTAGG GAATTAAATTL CAACCAATTT AATrATCTAA TATTAAAATA CCTCTC1'CAA CTAGATGTAA TCATGAGGTC AGTTT1TACTT CTCTCA'rTCT
CTTACAAAAC
TCTGCTGTTC
GATGAGAAAA
CCCTGACCTC
CAGTATCGTT
T'rTCCTCGCT AGATTTCCTC AAALAGGGCAG ACTCCTCCCT TGGTGCGTCA CACGATTTT TCATCTCGAC TGTTCTTTAA TGCATCATTA ACGACGCTTT TCTTCTAGGT GGTCATAAG *CAACAGGAAG ATTCAGGTTG ACT?11'CTAA TCCTAGAATA AAGTG;CTGAA AACAATTCGG AATAGCCATA GAGACTAGAC AATTTGAGGA GCTGCTTGCa* TCCTGTTCGA ACACATTTTC CCACCACGTG AAGAAAAAGA TGGCGGAAGC GTTTGATTGT TAAAG~TCG AAGTCACCTC CAGCTAGATG TTGAGAAAA AGATAGAGA'r 'GTACGCGAT ACAGCTCATC ATCATACGAA CTTCGTTTTT GA1'TAAGGTT GAACT
INFORMATION FOR SEQ ID NO: 136: (i) SEQUENCE CHARACTERISTICS: LENGTH: 3964 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:
TGGCAGCTCG TCGTCGTWA CAAAAATCAG TAGGAACTCG GCAAATGGAT TCGGAAGCGA CTAAAACATT GTTAGAAATC TTTTACTATA AAGTTGAAGT TCAGCTATrA CAATr-CCAGG ACTAAACTGA CTGCTGAAAA GATATAACTC AT'TAACAGA TCAGCATrAG ACGGAGCGAC CCAGATGGTr CAGTAGTGAC GAATCTGTAA CTCAAGAAGC AAGGGAGGCA ATACTGGAAG GGTGGATCAG CTCACACAGG GCTACTGAAA AAGAATCAGC GAAATCAAAG GCGCACCGCT GCAGAAAAAC AAGCAGCTCT GAACCAGAAA CGATTGGAGT GCTCCTAATG CTGCTCCTAA GATGTTACCT ACCAGTCACC GCAGCACTTG CTAGTCTTGG AAGACTAGAC GTAGAAAATA ATAAGGGGGA TTTrGAGAAG GATAATCACA tAcT-rTAGcT TCTAACTGTG GAATAGGAGA
GGACGCAAAG
AGTCTACTGA
TGT TACAGTA GATT'rGACTG
AGGTGGAGAT
AAATCAATTG
ATCAACTGTT
TGAAG.AAAA
AATCAATGTA
GATTCTAGGA
TACACCAGAG
CTCAGATGCT
TTCACAAAAC
TAA.AAATGCC
TTCTGATAAA
CAAAGAGATT
GCAAGCCATT
GACAACAAGT
TGCTGGCAAA
TCTAGTGGTG
GAACAGCTAG
TCATCAATCC
GAATAGGAAT
TGGGCAATAT
GGTACAGCAA
GTAGCACAAG
AAAGCACCTG
GT'rAAGGTTG
GCTGGAGATG
AAAGATACAG
TATAAGCTAG
AATGCGAATG
TCAGCTCAAT
ATTGAAAAAG
GAAAAAGCAG
GAAAATGCGA
CCCATGGTTA
GCACCCCAAG
CAATTACCTA
CAATCGTCTT TAAAGATGGT ATCCAAAAGC ACAAGATAC CTCAAAGAGT AGATGTAAAA CTATrTACA AGCAAATGGT GTACAGCAAC AATCACATTC TTCAACAATC TGCGAAAGGT AAAATACACC AGGTGGAGAT AAGGCGGTGG TAGCCAGGCG CACAAGCTTC TAAGCAATTA CAGCCAAGGA CAAGCAGGAT AACTTTTAGC AAGAGTGGAA AAACTATGGA AGATGTGAAG CAGTTCCTAA GAGACCAGTG CAACTGCAGG AACAATGCAA ACACAGGTTC AGCATCAAGT TTT'rGGCTGC ATAATCCAAA CGAATTCTAT =Mr~AT=TI TGTAAAAAAG TTCAGTAGAT GA'rTGAAACT ACAATAGTAC ACCTCTGTTT' TCCTGATCGA ?TTrGTCCrGT TATrATTTrA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 GCAACAAGTG GTT'rTGCTTT AAAATTCTAT TCTCTACTrA TAGTGATGGG TGAGAAAAGT ATTCTATCAA TGTACCCAAT CGGATAGAAA AGATAGGAGA
GCTAGGAAGA
AAGTI'AGATT
GAGAACCCAA
CTCTTCTGTC
ATAGCTCTCT
CCGAAAAATT AGCTGCCCCT CCTCr Tr GC-TATAATAG ATTGAAGAGA GGAGGGCA;A
AAGATAGAAA
AGAATTTGCA
GATGAATATT
GTATGAGCAG
GGAAGTTGAG
AATrGGCTTA
CCGTCAGATT
GCTGGAAGAT
ACAG=TAG
GGTCGTGAC
GT-rGcTrTrG
CTCTATGCCA
GCTTACTTGT
GATTACCAT
CAGCTATCTA
ACCTATGAGA
AAGAGATGAT
CAGAAGAAAT
ATAAACCGAC
CTATTGGT'IG
T?1"rGATACA CATACACACT TGCCTrGGCT GCTGAGATGG CATTGAGCAT GCCTTCGAGT GCATCCTACA GAACCTGGTA
TGAATGTAGA
GTGTGACACA
TGGTAGATGA
CTATACAGA
CTTTAGGTGA TGGATAAGTT AAAACArrCC AACGTTGTGG GCAT'rCATTT TCAGGGACGC TTCCTTCTCA GGAGTGGTGA GTTACCTTG GACALAGATGT GCGTGGTCGT GAAAATAAAA GCG-TGGTATG ACGACAGAAG
GGATGACAGC
AGGACTTGGA
TTATCAAGAG
TTGAGTGGGC
CTTTTAAGAA
TGGTGGAAAC
CAGCCTATAC
AGCTGGCGGT-
GCCCAAAGAG
TTTGCCITTTT
TGAGCGTT
AGAGAAT
CCCAACTGAC
AGATGCGCCT
TCGCTATGTG
.AGCAACGACT
e. a a a a TGGACTGGAC AGCAAGTAAT GAAAGAGAAA ATTTCrC-AAG GATGATACGG TCAATCTCA-A ACGTTATTC GATGTGGAGA GCCATCAATG CTCAGGATAT AGAGCGGATT CAGCGCCTC GTCTTTACAG ACCCAGATTT TAATGGGGAA CGGATTCGGC CCAACAGTTC AGCATGCT TCTCAAGCGA GATGAAGC rG GGGCGTTCTC TGGGAATTGA GCATGCCAGC TATGAAGACC GTGACAGAAC AATTTGAACA TGAGAGTCAG 'rTrGACA'N'A CTTGGTTTTC TAGCACGGGC AGACACGCCGT AAGCGTAGAG CGAATCG CT ATTCCAACGG CAAGCAACTC CTCAAACGCC 'rTGGCAGAAG TGGAAGAAGC TATCAAATCT TATGAGTAGG TTTTrAAGTT TCACAGTATT TTrCGAAGCZA GGTAGAAGAG GTCAAAAAAT TAAAGAGATT CGGATAGAAA AAGGAATTAG ATGAGCAAGA ACTGACAGT2' CGTCAACTGT CGCGAATTGA GTTTGCCCAA GTrAGACTAT ATTGCTCGCC GGCTAGGAGT CGGATTTrTC AGCTCTTCCT TCTGCTrrATT TAGAATTGAA CAATCTATGG TAAAGAAGAG GAGTACGATA AGAAGGAAC GTGCAGGAGC AGGTTTCG GTTGTCCATA CCCGTGATGC GGOTCCTCGTG CTGGTATCAT GTGGATCTTG GTATGACCAT CTCCAAGAAG CAGCTAAAGA TACTTAGCAC CTGTACCCAA GTCGACTrTA TCGCTGACTT GCAAATGCAG AACGAATT'rr TrATCGTGGT TGAAGGGCGT CCTATGAOAC TCGAGGTrCT ACCAACGTCA TGGAGTCATT CCATGATCAT GATGGTCATT TTCCCAAGTC CAAGACCAAG TGAAAACGGC TCTAGCTCAA GTCGTAGCGA TTrGATTCGC AATATCTCGG AGAGACTCTC TAGAGTTGTT TGGGGTTACT AAAGATGTAG CCGTrACAAT GAGGCGTCTG ATGTrAA'rTG TCGTCCAGAT TTTTGTGGAG AAGTGGAGCT TCGCAACCGA TCCAGTTTAT AGCCTTATGC ATACCAGATr TrACGTGAAC GTGTT-TGG-AA GAGATTATA 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 921 AAACATACTr TGATAATCTT CCTAAAGAAG AACAA'rrAGC ATGTGAAGTA TTGCAGGCGT 3300 -GTTTGGATAC TTCTAGAACT AGAAGGCCTG AATATG-CAGA GT'rAATACTT GAGGAACATA 3360 TGCCTCAGAT TATAGAAAAA GAAGCrATT CAATAAATGA TATGTTGTTG AITrCGrr'GT 3420 ATCA AATGCTCATT AGAAAAGATC TTGCCAAATT TATAAATCAA ATCGAAAAGC 3480 TAATGCTCTT TCTTTTGGAA CAGAAGAAGG TAACTCAAAT AGAGAATTrAC 'ITrATAATTA 3540 GAGATACTCT TATTTCAGGA ATGTGTTGTC TTG.AAAAGGT AGGAGTAACT GATTGTTTTA 3600 ATGATTATCT ATCGTGTTTA CAAGAAATTA TGGATAAAAC TCAAGATTAT CAAAAGAAAC 3660 CTCTTGTATT TATGTTTrG TGGAAGCAAG CATTAAGAGA AGAAAGAGAT TTTAGTTTAG 3720 CTGAATCATT TTATCAGTCT 
TCTAAAACAT TTGCGC.AGCT A.A'TGGAGAT GAATTTCTAG 3780
TAAAGAAATT GACAGAGGAA 'rGGCAAGAGG ATGTCAAAAA ATA71TTATAA ACATAG'rGAA 3840
TCAGTGACAA AGATGTCCTT GTCCTCGTAT CAAAACAGTT CTAAAGTTCG TCT TTAGGGA 3900
TGTTITrTA GATATAAGCT AAAAATGACA CGAAATGGTT AGATTTTAAG GACATTGATG 3960
TCCG 3964
INFORMATION FOR SEQ ID NO: 137:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12666 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:
TGAGACCGTT ATTTGTATTA GGGAAATGGG TATCTATTTT TAATGCTGTG GGGATTTTGA
TTGTTTCTAT TATTCAAACC AAAAGC?1'GT CAGGTATTrGG AGCAGGATTG TTTAATCTAT 120
ATAACATTTC ATCTTATATA GGTGATIAG TTAGTTTCAC TCGATTGATG GCATTAGGAT 180
TATCTGGAGC AAGTATAGCA TC-AcTT~CA ATTTrAATTGT TGGTrTTGTTT CCGGGAATAT 240
TGGCTAAACT GACAATTGGA TTAGTATTAT TCATTCTTT1' ACATGCGATC AATATITTTTC 300
TATCGTTACT ATCAGGATAT GTTCATGGAG CACGTCTGAT ATTTGTTGAA TT'rPTTGGTA 360
AGTTTTATGA GGGTGGAGGA AAACCATTTC AACCTTTGAA GGCTTCTGAG AAATATATTA 420
AGGTTATTAC AAAGAATTAA TGGAGGATAT ATATAATGGA ACkITTACCA ACTTATTrT-r 480
CAACCTATGG AGGAGCTTTC TTCGCTGCAT TGGGAATTGT ATrGGCGGTT GGATTAAGCG 540
GTATGGGGTC TGCTTATGGA GTTGGTA.AGG CTGGGCAATC TGCCGCAGCT TTACTGAAAG 600
AACAGCCTGA AAAG~TTTGCC TCALXI'TGA T'ATTC-CAATT ATTCGCCCGGA ACACAAGGAT TATATGGTIrr TGNTATTGGA ATr'rrAATTT GGTTGCAT AACTCCAGAA CTTCCTTTAG *AAAAAGGCCT TrCTATTTC TTTGTAGCTC TTCCAATTGC TATTGTAGGA TAC7TCrAG CTAAGCATCA AGGAAATGTA GCAGTACCGG CAA'rGCAAAT CTTGGCTAAA AATTCATGAA GGGAGCAATI' T'TAGCTGCCA TGGAGAAAC eCTT'CAATT TCGTATCATT CATTTTGACC CTTCGTG'rAT AAGAAATAAA TTTGCAATT-C TCTAAATGAG CAAT 'rAGAA GGCGTATCAA ATTATGGAT CGCTCATTAT AAAGAAAAAA AATATCAAAT AATTTTTCAA AACAGAAAAT ATTAAAAGAA CAGATAAAGA AATGGAGTTC TACTAACCTT TGGGGAACGG AATTCTCTT'r TCCAAAT'rAT TTATTTCAAT AGGTAAAATT CTAAGGAAGA AACTTCAAGT TAATCCTTCT TAGAAATTT'G AATCTTTTTT CAAAAATAAA GAAAAA'ITr'C AAAAAA'N'AT TCAACTCCCT ATCA'rTrATC CTAATGACAA AATTAGTCAA AT'rGTGACTT TATTTGCTTT AAGGCCGCAA TITAAGAAAGA CAAACTT'rAA AACATTTAGT CGTGAAGTAG AATCAA'rrTG GTCGGAGCTG
ATCTAGCATA GTACTGTCTC AGGTTATTGT CGGTTATCtC AX.ATAAGAG TCCAAAAAGA AGATTCATGA GAAGCTGAAC ATGAACGAAA CAATTAAAAA A'rAAGCAACG CTTTTCAAT CTGCTTT-ACT ATIC1'ATCGAA TCTGGAACG ACTT'rAGCTA AATTCAATTT T'rATTTAGTG AACAACCTAT GATGATAACT ATTTTATAA A'rCGCAAATC AAAT'T AT
TCAAT=IGAA
Gr'rAAAACAA
CCAATCAACG
AGAAATGGAA
ATATTCACAA
AAC?1'ACGAC; AGTCTGTTA'r TGAACPAAGCT
AGACCAAAAG
CTGCTl-rr
AAAGGAGGTG
CATC'AAAAAG
ATGCAAAAGT
'rrCCAACAGA
TAGTATCAA
TCTTCGAGTG
CAAGAGG'rCA GGAACXATTA GAGA.AATTGA CTCAAATGAA TCAGGCTTAC AACATTAATT GGATCCAT'TT CAAT'rAAGGA TGAAATTGGT GAGTATTCCA ATAAAATTAG AAAGCTATTT TATGGATACT TACGACGATwr TCGGTAAAAG AAAACGATTI' TAT'IACAGAA ACAATCCAAA GATACGGAGA CATGGCATT TATCTrAGAA GATTGACATC TTAGA.AGA'rC C1'AGTCAGAC AGAGATTTCG 'rGATTATAGA TGGGCCTATG CTGAAACTCC GTCTGATATA ACGATATGTT TATCATAATA TCAAAGTTT'r ATTAAAATCT ?r'rTTCTAAA TTA'rTAATFC CAATAGCGAT TTTTGATATA TCTCCTTA CATTCAGATA CACTTCCTGA TTTTATGGCTr CAATGAGTAT GAAACTA ATAATATTCG TGTACTTGAT TTTTAAACAT CTGAAACTTT TATCTAATGA GTAGATGAG CGAAATGATT GACTTTTATA ATATTATTAC TGTAAAACGT TCATGCGAT ATTTTACAAT TAcTrrcAGA TGAAGGAAGT 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 ATTTCTGCTA AAGAATTITAT ATACATT1GTA GAAAATCAAG AAATATTTGT GTGGTTCAAT AAAATAAATC CAAGCTTAGA TTCAATCTT TCAACT'rATG AATTGAAGAT GCAGGACGCA 923 ACAATNcA'r cTrrcTGAGTT AGAATTTTA TGTG.ATrAc TATTGTATAA AAcTr'rAGAT CAACGAAGGT ACAATGTAC;A GCGGCCGTTA GTTCTTCCTA GA'rATrM-I' GGGATCTGAG TrrGAAGTAA AGAATCrCAG AATGATCATA
TCAATAAAAG
TCG;TACCCGT
CCAAGAACAA
AAAGGATACG CCCACATTAT GATATTATTT TACCATTTAG GAAGCTATAA ATACACTAAG TCAGCTCrTC AAAA'rACAA'r TCCQC7rTCAA GGAAGCtAAT AAGTATAAAA rrGGCATAAT CATGATTGGG TTTGATATAT TTCCTGCCTA AAAATTAGCT CAATCTG.ATT A'rGGTGTrCAT TTATATCACT GAAGACATTG CTrCAATGAT ATTAGATACA ATTCGCCATT ATGAIrCCr-A AGTTGTGCCT GCTATTATTT TATTACCGAC TCATAAACAA GG-TrrAAATT TAGGATTAAA ACGTATAGAG GATAATGTAG ATTGTCTGTA ATATTATTCT GGGAAGATTA TAAAAGTArC ATTCAAGATA TTTGCCGTGT AGAGATCAGG CATC'TATCCA GTTGTTACAA CTGGACAACC TTTGATGGCA TACAACCCCC GT'rCGGGGG TAG.AAGTr'CC ATAGCAATTG GTCAAAAAGT GTAGT-rAA'rC ATAAAATTAT
AGAAAGCAGT
ATAAT'rTTrG
GGGACCTCTA
AGGTAAGCTA
AGTCTATGAA
TCTCTCGGTT
ATrAGATCGA
AAGTTTGGAT
GAGTACGGG[T
GGTTCCTTAT
AGGACACAAT ATTTTATAAT AATGTACAAA GACTIAGTAA GGAGAATAAC TTTGACTCAA GTTATTGCAT CAGG;TATGCA GGAGGCTAAT GGGTAATCG GTGAAATrAT TGAAATGAGA GAAACATC'rG G'rCrTGGTCC GGGAGAACCT GAATTAGGGC CAGGATTGAT TrCTCAAA'rG TTTAAATTGG CTACTCATAA TGAI-rTrCTA AGAGATATTA AGTGGCCATrT TGATTCCACT GATATI'CTTG GAACTGTCAA GGAAACCCAG GGAGTATCTG GAGAkAGTCGT TTCTATTGCA TATGAAATAA AAAAA'rTGGA CGGTAG~rrC GTrCCGCAAGG CGCCGTCCTGT TTCTAAACGT CAACGAGTTA TTGATG-A'r CTTCCAGTA CCGT'rTGGAG CAGGAAAGAC AGTTGTACAzk AiTTGT'rAT'T ATGTCGGTTG TGGAGAACGT 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 TCTGGCGATT TTACAATTGA TGAAGTTGTA TATAAAGGAA CCTATGCA AAAATGGCCT TTAATTCCAG AAGAACCATT AATCACAGGT ACCAAAGGGG GAGCTGCAGC AGTTCCTGGA CACCAAGTAG CTAAATTTGC CAA1'GTGAT GGAAATGAAA TGACGGATGT ACTGAATGAG TTTCCTGAGT TGATTGACCC TA.ATACCGGA CAATCAATTA 'rGCAACGGAC AGTTCTGATT GCTAATACTT CAAATATGCC TGTTGCTGCT CGTGAGGCTT CAATTTATAC AGGAATTACC ATGGCTGAGT AT'IT3CGTGA TATGGGCTAC TCTGTCGCCA TTATGGCTGA TTCAACTTCA CGTTGGGCAG AAGCGCTACG TGAAATGTCA GGACGTCTAG AAGAAATGCC TGGTGATGAG GGT1'ATCCTG CTrATCTGGG AAGTCGTATC GCTGAATATT ATGAAAGAGC AGGACGTTCT CAGGTTCTAG GGCTTCCAGA ACGTGAAGGA 924 ACGrTACTG CTATTGGAGC TGTATCGCCA CCTGGTGGAG CAAAACACTT TACGGATTGT GAAAGTTTT TGGGGCTTG CGTCATTTTC CTGCAATTAA CTGCNACA TCrrATTCAC ACTTATATAG ATGGTAAAGA GAAGACAGAT TGCAATAGTA ArATTTCAGA
ATCCTCCCTT
TATATAAACA
ACCAGT'rACT
GGCACACCGA
CAGTGTGCC
TACTTACAAC GGGAATCTAG TCTGATAATG AACCACTAAC CAGAACCT TTrGATTCGGT AGTAATATTC TCACTTN'GC ACAGAGATTrA TGGAAGGTAC TCZAGA.AGATA GATTAGATGA TTGATATTAG AAACAGGAGG GAAGTTGTTG GGCCTCTTAT GTTGAAATTC AACTTCATAA GATAAAGCAA TGCTTCAGCT ATTCGTTTTG CTGGTCATGC 'rrTAATGGGA TGGGAAAACC TTTAGAGGAA ATTGTrTCGTC GATGGAAATr GCTAAACAAA AGATACATTC ACTTCGrMG TGATCAGGCA AATCATGCTT CGTGGCAGTT CGAGACCGTA AAATAACTCG TGCGATGAAC TTGTTGGAAT TGATTCTCTG TTCGAGAAGA TTATTrGCAA CAAAACAACA AGCAATGCTA TAGAGTTGGG TTCTACrT TGGCGACAAG TAAATATG~rT AATCAAAATT ATATCAAATG AGATTACACA TCAAATTCAT TCTATAAATG AGTGT'rATAA AAGAATACAG AACTGCTAGT GATTGTTGAA CAAGTAAATA ATGTGTCT'rA CAATGAGTTA
TGGAGAAATr' CCTCGTGGAC TTTTGAAGGA TCTAGTGGAA ATTAGAATTG GCTGTATCTG A.ATTGATGGT GGACCAGATT GATATTCATG GTCAAGCTAT TAATCCTGTA TC'rAGAGATT ACAGGGATCT CCTCTATTGA TCATTTGAAT ACTCTrGTAC I=TCAGGTT CGGGCTTACC TCATAATGAA TTAGCTGCTC GTTTTAAATr CTGATGAAAA TrTTGCGGTT GTATTTGCAG GAAGCTGAGT TTrTTTATGGA AGAACTCAGA AAAACAGGAG AAGTTTTAGA GATCCACGAA TAAA =rAGA AAAGTCTAAA AGGATATGGT TGGTCGTAT'r TAATrCCAGA GAAATATTPA ATCCAGATGA ATTTATTCAG GTGGTCAAAA ATTACCACTA AGATAGCAAG ACAAGCGACT CAATGGGTAT TACT'rGAA CGATCGATCG TTCGGTTTrA CAACTCCCCG CAT'TGCTTTA TrCTAGTrAT CATGACGGAT CTCGCCGTGA AGTTCCAGGG CTrCTATACGA AAGGGCTGGT TTTTAACAAT GCCAGAAGAT CTGA-AGGGCA AATTATTTTG 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760
TTTATGAACT
ACTOCCOCCAG
ATGACTAACT
-AGACGAGGCT
CGCTTAGTTG
GACATAACAC
TGGCAAATGA TCCTGCAATT GAGCGTATTG AGTATCTAGC 'TTrTGAAAAA GATATGCACG ATTGTGAAGC GTTACGTGAA GTCTCGGCAG ATCCGGGATA TTTATATACA AATTTATCAA GTAAAAAAGG TTCGGTGACA CAGATTCCTA ATCCAATTCC TGATT'rAACT GGATACATTA TCGCATGAGT TGTATAATCA AGGTTATCGT CCACCAATCA ATGITACC TTCTCTCTCT CGATTAAAAG ATAAGGGATC TGGAGAAGGT AAAACTCGTG GAGATCATGC TCCAACTATG AATCAACTGT TTGCAGCCTA TCCCCAAGGG AAAAAGGTTG AAGAGTTAGC AGTAGTATTA 925 GGAGAATCG.G CTTATCTGA 'rGTAGATAAA MTTATGTGA GGTTTACAAA GCGTTTTGAA GAAGAGTACA TAAACCAACG ATTTATAAA AATCGAAATA 'rAGAAGATAC GTTGAATC2-r GGGTGGGAAT TACTATCAAT TCTTCCTAGA ACAGAGTTAA AACCTATCAA AGATGA'TTG CrTGATAAAT AC~rACCTTT GGTAGAACIT TAATCCGGA.A ATGGAGTGAT TATCTATGGT ACcTrTGAAT GTAAAACCAA CTCGTATGGA A 'GAATAAC TTAAAGGAAC cTrTGACAAC AGCTGAACGT CGACATAAGT TATTAAAGGA TAAAAGAGAT GAATGATGA GGCCA'ILrAT TTCrTI'GATT CGTGAGAATA ATCAACTTCG GAAAGAAGTG GAAACTrATC TCTAAAATCC TrGCAGTTG CTAAATCA'rT AAAGAATTCT CAAATGGTGG TTCA.ATTCCA TCCAA.AGAAA T-rGAATTrATT 'rGTTGAGAAA GAAAATATCA AGTTCCTAGA ATGCATATGA ATATTACTTC TCAAAATGAG AACAGTGAAT ATCTTCThAA AGTGAAATGG ATGATGTATT TGC3'ACAATG AATACGrTAA ACTAAGACTG GC-AGAAGT'rG AAAAAACCTG TCAGTTAATG GCTCATGAAA6 ACGTAGACGT GTAAATGGTT TAGAATACTC GATTATTCCA AACT'GTCrG TTATATACAA TTGuAAACTAG AGGAGGCAGA AAGAGCCAAT TTAGTT-ccTA GAAGTAGATC CrATTTAG ATTATTAAT'r AGATGAACA.A ATATCAGIT TAAT'rCATAA
AGGAATTATT
TGAGTG;TAAC
ACAGCTAT
"TT'AAATT
TAGAAAAAAC
AAACTATTCA
TTATGAAAGT
GGATAAGGCT
6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 TTAAGCCTTT CTAAGCI=-T TTGTTAGATA ATGAAAA'rG TTTATTGACA GTATCAGGAT ATCTM=TCA AAA'NTrTGGT TTCTACTAAT CTACAT1WAG GATTAGTAAA TCGTAAATG'r AATTATATAG AAAG'TAAGCG CGTCATAACA AGG'TATCTAT CAT'rCATGGA G.,"CTCCTC TATACTATTA GTAAAGTAAA ACTATTGGAG GATATTTTAA TGCC.ACAACC TA'rTGTTCCT GTAGAGATTC CACAATCTCG TCGTTTrCAT ATTCGTATTG CCAAGCTTGA AGTAAGTTTT CAGCT-rGG ATALAGGTGTT .GCTCTrATGAC TCTCGTGTGT GGGAAAACTG ATATGAGACA AACCCACTTT GAATTGGATC CTTTCTCCGG .AGACCGCTTT AAAGTCCTTT ACTGGGATGG TCTAAAAAGA CAAATGATAT TCTGCTTAAA TTTCAATCTC TCAATCTCGA AATGGTAGAA AATTCATCTA TCTACCCTAG GGGAGGTCTA AGGAATCGAT TCACTGGCTT ATCTGGTTAA TCAAGTCTTT CTCTTTTGTG GTGGACGTAA TCAAGGATTT TGGCTACTAT ATAA6ACGCTT TGAGAACGGC AGATTGATTT' GGCTAAGTAC AGAAAAGGAT GTCAAAGCTC TCACACCAGA ACAAGTAGAC 'rGGCTTATGA AGGGCrTT TATCACTCCA AAAATATAGT AGATTGAAAC TAGAATAGTA CACCTCTGCT TCTAAAACAT TGTTAGAAAT Cr.ATrACT GTCCTGATCG A'ITTGTCCTG TTCTTATTTC ATTTTACTAT AAATCCATCA GAAAGTCGTG ATTTCTATTG 926 AAATGAGGAC TTTCrTTA TACTCATCTG C(rrCAAAAA GCATTCTAGT CCA'rCTCCGA ?I'AACG.ATGG ACTTTATCAC CTCC1-rCTCC AGTCCTTGTA TAACATCTc GAG=rGATTC ATGAC-ATCTT CCAAAGTTTA AAAGGCTrTA TTCTTAAATC CACGTrACG AATCTCTTTC CACACTTCI' CAATGGGGTT CATCTCTGGT GTGTATGGAG GAATAAATGC AAAGCCAATA 'ITAGTCGGAA TCTTT'AAGGT ACTrGATTTA TGCCATATAG CATTCTCCZAT AACCAGTAAA AGATAATCA'r CTGGATAAGC TTGTGAAATC 'rCCTATTCCT AAAGCCCCTT TAGCCAAA CTTTGGCTC-A GCI-rCTA1-rA TCGCTCAC-AC CATCCATCAG AAGTTTAATC TGAAGGI'ACC CAATTATCGC CAAGAAGAAC ATTGGCCTAG GATGGGT'rTA CCAArCACAC GTAAGGAAAT CTCTAATTGG CATATCAAGG CGAGTCAATA CTATTTGGAG CCCCTTTATA ACCTCr-rGCG AGAGAGACTA TrGACTCAGC GAGTGATAGT CAGCTGAC?= GATTACGCTT TACCACCATG AGATTATTCT GGCTATGTGC CTAGTTCTGC CTATGCGATA AAACTTCGAA CCGCTCGTCT GGGCGCATGT GAGAAGGAAG TTAGGAGCTA AAGGTTTAGC CCTTACTrCA
ACTATT'GGAC
ATCAGTGTCG
ATT=TATAT
TGCGGATGAA ACTTCrATA GGGTGCTAGA TTTTTTGTCA GGTAAAGCAG AGAAACAAGG AAGTGCTTrCA GTAGTACAAG AATTCCTAGG TTTGCGGCAG TAACTTAGGA CT'ITAGTCCT GCAGTCCAAG GTTTAGGAGC AAGGCG;ACGC GCTTATCGTC AACTGGAAGA AGC'TGAACT 'rir111GAAG CGCCCCCCCA AGCAAGCGGA TTATTGTGAT CAGTTATTTT CCTTGGAAAG GC'rTTGCr-AG CTGATGAACG ACTACAGAAA CGTCAAGAAC ATCTCCAGCC GACTTCTTTG CTTAGTGCCG GCGTCAGTCA G'N'TAGCAG GTTCAAAACT ATTGAATACA GCCTCAAGTA TGAAGAAACC 'TTrAAGACCA TTTTGAAAGA GTCLC IrT CCA ATAATCTAGC TGAACGCGCC ATTAAATCAT TGGTTATGGG AGAGTCr-AGT GGACTCTTM AGCCTAAGCT CAGTTTAAAA AAGCGAGGGT
TAAGCTTGGT
GT'rGGATGTT
TAAATCATCC
AGACTGGGAG
CTTAATGGAA
AGGAAGGGCA
CGGACATCTG
ACGGAGTAAA
GGTTATTTTC
7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 864C 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 TCAAAGTTTT GAAGGAGCTrA AAGCAAGAGC TATTATTATG AGTT'rGTTGG AAACAGCTAA ACGTCATCAA TTAAATAGCG AGAAATATCT ATCCTATCTT CTAGAATGTC 'LrCCAAACGA _GGAAACTCTC GTAAACAAAG AGGTTTTAGA WGGCTTTA CCATGGACTA AAGTTGTACA AGAAAAGTGC AAATAAGAAA TC'CCACATT AGGAACTATC CGTGAGTTCT CCAGTCTGGA GATTTTTCAA TAGACTTCCT GCGAAACAAA ATATGGTATA ATAGTTCTAT GAATGATGAA GCAACTAAAC A*CTAACCGA 'rGCACGATTT AAGCGTCTTG TTGGTGTTCA ACGCACGACT TTTGAAGAGA TGTTAGCTGT ATTAAAAACA GCITTATCAAC TTrAAACACGC AAAAGGTrGGA CGAAAACCTA AATTAAGTCT AGAAGACCTr C'rrATGGCCA CTCT-rCAA'rA TGTGCGAGAA 927 TATCGAACTT ATGAACAAAT TGCGGCTGT TTTGGTATTC ACGAAAGCA-A crrAATCCGT CGGAGCCAAT GCG lrGAAGT AAC-CTTGTT CAAAGTGGTG; TACGATrC AACAACTCCT Crr-AGTTC1'G AGGACACGGT AATGATTrGAT GCGACGGAAC TAAAAATCAA TCGCCCTAAA AAAAGAAT'rA GCGAATTATT CTGGTAAAAA GAAATTCAC CCTATGAAGC CTCAAGCGAT 9540 9600 9660 9720 TGTCACAAGT CAAGGGAGAA TTGTTTCTTT GAA7TGTTC AAAATGAGTC GCAGAAATAT TGGrrATCAA GGGCTCATCA AGATATATCC ACTCAAACCC CTAACAA'rTG AAGATAAACr CAAGG'rTGAG AACATCTTG CCAAAGTAAA AAATCATCTA AACGCTTCGG ATTACGAATG CTAGGA'rrCT AGTTITGCAG GAAGTCTATT TGATACAGGA AAAGTTTAT TTGATGGT KAGTCAAAAT T'rAGGAGTAG TTCCACAGGA GGATATCACT GTGAACrATT CAGACAAGCT GGTAAAATCT TCAAGCACAA ACTTCACGTA
GTCATGATAT
'rGGCTGACAG AA'rCCACCAA
CTATAACCAT GCGCTrACTA AGGAGAGAAG AACGTTTAAA ATGATTTCAA CAACCTATCG AATTTGATTG CTGGTATTAT CAATCATGAA ATCAAAAATA CCATCAAGAT TATATAAGAT AAATATT-AAT CAAATAGATA AAAAAATATT TTCATTTTTA TTGAACCGAA GTATTCTTGA TAATATAAC'r
TCANATCTAT
TTAAACCACG AAG'rrACTTC ACAAAAGA'rA GATGAAATCA TGCCTATGCC GATGAAATTT GAGGAACTTT GTAAAGCAGT AATACTATCA TCTCAGAGAT GGGGTCAAAT ATTrCAGGTG G=CAA.GGCA ACGGArAGCA CTGC;CACG7TG CAT7AATAAA TAATCCTAGT ATTGTAATTr TAGATGAAC AAGAATA.Ar-A AAGTATATAC AAAGTCAGGG GTCAACGA'N' AAGGATGCGG ATGTTAT7T= AGGAAATCAT AAGTA CTTAA TGGA'rCTTGG GAAATGAGGT CTAAAGAAAA TGAAGAAAGA ACTAGGGGTG ATGArrGGAA TAGTGTTITGC TATTTCTTTA- GGCTTGTTGA ATGGAATAGT TAAGAATTAA GCATAATr'r' TrGCGTAAA 'PATAAATCTA ACAAAAAGCT TTAAAGATAT GCAACTACAG TAGTAGCTTA AAAACATGAT AACTAGTGCA TrAGACACTA T'rAATQAGGA CrGTACTCAA ATA~rrGTAG CTCATAGATT TGTAATGAAA GGTGC;TAAGA TTGTTGAGTC 'rGGAGAGTAC TACAGCTTAT ATACAAAAAG AAATGAATAT GTAATTTTAA CAACACCTC AATTTN'TrA GATTTTCCAG TTGAATATGG ATTGGGTTCG CTGATTGTTT ACAAAAACAA CTAAGGAGTA GAGATGGCTA TAGTrCAAAT TGAAGTTATT CATAACACTr AAATAAI'AGA rAAATCCCTA TTCTTAGGAG TAGCGGTTTT 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 1110 11160 11220 TCTTTTGTT TAATACrcTT TGAAAATCTC TTCAAACCAC GTCAGCTTTG CTTTACCGTA CTC.AAGTACA GCCTGCGGCT CCCTTCCTAG TTGCTCTTT GAT1TTCATT CAGTATA;AA AGGGTCAAGT AAGTATAGTA AAT'rGAAATA AGATATGAAC AAATCGATTA GAAAAGTCAA 928 ATTAATTTCT AGAAATATG'r TAGAAATTGG ?1'rGAAa-rCC GCAATCAATT TGTTCAGTTT 1.1280 TTrATTTrCATT- TCATTTTATT TAATTAGATT 'rrCCAATTrr TTAATTCAAG CTAAAAATCC 11340 CCAATCGTAG TGATTGACGA TTGAGTAAAT AAATCTTAAA CAATACCTTG TGcAATcATc 11400 GCATI'TGCTA CATTTCA GGCAGCAATG TTAGCTCCTG CAACGTAGTC TTTNATCAAGA 11460 CCGTATGTTT CTGAAGTCGT TTTAGCTGTG TTGAAGATCT TTGTCATGAT GTCTTTGAGA 11520 CCGCCATCAA CTTCTCACG ACTCCATGAG AGGCGAAGAC TG1 r-GCT CATTTCAAGA 11580 GCTrGAAACGG C'rACACCACC AGCGTrTGGCA GCITTGCAG GTCCGTAGAA GATACCATTT 11640 TCTTTGTAAA CTTTGATGGC ATCAAGGTCG CTCGGCATGT TGGCACCTC AGA'rACACAG 11700 ATAACCCCTT GACGCAACCAA ACGTTrTAGCT GCTTCACCGT 'rGATTTCCTT TTGAGTGGCA 11760 CATGGAAGAG CAATGTCATA GTTTCCAGCG TAAGTCCA TA CAGTACCTTC GTGGTAGCI-r 11820 GCAGTrCcT TTTCAGCTGC ATACTCAGTC AAACGAGCAC GACGC7?NC TTTAACATCA 11880 
ACCAAAAGAT CGAAGTCGAT ACCATTr'rCA TCGATGACAT AACCATTTGA GTCAGAAACA 11940
GAATA rG TCACCGAn TTCAGTTCCT ?TTGAAGAG CATATTGAGC AACGTTACCA 12000
GAACC'rGAAA TAACGACTTT CTTACCAGCA AAGCTGTAC CGITTAGCTTT GAGCATTTCT 12060
TCAGTATAGr AAACCAAACC GTAACCAGTT GCTTCTGCAC GAATCAAGCT ACCACCAAA'r 12120
CCAAGAGGTT TACCAGTCAA GACACCAGCA TCAAATTGCT TAAGACGTTT GTATTGACCG 12180
TAAAGGTAAC CAATTTCACG TCCACCAACA CCGATATCAC CAGCAGGTAC GTCAAGTGAT 12240
GGTCCGATGT GTT?1'TCCAA TCAGTCATG AAGCTTTGGC AGAAGCGCAT CACTTCAGCA 12300
TC'rGT'I-rAC CTTTAGGATC GAAGTCTGAT CCACCTTTAC C1'CCACCGAT AGGAAGTCCA 12360
GTCAAGACAr rTTTrAAAGAT TTGTTCAAAT CCGAGGAATT TCAAGATCCC TTGGTrTrACA 12420
G TTGGGTGGA AACGAAGTCC ACCTTTGTAT GGTCCAACAG CTGAGTTGAA TTGAACACGG 12480
TAACCACGGT TTACTTGAAT TTTCCATCA CGGTCAACCC AAGGAACACG GAAAGAAACC 12540
ACGCGCTCAG GCTCACTAAT ACGTGCCAAG ATATTTTCTT CCATATACTC AGGGTGT'TTT 12600
TCAAATACAG GTTCTAAAGT GTTGAAAAAT TCTTCAACAG CTTGGAGGAA TTCAGCCTCG 12660
TGCCGG 12666
INFORMATION FOR SEQ ID NO: 138:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3083 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:
AGcAAcTG=r GTr.AAccAAT TccGATAAAT TccAAc.AATT GGTTAATAGA GccATTTrGA CCAAAAATCC CGATAAAAGC ATAGGCTTI'A AGGAGCAAAT TGATCCAGGT AGGAAGGATA ATCAGCATGA GCCAGAGTTG ACGGTGTTTG AGACGOTCA AAAAGAGGGC CGTCGGATAA CTGATAAGCA GTGCCACAAA GGTCAC.AATXG CCTGCATAAA GCACTGAGTT GAAACTCATT TTAAGATAGO 1'CAAGTT'rTG TGACGC.AAAG 'rAAGATTTGT AATTTTCTAA ACTGXACTGG CCTTCGATGT TGAAAAAGGA GCAATCCAAA GCATGTAGGG TTGACCGAAA ATCAAGACCA AGGGTGCCAA TACAAAGAGC TACTACAAAG AGTTTACAGC CTCCTCGATT GCATTGATCA AACCTGCTTC ACGAGCATCG AACT=TCTT CGGTTTCATT AA.ACTCCAGA CCGATTTCCT CACCCACGAr ATTTCCAAGT TCGTCATAGG CCATAA'TTC GACCTTAACT 'rGGAGCTTGC CTCTTCZAGG AACGACCTCA ACAGGTTCAT TTGGCTTCAT GTTAAATTCG ACCAAGTAGT CCTCAATCAT AAAGGTGGCA ACAAAGTGGT 'rGATTrGGCTC GACA6ATCTCG CCATCATTCA TAACGAAAAT ATCGTGAGTG ACAAAGACAA AGGTAATGCC CTGCATGTCT GTT(=CAATT TCAAGTCCAG ACGGGGTTGG
TTCA'IOATAG CACGGGCGAT TTTGCGGATG GAACG TITM CATAACCTTC ACGCTGCTCG ATTTCTN'TCT TATC.AATTTr AA.ACACATTC ATATGTGGGA ACAAGGCATA GT"TGGTTGGA ATA'rCATTGA TACGAACACC CAGTAAACCT GCAATAArGT TTAGGA'rAGT GGTGTAGAAT TTCCCTTCTT CCAACTCAA6A GTCTTCAALAA ACTTTAGAGA CGTI'TTTGAA A'PTCCTTCTT TTTCATAGAT TAACCGATCG AGGGAGTAAA ACCACCTGCA TACATCTTCG
TTGCTCTTCG
GAGACGC.ATG
AGCCTTACGG
ATAATGAACT
AAGGGTAATG
CCCACCATCA
GGTACCTGGC
ATCGTAGATG
CCAGTCACTC
TTGTTTrCTr CArTCTTTrC ATIT CTACGT ACTCCTCAAT ATGTGGATCT CTTrCTGGTTC GTTGAGTGGA TC.ATCCATrC CCACGGAAAA GCTGGGTATC CGCAAGTCCT CTGGACGAA'r ACCGCTTCAA AGCGrTTGCC AAGATGTTTG ACTCCCCGAT TCCACAGGGG TTCCAGACTG ATGGC.AAGAG CTTCTTCCTG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 CAATCGTG TGTAATTCAC GCAATTCGTA CGCTCATAAA GGCTCGTCCA ACAAGACCAC GGCCACACGC TGACGTGTC CTCCAGAAAG CAACTGAACC ATCTTGAGAA CTTCCGCTAC ACGCAAGCGA AGTOGAAAGG CAACATT'rTC GOATTGGAAO ACGGTATGTA CGTCOCGCTT GTCTAGCATG ATATCTCCTG TCGTCOCATC TGATTTCCCC GAACCAGATG CACCTACAAG G'PTGATGTCT TTGAGAACCr TGGTGTTGCT TTCGATAATT GGCTTTTTCA ATPTGGCATAA GGGccTCTGTc AGGTCCCCAC TACCTCTTGC CTACCGATAG GCTTCACCC AAGATCCGGA 930 CTTICTCTTTC AACCGTrAATA CCTGAGI'GTT CCTrGACTTT' TTCGATAACC GATTCCATCA AGTCCTCGTA GTC~mTGGCC GTTCCATCTG CGACATTcAT CATAAA!TCCT GCATGCrTT CTGACACTTC TACGCCACCG A'rACGATAGC cIrrcAAGcc AGctTCTAA ATTAAcrcGAc CTGCAAAATC CCCGACTGGA CGCrAAAGA CCCACCCACA Gc-rGAcrrc AcGTAGr.TGc GcrAAcGcG' ccATT-rcc~cG AGATGGGTAT TCCAAAGGTT CTTGATAACC TATGGGT- CTGGAGCTAG GGCAAA1TrrA ACTGACAAGA GACGGTAACC AAAAGCCAAG TCT 1'AGCAG AGACCTTACA AGACTGCAAG ATCTGAGCAA AGACAGCACC CCCAACGC'rr CCTCGAATA.C GACGGAGGGC AATGCGAGT'r G'rTTCAA'C.A AGCCATCAAC AGAAACGTrA TTGACCTTGT CATCACGAAC GA'rGATATTG CTTGCATTGC TGGCAAATI' CACAACGCGA GCCAACrCAA CCTCTCCACC TACNTTTGTA TAACI'A1AGC TTCCT'rCTAA GATI-rCAAGC ATTTrTCTC ATTCAT'rCCA TTATACCATT TTTAGAGACA 1-rTTGCArA.A GAAAAGAGG T-rCCCCCCTr TTCCATACCC GACCAGAACG AGCTCCAGTG TCAACCAGCC TACTTGGAAA AGAGATGCTG CTGCTAGGAG ATAAATCCAT TTCTTCTCGC GTAA'rTGGAA ACCAACCGCC AAGATTCGCT AAAGTGACTG AATGGTTI, AACATATTTT GAGTTTGAAC -TGCCGAAAGA ACAATGTAAA TCAACTCCAA GCCACCATGC CCC CAACTGCACC AGACTCCTGA ATAGCTGAAT ACAGGG=~C GATTTtCCA TCCTTGGTCA TCTCGCCACC ATAGGCACCC GCATTCATAA CACAAGCAAA CTCAAAGCCA GrrAAACTAT AGTrACCCCC AGC?'rCTGCT TCAATCGTAT CACACAAGAT GACAAA'rCCA CGAA'rCCCAC CAAGAACCAT CCAAGGGATA 'rTTTCTrGGT AACCAT=CG TGGAAAGACC AAATAATCAG TATGCAAGGG 
TrCCTA-AAA CGGATATCAA TTACAGACAT GTCACrCTTC CTrrACAAA TTTGACCACC ATAAAAATAC CTTGTI~rCGA TT'rATGAT-T TTTACAAAAG ATTTCCTTGG CTAGAATCAC TTICAACCAAG ACTGGATTTG CCAGATCAAA GAACGCATGC AAGCCATAGG GAACAGCTTG GTAAACCCAA ACTGTCAAAA CAAAACCAAG CAAATAAA'rC TGCCAGACCG CACACAGTAA TTccAPrAACC TGTGGATTCI' CA'rTGAGTAA AC-rAGTAAGG CCTAGAAAAA 1920 1980 2040 2100 2150 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 28 2940 3 0 3060 3083
INFORMATION FOR SEQ ID NO: 139:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15363 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:
CCGGAGGATA
TACTGCGACA
TAAAGGTTTG
TCTTATACTC
ACTCTTTTrGA
GCTGACGTGG
CCATATCATA
CATTTTGTCG
AAATCAACTT
CACGATTGAT
GAGACTCCTG
TTTCAATAAT
CACGGAGACG
CTTCATGAAG
AGGCAAGr'rT TTGACCACCA CCAAAAGCAG GGGGAAAATC GAAATCAACC AATAGTAGGC CTGGTCAACT CACTATCTGA TGCTTGATAA TAATGCAAAA AAGCTN'TAA TCTATCAGCT CTTTCCACCA C7"TT'rCATG TCATACTCCT TCACTTATAA AATGAAAATC AAAGAGCAAA CTAGAAAGCT AGCCGCAAGC 'rGCTCAAAAC GGTTGTAGAT AAGACTGACG AAGTCGATCA CATACATACG GTAAGGCGAC TT'rGAAGAGA TTTTCGAAGA GTATTAACTA ATTTCTTCTT ACCAATTCCA CGGTAGGGTA TTGGCAGCTT CCTTCAAGGA ATAGTTCTCT AAGTTATT'rA TAATTTCTTG GCATACTTAG TCGTAATCAA TCGTTTTTCT TCGTATTCGA GCGCTCCAGA TAATAGCCTC TCAGCATTTC ATCGATATTG TTGCGTTTGA AACCCGTTCG ACAAAGGCAC CACTGCTGAT AATAGCTGTT TCTCGAAGAC CATAAAACTA ATCAAAGAGC GTCTGT1AGAC TCCCTTCAGG TTTTCCAAAC CATCTCTGTA TTGGCAAGAT AGAGCTCTGC AATTTGGTCA TAATCAAGAG GCTTTGCTCC TTGTTCTTCC AGCTACGGAA GGTCTTTCCG AGAGTAAAAA GAGAAAACGT AAAATCCTCA AGGAAACAAG AAAATAA'rAG GTCAGTCTTG ACGATTIGATT CCTTGTTCTA TATTTTTCAG ATAACGTTGG TAAAC'rCGGT AACCACGATT GCTAATGTTC CCC'rCTTCA'r AGGCCTGTTC CAAACCATCA CTTTCAATAC TAAGAA'rCAA GAGTTTCAAA GCAGCCCAGT CTTCTTGATC ATCCTGGTTT TCTTGGCTTA AAATGAGATT TTCAATACGT CCATGATAAT TGTCAATAGC CGCATAGAGG GGAAGTTTAT TTCTGGTGTC TTCCAACTCT TTTTCCAACT CTAGCGTTAC TTCATTCAAA ATGGCGATAT
GCATAAGATA
CTGTTAAAAA
GTTC'rAGATT
AAAAGGTCAA
'rTCTATACGC GGGCGA'rAAG CTGCTATCA'r
TCACTGTATG
TCTTGAAGCG
GAACTTCTTC
A'rCCTTGCTT TCTTCCTrCTT CATCAGAAAG ATGAGGCAAG ACCAAGAGAC GCTAACAAGC GTCACACCTG CAACAAGGAA AAGCAAAAGA GGATACTCCT ACTTGGTATC'AAGAGAATCG TAGCAATCGA CACCGTTCCC TTAACACCTG GAGAAACATG TCCTTCATAT ACTTATTTAG CTTTTCTTG AGGCGTCGGG ATAATAGCCA TAGATCATAA TAAAACGAAT GACAAAAAGG ACAAAGGTAA AGATAGCAAT AAAAGTAGAG GATTATAGAT TGGATTGGTC AAGATAGGTT TTCCAACTCC ATCCCTAAAA TCACAAAGAC AGAACCGTTG AGCATAAAGG CCAGACCGTC TCGGTCACCG TATCCACTTG GGCTrrCGAGG AGCGTGATTT ACTTGCCTTT AAAATTCCAG CAACTACGAC GGCAATAATA CCTGAAACAT TGCCAGAAAG AAGGTCACflA GAGGCAAACT CAAT~TTAAT AAAAGTTCAC 932 TGGCAATATC CGTTGCGCGC ACACTTAGCA AGAAGGTATG CTGTTAAAAA 'rCCAATTAAA AAACCGCCTA GGATTGAAAA GCCCCAGAGA AAAAGCTCCA GTTGTCCAAG CTGTCAAAGC CACAAGCATC ATTCAAGAGT CCTTCGCCCT1 TAAGAATA'TT GAGGAAGCCG TTGGTCATGG GATGAGCGAA CTGCTAGC3"r TACCTGAAAA GCCACCAAAC GGACACGCGC TTAGGAAAGC TAAAACGCTC CCAAAGAGAG CAGCCAAGCA AGCTGCCAAG TCAGG4GTCGA GATAAAAATC AAATAGCCGT AACATCTGCT CCAAAAACAA CTCCGTATTA TTCCCAAAAG AATTTGCACC TTGAGACAAT CAAAACCAGT TCCTTAATCT TTTTTACAAC AATCT'rGGAA CAGTCTTTGT TAATTTTTTC TTGATTTTAA CGCTCGTrrCG TGGTGGGTTG TAAGTATTGT TTATCCATGT CG'N'GTTGAC TTGGTTTAAA GGATCACAAA ATCCAAGACA TCATAATAGC AATTCGTCCA GCAAAGGCCA CCAAGTCCGT AGGACCAACG GCTGCCCCAA GGAAGGCTGA ACCAAAGAAG ATCGGCCAAG CCACCCAAAC ACTGGAAATA TGAGATAAAC AATGATTCGC CAGTGTTT'rA TCTTCAGCCT CTCGGAAAAG CAAGGGTCCG ATAACCAGTG AGGTGAAAGT CAGTATTGGG TAAAAAGAGA CCAATCACAA AAAGGGAGAG GCAAAAAGGG CAGGAGCTrA TTGGTTGTAC AAAAATAGGA TGAGGTAAAT CAGTAA'rTCC ACGCACGTCC AGGATTCAAA TATCTCCTTC TGCTCTTTGA GCTCAATmT TCTCTGGCAC CGTTCCArT
TTTTTGGTC
CAAGAGCAAC
GCATTTTTTT .GCTCATATGC ATTCAACAAA ATTCTGGCGC CTGTATCTCT CTAATTTTTC GTTCGGTAAA TGGCAGCTGT GCATCTC--CT CGAGATCGCC CCTTTGACAA GTAAGCCACA TCGTTGTCTG CGGTCGGGTG ATGACAGACT TATCACCTGC AAATrCCCTGC CTTAGCTTTT ATCACAAACT GG'rCCAGTGT ACTGGGCAT'r TGTCA.AGTCA GCCTGATGCA AACGC'TCTTG TCTGAATATC AAAGGCATAG ACTTGCTTGG CTAGCTTGGC CATTTCCCAT AGTCGCATCC ACTACGACAT CCTCTTTTGT CATGTGCCA'r CTCAAGTGGT C'TTTTCAmT TCAAACTCCT -TGAACACTTC CACGACGTCG CATCTCCATC TCAATGCTrGT AGGCTCCACA TAGGACCAAC CAGCATATCC CTAGGCAT ACGATATGT-r TGGGAATAAT TTCCAGTTGG TCACAGATGA GCTTGGTCTA GCACGCCCAT ATGGCATCCA GCTTTTCGTG AATCATCACT AAAAACGGCG GTACTCTTGT TGGTTCAACT TCCT'rCATGA CCATAGTAAA TAGCTTTTCT AATGCCTCAA CGGCAAATAG CCCAGATTAA CTCATGGCCI' 'GCAAGATTA GGTCTTTTCC AAGGCTTGCT TAAAAAAAGC GTGTCATGAC CACGACCTCA GCCAAAAAAT GTTTTACAGC CTTGCATCCT TGAGGACTTC CCATTTATTG CTCCTGTAAT TCGATGGATG CCCTGACATA T'rCGTCCTGA 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 CTCATCAATT GTAAACGCCC CTCATGGTAA TCTCGTTGCA TACGAGTATT TGTCATAAGA TGGAGCAAAT GCAGTTTAAT CCCTTGAATA TCGTTATCCG TGACACAACG GCGGACATTT TCAACCATCA TCrCATGGGT TTCACCACGC AAACCATTGA TCAAA'rGGGA AACAATCTCA A'NTTTGG.AT ACTTTCTCAA ACGCTTGACC GTCCACCT ACAATTCATA AGAATGCGCA CGGTTrAA'rCA GGTCAGAGGT TGCTI'CATAA, GTAGTTTGCA AGCCCAATTC AACCGTCACA TGCATGCACT CCGATAACTC AGCCAAATAI' 'CGA'rGGTTT CGTCTGGTAA ACZAGTCTGGG CGCGI'CCAA TATTGATTCC TACCACACCT GGCTCAT'rGA TAGCCTGTTC ATAACGCTCT CGAATAACTT CC-ACCTTr'rC A'rGGGTGTTG GTAAAATTTT GAAAATAAAC CAGATACTTC CGAACATCCG GCCACTTGCG GTGCATAAAG TCAATTTCCT TATAAAATTG CTCACGGATA GGCGCATCCG GTGCCACAAT GGCATCTCCA GAACCAGAAA CCGTACAAkAA AGTACACCCC CCATGAGCCA CAGTCCCATC ACGATTGGGA CAATCAAATC CCGCATCAAT AGGGACTTTA AAAGTCTTTT CTCCAAAGAG TrTTTCGATAA TA.ATCATTCA AGGTATTATA AGATTTCATG ACTTTCATTA TAACAAAAAT CACCCACAAT CTCAAAAGCC TGACTTTCCT ATAAATTCCT *c CTGTTTCTCG T-rTCCATTAG CCTTTTrTTTA TGATACAATA TGGGTATGAT TTAGCATCTA ?r'rTATTATT GATACTGACC TTAGTCGTCT GCATTATCCT 
TTTAGATTA.A AAAAACTAGG ACGAAACTTT GCGGATTTGG c-rTTTCCAGT GACTAT'TACT TGATTACAGC TAAAACCTNT ACCCATALATT TCCTCCCTAG GCCCTCTCGA TCC'rAGCCAT TATTCTCGTC TTTTTCTTCC TTTTGAAAAA TACTACCCTA AATTTATCAA ATTCTTCTGG CGTGCAGGAT 'rTTATTAAC TATATAGAA.A TGATTGTTGA AT'rGT'rCTTA ATGAAATAGT CGAATCCCTA AGGGATTTTT GCrTTCTC'rA CAAAATAGTA TACACAATAA CACTATACAA AGAAAAGAGT CTGGGgCAAT AGTC1'CTTAT ATCCAAAAAG TTTTGGATC GTTACGATAG TCTTGGTAAA ATAGAATTC GCTATCCCAT GCATATTCAC TATAACACAA ATCAAACAAC CCTTCTTACC ACAACATCAT CTCGTTTTTA CTATTGAAAA AACGTCACTT CTACACCTCC TATCATGCCT TTGATCGCCC TGTATCTAC TCTTCTATTT CCTATTCAC AAGGGATTTT AATGGAAGAG TTAGTGACCT TAGATTGTT GTTTATTGAC TGCCAACAAG TATAGrrNTG TGTGGAAGAA AACGACAGAG AGAAC.AGATA CAGGTCTATT TTCAAGAAGA AATCACTCCC GTTTGATAAG AAACAAAAGA GAGGGTATAA AGAGTCAGCT
GCAACGGATT
CCAATAAACC
TTTACCACTA
AGTGGTGAAT
GTCTTATCAC
CTCTGGTCGA
AGAACTAAGA
AAATTCTCCG
CTTCTGATTA
AAAAACTTAG
TTAATGAAA
AACCAAACTT
CTTGGTATTT
ACTGGGGCTA
ACGCAGCTTT
CC N'ATCATG AGCATTT'rCT
TTTTATACAA
TGCCGTTGCT
ATTTAGAAAG
GAAATCAGTT
ACCTTGGAGG
CCTAAAATGC
AAAATTGAAA
TTGAAGCCAA
CCAAACT'rCA
AATATGCCAT
CGAATTGGCA
3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 934 CTATAATGAC AAGGAGGATA GCTACACACA TCCTGATGGC TGGTATTATC GTTTTCACCA 5340 TACCAAATAT CAGAAAACAC AGACAGACTT TCAACAAGAA ATCAAGGTr'T ACTACGCCGA 5400 CGAACCTGAA TCAGCCCCTC AAAAGGGACT GTATATGAAC GAACGCTATC AAAACTTGAA 5460 AGCTAAAGAA TCTCAGGCGC TT'rTATCTCC CCAAGGTACA CAGAT'CG CTCAACGCAA 5520 GAT'rGATGTG GAACCTGTCT 'rTGGGCAGAT AAAGGCTTCT 4V'GGGrrACA AGAGATGTAA 5580 TCTGAGAGGG AAGCCTCAAG TGAGAATTGA CATGGGAT'rG GTACTTATGG CCAATAACCT 5640 CCTAAAATAT AGTAAAATGA AATAAGAACA GGACAAATCG ATAAGGACAA TCAAATCGAT 5700 TTCTAACAAT GTTTTAGAAG TAAAAGTGTA CTAT'rCTAGT TTCAATCTAC TATACAATA-A 5760 GAGAATGACT CAAAATTAAA AAGCTACAGT TCCACAAT'rG GAAATAWCTA GCTTTTTTGT 5820 GGTTGAGAAC TATTTTGTCT CAGGCTCTTT ATCTTCTATT TAGGACAAGA GTN'TCTTT 5880 GGTCTTTAA'r GATAAAGAAG GTATCAAAAT TTCTAGTCTT CTTTTTTACC TTTAGTAAC'r 5940 ACTAATCCTG CACTCAAACC TAGAAGAGTT AAACCTGCTG CTACTGCTGC TTGGCTTGCC 6000 GCACTACCTG TACTTGGTAA CTGGGCTTTA TTAGTTTGAC TAGCTTCACT TGAATCAATT 6060 *GGTTTTGTAT CTGCTTTTTC TGACACTTGT GGTTTTTTAG CTTCTTGAGC TACTGGTTT~G 6120 *GTTCCAACCA AGACGATGCG GTCTGTCGGA AC~rCTACCA CTTCACGGAG TTTTTCTTCC 6180 ***TTACTTCCAT CAGGATTAAT CGCTGTAAAG ATACGTTCTT TTCCAACT'N' TCCT'rClTGT 6240 *TCTACACGAG TTTCACCTAG ATACAGTGTT GAATCTTTTT TCTCAACTGT CTTGTATGCC 6300 AAATCTTT'rT CAACAAATTC GATTTTTGGA AGATCTTCTT GTACAGCAGC AACTGTCTTC 6360 TCAGAAACTG GTTTTTCCTT AGTCAAGTGG ATACGGTATT CCTTCACTTG TT'rTCCACTT 6420 TCTGAAACCA GGCGAACAAG TACTCGAAAG CTATCTTCTC CACTATCTAC CACAGTTGAA 6480 GCTACTTGAT TGTTTTC'rTC AACTGAGACT TTTGGCCGTT GACCTTTATrA GGTAATT'rGA 6540 ***TAGTCTTGAC GATTTTCAGC GAAATCAGCA AGTTCTTTTC CATCTACAAG AATCTTTGAT 6600 ***TGAGTGCTTT CTTGAGGCAA 'rTCACTTGGT GCAAGGAAGG TCATCTCAAT CATCGCAACA 6660 CCGCTCTTAT CTGCTTTACG CTCCATACGC CATCTCATAG CTTTGGCTT'r GATAGCT'IrA 6720 AATGTTACGT TGATTTCATC ACCAGCTGCA ATGTCTTTAT CCGCACCATA AGGAACAGCT 6780 TCCCAATTT'r CTGGAT'rGTT GAATGGATGG 
TCTGCGTCGT AGGCTTGGTA GTTTGAATAG 6840 *TAGGTTGGCA CTTCAAACTC TGGACCGACA TAGCGTTCTA AAACGAGTTT AGATGGTGCA 6900 TCCGTACCAC TATCTGCAAA GAACTGAACT TTTCCTTGTG TAACAGTCCG TTCTACAATC 6960 TTACCATTTT CACGGAAAAT CACACCCGCT GATACTTCTG GAT'rAGAAGA TGGTGTTGGT 7020 GACCAGTTTG TCCAACGACG ATTTTCTGAA TGATCTCCGT CATTGAGATA GTCAACGCGG 7080 935 TCATGAGAGT TTTGTCAAT ATrCA'TCG~r CCKAGCAA TCATAGTTAG GGTTATCTGA AAGAGTCTCA CCAAGTTTrGT TCAGCAACAA GGTTACTACC AAGGACACGG CCTCGAACAG AGATTTTCCG CTGGAACTTC TTCCCATTCA ACTGTCAGGT TT'ACCTGTGA AGTAAACTCG AACCTTAGTC GGCAAT'rCA6A AGGCCTGGT'r CTGTCACTcC
TAAATTGACC
CTTTrGTrrTC GTGCT'rGACC AAGCGAGCTT GTTTAACCGC AGCA6ACTGGT AAGTGCAGAC GGTATTC'rCC TAAGATGTCG GGCTCACCTr CACGAACGCT TGGAACGACG GTGACTGCCG GAACTTTTCC ATCTACAGAC AAGTT'rGCTA
ACTTGTTTCC
ACCATGCGAA
CCAGCCTTGA
TTAAAGACAT
AGTCTTTGCC GTCAACr'rGG CAAAGATT'rG TACCTCTGTG TACGAACAGC ATAGGTTTCA GTTGAGCAGG GGCTTTTAGA GGTCCTCA'rT ACCAACAAAA TTATGAGAAA GTAAGCTCTT CCATTTTCAG CTTTCGCGAT GTAGCGAGAC CA'N'GTTGCT TCAAGGTAGT AGTCTGTCAA ATTC'rTGTTT GTCC1TrGCTT ATAGACGTTC CACGCTTGTT ACTTTATCAA AGCTAAAGTG TTAGTAACTG GTT'rCCAGT'r CTAGGGTT'rT TAGGAGCTGT
ACTGTTTTCA
TACAGTGATC
TGCTTTTGTC
GTAGCCGTCT
TACTTGTAGC
ATCCTT3AGTG
GACACGAACT
PAACACT'rGCT
ATCAGGGTTG
GGCTGCCGCA
ATCTGCTr'rA
GTTCATTTCT
GGCAGAATCA
TGGGACAGTC
TTACCAACAT AA'rACTCAAT AATCCGACAC TTAGATTATC ACACCGACTG AAGC'rTCTGG TTATTGTAGG AAA'rGAGCTT GAAGCAAAGG CAAGTGGCAA TCAGTTTGAG CAGATACGCG CCATTAACTG TAAAGACACC ACCTTAGCTG ATGAAACGTG GGTGCTTCTG CGATTGGAGT GAGAC'TTTTG CACGCGCTTC TGATAGGAGT CTAGTTC AAACCTTTGT CATACTCAAC GTCACTACAT TTACAGGACG
CACATAAGAC
AACGGAGCGT
ATTAGTACGA
GTCAT1'AACA
TTCTGAACCG
AACATGAAGT
TTCCTTAGCG
ACCATTTGAA
TGTCACACTG
AAGGTCAATT
TTTCGGAATA
TGTTACTGTT
GATGGATT'GC
TTCGGTACAC CAACTCCATG GTCTTCATGG TTGCTCAAGA TACCTGAk-TC TCCAAACAGA TTCCAGTTTG TCCAACGATT GGCTGGTTGG TTTGAA-ACTG GGTCGCTTGG ATT'rGAGTCT G~TCCATTGGT CAGAAATGTT TGCACC'rTGC TTAGTTGTTA ATTGCGTACC TTCTAAGCGA TATTGCTCTG GACGAATCGC ATCCCATGCA TCATATGTCC GAACACT'IrC TGGTAATTGT ACTTCTTCAA CTGAAACGAT ACCTTCTACA CCTTCAACTT TACCTAGTAC TTCAAATGTT GCTTGCCA6AG TGACTNTrATG AGTTTTAGGG GCTGGALAGAC TTGGT'rCCTG ATGCAAATCT GCAATCT'rCT TCTCAGTATT GGCrTTGATA TATTCAGCGT TCAGAGTGAC TGCTCCTGGC 7 14 0 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 GTGAGTTCAA CTTGGTCTTT ACCTCCCTCA TTATGCA6ACT CAAGCATTCC TTTACGAATT GCGACTTCCC GCGCTTCCCTTCACCACT TGTAGAGAAG 936 GT'rACTTTAT CAGCTGGTAA TACAGCTTGC GTTCCATCTT GATAGTGAGC TCGAACCGAC AATTTGACAG 'TTTGGTCrC TTTGAGACTG TCAGCTTTTT CCACTTGCAA GC'rCAAGTGA GCAATTTTTG GCGCrCTrC AAGGAAI'GA ATTGCATAGG TT-TGAAGAGG GCCACCATCT 8880 8940 9000 TTAGGCTGAA TAAAGATGCT ACAGCTGCAT =rTAGCACT CGGTATTGCA TTGGTTr'rTG G="GGTACTG CAGG'rGCCGC TGACCTTCTA AATAACCGGT TCTTCTTCGG CAATCTCCCA ACATAGGAAA CAGATTT'GTC ACTGGTAGCT CTGATTTAAG GCCATACCTr TCACCGTTAC GGACCTTCTG CACGGCTACC GCCTCTCCAA TAATGCTCTG CCTTCTT'rCT TACCAGTAAA GTCAGGGTGA ATTTCCCTGC AA'rGCTTTAC GAATCCAAGA TCTCCGTTAT CTACACCGAC TTAGCAGTTG GAACCACATT TCTTTTCCAT CTGCTGCAAT CCAGCAGTCG TAATCTTATC CGCACGCATG CCGTTTCCTG CGCTTCTTG AAGAACTGTA TGCTGTGACT TCTGGCAACT TAGCTCCATA AGCAAGAGTG ACTAGTAAGA CCTGTTACTG CCTCACCACC AACCGTTACA AGGATTGCCT TCTTCTACCA CAACGGTTGC ATGAATTGGT CGCT'rGAATA CGAGAACCTG GAAT'rGCTAA CT'rAGCTTTA CTTGTCCACT TCATACTCT'r CAACACTTCC ATCAATCAAA TACAGAATTC AAGTCAGTAT TTGGAGCAAT ACGTTTCACA AGCAA'rCACT TCTACACGAG CTTCTACTTC TCGTCCGTCA AATACCAGGC TTGCTCACAT CTACTGAAGA CCAGGTTACA a. a.
ATCACTGTAT ACAAACGGAA TACT'rTTGGC AC'rTCTGTCC GACAGTGACT TGGTTCGATT TTGTTCAGTT GATTT'GACAA ACCATCTGCT TCGCCTTATr CAGTTGACCT TGGCCATGCA CCCCTGGCTG TCAACAATTT CGCATGGTCT TCCTTAATAA TCGAGCAATT TCCTTGCCAG GGCAACTTTC CATTCAAGAT ATCGCTGGTT TGTT'TTTTAT TTCTACACTA GAAGCATTCG
TCCAAGGTAC
CCTTCTTGGT
GATTTTCCAT
TTACCTTCTG
AGAAGGTGTA
CTGGTTGATA
AAGTCCGCCC
TCAAGAACAA
AGTCAGCTAC
CCATCGGTTT
CAGTGGTAGG CATTTCAGGT CCAAAACAGT CTTCTCTTGT TCAAGAGA'rC AGAGTGGGCA TGGCAACACC TTTACCATTA AGCGTTCACG GCTGGCTTGT ATTGG.AAGCG AACCAGATTA CATAGTAGAT GTAAGTCAAG GACGAACTGC CGCTGGCTTA ATTCATCACG AGCAATTGCT AAAGTTCATT AGCATTTGCA TGAAAGTCr'r AAGACCAAGA AATAAGCACG AACTGGAATC T'rTCCCAGTT CCAGTGAGGA GGCTT'rGGTA GAGATAGAAG AAGAGC'rCTT AACAGGAGTT 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 'rrTGGATGCT
CTTCTTAACA
AATTCTTTGT
GAAACCCATT
A'rACCAAAGT TCATGTTTTG GAATGCCGGC TGTATCTACG TGATTTTGGT TGTGCCATGG TGTAGGTTCA TGTCCAGCAT AGCCAGCGTT GTCACGGTCA CAACCCACAC GATCATT'rCC ATAATCTGAC CCAATATAGT CCGTACCTGT AAAGTCCATG AAGCGTGC TGTTCATAAT TACGCTCAGG
CCAGATAAAC
TGTTTTCCCC
TCCATTGCTA
937 TGTTTCAATT CACGT'rCAGG GCGATAGTAA CTTCCACGTG TACGGGTAGC TGAAGATGTT TCTGATCCAT AAATCAACCA TTT'TGGATG3C TTAGCTCTAA GGGCTTTGTA ATTATCTTCA GAATAGTTAA ATCCAACAGC ATCGAGTTCA TCAGCAATTT TCTCATGCCC TCCGCTACCA TTACCGAAAC GCAATTTATC TGCTCCCATG GTAACATAGC GAGCTTATC AACATCCTTG ATAACCTTAA CCAAACGTTT AACAGTTGCT AAAGAG'rGGG CATCACCATT AGCTTCACCT ATTTCATTAC CAATTGACCA CATGAAGATA GCAGGGTTGT 'TTrGCCTCT TTCGACCATG GTACGTAGGT CAAAATCAGA CCATTTTTCA CCTTTTCGAG CTTCTGGGTG AGTGGCATCT TTTTCAAAGA AACGTCCATA GTCATAAGGT TTCTrGCCAC CATACCACGT ATCAAAGGCC TCTTCCTGAA CGAGTAAACC TAGTTCTGCT GCGATTTGCA AGGTTTGCTC ACTAGCAGGG TTGTGGGTTG TACGGATGGA GTTAACTCCC ATCTCCTTCA TT~TG'N-rGAG ACGGCGATAT TCTGCTAT AGTTTTCTTC TGCTCCAAGC GCCCCATGGT TGGAATTTAA 'rACGTTCACC ATTCAAAGAG AAACCTTCAT CGGTAACCAA ACAAATCCTT CTTAGCATCA ACCAATTGAC ATCAATTCGT ACAAGGCAGG TTTGTCAT'rT AAAACAGTCC TCTAAAATCG CATCTAGGCT TGTTGATTCA TGTGCTTTTA ACTAAGCCTG T'rACAGCATG ACCACCTCGT TCAACGATTT TGGTCTT'rGT CGTCCGTA'IT GACGAT'TTTG CTGGTCACAT TGTTGTTCTT CAAGTTTTGG TGTTAAAATA GTTrGTCCCAT TCTGTCACTT GTAAAGTCAC ATCACGATAG ATACCAC'TC GCCTGTTTGT TGAC'TGCATG GACAGCAATC ACATTCTCC TTGGTGATAT CATATGAGAA CTGGTTATAA CCAT'rTGGAT TTGACATAAA CTTGAGAATC CATGTAGACG CCATCAAAAG AGGTCTrTTTT CATCTAGTT'r GAAAGTCTTG CGATACCAAG CGTGGTGCAA GGATACTCCA TTGGAGTCCA GTGATAGTAA CGTCACCC'rA AACACGCGTA AGAGTTTTGG TCTTTCAACT AGGTACGACT CGCTGTACGA GATATTCGGC TACAAGTTCA GAGTTTCAAC CTTGCCATGT TT'rTCTCAAC ATGCACCTTA CTGAATACCA ACGGCTACTT GACCATCT'rT TTGAAGGTAT AATGCCCCAC TA.ACTGACCA 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 CCACCTTCAT TTTGTGCAGG GGTAAATCTA ATTTTTTCCA TTGCATTGA GT'rTAAAGTA TGATTCACTT CTTCATTTGT GAAGCTTGTG ATTCTATCCT TTTGGAGT'rA CGGCTTCATC AGATTCATGA TCGAAATCGT CGTAGATACG TCTGCATCAG CCAATTTTGA TTAAAATCCA TACAGCTTTA GCATCTTCC'r TGGAGCTTT'r TCTTCCGGTT TTCTTTCTTC TCAGATGCAA
TAAGGCGAAC
CTTCCCCACC
TAAAGATACT
GTTTAATGGC
CTTCCTGTC
TGAGCGGTTT
TAGCAGACAC
TAGCCTCAGT
ATTTTTCTTG
GTTGAGCTGT
CCAGTCATAC
T-rCCTTAGAA
TTCAATCATT
TTCTTGATTT
TTTTTCCTCT
TGAACTAGGT
938 TCACTTTGTT C'TGTCCTTTC AACTATAmT TTAGTTTCCA AAGCTTTATC TC TACTATCA TT'TTTCCTC TTTAGGTTTC TCAGCAGTAT GAGTAATAAG GCATAAAC'rA CAGATTCTCC AGCTATATTT CCTCCTAATA AAACTGCACA AT'rACTGAGC AAGCTCCCAC TGGCCTTTCC CCATAAAACC CCCAAGTGT ATACATGGTA AATTCAAACT TCGTCAG'rGT AGCAAACT'rA CGAATGCTAT AAACTCTTT CTCCTTATAT TATATT'rAGT GCAGTETAGCT TGACAACCTA GTTTCAACAA TTTACACTCT CGCCTTGCCG 'rAGATATGAT TACTGACTTC TTTTGAGCTG ACTTCGTCAG ?r'rCATCTAC
ACCTTCT
TGTr'rCATCC
AGTCCCAATC
CCGATTCCAA
ACTACCAAAG
GCGAAAATCC
GTCAGTTTCA
AACCTCAAAA
TTTTGAGCTG
TCTACAACCT
CCATGTTTTG
ACTTCGTCAG
CTAGTTTGCT
TGCTGCGTAC
ATAGAATACA
ACCAATTGTC
CAAAACCATG
AGCTGACTTC
TCTTATCTAC
GTCAGTTTCA
AACCTCAAAA
TCTACAACCT CAAAACCATG CTTTGATTTT CATT'GAGTTr TCTTCGTTAC GTTTGATCAT C'rGTGTTT'TG AGCAACCTGC GGCTAGCTTC A'rATT'rTATA GGAGCGCATT AT'TTTGCTTT TTGTrTTCTG TACCAAGCAA AGATACCGAT GAT'rGCTTTG ATATCACCAG TTGTAGTGTT CTGCACCAAG AGGAAGACTA
CAACCALAGAA
TTGAGTTTGG CTCACACCT TGGTGCAATA AGTGTACCTG CATACGGAGC AATTTACCAC GATACCTGCA ACTGGCAAGA GrTrTTCGAT TGGTCCTTCA CTGGGAAGGC ACCTACACCT AAAGAAGGAA GAGTGGCAAC GAGT'rACAAC CAAGAGACCT TACCATT'rCC AACTTTTGAA CATGATTGGT GCAAGTACGT TGGCACAAGC CCAGATTTCA CCAGTCAAGA CCGATATTGA ATTTACGTCC TTGAAGACGT ACCTTGTGAT AGTGGTTCTA CGGCTGCGAT GAACCATGAA CAAAGATACA CCGGCAGTCA AACCAAGAGA CAACCATCCT ATCTGCATCT GCAACACCTG CAATTGGATG TGGAGTTCCC AAGGATGAAA CCGATGAAGA ATT1TAGATCC CCAGAAACCG AGCATCAAAG TCATATTTAT CAAGGCCTGG GAAGAATTTT CATGATAACT GGGTTCA'rCA TGTAGTTCAT GTGAGTTGA'r GGCGTTAAGA AGGTCATCAA A'rGTACGT'rT CATCAAGTCA ACCGACAAGG ACGATAGCTG CTGTAGCAAT AAAGAGTGAA GTTATCAGCA TACCATTTAA TCAAGAGACC TGTGATAGAC ATCGACATCA AGTGTATCTG rr'rTCTTCAT AGCTAGCATC AGAGTAGAGT GAGTAATCAA TTAGCAAGTT CTGTTGCAAA AAGAGTGTTC CGAAGATAAT GGAGTAACAC CCATAGCGAT AGAAGCACTG CTTCAATCAA OCACGACCAG CGATGAATGG TTAGTAGCAA CGTTTGTAAT CCGATAAGTG AGAAGAGTTC T-rGATAACAA GACGCCA'TTT ATAATACCGA TAACGATACC ArTTCTTGT TCAAT'rTAC TCAAAAATCT TATCCAAAAC GTCATTGGTG ATGAACTTGG GAGTTGATAA TTTTCAACAC ACCCCTTGAC TCACACCATT AAGTGCCAGA TATCAAAGAT ACTATGTTGA CAATCAACAT 12420 12480 12540 12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 939 GATGAGCAAG AAGTATAGTG TCCAAGCAGA ACCCCAAGTG.
ACCAACGTCG GTAATACTCA ATrGGATACC AGTGTrTTCA TGAGAAAGCA GTGTTTAGCA TACCGATGAT AGCACCGATA TTTGATACCA CCTrCAAGCG CTTTGGAGAA 'NTCACTCCA CAAAATGAT'r AACATGATGA CAGGTCCACC CA'PTTCTAAG GATTAGGTCA AAGAT1TGCAT CCATAACAGT TCCrCCCTT'r ACA.AATTAGA ATTAGCTTAA TCCGTGTTCT TTAATAGCTG GGAGCGCTCA TTGCTGGGAT ACGGAATAAG ATTGGCCCAG GGTTCAAAAC CAAGGTCTGT TGCAGCGATT GGTGTAAAGA TCTTCGTTrTA CATCTTTCAC CATGACTGCA TCACAGTGAA AGTTCTTCTT CTAGAGCACT TTTAATTTGG TGACTTGAGT GCAAGAATTT TAATCATTTA GATTTCCTCC GATTTTATTT GGTTGCTTCA GCAATGTAAG TATAAAGGGC TTCTGGTTCA AAGATGACCA TTCCTGTGA AGAAGTCCAT TAACTGAGCA ATTGTAGCAA GTGGTGCCCA ACGAATNTTG CTAGTGATGC CCTGTAAGAG CGATGGCAAG AAAAGTAAAG CCAATACrGT ATGGGATTGA AAACCTTTCC TTGATGTTAT ATGAATGTTA
CTTCAATATT-
C'rTCGATAAC
GTCAAATACT
TGGGATACCT
TATCGTAACC TTITCATAAGG CATCATAACC ACGGTTTGAA TAACACCTGC ACCGCAGGCA T'IrAATAGAC AAGAT'rAAGC GAAATTTTrG ATAGGTCTTC AGAATGTTCG TrTGACTrGA ACTTGAATTA TTAATGATAA AGAAGAGTAG GGATACTTCT ACTTCCTTAT CAGGAGCTAT CATATTGTGA AAAG'N'ATTG GTTTTTCTAA. TCGAACAACC ACCACTTTCT CAGCTAGATT ATGAACAATA TCTGTGTGAG GAATCGCTAC ATTTGGCAAG TCCTTTCCTA GAAATTCCAT ATCTAAACCA GTTGGAAATG ACTTTTCACG CGTGATCAAG GCTTCACGAT AAGTTGGAGT GACAATTTCT CGT'rCTTCCA ATAAAGTTGC AACCTGATCA AAGAGTTGT'r CTTGACTATC CGCTTCTAAG CAAAACACAA GGTTTTrGTC AAAGAAATA.A TCTA.ATACCA TAAGTTrTTC
CGG
INFORMATION FOR SEQ ID NO: 140:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 28882 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:
TAAGACTATT TAATAGTGGA GTGAAATAGG ATACGAACAA ATTGATTAGG AAAATCAAAT GAATTTATAG AAATCTTTTA. GCAGTTATGT TATCCTATTC TAGTTTCAAA ACGCTATAGA 14220 14280 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15363 120 AGCAGCAT'rG T1GCTAGTCkA AAAACCCACC CTCACAGGCA CTGAACCGAA TACGTCGATA GATTC-AGTTT ACTATACTAA AACGAGTAGC TTGAAATCA-A GGTT'T'ATCT GTATTATTCA GCTAGA'rTAT GCTTTACCTT CGTTCTTCAA CCGATGCTTG GATAGCTTTT ACACCGTCAG CCAAGAATTT ACGTGGGTCG AAGAGTTT TCTTGTCGTA AGTCACCAGC AAATTTACGA TAACTTTGGC AACACCAAGT CACCGTGCAA TACGATTGGG GGTCAAGACC TTCCCAGTTT GTTGCGTTAG CGAATGCGAT TTGA'rAGCTG CTTGGATTT'G AATCCTGGAA GAGCT'rCTGT ACTGGGTAAG GACCGTGGAT AGAAGTCGAT ACCAGTTTCA ACCATTGCTT TAGCGTCTTC TACCGA'rGAT TCCATCTTCT TCACCACCGA TAGTACCAAC CTTTAGCGTG TGCTTTTTCA ACAACTTCTT TAGCCAAT GGTGTGAACC GTCAAACATG ATTGAAGTAT AACCAACTTC CGTAGTGACC GTGGTCAAGG TGGATAGCTA CTGGTACAGT TTCTGCrrCG TI'TGCTTCGT T'rGGCATTCT GTGTTAACGT CTCA'rCAGGA ATACCTGATC CAATTTTTGC AAGTGGTCAA GTTACCGATA CCAGCTGCCA GATTGGAGCC AATTCACCTT TTCAGCTTCT ACTGAGATAC AAGGTTTTCT TCAACTGGAA CATACACTCA AGTGCATCTT GATACCCATT GATTCAACAA GTATTTAGCA GCACCCATTG GCGCAAGATA GCT'rGAGTCC GTTGTCACGG GCTGCTTGGA CCTGTATA-T T'rTATGGGTC
C
GGTTAGCGAT CAAGTTGCGA GCAAC'rTTGT AAGTTTGGAT CAAAACTGGA GCTTTTr'rAG ACTCAAGGTT GTT'rGTGTTA AATCCACCAA CAAATTTT!TC TGCTGAAACG ATTGCCATTT ATCCCATTTA CATTGTTCAT TTTATCACTT TTTCGATTGA TTTTCTTCTA AC'rCCATCTA GACTTTTGGA AAATCTATAA AGAAGGTTAA TAATTTTTCA TGT'rClAATA GACTCTTAAC CTTGCGACTG GCATTGTCAG AAAAAGACTG GCTATTTTCG ATAAAAAGTT CTCCTTCTCT
AACCACCCAT
CTTCrGCTGC
CTGCATAACC
TATCAGGCCT
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 TTTGCCAAAA AAATCTAGTT TTTCCCGCAG TGTAALACCCT TTCTCTCCCT AGTCTTGGAC ACTATTCTCC TCCATCTCGA AACGATAAC CACAAAGAGC CCCATACCAG ACCCCTTGAC GGCTAGTTTT TCTTGTTCCT CTGAGCTACA TTCTCCAATr CGAACTAAGC CACCTGGAAC 999* .9 9 9 AGAGTGCTTA ATGGCATTGC TGATGAGAT'r AGAAAGAATC AACTTCATAA CTGATGGGTT TAGATAAGCC 'rGCTGATGGG TCAAACTATT GTCTATCTGG AGCTCTCTTT CCTTGGCTAG CAAGGCATAA TCTTTGACCA GATTTTGCGT CATCTGGAGG AGGTCAATTG TTTCCCTATC ATCTCGCAAT TCCTGCACAG AAGkAAGGA AAGTATCTGC AGAACATGGT GATTGAGTTC ATCCACAATC CCCAAGGCAA CTCCCAGATA CTGGTCTCTA TCCT'rATAAC GACCGATATT CTCTCTCATA TTTTCGATTA GGATTTTCAA ACTAGCCAGC GGTGTTTTCA ATTCATGAGA AGCTCCTCGT AGGAATTCGA CCTTCATCI' CTCCAGCTGG AGAATGGCTT CATTCTTTTC
ATGCAAGTCC
ATTACCTATC
CCGACGGGTC
GGCCACCAAA
GCAATAACAG
TCATCCTTAG
ACCCGCTTGA
AGGGAAATCA
941 TCAAGAGATG CTGGTAGAGG CTATTGATT'r GTTCCTTGAG AATCCACGCG CAATCGCACT TGGGAATCCA GGTCCATCAT TTTCCAAAAT CGGTGCAACA ATACTCCGAG CGTAGATGTA GAAAGGAGGC CAGCAAGGTA TAGGGAAGAA ACTGGAGACT GTAAATCCAT GGAAGC'rAGA AACTGGAGAA TCATAGTACC- GATTTGCTCC GCTTCCTTTT ACCGTCTTGC GTTTTCACCT CGCGCTCCTC AATAA.AGAGA GAGGTTGTCT GGCCGTCTGT GTCCAGAGGA AGACTGTCCT TGACTTCTAA CTTGTCCTCG GTCATCTCAC CTTTGACGGT CCCCTTGATA TCAC1'AGTCT GGGAATACAA GTCTAACACT TGCTCGATAC TC'rGCCTATC TTTCCCTTCT AGGGACTGGG CAATGGCTGT TGCCTTTTGA CCAATGGTTT CCTGACGATG ACTCAGATAA GTCGAAGGAA AAAGAAAATA AATAGCTAAA TGAAGGCAGA TAACCAGAAC ACTAAATATC GAGAAGGTAT AGATAAATAT CI"ITGCAAAT AAACCTGTTC GTTTCATTTT CGCTCCAATIT TATAACCAAC ATTGCGCACA GTGAGGATAC CGCAATTCCT TGATATAAAC ATCAATAACA CGGTCAAAGG CAGACGGCAT CGATAATCTG AGATCGAGTC AAGGCCCGGC TCCAGAATTT CCAACTCTTr GGCATTGATA GGCACTTCTr TAGCTTTCAA AGTCCACCTT GGTATCCTTG TAAGAAAAGA CGCTTGAAAA TCGCGTCCAC CCTCACTTTT AAAAGGGAGA TAGCCATCTG CCAAAGAGGC AAAGGCACTC ATCTTGTATT AACATCAAGA CAGGAACCTG ACTGGrT'rA CGAATCTCAG AGCTTGGGCA TCTGGATATC CAGTAAAACC AGGGCCACCT AGAGCTTCCT GACCGTCCGC TGCCTCAATA GTTTCATAGC CTGACCCCCT CACGGA'rCAT CTCTTCATCT TCTACAATTA CTCTCTATTT TTT'ATTTTTC TTAGAATAAA TACCTACCCT GCTGGCCTT'r TGTCTGCAAG CAACTGACCA CTAGATAAAA ATAAAT'rCCA TAACTTTAGT ATATTATAT'r TAAGCACTAA AAGCAATGAT TT'rCACCACT GCTTTCGGAT TTATTTrGAA TCCACTATTC TTGAATAGAA ACACAAGATG CAATCTTTAT TTrTATTCACC ATCCAGCAAG AGCTCTTT'rG GTTGTTTTCT AATCCAAGTC TAGCTr'rTTC CAACCTCATC TGTCGCTTTC CTTCATTT'rT CACTAGATAG GACCTGCGAG GCTTGCACTG T'TCGTCCTGT ATCGTAGTAG GGGAGAAAGG TTTT'rCCAGA CCTCATCTTG AAAAGCTGTC CTAGGACTTC TAAGCCGTTG CATAGCTAGA AAATTGCTCC CACAATCCGT CAAATAATCA AAATTTTCAT ACTTTAACTa 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660
ATTTTCTATT
CGTTGTGAAA
AGTACAAAGA
TTGTTAAATA
TCTAGACTCA
AAGGAGATTG
ATAGTCTCTT
TTCCT'TTCTC
AAGCAACTGA
GCCATTCCTA
TTTTTTCAAA
CTTGAAGCAA
GCGCCATAAC GAGAACCACT AGAACCAAGG CAAGGACAAA AATGATGATA AAGTCTGATG 942 TCTGAATGGA AATGTCTAGG CTCGACAAGG TCTTGCTAAA GCCATCTACT TCTGCACCAC CACCAAGGTT AGAGGCTTGA GCCGCCrTAC TAGCCTGT?3' GGCAACACCI' GAAGTCACAT TGGCAAGGAC AGTGTTTCCA ATTGCACGGG CAGTGTAATT AGCTAGGAAG TAAGCAGAAA CTAGAGCAGG GATAGCAATC AAGA'rAGATT CGGTGAruA ?rGACCCAAG ATACTTGCCT GCTTGAGGCC GATAGAGAGG AGAAT'rCCCA CTTCCTTGCG ACGGGCGTTG ATCCAAAGC TGAGCAAGAG GGCAAGGAGG AGAACTGAGA AGCTCAAGCT ACCCCAGAAG AGGAGGrTGG CCATCTTGTA CATACCAGAG ATAGATTGCT CAAGAGCTGG GTAGTTAGAG GAGCTCTTGA CGAGTGTGTA GCTCTTCCAG TTGATACCAC TGATGCCATT CAACTCTTTC ATAACATCAT CCAAGTTCT'r GTC'TGCTGTT ACAAAGAAGG TTGCGTCCCC ATAAATGGCT GTGTCTTCTG TGTATCCATA AAGTTTTGCA GCAGTGTGAA TGTCTGTAAT AGCTGTGTTT TCGTAA.AGTT C'rTGTGAGTA GGTTACTGCT GACTTATTAT GACCATCAAA GAGTCCCTTG ATTGTCACT CAACTGTTTC CTTGGCTCCT TTTTCATTAT CTGCATCGTA GATATTAGAG TCCAGT'rTAA CCTTGTCCCC TACTTTCCAG CCGTGTTTGG CTGCCAAGTC CTTGTGCAAG AGGATTTTAT CCTTGTCGTC GTTGGTTAAG TGCTCTCCTT CGACTAGTTT ATA-AGAACCA GAGACAAACT TGTCTTCTTT AGAGGAGTCA TTGACACCTG TAATCATCAA GCTACTTCCA AAACGCTTGG CACGATCAGC AGTGAGATTC TTCTTGGTTT CTGGCGTTTC AATCAGGTCA TATCCAGTCA AATCTCCGAT AGCGTTGATA CGTTTGACAT AAGACTCAAT GGCCTTGTrT TCGGTGATTT 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 TTTTGATGTC TTCACCCTTG GATTGATTTG CATGGAGAAG CAGTAGCTCC CTTGATTGAC GGAAGATGAC AATCGATTTG TAGATTCCCT TTCTAGATTT TAGTATTGCG CGTTTCAGTC GTGCCACTTC TTTACTGTGA ATTTGAGTAG 'rTCGACAATA CAGCTAGAAT AACTGGAGCT GACCACCTGA TAACTGGAGA GAAGTGTATT CTTGCTTGCC AATCTATCAA GT'rATAATT AGCCCTTCTT ACGAATATCC ATATTCCCAG CACCACGAGG CTATTGGTGA TATTTTAAA AAGCCGACCA AACTCAAGCT AAAAAC'rTCC TTGTAACATA TGTTTTAATC ATTCTATTAA AGTTTCTTrAT CCTTTAATTC GTTACGACAA TCACACATTT TCTCCAGCAG TTTTAGGATC TCTGAGACCA AACTGCGAGC ACATTCCGCT TGATCTGGCT TTTTTGTTIGA CCAATCGGAT CGTTCCTTGG TTGACGCGAC GGTCTCCTGA GAAGCCTTGG CGCCATGAGG AGAATAATCA GGCAAATGCG TTGTGTAACA AATAAGCTCA AATTATTTAC 
AAGTGTAATA TCTCACGCTr ACCTGTTTTC TGGGCAAGTG CAGATTTCCT GTTGGCTCAT AATGGCAACA CGTTGCTGTT TTCATCCAAA CCAAGCTCAA ATTTTCCAGC GGAGAAAGAT TGAAAGACCA GGGAAATATG GTGCATGCGA TGGTAAGAAT TCTCCTTGAA AAAGGATAGA ACCTTCAACA GGACTATCTA 943 GACCAGCAAG..TAGGGACAAG AGTGTGGATT TTCCTGCTCC TGACTCCCCA AAAATTTCC GGGT'rCAAAA TTATAATTGA TCTGATATAG GACTGcTTCA TATAACGGTA GGTAACATCT TCTAATTGTA ATAAAGTCAT GATrTCTCCT TAGATGATAA AATTTCTTTC GGTGATTTTC TAAATAAGAA TAGGAAACAA ATAAGCAACT AAGCAGAACT AGAAAAACAT AGGATTCTGC AAAAGATAAG
ATAAACTGCT
GAGTTTGGAG
CAAGAGATAC
CAAGTGCAAG
TGCTTTGGCT AGTGTATCTT GTAAGCTTGC TAGGTAAGTT GTGATTGCGT TTCCTGCAAC CAAAACTACC TCTAAACAGA ATTGTAGGAA TAAAATCCCC ACT'rCATAGA CCCGTTCTCT
ATAATACTGT
GCAGTAT'rCI
TCTTAACTAA
AGGGCTACAG
ATGCTAGTTG
CTTGCTAGTA
AGCAAAGCTC
TTDGCCTTTC
GACAAALACCA
AGGAAGGTTT
CTGATCTCCA
AAATGCTrGGA
GATCGAGCTC
CAACCAGAGA
GAAGATGGTC
0000* 0 GAATTAAGGC TCCAGCTCCT GCTATCAACA TCCCATAAAG GGAAAGTTGC AACTGCTCT TTGAT'rrGTT CAAAAGCCTT AGCCTTGATr TrrCCAAGGCC AAGTTTTCTA CCTGCTTCAT GATT'rTCTAC ATAGAAGCGT GCTGCACTGA CTTGAGCTTC GGCTACTTrTC ATAGTCTGTA AAGACTTGAT TTTCACTGAA ATTTCTCTTG TTTTTTACCA GAAAAGATGC CGATAATCTC TTCCAGATTC AGACTGACCA GCATCCAAGC CAATCTC TCTTAGCCAA TTCTTCGTGG ATAAGGATTT TCTTGGAATC CTTCTTTTAG ATTGAAAGCC GAACTGGTAA AGGTTACATC GTTTTCCTT-T TCGACTTGGT GAGTCCGTCC AT'rTCCTTAG ACTATTGCCC AAAAGGGTTT GTCAGAAGAC AAGCCTGTGA AAACTCTACT GTTTGTCCTT ATGAAGCGAA AGACCGTTCT CCCTTTTTGA AGGTGTCGCC CTTGGATGAA TCCTCA-AGAG 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 CCGTTAAGCT AACCAAGTTA TTGTCTGCAG CTGATAAATC ATCACGCTCC CGCCAGTCAC TGCTTCCTTG TCTTTTAGTT TTGCGACCGT CTCA.AGTTCA TTTCCAGCCC CTTAATCTTG CTTACAGATG CTAGGTCTGA TCTCTATCTr CTTAATAGAA AAAGATGTAT TGAGTGATTT TTTTGTTGGA CTTCATCAGA GTCAAACAGG CTGAAATTCC TCAGAAATAA AATAAAACTT CTCAGTCGCT TTCTGCTGAC TTGGATTCAT TTGTCACCTC CATATTTGTA AGACTATTAT TTATGAAATA CGAAAAAAAA ATATCGAGTA GGGGATAATC CATACGTGCC GTTCGGCATA CGGCGGTTCA ACTAACTTTT ATAATCCAAA CACGAAACCA GTCCACGTTT TTCAAGGACT AAGTACCGAC TTCTGAGCTA CTATAGTAGA TTGAAACTAG
CAACTTGAAT
ATAAAGATTG
GGCCAATAAG
ATAAGCCCAA
AAAACCCAAA
ACGCTCTGCT
GGAGAGACAT
GTCTGACCAT
CTTTCTAC"'G
ACCAATAAAA
GATCTTTGGA
TATGAAATAT
TCTAGCCCCT CTCACACCAC AACGCATGTC GTTCAAGGTA GGTTTTGATA TAGCACGTTT AATAGTACAC CTCTACTTCT 944
AAAATATTGT
TTACTATAAT
CCCTAACTTA
CACTCGTAGG
AATTCATTCC
ATTTTGTTTT
TAAGTGAGTG
AAGTTTACC
CAAAGG'rTGG
CATTTTTCT
TAGAAATCGA TTTGACTCTC CTGAACAATT CGTCCI'ATTC TTATTTCATT TGATAGTGGr CGCCCCAGCC AGATACCTTA TCTGCTATCC ATTAGGAAC AGCAA'rCCCC ATAATCGTCT CGATTTCTTC TTCCATTGCT TCCAGATAAT CGAGTACGCA AGCGCTCATC TATC'rAGTG ACTATACTTT TCATATTTAT 'rTTCGI'TCA CTCAAGGCAC AACACAGAAT GAAAAAGTGT TGTGATCTTT ATAATAATAG TGAGAAAACC 'rATCACTACT ACAAATCACG GGGAGG'rGAA GTACAGCCAC TACCTCGCAT ATTT'rGTCAC A'rCATr'rAAC GGTACATAA'r ATCTGAATAA GTTGCTACAA TATCATTTGC ATGCTCTCCT TCACCT'N'AG AGCTCCTGCT GGATGA'rTTT TATTTGCCTC TT'CAATTTT TCAATAATGG GTATCTT'rTA TATTATCAGG ATTTTTCACT AAGATTT'rGT CTGGATATGT CGGTTTAGCA GAAACAA'rTT TTACTGTTAC TCCAGCATTA T1CTTTAGCAT TTAATTTTAC AGGTTGATTA TCAACATTAT TCAACTTTAA ATCAATCGTT ATATATAATG ATGAATTGTT T'rG'GAGCTA CTGAAACAG AACTGAAATT GCTTACGTCT AAATGAACTT CCCCACTATT ATAACATAAA ATTTGCATAA ATAGATTAGG CCAGTCACAG TGTACTTTCC CAGCATCAAG AGATGGCAAT CCTCTT~CTCC AAATATTAAC AACCTAAAAT CAATTGATAA CACAAGGTCA TCAACTCTTC CCATrTATCA ATCTTGTATT GGGCTGTTTT AGTGAGAACA GCACTATTTT GATCTTGATC CAAAATCATC TGGTGTAGAC TTTCTCCCAT ACTCCAAAGA TAGAGCTGAG GCCACCATGT CTTGAACAAA TCCCGCTCAG CCGTCTGAGA CCAAACCTGA CCCAACATGA CCGCTTCAAT CGTATCACTT AAAATATCTC CGAAAACATA ATCCTGAGCT CCGACCTGTA GCATCATATC TAGCCAAGAA GCCAGATTT'r CCAACTGAAT CTTTTTAAAA ACTTGCGGTG TTCTTTTT'rA TTCGAAGCAC AGTAATTCCT GAACTAGGAA TTPCAAAAGA GCTGTTGCAT ATTATAAACA GTTCCTTCAT ATACCCACTA CCTCCCTGAT ATTTGGCTTA GCAACAACTG GAAATCAAAG CAGCTTCTAG CCACTATAAC TCTGCACATA TTCTTTACAA ACCAACTATA GGTCGGTCAA CTCTTTCAAC GGAGAGAATT GCGGTGCAGA CCCAAAGAGA GAGAATGATT TTGTCCAGT'r
CTTCAGTAGC
CAGACGTTTT
ATr'rAGCTGT
TATCTTCAAT
TTATAGTAAA
GAATGTTTTA
AAAATGGAGA
GTTGACAAAG
TGAAGCCCTG
TAGAGTTGCT
TCCTGAATCT
7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 ATTCCTTGAT TGGCTTCAAG TCCACGAGTC AAAAAGTATG AACACCTTGG TrGACCCTGAC CT'rrGATTAA GTCTGATAGG GCTTGA'rGTC TAGAAAGACG AAGTCCAAAG TCATACTrCAA TTACAGAAGT GTATTT'GTCT TGTTGAAGCA GCACTGTCTG ACAATTCGGA AAAAGAGTCC CC'rGCTGAAA ATAAGAALAGA TGGCAATAAA CCTGTCCCr'r GCCCTCAACC AGATAGGAAT ACCAAGCGTT TAGCGAACGA GCCTGCTCCT GCTGGGTCAA AAGGGCAACC AACTGCTITTT CACGCTCGCT GAGCCCAGCT TGAGATAGCC CTCTTTCTCT GTAGTTCTrT TGCCCTCATG TCTCAAGACG TTCCAGATTC TGGACATACG ATAAGTCACG GGCCACG'N-r GI'AGATTTCT CAATCAGGCG AGCCTGAGCC TGGGTTTGGT CA'rACCTTTG CCCCCTCAAA GGCTAGCGCA ACTCCATCCT TGGTATCAAG TCTGCTGTCG CAAAGTCACG ATCTCTGCAT CCAAAACTTC TCCTCCAGCA AAATCCACrG ACTCGTG4GT CTGAAATCCG TTCTAGCCCT CCACTTTTTG TCAGTCATAT GGAGATAGCC ACATCTGCAT TTTTAGCC?1' 'rCTTCTTTTT CCGTTAGGAC TTGGCTGACA CGTACTTAGT AGGATGAGGT GACGGCGAAT ATCCCGTTAA TGAGATTGAC GAGCTTAATT CCTTGAGTAA ATTGGCACGC GCCTCTTGGC CTCAACAAAG ACAATTCCAA
CTGAGAAGCT
AGCCTCAGGA
GATGCACCAT
CATAACCGCT
TGTGTGGCTA
CTGCTCCTCC
TGCTTCTTGA
ATACATAGAA
AAAGGGAGCG
AACCAGTCTT
GAAACCAAAC
TCAAATCCCG
TTGGTATTGC
AACATGAGAG
TGGAGTT'rAT
TACACCGCAT
0 00 0 0** 0 AGAGCTTGCT TGACACTTGC ATCATAGTTC CCTGAGTTGA ACAACTGTGA TACCGTTGGC AGCATTAAAA TCTTCATCCA TTAAAGTTTT GTAACTCTTG GGCATCCACA TTTCCTGTAA TTGAGATACT TGAGATTGGT CTCGGCATCG CGAACTGCCT TTACGGTAGT GCTGGGTCGC GCATCGTGTA CCGTAATGAA ACAAAGCCAT TGTGCATCCA GCAATTTCAT TGGTGTGGTG GTATCACCTA AAATCTCTGT GGTCCCCAAG GACTATCCCA TCTACAGGAT TTTCCTTACG AAATCTTCCA AGGTTTTATT TAGACATCCC CTTGACTCTC ATGATGTCTG CCATAAACTC GCCGTCACAT CCTCACGAAA CCTTCTTCCC TGGCACGGTT AAAGAAGAAA CGAAGTACTr GTTACCCAAG GACTTAGACA GTAGTTAGCA AAAGCdTTGC TGGAAACTCT AGGTCAGCTC CGACATGACT GAACACTCAA AGAAATCTCA CCTGGTTTGG AGCCGTTTCT TCATCGGTAC AGCCAATTTA GCATAGTTGT ATAGGCAAAG CCTTTCTCGA CACTACACGC GGATGGCGAG GGCAGCGATG TACTTATCCG GATAATCTTA TCATCCACAT ATCAATCACG TGTCCACC'rC CCAATTGGTC ACGGATTTGG GTTTTTGAAT CAAGTCTTCA AAATTTCTAA CATATCTGCA TCCATTPCGC CATTTCAAAG, TAGCTGCTAC AAACI-rATCT ATGGTTGTTC GTAAGTATTC TTTCCGTGAA GTTGATAGGC GCCCATCAAG AGTTTTAAGG TTTTGACATT GTCCATATTG CTGTTTTAGC TTCAGACTGG CACCACCGTG GATATCAATG TATGCCAACC CGGACGTCCA AAGATTTCCA TACAGCAAAG GACCTGAAGC ACCTAGCTCC GGGATT'rC TACACGGAAA TCAAGTCITrC CACAAAACGG TCGCAGGTTT CACGCCCAAT CAACCTCCTG AGGCGTGATA CTGTAAAATT GGAAATATAG 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 946 GCAACCTTAT ACCCACGGTA CTCAAAATAG CGACGAATCG TATCAAAAGC CGGGCGTTTC CTACGTGGAT ATAGTTGTAC ACCGTTGGCC CACAAACATA TTGCCGTCCT CAATCGGGAC AAAT'rCTCGC AAATCACGAG ACA'rGGTGTC ATCATAAATC ATAATCAGGA AAGCTGAAAT CCAAGAACAA 'rrAGTTTCAT 'rCAAGTAAA'r TTCAGTCCGA ATATCTCTAC ACTTCGGAAr -ceeTrGCTCC AGATAAACCA CCTGAGTCTG TTTGACAAAG CCAATTTT1Tr CATACAAACG ACATTGCTAT CTTCCACTGC AATCTGAAAT TCCTTGTCAT TT'rGCTCAAT
TACCGTCGAA
CATCTTGATC
ATAGATT'TA
CACTAAAAGT
TTTCTCATTC
TTTGGCACCT
TAGTTGGTTG
ACGAGGGATT TTGCTAAGTA AAACCGTAGA GGTAATTCGT CCAGCTTT'rA ATAAAATATA CTATCCACAA CTTCTCTCGA ATTTGATCC'r GATACGAACT GGATAAGGTC TTCTATCC'TT CAGTTACTGG CAAAGTCAGG
GCTTCCATAG
ATTAGTCGAT
TAGTCGGCTT
TTCATGTTCC
ATC'rGCTAAC
ACC'TAACCAA
ATGATTCTCT
CCTTTTCCAC GTTCAGGTTC CAATATTGCT AAATCAACCG TACAAGTTCC AATAACCTGA TCTGGATCTT TCAGAGCT'rC AGCGACATAT TCTGAAAATG CCTGAAATTT TAATTGACTA AAAACT'rCAA GATGGGAAAC ATTTGCTAAC GTTTC1'GTCT
AAAAAAATAC
CTTCATCCTC GATTAGTCCC GTTCTGTCTG AAAAGTGACT GACCGAATGG GGAAAGAAGC TGTTrTCTCTC TCAAAACTAG TAAACAATCC ACGCGCAATC CCCTGACGGC GATGACCTGG ATGAACCAGT ATCGTCACTT CTACATCTTG GTCATCTGCA 10800 10860 109 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 TAGACAGTTA ATAAACCAAC TTTGGGTCAA AATTAAGCAT TGGCAACAGT TAAT'rACTr' GCTTGAATCA TATAGGTATC GCTCGAGTTT A'rrGACATAA CTTTCTTACC ATGAAGACGG GTACATCTGC TACGACAACT GCCCGATAAC TTGGGCATGG TGCCACAGTC TTTCCCTGT'r CAACAATCGC TGTCTCTCCA CAATCTGGGC TCCTGGATGA TACCAGCTAA TAGTTTGAAG AGGCCTGAC ACCTGGATAA TTTCT'rTAC AATATCAATG
TACTCTCGTT
ACAATCTTGG
GCTGCAGCAC
GCTGATATGA
CCCCCGAGAG
ATCACCAGAC
ATCTCAATCT
CCGTGCTTCC
GTCAGCAAAA
GT'rTCGCGCC TTTCTTCGAC TTCGTGAATG CCGGAATACC GACAACCGTC
CGACCTTGGC
GGGCTCCCTT
TCACTCCGTG
AAGTTCGCCT TTTTCATAAT AAAGGAAAAA GGCGGGCATG GTTAGAGAGA TAGGGATCGC GATAGGTACC GTCATAGTTT TTTCGCCTCA GATAGCTCCT CTTGGCTTAA CTTGTTCTT CTCTACAAAC CAGACGATCT GTGACTGGCA TCTT'rAGCCTI
ATTTTCACCA
TCGTACAGTC
ATAGAGAAGA
ACAGGCTCAT
ACGTCACTAG
ATTTCCACAG
GGATGGCGTT
ACGCCTrMT CAGAACCATG GTCAATAAAA ACACCTCAAT GAGTCCAAAA GCGCCAAAAC TGACTGTACA
AGAGAAAATG
CCTCCAAAGT
ACCACCCCAT
CGAGAGACGG
GGTGCGGGCC
ACATTTCTCC
'rGGGCCGCCA
GCTGGATCAT
TTTrCTTAT' 9 9* 9 .999 9
9 9 CTGAATCTT'r TGATGTTTCT GGCGGTGAGG GCGCTCAGAC GTAGAAGAGC CTCATAGAG CATCAACTTC ATCCCCGATT TCTCAGAGA'r ATGAACAAGG TCTCGATACG AACGACTT'rA CAGCAATAAT TTCTrTGGCA TTCCTTCTTC GTCflATATCA CTCCACCCTT ACCGATGACA TCGGAGCAGT TGGAGCCAAT GGATTTCA).A ACGCGCTTTC TCCCTTGAAT CTI'GATATCC TGAAGTCCAT ATCTCCAAAG TATTTCCATC TGAGA'rAAGC CACCAGCCAT AAGGGCAAGA ATTCCAAAAC TTCTGCTACT CTTGAGCAAG AGCACGCTCA CGTAACGACC TGTTTCCCCT TCTTGTACTC TGGATCCAAA AGACTGAAAG AGCTTGAGTT GGAAGTCAAC AACCGCATCC CCTT~G'rCTrC TGTAATTAAA CCACATCACG CATAATACGG CAGTCACN'G GTCTTTCACT GAACTGCCTT TrGGAGGTCA CCACGTGAAG CAATTCCACT GGAAGGCAAT CAAT'rCTTTG CT'rCTGACAA TTCTTrGGCA CTGTCAATTC AAGAAGAGAT 947 GTAAAT'rCTT TCTrAGGTTT GTAATCCT'N' TGATGACG'rG Trr'rCACCTT TTTCATCATG CTCAGGNTT GGCGGACGAG GCATCGATAC GGCCTN'C ATCAATTTTG ATAACCTTAA TCTACCAAAT CCTrCTACACG ATTGGTACGA GTCCAAGCCA GCATCTG'rCT TATCAAAGAG GTTAACAAAG GCACCAAATTr GCACGGTAAA CTTCATCCAC TTTGGCTTCA CGAACCAAAC CGGTTAATAG CATCTTGGTC ACTAGACTAG ATAGACACAT ArCTTAACAC CTGTTTCAGC GATAATCTTG TCGATGGTTT ATCIAATCT TGTCCACATC AATCTTGATC GTATCAATTT TCTGGACGAA CTTCTGGAAT GGTTGCTTCA ATGACATCAA TTGGCTTGAG CAAGAGCCTC CGTCAAGATT TCTGCAGTAA ATTTGAAGGG CTGTAATCCC ATCACGAGTA TGATCTTCCA AACC=TGGAT ATCTGTCAAT CCCATAGCAA TACCAGCTAC TGGCGCCTTG GTTCCCGCAC AGATAGAAGC T'rGAGATGAA AGACGGATAG CGTAGGGGAA TTCTTCCAAG CCAAGGGCAC CGTGACCGAT TTCACGACGA ACAGAATATT GAGGGAAGTT ATAGTGGTGC CCATCAATGA TTTGAGTTTC TCCCATCGGA TGCCCACGAG TAAAGAGACC TGAACCATGT AAAGGACGGA TTTCATCGAC CTTACGACCA CGTCGCACT'r CTGCGTGTTC CATTTGTrCC TCAAATTCTT CGTGGTCCGC ATATTTCT ACTTGAGTCG CAGCTTCACG GGCCAATrTC CTGTTGTAGG CTGCAATGAT TTCAGCTTGC TCTGCI'TTT CTTTACCGAC AGCAGCAACG ACAGCTTCGT GCCCTTTAAG GAGCGCTrCC CCAGACTCTA CCATGTTGAT AGCGTGCTrG TGCTCTGCI'T GTTCTTGACT TGGGT'rGATG
CCTGCAACCT
ACTGTGTAGT
AT'rGGCACAC
GAACCGTTTG
CTTGGCAAGA
CCTGGCGCAC
ATAAAGCGTT
GCCAAGGTCA
ACACGAGGAA
TCAGGACGCA
AAGATTTCAG
TCGTAAACGG
TCTTCTACTT
AATTCAGCAT
ATTTCTTCTT
AACATGATTT
GTTCCAGCTA
ATGATTTGGC
12600 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 CATCTACATA TCCCACTTGT ACCCCAGCAA ACAGTGCCAA AGATGAACCA AACATAGCAG AAAGCACTGT ATTGATGACT 'rGGACTTCAT TCGGACGGTC AATCAAACGC GCTGTCAAGG TCATAAAGCC ACCAGGAAAC TTCCCAGCCG GTGGGAAGAA ATCCCCAGTT GCCATTTTCT ACTCACCGTA ACGTACGACA ACAGATCCAT CAATT'AACTC ACGACCCGCA AAAGTCGTTT TTGGATTGAT GAAATTATAC GCCTTGCCTA CAAAGTAAAA ATAGGAAACT GACGAAGTCT ACACAGCTTT TCGGCCGTGT TCAATTACAC CATTTCTGTA GAAAAATAGG AAGGTGACGT TTTTTCCTAA GAAATGAGAC CAAAATTCAA CAAAGGAAGA TAGGAAATCG AACGACGGAG CACAGAGTTG TAGGCAAGTT CAGT'rTTCAA GTATCTAAAG CTTT1CACGCT AATCGCTATC CTCGTCAAAT AACATCGATT TGACTCACTC TGTATTATTT AATACCTTCA TCTTTGTATC TTTCTTAAAA AGTGAGGTCT TTACCATTAA TGGTTTGCTT TTATTATCCT AGAGACTGGT TGTATTTTGC TGTATCAGCA TTATTCATCG TTACCAAGTC CACGATTTGG TCCACTGCAT GGTCA'rGCTC ATAC'rTGTC'r AAGATGGCTT 948
TTGGTCCGTC
CCATTGGTGC
TACGGAAACC
AAATCGAATA TCTGAAATAG AGATGCATTT TCATCATA.AG T'rCCGCAAAC ATAGGACGAA TCGCATCTGT TGAAGGACGT CCTTCACGTT CA'rACATTTT TTCTTCGTAG TTGACTTGGA TAGACATAAC GGCAGCAGTC AAGACAGTTG TTGCTTGCTT AGCAACC'rGA CCAGTCTCTA GAAACACTTG TT'rrGCCATT TTAATCCCCT
CAAAGATCAA
TCGATGAACA
GATACCAAGG ACGTCAAAAG CAAGACAGTT TATCTTTT *t 0@e 9 9 9. 9.
9 9 .9 9 AAGATATT'rT GGACGCTTCG GCTTGCCGAA CGCACTCGAC GAGTGCTAGG AAGCTTATCT GTCATCAAGA TACCAAGCCG TCAAGCAACT CGACTACTCC TAGGGAGATT TATCTTTTTC GATACATCAT TAGAAAGGTT TAATACTAAA GGGCGATTAG CTAAATGCTT TACTAACTCT GTGTCGTTAA ArCTTACAGT TTAAATGCAT AAGTACGTAC AGAATTTATT TTATCATATT AAAGGAACCA TTCCCCTCAC CTGAGAAGAA GATTAAACAA GGCATGGGTT GCTTGATGGA TATAGAGATG CACACCGGCA ACATCCTGAG AGGCAAGTCC TGCTGCTCTG AGCGACTCAG TAAATTTGCG TGGAAGATGG ATATTCTCAC 14340 14400 14460 14520 14580 14640 14700 14760 14820 14880 14940 15000 15060 15120 15180 15240 15300 15360 15420 15480 15540 15600 15660 15720 15780 15840 15900 15960 16020 16080 S. 9 9 9 9 9**S *959 9 9999 9. 59 9 9 9 AAGTCTTCAA GAGTCGGAGA GCCTGATTTC GATTCAGAAT TGGCATAATT CCTGCATGAA TGGGAACATC AATCCCAGCC AAGATACACT TGTCCTGAA-A ATCATAGAAG CGCTCATTGT -CAAAGAAGAG CTGAGTTACG AGGCTCGAAC GAATATCTCA AATCTGATTT CGCGAATCTG TATCAAAGTG AGGGGTTTGT TCCTTGATAA CCTrTGTGG TTCCACGTCT GGAATAATAT CAACTTTGTC CAAGTCAGCA ATAGTTTCAG ACCC'rGCATC
GATGCCCTTC
ACTCAATCAA
CCCCACGA-AG
CAACCTTGTC
CACTTTCTTC TTAAGATTT TGGATAGCAA GCTCCAATAA GTCGGTTGCA TAGCGGAAAT AGCCAAGATT TTCTGCACCC CTTAGTTAGA TAAATAGCTG 949
GCAAGTGGGC
TCGTT'rCCTT
ACTCCTGCAT
GGAACACTTC
TGATTTCTTTr AATCGTCGGA ATCGCCAAAT GATATTAAAT TTATTATTGC ATCCTGCAAG GCTGAALATAA AAATGAGAGT GACGGTGT TGAACAACC ACTGTATGGA CATTTTGGAT AAAGTCAGCC TGGCAGr'rAC ACTGATAAAA
TGTTATCATT
GGCGTGACAT
GAGAAATCCA
ACCCACGGCT
ATGTAATAAC
ATCTTACAAT
AAACGAACCG
TGGGGAGCCA
GGGTTTGGAG
CT'rTCTAGT
TTCTCACGCG
ATACCACGTG
CA-ACOrATC CAGCTTTAGC TGCTTCAACA AGGCGGATCA AGCTTTCTTT TGTTTCTGGG TTTTCAAACC ACAGTCAGGG TTGATCCAAA CTTTCTTGCT TGCACTTTA CTTCGATTGT GT'rGTCGATT TCGCCTTCAT CCCCAGGTCC CACTTCTGTT TGGAAGTTTT TTGAACGGT'r AGCTTCAA.AG GAAATAACGT TArCTGTAAA TTCTGAGTAA CACATGTGAG AGTGTACCPLA GCGGAAGGCA GGAATAGCCC T'rGGTACACG
TCGCTTTGAG
CTGCATCCAT
TGTGGATTTG
AGTCAAGGTA
GGCGGAGTGG CAATTTTTCA CGAAGAGCAG CCTCGTCGAT CAGCT'rCAAG GTCAAGTACT TCATCCTTGA 'rAGCAAGGGC TGATAGAGAT GTCT'rCACGT GGGAATGACC TACCTTTAAC AGGTTTGTTT GTACGACTTT GGTTAAGACG AGTGACATCA CCCCAGATGA ATTGTACCCA TCCATTTrTTA GAGAAGAGGT
AGTTAAGGAT
GTGCATAGCT
TTGGTGGT'TT
ATCCTGACAA
CAAGGACATC
GGAA.AGCGTC
GALACTTCTTT
AAGCTTCTTC
AGGTGAGTGG ATATCGTAAA TTCGTCCAAG ATTTCAAGGT GTTATCGATA GC'rGGGATCA TGTGTCTGGC GCTACTGT'rG GTCTTCGTAC CAGTCGCTAC TTGGATGATT TTCACACCAG GATTTGGAGA GTTGAATCCT GGTAACAGGT CCAGTCAACA AGACCATTTA ACAGTGATAG 'rACCCCACGC ATACCGTATG GT'rTTGACCG AAGTACTCA.A AAAGTCAATA TCTTCTTGCC GTACTCTTTT TGAGACAATT TGT'rTGAGGG AATGAACCAA TTGGATAGCT TCACGTTCT~G 16140 16200 16260 16320 16380 16440 16500 16560 16620 16680 16740 16800 16860 16920 16980 17040 17 100 17160 17220 17280 17340 17400 17460 17520 17580 17640 17700 17760 17820
CCATGTCATT
ACTTGATCCA
CACCTTTACG
TCCTTaTTrT ACGCTCAAAT TCACCGTGAA TTCGTCAATC GTTTCAGCAA GTAAGCCAAA CGTTTGGCAC Tr~AArArC!T GAAGTTTGA CAAAGGCTGG CAAACGAGTG TAGTCTGCGT CAGCATT'N'C ACCAACACGC TCAGTCGCAA AACCTTGACC ATTCGGATA GCATCCAAGT CTGTCAAGCC AGCGATACGC GCACGAAGTT AGAGTTCTTT GTTGGCTGCA AGAGCTTCTG CACGGATTTrC ATCCAATT'rT TCAACTGCAA
AGGCAAAGTG
GAAGAAGTGA
CCAAGCTCTT
GAGTCTTGTC
GTTCAACGT GCTGGTTCAA GCAAGAGCTT GTCAAAACGA TTrCGTAGTTG TTGCGCCAGA AGCTGGGAAG CCACCT'rTAA ATTCTTCATT AGCAGTTGTA TGTTTTCAGC TGGAAT'rTGC TGTTTTTACC AT'rGACAATA
AATGGCACAT
TCA.AGAACAG
CCTACATAGA
CGAGTTCAAG AGTTTTCTTA CCTTCAACAA 950 AGTCAAGACC GATAGCATCT ACTGGTAAGT TTACAAGGTC AGCGTATACG TCACGAACAT CACCGAAATA AGTTTGAAGC AAGACTTCAA GACCTTTTTT GTCAGCCAAG AGTTTGTTGT AAAGTCAA GAAGAGAGCT TTT'rTTCAG CTGTCAAGTC TTTACAAGA GCCGCTTCAT CCAATTGGAT GCGAGTCGCA CCAAGTTCAG CCAATTTAGC AAAAACT'TC'r TGGTAAGCAG CCACTAAGCT ATCTACGAAG TCGTCTGCTT TCACGCCTTC 'rrCAAAGTCT GACAATTGAA GGAAAGTGAA GGGACCTACA AGAACAGGAC GAGTGTTCAA TCCAAGTTCT TTGGCT'rCTT GGAACTCATC GAAAATCTTG TGACCAGCCA AT'rTTACTTG AGTGTCTT'TT TCAAATT'rAG GAACGATGTA GTGG'rAGTTA GTGTTGAACC ATTTCTTCAT TGGAAGGGCG CGAACGTCCC CT'TTTCTCC CTGGTAACCA CGTCCCAAAG CGAAGTAGCC CTCAAGGTCA GACAAGTCCA AGTTTTGAAC GGATGCAGGC ACCACGTTGA AAAGGAAAGC CGCA'rCTAGG AAGm'ATCAT AGTGACAAAA GTCATTTGAT GGAATTTCAG TGATGCCTTT T'rCTTTGACA ATGTTCCAGT GTTTAGCACG CAAGTCTTTT GCTGCTGC'rA AAAGTTCTTC TTCTGAGATT TCTTTTCTAA AGTATTTTTC AGTTGTAAAT TTTAAT'rCAC GGAATTCGCC CAAACGAGGG AAACCGATGA *TTGTAGTTGA CATGATGTGT CCTCCAAAAT TTGTTGTTGA AACTATCTTA ACAGAAAAGA 0 0:0: AAGCGTCTGT ATAATTGTAA AAAATTAGCG TCGGACAAAA GAAAAAGACT TGAAGCAAAC AG6TATATTC CAAT'rAGAAT ACTAAAACAT CCTGTTA'rTA GAAAAGACTA TAACTGATTC GATTGCTAGT GTCTTTCCTA AACTGGCTAG TTTGATATAG TT'rGAAACTA TATATCTGTT GTCTCAAATC CTTTGTAATT CTTAC?1"rAC GTTATTAGTA ATTCPTAAA GTGACTATGA TAGTCAACTr TTTCCCTGPT CAAGTGCGAC GACrTTTTAAG ACTGTATCCA ACTGAGGACT AGGCTGGCT'r ATTCCACTGA CTTCTTCCAG AGCCTCAATC AACTCGCTCA TGATAGCCAC GCTAAAGAGC TCAGATGGAC ATCCTTCCAA TAACCCTCTT T'N-rrTACGT CTATGTATTT 17880 17940 18000 18060 18120 18180 18240 18300 18360 18420 18480 18540 18600 18660 18720 18780 18840 18900 18960 19020 19080 19140 19200 19260 19320 19380 19440 19500 19560 19620 AGTCTTTCCT GTCTCCA'rCC CTTTTTCTGA CTGATTCCTT TCGCATATCA CTTTCAAGGA TTACTCCCAA TAGCACTA'TT TTIAAAAAAAT GAGcGAA'rTA GCCTATTTCT TAAGCGATT'r AATTTTCCAC GATGAATCCA ATCACCATCA TTTCAGGCCA CTTTTCCTTT AGTTTC'N'AA GCCCTTTTCC AAGACATTGT GAGCTTT'rCC AAGTAGAGAG
TAGCTATGAC
GTTCATACCT
TTTCCTCCTT
CTCATCACT
TGATTCGATA GATTGACCAG TGGGTT'rAAA GTTGGTGCTA TCCTTTTCTA GGATAAAGCA GTTCCTGCTT GCTTAACCCC ATAGTAA.ATG GTTGAAATTC CCACGTTAAC CCCT'rTAGCC AAATTTr'rGG TTATGTT'TTT GGTTATGTAT AGTGGAGAAT GACTGTTGAG CGTAGTCGGC AGAATAAATC TCTTTGAAGC
CGGACTGTCC
GCAATTTCTC
CACGCTTGAT TTCAGTGTGG ATAGTTTGAG TATTTGATTT TCCTTCTTTT TTCCATCGTT CGATTAAGCG ACGGCTATCG T'rTCTGTGCC TTTTAATCAT AATCTATGAG GAAGACAAGA CTCCGAGCAA CCATTGCAAA CTTTCGGTCA TTCTCTTTTC C'rTAACATTT GGCTATCTAC ACTGCAACTA TGGCTGCTAT AATAAACAAA TACGGTTT'rA
ATTGTCAAAT
TTCAATCTTA
AAAAGAATAT
TTGAGGTACT
TATCCCTATC
CCAACCTATC
TTTAAGTTTC
CTCTGGCATT
951 GTTTGCCTI-r TGTAGTATAA TTGTCI-rGCA AATTGGACTT T7?-rMACTTG GGTTG'TACTT CAATCAAGTA AAGTCACAAA GTCACAT'rAG CACACAATGA TTAAAACATT TCTCTC1'GCC ATAACT'rATT CTr'rIrCCC ATCTTCTAAT TTGGCACAGA TTTATGCCTr CCCCTTAGCT TTATTTTTTT TCC'rATCTr'r TTACAAGAAA TTGCTC'rTAC TATCGCTCAT ATTACTATTA 'rCAAATAAGA CTAA.AACCTT AAAATTAGTA T'rCGGAACAG ATAAAACCCT TTCTTCTGCA ACTTGGAACG TCGCTAATCA AATAGAAGCA 0 0.* 0 GACCCCGATA TGGCTATATT CAGAGAATCA AACTATTGTT ACTTCTCCAC CTACCAATAG
CCCTGAACTA
TCATCAAGTT
TGGAATAGCT
CAACATATTG
GCTACCAATA
GGACTTTCTA
CCTGTGACTG
GGTTTCTATA
TCGAGAAAAC
TTAATGGAAA
CCAAAGGCTA
ATAAGCTCTC
AGCCAAAGTC
TACTATGTTA
ACAGAAATCA
GGGGGAATTT
ACGCTTGGAT
TTTGGAAGGT
AGACGTCAAA
CAACTTCTTG
GGAATTTGGT
GGTTAAGCTC
CAGAAOCTAA AACTTTTCAT ACAACACGGT AAAATATACC AGATATCATT GCCT'rGCATA TCTGGAAGCA AGACTTAAAC ATCATTCATA TTATTGCAGG TGATTTTAAT GCAACTATC ATAGGGACGC ATTAAATGCA CTGCCACCTT CAAAACTTTT TAATGCAACA ATAGATCATA AAGATTTAGA CAT'rGTAAGT TTTCAAAACT CAT'TTAATT ATTTTATATA AAATCACCCC GTATCCTACT ATCGTTTAAC GCACTTCTGC TTT'TTCCATA TAGCGTGCGA CTTCTTCGTC CAAGCTATAA GCCATTGACT TCATACCAAG GAGTTTGATA TCTGTCAAAC GTTTCACGCC GTGAGTCACT TCTGCCTTGA GGAGAAGGGC GATTTCCACA AATGGAACAG CAGGT'rCGAG AGCTACATAC GTTTCTGGAA TATCGTALAGC AGCGAATTTI' TAGCCATTTT TCAGAGGTGA GCAAGAAAAC TGGCCAACTA TGATATNTTC TGATTGTCAA GAAAAGTTAT TCGGGACAAT TGTATTACAT CTGCGCCTCC TCTGCCAGGT ATCAATTGGC TTCAAAATAT GTCATGGAGC ACTTGCAAAA T'rGAAAGAGG AACTTGGAAT TTTTATTGCC TAAAAACCAC CTGATCATAG ATGTATTTTT TCTAATGTTC ATAAACTAGA ATTGACTTTT TCTTCGAGAG CGT'rAAGCTG TCTTCTGGAT TCCC-AGT'TTT TCACCTGAGA GGCAGCT1'GG ATAGCATCTA AACGTCACGG CTGACTGCTG CGCCCCTTCG ATGGCTGAAA CTTGGCAGTG ACTGGATGCA CACGGCTGTA CGACCTGGAT 19680 19740 19800 19860 19920 19980 20040 20100 20160 20220 20280 20340 20400 20460 20520 20580 20640 20700 20760 20820 20880 20940 21000 21060 21120 21180 21240 21300 21360 0* 0 0:0*
CTTGGCCAAG GAAACCAAGA ACTTGGTCAC CGAGTGAAAT 952 GAAGGCTAAC GATTTCAGAT GTTGCTGTAT AGGTTACT'rG GAGTCCCAAA CGAGTAAATA GGCCTTCAAG GAT'rCCCTTA GCATAGAAGA AATCAACTGG AACTGCTGCT GT1'TGGAAAT CTTTICAGC AACCAAGCCr GTCAAGGCAA AGGCAAAGCT GrrGATCTCA TTTGGAAGTT CTTCTTl-rGG ATTACCTGTT TG?1'CAAAGA CTITTCCAAT CTCATAAAGG GCCAAGTTT TATTCT'rACG
GGAGGACTGA
AGCCACGTTG TAGGCAACGG TATCAAGGAT CCCTGAAATC ATATTTTGAC ACGATCCACA GTCATTGGCC ACATGAGTTC AGTAAGGTTA CTTGGTTGAG CTGTGAACTC AACTGCTT'TT CTGCTCCTTC AGCAATGGTA CAGCTGTACC ATCGTCTTTT CGATTTCTTC AAAGAGATCA CTGTAAAGCT GTCTGCA'1TT CATCAGCATA AGACAGCTCA CTTCCACATC AGAGGTATCA CTGCAAGCTC TGCAATCATG TAATTCCTTT TTCAAAGCGA TACGGATAGA TT'rGCCATTA AAATTTCTGT ACCCTGACCA TCAGGAGT'fG TCAGAGCATA GGTGATGATT TCTGTCAAAC
CGAACTTGAC
GGAAGGCTG
GCTTCGATTG
CCAGAAAGAC
GTTCCGAGGA
AGCTCACCCG
CTAGCTGCCG
GAAGATGACT
AAAACAGCAG
CCCATAACAC
GGCGGAG'NTT TTGTATCACA GTCAATTC-AC TTGGCAAGCG GTCATATCCA TAGATACGAG 'rGATATCCCA ACGACGACGT GGTACGCTGA CAAAGCCAAG ACGACGGAAG ACGTCTTCTA CACGGTTAAC ATCAGCAAGG GTTGAAGAAA CTGAAACGAT ACCCTTACGC ACCGTCGCC CATCAAGGGC TTCAT'rAACT GTTGCCACAT CAGAACGAAG GTTCAGGCGA CCACTTGTCT CTTCAAGGAT AACACGACTA GA'TTTTCAG CGGCAAGGGC TACTGGT'rTG TCAGCAACTG GTTCTTCACC GTCCAGGGTC ACTAATTTT'r CAGTCCCT'rC AAATGTGTCC AAGTCAAAAG TGTAGTTTGT CACGTCTACA ACGT'rATTGA 21420 21480 21540 21600 21660 21720 21780 21840 21900 21960 22020 22080 22140 22200 22260 22320 22380 22440 22500 22560 22620 22680 22740 22800 22860 22920 22980 23040 23100 23160 TAATCACGAG GTCTGTCTCA GCCAAGTCTC CACCATCACG CGCT'rCACGC ACACGGATGT CATGCATAGG TTGACCAAAG TAGAGCAGGA TGGGACGGAT GCCTTCGTTC ATGAGAAGGT TCACATTGTC CAAGATACGA GCTGCATAGT AAAGGGCATC TGCCGCAGCT TCATTAGTTT CCTTGTC ATA GATGGCTGCC ACTTCGTGAG GGTTTGGTGT GATGGAAAGT TCGATGATTT T'rTTGCAACCA
AAGGCGCCTT
CTGTTAGAGT
CCACTCCACA
CATCATCCAA
TTGTGGACTT GGTGCGATAG GTCTGTCTCA ATGCTGACAG AA--=TTA AAGTTCACTG CATAGAAAGG GCATCTGCAC GTCTAGGTAA GAAAAGACTT -CCTCACCTGG CACGGCATCI' 'CAGGCAAGA TTTGGATGCC ATCTGCGAAT TCCTTAGGCA CAACTGAGTC AGAAATTCCC AATTCACCAA GTGAACAGAT CATTCCAAGT GACTCCAAAC CACGGATTTT TCC'TT'rr'rG ATTTTGTAGT TATCAGCGAT ACGAGCTCCT GGAAGAGCCA CCATGACCTT GATCCCAGCA CGCACA'r-r'G GGGCACCACA AACGATCTGA CGCTC'N'CTT CTCGCCAAC GTTAATCTGA CAA.ACATGGA GGTGAGTCTC TGGCACATCT TCGCAAGACA AGACCTCACC GACGACAAT'r CGATCCCTGTI AG'rTGACAT'r A'IrCTTTTAA CCATTTATAA CTC'rAG'rTCT TTTCCTTTCC TTTCTCAGTA ACCAATCCGT rTTTTCAAAAC CATATCGGT'r AGCCAAGCCC AAGAAAAACT TTACCTAGTC CAAATCCTTG TCCTCTAATT CTCTCTCAGT TCCTCCTCAT GCATAATGAA GTT3TDCAGAC TATAAGCCTC GCAAAGGTTT CACGAAAGGT TCTACTTTTC TAATCATTAT CACGGATATC GTTGATTCCA CAAAGCCAGA GTATACAGTC TACCGGCCCC CATAAT'rTCG CACACTTGAA GCAAGAAACA GACGCAAACG AATTTGACGC 953 TTTGAGAGAC CAGCAGCTGG TGATTCGACA CCCTCTACCT TTTTCAGCCA ACTCTTGTGA TGCCACATCA ATGTCCACCA CATACAAGCA TAATTTAGTT CTCCAGAATG ACAGTTGTCA TATCATTTCA ATAGAAGAAT CCTCTTCTTA CCTTAATTTC ATCTACTTTT TGACCAACCA TAAAATGATG TTGGCTAAAT ATAAAACGCT TGAGCT'TTTG TAT'rATGCTC CCAAACACCT ATTTTTTGTA GCAAGTTCAA GTGCGAATTC AAACAGTTGC GAATTTTTGT.AGCACATAGA GACGTTGAAT TTCAAAAGCG TTGAGCACTT CCCCAGTTGA CTTTGAGAAA ACCAGCTATC ATAGGTTTCA GAGTCACGAT TTCAAAGTAT TCCTGTAACT TTGTTTGGCA ATTTTAGCCA T'rAAACTGTT CTGAGAAGCG TAACGGAGCA TAGCTACACG GCATCGATAC CACTCAT'TC ATCCAACCTG TTTTCTTACA TCCACCTCAA CAGATGGCTC TCTTCACCAA ACATTTTTTG ATATTTTTCC CAACTACCAA TCCGTATCGC GACGGAAGAC TCATGGGCAT CCATAGCACG TCAGTGATAT AGAAAGTATC ~CGTTCAAAGT TATAGTAGTC
TTCCCAACTC
GCTCTTCCGT
ACACCTCAAC
GACATCTCCT
CTCTTGTCCA
AAGGACACGT
TACATTACAG
TGTGAATGG.
AGTTGACAAA
ATTATCATAC
ATCTGCCATT
TGGTAGAATC
AGACCAAAGG
GGGTGAACCA
CCTTCTCCAC
AAGTPLAGATG
23220 23280 23340 23400 23460 23520 23580 23640 23700 23760 23820 23880 23940 24000 24060 24120 24180 24240 24300 24360 24420 24480 24540 24600 24660 24720 24780 24840 24900
CTTGAAGATC
GGTGACTGTG
TCAAAGGACC
GGGTACGGAG
GGTCTTTTGG
CCACGACTTG
AAACGTGACG
TAGCCAGTTG
AGCCATAGAG
GGTCGCATCG
TTTAGAAAAA
CAAGATTTCT
AAGGTTCATA
GACAATCAAC TGAAGCGTTC GCCTTCGATT TGGTGGAATT ACGCCCTGGC GAGATCATCT CGCCTGAACT GGAGACGTGT CTGCATATCA CGAGCTGGGT TTGCTCCACT TCAAAACCAT TTCTTCACTG GTTTGTGTCA CGTCACATCT ATACTCTCGC AGCTGTTTCT TCAAAAGCAG ATAACCCATA CCGATGAAGA TATCTTCGAT GTGACCAGTC GCAACTGGAC GACCTGGAAG AGCCGCGACT TTCTTTTCTT CCAAGAGCTT CAGTCAAGAC ATCACGAGCT TCATTGACGT GTTTCCCGAT GATTGGACGC ATCTCAGCAG AAACATCTTT CATCCCTTT'G AGGATTTCAG TGAGCGAACC C'T'rTTACCA AGGACAGAGA CACGCAAATC TTGCATCTCT TTTTCATTrC CAGCAGTAAT CTGCTTCAAG CTAGCCAGCG TTTCTTCGCG AAGCGCTTTT AATTGTTrCTT CTCGTAGATA AAAAGAAAAC CACATGCCAA CCATCCGTTT TCA'rCTGACA AGTCAGACCT ACCCAGCTTT CATATAGAGA GCTTGCAGTC GGCTACTAGT CTTTCAGATT CCTATTCAAT CTTGCAAGAC CTATCTTACT TCTGCTTGTT T'rATCTAAAT TAACTATTTA TAATTTTTGT TAAAATAAAG AAGTTTAGAA TTTAATGTCT ATGCCACCTT AACCGTGATA ATAGCTAGTC TCCATTTGAT TCAATCACTT CTTTATACCA TGTACCACTT CCATCATCGT TTCGATCAAC TGCAGTGGAC ATAGAAACAC AGTCAATACA ACCATCCTGT AGAGCTTCAG CAACTTGCAA ATCATCTTGG ACGGTTAAGT TATTAAGTTC TCCATTTTCT ACTATAPLATA ATGGGAT'rTG ACGTAGTCCA ATTGGATCAA TT'rGCCATCC TAAACCACCA ATAATATTCC CTTCTCCTGA CACACTCATG TAATAGCTAA AGGATAAAAA ATCT'rCAGCT GCAAACTCTA TGTTAATGTC CGGATAATA.A CCTCTAACAT GCACATCTGA 954 CAATAGTTGA CATATTTCCT AAACTCCACT CGGAGCGTTG
CCATCAGTCT
ACACGCGGTA
TCATT'rCTAA ATCCATGCGC AAGTGAAT'TC ACGGCTCTCC TCCCTGATAT ACTTCCCTTG TACTACTTAG TTTATCAGAT TTTTACCATT AGCT'rATTCT TATCTAAATT TATATAAACC AACAAAATTA AATTAATTGA CACTCCCCTA TCCAAACTTC TTTATTCCAT ATCAATAAAA AACTATTTGA
ATTTAATGAA
ATAAGGATTC
AGTAAAAGAC ATTTTCTTAT ATCGA'rTTAA ATAAATGAGA CCGTACCTTT TAGALAAGTTG TCCCCAAGAC GTATAGCCCA TAATTTCA-AC TAAATGTTCT TTCATATACT GAATTCTATA ATCTTTTATT AG'ITGATCTT TAGCACCTAA ATAACGGTCA TAATATCTAT TI'AAAATTAT CCACTCTGAA GACTCTAAAT AAGGATTTAC ATTATACTGT GTTGGAAGAG CAGATTGAGT ATCTACGGTA TAArITTTTA ATAACTCTGC ATTTTCCTTA AAATATCTTT TTGCATAATT AAATAGATAA TTTAGATTCT CATACTCATG CATTGGATAA GCTGGCATAG CTAATACCAT ACGAGCAATT TTTGTALACCA AACTTGAGGC TTCTTGTTTC GAAAGATTCT CCTTAGGTAT CAAAACAGAG TTTACTTCGT TAAATGTAAG TAAAACTGTT CGAGCAALATT TTTCATAAAA ATATTTTCTT GCTAAATATA ATGGAGTCTC CCCGTGACCA TGTAGTTCAT CAAACAATTC TTCTTCCTCA TCTCCTTTTG GAAAAATTICT AAAGCCCATT TCAGAAAACA AGGATATATC TATCAATTTT AAGTTATCTT CTGTAGGATT 24960 25020 25080 25140 25200 25260 25320 25380 25440 25500 25560 25620 25680 25740 25800 25860 25920 25980 26040 26100 26160 26220 26280 26340 26400 26460 26520 26580 26640 26700
AGTCGCCCAT
ACATCCCACC
GACTAATTCA
ATCTATTCCT
CCAATATTTA
ATGAATCATT
ATAGTGTGAA
ATCATAATAT
ACTCCATGCA
TTCCTTATAT
ACATCTTTTG CATTTGGAGT TTAAACTCTG AATTAATCTC TGATGTATAG CTTGATATAA CCACTAGTAA ATGGTAATTC ACTTTATCTT TATACCTTC CTCCTATCAA CCCATCCATG AGAGTTACAA GTGGT'rCTAT TTCAACCCAG CTCGTTAGG ATAGAAGTAC GAAAAACATT TTATGATAAA AATCAATACC TTCTGT'rGCT TCTCCTAATC CACCT'rrGGG ATCTTCATTA TATGCrCCCT CTACTTGATT ATCTGGAAAA ATGGTC-ATAA C1"rTCCTCCA CTCGATTGTC TGATTATTAG G'rAATACTAA AAT'rACTGTI G;TAACTGTIT CGTAGCCTT AATCAACTGA =MrTGAAPLA CTCTGTCTCC T'rTTAGCTCA ACAGTATCTA ATCCAATATG CAATCCAATT GCG'rGCTTAG TCGGAAAAAT AACCTTACCT TCACTTGGGA TAATCGCTAC TTTATCCTGG ACATCGCTTA ACGGAATGAT TTTATTTGAA ACTCCAGCTr CAACTTCTAA AAATATATAA GCTAATACAA AGGTAATAAC TACAATATrr GATGGCACAT CAGAATAAAT AGCAAATAGA TATGCTTTA.A CACTAGTAAG AATCATAGCT GCATAAAGCG GTTTT'rATA GGTAATCCCT GCAAGTAAGG CTGAGAAACC ATTTTTACTC 'rTTAATGCAA CAGCCATCGA CATTGCTGGA AGAATTAATA CGTCTGGAGT AAAAGCCCAA TGCATTCCAG TCATAACAAT 'rGTAAGCCAT CCAGCTACAC CATACATTTG AATTACTCCA ATAGGTCCGA CTACAACTAA CGTAGGTTGC AAAA.AACTCT TAGTAATAGC ATATT'rCATC AACCAAACCA TAATAAGAAT TGTCACAGGT GCACCAAATA AACTAAGAGG TCGGATGGAGA AGTACACCTG CTACAGACAT TGATGCAGAA TAAGCTIAATA ACAGCGGTAA CAAAAAAGCA ATAGTCTGAG AATCTGATTG CAAGACTTTC AACATACCTC CCCCTAACAT GATATACTCA ATGATCTrr CTAAAATATT 955 TAACACATCC TGAACTGATA AGCCCTTACC AGCTGCAACA GCTCCACCCC AAAGAAAATC TTATAATATT ACCAGTAATT CCTTAGAATG TACATCTAGA AAATCAT'rGG TA'TTCGTTAC AGTCTTGATr AAATTCAAGT CCATTTCAAA 'rrCTTCTACA TGACTAATAA AACCTTGACC AATTAcGrAAC TCAACACCCT CATCACTCTTI ATTTGTAATT TTCCCATCA.A ATGGTGCATA TCCGTCTCCA ATTAGTTTAT CTGAAAATGT TTCTCCTGAT ATAGGAGAAA ATATCATT'TT ATTGCTAGAA CTCrCTTCTT CATCGATTCC AACCGAA.ATG ACCGCCACAA TTAAAGCATT AAATTGAGGC AACGCTATCA AAGATGGGAC ACCTGCAAAT AATCCCGCTA ATCCACCACC TTTTAAAGTC ACACCATATA ATGCAGGI-rC TGCTGCAA.AA GCAATTTGTT TTGTATTATT AGCAGCCCCT TGAGCTAAGT TTGACCCTAA AGCAATAGAT GCCGCCAAAA AAATAGGTGC AAATGGCATA ATAGCACCAA GAATAGCTAA CCCAACTAGA TTTGATAATC CT'rCACCAAC GGCAATACAG CTTGATACTA ATAATACTAG TAGTGTTAAT TTAGCAATTA TTTTTTCAkT TGGAACGACT GATGAACCAT AACTAGCTGG ATTCCCTGAT TGCACCATTT GA.ACAAAATT 
AGCTAATGTA GATGTTACTT TTAATTTG GAAATAATAT GGAGCATCCC CAAAAAATGT 26760 26820 26880 26940 27000 27060 27120 27180 27240 27300 27360 27420 27480 27540 27600 27660 27720 27780 27840 27900 27960 28020 28080 28140 28200 28260 28320 28380 28440
CAATATACCA
TGCTCGAATG
CCCTTTGTGC
AGCATTGCTA AAATGATTAC ATTGGAGTCA TGGAACCAGC CCTTGAACAA CTGAATCGGA TTCAAAATTG CCAAGTTTAA CGAATTCT AAT'rTGATAT TGTCCATTCT TTTTCATAAT ATCATCATTG ACTAAATrTrr CATCTrTTAA AACTCTATTG ACATTTTTTT CACCTCCAAT ATAACTCATG GTATTCTCCT ATTCTAT'rAA GCCATCAAAT AAACTAATTC ACTAGAAGTC ATTGCrACCT GTTCACCACA TTCATATCT AAGTCTTCAT CATCATCGTC GG INFORMATION FOR SEQ ID NO: 14 SEQUENCE CHARACTERISTICS LENGTH: 12835 base CB) TYPE: nucleic acid STRANDEDNESS: daubi TOPOLOGY: linear 956 ATAATAATTA GCTACATCAT TACCAAGTAT ACCTATTACA CCTGGTATCT TCTTCACATC TTCTAATCTT AAACGTGTTA CACAATGGGT TACATCGAGG ATTTTTTGTA CCGTATCTTT TCTAAATTTI' TTGTTAAGCG ACGAATATGA AGCAAATAAT TGTACTCCGT TTGTATAAAC CTAGGATATT TATTTTCAT TAATGCTAAC 28500 28560 28620 28680 28740 28800 28860 28882 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: GCCTATGTCT TTTTCAAAAA AATGCTTGAC TTGAGACGGG GGAAGGCATT GATTTATACT CTTCGAAAAT CTCTTCAAAC TTA TATATGT AACTGACTTC ACTTGCGGCT AGTTTCCTAG TTTGTAGGAG GTGGCTTATG ATGTCTTGCT TACTTTGCTT TCI'TTTTGA TATTCATGCG ACTTATTAAA ATCAGCATTA CATTTATTAT TCCAATCATC AAGTTTTACG ATTGAGTATT CTTTGCAAGT TGCAAGTATC TTATAACCTA TTTCTTTGGG GAAGTGGTTT ACAAAGACTC TCCTACTAAT CGGTATTTC GGAATGTGAC TCGTTCGGCA TTTATAGTGC CTTGCCTTAC GTCGATGCTT ATCTACAACC TTTGCTCTTT GAT-rTTCATT- AAGATTCCTC TCTTAACTTT TTTCTTGCTT TGGTTTATCG CCCGATCTAG CTAAATTCGA GATTTTCGTA TTCTCCAGTT ATTGTTTTGC TAGGTTTTCA GGAAGAGAAG TGAGTTATCA CCTTGTTTGA TATATTTAGT AACTAGGGAA GTCTAAAGGC CACGTCAACG TCGCCTTGGA TCAAAGCAGT 6CTTTGAGCA GAGTATTATA TTACTTTCTA TGCAAGGCAT AAATTTGTTT TGATGTTTTG ATGACTTATT TGGACAAGCA ATTAAAAATG CAATCTAGGT TTTTATCAAT ATATATTGAG CTGAAAAATA AGGGTTAAAA AGAAAGTTGA GACTGTGCTG ATAATTGCAA ACTTTTTCTC CTCTTGGATG GAATTCTCTA CTAGATGGAG AGATAAAAAG CTATTTGTTC ATCAATGCAA TCTATTTTTT ACAAATAGTT ATCACCTATT TGATGTTTCT TTGGCTTGGT TATATGGTTC CTATGACGAG TTTGATGCAA
TTTTCTGATG
TTTACTTGTG
GATTATGTGG
TCTATGCTGC
GCTAGCTATG
957 GGGATGTAAG TTTGATGAAA CTCTTTACTC CTTATATCCT TTA'rATTGTC CCTTACATGG TGCTTGAPAAA ATATGAAGAT AATGTT'TAAG AATTTTAACA ATATTTTGCT AAATAGAAAG ATTGTrTTAC TACTTCGTA'r AGT'rCTGATG ATGATTTTGA TAAACCATCT ATTGTCAACA GCGGTTCAAA AGCAGGATGC 'rGTTATCTTT TTCAAGAGAG AATTGA'rTTC AATN'TTTCC TATAATGACT AT'rCTGAAGC GAATTTAGAA ATCCCCAAAC TATTGTTAAA CCTTTCGCTT T'rCATGGTAG GATGGCTCTC TGTCATTTTA CTTGAAAGTG kI-rTGGCAGA CCATTACCAT CACTTGATTC GCTATCAATC AAGCTCCTTT TTCGATrTATA CAAGGAAACG ATTGGTTGTC ATTTCTAAAT TTTT'rACTCA AGATTTGTTT GTCTGGTr'rC TTGGTTTACT TCCTCTAGGA 1020 1080 1140 1200 1260 1320 1380 1440 09 S* -0 00
ATTCATTTCA
CTACTGTCTT
TTTTTAGCAT
CTCTTAAGTT
TAAACGATGA
AATCAGTTGA
TCTGGCAAGA
GTTCTTCAAG
TTAATTGAAA
AGGTATT'TT
TTACAGGAAT
GCCTTGATTC
GCTTTGGATG
GAAAATGGTC
GATGTAT1TAG
TGGCTTATGG
GTTCTGCTGA
AGGATTGACA
AALACAGTCGC ACTTTTCTT'r TTACTTGCTC AGTTAATGAT GTTGTACTTA ATCTGATAGC ACTGATTAGT GCGGGCGCTG GTTTTTCCT'r T'TC'TCTAT TTGTGGGACA AGAATGGATG ATGGATCATA TTGTAACAGT GTATTTAGTA TAT'rAGT'rAT GT'rGA'rTGTT AGTCGCTTGG AAGAGAAATT TAAGAAAGGA GACTTGAAAT TATAAATGGA CAGAAAATTT ATGGGAAAAG ACCTATTTTA ATTTGGTGTT TCAATCAGGA AAAATTTATG GACTTAAAGG TGATAATGGA CGGTTCTr'rT AAAGATACTT GCTGGTTATA TTAAGCTTGA CAAAGGAAAA ATGGTAAAGT TTACCGGGCTA AAAAATCATT ATATI'CAGGA TGCAGGAATT AAGTCGAGTT TTTATCTCAT TTATCCCTGA GAGAAAATTr' GGAACTGTTA CATCTAAAGT TACGGAAAAA AGAATTGCCT ATTGGATTCA ATACTATGAT TTGAAGACAT TGAATACCGT CATTTATCCT TAGGAACAAA GCAAAAAATG AAGCCTTTAT PTCCTCTCCT TCTATACTCT TTCTCGATGA ACCTATGAAT AGAAGAGTGT GAGGTTAACC AAACAGGTCA TTTTATCTTA CCTGAAAAAA TGGTTATCCT GACGTCGCAC A'rATCGGAAG ATATT~TCAGA CCTTTG'rACA TTGTCGAAAA TGGACATATA CAAATGTAAA GGATATACAA TCCTAGGAGA CACATCTAAA ATCATTTATT ACACGATATT CCAAGGT'rTA TATTGGTTTA TCTGGCTGTC TTTCTTCTTT ATCCCTTGGG ATAAACCACT TCTGGGGATA TCTTCATCAT ACAGAAAATC T'rGCTAGCTT TTGGAArrCT CTCCAT'rCTC 9* 99 9 9 9 9*99 9 9*9* 9* 9 9 9 ATGGCCI'GC TGTCCAAGAA AGTCAGTCTC TTTGTTI-rTG GACTGATT'rG CTGTCTTTCT CTTTGGATTA ACTTATITTAT CACAT'TGCCC ATTTTGCCGA TTrT'rGGCAA TTAAACAGTC ATAAAAGTCG GAGAGGrTAG CTTGAAAACT AACCTCTTTT TCCTTTTCAA AATGGGATT 958 CTTCCTTGAA AATAATCAGT AATTGTGCTA AAATTAAAGG AACATTCTA.A AATATTCGG;A ATTTAAAGTA AGGAAAAACA TGGCTAATAT ?1TrAAAAACA ATTATCGAAA ATGATAAAGG AGAAATCCGr CGTCTGGAAA AGATCGCTGA CAAGG=TrC AAATACGAAG ACCAAATGGC TGCTrGACT GACGACCAAC T1AAAAGCAAA AACAGTTGAA TGCAGAATCA CTCGATTCAT 'rGCTTTACGA AGCATTTGCG ACGTCTCCTA GGTCTCTTCC CTTATAAGGT TCAGGTCA'rG
TTTAAGGAAC
GTTGTCCGTG
GGG4GCGATTG
9 9 9 99 99 9 9 99 99 9 9 0 TGGTGACGTG CCAGAGATGC ATACCTCAAT GCCCrrTCAG AGAACGTCAC GCGACTGAGA TAACTTGGCT ACCAAATCTC CTCAACTAAC TCAGAAATCG AAACATGGTA CAACGTCCGC TGACGAGGCT CGTACACCTT GTATCACATG GCAGACCACr GCAGTCrAAG ACTrATTGGTT ACTTGAAAAC C'rCTATGACA TCGTGCCAAC TACATCATGC CTTGATTGTC GACCAATTT'A GCACCAAGCT ATTGAAGCCA CTCAATCACG TACCAAAACC AGGTAAGACT GAGGAAGAAG AACAAACCGT CCTGTTCAAC TAAGTTTAAA GCGGTTGTCG GTACAGGGGA AGGGAAAACC TTGACTGCGA GTAAAGGGGT TCACGTAGTT ACGGTTAATG TGGGTGAATT GTACTCTTGG CTTGGTTTGT CAATGGAGAA AAAAGAAGCC TATGAGTGTG GATTTGACTA CCTTCGTGAC AACATGGTCG TTAACTATGC CTTGGTCGAT GAGGTTGACT TGATTGTATC AGGTGCCAAT GCGGTTGAAA
GTTATCAAAA
AAGGTGCCAA
TTCTTCACCA
CCATGCCGGT
AATACCTGTC
CAGTAGGGAT
ATATTACTTA
TTCGCGCCGA
CTATCTTGAT
CCAGTCAGTT
TCATCGATGT
GCTACTTCAA
ATGTAAAA'rC.
TGTCTGATTC
TCGAAAACGT
TTCTCGATAT
CAGGTCGTAC
AAGAAGGTGT
TCTTCCGTAT
AATI'CCGTGA
GTATTCACCA
AAGACGTTAA
TTTGAACAAA GATGACTACA AGGGATTGAC AGGGCTGAAA 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 see* 900 GGCTTTGACT CACTTTATCG ATAACGCCCT TGACTATGTG GTGAGCGAAG AGCAAGAAAT CATGGAAGGT CGTCGTTATT CTGATGGATT GCCAATCCAG GATCAAACCA AGACATCTC GTACAAGAAA TTGTCTGGTA TGACGGGTAC AATCTACAAC ATTCGTGTTA TTCCAATCCC CTCAGACCTT CTTTATGCAA GTATCGAATC GGCTCGTTAC CAAAAGGGTC AACCTG'rCTT CTACATTTCr AAGAAATTGG TTGCAGCTGG CCACTATAGA GAAGCCCAAA TCATCATGAA AACCAACA'rG GCGGGTCCG GTACCGACAT AGGACTTTGT GTTATTGGTA CAGAACOTCA TGGACGTTCA GGTCGTCAAG GAGATCCAGG TGATT?GATG AAACGT'r'1-G GTTCTGAACG GTCTGAAGAG GCCATTGAGT CTCGCATGTT GGTTGGT ACA
TGTTCCTCAC
GTAGCGGTTG AAACTAGTGA GAAGTCTTGA ATGCCAAAAA TGCTGGTCAA CGTGGTGCCG TTACCATCGC CAAGCTTGGT GAAGGTGTTC GTGAACTTGG TCAAAGTCGT CGTATCGATA ACCAGCTTCG TGAGTCACAA T'rCTACCTAT CTCTTGAAGA CTTGAAGCGA ATCTTTGAAC GCTTGAACAT GACGCGTCAG GT'rGAAGCAG CTCAGAAACG ACAAGTCCTT CAATACGATG ATGTCATGCG TTACGATGTC ATCACTGCAG A'rCGTGACTT CACGATTGAA CGTGTCGTTG ATGGTCATGC AATTTTGAAC TT-TGCTAAGT ACAACTTGCT GTCAGGCTTG TCTGATAAGG CCATCAAGGA CGATAGTCAG GTT1'CAAAAC TACGCGATGA GA'rTCTACGA GTGGTGGATA ACAAGTGGAC TAACGCGGT'r GGACTTCGTG GCTATGCTCA AGGT'TTCCGT ATrG'-r'AATG ATATGATTGG GATGAAAGCA CAAATTCATG AACAAGAAAG AGCGACTCGC AATATCGCTG C1'CACCAAGC GATTGGACCC AATGAACT'TT GCCCATGTGG TAAAAGACAA TAAAATGAGA TAGTTTAGAG 959
TGTCGAAGGA
TGAACPLACGT
AATAACTACG ATACCCGTAA GAGATTATCT ATGCTCAACG GGCACCTGA6A ATTCAGTCTA TGATCAAACG GCGTGCCAAA CAAGATGAAA AACTACGC 'rCCTGAAGAT TCTATTACGA TGGAAGACTT AGAGCrTTTC CAACGTTCCT TGAAGGTrTrA AGAAGCACTT AAAGAATTCC AAAAAGTTr AGATCATATC GATGCCCTI'G ATCAATTGCG GAACAACCCT GTGTTGAGT ATCAGGCAGA TTCGATTGAG TT-TGATGTGA CACGCTTGAT ACCACAGGCA GAACGTCATA 'rCAGTACAAC AAGTATGCCA GAAGATTTGG ATTTGAGCCA TTCTGGTAAG AAATTT1AAAA ACTGTCACGG GCGGATATCT TG'rGAAA6AGT TGGGTATCCG TTTGCTTTAT AAGGAGATGA GT'TATGGTAT TTACAGCAAA
AAATTTTTAC
AAGCTCTAAA
4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 ATAAA-UTATAG AAGA.NGTTCG 'rGCCTTCrA TCACAGCGAG ATCAAGAGCT AGAAGCCATT GTAATCGGGC CATGCTCATC TGACAACGAA GCAGTCCTAC AAGAAGAAGT GGCAGATCGT AAACCCCGTA CCAACGGAGA TCGCTATAAG GCGCCTAGTC TTA'rCAATGG GAAACAGGGA TGACAACTC GA'r'r'GATT'r CTTACATGGC GTGGCAAGTG GGGCAGGATT GTCATGTTTA ATGGGAT'rTA GAAGTAGAAA CAACTGGGAA TATGGAAAAA ATATTCCCAA GAGAAAATGG GCTTGGAAAA AAGCAGTATA TTGAACAGAT
AATCAAAGCC
TGATGAAATG
AGTTGGTGCC
TTCTACTGGT
TGCTGCTCAA
CCCGCTTTCA
CTACTATTAT
kAATAGAAG GTCAGGCTTT GGAGAGGAAA ATACGTGGAG AAGACCAGCG AATTCTCTTG GAAGCTGTCC T'rGAATACGC TAAGCGTTTG ATCTTTAT.GG TTATGCGTGT TTATACTGCC GGCTTGATTC ACCAGCCTAA CGCGACAGAA GTTCGCCATC TTCACTATCG TGTCATCACA CTrTrATCCTG AAPLACCTTCC GCTTGTACAT CG'TTCAGTTG AAGACCAGCA ACACCGCTTT T'IAAAAATC CAACCTCTGG AAATCTCAAT AACAPLACAAA GTTTCCTTTT CTTAGGAAAA CACGCTATTC I'TCGTGGTGC TCTTALATGAG GACAATTTAA TGCCCAGTAT
TTGATACCAT
TCC?1'?1ATCATCATTrGATA CCAATCATGA _CAAT'CTGGT CCGAATTGTC CGCCAGACCT TGATTAACCG TGCTTGGAAT 960 GAAAAAATTA AGCAGTTCGT TCGTGGTTT'r ATGATTGAGT CAAAATGAGC CAGAAGTATT TGGTAAGTCT ATCACAGACC ACAGAAGCTC TTrGTCAGACA AATTTACAAA ACGT'rAGGAG AAAAGGTCAA GAAATCGATA TGGAAGTCAT CAAGGCTGAA CTTGAGACTC AAGGAAAGCC GTGACAGGGA A'rTGGCAGAT CCGTATTCTC T'rGGTGATTG GTCCTT'GCTC TTCTGATAAT TGCT~CGCCGT TTATCTGCCT TGCAAAAGAA GGTAGCGGAT CGTGTATACT GCTAAGCCTC GTACCAATGG AGACGGCTAT CTTATCTGGA AGATGGTCGA CTTGCCTGGG TTGGGA'rAAC AATAAGATGG CATTTArTTA ACCCAATTGT CTGCGGAAGC ATTATTTCAG GGGAAGATGA GAAGAGGCGG TCTTGGAATA A.AGATTTTCA TGGTCA'rGCG AAAGGATTAG TTCACCAGCC AGATACTTCT AAGGCTCCAA GCCTGATTAA CCGCGTGATT ACAGAGACTG GT'rTGACAAC GATCTTGGTG GATGACTTGG rCAGCTACCA AGAGCACCGC TTTGTGGCTT CTGGGATTGA AGGAAATTTG GGTGTTATGT TTAACGCCAT TTATCATGGG CAGGAAGTTG ACACATCAGG AGCAGTCAAC GAGTATGGCA ATTATATGCC CATTGAACGC TATGAAACCA TGGGACTTGA TGATAACTCA GGCAAGCAAT ATATGGAGCA TCGTCATTGG AATGAGAAAA TTAAAAAGAC AGCAGATGGT CGTCAAAACC AACCAGAGAT AGGTTGGGAA AATACAGAGG CCTTGGTAGA GAAAAGGATG GAGTTGGGGA ATCTCAACTC ATTGACATCG AAGAATTGGC TTCGATAGAA AAGCGTGTAC TGACCGCTCA GGAAA'rGGAG ATAGAATATT TAGCTGGTCG CTGGTCGGCT GGCATTAGCA AGC'rCGGTTT TCAGGATTTG TATTTTAGTC AGGCACCATT TTCAGGAAAG TTTGTGACAG CCAGTGTCAT TTTGGAGGAA CAAGGCTCTG ATTCATCTGG GAGCTATTCG CCCTCAAGGA ACGCTCAAGT TCGCTGTGGT TGCCGTTIGCC AAGGCAATTC AAGATGATGT TGGCTTGCAG GCTGTGCGCC AGTTGCACTA GGCAGATGAG ATGCTTTATC CGTCKAATCT TGCCGTTGGA GCTCGTTCTG 'PGGAAGACCA TGCACCAGTA-GGGATGAAAA A'rCCAACC'rC CTATGCTGCT CAAAACAAGC AAACCTTCCT TAATCCTTTG GCCCATGTTA TCCTCCGTGG GAATTACTAC TATGAAAATC TACTCCAAGC AAATCCTTTT1 A'rCCTCATTG ACACCAACCA GATTCGAATT GTTCGCCAGA CCTTGCAGAA GGT'rCGAGGA TTTATGATTG AATCTTACCT CT'rrGGT'rGC TCTAT'rACTG ACCCTI'GCCT AGAGATTTAT GTTACCTTGA CAAAATAAGT CTTTTGATGA GAATGATAGT TGGACACGGA AGCGCAGTTA CACGACATGA AGGATTTGCT CGCTTCACCA GTCTCAAAGG ACGCAGGCAA AAGGAGGCCT TETTCCAAGGC TATGGGAACG GAAGTCT'rGA ACAATGAACG TGGGGCGCCT ATrTGGCTGT CTATCACCCA CACCGATCAG AATCATGAAA GCTAGTCCAC 
ATAGACCAAC ACAAAATATT CAGCAAATGG GGGC1'CATAT TAAGGCCAAT GCTTATGGTC ATGGAGCTGT TGATGGCTTT TGCGTTTCCA ATATCGATGA 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 V00.0 0 0 .000 961.
AGCCATTGAA CTCAGACAAG CTGGACTCAG CAAGCCAATC CTCAT'rTTAG GAGTTTCTGA AATCGAAGCr GTTGCTCTAG CTAAAGAA'rA TGACTTCACC TTGACAGTGG CITGGACTGGA GTGGATTCAA GCACTCTTAG ATAAGGAACI' GGACCTAACT GGATTGACAG TCCZACCTCA.A GATTGATTCA GGGATGGGAC GGATTrGG=r TAGAGAGGCA AGTGAGGTTG AGCAGGCTCA AGATTTGCTC CAACAACACG GTcTTGGT TGAAGGAATC TTTACCCACT TTGCTACTGC TGATGAGGAA TCAGATGACT ATTTTAATGC CCAG~rAGAA CG1r AAAA CTATTTTAGC TAGTATGAAG GAAGTTCCAG AGCTGGTTCA TGCTAGCA.AT TCTGCAACGA CTCTTGGCA TGTAGAGACT ATTTTCAATG CGGTTCG'rAT GGGAGATGCC ATGTATGGCC TCAATCCAAG TGGAGCGGTC TTGGA'rTrGC CTTATGAT' GATACCCGCC TTGACCTTGG AGTCTGCTCT GGrrCATGTC AAGACAGTTC CAGCTGGAGC TT'GCATGGGC TATGGAGCAA CTTATCAAC 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 GGACAAGAGA 8700 GGATAGCGAG CAAGTCATCG CGACCGTGCC AA'rCGGGTAT CCAGATGGAT CATGCAAAAT TTCTCTGTCT TGGTAGATGG GATGGACCAA ATCACTATTC GATTGCCTAA GATTGGCTCC AATGGGGATA AGGAAATCAC CAT'rAACTAT GAGGTGTTT GCCTCCTCAG AAGAAAGGAG TGGAGCATGA ATCTACATCA AAAGTCAGCA GAAAAATACG CCAA.ACTAGG CCAAGCTTGC CCAATTGTCG GCAGGGTTTC GCT'rTATCCG CTAGGA.ACCA AGGTAACCTT TGCAACTCAG GTAGCGACCT ACCGCGTAAC CCACCGTATT CCGAGAGAAT ATTATTAGAA CTTTCCTTTC CGTTATGAAG GAAGGCAGT'r C'TTTCTGGTC GCGCALATCGC CTGCGTTTTA TAACCACCCC TATCTGGCTG ATGGGACCGC GCTAAGGCTA CCTCCAGCCT GTCTATCcGrC CAAGACGGCT TI'rGATCAGG ACTAGACAAA TACAAACTCA
ACTTCAAAAC
AGGTAGTGAC
GTCTCAAGCA
ATAAAATAGA
GTCTGACTGG
TGGCTCAGGG
GACTGGACCT
TGTCCCGTTG
ACCCTTGCAT
AATTGAAAAC
CAAGCAGCTG
TCCTGCTAGT
GGGAGAGGTC
GTTCGAGCA
GTCTTGCCTG GTGTGGGACC" TTGCAAGATC TCTTGCTCTA CTGGAGCTGG AAGACGGTGA GTCCAGTATT ATGGTTTCAA GTTTTTGCGG 'rGAATTTCTT ACCCTTGCTG TCTTTGGAAA 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 GATGAAGGTT CTCGCTCAGG TAGAAGATGA AATCAGTCAG GCC-AGTCTGG TCAAGGTCAT GTATTrG4GCA GAATACAAGC AGGCTCTTCG CCAAATGCAG CTGCAGATGC TCAAGTCTGA GAATTGGTCT CACGAAAAAG TGACAGCAGT AGCTCAGGAA AAGAGTTTGC AGGAAATTTTI
CTTGATACAA
TCZAGGCAGTC
CCGTATAAAG
AAATAGAGTT
TAAAGTAAGT
GAAALArCTrCC
CGTGCTATGC
TTTGAGGAAC
CCCAGTCITTT
ATTTTCCAAA
TCTTTTATTT
CAGGGAAGTG GTC'TGGTTCT CTTCCTT'rTG CCCTGACCCA AACTGATA'rG AAGTCCGACC ACCACATGAA
TCGTCTCCTA
TGCGGCAGTG
GCAACACTTT
TTCCTGAAA
TTTGATTATA
GATTATTATC
AGGTGACAAT
962 CAAGGGGATG TGGGGAGTGG AAAAACGGTA GTCGCTGGCT ACAGCAGGTT ATCAGGCTCC CCTAATGGTA CCAACAGAAA GAGAGTTTAC AGAACCTTTT1 TCCCAATTT'G AAACTGGCTC GCTGCAGAAA AGAGAGAAGT CTTGGAGACC ATTGCCAAGG GGAACTCACG CTCTGATACA AGATGGGG GAGTATGCTC
TGGCCATGTT
TCCTCGCAGA
TCTTGACAGG
GTGAGGCS'GA
GTCTTGGT'TT
AACAG ACCTTUTU TAGGCAA CCAGATGTCC TCATGATGAC GGCGACTCCC CACAGCCTr'r GGAGATATGG ATGTTTCCAT TATCGACCAG TATTGTGACG CGCTGGATCA AACATGAGCA ACTACCTCAG GGAAATTCAA AAAGGT'rCCC AAGTCTATGT CATCTCI'CCT TCTAGAT'rTG AAAAATGCCA TTGCCTTATC AGAGGAGTI'G GGCAGAGGTG GCTCTTCTAC ATGGTAGGAT GAAGAGTGAC GGATTTCAAG GAGAGAAAGA CGGATATTCT GGTTTCGACG
S S CAACGTTCCC AATGCGACTG ACTTCACCAG CTTAGAGGTC TGCTAATCCC AAGACGGATT TGGATTTGTC CTTGCGGAGG CAGACAGTCA GGACTTCCAG AGAAGAAGCA AGAAAGGTTG AGAGTGGCGC ATGATTGCCC TAAGGAAAAC TTATACTCAA CTCAAAACAC TGTT'TTCAGG TGAGGTGGCA GATAGAACTG CGTGG.TTTGA AGAGATTTTC AGAGTCATCA AAAAGAAACG .AACCACTTTC CAAAAGAAGG TCTTTN'TCG 'rTTCTAAAAC CACTTTCTAT TTACCCTTCT CGATTACAAA ATAAAAGGAG CCTA'rAGAGG AAGGGATGAG TCATGATTAT CATGGATGCC GTGTCGGTCG GGGGGACA.AG CTGGGAAAGA CCGCATGCGT AAGATTTGAA AATGCGTGGT AGTTCCAAGT CGCTGATATT CTAGCTACAT TAGTTCTATA TTCATCTGGA AAAGAAAGAA TGAAAATCA-A AGAGCAAACT TTGTrCGATGA AACTGACGAA AGGCGTATTT TACGGGAAAA ATTCCACGGA CGCTTGCCAT ATGCCAGCAG GTCGGAAGCC GTCTTGACTT GGTTAGAGGG TTGATTGAAG AATCAGAAGC ACGACTCAT'r r'rCAGGCAA GAAAAAGACC AGATCATGCA ACGGT'rATTG AGGT'rCGGGT
GATCGCTTCGCGTCTCAGTCA
CAGTCCTACG CTGTTCTCGT ATCATGACAG AAACGACCAA TCTGGTGACA TTrTTTGGAAC ATCGAAGATT TTCCGATTT'r GAAGCTTGGC AAGAAGATCC CATCTGGATT AAGCT'rTCTC AGGAAGCTAA CCGCAGG'ITG GTCAGCTCAA AACACCGTTT 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580
ACGAAGTCAG
GAAGAGTATT
AGGACTCTCA
AAGCAAAAAA
TACTTTCAGC
TTCTTGCATT
CATGCTA'rGA TAACATATAT ATACGGTAAG GCGACGCTGA AAGCTAGT'rT TTAGGTTTGG CTCTTATACT TATGACAGTA ACGATTAA.AG TAAATTACCA CTAGTATAAA CAGAAGAGAG AGCGAAATGC CCATCATCCT AAAAGTAAAG AATCTAAA1 5 r GATTACATAG ATATGCTACA GTTGTGGTAA AAAATCCAGC TTTGCTAGAA GAAATTAAGA GTTCCCGAAG ACTTTGATGA TTTCTGGGAT GGGGAAGTGA
AAAATGTTTC
TCAAGTGCTA
TTCTTCCAAA
GTGGCTGGGA
TGGATGTGCG
CCGTGAAGGG
ATGTT'TATCT
963 CACGCTrCCA TCCTACCACT TGGAGGAAAG AGATTrCCAC AT'rCCTCAAG TGAGTTAACA TT'rGAAGGAA GCAAGGAAGG AAAGGTCTAT GCACGCATTG GAGTGAGGAG AAGGTCCCA'r TAATCTTCCA TTTTCATGGT TATATGGGAC CTGGGCCGAC ATGCTGGGCT TCACCGTAGC TGGTTACGGT GTTGTTTCCA GGGCCAGTCA GGTTACTCAC AAGACGGCTT GCGTTCTCCT TTAGGAAATA GCATATTATC CGTGGTGCTG TGGAAGGTCG GGACCACCTC TrTTATAAGG GGATATTTAC CAGTTGGTCG AAATTGTTGC TAGTCTGTCT CAGGTTGATG AGAAGCGTCT TTCTAGCTAT GGTGCCTCAC AAGGAGGGGC CGCTCAATCC TCGAATTCAG AAAACAGTTG CCATTq'ATCC GGGTGATTGA GATTGGTAAT ACTAGCGAGG CTTACGACGA TTCACGACCC CTTCCATGAA ACAGAGGAGG
55.5 TCAAAAATCT TGCCCATCGT ATGTTTGCTA TCCCATTACC ATCGCATCAT GCCTGAGTAT ACAACTGGCT CTGTGGAAG'r AGCACAAAAT CTTAAAAATT CTATGCGTTT TATCATGGA.A ACAGTCAAAT CGATTTCTAA CTATTATATT TATAGAATTT TCAGAAAGCA AAAGCCGATA TGTAGGTGTT AATACTTTTC
ATCCAAGGTG
CAGTrTGCGA.
GCTCACGAAG
GAGATTCCTT
ACAAACACGC
ATATAGTAAA
CAATGTTTTA
TTTGTTGCTA
CCTATCGAGT
AAAAATCTCT
AAATCATGGC
AGGTTAAGAT
TTTATAATCG
CCATGAATCT
TTAAATATCT
ATAGTATCAG
ATGAAATAAG
TCTAGCTCTA GTTGCAGCAG CTrCTTGTCA GACTrCAGAC ACTTTTCCGT TATTTCAAGT GACCCTTGCC TATATCGATG GATTACGGGC TTGGACGACG TCTGACCTGC GATAAAACCT ATTTGTCAAT GACCAAGTCT AAAATAAGGA GTCGACTCTA GGGATTAAGA AAACTTTATA AACAGGACAA ATCGATCAGG 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 12660 12720 12780 12835 GAAACAAATG TGTACTATTC GATTTGTCAA ATTGCTTAAA AGGGTAG TC TTGCTATCGT TCAAACCACG TCAGCTTCGC
TAGTGTCAAT
ATAATTTTTT
CAGGCTTGTC
CTTGC
INFORMATION FOR SEQ ID NO: 142:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5020 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142: GGGGATATGA AGAACAAAAG AATATTTAAA GACTTCCAAG CTTCAAAAAT GAGTTTAAAC ATTTACACAA GCCCCTTGTT AGCCTTTGTT TTTGTCTTCA TAGGAGAGTT TGTGGCTTTT
ACTTTGTATG
GGTCAAAATC
GArTCGTT
GACAAGAAAA
CAGCAATCTT
TTTAG'rcGTC
CTTTGTCGC
CCGTGCTTGG
ATCTAGCCTG
AGTAAATCTC
GTATTGGCTT GTTAGCTCTC TTGCAAGCTA CTTGCAGACC TAATTI'ACG ATrACTGGCC GTTGAGAAAA GACCTATTCG CTGAAAGGAT T'IAGTCTAGG 'ITAGGTCAAT ATCGTTTGGA TTTACTATCC CATTTTrGGAT CTACTTCCTC AAT'rCGCCTC TTCTTT ACCC TGCTTCATAT TTTTrTATTCG GAGTTGCCAT 964
ATCGGACTTG
TTGCATCAGA
?1"rGG'rTATT
AACCTTGGGA
CCTGGCACTT
ATCCATTCAC
TTTACAGGGG
AAGAACCAAT
GGGCAATTCT
GGCTCTTTAC
CTAGAAATTT TGGAGAGGCT GCTTGACGGA TAAAACAAGT CTTAACACTG TGTTCAGATG TTTTATAGAG AGAATTTCCT TTTCTTCTGA CCTTGTTAGG TTrGAATCCTT ATTCTCTTGC ACAGCAGAAG AAGTGGTGGC CTAAAAC'rAG CTATTCTTAT GGTCTCACCC CTCTATCTCT CTTCTCAAALA CTGATACAGT TTGGGGTGTT GCAGGTATTC ATGGTGCTTG GAATTTTGCT CAGGGTAATC TCTTTGGGAT TTTAGTTAGT GGTCAACCGT CAGaACGTCT CTGATGACCT TT'ITACCACA AGGCAATCAA GATTGGC'rAT CAGGTGGTTC TTTTGGCATA CTACTGCTGA TTGTCTATCT TGCTAATAAA CGGTCCGTCC TT'TCTTCGT GAAAATACTA AGAGAGCATr CTTATGATCA ATCACATTAC TCAACCATCA GGAGATCAAC CCCAAGCTAT AGAAAAAGCT CAGATTCTGA TGGGGGCGAC GGTCATTTCT A.AAGTCAATA AACCAACTCT TCAGCTCTAT GGGGAG'rN'A AGGAATTTN' CTACTATGAT TATrACCAGC CAGAGGCCTA GGATAGT'rCT GTCAATGACG AGATTGACAA GGAGCGTAAT GATG'rTATTG TCGTGGCCTC CAAGGAATAC GCTGATAGTG TCGTTAGTCT ACTCTTGAAT GA GGTCG ATATTCAGTT 0:0:.
GAAGGTTCCA TTATGACAAG 'rCTGGTAT'rA TTAAAGAAAG AAAATGAAAG GATGTGACTT TAAGTATGCT AAAATAGGAA TAGCACATGG AGATAATCAA TTTAAACTAG TATCAAAATA CGAGCAGTTG GTGdATAACA*TTGAtGGGGGG TGGAACAGGG AAGACCTATA CTATGAGTCA GGT'rATT'rCC CACAATAAAA CTCTGGCTGG CCCTGAAAAT GCAGT'rGAGT ATTTCGTATC TGTCCCTTCT AGCGATACCT ATATTGAGAA GCTTCGCCAC TCAGCTACCT CAGCCCTT'TT AGTCTCTTGT ATCTATGGTT TGGGTTCGCC CCGTCCTGGT CTAGAGATTT CTCGTGATAA TGAACGTAAT GATATTGATT TCCAACGCGG GATTTTCCCA GCTTCCCGAG ATGAACATGC TGACCGTATT CGTGAAGTTG AGGCTCTGAC AGCGATTTTC CCAGCGACAC ACNIGTGAC AAAGATTCAG GCCGAGTTGG AAGAACAATT TGAAGCCCAG CGTTTGAAAC AGCGGACAGA
AAGATTTCGC
CTTTCGAGTA
AGGTCAGGTG
CAATGACGAC
AGCTGTCTTT
GTTCGTcGGGG
GAATTTTT'TG
TTGGGAGAAG
CACATGGAAG
GAAAAGGAAG
ATGTGGTAGA
GAGACGAAAT
TGGATCAT
TTGCCATTGC
GTAA.ACTGCT
GTATGATATC
CCACATGGAT
TGATTTTTG
CAATGGAGAC
GAAATGTTGC GTGAGATGGG CTATACCAAT GGGGTTGAAA AT'rM-rCTCG GGACGGAGCG AAGGAGAGCC TCCTTATACG CTTCTCGAC'r TCTTCCCAGA AT'rATGATTG ACGAGAGTCA TATGACCATA GGGCAAATCA AGGGCATG'rA CGTTCGCGTA AAGAAATGCT GGTTAATTAT GGTTTCCGTT' TGCCG'rCTGC TTGGACAAT CGTCCTCTCC CGI'rCAGCG ACACCGTG CATTCGTCCA ACGGGACTCT TGATGACCTC T'rGGGTGAAA AACTTTGACC AAGAAAATGG GTCGGGAGGA GTTTGAGAGT CACGTTCATC AGATTGTTTA ACTATGAAAA TGAACAGACC GAGACAGTGA TTGAGCAAAT TGGATCCAGA GGTGGAAGTC CGTCCCACTA TGGGACAGAT TCAATGCCCG CGTTGAAAAA AATGAGCGTA CCTTTATCAC CAGAGGATTT GACCGACTAC 'rTCAAGGAAA TGGGTATCAA 0 0 oo .00.
0 GGTCAAGTAC ATGCACTCGG ATATCAAGAC GCGCTrGGGT GTCTTTGATG TCTTGGTCGG TCCTGAAGTG AGCCI'CGTAG CTATTCTCGA ACGTGGACTC ATCCAGACCA TTGGACGTGC GTATGCGGAC ACGGTTACCC AGTCTATGCA CAAAATCCAG ATGGCCTATA ATGAAGAACA AATCCGTGAC TTGATTGCTG TGACCAAGGC TATCAATAGC CTCAACAAAC AAGAGCGCAA GCAAGAAGCA GTTGAAGTGC 'rTGACTTTGA GGAAGTCAAG GCCTTGGATT AGGGGAATAG AGATTTGATG TCTTTATGGG AAATGGCTTA TGATGCTCCC TATTATGATG ATTATCAGTA AAAATCAGAA TCCATTTTAA GCAACTCAAA AGTTGGGACT G'rrrCGCGTT ATrrGGGTATG TGGTATTTAT GATAAAAAAT TCTGGAACAC GATAGATAGG ACGTTTrCAGG ATTACTTGGA AGGAAATATT GGrATGATGA AACTTGCTGA TCCAAAAGTT CG'rTATTATC AAGGTAAATA AGAAGACTGG GAGAAAATAA ATGACGGTTA TAGAAGGTAA ATCCTTCGT'r CACTGGCAAA CTTCGAACGG ACOCAGATTA TCCGTGACCT AATTAACCTG CTCCGTGAAG GAATTGACGT TGCTGACAAG GAAGGTTTCC TTCGCAACGA TGCACGTAAT AGCGAAGGTC ATGTTATCAT ACGTGCTATC GATGAAACTG CCCGCCGTCG rGGTATCGTT CCACAAACCA AGTTGCTAAG GAAGA.AGACA AGAACTAGTC AAAAAGCTTG ACTAGCAGCT CAGATTCGTG
TCAAGAAAGA
AGGAAGTCGA
AGAAACAAAT
ATATGATGCT
1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 TATGATTTAT T'rAAGAAAGT TAAAGAAAGA TTCACAGCTT AATCCAGTTT GGAAACAGTA TTTTTCAAAT TT'rAAAGAAT TCGAACTACA 'rCGCCTTGGT ATTrTGTTG ATGATAAACT TAAAGAAACA AGATGGATGG TGGTATTGGG AAAGTTGCTA GTTGGAGCAT CTGGGTTTGA AAAATTAAGA ATGAAAAAAG TTTTGACAGT ATTAAATATG TTATCAAATC AATGGAAACT CGTGGAGAGA GGCTTATGAT
AATTGGGAAT
TGTTGCAGTG
CAACTTGGTC
AAGCTCATAT
GTATTTTGAG
CCTGAAGAGA
GACCTTTTGC
966 CTGCGGAATT TCAGGAGACA ATGACATTAG AAAGATGTCG ACI'C'I-IAGT CAAAAG'rATC CAGAAAATAC ATTGATTGCG ATGGATGGTG TGAAGATAGT TGGTTTTATA AGTTATGGCA ACTGTCGTGA TGAGACTATT CAAGCTGGTG AAATTATTGC TTT-ATATGTr TTAAAAGACT ATTATGGAAA AGGAATCGCA TTTCTGAAAT T'rTCTTATGG AAATGGGTI' TACTTTTGAT AAAAACGGAT GGTATTCTAT GTCTGAAAAT TTAATAAAT'r TTTAACAAAT ATCTGTCTGA CGCCCCTGAA AACAATCGCT TGAGGCTTTT GCGGAGTCTG AAATCCTCAA CTTGTCGGCA CATTTGTTAT AAGGCGACTG TTCTTGGGTG CAAAAAGACC GATGGAA.ATG ATGGAAGCTC CGATTGGGAA AAGAAAATCT CTCGATATAG TGGAGGTCTT CAAAAGTTAG TGAAAGCAGC TTrGACTGAT CTrAATCATT GTATTGAAAG ATAACAAGCG CGCCATTGCT TTrCTATCAA.A GGACAAGAAA AAATACTTGA ACTrGGAAAG CCTATAAAGG TCTAAATAAT TCTCAAAAGT AAAAGCTAAT ATGGTACCAA AGAAAGCGAG TAAATTTATG TCCCGTTCCC AATTAACAAT TTGAAGACCT CGAAACTCAG CGCGTGGTGA TGCAGTATCG GGTCTGGTTA TGCCTTTCCT GGAGGTCATG TAGAAAATGA
TCATTCGTGA
TTAAAAATTG
AGTTCTCTGG
AGATTCCAAA
CCGACAAGTC
TCTAGTCTT'r
GTTGTGTCTC
AATCTACGAA GAAACAGGGT TGACTATCCA GCCACTAGAT ACAGGTGGGC TACCCTTCAA TCTTCAGAAG CTTAAhATCTG GCCTATGATA AGAGTTTTTC TACCCTCGCC TACTAAATAA CCTAGCTGAT GGCTTCAACT AGGTGATAAT ACGTTCAACT TGTTCTAGAA TTCAATCTCT CCATAAACAG TACGATAGCA AGTAATrCTC GTGATGATAA GTATTTGCAT
GCTATATTGT
AGGGAGAAGT
TGCTACCATT
GTACAGAAGA
CCAAGGCCTC
GAATACCATC
AATGTTGCAC
GATGATCAAT
GTCCAATTTC
TAGCATCTTT
AAATTGCAAT
3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5020 TGTTAACTCA GAAATTGGCT TAAAGTCAGA GTCGCGGCGA CAGGTCAGTT TTAGTAAGGT CAAGATATTT TGAACGCGAC CACCATTATC TTCAACTTCA TGCTTGACCI' TAAATAATTT ATAGATATAA CCACGATTGG TAGATAGAAT TGGAGATCCA TCAGCTCTTA ATCTTGAACA ATAACT'rGTC GAGTGACATG AAAGTGCTCA
INFORMATION FOR SEQ ID NO: 143:
SEQUENCE CHARACTERISTICS:
LENGTH: 4965 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:
AAAAAGTGGC AATCCATTGA TTGGCCACTIT CATTTAGAGA ATTATCGTCT CGCCCTIGAA GAAGAAGGTC GTGTAGTACT GAGCTCGATG GAGT'rGGTAG TCACT1TCCGC CAATAGCAAA TAGCTGGTAA CAGTCGAACC GGTTTTTGAC CTGTTGATTT CGCTCGTTCC CCCAAATGCT TAATTAGAGT TATTAGAATA ATCCAGCCAC CTAGGGT'rAA TTGGTTGTAC CAGTCTTCCC AGGTTAGACT TGAAGGTTGT ATAATCGTCG CAGTAGCTTT ACTCTACCAT CTGCTGCT'rC TTAGCTAAGG TCTGATAGCC ATTGGCAAGC TCTCAATACC TTGACATCAA CACCCTTTTC GAATAGTTCA GAGCTTCTCC ATCGGATTGC CATTAGCAAA AAGCCCTGGT CAATAGCAAT CGTTTGGTAT CAAAGGCA'G CCTAGA.ATAG CACCTGTTTG TrGACTTACTG CTATCGCTAG
ACTCCCCACA
GCGA'rAACT'r
TGTGACCTCT
CAAGACT'rCC
TGGGGAAGCT
ATACTAGACC
GTCGCTGGCG
ACTTCTTTTC
GAT'rTCACTA
TGCTGAATCG
AACTACTACI' TGAACTGCTG AAGCATTCTG ATAATCCGCA CTCCTGACT'r ATTAGCCCAA CTrCAACAGA AACCTTCTCT ACCTGCTCTA CGTGACAATG TCTAGGTGTC GAAAGCATGA
AATCCAATCT
TGTCACACGA
TGAATAGACT
GCAT'rAGCCA
GAGGATAGAA
TGAACCGGTT
AATCTTTGAA ATCACATGCT ATTGGTATGC TGGGCAACTG GTACTCAGGA ATCTCGTAAC ACGGACCATA CGATAGGTCC CAAGGTCATC ATTCCTGTTC GTTTGTTGGA TAGTTAGATA ACCGTAGGCC AGCAAGGGCT ATTrATTTTGA TTITCTTGA'r GTTATCCATC AAGACATTCC
CACTAGGATC
CATT'rACCAG
AATGATTATC
GCCACATATT
GAGTAGGATT
CTTCTCGTAG
TATCCTGATA
TCTGATGATA
TGACTTCAAT
CCATCTTTTC
AGTAAGCAGG
CCTTGCTATT
GAATCGTTTC
TGGTAGTAGA
AATTACGACC
CTACTTCTAC
GCATGGCAGA
TTTCCTTGGC
CGGAGACATT
T'TGTAAAGTA
TAAAGCAAAG 360 ATGAGCCATG 420 ATCATGCCCA 480 TTCGTCTTGG 540 TAAAGAAGTC 600 .CAATCCCTGC 660 CTCATACACC 720 AACTCCArrA 780 ACCACCACCC 840 CATATAACCC 900 GATATTCCAT 960 AGCATACATA 1020 ACTTCCCATC 1080 AGCTGGCGAA 1140 ACCTACAAAG 1200 ACGACCTGTT 1260 ATGAATTTTC 1320 TGCCAAATrCr 1380 GTCTCTCTGA 1440 TAAAT-AGTCT 1500 ATAATCCTTG 1560 AAGAACTGCC 1620 AGTATTI'TCA 1680 TTGAGGAACA 1740 CCCTGCAATA 1800
CCATCGTCTA
TGATCTATGG
CGATAAAACT
GCTAGATAGT
CqTGAAATTC AAAGATAGCC ATAATCAGCA ACCGCACTTT TAGTAGTAAT CTTATAACCA CCATTTTCAA TCTGAGT'rGC CTCATTTTTC AACTCCTTAG CATACATACG TTCTTGAGCT TCTGCCAAAG CTGTAACCGT GCCCGATGGT AAAAAGTCCT GTTTAAGGTC TACTGAGAAT ACTCGTCTT'r GCTTAATGCA CCTGTACGAT ACATACTGTA TTAGCCCGTC TTAAGCCAAT TTCTAGGTCT TCATCACTCT TCAACTCCCC TAAGGAGAGT AAGTAATGGG ACTCTGTGGA AGTCCTGCTA AAAATGCTGC GTCAACTGAC TGGCATCTAC ACCGAAAATT CCCTCAGCTG CTTGCCGAGC
TCTGTCCCT
TCTTTATTCA
AAGGTCCGCG
CCACTAGAGG
ACTACACCCT
TTT'TCCGAAA
ATCACCGTCC
ACCAAT'TCT'r
GCAATCCCAG
GCTTTTATAC
TTGGTACTTG
AAAAATTCCA
TTATCTATTA
'GCTAGGCTAT
TATTATTTCG GCCAAAGGGA TGGCGCGTTC CAAGGCAAGA CATCCCCAAC CACCTGCTGT AACCCAAACC TACAAATTTC TATGTTCTTT AAAGTGTTCA TTGCTCAGA TGAGATAGAA CGTCCGAATA GGTAATCTCT 968 GCCACAT'rGA GATAGGTCGT TAAAATCTCA GCATCCACAA TCTCTGCCGC CTTACGAGCC TTAATTAGTT CT1GGGTCAA GGTTGAACCC CCCAAGGTCG CACGAATCAC CGCCTTN'GGT TC'rTCTGTCG CAATGATAGC CTTCTTCAGA GTGCGCAACA AATCACTCTC TATGGAAGCA GAAATAGAAG AGATGTCCTT GACCGATTC CTGTCTGAGG CACCCGAACC TTGTCAA.ATA AGGCCACTCC GTATCCCAAA CTCCCAACN1' TCCTCCTAGA AAACCGAGTA CAAAGAGTAA GTTAAATAAG
TCAGTAAAAT AGCTGGGAAA AACCTTTCTT GCCAGGTCTA GCATTTTTCG TT'rTAATTCA TACCACAAAA GGGAAATTTI' TGCCCAAG'T- 'rGTGATACAA ATGACTGACT TATCTAAGGT TTTIAGATT'TT GCTGAT T' TATTTTTTTG 'TTrTGCTGG AAAAACACAT GCACATTTTT GATGAGCTAA ATGAAGAAGC TTTGCGTAAA GCCCTAGAAG ATCCAACTGC TGACAGCCTT CACCTAGCC TGCAACTAGC AGGTCACAAA CCTTATGCGC ATCCGTCCTT CAAAGATGCT GAACGTAGTC TCAAGTCTAT CCAAGGACAA CTTTCTCGTT CTGTCATGGT CAACAACTAC GACTGGTTTG ATATTGGAAA ATACTTCACG GTCAACTACA TT1TAATTGAT
CAATAAAATA
TAGGTAGAAA
AAGAGCGTGG
AAGGTCAAGT
ACCTTGTCGC
TCG'N'GGCGG
TCCAAACAAA
TTCTTGACTT
GCAGCATCAG
TGATGAGTAA
T'rTGCATGGA TTTCCTCACT GCCACTTTCT TCCCTATTCT CAATAAT'rrT AAAAAGGAGA TTTGA'rATT'r CAAACGACTG TTCT'rATTAT ACTGGCTACG AATCTTGACA AGTCGTCGCT TGCTACAGGT CTCATCGGAG AGACACAGTA GATGGCTGGG TGAAAATGGC GAAAACAAGG CTTCATTGAC TTCCTCCGTG GGAATCTGTT AAAAAACGGA 1.860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 TCGAAACAGG AATTCTTAC TCGTCC AA CCAAGACCAT ATATGACAGC TGGTACCGAA CTGT'rCCACT AATCACAGAT TCTGGCTCAA TCCCGAAAAG TGGACGCTGA CGCTGTTCGC AAGATATTCG TAAACAATTT CTCGTGAAGT TGTTACACTT
ACTGAGTTCG
AATGTCACTC
TTGCTTCGTC
GCAACTGGTA
ACTTCTCCAT
CTTACCAAAT CATGCAAGGG TATGACTTCT TTCAAATCGG TGGTTCTGAC CAGTGGGGAA
GTAAGGCGGA
AGAAATTTGG
ACGAAATGTA
CAAGACTGGT CACGTTATCA TAAATCAGAA GGAAATGCCG CCAATTCTGG ATGAACGTGA CTTGTCACTT GATGAGATTG CTTGGCTCAA AAAGTCTTGG TTCTTGAAAA TCT'rTACT= GAAGCAGCGC CACACGAACG GTTCACGGAG AAGAAGCCTA CAAAGAAGCA CTTAACATCA 969 CTGAGCAACT CTrTGCAGGA AACATCAAAA ACCTTITCTG'r CAAAGAGCTC AAACAAGGAC TTCGTGGTGT GCCCAACTAC CAAGTACAGG CAGACGAAAA CAACAATATC GTGGAACTGC
TCGTCTCA'IC
CCATCTACGI'
AGTrAGAGAA
ACTAAACTAT
TTTTGCTGTT
GCAAGGTTGT
CAAAGCCAAT
TTGCAAAGA
ATAAGATATC
CTGATAGAGT
TGGTATAGTT AACTCAAAAC AAACGGCGAC CGCATCCAAG TGAACTGACT GTTATCCGTC TCAACATTTA TCTATAAACA AATAACTCTC ATCTATCrAT TAGATTATGT AAGATAGAGA CAAACTACTA TTTACGACAA TGATGATAAC GAATCCAACT ACTCATCTGC TTAGAAATAT TAAAGCCGCT GAGTCATTCA
GCCAAGCCCG
AGCTTGAC'rA
GTGGGAAGAA
TGAAGACGTC
TGTCTTGAGT
AAAATAC'N'T
CAAAACGGAG
GACGCTGATA
GTATTGACTT
AAGGAGTTAA CCTCGAGAAA GGTAACTCCT TTTTAATAGA CAGGCTACGC AGGACAATC
GATTTGAAGG
CGGTATCCTG
CTTGGAAGAA
CTGCACTCTC
ATCCATCTCC
ACTGAACCAA
AATATTTTTC
ATCCAAACGA
AT'rCATCACC
AACCATCAAA
TTAAATAAGC
TTGATGAGTG
T'rATCTAACA
ACACCGATAT
ATAGTGTGAC
3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4965 CTGCT'rTCTG CAGTTTCTCT ACTAACTCAA ATTTCCCATC AGGTTTCAAG TCTGTATAGA CCTGATCAAA GGGCAAATCT TTGACTAATT CCTCTGTCCT AATCAAGGTG TCTCCTGTTG
CCAGAATCAA
TCAAAGGAGT
TTTTTTCCCC TGTGCCTTAA GTTTATCCAA GGCTGT~TTTT ATGAATGCAG AACATTCCAA TCAATTCATT TTGATAAGCC GATTGTAGTG ACTCTTGTAC TCTTCAAT'rA AAGCATTTTG TTCTGAACTG GCTCATCCTG CATCAAGACA TAATTCCCAA TAAGAAC'rGG TTGGCCATCT TGATCCCCTT GCTTGCGATA TATTGGAGTT TCCCATGCAT TTCCTCATGT CTATCTCAGC TTGCTTGACG ATGGCATTAG CAATAGGATG ATAAATGTGT
GCTTCTTTTC
AAGAATAAGA
ATATGAATCT
ATATGAGATT
TCAATTCCCT
TCCTCAAGAC
AGGCACTGAT TCTGAGAATA TCTTCCTCAC TATAGTCTCC AAAAGGTAAC ACCT'rTTCAA CTATAGGATA ACTAGTTGTG ATTGTTCCTG TCTTATCAAA CAAGAAAGTA TCAACTTCCA GATATTTCTC CCTGTTGTGG CCTCTGGCTG TCATCTCTGT GCTGG
INFORMATION FOR SEQ ID NO: 144:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 3232 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:
CAGGGGCGTA TTACGTGACA ATTCAATGTA GGCTGTCGCT ACTTGCGCCA AAACAAGGAT TCGATAATGT CGGATGATAC TAACGATTAA ACCGAGCAGA AAGGATCCCA AAATTCCCCA AACTGCAATA TGCAAGCTCA GAAAGAATGC CTTTTGATAT AGTGGTAGAT ATTGTTCAAC AATGGATCAA TCCAAAAATA GAACCTCCCA T1CTAGAAATA ATACAGTTA'r TGTAGCACTT~ AAAATCTTCT TTGGATAATA TCTA~TTP ATTGCCGTTA TAAGGATTTr TATCATAGAC ATAAAATTTC TGAAATTTCC AA6ACAAAATA TwMAAAACT 'rTTGAAAAAG AGTTAAGATA
TTTTTGTAAT
ACAAGCAATG
CAACAAGTTA
ACT'rCTTT'CC
TTCTTGCTTA
ACACAAAGTA AACGCTTrACT TATTAAGGAG CAGAAGGTCA TGTAGATTTC ATCAATACCT TTCCTAAAGC AGCATTTGGC TATATCGCTA AGTGATTr'rA GCG'rCAGGTT CTTT'TTAGT'r TTTATGATAA AATGGGAGTG TCGCAAAAAA TGAGTAAAAC TAGGAGGATC GATCGTTCGC CGTGAAAAAA ACGT'rTTGAA CGTACTGCAA AGAACA.ATTA CACGATAAAA TGGTAAAGGA AAAGTTGCTT CGTTfCGTAAG GATGCTGTCG 'rGACTTCCTT GGTGTCGAAG GGCAACCCAC ATCACACACT T'rTGACAGAC GTTGAAACAA AAGCTTTGAA CGCTTTGTCA CCAAAAAGGA TTCCTTGAAG TGCCCGTCCA TTTATCACCC GACTGAGCTT CACTTAAAAC CCGTATCTTC CGTAACGAAG AGTTTACCAA GCTTATGCAG ACACGCTGCT AAATCAGTCA CATTAACGAA CCATTTAAGC CGATTTCTGG CAAGACATGA CCATGTCTAC AGAACATATG TGGCTGCGCT CCGCGAACAA ATTCACAAGA ATTAAAAGAT ACGAAACAGC TACTATCGCA TTGCCCACCT TCAAGACCGC GTGAAGAAAA CTACCAAATC GTGAAGTGAT GCGTACGGAT TGTCTAAGGC TCTTCGTCCT TTTACCGTAA ACGTTACCTT CTCGTTCAAA AATCATCTCT TGGAA.ACACC TGTTCTTCAT GACATTTTAT GTCATACAAA ATGATTTGGA GCCAATGGCG GTGGGGCGGG AGATACTTTC TTAAAGATT ATCCGTGAAT TGAC'rCATCG TATTCAATTT GAAGAACTAA ATGACCAGCA GGAATCGATC CTTTCGGAAA AAATATGCCA ACCTCGATA6A GGACGCTTGA TAACCAAACG GAAGGCCAGA TTCAGATCTA TTCAAAAAAG CAGACCTTGG ATGGGAGAAC TCTCTATCAA CT'rCCTGAGA AATTCCATGG GACTTGATTT CTAATCGTGA GAAATCCGTC GTTACCTTGA AATGAAGCCG GTGGTGCTGC 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 ACCACAATGC CCAAAACATT GACATGGTGC T'rCGTA'rCGC GCCTTATCGT GGGTGGTATG GAACGTGTCT ATGAAATTGG GAATGGACGC TACTCATAAC ACTTCCAAGA CATCATGGAC AAGGTGATGG CCCAGTCAAC GTGTTCATAT GGTGGATGCT CTTTGGAAGA AGCTAAAGCT CCTGAG'rTCA T'rGACTGAAG
TACCAACGTA
CTTCTATCGA
GCATTATCCA
CTCAAATCAA
ATCAGAGAAA TTACTGGTGT ATCGCTGCTG AGAAGAAAGT TCCAGT'rGAG AAACACTACA CTrGAGGTTGG TCACATCATC AATGCCTTrCT TTGAAGAGTT 971 TGTTGAAGAA ACTTTAATCC ACTCGCTAAG AAAAATCCTG GACTAAGGAG TACGGTAATG TTTTGAAGCC CAAGCTAAAG TGACTACATT GAAGCTCTTG CGACCGTCTC TGCATGCTCC AACAATGAAA TAAATTCTTA GACTGAATTT AAGGAGAAAA TTCTAAGACA TTGTTAGAAA CATTTrACGA TAGTACGCTG AATTGATTG TTCAGAT'rCA TCAACTAGGA TAGTATCTTG AACCAACT TG'TCTATGGA CATCCAGTAG CTGTATCTCC AAGACCAACG CTTTAC'rGAC CGTTrCGAGC TCTTTATCAT CCTTTACTGA GTTGAACGAC CCAATCGACC AACTTAGCCG CCAAAGAACT TGGTGATGAT GAAGCGACAG GAATCGACTA AATACGGTAT GCCACCAACA GGTGGTTrGG GAATCGGTAT
TCACTGATAC
TCCTCTGGGT
TGAAGTGTAG
TTGGTTTAAA
AAAC'TTTTCA
CTATAAATAA
CTTAAACAGT
AACAACTATC CGTGATGTAT TGCTCTTCCC CTTATCAGAG GATTTT-TTGA TTCAAAAAGA TATATTGAAA TTGAAATAGT ACACTTTGAT
TTCCCTAAGC
AAAAGTACTA
AAAArTAATA
ATATATGGGA
AATTTGTGCA -TGT'r'r'ATTT GAAATTGACT TGGATTCCCC AGTGGGATAG GAAGTTAGCG TTGATATAAG TCCATAGGTC 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3232 CTATTAGAGG ATGTTCTGGT GTCTrAT'rCA CTTGTTTrT ATAGTATTAG TAGATAGAAT CAGCAAATAA AAACCCAAAT CATTCATACC TCTCTCAACT AGATGTAACT TACAAAACCC CTGACCTCAT GAGCCACTTT C~rCCTCCTC ATGAGGTCAG TTTTACTTTC TGCTGTTCCA GTATCGTT'rT TCCTCGCTAG ATTTCCTCAA AAGGGCAGAC TCCTCCCTTG GTGCGTCACA CGATTTTTTC ATCTCGACTG TTCTTTAATG CATCATTAAC TTCATAAGGA ACAGGAAGAT TCAGGTTGAC TT'rTCTAATC CAATTCGGAA TAGGCATAGA GACTAGACAA TTTGAGGAGC ACATTTTCCC ACCACGTGAA GAAAAAGATG GCGGAAGCGT GTCACCTCCA GCTAGATGTT TGAGAAAAAG ATAGAGATTG CATACGAACT TCGTTTTTGA TTAAGGTrGA ACTATCCGTT CTTCATCTCC TTGATGAAAT TCTCGGCTTG ACCACGTCCA TTGGCT'rGTr CCACTCGTCA TATTTGTAAC GAGAGAAATA
GACGCTTTTC TTCTAGGTGG CTAGA.ATAAA GTGCTGAAAA TGCTTGCGTC CTGTTCGAAC TTGATTGTTA AAGTTTGGAA TAGGCGATAC AGCTCATCAT TTATCGCCAA AAAATCCCTC CGATAAAGCT GAAACTGGTC ACATCGTAGA AC
INFORMATION FOR SEQ ID NO: 145:
SEQUENCE CHARACTERISTICS:
LENGTH: 10711 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:
CCGGAGAAAA TGATGAAAAG GCGACTACTT TAGCTGCATG TCATACATTT ATGACACAGA ACACAAATAT TACCAGTAAC TTGTGCCGTC TATGGCTGAG CTATCCGTAA GGATGCAAAA CTCAAGACTT TGTAACAGGA
TTCAAAACTA
CTCTGGATCA
CCCTGATAAC
GTGGTTGATG
GATTGGTCTG
TGGTATACTT
TTAAAATATG
TTTGCCCTTG CGGGCGTGAC ATTATTGGCG GGTTCAAGCA CTAAAGGTGA GAAGACATTC CTCAACTATT TGACAACTGC TAAGGCTGCG GTTCTAGA AWrGATCGC TACGGGAACT
TATCCAAOGA
CTGAAGGTGA
CTGCTGATAA
TTGTTCAAGA ATCAATCAAA GGG'rTGGATG CCTATGTAAA CACAAGTAGG AATTAAGGCT CTGGATGAAC AGACAGTTCA AAAGCTTCTG GAATTCTAAG TGAA'rTCAA.A AGGAGATGAT GTCCTTATTT GTTGAAATCC ACTACTGGGA TAAGGACAAT AAGATACCAG CAAACCTGCA ATCCAACAAG TGCAAGTTTC CTCAACAAGA CTCTATTACG ACACATCTAA GACCAGCGAC ATT'TCCGTCA GGCTAT'rGCC GACAAACTGG AGCAAGTAAA CAGATGGTAA AAACTTTGGC GGAAGGATGT TAATCTTGCA CTGAATTTGC TAAAGCTAAA
ACAACCATGG
TTTGCCAAAG
ATTGTGACCA
GTGCATGT'rG
GAAAACTTTA
GCAGAACTTG
TATCTAGTTG
GAACAAAAGG
TTTGGATTTG
ATCTTGCGTA
GATATGGTCA
GATTCTCAGG
TCAGCCTTAC
GTGTGCTTGC
CTACGGATCC
AA'rCCTCTGT
ACAAAGTTAA
AAGATGGTAG
AGAAGAGTAT
GTACAAATAT
CATCGACTAA
ACCGTACAGC
ATCTCTTTGT
AAGAGAAATT
ATGGTCTTTA
AAGCAGAAGG
TGGATTGACT TACACTTATA AGAA'rACGCG GCAGTCAAAG AAAATCAGAT GCTCTTTACC AGGGGAAA'rC AAAGATTTCT GTACACTTTG AACAAACCAG GCCAGTTAAT GAAGAGTTTT AACTAGTCTC TTGTATAACG TGAATTTGCG AAAAATCCGA ATTGTCA'!TC TOGOATGGTC CCTTACAGCA GCTCGTCTCT GAAGGACAAT ATTGTCTATA TGACCGTrCAG TCCTATAAAT AAAGGCTCTC TTAAACAAGG CTATGCCTCT CAGTTGAATG GCCACCAACA TTTGTTCAAG GGTCACTTAT GGGGA'rGAAT CAATCCAGAA AAAGCCAAGG AGTCCAATTC CCAATTCATT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 TGGATATGCC AGTTGACCAA ACAGCAACTA CAAAAGTTCA GCGCGTCCAA TCTATGAALAC AATCCTTGGA AGCAACTTTA GGAGCTGATA A'rGTCATTAT TGATATTCAA CAACTACAAA AAGACGAAGT AAACAATAT'r TATCAGATAA TGTCGGTTGG TCAAACCTTC TGTAGGAGAA ATGTAGCTGC TAAAAAAGTA ATGAGACTAC AGATGTTGCT
ACATATTTTG
GGTCCAGACT
AGTACTAAAA
GGTCTATATG
AAACGCTATC
CTGAAAATGC
TTGCCGATCC
CATATTTAGG
TGCTGGCGAA GACTGGGATT ATCAACCTAC CTTGATATTA GT'PTGACTCA GGGGAAGATA ACTACGAAAA ATTGGTTACT GAGGCTGGTG ATAAATACGC TGCAGCCCAA GCTTGGTTGA CAGATAGTGC TTGATTATT TGGTACCATT TACAATACCA TGTATAAATA CTTGGAACTT AGGAAAAATG GATGAAAGAA AACATGTGAA ATAACTGTI'G AATCCTr'r'r' TACATTI'CTA CCTCTTTTTG TCAGAATAGA CAGCTATTAT TTGTTATATT CAC'N'TCAGA GGAAGGAGTA AGGATTTAGA AGTTTATGAA AAATTTTrCAT TAGGTGTTGC GTTCTTGCAG ATAGCGTGCA GCTCTTGCAA CAGCAAAAGA GACCAAGGTT CTCCAGAAGT GAAAAAGAAA AACCGGCTGA GAAACACCTA AGACGGTAAC ACAGTCACTA TCCGAGAAGA AATGATAACG CAGGCAAACC GGAAATGCAA CTGTTGATTT GGTGTCTTTT 'rGAAATTTAA
CCAACTACAT
TTTGCAT'rGT
CAAGACAAGG
AAAGAAGAGT
CAAAATATAA
AAGAAAGATT
GAAAAT'TTT
AAAAGTATAA
~TTTTAAAA
TAAAGGATTA
TTCTGTTATG
GTCTGGTTCC
GAATGATGGG
TACAGATGGA
AGAAAAACCA
CCCTGAATGG
AAAAGGTGTC
AGCCCTGT'T
AACCTTCAAA
AGATACCAAG
CTCGTACAGG
CAGGAAATAA
CAGTCACTGT
CTAATAAAAA
GAAAGGAT
CTAAAATGTA
GTTAATTTTA
'rTATTTTTTA
AGAAAATGTA
TTTGAAAAAC
AT'rGGAGCTG
ACGGCGAACT
CGTGATTTTG
CCTAAGACAG
AAAGAGGATA
CAAACGGTAG
CGCTACAACC
GAAAAGAAGG
GATGAT'rCTG
AATAATGTTT
ACTAGCACTT
GCGTCCAATC TTGTCTAAGA
AGGTACAAGT
AGATGAATAC
GGCTCAAGAA
AGTATTTCCC
CGGACCCCCA
GAACCAGTCT
CAAAAAGCTC
GATCTCGCAA
T'rGAATGCTG
AAAGTTGGAG
4 CTTGTTCCT ATTGCTTTCT TTTATCAGAG TTA-AGCA'rTG AACGTT'rGCT CAAAAATGAA GTTGTAAATA TAGTATTCGG CATTCTT'rGG GACAAGTCCC TACCAGCTGA TTTrAGCTACT AAGCGCCTAA GGTGGGAGAA AAGAAGAACT ATTAGCACTT AACCTGCAGC TGCTAAACCT CGAATAAAGA GCAACAGGGA AACTATCCTC AACTGCTCAA GCTTGACCGT TGATGCCAAT AAAAGGGCAA ATCACGCTTT TTGTCGGTTA TGACAAGGAT GGTATAGAGG TAGTCGTGTT 1800 .1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 GGCTGGTTCT GGGAGTATAA ATCTCCAACA GCTGCTCCTG AAACAGGATC AACAAACCGT CTAAATGCCA GCAATAATGA TGTCAATCTC AATGACCATC TTAAAAATGA GAAGAAGATT C;GAACAGTTG TTAGCGTTAA AACGGATAAC GCTGAAAAAG AAACAGGTCC TGAAGTTGAT CTCTCTATCA CTCTCAAGTC AGACGGTCAG TTTGACACAG TGACTCTACC AGCTGCGGTC CTTCTCAAGG CGGGCTCTTA TGACGATGAG CAAGAGGGGG TAAAAACAGA GGATACCCCT GATAGCAAGG TGACTTATGA CACGATTCAG TCTAAGGTCC TCAAAGCAGT GAT'rGACCAA GCCTTCCCTC GTGTCAAGGA ATACAGCTTG AACGGGCATA CTTTGCCAGG ACAGGTGCAA CAGT'rCAACC AAGTCTTTAT CAATAACCAC CGAATCACCC CTGAAGTCAC TTATAAGAAA ATCAATGAGA CAACAGCAGA GTACTTGATG AAGCr'rCGCG ATGATGCTCA GACAATCAAT TGCACTTTGA CAAAAGAT'rG A'rGACGAAAG GTCTCTGTTT CTAGTAATCA ACGCATGTCA GCGGAGATGA
CTTAATCAAT
TGTGACTAAG
CAAACTACTT
AACTGGTGCT
TCATATCGAT
974 GCGGAAATGA CAGTACGCTT ATTGTCAACC ACAATCAAGT TCTTCTATTA GTrTCCGG AAG'TTGATG GGGCAACCAT GTAACCAATC CAATGAAGGA AAGCTTGCTG CTGGTGTTTG TGGACTCGTT TGACAGC?1'A AGCTCTGAAT GGCAATGGGA GAACTTCCAA GTGCTAAGGT
GCAAGTTGTA
CAC'TCCAGGT
CAATGCTTTA
GTCAAACAAT
TTTGGCTAAG
GAGTAACTCT
TAAAGAAACA
3540 3600 3660 3720 3780 3840 3900
GGTTACATGT
CAAAACAGCT
GTCGGAAATG
AAGGGCATTG
GAAGATGCCA
ATTATGAACA
ATGAACTTTG
ATGGATTTGT TTCTACAGAT ATGGTGGTGG r'rCGAATGAC CCAACTATGT AGGAATCCAC TTTCCCAGA ATACACGAAG ATGCAGACAA GAACGTTGAT ATCCTCAAGG TTGGGAAAAA GTTCTCAAGC ACAAAACCCA AAAAGCTTAT 3960 TGTTATCACr 4020 TGGCAAGATG GTGCCATTGC TTATCGTAGC GT'rAAGGA'rA TCACACTA CCGTATCGCG ft *9 S
SW 54 5 5 ATCAATCTCC ATACAGATGG TCTTGGGCAA GGCCATGACT CTrGGTCACTT GAACTATGCT GACTTCAAGA CCCTAATTGA GAAGGCTAAG AACGCTTCAG AA.ACTTATCC TGAGTCTAAA CCAGATGGAA GCTATAGCTA TGGTTGGAAC GCCTATGACC TAGCTCATGG TCGTTTGGCA T'rCCT'rATGA
GGTGTTCTCC
GATATTGGTA
AAATATGGAG
TACTTCAATG
TGGCTAGATC
CGTTGGGAAG
GACGGTCTCG ACTTTA'rCTA TGTGGACGTT TGGGGTAATG GCCTGGGCTA CCCACGTTCT TGCTAAAGAA ATTAACAAAC GAGTGGGGCC ATGGTGGTGA GTACGACTCT ACCTTCCATC TACGGTGGCT ACACCAATAA AGGTATCAAC AGTGCCATCA CAAAALAGATG CTTCGGGTAGG GGACTACAGA AGTTATGGTG CTAGGTGdCT ACAGCATGAA AGAC'N'TGAA GGCTGGCAGG TATGTAACCA ACTTATTTGC CCATGACGTC ATGACTAAGT AGTAAATGGG AAAATGGTAC ACCGGTGACT ATGACCGATA ACTCCAGAAA TGCGAGTGGA A'rTGGTAGAT GCTGACAATA AAGTCAAATG ATGTCAATAG TCCACAATAT CGCGAACGTA GTCATCCAAG ATGGTTCAGC TTACTTGACT CCTTGGAACT CTTTCTACTG ATAAGGAAAA GATGTACTAC TTCAATACGC CCTTGGATGG TATCAAGAAA TTAAAGGATA TCGTAGCGAA AGCGTATCGG TGGTGTCGAA CTCATCTAGG TATCCACGTT AAAAAATTCT CCGTAAGAAT AAGGTATCAA CATTGATGCT ATTTGAAGAA AAAACTTGGT GTCAATCAGG TGATAACGGT AAGGCTGGCG CTTTGCGATC ACTGGGCAGC TGACTTGACC CCCGCTTrTAT CCGTAACCAC GTGCAGCCAA CTATCCACTG GAAGAAGTGA CTACAATGGC ACTTCCAACA CTTrCACTG'rA ACGGTAGCAC CTATAAATGG ATAAAGTAGT TGTAACTCGT CAGTAACGCT CAACGGACGT GGGATGCAAA TGGTAAGAAA AGGCCGGTGC AACAACTTGG 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 .5 S S
975 ACCCTTCCAA GCGATTGGGC AAAGAGCAAG GTTTACCTTT ACAAGCTAAC TGACCAAGGT 5340 AAGACAGAAG AGCAAGAACT AAC1'GTAAAA GATGGTAAAA TTACCCTAGA TCTTCTAGCA 5400 AATCAACCAT ACGTTCTCTA TCGTTCGAAA CAAACTAATC C1'GAAATGTC ATGGAGTGAA 5460 GGCATGCACA TCTATGACCA AGGAT'N'AAT AGCGGTACCT TGAAACATTG GACCATTTCA 5520 GGCGATGCTT CTAAGGCAGA AATTGTCAAG TCTCAAGGGG CAAACGA'rAT GCTTCGTATT 5580 CAAGGAAACA AAGAAAAAGT TAGTCTCACT CAGAAAT'rAA CTGGCTTGAA ACCAAATACC 5640 AAGTA'rGCCG T'rTATGTTG.G TGTAGATAAC CGTAGTAATG CCAAGGCAAG TATCACTGTG 5700 AATACTGGTG AAAAAGAAGT GACTACTTAT ACCAATAAGT CTCTCGCGCT CAACTATGTT 5760 AAGGCC'rACG CCCACAATAC ACGTCGTGAC AATGCTACAG T'rGACGATAC AAGTTACTTC 5820 CAAAACATGT ACGCCTTCTT TACAACTGGA GCGGACGTCT CAAATGTTAC TCTGACATTG 5880 *AGTCGTGAAG CTGGTGATCA AGCAACTTAC 'rrTGATGAAA TTCGTACCTT TGAAAACAAT 5940 *TCAAGCATGT ACGGAGACAA GCATGATACA GGTAAAGGCA CCTTCAAGCA AGACTTTGAA 6000 *AATGTTGCTC AGGGTATCTT CCCATTTGTA GTGGGTGG'rc TCGAAGGTGT TGAAGATAAC 6060 CGCACTCACT TGTCTGAAAA ACACAATCCA TATACACAAC GTGGTTGGAA TGGTAAGAAA 6120 *GTCGATGATG TTATCGAAGG AAATTGGTCA CTCAAGACAA ATGGACTIAGT GAGCCGTCGT 6180 *AACTTGGTTT ACCAAACCAT CCCACAAAAC TTCCGTTTTG AAGCAGGTAA GACCTACCGT 6240 GTAACCTTTG AATACGAAGC AGGATCAGAC AATACCTATG CTrr'rGTAGT CGGTAAGGGA 6300 GAArrCCAGT CAGGTCGTCG TGGTACTCAA GCAAGCAACT TGGAAATGCA TGAATTGCCA 6360 *AATACTTGGA CAGATTCTAA GAAAGCCAAG AAGGCAACCT TCCTTGTGAC AGGTGCAGAA 6420 ACAGGCGATA CTTGGGTAGG TATCTACTCA ACTGGAAATG CAAGTAATAC TCGTGGTGAT 6480 *aaTCTGGTGGAA ATGCCAACTT CCGTGGTTAT AACGACTTCA TGATGGATAA TC1'TCAAATC 6540 .GAAGAAATTA CCCTAACAGG TAAGATGTTG ACAGAAAATG CTCTGAAGAA CTACTTGCCA 6600 ACGGTTGCCA TGACTAACTA CACCAAAGAG TCTATGGATG CTTTGAAAGA GGCGGTCTTT 6660 AACCTCAGTC AGGCCGATGA TGATATCAGT GTGGAAGAAG CGCGTGCAGA GAT'rGCCAAG 6720 ATTGAAGCTT1 TGAAGAATGC TTT'GGTTCAG AAGAAGACGG CTTTGGTAGC ACATGACTTT 6780 GCAAGTCTTA CAGCTCCTGC TCAGGCTCAA GAAGGTCTTG CAAATGCC1'T TGATGGCAAT 6840 GTGTCTAGTC TA'rGGCATAC ATCTTGGAAT GGTGGAGATG TAGGCAAGCC TGCAACTATG 6900 GTCTTGAAAG AACCAACTGA AATCACAGGA CTTCGCTATG 
TTCCGCGTGG A'rCAGGTTCA 6960 AATGGTAACT TGCGAGATGT GAAACTTGTT1 GTGACAGATG AGTCTGGCAA GGAGCATACC 7020 976 TTTACTGCAA CTGATTGGCC AAATAACAAC AAACCAAAAG ATAPTGACTT TGGTAAGACA ATCAAGGCTA AGAAAATTGT CCTTACTGGT ACCAAGACAT ACGGAGATGG TGGAGATAAA
TACCAATCTG
TTGTCAGGCT
GAGGAAGTAG
CAGCGGAACT TATCTTTACT ATGAAGCAGC TTTGG'N'AAG CTAGCGTTCA GGCAACCATG CGTCCACAGC TAGCAGAAAC GCTCAGAAAT TAACAGACAA AAATATGCGA CGGATAACCA CTCAACCAAT TAAAAGATTC GAAAGAATGG TGGAATACTT TGCAGATTAT CCAGATGCTC CAACTrGTAGA GAAACC'rGAG GGTAAGACGC CAGATTATAA GCAAGAAATA CCAGCAACAG GTGAGAGTCA ATCTGACACA CTATCTGCTC TCTTTGTAGT AAAAACGAAG
TTTAAACTTA
GCTAGACCAG
GCCCTCATCC
AAAGACTAGT
AATGAGGT'TT
CATAAATACA
GTTGTGCTGG
TAT'rGT'rAGG
GATCTTTAGC
AAACACCTGA
TAGCAAGTGT
ATTTAGTAAA
ATAGTACAGA
ACCTCTTGAC
AGACAATCAA
TCTCTTGACG
TGCTACGAAA
TITCCGAGCAA
ACAAATCTTG
TAGTCTAGCC
ACCTCTTAAC
AAAAGCCTGA
0 AAGATTACGG AAGCAG'rCTC GAAGATGTCT TC'rCAGGCT'r CCCAGAAAAA TCTGGGTGAT GATCAGGGGT TGTATTTGAT TTGTGCTTGG AGTGGTTGAG TTTCAGT'rGA TGGGGGTTGT TAAATACGAA TTCTCCATTT TTCCTTCAGA CAGGTAGGTC TGCCTACAAG TGGTGTCAGA GCGTATAGCC AGCAAATAGT CAATTTCCTC GTCTGTATAG GATAGGCATT TCGTCCAGTT AGGCTGTCGT TTCCTTCATG CACTAAA GAC GACTTTATGG CACCGTAAGC AGCAGCCATC -TGTTACTTGA AATGGCATTT GGAAAGTCTT GGCGCGGTTG GCGATTGTTIG CAGGGCGTAT AAACAGGAGT ATTTGTCCCA
TATCTTTTCC
TTGTTAAGCA
AAATGTTATG
TGTTGCGTAT
CTAGACTGTG
TGTGCAGCAG
CTGTAGAGCC
ATCATAGAGC
CGGTTAGAAT
TCATCAGGTG
TTAGAGGTTC
CCATAAGTCA
C-ACGAGTTC
ATAGTGCTAT GACAAAATCA TTGAGGATTC TGAT'rrrGTT ATTGGTAGTC GTACTATTAT AAGTTGA.ACT ATCTGATGAT GAGCTTGAAC GTGACTTCCA CGTAGAACGA GCACCATT2-r CCTCTGGTAT ATTCCAATCT TCTGGATTGC GGTAAACTTT GGCAGCGACC GTAAGGCCAT AGCCTGTCCA TACAGCCATT GAATATTTAC CTACAAATTG AGAGGTCTTG ATGTGGTTTT CTGTTTTACC AGCCTGAGGG AGCCAAGCAA AGACTGTTTT CATCATGTCG GTCAT CATAT CGACAT'rAGA GAACTCTTTT TCACTCCCAT GTTTATAGTA AGT'rCCACCA TTTGCAAAGG TTGCTCCATA TTT'rrTGTCT GATTCGGTTG TACTTGGGTA GTCGATTCCT AGACCAT'rTA 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 ATATACATTrG
TTTTCACTAC
GAGTAGTGAA
AGTCCG.ACCT TGTTTAGAGT TGCAAGGTGA TGTTGCCAAA GGGTAGTTAT AGGGCTCATC TTCCACGGCT GGGACGTTTC GTAGCCCCTA TCCCAGTTAT GTGAACGATA GTAGCAGTTG AATCGTAGAC ACCGTACTCC AAGGCACGAG CATAGTCTGT GATCGGTT'rC ATAGTTGATC 9.77 CCCAGTCGCG GTTTGTTTCT ACTGCTTGG'r TAATTCCGAA GGAAACA'rTA CTTGACTGAT GGCGTGCTCC TAGCTGGGCA ATGACTTTAC CGTTAGAAAC ATCALACAATG GTAGAAGCGA CTTGCAATTC ATCGTCTGCA TAGGCAACGT ATTCGTCTGT AT'rGTAAATA TCCCACAGA'r GTNTTTGAGC TTCTTGGCT~ ACATTTGTGT AGACATCCAT CCCAGTTGTG AG'rAGGTTAT AGCCTGTTC TTCTTCAACT TGATTGATGA CTTCCTTGAG GTAATTATCC ATGTAAGCAG GGTAATTACT TGCTGA?1'TG AGACTTTGTA GTCCATCAGT AATTGGTGTA TTGACTCCTT TCI'CATACTG TTCAGCAGAG ATGTAGCCTT GAT=rTCAT TTCAGATAAG ACCAAGT'rTC GGCGGTCTTG GGCTGCTTCT GGATGTGAA'r AGGGGTCATA TTGGTTTGGT GCCTGAGGCA TTCCAGCCAG CAAGGCTAAC TGAGGTAAAC TTAAATTATT GAGGTCTTTA CCATAG'rAGT TT'TGAGCTGC TGTCTGCATT CCATAGTTCC AGGTCAAGAT TTCTTGCTTG GT'rGCTTTr'r GAGCCTTACG AGAAATAGTC TGGTCGGAAG ACTGTTGGGT GAGAGTTGAT CCACCTTGGA CTCCCAGGAT ACGGATGGTA TCAATCCCCC AAACGATTGC CTTAACCAAA TCTG'rGGGAA CAGAACCCAA GTCAGCAATG AGTTGATTTT CATTAGACAT GTAGACCTTA TT'rATATAGT GTTCTAACTG AATCGCTAAC CAAGCTTCCT TCGA.AGTTGA AAAGTAAGTC AACTTAATCA CAACTAGTTT ACTCrCGGAT CGCCTAAGAC AAT6GCTGCG GGCGCAGAAT CGTTGGTTTG ATTGAGATAA GGAATTTGAG A'rATTCAAGT GGCATTGATT TGCCGGCAAT AAGTAGGTTT TTGTTGGGCA AGGACTTGTT AATCGCACGT TTTTGTTTTG GCCAGAATAG TCCGTCGTTG TCGTTGTGGA TAGTCCACTT CAAGCCCTGA GACAAATAGT GCTCGAAT'rC GTAATCATGT TCCACACAAC ATACGAGCCG
AGGCTAGGAG
ATAACCAAGC
TTCATCTTGT
GGAAGGCACC
TTTGTCCCTT
CTTGCTGAGA
CCATATGCTG
TTTCCTTGAC
GGGAATTGCT
TGTGGTCGAA
TATCATTAGC
TATTCTCGTA
CCTTGCTAAC
TTAAGAAGCT
TTTACCACCT
AGCCTTG ATT
ATCTTGATGA
AGAAAAGTGA
AATCTGATGT
TTCAAAGTCG
TTGCAGATTG CGCAAGAAAG GAAGCGATGG TC'rTCGATAG TTGGGCATTG ACGCGGCGTT GATTTTACTA GAAGTTGTTG GTAGTAGAAA AAAACTCCTC AATGCTCAGA TACTTGA'rTA AATAA.ATGTT CTTTGATAAC TCATATCCAT ATTCTCGAAT 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560
TAGAAGCGAA
AGAAGGACAA
GGATGAAAAT
ATGTAATATC
TCAACAATCT
TCAAATCGAA
AGCAGATTCC
TTTTCATCGG
CATTATAAAC
TGGCACGACT
AAGCTTGTCG AAAATAGGCT GTACGATTTG AATAGGAGTT AGTCGTTGGT AGCATTGATC
CATAGCTGTT
GAAGCATAAA
GGTTTCTTAT GTATAACAGC ATCTTTTCAA AGGGTACCGA AATTGTTATC CGCTCACAAT TGGGGTGCCT A.ATGAGTGAG
TCCTGTGTGA
GTGTAAACCC
CTAACTCACA TTAATrGCGT 'rGCGCTCACT GCCCGCT'rTC CAGTCGGGAA ACCTGTCGTG CCAGCTGCAT TAATGAATCG GCCAACGCGC GGGGAGAGGC GGTTTGCGTA TTGGGCGCTC TTCCGCTrCC TCGCTCACTG ACTCGCTGCG C
10620 10680 10711
INFORMATION FOR SEQ ID NO: 146:
SEQUENCE CHARACTERISTICS:
LENGTH: 11887 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:
TACATTCATT CCATCGGCTA CGGATACTGT AAAGTATTAT 'rGCI'TTGGT AGTATGAACC TCTGGACCTT TTTCTAGTGT CTCCATAATA CTTAGATAAA ACCATAGCTG AAGTCGAATA CAATTTTAAT CAAATCATCA TTACCGATAA TACTTCTGAT ATACGTTGGT GAAATCTCAG ATAATGAAGA ATCATTAGAC CTCACTTACC TCATATTCTT CACCCTTACT AGAAATAACA
9 9 9 CTCAAAGCAG ATACTGTCGA TAACTGGCTA GCCAATAAAG TACTCGCAAT CCCAATTTr'r TATAAACAGT TTTCTTCATT ATI'GTATCCT CCTAATGTAA CTATTCTAAA TTTCTTAATC TACTATAGAA TCAAGAAATC TACCACCTTC CTCCATTATC ACATAAACAG GTAAACTTTT CAATTAATGA CTGCGCTTTT
AATTGAAATA
TTATAGCGTA
TTTAAATACC
CAATCACGCT
AGAGGTACTT GCTTGCTTCT T'rGATACTAA GTTCAGCCAT TCTTTCCTTG TTTTTCTCAA TAAAGCATGT TACCCAAGTG GGATTCGTTT TGGAGTAGTC TCGCAGAGTC CAGCCAATGG CTTTATTGAT AAAAAATTCT GTTTGGTTCA AGTTATGAAG GAGAATCTTT TCCATTAATT GAGTATTGGT CTTCTCTTTT CTTAACAACT GGTGGTCAAT AGCGACACGT CTCAGCCAGA 9 9 9 9 9 9999 9 9999 99 9 9 TAT'rATCTGA TAGGCTCCAT GCCGCAGTTG CTCAAAACAC ATTTTCGAAG AGTATTAAGA TACTACTCGA TCTAGGATAT A'AGCTTAGGC AAATCGCTTT CACATATTGG TATTTTCTAG AATCTTTGTT TTTTTCGCTT CAATACCTAG AAAAGAAAAT AGTCTTTTGC TGCTTCTAGC TTTATACTCA ATGAAAATCA AAGAGCAAAC TAGGAAGCTA TGTTTTGAGG TTGCAGATAG AGCTGACGTG GTr'rGAAGAG TTATTTCTTC TAGTTCAGGG TGTTCATACA CCAAACTCCC CTACCGTGTC CCACAAGGAT TTTGTCACGA CTAACTGCTC CCTTTAGATA AGACTGCATT GCTTTCAAAT AGTTAGCAGC GATCCTTTTC CCAGCAAGTG TCTGCAAAAT CCCAATCGAT CTGGAAAATA TT'rTATAGAG TTTATTTCTT TCAGGCACCG TGATGGCGCA TATAGGCTTC CA'rGGACCTT GCTTTTTTAG TCCTCAAGTA AATCTGCTAA ACTCATCTAA AACTCCTCTT 979 GCCCCACCAA ATGGTGCTGA GTTTGGCAAG AATATTGGAC CTGCGA'rGCG CTGATTrCG CAGTCAATTC TTCATGAAGT C'rrGCTCAAT GGCCAGCTCG CACTAGAAGT CTTGAGCATA CAT'rGTCCAA ATAAGAGGTC CTAAGGTCGC GTCAATCCCA GACGCAGTTC CAAGGCCAAG CTACATCGTC TTGGAGGTCA CTACTAGTAA AATTTTCATC ATATCAACCG CCAGCCCTTG TCAACCCGCT CCTTGATATT AAACCAATCC CATTGTCCAC AAGGCATAGA CAGCCGCCTG GGTACGATCG CTGACTTCAA ACGTGGGTCT TGACCGTCTT GAGAGAGATA AAGAGGTCAT TAGCCCTTGG CGATGAGTTG GAGAACATCT CC'CACGCG TCCATATGAT TGCGGTGGTA TTrCAACCTTC TTGCTAACCTr CCAGCAGCTA CC'rACTGAC GGCATGAAGC AATTCATCTG TAGCCTTTGG CACCAGCATC TAAGACTGGC ATGATT'TTTT ACAATCAAAA TCTTGGCTTC AGGCCATTCT TTAAGGATTG
TTCATCTCAG
TCAATCCCTT
AAGTAGCTTT
T'ITACTCCTT
CTTGGGAGCT
GCATGACAAT ATCCATGACA ATGACATCTG GAGACCCGTT GGACGCCTCA CCCACAACT'r TCAAGCCCAA TCGGACCATT TCATGGTCAT TATCATTCCT TATCTAACAG GGGAATACGG GTCAAGAGTT GAACTGTTCC TCGCAGTCCA TAACTCAAGT CGTCTAAGCT CACCTrCAGT TGCAATTCAA CATCTGTCTG ACATCTAGGC AAGATGCCTG GGCATGGCGG CGGAAGATAT GCTCCTCGAT TTTCTTAGGC CTAAGATCAC TCT'rGTCCTC AAGCTCTTIT TTCTGCTCCA GTTCAACTGG TCGCAAATGC AGGGTATTGC TAATCAACTC AATTTCGTCA TATTCTGCTT AAAAGAATTT GAATCCCTTC AAGAGCAAAA CCCGCAAATC G'TTrCTAAAA TAGCTGTGAC ACTCTGCAAC TGGGTCTGCA TCTT'r'CTCT AAAGCCTGCT GACTGATACC CGATAAAATC ATGTGGGCCG CAAACAACTC
AGCCATATCT
CCCTAACTGG
ATAGAGGTAG
TTGCAGGATA
GAGACTAACC
TATCAAGCTC
CTTCTGGGCr
ATCCA.ATTTC
CTGACTGACT
CTCTTCCTGA
'rTTACCTGA'r
TTGCCCTGCT
CCCTCGCCAA
GACAAATTTT
TTCCAGCAAG
ATAGGCTTGT
ATCTTGACAC
CGGAGGGCTC
1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 GTATCGTGCA AATCCCGAGC AATTCGCTTC GCAAGGCTCT GATTTTCAGC TTTTTGAAGA AAGGACTTGA AACTGGCATC CAAATCTGGA AATAAACGCT TGAGATTAGC CTGCATTTTT AACAGGGCTA AGAGACAGGT CATGGACATG
CGTTCCTTCT
GCCTCTGTCA
TCTGCAACCT
CTTAGAGAAA
CTGAAAACCA
CGATGATTTC
AAAGGT'rAAG
GAACCACTTC
GCTCTTCGAT
ACAATAAAAA
TCTGTTTTTTI
CTGTGGGAGA
TTTTTCATCC
TCTTGTTACT
CGACATCGTG CAAAAAGATA AAAAAAAGAC AAATAGGAAG TCTAACCACC TCCACATCAC_ C'rTGAGATAG TCTTTTGTTT GACCAGTCAA AATCAAGTAT GAGGTGAGAG CAATAATGAC CAATCATAGT GGTCAAGAAA CTTGATGATA GTGTTCATTG GCTTGGGCTG GTTGAAAAAA ATCAAATCCC CCACATCTAC AGGTACGATG ATTTTGGTCG 980
CATAGAGACA
TTCCTACCAT
GATGAATAGT
GTTAACGCTG
CTTTCTGAGG
GTCCTTGCCC TGTCATGATT GGN'AAGATG AGAGATTGAT ATCATCGAAT ACCAACGATT T'rTCTCCTTC CTTTrTTCCTG GTTCATCATC TAGCTAGAAT CACAAAAGGA CTAAAAGAAG ATTATTTCCC ATAGTATCAG CAGAAAACGC GAAGACAGGC TTCGATAAAT
ACCCTCTCCA
TGGCAAGTC1'
TTAACCGTCA
GGGTAAAGAA
TTGAGCATAA
TCTTTACCAG
GAAAAATGCT
AAAAAGATTT
GGTAGC'rTGA AAAATGATGA CGACCTCT'rC AAAAACCAAA GAAAGAGGCT ATAGATAACC CGATGAAAAA GAAGAGAATG TGTAGTAGCG AATCAAAAC
CTGATACCAT
TAAAT'rTTCT
TCAGTCTGAG
TTTAATTTTC
TCTATTTTAT CACAATTCAA AAAAGTCACC TACGCCT'rTT 'rCATCTGATC CTTTGCTTCT TTTTGTAGAT CTGCTATGGT GGCACAGTTA AGGGAACACA TTCCAGCCTT GGACAATGCC AATCACTTCT TCAACTGTGT ACGGTTCGTG ACAATCCCAC AGCCTTAGCA CCAAAAACCA GGATTCCGAA CCCCTCCACT AACCA-AGAGT TCGACCTTAT AGAAGGGCCT GCATGGTAGA CTGACCCCAT TGA'rTGAGGT CGGTTTTCGA TATAGGCAAA GCTGGTGCCA CCACGACCCG CCGAATTCAT AGGCTCTTTC GATTGTCTTG GCATCCATTC ACAATAGGAA CGGGAATTTG CT'rGCTATAA TCTGCTAGAT TTCCTTTCTC CCTCGGGCAT GAGTAATTCC TGCATGACAT ACAGGATTCA TCTCTTCTAC AGTCTGAAGT CCTAACTCGA T'rGGTT1CCAA GGAGGAGATT GGGATGACTA GACTTGACAG TTTTTGAGGG CTrGCGCTATA AGAACCCGTT ACAAATAAAA TGAGCCAGCT TTTGAT'rGAT T'rCTCTTCCC TTA'rTACTTC
CAAAATCAGA
CATAGGTTCA
GATGGAAAAA
CATAAAGAAG
TAATCAAGCG
AGGTTTCAAC
AGCACTTAAT
CTTTCCATTC
AATCACGCTG
ATAGGTCCAC
CAAAGCCCAC
GCGATTGCCA
TGACATGCAC
CAGGCTTGTC
AAAAAGAATC
AGACTGACTT
ATA.ATGACAT
ATGAAGCGAA
AGATTTCCAA
TTGGTCTGCT
GCAACAAAAA
GTTGCCGCTA
AAAAAGAGGA
GCTCCTGTCA
'rCCTCTCCCT
AGGCGCTGGT
ATAGTCTACT
TAGATCTGCT
CAATTCCAGA
CATATCCAGC
'TTGGGCATTG
GCCACTACGA
TGTACGAACA
TTCCTTGAGG
GCTTCTAAAC
TTGCAATAGA
CAATCCAATA
ATCCGTTGGA
3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 TACCACAGGA TTCCGCCACC CACCAGTCAT G.GCATTGATA
'AAAAACGAA
TTGTAAAGAG
TTCTGCTCAA
TTCTTGATAT
T'rGCGCATCA AGTCCCACTT TCGACCAGCA GCAAGGAAGA ATGA.ATCAGC GGGCATAGAG GATA'TGCTCG AAGAGCTCAA TCCCCAGATC AAACTCAGGG CGATGCCACA AACTCTGTCG AAAGATCGAT TTCATCCAGA TCCACCTCAT CAAAGCTATT ATAGGAACTT TCCTTACGAT TTGTCGTCAT GTCCTATCCT GGCCCAACGA T'rTTTTAAGG TTTTGGTTGA GTCACCACCA CCAGCACCAC TACTCTTGGC 11 AACGGTCTGC AAATCTTGAC TGGCTTCTTT TGTAC'rCAAG CCTTCTAAAA GC'N'GCTGGC TTTCCCCTGT TCCAAGGC'rT CTACCAGAGA A71N'ATTG ATATTTTGCT TGATTTGCTG CAACTGTCTA AGCAAAGGCG TGTAAATATC TACTTCTACT TGATCGATAA TCTTTTCTGA AGTCACCGTrT TC7MGAGG AAGT'rAAAAA GACCATGTGA CTCGATACAG CCACTTCCTT AGT TCGr'rC ACTTGTGAAA TTGAAAAGCC GGTCCATCCC ACTAAGAAAT CCAATCACGC TCCAGAACTG GCGA'rCAAAT GAC'rGGTAGA GGAACCATTG TCTCCTCGCT ATCAACAGAA ACATCATACA ACTAGAACCT AGACCAAACT AAAAGGTCTT AAATTCTGAC T'TGAATCAAG CTATAGTCAG ATAGATACGG TAGC'rGTCAG TATCAA.AGCT AACTGCCCTG GCAAGTT'rTA ACAGCAATCA AACGATGACC GAAAATTTCT CATTGGGACC AGC-A'CCATG CAAAGGCCAT AGCCTCATAA
CACATTCTAA
TCGCCAAGTT
GAACCAAATC
TTCTTCTTCT AACCAAGCAG CTCTGCCACA ATACAGGCAA TAAGCAAGAC AGCGCTAGTC GAGCCAGTAA AGCCTTGACA TTTTCCCTTC TCGTTCCATT CACGAACAGC GAGGAAGTCT GATTAGGCCT TAAGTCCACT AAAAAGCAAT CTCAGCCCTC GCTCTAAAAT AGCATATTCA TCTTGACTCA AATCCTTTGT GATAAATGCT CCAAGTCTTT GTAAAGTAGC AGGCCTCTCC GAGGCATCCG TCAGATAAGA
AGCTTGAACA
ACCP.AGACAA CGACGCTGCC TT~GCCACAGA TTTCTAGAGA CCCATCAAAG CAATCGTTTC GCGAAATCAA ACATATCTGA ATATAGATOG GAATATCCTT CCTGCCCAAT AGAGTTTTCC TTTTGACACA ATCAAGCGAT CTCCTGACAGr AAGACCTTAA TTTCTCACGA AGCTGGCGAA AAAGGCTGGA CTAGCAGTCT
CCACCTTCTG
GGTCGCCCAT
AGAGCTCCTG
4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 TTGTCGTAGC ATGCATAGCC AGGGCATTTT TCTCCGITTAA CATTTTCCTT GAGATAAATC AGCATATCCT GATAGTCCTT CGTCGAAAGT CGTCGAGGTT TCCACACAAA GTTTCATCCC 'rTTTCTTGTC CTCTAGCACC AACATAATCA TAGCTAGTTT AAATTTCTCC ACTATCCTTA TCCCAGGCTC CTAGTGGTCC AACCTGAGGC AAATTTGGCT TCCTGTGCCA ACTGACTTCT TTCTCCAATC TTGGCAAAAT CTCAGACTGA CGAACCCAGT GTCACGGCTA GAGATTGGTT CAAGTCTGTC TCTACAGGGT ATAAAAACTC CGAGA.AGAAG ATCCAATCCA AGCTTGAAAT AA GCATTACA AGCCTTGACC AGGGCGGACA AACCACTAGA ACTTGAGGAC AGACCCGCTG CCGTAGGCAT ATTGTTTTGA GTATCGATAC GGACAAAGCC CTCACCAGCT GGACGATAAC GGTCAATAAT CTTACTCATC TTGGCATGCT CGACCTCATT TTGTAGCTGA CCATTGATGT AAAATTCGTC AGCTGTTACA TTGGCTGGTA AAGGCGACAA GGTCGTCTCT GTATACATAT TTTCCAAAGT TAGAGAAATA CTGCTAGTAG CAGGCACCAT CTCTTTTTCT TTTTTCTTTC CCCAATATTT GATAATAGCA ATA'rTGCGT TGTCTGAACA GCTCCTTTCT CTTCTAATCT GGT'TACCAAG GCTATGATAC AACC'rCCTAG ACCATGGCTA AGAGTCGTTT CAACCAAAAA T'TTTAAA'rGT AAATGCGCTT GACTGAGGAT AATCGCAACT TCTGCT'rGCT GGGTTAATTC 982 AGGAACG'rAC
TTCTGCTAGT
CCCACCACCG
TGTTACAGGC TCTCTATCCA TCTGTGCGT GTGTCAAA'rr C'rCATCTTGG CACCCAGAGC CTTGCCCTTrA TTTTGAACCA GGCAATCACC AAATAGGCGG AAAGCGAATA GGTTGGTCAC GGCAATCATT TCAGCTCGAT ATAGTAGTCA AATACTGCAC
CTTGGATGGC
ATAAATCCAT
TAAGACAGGT
TGACCAAGAT
GAATGGCCC
GTCTGCCTCA GGGCTACTGA CTCCAATTTC TTGTCCCAGT CCT'rCAGCAT CTTTTTGTGA TCCCAAGGCA TGCAAAAACG GTAGGGCATC T'TCACGAGTA TGACCATAAA CACCCGTATC CTCAAGTTCT GTAAATCCTA CGT'rCTTGAT CTT'AGCATCC AAACCACTAG GATTCATATG TTCTAGTACA TCATGAGGCA GATCAGCC'rG TATGCTGATA GCCGCTGACG AACCCATCCC ACAACGAATG CAGGCTTCTG TGATATTCAA CAAGGTATCC TCCTCATAAA GGCGCCAAGG t CCGTTTCTCA GGGATAGCCG AGTCAATCTC ATACTCCAGT GAGGCATAAA CCGCCATGGA ACTCTCTGCA GGAAC'rACCT TACAGGTCAC. C'CCACCTCC AAAAGAGGCA GGGAAATGGC AGGATAACCG TAAACGACCG CATGTTCCCC TAT'rAAAATT ATCTTACTAT GTGCCTGACC GACACCAACT TTTTTTGTCA TTTTrTCCTT TTACTAGACG AAAAAACGTC TTATT'r'r'CA TACAAGTATT AATTCTTTCC TATCTATTTT ATTATATTTT CACAAAAAAA GCGATTGTTT CCATTCACAA TCGCTTCTTT CATTATTGAA CCCATTCGCC ATTATAGTTG ACAGAATAGC CATCTACGGT CGTATTCACT GCCAAGGCAC CTGAGCGCTA TAAGCGTAGT ACCATCTCCC 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 ATTGACCTGG AACCAACCTG AAGAGGAAAG TCGTGAGGGA CTCATTAACA AGTACTCGTT AATCT'rCGCC CATTCTTGAT AGCTTAACA'r TTTTTT'GTGA GGAAAACGCC TTATGAAGTA TGTAAAAATA TATACTATAG ?ITGTTAGAAA TCGATTTGAC TAATATTTGA TAAGTTATCC TCTAACTAAA GGATCCTATT TCGTCATAGA ACGACGAAAG AAACTCCATA CCATTAAGTA GCATGCGCCA TTGACAACCT GTTTTAGTGA CGTACAAAGT
TCGGCCATTT
CGTTTAAATC
ATACAGGTTC
TGCTACGGGA
TAGAT'rGAAA ATAGGTGCGG TGTTTGGAGA AATAGGGTTC AGTATCATAT GCTTTGCGTA TCATAACTCT TAAATAATCG ACCACGAAAA AAGTTATGCA CTTAATTTGA CTAGAATAGT ACACCTCTAC 'N'rCTAAGT
CAATTCAAGA
TTCTAAAATA
TGTCCTGATC GAmTATCCT GTTATTATCT CATrTTACTA TAAAAGTATT ATTATGTTGT TGTGTTATAG ATTGATTGAA CAATTACTAG AACTATCACA TACTCAAGGT CAGCTCACAG ATGAGCAACT ATTTTGGTTA CAATGTCTAC TAAATTTAAG TCAA.ACAAAT AATTTAGTCA
AAATTAAAAA
AATTTTACT
GAAAT'rTCGT
ATAAATTGTA
CAATTAGAAC
ATCTATGGTT
AATAGAGGAA
AGATAAXACT
AAGTAATTT'A
CCCCCCAGAA
GACACACACA
ACTCCTACAT
CATAAATATG ATTACAAAAC AGAATGTAAT AGTGTTCTAC GTAAATTCTG AAGGAAGGAT CACTTCTTCA ACAGAATTr'G TCATTCCAAC ACGGA.ATAGC TGGACTACTG TTTCCTCTAA CTGGATTCTA AAATACTrCTC TATCATCAAG AAGGCAGTGA TATGAATATC AATACTCACT GCTATTTGGT GATGCAGGCT TTATTTTCTA TCAGTAAAAA TCAATACTAT CTACAAT'rAG TTAATAGAGA ATTATGATAC TCTAGAGGAA ATAGACTTTG CTATTIATCAT TAATAAAATA CTATCAATTT ACCAATGACA CACAATAGTA 'rAGGGGAAAT TTATCATTAT TTCCTACAAA ATTTTAGACT ATAGCTTTGC TCATCATAT TGTGGAAT'rG TCTAAAGTCT TAGAACCTTC TATGT'rTTAT AATGATCTCC CAAACGTCAC CGCTAAAAAA CATTGCGAAA ATCTGGTGTC ATACTCTTAA AATTTTCATC GAGATACAGC CAAAGAAAGC CATATGCTTT ATTTGCCTAT 4 9 I. 9 9 9. U.
9. 9 r 9* *a 9 0 9 09 99 4 ATACATTCCA TACTGAATTA AAAAAATTAT TAGGAAA'PrT ACAACTTTCT TGGTGCAAAG TGTACGATTG 'rGACGGAAAC AAAGATAT'rA ATCATCTAAA AATGATGACA GGATArTGCC TCTACAATCA AAACAAATTA CTGATGAAAA AACGAGATGA TCACGGT'TTA CTGATGTTTC ACTTCGGAAT AGGAAGCATG GGGTATATTG TGTGCAGACA TAAGGAGAAA ACTATGAAAT TGCTGCTAAA CAGCTGTTTT TCATCATTCG ATTATGTGGC TCAGTACAAT TTCGCTCCGC TAGAAAAACT TACTTCTAAT GAATTTCCGG AATAATCr'rA TTAGTAAkATA TCAAGAATTT ACGGAATAAC TAGCTTACTA AAATCCAACA GGTAATTTTA
ACTGAAAATT
TATCTTTGTA
GTTTTTAACC
CAAACCACTG
GCATGTTCTG
8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 AAGGAGATAG TGGTAAAGCA GATTTGTT'rG GTGTCTATTA AATAATAAAT TCCCATTCGA TATTTTGGAC AAACAACATA TATAGACAGT GCGACAGTAT TTTCTACCTC GCCATTATCA TAGCGATTT ACTGATTTCC ATTTCAGAGA 99 94 9 9
.9 99 0e 9 9 TGGTTCCCCT ACTATCGCA.A CTCTTTCTCG GGATTCTAGG TCAAACACGC ACTCTGGATT GCCAAAATCA AAATCCTGCT ?'rCTCGTCTT GTCGCCCT'rr TCATTAGTTT CAGTCATTAT TCTrCTGACAC CTTGAGCTAC CTGTCTGCCT ACATGATGAA T'rAAGGACGA CCTGCATGAT GCCATGGGGT TCAGGCAGTC TTGTCGCCAA TCTGGCTGGC GCATTCCTTA TCAATGT'rAT TTATCAACAC TCTGACTTrT GTCAT'rGCCT T'rTTGGGCCr TGTATGAGGT TGAAAAAAGA ATTGAAATGT CACATACAC AGATTTTCAA GAAAATAGAG CTACGCTATT TTGACAGTAT GATTGTCATC ATCAACCTCA CGCCCTCTAC ATCAGTGTAA TCTGATGAGG GTTGTCCGTA AAGTATTCAA ACTATTTCCC GTATGTTA'rr COACATACCT ACTGAGTTTT AAGAAATATT T'TCAACATCT TAAACAGTCG TGTTTCTGAC GACCAGTATG GCTTCATCCA ACAGACACTA TCATCATGTC 'NTGTGGA0CT TCCGTT'rCAC GCACCTCTTG TCCTTTGTCA AGCCTATATC GCATTCTCAG CCCTCGCCTA GGACGGTGGG CTCrTGCTCTG TAGCCCTATC CATAGCATCG.
T1'?rAATTGC TCTTATCTTT AAAGCATGTG TAGATTT'CAC AGTCACTTCT TTGATTAAGC ACCCATTCGC CATTATAGTT CCTGAGCTAT AAGCATAGTA CCATTACCTG CATTTAGGTA GCTCCTGATG AACGGAGATA
CTGGCTGTGC
TCCTGAGGTT
ATTGCCATCT TGGATGTGTC GCACAACTGA GCATTGGGCA ATCCTTGGCA ATATGACCAG GTTTTCTGTG AGATTTCCCT GTAATTTTCA TGACCAGTTT CAAGCAGCTG TCTTTGCCCA AGCACAGTGG ACATTCTCGC GGCGTTTCGG TGCAGT'rAGC TGTCAATGGT TAGTCAAGTT ATGCTTTTAA TCTCCCCAA'r GAG'rGTTCTA ATATAATTAT GACAGAATAG CCATCTACGG CCAGTTGCCA TTGACCTGGA GTACCAAGTr GAACCATCTT GTACCATTTG TTCCCAAGGT AAAAGATACC GTCATACTAC CCCTCGGCTG ATTGCCCTCC ACTCCTCGCC CTGCTCTCCA CAGTAATCTA TTAAAAATA -ATTAC'CTA ATAACTAGTA CATCAGTTCT ACGATTATCG TATCCCCAGT GACAAGATGG CCCGTCCCTG CTCTCCCTAT ATTGATATTT TTGTATCTTA CAACACTCAT AACTAACGAA CGTCAGGTCA AGTACAACAA AAGCGCCCTC TCATTACCGA TCGTATTCAC TGCCAAAGCA ACCAACCTGT CTTCATGTCT GATACCAACC AGTTGCCATA TTTGCCAACC TGTT'rrCATA CCTGATACCA GCCAGTGGCC GATAT'rGCCA ACCTGTTTGC CATCGTTTAT CCACCCCGCA TCTTAGCTAG ATAATACCAG TTTGATTAAA GTAATAGTTC GT'rTTTCCCC GTACTTTCTT CAACAAACTC TTGTAGCACT CTTCTAATCT GATACCATTT TATATTGCCA GCCGACAACC CAATCACTCG CCAGTAGGT CTTGGACAAA CTGCCATCCT CAAAGAAAAA TACTGTCAGT CCTCCAATAT TAAATCCACT TTGGCTACCT GTAACCArlrG 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 TCGCCATTTG GGTGGTCTAA ATAATACCAA GTGGTACCTT ATTGCTCCTG AGGAACGGAG GTAGTACCAC TTATTACCTA ATAATACCAG TTGTTGGATC TAGGTAGTAC CAAGTCGAAT CGTCTTTCAC CACCAAGGTA GTTTTCTCCA T'rAATTTCCG TTAGACTGAT CATAAAGCCA ACCTGTCTCT AAAGAATGAT GTATAATAAC GCTTCTCTTC TTTATCTTCT GAATCTTCAC CCAACACTGT CTTTAGTTTT AATCTCTAAT GTTTTCCAAC CCATTTTTAT CGAAGTAGTA CCACTCTGAC TTTGGAAAAC GGGTAAGGAC CAATTGTACT ACCTTTAGAT GGAAACGGGA -ATCTCTCCAG ATAGAGAATC AAAATAATAG TACTTACCAT TCTTTGAGGT CCCCCTTTTT GTAGTAGGTT CTTCCGTTT'r TCAGAATCAT CTGCAAATAC TGTACTGGTC CCTAGCAAAC CCAACTTGCA TAGTr'rTTrT CAAAATTTTC ATCTATATAC CACCAGATGA GGCGAAAT'rA TAAACTrTAC CATCGATAGT 985
CTCCAGG
11887
INFORMATION FOR SEQ ID NO: 147:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 11340 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:
CCGGTATGTT CTGGAATACT ACCAATCTAA GCTGGCTGTG GTACGAATAC CTTAAGGAAT ATGACCGATT TTTCAGCTGG AAACGCTGAT AAAATATCCG ATATTCCTTT ATCAGTnI'TG CATGGAATCC TTTATCCTTT ATCTACGTGA ACGTCCCTTG ACAAGGTGTT TCACAGACAA CTATCAATCG AACCTTATCA GTATCTAACC GAGGAGGTTG AAAACGATCA GGGGGAACCT GAAAAAAGTT TCCACCAAGA AAAAGAAAGA AACCCT'rGCT GCAAAAACTC TTTCTAGGTG ATGAAACAGA AGGTTTTCTA CCCACAACAG CTTTCAAATC GAGCTCTCTC ATCATTCAAC AGCCATTATT GCCCTTCTCT TGGCATCTGG TGTTCGCTTA TCTAAGAGAT CTCAATCTAA AAATGATGGT TATTGATGTT TGACTCAGTC AATGTCGCTG CTTTTGCTAA ACCTTATTTA GAATCAACGC TATAAAACGG AAAAAACAGA TACAGCCCTT TGTTCCTAAT CGTATCGATG CTTCTAGCGT TGAGAAAATG TTTTAAAGTG CGTGTAACAC CCCATAAACT GCGCCATACA CCCTACAGTT TTACAACCCT GTTTGGAGT CTGGT~ATTTC GAAAATATGT CTAAGAAAGA CTGA-ATGCTA ATACAACAAA GCACT'rTCTA GTCTTTACAA.
TATTTCTATC GTAATGTAAT GCCAGAGCTG AAAATATCAA AC'rTATATCG ATCAAGAGCA AAAAATAAAG AACGAGATTT TCTGA.AGCTG TTAATCTAGA ACTCGAAAAG GTTGCAAACG GAGAATTATC TGGCCATTCG TTTTTAACTC TCTACAGAGG GTTGCTAAAT AC'rCAGAGGA CTAGCAACTA GGCTCTATGA CATGCTAGCA CACAAGTCAC GCTCTGGATA GTTTATGAT'r AGTTGGCCAA CTTCTTTTTG TACCAGCGAA GTAGCGTGTG CGTATTTCAC CCCACGAAGG CACCATAAAT CTTGATGTCT TACCACTAGT GAATT'rGATT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320
TGCGACTAAA
TGACCTCTAT
TCACAAGTTT TAGTCAGTCA CCAACTAGGA ACCCATATTG TTAGTGATGA ACAAAAGAAT TTACGTATTT TAAATTATGT AAATAAATAT CAAAAAAAGA ATTTATCCAA CTACCGCTTC AGCGAFI'TCT TCACGGCTAA ATATCAATGG TTTTTAGCGC CTTAAGA-ACA TCTTCGCGTT ACATCTTCTA CTGCAGCAAC GTCTTCAATA CCAAAGAAGT TGGATTTI'G ATTCAGTAAC GTTAGCAAAG ACTTCAACCT 986 CCACGACGGA CGTTAAATTC AGGTGATT'rA CCATAGTTCC AGTCCCAAGT TCCAA.ACTTA GTATCC'rTGA TGCGATTGAT rTCGGCCAAT TCTGGGTACT CTTTTTTCAT GTATTCCAAG T7'rMrGGTA ATTCATTGAT AATATTGGTT GAT'TCAAATT 'rATCTTCA AACCTTAAGG AAGAGCAAGC AACCGTGGTG CATGATACGG 'rCTTCTTCTG AAA.AGACGTA TTCAGTCATC AGTAAATCAC GGAATTTC GACTGTGATT ACACGGGCAC GGACGGATTT CACACCTTTT GCA'rTTGCGA GGACTGACAA ATCAACGTCA AACTTCTTAC CATCAATCTC TGAGCCAGGG TATTGATAAC TCTTCT'rrGG AGATGATCGT CCACTAATAC GCCGAACTAC TCGATAGTGT TCTGGTGACG ATTTGATCCT CATCCAAAAG GTGTCATTTG AATGA'rTGAT CTGGAAATAA CC1'TCCAGTC GCCATTCCTA CAACATCTGC
AAGGTCATTA
CGGAGT'rGAG
GTAGTTGAGG
CTCAATACCA
ACCAACAATG
CCGTTGATAT
CGACCTGTGA
AAGCTCTTGA
TTATTAAAT
TTTTCGCGAA
ATAGATGGCT
AGGCTTGGGC ATTGCCACAG ACTCAGCTTT AACCCCAAGT AGTCAAATGC CTTATTTTCA CGTGGTAAAC AGCTCCACCA CATAATCACG GTTGATTTCT TGTTAATCCA AAGTAGGAAG CCAAGGCAAT ATTA.AAAGCA GTGTTTAAAG GCGTATTCTT AATGTATTTC ATGATATCCC TAATCTATCT TCGTTTTATT AAACGCTTCG TACATCACTT
TTTACTTTAT
TTTTCTTAGG
ATGATAGAAA
TGAATGGATG
TGGATGGTCT
TTTATrAATT
TTATCAGCGA
GCAGCAAAG'r
AAACCTACTG
GCAAC'rGCAT
GCGTGAGCCA
GTTTCCATGT
TCAGCAT'rTC CTCAACAGTG ATTTCCATTT CTGCGGCTGC AGGACCAATA. ATGTGTACAC TAACTTTTAC GAAACCT'rGA GCTGCGTCAG TAAACTTACC GATGGCAACA 'rCGTATTTCT CTCCTACTTC AGGGAGAGTG TAGATGGCTG GATTTCCTrT AAGGGCATTT TCAGCGGAAA ACA'rCTTAG'r ACCGTTGATG TCACCTGGTG ATTCGTTGAC CTTGATACAA CCACGATCCA CAGAGTAAGT TGGGTGCCCG CGATGATGCT TGATGCTTCG CAAGGATTTC TCCGTATT'rC ATGCAATAGC ACGACCGTTA CACGGGCTTG TTCTTCTGTC CAGGAGTCAA ATTCAATTTG CTTCACCCAT GCGGAAAGCT CATAAATGCC TGGAACTGAA ATTCAAACTC AACCTCTCkCA CTTTCCTTGC GATGATATCG CAATGATTTC TTGCAGTT'rA 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 ATIACCTTCAA GGTCTGGCA'r ACGACCAATT GAAAGAAGAG TCTTTTCCTT CAACCTTGAT ACGAAGTTGA CCATTTTCCT PTACCAGTCA AGATGGTCAT TCCTTTACGC 'rCAAGAATCA AGCGAAGGTT CTTAGAAACT TCCACATCCA TAGCTGGAAC TATACGGTCC ATCAT'rTCGA TAACAGTCAC TTT'rGAACCA AATGTCATGA AGGCCTGACC GAGTTCGA'rA CCGACAACTC CACCACCGAT GATAACAAGG CTTTCTGGCA CTTCGTTCAT TTCAAGAATG TCATCACTAG TCATGACA.AG TGGAGATTCC ATACCAGGGA CGTTGATCTT GTTGACTTTT GAACCACCAG CAAGAATGAT TTTCTTGG'T 987 TCAAGCAATT CAGAACCATT TACCAAGACG TTCTTGTCTT TAGTGATTGT ACCAATTCCT TTATGAACAG 'rAACTCCGTA GCTACGAAGA AGTCCTGCAA CACCACCAAC AAGAGTATTA ACAACTTTAG ATTTAGTC TAAAAGTm TCCATATCAA CAGTGAAGTT AGGATfTTTCA ATCACGATAC CACGATTTGC AGCATGACCG ATATTTTCAA TAA?1'TCAGC G?1'ATGAAGG TAGG;TCTTGG TrGGAATACA TCCACGGTTT AAGCAGGTTC CACCAAGTTC AGAT'rTCTC-A ACAAGGGCAA CCTTACCGCC GAATTGGGCA GCTTTAATGG CTGCAACATA ACCAGCAGGA CCTCCACCAA TCACAACGAT ATCAAAAGCA TCATCGCTCr TACCATCATC GTTTGAGGTA CTTGCTACAG GTACAGGGCT AGCTTCTGGC GATGCTGCTC CAGCTGTTGG GATGTTTTCC CTTTrCTTCAC CAAGGTAACC GATAACTTCC GTTACAGGGA CAGTT 1CACC ATCTCCTTrG AGAATGGCAA TCAAGTACCC ATCTTCTTCG GC'rTCCAATT CCATGCTGAC TTTA'rCAGTC ATGATTTCCA AAAGGATTTC TCCTTCTTTT ACAAATTCTC CGACTT'T'r ATTCCATTGG ACGATTTGTC CTTCTGTCAT ATCCACGCCG TCTTCCTTCC TTTA'rC'ATA TCTTAAAAAT ATTGGCGTT'r CAATCAACTC TTTCAAGTCC ACGACACGGT CGTCAATGGT TTGACGACAA CTGGCTTCTC TTAATAATCG GACCAAAGGA 
GAATTTI'GTA ACTCACTTGG AAGGCTACAA CCAGTTCTGA AATCCATTAT CCATCCCAAC TTGCCATCTT CTGTCAATGA GCAAGCGAAA GAAGGTCTGT AGAACCTTCT TACGAAGAC GTrGGCGCAG TCAAGTAAGA ATTGGAATAC GCTCGATTTT TCAATCTGAG CAGGAGATTT ACATCCTTCT TCATGATTT TTATGTTCGA GGGCAATTCG TAAGTT'rCCA CGTCTTCTT
TAATCCTAAA
GATTGTCGAA
CTGAACACCA
AGCCAATTTA
AAGACTCATC
TGCCATGGCA
AGCGTTGATG
TACAGTAGTC
GCTTTTGGCA TAATTACTTC TAAGGCCATG GAATACTCTT GCTCTTAAAT TAACATTGAG TTCATAAACT TAGCACCAGC CATACCATCT CTCATGATTG GGCGAATCAC AATTTCACCA CTGACACCAA GGA'rAGCTGA GTTGGGTrGG AACATTCCCA AAT'rACTGAT TGTGAATGTT CCATCCAAGG TACGGCCAAT AACATCCTTA TTCTCAGCAT TGTAAACAAC AGGTGTCATC AGATTGACAT AGTTG'rGAGT GATAATAGTC TATGGGTGTrT TCATAAGAGT CTTAACAACT TTCTTCCCAG TTGCTTCCAT GATTGCTrCA 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 CAACAT?1'CA GTCATATCAA CTrTCATAGTT TTCA-ACCATG CGT'rGGGCAA TAACCTTACG ACCATATGGT GT'rACGTTAT CAGGGACTTC GATGCTATCG TTTTCGATAT TTTCAGGAAG ACCACGATGA CCGGTTCCTT GGATTTCCTG
GAGGGTGAAG
CATTGGTGTC
TTCCAC'Tr
CAGGGCCAAA
CCAAGCAATG
TTTTGCAAGT
GTGGACACGA
GGCGAAATGC
CCGTTTGCAC
GAACCACGTT TGTGTCTTTA CTGAGCCAGA AACGTCGTAG
AGGTTTATCC
'rCAGCCATGA
TAGGAAATCG
GCCCGTGCTC
TTC'rCGTCAG
AGTTTTTCGG
TTGTGCATAA
GTCAAATGCT
988 CTAAA'rCATC CGCTAACTTT CTAGCTGCAG GAGTCGCTCT TAGCTTGTCA CCTCTCCAAT TCTATTTATG A'rACAAAGGG CGTCAAAAGC GACTGAAAAA ACGATGGCTT CGATGAAGCC AAGGAGATTT ATCT7MrTC CGATCTTTTA TAATCTAAGA TATTAATGAC CTTTATTTrA TTTACATAAC ATTGCATCTT TGATACTTTC GGCATCGGCA CATCTTCTCC TCTGATTCTG AAATAATAGC GAAGAGCTCTr GCACCTAAAA GATACAAAGT TTATCr-rATG TAACCCTATT C'rTTGT-rATA AACTGTTGGA ATCATTGCAT TTTCTAGGTT TGCACAACGG CGAATTGGTG CATCTAGATA TGAAATTTCA CCGATATAGC CACTTGTTTT GTGGGCATCG TTGACCAGAA CAACCTTACC AGTCTTCTTC ACTGAGTPTA TGATGATATC CTTATCAAGC GGAACAAGGG TACGTGGGTC AACAATTTCA ACTGAAATTC CTTCTTCTGC a a A *S.a a. a.
TAATTCTTCA
ATCCGTTCCT
TGGCACTTCC
ATCACGGATA
CT'TAAGTCC'r
GCCAACTCCG
CATGTAACGT
CATGAAGGTC
AGAGATGGCA
CATTCCAACA
GCAGCTTGAA CCACACGC TGGCGTTTGA TTTCACCAAC CCTTTTTGGT TAAATTCTGA GAAGACTTAA GCAGGCCTTT GGAATGTGAG TAAACCAAGA TTACCAGCTG CACAACGAAC GTTTTAGCAG CTTGGTTGAC ATATCGACGA TrGGACGAAG GCTTCAGAAA TCGGACAGTC GAAGTACCGA AGTCTCCTCC AAGCAT'TrT CCATAAGTAA CAACTGT'rAC CCCAAGTGGA ATTGTGTAGT CTGGATCAAC CTTGTACTCA AGTATAATAA CTGGGTTGTT CATGTCCGCA GGTGTTCCAG GTGCCACAAC CTCTAGAGAT TGTGAGTGCT GGGCGGCAGA AGTCATTGGA ACCTGACCTT TACCACCAAA GATATTGTCC ATGGCAATAA CAGAGAAGTC TCCTGTCATG GCTGCTCCTG CTGCTGCTCC ACGGACACGT TCTGGACCAA ATTCT'rCAAG GAAGACACCG ACGTCTTCTC CCATCAAGAA 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 CACATTT'rCA TCGCGACGCA TTCCTCAGA CATAGCAAGG ATA.ATGGTGT CACGGAAGGA CATTGTTTTT GTTT1CCATTT TATCTCTTTC TCCTTAGTCT GCGTAAATAT CTTCAAAGGC TGATTCAAGC GGTGGGAATG GGCTTTCCTC TGCAAATTTA ACAGAAGCTT CTACTGCTTC CTTTACTTrGC GC'rTGGATTT CTTCCAATTC TTCG.GCACTT GTAATTGCGG AGGT'rTTCGA TTGGATCTTT 'rTGTTTCCAC ACGATATTTA CCAGGGTCAG ATGATGAGTG ACCGAGCCAG CAAGACTGGA CCATTGCCAC TGCGAACATG GTCCACAGCT ATCGATGACA 'rTGTTACCGT CTTCGATGAA CATTCCAGGA TTGATGGATA TGTTCTATAT TGGTCATT'rT CTTGA2'ATCC GTTAATGCAA TAGAAAATGA C1'GGCAGGTT CCAGATAGAA GCAATGTTAT TTrTCAATAAG AATTCCACTT CTTCACGCGT CGATAAGTTA CACTTTCAAT TTCTGAAATC CTTCATAGAC ATTCCATAAG CGGCGCTACG GCAGAGATAC CGTAACCGT'r GCCATGTTCA CTGCTTCGTG 989 GAAAACACCT TCATTGGTCG CACCATCTCC AAAGAAGCAG ACAACGATT 'rACCGGTATT 6720 T'rGCAT'N'GC TGACTGAGGG CTGCACCGAC ATTGGCACCA AGGTTCCCAG CATCAAGGTC ACAGGTrCCA GTGTATTTAC CAAGGATTTC AGCAATAGCT TGCCCGTGTC CACGGTGGTT TAACATAGCC CCCACGTT'AG CTGCCTCT'rC TTTCCCTTTC TTTACTAATT GTGCAATTTT ACGGAACAT'r TCTAGCAAAA GATT'TTTATC TTCTTCTTAC CTTACTATTT TACCGCTTT'r AATTrTCACAA AATAAAAAAG AAAACCCCGT TTT'rTCACAA ACTTTTTAGC ATTTGGATTT AGTTAAACGC C.AACGGTAGA GCGCCCCGCT GTAAGAATAA GCTCCAAAAT CTGTTAGGGA CCCCAATAAC CAAGCAAACC AAGGTAAAAA AGCGATCCCC ATACCACCAC CTACGATACC AGCGATATGC ATAGATCCAC CTTTCCcTTr' AGCCATCATT CCGTTGAGGT CAATCCC'TTT TGAGGTAATC 
AGATCATCTG GAI-rGAGAGC ACCAACAGAA AAGTGCGTCA TTCCTGGCAC TAAGTCCATG CGACGGAT'rr CTTCCATCTT TAAAGTTGAC ATCTTCTTGC CTTTCTAACT GGCAAATACT GTCAAAGTTT TTCTAAAAGA GAAAACAAGG GATTTTCTTG TCAAGAATAT TGCTAAAGAT TCAAATCTCT TCATAATCAC CACAATCAAA CTAATAATCA AGCCGATCCA ATCAAATAGC GTAXICACAGG GATTGCTACG GGAkATAACTG TATCCTTATA CCCCCGCAAA S S 9* 5 5 555 5 ('5 5* 5 55 5* S S 6 55 S S Vote, *0 ATTCCCTGAA GCGGCGCCGC AAAGGTATCT GCTAACTGGA AGA.AAAGACT ATAAGTTAAA AAACGCACTG. TCAAATCGAT AAATTFTGGG TCGTTACCAT AAAGACTGG;C CACATT'rCCC CTAAAAATGT AAAGGAAGGT TAAGGTGAAG GCCGCAAAAA TGAGGGCAGT CCATCTTCCT AGACCAATAT AGGTTTTCGC ATCATCAAAT CGCTTGGC'rC CCACTTCATA GGAAACGACA ATAGCCATAG CCGATGAGAT ACTCATAGGA AAGGCGTACA TAAGACTTGA AAAGTTCATA GCTGACTGGT GACTAGCTAT AATCAAGGGC GAAAA6TTAG CCATAATCAA GCCAACCACT GAAAAGATAG CCACTTCCGC GAAGACAGTT CCCCCAATAG GCAGACCTAA ACGAACTCCT TCCT'rAATTT TATCCATATT AAGTGGAATT CGTTTCTCAA GG'rGTAAG;GC TTTGAGCTTC TCCTGTTTAA ATAAAACCAG AACAGAAATC CCAAGCAAGA CCCAGTAGGC CAAGCATGrT CCTAAACCAG CACCAGCCCC TCCCAGjTTCT GGAACACCAA AGGCACCGTA AATCAAGAGA TAGTTAA-ATC CGCTATTGAG AGGGAGTAAC AAAAGCATGA GGTACATGGA CAGTTTGGTC AAGCCCAGCG AATCCAGCAA GGAACGAATG ACGCTAAAGA GCAACAAGGG GATAATCCCG ATAGAT AA .Ai ACCAAAGATA GCGAACCGCT ACTGCCGCTA CTGCTGCTTC TAACCCAATA TGATTCAAGA TTATTGGTGC CAAGAAAAGT ACCATCCCCA GCAAGACCAC AGATAGGCCC AAGGCCAAAT AAATAAATTG GTAAAAATCA GACGCAACTT CTTCCTTTNr GCCTCGACCA 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 990 AGATGGTGAC CAA'rCATAGG CACCAAGGCT GACACAATCC GGATTCCAGA TACTGGT'rGC CATAGArACA CCAG.CCAAGT GTCATTGCAG TATCAACAAA AGAGGCAGAA TAA'rrGGCAA AAGAAAATTT rTAAAAATAA TACTAACTTC TCTCGTAAAC
CTCTTTCTAT
CGGAGGTTTA
TCTAACTAAA
TTGAATTTTG
CTCCCTACGT
GTCCCTTACA
TAGCTGCTTT
TGATATCTGC
GGTTGTATGT
TCTGATTTAT CTAAACCAAA GAGTTTCAGA TTAGAT'ITTG AAGTAGTATG CCAACACGCA CTGTTAGAAA TGTAAAGAAA CCATAGTCT GTATTGACCT ATTGGTAGAr CAGGATTGGG ACTTTGTCTT ATACATACTT CCATAGT= TCAAACTTAG CATCTACGAC AATAATAGCT GCTTTrCTT TAGTr'rCATA CTATTAGATT TCGCTTGTCT TCAAGCTTGA TAGACGATTT TTCACCGATG AATGGTGAAT
CCTCCGTTAT
GAACGAT'rAG
TTTCGAAAAT
GATGGTATAT
GGAAGCAAAA
TGGACCATTC
CAT NrT'TCA CATATTGAAC CGCATGGTCA CTGCGGGACA GTAAATTCCA AATTCATATT CTAACTCCTA TTAACCTGCC CTTTTAAGGT 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 TGGGAGTCCA CAAAGCGGTC AGCCTTGGCA TCAAAAATAG TCAGCCAACT AACCTGCT'rC AAAGTTGTAA AGCTTCGCTG AGTAATTCCA TCAAGCTCAA CTCACCAGCT TCTACTAAAT AGGTCAAGCT GAGAGACAGG GATGTTTCTA AGCCAGTCAT ACCAGATGGC GCTTTGGTAA TATCCTCAAC ATTTTTrCA TCTACATCAT GAGGCGCGTG GTCAGTCGCA ATAACTGTGA TGACACCTGA TTTGAGACCT TCGATAACGG CACGACGGTC TGATTCCAAA CGAAGCGGTG GATTCATCTT AGCAT'rGCTA CCTTCGT'NA AAAGAACTGC TTCTGTCTTA GAGAAATGCT GTGGCGCTAC TTCTGCTGTG ACr'rCTGCAC CTAACCCCTG AGCAAACTCC ACTACTTTAA CACTTTCTTC CTTAGACAALA TGCTCGATGT GAACATGGGC r-rTAGTTGCA TAGGCAATCA TGACATCACG CGCCATCATA GCGTACTCAG CCACCCCAGT AGCACCGCAG ATATGGAAAT GTTCTCTAGC AATATTTTCA TTAAAGCCAA GAACACCGTT CAAACCTGGA TCTTCCTCAT GAAGGCTGAT AAAGGTA'rTG AGTrTTTTG(G CT'rCCTCC-AT GGCTTCCTTG ACAATCTTAC TGCTCTCAAG CGGAATACCG TCATCAGAGA AACCAACCGC ACCAGCTTCT AAGAGTGCCT TAAAGTICAG'r CAAGTTTTTA CCAT'rAAAGT TTTTAGTAAT GGTCGCAACT GTCTTACAT TAATCTTCTC TT'rGGCAGCT GACTGGAGAA CTGCTTGCAA AGTCTCCACG TCTGAAATGG 'TTGGACTGGT ATTAGCCATC ATGACGACAG TAGTAAAACC ACCTGCAGCG GCTGCTAGGG CACCAGTATG AATGTCTCTTc TrAtGTGTTT GACCAGGTTC ACGGAAATGA ACATGAATAT CGACCAAGCC AGGAGCAACC ACAAGACCAG TAGCA'rCAAT CGTTTCTGCT CCTTCTTCCG TGATCTCAGA CGCAA'rTTTG A'rAATTTTCC CATC?rGAAC TAAGACATCA CAAACTTGAT CCAAACCAGA CTTGGGATCC ATTACACGAC CA'NrTTGAT TAGTAGCATC TGCTT'rCTCC 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 TTTATTCATA GAAATCAACT TGGGTATCCA ACANITPTATC CCCATCATAA ACAAACTTGG CTGAAAAGAA GGGTN-rATCC TCTAAAAGCC ACTCAACAAA GGI'GTGGTCA CCTTCCCAAG TCGGCTTGCT CAAAACCTCA TCATAGGGAA CCCATTCTAG CGTCCCCTCA TTGCAGTCAA TCAAGTCGCC CTCAAACTCC GTCACCTTAA AAACATAGGT GTACCAGTCT AAATCTGGTG TAAATTCAGG AAAAGTGATG ACACCTTTTA GAACTGGCTT GGCTTTGAGC CCTGTT"1CTT CAAGGATTTC ACGCGCCGCG CATTCCTGGG GCGTC'rCTCC TCTCTCTAGC TTACCACCCA CACCAATCCA TTTCCCTTCA TGGACATCAT TGGGTTTCTT ATTACGATGG AGCATGAGCA 
GTTCTTTCCC ATTATCAATG TAGCAAATCG TCGCTAACTG AGGCATAT'rT TCTCCTTATC TAAGCCAATC GATTGGCTCT TGTCCTGTCT CTTTrA-AGAA TGCATTGGCC TTGGAAAAGG GCTTGGAACC CCAAAATCCT CTATAAACCG ACAAAGGACT TGGATGGGCT GATTCGATAA TCAAGTGATG AGGATTGGTA ACTAATGCCT TCTTCTrACG TGCATAAGCT CCCCAGAGTA CAAAA.ACGAC TGGTCTATCT AGATGATTGA CCACCTGAAT CACAGCATCA GTAAAAGGCT CCCAGATI~rG ACCAGCATGA CCATTGGCCT GTCCAGCAGG AACAGTCAAA CAAGCATTAA GAAGCAAGAC TCCTTGCTCA GCCCAAGCTG TCAAATCATG AGATTTCTTA ACTCCGATAT CATCTGACAA TTCTTTCAAG ATATTTTGCA AGGATGGTGG AGCTGGGATA GAGTCAGGTA CAGAAAAACT CAAGCCCTGC GCTTGACCTG GTCCGTGATA GGGGTCTTGC CCTAGAATPA CCACCTTAAC TTCTTCAAGC AGTGTTGTCA AGAGAGCCTG AAAAACCTTT TCCTTGGGTG GATAALATAAT CCCCTGAGAA TAGACCTGCT CCATAAACTG ATTGATTrTC CCGAALATAAC CCTCAGGTAA TTGCGCCTTA ATCAAAGCAT GCCAAGACGA GTGTTCCATA GCCGACTCGG INFORMATION FOR SEQ ID NO: 148: SEQUENCE CHARACTERISTICS: LENGTH: 12127 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148: AAAAAATAGA CTTGTTAGAC TATAAATGTA GTAAGCCTAC ACAAGAAAAA TACATAGAGA TAAAGGTGAT TATTATGAAA TTCAAAAAAA TGCTTACTCT TGCAGCCATT GGCTTATCAG GATTTGGGCT TGTTGCCTGT GGCAATC!AGT CAGCTGCTTC CAAACAGTCA GCTTCAGGAA CGATTGAGGT GATTTCACGA GAAAATGGCT CTGGG;ACACG GGGTGCCTTC ACAGAAATCA 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 120 180 240 992 CAGGGATTCT CAAAAAAGAC GGTGATAAAA AAATTGACAA CACTGCCAAA ACAGCTGTGA TTCAAAATAG TACAGAAGG'1 GTTCTCTCAG CAGTTCA.AGG GAATGCTAAT GCTATCGGCT ACATCTCCI' GGdATC'TTA ACGAAATCT~G TCAAGCTTT- AGAGATTGAT GGTGTCAAGG CTAGTCGAGA CACAGTTTTA GATGGTGAAT ACCCTCTTCA ACGTCCCTTC AACAr'rGTTT GGTCTTCTAA TCTNTCCAAC GTCAACAAGT GGTCACAGAT CAAGCCAACA CTTATCAGGC TGGAAAAATT AGCAGAAGCT CTAATGGGTC ?1'CAGCAGGT TTTCTAGGGA ATTAACTCCT ACGGTATTC~C TGTTGTGGTC T'rGCAGACGT TTTTAGTGGC CCATAAATCT CTAAAGAGAT GGGAGGTGTC GTATCTCCCT CCTTCAAAGA AGCAGTTTTT CTATN'TTGCT AATCTGT'rTC GCTTTGCCCG TTrTTTTAT'rA GTATTT'rACC AATGATCGTT TGCCAACAGG CATCTTGACA CTAGGTCAAG 
ATTTTATCAG AATAAATTTA T'rGAAGCTAA AAGTTGTCTG TTGTAGGTTC
TATAAAAAAG
ATTACCGCTG
GAAGAAGGTA
AATAATGACA
AAATTAACCA
AAAATCCAGA
TTAAGGAGAA
AGAGTCTCAC
ATAAGGCAAG
CCTGGGACAA
CTTTrATCCAC TCCAAACAAG AACCGAAACC ACGGAATATA CACTTCAGTA TCTTCTTTAA AGTTACGATT GATATTACCT AACCGC'rGAT ATTGGTATGG CCATGATGCT ATTGCTTTAG CCAAGTCAGT ATGGCTGAAC GATTAAATAA AATGTTTGCT TAAGA'rAAAG AAGGCAAGTA GCAGACGTTT CATCGTACAA
TACTTTCTTC
AGGGCAATTT
TTTATTTTI'A
GGCAG'rGAT'r
GTTCCTTAT
TCGGTGTTTA
ACTAGAAAGG ACAAGATGTG TTTTCATGAG TGCAACAGTA GTAATGGCTT ACCTTTCATA GGTCGCCAAC GAACAT'rCCG TAATTACCTT AGGAGCGATT TGGTTTATTA TTGTCCAAAG CAGCCATTCC ATCTATTGTT GAAGCTTTT'r AGGAAATGGC TT'rTGCCAAC CATTATCAGT ATTCTGGTAG CTTGGCTCTA
ACAAAACAAG
GCTGTTGTAG
GCTAACTrACG
GCAAGCTATG
GTGATTGGGG
CCCGTCTATG
TATGGTTT
ATGAGTGTCC
TTGTCAGAAT
GGAGCTAGTC
540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040
GCTTCTTAAA
TCGGCCTACA
TAACCGCTTC
ATCAGCTATC AACTTGATGG ATTATTGrGTG CCTTGGATTA GTTACTATTA GGAATAATGA CTGCTATCCG AACAGTTCCC AAAACC'rATT ATGAACGGAG TATTTTTAGT GTCATCTTGC CAGCTGCGAG ATCTGGTATT TTATCAGCAG TTATTTTAGG AATCGGTCGC GCAGTAGGTG AAACCATGGC AGTTATTTTG GTGGCAGGCA ACCAGCCGAT TATTCCAAGT GGACTCTTTT CAGGAACCAG AACCTTAACA ACCAATATTG TT'CTGGAAAT GGCTTACGCA TCAGGTCAGC ATAGGGAAGC CCTTATTGCA ACCTCAGCAG TTCTCT'rTTT CCTTATTCTC TTGATTAATG CCTACTTTGC CTACT'rGAAA OGAAAATCAT CTTATGAGTA AATACCTGCT AAAACTTCTC GTTTATTGTT T'rTCAGCTTT AACCTrTGC TCTCTCTTTT TAATCATTGG TTTTATCCTC ATCAAAGGCT TACCTCATCT AAGTCTATCC C'rC?TCr' GGACTTATAC GTrAI-rCTGG TCITrGGTGC TATCTTGTCG AATATACAAA GATACCTTAT CTGGGATTCC GTCTTCTTAG GTTTT'CAATA ?TGCCAGTCA TTATTCGCTC CAAGCAAGTT ATGGACTTG GTTGCCATGC CAGGTAT'N'T ACAGCTGCCC TCATGTATAC TCAGGCCGTT CTCTAGCCCT GAAGCCTATG CTACCGGCGT 993 TTCTGAGsAAC ATTTCCCTTA 'rGC-CAGCGAT TCTTCTTTITA GCCNTGCCCA TAGGGATTTTr AAAAGAT'rCC CI'TGTGTTA AA.ATCATGCG TTCCATTGT'r T'rrGGTCTGT T~cGCATGcT
TATTTCCACC
TGCTGGTTTT
ATTGGCCTCA
CTTCTTGTA
CTCTCTGTTA TCAGGAATCT AACAGAAGAA GCCCTTTAT 'rAACCTCAGT TATC-ATGGTG CTGTrAGTGA TAGCATGCGT TTTTTAGAAT TGTTCTACCA TTGGCCGTAT CCTTGGTGAA CGCCAAGTAG TCTCATGTCT
GGCAGGTAAG
AGCTGGAGTG
AT'TAGGTACC
TTrACGGAC'TG
ATACTAGCTA
TCTACCAATA
ACATATGTAT ATGCTGTCAA GTGAGGGGCT GATTTTGATT ATTACTGTT'r TAATGATAA 6 *g *6 6@ 6
AGCTTATTAT CTCGAAAACT TGTGAAAGGA GCTTCCTAGT ACACCTAGAC TTATTTTACG GGGATTTTCA AGCCTTAAAA AGAAAGACAG ATTACTGCCT AACCCTTAAC CGGATGAACG TGATAGGCCC ATCTGGTTGT ATTTGGTTCC TTCTTGCCAT
ATGGGAACAT
AATAT'N'CGA
GGCAAATCAA
ATTGAAGGCC
CAGCTACGTA
TATGATAACG
ACATGTCAAT
TAC'rCTATCA
TTTCAGTCAG
TTCAATTACC
CTTTCTAAA
AAGTCCTCTT
AGCGTGTAGG
TGGCTTATGG
2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 AGATGAGCAA GATATTT1ATA G'rAGCAAATT CAACCTTAAT GATGGTTTTT- CPLACAGCCTA ATCCCTTTGC CATGTCTATC CCCAAGGACA CATGGTA'rTC GAGACAAAA.A ACAATTAGAT GCCrAGTGG AGAAATCT AAAAGGGGCA GCCATTTGGG AAGAACTCAA AGATGATCTT AAAAAGAGTG CCATGTCCTT ATCTGGCGGT CAGCAGCAAC GCCTTTGCAT TGCGCGAGCT TTAGCAGTAG AACCTGATAT TCTGTTAATG GATGAGCCGA CTTCAGCCT'r AGACCCTATC TCCACTTTAA AAATTGAAGA S. S 6 C.t6
CCTCATTCAG
AGCTCACGT
AGATACCGTT
ACGGTTCCGA
TTAGAACAAT
CTGGCCT'rAG
ATCAACCAAG
CAACTAAAAA AGGIATTATAC ATT'rCAGATA AAACTGCT GACGTGTTTA CCAATCCAAA TAAGGAAGGA AAAACCTATG CCTTTTTAGG ACTAGGGCAA CCTCCAAAGA CAAGGAGATG GTCAAAGCGC .!IATCGAATTG.
GATTATCATT GTTACCCATA ACATGCAACA TTTCTTAACA GGAGAAATTT GCGAATTTGG AGATCAGCGC ACAGAAGACT ATATTTCAGG AGAAATCAAT TTGACTTAGA ATTGCATGAA CTTGTCCTTG AAACAGCTTC AAAAGCCTTA GCAGAGCTAA TTATCAATAA GGATCATGCT ACCTGTGCCC, GTTTGTTGGC CTTGCAGCAG AGCATCATGT CT'rCTTGTTC AGACCTTGAA CCACAAGTGT CTGACCTTCG ATTTGTGATT 994 CGTATGGGAG ACCATATGGC AGGCATTGCC AAAGCTGTTT TGCAACTAAA AGAAAATCAA 3840 CTAGCCCCTG ACGAAGAACA GA'rrrATTGG TGCCTTTCC GATGAACAGA TTGACCAATA GACCAAGAAA CCTCAATTcc CGCTCGCTGA TTACATTGCT TAGTGGATTT GAATTAATTC TTTTATGGTT GTAAAAAAGT
GTTACACCAA
TTTGCACCAA
'rrATTATGCC
CAATGGAACT
AACATTrrTG
ATGGGTAAAT
GCCTCAAAAG
TTATCAAAGG
CAATACCI'rT
AACGCCTAGT
TTT'CAAT'rCT
TTCTTTGGGG
GGCTGTCTGG
TCCTTGACTA
ACTGCTAGAC
GCCT'rATTTG
ATCGTGAACG
A'rAAGACAGA CTTATAA'rCT AACTAATCCT TAAAAGAGAA TCATTTGACC AATTTAAGCA AGGGAATGCT GAAAACTTTA TAG'rrCGACC ATGATTAAGA CTACCTCTTT TTAAAGCAGC
TATCCCTCAG
CTATTAGTAT
AAATCATTGG
ATATCATAGG
CTACCTAGAA
GAGTACGATT
GTGTACATAG
TCA.AAGAAAG
ATGAAGTACG
TAGCTGGTGA
TTGCCGGAAA
CCTATTCAAA
CTGTTCCATA
TCAAGCGTTT TCGACGTCTC TTCCTTCATA GACATATTCT CAAATTCTCA AGTCTA'rACG ATACAATCTG CCAGATAAAT CTCATACTCC
CATGCTAGCC
TGCTCAAAAA
ACTTATGAAA
GCATCTGGAA
ACAGGAGAAC
AAGTACTCTT
TGAGGATTG
GAAAGCAGGA
TATGATGATG
TGAAGTAAAG
ATATGTCTCT
ACAGTTTCAA
TAGAGC'rAGA
GAGCATCAAA
CAATTATGTC
GCTTTTCTGG
TGATAAATCT
CACTCTCTTT
ACTTCAATTC
CAACCCCCAA
TTTCAATTCT
3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 GGGCAGGGGA AAACATGCCT AACAGAATAA GTCACCTTAr TTTAAAAATC CCAAGGGAGG AGTCTGCCCT r'rTTTAGGAA AAAATCAAGA CAAATCTCCT TCGAACA'rCA GAAATTAAGC AAAATCACCA GAAGGACAGT ATTTCAACTA TAATT'TTTGA ACTGTGTAGT TCGTTAGTGC CAGATATGA-A TAATTTGGGA TTCTTCCTCA GGTAGCCTAT CATAATACTC TTCAAAAATC TTATCAAAAA CTTTTGGGCG ATAGTTTCAT CTTCGTATGT AG6AGTCCTC ATCAAGAAAT 'rAGGTATTCC TTATCCAACT CTATATAACT TGGCATCAAC TTGTAATCTT ACGTTCAGCA ATATATTTTA ACTTTG'rTAG TATTGGTCTG GATTCTCCAT AATTAATTGA CGGATACT'rA ATTCAGACTC ATCACCACAA AATTCTGAAC GACTGATTII r 'N'TAGCCAAA CGTAATCTTT 'rAATT'TTrC GCCAAACTCT CGCAACCTAC AAGAACTTCC TGAGTTGTTT' ACCTCTATTA TAAGCATATA CTGAATCAAA CTATCTATCA GATT'rCTTCT CACTTTAACT AAAGACTAAG AGTTTATCCC TTCGTCTCGG TTTTTGTGTA T=rCCACC ATACCCCAGT AATGCAAGTG CAAAATCCCC TAGAATATGA TAGAATAAGA GAAAGAACTC TATCAAGGAG GAAATCATGG AAAAACAAAC CGTCGCCGTC TTGGGGCCTG GTTCTTGGCG AACCGCCCTT TCACAAGTCT TAAATGACAA TGGACACGAG GTACGTATTT GGGGAAATCT TCCCGAGCAA ATCAATGAAA TTAATACACA CCATACTAA'r AAGCACTACT 'PTAAAGATGT 995 CGTT-CTAGAC GAAAATATCA TGCGATTTTG 'TTTGTTGTCC AACCTTGGAC CATAAGGTTA TAAACGA'rrA TCAACCATTC CGTTGTI'TCA GG;GCCTAGTC TGCTGCTTCT AAAGATTTAC TTGCCTACAC CGACTTAGCA GAAACATTCA AAGATGTGGA CAACAAAAGT GACACGACTT GTTGCCCAGC AAGT'rGCACA TCATCATGCA CGCATCAAAG GGATTAGAAC CTGATAGCCA TTGAAGAAGA AATTCCTGAA CATCTCCGTA GTGATATCGT ATGCAGAAGA GACCATTGTG CGTGACCTAA CTTTAATAAC AAACAGCTCA ATACGTTCAG AAGCTATTTA GTAATCACTA CTTCCGACTr TATACCAATA CCGATGTTAT CGGGrrGAA ACTGCTGGTG TATTAT'rGCT GTCGGTGCTG GAGCTTTACA TGGTCTTGGA TTTGGTGATA a a a. 
a a a a AGCCATCATC GCTCGAGGTT TAGCAGAAAT TCCATTGACC TATAGCGGCT TATCTGGTGT CCACTCTCGT AACTGGAGAG CTGGAGATGC AGAAGCTAAT ATGGGCATGG TAATCGAAGG AGCCCA-AGAA CTTGGAGTCT ATATGCCCAT CGGAACCAAT ATCAAAGATG CCATTTATGA TGAGTGGTCT TAACCCTCTA TAGAAAGGAT CATCCCTGCT GCTGGACTAG GAACTCGATT AATGTTGCCA ATCGTAGACA AACCAACTAT AGGTATTGAA GATATTCTAG TTGTCACTGG TGATTCAAAC TTCGAATTGG AATATAACCT GCTAGTTGAT AAAACAACTG ACATCCTCI TCTCGGAGAT GCTGTTTTGC AAGCCAAGGC GCTTGGTGAT GACTTGATGG ATATCACAGA CACCCGCCTA GGGGTACCAC GGGAGATTTG ATCGTAACGG TCTCCGACGA GGAGAATCCC AATTTCAACG ACTCGAGCAG TACACAGGCT ATTTACCAAG CATCATGAAC AATGAATTT-A
CTCTTAAAA.A
ATGCTAAGGC
TCGGGGCCAG
GAACTTCCAT
TAGCTGATAT
CCTATGAACT
TTATTTATCA
AAGCAGAAAA
TTTTATGACA TCAAAAGTTA GAA.AGGCAGT TTTACCAGCA ACCAAGGCCC TTGCCAAAGA CCAGTTTATC GTGGAAGAAG CTCTCAAATC TAAATCAAAA CGTTCTATTG AGGACCACT CAAAGAAAAA GGGAAAACAG ATCTTTTGAA GCATTTTATC CGCCAAACTC ATCCACGCGG TTTCGTCGGA AATGAACCTT TTGTCGTTAT CGAAAAGGCT GTTCCACT'rA CCAAACAACT 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320
CATGGATGAC
CGAAGTATCT
TGTTGAAACC
CGGACGCTAC
AGGAAATGAA
TGCTCGrTGAG
ATCCATCGAC
TACCAGCGTA CCCACCCGTC TACTATCGCT GTCATGCCAG TCCCTCATGA GCTITACGGGG TTATTGCTCC GCAAGGCGAA GGAAAACATG GTCTTTACAG TTI'GTTGAAA AACCAGCTCC AGAGGACGCT CCTAGCGACC TTGCTATTAT CTCCTCACGC CTGAAATTTT TGAGATTCTC GAAAAGCAAG CTCCAGGTGC ATI'CAGCTGA CAGATGCAAT CGACACCCTC AATAAAACAC AACGTGTATT TTCAAAGGGG C'TCGT'rACGA TGTCOGAGAC AAGTTTGGCT TCATGAAAAC TACGCCCTCA AACACCCACA AGTCAMAGAT GATTTGAAGA ATTACCTCAT CCAACTTGGA AAAGAATTGA CTGAGAAGGA ACACATAAAT TAAGTAAATT CTCTACTTGA CGCTA'rACTT GTATrTrGTTT 'N'TCATTAAA CAAAAAAGCA CCGAATCGGT GCGCACrTTT~ AACTrTTCTA TGTTGTTTCT AATGGTTCCA CTGTTGCAGT AGTrCA'rGGTT AAATTAAATC GTGGAAGCTA TTAATAAAAA TATAATAAGG G'rAATTGCGG ACACTATCAA AGAAAAAGAT AAACAATAAA AAAGTAATCA TTTCATCAGT 'rGGA'rGGTAT TCATATAACC AACAACAAGC AAAAGATAGC AAATCAGACA AAGAACAAGT 996
ATAACAAAAT
ATCTACCTAT
ATAAGAGTAG
TCAAGTTGTG
AAATAATAAA
AACCGAGCCG
GAGAAAGATA
TATGGAGAAC
CATTTATATA
TTAATAAAAA
AATAAATTAG
AAGATTACC
CTAATGAAAA
TATAGTAAAA
TACGGACAAA GCCATITr TAA-trTAAA TrTTACTTAA AACATAACTT GTTTAATPTTT GG3TGTAATTT TAATTTT1AAA A-AATTTGTAG AATTTATCGA TTCAGATGAA TCTATTTCTA AGGTAAAGAC TATCTTAATA TTTGCAAAAA GGTTTGGCTT AACTCTTGAA ACAAATGTAA TGAAACTTTT AAACAAGAGA TAAATTCAA AAACAAATCG TGCTGAAAAA GCAAAGGAAT GTCTTATAAT GGATCTTCCA APATTCAAAT AATTATGGCG TGGATCATCA GAACAATATT ATATAAAGGC ACTGGTGCTG AGATGTGTAT GATGACGATG
AATT'AAAAGA
ACAAAGTCAA
ATGATGTTAA
AAGAAATTAC
ATTTGGAAGA
AACTTTTGAA
CTTCTAGTCA
ATTCAAATGT
GTCAAGATTA
CATCTAGCAA
ACGGCTATCA
GAAATTACCT
TGCAGTTGGT G'rrGTATrGG TATTAGGGTT AGAACAACAA GCAAAAATTG TACAAT'rAGA TGATAAACTA TTTGAATCAT TTGATGCATC ACTATCTGAA ACTTCACTTA AAACCGATGC AGAATCATCT AAAGCAAT'rG TAGATTTTCA AGATTCAGAT GACAAATTTA AAGATAAAGC AAAACAAATT GATT TATCA AAAAAG'rTGA AACTCTTAAA TCTCTAA.ATG ATCT'rGTTGA GAAAGAAGAA CAAAAAGCTG CTGAAAAAGC AAGTAATTCT TCTGGTAGTG CTTCTAATGA AGATTATAGT TCATCTGAAC AAACTAATCG TCTGGTTCA GGAGATAGTT CAACAAATGG 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 TAGTCAACGC 'rAATAACTAT TTTAGAGCTG AAAGCTACTC ATTTTTTAAG TAGCTTrTTTT
TTCAAACAGC
AAGATACTAC
TGGGAACTrT
TGTTGTTTCG
CTTATTCAAG
CTTACCG'rAG
TGAGCTGACT
TAGCTTCCTA
TGTAGAAAAG
'rGTTATACAT GGAGCAAATA ATGTCTACAG TACAAAGATC ATAATAATGG GGTGGCGGC.A TTGCAGA-ACC AATGGTTCCA AAACACATTA TTTACATATT ATACTCAATG GTATCGCTAC TGACTTCGTC TCGTC-AG'rTC TATCTACAAC GTTTGCTCTT TGATrI'TCAT CAAAATGGCG AGTCCTACGA CCG'IrTITCT CCTCTAACTG AAAATCAAAT TCAAACCACG AGTTTCATCT ACAACCTCAA CTCAAAGCAG TGCTT'rGAGC
TCAGCATCGC
AACCATGTTT
AACCTGCGGC
TGAGTATTAG TCGTCACAAT CCCATTCCCT ACAAGACTAC CGCTCCTAAT CTCTGGCTGG 997 GAAAGATAAC TGCTAGAAAT GCCACCAA CTGCACCACC GATATGGCCT GCTAGGCTGA TTCCTGGAAT CAGAACACTT CCAATAATGT TAACCACAAA AAGTGTCAGA TAGGATTGCC CTAGCTGTTG GATATAAGGA TTGCGAGTTG CATAGCGAAG AACAATAATC GCGGCAAATA GCCCATAAAG AGACGTAGAG GCCCCTGCTG CTAAGGA'rTT AGGACTAAAT ACAAAAACAA AGAGATTGCC CATCATTCCT GATAAAAGAT AGAGAAAGAA AAACTGCTTA GAACCGAAAA TCTCCTCTAC CTGCCTTCCA AGATAATAAA GTGAAAGCAT ArrAACAATG APA=T~CCC ACCCA6ATATG AACAAAAATG GCAGACAAGA GACGCCAAAC CTGCTCGGGA AAGAGCAA TAGCTGGCCC ATACATGGCT CCAAATCGAA ATAATGTATC TGCCCTGTCA AAGTTTCCGC CTGCAGTGAC CAACATTAGT AAAAATACCA AGGCCGTCAC TAAGAGGAAG AAAC-TCGTCA CAGGGTAACG TCTATCAAAG ATTTCCTTCA TCAATTAATA CCTCCTGAAC AGGAATATCA TGGTTTTCAG G'TATAAAGTC C'rGAATTTGA CAAGGATATA TCGTACTCAA AGTACGACCA GAAAAATGTT CCAGATAGCG GTCATAATAG CCTCCACCGT ATCCTATCCG ATATCCTTTC GTCGTAAAAG CCAGACCAGG A6ACATGAATC AAATCAATCT GAGATGCATC CACCACTTCC AAATCTCCCT GTAGCTCCAG TAAGGCAAAG AAAGTT'TTTA CCAACTGTTG CCGATCATAG
ACCACAAAGT
TTCAGCGCCT
GCGATGACCT
TCTATAGCCT
AATTCCGATT
CCGCAGCCAC
CGTAGGGACT
CAAAGTCCTC
CAAAGAACTC
CTTGTTTGAG
TTTTTACCCT
GTAAAAAAGC
TCACCACTGC
CAAAGTAAC'r CTrTCCTTT
CCATGCGCCCC
GCTCAATCAG
TGGCTTCTTG
GT=NGCTC
TCATAGACAA
CCCAATAGCTr
ATCGATACCT
GCCTGTCATA
CATCAGTTCA
TTCCACTTCG
CTTTGCACC
CTTGGGATAA GTTTTGGGTA TTCCTGCGTT TGAAACTCAT ATAAALAGGGG TGTTGTAAA TTGAGATATA GCCTTCATTT GCCCTCTATT CTGCTGCCTT AAGACTTCTT CCTTAGGACT AGCCAAAACA TCACGCCATC GCAGGTTCGA TATCAATCAA CGCGCCAAGG CTGGATTGT'r ACTTCCATAT CAAAGGCAGC AAGAGACTCA TGTCCTGTGT TTAAAACCTT CTTGCCGTCC GAGAAAAAGA GAGGTAGGTT GCCGCTCGGT 'rAAACTTGG CATGCAAGAC TTGCT'rGCGT CTTTTTCAGG AAACTAGACA CAT'rTGAGGG TGATGAAGAG AACCTTTGAA AGGAGATAAC CTCGATTCCG TCTTTTTCGT CTCAACAGGT AGGTATCCAC TGCAACCCCT TCTGCAACTG CAAGGCACGA ATAGTTCCAT AGCTTGAAAA ACGCCGAACG AACTGACTGC ACTTGGGTCA AGGAAAAGCT GCGTGGCCAC AAA2Ar.TrTA TrACTATTAG 9180 9240 9300 9360' 9420 9480 9540 9600 9660 9720 9180 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 TGTGTCTGTG ATGACATTGT TGGTGGTTCC TCCCTCGATT GGGTTGACAT TGCGGCTAAC AGCCGCCACC AAGGCGTCAT TG;GCTTICATG GAAACGGATC TTCACCTCGC AAG'rrCCTGC TCGCAATCTG GCCGACTTTC ACCAA'rCTCC AAAAGCACCG 998 AAATCTGGAC GAACATGGAG ACCATAGAAT TGATCTGGCA TCCTCAT1ACA TGAGCATACC ACCAGCTTCA TTrTCTTCAG
CAGGCTGAAA
AGCCTAAGGC
GAGAAGCAAA
AACCAATGGT
TACGAATTTG
CCTGAGTCTT
GAATCAAATC
CCGCCTCT
TAGAAAGAGC AGATTATTCT TGGGTTGCTC AATGGTCATA TGAAAATCAT GGACACAGGC AGGTAGACCT GTTTGTrCGA CGATAGGCAG TCGCTCCGGC TGACTTCCCT GCAGG'rAGAC AACAAAATCC TTGCCCGTAG TCAATTrCTC GAACTCCTCC AAGCCAATCT CTGGAATCTG TAACATCTAT CTGTCCTCCG ATATAGCAGA TTTACTTTTA CAATTACAAG GTACGAAGCG TGTTGAGTT'r GGGCATCAAT TTrTCTGGGA CATCTTGGGT TGGACTCCTT CGATAACCAC GGI-rCAGCAC TAGCTGGCTC TTTTTTCCAA CGATGGCACG CCGATTTCAG CACCGATATT TCCACCTGGT CACGGATAAT AGCAAAGGAA CTGCAGAATT AAACCTTCAA GAAGCGGAGC ACAACAGAGC TAGGCACAGC T'rCTTTTCAG CATTGGCGAT
ATAGGTG
TTCTTTGATA ATACGAGCTG AACAATAGCT CCTGCTGCGA TGCATTAGCA CCGATAAGAA AATCACACCT GCCAAAACTG GCCACCAAGG ATGGCACCCA GATAACAGAT CCCATCATGA CGCACCTGGC TCGATACGAG ACGAGCATCT TGCTCGACAA CACATCCTTC CAGTCTCCGA AGTTGCGAGT TGCCCCTCAA AAATTGGATA ATTTCTTGAG CTCAAGGGCG CGC'rCAAGAC ATGCATGCGA CCTrGGTGTr' GCCATCAATA TCTGTCCGCC CAAAATCCCT GTCCGCCAAG AATCACATCC AGCAAATAAG GTGTAAATCT CGTCTAGTCT AAGAGGCTGG AAAAAGGGTT CATCCTCTAG CGCTGTTrTT GAACACCTGC TACTACCACG CAACTGAACC ACTACCGATT CATTGTCTCC GACACGGACT CACCTGCACC AACGTGGCrA TGTCAATCAT GGTTCCAGCA TAACAGCATT GTCACCAATT CGTTGATAGC ACGCTTATCT CATAATCTTG ATTTTCTACC ATAGGACATr TCCTAGTTTG AGGTTACTT-r GACACTGGTT CGTTCATTTT TGTAGCAGTC 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12127 INFORMATION FOR SEQ ID NO: 149: SEQUENCE CHARACTERISTICS: LENGTH: 12566 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 149: CCATCCTTCT GTTGATGTGA CAGGAATGAT GATA.AATCAA. CCAGTAGCTA GTCGCGAAGA GGTGACAGAG G.CTTTGAGTC ACT'rGGCGGT AGAG.CACAAT AGTCTCATTG CTCGTCGAAT 999 CGTTGAGCCA AATGAAGCTG GCTTCCAGAA GGTCTGACCA GTCTTACT'rG ATTGTATCAG GCTTGGTTAT CAAGGCTTG GACGGCCACC CCTATGGTGC GACCCTGATT TATCGGATCA
GAGAAACACC
TTTCCTCCAA
GAAGTTTG4GA
TTTCGAATCG
TAC'rGAGTTT
AATCCCTTCG
CTTTACCTAT
GGAGAGTCCA
TGGAGTGAGC
AGAAGATCCA
AGCTATTTT
TCAGGCACGGG
GCCACTTATC
GAAACGAGTG
TPACAGACCA
TTTCGATAG
CTGCTGACCT
ATTCCTAA
GTGAGGGAAA
ATTTATTAGG
CCTTGAAAGA
TCTTACTATT
TTATGAGTCT
TAGCTGGTGA
GAGCTTGTTT GGAG'N'GCTC TCAGACCAGT GTTAGAAGAT GTGAGACAGC TTATC'rGCTC AGTGCTGGTA TCCAGTCT?1' TCGGATTGGG GATTCTCTGG TATCAAGGTG CCI'rGTTTAT
GGCAACGGTG
TTCTACCTITA
ATTGAAAGGG
AGCTGTATTG
GGAAATGGAG
CAACTGGTCA
CTAAGTGTCG
AAACTCCCTC
GTGGTCGGAT
AGAGCTAGCA
TCATTGCTCT
TCTATCTACT
TCAAACGTAT
CGAGTGCGAC
ATAAATGGAG
CCGATGAAGA
TAGCCAATAC
TCTACTTTAT
TGGTTTACAG
GATGACATTG
AGCTCTCCTA
CCAGTCCTCA
AGGALACGCGT
AGACTCTITTT
GGATTGACCT
GAAAATAGTC
ATGATGGTGG
CCCCACTACC
GACCGTTACC
AAGGATAATC
TATATTATGA
TGGCAGGGAT
TGGTGGATCT
GGCAACTCTT
GTGAAATGCA
GTC'rATCCTT
GTGAGTGGCA
GCAATGTTGA
TGGTTGGTCT AGTGCATTTG GACATTTACT GAAGAACGGT 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 CAATTTC'rCA GATGGAGCAG AAGTGGACCT AGATGGCAAT CGTCTCAGTG ACTACACACC GTCAGGGAAT G~rATCTATC TCTCACCGCG C'rATCTGATA GAAGAAAAGA TTACCGT'rTC TTCAGAGTTT ATGGACAAGA TGCAAAACTT GTCTGAGGGA GAGTTTGGGC TGATCTTGCC TGAGAGCTTG CGAGAGCAGT CTGTCTACTA CCAAGGATTG TTTACAGATT ACCTGCAAAA CTTTT'CATCT GAAAGTGTAG AAGTGACGAG TCAGAAACAC AGCTTTTACA GAAACAGGAC AGGAACGTTT CCTrCTATAAT CCAGTACCTA AAAGATCCGA TTATTGTAGT TCTAACGCCG TGTTGCAGGG ATGTTGTGGG GAACTACGGC TAATAGTGCC AGACAGCATC ACAGCTCTAA AAGAGAAAGG TCTGTATCAC AAGCCAGCTA TTTTTTGCCA AGGTACTAAA TGACAAACGG TATTGGCACG ATT1"rGACCC TGTCTACGGC TATCTTGTTA CTATTTTGAG CAGT'rCAGAC GGGAACTTAT GATTAAACGT TGAGCTTCAT GGCAAGTATT TACTGGCGCA AGGAGGAGTT ATCTAGTATT TTGACAAGAG ATGGTTTGAT TAGCGCTCTA TACCTCCCAC AGGTAAGGCT GATGGGTACA AGACAACACG CAAGCGACT~G GAACAAGACC TTGAAACTAG ATCGATATGG
AAGGTTTCTT
GTGGAGTTTT
TTTGATTCCA
CTTGCTGGTA
CTCrTCCTTG
GTT'GTAGCTT
ACTTGGTAAA
ACTCTCTCCT
TGAATCTTCT
TGACAATCTA
GCCTAGTCCT
TGTTTACGCT
1000 TAACGCCCTC TTGATTr'rAG TAAGGCAGGA CAAAAAAGAA GAAGCTGGTA GCATGGCAGT ATTGAAAGGA AAATAAGATG ATT'GATATTC AAGGATTGGA AAAGAAA1Tr AATGACCGCG CA'TTCTC TGGTTTGAAT CTCAAGCTGG AGAAGGGCAA GTTTATGCC TTAATCGGAA AGAGTGGAAG CGGAAAGACG ACGCTGCTGA ATATCTTGGG AAAGCTrAGAA AAGATAGATG GTGGAAGGGT TCTCTATCAG GGGAAAGATT' TAAAAACCAT TCCCACTCGT GAGTATTT'rc GAGACCAGAT GGGC'rATCTC TTTCAAAATT TCGGCCTC'rT AGAAAACCAA TCAATCAAAG AA.AATTTGGA TTTGGGTT'T- GTTGGTCAGA AAATCTCAAA AAGTGGGGGC TTTAGAAAAA GTTAATCI'AG GGTATTTGGA CTTTATCTGG GGGAGAGGCC CAACGAGTTG CCTTGATTTT GGCAGATGAA CCAACAGCAG
CCCTTGCTAA
CTCTTGATCC
a TGAATCTCTT GGTGGA'IrTG CCCTAGTCTG GAATAAGGCT AAAATCCGTA TTCGCAGGGT CTCATCATGG AAAATATGAC TTGCTTGATT TTGCTAG'rAT CCTCAGACGG TTGAACTGAC GAAAATCGAA GT'rA'CAAGA GGCCACT'rTT AAAGAN'TCC TGATTT'rCAT 'rCCCAAAATA AAGAGAATAG ATGAGAAT'rG TTTTTACCTT'r AAAAAGTCCT AATTGTGGAT ACGGCTGAAA TGGTGCCTTG ACAGAATTTT CCACCGTT'rG GTGCCAATTI' TAACGAAGAT ATTCTCAAGG CCTGCCAA'rC AAGGTAGTGG AAAGATGAAA ATCGAATTAT GATGAAATCA TTGATATGAG ATCTGATTAT CCTAGTGCCA CAATCATC'rC CTTTTGGTTC TGAAGGGCAA AGGCAAAAGC AGTGGATGAT GTGGAGGAGG AGCTGATTTT TTTGAACGCA AAGACTATCT TTCTTCATGA CAAGGGGAAT GTGTTACAA'r CAGATTATAG CGTGACCAAG TTGGGCAAAA TTTTTTGACG TTGATGATCA GGTCAATGTC TGGCTGAGCG TGCAGCCCAA TGGCAGATAC CCTGCGTGAT AGTAGAACCT TTGGAAAGGC TTTAGAACAA AAAATCTATA GACTATTTTG AAAAATCCAC TGAAAATTCA GAGGAGGTTA CATCATTGCG ACCCATAATC GAAACTTGCT CATGTGTGAA GAGGTGGTT-T AGAAGATATC AAATCCGAGT GCATGGCTAT ATTATCGTTT GAAAAATTTA ATGTGGAVTT CACCCTACCT TGTTTCGAGA GAACTGCTAA GGAAAGATAG TTTTTTGGTA AGTAGTAACA GATAATAGAA GCAGTGCTGG AGCGTCACGG GATACCAATA TCCTTCAAAA ATCGAAATCG GGCCAGGTAT GTCATGGCTT TTGAGATTGA TTGATAATG TGACCGTAGT 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 TTGATTTGGC GCAACATATC CAGAATTTTA AAAATCCTGA CTAATTTGCC TTACTACATC ACGACGCCTA TTCTCATGCA CTTGAT'rGAG
GGACCGCATT
GTATTACATG
AAATGTGGAT
AGTGGCATTC CTTTTTGTGA GTTTGTGGTC ATGATGCAGA AAGAAGTAGC TCAGCCCAGC CTAACACCAA GGCTTACGGT AGCr'rGTCTA TCGCCGTGCA ACAGCCAAGG TTGCCTTTAT CGTGCCTCGT ACGGTCTTTG TGCCAGCCCC TCAGCCATCT TGAAAATGGT GCGTCGTCCA GAGCCAGCCG TAGCAGTAGA ACATGAGAAC rT=rCTTTA AGGTTTCCAA GGCTAGTTT ACCCATCCCC GTGGAATAAC TTGACAGGTT ACr'rTGGTAA GACTGAAGAG GTCAAGGACA GGC ?IGGAC CACGCAGGCT ATTTGCCGGT CTAGCAGACG TTAAAGCCTT GGCAGGTTTC CGCGTGGGAA TTTCCGTAAA CTGCCCAGGA AAAT'rCAGAA TGTCACCAAG TGTGCGTGGG GAACCTCI'CA CACT'rAAAGG GCAAGGACTC TAAGATGCAG TACTATGTGG AGAGTGATGG CCAGGTTI'MT AAAGGCCATA CCCCTI'ATGT TIGGGGACTGG GGCTATATCC TCAAAATTCA TTCGTCCGCC TA'N'GTCAAT ATCGATCAAG ATTTTAACAG CAATTTGCTG GATCGTI'TCT
CTGTAGTAAT
TGGTTCT'TT
TGGAAGATAG
TTGTGACCAG
CGAACGGAAA
CATGTCCGTC
GGAGCACAAG
GGGAGAACTG
TAAAGAGGAA
TGTTGGGAAG
AATTTCAGAC
GCAAGACCTT
AGCTGACCAA
GCTTrGGCAGA
GGACAAATCA
CAAACACGCG
GTAGATTTCT
AACAGTCTGG
AAGGAACCTG
GGCATCCATC
GATTTTTACC
CTCCTGTCTT
TCAACTCTTC
AGTCTAGGTC
CCATTGTCTA TATTTCCAAA AGCAGACCTA 'rGGTGACATC TGTTAACAGG CAAGGTTACG TCAATAAAAT CGCACCAGAC GCGGTCGCCA TACCACTCGA ATACACCAGG ATTTTCATCC CTTTCCCAGA GATTGCTACT ATGAGCCGTC TTGTGCCGTC TTGACAATTA CCTGCAATTC ATGrGATTTGT
GGCTATGACT
GTCTTTATGG GGCAGACAGG CTCAATCTTG AAACGGGAGA GCTGTTAGTT TTTACAATCT TTGGACTATG AAGTATCAAG GTrAGCCGAG ATTGTAAGTT CAACGGGGGT AAAATCGCA(G GGCTGAAGAC CTCAATCAGG CCGTACTTGT ACCCATACCC 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4580 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400
AAACCAGCTG
C'TTAGTGAAA
TCAGCAAAAA AATTCCAAAA TAAGGAGA.AA ATTCTGGCAG CAGATTATGC CAACTrTTGAA GCAGAATATG CCCATATCGA TATCATGGAC GCAGGTGTGG TCGAGAGCCT TCGTCCTCAT GTGTCAAACC CTGAGCATCA TCTGGAAGAT ATCCATGTAG AAGCAACGCC TCATATTCAT TTGAAGAGGG TGTTATTGCA ACCTTCCGTT TTGAAAATCG TAGAGAAACC TATAAAAAAG CCTATGTCT.C AATACAAGAT TGCTCCGTCA CGTGAAATCA AACGTCTAGA AGCAACTGGG AGTCATTTTTG TACCGCAAAT CAGTTTTGGT AGTAAGATGG TTTTCGATTG CCACTTGATG TTTGCGCGTG CAGGTGCAGA CATCATCAGT GGCGCCCTCC AAAAAAT'rCG TTCACTCGGA GTAAGCCTT CAGTCGT'rAT CAATCCTGGC ACATCAGTTG AAGCCATCAA GCACGTCCTT CATCTAGT'rG ACCAAGTTTT AGTCATGACG GTTAATCCAG GI-rTTGGTGG GCAAGCCTTT CTCCAGAAA CCATGGATAA GGTCCGCAG TTGGTTGCTC TTCGTGAGGA AAAAGGTTTG AACTTTGAAA TCGAAGTGGA TGGTGGGATTI GATGACCAAA CTATTGCTCA ACCCAAAGAA CCGGTGCGA CTGTTTr'rGT AGCAGGTTCC TATGTCTTTA AGGGAGAAG'r CAATGAGCGA GTACAAACTC TCAGAAAACA 1002 ACTGGACTAG GGTTGCAGTT TCATTATCGG ACAGATTTTG ATGCTTTTGT TGGGGTGGAT GGAAGAAGAC TT-ACCTCTrTG CTCTAGCAGT CGGAGAT 1'T GCGACAGGTG ATTCAAAAAG GTGCCCAGTA TTTTGTCCAA TACAGATCTG GAATTGGCTC TCTTAACCAT CTTTGAACAA TATTTTCGGT GCCTTGGGTG GCCGTATTGA CCATATGTTG CAATCCTAAG TTGGCACCCT ATA'rGCATCA AATAGAAATT TACTTATTGT CCAGAAGGAA TCAGTCAGCT AGAACCTCGT
TTTGCAGGCG
CGAGGCTCGC
GATTCTGTGA
GCACGACCAG
AATCCTCAGG
GCCAATGTCT
GACGATGGGC
TCAGACTACG
GAAACCGCGG
TCTGGGTCTT
CGGAAGAAGA
AAAAGGATGA
CTCAGGTCAC
TTCTGCCTAG
AAAACr'rGAT
ACTATCTAGC
CTTTATGCCA GTTCGGGATA GCCAGCTGAC TATTCTTGGA GCCALAGTATG AGTTGACAGA GGAAAATTNT 'N'CTTAAAA AAGTGTACGC TTCTAACGAA 'rATATAGATA GGGAAGTGTC GGTAACTTGC CCAGATGGT'r ATGTGGTCGT ACTGCATAGC AAGGACAGGA GGTAGGATGG AAAGTTTACT TATTCTATTA TI'AATTGCCA ATCTAGCTGG TCTCTTTCTG ATTTGGCAAA GGCAGGATAG GCAGGAGAAA CACTTAAGTA AGAGCTTGGA GGATCAGGCA GATCATTTGT a.
a a a a a a .9.9 CAGACCAGTT GGATTACCGC TTTGACCAAG ATTTGGAAGT GGTTGTCAGC GACCGTITrGC TGACCCAAGT CCGTCAAGAA A'rGACAGATA AACGTCTCCA AGCC'N'GCAG GAATCAAATG TCGAGGAAAA ACTAGAAAAG ACCTTGCAGA CTAAACAACT GGAGTCTGTC AATCGTGGCC TCGGAGCTCT TAACAAGGTT CTCTCTGGAA AACTGGGGCA AAT'rAT'rGAA GACATCATGA CGGTTGAAAA CTCTAGTGAA CGAGTGGAGT AAGAATACGT CTATCTGCCA ATTGACTCTA AAGAAGCC'rA TGAGACAGGT GACAAGGATG CAAGCGTCAA GCGCTTTGCT AGGGA'rATrA CCAATTTTGGC AGTNTTGTTT GTTCCGACAG CCAGACAAGC CAGCCAGTTA GACCAAAAAG AAGAAGTGCG GAT'rGAATrG CACCAAGGTC ATCTCCTCCA AACTAGAGAC AAGACAGACC AGCAACGTTT GGAACAAATG CGCCAGACGG CACGCTTACA GGCTTCCrI-r GAGACAGTTT 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 TTGGAGAAAT GCAGACAGTT GCCCGTGATG CCAAGACGCG AGGGATTCTG GGAGAATTGC CACCTGCCCA GTACGAACGA GAATACGCAA ATGCCATCAA GTTACCCGGA CAAGGCGACC AGTTTCCACT GGCAGATTAT TACCGCTTC;G 9.a.
AGATTGAACG
GGAACAAGTA
AAGGTCTCTA
CTGTCGTAAG TCACTCCTAG CATAGCACCA CCTCGGACGA CTCAGAAATC GTCCGCAATC CGGTCTTCTT TGATGA'rTG TATCAGCCCT TCTTAACTCC CCGACCATAT CAGCAAGACT AGACGGGAAG AACAGATTAT TGTTGCAGGA CTATCAGTTG GTTTCAAGAC CCTTAATATC
CCAAGTACCC
CAAAAGAGTG
C'TGCCAGTG TCAAGACCGA GTTTGGCAAG TTTGGTGGTA TTCTGGTCAA GGCACAAAAA CATCTCCAAC ATGCCTCTGG CAATATTGAT GAATTATTAA 1003 ACCGTCGTAC CATAGCTATC GAGCGGACGC TCCGTCACAT TGAGTTGTCA GAAGGTGAGC CTGCGCTTGA TCTACTCCAT TTTCAAGAAA ATGAGGAAGA ATATGAAGAT TAGTCACATG AAWAAGATG AGTTATTTGA AGGCTTTTAC CTAATCAAAT CCAGCTGGGA AAAACTACCT AGCCTTTACC TTCCAAGATG AAGCTCTGGG ATGCCCAACC TCATAACATT' GAGGCCTTTA ATGAAACGAC GCCGAGAAGT 'IrATAACAAT ACCCCTCAAG CTCCTCAAG CTGGTGAACC CAATGACCCA GCTGATTTCA GTCAAGGAAA TTCGTGACTA CATGTCGCAA ATGATTTTCA CAACGGATTG TCCGAAA'rCT CTACACCAAG TATGATAAGG GCCAAGACCA ACCACCATGC CTTTGAAACG GGCTTGGCCT CGTTTGGCAG ACGCTATTAG CGAAGTTTAT CCTCAGCTCA GGGATTATGT TGCATGACTT AGCTAAGGTC ATCGAGTTGA TACAC:AGTGC GAGGTAATCT TCTTGGACAT ATCGCTCTCA ACAGTTATGG AACTCGGCAT CGATGATACC AAGGAAGAAG ATCCTCAGTC ACCACGGCTT GCTTGAGTAT GGAAGCCCAG GCAGAGATTA TCCATATGAT TGACAATCTG GATGCAAGCA CAGCTGACCT GAGGCAA.ACT ATAGTGGCGA GATTGATGGG CCGCAGGTAA GGTTGTCCAC 'rCAATCAAAT TACTCTCCGC AGGTCAAGTC ACCAGTTGAT AAATTGAAAA TCCTGTCTGG AXTTCTACTC CTATCCAGCT ATCATACGGC GACCATGGTG ATAAGACCCT GCTCTATGCG CGGGGCCAGA CCAGACAGAG TTGATAGCGA AATTACCAAG TCGTTTTCCT TCGTCATGTC TCCGTCCACG CATTATGGA.A TGATGATGAT GTCAACACCT 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 0 CTTGCT'rTGG TGGATAAAGG AGAGATGACC TT~CTATAAAC CAGATTTAGA TTAATAATTT TGTTCGTTTT T'rTATGTGAA TATGGTATAA AAAATCTCT'r CAAACTAGGG TAGTATCGCC AGGGTTTGTC ACTTCTATTG ACAATCTCAA TTCTAGTTTG CTTTTTGAr-1 TTTTGAATAA GAAGAAGTGA"TCGGATGCTT GTCATTTCCA CTAGTCTCAA TACT'rTrGCT GAAAAGTATG AATAAAATCT TCG;CTATGGA TAATCGTTCC AAGAAAAATG AGCATTTT'rT AGGATAAGAA TAAGTAAAAG ACAAAAATGA ATACTCTTCG TTGTCGTATG TATATATGCA GGTATATTAC AACAGTG'IrT TGAACCACCA GCGACCAGCT AAATGGAATA GGAAATAGAA ATGAAATTAA ACTATTTGAT TA.ATAATCCT TATAAACTAA AGTCTGCTA-A ATCATCCATC TCAGAAGATA TCGTCATTA'r CAAACGCGCC GGGCTGGCGG AGGTGTCATC TTGAAGACTT GCGTACCAAG
ATCTGTCTGA TTTGCTTAGC
AAAGCTTTAT GGACCAAAAA TTTGAGGAAA TTGAAATCGG. TCATATCCAG ACAGTGACTG TTCACACCGT CTAT 'TCGAG TCAGGATGCT AAGGAAATGG TTGTCAGAAA GTGACCGTAT CTTGCCAGGT GGTTATATCT ACACCACCCA TCTTGAAAA.A TATTGGTCGT AT'TATTGCCA ATTGACGCGG TTATGACCGT AGCA.ACTAAG GGTGTGCCAC 1004 TTGCAAATGC AGT'rGCCAA'r GTCCTCAATG TC'rCTTTTGT CATTGTGCGC CGTGACCTGA AAATTACCGA AGTCAACT G?1'AGCGTCA ACTATGTTTC AGGrrCAAGT GGTGACCGTA 9000 9060 TCGAGAAAAT GTTCCTTr-cA AAACGTAGTC ATGACTTCTT GAAAGGTGGC GGAACGGTCA AC'rCAGAACT GGCAGGTGTA GCGGTCT'TTO AGTTTGACTA CAAGTCACTC 7TGA.AGGTAA ATGTTGAGGT TGGCAA'rATC TTTGACGAAG CGATTGTCCC AGCCTTTCTT TGCAAACAGA TCAATAGAGA AGAGTTAGAA GCGAT2'G?'G ATGAGAAGGG GATTCGTGAG AAGGCAAGAG TTAAGGCAGC CACCCGTG'rC TTGATTGrGG ATGGTATGAT 'rAGTCTCrrG CGCGAGTTCG CGGACAA'rGC CCAAGAAGAA CGTGAAAAGC CCAATATTGA TGTCA.AGAAC CAAGCCATCG ATAAATAAGA GATAGAACTA AAGGTTGGAA ATAGAACGAA GCTTATGAAA ACACCA'rTTA CCGAGTTCCC GACTCCCTTT CACTTGTATG CCGTCAACCA AGCTI'T'CG TGGAACAAGG GCTTTAAGGA ATAI'Tr1GCA GTTAAGGCTA CTCCAACTCC AAGAAGAAGG TTGTGGTGTG GACTGCTCTA GTTATGTAGA TGGAC'rTTCT GGGTTCTGAG ATTATGTTCT CT'rCCAACA.A CCTATGCACG TGAATTGGGT TGGAGAGAGT AGCACCATT GCGACCATTA ACT'rGGATGC CCAGAAATCA TCTCTTGTCG T'rGAACTGGG GACAGACATT ATGGACAA'rC ACCAGCTCTr TGAAGCCTTT GCTATCT'rGA ACTCCTTCCT AGCGTCCAAT ACCGTGACCC TCTTTGAACT GGCTGTTGAA ATCAAGGAAA TTTCTGGCGG TA'rTGGTGTT AATTATCATC TTGGTGAGC AGTTCGTAAG GTGTATGAAG TCAAGATTTr CACCCAATTG GGTCGTTTTA
CTGCGGAGGC
AGGAAAAAGG
ATCTCTATTA
AGTTGGGCAT
CAGACCAGGA
AGGTTCTTAC
TGCTGGCACC
AGCTAT'rrG AAAA'rCTCC GCT'rTTGATG AGCCATAAAC CACGCCAGAC AAGGAATACG CTT'rGAAGAT ATTGAACATC TTATAATCCT GGAGGCGTTT TAAGTTTGGC ATGACCAAGG AGCCAAGACT TTrTGGGAT'rC TCCAGAGTTG GCTCGTCAGC TTCGCTAGAC TTTATCAATC GCCGAACGAT ATCGCCTTGA GTCAGCAGGT CTTGGTCAGG TCACGGTGCT CTAGTCACAA 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 GAGTCACTCA TAAGAAGGAA ACCTACCGTA CCTATCTAGG TGTGGATGCC TCAGCAGTCA ACCTCATGCG TCCAGCTATG TACGGAGCTT ACCATCATAT TAGCAACGTG ACCCA'rCCAG ATGGACCAGC TGAAGTGGTA GATGTGG'rCG GTTCACTCTG TIGAAAACAAT GATAAATTTG CAGTTAATCG CGAACTGCCT CATACAGAAA TCGGTGATTr' GCTGGTCATT CATGATACAG GTGCCCACGG ATTTTCAATG GGCTACCAGT ATAATGCCAA ATTACCTTCT GCGGAAATCC TCTATACCGA AGAAGGTAAA GCCCGTCAAA 'rCCGCCGTGC AGAGCGCCCT GAGGACTATT TrTGCAACCTT ATATGGCTTC GATTTTGAAG AATAATCTGA TA.ATAGATTG A.AAATGAAAT TGAAAAACAG AITrGCTTTCT AAAAAATAGG CAAAAATCTT GTTTTTCCTT CAAGTCGTGA 1005 TATAATAAA-A CTATAAAACG TTTI'CAAGGA AGGTAACCAT TTATGGACAA GTGACAGGAA 'rGGTGCATTC GACAGAAAGC TGGTATTCGC TTTATTGTCT T=GCAGGG CTGTCACATG CCCAGACACT TGGGCTATGG AGTCCAATAA GTCACGTGAA GACAGAGGCC TTGCGCTACC GTGGTTTCTG GGGAAATAAG
ATGTCTCAAG
T?1'GGGTCAG
CGTTGCCAGT
AAACAATTGA
TAGATGGGCC
ATTGCCACAA
AGGAGAAGCT CTCTTGCAGA TT-GAT'rTCCT AGGAATCCAC TGTACCTTGG ACACCTGTGC TGAGAAGTTT GACAAACTCA TGGCTGTCAC CAACGAAGAA CAGCACAAGA TTGTCACTAG CCAGTATCTA TCAGATATTG GAAAACCTGT
GATI'GCTCTC
TCTTCCTTTC
TGACTTGGTT
CCAAACCA.AT
CTGGATTCGC
CGGACGGTAG ATGATGTrCTT GGTGGGATTA CAGTCAGTGG TNCACCAAGG CTALAGGAACA CGTAATAAAC CACGTTACCT CTTTTGGATA TCAAGGAAAT AAAAATATCT TGGCTTGTGC CACGTGCTAG TTCCAGGATT GACAGACAGA GATGATGACT TGATAAGTT GAAAT'rCTAC
TGATTGAACT
CTTATCACAC
TGGTAAGTTC CTCAAGACCC CATGGGTGAG TTCAAG'rGGC AATTCCATAT TCCCTCGAAG GAGTCAAACC ACCAACAGCA GATCGCGTCA ACAACTCATG GATACCGAALA GTTATCAAGA TTATATGAAA CGTGTACATG
TCAAAAATGT
GTGAACTTGG
AGAACC'rAA
GATAGAAAAG
CAGCTAAGCC
TTrI'CT'rGCT
TAGAGAA.AGC
GTTTTGGACG
AAGCCTGATG GAAACATCGG TTTTTCTTCT TATCTCGAAC TGTGATAGCA GTTGGTTGTT ACTTGCTTTT GGTGGG'rrCT AACGATATAG TTGACGATAA AAGTGTGACA CTAGCTGGTG AATCATTTTC TTGATTTGTT CGATTGAAGA TAGTAAAGGA AGGAATGCdT AGGTAGCTTG AGCAGAGAAG AAAGGCATTT GANTACCGTCC TCTTTTTGAG AGTAGTCGC'r TCTTTGAGAC CTTTTCAGAG TGAAGCGTTG GCGTACGATA ATGGTTACGA GCTTTTGACT TGCAAAAAGA GTTGIr'rTCC AGCGTTGCGA CAGGGGTAAC GTCTTTTCGT TGGCTAGTTC 'rrCACGGACT
CTTAGCAAAT
TTTTGTGTT
CCACTTGGTT
TTTTTGCGAA
ACTGTTGGAG AA'CATCATG AAACCACCGA CAACCCAGTA AGAAGAGGGA GAAGACGACG ATCATGAGTG GGCTCATGTA 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 CTCTT'rGCAT TTCATCTTCT CACCAGCACA GGCA.ACCAAA
CTTGAGCAAC
GAAGGAGGAT
CAGCAAAGAG
GCGTTTGGTG
CCCATGA
CCCTTCAGTA
AGGGAAACAT
AGCTTGTTGG
TGGCTCAAGG
TTGGTAGATA
AC'?CCGTGAA
ATCATACTTG
TGTTGGGCAG
CCTACACCGC
GCTTCGAGTT
GTGAAAGGAG
GAGAACCTAG
CAAAG'rAGAT
CAALACATGCT
T'rTCT'rCTTG ACGTGCTTGA GGGCGTTCAT CCAAGTGGTA AGATAATCAA TAATGATAGC GACACCAAAG CCTAGACCTT TATCAGTAC CAAGTACTTG ATGGCTTCAG CCATAGGCGC TCCCATCG'rA TTCCAAATAA ATCCTGTTGG 1006 CTGACCTGTG GTTTI'ATCGA CAT'rGACACA GCCAGTCAAG ACAAGCAACA TAGCCACTCC CATAGCCGAG AGTGCAAAAr CGGGGT INFORMATION FOR SEQ ID NO: 150: Wi SEQUENCE CHARACTERISTICS: LENGTH: 5238 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 150: 12540 12566 TGACACTCTG TAGGATTGTC TAAAAACGAC ATAGAAAGAT TGATTATGAG TTACCTCTGC GTTTTCGCAT AAAAACCACA CCATTCCCAA AGCTATTATA GCACCGTAAT CAGCATTCGG TCTTTTTGTC CACCATTCAT GTAAAAAAGC AGTCCTTCCG TAAACAGAAC CGACAATATC T'rTGATGCTT TTTATATTGG ATTTTAAAAA TTTCAT~TT TGAAAATTGA TTGATTTTAA GTTAATTGAT TGCTCGTACT CTCTACAATA ACCACCAAAG AGCATCAGCT GTAGCCATAG CGCCTTTGAC ACCTTCTGGA AGAAAGACTC GTAAGTCCTC TAGATGATGG CCATATACCA GTCCATGATC C.AAGCACATG GAGAAATACG CATAGCTGAT AGGCTCACGG TTATCGCTGT TTAGCCATGC ATTAAACCGA ATACA'PTCTG CCATATTTCT TCATCGCGTC AATGAAGTCA AATTGCTTCT GCAACAGCAC AGGTCATAAC CGTGTCA'rCT AAATAAAGGA AAGTCCTrTG TTTTGATATT GTTCCATTCG TCCAATAATT GCTCCAAGCA TCAGATTCCT CCTTGTTCAT TTATCTACCA TATTTATTTT AGA.AAATAAC ATCCTGTTGG TTCAAAATAG GGTTTTACCA TTTCTTTCCA CCTAGCTCTA AGGAGATAGG CCATAATTTC CCAATGCATA ACCATCATr'r ACTTCAACAA CAAGTGTTCT GCCATCGCGA GTAACACCGA TATCTAGTCC ATAAGCTATT GGCGCATCTT TCCAACATGA TATCGCTTCA TCAATTACAC T'rGCA'rCAAA T'rGTGCATGA TAATCACCTG TATAGGGTCG TCAGCTATGA AT'rCTACAAC CCTATrAAAT CATGGGTTCC GGCTTAATAA A'NTrTCCCCA TAAATCTTTC GACCATAAAA AAGCCCATCA TTTTTAGTAA CTTGTTAACT CTTTTATTTC TAGGCACCTA CTTGAGCATT AACATCTAAT ACGCGACCAT CTAACACAAA ACAACGCCAT CTCACTAATC CATATAGGAT AGTCGAAAGG TAGACCAATA ATTAACAACT CTTCCAGTAA AGACTTTTGA ACCAGCTTTA ATTATCAGGT ATATTCACAA TCTCTCCTAA AATACCAGCA CTCTTTAAGC TCAATAGGAT AGTCATGAAC CGGAACGTT TGCTCTAGTC TOCATTATAT AATCTACAAC TATATCTTCA AGAAAAAGAT TGATATAAAA TAACTTCTTC TCCTTGTAAG GTATTTATTA ATTGAAACCT CACTTGGTAA TTTACTTTGT CTAATATAAA CAACCATTTC ATAATAGCAA ?rI-rGCTCTT ATCAACGGTA ?I'TAGGAGTA 
ATCTCTAGCA TCTCTACTA CATTATT'CTA T'NTATCG 1007 ATCACTCCTA TATCACTAGT GTTACACCAA 'TTrGTAAAAA ATTT'rGA GTAAATAGCC CCCATAATAT CATCGAAATA ATTCAA'rAAC CTGGGACTtr GTTAGTCG.CA TrCCCCTTCT AATTTT-CAAG TTTCTCTAGA ?NIr"ATCAT CCAAGCTAAT TTGCCA=TT CATCACCTCA AGTTAATTCT ATCACAGG'rG TAACACTAGT GTCAACTGGC TTTTATIAATA CATTAGTTTA CACAGTAACT TTAAATCTTT GGTATTAAAA AATT'rTCACA TGTCTCAAAT CAGTTATCAA ATCTAGTATA AATTATGAGC TCTAAACAAG AAAAAGACTT ACACTCAAGG GTT'rTCTTCC TGACTCTTTT ACTAGCAAAC GTATATACTC ACAAGGAACT TCCAACTTCT TCTTTAACAT A'rCCTTCTAC ATCTTCAA'rC GTGACACA.AG AAATGCCAAA CTTCGATCCC TTTTTTTCTG TTCACTTCCG AAAAAGCTTC TGTCGATTTC ATATCCGCGG TTTACGATAG TTCGI-rTCTC TTGTTTCGAC ATAGGCTTTA ATATGCATCA ATTTTT'GAAT ATCCTTCGAT CACTCTATCA AATCTCTTTC CAAATAATGT TTACATTTTC CTCNGGATCG AACAATATTT CCGT'rAAAAA 'rAATTTCCAT ATAATCCGGT CTCCACT'rCA AAACCATCTT CTGTTTCCAG AGTGTATCCC TTTCCCATCT TCTATGGAAT CAAATGCTAC TAAATCTTTA TTCCAATATA ACCATCGATA ATCTCTCCAT TTTCALTTATC GTCACCTGAC CAATTCAGGC TCTCTGTATC ATCTCATCAT GTCTTATACC CAGAACACAC CTTATCGACC TTCGGTCTCA ATCTACTTT-T ACTTTGCTGA TGC'rTCAACT CGTACAAGCA TGCGTCAGTG GGACTCAAAA GGTTCGGGGA ACCTTTT-GAG TAAACTTACA CATTCAACTT GTTCATCATT GTCCAAACCT AATTGG'rAGC TTAAAAGTAA TGGATTTTAG CCATTGTCCG AACTTGAATT TCAGAAATCA AAGCTGAAA'r TAACTGCCTA TTTATAGAGC TTATCAAAAT AGATCAGAAC CTTATATATG AAAGTGGAGA GGATTTTTAA ATATTTATAG AAATAAAA'rC GGCTACTCTA ATACTTTCCC CCCCCTTCGT TATAACGTTT 'rTGGT-rGACT ATTGAATCTC TCTACAAACA TTGGGTCTAA TAAAGA.ATCG CTTCACCGTC CTTTCTAAGA AGTCTTTTGC ACTTCATGGT TGTTAACGAC TTTTTGAGGG ATAAATTTGA AACATAAAT'r TAGATAA.AGG ATGTTTTTAG GAT'rAAAATA GGGATTTGAG CTACAAAGGC GAATAATCAT 'rTTGGTACAA AGGCTAATGT AAATAAGCAC ATTTCCTACT TACTTTACGA CCTCGTCGCA TTGGCTGAAC GTGATACqGC CTCAGCGTGA GATTAACTAC G'rTTCTCTAA ATGTTGAGA'r 'TTCTTCTAT TTAGATTGTT TTTCTTCATA CGCTCTACAT CATTCATGAC TTATCTCCTG TAAGCTTTTC 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 AGCTTCAATA G'rCTCTTTCT TTGCTTTCGC 
ATCAATTAGT CA'rGATTCTA ATTCATCTAG 1008 'IrTTCATAC ATACGATATA GTCTA'rCATC TAAATCCTGT TTCCTCTCT TATAATGCTr 3120 A'rC'I-CAACA TCTAAATTAT CTATTTCCTrC AATTAGCTTA AACTTTGTAG AATGACTCTT 3180 TCTCAATTCC TTTTGGTAAT TAT'rATT'rC TTTTTCTrr TCAGAGGTAT CCACCTTCAT 3240 GT'rGATrTTTr TCT'rGCATCA TAGAAGCAAA TTTCGGATTA CTTACTATCT TGACAATCAC 3300 CTCTGCAACA GCATCATCTA ACAATTCTT1C TCTAATTTGC T77M-rGAATG TACACTTATT 3360 ACCTC'rTATC ATCTGCCTAT GGTTACAACC ATAGTAATAA AAATCTTT-AT ACTTTGTGCC 3420 ATCTTTCTTT TTCTTGATAC AC'rTGTTCCC AAACAT'rCCC ACTCCACATA TCGGGCATTr 3480 TACAATTCCA GAAAGCAAGT GTGTGCGTGT ATCTrCCT TTATTCACAT GCTCATATTT 3540 CTTTGCTTGA GATTTTAGCT TAACCTGAGC ACrGCCAA ACTTCATCGG AAACTATAGC 3600 TTCATGTATC CCTTCAGATA TTAGATATTC ATC'TTGTTCA ACCTGCTTAT ATTCATTTCr 3660 TGTACCATGA ACTr'rTTCTA AAGTTCTTCT TCCAAATGCT ATrCCCAT TATATAtAGG 3720 *ATTCTTT'AAT ATCTrrTTA TAAGACCTGC ATCAAACAAA GGAT'rCTTAC CAT'rCTGTC'r 3780 .TGGGA'rTTTT CTAAT'rCCAT GATTCTCTAA GTATTTAGAT ATCCCATTGG CTCCTATCGT 3840 AGTATTTACA TACTGGTCGA AAATCGTTCT TATTGCAACT CCCTCTTCCT CATTTATAAA 3900 *CAGCTTGCCG TCN'CAAGTT TATATCCATA CGGAGCAAAG CCACCATTCC ATTTTCC'rTC 3960 *CCCTGCTTTT TGAATGCGAC CTTCCATTGT TTGAA'rACTG ATGT'rTTC'rC TTTCTATTTC 4020 *AGCCACAGCT GATAAAACAG AAATCAT'rAG NTTCCCAGCA TCTTTAGATG AATCAATGCC 4080 A'rCT'rCAACC CAGATAAGAT TAACTCCATA ATCCTGCATT ATATGAAGTG TAGAAAGAAC 4140 .ATCAGCGGCA T'N'CTTGCAA ATCTTGATAA CTTAAACACA AGAACAAAAG ATACTCCATC 4200 *TTTTCCAGAT TTTATATCTT CCATCATTCG ATTGAACTGT ATTCTACCTT CAATAGACTT 4260 GTCAGACTTC CCGGCATCTT CATACTCTCC AACAA'rTTCA TAATCGTTGT AAATAGCAALA 4320 *99AGCTT'rCATT CGTGATTTr GTGCCTCTAA CGAA'rACCCC TCTATCTGTA TTGACGTAGA 4380 *TACTCGTGTA TAGAGGTATA CTrTTTATTTT TTCTTI'TGAC ATAGTATTAA CCTCAATATA 4440 ATTTTTCTAT ATCATATATA ATTTTTTTAA TTTA.AGTTTG GACTATCATT TCAAGTATAT 4500 TATAACACTT TTATTAGTCC GTCTCAATTT GTGTTTTTGC CATGTCAAAA CTATrT'rCA 4560 9TCTCTTGATT TTTTGCTGGC GTTGGATCGG GTAGATTATC TAAATCTAAA GCACCAGCAT 4620 ATTTTGCAAT CAGATTTGCT ATTAAATCAG CCAATCCATT CCAGTCATTG TCCAATATAT 4680 
ACCTCCTCTA AAGT~rATA TCTAATAATT ATTTGTTTAA TTAAGTTNTr TGACATTCAC 4740 AAGTGCTTTG GATTAGCAAC ATAGGAATCT CACTTCCGCC TCTATTCCGG ATGAGCCGGC 4800 TTCAACC -rA GAAGTATCAT TACCCTCAT'r TITCTTCATAG CGGATAGGGT ATCCCTCCCT 4860 1009 ATAT TCAAAC TCTTACTTAT ATTA'PTCAGC CGAAAGATCT TGTCTTGGTC TATTCAATTG AATTTGACCA TCAGAAAATA TACTTAACAG TGAAAAGAAG TAAAAAGGTC CAATTCCCAC CGCTCACTTr
TGACGGATAG
TTAAGGTTCA
TTTATCTTGA
TCT'PTCTTGG
TGCATAAATA
CTTTTTGCrr AGCAGAACTr TTTTTGCCGA GTTATTACGC TCCAAAAATA ATTAACGTC'r AAATTTATCG AGAGTTATTA ATCTT'rrTAA TGTAACAAAA TTCTATAAAT TACCCTCTTA TAACCAATT'r TGAAATAGAA TTTGCTTATA GCAGTGAAAA TrAGACCCTC T'rGGTAACTG TCATCTAAAA GTCTTCTA
INFORMATION FOR SEQ ID NO: 151:
SEQUENCE CHARACTERISTICS:
LENGTH: 13425 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151:
GACGATTTAC GAAGAATCGA ACAAGAACCT GCTCCTATCA AAATCTTGCA GTTCATGCTT ATACTTTTTT AAGAAATCTA GACATCGTCT GGTTGACATT GGTCAAAATA GAACAAACCA CCAACCTTTC AAATGCATCT CATGTAAATG TTCTTCTTCC GAAAATCCGA AATTCTACTC TGCTATTCAT TGTCTTACCC GGCGTTATTT ATTAGATAAT TCTTTCCACT TTTGACTCAA TCTGAATCGC AAAAACTATC AATAAAACCC AATCTATTAT AACTGAAAGA ACTACCTCCA GTGACAAACT TTGAGAAAAA GAAATAAAAT AGGAAGCATC CGCA'PTGTTA AAATCCGTTT TAAACGAAAA TATTATGGCA AAATT"TACGC CAGTmTTGA ATTCCCAACC TCTATCTCTA GAATCATAGA TACGGTAGAT AAACGACTCG TTCTATACCT TTGTCCAAAT CAACAATGGT CAAAATTAGA AAACATGCCT TCTCCAAAAA ATATAAGAAA GAAAATCAAA AACACTTTCC CGGTAGTAGA GCTAAAAAGA GGCATAAAAA AATCTPTTATT ACGGCTGATG TAGATATTTT AAATAGATCA CCAGCCCGAC ACAGCGAAAA ATAGCGGTGT ATACTTTCAA AATGTTTAAA TGTGATTATT T GAAAGTGCT TATAGAATGA TAATA.AGTCG TATGCGGAGA TAATCTGACG CGATGCGAAA GCAGAGTATT TTTATAAGTG TGATATAATA ATGGAGAAGT AGATTTTTAG AATGCGGAGG
TATTTTTGA.A
GTATATTGCA TACTTATTTT GAAGTATAAT TTGTTCTGAT GTTCAATATC GTTGAGTTTA
CAACAATTTA
AGTTTATTTT
TAAAGTCTAA
GAAAGAAATG AGTGAGGAGG ATATTAAAGC AAATTflCATC ACTCCTGCTA TTGTATCCAA 960 1010 AGGATGGAAA AATGGTGAGC ATATCGCT'rA CGAAGAATAC TTCACTGATG GTCGAATTGA AGTTAGAGGA GATAAGGCTC GTCGTAAAGA AGGAAAAAAA TCAGACTATT CACTGTATTA CCAATTTGGA ACTCGAATTG CAATTGTTGA GGCAAAGGAT AATAAACACA GCGTTCGAGC AGGATTACAA CAAGCTATTG AATATGGAGA GATTTAGAT GTTCCATTTG TTTrATTCTTC 1020 1080 1140 1200 GAATGGTGAT GGCTTTAT'rG AACACGACCG AGACGAATTC CCTACTCGTG AAGAATTATT GTACGAAATT ACAGAAGCTA TCTCAACTCC GCCACGCTAT TATCAGCAAA TAGCTATCAA AAAACGAGTA ATGTTTGTGA TGGCAACAGG TATTCATCGC CTTCGAAAAG CTGGTT'rGGC CATCTTAGTA GACCAAACGA TGGCTGAAGA AATTACACCA AAACTTTTGA CTGCTCCTGA GCTTTATCAG CAAC'rAACTG GTGAAGATGG AGACTTCTTT GATTTAATCG TAATTGATGA TAACTGGCGT AAGGTAATTG ATTATTTCAG TCTTAAAGAA ACCAAGAATG CTTCCAATAC TAGTTrAAAA CAGGGAATCG AGGATGGTTT TATCACGAGA GAAGAACGTG AGCTGGAGP TTCTCGTATG ACGAAGGAAA AAGGATTGAC ATACTATACA GACGCCTTCT CAATGAAAAC
CCGTACTATT
AACGGGGAAA
TAAACGAGTT
CTTTAGGCCA
AAAATTAAAT
AACTGAAACA
AGCGCACCGT
TTCTGCGACA
GAAACAGTTG
ACGTTCATGG
TTATTCTTAG
TTCGAAAAGG
CCAGAGGACA
CTTTCAAAT
CAGATAGAAA
TAATGACGAA
TCTTTGAAA TTTATCTAGG CATTATCAAA AATTTGACAA GGT'rCAGCTA AGGAAAACAG CAGATTGGGA TGACCGCTAC GGAATACTTT GGTGAGCCAA TTTGGCTCCA TATCGTGTTA TTTAGATGTG GATGTGGATG GTTATCGTCC AGAAACTGGA AAAGT'rGATC ATTAATAGAA GATAGGTACT ACGGCAGGA.A AGATTTTCAT AAAACCATTG TAGAACGCAA AGAGT'rGCCA AGTTTGTTTC TGATTATATG TGATAAAACA ATTGTTTTTT GTGTTGATAT TGACCATGCC TGTAAA.AGAG AATCTAGACT TAGTCCAAGA TGACAACGCT GAAGGAAAAG CTCAACTGGA CGCTATTGTA ACAACGTCTA AATTATTAAC GATTGN'TTA GACTCTAATA TCCAATCCAT CACACGTCTT TATCCTCAA.A AGGGGAAAGA TACCAATTTG TTTGCTGACC CTGATTTTGA TGCGAAAACA GTCAGTGGTT CTACGCCCGG AAAATATATC GTTACAGACA AGCAGGTTAC TGAAAACGGG AAACTGATTA CCGAAAGCCT
AGACTATCGT
TAACTrTATG
GACAGGAGTT
GACTGAATTT
ATTTTTTACG
AAGCAAAACA
GAGCGAATGC
TATGTCATGC
GATGTCAATT
AATGCTAAAA
AAACAAATrA
ATTATTGATT
TCTATACTTA
TGAGGGTTAA
CTAACGGACA
TCATTGATGA
ATGCACGATT
GTGCTGCACT
AAGTAACTGG
CTAATTTTC-
CATGTCGTTT
'rTGGTCGTGG
TTCGAAATGT
TAGAAACAGG
ACCCAGTAGA
AAGTA'N'GGA
1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 TGGTGATCCA GTGAAGGTGC TTTCGTAGAT GAGGAAGGTG CATTCTTAAT TCTACTGTTC GACCGACTAC ACTCGAAAGA ATATCTTAGG 1011 TAGCTACGCC ACrTGAACG ATTTTATCAC AGTTTGGCAT ACGGCAGATA AGA.AGAAGCT 2820 TATCT'rAGAC GAACTNTATA AAAAAGGAGT TTATCTAGAT GCTATTCGAG AGTCGGAGGG 2880 AATATCAGAA CALAGAAATCG ATGATTr'rGA TI'TACTCCTA AAACTTGCCT ATCGTCAAAA 2940 AGAATT'AACC AAAACGGAAC GTATCAATAA ACTCAAACAA AGCCGATATT TATATAAATA 3000 TAGTGAGGAA GCGCGTGCTG T'rTTGGAAAT TTTACTGAAC AAATACATGG ATAAAGGTAT 3060 TGGAGAACTC GAAAGCA'rTG AAACATT'AAA ACTTCCAGA.A TTTCAGATA'r ATGGTGGAAC 3120 CTTCAAAATC ATCAATACTT ATTTTCGAGA TAAAAAACGA TATTTACAAG CAATTAAAGA 3180 ATTGGAGCAA GAGCTATT1TA CAGTAGCTTA ATGAAAGGAA AGTATGTCAA TTACATCATT 3240 TGTAAAAAGA ATTCAAGATA TCACTCGAAA CGATGCTGCT GTTAATGGTG ATGCTCAACG 3300 TATTGAGCAA ATGTC'IrGGT TATTATTCTT AAAAA'TrAT CATAGCCGTG AAATGGr-rTG 3360 GGAATTAGAA GAAGACCAGT ATGAGTCAAT 'rATCCCAGAG GAATTAAAAT GGCGAAATTG 3420 4 0GGCTCATGCT CAAAATGGGG AACGGGTATT GACAGGCGAT GAATTACTTG ATTTTGTCAA 3480 004TAACAAGTTA TTCAAAGAGT TGAAAGAGCT TGAAATAACT TCA-AATATGC CTATTCGAAA 3540 *AACGATTGTT AAATCAGCTT T'rGAAGATGC GAACAACTAT ATGAAAAATG GCGTCT-rGTT 3600 *ACGCCAAGTC ATCAATGTTA TTGATGAAGT TGATTTCAAT ACCCCTGAAG ATCGTCATTC 3660 *S0GTTTAATGA'r ATTTACCAA.A AAATrCTTAA AGA'rA'TCAA AATGCTGGGA AC'fCAGGAGA 3720 ATTTTATACG CCACGTGCAG CGACTGATTT TATTGCCGAA GT'rCT'rGACC CAAAAC'rTGG 3780 AGAATCAATG GCAGACCT'rG CTTGCGGAAC AGGACGCTTC ?I'GACTTCGA CTCTGAACCG 3840 *TTTAAGTAGT CAACGTAAAA CTAGTGAAGA TACCAAAAAA TATAATACAG CTGTT -rTGG 3900 TATTGAAAAG AAAGCATTTC CTCATCTTTT AGCAGTTACA AATCTGTT'rC TTCACGAAAT 3960 ~.TGATGACCCT AAAATTGTTC ATGGAAATAC TTTGGA3A-AA AATGTTCGTG AIVTATACGGA 4020 ~TGATGAAAAA TT'rGACAT'rA TTATGATGAA TCCACCTTTT GGAGGGTCAG AATTAGAAAC 4080 see* AATAAAAAAT AACTTrCCAG CAGAAT'rACG GAGT'rCTCAA ACAGCTGATT TATTTATGGC 4140 TGTCATTATG TATCGTTTGA AAGAAAATGG TCGTGTTGGA GTTATT'IrAC CTGATGGTTT 4200 
*o.TCTATTTGGT GAAGGTGTAA AAACTCGCN' GAAACAAAALA CTGGTAGATG AGTTCAACTT 4260 GCATACGATT ATTAGGTTGC CTCATAG'rGT CT'rTGCACCG TATACAGGAA TCCATACGAA 4320 CATTCTT'rTC TTTGATAAAA CAAAGAAAAC AGAAGAAACT TGGTTT'rATC GTTTAGATAT 4380 GCCAGATGGT TATAAAAATT 'rCTCGAAAAC TAAGCCGATC AAGTCAGAAC ACTTCAATCC 4440 TGTTCGTGAC TGGTGGGAAA ATCGTGAAGA GA'rTCTGGAA GGTAAG'rrCT ACAAA'rCTAA 4500 ATCATTTACA CCTAGTCAAT TGGCTGACTT AAAAGAGGAA GACGAAATCT TAAATCCCIT AGCAACT'PTA AATCATAAGA TTGATAA'rGT CAAATAATGA CACCACAACA ACTTAAAGCA TTAGTGCCGC AAAATCCCAA TGACGAACCT GAAAAAGAAA AACTTATCAG TGAAGGAAAA TI'CGTGGTG ATGATGGGAA ACATTATGGG GATGITrCCTT ATGATATTCC TGATACTTGG ATTGTCAGAG GTGGCTCTCC ACGACCAATC ATAAATTGGA TAAAAATAGG TGATACTGAA GAAAAAATCA AAAAATCAGG GC'rTAACAAA TTA-ACTAATT CTATGAGT'rT TGGTAGACCT GATGGATGGT 'rGGCTA'rTTC GAACTATGAA ATTCTTTCAT CAAATGTAGT TTATTCTCAA AAAAACTTGA A'rAGTGATA.A AGTTGCT'rC' CAACAACGAA TAGTAGAAGC AATCGAATCA 1012
GAA'N'ATAAT
TGAGTTGATT
ATT'AGCTGAT
AGTATTICTCC
GCAAGTGAA'r
ATCAAACGAG
AAGM~GCTG
GAGTGGGTGA
AAaGATTATC TTAGACCAGT GTGAC1'TTCC
CAGAATTATC
ATTTT'GCAGT
AAAGAGCGAT
TATTAAAGAG
ATAAAAAGGA
ATGGAAGCAC
GGTPCTAC
T'rACTTCTGA
AAGCGGAAAG
TGTGGAGGA
GGAAGGGAAA
AATTAAAGCT
AACTGAGATA
TCAAGAAAT'r
ATTGGTTGAA
AGTAGATGGA
AAGGGTGAAA AGTATATAAA TAATGTTAAA ACTAGATTTG TAAAAAAAGG TACATTTTTG AGTTATAATA GACTAGAACA GCTAGATAAA CTTCAATATG CTA'rGCAAGG AAAATTAGTTI GTTT'rACTTG AAAAAATACG AGCAGAAAAA AAGAAAGATT TGGACA'rTTC TATTG'rTTCC ATACCTATGA AT'rGGGTTGT TATAAAAA'rA 'rCTTACAAGA AGGGCGATTT AAGCATTAAT TATATTTTGA ATGTTGATGG AACTCATTAA ATAAAGATTA TTTCTATCTC TAATTAGTGG AT'rCTTATCC CTCTCCCCCC GCTTTAGAAA AAGTAGATGA GAATTTCCAG ATAAACTAAA GAACAAGACC CAAATGATGA CAAAAACTCT TTGAAGAAGG CAAGGAGATG ATAACTCTTA AAAGATATTT TTTCAATAAA AATAAAGGTG TTAGAATTAT GATAATGATT ACTACATTGA TCbATACAC
CCTATTCTAT
AGCTGTTGTG
ACTATCCGAA
ATATGCTGAA
AAAATCTATT
ATCAGTCGAA
CAAGATTAAA
TTATGGGAAT
TACAGGTCTT
ACGTGGTGGT
TACACAATTC
ATCAACCTCT
TGTGGCTGGT
4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 AATATTAAGC CTTTAOAA'TT ATCTCCTtTG AG.CAAG'rTrA TTAGAACATA TTGGAAAGTT GGATTATr TCCAATTAAC TTTAACTTGT CCTCTCCGTT CAAGCTTTAT ATAATATCC TTTGAGGAAC AGGAACTTAT CTTTGAAAAT GATITCTTTTC
TTCTCTGTTG
TTTAAAACA'r AATCAGCTAA TAACACCTGT TGCA-AGAATC GATAAAGACT ATGATrGTGT ACCATTCGAA AGTTCAGAGA TTATr'rCAAA ATTTCTAT'rA ATTTTATAAA CAATTGAAAG CAATAACTAA ACTATCAGGT TAAAACTACA CTGAGCGAGC TATTAATTCC GTTAGCTCCT TACTCAAAAA GTTGAGAAAC Trr=GAAAA AGTAAATCAA ATCTCTTCAT GATTAGAAAT AGGGAT'rAAT AATTCGGAGA TACTGGTACT ATTTAATGT'r T'rCCCTTTGA TAAGTGGCAA AAATATCAT'r AAGTAATCTC
ATATCGATAT
TAGATAATTT
TTAATGCTAA
AATGGCI'TCA
AAAACTCATT
TTTAGAAATA
CTGATATAGA TACCCAAGGT TTT'TTCCTAT TCTGAAGTTA
TTATGAGTGG
AATAAAGTTC
GATTCTC'rCG
ATT'TCAGTTC
ACTAGGCTAG
GTTGTTTCGT
TTGACCACCT
AGCTGTTTTNC
A.AGCAAGGCA
ATCAGGGTAA
TAGCA'rTTTT
TGATAATA'TT
CAGGAATATC
CTTTAGGTGA
CATTAG'rTAC
CCCAAAAAGT
CAAA'N'TAAT
CTTTGTTCCC
GT'rAT'TTT
CTTAGATAA.A
ATATCATTAT
ATTACGTTAG
CAATATGGCA
AATGTCTATT
ATAACCAGAT
ACTTTTCCAA
TTCTTTGA1TT
ATAGGCATAT
TGAATCACCT AAAGTAGAGA 'TTCTTATN'A GCATAGGGGA AGCTTCACTG CGTGGAGGAG ATATCTCCAT GCTTCTGGCA
TTTCATATAT
AAAAATAGCA
ACAATATTTT
TCGTGTCCCA
GCAAATAAAT
AGGATAAGAG
GGTAGTCAGT
GTTGTTCAGT
TCAAATCTGC
GCCTAAAAGA
ATAATA.AGAG
ACCAATTAAC
TTCGAGTAGT
ACTTCGCTPA
GATTCATTTT
CCATAATCAC
AATTTTATCT
TTCTATACT'r
AAAATTCTTA
GTATTTATCA
GCATAATTTT 'rTAACTGTTG AGCAACTCCT CTTGCTGTAA TTGGTTCGTT AAATTTATTC AAAAATAAAT AACCACTTCG GCGATTTTCT GAT-rCTAACC AACTAAGACA AC'TATTTCTT AATTTTTTAG GAATGTACAG TCTACGAATT TTACCACC'PT TTGAGTAAAT CCGATTTCTA CA'rGCTCTAC' TTTTAGTTTA ATAAGTrAC 7-,ACACGAGC CCTAAAAACC AAACGACAAA ATGCCATTT AGAAAAAGGT AATCAGCATG GCTAATGACA AAAATACCAT CTTTTTE'CAA TCTTCTAAAA ACGGTTTTTG
GTCAAAATAA
CCCAGTTGCA
ACTACGTTTA
CTGTAC=TG
TACTCCTTGT
6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 ATCATGACCA ATAAAAGCCA GATATTTATT ACAAATTTTA ATTTCAAATC AGTCGCAAAT TGACAGTTTT AGGTTTAAAA TTGTCTAATA AATATCCTTT GTATTCAAAT AAATCTTCCA TTTTGAGTTC GTAATTCTCC AAGAAAAATC GAACACCATA AAGGTACGAA CGCACAGTAT TTTCAGCTAA ACCAGCTTTC TTCAAATGTA AT'rCAAAATC TT'rCAACGTA AAACTCCTAT CTTATGTTTG ATAGAAATTC CACCGCACGT AAAACTATTA TACTAAATTrA G'rCCGTCAAT ATGCGCGAAA AATTGTTCGA TTTTATCAAC GATTCTGGAT TGTTCAGGAA GGGCTGGGAG GGGGATTAAA TATTCTTq'TA TAGTTTTCGT TAATAATTCT rTTGTTTTG TACTACCCGA CGCTTTTTCT TCAATAACTG ACTGA.ACAA'r AGGAGAGGAA AGAAAATTAT AGA'rCAAATG GCAAT'rAATA ACCCCCGATA AGACTCTTAT AACTGTAACA TGGCTATCTG CAACAGCCCA GCCATAAGGA TTTTTAT'TTT CATGGTAAAT AGCTAATCGT CCTAACGTAC CTAGACCTGT TC-AAT'rCCAC ATTAAATCAC CATCTCTTAG TAATCTTTCT TTCTGGTAAC TATGAACTGT T'rCGGGATCA ATAAATCT'rG TACATTTCTG AGCAATCACA GGGTATATAG TTTGAATGTA GGAGGTTATA AAGGTAC'N'C CTCATAATAA CTTTCTTTTT AATCTTGCCT GTAAAACTTC GACTGATTCA AT'rGAAGAAT AGATTTT' TATAACTTT'C AGCATATTCA GTTGTTGTTC GGATAGTGGG TTGCAGGATA ACTTGTTCCA ATAAATAATA TTTCAAATAT TAGCTA'rCAA ATACTCTTTA CTGTTGAAAA TAAGACACTA CACGTGAAAG ATATTGTAGA
TCGTTTAACC
GAGTTATCAT
TCTTCAAAGA
TCATTTGGGT
AGTTTATCTG
TCTACTT1"I1
GGGAGAGCAA
GTAGATTTAT
GTTTCGTT'AA
AGTTCTCTAA
TTCTGCGAAA
TTTTTG'rAGT
CTAAGTCAAT
GAATATTTGA
TCACCCATC
CTCCTTGGGA
GT~TTGTTT
CTTG'rTCAAC
GAAATTCTTT
AGAAAAGCCA GACCATTGAT ATATTTTGGA GACTTCCCTC CCAACTr'rCT GGTATTTCAC AACAATAGAA ATGTCCAAAT T'rCTGCTCGT
TAATTTTCCT
ATCTAGCTGT
ATTT'rCAA
TGCATAGCAT
TCTAGTCTAT
C'TAAAGCTGA TTCGA'rrGCT TCTACTATTC TTAATAATAG ATTAAAATTA TAATCATTGA
TATTAACACG
GTAAAGTATC
CTACAGCAAT
CTAATrTTCT TGA'rrATGTT
ATTGATAAAA
CAAAACAATA
A''TT'rTAGA
AGCACGGAA
CTTTTTTCTA
TTATCTGATA
AATGCTGTAC
TATCGTCTAA
GGCGCTTGTT
TCAATACTAG
ACGTATCTAT ATACCTAAAG GATTTCTCTG GCTTATTTTG AT'rTTATCCT CACCCACTCC CA.AGTATCAG GAATATCATA TGCTTCCA'rC AGCAAACT'rC CCATAATGTT TAAAAATACG CCTATAGATA ATGGGGTTGA AATTCAATT TTTACTTCCA ATCGAATATT TTCATCATAA AATTCAATAT CTTCAGGATT
TCTTATGTGC
AATAGG=TA
CAAATCCTCC
TTCCCC'rTGG CCCAAAATTC CAATAAATTG AGGAACATCA ATTTCTTGAG TTCAAGTATA 'rAAAAAGGCG TTGT'rGATGA GATTGTAGAT ACCTTTTCTG CCTGTAATTG CAACC:TCGGC AGAAATATTC AAACAGTCGC CATCATCATT ATCGATTAGG AAAGTTAAAT AGTCTCAATT TACTATGGCT TTTGAAGCTT GCAAAAGTTA 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 TTCCGCTCGA TCAGGATTCA AAAATCGACA AGCACAAACA TATTGAGATA ATATAGTAGA TTGAAA'rAAG ATGTAAACA-A TAGTTTCTAG AAATTTTTAG CAGATGTAGT GTACTATTCT TCAAATATAT CTTTCGALAAA AATATTTACA GATGTG;TA.AT GTAAACTTGT AGATTTCGAT T'rGAAGTAAC TTGTTrnCTT -TTGAATTTTT CCATAGTGAC TCCTTAATTT TCTTCTrACAC CGCAAAAGAG TCAAGAGGAT TTTTCGAAAA ATAAATAGCG GGTTATAGGT ATTTGATGGC TTAGACTGCT GTGTGACTGT TTCTATATTA GTATTAGTAA AGGTCTAAAT AATTATCAAT TTGCATAACT TGCCCATTCG ATTCGTTTGG CTTCAAGGAA
GCCCGATATT
GTCTGATGAT
ACCGAAATCG
GTTTTTGAAA
AAATCTAATT
CTATITTTAAG
TTACCCACAG GCAATCTr'rC TTCCCATTGT GAAACGAAGG GC'rAGTATAG ATGTGATCTC CGAGAGCAGC TTTAACCAC'r ATGGAAGGTC TGTAATACCA CTTCGATAGG AGCTGG;TGCT GAACAGCCAT AGCAACGTAA TTCCCATACC ACGTGAAG.CA CAATGTAAAC AGGCGCTTCA TCATGATGGC AGTATAGTTG 'TTTCTGACAA CTGCATTCCT 1015 TCATCTTCTG TCAAACTT CAAAGCGT'rG TGAAGAGTTG GCTTCCTTGC GCTCTTCTGC TGTCATGATG TAGATATTT TCGATTTTAT 'N'TCAATACC ATACAAACCA ACTTCCAAAA GGGITCGCCA 'rrGGA'rCCAC TGAACGCAAC TCAAGACGAG GGTACGCGCA CAAGTGGCGA ACCGTTACGA CCAGCCCAAIG 'rAACCTGGAA CCAAACGTTT GTATGAGTTA ACTGTTGGGT TAAGCATGCT TGATCAAACC GCCTAGGAAA TGGTAAGCTG TTTGGATCAT TTGGATCAAA GAAGGCGTTA TTTCCTrCTG CATCAAACAA GGACATATTA CAGTGCATAC CTrGATCCAGC AATACCAAAT TTTGGCTTCG CCATAAATGT TGCGTAAAGT CCGTGTI'GC TTTGAATCTT ATCACAAGCA CGGAGAACTT CAACCGCAAC CTCGTGGTGA CTCGCTTICTA CAATCTCACG ACGTGT1GTTG TCCGCAAGGT TGTCATTCAC TTCAAGTGTT GGGTCCCCAT GCTCTGGACC AAGGT'rGAAG GATTTGAATC TCAAATTACC ACGAGGGTCA CCCGCAAATG GAGCAATGGT TTTAACAACA AGCTTAAAGA CATCGTACTT AAAGTCAATC TCATGCTGTC CTTCAAATCC CATTr'rGGTC AAGACATTCA CAGTAGGTGC CAAGTCAAAG 'rAGCCACCCT T'rTCATCCAA CTTAAATAGG AAGAATTCTG CAACTTCTTC CATGTGACGA AGAGCTCGTT GTTCACCTTC TGTTGTATAG ACATCACAGA CCCAAGGGAA GACTGTCCAT GTATCCAAGT TACGTACAAA ACCTTCA.ATA GAAGATCCAT CTAACTGTTC ATCTGTAGCA GGAATTTCGA ACATAAGjkCG ALATAAAGGTA ACATTTTTTT TGATTGGCAT AAGTTTTCTC CTTAATCTAT AAGGTGACTG TACTGAAGCA AAACGCCCCT GTACTTCAGT CTGACTAACC GCTTTCTTGG 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 TCAGACCTGC AACACTTCCA CCGGGTACAA GTACATATCC CAAACATAAC CTTGTTCGAC CGTTTTTCAT GGTTCCCAAA CCTTGACTTC ACGACGAATA GACTACTTGC GGTTGCCTAA GTTGGAGGAG TTCATTGTGA ATTTCGCTTC ACGTTCAGCA TATAATCTTT GATT'rCAAGC CTTCGTTTCG ATCGGGCTTG 'rTTTCATCTC
GACTCATTGA
AAGACCTTAT
ATATCTGAGA
TCTGCAGCTG
CCGCGACCAA
AGTGCACGAC
TAT'rTTTTCT TAATGGCAGC GATATTATAA CCTCAGAGA AGACGATCCA TGTCATTCAA GGAATACATG CGACGATTTC ATCAACTCTT GATCTTCATA ATAACGAATC TGACGCGCCG ATAGATCGGT CAACTTCATA ACACTGCCGA TAGGAAAAAC AGCCATATTT CGGCGAAATTr CTTrTTCCTT CATTTACAAT TTCCTTCTTT CTGTCTATTA TAGTCTAAAA AAAGACAAAC GTCAATTGAT AATGTTATAA AATGTAACAT TATTTTTCTT 7rMNCTCTAA AAAGAGACGA 1016 ATACGATCAA TATCGTAAT'r TACGATAATT GCGACAAAAA ACACGCACAA ACACGTACAA AATTGTCTCA CCACTTGGAA CTCCCATAAA CGT'rTCTAAT 11640
ATAGCTGCTA
GTTAACATGG
GTATTAATA
CACCACCAAT AACCCCTGCT TTCTTATTCA TGCAGATTGG AACAACTACC AAGGTCACCC AGAAGAAGAC CAAGGCATAG AGTCCACCCA GAAGTCCCAA AATGAACACT CI'CATCAAAA CTCTCCCTCA GCACCAAT'rr GAAGACCTTT CCAGCCAAAA AAGCCAAAAA
GCAATACCTG
TTTTTAAAAT
TTATGAGTAA
ATCCCTCTCT
GGAAGAACCT
TTTTAAAGGT TCGCATACCA AGTTTGAACT AACTCATAAT CTCAACTTTC TATTTCCATT TAGT'rGAGAG GAAGCGTTTT TATTTTAAC TCTTTGALTT ATTTATAAAA TCT'rATTT TACCTTCAAG AAG'rTCCATT GATGCTCCAC AACTTGTCTG CACGGCCAAG GTTAATCGCT GCGGCAGCTG; GATTTAACTC CTGGTTGTTT CACGATAGCG TCCATCACAC TCTGGGTTTT CAAATACACC CATAGGTCCG TTCCATACGA GCTTCGTCAA ATTTGGCGAT AGATTTTGGA CCGATGTCAA ACTGCTtCAC CTTCAGTGTC ACGCACTTCA GTGTAACCAG GAGTCAACTG GCAAGATCAA TTTACCATTT GCTTTTTCAA AATTTGTCTT CTTCTACAAG TGAGTTACCG ATTTCGATAC TAAGTCATCC CACCACCGAT AAGGACGTTA TCAGCTTTTT CCGATCTTGT CTGAAACTTT TGAACCACCA AGGATAGCCA TCAACTGCTr CTTGGATGTA GGCAATTTCG TTTTCAAGAA TCAACGTTTG CTGAGATACC AACGTTAGAT GCGTGTGCAC TCGTTTACGA AGATACCATC TCCAAGTGAT GCCCAGTATT TTAGATTCTT TCTTGCCGTC AACATCTTCG TAACGAGTGT TTGATAGGGT AATGATTAAC 11700 TGGCTACATT TGTCATAATG 11760 AAAAGGCTTC GTCGAAAAAG 11820 TACTATTTCC TAGAATACGC 11880 GGCTAAAAAC OGCTGTCAAA 11940 TCAAGAGAAC TAGAAAAACA 12000 GGGAT'rTATC GAATTTATAT 12060 TTATCATAAA TCGGTGATTT 12120 AAAAGAAAAG AGGAACTTTC 12180 CTGTCAAGGC TGCAAGTCCT 12240 CACCCGTACT AATCCATGAG 12300 AGTCACCACC ACCGATCATT 12360 CGATTGTACC AGCTTGGAAA 12420 CTGTTTTGGC ACCAGTCAAA 12480 GACCAAGGAA GCCTTCAGAA 12540 CAAATGCGTT AGCTTCTTTT 12600 GAAGAGCTTT CGCAACATCC 12660 CTTGTGCTTT GTAGAATGTG 12720 CAAGCAAGTT TTCGATAACA 12780 CGAATGGACG TTCTGGAGTT 12840 GGAAACCAGC AACTGCTTTT 12900 GGTGAGCTGT ACCGAATGCA 12960 TACCALAGTTC AGGATCGTTT 13020 TTTCAACCAA GAGAACTTGT 13080
CC-ATCTTCAA
ACAACATCTT
GCTTTATCAG
TGTTCGATGA
ACGCCAT=T
GAGCGTTGAT TGCCOCTTCT AATTCAGCAC GACCAAGTTT TGCTGCCAAG TCAGCTGCTA CTTCTTCTTT CACACGTCCA AGGTGAGAGA TGTACTTAAT AGTTGGAAGA GCTGCTGTGA TCAATGGTAC GTTGAAGTCA ACACGAACGA
CACGAGTGAC
CAGGAGCAAG
AAAGAATTGC
ACCTGGGAAA
TGATTTACCA
ACGTCCACCT
13140 13200 13260 13320 13380 TACGGTTATC GTTAGTGATT GGACT'TTTT ACCTTTCAAG 1017 TCAACGTCTr TAACAGTAAG ?TrTGCCATG TTACAAAAAC TCCGG INFORMATION FOR SEQ ID NO: 152: SEQUENCE CHARACTERISTICS: LENGTH: 905 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152: 13425 GAT'rTATCCT ACCGGnGAAT TTCCGGAGGG GTTCTAGCAG CGAATGATTG GCTTTCTGGC CCATCCCTTT AAAGACTTTA CAATCTTAGG AATCTATGAA AAGA.AAATGT TTTGTACTTT ATTCCAGTTG CCATCGGTAT GCTTCTGGGA ATCGGCTTAT TTTCCTACCC GATTGAATAC CTGCTTGAAA. ATTATCAGGT TTTTGTATTA. TGGAGCTTTG CGGGAGCTAT TATCGGTACA GTTCCTAGCC TCCTCAAAGA ATCAACTCGA TGGTTATGGA CAACCT'IrAT CATTTCTGGA GGAACCTTAA GCGCCAGCTT TCTTAACTTC GTCTTGGTTC CTGGCCTCAG CCCATCAAAT ATGTTGACTG GTTTTAAAAC TTTTGATTTC GCAGGTGCAA CTCTCATCGT TTTTTCAAAA TCACGCGTCT ATCATTTCAT CATCGGTATC CCAAATGCAG GAAACGCTGA AAGTATCCAA ATCATCGCCT TCTTCTTTGC GCTGGGAATC GATAAATATA AATAATGGCA AAAAAAGT'rA GAATCTGACC GAGACAAGAT TGATTTAGCT TTAGGACTCT ATGCCTTAAA TTTTGTCGTT GTCCTAGCAG GCGCACTATT GGCCCTTGGC TTACTTTTGA TTTTGGGACT CTATGCTCCT TTGGGAACCT TCTTTCCGAT TGGAATTGGT TTGATAGATT ATGCCTTAAA CAACTACCAC
GTCCTATCAA
TACACAGGAC
TGGCTTGGTA
AAATCAAAAA
GTACCCTT
TTTC.ACTTGT
TTTGGATGAG
AACAT'rGGTG
GATCTTAATT
CGGTTATGTC
TCAATTGGAG
GAACAAATCC
TATCTAAAGC AGCTATCCCT CATCAGGGGA TTCAAATCAA TGCCCTAGAA GGAGAGCTTC
CTCAA
INFORMATION FOR SEQ ID NO: 153: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 4278 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153: CTTGAATTAA ATAAAAAACG AAAGATGTGC GTACGGCTAT TCATGCGACT AAGCATr'rTA CTGATAAGCT TGTTGATCCC
CCTTGGAAAT
TCCAAT'N-rG T'rAGCCAAAC
CAACTTCAAT
GTCAGCGACT
ACAGACCAAG
G'rTT'rGGAAA
GAAAAA'N'GG
TTGTGGTGGT
AACAGGTfATC
GTGCTCGTAA
ATTTATGAA
ACCTAGCTCT
GAATTGGTTC
TCGAAGACCC
AACCAAGCTA
CGAAATTGCA ACCI'AGCGC ACGTGAGAAA AATCCTGAAC ATCAGCGCCT GTAACCATTG GATTGCCCGT GTrGGTGGTG AAATCTGCCA GCTGAGTTTG CAATGCAGGT TTGGTTGCCA TAACATTATT CTTGTTTTG TTTCCGCCCA GAAC'rCTTGA CCGCTTGCCA GTAGATGAAA GAAGAAAAAA TGACAGCAAT TGATTTTACA GCAGAAGTAG T'rGGCTGACT TGTTTAGCCT TTTGGAAATC AATTCAGAAC GCCCAGCATC CATTTGGGCC TGGTCCAGTA AAAGCCTTGG GACCGCGATG GCTACCCAAC TAAGAATGTT GATAACTATG GATGGAGAAC AAGTTCTCGG AATCTTTGCC CATATGGATG TGGGACACAG ACCCTTACAC ACCAACTATC AAAGATGGTC TCGGACGATA AGGGTCCTAC AACAGCr'rGT TACTATGGT'r GGTCTTCCAA CTTCTAAGAA AGTTCGCTTC ATCGTTGGAA GCAGACATGG AC'rACTACTT TGAGCACGTA GGACTTGCCA CCAGATGCTG AATTTCCAAT CATCAATGGT GAAAAAGGAA TTTGCAGGAG AAAATACAGG TGTTGCCCGT CTTCACAGCT AATATGGTAC CAGAATCAGC AACAGCAGTC GTTTCAGGTG A.AACTAGATG CCTTTGTTGC AGAACACAAA CTTAGAGGAG CAAGCGCCCA CAACAGCCAG TGGCAAAGTT AGCTTATGGT CCTTGTTTAC AGATACGGAC CTAA'rAACTT TTCTGAAGAG CCCGTTACAG TGAGCAACAA TGAACTTGGT TCTrrGCATTG ACAAATCAAA AGTTAATGAA TCACAGTGGG TITATACAGAC TCATCGAGAA AAGATAGAAA AAAAACGCAA AGAAGACCTC GTGATGACAG CAAGGCTGAT AGAAATTCCT TGAAATCGCA CAGGACATTT TCAGTTTGGT TGGTGCCTGC TGGTAGCGGT GCCTTTATGC GCGCGGGGCT TGAAAATCAT CAAAGAATTG CAGACGAAGA ATCAGGCTGG AACCAGATTT CCGTTTCTCA ATATCACGGA ATACCTCCAC TTACAGGTGG TTTACGTGAA ACI'GGCTGA CTTGCAAGCT AACTCCAAGA AGAAGCTGGC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 4 0 1500 1560 1620 1680 1740 1800 AAATACAAGG TGACGATCAT TGGTAAATCA GCCCACGGTG CTATGCCTGC TTCAGGTGTC AATGGCGCAA CTTACCTTGC CCTCTTCCTC AGCCAGTTTG GCTTTGCTGG TCCAGCCAAA GACTACCTTG ACATCGCAGG TAAAATTCTC TTGAACGAT6 ATGAGGGTGA AAATCT1AAG ATTGCTCATG TGGATGAAAA GATGGGTGCT CTTCTATCA ATGCCGGCGT CTTCCACTTC GATGAAACAA GTGCTGATAA TACCATTGCC CTCAACA'rCC GCTATCCAAA AGGAACAAGT CCAGAACAAA TCAAGTCAA'r CCTTGAAAAC TTGCCAGTTG TTTCTGTTAG CCTGTCTGAA CACGGTCACA CGCCTCACTA 'rG'GCCAATG GAAGATCCAC TTGTGCAAAC CTTGTTGAAT ATCTATGAAA AACAAACTGG CTTTAAAGGT GGTCGCTTGC TAGAACGCGG AGTTGCCrAC ATGCACCAAG CCAATGAATT TATCGCCTTG GCCGAAGCTA TTTACGAATT GATCAAATAA ACTTCTTTTT 
GGAGGGAAAG TAGATGTCTC CGGATTCGCA GAATGCCAGC TATACAGAGC AA.ACTGCTCG CATCAATATC ATCGGTCAGG TTTACTGGAA AGATAAAAGT GGTGACCGCT CCrTTTTACAA TTCAGGTTAT TTTGCTGTTT GCAAGTCGGG TGATCTTCCG CCTCGTACAG TACAGGAATT GCCTGATATT CAGT'rAACCC AT'rTACAGGA GAAAATCAGT GGGAAGGTAA TGCCAGCCTA TTTTCCGCTA GTTCACCCAT 1019
CATGAACAAG
GGTGCTATGT
GATGATCTTT
AACGATAGAA
AAATCGAAAG
GTGGCATTGA
CTCCGGGACT
TGCGGGACTG
TGCCTATGGA
GTTTTGCAGA
TCTTGATTGG
CGGACAGGGT
CACCACGAAA
TC-ATCGGTGG TGGAACCTTT TCCCAGACTC GATTGATACC 'rCCGAGCAGC AGCAATTTAT GTCTGAGATC TTATGCTTGG AATCAAACAG GCTATCATGG GCCTCTCTTT GCAGCGCCAA TAAAACTCAA GAAGCAGGCC GCTAGGTGTG GATGAAGA'rA TTTCTACTTT CCAGGACAFG AAAATGGCAT CCGCAGGTCT GCAATATGCC CAAGCCTACT GAAACACTAT AAAGACTATC TCA-AATCTGG ATGGCCAAAA AAGAATTAAA ACCATTTTAT AGGCTGCTCA AAGTACAGCr CACGGTAAGG CGACGC'rGAC AATGAAAGAA ATAGCCTTTG AGTGGATGTG AGAGAAGTGG ACCGCTTAGT CAA'rTGGCTG TATTTGCAAA TCTGGAATGA TAATGTTATC AATGTCCAGG TCTCCTACTT GGGCACT f 9
ATCCTTGGTT TGAGGCAGAA GTAGTGCCAG ATTTGAAAAA AGTCAATGAA AATCAAAGAG CAAACTAGGA AGCTAGTCGT TTGAAGTTGC AGATAAAACT GACGAAGTCG GTAACATACG GTGGTTTGAA GAGATTTTCG AAGAGTATTA GAAGAAAAAG ACGCATT'rTA CCAGCTTTAC CAAAACGACC AGCTTTCTTT ATGAGTTTGC AGCTCTTCAT TTAGAAGGTG CCCACAACCT ATAGTTATGA TTAATTGGAC AAAGATCGCT TGCATTATAT GATCGGCGCG TGCTTGCCAA TTCCTATTAG AACAAGGTTA GTGGCATGTT AGCCTrTTGAA GAACTTTAAA ATTTTGCATT 1860 1920 1980 2040 2100 2160 '2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 GGGTAGGAGA GTrTTAT'rTT TAGATAATTC TTATT'rTTAA GAAAAT'rGAA ATTTGCCTCG TGATGCTTTT TTCAGACTCC TAATCGTGGT ATACTAGGTC AAATATGAAG GAGATTTTTA TGGCTAAAAA AGGTACCCTA ACAGGTTTGC AATATTrTTT GGTGCGGGGA ACTTGATTTT TCCGCCTTCT CTAGGTGCTC ACATTTTCTT CCTGCCATCG CAGGTTTTGT CTTTTCAGGC GTTGGTATCG CCTTATTAT'r GGAACGCTAA ATCCTAAAGG ATATATCTAC GAGATTTCAA GCCTTCGGTr GCGACTCTTT ACCTCTCAGT TCTTTACTTG TCA-ATCGGTC
AACATTTAAT
AGTATTTTAT
TCCTGTTTGG
TATCTGGAGA
CCGTCTTGAC
CGAAGATAGC
CATTCTTTGC
TACCCCACGT ACTGCTACAA CAGCTTACGA AAATAAAGGA CTTGGCTTGA TTGTAT'rTAC TTCGCTTAAT CCATCAAAAA TCTI'AGACCG AATTTTGA1Tr GTTATCTTGG TCGTTCTGGG AGCTGCTrCA CTGCTTATCA AGCTTCTGCC ACCTTGGACG CCCTTGCCTC AGTGGCCTTT CTTGGATTTT CAAGTAAGAA AGAATACATr GCCCTTGCCT TCAGCGCTCT T'rACATCGGT CCAGCTGAAG CGATGAAGGG TGGAACACCA GAAATCT'rTG GCTCAACAGC TCAACTCTTC ACAACGACTG TTGGTTTGAT TGTGTCA.ACA ATCAGCTACA AGGTrTATGC GACAGCCTTT GGTCTTGATG CGATTATC 1020 AGTAGGGATT AGCCCCCTrT TGTCGGATGC GGTTCTGTAT TTTGCGGCAG CCTATTTGAT CA'rTGGACGT ATTTAACGC CAGTCTTTGC AGCTATCA).A TATGGTGGAA CAAGTCCTCA TTTGGTACAG GTTTCCTAGA AGGTTACAAT AGCGTA.ATCG CAGTTCAAAC CTTGAAACAA TCAACTATrT GGGTTGTTGG TATCGTTGTT TTAGGTTTTC TTGGAAATCA TTTCCCAGTA GGTGTTTACA TCT'rGTCACA AGCCACTCAA CTrGCAGCTA TGGTTACCGT AACCTGCrTC GCTGAGTTCT TrA.ATGAGCG CTTCCCACAA ACCTTGATTG GATTTGCTAT TGCCAATT'rG 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4278 INFORMATION FOR SEQ ID NO: 154: SEQUENCE CHARACTERISTICS: LENGTH: 1953 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154: ACCCGATCAA ATGACAAAAG CTAACTTTGG TGTCGTAGGT CCTTGCCCTT AATATTGAAT CTCGTGGTTA CACAGTTGCT AAAAACGGAA GATGTGATTG CTTGCCATCC TGAAAAGAAC TGAAAGTTTT GTAAACTCAA TCGAAAAACC TCGTCGTATC ACCTGGTACA GATGCTACTA TCCAAGCCCT TCTTCCACAC GATTGACGGA GGAAATACTT TCTACAAAGA TACCATCCGT CTCTGGTATC AACTTTATCG GTACTGGGGT TTCTGGTGGT TCCTTCTATC ATGCCTGGTG GACAAAAAGA AGCCTACGAA AGAAATCTCA GCTAAAGCAC CAGAAGATGG CAAACCATGT TGGAGCTGGT CACTATGTGA AAATGGTTCA CAATGGTATr GATCGCAGAA AGCTATGACT TGATGCAACA CTTGCTAGGC ATGGCCGTAA TGGGTCGTAA ATCTACAACC GTAGTAAAGA TTTGTACCAA GCTATGACGT ATGCTGATGG TTCAAGCTGG CTTGACAAGG GTGATATCTT CGTAATGAAG AATTGGCAAA GAAAAAGGTG CCCTTGAAGG TTGGTTGCGG ATGTTCTTGA GTGACTTACA TCGGTCCTGA GAGTACGGTG ATATGCAATT CTTTCTGCAG AAGATATGGC 1021 TGAAATCTTT ACTGAGTGGA TGATATCTTG AGCCCTAAAG TGATGCTGCA GGTAACAAGG TGTACCATTG TCACTGATTA AGAACGTGTA CATGCTAGCA CAAGGCTGAA TTGATTGAAA CGCACAAGGA TTTGCTCAAT TGCAGATATC GCATCTATCT GATTACAGAT 
GCTrACAACC CTTGGATGTT ACTGCTAAGT AGCAGGTGTG CCAGTGCCAA AGCTGACCT'r CCAGCTAACT CCAACGTAAA GACAAAGAAG TCAGCCATGG GGAAACGGAT ACAAGGGrGA ATTAGACAGC TACT rGATrG AAATCACAGC ACGATGAAGG CCAAGATGGA CCAATCGTAG ACTAC-ATCCT GAACTGGTA.A ATGGACTAGC CAATCATCTC TTGACCTrGG CTGAGTCAGT GTrrGCACGC TACAmTCAA CTrACAAAGA AGGTGCTTCC AAAACCAGCT GCCTTCAACT TGAAGGAGA AGATCCGTCA AGCCCTTTAC TTCTCAAAAA TCATTTCATA TGCGTGTAGC CTCTAAAGAA AACAACTGGA ACTTGCCATT GGCGTGATGG CTGTATCATC CGTTCTCGTT TCTTGCAAAA GCGATGCAGA TCTTGCCAAC CTTCTTTrGG ACGAGTACTT ACCAACAAGC AGTACGTGAT ATCGTAGCTC TTGCGG'rTCA CTTTCTCAGC AGCTATTACT TACTTrGATA GCTACCGTTC TGATCCAAGC ACAACGTGAC TACTTTGGTG CTCACACTTA GAACCTTCCA CTACTCTrGG TATGACGAAA AATAAGTAGG TTTATTACTT GAGAAAGAAC GAAATCTAGC TCATTTTTTA 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1953 AGTTTGGAAC TCCAGAAAGA GCAGTATCGG GTTGATCTGG TAGAGGAGGG GCAAAAAGCC CTCTCCATGG CTCTTCAGAC AGACTATGAT TTGATGTTAT TGAACGTTAA TCTGGGAGAT ATGATGGCTC AGGATTTTGC AGAAAAATTG AGCCGAACTA AACCTGCCTC AGTCATCATG AT'rTTAGATC ATTGGGAAGA CTrGCAAGAA GAGCTGGAAG TTGTTCAGCG TTTTGCAGTT TCATACATCT ATAAGCCAGT CCTTATCGAA AATCTGGTAG CGCGTATTTC GGCGATCTTC CGAGGTCGGG ACTTCATTGA TCAACACTGC AGTCTGATGA AAGTTCCAAG GACCTACCGC AATCTTAGGA TAGATGTTGA ACATCACACG GTTTATCGTG GTGAAGAGAT GATTGCTCTG ACACGCCGTG AGTATGACCT TTTGGCGACA CGG INFORMATION FOR SEQ ID NO: 155: SEQUENCE CHARACTERISTICS: A) LENGTH: 6474 base pairs TYPE: nucleic acid' STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155: CCGGCAGTAC ACGAGCTTGG GGAACAGCCA CTGGAACGAT GAGGTGTGAG CTCAAAATAT CCTCCAGT'rA TGTTTTTCCT GGTTACAAAG AACCTACT TTITTGTAC AAATTTTA'rA
AATAGTATAC
CTATTAAACA
AATTAITrGCC
TTATCTTTAT
TTTGTAAATA
GGTTCATATA
1022 CGGAAGAGTG AAAGGATTTT ATAATGGAGC GTATACTATG AAAATGTGAA AATTTAACAT TTTTTAATAT CAATAGTTAA TCTCTTATCC
AGATCCCCCT
CAACAATTAG
CTATACTAAC
CxITTAATTAT
ATTTATATAT
TGTGTAAACT
ATCACTTTGT
GTTTAAATTT
AAGCTTCAAG GCCCCTATCC GTTCAAAATT -CTrTTTCAATA CTAAAA7ITTT TATACCGACA TGTAATTATC TGAATT-ATAT GTTCCAATAT TTGTTTAACT CTATCACTGG GGCATTTAT'r ACTTAAAATA GCTGACTCTT 'rCCTACTA'rC TTTGGCTTTC TTCTATTTG'r GTTTGAAATA CTTr'r'I1TAA TTGTTTAGTA TATTATAAAA ATTTCCAATT TTAAGCTTCT CCTCTCAGCA GAATGTGTTA GAAAAAAGCT TTTAATATTG GCAGTACTAT TTCAACTGTC TCATCTCCAA 'rCAATCGCTT TCTTTGGGAA GACAATATCC TCCA.AATCCA CTTGGATCTA AACAAACACC TTTTATTACA AAAGAATCTA GTTCATTAAA AAAGCAACAC TAATTGCTTC TGCTTCAGTA GGAGAAACTA GAGTACTAAT CGAAAGGAAC AACTCTGCAA CATCTATTTG 300 ATTACGTTAT 360 ATCAATAGTT 420 TTCATCCCCA 480 ATGAACTGTT 540 TCTATAA-ATT 600 AAAGAAGGA'r 660 ACTTCAGCAT 720 GGAGAGCTAA 780 ACATAACATT 840 TTTTTCTTCC 900 CAACTATGCG ACTTGGATAT AAATTATCAT ATATAGAACA ACCT'rCTCTC AAAAATTCAG GGACAAAAAT GATATTTTTT GTATCAAACA GCCT1"rTAA TTTG7I-rTGAA AAGCCCATCG GAACTGMrA CTTTAAAATA ATCTTr'rCAT TAGGTTTTAC CCTCAGAATC TTCGATACCG TTTGTCGAT TTCATATGTA TTAAAACTAC CAATTTC ATCATAATCT GTCGGAAGCG CAATAATATA ATAATCAATA TTATTTTTAA TTTCAGAAAA TGTATCAAAA AAAGTAATA'r TTAAGTTATT CTCGCAAAAA AACT'rCATAA GCTCT'rCATT TTTAGATGGA AGAATGCCCT TTTTTAAATT ATTTATTTT'r ACAGAATCTA TATCATATGC AACAACTTTA TATTTAGATG CAAATAGTAA CGCGTAGGCC AGCCCAACAT GCCCCAAACC AATTACTGCT ATATTCATAA AACTACTTCC TTATTTCTTA ATCCAAAATC TAATAGAATA AGCTGCCCCA TTCCTTAAAT ACAACTCTTT AATATTGTTT AAAAGTTTTT CAACTGATTT CCAGA'rTATC AAAATCTGAG ATTTATAGCA CAATATrGAT GATATTCTAT CAATATAATT TTTTTCATCA AGTTCCTCTT GATACATTTT TAATTC'rTTA GTTI-rCCCA 'rATAACTAAC CATACTACTA TCACTTACAT ATGGGAAGTC CTCATAATAT ATTACTTTAT AACGCATAAA TTCAAGCGCC CTTCCAATAC TATTCACAAA AACATGAGCA ACATGGTCAC CAAGTGAAAG CGGACAATAT ACGACACATT TGTCGTCTAA ATGCATTAAC AGCTCTTTTA TGATATCATT CTTTAATGTG TCCTCATTTT TTAATTCACT ATAGATATGA CGGTATAGAA AATTGCCATT 960 1020 108-0 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 TCTATCT7*rC TrCACAAGCT
CAACTTATGA
TATAGTAATA
GGAAAAAAT'r
AACCATTCCC
CTATAGACAC ATTCATAGTA CGATAAGTGT CTAAAATCAC ATTGTAGACG AACCTGTCTT CTTTCTTCCT 'rTCTTCAATC GCATATI'rCC CAAGGTTACA AAT'rGCTTAG CAGAGCGCTG TAGCI'GTTGG CTCAAAGCr AACCAGAAAA ACAAGTACAA TTTCTCCT'rC TGAAGTTAAT TTTGAAATAT AATCACCACA GCGTCATCTA AATGTGGAGA TAAAAAGATA TACTTAGTAT TGTTAC'rCAT TCTACAA'rTr ATCTAAAAAC TCACTAAGTG TCTGATrAAA TTCCACATCA TCAAAAAAAT TCACCTTATT CTTAATAATG TATTTCAATA TCCTTTCAAT ATCATCCTCT TTIACAATCT TATTAAAAAA GATAAGCTCT C'rGTTGTATC GAAAAATATA TAAAATAATT AATAAATGAG TTGTACCCGG GACACTAT A6ACTCACAGT TCT'rTTTGTG AAATTCTTTT ATTTrCAACAA TCTCAGACAT TTTATATGGA ATGA.ACTGCA CCCCAAAAGT TAGACAGAAT ATTGGGATAT TTTTTTTTAG CTAGAACTAG AATATTTCGT TAAATAAACA 'PATA'rATAAA AAAT'rCTCCT CAATATTTGC TATCAGCCCA TTATCTCTA.A AATTAAATAT TTTCATACAA TTTACTAATG TTTGAATATT TAAACAACTA ATGTT1ATCAA GAACACTATC TTGAAACCTC TTATCGTTTA GATCTGATAT T'NTrTTAGAC
TATCTAGGAT
AAATC'rAACT
TAAGGGTTTC
AGCTTCACAA
ATTGATATAT
TCATCTACAT
TAGAATCTCA
CTTAGTAAAA
AAAAAAATGA TACTTNNTTC TGTT'rCCCTC CCCTATATTC TATTwGTT'rAA GTTCCGGATG TCTAATATCT TACGTTCAAT TCAATTAATA CTATT'r 'GCCCTCCTG ACAGTAATCT AATCTTTTA CACTCAAACC GAATGCCAAA ACTATGCAA.A TTTGGGGTGC ACTTCATAAG TAGTCAAATA ACAGATACCT TCTTCAGTAA TTACCTCATA TTAAATAAAA TCCTTTGGAA CGGAGCATGG GTAACAATAA CCAACGAA GTTCTCCTAT ATTTCTAATT AGCCCTCTAG CCCATCATCA CCAACATAAT AACTTCTCT AAAGACTGTA ATCTCTAATC GTACCTTCAA CATATTTAAA TAGGAGGTTT 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 TGACAGTCAA ATCCTC'rCTA CGATAGCAGA AGTTCCCTCG CTAAAGTAAT TTITTTGTTTC AATCTAAAAT GTTATTAGGA GTATTTCTT.C ATCAGTATAA TTTTCTTCCA ATAAAATATT ACAAATAAGC TTTTTGATCT ACATATAGAA CATTCGAAAC TTTTTATATC ATCCCCGCAG AATCGCAATTr CTCCACTATA ATC'rCTCAAA AAGCCATTCA ATAATTTTAA TAATGTAGAT TTCCCGCTTC CACTTTCACC TAAAATTAAA TACTTTTCAT TACGTTGAAA ACAAAALATTT AAGTTTTTTA ATATTTCTTT ATCTCCATAC TTATAGCAAA TATTTTTTGC TTCATATAAC GGAAAATCTC TATTCACCTC ATTTGGTTCG ATATCATTCA TTTTATTTGA CTCAATTGGA TTAATTGAAT ACAATTTTAA AAAAATAGGC TTCGTACCAA TAATAGAGGA TAATTGACCT CCTAATTCAC CTCCTATTGC TTCAA'AG'rA CCAAmTTCA AAAAAACCAG AGATATCTGA AAAAAAATAT TTTCTACAGT TGTCTTTCTT TGTATAACCA TCTTAGGCAA TACATATAAA AGATTCAAGG TCTCACTAGA TT'rTAAAAAA GCT'rCATTTT TTTTCGATGC AAAGATTTTT- CGTACAAGTA CAGTCAATGA CCAATGATAG TGATTAAGAG CTTTTATTAC TAAAAAAAGT TGTTTAAACG 1024 CTAGCGCTGT AAAAATAACA CCTGTTAGTG CTATTCCTTT TAI'GCAAGA TAGCCTGTTA TGAGAAAGAA GCTAATAGCG CCTGCTAACG TCTI'rAATAA AATTCCTGCT TCTTTAA'IrT ACGCTAACAC ATCAAATCCA TTCAATATAG GGTTAGTTAA AT'rTAGACTA ACTTCTCGCA CCATAATCAT TAATGAAAAC AAGGTGGCTA TCACAACTGC AAATATAGTA CCAGAAATTC CCTGATCATT TAAAGTCTGA ACATCATTAT
TTAGCCACGA
AGATGTCTGT
AAGATATGTT CCTGATICATT TACTATGAAA TTCTTGATAG GTAGAGTTAG GGCAACTCTA TI-rCGAATCT CTAGATTAAA CTCTTGGATC ACT'rCAACCT 0 0 000** 0 0000 0 GATAATTTTrT CACTACCCAG TCAAGGAATA TTATCCCACA CCAGACAATC ATTTGGTAGA TTGACAATTT CAAAAACCGC TCTAAATTCA TCGCAATTAA TTCATTCAAC ACCAGAGCAT TAATAGTTGC TGCATAAA'NT AGCAATAATT GACCAGCAAC AATAALATATC GTTAATAAAC TAAATTTTTT TATATTTGAT TTTATAATAG TATACACAAT AGT'rTCTCAC TTCTAAATT TTAATTGAAC ATAGTTTTCA TATATACAAT AGAAAAAACC AAAATGATAT AATAACATAT ATTTCAAAAA AGAALATTCGT TAAAAATT'rT TTCTTCTCTT GCCTTCTTGA TTACTTTTAA AGCCTTGCAT 'rrGTCTCCTA 'N'AATAGTAA CCGCTTTATG TTTAA.AGAAT AATATTTC~T TGTAACCAAT ATTCTCTCGT TGAAACTCAA TAAATTAAAA TATTTCCTAC AGTAATTATA ATATTCTTCA TCTGCATTAA 'rTGTTTTTG TGTCACTCCA GTGATACCGT TTTCTTTACT GTGAGCGTAG TAATTCACCA AGAATTCTCG CACTATATCA ATTTGGTATC CTTGAACAAG TAGTTTTAAT AAAACAACAC CGTCCTGATG TGAATCTATT TTCTCAAAAC CATTAATTAA TTCTAGCACC TC~rT'rrTAC ACAACCAAAA TGACGTACCT GCTATATTGT GAACCATTTG AACAAACAAG GGATTTCCAA CAAAATCGGT CTTCTCCTCT TCTCGTGTAC CATTrGGATA AATTAT'rATT CCATAACTAC AAACTAAAGC TAAATTCTTC AT'rCTACTCT TTTTAAAACA AGCCATCAAC 'ITTAAAATTC GATCTGGCAT ATATTCATCA 'rCATCGTCTA AAAATGATAT ATACTTACCT CTAGAATTTT TGATACCTAT GTTT'CTGGCA TTAGTTGCAC CTAAATCTTC ATTACTTAAA ATTAACTTAA TTCTATGATT GGTATAGCCA AATTGATGGA TAATTTTATT TCTTAAATTT ACATTACTAT AATTATCATC AA'rAATTATA ACTTCGATAT TTr'rATAACT T'rGATGTAAA CAACTTTTCA CAGCTCTAAT CAGAGATTCA TACCTATTAT GTGTTGGTAT 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 1025 TATAATACTT ACTAATTCTr GATCTATATT CCTATCCATG ACTACTCTTC TCTAATAATT CATCATATAC TCTCATGGTT TCTACAAACA TTTTTTGCAC AGAAAAATGT TTTCTTATTT TTGATrTACT ATrCTCACCT CACAAGCACA CACTCCCTCA CAATAATTTC ACTTAACCCG ACTCTAAAAC ACACATAGGT TTTGGTAAAC TATAGTAGCT TCAAA'rGGCA AATTTTTTCT AAATCACATC TTCTCTGACT TTTTTTCTGG ATCCAACCTT TTCTTTTTTA AAGTCTTGAA ATATA'N'TCA AATACTCAGA ATCATrGAGT AAAAAATTAG ACATCTTCCT TCTCAAATAA CCAACATTAC TAGCTAAAAC ATTCCTTCTG TATCAGAAGG 
GGATAGATTT CACCAAGTAA TTCAAAGCAG CCCACATACT AAAAATAATT TTTCTGCAAA AAATCCATCA ACCCTATGT'r CGGAGTI'CCT TGTG.ACATTG AATATACAAT AAATCCGA'rA CCTGAAATTA TCTCTACATT ACCATTTCCA GCCATAATAA TTCAAGGAAT CTATCCGGCC TTGTTATTTG GAATACAAAA CTATTrGATA CATTAATTCC TCCAATCTrT TTTTTCATAG CCAAAAATCA AAATAAGTCA
CCAACATAAC
CACC'rACTAC 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6474
AAATGATTTT
ATCTAAATCG
a a. a a GTTATTTATT GCAACTA'rCT TCTTATTTT'r TATTATACTC TTTCAGATAC ACAAATAAAA GCATCTCCCA TAGAATATGT a. .a a a a a. a.
a a AGAATTTCTT TTITAAGTTA TATTCAACCC ATCCATGGCA TGTTATCACT GTCTrAACCT TTCCAAATCC ATTCTTGTCA AGTTTTTTTA ACATATATAA AAAATAAT'rA GTTGAGTAGC CATGACAGTG TATAAGTTGG ATTTTTAATA ATTTTAAAAT AT'rTTTAACG TGTAAGGCAG TTTCAAAATT ATTTGAACAT TGAGTACAAT CAACATAGGC AATATCTAAA TTTTTATAAT CATCAATAAC CTTTGAATCT CTAGATACAA TTATCAAAAT AGGGAATAGA GACA INFORMATION FOR SEQ ID NO: 156: SEQUENCE CHARACTERISTICS: LENGTH: 4792 base pairs TYPE: nucleic acid C) STRAN'DEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156: TATTTAACGA TTTTTTTCAT GTCAITTCCT CCAAAATAGA ATACCTTATA ATC11'AACAG AAAAAGAGCA TTTACGCCAT TATATGATAT CTATCTCTGT GATAAGTTTT TTTTATGGGT AATTTAAAAG ACCAAACGCA AGATGGCAAT CAAGACCACT CCAAAGAGAA CTGTTCCGAC TAGATTOCOG TAGCGAAAGG CTACCCAAGC TGTTGGAAAG ACGGCTAAGA AGTCCAGTCA TTTGATTTGA GGAAGACTGC CAACCT'rACC TGTCACTACG CTTGAAAGAA TCAGGGCAAA GATAATGGAA ACAGGCAAAA ACTTCAAAAA CTTGACCAAG ATGAAGGGAA TCATACCGGG TGCTAATAAA AGATACTTAC TGACCATCTA TCGCAAACAG AACAGCTAGT GACTGAGACA CAACAACTGC TAGGATAATG AGCAGATTGC ATTGCGAAGC AAAATACCAA TAAACATCCC TGGATTTGGT AGCAGGCCAC CCAGAGCCGT 1026
ACGCTCAACA
AATCCAAGTC
AAACCACCCC
TCACTG1TCAA
ATCGCAGGCA
ACCAAGCCAG
CATGCTACAA
GAGCAAAAAG
GGCCM'rATA
AGAAAATAAC
CCAAGTAGCG
AAGGACACCG
GGACAGGAAT CCGTCTI'TGC ATAATCTGAA AACCAGGGCA AAATCCAAGC CAAAGATTTC TCCGACTACT GTCCCCACAA ACCAAGCCAC CATAGGATTT ACC'rTGTCTG TATGGGCCAA TGTCAAGATA CTAGACATAC CGATATTGTA .0 0 00. 00 00 0 0 00 .0 0 0 0 0* *4 0 0 0 0
ATAGCTGTTA
TTCACCCATC
CCAAAGACTG
GTTGATTAGA
CAACATGGCA
AGGTGTCACA
AGCCGTTGGC
CATAT'rGTC1'
TTGCTCAAAA
TTTTGAGGTT
TAGAACTGAC
GAGTTTCGAA
GCAGATTGAA
TTTTAGCAAT
AAAAAAGTCT
TT1ATAGAAAT
GTATGACGGA
AAAACCGTCA
AATAAGTCGA TGCGTGTAAA CTrCAACAAAA TAGCAATAGC TGCCACAGGA GCTTGAACCA AACTGGGCAC TCCCAGCATA AACAAAGAGA TAGGGCGCAC CGATAATTCC ACAGGCCAGG ATGGCTGCCT GCGCCCCCTC CTAAAATCCT TAATAATACT CAATGAAAAT CAAAGAGCAA CACTGTTTTG AGGTTGCAGA TAGAACTGAT GTGGA'rAGAA CTGACGAAGT CAGCTCAAAA GAAGTCAGTA ACCATACCTA CGGCAAAGTG GAGTACAAGT AGGCTGAAAA GAATCCAACC ATAAGATGAG AACAAATCGA T'rGGGAAAGT TGrTTCGTAC TATTTTAGAT TCAGTCTATT GTTGATTTTG ACCATCATAA AAAGACTGGC AGATTGTTTC CGTGCATCCA AAAACGCCAT AGCTCTCATC
CTCATCA.AGC
CCGATACTGA
TTTTCTTTCA
ACTAGGAAAC
GAAGTCAGCT
CACCGTTTTG
AAGCTGACCT
ACAGCATGGA
AAAATTAATT
ATAACACATT
AATCCAGTCT
AGAGACGCAA
CAATCAGTGC
CCATCTCAAC
CATAGCCAAG
TCTTTCTCCI'
TAGCCGCAGG
CAAAACACTG
AGGT'rGTGGA
GGTTTGAAGA
CTATTATATA
TCTATAAATG
CAGAAAAGAG
CAAACATATA
360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 ?100 tee* 6000 TCTCCACTAA GAAGAAATAkG CAATAAAACA CCAAAGCTAT TGTAAATAAG
ATACTTTCAC
AAGCTAACTG
CTAGGTAAAA
GAATAr'rCAG AAGCATAACA AAGGCAACTA CCAGAGTTCC AAAGCTAGTA GCAATGGTTA CAACCG'rAAT GGCACCGATA GAOGATTGAA CTGCTCCCAT TGACTCCTCA GGTATTGTT TAAAAACGAG TTCTTGCAAT CTAGGAGAGA GAACACCTGC GAAAAAGGCA TCCAAGGTAC TGGCAAATCC TACTAGAAGA AGCAACTGGA TGGAAATGGT ATGTTGCAGA TAGCCAC?1'A CTAGTGTGGC TAACAAGGCA AGAGATTGAC TAAAGATGAG AATCCAGTCA AAACGAACTG TGACAAGTGA GGCATAGAGA GCTGTTTTA CAAGGCTTCC GACAATCAGG GCTGATAATT CAGTTTGTAA ATTCAAAAAG GGCTGGTTCC 1027 TTAAAAATAG AGTGGAAATA CGAACCGTAA CATTTATCAC TGCTTGACTA CTAGAGATAA TAAACAAAAC CAAGAGCACC TTATTCATAT 'rCCATATCAA I-rTCGATGAT TGGAGCAAAT GCTCGCAAAA GGATTTTACA GAGAGTCCTT CTTGATAGCT AATCGI'TTr TCTACTTTCA
AGAGGTCAGT
GTAAGGAAA'r
TTGTTTTCAC
GCCCAATAAT
TTTTATGAAG AGGATACCTA AAAATGCGAT TAAAAACTA AGAGCGTrCA AAACTGGATG GATAGAATGC CTAGTAAGAC TCCTCCTAGG ATATTACTGA TAAACTAACA GTTGACTGTT TAAAGCCAAT AGCTTCTGCC AGATGGTCTT TCTAATGAAA ATCGGAGTGA GCATGGCGCC TGAAAAATAA CTCAATGTGT 0.0 0 0.0.
0 CAGACAAGAG CTTAATCAGA CAAATAAATG AAAGTGATAA AGACACTATA GAGTAAAGCA TCAAGACACG ATGATGTTGA AAATCCGCCA GCAGGGTTTC TGAAATCGTG ATGAGTAAAA CATAATTCAG GAAGGCCAGA TAAAAAATCG TAGTTAATTG CCTAAAATCT CTATTTTGAA TTCAA.ATGGG AATCTTTCCC CAAGGATAGA TAACATCAAA AGCTGACCAA TGCCATTGTA TGACTTTGTC ATTCTAAATA AGACTGCAAA GTCTGTCAAG ACCCAACCGT GATTACTAAT AAGTACATCT AGCCCCCATT TCTTTCCTTT AATCATGTCT TCAAAAATGG GACCTGCAAT TGTAGCCCCT GTAAAAGTCG GAGCAGCATG CTACTAGCAA CAAG4GAGAAA AAAATTTTGC AAAACTAATG ACTGTGTATT AAACTCCCAG AAAGATTTGT AGAACTTGGG TCGCCAAAGG GGCAAAAGAT GCATCTGCCA 'rATCCCCAAG CGTTGAAATC CACTGGTTGA GAAATACTTT CATCACAACT CCTTCTTAAG CCGCGATACT ACTAACAACC AAAATTACAG GACTATATGC AGTCCAATAG GCCAATAAAT TATAAGACCT CCACCCATAT AGAAGACAAA GTGCGAGACC CCAAATAAAA CAGCGGAACC TTCCAGAGCA GTCATCACTA ATCCACGATA CACAGGATAA AAAAALATACA TCAAAAATGC TTGATAAGAA ATTTCATTTC GAGTAGGTGG AACAAAAGCA AGCAGAGCTA GGAAGGAATA AT'rGATT'TC TGCCATTTCC CCGACCAAAT AAAATTCAAC ATCATATCCG ACAGATAATA
GACTGCCCTG
2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 GAAAAGAAAA AAGGTAACGA AATTCCAAAC GAAAAGATAG GATCCTTTAA ACTTTCTACT CATAGCAATA'AGAGCAAATA AAACCACAAG GGCAAAGTCA GATAGCCCAG TAACAAGGTC GCTGCGTAAA ACTAGAACAC TGAACTTCTG GTCAGCAATA ACTAGTAGAA AAACTATAAT AAAGTAGCGG TGTGAGATTA TCTTTTTCAT ATATCACCTT TCTAATATCC AAATACCAAT AAAGTAACAA TGAGTAAGAA ACTATTCCAT GAAGCATGCA GAGCTATAGC CCAATAGATG GATCGGGTGT AGCGAAACAT CATACAAAAT ATCAAGCCCA TTCCAAAATA CT'N'ATGAAA TCTGTCGTTA TCCAACCATA CTGCAAAACA TGCATAGCGC CAAATATGGC AGCGGAAACA AGAACATCAA GATAGTATCT CTTAACTTTA GATAAACTrG TCATCAAAAG CTAGTATAAA GTATTCGCGT ACGACTGTAC TTCCATTCTG AACAATAAGA AAAAAGAAAG CCACACTCTT TCCAATGTTC
ACCACGACAA
AACAAAATAG
GGTACGAGGA
AAGGAAAACA
ACTTTTGACA
CACATAGTAC
1028 ACAACCTCTT CTGATACAGG TGCGATAATA CTAATTCCTG TTAAATTGGT GGCTACTTCT AAGATATAGG TTGTrAGATT TGCCCACACG CCCAGGTAAG ACCAACGAAA CTGGAAACGA AAAGCAATTG TAGCTATAGT TCCCAGAATA ATATTATCAG ACAAAGCAAC CATAAAATCT TAAGTCAAA6A TCAACAAGCC AGTTGCTAGG CTATTATCTC CTATAAGAGC CTATCTTCTA
AGTACCAATA
AAGTCTGATG
TGAAATTTCA
AAACTTGGAA.
TGACATTAAA AATGAGGTAA CTTCTTTCAT TTTCTTCATC 3900 3960 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4792
*5 CGGCGGCCAA ACAATCCATC CTCATCCCTT GATTGCCCCA CTTCTTCCAC CACCTCCCAT GCTTCTGTCA ATTCCATTGT ATTATCTTCA TTTGTAAACC ACAGTATAAC ACGAACATrG AGAAATTrCA GATTTGCGAT TACTAATGAG GAGTGGAGAA TGCTAAATCT ATAGTCCAAT CAAAAGCTCC ACGATTAGGA ACCAGGGTAA ATTCCTGGGA CGCCCCAACC AGATATACCA AGAATTTACG AGGTTGCCTC CTCTAACATC TTGCAACTCA TTCTGCAAAT TGTAAAT'-rTA ACATCTTTTA CACTCCTTCA ACTTCTGCGA CCTAGGATTT GCTTCAAGTG CTTTACAAGT GCTTATTTTA GAAAATCGCA TATTTGATAT TTTTTCTI'AT ?1'TGGTGAAT TTGATTACTT CTCTGGTATA ATAAAGTTAC ATATGAAGAA ACAAATTTTA ACATTATTGA AA INFORMATION FOR SEQ ID NO: 157: SEQUENCE CHARACTERISTICS: LENGTH: 2156 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157: CCGTTCTCGG CGACGGCCAT CTGATGAAGC TA=TATGAG GGAAACTGGC AAGCTGGAGA GTCAGAGTAT CTAGTCTTTC ACCGATTGCT GTGGCAGCAG ATGTGCAGGG AAAAGGAGTT GCTCAAACCT TCTTAGAGGG CTTGATTGAA GGTTTTGATT ATCTTGATTT TCGCTCAGAT ACGCATGCTG AAAACAAGGT TATGCAACAT ATTTTTGAAA AACTTGGTTT TAAACAAGTC GGTAAGATGC CAGTAGATGG CGAACGCTTG GCCTATCAAG AATTAAAGAA ATAATGCAAA AGAAGTATGT A.AAAATCCTC TACTCCTCAC CAAT'rGGTAT TCTATCACTT GTAGCTGATG ACCATTATT GTATGGAATT TGGGTTCAGG AGCAGAAGCA TTTTGAGAGG GGACTAGGAG ATGAAACGAT AGAAGAAGTT GTTAGTCATC CTATTTTAGA CCCAGTTATT GCTTGCTTAG A Al 4, A.J~JOO 1029 ATGATT-ACTT TAAAGGCAAG CCTCAGGATT TATCCAACTT GCTCTGGCG CCAATCGGAA 540 CGAATTTTCA AAAGAGAGTT TGGGACTATT TACAGGGCAT TCCTTATGGT CAGACAGTGA 600 CCTATGGACA AATITGCTCAA GACCTGCAAG TGGCTTCTGC TCAAGCAATT GGTGGAGCAG 660 TGGGACGCAA TCCTTGGTCT ATCCTAGTAC CTT 4 GTCATCG TGTGTGGGA GCAGGCAAGC 720 GTCTGACAGG TTATGCTGCA GGAGTGGAAA AGAAAGCTTG GCTCTTGGAG CATGAAGGAG 780 TAGATTrTTAA AGATAGAAGC AATAGAAGGA GAAGCACATG TTAGAATTTA TCGAATACCC 840 CAAATGTTCA ACTTGTAAAA AAGCAAAACA AGAAT'rAAAT CAAT'rAGGTG TGGACTATAA 900 AGCCGTCCAT ATCGTGGAAG AAACACCTAG CCAAGAAGTC ATTTTGAATT GGCTAGAAAC 960 CTCAGGA'rTT GAAT'rGAAGC AATTTTTCAA CACCAGTGGT ATCAAATACC GTGAAT'rAGG 1020 GCTAAAAGAT AAGGTAGGAA GTTTGTCAAA CCAAGAAGCG GCTGAG??GC TAGCAAGTGA 1080 CGGTATGTTG TTAAAACGGC 
CCATTTTAGT AGAAAATGGA ACTGTTAAGC AAATCGGTTA 1140 *TCGAAAATC'r TATGAGGAAC TGGGACTGAA ATAGT-rTTTA TCTATCTCTT TGATAGATAA 1200 AATATATAAC TTCCCTGTTT CAAAGTATGA TAAACTAGTA GGTAGACAAA CTCTGTATCT 1260 *.*GACCGTAGCA AATAA'N'TCA TTGACGGCAG AAGCATGGTA GCATGAATCA 'rTATCAGAAG 1320 **AGGATGT'TT TATGAATGTT ACAACGATT'r TAGCATCAGA TTGT'ACCAA AACTrTGATGC 1380 ****AAT'VOATTCC GGATGGCAAG CTGTTTAGCC TACGTTCGGT CTTT'GATGGA ATCCCTAGAA 1440 TTGTCCAACA AC'rTCCAACA ACAAT'rATGT TGACAATTGG TGGTGCCCTT TTTGGCTTGG 1500 7MTTGGCGCT TCTrTTTTCCC ATTGTGAAGA TCAATCGTGT CA.AGATTTTA TATCCCTTGC 1560 AGGCCTTCTT TGTTAGTTTC TTAAAAGGGA CACcGATTr'T CGTGCAACTC A'rGT'rGACCT 1620 ACTACGGAAT CCCTTTGGCT TTGAAAGCCC TCAATCAGCA ATGGGGAACT GGTCTCAATA 1680 TCAATGCGAT TCCAGCTGCA GCTTTTGCGA 'rTGTCGCCTT TGCCTTrTAAT GAGGCAGCTT 1740 *ATGCTAGTGA AACCAT'rCGT GCAGCCATTC TCTCAGTTAA. TCCTGGTGAG ATTGAGGCGG 1800 CACGCAGTCT GGGTATGACC CCAGCGCAAG TTTATCGACG AGTGATTAT'r CCTAATGCAG 1860 CGG'rGGTAGC TACTCCAACC TTGATTAATT CCCTCATCGG TTTGACCAAG GGAACATCTC 1920 TAGCTTTITAG TGCGGTGTT GTGGAAGTCT TTGCCCAAGC TCAGATTCTA GGTGGAGCTG 1980 *ATTATCGCTA 'N'TTGAACGC TTCATCTCCG TTGCCCTTGT TTATTGGGTA GTCAATATCG 2040 GAAT'rGAAAG CCTCGGTCGT TTCATCGAGA GAAAAATGGC TATTTCTGCA CCTGATACAG 2100 TGCAACAGAT GTGAAAGGAG ACCTTCGTTA ATGATTAAGA TTr'CGAATTT AAGCAA 2156 INFORMATION FOR SEQ ID NO: 158: 1030 SEQUENCE CHARACTEISTICS: LENGTH: 3140 base pairs TYPE: nucleic acid STIRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158: GTATCTCTAC ACATGTCTTC AATCGATTTT GTTGTCCTCC AATTTAATrC CTTATATGCT TTGTCTGCAT TTGCATAACA AGTTGCAACG GGAATAGGGA TCTTATTAAC ACTTTCAAAT TCTCCCGAGC CTAGGTTATA GATATAAACA TTTATATGTC CTATTGCTAA ATCTACTACA.
AGCGTATCAT AATCATTTCC GAACACACTT GCAATATAAG GCATCAAGTT GTTAGGAATT TCATGAGCAC CA.ATTGGATT GAAATAACGA ACATGAACAT CTTTTAAAAT TTGCTCAAGC GCAC'TnTTT CCrA'CrPC ArqATTAGAGG TCTCCTGAAC GTCTTGGAAC GTATTTACAA GTTGTAATAC
TATTTTATAA
ACTAGTGCCT
C TCTGTTTTTT CAGATACTTT TTCTAAAGCT TGGATATAAT CACGCACACC AGTACCATCA AGCTCTGATA GCTrACCTAC CGCTACTrGT CCTGAGGGAT CTTCCCCAAT CAAACCAGAC AGCAACGCAA TACTCCATTC TGAATCTGCC ATCACTTTCG TATACCCATA AGGATTTGTC GACTGATTGT TAATTCCATA TACAGTCGCA AATTCTGACA TCACTTCAAC AAGTGCCAAT ACAGGCTTrT GCACGGATTC TCCGACAGCT CT'rGAAGAAA
GTACTCATAA
AGACAATCTT TTTAACATTA TATTATTT'rT GTAGTACATC TTATAACCTG CAAAATGAAT CTGTTTAT CACAAACATC ATACGGTCTA GCACCAAGAT CCTAAATTTA GTAATTCTAC AATATTGCCA TCTGGGTITrC GCTTCATGCC CAGACGGTGT ACTTGTCCTG CTTCGTGTTG TTTTCTTTCA ATrGGTCGGC AAAAGATATT TTCCAACTrC CCCATCACTT CGATTAACCC TGGAAAACAC CATCTGGGTA TCGTATCTCC CGTCCACTTT TGCAGCATCA ATCGATTCTT TAATrCGTAA AACACGGGAC GCTAGAGTTC GAAAGGTTGT TACGGTATGG CTACCAATAT CTCCTAATTA ATTCCAACCG ATTCTTATAA ACTCCTGCAT AACTACGCTA TTAACCTCTT CCATTCTAAA TGATAATCCG TTCTAACTCT GGTTTCAAAC GATATTTTCC TTTTTAATAT TTGTTCAGTA GTATGA'rrAT ACGAGCAATA GGAGTCACCG GTTCAAATAC CTTTCTCAAT GTATTCCTGT AATTGCTTCA CGACAATGAT AACTTCCT'TT AACCAGCTCC GCCTGTTACC ACTrAACAAA TCTCATAAAC CTTCCAGAAC TCTCGCAA.;.C CTTTATTAAT GCGAGGATAT CAATTGCATT ATCCTCTCCT GAGGTGGTAA TATCGCAAGT GTTGTACATC TTGATGAGGA CTCTTAGAAC AATATCTAAT TATGGTGTGG GACATCTTCA 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 GTCATAGCAA TGATGTCTAC TTCTAAATCT GAATAT'rCTC TCCACTTATT TAGAATTTTA 1031 CGTAGCTAAAT CTAACAAGCG ATT?1'TATTT TCACTTTGTA ACCTAATTAC TGACA?1'GGC CATTTTACAA TACCAGCATT AACATCCTCA AAGTC'TTTAA AACAAAATTC ACTCTCAAAT TTTGCT TTT CCAT'rCCGAA AATATGTNTC CCTCCCTGGT AGTGGTTATG ACTAAGAATG GAGCCTCCTG AGATAGGAAG ATCAGAAT'rr GAACCAGCAA AATATCCTGG CAAAATATCA ACAATCTCCA ATAATrGTTC AAATGTTTTA GAGOI'AATAG CCAT1'CGTAC A'rGTTGACTA TTCAAAAATA TCGCATGCTC ATTAAAGTAT GAGTAGGGAG AATACTGGAA TCCCCATACT TCGTCACCAA GTTTCAACCG AATAATTCTA TGATTCGAAC GTGCTGGATA ATTTAT'rCGC CCCTGATATC CTTCATT'rTC CATACATAGT AAACA'N'TG GATAATTAGT AWNTTTTCAG CAGCAATTGT TTTTGGATCT TTTTCGGGTT TTGACAAATT TCTAGCTCTC CGTATTTAGT TGATC'CGA AACTCAATAT TCTTAGCAAT TTAATATAAT CACTATCTTT ACTTAAC2-rA TAAAACTCTT CAACTGCTTC TGC'rrTrACT
TATCGTAATC
AGCAGAAGTT~
TTGAGGTCAT
ATATCATATG AACTCCAAAA ATTAACTCTG CTCCTAAACA
AAGGCGATT
TCAATTCCCT
CCACTAAGTA
CCTCACCTAT
TCATTATACA TTCCGTATTC TATTTTTCTT TCTAATTTA'r TGCGTAGTA'r GAGCAATTTT GAAACGT'rCC CATCATCTTC AGCTCTTCTA TCTCATTCTG TCTCTATAAC AAGATATAC ATT'ITGAAA TAATTTTCTT ATTrCAACAA TTCTATCCCC ATTCTTCTTA AGTCZATATGA TCAATATTTA GAACTGACTC TCATTTGATA AATATTCAAA TCTAAATTA'r TTGATAATTC CCTCTTCTTA ACAAAGTTCG AATATCATTT AATCGACTAG GTAAAGGAAC TATGAAATTC 'rTCCTTTTCC TCGATTrAAAT CTTTAATTTT ACCGTTTTTT ATCTN'TATT TGTTTCAGGT CAT'rTrCATC GGAAATGCGA TAACGCTAGT ACTCTATTTT TCACATATAT T'TTGTCAATT AATTACTCTA TCAACAAAAT TATCAATAAT TGTTTTCATA GT'rCCCATAT TTTCTATACA T'rATCCATTT ATAAATTGCT ATCAAGGTGA TGAATAATAT CTAAAGCACT AATTACTTCA AAATATGTAA TTCATTATTr TCTTTTCCAT ATTTATACTA TTTTTGTATA ACAACCAtAT CTAALACATCC AGATTGTTCC CC'rATTCATA TGCAGTCCGA TAACTTCATG AAGTATTTTT CAAAATTTCA TTATTTTGAA GAATCTGTAG ATTTTr'rAAA AATACGTTCA ATGTCAGTTG ATATTI-rTAT TACACTrAATA AACAGGATGT TGTAAACAAA TTAACTCATA TCC'TTTTTTA 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3140 ATTTATGATT AAATCTTCTT TAATCAA'rrC TAACTTCTCA TATTTATCAA GCACAGATAC TATAATTTCA TTmC'rAAAT ATAACCTTAA
TACTCGTTCT
CCAAATGGTC
CATTTAGGTA
INFORMATION FOR SEQ ID NO: 159: SEQUENCE CHARACTERISTICS: LENGTH: 9048 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159: CCGGATGATT TCCTGGTCAG ATAGGGGGAA AGTGACTTCC TCAGCAATCG CGCGTAGAGT AGGATTCCCT TCACGGATAA TTCTATTGCA GACATTTTCT CCCAAGTGGT ATACTTGGAA ACATGATATC ACAAAATGAC TAGATAGGAA GTTCAATTCA AAGGCTTGCA TAAGAAAGTA GTGGTTTAGC TTTTCGTTI-r GAGCAATAAG GGATTTGTGG CTTGTTGTTT GAAAGCTTCC CAAGAGGGGA CTAAGCAAAA GCCCACGGGG ATTTGCTCTA TATCGTTCAT ATCAATTAAG CTCCTTATAT TATGTTTAGT TAAGCCACTG TGGATTAGTT AAGAATTGAA AGCATTATGG ATTGTGAAAG AAATACTTAT GGGAGAACGA AGATACAAAG ATGAAGGGCT TGGTAAACTT
TGAGCAGCTT
GCAGTTAGCT
C-ATTTTCTTT
CATTTAGGAT
CTGTGATATA
TTOTAATACG
ACTGCCAAAG
CATTACCTCT
TTATAGAAAA
ATAAAAAGAA
AAGACAAAAT CGAAATCAGG TTTAGGAGTT ATCGCAAGTG CGATACTCTT GCTAGCAGTT GGTTTATCAA TGGGCTTGGT AAGGAATCCC ?1'GACTAGTC AAAAACGAGA TACTA N'TCT 540 600 GTCTCAGGAG TAGGAAGAGG AAAAAACTGC CCACGATGGA CTTTTCTTTT CAGCTAAAAA CAGAATTATG 660 AGAAGACGGT 720 ACCTATGAC'r TTCATGAAAA TTTTGAGTAT GTGACTCCTT GGCTCAAGCA AGGGGACTAA GCAGCAGATT TAGCTA'PTGG TGATTTTGAA GGAACCATTA ATAAGGATCA TTATTTAGCG
GGTTATCTTC TCTTTAATGC CATGTGCTGG ATTTAGCTCA ACGGCCGATA TTATrGAGAA CGTGATCAGG CTCCGCTGGT TATTCCTATG GT'rTCAATGG CTTTCAGATT TAAACGAAGA jGATATCACCA TTATCATGCT CAAAAAGCTC TTTATCACAA CCTCACGTTG TTGAACCATC TCCTGTTGAA GTTATGGATG CTATTAAGGA GGCAGGTTAT TAATCATAT'r TTGGATTCGC AAATTGAGGG AGTTATTTCA AGCTGGAATC ACTCCAATCG CATTAAGGAA GTGAATGGTA AATTGAGCAG TATATTTCTC TAAGATGAAG GTTGAAATTG TCAGATGGGT GTTGAGTATC GATGATCGAT TTGGGAGCGG TGAAACGGTT GAAAAAGATG
GAGTTTATAC
TCAAGGTrGC
AGGAAGACTA
AACGGGCAGA
GATTGGAACC
GCACGAACCA
ATTGTTAGCC
TAATCGTI'AT
GAAGGAAGCA
AACTGAAGAA
1020 1080 1140 1200 1260 1320 1380 1440 1500 ATATTATCTV TGGAGGGCAT GAGATAAGAA ACTCATTATC TATTAAATGG GGAACTTCAT TTCCAATCAA CGAATTGAAT CTATGGGAGA TGAAGAGAAT GCTAAGTGGA CTGAACGTGG TGTTCTCATG GATGTCACCA 'PCAAGAAGAA GGATGGAAAA ACAACTATCG GAACAGCTAA AGCTCATCCT ACTTGGGTCA ATCGAACACC AAAGGGAACC TTTTCACCAG AAGGATATCC CTTGTATCAT TACCAAACTr ATATTTTGGA AGATTTrATA CAGGATGGCA GTCATCGTGA CCAGTTAGAT GAAGCGACTA AGGAACGAAT TGATACAGCC TATAAAGAAA TGAATGAACA TGTGGGATTG AAGTGGTA'rT AGCTTGAATC CAGAGGAAAG TAAATGATGA TTAAGGTAAT TGCGACAGAT ATGGATGGGA CCTTGCTGGA TGCTAGAGGT CAGCTTGATC TCCCACGATT CGAAAAGATT TTAGATCAGT TIGGATCAAAG GGGCATTCGT T'rTGTCATTG CGACGGGCAA TGAAAr'rCAC CGCATGAGAC AACTACTGAG GATCGAGTGG TTCTGGTTGT CAGGCTCAGA CATGGGATGA GCGTGTCAGG ACCAGTTTGT A'NrT'rACAG ATCI-rGAAAG ATGCAATTTG TGGATGAATT GTTGTTGGTG AGGAACGTTT CGTGTCCGAG CTGTATCCAG AAAGCATGGG GCTTGGAGGA TGCTAATGGC GCTCGTAT'rT TTGAAAACAA CGCCATTGTC AACAAGGCTT TGACTCATTT TGTA-ACGGGG ATGAAGGGTG ATTTTGTCAA TTTTATGACr CCAGAAATGA TTGAAAAATT AACATCTGAC CTCTT'rGGTG GTG'rGCTCAA GAGTTCGGTT 'rTGGAAGAAA TCAATGCTCT TGGCTATGGT TGCATTGATA TCCTCCAAGC
TCCCTTGGTG
TGAATTGATT'
CAAGGGTCGA
GGAAGGTACG
CTACCAACGG
GATGAGCATG
CTTTGATGGC
TGGGATTCAT
A'rTACTCAAG CGCTGGGAC'r TGAAATCCCA AGAAATCATG 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 GCTTrGGTG ATAGTGAAAA TGATGTTGAA ATGCTTGAAA 'rGGCTGGAAT TGCCTATGCG ATGGAAAATG CI'GATGAGAA AGCCAAAGC'r GTGGCGACTG CTCTAGCACC AGCCAACAGC CAAGGAGGAG TTTATCAAGT CTTGGAAAAC TGGTTAGAAA AAGGAGAATG AAGTGGCAGT ACAGTTATTA GAAAAT-rGGC TCCTAAAGGA ACAAGAAAAA ATTCAAACTA AGTATCGTCA CCTAAATCAC A'IrrCTGTTG TAGAACCAAA CATTCTTTTT ATTGGGGATT CCATTGTCGA GTAT'rATCCT CTACAGGAGC TATT'rGGGAC TTCAAAGACG ATTGTCAATC GAGGAATTCG TGGCTATCAG ACAGGACTGT TACTAGAGAA CCTTGATGCT CATCTATATG GTGGAGCAGT AGATAAAWI2I' TTTCTTCTGA TTGGGACAAA TGATATCGGA AAGGATQTTC CTGTGAATGA GGCTCTCAAT AATCTCGAAG CTATCATTCA ATCCGTTGCT CGCGATTATC CATTGACAGA GATTAAATTG CTT'rCCATT'r TGCCTGTCAA TGAGAGAGAG GAGTACCAGC AGGCAGTCTA TATCCGCTCG AATGAAAAAA T'rCAGAACTG GAATCAAGCC TATCAAGAGC TTGCATCTGC CTATATGCAG GTGGAATTTG TGCCAGTATT TGATTGTTTG ACAGACCAAG CAGGCCAACT CAAAAAAGAA TATACAACTG ATGGACTGCA CCTCAGTAT'r GCTGCTTATC AGGC'N'TGTC AAAATCCTTG AAAGACTATC TTTACTAAAT AGCTAAATAA TGTTAAAT -r GCATAATA TCTTGTAAAA AATTCTAAAA TCCTTTAAAA AATCAGATTG TACGGATTAT TCCTACTT'rA TrN'rATATTG AAACCCTI'GG AATGAAGGCC GGTGACCAAA CGGGTCTTGA AAAGCTGGTr AAGGTAGAGG GAAGAAAAAA ACTAGCTAGA ATTGAAGGAA TC1'TATC'rAA AACAGATTCG TACGCTTTG AAATTTTCTC ACCAGAAGAT ATACCAAGTC TAGTAGAAGT AGGAGAAAAG 1034 TAAAAAGTGA CGGACGAATT TATGAATGTA AAAGCTAATA ATAGAAAATT AAATGAAACA TTGrrAGAAG AATCGGCCTT TCTG'rCACTA TTAGAAGAAG CTCCCAGTAT GCG'rACTCGT TTGATTGTCA -AeGGAAAA TCCCTTAGAA ATTCATCGAT TATATAAAGG TCAAAATGGC GA?1TrGA'rT TGATTCATGC GGAAGATGAC CCTGAATTTC AAACAGATTT GGCATCAATT TCTrTAAGTA AATrIrGAGAT TTCTATGGAA T1'ACATCTCC CAACTGATAT TTGGAATCAT CTGAAATTGG ACTGTGGACA ATACGGTTAC GACATAGCAA GTCT'rCGCCA AAATTCT'rCC TTGGTAAAGA GGACCAAGAT TATTAAAAAA CTTTTGCGTA TTTTAAGGAC AGCATGGCTT GCAGACTGAG TTGGGGTTGG CACAGTTTGT TGGTAATAGA TTTTTTACCT GGCATCCCTT GATTTTAT'TC CAGCTCAGGG CTGGGACTTA TCTATGCTCA AGTTCTTGGT
GAAGTTTGAG
TAGAAATAAT
ATAGAAGAAC
AATCAATGGA
GCAGGACTAG
ACtTTCtTGT
GAGAGTGATT
TCTAC'rGAAT ATTT'rATTCC
CGAAAGTTTC
GCAGGATTTG
CAATGAATTA
TAAGTCTGAA
GTTGAATTGT GGTTTrGAAGA AGTATGAAGT AAATCGAGGC AGGGATTTAT CCCGGAGCCT CAGACTTCTA TTTAGGCCAG AGTGACCCAG T'rTATGACCT AGCTAGTGTC AGCAAGGTTG GGGAAATAGG TCAATTAGAT ATTGATAGAC ATCCAGACAT CACTAT'rCGC CAGCTCTTGA CTAATCGTGA TCTTTTAACA GCCCCTGAA'r GAAGTCAGCC AGCCTTTCTT TATTCGGATG AAAGAATTTT TAATCAAGAT T'rGGATGTGA GAATGACGGA AACTAAGTTT GGGCCAGTTG 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 CTCATGCAAC AGACCTTGAT CCTTTTA'rTC TAAAGGAAGC GATGTTTCAT CTCAACAGAC TCCATT'TTTT GCTGTTGGGC TT'TATTTTGG TTTT'AAAGGA TCAAGTCTGG AAACCTTGGG AGCTTGCTG'r TCCAACAGTT AGAGGTGTAG AGGCAGGCAT AGTGCATGAT CCCAAGGCTrc GTC'rCCTGGG TAGACA'rGCT GGGAGTGCTG GTr-rA'I-TTC GACTATAAAG GATTTACAAA TCTrTTTTAGA ACACTATTTA GCAGATGATT T'rGCAACAGA CTTAAATCAA AATTTTTCTC CTrTTGGATGA CAAGGAACGT TCP'rTAGCAT GGAATTT~GGA AGGAGATTGG CTAGACCATA CGGGC'rATAC AGGTACCTTT- ATCATGTGGA ATCG'rCAGAA GCAAGAAGCC ACTATTTTCC TATCGAATCG TACCTATGAA AAGGACGAGA GAGCTCAATG GATATTAGAC CCCAATCAAG TGATGAACTT GATTCGCAAA GAAGAGTAAG GAGAGACATG TCAAATAGTT TAAAAGGGAC TTTIACTAACA GTTGTGGCTG GTATTGCTTG GGGGTTGTCA GGAACGAGTG GCCAATACCT 1035 AATGGCACAC GGAATTTCGG CTCTGGTC1'T GACTAACTTG AATrTCMA G CTCTTGGCTrT ATGCTACTGC AAAGGATAAA TAGAAAGAGT TTGCTGTCTrC CGCCTATCTG 'rCTGCTATTC TTGTCCTGTC GGAATTTTAA AGAGATAGTT 'rCCATCATAT G?1'GGACCAG TTATCCATGA TTATGCTCTG TATATCAT'T CATTGGTGTG GGAATGGTCA GGCCGATATC CCGACTAGTC GACTGTCTTT GCCTATACAG AAGCTTGTTG GCTTCAATTG TGAACAATT TATCCCATTG TTCTTATTTr TGCTCTG.ATT AGGAGACCAA TGCGGGAACA TTTATAGC'rc TATCAAGGAT TCGCCATCGG AGGAACCTTC CACCTGCTGG TCTGTTCTGG TACCCATAGC CTTGATTAAA TAGCAGGTTT GGTCGCCCTT T'rGATT'rTCT CCTTGCGTTT CTTTCCTTAA AGGAGCCAGT AGCCAATATC GGCGA'TTTC ATTr'rCTGG TATGGCAATG CGTCTITTTAA TCGCTGGTGG ATACTGGTCT TTTTAAAGGA GGTCTTTTTC TCAACCAATT GCGACGGTGC TrCAGTATGT AGGGTGGCAC CGACACTGGG CTGATCGCAA CACATGGGCA GGTC'rCTTTT CTGCCTTGAC AAGTGGGGGA GCAGCTTGGT CCTr'TTACAG GGGTTCTACA GCAGGCATTA TCCTTATCGG CTGATAGGAC CGGTCAAGTC 
TTTGCCTI'C2 TAATAATGA.A ATATTGTTPG CTGTAACTTT
GATTTCTTTG AAAGATTTAT TCTTAGAAAA ATAAAAAAGA CTCTTT'GTCC GTTTTTGCGT GGTAATCTAA TTATrrrTCAA GAC1'TTTTAC GTATTCAAAA GCAGTACCAT GAATAGCATG TCCTTTTTCA ACTTCCAAAT CATAGATGGT CTG'N'GAGAT TTGCCGATAC AATGACTGGT GATTrTCTTCr TTTTAAAGT CTTCATAAAC TAGGGGAACT TGGTCGGCAT CCGTTGGAAA AATTCCTAGC TTGGCAACTT AAATGAGCTG GCTAGAGGGA ACTTTACCTr TCCCTCGCAT CT'rTTCTTGT ACTCGAGTAC GCTC'rAAGAC GCCTTCTTCG ACTAATAGAG CCGCAAACTG CTCAGCTAAA TCTCTTTCAC ACTCGTCAAT ATCCr1~1TT ATCTGATCAT GATAAAATTC AAAGCGTTCG CT'rCTAGGTA GGAAACCTGG AGTGGGCAAT CTTTTCTTPA GATAGCCATG TTTTTGCAAG
GTGACAGAGA
CCTACATATT
GTCAATCCAA
GCAAGGCGAG
GTTTGGAAGA
5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 CCTTAATGAA TTr'rTCAGGA ATAGAAGCAA AGCGGACCCG C1'CCATTCGG ATAATATTIGT CTTCCTCATT GGGAATGGTT TTTTTGTAGG GGGATTTGAC AATTTCAGTA AALACTGGTTG TGGAAACAAA GGTGCCC'N CCTACACCC ATACGGCTTG GCGGAGGGTC ATGCGACTGA TGGGAAGCCT CTCACCAATA GCCCAACGGT GGT'TTAT A'rAAGCAGGT AGCATATTTT TCACTTCATT TCTATCTN'TT CCTTGTTTAG TTGGTAATTC TTGAGAAAGG AAAAAGATGA CTCTATTGTA CCCCAATAAA GCCCTTATTT GTGATAGAAT GCAACATTTC AACTGATTT
CTAGAAAAAG
ATTGAGAAAA
CAAGA'rGTAG
TCAAAC'ITCG
GATATTTCTT
AAAAAATCAT
CGTATTGGAC TATGGTAGCC TGTTTTTCA GAACTAAAAA TGTAGGAA'1r ATTCTATCAG TGACCCAGAA ATCTTCGAAC ATTGACCCAT AAACTTGGAG CGGTCAATCA ACCCTAACTC GACTGTTTTG ATGAGCCATG AGGTACATCA GCTGACTGCC TATCCAATC CACCCAGAAG TGCCCTTAAC ATTTGTAAGG GATCAAAAAA ATTCGTGAAA TGTTGACTCA TCTGTCGT'rG TATCT'rCGTA GACCACGGTC CGGTGGTAAG TTTG=~TGA ACTTGCTGGC GTTTCTGACC 1036 AGTACAACCA GCTGATTTCA GCCATAAAAT TTCAGCTGCT GTGGTCCAAA TTCTGTATAT TCGGAATTCC AATTTTGGGA GAAAAGTTGT TCCTGCAGGT ACACACCATC AGCGCTTTTT 0 0.* 0. 0.
GTGATGCGGT
CATACCCAC
TTCGTCAT'rC
CTAAAGGTGA
CCGTCGGTGA
GGGTTCTTCT
TTCTTCGTAA
ATA.TCGTCAA
CTGAACAAAA
TACTGAGA'rr
CATCGAAAAC
TGTATACGGA
CTGGTCAATG
TAAkACGTGTC
CCAAAAAGCG
AGGCGAAGCT
ACCAGACGCT
ACGTAAAATC
AGATGTGA.AA
TACAGCTCAA
GAATTGATTG
CTTGGTATGC
CGCCGTATCC GTGAGATTGG GAAG'rrCGTG AACTCAATCC GAAGATGGTT CATTTGATAT ATCTGTTATG GTATGCAGTT GATIGCTGGAA ATCGTGAATA GAATCAACAC CTGATGAACA CCTGCTGACT TTGTTCGTAC CCAGATAAAC ACATTTACGG AATGATATCC TTCGTAACTT GATAATTTCA TTGACATGCA C7'rCTTGGTC TATCAGGTGG ATTGGCGATC AATTGATCTG GATCAAGTTA TGGACATGCT GCTAAACGTT TCCTTGACA.A ATCGGTAACG AGT'N'GTCTA T'rCCTrGCTC AAGGTACTTT ACTATCAAGT CACACCACAA AACCACTCAA TACTCTTTAC CAGACCATAT CGTATGGCGC GTGAAATCAC TGAAGAGAAA AAATCGCTAA AGCTGGACTT 6840 6900 '6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 762a 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 TGTATTCGAT GACGAAGCAA GCAAGCTCAA ATATACAGAT GTTATCGAGT CTGGTACGGA CGTGGtGGTC TTCCAGAAGA TATGCAGTTT AAGGATGAAG TTCGTGCTCT TGGTACAGAG CAACCATTCC CAGGACCAGG ACTTGCTATC CGTGTCATGG CTTGAAACCG TTCGTGAATC AGACGCTATT CTTCGTGAAG GACCGCGATA TTTGGCAATA CTTCACTGTT AACACAGGCG TTCGTTCAGT CGGTGTTATG GGTGACGGTC GTACGTATGA CTACACGATT GCAATCCGTG CTATCACTTC TATCGATGGT ATGACTGCTG ATTTTGCCA-A AATTCCATGG GAAGTACTTC AAAAAATCTC AGTACGTATC GTAAATGAAG TGGATCATGT TAACCGTATC GTCTACGATA TTACAAGTAA ACCACCTGCA ACAGTTGAGT GGGAATAATC GCAAAAAAAT TAAAAGCTTT GTAAAATCAA CGGTTACAGA GGATTAAAAA CTGTAACTGG GATTAAAACG GGAACATTTG CTAAAAAGAA TAAATTGAAT AATAGT"TCCA AGTGGTTTAC ATTTGGACAA AAALATTAGAC CGTAGTTTTC AAGCTGCGGT CTTTTGATAT ATATAATGAG AATTAATGGC TCTTGTCAA CTGTAGTGGG TTGAAGTCAG CTAAGCTCGA GAAAGGACAA ATTTTGTCCT TTCTTTTXTTG ATATTCAGAG CGATAAAAAT 1037 CCGT'rrN'TG AAGTTTTCA.A AGTTCCGAAA ACCAAAGGCA GAGATTATTG GTCGCTTCCA ATT'rGGCGI' AGA.ATAGTGT TTrCTCTTTG TCCTTTAGAA AGGTTrTA.AA GACAGTCTGA TAGATTGTCC TCAATGAGTC CGAAAAA'rrT CTCCGGTTCC CAAGAGTTGA TAGAGCTGAT AGTGATGTTT C.AAGTCTrGT TA.AAATCTCT ?'rATTGGTTA AATGCATACG AAAAGTAGGG TTGCGCTTGA TAAGTTTGAT AGTTGAAGGG CGTTGACGAT AAAAGAGGAT GAACCTGCTT TTATrCTGAA AGTGAAACAG GAATAGCTCA AAAGCTTGTT CGATAAAAAT GTTTATCGCT 8640 8700 8760 8820 8880 8940 9000 9048 GAGTTTACGA CTATCCTGTT GTATGAGCTT CCAGTAGCGC TTGATAGCCT AGACTr'rCGA TCCAA'IrGAT TCATGATTTG AACACGCACA CGACTCGG 
INFORMATION FOR SEQ ID NO: 160: SEQUENCE CHARACTERISTICS: LENGTH: 10399 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
TGTATTCATG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160: GTACCTTTAT TGATGAATGG ACTGTTTAAA TCAGTAGCAC GCCAACCAGA GAGTTTCGTA GTr-PGATGTT CTTGTCTTCT CATTTA'rrAT GATGGAAGAA AGTATTAATC GTTAGCCATG ACTTTGTTGA CAATATGACC TTGAAACCCA TATTGGATTT ACAGAACCTA CCTTTGTTTA TTCCTTTTCA.
AACGATCGAT GGGACTAACT CTTAT CTTTT CTTGTCATTT AAAAAGTATA AAATCTTTTA AT TTACAAAC TTCTTATCTT CATGACGAGT T-TGTTACTTC TGGAGCTAAT TTGGCTTGGA
TTTAGGTGTT
CAAATAAATA
CAATCATCTC
TTGTGGGAGT
AAGGAAAGCA
ACATTGGTTC
GCCTTTATTG
CATGGAACGA
TATTGGTCCT
TATTTTTGTC
AAATGTACTT
GCGCTACATG
AAGGAACTTT
GAAGAAAAGG
GTTATCTTCA
TTTATTTATT
GAGTATGTCT
AAAGATTACT
TATGCTTTCT
CTTTGTAACT
GAGGATTTTA
ATCTGACTAT
GGGCAAGCCG
ATGACTTTGT
CACTCTTTTT
TGGTGATTGC CAATAACCTT GGCTTAATGA CAAAGCTTCA GGTGGAGTTC GCCAACCGCT AATTTACAGT ATGACTTAAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900
TGTTGACACA
TGAGTCCTGT
TGGCTTI'GCG
TTCTTTCCCA
CTGCATTTTC
TATAGAAAGC
TTTTGTCATA
GATTTTTGGG
CCAAGCTATT
TGTCTTTATT
GTTCGTCGTC
CCGATGA.ATA
AATATCTTTG
TATTGGTATC
TCCTGCATCC
GTGGATTrAA
TCTTGGAAGA
CAGGAGAGGT
CAGTAGCCTT
AAGCTTATGT
TTTTACTCTT TTGACATCTG TGTATTTAGG GAATAAGATT AATATTGAAG AGGAATAGAA 1038 AGGAGTAACT GATGCACGTA ACAGTAGGTG AATTAATTGG TAAI=~ATT TAATCACTG GCTCTTTTAT TCTTTT'GC1'A GTCTTGATTA AAAAATTT-GC ATG~cTCrAAT ATTACAGGCA TTr'rCGAAGA AAGAGCTGAA AAAATTGCTT CAGATATTGA CAGAC1'GAA GAAGCCCGTC AAAAAGCAGA AGTATTGGCT CAAAAACGCG AAGATGAATT GGCTGGTAGC CGTAAAGAAG CTAAGACAAT CATTGAAAAT GCAAAGGAAA CAGCTGAGCA AAGTAAGGCT AATATCTTAG CAGATGCrAA ACTAGAAGCA GGACACTTAA AAGAAAAAGC CAATCAAGAA ATTGCTCAAA ATAAAGTAGA AGCTTTACAG AGTGTTAAGG GTGAGGTCGC AGATTTGACC ATCAGCTTAG CTGGTAAAAT CATCTCACAA AACCT'rGACA GTCATGCCCA TAAAGCACTC ATTGATCAGT ATATCGATCA GCTAGGAGAA GCTTAATGGA CAGCATGCCT TTGTCCAAT 'rGGTACTTGA CTTGAC'rCAA ATCAAGCAAG 'rTGTTGAAAA GGCAGTAGAC GAGTCGGATA AGGAAAAAAC TTTATTACAA AACTTTATCC AGGTTCTGGC 'rCTGCTTGTA GATTGCTTGA ACCCACTTGA TACGTCTGCT CATCCTCTAA CTGATGAACA AAAAATGTCTr CTGAAAGTAA GGAGTGTAAA TTTTGTCATT TTTGCCAATC ACAAGCAAT TGTTAAAGAA AATTTGAAAT AGAAAGTGGT CAAGAAAACA GTAAAGGTAA 'TTCAAAAATA AAAAGGAGAA GAAGACCGTA TCTTTTCAGA AACAGGTCTG CCTTCTI'= TAAAACAAGT AATTGCrT'r TTCCAAGATT CTGTGTCGCC
CTACA-ATCAC
AAAAGAAACA
GALAGACTCGT
AGAACAAATC
AGAGCAAATC Tr'rTTTATGA AATCGATTTrG AAGTGACGAT TTGCTCCCTI' TGATTGAGAA CATGAAAGTC TCATTGGTGG 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700
AGCGCTTTAA
GGTGTTGTAA
AGTGGAGAGT
TTAAGCAACA AATTGAAALAT CCTATATCGG GGACGGTATC TGTTrGAATTT TGAAAACGGC TGATGTGACT ATTAAACAAC AACTAA.AGT GTTCTTTTGG CAATTAACGC ACAAGAAATC TTCAAACCCA ATTr'rGATGT GACTGAAACA GCGCGTGC'PC ACGGCCTTGA AAATGTCATG TCTTATGGTA TGGCTCAAAA CTTGGAGTCA ACAGACGTTG GTATTATCAT CCTAGGTGAC TTTACAGATA TCCGTGAAGG CGCCGTACAG GGAAAATCAT GGAAGTCCCT GTAGGTGAAA GTCTGATTGG GATCCGCT'rG GTCGTCCAGT TCACGGTCTT GGAGAAATCC ACACTGATAA GTAGAAGCAC CAGCTCCTGG TGTTATGCAA CGTAAGTCTG TTrTCAGAACC
CGATACAATC
TCGTGTTGTG
AACTCGTCCA
ATTGCAAACT
GTrGAAAG CTATTGACGC CCTTGTACCG GGTGACCGTC AGACAGGGAA AACAACCATT CAAGATATGA TCTGTATCTA CGTCGCGATT GTAGAAACAC 'TrCGTCAGTA CGGTGCCTTG TCACAACCAT CTCCATTGCT CTTCCTAGCT ATTGGTCGTG GTCA.ACGTGA GTTGATTATC GCGATTGATA CAATCTTGAA CCAAAAAGAT GGACAAAAAG AATCAACAGT TCGTACGCAA GACTACACAA TCGTTGTGAC AGCCTCTGCT CCT'rATGCTG GGGTTGCTAT GGCGGAAGAA 1039 TTArGTATC AAGGTAAGCA TGTTTTGA'rT GTATACGATG ATCTATCAAA ACAAGCGGTA GCTTATCGTG AACTGTCGCT GATGTrCT ATCTCCACAG GGTGT'GGAT CAATTACAGC TATATCGCAA CCAACGTGAT TTCAATGCAG GTATTCGTCC TCTGCACAAA TCAAAGCCAT TACCGTGAGT TGGAAGCCTT AAGTTGAACC G'rGGACGTCG CCTGTTGAGA AACAAGTAAC CC-AGTAGATG ATATTGTTCG CCAGAGATI' TGGAAACCAT GCTGCGATTA CAGAGTTrrCT GCAGTATCTC TAAATGATAT CTTGCTTCG'r CCTCCTCCAG CCGNTTGCTT GAGCGCTCAG CCTACCATTr ATCGAGACAC TTCTATCACT GATGGACAAA GTCGTGAAGC CTTCCCAGGG CTAAAGTTTC TGATGAACTT AAGCAGGACA TATCTCACC TCTTCCTTGG CGATGGCCTC- CTGTATCTCG TGTAGGTGGT TTCGTATCGA CCTTGCTTCA
AGCCATCGAT
GAAGAAGGTT
GCCGGTTCAT
GCTGGTACAC
p p p p. p TACTAAGTTT GGT'rCTGACT TACCGTTrGAG GTCT'rGAAAC CATTcT'NrAT cc7TTTGAcAc TTTCGAGGAA GAGTTCCATG TCGTGATACA AAAGACTTGC CAATCAATCT AGCTTCCAAT TAAAACAAAA ATCGCCTCAA TGGACGCAGC AACACAGGCT ACTAATGCCA TGCAAATGGT ATCGGCTGCT AAGCTAGGTC AACTTCCAAG TTTACGCTCA GAAAGTGCGT AAACTTTTGA GGAGCTGGTG CTTCAACTAA TCCGATGTTG ATTrAGCCGT'r ATCGTTATCA CTTCAGACCG CGGTTTGGTT GGAGGTTATA pp *P'p p
p. ep p p p GTTA'rGGAGT TGAAAGAAGA GGTGGGATGG GAGCTGATTT GGCTTGTCAG ACCAACCTAG ATGTACCAAA ATCAACTCTT CTAACCAGTC AAATGCGTGT GCGGA'rGAAG AGTACAGC'T CAGTTGTrrGC CTCAGTTTGC GCTGAATG CTrGCGGGCAT ATCAATGATT TGACAATTCA ACAGAAATCG TAGCAGGTGC TGAACTIAGG ACCTACTTGA ATACCACCCA GACGGTAAAG Crr'rAAGGCT CGCGGTATTC CTTTGATCAA GTTCGTAAGA TGATGAGCTr TATGTTTGCT GGAACAAATG CTTCCGATTG GACTrTGAA TTGGAAACCA AGAAAGTATG Ar'rTACGGTG GACAGCCATG CAAACAGCGA CTATAACCGT GCCAGACAGG TAGTGCCTTA GAATAGGCTC GCTAGGAACC GACAGTATCT AACCTCTTCA CAAACCATTA ATGGTTTCTT GGATACTGTT CCTTCTTTGA TGCTCAACAT CAGAAGAAGC AGTCTTGGAT A.AGAATAGAG GTGTCAGATG CAAAAAATAC GAGTCAAATC GTTCTGAAGA AGCTGCTCGC CAGATATCCT TCATGGTALAT CTGTGAAGAA GACAGGCTAT ATTCCTCTAT TT'rGAAAGCT GTTTTGAAAT GATCTGTATC AACCACTTTA TGAATTACGT TTATTTCAAA AACTGTTGAA ACAACCACCA TGTCAATACG TIGACTTGGA TCCAAATGAA GCCGAGAAGA AATTCTGGAG CCATTATCGA TGCCAAGACA CAGATAATGC TAAGAAAGTC CGGCGATTAC ACAAGAAAT'r TAGTCCAGCT CGTATGAAAA TATATAGAAT AGGAGAAGGA 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 GATGAGTTCA GGTAAAATTG AGGGGAAAAA CTTCCTGAGA AACAAAAATC GTCCTTGAAG CATGGAATCA ACAGATGGGT CTCTGTACCA GTAGGTAAAG TGACT'rGGAA GCTCCTTr'rA AACTTTTGAT GAGTTGTCTA CCTrCTTGCC CCTTACCT'rA TAAAACTGTC T'rAATCCAAG AGTATTTGCT GGTGTTGGGG AGAATCAGGC GTTATCGAGA AGCACGTATG CGTGTTGCCC AGGCCAAGAC GTGCTTCTCT AGTATCTGCC CTTTTGGGTC GGAAATGGGT CAATTGCA.AG CCAGGCTATC TATGTGCCAG TCACTTGGAT TCAACAACAA C1'CAGGTAT
TTAACAATGC
TAGCCTrGGA
TGACTCGTGG
AAACTTTGGG
CAGAAGACCC
CCrCTTCTGA 1040 CGGTCCCGTT GTAGACGTT'r TGrTTTGCAGC ACTTGTCGTC TACAAAAATG ACGAAAGAAA GTTAGGAGAT GGTATGG1-rC GTACTATCGC AATGGAAGTA 'rTGGACACAG GTCGTCCAAT ACGTGTCTTC AACGT'TTTGG GAGATACCAT AGAGCGTCAG CCAATTCATA AAAAAGCTCC AATCCTrTGAA ACAGGGATCA AGGTTAT'rGA 4500 4560 4620 4680 4740 4800 4860 AAGGTGGTAA AGTTGGACTT TTCGGTGGTG AATTGATTCA CAACATTGCC CAAGAGCACG AACGTACTCG TGAGGGGAAT GACCTTTACT AAACAGCCAT GGTCTTTGGT CAGATGAATG TTACTGGTTT GACAATCGCT GAATACTTCC TTATCGA'rAA TATCTTCCGT TTCACTCAGG GTATGCCATC AGCCGTTGGT TACCAACCAA AACGTATCAC ATCAACCAAG AAGGGTTCTG CGGATGACTA TACTGACCCA GCGCCAGCAA CCGGAGTTGG 4920 GTGGTiT'TC 4980 GGGAAATGAA 5040 AGCCACCAGG 5100 GTGATGTGGA 5160 CTGG2"rCAGA 5220 CACTTGCTAC 520 TAACCTCTAr 5340 CAGCCTTCGC 5400 TCTACCCAGC 5460 GAGAAGAGCA 5520 TGCAAGATAT 5580 TTGCTCGCGC 5640 TTACTGGTCA 5700 TCCTTGATGG 5760 ACTTGGAACG TAAGTTGGTA CGTTGACCCA CTTGCTTCAA GCTCACGTGC CTATGCAGTT GCTGCTGAAG TAAAACGTGT CATTGCTATC CTTGGTATGG ATGAGCTTTC CCGTCGTATC CAGTTCTTCT TGTCACAAAA
CTTGGCACCT
CCTTCAACGT
TGATGAAGAA
CTTCAACGTT
TGTACGTGGC
CAATTGGGTA
GAAATCGTTG
TACCATGAAT
AAGACCTTGG
GCGGA.ACAAT
TTTAAGGAAA
GCCAGGTTCT TATGTTCCAG TKAATACGAC CACTTGCCAG
TGCAAAAGCT
TCGTGACACC
TGGATGGTGA
GAAAAAATGG
AGATGGTCTC
GATGGGGATC
'IrGCTGAA.AC
AAGATGCCTT
GATTTTAAGA
GTCTATGATC
T'rGCCACGAC CCCTGGTGTA GGTCTATCG AAGATGTGAT GGTGATCTAT GGCTCAGTTA ACTGTCCAGA ACCATGCCAG CTATGTATCG GTTCGAACTC ATGAAAATAT GATTGCGGTT TTAGCAGTTG AAGATCACGT GAACTGGATT GCAGTAAACG TCACAATCGT CGCTGACTCT GCAGAACGTG GTGCCAAACT TCGTGCAGAA CGTGCAATTG AAGAACGTCG TGCTAAGATT GCTTTGCAAC 5820 5880 5940 6000 6060 6120 6180r 6240 ATGAAGTAAA GGTAAAACGT ATCGATGATA GTGGCGTTAT TGAAATTGCC AATGATATGA CTCGTGATAT CGATATCAGT CGTGCAGAAC AAGAAGCACA AGACAAACAT TTGATTGACC GTGCTATTAA CCGTATTAA'r CAAGTTCATT TrTTTATGGTG GGAAGTCGTT GG.AGAGTTCT CATTGTCAGA GCTTGATGG 1041 GTCGGAAATA GACTATAAGA TTTTAAGGAG CAAAACGGAT AAAAATGAAC 'ITGAAAATAC GCAGACTGCT TCGGGAACAT GCTAGACGAC CATTGTCACA ATTACGTTA AAGACAGTTG ACAACAATGA GAAATTT"TTC GTC=GTC AAATCAAAAT
CACGTGGAGT
GGATGGTATA
CT'1TAGAGAG C'rGACCATGC
TACTGCGATA
ATGAATAGCA
GTTGGAACGA TTTCTAATAA GAATCATGGC CACGGTTAGA GCGGTTCCAT TAAAGCCTTC CTCTAAGCTA CCGTCCGCAA AGCCTAGAGG TATTTACCGT GTAAGCTTCC GGTAAAGTTG
AAATGACCTG CATACGTTCA AATTCGCCAA CGCCATCGTA GAI-TAAAACT TCGATAGTAC TATTGAGTTC ACAAATGAGA TAAGCGATTT TATAGTGGTT ATGGAAAATG ATATCGCGTG AGCCTGCTCC TGGCTTGCTG TGATAGGTAT AGAGCTTAGA TAATTTTCCT TCTTGATCGA GGTCATAGGT GATGACTTGG TCAGTTCCCA AGTCGCAGGT CACTAGATAG TGGTCAGGTG 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260
TTAAATCTGT
GTTGATCCAT
GTCCCTTGTG
AGTGGGGAGC
CAATTCCCCC
CAAGGTAGGT
ATAGTGAACA
ATCACTAAGT
ATAGTTAGCT
TGGGGGGAAG CTTGATTTTC AGAAGACTAC CATCTTCCTG GCGTAAACCA AATCACGCTT ATGTGGACCT TGGCCACTGT GCGTTTATAA ACAAGGACTT TTCATCGACA GCAACATAAC CCCGTCAGTT TGATAGGCTG ATGTTGGTGC TGGTCAAAGG ATTTGAAAGC TGACCAGTTT TCCTTCTTCA ACAACATGAT TTAACACAGT CTTATCGTCT TGGCTACCA-A CAGTGTATAA TGGACTTGGC TCAGCTGCAA AAAGSTTCTAG CTGTATCAAA GTCTGCCTTG TAAkATCCCTT GAGAAGTACG ACGTGTATA.A GTTCCAAAAT AAACACTTTC 2TTTCATTACT ATACCTCTGT GTAA-AGATAA GACTATTATA TCACAAAAAC
AAGTAAATTA
AAAATGGTAT
TTTAGCCTAT
GTAACCTTAT
CCTGT'rAG AAGATATCCA ATTAGATGTA AGCACTTTAA AATGAGAGAA CAATAGAAAG GAAGTATTTA CTTGGTTTT CAAGTGGTTT 'rTAGATAACA TATTGGGACT GAATCTTTTT ATTT'rAAGTA ACT'TTTTAGC AGTTGTGATG TTGCCAGTCA AAAAGAGTTA TTTTGTTTCA TGGAGCAAAA AGAGAAACAT AGGCA.ATTAC GGTATTr'rTA AGAr-rAGTTT TCTATTTTCA rTTTTGTCTGG TTTGTTATAT TA7'rTGTTGA ATCCTATTGT TGATTGGATG GAGAAGCATA AGGTTAATCG TGTTATAGCT ATCACTATTG TCTTTGTTAT CATCGCTCTC TTTATCATTT GGGGCTTGGC AGTCGCCATT
CCAAATCTGC
ATAGATAGGA
TTAGAGCAAG
AACGTCAGGT TTTGACCTTT GCAAGAAACG TTGTTAATGG ATTGGTAGCC CAGCACCTGC TT'rTGACCAA TTTTTCTAGC CAGGCTACAG TTCCTGTTTA CTTAGXAGAT CAGATCATTT CAGACCTCAA TTTTGGCAAG TAAGGT'rTCA TCTCAGGCAG TCAACrGGGT GAGTGCCT 'TTCATTATCG TTCCTTTCAT GCTCTTTTAT TATTTGACCC AATTCATTCC GTGAATCAAC AGTTGTCCAA GTAATGT'rA
ACTGCTGGTA
CTAGTATTGG
TCATCTTCrT
TTTTAAATCT
GTTTGATTGC
GTAGAACAAA CTATTGAAGG ATCCACCCTA TTAATGTTCT
GGAGTTTTAC
GAATGGTATA
CAATAGTCAA
TTTCGCCAAA
TGAAGGGATT
TCCAGAGGTT
CTT'rACCTAT
GAAGGCAGAC
GGCCTTGACC
GTTGGAAAAT
TGAGCAAACG
ATTTGAAACG
AGCTTTTCAG
TTGGTATTCC
AGGTAGTCAG
CAGATGTTAC
GCTTTAGAAA
GGTTTCTATC
CATCTTAATC
CTTGAGGAAA
CTTTACCAGC
TACTCAGAGG
TACCAAGCGG
GGCATTTCCA
GCTACTGAGT
TTGGCCAG'1' AAGAAAA'rrc
CTATGTTCCA
CAAGATTATT
GGTrCCCTTAT
TGGTCCAGTC
CCGTTTTGTC
CTTTGTTTTG
GGTT'rATGCC
TGGTCTATAT
AGGCTTTGGA
ATGATTCAAG
CTCAGGCCAA
TAGCTGCAAT
TCCAAGCTGA
TGGAAGGTTT
ATTCTCTCTT
CTATTCAAGC
CCTATCAACG
TTTTAGAAAA
TTTATTTGA
1042t ATTAGCGGGG CTCTCAAGT GATTVMrrCC CTCTTGCGTG ATGGGAAAGG CTTGCGTAAC A.AGGAACCTG TTGGACAAGT ?1'TATCAGAT CGGCAAGTGA CACTGGCTrAT TATTGTAGCA GGTCTACGCT A'rGCGGTTAC GCTGGGGGTT CTTGGTAGCT TTCTAGCCAT GCT'rCCTGCTr ATGCTTTTGA AAGTAGTGAT TGTCTTTATC TCTCCATTGA TTTTGGGAAG TCAATTAAAC TTAACTTCAG GATCTATGTT TGGTiTCTGG TCTGCTAAGG TTGTCAT'rTC AGCCATTTTC GAATTAGAGG GTGAGGAAGTr CAAGAGTGAA GGAGCAAGAT T'rAACTAAGG CTCAGCATTA TGATCTTCTG TATGAATTGG CAACTTATC'r GGAAATTTAC CTGAAAATTG TAGAGGAT TGCTAGCGAG GATGGTCAAA TAGAAGAAGC CAGTGACTGG TATGTCTCGT CrTTTGGCTCT GACAGATGTG GCACGTGAGA AATI'ATTGGA GATATTGGGT T'rGGCAGAGT TGGATAGTGA CTATGCCCAG TTAGATAATC GCTCGATTTA AATTGGCTTT GCCTATGCTC AGTTAGGGAA AGCCCTGGAG TTAGAATACG ATGACr'rAAC TCAAGAAGAA TATCAAAAAG CCACCCTCTA CTT-TGAAGGC TATGAGTATG GGTACAGYCA AGCCCTGCGT ATCGCTAAGC AAGGATTAGA ACTGCTTCA CAATTT'rCTT ATGAA'rTGCA TACTGCAAAA GAAGACGCTG AGGATACAGA TCTGGAGCAG GAGCGTTATG AGGATATTCT TTTGACCAAG TGGATGATTG CTCGTTCTTA- TGAGTATTAT CAAGAGTTGA CAGGAGATr'r 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 CTTTAAGCAG CTTGATACCA T'rTCTCCGA GGCTTTACAT AAGGAACATC AAGTTCAAGA
GAAAAATCCC
TGATGCTAGT
AGAAATCTTG
AGAATTGCAG
TCAAGAAATG
TTGAAACTC
GGTGCAGAAA
CTTCGTTTAG
AGTGAGGAGC
GACGATTTGG
GCCTC-TTGCT
ATTATCTCCT
CCACTATTTA
CAGAAAATCT
ATACTGCTTA
GAAGGACA.AT CCAGAATTTC TGGAACACTA TATCTATCTC TTGCGTGAAT TGGGACATTT TGAAGAAGCA AAAGTCCATG CTCACACTTA GCAAGAACTG TTTGAGAGAT TGTAAGAATG CTAGATGTAA CTACAAAAC CCCTGACCTC AGI-rTACTr TCTGCTGTTC C.AGTATCGTT 1043 CTTAAAACTG GTCCAGATG TTTAACCCAA ATCAT'rCATA
ATGAGCCACT
TTTCCTCGCT
TCATCTCGAC
TTCrTCCTCC
AGA'IITCCTC
TGTTCTTTAA
ATGTGCAAAT
CCTCTCTCAA
TCATGAGGTC
AAAAGGGCAG
TGCATCATTA ACTCCTCCCT TGGTGCGTCA ACGACGCTTT TCTrCTAGGT TCCTAGAATA AAGTGCTGAA GCTGCTrGCG TCCTGTTCGA GTTTGATT1GT TAAAGTTT'GG CACGATTT'r?
GGTTCATAAG
AACAATTCGG
GAACAGGAAG A'rTCAGGTTG ACTTrTCTAA AATAGGCATA GAGACTAGAC AAT'rTGAGGA 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10399 ACACATTTTC CCACCACGTG AAGTCACCTC CAGCTAGATG AAGAAAAAGA TGGCGGAAGC TTTGAGAAAA AGATAGAGAT ATTAAGGTTG AACTATCCGT
TGTAGGCGAT ACAGCTCATC ATCATACGAA TTCGTTTT'rG TTTATCGCCA AAAAATCGG INFORMATION FOR SEQ ID NO: 161: SEQUENCE CHARACTERISTICS: LENGTH: 9409 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161: GATAAGATTA AGTTAGAAAA AGCTATGGGC AGGAAGAAAT TCAACCAATA TCAAGTATGC CAAAAGCATG AGCAATTGAT GAAAGAACTA GGACATATCT ACCAGATTCA GGTTTTTAAT CTATCGTGTG ATTTGATGG AGACCAATAT TAGTTCGGTT TGCTGTCTTG ATTAATACCA GTCAGTTGGA ACAGGCTAGT TGTGGTCGTG ATGGCTAGTT TCTGGATTTT GTCTTTACTT GGTCAGTGTT AGGCCCCTGC TTGAGAGTAT GCAGAAGCAA CAGTCATGAG TTACGAACTC CACTCGCAGT TTTGCAAAAT TAAGCCAGAA GCTACCATTA TGGATGTGAG CGAAAGCATT
GCCAGTCTCT
CAGTCTTTTG
CGCTTAGAGA
GCATCGAGTT
GCTCGGAGAG
ATCTAGCTAG
TGGAAAATGC
CCCT'rTTTCG TGGAAGAAGT CCGAAATATG CGTTTTTAA CGACAAGCTT GCTGAACTTA ATGATGGGAT TAAGCCGGAG CTTGCAGAAG TTCCAACTAG ClTTTTAAT ACAACTTTCA CAAACTACGA GATGATTGCT TCGGAAAATA ATCGTGTCTT CCGTw'GAA AATCGTATCC ATCGAACAAT TGTCACAGAT CAGCTTCTTC TGAAACAACT GATGACCATT CTTTTCGATA ATGCCGTCAA GTATACTGAG GAGGATGGTG AAATTGATTT TCTTATCTCG GCGACCGATC GCAATCTTTA GATAAAAAGA AAATTTTTGA GGTGGr'1-rG GTTTAGGATT GTTACTGTC-A AAGATAATAA ACACCAPCTA AAAAGAAAAA TCTTCTACGT TT'rCGTTTGA GATTGCTGCG GCAAAGGCAA
TTTACTTGTT
CCGTTTTTAT
ATCCCTACC
ACCCAAGGGA
ATAAAAATAT
TAATAGACCC
GAGCAGTTGA
TCTGATAATG
CGAGTAGACA
AAGCAAATTG
ACAATCTTTG
GAATCGGTAT 'rTCGACAGAA AGGCTAGAAC CCGGCAAAAA TAGA'rGCTCT AAAAGGAACT AAGTGAAGAT TGCCATTCAG
GATAGCAATA CAGATACAAA AAACAGCGAT GGCATCAGAT TGCGCTTCAG GTGTATAAGC TTCT'rGATAG TCTACCTCAT AGGATTGTAA AGTACATACC TATTTTATCA TTTTTTCGGC AATAAAGTCC CTTrTCATTT TCAAAGCATG 'rCT'rATGGAA AAGATTGTCA T'rACAGCAAC ACTCGAAGCT GGCGTAGACC GTATCTATGT AACGACCTT'r AGTTATGACC AATTACGTGA GGAATTGATC GTTGCGCrCA ATGCTCTCAT TTTCTTAAAC TTCTTGGAAG AAATCAAGAC CTTTTACGTA GTTAACCGCG ATGGTTATTC GGTAACTAGC AGTCGTCAGA TTAACrTCTG TTTGGCGCGT GAAATTCCAT CAGCTGAACT TGCTGAAGTT TTGGTTTACG GTGCTrAGCGT AAACTACTAT AACTITACAC ATATCGATGA GGCTGAGCCA AGTGATCCAG AGAGCCACI'A TATCTrTGCC AACAATGACC TTGATTTGAT CGCTCCAATT GGGGCGATAT TTTGGATTTA TrGAACTr'1r AAAACAAGTA AGCTGAA'rCC 'rAATTTTAAT GCTAAAAAGA 'rAAAACTAAA ArTAATAAAA AATAGGATTT CCTrGAGATT TTGGTAATGA GGAAGCTGCT GGTTTAATTC TTTTCTTACG GGCATGATTC TCTCCTTAAC AGAGAATTAT- TACAGAALAGG TTACAAAAAG GCTGATTTTG GAGAAATGTG GTATAATTTT TGCTGAAAGT ATTGAACAAG TTGAACAACT CGGTGAGA.AA GATTTTGGTC AATCGCTA.AG TTGGTTCATG GCACCAAGAT ATCATGGACC AGACTATATT ACGATTGGGG ATrTAAGACC ATCTACGATG GGGACAAAAG GCTGGCGCAT TrTCAA.AATG CCAGAGATT CATCCATCAT TCTAAACGTC
TTCGTCTGCC
ATGCTGGTAA
GTATCAAGCC
ATGCAGGCGT
CTTCAACCAT
CTGAGGCTGT
TGGAAATTCC
CACTCTTGCA
780 840 900 960 1020 108.0 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520
TGAAAAGACG
TT'CCATTTTT
GATCAAAT'rA
CTTTACTCGC
AAAACTCTTT
CTTCTTGCTG
ATTTTATGAC
GATGCAPLACA
TGGAAACTAG AAGGGCTCTA CACTCCTGGT ATCCAAGCGC GTAGCTTGAT TCAAGAGGGC GATGAAGAAG TTCGTAAACT TCACCCTAAA TACGATCCTG ACATGGTTAG ATAAAATACA TT'rCTTCTCT CAA'rTTTTCG TATTTCTTCA CATAAACGTG ACCTCTTCTT GAAGATAATC ATGGGACCCA ACAGAATTGG TGGAGCATGG CAGAACTTTG TTGAGATTGC AACTTTAGTC ATGCTCAAGC AACCGTTTCC TTGATACAGG TGATTCGTPG AGAGAAGGAA CTATTTTACA AAAATCAGCA GGC1'AGAATG CTCTATTCGA TGGGA'TrTTT AAGAAAAGTA GTGTTCTTGA GTTTGAAAAT TATCCTATGT TTGCAGGTGC CAAATGCCC CGATTGGTAA TCGCTATCTT GTGGTGGATT AAATTATCCA AGTGGGAATIT G'TCGTGATTG CGGATGTCAA TCCACATGAA CCCTTGGATG ACCAACGTCT GGCGCAAGCA CCTGATTTTT TGGAGGATGG GATTTTTGTA GCCCATAATG ATITTATTTTT TGAAGGCTAT GAGCTAAGAA AGGTCTTTTT CCCTGAACTG GAAAAATATA TTCCTCTTAA ACACGCACAC ACAGCCCTTT TTTTTTTACG GAAAAAGATG ACCCAGCTTC TGGCTGACGC TCTCCTATAT GAGTCCTACC CTATCCTGAG TTCTCCAGAC TTGGTCCAAG CTTCTCTGGA GCCACGAAAA CTATCTCAAG T-rGAAGTGAG GGAGGAACAA GAAAGTTTTG AACCTGTCTC TCTGATTCAA GCGCCGACAG CCGCT'rTATC TCAATCCAAA GAGCGACAAA AAAATCAAAT CATGGAAGAA GAAGGTAAAC 1045 TTTT=GGT ATAATT1"?P ATAATGAAAA TAGAGGCAAC TAGCACAGGT AGTAAGGCTA AGGACGGAGA AATCGTCGAT CACTATACCA CTCATATCAA AGAACTGACA GGATTGACAG CGCAAGTTGC CAGAAAAATA TTTGACTTGG TTCAGTT-TGA 'rGCTAATC'rC TTGGCGGAAA ACCCTCGTGT TGATACGGTC GAATTGGCCC GCTTGCCGAT TT'rGTGTCGA GAA'rALGGAA CAGATGCCCA AGC'TACAGCA GAATTACTTC CTAAAGGTCI' CTTGGAACGC TGGTTATTGA GGAAACTTAT TTCAAGGTCT ATAT'rTTAAG
TTGCTGGAAA
CGCAACCAAT
AAAACGGAAG
AC~TTTCTAA AAATATTTCT CTGTTGAACC CTAAAGAGGT TGGCTTGCTA TTGAAAGATG GGATTGGGAA AACCTATGGC TATCTCTTAC TTGTTCTTAC TGTTCCGACA AAGATTCTTC GCCTCALAGGA AGTGTTCCAT ACAGATATTC 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 ATAGCTTAAA GGGACCACAA AATTATCTGA AGT'rGGATGC AAAATGATGA AAATCGCTTA TTTAGACGCT TTAAAATGCA AGACAGAGAC AGGAGATTTrG GATGAAATCG GGCAACTCTA CAGACCTTCG TCATGATGGG AATTTATCAT CCCAGAGCTT GGAAACGTAG TCAAGAAAGG GCAGAGACTT GCAAGCTTTT TCGTAACCAG ACTTGAAGAT AATCCTGAAT T'rGTCAGTGA
AGTCTTGGTC
CCGTTACCAA
ATTTGTGACG
AGTGACTAAT
CCGTTTA(;TG
CTTTTATCAT TCCTTGCAGG AAGTCCAAAA GATTTTGTTA GCTCTAGAAA ATCTGCTTCA AGAGACCTAC CTA?'rATCGA TTTAATTGAT AAGGCTTTAG TAGGAGAAGA AAACAGCTT TACTAGAAAG TA'rTCGCTTT GAGTGTCTCT ACTTGATAGA ACAATTTCAG CTAGGAAAAA TATCTTAGAT TCTCTGCACA ATCTCCATCA GTATTTTTCA TAGAAGACTT TGATGAGCTG GTTCGCTATT TTACAGCTGA AGGTGATTAC TA.ACTGAAAC GAGTCAAAAG AAAAT'rCAGA TTTCTTCTAC AAA.ATCAGGC
TGGCTTACTG
CATTTTCTAG
GAAGA'PTTTT
CATGCCTATC
ATTATTGATG
GATATACAAT
CAACAACGGA
TCI'GGCAAAT
GAATTGGAAG
TGGCTTGAAG
CGTACTCTTC
1046 TGTCCTCrTT ACTTCCT~GAG AGTTGCCAAG TCTTGGGAGT ATCGGCTACT C?1'GAGATTA 4320 GTCAGACGGT TTCT=IGGCA GACC?=~AG GCTA'rCCTGA AGCTAAAT-r GI'CAAGATTG 4380 AATCTCGGGG AAPAACAGGAA CAAGAAGTGG TCATGGTCAA AGArTrCCCT CTGGTAACAG 4440 AAACCTCCT'r AGAAGTCTAT GCCAGAGAGG TAGCTGCTTT ACTAGTGGAA A7TCAAGCT'r 4500 TCCAGCAACC GA7TTTTGGTT CTCTTTACCG CTAAAGACAT G=rCTAGCA GTATCGGATP 4560 TACTTACAGT TAGCCACTTG GCCCAGTATA AAAATGGGGA TGTTCATCAG CTAAAGAAAC 4620 GCTTTGAAAA AGGTGAACAA CAAATCTTGC TrTGGTGCAGC AAGTTTCTGG GAGGGAGTTG 4680 AVMTTCAAG CCATCCTTCT GTGATTCAAG TTG'rACCGAG GCTTCCcTTTC CAAAATCCTC 4740 AAGAACCCTT GACGAAAAAG ATTAATCAAG AACTGAATCA AGAAGGGAAA AATGCCTTTT 4800 ATCATTATCA ATTGCCAATG GCCATTATTC GTTTAAAACA GGCTTTGGGA AGAAGTATGA 4860 GACGTGAATA CCAACG;TTCC TTAACTCTTA TTTTGGATAG GAGAATCGTC G.GAAAACGAT 4920 *.*ACGGCAALACA AATAGTAGCA TCTCTAGCAG AAGAAGCGAC TGTTAAAACC ATCTCTCGAT 4980 *CCGAAGTTGA CGAGCCTATT GATAGATTTT TrAATGAGCT TTGATAAATA GTATTcGrATG 5040 ***AAAGTATAAG GTTAG.TATAT ATGAAACCTT CTCTCGACTC AAGAGTCGAT TACAGT'rTGC 5100 *..TCTTGCCACT ATTTr'rTCTA CTGGTCATCG GTGTGGTGGC TATCTATATA GCCGT'rAGTC 5160 ATGATTATCC CAATAATATT CTGCCCATTT' TAGGGCAGCA GGTCGCCTGG ATTGCCTTGG 5220 GGC7TTGTCAT TGGTTTTGTG GTCATGCTCT T'rAATACAGA ATTTC'N'TGG AAGGTGACCC 5280 *CCTTTCTATA TATTTAGGC TTGGGACTTA TGATCTTGCC GATTGTATTT TATAATCCAA 5340 GCTTAGTTGC ATCAACGGGT GCCAAAAACT GGGTATCAAT AAATGGAATT ACCCTATTCC 5400 *AACCGTCAGA ATTTATGAAG ATATCCTATA TCCTCATGTT GCCTCGTGTC ATTGTCCA-AT 5460 TTACAAACAA ACATA-AGGAA TGGACACGCA CGGTTCCGCT GGACTTTTTG TTAATTTTCT 5520 GGATGATTCT CTTTACCATT CCAGTCCTAC T'rCTTTTAGC ACTTCAAAGT GACTTGGGGA 5580 CGGCTTTGGT T=rGTAGCC ATTTTCTCAG GAA'rCGTTTT ATTATCAGGG GTTTC'rTGGA 5640 AAAT'rATTAT CCCAGTATTT GTGACTGCTG TAACAGGAGT TGCTGCTTTC TTAGCTATCT !700 TATTAGCAA GGACGGACGA CrTCTTC ACCAGATTGG AATGCCGACC TACCAAATTA 5760 ATCGGATTTT GGCTTGGCTC AATCCCTTTG AGTTTGCCCA AACAACGACT TACCAGCAGG 5820 CTCAAGGCA GATTGCCATT GGGAGTGGTG GCTTATTTGG TCAGGGATT'r AATGCTTCGA 5880 ATCTGCTTAT CCCAGTTCGA CAGTCAGATA TCAT'rTTTAC 
GGTTATTGCA GAAGATTTTG 5940 GCTTTATTGG CTCTGTCCTG GTTATTGCCC TCTATCTCAT GTTGATTTAC CGTATGTTGA 6000 AGATTACTCT TAAATCAAAT AACCAGTTCT ACACTTATAT I-rCCACAGGT TTGAT'rATGA 6060 1047
TGTTGCTCTT
GGATTCCCTT
TTGGTTTGCT
TCCCATTCA.A
AAAC'flAGCAG
GTCTTGCGTC
CCACATCTTT GAGAATATCG GTGCTGTGAC TGGACTACTr CC7ITGACCG GCCTTTCATT TCGCAAGGGG GATCAGCTAT TATCAGTAAT CTGATTGGTG TTTATCGATG AGTTACCAGA CTAATCTAGC TGAAGAAAAG AGCGGAAAAG ACGGAAAAAG GTTGTATTAA AACAAATTAA ATAAGGAGAA TTATA'ITAGC TCAGGGCTTT GAAGAAATTG AAGCCrrGAC GAGCCAATAT CACATGTGAT ATGGr'rGGTT 'N'GAAGAGCA
AATCATGGTA
AGTTGTAGAT
AGTAACGGGT
AGACTATGAT
TCAGACCTTG
TCGCATGCAA TCCAAGTAAG AGCAGATCAT GTC=rGATG GAGATTTATC ATGAT'rGTTC TT'CCTGGAGG TATGCCTGGT TCTGCACATT TACGTGATAA AT'TCAAGAAT TGCAAAGCT'r CGAGCAAGA.A GGGAAGAAAC TAGCAGCCAT TTGTGCGGCA CCAATTGCCC TCAATCAAGC AGAGATATTG AAAAATAAGC GATACACTTG TTATGACGGC GTTCAAGAGC AAATCCTTGA TGGTCACTAC GTCAAGGAAA CAGTAGTGGT AGATGGTCAG T'TGACAACCA GTCGGGGTCC TTCAACAGCC CTTGCCTTTG CCTACGAGTT GGTGGAGCAA CTAGGAGGGG ACOCAGAGAG TTTACGAACA GGAATGCTCT AATCAGTAAA ACGGGAGTTA. rTCTCTCGTT TTTTATGTGG CTTTTTTCAT AAAAAAATGC TATAATGAAG GGTATGAAAT TTAGGTGGAA CTTTACTGGA TAATTATGAA ACTTCAACAG GCACTGTATG G'TATCACACA AGACCATGAC AGTGTCTATC CCT'rTTGCGA T'rGAGACATT CGCTCCCAAT TTAGAGAATT ATCGAGATGT CTTTGGTAAA AAAACTCAGG GAAATCATCG ATCACGATrA CATCTGGGAT CTGCA'N'TGT TGAAACATTG AAGCTTTAAA GGTTTCTACT TTTTAGAAAA GTACAAGGAA 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800
AATGAAGCCA
GACATTTCAA
GAGAGCTTGA ACACCCGATT TTATTTGAAG GAGTTTCTGA CCTATTGGAA ATCAAGGTGG CCGTCATTTT TTGGTCTCTC ATCGAAATGA TCAGGTTTTG GAAATTTTAG AAAAAACCTC TATAGCAGCT TATTTTACAG GGCTTTAAGA GAAAGCCAAA TCCCGAATCC ATGCTTTATT AGCTCTGGTC TTGTCATTGG TGATCGGCCG ATTGATATCG CTTGATACCC ACTTGTTTAC CAGTATCGTG AA'rTTAAGAC AA.AGGAATAA GATGACAGAA GAAATCAAAA ATCTGCAGGC
AAGTGGTGAC
TAAGAGAAAA
AAGCAGGTCA
AAGTATTAGA
ACAGGATTAT
TCCAGGGATG
TGT'rGATAAC
TGAGCCAGAT
TTCTAGCTCA
GTATCAGATT
AGCTGCAGGA
CATATAAGAA
GATGCCAGTC
TACATTGGAT
TCAATTGACG
GATTCGATTA
AAATTCAAGT
CA.ACCTCAAA
AGGCCTTGGC
TTTAGAGGGC TTAGAGGCTG TTCGTATGCG AGAAGGTCTT CACCATCTAG TCTGGGAAAT AGGATTTGCC AGCCATATTC AAGTTTTTAT CTGTTGTGGA TGATGGGCGT GGTATCCCAG TCGATATTCA GGAAAAAACA GGCCGTCCTG 1048 CTGTTGAGAC CGTCTTTACA GTCCT'rCACG CTGGAGGAAA GTTCGGCGGT GGTGGATACA AGGTTTCAGG TGGTCTTCAC GGGG'rGGGGT CGTCAGTAGT TAATGCCCTT TCCACTCAAT TAGACGTrCA TGTTCACAAA AATGGTAAGA TTCATTACCA AGAATACCGT CGTGGTCATG TTGTCGCAGA TCTTGAAATA GTTGGAGATA CACCGGACCC AAAAATC 'TC ACTGAAACAA GGATTCAAGA GTrGGCCTTT CTAAATCGCG AAGGTTTGGA ACAAACCAAG CATTATCATT ATATCAACGA GAACAAGGAT GTAATCTr'rG ATGATATCAC AGT'TGAGGTA GCCATGCAGT GTTCGCCAA TAATAT'rCAT ACCCATGAAG CCTTGACACG TGTrATCAAC GATTATGCTC ATAATTTAAC AGGGGAAGAT GTTCGCGAAG CAAATCCACA GTTTGAAGGA CAAACCAAGA TTACCA-ATCG CCTCTTCAGT GAAGCrTCT CCAAACGTAT CGTAGAAAAA GGAATTTTGG CGCGTGAAGT CACACGTAAA AAATCTGGT-r CAGACrGT'rC TTCTAATAAC CCTGCTGAAA CTGGTGGAT.C AGCCAAATCT GGTCGTAACC GTAAGATTTT GAACGTTGAA AAAGCAAGTA GTAGTCTTTT CACAGCCATG GGAACAGGAT GTTACCAAAA ACTCG'TrTG ATGACCGATG TTCTTAAC C'rTGATT'rAT CGTrATATGA TTGCCCAACC ACCAATCTAT GGTGTCAAGG CGGGTGCAGA TCAAGAAATC AAACTCCAAG CCAAACCGAC TATTCAGCGT TATAAGGGGC AAACAACCAT GGATCCCGAA CATCGCTrGA AGCAGATAAA ATC'rTrGATA TGTTGATGGG ACACA6AC'rGG TTACCATGAA GTGGAACACA TGAACAAGGT GTAAAAATAA GTTACTGAAA
GCTTAACTGC
CCAAATTGGG
CCGATTTCCT
CTGCCAAGGC
TGGAAATTTC
CAGAACTCTT
GTGAGTTTCA
TGGATAAGAT
TTGGCGCAGA
CCGATGTCGA
AACCAATCCT
TTGGAAGCGA
AAGCTTrAGC
TAGGTGAAAT
CGGATAAAAC AGGAACAACT GTTCACTrCA CAATCITGA TTTGATAAA TTAAATAAAC GTCTTCAAAT 'rTCAAI-rACA GA'rAAGCGCC ATGAAGGTGG GATTGCTAGT TACGTTGAAT ATACACCAAT CTATACAGAC GGTGAGATGG
AGTTATCTCA
AAATAGCGA6A
CATGGAAAAT
TCGTGTGGCT
CAACCTTCCA
CATCGTCGAA
GGCrATCCTT
TCTAGCCAAC
A'rrTGATGTT
TGGAGCCCAC
AGAAGCTGGT
GATTAAAGAA
AATGTCATGA
T'rCCGTACAG
GACAATGAAG
GTAAACACC
GTGGTCAAGA
CCACAGAT1'G
GCCAAGCGTG
GGGAAACTAG
GGAGACTCAG
CCAATTCGCG
GAAGAAATTC
TCGAAAGCCC
A'rTCGTAcCCC
TATGTTTATA
TATATCCAGC
7860 7920 '7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9409 CCGTTATAGT GAAGGTCGTA GGACGATCAT CAGCTGTGGG TGGCTAGAGT TTCTGTAGAT GATGTGCAGA GATCGAGTTG 'rCCTCGTCG INFORMATION FOR SEQ ID NO: 162: SEQUENCE CHARACTERISTICS:.
LENGTH: 6415 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162:
CCTGGGAAAG TC -rGAAAAT TATGATAGAA TGGTGAAGG GTGACTCAAA ATGTTGAALAG TCTTCTCGTA TCCATTGTAA AAATATCTGC CTGGTCTAAT TGAAGACTrA AAAAATCAAA GAAATTCTAT TTATAAATGC TATGTCCACA GATGGGACCA ATAAAGGAAG ATACAGAGTT TAACTCAATT AGATTGTATA GCTAGTGGTT TTAACCTGGG AGTTAAACAT TCTG'rAGGGG GCTCATTCAA AAG?1'ACTGA GAC'TTrGTA ATGAACAATG GAATTTGTCT GTGGGGGGCC TAGACCGACG ATTGTCGAAG ACCTTGCATC TTGTTGAGGA AAATATGTTT GGCAGTAGCA TCTGAGGATA GATATGTTC TTCTATTTr'r CATGGAATGT AAGGTTGGTT TAGTAAATGA GCAACTTGGC CGAACTGA.AG ATTCGAGAAT ATGGTTATAA AATCCGCTAT AGCCCAAGTA CGACCAACAT TCAAGAAAAT GCTGCATCAA AAGTATTCAA ACAAGTCATG TTCAGTTTAA GTGTT'rATCA TTATTTCACT TTGAGTCTTG TGTTTAG'rC' AGCATTGTTA CCGATCACAT TTAGGTGCCT AT'rTTCTACT TTTGTCATTA CTCACT-T-GC AATGGATTTC TAATTGTGAT GCCC'N'TATT T'rATrTTCCA AAAAATrCAG GAGAGTAGTA TCAGTGCATA CAATGAAGAA CCTATCCTAA AGAGGATATT CAGCTATCAT TCAGCAATTT ACAATCCTAA GAAAALATCAA ACCTTATTTT AAAAATTGAT TGGCTATTAT TCAACAAGGT GAAAAGGAAA ATGGGCAGAG TTGCCAATTA TCGAAATAGT
ATAAACGAGA
ATAATGATAT
TTCTATCTTA
ATGGTT'rGTG
ATGTTCCTTG
TCGTATTCAT
TGACTTTATT
TTCACTTTGC
AGGAGTACAA
TATAATAACA
AACAGATTAT
GGGAATATTT
GGTTTTCCAG
TCATTATAGA
TCAGTATATT
GATTGGCTTG
TTTATTTGTT
AACTTTACTA
AAAACATAAA
TTATGGCCTT
GAGAACAATA
ATATAGTAAA
GTGATTGATC
GATATTTTCA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 S. *S S S GGGACGATTG TAGGTTTAAT ATTTATTTGG ATAAAATAAG ACTCTTTTAA GGAGGAGTAG TGGTGGAAAT TTTAA.ATAAA GTATGGTGGT TTCCATCATT CTGTTGACTA CATTATCTAT TTTOGGGG'N' GAACGCGAGC TTTTTGGTGT GACTGCTAGC TCTTCTCCAT CCGTTTCATC
TAGAGGATTT
CCAAATAAAT
AT'rTCTATGA
CAACAAAAGC
GTATCTTATA
ACGAGTI-irGG
ATTAGTCGTT
AGTGTCTTGT
ATTCTCTTTA
AAATGGAAGA
CAAAATATGC
ATAAAAAACT
AGGTTTTCTG
TTTTATTTTA 'rGGOCTGATT AATCCAGCAC CCTTCCTGTT CTATCAATTG ATGATTGGTT ACAGCAAGAT TACGGATTTC ATGAAAATCT CATATACTAT CTGTTATGCC TTCTTGCCAC TCTTOTTGAG TACCTTCTTG ATTTTATTGC CACGGATTAC TTGGCAGTTA
ACCGTCGGAC
AACATCCAAC
GTCAAAAACT
AACGCCATCA
AGCGTATCTT
AAACTGTTGT
ACCTTTTGG
GTAAGACCAT
TTAGTCGCTT
TG'ITATCA
CTrCTGATT
CAGTGAATTA
TGCTGGTATT
AATCGAGCGT
GCAGATGTrGT
TCAGGGCCTT
TCGTCAGGAA
CTTAGTCACA
1.050 ATCTACTCCA GACGCAAAAA GGTGCCGGTG ATGGTGGGGC GAACTGGTCG GTATTTTGGA CCTGT'rTTGG GCTCTTATGA GTCATCGTTG CGATTCCGTC AATAAGCI'GG GTGTCAAATG CACCAAGCAG GTACTGGCTT ATCCGTCTTG ACGAATCGCG GGAGCTGGAG GTTCAATCCG AGGTAGTGGr GATGGAGA-AC TCTrrrATG GATAGTTACC TAAGGATTCT AAGAAAAAGG CAATCTGCCT GAA'rTAGCCA GCTGGATCCG TCAGAATATC TTACAAGATG CCTAAGGTTG CCAAAAAATr GATATTACGG TCTGGGTGCA GAACTGACAG TTCTGAAATC TGTCGTCAAG CAATCCTGAA CGCATTGTCT TGCTCCGTCA TGGGGAAAAC TCAATCTACC TGAATTGATT CGTAAGTTCC AAGGGATTGA TTATGTACCT GTGAT'rGCCG C, 8* 9* ACATTCAAGA CTATGATCGT TTGTrGCAAG TCTTTGAGC-A GTACAAACCT GCTATTGTr'r ATCATGCGGC AGCCCACA.AG CATGTTCCTA TGATGGAGCG CAATCCAAAA GAACCTTCA AAAACAATAT CCGTGGAACT TACAATGTTG CTAAGGCTGT TGATGAAGCT AAAGTGTCTA AGATGGTTAT GATTTCGACA GATAAGGCAG TCAATCCACC AAATGTTATG GGAGCAACCA AGCGCGTGGC GGAGTTGATT GTCACTGGCT TTAACCAACG TAGCCAATCA ACCTACTGTG CAGTTCGTTT TGGGAATGTr CTTGGTAGCC GTGGTAGTGT CATTCCACTC T'rTGAACGTC AGATTGCTGA-AGGTGGGCCT GTAACGGTGA CAGACTTCCG TATGACCCGT TACTTTATGA CCATTCCAGA AGCTAGCCGT CTGGTTATCC ATGCTGGTGC TTATGCCAAA GATGGGGAAG TCTTTATCCT TGATATGGGC AAACCAGTCA AGATTTATGA CTTGGCCAAG AAGATGGTGC TTCTAAGTGG CCACACTGAA AGTGAAATTC CAATCGTTGA AGTTGGAATC CGCCCAGGTG 1620 1680 1-740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360
AAAAACTCTA
AGATTTTCGT
AGTTCCGCAC
CAACCCACAT
TGCGT'rTrAT
GCCTTGCCGT
TGAGytGACT
AGTTCTATCC
C'rTAAAACAG
CGAAGAACTC
TGG'TAAGGTT
TCTCAGTGGA
TGAATAAAAA
TATGTAGAGA
TTGGTATCAA CCGAACTCGT AATGTCA'rGC CTTTAGA.ATC GATGAGTTGA AGCAAGCTAT AGAAAAACGC ATAGTATCAA CTTATACTCT TCGAAAATCT TGATAATCAA GTTATGGATA CATCAATCA.A AAGATTGGAG TATCGCCTTT GCTAATCAAA GTTACACAAC CTTGGTAATA CTTCAAACCA CGTCAACGTC ATATGGTTAC TGACTtCGTC AGT'rCTATCC ACAACCTCAA AACAGTGTTr TCGTCAGTTC TATCCACAAC CTCAAAACAG TGT'rGAGc TGACtTCGTC ACAACCTCAA AACAGTGTTT~ TGAGCTGAcT TCGTCAGTTC CATCCACAAC TG'TTTTGAGy TGACnTTCGT CAGTTCCATC TACAACC'rTA AAACAGTGTT TTGAGCTGCC CGCAGCTAGT TTCCTAG'ITr T'TrrTTCTG AAATGGAATT GTTACCCAGT AAGGGT'rTGT GAGCGATATA ATCAGG'rrGA CCCCAAGTGA TGGCCAATT CTGAATACCT GTATCTCCGA TGATGATGGC TTGrrCTGGT ATGACATCTG CCTTATGCGG 'rGCTCAGGG TGGATTTCCA AG'N'"?'rTGC CATGTCTTGA TAGAGTGGAT AACTGC'TCGA TAACTCCTCA GCTTCATAGA TGCCT'rTTGC CTTATAGTAA TGGTCTTTGG ACAGGCAGGT CGCAAAACTA ATAGTTTTGG CATCAGGGCT AGGCACCCCC 'rGAATCCCGA TAGAACTATC AACGAGGT GAGGTCATGG TTTCTCCTAT TTGATAAGCT CGACCAGTAG GGGTGGTAGC GAGTCCACCT CTTGCTCCTA C7"rGGTACAT GGCATCGATC CCTGCCAAGG CCATGTCTCC TGCGATGAAA ACACAGGGA.A CTTCGACCAA ACCTGCAACA A'rGACAAAGG CAATAGCTTG ACTGGCCTGA GCGGCAGCAC TCATAGCAGA GGCTGAACCA GAGATGGAGG CATTGTTTGC GATGACTAGT AATTGT'rGCT CGTGGCTGAG GTCTAATTTT CAGCCAGCAC TTCCAGCGGT TGGAGTGGCA TTGACTGCGA TGGCATTTCG GGCAGCCGAG TTTTCGATGT AGTGATCCAA TTTGGCAGCA TTTTCATTGA GGCCAAGTTG GACAGAGGCTr AGGAAGACTT CTTCACGTTC GCGACCGGTC GCCACATTTC CTTGAAAGTC CAGATCTGC'r ATGCT'TCCTC CTAT'rTAAAG AAArTTACAT CGATAGCCTC ATCACAGT'rG CGACTGTCAA 1051
GCTCTTTGAT
CTATGCTATI'
TAGTTTAGTA
GTTTCTCGAG.
GCTAGTTGAT
CTAGAACCAT
GCAGTAGATC TATCCTTTGT CGTGGTGATG AGCAAGTCTA TAATCTGAGG AAAGAGTTGA GAACCATATA TCTGCACGGC ?1TCAGAAATT CTTTCGAGAG GTGGTCCCAT AAAACCACGA AGCTCTTTAA AGGTATAGGT AAAGGCATTG CCATCCAAAT CGAAAAAAAT CGCTGTGATA TAT'rCTCCGA AAATTTCTTT T'rGGAGGCGA TCAGC'rGTTT CACGAAAGGC AGTTGGCATG ACTTCATCCA CAGGGATTTT AGATTCGATA GCAAAGCTAG CTCCCATGGC ATTACGTTTG GGGTCACAGA TGAGGCCTAG CATATTT'rTA TAAGGTGTTC CACCTGCAGC CAGAGTCAAG ACTT'CACCTT GACACCCACC CTCAGCACCT CCAAAGGCAC CAGCAGCAAA GAGGAAATCC TCAATAGCAt CAGTG.AGAAC GGATGCC-AGA CAGACCA-AGC CCATTrTTGC ATTGTGTTCA AGAATCGTAT AATCTGACAG AGTTTTTCCG TTCATTGAG TATTAC'rTCA GAAAATACGC CAAAACTTCT GATCTGC~rG CTCTCCAAAT CTCCCAGCAT ATCAAACTTG CTGTCTGCAA GGCTTGGTG.A A.AATGCCA'rC AAAGAAATGA 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 0
TCTCCACCTG
T'rCATAACTT AAT'rCAAACT
TGCTCGACCA
TGTGGAGATG
TCAGGCCACT ACGAGA'rTTA CCAGATTGCG rTCCATGAGA CTGTTGTA.AT CA'rGAGTTCT ATTCTTTGAT AGAATAAAAC AGGGATTTTT CGAATTTC1TT CTTCGATAAT CATA.ATGGCT T'r'TCACCAG 1052 TATTGATACC C1TT'r'DCACG AGTGACATTC ATCTGGGCGA TAACAAGGGC AATCATACCT GGAATATCTT TATTGAGAGA GACGGCAAAA CCATTGAGTT AAATACCAGT CACGC TGATG GTCTrGTGGG GGTGAGGGGC ATTGCTGTCr TTCTGAATGG CAATTCCAG ACTATrTGGA AT1TrCAGGAT CAAGGGCTAG GTCTGTTCCG TGACCACGAT ATTCAACTTC TGTCGGAGTA TCATCAAAAA CACCAGCGGT ATGGCTACTA GATGGGCCAA ATTGAAAACG AAGTGATTTC ATCAGTTTCC AAAGAATGAG GGGCTTGGCT TTAATTGTGG TAAAAATAGT ATCTTTTATG ACAAAAAACA.
TAATATTTCT ACATAAAAGT GTCAAGAAAC AGAAAGAAGG AGAATTTTCG AATATGAAAT CAGGAGTAGC TGCCT'rGTTT GCAGTATTTG ATAGCGGGAA AGCGCCTCTG GATGAACGAT GATGATAGTC GGTGTATTCA CGGTrACCTG AATATTTCCT CCACCGATAG CA'rTTTTAAC AGTAAT TrTA GTGGTGTTAG TCCAGACAAT CTTGATACCA CGCTrGTGGG CATCTGTATC CATTCCTAAA ATACCTGCAA AGGTCTrGGC AAATGAGTTA AAAAGTTGGA 0 0 0000 0 TGGAAGAGAC AAT CTT CCCA TCATAACTGG TCCGATGATA CCTTATAAAA ATTCTTATCr ATGAAAACCT TTCTAATACC CCTTATTTAG GGAAATAAAA GGTAATATTT AAAGGGTATG CAATAACTAA AAAGATTAAA CTCCATCATT TGTATC'rGCT
ATACGAACAG
TCAAAGACAG
CTATTATATC
TCAAATAGCA
AATAATTrTG
ATAGAACTAT
GCAACTCTTG
CAAGAATCAT
CACAACACAA
TATGTTGATC
GCTACTTATG
CCAGTAGTAA
TGGATCGCTC
GTTACCAATT
TACCG
5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6415 CAACTTACAC TGTrAAAGAA GGTGATACAC TTrCAGAAAT CGCTGAAACT CAGTTGAAAA ATTGGCAGAA AACAACCACA TTGATAACAT TCATTTGATT AAGAGT'rGGT TATCGATGGC CCTGTAGCGC CTGT'rGCAAC ACCAGCGCCA CGGCACCAGC CGCTCAAGAT GAAACTGTTT CAGCTCCAGT AGCAGAAACT GTGAAACAGT TGTTT1CAACT GTAAGCGGAT CTGAAGCAGA AGCCAAAGAA AAAAAGAATC AGGTGGTAGT ATACAGCTAC AAATGGACGT TATATCGGAC AACAGATT~CA TACCTGAACG GTGACTACTC AGCTGAAAAC CAAGAACGGG INFORMATION FOR SEQ ID NO: 163: SEQUENCE CHARACTERISTICS: LENGTH: 8494 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163: TACCCCTTTC GAATTTTGGC AAAAATTCGG TAAGGCTTTG ATGGTAGTTA TCGCGGTTAT GCCGGCTGCT GGTTTGATGA TTTCAATCGG TAAGTCTATC GTGATGATTA ACCCAACCTT TGCACCACTT GTCATCACAG CCTTCACATT TTGTTrGCCC TGCTTTCGCC GCTGGCTTG TCTATCAGC GATATGTTGA AATCAAAG?'r GCTGA'rTACT ATTCGTAGGG ATTATCTCAG CCGTAAACTT CCTGATGCAC TATTCTTCGT TCAGCAATCG AGGTATCAAT AACTTCGGTA ACCATTCTTG TATGGTACTT GACTATCCCA ATGAACTACA 1053 GTGGAATI'CT TGAGCAAATC TAGCCATTGG AGGAAGCTGG CCTTCATCI'r GATTAACCGT AAAATCCAGA TGCTATGGTA TTATCAGTGT TCTTGAAGCT GTTTTGAGG GGCAACTGCT TTTCATTCTT CAACGGGAA.A CTGCAATTCT ACTTGCTGCT TCTGGATTGC CAACTCACAA TGGAACG'rTT GCTCrTTCCA CAGCTCTTGG TGGTACTTAT GGTTGGGGGG 'ITATCGGTAA CTAAAGAAC GTGCTGGTGG ATCACI'GGTA CAATCNTCGG ACTACI'TCT TTGGTGTTC CCAGCCTTGA ACATGGGGGT TACA.ACA.AAT AC'TACAACTT CGTTTCGTAC CATTTGTAGT TTCTGGCCAG TAGTTCAAAC GAAACTGCTC CAATTCTTGC TTTGGTCTTC ACCACATGTT GACATTTTAA CTGGTGCAGC CCATGGGTAA CAGACCTTGT T'rAGATACAG TACATCCAC TTGATGGGTG TGATTGTTGC AAAGGTATGA TGATTGCAAC GAATACATGT TCATGTTCAT GCTGCCTTCG CTATGGCTGA TTCTTGACTC GTACACCTAT GTTTGGGTAkA CTGTTCTCTT AAATTCAACT ACGCAACTCC GAAACCAGCA GCGAAGTGAA TAAAGGTACT CAAGTATTCG GTCAAGACCC ACTATGGCTT AAACCTTAAA GGTACTGATG C'rAGTCAATA TCAACACr'rG rCGTTCAAA GTTGGACAAA-TGATCGGTC ATTCGGTATC TATCTACCGT AATGTTGATG CTGACAAGAA ACATAAATAC AGCTCTTGCA ACAT'rCTTGA CAGGGGTTAC TGAACCAATC 
CGCAACACCT ATGTATCTTG TTTACTCACT TGTTCAAGGT CGTCGTAAAC CTACGTATGC ACTCATTCGG TTCAATCGAG TGCAATCAGT GCTGGTATTG GTATGGATAT CGTTAACTTC TGCTGTAATC ATGTAC'rrTA TCGCAAACTT CATGATTCAA AGGGCGCAAC GGAAACTACG AAACTGCTGA AGGTTCAGAA 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 AGTTGCAGCA GGC1'CTCAAG CTGTAAACAT TATCAACCTT CTTGGTGGAC GTGTAAACAT CGTTGATGTT GATGCATGTA TGACTCGTCT TCGTGTAACT GTTAAAGATG AGGAAATGCA GAGCAATGGA AAGCAGAAGG AGCTATGGGT CT'rCTCATGA GGTTCAAGCT ATCTACGGTC CAAAAGCTGA CATTTTGAAA TCTGATATCC TGATTCAGGT GAAATCATTC CTGAAACTCT TCCAAGCCAA ATGAC1'GAAG CACTGTT1CAc TTCAAAGATC T1'ACTGAGGA AG'rTTACTCA GTAGCAGACG TGCTTTGGAA CAAGTAAAGG ATCCAGTAT'r 'GCTCAAAAA ATGATGGGTG AGTAGAACCT GCAAATGGAA ACATTGTATC TCCAGT'rTCA GGTACTGTGT
CAGATAAAGT
AAGGACAAGG
AAGATATCCT
CACAACAAAA
GTCAAGTTGT
ATGGATTTGC
CAAGCATCTT
CCCAACAAAA
TGGTTTCGAC
AAAAGTTGCA
ACGTGAAACT
AGAAAAAACA
TTGAGGT'rGG
ACTCAATACT
AGATA?1'CTT
CATGCTTTTG
ACAGTAAGTC
GCAGGAGATC
TCAACAGTAG
GCTTCTCTTG
AAGCTGTT
CACAGTTGGA
GAAAAGGACT
CTATTGTGAC
TGAAGG1'AA
TCCTTGTCAC
TTCGTCTTCAC
CAGCTAAAAC
CCAACCTCTT
TGGAGAAAGA
ATGATTTGAT
1054 GGAAGCAGGT CTTGAAGTAT TGGTTCACAT ACCAT'rrACA GTTCATGTTG CTGAAGGACA AGCTGACT'rG GATGCI'ATCC GTGCAGCAGG AAATGGTGAT GCAMTTAAAT AGCAGTTGCT AAAGTAGAAT ATTTTGGGAG AAAAGAATGA AGCAGAGGAA AAATTCCAGA TTGTTTTCAA GAAATCAATC
CAGTTAAGTT
TGTAATATAC
AA'TT'AAC
TTTTGCTTGA
AGGAGATGAC
CTCGTCAGAG GTGGAGGTTA ATGACCTT'A TCAAGCTTTG CCAGCAGCTG AGCCTATrCA CCAAGACCAT TATGTTAGAC TCTTGGTTGA AAAGTTGTCT GAGCAAGGGA AAAATTACTA CTGGACCTGG GCCTATAACC GTCTAAAACA CCTATTGAAG C'TATCATACT CGCCGTGTTG TGCCAGTGI' CATCTCTCT'r GGCTGTCTTG AAAAAATTGA TGGACAGGAA GGTTACCAAG AGTTGCTCAA GAGAAAAGTG GAACACTGAA CCCCTTCGAA TTTACATGTC GTATTTGATG TGCTATA'N'A AACTGGAAAT TTCTTACTAG AGAAGCTACT AA'rATGGTAT AATTGATAAG A'rATCGGCTA
CCAGAGAAAT
CCCTAGCTGA
GGTGGGATAA
ACAAGCCACT
CTATTTTAGC
GTAGCTATAC
TCGATTATGT
GTAACAAGAG
AATAACTGAA
GGAAATAGCC
GTAGATAGAA
TAACCGCTAC CACGAAGGTG TGCTATCTT TTrGGTTTCA GATGTGGATG ATCCAACAGA AACTGTAGTC GATGGCAAGG AGCTAGCAGT AGGTTTCCAA GAAGAATGGG CACGATTTGA TrrACTAGCT GGAGATT'rCA ACAATCCGGC TAGTCCATTA GGCTTACAAG ACGCATTTGA TGTTCCGCCT GAAATTGATG GCTGGAAAGG CTTTACTACC AAAGAGTTAG CGGTGGAAA-A TCCACAAGTG AGTGATCACT ATGGCTTGAA AAGAGGTTGG ALACTATAAAA TTCCAGCCTT TAAATA.AGTG AGACTACTGT AATGGAATAA TCGAGGATGT TATGTCATTr ACGAAATTTC AGGAGTTAAA ATTTACAACT CCAACAGAGG 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 AATT'rAAAAA CTATATTAGA GAAGCCTTGA TGCAAGACAA GTTGATTCCT ATTGT7TTGG CAGGTCGTGA CCTAGTAGGA CAGGTTCAGG TAAGACTCAT ACTTTCTTGT TACCGAT'Tr CCAGCAATTA GCGATAGTGT ACAAGCAGTG ATTACTGCAC CGAGTCGTGA GTTCGCTACT AAGTAGCGCG TCAGATTCA GCTCACTCAG ATGTCGAAGT TCCTGTGGT'r GTGCTACGGA TAAGGCTCGC CAGATTGAGA AATTGGCAAG CAATCAGCCT TTGGAACACC AGGCCGTATC TACGACTTGG TTAAATCTGG 'rCATTTAGCT CCAAGACATT TGTTGTTGAT GAAGCAGATA TGACCI'GGA TATGGGATTC
GAATCAAAA.A
GATGAAGCTA
CAAATTTACC
AATTATGTGG
CATATTGTTA
ATTCATAAAG
TTGGAAACTG
1055 TT3GATAAGAT TGCTGGCAGT CTTCCAAAAG ACTTGCAATT CATGGTCTTC TCAGCGACTA TCCCACAAAA ACTGCAACCA TTCTTGAAAA AATAC?'rATC AAATCCTGTT ATGGAGAAAA TTAAGACCAA AACGGTTATT TCTGACACCA TTGATAATTG GTTGATTTCG ACCAAGGGAC ATGATAAGAA TGCTCAAATT TACCAGTTGA CTCAGTTGAT GCAGCCGTAT TTGGCAATGA TTN'TG7rAA CACTAAAACG CGTGCTGATG AATTGCATTC ATATCTGACT GCTCAAGGCT TGAAGGTTGC AAAAATCCAT GGCGATATTG CCCCTCGTGA ACGCAAGCGA ATCATGAATC
AGGTGCAAAA
ACATTGAACG
TTCATCGTGT
TCTGGATTI'T GAGTATATTG TGTCAGCCAT CTCATCAATG TGGTCGTACT GGACGAAATG TCGCAACAGA TNTGGCAGCG CGTGGGATTG ATGCCATTCC GCAAGACTTA TCT'NrrTTTG GCCTACCAGG TACAGCTATT ACCCTTTATC
AGCCAAGTGA TGACTCGGAT ATCCGTGAGT TGGAGAAATr GGGAATCAAG AGA'rGGTCA.A AGACGGGGAA TTTCAAGATA CCTATGACCG TGATCGTCGT AGAAAAAACA AGATAAACTT GATATCGAAA TGATTGGTTT GGTTAAAAAG AAGTCAAACC GGGTTATAAG AAGAAAAT'rC AATGGGCGGT TGATGAAAAG CCAAGCGTGC TGAAAATCGC GCTCGCGGTC GTGCAGAGCG TAAAGCTAAA TTTAATAGAA ATTGTTGGAG TA'rTGAGCTC CAACTTTT'rT ATTTATGAGA TTrACTCCTA
GCCAACCGTG
AAAAAGAAAA
CGCCGTAAAA
CGCCAAACAT
ACGAACTATC
3720 .3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 TAAACCGAAA CACTACATTA AAGACTGCAA ATrGCGATTA AAAATGGTAT AATGATAAAG TTATATAGTC CCGATAAGAT GGTAGGTATT TATTACGAAG AGTTTTCCTA TCAGTACTTT GTAACTCTAT AACAATATTT ACGTCTGAAT CTGTATCTGA ATTTTGGATG CTATTTTAGC TATACTGGTT CTGTCCACGT CGTGTGGTTC GTGATACCAT GCTGAGACGG TGGGAGTACA GTTAACGAAG CCTTGGAGGT GCAGGTGACC AAGGGCTCAT TTGCCAATTG CACTCAGTCA GAAATTrAGCT ATCTCCGTCC GACCGTCCGG TACGTGTAGA AATGAACAAA TCCATCAAGA TTTAAGGGGG GACATTTTTA TGTCAGAGCG TAAATTATTC GGGGCATCCG GATAAGATTG CAGACCAAAT TTCACATGCG AAAGGATCCA GAGGCGCACG TTTTGGTGAA ATTTCTACAA TGCAGAGATr GGTTATACCA CCCATCTT'rG GTGGAACAAT TCGTGGAAAT GCTGATCAAG TTGCTGCTGA AACAGCT7GTA ATGCCTATGT GGATA'rTAAC ATACAGAATA TGGATTTTCT CTCCTGACAT CGCTCAAGGT ATCCACTGGA CTTGATTGGA
GTTTGGATTT
TAAATTGGTT
AGATGCAAAA
TACAGTCGTT
TGTGATTGAC
GCAGTAGATG AAACAGAAGA GCTTATGCCA CGTCGTCTGG CAGAACTTCG TAAGTCTGGA
TCACAAGTTA
ATTTCTACTC
AAGGTCATCA
CAGT-rGACTA CGATGAAAAT AGCATGATCC AGAGGCCACT AAGAAGT'rAT TCCATCTTCT 1056 TATCTTGATG ATAAGACAAA ATTCTrTATC AATCCGACAG CCTCAAGGGG ACTCAGGTTT CACTCCTCGT AAGATTATTG TCTCGTCATG GTGGTGGTGC CTTCTCTGGT AAAGATGCGA TCTTATGCGG CTCGCTATAT TGCCAAGAAT ATCGTTGCAG CAAGTGCAGT TGGCCTATGC TATCGGTGTr GCGCAACCTG TT'CGGTACAG GAACAGTAGC TGAAAGTCAA CTTGAAAAAG CTTCGCCCTG CAGGGATTAT CCAAA'rGCTG GACCTCAAGC TCCGCTTACG GTCACATGGG ACGTACAOAT ATTGATCTTC GTAGATGCT'r TGAAAGAAGC AGTAAAATAA GATTTTAAGA TATAGTTTTT AACTA'rACTG GGATACTGTF CTGAAAATCC TACATGTATA GTAGATTGAA ACTAGAATAG TACACCTCAA ATCAATTTGA CTGTCCTGAT CGATTTCTCC TGT'rCTTGTT AAAAATGATA AAGGTTAAGA TTTCTCCTCG TAATAGATAA AAAGTTTTAT TCGTTATCAC TTGACTATTC CAAGGTTTTC ATGGACTCAT GGT'rGAG-ATT TCTCCTTMT GCTTGGACTT AAGCCTTGTT CAAACTTrCTA ATACACTAGC TGTTTCCATA TTTCTTTTCC GAATAPLATAG ATAGAACCAC AGAA'rCTAGT GGTATAATAT TAGCAATAAA ACAAATCTGG AGGATTAGAA CAAATTGCTG GTTTTGAGTT TGACAATTGC TTGATGAATG ACGATAGAGG AGTTAGAAGA GGTCAAAAAC TCAGCGGCAG GTCGTTTT'GT AATCGGTGGT TAGATACTTA 'rGGTGGCTAC CTAAGGTGGA TCGTTCAGCC C-AGACCTTCC TAAGAAGGCA TTCTGTTCG TATCGATACT CGGCTCGTCA AATCTTTGAC GTCCAATTA CCGTCAAACA .*Sao: 0 CATGGGAACG 'rTTGGATAAG GGGGAACGTC CTCTCfTTTT ATTTGCGAA AGTAGAGATT CTTCTAAAAC ATTGTTAGCA TCATTTTACT ATATTTCTTT TCTTGGGGAT ATTTCAATCC TAGAGCAACA GAGTCATGGA CATTCAAAAG TC'rGTrACCC GCATGACTTC TGTACTAGAC AAACCTAGAA TTAAAATTAT TCATGGTATC AACGAAAACA CAGCAGGTGT GGCTTGTATG GAACCTT1TGT TACTA.AGACA ACCAAGATGT TCCACTTGGT ATTATTrGGA TTATCTTTTA CTCTGGTCGG CATGTCTCCA ATTTITCGTGG TCTGACTGAG 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 GCGACCTTGG ACTTCCGTCA TCCATCAACT CrATGGGC1T GATTTGCAGG AAAAAGAGTC GAGGAAACCC ATACTATTTT CTAAATCTT CCTGTCCAAA ACAGACCGGA TTTTGGCAGA CCACCTTATT TTGATATTGT CTCAAGTTTG TCAACTGCGT GTCGTTATTC GGCCTAAGAA GCTTTAGCCA ATGTTCACGC GGGGAATCCT GAGCCACGC'r GCCAAATAAT GGCTTAGACr GAACCGAACT TTCTTCTTAT GAAAAAAGTC CAAGAGAGTG TGTTCCAGGT AAACCTCAGA TGCCTATGA 
TTTT'GAGACA AGTGT'rTGCT TACTTCACCA AACCTCT'rGG AATTAAATTG TCACTTTGAC -CAAGCGGCAG CTATTTTCAA CAAATATCCG TAACTCTATC GGAAACGGCC TCTATATAGA AGACGAATCT TGGTTTTGGT GGAATTGGTG GAGAATACAT CAAACCGACT CrTTTTATCAA CGTTTAAATC CTrCAAATCCA AATTATCGGA 1057 ACAGGTGGCG TTrCTG.ACTGG TCGAGATGCC TTTGAACACA GTGCAGGTGG GAACGACCCT TCACAAAGAA GGCGTCAGTG GAACTGAAAG CAATCATGGT GGAAAAAGGC TACGAGAGCT TCCTCTGTGG AGCAAGTATG CTrTTGACCG CATTACCAAT TAGAAGATTT CCGTGGGAAA TTGCGCTATA TrGACTAAAT TAAA'rCGAAA AATCTGAAGA AAGGAGAGAC GATGCTAGCC ATTGAAGAAA GTCAGAAGTT GACTTATCA AATTrACCGA GCCTGAGCCT ATTTACAGGG ACAGATCAGG GTCAGTTrGA. AGTGATGAAG AGTCAAATGT TGAAACAGAT TGGGTATGAT TCTGCTGACC TCAACTTTGC CTACTTTGAT ATGAAAGAAG TAGTrTACAA GGATGTGGAA CTGGAGTrGG TCAGCCTTCC TTrCT1'TGCG GATGAGAAAA TCGTGATATT AGATTATT'N ATGGATATCA CGACTGCTAA GAAACGCTTT TTGACAGATG ATGAGCTTAA GTCA'rTTGAG GAATACCTTG ACAATCCTTC TCCAACAACC AAGTTGATAA TCTrTGCAGA AGGAAAGCTG GATAGCAAAA GACGG'rTAGT CAAATTACTT AAGCGTGATG CCAAGGCCTT CGATGCAGTA GAAGTAAAAG AACAAGAATT GCGCCAGTAC TTCCAAAAGT GGAGTCAGAA ACAAGGTCTG .00.
o 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8494 CAGTTTACCA ATCAT'rCTTT TGAAAATCTC ATCCAGAAA-A ATCTTCTCTT TTTACAGTCC GATA'N'GTTA ACGCAATTCC CAAGACTTGC TTCTGACrAA AAAGATGGAT CAGGCGCGCG AAGATGAAAT CAAACTGATT GCAGTCATGC AGATTTTGGC GGAGTCTGGC CAAACAGAAT TGGGACGTAA CCCAAATCCT TATCAAATCA CTrTGAGCTT TTTGAAGCAA GCTATTTCCT CAGGTCTTTA TGAAAAAGGT TTCCTTTTTG TCAATTGACA TTTGTTGAAA CTACTAACCC CTCATCAAGT CGGGGTTTCA ATTTAGCGAA TATAAGGCGA ATTCTGTTAT TGAGGAAGAG AGGACAATAT 'TrGATTTA ACTCAGT'rTA ATTTGGTGAG AGACTTGACC TTGCAAGGGG TG4QGACAATT TCGGACTTTT ACTC.AGGTGA CGCAGATTGC AAGTAGTTTA GGTAGTTATC AGTTTGCATT AAGAGATrCG AGAGGACTTT ATTTGATTGA GACAGACTAT CAGATT1AAGA AAAAGGCACT CTTACAGATT GCTAGTCAGG
GCGG
INFORMATION FOR SEQ ID NO: 164:
SEQUENCE CHARACTERISTICS:
LENGTH: 9707 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164:
CCGGTCAGTT CGTTCAGTAC AAGGAATCAT AATGAACGAT CAATCAGAAA AAAAGACTAG AAAGAAGACT GTATGGATAA TCGACCAATT cGTTTTTGG ATTCGGCTGT CGGGGC'I G 12 120
ACCGTTGTC
TCGGCGCGGG
CTGGTCAACT
CAGCTCAT
CGCCCrATGG
TTCTCTTGAC
GCGCCAGCTT CCCCATGAAG AAATCGTCTA CCCCCGTCCT GCTGAGCAAA TTCGTGAATA CAAGGATGTC AAAATGAT'rG TCATTGCTTG AATCAAGGCT CAACTAGATA TTCCTGTCTT CATCAAGTCC AGTCAAGGTG GGAAAATCGG AGACATATAC CGTCAGAAAA TCCATGATCT
TATTGGAGAT
TACTTGGCAG
TAAC-ACTGCG
GGGTGTAATT
AGTGATTGGA
GGATCCCGAC
ACTGCGGTCG TCTGGGAAGA TTGCCAGGAG CTTCGGCAGC ACGCCCATGA CGGTACAATC TTACAGGTCG AGAGCTTGGC TCAACCAGTG TTACCAAGAA GATAGCCTGA TTTTGGGCTG ATGGGGCCAA AGGTTCAGCT CTGTCCCAAG TTTGCTCCCT TGGTTGAGTC AGGTGCCCTG GGTGGTCTAT GAAACCCTGC GTCCCTTGGT TGGAAAGG TACTCATTAT Cr-ACTCCTTC GCCCTATTAT CCAAAATGTG CATCGATAGT GGGGCAGAGT GCGTACGGGA TATCTCAGTC TTACTCAATT ATTTTGAAAT CAATCGTGGT CGCGATGCTG GACCACTCCA TCACCGTTTT TACACAACAG CCAGTAGCCA AAGTTTTGCA CAAATTGGTG AAGAATGGCT GGAAAAAGAG AT'rCATGTGG AGCATGTAGA ATTATGACAA ATAAAA'rTTA TGAATATAAG GATGJACCAGG ACTGGTATGT TGGGTCTTAT AGTATTTTTG GTGGCGTTAA CAGTTTGAGC GACTATAAGA CAGATITTCC TCTGTTTGA-A TrCTCCAAAA TATI'GGAGA 'rGAAGAGTAT GGTTrCCCGC TTTCAdTTAC TGTTTTACGC TATGGTTCTA TCTACCGTTT GTTCTCCTTT GTGGTAGACA TGCTTAATCA AGAAATGGGA CGAAACTTGG AAG TTATTCA ACGTCATGGG GCCCTGCTCT TGGTTGAAAA TGGGCAACTC T'rGTATGTAG AATTGCCTA.A AGAAGGGGTC AATGTTCATG 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 ATI'TC'1NGA GACAAGCAAG GTCAGAGAAA AAACCAAGGA ATTCCGAGCT ATCTTTGATA ACTACCCTGA CCTGCCTGAA GTAGCAGAAA TTAAGGCAGA AACCATTTCT CAATTAACGG TCAAAGTCGA TGTCCTTGGTr GGCTTACCAG GAGCAACTGA CCGTGAAAAT AATGCCAAAC TCAAGGACCG CTCGGCTCAG TTCCACACAA GTTTAGTTGT TGAAGCAGAC TGGTCAGGTT GCTTTGGCTA TGATCCCCTC TTCCTTGTAG CCCTGGAAGA AAAAAATAGT CAATCTCACC TATTTCCATC ATGGCAAAGC AAACCATCAT
CCTTGTTGAT
AGTTAGGCTA
TGCGACTCGT AACGAAGGTA CGATGTGGAA AATCTTAATG CAGGTA'rGAC C'rTTGAAGAA AATGCCCGCC GCAAGATGGT TTTGGCAGAT GATTCTGGTC GCGTCTGGTC AGCTCGTTI'C GCAGGTGTGG TCTTGCACGA ATTGGCCATG GTCTTTGAAC CCCTAGTCGT AGCCAGCCCA AATAAGGAAA ATATTAACTT TGAACCTAAG GGTGAAAATG GCAAACAGG TGAGTCATCA GCTGAATTAA GTGCCTTAGC CGTTAAGAAA CTTTTGGACG TGTAATGAGC GATTCCCATG GCGATAGCTT 1059 GATTGTGGAA GAAGTCCGTG ATCGCTATCT GGGCAAAGTC CGATTCTGAA CTACGTCCGG ATTCTCCACT TTGGGAGGGC CATGGACTrC TACGCCCGCT ACCCAGAACG TCTGGTGACr TATCCAAACT CATGGTCACT TGTTTGACAT CAATTTCAAC GGCTCAGGAG GAAGAGGCCG CTATCTGCCr CTATGGTCAC GTTGGAAGGC AAGATCCTCT TTCTAAATCC AGGTTCTATC CAGAGAATGT CTCTATGCTC GTGTGGAGAT TGATGATAGT GACACGAGAT CACGAGGTGT ATCCAGGTTT GTCCAAGGAG GATGCrTTT TTCATAACGG ATCCGCGTTG TTAAAGGGAA GAGCTTGGTI' CGACCAAGAT TTCAAAAGT TGGAC'rACTG TTGCATGTGC CAAGTGCTTG AG'rCAACCAC GAGGTACCAT TACTTCAAAG TGGACTTr TITTAGCCGAT GATTGCCAAG TGACCCCTGC TAAAAATCTA TCTTGCTCAG TCAGATGACC TTGGGACGAT TGGACTCAGA GAGTTTGAGA CTTTCTTGTT GGGGCAGGAG GCTGTGTTGA TTGATACCCA CAATGCGGAT TATACCCGTG TTCCCGTTG'r GACAGATGAA GATATTATGG CTTATCAGAT GGAGCATGAC ATCGTTCATA TGACAAAAAC GGACGTAGCG GTCTTGCACA AGCTAGTAGA TGAGTCCTTC CAAGGGATTA TTACGCGCAA GTCCATCCTC AGTAAGGAAT ATGAGATTCG ATGCCAATGA AGCAGGGCTT GTCTGTCAAT TCCAAGCAGT
GAAACTTTTT
CATGCGACCC
AAACAGTT'rG TTGAGCCAAG AAATCATGGC GGATACGGAT GTTGTTTCGC CTGATTTCAC CATTACGGAG TTACCGGTTG TGGATGCAGA GGGTANTTTC AAGGCCGTTA ATGCCCTCT1' GCATGACT GAGACAGGAT TTCAGCCTTTI TTAGAGGAAA CCTATAAGTA TGA3-T-rGGAG CAATTTTTAG GTCTCAAGAT TTACCAAGCC CAGCTAGCCA TTTCGGCCTG TAACCAATTT CTATACTTTC 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600
ACATGGTAGG
ATCTAAAAAT
TGAGCGGATT TCTGAGACCA CACCCCAG AAGCGAAAGA a TCTATCAAAA AGGAGAGGTG GACAGCTTTT ACCGCTTGGA AGAAGACGGA AAAGCCAGAG ATTCTATACC TAGACTCTTT CAGAGGGCCG CTTGCTAGCG CTCTTAATCC 'rAGAAATGGG TAGCCATCAA GGTTGCGGAC ATCAATCTGG ATTTTCAGGT CCCAACAGAG GATTGTCACC ATTCCCACGG CCT-rGCTTTC GGCAGACCTA TCTTTTTGAA AGAGGAGAGA AACCCTATTC AGTTAGAATC TTTTGTCAAG GAGAAAGGTT TTCCATCCTT AACAGTTTAT TCTAAGACAA ATAGAAAACA AGGTCGATTT TAGGA'rTAAA AACAGTCCTG ACCTTAGAAA AATATAGATA TTTTGAAGGA CCCCTGGACT TGCTCTTGCA TCTGGTTTCT ATTAGCCAAA CAAGCTGA.AA 'rTGGCAGGAA AGCGACCATC GCTCTTGCCC AGTGAGATTT GTTGCGAATC AGCAAGGCTT AGAATTGGAA CCCTTGATGG TCGTCAGTGG GCCTTTCGTC ATCAGCTCAA GTCTTACGTG GTACGAAATT GCAAAAAAAT ATGGATATTA AATTAAAAGA AAGTACCAGA TGGATATCTA CGATGTGCCC ATTACCGAAG TCATCGAACA CATGCGTCTG GAAGTGACGG GTGAGTACAT GAGTCGTAALA CTCCTTCCGA AGGTAGCAGA GG.ACCTCCTC TCTCAAATCG AAGAATATCG 1060
GTATCTAGCC
GGTCATGGCT
AGTGACAGAC
CAAGTTCAAG
TATGTCTCAA CCCTGCAGGC AGTCAGCTCA TGCTGATTAA
TTGGGGGATG
CTCTTGGGTG
CCGACAGAGT
ACCTGGAGCA
AGCACTTGGA
TCATTTACGA AGCCAAGCAC CAAGAACGGG CCCAGTATTA TCCAAAGCG AGATGCGGAG CTTGTGCATG ACAAGACGAC CATTGACCTC CCTAGCCAAG AAAAAAGAGG AGTTTGCACA AAATCACACG TAAGATTGAG GACATGATrGA TTATCGTGA-A AGACTCCTTG CTTGCAGGAT TTGTTCAAGG AAGCCCAGAA TGTCCAAGAG AACCCTAGAG TTAATCAAAA CCCAGGAGTT GATCCTCGTG TATCTATCTC ATGGAAAAGA AGbAAGAAAG TCAAGTCCCT TTTGrACIT TTTCAAATAT ACGATCTTGC GGGATGAG'rA
ATTOGACGAG
GTCATCACCC
CAAGAGGAGA
CAAAGCTAGA
ATCAATTGCG
TCTrTTTTGGC
GTTT'TGGAGA
CTI'GATAGAG
GTGAAGATGG
TCCAGCAAAG
CTTTGATTGA
TGAAGGAATA
AGGAAAGATG AGTACTTTAG CAAAPLATAGA AGCGCTCTTG TTTGTAGCGG GATTCGGGTC CGCCAGTTAG CTGAACTCCT CTCTCTGCCA CCGACAGGCA ?ITTAGGAAAA TTAGCCCAGA AGTATGAAAA GGACCCAGAT 'rCCAGTTTGG GACAAGTGGT GCTTATAGAT TGGTGACCAA GCCTCAATTT GCAGAGA'NTr 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 CTCTAAGGCG CCTATCAACC AGAGCTTGTC TGCCTACAAA CAGCCGAT'rA CGCGGATAGA TGGAGCCTTG GCAAAGTTGC AGGCTTTTGA ATTGGGGCGC CCCAACCTCT ATGTGACTAC CCATTTAGAA GAATTACCAG ATTTGGTGAA AGGATAGAAG CCAGTAGGAG AAAAGCAGAA TGGTGCGTGA ACTAGCAACC CTATCTACALA CGAAGAAAAG3 GTGTGACAGA TGATAAGGGT GTATTTACCC TGTGGGTCGT
TGATTGATGA
AAGATGAGAA
GAGCTGATTA
ACTATCAAGT
GTCTACTATC
CGCAAGACGG
TTGGACTGGG
ATGATTCACC
AATAAGGACA
CCAGCTGT
TCGGGCTGCC CTTGAGACCT TGTCCATTAT AATTGATGCC ATCCGTGGAG P1'AACTCGAG CCTGATAAAG GALAGACGGGA AAAAGGAAGT GGATTATTTC CTAGATTACA TGGGGATAAA GCTTGAGATT CAAGCCCAAG AAAGCCAAT'r TCAATAAGTA TATTGCCCAC GCAGGTGTG ACCAAGGCTT GGTGACGGTT AACGGCCAAG CAGGCGACAA GGTCGAAGTT GAAGGTCAAC TGCTTAACAA ACCACGCGGT GTGATTTCCA
ATGGGGACTT
CGCGTGTTAA
TTGATGGTAA
TACAGACGAG
AGGTGTGGCC
GAAAACCAAG
TTGTCGACC'r
ATACATCAGG
CTCGTAATGA
ATCTCCGCCC
ATGAAATTCT
CTTGCCCAAT GTCAAAGAC TGTCTTGATT TTGACCAATG GATTGACAAG GTTTATGTCG CTTGACCCGT GGTCTTGAGA CAAAGTGGAC CCAGTCAAAA ATCGCTCTGT GGTGCAGTTG ACCATCCATG AAGGGCGTAA CCATCAGGTT AAAAAGATGT T-rGAAGCTGT TGGTCTCCAA GTAGATAAGT TGACAGGACT CCGTCCAG3GA GAATCCCGTC ACACCATGGC TGTAACTAAG AAATAATGA6A CCAACGTTr ATCTCACCAG TCI'TCCACC CTACATGATT CAGGCTATTG AAAAACATGG GATTTTACGT TCTCATCCCT GGTCGAAAAC 1061 TGTCTCGGAC TCGTTTCGGA CACCTAGACT GTCTTAATAA AAAAGAAATC AGCCAACTAC ACGAATTTTA ATAGCCCTG TGCGCTTTTA CTCTTGTCGC 'rTGAGC1'GA CTTGTTCCAA GTTTAAGGGG GTATTGATGG GCTTrGCCTCG AGGTAAGGAC CCCGTTCCAG ACCGCTTTTC
CCTTAAACGA
CGCATCCTAT
AA'rCAAGAAG CACGT'rTGAG TGAGCAACCT GCGGCTAGTT GTTTGAAGTG GCTTA'rTTCA AAGTCCGCCT CCGCTTAGAT AGCAGCAGGT GTITTCAGCGA CCCACCGATG GCAAGGGTAC.
ACTTTCAAAG CTGACTTCTT GGGAATGAGG TGGCGTAA.AT AGA'rTTCAAA ATGATAAAAA TGAACTTGAT AGGATGCGTTI TTAGAATGTC AAAATTTTAT AACCGCGTCA GCTTTCATCT GCAACCTCAA AACAGTGTTT TCCTAGTTTG CTCTTTGATT TTCATTGAGT ATTAAATTGA AAGCTTTTTC TATGTCTTCA ATCATGAGTT TTGTTGATTC
ACCAGAGGTC
'rAAGGGCATT
GGTTGATGAC
GTCCGTGGCG
TGGTGTTAGT
TTCTAGGACA
AA6AGAGGATG
TGAGTCTTCA
TGGATAATCT TACCATTTTTw CCGTCGT'rGC TAGAGTTGTC TCAGGGTTGA TTTCTTTGAC AATTTITGTAT CAGTTGGI1'T GCACCAAAGG CTGCCATTTTI CTTTCATTTT TAGTAGCGAC GCTTTCTGTG TACCAGTTTC GAAGTCCAGT AGTCGTCCTT 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 GAATTTCAAG GTTTGGTACA AGAAAGAGAA ACGAGATTTG TCCTTCATTA AGGAGGATCG CAAGGGCT'IT TTTQTCAGAG TTCTTGGATG CTCTTGTCTA GCTTGGTCAA TTCTTCCTTG GCCGAAGGCA CTTGCTAAGG AT1'CGATATT AGCCTTGGTA GCTTGCTTGG AAGAGAACGG TTGGGGCGAT TTCTTTGAAT TTGTCTACGA AT'TTTTGTGT ACGTGGCGAA GCGATAATCA AATCAGGCTC AAGGGCGGCG ATAGCTTCTA AATCAGGTTC TTTCATAGAA CCAACAT'TTT. TGACAGT'rCC CACTAGGTCT T'TTAGATAAC TCGGAACAGT TTTTGTAGGC ATTCCGACGA TATTTTTTTC AAPATCCTAAA GCGCGAATAG TATCCGCAGC GCCGAGGTCA AAGGTC-ACAA TCTTTTCAGG AACTTTCGAA AGTTTGACCT CGTCC-AGTGA ACTTTTAATG GTTACCTCTG TTGGAGCAGA GCTACTGGTC TCTGTCTGAC TAGTGCTTGA GTTTGTACTA CATGCACCAA GTAGGAGCAA GAAGCTGGCC ACTAGGGCAG TGAAATAW G TTTAAGGGAT GTT'rTCATAA TTTCTCCTTT TTAAAATGTG ATAACGATTT AGGGAGTCTC TTAATCTTAT TGACTAAGAG ACTGAAGGTT CTCTAAC=G AGCTTTTATG TTACTAGCTA TAGATACAGA TCTTTTTGTC ATTGATATCA GCTAGCGTGA TGGGAATCTC ATAAAGTTGA 1062 CTGAGCAGGT CAGCCTCCAT GATTTGATCG GTTCTI'CCCT TGCTAAAGAC CTGGCCcGrCC TTGAAGGCGA CALATTTCATC TGCATACTGA CTGGCCATGT TGATATCGTG GAGGACGATG ATAATGGTCT TGCCGAGTTC CTCC-ACCAGT CGTCGAAGAA TCTGCATCAT GCTGACGCI'r TGC1-rCATAT CGAGATTGTT GAGTGGTTCG TCCAGCAACA TAAAGTCCGT ATCCTGGGCC
AGTACCATAG
TCTTTT'AAGT
GATCTAAGTC
AATTGGCTT
GAATTCCAGC
AGCCTGCTCA
CGATAAAGAC GCCCTGGAGT TGCCCCCCTG ACAGGCTATT GA'rGTAGCGG TGGTCAGTTC rAAATAGTTC AGAG;TTTCTC GGATTrTTTC CCAGTCTTCT GACCTCGGCT GTAGGGAAAA CGTCCAAAAC TGACCAGTTC TTCAACAGTC GGTAATTGAT TTTCTGTTr'r AGGATGGTTA GTTCTTGGGC CAGTTCTTGC TCTCGATTTC ACGTCCTTTG ATACTGAGAA TGATGGAGAG CAGACTCGAT TTTCCAGCAC GTCAGT'rrTT GAGGACTGAC TT'CAAGCGAA ATGCC'rTGCA S S
GAT'rTGTCA.A TGTT=CCAG TTTCACTGAC AGAAGCCACC CACACTCTCA ATGATCATAC CAATCAAGGC TTGCCCCAAG GTTAAGCTAA GTAACTTGTG CTGATAGTC T TGACAATCA AGAAGGCCAT AGGTCCTACC AAGGCAGTCG
GAGACCTCCT
TGATACGAAT
TAAATCCAAC
GGTAGGTGAG
CCGTT'GAGGT
GGAGCTCTTT CTGTTCTTTT GGTGCAAGAC ATCTAGAACG TCAGAGAACC GATGGCTAGG TTTGCAGTTT ATCGTATTCG GAAAGAGACT TCTGAGCGCT GTGTCTTCAA GTAACCTrGT GGAATTCTAA GATAGGGGAT GGGTTTGTAG TAGGACGTAG TTTCCGTCAG GGTTTGAAAA AAACGATGAT CTTTTGGGAA AAAAGTAGAG AAGACAAGCT GTTTGCTTTT AGTCTGCATC CTACCGATGA ?1TCCTAGCA.A GATAGGATAT CGCAAGCCAG TCAACATCGA GTCCCAATAT ACTGCTT.rTC GAAAGAAAAA ATGGAAGTGT TGAGATGT'rG TTTGGATCCA TTAGGACTTG AGACAGATCA GCAGGACGAA AAGGCGAGAA AGAAGAGGGA TTGCCAAGTT GAAGAAACTT AAGGATTCAA TTCCCA-AAAT CTAATGGTCG AAATCCCAGT CTCCCTGATC T'TTcrTGGTT CATTTGGACC AATAAAGGCT AAATATCCTG ?T=GAATG ATATAGTAAG ATAAAGAATA TTCCAGTGCA AAGACTCGT CAGAATGGCC ACTATAAAGA GTTGGCCAGT ATAAAGCCGA CAAAAGCACG ATTCCCCAGA CTGAGCCGTT TCTCTTTGCA GAT'rGTCAAA GCGAGGATGA AAAGGAGGCA AAAAGACTAT AAGGAAGGTG CTGATATTTC GACCAGGTCT TGCTTCATCA CTGGACAAGA AGTAAGACTA GCTTTCAAAA ACCAGTAGIA ACTAGGCGTC AGGAAGCGAT CGCGATGGCT1 ACCAAGAGAT TGACAAGTGA GTGATGGCCC AATCCAGAAG AGCTTGGTAT GAAGTAGGAT AAAGACGAGA AGGGCCTAAT CAGAACTCGG GTGCGACCAT GAGTTTGGTT 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 3580 8640 8700 8760 8820 8880 8940
CGCAACTTCC
CCGATGGCAA
TTTTCGTCCC
GAGACTGACA
AAC'TAGATTG
AAGCAA.AGGC
GAATAATGAG
CCTCTCCAGA
GACAACTCAT
GCACCAACCA
1063 TGACTTAGAT TATCTCCATA GCGCTTGCGA ACAAGATTGG GGTAGGCCAC CCACGGTAAr CATGGTGACG CTTGTCGTTA AGTT'CA.A GTAGGGAGTA GGAAATCCCC AAACTCTCGC ATGATGGTGA AGGTrTGGGA TAATTTCCAA ACGGTTATCA AGCCACTCAT ACTGATGGGT CTGAATCATG GAGAAGGAGC CTCrGAACCA GATTGAAACG ATAGGCGATA ACTrCTGTGA TAGATGATCC CAATCAGAGG CAACATCCAC CTTTCCTTTA GCTAGGAAGA AGAGGGTGAA TACGATGGAT GAAACAAAAG AGACTAGCCG ATGGAAAGAC AAAAAGGCTC AGCACCATTC GAACGATAAC TCCGAGAAAT GCGCCACCAG AAAGAGGGCC TGGTTTCTTT CCCTAGATTC GGATGATGAG GCCTAAGAAG CCTGGGTCCA GGCAGTCATA CTGAGCCGAT AATCCCGCTA CAGTAAAAAT GGTCATAAAG CGAAGAGCAT CTTGTGGGTC CCAGTITTGGC GGCTTCAGTC 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9707 .9999.
GTTCCAACTG TACTCGGTGC AGCAAACTGA TrTTTGGGTAA TAGTCTGCAT GAGAAGGCCT GCCATACTCA TACTAGAGGC AGTCAGGAGA ATACTGATAG TTCT'rGGGXG ACGGGACTCr TGAAAGAGGA GCCAGGTCTG CTGGTCGAAA TCAAATAGCT TCCCCATGA AAAATCACTG GTCCCAATGC TAATAGAGAG AAAGACTAGG AGTAGAAGTA AGCCAGG
INFORMATION FOR SEQ ID NO: 165:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5910 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165:
CCGCAATrAT GCTTGAAAAG GAGTATACTT ATAAGTAACG ATACdCAACG TTCCATTATT TTAACACACG AGGTGCTATT TGTGTTGATG CACATCTCTT CTCTTCCAGG AGCTTACGGA TGCTTACGAC TTCGTTGATT TCTTGGTCCG TACAAAACAA ATTAGGAGCA ACTAG'IrACG GGGATTCTCC TTACCAATCT CACTCATTTT ATCGATTTAG ATATCTTGGT GGAGCAAGGT TGAAGGAGTT GACTTTGGTA GCGATGCGTC TGAAGTTGAC ACGTCGTCCT CTTrTAGAAA AAGCGGTGAA ACGTTTCTTT TTTTGAGAAA TTTGCTCAAG ACAACCAATC ATGGCTTGAG TATCAAAGAG TATTTTGACA ATCTTGCTTG GACTGAATGG CAAACGTTTG CGTCTGAAAA ATGAAAAAAC GTCAAAGTGG ATCGGATCAT TTGGTCAAAG CGTTACTGGC AAATCCTTCC TTCTCAGCCT TCGCAGGAAA TTGTTGGAAG CAAGTGACCT TATGCTAAAA TCTACTATGC GAAGTCGGAG ATGTTAAAGA CTCTTTGCTG AGTATATGGC CCAGATGCAG ATGCTCGTGC
TCGTAAAGCT
CCGTGTGACT
CAACCACATC
GTGGGCAAAT
ATGCCCACCA
GGAAGCAATG
AATCTACGAT
TGCTGGTTCC
TC.AGCACTTG AAAGCTATCG CAATACTTCT TCTTCCAACA GAAATCGTTG GGCACATCC CCACATCTCT TCAAXACAGA 1064 TGAGCAATTG GCAGACAAGT TGGTTTACCA ATGGT'rGAA-A TTGAAAGCTT ACGCTAACGA AATCTACGTA GCGGAAGATT CAAGTGATAT TGTCAATGGT AAGGCTACTT GTATCGCAGG
GATCAGTM'
GACAAAGACG
ATCGTTCGTA
GATAZAGCAG
CTGTAACTGG TCAGCT'rTGG GCGrAATCCAA GCTACAA-ATG GTGGATTGAA CGCTTGCGTG TCGACCACTT CCGTGGCTTC GAATCTrTACT CACCTGGTGA GTGGGTGAAA GGTCCAGGTT TTGGTGAGCT AAACATCATC GCAGAAGACC TGCGTGAACG TACTGGCTTC CCAGGAATGA
TCTATGACTG
AAAGCTTCAA
GGGAAATCCC
ACAAGCTTTT
TTGGCTTCAT
AGA'rTCTTCA TGCAGCCG'rT AAGGAAGALAC GACAGATGAA GTGATCGAAT
ATTTGCCCTrC AACCCAGAAG ACGAALAGCAT TGATAGCCCA CACTTGGCAC CTGCTAACTC AGTTATGTAC ACAGGAACAC ACGATAACAA TACGGT'rCTT GGTTGGTACC GTAATGAGAT TGATGATGCG ACTCGTGACT ACATGGCTCG TTACACGAAC CGTAAAGAAT ACGAAACAGT GGTACACGCT ATGCTTCGTA CAGTATTTTC ATCAGTTAGC TTTATGGCAA TTGCAACTAT GCAAGATTTA CTAGAATTGG ATGAGGCAGC TCGTATGA.AC TTCCCATCTA CCCTTGGTGG AAACTGGTCT TGGCGTATGA CTGAAGATCA ATTGACACCA GCTGTCGAGG AAGGTTTGCT TGACTTGACA ACAATTTATC GCCGAATTAA TGAAAATTTG GTAGATTTAA AGAAATAAGA 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 S. SC
CAATAATCAG GAGACAACTA AACATGT'rAT A'rAAAACCAT TGCAGAATGT AGCAATGAAG AGCTTGCAAG CAGCCAAAAA CCAGTCAACA CTGAGTTCTT GATTGGTAPAA CTCTTGTCAA ATGTTAAAA.A AGAACTTGCA GCTrCAGGTA TGGAACCATC TCTTGGTAAT GGTGGTTTGG TTGCTACTCT TGGTTTGAAT GGTGACGGTG AACAAGrCT TAAAAACAAC CAACAAGAAA ACTGGTTGGTr TCGCTCAAGC CGTAGCTACC CAPACTCTrTA CGATATTGAT GTTACTGGTT TGTTTGACTT- GGATTCAGTT GATTCVI'CTA CAGATATCGC TCGCAACTTA ACTCTCTTCC AATTGCTCCG TATCTTCCAA CAATACTTCA CACTACAAGA ATTTGTACAA AATCGTTACA AGCTTTACCT TGCTCTTCTT AACTACAGCA CTGGTAACAA AAAAGTT1'AC TACATCTCAG ACAACTTGAT TAACCTTGG'r CTTTACGACG AAGACTTGAT CGAAGTTGAA GAAGTTGAAT GACGTrrGGC TGCCTGCTTT ATCGACTCAA TTCGTCTTAA CTACCACTTT GGTCTTN'TCC CAATTCCAAA TGCATGGTTG ACAGAGCAAA AAGTACCATI' TGCAGACTTT ACTTTGACAT ATGAAACAGC GACTAAAAAC CGCTTGCGTT TTATTAAAGA TGGTATCAAC TTTGACAAGA TTTACCCAGA 'rGATAGTGAC CGTCA-AGGTG TGGTTTCAAA CGGTGCGCAA TTGATCA'rCG
ACGAAGCPLAT
TCAACGATAC
GTATCGATCT
CAATCCTTGC
ACTTGGTACC
CTGTT'CAAAT
GATACAGTGT
AAGCCTTCTA
GTCGTTGGCT
ATGGTTGCA
TTGTCAAAGA
1065t CGAAAAAGGA AGCAACITrcC ATGACCTTGC TGACTACGCA GrGTCCAAA TCACCCATCA ATGGTGATTC CTGAATTGAT TCGTCTTTrG ACTGCACGTG TGACGAAGCA ATCTCAATrG TCT=AGCAT GACTGCCTAC ACTAACCACA TGAAGCGCTT GAAAAATGGC CTCrGAATT C=TGCAAGAA GTGGTTCCTC AATCATCGAA GAATTGGACC GTCGTGTGAA GGCAGAGTAC AAAGATCCAG- CATCGATGAG AGCGGACGTG T'rCACATGGC TC-ACATGGAT ATCCACTACG TAACGGGTT GCAGCACTCC ATACTGAALAT CTTGAAAAAT TCTGAGTGA CGACCTI'rAC CCAGAAAAGT TCAACAACAA AACAAACGGT ATCAC'r'rCC TATGCATGCT AACCCAAGAT TGTCTCACTA CTTGGATGAG ATTCTTGGAG CCATGAAGCA GA'rGAGCTTG AAAAACTTTT GTCTTATGAA GACAAAGCAG AAAATTGGAA AGCATCAAGG CTCACAACALA ACGTAAATTG GCTCGTC-ACT CCAAGC'TGTG GAAATCAATC CAAAT'rCTAT CTTT'GATATC CAAATCAAAC GTACAAACGC CAACAAATGA ACGCTTTGTA CGTGATCCAC AAATACCTTG
TGAAAGAACA
GTCTTCACGA
ACATCAAAGC TGGTAAC-ATC CCTGCTCGTC CTCCAGCCTA CACAATCGCT CAAGACATTA TTGCTAACGA TCCAGCAGTA GCTCCACACT TTACTGCAGC AAGTTTCCT ATCCCAGCAT CTAAAGAAGC TTCAGGTACT GGTAACATGA GTACTATGGA CGGTGCTAAC G'rGGAAATCG TCTI'CGGTGA AGATTCAGAA ACTGTTATCG GCGAATTCTA CGCTCGTGAA GCTATCAAAC TTCTTGCAGC TGGAAACAAA GAGCGCTTGG ACTGGTT~CAT GACTCTTCTT GATTTGGAAG CTGACTACGA AGACCGTGAC GCATGGTTGG CAATCACAAT CTTCTTTGGT GGTAAAGCAG TCCATTTk.AT CCrTTTGCATG TCAGAAGTTA TGCAAGTAGT TATGGTrGAA AACTACAACG GTGATATCTC AGAACAAATC TCACTTGCTT AATTCA'rcTr GAACGGAGCTr TTGACACT'rG CTGAGTTGGT TGGAGAAGAA AACATCTACA ACCTTTACGC A.AAAGCAGCT TACAAATCAA CATTGGTTGA CTCATCGTT AGTGATGCAG AACGT'TTTA CAATGAATTG ATCAACAAAG ACTACATCAA AGTCAAAGAG CAAATGCTTG 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 GATTCTTCTC ATCTGACCG'r AATACTCTTC GAAAATCTCT TTTGAGCAAC TGCGGCTAGC AAATT'rATAC TAATACATTT GATGTAGATT TGGGTCAATC
ACAATCGCTC
TCAAACCACG
TTCCTAGTTT
TGTAAAAAAG
TTGTCTAAAA
ATAAAGTCAT
AGTATAACGA
TCAGCTTTAT
GCTCTGAT
CGAGTTTCGA
ATAGGGAAAT
CGTTAACATT TCTAA.AGCAG AGACATCTGG CACTTGAACTr CTGCAACCTC AAAGCAGTGC ?TTTCATTGAG TATAAGATAC TI'GAAATTCG CTTTTAAT CCTAGATACA GTGAAGGCTT TAAATGCI'GG TTTTTACTGT TATTATATTC GCTTACATAA ATATACACAC TTAAATTGCGT AAGAT'TrTG GAAAAAArAA GCAGCATTGT CA'rTTATGTA CTTTrrAGTTri I rCTGTTT 1066 CCTCAGCCTT ATA7TrTTC GTAGTTGGTT ACCTCATATC AGTAI'ATAA TATAATTGTA GGAAAGAAGG TGTTTTTATG GTTGTTTATT ACCTTTCTTG TAATAAGCTT GTTACCTGAT AAAAATTTGG AAAATAGTTT TTGCAATATT GACGGCACTG
TCTTCGCTTC
AACAACTTGC
AAGTTCTGCA
ACCATTGCTA
o CTATTACTCA CCCAAAGCAG GTAACGTTTA TACAGCTTTG ACAGAAAGCA GAAAAGGAAT AGTAATTATA TACAATGTAC GAACTTTAAA CAACATTGAT TTAATTTTAT TGTATATCTC ATT'TTrATCC CATAATTGTG TAAAGCAACA TCATTTACA.A ACTITTTGCT GCATGTTTTC TCCTAATGCT AAAGCTATTG TTGCCAAACr GCTCTAAATG TCGGTTAATT ACTCTTTGA GTTTGCAAGT GTAT'rGATTT TGTTTCAATT TGTGATACTT TACAATTAAC ACAATTGACG CTTrTTCTCCr 'r'ITTATTGC AAAATATGCT ATTATGTTAA CTAAGTTATr TTAAGAATGT TCTAAATTCT ATTTATCCA-A CGCTTTTCTT CGATTTCGGC AATTCTTTTC GCCATCAGGA TCCTACAAAG CAGGAATTT ACACTTCGTA TCAAAGCGCC GATCTGGTAT AAGATCATTC GACGAAGTCG TCATTrGCAAT AAGATAAAGA ATATAAACAA ATGAT'rAAAA GTTAATCCTT CTAAATAAGT CCCCGGTGAT ACCAACTACC TACATATCTA CTTTTACTTT AATTAGATTT CAGTATCTAA TA6AGCACCC GAGAGTATGT AGGTGGGATT TCCATTG -rG TGAGACGTCT GTTCTTCTGT TAGAGAAGTG GTTCAGAAGC GTATACAGCT TCAAAAAA.AC CGAAATAAAT ATCTGCTTAC ATTTTATCAT AAAAA'rATTT TTCAAAATAT CCTGCCTCTA TATTGGTAAC GTGGTAGCGT TTGGGGTCTG TTATATCGAA TCGAAAATAA TTTGGAGGAA GATTTTGTCT
TCTATTCTTT
AGGGAAATAA ACCCTACATT GCGATT-CAAC ATTTCTTGCT ATGTTTTTTC TCGAGTTCAG GATAGGGTGA GTCGACATGT CrT ACTTT TTTGGAAATA AAACACACTC CGAGGGGTTT CTTTTCyCTC TTTTTCTTTA GCTGATCCAC CACCTAAAGG CGA-AAATACG T'rATACCCAA CCGTTGTTAG GAATGGCATC AATAAATTCA TAGCGAATTC GAAATTGCTG AACGAATAGC GCAAGGCCTG CAGTAGTTCC ATTTGATTAA CTGTAATACC CTATAATCGC CTTGTAATTG GGATGAAAAG A'IrGCATT'rC ACAGGTTGAA GTTCCATATT GAAACACTTG GAATCGCTGA TTCATTAATT TGTTCATGAG ATACTGTTAT TATAGTCAAA AAATGGACGG ATI'TATTTTG CATGATTTGT TTACTCTCAA GTACTGGCCT TATATCACTT AGATTAGAGC TATGCTTGAC CTATTATTTA TTATTrTTAAA 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5910
GATTTTATTT
TCATCAAGAA
ACTATTCATT
TGTGTACTTT
TTTATT'rATT
GTTATTTTGA
TTCTCTTTTC
GATAAGTTTG
TAGGATTTAT
TTGTATAAGA
INFORMATION FOR SEQ ID NO: 166:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5406 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166:
GGCATAGCGA
ATCCGTTACT
TAAGGGACAA
AAAACGAATC
GACTTCTTCC
CTCATTTTTT CAACTGTCCA GGCTGGATAC CAGACTAAT1' TAACCTCAGT TCTGGAACCT CTATCATAGC ATCATAAATC TGGTCTGTCA AAAGGTCTGC
CCCATAGTTG
TCATAGATCA
AAGGCTGTTA
AGCCATATTT TCCTCACTCT GTTGGCGTAG TTTTCTCAAA TAAAGACTTT CCCCACATCT GACGCAGAAC ATTTTCTTCA GCAAGACGAT ACGAGTCGTA CTCCAAGGTG GCTATCGTCC CAATCTTCAA GATTTCACGA CTGGTGTCGG ATCTTGCCCC TGATAGTTTC AACCATGTGA GAGTGATAGC CTGACGAATC AGTCAAACTT GTCAACCGCC ACTGCATACC ACGACCGACA TCAAAGTCAT GTCAATCTCT GTTTGCCCTG TGTCACCGTC AACCAAGATT GACAATATCG ATTCCCAACT CAGGGTCGAT AAATCCGTGT TTTGATGTTT TCAATTTGCT CTTCTGTATA TAGTCTTCAA TAAAATCACG AAGCGGTTTG CTACGACTTG GCCTTTGCTT CAATCTGACG GATACGCTCA CGAGTTACGT TCAAGTGTGC GCATTTTTCC ATCATCTAGT CCAAAACGTA CGGTCTGTAA GAGTATCTAA GATrTCATCC AATTGCTCAC TAATCCACTG GATTTTCAAT CACTTCATCT TCGATAAAGT TCTTCACCGA TAGGAGTTTC AAGAGATACT GGTTCTTGGG ACCTTATCAG GTGTCATATC CATTCGTTCA GCAATCTGTT AATTCTTGAA GGAGATTCCG CTGTTCACGA ACCAATTTAT ACTOGGATAC GGATGGTACG AGCTTGGTCC GCAATAGCAC 120 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380
CACCAAGTTG
TTCATCAAGC
TAGCGTTTGG
CCGCAAGACG
CCTCTTCATT
CATTGACCTT
CATTGCTGAG
CCTGAATCCG
GACTTGCATT
TTGTTTGGCT TCGATATCAC GGTCAAGAGA GGAACGACCC AGCAGAAGTT GACCCAATCA AACACGCGCA CTTGGATTTC 'rTGCAAGAGA TCTTCA.ATCC GATTTCATCA TCTGTTGCTG CATAAGTTGA AAACTTGAAC CCTTTAGA.AT CCATATTTCC TTCTTGAATC A.AGTCAAGGA CAATGGAAAC AACCAAACGA AGATTGGCTT CAGCTTCAAC AGCCAGTGCC AACTCTTrCT CTATTTCTTT CAAGTACATA CGGACAGGGT AGTCCTCATC GCTGAGTTCT GGTTCTTCTT CTTCGTTATC TGTGATAGAA ATGCCTGCAT CATCAGCGTC CAAGGTAAAA GGAATAACCA TCCCTTTTTG CTTATGATTA CGGATAAATT CTGCTACCTG TACGTCAAAT TCTTCTCTTT TGGGAPAATTA ATGGCTACCT 'rCCTCCACCT GTTTCGAGTC ATCTCTAC1-r AGCTAAAACT TGGTACCAAG AGGALAGAT'N' CC.ATACTGC 'rGCAAAGTCI' TCTCGCAAAC GAGTAGATGG GCTTCTGCCC TGGCGTCGGT CTGGAAATTC AACAATCTGC TCAATCTCGGG A'rAGCTGTT'r TGAGCAGCGA 1068 GTTGTTACTT CTTTTTGTT AACGTTCCAA TTCTTCTAGG TGTTGCCATT ATTACTCCAT GCTGTATCTG TATCTCCTAC
TCTTTTGAT
CACTAACTTC
CrCTTTCAAC
CAAGCAAGTC
GGTAATCGTT
TCTCATATTG TCCTGAT1'CA AGAGAGCC= CTGCGGCGAT ATCTCAGCAG GCAAATCCTG
TTCCTCTGTC
ATATAAGACC
CAAAACAAGA
TGCTCTGCTA AAACTTCTGG TGAAATTCAG GTGTAGCAAA GGGGATTCCA TC-ATCCGATA T-rGGTGACAG GCATGGTGAT TGCACCTGAC GACTQTCATT 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 TCATAATAGC CCATAACTGC CTTCCATGCG ATTCTGCCTT TATAATCAAA GGACGCCAGA CTGTCAGCTA AAATATGAAT TGGACTTTTC TTGAACAATC AAGGGAGCTA TI'TTTTCAAG AAACTCAATC TGACCTGCA GATTTT'CACT GTTTTCAGGT CTCAATCGGA CTAATACGAG T'rTTCGTTAA TAGATAGGCC TTGTAGATAC TCATCAGGAT CCAAGTTATC AGGCATGCTG ACCAATrTCA TCCAATGCTT TCAATGTCGC CGCTTGCCCA AAGAACCAAT 'rTCTN'GGTTA ACCTTTTCAG ATGCTCAACA TCCCATCGAC GCCACAGCAT TTTCGATTCC AGCCCGATAG TCCTTCCATC AGGTAAATC1' CACTAGCTT'r TCCAGAAGAT ATATAATTCG TAACTTTTGT TAAAAATTGC AGTCGATCGG TTGTGAATCC GTTTTTTGCC ACATACGACC TGAGAAGGCA TGTCAGGGGA AACATAATGC GATTIGTGAAA GGTGTCTACA TTGTACTGAT GAATGTAGAA 2100 AAGTrCTTCTG GACCATTTTT 2160 ACGATTTGCA CAGGCATATC 2220 GCCTTATCTC CATCGTAAAC 2280 TGCTCTCGAC TCAAGGCTGT 2340 GCTGCAATAA CATCCATGAA 2400 CTTTTGCCC TATCCATATG 2460 CTGTrr'rTAT ACTTAGAAGT 2520 ATGACCTTTC CTTGGTCATT 2580 AATTGATTGG CATCCGAGAG 2640 TGATCAGACA AACGTTGATA 2700 TGTTTAAGCA CTTCATCTGT 2760 ATAGTCGTTG TCATGAGAAT 2820 AGAGCTTGGT GAGGTGAGGC 2880 CCAACACGCT GACCTAAGAT 2940 ATGAACTTAA AGACATCACC 3000 TCTACAACAT TGAALAGATGG 3060 CGTCCTGCCT 'ITTGTAALAGA 3120 TTGATT'CTT C!AATGACT'rc 3180 ATAAAACAGG CCTGAATCCA GTAAATCCTC CAGATAGTTT CGTTCTGGAG GTGCTAAACC CAACCCCCGC TGATAAAGGT AATTTCTGC AGCATGGTAA AATTTGGCTG CATCTTCGTG TGACTTCTGC TCACTATAAA GCGGT'TrTC TTGGACTGCT TCTATAAAGG GAACCCCTTG TGACCACCA CAACCGAAAC AGTGATAAAA TGT~TTrCA CCATGAAAAG GACAGAGCCC AATCACATCT CCTATGACTT CCACAATGTT
TTCACGATAC
AATCCAAAAA
CTCTTCGCCC
CATATCATAA
AACCTCAATT
GTACTCCTCG
CTGCTTG'rCC 'rAGATAGTTC GGCATTrGTTT 1069 TTGTCAACC ATACACAATA CCTCCATGT'r ATCATAGTTT ACIrTATATA GTATACTTTA tTTTCAGAAAA AAAGTAAACC ATTTCACTCA TTTCCCTAC ?ITTATTCAAA GAG7TGATAA TAATCAGAGA TTTTCATTNWT TCTrTrCATGA CAATCAAGCG ATAAGGGCTG TTAGCTGATC CTAGTCTTTG GATCCAGGGC AAGGT'rGCCA TCAAGAGACT TGCTTTTTCT 'rCTTGCTTTA AATCTTGGAT AATTCGTCCT ATTGCCGTAT TTGAGAGCAT CTTCCATATG ATGAGTAATC TTTCTTAACA AATTCATCTG TCAATTCCAT CAAAGCAACA AGCAGTATGC TCATCT;AACA GGAGTAATTC AGGTCGCC CAAAGCCTGT CrTTTGTCCAC CTGATAAGAA CTCAATCGGT ACCATTTCCT ACTT'rCAA TGGTTGCCTG APAATTCATCC 3240 3300 3360 3420 3480 3540 3600 a a a. a a a. a.
CTATTCAAGT
TTATAGCTAG
AAAAGATT'TT
CGAGACAGGT
CGGATAGTTC
CCAGCACCAT
TCATTT1AAAA
AATTCTACAA
TTGGAATCAT
TA.AAGCCAAG
TAGTAACCAA
TTGCAAGCCC
GAGCAATGAG
CCATGCGTCC
TATAGGCTTG
GTTTCTCAAG
TCAAGCGTCG
CAGCGACCGT
ACTTGGCACG
CACTAGTTAG
TTCCGCCCAA
TAATCTTTC
TTGCTGTCAT
GAGGCAGACT
TGCGATAACT
ACGC-TCTGCC
CACAACGATA
GGCACCTGCA
ACTATGAATC
TCCGAGTTTA
TGGTAACAAT
CATACGGGGA
CTTCTCGGGT
TGATA6AGGTC
AATCGTGATA
TTCATCAAAG
TTGCTTAACT
GCTAAAATCA
GCCCACACTA
AAGCTCAAAC
ACCCCGATCC
AGGGCAATCA
CCGAA.ACTTC
GTGTCCA6AGA CCACGCTTTT CACCACGAAA CTTGGCGATT GCTGTCCCCA TCTTTGGATC TTGGAAGACA GAAAACTTAG TGAGATCTTC ACCTAAAATA CCTGCTATAG TGTTAAAGAG AGTTGATTTr AAGTCCCGTT CAAAAATT'rC TAAGGAAACA CCATTTTTAA CGATTTTGGT TGCATTTTTT TGGCTCCTTT CAAGATTGTT TGCTTAAATG AGGCACTGTA TAAACGAAGG TAACTTGTAT AAAA?1'GATA AGCGATAGAA CCTACAACCA TCTTGAAAAT AACTTCTCCA ATAATCAA.AC CTCGAGACAC ATCGGCATAA CCVrCTTGCT CACCATTTGA TAAGACCAAG CCCATGAGCT TAGCCATATC AGGATTATCC CC'rGTAGCAA AAAAGAGCAT GAGAGCAATA ACAATACTCA AATCCGAATC AAAAGGCAAA ACATCCTGAA CACGTCCCAT AATCAAGAGC ATGATTGAGTI GCAAGGTTGG GATCTTCCCT TTTGTATA;A CTGCTCCTAC AGCAACAAGT GTCGCTAAAA C-AGCAACAGC TCCCCCAAGA GGGAAGGAAC TCCTAA6ATGT CATAAAGATT CCCAGACCTA TGGAAACAAT CATATTTTAT TTAATCCTTT CAAAGATGAG ACCTGTCAAG AGTTGATTCA TTTGCT'rGGT TCCAAGCAGG CCTAAATTCG
GACAAGAAGT
GAAGGCCTGC
ATGGGTTCAC
CATCACCAAA
TGCCATTCCA
GCCTTCGTT
ATCCCTCAGA
GCCAAACAAC
ATCAAAGTGA
CTTCTGTCGT CATATCTGGA A6AGTTTAAAA GAATAGCCCA GACAAATCCT TGAGAAATAA 4920 CTATATTCAT CTrTTTAAAA AATGGGAAGA GACTTGTCCT GCTI'CTTTGA GAACAGACTC 1070 GTCTCCTCCT CCCTACCTTA AGGAATAGTA ATACCTAGTT AAAGACATTG ACTGGGGTAT ACCTGTTGCC ACACCAAGGT TACCATAGCT GTCGCACTGG TTTr TTATTG ATGACTGACT TGCACCTTTC AAGACTTGCA AATTACAACT GATGCCAAAC T1TTCTTAGAA CT'N'GATTGC AAT'rGGAACC CAAATAGCAT ATTTGTTGAA GGAACTGCAA
AAATTC
TACCAGT'rGA
CAATCATT
CACCTACTTC
TTrATTCGAT
CTTGTGCTAT
CGGCTGGTTr-
CATGTTGGTC
GATAAATTGG
TGGTGTTATC
AGGCAATTrC
CATAAGCCTT
4980 5040 5100 5160 5220 5280 5340 5400 5406 TAGAGACAAC CGTTGGAAAT CCTGATGCAA CTACCTTGCT AGTCATAACA GTG.ACAGTTG ATGTTTCCAC TGTCAGACCT GCCTTTTCAG 0 *0 06 0 0.0 0 *0 0.
INFORMATION FOR SEQ ID NO: 167:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9711 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167:
CAGCTTGCTC TTACTATTAT AGCAGATGTT ATAGCTGGAA TTATCTTGTA TTTCGTCTGC AAATGGCTAG ATGGTAAGAA GTAGACCGAA TGACTAGCCT ATAAACACCC GTTAAATCGC TAAGATACGT CAAAAAAGCC CTTAACTATG GCACI'AGTTA GGGGCTTTGG TGTTCTAATG AACCTTATAC ACTAACTACA TTCTAGCATA TAAGCCCAGA TATTTCAAGA GTTTTATTTA TTGTTTAAAG TTCTGAAAGG TCTATAATGA AGTTAGCCAT CTAGTATCAA AAAACCGACT AGCTCTTATG AACTAGTCGA TTTCTCATCA ATGCGCCAAC ATTTCTTGGG CGATTTCTTG GCCAGATAGG TTATCTGGGT AGTAGGTTGG CCAGTTGTCC ATTTCTTCAA AGAGGGCTTC TTGGCTTGTG CCTCCAAAGA AGATATGGAA ATGTTCTGCC TTAACTGGGG CAACATTG' GTCACTAAAC TGAACATACT TGAATTGTCC AGCGTCAGCA TCTGTGGCTT CAAAGAGGAA
ACGCACGCCA
GTATTTCTTG
AGTCACATCT
ACCAGTCAAC
ATAAACTGAT
CGATTGCCTT TCTTGTAAGT CAAAATTTTC TTACCGACAT ACTTGTAAGT CTTTGTCCAC CTTGAACAAA TTCCATAGTA TTATCAGTAA TGTrAATCTT GTATGATAGC CTTTTGTATA. GTAAGCCTTG TACTCAGCCT GGGTCATCTT TTAGCCTTGT AGTCAAAGAC TTGGTCAAAC GTGCCGTCTT CAAGGAAAGG TGCCAGTTAC CTGCATAGTC ACTCAAGGTG CGGTCCTTGA CAGCTGCATC CTCGAAGTAA CCATTTTGGA CTGTCTTGGT ATCCTCTGCC TTTTCAGGTT CAATTGCTGG 0 900 1071
GCCI'CTTGG
GTrTTCC-A
TITTGACACCT
'rCTGTTGTTT
GCCTTGGTGT
GCTTCTTTTG
GTTTCAAAGC
CCTCTTCTGT
AAAGTGTT
CTTGAGGT
CAGACTTCT
AGCAAGGGCT
TTCTCCATCA CGGAAATGTA AAAGGATTGA GGACATCAGT TGTGAGGCAT TTCTTCAAAA TAGATATAGG CGATTTTATT TTTCTTGACA TACTCTGTCA ATTCTGCCAA CGCGAGCAGCT GATGGCTCTG CATCTGGAGA AAGTCCTGAG ATTGCGACTT G'I-r'GAGTCC ATAGTCCAAG
GCAAGATAGT
CCT'rCTGCGT
TCAAACGTCT
TAAAGGCTGC
AAGCCTTATC
CTTTrATC
GTGTTGAGTC
CAAGGCTTGC
AGGATAATCT
TGATAACCA.A
TCCCTCTTCT
ACAAAGCI'CT
AATTTTTCGA
GC1'GACAAGC
ACATGGGGGT
TCCTCGCCAC
AGTTTAATGG CACGAACTGG TCTTCTCCAT GGTCATGGTC CCTGTCGCCT TGATGGTTN' CACTTTTTTC TTATCCAAGG CATGrTTTCCA TGTTTTCATT TTCATAAACG AAGGTATCTG GCCTTGGCAG ATGGTTCGTA TTrCATGAGGT TCTGTCCCAG TTAGCCGTAT CTCCTGCGAC TTGCTT'GGTA AATTCATAGA ATATTGAGTT TACCATC'rGC CTGTTTTTGA 'rTGGAACAAG AGACTGGCTA GTAATAAGCr AATTTTTTrC ACGTT'CGTCT ACTAAACTGA TTAGTA'rAAA GACAGTTACA AAAATAATGG GTTTCTGCAT AGTAGGAAAT GTAAAGTCCT GCTACCATTC GCAAGCAGCA TAACCGATTT AAAGTTTTTC CCCAGACGCA .ACCATAATGG TCGATACCAG AAGAGCTCCT GCTGCAGGAA 'I-I-GTTTTGC TTGAGACAAA TATAGCCC TGCATTCTTC TGTCGCGGAT GTGCTCTACT CAAACTCATG CTGATGACCT CTGGCAAGAG CAACATATCG TATCTAGCAA TTTAGGTACC CATC'rTGGAT TTTGGCAACT CACCGATTAG GAGTTCTACA CAGGGTAAAA GGTTGTCACG CCACTAAAAA CAAGGCACAT CCTATTTGAT AAAACGTCTT TAATACT'rGC ACTT'GCAGGT CCAAAAAGCC AATCGCACTG GGGCAA'rACT AGCTGGCAAG TCATAAGGGC AATAGCCACC GCAAGCCATC CACAAAGGCC AGAGAAAGGT CAAA.ATCAAA CACTGATAGT CACGATCGAA TACCCTTGCT CATGACAATC
CCTGTCACCA
GTATCTTCGT
ACAACCGCCG
CCAAAGAGAT
AGAGAAACAG
TAAACCGTAC
ACAGTTGGAG
TGTTAAAAAG AATGGACATG GTACCAACTG CAAAAGTTAA GATATACATA GGACGAAGAA
CAATGACAAA
ATTGGTCCAA
CCAGACCTGT
GGAGATACTC
AAATCCCCAA
GAGGGAAATG ACCTGTTCTT ACTCATTGAA CTCGAGCTT 'rGACATGAGG ATAGCTGTCC CAGAAAGACC GCCGCAATCA AACCAGACCA AAGGCTACAC
CGATTTCCAT
AGACAATGGC
CTGAAAGTGA
AAAGCTCTTG
AATAGTAGAA
GACGTGGCTA
TACCGGTGAG
AGATAATAAA
AGGGTATCAC TCATCAAACT CTGACGACGC ALAGATGAGGA AGGTTCCCAA AAAAGACTCA TAGCAATAAC CGCCAAAAAG GCGCGTTGTA TAAACTCGTA CTAAGCATGG CCCACCTCCT GGCCATTCTC TTGGTTACCG ACTAGATGAA TATI'GCGATC GGTAATCATC AAAACAGCCT TGCCATGATG TTCATTTTT-A CTTCCTGCAT CCA'rCCCCGT GTCAGAAGCA AACATACGCG CAATTACCGC CAAGCGTrTG TCTCGA'rGTT CCCACATGCC CTCATCATGA GCATTCAAAC GACGGAACCA AAATTCATAG ACCGTACTTG GAAAACCAGC GGCTATTCTC AATTTCTTAC CTTGCGTATT TGGTTGCAGA ATTCCAAGAC TAGCCTTGAT AGTCAAGGTA ACAAATTCCC CACTATCAAC CTTATCATAA TAGAAGGACA AATCCTCTAC ACTAA.AGCAG TCAAAAACCG CTGAATCACT ACTTGTTCAT AGGTTAAAAG TGTATGCTCA 1072 ATGAACA'rTG AAACAACGCC ATGGCGAGTC CGCATAATCC TTAACTTCTT CAGGGTCATG ATGGGCGCTG 'rGGTGCATGA G'TTCGTAAAA TGTCGGCTCG TCTrAGGATAA ACACATCAGG TCGCTGCTTT TGTCCCCCAG ATAGAGACCC
AACTGAGTCC
GCCTTTTCTC
ATTAAAACTG
TGTCTTTGAA
GAGCGTCGTC
ACAATAATTG
CGTAATATAT
TTTTGTTCAT
TGGTGATGGT
CGAGCCAAGT CAGTCAACTG ATAAAAAATC ACACGCGCAT TCCAACATCC CTTCCTTGAC CAAAGACTTA ATGGCCTTGG TTGAGACGAC GGGCCAATTC TGAATTTGTT AAAGATTCCT TGCTCCTGAG TATTGGTCAG GGCCACCTCG CTAGTGCAAT TGATTTTCCG CCTGCAAAAT CACCTCATTC AAAAAAGCAT CTCATATCTG ACTCCTTTCC TTTTAGACTT CTC Tv3TA GACATTTTGT TTACCAGTTA ATTATATCAC AAGCAAAAAA AAACTAGTTT CATTCTTGAA CTCTTCTATA TTATATTATC TCCATCATAA GTCGCCCAAT CTTTGCTCAA AAAGCGCTCA TGGTGTGGGA TTGGATAGGA AAGGATCAAC TGCCTTGTCA ACCAAGGTGA ATGGTGTCCT TCATAAAGAA AGGCTCCCCG AGACTAGCCT 'rGATATGCTC GGATAGCGAC CCGACTTGAC GCAATTTG'rT GAGGAAGATA ATAGCCACCT T'rCCAATGCG TTAGCCGCTC CATTTNCCCC ATATGTTCAA GAACAGGCTC CTCATTATTT GATTTCTCCT TTGGAGTAAA CTGAGTCGCC GGTGCTCCTC AGCGATTGGA CTTTAGAATC TTTAGATGTT TAACTGCCGC CTGACTGACA CTGACAAGAG CATAAGGATA GACCTATTAG GATTTCATC TCATATCCTT TGCTACCTGT AGAGAAAAAT ACTATTCTTT AGAGTCAAGA AAAAACGTGA TATTGAAATT CTI-GACATC TTCAGATGGT ALAGTCGGAGC AAAGCCAACC AACCCAACCA CCGTCCTTAG AAAAATCTGC 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 TATATTGGTA AAACCTTGAC CATAI'CCTCT CGTAGACCAG
AATCGGGTTT
'rGGCGACT'rG
CATCTGCTCA
ACCTTAGAT'r
AGATAGGTAA
TTATAGAAAT
TTTCTAACTG CTAGCGAATC CATAGTTCAT CCATTTTTTA TACAAAACTG TGTTAAA.ACC AGCTTTI'C'G AGAATCCTTT AATTTTCCAT TCCCATCTCA
TTCTGCACCG
TTAACAGGTG
AACTGCAAGT
TTTGTTGGTA
GAATGATAAA
CATTATACTC
AATTTCr'rCA AATCCTTCTT TTATTGGA.AG TATTTTTTTC 1073 AGCATCTGCT TTGACAACAT CTTCTATTGC TAAATACTTA GCTACATGCT TATCGTAGTT AAAGGAAGCT TGGCGTTCAT TAAAACGAGC CTGATAAGAA AACTGGTCTG GCAAGATI'T AACATAGCCT CTAACCGAAA ACTGACCA;A CAATAATTCA ATCAT1TTCAT TGTCTGCTGT AACCAGGTCC TTCATAGCTA CGTTTGGGAA ACTAGCCTGA TCCCCAGATT GATGTTTCAG
CGACAAMTCT
CTGTTGCAGT
AAAACTAGTC
T'rTACTGAAC
CATCTGTTGC
AGGACGGTAC
AAGCCATTCA
4500 4560 4620 4680 4740 4800 TCTTrTACTG CCZAACTTCTG AAGCGAGTCG CTGCATATTG AACTGGTCTC CATThAAATA CACTGAGCTG AGATAACATA ATTCCAAA.AT ATTGGTTAAG CTGCTGGAAG GCTGCTGGAT CACAACTTGT TTATrCTCCA CCATGCAGCT CCCCCCTC
CATAGCCATT
GAACGATTGT AT'rTCTCAGC TAATACCGCA GGATGAGCAC CTAGAGCCAA AGAAGGGAAC AAAACGCACA TTTGGATCAG GCTGTGGTAA 4860 CTAAAAGATA 4920 CGTCAAAACG 4980 ATAGTGCTCT 5040 AACGCTTTTC 5100 AAACCAACAA 5160 GACTTTTTrGA CTTCGCTCCT TAAAACTATC GATAGTAGTA GCCACTGCTG AGCTCCTAGA TTATCATGCA 'rCTCAGTAGG ATAAAAGAAA ATGAGCAGAA ACCAGCGATC AAGACCGGTC CGAAGATCAT CCATAAGCGT TTAAGCATTT TGTAGCTCCA CAATACCAGC tATGATTTTA PTAGCTGTAT TCCAGTCGTC ACGACCAAAC TCTG?1'ACAG GGACACCAAT GTCAAAACGG TTCTCAATCT CCACAATCAA CTCAACCGTT CCCATACTAT CCAAGACACC TGCATCAAAA ACATC'N'CAT CCATCATGTC AGAAACATCT TCCATAAACA ACTCATCAAT AAT'rTCAATA ACTTCTGATT TGATATCCAT ATTTTATTTC CTTTTrATTTT ITAAACCA'rA GATTATTCAA GAATCCAGAA AAGATTAAGA ATGACAACAT GACA.ACATGG AAAGTGACAA CCATGCCAAG CAACTGA.ATC CAGCGATTCT CAGGTAGGGC AGCCTTCCCT GC'rTTTTTCC GTTCCTTAT'r GAGCGTTTTT T'rCTTGCGAA CCCAGGCATC ATTGATGACC AAGCCTAGTC CATGAAAGAG TCCATAGGCG ATATAGTACC AGGTCACACC ATGCCAAAAT CCCATAATCA GCATATTTAC.AATGTAGGCC ATGCTTGAGG TTACATTACG ATTTTAAAG ACTTTCTTTC TGGTTAACAC CATCACCATT CGCATAAAGA CAAAGTCACG GAACCAGAAG GACAGACTCA TATGCCAGCG ATTCCAAAAC TCCTTTAAAT CCCTGATAA AAAGGGCTTG TTAAAGTTGA 'rAGGGCTACG GATTCCCATC AAGTTTGAGA TGGCCAAAGC AAACATAGAA TAACCTGCAA AGTCAAAGAA GAGTTCCAGA CCAAAAGTAT ACATA.ACTGC CAAGGCATAG AGATTAAAGA AGCCACCTGA CTGCAAGGCT AAA='CTTCA GAGGAGGTAG TAACGTCTCT CCTAAAACAT GAGCTAGGAT AAACTTATAC AAAAAGCCCC ACATGATATA GCGGACAGAT TCATCCAGCA TATCCATCAA CTCATCTCGC TCAGGAATAG CCTGATAA'rr TTCATTAAAT 5220 5280 5340 5400 1074 CGCTTAAAGC GATCGATTGG ACCACTCGAG AAAGTTGGCA AAI'CCCAGA GGGTAAAATC CTTAATCACT CCATCTCTCA GAACGAAAGG TCAGGTAAGA AAT'rCCCAAG AACCCAAGCA GCTGGTI'GCA CCTTGACAAA GATAATCGGA AGTAGGGACA ACCCACTTGC CATCCTGCT TM~CGATAA TCTT-GTAGA CAGCAAAGGT AAATACCC.AA GGCAGCTAGT TGATTGGTCT ACAATAAAGA AGAGACTTAC CAACACTTCA TACCAGGCAA CCTATAAAGA TGGGCAAGG'r TGCAGCAATC ACA'rAAACAA GGCTCTAAAT GAGGAAGCTG TTGAAAAAAC 'rCCATCA'rCT ATCCTrTTAT GTCAATCTTT CCAT'rTGGAG T'rAGTGGCAA TAGATGGCAT CATATAGGAC ATCATGATGT CTGTCAGGTC TGAAGAGAAG GAAACGGAGG GCTCGATGAC AAT'rCCAACC 
AAGACTGCGT TCCATTGATA GAAAACTAAC TAAGTAGAAG AAAGCAGGAG CAATATTTCC TTCCACCCAC CAACATGGTG AGCGTTTCTT GAAAAAGAGA AATACTGAGG AT'rGCCATAT CTTATTCACC TCGTT.AATCA ACTCTCTCGG TAAAGGAA'N' T'rCCTTGATG GCCTTGGTAA
TATCGATATC
GATTTTGTAC
GAGACTTGTT
TCTGGAAGTC
CGCCTCTGTG
GATTGTTCAT
CATTTGGCAG
AGCCGATTG
TCGCTCAAAC TGCTCACGAA CACCGTC'r'r' TAAGATGACA TAAGCCAATA CTTGTGGTCC TT-GTTATAGC GCGGTACTGC GACAGCAGAT TCGATAAAGC GAGGTTTTGA GAGACATCTT CTAACTCAAT CATGCGTCCG CCGTAGAGAA GCAAGCCCTC ATAGGCTGGC AGATCTTCAA AC1'CAAAGAA ATAACCTTTT GAAACAGCTG GCCCAGAAAC GCGGTAACCG TTAAACTTAA ATCTGTCATG GTTCCCACAT GGCTTCTGCT GTTTTTTCAG A.ATGATTTCT CCCTGCTCAC 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 TTTATTTCCT TCCTCGTCAA TGATAAAGGT TGGAGAATCA GCCTTGGTAT TAGGCGTTTG AGAGTCGCTA ACATCTCGTC TGTCACGGCA ACTGCTCACA GAGCTACTGT CGCTTCTGTT GGGCCGTkAG CAT'rGATGAT CGCGCAGTTT TTGAGCTGTT TTGACCGTCA ATTCTrCACC T'rCCAGGC-AT TTTCTCACTG TTGAAGTATT CAGACAACAT GTGTTGATGT CCAGATAGCG ATTGGCAATG AAAAGATAGC CCTGAGTGAT GACTGAAGGA AGAGTGAAAA GCGTACCACC AATACATGAC AGACAAGTCA AAAGAATAAG GTCGCTGTGC GTGTCGCAA.A TTCCTTA'rCC GTAATCATCC AGTTTGTAA.A AAATCTGCAC TCCCTTAGGC TTACGAGTCG TACCAGAAGT CATCTCCCTT GACTGGATGC GTGATTTCAT AGT'rATTCCC CCTGAGCTAG ATTTATCATT GGTGTAGAAA CCTGCTCCAA TAATCAAGCT TGGCTCTGCT ACTTCTAAAA TAGCTGAAAC ACGGGCATTT GGGAAACGCT ATCAAAGTAG AAATGCGTGA
GGCCATATCT
CGCAAAGAGT
AAGTGCCAAG
CAGCA=TGC
GCTGAGGAGA
GCAAAGGATG
TGCTTAAACUT
GTCGGTGCCC
GGACGACTCG
TTATCATGTG
AAAGATAATG TAGTAATTAT TTGGGCAAAG GCTTCTTGAA GGGAAAGGCT GAAATGGCAA TCCCTCCAAG GCCGAATGGC *TATCAATTGG AATGTAGGCA TGACCTGACT CATATTCTrG GCCACCAAAA ACAACCACAG 'rGACTGCAGC CAAACTATCC GAATCAGCCT AAACATTATA GACAGGATAG CTAGGC'rGTG TATCTGCTAT TGGTTTATTT GACACAA'rAG GATAAAGCTT CCTTGACCCT GACCAAGATA AAGAAAA'rAC AAGGCTGTCC GACCAAGAAA GAAAAACCAT TCATTTCTGT AATI'TTCGC TT'rCTACCAT TGTACCACTT TATATAGTAT AGATTTCAGC TTATTTAAG GATTATACAG TCCTGCTTCA AXACTCCATT TCAGGAGACA 1075
TAGTCAGCGC
G.AGACTTCTC
TTAAATCGCC
TCTGAGCAAA
GGATTCTCCT
GCTAAAGAAG
GAGGTACAAT
TAAAATAAGA
CTTTTCAATT
TTTTTCTATG
ATGAAGTA-AA
ATGCGC'N'TT
TACAAAGGTT GCCAACATTT AGGCAAGCCT AGTTGGTCAA ATAAGTGTGT TCCTGCCCCA ATGCTCAATG GTTTCAATCA TCAAGTTAAA ATTCATTATA TAAAGCAGCC CTAGAAAGAT TCTTTTCTCT GTTTCATCAA CTGATTCTTA CTAGCTTATT GTTTACCGTA TGT'T CCAAT TATATTTTCA AATAGAGTGA TCTTCCCATA ATAAAACACA CTGATTTTTA AAGACTTTTT 9 9 9 9 9 99 9* 9 9 9 *9 99 9 9 9 99 9.
9 9999 99 99 9 9 9 CAATATCAAG TTTrTTTCAAC AACCACTCTC TCATTTAAAA GACAAATGGC TGATAGCCAA TTAGCAAAAC AAATACTGAA TAACTAAATG ATTTTCCTCr GTTGCGATAA CATACCAACT
ACCTGATACT
TAATCTCGTC
APLACrGATGC
AATGCTAATG
AC'TGTTTrCCT
AAAGCTGAAA
TGATATAAAT TAAAATAGCT TCTATCATCA TAATACCAAA ACTCTCAGTA ATATAGCTCA TAGAAATCAC TTCAAGAACG GAATAGACAT GAAGAAATAC ACTTTCAGGA ACTTCTTTTA 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9711 ATAGAATAGT CAGTGTCACT ATITTCCATAG AAATCATTCA TACCTCTCTC AACTAGATGT CTTTCTTCCT CCTCATGAGG TCAGTTTTAC CTAGATTTCC TCAAAAGGGC AGACTCCTCC ATAATAAAAA CATCTGTGCG CTACAAGAGG AAAAAGAATA AACTTACAAA ACCCCTGACC TTTCTGCTGT TCCAGTATCG CTTGGTGCGT CACACGATTT TTTCTTCI'AG GTGGTTCATA TAAAGTGCTG AAAACAATTC CGTCCTGTTC GAACACATTT 'N1TGGAAAAT
CTTTCCCCCC
TCATGAGCCA
TTTTTCCTCG
TTTCATCTCG
AGGAACAGGA
GGA.ATAGGCA
TCCCACCACG
ACTGTTCTTT
AGATTCAGGT
TAGAGACTAG
AATGCATCAT TAACGACGCT TGACTTTTCT AATCCTAGAA ACAATTTGAG GAGCTGCTTG TGAAGAAAPAA GATGGCGGAA GCGTTTGATT GTTAAAGTTT GGAAGTCACC TCCAGCTAGA TGTTTGAGAA AAAGATAGAG ATTGTAGGCG ATACAGCTCA TCATCATACG AACTTCGTTT TTGATTAAGG TTGAACTATC CGTTTTATCG CCAAAAAATC CCTCCTTCAT CTCCTTGATG AAATTCTCGG CTTGACCACG TCCACGATAA AGCTGAAACT GGTCTTGGCT GTTCCACTCG TCATATTTGT AACGAGAGAA ATAACATCGT AGAACAAGTA TCCTTCTT C
INFORMATION FOR SEQ ID NO: 168:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3025 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168:
S CCCCTrTGTC AAAACTGTAA AATTAACGAC TCAACAATTC GGAAAACAAA AACAAATTGA CCTCTGTCAA AACTGCTATA AACAATAGCC TCTTCAAAGG TATGACGGAT CTGAACAATC GATTTCTTCA ATGATCTAAA CAATTTCAGA CCTTCTAGCA ACCCAATCAG GTGGAGGTTA CGGTGGAAAC GGCGGTTATG GCTCAAACTC CGCCACCTAG CCAAGAAAAA GGCCTGCTGG ACTGAAATTG CCCGTCGTGG AGACATGAC CCCGTTATTG CGTGTCATCG AGATTCTCAA TCGTAGAACC AAGAATAATC GGTGTCGGAA AA.ACGGCCGT TGTCGAAGGT CTAGCTCAGA CCACATAAAC TCCAAGGTAA ACAAGTCATC CGTCTGGATG ACGGGGATTC GAGGACAATT TGAAGAACGC ATGCAAAAAC CGTGAAGACA TCATCCrCTT TATCGATGAA ATCCATGAAA AGTGATGGTA ATATGGACGC AGGAAATATC CTCAACCCAG CAACTAGTCG GTGCTACTAC CCTCAATGAA TACCGTATCA GAGCGTCGTA TGCAGCCTGT TAAAGTCGAT GAACCAACGG CTCAAAGGGA TTCAAAAGAA ATACGAAGAT TACCACCACG ATTGAAGCAG CTGCAACTCT TTCCAATCGC TACATCCAAG GCCATI'GACC TCCTAGATGA AGCTGGTTCT AAGATGAACT CCTAAAGTAA TTGATCAGCG CTTGATI'GAG GCTGAAAATC GAAGAAGATT TTGAGAAGGC GGCCTACTTC CGCGACCAGA CAAAAGAAAA. AGATCACAGA CCAGGATACT CCTAGCATCA ATTATCGAGC AGAAAACCAA TATCCCTGTT GGTGA'rGA ATCTrTACAC CAATCTCAAT AGAI'TATCAA AACAGATCCT GTGACTTCGA TCCC.TTTGGT ATACTCCTCC TATTCCCCCA GTTCCCAAAA TCGTGGATCT AAGAATTTGG TATTAATGTA GGCGCGACGA TGAGATTATC CTGTCCTTAT CGGTGAACCT AAATTGTCGA TGGCGATGTG TGGTTAGCTT AGTTCAAGGA TCATGGAAGA AATTCGCAAA TTGTTGGTGC TGGTTCTGCG CCC'I'TGCTCG TGGAGAACTG T'rGAAAAGGA TGCTGCCCTC TGGACGAAAC AATCACTATT TTCAATATAC AGATGCTGCG ATCGCTTCTT GCCTGACAAG TGACCTrGAA TTTTGTGGAT TCAAGTCTCA AGCTACACCA TTGCCAAGTA TAAGGAAATG GCGAGAAAAC TA'rTGAGCAC AAGAGAAAGA ACAATCTCAA GTCAAGATGA TGCAGTCGAT GTACCCCTAA CCGCCCAATC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 CTCATCCATC TAGCCGAAGA TCTCAAGTCT AAGATTGCCA AGGCTATTCG CCGTAATCGT
CATGTTATTG
GTCGGACTTG
GCCAACTGGT GTCGGTAAGA CAGAACTTTC CAAACAACTG GGAAGCTrCC TCTTCGTTGG
GCTATCGAAC
GAAAAACATA
GCTGGTCAAT
GTGGAAAAAG
TTGACAGACG
AATGCAGGTA
ACCAATTC'rG
GATGGCATTA
ATGCTAGCAG
TTT=GGTrC TGCTGATAGT ATGATTCGCT TTGATATGAG TGAATACATG GTGTGGCTAA GTTGGTCGGC GCTCCTCCAG GTrATGTrGG CTATGATGAG TAAC'rGAAAA AGTTCGCCAC AATCCATATT CTC!TCATCCT TCTCGATGAA CTCACCCAGA TGTrATGCAC ATGTT'rCTTC AAGTCTrGGA CGATGGTCGT GGCAAGGACG CACCG1'TAGC TTCAAGGATG CCATCATTAT CATGACCTCA CAGGAAAGAC CGAAGCTAGC GTTGGATTTG GTGCTGCTAG AGAAGGACGT TCCTCGGTGA ACTCGGTAAC TTCTIrAGCC CAGAGTTTAT GAACCGTTTT TCGAATTTAA GGCTCTCAGC AAGGATAACC TCCTTCAGAT TGTCGAGCTC 1500 1560 1620 1680 1740 1800 1860 1920 ATGTTAACAA GCGCCTCTCT AGCAACAACA TTCGTTTGGA AAGGTCAAGG AAAAGTTGGT TGACCTAGGT TATGATCCAA AAATGGGAGC
CGTCGGACTA
AGCGAAAAAG
TTCAAGACTA TATTGAGGAC ACAATCACTG ACTACTACCT ATCTCAAAGC AGTTATGACT AGCAAGGGAA ACATTCAGAT TGTAACTGAT 2040 ACGCCCAcTT 2100 TGAAAATCCA 2160 TAAATCTGCC 2220 AAGGAGTAGA 2280 ACAGCTTGCC 2340 ATTGCTAAAA 2400 TTCAATCTGC 2460 S S *5SS
A.AAAAAGCTG AAGTT'AAAAG. TTCTGAAAAA GAAAAATAAA AAATGAAATT TTTCTGCTTC TTrTTTTACT AAAATAACTG CTTTGTCCAT TATGATATAT AGTAGACTGA ATCTGAAATA CAT'rTATAGA ATTAATTTI ACTTTCCCAA TCGATTTrGTT TATAGTCAAT TGAAACAAGA ACAAGACAAA AGAGCCTCAT AATACCTTTT TGAGGTGCTT T'rTGATATGA GCCCATGTTT GGTGAGTAGG GAGGAAGAGG TAAAAGTTTA TACCCAAACT TTACCCATTC TATGGAATC'r TGCATATCC ATAATAATAA GGTAAGAGAA ACTTCTGAAA CCAAGCTTCA AAAAAGTCGC GTCATTGGAG CGATrAACTC ACCATTCATT TGTTAGACCT ATATCTTCTT CCAGATACTT TGCCTCrTCT TAACTGACCTr TCGATAAAAA TAAGTATCGA ATCCTGTTTC GTCAATC'rAA ACTATTAAAA TTCTrAAGAA ATAAGGCTAC TTTTTCTGGG GTTCTTTTTT TTTTCGAGrG TAGCC INFORMATION FOR SEQ ID NO: 169: SEQUENCE CHARACTERISTICS: LENGTH: 4104 base pairs AAAAGGTATT GCAACTTGGT TCTCAATAGG ATTGTACTCA CTTCACACAA GAGTTCTAAC CCGATGGTGT GGTTAATGTT TCGTCATCGT CTCTTCGTAA GCAACCAAAG AAATTCTCTG TTTAATGAGC GACCATATTC ACAGGTGCTA GGTGCT'rTAA TCTTGTTCAT AGTAGGTGTA
TCCTATAAAA
TAATTTCTTG
GTACGAAACA
CTCATCTTAT
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169:
TTITAAGGTTT TAAAAAAAGT TTCGAAAGG TTTCTTCTTT ATN'TPTTAAG GGAGAGATAA CGTTGATATC TAAATCGTGG TCAAAGCCGG CAATTTT'rCC TTTAGATGTG TATTGGTGAA
TATCATAATC
CGGAATCCAA
ACCAACGTAG
GACACCTTCG
GCTGTAAGGA
TTTTCCAGCT
ATGACTCTTA
TTGAGCACCA
GTTGATTTCC
CAAAGCCTGT
AAGTGGGCGA
TGGATGAATT
AAATGGGTTA
AATATGGTAT
TTATTTTAGC
TAAATCAGTT TTAGGACTGC TCTCCAAAA-A ACAGAGGTA.A ACTTGCCTGT ATCAATACTG ATGCCGATGT TTTTAGCACC CAGTGATGCT TTCATATTAG ACATGGTTTT GTCT"TCCACG GAGGCAGCAT TGTAGAAAAC TTCGGCAGCC ACATAAGCGT AGACAGCAAC TGGGACATTC TAGGCCTTGT CTATTCCATT GATAAATGAA CTGTGAACAC GAACAATAGC ACCTGAAATA TC-AGGACGCT GCCAGCCAGA GAGGTCAATA GCTTCAATCT GTGCTATATT GGAT'rTTGTT TTGATGATTA AAATGAACAT CATAATCCCA TGTTTTrCTCA TATCTTATAA TTCTACCCTA AGGAAGAGAC TTrAGAGCAT T-rTTTCATTC AATAA.AAGGG AATTTCTACA GAAAAGAGAA AGCGGGTAAA GGGACTCGCA TGAA.ATCTGA TCCTGAGTCI' GAGCCGTAGA TGTTCTTCCA TGAAGTAGAC AGTTTTGCTC GAAA7TTTTC TCAAGCCAAT AGTAACTAGG TTTTCCATTT CTrrGGACACT CGCT'rGAA G'PTCAGTGAT GCATCATTTT CTTTTGTCCT T'rTTGTGAGA GGGCATCGTA ATCGGTTTGT CTAAGTGTTT TTAAACGATT GGCTGTCATr AAAAAACTAA ATAAAATAAG AAAATCAAkAA AAAATCAAAA AAGAGTGCGG AATGATTTGA GATI'ATGTCA AATTTTGCCA TTTGCCAAAA GTTTTGCACA TGTGGGAGCT ATCCAACCTG TGAGGAGGTC TTGGCTGGAC TCATGCAGTT ATGATGACAG TGCAGGAGAT ACTCCTTTAA CAATCATAAA AATGTGGCCA ACGA.ATTGTT CGTAATGACA AGATTrTTGAA AAGCAAATCA TTTGTr1'rGAG GCTTTGAAAA AGACGTCATT GGTATTTTCC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 S. *4 S 0 S. 55 S S
AGGTTGCGGG TATTTCTATG TTGGAACATG TTTTCCGTAG AAAAGACAGT AACAGTTGTA GGACACAAGG CAGAATTGGT AGACAGAATT TGTGACTCAA TCTGAACAGT TGGGAACTGG AGCCTATCTT AGAAGGTTTG TCAGGACACA CCTTGGTCAT TCACTGGTGA AAGCTTGAAA AACTTGATTG ATTTCCATAT CTATCTTGAC TGCTGAAACG GATAATCCTT TTGGTTATGG ATGCTGAGGT TCTTCGTATr GTTGAGCAGA AGGATGCTAC AGGAAATCAA CACTGGAACA TACGTCTTTG ACAACGAGCG ATATCAATAC CAATAACGCT CAAGGCGAAT ACTATATTAC 1079 GTGAAACTGG TGAAAA6AGTT1 GGCGCTTATA CTTTGAAAGA TAAATGACCG TGTGG3CGCTT GcaAcArcTG AGTCAGTTAT AACACATGGT CAACGGGT AGCI-rGTC-A ATCCAGAAGC TTGAGATTGC rrCGGAAGTT CAAATCGAAG CCAATOTTAC TTGGTGCTGA GACTGTrC ACAAACGGTA CTTATGTAGT GAGCGGTCAT TACCAATTCT ATGATTGAGG AAACTAGTGT GTCCTTATGC TCACATrCGT CCAAATTCAA GTCTGGGTGC TTGTTGAGGT GAAAGGATCT TCAATCGGTG AGAATACCAA TTTTGATGAA AGTCTTGGGG GCGTCGTCGC ATCAATCATA AACTTATATC GATATTGATG CTTGAAAGGG CAAACGAAAA GGACAGCACT ATCGGAGCAG 'rGCAGACCC'r GTGATAG'rCG CCAAGTTCAT ATTGGTAACT GGCTGGTCAT TTGACTTATA TCGGAAACTG TCAAGTGGGA AGCAACGTTA ATTTCGGTGC TGGAACTA'TT .0 .*0 0 0 0 0 0 0000 .0.
0 ATGACGGCAA AA.ACAAATAC CAACCATTAT TGC-ACCAGTA TTACTAAAGA CGTGCCAGCA ACGAATATGC AACACGTCTT TTGAAGAAAA A.ACGCTTAGC AACACAGTCA 'rTGGAAACAA TGTCTrrGTT GAACTTrGGTG ACAAT'rCCCT CGTTGGTGCT
AAGATCAGGT
ATGGGGCTGT
ACCGCAAAGC
AAAACACAC
'rGAATTACCA
CTGTGTTTTA
TATCGAGC
CCCTGTGGCA
GATGCTATTG
CCTCATCATC
CGAAAAGAAA
GAAGGCAAGG
GCAGTAACGG
GTCTCTTACG
GCTGCCCTTC
GATTTTTATT
CTATTGGTCG CGGTCGTCAG CTrAAGAACCA GTAGGAGCCT TC'rATCAAGG ACCAATATTT GAACTGCCCA ACGGGATTTG ATGAACAAAA ACTTATCTTG AAATTCCAGC CGGAAAATTG GTGAATTAGA GGA.AGAAACA CAGCTATTGG CTTTTGTAAT
ACAGTCAACT
GGTTCAAATT
GGTTCAACTA
ATCAATAAAG
ATCATGGAGT
AAACTGGTCC
ATTTTCCACA
GTCAAGCAGT
GAAGTAGGAG
GCCTATACAG
GAGAAGTTAA
1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 GGAAAT'rAGA ACTCTTGTAC AACTATATTT AGCAAGCGAT T'rGACAAAAG TGGAAAATrCC AAACCTTGGA AGTCCTTGAA GTGAGCTTAG AAGAAGCGAA ATATCTGTGA TGCCAAGACA ATTATGGCTG TTCAGTAT'rG GGAGGTCAGT A'rGGCTAAAT CTITATTAAC GGATGAAATG CGAAAAAATI' TCAGG'rCCTC CTTTGCTAGA TGATAATGAG CTCTTCTTCC CGTN'TGGTT ATGCCAATCC TAAGGA'rCAT GCGTCCGCAG GATGAGGATG AGAATTAATC CAATCAGGTC GGAGTTGCAG AAAAAATAGA ATT'GAAAGAG 01'AATAGAGG GAAACTAAGA TTTTACCAAC GGTTTTAGCC AGGAAACCTT GAAGATTCAC GTCGAACCAT CTAI-rCATAA AAGCCGTCGT ATTGAAAATA CCAAGAGAAA TGTCTTCAAT TCTAAGTTGA ATAAAATCI-r A'rTTGCGGTC ATCTTTCTCT TGATTTTGCT TGTTTTAGCA ATGAAACTTT TGTAATAGAA AAGGAAT'rGA AA'rCAAAATA GGAATTATTG CTGCTATGCC AGAAGAACTG GCTATCTGG TCCAGCATTr AGATAATGCC CAGGAGCAAG 1080 TTGTTTTrGG GAATACCTAT CATACAGGAA CCATTGCTTC TCATGAAGTC AAAGTGGAAT TGGTAAGGTC ATGTCTGCTA TGAGTGTGGC GA1TTrTGGCT AGGTGGATGC CCTTAT'rAAT ACGGGTTCAG CTGGGGCAG'T AGCAGAAGGT GGGATGTCGT GATTGCTGAC AAATTAGCCT ATCATGACGT GGATGTCACA ATGCTrATGG ACAAATGGCG CAACAACCGC CTCAAATCCA, AAAGAGTrTA TCTCAATTGG CAGGAGATAG TTTTGTTGCA GGAAATGACA AAGTTTrAGC CGTGGAGATG GAGGGGGCAG TCCCAGTCTT AGTCATCCGA GCTATGAGTG TTGATGAGTT TATTATCGAA GCTGGACGTC AGGCTTTAGA TTAAGCGGAA ATTTGACAGT GAAAAGCTAG AAAACGTTTC AGAGGATATT GCAGAGGTCT TAGGATTATC TCGCCAAGCA.
GAAGACACAG ATAAAAATGA CAAG TTTAT'rTCGA ATCAGACAAA ACCAAAACTG GCATCTTGGT AGATAGAAGC GATI'AAGTCC CTATTGCTCA AGCAGCGCAT ACAATGCCAA CCATGAAGCA GCTCTGCCCA AGTCTTGTTG TTrTCTAGCT TATGATAAGA ATGAGTATTG AAATGACCGT ATCAATA.ACC GTGTCAAAGA GTTCTrGTAG 3360 GA'rCATTTCC 3420 ATCGCTGT1'G 3480 GCT7'TTGGCT 3540 ACCT'TTGTTG 3600 T'rGATTGCTA 3660 CATTTCCCAG 3720 GCCCTCAATC 3780 AACATCTTN'I 3840 ACCTTTTTGA 3900 T'rTAAGTAAA 3960 CAGTGAGATT 4020 ATTACCAGAA 4080 4104
INFORMATION FOR SEQ ID NO: 170:
SEQUENCE CHARACTERISTICS:
LENGTH: 8876 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:
CACGGATAGG CTCGGCTTTC ATCAGTCCTC AGGCTGATTT ACAAAGTCCA CAGCGATACG GCAGGATTGG CAACGATAGA AAACCGTAAT CCATCCACTG GCCT'N'CCCT TGAAAATAGA AACGTCCAAC CCCAGTTCCT GAAAAATCGC CCACCTGTCG TCTGCCGAAG TCTTCTAGTG GGAGAGACAG ATTAACTTTT AATCTCCAAC TGCCACCACT TnTGGGTATC
GCTTGATTGG
CTT'PCAACAG
GGTGATA.ATG
AATCCTACC
CTTGGAGTTA
TTCCI'TAAAA
ACAGAGTTCA
ACTAATAGCA ACTr-rCCTCG TTACGCTGAT ACCTTTGCTG CTATTGGGCA AGGATGGTAC TCCCGATCCT TGTGTTGATA TGTCGGACA.A TTTTCCTAAA TCCAAATGCC CATCTTTGGG TGGACATGAT GGATAAAAGG GTCAGTTTCA TTTGGAGCTA CTTAACCCAA ACGCTCACCA CTTGCGACTC ATATGGGAAA GATAAACCTT GGGATTAA.A.A AACGTAGACG CCTATTCCAC AAACCTGCTT GACGTAATCA TCACGTTCAG TATCAA7 T= TGACTAGGGT GTTTGATAAA TATGGATGAG ACCGTCCTCA TGAATTCCG.A TACCCACCAC TCCTTCTAGC TTTGTCCAA GGCGAaCACA GGTGCGTCAA AGGAATCACG AATGATATCT TTAAGAGTTT CTGGACCAAG AAGCGACTTG AGTGCTTT- GGGC1-rCTTC GAGTTCCTTA ACTGCAGTGT AATTCTCTGG TATCAACAAA AGCACCGAAA TCAACAACGT CCACTAAGTC CrTTATATCT AGGACATCTT GAAATCTCGA CCTGGTTTGA GAAGATCTGC GTCTAACTCT TGCGCCAT CCTTGACTGA GTTTAGGTCT T1TAATATCTA AACGT~rGAA GTGAACTCCI' GTATTATCAA GGATA'rrGCT CTGCTCAAAG GCCTTGGCTC CCAGACGAGG
ACTTTCAGGG
AACTTTCTTG
ATTTTCAGAG
ATTGACATTG
TAGTTTCrTC
TTTGACCAAT
ATACGAAGGA AACCAGCAGC ATTTGGGCGC GTGAAGTGAT ATAGTTTTGT TGAGTCCAGC ACACCAACTT GGTTAACCAC TGACTGACAT CGTGTTGGTA TCCGCAAGAC GATCTTGCAA TTTTCCTTCT TCCTCGCGGT TACGTGTGAA AGAAGAGCTG TGTATCGACA ACAAAGTCCA TGACCGACA CCAATTGACT ACGACGGGCG ATAGAAATGG
ATTTGACAAT
GGCTA GCTGT
GACTCTCAGA
TAGGA'rCGAT
CAGAGCGTTT'
AGACAGAACC
CTTCCGCTAC
TTCAACGGTC AAGTCTGGAA ACTCCTCACG AGCAAGTTCG ACCACr'rTCA TTAACGATAA CATAGCTGAC TTCAGGGAA.A AAAAGCTTCA CTTTCACGAC TGGCCGTTCC ATTTCCAATG TTGACCAATT AAATCTGCTA AATCTTTCTT GGCTTCTTCG TTTAACAGGA TAAATAACCT GAGTTGTCAG CATTTTTCCT
CTGGCAGAAT
TCTTTCAGAA
CTTGGCACCT GTACGAAAGG AACCAAGAGG AGATTGCGCA CTCAGTTAAT TCTGTCCGAA TTGCTGAACA ACTTCATCAA AAGAATACGG TCCGTCGCAT ATTGAGAGCC AAGGTACGAT AATCTGAAAA ACCTGCTTTT CTGTCTCAGC ACTTCC'rGAT GACCAAAA'rA TCAACTGCAC GAACTTTTCA GCTTC-TT AAAGAGTCCA GCTTCACGGG ATAGAGTTCT TCAACGTCTG CTGGGTCA-AA TCCAAGAACC GATT'GTCAGA AAAAAG~rGG TACGACGCTC GATAGCAGGC TATAAGCATT TTTCACCTTG OTTCAAAACC GATCTTCAAG AGCCTTGCAT AGTTCCAACT GCAATAATCT CTACACCGTA ATTTGACGAG CTGATGCTGG GT'rGCATCCA CGACAGCTAG ACGCGCCCrT TCAGTGGAGC ATAGCTCCTT CTTCAGCTTT AAGACCTTTT TCTTAACGGA AAACGAGTAG CAAAGAAGGC ACACCAAGTT TCTCCCCACG GTCTCTGAAA AATCATAATA GCTrGAGAAG TAAGTTTAGA 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 TTCATCCTTG CATCAAGACT
AAGTCATAGA
CGGTCAACCC
ACGCAAGGTC ACATCTTCCG TTCCTTGCCA GTCGCAAATC CTAAGTCAAC TATATTCTGC AAAATCAAGC CAATGGTTGC CTTGGTACGA CGCTTTTCCT CTAATTTTTC GGCAAC'rAAG ATAGCTTCTT
ATAAGGCTTC
CTTCACAGAC
GAGCAAGAGG
TATAAGGAAG
CCAATTCCTrr GGTCAACTTA CC'TTGTTCTT GAATCT'rAGC TGTCAGACTT TTATCCAAAT CAATAATAGC CATGTCCrrTG CGATAACGCG CGATAA6AGGG AACGGTATCA ATTTGCTTTA ACGTCAC1'CC 1082 TAAGACAGCT TCCTTACGGT CATTGAGArTr CTTAATCGCC ACCTCATCCA GACTACCAGT AATAGTCGCC CCTTCAGCTG TCAAACTTAG CAAATCCTGA GAGA'TTTTT CATATTTT ATCCATAAAT CTATTATACC ACAAGCTAAA CGT'rTC.AAAT TAACTCGTAG AGCGCA6ATAA CTTGCACTTA AAATATGTAG GAAATAGATT TATATGCTAC CCACCT?'T TTAACCA.AGC CA'rGATATCA AGTTCTGGAA GATGTrCCTG ACCTGGAATA CGATTGACTA AAATTAATTT ATTAGAATAA TCACTAGCTG AATCTTTA'rr ATCAAATTTA TTTCTTTTNT CCCACTTAGT TCG'rGCTTCT TCTATATCTC CTAAGTGCCC CAAAGGATA.A 'rCCTCTTCGA AAACAAGTTC TTGTTCCATA TA.ATCATCAG GAAGAATAAA TAAACCAACA.
PTATAAATTA ATCCTCCAA:C ACAATTATTA TCAAGAAAAG GGAAAATTCC TTTCTCTGCA TCTAAATATT CACCGTCATT TTTTAACCAA TGTTTTTTCG CTCTTTCTAT ATCATAG3T TCATAATTTT CTAATATGAT TT'rGTAGGAG TTATAGATAT ATGTCGGGAA AITrAATATAG CAAATAAGGT TGTTTATATC AAATTGAT'N' TTTCCAAATA ATGAGCGATA TTGTTTTCTA AA6AGTATTTA ATGGATCAGA CATAATAGCC ACACATTGCAC TT'mCAAATT TTTATATGGA GGAAGATTAT CCATCTTATT TAAAATCT AAATAAAGAT TATTCCAATT TATGCGTTrT TCAATACTAG AATAATGTAG AAAATGAATA ACTTCATGAG TCCAGCTCGG TGAAATAAGT TAATAACGAA AATGCTTTGT AAGTTTATAA AACATr'rAAA
AAGACCATTG
AAAGGTGTTC TATATTGAAA CTTATAATCG TAAA.ATCTAA GCTATTAACT TATCATAAAC GCACTAAAAT TrTGCCAATTC TCTAAGACGG CGCAATCTTT TCTTTTAGAG GTTTAGCATC GTTOCAGTTT1 TAGAGTGAAT ATTTTTCGTA AAAGCTTACT ATTCGATGAT CTGTATCATC ACCAAGC3'CT TCTrATCAAGC
AATATCAGAA
TTGAATATAT
GA'IrCTATTT
TATAACAGGT
ATAAAGTCTC
ATTGAATAAT
CATCTTTTGT
2400 2460 .2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 AAAACTTGAA CATTCGTT-AA ATTTTCTGTC A.ACCAATTAT TAAAATACTC CATCAACCAA ATCAGCAAAA TGACCAAGAA T'rTATCGCAT GATACA'rCTT TTCAAATGTC CALATCAAATA AACGTAATAT AATCTCCTGT AATCATATCA GACAACTCAG ATCTTAATAT TAAATGATAG ATTCATCTGT TGGCTAATGG TGATTTACAA TAATAAC'rTC TATATCTTTT AATGTTrGTC TCTAA'rAAAC TATTTTTATC 'rCCTTGATGT AACAAAACAA TTGACTACCT CCCATAATT TCTGATAATG AT'TTTC'rTT CCCCCCAAA-A AGGATKAAAkG CAACATCAGA ATCGGATAAT ATGAATCATT TGAAGATAGA CAAAAGAATT C'TCATCTATA AAGCTATCTC CTCTGTAGAT TCTCCACTAT 'rGACAAAGAC CACTAATTGA GTAACTCAGT TAT'AATTA TAGCACAAT'r ATGATATATA TCAGGTAATA AACTTTTCCC TTTTCCGCAA ACCTATGGCT GTCAAATTTA TAAACTCCCT GATTTGAAAC AGAAAAGAAA AACTAGATGA AGCCATTTTT CTCTCCATTA
TCAAGCITATA
AATAATAGTA
CAAAACGAGA
AAGTTACTTT
CTGCTTTTCA
TCATNTGAAGC
?1'ATCTCTTA TAATAGAGG1'
CG.ACTTGGAC
CCCTGATGAC
ACAACTCCCA
CCTTCCCTTC
TGACAAGGTT
TGTCGG'rATA GCTACTCAAT TTCAAATTT AGAATCTAGA A'rCGAGGTAC AAGATGTTTG AAGAGTTTCC AAAGAGAAAA AAGTCAAAGC TCTAGTGTAC TTCAAACTGG C'N'CTGATAG GAAGCATTGT TATCATTTTC TCCCTCGAAA CTTTTCCCTT CrGTGAATG CTCAGGGCTG AITGAAGTTT ATATCACACC TCGTTGGGGG AGAATCTTTT' TTGGGACCT TGGAATCGTC CCCATCATCA TCCTT'rTCTT GTGACAGCAC CTTTGGCAAC TCCT'rCCATG AATACTAGGA AT7=rCTAG GGCTTGTCAT GAGCATGATT TGTGCAGGCC ATTGATGAAT TGCTTCTATA ATACAGGTCT TCTTTTTGCC ATCCTGCTCT GGATGCCTTT ATAGGTC'r' TCTCGTCATT GGTCCAATGC AGCACGATTT ATCAGTCACT CTTGATTGGA GT'rATCCTAT ATTTACCTCC ATCTGTCGGG GCCTATATCT CCATGGTGCT AAGCAAGTCA AAACCCACAG CTrCTGGCTA TTCCACTTGr ACTGTTTCTG CTAAAGGTT'A ACAAGCGAAG GGACGACAAG TCAGCCTATG AAA.AGGAAAT ATCGTTr'rCT
CTGTTATCAA
TCGCCCTATT
GATTTTTCTG
TTTCTTACTT
TTTTGATAC
ACGT'rCCGAC 'rGATGATTTT CTCT'rCTCTC
TGGATATCAA
GGAAAAAAA G TTCCAAGTT ACACGGCCGT TCCCATTGT'r CI-rrTTGCGA CCTATTCTGC
ACGAGCTCTG
GCAAGAACCG
GAGTTCTGCA
GGGGCGTTAT
TCGGATTCTG
AGCCTTTCTT
GAG'rTTCGGT
AAATATTCTC
GGTTCCATTC TTG'rGGCTGT ATTCAGAAAG AAAATCGTCT AAAAAAGTTT 'rrCAAG'rCTT TTGGTAT'rTG GC'TGCCTCTT ACCTCTATCA GTGCGACCCC CTTTCGCTCT GTAGTGAGGC TTGGCACCAG TTCTGGCCTT ATGATGAAAA ATTACTTGAA 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 TCATAACAAT TGTAACTCTT GTCGTCTTAG TCTATTCTCT GATTCGA'N'T TTAGTTTTAG CTGGCTATTT TGAACTGACT
CAAACTAAAC
TTCTTTTATC
TCATCTGAAC
CATCGGCTTA
TCATTTCCCC
CCAATATTTG
GCGAACGGCG
CAGTACATCA
TTGGCTATCG
AGCCGATTAC
ACTTTCCCAA
CTATCGGAAG
AAACCAGATA
GCGGATAAAT
ATGGAGGCTA
TTTGTCTATA
ACATGCACTA TTCC'rATCTG TTCAATTGTA TATC'rGGATG CCAAGAT,AAC CAGTATTTCT CTGTTAGCTT GGATTCTCAG GAACGGATCT AGCCATTCAG CCAGNCTTA TTTTTCAAAA ACTTATCCCA AGATAGTATT TCTACGACTA TCCAGATGAG ACGACCCCAG TCATGCCAAT CAGATCACTA ATGAAAACTA TATGGAAGTC TTTGAGGGCA AGACAATCCA GTT'TACAGGC AGTCAATTTC TGTTCCGATT TTCrGACCA AGCGCAATAC AAACTGGTCA ATCACTACCA AGCTTTACCA AAGTCG.ATAA 1084 CGGCATTATC CACTGTATCG CAGATTCTGG 'rGTCTATGG.A CCGGCAGTAT GAAAACAACA CTGGATAAC AGCCAAAGGA TAAAGAACTC AAACAAAACC TTCCAACCTT GGAAATCGAC ACCAGAAAAT CCCTATGTAT ATAGAGUCNi TTAAGAAAAT TCTCTTCTGA ATAACAGAAA AAG~rCCTGT TCG7'rT'NG CAAGATAAAA
TTATATGAAA
GCTCATAGAA
ACGANACAAGT
ATTAGTGACT
AATAACAG
GTATAAAGCT CATGGCCAAA TTTACGGCrr CCTCTCTCCA 'rCTCCCACTT CTCTCTTAAA TTTCA1TAATT TATAGCCAAC TTACAGCTTT C'rCCATATTT GAACATCTAA GTCCGAAACA TATCCAAAGG TAAAATCTTA TTACT'rCTTC AAATTCCGAT TCAGACAGAT AGATTCI-TCC CATCfCGAAG ATGCTTTTCA GAAAATCCTT GCTATGATAG TCTACTI'AAT ACTATGC= A ATTCTTCGCG TTTTTTTTCT GGTA'rTCAAT TGAAATGCGT GGAT'rCGCTT AATCAAGTCT CACTAATATT AGACATTGTA CTT'IrAAT'rC CATTCGATTA GATGAAACTA CAAAGAATAA ACTTCCTGCG AAACAAAATA TAACTGATGC ACGATTTAAA TAGCTGTATT AAAAACAGCT TAAGCCTAGA AGACCTTCTT TGTAGATT'rT CATCTTATAC CCACTCATTC ATTAGACTAG GTTT'rCrAAA AAAATAGTAT TGTAGCTTCA TTAGGATAGC AGCT'rGTATT TTTCTCCGTA TCATATCTAT TATACTCAAC TCTGACCAAT GCTTTGCTTC ATTTGAGATT TGATATAArT TCTAAATCTA GATAGCCACC
GCGAAATAAC
TCTACAATTT
TATGGATTTA
GCATTGCTTC
TTCATCTTTT
GGAATCTCAC
TTAATCGTAC
TGTGGGACTA
AGCCTTTTCT
GTCCTCCTAT
GAGCTAAATC
CATT1'TTAA.A
GAAAATAGAC
CCAAACCGCC
GTTCAA.AGAT
CACTTGTTTG
GTGTTGTAGC
GCGCCAGAAT
TAATCATr'rC
TTTCTCTAAG
CATTCCCAGC AATACAAGTA CGATTCmT AGGTGCTTGA CAAAATAGTC TGGCAATTCT GAGGACTAAT AAACAAGGTA GcGGAGTATC GCTTCTATAT ATTCCAGTGA TAAGACTGTC AGATTTTT'CT T'rAGAAGTAA TTTAGTTTCC TCTAACTCTG ATCCAAAAGA ATCAGTTTCT TCCTCCAAGA GAATGGCCTA CCATGATTTC AATTCTGTTT CTGCGAATCT AGTTCTTGAA AATAAAATAT TT'TTTCATTC AGTTGTGATA ATCTGACGCA AGCTACAGCG TAGAGTTCAG ATGTTTTCTG ACAAAAAACG CTTCTCAGTA GTTTCTTTGT CTGTTCTTTT TCTGTAAAAT TTAAATTATG TACTAATACA TGA'rAAGATT TTAGCATTAG TTATGAAGCA AGTAAACAAC CACGACTTTT GAAGAGATAT AGGTGGACGA AAACCTAAAT GCGAGAATAC CGCACTTATG 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 ACTTTAAGAA ATCTTCTCAC TGGTATAGTA GTTCTATGA-A CGTCTTGTTG GTGTTCAGCG TATCAACTTA AACACGCAAA ATGGCCACTC TTCAATATGT AAGAAATTGC GGCTGAVMT GGTATTCACG AAAGCAACTT AATCCGTCGG AGCCAATGGG 1085 ?1-TAAGTAAC
ACACGGTAAT
TGTTCAT
TCTTGTTCAA AGTGGTGTTA CGATTrCAAG GATTGATAGC CA'PrCCCATC AATATCGTAT TTGCGTGGTT TCTGGCTATT AACGATTGAA AACTCCTCTC AGTTCTGAGG CTrTGGACAT AGCCAATAAA ATAACCCACC AACTTATCAA AAATAGAAAT AAAAATCCTA AGATTACTGT CATATCATAA CACTA'rTAAA GTTPAACCCA CTTATCATTA TCCATGATAA AAGGCTTAGC CAGTCCCTCG CCTGTATAAT CCGCATACTr
GGTGCCCAAA TACTTGTAGC TTCGAAAGTC AATTCTCTA GTGGCCTACA AACAGATACT AGCTTTTTCA TTGAATTGAA GACATCTGTA GTTGACTCAG TTGTGAATCA AAGACCI'TC ATTCTTTCA AGCTTTGGAA GCAGTTGGTC CCAACAAAAT ATCGTATTCG GTGTGAGTAA.
GGTATTATAG TCATTAACGA TGATAAGAGA GATTTGACCT TTTCCATGGA ACTrTCGCTG TTGTTGTTGA CAAGCAGCCA TTTrCTTGAT TTCTTCATTT TGCTTTACAG ATATTGATTA AATCTrCCTT ACTAGCAAAT CAAATAAGAA ACCGTCATCA CGCCATCCAA AT'rGTCGTGC ATTG'rGAGAA GAATGCrTCC TTGGAACTCT CGAAAATAGA CTN'ATCAAT CGCATCA'N'A TGGTGACTGA ATTTTTCAAA CGCCCTTCT'r TTGATTCCAG AGGAAGTGAA ATCTCCTGAT GATTAAAAAA TGCATCAACA CTTCTGTAC'r TACCTGGTTG AACTGCTTTG CCTTTGATTC AGCCTAAAAA CAAGGCTGAA CTTTCTCCTA AATGTCTTGG CTTTCTCATT TAATGTGTTC TTAATCGCTT GGTAGGGCTC GCAGGTACTA AGACCCCAAC AAGACTACAG ACAGCATTCG ATCTTTTCAG CGTGAACCTT ATATCAAACT CTTCCTTATC TCTAGGAAAA GCAACTGGTC AGACAATAAC TATTGATACG AGATGACTGA TT'TTCTCA.AC AAGCCAGTTG AGCCGACAAT CTATTTGGAT CCAAGTGAGC 'rTTAGGTTGG TGTATGAAGC GTCCCCTCAG AAGTAGCATG CAGATTCCTA ATGTGGCTAA ATTAAAGTTT CTTTAACTAT ATCGTCTT1TC CTCCGG 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8876
INFORMATION FOR SEQ ID NO: 171:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 14736 base pairs
(B) TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171:
CGCAPLACTTT CGCGGTCGGA. AGGTAGTTTT 5 ATGACACCAT TTGAGATACG AGATGATTTC TATCTCGATG GAAAATCATI' TAAGATTTTA TCTGGTGCCA TTCATTATTr TAGGGTTCCT CCAGAGGATT GGTATCATTC GCTCTATAAC 'PTGAAGGCTC TTGGTTTTAA TACGGTAGAG 180 ACTT'ATGTTrG C7TGGAATTT ACACGAGCCT TOTGAAGGTG CTGGATTTAG AGAAATCT CCAAATAGCG CAGGATTTGG CCGTCI'CCAT TTATCTGTGC GGAATGGGAA 7I1CGGTGGCT AAGAACATGC GAATTrCGCTC ATCCGACCCA GCATATATCG GATCAGTTAT TGCCAAGACT GGTGCCTCGT TTGTTGGACA ATGCAGGr'rG AAAATGAGTA TGGTTCTTAC GGAGAAGATA CGACAGCTAA TGGAAGAGTG TGGCGTAACC TGTCCCCTCT CGAGCTACTC TGAAAGCTCG AACCT'rAATT GAAGAGGACC GGTTCTAAGG CACCTTACAA CTTTTCGCAG ATGCAGGAAT AGTTTCATTT TGAAGGTGAT CTCTCTACGC AATTIGTGCGT 'rACCAGCTTG GCTCTTGACC AGGCAGTTGG TCGCTACTAT ATGGTGGCAA TATCCATG AGGCTTACCT GAGAGCGAIT T'rACATCAGA 'rGGTCCATGG TCTTTGTAAC AGGAAACTT TCTTTGATGA ACATGGTAAG
AAATGGCCAC TCATGTGTAT ATTATCACAC GGGATCCTAA TCTATCAATC TTTACATGTT GCTCGAGGAA CT1'TGGACCT GAAGAAGGAA ATCCAACTGC TCAGAGTATC CGCAGTTGGA C'rAGTTGAAA AAGTTCTTT GGAGTTCTGG GA'rGGTTGGT TCAATCCCTG GGAATTGGCA GATGCAGTTC GAGAGGT CCACGGTGGT ACAAACTTTG GTT'rCATGAA GCCACAAGTT ACGTCTTATG ATTACGATGC TAAATATCTr GCAGTCAAGA AGATGATGGC
ACCACTCTAC
GTTTGAAACC
APLAGAGAGTA
TTAGATAGCT
CTCTATCCTC AAAAGATGGA GGAGCTGGGA CAAAGTTATG GAAACAAACT GGGATGCAGA AGAAGAAAGA CTT6GTATCA CAGCTGTATG TCGATCGTCA GTGGGTTAAA ACTCAATATC ATTTTTTATC AAGGTAAAAA GAAAGGGCTA TCTAGGTTAG GGGCGTGTCA ACTATGGGCA TAAGTTCTTA GCGGATACGC GGGGTCTGTA AGGATCTGCA TT1TCTTACTA AACTGGAAAC AATCCTGAGA AAATTGATT' TTCAAAAGGA TGGACTCAAG TATGACTTTA CAGTCGAAGA GCCAAAAGAT ACTTACCTAG GGGGTTGCCT TTGTCAATGG GCAGAATCTA GGACGTTTTT TCACTTTATA TCCCTCATAG CTATCTCAAG GAAGGTGCCA ACAGAAGGTC AATATAAAGA AGAGATTCAT TTAACTCGTA AAGGGGGAAA ACTTATGACA ATTGTAGGAT GCCGTATTGA AAGTAGCCAA TCTTTGGGCT GGAAAACTAA ATGTTTCACG AAGTTGTCAA CAACGATATT GAAAAGAGTG GTTTGAAACT TGGAGTTrGGA
TGTCAAGTCC
GCTACCTACT
TTGATGGTCG
AGACAGAGAT
ATATCTI'GAT
AACGTAAGGG
ACTATCCACT
GACAACCAGC
GAAAGAACCG
GGAACAAGGC
TGGTTGCTCA
CCTTCTGGAT
AACACATTTT
TGCTATTCCA
'rGTAGAAAGT
TTATCGAACA
AGATAGGGCC
TGGGGAAGAT
AGAAAATATG
AATTCGGACA
CCCACTAGAC
CTTTTACGCT
240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 00 0 S p ACTTGTCTGA GTTTGGTAAG GGAACGTTGG CCCAACTCTC ACCGCATCAT TATCTTTGAA AACCTACACT AAAACATATA TGGACGTTTG ATCCACGGAC CATTATGGTT GTAGACGACG TGCGACACCA CCAGGTGTGA 1087 AAI'GAGTAT TN'GCCAGTT GAGAAAGCrG CAGCCAATAT TCrGGTGGC AAATACGATA GCCAACGTCT CTTTATCGTG GCTCGTAAAC C-AGACCGCTr GTGTACCACT TG.AAACCCTr AATGTTGGGA ATATGTCTCA TTACACG=rC 'rATCAACCTA GTAGACAAGG ATGTG-GAAGA AAGGTGTTAA ACTTACTGCT CAGATGGTTC CAAATGATCC TATTAAAATA GGAAAAAAAT TTTAGGAGG TCATT'GTTAT TACTTCTCAC TrTGTACTCA GCTTATCAAA TCTGTGATGA CAGGTTCCCC TGTATTGC'r GGTTTCATTA CTGGTTTAAT GTTTACTTAT CGGTGGTAAC TTGCAACTGT TCG~T-CTTGG CCTTGGTTTG GTAGAAGCAG AACACCAGAA ACrCGTTCTA CITTCCACAAA CTGGCAGAAA AATTTCAGAC TTrGAGCT GATACZAATGG TGGCAAATTTr GTTGACGATC GTTTCA'rCTG CATGGGAGAT GTGACTACTG GGTTGGTACC T'rCGGTGGTG *.o .9.
CTTCTCGTAT CGACGCAACT TCTGGTGCGG AATTGATGCA CCGCTTGCCA TTACTACAAT CTTCGACGTT CTTGGTCGTA -TGACTACTAC CGAACGCTTT GACTATAAAG GTATTGAACG TCTATCTCGT GCCCTTCCAG TCTTCTTTGC AGTAGTAGAC TTCGTGAAG CCTACAAATG TTCTTGCGAC ACCTTCTCTG TTTCACAAGG CGCTGTACCA GTAGCAGCTC 'rCTTGACTTA CTTCTTCGCT CACCGTGTGG ATGCTGCAAT CAACTAC'rTG CTTCGTGCGA TTCCGTGGGC CCTTGCTTTT GGTGGTGCC'r TTGTACAATC 'rATGCTTCCA GGTCTTGGAT TCACTACCTT GCTATGGGAT AACAGGTCTT GGTGGCGCTG AAAAATTGGT TTCGTGA.ACA TAT'TrCCTT GCAGTGCT'rC TACACCATCA GAAAGTGGGG AAGATTTTAA TCAAATCAAC AACGTATGCA AGCTTCTGCT
TTGCAATCTT
TTGGTT'rGAC
TTGCTGGTAT
ACI-rCAAAGG
ACTT~CAAAAA
AAATCGAAGA
AAACGTAGCT
TACCTTT'ACA
GGTTGCAGAT GGCTTGACAC GCTTCGTTAC CTTCCAGTTA AGCTA'rT'rG ACTG'rTCTTT CGTAGGTACT CTTCCTGCTG TTTGTCTATG ATTGGTATTT TAGCCAAAAA GTAGCTGTAG TGACGAATTC TA.ATTACAAA TGTTTrACTTT CCAATTAGGT TGATCTTGCC TCAG'IrGCGT TGAAAGTTCA TACTCAATTC
TTGCAGGACG
AACGTAACCT
ACTCATATGT
AAGTTGCTGA
CTATCGTAGG
CAGCACC'TTC
CTTACAAAAG
TGGAACTACG
AAAA'rGTATG
TTCAATACTT
2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 GTGATGGAAC TCCTGAATTG AAAGAAATGA CACCATTCTT CCATACCATT ATCGCTGGTT TTGACCTTGC TAGGTTCAAA AGACGCCGTT AACGGTATCA AGACAGGTTT TTGGGGATAC AATCTTTGGT TCACTTGTAC CTGCTATCAT TGGCTATCGC TGGCCAACCT TGGGGGATCT TCCTTTGGAT CATGGAAGAA AAAGATGGTG GATGGGACCA T'rCGCTCCTC GGGGTCAGTC GCAGCAACTA TGCAGTTGCA GTAGCGTATG ACATCrTCCG TTGGAAACAG TTGGAAT'rTG CTTACAAAGA AGGGCT'rAAC CTTATCAACA 1088 ACATGCAAAG TACCTTGACA GCTTTGAMT ACGCTGCATC TCTACTTGGT G'rCTTCATGA 3780 TGGGTGCTCT TGTAGCAACA GTGATI'AACT TTGAAATTTC TTACAAGT'rG CCAATCCGTG 3840 AAAAGATGAT TGATTTCCAA GACATCTTGA ACCAAATCTT CCCACGTT'G CTTCCAGCAA .3900 T=~TACTGC C?1TTATCTC 9TGG?1'GC'NG GTAAGAAAGG TATGAACTCI' ACTAAAGCTA 3960 TCGGTATTAT TATCGTACTT GCTTTGGCTC TTTCTGCCCT TGGTCACTTT GCAC~rGGAA 4020 TGTAA'?rCCT TATGACTAAA TCAT'rAATTT TGGTGAGCCA TGGTCGCTTC TGTGAGCAGC 4080 TTAGAGGTAG CACAGAAATG ATTATGGGCC CACAAGACAA CATTTACACA GTAGCTCTTC 4140 TTCCAGAAGA TGGCCCAGAA GAATTTACTG CTAAA'NTGA AGCTGTTATI' GAAGGATTGG 4200 ATGATTrCCT AGTCTTTGCG GATCTTCTCG GTGGGACACC TTGTAATGTG GTGAGTCGCT 4260 TGATCATGGA AGGTCGTGAT ATTGACCTTT ACGCAGGGAT GAATCTTCCA ATGGTGATTG 4320 *AATTTATCAA TGCGAGCCrTr ACAGGCGCAG ATGCGGACTA CAAGAGCCGT C'GCAGAAA 4380 CATTGTGAA AGTTAATGAC CTGTTAGCGG GCTTCGATGA TGACGAAGAT GAATAATACT 4440 ***CTTCGAAAAT CTCTTCAAAC TACGTCAACG TCGCCT'rGCC GTAGgTATAT GTTACTGACT 4500 ***TCGTCAGTCT .TATCCGGCAA CCTCAAAACG GTGTTTTGrAG CTGACTTCGT CAGTCTTATC 4560 *CGGCAACCTC AAAGCAGTGC TTTGAGCAGC CTGCGGCTAG T'IrCCTACAG ATTTTAGTTG 4620 *GAACTCGATT CAATTCATGT GACAACGTGA AAATCGTTAG AGCATITTAT ATAGAATATA 4680 CATGGGAATG TAGCTTACTC CCATTCCCAT ATTTAATAGA AAAAGAGGAA CTCAA'rGCTA 4740 *CATTATA CAA AAGAAGACTT GCTCGAATTG GGTGCAGAAA TCACTACGCG TGAAATCTAC 4800 *CAACAGCCTG ATGTATGGAG AGAAGCTTrT GAATTTTATC AAGCAAAACG 'rGAAGAAATT 4860 GCAGCCTTCC TACAAGAAAT CGCTGATAAA CATGACTATA TTA.AGGTTAT CTTGACAGGT 4920 *GCTGGGACTT CTGCr'rATGT GGGAGATACC 
T'rGCTACCTT ATTTTAAGGA AGTCTATGAC 4980 GAACGCAAAT GGAATTTCAA TGCTATTGCG ACAACAGATA TCGTTGCCAA 'rCCAGCAACC 5040 TATTTGAAAA AAGATGTGGC AACTG'TCCTr GTGTCTTTTG CTCGTAGTGG GAATTCGCC'r 5100 GAAAGTT'rGG CGACTGTTGA TTTGGCCAAA TCCTTGGTGG ATGAGC??ITA TCAAG'rGACG 5160 **ATTACTTGTG CAGC-AGATGG TAAATTGGCT CTTCAAGCTC ACGGTGATGA TCGTAATCTC 5220 TTGCTCTTGC AACCAGCTGT CTCTAATGAT GCTGGATTTG CCATGACTTC TAGCTTTACG 5280 TCTATGATGT TCACAACTCT CTTGGTCTTT GATCCTACAG AATTTGCTGT TAAGTCTGAA 5340 CGTTTr'GAAG TTGTATCTAG rCTTGCCCGT AAAGTTTTAG ACAAGGCAGA AGATGTCAAA 5400 GAGCTCGTTG ATTTAGACTT TAACCG~'GTC ATCTATCTAG GCGCTGGTCC TTTCTTTGGA 5460 CTTGCTCATG AAGCTCAGCI' CAAGA7=~G GAATTAACTG CTGGTCAAGT TGCGACCATG 5520 1089 TATGAAAGCC CAGTTGGCTT CCGTCACGGT CCAAAATCTC G'NrrGGTCT 'PTGGTACAAC GACAGACTAC ACTCGTAAGT TTATCAACGA CAATACAGTT ACGACTTGGA CTTGGTTCGT GAAGTTGCTG GTCACCAGAT TGCTCGTCGT CTrGAAAATG TCAAACAAGT GGCCCTTG.GT GTCTTCCCTT ACATCGTTTA TGCCCAACTC
AATA.AACCAG
ATTCACGAAT
ATACACCGTC TCCTACAGGT ATCAAAAGTA AGACAGTGTT GTTGTGCTTT TGAGTGATCA AGCT-rrGGT TGTGGCGGTG TCTTGAATGA TA'N'TACCGT 7'rCTTTAT TGACTTCACT CAAGGTAGAA- ACAGTAAACC GTGTAGTACA AGGTGTCATA TATGAATTCT TGACAAGAGG ATTTGTAAAT T'CTATGGT TTGTTTGCTT GAGAGAAATA AGAGCGTGTA T'rTGGAAATG '1-GAGGGTGA TATCAGATAA ACCATAGAT GTCAGTACGC GTAAAAGGAG AACAGAATGA AAGCATACAC GGATGTCTTG GCCTATCGAT TTGAGACAGA TGGTGCGACT ATCTTGCGCT ATGTCGCACC CTTGGGATTT GATGACTTT~G ATAGTTATGT AGGTCCTGTA GCGGGTCGTA TT1GCAGGTGC
CGG'TGGCTAC
TGACAAGGCT
AGGCAATAGT
GACCTITTGAG
CAACTTGAGG TTATGACTTA GGAAA'N'TTG CCAATGTTAT CCCAAGCATG GAGCAAGTGT CTCAATGGTA AGACCTATGA TCAACTGGTr GGGATTCCAG CTCTACACAG AGCGTACAGA AGTTATCACT TGGAAGAAAC GATACGCTGG TCAATCCAAC 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 CCTTGAGGTT AATAATGCTA.GCAACTGTAA TCACAGTGGT CTTGTTTGAA GTTGAAGA-AG TAAGCGATCA TGGCTTGACT TGGGACAGGA GGGTTCCCTG GAAATCTCAA GATTTGGATC TGGTGCCTAT GAAATCAGCT ACAAGGrAAC GACCGATCAG CAACCACAGC TATTTCAACT CCAACTAAAC ACAGAGGGCA AGAAGCCAAC CGTGATGTG TGCAGAAGAA GATGAGCAAA TGCAGGCCAT GACAATGCTG CAAGACAGAA-GCTCC~rGCT CATAGGAGGT CAGCCAATGC AGATGCCATr CACAGTGACC CAGTAAGACA CGTTATGAAC GCTAGGAATA GGTACGCAGA GCTATCTCTT AATATAATAT GTATTTGCTA GTGCCTTGGC TGTCTGGTGA TTTCACGCAC ACGATTGACC 'IrTACTCAAT CGCTCCTGAC GGTGTTrCCTG 'rCAAACACGT CTACAATGGT ACCTTGTTGA TCCAGCTGGC ATCAGGTTTG GATCATCCAT GATTCCTTTA TGACCAAAAT TCAGGTCGCT TTGTGGTCTA CACAGCAAAC TrTGTGGATG TACAGCACAA TGGGATTGCI' CTTGAAGCGC TTAAAGGCCA AGTCATTCT'r AAAGCTGGTC TTGTTGTGAA GTAAAAGAGT CATTGCGCCT GACAAATAGT AGGAA.AATAT GATATAACTA TCAAACTACA ATAAGGAGTA AGAAAGAAAC TTTGAC G GCTGGAGCAG TTTTGACAAA
GTCATGTCTT
CCAAAACTCC
AGGATATC7?T
TTGCCCTTCC
TCCTGCTTTT
AAAGTGTCAT
AAGCTTTACC
A.AACCTTCAC
ACTTTTGGGA
AGCGTTGAAA 7140 GAAGAAAATT 7200 TGATGTTT 7260 1090 GCGAACGACA GACTTG'rGGC AACACAAACT ACTGATGGTA AAAATGAAAA TCTATTGACC 7320 TCAGAGGTGC TAAAACCTTC TAG1'GGCAAT GTTT'rGGTTC GAATCAAAGG AGAATTTGTG 7380 GCTCCTCATC AACAATCTAT TN'GGATGCC ATCAATGCTA TCTGTAAAGA AGCGGCTGAC 7440 GAAGGTTTGG TAGATAAGTA TGTCCCTATC AAATGATCAA CTGACCTAGA AAACGCAGCT 7500 TTTGCCAGAG CTACAGAAGC ATCTATAACC ATGGATCATA CCCGTCTTTC TAGCAAAGAT 7560 CTTTGGAGTG CC'rTTCCAAC T'rCTAATAGT ATAATGGGAG AAAATTTGGC ATGGAATCAT 7620 GACGGTTTTC TAAAAGCTIAT TGAACAATGG CGTGCTGAAA AAGCAGATTA TGTGGAGAAA 7680 AAAATAGTGG TTCAGACAAC GGGAAATCTG GTCACTATGA GTCGCTAATT AACCCTAAAT 7740 TTIACACACAT GG GGATGGCA GCTTTAAAA ATCCTAACAA TCAATACAAA GCTATTACAA 7800 TTGCTCAAAC TCTAGGTGAT GATGC'ITCTT CAGAGGAATT GGCTGGTAGA 'TATGGTTCTG 7860 ***@CTGTTCAGTG TACAGAAGTG ACTGCCTCAA ACCTT TCAAC AGTTAAAACT AAAGCTACCG 7920 *TTGTAGAAAA ACCACTGAAA GATTTTAGAC CGTC'rACGTC TGATCAGTCT GGTTGGGTGG 7980 ***AATCTAATGG TAAATGGTAT TTCTATGAGT CTGGTGATGT GA.AGACAGGT TGGGTGAAAA 8040 aCAGATGGTAA ATGGTACTAT TTGAATGACT TAGGTGTCAT GCAGACTGGA TTTGTAAAAT 8100 6.TTTCTGGTAG CTGGTATTAC TTGAGCAATT CAGGTGCTAT GTTTACAGGC TGGGGAACAG 8160 ATGGTAGCAG ATGGTTCTAC TTTGACGGCT CAGGAGCTAT GAAGACAGGC TGGTACAAGG 8220 AAAATGGCAC TTGGTATTAC CTTGACGAAG CAGGTATCAT GAAGACAGGT TGGTr-rAAAG 8280 **TCGGACCACA CTGGTACTAT GCCTACGGT'r CAGGAGCTTT GGCTGTGAGC ACAACAACAC 8340 CAGATGGTTA CCGTGTAAAT GGTAATGGTG AATGGGTAAA CTAGGCTCAG CCCATAGGTA 8400 AAGCATTCAT CTTACTTAGC AAAAAGAATG AACGATAAGA AAGAGGTTGA TCGCGA.ACAT 8460 *TGGCCTCT'rT TGATTTATAA AGATTGGATT CTTGTCGCCT CAATTTCAGA CTTrTCTATT 8520 90.0 GTAAGCTAAT ATTI=ATAGC CCATTAAAAG CATAAGCGGT AATCTAATTr AAAAAATGCT 8580 GTAATTAGTC TGAAGTCCAC ACTTACTTGT TGAGATGTTA TCTCTGT'rrr TTATCGTTAT 8640 6AATTTACTIGT ATTTTTTATA GTATGCAGAA TAT7'rTTAAG TATATTTCAA TAGAAATTTC 13700 *'TATCGATTTA TTGTATAATG ATAAGTAAT'r GTTGAAAAGT ACTCAGAAAA TTCCATACTA 8760 TATTATT=T ATGTTTATAC TTTTATGCTA TAAAATATAG ATTGATATAA AGAATATAGA 8820 AAAAGCGAGG T'rAATATGAG CCGAAAAAGC ATTGGTGAGA AACGCCATAG 
TTI'CTCGATG 8880 AGAAAGTTGT CAGTGGGATT GGTATCAGTT ACTGTATCTA GTTTCTI'TTT GATGAGTCAA 8940 GGGAT'rCAAT CGGTATCGGC CGATAATATG GAAAGTCCAA TTCA'rTATAA GTATATGACC 9000 GAGGGTAAAT TGACAGACGA GGAAAAATCC TTGCTGGTAG AGGCCCTTCC ACAACTGGCT 9060 GAAGAATCAG ATGATACTTA GG=q 'AACC CAACTGTTGG TTGCrrrCTA AAAGGGAAAA ATGCGAGTTC AATTG?=GCC TATAATAGTC AGCTrTCTAT GGTTATCAAT ATAIrrGGTTA ACAGTTGATG CGAAATACTC GATGTAGTTC ATTCAGCTGA GGTGAAGCAT CAGCGGATGA
TCTTCTAATG
GTAGTTCGAC
GCGGAAGAGG
ACCAAAGGCA
CTCTACACTA
GCAGTTCGCG
CCAGGTCATG
GCGACAAAAG
CCGGCTTTAG
GAAGAAATTC
GCAGGGACAC
AAAGAAGTGT
GTGAAAGTTA
ATAACTGTAA
GTTTTCCATG
GTAATATCAG
TTGGGTGAAA
ATTCATTCGC
CAACAGTGCC
AAGTATTGGC
CGCAAGAACC
AGCCACTAGA.
AGGAAGA.ACC
AGGGCAAAGC
GCACACAAGA
AGGTCACTAC
AGGATCCAAC
GTACAATTCA
CACGAACTGA
AACCTACAGT
GTTATAACT
GAGACAAGCT
GT'TTAGA'ITA
ATAATGAGGA
AGATTIAAAGA
ATTTAAGTCT
ATCGCTTCAA
1091 TTAC'TGGTT TATAGATCTC AACAGTT'TT TACTITrCCTT ?TTACTGCAG GATTGAGCTT TGGAAAGAAA CGACTTGTTC ATTrTCTGCT GGCCACGCT TTTGGGTTGA CCAGCCAGAT CGGAGTCGGG GAACATTTAC CAGAGCCTCT TATCAAAACT AAGAAACAGG ATAATACAGA TGCTCAAAGA GATAGTCAAC CAAACTCTAC TTTAGAATGG AACCAAGGAC AGGGCAAGGT TGGACTTTCA GAAAAATCTT CTATAGCAGC AAGTCAAGTT GAGCAGAATC CGGATCACAA AGAACAAGGA AATCCTGTG;T CTGCTACAAC GACGACAAAT GATCGACCAG AGTATAAACT CGGTCATGAG GGTGAAGCCG CAGTCCGTGA AACCAAAGGT ACACAAGGAC CCGGACATGA AGCTTACACA GAACCGr'rAG CAACGAAAGG TACAGTCCGC GAAGAGACTC TAGAGTACAC ACCCGAACAT GAGGGCGAAg cGGCAGTAGA ACGAA.ATAGA ACGGAAATCC AGAATATTCC ACTTCTGAAA AATCGTCGTA AGATTGAACG ATATGAAGAC TACATCGTAA ATGGTAATGT AGTAGCTCCG GTCAACGAAG TCGTTAAAGT
ACCGAATACA
GTTAGTTTTA
GTTGACTAGC
TTTATCTGCC
GAAAATCGAA
GCTTTCAAGG
AAAAACATCA
TAGTTTACAA
AGACAATCTA
AGGAGAATCT
GGTGCAGAGT
TCCATTGGAA
AGACTTACCA
AGGTGAAGCT
CACGCAAGAG
GGAACCGGTA
AGAAGAACTT
TTATACAACA
ACAAGGGCAA
CGTAGAAACT
AGGAACACTT
9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 AGAAATTACA AACTTAACAA AAGTTGAGAA CAAAAAATCT AATAGACACT ACCTCAGCAT ATGTTTCTGC AAAAACGCAA
AGTTAAAGAG
CTACACACCG
GTCGATATAG AAAATCCTGC CAAAGAGCAA TATACAGTTA AAACACACCT AACTTATAAT
AAGAAAATAG
TATCGTAGAT
GTGAAATCAG
AAATACTGAA ACATCAACTC AAGATTTCCA ATTAGAGTAT TATTGAT'rCA GTAGAATTAT ACGGTAAAGA AA.ATGATCGT AAGTGAAGCG CCGACTGATA CGGCTAAATA CTTTGTAAAA AGAAATGTAC CTACCTGTAA AATCTATTAC AGAAAATACG 1092 GATGGAACGT ATAAAGTGAC GGTAGCCGTT GATCAACTTG TCGAAGAAGG TACAGACGGT TACAAAGATG ATTACACATT ACATCCTTNA AACAGCTGGT GCT'rCAGATA TGACCGCAGA GG'rCCATTTA CAGGGAGCCT TTGAAGAAAC CA'rTATTTGA ACTGTTTCTG CTGATAGTAA AATATTAATA ATGTTGCAGT GTAGCGAGCG CAACAAATAC AATCACCAGG ACAGTAATAA AGTTCGAGAG TTAATAAAGT AACCAAACAG CTGGAGGGAT GTTGCTACTG GAGAAATACG TCTACGTCGC AAAACGGTCG TATGTTATCA CCGGTGATCA GATAATAGAA AAGCAGACAG GTTGCTGATT ATGGAATCAC CTAAGAGAAG TTGATTATAC AGCAACATAG AAAAACTGAT AAAGTAGCGA CAACAGATAA GATGATGAAC TAGTAACGGA CATTTCAAAG ATAATACAGT AGTCAAGTAA TCGAATACAA GTTTCAGACT ATACAGCGAT TACTGTAGCT AAATCTAAAG AACAGCCATG CAAAGCAATC TGAGGTGAGC TTAGGCGATA GATCGGTTCT GATGGAACAA TACATTAAAT GGTGCTACAG AGAAAATGTC GCAGCGCTGG AGAAGGAAAA ATCTCAGGTG AGTGATAGAA AACAGCTrCGT AAATGATACT GGAGGALATAG TAGGGTAGAT GCCTTAATCT AGTAGGTAGA TTAGAAAATG AAATGGTCAA GGATATTCTA AG.TAAATA.AT GTTGTGAGTA ATACGCAGCA GCAGATGTGA CAGAGCAACC AGGAGTTT'AC TGTCTGGTGT CTATACATTG AGCAGACAAG TTATCTCACA AATCGTATGC CATTTATGAT TAAGATTT CGATATr'AAA CGAAGGCAGC GAATAGCGCG CGAAATCTGT TGCGGGATTA T'TACAGGGAA ACTWTTCCA TAGGTAATAT AACAGGAAAT CTACTAATGC ACGCAATAAT GTGCATTGAT ATCTAAT'rCG GAGTCGGAGG AATAGTAGGA ACGTAGATGT TGGAGATGGT AAAATGCAAG TACATCAG71T 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 11700 11760 11820 11880 11940 12000 12060 12120 12180 12240 12300 12360 12420 12480 12540 12600 ATTCGCTACA AAATTATCAA AAGACCAAAT AGACGCGAAA AGTAACTCTT GATGATACTG GGCAAGATT AAAACGTAAT AACACTAAAT AAA GCAGAAG CTGAAAGAAA GCCATTCTAC AATAAAGACC TAGTAGTTCA ACTTTACACT ACAGAATTGT TAGATGTTGT TATTAATAAT AAGAAAAATT CAATAAATAA
AGTAGCTTAT
CTATGGTAAC
GCCGATGAAA
AGTTATGTTA
AGALATACCTA
TGTTACAGGA
AACGAATAAC
AACTCAGAAG CTACTAAAAA AGTACTAGGA TACTTAGATA GACAATTTGA AGAAGTTAAA TTAGCGATGG ATAAATCAAT CAATACTACA AAAATCAAAA ATAACAAAGA AGCAT'rTATG GATATTAATT ATGGTAAAAT GAATACAAAA GGAAATAATG AGACTTCAAC GTTGGATACT GATGTAACAT TCAAAGAAAA CTTCATAAAC AAAGAATATA TATTCACACC AGAAGCATTT GTACTAAGCG ACTTGCAAAA TGTAACACTT GCAGCGAATG ATGCAGCCTT AGATAACCTA GCTAATATAG CAGAACACCT AAGAAAAGTA GCAGACGGTG TAGT'rGAA'rA CGTAAGTGAG CTAGGTCTTA CT'rATATGAA CCGTTCGTAC GATTTATCTA CGTACAAGTT TGACTTTAAC AT'rGTCGCAT TAGGAAATAG TGGACTAGAT 1093 AACCTGACAG C7TCAAATAC TGTAGGTTTA TATGCGAATA AACTTGCATC GAAGATTCAG TCTr-rGACTr CGTAGAAGCG TATAGAAAAC 'rG'TCTTACC AATAACCAGT GGTTTAAACA AA.ATACAAAG GCATATATAG TCGAAATGAA GCAGAAGTAC GAGAAAAACA AGAATCACCA ACAGCCGATA GAAAATATTC TACGATAGAA TATCAGCACC AAGTTGCGGGC CATAAGAGTA TGTTATTACC TTACCTGAAG AATCrTrGTA TATCATCG AATATGTCTA CACT'rGCATT GAAAGATATC GTGATAGTGT GGATGGAGTT ATTCTTr'CAG GAGATGCTT'r GTA.AGAAATA CAGTTGATA'r AGCAGCGAAA AGGCATAGAG ACCATTATGA AATCTTCT'rc ACAGTGCTTC AAAAGAAAAA CTT'rTCCGT'r CTGTGATAGT TTCAATGTAA AAGATGAGAC AGGAAGAACT TATTGGGCAA GGTTAACGGA GGCTCTATTA AAGAATTCTT CGGACCTGrr GGGAAATGGT ATGAGTATA A GGAGCGTATC CGyAtGCAAG TT'rAACGCAC TTTGTGTTAG ATAGATTATT GGAACGTCGG TTTATACTCA TGAAA'rGGT'r CATAAT'rCTG ATTCTGCAAT GGAAATGGTA GACGTGAAGG ATTGGGAGCG GAGTTATACG CACTTGGTTT
GGTAAAAGGA
AAACAAAACA
GTCTTGATATT
ATTAGGAGTT
ACTACTAACT
CGTCGTAT
ACGAACI"TA'r
TATTTGGTAC
'TTATGATGGA
TAAAAACATC
TAGTAGTGCA
AGATGCTTAT
CTACTTTGAA
ACTGCAATCT
GTAGATAGTG TAAATTCTCA TATTTrTAGCT TTAAATACGT GATTTGAATA CMTTGCATAC ATATAATCCG GTGGAACcGrr CAAAGTTATA TGCATGGATC ATATGATGTA ATGTATACAC GCGATATTAG CTCAAAATAA TGATGTTAAG AAAAAATGGT TATATAAAGC AGAAAAAGATr TCGATTCGGA TGAGGCGCTT TTGATGCGAT GGAAGCAAAA TTAGAAAA.AT AGAAAATTAT 12660 12720 12780 12840 12900 12960 13020 13080 13140 13200 13260 13320 13380 13440 13500 13560 13620 13680 13740 13800 13860 13920 13980 14040 14100 14160 14220 14280 14340 TACGTTCGTG ATACTAGACA TAATAAAGAT ACACATGCAG GAAATAAAGT CCGTCCATTA ACAGATGA.AG AAGTAGCTAA CTTAACATCG TTAAACTCAT TAATCGACAA CGACATCATA AATAGACGTA GCTATGATGA TAGTAGAGAA TATAAACGAA ATGTTCTCTC CTGTATACGC AGCGCTAAGC AATT~CGAAAG TTTAGAAAAA TAGCTTATGA ATTACTTGCG GAAAAAGGTT TATGTTTCTA ATCAGTACGG AGCAGAAGCA TTTGCCAGCG TGGCATGGAA GAGATGTTGC TTTAGTGACA GATGATTTAG GGTGAGTACT CATCATGGGC TGATTTCAAA AAAGCAATGT CAAGATAATC TGAAACCAAT AACAATTCAA TACGAATTAG ATGGCTACTA TACTA'rAAGT GTGCTCCTGG AGATATTATO ATCACAAAGG ATTCCTACCT GAAGCAAAAC ATTCTCATCA TATTTAAGAA AGTATTCAAT TTAAACAACG TATAGATAAA GTAATCCTAA TAGTACAAAA GAAGTAACTA TAACAACGGC TGCACAAATG CAACAATTAA TTAATGAAGC GGCTGCGAAA GATATTACTA ATATAGATCG TGCAACGAGT CATACCCCAG CAAGTTGGGT GCATTTATTA AAACAAAAAA TCTATAAT1GC *AAATAAGATT GTAGAGTTTC GACAAATCGA CTCCTTTTTC GGAATCATCT TCAACTCATC
ATATCTTCGC
ATTGTTGAGT
TTATGGATCG
AACGACCAAT
ACTAC;
AGTGT1
ATGT~
GGTGA
~GATG ACI-rTAGAAA TrCTATATAT TT GTAAGGATGA GGAGTCAGAT ;AGAT TTGATTGAAT GCAGATTGCA :AAGG TGGATTrCAA TCCCACAGAA LAAAT AAATACAAAA CAATCCTAGA GCTT CAAGTTGTCT GGCTTGAC'TT 14400 14460 14520 14580 14640 14700 14736 AATGTTGATT TGAGAAATAA CTTTGCTAGT CTAGT AGATTT C TGGGATTG 1 PT TTTTGCTGAG TGGGAT TCrTGAGGGA AGrrATATAA TAGTTGTAAT AATTAG
INFORMATION FOR SEQ ID NO: 172:
SEQUENCE CHARACTERISTICS:
LENGTH: 11770 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172:
ACAGGAAAGC ACGATAGCAA TGTTGCTTAC GGTGATACGA AAATCATTTG ATT1ACAGACA GACTTTTAGA GAGATTCTAG TTI'TTAGTTG ATTGCACATT GATTGACAAC TTTATTATGT AAATGAAAAA AGCTTTTTAC GATCAAATTA TCATGGGT GCGTTAGATT ACATAAAAAA GATGCAGCCT TATTAGAAGC GGTACAAAGA CAGAAATTAC ATGACGGAGA 1 rrAATTTAAT TAATACTAGA GTATTATCAT TCTCTTTGGA .AGATrTAAAA AATATTCCTC AAGTATCTTC GATTCTCTCT GTCTrGCGTG AAAATACAAT TTrAAAAGTT TTGGAAGAAG GTGAGTGAAA ATGATAGACT GATTCAGTITT TGTGCTTATA TAAACAAAAA TAGTTTATCT AGTTGTATTC TATAGTTACA AAAGAAAATT ATAGTGAAAT GAGGAGGAAT TTATGGAAAT AATTTTATAT GCTGGTGATG CGAAACAACA TGGTACATGT GAACGGTGTG AAGAAGAAAT TCATAATCTA CAAACAAAAT T'rTTGGCACA
AAAGTTTCGC
CTAATTTAGT
ATGGGGATTT
ATCGTT7rTC
GTTGTTTTTG
TTAAAATTTC
GATTGTTCCA
TATTTATAAA
ACAGTTAGCT
GGAAGCGTC.7r 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 AGCTCTCTTT GTTCATTCAC CAAAGAAATT ATTAGTTTGA TGTTATTAAC ATAGAGGAGG GTTG'TTT GTGCAGCAGG TT? TTCTACT GGTATGCTTG GCGCAATCTA GTGGAGTTGA GGCAGAAATA GAGGCGTITT TATGCGCCAA ATATAGATGT TGCACTATTG GGTCCACAAG TCAAAAGAAA TTTGTGATAA GTGTGATGTT CCGATAGCTG AAGATCATCT CATGACCAGT GAAAAGAACT TCATA.AAAAA AAAACATAAT GGTGAAGATT TAAATAATAT GAAAATTGCA CTCAGTCTAA ATTAGCGGAT TTGCTTATAC A'rTAGATAAA TTATTCCGAT GATGGACTAT 1095 GGTATGTTAG ATGGGAAAAA AGTATTAGAT 'rTGGCCCTrAT CTTGATTAG TGGGTAAGAA AAGGAGAT?1T ATTATGTCAA AGATGGATGT TCAGAAAArC ATTGCACCGA TGATGAAGTr TGTGAATATG CGTGGCATTA TAGC'rCTAAA AGATGGGATG TTAGCAAT'rT TGCCA~rGAC AGTAGT'rGGT AGTr'rGTTCT TGATTATGGG ACAATTGCCG TTCGAAGGAT TAAATAAGAG CATTGCTAGT G==rT.GAG CTAATTGGAC AGAGCCGTTT ATGCAAGTAT ATTCAGGAAC TTTTGCTATT ATGGGTCTAA TTTCTTGTTr' TTCAATTGCC TATTC'rTATG CTAAGAATAG CGGAGTAGAG GCTTTACCAG CTGGAGTTCT ATCTG'TATCT GCATTCTTTA T'=rGCTAAG ATCATCTTAT ATCCCTAAAC AAGGTGAGGC GATTGGGCAC GCTATTAGTA AAGT'N'GGTT TGGAGGCCAA GGAATTATCG G'rGCTATCAT TATAGGTTrG GTAGTAGGAA GTATTTATAC CT'rCTTA'A AAGAGAAAPLA TTGTTATTAA GATGCCAGALA CAAGTTCCAC AAGCTATTGC CAAACAGTTT GAAGCAATGA TTCCAGCATT TGTAATTTTC TTATCTTCTA TGATTGTATA 9 0* 9 9 9 9 6*99 'rATTTTAGCG AAGTCATTGA CTAATGGCGG AACA'rTCATA TCAAGTT'CCG ?TGCAAGGTT TAACTGGATC TTTGTATGGT TATATCATTT TTGTGGTGGT. TTGGTGTTCA TGGGCAATCG AGCTCTGCTT TTATCTAATC TTGATGCTAA TAAAGCTATG ATTAGAAAAT GGTGCACATA TTGTTACTCA ACAATTTTTA AGGTTCAGGG ATTACGTTG ATACCAAGCC TTAGGAAAAG TGTATTTGGA TTTCCGATTG TGTACTTGCA GCTGTGATAG
GTCTTGTAGT
TTGCAGCTTT
TCATGAATCC
TATATGGAGC
TGCCATGCTT
TCCAGCAATA
AGTTATGTTT
'rATTGCAACA
TATTTTATCA
AGCGATGTCT
GAAATGATTT ATTCTGCTAT GCTATTGGAA TTGCATTCTT GTAGTAAATG GAGTAGTGAC TTAGCCTCTG CTAATCTATC GAT-rCAT'rTT TAATTCTATC TTTGCAGCAA AATCAAAACA TTTAkACGTAA ATGAGCCAGT GTACCTTTCA TTCTTGTTCC GGTTrCATGC AGCCATTCTC GGATTTTTGG TGGGTGGATG ACATTGGTT'r ATTTTCCATT AAACAATCTT AGAGGTATTT GAGAGTTAAA ATTrnTCTAG TTCCAAAGAA ATAAAAATAT GCAAGAACAA CAATCAAGAC CGAGA.AGATG TTTTAAATAA CGCGCCTATG AACGATACAA GAAATTTTGG AAAATATGAC 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 .999 99 9* 9 9 9
AGGGGTAACA
GCAAGGAGTT
CTT1'AAAGTA
GTGTG'TTACT
TTAAAAGCTT
T'N'CGAAAGA
AGAATATTT
CAGGATCGTT
GTTAAACTCA
GAAAATTTCT
AAGGTGCTTA
GGCAGCCTTA
TAGCTTACCA AAATGAAATC CACA'rTTGTG CTAAAAATTA ATAAAAATCG GTATTATATT CGATGGTAAA TACAGAAGTA CTGAAAGGAT GAACAAATAT TTGCCTTGGA GTACACCAGC ATTACTCAGC TGGTGATATT AAAACCTTAT ATTGATGCTC AGAGAGCAGT TCTAGCAACA GGAACAACCT AATGTCCTAA AACGTGCATA TATGCTGAAA 1096 TATCTATAT GAAGAAGAAT CTATGATTGC GGGAAATCAA CCTTCTTCCA ATAAAGATC TCCTATTTTT CCCGAATATA CGCTAGAATT TGTTCTCAAT GAGN'GGATC 'rrTTrGAAAA GCGTGATGGA GATGNTCT ATATrACAGA AGAAACAAAA GAACAACTTA GAAGTAT1'GC TCCGTTTTGG GAAAATAATA ATITACG;TGC TAGAGCTCGT GCCTTATTAC CTGAAGAACT GTCTGTTTAT ATGGAAACAG CATTCTTCGG TATGGAAGGT AGTGA-ATT CTGGAGATGC TCACTTAGCA GI'TAACTATC AGAAACTTTT GCAATTTGGT TTAAGAGGTT TTGAAG.AGCG GGCTCGTAAA GCAAAAGTAG CTCTAGATT'r AACAGATCCA GCAAGTATTG TTTTTACGAC TCTATATTTA 'rCGTAATCGA TGCTATTAAA GTATATGCAA 4.
*t TGCTCTTGCT AAAAGTTTAG GAT'rGCAGAT A?'TGC'rCTA TCAATCAGTT TGGTTTATTC ATATGGCCGT TTTGATCAAT AGAAACAGAA GATAGCATTG TAATAAGGT'r CGCAGTCAAT TGTTACAATT GGTGGACAGA GGTATTAAAA TCAGTTGCAC TGCAGdGTTTA GATG.CTCGTT TATGCCTGCA TTTAATAATG GGAAGATGAT GCTTATGATT CCGAAAATGC AAATCCTAAA CGTAAGAAAG GAGTCCCATA 'rGAACCGGCA ACTACTTTTG AATGTATTTT ACAAATT-GAA TCTAATGGCC ATATGTATCC ATATA'rGAAG GCTGATTTAG
T'PGAACGTCT
CACATACATT
CTCGAGATAA
AAACCCATCT
TCATGAATGA
ATGAGATTAT
ACAGTGCCAT
GACAAkATCTT TGGATTAAGA TT-CTTCAGCA GGAAGTCCTT GAAGGATGCT CTTAACCCAT ACCGCAACCT AATCTAACTG GTGTATTGAA GTGATGAAAC TATITCCTTCT TTTATTGCAA TGGATGTGTT GAAACGGCAG TATGAACTTC CCTAAGGTTC TAAACGGTTT GCACCAAGCT
ATAAATATCA
AGCGCTTTGT
AATTACTTGA
CAGAAGCTAT
ACTCTCTTTC
AAAGTGGTAA
CAWI-rACAAT
TATATCAAAA
TATCTTAT'N'
TACGTTACCA
TTGGTTTTGG
AAGGAGTATT
TTCCAGGGAA
TACTTATCAC
T'rGGTCG=? 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 ATGGGGCTAT CGTTGCACAG GTATGAGT'rA GATGAATGAT GGAATTGATC CGGCTTCGGG 0. 0 0 TAAGGATATG AAGAACTTT'T CTGAATTAGA AAATGCTTGG GACACGAATG AGTGTTATTG TTGAAAATTC TATTGA'rTTA TGATATTCTA TGTTCAGCAT TGACTGATGA TTCTATTGGT AGGTGGAGCA GTATATGATT ATATATCAGG ATTGCAAGTT GATAAAACAC TAAGATATT TCAI'GGAAC GAGAAGTTCC CGI'GGAAAAC ACCTI'AAAGA GGAATTGCAA ATTTGrCGGA TTCATTAGCT GCAATTAAAA AATTGGTGTT TGAGGAAGAA CGTATAAGCC TTGGCATGCA CTGGAAACAG ATTATGCCGG AGAAGAAGGT AAGGTCATTC CATTCATGAT GCACCTAAGT ATGGTAATGA TGATCATTAT GCI'GACAAAT TGCTTATGAC ATTTATGT'rG ATGAALATTGC TAAATATCCT AATACACGTT GCCTATTGGA GGAATTCGTT A'IrCAGGAAC ATCTTC'TATC TCAGCCAACG
CAAGTCAGCT
AAGAAA'rGTT
TGGTTACTGC
ATGGAAGAGG
TAGGGCAGGG
ACGTGGAACA
TTCACCATCA
ATTACCAACA
GTTAGCCAAA
?I'TACATGGG
GAAACATCCT
CAATGTTCTT
AAATAAAGAG
AAAGTGGTCT
AGAGGGXTTCT
ACCCTCTATT
TAAAATTCGA
TTAGCAACTC
CATAATATGG
GATGAAATCG
GAAGAAGATA
TACCATATT-C
GAAAAACACA
TCTAAGGCAA
GTTCTTrTA
GAAATTTTGC
1097 CAGATGGACG CAACGCGGGT ACACCGTTAG CAGAGGGTTG ATCAACACGG CCCTACATCT GT'N'TAAAAT CTGTTTCAA6A TAGGTGGGGT TCTCTTAAAT CAGAAAGTAA ATCCTCAAAC AATTAAAACT AATTGCTT1'G 'ITACGAACAT AATACAATGT TGTTTCCAGA GAGACGCTGA
GAGACTTAAT
CCCAAGATGA
TGGAATTAT
CGCTAGCTGG
TGTrTCGTGTr GCAL3GATACT CATTATAGCA CGTACTGAGC GCTTGACACA TTAAATTTAG GGTAACTT'CA AATCCCACTA CAAAGATGTA AGAGAATTGA AGA7N'GA6A GGCATCTTAA
TCTTAATCG
TTGACGCTCA
C1TGCATTCTT
ATACTTTGTA
ATGAGATTAA
TTGCAAAAAG
TTGGCTCTAC
AGGATCCTCA
ATTAATTTr TTGAACGAAT CATGTTCAGG TGATTTCTCA AGACAAGCAG GAGATGATAT ATTACGTGCA ATAAAGGCGC TAAAAAAAGA TACAGTTATT CAGGGATTAT TAGCTATCGA TAATAGAATG GAAAATCTGA ACATTGAT'rC TATTGATAGA CAGAACTCTC CTAGTAAGAT AGTAAATAAT GCTTTAGCTG CAGGTGCGCA ATCAGCTTTC GCCATGCCAT C'TATCCAAAA TGTTATTCAA AATAGTCGTT CCATTTAGAT AGTCCTTCTA GATATATTCA GGGGGAAAAT GATTT~GGGAA ATTGCCCTAT TCTATTATGC CGATTTGAAG ATTACCTACA TAGGTATGGT GAAGCTTCTG ACAATGAAAT CAATCGAGTT AGTATTATCG GTCTTGGTGG GGGAAAGACG
ATTTATCAAA
GGGCTACCAT
AGCAGGAGCG
AAATTCTGTC
TTTAGCTGCA
TGCTGTTACA
GGCGGTTGAT
AGAGAGGAAA
GCCTTGT'rTG
GATCAGTTGG
TTCCATATTG
GTTGCCTTGG
ATTGATAGCG
GTACCTGTTA CTCCAGCTGG ATCACTGCAA CAGCTATTTA GAT'rACCTAG CTCCATATTA ATTCC'TCAAT TAGCTCTTGC TCCTTTAAAA ATGTAGCACA GCAGGAGCGG ATGTTTTTGA GATTTTTCTG ACGATTGGTT TACATATGAG AATTTTTGCr AAAATGCCAA ATCAATTTTG TTTATGATAT TGTTGGAAA-A TTCTGGCGCTI ATTTAA'rGGT CTGAGAAAGA AAATTGTGAT CAAAAGCTAT TGCAGATTTG 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 ATTGAAAAGC CTG'rrATTAT TGCTCCA.ACA ANTGCATCGA CCGACGCACC TGTATCTGCT TTATCTGTTA TTTATACAGA TGAAGGTGCA T7TGATCATT ATCTATTTTA TTCTAAAAAT CCAGATTTAG TTTTGGTTGA TACAAAAGTT ATTTCACAAG CCCCTAAGCG TTrATTAGCG TCTGGTATTG CAGATG0TITr AGCAACTTGG GTTGAGGCGC CTGCGGTTAT GCAGGCAAAT GGAAAAACTA TGTTGGGACA ACAGCAAACA TTGGCTGGAG TTGCA.ATTGC GAAGAAATGT 6060 6120 6180 6240 6300
GAAGAAACGC
ACACCACCAT
AGTGGAGGAT
ATTCATCATT
GAAAATAGAC
CCAACAAC'rC
GGTAAACAAG
TCAGATGTTG
AGGACTACTG
1098 TGTrTTCAGA TGGTrrACAG GCTATGGCAG TAGAAAATAT TG?1'GAAGCT AATACTr'rAT
TAGCTGCGGC
'rAACACATGG
CTAAAGAAGA
TAAAAGAAAT
CAACTATGGA
GCATGCAATT CATAATGGTT TGAAAAAGTA GCTTATGGAA ACTTGATAAG TATATTGAGT GCATTTG4GAT CAAGTTGGAT GGGTGAGACA ATTCA'rCAGA C7TGTGAAGC TAAAGTG4GTG TGAGTGGTCT AGGTS*TGAA TTACTGCATT GACAGGI'GAC CTTTAGTACA ACTATTATTG TTTACAAAAA AAT'rGGTATG ATGATGATTT AATAAAAGTT TGCCGrrTAA GATTTCGCCT TAAATTCAALA ATAAACAATA TGTAI-rGAAT TCTATXGAAG CTCAAGCTAT 'rATCGCTGTA GATGCCTATG TTTTCCAAAT GGTAGTCTTT TATTGATCCC ATTGAAATAG GATGAGAACA AATCGATTGG GAAAGTAAAA AGCAAT-rGTT TCGTACTATT TCAGATTCAG TCTACTATAT CGACATAGGT TGTCGGCTAT TTA'rrG'GAA TACATTAATT GGTCTAAAAT AAGTATTTTG TGCTATACGA GATAAGCTTC CTGCATAACA ATGGGATAPAA AAGTGGGAGA AGATACAGTA TACAGTTTTT AGTAACATCA CAAAATTGGA TAAGTGGCAG GGCTACTGG ACCTAAGTTT TATGCGCTTG ACACCCAGAA GGTTATACTG CAATGTCCTT CACCCAATGG GGA'rACTGGT AATGACCCAG AATTAATGCG CTrTGGATTT C1TACTACAAG TGGACTCAAT AGCTGAAGTG CCAGTAAACT GCTTCCTGAC GGAACTTCTG ATGGATGCTC AAAATCACGG TTGGTCAGAG TCTATCAAGG CCTT'rAAACA
GAGAGACCAT
CAGAACATCA
ATATGTTCCC
CAACCGATAT
GTTGGGATGC
CAGAATTTAC
CTTATGACTG
GGATTTTCAC
TAGAGCAATT
CATTTCAAAT
GAGTTTTTAC
TACATTTAAG
TTATCCGTCT
CCTCAGTCGT
TTTTGGTTTG
AGCGGAAAAC
GGATCGTGAA
CAAGCTTTAC
TTAATTTCTA TAAATGTTTT GTT'CTTCATA AATCAAAAAG AGCATTCCAG TTTTATCTTC TTGACTTACT CCTTGATTTA CATAGTCATC AAAATTAATG TCCCTCAAAA ATGGTATAAT AATCATAAAG AA.ATTGAGCC ACAGGAACAG ATACATCAAA GGAGCTGGTC TGCACGTAGG TACAAACGTG CGCA.AGGCTA CCTGCAGAGC AATACGCTAT ATTGCCAACT TCAAACGTCA GTCAACACAA CAGATCCAAA GAAAAAGGCT TGGCCTATCGA CCCATTGCCA ATGAAGALAGT GTCCGCAAAC CAATGCGCCA AATGACTT1AG ATGAACTAGA GGTAAATCAA CTGGTGCCAA 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 GGGTTGAGGA ATTGGGAACT AGCGTGGAGG CTATCCAGTT CTTACGCAGA GCGCTTGCTC ATATGCAACG CAACTGGATT TGTAACTTTC AA.AGTAAAAG GAACAGACAA GGAATTTACA GTCTTTACTA CTCGTCCGGA CACACTITrC GGTGCGACTT TCACTGTCTr GGCTCCTGAA CATGPATAG TAGACGCTAT CACAAGTTCA GAGCAAGCAG AAGCTGTAGC AGACTATAAA CACCAAGCCA GCCTTrAAGTC
TGACTTGGCT
CATCAACCCT
TI'ATGGAACA
CAAACAATr
CTACACAGAG
CGCTATTGCC
1099 CGTACAGACC TTGCTAAAGA AAAAACAGGG GTCAATGGTA AGGAAATGCC AATCTGG.ATT GGTGCGGTTA TGGCTGTGCC TGCCCACGAC GACCTTCCAA TCGTCGAAGT ACTTGAAGGT GATGGCCTGC ATGTCAATTC AGACTTCCTA AAGATTIGTGG CTTGGTTGGA AGAAAAAGGC GT'rTGGACTG GTGCT'rATGC
GCAGACTATG
CAACGTGACT
GGAAATGTCG
GATGGATTGA
TGTGGTCACG
TCCTGCTAG
GGGAATTTGC
AAGAAGCTGC
ACAAAGAAGA
AGAAGGTTAC
CTACCGTCTC CGCGACTGGC TCTTTAGCCG TCAACGTTAC TGGGGTGAGC CAATTCCAAT CATTCA~rGG GAAGATGGAA CTTCAACAGC TGTTCCTGA-A ACTGAA'N'GC CGCTTGTCTT
GCCTGTAACC
AGATTGGCT
GCCACAATGG
GAAATTGGCT
TGCGGAACAT
CCTCGGTGTT
GGGAACAAGC
TGATGGTTCC
GTCTAAkATCG
TACCCTTCGT
AGAAGGTTTG
AGAAATCCTT
TGTTACTGAG
TGTCAATGCT
AT'TGATTGCA
AGGTGAGTCA
TGAAATTGAA
AGATCTATCA
AATTGACGGT
CGTTA.AATAA
AAGGATATCC GTCCTTCAGG TACTGGTGAA GAAGTGACTC GTGAAGATGG TGTCAAAGGT GCTGGTTCAA GCTGGTACTA CCTCCGCTAT GATGAGGACC TCCTCAAACA ATGGTTGCCA GCTGTAC'1rC ACTTGCTTTA TGCTCGTTTC GTTCCGACTA.AGGAACCATT CCAAAAACTC TACCGTGACC ACCGTGGTGC TCTTGTGGCA 'rTCTTCCATG TAGAAACAGG GGAAGAGTTG CTCAAGAACG TTGTTAACCC AGACGATGTG GTTTATGAAA TGTTTATGGG ACCACTCGAT GAAGGAAGCC GTAAGTTCCT TGACCGAGTT GCGGAAAACA ATGGTGCTCT TGACAAGGTT AGTCCACTAG CTAACTTGAC CGTCGTGAA.A CCAACACTAT ATTGACCCGC ACAATACTGA GTAGATATCT ACGTOGGTGG TGGCATAAAT TCCTCTATGA TTTAACCAAG GGATGATTTT ACCGACAAGG TTGAAAAACG GAGCA.ACCC CAGCCAAGAT GTGGAACAAT ACGGTGCCGA GCTTCGATTG CTTGGTCAGA TACCGTTTGA TTACAAGTAA TACAACGAAA CAGTCAAAGC ATTGCCCAAC TTATGGTCTT TATGCCAAAG GCTTTATTCA TGGCAAACAG TCGCAGAAAC GAA.AGCAA.AT TGGTTGA-AGA GCCAAACTCA TGGTTGCTA-A 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840
CAAATTGAGT
GCTAACAAGG
CCATTTGCAC
ATCTCTTATG
ATTGTCGTCC
CGTGAAGAAT
AAGGAAATCG
CGAGTTTAT'r CTCTCAAATT CAACACAGCT -AAGATAAGCT TITATGTTGAC CTCACTTGGC AGAAGAACTC TAGCTTGGCC AACTTGGGAC AAATCAAAGG AAAAGTTCGT TACAAGAAAT CGCTTTAGCT TGAAAGTAAT TGCGGTACCG AGCTCTATCT GCCACCTTCA
GATGAAAA.AG
AATAAACTCG
ATAGTCCACT
TCAAAGCAGA
TTAATATCGT
GGACTATTGA
AsCCAACTAA ATTAGTTAAC ATTGTTGTGA AATAAGATAG GAGTCCTTCA GAGTAGAATC TGGACGATTT TTTGAATCTT CTTATGAAAG
GAAA.AGTGAA
TCAATGGAAG
GCTAGGATAG
CCTGATACGC
GAACTGGTCA
ATAGACAAGG
AACCGAGTAA
GGAACAGAAG
AThAAGAGAA
CTTATACTCC
AGGCTCTrC
CTCGGGAGAT
GCCTT-rAAA
CAACTGGTAA
TAGAAGTGGG
CCCAGTATCT
TAAGATGCCA
AGGTGAA?1'G
AGTGGAAAAG
GTGGACCTAC
TCAGATGTTG
GGCTTTGGGA
AGCTG'rCACT
CTTGGCTTGC
TCTTAACCTG
1100 TAT1GATATAC TATGGGCAAC TATAAAGTTT GTAAATGAAT ATGGTCAAAT GATTGGGGAG CCTTCTTTTG ATTT-CTTAGA ACGGCGrTAT CATGCGGAGG ATTTATTAGC TGTTTATGC CTCTTTCAGG AGTCAGTAGC AGACATGGAG GCTCG'rAAGG ACCGTTTTTA TTATGCAATC ACTTTTTCCC TCATGCGAAT TGATCAGAAT TTTTCTCCAG AGCTCAGGGG GACACGGATA TATGTCTTTG AGGAGCTTAA CCATCCAGAC GAGCAGCGGA
CTAT.CGTCGC
ACGTTT'GGGA
A
TATGAGTGGA AATGCGATGC TTTATTTATG; AAGGAAC=T GATTGGTTGT CTATGAT'rGA CCGTCAGGCA GTGGTTTATA AGGGCGTAC AAGAGATACG TAAGGACTGG CCTCAAGTCA AAGCTCGATT GGAAATATGG TTGCGTCCTG AAAACTTTGA TAAAAATGGA CGACAGCACA AGAGCTTGAG AGAACTT'TAA GAGGTGTTGA GATGATTACT ATTAAAAAGC AAGAAATTGT CAAGCTAGAG GATGTTTTGC ATCTCTATCA GGCTGTCGGT TGGACAAACT ATACCCATCA AACAGAGATG CTGGAGrCAGG 9900 9960 10020 10080 10140 10200 10260 10320 10380 10440 10500 10560 10620 10680 10740 10800 10860 10920 10980 11040 11100 11160 11220 11280 11340 11400 11460 11520 11580 11640 CCTTATCTCA TTCATTAGTA ATTTATC'rGG CACTTGATGG TTCGTTTGGT TGGAGATGGT TTTTCATCAG TTTTTGTACA GCTATCAGCG TCAAGGGATT GGTAGCTCCT TGATGAAAGA
AGGCCTATCA
CTATGGGCTT
AAAAATAAAA
TCATTAAATA
CTCCTAATGA
CTGAGGTGTA
AATCCTGGCT
CAAACTCTAA
AAT'rCGGACT CAGAT'rTCCT
GATGTACGAC
AGTCCAGCTG GCGACAGAAG AGACAGAAAA TGAAATCTTA TCCACCTATG ACTGTACAGG AAACTTCTTT GTTCTTAAGC AAAGTTTAAG AAGACCTCCT AACTTTATTT AATAAAATCC AGCCACCCAA TCAGGTGGCT TTNTTGCGGT AGTCCTCAGC CTGACTATCG TGAGGTAGCA TGATGCTGTG GTGGGCTTGA GGATTTGATT GTTTTGCCTA GGCTTTAGGA AATTTTAAAG AAACGTGGGA TTTTATCGTT AATGATTTGG ATAAACAGAG GATGGTCTAG TATCATATAG TAAACTTTTT 'rCATCACAAT ACGACGGGCA TGTCGTATAT GGGAGAGGAA GGGATAGCGA CTACGAACAG GAACGTGATA GTAAGGCGTA TATAGCGGAT AGTCCAAAAA GGTAGTCGTA ACCTATATGT GTAAATCACG AAGGTTTGTG TGAAAAAGAT AAATCTTTCT AGAGTCTAA ATT'rTCACTG TAACCTTTTA ACGTCCTCAT ATCTTGTATA TTATCCCGTG AGGTTTCATG AGCGCTGAAA GCGTAGTAAC AAGGAc3GCTT
AGAGTAATTG
GACTCTGCGT
AKACGAGGAA.A
AACGAATCAT
GAGAAGTCAG CCGAGCCCAT AGTAGTGAGG AAACTTCCGT AATGGAAGTG GAGCGA.AGG GTGAATACTC AAACALGTCTG GGGAGAGACT CTTrGAGGTC TGTCGCTAGA AAGAGAAAAC GACAGATCGA AGTAATCCTA CTTCACTTGT GTCTGTAAAA TGAGTGGTCT GATAGAACTG
GACTTTGAGG
11700 11760 11770
INFORMATION FOR SEQ ID NO: 173:
SEQUENCE CHARACTERISTICS:
LENGTH: 4185 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:
CGCGAAACTA CTTTCTTAGT ATAACACTTT CAGAATCATT TTTTTCAATT TTTTCAAGCT ATTTCCAAGG GTTGTAAAAT AAGTAGTAAA CTAACTACTA AAAACAAGGT TGCCAAGAGC TTTCAAGGCC TGATAACTAT ACCATGTGCG ATGGCAGTCG CAATGGTATC AATGCGTTCT AGCAGATTGC CTTTGATTCG TTGCATAAGA GCCTCCTGAG ACATCTTGAT AGTAAAGAAT GTCAGGCTGA CAGA.AtAAGC AATCTTATAG AACTGACTAG GATGGGTTGT CATCAAAAAG TTAATGGCCA AAT-rTAGCAG ATAAAAGAGC
TTTTTTCTCT
AGCGAGCTAA
GAAGCTTTCT
TCTTCCTGCA
GGCACACCAA
ATAATAGCCA
TCCTGGCTGG
GTCAATAGAA ATGACTTGAT CGTCCCTGAT TCTGCAAGAT 120 AAGGTAATAT AGTCTCCTTT 180 TTCCCAAAGC GGCGAACTCC 240 AAATCAAGGG CGTAATAATG 300 TGGATAATTC CATCCCACGC 360 AATCTGGAAT ATAGCGCAAG 420 TTTGATTTAA ACTGGAAGCA 480 GAGGAATGGT GCAAAGATAC 540
CCCTGCCAAA
TAGACCATCA
TCTTTAAAGC
AGCATTCTGG
TTCCCAGCTC
TCACACTTCT
AAACGTTTAA
GAATTTCTGA
TATCATAGCT
CTCTCCATAA AGTCCAACCC AACGGCAAAT ATCGTCGCAA TAAATAGAGG AGAAAGACTG AATCATGGCC GCCAATGATA TTAGAGTGTA GACACCGATT CATACTCGGG AGAAAAGAGA AAACGGCTAC AAAGGAAACA AAAAGATGGC AATCAGCAAG CCAGAATGAA AAAGAGAAGT GCTGGTAACC GATTAATTTA ATCCAGTGGA TCCACATCTA GGCTTTTACT AACAGCTCAG TTCTCCATCC ACCATGACAA GGTAATCATG ACAATGGTAT
CTGACAAGCG
GCTTGCATCC CTCTCTCCTT GTTTCTTAGC CAAGTTAAAG GATCGCTCAA CAGACTGGCT GGACCCGGTC TGAATAATCC ATGAATCACA GTATCTCTAT TCTTTGTAAA ATGCCGTTAA ATGGAGGTTT CTTTTAGT GGAACAGTAT CGGCAATCAA AGCATCAATT GCATATCATG 600 660 720 780 840 900 960 1020 1080 1140 1200 GCCCT'rTTG ATGTAACTCT TCGAGAAATT CCATAATCTC AGTATAGTTC TTCTGATCTT 1102 AGGAGAATAA TTTCAGCTCC TAAGACCAAA ATTGAAGCAA GACCTGCAGT CGGI'CATCIT TG.GTGACACG T-r TTT'CTGA AAAGTCCACA GATI'TCAAG CTCGCAAACG GAGCCCTAGA TAGGATTTTG TAGCACATAT TATCCTGTTT TTCCCAAAGA CTAGAGTTGA 'N'TCCCTGCT TATCTAAATG TAGGGAT'TTT GTCTAAAGAG TGACTGCAAT GACCTTTTGA GATAGACAAG CACCTAATTG ACGGAGAGTC AATCAGTCGC AAGCAACTGG AGACAATCCG ATCCACAGGG TCGTCGTCCC CTCTPCCTTA TGGGATCTAG AT'TGGCCAGT CACCAGCCAG ACTGACTCGC GTAALAGGAAG AAGGTCCAGC GGGCTGTCAC ATCATTTTCC ACTGCCCATC TGTATCCTGC TATCAAAGGC TACTrGACCC TGGGAATAAT CCCATTCAAA CAATTAAGAC TTTCTCTCCC
CCAAATGACA
GTI-rCATATA
GCCACCTCAT
CCTACTICGT1'
TAGCGTCCTT
CCATiTTTC
AAAATCGGTC
GCTGGGGTTT
TTATCCAGAT
GTTAGATAAA
TCAGGGCTCC
CGATGCAGAA
TGAATCTGGT
GGCTCATCAA
TGCTT'rTGTC 'rTTTC-AGCCC
AGAGCAAACG
AAAACTGTGC
GGGCAGAAAT AGGCCAATA CGGAATTCAT CTCTCGTTTC AATrCCTTC TCATCCACAC CAAAAATCAT ATTG-GTTGAA ATCATTTGAT CCGCCCGCTC TGCAACAGAA TCGCCTTTTA CCGTCTGAAT AAAGCTACTT ATAGCCTTGG CGACAATAGC AATC7r'rCA CCC7TTIAA TATCATCATA AGAAAAAGXT ACTTCCTCTA CTrITrGCCAG 1-rCATTCI'GC AACTGAACCT TCGCTAA'rrG TTCTTCCTTG ACTAAGTCCA CGGGTTCTCG AATTCCAT'T TGACTCAATA CATTAAAAAG GATACGACCA TCGTTTAVCA CGTCCTCCAA ACGGTGCTCG ATAATAAGAG CAATCAATTC GATAATATCC TGACCTGACT ACAAGAGAAT CGGACTTTCA TCAATCAAGA CACCTGACAA ATC=rGAGGA CGCTGATCCA ATTTATAAAC ACGACCTTTC ATCTCATCTA CCAAATCTTC TGCCACAGAC AAGCCAATAA TAACCAGA'rG AGACTTATCA TAGATGCTCA e 0 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 TTTATCAAAA ATTCTCCA TA CACTGACCCA AGGTAGATTT TTGTAA-ATGG TCA-AGTCTAT TGTCTGACCC TTGTAAATAT GTTGTGTTTC ATACCGGAAA GAGAAATCCT TCCACTCAAT TC'TCTTCATT CGCTTCTTAG ACTTCTATTT TATCATAAAT TCCTCTTAAA ATCTTAGCGC CAAAAAGATT CCTATCCTAC ATAAACATCG AAAAAGACTA CTTGCCCAGC CTTCCCCATC CTCTTCAAAC CACGTCAGcT TCGCTGCC GTAGGTATGG.
ATCTACAACC TCAAAACCAT GTTTGAGCc TGCTTCGTCA ACACTGTTTT GAGCAACtGC GGCTAGCTTC CTAGTTTGCT TAGTCCTTTT TCAAAC?1'CC TGCACGAGTT TGGGTTCCTG
ACCTGACCCA
CCCTTGCAAG
TaTAGCTTCT
CAAGCCCTTC
CTTACTTGCC
ATTTrTATACT
GATGGTCCAA
GTCGGTTCTT
TTCATCT'IAC
TTGC-AGTCTC
TAACTAATCT
CTTCGAAAAT
TTACTGACTt CGTCAGTTTC GTTCTATCCA CAATCTCAAA CTTTGATT'r CATTGAGTAT CATAGGCAAG TAAGAGAAGA GTTCCTGCAA TAGCTACAGA TACACCATTG ACTTrCTG CCGCTTCTTG ATAAATCACA ACAAGGGCAT TTGCAAGTAG TTGAATGAGA ACACCATTGA TCACGCGAAC GTAC?1'TCTA CTAGCGATAA TCCAAGTCCA CCATAGACCA CCAATCAACC CGACAAGCAA ACCGATAA'rC ACCGCATACT GAAGCTGGAT GCTrGTATTT ATGACGACAA AGAGGGCAGC GCCAATTCCG ATTrCCATAC TATTCTCCTA TTTTATCCTT 1103
GCAATTCCCG
ACATCTCCAA
TTAAAAATAA
AAAAGTCCCA
TAACCAACAA
GGTCCAAAAA
GGAACAGGGG
ACAGCAACAA
CTATTTTCT
CAACAATCCC TrGTGCAAAT GTGGTGCCAA GACACCCCAA GAATATCTTr
CAACTAAACC
GAGAGTCCTT
TAATAGAAAG
TTGGAATGTT
CTTGTTTAAT
TATTTCAATG
CCAGTCAAAA
AAAGAGTCCG
GATTGCATGA
TAGCGCTTGT
GATCATCCCG
TGTAAATTTG
GTCCAAGATG
AACCGACACC TACA'rTATAG GCCTrGGCAA AGGAACCT'rG GTTGATAGCC*AAACCTAAAC GATAGAGAGA GTTGATGTAA AGGATGGGTr GCCCAATTCT CACATCTGCA AATGA'rTTGC 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4185
CATAGACAAC
GATCACCAAA.
AACCGAAACG
TCGCTACGAC
AAGTAATGTG
AGGTATAAGA
CTACAATGCC
ATTGATTTTT
CAACCGATAC
CTGATTTTGA TAGACCAGCA TTCTGGTTCC AGCTTGrAA.A CACATCCAGA ATATCAATGG TGGAAGCTCT ACAATCTGTT ACCACTGGCC AGTTTAGCAC ATGCTCTGTG TTTTGACGCC TATCAGCATG ATAGATGGTC ATTCTTCCCG TGTGATAGAG CTCCCTTCAC CAGATGATCT CCACACTGAG CTCTGGCCCT CAGTATAGGC ATAGACATCA TATTGGCCAC CTCAGAAATC
ACTTCAAAAC
GTCCAAAGCG
TCTATGATGG
ACTTCCTCAA
CGACCGTGGA
TCACGAATGG
GTGACAATGT
CCTGGATCGA
AACGTGTTTC TTGATAAAGG AAAGCGTCCC ATTATCTGGC TGCAGTCTTG GCAACTACAC TCTTACGTTT CGAACCGACA AAACGTCGTT CCCTCAGGCC AGTAATCCAC CGTCT
INFORMATION FOR SEQ ID NO: 174:
SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2069 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174:
TGATAGAGTT AAAGCCGCTG AGTCATTCAA TCCATCTCCA ACCATCAAAA TAGTGTGACC TGCTTTCTGC AGTTTCTCTA CTAACTCAAA TTTCCCATCA GGTTTCAAGT CTGTATAGAC CTGATCAAAG GGCAAATCI' TGACTAATTC CTCTGTCCTA ATCAAGGTGT CTCCTGTTGC
Page(s) 1104-1106 were not lodged with this application
CCCTTTCTT GGAAAAATAA TrrGATTCAG TATGCAGTC j:GATTCATAG AAACTACAAG 3120
GAAGTcTT TGGTGCGTAT TTN~cGATAT GTGGATA~c ATG~cCTTA 'TTAGAAAAG 3180
ATGTTTCAGA AAcGAc.AAGT AGCTTACGA AAGATGGAT ATCGTGTCPAT CGAGGGTGAG 3240
GAGAAACAAT TCGTTTATGT .rGA'rAGTAGA TATrGAGAAG TGTTGAGAGA GGACTIAGCA 3300
GGGGAA.LAGAC AGGAATGTCT GCTATTTA CCTTA'rGTG ACCAGACAA ACTGATGAAT 3360
TTTrCTAAAAG AAT.T'AGGAT TAGTCAAAT GAGATATTA TACCAGAGAC GGTTGCAAAT 3420
AAAGCATGGC TAGACCAGT GAAGAGCCAG AAAAT1TAAAG TGTCTTAC TCAATCAAAA 3480
ATAGTAACGC crATTcTT'TT GGTGAATAAG ACTATTGTTIT GGTATCGTGC AATGCCA'ITA 3540
TTAGGGAAGG TAGATGAGA'r GACCATATTA CGTTTGGAAT CAGCTAGTAT AGTTI'CTGA 3600
C'rAGTGGCAG GTTTACGATA GAGAAAATT TTAAAAATT CTATGTATGA TITTCATC 3660
'TTTAGTGAGA CTrGTTGCCAT TATCACATTC GAATCACACA AAATAAAAAA ATTTTTATAA 3720
GTACTTGACA AATAGATTGA AATATCAT'AA AATAAAAACG GTTACAGAGT TATTAATTAT 3780
TAAGCTCA TGTCACCAT AAAAATGAA ATAAAAGGAT GTTATCACTA ATACAAGTGA 3840
GCAGGAACCT ATTTAATCAC ATCAGAAGAA GTTTCTTGAT GTTTTT-AAG'r AGGTCCT'rT 3900
TATTTTAAAA GGGAAAT'I'T ATGATCATA~A AACGAATACT AAACCACA.T GCCGTAATT 3960
CGCAAAGTAA AAAAGATATC GATATTC T'C TTTTGGAAG GGGAATAGCT TTTGGAAGAA 4020
AAACTGGAGA TAAAGTAAA'r CCAATTGATA TGAGAAAG TTT'TTTCTC AAAAATAGAG 4080
ATAATATGAC CCGTTTTACA cGAGATGTTTA T1'AACGTTCC TTTGGAGTTG GTGTACATCA 4140
CCGAAAAAAT AATTAACCTA GGTAAAATA CATTGGGTA TAATTTTGAT GAAATAT1CT 4200
ATATTAATTT AACGGATCA'r ATT1TCTTCGA GCATAGAACG TTATAAAGAA GGGATTAT1A 4260
TT'TCGAATCC CCTACGCGG GAAATATCGA AATATTATA AGAAGAATTT GAACTTGGGA 4320
AAAGGGCTT ACAAATAATA AAAAAAGAGT TAGGTATGA ACTTCCAATT1 GACGAAGCI'G 4380
CATTCATAGC GCTACATTTT GTTAATGCTA ATTrTAGAAAA TAATT'TCA GAGTCGTATA 4440
AAATCACTA AATAATTATG GGAATGAGA AAATcATTCA AGATTCTAT TGTACTGAGT 4500
TTAACCAAGA TTCTATTGAT TATATAGAT TCATAACTCA TATGAAATTA TTTGCCCATC 4560
P G GGTTGA GAATACAACT TATT'GTGACG ATGATGA 4597
INFORMATION FOR SEQ ID NO: 176:
SEQUENCE CHARACTERISTICS:
LENGTH: 3984 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:
CGGCTTATTT ACTACTTGTT AGGGAATTT'r TTATCCACTA TCCTGCACTA TTGGATACAA TTTGATAGGT AAACTACTAA TGGTTGATTG CCTTCGTTAT AGGTGTTAAG CCGGTCCTGC TCGTAATATT TCTGCCTCTA CCAGCGAACT TCAGCCCCAT ATTAATCTTT TCACCTCCAA CTTTACTAAA ATGCGAGPAAA ACACCTGCTT GAGCCTTTAT
CCATCATATA
AATAAAGAGC
CTGCCGGAAG
AATAGGCCGA
CAGGTATGCG
CGCCTGCTTG
AAACTTCTGA
TAATCTCAGA
AAGGATTAAA
TGGAATATGC ATGAAACCTG TTGGTACATC AA=TATTGC
CTCTCATATT
AAACAAAGGT
TCCCTGTTTT 'N'GATAGCTT GTACCATCGC TGCTCCATCA ATACGAATCG GTGTATCAAT AGCATCATCT TGATTAATAG CCACTCGTTC TCCAATACA.A AGTACAGCAT CTGGTTGATA CGACTTATAA AAAACCGTTG GAATTTCTAC TGGPAATAAT TTTACAGCCT ACCTGTAACC AATATTTTCA
S S GTACATTAAG AATATGTGPA AACGCCATTC TGATCTTTCA AGAGCGTTAA AATTAGCAGC CATTGGGGTC AATAAGGTCC GCAAGAGCAC CAGCCACAAT TGGATTAGCT CCCAGAGCAA CCTGCTGTAA TAACGGTGAA TGCTGCAAAA GCAT'rTCCCA ATrCCAAGAA CATAGGCCAA AACTCCTATA AAGCGACTAT ATCAGATGAG AGATAACATC ACCAACACCT GCTACAGTAA AATAATTGAG GAACAATCCC ACTTGTTGAA ACTTGCTGAG
TAACAATCAT
TATCCATCAA
CACAATAACC
ATACAAAGGG
TAATCATTGT
C'PGAAGGAAC
AAATAGCCCC
TCATTCGATT
AAATTGTAGC
GCGCTAAGAC
ACCTGTTAGA
CCAAAGCTGG
iTN'TATTTTC
TACTAGAGCA
TGCTGCTGGT
TGCTGTCATG
AACTCCAACA
GAATAGAACC
AATACCGCTA
CAAAGCCCCT
ATT'rTCTGAT
AAACAAGGCG
CAACGCAAGT
TTCAATATTG
AACAGACTCT TAGGGTGACT ATTGGTAATC GCAAGGCTAA TCGAAATCTT GCTAAATTCT ATTGCCATCA GCATAACTGG GCTTTCATTT CATCTAAGGA AGCGATAATA GGATTACAAT AACGTAATAG ACAATAGAGT TCTTTATAAC TACAATAGGC TCTAATAGT'r GCTTTGCCAA TCAATTTTr'r ATCAAATAAA
AATAAAAATT
TGGCAAGGTT~
AATACCAATA
CCAAAATGCA
TGTATGGAGA
CTCTGTCATT
TAATTATAAA
ACAAGAACAG
GGAATCATTT
TTATTTTTCA
CCGATACGGA
CTCATATTTG
CTTGCTTAAA CAATGTTAAC GCATATAGGA ACCACCTATA GATGTCCCAA GTCGAACTGG AATTGACAAC CAATCACAAT TTTG'ITCTCC TCCCCTAGTC TCCCCACTAC AATAAGTGTT
GTTTGTTTTA
ATAGGTCAAC
TTTTTTGATA
ATAACAGCAA
CAATAATAGA TGTAGAAGCA ATCCCTGCAT AATTGCTTTC ATAGCCTAAC TGATCTAATG 12 1620 TTCCCCCTAT CAAGAGGACT CCCCCAGCAC CAAAATTTTC ATTCGCAGCC TTCTACCTAA T'rGAGACTCT ACTGAGGGTG TCCTCCTAGA AAACTGTATA GAAGTTTCCA GT'rGCTTGAG TCCAAAGGTT
GCACGCCCTT
GCAGCTGCTT
CGAATTGAAA
ACTGTCAGAC
TCTGACAGCC
AATTCTTTTC
ACCAAACCAG
ATGATTGCTA
1109 CTACAAACGT AT7M-GAGCA AAGAAATTTC TTATTGTCTC ATCTrCAACC TCTG-TAACT CTCCCATAGG TTGAACCAAA GGTCTGACA AGAAACCAGC TAACTCTCGA ATAAAGAAAT CTTT'AATCTT TCGAATCAAA TCGATTGATC
TGAGCACTCG
CACCTGAAAC
TTAAAATAAA
CTGATTGCTA
'rAAAGCTG'rA
ACCCACAACA
CCACAAGAGG
CCAAAATCTC
CTAAGACTAC
TTCCTATTAA
TAAAATTTTA
CAAGG'TAACC
CAAAAATTCA
TGTTGCAATT
TCTAATCCAC
TCAAGATCAA
ACTCCTTTAT ATTCAAAATG ACAGTAT'I-r' TATTTAATGT CTTTTTCTAG TTCTTTTTGG TATTTGCTAT TGGATTCCAA TGCCATTTTT TAAAAACCTC Gr'rATATTCT TTTGTTGTAA CAATATCTTT A'N'CCTTTAA AGATATATGG ATCCCCCTTA ATACCAACT'r GTGAGTATGG GGTACTACGT TACTTACAAC TGGAGAACCA CCAGATGAAG CTGTTGGCAT CTATCTGTCG ACCAAGCTTG AGCTT'rGGCA TATTTTT-CAT A'rCTT'rTCTC GTCTCAGAAA CAGCATC'rTC TAACAAT'rTC TTATATTTAT CCAAACCAGG
ATAAAAATCG
ACGAGAGAAA
GTATCAAATT
TCCATATCAA
TACCAT'rCCT
TTTTTCTTTT
TTGCAATTTC
TTTTGAGAAT
CAATAATGAA
TAGGTCAGTG
TTTACCTACA
1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 ACATCCTTAT CTTTTCCTTT CGTAATACCA AGGTGTTTCA TGGCAGAACC AGATTrTGGA TCTATAATAT TCAAGTGAGA CGCTCGATCT TGATAGCTTG GAGCCCATCC TGTACTGTTC AAATCATAGT CTTTTTGAGA AGGAGCAACA TTGCCGTATT TATCATTTTC CATCAAACCA TCAATAACAT 'r'CCAATAAC GTCTGTCCTC GATGTTCGAG TCGCTATACT GTAGCCCAAT GATGCTGGAT CTACTGCATA GACATAAGAA AATGTTGTCG GTGCATCTGC TTCTT'rATCA GTTTTTCCAC AAGCCACTAA AATAGCTGAC GTGCTCAGGA CCACTCCTGC TGTTAAGAC CACTTTTTCT AT'rTCATAAA GAATCTCCTT TGGTTTATTT TAATCTACTT TTACAATCCA ACCTTCTGGC GCTTCAATAT CGCCAAACTG AATACCCGTC AATTCATTAT ATAATTTACG CGTCACAGGA CCTACTTCTG rr'rCACTATA GAATACATGG AAATCATCAC CATGTTGAAT ACCTCCAATT GGAGAAATAA CCGCTGCTGT ACCACAGGCA CCTGCCTCTA CAAAACGGTC AAGATTATCA ATTGGAACAT CACCCTCAAT AGGAGTTAAT CCCAAGCGAT GTTCTGCCAA ATAAAGCAAG GAATACTTGG TAATAGATGG CAAGATAGAT GGACTCAA'rG GTGTTACAAA TTCAT'rATCA GCTGTAATTC CAAAGAAGTT AGCTGATCCG ACTTCTTCAA TCTTTGTATG
AGTTGA'rGGG TCCAGATAGA TAACATCTGA GAAATGACGT GACTTGGCCA TTTTTCCTGG 3420
TAAGAGACTT GCAGCATAGT TTCCACCAAC CTrAGCCGCA CCTGTACCAT TTGGTGCTGC 3480
ACGGTCGTAC TCATCCTGAA TCAAGAAGTr GGTrGGGACC AAACCACCTr TAAAGTAATr 3540
TCCAACTGGC ATAGCAAAGA TGGTGAAAAT GTACTCTTCr GCCGGTN'TA CCCCGATAAT 3600
ATCTCCGACA CCAATCAAAA GAGGGCGAAG ATATAAGGTT CCACCTGTTC CGTATGGTGG 3660
TACGTATrCT TCATTCGCAC GGACAACTGC ?r'rACAAGCT TCTACAAACA TGTCTGTCGG 3720
AACTTGTGGC ATCAAGAGAC GGTCACATGT ACGTTGCAGA CGTTTAGCAT TTTCATCAGG 3780
ACGGAACAGT TGAACACTGC CATCCTTAGT ACGATAAGCT TTCAAACCTT CAAATGCr'rG 3840
TTGTCCATAG TGAAGACTTG GAGAAGACTC TGAAATATGC AAAGTTGCAT CCTCTGTA.AG 3900
CTCTCCTTGA TCCCATTGTC CATTTTTGAA ATGAGCAAGA TAGCGATAAG GTAATTTCAT 3960
ATAGGAAAAA CCGAGG'TT CCGG 3984
INFORMATION FOR SEQ ID NO: 177:
SEQUENCE CHARACTERISTICS:
LENGTH: 8703 base pairs
(B) TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177:
TATCTAATTA TTGGTTTTTA TCGCTGACCT TGGCTATTGT TGGGGTTGTT
TTACCCTTGT TGCCTACAAC ACCTTTCCTT TTGTTGTCTA TTGCTTGTTT CTCCAGAAGT TCCAAGCGAT 120 *TCGAAGATTG GCTT'rATCAT ACCAAGCTCT ATCAAGCATA TGTAGCTGAT TTTCGTGAGA 180 CCAAGTCTAT TGCGCGTGAA CGAAAGAAAA AAATCATCGT CTCTATCTAC GTCTTGATGG 240 GAATTTCTAT TTATTTTGCA CCTCTTTTAC CAGTCAAAAT CGGTCTGGGT GCTTTGACCA 300 TCTTTATTAC TTATTATCTC 'FTCAAGGTCA TTCCAGACAA AGAATAGTTA AAACAGTAG r 360 *TATT'rGCCTT GATAAAATTG AAAGCATATT CATAACAATA TGATATA ATA AAATTGAAGT 420 AATATTCAAG GAGAATCAAA TGATTTACGA ATTTTGTGCT GAAAATGTGA CTTTACTTGA 480 AAAAGCGATG CAGGCTGGAG CTCGTCGGAT TGAACTCTGT GATAATCTAG CAGTTGGTGG 540 GACAACACCC AGCTATGGAG TGACTAAGGC AGCGGTTGAA CTGGCAGCTA ACTACGATAC 600 *AACCATCATG ACCATGATTC GGCCACGTGG TGGTGACTI' GTCTATAATG ACCTAGAAAT 660 TGCTATCATG CTAGAAGACA ?TCGTTTGAC TGCTCAGGCT GGAAGTCAAG GGGTTGTATT 720 TGGAGCTT'rA ACTGCTGATA AAAAGTTGGA TAAGCCTAAT CTGGAAAAGT TAATTGCTGC 780 ATCAAAAGGA ATGGAAATTG AGCCGAAGCT ATTGACTGGC TGTGTCTGGC GACTCCTTAG TAAAGGTAAA ATTGAAAT'rC TCTTTCACAT GGCCTTTGAT TCACTCAAGC CGGTGTCACT AAAAACGTTT TGTTCACTrAT TACCAGGTGG GGCGA'rTGAC TArLGACCAG GTGGGGGTAA CACAATTGC-A TGGTACTAAG AAGGAACTGC TAGCTTTGGG TAGCAGTTTT CACTTATGTT AATTTAATCA AGAAAAGGCT CATGATTATG GTTTTATAGA ATAGTTGCCA GATTITrGCAA GGTGACTG TCATGACTGT TGAACTTTCA AGTCTTTGAC CAAGAGACTG GTGACCTCTA GCATGAGCGG AAGTT'rTGTC GGAAATGTCC GTGAGGCTTG GAACTAAGTG ATGAAGATCA CGTATCCTAA CTCGTCCrGG CACAGAATTT TGGAGTACGC CTTGAAAACC GTCAAACCTr GTTGTTT'T'rT AAAAAATAGA TGAAATTTTT AAATCCTATC AAATAGCGAA GTC'PGGACAT GTCCATCACT GCTGATAATG TCCTCACGTT TATATGGAAA TCTGGAGATT CTTTACCAGA
C
TTCGGAAGGC TTGTT'TCAT GTGCAAGATT TTATCTGTCA TCAGACTAAG CGTATCATGA CTCAAGT'rCA GGAAAAGTAT GGAAACCAGT TGGAGTATCT GTGGGAAAAA TCGCCTGATA CAGCTGTATT GCGCCATGAA GGCAATCAAA AGTGGTATGC CGTCTTGATG AAAATCTCTr GGAATAAGCT GGAAAAGGGC AGAGAAGGAC AAGTGGAAGC AGTCAACCTC AAGCATGACC AAGTAGCTAA TTTGCTTTCA CAAAAGGGGA TTTATCCAGC CTTCCATATG AGCAAGCGCT 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 ACTGGATTAG TGTGTCCCTT GATGATACTr' AAAAAAGT'rG GAACTTAACC TCTAAAAAAT AATTAGCTAA ATATTCTTTA CTGAAGAGAT AGGAATATGG TGCC-ATCTTC AAATACCTGA TCTATTCGCC TCACTGTCAT GATGACCTCG TCAAGAATGG TGCAGGACG'r GTTGAAGGA TATCAGATGA AGAAGTACTG GAAATATTTT AATAATTTTC TTTTAGAAAA TATAGGATTT TTGAGAATGT CAAGACGGAT GAATGGCAGT GGCAAATAGC CTATCAATGG TATTAGGGAG TCAATATTCG CCAAGATTAC ATACGTCAGA AATGGTTTCT TTGGTGGCAA TACCTTCTCC CTCTCACTTA TGAGATCATC
GAATTGATAG
ATGAACTTTC
ACCACACTAG
CGTCAGATCA
CTrTGCTGCTG
CGAGCTGAAA
TACCAAGTAG
CGCTTCTCTG
CACGAATCTC
ACACCTGAAT
ATGCTGCTTT GGAAGAAAT-T AAACCAGTA'r TGTCCTAAAT
GCAGTGGCTC
GAGACCATCA
GTATTrCCAGT
GTATTCACCA
TCCTAAAAAC AAAGCCGTCG AGA'rGGAdTC CTTAAAAATC
TGGTTGGTGT TAAGATTCTG CTTGGAAAAT TATCTGGTCG CCATGCTTTT GTTGAGAAAC TGAGAGAATT GGCCCTAGAT TN'ACAGAAG AGGATATCAA ACCACTCTTT GCTAAGTTCA AGGCACTGGT CGATAAGAAG CAAGAAATCA CAGATGCAGA TATTCGAGCT TTGGTAGCTG GAACCATGGT TGAAAATCCA GAAGGCTTCC ACTTTGATGA TTTACAACTT CAAACTCATG 1112 CAGATAATGA CATTGAAGCG CTCGTTAGCC TAGCCAATAT GGATGGTGAG AAAGTCGAAT TTAATGCGAC AGGGCAAGGT TCCGTTGAAG CAATCTr'rAA TGCTATCGAT AAGTTCTTTA ACCAATCTGT TCGTTTGGTG TCCTACACTA TCGATGCGGT AACAGATGGA ATCGATACCC AGGATCGGGT TTTGGTCACT GTTGAAAACA GAGATACAGA AACCATCTrTT AATGCAGCAG GGCTTGATTI' TGATGTGTTG AAGGCTTCTG CTATTGTCTA TATAAACGCT AATACCTTTG TTCAAAAAGA GAATGCAGGT TGTAAAGGAG AAGGCTATGG AGAAATCATG GAGGTTGGTT CTATGAGATT GACAGACGAC ACCTGATGAA ACCCTTAAGG TAGTCCTCAG TATGATGGAG GGAACTCAAT CTTTACGCTA GTCACCACTC AAACTGGAAC AGGCGGGATT TACTTTGGAT GAGATG4GGAC
CAAAGAAAAT
GCAGTGTTTC T'rACCACGAT ATGCCTAGTG AGTAGCTCTA GCAGGAGACG GAATTGGCCC TAGAAGTTCT GGAGGCTCTA CGTTCGGAGG TGCAGATATT CAAGTAGGGA AGCAGATGCT GCTGAAAAAA CAGGTTTTGA
GATGCAGCAT
ATCCTACTAG
GACCTCCCTT
TAGCTATCGG
CAGTTTCG
ATATTCGTCC
GAATTGCTGG
ATCATATTCT
CCCTGAACAA GGCCTGATGG CTCTCCGTAA TGTAAAAATC TTTGACAGTC TCAAGCAT'TT TGTAGACTTT GTCGTGGTGC GTGAATTGAC TGAAGAGCGC AATGCGCGTG ATATCAACGA TCGCAAAGCC TTTGAAATTG CAAGAAATCG CTATAGCTAT GAGGAAGTGG AGCGGATTAT 2580 2640 -2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 a. *0
CAGAAAAATC
GAAAGTAGCT
AGACTCAGCT
GAATCTTT
TATGCCATCA
AGCACCTGAT
CATGATGTTG
TGAGACAAGT
GGAAATGACG
CTTTTGATTT
ACGAAGAGT'TT
TTTGGTGCCT
ATAGTTTGGT
TGGATGGCAG
GTCGTGATTG
GTTACTAGTA TCGATAAGCA AAATGTTCTA GCGACCTCAA GAGGAAGTCG CACAGGATTT CCCAGATGTA ACCT'rGGAAC GCTATGCTTA TGATTACCAA TCCTGCTAAG TTrGATGTTA GGAGATATTT TATCTGATGA ATCAAGCGTC TTATCTGGTA GCCAGTCATT CTGAAAATGG ACCAAGTCTC TATGAACCTA TTCACGGTTC ATTGCAGGTC AAGGAATTGC CAATCCTATT TCCATGATTT TATCAGTTTC AGAGATAGTT rCGGACGTTA TGAGGATGCA GAGCGTATCA AACGTGCTGT CTGGCGGCAG GAATTTTAAC GAGAGATATA GGAGGTCAGG CTrCAACAAA GAAGCTATTA TTGCAAGGTT ATGAAGTTAG ACGAAAAAAT TACTCTAGTC GGAATGTCAT CATTTrCTTG ATTTATGGTA TTGACAAATC TAAGGCAAGG GGCGCATCCC TGAGAAAATC TTACTTATTT TAGCCTTTAC TTTTGGTGGT GGCTAGCAGG AATCATCTTT CACCACAAGA CTCGAAAATG GTACT'rTAAA TTCTrGGGAT GGTGACCACA CTAGTAGCCT TATATTrTAT TTGGAGGTAA GGTCTTCGAG GGAATACGCT GCTTGGGCTC TAGCGGACTA TGGTTT'rAAG CAGGATCTTT CGGTGACATT CATTACAATA ATGAACTCAA TAATGGCATG
AACTCTGGCG
ATCAGCTGGT
TTGTAACGGA
CACTTGGGGT
TTIGCCAATCG ?TCrAGCCTAG AGAGGTTAGA CAGGTAACTG TGGACTTGGA ACAACAAAAA GAGATAGATA GCGAGTGGAA ACATAAACTC T1'GCAGTATG AAGAGTTGAT TGCTGCTTA'r TAGAAAAAAT AGAAALAGGAG ATATAGTAAA GCTTAACATC CATTTCCAGC AATTTTTTAG CATTATAAAA TTATGACAAA ACACATTCAC 1113 GAGAAACTAG CCCAGCTAAA ACCAACCGAC ATCATCTCAC CAGTTGAAGA ATT-CACCTTC CTAAATAGTT TGGATGATAT GAAAAACAAC GACCAGCCTA CTGAAATAAG ATGTAAACAA AAACTACAGT GGACTATTWI AAGAAGGCTA CGACATTTTA GCGGA'rGTAT CGTTTGCCCT ACTAAAGTTG GT'rACATTAT CATGACCAGT GACI'GAGCG TAAGTTCGCA GCCAAAGAAC GTAAGCGTA.A CAAACCAGGT
CGGTATACC
CTGGCAGGAT
ATGAA'DTGGA
GGATI'CAACA
AAAGGTGAGG
GACAAGGCAG
GTTGTTCTCT
GAAGCATTCT
AAACCAGAAG
GATGTACGTG
GCCAAGCTITT
0 0 S.0 0 0 GCGGTAGCAT GCATGAACTT ACTAAAAACA TTGGGATGAA CCTTTGA-AAA ACTCAAAGCA GTACTAGCTG TTTTGTTATC GGGAAGAAGG 'rAAALATGGTC CTATGAGCAA GGTGTAATGG AGGAGGAGCA CGTTCAACTT TAAAATCATG ATGCACCTGT TTATTAGGAT AGAGAAGAAG TTGTTTAGAA TTTCTTTI'CT TAATAAGTAA TGCCAATAT TGCGCTTTAG CGCAACTCAA CCCAGAAAr GATATTrCTTC TTGGTTGTAT CTCCTTGG TACGGGGATG GCCGTGAAGA ACTTATTACT AAGTTTGGAA AAGCAGGTGA ACAATTGGCT TACGCCTCAT CTGCTTCAAT
TGTCTATGGT
CACCAGCTCC
CAGATACTTT
TCTAGTGTTA
TTTTCTATAG
TAAACATCAT
CGATAAGGAC
AGTTGTGATC
TAACTCATGG
TGAGATATTA
GATATGGTAT
TAGTAAAAGG
GACAAAACGA TTGAAACTCG GGCAAACTCA TCCCAGAACA CGTAAAGGGC TTGACATTCA GACTACCGTC AGGTTGAGTA AAGCTCCTAA CAC'rGGGCTT TCTATGTAGA AAATATATGT AGTTAGATTG ATGAATAAAA 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 GAAAAGTTAG T'rTAGAAGAT T 'rTATAAAT GGTATAGTCT AAATAAAGAA GAG~TATTAA ATAAGGCAAC TGTTGGTGAA AAGTTTAATG ATAAATTAAA AGAAGAGTTT CTCCAGGAAT GGCCTTTGGA TAGGATTTTA ACAATGTCAA. TCGATGAATA TGTAA'rAGGA AAAATAAGTC T'rTATGCTAC GCTCTTGAGA AGGGAAAATA CAAAAATCTA TTTCTGGTGG CTC-AGCT'rCA AAATTTGGTA TTTATTGGAA TAAAAAAACA AAGATCAAGC TA.ATAATGAG AT'rTCAGAGT TGGATCAGCG AITMCAAAA ATTTGTATGA AATTATCAAA GAAGGTATC GTTTTAACTT TGAAAATCCT TGAAAAGATC AACAAATGAA TTTAT'rGGTC GTTCTGCTAT GGTGACAAAA TCTATACTGA GGGAGATCCT TTCTTTGGTG TAAATATTAA TAGTCAGAAA
AAGGGACAGC
TTTCTTGGAA
AACAAATATA
TTAAAATCAG
ATTTTTGATA
'rrACTTTGTA
GAATTTTGGA
ACCACTTTGT
AACTGGTCTC
AGTATTCTAA
ATTTTCG'rCA
CTCCTGGCAC
TTCTCAGAC-A AATCAAGGTG CAAAACTTAT CCTGAGTTGG GCTTTTTATG GAAAA'rAAGG TCAAT'rAACT CAATCTCTAT GGGAAAAACT TATCTTGCTA 1114
GACCTTATCT
AGCCATCGAA
AAGACAATAG
TAAAGTCTCC
AAGAAATTGC
GCAAAATCAT AAAA'rAATTG ATTAGGAACT ATGCTTTG TACAATGGAT TCATCAAACA AAACCTCATC CTCCGCCGTG TAAAGAATTA ACGGATGGCA ACGAAGATCA AATCGGATTT GTACAATTTC AAGGTTTAAG ACCAGTATCA AATGGGCATG TTTTTAAAGA TTTTTGTCAG AAAGCAAAAG TTGATGAGGC TTGGGATTCT TACTTAGAAT TAACAAAAAC ATCTTACTTA TCTGTTAATA G'rGGTGTTCC AGGATCGTCA CTACCTAGCA A'rTATAATAA GCAAGAATAC 'rACAAAAGTG AGAGATTTGG TTT'GAAAGAC TATGTTTCCC ACCCATCATA TOATTATACG GATTTTGTAG GACCTATTGA GTTTAGGCTA CAGGACGGTA AAACCCAATT GATTGGAGGA CAAGATAATT ATATAAATGT TGCTGAAGAA AAAGAATATA GTAGACAAAA TTTGTCAGTA AATTATGATA AATATGTT'rA CGAGTTGTAT AAAGATAAAA GTCGAAAAAC TGTCCTAGAA ACATTGAGAA CAACAGAAAT TGATACTGAT AAGAATTTTG 0 .0 0.0 **o 0 0 ,.o TCTTCATCAT CGATGAGATC AATCGTCGGG AGATTTCTAA TCTCTATCGA CCCCGGCTAT CCTGCTGAAA AAGGAAGTGT TACACGAAAC TGATGAAAAG TTCTATATCC CCGAAAATGT ATGATATTGA TCGTTCAGTG GATACCTTTG ATTTTGCTAT TTGAAGTTAC TGTCGAGGGT CAAGCTGGCA TGT'rGGATAA AAOAAGCAAA AATTCGTCTA AGAAACTTGA ACOCTGCTAT ACAGTCATTA TCATATTGGA CCAAGTTATT TTCTTAA GTT ATGAATTACT CTGGTCTGAT TATATTAAGC CTCTCCTAGA ATGATGAGGT TGAAACTTTG AAAAAGATCA GGCAGTAGCT GATAATCAAC ACAAGAT'rAT CCTC'rTTTAG ACAGAACCTT -AATGATTWGA CTCATACTCC CAGAAAATCA AGACAGGGAA ATTTCCTCAC GATTT'rCTGA GTTCTTCATA TCAATCTCAC CAACTTTTGG TGTATCTCTTI GA'TTTTTGGC GAACTCTTTT TTCTACCCAA TATGCAAATC TTACATCA'rC GGAACTATGA GCGTCGTCGT TTTCGTT'rTG AGAGTTGAAT ATCCATGCAG CGAAAATAT'r CAGGAATTAA GAAGGATGTA GATrGACT AGACTACTTG CGAGGTTCTT TGATCTGACA AATAATGAGC AAACGATGAT GCGGATTACT AATATCCTAA ACTAAGCAAT AACGTATTTT CATT'TTTCCA AGATTTTTGA AACAGTCAAT ATGGTCAGGA AAGATTAACG TGCATTATCT CTTAAACAAG CTCGTGAAGA GAGGCTTTAT GAaACTTTGA AAAAAGCATT GATGACAATG AAGGCGATGA TAAAGAAAAA TTTGTTGAAG GGAAAGTCTA TCCCAAGATG TGATTTGGAT AAGGACCAAA CGTGATTGGT TTTCTTGGAT TGAGAGTAAT GACCACTTTT TAGTTTAGAT c?'rGCTTTGT 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 TCCCAAGTAT CTACAAGCTG C'rATTCGAAA AGGTCTTTAT 1115 AAGGAATATC ATCGATTTTC AACCA'rCTCA AGAAAAATCT 
ACCTATGATA ATCCCCTCAT AAAAGCATTG GTCAAGGGGT ATCG'rGCGTG TAACGCCCTC CAATCTAAAC CTATACGTCA 'rCATAACGAC AGTCATGTrA AGGGAGTGAT TGATGTAAGA TCCTTTCACG GGAAATATTG CCTACGCAAC GAGAG.AGTTC GCAGTTGGTC CGTCACACTA TTGAATACAT TAAGAATCAG ACTAGATAAT CTC'rCAACTA TTATAAACTA GCTGATCGTG TGCATACTT'r CACGAGTACA CTGATGATCC TAAACCAAGA AAAGCACGGT TrAGGGTATC ATTCTCTTTG ATGTTGCCTG GCTTTGGGAA GAGTATGTT 'rTTGTACATC
CGAAAAGTAT
AAAAAACTGG
TATTCTTATA
GTAAATAGTG
CGAATCCCTC
GAG
CCAGAAATAA
ATCCAGATTT
AATTGACTGA
TTI-rAAAAGC
AAATAGGAAA
AGAATGCCTC
GGATAAGACG GATGGAATTT TTATGACAGA GAACGAAAGA AAAAGGAATC AACCGTGAGG TGAGAAGGCT GGACTGATTT AGTAGCTGGC TATGGAGCTC ATTCTATAGT ACATTTTGTA GTCGTGAAAA CGTATCTGAA CTAAGATTAT TCGGGGAAAT GAAACTTACA AGAACTI'TGT AAGATCAAAA AATCTATGGT ACACCTrGTT GCCAAAAGGT CAGTATTrTC TGTrGGGAAA TTGTTCTAGA TGCAAAATAT ACTTATTCCA GCTGATTTCC TTCCTAGTAT GGAGCAGTCA AATTGAAGAA GTGGTCTATT AAATGATGGA AAATTCAGAA
INFORMATION FOR SEQ ID NO: 178:
SEQUENCE CHARACTERISTICS:
LENGTH: 4854 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178:
CATCACCAGT TTTAGATGGC TTTAACAGTG AAATTATTGC TTTTAATCTT TCTTGTTCGC CTAATTTAGA ACAAGTACAA ACAATGTTGG AACAGGCATT CAAAGAGAAG CACTACGAGA ATACGATTCT CCATAGTGAC CAAGGCTGGC AATATCAACA CGATTCTTAT CATCGGTTCC TAGAGAGTAA GGGAAT'rCAA GCATCCATGT CACGTAAGGG CAACAGCCAA GACAACGGTA GGATGGAATC TTTCTTTGGC ATTTTAAAAT CCGAAATGTT TTATGGCTAT GAGAAAACAT TTAAATCACT TAACCAATTG GAACAAGCCA TTATAGACTA TATTGATTAT TACAACAATA AGAAAATTAA GATAAAACTA AAAGGACTTA GTCCTGTGCA GTACAGAACT AAATCCTTTG GATAAATTAT TTGTCTAACT GTTTGGGGGC AGTACACAAG AAAGCGCTTT AAAACCAGTA TTGATGTACC AAGATGAGGC GACC'TrrCA 'rAAGGTTCGC AACTGGGA'rC 'TG7TGTCT GAGAATTTCG CTATTGTTAT TAGCTGGTGG ATGTAATACT ATCCAGATGA TTATCTTTTA TAAAGAT'rCC GACTAATATT CAT'rGAACAA GTGTGGAA.AG TTTGGAAGAT GTCATGAATC AAAGTCCATC GTTAATCGGA TTGAATTGCT TATAAAAAAG TGATATAGAG TTrCTTGGTTT TTTTGATTTC TTCTGGATGA CAGTACGGCC AGCACGGTGT
CCAATAGGAG
CGAGCrTr'rG
GAGTGGATGA
TAGGTCCACA
ATGCCCATAC
ACGCCTTTTT
TGGTTTCGGT AGAATCAG'rA TGTCCATACT CZACTATATAC AGGCGAATCA TTTTTCTAA AGAAGAGCTT TCACAAGCTT
CTCGTTIATGG ACAATGCTAT ATGGCATAAA TCAAGTACCT GGTTTTACCT 'rTATTCCrCC ATACACACCA GAGATGAACC AGAT'rCGTAA ACGTGGATTT AAGAATAAAG CCTTrCGAAC AACTCCAAGA TG'rCATACAA GGATTGGAGA AGGAGGTGAT GATGGACTAG AATGCTTTTT GAAAACAGAT GAGTATAAPA CTCCATACAC TGGATGTGTA TAGAGCAATG GGGCT'N'ATT TTTAGGACAA T'rTCrCGGAT ACTTGCAAAC TTTTTAAGTT GTGACGAGAG TGATAACATA ACCTTCCT'rG CCCATACGAC GTGTAGGTTT CGCTATCTCT AGGAATATCA AAGTTrACGA TCAATTCCAC GAGCCAAAAG GTCAGTrGCA AGAAGCAGGG TTTCrAAGA TGATrTTTCT AAATTTAACA T'rAACATCAC A'rATCACGAT ACTGTAGT'rT TTCCTCGGCA TTCCCAAGGT ACTAGACCAC GGAAATCCTC TACATGAGCC AGTTTTCGTA TGGTCTACCT GCATGTAGAA ATGCTGGATA TTGTCCAATT GTGCGTGTAT TCGGCACAAT CTTTTCTTGG TCAAACTrGG AGTTGGTGGT CACGAGGTGC GTAGTGAGTG ATN-r'rTCTA TCTAGTAATT GGTCAAATrC ATCCAGGATG ATGGTTTCCA
CACATTCTAG
TTAGTTGGTT
TAGCGAGGGA
CTGACAGGCT
GCATATCCAC
TTGATCAGA
TCGTGGCACT
CAAAGTGAAT
GCTATCGATA
ATCTTT-AAAC
AACAGCCAAT
ATTGAAGAAG
TCGATGACGT
GAGATCAATA
CATGTAGACC
CTGAGAATCA
540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 .e S a
CATTCATCAT CTTGATTTTT TTAAGTTTAA TGAGTTCAAA GATACGGCCA GGAGTTCCAA TCAGAAT'rTC TGGCCCCTrTr TTAAGACGTT CAATTTGTCG TTTCTGACTT GAACCTGAAA CGAAGAGTTG AGCAGTCAAT CCGATAGCTT CTGCCCACGT TTTACATACA TCAAAAATCT GTCCAGCAAG TTCCGTATTT GGTGCTAGAA TCAAGAGTTG TTGGGCTTTT TTCTrTTGTA GtCTGAGAAG ACTTGGTAGG AGATACGCTA G.GGTCTTACC AGTTCCGGTT TGGCTCACTC CTAGGAGGTT TTCTCCAGCA AGAACGGGCT CAAATAGTTG AGTTTGAATG GGGGTGAATT CTTGGAAACC GAGTTGGTCA CTCAGTTCTT GCCATTCAGT CGGTAGTTTG GTTTTCATTT TTCTGCCTCA AATCTAATGC CAGCAGTCTG GCGCATGGTA TATAGTAGCT CATGAACAGA GCCTGCATCA TACAGCCAAG TTTCGTAGAG A1-rCAGATCT GGTTGCTGGA TCATGTGTGC AAATGCAGCG ACTTCC'rCAG TC-ATCGTA'rG AGGAGCCTGT ATT'rCCTTGG TGGTCGGTAA AAATAGCTGA GCGAATATC GGTTCCA'rCT GTTGTATAAA TCTCGCAAGG AAGATTGGAA GATG'rGAACT TGATAGTCTG GGTAGAAGAG GATACCATCT GTCAAGCTGT TGAGCATGGT AAGTCGCGTC ATTGGCTTTT ATAGAGGGCA TAAATCCCCA AATCCATGAG GGCTCCACCA ATT'rGGTGTT TGTCCAGCCA ACAAGTCAGG CATCTTGGAA ATCTGCTCC'r ACACTTGCT TATCTGCTAA AAAGT-TTTG GTGGTAAT'rA CGAGCTGCTT CAAAGATAAA ACAGTTATTT ATCAAACCAT TCTTGTGGTT GAGAGACAGC TGGCTTTTCG TGGATAGGAA GCTGGACTTG TCAATCGTGT TGAGAGTCAA GTGATGTTT TTccAGccTr CCATTTACGT CAATGCTA'TT CCAAAAAGAC GAACAGCAGC GCAAAACGGT CTGAAAAGAC GAGTATTTGG CATAGTTGAA ATAGTAGTAA AGGCTTTCTC TTTTCAGCTG TTTGAATCAA AGAATAACAT GTTTACCAGC AGACAAGGCA GC'TTTTGCCT GAGCAAAATG.TAAGGAG T"r GGACTGGCGA TATAGACTAA ATCAAAAGAA GATTTGAAGA AGACTTCTAA TTGATCGAA'r AGTTGGATAT TCTGATAGCG AGAAGCAAAG GTTGCTGCAG TTTCTAGTTT TCTAGAATAG ATTGCGACCA GTTGGTATTC TCCACTGGTA TGGGCTGCTT CTATGAAATG TAATT'rTAGC ATAAATACTC CTTTTCCGAT AGACGGGACT.ATCCAACAGA
GAGGAGAAAA
AATAAATAGA TAGAAGCATA GAATCTAGCA AGGAGGAAAA GGAGGATTCT CAGACATCTA TATCCGCGAT ATGCTGGACT TGCCAGCAAA TCACGTCTTG CCTTCCATGC CCTACTCAGC GGCGGAGTTA GATAATGCAA TCCAAGTTAT T'rTCATCAAT CGCCTATGGC CTTATCTGTG TCGCCAACGT GAAGGTGATA CCCACCAGAG ATGGCTGATA GCGCCAGTTC TTTAAATCCT TCT7-CATTA TTTCAAATAA GCTATTAGCT AACCTAGATTr TAAAAATGTG GGTATCAGCC CAACTAATGA AAATGTGACG ATTTTGGAGG GTAAGATTTC TATACTAGTA CATCGAAATT CAGG~TCATC CAGTCAGGT'r AATCAAAACC
CGATGACACC
TAACATAGAT
TTCTTTTCCG
CTATAATAGA
7TTTGTCAATT
GAAGTAACAT
TAGACGTCTT
ATCAGAATTT
TAGAAAAAA'r 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 CTACAAACAA ATCGCACTAG 'rATACGCTAT CGCAATTGTC GATAGTAATT ACTTCTCAGA TGACCTAGCT TTTCATAGTT TTATAGTAAA ATGAAATGAG AACAGGACAA ATCGATCAGG ACAGTCAAAT CGATTTCTA-A CAATGTTTTA GAAGTATAGG TCTACTATTC TAGCTTCAAT CTACTAGAAA TTCCATAGAT AGAAAACTAC ATAATCTCTA CAGATACGGA TGTTGGAGTT GATGTAAGAT GCT'rGGCTT GCTAGAGGAA TTGTGGATTG CCAAATTGTA TCATTGAAAT TATTGCTCAA ATTTGTTATG ATATAA.ATAT GAATAAA.AGT AGACTAGGAC GTGGCAGACA CGGGAAAACG AGACATGTAT TATTGGCTTT GATrGGTATT TTAGCAAT CCAGCAAAAA AGN'TTGAGC GAGTGAGGGA AATCAGAAGG TCCTCTCCAA GGGGAGAAAG GGACAAGCTA GAAAGTAAGG rrTAAAGGGA GTCGTTAATC GATTGAAGAG ACTGAAAAGA TTTTACACTT GACCAACTGT GT'rGACCTCC TTCATAGAGG 1118 CTATTrGCCT ATTAGGCGGA ?N'ATTGC'N' TTAAGATCTA AAAAGA'N'GA ATCGCTCAAA AAAGAGAAAG ATGATCAATT AGCATTTT'CG TCAGGGGCAA GCCGAAGTGA TTGCCTATTA 4080 4140 4200 TGATTTCCTC TGTTAGGGAG CTGATAAATC ACAATCTI'GT 'PTTCTACTAT ACAGAGCAAG G'rAATGTGAC CAAACAAATC TATGATT'rAG CCAGTCTAGG AAAGGTTCAC TTAACAGAAG TTTCAGATGC TAGTAAGGCT AAGGAACAGC ATAAAAAAAT AGAGCAAGAC CAGAGTGAGC AAGATGTrAA 4260 AAGAGTCAGG 4320 TTGCTTTTAA 4380 ATGGGCAACC 4440 TGATAAAAGA 4500 AGATTGTAAA 4560 0* AAACTrCTCT GACCAAGACT TGTCTGCATG GAATTTTGAT TACAAGGATA GTCAGATTAT CCTTTATCCA AGTCCTGTGG TTGAAAATTr AGAAGAGATA GCCTTGCCAG TATCTGCTTT CTTTGATGjT' ATCCAATCTT CGTACTTAC'r CGAAAAAGAT G.CGGCCTTGT ACCAA'rC'TA CTTTGATAAG AAACATCAAA AAGTTGTCGC TCTAACCTTT GATGArGGTC CAAATCCAGC AACGACCCCG CAGGTATTAG AGACCCTAGC TAAATATGAT ATTACAAGCG GGGT INFORMATION FOR SEQ ID NO: 179: SEQUENCE CHARACTERISTICS: LENGTH: 2186 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 4620 4680 4740 4800 4854 (xi) SEQUENCE DESCRIPTION: SEQ ID NO-. 
179: TAAACAGGTG TTAGGTGCTC TAAACTATTA GGGTCTTGTT CATAGTAGGT GTGGTTCTTT ATAGTGGATG GTAGTTGGAT GACAGCCAAA TGGATTGTCA GTAAGATAGT TTTTAAGTCT TTTTACTTGG TGGTTTAGCT CTCCTGTTTT ATTACGTGAG ATTTGGAAAA CGTGTGATGC AGAGAGAACT TTTPTACGAA AATCTATTGA GTACTATTTT TGGTTCATTT TACTATATTT AATATTATTr CTAAATATTT GAAAATAACA AAATTCTAAG GAAATAAGGC TACTTTTTCT TTTTCGAGTG TAGCCCATAG CTTTGAGCGC
TTCAGAAGCT
ATCTCTATCA
CTCTTTTAGC
TTCTGTTATA
ATATGCCATA
CTAAACACTT
TCTATTTGTA
ATTTCAGTCA AATAAGCA'i ACTTTTCTTG GTTTTGTTCC TTTAACCAGC CATAAATGGT CTACCTGTTC GCTCACAATA AGAAGAT'rAT ACCACATTGT AGAAATAATA AAACAAATTA TTATACTATC TTTGAGGTAA CTATTATGAA CTATATCAAA AGACCACATT ATTTAGATTT TTTAAGAAAA CATCGTGACC 600 GACCAATCAT CAAAGTTGTG AGTGGAGTTA TCTATAAAGA GGAGTTACTA GCAACTGCGGG TCGAAGATTT GAGT'YACTAT GATCTGCGAC ATCAATTAGT TAGCAAGAAA ACATACTATA AATTTGAACT GGTAGCAGAT AG'rCTATT-CA GATCTAACGC CTACTTTATG AGTAGCCAAT AGATAGAGGT TCTTCCT'rTG TCATTTGAAG ATCTGAATAC AACAGAAATT TTTAACAATT AAACATCATC TTACGATGAA AAAATTGACT TAAATGATAT TGTCACTAGA TTGGGAAAAC 1119 GACGAGCTGG TAAATCTGTG CTrTTTCAAC 'rAGACCAGGA, TCAGATTATA TTCATCAATr ATTTTCAAAC A?'rATTCGCT TATATAAAAG TCTTTT'rAGA TGAAATTCAA TATGTTGAAA.
TCTTAGCAA.A TGTAGACCTC TATTTGACTG TAGCAACAAA CTrGACTGGT CGGTATGTTG
AATATCTATC
ATCTCTTTAG
ATCTCAGAGG
CAAATCCTAC
GAACCCTTCT
TCAGCCAAAA
ATAGTT'TAC'r CAGTAGTACA GGTAGCTTAA TATCAACAAA TGTTTCAATA TCCCATAA'rA CT'rTGGAAAA T'rTTTATTCC GTTCCACGTT TrGATGTAAA AGGTCAATC'r CTCACAGAGA TGCTT'rCCCT TACTrATTGC AATA'rATAAC TCCATACTGT TATTA'PrGAG CGCATTGTCC TAAGATT~CGC AATACCCTAG TTATTTGACA AC'ITTGACAG AGGTAGAGCA TTA'IrGCAAC TCTCTrATrA CCAGACCAGA GGAATTGAGA CGTAGATAT GTTTAGAAAA ATATTATCCC GTTGATTTAG GTTTACCACA AAGAAGACAT TAGGCATA'rC TTGGAAAATA TGGTATATr'r 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2186 CACAAGTATA TGTTGGTAAT T'rAGATAAGT ATGAGGTTGA 'rTTTGTTGTT GTAACTGATC TTGGCCACTA CGCTTATTAT CAGGTCAGTG GAGAACTTAG ACCACTAGAA GCCATTAAAG ATACGATTCA GCCAACAGCC A.ATTACAATG TACTAGAAAA ATAGATAAAT ATAAATCATA CAATGATTCT ACCCAAATCC TAACAAGATA GACACTTGAA AATAGAAATT GGGGATGAAA AAAATTAATA TCATAGTCTT ATTAGAGAAT ATTGTAACTG AATTATAATG AAAAAGAGAC GCATAGTATC AGGTATTGAA CAACCTTGAT TCATTTTCTC CTGAAATAGA GCTTT'TGCTA AAACAACACT TGCTCCAGAA ACACTAGAAA ATCAAT'TCCC TAAATATCTA TTAACAATGG GAATCGAGAA GAAAAGCAT'r ATAGATTGG'r CAGCTAATTA GATTTGCAAC AGTCTG'rrAT TAGTGAATTT CGAATACGCT ATATAATACG GGGGATCTAT AA'rTTCTGGA AGTACTATCA.
AGCATCACCC ACTTTC CAA ATAAGATTAA TGAGCAATCA GTCTTTAAAA TCAGAAAAGC AATATGCGTT TTAT'rATGGA AATATTTGCT TCCTATTTTT CTC'rATTTCT AATGATTTAC TTCAACTTCT TACCTCTTGG GAAAAA INFORMATION FOR SEQ ID NO: 180: SEQUENCE CHARACTERISTICS: LENGTH: 3236 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180: GTCACACGTT TGACTTCACG ATCTTCATGC CATTTTTAGC ACACCAAAGA TATGGTGACT TTCAT'rGTAC CAGCTCCGTA TTCATTTCCA ATTCAGGAAT TGAGAAGTTA GAACAGCTTC TCTGTATCCT GATTI-rCTTC GTATTCCAGA CTATGAAAAT AATATCAGAA ATCCTGCTAG TTATTTTTTT GAGACATTAT TATTrCATAA GTATAAACTr ATTATCTAAA GGAGAAAATA AGCTAGACTA TAAT'rTCCTT GAAGAGATTA ACATTATCAA TGCAATTCCC CCAATAACTG CGAAGAGATA GCTrTGACAG TrT'rATC GGTTAGATAA ACATTTTATT AGCATTATCA CTCCCATTAC TTGCTCGCGT GTCCTTTAAA AATCGGCAAA GTAAT'rTTTG AGCATCCCAT AATCAAAGTC AAAATTGCCT GGCTAACTTG ATACTTATTG AAATCAAAGC CAGTGACAGT TTCGCTTGTT TTTCTTTTTA TTCTGTCCCA ACTTCTTCT ATCTTGATCC AGGCGCATTA TAATTTT'rCT TTTGATACCr ATT'rCGAATT TGAGTATTAA GATATTTGTC AGCAGATTTT GCTTCACCTr CTGTTTCGTT GTGAAAGTCA CTATCTGAGC CGTCCTGTTT GTGAAATATT ATCATCAAAT CCTCATCCCC TT TCTGCCAC
CTTTAACTCC
TGACACCTGT
CGCAACCGTT
CATAGTTGCA
ATCAGTGATA
CCAGCAACCC ATTTTTTTCG GTAATTTTAG CTGTCTGCAT TTGTTGGGTA TTCAGTAGCG ACTGTACGCT TACCATATCC CCTCATCTTG ATCAGTAATC AAGCTGGCAC CAACAACTGT CACCTTTCAC ACCAGTGGCG ATACGGCTCA TACCACGAAC CTGCATAACC AALACTTGGTA CCAATGATAA TATCCATATC TGATTAACTC ATCTTCATCC TTTAAATTCA GCGCTTTGAG CAAACTCCTT AACACTGGTT CTCTTCACAA TACCGTGACG CATCATCACT GCGATCAGAC TCAACATTGA TAACCGTCTG GGCAAGATTG GT1TCGAATCA TTGAACAGTC ATAAGACCGG TCCCTTACCA CCACGACCTT TTTTTCTGTG ATAATAAGAA GTCTCCTTCA CGAAGGTTAA GGCTGATTGA TTAAAGCGAA TCCTTCTGCC AACAAGACAT ACCATTTTGA CGAATATTGG GGTTGTAAAG AAGAGATAAG AATACTrTCG TCTTCATCCA ACCATACTCA GGAAT?1'CAT GAGCAGATGA TCATGGGTGC TCCCGTTCCT TGGACACCAC ACGCTTAATG TAGCCTCTGT CTCATCCTCG AGACTCAAGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 -AtTTCAAGAG ATTGACTACT GGTAGCCCTT AACCTTTAAG ACGATAGACA CGTCCCTTGT TAGTTGACAC TAACTCACGA ACAAAGTCAT GACCCCCACG TT'TTTGAGCA GTGAACTCGT TAGAAAGGGT AATCAAGACA TCCGATTCTT
TGGCAGTCCG
TTGTGAAGAA
CATCTTTCAC
CCTGATCCAA
CALATCAAGTC
CC'rGTCCAAT CATCAACC'r ATTCGTCT'rT GATAAT'rTGA CAATCAGAGC CAAGAGGTCA AACGACGAAG ACGCATATCA TCATCAACTC AGCTTGAGCT
GTACGGCGCT
GAAACACGTT
TCATACTCAG
AGGATAGCTT
TCCGCATcCG 1121 TA'rCAGAA.AA
CAGGCTTAGC
ATTGAA'rCTT
GACTTTGACG
tTCACTAGC
TTCTAAGATA
CACTTCTTTT
TATACCA'TTT
?I'TACGTTTA ACTTCATCCA AAGAATATCT GCTAAATCCG ATCGCGTTCC AAACCTGTCA TTCAGAAAGC TTAAACTTGC ACCGATGATA CGAATCAy'rC TGAGCGCGCG CTTrCCGCTTT TGGTGCTCGA TATAAGCATC TGGATAGCGA GCATATTGAA TTGAGAATAA CATTGGCTGA CGGTTTGACT CATCACGTAC GTCGA'rATGG
TTCCTTATCA
CAAAA'rCTGA
ACCAAAATTG
GGCGTCGCGC
TCTAGCGCAA TCAAGAGACC AAACGTGTAC GACGAACAAC CGAAGAGACA AAATTTTrCGG GTTTGCATTT GGGTCATTTT GAAGAGGTTA TTGACTTCAA TAACAAATCG AACACCTTCA TGCTGTGATA CCCTCAATGC GGTTTTATTG ACCATGTA.AG CCGrTTCAATC TCTGTACGAG ATGGATACCT GAT'rTCCCCA TTCCATCAAG TCCTTGGTAG gGTT'rCACCC AGATTATGAG TCCATTAACC AAAAGGTTTG ATCATAGI-rA TCAACGAAAT AATCTTGCTC ATACGTGCCT ACCAAAATTC CCATGACCAT GACCATGGCT TCATAAATAG TGTAATACGA GCAGATTTTT GTAGAGAATG CGACGGTGAA
GTTTTTCCTG
GAA.ATTCTGT
AACGTAGGAC
TGACAAGAGC
TCACTTCAGG
GTGGAATATT
GAA.AACGCGC
CAACTGTATT
CGGTATAACG
CTACAAGCAT
AGGAATCCCC
AACCAAGCGA ACAATATGCT CATGCACCTT TACAACGATA CGCTCACGAC CAGCTTACT AATCGAACCT TTACCTGTTT CATAAGCCTr ACCAGTTGGA A.AATCTGGTC CAGGCAAGAC ATTATCCATG ACCAACTTCA CTGCATCAAT GGTTGCCATC CCAACCGCGA TACCAGTTGC TGGCAAGACC AAGCGTTCCC CTTCATTGGC TTTGTTGATA TCACGAAGCA TT'TCCAGAGC 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3236
TTGAGCGGCA
GTAACGGTAG
GTGTGGGTGA
GCACTATCTC
CTCCACCA1TT
TATTTACCCA
ACACCCAATT
ACATCAGGAA
ATCTCCTTTG
CATCCATGGA
GAGCCATACG
TGACATCCCC
CATTCATTCC
GAGCTCGCGC
TCAGATTGAC
TATGGGGTTT GTCTGGGCTC CAGGTTTTAA GCCATCTCGA TACGATAACA CTCATGGCGT AGTCGATAAA ATTCACTAAA TTTTTATCCT GCATTAATAA CATTATACCA TAAATTCCCA TCTATTTCAG AACTATAAGG CATATTCGTG ACAAAGTTTT AAAACCCCTA ATAGAATAAG GAGATGGTTA
ACTTGCCTTC
ATGCCTCATT
CCTCTAAACC
TTAAAAGTGA
nACAATGACT TCACAATTAG TAAGTAACAA ACTAAAACGT TTACATCGAG TAGAATGAAG TTGTCTAGGG CTGACTAACA CACAAA
INFORMATION FOR SEQ ID NO: 181:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 8651 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181:
AGGTCCTGAA GTATTGGAAC GGAGTCAGTT GAAGTAGTAG CTATGCGAGC TTGGTAGAAC AGGTGCAGCA CTTCAGGCTG TAATGAAGAA TTGTCTCAGC AGAAGTACTT CAAGAACCTT CAACGGTACT TTTGCCAAAC GGTGGTAAAT CCTGTTGGAT TCACAAAGAA TCGGATGCAG TCAAGAAAAA ATGACTGGTC AATAGTAAAA GAGGTATAAA TTTCTGATGA AAATGGTATC GCCTCATGGT TAAACACCAA AGGAAGGTCA AGAGTTTTTG GAACATTTCA
AAAAACTCTT
CAGTTGATTA
CCATCTCAGG TAGTCTGCCA TTGCTAATCA AGCTGGCAAG TTCTTGAATC ACCCCATAAA TTCTTGGAAG AGAAGTTTCT TGTTTGCAGG GATTGAATGG ATGGTGACAC 'N-rCTACAAG CTGGAGACTC TACTGTGGCA AAT'rACTrCAT CAAGGCAAAT ATGTCAACAT GGCCAACTAT ATGGCTTTAA CAGAACAAAA ATCTCAGCTC TTGCATTTGA ACAGAAGAAC CAACTGTGGC
GCTGGCCTTC
CATGTAGTCT TGGACTGCTC CCAACAGTCA TCAAACCAAA GAGGATTTGG ATGAATTAAA ATTATCGTTT CACTTGGTGC GTAGATATTC CTAGAATTCA GGAA'rTCTT CAGGACT'rCT GTCCTTGGTA TGCTCAATGC CAAGCTCTAT ATGATCAATT ACGTGTACGC TTAGAAAAAC CCAACGTGGT GCTTTGAAAC CCAAATGGAA GAACTTAAAG 0 0 00
TCTTGGTAGC
GACTTCCAGC
CAGGTTATGA
AACGTATTAA
GCTCAGACGA
TGGCTGAA6A
CAGGTTCTGT
TTTCAGACCC
ATGTTGAAGc
CAAAGCGCAA
TAAACTCTTC
TCTTTGTGGC
AGATGAATTG ACTAAATATG CTTCATCTAT GCTTCr'rGAC CCTGAGTATG AACTAAAGCT CTTGATGAAA AAGCTGGTCT TCTCCTrGCT TATGAAAAAA CACAACAAGC ACAAAACGCT TGCCAGACTG CTTGGATGTT TGGTCTGCAA AGAAGAAGGT GCAGATGCAG TTAAATTCTT GCTTTACTAT GATGTAGATA ACTCAATCAA GAAAAACAAG CCTACATCGA ACGCATCGGT TCTGAGTG'JG TATCCCATTC TTCCTTGAAA TCCTTGCTTA CGATGAAAAA ATTGCGGATG AGAATACGCT AAAGTAAAAC.CACACAAAGT TATCGGCGCT ATGAAAGTCT ACGCTTTAAC ATTGATGTTT TGAAAGTTGA AGTTCCTGTT AACATTAAAT kTCGCTGAAG GTGAAGTAGT TTATACACGT GAAGAAGCAG CAGCCTTCTT GATGAAGCAA CGAACTTGCC ATACATCTAC TTGAGTGCTG GTGTATCAGC CAAGATACTC TTGTATTTGC TCATGAATCA GGTGCGAACT TTAACGGAGT 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 CGTGCTACAT GGGCAGGATC AGTTGAAGCT TCTAA TGGAC TACATCAAAG ATGGTGAAGC 1500 AGCAGCTCGC GAATGGtCGC TTCAAAGAAC AGCAACTTCA CATGAATCTA AAAAAATTTA TGCTATACTT AAATCACAAG AATAGGATAT AGT'rTAT'TT ATAAACTATA TTTAGATGCA TACTAAAAAG AGTTTTAAAG AGCCTTACCT ATGGTTAGTG AGCAAATGAT A'rTGTTTTGA 1123 ACAACTGGAT 7TGAAAACAT TGGAAAGAAC GCGTGTAAGA AAAAAAGTTG TATGTAAAGG TTAATATGAA 'rrAGAAAGTA ACGAGCTAGG AAGGAAAAAT 'rGACGAACTC AACAAAGTTC AAGTCCTCC'r AGTTTAGGAA CTTACAAAAT AACTTACT'rG ACTATATGAA GTATAATAAA ACGGAAACAA TATTGCCAGA C-ATT'rCATTC
CGTTAGTTGT
GTTCTGTATT
A'rGTTGA'rGG ATTrGTT'ITAT AAAAGGAGAA GATAAACGGC AGGACTAGGT ATTGTTTCAA TATTCTTATC TGCAGATAGT GCCCTAACTA CAGTAGATAA GAATAAATTT TATAATGTTT CGGTTTCAGA AGATrAT'r'r' TA'rGTAGATA AATT'rGGAAA AGCAAAAAAT ATTGGTA'rTT CTGTACAAGA GTTACCCAAC GTTTACGAAA GAGGTCCTGT
AGATATTGTA
TATAAATTTA
AGCAAGTTTG
AGGTTTTCGT
TTTCGCTACT
TGGAT'N'GTT
AATGCTGGTC AAATTTTGGA AAAGGCACTC CTGAAGAGT'r ATGTATGGAG CTGTAAAAGA TTCAATCTTG GTCCTCAAGT GGATATGCTG GATGGCATTT GCTGTAATAA GTGGTGCGAT
GAGGGGGATG
GAAACAATTT
TGGCTGGGCT
09* 9 0 TTATTGGACA GTTGCTGrAG CTACAGTAGA AGTGCCGTTT AGATTTACCT TAGAGGTTAT TTCTTTATGA ATCATTCTTT GTTTTATAGT TTCTTGTGTq' CTTTGTTTAT TAGACTTAAT CTTTTTTrATT TTTCTG'rCTT CCTGTTTTTG TTTTGATTTA GCCTCTGTTT GATGAATTT'r AGAACATAGT .TAAGTTTTAA TTACAAAATA ACTTACTTGT GCTATACTTA AATCACAAGT TAAGTAATAT TAGGCA'rGAT CACAGGTGAA TTAGAAATCA GTGGTCATTT TTTGTACTTA TATACCTTTA AGATATAAAA TTCTAAATCC AATGAATCAC AATGTCTCGC TTGTCAGAAA T'rGTAATTGG TAAGGGAATT GCA'rTCGGAA AGAAGAAGGG AGGTTGAGAA AATCTTTCCO ATGAAGACCG AAGAGTCCAG TCAAAGATGT TCCGCTTGAT TTTATCACAG TGACCTA'rGA AGAAATATCA TTATCCGATT CAAGAGTATC TCTATGTAAC GGTGGCTGGG CTGCTGGAGC GCCGTTAATC CTGTTACATC GTAAAAACTG CTGTAGAAAA GTGALACCTTG TTT-ACACCAT TAAAAAAATA ACTGTATTTT GAATTTTAAA AATGTAGCTA CAAAAATAAA TAAAALACAGA AAAAAGTTGT ATGTAALAGGT TAATACAAGG TGAGTGT'rAC GCTGATTTTC TAGTTCATT GGACGTTGAC ATGTATCGAA TGATAAGGGA GAAGAGGTGA GGATTTGATTr GCTGAAAATC AGAAAACTI-r ATGGCTCTTC AATCATTGAT AAGCTATCAA CTTGACAGAT CATATTTACT 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 GTTCT'rA'CA AGCTCTAACT CA6AGGAAGGT ACAAGGATAG 'rAATCTGCCA GATATTTCCG 1124 CTAAGTATCC TGTCGCTTr'r CAAATCGCAA ATGAAGCTTT TGAAATTTAC CGTCAGAAGC 3300 TAGCAGATCA ?=TCCTGAG GACGAAATTA TTCGGATTrGC TTATCATrC ATAATGCTG 3360 AAGGTGAAAA TGAAGTGGAA CTTGTGGAGT CGATTGATAA GAGGAAAGAA AT'rCTCAGGA -3420 ATGTTGAAGA AGTTTAACG GACTATGCAA 7rCAACGAAC TAAAAAGAAT AACCATTTCT 3480 ATGATCGCTT TATGATCCAT T'rGAATTATT TCTTGGATTA TTTAGACAGA TCTAGAGATG 3540 ATAACCAATC ACTTCTGGATI ATGGAAGATC ATATTAAACA ATCCI'ATCCA AAAGCC1'TCG 3600 AGATTGGTTC CAAGATCTAT GATGTGATTA CCCAACATAC GGGTCTTGAT TTGTATAAAA 3660 GTGAACGAGT TTATCTAGTT CTACATATCC AACG~rTATT GTCATAAAAA TTTATT'rAAA 3720 ACTATATAAG GAGAATTCTA TCATGAATAG AGAAGAAGTA ACATTGT'rAG GTTTTGAAAT 3780 CGTAGCCTAT GCTGGCGATG CTCGTTCAAA ACTATTrGGAA GCCTTGAAGG CTGCTGAAGC 3840 TGGTGATT'rT GAAAAAGCGG ACGCTCTGGT AGAGGAAGCT GGTAGCTGTA TTGCACAGGC 3900 TCACCACGCG CAAACAAGTC TATTGACTAA GGAACCTi'CA 
GGTGAGGACT TGGCTTATAG 3960 TGTAACCATG ATGCATGGCC AAGACCACTr AATGACAACT ATCTTGTT'AA AAGATTTGAT 4020 **.GCATCATT'rA ATTGAACTCT ACAAGAGAGG AGTTCAATAA TGAATAAACT AATTGCATr'r 4080 *..ATCGAGAAAG GAAAGCC'rrT CTT1GAAAAA CTATCTCGTA ATATCTATCT TCGTGCI'ATT 4140 ***CGTCATGGTT TCATTGCAGG TATGCCTGTT ATTCTCTTCT CAAGTATCTT TATCTTGATT 4200 *GCCTTTGTAC CAAACTCATG GGGC'rTrAAA TGGTCTGATG AAGTTGTAGC CTTCTGATG 4260 AAACCTTATA GCTATTCTAT GGGTATI'CTG GCTCTCTTGG TAGCTGGTAC AACAGCTAAG 4320 *TCATTGACTG ACTCAGTAAA CCGGAGCATG GAAAAAACCA ATCAAATCAA GTATATGTCA 4380 ACATTGTTGG CAGCAATTGTI TGGTTTGTTG ATGTTGGCAG CTGATCCTrAT CGAAAGTGGT 4440 CTAGCTACTG GATTCTTGGG GACAAAAGGT TTGCTr'rCAG CCTTCCTTCC TGCCTTTGTT 4500 ACTGTAGCCA TCTATAAGG'r TTGTGTTAAG AACAACGTCA CTATTCGTAT GCC'TGACGA-A 4560 *GTTCCACCAA ATATCTCACA AGTCTTTAAA GATGT(IATTC CATTCACTCT ATCTGTTIGTT 4620 TCTC1TTrATG CTCTTGACT'r ATTAGCACGT TATTT'rGTTG GTTCTAGTGT GGCAGAATCA 4680 ATCGGTAAAT TCTTCGCACC ACTC'ITCTCA GCAGCAGACG GATACCTTGG TATTACCATT 4740 TCTTTGGTG CC'rTrGCCTT CTTCTGGTTT GTTGGGAT'rC ATGGTCCATC TATCGTTGAA 4800 ,CCAGCTATCG CAGCTATTAC CTATGCCAAT GCCGAAGTTA ACTTGAACCT TCTCCAACAA 4860 GGGATGCATG CAGACAAGAT TCTTACTTCT GGTACACAAA TGTT'rATCG'r TACCATGGGT 4920 GGTACAGGTG CGACATTGGT CGTTCCAT'rT ATGTTCATCT GGTTGACAAA ATCGAAACGT 4980 AACCGTGCAA TCGGACGTGC TTCAGTAGTT CCTACCTTC'r TCGGTGTAAA TGAACCAATC 5040 1125 TTG7TrrGGTG CACCTCTTGT TTTGAATCCA ATCTTC'rTCA TCCATTTAT CTTTGCTCCA ATTGCAAACG TATGGAT'r CAATTCTTT AT'rGAAACTC TTGGAATGAA CTCATTCACT GCTAATCTAC CATGGACAAC TCCAGCTCCA CTAGG=CAC TTCrrGGAAC TAACTTCCAA GTGCTATCAT TCATTCTTGC TGCCCTTCTA ATCGTGGTTG ACGTTGTCAT TTACTATCCA TTCCTAAGG TCTATGATGA ACAAATTCTr GAAGAAGAAC GTrrCACGrAA GTCTAATGAT CAATTGAAAG AAAAAGTTGC TGCAAACTC AACACTGCAA AAGCGGATGC TA'rTCTTGAA AAAGCGGGTG TCGATGCAGC ACAAAATACC ATCACTGAAG AAACAAATGT CCTCGT'TCTC TGTGCAGGTG GAGGAACAAG TGGTCTCCTT GCAAATGCTT TCAATAAGGC AGCACCAGAA TACAATGTCC CTGTGAAAGC AGCAGCAGGC GGCTATGTG CTCACCCTGA AATGTTACCA GAGTTT'GATC TTGTTATCCT TGCCCCTCAA GTTGCTTCAA ACTTTGAAGA 
TATGAAAGCA GAAACAGATA AGCTCGGTAT TAAACTAGCG AAAACAGAAG GCGCTCAATA CATC.AAATTA *to* 0 ACTCGTGATG GAAAAGGTGC CTCTGAAATA GTCTCCCATC GTCGGTAAAA AGATATCGTT AGTGAGATGG AGAAGGATGG CAAGAA.ATAG GTGAAAAAAA AACAGCTGCT TATCAAGCAG GGATAAATAT CT'rGAGGATA TCGATATCCA GTTGACCTCA TATTGCTTGG TCACGTATTT GTT'rrATCAT AATTTATTTG TCATCACTrT GACACGCCAG TATCGA.ACAT TTr'GTAGACT TTGGACAACC T -rAA'rGAAA CCCTCCAGGT ATCCAGTACG GTCTCATGCA CGCGCGGTAA TGTTCACGCC CTGCCAACTA AGCTGAGT'rG GAAGATATCA CTATTCAGCT GAAACCATG TCTTGCATTC GTACAAGCGC AATTCGATTA AGGCTAGAGA GTTACGGAAA TCGCTATGGC GAATTTCCTA TTATTAATTC TTTACCTCCT CATGTCACAA TTCGGTGACT TGGTACA.AGA CTCACTGACT CCTCTCCTCT CACTTTACT TTATT'rAAAT TGACAAAAAC ACTrCCAAAA GACTT'rATFT TTGGTGGCGC AAGGTGCTAC ACATACTGAT GGAAAAGGAC CAGTTGCTTG 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780
ACTACTGGTA
AGCTAGCAGA
CACTGCCGAA CCAGCTAGTG AT'rTTTACAA AGAGTATGGT GTCAATGGTA TTCGAATTTC TCCCGACTGG TTACGGCCAA CAGAGTGTCA CAAACGTCAT AAGCTCTCCA CTCAAATGGA ACGCTGCCTT CTGT TTTGAA TTGGACCAAT CGGTGATGGT ACCTTGCCAA AGTCTTrCAA GTAAATGCTA AAGGTGTTGA GTTGAGCCTT TTGTAACTCT GACTTCTTAA ACCGTGAAPA GAAT'rTCCAG AAGTAAACTA CAATATTTGG TTGGGAAATT TCACACCACA ATATGATGGT 'rA'AAAGGGG AAATTGGTGT AATCCAGCAG ATGTTCGTGC GACGCAACTT ATCTAGGTCG TTAGTCAATG GTGGTAGT'TT
AATTGTACAA
AATATCCTCT
TCCACAATAA
AGAGAAAGGC
AGATCCTGAA
ATTCATCTTA
AAdGTGTCAA CCATATCTTA GGATCTTCGT GAAGAAGATT TTACAGCATT AGGAATCAAC TACTATA'rGA GTGACTGGAT CCATAATGGT AAAGGTGAAA AAGGAAGCTC 'rGAGCTCCT GACTATGTAC CACGCACGGA GTATGACCAA ATCATGCGTG TGAAGAAAGA TGAAAATGGT CTCGGCTATA AAGATGAGT'r TATTGATTAC GTGAAGCAAC ACITTGGAGGT 1126 AGAAGCTGCA AAAGACTTGA ATGATTTCCI' GGAAGCCTTT GATGGAGAAA CTGAAA'rTAT TAAGTATCAA ATCAAAGGTG TTGGTCGTCG
TGTAAAAGGT
GAAACGTTAT
AGCTCACTCG
AGATATAGAA
TACTTCATTT GGTCAT'rAAT GGTCTCTTCT ACGTAGATT'r TACAAGAAAG TAGCGGAAAC T'rTTAGTGAG TCAAAAAGAT TTGGGArrGG TTATCC'rAAC
CGTTGATAAC
'TTTATCTGAT
GGATGTCTTC
TGAAACTCAA
TCAGArrATA
GTTCAAAGAT
ATTATCTiACC CTCAAGGTT'r TACAAGAAGA Th'TACATCAC ACTGTTTACG ATGATGGTCG GCGATTGCAG ATGGAGCTAA TCATGGTCAA ACGGTTATGA GAACGTTATC C'TAAGAAATC GACTACTAGA ATTAGTCATT TTTATCCAAT CTATTTATGA TACCGTGTTT GACGAGTGAA AAkAAAAGTTT ATATTATAA.A TTTCGAAAAA TGCTCTCAAA a.
GAATTGAAAA GTCTTGGAAA ATGGTATGTC TCGACTGGTA AAGAATGGAT TTGTCATTCA GATGATGAGC TGGAAGAATT TAAAAATCTA rTTTTT-AAAT-r TTATCAATCC TGAAGAATGG
GATACTATCT
AAAAGTTA.AA
TCAATGAAAA
AGCTTTGAGG
CAACACCT'GA
GTT~GTTACCC
TTAAAAAGGA
TTATAAAAAG
TTATTAGAAT
AGAGCTCCGT
AGATTTT~GTA
ATTAAAATCA
CCTTTGATTC AGAT'rTTATG TCTTATATTT AGTACTCTGT TCAAAGAGCA ACTrrTAAACT AATTGGGCAA AAAGTCTTTG TACTATGCGT TrTATrGTGG AGGCTCTTTC AGTTTATTAA TTGAATCACT TAGTTTAGAA TATAAAAATC AAACTTATTG TAT-rTAAAGC G.ATGCGTTGA TrTTGAATACC ATTACAGCTA GGGTCAATGT ACCAACAAAA AATCTGTTAA TTTTCGTTCG CCGTTTCAAC AATCGTAACC AATTTCTCAA AAAACTCTTA TCTAATCACG TTGCTTATAC AGGAAGCGAG TCGCAGATTT CTCAATGCAT ATATAGAAAA ACGCATAGTA TCAGGTGTTT GAAGATTTAC TTTT'rTCTT CTGAAATTGA GGCTTGATGA CTTTAATGTG.TTTAGATAGC TCTGAAACAA TAGTATCA-AG ATTTGATACA AACT'rGCTAT GATCTGCGAG TAAATATTTT GCCTCTCCCT CTTCCTCCCT AAAAGTAGCT ACGAAAGCTT TAGAAAATTG GAGATTAGAG 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580
GCACCTGTAA
CTTAAAATCA
TATCGCGATA ATTTCCACCT GAAAAACAGG TAGACTGTTG AAAACTCTAA TGTTGTTCCT CTGCAAAATG GGCTATTTCT GTTACGACGC GGATATTGTC AATAGGCAAC TCACGCGCAA GGTCCAATGA AAATAGTTTC TCTTTCTTCT ACTAGACTGC TGTTTTTCTG CCGTTTGGAG GGCTTGTTTT TCAATATTTG ATCGCTCATT AGTCAAAAGG GAGTTGGTTC GAAGTTTTTC AGCTCCACCA TGCACACGAA TCAGCAAATC TT'rATCAGCT AATTCCTGTA AATAGCGCCT TGCAGTCATA TCTGAAACGG CTATTrCGTC CATAATCTGT TTAACTGTTA T 8640 8651
INFORMATION FOR SEQ ID NO: 182:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 3786 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182:
AATCTCCAAT CAGTGCCACT GGACAGACAA GAATAATTGA TGATWTGGAG ACTGGCAAGC ATCGTAGGCT ATCAAAGAAA CCAAGAGTTG ACTATATTGT TAAATCGTCT CACAACGAAA
TCAGCTACAA
TAGAAGGAGT
AGAATGATTC
GCAAAGAAAC
TGGAGAACCT
CTACCCAAGA
CTAGGATAGA GATGATAGAA AAAAGAGTTA TAATGCTTGC TATATGTCCA TAGTAAGCAT AAGATGCAGA AAACAGAATG AGCAAGAGAA CCAACTTACT TGTAGGAGAT TTGATCGCTT GCTCT'IrCCA TTTATCCCTA GATTTTGGAG ATTTACTGAT AAGAAGTGGC TC'ITTCGTGG TCTCTTCAGC TATAGTGACT TrTTCTGTTT AATTAGATGC CTTCTTT'rCT TCTAT'rTCTG TTTTCTCTTC*TTGCTGGCTT TCCAATrCGA GAGTATTTT TTCAATTGGT GTATCGAGAT qCTCTTGAGC TTGCTCTTCA GGCTTGTTCT CAAACCATTC TTGTTTCATG GTAGAACCTC TAGCAA.ATGT AAGCGTTTTT GTCAACGTCT ATCAGATCTC GCAATGAGTT GATCCTTGAC TAA'N'CCGTA CCTCTTGATT CAGGCTTTTC
AGAAGAGGAG
CGGTTTCACT
CAATGCTAAT
TAGCAATAGC
TGTCTAGCGT
GGAATGAAAA
AAGGAGCTAG
GTTTGATGTG
AGGCTGTGTA
CCACTAGCCA
CTTGGTCGGG
TTGCTTTTTG
CTTTAGA.A.AG
GAAGAAGAGT GTTGTCGCTA CTGCTCAGGG AAGCGACTGT ATAGATACTA AAGAAAAAGG ACTGTGTGTG ATACTrGTTT AGACCAAAAA TCAAGCACT'r GATATAAGGA CTTITCTAAAG CTGAGGAAGA GCTTCTTGGC GTCTGGCTCT TCTTCAGTAG
GATAATAACT
TGCTTGACTT
CACACACAAG
AGTGAGGAMnG
CCAGTCCTTT
CCGTTCACAA.
GGTCTTGTAA
AGGGCTGTAA
ATTGGAATTG
TCCTGGTGGA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 TTCTCGCTTC ACTGTCTTCA GGAGCTTCAA CTTrCAGCTTG AGGGACTTCC TCCTCTAACT CGGCTATCGT TTCTTCAGCC TTGTCTGCAA TGCTTGTTGT. TTTTACAAAA TCATTACTTT CTTTTTAGTT AGATAAATAT GTTTCCATAG GCTTGGTGTG GATATTAGAT CAATATTATC ATCGGTT'TTT TCAGTTTTGT AAGGGTTGCT TCTTGTGAAT TGGAAGATAG AACCATAGTT 1128 GCTTGAGATG TCCCAGTTAA TTCGTTGGCT TTCTTTCTGG ATCTTTGGJCA GTCAGTTCAA CCT'rGCCATG GACTTGGATA CTCTGTTGAC TCTAGCTGAC 'rATCTGTAAG AACTGTATCA CTTTGAGT TTAC'TGTTTT TGATACGACT TCCTTCAATT ATTGAGGG'rC GCATTTTCAA GGCTAGCATT T'ATGATGGTG GATGTT'GATC CCTTT2AGAG TTCrCCCTT'r TGGTAGTCGG ACTAGAGTAG CTACTTGCGA TATGAAGAAT CCCACCAATT TCTAGGATGA TTCTGAGATA T=rCAGCGT GGAAGTGATr AAGATATTAA CGATATTGGG CGGAGGATAT AGCTGTTTGT GTTTGTCCGC GATTGGCTGA AGAATAACTT CTTCAAAACG CCAGAAGAGA GAAACGGAGT TTCAGACAGT TTCTTATCAG ATGGTGAGCA GAAAGAGATG GATGGTGAGC GTGTGTTGGT TAGCTTTTCC GTACGGCTAT CCCGTCAGAT TGGATACCTA AAAGATGAGA AATCCTTTTG TGAGACTCAG AGTTCTATCG TTCTGATTGG TGATAAGATC GATGG'rAAGA AATGTGGA'rT TGA'rCATCGA AAGAGTCTGT GGAGAGTA6AT TTCTAGGTTT TCGAC'?TCCT CATAGACAGG TTCTTT'GGAC ATGGAAAGTA
CAAAAAGCAG
TCCATTTACG
GATAAAGCCG ATAACGGTAG CATGCTGATT ACCTCTCTTT GACCGAAGAA GCGAGTTGCA
TGCCAAAGGT
GGCTCTTAAt
TCACCACACC
CCTTTTTTAA
TAGGAAATGC
GAACAAATTG TACCAGACGA ACAATGAGTA CAAGTAAAAC 'rAGCGAAGAA GCACCGATAG AGGCTGATT-r GGCTTGGGCG AGGACAGTGA TGATACCCAG TATGGAAACT GCAAAGAAAG TTGCGAACAG GGTCACGAGG ATGGCGATTC CGGCTIAACAA GGCGATATGT AAAATTTGTC TTTTTTTATC GAGAAGATTG TAGCGATGAG TTCT'rCTTCT CCATGGCTTC GATACGGTCA GGTATTCAGT TCTTGTCATG TAGAGTGCCC ATTCATCTT TAGTATTTGC GCATGCGACC -TCCAATTITTT TGAGAATGCG GATAGAACTr
CCTTCGACTC
GCT'rCAGGTA
GCGGATACTC
TAGGGTCAAG
TTGGAACTCT
ATAGAGTGTG
ATAAGAATCA
TGGAAAGTAC
ACACATACAT
CCAGTAAACC
AAC'rTrCAAC
CCAGAATGAC
CCAGAGGAAT
GGTTATTT'TT
CGTGGGCCGC
CAGCATCGTC
GTTTCTTGAG
CCTTCTATGA
AGCTGCTCTA
CTAGAATAGG
GATTCTTTGA
CCCTGCTCCA
ATGGGAAACC
TGTACATCTA
AGAACCAAAA ATCAAGATAA TAAAAATAGG AATCCGCCGA AGTCAAAGCG GCTACAAGAA CCCGATAGGT GCTGCAAGGA TTGAGCGGGT GCTTCATTGA TTCTTTGGGA GTTCCCAAAC AAAGAGCTCT C'rGAAATAGT ATAGAGTTCT AGCTCAGTCA TGCCATTGAT GGTGTCTGTA TACCACCGTT TGTCAAGGAG TTGTCAGAAA GCTATT~GCCT TATTAGCGAT CAGCTTAATG GTACAGCCAA GATGAGAA.AT TCCTTTTCTA ATGTGTAAGA AAAGAAAGCC CTGTCAAGAG 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060
GTTTGGCTAA
TCAATCAAGG
TTTTTATATA
TCTCATAACC
CAGAGGATGT
TAATTTTTCT
AAATGTGTAA AATTTTTATA TATAAAAAAC TTCTAGCTAA AACTAGAAGT TAAGT TTAAAGGATC 1129 TTATCCGCTC TGTCCACTGT AAAGAGGGCC ACAGTCATCA GGATATCGAT GAGCAAGAGG GCAGCTACAG ATGGTACCCA AGAGTGGAAC AGGTCAAAAC TGTAACCAAA GAGGGTTGGC CCA.AAGGCTG CTAGGATATA GCCTCCTGTr TGAGATAGGC CGGACAATTG TCAGGGGCGC T TGTC 'IGAG GCGGTTCCGA TGAGGAGATG AGCATGGAAA TGCCGACCAC CGAGTAGATA AACTGGTTGT GATAAGATAG AAGTCAGCAA GTAGGTAACC AGGTCATGAC ATTGCCCAAA CCTGTTTATT GCTAGTCTAT GATrATAGCG AACGTGAGGA GAAGGATAAG
TAGGAA
TGAAAAGTrG ACCATGAGAT AAGGGAAGAG GATGGCAAGC CAGTAAArGA AATTAT'rGAT ACCAGCTAGT GAAACCAGAG TGAGCATGAG
GGC'TGTCTTT
GGCACTGGTT
TGGGAAAAAG
CTGACGGTTG
CAGGCTTGGG ATGGTCATTG GCCAGCTTCG TGACTGGATA GGTGTAAAAG ATCAAGGATT ACGCATGACC TTTATTTGAC GTGATTTGGG AGCCAGACCA TCCTTTCCAA GAACTGGCTT AAAAAGGAAT GCTAATCAGA GACC'TGCATG GATAGACATG GAAAACCTGA AAAGATAATA TTTT'rTGTTT GGTTTGTGGA AAAAAGTTGC TAGACAGAGT GTGTAATGGG CACAGCTAGA 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3786 INFORMATION FOR SEQ ID NO: 183: SEQUENCE CHARACTERISTICS: LENGTH: 3054 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: TCAGCTAAAA AACATTGCTA AATTGATTGA AGCTGGTGCT ACACATTCCG ATTCAACTTC TCACACGGCG ACCACCAAGA ACAAGGTGAG CGTATGGCAA CTGTTAAACT TGCGGAAAAA ATTGCAGGTA AAAAAGTTGG TTTCCTTCTT GATACAAAAG TTGTTCGAAG GTGAAGCTAA ACTAAACAAG GAATCAAATC GATATCTATG ATGATGTTGA CTTCGTGTGG TTGCTAAAGA GGTATCATCG CTAAACAAAA CTTGCTGAAC GCGATAACGA GCAATTTCAT TCGTACGTAC AGAATATTCA TACAAAACTG AACTCGTGAA. GTGATTGCGT AGTTGGTCGT CAAGTTTTGG TGATGCAACT CGTGAATTTG AGGTGTGAAC ATCCCTAACA CGATATCCGT TTCGGTCTTG TGCAAAAGAT GTGAACGAAG GACCTGAAAT CCGTACAGAA GTGAAAAAAT TCGTGTTGCA TGAACGTTGC TGGTGCTCTT TTGACGATGG TAXACTTGGT AAGTTGAAGT TGAAAACGAT
CTAAAATTCC
AACAAGGTAT
TTCGTGCAAT
TTTCCCAGCT
CAACTTCATC
CTGTGAAGAA
1130 ACTGGAAACG GACATGTrCA A'rTGTTCGCT AAALATCGAAA TTAGATCAAA TCATCGAAGC AGCTGATGGT ATTATGATrG ACCAACAAGG TATCGATAAC CTCGTGGTGA TATGGGTATC CAAGTACCGT 'rCGAAATGCT GCAGGTAAAG TTTATCAC GCAACTCGTIT CAGAAGTATC ATGT'rGTCAG GCGAGTCTGC ACAATCGACA AGAACGCTCA
TCCAGTTTAT
TGCAACAAAC
AGATGTATTC
AAACGGTAAA
AGCTCTTC?1' CAAAAAATGA TATCAAGAA ATGCTrGAAA CAA'rGACTGA AACGCTGT'rA TCGACGGAAC TACCCACTCG AGTCAGTAAC
AGTCAATGCT
AAAACCACGT
TGACGCTACA
TACAATGGCT
TT'TGAGCGTA
ATGGATATCA
AAATACCGTC
TTGATGTTGA
ATGTTCGAAA
ACTCTAAGAC AGAAGTAATG AATTGGTTGT AACTCTTACT CAAATGCTGA CATCTTAGCA ACTGGGGTGT TATCCCAATG TCGCTGAACG TAAAGCGGTA
AATGAATACG
GCT'rCTGCTG
AAGACACGTC
TTGACATTTG
TTGACAGATG
GAAGCAGGTC
GACCTCTTGA TTCAGAT'rCA TTAAAGATGC TACTAGCTCA ATACTGCACG TTTGATTTCT ACGA.ATTGAC AGAACGTGGC CTCCATCTTC AACTGACGAT TCGTTGAGTC AGGCGATGAT GCACAAACAC AATGCGTATC CAGCTTTAGA GCTTGTGTGA CATAATGGAT TGATACTCTT TATATGTTAC TgACTTCGTC CGGCTAGCTT CCTAGTTTGC TTGATTrAGAA AGTCAAATGA 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 ATCGTTATCG TTGCTGGTGT GCCAGTAGGA GAAGCTGTTC CGCACAGTAC GTTAAGAAAA ATATAAAAAC CTATCATATC TAGGCTTTTT G'rATAGAGGG TAAGAAATAG GCAAAACTTr CGAAAA'rCTC TTCAAACCAC GTCAGCGTCG CCT'rACCGTA AGTTCTATCT ACAACCTCAA AGCAGTGCN' q'GAGCAACtG TCTTTGATTr TCATTGAGTA TGAAATAAGA TATGCACAAA ATTTCTACAkA ATGTTTAGC AATCGTAATG 0 0000 000** CGATTTAATG ATATGGTATT TGAGTGTAGA TGTTATTTAC 'rTCTTCTCAG CTAGTGTACT TTTATCGGTT TCTCTAGTTT TAAATTTACA TATCACTATT TTTTGAAATG TAGAATGAAC TAT ATCGCCA TTCTCCCCCC GGTACGAATA TGGTZYCCT TGAGGCAGCC TTGGCGGAGT TGGTATTATA GTCAATCTGT
TAAAACCTCC
TGTTTTAGCG
TACTTGTCTA GATTCGATCT GATATAT'TTT AAAGTAGCTT ACTCCATTCT TTTACTTACG TTT'TG'rGTT CCACTCTA6AC CATTATAGCA 1740 1800 1860 1920 AAGGAGTGTG TGCCTGAAAA TATGGGAACT AAGGGGCTGG AGTATT'rGCC TTTTGCAAAG TGATCTTAAA TGCCTTTCVIC GTTTAACAAA ATCTAATCTA TTTTAGGTCA CTTATTCTT'r TTTTTCAAAG TTrTTTCGAAT CTTTTAAAA'r CTGTTTGC'TT TTTT'rTAATT CTCCCTATAT AGCCTGACAG CTTTCCCGAT CGTCTAGGTG GATGTCGGGG TATTCGGGAT TGAGTTT'rrT TTCTGACAT AGTTAGTGCC GTCTAC1-rGG AAGATGCCGA GGGGTATTCT TGATAAATAG GTAGTCGC'rG TTTCTTATCT 1980 2040 2100 2160 2220 2280 2340 2400 TTGGCTCCAT GGACTTGCTG ACGACATAAG CCATTGGGTC GTAGTCGTCT GGGATAATGG 1131
CGAGCGGCTA
A.AACTCCATA TCTAAATCGT TGTCCTGCAT AACACGAGAG TAAGTAGTCT GTCTGTAGTC GTCCAGTCTG T TTTTCTGAT CATACAGTTG CCTCTCGGCA TAGGTCAGAA TCCCGTTGGT CG'TAGATAGA TTGGATATCG CTAGGAGAAT AGGGCATCGA TCAAGCTACT GAATACTT'rA ACTAAGTCAA GACCTAACCC TTTTTTCATA ATTTCTAATG GTGrrTTAC AATTCTTATT GAGTCCAACC ATTACTAGTC TATATTGTTT ATAGTACGCT GTAGCI'GCTA AAACATTTCT AGAAATTAAT GTTCATATCT TATTTCAATC TA'rTATGTTT TTCACCTCTA ATCCATGAAT GAAATCGCTT TCTATTTTTG TAAGTAAAGC ATGAAAACCT T'rGTTGTGTT TTCGTAAAAA ATTTGTTGAC CCTGCAGAGA TAAACTACCV ATGATTTTTA CGATACTTCG CTTTACCTTG TCTGGGTGGT CCTrTGAAC TGGAGGAAAG ATATAGTAT'r TTTCTTAGTA TTATACCTAT CTTAGTACCC TATAGTTGAT TGAGTTTGGA TTGACTTTCC TAATAGAGTT ACAATCGCAA TCTCTTCTTT ATAACACGAA ATCCACGAAA 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3054 AGAGCACGAA ACGC C.
S S C. t INFORMATION FOR SEQ ID NO: 184: SEQUENCE CHARACTERISTICS: LENGTH: 1590 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184: TGTGATTTTC yGAAA.ATTTG GACAAGATAT CAGAATTTAG TrATTCACCA ATCAATCAAG TGATGAGGCT ATGCAAGCTG TGAACGTGCG GCTTATrTGC TGGTACTATC CTTGCCAAAG GCGTACAGCA GACTTGATTC AATGGAAGGT GGTGGTTTTG ACCAGTTGGT ATCGTGCTAG TAAAATTGCA CCTGCCTTGA TTCCATT~TCT GGACTCTTGT TTTCAACACC ATTACAGGTC GTAAAATATA TCTTA.ATCAT TAAATGGAAA ATGGAAATCA AAGAA'XTGGG TACAGTTCCA CGCGTGCAGC CCTGCCAGCA ATAAAACAGC AGCTATTTTA AAGTAGCAAA AGGGATTAAA GTTATGCTGC TGAGGAAGGT TTTCAGGAGG ACAAAAATTT TCTGAACA.AG AAATTACGAT GCCATGACTC AGACTGAAGC TGGCGAGCTT TATCAGCAGT GAACGCGATA AGGAAGAAAT GCAGCAATTG GAGAAGTAGT CTCCGTATCA CTGGACAAGC CTGGCTGTTG TCCGTCGTGA CCAGTTAATT TATCTGCTTC TTTAAGCCAC CAACACAAGG GCAGGGATTC CGGCAGGTGT TATATCATTG AGCACAAAGA
AGGCAACAAG
CGATTGCrCC
TTGCAGGGAA
TGGCTAA.AGC
GTGGTTCAGA
TAAAAACAAA
CTTTA.ATTAT
TGTGGTCATG
ATTTGAAGAA
AATTGGGGAT
1132 AGTCAACTTC ATCAACTTTA CAGGTTCAAC TCCTATTGGA GAACGTAITG GTrCGTTTAGC TGGTATGCG'r CCTATCATGT TGGAACTTGG TGGGAAAGAT GCAGCTCTTG TACTAGAAGA 'rGCAGATTTG GAACATGCTG CCAAGCAAAT TGrTGCGGGA GCCTrTAGCT ACTCAGGACA ACGTTGCACG GCCATTAAAC GTGTC-ATTGT TCTCGA).AGT GTACCAGATA ATAGCTAC TTTGCTTCAG GAAGAAGTTr CTAAATTAAC TACACCTGTr ATTGACAATG CTTCAGCCGA AGAAAAAGAA GCTCAGGCTC TTACACCAAT GCTTTT'rGAC CAAGT'rACAA AAGATATGAA TTTACCAATC ATTCGTGTGG CTAGTGTAGA ATTCGGCCTT CAATCATCAG TCTTTACAAA AAAACTrGAA GTAGGTACAG TCCACATTMA CCCATTCCTT GGTGTCAA.AG GTTCTGGAGC AGCGATGACA AATGTCAAAT CCATTGTTrT TTGTTTTCCT GGTTTTATTT T'rTTGCTATA CTTTTTGGTA TTATAATAGA TTGAAACCGG INFORMATION FOR SEQ ID NO: 1E Wi SEQUENCE CHARACTERISTICS LENGTH: 4848 base TYPE: nucleic acid STRANDEDNESS: doub] TOPOLOGY: linear AGTTGGTGAT CCATTTGACA ATGCTGATAT CTTCAI-rTCG GGCT'TGATTG AGGATGCACA CAAACGTGAG GGCAATCTTC TCTGGCCAGT AGTGGCATGG GAAGAGCCAT TTGGTCCTGT GGAAGCTATT GCCTTGCCA ACGAATCTGA TGATTTCAAA A.AAGCCTTTG AAATTGCTGA TAATAAAACC C-AGCGTGGTC CAGATAATTT TGGAGTGCAA GGAATTAAAT ATAGCATTGA TGATGTGAAA TAACGTGTAA AACCAGGAAA AAATAATAAT AAT'TATAGAA AAAATACGAA 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1590 120 180 240 300 360 420 480 540 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185: CCTGCAGTrG TCAGACCTGT AATTT'rCTTT TTATCTGTAA TAAGAATCGT TCCAGCGCCT AGAAAACCCA CACCTGATAT AACTTGAGCT CCTAATCGTG TAGGATCTCC TGTCCCAAAT TTATAAGATA CGTATTCATT CGTCATCATA ATCAAACATG CAGCTAGACA AACAATACTA TAAGTTCGGA TGCCTGCAGG CTGGGATTTG CTCCCTCTCT CTAAACCAAT TATACTACCA ATGACTACTG ATAAAACAAT CCTGACAACT ATTTCAATAT TTGATAACCC AAGACTAGTG GCTGTCATGA TTATTTCCTI' ACTTTACGCC CCCGTCTTTG TGTGAAGTAT AATACCGTTC CAGAAATAAT CATCAGAACA ATTGTATAAA CAALATACCAG AGCTTGTGCA TTAGATGTTG CTG'PTTCATC ACCTGCAGAT CGAATCGTAA TACCTAATGG 'PTGAGCTAGG GGATGGTAAA.
GGAATACAGA TAAGTCGAAG TCAGTTAATA AAGAGTTAAA GTTTAAAGCA ATAACAGAGA GAACAACCGG TAAAATAAAT GGAATGA'rAA CCATACTTCT TGCTGCATCT TCCATCTCAT TTCTATAAGA AAA'rGGGATT TTTACAACTA CTACCAAAAT CTGA'rTCAAG ACAAGAAATT C1'GCTAAAAG 'rGTACTTGGT AGTAACCAAG CAAAACGAGA TTTATGrrTT CTGACAACAC rrrGTCGCAGC AATAATAGAA TAAATAAAGC TACTAAAGAA TAAGCGATAA T'PTTCTAAAG CCTTCATCA'r AGTATAAAAA GGTGAAGCAC CATCAACACT AAATAAAATA GCACGTACCA TATATGCAAT AAGTAGAA'N' ACCAAACTAC GTGGCTGATT AAAAGTAAAT AATAAACTTA GAAGTAGAGC ACCATATTCA GAGCAAATAC AACI'GCGAGA TGACCAAGAA TGGAGAGAAT TAAAGTTTGA TAATGT'rAAG
AATAAGAAAT
ATTGTTGCTG
GCCGCACTAT
TTACCTrGTTT
AGCATGAAAA
GACGCAA'TTT
CCTTTTTCTA
GCAAGTAGGG
GAATTGCAAC
CTGTGAACAA
TTTGTTTTTT
TCT'rATTCAT TGGATCTGTA AATGAGTATA ATACTATAAA AATTAGTGGA TCCATATGCT ACAATGTGAG CAATGATA'11 CCAAGGCTTA AAGAGGCGCT 'rTAGTCT'rAG AGATAGAAAT ATAAT'rTCCA GATAGTAAGC AAAATTGTAG 'rrGCAATACC TAAAATAATT CAGCTAAATC ACGAGAATTC CCCATCCCTG CAAA'rGTAAT AA'rCATTGGA TTTATAGTTT GAAATTCTTT ACCACCAACA ATCATGGGTG CTCCTACTGC AGATAAACCA CTAAGAAAAA CCATAATAGT AAGTGCAAAT AGAGTTGGAA TTAAGGTTGG TAACACTAC'r TTTCGGAAAA CAGTAAATGG TrTTTGCTCCC ATATTTCGAG CAGCCTCAAT AGTGTGATAG TCAACGCTTC GAATTGTATT TGTTAAAAAC AATGTATGAT TAGCAGTTCC TGAAAATGTC A'rAATGAATA AGACTGCACC ATACCCAATA AACCAGT'rAG GGTCTAAAGA AGGGATAACA TTTTGTAAAA 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 ATTTTGTAAT CAATCCATAA CTCCATA.AAT TAAAGAGGTC AGTAC'TCTGT AAATAGAACA ATGCTAACTT AAAACTGTTC GTACAGCATC AAGGGAAAAT TTGGATAAAT AATAAATG'TT TTAAA'rTTAA TTTATGACGC GATAAATAAT TCCACACTTT TACATTAAGA ATCTGACTTT CTCAACATCA ATAATTGTCC AACTTTCTCT AATCGAATGT GGACCATAGA CAAATTTATA ATATAACCTA ATTTTAAAAT CAAAGAATAC CTACGACATT ATAATACTCT GAAGTGCCCT 'rCTCCTCCTT TTACAAATAC ACTAAGAACC AGATTAACCC TCCAGTCGCT AAAACCACTC TTTAGCACCT TTAATATCAA AACTGTAATA ATGAGTGAAA CTGAGA'TTT AGAACACGAT ATTCACTACT AGATCAAAGT TAAACGAATA AGCCAATCITT ATACTGCACC TCCTTAAAAT TGCAGAACGT CTGATGGTGT CTCCGACAGA TCTAATAGCA GCCTGACTAT CAATACTTGT CAGAAACTTT TATTGTATAG TGAATTGTAA CTCCAGAAAA CTTTTAGAAT AAAATCTTGT TCAGTTTCAC GATTGAATCG ATCCTTTTTT ATCCTCTAAG AAAACGCTTG TATTTTTCAA 1134 TAATACT'rCG TGGACTGTrT CATCGGTCAA AACATTAATA TCTCCAATAA AATCACATAC AAATTCAGrr TGAGAATTAT GATAAATcrc TACTGGTGTrA CCGACCTGTT CGATGTATCC ATTGTTAAAG ACTGCAATTC TATCAGATAA AGTCAAGGCT TCCrC1-GAT CATGAG'rAAC ATATAAAGTA GTAATACCTA ACTCTTTTTG AAGTCTTTTC AACTC=ITC TCAAATCTAC ACG'rAATTTTr GCGTCAAGGT TTGACAATGG TTCATCTAGA CAA~GAATTT TAGGTTCAAG AACCAGAGCA CGAGCCAATG CTACCCTTTG TTGTTGACCC CCAGATAATT CTGATACATT ACGCTGTAAC TGTTGATCAG AGATCTTAAT TTTITGCTGCC ACTGCTGATA CTTTAGCT'TT AATAACATCT GGAGCTACCT TCTTAAC1'rT CATAGTTGGA AATAGCGCAT 
AAGAT'rGAAA CAAATGAGTG ACATCTGTTC CATTAACTTC TACCAATGCT CTCAAAGTAG TTGATTTACC TTCCCCTTCA TGTATATCTA AATTCAGATT TTGAATATTA TCAAATTTAA TCATCTCACT TCTTTATTTrC T'rCCATAAAT TT'AGAAATAA TATTTCTTAT TGTACGTATT CTAATTCAGC AACAGCTTCC CAGTCAATAT TTTGTGGTTT AGGTAGjATCT 'rTGAGGGCAT CTTTATTTGC TAC'N'GAATT TCTGATTGAC CAAACCAATC ACTAGTGCTT AAAACCATAG TTTGTTCAGT TAAACCAAAT GCAATATTAT TACAATACCA APTCCACGCT AATAC~rCCT GATGATGGA'r ACATCCTGAA GGCCCAAGAA ATCAATTGCA ACAAAATCAC CATATTTAAT AACTCCCTCT AT'rACTAAAC CAAAAGCCTC TAGAGAGACT TGGACATAAA AATTAACTCT TTTTTCTACC CATTCATCCA AATGCTTTCC CACTTGATCA ACAAATTT-CT TCGTATCTTC
CAAAAACAGT
TTTCAGGTTC
CTAGAAAACC
ATGTAAAAAA
2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080
AGGAATAGAT
AATAAATTCT
AACTTTGAAA ACAACATTTT TCCATATTGT ATTGGATCTT
AAGACTGTAT
TAA'rTCACCT
ACCTCCTTGA
ATCTTTAGGC
AGGATTATAA
CTTGTACTGT
AAATACCATA
GCCAGCGATA
GCATTTTTCA
TTATCATCAA
AGACCAGAAA
ATTTCTTTTA
TAATTATCTT
GTTCTTTTTG TCCAACTAAT CTT'rGTCTA-A CATCTTAACA AATATTCTTT TGCTACTTCC GGTA'rCGAAC TAAGATACT'r TTGAATATTT ACCTTTATAC CATCAGGCGC CCCAATTAAA TATCTGATAA AGATTGATCA CCAAAGTTCT TACTATATTC TTAGCTAACG CTTGTTTTTT ACACCAATCT CAGGAGTCAT GCACCAGAAC CCCACATCAT ATTGAACTTT CTCCCTTTTG CAACCTTTTT CGGAAACACC GCTAGAATTG CCCGTCCTGT TTACTACCTA ATTCAGTCCA ACTAATGGTT GAACAATCAC ATTTTrATCTA ACCATTTAGG GAATCAACAG CACCAATTCC ACACGGTCTG CTAATTGAC TTTGCTTTAG CAGTTAACCA AATT=~GAC TTTTATCAGC ACTAGTAATT TTTGATCTCT TCTGCAACTG CATTATTCTT TCAACCATTT TTATATTAAA
AATTTTATTT
CTCAGCAATA
ACCAGCTTCT
ATCACCACGA CCATTTGAGA CTGAGTTCGA ATAGATAACT
?TTTCTTCA
AGTAAgAGCA TTCTcCTTT'r
TTCAAATACA
TrCAGTTAAT
CCGAAGATAC
TGCTrTTTCT AGGT'rCAACT
GATGA.AGA.AG
ACTCCCGTTG
TTATTAT
GTATAGTCAA
TC'rGCTCTGA
CCAATT'GAAA
GAGACATAAG
AGTATGTr CAGTCGTAGA ArTTGAACCT CCAGAGCAAG CAGCAAGTGT CAAGTACAGT AGACCAAACT TTCA TTT TTT AT'rTAAA 1Tr
TACGGTTTAC
CTTTAGTAAA
TATGG3AATTG
TACGAATTTC
GTTTTCCCAT
GTTGTTTAGC
TCCAAAACTC
TTCCTTGGAG
ATATCAACAA
TTCGTGATAT GGAACAAATT AGTAATAGTr GGAPATCTTCT CTCTTCTTCC TCCTCTTCGG ATATCTATCA TGATTAGGGA TTCTAATCTC TTTGCAGAAG TTCAGTrATA CGCATATGAA AA.AGTAATCA TGTATTTCCT
TCATGATAAG
GTCrCATATC
CTAATAAAAT
TTAGAGGAAT
A.ACAAACACC
CTTCATCTGC
TTTCTTCATC
GTAAAGGTGT
4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4848 CAACAATGGA AAAATTTCAA ATCTAGAGGA AGATTACTGC 'rTACAGTCAT GTGAATAGAA CTCTATAACG TGATTGAATA 'rTGCTACAAC TGTATI'CCCA g LTCACGATT
TTAAAGTAAA
CTTCCATCA.A
TTCATGGCAC AACAATTCAA CT'rATCGATA AATGGTAATT ATCTTGTTTA GTATAAAGAT ATTCTCGG GGGAAATGAT TAAATTCCCC
INFORMATION FOR SEQ ID NO: 186:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3763 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186:
GTTATAAGCA
AATGAGTTGG
TCGCAAACTA
TGAAGCAAGC
ACAAGCATAG
TGAATCCGAA
CTCTTTTAGT
TAAATGACCT
TAAACTGTTC
?TTTAGCTGGT
TGGACTGTTT
TGCCCTCCTT
TCCGTTCCAT
TAACGACAGT
TTCAACTGGG
CCATCGAAAG
GAAACTGTAA AAAGAATTCG CCTCGTCAGT TCTGGAAAGA CCAATAATTT TGGAAAGTAG AACCTGTTAA CAGTTGAAAG
AGCGGCGTTG
AAAAAAGTTC
ATAGTTGGTA
ACACCTTCTT GCTTGCCATA AGTTGTGAAA. TGGGTAGAAT CGATATCTAC ACCAATTCAA GGTTGAGGCA AAACGGGATA AGGTTGGCTG GCATCAGCTG ACAATTCTIT AGGAACTGGA CAAGGATATC ACTAAATACT TAGAAATCCG ATAAGACCAC CATACTGGGT TTTTGGA.AGT GATGATTTGG TCTACACTTA TACCATAAAG
GTCATTCGTT
CTGAAAAAAG
AAAAGACTTG
TGTGTTTTTT
ATGTGAGTTT CCTTTCTr GGGAAACTCT TTTTTGTCTA GTAAAAAACA CCCATTGGGT GAAAAAAGAA ACCATCCAGG 600
ATCTAAGCTA
TAGGAGAAAT
TGA'I1r=CCC 1136 AGGCAAGGAT TCTGGATGGT TTTrAGATTT GGGGTGAATA AITGGGGATT- GATGGTATCT TCCAAATCAA AATCAACTTC ACTCCATAGT CTCAACTGAT ATCTTGATAG GTCACATCCT TGTCAAGGAT AAACTGAGTC AACACCTCAT GTTGACCTI'G ACACCTGATG TCATCTACCA AGAGCCAGAC TTTTTCTCCT GTCAAGATAA GGCAAATCAG G'rrCTGCTGA
ATCCTCTACC
CCAATAAGCC
GTGAGGATAG
AA'rGCACTCC CTCCCTTTCT TTATGGTGAC
AAAACAGGGA
CCCACGATCC CGTGATTCTT AAGCACTCTT TAAGAGATAA TGAAAATAGG TTGGCCTTGA CACTTCCTAG CGGACCATTT GTCCAGAGTC TTCAAAGATC GTCGCTTGAC CTTTTCTCGC GACCGTTAAG AAGGTCTTCC TTTCTGTAGA ATCGCTATCA GAATATAGGT CGCCATCTTT AGAGACACAA ATCCAGCAAG TCCGGAGCT'r TCCCATCTAC AATGCAGGTC CGTTCATATA TCTCCCGATA AGAATAACGC TACTGTAAGC AAAAACTATT CTCGTCACTA TTGAAAAATA GATAACGATG TTCATCCTTA ATGGACTTAG GCTGCCAAGC TCTCTTTTCA CCCAGGAACA AGAGGCTAAG CAAATCAACT TGGTTCAAAA CCACAGCAGA CAGGCTCAAA CCAAAAGCCA AAGTCCGTCC ATCTAAGCCT TCCAGCAACT CTTCGTAACT ATCTTGCAAG GCTTTATAAA CCTC'rACATG ATAGAGAATC ATCTGTGTCT CAATTTGCTG TTTCAACTCC
AACATGAGGA
CCCTCAA'rAT TATTCATATr' GAATGAC'rCC
CCAGCATCTA
TGACTATCGGG
ATGCAGACAT
AATTCCTGCA
TTAACATCCA
ATTTCTGTCG
GTCATCATTT
TCTGGAAGCA
GACTGTTCAA
'rCTGAAGCA.A
ACTGCrTTCC
AATGGTAAGC
GAATTGTTTG
ATCTCCTAAA
840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 TTCTTCTAGA TCCATCTTAT TAAAATCCCC CAGTTACTAA CTGAAAAGAA ATGATAGATA GCAAGCATCG GGGTGTACTT GGCGCGATAG TAGCTTTTCA TAAAGTCAAT CTGCTTTTCT AGACTGACCA AAATTTTCTC TAGTTCTTTC TCCTCTAGCA AGTCAAATTT CAAGAGGAGC AAGAGTAGTT TCAACCAAGT AAAGGAACGA ATACCCGTAT CCAAGGTTCT AGTCATCAAG GATTCAGGAG AAAATTCTCT CACCTGCTCA ATCCAATCAA ATAGAAAGAA CTTGCACTTT TGAATATAGT CCT'rATCTCC TTCTACCAGA 'rACCCTATCA TAAACTGCAA GAGATATTCT TGTCGATTGA GCATATAAGA CCATT-CTGGA TCATCTTCAA ATACTTGATC CCATACCATC GOCTGGATTT GATGGATTTT TGAACAAGGC TCCATATCCC -AAGGACTATC AAACATAAAA CGATTGTCCA TCAAGcG.TTC ALAGGGAACTC TTGACTTTCT CATAGTCTTT TGAACAGTGC GACAAGATAT AATCACGACA TTGATTTCCA TCGACTCT'TT CAAAAAATTG TCTTCTTTC'r TCTTTCATTA TCTATTACCA GAAAAAGAAC TACTTAAAAA GCAGTTCT'r? TGTCTTTCCC ATTACACTTT CCTTTTCTAC ATGGATGACC ACACCTTTTG CAATCTGCAA GGAGACCAAG TCATCTTGGA TAGAAATCAT TT'rTCCATGA ATTCCAGACA ATAACAACAC ?TCATCACCA AATGTTAAAG AAGCTrAAATA CTCTTGTCGT TGCTCCATCT GTTTGCGAAG CAACTTTrGC CTGCCAAGAC AATCACTATT CCAGCTTCCG ATAATrCCGA CTTTTCTTTG ATCAAGTAGT AACCTTATCA AGCATTCCCT TGACGAATAG AATGAAAGCT TGACAGTAAA AGGGGAC'TCA CCATAAAACA ATGTTGTATC CATTAAGCTA TAATCTTAAG TGATAACTGT TAAAATAACG AGTTTATATG TTGTCCAI-rT AAACTAAAAG TGTAAATAGG GCTGGTAGAA GAGCTGGAGC GAATACTrAC GATACTTTGT TTAGCGTCTG CTTTAACTTC S. 9 5 55 S S
S. 5 5 55 S S CCCTGCAGCA AAGGTAATCG TACGtTACAC CGATAATATT TCAATCACAG TTGTTCCTAG AAAATCGTAT TCATCGCAAG AACGAAGCTG CGATGGTTGA ATACCTGCCA ATGGTCCCAT CCATTTTCCA ACATTACAAG TTATAGAATT CACAGTTTC TGTTTTTTCA AAGCAGGATA TTAAATCCAT TTTGACAAAA AATTTGTTAG ATCCAGTCAT CTTGTTATAA AATTCAATCA GATGTTTAGA TAAGCTGCAC AGCCATCACT GACAAGATCA CAAACCTGTA AGTAATACCG GCACCATAAT CTTA.ACAGAT GGCAATACGA GAAATCGTTG TTTGTATCCA TACAGACCAG GAAGAACAAG ATTGGACCGA GAACAATGGA GCTAAACAGA CAAGGCCATC TTGATGCTAC ATGCAAGCTG GTAATAAAAG GTCGCTGCCA AACCAGCAAT CCATCTGTTC GCTTAGTTTA TTGACAATTT AATCGCTGTT CAACCAAGCC TTCTTGAGCA ATTGAGAAAG AGAATCCCCA GTGTTTCTTT TGCCGGACGG GCAGGAAGTG TGGGTTGGTA 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3763
TTCCAAGGCT
CATCACA'rTG
GAATGCCCGC
CGTGTGCrTC
AAGCAAAGAT
AAACATATCC
TCGCAAAACC
GTGGAATGTA
TGGTAGAAAC CTTCCTGATC CTCTCCATAG GCATATCCCA ACCCTTGATA GTTACTATAG AAAGACGTTT TAAGATAATC ACGT77TGTT CTCCTCTACC ACATGATCCG CTGTTTTTGG AGTACCTACA ATTGCAATAC CAATTGTTGG CAACA.AGACA AAGGGAATCA ACTCTTTCTT GATAGCTGGG AGCATTTTAC GTCTACGAGT 'rTCAACAAGG ATAAAGGCAA ACAACCAAAT TTTTCAAGT GCT
CAGCAACTGT
TATCCATTGA
TGTTGCATTT
AAGGGCACCA AGCAACCCAA GGTAAATCCA AGAGTGAACT TAAATTTCTT CAAATTATGG INFORMATION FOR SEQ ID NO: 187: SEQUENCE CHARACTERISTICS: LENGTH- 5053 base pairs TYPE: nucleic acid STR.ANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187: CAATCTCTGA GTATGTGCGG TCAATACTAw CAATTGGmnCT ATAGGTAATG GCAACCACTC GATAGTCTTG CTTCTcATrr GTACCATTGA 1138 CAAACGGAAT yCCI'GACGTC CPLTCAACTTT ATTATGACGC TAGAACATAA GAGTAATTTG TATAGACTTC AITTTTCCACA TGCATAGCAA ATTCTGAAAA GTACAATGAT TGCAATCGTT TCTGTTCGAT TTTT'rTCAT GAATGTAATT CA LAGTNA ATCGCTTGTT CCAC'N-r'r'r CT'rTTTCTTT ATTAATT~ACA CGTGAAACAG TTCCAACACT CATCTTTCA'r GGTAATTGAT TTrCTTTGTT CTACCATAT'r AGTATCATGC AAATGCTTTr TAAGCAACTA TTTCTCAATC TCCCATCATG AATAALAATCA CTCCAATT4G CTTTTGAAAA AACATCTACA TAAAACAGGA AAAGCCTTGG TTTCATGGCT AAAAGCAAGA GTTTTAGATG GCTATAAATC TAGATGTACA GTCTTTTCT'r AACAAAAACA CCCCCAAAAT TAGACTTTTT AGT'rCAAACG CGAAATAGCG TTTT TTTGTT ATTTTTGGTT ATCATGGCAT TTAACAAGTA TATGAGTGAG ACCGTGTTrA CTCTTATTTT CAATAGGAGG AATAATAAAA TTAGAAATAA TCTAAAGATT CCTTTGATAA TCTAATTCA GTCCAAACTT
GAAGGGATGC
TCCTCTAGCG
CAAAGTTACT
AACTCCTGCT
ATCACC'rCCT
ATTTGGCC
TACTTCAA'1r
TTTTTCGTAT
AAGTAATGTT
AACATCTCCA
TA'INCCT
CAGATACTTG
TAGTAATCTG
TCTTT'AATGC
TCTAAAGCAA
TTCAATATAT
AGATCATTTA
TTCATGTGTA
CTTCTATAAA
TTTTGCTTAA ATGATGAAG CTGTCTAACT T?1'GAGGTAC ACTCATCTAA TCGAATAAAC TATDATTTGA ATAGATGAGT TGATATCATA AGGTGAATCT CCAGTTCAAA ATTATTGCTA 120 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 '440 1500 1560 1620 1680 1740 1800 CAATAATAAG AAAGTGTCTC ACTAAAACCT TTAGT'rrAGG GTGATGAAGG TATAAGATAG TCTAAATAGT GAGAAAGCTC AAGTTCCTGA 'NTGTATTCCC COATACAGAT GTGCGGTAT ATCTGA'rACT TGACTGAAAT ACATAACTAT CCTrTTTAC TGjAAAGTAAG ACACAAATAA GATTCAAAAC TCTGAGCAAC TGGTCG'rrAA AAGAATCTTT TTCGTAACTA GGAGCAACTr TCC'N'ATAAA CCAATTCTAA TGCAACGAAT TTGCATGAT ACTGATCAAA ATTACTCATA CTGATTr'rGT AGCAAATTAA TCACCAAATG TTTGGTATGA ATGATTTACC ATCA'N'GAAC TAGAACAAAC CTCAAGAGTC TT'TTTTTATA TCTGAAACAA ATTTTGGAAA AATATTTTGA TTTTTGATCA AATAAAATAA ACTCAGTAAA CAACTCTTGA ATGCAGATGC CAAATCAGAT TATCCTTATT CTCCATTTCiA CTGATCAATA- AAATCACTCA ATAGATGGTA AGATTTTTCA CCATTTCATA AAGAGACTTT CATCTATGAA AAACATTTTT TTGGCAAACA ACTTCTTCAT CTAAAGAGAT ATTGTAT'rCT ACCTTCTATT CCT'rCTGCCT GCATTAAAAA ATCCAAACTT ATCTACTTCC ATAAAATGAC CAAAC'rTTAT TCTATATAGG TAGCATTCTA TGCGTTGACA AATTCATTGG AAACCTTGTT CAATTGAGAT AGTGGCTCTG ATGAAAAATT TTCAAATGGC
CATTCTAGGA
TCATTTCCAA
1139 AATAATATTI' TTCTGAAAAA TATTGTGCAA TGATT'rGAAC AGGGGTCAGA C1TAACTTCAA AC1'TTATTGA TTTGGCTAAT AATACGATAG AGCGAAGATG CAAATACTCT CAGCTI'GACA ACCTTCATTA AAGAAGATGA GTTGAATGTT TAAAGAAATG ATGGTAAACC AT'rTCAATAT ATGCGTATAC CATTAGTAGA AGAATGAAAA ATCAAGTCAG GATAGATCAT CTTTGACTGC ACGTTCTGTA CAATTTAATA TGAAACCAAC GTTTATGTTC AAATAATAAT TCTAATAATT AAAAGTAACG AATGTCTCTC ATTGAAATTG CCTTTTAATC AACTGATATA AAATTCTTTA ATTCTAAAAT CGAAAAATGA CACTATCATC GGTATTAATA GAAAAGCAGA TTAACATGG ACTCTGCTAG TTCAGAACGA
TTAGATAATA
T'rGCA.ATTAT
TATTTCGTAT
TTGAGGGCGT
AATCTCTCAT
ACCACAAAGA
AATAACACTA
GAATACAGAA
GAATATCTT'r CTCTCTTTAT ATAGGTATAG CATGATATAA CGGAGACAAT ATATAAACAA TCAA.AT'rCAA GTCTAAAGAT
CTAATTGCCT
AAATTATCGG
CGACTTT'TCC
TTTTCTTATT
TATATTTTTA
TTATATAATA GCAACAATTA AAGAATTTGA TTTTTTAAAA TTATATAATA AAATAATTGA CTTTTCTATA TTAAAGTTAT ATAATAGTAA TAATCAAAGA TTGATATTAA AATAAAAAAG GAGGGTAGGC AGTGTTGTGA TCAATTATTG TAT'rGGTCTC TTGGCAGGTA AAATCACTAA AAAAGTAGTT CTATGGGAAT GTATTCGCTG GTTTAGTCGG GGCATATGCA GGACAATCTC TTTTAGGTAG GCAATCGCTG GAATGGCTTT GCTCCCATCT ATTGTAGGTG CAGCGATTGT GTGTCATTCT TTACAGGTAG AAAGTAAAC'r TTTCGCCAGT AAAGTTAGCA AAATCAA'rGA CGGGAAAAAT AGTTTAAATG TTAAATCOAA AGGATTGTAT
ATGACTTTTT
ATTA-AACCTC
TAAAATCTTT
TTACCGTCTA
ATTTTAAAAA
ATAACAATCG
AATTGATTTT
CTGGAGGTCT
CATCGCAAAT
TTGGGGTCCA
GATTACTGTA
AACTATTTTT
ATGTCAAAAG
1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540
CAAAGAAAAT
TGATAGAT'rA
CCGTAGTTGA
TAGTTTTATT
'rTGAAACTAA
ATGTTTCATT
TCATCAAGTT
ATTCTATCTT
ATCCATTTTA
AAACGATACA
ATTTTCTGTA TTTTAATCTT GACAATTTTC CTTCCTGTTT AGTGATCTAG GTATTCATCT ACTTAGCTGG AGACAGAACT
GCTAGATATG
GT'TGTGATGT
TTAAAATTAA
TTGATCAAGA
GAAGGTAAAA
TCTT'rTGGGG
TTTATCCTAA
AGAATTCGGC
ACCCAACTGT
TTCTTCCTTC
GACAGTGGTT
ACGTTACTTG
AATCGAAGGT
TCATGTAAAT
AGACAACATC
CTATCAACTT
GAAATCCAAC
TTTGT'rAGAA
TTACGAAAAA
GCTGACAGAT
GTTTGGTGAG TGATCATAGA ATAAATGTTT CGTTCATGTA GCCAAATAAT TCAAAATGAA AAGTAAAACT TGAAGTTGCA ATAACTAATG GATTGAAGCA GTTTTTTGGT ATTGAGCGTC GTAAAAAATT ACCAACCAAA ACCTCAAAAC AAAAAGACTG 1140 TTAGTCGTGT GAAGTAAGGA AGTAAAAAA'r GGAATGGCTT AAACAATATC GATATCCAAT TATCGCTGGT CTCATAGGCG TATr'rCTGGC ?rGTr'rGAT-I GTCTCC1TrrG G~cIcAA AACAATATTT GTATTGATTT TAGGAGCACT GGGAGTTGCA GCTGGATTAT ATATCGAAAA AAACTATATA CATAAATAAA AAAATAAAAA TTACTAAT1TT TCAAACGAAA AAAACACAAA CACTAACGTA GAAAAGAAAG GAAATCAAAG GGGAACTTAC CTAGAAAACC TTTCAGGTCT AAAATCGTTA ACAGCGATGA GTTGCAGTTG ACT'rAAACGT GAAATCAGAG AAATCGTATC ATCAACGTAA ACGTTGTCGA CTTCAAGATC GCGTATCTGA GAAAAAGCTA AATCTGGTCT GGTGTAGAAG CTG'rTAAAGG AACTAAGATA AAATAAATAT CAAGCTAAAG GT'rCTATTAA AAAGAAGGTG CAGCTGAAAA GACGC'rGTAG AAGGTGCTGT GGTTAAAAGT TACTTTATCT TACGAAGAT AAAGTTATCC
TTGGGAATC
CGTAACAAGT
TATTGTTGAG
TTCAGAAGTT
CATCAAAACT
CGTTGCTGAA
TGGATCTGT
TGCAGCAAAT
AACAGGAGAA
AGAAGGTGTT
AGTTGTTTCT
AGAAGGTGTT
T'rTTAGTAAT
GATGGTGGTT
GGTGTTAACG
TACCAAAAAA
GCTAAAkATGA
AAAGAACAGC
TCAACAGGAG
TTrCTCAACTG
GGTGTAGTAT
ATTATCATGT
GGGAAAGCCA
AAAGTAAAAG
AAAAACATGT
ATTAGTCAAA
A'IrGCGGAAC AATTAAAGGA GTTTCATATG ATGCTACTGT TGTAGCTC.AC AAAAAATCAT TGGTCTTTCA TCT'rCTCAAA. TCTTAAAGAA TAGAAG'rTGG TAAAACACAA ATGTTCCAGC 'r'r'AATTCA CTGACTTGGA AATTGT'rGAA ATGAAGCAGA CTCAGTAAGC AATTCACTTC AGAACAATTC TTCAAGAAAA AGTTAGCGAA CTCACGAAAA CACTCGTGTA CAGTAGAAGA AAAATTAAAT TCGGTGATGA A.AAAATGGAA AAGTTGCCGA AGACGC'rAAA TGAGTGGCGA CGATAAATAA AGAGTCTGAG TCAAGATGAT TC'TAGC'NTrT TAATTTTGCC TACGAATCCC ATGTCAATTC CAAACAAACC TTTATCTGCC TTGTCT~ACCC T'rGGAAGATA TTCATTCATA TACAGTCTCT 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5053
TCTCAGAAAA
TCTTTCTCTT
TCACTTGACG
CAAAGACAGA
CAAAAAGCTA GAGATTCCCA ATTATATTTC AGCAGGTTGT CTTACCTCTC AGATGACATC TTTCATATCA ATCTTACGTT
TGGCCATGAG
TCTTATA.ACC
TAGCGAAAAT
AAACTCC&TG ATATTCTTTA GTT -rThAAC ACTGGTAACG TTTGAGGGGC TGATTCAGGT TCATAATCGC AGTCAACATT GATTTCAAGG CTGT'TTGCTT TCTATCTCCC CGG INFORMATION FOR SEQ ID NO: 188: SEQUENCE CHARACTERISTICS: LENGTH: 6492 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188: AATTCTC?1-1 TTTCCAACAA AATGTATGAC CTGCACI'GA ATrACTTCTCA TTCTrMTAC ATTCATCTAC TTTCATATAA TCTTTTACAA AATCATAATA TGACATAACA CACTATCCCT 'N'TAGACAAT ATTCCAATTA GCCTTATTAA TTCAAAACTA GATGTATAAT AGAAAAGCAA TGATAGATAT TATCAAT-A GATATTAAAG AAACGAGATA 'rGCTTATGAA GATTTACAAA AGATAAGAAA TATCTTOGGG TTTTGCCCAT AATTTCT AGTrATATGGA TAT'rATTTAA TCTACAAAT'r TCTAGArA-AG ATCCGGTGCA GAGACTATAG CATTAAAATC TGTTATTACA TTATTTTGTC TCAGGAATGT TTTCACATAT CTTGGGATTC *.AAAAAGGGaA TCGATGGTCT GGAAAAAGCA AGTTTTAGGT GGTCAAATAA GAAAGATTAT AGATGPCAAT.GCrGCACAAA ATGAT'rCCCG ATAGTrCTCA GGCAATAATC ACACCCTAC *ATAGTA.AGTA TAAGAGTTGG CATAATTTTG CTTGCTCTTA T TAGGGGCAA TGATGGGCGA GCAAGAATTT ATGAAGATAT CTAAGTGCTG AAACTGTTGA GTACGTGAGA GGAATGCkAG *AATGTAGAG'r CTTTTAAAAG CTTTTATAAG GCGATAAAAG *GATTATTCCC TATCTTGTAA AAGGCCTTAT GT'N'TGTATC ATrGCAATTT TAATTATrCC TATAGTTTAT TTTATGACTA **.ATTTTACrih; AGCTArCAT- CvTT'ATTT TTATCAGGAG .AGAATGATGT GtACTCCATG TATATTTCTC AAGGAAATTA TTGTATTAGT AATTATAACA GCGAATrTAT ATCTAAAAGG AAACTA'rG CTTATGTCCA GCTATATCTC CTGCACTTAC TTAATAATTA ATTCAAACTT CTAACAACTG GAGCrATATT AGGCTTGAAA CAAATTTAAG TCTTTGACTT AAATCCATCT CTCATCAGGT GGTAGCACAC TTGTACTTGC ACTTGGCTTT CTATAXATGG TGGC'ITAATT ACCAAGAATC CCTATCTAAA TTGTAAAAAT A'rTTAAAGCA ATTACTAAA GTATGCTTAT AATCGTTATT TTrTGGACTG GCTTACCTAG CGCAAAGGTG TTCTCTTTGT TTCATrCATG TGCAGTAGAT ACTTTAGAGG 180 240 300 360 420 480 540 600 650 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620
CGCTI'TACGA AGATArGCAA AAAGACAAAT TAGTCCATGG TAATGTCA-r AATTTTAAAA ACTATAATAT AGAATTTGAG AATGTTAGC.T TTCCTTATAA TGATAAAGCT GTCAN'GAAA AT'rTATCCTT TAATTTAGAA GAAGGAAAGT CCTACGCACT TGTCGGTTCA 'rCTGGATCAG GCAAATCAAC AGTAGCAAAA CTTATATCAG GTTTTTACAA TGTTAATAAA GGAAGCATAA AGATAGGCGG GATAGCAATA AGTGAATATT CTGACGAAGC CTTAATTAAA GCCATTTCCT TTGTT'TrCA AGATTCAAAA TTATTCAAGA AGAGCATTTA TGATAATGTA GCGTTAGCTA ATAAAGATGC GACGAAAGAT1 GACGTTATGA GAGCCTTAAA ATTAGCAGGA TGCGATI'AA 1142 TATTAGACAA ATTCCCAGAA AGAGAAAATA CAATCATAGG CTCAAAAGGT CCGGTGGAGA AAA.ACAAAGA ATTGCAA IrG CTAGAGCAAT TTTAAAGGAT TTAT'rATGGA TGAAGCATCA GCATCTATTG ACCCAGA'rAA CGAGTTGAA CTTTAAAAA TCTTATGAAG GATAAAACAG TTATCATGAT TGCACACAGG TTAAAGACCT TGATGAAATT ATTrGTCATGG ATAGTCAAA AATTATAGAA ACAAAGAATT AATGTCAAAA GATACAAGGT ATAAGAGCCT GCAAGAGATG CGAATGAATG GAGGGTTTCA AATGAAAGAG TN'TATAA-AA AAAGATTTGC GTTTATT'rAT
TCCAAAATTA
TGCAAAAAG
CTATCTACAA
AGAGCGTCTG
TTTAACAGTG
TCTTACAGAT
TTGTATAAAC GGAGGAGCAA
ATGCTTCCTG
AGCAATGGCT
'rCTATCGAA'r
AGGACAGCGG
GACATTTCAC
ATACCAAACG
GAAATTTAAG
CCATATTACT
'N'TATATAGT
ACGATAAATT
AGAATTTATC
TAAAGCAACA CTGGCTTCAT TT~TTCGTTTA
TATGATTTTT
ATTCTCAGTT
ATATAACACA
AAAATTACCT
GC'TCAGGAAG TTTTGGAAPA 'rATGGGCAAA TTGATTT'rGA TAGCPLATGTA TATTTTGCTT 9 9 9* 9 9*9 9 9 9 9 9 9 9 99 9 AAACAATCATI GGCTGATATT TGGGCGGCAT GGTACTGTTT ACC'rATCAAG AA.AGTGCAGA CTATCTTACT TTTCTAAACA GAAGGCA'rAG AGCATGCAAT TTCCCATTAA TATCTGTAAT ATTCCATCTA TTTTAAGCfTT CAGAATAGAT ATTATGATGT ATGCAAATGG AGATTAAAGC AAAATGGAAG ATAGTGAGAA
TTTAAGAATA
TGACATTTCC
GAGCCACTCA
GATGCTAGCG
TATATTrA'rA
CTTAAGAAAA
ATATAATTTA
AGTACACTTA
GGCAATGTCA AGATGGGTr'r AGCTGTAATT CCTTTATCTA AAAAATATCA GGTTAATGGA 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 AACTCAGAAA GCTTTCAAGA TCGAAGGATA TTAAAGATGA AAGGCGGAAG TAACTACAAT CTTGC'rGTTG TGATATTTGT
AAATATCGAA
CTTATATAAA
TTfTAACTTTG 'rCTATATCTT CAATATTTAG CTTTATATCT CGGCGTAAAT CTAATTA TTA ATAAAGAGAT AAATTCTCTC 9. 9* 9 9 9 9999 9* 99 9 9 9 TACCTTATAG GATATTTACr AGC'TGCTATG AAGATAACAG ACTCTTTAGA TGCATCTAAA GAGGGCTITGA TGGAAATATT 'IrATTTATCG CCCAAAATAG AAAGATTAAA AGAAATTCAA AATCAAGATr TACAAGAAGG CGATGACTAT AGCT'rAAAAA AATTTGA'rAT TGATCTAAAA GATGTTGAGT TTGCCTACAA TAAAGACGCA AAAGTTTTAA ATGGTGTAAG TTTTAAAGCT AAGCAGGGAG AGGTCACTGC TTTGGTAGGT AAACTTATAT CAAGACTTTA TGATTATGAC ATAAAGGAAA TATCAACAGA ATCCCTTTTT1 GTTCTCTTTA ATCAAAGCC;r 'ATGGAAAAT GAAGAGGTITA AAAGAGcAGC AAAACTTGCA AAAGGTTTCG ATACAGTTAT TGGTGAAAAC GCAAGTGGCT GCGGTAAAAC AACTATCTTG AAGGGACAAA TCTTAATCGA TGGCAAAGAT GATAAGGTG'r CTATTGNTTT CCAAGATrGTG ATTAGAATCG GTAAGCAAGA TGCAAGTGAC AATTGCACAG ATTTTATAGA AAAAA'rGGAT GGAGCTGAGC 'rACAGGACG AGAAAGACAA 1143 CTTCTTAAAA GATGCGCCGA TATTGA'rCTT AGATGAGATA AGATTATCAA 'PAGCCAGAGC ACAGCAAGCC 'rTGATCTTAA CAACGAGAAA AAAGATAAAA CTGTTGTAAT CATT'rCACAT ATAGTAGI'TC TTCAAAACGG AACAG'rAGAA AAATCAAAAA T'rTACAAAAA TTTAATAGAA TAGGAGGACT ACAATGGATA ATAAAAAATT TTTTGGCGTA ATTTATTTTG CCTTCATGTT ATTGT'rCTTA ATATACCCGA CAGTATTAGC TATGGCTAAG GTTCAAAAGC CATGGGCACT GATGTTTGCA GCTGGTCATA CCTACGTAGT AGCAGAATTA ATTAGAAAGA TTGGTAATTA TGCAATCTTC AGCACATgGA TATGTAGCTC ATATATGGAG TGGTC'rTTGA TGACTATGGG AATAAC'rTAT CCTCACATGG CTTTAGTAGC AGCATATATA GGCAAGGCTC TATTGAAAAA ATACTTTACT CCTTGCCTAA TTTTATGGTG TTTTTGAGTA TACCTATTGT TATTAGAATC AAGATTCAAG AGTCTTTAAA TAATTTAGTT AGAATGAAAT CCATAGAAAA TGCAGACAAG AGCGAAGGTA AGCATGAAGA GCTTTTACAA AAGACAAAAA TGGCAGAAGA ATTTATTTAT AAAAGTAAAA GATTTAGTAA GCATCGGTGT TGGAGTTGGT ATGATGGGCT TGATTCCAAT CATAGTTGCA GGAACTGTTG TTATGTTATT ATTTATA~rT GGTATGATAT CACCACTTGT TGTGGTTTTA TCACTTATAG TAATGATAAT TAATTCATTT AAATACAATA TGCTTTCTTA TTTAATGCAA ATGCTT'rTAG CAAAAGAAAA AAAAGATTAT GTTGATGTAT TAGAAAAGTT CTTAGGTGCT 'rrCTTAGGAG GAATTCTTrGG ACACTTTrCA AATGGATTAT A'rTGTGTGGG CTATCTGAAT TAAACCCTAT AGTTAAGATG TTTATTT'rAC CATTTATGGC AGCAAGCTTT 0** 0.6- 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 
4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160
ATGATAAAGA
AAGAATGTAT
AAGAAAAACA
TATCTTGAAT
GCAAAAGCGG
GTAA.AGATAC
GGCTTAATAT
ATGA'rATATC
ATGGTAAGTC
AGACAAAGGG
TCTCAATGCT
CGACATTAGA
CCTCGGATGT AGGCGCAATA ATrTCATCGA TGGATAAGCT TAAGATTTCA CCATACCTAT TGCGGTTATG TTTAGATTCT TCCCATCTTT TAAGGAGGAG TCAAAATGGC TATGAGAGTA AGAGGGATAA ATTTTAAAAA CCCAGTCAAA ATGTTTCTGT GCCACTACTC ATTATATCAT CTAATATATC AGATGACATT CAGAAACAAA GGCAATAGAA AATCCAATTG CCAAGACCAG ATACATTCGC AGCTAATTCA TTTTGTTTAT GTTNTAGCGG TTCrGGACT TA'N'GTGGGA GGTTGAAATA AAAAATTTAA GTCTTGATTA TGGTGAAGAG CATATATTAG ACTATCCATA GCCGAGGGAG AGTGCGTGCT ATTTACAGGA AAAAGTGGAA ATCTTTAATA AATTCAATCA ATGGACTAGC TGTAAGGTAT GATAACGCAA CGAAATAATT ATTGATGGTA AGAATATAAA AAATTTCGAA CTTTATCAAA a. a TGTTTCAACT GTTTTTCAAA ATCCTAAGAC AT'rATTATTT TATTTGGAAA ATATCGGTCT ATATTTTTTT AATGTCAATA TGCAAGAGAA GAGATGGACA GGCGTTrGAA GGA'rATACTr GAGATATTCC TTAATCTATC CGGCGGTGAA AAACAAAT'rC CAAAGATTAT AGTTATGGAT GAGCCTTCAT TGGCAAAGAT GCTAAAGATA TTAAAAGAGA GAATTTATTA TTTGATGGAC ATAGTTGACC AAAAAACTTA TACTAGAAGT GAATTTTTAA GTFTAAGAGA TAAAGAATTA AGTAAATTAA ATCAGATAAA AAATCTTAGT TACAAATTTA TTTCGTTCAA GCTTGGGAAA ATT'rATGGCA CGCTTTrAAG ATGTTTAATA GGTCTTGAGA GAGAGAAGCT ATCTAAAAAA GAAAGACTCA ATCATCAATT ATTCACAGAT GAAGTATTCA ATGAAGAAAA GGCGAAAATC ATTTTAAACC CCAGAATCCT TGCCTTAGCT TAGATCCTGG TTrACTAGAC AAAAAAGAGT TTCCCCTTTA 1144 CGATAAAAAA TCTTTTGAAC AGAAATATAT TTTGCAT'rGC
CGAAN'TAGA
AAGGCATAAG
G'TGTATT'TTT
AGCTAGATAA
AAGTTCCTTA
CTGATGATGA
TAATAGGATC
AAAAATCAAA
AAAACTCTTC
AGCTTCTTAT ATAGCAGGrA TATrAAAAGC ATAAG'rGTTr CATAATTGP1' GCAGAGCATA AATAGATAAA GGAAAGCTTA AAATGAATTA AA'rGCTPTAA TTTAAAAGAA GGTGGAGAGT GTGTTTAAGC TTAAAAGATA CAACGGACGA GGAAAATCAA AGAAGAAATT 'rATTTTAAGG ACTTGTTATG CAAGATGTAA ACGAGCTTAG ATTAGGAGTA AAGAATTTTG CCAATTA'rTC ACCCCAAATC TAAAAACCAT ATGGTTTCTT TTTTCACCCA ATGGG'rGTTT TGGTATAAGT GTAGAAAAAA ACACAAAAAG
5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6492
AAAGGAAACT CACATGAACA GTT'rACCAAA TCATCACTTC CAAA.ACAAGT CTTTrTACCA ACTATCTTTC GATGGAGGTC ATTTAACCCA GTATGGTGGT CTrATCTrTT TTCAGGAACT TTTTTCCCAG TTGAAACTAA AAGAGCGGAT TTCTAAGTAT TTAGTAACGA ATGACCAACG CCGCTAC'rGT CGTTATTCGG AT'rCAGATAT CCTTGTCCAG TTCCTCTTTC AACTGTTAAC AGGTTATGGA ACGGACTATG CTTGTAAAGA ATTGTCAGCT GATGCCTACT TTCCAAAATT GTTGGAAGGA GGGCAGCTTG TTCACAGCCA ACC'rTATCCC G'ITTTCTr'C CAGAACTGAC GAGGAA.ACAG TCCATAGTrT GCGATGCCTC AACCTTGAAT TGGTCGAATT CTTTTTACAT GTTCACCAGC TG
INFORMATION FOR SEQ ID NO: 189:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7174 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:
AACTGAAGGT AAAGGCTTCG ACGCAGAACG TGACGCTGCC CAAGCTGCCC TTGATGACCT TAAGAAAGCT CAAGAAGACA ACAACTTGGA CGACATGAAA ACAAAACTTG AAGCATTGAA 12 120 CGAAAAAGCT CAAGGACTTG TCAAGAAGGA GCAGAAGGCG AGAGTTTACG GAAAAGTAAG AAGTTTATAA TGATTT'rTGT ATAATATTCC AATAGAATAT TTGTATCAAA AATGATGAAG TGTATCGGCG ATGGTACCAG ACCGTTCATA ATTGGTATGA 'rATAGGCAAA TACTAAGAAG TAAAATCAAA ATAAAGTTAA CGATAAAAAG GGACGGAATC AGAGTGTTCG TAACTGAACA CTGTTAAACT CTACGAACAA CACAAGCAAC AGGGAACGCA ATGAGTGTAT TGGATGAAGA GCCGCAGCAG CGCAACA.AGC
GGCCATGACG
GTATCTAAAA
AACATCAAAA
TCGTAGACGG
AATACACGAA
GATTTTATTG AATCAAGCTG
TTAGCTAGAT
CGGTAAGGAA
CTATGATATC
TTTGGACAGT
ATAACTATAG
ATAGAGAATT
TTTTGT'rACC
ATTAGAAATA
AGTTGTATT'r AGACAAAAAT ATATAAATAT TTTACTTATT TGCAGAGGTT TCATTTGTT'r GGGTITCTC CGGGTTTCAG AATTTCTTAC ATATTrAGCTG AACATGATAG TCAGTATTGT TGTCTGCATT CAAACATATA AATTTGTAAT CTTATGATCA ATTGGAAT'rA TTCTGTACTT ATAGGATATT GCAACCCAGC CTCTGTr'rTT TCATCAATAG AAAGGAACAA TAAATATAAA AGAAAGGAAT AGAAAGGAAT AAGGGTGTTC TTTTTCTAGG ACGTAAGCGT CCTTTGTATC TTGAATT'ATG ACGCTTCGGC AGACGAAATC 0 0 00 0 0 0* 0 04 .0 0 0 00 Os 0 TGAACCCGAC CTAAATGGTG GTTCGATTCA GTAACTGAAC ACGGGCTATG GACTGTGCCA CCGTCGTCAA AACTCCTAGA TGGCTGTG;TC AACAATACT1G AATTTTATGA TCGTCTGGGG AAAAAGGCTT ATCGTAAGCT TTCCAAAAAA GCTGAGGACA AGTACAAGGA AGTTCAAGAA CGTGCTGCCT ATGACCAGTA TGGTGCTGCA GGTTTCGGCG GTTTCAATGG GGCAGGTGGC TTCTTCGGCG GAGGCGGTTC TTCGCGCAAT CAGTATCGTG TCAATTTGAC CTT'rGAAGAA
GAACATCAAT
AAAAGATAGT
CGTTTGACGC
GTATCCAAAA
180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 TATCACCCAG ATATCAACAA GGAGCCTGGT GCCTATGAGA CTTTGAGTGA CGACCAAAAA GGCGCCAATG GTGGTTTTGG TGGAGCTGGT TTrCGGTGGTT TTGAGGATAT TTTCTCAAGT CCAAACGCTC CTCGCCAAGG AGATGATCTC GCTATCTTCG GAACTGAGAA GGAAGTTAAG
TATCATCGTG
C CAGTCACTT
CTTGGTATGA
AAATATCCAT
GTGAAAATCC
GCAGGCTTTA
AAGCTGGCTG TCGTACATGT AATGGATCTG GTGCTAAGCC AGGGACAAGT GTGGACGCTG TCATGGCGCT GGTGTCATTA ACGTCGATAC GCAGACTCCT TGCGTCGCCA AGTAACCTGT GATGTCTGTC ACGGTCGAGG AAAAGAAATC GTACAACCTG TCATGGAACA GGTCATGAGA AACAAGCTCA TAGCGTACAT CTGCTGGTGT GGAAACAGGT CAACAAATTC GCCTCGCTGG TCAAGGTGAA ACGGTGGACC TTATGGTGAC TTGTA'rGTAG TAGTTTCTGT GGAAGCTAGC GACAAGTTTG AACGTGAAGG GCGCCTCTTG GTGATACAGT CCAGAGGGAA CTCAGACTGG CGTGGCGGTG CAGTTGGTGA AACGACCGCC AAAAAGTAGC CCAAAGAAAA AAGGCT'rCTT TCGAAAA'rCT CTTCAAACCA CAGTCGTATC TACAACCTCA GCTTTTTACT TTATAGAT'T CTTCGAAGTT CCATACCTAA
AACGACTATC
AGATATTCCA
TAAGAAGTTC
CCAATACGTT
TTCTACAATC
ACTGTTCACG
CGCCTACGTA
AC'TGTTAATG
TCAACCTCAA CTTTGTCCAA GTGATGTTrGA ATTGGTTAT'r GTA6AGGGGGC ACCGAGCCTT TCGTAACACC GACAGCTG CTGGTGACTI GAAAGTAAA'r TTGATGGAGA ATAATACTCT CTTGAAAGAA TTCGCGGCTG TGACCATAT'r AAAGATGCCT CGTCAGCGTT GCCTTGCCGT AAACAGTGTT TTGAGCAGC TTTAAGACTT TCCTAAGTAA ACTTTGAACC TAAGT'rTTAA
ATATATGTGA
CGTGGCTAGT
TGACGGACGG
AGTTTCCGGA
CTGACTTCGT 2220 TTCCTAGTTT 2280 TAGTGACCTC 2340 CAGCTGAAAC 2400 TGAAATGGTG 2460 TGGGTATGGT 2520 CTTCGTCAGT 2580 TAGTTTGCTT 2640 CAAGCTGTTT CAGGTGT'TTT CATTACGGCA GAAAGTCTTC GAT'rTAGTTG AATCATACTC TTCAAAAATT TCTTCAAACC ACGTCAGCGT CGGCTTGTCA TACTGACTT'C GTCAGTTCTA TCCACAACCT CAAAACAGTG TTTGAGCTGA TCTATCCACA ACCTTAAAAC GGTGTTTTGA GCAGTCTGTG CCTACTTC
TTTGATTT
AATAGATTGA
TGCTTTKGAA
AAAAAAGAGC
GCTAGTCTAG
ATTGAGTATG AATTACCTAA ATTATGATGC ATAGTTGATG OGATATATAT AATAGAATAT GAACAAATTG ATAAGAGGAT TTTAAAGTAA TCTCTAACAA ACTATGGTGT C TATTCTAA ATTCAATTCA CTATA.ACTTG TTTACGTTTT CGTCGGGCTC TTTTTACTTA TCTTCAGTTC CCTGCATTTC TTTATCACA TCTGGATATC CTTTTCCAAG ACCTTAAACT TGTAAGTCAA GTCTTCTTCG TATTCCTTGA TAAGTTCTTT ACATCGTCCT TGATAGCTTG TGGCGATTTT TTGTAACCAG GATAATT'rCA TAGCAACTCC TCAAAGTCTG GTTCGTGGGC GGCACAAGGT GAACGTGAGT ATGATATTCA TACCAGCAGC TGGCCAAAGA GTTGGcTGGC TTTGGCACGA CCAAGGTGTG TGCTCATCT'r CATATACTT'r CAATCTGACA TAAAATCTAC TTGCTGGTTA ATGATT'rGCA GGCTGTTTTG GATAATATCC AACGCCGTCA GTGGTATTCA AGACTTCATC TGTGATGGTT ATAACTTCCG GCTGCAGCTC CTGCAAATAG CAGTAGGTTG TTAAGCGTTT TTGATGGTTT CAGCGACTTG AGCAAGTTTG GATAAAATCA ATCTTGAGGT CATCGTCAGC ACTGTAGCGA ATGAAAAACT GTTTGACCAG CGACTTCTTC ACAGTTGGAA CTTAGTGACT TTCATGACTT TTTGAGCTAC TTTTGGTACT GCTCGTAGCA TCCATCTCCA AAAGATTGCG ATAGTGTTCT 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600
TCCTAGTGTT
TGAAGCAGGA
CTCTACTrGTA ACTTGAGAGA TATCAAGAAA ATTTCCCCTG CGATGATTrT CTGAATTTTG ATATAATATA
GGCAAGGACC
ACAAAAAATG
GCTACATTAT
1147 ACCAGA'TrG GAGAAAATATI GTTrAGAAATT AAAAACCTGA CAGGTGGCTA TGT'rCATGTT CCTGTTTTGA AAGATGTGTC CTTTACTGTT GAAAGTGGGC AGTTGGTCGG ?TGATTGGT CTCAATGGTG CTGGGAAATC AACGACGATC AATGAGATTA TCGGTCTGTr GGCACCTTAT AGTGGCTCCA TCAATATCAA TGGCCTGACT CTGCAAGGAG ATGCGACTAG CTACCGCAAG CAGAT'TGGCT ACATTCCTGA GACGCCTAGT CTGTATGAGG AA'IrGACCCT CAGAGAGCA'r ATCGAAACGG TTGCTATGGC TTACGGTATT GAGCAAAAAG TGGCTTTCGA ACGAGTAGAG CCCTTGTTAA AAATGTTCCG 'rTTGGAACAG AAATTAGACT GGTTCCCTGT TCATTTTTCA AAAGGGATGA AGCAGAAGGT CATGATTATC TGTGCTTTTG TGGTGGATCC AAGTCTTTTC ATCGTGGATG AGCCTTTCCT TGGTCTTGAT CCGCTGGCTA TTTCTGATTT GATTCAGCTT 3660 .3720 3780 3840 3900 3960 4020 4080
'N'GGAAGTCG
GCGGAGAAGA
AATCTCCTGC
AGAAGCAAAA GGGCAAGTCT ATTCTCATGA TGTGTGATGC CTT'rG'CATT CTTCACAAGG AACTACGTGA AGCCTTTGAT ATGCCTGAGG TTGGCTCTGA CCAAAGAGGA TTCGTAAGGA GTGTCTTGGT TGC'rTGTCCT GTTGGGCTTr AAAATCATTG GCCTATCCTT GAGGAACTGC CACCTATATG AAATTAAGCT CCATCTCAAG AGACCCTTVT CTTGC'rGTTA GGATCTATGA AAGACTTGTT TATCTGCGCT ATGTGCTCAA CTAGCCTACC AGTACAGTCA
GTACCCACGT
GAGAGGTGCG
CTAGTTTGAA
TTTAAAGAGA
TGACCACTTT
ACTCTTACAA
TGTTTTACTT
TCTCTTACT1'
AGTCTTTI'GG
AATGGGTTAT
TTTCCACTTT
TGTTATTTCT
GCTGGATTCG 4200 TTCCAAAGGC 4260 TGATATTTAC 4320 AAGCAGGCCT 4380 GTCTTG'rTCC 4440 CATTT'rCCTG 4500 TTACTTTGGG 4560 GGAGAAGAGG 4620 CTCTTTGTAC 4680 GGCTTGCCAG 4740 TGTCAAAAGG 4800 CAAGAAAGCA 4860 TTGTTTGTAG GAATTACGTC GAGGCTCCAG ACAAGCTCTT CGTCAAACTG GCAT'rTCCCT TT'rGCGCCTT TATTTTTAGC TTTTTCTGCT CTATGTGCTT TTA'rTGGGGG CCAGCAAArr T'IrCACTGAA ACTGGACTGG AGCGTAAGCA AGTCTTGCTT CGTTTCTTTG ACAGCGTTAA GCGTCGTGCC TATCTGGACT GGAAGATTTG GCAAAATCTC TATCTGCGTr TCAG'rCTTCG TCTTCTCTrG CT'rTCCTTGC TTGCGACAGC AGTGGTAGT1' CTCTTTAACT
TAGGAAAATA
ACTGGGACTA
CCCTCTTTAC
TTATTTTAAA
CTTATCTGCG
TGGCGCAGGT
ACCTCTTGCT
GCAGGTCAAG GGAATTTCAA GGCTGTTCAG AAGGTGCCTG AAATGGCGAC CTCTTTGCTC TTT'rATCGAG CAAGCTTGGA CTTCCAGTTG CTGGCCCTCT GCTCGACAAG GGGCAAAAGG TGTTTTACT'r GTGGAATTAG AGCCTTACTA GGAGCTGGTT 4920 4980 5040 5100 5160 5220 5280 5340
ATCATGCCTT
AAAAAGGCT
'N'GTTGGGTT
TGACTACCAG TATTTGACCC AACTCTrrCC ACAGGAGGTA GTTCGAGGAT TGACCAGTTT GATTACCTTC CAAGAAAAAC TAGCCCTTCT TGGTTTTACT AGTCTTGTAT TTGCCTTATC GCTGATACGA CACTAAAAAA GAAGTTGAGT ACAGGATAAr GGTTGGTCCG TAGAGACTTA AGCGTCGTCT TACCGTAC1'C AAGTACAGCT TTTCATTGAG TATTAACTTG AAGCGGCGCG AGTTGGAGCA TCTGGTAAAA TTTTTCTAGT
GAAAACCAAG
TGCATGCTTG
AATAGAGTTT
GTAGGCTTTG
GTCTTGAC'rr
TCTGGATCAA
TCAATCAAGA
AATTGCTCCT
TTTAAGTCAG
CCTCCTTTAA
ATCACTTTTA
GCGCTGAGTG
'rCCTTTTACG 1148 AGGTAAAACG TCAGATGCAG GACTAACATT TCAGTCTGTC TCAACTTC? TTTTGTTACI' TACTCTTCGA AAATCTCTTC AAACCACGTC TGCGGCTAGC TTCCrAG'T? GCTCTTTGAT GGTCAAAGTG GAAGCGGTCA TAGGCCCGCC GAGCGCTGAG TCCCATGAGA AGACTGGAAG ATCGATTATC CACTGTTTCA GCCTTGGCTA GAAAGCGGAC GTCGTCAGCG CTTGCCTGTT TA-ATCAA-AGT ATGAGCTCT'r TTGATGGGGT TCTGGGTGCC AGTCTTACTT CTGGCAACTG CTGTATCTGT CATGGGAATG TGTTT'TGATA CTGTTAGTTT 'rCA'N'GAAAT CTCTTGAATT CATGATGTTG TAGAAAATAA GCTTGGGCTA GACCTTTTCT ATTATTCAGA CGCAGAAAAC ACACATCCAA CTAAAGCCCA CTTCAAAATT TAAGTAATCT AATTGTTTGG TTAATCT'rGA CTCGGAGAAG AAGGTCAATA CAAAAATGGT TGAAAGAGGG TTCCCAACAG ACTAGAATTG 'rCCTAGCTCT TTTGAGGTAT
ATTCTTTTTT
AATTTTATGA
AGTTTTCAAA
GTTATTCT TCT'rAGGACG AGTCTATAAA GTAAACATCG rACCGTACAG ATTGTTACAG TGAGAACCAA TGTGGGGAAA TAAATTAAAA TCGATTGATT CGCTGTCAAG TATGCCAGAC AGAGTAAGAA GATGAGAATA AAACGACGAT TGAGAAACrT TTGGCAAGCC GGACCAACTG TTTATTCAAA TCTrTAATTG TAAAATAGI-r GTAAGCTCAT GACTGTTTAG GATTGGGTGT AGAATCCAAT GAAATATATG ACGATA'rCTA CTATATCCAA AAGAAGCTAG TTTTAATATG CCTTGATGAG ATGTCATCGA TTCAAGAAAG AATCCTTTTT GTCGCTATAG AGAAAT'rCGT TTTGTTTrAG AGGATGATrT TIGAAAGCAC ATCATATCAT CTGGCTGAAG TGCATGAGAA AATGAAGAGA TGAAGGGACT CTGATTGTCT TTGTGACGAC TCTGCTTTGG ACTACATTGA ACAGCCCTCC TCTATGCCAA 'rTTAAATCAA AAT'rTGCCCA TCGCCCAGAG CCCATCGTGT AGTTTAGAGG AGGT'rTTCAA ATCAATCCTG CAAATGTGGT 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 GGGGGCCCAT CAGCTATTCT 'N'TTGGATAT TGAGATTCGA GGAAGTGGCT AGAAAGATTC GGGATCGGGA TCCT'rATGCC TCACTCGGAG T'rTATGCCCC TGTCTTTT'CG CTACCAAGTG .TAAGGCCTTG TCAGCAGAGG AGTTTGAATC TCGGATCGAG TAGTCAAGAT AGTAAAAGTC TGGCGGAAGA TTGCTrTTTAC ATTTCAGTAT CCTTTTAAAG AGGTTTACTA TCTCGAAACG TATTCTCTAT ACCAACACAG ACAGGCTGGA ATTTACAGCG GCAGGAGCCC CGTC'rCTTGC AGTGCCACCG CTCTTTTCTC 1149 GCAPTTrGGAT AAGAAAGAAA AACTGCTrTT CT INFORMATION FOR SEQ ID NO: 190: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 3207 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 7174 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190: CCACCAGGGA AAATCATTGA AGTTGGTAGT CACCAAGAGT TAATGCAGGC GCAAAGTTTC TACCATCATC TATTCAATAA ATAAGGAGAA TGTCATGAAT CCTAATCTTT TTAGAAGCGT CGAGTTTTAT CAGAGACGT'r ACCATAACTA TGCGACAGTG TTAATTATAC CTCTI'TCATT ACTATT'rACT TTCATCTrGA TN'CTCCCT TGTTGCCACA AAAGAAATTA CTGTTACTTC CCAAGGAGAA ATCGCCCCTA CAg'TGTCA'rr GCCTCCATTC AGTCAACCAG TGATAATCCT ATCCTAGCTA ATCATTTAGT GGCAAATCAA GTAGTTGAAA AAGGGGACTT ACTCATCAAA TACTCTGAAA CAATGGAAGA AAGTCAGAAA ACTGCCTTAG CAACTCAAT'r ACAAAGACTT GAGAAGCAAA AAGAAGGACT TGGAATTTTG AAACAAAGCT S S TTTTCTGGCG AGGATGAATT CATGATATTG AACTGGGTAT TCCAATAGCA GTTCATCAGC GAATATCAAG AGTTGAGAGA CCGCACCAGT 
CAATTOAA GCAGAGGAGC CATTTTTATC GCAAGCCTCA AAATTCAGCA GCAACCAAAA TTGAAGTACT ACTGTGGAGA ATCAATI'AAC GAAAACAATA CCTTAACCTC GGTAAAAATA GAATTCCAAC ACAAGAGAAG TACTAATCAC GGACAAACTG TAAGATTAAA CAACTTCAGA CAATTGATCA TGGCTACCAT AATACCTTTA CACAAAGACT AACACCGAAG TATTGAACAA GAAA'rTACAA TGCTATCATA AATAACACAG TCGTTATCTT GrAGCCfCAC TCAAATTAAT CAAACTATTG AGCTGGTATC GGAAGTGTAG CCGCACTCAG TTTTTACAGA AGAATTAAAA GTACAACTAG CCCAAGTAAA GGTATCGTTC TGGTACAGAA AT'rGCTCAAA TTACTACGTA TCT'rCTGACT ACTGGAGAAG ATTGGAAATC AACTCCTACC AGAACAGAGC TAGAAAAAGC GACTOATCTT TGAATTTTAC TAAACAATCC TTTCAAA'rCA AGCTAATCTT AAGTTCAACA ACAAATTGGA CACGCTTACC AACTGGCAAT AAGGACAAAC ACAAGGAACT CAGGTCTTGA ATCATCTATC CAACTTATGA TAACAGTTTA CAGCCTCACA GCAACAACTA ATCAAGCCAC ACAGCGTTTG ATCTGA.ACAG CGAATTTGAA TATTCCCTGT CATCACAGAT ATCTACCTCT ACTAGATAAA ACGGCACCAC CATCATCGGC AAGGAAATCT CTTTAAATTA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1150 ACCGCTCTTG CAAAAC1'ATC TAACGAGGAT AGTAAACTCA TCCAATATGG CI'TACAAGGT CGCGTCACTA CTGTAACTAC AAAGAAAACA TATTTGATT ATTCAAAGA TAAAATTT1'A ACACA'ITCTG ATTAAT'r1-rC AGATAACACT CTATAACTAT TT~ATTATCTT ATCAAAAAGG AGAATCA'rAA CATGGATAAG AACTCAATCA AATTACAGGT ATGCTCAT'rA CATCACATAA AGAAACAAGT ACTCTCGGTC ACGAAAGACT TGATTTTGAC CTTCATGACG GTATAGAGAC GAAGATTTCA GAAATATCGA CTGTGCTCCA TTTTTTAAAA TTCAATAGCA TTGTCACAAA CTCGACCTGA ATCTCCTCAG TAAAAATTTC CCTGCTAGAA AAACAAAACC TAAcTTCATT CGAGGATTGT GGGAAGATTT GAACTTCA'rC ATCCAATACA TTATT'TTTCA TCATTCTGTA AGGTGGTATT TAGACTGGTA CAACTCCTCT CTCCTCCCCT TGCCCTCTTC TTTGATGGAG AGGCGATTGA AACATGAGGT GGATAGACAC AATGGTTAGA GAACTTCGAC ATTAAAGACA GACTTTTGAG GGCTTTATCA TCAAGAACTA ACAACTACCG ATh'ATATAAC ATTAATAGAT ACTATAAAAA AATAAGACCG TGTATCACAG TAAGTACCTG TTAGGATGGC TT'rCCACAAT T'rAGAACTGG CTCCAAAGGA TTTTCGATGA TAAAGGTCTC TGACTAGCTT CCACACTGGC AAATCA.AGTA GACTCATCCC ATGTTCTTAT CTrCTG~CTTT CGAATATTTA CCAATCTGCC 1380 1440 i500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460
CAGGTCATAT
CTCTTTTATC
GGTATAATCA
GCGTTCCATA
CTTCAAATAG
TTAT'rGTTCT GCAATTTCTG ACTGGAATCC TTTAAGACGG AGCCATAGAC TGCTCCATAT CCTCCTCTTC AATGCCCAGA CGTAAGCTAG TCAAGAGGTT TGACGAAAGC TCCGTACTTC CTTGTAAAGC TCCTCTA'rAT GCCCACTATA TCTCTATAGC GCAGGGCCTG CTCTTGTTCC AATCTCTCAT AGAGTTTTTC GTATCCAATT TCTTGATAAC CCCCATAAAA AAGAGTAGGT AAAAGACTAG GATGAGATGG CGAACAGTCT TTGATTGAAT TTCCATGACT AGATAGTAGC CACCCATTAT ACTTTGTTCA TATTCAAAAA AAGACAGACT CCAGTTAA'rC TGAGTCAGGG ACTTrTGAAA GGCTTTATCG AGAATCTCCT AGCTAGAGAA ATGAAGAA.AT ATATACTTGC CCTTGTCCCA ATTCACCAAT ATCATTGGAA CAATAATAAG AAAGATAAGC ATTTCCTACT ATATAGCTA-A GGCCT'rAAAA ATCCTCTCGA TTCTAGTCCA TTAGTAACAA TAGCTCAGTG TAATTTATTG 'rrCTCA-AGCT
TGAAAATTAT
AAAATGGAAG
AGAGACCATA
CTATGCCGTA
TCATCACAAA
AGTAAAATCG
TATACATAAC
CACAAAATAG
AAAGAAAAGG
CAAGGGTTCC
AACAAAGGCC
TAGTCCAACC ATTT'CAAAAA CCAGTAAATG AGTAGCCATC GAGACTCCTC TATAAAAGAG AGTTTTTTAG GAAGCCCTCT ATAAAkATAAG
AACAGTATCT
ATAGGTAAAC
TCAAAAGAAA
2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3207 AAGTAAGATC AATTCCATCC ACCTTAAAGA AGATGACAAT GTGTATACAA CAATATCCAA GCAATGTTCA TAAATTCTCC ATGGCCTCAG ACACTTCCCT GACC IrATAA CGGGCGATTA GACAACTTCC ACCA'PTGGGA GAGAAGAGCA GT?1r-rCIT CTTATCCAAA TGCACCACAT TTGCAGGATT GATGAGAAAA GAGCGGT
INFORMATION FOR SEQ ID NO: 191:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10357 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:
CTGAATCAAG TGTACTGCAC CAGTTCGTGC
C
ATTGTITTCT GAGTCCGCCT CATAAGTTAA AGTAGCTTGT TCAATTTCTr GAATCATTTC GGAATGACTA TTTTCATCAT TTTTGTAGGA CTGAGCATAT TCAAATAAGT AGCCACTCTT AAGTCGCTTC TTACCCTrTA TAATTAACAA AAATTCTAAA TATCCCCAAG TCTGTCCTGC TGATGCAAAT GCAGTATCAA TATGATTAGG CATTGTCTCT CTTTTTTCTA GACGTTCATC ATCTACAATT TT'rTGTGCCT CAAGCGAATC TCCTCCAAAT ACTTTTTAAT GAATTATACC TCTTTTTCTT AAAGTPTCTGT GTCAGAGTAA CAATTAAAAA CTGAACAAAT TTATTGGGAA
ATCAGGCATA
AATCATAAAT
ATCAGAAACT
AGAATGTTGA
ATTTTTTTGT
TACTTTCCCA
TAATTGTAAT
TCGCGTCCAT
TACATAATCT
AAAGAGATCC
ATTTTCTTAA
TTTAGAAAAT
ATTCAAATCG
ACAACATCTA CAGATATAAT TTT'rCGATAT TCGAATTTTT AACTCCATCT GAATTGGAAA ITAAGATAAA GTGTATT~CAT ACCAAAGGAA ATTGGTTTGT TATTTTTCTG TATTTGTTTC TTATACTCAA ACAAATCTGC GCATAACCAT TCGACACTAT TTTTGCCCTT TCATCAA-AGT TGAT'rCAACA TAATTCTTCC AGAAATTACT ACAATAATTA TATATCTTCT ATAGTAAAAT CTTrCTGAAA ATATTTTAGG TCACCrr'rCT CCTGCAAAAG CTACTTGT'rT GATrATTTTT TCCTTGGTGG AGAGCTTTTA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 AACCGTAGTG TAATATTCCA GATTCAATTC ACTATAAAAC AAAAAGGAAA GACTTCCTTT CGTOCCTTTC CTCTTACTTG GGTAAGCTAC TGC'rTGTCTG ATAAAATCCT GAATCGGCTC CTATTTTCGA ACCGACGATA ACACCATCTG ACACCGCATT. GAAGCGTTCC AGATCGGCTT GACTAGATAC ACCAAAACCT GTCAAGACTG GGATGTCGGC CACTTGATGA AGTTGCGCCA AGTGCTPTGTC CAAATCTGCA CGGTAATTGC CTGATTTCCC TGTCACTCCA TTGATGGCAA CGGCATAGAT GAATCCCTCC GCCCCTTCAA TCAACTCTTT CTGGCGCTCA ATrCCTGTGG TCAAGCTTAC TAAAGGAATC AAGGCGATAT CTGTATTTGC CAAAAATGGT TCTACAAAGT TGGCATGTTC ATGAGGCAGG CT?'rGACAAA GTTCTCCACA GTGGAATCTC TGTrrCAATG GGGCTAAACT GCGCAAGCCA AGGGAATACC CACTTCAATT CAAGACCGTC CAAACCTTC TTrCCAGCTGC TTTAATAGCA TTCTTT'GCTG CATCTGCTTC CCTGATAGGC AGACAA'rCAT AAGGCGATAG CATGGCTAGA TGGAATCCTT CCAAGGCTTC TGGTAGTGAG AATGCTCTGG GCTTCAAGAA TTTGACCATG CC'rGGACGAC CCTNGGTCAA GCTTCAGTTC CATACATAGC GCATTCGACC CACCACCAAC TCACGGTACT GTTGTTTAGC GGAAATGGAT GAGGCCCCAA ACCCATGAAC GAAGGGCTGC GCCTCGACCT TGGCTCCCAA 1152 TCTGGGATAA TCAAGCCCTT CACAGCIGTA TCAGCCAGAT CCGTACTGAA AGAGGGGGTT GAAGTAGGTC ATGATGACCA GTT'rTCAAGG TTTCAACTAA AGCCTGCGTA GAGGTCCCGT CCTTCTTCGA TAACAGGTCC ATCTGCAACA GGGTCTGAAA GCAGAGACAC CCAAATCTTC 'rAAAAAGTGA ATTGTTrCAG TCGTGGTCAC CAGCCATGA'r ATAGGGAACA AAAATTCCT T'rTAATTTTT CTG'rTAGTG'r CI'AGGCATG AGCTTCTCCC CAAGCGGTCC T'rGACT'rGAA CCACATCCTT GTCCCCACGA AGACTTTTCY GGTCCAAGTT CTTGGCCAA TTTCACCGCA TTCCAAGGCT GGGATAATCC CTrTCCACACG AGACAAGAGT TTCGTCTGTC ACAGGGACAT AGC'TGGCACC T'rrAATATCG ACCGATACCA GGATAGTCCA AACCTGCTGA GATAGAGAAG GGCATCTTGG AGCACA'rCCA TGAGGGAACC GTGAAGGACA GGTAGCTGCG rCGTGCTCTG TATCCACACC AAGCCCTGCT TACTGACTCA 
TCTTCTACAA AGGGATGGAA GAGCCCGA'rA ACAGGCTACT AGGGCATCTG GCAGATCTCG ACCTGTCAAG CTCTCGACCG ATGACACTTT GGAAGTCACG AACGATTTCT GGCAGAACCA AGGATATAGT GGG'rATCGTC GATATTACCC ATTGACCGCA TCCT3'GAGCA CGCGCGAACC ATCTGTTACA AAGCTCCATG CGGAAGACAT TGAGGGCTTG GCGTTTGACA
a a a 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 TCTTCCTCAC CCATGTAGAT GGTACATTCC ATCTTAAAGA GGGCTGCAGC AGTTrGCAGTT GCCACACCGT GCTGACCAGC ACCCGTTTCT GCGA'rAATTT TCTTTTTACC CATGCGTTTG GCAAGCCAAA CTTGTCCTAA TCCCCGr'rGA GATAAATCTT AGAGGAGTTT CACGTCCTAC 6GGTcTGCCT GACTTTCACG TCTGGGACAA AACGTCCGCC GCCATGC'rTT ACCC'rCTCTA CTCCACTCCG CTCGATACAT ATTATCTrTCA TTAAGGCCAC GGCATTGT1TA ATCTTGTGGG CTCCTGTATG GTTAAGGTCT GGCTCCGCCA ATATGCTGGG TCAAGTTTTT TGCGTAATAA GTACTGGCGC AAAAGCTGGr TTAATTCCTC TTGGAAACTT GTAGGCCTI'C TCCAACTCCA AAACTGC'rGT CATCAATGTT GAATTTTCCG 'rAAAATCCAT C'rTTATT'rGG TTCCTCATAT TAAATCTTCr AATCTTTTCA TGATCTTTTT GTCCATCTGT CTACTGCATA GGGAGTAAAG TGTTGAATTG CTTTTACTAC CTGCGATAAA GAAGGGCTGT GCTAGTCCAG TCGTATCCAG 1153 TTGACCCCAA TCAAAGGGCT GGCCACTTCC TGCCACACGGG TGCCTGAGAA TTGGGGACAT GCCCATTTCC ATCTACCTGC AGGCAAATTC 'rCAAATAAAT CATCTGCCAC CTGACCGTGA AACTTNGTCA ATCGCTTCCA GCAGTTCTAC CCGACTTGGT TT'rCACATCT GCAGGAATAA GCTTTGCCAA CTCAGCTGCC TTI'ACTAGGT GCAAAGACAA AACCGATATA GTCGGCTCCT CGCTTCTTTG GTCGATAGTC CACAAATTTT AACCTTTGTC TCTGCCCCAC ATTTTCTGCC 'rGCATAAGAG CTGTCCCTAC GCATCAAAGA GTAGATAATC AC-AGCCTGAA TACTrGGCACA ACTTGAACCA AGTCCAAGCC GAAACAAATA CTCCAACCTT TCTTCTAAAG TCACCTGTCT GCTGAAACGG CTGT'rTCCAC AATCTGCAAC TCCTTGAT'rC CAAAAT'rCCG T'rAAAGTATG TTCAGAAATG TAATAGCGAC C'rCGACCTCA AAGGTAGTCA GTGGGCTACC TCTAGTTCAG TGTCGCGTAG TCATACAGTT GATAACTGTC GCACCTGCAT TTTGTT1GAGC GTCGGAATCT TTTA-AAGAAA ACCTCATCTG CTGGGCCTGT TGCACAA'rAT
GGGCTrAGTCG TTCCGCATCC CTTCCTCAAA GTAAGGGGCT AG'rTGCGGTT GTrTGACCCCG CTAGATTGTG AGTC'rCCACT CCTTGAGGCG T'rCTTCGGAC TGCGAGCGCG GATGATTTGC CTACCTGACT GGAAATTrCC
TCAACACCGA-AATCATCACT
CCACA'rCGAG ATTGATATCT GCAAGCGGTC CTGATGATTC GGATT'rGCTC CAGCTTCATC TGCCCTGTGA AAATGGCAGA AAATCTACAC TGGTCTGCAA ATAATCTCAG CACCAAGTCT
AAGACTTCCA
AAGGCTGCCA
TTTrCA'rCGA
CGTAGATAAT
GCTCCGTITT
CCCAAACTAG
'rTCAAAAATT
TGCTCCACCT
GACCAAGCTC
CAATGAGCAA
TGATAAAGTC
CCAAATGCCC
CTTCATAAGT
GGCTAGCTTT
CTGCCAAGCG
CACGCGCCTT
3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800
CTTGACCTCA
ATAGGTCTGG
CTGCTCTAAG
AGTT'TTCAA
ATGCTATCAA
AAGAATGGAC
GCGATTACCT
CGCAGAGGCT
ATTCCGCTA
GGGCCTTGCC
TCTTACCATT1
TTGCTTCGTT
AAAATTCCTG ACTCATTTTT GGTACTCCTG TAACAGTCTG TCTAGCAATC ACTrGACGGG CCAAGGCAAC CCCTTCCTTG AGCATAGAAA CCAAGACCAG CATTCAAGAC TGTCGTTTCC TTTCAGAACG CTAAGCAAAA TT'rCTGCATT TTCCrGAGCA CATAGCATAG CC~CCATT1C CCAAATCCrC TGGAGTAAAG ATTTTCAAGA AGTGCAATCT TGGT'rGTTCC GTT-CAAGCCA TCCAGCAACC ACGATGGCAC GT'rTGCGACC CATATTr TrC TAGGAGT'rCT GGACGACTAA TTCCAAGAAG CTGTGTTTCT TGGACCAGTC AAGTTCATAA TCGTTGGAAT TCCCAATTCC TrTTATAGCT GGGTGCATAT TTTTAGCGAA GAGAAAGACG TTCCCACCAC GAATATCT'rc CTTGACAAGC 'rGATTTCGCC GCTTCATCCA ACCCTTCTGG AAAACCTGAG CTGTACTTTC AAAGCCATTG GATGAATCAG AAACGAGCTG GCATGATGTA ATTCCAGTTT TATCAAAGAC 1154 CTTACCTAGT TCAGCTGGTT TGAGGTCAAG ATTGATTCC AAGGCTTCGA GGACATCTGC GGAACCAGAT TrrAGAAGATA TCGAGCGGTT ACCGTCr'rc GCCATGTGAA TACCGCCACC AGCCAAGACA AAGGC'rGCAG V1'CTGGAAAT ATTAAAACTG AAAGACTTGT CCCCACCTGT ACC-ACAGTTG TCCATGGCA'r CATGAATCTC AGTTGGAATA TGCTGGGCAT GTCCTCTC-AT GACTTGGGCA ATGGCTGTGC GTTCTTCAGG TGTI"TCCCCC TTCATCr-rAA GAGCTAAGAG GAGAGAAGCA ATCTGCGCTT CAGTTACACG CCCAGTTACG ATACGCTCAA TGACATCCGT CATTT'CCACA CCTGATAAAT TTTCAAATTT TGCTAGTTr'r TCAATAATCT CTTTCATCCT AGTTrCCTCA CT'rrACAACC TCCTCGATAA AATTCCGAAT AGAAGACAAG CCGTCTGGCG TTCCAA'GCT CTCTGGATGG TACTGGAAGC CATAAATCGG TAGG=r~TA TGTTGAATCC CCATCATGGC TTGGTCATCA GTCGAACGAG CAATCAAAAT ACTGTGATAA CGCATGACCG CAGATGGCGC TTCAAAGTTG ATATTGCTCT CTGTCACT'TC AAAGTCTTCT GGCATTTCCT CTAGCTTACC ACCAAAGACT GCTTCT'rGCC TGCAAAATCA CAGGACCAGG AGAAAAGACC CATCATTTCT CAGAACCTGA TAAAAGAATC ATAGTTGTCA AGATTITCT TTGTTAATGG CCCTGCCCCA GCCTGCACAT GCCCAAATCC ATATCACCCG TrTTTTCCGTT TCCAGTTCAT GGTTCCAGCA GGAAGCGTTG
TCTGCAATGG
CGAATCATGT
AGACCATCTG
ACTTCTGCAA
ATCAATAAAA
TTTCTTGGTA
AGGCTCTTTG
TCGCAGACAA
AGATACGTCT
CTTTCAAGGC
CACGGCCATC CTCAATACCT GTTTCCCATG CATGACTTTTr CTTGGTGGCC CAAACAAATC CTTCCATCTT TCCAGCATCA CTTTTTCAGC TTCTTCATAC AATTCCCAAT GTATTGGGCC TCATGGTCTT AGTTCTCCAA TTCGT'N'TGG GCGATAGAGT ATTTTTGAGA ATCATGGTTC
TGATACAAAA
GGAGCCAAAC
CCAAGAATCG
ACTGGCCAAC
AGCTTGGAAT
AAGTTATAGG
TTCTAGTCAT
CGTAGACAAT
GGATGCCA'
4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 CCCCTTGACT ACGCTGGTCA AATGCATGAC AGTGACTTGG ACACTGGTCG TTTCAGAGAT CAACATTCGA 'rCTTCTGCTG TTTCCTTCTC G'rCTTCrTCA TCCGTAGCCC CTCTTGGTCG GCCA'TTTTG ACAGAAACCA AACTTTCTGG ATCATAGAAA TAAAGGTAAT TAGAAGGAT1' GTAGCCGATT GCCCCAGCGT ATACTCCCCG CATCGCTCGA ATCTrTGGTG CTCCAGAAAC ATCCATGGCA GTGAG'PTC GAAGCAAACG GTAGCCGAAG AGCTCCACTT CCATATACTT GCGGCCAATA TCGT'rACGCC CCAAGTCTAC ATCAGAGAGG AGGTCAGTCG CCAAGGCCT'r CGTCCCTGCA ATCGGATTGG TTGTCACGAT ACTAGCTCCG ATGATTTGA'r AATCCCCAAA AGTCACGCGG AGAT'rTCTGT AGAAGTCAAA TGGATTTCCA GTAAC'rTCTG CTGAAAAACG CTGGC'TGAGT ACACAT'rGGA ACATATCTCC GTTACGAATC AAGTCACGAG CTGTTTCTAC CATTCCCTCA AACTTATGTG GAGCGATATG 1155 CGGTTTGAAG TCTAACGGAG ATAGATCCAA ATCTTCAAAT 'rCATTGGAG CAGGAATGCG -TAATTCCTC-A AGCACTTGCT TCAAGGATT TTCCAAGGCC TCTTACTGC GCTCACTATA AAGTGCATCC 'rCTA'rGACAT GTATCTTCTC ATAGACAAAG AAATGCATGT CTGGCGTCCC TTCATAAAGC GAAATCA'rAT CGTAACCCAC CTCTGAGTGG TGCTGACTCT TATGAATCAC CTTCTTGTGG TCAAAGACCA TATAGCTCTC AATCACTTGA CCATTTTGAT ATAGGCTAGG ATAGAAAALAC CT'rATGT-rGC CCCTTTAAC GATTCGTTCC AT1'GTAATT AAATCCCCAC GCAAAATAAC GATTTCTCCT ATCTCTCATT' GGCACTCCCT TGAGAATGAT
AGAGAACCCC
GAGCTGTTTC
GCATATAAGC
CCC'rrTCAGT T'rGCGTGAGG CCTG'rCTCAG GrTTrCTTCT
AATTGTATCC
AAAACCAATG
TI'CATAAAGG
ATTTTCAAAC
CTTGTC'rCTC
CAAGATTGGT
TCTACTTCTA
ACGAAAT'rCG
ATATCTCCTG
CTCGTTTCAG
CTCTGTGAGA
TCAGGGATrr GACCAATTC GCTCCACCAC CAAAAGGTAG AAATCCAAGG GATCCCGATC TTAATCrCAA AAACTGGATT GGAATACTCT CTAAAATAAC GATAAGACAT CTCCATGAAT GTCCGTGGTG ACTGTATGAA CGGTGCCACC 'rCAAT'rATAG TAACAGGC'rG TGCGATAAAG ATGAACCCAA CTT'rACAGCT GAAAGAACTG TAATTTTTCC TAGGTCCTTG CCTCCACGAC TTCTCTGCTT GTTTTCAGCA ACCACAAGCT ATCTAT'1ATT TTTTAGCTTC TAGTAGTCTG CAATCGCAGC 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340
CAGAGACATT
TCAA.ACCGCG
GCAACCTCAA
TGTTTTGAGC
GATGAAGAGA TGTTCATCTC GGTACACCTT TATACTCTTC GAP.AATCTCT TCAACGTCGC CTTGCCGTAG GTATGGTTAC TGACTTCGTC AGTTCTATCT AACAGTGTTT TGAGCTGACT TCGTCAGTTC TATCCACAAC CTCAAAACAG TGACTTCGTC AGTTCTATCC ACAACCTCAA AACAGTGTTT TGAGCTGACT TCGTCAGTTC TATCCACAAC CTCAAAACAG GTT'rGCTCTT TGATT'rTCAT TGAGTATTAC
GTTTGCTTTT
AAGGCCCTAA
CTTCCGTGTG
ACATCTCTGC
GTTTCAAATT
AGTAGTAGGC ATGGAGCTGT TAAAAGATAA ACCAAACGAC ACGGCGTTGG TAACGCGGTG CTTGTTTAAC AACAAGCTC CATTTTCAAA ATCAGCCCAC TGT'rrTGAGC AGCC'rGCGGC TAGTTTCCTA TAGCTTrTTTT CGTATTAGTC CAGCCTT'TTT AGATAGAACT CAAGTTCATC AAAGCGACTT GGATAGAAAA AAGCCCACAC ACAGAATATA CCACC'rCAAT TATAAACGGGA CTATCCCTT'r ACTGTAAGGT GTGCGCACCG AA'IrTTCATT
TTTCACTACT
TTACTTTTCT
TCCAACCACC
GATTTGTTGA
CACCACAGGC TCCCTGAAGA ATACTT'rGTT TTTTCTTTGT ATGACCATGT CTTCCTTAGA
TCAAAAATAG
TA'rTCACAAT
ACTTATTTTA
TCATTCGAAT
ATTGGATCCA
CAAGACTTTT TTACGATTTT TTTGAA.AATA TCGAACATGA ACATGTCCCA CTTCTTAGAA 1156 ACTCAATAGA AACTGAATGG AGGCTAAACA GAACTTATT'r TAGAACACTC CATC'rTTTCC ACTAGGATTT 'rCAAGAATTA AACAATACTA GAAACTCTGT C'rCCTAACAA ATr'rAGGAGA AACTTCAACA GATGTGACAC TTTCCCCT'rT AATAA'rTGCT AAAACACCTT CTATCATTC TTTAGCCAAT TTAACATAAT 'rGGGAGCAA'r TGTAGACAAA GCTGGAGTAT AATACTGAGA AATAGGAATA TTATCAAATC CAATGATAGA AA'rATCATCT GGAATAAGAA TTCCTTTCTC ATAGCACGCA CGAATCAAGC CCTGAACCTT TTCATCTCCT GAAACAAAI.J. TAATGTCCGG ATAATTTTGG GTAGTCAAGT GCTGCATTGC ATAAGAATAA ACTGAATCAA TTGTAGATAA GCCATAAATG ACTTTTAAAT CCATAAAGTA ATTrrTATCA TTCAGAAAAG AACGCACACC TCTTTCACGA TCCTTATTAA CATGGGATTC TCCTCCCATA AGCAACCACA TATTTTTAAA TTTTTCTTCA GTTACAGCTT TCATCATATC ATA.AGTAGCT TGAAAATTAT TATTAGATAC ATAGACTACT CCAGACGTTT GAGATTCACC GAAAACAAGA AAAGGCATAT GGTTCTTCTT TAAATACTGA ATTCTGATAT CATCTACACT TTCATAAAAA ACA6ATAACAC CATCTACTAG GCTACCTGTG CTTGATATAA TTGAAaTACT AATTGTATCC TCCTCTCCAA AGTACTCAAC TATAGCATTA ACACCAAATT CTTTACACGT CCAAATTAAA GGAAAAGAGT CGATTTTTTT TTTAGTTAAA TTTTTTGCAT ACGCATTTGG AATCGCTTGA TAAGTTTCTT CTTTAACATT TTTGGAGAA AAACCTGATA AACGTGCAAT
CCGTAACACT
TACAGAA.ATC
AATATACGAC
TACTCCACCA
ATCATAAATA
TT~CTAATATT
AGATCCACAA
TTGCATTTCA
TTATCTAACA GCGTATGAAA AATATATTTA TAGCTTCTTT AATTCCTCTA TAACTTTTTG TTAATAACTC GTGAAACTGT GTTACCTTTT TCCCATTTAT ACTATACAAT ATTTAATTTT ATTCACAAAA TTAATTTTAC CACCTTTATT TTTAAGAATG 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140 ATT'TTTCATT TCAGTCCTCC TTT'rAACAAG AGAATTTAGT AAATATTCTT CCCCTTCAAA TTTCCAACTT CACGACAAAT ACTTTTTCAG TAAAATGTGT TTTAACAT;AG AGTTAATTTT AAATGAAAGG CAAACACCAT
ATTACGAACA
AAATTATTTA
AAAGTTTAAA
AAATTCATAT
AGGATTTATA
TTCTrTAAAG
TTCACAAATA
GAGAAAAAAC TGCCATAAAA AAAACATATA ATAGCCTGTC ATAACATTTG TTATCAACTC TCATAAAAAG A.AATAAATT TTACCTAA.AT CAAAATTGAT ATCAA-ACAAT
TTCATAAAAA
GCAAAAGGAC
AAATAACACC
ATAATTCCAG
TATTATCAAA ATATTCTATT CCTCTGAGCA AAAATCTACT AAAAACATCT GTCCCTTTGT TACAATATTA GCAATTTCTT AGTAATAAAT GACGCTATT'r
TTGTAGATTA
AATGTAACAT
ATCAGGAGGT
GTATACTTGT
TTTATAATCT
CATCTAAAAA
TCTTCGACAT
AATCTACCTG
TATTCAATTC
CAAAAATTAG ATGATTAAAA TTACTAAATT TCAGCTAATT CCATCAGTCG AAGATGTTCA TTTTGTCCGG AACATCAAAG TGTCAGAATT AACATCTCCA AACGCTGTTC TTGAATCGGT CATTCTGATA CCATTTTCTG CACAATAAAC CAATACACGA TTATAGGCTT CTGTAGATTT AACCACTATA TACAATTCAA TCAT*TTTAGA ACGATTTTGC AGATATTTTT TrAGTGGTrrG GAACATGGAT ATCACACCCC AAACAGAAAT GGCTACTAA.A. AGAGCTCCCT CATAAGG
10200 10260 10320 10357
INFORMATION FOR SEQ ID NO: 192:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6867 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192:
CGGGACATTC TCAATCTTCT GTCTTTTGTT TTTCTCTTCT TTCTATGATA CAATGGAAAA AATAAATTCA AAAGGAGTTT TTTTATCACT TATCCAAATC TATGTTAAGG TCAACACGCG CTCTGATGAA CACTCTACTA CAGGTTGACT TCGCAACAAA TGTCCTAATT GT'PTACTATC TACCGAATGG ACACGTA.AGA TTGGTTTTAT AATCCACAGG TAATTGAAAA AAACTCGATC CAGCTGACTP ACAGATGGAA CAACCTTGCT GCCATTGAA'r ATCTAACTGC GGTCCAGATG AAGAAATCGG GATTTTGCCT ACACTGTTGA GCCGCTGGTG CTGAATTGCA CAGATGGTCA ATGCCCTTCA CGACCTGAGT TAACTGAAGG GTTGAGGAGG CGCGTGCAAG CGTAA.AGCAT CCATGCAATC GTCACTCTCA ACTTGACAGA
TTTTGCTATT
ATCGCACATG
CTACGATGGT
CAAGAGTCTT
AGGTGCTGAT
TCATCCTGAA
TGTTGGTGCC
TGGTGGTCCA
TTTCCAAGGT
CCTGAAATGA
GGAACCTTGC
GATACTGCTG
GGTGTGATTG
GAAAAATATC
GACAAGTCAG
ATTAAGCACT
AATAAATTTG
CTAGGTGAAC
CGTAATGTCC
TCTTGGACCG CTTCTTAACC CTACTCCAAG TACACAGAGT AACGTGTTGG ACTGCAAAAT CAGCCAACGA TCCGTCTTTA ATTT'FAATGC TGAAGGAGTC AACTAGGGAA TTCTGGTTTC CAGGACAAAC GCTCATCACA GAATTGCTGA AATTATGACA GTGAGATTCG TGTTGGTTTT ATGCAGAAGA TTTTGATGTG TTCAGTACGA GACTTTCTCA ACCCTGGTAC TGCCAAAGGG ATCAACTTCC AGAAAATGAC TAATGGATGT GACAGGTAGT AAAAAGATGC CTTTGAAGCG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GCTAGCAATT GATTTTCATA TTACCAAGGT TTTTACCATC CTACATCATT CGTGATr-rTG TATCGCTGAT AAGATGAATG CCAGTACTAC AATATGAAAG
AAGAACTTGG
AAGTCATTGA
GTATCACGCC
GAGCGACCGT
AAAAGATATG
TATTATCGAA
ACTCCAATTA CCATTGCTAA AGCCGTTATG GAAGATCTAG CCAATCCGGG GTGGAACAGA CGGCTCTAAG
ATCTTTGCAG
GAACGTGCAG
GCTCAGCTAC
GAAGATTCTG
CTTGGTTTGT
TTCTCAGCAC
CTTGCGCTAG
AGTAAAGCTG
GTGGCGAAA.A
TTGATACCAT
TTCGCCTr'rC
TTGTTTCATT
TT-TCGTCGCT
TGGTGTTATC
CTTTTGACTG
TTTTGTCT'TT
TATGCACGGA
CATTGGCATC
TTTTATTCT
ATTTCCTTTA
CCTTTTGAAT
GTAGCTTATA
ACTGG'TTTT
TTCTGAAGTT' GATTCAGCAG.
AGCAGTTTCA ATGTTAGATT ACCATTTGC'r TCAGCATTTC GAT'rTGATGA TTCAAAACTA ACTCT'rAGCA GAAAGT'rGAT ATTrATTACTT TCTAAATAAG TACAAGTGTC TGAGGATCTT TGGGAATCCC AACTCCGAAT ACGTTAGCC'r TCAGACTATG AAGGCTAAAA AGACCAGGTA CTTGATTTCC AG'rAGTTGTA GI'AGAATC TCTTGTATTG CTGCAGTTGC GTTTGGTTGG TTrGCTGGACT rGTTTCTTCA GAATAGCTr'r TGTCGArrCA CTAATAATGC ATCCACCTTA AGTGAAGCGA CATGAGAATA GCTCAGCA'rT TTCCTTTTCT CCTGACTGTT TACTTCATCC CTACTTCTTC TGCCAAGGAT TCAAAGTCCG CATCAGATCC TCGTAGAGTT TTT'GATAGAG TGTTGAAGGG CGCTAGCGAT ACGCGTCAAG ACATCTT'rTA AAGTCTGCAT CAGCCTTGTT TGTGGCAGCT TTTAGATTrT TGTCTGATTC CTTCT'rCATG GAT1"rGTTCC AAGAGTTGAT 'rTGCCTTGCT CAAAAGACTT TCTACTTCTT CCTTGCTATC TGTCGCAGAT TATTGGTTGC TATCTACCAT GTACTCCTAA AACAGGAGAG T'rATAATCCA AGATTACAAG GCCTTACAGA AATAAGAAAT CCAGATAAGA CAATdGTCGT CCAAGACGCT ATTCGCTTCG CACAGCAGCA CGGATTCAAT ATGCTTTAAT TTTAAAGTTT AGGTGTCAAG ACCTCTTr'rT AGTGTGCCCA AAATTTAGAG AAGTAATCAA TCAACTAACT TTTATTT'TTT TCAAACTTTC AGTAAACTGA CCTAAAGCTA ACTCAATCTG TCTTTGTAGA TGCTTCTGCT ATCAGCTAGA AGTTGATCTA CTTT'rGCCAA GACTGCCTTC TCATCAAAAG TTCCAGG'rTG ATAGTTGGAT TGCAGGGATG GAATCTTGTT TTTCAAACCC GCTTCATATC CCTTAGTTTG AACCTTGATG TAGTGATTGT GGTCGCCATG AGGAATCACA AAACCTTC'rG AA'CTTCACT TATAATTCGA TTGGCATCAA AACCATGACC ATCTTCTTCC TCATGATdGA CATGTAGTGA CGGATTACTT AATACAGAAC TAGAAGAACT TCCTACCTCT TCCGTGTTAG AGTGTGATGG GGGATTGTTA AGAGATGACT TAGGAATATA GTGATAGTGA 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 TrCCCCA'rGTC TTACTATATA AGCATCACCT GTATCTCTGA CAATATCATT ACATATGTGG CTGCTAATTC ACCTGCCGAC AAGTCACTCT CAGGAATGAA CCACCATGTG GTACTATAGT AGATTGAAAT AGAATATGAG CAAAT'rGATA AAAGTAATTT CTAACAATGA TTTAGAAACT ATGATGTGCT ATTCTAAATT TATATAACCA TCATCGGTAG TATAACGTCC CTGTAATTTr GCTACAGATA AGGG'rTAAAG
ATCATAGTGA
AGGGGATT
CAACTCACTA
CTTCTGCACT
1159 AGCTCCTrTA TCGTCTTTAC CATGTTCTTG TNrrGGCGA TTGATTTCAT C7=MTTTCG TACA'NTrTCT GCATGAGCTT GATCT'rrAAG GTAAACATAA TACTTCCAT CTACCTT1AAT AATATATCCT CCCTTAACCT AACTGACGAT ATCTrGATC'r TTCGGCTGAT AGTTGGGGGC TTCATTAAT AGCTCTTCAC TAAAGAGCGC ATCAAAAGGA ACTTTACCAT TATAGTAGTG ATAATGATCG CCA'rGAGAAG TTACATAACC TTGATCTG'rA ATCI'AATAA CAATTTGTrT TGCTTGAATT CCTTCTrTTT GACTAACCTA GTCTGGAGTC AAATTTTCAG TCTTCTTAGT GTCTTTATTA CTGTTTACAT ATGAAACACG AT'r'I'TATCT GTA'rTGGCCT GTTAGCTA'rG T'rGGTTCAGA GCATAAACAC ACAGACTTAA GGAAAGGATA ACAACAGATC CAGCTGCTAT ATATTTCTTT 'rTAAATTTCA TAATTACCTC ATTTCTATAA TTATTTATAT GATGTCTTCA T'rATTAAATG ATTAAATAAA TTAATTAACC AATTAATTAA CTAGTAAATA TTCCACCTCT TTTTAAGTTG TATGTCAAGA AA'N'TTATAT ATTAATAATA 0**q CAGAGTTTTA TT'rCrAACTT TATTCCTTAC GCTATGAGAT TAATTCAATC GTTCCATCCA TGTAAAITrTT TCTAATTTTT ATCTAACATA GGGTCACTCC GTTTTCTCT TTTACAGGT'r TTCTGTTGGT TGGTTCTCAA TTCACCATTT CCTTGAGGTG TCCCGATGGT AAATATAATT TTGAGAGAAC TTCATTTTTG
CAGATAAATT
TATTGAATAT
CTTGTACAGG
CCACATTCCC
TTTICGTTTGG
CTGTTCCAGT
CTTCTCCTGT
CAATTGTTCC
CTTTTTATC
AACACTATCT
ATCTACTGCT
AAATGAAATT CTCCCAAAGT ATTCAGACTT TTTCTACTGC ACTTCTCCAC TTGGCAATCT A.AGCCTAATC CGTAACTAGC GGAGCTTCCT CTAA'rGCTGG 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 TTCTGGATTC AACATTCCAT TATCCGTTGA TGCCTCTGCT AAAGAATCTG CTGGTTTATT AGATACTTTT CCATTT1TCAG ATGGTTTATT AAAATCTGCC ATATTCTTTT TAATGACTTC GTCCATATTA AACAAGACAT TTTCTAGCTT CATCCCATAA CTTTCAGCAA ATTTTGCTAC TTTTTCTTGI' ACAGGATCCA CTGTAGGAAC TTCTTCTAAC GTTGAATTAC TAGTACTATT CCCAGTTTCA GAAAGTTTTT CTTTTTCTAC CTTCTCACTA GTCTN'GGTT CTTCTACCTr TTCATCAAGT TTAAGTTTT CTTGTGCTTT ATTCCTT'rTA AATTGTGGTA GAATACTTGG TTTATCAGT'r TGATTTTCTT TTTCCAAGAT AGGTACTTCC ACAATATAAG TCGATTGATT GTCCAAATAA GCAT'rTGCCA TGAAGGTTAC AGGAATTTTA TTTCCGGCCG TTCTGGTTGT TCC -rGG?1' AATTTCGGAA TCGGTAATTT GATTTCACCA AC'rTTATAGT TA'N'rTCTAA ATAAGCATTT CCATGAAATT CATCAAACAC TCTGACTAAA GCATCAGTTC CT'rTAGGCAC TCCAAATTGA GGGTTCACTC TTAAATAAGT ATCCCCTGCA TGGAAAGGAT AGAAAATCGT TTGACTGGCC ATTTTGTAAG CTrAAAGAGGT 4 4 4 .4 0.
4 4 1160 TGGAACTCTA AA'rGTACCAT CATAACTTAC TTCTGGATAA TC~w'"GAAG CGATAGTATA CrAAATGTT TGTCCTGGTA AATAAGGT'rG ATCTA6A'TCA AAGTrGCAA TATTCCCTAC TCCTTCTCCA AATACTTTAC CAGATACTTT CTCCAATACI' TTCCATCTG GTGTTATTAA TTTTACTAGC ATATTGATAC CTAATrr CTCCAATTCA GGCGGAAAAC TAAAAGAAAC GCGTTTTTGA CC-ATTGGCTA GAGTAAAGTT TTGATTATTA AACGTACTAT TTTTTAACAA AT'rAACAACA TTCGTTAATT CTTCTCCAGT ATAAACTTTA 1-rCCCTrCTT TTrTAGCAAC TCCTTCTTCG GGTTTAAACA GTTCATAGTT ACTGTGAGAA TGACCAATTC CAACCGGTTT ATGTTCATCA ATCGGATCTG CATGATGGTG ATCTCCATGC GGATAAATAA TCGCATTTTT* TTCTTTATTC ACGACAATAC TTTCACGTTT GACACCATAT TGTTTCATAA TGCCAGCAAT TT'rTTCTTCG ATTTTTTTAT CTAAATCTTT CATTTCTTG GCATTACTrG GATAATCCTG TTCATGAGAT GACAAAGAAT CTAATCCATT ATGAC'TAGTT TTAACT'rCCT CTAAATGTTT TTGCGCAsCT TAATTTGCTC TTCTGTCAAG TCCTTCTTGA AGAAATAATG ATTGTGGTCT CCGTGACTCA TGACAAAACC TGATTCATCT TCAGCGATAA TACGATTAGC ATCAAATCCG TATCCATCTT CTrCATGTr'r CTCATGTGA6A GTTCCTGGAT TGATTGGAAG AGATGGAGAA GGTGT'rGCTA GACTATTGTT TGGAAGAGTC GGTTGCCCAA TTTGATTrTGA TTrTGGAATG TAATGGAAAT GATCACCATG TCTrACAATA TAAGCTGTAG CCGTTTCTTC AACGATATCT TTTGGATTrAA AAATATAACC ATCAGATGCT GAAGAGAGCT CCTrTACTTGT CGTTAAAGAA GAAGGATTGC TTGAAAGACT GCCTAGACTA GACAC'rACTT CAT'rAGGTTT TGCATTTGTA GAAACTGTAG AACCAGTTCC ACTGATAGGC ACCATTCTGG CAATCTTTTC TTCTAAGGCA GAAAGC'rTGC TGTAAGGAAT AAAG'rGGTAA TGGTCGCCAT GCGGAATCGC AACTCCATTT GGTGTACGAC TGATAATCTT AGCAGGGTCA AAGACCAGGC CATCTGATTC ACTGTAACGT TGGGCGCTAG GTGAATCATA GAGTTCCTTC AAAAGACTCT GGAGATTTTC AGATTTATTT GCTGGCTTGC TAGTTGATCC TTTTGCTACA GATTGCGTGT TATTGTCACT AGCTGTTGAA GAATAGCT'rA ACTGACTCGG. 
TTGCATATTT TTTCCAGCCA GATGTGC Tr AGCTGCTGCT AATTCACTAG CAGATAAATC GCTTTTGGGA ArGTAGTGAT AGTGACCTCC ATGAGGAACG ATATAAGCAT TACCCGTArC TTCGA'rAATA TCAGCTGGAT TAAAGACATA ACCATCATTT 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 GTCGTATATC GTCCCTGAGA TTGACATGTT CTTGTTTTTG GCTGCATCT'r TCAGGTAGAC ACTTCATTGA CAATATCAGC CCTGCTACA GCAACATTAG AGTTAACCTr ACGATTGATT TCA'rCTTTAG TTCGAACATT ATAATATTTT CCATCGACCT TGATGATATA GTCTTTAkAGT TGATAGTTTG GATCCTTCAT
CTCATTATCT
ATCAGCATGA
ACCACCCTTG
CAAGAGTTCT
1161 TCACTAAAGA GGGCATCATA AGGAACTTC GACGTTACAT AGCCCTGATC TGTAATTTTG TTCTGGCTA.A CCTGGTCTGG ACATAAGAGA CACGATTATT
TGTCAAGTTT
GTCCT'rAT'r
CCA'
TCAC
TCCI
LATAGT AATGATAGTG GTCACCGTGT kCAATTT GCTCAGCCTG AATTCCTT~CT :TTTTCT GACTTGACTG GCTGCCATCC CGCGAAC GATGCTGGTT TAGTGCATAG :CAGCTG CTATATAT'rT T'N'ACTAAAT ~rrTTC TCAACTTCTT TrACTTTATT- 6540 6600 6660 6720 6780 6840 6867 GCACATAGAC TCAAGGATAC GATAACAGCT CATC TTCATAAATC CCTCAT'ITCA ATAAATGATG AG AAATAGTTTT CTAAACCCGG GGGTACC
INFORMATION FOR SEQ ID NO: 193:
SEQUENCE CHARACTERISTICS:
LENGTH: 999 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193:
CGTTCTAAAA ATGCAGTACG TTTGATTGAG AAATCAGTTA AAGGTATGCT TCCACACAAT ACACTTGGAC GCGCTCAAGG TATGAAGTTG AAAGTATTTG TTGGAGCTGA GCACACTCAC GCTGCACAAC AACCAGAAGT TC'N'GACATT TCAGGACTTA TCTAAGGAAA GGAACAATAA AGTATGTCAC AAGCACAATA TGCAGGTACT GGACGTCGTA AAAACGCTGT TGCACGCGTT CGCCTTGTTC CAGGAACTGG TAAAATCACT GTTAACAAAA AAGATGTTGA AGAGTACATC CCACACGCTG ACCTTCOTCT TGTCATCAAC TCATACGACG TTTTCGTTAA CGTTATAGGT CGTCACGGTA TCGCTCG'rGC CCTTCTTCAA CGCGCAGGAC TTCTTACACG TGACTCACGT AAAGCTCGTA AAGCATCACA CAGAGCACCT T'rCGGGGTGT CAGTTGTTGC GACTTTAGTC CAAAACGTTA ATCAATACGA CGTGCTTTTT TCTATACrT
ATTTAGTAAA
TCTTTTTTTA
GCTTACAAAT
TTATATCAAC
CTAAAAAAGT
CAACCATTCG
GGTGGATACG
GTAGACCCAG
AAAGTTGAAC
CGTTAATTCG
TACTTTCTTA
GTGGCTGCAA
GTTTCAAAGC
TTACCCTAAA
GTCCAGCAGA
CGCTATGAGT
CAGTTACTTC AACTGTAGGT CTGGTCAATC AGGAGCTATC ACTTCCGCGA TTCATTGAAA GTAAGAAACC AGGTCTTAAG AAAGAATTAC TATACTTATA CTAAATTGGT GCAATTGACA CCTGACATGG TCAGTTGCCT ACTCAAGGGT TTACCCTATG ATT'rGCCCTA AAATTACCCT TAATGGAACT ATGTTTGAAG TAATGCTTCA GAAATGGCTT ACTTATTTTT AAGATGTTGG TAGGCAACTT TAT'rAACATA AGTC'ITIAGTT GTAACGGTAT CTAAGCTCAT TCCTGCTTTT TTAGCAAGTG TCGCTCCTG 999
INFORMATION FOR SEQ ID NO: 194:
SEQUENCE CHARACTERISTICS:
LENGTH: 2315 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194:
AATATTATCA CTGTTCTTGA AGGCAGAACA CAAGCTGTCA TCCGAAATCA CTTTC'!TCGC TACGATAGAG CCGTTCGTrG TCAAGTGAAA ATCATTACGA TGGATATGTT TAGTCCTTAC
TATGACTTGG CTAAACAGCT ATCCAACATC TCAGCCGTGC CGAAAATCTC ATGAATACAA CGTAAACTCA GCGATAAACG GAAATTrCTTG ACAAGATTTT CAACTCTTAC TTTTTCACTr GACAATCTGA AGCAGGTTCA AAAGAGAAAA TCGTCAACGC AATAATCTCA TCAAACTTAT AAAAAACGGA TTTTTATCGC CAAGCTTAGC TTTTCTTCAA CACTACATTT GACTGGATTC CTCTTTTCCT ATGCGGAGAG CTTAGTCCTG CGCGTCTGCC AACTTTrGGCG ATGCTTGTCA ACTCAAAAAG TTCATTTAGA TT'rTCCGTGT GCTAAAATCG TTCTAGATCG TTTCCATATT CATGAGTCGT T'1TCGTGT'rC AAATTATGAA TCAGT-rTGAA GGCTATCAAG CGTTACTGGA AACTCATCCA ACAGGATAGT TTTTTATCGC CCTACTTTTC GCATGCACTr AACAAATAPA AAGCTATTCA GAAGACTrGA AACACCACTA TCAGATCTAT TCAGAACAAA GACCCTGAGA AATTTN'TCGG ACTCATTGAG TCCTCTTTTT CAGACTGTCT TTAAAACCTT TCTAAAGAAC CCTrCAACTA CCCTATTCAA ACGCCAAATT GGAAGCGACC CAAACGCAAT GCcTTTGGTr TTCGAAACTT TGAAAACTrC TCTGAACATC AAAAAAGAAA GGACGAAATr TGTCCTTTCT CCCACTACAG TTGACAAAGA GCCTATTTTC GCTGATTCTC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 TAATT'rTTTA GAGAAATACA AGGGACTTGA ACCCTCACGA AATTCCCCA TCCCCGCGTC ACACTTTTTT TCAAATTT
TTTTCATCTA
9ACCTAGGGC AAATTGATGA TTr'rCAGCCC CGATGTCATA TCCTAGATTA TAGGCATCTA AGACACCTGT TAAGATAACG GTAGACACTC TCCCTGAAAA AGCTGAGTAA TGGCGTTTAT GCTCTTGATA AAAGATCCCC AAATCTCCAT
CTAACTTAGC
AGATGGAAGC
TAGCTGTATG
TACGCTCTCT
CCATCCAAAA
ATAAATTCCG
AAAG.AGCTAG CT'rTAGCTAG CCTAAAGCGG TCACAGGATC GATTACTTTA CTAGTATATC TCATTTTCAC CAACCAGGTT TCCGAGTGTA TTTTTGAAAT AACAGCTGGT TTAACAATCT TAGGACACAG ATATCCGTCA CAAACGAATA TCTAGGTCAG GACACGACTG TCTGAACCAT TCCACTCGTC CCAATCAGAT TATGAGGAGG AAATAACTTA CTTI'CCGGAT GGAAACAATC GTTTTCTTCA. TGAGCATCAA TAGTAAAGAA GATATAATCT CCGAAATCGC CTGAGCTGGA CTGTATAATC AATCGAAATTP AAAAATATCT GAAATCAAGA AATAATCCCC GCAGACTCAA CCTCGTTCAA AAGCTAATCG GCACCTGCTG TTAGTTTCCC AAAGCCTTTG TCTTAGTAA CCTTrAAGATA GGTrCCCTTC GTTTACGAAG AGCATTGACA
GATACGATCT
AATTGCTGAA
AGTACGACGA
AACAACGGTA
ACGCCAAATA
CAAGCCATCT
GCAATCACTG ACGCAGTCAA CTTCCCTTCA ACAGCACGGA GTTCGGAGTA AGAAAGGGTA CGAATATT'rT TCTCATCTTC TTCACGTTGG CTGGCAATCT CAACAAGAAC CAAGTCCTCA ATCAAAGAAC CAAGGCGAAT CCCCGATACA GGAAAATCAT CTC'TACTCTC AATAGGCAAAi AGTTACCTTG CTGATGGCAT ACTATCAGCA ACAAAATCTTr TCTCTrTTrCT TCACrCTrC ATTCCAAGTG AGCGACTTC ATCACAGAGC GAGTGATTCC TTTCCATTA ATTCCCCTAA TGACCGCCA TGGTGACAGC AAGTTAAGAA GCTGAATCCC TCTTCGAATT TTTTATCATT TGAATCGGTG CAATAGTCGT ATACTCATAT CATGCTCAAC AGGCAAGTTT CCcTGTrT CGTAAATCAT ATTAGCCCCT TGAACGTAGT AATC -rAG'1r TGGAAGAATT GCTtACGCGA TCTGTATTTG ?TTTTATAACG CCAAGCAGAC GTCCCTTACT ATTGATAATG CAGGCATTGC AATGAATAAT TGACGCGTAA TAGCGTTGTA AGGGAGCTCA TCTCG
INFORMATION FOR SEQ ID NO: 195:
SEQUENCE CHARACTERISTICS:
LENGTH: 6693 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
CATCTGGGAA
CATAAAATAG
ATCCGCTAAC
1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2315 120 180 240 300 360 420 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195: CGATTI'CTT C CATTTCTTCA AATAAGA.ATA CTTCATCTGA CATATGTGT'r ACCTTCTTCA TCAAAAATTA TTTTGTAATC GATTACATTG CAGATCGTAA CATAAAGAAA AACAGATGTC AAATATTAAA CGTAAAAACA TGGTCACTAA AGAACTATAA GAGAAAAGGT AAACCTAGCG ACGCGATGAA CGCTGGGTCG TTTGGTTTCG ATTGCTCTCT TCCTCTrGTT TTTTCTGTTC TTCTTCTTGT TTTTTCTCAG CTTCCTTGGC CTCTTGTTTG GCTTTTTCCT CAGCTTCCAT AATTAATTTA TCCCCACAG TGTAGCTGTA GATTCCAGCT TCCATGTCGA CCACACTCGG TTCTGACAAT TGAGGCTTAA TCTTACTGTA ATATCGCAGT TTCTTACTCA TTTCAGATAG AGGAACCAAG ACTTCGTCCG AATCATTCAT GC?1'GGGGCI' AATTCCACCT T'rTGGATAGC TTCTGAGACA AAAAC7rTGA TTTGTTCACT TGGTAAACTG T'rCAGACTCA CAGAACTAGT ATGATTTTCA CCAGAAATAT AGTAGGCCAC CTTAGTTGGA AATTGATAGA CAAGTTGAGC CTTTTCATA'r r'1TGCCTTGT CTAGCAGAAG 1164 GGTCAATCGA ATTAAATCGG ATGTCACCT'r CGCCTTGAG'r TCTGGGCTAA TTTGAGCAAG ATCATTAAAG AGAACTGATA AATAAGTTTC CTCAAGC'rGA CCACTG GAAA GAATAGGATA AA'rATCATAT TCC'rTGACCT TAATAGTGA.A TGAT'rCAACC CAATAGTTAG ACTTAATCTG GT'rAATCG'rA TAATCCGAAT CCTGAATGCC TTGCACCGTT CCCTCAACAC GAATATCTTT AGAGACAALAC AA'rAGAAGCA GACTTGGAAA ACCAGGAATC TTTGCTTTGG CTGGT=TTTC CTGTTCCTCC TTCTCTN'AG ACTCTGGT'rC TGAGGA'rGCT ACT'TTTTCTT CAGACTCTTC TTCACI'CTCC TGGTCCTGTT TATCCTCTGA TCTTTCCTTT TCCTTCTCCT CAGCTAGAGC TGAAGCCTGT CGAATATCAT CATGGTCGCA TAAGGACTGA TAAAATCGTG AAGGCTCGCA CTTTGTAGCC TTTTTAGCAA TTCTTTCTCT TCTTTCTCTT
CAGCTGTAGT
GCAAGTAGGC
AGATATGGAT
GCTTTTTATC
TGTCAGCCTC
CTTAGCTGAT TCTGAATCTT CCTGGTCTGT CTTCTCAGAT TCTTCTCCCA TTCGAGCTTG CGCCTCTTCT TCAGCCTTCT CTCTTT'CAAT TCI'CGAGGG TCCTTATGAT AAATCTT'rTTT AGAAGCTTTC ATCTTAGCTT TTCCAAACTrA TCCAAGGTCA AAAGTAAGCT GCATTTTCAA GACATGCAAT TTrGCCTATCG AACAATATCA GCCAAT'rCCA AAGATTTTGC CTCAACTCAT CTCTGTTAGT TCTTTCTTAT ACCGCCAACA AACAATACAG TTTTTAGA'rA TTCT'rCGTTT CGTTTCTGCC ATTCTGATAA TTTCTTTGTC CTCATTrTTTC TTATC'TTTTG ACATTTACTT TCAACAA7rG ATAAAAATCT GCTAGAGATT TCAATTCCTT GGTAATCTTC CTTGTGACTT AGTAAGTGAG AAAGCTTCTC AATCGCTTTC TTGAAGGTCT TCTGCATAGC CTTTCTTAAC TCTGGTCACC ACGACTAGCr TCACGACCAA GCGGCACAAT CCAAGAGCTC AAAAATCGTA TTGGCACCAC CTCGTGTCAC TICAAGGGTTG ATAGAGATCG GTCACATAGT CAACACGAAA TCAGAC'rAGA ATCTCCAGTT AGArTTGATAA 'rATTGAGCG CGTCTGTCAC CAArnGGTTA AAGACACGAG CGCCTGCAGA TTGGCAAT1TT GGGATTAAAG 'rGGGTTTGAA TATCCACCAA 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 -TTCATCTGGT TCTGGAGTGT TTTTGTCCGA AACCTTGGTC ACCGCTCCCA CATGCTCAAC C'N'AGCCAAA CTCGAAGCTT GTTfCAAAGGT TGAATACATC T'rAGTCGCAA ATTTATAGGC GATTTTATTG GCCAAGCCCA TAGACAGGTC AGATTCGTGA ATAAAGACAG GCACTCCTGA CACACGCGCA GCGATAACAG GCGGTACTGA GACAAAGCCC CCCTTTGAAA AAAGGGTCTG TCGACGCAGT CGCAACATGA TAAAGAGCGA TTGGACAATT CCCCAACCAA CTTTGAAGAC
GTCCAGCATA
GGTGACATCC
ATAGTGGACT
1165 TTTTGCCAAG AGAAATAGCG ACGCAATTT CCAG;TCGCAA AAACCTGAC'r TAAGGATC T'rGGTGTTCG ATACCACACT TCCCAACCAT CTTCGATGAA CTTGGGCATT' AACAAAAGAT
TAGAATGGAA
TGTCCCCGAT
TGAGGGTAAC
GTGTCCAACC GTCCCCCCAC CTGTAAAGAC CTACTG'rGTC GATAAAGAGG TCGCCACGTA CATTGGCAGG ACTAAGAAGA ACCACATCTC TCGCA'rCTGC AATATCTG'rC GCCTCCACAT TGACACGTTC TGCAGATTGA CCCAGGATGA CCAATTCGTC AAACTCATTG CCACGGTCCA TGTCAAATCC TGACAAGGCT TTTTGAGTAG AGAA'N'TAAC ACCC7TGATG TCATCCACAA AATTmC ATA'rTATTCT TTTAACTCCG C?1'CAAAGTT AGCATACATA TCCCAGCTAG CTTGAGTCGC AAGCTCATAG GCCTTGCGGG AAGCGACACC AGCCTTGTCT GCTGCCCGTT CCA'rCTTCTT GAGTCCAGTA ATGTCTGGCA AACCACCTGC AATCAAGACG ACCTTGCTGT CCAAGATATT AGTTGATTTA CTGTCGTTAT ACTGGAGACG GTGTTTGACA CCACCGA-AGG CCACATCACG AAGCTTGGCT ACAGCAATAG *t S
CTPGAAAGAGT TTCCTrGATG TCGCAAGGGC ATTTTCCACA CTACTTCACC ACGGAAGTAG GTGTTGAAAA TGGTACAACA GATTAAAGTT CAAGACAAGG CTGCTACATA TTCCGAAAAT TAACCGCAAT CTCTGGATGG TAACAAGCGT GTCCTTATCT TCCCTGATAA AAGACCATGT TGGTTG'rCTT ACCGTTCGAT TTGTGGCTAC CTGGAACACC AGTTGACCAT CTTCCAGATA GTGGCTTCTG TCTTGGAAGT AAATCAGCTG CTGTCATCTT GACCCATGGT AGTCGATATG.
AATTCTTGAA CACCCATGAG GATGCTATTT GAGCAACCTG TGGCCAGCAG CAGTCAAAAC CCTGTGATAC CAATAATCGG
GTTTGATTGT
GATrTTCATTC GCTGCCATGA AGCTCCATCA ACCTTTTCAA CAAGTCTTT GCCAAGTCTT GTTCTGGATA TTCCACTTGG AGT'rGGCATG AGGTTGGTAA TTGGAAAGAA GAAAGTTCCA ACTAGC'TGGA 'rAGCCGATAT TTCCCCAATC ATAGTCGTTG TGCTTCTGAA ATCAAATAAG AGCCTTTTCA ATCATGGGAT CTCTTCA'rCC AAGAGTTCCA CAAACTTTGG GCAGCTGGAT ACCTAGCTTG TCCAACAAAC 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 CCAATTCCAC CTCAGTCAAG ACTGGAATTC CCTTGGCCAA TGTTGTAGCGG GATACCTGGA TTT'rTCACCA TAAGGGCAAA AAGGATGGCC ACCTGTAATG ACCTTGATCC CTTCTTCCAG TGTCCTCGAA AGGT'rTCCCA TCATTTACTG TCACAATGGC
GAGCTGCAGA
GATCTATTAC
GAAATAAAAA
TTCACCAGAC TTGGCCAAAC TTTCATGTCT CGAACTCCAT AGCCACAAAG TGTG?1'TGTG CTAAAACAAG CACrr'rCTTA TTCTACTCCT ACTATTTTAC ACTC?1'TCTT CTAACTGAAT
T'TTTTAAATT
CATTTTTATG
CTTACCATAT
CATCTATGTG ATAAATCGGT AACTCGAATG ACCTGATCCA CTTGCTCCCA AATCAGAGGA 1166 TTATGGGTCG CAATAATAAT GGTCCGA'rTC GGATTrTA AAGATTCTAG GATGGAAAGT AATTCCTCAG AGTrTTTTGGG GTCTAAGGAA GCGGTTGGTT GGATCCTTTA AAATTATCTT CGCTAGTGCA ACACGTTGTG AATATAGGTr GCTTCAAATC CAAATAAGAG AGGTTTACAC AAAGAGA'N-r TCTCTTTTTC CTrCAACI-r TTACCAACTA TTGACGGTT'r GGCTr'rCAAT TAAGCCAAAA TCTTGAAATA CATCTGCGAG GATCAAAGGT C!TTCTCCTCC TGATAACTCA GGTTTAGAGC TTGTTTCATC AACCCAGATTr GAGA'N'CTCT AGTATCCTAA GTAATCTCTA CATAGATGAT TTGCCCTTTG TCTTACCACA GCCACTTGTA AAGAAAACAG AAGGCTTGAT TCATATGGCT CCAATCGTCC CCGATTAAGG CATAAATT CGGCTTCCAA ATNTTTrAGA TGTCATAGAA ACACGAGATT TAGAAAGACC AA'rAAAGTGA ACTAGCACCA AATACAAAAC GTCCTTAAGA GAAGTGCCAT AATCATAT'rC AAGAGTGTTG CCCACCTTCA AAATGAAGAT TATATTCTTT AGTTCAATCA
CTTTCTGCGC
GCAAGCCAAT
TAGCAAATTG
TCATATCTGA
TCCTATTTTC
AGCGTCAAAA
CGACTGCTTA
TTGACGGTAA
CACCAAGTCT
AAATAGCTGA
CTTTCATAAT
CTGCACTAGC
AAATAAAGAG
GTGTr'rCAAA
CGAAATATAG
CTCCTAAGAT
GCTAACCATA TACTGAGCAT GATATCTCGG CGGAATI'GCT AA.ATCGTAAA CCTGAAATTC GT'rrAATCAA AAGATTGACA GAATAAAAGA GTAACAAGGA ACTGGC'TATT CCAACAATAG 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700
TAAAGTTGCT
TTTAAGATAG
ATATCCT'N'G
ACTATCACCT
GTTTCAGTTT GAACTTCATT ATAACGAGTT GATACTTGCT CATAAAT'rCC AGC'rTTCTTC ATAAAGAGTT GTTTTCCAGC ATTGATAGAC GTAGAAGTCG GCGTGAATAC CACTAAAATC AGATAAACAC TTCTTCCT'rC AAGAGTTCTA GCCCACTCTC CAACTAGATA AGGATATAAA GGATCAGTCA AATACTGAGT AGATACGGGA TTCTCACCGT AACGTGCGCT TTCATCTCAT ATAACTCAAT CTTTCTTCAA TGGGAGCAAG AGGATAAACT ATCTACCACG ACCTTTTCCT TGAACTATAC GTATCCAGTG GAGCAGATTA TCCTTTACAT ATACCACTTG CTCTGATTT'r GTAATAGTCT GCTCTGTCCT GTAAGACGTC AAACCTGTCT TATTATAAAC AAACCGCTTT TCTCCCATTG AAAGATAACT AATCCAAAGG AGCACTTGCC TCCTCACCAG ATTTTCCATA AAACTTTCTT AAGTTCTGCT TCTCGAGAGC GCAAATGTTC CACCTTTTTG GAGATGGGCT AACTTCTGTT TGGTCTCAGC TGTCCAAATA ACTGGGACTA ACATAGAG.CG TATTAGCATC TCTCTCCCTG TTCATTTTTT CCTTGTGGAT TGGCAAAATG AAAGAGCTTG TTCTTCTTCG AT-rGCTrCCT 'rGGCAAAGGC
CTGTATCTTT
GCCATGCTTG
TAACAGCGTA
TCCTCTATCA
TTTTGAAATT
GCCTACTGTA
CCTAAGCCAA AGGAAATCTG TCAAGTT-CTT TCAATCGTTG AAAACAGCTA CTAACTGACA CAATAGGGTT AAAGCCATCA AGCGTTTAAG CAAAGGTTAAACCACAAGCTTTAGGGGTAATCTT CCCTTAATAA CGGGAACTAA 5760 1167 TGCTTTGTAA CTCAAACTCA TTAGGTAAAG GAGCATTIAGT AAAATTGAAA TCGCCAATAA AAACAACAGA TAGAAACTAA TCCCAAAACc ATAGGTGGCT AACAAGATAG GATAAAACAA ACCTTGACTA AAAAGAACGA CTCCCCCACC TAGGAAGGAA AGGAGGGCTG ATAGAAGGAG CCATrTGATA
GAGTTTTATA
AGATAAGCCA
AAGAAAATAG
ATCGGCAAGC
AtACGTTCAC
TAAGTTGCAT
ATTAAACTCT
GGTTCCTGAG
TATTGTGGCG
AATAATATGA
C'N'ACAAGGG
TCAGTAGATA AAGAATGCCC CCTGCTGCTC TCATTTCCTT AATATTGCTA AACTAATTAA GGCGGTATCT TTCGGTCAGC TT'rTCTTTCG TCAAGGAGCC GACTTTCrTG GCTAGCTTCT AAGTAAAGTG AGTCGTCCCA GTTC'rGAGTT TGCAAAATTC TATCTT1"TTT GACACGAAGT CACGTAAGAC AATAATCCAA ATAGTTTCTT CATAGTAGAC GAACGATTTA AATCAATGAA
CATGATGGAT
AATCCGAGTG
AATAAGGGGA
AAGAGAGTCT GACCAGAAAA ATAATCACTA AAGCAAAGAA TTTAGTAATA TTCGAAAAGC ACTTGCTTT'A TAACCCAAAT CTCCTAATTT TGACAAAAGG AGATAACTAT TTAGCGGAXIT TGGAA'rTC7'r TTGGTAAAGT TCCCTGACCA TCCTTACTCG GCTCTACAAT TCTrCTAGCT TCCAATTCCT GTTCAAATAC CTCACGCGTC AAAGAAACGG AATCATAGCT TGCATATAAA GCAAGGAAGA AGCTGAGAAA AAAAGTTGAT TCCTTGTAAA CAAAATTCCC CCrGTAATTT CGATTAGTCA TAATCACAGT AAAATGCTAC 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6693 120 180 240 300 360 TTGTTCTCCC CATTTAGTCC AAATCCATGC AGO INFORMATION FOR SEQ ID NO: 196: SEQUENCE CHARACTERISTICS: LENGTH: 1847 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196: CCGGTCTATG TACCCACTAC TTTGGGACAA TATGGGGATC AGCTACCCAA AACTAATCGA GCGTTTGGTT GACCTTGCCA AGGAAAGTTT TGACAAGCGC GACGATTTGA TATAAAATGA AAGAGAGGGT AGAAGCCAGA ACCATCACTG CACGGTGACT AGAGTTCTCG GACTTCAGCC CTTNTTAAAG GAGTAGAAAT GAAATTAACA ATCCATGAAA TTGCCCAAGT TGTTGGAGCC AAAAATGATA TCAGTATCTT TGAGGACACC CAGTTAGAAA AAGCTGAGTT TGATAGTCGT TTGATTGGAA CTGGAGATTT ATTTGTGCCA CTTAAAGGTG CGCGTGATGG CCATGACTTT ATTGAAACAG CCTTTGAAAA TGGTGCAGCA. GTAACCT'TGT CTGAGAAAGA GGTCTCAAAT 420 CATCCTTACA TTCTAGTAGA C'rrGAAAAAA CGACTGTTGA
AAGGATATGT
TACAATAATG
TTGGTTGG
CGTCCAAAAA
CGTTCAGAGA
CTTTTAGCGC
CGTTTTGGGC
ACCTTCAAGG
GCGACAAATG
ATTCGTTTGG
TGGCGCATTT
AGATTGGCCT
AGATGGGACA
CAGCCATCGT
TTGcTAAGGG
CGGCTGACCC
1168 TGATGTrG ACAGCCTTTC TGTCTr'rGCT GTTACAGGTT ACTGTCAACA AGATACAAGA.
TCCTrACACA GTTCrTCATA GGATCACTrG GGCGATATTC GACCT'rGGTr GGAGAAGCCC AAAAATGCAA ATTGCAGACG TATCGTAGAG GACTATTTGC AAGGGGCAGA GCTGGAAATT ACTGACTTGG CCAAT1'TC'N' AGAGCAAGCC CTTGATTTGC
CTATGATTGC
CCTTCCAAGA
ATCCTATGTT
TCr'rGAATTG GCCAATGGAG CAGATATCCI' GTCAGATGTT ATTTTAGAGA CTTClCTGC CATTCCAGCC GCGGATATGA AGGAGCTrGG TGACCAGTCT CTTTCTCCAG ATGTGCTTGA TACCGTGATT CAATTGGCCA GTCAAATG'N' CAGGATCAAT TTGAAGACCT ATCCTGCTCA AAGGCTCTAA
CCCAATCGGC
AGTCAAGCAG
CTCTATGAAT
GCCTTGCAAG
ACGCGTAACC
TACAATGCCA
AATGAAGGTG
GTTCAACTrC
TT'CTATGGAG
CACGTTTACT
GTCAAGGAAA
CTAGCCAAGT
AAGAATGATT
GCGCTACGAA
TI'CCGAAGAA
CGTCCGTGCT
AATCCTTAGC ATCCTACTAr CAAATGGCAA GACAACGACT CCTACAAAAC ACAAGGCAAT TGCCTGAAGG AACAGAAAAG ATCTCT'rGTC TGAAT'rGGCT ATrTGGCCTr TTTCAAAGAC GAATGGCTTC AGGTTCCr'rG CAACTGATAA AAAGGTGGTT TTGAGCGCAA AGATAGTCTG CAGTAACTGG CAAGTACAAT AAGGAGTTTC AGAGGAGCAA GTACCGAGTG GAAGAAAGCA ATCCAACTrGC TATGAAACTG GCAAGAAAAT TGCAGTGTTG ATAATCAGAT GATTTTGAGC AAAATATTGC TGAATrAGCC ACTTCAAGAA AACAGAAGAC GCCTTGGAGC CCATGACCAA TGGTAGAAAG TTTAGAAAAT GCCATTACAG ATACTGGCTT GACTTGCGAA GTCTGTTATC GTGGCAGAAG TCT'rGAAGCC TGGATTGTTG AGGATGAGAA 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1847 GAAGACAAGT GATTTTGTCA AGTATTTGCA AACCTTTACA AAAGATCCGT TITGACCGTGA TGAAATGTTG AATCAAGCAT CAGACCTTGA AACTTCTGCT TATGCGACTC CGTTAATGGA GATTTGTCTG GTTAGGGGAC AAGGAGAGGA TAGTTGGGCT TT~GCCGG INFORMATION FOR SEQ ID NO: 197: SEQUENCE CHARACTERISTICS: LENGTH: 1062 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197: 1169 CAAGCGAAAA CATrTrl'FAT TCCAAATAAA CAGAGCATI-r TAGGAGAACA AGAGAT 11' AATGCCPLAGT CGATCTTGGC CTTGCTAGAC GGTTTGGAGT CACATAGCTA TGATG'rAGTC 'rATCTCCGTC AGCC'rC1-AA TCGTCTCGAA TATATCGAGT GTGCGATAGT GGGGCAATCA CAATITCTCT TTAAGGTCAG TTATGCTGAT GGTCAAAAGG CTTACCGTGT CGATCTrCCT GACCTACTAA CAAAGACAGA CTGGCAGATT ATCAAGTCAT rTTrAGATGC TNTGCTTGCT TATACAGGGA CTGATATTGA AGGGCTAGAT GO'1TTTGATT TTGAAGCTTA TTTCCAAGCA AGTATTCA-AG CCTATCTAGC AGACCCTGTA GCTCGTTTrA CGATTTGCCA AGGAATrTTT AATCCTATTT TCT'rTAGTCG TGAGAAC'TTG AAAAGCTTTT TAGAGOCAGA TGGCTTGGCT CAGTTTGAAG CGCGTGTGCG TGCGGTTCA.A GAGACAGATG CCTACTTTGC GAGAGTTTCC TTCTATCAGG ATGGAGAAGG AAAAGTGCA'r GGCGTTTACC ATCTAGCTCA AGGAGTCAAG ACAGTTTTAC CGAGAGAACC GTTTGTTCCT GCAGCCTA'rA TTGAGCAATT GGTGGATAAG GAAGTCCAGT GGGAGATTGA CTTGGTTCA.A ATCAC-AGGAG ATGGCTCTAA ACCAGAAGAC TATGAAGCCA TTGCTCGCTT GGACTATGCA AAATTC-rTAG AGGTATTACC CCCATCTTTT TACCACCAAC TAGACGCCAA TCAAATAGAA GTGCAACCCA TATTAGACAA AGATTTTAAA ACATTAGCAC AAGAAAAGTA AAGCAGAAGC AGGTCAATCG ACTTGCT'FTT 
TTGACATAGA Ge.. Gee* go.* *090 0S* 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1062 120 180 240 300 AAAAATCCTG CCAAGaTGAC AGGATTGCTA GCTAGCCGCA GCTGTACTTG AGTACGGTAA TTGAAGAGTA TGAAGTTTAA AGAAAAGCCA INFORMATION FOR SEQ ID NO: SEQUENCE CHARACTERISTIC LENGTH: 6846 base TYPE: nucleic acid STRANDEDNESS: doubl TOPOLOGY: linear CTCAATGAAA ATCAAAGAGC AAACTAGGAA GCGAAGCTG ACGTGGTTTG AATTTGATTT AGATACGAAG AT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198: TATCTACAAC CTCAAAAACA TGTTTTGawG; gCTCGTCAGT CTATCTACAA CCTCAAkAAAC ATGTTTTgAa kGCtcGTCAG tTCTATCTAC AACCTCAAAA ACATGTTTTG AcaGCcTcGT CAGTTCTATC! TACAACCTCA AAAACATGTT TTGAGCTGAC TTCGTTAGTT TCATCTACAA CCTCAAAAAC ATGTTTTGAG CTGACTTCGT TAGTTTCATC TACAACCTCA AAAACATGTT ?1'GangnCnT CGTCAGTTCT ATCTGCAACC TCAAAGCAGT GCTT'TgagcG CTTCGTCAGT TCTATCTACA ACCTCAAAAC AGTGTGTTGC TATGGTATTC ATTAAGTCAA CATCTCT'rGT ATTCCCTGAT TTTTCTATT TACGTTTTCG TAAGAGAACT TCACGTT-CTT CCAACTCTTC ACTAATGGCA CCAAGGTCAT AAAGAGGTTrG GCCTGAGATT TGTGAATCAT CACTAGTAAT CTTATCAGCC AAACCGTTCA AGACTCCTGC AGTTGCAT1'? GATGAAATCA AACGCTCTGC AGAT'rCAAAT ACCAAACCCT CACTATAAGT 1170 GCAGCCTTTA ATCACCCCC TAGTCCGCTC TTAAGACCAC CAAATCAGGA AATCTTCTCG TGTTGACCTA CGTTCTGTCA AACCATGAGG CTTArGCATA ATCTTGGTCA ACATACGCAT GGCAATCGTT -GrTAGTTTG GACGGGTAAA AATTTCAAAA TCTCTCGCA CAGAAACACC TGCCAACTCA TCACCTGTCA CAACTGCTGC
TIAAGGCGTAA
GATTCCTGCT
CCATCATCAT AGCTATATTT GCCAACTAAA CGAACCTTAC ACGCTCATTT TCTTI'AGCAA ATTGACACTT GGCAACTGGT CATTGATGTC ATCCACTAGC GGTAACTCAC TGCATCAATT GCTCAACA'rC GACAGTTCCT 'rGAACGCGAA AATTCTGAGC GAATTTTATC TGTCAAGTGA ATCTACCTGC T'rTGAAAAGA GGGTATTGAC AACAGAAACT GCTATTAGCT AGGACAATAT TGTACTTGTA CATTTCTGCA TTTTTCAAGG TTTCCTTGTA GGACCGCTAA CGAAAGCAAT GTTGCTTGCT TATAGTCAAT GCGAGAACAA TCGGAGTACG TACCCCATAT AGATAATGCC TCTI-rCTCGT TATCT'rCATC ATATCATCAA 'rCCCCr'rAGC ACACCGACAG TGGTTGTCTT TCCAAACGAT CAATTACCTC CCATTGACCA CACGGCTGAC ATGGTTACTG TATCATCTGC TTACAAGTAG AGGTACTGAT TTTCGTTTTC ATATTTCTAC 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 CAAACTCGAA AAATAACCAT TGGTAATATT
TTTACTTGCA
TAGCACTTTT
CGTCGCCATG
ATTCATTCCT
TGAAGCTCTA
TTA'rrCCATT AGACCACGCG CAACTGCATT TTACGGGTAT 'rCTCrTrAC GAAACACCTG CTTCACGAGC TTTCCTGTCC TTTCTATCTC TATCTACTTA CAA.AAGTGAA
TGGAATCACG
TGGACGATAA
ATTTTTA'rTG
GACATCATAA
ACACATTCT'r
GATGTGAAAA
CTATCACTAA
AAATTTCATA GAAAACCTAG GGAAACTAA G CCCTCCTAAA GACAGTCAAA TCGATTTCTA TCTACTATAG GTATTGTTCC TATTATATCA TCATAACCAA CTGCATCTAG TTTCGAAAGA AAAGTTTG'rC TGCAATTTCT TTTCTTGCTC CGTTAATGGA TTGTAAACAC TTTCAAGTGT TTTTTGAAGA TrrGATTGAAA GTTTAGCTCC TTGCTACCAC CTTAGACTAA ACAAAAAGGA GTTATAGTAA AATGAAATAA GAACAGGATA AATCGATCAG ACAATGTTTT AGAAGTAGAG GTGTACTATT CTAGTTTCAA ATTCACTACC GTCAA'rTTTA GCACATAGTC TTCATGAAAA CCAGATTCTr TCGCGATATT AGCTGCCTCT GTTCGATTAC ATATTGGTGA CATAGTTTCG GACTGT'rCCG TTGGATAGAT TGGTTAGAGA AGCCCTGAGC AATTCCCTTT AAAACTGCGA TTGGGATGCA TCATCACCAC TTCC-ATCAAT TCAGCGAAT ACTCCTTGCG TrCCTCGAGG ACGGTGTGCA AGGTTTGCAT GAGGTCTGCA ATGTTTc'rrr 2160 CTTTTAATAC ATAAGCATCT ACTCCAGCCT TGACCGCACG TTCAAAATAC CCAGGACGCT 2220 TIGAAGGTCGT CACCACAACC ACCTrTG?1' CAAOCTrC TGCTCGTATC CACTCCAAGA 2280 CTTCAAGACC TGTCTTAACA GGCATh'TCTA CGTCAAGGAT GGCG.ATATCT ACAGACTCCT 2340 TrrCTAATAG TTGGATTGCT TCTTGCCCAT TCTTGGCTTG AAAGACAGAC TCTACATCCG 2400 GTTGAAGCAT GAGCAACTGG CACATGGCAT CTCGCAACAT ACT=GATCT TCTGCGACTA 2460 ATACT7TTCAT CTACr-rCTC TCCTTATAAA GTAGTCGAAC CTGCACTTCA GTTGGATGTT 2520 TCTGACTGAT TACACT'rACT TCTCCTGAAA ATGGAAAAAC ACGA'rrTCGG ACTGTATGGA 2580 GCTCATCCCC GCTTATAGAG GCAAAGCCAC AGCCATCATC TCTCACTGTT AGAATGAGTT 2640 CTTTCTCTGT CCGTTCTAAT TTCAAGTAGA CTTTAGACGC TTTAGCATGT TTGATGATAT 2700 TGGTCACTAA TTCAAGCAAA ATCATGGAAG CCCTGACTC CAATTCCTGA GTTAAGCTAG 2760 ACTTGTCCAA GTGATTCTCA ACTTGAACCT CAA'rTCCAGC AATTTCTAAC ATCTTr-TTCA 2820 CAGTCTCT1AG TTCGGATGTC AAAGTTCTAG ACrrAAGA'rT TTCCACAATG GTTCGCACTT 2880 *CATTCATGGA tCCTTGCTGA TCTGGTGAAT TTCTT'rTAAT TCCT'IrTCCA CCTGTGGATA 2940 ***AGCCTCCATC TGAAATAACT GCAAGGC'rAA ATCTGTCTTG ACACTCAGCA TAGCAAAGGT 3000 .ATGTCCCAGA CTATCATGCA AATCCTGACC GATACGACTA CGTTCATTTT CAGCAAGCALA 3060 ***TAGATTTATC TGAGCATr GCTTGACCTG AGCTTCTTTC AAATCCTCGA CAATACCA.AT 3120 CCGAACCAAT CCAAAAGTCA TTAAATCGAC AAAAGTAAGA ATTACAAGTA GATAGAATAG 3180 AAACTCAACT TCGATTCTCT GAAAAATCAA CAGTTGCCCC ACAACAAGGA CTTGAGCAAG 3240 *AAGAAAAGTC CAGACATGTA AAGACTTTAA ACTACGTACO CTGAAATGAT 
AACTTAAGAG 3300 ATTGGATAGG AAAAAGAAAA ACCAGATATA ATTAACAGCA ACAAAGGCAG TAT'TCCCAAC 3360 TACATAAGTC AGCATGAGGC CCCAATATAG CCAAGATAGG CCCGCTCT TAGTTGTTAA 3420 *AACACCCAAA TATGCCACTA CAAATAGAAT ATCAATC AAT AAATGCCAGG CAGAAAGCCA 3480 CCCAGTCACT ACAGACAGGA TGGGGAAAAT CATAAAALATT AAACTGATCC AAAACATATA 3540 ATGTATTCTT TTCAGTCTTT CAAGCATTAA GCATTCTCCT TATGACCI'TG AAGGTAAATG 3600 *GTCAAACCAA ACAAAACTAC TGAAAAAACA AGTAAATAAA CTGTGGCTGA TAGATTGATG 3660 ***CCACCCTCAT. TTAAGAAGGT CTTGAGCAAC TCCATCAACT GATAGGTCGG GAGACACTTA 3720 CCTACTACTT GCATCCAGTC TGGAAATAAA GAGATAGGCA TCCAGAGTCC ACCTAAAACA 3780 GCCAACCCTA GATAAAGAAG ATTGCCCACG ACAGACATCA ACTGACTAGT TGGTAAGAGA 3840 GTCAAGGTCA AACCAAGCGC TACGAAGGCA CCAA'rCCAAT T'rCCAAGAGA CATGTCCACA ACC.AAGATTG AAACCAAATA ATCAACCAGC ACCA'rATTTA CAGGGCTATG ACGCAATGTT
AAAACAACTG
AGATAATCAC
AATAAATAGA
GTCAATAAAA
CTGCGTTTCT
GGAATGAGAA GATAGCTGTT GCATAAAATT CGCGAGTTCA AAGCCGTCGG CATCCCTACT ATTCTATCTT ACTAAGTGCT TCAAAGATTG TATCCAACA.A 1172 ATACTTCCTA CTATCAGCAA AAGTGCAGCC CCTCTrACAA AATGCCCAAC TGAGAAAACC ATACTTGTTA TCTTI'CATAG ATAATATTCT 'rTGCCAGT TGTTGATCTT GTCGGTATGT GACATCATGG AAAATGCAGT CATGGAGATA CCTGGTGTGT CCTGATAGAT ACCAGAAAAA GACAATAGAT AATAGATCAA TTGTCGTTrG AGCCATCGTr TCATCTTAGT TATCTCCCTT ACTACGATTA TTAACT'rCAA TT'rCTTGTAT AAAAGCATCT GCTTCGCGTG TGACTACT AACCAAGTTA GACTGCTCAA TGACTTCCTT AATI'CCCTCA CTACGCATAG CT'AGAGGCGT AACCAAAATC CGGTCAGCCG TATGCTCTAC CGTGACTCCT TGCGCTTT'rA GGTCCCGAAC 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 GCCACATCCT GCTTGAACTA ACAGTTCCCA TAGAGCATCC TGTTT1'TGTG ACCAGTTT'rC GTATGCCAGA GGAAGGATAA AATGCTTTTC CGTATCACGA ATCAACTCTC CCTTATTTAA CTCTTCAATA TAATGAGACG AATAGAGAAT GA'N'TCCCAA AACCGTTGAC GArTGA-Ar ATCCATGGCA GCAGTTGGTT GACAAGCTTT GGTCGCCCAA TCAAGGTCAA GACAAAAGAG AAGAGACGCT TGACAA'rTTT TCTGCGAATT GCTCTTTTTG TTGCTGGTCA AACTGCAATA TTCCTGATCG CTCAAGGAAT TTGGATAGAT ACGTTGAAAG AAAGCAATCA CTTTAA'rTTC TGAACGATGA CATTTTCTTG AGGCAGATAA CCTCTAATAT AGAACTCGTC ACTGACAAGC CTTGGATGGA TACTTGACCG CTTGTGACCA AAGCAGACAG TCCAAGAGTG TGGTCTTCCC AGCACCAT'rG GGCCCAATCA TTCACCTTCA GCTACCTCAA AGGAAATACC CTTCAAAATA GCCTTGCCCT ATTTAGGCTT TCTACCTTAA TCATATTCAT GATATTCTCC TTTCAACCAC TAAGGAAAAC GACGAAAATC ATAAATCCAA ACCCCAAAGC ACCACGAATG gCAAGGTTTG GTCAAACCAA CCTGTAAACA TTTCCACTAA CCATACCAAG CGATAAAGAA ATAGATGATC CCTCTCTTCA TTCCTCAAGC TCCTTTTTCA TAATTTCAAA CCTTCTCTAA CAAGCCAAGA CATCATTCCA AAGCCAGCAA AGGAAAATGA TAGAAACTCT CATCCAATCC CGAAAACATG AGTTAGGTCA TACTACTAAA CTCACTGCGA TAATCATTTT ATTTCTCATC TCTTCTTCCT CTACAATTAT AGTCTTTTGA AATCAGAGGA GACAGAAGCT TCTGTCACTA CATCTAAAAA 4740 TTrTGCCCCCC 4800 GTTGATCGAT 4860 ACTC'rTTGAC 4920 AGTCTAACTG 4980 GTTTATCTCC 5040 AGGCGACGCA 5100 TGATGTNTrTT 5160
TCCATTCTCA
AATTGGCGAA
AGTGACAGGC
CATCTCCGAC
AGAGCTCCCA
TAACTCCTGC
CCATT1TCATA
GAAAATATGA
5220 5280 5340 5400 5460 5520 5580 5640 1173 CAAATGTCAT AAAAAATCT GTT'CAAAACA AGCAAGATAC ACTATACAAT AAAACACAAT TAGAAAAATC TAAGGCAACT TCCTCAAAAG AGATATCAAA CCCAATTCAC ACCATAATGT AAACTAATAC TTATTTAAAA TCAAAAAGAG TAGAAATP TATCAGACAA ACACATATAT AGTGTATTGA ATCTATAACA GTAGGCCTTA AATACTAAALA TAT~rCTATA AATTAATTTA ACTTTCCTGA TAGAGCTGTP CATATCTTAT TTCAATTCTC ACCCTTCTA'r TTCTTTCTTA AAGATTTATA AGAGTTATAA GTATACCTAA ACTACGGTAT ATGAATTACT GAAAATCAAA ATCTAGTTTT AAAAATACAC AACCTGCGAC ACCTTGGTCC AAATAGAAAA ATGCACCCTA CTCTATCCAG TTGAGCTAAG CGATCGTTAC CAATCGCAGG CTCTCTAAGC GAACGACGGG ACCACTGAAC TACGTTCGCA ACACGCGACC CTCTGATTAC TTATTGAAAA GACTGGAGAC AAAGAGAGAA CCAAACTGAT ACTCACATAT CTCTGTAATG CAAACCA.AGC ACTCrACCAA GAGGAGTCGA ACCTCTAACC GGTGCTCCAT ATTATGCCGA ATTTTA.AGTC CTGTGCGTCT ATTCGAACCC GCGACCCCCA TAAATTATAC GTTGAACAAA AATCTGTTAA ATTTCA.ATGT AAAAAGTATA CGCTGCCAAA.
TCCC TCTTAA TGTATATAAT AATCGGGAAG ACAGGATTCG GCTGAGCTAC TTCCCGAGTT
GCCTGATTCG
GGACCGGAAT
GCCAGTTCCG
CTGTTTTCTT
AAATCAGATG
CTATCTAAAA ATGCCGGCTA CTCTACCAAC TGAGCTAAGC AACCCCCACG CCGTTAAGCG CCGCATATAT GACCCGTACT CTACCAACTG AGCTA-ACGAG
TAGTCAGGTA
CGAACCGGTA
CCACCCCGGC
GTGGTGTTCT
CATGACTTGA
CGGCTCATTT
CCAGATCCTA
GGGCTCGAAC
TCTAAAATAA
5700 5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 .6600 6660 6720 6780 6840 6846 GTTATATCTT AATGCGGG'rT AAGGGACTTG AATCTGGTGC GTCTGCCAAT 'rCCGCCAAAC CAGTGACCCA TTGATTAAAA GTCAATTGCT cTTGCGTTAC
GACGTG
CTTAAACGGT CCCGACGGGA ATCGAACCCG CGATCTcGCC GTGACAAGGC INFORMATION FOR.SEQ ID NO: 199: Ci)SEQUENCE CHARACTERISTICS: LENGTH: 2911 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199: GAATTCATTT TAAATAAAGA TACGGGAGAG GTAAGTGAAT TAAAACCTCA TAGGGTAACT GTGACCATTC AAAATGGAAA AGAAATGAGT TCAACGATAG TGTCGGAAGA AGATTTTATT 1174 TTACCTGTTT ATAAGGGTGA ATTAGAAAAA GGATACCAAT TTGATGGT'rG GGAAATTTCT GGTTTCGAAG GTAAAAAAGA CGCTGGCI'AT GTTATTAATC TATCAAAAGA TACCTTTATA AAACCTGTAT TCAAGAAAAT AGAGGAGAXAA AAGGAGGAAG AAAATAAACC TACTITTTOAT GTATCOAAAA AGAAAGATAA CCCACAAGTA AACCATAGTC AATTAAATGA AAG'rCACAGA AAAGAGGATr TACAAAGAGA AGAGCATTCA CAAAAATCTC ATTCAACTAA GGATGTTACA GC'rACAGTTC 'rTGATAAAAA CAATATCAG'r AGTAAATCAA CTACTAACAA TCCTAATAAG TTGCCAAAAA C'rGGAACAGC AAGCGGAGCC CAGACACTAT TAGCTGCCGG AATAATGTTT ATAGTAGGAA TTTT'rCTTGG ATTGAAGAAA AAAAATCAAG ATTAAGATAA AAGCTATAGA AAAAAATGGT TTATGTACTG AGATTAGATA GTGAGGTGAT GACATAGTTT TGTGAAAATA GCCATTTATA ACTCAATTA.T TTAGTTTACT TTACTTTACT AGTGATACTA TTTGGAGTTA TTAATGGACT 'rAGTTTATAT AACTAATGAA TTGATTGAAA GGGTTAGTAT TGACA.ATATT GGTCATA'rTG ACTAGAAAAT AGAGTCTATC AAAATTTAAA GGCTAATAGA GGTGATGAGA CAATTTCGGC TCTTTGTCAA CTGTAGTGGG TTGAAGTCAG CTAAGCTCGA GAAAGGACAA V .00 000.* ATTT'rGTCCT TTCTTTTTTG AGTTTCGAA.A ACCAAAGGCA GT'TTGGCATT AGAATAGTGT AGGTTTTAGA GGATGAACTT CTCCTTATTC TGAAAGTGAA TTCTGAATAG CTCAAAAGTT TCTAAAACAT TGTTAGAALAT ATTTTACTAT AAATCCACGT CTGGTGTGTA TGGAGGAATA ATTTATGCCA TATAGCATTG ATATTCAGAG CGATAAAAAT TTGCGCTTGA TAAGTTTGAT AGTTGAAGGG CATTGACAAT GATTCAGATT- GTCCTCAATG AAAGCAAGAG TTGATAGAGA TATCTATAGT AGATTGAAAC CGATTTGACT GTCCTGA.ATG TTACGAATCT CTTTCCACAC AATGCAAAAC CAATATTAGT TCCATAACGA GTAAAAGATA CCGTTTTTTG AAGTrrTCAA GAGATTATTG GTCGCTTCCA CTTCTCTTTA TCTTTGAGGA AGTCCGAAAA ATTTGTCAGG TTATAGTGGT GTTTCAAGTC TAGAATAGTA CACCTCTGCT ATTTGTCCTG TTATTATTTC TTGTTCAATG GGGTTCATCT 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 
1680 1740 1800 1860 1920
CGGAATCTTT
AAAGCTCCTA TTCCTAAAGC CCCTTTATAA CCTCTTGCGA GAGAGACTAT
AAGGTACTTG
TAAGCTTGTG
TGACTCAGCC
ATCTGACCTA
ACCATCATGA
CTTACTTCAT
CTATTGGACC
TCAGAGTCGG
TTGTGATATG
GCGGATGAALA CTTCTTATCG GGTTCTAGAG TTTTTGTCTG GGAAAGTTGA GAATCAAGCA AGTGGTTCGG TAGTACAAGA ATTCCTAGGA TTGCGGCAGT AACTTAGGAC 'rTTAGTCCTC
AGTCATAGCC
ATCACGCTGT
GATTATTCTG GCTATGTTCA TAGTTCTGCC TATGCGATAG CAGTCCAAGG TTTAGGAGCA AGGCGACGCT AAGCTTGGTA AACTGCGAAC CGCTAGAAC TTATCGTCAA CTGGAAGAAG CTGAACTTGT TGGATGTTGG GCGCATGTGA GAAGGAAATT 1175 T'TTTGAAGCG ACCCCCAAGC AAGCAGATAA ATCATCCTTA GGAGCTAAAG GTTTAGCTTA TTGTGATCAG TTAT'PTTCCT TGGAAAkAGA CTGGGAGGCT ACAGAAACGT CAAGAACATC TCCAGCCCCT AATGGAAGAC TCAGTCAGTT TrAGCAGG'rr CAAAACTAGG AAGGGCAATr AGAAACCTTT AAGACTATTT TGAAAGACGG ACA'rCTGGTC ACGCGCCA'PT AAATCATTGG TTATGGGACG GAGTAAAAGA CTGAGCTCAG TTTAAAAAAG CGAGGGTGGT TATTTTCTCA CAAGAGCTAT TGTTATGAGC TTGT'rGGAAA CAGC'rAAACG AATCTATAAC AGTACGCATC GACTGCTAAA ACATTTCTAT ATCGATTTGT TCATATCTTA TTrCAATCCA TTATAAATAG
TTGCCAGCTG
TrC TTrGCTr
GAATACAGCC
CTTTCCAATA
GTCCAGTGGA
AAGTTTTGAA
TCATCAATTA
AAATCAATTT
ATGAACGACT
GGTGCCGCCG
TCAAGTATGA
ATCTAGCTGA
CTC?1TTTAGC
GGAGCTAAAG
TAGTGCGTTG
TCCTTTCCTA
CGAGAAATAT CTATCC'rATC TTCTAGAATG TCTTCCAAAC GAGGAAACTC TCGTA.AA TACCATGGAC TAAAGTTGTA CAAGAAAAGT GCAAATA TCCGTGAGTT CACTAATCTG GAGATTTTTC AATAGAt~ GATcACTACT TCGTCAGTCT TATCTACAAC CTCAAA-A TAGCTTCCTA GTTTACTCTT TGATTTTCAT TGAATAT TATT'rGTTTG TGTGTPTTATT TTTATATAAC AAACTAT, AGAGACAAAA. AAGAACAGAA AGTAATTGAC A INFORMATION FOR SEQ ID NO: 200: SEQUENCE CHARACTERISTICS: LENGTH: 6854 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear CAA AGAGGTTTTA GAGGTTTATT AGA AATCTCCAGA TTAGGAACTA TCG TTATTGGGCG GTTACGATAT CAG TGTTTTGAGC AACCTGCGAC TAG A.ACAGAAAAA ATGCTrGGAG AAA CAAAATAAAA ATATAAAAAA 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2911 120 180 240 300 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200: GAAAATAAGT CTTGACAGAA AGCGCTATCA ATGATAGAAT GAATTCAGAT AAAAAGATTT -ATTTTTAAAA CAAAAATGAA ACGTTTCAAA AAAAGKAATA AAGAGACAGC GCCAAGCGCT ATCTTTTCTA GAAAAAAATG AAACGTTTCA AAAAAGGAGG TTGCTATGAA TAGCAAAGCG AAGCAAGTTT CTCTTTGGGA AAGAATCAAG AAACAAAA~C.TCTTGTTAT'r GATGACTGTC CCCGGTTTAG TTTTAACCTT TATCTTTAAA TACATCCCTA TGTATGGGGT TTTA.ATCGCA TTTAAAGATT ACAATCCTTT AAAAGGAATT TTAGGGAGTG ATTGGATTGG TTTTTCTGAG 360 1176 TT'rACAAAAT TCATATCCTC TCCCAACTTT GGTATC -rGT TAGCCAACAC ATTAAAA1-rA AGTATCTATG GTTTATTGCT TGGC'I-ITrTA CCACCAATCA TTCTCGCGAT TATGCTCAAT CAACTCTTGA GTGAAAAAGT CAAAAAACGA ATTCAGCTCA TTITATACGC ACCAAACTT ATCTCAGTCG TTGTTATTGT CGGTATGATT TTCCTCTTCT TTTCAGTGGG AGGACCAATC AACAATTTTC TTTCTATGTT TGGAATGAAG GCTGACTTCT AGACCT3'TAT ACATCTTTAG TGGTATCTGG CAAGCAATGG ACGGCAACAT TGGTAAATGT AGATCCAGCC TATAGAAG TGACAAATCC AGACTTCT GCTGGGCTTC AACGCTCTAC CAGCCCGACT GGATGGAGCC
AATATCTTCC
CAATTTGTI'
AACGAATCTG GCACATTGAT ATTCCAGCTC TTAAGCCTAT TATGGTTATC TAGCTGCAGG TGGAATTATG AATGTCGGAT ATGAAAAAGC ATTCTTGATG CACACATCGT TAAATTTGCC AACTTCTGAA ATTATCTCGA CT'rGTATCAG GAGACTATTC TTACTCAACA GCGGTTGGTT GTAGTATTGC TTGTTGCAGT TAACCAAATC GTTAAACGCA TAAGGAGGAA AGTATGAAAA ATTCGAT'rAT GGATACAAAA CTTAAATAAA ATCATTATTG TCTTTATCGT TTTGATGACT CGTCGTAGCA TcC'ITTATGG A'rCCTAACGT TCTGGTTAGT AGCCGATTGG AC'rGTAGAAG GTTACCAGCG TGTAITrCAGT T'rTTATCAAT TCTCTACTAT ACTCT'rTTGG ATTTGCAGCT GTTrACAGCT TATCCI'CTTT CTAAGAAAGA CTTGGTTGGA CTTGATTGTA ACI'ATGTTCT TTGGTGGTGG TTAGTCCCA ATTGGGAA'rG CTCAATACTC CATGGGCTAT CATTGTTCCA TATTATTCTT GCTAGGGCCT ATTTCCAAGG ATTGCCTGAA CAT'rGATGGT GCAAATGATT TACAGATTTT CTTCAAAATC AATTATGTTT GTTCTCI'TCC T'rTATGCT'rT TGTAGGACAG AATGAT'rTAT ATCAAGGATC CAAACTTGGA ACCATTGCAA CATATGTCTA TAAAGTTGGT TGTTTAATGC AGTGATTrAAC TGAATAATGG TGAAG3GAATT TTTGATAGAC GTATCTTACT ?ITGCTTCCT'r TACTTTATAT AGAGGGATTA GCTTTAATCC GACCAATC'rA TTCTAAGAGG 'IrAACAG'rCT TGCTATCTGT CGTCGTTGGA TTAACTACTr ACTTACTTGC TCGTAAAAGA GGTGCTGTTA ACGTTTGGAA GAATTAGTTG AAGCTGCTGT ATGCTTCCTC TTGCAAAACC TGGAACTCAT ACTTTGATGC CTTGTACTTC GTAAAATTCT 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160
CATTCAGAGC
-KCGTTTAGCT
TATGTATCCA
ATAAAAAAAG
CAGTTTGTT
CAAGTCCAGA
CAACCAGGTC AAGACATGAT GAATTGATTA AATACGCAAC TTCT'rCCAAA AATiCTTTGA AAAAAATAAA AGGAGTTTTC GACAGCTAGT 'rTAGCAGTAC TGGAGCACAA GCGGCTATGA ATGAAATGAA TATTGTCATT TCCAGCTTGC CATTGATTGT TAAAGGAA'rT ATGGC1'GGTT CACTTAAAGG TCATGAAATT CAAAACATTC TCAAAATCAG TTGCAGCCTG TGGCTCAAAA AATACAGCTT TTATAAGTTG GAAGCTGTAA CA'rTCCCGCT TCAAGAAAAG AAAACATTGA AGTTTATGAC AGCCAGTTCA CCGTTATCTC CTAAAGACCC AAATGAAAAG TTAATTTTGC AACGTTTGGA GAAGGAAACr GGCGTTCATA TTGACTGGAC CAACTACCAA TCCGACTTTG CAGAAAAACG TAACT'rGGAT ATTTCTAG'rG GTGATTTACC AGA'rGCTATC CACAACGACG GAGCTTCAGA TGTGGACTrG ATG.AACTGGG CTAAAAAAGG TGTTATTATT CCAGTTGAAG ATTPGATTGA TAAATACATG AGGCCTTGAT GACAGCACCT GAGATGGTAA AGAGTCTATT r'rAAGAAACT TGGTCTTGAA CTTTCAAAAA CGGCATCCA TTAGTGGTAA CGGAAACGAA ACGATGATCA TTTAGTAGTA ACTATAAAGA AGGTGTCAAA AAGCTTTCGA ACATGATTGG T'rTACTTTAC ATGGGATAAG CCAAATCTTA AGAAAATT'TT GGATGAGAAA CCAGAGTACA GATGGGCACA TTTACTCAT'r TCOATCGAT'r GAAGAGCTTG
CACAGTGTCA
ATGCCAAAAA
AATGGAAATG
GATTTTAAAT
GGAAATGATG
TTTATCCGTC
AATAGT'rACA
AATAATGTTA
909* 99. 59 9 9 4 ACGATATGGC T'rGGA1-rAAC AAAGATTGGC CTACTGATGA TTTGATTAAA GTCCTAGAAG GAGAGGCTGA TGAAATTCCA TTTTCATTTA TCCTAT'rTGC TGCATTTGGT ATAGGGGATA GCAAAGTTGA CTTCACAGCA GATAACGATA AATTGCAAGA AAAAGGCCTG AT'rGATAAAG ?rTcCAAAGG TCATGATCAG AAATTTGGTC CTGGAAGTAA CGAAAGTTAT GATGTTTTAC ACGTAGC'rCG TACAAACGGT ATGGGATT'rG ACAAAAACCT AGAATTGACA GCTAAATGGA TGCAAAA'rAA CTGGGGAACT TACGGAGATG AAGCG'rCAAA TAGTCTAAAA CAC'rTACCAC AAAAGACTGA ACTAGGAGGA CCACTAGCTA CCATGCCTGA TGATGCCAAA TGGCGTTTGG TGAGCAATGT CAATAACTAT CCAAGAGTCT CCCATATCGA AGCAGATATG AA'rGACTATA CAGTACTTGC TGGACCAAG'r GGTCAAAAAC CACGTGACAA GATGGTrATT ACCAGTGTAA TTGATGCACA ATACGCTCCA CTCCAATCTG ACAAACAACA AAACATCTT'r GAATTGGATC TAAACGGAAC TGCACCAGCA GAACTTCGTC TCCTAGATrC ATACTATGGT AAAGTAACAA ATCTTATCAA AGAATATTAT GTrCCTTACA T'rATGACACA GGAAGATTTG GACAAGATTG 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 TCTACCGTAA- ACGTGCTGAA TGGA'rTGTAA ATGGCAATAT TGATACTGAG TGGGATGATT ACAAGAAAGA ACTTGAAAAA TACGGACTT'r CTGATTACCT CGCTATTAAA CAAAAATACT ACGACCAATA CCAAGCAAAC AAAAACTAGA GGTTGATTAT GGGAGATAAG AAATACACAG TAGAAAAAGC CAATCGTTTT ATAGCAGAAA ATAAACATCT CGTTAATACT CAATATAAGC CTGAAGAACA TTr'rTCAGCT GAGATTGGTT GGATCAATGA TCCAAATGGA TTTGTCTATT TTCGTGGAGA ATACCATCTC TTTTATCAAT TCTATCCATA TGATAGTGTT TGGGGGCCTA TGCACTGGGG ACATGCTAAA AGTAAGGACT 'rGGTGACTTG GGAGCACTTG CCAGTGGCAC TTGCTCCTGA CCAAGATTAT GACCGAAATG ATGATCGCCT CTG.GCTC-ATG TACACTGGAC TGCAAAATAT GGTATTTTCA GATGACGGGA TTGCAACTGG ATCAGACTrA CCAGATGAGT TC7TTGAAAA AGATGGACGC TATTACTCCG GCTGTATCGT TCTACTAGGG TCCGATAACC TAAAAGGGGG AGAACACCAA GGTTTTATGT GGAAAGATTG CCTTATTATG TCACCCAT~C 1178 GTTGr'rTCTC AGGCTCTGCC ATTGTCAAGG ATATCGAAGA AGAAACCGGT GTCCGCCAAG TTCACI'rTGA A.AAGATTTCC CAAAATCCAG TGATT-GCTGC TGATTTCCGT GATCCAAAAC TAGTAGCTGC CA.AACACAAG GATAATGTGG TAGTAGAATG GCAGTTCGAA TCCATCrT= GGGAATGCCC AGATTACTT~C GAGTT'AGATG GTTATCAGCG TGAGGGAGAC TCATATCATA ACATCAACTC ATCGCTTTTG 
TTCACGGGTA AGGTAGATTG GAGAGAAAAA CAGAATCAGT TCAAGAAATT GATCATGGCC AAGACT'rCTA TGCGCCTCAA
CGTTTTATCC
ACATTGTTGG
ACGATCAAAA TCGTCGTATC CCCATGACCA AGAACACAAG AAGA'rGGCAA ACTAAGACAA AAGATTGTCA TTACCACTTA CTGATTGCTT GGATGCAGAC TGGGCATGTG CCATGACTCT TTCCCTGTTA AAAA.AGGCCA GGAAATGATA TAGATTATCT ATGCGCAGCA AGTTTACATT GATCGTAGCC AACAGGACAC TAGTCGACGG 'rATGTAGATA ATAA.AA:ATTC CATCGAGAT'r TTTGTCAATC ACTTAACGGT GCCAGCTGAG CTATCACGAA GAAAA.AGTTC TCTTTCTAAA ATAGTGGAAA TTAGTTTATG GTA'N'TGTAA AATTGGTGTT ATTTGAAAAA AATTNTATTT AAGCAAAAAA TTAGATGCCC CCTACAGGGA TTGTAGGAGA TGTTTTGTAA CGTTTAAATG AGTrTTTTTGA CCCCACTTAT TGCTTGAAAA AGAATTTAAA TGCTTGGAGG AACTGATGAA GAACAATGAA GTTTTAGGAT TGAGGGAAGT TAGAAGGCTT GATGGCTATG TGATTGAAGT TGCTTATAAC
ATCTTATTCA
TTGAAGCTAA
AAGGTGAAGC
TTGATTAAAA
GAGGACTI-rT
GGATTATCAT
ACCTTGGTTC
TATGTTGCTT
GTTTGTTGGT
ATATAGTATA
ATGGGGGCGT ACCCTTCCAA ACCTAGAATT CTAAGATTGG ATATCAAATC CAAATAGATA TGAATTTGGT TATGACAGTA AAAAATTCTA GGTGAAGAAG AGA.ATTGGAA GTTGTTCTAG AAGCTTGACT GCAACTTATT ATTAAGTTAT TTCTCCTAAA TGTGTTTTGG GTATATAAGC TTAAGCTAGT TTTCTAAAGA CA.AGGCTTTT CCTGTTGTAT AGATGTTCTT GATTTTCTGG GGGGCGTTGC CCGGCAATTG GTTAATTATA GATTAACACT 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5640 5700 AGATTAGGTA TTAAATTAAG TAGAGATAGC TAT'rTAGGCA GTTCAGATAT CCCAGTTTCT CAGATATCAC ATGAGATTGA TATTATT~GAT TGGGTAGAGT TGAACAAGTC AAAAATTAAG ATAAGTGAAA TTAGTGAAAG CGTGGATATA GATGCCACTA GCTTGAGAAC AACTTGACT TTAGACACAT TAGTATATGA AGGTATGAGA GATATACAGT TAAAGTTGAG AGAGCTTACA AAGGGGAGAG TATTCTTTTC ATTTGTAGTG AAGTTAGTTIT TGTTrGCTrC A.AGTGTTAAT CAAGTATTGA TCTTrCTAAA TTNTTTTCGT TTI'ACTCTAA AAAGTGTTAA AGAGCACCAC CAGTrATAAT AGAAAAGGAG AAACAAATCA TACATT'rGAA GAGCTI'ATrC TTTGGTTGGA AGGTATGGTT AGAAGCGTAT GGGGGTAAAA 1179 TATTTTAAAG AAAAAAGATI' CACTTTATCT GGATTTCGGT TACTAGAAAA ATTTCAAGAA ATAATATGCT TAGAAAGGAA CGAATACAAA AACATATTTT
CCTTATGTGT
TCAATGA'rGT
AGTATAAAAC
TGAAACAAAA
AAGACCAAAA
TTACTGCTAG
ATCTAAACGT
TAATCAAAGA
ATTTGTTAGA GAGGTAGATA AATGGAATTG GTATAATAAA AATATTTrAA CTTGAATTAT ACAACCGATT GTTTCTAGAA CGAAACAACA GTI'AGAAAGA TTGGCTAAGT TGTCGCCCGA CTGTGCGTCT TCATTTGCGA ACTTGATTAA GAATATTGCT TGCAGTTGTr ATCATAAGGC TAGTGATAAA AAAATTrGTC TAGATTC'rTA AACTAGTGAT GTGATAAAAG ATGTTACGA AGAAGCGACA CCTGAATACA TTTIATGTTGT ATATTrTAT ATTAGAGAAG TTCTGGGATT AGTCAAGGTT CTTGCAGGAG CGCTTGCTAG TTAAATTTAT ACTCTTCGAA AATCAAATTC AAGTGCTGTC TGTGGCTAGC TTCTTAGTTr ATGGTAGTTA T'rTATGGCAT AATAATATTG CTATAATATr TGTAGTGGGT AAACCACTAT AAAGTCCCAT ATGA AGTTTATGCG AGTCGGATGT TGATGGGTAT TCTTTAGCAG CTA'FTACCAC AAGAATACCT TAATTATAAT AATTATATGG ACAAACTGAA ATGCCAAAAG ATCGTTTAGT ACCATGCAT'r.
AAAATATAAG CCATATCGAA AAACCAAGTC AGCTTCGCCT GCTrTTTGAT 'rTTCATTGAG ATTTGGGAGT 'rATAGCGAAA AGATATTATG GAGCCTATTT
TGGCTCTCTG
ATGCGCTITTT
CTAATACCAT
TTTTAACTCG
ATTCAGATTT
ATGCATTTTT
ATTGAATTAT
TGCTGTACTC
TATTACTCTT
ATTTTAGGTT
ATTGTAGAAA
5760 5820 5880 5940 6000 6060 6120 6180 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6854 INFORMATION FOR SEQ ID NO: 201: SEQUENCE CHARACTERISTICS: LENGTH: 3895 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201: TCCTTGCTAA GTTTATACTC AATGAAAATC AAAGAACAAA CTAGGAAGCT AGCCACAGGT TGCTCAAAGC ACCGCTTTGA GGTTGCAGAT AAAACTGACA CGGTTTGAAG AGATTTTCGA AGAGTATTAA TTTACATAAA TAGCCAGTGT TTGATAGGGT TTGAGTAGAA TTTTCTCAGA 1180 CACTTCTGCA TCTTCATAGT TTGATATCAA AATCTGTCCA TTI'T~AGA CTCGGCAA GTCGATIT~cA CTTCTTTAGC ATAAAAGTTA TTGAGCACTA G1'AACTTTTG ATCCTCAAAC TGGCGTTCAA AAGCGTAGAC TTGTTT-GCTA TCTTCAAAGG CTGGTTTGTA ACTTCCTTCT GAAATGATTG GCATTTCCTT ACC-ATCGAA TCAAGTCTTG ATAGAAGGTA AAAATCGGAC CCTGGATTrC AT'rCTACA TrrGATGTATT TATAGGATTr ACCAGCTT'rC AACCAAGGAG TGCCTGTTGA AAATCCTGCA TTTTCCGAAG CATCCCACTG CA'rGGGAATG CGTGAATTAT CACGCGACTT AGCTTGAATA ATCTGGAAGG CI'CTTGCTG ACTCTTTCCT TCTI'CTAAGA GCATCTGATA GGCATTAAGC GATTCGACAT CCACATAATC AGCCATAGAA TCATAGTC'rG GGTCAATCAT CCCGATTTCC TCACCCATGT AGA'rATAAGG TGTCCCACGT GACAGGTGAA 'rGCTGGCTGC TAGCATGGTG GCTCCTTCCT TGCGGAAGTT T'rGAATATCG ACAAAACGGT TCAAGGCACG TGGTTGATCG TCATTTCCTT ACCCCAACTA AGGTCCACTT TTGTCCATCC ATAATTCCTG ACGATCAGGC TTTCCCCAAC TGTCATAAAG AATAGT'rATG AACGATGGGT CCACTGAAAC CTCGTCCTTA CCTTGTCGCG CCAGAAATTA AGTTAAGGTC AGCCTGGGTC AAGGCGTCCA TGCAGAACCA TGATI'ATT-CC AAAAGAGGGC TGGTAAAGAC TCTCAACTC TTATAGTCCA CCTTGAGGTG ACTCCAACCG TCTTTATCAC TTCAAAATCA AAGGGAGCCA ATGAAAAT'rA AAGGTCATGG
GACGAATAGA
CTA'rCGTCGG
TTGTCTGTAT
CCGATCAAAT
ACAACCTTGA
TCATCAAATA
CCAAACTTAG
GGACACAGTT TTCCATGGTG GTAGAAGACA ATCCAAAAGT GGCTTGGTTC ATCATACGCA AAGCTGGC'rT CCCTTCATTT TCAGGACAGT TGATCACATC AAATCGGAAA CC74MTACAC AAAGCTCCTT ACGGACATTG GAATTGCGCC GGTGAAGATA GTATT'rCCCA GTATCCCCGA ACTrGCCA.ATC TGTTGGTTGG TCTTGGATGA CTAGGGCTTT CTGAAACCAT TCATGCTCTG TAAAGTCAAT CTTGTGCTCT TTACCGACAC CAAAAAGAGG ATCCACTGCC ATATAATCTG 840 900 960 1020 1080 1140 -120 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980
AGA.AAAAGTC
TCGAACA-ATG
ACACCATTTT
T'rGATA.ATAC 'N'ATCACCAG ATTAAGTACC ATGTCCAGCA CTCAAAATCA GCCATATCAC AAATATCGTA ACCAT'rATCC CGTTGAGCGC T'DGGATAGAA TGGA'rTGAGC CAGACCATAT CCACACCTAG TTTGGCTAAA TAGGGAATTT TTTCGATAAT CCCACGGAA.A TCCCCAATAC CGTTTTCAGT GGTGTCTTTG TAAGATTTTG GATAGATTTG ATAGACTACT TTTCCTTTAT CAAGTGTCAT CTGTTTCTCC TTT'rC'rATA AAAGGGAGGA AGCAGTCTTC CGTCCCTATT TGTGCTATTI' CAATTATACT CAATGAAAAT CAALAGAACAA ACTAGGAAGC TAGCCACAGG 'N'GCTCAAAA CACTA'I TTG AGGTTGCAGA TAGAGCTGAC GTGGTTTGAA GAGATTTTCG AAGAGTATTA GATTCGTGTA GCGACCATGA GAGATGCTCC AGCTTGGATC GTTGTCGGAT 1181 CTTGGTTGGT GATGATAACA GGAGTTTCTG AAGTTCCGGG AATAGTCGCT GTA'rAAGCAT
TCACCAGACC
TAACGTGATC
CCATACCGAT
TGGTAGGGAA
GTTCAATGAC
TCAATTCTT'r
GTTCATGGTT
CGCCCTCTGT
CAACAATCGC
GAATACCTGG
TGCAGCCTTA
TCCTTGGACT
GTGGATGAGC
A'rGACA'rCCA
ACAAGACTT
AATTCAACTC
TATCAAAACG AATCAGT'rGC CAAAACCTTT GCCATCAAGA CCTCGTCAGA GACAATGCCG TAACTGGAGA GGTCAACTCA
TGACCAACTG
CCTACTGTAT
ATGGCATGCT
CCTTGGCT'rG
GTCGCTTGAC
2040 2100 2160 2220 2280 2340 AAGAACCGTC ACTGTCCCAT TAGACCTTGC CCCATGACAC CACT'rGGCCA GTTAGTGGGC CACAAATTCT GCTTCTTCTT TTTTGTAAAG AGACCAGCCT AACTAGCATA GT'rCCTGCAA CAAACCACCG ATACCAATAG CTGA'rCAAA
TGATAAT'TTC
GAGCAACGAA
TGCGGAAGAA
ATGGCAGCAT
AAGCCGCAGT
CAGCAACAAA
AATAGGATCC
TACCGAAGTA
TTCTGCCTGC
GAAAGTCAAG
GTATTGAGGT
TACATTAAAA
TGGATAAATA
AGTTCTACTG 2400 AAGTTCGTAT 2460 AOCA'rTGGA.A 2520 TGAATAGAGA 2580 GTAACGGATA 2640 TAT'rTTACGT 2700 GCAGGAkAGTG 2760 ACGGCTGAGC 2820 GCATCCGCAA 2880 ACATGCCTGC AAGGGCTGAA CCAGTCATCC TAACCCCAAA AAGAGCTGGT TCTGTAACAC AAACCTGAGC CTCACGCTCA TCATGGCGAT CTTGAGCAAT ATTAGAAAGA GCAATCATTG TCAATTGTGT ATCAATGGCA TTGGTCATAT CGACATAGGC TGAAATGGTT GCATGAAATA ATAGGCAAAC GCCATAGGGC AGTGCCACCA GGTGCAGACC TGTGATGACA AATGGAGCGT AGAGGGCGCC AAAAATTGCA CCGAAGAGCC ATTTAACTGG ACCAGTTAAA CCTGCCAAGA CAACTGATGA AAGTCCTTGT CCAATTGTCC AACCGATTGC TCCCAAAACA GTATGAGCCA AAATCAAGGC TGGAATCAAT GACAAGAAAG GTACAAAAAT CATAGAAATG ACTTCTGGGA TATGCTTGTG CCAGAAGATT TCAAGATAAG ACAGACTCAA ACCTGCAAGC AAGGCTGGGA TAACTTGGGC TTGGTAACCG ATACGATTAA CAGTAAAATA GCCAAAATTC CAAACCCAGT TT~GCCGCGAT ATCAGCTGCT GGCGTTGAAG CAACCGCATA GGCA'rTGAGC AACTGAGGI:G ATACCAAACA GATTCCGAGA ACAAT'rCCCA AAATTTGGCT GGTTCCCATC TTACGAGAAA CAGACCAAGT AATCCCTACT GGTAAGAACT GGAAGATAGC TTCACCAGGC AACCAGAGGA AGTGATTCAC ACCTGCCCAA AACTGAGAGG ATTCTGTGAT GGTCTTGCCA TCCAACATCG ACCAATGGAC ACCTTCCAAG ACATTACCGA AACCGAGGAT CAATCCTCCG ACTATCAAGG CTGGAATAAT CGGAGTAAAA ATCTCCGCCA GAGTGGTCAT AACACCTTGG ACCACGTTTT GATTACTCTT ACIGCAGAC TTGGCTGCT'r CTI'TGGAAAC ACCCTCAATA CCTGAAACGG CTGTAAAATC ATTATAAAAG ATGGGCACGT CATTTCCAAT GATTACCTGA AATTGACCTG 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 1182 CATTrTG'rAAA GGTirCCTTTA. ACAGCTGGAA TTGACTCGAT AGCTTTAACA TTAGCCTTCT TA'rCATCTCC TAAAACAAAC CGCATCCGTG TCGCACAGTG AGTTACGGCA GTCACA'rrT CTTTGCCTC!C GATrGCCTGA AGCAGATCTT TGGCTrCTTG TTCAA'rTr? 
CCCGG 3780 3840 3895
INFORMATION FOR SEQ ID NO: 202:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3936 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202:
AGGATCGCCG CTCCAGCTAC TAAGTCTCGT GTTTTTGATG ACACGTAGGA AGGTGCCGAT TATACTCGTA TCACCAATCC TACAACAGCT ACAGCATCAG GTATGACTGC AGTGACTTAT CATGTAGTGG CTGCTTCGAC TATTTACGGT CCTCGTTATG GTATCACAAC AACCTTTTTC GCTATCAAAG ACAATACCAA GCTTGTCTTG ATTCCAGACC TGGAAAAACT GGCAGAGATT GACAATACTT TTGCAACACC TTATTTGATT ATTCACTCTG TGACTAAGT'r TATCGGTGGG GCAGTGCCGA TTTATCAAAC AACATTTTTT CTGN'TGCCT TGAGGAAACC AGGGAACATI' GCCCTTGAAG GTGGTGTTGA AGCGCTAgcA ACGATTTTGG CGATTGCCCA 'rGCTGGTGAC
GGAACCTTCA
GATATTGATA
ATTGAAACCT
GCTCATAAAC
AACGTCT'rCT
CATGGTACAA
ATCT'rTTGAA AGAACCCCTT ATTGGAGGA AGTAGAAGCA TGGGTAACCC CTTGATTAAT ATCAAATCCC AC'rTGTGTCA CTCATGGCGT TGACATTGCC CTATTG-GAGG AATAATTGTC CTCAATTTGT TGACGAGGGT CAGCAGCCTT TATTATAGCT CACCATTCAA TGCTTTCCTC GCCATGTACA AAATGCTGAG GATAGTGGTC GTTTTGACTG GACGGCTTCA GGGAAATTCC CCAAGCTGCC ACAATTTGAG CTATAC'rCGT GATGTGGGTG GTTCGAGTTC AATTGCTTCG TGATACAGGT GCAGCCTTGT TTGCTACAAA GACTTGAAAC CTCTTCACTT CGTGTGGAAC a ACAATTGTTG ATTTTCTTGT GCAGATAGTC CTTATCATGC TTTACCTTCC ACGTCAAAGG ATCTTTTCTG ACCTTGCAAA ACCACTCACG GTCAATTGTC ATTCGTTTGT CAATCGGTCT TTGGAAAAAA TTTAAAGTAA CAACATCCT AAGGTAGAAA AGGTAAATTA TCCAAAACTT CTTGGCTGAG AAATACTTGC CAAAAGGTGT CGGTTCAATC TGGCGAGGAA GAAGCACGCA AGGTCATTGA TAATTTAGAA CGCGGCAGAT GCTAAATCGC TTGr'FGTCCA TCCAGCAACA AGAAAAAGAC CTAGAAGCAG CAGGTGTCAC ACCAAACTAA 'rGAAAATGTA GAAGATTTGA TTGAAGACTr GCGCTTGGCC AAGAAGATAA ACAGTGGGCT TCGACTCACT GTTTTTGATT 900 960 1020 1080 1140 1200 1260 1183 TTCCCTCAGG CATGATATAA TGTACAGA AGTCTAGAAA TCAAATCTCC CAACTGTGGG GAAGTCTTTA CAGTAAATGA TGTCCCAAGT GAGAACGGCA GAGTTTGA'rA AGGAACTACA TGGCCT'IGGC TGAGCAAAAG GCCATGAATG AGCAACAGAC AAGAAATTGC GCAATTACAG ACTCAGATCC AAAACTTTGA GAGGAACGAT ATGAACGAAA GAGTCAGTAT GCCGAACTCT CCATAGGATG AAGCAGGAAC TAAACTGGCT CAGAAGGATC TACAGAAAAA GAATTGGCCA TAAGGACAAG GAACTACAC AAATCAACTA CAAAAGACCC ACTACTTTT~G CAGGAAAAGC AGCCCAGCTC AAGGCAGCTA AGAAAGAGGT TGAACAGACA AGCCATGAGG TCTTAGAAAA TCAGTTGGCT ACCTTGCGTT T'rTCTGACCT AGAAAAAGAA CGGGATCAGG A.AAATGAATT ATCTTTGGC'r TCTGTTAAGC GTGAACAAGT CGAGTTTTAT AAGAATTTrTA AAAGCCTAGA ACAGTATGCA GAGAGTGAGT ATGCTTACTT TGAGAAGGAT AACAAGGTCT CTCrCTGGC
TGGAGCATGA
TAAAAACCA
AAAACTACGA
AGGC'rCAACA ATCTACAAAA GCGATTGGGG TTAACAAGGT TCGTAGTTTC GCC'rrTCCAA C7rCGCGT~C t
TCCGTGAGTG
AAGCGGACGG
ACCG'rCGGGA
ACTACT'T'AA
GTCCTCA.ATT
AATACAAGCA
AAGATTTGGA
TTGGAAAAGC
TCCTGACCAC
TTAAAAAAT-r AGTAGAA6AGC
ATGATGCGGT
CCTTGGATCC
TGATGAAAAT
AACAGAGAAG
GAAGAACTGT
GGACTTGAA6A TCATTTCTAT AAGCACAAGA ATGCAGATTT GAGTATGCCG TTTTGGTGAC GTCTAAAGGG GACTTTATCT CATGTTr'GAG ATGAAAAACG TTACAAGGAA TTGGACAAGG CATGCTTGAG GCTGATAATG 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 CACAGGGATT GTTGACGTCA GTCACGAGTA CTTTATCCAA TrrGArrGGTC TCTTACGTAA GGAGTTGGCC TTGGTTCGCG AGCAAAATAT TGCCTTTAAG CTAGCTTTTG CTAAGAACTA TATTGATGAA ATCGACAAGG CCATCAAA CG ATCTGA-AAAC CAACTCCGTT TALGCTAACAA GACCCGGAAA AATCCAACA-A TGAAAGCGAA AAAAATGAAC GGTATTATTA ACTTAAAAAA rTTTTAAACTG CGTAAGATTT TGGGAACCAA GGATGTGGTG GGTGTTTTGC CGATTGCCGT
TGAAAAAATG
TGCGGCGCTA
TGACATTACG
TAATTCAGCT
CATGGAAGAG
TATGTT'GTTC
AATTCCCTAA
CA'rTTTGAGG
TCCACTAACT
GTTAAGAAAT
CAAATTGGAA GA'rGTCTCTG GT'rCGAAGCA CTGAAGGGGG GGAAGCAGGA ATGACCTCGC GAAAATTGGT CATGGTGGAA TGGCAAGGCG ACACGCA'rGG TCGAGTTTAT GCAGGACGAG GGTAAGATCT CGAAGACTGA GGATGCTAGT GGGGAAGTGG ATGAAAAGCT TGTTGATGAA GCGATTGCTA ATGAGr.GGGA AATCACTCTG GGCTATTCCA TCGCAGAAAC CCCTGTTTTG TCTCTCTTGG GCTTGACTGG GCCTATTACT CAGATTCCCC GCAAGCTCTA TGAGTATGCG CGTGCTGGTC CTATGTATTIC GGCAGTTAAG GTTAATGGTC 1184 AGGAAGTGGA GCGTCCAGAA CGTCAGGTGA CCATTTATCA ?r'PCrTATGA TGGcCAACTr GCCCGATTCA CTTTTCGTGT ACATCCGTAC T?1'GTCAGTT GATTTGGGTG AAAAGCTTGG ATTTGACTCG TACTAGTGCT GCTGGC?1'AC AATTAGAAGA AT TTGAGCGA ACAAGTCCGA AAAATGCAGTr
TTATGCGGCT
CGCTCTTGCC
AAAGGGACGT
CATATGTCCC
TTGGAGGAAA
TTGCTGAAAA AGTAGAGGCT GGGCAA'N'AG GTGACCTTGT CAAAGTTTTC CTAAGTCCAG TrAT'GAGCT AGACCAAACG GACAAAGAAC CCATTCTAGA AAAACGGGGC AATCTCTATA AGGAATAAAA ATCGGGTGAT AGATAACAAT ATGGTTTTGG GAATTATAAT ATTCCAAT-rG ATAGAAATrT AGAGGTGTGA AATGAAGCAA TCCATTACAG GTTCTGATGG GAACTTAGGC GAAATGAGTT TCGAGAAAGG GTGGAGCAAC GTGAGTTGAG TCACCTGTTT CGTCTrGCTA AATCGGTCAT GGCCAATTTG AGTCAAGGGT
ATTTTCTCCA
AAGAGGCTAC
TGGCTGCCTT
AGCCAAGGAA
TGCTTGATAA
TCCTTTAGAG ATTGGGACAG AGAAGTTCGC TTTGGTCGTT rGAAGATGAT AAATTGTTAG GGTTTTTAGC TAGATCGTTr AACCCCATAC TAATAGTAGA 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3936 TTGCGAGTTG TAGGTACTCA TTTAAAATrC TTTCAGATAA en...
CCAGGATT'rG
TC'ICAACA
TACAAAATTT
TGTCACTTTA
GTGTGATAAT
AAAAGAAATA
AGACAGAAAT
rrTrATCACC
AATAATCTAT
ATATTTAGAG
TCCATGATGC
AATGA-AAATA
GAAAAATACC
rATrATTArr AGGCACCTAA GTCTGTCATT GAT'rTTGGT'r TATGGA
INFORMATION FOR SEQ ID NO: 203:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3230 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203:
CATCCAGCAA CTGCTCCTCT GAGCGTTTCA AAATTGATGT AATTTTTCTA GTTTTTTCTA ATAAATGTGC CATT'ITTTCAC CTCGAATTTA ATCGCTATCA TTATAACATA AAAACGTCTC TTT'TTCAATA AT'rATCTGAA AATTCCTTAT TGACTTGCAT TGACTTACAA TTTAATTAAA AACCAGAATA TTTPTAATTA AATTGTTCCT T'PTCTATTGA CAAGTTGCCTV ATTTTTGTGT ATCATAATAT TATAAAAGAT TATATAAAAA AGCGATTCAT AACAAAGCCT GTTTGCTTTT ACGTTTTTCC TTATCAAAAC
AATATAATAA
TTTGAACCGC
CGCTTAAAGT
TTTTATT'rGT CTTTTCACAT TCGGTCTCCT TTTTTCTTAT TTATCGCCTT TGTTACGAAT ATTGCGTGGT TTTTTATTAT CCTTACGGTA GATCGTTGCC ACGACTTCCT ?I-rTTGAACT CATCACGGCG
ACCATTGCCA
CTTAGCTTTA
AATCTCCACT
CAATTCTTCT
CGGCGATCAC GCTCTCGACG GTCGTCCCCA CCACCGAAAC CA? rACCTGA TGG'rTAAAC TCTGGAAGGC TATCTGGGTC T'rGGACTGTc GGAGTAAACT CAGCAGCCAA TTTGCGAGCA CGACGGCCTC CACGACCTCC GGTAGTGG tT TN'CACGTGC AGACTCAAGA TATACATTGC TCCTTACCAA ATTTCTCAAA 0 0 o 0 GTTGGCACGA ATGGTCAT CTGCAAAATC TTTT'GATTGG AAGGAT'rC'T CTACACTTGC CAAGTr'rTCA ATGATTTGAA GGTAACCCAT ACCTGACTTA CCAGCACGAC CTGTACGACC TGGAATATCG TAGTTGTAGA CATGGGTCAC GTCTGTCGCA ACCAAAACAT CAAGAT'rGCC TTTGTTTTGG TCTAGGTCGC CATGAATTCC ACGAGTCAAT TCATCCACAC GGCGTTTGGT TGCCACATCC ATGAGACGAG TCATrGGTGTC GTACTGGTCA ACCAATTCTG TTGTCAATTC TTTCATAAAC TGAACACCGA TACGT'rTGAT AGTTTGACGG TTCTCAGGTA CACGGGAAAT GTTAAGCATT TCATCCGCTT CGTCAAGGAT CTTGCGTT'rA ATCAAGTCCA AGAGGCGACC TTTAAGAGCC TTAATTTGTr TTTCAATGCT TCCCTTACTA CGACCAAAGC GGAAGAGTTC TGGAGCGATG ACCAAGGCTT GGATAGTCGC CAAGCCAAAG GCTGCAGTT.T TTCCTGTACC TTCAAGGGCC AAAGGAATAG TTTGTTCTTG T'ICAATTTCT GCTAGCAAAT CACCAGACAA CTTTCTAA.AG GTGGTGCGAA GCCACCCTAT CGTATTTTCA TATAACTAGA TATAAAATCG TGTTTTCT'rT GCAACCTATC TAGTATAACA ACGTTCGATrr TTC?1'GAGAG CTACCTGT'TT AGGTTTGAGA CCTTTCATGC GTTTCTTAGT 'rTCGTTTGGA GCA.ACAAAAG TAATAGATrrG GATACGGTGA ACATAAC'rCT CAGGATCTTG ACCTGAAATA TCCAAACCAC GCGCTGCAAC AT=AAAG TCACGAAGGA CACGAAGACG TTCTGCACGG AAGCCACGAA TTTT'CAAACC ACGACCAAAT ACAATAGCGA GTTCTGGTTG
AAA'TTTTCT
CTTAGCCGCA
GGCATCTGGC
AATGGCT'rCG
AAGGGTT'?CA
TGGAGTTCCC
TGATCCGCCA
TTCTTGAC'TT
TTCTTCTGTA
TGTTCC= AA CACGGATATA ATCTTGACAT GTTCAGGGGC ATAG'rTGCTG AGAAAAGCAA ATGTCTTCAA GGAAGCCCAT ATGTCTTGTA ATTTCAAGGC ACCACAATAT GGGCACCAGA TATACTGAAC GGACTTTGAC TGGACAGCTA GTTCACGAGT CGGATTTTT CAAGGGTAGG 780 840 900 950 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100
AGTCTGAGCT TGACCGATAA GATAGGACTA GCTTCTACAA GTTTAATTCA TTAAATTTCA AGGGCTTAGT TTATAC'N"TT
CATCCTTGCC
AACCAGCTTT
CGTTATTCTT
CTTTTTATGA
TGTTGCTTCT TTTCCACAAA AGAAAAGTAC CAAGACCAGA GCAAAAGATA GCCCCATTTC TACAGAAAAT CATGTAAGCG CTTTTTGACT TTCTTTmG ATTGAACGAC C'rAGATAATA AGACAAAGCC AAGGCGATAC TGTATAAAAT GAGAAAAACG AACAAGGTTT GTGTGTACGA 1186 ATGAGCCATT TTATAAGTCT CTGCTA.ATAA AATAGGTCCC GCTWACCAG CCATTGCCCA AGCTGTTAAA ATATAACCAT GCAGAGCGGC CAAT7'CCTTG GTTCCAAAAA TATCACTGAG ATAAGCTGGA ATCAAAGAAA AACCAGCTCC ATAGCAAGTC ATCAAAATAG ACATAGCAAC TACAAATAAA ACGGAATCTG TAAAGAGCCA AAGTGAGAGA GAAAAGAAAA GATTGACAAG CAGTA.ArATA CTAAAGGTTA GAGGGCGACC GATATAGTCA GACAAACTCG CCCAGAGCAA GCGACCAAAT CCATTGAAAA TCCCCAAAAC ACCCACCATT ACTGCTGCAT GACTTGTAGA CAAGCCAGCC ATCTCCTGTG CCATTGGCGA TGCCGCTGAA ATTAAGCCTA AACCACAAGC 'rATGTrGATA AAGAAAATAA TCCAAAGCAT TGCAGCCATT CCTTGCGTCA AAGAGGCTGT AAGCTCTTGC TCATTrGGAC GCTTAATGAA ACTTGCTCCT AAAATATAAA AAGTTTCTAC TATGGGACTA GTCAATAAAG AAGCAAAACC ACCACGTT'rA TCAGGAAACC ATTTTATAAT CAAACCAAGC CCACCTAAAA TGCCATAAGC ATTGCAAATC CTGTTAAGAT ATTrCCACCT ACTTTCGGAC CAAATTTTTC TACCAAACGC ATAAAACCGA TTGCTTTT'rA GAGCCTGATT TTTTTCTN'TC CCTGAAGAAG ATAAAATGCC TTGTGAAGCT AGGAGCATGA 'rAATAAAGTA AAGCCCTACC CC!TGCGATGA GGTGTTGCGC AAACCCCATA ATCGCTAAAC CTGTTGCGAG CGTCGACACA GGGGTAATAT AGCCTGCTCC GAGATACAAC AACCACAGCT CTGACGGTCT GCGTATAGAA AAGCAGATAG ACT'rCCCATG CCCATAAATG CAGCCGATAA GCCCAAACAA 2280 2340 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3230 AAGATTGCTA GACTAAAGGC GAAGGCAACA GAAGCCTGAT CCCATCCCGT INFORMATION FOR SEQ ID NO: 204: SEQUENCE CHARACTERISTICS: CA) LENGTH: 5096 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTPION: SEQ ID NO: 204: CCTATGAAGA CTGTCCCAAC TGGGTGTCCT TCTAGGCTAT CTGGTCCTGC CACTCCAGTC AAACTAATTC CAAAATCAGA CTGGGTCTTG CTTCGTGCCT GCTCAGCCAT CTTCTGAGCT GTAAATT1CAG ACACCACACC ATGTTCTTCC AAATTCT'rGG CAGGAATATC CAACATCCTT GATTTTTCCT CCAAGCTATA GGTCACAAAA CCACCCTTAA ATATACTTGA AACTCCAGAA AAATTCGCCA CGGTAGCTTG GAAAAGACCT GCCGTCAAAC TCTCTGCAGC CGCGATGGT? 
TTCCCTTGCC TTTTCAGTTC TTCTACCACA ATGCTGGCTA AACTAGTTTC TTCCCCATAA CCATAGCAAA AGTCTCGTAA AGAAATTCCT TCGAAAGTCT GGCAGTCCAA GATTTGA'N'T TCCAAGATAT CCAGCGCTTG ATTCGCCTCT AGAGTGACTT CTCCTGTCTT GGCATAACGGG AAATCAGCCA AAATCGTAAC CAACTGGCTC GAA'rACAGCT TGCTCCCTGT CATCAACT'rG AATTCACTTG GCGGACCTGG AAGGACGACA CCAACAGCCA GTCCTGTTTC GTTTGGCAGT CTTTCGTTAT TCGGTGTTCG GGCA'rAGTCT TCCTGACCT GAGGATCAAA GACTAATGCr GTTAGGTCGT CCTCAGTTGG CCCCAAACCG CTGGCAATCT CAAGCAAAGA CAAGAGACGA TATACATCTA CCCCAATCTC AGCTAGTT'TT ATCTGCCCTG TCAAAATCTC- TGTTCCAACA ACCTATCTAT TCGTATTTT'r TTGAAAAAAT ATTTGTATCA AAAGTTAATT ATCTTCATCA AATAAAACCG CATTGGTTTC AAGCTGAGTA AGATTTTGCA TTTCCAACAT ATGT1GCTCTC ATCTCTCC2'T CACCC'rGCAA CTGCTGAGTT AGCAAGTTAG GGATACTTTT TGCAGACAAA GCTTTTAGAA TCTCTTGATA ATGACCAATG TTTTGAATAA CTTCACGTTG ACGT'TTTGA CGCATTCGTG CATAGACTAG GGCTTCTTCT GAAATAGTAT TAAATTCTTC TTGGTCACTG ACTGTAATAC CTCCTACTGC ATCCACTAGT TAGCGATCAA TATGGATATT CATCATTTTT CCATCTGCAT ATGCTGAGTT CAGTTTCGCT CGCGTCAGAA TATCCCGCTC TAAACTCATC GTCATCAAGA TCATGCTATC ACTTCTACCG 1187
TCTTGACTGC
GCCAAGGTAG
TCGCCAATCC
GGTAGAAGTT
TAGGTCACTC
GGAATCGCTC
GGTCGCAGGG
TTCCCTAAAA
~C GTCAAAA TCACCAGACT GCTACGT'rGA ACTTCATTGT CTCCTACAGC CGTCTGAAAA TCCGACAAAA ACTGGGCATT GGTGTTGACA GCAATGAT'rT CTGCTTrCAT GTTTCCTCCT CGCAGGAATT TTCCTACGAT TGATT'TTTTT CCAACAGGTG CTCTGCCAAA TAAATCTTCA ACTTCTTCTT GTCCCAAAGA ACGTCGGAGT GAAACAATCT G-GTA-AGAAAC ACCTTGAAGT TCAATGGTTT TAAATGAATC TTTATACCCT TCAATATTGG TCTGCATATT GTCACTCAAA CTATTTAAAC TGAGAGCTTT TTCCATGACT CGACCATAAT CCCCCTCAGG ATCTTGGTAA CCCCCAATAT GTTGCTCCCC AACACCGATA ATAGAAATTG GGAAACCTAG GATATTATTG TTTTGCAATC CTCTCATATT GACCATCACA TGAATGGTTT CTATAGCAAG CTCTGCTCCA TCATGAGCCT GACCATTCCC TGATTCAATG ATTGTTGTT'r TTTTCGTTTT AGGAT'TCACT ACCCAAGTTT CAGTTCGTrTC AACATTTCCG TAGCCTTTGT TGACAGACGT GATCGATCTG ATTATCAATT CAAAGAAACG AAGAACTCGG GGTTAAGAC CATGGGTTTC CGTCTACTTC TAATTTTCCT CTTCTACAAT TTGAGCT'rGT TAAAAAAGAT ATCCAACTTC ATT-TAGCrAG GGTTTGTTTG 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 GTGTCCACTC CCATTAACAG TCACCGATTT TTTTATAGGT AATGGTTAGA GGTTCAGTCG TTTAGCTAAG GTTTCTGTCC CTTCAATAAC CTTGGTTTCT CTTGTTGATA AATAGTATAA GCAAAAACAC CTACTCCTAC ?TTTAACCA TA'IrTCTACT CCCTTCTTCT ATATATCCC A'rAAAGATGG ACTGCTGCTT GCCACTTGC'r TGTGCCCACT CCAATATCTr TTTCCAATCA ATCAGCTGTA ATATTTACAA CTGATTGTCC GAACTAGCTT AATACCATCT CCGTCTAGGC CACTAATTCA GCTGCATCTT GAAGCTCCTC TAACAATTTC CATCTGCTTC TTTTAGAATT GATTT'CCCCA CTCAATAACA AATCAGCATC TCCTTCAATA ACTCTCTCAC GATAGTATAG CAAGTCCTTT AGTAAAGGTC CATTCT'rTGC TAATAGATGG TTGTGTACAT ACTCTTATTA TTATCATCA'r AACATCCATA 1188 TACAGTTACA GAAAGTAAAG AACCTATCAG mTACCCATC CACGCTCTTG GCTACCTTCA GATTrACGAGT TTGGACAGTC CTATCGCTTC TTCTACCAAC
CAATGAAGAG
TACCAGCAAT
GCTTGTTGAG
ATCTCCAATA
TTTGCCATTT
GAATATTTCC
TGGTAAAGTC TGTCTCCAAA 'rGGGCTCTGC TTCCCTAATG TCAGCACGCA AACCCTTTGC TCCAATTCTA AATAAGCATC GTCACGCCGC CACCAAAGAT CGATAAACAT CTAGGTGATA GTGGGACTT'r 'AATCATTTG GTTTTACCTG CACCCAGTTC CCCAAACGCT CCCCTAAGGC TACCAAAAAC TTTTCTTTTG AAAAACAGGC TTTCTCTAAA CTAGCACCAT TCCAATAATr AAGTAAACAT CGATAAATTT ATGACAAAGC CATGCTITrrG AGTTGG AGAC GACGCAGAAT AAACTTCCCA AGCCATTATT TGACGGACTC TCTrACGCTG AAGAATGCAA GTAAGGTTAT ATCTCCTCAC TAGTCAAGAG CTCACACGA'r TTAAAAAGGC AGCAATTCAT ACTCCATATT CTGAAAATTT AAACGGCGTC TGGCAAGGCA TCTCCTAAGA AAACTrCATCC AAGTCGATAG AAGTGGAAGT CGACCTTCAT AGAAATCTGT AATCCTTTTG TCCAGTTAAG ArTAAAACAT TTGCAACTCT TCTTCATT TGTCTATTTT CCTAC'rAAAC AGAAAATGAG CGTAACAATG 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 ACCAATACAA GATCTCGGAA AATATGACCA GGGACAAGAT AGGCTGCAAA AAACAAGCCC GTCATTGGAT TTCTTAAGAA AAGAAGTGTT CTGTCCAGCA TAGCAAGAAA ATCGCGCACG TCTAGCCCAA ATAGGAAAAA GAAGGATGC GTATTTTI'GA TGAACAAGTT ATCTGACAAA -AGTAACATAC TGTAAAAAAG CTTCACCGAC TAAAAGGAA.A CT'rCCTTCTT AACCGAATTT AGTCCAATAT AAATCAGAAG TGAGACAATC GCTAAAATAG TCACCAACAC TGTCTTTTTT TA=rTTTCA AGGGTAAAAA AATCAGCAAA AATAAAAAGT CAACTAATTC TTGCTGCAGC ACAAGAACAG CTCCTAACAA ATTAATTAAG TTCTTACTGG CTAGGACACT ATGGACTTCT TGCTTACGGG TATAAAGATA ATTTACTCCA GCACAGATTC CTGAAACGAA AACCATGCTT CCGATGAAAA AAGCTGTACT TTGTTTAAAG GACAAGATGC ATTCCTTCCA TAGGAAACAG CTACTCAAAC TGATTTGAAT TAAAGCTAAC AAAAATAAGA TTCTCATTGA TTTCATCTTC TCTCTCCCTT CCTACCAATC ATTATACTAG GAGAAAAGAG AGAACTGTTT CTAATCTTCT 3780 3840 3900 3960 1189 CAAATGTCTC 'N'TAAGACGC TAAACAAACA CTAGAGACTA ATACrCAATG AAAATCAAAG ATCAAACTAG GTAGCTAGCC ACAGGTTGCT CAAAAcAGTG TTTTGAGATT GCAGATAGAG CTGACGTGAT TTGAAGAGAT ACGGGTGGTT GTTTTGTCTC TAAGGCTGAT AGTATTAATC TATAATGTCA AGATTAACTA AACAT TAAAG GATTGTTATA AC'TTGTTCTC TGTCTAGAAA AATTGGTAGC TrGTCTGTC T GCTTCATTAT ATTTTTCCTT ?NCGAAGAA TATAATrG AAATCATGAA AATCCGTCAA GCACCTCACG GAGCGAGACG TAACTATCAG CtTmCAGGTT AACAGTATCT AGTTCCTTCA AATCTTACAT AACTCTCTTG TTTGGCTCCA GCATrTCCTA TTGTTTACAG AGTTCAATCG 
TGATACTAGG TAGTGAGCGT GACTCAGAGT CACATAATTA AT'N'AACGTT TCAGAAAAAC AATAATTTTC TATCTTCATC CTTCTATATA ATAAT'rTTTG CAAGAATAAG TAGAGGAGCC TTTCAAGAGC TTCTTGGATG AGTTGTAACG AACTCTGATG TAGCCAAATA AAAACTCrTG ATGGTCCA.AA TTTTGTCT GAGTAGTTTG CC'TCATAT'rC TTGTTCACGA CCCACTAAGG TTCAACGCCT TTAAATAAAT CAGAGTATTT GAAGAGACTT
GATACAACTC
AATAGAAATT
TTAATAATAT
GACGAAATTG CCTCACACTT AC'TGTCATAT TGATAGAAGT CAATTATAGA TCAAGGTAAG TTCGGTCTTC TAATGTTAGA AAAGTGCT'rC GTTCTACTTC AGATATTCTA AATCGTCATA ATTTCTGTCA TCTAATAGGC GAGCAGATAC TTAGAGAGGT TAGACTTAAC TTCGATTTGT TCATTGAAAA AGTAATCCAA AGTCGTTGAG AGAGTTTGAA TAACAAGTCT GCGGAGGGA-A TAAAATGACC TTACTAATCT GGCTTTGTTC ACAAATTCCT TCTGCAAGAG TTTGTrGGGA
INFORMATION FOR SEQ ID NO: 205:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2395 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
TATTAAATGA
AGATAGAGTA
AT'TTrCCAAT
TTTAATCCAT
TATTTTATAA
ATGTTTGAAA
AGGGACTTCA
TCTTTCAATT
GAGTCT
4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5096 120 180 240 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205: ACAAGATAAA AATAAAGGAT TACAATGGGG AATATAAAGT AAACCGGTAA ACCTAAA.AAG AAAGGAGAAA AGATGAAAAT TGTACTTGTA GGGCATGGAC ATTTTGCTAC AGGGATTTAT AGTTCTTTAC AATTGATTGC AGGTAATCAA GAAAATGTGG AGGCGATTGA CTTI'GTGGA.A GGAATGTCAG CAGATGAACT CAAGCAAAAA ATCTTACTTG CAATTTCAAA TGAAGAAGAA 1190 GTTTAATCC TAAGTGATCT CTTGGGAGGA TCGCCATTCA AGGTTT'CTTC TACCATAATG GGAGAAAATC CAGCCAAGAC AATGAATGTTI CTCTCGGGTT TGAACTTAGC CATGTTAA'rG GAAGCAGTCT T'rGCTAGAAT GGCTCATAGC TTTGATGAGG ?I'GTTAATAA ATCAGTAG'rG GCGGCCCACG GCGGAGTCGT AAAMCGTA).A GAATTGrT' CAACGGATGC AGAGGAAGAG GAAGAAGAT'r TCGAATCGGG TATTTAAAGG GTAAAAGAAT AAAAATAAA ATCGCCTGAG CGCTTC'rTAG AAGTACCACT GCCAGGCAAT CGATAAGGTT AT'rCGGCAGT TAGAACTCAA ATTTCCCGAC GCCAGCTACC T'rTGATAATG TCTATCCAAT CCAATGGTTT CTGGACAGGA GAACTGTGGT TGGCTTATGA GATAAAAAAG GTTACGATTG TCTGACGAAA GAAGAAGTCG CCTTGACTAT TrrCAAGGAAG CATGGATAAC ACGGAATGGA ATACAGTCAA CAGGATGCAT 'rCGTGTCAAT AAGAGAGTAG TTGTATCGCT GAATATAAGA TTAAAAACAT CGC'rCATAAA AATTGGATCA CCATGATCTC TAAATGGAGA TGGAGAGGCT GCTATCAAGA AAAAGGTGGT ACCGTTTGAT TATCGACTGC CAGGCGATCA AAAATACTAC TAATCCGTGA TGACGCTTCG CCTTTAAAGG TGTAACGAGA cATGGGGAGT CTATGGTATT ACTTGTr'TAA GGGTGTGACC ATTGGGATTT CATTTTTAAT TCGCCGTCTG TGGGATTCAT ATAT'rTATAA ACATGCTA'rG AATGTTCTTT CTTTCCTGGA GGCTTCTTGT ACACACCGTC AGAGAAGCAA CCTTGAAAGC rTTTATTCAAG CTTGGGGAGA TTGCTCAATA TCCAACTCT'r GATATTGCAG AAAGCCATTT TCCTTCCACA CCTTCTATTT CAAGGGTATA GTGATGATTC CCTTTGACT'r ATCGTCACTT AAT'rATT'TCT TGAATCGTCT
TGCAGATAAG
CTTGGGCAAG
ATTCTTTGCT
CTATGCrTCA
TGATCCTGAG
ATGCTGGGCA
AAAAGACGAG
GCCAAAAGAT
TTGATTGAAC 960 AAAGAGCATT 1020 TATCAAGAAA 1080 GCTAATAATG 1140 ACAGGTCA.AC 1200 CGTGGTCAAT 1260 tCCTGCTTTG 1320 CATGTGTCCT 1380 GCAACAGCTA 1440 GCTGACAAAG 1500 TATGCAAATG 1560 CATTCAGGTA 1620 CTTATCCGTT 1680 GATGGTAGTG ATCAATCACG AGATTCTTCA GAAATGC'rAA AACATCTCCC AGAGGTGGAT CATGCCATGC TTCGTTCCTT GATCGAACAT
ATCAATTTAC
AAGGAGTGGA
CCCTrGGTGGG ACAAGTCTCC TGAAGGCAAT ATCTGGGGTG TCCACGGTGT GTACTCATGG ACTACTATTA CCTAGAAGCC TCTACAAAGA CTGGAACCTA 'rATTGGTAGG AGGAGAAATA TGACAA'rGCC AAATATTATT ATGACCCGTA TCGATGAACG GTTGATTCAT GGACAACGAC AACTTTGGGT AAAATACCTA GGTTGTAATA CGGTCATTGT TGCCAATGAC GAAGTAAGCA CGGACAAGAT GCAACAAACT CTGATGAAAA CAGTTGTGCC AGACTCAGTT GCCATGCGTT TCTTCCCTrr GCAAAAGGTG ATTGATATCA TTCACAAGGC TAATCCTGCT CAAACGATCT T'rATCGTTGT AAAGGATGTG AAGGACGCTT TAACCTTGGT AGAAGGTGGT GTCACTATCA AAGAAATCAA TATTGGGAAC 1740 1800 1860 1920 1980 2040 1191 ATTCACAATG CCCCTGGTAA AGAGCAAGTG ACACGCTCCA TCTTCCTGGG TGAAGAGGAC AAGGCGGCCC TCAAGGAATT GAGCCAAACT CATCAAGTAA CATTTAATAC GAAAACAACT CCAACAGGAA ATGATGGAGC TGTTCAAGTC AACATTA'rGG ACTATATTTA ACAGAGGAGA T1CGT'rATGTC GATTAATGTA TTTCAAGCGA rrTTAA'IrGG ATrATGGACA GCTTTCTGrT TTrAGTGGAAT GCTGTTAGGA ATTTACACCA ATAGATGTAT TGTTCTGTCA TTTGGTGTCG GAATTATTCT AGGTGATCTG TCATGCTCTr GCAATGGGAG CCAATGGTGA ATGG INFORMATION FOR SEQ ID NO: 206: SEQUENCE CHARACTERISTICS: LENGTH: 3342 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206: 2100 2160 2220 2280 2340 2395 a. *4 CCT'rCTTTAG AGGTTAATTT CTGTTCGCAA AGAATCTCTG ATAGCATTGC CATCTGTTCC TTCAAAATAA AAAATTCTGA ATTAGGTTAG CTCCACCTCC TCATTAGTAA TGTTTCTATT TTGAAATATA TTTTTTTTCT TAAGATAAAC ATCTTCCCAC
TGCAAAATCG
ATATGTTTTT
ATCAAT'rGGT
TGCGTACGGC
CCCAAAGAAG
ACTCATTTGA
GAAATAGAAG
TAAAAAATGA
TCGATTGTTA
GAATCTr'rTG
AAACATACCG
ACAAATCCCA
TAGAACACCA
CAATAACCGA
TATAAGGATT ATTATAGAGA AATACAAAAC TATCTCTCTA TAACTAGAAA AAGAATTATA AAAGTGCTAA TATTGCGACA AATTCCTATC ACTATTTT ATGCTAATAA CACTGGAAAT AAAAAGGGAG TAGCAAGCAT CTCTAGTTTA CCTAGTTCAT GTAATGTAAT TGATATTAAC GAAATTAAAA TCAATCGAAA ATAATAGATT AATGAATCAT
AGGAACAATA
ATACCATTCC
AAAATGTTCG
TAATCCGTTG
TCCATCAACT
ATCTGTCCts
TTACTCTCAG
CACCCCCCAA
ACGGAATCAA ACATAAATAT ATGACAGAGT TCTAAACTAT TAGCTTCAAA AAGGCGTTTT GAATCATAAT TTTCTAAAAT TAATTTTAtG TTTTGTACTT AATTTTCCCT TCAAGTACAT GAGCCTCTGC AATATCTTTG AGTGAATTGG CCATATATGA AAATATATCT CTAAGATATT CAACATCTAA TGTTACAACA AACI-rTCCAG TCCTTTCAAT AAAGTT'TTTT GTGTCCACAG
TTGGAAAAAT
TATTTAATAT
TTCTCCCAAT
TATCAATAAT
TTTCAACATA
ACATCTTCTC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 TCTGGTAAGC TCTI'TCTTGA CTTCAATTTT ATAAGTTGCC TAATTGAAAC TTGGTGTAAT CTGACACATT ATCAGAGCCG CTAATCGAAA AAGATGGCTC ATACGTTTTG TAAATATACA GGAGAAGAGA TAATTATAAT TCAGCATTAC TTTGCCTATC ATTTCTAGCT CTGAATTGAA ATGATACTTG AAAGTCTCTT GCATATATAA ATATT1'TATT TCAAGATAGT ATGGTACAAA ATTAAACCAA AATCAA'rACT ATAACAGTAG CATATACCTC TTATTATCTC AACGAAAAGC AGTCAATCTT GGTATTTTTA CTATAAATTA AGCTAGATTT
ATCAGACTCT
AATTCCTTTC
AGGTGTCCTG
AGTATACTCT
CATTAATT
AACAGACTTT
ATTTTTGGAG
AAGCCGTTTA
TACACTATTA
TATTGCTTAA
CAAGTAATGA
1192 AATAACTCTT TTTTTATAAC ACCTCCATCA TTAAACAACT CTTCTGAATC AGAATTAGAT AAAGATATAT CAACATTATT TCTACTACAA AAAGTCTTAG AGTTATGATT TCGCACTCCT CATCCTCTCA ATTTGAATTT AGTAGATTTT TGTTGACTCA CATN'ATACA TATGTTTTGT TAATTTTGAT TTTAGTTTAA AATCATTTCT GCAATTAGAA TAGAACTTr CTTTATTATA AAAATATTTTr ATAGAATTAC ATATTAAACT TGAGTGGACA CCTCTATTTT AGAAACAAAA GGGGATAACT ATCTTTTTGT CATTCTGATT AATACCAGTC ACACCTGTAT ACAAAGAAAA ATACTCTCCT TCTTTTGATT TATTCATTAC ACTAGTATCC ATTTCTTTCA TGTAGTCGAT GCAAAATGTA TCATTATCTA AACTAGCTAC 4 V. CAGTGCGATA TACCTTAAAA AAGTATAAC ATCTGGGAAA TTGCTTGTTT GGACGATACG AACACTACAC AATAAAGACT CCAATTCCAT GTAAAAATTT ATTATGGCCA TACTTCCATG AATTCCCTCT GGAACACTTT GGGGATGATT 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 AACTAATGTC CCAAATTCTC CACTACACCA CTTCAAAGAA TGAATTTTGA TTTTCTCCCr AGGAACTAGT TGTAAAATTA TTTTT'rAAGT CTTGTCACTT TATAAATATT TTTTAATGTA AAAATTACAC
ATTCTTTATA
CTGATAGTCC
Vol., 0 0 ATGGCCAAAA CTATATCCAA ATCACCTAAA CTTA-ATACTA TCTTACCAGT GTATTAATTA ACTTGCAAAA TCAGTCTTTT GAGAGTATTT GTATCGTAAT AGTTAAACjCA AATGTGGTCT ATTGTACAGT TGAGTATATG TAATATTCCA GCTCTACCCC ACCTTTAAAA TTAACAGGTA CAAAGAGTCT GCAATATTT GAG'IrTTTTT CTATTAGATA AAAATAAATT TCAATGATAT AATTACTATT ATCTCTCTCG GCCTTAGAAC ACGTTCCTTC AAGGTAGAAG ACCATTAATA CAAGCTCAGT TAAAACACTC CCTCTATAAT GGATAGAACA TAGATAAAGA AACAGATGGC ATGATTTATC TTTCAATAAT TATACATATC ATTMCCCGTT CTTACATCAT TATATAGCGT TCTATTCCTC TCCTATAATA TAGTCAGACT TGTTTGAAAC TTTATATAAT TTAAGCATGC ATGAAATATC CTATATCCCC GGAATTGCAG ATAACATTTT TTTACATAGT ACATAAACAG TGTTCAAGAC ACCATTTAGA TCATAAATAT TATTAATAAC G'rAACTAAAC TTTCATTTGG TTTCGCTGAT CAAAGCTTGG TCCAAATTAT TATTAGGTAA
TACTCCAAAT
CTACTTCATT
AGT1'TAATTTr
CAAGTTGCTT
TGGATATTCG
ATGCAGAATA
ATATCCTTTT
TTCTAAATTT
1193 ATATT'rCATA AAATAGTCAT ATCCAGAAAA TTGATG'rAGG GAAATAAAAT GATTTCCAAA ATCATCGTAG ATTXTCATTGA TATrTGTATC TGTATAAAA.A ATCGGAATAT CTAATAACCT CATTTGTTCA CATTCGCTTG CTACAATACC TTGATTAGAA AACTrATrGC TCCAGAGATT TTCCAATGCT TTN7CTCTAT CTAACATTTC TTCATAAAAA TCAGGATGAT ATAAAAAAGA TAGTACTGAA GCATAGCTAT TTGTGTCTCTr AAAAAGTACC CTTGTCTTTA AACCATACAA.
GTrTGCTTTT AATAGCATTr TAAATTCTTC TGTTTTATTT AACTCTTCAA ATA'rCAGATA AAAATCCCTA AAACCTTTTr TGAAATCTTT TATATACTTA TCAAATTCTA TATCACCATC CCGAACAGGC AGGTTTCC CACCTTCAAA ATCAATT'rTC CCAATATCAA ACTTTACCTT ATCAGTATTT AAATTAATTA AAACTTGACC AOGGATCCTC TA
INFORMATION FOR SEQ ID NO: 207:
SEQUENCE CHARACTERISTICS:
LENGTH: 3454 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
(D) TOPOLOGY: linear
o (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207: GAGAAAAGAA TGTTAAAGAA AAATGATATT GAAGGGGCAG GAGTTGCCAA GGTAGATGGT AGTGAAAAAA TTCTCATGCG TGTCCTCAAG GAAAAATACC TTGTCCAGTC ACCACACCGT TCAGGAATCG CGGATTTAGG ACACCTTTCT CAAGTCAAGG ACAGTCTCTA CAAGATTGCT CTTGGTATGG AACATCCAGT CAAGTATCGC AATGGTGTCT TGGAAACAGG ATTTTTCCGT GATTTCTTTA TCCAGGATCC TGTCATTGAC GTAGAAGTTG AAATTGTTGA TTTGACCCAT TTGGTCTTTT TTGTAGAGAA TGCT'rTACCG GTCAATAAAA AGATTGGCTT TGGAAAAGTT AATCAAGATC TAGAT'rTGGC TTACCTGCGT TATCCAGAAC AGCTCAAGTT TAAAACCAAG
GGAATTGCAG
AATAAGGCGC
AAGAATTCC
CAAGTCGTAG
CGTCGTTTTG ATTTAAAACC TTATGACGAA A.AGGAACAGT GTGGTGCGTC GTGGTCACTA TTCAGGACAA ATCATGGTCG AAAGTTTTTC GTGTTGACCA ATTGATTGAA CAAGTTATCA TCTGTCATGC AAAATATCAA CGACCAGAAT ACCAATGCGA ACTCTTTATG GTCAAGACTA TATTACGGAC CAGATGTTGG ATGTAGAAGT TGCTGAAAcG AGGTGCCCGT TCGTCGAGTG ATAACCTCAT GCCCCTTGAA TAGCTCTTCG AGACCTGCTC CTGGATTGAT TCGGAATCTT TTTTGGTGAC AACTCGTCCA AGCAGTTCCC AGAGATTGTG TTTTTGGTAA GGAGTGGCGC GAAATGACTrr CCAAATCGCT 1194 GGCCCAGCCT TITACCAACT CAATACTGAA ATGGCGGAGA AACTCTATCA AACAGCCATT GACTTTGCAG AGTTAAAAAA AGATGATGTG ATTATTGATG CCTATT-CTGG TATTGGAACC ATTGGTI'TAT CACTCGCCAA GCATGTCAAA GAAGTCTACG GTGTTGAACT GATT1CCAGAA GCAGTAGAGA ATAGCCAGAA GAATGCTTCI' TTcAACAAGA TTACTAATGC CCACTATG'rC 'rGTGACACGG CTGAAAATGC CATGAAGAAA TGGCTCAAGG ATCTTGGTTG ATCCTCCACG CAAGCGCT'rG ACAGAAAGCT AAG4GTATTCA ACCAACCGTT TTATCAAAGC AAGCGCCCAA ACAGGAGCCG A'rCGCATCGC AAACTATACC AAGAGTTGGG CAAACGCATC ACGTCGAGAC AGTGTTGAAA TTGAGCTGA GCTCAAATCA AAGAATATGT GCACAGATAA AAAAGAAATG GA'rAAACAAA TTATTCCACA AGACACTTCA AAATGAT'rTA CATTCCTACT TGGrA'rAGGA TAGGCTCAAT ATAAACATTG GATAAAGGCA ACAAAAAAT CTATATCTCC TGCAATGTCG ATATGAATTG AAGAAAGTCC GGTAGCACTT TTGTCCAAAC TGAGATGGAT TTGACA.AGTG TTGGAATAAA TTTGAATTAA CAACCATGGC GCGTGATATT
TGGAATAGAA
GTGTACACCT
ATAGAAAAGA
ACAGCTATTA
ATAGGATCAT
TTAGGATAAA
TTACGAGAAC
GAAAAAGAAG
ATGACAGTAT
TTCCTTTCI-r TTATATT'r
TTTGCTAAGT
AGCCG~GTGGA TCTATTTCCT TCGATGTCGA TAAGCACATA CGGAGAGCAA AGCAACATAT AAGTTTCGAC ATTATA'rATT ATTACAACA.A GTCTAAAAAG AAGCCATCAT GGATGCTTTG ATGACTTTCT GCATTTATTA GCAAGGTA'rC AATTAGAAAA AA6AGGAGCGT TGAAATGATT TGTATGCCTC TTTTATGAAA 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 AAAGATAAAG AGGTTTATGA TAAAGTTTGT ATGGAGGTGC TTCAACTTGC TTGTTGGTTT TATGTAAATA TAAGGAG'rTC AAGACTTCTA GAATATCTTA GTCCTCATTT GAATAAAGAT CGTGTCATAA CAGTTATAGA GGCAAATAGT CCAAAGT'rTA AAACTCAAAA AATAAATAGT TGGTGTGC'rG CTTACAATAT CCATTTTAAT AATGGATA'rT GTAAGCAGCA CCCCcAtGA.A TTTAAAGATT CTTTAAAGAG TCTTATTTTrG TGATGAAAAT TTAATATGTA AATCTCAGAC GATAGAAAT'r AAAAACTCTA TCGTC7rT'T TATACTCAAA ATTAGGAGGT AAAAATGGTA AGGATAAGAG GTCCCACTTA AAACAATTTA TGGCAAAATA AGGACGGAAT AACACAACAA ATTCTCTAAA ACAAATCACT AAATCAATGT AAGAT'rGAAT GAAATCAATA TTTATGCTAT AATTAAATAA ATTTAATGAA GAAAAAAAGA GGGATATTAT GG.CACTTAAC TATAAACCAT TATGGATACA GTTAGCAAAA AAAGGACTAA AGAAAACAGA TGTAATAGCT ATGGCAGGAC T'rACAACAAA TG~rATGGCA CAAATGGGA6A AGGATAAACC AATTACATT'r AAGAATTTAG AAAGAATATG TAAGGCTTTA TTAcGrGACGA GGAATAGAAA TATT1AGTTTT GAAGATAATT TCTTGCACTC CTAATGATAT ATGACTT'rAA GGACAGAAGA TCAAGTTAGG GAT1'ATGCAA GAGAAGTATA GGCTTTAATG AAGTGAAGA AAACATCAAT CAAGGTACTG GTCAAATAAC 2700 TACTrTTAAT CAATTAGGCT TCAAGGGATA TTCAAATAAG CCAGATGGTT1 GGTATrAcC 2760 TAAAAATATG AATGATGTAG CAATAATCCT TGAAACAAAA TCAGAAGAAA GAGATATTAG 2820 CAAACAAATT T'rTATTGATG AGTTAATGAA AAATATAGAC ATAATTTAAC TAAAAATAAA 2880 AACTAGATCC TTTTTTGAAA AAATTATATT ATTAAArTTG TAACTGTATC TATTGACAAT 2940 GATAATTA'N' ATCGATACAA TAGACTTGAA ATATGTTTAA GGAGTTTTTA TGAAAaCAAA 3000 TTTTTTCTAA TMGCTATTTT AGCTATGTGT ATAGTTTT'rA GCGCTrGTTC 'rTCTAATTCT 3060 GTTAAAAATG AAGAAAATAC TTCTAAAGAG CATGCGCCTG ATAAAATAGT TTrAGATCAr 3120 GCTTTCGGTC AAACTATATT AGATAAAAAA CCTGAAAGAG TTGCAACTAT TGCTTGGGGA 3180 AATCATGATG 'rAGCATTAGC TTTAGGAATA GTTCCTGTTG GATTTTCAAA AGCAAATTAC 3240 GGTGTAAGTG CTGATAAAGG AGTTTTACCA 
TGGACAGAAG AAAAAATCAA AGAACTAAAT 3300 GGTAAAGCTA ACCTATTTGA CGA'rTTGGAT GGACTTAACT TTGAAGCAAT ATCAAATTCT 3360 AAACCAGATG TTATCTTAGC AGGTTATTCT GGTATAACTA AAGAAGATTA TGACACTCTA 3420 *TCAAAAATTG CTCCTGTAGC AGCATACAAA TC'rG 3454 INFORMATION FOR SEQ ID NO: 208: SEQUENCE CHARACTERISTICS: LENGTH: 3752 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208: CGGGAGTATA CTTAATATAA TTATAGTCTA AAAATGACTA TCAGAAAAGA GGTAAATTTA GATGAATAAG AAAAAAATGA TTTTAACAAG TCTAGCCAGC GTCGCTATCT TAGGGGCTGG 120 ***TTTTGTTACG TCTCAGCCTA CTTTTGTAAG AGCAGAAGAA TCTCCAqAAG TTGTCGAAAA 180 0000ATCTTCATTA GAGAAGAAAT ATGAGGAAGC AAAAGCAAAA GCTGATACTG CCAAGAAAGA 240 TTACGAAACG GCTAAAAAGA AAGCAGAAGA CGCTCAGAAA AAGTATGAAG ATGATCAGAA 300 *GAGAACTGAG GAGAAAGCTC GAAAAGAAGC AGAAGCATCT CAAAAATTGA ATGATGTGGC 360 *.*GCTTGTTGTT CAAAATGCAT ATAAAGAGTA CCGAGAAGTT CAAAATCAAC GTAGTAAATA 420 TAAATCTGAC GCTGAATATC AGAAAAAATT AACAGAGGTC GACTCTAAAA TAGAGAAGGC 480 TAGGAAAGAG CAACAGGACT TGCAAAATAA ATTTAATGAA GTAAGAGCAG TTGTAGTTCC 540 TGAACCAAAT GCGTTGGCTG AGACTAAGAA AGTAGCTAAG AGAAAATATG ATTrATGCAAC AGAGGCTAAG GAACTTGAAA TTGAAAAACT AGTTGCTACT GCTCAACATC AAGTAGATAA TGATGA'rGGC ACAGAAGI'A TAGAAGCTAA TAAACAAGCT GAGTTAGCAA AAAAACAAAC TCCTGAAGGT AAGACTCAGG ATGAATTAGA AAAAGCTGAT GAACTTCAAA ATAAAGTTGC AATATTACTT GGAGGGGCTG ATCCTGAAGA 1196 AAAAGCAGAA GAAGCTAAAG CAGAAGAAAA TCTAAAGGTA GCACTAGCCA AGAAAGAAGT TCAATATGAA ATTTCTACTT TGGAACAAGA TTTGAAAAAA CTTCTTGCTG GTGCGGATCC ATTAAAAAAA GG*GAAGCTG AGCTAAACGC AGAACTTGAA AAACTTCTTG ACAGCCTTGA TAAAGAAGCA GAAGAAGCTG AGTTGGATAA TGAT'rTAGAA AAAGAAATTA GTAACCTTGA TGATACTGCT GCTCTTCAAA ATAAATTAGC a a. a.
a a a a a a a a. *a a a a TGCTAAAAAA GCTGAGTTAG TGATCCTGAA GGTAAGACTC TAAAAAAGCT GATGAACTTC TGAAATATTA CTTGGAGGGG AGCTACTAAA AAAGCTGAAT GTTAGGCCCT GATGGAGATG AGCTCCTGCA CCAAAACCAG TGCACCAAAA CCAGAGCAAC AAAACCAGAG CAACCAGCTA ACCAGCCACT CCAAAAACAG CAAAAAAACA AACAGAACTT AGGATGAATT AGATAAAGAA AAAATAAAGT TGCTGATTTA CTGAT'rCTGA AGATGATACT GAA.AAACTTC ?TGACAGCCT GCAGAAGAAG CTCAGTT'GGA GAAAAAGAAA TTAGTAACCI' GCTGCrCTTC AAAATAAATT
TGGAAAAAAC
AAGAAGAAAC
AGCAACCAGC
CAGCTCCAGC
AGCCGGAGAA
GCTGGAAACA
'rCAAAA.AGAA TrAGATGCAG CTCTTAATGA TCCAGCGCCG GCTCCTCAAC CAGAGCAACC TCCAGCTCCA AAACCAGAGC AACCAGCTCC TCCAAAACCA GAGCAACCAG CTCCAGCTCC ACCAGCTGAA GAGCCTACTC AACCAGAAAA AGAAAACGGT ATGTGGTATT TCTACAA'rAC AAACAACGGT TC-ATGGTACT ACCTAAACC AGATGGAGAT ACCTGGTACT ATCTTGAAC CAAAGTATCA GATAAATGGT ACTATGTCAA CCAATACAAT GGCTCATGGT ACTACCTCAA CCAATACAAC GGTTCATGGT ATTACCTCAA TAAAGTCAAC GGTTCATGGT ACTACCTAAA 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 TGATGGTTCA A'rGGCAATAG GTTGGCTCCA TAACGGCGCT ATGGCAACAG GTT'GGGTGAA ATCAGGTGCT ATGAAAGCAA GCCAATGGTT CAGCAATGGC GCTATGGCGA CAGGCTGGCT CGCTAATGGT GATATGGCGA CAGGATGGCT CGCTAATGGT GATATGGCGA CAGGATOGGC CGCTAACGGT GCTATGGCTA CAGGTTGGGC TAAAGTCAAC CGCTAACGGT TCAATGGCAA CAGGT'rGGGT GAAAGATGGA AGCATCAGGT GCTATGAAAG CAAGCCAATG G'rTCAAAGTA CAATGGCT'rA GGTGCCCTTG CAGTCAACAC AACTGTAGAT TGGTGAA'rGG GTTTAAGCCG ATTAAATTAA ATCATGTTAA
GGTTCATGGT
GATACCTGGT
TCAGATAAAT
ACTACCTAPA
ACTATCTTGA
GGTACTATGT
GGCTATAAAG TCAATGCCAA GAACATTTGA CATTTTAATT 1197 GATTGAATAG ATTTATGTTC TTGAAACAAA GATAAGGTTC TTATGATTTC AGGAAATG'rC TTAGAGAAAA TGGTTGTTT TACTCACATC ATTCACATAA CTCT'rCCTTG TCTTCTCTAA TCGTGAAGCA CACACGACCT GTGATAACTC TAAAACACGT CGGCTCTTTT GCTCAAGGAG ATGACACAGA TGAAAACGGC TGGCAGACCA GA'rTGGCAT'r GCGTTTTTGA GTATT'rCCTA TGTGCAACAA GGAAATCAAG ACTATGTAGC GAC'rGGGCAT TGCTTCG'rGG CGTGGACAAT AACAACTTCA AAAAACCATG TAGCAGAAGA AGCAGGCCTT
ATTAAAAAAA
TATCTATTAT
TCTGTATATT
CCAAGCGTGT
TCAATCGTGA
G'rTGTCGTGG
CAGGGCTACG
GTCTGTACGG
CGACTCATTT
AGTrTTGA
GACTATAAGT
TATAATGAAT
ATAAACGAAT
GGATGAGTGG
ATGTGATCGG
CGACCGAAGA
GTATTCTN'A GGTACCTATC TCTCTAACCT GAAAAATAGA ATGAAGMTAA GAAGAAGGTA TTrAAAAAAC AAT'rTTTAAG ACTGCTCAAG CGACCTTCAA AGATGGGAGA CTTACCATGA TGGTGTTGAT TCGTCGGTGA CCCTACTACT C'TGTCAATTT GCGGAATACC GTGCAGGGCG TTCPAGGCCT TTTTGCACTA
TATCCATG
TTACAAGGAT
TGAAAAAGAG
CACGCCAAAT
TGCCATAACC
AAGAACTGGG
GTGGTTGCGG
TACTGGGACC
CCGGACGTTA.
TTGGGGGCAG
TATGCTCGAG TGGCGCGTGA. TGAGGATGGT ACCGT'rCACA GGCAAGGATC AGACCTATTT CCTCAGCCAA CTTTCGCAAG TTCCCACTAG GACATTTGGA AAAGCCTGA.A GTACGCAGAC TCGACTGCTA AGAAGAAAGA CTCGACAGGG ATTTGCTTTA TCGGAGAAAA GAACTr'rAAA AACTTrCTCA GCAACTACCT GCCAGCTCAG CCTGGTCGCA TGATGACTG'r GGATGTCGC GATATGGGCG AGCATGCAGG TCTTATGTAC TATACAATCG GTCAGCGTGG CGGACTCGGT ATCGGTGGGC AACACGGCGG TGACAATGCC CCTTGGTTCG .TTGTCGGAAA AGATCTAAGC AAGAATATTC TCTATGTAGG ACAAGGATTC TACCA'rGATTI CGCTCATGTC AACTAGCCTA GAAGCCAGTC AAGTCCACI-r TACTCGTGAA ATGCCAGAAG AGTTTACGCT AGAATGTACG GCTAAATTCC GT'rACCGTCA GCCTGACTCT AAGGTGACrG TTCATGTCAA AGGAGAAAAG ACAGAGGTCA TCTTTGCGGA ACCACAACGC GCGATTACAC CAGGACAGGC AGTTGTCTTT TACGATGGCG GG
INFORMATION FOR SEQ ID NO: 209:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 3580 base pairs
TYPE: nucleic acid
STRANDEDNESS: double
TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209:
TATTTATATT TTITTTATCTC TGGCATACTT AGTGCCTTT'C CACCTCITT TATCTATAAA TTTAAACCTT TCTGTCTTAG TTTGTCTrTA TAGTGGTTTT GATAGGTCTT ACCACTGCTT ATATCTGTCT GTGAGTTTAA TTTTTCTTTT ACATTTTTGA AT'rCTCCATA AAAAATGGGG TCTTTATCTA TCTCTATATC TTTCCATGTA CGGTAATTTA ATTTTGATAT CTGGCAATAC TGATACCT7,r TTAGACTTAA AGTC ITTAAT GATTCTCCTA CATCATAATT CATTTTTNTA TCTTCTTCAT ACCATN' AA GATTGTCACA TCCATGTATC TGGATAGTTT ATTTATCANT AGATTTTTAT ATTCTTCTr'r GTGGACTTTT TATCTATCTC
GCTTAACCTT
TCCCTCTCTC
ATTTTTAAA
ATTCCAATCT
TGTGCTAGAT
TAAATAGAAT
CATATCTAGG
GGAGTACCTC TACTGTCTAT ATTTGATCTT TATATTCAGT GC'rACT'rCTT TATTCAATTC GCT'rGCCTAA TAAT'rGAAGT TTTATTTT1r AATTTTAAAC AATGAATTTT a a a a caa~.
0.~9 a. *a a a a AAAGACTGCT CCTAAAA.ATG ATCTATATCT TCGTAGTAAC AGTTTTTCCA TCGAGTAAAT TTTCCCCI'TT TTTTCGGTAA CCTGTCCTTT GTAGGATGAG TTAGCTTrTT AACTAGTCGT ATATTGCTGT TATTCTCTAT CTTAAATTCT GTGCTGTATT CTAACTTTTG AGGGTCAGTT TTALAGAGTTA ACATGGTGCT ACGTATAATT TTTGCT'rCAA TTCAGGCTTT CCCTCTGCAA TCTTGCTGTT AGATTTATAT TAGATCAGTC AAGAGGTCTA CTCATCTAAA AGTAGGATAC CCTTTGACCC CCAGAAAGTT AACCATTGAT CTGTTTATTA AAACAGATAT AAAATTTTCA CTAAGATACC ATTGTCAATA ATCTTTTTGG AATAGATGAG TAAATATrTC TTTTTATTTT TATTTTCTAG AT'rTTCyTGA TGTACTTTCT TPTTGTTTAT ATCCTATTTT TCATTCATGA TGCCATTAAA AAACTOACCT CAAAATTTGC GACTTTTAAA TGCCAATAGG AATCATTAGA GATTAAAGAT ATCTTTAACT CAAGTTTACC TTCTTTAATT- CGTGCAAAAT CATGCAAATG ATAGTTCTAT TTGATATGAG TGTATCTTG GGCTAGGGCT
CTTGCTTTAT
AAAACTCTAT
TTTGTAGCAC
CCTGTTGGTA
GTTGTCTGAT
ATAACTTTTT
TATCACTCCT
TATTCTTTTA
CCTTTAGTTA
ATTTATGATA
AATTTTTATC
TAATTCTAGG
CTTAACTCGA
ATT'TTTCCTA
ACTTGAAGTT
GATCTTT~TTA
CTAATTTTAT
GTTTTTIGGCC
.120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 TGAATTCCAA TATTCAATTA GGCGAATTGG AAATAGGGTC AGTTTATCAT TTAGTATATC GCAAATAGGT AATCAGCGTA GTTGTCTTAT ATTTTTGGTT ATATCCAAGT AAGTAGTTrGG AGAGCTATCC ATACTCTTTG
CTTCAACTAG
?1'TCAAGGTC GTTATTTGCT AGATCT'rCAA CATTGGCCTT ATCTTNTCCA AGACTCTTAA AAGGCTTTCT AGCTACTGTT ATTGATTCAG GGATTATTGG TAAATCTTT TCTTTATAAG AATAATTGA GTAGGGGAAA CGACCACGGC TTACAAGATC AGATTGAGGT AATATAGCTA TGTGTTTTOC 1199 TTTATTATCA AGCAATACTT CTCCCTCTAA TGGCTTTATA AGTCGAGACA AGGTTTTAATr 1800 GAGTGTTGAT TTCCCACAAC CArTTGACCC AATAATAACT TTTTATATNT ATATTTTCCA AGAT'rATTTT TTCATCATAA CCACAGACCT TTCATTATAT ATTCCTCCTG TTCAITTTA GGTGAACCTA ACAAGCCAGT TACAACACCT ACTCCATATC GAGAATATGT CTGATAACAA AACTAGTAAA ATTCCAACCA CTTTTCTTGC CAATATTTAA GGCTATGCGA CCAGCTAAAA CCTGTAAT'rG AAGTAGAAAA AGCAGTTAAA GATACAGCGC GAAAGCTCGG GATITGCTCC AAGTCCGAr'r GCTATTTCTT AGTCTTTTAT TAAAAAATAA AACTAATATA GTAGCAATAA GATA'rr'TT CTTCAGGTAT CCGCAGGTAA GATTATTTGA TTAGTAAGTA TATTAAGTAT TAGCTGGTAA AATATTTTGA
ATCCAGCTAA
AAGATATACA
AAAAAATTAA
CACCAAGTTC
TACTTACTAT
GCCATCTCAT
TATTGGGCTT 2100 AGCTATTGGT 2160 AACAAGCCTT 2220 AATAATTDCT 2280 TAGAACAAGA 2340 AAC7?rCTTGT 2400 GACAGCTTGA 2460 AGCTAGTAAA 2520 AGT'rAAACTA 2580 GGTATG'rCAT CTAACTTTGT AATTCA'rATC 'rrGCTACTTT AAACCAATAC CTAATATTAT AATAATATTA AAGATGATGT TTTGTTTTTA ATACCAATAT ATTATATCAG GACTTGCAAG CCAAAAGACC AGCCAGCTAT GAAAAACTAG CTCCAGGAAC TAACTTTCAT CTCCGATAAG AGTGACCAAA ATAGTGT'rAT ATTAGTTATT AACCCC'1CTA TTGCAGTAAT TATCCCTACT CACATATAAG CAAGAGCTCTI
CAACAATAAA
CAGTCTTGCT
TAGTCCACAA
GCAAAAGACC
AGGA'rvrCT'r
AATTCCTGCT
AGTTTCACTA
TAAAATGAAA
TCTATTTT
T'rTTTCATAG TCAAT'rTCAC
GCACCTATAA
AA'rGAGGTGC CTGCTCTTGT GCTGAAAAAC CATCTrTTTTT GT'rATTGAAA TAATT'CCAGT AAAAGATAAA GAGCCACTGA GCTGCAATAG ATGAAGAACT TGTGACACCG AACATAGTTT GAAAGATAAA TCCTGCCAAT AATAATTTTG GTAATCTAAT TTCCATAATC TTTAAGACTT TAATCAAAGT TGAAAAAGAA AATGATAGAC TGATI'ATTAT TAATAAAAAT CTT'rTrTGAA TACCTATAAT TAAATTTTGC TTACATAAAT AAGTACTGGA CCCCCGATTA CTGGTTTACC TAACATACGG CCGATTATAT AAGATGAAGA AATGGTCATT GTGCGTATAT GAACTATAAG ACCTACGAAG CCAATAGGTC GCACACTTGC AATTATTGCA AGTGATCTTA CCATTTCATC ACCCATAGcT AAAGCGTTTA 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 CTTTGCTTAT AAATAAGCCA CAAAAGTGAG CACCAATTGC AGTAATACTT GAACATAAAA TCCTA'rTAAC ATTAACTCCA AGACCAACAG AATCTGATGA AATAAATATA GCTATCAAGT GACCTAAAAT ATATAGAAGA TAATGTAGCT GCTCCAAGGC TACCTATTTG AGACGT'rATT ATTrCGGTAAA ATT~AAAAAAC TTACAAAACT TATAAAAGGT AGTAGTGTAG CCAAAATCTA AATTTGTCTA GCTTAAAGCC ATACTAACAC 1200 AAGTTCCTGA TAAGGCAAGT TTTATAGGGG TAAGGCC'rGC TTTTCCGTTA CAGCAATCGC GTATACAAAA ATTGCACTrA CTAAGCCACC AATGAT'rGCG INFORMATION FOR SEQ ID NO: 210: SEQUENCE CHARACTERISTICS: LENGTH: 11378 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210: 3540 3580 CCAAATTGCT CCACAATTAT TATGGAGTCG TCGTrTGGCA TCAAGAATGG TTGACAGGCA AGATATTGAC CCCCTATGAT 00 .00.
0 0 CAATATTTTA ACCCGTCTTC CTATGCCATG GAAACACCTG TTTGCGTAA-A AATCATTTTA ATTTAGAGAG OACCATGCGA GACAOATAGT GGCTTGATTT GTTTGATGTG GCCCATATGC GACCTACTAC GGTTACAAGT ATTGTCTTAT TTGAGTCAGA TCGGGAGATT CATGGTTTGC AGAAATCGTA AAGGGGCAAC ATCGCTCACG TCCGTTGATG TAGATTTACT ACAGTCTTGG TCAGTGAAGT OATGGCTGAT CCATTGTCCA TGGAGATGTA ATTTAGTAGA TTGGGATTCG TCTGCCATTA TATTTCAGAA ACAATCAAAC GGTATTAAGT TTTCCA.AGTA TTATATGAAC GTCATTTCCG AGACAAGTAT GATGGGCGTG ATATGTGTGC ATGAATCGTA AGCAAATCGT ACACAATTGA GTCGTTTGGG CAGGAAACGG CTCCAGATGC TTACGTCAGA CTATTCCAGG CGACATAGTA ATTGGATTGA GTTCOCTTGA CCGATCOCAT CATCAGTGGA AGGAATGGTT AAATTGTATT GGTATGGTCA CAAGATTTAG AAAATGTCAA GGAAAGAGAA GATGAGAGTT AGAATTACTA GAGGCAAATC CCCAGTATGT GGTCCTCAAT 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 CCCTTGOAAG CCAAGGCAA.A ATGGCGGGAC GAAGTTGGAA GTGGAAAGGG TGCCTTTGTr AACTATATCG GGATTGATAT TCAAAAGTCT GAAGTTGGAG TGCCTAACAT CAAGCTCTTG TTTGAAGACG GTGAGATTGA TCGCTTGTAT TTGI'TGGCA ATGATAATCC CATTCATGTG TCAGGTATGG CCAAGCAAAA CCCTGACATC GTTT'rGAGCT ACGCTTTGGA CAAGGTGCTT TGGGTAGATO GTTCTGACT'r AACTGACTAC CTGAACTTTT CAGATCCATG GCCGAAAAAA CGCCATGAAA AGCGTCGTTT OACCTACAAG ACCTTCTTGG ATACCTTCAA ACGTATCTTG CCTGAAAATG GAGAAATTCA TTTCAAGACG GATAACCGTG GCTTGTTTOA GTACAGTPTA GTGAGCTTTT CTCAATATGG CA'rGAAACTC AATGGTGTCT GGTTAGATTT GCATOCCAGT GATTTTGAAG GCAATGTCAT GACAGAATAC GAGCAAAAAT TCTCAAACAA GGGGCAAGTT ATCTACCGAG TTOAGGCAGA ATTTTAAGAG ATAACCTAAA ATTAGGCTGT ACAAGTGCTT 1201 TTGCTTrTACA TAAGTTGGCA AACGTGCTAT ACTGATAGTA GGGAAATATC TTCGCCTCTT GCTTATGAGG AGGTGGACGC TAGTCAGAGA AGTTGTAGAA CCTGTCATAG AACcTCCTTr ATGGAAAGAT TGGCAGTGAC ATGATTCTCA GTATTTTTGT CTTGAACGAC ACGGCAGACT TCACAGAAAT TATCAGTCCT AGATCCCTTC CCAGAACAAT ATTTCCTAGA AATTACCAGT GAAAACCAAG GATGCCGTCG CTGGAGCGGT TGGAAAATAC AGCCATCGAT AAGCAAAAGG TCTr'rCAAGG AACCTTGTTG GACTATGGAA TATATGGACA AGACGCGTAA GAAAACCGTC AGAATATGAA AAGTGACGCG AATCGCAACA ATCGTAGAAT 'rGAACTCGTG GATATCGAGT AGATAAACCC GAAGAAMTAC GTCCTAGACA CCATCAAGCC CCAGGTTTGG AACGTCCTT 
ATCCATGTCG GGCTCTACCA GCCTrCGAAG AGGACGAGTT CAAATTCCAT ACAGTTTAGT
ATCAAAAGCA
AAAGTGAAGA
AGGGAATCAA
GCAGACGCTA
TTACAGTTTA
CGTTTAGCAG TTAAA'N'ATA GAAAAAGAAA GGATAGCTTT TGAGGATTCA AAACATCAGT AAAGAAATGC TAGAGGCCTT CCGCATrTG GAAGAAGACA AAAAGAAGAT ATCATCGACG CAGTAGTAGA GTCGCrCGT TCCGCrrATC TGGTCAGTCA GACAGCGTAG CTATTGACTT CAACGAAAAA ACAGGTGACT TACTGTCCGT GAAGTTGTTG ATCAAGTATT TGATAGCCGT 'rTGGAAATCA GCTTGAAAGA TGCTCTTGCC AAGAAGCACC AGCTGAGTTT AAAAAA'rGCG CAAgCAAACA AAATCATGTC TGGTACAGTA GCATCGAAGC CCAA'rTGTCA ATCGTATCGA AGTTTATGTT 'rTAGCCGTAG TCATCCAGAA ATGATGGAAC TGTTGAAATC CTGTTCGTAG CCACAATCCA CTAATATCAA GAAGATTACT GCATGGTACC AATCGAAGAA 'rTATCTACAA 'rGCCATCGCT GCAA.ACGTGC CTTGGTGG'T GACAAAACGT GCGCTTGGCG GCGAATrTTA ACCCATGGAA ATTAATTCAG CTTATGAACT TGGAGACAAA ATCAAGT'rTG GGTCGTGTAG CAGCCCAATC TGCCAAACAA ACCATCATGG CGTGCCATCA CTTACAATAC T'rACAAAGAA CATGAGCAAG GAACGCTTG ACAACCGCT'r TATCTATGTC AAC GGTA 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060
AAACAAGACC
TACAAGGTTG
ATGATCAAAC
ATGAGCGTGG
AACGTGGATG
AGCAAATTCC
AATATCGATG
CCTGCTGAGG
GTTCCAGATrA
GCTCACTTGA
GACGCTGCTT
AAATTCCTGG AGA.AGTTTTT GCI'CTCATG AAGACAACCC TCGTGGTGTG AACGTC'TTTG GTT'rAATGGA GCAAGAAATT CTCGTGAAGC AGGTGACCGT CTATCGGTAC AATCGTTGGA ACCCAGCTCG TTACGATGCT TTATCGAGTG GGTAGCAGAT
CCAGAAGTTT
ACGAAGGTT'G
CGTGGTGGTG
AAAAATGACC
CCAGCTGAA'r TTGACCAAGT TATCTTTGAT GAAAACGACA ACAAGCTTrC TCTTCCCATT GGTCGTCGTG CTGGTTACCG TATCGATATC AAGTCTGCTA CAGTAGAGTT GGAAGTAGAA AACGATACTG 1202 TAGAAGAATA AAAGCTGCTA GAGGAGGGAA AGATGAAAAC AGTCTGTTCTr GTCTAACGAA GTGATTGATA AGCGTGA'TTT AGGAAGGACA AGTCTTTATT GATcCTACGG GCAAGGCCAA AACTAGACAA TGCAGAAGCC CTAGAGGCGA AAAAGAAGAA GCATGGAAGT GGAAGAAAGC TXTATGACG AGTTGATCGC AAAGAAGAGA GTTGGCACTT GAATAAGCAA AAGATAAGTA CGAGCAGGGC GCATCATATC GGGTGAAGAA ?1'GG1'GGTCA AAGAAAAATC CCTTTGCGCA GCTCCGCATT GTCAAGAACA TGGCCGCGGC GCTTATATCA GGTCN'AAC CGCAGCTTTA 'rrATGTGGAT CACAAAGTGA ATCTCTTGGG GCTTGCTCAG AGGCCATTCA AGACGGCAAG GCCAAGTTGG TCTTTCTAGC TCATGATGCT GGACCCAATC TGACCAAGAA AAAAGTCATT ATTATCAAG'r AGAAATTGTA ACCG'rCTTTT CAACACTGGA GCAGTCGGGA AATCGAGAAA GGTITTTGGCT GTAACAGA'rG CTGGATTTAC AGGTCTCTTA TGGAATAGAA GAGGAGGACA TGATTTGTCT AAGAAAAGAT
GATTCAAGAT
ATTAAGCATA
AAAGAAAATG
TGTACGAAAT
CGCAAAAGAA CTTGGAAAAG GGATGTGAAA AGCCACTCAT CTTTAAGCCT GCAGCTGCTC AGAAAAGAAA GCCGAAAAAT ACCTGCAGCC CCAAAAGCAA AGCTGTAGCC AAGGAAGAGG AGCGGCTAAA CCGCAAAGTC GGCAGAGCGA CGCAAGCAAA TCAGAAAAAC GACGGCCGTA CTTTAATGAC CAAGCTAAGA GCAAGAGGAT AAACGTTCAA AAAGTAAAGA AGTTGTAGCG CGTGCAAAAG AGTTGGGCT'r CAAGTGTGGA AGAAGCTGTC GCTGCAAAAA T'rGCTGCCAG CGAAAGTAGA AGCAAAACCT GCAGCCCCAA AAGTAAGTGC CTGAGCCAGC TAAACCAGCT GTAGCTAAGG AAGAGGCAAA GTGCAGAAAA GAAAGCCGAA A.AGTCTGAAC CAGTAAAACC 4020 CAAAACCAGC TGAGCCAGTC ACTCCGAAAA CAGAAAAAGT 4080 CTAAAGAGCA 4140
GTAATTTCAA
ATAAGGGCAA
ATGGTGGAAA
AGCAGCAAGG
ATCAAGCGGC
GGCTGAGCGT GAAGCACG'rG TAACCGTGAC CAACAACAAA ACGGAAACCC ACAAGGTCAA AGCAACCGCG ACAATCGTCG TCAGCAAAAA CGTAGAAATG AGCGCCGTCA TCCACGTATT GACTTTAAAG CCCGTGCAGC CGCTCGTTCA AGTGAGGAAC GCTTCAAGCA
AGCCCTAAAA
GTATCAGGCT
CTTTGAAGAA
CGTCCCTGAG
CAAAAA'rCGT
AAGTAGTCAA
CAAAAAAGGC
ATTCCATGAA
GCAGAGCAAA ATGCAGAGTA GCTrAAAGAAG CCTTGGCTCA GCGGCTAAGT TAGCTGAACA AAAAAAGAAC CTGCAGTGGA GACGATTATG ATCATGAAGA AATCAAGTGA GAAATCAAAA AATAAC;AAGA ACAACCGTAA TTGCCAACAG AATTTGAATA
AGCTAACAAA
AGCACACCAA
TACACGTCGT
AGATGGTCCT
GAATAGTAAC
CGCAAGGAAC CAGAGGAAAT GTTCAAGCAG TGGTTGAAGT AAAAAACAAG CTCGACCAGA AGAAAACAAC AAAAGAATCG TGGAA'rAACA ACAAAAAGAA 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4R60 TCAGACTCCA AAACCTGTrTA CGGAGCGTAA TACAGATGGT ATGACCGTTG CGGAAA'rCGC 1203
AAAACGTATC
GGCCACACAA
TATCGAAGCC
AGATGGTTAT
AAACGTGAAC CAGCTGAAAT TGTTAAGAAA C?1'TTCATGA 'rGGGTGTCAT AACCAATCCT TGGATGGGGA AACAA'N'GAA CTCCTCATGG TGGATTACGG AAACAAAAGG TGAAGTGGA TAATGCTGAC ATCGAACGTT TCTTTGTCGA CTCAATIGAAG ATGAATTGGT TGAGCGTCCA CCAGTG~rA CTATCATGGG ACACGTTGAC CACGGTAAAA CAACCCTT7*T GCATACTCTT CGTAACTCAC GTGTTGCGAC AGGTGAAGCA GCTGGTATTA CTCAGCATA'r CGGTGCCTAC CAAATCGTGG AAAATGGTAA GAAGATTACC TTCCTTGATA CACCAGGACA CGCGGCCTTT ACATCAATGC GTGCGCGTGG TGCTTCTGTT ACCGATATTA CGATCTTGGT CGTAGCGGCA GATGACGGGG TI'ATGCCTCA GACTATTGAA GCCATCAACC ACTCAAAAGC AGCTAACGTT CCAATCATCG TAGCTATTA-A CAAGATTGAT AAACCAGGTG CTAACCCAGA ACGCGTTATC GGTGAATTGG CAGAGCA'rGG TGTGATGTCA ACTGCTTGGG CCAAAATATC GAAGAATTGT AGCAGACCCA ACAGTTCGTG
GTGGAGATTC
TGCAAACAGT
CGATCGGTAC
TGAATTTGTT GAAATTTCGG CCTTCTTGTG GCTGAAATCC GGTTATCGAA GCGCGCTTGG AGGTACCTTG ALATGTTCA-AG TATGACCAAC GACCTTGGTC CACAGG'rTTG AACGAAGCAC AGGTGCGGTC GCAACCCI'TC TTGTACAACA TGTCGGAAAT ACcTTCGGTC GTGTCCGTGC AGTTGCTGGA CCATCAACAC CAGTCTCTAT TGACCACTTT GCCGT'rTACG AGGATGAAAA CAAACGTGCC CTCA'rGAAAC AACGTCAAGC TGATACCC1'T AAAGCTGGGG AACTCAAATC AGGTTCTGTT GAAGCCCTTT CTGCCTCACT GACTATCGTC CACTCAGCGG TCGGTGCTAT TTCAAATGCC T'rTATCGTTG GTTTCAACG'r AGAAGCTCAC GATGTGGAAA TCCGTCTTCA GGAAGAAGCT ATGAAAGGGA TGCTTGATCC GGTTATCCGT GAAACCTTCA AGGTGTCTAA CAACGGTAAG GTTGCCCGTG ACTCTAAAGT TGATGGTGAA CTCGCAAGCT TGAAACACTA TCGTGAAGGT GGATTGATGA TCGACGGCTA GGCGTATGTC ATGGAAGAAA TCAAGAGATA
CTAAATTCAA
AAGAACTCAA
ATAAAGGAAA
ACCCAATCGT
GTCGTGTTAA
CGATGGCGGG
AAGAGCGTGC
AAAACCTCTT
CTGATGTACA
GTGTCAAAGT
T'rGCCGAAGC
GTCAACAAGC
ATCTGCGCGT
TACCCAACGT
TGTTAATGTT
TCAAAAGAT'r
CAACGAATCA
ACGCCCTACA
CAGCATTATC
AGAATTTGAA
AGTGGGAACT
CCGTGTTATC
TAAAGACGAC
CAATCATATT
AGATTTTTTG
GCAGCAGGTG
GTTAGCCTTG
ATCATCAAGG
GACGTCGAAG
GACGTGACCC
CCACAAGCTC
TACAAGGTTA TCGAAGAGAT GAAAAAGTTA TTGGTGAAC ATCCGTGGA'r TTATGGTTAT CGTGATGGTG TCG'rrATCTA GTGAAAGAAG TGACAAACGG AAGATCGATG ATGTGATTGA CTCCTTTCTT AGGTGGTGAG GGACGCAAGC AAACCGATGG CTGTGATGGG ACTGATAAAT TTCAAATTGA ACTT~CGTTTC TGCTGGCCTA GC?rTGGTr CAGATCGTGT GGGCATGGAA GTGATCCACG TGTCCAAGGT TTGCCAAGGT TTATTACACC TCGGGCTTGA AAAAGCAACT ACAAAATCCC AGATTTGACC ACGAGATGCT ACGCAATCTG GGAGGAAAAT AGGTTGAATT AATAGTACGC CTCTACTTCT TGTC-CTGTTC TTGTTTCATT AATTAGAAAA TGCTTTTT 1204 TTTCATTGCT TAIT-1-rGAG CCTAGGGTCT CAAAAAMCCC CAGTTCCATC ACTTCACCA CGGCGAAAGA AGCAGATGAC AA~wrTAAACT GAAAATCAAG CAAAGTAGAG AAAGGAATAT ATCAAGCGTG AAGTCAATGA GTGACCATCA TAGA'rGTTCA ATTTTGAGTA ACC'rTGCTTC GGTACCATCA AACGTGAACT TTCCTCAAAG ACGAGTCCAT GATAAGAACT AAAGAAGAGG TGAAATGGAA AAATATTCTT AAAATATTGT TAGAAATCGA TTAATATAA.A AAAGGGATTC GTAGGAAA'rA TAATATGATA AAGTTTAAAA TAGCTAGGTC C-A'GGCAAAT CATTTCCGTA GATTTTGCAA AAGAAAGTCC GATGCTGGGT GACTTGTCTG GGATAACCAA AAAGCCCAAA TGGTCGCAAT TTGAPLAT'GT CGAGTATGGA AACAAGATTG GG'rTGCCCCT CTTTTTTGGT T'rATAATAGA TTGAAACTAG TTTGACTGTC CTGATCGATT TGTAT'rTrTT AATGT'rATCT AGGTGCAAAA AAGAAATAAG GAGTTTGTAT ATGGCTGAAC AAGACTTAC ACCTGTTGTr AAGGTTGATC GTTCGAAATT TCCAAAAGAT ATTCCTACCT 'rATTGGAACA AT'rAGATCGT GTAGCTAATO CTTGTATTCG TGTTT'rGGCA GGATTACCTG GAGGGCTTGC TCAATTTTAT GCTTTCTCTC TGAAATTGC GGATCTTTGG GCTTCACGAG AGGAGTTGAG TCTAGGCGTA ATGTTAGGGG TGAA'rGGAAC AATTGCCAAA CAGGTAATGA AAATAGTCC CCCTATTTTG AAAAAAGTCT TAAAAATATT CAAAGGAATG GGGAAATTTA TTCCTA'rCTT TGCAACTATG AAACCA-ATGG GGGAAAGCTT TAGTGAAGTT CAATATCAAG AAGATGTTGA AGGAGAGTAA TATGAATCCT ATCAAAGCTT CCGTGCAAGG 'TGTAAAAGTG ATGAAAACGA TGGGGAAACr rTTTPATTGCC GACAAGTTAA TATGCAAG'rA TTGCAACAAG TGGTGAAACT 'N'TAGTGGAT AAGT'rTTCCA AAGAAT'rGGA AGGTCCAACC ACTCTTCTAT CTCAAGAAAT GGACAATGTA TTATTAGCGA GTGGGACTTC TATGGCAATr ACCAT'rCCAG CTGATGTGGC TCAAGAATTA GGTTA'rATTT ATGGTTATGA TGAAGATGCT CAAAATACCC TCTTGCTTTA CGCTGCTTTG CTACGTGTTG GTAGTATAAC TAATAAAGCT rA.ACAAAGA CGCTTTGGTA TGGTGTGAAT CTTACCAAGG GAGGGTTGGC GGGTGGTATC A'rTTCAGGTG GTTTAACCTT 
GCAGAAAGAA TTATCCAAGC TAGTCAACTA AACAATCCGA AAAGAGGCTG AAATCATCAA 'rrGCTAAAAT TTATGGTAA'r TACTrTTTGA 'rAAAGAAAGC TGACCATGTC GTTGTTGGTC T1GGATACGGC TCGGTGGCTC ATTAAGCCAG 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860 7920 7980 8040 8100 8160 8220 8280 8340 8400 1205 AGGAGACAGA ATGAAATTr T ?IrGGTCT'rC TTGCTATTCT 'TTTATCAAA CCGATTATTG GGA7T=GGAA ATTCTTTTGG ATGATCATCT CTTTTGCAG'r CCAATTGCTG TTTTACAAGA TAGTGrAA GATArrGGAT TGGCTCT'rTA AACTTA'rCTA GATGGTAATC CAAGTTGCAG AGAACTAGCA GGAACTCCAC TGCTAG'r'ITT TArrCTCT'r TCCATATGGT ATAATATAAG CAGTAAAATC ATTI'TATACT CTT'CGAAAAT CTCTTCAAAC CACGTCAGCT TCACCTTGCA GTATATATGT TACTGACTTC GTCAGTTCTA TCCACAACCT CAAAACGGTG 'TTTrGAGCTG ACTTCGTCAG T'rCTATCTAC AACCTCAAAA CACTGTTTTG AGCAACCTGC GGCTAGCTTC 8460 8520 8580 8640 8700 8760 8820 CTAG?1-rGCT CITGATTTT ATATCATCGA TGTGTCAATT AAATTCTAGT GGAGTTGGGT GTCGTAAAGT ATCACTTAA.A
CATTGAGTAT
CCTGTTGCAG
TTAAACCCC
CAGGGTTCTA
TAGAACATAC AATGGAGGTC
GTCATGGACA
a a a a TACGCACACT GGAAGCGAAT GGCTACGAAG GATTCATATC CTACGGGATA TTTTGTTAGA TCAAGA'rCGC TTTGATGCGA CCTTTACGGG GCACGAGCTG ATGAACTCGG ATTCGGGCGT TGTCCATGCC AATCTT'TTTA AAAATGCTAT TCCAGGTCAC CCAGTTCGTG TCTTCAAAGA TCGCATTCGT AGATTGTTAG ATACCTATGA GATGCGTAAG GGTTTGGTGC GTCAGATGGG ACGTAAGGAA GAACTCTTCT TTCCTATCAT AAGTGGTGGA CAAGCATCCA GAAGTCTTGG TTGCCAATCC CTTAATGCGC AATACAGTTG AGCTAGCAGG AACTCCTATG GACAAGATTG TGATI'GGATT AGACTAATGA CAGATGAACG ATTGCACAAT GGCGCCTCTC CTGAGTCGGT CGTGTCAGCC ATCGAGATTT CCCTTATGGA CACTTTGAA GATGTTATGG AACTCTGTGA CAAAGG'rGTC GAAGTTTCAG ATACTGAGCA AGAAAATC'rG GC'TCTCCGTG CGGCCTTGAT
GTCTATGGAA
ACT1'GTGGGT
GGAGCGCTAT
GGAACTCTTT
GACGAGGAAA TGCTGGCGGA CA.ATT'rGACA TCCATTACCA GGACACGATT CACCTCCCAA CAAACAGCTC TAACGACAGC 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 9780 9840 9900 9960 10020 10080 10140
AGTTATGTGG
CAAGTCACTA
AGAGTTTGAA
GGAGTGGATG ATCAGA'rTAG CCAGAAGTGT CAATTAGCAG AGTATGATT'r 'CAAGGAAGA TGTAAAGGAA GCTTTTGAAG CTTTTGCGPC GTCCATCCTC CTCATGATTC TCCTTGAGTC a TTTTACTCAG GATGACTGGC TTCAGATTGC GGAGGAGAGC GATGCCTATG GCTATGCCAT CATCCGTCCG TCAGAGAAAT GGGTGCCAGA ACGACAGAGC TTTATrGAGG AAAAGATTGC AGAGGAGCCT GTACAGCTAG ATACCOCAGA AGGTCAAGTT CAACAAGTCA TAGATACGCC AGAAGGCCAT TTTACCATTA CCTTTACCCC TAAGGAAAAG GAAGCTGTGC TGGACCGCCA TAGTCAACAG GCT'rTTGGTA ATGGCTATCT T'rCAGTCGAG CAGGCCAATC TCATCCTCAA TCATCTCCCT ATGGAGATTA CCTTTGTCAA TAAAGAAGAT ATTTTCCAGT AT'rACAATGA 1206 CAATACGCCA GCTGATGAGA TGATTTTCAA ACGGACCCCG TCCCAAGTCG GGCGCAATGT 10200 CGAACTCTGC CATCCGCCTrA AGTACTTGGA CAAGGTCAAA ACrATCATGA AGGGGCTTCG 10260 TGAGGGAAGC AAAGACAAGT ATGAAArGTG GTTCAAGTCT GAGTCGCGAG GTAAGTTrGT 10320 CCACATCACC TATGCTGCAG TACACGATGA AGACGGAGAA TTCCAAGGAG r GTTGGAGTA 10380 TGTTCAGGAT ATCCAGCCCT ACCGTGAGAT TGATACGGAC TATTTTCGTG GATTAGAATA 10440 AGGAGAAAAA ATGAGTTACG AACAAGAATT TATGAAGGAA TTTGAAGCTT GGGTCAATAC 10500 CCAAATCATG ATTAACGACA TGGCGCACA.A GGAAAGCCA.A AAAGTNTACG AAGAAGACCA 10560 GGACGAGCGT GCCAAAGATG CCATGATTCG CTACGAGAGT CGCTTGGATG CTTATCAGTT 10620 CTTGCTTGGT AAGTTTGAAA ACTTCAAAGT AGGCAAGGGA TrCCATGATT TGCCAGAAGG 10680 CTTGTrTGGT GAGCGAAATT ATrAAACGAG AAAGATTCTT GATTTrTCAC TAAAATCTTG 10740 ATAGAATGTT TATGTTAAAT CCTTGTCAGA GCAGGGATTr TTTATTGAAA GGATTTTATC 10800 ATGTCAAAGA AACTCAATCG TAAAAAACAA TrACGAAATG GCCTCCGTCG CGCAGGTGCC 10860 TTTTCAAGTA CGGTGACTAA GGT"TGTAGAT GAGACAAAAA AAGTCGTGAA GCGTGCAGAA. 10920 9CAGTCAGCAA GCGCAGCTGG TAAGGCTGTT. 
TCTAAAAAAG TTGAACAAGC AGTAGAAGCT 10980 *ACCAAAGAGC AAGCTCAAAA AGTAGCrAAT TCTGTAGAAG ATTTrGCAGC AAATTrGGGT 11040 :9.GGACTTCCAC TTGATCGTGC CAAGACTTTC TATGATGAAG GAATCAAGTC TGCTTCAGAT 11100 TTCAAAAACT GGACTGAAAA AGAACTCCTT GCCTI2GAAAG GAATCGG.CCC AGCTACCATC 11160 .AAGAAATTGA AAGAAAATGG CATCAAGTTC AAGTAATTTT TCTTGAGCCT TGCAT'N'CCG 11220 AAAAAATCTT GCTACAATAG AGCCATTAGA GGTGTTTTGA ATCCCACATT TTACAGAAAG 11280 TGGCGGCGCT GAGAAGTCCA C.AAATGTGTC AAAACTGGTT GCTAATGGAT GAAAAATTGA 11340 .AATAAAAGTG TCTTTTTGCT TTAAAGACGA GAGTTGCG 11378 INFORMATION FOR SEQ ID NO: 211: SEQUENCE CHARACTERISTICS: LENGTH: 4156 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO,: 211: 99 99CCGCGAGCCA CGGCGAATTT GCTGCGGGTA TTCATCAGTC AGGATCTATG ATCTTTGGTG AACAAGAAAA GG'N'CAAGTT GTGACCTTTA TGCCAAATGA AGGTCCTGAT GATCTATACG 120 CTAAGTTTAA TAACGCTGTT GCTGCATTTG ACGCAGAAGA TGAGGTTCTA GTTTTGGCTG 180 ACCT'rTGGAG TGGTTCTCCA TTAACCAAG GTAAGTTTGC CATCATCACA GGACTTIAACT GCCTCATGGA CGCTGCTGCA GGTGTAGAAA AAGATGGCAT CAAAGCTCTT CCAGAAGAGC CAGCTGCTCC AGTTGCCCAA ACTGCTATCC TGAAAATCAA TC'N'GCCCGT CTTGACACAC GGACTCCAGA TTCAAAAGCA AATCGrATCA ACCTTCGTAA AGAATTGATT AAACAAGCAG CAATTCAAAA ACTGA'rrGAG ATTTCAAAAG TCTTGTTTGA AACACCTCAA GATGCCCTDC CTCTTAATGT TGGTTCTATG GCTCACTCAA 1207 CTACTCGCGT GATCGGAGAA AATCCTGAGC TACCGATGTT GATTCAAGCC TACACAGAGC AAGTCGCTrGC TAATATCATr AAAGA.AGCCA TAAATCCAGT CGAAGAAGTT GCAAGCGCTG CAGAAGGAAC 'rGTTATCGGA GACGGTAAAT GTCTACTTCA CGGTCAGGTT GCAACTGCTT TCCGrrGCTTC AGATAACGTG GCTAAAGACG CTCCAGGTAA TGTCAAGGCT AACGTGGTTC ACCCACGTTT TGGAGAAACA CATGCCCTTA GTGCCATCGA AGGCGGCGTG CCAATCAAGA CAGGTAAAAC ATTGGTCAAT
CTA'rGGACAA AGAAGACGT'r GCTACATTTG AAAAAATG;CG TGACTrTGGGT ATGTCCGTAA AGTACCAAAT GATTCTAAAA AAGATrTCTT TCACT'rGATT ATGTCAAATA AGCCATTATT TATGAAAGGA 'TTTrAAACAT GTCTATTATT TAGTAGTCGT TGTAGCCTTC TTTGCAGGTC TTGAAGGCAT CCTCGACCAG
ACCGTTTTGT
GTTGAAT'rTG
AACAAAGCCA
TCTATGGTTT
TTCCAATTTC
240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 ACCAACCACT TGTAGCCTGT r'rATCCTCGG TGGATCGCTT TCGCTCCTGA TGCTGCACTT ACTTTACCAA GACTGGTATC GACTTTTCTT GACAATGATT CTGCCC'rAA AAAAGGTGAC TCCAAGGACT TCGTATCGCG TACAAAGTAT CCTTAGTGCC GTATGGTCGT TGCCGTTGGT ACCCTTATTG GGCTTGTAAC AGGTCACTTG GAAGCAGGGA CAAATGATTG CCCTTGGTTG GTCAAATATC GGTGCTGC'rA GCTTCTGTCG CTGCTGCCAT TATCATGGTT CTTGGTGGTG GGTGTTGCCC AAGCGGTTGC TATCCCTCT-r GCTGTAGCTG GTTCGTACAA TTTCAGTTGG TTTCGTTCAT ACTGCAGATG T'rCGGCGCTG TGGAGCGTGC GCATT'rCATC GCGCTACTTT CTTCCTGCAG CTCTTCTCT TATGGTACCA ACTGAAACTG ATGCCAGACT GGCTCAAAGA TGGTATGGCT ATCGGTGGTG TACGCCATGG TTATCAACAT GATGGCAACT CGTGAAGTAT TTCGTTCTCG CTGCTGTGTC AGATATTACT CTAATCGGAT ATCGCTCTTA TCTACCT1TCA CUMTTCTAAA ACTGGTGGAA ACTTCTAACG ACCCAATCGG CGATATCCTA GAAGACTACT ACATCATGAC TGAAAAACTr CAATTAACTA AATCAGATCG CAACCTTCTT ACAAGGGTCT TGGAACTTTG AACGGATGCA
GGCCATTCTT
TCGGTGCTAT
ATGGTGGCGG
CGCTCTTGGT
CGGCGTTGCT
AGGAGCCGCA
AAGATAAGAA AGGACTGAAA TAAAAAAGTT TGGTGGCGTT 1208 AAACTTGGGC TGGGCTTATA CACTCATTCC AGCTATCAAA AAACTCTATA CTAAAG AGATCAAATC GCTGCTC'N'G AGCGTCACCT TGAGTTCTTC AACACTCATC CATACGTAGC TGCTCCAGTC ATGGGGGI'A CTCTT(GCGCT TGAACAAGAA CGrGCTAACG GTGTGGAAAT 1980 2040 .2100 CGATGACGCT GC'TATCCAAG GGGTTAAAAT CGGTATGATG GGACCTCTTG TGACCCAGTA TTCTGGTTTA CAGTACGCCC AATCCTTGGA -'1eTCTCGGTG CCTrACTGGC AAkTATCTTGG GTCATTCTTG TGGTATGTrC TATGTCTGGT GGTATCCTTC TCT'rGCTG'rC CTTGTTCAAC TCAACTAGAT GAAAAGGCTT CCAAGAAGCA TTrCGCACAAG TTTCCAACAA AACTTGGATA TTGCATGTAC TTACTTAAGA AGTGGGTATT GTGGCACATG GTTCTAAAAT CTGATTCCTT CA'rGGTGCAA GTACGATTGG TCCTTACGAA CAACGATTTG TAACCATCTG TACCGACCTT GGCCACTCCT CTTCTTTG1-r GCATGGA-ACT AAGAGATTGG ATACAAGGCT GGATCAGAAA AAGATA'rCAC TAAAGGAGC'r TCTATCCTTG GCTGGGTAAA TATTAAATTT GCTTTCGATG ATATCCATTG GGATAAATTG CCAGAAGGGT TAGGACAAGG ATTGTCTCA-A ACTCCTGAAA TG'rTGATTCC TCGATTATCA GGACTACTCC AAAAAGTATC TCCAATCACT ATTATCCTTG TTCTTCACAT CATGTAATCA AGCAACTAAA CTGGTATCGG 2160 CTTCACTTGC 2220 TGATTCGTAT 2280 TCACTAAAGA 2340 GGA'rGTTCAT 2400 TTTCTAAAGT 2460 CTAAAGGTAT 2520 AAGTTACTAC 2580 TTACTTTACT 2640 CCCTCTTCGC 2700 AAGGAACCAG 2760 CATTGGATCC 2820 TGTCAGCAAT 2880 GGCAACAAAG 2940 GGTTGATTTA 3000 t*CIO *age 9S 59 0444S 6 TTT'rCTATGC
TTCTGCTCCA
GTATGTGTAT
GTCTCCCCAT
T'N'TATTCAG CCAAGGCTCC TAGGCAGCTT GTTCTTCTGC TCGTCCATCC AAGCGTCTGA GAGT'r??CAA CCTTCCACTT CCATTTTCGT CCAAGTCAAC ACCTGTCAAG ACCATGGCGT GGGTCATCAA GCT'rTCACTA TAGTCCAAAC GTCCAGCCTT GTCTTGAGTA AGTTTAATGT CCATGCTTGA 'rTCAAAGTCA TAA.ACA'rCTG TCGCAAGGAT GCCAGC1'TAC GGTTGCTGAG CTGGCCGACA TCAGAACCAA ACCAAACAGT CTCACCTGCr TGCATTTGGG CAATCGCCAA 'rrCTTTCAAG CGCTCCATTG GAACGTTGAT GTAGCGAACr GCACGGC'rAC CAACCACATT CCCCAACATC TCAACTGTGT AAGATTTTCC GTAAGGTTTA TCAGCAGT'rG GAGCATTGAT AACAGAAACG TAGTCTTCTA AAGGAAGATT GACATATTTC TTGTAAAACT CTTGTGGTGT GATTCCTTTT TCACTTTTGT AGTTGTTATC TrTATCGCGA TAAGCAAAGT CAAACrTGCG TGGTGGAAGT CCTAATGACA TAGCAAGAAA GT'rAAAGATT TCTTGCAAGA GGTCTTCTTT CTTAGCTTGA ACAGTCGCTT GATCTGCACC AGAAACAAGC AAGTCACGCA AGATTTGAGc ATCT'rGACGA AGCAATTTAT TAAGGATCGC ATTTAGCTCA CGACTGCTGC TAGATGAA.AC AGACTCAGGA TAAACTGACT TAGGCACGAC ACCGTAN'TT TCAAAGAGGG AAACGACCAT ATCCCAT'rGA CCGCCATCTT 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 1209 GTTGAGGTGT TTGGAGTAAG AATGACTTGC TCCAAGAACC AAGCTAACtT GCGGCTAGTC AGT'rTGATI-P CTCATACTTA AATTCTTGGT CTGAAGTCGC TCCCAGAAGA AAGTGTGGGC TrGTGACAAC TCAAAGTTCT CCAATTTGTA TTGCGAGATG AGTTTGTGGC GGAAGGTGT-r GAGAGCCGCA AACATCCAGC AACGACCAGA CGCTTCTGG TTAGTGACCT TG'rCCTTGCT TAAATCCAAT GAGAAAACAG GTGTGT'rGTC TACATGGCTT TGGCGACGTT CCAGAGCTGC AAAAATTCCG TTGTGGCTGG CAGCATTTC AATCGCT TGG TATTTACAT 'IrGCTTCATA GTTGGCAAAT AGTTTATCAG TAAATGATTC T'rGAATCGCG TTCATAGATT CCTCCTTTTA GTCTACAGTG TATTGG INFORMATION FOR SEQ ID NO: 212: SEQUENCE CHARACTERISTICS: LENGTH: 3902 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212: AAA.AACAACA AAATAAAACA AAAACAAAAA TATCGAGGTT TATrrTTCAAA ACTTTCGATA 3780 3840 3900 3960 4020 4080 4140 4156 TTTTTATTAA GTTATTATTT TGTTGTTTCT AGAATTATAC TCAATGAAAA TCAAAGAGCA AGTACGGCAA GGCGAAGCTG ACGTGGTITrG CCGTAGTTGT AGTCATCATC TTGCATGGCT ACTTGAGAGA AGAAGTCATG GTTGGAAGTT TTGACATCTT CAGCTGAATC TGGGAAAAGT TTGGCATTGT AGCGAAGGAA GGTTTTAACC CTCTCTGTGT AGCCTTCTTC ATTTTCATAA 
TTGAGTTTTT CTTGC1'CIC 'rTCAGGTAAT ATGTAGGTTC CGTGAACAGA CTCG1TCACGA dCAAGTTTGT TGTTACCGAG ATAGTAGAGG GTTTCGAGGA AGACGCTGGC AACTTCTTT TTGACAATCT CAGCCTTCTT TTGTAGGTAA TCAATCTCAG CCTTAGTATT CAAGGTAGAA AGTTTACTTT TTGATGGTTA AACTAGGALAG CTAGCCGCAG AATTTGATTT TCGAAGAGTA TCAACTTCGC CAAGAAGGTA CCTGTTGAAA TACCGTTCAT GGATCTTGTC CCATGTTCAT TCTTCAGTCC AACCAACACC AGACTATAGA GTAGGTCGTA AGAGTGGTGG 120 GCTGTACTTG 180 TTAGTGCAAA 240 ACCATTTCCG 300 AACGATTGGG 360 GAGAGCTTTA 420 GTCATAAAGA 480 CATCCATTCT 540 TTTGTAACCA 600 TGCAACGTTG 660 GAAGAGGAAG 720 GTAGATTTCG 780 GAAAATTTCT 840 AGCGTGGACA 900
TCATTGAAAC
ATAATCAATT
GGAGTGAAGA
CAAGTTGGAA
TAATGATTTC
AACCAGAGTA
TCAAGTGGGC TGCCGTTTAG GGATTGGTAT TGGTCCATrC AAGATTGATG AGTAAGATTT GATTCCATAA ATTGGATGTT CGAAGGGCTT GAACCCCAGT ACrTTCCGA CCAAGTCTT'r GATAAGGGAA TACGTGTATC TCGATGACAT CTTCGATGGC CTCTTTCTC'r GTTTAGTATT GAGTATCGCA CAAAAAGTCG CATAGTAGAT TTGATTTTAT 1210 ATTGAAGACA GCTTCCTCAT 'N'CAGATTGC ATAG'rGTCAA CTC?1'TGTTA GATAGC'rTTC GAGCCAAAAT TGCTCCGTCA ATTCCAGTTA ATGGCTTTGT GCGAACTCAC AATTATTTCT GAAGCCCGAC ?rTAAAATG CAGTGCTGCT TAGGGAAAAA GTGC'rGTACG CATGTCTGCG
GAAGGGTTAA
TCCAGTCATC
GrTTT'TCCCA
ACCACCAAAA
CAAGTCGTTT
AGTTGATTTG
AGTAAGTTTC CArrAAAAT ACTTTACCAT AATTCTATAG TTACATAAAT 'rATGT'rATGA TAGTGTTTCT ATGCTAGAAA CTAAATCACA CAGCTTTICAC ATTGTGGC GCCGACTTCT CCACCGTCAT CTGTAAAGGT ACGGACGTAG TAGATAGACT TGATTCCCTT GTTAAAGGCA TAGTTACGAA GGATGGACAA GTCACGTGTC GTTTGTTTAT TTTCCCTCTT CCATTCGTAA AGGCCTTrTG GAATGTCACT GCGCATGAAG AGGGTGAGTG AAAGTCCTDG ATCCACG'rGT TCAGTCGCAG CAGCGTAAAC ATCGATGACT TTACGCATAT CCATATCGTA GGCAGAAGTG TAGTAAGGAA TGGTTTCTGT AGACAAGCCA GCAGCAGGGT AATAGATTTT ACCAAT'rTTC TTCTCTTGGC GTTCTrCGAT ACGTTGCGTA ATCGGGTGGA TAGAAGCAGA AACGTCGTTG ATATAGCTGA TAGAACCATT TGGCGCTACA GCAAGGCGAT TTTGGTGGTA AAGACCATCT TCTTGAACCT TGTCGCGAAG TTCAGCCCAA TCAGCAACAC CAGGGATAAA GACATTT'rTG'AAGAGTTCTT TAACACCC'rC 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1.92'0 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 TGATGTTGGA ACAAATTCAC CAGTTACATA TGATTTTTCA AAGTTGTGGA AGGTAATACC CAAGGTCCAG TAGTTCATAA GCATAAAGTA CTTGTCAAAG TAACTTCCGT TAGCATAGTC ACGTTCACGT GCAATATTGT TTGACTCTAC GATGCTTGTA AATTCAACAG ACTCAGGTGA GCTGTGCAGT CCCATGGCAC CGAGACCAAA ACCATATTCA ATGAGTTGTT GGTGTGGGCT 'rGGCTATTTC TGTAACGAAA GTAAGGGCAC CATCATGTTA ACCACGTTGG GAATTC'TTGA GCA'rCGTTGA GTTACTCATG ATAATCTTTC TACATAAGGA TAGCCAGACT CTTGATTTTT GTCTTGCGAA GTCGATGTAA TTGAATGGCA CATTTCTTCA TTTTACGAG
GGGCAAGGTA
CATGGTCAAT CGTTGGTACA GCTACGATAT GTGAACTATC GAACCATAGC ACGGATAGAA CGACCAAAAT CAGGTGAAGT TTGAACCCAG GTTACATGAA ACATCTGTTC CCATTTGAAG TCAAGCT'rGG TTCTTGAACT TGAAGAATCT CAGAACACAA CATCAACAGG ATI'rGCACCG TTAGCCGTAT CGATGTTGAC CTTGTTGCAA TTTAGAGATT TCAGTTTCCA AATCCCGCC TATTTGGATT TGCGACCAAT TCATCGTATT TTTCAGTAAT CACCGTATTC TTTT'rCTACA GAGTAAGGGC TGAAGAGGTA CCAATTCGTA GAATTTATCA GGTACTACAA CACCAAGTGA TAGAGTCI'G ACACGTACTr GATATCTGGG TGAAAGACGT GTTGGAGTAA GAGAAGCTGT TCC'rrCATAG CCTTGATAG ACCACCACCA ATACGTGAAA ATCATCCGTC ACTTGGATrA ATTCAAGAAG GAAGGAGTAG GATTGCAACA GCTTCATTCC CATAT'rTrCA AGATAGTATT
TTTCATCAGC
TG.AGGTAGAC
CTTCAAAAAG
GTGCACCAGC
GTTGAAGAGC
GGAAACAAGA
CAGGTTGGTA
CATCAGCGAA
CACCGTCATT
1211 GTTTTCTrTC TTAGTrTGAAA GGAAAGCGAT AACACCAGCA CCTTGACGTT GCCCCAATTG CTTCATAACA GGAACGACAC CTGA.AGCAGC TTCACGAAGG TTGCTGAGGG TAATTCCCAC TGAGTTGATA GAACGCCCGA TAGAGTTCAT TACCAACTCC CCACGACGAG CACGTCCAGC GCGTTGGTGG ATGATTTCAT TGGCAATATC ATAAAGGGCA TTGAAGAAGA CACGGTCTTC AGTCTTTAAG GCATATTGAT TGTAAAATTT GTTTTGGTCT TTGATAAATT GAGCTAATTC AAAGGCTGTT TCGATGTAGT TGTGTrCAAT AAAAACCATA GTGI'TGGAA CTACATTTTC TTTATGAAGC ATGATTTGTC CATTAACAGG AGTCACGTCT TCAAGATGTT TTAATCCCAT GGCT'rCTAAG TTAGCCCTAA AAGCAGTTTC TGTTT-CAGTT TrCCTGGTTG GAAACCTGAA
ATAAGCTGCC
TTCCAAGAAC
ATGAATGACT TGAATTGGAA TCTGGACGGT ATTTCTTGAT 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3902 GAGGTAAT'rG ATTTTGTCTT TGATTGAATC TTTAAAGAAA GCATCCAAGG CTTCCTTGTC ACGGTTAATT TCGTTATTAA GACGGAAGTA AAAATTTCCC T'rATCTAATT ACAAAAGAAA TTCTGGATGA TGTACTAAGA TTATGCTAAT AAGACTTCAG TTGGTGTTTG TCGACGTACT CAGGTTGCTC AAGAAACGCT TGGTCATT GTGTAACTCC TCTTCAAAAT
GG
GATAACAGGA GCTGCGCTAA AACCGAGCTC ATCAAGATTG ATTTCACGAT AAGAGACATT ACATTGGACA CAATTGTTTr TAGAATAAAC TTAATACTAT CTTAGTATAT CAGAAAATAA
TTTAACTTGA
ATTACTGrCC
GGTTACCATT
AATTTTGTCG
INFORMATION FOR SEQ ID NO: 213: SEQUENCE CHARACTERISTICS: LENGTH: 2456 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213: TATrGAAGCT ATTGTAGACT ACAAAGATAA GGATTrGCAG TTAGTAGGCG GTGAGACTCA CTGATAACCT AAAAAGGATA GTCAATTATG CTTGTTTACT AACTATTAAC TATGCTAAAT CAATTGAGGT TGTTTACATA AAAC'rCTATA TCAGAGAAGC CTGATATAGA GTTT'TTCTT GCTAGTTTTA GGATTTTTTT GTAAAATAGA AAAAGTGAAG AGAGGTATGA AATGAGCAAG AAAGATAAAA AAATCCAAAT TTGAAGGTT ATrACATTGAC GGACAATTTG CCATTATAAA GCTGTGGAAA TTTTGATTGA TTTCATGATA TAATAGTCCA GGACTGTAAA TCCGCCCCTT ATGGGGTATA GCCAAGCGGT CCAGCTACCC CAGTTCTTAG TATTTr'rATA ATTGAAAGAC AAATTATAAG TATGTCAAGT GAATTTGAAG ATTT-GCTAAA
TCAAGTAGCG
TATCGGTAAA
GAATGGGAAT
AAATTATAAT
TG'PTGATTGT
CGGCTTCGGG
AAGGCAAGGG
GATGCCAAAG TTAATGTTGG TAAAGACAGT AAAGTTATCG GAGAAATTGC CGAATTAGAC GTCGATAGTT TTTATAAAAA ATTGGAAAAA TTAGCAAAAT AAGTCTTGTT TTTTTGAAAT AGGAGAGATA GCGAAGAGGC TAAACGCGGC GGTTCGAATC CCTCTCTCTC CATTT'CAT'rA ACTTTGACTC CCTCATGCGT TGGTTCGAAT AGATAGAAAG CAA.AATATCT TAGGGTAI-rT TGAACATGTC CTTGCGGGTG CTTAGGAAAA CTTGATTGT'r GGAGGATTTT TrAGATGAAC CAAGTTGAGA CTGGTGATGT TGTTAGTGCT AACGT'rGCAA TCTCTGGAAC TGGTGTTGAA
GTAATAATCA
GTGAATGATA
TTAAGAAAAA
TAGCGTTAGT
GAAGTATTGA CAGTTGATGC GACTCAAGCT GGTGTCTTGA CTCTTCGCGA ATTGACAAAC GATCGTGATG CAGATATCAA TGACTTTGTT AAAGTAGGAG AAGTATTGGA TGTTCT'rG'A CTTCGTCA.AG TAGTTGGTAA AGATACTGAT *300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320, 1380 1440 1500 1560 1620 1680 1740 1800 1860
ACAGTTACAT
GTTGGTCGCG
TCAGTAGAAT
ACCTTGTATC TAAAAAACGC AAGAAGAAGT TGT'rACTGTT CTTGAAGCTC GCA-AAGCATG GGACAAACTT AAAGGAACGC GTGCCGTTAA AGGTGGACTT
TTGAAGGTCT
GTACGTAACG CTGAGCGTTT GCTAAAGAAA ACCGCTTCAT GCTCGCGCTG AAGTATTCGG CGTATCACAA GCTTCGGCGC ACTGAATTGT CACATGAACG ATTGAAGTGA AAATCCTTGA GCAACAGTAC CAGGACCATG GAAGGAACAG TTAAACGTTT GATGGACTTG TTCACGTATC CTTAAAGTTG GTCAAGAAGT TCGTGGATTT ATCCCAGCTT CAATGTTGGA TACTCGTTTC TGTAGGTCAA GAATI'TGATA CTAAAATCAA AGAAGTTAAC CCTTTCACGT CGTGAAGTTG TTGAAGCAGC TACTGCAGCA TAAATTGGCT GTTGGTGATG
TTTCGTCGAC
TAATGTATCA
TCTTAACGAA
GGATGGCGTT
GACTGACTTC
ACAAATTTCA
TCAAGTTAAA
CTTGGTGGTC
CCAAAATCAG
GAAGAAGGAC
GAGCAAAAAT
GGTGCATTTG
CACAAACGGA
GTTCTTGAAG
TTGTAACTGG TAAAGTTGCT TTGACGGATT GGTTCACTTG TTGTAACTGT TGGTGAAGAA GTGTATCACT TTCACTTAAA TGGCTAAAGG TGATGTAGTA TTGAAGTATT GCCAGGTATC T'rGAAAATCC AAAAGAAGCT TTAACGCAGA TGCAGAACGC GTGTCACTTT CTATTAAAGC TCTTGAAGAA CGTCCAGCCC AAGAAGAAGG ACAAAAAGAA GAAAAACGTG CTGCTCGTCC ACGTCGTCC.A CCAGAAACAC AAACAGGATT ?TCAATGGCT AATTGAAAAT TCACAAAATC CT'IGTTTAC! ACTGTAGTGG GTTGAAGAAA AGCTAAGCTC TGATATTCAG AGCGATAAAA ATCCGTT CATTGCGCTT GATAAGTTrG ATGAGATTAT GTAGTTGAAG GGTGTTGACA AGCTTTTCTr GAAAAATAGG ATGAACCTGC TTAAGATTGT CCTTATTCTG AAAGTGAAAC AGCAAGAGTT 1213 AGACGTCAAG AAA.AGCGTGA TTTCGAACTT GATIGTTT G GTGATATCGA ACTTAATCA TAAACA.AGGG ATTTTCTGG CTCTTTGTCA GAGAAAGGAC AAATTTTGTC CTTTCTTTrTT TGAAGTTTTC AAAGTrCCGA AAACCAAAGG TGGTCGCTTC CAGTTTGGCG TTAGAATAGT TATCTTTGAG GAAGGTTTTA AAGACAGTCT CCTCAATAAG TCCGAAAAAT TTCTCCGGTT GATAGAGCTG ATAGTGGTGT TTCAGG 1980 2040 2100 2160 2220 2280 2340 2400 2456 INFORMATION FOR SEQ ID NO: 214: SEQUENCE CHARACTERISTICS: LENGTH: 10974 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214: a a. a a a AAATAGGATA TAGAGACATC CTATACCTCC ACAATGTCCA CCTTAATGTC TCCATATCCA TTGTAACCAT CTCCTTAAAC CTCAATAATT T'rATAGTAAA GACCTTCTCC AAAGAATTGC TTTAGCACTG TATAAAAACG GAAAAAATTG ATAGAGATAA CTTCTGATCT GCTTTTwACA AAGTCCAATT ATATGCGGAT
TTATTATMCC
TCAGGGATAT
GACGCATTAT
TCATGCTTAT
TATAATACTA
TTTCAATAC!A
ATTAAAAATC
TAACTATAAT ATGAGCCGAA AACACTATAT TAATATTTAT TTTTCCACAA CTATATTGCA GATATTTGAT AGAGAAATTT TTATGAATAA ATCTCAAAGA TACCTATTTT ATCTTGTCTC TTACAAATCC ATCTGCACTA CACTTCAAAT CTAACTTCAA GAAAACTTCC ACTATTAATT TATATTGAAA CTCATCCCGA TGCTTATTTG CCAACAACTA TTCATTACGC TCTAAAGGCT TACTGCGAAC AAGACCCAGA AAAAGTAA.AT TACCTGACTC CTATTTATAT TTATGAGACA GATCGAGCCT TGAGCAGGCA GTTAGTCTCT ATCGAGACAA CGCACACCAG AGATTGCGAT GTTCAATAT ACTATGGCTC CGATGACCTA ACTGAAATAG CTGCTGAATT CAACTGTCCT ATGGGATATA GTCTAAAAAA GAGCCGTACC CGGTTCCTTA AAGAATTGAA TCACTTAAGC GGGGTTGAGA CCTAT'rTTTA TCTCGAATAT CTGGAAGA.AG ATATAATTAT TTGAATTAAG ACTGTTATAG ACTTTAAGAAGTACTAAT GCCCTTTTTT S1214 TAAAGATACG ATGACGAGTG ACTTTTTCGA AGCTTGCTTC TTTAGATACA CCATCCCTTA TCAT1'ATGGA CAATGCAAGG TAAGGAGCAG GGCATAGACT GTTACCACT'r CCTACCTATT GAGAAAATAT GGGCTTACAT CAAAAACATC TCAGAATrAAT TTCTTGAGGC ACT'rTTGTCC TATTCTTGTT TCAGCCGACT TACGGAACAG TCGATGGGAC GATGGGGGGA CATAAAAAAA TAACAGTATA CTGGAGAATT GACAATCTCG GTAGATACCT TTAGGCAGTT ACAAAACAAC TGTGAACAGA AAACAI-rCCA ATGTTTTGGC TCTATAAT'rT CTGTAGTGGG TAATCCCACC TCTTGTAGAA AAAAAGCCCC ATATGACCTA TAATGAAAAG AACGGTTCAT ATGGAACAAC TTAAGAATAC CACAGAT'rTG TATCAAAATC TTGTCTGTTC TGAAATACCA AACCCATCTA TTCCCCCGCT CCTCCT'rGTC CTCATTGTCA AGGGAAGATG CAAAAATTCT TAC'rACCTAC 'ITTCACAGAA TGAACATGTG CACCCGAGTA TAATCCCATT ATTGTCAAA'r TACGATGCT ATACTCCGTr ATTGGGCAGC TCCTCCAGTT ?G'TTG rA CGTTATAGCG CGGTTACTTA GAGTCAGACA AGACTTTGGA CCAGCAATTA TAGGGTCGTT CGTCTAACCA ACTCATTAGA CTCGGAT'rGG AAGACAAAAA GTCGT'rCAGG CAAAGTTGGA ATCAAATACG AC'TTCCAGAA AGCCTCTAAA ATTCCGCTI'C GCGCCGCTTT CAGTGCAAGA GAAAAATTGC CAGA'N'TCCA GCAGTCTATG ACTGAGATTG ACTGAGGGAA TTT~AAGTT'rG TGAGTATAGC TTCAAAAAGA CATCCTCGCA ATTTTAGACG TCAGAGAGAG GTTCGGGAGC TCGGCTCGCT AAGCAACTCT CCAACATCTG AGCCGAGCTA AAAATCCTTG GAGTATCGGG GCTCGGGCTA AATCAGTCCA CAGAAACGAT TCTATCAGCC AAGCTGCTTT CTTACTCTGA TTTCATTTTC AAGAGAAGAA TCGACTGTCA GGGTTTACCC ACGGTACTGC ATCTCAAAAA ATTGCCTTAA GGTGGTCGTT TCTCAAACAT CCATTGTCAA ACATGGTGAG ACAAAAAATC GCTCAGCTCC TCCTTGAAAA CCCACAGAT'r 
GGCGGTCTCA ACTTCCACCG TCATCCGAAA AAACCGATTrG GACCAAGTTG CCAAAAGT3'A TGAGTTGGGA GCAAAATGAG CTTCATTGCC CAAGAT'TTTG AGTCCAAATC GGCGAACTCA TGCGGTGATT CGAAACCATT TCCAACGCTA TGGTCGAGGT CATCACCATG GACATGTACA GCCCTTATTA 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 T'rCCAAACGC
TGAACCGAGT
CGCTCAAGCG
CTGGACTGAT
CACGT'rTCGA
GGGATTACAG
TGCCGACCAT
GAAGATTGTT CTTGACCGCT ACGAATCCAA ATCATGAACC CTTTTGGAAC CCTCGCTTTT TTACTACACC AGTATAGCTT ATGCACT'rAA CCCATCGGGA GTTCACTACG AACTCTATCA TTCTTTGGAT TGATTGAGCA TGGACr'rTTr TAAGGGATAG GCTAAACI-rG AAGCGACCAA
TCCACATTGT
AATTTGACCG
TCCTTTCTAG
CAAGCTC'rGT
AGTACGAGAT
AC'rCCTGCTC
AGAACTGCCA
AGATAAGATT
TAATTTGATT
ACGGTTCATC CGCT'r'r'TCA AACGGTCTTT ATCAACGCAC 'rTAAGCTGCC TTATTCCAAC 1215 AAGATTATCA AGCGCAAAGC CTTTGGTTTC CGGAAC1TTTA ACAATTT'rAA AAAACGGATT ?rGATGACTT TGAACATCAA AAAAGAGAGT ACGAATTTCG TACTCTCCAG ATrTGCAGCTT TTCGCCTACC CACTACACTI' GACAAAGAGC CACTCTTTAT TCr-ATGGTAT CAAACGCAAG ACTTGGTTTG GCATTGAGGT CCCACCCTGC GAAG7MrTCT 7rGTTCCACT CGCTGACGCT GGCATAGGCA ATCATACCTG CATrGTCTCC GCAGAGTCGC AGAGGGGGGA TGATAACCTr
CCACACCACC
GACATCTGTG ATTTCGGCTG TGCCACAACT AGGATTTTAA AATGTCCATA ACTGCTGCT CT'rTTGCTCG GCATTGTGAT CTCCAGATTA TCTTCCTTAA AGCCAGCTCG 'rCAATCTCAC ATCATAAGCC TCACCAACCG CTCCGAAACA TAAACCAACT CTAGGCGTTC TCTGAGACCT CAGGATATTT CTCCAAACC GGAAGGAAGC ACACAAATCT GAAGATTGAT AAAGGCAGAT TCATGGCACG GGGGAAATCA GACCTGCAGG ATAGGTCAAG CATCATCACG GGTT'rCCCCA
TTATTGGCTG
TTCTTGGTTT TTGCCATGAG TCTGTAGACA GGCCNCTCC TTCAAACCTG AGAAGGAGAA TAAATATCCT GCCCCTGATG CCCATGACAC GGCCGACCTT ACAATCTTAT AATCTCCTGC CTGTGTGTCC GCCGCTGACC AAGAGCGCTA GCAAGGGAAA a.
CTCCAAAGGC TCCACACTCT GAGCTGCCAT GAGGTGCCCA GCCATGTGAT TAACAGGAAT CAGTGGAAGT CCGTGAGCCC AAGCAAAGGC CTTGGCAGCT GACAAACCAA CTAGCAAGGC TCCGACCAAG CCTGGTCCGT AGGTAACCGC. AACAGCTGTC ACGTCCTCTT CGGTAATCCC TGCTTCTGCC AATGCCTCCT CGATACAGGC TGTAATGACC TCGACATGGT GACGACTGGC 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 TACTTCGGGC ACTACGCCAC GGACAAGAGC TCATCGTCGT AAATGCTAAA ATATATCTAT CTGGGTCATG GTAGTAGGCC ATGCTTCCC TCGTTGATTT ATTGAGCAAA CAAGGcCAC GCAGGACTTC 'rCCTTCAAAA CATC-ATAAGC CAATGCATAC GAGTCCAAGG ACTGACTAGG
CAAAACGT'TT
TTTTCA.AGAC
CCTTCATCTA
TTTCGCTCAG
GTGACTCTCA AT'rTGACTAG CAATGACAT'r GGCGACACTG GTCTCATCAC AGGATGTCTC TTTCTCTtTT CATGATAATG GCGTCCTCGA CGATAACTGT CATCTTTTCT TTCTTGTAAA GACTGTCTGA CTTCGAGGAA GCAATCCCCT GACCCTGATA AGATTCTCCT GCACAGCTAG AATTTCCTTG TCTGTCGGCA AGCTCCTTTG ACAGCGATT AAATCCAATC ACTTCTGCCC CAAGTCTGGT CTTGGGACAG ATCTGCTTGG ATTTGCTCCA TAAAC-AGCTG CCATAACAGC GTAGATGGCT TGAGCTAGGT a. a.
a a a CAGGCTGTTC TTGAATTCGC TTGATTTCTA TCATAGGCGT TTAATGTAAG CTCGGTATGG TTCTTGAGCC AGTTTTCCTC AGCCTCGACT CG TrGAGGT AAAATCATGC AAGGAGTCTG CTTCCTTGTC CCAGGCCAAA AGAGCTAGAT
ACTCGCCAGA
AATTCGGCAC
TAGCTGCATT
1216 GGGCAATGTr TCTTT'GTAAT CAGTCCTTGG CAAGTGT-rT TGAATCTGCT CAACAAACGGG CCCAACTTCT CCGACAAACG 'rTACCTGACT AGTACCCT'rG ACrTTTCTA GCACCTCrC AAAAGATAGG TGCGCTTCTG CCATGACAGG TTTGGCATT-r TCATAAAATC CTGCATAAAC ATTATTGCGA CGCGCATCCA TCAAGGGGAC AAACAAACCT TCTTGTTGAT GGGGCACCAG 4440 4500 4560 4620 AGCCAAGAGA CTCGACATAC CAACCAACTC GATGTTCAGG AGTTGCTACC GCAATTCGCA AGCCTGTATA GCTACCCGGC GTCCAAATCC TTGGGTGTCC AATCCAAACT TGCCATCAAA AGTAATACTG TGATT'N'rCT TAATA'rTAAT CGTCGTCTCG TAAAATAGCC AGAGAAAGAG CCTTGCTGGA CGTATCAAAA TTCCTATCT'r TTTGTCrGCT TACTATTATA CTACAAAAGC TGCCCCCAGA CAAGAGTGCC CTCACTTAAC TAAAAATAAT CCTTT'rCTTT TCCGAATATA AAAGTGAACA AGAA.AAAAGG TTTGACATTC TTGACAATCA ATTITrTATCC TTATCTGAAA GGCGGTCTCG CTCCCTrGGT TATCTTTGGA GTAGCAGTA'r GGAACAGCAC TTATAGGTTC TGGTTTGGCA GCTGGT'rATT GATGAAAGAT T'rGAACAATT ATCGTGAAAT TTCTAATAAG TGGCTTTGGT GTCGGTGTTG GTATCGCTTT ATTTATGGCA CCTTCGTAAA AAGTTTGGTA AGTCATGCTA GATAAGAAAC TTTATTGTCT 'rCATCTCTTA CAGTTTGCTC AGCATTCTCA GTGTGAGCTA AGGTCTTAGC CCTTCAGCI'A CCACGATTCG AAATCGATGG CAGGCATAAG
GCAAGAACCT
GCTAATACTT
TGGCACATGG
TTAAAAAAAT
AGGAAAGTTC
ATGAATTATC
CTTGGAAGGC
TTTTAGGAGG
GAATTGCAAG
GGTTATACCA
ACATTTTTAG
ATGATTTGAA
AGAACGACTC ACTCTATTCC ATCGTCTGTA CCGTGACTT CATCGTATGG CTGCGCTYGG GCGGAAGTAC CAAC CTTTAC ATCCAAGGTA TTGAAGGACG CAGCAGGCAA CAGAAGCTGC ATGGAAAAGG CCATTGTCAA TTTATCGAGC CAAATGAAAT GGTAGTCAGG GTGAGCCTAT AGAGCCTTTG GGGATTGTCA TAAGTTCGAC NTTACTCCAG TGAAGAAGGC GTGCTCTGTC CAACTCTGAA AAAGTCGTTG TATCATCTrTT GCATCCTTTG TGTTAAGACT GGACGCAAGA CGGAATCGAT CTTGGCTACA AGA'rTATCAT CGTCCATCC'r ATGTCATTCC TGACTACTCT TCACACACGG ACACGAGGAC TCCCTATTTA 'rGCTGGACCG GCCTCTTGCG CAACGCCAAA ATCTCAAGGC AACT2-rCTTT TTCATACTCC TCAAGGGAAA TTGGAGAACC TGCGGACTrG TCCTGTCTGA CTCGACAAAT GTCAGTCCAT TATGAAGATT CCTCAAATAT CTTCCGTCTC TNGCGGTCTT TGGTCGTTCT TCAAAGCTCC TAAGGGAACC AAGTTC'N'AT CCTCTGTACA CCAACGGAAC CCACCGTCAA GTCCCATCCC TGGAAACACTr GTGTCGAAGT TATCCACGGT AAGAGCAAAA ACTCATGCTC AATACCGCAT GCAAAAAGTC 6240 6300 6360 6420 6480 6540 6600 6660 6720 6780 6840 6900 6960 7020 7080 7140 7200 7260 7320 7380 7440 7500 7560 7620 7680 7740 7800 7860
CAAAGATTAT
GGCAGCCCTC
GTACAAT'rAC
ACTAGTGTCA
AAAGTGAACA
AACCAGGTGA TACCGTTATC ACAAGCTGAT TAACATCATT ATATCCATAC ATCTGGACAC
CCTGCAGGAG
TCTCGTATCG
TTCTCTTCTA
TCTGAAGCTG
GGTGGTCAGC
TGCTTGATTA AGCGAAAATA CTTCA'rGCCT GTCCACGGTG CACGCTGGAC TAGCAGTGGA TACTGGTGTT GAGAAGGACA GGCGATGTGC TTGCCCTTAC TGCTGACTCA GCTCGTATCG GATATCTATG TCGATGGAAA TCGTATCGGT GAAATTGGCG
ATATCTTTAT
CAGGTCATTT
CAGCTGTCCT
CATGAGCAAT
CAACGCCCAA
CAAAGATCGT
CGCGATCTAT CTGA.AGACGG TGTCGTTCTG GCAGTTGCAA CTGTTGACTT CAAATCGCAG 72 7920 ATGATTCTAT CTGGTCCAGA GACTTGA'rC GCCAAAGCCA AAGGATGCTA GCGTGCAATC TATGAAAATA CCGAACGTGA TAAAGCAAGA AAACAGCCCC AAAACTCATA CTCAATGAAA AGCACTGCTT TGAGGTTGTA CGACGTTGAC GCGGTTTGAA AGAAGGCTAA GCGAAAGCAT
CATCCTCAGC
GCGTATCCTC
TGTCAATG4GT
ACCGATCATC
GTCCTCGGAG
ATCAAAGAGC
GATAGAACTG
GAGATTTTCG
CCAGGCTTTG
TTCAATGCCA
GCCATTIGTCA
ATCCCGATGA
CTGTTTC
AAACTAGGAA
TCTACATGAG AGAGTCTGGC TTCGTATCGC ACTGAAAAAT ACCATTCG CCCCTrCCTC TCCTCACACC AGATGAAGAA CTATCCTTTC TTTTGAGATT GCTAGCCGTA GGTTGCTCAA ACGAAGTCAG TAGCCA'rACC AAGAGTATCA ATAAAAATCG
TACGGCAAGG
AAATCAGACT
CTATGGGAGG
ACGTIATCATT
AACTTGAGTT AGCTCCCATA GTTCGGGAAA CTGGAGATGA ATCAAAGCCA AGCTTTGAAC TCATTCGTAA GAAGCCGACG TTGATrT'rTG AAGAGTTTTA GAAATACTAC GATTTTTACC TAGAAATATC 'rGCTCGGTTT ACTCCCGAAA TACGGCTGGC TGATGAGTTT GAACTTCTGA CGGGCTTCGG T'rGCGATAGA TATTGGCCGG AATGCGTTT-r TCTTCCATGC GTTPTCATCTT TGGAAATATA GCCTTCATAC TTGATT'rCTG TTTCAATCA AGTCTTCTGC AGCTGGTCCG ATGAAGGCCA CCACATCTTG GAAGGAATTC CTTGGCTGTC ACTGCATCGG TCAAGGGTTT TGGCATTGGT TTCCI-rGACT GGCTTGAGT'r TGATACTGTC CAALATTGATT TTTCT'rGATT TCAAAACGAG CCCAGCGTTC CGCGTCCCAT CTCAGTCAAG CGCATATCAG CATTGTCATG CAGCACGACT GGTCAAGAGA CGGTAGGGTT CAATGGTTCC TCATCACCCC GATATAACCA TCACTGCGCT TCAAAATCAA TCAGAGCCGC ATTGATACCC GCGATAATCC CTTGGCCTGC TTCCATTTGT CTGACCAGCA GTGAAGAGAC CTGAGATT GCAACTGATG AGGCAAGACC ATATCATACT CAATAGCATA CATTrTTCCAA ACCTTTGATG GAATGCACCA AGTCACGCTG TTGAAAGTCC TTGCACATAG ACT'rCCTCAG TAT'rGCGCCC GGTGACGTTC CTTGTCCGCA AAGCGCACAA TCTTGTCTTC GCCCCACTCC CTTGACCACA CCTGTAALACA TAGGCGCACG TCTCATGACT GGTACCATTG GTATAGGTCA ACCAGCATGG TTCCAGATAC ACCATCAAAA TTGGCCGATG Gr'rTCTGGA'r ATCAATGTCA TCCCAGTCGA GGCAACCTGG TCCATGGCTT TTCGATAATC TTGTCATCCA GTAAGAAACT TCTGGACGGC GAAGCCCATC TCCTCAACCT TAGGCGCT'rC ATCTCATTAT ATCGTCCACA AGGCCAATCT ACGAAGAATG AGACGGTATT CTTCGTCACC AAGTCGTCGA TTCAGGCTTG CCTTGGATTT TGCCTCTTCG TAACCTGATG CTTGGTTTCC AAAGTCGCAC ACCT1GTCCGC ATCATCTCTG GACATCCTCA GGCAGACTGG TTCTGGCTCA AGGAAGAGTT AATCGACGGA CAGTAACGAG GTGGAGGTTG 1-rTTGGATAA TACTTCGTCC TTGACATAAT 7980 8040 8100 8160 8220 8280 8340 8400 8460 8520 8580 8640 8700 8760 8820 8880 8940 9000 9060 9120 9180 9240 9300 9360 9420 9480 9540 9600 9660 9720 1219 CCTCATCACG TGAAGI'GTAT GAGAAATGAT TAGGCACTTC GTCTCCTGGC TGAATTTCTG 9780 TCACATCGTA ATTGATAGAA GAAGCCTTGA CACGTGGAGG GGTrCCTGTC TrGAAACGAC 9840 CGATTTCGAG ACCCAGTI'CC TTGAGAT'rGT CAGCTAGGTT AATAGAAGCC AAGCTGTGGT 9900 TAGGACCTGA TGAGTACTTG AGGTCTCCGA TGATAATTTC CCCACGGAGA GCAGTCCCTG 9960 TCGTCACAAT AACAGCCTTA GCAGCATATT CTrGATGGGT GGCTGTACGC ACACCGACAA 10020 CCT'rGCCATC TTCCACCAAA ATCTCATCAA TCATGGTTTG ACGAAGGGTC 
AGATTTTCTT 10080 GGTTTTCAAC CGTCTTGCGC ATCTC!CTTAG AGTAAAGTTC CTTGTCAGCC TGCGCACGAA 10140 GGGCACGGAC AGCTGGCCCC TTCCCTGTGT TTAGCATCTT CATCTGGATG TAAGTCTTGT 10200 CAATGGTTr GGCCATCTCG CCACCGAGGG CATCGAC'TTC ACGCACGACA ATCCCCT'rCG 10260 CAGAACCACC GATAGAGGGA T'rACAAGGCA TGAAAGCCAG CATTTCAATA TTGATGGTCG 10320 CAAGCAGGAC CTTACAGCCC ATACGGCTAG CGGCCAAGGA AGCCTCAACC CCAGCGTGTC 10380 CCGCACCAAT TACAATAATA TCGTAT'rCTT CAGTAAAATG ATAAGTCATG T'rTCTCTCCT 10440 ATTCCTCAAG ATGAATGTGT CTTAGTTGGC CTTCCCAATC TGGTAGGGCT GTTTTTAAAA 10500 AGACTGGAAC TAGCTGGATA TTWTGGAGCT TATCCAAGTC AATCCACTCA CAGGGCTGCC 10560 TTTTCTCATC TTCCTGCATG GTCAACGGGG CATCTTCAAG CAAATCCACC AGATAATGAA 10620 *ACTCGATATT GTGATAGGAA ACGCCGTCCA CTTCAAAACG ATT'rTCAACC ACAA.AAGCTA 10580 *GCTGCCCAGC T'rGAGCTrTG ACACCCAGTT CTTCCTrrCAC TTCACGGACT ACCGCGTCTT 10740 .CCGTGCTTTC ATTGACTTGA ATCGCACCTC CAATAGTGTA ATACTTGCCC TTGTCTTTGG 10800 TAACTAGAAG CTTGTGATTT TGGACAATCA AGGCrGTAGC CCGAACACCA AAAACCGTAT 10860 TGTCTACTr'r TGTCCGAAAG TCTTGTTGAG TCATTCTTGT CCTrTCCCTT AAACGACACA 10920 .AAAACAGTCA AAACTACAAA GAAGTGCAGG ACAAAAAAGC CTGCAACATC CAGG 10974 INFORMATION FOR SEQ ID NO: 215: SEQUENCE CHARACTERISTICS: LENGTH: 987 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215: CCCGTTATGA TTATGGATAG CGCTTTCAAA TTTTTA.AACT CCTATCCCAT CCTTTTATCT ATATAATAAG TGAAAATATA ATAACTGTCA AGTA.ACTGAA GTGAATTTTA TAAAAAAATT 120 1220 ACAAGCCAAA TTTGTAAAGT TTACACTAAG CCGCTAGgCA TTTATTTGTC AATAATCCGA GAAAATCTTG CAACGCTTAG CATTTATATG ACTTGCGAAT AGCAATCCTG CTAAACCTTT CAAGATAAAA ACATGTGTAA GCA.AATCTGC TACACTTTAC AAAAGCTACG ATAGGCTTGC TATCTGCTAT GTCCGTATTG AAACTTATCC AAAAAGGGTr CTCCCTTCrG GAGCTGACCG CATCTCAAAA AAGAAGTAGA GACCCATTAT CAACCAAAGA GAGCTCCTCG ACCTTTTCAT AGAATCGCCG AACGATTTAA GTTTATCCCA ACTCAATrAT GACATrTTTT TCAAAAGTCA GACAAGAAAG AGGCTGATAA TC'rACCAACC TCTTATTCTG TTTAGCTrA TTCGCTTTCT TAGCGACTGC AATCTGGTAT ATCGTCTATC AGAATATCCG AAGI'CTATAA AAACTATCAA CCACACTCTA TCTATACAAT 'rGGAGCACGC 
CAAGAATAAG GGM'r'rGTAC AGACGATrCT AGGAAACTTC TATGCTGTCT AATTATAAAA AAAGTCGAGG CGAGAAAGTA TGACTTTTAC ATATATCTCA CTM-TTCAAC AACCCATCAC TCCATCACTT TCGACTTGGT CATTCCCCTT ACCGGTACAA CCATGAGCAA ITrGTAGTCGC TTTAGAAATC AGAGGGCGGC TCAAGGCAGA ATGTGACTGA TGAGCCACTA GCACATAATC ATAAGATTCT ACTGCCCAAA CCTTAAG INFORM4ATION FOR SEQ ID NO: 2: SEQUENCE CHARACTERISTIC! LENGTH: 2651 base TYPE: nucleic acid C) STRANDEDNESS: doubJ TOPOLOGY: linear TCCTATCTGA TGCGCTATTT CAACCAATTT TACCAAGAGA TACTTTTGTTI CATAATAGGC TGTAGCAAAT TCGTCCTTAA CATCAATGAC (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216: CTGGGTCTTG TTCATAGTAG GTGTGGTtCT TTTTTTCGAG GCATAGTGGA TGGTAGTTGG ATGACAGCCA AAGTCAGAAG TCTGGATTGT CAGTAAGATA GTTTTTAAGT CTATCTCTAT CCTTTTACTT GGTGGTTTAG CTCTCCTGTT TTCTCTTTTA GTATTACGTG AGATTTGGAA AACGTGTGAT GCTTCTGTTA TAAGAGAGAA CT'rTTTTACG AAAATCTATT GAATATGCCA GTGTACTATT TTTGGTTCAT TTTACTATAT TTTATAAGTT CAAAGCACTA TAAAGTAAAT TGAAACAAGA ACAATACAAA CAACCACAAA AAAGCAAGCA TTCACAAGAA TACTTACCTA TGTAGCCCAT AGCTTTGAGC CTATTTCAGT CAAATAAGCA CAACTTTTCT TGGTTTTGTT GCTTTAACCA GCCATAAATG TACTACCTAT TCGCTCACAA TAAGAAGAT'r ATACCACATT ATAGTGTAGC ATTCCAACT CAATTCTCGT AAACGGATTG TCATGGGAGG AACAACCGTT CCTCrTr3TTT ATTACTAAAA TATTCTGGAT CTTCTGGGC ATCACTCCAT CTACTCCTAA CAGACATAA6A GTTTCTGATC 1221 TTCAAAGAAT TCCAATGCTTI TTMCAAGAG 'rACTTCTATT TCCCGCTGAA C7=r~CCAA GTGAAGAGAT TTGCTGATAG CT'rCTGAATC CCTTGTCCAT AGTTTGCTTA CAAAATATTC
CAAATCCGTA
ATCATCTGTA
ATTGACAGTC
ATCCAAGGTT
GAGTACTCCA TAGTATA'rCC TGTCGCTCTT GTTTTAGGAA ATGAAATAAA CTGGTAGI-rC GGCATCATAC TGTCTTAC'TT AAAGACTGGA TTTGATGTCC ATAAATCTTG AGCTTTGCAG TTCATCATGT CTGGACTATC TTTTTTACTG GT 'rTAATrT AGTTCGTTGG CTCGACTGAG ATAATCTTCA AAGCTTGAAA TCAAAAATAT CAATCCCTTT AAGCTCCTCC AAGTTTAAGT CCTGCTAGAT TT'N'CAAGTT AGCATCATGC ATCATGACAA AGACAGAATT GTAGGGCATG TTCGACAAC ATGGTAGTCT CATAACGGGC TAAAAAGCGG CAATTAGTAA TTTCACCA TTTTAGTCTC GTAGCCATTT CTTGAGGACT TTTATTGATA ACTGCCCATC TTTTGTT'rCC TGCACG'rCCG TCTCCACCAA GTCTGTTTG GTATTTTGAA TCCC-ATrTGC ATTGGAAACC ACCATGGGAG CCTCCAGATA AATATAACCT AGTTGTGCTG TAGTTTCCAA GGACTCTACT CCTCGGTGAG AAATAAGTTG AGGTAGATGA TCTAAGGCAA AGAAAAGACT GGCACAAGTC 'rCTCTCCTAG GAAGCATATC CAGCTCCTTT AAATAAGTCA GAGCCATAPA ATAGAGATTT ATGACACCCC ATCGCACGAT
CCTGTCAAAAA-TGAAACAAA
TTAATCACGA CAAAAT'rCAA ACCAAAGTTT GAGCCAATAA AGCAAGATGT AAAATAAATT CTAAACATCA CTGCTTCTCG ATCAATCTGA CAGAGACATA ATCCAATATC TATCT'rCTAA TAATAAATCT TCAGCAT'TTT
GTGATCTTTT
TTTAACCAAA
AATACCAAGA ATCAGAGACT TAAAGGAATC AAAGGAAGAT CCAAGCATAA AA.AGTA6ACTC AACAGTCAGC TGATCATATA GAGAAAGATA AGAGATAGAA CTCTCTGAGT GATATCATCT ArAATAATAA ATGTGCTTTG TCTTCTTGGT TTTCTCCAAG CAATCTTCGG AAGGGCAAAC GTAGGATGCT CAGCCACCAC 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280
AACAAGGCTT
CGGATATACT
AGGAGTCCTA
AGACTAATCA
TAGCGCAAGT
CAATTAGCCT
TTTGAAAATA
ATGGAGAATG
ATAAGCTTGG ATAAACTCTG GAATGACGAT TTTATTAAGA CCGTATAAAA GGAAACAGCA TAGCTATATA GAAAAAGATA TAGCTTTTTC ATAAATCCAA AACTTTCATG GAAAACCTTG TCGCTT'rTCA TTATAGAGGA GATGACGAGC ACCAATAAAG AGCAACCAGA AGGTTAATTA CAATCAAGGC TAAAAAAGCT AGTAAGGATG GCTAAGACAT TGTTATAGGA AATAAAAA(GA TAACCTGTCT GATCTAATAA GAAGCTAGCC AACCATGAAT TCCACTATCA TAAAAATCAA GAAAAATAGA AAGAGGATTT TGAATGGTAC CCACAAATAC TATCALAGATC GAGGTAAATC 1222 TGTTTAAGAC CCAATTTTrT AGGTTTTTCA GG'FPTCATAG GAGACAAGTC CAAGCCACCA AAAGGAT'rGT TTGATALAGCT CCCTAGCTTG ATCCGACTCT AAGAAGGATT CGrAAACACG CTAAACTATT ATGAGACTGA CCTTGAAATC CAAGAAATGA GATTGGCAAT ACCATGTAAA TCTGAACTCC GACGT'rCAAA CC'rTGTACTG TTGGCTATAG TCTAAACCAT GCTCTGCTAA CAGCATrGTA G INFORMATION FOR SEQ ID NO: 217: SEQUENCE CHARACTERISTICS: LENGTH: 5638 base pairs TYPE: nucleic acid STRANDEDNESS: 'double TOPOLOGY: linear GCACTCCTAG TCAAATAATT ACITTCTGTC TCTAACAATT CGCCGTCATC CGAGCATCCT GGCAACAGTT TGCAATTrGA AGCTTCATCA TACAAATCCA AATAGGTAAA TCACTTTTAG 2340 2400 2460 2520 2580 2640 2651 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217: CGTTATAATA AACTTGTGAA AAAATTAACA AAGGATATCG TTCCTTGAAA GCTATGGAGG a.
a a a a CTGAACAAGC AGTTATCATT GATAAAGAAA. TTTACGATGA ATTTGTAGCA GAGTTCAAAT CTTACCACAC TTACTTTGTA AACAAAAAAG AAAAAGCTCT TCTTGAAGAG TTCTGCTTCG dCGTCAAAGC AAACAGCAAA AACTGTGCTG GTGCAAAATT GAACGCTGAC ATCGTPTGGTA 1223 AACCAGCAAC TTGGATTGCA GAACAAGcAG, GATTTACAGT TCCAGAAGGA ACAAACATC 1080 TTGCTGCAGA ATGTAAAGAA GT'rGGCGAAA ATGAGCCATr GACTCGTGAA AAATTGTCAC 1140 CAGTTATTGC AGTTTTGAAA TCTGAAAGCC GTGAAGATGG TATTACTAAG GCTCGTCAAA 1200 TGG7TGAATT TAACGGTCTr' GGACACTCAG CAGCTATCCA CACAGCTGAC GAAGAATTGA 1260 CTAAAGAATTr TGGTAAAGCT GT'rAAAGCTA 'ITCGTGTTAT CTGTAACTCA CCTTCTACT'r 1320 TTGGTGGTAT CGGGGACGTT TACAATGCCT TCTTGCCATC ATTGACACrr GGATGTGGT'r 1380 CTTACGGACG CAACTCAGTT GGGGATAACG TTAGTGCCAT TAACCTC'N'G AATATCAAAA 1.440 AAGTCGGAAG ACGGAGAAAT AACATGCAAT GGATGAAACT TCCTTCAAAA ACATACTTNG 1500 AACGTGA'rTC AATTCAATAC CTTCAAAAAT GTCGTGACGT TGAACGTG'rC ATGATCGTTA 1560 CTGACCATGC CATGGTAGAG CTTGGTTTCC TTGATCGTA'r CATCGAACAA CTGGACCTTC 1620 GTCGCAATAA GGTTGrTTTAC CAAATCT'rTG CGGATGTAGA ACCGGA'rCCA GATATCACAA 1680 CTG'rAAACCG TGGTACTGAG ATrrATGCGTG CCTTCAAACC AGATACCATC ATCGCACTCG 1740 GTGGTGGGTC TCCAATGGA'r GCTGCCAAAG TAATGTGGCT CT'rCTACGAG CAACCAGAAG 1800 TGGACT'rCCG TGACCTGTC CAAAAATTCA TGGATATCCG TAAACGTGCC TTCAAGTTCC 1860 .CATTGCTTGG TAAGAAGACT AAA'rTCATCG CGA'rTCCAAC TIACATCTGGT ACAGGATCTG 1920 *AAGTAACACC ATTTGCCGTT ATCTCTGATA AAGCAAACAA CCGTAAATAC CCAATCGCTG 1980 ACTACTCA'rT GACACCAACT GTGGCAA'rCG TAGATCCTGC TTTGGTAT'rG ACAGTTCCAG 2040 *GATT'rGTTGC TGCTGATACT GGTATGGACG TATTGACTCA CGCGACAGAA GCATACGTAT 2100 **.CACAAATGGC TAGTGACTAC ACTGA'rGGTT TAGCACTTCA AGCCATTAAA TTGGTCTTTG 2160 AAAATCTCGA AAGCTCAGTT AAGAATGCAG ACTTCCACTC ACGTGAGXZAA ATGCATAACG 2220 *CTTCAACALAT CGCTGGTATG GCCTT'rGCCA ATGCCTTCCT AGGTATTTCT CACTCAATGG 2280 **CCCATAAGAT TGGTGCGCAA TTCCACACAA TCCACGGTCG TACAA.ATGCT ATCTTGCTTC 2340 CATACCTTA'r CCGTTACAAC CGTACACGTC CAGCTAAGAC AGCAACATGG CCTAAGTACA 2400 **.ACTACTACCG TGCAGATGAA AAATACCAAG ATATCGCACG CATGC'N'GGA CTTCCAGCTT 2460 *CTACTCCAGA AGAAGGGGTT GAATCTTACG 
CAAAAGCTGT CTACGAACTC GGTGAACGTA 2520 TTGGGATCCA AATGAATTTT AGAGACCAAG GAAT'rGACGA AAAAGAATGG AAAGAACATT 2580 ***.CTCGTAAATT AGCCTTCCTG GCTTATGAAG ACCAATGTTC ACCAGCTAAC CCACGTCTTC 2640 ***CAATGGTAGA CCATATGCAA GAAATCATCG AAGATGCATA CTATGGCTAC AAAdAAAGAC 2700 CAGGACGCCG TAAATAATTG TTTATCAGTC TAGAAGCAAG ACAAAAACTC AATTTGAGGG 2760 AAAGATCCAG TAATTTTTCT ATGATAAAAG
AGGATGCCI'
AAAATATATA
TTCGATTTGA
TATAGTATAG
TTAATTTTAC
TAAGAGTAGT
ATTCAGCTCA
TACGATAA.AA
TTTATGATAT TGAGGCCTT'r ATAGATTGAA ACTACAATAG CTGTCCI'GAT CGAT'rTGTCC TAGACTGAAT CTAAAATAGT TTTTCTGATA GAGTTGTTCA ATTTACTAAG GCCCAATTAA AAACACTGAT TTGAGATTGC CGATGACAAG GTGTGTTGCT 1224 GCATCCTATC AAGGTTTTTG AACACCTGAT TTGCCC"rTT TGAAAAACTA GAATAGAAAC TACATATCTG CTTrCAAAAC ATTGTTAGAA TGTTCTTATT TCATTTT'GAT ATATA.AAAAA ACGAAACAAT TGCTAAAACA CATCTTAT-rT CAATTCACTA AATCAAAGAG CAAACTAGAA AGATAAGACT AGCCCCCTCA ITTGATTTC TAAAGAGTAT TT'rAAGAAA
TAGTTTAATT
AACGAGTGCC
TTAACAGATT
AATGATAGAT
CTC'rATAAAA TAAGTGCGAA GGAAATGAGC TTTTATAGTC CTTTCGTTTT AAAATACTAT CTCAGATATT CTTATATCGA CAAGAAGTTT TTGAGTCATT. CCCrCATCAT ACATATTAAA TAAATAGTGG CTCATTCAAT TTTTCACTAG AGATATAAAC AAATAAATTG GAGCTTAACA GTGGACTATT CTAGATTCAA CATATTATAA TGTGTAATGC AGGATCCAA'r CCTTTCAATC TGTAGTCGTC AT'rAA'rAAAG ACAGATGGGA ATCGATGAAT TCTTAGATA GCAACTGAAT ATGCATAAAT AGCAAGGAGA ATCCTATTrT AGAGTGAATC TTGTGCGCTT CTACTAAGGA TCTTTTAAA T'rTCCTGCCT TAGCTAGTTG TCCACTATTG GCAATGAGCT GAAAACAGAT CCTATCCATG CAAGTCTG TTCCAGAACT ATATCAATGG TATCTACAGG GATATTTCCT CGAATTTGTG GCCCAATTAG AATGACATCT AATAATAAGC TAGTATAGTA AACTGAAATA TCCATTTCCA GCAAT'rTTTT AGAAACTACA AAACTAGAGT AAAAGAAAAG GATTGGATCT ATTTTGTCCA ACT'rTTGGAG GTTCCTACAA ATGACAGTGT TCCTATTTAT TTTGATAGAG AATCTCTGTr GAAGCCATTT GGTCTT~CTGC TCCCAGTA GCTTCTTTTT GTATGAGATT GTCTTCCGCT TCTTCAACTT TAATT'TTCGC GATGGCTTCA ATAAAGGATG ATTTGCCTGC 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560
ATATTCCATT
TTTGCTCCAT
GCAATTTCTT
GCTTCATGGA
TCTTCTGTCA TCT'rATTTCT TCATCATTCC GTA.ATCCCC TCACAGCAAG TAACTCATAA TATTCTTT'rT AGCTTCTGTC TCGCACTTTG TTGCATTTT ATTGATTTG CTrTGGATAGA GATTTCA.ATC CCACGTTCAG
TTAACAAGCA
TTAACCTTCC
AAGAACTCCA
TGCTAACCAT
TACTTGTCGA CATTCCCGCA TTACATACTA ATAAAATTTG TTTCATAATC ATT-TCTTGTT CAACAACT'rT GTCATTAACT AGTGCAAAGA TGATGAA'rrG AACTAGAACT GCATTTAAGA ATACTGGTGT AGTCCAAGGA TTGATAAATG GAATGTATAG GCTCTCACGT CCCCTGCTGT ACTTGTATAA ATGCAGGACT CATGAA'rTCT GTAACTGTTG CTAAGTAGCT GATTAAAATA CCAAGGACTG GAACTGTGAT 1225 AAATGGAATA GCTAATGAAA TGTrATAAAC GATrGGGrAA CCGA.ATAATA CGGTTCATT GATATTGAAG ATACCAGGTC AC'rCACrAAG AATGTTGCTA GAATGT'rrGT ATTTGTGATA AGTGATGTTT TCAGTAATGTI TGCTTGGTGA ATACCAAATA GATTAAGCT'r GTACCAATAT TAAGT'rCATT CCAGTTATAT GGTCATGACT GGAAGTAATA CAAAAGATAA TTrAGCCACG TTTTAGAGA CAGCATTGCG TrAATAAACA TAATGTAGAT CCACTACCAC CCATTAAAGC
GGTTGATGAT
TAATrAATAG
GCCATAACAT
GACGAATTG4G
TGAATAATAA
CGCTAAATGA
GTGTGGAATG GCTTGTCCAT TATTTGCTGC TAATGGTTCT AGGATGGCAC TGTAAATAAC ATI'CCTAAA GAGTAAATAA TAATGACCCC TTCTTGAATA AAGAT'rGTAA TGCTGAAACA ACCCCAAATA
TGATTGAGAT
AGGAGATGAC
AAGGTTCATT TGTAAAGCTP TAACGTTTGA CGtACGATA.A CCCCGGCGAA CAT'rGCGCCT GAAATGTTTA CCGCATCTT TGCTCCGTCA TCTACTAACA GCTGGTGGAA TATTTTCACC TAAT'rCAATG AATAATrCTG TT-GCAATAAT GTACCTGTGT TGTrGAATGA AAGAACACCT GGAACTACAG AAACTGTATT TGGCATCATC GCTGCTA.ACG GGTTTTCGAA ATCTCTGT'TT ATAATCATAC CTGAAATACT TAAAGTACCG CTTGTTAATG TATCCCCTTG GAAAATCCAC
ACAATTAAAG
TTAGCTAAGA
TTTGCAATTG
AAACTAATGA TAGCATTGAT AATAACCAAC CATTACAGCA TTATTCCCCA ATATTGGAAT 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 5460 5520 5580 5638 120 180 240 300 TTAAATACCG TGTTGTTCAA AAGAACGAT'r AAACCTGCCA AAATATATAA TGGCATTACT GTTACGAATG CATCTCTTAG GGTTTTTAAA TGAATTTGGT TCCCTAGTTT ACCAGCAAAG GATGGCAAAA AAATT'r'NTT GGGGGGGGGG GTTATTAAAC CCCCCTTrTTT AAAAAAAA INFORMATION FOR SEQ ID NO: 218: SEQUENCE CHARACTERISTICS: LENGTH: 4745 base pairs TYPE: nucleic acid (C STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ, ID NO: 218: CCGGAAGCTG TTGCCCTTGG AACTCCAAAT GAAGAAACAG CCTTTGTCTT GAACTATTTT G7GTGTGGAAG C-ACCACGTGT TATCACTTCT GCCAAAGCAG AGGGGGCAGA GCAAGTTATC TTGACTGACC ACAATGAATT CCAACAATCT GTATCAGATA TCGCTGAAGT AGAAGTTTAC GGTGTTGTAG ACCACCACCG TGTGGCTAAC TTTGAAACTG CAAGCCCACT TTACATGCGT TTGGAGCCAG TTGGATCAGC GTCTTCAATC GTTTACCGTA TGTTCAAAGA ACATGGTGTA 1226 GCTGTGCCTA AAGAGATTGC AGGTTTGATG CTTTCAGGTT TGATTTCAGA TACCCrCT-r TTGAAATCAC CAACAACACA CCCAACAGAT AAAATCATTG CTCC'rGAATT GGCTGAATTG GCTGGTG'rG ACr'rGGAAGA ATATGGTT'rG GCAATGTTGA AAGCTGGTAC CAACTTGGCT AGCAAATCTG CTGAAGAATT GATTGATATC GATGCTAAGA CrTGAACT CAACGGAAAT AATGTCCGTG T'rGCCCAAGT GAACACAGTT GACATCGCTG AAGTTTTGGA ACGCCAAGCA GAAATTGAAG CTGCAATGCA AGCTGCCAAC GAATCAAACG GCTACTCTGA C7TGTCTrG ATGATTACAG ATATCGTCAA CTCAAACTCA GAAATCTTGG CTCTTGGTGC CAATATCGAC AAGGTCGAAG CGGCTTTCAA CTTCAAACTT GAAAACAATC ATGCCTTCCT TGCTGGTGCC GTTTCACGTA AGAAACAAG'r GGTACCTCAA 'rTGACTGAAA GCTTTAATGC GTAAGATT GGGTGTCAGC TCAAAATCGG AAAGTCTAGT TTGCCT'rATA TCGCAAGGAG TTTCGGCTCC T'rTTTTCTAG GAGTGAAGTA 'rGTTAGAAAA TGGCGATTTG ATTT=GTGA GAGATGGGTC Voo*.
AGACATGGGA
GGATGGGATG
CTTCITrGAG
GGTGAAGGAA
TGCAGCTGGT
TCCTATGAAA
AGAACTAGGT
ATCGCCTCTG
CAGGCCATCC AGAC'rTCCAC AGGTAACTAT AGCCATGTT'G ATTTATCATG CTAGTGGACA GGCTGGTGTT GTCTGTCAAG TCCAATCATT TATACGACCT CTATGTTTAC CCAGAAATGG AGAGCTTGCA AACATC'rTGG AGCACCCTAC AATGCTTCTT TTTTACTGCT CCCAGTATAT AGCAGAAATC CTACCTATTT 'rTTGGAGWTrG GGGAGCAGGA GATTAGTGAT ?r'rTGAGGG CTGCCTGTTC CTCTGAACCA AGCTGGTACC AATCCTAGTC TT'ACAATGTA AAGAAAGGAA TCTTCATGAT TCAGATTTTT
CCATTTATTT
AACCGGCAGA
ATATCCAGTC
TCTATCCAGA
TTGAAACTAT
AGTAT'rACAT
AGTTGGCAGC
AATCCATCTC
360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 GTT'rGACGAG ACAGCCAT'N' GATTCTACGG GAAATTAAGG GTATATAAAG GCAGGCTTGA GCTTGAATCA CTTGATAGTC GGTC'rATCAA GCCrTTTGGG AGCTATT'rrA GTTGAAAAGA T'rACAAGGTC AAACAGCAGT AGGAGATGTT AATCCTCYAGT CAAAAAAGAC CAGCTTATGC AGGCTATATT CTGCAAAATG GAGGTTAACT TTCTACTTAG TTGGAGAAT'r GATCCGCTAT CTGGATCAGT ATGAGGATGT CTCAATTTCC AGATGTTGCA GTTGATAAAC TCATGGAAGA TTCTACGTGA AAATAAGCGC TATTACCTCA ATTTTCCTAC TTGAACTGGA TCAAGAGATT TTTrGTCAGAG AAGCTAGTCC AGCAGAGTTT TGAGACGGAA TTGCGCAATC AAATCAATGC CGGAC'ITTGC GCGCATTAAA ATGACCCTGT CCAATTATTT ATCCTTTGAC AGAAAAACAG CAGGAGCTCT ATGACATTTT ATGCCCTCAA GTATATCACG GCTTTTTrGT TGAAATTTCT AGAAATGCCG TGATATCTTT GTGGACAGTT AGGTTGTCTT AAGATGGAAA GTATGAGTTG GCTATCGATT TTGATAAGGA CGTGATTTCT TGT'rTCTGAG TACATTGTTT GACTTTCCT'r AGTATTCGCT ATAAACTATA TGTrAACCGGT ATCATATGTC ACTTGAAAAC AAATTGGAAC GTAAAGTTAC TGGAGACAGC AAGACAGAAC 1227 AACACATATC GGAATAAACT AAAGGAGACA AAGCAACAGG CGCTGTCAAA GAAGGr'rTG TTGAAGCGCC TGTI'GAAAAA ACAGTTGCTA AGGCAAAAGA CGTTGTAGAA GACGCAAAAG GTGCTGTAGA AAAACGTTTT TACTAAAGAA TA'rTCTTATA AATAATTT'rC TGGATGATTT GGTAACAATC T'rTGTGATAG CCATTGCCGC TGATTGATGT AGCTACTAAT TCAGCAGTCA GAAAGACT'rc CGACCAC'rTT CTCCCTTGTT TGAGTTTCAA CCACATTGAT AAGACCTTGG CTCCATTTTT GCGATGCCTT TTTGATGTC GCAACTGCAT TT'rCTTGGTT TTACCCGTAG CTGAAAAGCT TAGGAAAAAA TCAACGGTTr TGCGACGGCT GTATCTCCTC GGCTCCCTTA CCCGCAATAA AGT7GCCGTr GAAGGTTTGA CATTTTCCCT TGATTTTTTG GGTAGGATTC TTTCTrGCCC TAACTGCATC TAArTCGTGA CCGCAATCTT' TTCAACAGGA GGTCTTCATA GTTAGGGTCA GGAGGAGGCC AAAGTCCTTA GAGCAATCTT TCCGGTTTGA
CTTGATGGCT
TTCATCTGCA
AATCTCAGGG
TCCTGTTGAT
TCTTGGCGAT
ATGGCCATTG
TGTTGAT'rGA
CCCAGAACCA
GAGTTTTTTC AGACTATCCC CATTG'rGGGC ATAGTCGATG CTCAGTGAGG ACT'rCCATAC GACCAGGAAC GCGGGTTGCA CTCAAGACTT GCTCCGAGAC GGAGACAAGC AAGTCCAGCA GAAGTTGCCA ATGAG'rTGAA TATCATAATC TCCAGCGAGT 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 a a a.
a a a a AAAGGCTTTG GAATTCTCGA TTTGGTTATC CCATAGAAAT CATGGTCTTG ATC?1'CAACC TGTTCTTTCA AGACTGAGAA TCACTGTTAA TGATGACTGC TCGGCTCTTT TCTTCAAAGC TAGGGTGT'rC AATCGGGCCG CCCACATCAA AGGTTAGACC ATAGACACGT ATGATGAGGT CGGTACGGTC ATT'TTGCACA CTCTCAGGGG TTGTCAACGC TGACTTAAAG GTCGACAACA TAGCAGGTCT ATGCCCTTGA GTTGTCTTAC CCTTAGTACC AGTAAAGGCA TAGAACTCCA TGGCAATCAA ACTCATGCT ATACCCACTT CGTAG'rCCTT TTCAGCTACA AGAAGGTA'IT CTTTTT'rAAA GGCAGCGCCT TT'rCGGCTGT CGTAGCTGAT GCTATCAAAA TCCATCAAGA CACCTGTG ATATGGTCTG GGCTGATATT TTGACCAdAT AGGCr'rGACT GCCTGATTCA TCATGTCAAA AAAGTCTCGC CATCAAGAGT GATAAGATGT 'rATAG4GCGAA AGGAGTTTGA GT1VPTTTCCTC.
AAATTGGCTA
GTGGTCCATG
GTAGAAATAG
TAGGAAAACT
GGAGACTTCC
GAGGTCAATA
TGTGTTCATG
ATAGGCTGCT
TGGATTACCA
TTCTTTATA. CGTTCACAAT GATGACAGGG TACCAAGCTA ATCCTTGTGT TATAGCAGAA TTTGCGAAAA AAAGAGTGTC 'rTCTGT'rACT ATAACTTTGC TIGTAGT'rGTA GTGGTAATGA CCTTGGTCAA TAATTTCGCG AAAAAGGCCA 'rCTTTCTTTA AAATATCTAA TACGGTTTCA 1228 ATCTTAATCA TACTTTCTAT TGTAAACCGA AAGTCGTAAA TTTACAAGrA ACAAGGAAAA.
GTTTATAATG GAAGATAAGG AGTrTTCCT AGTTATCAAA ATTGAATGAG GAATCTATGT CGCACGAAAA CAATCACCAG CAGGCCCAGA TGT'rACGGGG GTAACTTTAT CAGTCGCC'rA CTCGGGGCTG TTTACATTAT GGGCTTATGC AGCTAAGGCA AATGGTCTCT TTACCATGGG TCTTGTTGGT TTCAACAGCG GGGAT'rCCAG TTGCGGTGGC ATACCATGCG AGAAGAAGAG CATAGCTITG CCCTGATTCG CAGGACTAGG CCTGGTTTTT GCTTTAGTCT TGTATGTCTT TGTCTGGCGT GGGCAAAGAC TTNTCCCGTC TATGAGTGTT ATGCCATGAG CCAAAT'rGCT TTATCATTAT GAAGCTCGGT CTGCCTTTGT CGGTATGGTA GTTCACTCAA AAGAATCTTT ATACCATTAA GGAAGCCATT
TTTTG
TTGATCCCAA TCATGCAAAG ATCCGAGGA'r TTT'rCCAAGG GAGCAGGTCA TTCGTGTTAT TCAGGAGATT ATCTAGCAGC GCCAGTTTTG CAGTCTTGAT GAAACAGGAG ATAAGATTAA CCTTTTATCC TGACAGGGTC GAC'TGCTTGG CTAACGGCTA CCCTTGGTAC ATCTGGATGG TTACAATATC TATGCTTGGT CAAGCAAGTr GCCAAGTATA GAGCTTCTTA GGCTITrATGA TGC1'CCTTGG CTAGCAGACT CT'rGGCTTGG GGAGTCT'rGA GATGAATAAC CTCAAACCCT CTGGATGCTC CTAGCAACCT CGTTACCCAA TCAACCTTTG TTATTTCCTT GCCCAAGAAG CAGTAAGCGT CTCTTGGTTG TGCCATCCAG CTCTTCCAGA 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4745
INFORMATION FOR SEQ ID NO: 219: SEQUENCE CHARACTERISTICS: LENGTH: 1900 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219:
CCTGATTGAC CTTATAATAA GGAACAAAAC ACAATGCACT ACCTTTTCAA CAAAAGAGTT GCTGCTTGAT TAAAACCATC ACACCAGTTA TACCATTTTG CTTCATACCC ATCTTGAGCT AGGATACGAT CTTCTAAATC AAAAACAGAG TAAATCTTTC TTTCCTCGCA AGCTTGCGCA TAGAGATGAT ATAGTTCATC ACCACCATCT CTATCCCACT CAGCAGAAAT CGTATCCCGA CCTGCCAATA AAGCCTGATA AGCCCTGTGA TGCCCATCTG TAATCAGCAA ACAATCTCCA AAGGCAAGAA TACTGATTGG ATCGACTTGG ATTGTTTCTG CCGACTGGTA AAGCATCTGA ATATCTTGCA ACTTCTTT-rC TGATAAATAT AC'TTCATTT CTTI'CTCCTC AAGGGAATTC AGTTGAGTCA. GATGAAGATC TGCTATATTG GATACTCACT TCTGTTTGCC TTTAAATCGC CATTGGAAGC GGA9CrTGTC CAGAGACTGG CAAGGACGTC ACACTGACTA GGTAGAGGCC ATAAAAGGGA AACTCGATAA ACAGGACTCC CAAGCCCACA TGATGGGTAA TGAAC'TCCCA GATAGACTCT TGATACCAGC AAGGACGA'r' TGTACGAITr TTCTCCAGAC CTGATCTrTA ATCCGCTGAC TAAGAATAAC AATCAAAGTC CCACTTGGGA AGGAAAA'rCC CTTCTCCTCC TGGTAGATAT TTTTAAAGGT CACGATTAAA AAGAAACTTT CTATCTTCCA TCGCTTACGA GTGATAATCA CTGGGATATC AATCAGACGT TCTGGTAAGT CTCCTCGAAT GGCAGTCTGA GGGTAAAATT TGACCATGTA GCCAAGAATA ATTAAAAATG TTTGTTrATC TCTCATAATG CAACCAGAAT GAAACGGAAA AGATXACACC CCTACCArCA GCGTACAGC TAGAGAATGC ACCAGATGTA A.AATAGCTGG TCGTGGGCGC AGACCTGCCA AAGCCAGATT TCCCAGCATG TAAAAGACAA AAGCTGTAA'r GACAACCCAA GTGAGGGCTC GAAAAAGAAT AGTCAAATAA ATCGATTIGGT CAAAATTGAC CAACAPTTCA ACGAAAAGTA AAAGGGCAAA ACTGCCCTTC TTTTA.AGGTT GGTT'rCAAGA GAACATACAA TrCAATCAAG TTAAAAGGTA ATACCATGGT 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1900 CATTAGGTAG TTGGAAAGTC CCAAAATTTT TCCAATATCA CAAAGGAACA GCATAAACAT AGTTGAGAAC CAACATGGCC AGCTAGAGAG CCTAGTAGGA AACGAAGGGT TGTCCGTTCC TACG.ATGACA AAAACTCCCA AAGCTACGA-T ArTCATCGGC TCCTTGGCTG TTAAGAAGCA ATTTCAAGAG TGAGCGAAGC AGGCAAATCC ATGACCACCA GACCCACAAG GACTGGCAAG GAAAGATGCC GCTGGTAAAA GCGGAAAGTC AAAGTACATC TAGAATTGCA ATGGTCGAA.A GTCGACGTGT GTTTGTCATA ATAAAATCAG AAGAAGTrGG AAAGGATTCC TCTATCTATT AGTTCCCTCT TACTCTATTA AAGAAAAACA AAGCAAGTGG TATCAAAACA GACAAGGCTA TTCTTTCGTC TTCTCCCATC GGAATCTCAC CACATCACGT TGCGCTCA.CG GACTTCTTTA
INFORMATION FOR SEQ ID NO: 220: SEQUENCE CHARACTERISTICS: LENGTH: 4692 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear A.AGTTAGCAA ACTTAGCGTA A.AGGTTAAAC CAATAGTTCC TTTTTCCAAA TCAAAGCAAA AAACCAATGT AAGTATTCAC AAGAGCACTC CTAGAGmCsC ATACTAAATT CGATCFGAG AGCACAAATG AGATGGCTGA ACZAGGTTCCT CCAATTTTCT CTCACTTTTT ATATCCCAAA TTACAATCCG GCTATAAATC CAGACI'ATAC TGTCGGTTGT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220: GGTTTTCCAG CAGGAGCTTC TCC'N'TATCA GAATAATGAT ATTTTTTACC ATGATAGTAA TCTCCATATG TCCATACTCC TCCATCTGGA AATGTAAAAC TTGAAATAAG AGCTAGAGCA TITTTTTCTTT CAAAAAAAGC ACACCTTGAG TCTTTTArrG AAACCGCTTT CTTATGTGAT CGAAAAAATC GAATTTNTTA GATATTTTAC TAGTTTTATT TCAAAATAAT ATGCAACCAG ACGAATTTTA 'rTCAAGTTTT TCCCATTCAT AAAAAGCAAT TCAAACTATT GTAAAATTCC GAATGACCAT CCCATCTGCT CACGATAGAT TPTTAAAAAG CCTAACCACC TCCTGAACCT TATTATACAG CAGCTGATGC AGCTCCCAAT AGTAATCTAT GTTrTTTTCGT TTTCATTTTA CAACAATGcA ACAAAATAAA TCCTCCTCTC
AAGAATAACT
TATATTACCT
TACTAACCAA
ACTATACAAG
TAGCAAAAAG
TTTTTATtAT TTGTTGTCAA CTGTGAATAA TA'N'ATATAG ATATAAAATA GATGCCATTA TAAAAGAGAT GGTGTTAACT AGAGCCGAAA CTCTCTTTTT CTCTAACACT AAAGTAAGCT ATTGAAAAAT TTTCAGAAAA TATCTTCTTT TACTTTTTTT GACTGGCATG AGTGTGATGT AGGATCAACA TGGCTATTGC TAGGAATATT TC'rGTTGGTA GATAGAACCA ATAAAATCAA GAGTGCCACT AAAATACATA CCATAGCGAC GATATTGACA CTCCCTTTAA TGCTTTCTGG TGTCGCAAAT ACATAGAGTA GGAGCAGTAA AATTCCTAGG ACTAAATAGA CCATCTTTCT AGCTTTCTCA CGCTCTGCTT TACTTCCATG TACTCATCGT AGGCGTCTTG ATAACAGCTG TGCCTTCGTG ATGTTAACTG TTCATAAACC TCAGTACCTG TGAGTAAGTT GTAGCCTTAC AATTTCCCCT GGAATCAATG ACGAGTCATT GATAAGAACT GACCAAACGA GTTTGACCAT AGAAAGGCTT TGGATAACAG AAATGGTTCA CATTTAATAC AAGTTCATAG CCCTCACGAC TCCTGAAACA GTCCATTTAT TTGCAATTCT GCCTGCAAGC
CTCTTTCTAG
TGTTAAGGAT
CGTTCAAGAA
TTTGGTCCTT
TCAAGTCATT
GGTTGACAAA
CAGCATCGAT
GCAAGTATTG
CTCTTATTCA
TTGTTTACGC
CTCAAGAGAC
AGTAGCTGAA
TTCACGAGAG
GATCGTACCA
AGAAACAAGG
GTCGAAGGTA
GCTGATTTTT TCTTCTTGTT AAACGGATAG ACTCAGGCGT TCTTCAAGTG TCAAGATACG CGAACGTTGG TCATTTGTTT TTTTCACCGA TGATCAT'rCC CGTTCTTCGA TAGACATGAT GCACCACGGT GACGTCCACC TGGTTCATGA TACCGTAACC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 CAGTTGAGTA TCCAATCAAA CCACGCGCTG TACCAGTTGA AATCATATCC AACATTTCAC ACCCTTGGTA TTCTTCTGGA GTGTCGAT
GAACAAGGAA
CT'ITACGTTC
GTACACGTTC
CGTCGATTTC TTTTACGATA ACTTCTGGAC GAGATACTTG GCATTGTTTC GATAAGGATT GACAAGTGCA ATTCTCCACG CTGGTGAATC AGTTGGGTCA ACACGAAGGG AAACGTCTGT GTTCTTCCAC CTTACGAGAA GTTACCCATT TACCTTCTTT 1231.
ACCAGCAAAT GGTGAGTTGT TGACCAAGAA AGTCATT'rGA AGAGTTGGCT CATCGATGTG TAGGATTGGA AGAGCTCTA CTGCATCTGT CCGAGTGATG GTTTCACCGA CAAAGATGTC TTCCATACCT GAAACGGCAA TCAAGTCACC CGCTT'rGGCT TCTTGGAT'rT CACGACGTTC CAAACCAAAG AAACCGAAGA AGAAAGGGTA ACTTGGTCCC ACGTCCAACG AAGTCATTGT GTTATCTACT GGAGCTGGGA TTCTTGGTCA GCTGGATCAT CACTGGGAAA TCAAGCTCGT ATCCACTACT TCTGCTGGAC GACAAGGTCT TGTTCCAAGG ATAGGCATCT ACGACCAAGA ACCAAAGTCC GCGTGTCCTG GGCAGTATTT TTAGCAAGCA AGCACGCTCT GCCAATTCAG AACCAGGGTT GTTTTACCGT TCTTAATT'rT GTCATGATTT ATACCATAAT TTCAAATAAA GTTTTGTAAC ACGGAAG'TTT TTAGTTGTAC CGTCAAGT'TT
CAACCTTAAC
AGTCCAAAAG
TATGGTCGAT
CTGACAATGA
CGTCATCTGC
TGTACCACGG AAGACACGAC CGATACCGAT TGACACTTGG AACTGCAAAG GCTCATCT3GA AATCGTGTCA AAGATTGGTG CCATAGTcrr
AGAAGTTCCG
ACCAAGCI'CG
.9 99 9 9 GAGCTGATGG CTTATCGATT CTTT'TTTCAA TACGAAACGA CAACACCGTC AACCATTTTC GTGTGTCCAT AATGTTGATA TGGTAATTCC ACGCTCTTT TCCGTGCATC AAGCG'TTTCT GGTCAACGTG GGCGATAATC CCTCTATAAT ATTCAAAATT TAACATAACT CAAGCAAGTG TTGATCGCTG AAGCATAAAC ATGAAAAGTT CCAAGACTTC TTGTTAACAA CCACGATTG GTTTGTGGCA TGGT'TCCTTC ATGATACGCT CAACTTCTcC CGAGTTCCGT TGTAAGCAAC TCGATATCGT TTGAGTCCAT GATTGTTTCA ATAA'rTCGTC GcAArGTTAC GGATAWCTTC TATTTTCTAA CTGAACGATT TAAATGTTTT CACTCTGCTT ATAAGATAGG CACAGTATGC GGCCATGAAG CAAGCACTCC CCTAGCTCA.A AATATCGATA AGGCTATGAA CACACCTATC AGACAAGGAA CTTCCCACCG CTATGCCGTT GGCCGACTGG TCCCTTGGGC TTTCAGCTCC 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 TTCTTT'rCAC GTCA.AGCCTT T'rCAAAGCGA GCGACTTATG GTTTAGATAA TTTATTAGCT CAAGAAAAAA TCAGCCGAA TCAGAGGGGA AAT1TCTAGTC GATGGTTGCC CAGCCCGCTC CAGGACTACA AGAACTCCTT TTTCAGGATC GAATCATTCA TTATGCTTCA TAA.ACCTGCT GGTGCCGTTA CAGCCAACAA TCATGGACCT GCTTCCATCT AACATCCAGT CTGACAAGCT ACCGAGATAC AACGGGACTC CTCCTCTTGA CCGATAACGG TCCATCCCCA ATATCATGTC GATAAGACT'r ACCAAGTTGA GGTTAATGGA CTTCTAACAC CTGACCATAT CCAAACC'3TT CAAAA.AGGAA TTGTCTTTT AGATGACACT GTCTGTAAAC CCGCAAAACT AGAGATTCTA TCTGCAAGTC sCTCCCTCAG TCAAGCCTCT ATCACCATTT CAGAAGGAAA ATTTCATCAA ATCAAGAAAA TGTTCCTCTC GGTTGGTGTT AAGGTGACTA 1232 GCCTCAAAAG AATCCAAT'TT GGGGACTTCA CA'rTGAACCC AGATTTAGCA GAAGGTAACT
ACCGCCCTT
AAAACAAAAA
ATTGCTACAA
GAACCAAAAA GAGTTACAAA TCATTAAAAA CTATTTAGAG AAGC'N'TAAA ACTAAAGCTT TTTCTTTTA T'I-rACCGAAA TCCAGTTAAC TACAGAAATC ACAATTCCTA AGATATrAAG
ATGAGTCGAT
AATTAAGGCG
AATCTTTTCT
ATTTTATAGT CTAATTGTGA CTCTTTTTGG TATGAAATAG CCCAAAATCA GGCCTACAAT TGGAAATAAC AAACCAAGAA AAAAGTGGAT TTTcTCTT TTCTTTTATG TTCTAAGAAC TAATTATACT ATAAAACAAT AGCTTCATCC TATCATTCGA GCTAGTC'rTC ACTTTCCCTr TCCAAGAATC CAAGCCATAA AAAACCTTGT TTrTrCAAGT AAAGAGCTGC ATTTGTAACT GTAGAGAAGG ACAGGTTTAT CTTTACGAAG GGCTGCAAGA G-GACCAA TCCrATGATA TAATCGACAA GATACCCACA
TCCTTAAATT
CTAATTTGGA
GAAAGGATAT
CGTTGCGCAC
TTATACAAAT
AATAAGGTTA
AAATCTCAGA
GTTGGTTr'rC CTAGTTTTCA ACTGACTTGA 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4692 U er
AGGAATATTG
AATCAATTGA
ACGGCGAATA
CAAAATCCAA
TTGTGCTCTC
GCATCGTAAA
ATGTAAACAA
AGACTCATTC
CGTGCACCAA GGATATGTTT CCCGTACGAA TCAAGGCTTC CGAAGATAGT TAAAGCCCAT GTAACCATTA GTTCTTTTCT TGCGAAGAAC TGCT'rCTGCC TCCGAGATAG TTCCAACTTC TAATACCAAA TCGTTTGAGG TAGCAAAATT TTGTGTrTT 'rCTGTGGAAT TCTGCTGGGT CGCGCAAATC AAACTCCTCA TTGTCCACAA TTTTAGCCGC
CCACGCCAAC
CCATTTTI'CT
TCTAGATAGT
ATCAGTTCAA
AATTGCTGCA
TTCAAGAAGA
AT'rGCTAGTA TAAGTGCCCA CAATATAATC CAATTCTACC CTAATTTATC CATCAACCCT TATCATATAA GCGTTTTCCC CATCATAGAA TGTTT1TCATA GACTCACACA ATGCTCCTTA TTrTTCCTATC T'rCTTTAGCG ATTCTAAGGC AAGTATQGTA CAA'rAAAAAC ATGGGGATTC AACAATTACA TT INFORMATION FOR SEQ ID NO: 221: SEQUENCE CHARACTERISTICS: LENGTH: 706 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221: GCTAAAA.AGC TGATAATCTT1 CGACTCCTGT ATATGATGTG TCTTTrCATG TAAGACACGC GCCGCCAGAA TCATGGCAAG AGCTGCAAGA CTGGCAAGTA AGAAGCCGAT AAGATAGGCA AAAAGATAAG TGAATTTGAC AAAGAAAGTC AAAAGAACTA GGAAACCAAA GCCTCCTCCA 1233 AAAACTACCA AAGTCTTTCG TAAATCCCAG ATITrATCCA ACTGCTTGAC GAGGGAAGTC 240 GTCTGACGA.A CGCCTACAAT AGTTGCTAAC A'rACTrCCTA AAAAGAATGG ATAGACATGA 300 GTTAAACTGG AGAAATAAAC AGAGGAATAA GAGGTCACTA GAAAACTACC AATAAACATG 360 GAGAAGAAAC TGATCAAGAA. GGCAACAGCA GATAAG.AGAA AGACCATCCC CT'rCAACTGA 420 CCATTTGATT TAGCTTGN' GGATAAGAAC CAAACTGCCA ATCCCCAAAG AATATAGTAG 480 TGAACCTCAA CTGCCAAACT CCAAT'rATGA ACAAACAAAT GAGGAATGAA CTGAGAT'rCA 540 TAACTCCCAC CTGTTAGGAG TrCATAGAAG TTGGTCATAA AGCCTrAAGAC GCCCGCAATC 600 TGGCCACCAA TTCCAGCAAC ATAGTCTTGG CGAACCAAGA AAGTAAAAGG CATGGTCACC 660 AAGACCATCA AAACCACAGG TGGCACAATC TCGATAAAAG CGTCTT 706 INFORMATION FOR SEQ ID NO: 222: SEQUENCE CHARACTERISTICS: LENGTH: 3236 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 222: CAGCTGATGG GCAATATCAG TCATAGAA6AT TTTTTCAtTT ACTTTTG-AG CAATTT-TTG ***GTTGATGATA CGAGGGATTT GGTGATTTTT CTr'rACCAGG GGAGTCTCAG CAACCATCAT 120 TTTTGAACAG TGATAGCACT TGAAACGGCG TTTCTAAGG AGAATTCTAG AAGGCATACC 180 *AGTTGTTTCG AGGTAAGGGA TCTTAGACGG TTTTTGAAAG TCATATTTCT TCATTAGACT 240 TCCACAATCA GGGCAAGATG GAGCCTCATA ATCCAGCTTA GCGATAATTT CTTTGTCGGGT 300 ATCCATATTG ATGATATCTA GAATCTTGAT G'rrTGGGTCT TTAATATCGA GCAGTTTTGT 360 GATAAAATGT AATTGTTCCA TATGATTCTT TCTAATGAGT TGTTTTGTCG CTTTTCATTA 420 *TAGGTCATAT GGGACTTTTT TTCTACACAA AAATAAGCTC CATAATATCC ATAGGGGATT 480 TACCCACTAC AAATATTATA GAGCCCGAAA 
ATATGGGAAA ACTGATCCTT GTTTCTGCTT 540 TTGTCTATAG AAGAATAATA AAGATTATCT TCTTCAA.ATT CTCCGATATT CTCTAAAGTT 600 *TTGTGCAAGT TGCACAGAAC TTGTTTATTT TTTTGGTCAT CTTGCCATAG AAATATAAAG 660 *CGTTTTCATA TATAATATAA TTATCAAAAG ACAAAAGGAG TTCACCTCAT GGTAGAATTG 720 AATCTTAAAA ATATTTACAA AAAATATCCA AACAGCGAAC ACTATTCAGT TGAAGATTTC 780 AACTTGAACA TCAAAGATAA AGAATTTATC GTTTTCGTAG GACCTTCAGG ATGTGGTAAA 840 1234 TCAACTACAC TCCGTATGAT TGCTGGTCTT GAAGACA'rTA CAGAAGGTAC TGCATCTATC GATCGCGTAG TTGTCAACGA CGTAGCTCCA AAAGACCGTG ATATCGCCAT GGTATTCCAA AACTACCT'C TTTACCCACA CATGACTGTT TATGACAACA TGGCTTTCGG TTTGAAA'rTG CGTAAATACA GCAAAGAAGA CAT'rAACAAA CGTGTTCAAG AAGCAGCTGA AATACT'rGGA TTIGAALAGAAT TCTT'GGAACG 'rAAACCAGCT GACCTT'rCAG GTGGTCAACG TCAACGTGT
GCCATGGGGC
AACTTGGATG
GTGCGATTGT CCGTGATGCG CCAAACTTCG TGTATCAATG ATCGGAGCTA CAACTATCTA TGTAACTCAC CGTATCGTTA TTATGTCAGC TACTALAGAAC GAACAAATCG GTACTCCTCA AGAAGTTTAC TTCATCGGAA GCCCAGCTAT GAACTTCATC TCTGACGGTT TCCGTTTGAA AGTGCCAGAA TACGAAGGAA AAGAATTGAT CTTTGGTATC 'rTCCT'rGAAA CATTCCCAGA CTGTGTTGTA GGTTCAGAAT CTCACCTTTA CTGTCAAGTT GCTCGTGACT ACTTGCAAAC AGGTGCAACA CACTTCT'rCG ATGTAGAAAC TGAAAAAACA TACAAGAAAA GATATCTCTr TATCAATTGT GAAACAAAAT GCTTCTCTCC TTTTTGCTAG GCTCTTTAAT ACTCTTCGAA AATCTCTTCA TGATTACTGA TTTCGTCAGT TTTATCTGCA GCTAGTTTCC TAGTTTGCTC TTTGATTTCC AGTGAAGCTA TATGCGTAAA C'rACGTGAGC AATTTATGCT ATAGTTATGG TGACTTGTAT TTAGAATTGT CAGATAATAT CATTTTGTGT -TGTCAGAAGC AGGTCA'rAAG T'rTTTAGCAA GAAAGCGTCC CACAGATTGG TrEAATTGCAG TAGAGGTTGC GTGTAATAGG GGAACTACAG AGATAAC'rGC TGT'rGATATG GATGCTCAAG CGGCAGGTGT TGCTCATTTA ATCAGTTNG CGTCCAGAAG ACGTGAATGC AAAGCGACTA TCTCTGTATC
GGTAAAGACG
GTTGAGCTTG
ATCTACTAAA
AGTGGAGAGA
AGAAGTCATA
A.ACCACGTCA
ACCTCAAAGA
ATTGAGTATT
AATTGAATTC
AAAGTATTCT TGA'rGGACGA ACCTr'rGTCA CGTGCTGAAA TCGCTAAAAT TCACCGTCGT GACCAAACAG AAGCGATGAC ACTTGCAGAC CCTGCTGGTA CAGGTACTAT CGGACGTGTA AAAAATCCAG TTAACAAATT CGTTGCAGGA ACCGTGAAAT TGGTTGGTAG CGAAATTGTT GGAGCATTGA AAGTTCTTCG TGAAAAAGGC
AGTTTGTTGC
GATT'TGAC=
ATAAATAAAA
TATCAGTTAA
TTATGCATCT
ACGTCGCCTT
TGTACTTTGA
ATTTGTGGGT
GAACTAGAGA
AGAACCTGCT
AGAACTGCT'r
AAAAGTTGAT
GAACAAAGCA
TTCAAAGCAC
TC'rAGGGAGA
ATATTGTGAT
GCCGTACGTA
GCAGCTTrACG
ACCATCTACA
GGTAA'rAATA
AAATAATAGA
GGTGTTCAAA
CGTCCAGGTG
AAGAGAATAC
TTTGGTTGCA
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 GCTTTTGATT CTAGTTTATC TATAATGAAG AAAAAACAGA
AATTGGGGAA
AAGGAGGATT
CAATTGAGTT
AAAACGCTTA
TTCAAAAGAA
GGCACAGCGT
CTTTAGAAG'r GGCTAAAAAA TCTGCTGGAA AA.AGAGCAA-A TGCAATrAkA CTTCCTTATC 1235 AAGATGCTAG TTTGATATT GTrATAAATG CTAAGAAAAA ATGTGTA.ATG GAATATCTAA CACATGATGT GCTTC'A.AG GAAGCTAAAG TTCATGTAAA TGTAGGTCCr TTAACTCAAG GTTATTGTGA TGTGAAAGCA TTGACTGGTG TTTATGACGA AGGTTTGCTA GGAACTTTGA ATAGAAAGCA GTTTTTAACT ATGTATAAAA TTATTGCGAT GGCTAGTTAT AAATCGTCAA CCTTT 'TCT TTCTTAAAAA. ATATGCTA'rA AAAGATGAAT TT1AAAAGATT ACATTGCAAC
AAGCTATGCT
GGGTATI'AAA
AGTCTATCAG
ATGGTTGGGA
AAATGACATT
AAATT'rGTGT GACTATGCAA GCCGATCAAG ACCTGGAGGT CTTCTCTTGA ACAGGAATrrA TCACAAGCAA ACAGGTGATG ATAGAATCAG AATGAAATTA TCGGGTATGA AAATGCTTGT AAAAAGGAGA 2700 2760 2820 2880 2940 3000 3060 3120 3180 3236 TGTTTGCTA.A GAATAAACAG AAATTGGGCT AACGTTAGAT AATTATTGAA GrTAACTTTT ATAGAGAGTA AAAAACT'rrG AAAGAAAGAA AATTGAAAAT TATCCAAAGG GTACCG
INFORMATION FOR SEQ ID NO: 223: SEQUENCE CHARACTERISTICS: LENGTH: 2885 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223:
CCTGACTTTT CAAATTGGTT AGTTTGCCAC ACTTGGTTTA TATGGTCGTG GAAAGCATGG CTATTACTTC TCAAAGGGCG TGTAAGTTA.A TTCATTGTCA TTGATGCCAA ACGCCTTAAA CGCTCATTGT AAACTATCTT TCACGATTrGT TTAGCACGT9 ACAGCATTTT TTGCTCCTAA GTAGCTTCAA ACTCTGTAAG TTGATCATTT CGGT'rAGGCT CCCTC.AT'rA TACCTTTGGC CCTGTGATA.A GCGTATTTCC AATTCTAAAA ATCGCTTrTC TTAAACTCAT CTGCTACATA ATTT'CTCACC CCATGAAAAG TGTCTATT-rT TGTTTAGGTT CATATTACTC TTTAACTrGAT TGAGTGAGTA CCGCTTATAT AGTGTTACCC TCAAGTCCTT TTAGAATACG GCTATAATTC AAGCTCATCA CTATCTAGGT TGGTATTAAA AATGGTATTT AAAGAGTAAA TCCTGCTCCC AGTCACTCTT AGGCTTAATA ATCATCAATA ATTAAGTAAT CAACAGACTT CATGAGTTCA TGTTGCACCT TTACCATAAT TCCACCCCTC TTTAATT'rGT TACAAAAAGC ACACTCTTAG GTTCTCCTTT TGTCTTA'rAC AATAGCAACT GATAAAAGTG T'r=CCAAT CCCTGTACCT CCTCATGCCA TCAAGATATT TTTGTACCTG ACCTTTTGCA TT~CTGATGTT ACAGCATTAA AATCATCAAA AGTTTTAGTT GCTCTTATTG CTCATCAACA CA'rTATAAGT TTGCATATAT
AGTTTAGCAT
TGGCATTGTT
TTCCACCCCA
GGACACAATT
ATTTTCTGAC
GCTCTAGT'rc
GTTCC-ACCGT
AGTTTGCTTT
TCAAAT'rATC
CACAATAGGG
CTACTACATG
CGTCTAG'Tr
CTCCTAAAAT
ATGGTTTGC
CAATAAATTG
ACCCTGACTG
AGCAATCGCA
TGGGATACAG
TCTTTCTCCT
AAATGTCTCA
GGTTTCT
TGTTITTTTAG
TCCTGTTTCC
ACAGCCTCTT
1236 TCTTCT'rCAT CTTGCTTT CTGTTCTTCT CGAACTTCTT TTrTrGCCTC TCCGTTCTCA TTGATTTGTG TTAGCTGTAT TTCATGCTTA ATATTTCCTA AACTAGATTG TAATGATTTC GTGTTGGTAT CCAATCTTCA CCTCACGCGC TGCCCTGCTA
AACGGTTAAG
TTAACGCCTC
AATCTTCTAG CATATACTGC AATTCTTGAA TCTGTAACGG CAGCTCGCTT CATATTCAAC AAGTCGTCTA TTTCCACACT
GATTACCTTG
ATGGATAAGC
'rGACAATGCT
GGTTACTTTT
GATTTCAGCC
TAGC'rGGTAG
TTTCTAACAA
ATGTATGCAA
TCTGGGCTAA
TTACCTGTCT
TTATTTACA-A
ATTACCTCAT AATCAGAAAT CAGTTGAAAA CAAATTCTGC TTGTGTCATG
CTCTATAAGA
T'rTTAAGGTT
CTCTGCTTCG
GTCCATTTCA
GATTT'TATGG
TTTGA'rGGAT
AATTTCTCGC
TCTGTCATG
TTCTAAACAT
CCATATCCTC
TATAGCCCCC
CAACTTTTAT
ACAGTAGCCT
T'rCTCAAAAA
AAAATACCAG
CTTTCATTCA
TTTGTCTGAC
ATTTCTGCGC
TTTTTACCTC
CATTGTCTTA
AAATGCCTGG
ACGCTCTTCT
ATGTTTGGAC
TTGTCTAAAT
TTGCTTTCTG
CTAAATACTC
AGTGCT'rAGC
GGGCGAAAAA
ATTCAGGAAG
TTTCCAATAA
TAATTCTGTC
TGTTTCTTTG
ATTTCCTGAT
TACTGCTCCA
CTTAACTGCT
TTTGTAGCTG
CTAGTGTCAT
AGTTCCATTT
CTTTTCAGCA
TTGGCGTTTA
CATTCCCGTA
TTCAATCCAA
TGCATTGCCI' CCTCAAACTT TTAGAGT'rAA AAAGAATATC TGCTCTATAT ACGCCTGTTG AAGAATGCTT TTCGCATAGC GTGCTTAG AGACCGCTTC AGTAAACGGG ACAACTCATC 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 AGAAAGGATT CTTAGGCCAT TTTCTTCGCT TATACGTCTA GTTAAATTCT CATATGTTGT ITGGTGTGAT TTTTTAGCTT ATTTTTrTAC AACTCATTTT CAATTrCAATC ATAGCTATTG ACTCCTCACT AGTCAAGCTA TCGATACCGT TAGCGTTrCAT GTCTGTTACT GCCTTTAGTA GCAAGTTGTT CATGGTGCTA TGCGCGTGCT TTGGTGCATT CATGCAAGGT TTTTCTTTTC GGTTTTTCTA GCGCCCTCTG CCTCACGCAT T'rCAAAGAAT TATTCT'rrAA ATAAGTGATC TAGGTTGTCC TCTAGTATCT AGTCAGCCTT ATTTTCTCTT CAGCGATGAT CTCGCTTGTG
GCTTTGACTA
AGAAAAGTAG
AATTTATGGA
ATTAAGCGCG
GTGTACGGCT
GGTTTAGT7r
CCTGTTGCTC
TTTTAAATCC
AGGCCATGTT TCTATACTGT CAGACGAATT TCAGAAAGTT GAATTGCCGT ACTGTTTCGG GTTCAGAATA TAGGATTTTr AAGTATTCCC AACTCTTCA-A TGATAGTGTG GTGT'rGTACT TCAGCACATT CTTTCTTACC GTCCATGTAA ACTAGTTCCA 1237 TrACGGTTCT ACCTCCTGTA TAAATCTGGT TAGCTTACTT TTTAATTGCC TCCTCTAGCC TCTTTTrAG CCTCTAAAAC GGCTT'rGGCT AGTGGTTAAT ATTATTACC ACTGTCTCT ATAAACGTGT TAGAGGCCTT TATAACGACT TGTATCGCTG TATCGATATC CTCCGTGGAA TAGTAGATTT ATTT'rCTAAT ATCATTCAAG ACTTGTTrAA CCCATTTCTT GAAAGAAATA AAATTACATC TTCTTrATCC TTGGCATCTG CTTTGTCTGA GACAAATTAG AATGTCAATA
CTTGG
INFORMATION FOR SEQ ID NO: 224: SEQUENCE CHARACTERISTICS: LENGTH: 3144 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224: TATCAATCCT TTCCCATTAT AGGAGCAACA GAGTGGGAGT AGTCATCTAA GGACTAATTT ATGTATTTTT ACGAGTCAGT ATCT'rGGGAT ACTGGTTrTTT ACTTTTCTAG ACTTTrI'GAC TACTTGTTAA AACTGGGATA ATTTTCGACT GTTTAACAGT TATTATGCAA AGTCTAAAAG ATTAGAATTG TCAAAACAAT CCGTCTAGGC TTGATTT'rAT CCTTrATT-TA CTATAAAATG AGAAGGAAAA ATGTCAAACT T'IrATATTGC AAATAGGAGA AATCATGACA AAAACAITrAA AACGTCCTGA GGTTTTATCA CCTGCAGGGA CTTTAGAGAA GCTAAAGGTA GCTGTTCAGT 9 9 9* 9 9 9. .9 9 9 -9 9 9 9 9. 9.
9 2640 2700 2760 2820 2880 2885 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 ATGGAGCAGA TGCTGTCTTT ATCGGTGGTC AGGCCTATGG AC TTT ACTT-T CGAACAGATG GAAGAAGGCG TGCAGTTr-GC 9 9 9 9999 9 9999 9999 9 9999 99 99 9 9 9 TCTATGTAGC GGCTAATATG TCCGTAAACT GCGTGATATC TGATTGCAGT GACTGAAGCA CTAACTATGA AACCCTTGAG GTGAGGTTTC AATGGA.AGAA CCTI'TGTCCA TGGAGCTATG TGAGTATGCG TGATGCCAAC TTTACGATAT GCCATTTGGG TTTCAATGTC AGCCGTTGAY GTTATGCACG AAGGAAATGA GGGATTGCAG CAGTTATCGT CCAGGCCTTG AAATCCACCT TTCTGGAAAG AGCTAGGCTT TTAGCTGAGA TCCGCAA.ACG TGTATTTCAT ACTCTGGACG CGTGGTGGAT GTTCTCAGTC AAAGAACGTA AGAGTTTGCA ATGTCTATGA TTGACCACAT TCTTCGTAGC CGTOCGGGAA GGCCAAGTAT GGTGCCAACG AGCTGGTGCT GGTGAGTGGT ATCTGACCCA GCCTTGATTA TTCTACCCAA GCCAGTGCCA GACTCGTGTC GTTTTAGCGC TACAGATGTT GAAATTGAAG TTGTACTCTT TCAAACCACA ATGCCGTTGG AAATACGACC GGGTGAGATT CCAGAAGAAT TCcAGATATG ATTGAAAATG 1238 GTGTGGACAG TICTAAAAATC GAAGGACGTA TGrAGTCTAT TCACTAYGTA TCAACAGTAA CCAACTGCTA CAAGGCGGCT GTGGA'rGCCT ATCTTGAAAG TCCTGAAAAG TT'IGAAGCTA TCAAACAAGA CTTGGTGGAC GAGATGTCGA AGGTTGCCCA ACGTGAACTG GCTACAGGAT TTTACTATGG TACACCATCT GAAAA'rGAGC AGTTGTTTGG TGCTCGTCGT AAAATCCCTG AGTACAAGTT TGTCGCTGAA GTGGTTTCTT ATGATGATGC GGCACAAACA GCAACTATTC GTCAACGAAA CGTCATTAAC GAAGGGGACC AAGTTGAG'rr TTATGGTCCA ATT'IrGAAAC CTATATT'GAA GA~rTGCATG ATGCTAAAGG CAATAAAATC CAAATCCAAT GGAACTATTG ACTATTAAAG TCCCACAACC TGTTCAATCA 'rTCGAGCTCT TAAAGAGGGG CTTATCAA'rC 'rTTATAAGGA AGATGGAACC
GGTTTCCGTC
GACCGCGCTC
GGAGACATGG
AGCGTCACAG
ATACAACACT
GTCTTTTTTA
ACAAAGAAGC
TTCGTGCTTA ATGTAGTTGT TTAGTTTTAA AAAACTATGC TAAACGAGAT TAAAGAATGG CGAAATCCCT TGATGCGCAA TTTTTI-AAGT GATAAAGTCG GAGTTTAGGC ATCAAAGCCT GATGTCTTAG ATATTTTGAA AAAAATTAAT AAGCAGAAAA GAGAGTT'TTT TGTTAATAAA ATTTCACAAA ATGACATTTA ATATGATATA ATATTGTTAA AAAGAGGCGC AACTTTTTAA AAAACCAATA ATATTAATGG AGGAATAAAA AATGTAAGTA ATTCTCAAAG ATATAAATTT TGCACTTAAC AAGGGTGAAA AATGGAGTTG GTAAGAGTAC GTTGATGAA.A ATTCTTGT'rC GGTAATATTA TAAGCAGTGA TAATGTTGGG TATTTAATCG TCTAAAACAG GTTTAGAGAA TT~TAAAATAT TTGTCAAATT CAAGAAAGAT TTAGATGTTT GATCCAAGAG TTAGATT'rGA GTAAAGACCT ATTCTTTGGG TACAAAACAA AAATTAGCTT GAACCTGATA TATTGATr'rT AGATGAACCG ACTAATGGTT A'rAGTTTTAG CGGTTCTAAA AAAATTAGCT TTACATGAAA AGTCATAAA'r TAGAAGACAT TGAAGAAATT TGTGAGAGAG CTTTTGACAT TTCAAAAAGT AGGAAAAGAT AGTCATAATT TCATCAGCTA CAGATAGAGA CATTTTCATT ACCAAACAAG GAAGAGGGAT TGAGAATTAC TATGTCTGGG AATATTCAAA TTTAACGAAA ACTCTATTAA AGTAGTTGAT TTTGAAACTA AT'rTACCTAA ATCGTTCAAA ATAAAGGAAG GTTATAATCA CTCTCTATTA TTTTGTTGTA TATATTGCAT TAAGTTAGAT AATTAATGAG AATCAAAGAG AGCATTATGG TCATTCAATC TTGTTGGTCT AGCAGGGAGA AGAATAATCA ACCGACT'rCA AAGAACCAAA ATTATTTTTA TATATGGTGT TGACTACAAT CTCAGTCTAT TAATAAAAAA TGCT'rCTAAC TCTCGTTACG
AAAGCTCCAT
GAGATTAGCT
ATCAAAT'rAA 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820
TAGATATTGA
ATGTGGGAAT
TTCTTTTCTT
TCTTG'rTTGA
AATTTTGGA
ATAGTGAGCT
AAAAAGAGAC
TGAAATTAAA
ATCATCACAA
TTTAATATCG
GGAGAACGGG
GATAGCTTTT
TATTGTTrTAG
TTTTAAATTT
GCTTAAAGAT
TAAACAGAAG
1239 AATCGGATGA TTTACGTCTT GTCTAATTrT CTATATGCTA TC'TCAGTTrC CATTATTTAT GCTTTGAATG GCAT'rGTGTT ACTAGTCATA GTAAGTAAAT TGGGTATTCC AGGTGATT'rA GGATTAAATT TTATAGTAGC TATTGTAGTC AATACAATTT TGTTAGTCCT GTTTTATT CTATTATCTr ACAT'N'TcTA 'N'TATACAAA TTGAAAAGTG GCTrGGTA'rW TGGTATTTrA GTAGCTTTAC TACTCTTTAT CTCTAATATA TTAAATACGA TGATGATGAA TACTAGTAAT- GATTTGTTTA TCAAAGCAAT TGAA INFORMATION FOR SEQ ID NO: 225: SEQUENCE CHARACTERISTICS: LENGTH: 3766 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225: TACGGTATTA TT?1'TAAGGA GAAAGAATCA TGAAAATCAA AAAATGGCTT GGTCTAGCAG CCCTTGCTAC AGTCGCAGGT TTGGCTCTTG CAGCTTGCGG AAACTCACAA AAGAAAGCAG ACAATGCAAC AACTATCAAA ATCGCAACTG TTAACCGTAG CGGTTCTGAA GAAAAACGTT 2880 2940 3000 3060 3120 3144 0 0 4 0* 0.
0 0 0 GGGACAAAAT CCAAGAATTG CAGACTACTC ACAACCAAAC AACACTATAA CTTCTTGAAC CAGATACT'rA CATCTCTCCA ACACTAAAGT AGAAGACATC ACGAAAGCCG TGCGCTTTAT GAACTGCTCT TGCAACAGTT
AATTGGACGC-TAGCCAAACA
ATACCTTCGT TACAGAAGCA ATGAAAACTC AAAACAATGG GTTAAAAAAG ACGGAATTAC CTTGGAATTT ACAGAGTTCA AAAGCAACTG CTGATGGCGA AGTAGATTTG AACGCTTTCC AACTGGAACA AAGAAAACGG AAAAGACCTT GTAGCGATTG ATCCGCCTTT ACTCAGGTTT GAATGGAAGT GCCAACAAGT CCAGCAAACG GAGAA.ATtGC TGTACCGAAT GACGCTACAA TTGCTTCAAT CAGCTGGC3-r GATTAAATTG GATGTTTCTG GCCAACATCA AAGAAAATCC AAAGA.ACTTG AAAATCACTG GCTCGTTCAT TGTCATCAGT TCACGCTGCC GTTGTAAACA AAATTGGACT ACAAGAAATC ACT'rTCAAA GAACAAGCTG TACAACATCA TTGTTGCAAA.AAAAGATTGG GAAACATCAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 p.
OOb* 0@ 00 0 0 0 CTAAGGCTGA TGCTATCAAG AAAGTAATCG CAGCTTACCA CACAGATGAC GTGAAAAAAG TTATCGAAGA ATCATCAGAT GGTTTGGATC AACCAGTTTG GTAATAAGAA ACAGGGAGGT GCGAGAGAAA ATTCCACCTC TTGCTTTGT ATAGAGTATA GATTGTAAAG AAGACTATTC GTTCATAGAA AGGTAGAGAG AATATGGTTT TTCCTAGCGA ACAAGAACAG ATTGAAAAAT TTGAAAAGGA TCATGTAGCC AATCAGCT'T TGCCCAGCAG TCAAGCGTGT TGGAGCTGAA CACATTTCAA GAGTTCGCGT CTGTGCCAGC GGATCGGGAT ATGGCTTCAT GTATGGGCGT GTGCTTTGAG AAAATATATG TGGAGGGAGC GGAGGAATCG ACAAACTCCG TGGGGCGGAT AGCTGGAAAT TTCTGGTGGC CTGATGTGGA TATCCACTCG 1240 CAGCATTATT TTGAGGTTTT GCGTACCTTG G'rTGGACTrCA AGGAAGTCGC AAATTATCTG GTGGAGATTG ATGAGAGCTA TACAGCGCCC CCAGATGCCA AGACCTTGAT CAGGTCTGGA CAGAGGATCC GGGGTTGATG ACGACAAGGG CAGCACCATG ATGATTTACC GCTTCAACAG ACCTAGATAA
TTTCTATAAC
kTTTACGCTT
TCATATCACA
TGTCAATATC
GTATTTGGAA
A'rTTCTAAGA GG'rGAGATTT
TTTGTCATGG
CACTATOACA
TCGGTCCGCA
GCTCGCTTGA
AGCTT'rATCA
AAGCATGCAG
TTGTTGGTCT
AATAAGGGGA
AGTTATGGTG
GGGAACAAGG GACCAAAAAT GCCTTGGAAC T'rGTGACCTT TGATGCCAAG GTAAAAAGCG TCCAAGCCTT ACAGTCTCTT CGTGCTGCGG AAGAAGTACA AGAGCCCAAT GAACGAGAAA ACCCAGAGGA AGTTAGTCGG ATTTATGGAT TGGCCTTTCT AAAACGTTTC TTTTTCGATC GTTA1'CAAGG TCAGGGTGTT AAGACTAT'T TTCGTCTGGT .TCCGGCCTA GAACCGCATG ACAAAAATGG CTTTGATAAG GTAGAATTAT GCGATATGAG CGCACCAGCC A'ITCTCAATG
GTGTTGTGGA
ATGGCCGTAT
TGGCCTTGCT
TGGAGTTGCC
CAGCGCTTAA
TACCTGCAGA
ATCAGCTCCT TGGTATCTCC CTTGGTTGAA GGCTTGTACG AGAAACTTAT GGTCAACGAA TCTCTTACAG GAGGAGCGGA TATCGAAGGA ATCCAGTCTG AGCCAGTGCC AAGCTAGAGG 1080 1140 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 ATGTTCTGGA AAAAATTCGG AAACAGCTAG ACTATACCTT GGGAGAGATG AGCTATCGAA TGATCGAGTT GGCCAAGAAA TTCflATCCAC AGGGCGTTTC AGTCTTGCCG ACGACAGCGG GGACAGGACC TATGCATACG GTCTTTGATG CCCTAGAGGT ACCAATGGTT GCA'rTCGGTC TAGGAAATGC CAA'rAGCCGA GACCACGGTG GAGATGAAAA TGTGCGAATC GCTGATTATT ACACCCATAT CGAATTAGTA GAGGAGCTGA 'rTAGAAGCTA TGAGTAGAGA 'rATTATCAAG TTAGATCAGA TCGATGTGAC ?TTrTCACCAA- AAGAACAGAA CCATCACAGC GGTTA-AGGAT GTGACCATTC ACATCCAAGA AGGGGATATC TACGGAATCG TTGGATATTC TGGAGCAGGA AAATCAACCC TTGTACGGGT GATTAATCTC TTGCAAAAAC CATCTGCAGG GAAAATTACC ATTGACGACG ATGTGATTTr TGACGGCAAG GTGACCTTGA CGGCAGAGCA GTTGCGTCGT AAACGTCAAG ATATCGGAAT GATTTTCCAG CATTTTAACC TGATGAGCCA AAAGACAGCA GAGGAGAATG TAGCCTTTGC CCTTAAACAC 'rCTGAACTCA GCAAGGAAGA AAAGAAGGCT AAAGTAGCTA AGTTGTTGGA CTTCTTGGT TTGGCAGATC GTGCTGAAAA CTACCCTTCA CAACTATCTG GAGGGCAAAA ACAGCGTGTG 1241 CAATGATCCA AAAATCTTGA GCAATTGCGC GTGCCITGGC TTTCAGACGA GTCAACTTCT GCCCTTGATC CGAAGACAAC CAAGCAGATT TrAGGCTTGA CrGTrGTCTT GATTACGCAT CGTGTTGCAG TTATGCAGGA TCAAACCCTA AACAACCTT GCCATGGTCA AAATCGAGAA GTGCAACTCA AGTACGCTGG CATTACCAAG TAATGGCTAA
GTTGGAGAAT
GCCATTCGTC
ATTGATTCAA
GGGAACGGCT
CTTGGGGCTA
TAAAGTCGTA
CATCCTCTTG
TGGTGGTGGT
AAGCAGGTGT
ACCTATTTAC
ATCTACTTAA
GTGGCAGGTC
TTCTGGATTT
GCAATCTTGT
TGGGCATTTG
GACTCAAGAC
GCAAGAA.ATC
AGC'TTCAACA
TAT'rCTCTAT
TTTGTCAGGT
ACAACTAAAA
CAAATGTCTA
CTCT?'rATAT TCTTrCTCGT
TAGACAAAAT
CACCACTTTC
TTGGCCTTGT TGCAAGATTr GAACCAAAAA GAAATGCAGA ?I'GTCAAAGA CATTGCCA-AC ATTGAAGAGG GTAGTGTGCT TGAAATCTTC TTTATCTCAA CAGCTACAGG TATTGACGAA GTGGAACACT TGTCTGAAAA CAGTC'PCTTG GACGAGCCAC TTrTGAATGA ATTGTACAAG GGGAATATCG AAATTCTCGA TGGTACTCCT GAAAAAGCAG CGTTGGCAGG TGCCCAAGAA GTATTGAAGG GAGTACAGTA AGATGGAATC TAAGATGGGT TGGGCTGGTC AGGCAGGCTG GACAGTTCT'r TCCTTCATrA TCGGAGGCT'r CTTGACAGCG CCAGGTGGTG TCTTGGAGAA TACCTCAATr TTTCGTGCGG TTCCCTTrAT TCACTTGAT'r GTTAAAACAA GTATCGGGCC 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3766 AAATGCAGCC CTrGTCCCAC TTTCTTTTGC AGTCTTTGCC T'rCTGG INFORMATION FOR SEQ ID NO: 226: SEQUENCE CHARACTERISTICS: LENGTH: 2520 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226: TGTTGCTGAG TTAATCGGTA CGTTCATGTT TGTATTCGTC GGGACAGGAG CTGTTGTTTT TGGAAATGGT CTTGATGGCC TTGGTCACCT TGGAATCGCC TTTGCCTTTG GTTTGGCAAT CGTGGTGGCA GCCTACTCAA TCGGAACTGT TTCAGGTGCT CACTTGAACC CGGCTGTTTC GATTGCTATG T'PTGTAAACA AACGTTTGTC ATCTTCAGAA CTTGTAAACT ACATCCTTGG TCAGGTTGT'r GGAGCTTTCA TCGCTTCTGG CGCTGTCTTC TTCCTCTTGG CTAACTCAGG TATGTCAACT GCTAGTCTTG GTGAAAATGC CTTGGCAAAC GGTGTCACTG TCTTTGGTGG TTTCTTGTTT GAAGTCATCG CAACTTTCTr GTTTGTATTG GTTATCATGA CTGTGACTTC AGAAAGCAAG GGCAATGGCG GATTCTrGTC GGATrTGAAGA AGCTGTCTTG GTAGGCGGCG GCTGGTGGAG TCTGCAGC AAACTCAAAA AGCCTTGCTC GAAAATCTCT TCAAACCACG GTTCTATCCA CAACCTCAAA TCAAAACAGT GTT'rTAAGCT AAGCTGACTT CGTCAGTTCT GTTCTATCTG CAACCTCAAA TCAAAACAGT GTTTTGATCT GATCTGACTT CGTCAGTTC'r AACTTCCTAG 'N'TGCTCTTT CTGGATAAAG GTCGTGTTGG GCT'rACCGTA GTTGTAGTAG AGACTTCTAA ATAGCGACGGG AGTTTTCGTG GAAATCTCCG ACTCGACACA GAGCTTGTCA CAGTGATTTG TCCCAGCAGA 1.242 CGATI'GCTGG TrTGGTAATC TTAC'rGGACT TTCAGTAAAC CASCCT'rCAA CAAGTGGA CCTTGI-rGCA AAAAATTTCC CTCATCTTGA GGAACAGGGC TCAGCTT1CAT CTTGCCGTAG ACAGTGTTTT GATCTGACTT GACTTCGTCA GTTCTATCTG ATCTGCAACC TCAAAACAGI' ACAGTGTTTT AAGCTGACTT GACTTCGTCA GTTC'rATCCA ATCCACAACC TCAAAACAGT GATTTTCATT GAGTA'rGACT AAGAGGCGTT GT'rCTGCCAA GGGTCGATTG 
AAATGCCACC TCTAGCAAGT TGACCAAGTC TGGTTTCGGT AGCTAAATAG
GGTTTGTCAT
CCAGCTCGTA
TITTCATCCT
TTGGAACAGA
TT'rrTCGTAT TATGGTTrACT
CGTCAGTTCT
CAACCTCAAA
TGATGGCGAT
GCTTGGCACC
TGCACCAATC
AGAATAATTG
GATACTCTTC
GACTTCGTCA
ATCTGCAACC
ACAGTGTTTT
GT'rTTAAGCT GACTTCGTCA CGTCAGTTCT ATCCACAACC CAACCTCAAA ACAGTGTTTT GCTTTGAGCA ACcTGCGGCT TTAGCGGTTG TCAATTITrCT GCCCTCATAC TTAGTTCCTT GCGCGGAGTG AATTTTCCCC 'rTTCCCGATG GTGTT'GATAC ATA'rAGT'rTG AGGGA'TTTTC 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 GGAATGTAGG AAATATGAAT CGTCGCAAAG TCTCGCTGAG GACATATCGA GGATATGGTG ACGAATGCCC TGTTCCTTAG CGATTTCTCT AGTAATTTGA ATTTCGAGGT GATGACGTTG GCCGTAGGCA AAGGTGACAG CTTCGACTGT TTCATAGTGT TGCATGACCC AGAAAAGGCA GGTTGTTGAA TCTTGACCAC CACTAAAGAC GACCAAGGCT TCAGAGCACG CAAAAAGCTC GCGATGGCAT ATCCCAAACA GGACAAATCG ATCAGGACAG GCTATTCTAGT TTCAATCTAC AAAAAAGAGA CCAAAGAAAG GGCCTTGATT TGTTCTGCTG AATTGACGTT TCATAGTACT CCTTCCAAAA CCATTAGGGA GCTAAAAAAT ACCAAATCGA TCGTAATATT CTACTTATAT TCAAATCGAT TTCTAACAAT TATAGTCTAG CATATTTT TACTTGGTCT CTCGTTTGA'r TGTGAACACC TGCAACTI'GT
AGTAAAATGA
G'rTTTAGAAG
GAAAAATGGC
TGGGAAATGT
GGTTTTTT'rA
AATAAGAACA
TAGAGGTGTA
AAAGGGCAAG
TAGCTCAATT CAGCAATGAT TTGACAACTT GGCCGTCTT TTTGAAGAGA AGAGTTCGAA TAGACATGAT TCCAAAAGCA CGAGCTGTGT TTGGAT'IrTC ATCAACGTCC ATTTTAACGA TTTTCAAGAC ATCTTCTGAA AGTTC?1'CAG ACAATTTGTC CAAGATTGGA CCTTGCATAC GACATGGACC ACACCAAGTr GCCCAGAAGT CTACTAAGAC CAAACCGTCT TTTGTTTCTT GTTCGAATGT TGCATCTGTA ATTGCTr'rTG CCATTGTATT TCTCCT'rI-IT TTAGTTATAT TGGCTrAAAT CTTGTI-TCAT GAGATAGAAG AAGATATCTC CATAAGTCCC ATGGTAGTCC AAATTATGAC CCTTGTAAGT TAATTr'rTGG ACAGGGTAGT AkkCTGCGAC GCCGATAAGG CAAGCrTGTT GCGAACGTTC AAAGTCTTCA TAAGACTCGG INFORMATION FOR SEQ ID NO: 227: SEQUENCE CHARACTERISTICS: LENGTH: 5278 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY:- linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227: ACTCAGTTAG ATTTrGTTTT CAAAAACAAC GAAGAAAAAG ACCATGTTGC TCTACTTGGA AGAATTGGCT CCGAACGTGT TTATCGATAT ATTAATAAAA AATATTTAGA TTTACCGGAA ACATTCGAAA ATTATAATGT TTTTGTACCA GAAGCTAATG GAAGTGGTGC CTTAGGTGAA 2280 2340 2400 2460 2520
GTCT'rATCAA CACCCCTAAT CGGGGAACCC ATTGGTAATT TTAAAACAAA ATTTGAAGCC TTCGCTAGAG TATTATTAGG TGTTTTGAAA TATTACGTCC CCCTCCAAGA CTTTACGGTC ACTGATATTG ACCGCCAGCT TGATCAAAAA ATTGAGAATC ATGTAAGGGA GATGGATTAG CTAATCGGGC ATACAGATAC TTT 4
TTATCT
.GATGCTTGTA TTAAATTTAT TAAAACTAAA GTTACTCAGC ATAATTCACG CAAAACTTGG AATTCGGACA TTGATTGGAC ACAATCAGTG TA'rGACTTTT CCCCTGAAGA AATTGCCTTT AAAAGTATTT TTATTTGACA AATAGTGCTC TCAGGAAGCA TACGATGCCC TGACCCTTTT AGATAAATTG ACTCGTTAGC AAACGTTCAA AACCACTAAA GAAATTCATC CTAAAATCTA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 AATGATCTAA AATGACTATA TGTACTTATG AGATGAGAAA
TAGGATTAGG
GTCATTTGTT
TAGAAATTAA
AAAAGGAAAA
TGCCTACACC
ACGTGATGTC
CTTATGGACT
CCATGATTTC
CTTA'rGCCAG
ACACCGACAG
ACACAACGTA
GGTGATGCAG
CACCATTTCC
TAACCAGTAA TGAAGGCTGG TCAAGGAGCA AACGCATACA CTTATACAGA AGAGCCTGAT 'PTTCTTTCCA TGATGTAGAA CTGAAAAATC AAAAAATCTT CTGGAAAAGG ACAGGACTAT ATTAAGATTG GGTATACAGA GCTCATATAG CTACAGATGT AAGGGGAAAA CTTTCAAGGA CGTCGTCCCA AGACGGAATG TTTGATAAGT TTGTTCAGCA ACTCTGCGAC AAGAGCAAGA GTTCTATTTT AATGGAACTC TGATTTGTCT GGTTATCAGC
AGAAGCAGTT
GAATGCCAAG
AGCTGTCAAT
Tv rGAAACA
TAAGAGTCGT
ACAACTTGCT
CGATAAACTC
GCTAAGACAT
CCACGCTTTG
GTCCTAA'N'G
TTCATAGCAG
CCAATCTTGT
TTTATCAGTC
AAATGGGTAA
1244 TAGCTTATTT CCAACAACAT GTAAAACCTT G'rCTACCTAT TAACAAACCG CCCTGCCATT GTCAAACGAC T'rACAAGtT CACGACAAGA ATTTCTTGG'r GCTGGAGGCA AGTTTC'rCTG GACCTAGCTC GACGGATGGA GCTAACTCAT GGTATGATGA GTT'rCTGAAT CAGATAGCCT
ATTTTAGCTG
TCCAAGACTT GAAAGGATCT GTTTATTTAG CTGATCTGCA TTGGGACTTG TT'GGTTATTG AGACTGACCA AGCCTTTAAT AAGATTCGAC
ACGATGTAAG
GTGGAGAGCA
ACGAGGCTCA
GAAATTTTIAC
TGAAGGAGTT GATACCTI'CA TCTGCATTTG TrCAGGTACAT CATTTIAAAGC ATTGGCTAAA GGAGATTTTA CAGAGGAACA AATCTACAAC TGGTCTTATG CTGATGAGCA GGCTGCTAAG TATTCGTGGT C'rCTTGAGCA AGAAGAGGAA AATCCTTATG AAAGCTTGCC TCAGATGATT GGCGAAAAGT TAGAAAAAGG TCAGTTGAAT CTC'rrrACCT ATCAAATGTC TGTTTT'rGAC TTAAGTGAAT TGATGTCAGA AATTGGTTAG AGAACTCCC-T AATGAACTCA AGCATTAAAA GCCCTACTAG TGCTGGTGAC GGACGTATGT GGTTAGAAAA GCGATAGCAG GACAGGTGTC ACTATCCCTG AGCTCTTTAT ATGCAGGCCG
TTTTCCTAC
ATACTCTATC.
AGCATACT
AAGAACACCC
CCGAAGAAGA
AGAATGACAA
AATGGACAGG
CGCTCAGATC GATGGTGAAA ATATTGACTA AGATGATA.AA GGGAAATTTA TTCATGAGCA AAGCA.ATGAA A.AATATCCAT TTTCAACCAA TTGGCI'TTTA GAACGTGTCG CTTCGGCCAA AATCTATGA-A AACTATGAGA TCGTTCTAGC CGATAAAGTC AAACTCAAAT CCTTGGACTT AACCATTACC CTATCCGTTG GTCAGCTGAC TG'rATTGATG 'N'ATCAAATT TGAAATCACC 1.140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 CCTTCCGTGC TCAAAATC.CT TACTCATGGA GCGATAACAA a a.
a a AGGAAATCAC TTTCGCAAAG AAAGAGCCTA TGTATTTGAC TTTGCGCCGG AAAGAACCTT GATTrCTCTTT GATGAGTTTG CCAACAACTT ATTGCTTGTA AC'TGCAGCTG GTAGAGGAAC TTCAGCTACA CGCGAAGAAA ATATTAGAGA ATTATTAAAC TTCTTTCCAA TTATTGCCUA AGACCGTGCT GGTAAGATGG TTGAAAT'rGA TGCAAAGGCA GTTCTAACCA CTCCTCGCCA GATAAAAGCT AGAGAAGTC TTAAACGAGG TTTTATGTCC AATCTCTTAT TTGATAATAT TAGTGGTATT TTCCAAGCAA GTCAAACAGT TTTAGATAT TTAAATGAGC TGCCAGTT1GA AAAGGAAGGG AAGGTACAAG ATAGT'rCTGA TTI'ATTAGAT TTTTCAGATG TTACAGTCGA TGATGAGGGA AATGCAGTAG TAGACCATGA AAT'rGTAGT'r AATCAGCAAA TGCGACTTTT TGGTGAAAAA GTTTATGGAC TTGGTGAATC TGTTGCTGAG TTAGTCACAA AAGATGAGGA ACGAACTCAA AAACAGCTGG TCAATGACTT GAGTAACACC GTTTCTTCAG TGATTGTAGA 1245 GGAAT'rGAAA GCAGATTATT CTCTAAAAAC AACGGAAACT GAGCAAATTA TACAGCAACA CTTGAGAATG AAArTCGAAA AAATGATATC GAAAGAAAAA TCATATCAAG CAAGAGTTGC AACAGCAGCT CAAAGAAGCA AATGATAAAG TAAGATTCAA GAAGATTTGG AAAAACGTTT AGAAGAAAAT ACTAGAACAA ACACTCAAAA AAGAAGTCGA AAAAATGCCT TGAGATAAAA CGTGTGGAAC AGTTIGAAACA ATCAGCTCAA ACGAGGGTTT GCAAGAACAA TTCCAAGTTT- TATTATGGCT AC'rTGATAAT TTTGATGCCT TTGTTCCTGA ACATGTTTT1 GATTGATCAG T'rTAGATATT TGCGAGATGG TGGGCAGGAT TAAAGCAACA TTTGACGAAG CTATTCAAGA ATTTCTTCC TT'ATT 'TAAA GATCAAAAAG AAGACATTTT TGACTATATT AATTTTCACT CCTAAACGAG TGGTGAAAAG GATGGTAGAT AGGGATT'TT GATGATCCAT CTAAGACT'IT TATTGATTT-A TAT'rGCAGAA CTTGTGAAGC GGTTATATAA TAGCAATGGC TCCTGAAGAA CGCTTAAAAC ATATTT'rGGA AAAGCAAGT'r GATTATCTAT AACATTTCCA CTAATTTTAT ATTTGGCAAT
AAACTCATTC
GAGAAA'ITTA
GATGAAATTC
TACGGTGATC
TATGAAGTAA
TTTGCACGGC
AAGAAAAACG
CCACCCCAGA
GA'TTTGGAAA
AGAAACAAAT
TTrrCTGAAGC
CGCAAAAAGA
ATAA.AGAAAA
'rCGAACAGGT
GTGACCATTT
AAACTCTAAC
CAGGGATTAC
ATCTCTTTGA
AGTTGGCGGA
AGACCAACCA
AGGAAAATCC
2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 TATATGAAGT CAGGCCTCTA TTGAAAGAGG CCTTTCCAAA TATGGATTTG CTCCGTCTGA CTT'rCTAAAG ATATCAGTAG GAAGAATTTT GTTTTAGCAG ATACCATTCC AGCCGCTAA.A GAAGGGAGCA TTCAAAAGTT GGT'rGATTCC TATTT'rGAAA ATAATTAAAA AGAAGGCCGA GTCAAAATTC TTTGAAATCA GAAAAAACGC ATAATATTGA GTGCTTTTGT ACTGCCCCCC AAALAGT'rAGA CAGAAAAAAT CTAACTTTTG GGGGGCAGTT CAGACAATCC TTGGTATTAT GCGTTT'rATT GTGGGAAGAT GTATAATGGA TTGAAATAAG ATATGAACAA ATCAATTAGG AACGTTTTAG AGTAATGGGG GGCTATTTCA ACTTCAACCT AACTCCCTGA TAATTCAAGG AGT'rGTCTAT AGTTAAAT'rA ATTCTGGGTT TTTCCATGCT TCGTCAATGA TAGCTTGTAA TTTTTTGAGT TTCTGCGTCG TTCAATGGGA TATTTACTGG CAACAACAGC TGGTTGACCG ATAAAGACAT TCTCAACDCC CTGAA.AGTGG AAGTACTGCG TTTTCATCGT CAAGGATTGC AATT'rAAAGC ATTITTATAAC ACTATAATAC AGAAAAAAAC
GTTTTTAGAA
T'rCTTTAGCA
ACGAACGATA
GTAT'rGACCT
TTTAGTGATA
GCTTCTTGGA
GATGCTTGCA
CCATGTGCAC
TCTTGGAATA
CGAGCAAGGG
4140 4200 4260 4320 4380 4440 4500 CTACTGCGAT ACCGTAGTAT GTTGCACCTT TT'rTGT'rGAT GATTGTGTAG GCTGCATCAC GAACACCT'rC GAACAATTCA A'rCAATTCAG CTTCTTGAAC ATTTTGAGTG TCTTTAAGGA 4560 4620 ATTCTTCAAG GTrTACACCA CGTGTTCACC CATGATGTAG GTGCTTGACG GAAACGAGCT GGAAACCAGA GAATTTCCAA GGAAGATACC TTfTGAAACCA GG?1'TTTACC TACAAGGTCA TCACAACAAG GTCAGCGTCT AAGTGAAGGC AAGGGCGTGA GTGGAATTTC GATAATTCCA AAGATGAACC TACAGCACCA TCATTGT'rTr AAACATCTCC 1246 GCGATGTTAG CGTGTGACCA AACAGCGAAC TCAGAGTCAC GCGTGCACTG AACGAGCATC CACATCCAAT TTTTcAGCAA GAGTCAAGTG AAGTACCTGA ACCGATAACG CGTTCTTTAG GTTGAGTAAG TCAAAACGTC AACTGGGTrA GCAGCAACAA GAT'rCAACAA CTrGAGT'rAC GATTGATTTG TTGATAGCAA AGACGAGTTT CACCTGGTTT TTGAGGTGCA CCTGCAGTGA GCACAGTCAG AGTATTGAGC TGCATAGATT TTTTTAGGTG CTAAGGTCAA GCGCATCACC AACAGCTTTT TCATGCAATT AGCTCTTGTG CAATTCC PTG GT'rAACAAGT GCAA.AAGCGT TCACCGACAA GGATAACTTT TTTGTGT'rGT TTAGTTGAAG TTAATTTTAT TAGGGGATTT TCCC'rAGACA ACTTCATT 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5278 INFORMATION FOR SEQ ID NO: 228: SEQUENCE CHARACTERISTICS: LENGTH: 1941 base pairs (8B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xci) SEQUENCE DESCRIPTION: SEQ ID NO: 228: ATAAGGAATC TCTAAAAAAT TTTAAGGAGA KAGTATGCCA CTGAATTTTT GGGAACTGCC GCCAACGTTG AACTTAAAGG TACGAAAGGT GGTATGGTA TGGGGGTTAT GATCCCAGCC ATCAACCCTG CTTTCACTCT AGGGCTTGCA GTACCTTACA TTATCGCGCA AGTCTTGGGG ACATACCGTC CATTCTACTT GAAAACTGAA ACTATTTCAA GTATTGACCA TGGTACAAAA TTGATTAATG AGTTTGTTGG TTCATTTGTT AACTTCTTTG GTGCTGAAGT GCTTCA.ATTC ACAGTTGATT TTFCTGACTT GGCTATTAAA CTTTCTGTGG CTCACTTGGC ACTTGGATTC GGACCTACAG GACCTGCCTT GAACCCAGCC
ATCTAGCAAA
ATTTTGATCA
CACCAAAGTG
TTGATGTTTG
GTTAGCGGTC
TGGATTTCAC ATGGGCACTG TTCTTGGGAA TGGTGCAGTT GCTGGATCGT CATCGCTGTT GTAACGTATC TGGGALATCAC TTTTCCCTTG GGCACAAGTG GCTATCTTTG GCCAAGCCTT AACCCAAATA ACATCTTGGG GAAAGTCGCT ATGCAGCA-AC TTGTTCTTTG CAGCTCTTGG
AGTTGTGGCA
AAC?1'TCTCA
TGTCAATGGT
TTTGACTAAA
ATGAAACAAA AGGCAACAGA AGCAGGACAA GCACAGGTGG CTCCACACAC TGC?1'CAGGA CTCGTTATGG CTTTGGTAAC ATCACTTGGA CGTGACTTGG GACCACGTCT CCTTCATGCT 1247 TTCCTTCCCA AATCAGTTCT TGGTGAGCAT AAAGGCGATT CAAAATGGTG GTATTCTrGG GTACCAGTAG TAGCACCTAT CGCAGCAGCA ATTGCGGCAG TAGCTGTAT'r CAAA'rCCT' TATCTCTAAG AAATAGCTCC ?r'rAACAT'rT GAGTGAGCAC CATCTATAAG TAAGAGAGGA TCAGACTGGk TCTCTCTN'T kGATrTraG GGAAATGAAA GAAcTCTAAkA CAAACTCCTC TCCAGCAGTG GTTTAGAAGT CTCAGTGGGC TATTCCAGCI' TCAATGGACT ATAGTAGGTT GCAGTTGAAA TAATAGACCC TTGT?1'CTAA AACATrGTGA GAAATTGGT'r AATCAAATTG TGCAGTTTrC ATTCTACTAT CCCTATCTTG 'rAAGTCTGCT TTATAGTGGG CAAACATTGT GAGGAATrGA TTTACCTITCC TACTATAAAA TAAGCGATTA GGGGGGCTAT TATGATTGTT ATCGT'rTAT CTGCAATTN' GAAGCTAGCC GCAGGTTGTT CAAAACACAG AATAGTACAC ATCTACTTCT AAAACATTGT TGCCCTATTC TTGTT'rCATT TrACTATATA GAGTGGATGG ATAATGCTGA AAACTrCCTTG ATTAGT'rAAA TN'?TTACCAA GAATAATTCA CTGA.AATTTG ATAAAATAGT AAGGAAAGTT ATATATTTTA TTGGAGGCTT TTACTCAAAT ATATTATCGG AATATTATCG TTGAAGTTGG AATAGTCCTC
TCAACAAAAT
TCTTCGACCT
ATACTCAATG
TTTTGAGGTT
TAGAAATCGA
AACCAGAGAC
AAGGATAAGT
CAAAAACGTT
AGACTGTATT
GGCAAAAGAA
GTTCAGTTTC TATTTCATTT ACATTGACTC TGCTGAGTCC AAAATCAAAG GGCAAACTAA GTATAGTAGA TTGAAACTAG TTTGACTGTC CTGAACGATT TGTT'rACATT TTCAGCAAGT CTATTTAGTA CTTTCTATTA
TGAATTCTCC
GAGATGGGTT
CC'TTCT~TCT
840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1941
GTAAAACACT
GCCTACTGTC
AAATACGATC
TGCAATTTAG
TATCTATAAA
GTAGTAAACC
ACACGTTAAC ATTGGTACTA TCGGACACGT TGACCACGGT TATCACAACT GTTTTGGCAC G INFORMATION FOR SEQ ID NO: 229: SEQUENCE CHARACTERISTICS: LENGTH: 755 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear AAAACTACCC TAACTGCAGC (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229: AT'IrGAAGAA ATTGAAGAAA TCGTAGCCCC TACAGATGGT GAATTTTTGG GGGAAGTTTT ACTTGGAACT GGGGTAGTTC TCTTAATTGG AGTAGCCTGT TGTTAAAAAG ATAGGGAGTG ATAATCATGC AAGATAACI' TTTATTTGAG GAAATTGAAG AAATTTCAGT ACCAGTTAAT
GATTTTTCAG
GGTTGTTGAA
A?1'TAAATITr' TGAT'rGCGGA
TTAGGTTTGT
ACTATCATTT
CTGGACTI'GC
GTTTGTTCAT
TCCGTATTAG
CTGAGGGACT
TATGAGAATT
ATATATACCA
AACAGGTATC
TTACTAACAT
TCTrGCAGCA
AGAGTATGTT
GTTGATAAGA
1248
GGATTTGGTT
CAAGCTTTTT
AGAGATTAAT
TTACTTAACC
T'rAAGATATT TAGCAATCCT TGCTCTTGCT CAATTTCATT TTAGACAGTc AGAATTAGTC ATTATTTTAT CCTCTTTTAT TTAT'rAAAGG ACCTAC'TCCT TATGAGGGAC ATGGTTAGAG CAACTAATAG GAAGAAATAT TTTTAAAATA ATAAAAGATT TAATAAGAAT ATACATGAAA TTAGTCATGC TCCAGTAAGA AACA'rGTATT AGTTGGGAAA CAGGAAAAAA.
GTCAAGAATT TACCATATCG GACTTATTAG TGTTAGTAGG TCT'rGGGACT TTAATATAAC ATTATCTGAA AAATTAAACr TTTGAAAAAA TCCTATCTTG TTGTCATTAT ATTTGCAACG AATAATTGCT AATAA
INFORMATION FOR SEQ ID NO: 230: SEQUENCE CHARACTERISTICS: LENGTH: 1483 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230:
CCAGAAAAAC
TAAATTTTCA
CCAAGATGAA
CGTAGTGGAG CTCGTGGAAC AGTGGAATTG ATTTTCCAAA AAGAATACAA AGTATCTCAA AGAGGGAGGC AAAAATCAAA GAAGAAATCA CTTTGGAGAA TGGTCGTAAG CGCCAAAAAA
CATCTCTATT TATTGTGGAG TTTGGGGATG ATAAGATGTC AGATGCATTT ACAGATGTAG AGGCACATGA GGGACAAGTC GTAGAAATGA ATAGATTGGG TAAGCTAATT GAAGTTTATC TGGAAGGAGA TAAACAAGTT AATGTTTACG CAGAAAAGAA TTTGATTCAT TATCTTGACT TTCTCCGAAT AATTTAGGTG AAGGCAATC:A ATGAAAATTT TACCGTTTAT AGCAAGAGGA AAGCTTGTTC CTTTTTTAGT AGTAGGA'rTG TTGAATCCTT TACTTACTCA AAAGTGAGAA ATTTTCTCAC TCGCTTTATA TTATTT7I'CA ACAAGTTATT AC'TTGAAGAT ATGCTAGCAG CTGGTGATAG GATAT'rCTTA
TTTTTCTTTT
AGGAGGAAGA
GTCAGTTAAA
TGTCTATGCC
TTATAAAAAT
TTTTGTAGCC
GGGGATGATT
TATTCTCGTC
TATTCCAGAG
GGGAGCCAGG
TTTCGTTTGA.
ATCCTGCTTA
AGTGTACTTC
GAAATGGATC GATTGCGCGT AGATTGATCA GTGGCGCATG GTAATGTCAA TGGTTTTGAA A'N'CCGGCAG CTTATGGAAA TGCGAATGAA TGGGGACATC GTGCTCGTCG GGAAGGTTAT CGTGTAGATA ATACACCGAC GATTGGTTCC ATTACTTGGT CTACTGCAGG AACTTATGGT 1249 CATG'rTGCCT GGGTGTCAAA. TGTAATGGGA GATCAGATTG AGATTGAGGA ATATAACTAT 900 GOTTATACAG AATCCTATAA TAAACGAGTT ATAAAAGCAA ACACGATGAC AGGATTTATT 960 CAITMTAAAG A'N'TGGATGG TGGCAGTGTT GGGAATAGTC AATCCTCAAC TTCAACAGGC 1020 GGAACTCATT ATTTTAAGAC CAAGTCTGCT ATTAAAACTG AACCTCTAGC TAGCGGAACT 1080 GTGAT'TGATT ACTATTATCC TGGGGAGAAG GTTCATTATG ATCAGATACT TGAAAAAGAC 1140 GGCTATAAGT GGTTGAGTTA TACTGCCTAT AATGGAAGCT ATCGTTATGT TCAA'PTGGAG 1200 GCTGTGAATA AAAA'rCCTCT AGGTAAtTCT GTTCTTTCTT CAACAGGTGG AACTCATTAT 1260 TTTAAGACCA AGTCTGCTAT CAAAACTGAA CCCCTAGTTA CTGCAAC'rGT GATTGATTAC 1320 TATTATCCTG GAGAGAAGGT TCATTATGAT CAAA'rTC'rCG AAAAAGACGG CTACAAGTGG 1380 TTGAGTTATA CGGCTTATAA CGGAAGTCGT CGCTATATAC AGCTAGAGGG AGTGACTTCT 1440 TCACAAAAT'r ATCAGAATCA ATCAGGAAAC ATCTCTAGCT ATG 1483 INFORMATION FOR SEQ ID NO: 231: SEQUENCE CHARACTERISTICS: LENGTH: 1027. 
base pairs TYPE: nucleic acid STRANDEDNESS: double D( TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 231: CCCGGAAAAC AAGTTAAAGT TGAAGTTGGT CAAGCAGTTT ACGTTGAAAA ATTGAACGTT GAAGCTGGTC AAGAAGTTAC TTTTAACGAA TTGTTCTTGT TGGTGGTGAA AACACTGTTG 120 TCGGAACTCC ACTTGTTGCT GGAGCTACTG TAGTTGGAAC TGTTGAAAAA CAAGGAAAAC 180 AAAAGAAAGT GGTTACTTAC AAGTACAAAC CTAAAAAAGG TAGCCACCGT AAACAAGGTC 240 ACCGTCAACC ATATACAAAA GTTGTCATCA ACG CAATCAA CGCTTAATTT TAAGGAGAAC 300 ACATOATACA GGCAGTCTTT GAGAGAGCCG AAGATGGCGA GCTGAGGAGT GCGGAAATTA 36 CTGGACACGC CGAGAGTGGC GAATACGGCT TAGATGTCGT GTGTGCATCG GTT'rCTACGC 420 TTGCCATTAA CTTTATCAAT TCTATTGAGA AATTTGCAGG CTATGAACCA ATCCTAGAAT 480 TAAACGAAGA TGAAGGTGGC TATCTGATGG TTGAAATACC AAAAGATCTT CCTTCACACC 540 *AGAGAGAAAT GACCCAGTTA TTCTTTGAAT CATTTTTCTT AGGTATGGCA AACTTATCGG 600 AGAACTATTC TGAGTTCGTC CAAACCAGAG TTATCACAGA AAACTAACAC GGAGGAAAAC 660 ATTATGTTAA AAATGACTCT TAACAACTTG CAACTTTTCG CCCACAAAAA AGGTGGAGGT 720 1250 TCTACATCAA ACGGACGTGA TTCACAAGCA AAACGTCTrG GAGCTAAAGC AGCTGACGGA CAAAC'rGTAA CAGGTGGATC AATCCTTTAC CGTCAACGTG GTACACACAT CTATCCAGGT GTAAACGTTG GTCGTGGTGG AGATGATACT TTGTTCGCTA AAGTTGAAGG CGTAGTACGC TTTGAACGTA AAGGACGCGA TAAAAAACAA GTGTCrGTI-r ACCCAATCGC TAAATAAAAA GGTCCATTGA ACCTTTTATC CCGAACCT'rG AAATGTAGAG GTGAGGAAGC TAGAAACAGC TTAAAA'r INFORMATION FOR SEQ ID NO: 232: SEQUENCE CHARACTERISTICS: LENGTH: 1990 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232: CGGTTCAAAT GGTGCAGGTA AATCTACGTT A.ATTAATTCT ATTGTAGGTT TTCAAGAGAT TTATTTAGGA GAAATAGAGT ATTGTGATAA AGATTTGATA GTTAGTTCTC AACCTTTTGC 1~ TCATTTAGGC TTPTACTCCTC TGTAATATTG GGGCTGAACC AATAGCCTTA GAAATTGTTG AGGTGGACAA CTGCAACGCG TATTTTAGAT GAACCTACCG TTTAAAAGAT AAGAG?1'TGG ACTCGAAAAG TTTTGTAAAA TGATATGCGT GACTTTGTAG AATTTCTAGA TATCAAATTG AAACCACAGT AATTGATTTT TTGCTGGAAA GTTTGGGAA.A GGTTAGCTGA TAAAAAAAAT TCCAGATTGC TAGAGCAATA TTGGTTTAGA TACTGAATCT C, 0 :0,.6 0 C
AAGGAAAAAC
AAATACTTTT
ATAATTCAAC
AATTTTTAGA
AAGTCCCTAT
TTAAAAACT
GAGAAAAATG
GATTATTGCT
TAATGATAGT
GGTAGGAAAA
TTATTTGCAA
TTAGGTTTAA
TTTACAATAG
GCATGTGAAA
AGAATAGGAG
GAGGTCTAGC
TATTATCATA
TTTACAAAAT
TATCAAATT-A
AAATTTTAGA
AGAAGAAAAG
TTCAACAAGT
AAGGCTGATC
AAAAATGAGA
CCAATCTTAT
TATACTACTG TGAAGGACAA AATGCTGAGA AGTTGTGTCA AATTTGGTAG AAACATTGTC GCTCATAATC CAGATTTTrA GCCGAAAAAT TTTTAATGTA TCTrCACATG ACATAAATCT GGCTCCATAT CATTTTTrGG AATTTTTCAA TGCAGAATAG TTTAAAGTTC ACATCGAAGA ATCTTAGATG TTATCAATGA AAATTAACCT TACAAGAAAG AATN'AAGGCA CAAATCGGAC TTAT'TGCTTTf TTTTAGAAGT ATGTTGTTTT TATAATAATA 780 840 900 960 1020 1027 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 AAAGGTTTAA TTATTTCTCA GTTTCTACAA GGATTAAATT CTTCGATAAA GAACATTCAG TTTAATGATA TAAAAACCTC TTATGCAGAA TATACAATCA TTGGTGTTAT AGCTTTATTG ATAATCGGGC AGATGACTCA AGTTATTTAT 1251 AGGGTGACAA TAGATAAAAA ATATGGGCTA CT'rGCTCT-rA AGTTA'rGCAG TGGAGTTCGT 1140 CCTTTATATT ATATTTTAGG GATGAGTATC TATrCTATAT TAGGGTTGAT AGTTCAAGAA 1200 ATTATTATAT ATATAATTAC GTTAGCGTTr GAGATAAATA TCGCAATGGA TAGATTTIT 1260 TATACAGTTT TGTTATCTAT TGTTGTTTTA TTA'TTTTGGG ACTCCCTTGC AATTTrACTT 1320 ACAATGTTTA TCAATGATTA CAGAAGACGT GATATTGTAA TACGrTTrGT ACTAACACCG 1380 CTTGGTTTTA CAGCTCCTGT TTTCTACTTA ATAGATTCTG CTCCTAGTAT TGTGAGATGG 1440 ATTGGTCAGT TAAATCCCTT AACTrATCAA TTAACTATTT TGAGAAACTT TTATTTTAAA 1500 AATTCAACAA CTTTGGAATT AGTTTTCTTA TTGTTAACAT CATTACTTGT CCTTATATCT 1560 GTATCTTTTA TTATACCAAA GATAAAATTG ATACTGATAG AAAGATAAAA GTTGGGTCAT 1620 CCAACTTTT TGT TGTCTCC CGAAAACCAC TAGCTATGCT AGTGGTTCCA TAGAGCTTT'r 1680 AGCGTGGTAA CAAAAAGAAC CTCCTAAA.AT GATAAGATAG AAGTGGTTrC TCCGCCACTA 1740 CAACATATCA TACAGGAGGT ACCTCATGAG AGAGGATAAr CAAAGTTTAT CACATACCAC 1800 ATGGAATTGT AAATATCATA T'rGTTTTTGC ACCCAAATAT CGTCGTCAAA TCATTTATGG 1860 *CAGATACAAA GCTAGTATCG GAAGAATCAT ACGTGACTTA TGTGAGCGTA AGGGTGTAAT 1920 *AATCCATGAA GCGAATGCTT GTTCAGACCA TATTCACATG CTrATCAGTA TTCCTCCGAA 1980 *..ACTTAGTGTT 19 INFORMATION FOR SEQ ID NO: 233: SEQUENCE CHARACTERISTICS: LENGTH: 4766 base pairs B) TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233: GAACTATATT GCATATATTT CTAGCAATGA TCATGGCGAA TCTTGGTCTG CACCAACTTT ATTACCTCCT ATAATGGGAC TTAATCGGAA TGCGCCATAT TTAGGTCCTG GACGTGGAAT 120 CATTGAAAGC 
TCAACTGGAC GTATTCTTAT TCCGTCTTAC ACTGGTAAAG AGTCTGCGTT 180 CATTTATAGT GACGATAATG GAGCATCTTG GAAAGTTAAA GTAGTGCCAC TTC!CTTCTAG 240 .TTGGTCAGCA GAAGCACAAT TTGTAGAATT GAGTCCAGGA GTAATTCAAG CATATATGCG 300 TACAAATAAT GGTAAAATTG CATATTTAAC AAGTAAAGAC GCAGGTACTA CTTGGAGTGC 360 ACCGGAATAT TTGAAATTTG TTTCAA.ATCC AAGTTATGGA ACACAATTAT CAATCATCAA 420 1252 TTATAGCCAA TTGATGATG GTAAAAAGGC TGTCATTN'A AGTACTCCAA ACTCCACAAA TGGTCGTAAA CACGGACAAA TTTGGATTGG GCGTTATCAT CACGACGTTG ATTATAGTAA GTTACCAAAT CATGAAAT'rG GATTGATGTT ACT'rCATATG AAAAATGT'rG TACCATATAT TTAAAGCTGA AATTTGAAAA TATATAAAAA TGTTGGAGCT GGATATTTTG GAGCTGATTT AAAAGTGGTT GCGGTATTTG ACCCAAATCA AGATGTTTGT GCAAGTTTAG ATG.AACTTGT AGCTTCACCT AGCTACCTVC ACCGTGAACC CGTATTTTGT GAAAAGCCAA TTGCATTGTC ATGTAAAGAA AATAATGTCA TCTN'ATGGC ACACCATGCT AAAGAATTGA TTACTCAAGG TGCTCGTACA GGTTGGGAAG AACAACAACC ATCTGGAGGA CATTTGTACC ACCATATTCA AGGACTTCCT GAAAAAGCGA CAATGGTAGG
TCTAATTAAT
CTATGGATAC
TGAAAAAT
AACATTTAAG
GAGGATAAAA
GATGATAATA
TCATATTCAA
CATTCATGGT
ATTGAAGATC
AT'rATGGTAA
CZAATTGATTG
CATTGACAGA
CTCGTAATGA
TGAAAAAGAA
ATTACGGTAT
AGCTCGCTCA ATGAACAAAA TTGAAGATGC TGGAGAAGAA GT'rGCTCAAG AGTTGGGATC AGCACGTGAA GATATTGATT GTGTGATCGT AGTTGTGAAA GCTGCTCAAC ATGGCAAACA TTA'rGAAGAT TGTAAAGCCA TGGTTGACGC TGGTCACATC ATGAACTTCT TTA.ACGGTGT TAAAATCGGT AAAGTTCTTT ATTGCCATGC AACTGTATCA TGGAAGAAAC TTCGTTCTCA TGAATTAGAT TGCATTCAGT TTATCATCGG AGGCAATGTA TATCATAAAG GTGAAAACTT CTTAGAATAC TCTGATGATC GTTATGCTGT TGAACACTAC GTCT'rGATTC AAGGAACTGA TGGCGGTACT CTTCGTGTTA AAGGTGAAGG AGAGGAAGAT GATGATCGTA CAGCTATCTA GTACGGTAAA CCAGGAGTAC GTTGCCCATT 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 TCGTGATGAA GATGATATGC TTTGGAATAT GGTAATGCITT AGGAGCTATC AAACT'rGACT AGAATCACAC TTCTTAGTTC TACCGGTCGT GGTATGGATG ATGGTTGCA-A ACATGTATTG AGAAATTACA GAAGAATTTG TACCGCTGAT GCATGTACTT CACAAATGCT TAACTNTTGT AGTTCTGTGA TACAACTCAT CCCATTTNTT ATCAAAAAGT AGGTATCGAT ATTGGCGGAA TTTGAATCAT TTCAAAGAGA ATTAAATCAG GTCTGTGATr
TCATTGTAAA
TCCGTTGGGG
TGTTCAATAC
ATGAAACTCA
GAGCAATTGC
ATAAAGAAAT GGAATATCTA CATGACATCA TTAAAGGTGG AAAAACTTCT CAATGGTGTA CCTGCTTTAG AATCAATC TATCAGTTAA AGAAGATCGA AAAGTAAGTC TTTCAGAAAT AAAACAGAAT AGTAAATTCT TGTCATTATA TAATTTCTAA TGAATAAAGA AATAGAGATG GGACTGGGAT AATGCCCAGT AATGAGATCA AAAATGTGCG AGTGTT~GAAA TGAAGATTAT CAACAATTAA GGCAGATTTA TACGATGAGT TTGGAACGAG
TAGAAACAAT
TAAT'rGGTGA TATTGACTAT GATTTGGGAA CGAATCAGAT GTATACTTTA AATCATTCAA TTGATGGTGT 1253 TGGGATTCC ACTGCTGGAG TI'GTTAATGC TAATACTGGA GAAATCATCT ATGCAGGCTA TACAATACCA GGGTATATCG G.AGTAAACTT TAC'rGCCGAA ATAGAAAAAC GITTGGCTT GTATACT 'TT GTTGAAAATG ATGTTAATTG 'rGCTGCATTA GGTGAATTGT GGAAGGGACA AGCCAAAGAT AAGAAAAATG TAGTAATGGT TACTATTGGA ACAGGTATAG GAGGCAGTAT TATTGTCAAC GGACAAATTG TTAACGGA'rr TAACTATACT GCTGGTIGAAG TAGGTTATAT TCCTGTAGGT AATTCGGA'I'1 GGCAAAGTAA AGCCTCAACA ACCGCAT'rGA TCATTTATA TCAAAAAAAG AGCTTGAAAA CTAATCAAAC TGGACGTACT TCGAGA'rAAA GTTGCTGAAG AAACTTTT'GA AATTTTTGTA AT'rAACGATT TCTTATCTAC TTAATCCAGA AATTCTCATA TAGTAAGGAT ATTTTGTTAC CTGAAATTCA TAGGTT A CCTAAAAA'rC TTGI'GGCAGC AGCTGTAAAA AATTTCTTAG ATAGAATTT'C CACAATGACT AACTCTGTAT TTTCGACAAT TATAAAATCA TATGATAATG ACATTTATAC AAAACTAGAA AAAAGTTATG ATGAAAAAAG TTTAGAAATG AAACAACAGA ACCTTATTC-A
AAGTTCTITTA
TACATTAGGA
TA.ATAAATAG
GCAAGATATT
TTATA-AAGCT
TCACGAAGAA
TGAGGTTAAT
GAGAGGAGAA
AGATGAAAAG
T'rCAAGAACT
CTCTAGAA.AA
ATCATGGGGT
TAGAAGCAAC
rrCTCACTG ATrAAGATC GAAAATCTAA CAAAAGGTTT TTAGGAGGTG GGATTCTGGA GCTAAAAATG CAATGGATAA AATGAAGCTG GTCGTATAGG TATGTAAGAT AAGGAGGTGT GAGAATGTTG CAACCGATAT GTTTCCCAAG AAGAATTGGA TTAGTTTCAA TAGAAAGCAA AAAACAATCA AGGAAAATGA T'rTGTAGAAA AAATTATTGG AATCTCTCTA CTATT'rTCTA AGAGTCAGTG CAGTTCCGTG AGATGAAGTT GTATTTCCAA TATTGAGGGA AATGATGCCT TGTAGAGAAA TTACAAGAAT 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 TGCAAATATT CAGTATATTT TAGGGTGGTA GAAAAATATG AAAGTAGTCT TGATGATGTT ATTTAAAGGT TCAGGATAAC CTATTCAAAT TTTTCATACT TGACTTATTT GATGAATCAA ACCTACCGAA AGAAAACACG AAGAATTAGA GAAATTGCGT ATCAAATTAA CAGAGTTCAT AGCGATTAAT AAAGTGGGAA TCTGTAAAGG AAAAGTCGGA TTTTAGAAAA CAATATTGAA
CATCAAGTAG
GCCATTAGTC
TTAAAAACTA
TGGTCAGAAG
TCTAATTCCA
CAACAACA7T
TTTAAATTAT
AAAGCTAATG
GAATrrAGAAA
AAATTAGAAA
ACAA'rTCCAA
GTTCAAGAAA
TGCAGCAACC TCCGATAACT ACCTCTTATG TTGCTGAGGG TGTTCT'rAAA AAAGTGAATC GACACATTCA AAGTIAATAAT GAGGAAATAG TTGTTCCTGC GAATTTAGAA CAATTTTCTT GGACTGAAGA TAATCGCTTA TACAATAGTC TATT'rTCTAA TGATAGAGAG TACGGTGTTG T'rGTT'ITCTA TCACTCTAGT TACTCTATAG ATTTTGATGA ATACTTATTr GAACCATTTG ATTATTCTAG AAAGGAATA CCGAAGCAGC GAGTAGTAGA TTTAGATCAA GAAAACATGC AGTTA6ATAAC TGAAAAGAG AATATFATCG CATCGTTGCA TAGATTTACA ATGGCAAATA GACTATATTT TATCTATCTA ATAACTTNT GTGCACTCCG CATCTAGTTG CATTAGAAGG TTrTATATT'r TATAAAAGTT ATGGATGAGC ATTTTGGACA CGGAAACATT GACGGATAAT CAAGATGAAA TACCTATCAA TTGAACCATT TGAATTATTG ACAGAAATGT ATGCTCTGCC AGATTCAAAG AAATATTTGA TGCTCGTCAA ATCTCTAAGA ATGGATAGAA GAAACTCGTA TrCTATTTAT ATTTATGAAT ATTAACGAAT CATTCTTT'AA CTACACCTGT ATTAGCACCA TTT'rACTTTA GCTATGGTT'r ACTATTGTTT TTAGGAACAA CAGCAACTAA GAGATTTTTA AAATTCTTTA GTGGA6ATCTA TGGCTCATTT TTTGGATATG CTGATGTCAT GACTATATTA GTAGTGTCAG GTTTGTTAGC TTCAGGACTA CAAAAAGTAA CAGGATTTGC GTGGTGTGrr ATTCTG INFORMATION FOR SEQ ID NO: 234: SEQUENCE CHARACTERISTICS: LENGTH: 2484 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
CATTTTTTGG
TGTTAGCATT
CAAATATTAT
AATGATGGTT
AAAAATTTTT
GAGAAAGATC
GCTGATTTAG
CATCTACCTT
ATATATTAGG GGTAGCCGTr GCAATTTGGG AGTTGCCATT TCATCTGATA TCTACAACCT TTGTGTTTGG GTT'rATTACA GTATTTGCAG GAATGAATAA ATATGCAGAA GCATATAATT 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4766 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234: CCTTTTAGAA AAAATTAAAG CCCTGATGCC TTGGGAAGTC AAAAACCATC AAAGCCGTCG TCTTGTTGA.A GATAGAGCCT TGCTCGTATC GATGATAAGC TCCAAATGAT GATGTATACG aGaTGATTAC CCTATTTGCC TGCTCTTTGC AGGAATTGTC GGACTCTTCG CCTGGCTGCT GCAAAATGGA CACTATGAGC AATACGACAC CATTATCATT CATCGTCATA TGAAACCAGA AGGTGGGATT GAAAGCCTTG CTGGAACATC ATTTCCCAGA GT'N'TGATGA ACCAACTCTT ACTTGGATGG CTGAGATGGA ACCAAGGCGC ACTTGTCATC GTCTGTGATA CAGCTAATAC GCTATAGTCA AGGTGATTTT CTCATTAAGA TTGACCACCA GTGACCTGTC TTGGGTCGAT ACTAGTTCAA GTAGCGCTAg CAAACAACCC AACTAGCCTT GGCAGATCGC GATGCTGAGT GGTGATACAG GTCGCTTCCT CTACCCTTCT ACCACTGCAC TATTTGAGAG AACATAACTT TGACTTTGCG GCTCTCACTC TACAAAATTG CTAAACTGCA AGGCTACATC TACGACCATC TGGAAGTGGA TGAAAATGGT ACAATATAAC CGA'rCCTGAA TGAGTCTCTG GGGAATN'TTr GTAAAGTCC-A TCCTATCAAT CAAGTGGTGC TAA'rTCCTAT ACTTGCTTAA AAACTGATAA 1255 GCTGCTCGCG TTATCCTGAG ACTGCGGCCA TT'GTAGGTGC GTCGAACAGG CTGATrGGCCA GAAATTGCCA AGGAGCATGA AGCCTAGAAG AAAACGAAAT AATACTTGCC AAACT?1'?CA
TCAGAAAATC
ACCTGCACGC
CrACCGAGTT
TGGTGGAGGC
CATCTACCAA
GAATCTGATA
AAATATTGCA
ACAACTACTG
T'rCGAAGGCG
TTGAAACAAT
ATTGACAGAG
CGCTTACGCA
CACCCTCTAG
AAGTTAGAAG
GACTAGTATA
AAA'rGAAAAr GTTACCAr'TT
AAACTTACCC
c'TAACAATCT
AGATATCCAT
CCTTAGCGGT
ATTGATCCGT
CAC'rCAAGCA
AGAGAACAGT
ATGGCTCGCA AAGAGACCAT GGCAGAAAGG CCAGAATATC GCCCAGTTGT CTTCATGGAC TCAACAAAAC GCTCTAACGA AACAGTTGAG GTGGAAATTT CATCAGACTC ACACCCATT'C TACACTGGAC GTCAAAAGTT GATGGACGCG TGGATCGTr'r TTGCTGTT CTTTTTTGTT 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 CAACAAAAAA TACGCTCTCA AATAATGATA TCTTGAAATC AACTGCTGTT TTCA'rGTTCC *t 9 9* *9 AGACTCATCT GTAGGT'rCGA TTTCCATGCT ACTAGGCAGG AAGGAAATAG CTGTTTCAAC ACGTCCATAA TGAGC'rATAC TATTGTCACG AACCACACTT TCATTGATGG TCCAAGTGGA ATTCATrTTC TTAAkAAGCTT CTCGGACTTT TTCCAAATCT TTGGAGGCAA TGGCCTGCTC TAAGGTTTCA AAACGAGGAC TTATACTCAT CTGCTTCAA AAAGCATTCT AGTCCATCTC CGATTACCGA 'rGGACTTTAT CACCTCCTTC TCCAGTCCTT GTATGACATC TTGALAGTTGA TTCATGACAT CTTCCAAAGT TCgAAAGGCT TTATTCTTAA ATCCACCTTT ACGAATCTCT TTCCACACTT GTTCA.ATGGG TTCATCTCTG G'TGTGTATGG AGGAATAAAG GTAAAATCAA TATTAGTCGG AATATTTAAG GTACTTGATT TATG;CCAtAT AGCATTGTCC ATAACGAGTA AAAGGATAAG CTTGTGAAAG CTCTTCTAAA AAGGCGTTCA TCCACACTCC CCTGAAATAA GGCATCAATT GTAACAAATT CTCCTGCCTC TGTAGCCTTC CAAGAAAGGC TTTCTCTTCC TCAACTGTCA TATATGCATG GT'rACGACCA CTTGAAGGAG AGAGTCGAGT CCGAACTCCT CATA'rTTTTT TACGTTTCGC TTTGATTACA GTCTAAAAGC TCTATAATCT CTTTATAAGA TTTGCCCATC TAG'rAGATTG AAACTAGAAT AGTACACCTC TACTTCTAAA ACAT'rGTTAG GTCCTGTTCT TGTTTCATT-r TACTATAGAA CGATTTGAAG GCGT'rTATAA TACGAGAGTC TTTTAAAAGT G'N'rTGATGG 'ITTCGGATTTC TTCTT'rAGTT TACTATTATA TAATGCTTTT TGATTTTAGT CTGGTATAAA TATTGCTTTC
TTTTT'ATAAA
AAATGACGGG
CCACGTGTTT
CAAATCGTTG
AGACGAAALTA
APLATCGATTT
TATTTAGCTG
GATTTCATAT
CTCCAAAATG
GTCATAGTTT TACTGGCAAA TCTAACA'rAT CACGGATAAA TTAACAAGTG ATTTCTGAA'r TGCTAAACAT TTT~CTTTTCT TATAGCATAC TTAAGAx-r' TGTCTrTGAG AAAGATATTT CCAAGAAAAA CGTTCGTTTT TTGG INFORMATION FOR SEQ ID NO: 235: SEQUENCE CHARACTERISTICS: LENGTH: 1766 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235: CTAGATATAG CTATAATTrT ATTTATAACA AGAGGATAGA AATGACCGAA TTAGAAAGAA 2400 2460 2484 AAAATCGAAA AATTAGCTAA GAAATATTCT GATAACTTAA GTTCGTGAAA TGGCAAATGA TAATAAGAGC CATTATTTGA TCATTTGAAG AAGGAGAAAA TATCGATTTG TATCAAAATA TATGCTGGTT CATTTTrAGA AGAAGCTGCA GTACTATGCT GAAAATACTT AAAAAGTTAA CATTCCTAAT TCTGAAAGTA ATTGATTGTr TAGTCGGAGA AAAACACGCA TACGAAATAA GATGGAGACC ATATAACTAA AGAACACACT AGAATAAAAG ATACCAATTC GGTTAATGTT CTACTATCCA AATAGAACTC ACATCAAAGT TCAAGAGAGA TATACAGAGT TTTAGGTATT
AAGGTCGTTT
TTAACGAAAA
CAAAACC'rAA
AATGGTGGGA
TTATTCATAA
AAGCTATA.AA
ATTATGGAGA
TTCTAACAGA
T'rTATACAAA 240 ATTTGGTACA 300 GAC'TTTrGAA 360 TGCAACTACA 420 CAAAGGATAT 480 AATTCAGCAA 540 TTCTGCCTGG 600 TATTGCAAAT 660 GAAATATrAA 720 TTTACACAGA 780 ACGTGGACiT 840 0.0.
0 0.0* ACT'rTAGAAA CATTGTATAA GAACATTTAA GAGCAGTGAC AAAAAAACAG GGGTAAAATC AAACTAT'rGA ATCCTCAAGT AAACCCAAAA ATTATCTAAT CGATTGAGGA TTACAAAGAA AAAATAGTGG CAGTATTTTC TTTTAGATAA TATCTTTGGA GGTGGTCTAA TTCAAAAAAG AGTCAAAAGA TTTTAAATTT ACCAAATACT AGTGGAACGA AAAATGACAG TATTAAAAGG AGATAACTTA AT'rGATTTAA TCTATATGGA CCCTCCTTTC AACAAAAATA TTATGTATTC ATTCGAAGAT CGGTATTGGA GGGAAATATT CGGTATTGAT TTACTTAGTA
TTTTTGTCTG
GTTCATrGTG
GTAGATATGT
GGATTATTGA
AATACAATTT
AAACGAGATG
TAAGATTAGA AGAATGCAAA AGAGTGCTAA ATAAAATTGC AAATCATCAT ATTAGATTAA TTCAAAGCGA AATTATATGG AACTATAAAC ACAATCATCA AAACATTTAC TTTTATrCAA TTACAGAGTA TTCrTCTACT ACAAATATCG GAAACTCTAA AACTATATAT AAGGTTGATA 900 960 1020 1080 1140 1200 1260 ATAATGGTAA CTATATTCTA GCAAAAGAGA AAAATGGAGT TCC-CTTTC!A crATrTTTGA ATATACCATT TCTTAATCCA AAAGCTAAAG AAAGAGTAGG TTATCCTACA CAAAAACCTA TTCTGTrATT AGAACAAATT ATAAAGATTG CTACTGATAA AAATGATATA GTTTTAGACC CGTTCTGTGG AAGTGGAACT ACTTTAG'rAG CCTCCAAGAT TTTGAATAGA AATTATATGG GGATrTGATTT ATCTGAGGAA GCTATCAATA TAACTCAGCA ACGTCTGGAA AATGTTATAA AAACAAGTTC AAATTTATTG AATAAAGGAA TCGAAGCATA TAGAACCAAA ACTGAGGAAG AGGAAAACAT TCTTAAATTA TTACAGGCAA AAATTGTTCA AAGAA-ATAAA GGAAN'GATG GTI"I TrTACC TAAACATT CAAAAAAAAC CGATACCTAT AAA-AATTCAA AAAAATAATG AATGTCTGAA TGAGAGTATC TCTITATTAC AGAATGCTAT AAACTCCAAA AAACTTGATT 1320 1380 1440 1500 1560 1620 1680 1740 1766 T'rGGAGTAGT TATAAAAACT CAT'rCG INFORMATION FOR SEQ ID NO: 236: SEQUENCE CHARACTERISTICS: LENGTH: 748 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236: CCGAAAATCA AATTCAAACC CTAGTTTCCT AGTTTGCTCT GCGCGGAGAA TTTCTAATTC TCATCTAATA CTAAAGTTCC TAGCCCACCA TACGCTTGAT CTGTATTCT TTTCTGTATC TCTTTGAGAG GAATACCCTC ACTTGTGCCA GATAAGTCTT TGACCATCAT TGGTCAAGAG AACTI'CCT TACTCCGCGC TCCTCAGTCG CTGAGATAAC TACTCCAACA CTGCCCATC GCTGATTTTT CTTTTTGACC ACGTCAACGT CGCCTTGCCG TACTCAAGTA TTGATTTTCA TTGAGTATTA AACTAAATTA TTCCTTGGTC AAGCGACGCC ATTCCCCTCG CATAGTCAAT CGTTGCAAGT CCACCACTTC CTGATGAAAC TTCCCTCTG CAATGGTCAC TATGGATACA AGCTCCAGTA TAGCGGGTTG AGCAAATGTC TCCACATCTT CTTGGGTCAT GTCCACATGA CGCTTGGGCG AAAGAACAAC CAAAAGACCA TGCGTGTCAA TATCCA-AGCG
CAGCCTGCGG
AATAATATTA
TTCTAGGTTC
CTTGCCACAG
ACGGATTTGG
ACAGGTAAAG
GATTCCCTTG
ATGAGCCAGC
TCCTACTGGG
CAAGTCATCC
TCCTTTGGGC
AAAGCOAATC
ATTTACAG
AACAAGTCCA GAACGGTTCT GTGCTTGGGA TTGTTCATCA TGTAGTAGAC AAACTCTTCA TCATCTATTT TTTCATCAAT CTGCAATTTA INFORMATION FOR SEQ ID NO: 237: SEQUENCE CHARACTERISTICS: LENGTH: 1449 base pairs TYPE: nucleic acid STRANDEDNESS: douible TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2-37T AAAAGATTAC ATTGCAACAA TTGAAAATTA TCCAAAGGAA GGCATTACCT TCCGTGATAT TAGTCCT'rTG ATGGCTGATG GAAATGCTTA TAGCTACGCT GTTCGTGAAA TCGTTCAGTA 120 TGCTACTGAC AAGAAAGTCG ACATGATCGT GGGACCTGAA GCTCGTGGAT TTATCGTGGG 180 TTGTCCAGTT GCCTTTGAGT TGGGAATTGG TTTTGCGCCT GTTCGTAAGC CAGGTAAATT 240 GCCACGCGAA GTTATTTCTG CTGACTATGA AAAAGAGTAC GGTGTCGATA CCTTGACTAT 300 GCACGCGGAT GCCATI'AAGC CAGGTCAACG TGTTCTTATT GTAGATGACC TTTTGGCGAC 360 AGGTGGAACT GTTAAGGCAA CTATCGAGAT GATTGAAAAA CTTGGTGGTG TTATGGCAGG 420 TTGTGCCTTC CT'rGTTGAAT TGGATGAAT'r GAACGGCCGT GAAAAAATTG GTGACTACGA 480 CTACAAAGTT CTTATGCATT ATTAATGAAA ACAGTCCCTA GGGCTGTTTT CTCTACACTA 540 *GGATATAAAA ATAGACTATA ACTAGTTAGA GAAAAACTAT AATTGAAAAC TATATCTTCT 600 *TGCAGTATAA TAAAAGGACT AAGTGTTTGA GATTTGTCTT CAAACATATG CAATTATTCC 660 *TGAAAGAGTA CAGTrrAGGAG AGGGTTATGC CGATTCGAAT TGATAAAAAA TTGCCAGCTG 720 TTGAGATTTT ACGGACAGAG AATATCTTTG TCATGGATGA TCAACGTGCT GCCCACCAAG 780 ATATCCGTCC TTTGAAGATT TTAATTTTAA ATCTCATGCC ACAGAAAATG GTCACAGAGA 8 CCCAGTTGTT GCGCCACTTG GCTAATACAC CCCTACAACT GGATATTGAT TTTCTCTATA 900 TGGAGAGCCA CCGTTCTAAA ACAACTCGTT CAGAGCACAT GGAGACCTTC TATAA.AACTT 960 ***TTCCTGAAGT CAAGGATGAG TATTTTGATG GGATGATCAT CACGGGTGCT CCAGTTGAGC 1020 *ATTTACCATT TGAGGAAGTG GACTATTOOG AGGAATTTAG ACAGATGCTT GAGTGGTCTA 1080 AGACTCATGT CTATTCGACC CT'PCATATCT GT'rGGGGGGC TCAGGCTGGG CTTTATCTGC 1140 GCTATGGTGT AGAAAAATAC CAGATGGACA GTAAGCTATC AGGTATTTAT CCTCAGGACA 1200 ***-CCTAAAAGA GGGTCACCTT CTATXTAGAG GCTTTGATGA TAGCTATGTA TCCcCTC-AT' 1260 CACGCCACAC GGAGATTTCT AAGGAAGAGG TCTTAAACAA GACCAATCTC GAGATTTTAT 1320 CAGAAGGACC TCAGGTTGGG GTTTCTATTw TGGCCAGTCG TGATTTACGA GAAATTTATA 1380 GTTTTGGTCA TTTGGAGTAT GACCGTGATA CTTTGGCAAA AGAGTATTTT CGAGATCGTG 1440 ATGCAGGTT 1449 
INFORMATION FOR SEQ ID NO: 238:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 904 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238:
TACCCGCTTC TTTCAAGAGT TGGAGCAGGG CT'rTTAACGG
ACTATGGATA
CGTTTTCGAA GCACTTTATA ATTGTACCAA ATCCAACTAG ATTGGTACCA GAAGTTGCGT AAAAAACGAC AACAGCCAAA ACAAAGTTGA AAATCCAGGA TTTTTGTAGG TAAAGAGCTC CTA.AAGCACC GATAACCATG GGATAGCCAG CCATCAAAAA TGCCATCAAG GGCGACAAGA ACATGGCTAT GGAAGCAGGT GGAATGACAA TCTTGAAAGG CGTTAAAAGG GCTGTCATTG TCATAAATTG CTTTTTAACT GCATATACAC TAGTATGGTA CTTGGGTTTA TAGATCATTT TTTAGTTAAA.
ACCTCTACTT CTAAAACATT GTTAGAAATC TCTTATTTCG TTTTACTATA GTAAAGATTT TCCACCTTCA GGTTTGGAAA GCGGAGATTG
GGGA
CTTGTTTGCG ATCTTTTGTC ATAGTTCTTC GACAGCTAGT GCTAATGTAT AGTCTACCAT TACAAATAGA ACATAAAACA TATTTTCTAC ACAGGCCAAT ACTTCAGCAA GGGCATGAAC AGATTTTGGT TTATCTAGGG TATCGGGGAA AAAAGATATA TGGGAAAAAG CCCGAAAAAC TCCAAAACTA GAGGCTAGGA TGACAAAAAC AAAAATAGCG ATGTGGCTCC CCAAAGTATA CATAACAATT GGAATCAAAA TCGCAATAGC TGTCTTTTTC CGTGTATTCA CAAGAATCTC CAATAAACCA GACAATAAAG CAAGAA'rTTA AGTTATAGTA GATTGAAACT AGAATAGTCC GATTTGGCTG TCCTGATCGA TTTGTCCTGT CATTAAAAAG AAACTGTATA GAGCAAAATC TTTnTTATTT TTTCCAGGGT TTGTAGTCGT INFORMATION FOR SEQ ID NO: 239: SEQUENCE CHARACTERISTICS: LENGTH: 946 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239: CACTCAAACA TGACTTATAT CAAGACGGAT GGACTTCAAG ACGATGCCAA TCGCTTGAAT
CGTAACATTC
CTTCATGGTG
GCAGCTGTCC
TCAATCGCAG
CGTGCTATGC
TGGTACCTTG
TTGACTGTTG
GAAAATGCAG
GTCTCAGCTG
TCTACAGATG
GTCCGCCGTC
CTCGATGGTG
TGGCAGAATA
CCTAAAAATC
AGTTrTGGTGT TCGTGAATTT GACtTCGTGT ATACGGTGGA GCTTGTCAGC CTTACAAGGA TTGGGGAAGA TGGTCCGAC'r CAAATCTAAA TGTTTTCCGT CAGTGACAAG TGAGAAAACA AAGATGGAAC AGACTTCGAC CCGACTTTGA TACCATCTTG CTAAAGAAT'r GGCTAGTCAA TCTTTGATAA ACAAGATGCA GTGTTGCAGT CGAAATGGGT CCGTTCTAGG TATTGATACT TGGCTTTACT GTAGAAAATC AGGGCGTAAG CTCTGGT'r 1260 GCAATGGGAA CAATCTTGAA CGGGATGGCC ACTTTCT'rCG TCTTCTCTGA CTATGTGAAG CTTCCTGTGA CTTATGTCTT TACCCATGAT CATGAACCAG TTGAGCATTT AGCAGGTCTT CCAGCAGATG CGCGTGAAAC GCAAGCAGCT CCAACTGCCC TTGTCTTGAC ACGTCAAAAT AAGGTTGCTA AAGGTGCTTA TGTTGTATAT ATTGCCACAG GTTCAGAGGT TAATCTTGCT GGCGAAAAAA TCCGCGTAGT CAGCATGCCA GCTTACAAGG AAGAAAT'rCT TCCAAATGCA GCAAGTCAAA ACTGGTACAA ATATGTTGGT TCGGAGCCTC TGCCCCAGCA CCAAAAGTAT TTGTAAAAGT TGTTCGAAAC TTGAAATAAT TCTTACCAGA AAAGTAAGGT ACAATCTTGT TATGTAAAAG ACAA.AG 9 .9 9 9 9*9 9 9* 9 9 .9 9 9 9.
9 9 6 9 9 9. 9* .9 9 9 AAAAGTAGCT GAAATTTGAT ATAGTAGTCC INFORMATION FOR SEQ ID NO: 240: SEQUENCE CHARACTERISTICS: LENGTH: 2764 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240: CGGGGCTCCc TAGTI'CTTAG TTTATACTCA ATGAAAATCA TGTTTTGAGG TTGTAGATAA GTTGACGCGG TTTGAAGAGA 4CAGATTATTC TI'ATTAGTAG GAGCATTTTA GAAAGAGGAA TATCGTCCAT TGGGAAAGGG GTCTCAAAGT AACCATTCAA GTCCTTACCA GCACGGGGAA GGAGCTATTT TTGTTTTTTC AAGAAGTTAT CTTCTTGTAT AAGAGCAAGC TAGGAAACTA GCCGTAssTG CTCAAAACAC GACTGACAAA GTCAGGAACA C.ATATCTACG GCAAGGCGAC TTTTCGAAGA GTATTAGTTG TGAATCTGGT GCAGTCGTCC GGTCTTGTTT TCTATATCCC CTCGTAGTTA ACAAGACCTT TCTATGTCTA CGAAATATAT TTTTGTAACT GGTGGTGTGG ATTGTGGCAG CGAGTCTAGG CCGTCTCTTG AAAAATCGTG AAGTTTGACC CTTATATCAA TATTGATCCG GGAACCATGA GTTTTGTGA CAGATGACGG AGCTGAGACA GATTTGGACT
TGGGTCACTA
GGAAAATT'rA TrTCAAGTCAT
CGACCGACTC
TGCCATTCCT
ATATCCATAC
CCCAACACTC
1261 TGAACGTTT-C ATCGATATCA ATCTCAACAA ATATTCCAAC GTGACAACTG CAG1'GAAGTT CTTCGTA.AAG AACGCCGTGG AGAATACCT'r GGGGCAACTG TCCTCATATC ACAGATGCTT TGAAAGAAAA AATCAAGCGT GCCGCTCTAA TGATGTCATT ATCACAGAGG TTGGTGGAAC AGTAGGAGAT ATCGAGTCC'r AGAGGCTCTT CGTCAGATGA AGGCAGATGT GGGTGCGGAT AATGTCATGT AACCTTGCTT CCTTACCTCA AGGCTGCTGG TGAAATGAAA ACCAAACCAA TGTCAAAGAA rrrGCGTGGCT TGGGAATCCA ACCAAATATG TTGGTTATTC GCCAGCTGGT CAAGGAATTA AAAATAAACT GGCCCAGTTC TGTGATGTGG CGTTATCGAA TCGTTGGATG TTGAACACCT TrACCAAATT CCACTGAACT AGGGATG0AC CAAATTGTTT GTGATCATTT GAAATTAGAC GCACCAGCAG PiGAATGGTCA GCCATGGTGG ACAAGGTCAT GAACCTCAAG AAACAAGTTA TGTTGGTAAG TATGTGGAGT TGCAAGATGC CTATATCTCA GTGGTCGAAG CTCTGGCTAT GTCAATGATG CAGAAGTTAA AATCAATTGG GTCAATGCCA
GTACAGAAGA
CACCAGAAGC
TGCAGGCACA
CGGATATGAC
AGATTTCCCT
CCTTGAAACA
S. *S a a a S. 0 a a ATGATGTGAC AGCAGAGAAT CAGGTGGTTT TGGTCAACGT AAAATGATGT TCCAATGTTG CTCGTCACGT TTTAGGTCTT ACCCTATCAT TGA'rATCATG GTrTGGGACT TTATCCGTCT ATCAAGAAGT GGTGCAACGC AGCAGTTTGA GGCAGCAGGT AAATCGTGGA AATTCCTGAA CAAGCCGTCC AAACCGACCA ACAGCAATTA GCAAAATCAG ATATTGCAGT ATATCTGAGG GTAGCAGAAC TCTTGTCTGA TGCGGACCGGG GGTACAGAAG GGAAAATCCA AGCCATCCGC GGAGTCTGCT TGGGAATGCA GTTGACATGT GAAGGTGCCA ATTCTGCAGA GCTTGCACCA CGTGATCAGA TTGATATTGA GGATATGGGT AAGTTGAAAC GTGGCTCTAA GGCTGCTGCT CGTCACCGTC ACCGTTATGA GTTTAATAAT TTTGTCTTTT CAGGAGTTTC TCCAGACAAT
ATCATCGTAC
TATGCGCGTG
ATCGAGTTTG
GAAACAAAAT
GGAACCcT'rC
GCTTATCACA
GCCTTCCGTG
CGTTTGGTAG
600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280
AATAAATTCT
GAAGAACTCT
AACCTTTGAG
TAGGGGTCCT
TTGTAGCTTG TCAGTATCAC CCTGAACTGT ACACTGCCTT TGTTACTGCA GCAGTTGAGA AAAAATCTCA GAGGTTTTTT GCATACGATG CTGTATGTAC CTGCTACCGT TGAAATCAAT AGCGACTCCC TCN'GCCCTG TGCTAGTGAA TGGAT'rTATC AGTATATTGA AATGAAA'rAA AATT'rGAACA AATTAATTCG GAAAGCCAAA TCAATTTCTA GCAAAGTTTT AGGAACTGGA TTGTATAGTG AATTGAAATA AGATGTGAAC ATCTCTATCA GGAA.AGTCAA ATTAATTTAT AGAAATATTT TAGCAG'rCAA GATGTACTGT TATAGATTCA ATACATrTATA CT'rTTTTAAT
TTAATCCACT
ATTTCTAACA
ATCATTCATA
TTCTTCCTCC
AGATTTCCTC
TGTTCTTTAA
ATTCAGGTTG
GAGACTAGAC
1262 ATAGTAAAAT GAAATAATAA CAGGACAAAT CGATCAGGAC AGTCAAATCG ATGTTTTAGA AATAGAGGTG TACTATTCTA GTTTCAATAT ACTATCCCAA CCTCTCTCAA CTAGATGTAA CTTACAAAAC CCCTGACCTC ATGAGCCACT TCATGAGGTC AGTTTTACTT TCTGCTGTTC CAGTATCGTrr TTTCCTCGCT AAAAGGGCAG ACrCCTCCCT TGGTGCGTCA CACGATTTTT TCATCTCGAC TGCATCATTA ACGACGCTTT TCTTCTAGGT GGTrCATAAG GAACAGGAAG ACTTTTCTAA TCCTAGAATA AAGTGCTGAA AACAATTCGG AATAGGCATA AATTNGAGGA GCTGCTTGCG TCCTGTTCGA ACACATTTTC CCACCACGTG
AAGA
INFORMATION FOR SEQ ID NO: 241: SEQUENCE CHARACTERISTICS: LENGTH: 1682 base pairs TYPE: nucleic acid STRANOEDNESS: double D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241: CCGTTTTITT CATTGTTCAG TACTACAACT TACGTTGTAG CGCCCTGCAC ATTGGTTCGT CTTGTTCAGT TPTTCAAAGGT CTTTGTCACT TGCTTCTCTC AAGCGACAAC TATATTAGTA TATCACAACT GCTTTCGCTT GTCAACACTT T'PTTGAAGAT TTTTAAGTTT TTTTAAACT TTTTTCATCA AGTGGTCCTG ACGCAACATA CCATAGTCCG TACGGGATTC GAACCCGTGT TACCGCCGTG AAAAGGCGGT GTCTTAACCC CTTGACCAAC GGACCTGAGT TGTTATTTTC AACTCTTACT ATTATACAGT CTTTTCAAAC T1TrGTCAACT ACTTTTTTAA ACTTTr'r'TA TTAATTTTAC AACAGCTTCA GTTCGAGCTG TATGTGGGAA CATATCGACC GACTGGATAT AATGAAGATC ATAGACTTCT ACTAAGCGTA CCAAATCACG AGCCAAGGTC GAAACATTAC AAGAAATATA AACCATTTTT TCTGGTACAT AAGTAAGAAT AGTATCTAAT AACTTATCAT CCAGACCTGT ACGTGGTGGG TCAACAATCA AAGCATCTGC TCGGTAGCCT TCCTTGTACC AACGAGGAAT AATCTCTTCT GCCGTTCCAG CTTC-ATAATG AGTATTGTCA AATCCCATTC TTTTAGCATT TCGCTTGGCA TCTTCAATAG CTTCTGGAAT AATATCCATA CCTCTGAGTG TTTTTACTTT CTTTGCAAAG GCAAATCCAA TCGTTCCAAC TCCACAATAA GCGTCAATCA AATGGTCTTC TTTATCAACA TCCAGCGCTT TTACTGCTTC GCTATAGAGG ACTTCTGTTT GCTCAGGATT TAGTTGATAA AAAGCTCGAG GGGATAGTGA A.AATTCATAA TTGAGTACAC 2340 2400 2460 2520 2580 2640 2700 2760 2764 120 180 240 300 360 420 480 540 600 660 720 780 840 900 1263 CTCTAT ACTCTCTTGC CCCCAGATAA TCTCTGTCTrT TTCACCATAT ATCTCACTGG ?NTAGCTGT ATTTGTATTA ACAGCTACTG TCACAACTrC TGGGAAATCT 'IrAACCAACT CTTTTACCAA TTGAGTTAAA 'rA.AGCTGGC GGTrTGTAAC AATAATAATC TGAACCTGTC CGGTCTTrCT CGCGCGTCGG ACCATA.ATAG TACGGACACC TAGAACTTT TGATTGGAAT CTGGTGATAA GTAAGTAATT- CTGCTAAGCG ATTAGCAATC CCTTATCTTG TACCAGGCAG TCTTTCAACT CTACTkAATA GTGAGAGTrT AGCCCGCCTT GACCTGATTT TTAAATTTrC GAGTCTGAAA TTGTAACTTA ATrTTGGTTC CTGCATTCCA ATAGTTGGAC GAATTTCATA ATT'rrCATAT CAAATrTT'r CAGCGCTTGA TGAAGTAAGT CCGTCTTGAA CTCCAGC'rGC GCAGGTGCAT GATTGGCAG CCTCCGCAT T CATTATAAAT AGTACAAGAT GAAATTTAGA CTTCTTGTTG ACCTTCAGTA ATTTrGCTTC AACAAAGTTG AAGTAATCTG ACAATAGATA TCTTCGCCTr TGAGAGCTCC TGGTACAAAG TTTGGTAAAA. 
GCCGATTCCC TCACCGTTAA TTCCCA'rGCG CTTGATTTTT
CTCTCATCCG
ACT'rGGGTTT
TGTGC*ATATA
GCrCTGTAAT
CCTGCAGGAG
TTATCATAAT
GGCACAATTC
CGTCTAATAG
ACTAATGTTT
AATGGTATTT
960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1682 a a. a a a. .a a a a a a a. a a INFORMATION FOR SEQ ID NO: 242: SEQUENCE CHARACTERISTICS: LENGTH: 2524 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 242: TTAACTT-TGG TCAATTCTTT AAAGTCATCC TCTGTAAGCA TGT CCTTTATTGC TAAAATCACC AATTCCGACT ACAGCTATAT CTA TTCAAATTT CAAAATATCT TGATTGCAAA ATACCATCTG CTAj ACAATCGTTG CATTCATAAA TGTACACTCT CCATGAAATT TTC AqTGTATTCA CATGGTATTT AGCGTGTATG TGACTAGGAC CAC~ TGAACATTTC GGACACTTTT ACTGTGAATT AAATCTACTrA A CTAACCA TTGATGTTTC kATCTTT CCAACTATTT ~CAATT~T ATTTTCTTGC TAGACAT TTCATAAATC CTGCTAG AGGATAGAAG rACTTAA ACTr'N'CCCC ~GACGCC TGCTGCAACT CATTTGG AATAATTTCT rAAACAT ATTGGTATCA CAAGAAAAGC CAATTTTCAT ATTATCATCA TGAGAAATTC TTTCAGATAA AATTGTTGGA AAACTT'rCCA AACTGTATTT TTCTTTTACA ATTAGATTCC TAA( GTATCATCAA ATTC TAATTTTCCA ACT, 1264 AAATTCTCTA TTTCAATTTT AACAATTCCT ACATTCCTTG ATAGAGGTTC TATAAATTCC TAATTTTGCr GCTATTTGTG TAATACAGAT AACCAATTT'r AGAAAGCAGT TTATTCCTAT TCTTACGAAA C'rACCTTAAC CATTATCCCA GCATTTTCTA GAAAGTTTTT CGTCTrGTTAT TACTTCATAG ACTTGACTTA CCTCTITTAT CAAATTTACT TGAGTCACTT AGGACAATGA ATATATTGAA CTACCTCACT GCGCATTAAA TCTTTTCCGG CTTCTGTTAA CATTCTACTA ACTGATT'rAA GTTCAATA CTTGATTCAT ACAC'rTAACC A'rGTAGCTAT ATTTTGTTTA AAGCAAATCT TC?1'ACTGTA CTTTATCCGA CACTGCTGAA TAAAGCCCAT CTCT1TATCC TAACCATCTG TCCCAACAAA AGCTTGACAC AAAGGTCCTA CAGTCACCTG TGAATCTTTC CATGAATCAT AAGCTCTCAC AAAATTTGCT 'IrTCTTTTTT GCTTGCAAAT TTCCTCAGCA ATTATTG;TTT CATTATCTGA CACCAATTTTT TCATAATTAA TTGACAAACG TACATTTAAG TGCTCTCTCT GTAATAAACC TTTTGACTCT TTCGATACAr TTAATTr'rTC CGATAATGTA AATTTAATAA TTTGTTCCAA TCTTTTCATT ACATCAAAAG TCTGTATCAT TTCTrTTAAT TGAAACTCAC CACCAAGAAC AATAACACGA ATAAAAAACG AATTTGTTAC AATCGTAACA AGTAAAGCAC AGCTCGATCC AGATTCTATC ACTGCTTCCT GAACAATTTr TCTCTTAGTT TCATCTCCAC TATTTAATAC AGCATATCCA AATTTATCTA AATCT'TTTCT AATCGTTACT TTAACGTCGA TCTTTTCATA TTCTGATACT 0 I 4* 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 
1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 AAAOCAAAAA ACAACAAATT AGGATAGAC'r TATGAAGAGG ATTCAACACr TTTICAATTCA
TGTCCTCTGC
ATGAAAGATG
ATTATTACAG
TTATCAGGAG
AAAGAACATC
ATTGATTTAA
AAACATAAAA
TT'rTCACAAA
TTAGAGGATG
CTACTCCCTT
ATGGATGGAA
GCTGTCCATG
CTCAACGAGA
AGGTATTAAA
GTGAAATATT
ACATACACAC
AACCTTTCGT
AGGAACTCTT
TGACGGTCCG
GTGTTCTAAT
GAAATTCACC
AGACAAAGAA
TGCTCAGTTT
TGCCATTGAA
TTACACCTCC GTTTTATTCT TCGTALATTGT TTTTCTTTCG ATGGAAATAT CTAAAGGAAT GGTATTCGTA CAACTGTrTTT CCTGAATCTC AAAGA.ATGAA 'rTAGTCGGTG AAGAAAAGAC TTTTACGAAG AATCCGGTGG GAATTTGCTA AAGCCATCTT ACTACTGCCT TTGTTGATCA TACACAGACC TAAAACATTA CAAATGATTA TTA-AAAACAT
ACCAAAATAA
TTTTTGTGAT
TATTTTTAAT
TTTAAAAGGA
ACCTGAAAAA
TGTAGALAGAA
AGGTTTAACT
AAAATCAGCT
TGAAAAATTT
TAATTCTATA
TCATTATGCT
TAACAATAGT
CCAAGTTCAA
GAAATATGAA
GGTATTTCTG
TTCAATATGT GGATTTTATC AAGTGACTGG GGTTTTTAAT
ATAAAACTAT
CAGAAAAATT
TTCATCAATT
TCAACGCACT
CGTTTTAAGA ATCCCAGTTA TTCCTAATTT CGCTACTOTA TTTAACTCAT TAAATATCGA TGGTGAAAAC AAATATCGTT TATTAAATCG TCATCCWGAA GATCTTATTG ATTATCAAAA 1265 AACCACCA'rA TTAATTGTTA TTTCTAGTTT ATTTCCTTGA AATGCrCTAG CTAT'rTGCAG ATAACAAGCA TCrATAATAC ATACTTAACT TPTTCAAXAGG TTTAGCTAAA AAAT'N'TAGC CAAACCTTT'r CTATTTTACC TTGCTCTAGA ATT'rrrAA.AC TGCTATACTT ATCACAAAAA
AACG
2400 2460 2520 2524
INFORMATION FOR SEQ ID NO: 243:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2359 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243:
CGTGCTTGGG GGCTTGTGGT CAAAAGGAAA GTCAGACAGG AAAGGGGATG AAAATGTGA CCAGTmTA TCCTATCTAC GCTATGGTrA AGGAAGTATC TGGTGACTTG AATGATGTTC GGATGAT'rCA GTCAAGTAGT GGTATTCACT CCTTTGAACC TTCGGCAAAT GATATCGCAG CCATCTATGA TGCAGATGTC TTTGTTTACC ATTCTCATAC ACTCGAATCT TGGGCAGGAA
GTCTGGATCC
CCTTGGAACG
CGC'TCTATGA
TCGCTGATAA
AAGCCTTTAT
CGACTCAGAA
AAATCTAAAA AAATCCAAAG TGAAGGTCTT TGTCCCTGGA CTAGAGGATG TGGAAGCAGG CCCTCACACA TGGCTAGATC CTGAAAAAGC ACTTTCAGAG GTGGATAGTG AGCATAAAGA CAAAAAAGCT CAGGAATTGA CTAAGAAAT.T AACATTTGTA ACACAACATA CAGCCTTTTC AGAGGCTTCT GAGGGAATGA GGATGGAGTT GATGAAAAAA TGGAGAAGAA GCCCAAATTA GACTTATCAA AAAAATGCGC CCAACCAAAA TTTGAAAAAG TTATCTAGCG AAGAGATTTG AGAACCAAGT CCACGACAAC AACGATTTTT ACAGAAAGTA AGGTGTGGGT CTTAAAACTC TTTAGAAAAT CTTGAAGAAA TGAAAATTAA TAAAAAATAT CCTATGAGCT TGGACGTTAC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GGCTTAATCA ACTTGGTATT TAACAGAAAT TCAGGAATTT ACGCTTCI-rC AAAAGTAGCT TGAATCCTTT AGAGTCAGAC ATATGAGTAT TCTAGCAGAA CTAGCAGGTT CAGTGGCAGT CAAGCTGGTC AGGATAAGAA GGTCAAAAGG CAGAAAACTT GCAGGTATCI' CTCCTGAACA GTTAAGACCT ATAAGGTTPA GAAACTCTTG TCAAATCAAC CCACAAAATG ACAAGACCTA GAATTAAAGT GAGGAAAGA.A CCTTGCCCTA AGTGTTTGTT AGAGTCTAAT CGAGTTGCTT ATATAGATGG TGATCAGGCT GACACCAGAT GAAGTCAGTA AGAGGGAGGG GATCAACGCC GAACAAATTG TTATCAAGAT TACGGATCAA GG'PTATGTGA CCTCTCATGG AGACCATTAT CATTACrATA ATGGCAAGGT TCCTTATGAT GATCCGAATT ATCAGTTGAA ATTAAGGTAA ACGGTAAATA CGGACAAAAG AAGAGATTAA GCAGATAATG CTGI'TGCTGC ATCT'rCAATG CATCTGATAT GACCATTACC ATTACATTCC GCCTATTGGA ATGGGAAGCA
GGATTCAGAC
CTATGTTTAC
ACGTCAGAAG
AGCCAGAGCC
CATTGAGGAC
TAAGAATGAG
GGGATCTCGT
1266 GCCATCATCA GTGAAGAGCT CCTCATGAAA ATTGTCAATG AAATCAAGGG TGGT'rATGTC CTTAAGGATG CAGCTCATGC GGATAATATT CAGGAACGCA GTCATAATCA TAACTCAAGA CAAGGACGTT ATACAACGGA TGATGCGTAT ACCGGTGATG CTrATATCGT TCCTCACGGC TTATCAGCTA GCGAGTTAGC TGCTGCAGAA CCTTCTTCAA GTrC'TAGTTA TAATGCAXAT CCAGCTCAAC CAAGATTGTC AGAGAACCAC AATCTGACTG TCACTCCAAC AATCAAGGGG AAAACATNTC AAGCCTTTTA CGTGAATTGT ATGCTAAACC CGCCATGTGG AATCTGATGG CCTTATTTTC GACCCAGCGC AAATCACAAG AGAGGTGTAG CTGTCCCTCA TGGTAACCAT TACCACTTTA TCCCTTATGA GAATTGGAAA AACGAATTGC TCGTA'rTATT CCCCI-PCGTr ATCGTrCAAA CCAGATTCAA GACCAGAAGA ACCAAGTCCA CAACCGACTC CAGAACCTAG CAACCAGCTC CAAGCAATCC AATTGATGAG A.AATTGGTCA AAGAAGCTGT GGCGATGGTT ATGTCTrTGA GGAGAATGGA GTTrCTCGTT ATATCCCAGC TCAGCAGAAA CAGCAGCAGG CATTGATAGC AAACTGGCCA AGCAGGAAAG AAGCTAGGAA CrAAGAAAAC TGACCTCCCA TCTAGTGATC GAGAATTTTA TATGACTTAC TAGCAAGAAT TCACCA6AGAT TTACTTGATA ATAAAGGTCG TTTGAGGCTr TGGATAACCT GTTGGAACGA CTCAACGATG TC'rCAAGTGA TTAGTGGAAG ATATTCTTG
TTATCATCAA
CTTATCAGAA
TCGAACCCCC
ACAAATGTCT
CCATTGGG'rA
TCCAAGTCCG
TCGAAAAGTA
CAAGGATCTT
TTTATCTCAT
CAATAAGGCT
ACAAGTTGAT
TAAAGTCAAG
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2359 INFORMATION FOR SEQ ID NO: 244: SEQUENCE CHARACTERISTICS: LENGTH: 1052 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244: TTCTTTCTGC TATAATCGTA TAAAATACTT ACTTTAGGAG TTCTTATGAA AGTTGTTAAA TPTTGGAGGTA GTTCTCTTGC CTCTGCTAGT CAATTAGAAA AAGTTTTAAA CATCGTCAAA AGCGATTCAG AGCGTCGTTT TGTAGTCGTT TCTGCGCCTG GTAAACGCAA TGCTGAAGAT ACTAAGGTTA CGGATGCCCT GATTAAATAC TACCGCGACT ATGTrGCGGG AGCAAGAACC AAAGCTGGAT TA'rCGACCGC TATGCTGCTA TGGTTAGTGA AAACCAGCTG TGCTAGAAAA AATTrCTAAA AGCATTCACG GAAGAAAATG AATT'rCTCTA CGATACTTTC CTAGCAGCCG T'rGAT'rGCTG CCTACTTrAA CCA.AAATGGT ATCGATGCAC GCTGGGATTG TGGTCACAAG TGAACCTGGT CACGCTCGCA AAGATTGAAG AATTGACAAA ACTAAGGAAA ATCAAATCTG ATTGCTGCTG GTGTCAAAGC GCAGCCCACC CTGGTATTAT ATGCGCGAGT TGGCCTATGC TACCGTGGAA AAATTCCTCT CGTATCGTTC TAAAACACAG GGCTTTGTCA GCATTAACAT
CACCAATGAA
TACTTTCTCA
TGACCTCTAT
GTCCTTGTCA.
CGTGGAGGTT
GAAAACTTTA
CCTTGGCCAC
GTGAAAATAA
GCTATA'rGCA
TCATTCCATC
TTCCTGGTTT
CTGATAI'AC
CGGACGTTGA
CTGAGTTGAC
TAACGATAT'r
ATTGGGACTA
TCTTCCTATT
CAATGCCAAA
CCCTAGAGAA
AAGTTATGAC
CTTTGGTG'rC
AGGTTCTATC
TGGTATCTTT
CTACCGTGAA
240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1052 CCACCAACCA CACTCGATTC AGGCrTCTCA GTCCTTCATG GGTTATCAAG AATACCAACA TAATGATGAA T'rTCCAGTTG GTCGAAATAC CTCATGAACC ACGAGGCTCT TCTTCCTGCC ACCCTGACCA TCCAGGTACT TGGGAATTGC TGGTGACTCA GTGAGGTTGG A'rrrGGCCGC AAGGTTCTGC AAATCCTGGA. AGAACTTAAC AT INFORMATION FOR SEQ ID NO: 245: SEQUENCE CHARACTERISTICS: LENGTH: 855 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245: CCCTCGAAAA CTAAGCCGAT GAAGTCAGAA CACTTCA.ATC CTGTTCGTGA CTGGTGGGAA AATCGTGAAG AGATTCTGGA AGGTAAGTTC TACAAATCTA AATCATTTAC ACCTAGTGAA TTGGCTGAGT TGAATTATAA TTTAGACCAG TGTGACTTTC CAAAAGAGGA AGAGGAAATC TTAAATCCCT TTGAGTTGAT TCAGAATTAT CAAGCGGAAA GAGCAACTTT AAATCATAAG ATTGATAATG TATTAGCTGA TATTTTGCAG TTGTTGGAGG ACAAATAATG ACACCAGAAC AACTTAAAGC AAGTATTCTC CAAAGAGCGA TGGAAGGGAA ATTAGTGCCG CAAAATCCCA ATGACGAACC TGCAAGTGAA 'FTATTAAAGA GAATTAAAGC TGAAAAAGAA AAACTTATCA GTGAAGGAAA AATCAAACGA GATAAAAAGG AAACTGAGAT ATT'TCGTGGT GATGATGCrA 1268 AACAT'rATGG GAAG'N'TGCT GATGGAAGCA CTCAAGAAAT CTGATACTTG GGAGTGGGTG AGGATAAAAT CAATTTATTG CAGAGAAATC CTTTAGGTA'r ATAGATACGT CTAGTATTGA ACTACAAAAA. TCTACAATAT CTTTCACCTG AACAAGCGCC TTTCGCAGAA TAGTGTCTTA T'rTTCAACAG TTAGACCATA TTAGAGAACT TAAAGAGTAT TTGATAGCTA GTACAGCATT CTTAACGAAA CATAT INFORMATION FOR SEQ ID NO: 246: SEQUENCE CHARACTERISTICS: LENGTH: 660 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear TGATGTTCCT TATGATATTC GAATTTTGGG CAAAATAAGC TAGAAAAAAG AACATAATCA TTCCCGTGCT AGAAAATTAG TCTAAAAAAT ATTGCTGTAG TAATGTTTTG GGATACTTTA (xi) SEQtJENCE.DESCRIPTION: SEQ ID NO: 246: TTTAGGAAGG CTATCCGTAA T'rTTACAAAG GATTTAGATA TTACAGAGGA ACATTTAGAT
ATTATCAAAA
GCAACGCAAT
CAGGAAATTA
ATAGTTGATT
GGATGACAAG
AGGGATTGAG
CAATGGAAGC
AAAAATATGC
GGAGTATGAT
GAGAGATGTT
ATGATGCTTT
CTTrAGAGGA
TTACAATATT
TATGAGAAAA
TTTAGATGAA
AGACGATTTC
ATGGGCTGTT
TACTTATGAG
TGGCGAATTT TTCAGTAGCA TGAAAATGGT GAGATAATTT TGTCCTTGAT GCTGGACATC CCCATCGTAG TA.ACCTATTA AAAACAATTG GAGAGGTT'TT TTGCAGAAAA AGACAGAAAT GATCAACTTC CA.AGTCCTTT GAGTTAGATG ACCAAATTGT GAAGTAGATG TTGATGAAGA TGAACTCTCT TGAATTTATT TTGATTTGCC GAAAATTTTA ATTTAATiGA TGATdGTGAC TAATAGACAC TAGAAAGAAG ACGATTAGCT AGAATCAATC CCAGTTAGAT ATGTTGGAAG TTACACGCGT TCTTTCTTGA TTTGGATGCT TATGATTCTG TGAGTTGACA GGTCGTAGA2 GTTCAAGTAA GAAAAAGAAG AAAAAAACAT CATTTTTACC TTTATTTTAT TT'rATCCTGG INFORMATION FOR SEQ ID NO: 247: Wi SEQUENCE CHARACTERISTICS: LENGTH: 1805 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247: 1269 CCGGTTGCAC AGGATCGTGC ATAGTCAACT CTTCAAGTAT ACAAGTAATA ACACCTAAAA TGAAGCTTTT 'rCTrTTACTT AGCATGCTGA GGTAAAAAAC GCTCATCATA A'rAGGAACAC TAGAAAATCG TCAAATAGGC TGAAAAGACA ACGCCAAGGA ACAAATATGA ATCCTTCACG CAAAAAAGGA GTCTCTGG GCCAGCATGG TCCGTTTGAT ATTCCCTGTC ATAAAAGCGT ACTTCTCCAA AAGCAGTTGT CACCAG'rCCC ATACAGAAGG ATATTATCCA CAGTTTrCCG CACAAAAGCA ATAA'rCATTG
AGCATATCTC
TTCGCCA
CAAr.AATGGT
CAA.AACTACT
TTCGGAAATA
TATTATAGGC
CTATTr'rCTT
AGAGGCAAAA
CT'CATGA
AAGCAGGCTA
ATCTCCAAAA
AATACCCGAC
CCAACGGCCG CACTAGATAG ATAAGATTGC CAAGGGAATC AAGGACAGAA TAGG=rTTT CACAATTCTC AAT'TTTTCCT TATAAATCGT ACTCCCATCA TAAACGCTAG CAAGGTGAGA ACCTTGTCCC TAACATCCCA TTAATTAATT CTACTGAAAG AAAGACAACA T'N'CCAGTTT' GTCCAGCTAC CCGCGAACAA TAAAAGTGTA AGCATCCACA TATCCAGCAC AAAACGTCAA AACCTTTTAG ACTGACGTGA TATTTTTCTT ATAGGTAATA ACCTCATTTT
TAATAAAAAG
AACATTATTT
AAGGGTATTC
AAAAAGTGCT
ACCTCCCATT 0 so GTATTT'TCTC TTAGAAATAT TGTACCATTT TCTTTCTAAA AAATCGTAGG CTACCA'IrTA GATTTTACTA 'rTAGCATAAA AATAATAATA GACAACTATT TATCCAAAAA TAGATAGATG TAACATGTr'r GCAAACAALAG CATACGAACC TTTAGTAAAA TCATTTCCAT GAAACTAGAA TAGAGCCCTC TTAGCAAAAA TCATTATTTT AATTTA=TC TAATCACTCC TTGACATAAA TAACTCTCAC CAATAAAAGA CTATGTCTTA AAAAAAT'rT ATAATAAAAT CAATACTTGG GCTTGATGGC TATGCTACTA ATAACAATTA GGAGAGAAAA TCAGGCACTT GTTAACAACA AGGATTATCC CCTTGAGATG AAAGGAACTT TAGAAATCT'r ATGATGAACA TGCAAAACAT GATGCGTCAA GCACAAAAAC TTCAAAAACA AATGGAACAA AGCCAAGCTG AACTTGCTGC TATGCAATTT GTI'GGCAAAT CTGCTCAAGA TCTTGTCCAA GCGACCTTAA CTGGCGATAA GAAAGTTGTC AGCATTGATT TCAATCCAGC TGTCGTTGAC CCAGAGGPLCC TTGAGACTCT TTCTGATATG ACCGTTCAAG CCATCAACTC TGCTCTTGAA CAAATCGATG AAACTACCAA GAAAAACTG GGTGCTTTCG CTGGGAAATT ACC~rTCTAA AAACAAGGAG CTAGAACAAT GCTTGTCGAT AACAAAGGCT AAGAAAGGTG CAAAAATGAC TCTATAATAT TTGTAGTGGG TAAATCCCCT ATGGATATTA TGGAGCCTAT TTTrTGTGTAG AAAAAAGTCC CATATGACCT ATAATGAAAA GCGACAAAAC AACTCATTAG AAAGAATCAT ATGGAACAAT TACATT=AT CACAAAATTA CTAGACATTA AAGACCCTAA TATCCAGATT TTAGACATCG TCAATAAGGA 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1270 TACACACAAG GwAATCATCG CCAAACTGGr CTATGAAGCT CCATCTTGTC CTGAGTGCGG
AAGTC
INFORMATION FOR SEQ ID NO: 248:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2516 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248:
CTGCATCTAG TTTGTTTCTC CGTCGCCGCG TTGGGGTTCG GAATTAATAA TAATGACAGC CCTATAGTGG GTCAGATATT TGTACTATAA TGGAGGTGGT CACAATTTGG AAATCATGTA AACTTACTTG TCAGAGTAAG
CCTACAGTTT
GATACAACTA
ACTGGTGGCG
ACCCCGGTAT
GACAATTATA
AGAATTCATA
CCTTAAAGAT
*5 TAGCTAGACA GATTGGAGAT TATGATTTAA GTGAGCT'rGA GAAAGAAAAC TCCTCTGCTG GTAAAAGGTT AAATACCTCT A'rrCGTAGCG ATTCATTGGG GTCTGGCTCT AGATTGTCA TTGGTTCTGG TACTAGATTA GCTATGGCGC CTTCAGGTTC TTGGAATCCA GATTCTTrATT GGTTGATTGT GGGTGTAGCA TGAAAAAAGA AGTAAGGATG GAATTTTTCA TTTGTAATCT TATGTCTTTA TTT'FTGACCC TTCTTTGCAT TATTTATAAA ATTGAAGGTT TATCGATTTT ACCGATTAGC CTAGTAGCTG CACTTATCTG ATGCTACACC CTATTITTTAT TATAAGGAGG TGTACGAGTC GTTCAATCAC CTCGATTTTA GAGTTTAGGA AATTTCCTTG CTTTCAATGG TTTTGCCGCT TCTTCTATTC GAGGATTTTC TACACTGCCC TATTCTAGTC ATTGTGTCGA. AT'rTCTAAAA TTCTTTTCTA GTCTTTTTTC TCCTTATCAG GAAATTTATA CAATCAGTTT CTCTATAGTC AGATAATAGA GGATGCTGAG AGTCATTT'rC TAACAGCACA AGAAGTATCT GGCTATTGTG GGTAGTACTG TAATTATTTC TCCCCTATTT ATTATTATTA GGAATTAATC TTTTAGTGAC TTGGAGATTA TAGTGGTGCC TTAAAAGAAT TATTTGATTC TTGTAACGAC TC'rCTGGTAT GGAGTTTGGG GCGCTGTGTT 840 900 960 CTCTATTTTT GGACTAGCTA GTGCTTTGCT AGTGAAGAAA AAAATAGGAG CCCACTTGCC TATATGATGG TTGGTGGTAT TTTTTGGGCT ATTTPAGGGC AGAACCTGTG ACAACGCTAG CTTTGGGATA TCAGAAAGAT ATCAGTCTTT TGCTCATCTT GCTrTTATTT TAT-rTGTTAG TTGTTTGGTT GTTTATGGTA ACATTCAGAG GACTATGTAT AATGAAACAA TTTGTTCAAT TTTATAAAAA GCAGTATTGG TTTATTTI'AT ATTACTGCTA TCCTGTGTTT TATCTAGTAC CTATTTTCAT 1020 TATCTTACTr 1080 CCTTAGTTAG 1140 CATTTTTTCT 1200 AGATTTCTTA 1260 AGTATATTTA 1320 TTGCGCtGTC GCCAATATTC AATCCATCCA AATGTATTAG A.ATGGATCTT AGTTTTACTT CAAGATATGA CGACTGGAGT ATATTGCTT'r CCGI-rCACAT ATATATrGTT CTTTTTTTAT TTGATGAATA ACTArTTTAA TAGGTTGGAG TGTCGCATTC GTCTGA.AATC AATTAAGCAC TTrACCAGTT TrAGI'TCAA A'N'AGCAGCT CT'rAGTACGG GGATTTGGAC GGCGACTTTA ?TTTTTATrCA TTTTCTAAT TGCATTrAGT AATGGTTTTA GCTTCTC'rTr GGAGATAAAG GAGGTTGATT TTTAAGAGA AT'rATGGT ATAAGTATTG CAAACAATGC TAGTTTCTTT ATAGGATTTT TTTTCTCTTA TATACCATAC TATTTCTTTT TATCCT'rACT TACTA'rTAGC AGTTTTTCTrT GGTTTAAAAA ATCAAACATG AGCTTAGTAT TTCTGTT'rAC TITTTTTATTT GTAGAATCCT TATrCTGGAT TTATCAGTTG GACAATGGGA TAATTGGATT ATTGCCAATT
TTTCAGTATA
ATCATAATTC
GAAATGGGAA
TTTTGGGGTT
GGAAATGATA
TTAGTCTTAT
GTATATCGTA
CTCTGGATTA
AAAGTTCTAG
TGGTAAATTC CAATCCGTAT CATTGACTGT ATTTTCTGTT AGTTAAGTAG TCACATGTGG ATGTTCTTTT TTGGATATTG GACTTGTTAT AGAAATTrTA CTATATGGTT GCTTCATrGG GAAGAGCATC CGATTTCTTT GGTATCAGAT TTGGACCTGT TGATTCAATT TATGTTACAG GCATTGAT'rT ATTGGCTTAC ATTACTATCT CATAGAAACT GGAGGAGAGT GTAAAAGTTG AGGTTGAATC AGATAATCTA TACCAAGTAC ATTTGTTTAG GAT'rATGGTA TTGGTTAGA-A AAAGGGCCTA ATCTGAGTCA AAACTCTTTT TTTATTATTC ATACATTTTT TCTAGCAGT'r ATGGAAGTGA TTCGATTTTC TTCTATTAAG T'rTCTTTATG GACTCATTTT AATCATGGTA TrACCAAACT GGGATATAGG AGTTTTGTTT 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2516 4* ATAGTTGATT CTTTGAATGC TTGTGTGTTA GTCTGTTTr GCTTTATGTT ATACGCACTA .GGAGCGAATG TACAAATGAA CTTTGCTrGC GTTAGTTTCT TTTTACTCAT GATTGG INFORMATION FOR SEQ ID NO: 249: SEQUENCE CHARACTERISTICS: LENGTH: 1364 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249: CGGTGTTTTT TTGTAAATTT TCTAGCACTT GTATGGTAAA ATAGATACAG GTGTTCATTA AACTAGACTA AAAACCTATT TAAGCAGGCA AAATGAAGAA ATACCAACAA TTATTTAAGC AAATCCAAGA AACCrATTCAA AACGAGACTT ACGCTGTCGG AGATTTCCTT CCTAGCGAGC 1272 ACGACCTTAT GGAGCAATAT CA.AGTGAGTC GTGATACCGT CCGAAAGcCC TGTCTCTCCT CCAAGAGGAA GGATTGATCA AAAAGATAAG AGGGCAAGGT TCTCAAGTCG TCAAAGAAGA AACCGTCAAT TTCCCTGTAT CCAACCTAAC CAGCTACCAA GAACTAGTTA AAGAACFGG ACTGCGCTCT AAAACCAACG TGGTCAGTCT GGACAAGATr ATTATTGATA AAAAATCCTC ACTGATAACC GGTrTCCCAG AGTTrCGGAT GGTTTGGAAG GTGGTCCGCC AGCG'rGTGGT GGATGATCTG GTATCCGTTC TGGATACGGA CTATCTGGAT ATGGAACTCA TCCCAAATCT CACTCGCCAA ATTGCTGAGC AGTCTATCTA TTICrTATATA GAAAATGGCC TCAAACTCCT TATTGATTAT GCTCAGAAGG AAATCACCAT TGACCACTCA AGCGACCGAG ACAAGAT'rCT
CATGGACATT
CGGACGCCAA
TTTTGCAAAA
GGCAAAGACC CTTATGTCGT TTCGATTrAAA NTTCAGTTTA CCGAAAGTCG CCATAAGTTA CGCAAGAAAT AAAAGACTGA GACACCACAT 600000 4 00 0 000 0 00 *0 *0 0 00 00 @0 0 0 @0 0 TAATATTTGT AGTGGGTAAC AAGTCCCATA TGACCTATAA AACAATTACA TTrTATCACA ACATCATCAA TATGGATACC CTTGCCCTGA TTGTGGAAGT CTTACCTCGA AACAACTGGT GCTATCATTG TTCTAAAATG TTCCTCGTAT TATCAACCAA ATATTGCTCA TCAGCTGGCC CCCCCTATGG ATATrATGGA TGAAAAGCGA CAAAACAACT AAACTGCTCG ATATTAAAGA CACAAAGAAA TTATCGCTAA CTAATGAAGA AATATGACTT ATGCCTACTA GAAT'rCTCCT ATGGTCGCTG AAACTrCTAT TCAAAAGTCT ATCTCCAAGA GAGAAATT'rA GATTTGTAGA CTCAGCCI'TT TTCGGCTCTA GCCTATTTTG TGTAGAAAAA CATTAGAAAG ATTCATATGG CCCAAACATC AAGAT'rCTAG GCTGGATTAT GAGGCTCCAT TCAAAAACCG TCTAAGATCC TAGAAAGCGT CGT'IICAAGT CGTCAAGAAG AA'rCATCAAA 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1364 AAAATTGCGC AAAAGTTGAT TGAGAAGATT TCTATGACCG ATTTCAACTT CAACTGTCAT TCGG
INFORMATION FOR SEQ ID NO: 250:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1227 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250:
CCATGAAGAC CGCTTGGAAT TGGAATGGCA CAAGTCT'ITG TTGAATGGTC TATTCCCATT GACAATCGGT GGAGGAATTG GACAATCTCG TATGGCCATG TTCCTACTTC GCAAGAGACA CATCGGAGAA GTGCAAACAA GTGT'rTGGCC TCAAGAAGTC CGCGATACTT ACGAAAATAT TTTGTAGAGA ATCGAACCGC AAGGTTCGGT TTTC'TCTC TTTT'IGTCTA TAATTTGGTA TAATAAACAG TATGAAAATC GTATCAGGAA TCTATGGGGG ACGTCCCCTC AAGACACTAG AAGGCAAGAC GACAAGACCT ACTTCGGATA AGGTTAGGGG AGCCATTTTT AACATGAT'rG GTCCCTACTT TGAAGTGGGA CGAGTC?1'GG TCGAAGCAGT ATCGCGTGGC ATGTCCAGTG GACCATCGTG GCTGAAAATA TCCAGATGAC GATGGATGCA GAAAGGGCA'r TGGAACAGGT CCCTCCCTAT GCCAAGGAAC AAATCGTAGC TTTTTCTGAA GATGTTATGG TrGTGTGCGA AATTGCCTGT CTGGGTATCT GGAAGGAAAA TGTCAGATAA GATTGGC'rTA TTCACAGGCT ATATCATTGA ACGGGCGAGC AGACTTTTTG CCCACAAACA AGGATrTCTC CCTCTTGAAA ACC'rTTATGC CTGTTT TGG'r
CAAGGAAGTT
AGGTAGTGGT
GGAGCGAGAC
GGAAAATTTC
GGTT'rATCTA CGTAA~cTCA
AACTCCTCAA
ATCTGGGGAA TTTGACCTCG TTTTCTTAGA AGATA'rTGAA AA.ATGGCTG AGAGAGAGCT GACGGATAAA GCCGTTGAAC TTCCAGAAGA GATTTATGGA ATTAGTAAGG TGACAGTCTA CATTTGATCC GATGACAAAT GGGCATCTGG ATAAGCTTTA TGTGGGTAT'r TTTTTTAATC ATCGTAAACG GGGGT'rAGAA AAGGCTGTGA 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1227
*5S* S. *S AACATTTGGG AAATGTT-AAA GTCGTGTCTT CTCATGATAA ATTGGTGGTC GATGTCGCAA AAAGACTGGG GGCTACTTGC CTAGTGCGAG GTTTGAGAAA TGCGTCGGAT TTGCAATATG AAGCCAGTTT TGATrACTAC AATCATCAGC TGTCTTCTGA TATAGAGACT ATTTAT'N'AC ATAGTCGACC TGAACATCTC TATATCAGTT CATCAGGCGT TAGAGAGCTT TTGA.AGTTTG GTCAGGATAT TGCCTGCTAT GT'rCCCG INFORMATION FOR SEQ ID NO: 251: SEQUENCE CHARACTERISTICS: LENGTH: 3652 base pairs TYPE: nucleic acid STRANDEDNESS: double CD TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251: CCGGTCAAGT TAAAAACGCT A'N'TCT'rCCC ATTTTATTTA TTTTTTAGGA GTGGTAACGT ATCAAAATAG CCCAAGCGTT CTCACCCGTG TGAGTTTGAA TAATGGAACC CGTTTCCA.AA ACAGAAATTG GCT'XTTCA.AC ATAACTGT AAGCTTTCTT TCATCTCTTT TGCCCAATCA TCACTACCAG AATATGAAAT TCCAATCTCT GCTACAGCAC GTTCAGAAAG CGATGTTATC AACTCATCTA ACCATTTTTT AAATGTTTTA GTTCCACGAC CTTTAACCAT TGGCTGCAAT TCATGGTCT'r TCATTTGCAT GACAGCACGG ATATTGAGAA GAGAGCTCAA CAAGCCAGTT ACACGGCTAA TTCGTCCACC TCTGTATGGT TITTAACCTC TGAGCTAAC'r TCGCAGCCTC TCAACAACAG TCACATCTGC CCCGAAAGAG CATGGGACAT TCAAAAATCT CAGCAAAGAC 'rCTGCATCA ACTGAAGAAA TTIATCAATCA TTACAGATAA TCAATAGTAA CAGATGAATC 1274 TTTGACAAGA T'TTTCCAAAG TTGAAACACC AATA'rAAAGC T'rCTACATGA GATAAAATTG CCTCCATATC ?rTrACCTTCT AACAACTTGG AATTTCACCG C7TGGTCAGT GAAGGAACTA ACTACATAGG CrAGCACCTT GCTGCTGC TTCTACCGTA ATGAATAGCA AGAATCTGGC CACCATCTTT GCATAGGTCT ACCTACAGGT GGCTGACTTG TTTTCGGAAG AIT'rACTT TTTACCTTCT TCTTTCAAAT CCGCA'rCAGA ATAAACAACA TGGAACAATT GTAATATCTA ATTGCTTTAC TAGTTCAGGT GGTTACAATC TTAATTTTTG TCATAGTATC AATCTTTCTA TTTTAGGATT CAGATTGGTT TCCTTACTTC TAATTATATC A-AAAAAAAGA TTAAAAATCC TAATGGAGTC AATCAAATTT TCCGTAAAAT TTGATATAAT CAACTTATAA GAAAAGAGGT GTCCTATGAT TAAAAAAATT CTTTTGGACT GACTTATTT'r TTACCCTCAT CACCTTT'rAT ATATTCCCCT TT'rCATCCTA TACTAGGAAC CTTAGCTTTG TTGATCTTCA AGGTGATTTA GCC?1'GGAAT TATITTTTAAT TTCTCAACAA ATACACTCAT TACCCCATTT TTACCATTTT ACTAGGTGCT GCTATTTATG GTAGTTCCCC ATCATCTCTT TGA-AGGAGGG GCGACAGGCA CTTTTTAAAA TCCCTGTTTC CC'rCATGAAC CTGCTGATTA GCTTGGAAGA TTTT'rGGAGC CAAATCCCTC TATTCTAGTT 
TCCGGCTGGT TAGCTTT'TTT TGAGCATATT CCCCTTCATA CTAATCACAG CCCTTATAGC GGGAATCCTA TTGGGAATTG GCTGGAGGTA CAACTGGCGG AACTGATATT CTAGCTCGTA ATATCCATAG GAAAACTGCT CTTTATCTTA GAIr=GTA 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 TTCTCATGTT GATTCTCCTA ATCTTCAAGG ATTTGAGATT GGTTTCCTAC ACGCTTTrGT TTGATT?1'AT TGTTTCTCGT GTTATTGATT TGATTGGTGA AGGAGGATAT GCCGGCAAAG GCTTTATGAT TATCACAAAA CGTCCTGACC AACTTGCTAA GGCCATTAAT GATGACCTCG GAAGAGGTGT TACTTTTATT TCTGGTCAAG GCTACTATAG TAAAGAAAAT TTGAAAATCA TCTACTGT;AT TGTCGGAAGA AATGAAATTG TGAAAACGAA GGAAATGATT CATCGAATCG ATCCTCAAGC CTTTATAACT ATTACAGAAG CCCATGAAAT CCTAGGAGAA GGCTTCACCT frTGAAAA.AGA ATAAAAAGAG GTAATGTCGT GACCTCAAAA GTTAGACTAA ATCATCTATC TTTTGGGTTA CAGACAACCT CTTTTTTATT TTATTTACTC AAGCTCTTAA GACCAATTCC GAGTTACTTC TTCATCAGCC TTTAACTGAT CCACTAATTG GTCAACTGAG TCAAATTTGG TCATATCTCG AALTGCGATCA AGCCAATAAA CCATGACGGT TTCCCCATAA ATATCTTGAT TAAAATCAAA AATATTGACT TCAAAACGTG CTTCTTCTCC ATCAAAGGTC ACATTTTTCC CGACACTAGC CATAGCACGA TACTTCTGTC CATCTGCTGG CATATAAGTA CGGTCTAAA.A TACGACCACG AGCATTACCA TGAACCACCA GTTTTCCTGC ?I'CTTTCACA TTTCCATCTA TCTrTCCm CTCATCTTCT ACAGGTGGAA TTAAATCTTC TGCTGTTr T'rGTCAGAAC TTTTGGCATT CATAGCCTTG ATATAAGTTG ATTGACTACT AAAATCAAGG AGATATAATT CACGTTCAGC AGGGTTCAAA ATATGCAAAA TCTTTGGAGA TTICATT-AAAG GTCATAACGA TGTTGGCAAC ACGAAATAAT TCTTGATGCC CAACGACTGA ATCAGATGGT GTGCCAATAT TCATAAAATA ATTATATCAT AGCGATAGCT TTTTTTCACA TGAAGTGTAC CTGTTTTCAA TGAACTTTGC AATATTCTTC TCAAAAACTT GATTAGACTT GTTCGGTAAC CATAAAGTGT CAACTAACTC CTGAGAACTT TAAATTACTA GAGGTACACC TATGGCTGTA AAATTTACAA AATTTCCTAA ACTCCCTGAT TTGAAGCAAG AAAACAGCAA AGAAAAACTA GATGACTGCT ACTGGGCTAT TTTTCTCTCC ATCTGTTAGC 1275 T?1'GAATCTC AACATCAACA. 
ACATAAACGC GCACTAAATT CGCTGTCGGA TAACCAATTG TACCTCTTGA TGGAAGCGGT GCCCCCAAAA AAATAGCTTG ACGGATACGA G?1'GAACTAA CAATGATAAC TTCTCCATCA AAGTAATTCT CAAATGTATA ATCAAAACCT GCAACAATAA CAAAGAATTC ?'rGTGCAGTG AGACTAGCGA CTTCTACACC TTCGCGCTTT AATTTTCTTT ACAAATCTGG ATGATAAGGC TCTAAAGCGA CGATAGGCAA CAAATCCT'rT CTCGCAGCCT CCTTATGTAT GCCATCAAAA TAGCCGAGAA CTTTTTGGTT TTTTATAGGA ATAGTAATAA ATTTCTGGAA CAGAAAATCT GAAATGTTGT AAAGCACTTT ATTCTATCGT TGCT'rAACTA GTAGGACATC TTCAAAATTT TGCAAGGAGT CATACTATGC TTATGTATGA. AAAAGCAATG ATTGGTGCCG AAAAGGTAGA ATTr'AGAATC AATGAGACAA CTTGGGCAAG ATCTTTGAAG 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3652 TCACPTTCCC. TAATGACAAA TTCCAACAAC TCCCATCTAG
GAAAAAAGCC
TGTGCTTCAG
TTGGATTCTC AGACCGmTC AGCTAAAGAA TATCTTTTCC CTTATCAGAA GGAACGGCTC AAGCCATTCA GCCAATATTT GAAACCAGAT AGCAGTTCTT ATAGTCAATT CGAGTAGGAA ACTCATATCA ATGTTTAACA G.TGTTCTATT AATTAAAGTG CAAACTAGGA AGTTAGCCGC AGGTGATACT INFORMATION FOR SEQ ID NO: 252: SEQUENCE CHARACTERISTICS: LENGTH: 743 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear
GACAAGTGAA
GA-AATAAAAT
CCAGATTCAT
TTGGGTACGG
GGGACGACAA
CTGAAGAAAT
ACTCAATGAw
CA
1276 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252: GTACCGTGGT GCCAAAGTAC TTACATCAAA GAAAATGGAA T'rATTATCTA AAATCCGGTG TTGGTTI'TAT CTCAAA'PTTG TAGTCAAGCT TGGTACTACT GGATAAGGAA TCTrGGTTT'r CTACGATTCT CATAGTCAAG TGAATGGAT'r TGGGATAAGG AAAAGAATGG GTCTACGATr CATGGCGAAA AATGAGACAG AGGAAAAACT ACAAATGAAA 'rTATGATTCA GATGGTGAAA AGCAAGGTTG GCTTTTTGAC AAACAATACC AATCTTGGTT ACTATGCTGA TAAAGAATGG ArTTCGAGA ATGGTCACTA GCTACATGGC AGCCAATGAA TGGATTrGGG ATAAGGAATC ATGGGAAAAT GGCTGAAAAA GAATGGGTCT ACGATTCTCA TCAAATCCGG TGGTTACATG ATCTCAAATC TGATGGGAAA CTTGGTACTA CTrCAA.ATCC AATCTTGGTT TTACCTCAAA CTCATAGTCA AGCTTGGTAC TAGATGGTTA TCAGCTTGGA ATGCTGCTTA CTATCAAGTA AGCTTTCCTA TATATCGCAA ACAGCCAATG AATGGATTTG ATAGCTGAAA AAGAATGGGT GGTGGTrACA TGACAGCCA.A TCTGATGGGA AAATAGCTGA TACTTCAAAT CTGGTGGCTA AGCGATGGTA AATGGCT'rGG GTGCCTGTTA CAGCCAATGT AGTAGTGTCG TATGGCTAGA 'rAAGGATAGA AAAAGTGATG ACA INFORM.ATION FOR SEQ ID NO: 253: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 4010 base pairs I(B),TYPE: nucleic acid STRANDEDNESS: double (D TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 253: TTTTGGTTGA TGATACGAGG GATTTGGTGA TTCTTCTTGA CGATAGAAGT TTCAGCGACC ATCATTTTTG AACAGTGATA GCACTTGAAT CGACGCTTTC TAAGGAGAAT TCTAGTAGGC ATACCAGTCG TTTCAAGATA AGGAATTTTA GAAGGTTTTT GAAAGTCATA TTTCTTCPAT TGGTTTC!CGC ACTCAGGGCA AGATGGGGCG TCGTAGTCCA GTTTGGCGAT GATTTCCTTG TGTGTATCCT TA'PTGATGAT GTCTAAAATC TGGATATTAG GGTCTI'TAAT GTCTAGTAAT TTTGTGATA.A AATGTAATTG TTCCtATATGA TTCTTTCTAA TGAGTTGTTT TGTCGCTTTT CATTATAGGT CATATGGGAC TTTTTTTCTA CAATAAA.ATA GGCTCCATAA TATCTATAGT GGATTTACCC ACTACAAATA TTATAGAACC GAATTAATTT AATTAGAGAG CCAACTTTCT AATATAGTAA TCGCGTCATA ACAAGGTATC TATCATTCAT GGAGTTCCTC CTGTATACTA 1277 T'rAGTAAAGT AAAACTATTG GAGGA'rATTr TAATGCCACA TTCCACAATC TCGTCGTTTIT GATTCTAAAA AGAGAA.ATGA TTGGCAAGCT TGAAGTAAGT T'IT1rTITCAA'r CTCTCAATCT TGGATAAAGT GTTGCTCTAT GACAATTCAT CTATCTAGCC TGTGGGAAAA CGGATATGAG GCAAGGCATT GATTCATTGG TTTGAATTAG ATCCTr'rCTC CGGTCAAGTT TTTCTC=rT TTAAAGCCC T'rTACTGGGA TGGTCAAGGA rTTGGCTAC 
GGAAAACTGA CTTGGCCCAG TACAGAAAAG GATGTCAAAG ACC'TATTGTT CCTGTAGAGA TATTCTGCTT AAAATTCGTA CGAAATGGTA GAACAGCI'T 'rAGGGCAGGT C1'ATCTCGTA CTTrATCTGCT TAAAACCCAC GTGGTGGACG TAAAGACCGC TATATAAACG CTTTGAGAAC CTCTCACACC TGAACAAGTA GATTGGCTTA TGAAGGGCTT TTCTATCACT CCAAAAATAA ATTTATCAGA AAGTCGTGAT TTCTATTGAA ATGAGGACTT TCTTTTTAGT TATAATAAAG TTAGGAAATA AGGAGAGGAA GCCCATGGAA GAAGATTGAA AATCATTCAA CAACAGAGTG CTACAAT'rGA TAGTCTCACC AA'rGAACTTG CCCT'rCTTCG TGAACAAGTG TCCTCTGAGA AAAGTGTTTG CCCATCTGCA ATGGAAGAAG ACTCTGACTT ACCCAGTTGA AGCTAAAGGG AAACGTCAAG C'rC'TCTTGC AGTAGAAGAG -AGCATI'GCC CTGATTGTCA TCAACGACAA GAATTAGTCT TT'ATTCCTGC CGCT'rATAAG TGCCAAGCAT GCAGTGATAA TATTCCTAAA GCCCCTTTGG CGCATAGCCT CCATCAGAAG TTTAATCTGA AGGTACCCA-A GGGTTTACCA ATCACACGTA AGGAAATTGC TTTPGAGCCC CTTTATAATC ~TTACGAGA GGATGAAACC TCTTATCGGG TTCTAGAGAG TTTGTCTGGG AALAGCTGAGA ATCAAGCAAT TGGTTTAGTA GTACAAGAAT TCCTAGGAGA GCGGCAGTAA CTTAGGACTT TAGTCCTCTA TAGGAGTAAG GCGACGCTAA GCTTGGTAAA GGAAGAAGCT GCACTTGTTG GATGTTGGGC CCCCAAGCAA CCAGATAAAT CATCCTTAGG GCTTATCTAA CGCAAAAGCT CTATGGAAAA CAACTCAGTC TTTTTGAAGA GGAACAAAAT AAGAGAAGAA ATCACCTATA AACGTAAGAA CCAATTTGAT TCAGAAGAAG TTCATCATCA GGGAGATCTA AAAGAGATTG GAGCAACCCT GCAATTAAAA CGAATAGATC ATATCCAACA APLATCCGAGT GATAAAATCG TGAAAGCTCC TGGCTCAGCT TCTATTATCG CTCACACCAT TTATCGCCAA GAAGAAGATT GGGCTAAGAT TAATTGGCAT ATCAAGGCGA GTCAATACTA AAAGTTGTTA GAACAAGCTC TTCTTCATGC TGATAGTCAG TTGCCTTACT ATTGGACTTT CACGCTGTAC CACCATGATC AGCGTCGGAG TTATTCTGGC TA'rGTTCATT GTGACATGIT GT1TCTGCCTA TGCGATAGCA GTCCAAGGTT 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280
CTGCGAACAG
GCATGTGAGA
AGCTAAAGGT
CTAGAAGCTT
AGGAAGTT'T
TTAGCTTATT
ATCGTCAACT
TTGAAGTGCC
GTGATCAGTT
ATTTTCCTTG GAAAGAGACT AGAACATCTC CAGCCCCTAA AGCAOGTTCA AAACTAGGAA GACTAT'rrTG AAAGACGGAC ATCATTCGTT ATGGGACGGA TAAAAAAGCG AGGGTGGTTfA TTATGAGCTT GTTGGAAACA TACGCATCGA CTGCTAAAAC ATATCTTATT TCAATCCATT TTCCAAACCA GGAAACTCTC AAGTTGTACA AGAAAAGTGC 1278 GGGAGGCTTT GCCAGCTGAT TGGAAGACTI CTTTGCTTGG GGGCAATITGA ATACAGCCTC ATCTC4GTCCT TTCCAATAAT GTAAAAGAGT CCAOTGOACT TTTTCTCAAA GTTTTGAAGG GCTAAACGTC ATCAATTATA ATTTCTATAA ATCAATTTTC ATAAATAGCG AGAAATATCT GTAAACAAAG AGGTTTTAGA AAATAAGAAA TCTCCAGATT GAACGACTAC AGAAACGTCA TGCCGCCGTC AGTCAGT'TTT AAGTATGAAG AAACCTTTAA C'rAGCTGAAC GCGCCATTAA CTTTTAGCCT GAGCTCAGTT AGCTAAAGCA AGAGCTATTG GTGCGTTGAA TCTATAACAG CTTITCCTAAT CGATTTGTTC ATCCTATCI'T CTAGAATGTC GGCCTAT'rTA CCGTGGACTA AGOAACTATC CGTGAGTTCT 0 %0.0 0*0* 09 0.* 0 CTAGTCTGGA GATTTTTCAA TAGACTTCGT 'rA~rGGACGG T'rACAATITrA TTATATGAAA ATCCCATATT ATTCTCCAAT TCTATATT'rT ACCI'TCTAA ATGTATAGAT TAACTACCTA ATTATAGCAT ATAACGCAGA TTCCTTTCAA TCGTATGATT TACTGCATTA PATTAAGTAA AAAAATAAAG GCAGTCCGAA GACTGCCGAT ATTTATCTCT CATCTCTTTA ATT~ATGGTAA GTAAATAAAT AATTTCCCTA AAGATATGGA AAT'rATTAAT ACTATAAATA CATATTATAA A GTTTATAAA TACTGTAAAA ATC-CTGAAGT TAATTTTCTA ATAAATATCA ATA'TG'rGTTA GTATCTTT'TA AATTTTTAGA CAATTTACTA GTTC'rATAGA CATGTTTAAC AGACTCTATT TTACAATTCA AAAATTTCAT CTGCCACTTC ATTTAAAAAT TCTATATCAT GGGAAACAAT 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720.
3780 3840 3900 3960 4010 AAAAATTATT TTIATCCATGG GGAATA.ATCC ATACCACTTG AACAGATGCT ATTGCAAGCC TTTTATACTr AT'rAATCAGT AAGGTTCGTC AAAAAAGACA TTTGCTTTTG CCCTCCTGAT TCAGATATTT TTATCATATT AATGGAGAAT TCTTGCACAT AATAAATTCG TCCAGGCATA AATCTTTTAA CCCAAATCAT TGTAACTTAC AAAACCCCTG ACCTCATGAG CCACTTTCTT TACT'rTCTGC TGTTCCAGTA TCGTTTTTCC TCGCTAGAT'r TCCCTTGGT'r CGTCACACGA TTTTTTCATC TCGACTGTTC GCTTTTCTTC TAGGTGGTTC ATAAGGAACA GGAAGATTCA GAATAAAGTG CTGAAAACAA TTCGGAA'rAG GCATAGAGAC TTGCGTCCTG TTCGAACACA TTTTCCCACC ACGTGAAGAA INFORMATION FOR SEQ ID NO: 254:
AAACTCATCG
TCATACCTCT
CCTCCTCATG
TCCTCAAAAG
TTAATGCAT
GATGCCTr'rC
CTCAACTAGA
AGGTCAGTTT
GGCAGACTCC
CATTAACGAC
GGTTGACTTI' TCTAATCCTA TAGACAA'N'T GAGGAGCTOC
AAAGATGGCG
SEQUENCE CHARACTERISTICS: LENGTH: 2789 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:
ATGCATCCGT TTGTCAAGCC TAAATTGTAA TTTTTTTCAA GAAAATGACA TAAAAATATC ATTCCTAGGC CTA'ITrATGC GAGTATTCAG TCGGTCAAAT GAAGCTGAAC GAACTCATTT TTCGATGACA TTGTTGGGCT ACATAAGCAT CGTGGGTCAC CTCGATTCAT CTCTAAGAGA AACTTCAAGA CCAAATCTCT CTGTCGGTTC ATCGGCTAAA ATCAGCTGGC TGGGTTTTAA TTCGTTGTTG TTCGCCCCCA GACAACTCGG AGACCCTTTG CTACTCTCTC TAAAATCTCT TCCACCTTTT TGAGCT'rGTC ATTTCAGCGC CACATGAGAT TGTACTCGAC CGTTCATCA AA.ACAGATAA GAGATATGTT CACGGATTAT TGTTrGCGAC ATTTGTCTGA CCAAAAATCT CATACCGTCC GCTA'rAATCA ATTTAACAAG GTCGACTTCC CACTACCACT CTTACCAACA ATCAATCCrG AGAGATAAGT TATCCAAAAT CACTTTTCCC TTTCAACTCA ATCATAAGAT GCCCCCTTTC AATAACTCTA CTAGAAGCTA AGCCTAGCAC AAATAGTATA TCCAGACkTG AGTGGTAAGA ACGCATGGGC AAAGAAAATC AAGACTAGAA AAGAGCAGAA CGAGGAGAGG ACGGTAGCGA TCGACCAGTT GTAATGATAT CCCTGCGCTT CAATAAGAAA GTTGTPACTA ATGCTAAGGA GACCAAACAA AGCAAAGAGT AGGTTAAAAT GAATCCACTr TCTCTTGTTG AATGGCTTGA ATAGATGAAA TT'rAAAACAG
TATTTCTCTC
TCCCTCGCCT
GATAATGACT
ATTT'rCAGGA
AAAAACCCAG
TGAAAAATAT
AATTCAATGA
GTTTTCCCCT
TCCAGAGA.AC
GATGGCTCTA GCAACTGCAA ATGCAAAGTA GCTGATAAAC TT'rCTTAGGC AATTTCACAT TCAATCAGGG CAAAATTTTG TTAGCAGAAT TPACCGCTAG CCATCTATCA AACCCAATAA ATAGCTACCA AATCCCCCTG CCA.ATGGTTT TGGTAATATT CTAGACTTCT TTTCTCCATC TAAAACCTGC AAACAGTAGA GAGGGAAACT ATAGCCCAGC TCCACCCCAT AAACT'rCTTG GTAAGAAGTA GGAAATCATC TCCGAACAGC ATCTCGATAA AT'rTAAATA ATrTCCATCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 GACAATTTCT CAACTAACTC TGTAATCTCT TTTTGATGTT GAACCGTATT TTCAATTTTA ATCGGATTAT TTAAGCCAGT TGTTGACAGG GAGGCTTTCT CATCCCACAT CATATCAGAA TCATTGACCA AGCTAATAAT TGGATTGGAG AGATTTTCCT TTCGCTTATC ACTATATGGG AAAAATGACC AATCTCCTTC ATAATAGGCA ATCTCGACAT CCATCTCCTC TATCGTTCGT TTTTGCTGCr
TTATCTTCTT
GGTAGCTTGA
TCTGTTAACT
CTACTCTGCA
TAATCTGTAG
AAGGTTTCTA
TTGTTTTCTr
CTTCATACTT
CACCTTTCGT
ATCCCTTGCT
GATATTGCTG
CATAGCCCGC
ATTTCCCTGA
AGGTCAGGTA
GTTCTAAACT
CATCGAATGA AAGGCAATTA ACTTGCTGGC ATCAAAATAA CTTAGAAAA TTGCGATTGG AATCTGTTCT GATTGGACAA CTGCGTTTTr TCTACCAAAT CCCTGCTAGC TCTTCTrGCC ATTACCTTGA CTTACCCACT TCTGCCCACC CCAATCAGTA ACTTCCCCAA GAGCTGATITT CTTTrAAT ACCGGTATTT CATAGTAAAC ATCACCGTA
AATTTTTTAC
CCTGATAAAA
ACAGATTATC
AGGAAGACTG
TCGATAGAA-A
ATTGAGTTTG
GTCCCTATTT TCATCACATA ATTGAAGATA AGACCAAATT AGCAGAGAGC TGATTGTCAT TTTTTGGATT AAGAGGTAAG GTTGCTGATA AGCAAGTTCT AGGCCGTCAG TAAAATAGTT TGAAkAGATGA AAAACCTTTC TCAACCAACT GATAAAGAGA
TAAAGCTGCA ACAGCAAAAA ATGAGACAAC CACAGCATAG GAAACAAATC T7"TGGCTTA TAATCAAGCA AGAAAAACAC GCCTAGATTG ATCACAAGAG CCCCACCTAG GAGGAGGTAA AGGTTGCCTT TTACAACATC AGCTAAAACA GCCCTATCTT GAAAACCAAG TAATTTTTGT ACCCCAACTC TTTTCATCTC CATCATCGGT.TGATACACTG TCACTAACAC AAGAAGCAAA ATAGCCAAGA CAAAAACAAT GGCAGATAAA AGCAAATCTC GATTTATGAC TTCCACTGCA CTTrTGTAGG TCGGCTCTAG CAAGGTAGCC TGGTCTATCT TGAAAAAATC GCTCCATTTC TGTACAATCC TATCCTTGTC CATCTCTTGT GTAGAAGTTA TCGTATAGCG ACCATTTAAA CTACGAGATG TATCCTTGAT ATAGGTTTGA AAAGTCATAA GCTGAATAGG TTTGGCTTTT AGAAAGGTCG GAATCGTACC AAGTT'rATTG GAAATTTCTT TATTACTATA GACTCCTTCA CCATCTGTGG TAAAATCAAG AGAAGAAATC CCAAACTCTT GGTAGGGGAA GGTATCTTTA TCAAAAACAC CAGACTTGAC CACCTCATCA CCACTGTCTG TrTTrGATGAT GGAGACTTTA TACTCCTTTG ATACATCCTC AAAAAATCGA AGAACAGACG CTGCAGGTTC GTTAATATCT T'rCAAATACA AATCCAAAGA ATCTACAGG INFORMTION FOR SEQ ID NO: 255: SEQUENCE CHARACTERISTICS: LENGTH: 2495 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255: CTGCGAATTT TATTAAAGAT AATGTGTTAA TTACAGCGGC TCACAACTAC TACAGACATG 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2789
ACTATGGGAA
CATTTGGAAA
CTAAGGATGC
AATTAGGGAC
TCACAGGCTA
1281 AGAAGCGGAT GATATT'rATG TTC'rCCGGC GATCAAAGTA AAGGAAGTTC GTTATT'rGAA AAGGGAATAT GACTTGGC'rT TAT'rAAT'rCT TT'rGGGTCTT CCTACrAGTC AAAAAAAT'T TCCATCA'rA' AATTTTAAAA T'rCATCAAAT TGTTAGTCCA AGTCAAGAAC GGAA'rTTAGA AATTTAAATT AGAAGAGCCC ATTGGTGCAA GACAGGAATA ACTGTGACTA GTATACAGAT AAGAAACAAG TACTTAGAG GGGTCTAGTG GCATACTTTA GGAGATGGAG TTDTAAGTGA TGATGGCATG 'N'CTTGGATT ACCAAGTT'GA GATCTACAGT TTATGATGCT AGTCACCGTG TAGTAGGAGT CTAATCAAAT TAACAGTGCA GT'rAAATTAA ATGAACGAAA TTCTTAAAGG TTACTCTCTT GAAGGATGGA AGAAAATAAA GACAACATCA TAAACAAACG GGTTGGCAGG AGATAAATGA
TTTGCCATTT
TGGrTAGTTGG
TACCTGGTAT
AAAATGGTAT
ATTTAW'rCGG 'rACCAT'rATA
TATI'AGACA
TATCTrCAATT
GTTCCGGTAA
CAAATGGAGC
CATCTGGTGA
TGCAAATCGC
GATGCTTACA GATTGGCAAA AAGTCCATGG AATGGI'ACA GGTAGCCAAA GTGGATTTAA TGTTGGAGGA ACTAGCCACA TTTTTCTTCG ATGATTCTGA AGGA'rGGCAG ATCTAAAAGA AACCTACTG CCGGAGAGAT GGTTGTCGGC GTCCTTCTCC AAGAATAGAG GTGTATTACA AGAATTTGTT ACAAACATCA TGGGGAAGAA ATCAGCGTAG TT-ATCATACT ATTTACAGAA GGATGGTGGC CACGTGGTTG GGTTAAGGAT CATGGTACTA TCTAAATCCA ATAGATGGTA CTACCTCCAT CAACTTGGTA CTATCTAGAT GGAACAAATG GTACTATCTC
TTTGTCCA-AG
AGAGTGATAG
TGGCAATATA
ATTGCTCTTA
GGCAAGCAAG
CTATCGATGG TA-AAGTTTAT AACTTCGCTT TATATAAAAT GAAGCTTTTG AAAAAAATGA GTTrGTTAGC GACAAATACA GTATTT'GCAG AAAATGGTAG AACCTACTAC AAAAAGGGGG ATGGGAAGTA CTATTATTTT GATCCTTTAT TACCTGCTCC ACACAAGGGG GTTACGAT'rG GACCAGATTG GTTTrATrTTT GGTCAAGATG TTTTAGAAGC AAAAACTGCT ACGAATACCA TATGATAGCC AAGCAGAGAA ACGAGTCTAT TAT'IT'IGAAG ?TAAAAACTG GTTGGATTTA TGAAGAGGCT CATTGGTATT 'rTTGATTCGC GCATCAACAG ATTGACGGTT GGAGAGCTAG TACCCTCTTA CGTATGATGA AGAGAAQCTA AAAGCAGCTC GCAACTGGCA TTATGCAAAC AGGTTGGCAA TATCTAGGTA TCGTCAGGAG CTATGGCAAC TGGCTGGTAT AAGGAAGGCT GCTGAAAATG GTGATATGAG AACTGGCTGG CAAA.ACCTTG CGTTCATCAG GAGCTATGGC AACTGGTTGG TATCAGGAAA 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 GTTCGACTTG GTACTATCTA AATGCAAGTA ATGGAGATAT GAAAACAGGC TCAA'rGGTAA CTGG'rACTAT GCCTATGATT CAGGTGCTTT AGCTGT'rAAT
TGGTTCCAAG
ACCACAGTAG
GTGGTTACTA CTTAAACTAT GTGATGGATA CTTAACTTTG TAGTATCAAG GT1'TTFCTGT TTTAGTTCTG TATTGCACAG GTAGTAATCA ATATAGTC'TA CTCATAACCG TAAAACATTT GTCTGGGCTG TTTCCCTTAC ATGATAAGAA TCGTGTTGGT GTGCTTCTCT GTGAATGCCT TGAAAGATTA TAGGCGATAA
AATGGTGAAT
TATAATAGGT
ACTGCCCTCA
GGCTAAGTCC
TAATGGCTTG
CCGATTTCAG
GTGACATGGA
ATTGCCAGCC
1282 GGGTTAAGTA ATGAAGGCTA ATTGTAAACT GGATAAAAGT CTCACAATC AAAAAACGCA AACAGTTAGA CAATTAATTT ATCCGAAGgA TT?1'AGTTTT ACC=rAATTC GTr'rATTGTT TTCCAATTGC-TTAGCGACT GAAACGACrT AATCCCAAAG AAGGACTCCA TCATACTATT TGCTTGAATT CCCTrACTCT CTAGGAACCG TTGGTCACTA TGGAGAATCG TATTCTCGTA 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2495 GTTCCAACAT TGTTrGTACT TGTTCTAAGT TGGGTGAAGT TTTCGCTATP AAAGCCATCT AAAACTGGTG ATAAGTAAAG CTTTTGAGTA CTTGCTGGAA TGGCAAATTC.TGTCACATCT TTTAGAGCCT TCAAATTGGC CTTGAATGAG ATTCG INFORMATION FOR SEQ ID NO: 256: SEQUENCE CHARACTERISTICS: LENGTH: 870 base pairs TYPE: nucleic acid C) STRANDEDNESS: double TOPOLOGY: linear GTGTAGCACT TTTCCATTGT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 256: TACCACCGTA TTCATCCAGC AAGATTGCCA TTTGTCTTTG GGTATTrCGC AGTTCTTTTA GCAAGTCATC CACAAAAATA GTTTCAGGTA CAAAAAGTGG ATCTTGTAAA ATTCTCT~TCC AAACAATATT GTCAAAACCG TCCACAAAGC CTGCCTTAAG GAGACTCTTG GTGTGAATGA TTCCAATTAC ATTGTCCTTA TCCCCATCAT AAACCGGGAT ACGAGAATAA TTTTGTTTrA AAATACTTTG GATAATGGCT TGACTATCAT CCTGAATATC CACCATAAAG GCATCCGTTC GAGGAACCAT AACCTCTCGT GCCATCAGTT CAATCTCATC AGCATCCAAT GTTTCTTCAC CATCGAOCGA AAAGACACCT TATTTGTCAG CATATAGGCA
TGTAGCATCT
ATTTCATCAC
GGGTCATCT'r T'PCATCCGCA TCATCGAATG ACATAGGAGT CAAATGGCTC AAGAAATTGG TCGAAGCAGC TAAAAGCCAA ACAAAAGGAC TGACTAGTTT TCCGATCCCA ATGATAATCG GCOCTGTACG AATTGCCAAG GCATCCTTTA GATTAAGAGC GATTCTCTTA GGATATAATT CCCCAAAAAC GATGGAAATA TAGGTCAAAA ATGCCAAGGA TAGAAAAGTT GCCACGGCTT GTGCTGTTTC GCCATTCCCA AGCCAAGAGG CAATCACACG TCCTAGAGTA TCAGTTAAAC 1283 TCGCCCCTGA TAAGATTGTA ATCAGGGTGA TTCCTACCTG GATGGTTGAT AAAAAGTGGr 780 TAGGATTTTC TAGTACCTTC AGCAGGCGGA TGTAGCGTCT GTCTCCTTCT TCCGCCTTrT 840 GT'TCAACTCG GGCACGATTA AGAGAAACGG 870 INFORMATION FOR SEQ ID NO: 257: SEQUENCE CHARACTERISTICS: LENGTH: 1245 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257: CGTTCCCAGA AGCCCGCATT CTCATCGCCA ATGTCGTGAT TGATTTGGCC CTTTCTCCAA AATCCAACTC AGCCTATGTA GCTATGGATA AGGCACTTGC TGACCTCAAA ACATCAGGGC 120 ACTTGCCTAT TCCGCGACAC CTGCGTGATG GGCACTACAG TGGAAGCAAG GAACTGGGGA 180 *ATGCCCAAGA CTATCTCTAT CCACACAACT ATCCTGGAAA TTGGGTCAAG CAAGACTATC 240 TGCCAGAAAA AATTCGTAAT CATCACTATT TCCAAGCAGA AGA'rACTGGT AAATATGAAC 300 GGGCTTTGGC TCAAAGAAAG GAAGCTATCG ACCGTTTGCG AAAAATCTGA AATCCTTTTC 360 eeo- AAAAAAT'TGC ACTTTCCTCT TGATTTT'r'T TGAAAAAGTG GTATCATATA.AATATAGAAA 420 0CGCTGTGGTG TACGACTTCA CACTTAAGTG TTGACCGACT ATTTrTTTGTA TTATTAGGGA 480 AACAAAAGTC TTCTAACAGC ATGTAGGCCG TCTCACACGG AAACAGCTTC AGTTAGAGCG 540 :o.0 AGTTGCCCAC CTGCI'AATT GCGCGGGTTC AATACAAACC GTGAAGTTTC GGCACCAATA 600 CAGCTTTTTT CTTTGCCTCC TTAGCTCAGC TGGCAGAGCA GCGGACTCTT AATCCGTGGG 660 TCACAGGTTC GATCCCTGTA GGGGGCATAT AAATACAACA GGAAAAGCCT TATAATATAG 720 9GGCTTTTTTT GCTTTCC'rTT TAAAAATTGT CGTGCAATTT GCCGTGrrTr TACAACAAAC 780 TTTTCACAGC CATAAACTCC TCACTAATTT TTTCCTCCAA GGTATGCCCA TAAACGTCAA 840 TCAACATGGA GATATCTTTA TGTCCTA.AAA 'ITTG4GCTCTT TGTCAACTGT AGTGGGTTGA 900 o AGTCAGCTAA CCTCGAGAAA GGACAAATTT TGTCCTTTCT TTTTTrGATAT TCAGAGCGAT 960 *.*AAAAATCCGT TTqTI'GAAGT TTTCAAAGTT CCGAAAACCA AAGGCATTGC GCT'rGATAAG 1020 TTTGATGAGA TTATTGGTCG CTTCCA.ATTT GGCGTTAGAA TAGTGTAGTT GAAGGGCGTT 1080 GACGATTTTC 
TCTTTGTCCT TTAGAAAGGT TTTAAAGACA GTCTGAAAAA GAGGAGGAAC 1140 CTGCTTTAGA TITGTCCTCAA TGAGTCCGAA AAATTTCTCC GGTGCCTTAT TCTGAAAGTG 1200 1284 AAACAGCAAG AGTTGATAGA GCTGATAGTG ATGTTTCAAG TCTrG INFORMATION FOR SEQ ID NO: 258: SEQUENCE CHARACTERISTICS: LENGTH: 1684 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258: 1245
ATGCCTATGT
AAGCTGAGAG
AGACCATCAG
AACTCCACAT ATGACCCATA AGCGGCAcCC AGGCTTATGC GATTCAGGAA ATACTGAGGC GCCACTGGAT TAUAAAAGAT AGTTrGTCTG TAAAGAGAAA GGTTTGACCC CTCCTTCGAC a a. a a a. a.
GA.AAGCAGCT AAGAAGGTGC CACTI'GATCG AGTCAAAAAC GGTAGTT~TAA TCATACCTCA GTGGTTTGAC GAAGGCCTTT ATGAGGCACC GACTGTCAAG TACTATGTCG AACATCCAAA
AAAAGGAGCA
TATGCCTTAC
TTATGACCAT
TAAGGGGTAT
CGAACGTCCG
GAAGCTATCT
AATCTTCAAT
TACCATAACA
ACTCTTGAGG
CATTCAGATA
TAACGCTAGC GACCATGTTC AAAGAAACAA AAATGGTCAA GCTGATACCA AAAACCAAGC GAGGAGAAAC CTCAGACAGA AAAACCTGAG GAAGAAACCC
ACAACCGCGT
ATACTGTAGA
TCAAATTTGA
ATCTTTTGGC
ATGGTTTTGG
ATCAAACGGA
CTCGAGAAGA
AAGAATCACC
TGAGAGAGGC
AAGAGACTCT
TTATGGCAGA
ATTTTCTAAC
TGAGTTCTAG
GAAACCGCAA
AGAGGAATCA
TGAAGATTTA
CACAGGATTA
AGCTGAAAAA
TCCTAAAAAC
TTCTCATTTT
TCTACTTCTA
ATATCTTAAC
AGCGAGAAAC CAGAGTCTCC AAAACCAACA GAGGAACCAG GAAGAACCTC AGGTCGAGAC TGAAAAGGTT GAAGAAAAAC CTTGGAAAAA TCCAGGATCC AATTATCAAG TCCAATGCCA AAAAATAATT TACTATTTGG CACCCAGGAC AACAATACTA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380
CTATTGGCTT
AGGATAGGAG
TTTCATGAAA
AAACATTGTT
AGATAGTGTA
TATTAAAGGA GAGTAAGTAA AGGTAGCAGC AACGGGAAAA CGAAAAATGA GAGCAGAATG ATGTGCAAAA TATAGTAGAT AGAAATCGAT TTGACTGTCC AATAAAGATA AACTATTTAC TGAAACTAGA ATAGTATACC TGTTCTTATT TCATTTTACT TGGCTAATTA ATCAGTTAAA CACTAGTTAA GGAGTAATGA AGCTCTTGTC TTAGGAGCAT GGATTACTCT TTACTGCTAT TGAAAAAAAG AACAATACTA TTATTGATGG CCAGTCTGTT GTGGTTTCTT GGACATATTG ATCCTGGATC ATTCTCATCA TTTAGAAACT GGGGTGGTTT GATGGAAAGT ATTGGTCTTG.
TTATCGTTTC ACATTCCAAA CACAT'PGCAG AAGGTGTTGT TGAACTGATT AGTAAAGTAG CTAAAGATGT TCCGATTACT TATGTAAGAG GAACCGAGGG CGGAGGAATT GGAACGAGTT 1285 TrGAACAAGT AGATAGGGTT GTTTCCGAAA ATCCAGCAGA TAC?1E'ACTT GCCTTTTTTG 1440 ACCTAGGTTC TGCTAAAATG AACT'PAAAAA T GGT GACTGA TrTCAGTGAT AAAAGTATCA 1500 TCATCAACAG GGTrCCAATT GTAGAAGGTG CCTATAATGC AGCTGCTCT CTrCAGGCTG 1560 GTGCAGAACT GTCAGTTATT CAAACACAGT TaGCGGAgCt TGAAATCAAT AAATAAGGAA 1620 r17x rAC'rATA ACTCTTTTTA TAGATAAGCT ATTGaTTATC TCAACTATAA TAATGTTAAG 1680 TnAA 1684 INFORMATION FOR SEQ ID NO: 259: SEQUENCE CHARACTERISTICS: LENGTH: 970 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259: .*AOGAGTGGAG AnATATGAAG ACACAAATTT TCACATTATT GAAAATCGTT GCTGAGATTA *TTATTATTTT GCCATflTCTA ACTAATCTAT AAGTTCTTTA TATTGCTGAA AACGCAATTC 120 *AAAAAGGGCT AT3'AATTGTG GATTTTCTAA TACCTGCAGA GAT'rGGATAA AGCGTTCAAT 180 .CTCTTr'PTGA TTGCT'rCCCT TTGTTTGAAG AAAGACACTC ATCTTCTTTA AAAATTGCCA .240 CGATACTTTfT TCAAAAACAT CATACGGTCG TAACATCCTC TCCAACTCGG CTTCGAAGAT 300 TGGGATGTAG GAGAAAAGTT TTCGCTCCAT GAGTTCTOAT AAGATATTTA AGAGTCCTTG 360 *CTTCATATAC AATCGATTGT GTACTAACTC TT-rAAATTCT TTGGATTTTT CGAGTAAGGA 420 GGTTGATAAA AAAATCAGAT CTTGATTGCT CAAGAAGGGC ATGGTATTGC AAAAGAGATA 480 **.GAGTTCAAAC CAGGTCCAAG ACTCGATAGC ATAGAGATAG GTGGTCPAAA ACTCGCTATC 540 CTCCTCTGCT AGTGGGTAGQC TTTTATTTAG TGAATGGATG GCATCTTTAA TCACGATGGC 600 ATTCAAACGA CGATAGGTCT GCGCCATCTG TTCTTGATCG ACTTCCTCCA ATAGCTGCTC 660 TAAAGCAGCT ATATCCTGAT GGGCAAAGCG ATTCACAACC TTTCGACCGA TTCGCATATG 720 TGGAGATI'CT TGATAGTI'GT TGAGCT'rGTG CCCAAACTCA TCAAAGGTCA CATTTATACC 780 TTGGATAGCT AGAATCAACT TATCCGCAGA CAGCATAGAC TGCCCTAGTT CAAACTTGGA 840 *CAACTGAGAA GCTGTTAGAC CCTCACAAGC CACATCTGAC TGCTrGAGCT TT'CTCGCCAA 900 ACGTAATTCC TTGTAAAATT CCCCCAGTTC CATTCTCTCA ATCATCTGAC CACCTCCTAG 960 CTT'PTGCAGG 970 1286 INFORMATION FOR SEQ ID NO: 260: SEQUENCE CHARACTERISTICS: LENGTH: 2996 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) 
SEQUENCE DESCRIPTION: SEQ ID NO: 260:
0 0 00 00 0 0 0 GTTGACCACG GGTAAAACTA GCCTTCATCA GTTAACCAAC CGAACGCGGT ATCACTATCA CGCTCACATC GACGCTCCAG TCAAATGGAC GGAGCTATCC TGAGCACATC CTTCITrTCAC AGTTGACTTG GTTGACGACG ATTGTCAGAA TACGACTTCC AGCTCTTGAA GGTGACTCTA TGAGTATATC CCAGAACCAG CGTATTCTCA ATCACTGGAC TAA.AGTCAAC GACGAAATCG TACTGGTGTT GAAATGTTCC TGTCCTTCTT CGTGGTGTTC AGGTTCAATC AACCCACACA AGGTGGACGT CACACTCCAT TGACGTTACA GGTTCAATCG CGTGACAA TC GACG'N'GAGT CCCTAACTGC AGCTATCACA CTAAAGACTA TGCGTCTATC ACACTGCGCA CGTTGAGTAC GACACGCGGA CTACGTTAAA TTGTAGTAGC TTCAACTGAC GTCAGGTTGG TGTTAAACAC AAGAATTGCT TGAATTGGTT CAGGTGACGA TCTTCCAGTT AATACGAAGA CATCGTTATG AACGTGACAC TGACAAACCA GTGGTACAGT TGCTTCAGGA.
AAATCGTTGG TATCAAAGAA GTAAACAACT TGACGAAGGT AACGTGATGA AATCGAACGT CTAAATTCAA AGGTGAAGTC ACTG=r'TGG CACGTCGCTT GATGCTGCTC CAGAAGAACG GAAACTGAAA AACGTCACTA AACATGATCA CTGGTGCTGC GGACCAATGC CACAAACTCG CTATCGTCT TCATGAACAA GAAATGGAAA TCCGTGACC'r ATCCAAGGTT CAGCACTTAA GAATTGATGA ACACAGTTGA TTG CTCTTC CAGTCGAGGA CGTATCGACC GTGGTATCGT GAAACTCAAA AAGCAGTTGT CTTGCTGGAG ATAACGTAGG GGAGAAGTTA TCGCTAAACC TACATCCTTA CTAAAGAAGA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 0* 0 0 000 0 0000 .40 .,04 00 4 S 0 TCTTCAACAA CTACCGTCCA CAATTCTACT TCCGTACTAC AACTTCCAGC AGGTACTGAA ATGGTAATGC CTGGTGATAA TGATTCACCC AATCGCCGTA GAACAAGGTA CTACATTCTC TATCCGTGAG GGTGGACGTA CTGTTGGTTC AGGTATGGTT ACAGAAATCG AAGCTTAATT CGATTrAGTT CCCAGAAGAA CAATTATTTA AGTTAGACAC TAAAAGAATC TTGCTTGGCA AGGTTCTTTr TTTAGATATT GAACTAATAC TCAATGAAAA TCAAAGAGCA AACTATAATA TATTGAAACT AGAATAGTAC ACATCTACTT CTAAAACATT GTTAGAAATC GATTTGACTG TCCTGATCGA TTTGTCTTGT TCTTATTTCA TTTTACTATA GAAAGTTAGC TACAGACTGC TCAAAACATT GTTTTTAGGT TGTAGATAGA ACTGACGAAG TCAGtAACAT CTATACGACA AGGCGAAGCT GACGCGGTTT GAAGAGATr AAAGCATTAT ACAATAGTAA CN'TAAAAG ACTCTCGI'AC CTGATGGGCA AATCTrATAA
TATGAAATCA
AGCTAAATAT
AGAGATTATA
TCGAAGAGTA TAATACTAGA ATTAAAGAAG AAATCCAAAC CATAAACGCC TTCAAATCGT GAACTTTTAT AGTGGTTTGA TrATAGAAAT ATTTTAGCAG AATCAAACAA CGATT'rGGCG
CTAAAATCAA
CATCAAAACA
TCTATTTCGT
AATAAGATGT
CCAAGGTGTA
AAATGTAAAA
GAACAACTCT ATCAGGAAAG TCAAACTAAT CTGTTATAGA TrCAATACAC TTTAGACTGT AATATGAGGA GTTCGGACTC GACTCTCTCC CTTATATGAC GGTTGAGCAA GAGAAAGTCT CAGGAGAAT'r TGTTACAATT GATGCCTTAT
CCTACACACG
CACGTCCAGA
TCTCAATTCA
TTGGGGTGTT
AAAACAAGGA
TAGGTACATG
TGATGCCTTC TATCAACTGT ACATCCTA-AG AAAGCAGATG AGAAGACAAG TGAACTGCAC TTTATTATGA AATTAACTTA TATAGCTTAG AGAAGCTTTC ATTAAATTGA TTGATCGTTA TCGTTACTAT TCTCCTGATT GACTAAAGAT AGAGTTTCTC GCTAGCACAA TACAGGAAAA TGAGAGCGGA GAATGCCATC AGAAAGAAGA AAGACAGAAA TCTAAAAGCC ATTAAACrAG ACCAGATAAG GACCAAGAGC AAATTATGCT TATCGTCGGA TAAAAGAGTT CAAGGCTTGA AAAATAT'rCT TCTCATAAAG ATTTGAAGGC TCTAAAACAA
TAAAACAAGA
T'rGAATACTG
ACGGGTATAC
CTAAAAAAGT
TTATTCAAGA
CTCGTTTGAC
TTAAAGCTGA
TTTATTTAGA
TAAAAGTACT
GAGACGTTGG
TGGAAAAGTG
TTCAAGAALAC ACGTGGTGGT CG'rAACCATG TTCT1TGCCCG CCATTTGAAG GCTACAGAGG TTCAGGCTTA TAAAAAGGAG TTAGGTCGTT TGAAGCGCCA TGGT'rGGCGA. AATAT'rACGC CTCAAACCAT TGTCGCGTCT AAAAA'rAAAG CCCA.AAAGTTr AGACAGAAAA AATCTAACTT TGA'rGATAAA GTTCAGATCT ATGAACTTAG AAATAAATTT GGGATAAACA ATTCTAATCT CGGAATAGAG TTCGTCAAAA AAGGAAAAAA AATGATTCAT AAAGTCTGAC ATGA.AGGCTG TCTCCCAAGT CGTACGATAC TTCTTAACTG TATTGTTGAG AAAACAAGAG GGAGAGTACC TAAGAGAACT CCGATTGAAG GAGGAAAAAG ATTAATGACT GAGTTTTCGT TAGATATTCT CTACTACTAT CACTTGAAAC AGCTAGATAA AATrCAATCC ATTTTTATCG AACACAAGGG ACTAAGAAAT CGTGGTTATC TGGTAAATCA CAATTTACAA GCTAAAATGC GACAGAAACG CAAGAAGGCA GAGAATCTCA TTCAAGGACA CTACACAGAT GTGACAGAAT TTGCCG 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 2996 INFORMATION FOR SEQ ID NO: 261: SEQUENCE CHARACTERISTICS: LENGTH: 837 base pairs TYPE: nucleic acid STRANDEDNESS: double 1288 TOPOLOGY: linear (xi) SEQUENCE DESCRIPTrION: SEQ ID NO: 261: CTTATCAACT CCCGACATGG CTCTCAGACC AATCCAAATC CCTAAAAAAA TCAGAACAAG GATGGTGGTC AAGATCAAAC TC'rCGAA.ATA TAAAGAAAAT AGTTGCAGTA GCATGATTTC TCTCATTTC'r ATCrTTTTTA AAGAGTAAAC TCAGCTAG'rC CAACTAACTG AGTT'rTCCTT TATCTATTAT ATCAAATATA AGTCCGTTTG TAACTAGCGA AGAATTCT'rT TGTCCGCTCT TCTTTAGGGG TGTGGATAAT CTCATCCGGA GTrCCAGAC'r CGATGATTTT CCCCT TATCT AAGAAGAGAA TTTTATCCGC AACTTGGGCT ACAAAGGACA TGTCATGACT GACCAAAATC ATGGTCTGAC CTGACTTAGC AGCATCTGCA ATAGACTTT'r CTACTTCACC GACCAATTCT GGGTCAAGGG CTGAAGTTGG T'rCGTCTA.AG AGCAAAACAT CTGGTTTrCAT AGCAAGCGCA CGCGCTAGGG CAACCCGTTG CTTCTGTCCA. 
CCTGATA.AAT GGCGAGGATA ATGGTTTTCA CGGTCCGA.AA GCCCAACCTT AGCCAACTCT TCCTTGGCAA TCTTAGTCGC TTCTTGGTCA GATAATTTCT TGACAACAAC CAAGCCTTCT TTCACATTAT CAAGTGCTGT TCGGCGTTCA AACAAATTAA ACTGTTGGAA AACCATAGAC AACTTACGAC GTAGGGCAAG GATTTCTTCT TGAGTGATTT TAGAAAAATC AACTGAAAAA CCATCAATCT GAATAGAGCC ACTGTCAGGT GTTTCTAGAT A.ATTGAGACT GCGAGAAAGG TTGATTTTCA GCTCTGAAGA CCAATCA INFORMATION FOR SEQ ID NO: 262: SEQUENCE CHARACTERISTICS: LENGTH: 868 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262: CCGAACAAAA TGGGCTAATT AGATTATAGT AAGAAAGGTA AGTTAAAAAT GAGAATTGCA ATTGGATGTG ACCACATCGT AACTGATGAA AAAATGGCGG TTTCAGAATT TTTGAAATCA -AAAGGATATG AAGTCATTGA CTTTGGTACC TATGACCATA. CACGGACTCA CTACCCAATC TTTGGTAAAA AAGTAGGGGA TGTGGTACTG GTGTTGGTAT TTGGTTCGTG ATATGACAAC GGTTTTGGTG GTAAAATTAC AGCTGTA.ACT AGCGGTCAAG CAACAACGCT GTAAATAAAG AGCCCTTTAT GCTAAAGAAC CTGATCTTGG AGTATGTATC TTCCAGGTGT TCGTI'CTGCC AATTGAACGC TAACGTTATT TGGTGAATTG CTTATGTGTG ATATCATCGA AGC'N'TCATC 420 CATGCTGAAT ACAAACCAAC TGAAGAAAAC GAAAGTCACA ATGCTCAACA GATCGTGGAG AATACCACGA ATCCATCGAT ATTTCCTATC GGATGTAACC AAAACGGCrG TGGCGATTCT GTCTTGCTA
AACAGACGCA
CTAAGAGGT1G
CCTTGGATGA
GTGGTAAGGG
CTGGTTTAGT
AAAAAATTGA TrGCGAAAAT TGAACATGTT AACTTCTTTA CAGAATCCT TGAGAAATGG ACCTATGATT TTAACAGTCA CAATGAACCC GTTGAAGATT GATACTGTCA ATCGTGTGGT ACTCAATGTT ACCCGAGTAC TrTCAGAATT- GGGTGGCAAA CTTGGTGAGT ~rT'GGTTGA CT'rCTCAATT AAGGGAGAAA CTCGTAACTG 3: ACATATCGAT AATCAAGTAA AGAAAGAT TATCGCTATT CTCCACGGAG ACAACCAA INFORMATION FOR SEQ ID NO: 26 0* SEQUENCE CHARACTERISTICS: LENGTH: 3744 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263: CCGTTCAAAG TCTTCATAAG ATAGGTAATT TCAATCATGT TATAAAAGAG AATGTCAAGA ATCAAGAAAT CCAAACCTGC GTCTTTTGCC CATCATTTGT ATGCGACTGA GTAGACCGCA CTCGGGCTCA GATGAAGCTT AAGTCCAGAT AGTAGAGCAG GCAATGGTCT -CTTCCAAGAG TTTTCCAAAT AGGTCATAGT
ACTCGAAAGT
TTAAAACTCC
AAAATGATTG
TTCCAAGCTT
CAGGCAGATA
ATTTTTAGCT
ATCTGGCAGG
GTAGAACTCT
ACTTTCAATr
GTCTCCTTTC
CACAGTTCTT
TTTGTTTA.AT
CGCACGCAAC
TCTTCGACAG
AAACTAGAGC
TCCAAGCGAG
TCAATCTGGC
T'PCAAGGTCA
TCTTTCTGAC
TTTTTAGAGT
TTGCGCATGC-
TTTTATTTTT
ATTCGAATTC
GGTCTAATA.A
AGGAGGGCAC
TCGTTCTTGC TGGCATCTAT GCTAACTTTA TT'ETACTCCT TTTTTPTAAA ATCATCTTAA TCTTTTGTAG CGAGGCCAGT GTCTATCTTG ATGGCAACAC CCACCATCCT AGAAACTGCG GTAGAGATTT TTCTTCAGCC GACTTTGCTC GCTCTGTTGG GCCGATTGAA GTCAAACCAT CATAAATAGA AGAAAGTCCA AATAATTATA CTACTTTTCA GTGTGAATAA TGGGGGAATC AGAT'PTTATT TCATTT~CTCT
AGCTATGCATATAGTACTGA
AGACATGTCG AGGCGGCCAA TTAACGGGCA GTCTCTGCGT CACAAGATGA AGAATGCTGG, CAAGCTCTGT TTTTTAGTGG CTATTGTTTC AATTTCTAAC TCCTTATCAC ATCTATAGTT GCTTAGTTTA AAATAAGCAT TTTTAAACAA GGAGCATTAG ATTCCATTAA 1290 GAACTCTCTT CACGTGGGAA AGTTTTTGAT GTCGGCGTCA AACTTAGAGC GGAAAAAGCG GCCTTTGTAG AGGCTCATCC GCTTTGCCCT CCGTATGGGC AGTTTAGGG AACAAAAGCC AAAGATTTAC CAGTCCTATA TATGCAGGGG CTCCTAGAGG CGGACTAATG AGGTGGACCA CAAATTAGAG AATGAAAGAA TGATTTATAC TCATTTGCGT TATACAAAAT AG'rAAAATGC TTGGGTTCCC TAT'rCTATAA ATACTCGAAA GGAAATTTCA AGACCGAG'rA ACTCGGTTCA TTCTCTI-TT GGCTTGTTTA CGAAACGGTA GAAGATAAAG AGTTGCTGCG GGAGCTGTCA TGAGTCAGCA GTACTTGAAA AGTAGTTCTA GGTACGATAT AGAGTCGGCA AGTACATCTG TACAAGTGCA TCAGAATCAG TGTGGTAGGT TCACAAACAG TCGTAAGAAA CCAGCTAGTG TGCTAAGCGA CGCAAGCGTT TGCTGCTGN' TTTCTGGCA AAACAT'rGCT AAAAGTGAGA TGTTCCAATT TACTATAAAT TACGG'rTACG TATGTGAATC TGGATATTCT ATCTATAATT
AGTCGTCAAA
AGACGCT'r
AGTTTTAAAG
TGAGAAAGTT
TATTGACGAA
GGAGAAAGTC
AAAACTCAAT
TAATTATGCA
AAGCG.TAAGA
TTACGGGAAA
CAGA'rTAAGG TCTGAGTT'rC
ACGGGAATCG
TATGGCAAGA
GGTAGTTTTC TAATCAGATA TATTGATTCA TAAAAATAGG AATATAAACC AAAAATTAGC GTCTTrATAA AAXACTTATC TTTTT-T'r'r TAGCAAAAAT TTATAGTATG GTAATTTATT AAATATTTTT TAGACGTCAG AATTAATCA.A ATCAGGGAAG AGGTCTTGCG AGGTGGTGTT TAAGTCATrC AATTACTGGG TAAGTGGA.AC CGTTGCAACT AAACTGTAGA GAAAACGGAT CTACAAGTAA TTCAGCGAGT CATCTGAGTC AGCCTCA4%CC CAAGTACATC GGCTTCGACA CTGCCGCTAC AGAAGCAACT TTATAATATA TATATArATA ACCTCAAGTT TCTTGCTATT TATATCCATA CATGAAAATA AAGGGTGAAT ATAGAGAAAC CATTGGCTAC GGGCCTCGAC GATACTACTC AGGTCATGAC CTTGATATCC TCAAGGGGAT CAAACGAAGG TATTTACAAA GCTTTGGCAA CAAATGATAC TCAACTAGTT TGTCAGCTTC AGCGCTTCGA CCTCAGCAAG AGTA'rTTCTG CATCATCTAC GCTAAGAAGG TCGAAGAAGA AAGAAAGACG TAAACAAGGG TCCCTTTAGA AGAATTGAAA TTGCGGCCCG TTTTGATTGT TCATTmrAAA AAAGACGACC TTGATATTTT GGATAACCTA ACCGCTACCT CTATCGTCCT TTAGCGGACG GCGTTTTGAG 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700
ATTATGTAC
CAGTGGATTC
ATACGATTGT
CAAAAGTTTA
TGAAAGTGAC
CTAAAACAAA
CAGGTACTTC
ATCAGTTACA AATGTCAATC TCCAATCTTA CATCGAGCAA TTGCTGGCTT CTATAAAAAA AAATGGCGCC CCTGCAATTA ATGCAAGTCT TACAGGTGAA GGTGTAGATT CGGTATATCG AAATGATGGT TCAAAA'rTGA CCTTTACCTA
TGATCTTGGT
AACACAAACA
AATATATCAA GTATGCGTCC ATGTTAACCC TTGGCAGTGA AAAAATGGTA GACAGGTTCT TCTTGGTA-AA CCTTCAGGTG TAAAGAACTA CATTACTGAC ATCCTATAAT ACA'1CTACAA CCAAATGAAT GGTT'rTI-rG AATTACTGGA ACGGATACAT TGGAATTAAC TACTTCAATG ACPTCACAG TCTAAGTCAC AACAAGTGCG TCGGCTTCAG AGCTrCAGCA AGTACCAGTG GACAAGTGCC TCGGCT'rCAG GGCTTCAGCA AGTACCAGTG CACCTCAGCT TCTGAATCGG TGAATCGGCC TCAACCAGCG
TGACGACGCA
CTAAGAAAGG
CCTTTACATr
GTGGAGGAAA
TCTC.AGTAAG
CATCAACCAG
GGGTAGTGGG TATACTTGGG GAAATGGTC ATATGGATTA ACATCATCTT GGACTGTACC TACCCC 'rAC GCTGCTAGAA CAGATAGAAT GGTAGTTGAA TCTAGCACGA CCAGTCAGTC TGCTAGTCAA AGCGCCTCAG CrrCAGCATC TGCCTCGGCT TCAGCGTCAA CCAGTGCGTC CTTCAGTCTC AGCATCAACA CAAGCACATC AGCATCTGAA CTTCAGCTTC AGCATCAACC CCTCAACCAG CGCCTCGGCC CCTCAGCCTC AGCATCAACG AGTGCTTCAG CCTCAGCATC TCAGCGTCAA CCAGTGCTTC AGCGCCTCGG CCTCAGCAAG TCAGCAAGCA CCTCAGCTTC AG'rGCTTCGG CTTCAGCAAG 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3744 CACAAGCGCC TCGGGTTCAG CATCAACGAG TACGTCAG.CT TCAGCGTCAA AGCCTCAGCA TCAACAAGTG CGTCAGCTCA GCAAGTATCT CAGCGTCTGA ACGAGTGCGT CTGAGTCAGC ATCAACGAGT ACGTCAGCCT CAGCAAGCAC GAATCGGCCT CAACCAGTGC GTCACCTCAG CATCGACAAG CGCCTCAGCT
CCAGTGCTTC
ATCGGCATCA
CTCAGCTTCT
TCAGCAAGTA
TCTGAATCGG CCAGTGC 1 TC AGCCTCAGCG CATCAACCAG TGCGTCAGCC TCGACAAGTG CGTCGGCCTC AACCAGTGCA TCAGCAAGTA CTAGTGCATC GGCTTCAGCA TCAACCAGTG CCTCGGCTTC AGCGTCAAAC AGTG INFORMATION FOR SEQ ID NO: 264: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 795 base pairs (HB) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 264: CGATAAAGAG GCCTTGAGTA ATCTCAATTT GCAGATTGAA AATGGAGAGA TTATGGGCTT GA'PTGGTCAT AATGGGGCTG GAAAATCGAC CACTATAAAA TCCCTAGTCA GTATCATTTC ACCCAGCAGT GGTCGTATTT TGGTAGACGG TCAGGAGTTA TCGGAAAATC GCTTGGCTAT TAAACGAAAG ATTGGCTACG TAGCAGACTC GCCTGACTTA TTTTTACGCT TAACGGCCAA TGAATTTTGG GAATTGATCG CCTCATCCTA TGATCTGAGT AGATCTGACT TGGAGGCTAG 1292 TCTAGCTAGG CTATTGAACG TTTTTGATTT TGCTGAAAAT CGCTATCAGG TTAT'rGAAAC TCTTTCTCAC GGAATGCGTC AGAAAGTCTr TGTCATCGGA GCACTCTTGT CTGATCCCGA TATTrGGGTC TTGGATGAAC CCTTGACTGG Tr'rGGATCCC CAGGCTGCCT TTGATrTGAA ACAGATGATG AAGGAACATG CACAAAAAGG GAAGACAGTC TTG=rTCAA CTCATGTCCT AGAGGTGGCA GAGCAAGTCT GTGATCGGAT TGCCATTTTG AAAAAGGGGC ATT'rGAT'rTA T'rGTGGTAGT GTAGAGGACT TGAGAAAAGA TTACCCAGAC CAGTCTTTGG AAAGTATCTA CCTTAGTCTT GCTGGTAGAA AAGAGGAGGT TGCGGATGCG TCTCAAGGTC ATTAAAAAAT TAGTTGATAT CAATATCCTT TAT'rCATCTC AAGAAGCTAA TCTGGCTAAT CTACGAAAGA AGCAGGCTAA GAATC INFORMATION FOR SEQ ID NO: 265: SEQUENCE CHARACTERISTICS: LENGTH: 2231 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 360 420 480 540 600 660 720 780 795
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 265: TGGTAATGTG CTTGGCAGCw TCCTTGACAC TGCTACTACC ATT'rCCCATA GCGACCGACA TACCAACGCC AGCCAGCATT TGAGGTCAAA GCCATATTCT GATTGATGAC ATCCGATGCA CTGCCGCCT'r CTCAGATTCT TCAAGATCAT TATCTGAGTC ACCAAAAGCC ATGACTTGGT TTCCCAACTC GGCGAATGCC TTCTAATTTA GAATTTCCCT AAAGGATTGC TACGTGTCAA TTTCAAGTCT TCAAAATCAG TCTGGTGTCA TCAGCATCAA AACTTGGTAG ATAGGCTGAT TCATCAGGTG AAGCAGGTCC TCTTCCTTTT GGGGA.ACAAC CTTGCTGACC ATGCGATTAA AAGACTGACT CACCGTCCGA GTTAAAACAG AGGGAACGAA GCGACTAATT CGTTGGGAAA
AAGAACCCAG ACCAAAGGAC CAATCVCCGT GCCCTCTTTT TAGGGCTCGT GAACAAGACT CAAAATCCAG ATCCAAATCG CTACGCCAAC TAGTACCCCT AAACACTCTT GCGATTGTTG CCATCCTATC CCAATCTCCC ATGATTTTAG AACCCAACAT GGCATCCTTG GTCCCTAG;:G TTAGCATAGC TAATTAGATG GCGCAAATGT KACTTGGAAA CTGTCTTTAC TAAAGATATA CTGGCCATTA TAGGTTACCG TCCATCAATT CCTTAACAAA AAAAGGTCCT CGCCCTGTCG TGTTCTT1'GA CAATCTTAAT CGCATCCTTA GTGGATTTCA ACCAAGGTTC CATCGATATC AAAAAAAACA GCTTTGACTT CTTTTGTGAT ACAATGATTA TACCACATTT CAGAAAGAGT GAGTAAATCA TGCCTAAGAA AATCCTTGTT TITACATACGG GTGGAACTAT TTCCATGCAG 0 900 GCCGATGCTT CTGGCGCTGT TGTGACGAGT TCAGATAATC CCATGAACCA rG'rGTCCAAC CCACTrGAAG GAA'rCCAAGT CCACGCCTTG GACTTTTTTA ACCTTCCAAG rCCCCATATC AAAcccAAAc ATATGCTGGT CCrCTACCAG GGAGTGGTGA TCACACACGG AACCGATACT ATGGAAGT'C CCCATATGCC TATCGT'rCTA GTAGTGATGG TG'TTTATAAT TACCTAAGTG AAAATTAAAG AGGAAGCAGA TAAC1'ACGAT TTAGAGGAAA CAGCCTA'rrT ACAGGAGCCA TGCGTACtCC CTrTACGAGT GGCCAGCGAT
CCTTGATACC
AATGAGC1TCG
GACAGGGCTG
CTGACAAAGG
AAACACATAC
'rCATGAAACA AGTTTTGGTC GrrATGAACG GACTAA'rGTC AGCACCTrCC CGAAATCCTC TA CTTrCAAAA ATGAAATCCA CGCTGCCAAG TATGTCACCA AGACTCCAAC ACATGGCCCC CI'GGTCTCA .e ATCACATACA AGGI'TAGTC TTGATATGCT GGATTTAGAA ATATTCCCAA AGAAACGGCT CTCTGGTATC ACGATGCTTT GCGTACAGTT GCAAAAAGCA GCTTGAAACT CCTCATCGCC TGGAAGGCTA ATACTCTTCG ATGG tACTGA CT'rCGTCAGT TCAGTTCTAT CTACAACCTC ACCTCAAAAA CATGTTTTGA T'rTGAGCTGA CT'rCGTCAGT TCAGTTCTAT CTACAACCTC ACCTCA.AAAA CATGTTTTGA
CCTATCATCT
CACTTGGACG
CAGCTGAACC
CGGCTTATGC
GT'rTGATTAT
TCGTGTTCGC
TGGTATGACA
CCAAGCC?1'C
TTTGACCTTG
GATGAGCTGA
GGAGCTGGTA
CAAAAATTAG AAAGCCTrCT AACGGTATTG CCGAGCCTfGT GGCGT'rTTCT TTGTTAAAGA CTCAATGCCG GACTAACAGG AAAATCTCTG CAAACCACGT TTCATC'rACA ACCTCAAAA AAAAACATGT TTTGAGCTGA GCTGACTTCG TCAGTTCTAT GCAAAAAGGA ATTCCAGTCG TTATGCATAC CAGGGTGGGG AC'rCAACGCC CAAAAAGCTC ACAGGCTTTG AAAGACTATA CACGTCGCCT TACCGTATGT CATGTrTTGA GCTGACTTCG CTTCGTCAGT TCTATCTACA CTACAACCTC AAAAACATGT 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2231 TCTATCTACA ACCTCAAAAA CATGTTTTGA GCTGACTTCG AAAAACATGT T'rTGAGCTGA CTTCGTCAGk TCTATCTACA GCTGACTTCG TTAGTTTCAT CTACAACCTC AAAAACATG'r
TTTGAGCTGA C
INFORMATION FOR SEQ ID NO: 266: (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 1310 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 266:
GAGTCAAAGG
TCAGAAATTC
GCTTATGAGG
CGAGTT'rCCA
TCAGGAACGG
TACCAATCCC
CATTTGGTAT
TTCCTTCTTT
1294 CTCCGAGGTT GACTTTTTAC AAGGGGACAG GTGAATATTA TCTTCTPTGA AACAGAAGGG AGCAAGATCT ACGCTCATAA TTCGCCTCAA GCTCTATGAG TTGGAGTCTA TCTTGCCTCG AGTCAACGAT CGCAAACATC CGTCAGA'N-r ACTCAGTGGA GCACCATTTC CTTTrATCAG ACGCACAAGG AGGTTCATGT TCCTAAAAGA AAATCTAAGA AACATGAGGT AAAAAACA'rG TGTTTTATTG GTTTTAGCAG CTrGGATCTT GCTGCAAGGG GGATGGTAAA ATATGGCCTT TACTAGGTAT TGTTTTTTTT a CCATTGAGTC CATCCTTAGA CGTCATCTCA CTTCGGCAGT TTTTACAGGT TCATCATTGC AAATTACGCT TATGACTTGT TACCAGTTAC CAATCATTCT CTAGCATCTT GGTGGTACTT GGTGTTGGTT ATCTGACGCA TTCAAGTAAG AAAAAAAATG GTGGTACAAT GGGAAAAAAA CAGTCGTCAC GGATAAGGAA GTAGCGGGAC CTTCTATAAG CAAGATCAAG ATCTCGTAGA TGACCAAGTG TTGGGGATGC TAAAATCTAC TATGATAATG. CAGAGATGCT AGGTGATTTT ATATTGAAGT GGCCTTCGGG AATGCA.ACCG TCTATGTTCC ACAACACTGG TGAA.AGTAGA AACCTCCTTT GGTGCAGCTA AGGCTGACGC TCCTGTAGCC AAACCTTGAT TATCCGTGGA GATGTGGCTT TTGGGAAGTT GGAAATTGTC AAAAAAATCT TCACTTCAAC CATCAAAATA GACGTACTAA GAGTAGGAAA GCTCTGATTT CAGTTCTATG GTTGTTAGAC TTTAAAAAAT GAAATGCTGC TGTATATTTT TCGATATTTT GGCTTTTACG TTTGATGTAT CTATGTACTA GATGTAGTGT CAAATGCTTT TAAAAAACGG ATGATAT'rGG ACAGTTTTTT TGCTCAGGAA CCATGAAAGT CAGTACCTGG GTTTATGACA AGGGAGAATG TCTAGACCTC CCAGAAGGAA 3.20 CTATTTTAAT 180 CAAGTCCTT'r 240 CTCACGGCAT 300 AAAAAGAAAG 360 AATTI'GGAA 420 GCTTATAAGT 480 TTACTGGCGC 540 CTTATTTGGG 600 TTCTGGAATG 660 GTCGCTTT'rG 720 GAAGTCGCTT 780 GCAACT'rTAA 840 CGTGTAGATT 900 CCAACCAGCA 960 TACGTTAAAT 1020 TTGATGCCTT 1080 CTTTAAAAGT 1140 CAGCGTAGAT 1200 TGCCTTTAAT 1260 1310 INFORMATION FOR SEQ ID NO: 267: SEQUENCE CHARACTERISTICS: LENGTH: 5922 base pairs B) TYPE: nucleic acid STRANDEDNESS: double D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 267: ACTCTGATTT GATTGGAACG ACAGTCGGTG CCATTGCAGT TACTTCAAAC GTAACGACTT ATGTTGAGTC TGCTGCTGGT ATCGGTGCAG GTGGACGTAC TGGTTTGACA GCCTTGGTTG 'rAGCTATCTG TTTTGCGATT TCAAGCTTCT CGGCTACAGC TCCAA'rC'rG ATTATCGT'rG TCCATTGGGA TGATATGTCT GAAGCAGTTC TCAGCTACTC TATCACTCAA GGGATTGCAG ?rGTTAAAGG TCAAGTTAAA GATGTTCATG TCCTT-AACTA CATCAGCATG GCCTTATAAT TTAATACAaG 
GAGATAGGTG ATGAA-AGAGA CAGGCTGGAT TTCGTCTTT TTACTTGCCG CCTCTATTTT GACTTTAAAA GAAGTAGCCC TTAGCcCACT
GGATTATGAT
CTGCCTTCTT
'TGGTTCTT
TCATGATTTG
AGAATGACCC
AAAATATGTG
'rCC'rTTTATA
TGCTACAGTC
T'rrCAAT'rGT GGTTCTGGCT CTATTTATTA TGGGAGCTCG TTAATTT'rTC TrTTT'NrAGA GCTAAAGATT TGGCACGTTT TCTAGCGATC GTACCAACAG GCTTGGTAGC TTGAAAAATA CACATCTATC TTTATGGGAT GACTTACACT TTGACTAAGC GATTTTGGAT GCCTTGTTTA AGGGGGATTT CCCCCCTTTT GAAAGAATTG TTGAATCGTG TCAGGTTCCC CTAGTGGT'rA AGGGCTGATA GTTGCTGGCC TAAAACCAAG TTAGCTAGTT GGGCTTGAGT TATCTAGTTA GTCAAATGAG ACGACAACAG GTTGATTTCC AGTTTCTTCT TCGTGGGATT GTTCCTAAAA TACGATTGTG ?r'rGCT'rTAT AGGTATGTCG ACAGTTCTAT CTTGCTTCAC ATGATTGTTA GAGTCGGACA TTAGGAATTT CAGGGAAAGA TGAATGCAAT ATATGATGAA AATCCTTGAG CAGTTATTTG CGGAATTA'TC
TTGTCGGGTC
CTAACCAGTC
TGCTAGCCTT
AGATTTTCCG
TGCATCAACC
CTTGGACAC
ATGGGATTGC
CTGTTTAAAA
CGTGTCCATC
AAATATACTT GGTTCCATTT TATGCAACT TCAGATTAAT GATATGGTTC AAA-ATAGTTC GC'TTGCTrCCG ATTTGTGAGG AAATCTTGTG AGGCAAGGAG AACTTGGGAT TTGTAGTCGG AAGTAATTTA CCT'rCTT'rAT TGATTTATGG CTACAAGACC CAACGTTTGG AAATGTCGAT TTTCTGTTTG TTGGCTCTTG TGCTGATTAT 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 GTTTTTATCT AGGAACCGAC ?TTTTCTTTT TATGGTAAAA
CTCTTTCTAC
TAGAAAAATA
CATGCAAAGA
GGAGTGACCG ATATGTCAAG AATGTAACCC CAGACTCCT'r CAGCAGGCTC GTAAATTGAT ACTCGGCCGG GAAGTAGCTA A1'CAAAGCGA 'rTCGCAAGGA GTAGCAGAGG CT1GCTGGC GGTGATGAGA AAATGGCTTA AACCCAGTTA TGGCTCGACC CGTCAAACC1' TTACAGAAAA
TAAAGCCAAT
TTCGGACGGT GGTCAATTTI- TTGCTCTTGA GCAGGCGCTC AGCAGAAGGA GCCAGTATGC TAGATATCGG CGGAGAATCG TGTT~GAGATA GAAGAGGAAA TCCAGCGTGT 'rGTTCCAGTG AAGTC.ATGTC CTCATC'rCTA TTGATACTTG GAAGAGTCAA TGCTGGTGCC GATCTAGTCA ATGATATCAC TGG'rCTTATG TGTGGTAGCT GAAGCGAGAg CGAAAGTGGT CATCATGTT'r TCAGCATCCT AGTTCGCTT'A TCT'rCCCTCA TrrTTGGTT AGAGT'rAGCT GACT~rGAAA CATTGCCAAT CGAAGACTTIG
ATGGTGGCTT
AATATCCTGT
TTACGCGACC
AAGCgATTTG CI-rGGTTTCC
GGTGTAGAAG
1296 TCTTTGAACG AGCACTAGCG AGAGCGGCAG 'rGGATCCAGG AATTGGCTTT GGTCTGACCA TGGATAAACT ACATCAGAAG GGCI'ATCCAA TCA'rCAATAT CCTAGAGGAG AATGGTN'TG GAAATCGGGA CACGGC'rTCG GC1'CATGTAA TGGTGCGCGT GCATGACGTA GCTAGTCACA TCTGCCATTC GTCTGGCTGA TGAAGCGGAA AATTTAGATT GAAAGAAAT GAAAACAATC AGTGGATTGC 'rAACTACCGG CTTGGAACGA ATGGTGCAAC TGTTAGCTTTI GCGTGGCAAT CCTCCATATC GGAGGGACTA ACGGCAAGGG CTCGACTATT AGAAAAGCTA GGGTTGAGAG TTrGGCGTGTT TAGCTCGCCC CCAGATTAGC A'rCAATGGGG AATCGATC'TC AGAACCGAGG CTATCAGTCT TTGCTGGAGG GAGAAGCGGT CGCCAATTTA AAGCI'GGTAT TGCACCAGAA AGAAAGAAAA TC'TGCTTCTT TCTTTCTCGG AGTCGCGC AAGTCAATCC TGAGACAGAG C'rACTATCGC TGCGAGACAG GGATGGCAGT TGAAATTGCC TAAAACAATA TAAATAAGAT ACGGATCAAC CGCATTr'rGG CCCCATCTCA AACTCAAGGT GCTT'MTGA AAAAGATGCT TATCTCATTC ATTACACAGA CTAGAAGCTC TCATGGCAGA CAGGGCACAA CCGAGTTTGA CAAGTAGATG TGGCCATCAT TGTCAGCCCA TTTTGACAGG GACACCTTGG AGGTCATAGC GTAACAGGGC GTAT'rGCTCC GATGCGCCGA GACTTGCCTA GATTATCACA GCCCTGGCCT GGAAGTTGGC ATGGGTGGAC AATTACAACI' ATTGGCT1'GG AGAGCAGAAG GCAGGTATTA AGAAGCCTTG GC'TGTGATTG CGGGACAGAT TATCAGGTTC TACAAGTGCT GTCAGACAAG GAATGCTGGG ATGGCCATAG AGCAAGCAAT GATTrCTTG AATCGTGTCA AGAGATCCCT GGCCT'rGTTG GTAACCTTGC TTGTATCAAA ACCAAGGCCT -CGAGCTTACT CTAACACATT GGCAGCTAAG TCTAGAAATC ATGACTACTT TGCCTCAGAG TTTGGA'rAG TACCAATGTC ATC-ATGTGGC TCTACTTGGT TCAAACAAGG GATGCCCTTG ACCGCATTGC GGAAGGGAAA GTCATCAAGA AAGTGTCGTG GTCG.CTTcC-A GACTAGCCTG CTTTACTTGA TACTTTTTGT GTCAAGCCT1' GGAAGAAACA TGATGATTTT GGATGGAGCC AAGAACGTTT' TGCGGATTAT TGGAGGATAT GTTGGACTTG TTGCGGATAG TCGGGCGACG TCAGCTACCA AGATTGGCAT 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2B20 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 ACAGGGGAAG TCT'rrGACTA CTTGGTTTGT ACCAAATAGA CAAGAAGATG GTCGAGAGCT AGTTGGCCAG GGCGTTTGGA CACAATCCCC ATGCTATCAA CATAAGGAAA TCCTCTTCAC CTGGGAGCCA TGCCAGTTAC GATGAAAACG TGCTGAAAGA GATTCTAG AGCAGAATTT ACAGGTTCCT TGTATTTCTT AATGGATACA CAAAAGATTG GACAGATAAA AAAGAAGAGA AACAAACAGT TAGGATTGTC GAGCCAAGTG AGGGCCTATC 
TGA'rGGAGAG GAAGAACGAG AAGCGGC2'GT1 AAAAATGAT ATCGAGGCTG TAGGAGAGGA CGCTAATCGC GAGGGCTIGC 1297 AGGAAACACC TGCTCGTGTA GCCCGTATGT ATCAAGAGAT TTITTTCACGT CTTGGTCAAA CAGCAGAGGA ACATTTGTCA AAATCC'N'TG AAATATTGA CGATAATATG' GGTAAA AGGATATCTT 'r'r'CCATACC ATGTGTGAAC ACCACTCTT ACATTGCCTA CATTCCAGAT GGTCGTGTGG CAGGCT'rGTC AAGTTTATTC GAAAAAACCA CAAATTCAAG AACGTr'rGAA TGATGGACTA TCTAGGTGCT AAAGGAGCCT TTGTTGTCAT 'rGAGTATGCG TGGTGTTAGA AAACCAGGCA CTGCAACCTr TATTTGAAAC AGATAAGGAT CTCCGTGACC AAGCTTATCG AATCCGCTTC AAGCGGATTT TTCTAGAAAG GAATCATTAT ATTTGGAAAT GTTrTGCCTAT CATGGTCTTT TTCCTAGTGA TTGTCGTTTC AGCCATCCTA TCCTATGATA TGACCAAGGC
CAGCCTCTGT
GTGAAGATTT
C'TCTTGTCCA
TAGATACTTG
CCATTACGGA GAATTGTGTC AGCAGTGGAC GATTGAAACG GTAGCC'rATA AACTGGTGGA AGAAATGAAG TTGGAACTGA AAAAACCTTG CTCGGTAACC ATTCATCGCC GCAAGCAACG GCAATATGGG AGATAAACAA GCAAACTTGA GCATCCATAT TCTCAAAGAG TCCAGTGTCT AGGATAGCTT TGCCAATCAA GTGGTTGAGG TAGAAACCTT GTTAGCCATT GAGTCAGAGC CTCGTTTGAT TGATTTGGAC TTGCTCTTTG TCATATTGCC TCATCCTTAC ATAGCGGAAC TGCGCCTCAT TTTATCCATC CGATATTAAA GAAAAAATAG AAAAACTCTA GTTTTCAGTT TCTTCGAAAA TCTCTTCAAA CCACGTCAGC GGCTAGCTTC CTAGTTTGCT CTTTGATTTT TGGGAGGAGG ATAGTTTCTC TACCGTCCAT AGGGTCGAAA GGATGGTTAA AGTCAAAATC GTGGAAGGTA ATCTT'rCCTT GGTTATTAAG AAAGACATTT TTTAAGAAAT GGTCGA'rGAT
AGCAAGCCAT
TAGCGACGGA
TGGAAACCTG
TGGGACGGGT
TGGAGGACCA
GCCTI'TTGT
ACAACCGATC
ACTTGCA.ACT
GTCGCCTTAC
CATTGAGTAT
GTCTAAAACC
AATGGCTGTA
CAATTGAAAC
ATACCAAAAA
GCCAITTTAT GG1'AGAGCGC TAAGC'rAGCC CGTACGGTTG 'rATCGAAGTG GCCGATGCCT TGAGGCGGAA CATATGTGTA GACGACACTA GC'rCGTGGTC TTTAATQGGCG CTATAAAAAG GGATCAACTG CAGATTAAGG GAAAGAATTG GGGCAGAAAT AGCTACAGAC TTGGATT'rAA GACTTGGTTT CAGGAAACGA ACGTACCTTT GAGTTTTATC GGCACCGGTG CATTTGTCAC AGCCTTTATC GCCCTAGGAA TGACAA.ACTG CGAGCTCGTG GCCTTGGGGT GGAGTGGAGC GCTACCAGCA CAAGACTTGT GAGAGAAGTG CATTGGGGAC GATCCTTTAT ACAGACGACC CCTTGAGTCt TACAGGAAAT CGCAACTTGT ATGATGCTTT GAAGGC'rAGA GTTT'TTA'rAC CGTACTCAAG TACAGCTTGC TAAAATAGGT CATTTTCTTC AGTACTCTTG GGGGATAACG GGGAGGTGTT GACTTGAAAA TCGAGTTCTT CTTCCAATTC GAGTCAATGA TGTCATCAGG 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5340 5400 CAAGCTGGTA ACAATACCAA TCTGTCCCCT TTCTTTrCCC ATTrTTGA.A AAGGATTTTA CGAAGTTAAT AGGTAAA'PTC AGTTACCGAA AAATATTTTT TAGGCGGTAT TGTI'rACCCC GAAGGAACAC CCATCACCGT 1298 AACTAGCAGA TCGCATGTGG TTATCATACA GCAAATAGGA GTTACAGGGA GAAATAGGGA CCAAATTAAC TTGATrATAT CATATCTATT GACTr'rTAGG GTATTGGTAA AAGCCATATC TTAAAAATCA AGAAAAGGTG
AAAA.ATTCCT
AACTTTCAGT
GGTAAAATTT
AAAAATCTAC
TACTTTGAGA
GGTATGATAG
ATT'rGAAAGG CCCCGGAACC TTCCAAATAC TTTTCGATGG AAACAAAAAT CGAACTATAT ATAGGAGAAA TCATGAACAA AACA.ACATTT ATGGCTAAAC CAGGCCAAGT TGAACGTAAA TGGTACGTAG TTGACGCAAC TGATGTACCA CTTGGACGTC TTTCTGCAGT AGTTGCTAGC GT INFORMATION FOR SEQ ID NO: 268: SEQUENCE CHARACTERISTICS: LENGTH: 1988 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 268: TAACTATCTA CGATGAGCTG TTGTGATTCT CATTAGTTCC CCTTTCCCAA GAGGCATAGG GGTGCGCATA ATAGATGTGC TCCTCAGAAA ATATATCAAA CAAGCGATTG AATTCCGTTC CATTATCTGC CGTGATGGAA AGAATCTTGT GTTGTTNTAA GATGAGTTTT AGAGCCTGAT TGACCACCTC AGCACTTTTA TTTGGAATCA ATCGGATGAT CTGATGTCTA CTCTTTCGAT CCGTCAAGAC AATCAAGCAG TAGTT'TTTCG ATCTCGTAAG TAGAACCGTA TCAATCTCAT AATGCCCATT CTCCAAGCGA AGATTGATAG CTTCAGGCCG CTGTTCGATG GATTOACCAG CAGGTTTAAA G'N'GGTGCTA GCCTGTTTCT TAAGCGCTTT TCCTTTTCTA GGGTAAAGCA AATCCTGCTT GCTTAACCCC CCACGTTA AC CCCTTTAGCC GGAGAATCTT TTCCTTTAGT .TGTTTTCATA AGACTGTTGA CAAGACATTG TCGGACTGTC AATTTTCCAT GATGAATCCA ATAGTAAATG ATAACCATCA TTTCAGGCGA AAATTTTTGG TCCTTGGTCA AGCTTGATTT CTTGACCGAG GCGTAGTCGG CAGAATAAAC CTCTTTGAAG CCACGCTTGA TTTCAGTGTG ATAGTTTGAG
GTTGAAATTC
TTATGATAGT
CGCTTGCGAT
CGCCCTI'TTC
GAGCTTTTCC
AAGTAGAGAG GCAATICTC TATTTGATTT TCCTTCTTTT TTCCATCTTT CGATTAAGCG ACGGCTATCG ATTGTCAAAT GTTTGGCTTT TGTAGTATAA TTGTCTTGCA TCTCTGTGCC TI"FCTTGTGT TTGTGGTTGA ACAACAAGTA TAACACAGAG GTGCTTTCTT ATGCCTACAA GAGCTTTCAT TATrTCCATT TTTACTGAAG CTAGCAAGTC GGGAATCTCG GATTAATAGA CAACATCACT TCTTTTC?1'A TATTGTCCTC AATATCGTCT TAGTTACTCA TATACATCAG ACCTCCATAC TCACCTTACA GTCT'rGCAAA GAAAAAACI 1299 TTCTTTTGGA ?rCACTCTA T TACCTGTAA. ATTTAA'rGAA TAGAGAG?1'T
ACCTGATAAG
AZAAAGACAC
TCCACCTCCA
GATTCTI-rG
CCTTGTAGCC
TrAGTrTAAA
TCTTGATTCC
AATI'TCTAGG
TACTTATGTG
GTAA'rAATAT
ATGTTTCTGA
ATTTCGATTC
TCGCAGTTCC
TTCTGAAAAA CTTGTGTATA AGCAACACAA AATCCGAGAG TAAATTG?1r AAAATATCAA TAATT TGGG GCTACGATTA TTATAACTGG TATrTATCGA CGAGCCTCTC TTTGTATTAT CTT'rGCCTAA. TGTAGAGACA
CTGGTAAGTA TTGACAAAGG CAAAACAACA TTGCAATAGT TTTGAACCGA TrCCTTGCCC GATTTCTAGC CAANTCCAA AAGTCTCTGC ATCTTCGACA TAAAGATTAA GTGGCTCACT AACACGAATC AGATTCCCTA 'rTTCTTGCGA GTATAG'rGAA ATGAAATAAA ACATGCGCAA CAATGTCTTA GAAATCAAAG TGTACTATTT AATAAAAATC AAAGAGCAAA CTAGGAAACT GGTTGTAGAT AGAAcTGACG AAGTCAGCTC TATCAAACCT GCCAGGACAT TTCAGCCTCT TCTCTTTTTG TTTATGTGAT TCCTTATTTT ATCGATTAAG GAATTTAATC TAACTTCAAT GCACTATACA AGCCGCAGGT TGCTCAJAAAC AAAACATAGT TTTGAGGTTG
TAAAAGTCCG
TTAGCTTCAC
TCTTTTACAA
GTGCCTGGAA
TTCTGCCTG
ACAAATACTC
TGCCCTTTTC
AACGGTTATA
CCAATCTAAA
TAATTTCTAA
TCTAATACTC
ACTGTTTTGA
TAGATGAAAC
960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 1988 TGACGAAGTC GGCTCAAAAC A'rGG'TTTTGA GGTTGTAGAT GAAACTGACG AAGTCAGCTC
AAAACAGG
INFORMATION FOR SEQ ID NO: 269: SEQUENCE CHARACTERISTICS: LENGTH: 709 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 269: CCGGATATTT GTTTTATGTA ATTTTCTTGC AAGTTTCTTC TTAGTAGCTT GTCAGTCAGG TTCTAATGGT TCTCAGTCTG CTGTGGATGC TATCAAACAA AAAGGGAAAT TAGTTGTGGC AACCAGTCCT GACTATGCAC CcTT7T~iGAATT TCAATCATTG GTTGATGGAA AGAACCAGGT AGTCGGTGCA GACATCGACA TGGCTCAGGC TATCGCTGAT GAAC?1'GGGG ?1'AAGTTGGA AATCTCAAGC ATGAGTTTTG ACAATGTTI' AGCAGTTGCA GGAATTAGTG CTACTGACGA ATACTATGAA AACAAGATrA GTTTCTTGGT TTTAACTAGC CTAGAAAGrG CTAATATTGC GGTCAAGGAA CAATTGCCAA AAGTTCAATT 1300 GACCAGTCTT CAAACTGGTA AGGCTGACCT GAGAAAAGAA GTCTr'rGATT T'rTCAATCCC TCGTAAGGCT GATGTGGAAA AGCCCAAAAA GGGACTGTTC AACTrCCCTA ACTAATATGG
AATACAAGGA
CAGAATCAAT
GTGAAGCAGT
TTGCACTTAG
TGAAGGACGG
CAATGAATTG CAGGCTGGAA AAATAGATGC TGTrCATATG GATGAGCCTC TTATGCTGCT AAAAACGCTG GCTTAGCTGT CGCAACTGTC AGCTTGAAG CGACGCCAAT GCCGyTGC'rC TTAGAAaATA GTGATGATTT GAAAGAAGT INFORMATION FOR SEQ ID NO: 270: SEQUENCE CHARACTERISTICS: LENGTH: 1680 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear k 000 0 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 270: TATAAAATGT TAAGTTAAAT GATTTCAAAA TTCAGAAAGG TTTTATTTTA ACAGGAGTGA AACTATAGTG TTTCTAAATT TGTGATGGGG CTATTCTAGC TTTAGAAACC T'rCAAAAATT ACTTGGAAGA GTATGAAAGC.ATTTAGTTTA TAGGAATTCT ATATTTATGA AGACGGGGTG TTCGATAG'IT AGTAT'rGTTC GTCAGTTGTA TAGAAAGTGT TCGAATTTTT TTAAGTGATT GTGCTTTATG ATATAATGTT CTTAATGAAT TTTCAGAAAG CAPLATTTCTA CTCTTCGACC TCGACCACAC TCTTCT'rGAT GGCTTTGACC CAACTTCTAA AAGAAGAAGG AGTTGCGGAT TTACGTTCCT ATGAACAAGG CTCTCTGGAA AGACTTGGAG GATTGCTTTA TGCAGTTCCT GTGAATCAAT CAAAACTGAT AAAATTTAAG GCAATCAATT AGGTCTAGAA TTACATATAT TATTCTGAAA GATTTGAGCT AAATTAGTTA ATTGTATGAG GAAAACCTCA AATTGTTCTA TTTGATGCTG CTGAGGATGT ATTCAGGCTT ATAAAGATTiA CTGAAGAAAA TCAGTAAACA AGAGCTGGTT AACACGCGCT TTTCTCGTTT ATTTGCTCAT TTTGGACAGG AAAAAGACGG -TAGTTTTCTT GCCCAGCGTT ACCAATTTTA CCTCGCCCAG CAGGGACAAA CACTATCGGG CGCTCATGAT CTCTTGGACA GCCTCATTGA GCGTGATTAT AACTTGTATG CTGCGACAAA TGGCATTACT GCCATTCAGA CAGGACGTTT GGCTCAATCT GGTCTAGCAC CTTATTTCAA TCAAGTCTTT ATCTCAGAAC AGTTGCAAAC TCAAAAGCCG GATGCTCTTT TTTATGAAAA GATTGGCCAG CAAATI'GCTG GATTTAGTAA AGAAAAGACG CTGATGATTG GAGATTCTCT 1301 AACCGCCGAC ATTCAAGGTG GCAATAATGC GGGGATTGAC ACTATCTGGT ATAATCCTCA I TCACCTCGAA AATCACACAC AAGCCCAGCC GACITACGAA GTCTAT'TCTT ACCAAGACTT AAAGA'rCACA ITTTAAAGGA GACGAGCTAA GCTGGATTGT TTAGATAAAA TGACTACAAA AAAGCTAATA TAA'PGAAGCT GTTAATTATG TcAATCTCTT TTTAAGTTTT GAAATCTCCA TCITTAGTCA TAATCTTAGA GTAATAGGAA CAATATGAAT CGTAPATGGCT AGATCATGTA GCAAAATATG A'rAT-rCTGA TTACTATTGA AGAGTACATT AATTACATC TGACACTTGT TTGATAAGAA AAATTTCTCT AAGAAGGGGA TTATATTGGG ATATATTTCC GAATTATCT'r GTATGGGAGA TTTTCCTCAT CAKI-rAGCAA ATAGTATTTA GGCGATTC'AA TTrrTACTTG AGAAGGGATT CACAAGTAGA ACTAATCGAA 
AATATAGCCT GACTTTTTTG ATATATACCT CCTACGAACA AAAAGTTAAT AATATTAAAG AGTAT'rATCC 0 4*
S S S. 0 TTrAAAAAGA GCGAT'rTTAC ACCAAGAGAA TGCATTGTAT TTT~CGATrT TGACGACTTT TTAGAAAAAA ATTATTrAAA GACTATATGG CAAGTTTCTA INFORMATION FOR SEQ ID NO: 271: SEQUENCE CHARACTERISTICS: LENGTH: 598 base pairs B) TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear
TTTCTAA~T
AAGAAACTCC
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 271: AGCTCGGTAC GTAGTATnhTG TGGTGCATAA. ATGAGTGAAA AGAGGATAGA GAGGATGAGG CCGATAAGAA CACCGGTAGC TGCATCGTGA AATACTTGTT TTTTCATAGT TCTAATTTCT CCTTGATGGT TTTTAGATAA CGGCGTGAAG AGTAGGTGAA GCTT-rCGTTT TTCAAGAAAA TTTCTACCAG ACCGTrTGGC GTGAgCTTGA GGTGAGAGAT GGAATCGATA TTGATGATTT CTGATTGGGA AATTrGGATA AAATTGGTTG GCAAGAGTTT AAGAACCTGA TAGAGTCGCA AATCAATGCT GTAGGTCTGA CTCGCGGTTT CTGCTAGAAC CTTCCGATTC TCGATATAGA AGCGCTGAAT CTrGCCAATC TCAACTAGAT AGACCTGATC ATCGATTT?1' CCTTTGATTT TTTCTCTTTG GTCCAGATTT TCTGCGAACT CGATGACTTT CTGGACTTTT TCGGTTTCTT GAGGTGCTTG GACAATCAGC TTTTCCTCCT CGTAAGTCTC ACTAATCTGT AGTTCTACTT TCATAGTI' CTCTCCTTTT CAGTTATACA AGGTTGTGAT CACTrCCTGT ATATCCGG INFORMATION FOR SEQ ID NO: 272: 1302 SEQUENCE CHARACTERISTICS: LENGTH: 1099 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 272:
I C 0 CCAGCAAATC AATAACTGCA ATTGCTATAA GACCTCCCTC TI'TATCTAA CTTCATTCTA GATAAAAA'rA GCAGAAGGGA GATTCTCTTA TTCTATTATC TAACTrCTTC ATCATTCCAG TAAAAGATGT ATTTACCGAT ATTGGCGAAG ACGAAATTGT AAATCAACCA AAAGCCCCAC TTGGCAATGA GGGACAGTGC AAAGGCAATA CCCCCATATC CGATATAGTT GGTCACAAAG ATGGCCGCCA AT'NTrACCTG TTTT'rGGCTC TCCCACTTCT TTATAGCAAA GGTATAAATG GCCTTATTTC CAAGGATATA ATCAATAGCA TAATTTCCCC ATTTGCTTAA TTTCCCCGTG TTGGTTACGG AAATCAATCC AAAGGGTACA TTATCGAGGT GTGAGTTGAG GTAACCAGAT CCGAAGAGGT CAAACTATTT AGATGTAGCA ACTACCTTTC TTTTTTTCAA ATATTCTCCC AATGGATrCT ATAGAGN'TT TTCATGACAA CTCCAAAAGA ATGGGAGTTA CAACTAAAAT AGTTGGCTAG TATTCTTTAT TTGAGT'rTCC ACAAATAAAG CTCCGATTGC AT'rGAGGATA TTTCCTTGAA TACCAGCTTT TGTCAGCTGA TGAGTTGTTA GTTTTAATGC ATTCAAAGCA GTTGTTACGT AGGCAAGGAG ATTCATCTTG
GCAAAGAGGA
ATI'TGGTTGG
AGGXAGGTGA
CCGGACAAAA
AAACGAGTGG
AGAGCTGTCC
GCAATCGCAA
AAAATTTTTA
ACCAAATGAA
AGGCGATGAT GGAAATGATG GTCTGCCTTC TTGCGAAGCT CGGGATAGGT AATGATGGCC TGGTATTAAC AATACCAAAG ACAACATGGA AATCCCAACG ATGATCCCCA GTCTACAAAT TCCCAACGAC CAAAGCAACC GTGATrTTTT CATAGGTTAA AGTAAAATAA AATGATAGAA GGTAAATCCA CTATAGATAT 120 180 240 300 360 420 480 540 600 660 720 7.80 840 900 960 1020 1080 1099 ATA.AAACCCT GAAAATAAAG GTTCTATAAT ATTTGTAGTG TATGGAGCCT ATTT~TATTGT AGAAAAAAAG TCCCATATGA CCTATAATGA AAAGCGACAA AACAACTC AT TAGAAAGAT INFORMATION FOR SEQ ID, NO: 273: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 2723 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 273: 1303 CTGGGATTCA CGTGAAAAGG AACCCCAGAG AGTAGCCAGG TGTACrGCTA GAACAGTGAG TGAAATTGAA TATTACCATA GAGAGTCAAC CCAGATAGCT CAGGCTTTAG TTGAAAATCA 120 AGCTCGTATC GAGGGAATCT ATAAATACTT TAGCCT'rAGC ATGCCAGACT ATTT'rTACTG 180 GCAATTAGAG CGGAAAGCTT CGCCTTATAT ATCAGTC'TCT CTGTATGAAA ATGTTGATGA 240 CCTCTATGTI' CGAAATGATT TTGTAACTGG GGTGGCCZATT GCTTTCAAG ATTACAAGGA 300 AGTCTATGTr TCTACTAAAG ACAAACGTAG GkkAGAAAA.A.ATCAGGGCTG AGGAT'rTCAA 360 ACCAGCAGGA AATAGTTTTG CCATTCCAGT GTCAGATCCA GTGTCAGATC AAGACTTAGG 420 AGTGATTTAC ATCTCCTTGG ATCCTGCTGT TT'rATACCAT GCCATTGATA ATACTAGAGG 480 TCATACTCCG ATGGCAGTAA CAGTGACCTC ACCTTTTGAT ACGGAGAITTT TTCATATGGG 540 TGAGACAGTT GATAAGGAGA GTGAAAATTG GCTAGTTGGC TTAACTTCTC ATGGATATCA 600 GGTTCAGGTG GCAG'TTCCTA AAAACI'TGT TTT1ACAAGGA ACAGTGACTA GCTCTGCTTT 660 GATTGTGGGT TTGAGCCTTC TCTTTATTGT CATTCTTTAT. 
CTGACTr'rGA GGCAGACTTTI 720 *TGCTAATTAC CAAAAGCAGG TAGTGGATTT AGTAGAATCC ATTCAAGTCA TTGCTCAAGG 780 CGAAGAGGGG CGTCGGATTG ACATT'rCCGA GAAAGATCAG GAATTACTCC TALATCGCGGA 840 GACGACCAAT GATATGTTGG ATCGATTGGA AAAGAATATC CATGATATTT ACCAGTTAGA 900 *GCT'rAGTCAA A.AAGATGCCA ATATGCGAGC CTTGCAGGCG CAALATCAATC CTCATTTTAT 960 *GTATAATACG CTGGAGTTCT TGCGCATGTA TGCAGTTATG CAGAGTCAAG ATGAGTTGGC 1020 AGATATCATT TATGAATTCA GTAGTCTCT'r GCGTAACAAT ATTTCCGACC AAAGAGAGAC 1080 CCTCCTCAAA CAGGAATTAG AAT'NTrGCCG TAAATACAGC TATCTCTGCA TGGTTCGCTA 1140 *TCCCAAGTCC ATTGCCTATG GTTTCAA.GAT AGATCCAGAG TTAGAGAATA TGAACATTCC 1200 CAAGTTTACC TTGCAACCGC TGGTAGAAAA CTATTTCGCG CATGGTGTTG ACCACAGGCG 1260 ***GACAGATAAT GTGATTAGCA TCAAGGCTCT TAAACAGGAT GGTTTT~GTGG AAAT'rTTGGT 1320 *GGTCGATAAT GGTAGAGGAA TGTCGGCTGA AAAGTTGGCA AATATCCGAG AAAAATTAAG 1380 TCAGAGATAT TTTGAACACC AAGCCAGCTA CAGTGATCAA AGGCAGTCTA TCGGGATTGT 1440 *CAATGTACAC GAGCGTTTTG TGCTCTATTT TGGAGACCGC TATGCCATTA CTATAGAGTC 1500 *TGCAGAGCAA GCCGGTGTI'C AGTATCGTAT TACAATTCAA GATGAGTAGA AAGGGAGAAA 1560 ATGTATAAAG TATTATTAGT AGATGATGAC TACATGGTGA CAGAAGGTCT GAAGCGTTTG 1620 AT'rCCCTTTG ATAAGTGGGA TATGGAGGTC GTCGCAACAG CCAGTCATGC CGATGAAGCT 1680 CTAGAATATG TTCAGGAAAA TCCTGTCGAT GTCATCATTT CCGATGTCAA TATGCCAGAC 1740
AAAACAGGGC
CTGCTCTCAG
GACTATTTGG
GGTC.AGCTCG
GGATTTGT1TA 1304 TTGATATGAT TCGGGAGATG AAAGAGATCT GTTATCAGGA GTTTGATTAT GTAAAAAGAG TCAAGCCTGT TGATAAGGTA GAGCTGGGAA GCGAGAGAGG GAAGAAAAGT CAGACTCTTA GTTATTAGG GGATAAGGAG AATTGGTGGA TACCAGATGC TGCCTATATC CAATGAACCT TAGTGTGGTG ATCTGCTGGA GAAGATTGCA GTCAAGAATT AGACGAGGCT TAGGTCTATC CAAGGAAAA ACTGGCAGAT TTTCATT'rCT CTCCTTATCA AGAACACTr'r GTTCTGTAAA TCTGCAGCAG TTATCATTCA GGGAAATCTC TTCTTGAAAA TACACCTCGT CAAGGTTCCT TCACCATCC CTACTATGTC TTGGGTCAAG GGCCACCCCC TAGATGGTTT AGTCGTTACA CCTTTTGAAG GA.ACGCTGGA AGCTGAATGC TGAGAAAACC CTCT'rFPACG TCTGAGAGTC TCTTrGCCrA TTACGAACCG ATTTATAGGG AATCAAATCG TAGAAGAGTT AAATCTCTTG GAGAAGGTAG 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2723 0 GTT'PCGATTA. CTAAACAGCT TTTTATCCAG CATCTCAAAG CTGATGATAT GACGGACATT GATGAATTGG TTTCTTATAT CAAGGAAACT AATGAAAATG TGGTCAGTGT GCTGGAAGTC CTCAAGGATA TCAGTAAGGC CCTCTTTATC CGTGAAACCG ATTCGACCTT TGCAGAGTTA CAGCTCTTGC T'PTCAAC'IAG TGA INFORM4ATION FOR SEQ ID NO: 27 SEQUENCE CHARACTERISTICS LENGTH: 836 base pa TYPE: nucleic acid C) STRANDEDNESS: doubi TOPOLOGY: linear TTTGTCATGG ATGTTTTCCA TTTATTTGAA GTCAAAACCA TTCATGCTAT TCAATCCTTC CTGATCAGCT TTTTCGGTCA ATTGGTCGTG ATTACCAAAA AATCCTGTCT ATCTAGGGCA CTAAACAAAC AACGTATTAA
ATACCGTATG
AGAGCTTI'CC
GTTGATTAAG
GGCTGCCCAG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 274: CCGCAGTTT TTTAAACCGT ATATAAGTAT AGCATAGTCA AAAAAAGAAT GCAAGATTTT TGCAAACTTT TTTAAAATTT TTCGTA.ATTT TTCTTTTAAA GT'rCTACTGT CAGGACTTGA 9CT'rGCTITAA CAACCTGTTC TCCGGCGATA. TAAACATCAT CTACATCACT AGATTTAACT GCATAAACCA GGTGAGACAG CATATTTTCC TGAGG?1'GGA GATGAATTTT CCCTTGTGGT TGAATGACCA GAAAATCTGC TTGCTTGCCG ACTTCCAGAC TTCCTATCTG ATTTTCCATT CCAAGGACCT TAGCCCCTTC GATTGTCAGT ACCTTGAGAG CTGTTTCGAT TGGAAACTGG CTGGCATCCC CACT'N'TCAT CTTCTGAAGA AGAGC'TGCAG TCCTTCCTTC CTCAAACATA 420 a. a.
a a. a 1305 TCTAGATTGT TATTGGAAGC AACCGAGTCA GTCGCAATTC CGACTGCI'AC TCCCGCTT TGGAGCTGGA TAATTGGAGC AATrCCTGA'r GCCAGITGA GGTTACTGAT AGGATTGTGG GCGATAGCnA CTrGAGAAGA TGCCAAGCGT TCAATTrCTC TCTCGTrTAA TTCGACCCCG TGAGCAAATA CGGACGGATG ATCITAAATAA CCCAGTTCTT CAAGAAAAGC AAGGGGGCGT TTGCCGTATC GTrTGAGGAT AATTCCTGAC TCCTCCTI'GG TCTCCGCCAC ATGGACATGG AGCGGAATAT TTAGCTCT'N' TGCCAT'rTCC AAACTCGCTT CCAGCAAGTC TCTACTGCAG CTATACGGAG AATGAGGTGC TACCATAACC TTGAAATrG GATTrTTTATA TTYTAA INFORMATION FOR SEQ ID NO: 275: SEQUENCE CHARACTERISTICS: LENGTH: 2335 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 275: ATTTTATTTC ACT'rTTTAGG TGGTCTGGGG CTATrCTTAT ATAGCrxTCAA GACCATGGGA GACGGTT1'AC AACAAGCTGC TGGAGATCGC CTGGGTTTTT ACATrGACAA ATATACTAGT AATCCTTTGT TTGGAGTTCT GGTTGGTAT GGGATGACTG CTCTAATTCA GTCTAGTTCT GGTGTAACAO TTATCACAGT CGGCCTGGTC AGTGCCGGTC TCTrAACCTT ACGTCAGGCT ATCGGGATTG TCATGGGTGC TAATATTGGG ACAACTGTCA CATCCTTTCT CATCGGTTTI AAATTAGGTA ACTATGCCCT ACCTATGCI'C TTTATCGGTG CCGTCTGTCT TTTTTTACG AAAAATCGGA CAGTCAATAA TATCGGACGC ATCCTCTTTG GTGTCGGTGG TATCrTITT GCCCTCAATC TCATGAGCGG CGCAATGGCT CCACTCAAGG A'TTTACAGGT CTTrAAGGAC TATATGATTG AGCTAAGTAA GAATCCTGTT TrGGGTGTCT TTGTCGGTAC TGGCTTGACC TTGCTAATTC AAGCT'rCTTC GGCTACCATr GGGATTTTAC AAAACCTCTA CGCCGGCAAT CTAATTGATC TACAGGGAGC T'rTGCCAGTT CTATTTGGTG ACAATATCGG GACAACCATT ACAGCCATCA TTGCCTCTTT GTTGCCTTCA ACGTI'ATCGG CTGATTCATT GGTTTGAAGC CACGGAACCT TTAATATTAC TAC'rTTGTAA CCAAGATTAT AGGGGCTAAT ATTOCAGCTA AACGGGTAGC AGGAGCTCAT AACAGTTGTC TGCGTATTT TTCTAGI'CC T'PTTACTGTC TACGCTAAAT CTAGCACCGG AAATGACCAT CGCCTTTGCT CAACACCATT GTCCAATTTC CATTTATCGG AGCTCTGGCT TCCTGGAGAG GACGAGGTTG 'rCAAATACGA ACCCTTATAT 1306 CTTGATGAAC ATTTCATCAA ACAGGCCCCA TCTATCGCTC TAGGAAATGC TAAGAAAGAG, CTCTTGCACT TAGGAAACTA CGCrGCTA.AA GCCTTTGACC TTTCCTATAA GTACATCATT GACTTGGATG AAAAAGTTGC TGAAAAAGGG GATGAGCAAT TAACACGTTA TCTCATTGCC AGTGAAGTGC TTACCAATAT CCTTGAT'rCC ACGGAGGCTC TACTCAATCT GACTGACTAT 
GCCGCCTTGA AAGAATTAGA GGAAG~TTAC CTGGATAGTG TGGAAAACAA TGATATTGAA GCAATCAATA AGATAGAACC TGTTCTCAGA GAATGTTCAA CACAAGCTGG GGTCAACTTr' TCAGACCACG CTATGAACCT TGCTGAAAAG CATAAAACCG AAOAAGCAAT TAACACCATC CTTTCAAGC!G AAGCTCTCAG CCAAAAAGAA TCCCGTGAT'r TGGAACGGAT TGGAGACCAC
CTTCAACGGA
CGCCAAACTA
AAAGCACGCA
AAAACCCACA
AAAATGTTGA ATTTTCTIGAT GTGACTr'rAT CAAAGATGCT GTCTTGTAGA ACGTCA1'GAA TCAAACGCCT CAACAAAGGC AGCTATCCAT CATAA'rTGGA AACTGAAAGA GTATTCTGCA ATCCAGAATC CTTGCCTTAG TTTTTACTAG ACAAAAAAGA AGAAAGGAAA CTCACATGAA CAACTATCTT TCGATGGAGG CTTTTTTCCC AGTTGAAACT CGCCGCTACT GTCGT'rATTC
TGGCTTTTTA
GATATATAGT
CTTAGATCCT
GTTTCCCCTT
ATCGACATCA TCTCACACTA GTTTTTGCAG AACAAATCTA CTTTrTCCTA AGCAAGACTA CCCCAATTAT TCACCCCAAA GGATGGTTTC TTTT'rTCACC TATGGTATAA GTGTAGAAAA
CACTCGTGTA
AGAACCAAGA
GGATGAATGA
TCTAAAAACC
CAATGGGTGT
AAACACAAAA
GTCTTTTTAC
TT'rTCAGGAA
GAATGACCAA
TCAACTGTTA
1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2335 CAGTTTACCA AATCATCACT TCCAAAACAA TCATTTAACC CAGTATGGTG GTCTTATCT'r AAAAGAGCGG ATrTCTAAGT ATTTAGTAAC GGATTCAGAT ATCCTTGTCC AGTTCCTCT ACAGGTTATG GAACGGACTA TGCTTGTAAA GAATTGTCAG CTGATGCCTA CTrTCCAAAA TTGTTGGAAG GAGGGCAGCT TGCTTCACAG CCAACCTTAT CCCG'rTTTCT T'rCCAGAACT GACGAGGAAA CAGTCCATAG TTTGCGATGC CTCAACCTTG AATgGkCGAA TTCTTTTTAc AGTTTCACCA GCTAAACCAA CTCATTGTAG ATATCGATTC TACCCAT'rTC ACAAC INFORMATION FOR SEQ ID NO: 276: SEQUENCE CHARACTERISTICS: LENGTH: 752 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 276: CGGATTCACT GTTGTTGACT AATCAATAAC ACAGTAGAAA ATC'TCACAGC AGTCTATTAG 1307 TTGCTI-1'CA TACTAGGCAA GTGACTGAGG CTTGTACTrG GGCCGTAGAA GAGAAAAATA GTAGACTGAA AACCCGCAAG GACGTGGGAG ATGAAAATCG ATGAACCAC TACAAGGAG AGCAAACAAC 'PTCGTGCTGC TCTTGAGAAA ATCGACAGCA GAAGCTGTAG CACTTIGCAAA AGAAACTAAC T'TTGCAAAAT GCTTACAACT TGAACATCGA CGTTAAAAAA GCTGACCAAC ?I'GCCAAACG GTACTGGTAA AACTTCACGT GTTCT rGTTr GAAGAAGCAA AAGCTGCTGG TGCAGACTrT GTTGGTGA.AG AACGACGGTT GGTTGGACTT CGACGTAGtT ATCGCTACAC GGACGTCTTG GACGTGTCCT TGGACCACGT AACTTGATGC GTAACAATGG ATGTrGGCA.A AGCGGTrGAA GAGTCTAAAG GCTGACCGTG CAGGTAACGT TCAAGCAATC AT INFORMATION FOR SEQ 10 NO: 277: Wi SEQUENCE CHARACTERISTICS: LENGTH: 2643 base pairs TYPE: nucleic acid STRANOEDNESS: double D TOPOLOGY: linear GGTACAGCAA GGGAGCTTAA ACTTCATCAT TTCGAGA.AGT AATAGAAAAT GGCTAAAAAA CAAAAGCATA CAGTGTAGAA TTGATGCAAC TGTAGAAGTT A.AATCCGTGG AGCAATGGTA TCGCACGTGG TGCAAAAGCT ATGACCTTGT TGCTAAA.ATC CTGATATGAT GGCTCTTGTT CAAACCCTAA AACTGGTACT GTGGTAAAAT CACTTACCGT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 277: GTCAACATTG ATTTCAAGGC AATGAAATAA TAACAGGACG AGAAGTAGAG GTGTACTATIP CATTTGAACG ATTTCAAATC GAAGACATTT AGAGAAATAC CCTTAAAGCC TTGCTACCTA GLATGAAGA TTGCTGTCTT ATCAAAGATT GAAAAGACTC TGTTTGCTTT CTATCTCCCC AATTGATCGG GACAGTCAAA CTAGTTTCAA TCTACTATAT CTTCTTrTTTG GTAAAGATTC TGTCTATATC 
TCTATTTTCA AGCCTTGCTC CTGTTTCTGG CTAGCCTGAT TTCTGGATA CTTCCAAGGC TTGAAGTGTC TTTrTCATAA
TCGATTTCTA
TPTTCGTACAG
TGAGCTCTTT
AATGCTAAAC
GGGTTGATAA
TGTATAATAA
ACAATGTTTT
GTGCTTCAAC
GATTTGCCTC
TAACAAATTT
AAAATCTCCC
TCCCATCCAT TGTTCTTGAA AGGATTTGCC TAGGGAGTTG AGCGTTI'CTGTGCTCACCT TTTCTTCA.AA ACGAATTGTC TCTATGTTTC TCCA'PTATAC TATTTCTCCC: ATTTTTTACG AATCCCACAA ACTCTTGTTC AGTAGAAAAG GAATCCTTG4G GACCACTGGC ATACAAATTG ATCTTTTCCT CACCACCTTA AATAGATAAG TATGATTGAT 1308 TTrTAT'TTrT TTCTCGTCGG GAGCATTC'rA GCTTCCTTTC TTGGTTTGGT TTTCCAGAGC AATCCATTAT CAGTTCAGCC AGTCACTGCG ATTCCTGTCA CGTCCCT'rAG ATTTGATTCC GATTCTCTCA CAGGTCTTCA ATCGCTTTCG
CATTGACCGT
GACTCCCTTG
CTGTCGCTAC
TGCAAAGTTC
CTGCTTTACT
ACCTTGGGTA
CAGCTAATCC
GGAATTTTGG
TCT'rGTGCTC
ACGGGTATCC
GCTA'rCCTGT CTGGTATGCC CTCTT'rGAAT TAAGCTTAGG ACTCCTCT CTTGGGGA'rG GCTCTCCTTG GGGCAAGTCG TCeAATCAC CGCTGIr TCTACGACTT TCACCATCAG GAATATCCCT TACTGGTCTG GATGACTTTC TAATAGCTTC CTCTGGCTGG AATCTGGTCA TGGTCTCCTT CCTCATACTT CTCATTTTAT CGATATCCGC ATCGGTGCAG GGGATTTCCT CTTTTTAGCT TCGTCTTTAG CGTAACGGAG TTACTGATCT TGATTCAGTT CGCTTCTGCG TGGCCTT'rCT CCTGCAAAAG AAAAAGGAAA GACTTCCTTT CGTGCCTTTC a a a. a.
a a a CTCTTACT'rG CTACr'TGTtT ATTTCTGCCA TATATCCTTC GTACAAAGGG ATGATGTTAT AAATTTCATT TCGACTGAAA CATTATAGTC GCTAGAC'rGA TTTGCT'rGGT TTTTTAACAA TTGCGCAATT TGACGATAAA AGCTCGATAC TCTAAAGCCT ATCAGGATTG C'rGACACGC'r TGGCAATAAA AAACATGATG GATTATTTTT GG'rAAGCTAC TGCTTGTCTG ATGAAATTAr TTCACAGTTA AATTATAAA'r CAAATCGATC TGTTCTTCTA TCTTCTTGAT ATATTTCGCT TATAAAC'rGT AAACGAATAC CTAGATGATT ACTCAAAACG ACGTCCAGAA AAATTTGATC ATCCAAGGGT TCAATCATTT AGTAAGAA'rG TTCTTTGGA GTCAATAATC GTATCGAAAC ATTCAAATCC GACTTCAATA
ATAAAATCCA
TATTrT=TT
ACTGATCAAA
TTTGTTTAGA
TACTCTTTAC
TGTAACCTTT
CTAACTTAAA
AAATATAACT
720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 1GCCAACCCT
CAAAATAATT
CTCTTCAAAT
TGCTTCTTGC
ATTrCATATCT TTATATTTAT GTAAAAGAAT ATGTCCTAGC TCGACGTACA GATGAT'rTAT TCGTTCCTAA CACAATATAA TGCGCTATAA GCATCACT GGCCATTAAT TAATCGTTCC TTCTAATftTA TAAAGCAAA'r CA1'GATTATC TTTTGAAATA AAGAGCCAAT TCCTCAATGG ATTC'rCCCTT ATGATAAGAT CTCATGAATT ATAATATTAG GTATAATTAC AAAACTTTCA TACCTTATGT AAATACATAG TTTGAATATC TATTrGTTTTC TCTAAAGGCA ATTACAGAAG AATCAAATCG AATGCTCTCT TTGACTAAAA ACTCTTCTTT TCCAAACGAT CGCCATCTTC TCATGAGCTA AGTCAAAATT GGTCTTCCCA ATTTTGACCA ACGATATAGA TGCCTGAACG; CCTAATTTTT CCCTGGCATA TCACTCACTA CATTACTTAG AAATAATCAA TCAAACTATC CGTGTTGCTA GGTCTGCATT TCTTCCTGTT CAAAATAAGT TAAATCAACA TGAAATTGGT GT'rGGACTCA AACTGCCAAA TGGCCAAATG CATTT'rGGTT GATAATTTAG GTTTCGTTTC TGGCTTGTTrC CGTTAAATTA ATTC'rCTGAG CTAATTCTGC 1309 TCTACTTAAA CCATTTAACA GCCGTAATTC T'rTCAATACC CGACCAT'rAA ACATTTACAT ACTCCTTACT ACTTTTGACC TTCTTGTTT-T TCTATT-CTTG GAATAATrTC AAAATCTTCT G?1'TCCGATA ATTCTGAAAA ATTAGGAATA TCTTGATATT TAGCTTCTTC GAAATGGTAC
GGG
INFORMATION FOR SEQ ID NO: 278: SEQUENCE CHARACTERISTICS: (A) LENGTH: 582 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 278: 2520 2580 2640 2643
9 TGACCAGTGG CAAAATGGCT GCAAGCCTAC GTAGCCTTGG TCCAGAGGAG GCCTATGAAG TCTAGTCTTT CACCGTATTG CTT CTTAGAG GGCTTGATTG TGAAAACAAG GTTATGCAAC GCCAGTAGAT GGCGAACGCT GTAAAAATCC TCTACTCCTC TTGTATGGAA TTTGGGTTCA .ATAGAAGAAG TTGTwAGTCA ATCCAAATGC AGATGTTATT ATCGATGATA TCATCTCAGG AAGAGGGAGA ACTGCTAGCC TATGCTGCTG TGACCAAGAG CTATTTATGA GGGAAACTGG CAAGCTGGAG AGTCAGAGTA CTGTGGCAGC AGATGTGCAG GGAAaAGGAG TTGCTCAAAC AAGGTTTrGA TTATCTTGAT TT-TCGCTCAG ATACGCATGC ATATTWTGA AAAAC'N'GGT TTTAAACAAG TCGGTAAGAT TGGCCTATCA AAAATTAAAG AAATAATGCA AAAGAAGTAT ACCAATTGGT ATTCTATCAC TTGTAGCTGA TGACCATTAT GGAGCAGAAG CATTTTGAGA GGGGACTAGG AGATGAAACG TCCTATTTTA GACCCAGTTA TT INFORMATION FOR SEQ ID NO: 279: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 554 base pairs CS3) TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID, NO: 279: CCCAAGCTAC TAAGAGACTA AAACTTGCTA GAGA.AGCAAG AGAA.AGTGTG AATCTTTTTA ATTTCATGAT GAATTTCCTT TCTGCTACCA ATTTAGAGAA AT'rTTCTCTA ACCAGCAATT CCCCTAGTA'r AACAAGTTCA AAAAATGGAG TCAATTTATC TGCTCACGGT CCAGCAGGTA 1310 GCCCCGTACT TCTGAGATAA AATAGAGAGA CCCTGTAACG TGCCCTI'TCT TCAAAATCGC TGATAAATTC TCGGTAAGAA CACATCCCTT TCGTCCAAAG CCCCCTGATA GTCAA.AGCCG AGGCAATNTT TCAGTCAGAT AACCCAACAT CCCTTGATAA AAAGAGGATT TGAGGTCGAT AGCCTTCCTG CrCTTTTTCT AGTCAAGGCA GGGAGGTTAT GAGCACCATC CAAATAAATC GCGAsCAGCC CAAT INFORMATION FOR SEQ ID NO: 280: SEQUENCE CHARACTERISTICS: LENGTH: 766 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear AACAGCAAGT CTTGAGCGTC GAAACTATAT CGTAACCTGT GTCACCTTGA GTTCCACCTG TCCTTACGTr TCAAGGATCC TTGATAAACT CAGCCAAGCG TGTGGGCGAA TACGCTCCAA 9e *5 05 9 *9 9.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 280: CCGGTTTTTC AAATGAATTT CTTGGTrGTG GCTAAAAAAT ATGCTACACT ATCAATATGA AAATTTTAAT CCCAACAGCA AAAGAAATGA ACACAGACTT CCCAAGTATC GAGGCAATTC CTTTAAAACC AGAAAGTCAG GCCGTGCTTG ATGCCI'GGC TCTCTATTCT GCCAGTCAAT TGGAGAGTTT CTACAAGGTA TCAGCTGAGA AAGCGGCGGA. AGAATTTCAA AATATCCAAG CTTTGAAAAG GCAAACTGCT CAACACTATC CAGCCTTGAA ACTTrTTGAT GGGCTTATGT ACCGCAACAT TAAGAGAGAT AAGCTGACCG AGGCGGAACA AGATTATCTT GAAAATCATG TTTTCATTAC CTCGGCTTTG TACGGTGTTG TTCCAGTCTT GTCACCCATG GCTCCTCACC GTTTGGATTT TTTGATGAAA TTAAAAGTCG CTGGTAAGAC TTTGAAGAGC CATTGGAAG 55 S *5*S
S. .5 CAGCCTATGA TGAAACTCTG AAGAAGGAAG AAGTGATTTT TTGAGACTGT ATTTTCTAAG GAAATCAGAG CAAAGATGGT ATAGAGGCGG TCAGCTGAAG ATTCACTCAA CTATCTCCAA TAACAGCTTT AATAGAAAAT CAAGTACAAA CTGTGGGGGa CTGGATTTGT T'rACCGAGAA GATTTGTCAC AACCACAGGG INFORMATION FOR SEQ ID NO: 281: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 901 base pairs TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear CTCTCTCTTG TCATCAGAGT GACCTTCAAA TTCATGGAGG GAAAGCGCGC GGGGCCTTTC AGCACGTCGC TTGAACTTTG
GGATGG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 281: CCGGCCACGG TTCCATCCAA CTTCACAGGT GTGCACTTGA TTGTGTATGT AATTGTCACT AACGGTAGAA TTrCACCTAT CCCTCCTATC TGCTCGCAGT ACCCGCAGAC TTTCTGAAAG AAGAAGATAA CCTACTTATC CGTTGCTATG ATTATACTAA AG'N'TCTACT TTITrGCAAA TAGATTTTTA AATTTTTGGC TAATTGTCTG AATCAGGGTC GGAAGTTTGA CGACCTTGTC ATTGCCTAGT TTTTCGCGTG CAATT'rTGAG AATGGCACCT GAGTCTTrTG GAATTTrCCT TTGTCTGTAA AGACTTCGAA GTGGCGGCTG ATTTTGCGTC
AAGCAAAGAG
CAGTGACATT
GGCTCCAATC TGATTGATAT GGCTCCAAGG TGGGTAAAAT TCCAAAGCCT GATCTCCGAC GAGGTAGGAA GTGCCTGTCG TAGTCT'rCCT TACTTTCTCC TGGTC'rTTAT TGACAACAAA ATGTTGATTT TGTTTTCAGA TTCATTTTTT CACCAACAAT ACTTTAGCCT TT-GAACCAA CGAGAACGGA GAATGATAGA
TACTGAGGAG
AAAAAAGGCA
CATAATAGAA
TAGAGCGCGT
CATAATGATA
CTGACGCAGG
AAGAwCGTCG AATCTGGATA AATTGTTCGA CATTGACATC AAGGAATTTC CCAACTTTCC CAGCGATAGA TACTGTT'ITG TTAAGTGATT GGGCCATGCT TTGTAGAGGG CTrTAATTGC TGCTTTCTCT ACTTCACTAG AACCTTGAGA CATCATCTGG GTCGCAGTAG CAGTCACTCC GATATGGCTC GAAAGGTCGT GTTCGATTTC TGCATGATCT ATTTCTTrCTT CCTTGATGGG AGTTAGTTGG ATACCTGT'rG GCATATGTTC CCAACCGATG
T
INFORMATION FOR SEQ ID NO: 282: SEQUENCE CHARACTERISTICS: LENGTH: 1765 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 282: CCCTGTTACG TGGATAATAG GGTAAGACTG CTCAGGATTT CCTAACAAAT CCACCGCTTG CTGCATTCGA CCCAAACCTG ATCGAAAATT CAAACCAATC CGACTATGGA GCCATTC'N'C TACTTCAAAC ATACACATCT CCTTGACAAA AGTCCAATCA ATTATCGCAT TAAAGTATGG TTACTrAATAA AAACAAGGCC AGGATTTTCG TCCCGACCTC TTACCTGGTr AGCTAATAAC TAGCTACTAT GAATGTGAAT ATGGGCTAAA AACATCCACT GGACGTTCCA ACTCTTCCCC 1312 ATTTCTGGGA GTTGGGGTAA AAATGTTCAC 'rGGACGTTCC AACTCTTCCC CATCTGGG 360
AGTTGGGCTG
TrCAATCATG
TCCATCTTCG
TTCTGCGTTA
TGTCAAAAGG
ACGTTCI'CG
TCTATCTAAA
CTTATTGCGA
GCGTTCATCC
TGACTAGCTT
ATACAGTCTC
TTCCATTCGT
TTTTCGATGA
ACTGGTACTA
ATTTCAAACA
TGGTCGTGGT
TAATTTTGTA
CTGATATCTG
TGATAGTCTA
CTACGCGCGG
CCAGACTGTA TCACTCCTCC CTTCTGAGTC TTCTGGGATT ATAAAGCI'GT TGAAGACTrC GGTTGCAATT CGCCTTCTGT
ATGAGTAAGC
GAAGAACATA
AGGTTTCATT
TATGATCGTG
AAATCAGCTG
TTGGANTCA ACTTGTCCGT CTrCGTCTTC GTTTTTACCA AATTCTTCTT T TCCATCAAT TCCTTGCTCA TCTACTAGTG TGATTAGTTC TGACATAGCC TCGCCTTTAT ATTAAAATTT AGCTGCTAAC TTATCAATGA CTTTCTTGCG CTTGTTCAAT CAACATGCGC CTGGTAAACC AAAAAACTCT TCCACTTGTA TTGTTCATGT AATCGTTCCA CCTTGTAACT ATCAACCAAT TGTTCT'rCAT TTATCTGGAT GATTTCAAGC
TCCTTAACGC
CCTTGAGCYG
ATCGCCACCC CTACCGT1'TT TGAACCGACG TCCAATCCCA TCGACTCCTT GTCCTTTGAG GTAGTAGCGA ACCAAT'rCCT TACTTACGGA TTTGATTTCG TGCATTATITA TAACGAGGAA AATACGTAAC CTACGATTTG GTTAATTGGG TTGTAaCCCT ACATCTGTCA AAGTTTCGCT AATTTCTTTT rTATTGGAAT GTTTCTTCAG TAAATCCCAT TCTAACACCC TCTTTCCTTA AATTCCTTAC CTTCTACAAT TCAGGCAGTC TATTTATTTG CGCCATTTGC CAATCTATCT GAAATATATT TGCTTGGTTC AACCAATATT CTTCAGATGT TCCAACTGGG AAGCCTrCTT CAAAACTAGT CGTTGTTTGA AGTTCCGTTG CGCTCAATAG CTGCCAATTT ACGAGCTTCA ATGATAGACT TATCCTTCTC TTTGAGTTrC CTCCACTCCA TGTTG INFORMATION FOR SEQ ID NO: 283: SEQUENCE CHARACTERISTICS: LENGTH: 1346 base pairs TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear TCAGCAGCCA CTGTrGTCAA TCTAGCTTTG CTCCGTAGCT TTTTAGGCAA GCCCACTACA GGTCAAAACC AAATTGGCCT TAAAACCAAG CGGATCGCTA TAATTCTCAT AGGTTATAGA CAACGATTTC ATCACGCTCA CGTAGGCAGG GTCTCCACTC TATCGTTCAA CGAAGCATAA CGTCCAATTT AAAACGTACT GAATAGTACC ATTATAGCAT GATT'rTCTAT TGTTCTGTCG ATTTTTFCAAA AGATTTTCCA GACATCCAGA ACTTGAAAAT TTTTGTTTCA AGTTTGAAAC CTCCGCTrrCA AGAAGAGCTT 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1765 1313 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 283: CT'rATCCATT CAC~rTG TCTG?1'ATC TATAAATCTT ACTCCTAAGT ATACCACA'1r TGCCCCTAGA TGTGAACGAG ACAATGTAAC AAAATCAAGG AGAACGCT1C
GAGGTCTGGA
GTTTCCTATC
TCTATCCGTC
CTGGAAT'rAC
GAACAAGGGC
GACGATTTCC
TGAAAAATGG TATCGAGTCT AACTGAGCCT TGACTTTCAC GGCACGAACA ATACCTCTAT AACATCAAGA CCTAGAAATC GACTCTGTGT CAATGAAACC
TAGACATTGC
ATGAAGAAAC
GGACGATr'rC
TGCAGCAAGG
GCCAAGCCTC
GAGGTTACCG
TTGATTGGCC
CTAAGACAGT
CTAGTACTGA
AGACAAGCCA
CAAGAAGGAA AAAAAAGGGT AAAGCAAGTA CAAAGAGGTC CGACGGGTAG TCGCCTGCCT ACACCAT'rCA ACGAGCCCTG AGAGTGGCTA CTATGTATTA ACGAACATGC CAGTGCCTAT GAGAAAACTA CCTTCAAC CCATT~CACAA ACTCCTCTTT CTTCTGGAAC CCAACAAGCC AGGAAATCTT GGTGGAACAG TACTATGACA ATCAAGAAGG AT'rAGAAGAC GAGCAAGCTC TCTACTGCAA GGCTAACCAA TTGTr'IATCC TCTCTCAAAT ATCCTTTCCT CCAACCTACC ATCGGATGAA TCGCCTCTTG GAACGAGGCA TTGATGGGAT TGACTTGGAG ATTAACTTTT TCTACACCAT TCCCCGATI-r CAAGACAAAC GATCTATTCT TAACTTAGCT GATTATCTGG GTGATTTGGA CTCCAAGAAG ATTGCACAGG GGCTGGACTA TCAAACGATT GAGCTGGAAG GCCACTTCAA A.ACAGGAAAA CAC'TATCCCC TGGGACATTC CTATTCTGAG GCCAAGTATG ATGTCTATAT CGTAGAGGAC GGCCAAACCT TCCACTATCT TGATACAGAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1346 GAGCGTGTCA TTTATATCAA GTCCTTCTCG ACCAGCCTTT r'rCCTGCCCT TCGTATTACA GCACTCAT'rC TTCCAAATGC TATCAAAGAA GCAT'TCTGG CCTACAAAAA TATCCTAGAC TACGACAGCA ACCTCATTAT GCAAAAGGCC CTGTCACTCT ATA'rrGACAG TCAATTGT'TT GAAAAAAATC GTTTIGGCTCG CTTGACCAAT CATGAATCTT ACCAAAAACA AATCGAGGAA AGGATAACTA AAACACCTTG TCCCCTTCCT CAT'rATTCCC TACACGATGG yTTATTGCTA GACCTGAGAC AGTATCCTAA AATCGCCAGT CTCAAACACA GTCAACTGGG cTTGGACTTC TTTGAAGAGG CCTATT'rAhG CACCTG INFORMATION FOR SEQ ID NO: 284: Wi SEQUENCE CHARACT~ERISTICS: LENGTH: 900 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1314 (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 284: CTATATTCAG AATATGCCAA ATATTTAGAA AACTCCCCCA CCCTTTATTA TTAATTTGCT TGGATTGACA AGAATGGA.AA TATCCTTGGA TTGTTAAACC CTTGAAGGGG AATCGGAGCA GAGAATATAT TAAATTTTTA TGTCTTCGGT TCTATACTTA GTGCTAGAAG AAGAATTAGG GATACACACA GCCAAGGAAA TACAAGCCTA AAAATTTAAT AAGGTTGATG AAAAAATTAG TATGAAGAAT TTATTGATTA TATAATTTTG GTGTGCTTTT GAAAATTTAA TTTCCTATGG
AAATTCGGAA
AAGAATTAAT
TAAATTAATC
TATTTTTTCC
TTTGGTTTT1A
GCAACGGTAC
TGGTATAAAT
TTTAAGAAAG
AATAATTATC
TCTCTAGTAT
GAGATAAATT
AAATATTTTA
TTGCGGAGGG TCATTTGAC ATTTTTCTAG AATIGGCC TAGAGAATAA AGAATACGAG TTTATTTAGA AGATTTAATC CATTGCGTGA AAAAGGTTTA TAACATTGTT TGACAAGGAA TAACAAATAT CCCGTTTTAC TTTTATAGAA ATTTTATCAA
GCTAAGGGGG
AACTGTTTTG
AATCAATAAC
GATAAGAATA
TCTACCTCTA
AGCATTTATA
AGATATGCCT
AAATTAAATG
ATACTCTTCT
TCACTAAATA
CCTCGAACTA
GAGCAAAAGA
TGAGGCA.KAT
ATTTAGAAAA
ATATAAAATT
TTGATGACGC
CTAT'rGCTGA
ATCGGAGTCT
TGATTTTAGT
TGGAAAGGGT
GAAAATTGTT
GTATATCCGA
TTGCT'rATTC GGATCACAGC AAAgITTTACC TGAATATTAT TATTTATTTA ATGGGAGTGA TATACATTTT GTAATAATAG AC'rTTGAAAC AATGTTACGG INFORMATION FOR SEQ ID NO: 285: SEQUENCE CHARACTERISTICS: LENGTH: 862 base pairs TYPE: nucleic acid STRANDEDNESS: double D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 285: TTAT'rTAGCA GAGGCAGTTT GCAGATTATT GGTTTAGAAA
AGGAAAAGCC
TGGTCTTTAC
GCACCTGACG
CCTTTACTTA
CACATGGGAT
GGATATCTAT
TTGGTACACT
CATCTGGCTA
GATTTACAGA
GAGGACTTGG
ATTCGAGAAG
GAGTTGGGGG
TAAATGTGAA GGATITTGGTC TCCTATCTCA AACGGATACA TGATTCAAGC ACAAGAGGGT TTCTTTTGCC GACACGAAAG TTCCTCTTGT TGGCGGTGCA AGGGAAATGG CATTGAACTC ATGGACGTAT TATCGGGGTG AAAGAGTAGA GCCTTTTATC AGTCAAACAG TTTTTTATC A GAGGTCGTTC TGGGA CTTGG GGAGAAGTAA GGGAACATTA.
GCTTTGGCGG ATGT CTT GAA GATCACGGTT ACAGTGAGGC TATCGAGATA AGCCAGTTTC ACTGAAGTCC TTGCGGCTCA CTAGCAGAGG GTACGAGAAT 1315 GGGGCATATT CATCTTTCTG TCAAGGATAG TCGAAAGTCC GTrAGGGCTC GAGGArAAAT TCAGTGTGCC TAGTGCTAGT CCATCATCAT TTAGCAGTCA ACGAATGGGG AGGAAAAGGT CCTACCAGGT TTAGCCTACT ATGTCATCGA AGTCGCACAT TGCCCAACGA GCACAAGAAG TTGACGCACC AA'rCAAATGG AATCACAGAC TCAGATGGCA TCGTGACCCG TATTCGTTTA
AGACAGTTT
TGGATCGCAG
CTGGATCCGC
AAAGAAGAAC
ATGACATCGA
GCTAGATAGA
ATCAAACGGT
CTGGGGACTA
GTAAACAAGT
TGTTAACGAT
TCCAATTGGA
TGGTATGTGA
TGAAGGTAGA GCATCAA'IrG TA 862 INFORMATION FOR SEQ ID NO: 286: SEQUENCE CHARACTERISTICS: LENGTH: 650 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 286: TCGTTTACAA GATCGCTAAA ATGCATCTCA TGATCGCGAC CACGAATTCC AAGATAGCAC GCGCTACCTC AATCATAGAT AGTTCACTTT TTTCTTGCCC AGCAA.ATACT TCTAATTCCA 120 AAGCGTTTCT CCTCATTrAT ACTACTATCG CCAGAGCGAA CAGACTCTGA CCTCATT'ITA. 180 TCATTTACTC TTTAT'rrTAC GATAATTTTG CGGAATAGTC AAAGGTrAAG GGGGAGAAAG 240 TGGCAGGATT AGACTAAT'rC CAATATAAAA CTCATTCCTT T'TTCTGTTGC TCCATTTTCC 300 ACAAATCCAA GCGACTTGAA ACACCTCCTA GAAGCATGAT TGTAGGTGTA GATTTTC TTG 360 ACTCTCAATT CTrTCCATCC TTTTACTCGA GCCA.AT-rCAA TCAAAGCACT TAGAATCTTT 420 TTTCCAAGTC CTCGATGTTG GTAAGCGGAA TTCCCAATCA CAATGGGGAG ATTATCCTGA 480 GATAGTGTAA TATCCCCAAT TGGAAACCAT TCTCCCTTCT CCTTGACTTC AATCCAAAAA 540 AGCTCACCAT GCCGATyCAr ATAGGAATAC ATGGCTTCCA AGGTCGcTtG ACTGTAAGGA 600 AGCT'PCACCC CATCTACGAG GtAAcCAAGT TCACATCCGT GATACCAAGC 650 INFORMATION FOR SEQ ID NO: 287: SEQUENCE CHARACTERISTICS: LENGTH: 1119 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1316 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 287: GATAGCAATC CGCTTCAGAA ACTTCTCGCT TACCrCTAAC TCCGATCGCT AG'T'TGGGAG AAGATACTTC CATTCTCATA CTATCTGTTG GCITGCAGG CTGTAAAAAC AACTTTTCTC 120 TTGCTACTTC CTGAAAATCT GAATCTTGCA GTTCTTTGCT TTCAAAATAG TCCTGTACTC 180 GC 'CCACATC AAAATrCCCA GCTAAAGACA GAGACATGTT TACAGGTTTG TAAAACTTTG 240 TAAAATTTTC TTGCAAATTA GTTAGATTGA TTTGGGAAAT GGACTCCTCA CTTCCAACTA 300 TATCAGTTGC TAAAGG'rGTA CCAGGATACA. 
AATTCGCTAA AGTI'GAAAAG AATAAACACG 360 AATCTGGATC ATCTTGGTAC ATTCTCGTT C'TTGCTGAAT AATATCCTGC TCTGTCAGAA 420 TGGAAGCTTC AGTAAAGTGT GCTGATGTTA CCAAT'TCATC AAGTAAATCT AAATTTTCTA 480 AAAAATAATC CGTTGCTGA.A AAAAGATAGT TTGTTTTTGT AAAGCTTGTA AAGGCATTAC 540 TATCTGCACC TAGACTCGTA AAAGCCGACA TCAAATCACT AGAATCTPTCT CTCTCAAATA 600 ATTTATGTTC AAGAAAATGA GCAATTCCTC CAGGATATTG TTTTACATCT CCGTCAACTT 660 *CTGTGACAAA CGTATCTACC GAACCAAACT GTACAGTGAC ACTCCCGTAA ACCTCTTTAA 720 *ATTCCTTTTT AGGCAAAAGA GCAACTGTCA ATCCGTTGGC CAAACGAGTT CGATAAACCA 780 .TTTCTTTTAC AGCTGGATAG TATTTTTCI' CAAAAACAAC CT'rTGTCATT CTATTCCTTC 840 *CATAAAGTAA ATCGCTTGTA GTTTCACATT ATrAGCTACT CTACAAATAG CATCTTTGTC 900 *AATTTGTTCA AGCTTTGCAA TCCAAC'rrrT AAAGTCTGCT GAAGATTTTC CAAATAAGGC 960 ATTTTGATAA GCACGTTCAA TCAATGAAGA ATGATTATCT TGAGAAAGTA ACAACGACCA 1020 ACGAATCATT TCCTTGGTCT GATTTAACTC AAACTCTGTA AA.AAAACCTT TTTTTAAATC 1080 *AAGCCGT'rGA TTATTCATCA ATTTACGAGC CTGGTTACG 1119 INFORMATION FOR SEQ ID NO: 288: SEQUENCE CHARACTERISTICS: LENGTH: 540 base pairs TYPE: nucleic acid **0e~o STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 288: o ACGCCCTCGC GGGGACATGA CGAATTCCCC GTTCATCACG AAGGCCGCCG AGGAGTGGGG GGTGCCGTCC AAGTCAAAAG CGGCCCCACA TCGATTCAGT TCCCCGACGA ACAGCCCTTT 120 CCCCCAGCGT TCCTGGCTTT GCAACCGTTT CACAACAGCC TCGTAAAGTA GGCCGGACAA 180 GGCAGACGGA CTCCAAAGGA GTTCTTCCAT CTGCAAGTGC GCCTGCGTTA TGTGATCCCG 240 1317 GTCTTTGCA TGTGTGTGGC ATGAATGCTG TTCCCAATCC CACTCCAGAA CATTCTCCTC 300 AAAAGTGCGC AACGTCGCCC TGAATGAATC CTGCCTTGTA GTCGTGACCA 'r'CCTATGAA 360 GGGTCGCAGA GGATTrTCCC CGAGTGCAAG CGCATCCTCC GGCTrCAAATC GGGTGCA~rr 420 CACAGTCCCG CTCAACGCTA GCCCGATCCC TTTTGGCAT GGTGACTCAA GCGTCCTTTC 480 AAACAAAAGC TCCTCATCCG CTCCAACCGG CCCGACGTAG ACGCGrAGAC CGAAGTCGTC 540 INFORMATION FOR SEQ ID NO: 289: SEQUENCE CHARACTERISTICS: LENGTH: 1949 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ 11D NO: 289: **AAAGAATTCG ACCAATTCAA GGTTGAGGCA TCGCAAACTA 
TGCACTGTTC CCCCGTCAGT ***TCTGGACAGA AAACGGGATA AGGTTGGCTG TGAAGCAAGC TGCCCTCCTA CCAACAATTT 120 TGGAAAGTAG GCATCAGCTG ACAATTCTTT ACAAGCATAG TCCGTTCCAT AACCTGTTAA 180 CAGTTGAAAG AGGAACTGGA CAAGGATATC TGAATCCGAA TAACGACAGT AGCGGCGTTG 240 GTCATTCGTr ACTAAATACT TAGAAATCCG CTCTTTTAGT T'rCAACTGGG AAAAAAGTTC 300 CTGAAAAAAG ATAAGACCAC CATACTGGGT TAAA'rGACCT CCATCGAAAG ATAGTTGGTA 360 AAAAGACTTG TTTTGGAAGT GATGATTTGG TAAACTGTTC ATGTGAGTTT CCrTTC=r' 420 *TGTGTTTTTr TCTACACTTA TACCATAAAG GGGAAACTCT TTTTTGTCTA GTAAAAAACA 480 CCCATTGGGT GAAAAAAGAA ACCATCCAGG ATCTAAGCTA AGGCAAGGAT TCTGGATGGT 540 TTTTAOATTT GGGGTGAATA ATTGGGGTTT TACAATATCA ACTCCCATGA TAGTCATGAG 600 *ATGACTCTTC ACGAATTGAC GTGATGACTG TCCTTCCTTT TGCATAATTA CC'TCCGAAAC 660 *ACAAAAAAAG GGGTAGACAA TCTAGTGTCT ACCCCCGAAA GTTTATTAAA ACAAAAATCC 720 TGCCAAAGAA TTTTTGGCAG GAAACCAAAT CAATTTATCA GTr'rCTATCA ATCGCTTATC 780 *G(=TCTCAAAG AC'TGGTAAAT AGGGATTCCG CAATCAAATT GCGATACTCT ATTATTTAAG 840 ***AGTAACTGAA GCTCCAGCTT CTTCCAATTT AGCTTTGATT TCTTCAGCTr CTGCAGTTGC 900 AACGCCTTCT TTAACAAGTG CTGGTGCACC GTCAACAAGT TC 1'TAGCTT CTTTA.AGACC 960 AAGACCAGTG ATTTCACGTA CAACTGAT AACGCCAACT TTTTTGTCGC CTGCAGATGT 1020 CAATTCAACG, TCGAATGAAT CTTTAGCAGC ACCAGCATCA GCTGCATCAG CTGCAGCAAC 1080 AGCTACAGGA GCAGCTGCAG TrACACCAAA CAATTCAAGG ATTGAAGCTT CTTTAATT'rC TGG7rATTTCC TCCAAA'rAAG TTAAT'r CTGTGTAGCT TAAGATTAAG CCGCGTCTTC AGCA6ACGTTG CGCACTGGCG CTrGAAGTAC GTTTGGAAGA. GTTGCAAGTG CAAGAATCTC ACCACCTTTA ATTTCAAGTG CTTCAGCGTT TGCGATAACA TCT'rCAT1'AG AAAATGCTAC 1318 TTCTT CTT CC ATAGCTTTTA AGCAATAATG TTTTCAATGT TATAATAGTT TTITTTCGTAG TTTGCTTTCT GCAACCGCTT AGAAAGGAGC ATAGAAAGAA TTCTTTAGAT GCGACAGCGC TTrAGAAAAG TCGTTCAAGA TGCAGATGGT CCAACAAATA
CAAGGTCGT
TCAATGCCAT
CTAGksTACG
TGACTGCAAG
GTCCTTCGCG
CTTCGATTGC
TTTTCGCTGG
CAGATGCAAG
ATCTTCAAGA CCAGCTTTTT CAGCTGCACG ACGCAAGATT GAGTT'TTTAA TAACTTTATA CTCAACTTCG CTTCCACGAA GCTCACGACG AAGAACTGTA TCTTGCTCAA CTGTCAAACC ACGAGCGTCT ACAACGACGA TAGATGCAGC AGCTTCATT TTTTCAGCTA tACGTCAACT AGTTCCGCTT TTTTAGCAAT AATTGCTTCA CTCATTAGTG TGTTCACCTC CGTAAT'rATT TTGCTTGGGG AAT~rrCrAA AAAGAAAAAC GCGCCCAATC CTAGACACGA AAGTACAATA CGCTTCT'N'T TACATGATAC GTTTTGTCCT CGGTAGGATA TTTATGAGTC GAGCTCCCCT ACTGTCTTAG GCAGTTTTTT TAGATACGG INFORMATION FOR SEQ ID NO: 290: SEQUENCE CHARACTERISTICS: LENGTH: 1023 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTrION: SEQ ID NO: 290: GGACTGTTTG ATCTTATACA GTAGCTGCTT GATCCAAGCT TTCACCGATA GCGGCTAGGC GCTCGATAAC TTCAGCTTGT GTCAATTCAT TTTTTGAAAC ATAGCGGTTA CGTGGGTGAA CACGGCACTC GTGTGAGCAT CCACGAAGGT ACTTGTCTTC ATTTTCTTCT GATGTCAAGA TACGACGGTT ACAGAATGGA TTTCCACAGT TGACATAACG TTCACATGGT GTTCCATCAA ACCAGTCTTT CCCTACGATA GTTGGGTTGA CATGGTTGAC ATCAACGGCA ATACGCTCGT CAAAGACGTA CATTTTCCCA TCCCAAAGCT CACCTTGAAC TTCTGGGTCT TTACCGTAAG TTGCGATTCC TCCGTGCAAT TGGCCGACAT CTTTGTAGCC TTCACGGACC ATCCAGCCTG AGAATTTCTC ACAGCGAACG CCACCTGTAC AGTAAACCAC GACACGCTTG TCCATGAATT TTTCCTTGTT ATCACGGACC CATTGTGGTA ACTCACGGAA GTTGCGAATA TCTGGGCGAA 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1949 120 180 240 300 360 420 480 540 TAGCTCCACG GAAATGTCCT AGGTCGTACT CATAATCGTT ACGTGTGTCA AGGACAACGG TATCTTTATC AAGAAGCGCT TCTTGAACT CTTTTGGAGA CAAGTAAGCA CCTGTGTTT CAAGTGGGTT GATGTCATTG TCAAAGTCGT TGTCTTCCAA ACCAAGGG ACAATTTcTr TCTTGTAGCG AACAAACATC TTCTTGAAGG CTrGTrCA'IrTTCTFCGTCA ATCTTGAACC AGAGTTCTTC CATTCCTGGA AGGCTGTGA.A CGTAg'rAT GTATTTTTGA GTTGTITCAT AGTCACCTGA AACTGTTCCG T'rAATTCCCT CGTCAGCGAC TAGGATACGG CCTTTAAGGn CGATTGATTT ACAGAAAGCC AAGTGGTCTG CAGCAAATTG CTCrGCATTT TCAATTGGAG TATAAAGGTA GTAAAGTAAG ACACGAATAT CTTTTGkCaw AAGATTTGTA TCTCTTTATC
TAT
INFORMATION FOR SEQ ID NO: 291: SEQUENCE CHARACTERISTICS: LENGTH: 3831 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTrION- SEQ ID NO: 291: ACTATGAACA AGACCCAGAA AAAGTAGCCT TATTTCTTAA GAATTTTAAT AGTTTAAAGC ACCTAGCACC TGTTTAGATT GACGAAACAG GATTCGATAC TTATTTTTAT CGAGAATATG 600 660 720 780 840 900 960 1020 1023 120 0..0.
0 GTCGCTCATT AAAAGGTCAA TTTCTTTGGT TGCAGGTCTA CGATGACGAG CGACTTTTTT CACCATCGGT TATTATTATG TATGCGAAGA GTTTGGGCAT CTATTGAGAA. AACATGGGCT ATACCTTTTA CGAGGCTTTT ACATTTTTCG GTTCTTTGTC AAATTTCGTC CT'rTCTTITTT GATA.AGTTTG ATGAGATTAT GGCGTTGACG ATTTTCTCTT TGGAAA.AGCA AGAGCTGATA TTAATAAGAG GCAAAGTATC TGGA-AGAAGA TATCAGAGGA ACAAATGGTG AATTAATCCC TCCAATGACT TACGAAGAGA GAAGCTTGGT TTCAGAATTT TCTCTTACCA ACATTAAACA GATAATGTAA GATTCCATAG AATGGGGAAG CTAGAACTTT AAACTTTTAC CTCTTCCTCC CTACTCGCCT GAGTACAATC CATATCAAAA AGCACCTCAA AAAGGTATTA CCAAGTTGCA TTATCCTGCT CTTGT'rTCAA TTGACTATAT TAGAGGOGAG AACTGTAGTG GG'rTGAAGAA AGCGAAGATC TAGAAAGGAC TGAAGTTTTC AAAGTTCCTA. AAACCAAAGG CATTGTGCTT TGGTGGCTTC CAGTTTGGCG TTGGAATA.AG GTAATTGAAG TATCTTTGAG GAAGGTTTTA AACAAAGTCT GAAACAGAGG GAGATTATAG TGGTGTTTAA AGTCTTCGGA ATAGCTCAAA AGTTTATCTA GAATTTCTTT ATTAGTCAAG ?TATCACTCA GTTTCTGACT ATCTTG?1'GA TATTCATGGG ATTTCGGATG ATGGCTTGTG GTTTATCAAA GTCCTGAGCA ATAAAGCTCA CCGGACTGT'r TCAACsTCCT AGGACATAAT GTGAAAATCA 'rTGTTCTTGC GAATGACAGT GTCGGTCATA GAAG1'CTI'TT TATTAGCrTr AATTTGATGA TTC?1'CTTGA CGATAGAAGT GCACTTAAAA CGGCCTTTTC TAAGAAGAAT CAGAACCATA ATACCTATAT AAAAATATTA A.AGGCGGTCT TTTTAGAACT TTAATTGTTT TGTCAACTT'r TCCTA'NrTT ATCTTGTTGA AGTGAATGTG TAAAATTTTr TGTTAGAATA TATGTTACAA AAAATTTATG AGCAGATGGC TGGTCCTACA TTTGGTGATA A'N'TTGACTG 'rTATTTAGTG AGATATGGCA TTGGTTGTCG TGCTTATCGT 'rTGTATCTTG AAAAATTGGT 'rAATr'rTAGT AAA'IrTCCGA ACTAATTTAC AGTAATTTTT CTAAATCA'1r TTTTAATAGT CT'rGATAAAA AGGCGATTT'r TTATTATAAT CCTGCCATGT ATGTGAGAAA GGAAGAGCCT 1320 TGCATACCAA AAGTAGGGCG ATAAAATCGT ATGAGCTTCC AGTAGCGCTT GATAGCCTTG TTCTGCTCTC AAGAACAGTT ATGATATTGA TCTCCATCTC CCGATTGAAA CAGTCACTCC CTCAGGAAGA-eeeGAAAAAT CATGCTCAAA TGAAGTTGAA ATAGACAACT GATGATCAAT CTGAGCAATC T'TrTGGTTGA TGATACALAGG CTCAGCGAGC TCCATTTTTG AGCAATGA'rA TCTAGTN'GA ATTNTTTAT ACTAGAAAAT TAGT'rCTA-AT AGGATTTACC CAAAAGTTTT GAAATTTAGG TAGCAAATTT GTTTCTATTT GGCTGGTATT rrfAACAATTC AGGAAT'rGAT AGTTTATAAA AAAGAAAAGG AGTATTTGAT
TAATTTCTAT
GGAACATGTT
TAAGGATT'TT
AATGAATCGG
TCTTTTATGG
TGGAAATAC
AAATTGTAAG
GATGGCTCAG
CATAGTATTG AAGAAGAGTA CATTTTAALAT TTTTAATTTA ATTGTTTACC ATTATCGTGT GGTTTTATTT CTTGTTGAGG AAAGATGATA GTAAATAGCT APLATCTTTCT ATTrGTT-rCTT ATATAATTGC AGGTGAGAGT ACAAGATTAT GACTTCAGTT 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 GTTGTTGTAG GTACCCAATG' GGGTGATGAA GGTAAAGGGA AGATTACAGA CTTCCTTTCA GCGAATGCAG AAGTGATTGC ACGTTACCAA GGTGGTGATA ATGCTGGTCA CACGATTGTG ATTGACGdTA AGAAATTTAA GTTGCACTTG ATTCCATCTG GGATTTTCI-r CCCTGAAAAA ATATCTGTCA TTGGGAATGG TATGGTTGTA AATCCTAAAT CTCT'rGTAAA AGAGTTGAGC .TATCTTCATG AGGAAGGTGT AACAACTGAT AACTTGCGTA TTTCTGATCG TGCGCATGT'r AT'rTTGCCTT ATCATATCGA GTGGATCGC TTGCAAGAAG AAGCTAAGGG CGACAATAAG ATTGGTACGA CAATTAAGGG AATTGGTCCA GCTTATATGG ACAAGGCTGC TCGTGTTGGA ATTCGTATTG CAGATCTTTT AGATAAAGAT ATTTTCCGTG AGCGTTTAGA ACGTAACCTT GCTGAAAAGA ATCGTCTTTT TGAAAAATTG TATGACAGTA AAGCGATTGT TTTCGATGAT ATTrTTTGAAG AATAT'rACGA ATATGGTCAA
GTTATCTTGA
GTrATGCTPAG
GGTGGTGTGA
TGTAAAGCI'
GTGGGAGAAC
CGTGTAGGTr AACCTrTCTr-
GCCTATGATC
ATGA'rGCGCT
ATATCGACCA
CAATTGGTTC
ATACGAGTCG
GTATCCGTGA
GGTTTGACTC
TGAACTCTAT
TTGACGGTCA
TGATAATGGC
AGGTACTTAT
TGGTGTCGGT
TGTAGGAGAT
AGTGGGTCAT
AGTTGTGATG
TGATGTTTTG
ACGTATTGAC
CAAATCAAGA
AAACGTG'rGC CCATTrGTTA
CCAAGCAAGA
GGTCCT'rrCC
GAATATGGTA
CGTCArAGCC AGCGGT1'TGG
TACTATCCAG
AATACGTGA'r AGATACATCT TTTTTGAAGG TGCACAAGGT CGTCATCAAA CCCTGTAGCT TTGACAAGGT TGTAGGTGTA CAACTGAGTT GTTTGATGAA CAACAACTGG TCGTCCACGT GTCGTGTTTC TGGTATTACT ATACTGTGAA AATCTGTGTG CTAGTCTTGA ACAATTGAAA AAGATATTAC CGGAGTTCGC GTCGTGTGAG 'rGAAT'rGGTT AACAAACAAA TATTTTAGAA AGGTCGGGTA TACTATAGAC TAAACTTrC TTrTTTCATAA GTGTTGGGAT TCATGATATA CGTTGCAAGC CTATCTATGA AGAGTTGCCA GGTTGGTCAG AATTTGGAAG ATC'N'CCTGA GAkATGCGCGT AACTATGTTC 2700 2760 2820 2880 2940 3000 3060 31.20 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3831 GGCGTTCGTA TTTCTACTT AGTGTTTGGT CCTAAGAGAT GGTTACAAGA AGACCTCCTA TAATCTCCCT ATAGAGTCAC ATAATAAAAT CGATAAGTAG AAGTCTTTAT GAGGGAGGCT CAATTGGTTG TGTGA'rTGTC AGGAATTACA GCGAGCGGTT -GTGAGGAGAG TGCGCTTGCT CTCAGTAGGT CCTGGTCGTG TT'rTAAGATT TGTTTAAGAT ACTTGTTGTA ACAAATATCC CGCATTCGGT GGCTTTTTT GAAAAGAGAA AAGAGATGTA TTATACGCT1 GAAGAAAAAG TTGAGAGAGG CTGAGATTGC TCTTGAACAC GATGAAATTC AAAGATGGGG AAATCA'N'GG TCGTGGGCAT AATGCGCGTG ATGCATGCGG AAATTATGGC TATAGAGGAT GCGAACTTGA GGATTGCACA CTrTTTGdTGA CCATTGAACC G INFORMATION FOR SEQ ID NO: 292: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 1441 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 292: CCGCTGTTCC AACCGCAACA TACCATAGTC CGTACGGGAT TCGAACCCGT GTTACCGCCG TGAAAAGGCG GATGACTTAA CCCCTTGACC AACGGACCTG AGTTGTTATT TTCAACTCTT ACTATTATAC AGTCTTTTCA AACTTTGTCA ACTACTTTTT CTAATT'rTTG ?TTATTTTTT 1322 CAACT'rATAG TAAAAAAAGC CAGAATTATA CTGACTCTTC AGCACGTTCT1 TTTCCCCACC AATAAGGGAT TAGTTCTGCG ATTATAGTCC ATCATGAA'IT CTGCATCTTT AT'T1rCAGCA TCTACAAGCA CCGCAAGGCA TGGCTGAACT TCCACCATAA TAATACTTTC TTAACCTTAG TTTGTCCTGA AAATTGGTAC TTCTGCGCAG AGATGGAAAA CACCACAGOT TCCCTCCATA TCCATCTCCT GCTTCTACTG CAGCTACAAC ATGATTGGCA ATGTGGATTG TATAGTTTCT GTGCTTCTTC GTACATCTTT ATCCTCTTTA TTTAGAGATT TCTTTTAGCA TGT'r'I-CGAT GTCCAAGCAA GAAAATTGTA TCTGGTAATT CTGGCCCATG TACGAATAGG CATGAAAAGA TTTT'TCCCTT TAATACCTGT TATCGCTCAT TAAACTTAGA ACTrTAACrG TTTTTCTTAT TTAAGCTCTA AAAGGAATTC GGTGGTTrGT CTCGAAAGGC 
ATATTGAAGA GGGCCGCCCG CACAATCCTG TA-AATAT'rTG TAAACAAAGT CTGATACT'rC TCCCAGATGT CCATTATTGT ATGCTGAATT GATTTTTCAC CATrTCGCCT GAAACTGCGA TTCTTrTTTGG ACTGCTTTAA CATCGCTTCA AGTT'rTGCTT GACTTCGCGC TCTGCrTCTG GATAATCTCA TCTACTGATT GTCAGTCAAA CGGCCTGCTT GTCTGCATTC TTGATATAAT TGGTGACtTG CTGAGGCGGT CTCATCCCCA CCACCTGGGT AAGGTAACCT TTCTTTCGGT TAACTrCTTA CCAGTTTCAG CCTCAACCTA AGAGCGGGTA e.
TTTGTGGGAA GATAT'rTTCT TGAATGCTTC AAGAACTGTT TCAATTCTGG GAAATCTGAG TCATTrGTGG TTTATAGAGC CCTCTAAGAA TGGTTTTGCC CATTGCTCAT CCAGTCTAGT TTTCATCAAA AAGTTI'AATG TCCAACCAAG AAGAGCAATA AATCTTCGAT AAATTGAAGT AGTTGATAAT CAAGTGTCAT GTCACAAATT CATCATCTGT GGAACTGTTT CACCCGTCAT AAGAAAAGAT C'rGTCAATGG TCAACTAATT TTrCAGCCTT ATTTCAAAGA TGGTTTCAAG TTTTTCTGAT CAAAGGCTGC AATTCTTCAC GAGAGAAAAT AAGTTrAAAGA CTGCTTCTG GTATTAGTAT CACGTTTAGA GTGACCGAAC TCTGGAGCTT
T
INFORMATION FOR SEQ ID NO: 293: SEQUENCE CHARACTERISTICS: LENGTH: 4398 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 293: CGGCTTATGT AGTGGCAATC TTTCTACGTA AGCGAAACGA GGGGAGATTA GAGGCGCTAG AAGAAAAAAA AGAAGAACTA TACAATCTTC CAGTAAATGA TGAAGTAGAA GCTGTAAAAA ATATGCACTT GATTGGACAA AGTCAAGTGG ATTTATCTCT CAACTCTTTT GCCGATATTG ACCATTCATT TCGTTITrCTC AAGGCCAGTC CTI-rGATTGA AGAAGATATT GCGGCAATTC 1323 CTTTCCGTGA ATGGAATCAA AAATGGGTCG AAAATAATCT CI'TGAAGCA GAAGGCTATA ATCAAATTGA CCAAATT'GAG AGTCAAATTA GCAATGCTTT GGCAGACTTA GAGAAGCAAG AATCTAAAAA TAGTGGTCGT GTTCrTCATG Cr'rTGGATTT ATTTGAGGAA CTTCAGCATA GAGTTGCTGA AAATTCAGAA CAGTATGGTC AAGCCTTGGA TGAAATTGAA AAACAATTAG AAAATATCCA ATCTGAA'N'T TCACAAT'rTG TAACCTTGAA TTCATCCGGT GACCCTGTGG AAGCCGCAGT GATTTTGGAT AATACAGAAA ATCACATTTT GGCCTTAAGT CATATTGTGG ATCGTGTTCC AGCCTTGGTT ACGACGCTTT CTACAGAATT GCCAGATCAA TTACAGGAT'r TGGAAGCCGG T'rATCGTAAA CTAATTGATG CTAATTATCA TTTTGTTGAA ACGGATATTG AAGCGCGTTT CCACTTGCTT TATCAAGCAT TCAAGAAAAA CCAAGAGAAT ATTCGTCAGT TGGAATTGGA TAATGCCGAA TATGAGAATG GACAGGCACA AGAGGAAATC AATGCCTTGT .9 9 9
ATGATATTTT TACTCGAGAA ATTGCTGCTC AGAAAGTAGT TTCCAACTTA TCTTCAACAT ATGAAAGAGA ATAATACT Gr'rTGAACAA GACCTAT'rTA CAGAATTAGA GAGTTTTGAG CCCAAGCTTA TTCAGTTCTT TTGAAGATGA GCAAATTTCA ATGCACGTCA AAAGGCCAAT AAAAACGCAA TCTGCCAGGT ATAA'rACCGA GGATTTAATG CCCGAGTTCT TGAAATTGCA TTGTACAATA TGCAACTTTG TTGATGAACG CATTCAAGAA ATTATCACGC TTCATTTGAC CCAATCGCTT TGTTACCTCA AAAAGATTTT ATTGTGTGAG CTrCCTGAGA CAGCTGCALAG GCAGCTATTG TTGAGGTAAC GAAGAAAATC TTGAGGATTT GTTAGTGAGC GCCTGACACA GTTTATGTCA ATCGTCTCCA ATTCCACAAA CTTTCTr'GAA GTTGAGTTAG AACAAAAAAT .ACGAATGATA TGGAAGCTTT ACAGAGCAAC TCTTGCAATA GCATTTAACG AaGCTTTAGA AAGA'IrTCTC AAGCATTGGA TATGAGAAAA CACGTGAAAC GGAAAATCTA CTTGCAACTC ATTGGGAGAA GATATTGCAC CCATGTTCGT CGTATTCAGA TTCAAATCAA GAAGAACCAA ACA.AACTCAA CTAAAAGATA AATTGAGAAA GATGATATTA TACTATCAAG CGATACATGG GTTATTCT'rT ACGGCAAGCA GATTAACATT GAATCTGTTA AGAALACGGAA ACTTATAATA TTCTAACCGC TATCGCTCAT TATTTTTGAA AAAGAATTTG AGTGGCAGAG CCTGGTGTAA GATTCGTTTT TAATAAA.AGA 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 GAGCAGAATC AAATCTTTTT CTATAGTTG'r GGGGAGATTT ACTTCATTTT CTCCTGAGAT TGAGTrTrTTG CCCACCGAT TTATCCACTA CCTCAAAACA GTGTITTATA CTC'rTCGAAA ATCTTTTCAA ATCACGTCAG CGTCGCCTTA CCGTACTrCAA GTACAGCCTG AGGCTAGCTT CTTAGTT'rGC TTCGCCACTC TTATCTGCAG CTTCAAATCT AACGAAGTCT ATTGAAAAAT CTCCAGACTA ATTT'CTTATT TGCACTTTTC TTGTACAACT TCTTTGTTTA CGAGAGTTTC CTCGTT'rGGA TCGCTATTTA TACTAGACTA AAATCAAAAA AAAGAAGAAA TCCAAACCAT CAAAACACTT AAACGCCTTC AAATCGTTC1' ATAG1'AAAAT AGTCAAATTG ATTTCTAACA ATG'TTTTAGA 1324 'TTTTGA?1'T TCATTTAGTA TrAAAGTGAT GTACTTTGAG TAACTTGGTA ACCGTCCAAT GAGAACTCAC GGATAGTTCC TAATCTGGAG TTAGTCCACG GTAAATAGAC CTCTAAAACC AGACA'rrCTA GAAGATAGGA TAGATATTTC GCAT'rATA'rA ATAGTGATAT GAAATCAACT TTAAAAGACT CTCGTACAGC TAAATATCAT GAAATAAGAA CAGTACAAAT CGATCAGGAC AGTAGAGGTG TACTATTCTA GTTTCAATCT ATTATATTTC GTCTGATGGG CAAATCTTAT AAAGAGATT-A TAGAACTTT ATAGTAGATT GAAA'rAAGAT GTGAACAACT CTATCAGGAA AGTCAAAT'rA ATTTATAGAA ATATTTTAGC AGCCAAGGTG TACTGT'rATA GATTCAATAC 
ACTATAGACT GTAATCAAAC AACGATTTGG CGAAATGTAA AAAAATATGA GGAGTTCGGA CTCGACTCTC TCCTTCAAGA AACACGTGGT GGTCGTAACC ATGCATA'rAT GACAGTTGAG GAAAAGAAAG TCTTTCTTGC CCGCCATTTG AAGGCTGCAG AGGCAGGAGA ATTTGTTACA ATTGATGCCT TATTTCAGGC TTATAAAAAG GAGTTAGGTC GTTCCTACAC ACGTGATGCC TrCTATCAAC TGTTGAAGTG CCA'rGGTTGG CGAAATATTA TGCCACGTCC AGAACATCCT AAGAAAGCAG ACGCTCAAAC CATTGTCGCG TCTAAAAATA AAATCTCAAT TCAAGAAGAA AAGAAAGCGC TTTAAAACCA GTAGACGTTT TCGTAAGGTT CGCTTGATGT ACCAAGATGA GGCTGGTTTC GGTAGAATCA GTAAACTGGG ATCT'rGTTGG GCTCCAATAG GAGTAGGTCC ACATATCCAT AGTCACTATA TACGAGAATTr TCGCTATTGT TATGGAGCTG TTGATGCCCA TACAGGCGAA TCATTTTTCT TAATAGCTGG TAGATGTAAT ACTGAGTGGA TGAACGCCTT TTTAGAAGAG CTTTCACAAG CTTATCCAGA TGATTATCTT TTACTCGTTA TGGACAATGC TATATGGCAT AAATCAAGTA CCTTAAAGAT TCCGACTAAT ATTGGTTTTA CCTTTATTCC TCCATACACA CCAGAGATGA ACCCCATTGA ACAAGTCTGG AAAGAGATTC GTAAACGTGG Ar'rTAAGAAT AAAGCCTTTC AAACTTTGGA AGATGTCATG AATCAACTCC AAGATGTTAT ACAAGGATTG GAGAAGGAGG TGA'rAAAGTC CATCGTTAAT CGGAGATGGA CTAGAATGCT TTTTGAAAAC AGATGAGTAT AAAAAGAAAG TCCTCATTTC AATAGAAATC ACGACTTTCT GATGGATTTA TAGTAAAATG AAATAAGAAC AGGACAAATC GATCAGGACA GTCAAATCGA TTTCTAACAA TGTTTTAGAA GCAGAGGTGT ACTA'rTCTAG TTTCAATCTA CTATATTTT GGAGTGATAG AAAAGCCCTT CATAAGCTAG 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 1325 TCTACTTGTT CAGGTGCGAG AGCTTTGACA TCTr'rTTC'rG TACTTAGCCA AGTCAGTTTT 3720 CCGTTCTCAA AGCGTTrATA TAGTAGCCAA AATCCTTGAC CATCCCAGTA AAGGGCTTTA 3780 AAGCGGTCTT TACGTCCACC ACAAAAGAGA, AAGACTTGAC CGGAGAAAGA ATCCAAT'rCA 3840 AAGrGGGT'rT TAACTACATA GGCTAATGAG TCTATTCCCD GCCTCATATC TGTCTTGCCA 3900 CAAACAAGGTr GAAC'N'GACC TAAATCACTT AGTTGAAT'FA TCATAGTACA ATACCTTTCC 3960 TCCGATAATT AT!TITTTATC TAGTATACTG GAAGTTGGGG AATTAGGATA GATACCTTGT 4020 TATGACGCGC TTACGTAACT TGTAACTAGC TGCCTAGTTT GATCTTTGCT TCTTCATrGA 4080 TTAGCAGTAG ATTTCAAAAT GATAAAAACG CATAGTATCA GGTA'rTGAAA TGTACTGCCC 4140 CAAAAGTTAG ACAGAAAAAA TCTAACTTTT GGGGTGTTTT TGTTATGAA-A TTAAGTTATG 4200 ATGATAAAGT 
TCAGATCTAT GAACTTAGAA AACAAGGATA TAGCTTAGAG AAGCTTTCAA 4260 ATAAATTTGG GATAAATAAT TCTAATCTTA GGTATATGAT TAAAT'rGATT GATCGTTACG 4320 GAATAGAGTT CGTCAAAAAA GGAAAAAATC GTI-ACTATTT TCCTGATTTA AAACAAGAAA 4380 TGATTAATAA AGTCTTAC 4398 INFORMATION FOR SEQ ID NO: 294: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 718 base pairs B) TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ I0 NO: 294: AGATTTTTAG ACTTTGTCTT TAATCGTTTC TTTTTAGGGA TGATTGCGAC ACCTTCTTTT GGCTATTAAC TTTAGCAGGA GGGATTATCC TTGGTCTAGC GCCGGCTAGT GCCACCTTGA 120 TGAGCTrATA TGtAGAACAT GGTTATAGCT TTCGGGAATA CAGTTTGAAG GAGGCTTGGT 180 *CTCTTITACAA GCAAAATTTT GTCTCAAGCA ACCTGATTTT C'rATAGCTTT TTAGGTGTGG 240 GTCTAGTTT GACCTATGGT TTGTATCTCT TGGTGCAATT GCCTCATCAG ACCATTGTTC 300 ATTTGATTGC GACCCTTTTG AATGTCCTAG TAGTTGCCCT GATCTTTTTG GCTTATACAG 360 TATCTTTAAA ATTACAAGTT TATTTTGCCT TGTCCTATCG AAATAGTCTC AAATTATCCT 420 TGATTGGCAT CTTTATGAGT CTAGCAGCTG TGGCTAAGGT TCTCCTTGGG ACTGTGCTAC 480 TTGTAGCAAT TGGTTATTAT ATGCCTGCCC TGCTATTTTT TGTAGGAATT GGGATGTGGC 540 ATTTCTTTAT CAGTGATATG TTGGAACCTG TCTATGAAAT CATCCATGAA AAATTGGCGT 600 1326 CAAAATAGAA TGAAGCAGTr TTGGCTACAT ACGCTTCTAA GAACCTATAG TTCAGTGATG ATCATTATCA TTGCGAGTTT TGCAATCTTA CTCTCrTACG CTGTCTGGGA TTCACGTG INFORMATION FOR SEQ ID NO: 295: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 718 base pairs CB) TYPE: nucleic acid STRANDEflNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 295:
TCGGTACCAA
TTAGGTCTAG
GAAATT'TCTG
GATGATAAAA
AAAAATGGTG
ACACCACTTT
AATTCTGGAT TTATACTAGC AAAGATCCAA TTTTCCCTGA ATCATTAAAA GAATTTGAGA CAGAAGAAGC AAATAAGATA AATGATGCTG CTCTTGAAGC TTTACAAAAA GATCCTCTTT CCGTTGCTGT AATTCCAGAT AATACACCGT CAATAAACTA TACTATTGAA GAATACCTAA GAGCAAATTA TTTAACAGAT GTGAAGATAG TTTTGCAAAG ATGTAATCAT AACTTATGGT TAGGTAAAAT AAATGCAATT TAGCAGCCTC ATGCACTCCA ATCTTTTAGG AAATGCATGC AAAAATGCGA AATAAAXAAAC AAATAAACCT AGGCATAATT TTTATAATCT GCCTAGGTCT TCTTATTACA ATATTTTTGT CATTAAAGCT TGGAACAAAA GAAATTAATA TCAGAGATTT TTTAGCAGCT TTTGGAATGG GTAATACAAA TGATGATTTT ATrAAATCAA TI'ATATATAA rAGAATACCT AGAACTATTT TTGCAATTTT AGCAGGTTCT AGTCTTGCCA TAAGCGGTGT ATTGATGCAA TCAGTTACTA GAAACCCAAT AGCTGATCCA AGGAGCAAGT CTTAGTGTAG TAATTGGTCC TTCtTTTTAG GGTATACTCG GTATAAACAC GGAATTCATC AAGCATAA INFORMATION FOR SEQ ID NO: 296: Ci) SEQUENCE CHARACTERISTICS: LENGT.:. 1436 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear xi) SEQUENCE DESCRIPTION: SEQ ID NO: 296: GAACTAATCA TTTTTACAGG ATGA GATTTA CAGCAGAGAG TTTGAAGGCT TTATCAAAGG TTTTTCTTGG CATAATGACT TTTCCTCGTT TCCACTTAAT TTTGTGTCTA CTTTATTATA CCAAGTCCAC sCTTAAGTTA GATAATAAAT CTAACTTAAG GAAGCTAGAA GGATGAGAAT CCAGGTGGTC AAGAGTCCCA AACTTAAGCT GATGGGGACA CCCAGAATAA TTTGCTTTTT GAAGGCAAGG CCACGTTCCT TGAAAAGGGT GAGATATTAG AATATCAATC TGAGGATT TACAACGGAT AGGGTGGAAC AGGTAACCAG AAATGAGGAA TGACTTGACC GCTAGAGACA GGGAACCTTA GCTAAAATGG CATGAGTATT GAGACAAAC GTTTGGGAAA. ATGAGATGCA GAGCAAGGTG GTTTGTCTTT TGAGTTTGTT CTTCCCTTAC TGGGTAGATA ATGCTGACGA TCCCATT'rGC TTAAACAGGC CCCTCCTGAA GCTCCCCAAT TTGACAGAGG GTAATCGCTA TAAAGCAGAC AAAAGGrTTG 1327 CTATATTGGG AAGI'GAGAGT TGAATGAGAG AACCAGCTGA TAGATAGAGC GCCAATAACG GTGGCTGTTG TGAGTAAGTG GAGCACTGAT GATAGCAATG TAAAGAGTGA CATCACTCCG TGGTTGTTGT CATGAGGTGC 'rTAGTAAGCT CATGCCGCAG CTTCTTGCTT CCCTAATTTG CAATATCAAA TGTTTTTTGA ACAAGGGAAA AAGCCAAACC GAACCT'TGCT GAGGAGTGGT TATAGTGACT GTAACAGGAT TAAAGATATG ATTGCCAAGT CTTGAAAGAC AATGCCTGAG TGACGGCTTG AGCTCCAATC GAGGACAGCA AACGGCCATA CCATCAGGTA TAAAATCATG
ATGGGAAAGA
GCTATCACAC
CCTATCAGTG
AGCATGATAA
AGCCTTAAGG
GGGCTGGAGC
AAAAGAACAG
TGACTAAACC
'N'GTAGCCCA
CGAGGCAGAC
TAAGTAGCTA TCCAGGCGAT AAAACCA'rGC TGCTGATCAT GGTTGGTCAA TAGTCAAGGA AATAAAAGCA AGACGATGAG GAAAAAGCT'r GCTCTTCCCA CTACrGGTTA TCAAATTAGC AAAGGGTGT'r TGTCCGCTTT GTAGTGAAAA ATCCAGCACC TAGAGGGCGT TAGGGTGGGT GTACCGTTAG TTGTTGCAAC GAGGTTGGCC AAAAATGAAG AGGTAAGAAA AAGCAATAGC ATGGCTAGAG CAATGG 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1436 120 180 GCGTGTGCGG TAGAGAATGT GTTGAGCCAA AACATCAAGA GTTATAAAAG AGAGAGACGC TAAAAATGGT AAAAAAGAGT AAGTTCTTG GGGCTTAATC CCATGAGAGT GGTrGCGATG CAGCAGGCCA ATATTGATTT TGGTGCGGTA ACCAATTCCA INFORMATION FOR SEQ ID NO: 297: SEQUENCE CHARACTERISTICS: CA) LENGTH: 1696 base pairs TYPE: nucleic acid STRANDEDNESS: double D TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 297: CCATTTGGGA AAGAACGTAA GAGTTTGCAG GGTGAGATTC CAGAAGAATT TTCAATGTCA OCCGTTGACA TGTCTATGAT TGACCACATT CCAGATATGA TTGAAAATGG TGTGGACAGT CTAAAAATCG AAGGACGTAT GAAGTCTAT'r CACTACGTAT CAACAGTAAC CAACTGCTAC AAGGCGGCTG TGGATGCCTA TCTTGAAAGT TTGGTGGACG AGATGTGGAA GGTTGCCCAA ACACCATCTG AAAATGAGCA GTrGTTTrGG GTCGCTGAAG TGGTTTCTTA. TGATGATGCG GTCATrAACG AAGGGGACCA AGTrGAGTTT TATATTGAAG ATTTGCATGA TGCCAAAGGC GAACTATrGA CTATTAAGGT GCCTCAACCC AAAGAAGGAC TCATCAATCT TATAAGGAA 1328 CCTGAAAAGT 'rTGAAGCTAT CGTGAACTGG CTACAGGATI' GCTCGCCGTA AAATTCCTGA GCACAAACAG CAACAATTCG TATGGTCCAG GTT'rCCGTCA AATAAAATCG ACCGCGCTCC GTTCAATCAG GAGATATGGT GATGGAACCA GCGTCACAGT
CAAACAAGAC
TTACTATGGT
GTACAAGT
TCAACGAAAT
TTT'rGAAACC
AAATCCAATG
TCGTGCATTA
TCGAGCTTAA
GAAAGGAAAA GGAAATGATA GAGGCACAGG GTTTCTTAGT GGATAAGCAA ACAAGATGCA TTCA'rTACCA TAGCAAGCTG GATA'rrATTG
ATGCTTGTTA TCGGTGTCAT GATTCATTAG CT'rTGATACA GGATAAGCCT ATTTTATGTG AATATAAAGA AAGCTTAAGT TGCCCC'T' ATCA'rAAGGA ACGCTATT'rT AAATAGCAAA ATTTCAAGAG AAAATGAAGT AAATCTTCCC AATACCTGAT ACTATGCGTT TTTAAGATT GACTACTTGT TAAAACTGGG TTAATTTTCG AAGGTTAGAA TTGTCAAAAC AATCCGTCTA
CTTTACAATG
AACATCACCC
GTGTTTGTCT
GTTTTTCTCG
TCATCTAGTT
ACAATAAAAC
TAAAGACTIT
ACTGTT'rAAT CTATGATTGT AAAAAGTATT TTTrGAGCCG TATCCCT'rAT AAAACTACTA ACATATAAGC CTTTAATCCA GGTTGCCAAA
TTGAAGTAGG
GCATAATATC
T'rTCCTTTAT
AGTTATTATG
AGAAAACTCA
AAGATTGTTC
CTGGTATTTT
CAAAGTCTAA
GTGGTGGATG
AAGAACGTAA
TGTCTATGAT
AAGGACGTAT
GAGTATGCGT GATGCCAACC 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1696 TTCTCAGTCA TGCCGTTGGA AGTACGACCT TTACGATATG CCATTTGGGA GAGTTTGCAG GGTGAGATTC CAGAAGAAT'r TTCAATGTCA GCCGTTGATA TGACCATATC TCAGATATGA TT-GAAAATGG TGTGGACAGT CTAAAAATCG 0 0 GGAGTCTATT CACTATGTAT CAACAGTAAC TCTTGAAAGT CCTGAAAAGT TTGAAGCTAT GGTTGCCCiA CGTGAAcTGG CTACAGGATT GTTGTTTGGT GCTCGTCGTA AAATCCCTGA TGATGATGCG GCGGTA INFORMATION FOR SEQ ID NO: 29 SEQUENCE CHARACTERISTICS LENGTH: 1022 base TYPE: nucleic acid STRANDEDNESS: doubl TOPOLOGY: linear CAACTGCTAC AAGGCGGCTG TGGATGCCTA CAAACAAGAC TTGGTGGACG AGATGTGGAA TTACTATGGT ACACCATCTG AAAATGAGCA GTACAAGTTT GTCGCTGAAG TGGTTTCTITA (Xi) SEQUENCE DESCRIPTIION: SEQ ID NO: 298: CCGAGTTTAT TATGGN'CT AAGAAAGTCT TTA'rAGTCAA CATTrrI'AAC CTTCAGTACT
TCGGAATTTA
AGCAAATTTA
GCTGCAGGrG
TCTCAAAGAT
AGTATGCGAT
CAGTTGGGGC
CACCAGGTAG TGGACGCTrC CTCTrTCCAT TCGTT~rTGC ?r'=TTGAA TGCCGAGTrG GTCACT'rCAA ACATGATGT'T TAAAAAAAAT CTCTrGGAGA AAAACAGCTG AGATTF'rACT TTATCGGAGC CTrGATAGCA GGGTGGGGCT 1-GCTCATTC CACACGATAG TT'rCA'ICTCA GGTGTrGTTG AGATGAAGTT TCTTGcTTGA GGCGATTTTG GCAAATAT'N' TTGTAAATAT TGGTCAA.AGA TGGTGGTGCC AAACTTTGGC TTGTGTTGTC TCTTAACAAA CGAGCACATT GCGGCGAACT TTGCTTCTTT TTGCTGCGGA TTCAATTGCC AACTTCGGTG TTGGAAATAT TGAATTNGCT TGCAAkTAAGA TCGTrCGATG TTCGCAGGTG TGACTTGATT AATAAAATTG TTGGGGCTTG GCCTACATTG CTTGACTGCT GGTAGTTTCT ATACTGTACC T'TGTTCAACC GGCAGCCTAT GCGAATCTGA AGGCCGCTCA AATGAATTGG TGCGATTCTG TCATTTATTT AGCTATTTAC ATGTTTGTAT CGCGATTGTG AAATTCAGTG GCTTCGCCAC TGGGGTGTGA 120 180 240 300 360 420 480 540 600 660 720 780 840* 900 960 1020 1022 0**0* 0 0* 0.
CTTTCATCGG AAACTTTATC GGAGGAGGCC ATAAAAACGA AGATACTrAr GTAGATTAAG CATTTTCAAA ATAAGGTAAT AGCTATTTCT ArACTATAAC TCAAGGTGCT ACAATATCCT Cr'rGTGA'rrT TAAATrITGAA ACTCTACAAC TCTTGATGGG TCTTCCATAT GCCT'rCCTCA AAAATGAGCA CGATTGAGTC GTGCTTTTTT TATATCAAAA TATAGAAAAC TGATATTTGT TAATAAAATA ATATGGAGGT CACCTTATGA TACATGCTGG TCAAGTTGTG GCTCCAGCTA
CT
INFORMATION FOR SEQ ID NO: 299: (i) SEQUENCE CHARACTERISTICS: LENGTH: 663 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 299: CCTTAAGTAA TCTCTGATAA TATTTTCIT ATTAGCATAG GGGAATATCG ATATAATGGC TTCATTATGA GTGGCAGGAA TATCCAATAT GGCAACTTTT CCAATAGATA ATTTAAAACT CATTAATAAA GTTCCTTTAG GTGAAATGTC TATrTTCTTT GATTTTAATG CTAATTTAGA AATAGATTCT CTCGCATTAG AGGTATT'rCA GTTCCCCAAA GTTAACTAGG CTAGCAAATT AGAGGTTGTT TCGTC'N'TGT AATGTCCAAA TCTCTTrI' TATTTTTTCA AGTAAAACTT TTACATAACC AGATATAGGC AAGTAGCTTC ACTGCGTGGA TAATATATCT CCATGCTTCT TCCCATAATA AGAGT'rATCA TAATCTTGCC TTCTrCAAAG CGACTGATTC ATCATTTGGG ATATCTGATA TAGATACCCA GGAGTTTTrC CTATTCTGAA GGGATTTCAT ATATAGGATA TCTCCTTGGG AAACAATAGA AGTTTTGTT TTTCTGCTCG TCTrGTTCAA CTAATTTTCC GGAAATTCTT TATCTAGCTG TCTAAAGCTG A'FrCGATTGC TTGCATAGCA TATTGAAGAA TAGATTTTTT TAGTTTATCT TTCTAGTCTA TTATAACTTr CAGCATAT'rC ATCTACTT
'TC
INFORMATION FOR SEQ ID NO: 300: SEQUENCE CHARACTERISTICS: LENGTH: 881 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 300: CGTCGCTGAA CATGTCAACA GCAAATTAAA CTAAACAAAC CATAATTTTC TTTAGAAAAT ATTATCAGAA GAAAGTTGAG ATCCTATGAC CCTTGAGGAA AAGGAAAAAC TTGAAAAAGA TTCGTCGACC AGAAGTGGTA GAACGCATTA AGATTGCCCG AAAACAGTGA GTACGAAGCA GCTAAGGATG AACAAGCCTT GCTTAGAAAC AAAAATCCGC TATGCTGAAA TCGTCAATAG AAGTAGCGAT TGGTAAAACA GTCACCATCC AAGAAATTGG ATATTATCGT AGGTTCAGCT GGTGCAGATG CCTTTGTAGG CAATTGGGCA GGCCTTGATT GGCAAGAAAA CAGGTGATAC TTGGTAGCTA TGATGTAAAA ATCTTGAAGG TTGAAAAAAC A6TGGGGAGG CGATGTGCTT CACTCACTCC TTTTTCCATT TAAAATTATG TGATACTTCA AAAAATGGCA GAAAAAACAT ATTAGAAGAA TTGAAATTGG TTCATACGGT GACCTTTCAG TGTCGAAGGA CAAATCTCTA CGACGCAGTT GCCCAGGACG TGAGGACGAA GAAGAAGTTT TAAGGTTTCA AATGAAAGCC AGCAACCATT GAAACGCCTG AGCCTAAAAA CAGAAAAAGG TTGCTACTCT TCGAAAATCT ACTGACTTTG TCAGTTTCAT TTCATCTACA ACCTCAAAAC AAAACCATGT TTTGAGCCGA
CTTCAAACCA CGTCAGCGTC CTACAACCTC AAAACAGTGT TATGTTTTGA GCTGACTTCG
GCCTTGCCGT
TTTGAGCTAA
TCAGTTTCAT
ATGTATGGTT
CTTCGTCAGT
CTACAACCTC
CTTCGTCAGT TTCATCTACA ACCTCAAAAC TATGTTTTGA G81 881 INFORMATION FOR SEQ ID NO: 301: SEQUENCE CHARACTERISTICS: LENGTH: 949 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 301: CCTTTTTTAA TACAAGTTAT TTTGATTTAA CCGGCTTGTC TGGCAATCGT ATCTGCATAC AATTTTGCTC CTGCT'rCGAT AATGAACCTG GTCTGTTCCA GCCCAAATTT CTGGATGCTC CTGCTATCGT AATGTAAGGT GTCTTCTCTG CCAATTCTCT CAACGATGGC ATAGGTCTCT TTTGTCTTAT CTCCCTCATA GGTGTCCCTT AGGAAGATTT TTCACGATAC TGTCCCAGTC TATTTACCCC AGTCGCAATG ACCACCGTCT TAGGTAAAAA GCATGATTTC AT'rTGCGGTC TTGGTTGTTA CGCTGACCTG GAAGAGCTGT CTGTAGTGCT GTATTTGCCC TTAAAGCCAC TGCCATCAGC AATTCCCAAA CTGTTTGCAT CTGCCCGT'rC CAATATTTGT TGCAGCTTGC TTCAAGCCAT TGACAGTCAA CTTGTGGTGC CAACAAGGTC ACCGTGCAGA CAATGATGGT CAAGAATTGC GTGAATATAA GGCAGGGGAC GAAsGGTTTG CTGCAATCCA AGGTTCCAAT ACATAAAATG ACAGACTGGC GAGTCAGTAA TACAGCAAGA AGATTTGATG TCAACTGTGA T'rGAGCTGTC TGCAAAGCTG AGTGCTACTC TCACTCCCGA TT'rCGCAACT TGATTCCAAT CATATAGGCA GCAGCCTTCT AGGAGTCACC AAAATCATAT ATCCT'rGTAA TTCTCAGGAT TTTATTCTGG CTATTATTTA CGCGTTAATC TGTGCTCCAG TGAGTCACCA ATTAACATAG TGCCATCACC TTOGTCTGGC GTCTGTCTCA AACGCTCCCA CAAGATTCCT GTACCTGCTG GACAATAGGT GTGTTCTTGC AAAGCCATAA GAACAAATCA GAAAATGATA TAGAAAGGCC AATGGAAAAG ATAAACCGCA TAGCTAGTAT CCGCTAAAAA GCTGATAAT INFORMATION FOR SEQ ID NO: 302: SEQUENCE CHARACTERISTICS: A) LENGTH: 622 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 302: AAGATATATT TTTTACACAG AAGTATGCAA AAGTAAAGAG TGCAAAAAAT GGAATTAAAG 949 1332 CGAAAATAAA AGCCGTGTAC AGGCGACCAA ACCAACGTAC ACGGCTAAGG AAAAATAACA 120 AAACTCAAGC AAAGGCAAGG CGCGTGGTTT TGTTAGOTAT TTAGCAAGGG GACAAACCCC 180 TTTGTAAATA ATCTCCTC'rr ATTTTATCAA AATTAGAGGA AAATGACAAC TTAATT-TATA 240 AAAAGGAAAA ATGGAGGATA TAAATGGAAA TTCTGTCTA.A AGAAATACAG ?1'ACAGGGCT 300 TACAACTTCT TAAACAGACT CPGAAACTT TAGT'rGAGCT IkAG&AAACAA CGATCTAGTA 360 AGTTAGATTT AATTTCTCGT AAAGAATTAA TGGATCTGCT AGGTATAAGT GCTACAACCC 420 'ITGATAACTG GGAGGATCrT 
GGTCTrAAAC GATATCAGAC TCCGATGGAT GGAGCTAAGA 480 AAGTATTCTA TCGTCCGTCA GATGTGTATT TAT'NTAGC AATAAAATAG GAGTTATGAA 540 ATGAAAATTG T'rACTT'rCAA ACCAACTAAA CAAATAGACG ATGGGT'TTTA ACrGCCAGGT 600 ATTGACATTC TAT'rTGTCTC AG 622 INFORMATION FOR SEQ ID NO: 303: SEQUENCE CHARACTERISTICS: LENGTH: 1929 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 303: CGCTAACTTG CAAACAAAAG AAGAACGCAA ACTCCACAAA TCCTTTACGC AGAAACTCAA TCTCATCTAC TTACCTTGCT GACTTGGTAG AGTATGTTGC AGACAAAGAC TTCTCAGTAA 120 ACGTAATTTC TAAATCAGGT ACAACAACTG AACCAGCGAT. TGCrTTCCGT GTCTTTAAAG 180 AACTCTrGGT TAAGAAATAC GGTCAAGAAG AAGCTAACAA ACGTATCTAT GCAACAACTG 240 ACCGCCAAAA GGGTGCTGTT AAGGTTGAAG CAGACGCTAA CGGTTGGGGA ACATTTGTTG 300 .TTCCAGATGA TATCGGTGGA CGCTTCTCAG TATTGACAGC CGTTGGTTTG CTT'rCAATCG 360 CAGCATCAGG AGCTGACATA AAAGCTCTTA TGGAAGGTGC GAATGCAGCT CGCAAAGACT 420 ACACTTC.AGA CAAAATCTCT GAAAACGAAG CTTACCAATA CGCAGCTGTT CGTAACATCC 480 TTTATCGTAA AGGCTATGCA ACTGAGATCT TGGTAAACTA TGAGCCATCA CTTCAATACT 540 ***TCTCAGAATG GTGGAAACAA TTGGCTGGTG AATCAGAAGG AA.AAGACCAA AAAGGTATCT 600 ACCCAACTTC AGCCAACTTC TCAACTGACT TGCACTCACT TGGTCAATTT ATCCAAGAAG 660 GAACTCGTAT CATGTTTGAA ACAGTTGTCC GTGTTGACAA ACCTCGTAAA AACGTGCTTA 720 TTCCTACTTT GGAAGAAGAC CTTGACGGAC TTGGTTACCT TCAAGGAAAA GACGTTGACT 780 TTGTAAACAA AAAAGCAACT GACGGTGTTC TTCTTGCCCA CACAGATGGT GATGTACCAA 840 1333 ACATGTATGT GACTCTTCCA GAGCAAGACG CTrrCACTCT TGGTTACACT ATCTACTrCT TCGAATTGGC AAT'rGCCCTr TCAGGTTACT TGAATGCTAT CAACCCAmT GACCAACCAG GTGTTGAAGC TTATAAACGT AACATGTTNG CCCTrCTrGG AAAACCAGGA TTTGAAGAAT TGAGCAAAGA ACTTAACGCA CGTCTATAAT AGAAGAAAAG AGTGGT'rTGC CCACTCTTTT TACTCTCTTT ATCCATAGAA ATTGGACTCA GCCAAGACTT GTGATATAAT ATAGAAAGCA AAAAGGCAGA CGCCTAGATA ATAGGAGAAA CTATGTCAAA AGATATCCGC GTACGTTACG CACCAAGTCC AACAGGACTA CTACACATCG GAAATGCTCG TACAGCATTG TTTAATTACT TGTATGCGCG CCATCATGGT GCCATGTCGA GGATGGTGAA GGGATGAAAG TCCAGAATCA AAAAATATAT TGACCAACTA AAGAGTTGGC AGCTGAACGC ATGAATACCT TGGTATGAG'r 
CAGGGATCAT CCCAACTGTT ATATGGTCAA AGGCGATATC AAAAGAAAGA CGGTTACCCA AAATCTCTCA TGTTATCCGT
GGAACATTTC
CGTTCACAAC
CATGAGAATT ATCGCCAGTC TTAGCTGAAG GAAAAGCCTA GAACGCCAAG AAGTAGCTGG GAAGAAGAAA AAGCAGCTTA CGT'rTGGCTG TCAATGAGTC GAAT'rTGAAG GTGGCAATAT ACTTACAACT I-PGCCGT'rGT GGAGATGACC ATATTGCTAA TGAGCGT'rTG GACTTGTATC TAAATCTTAC GTrACAGAAG CGAAACACCA CGCTACATCA CATCGCAGAA CGTGAAGCAG AGGTATCTAC AAGTGGCATG CGGTGGTGAC TGGGTTATCC TATCGATGAC CACGATATGC TACACCAAAA CAGCTTATGG TCATCCGTAT CGAAGATACT GACCGTAAAC TTGAAAACCT TCGCTGGTTA GGCATGGATT 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1929 120 180 240 TCTATGAAGC TCTTGGTTGG GAAGCTCCAG AGTTCGGTCA CATGACCTTG ATTATCCACT
CTGAAACTG
INFORMATION FOR SEQ ID NO: 304: SEQUENCE CHARACTERISTICS: LENGTH: 708 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 304: AAATTTAAGA AAAAGGAGAC ACATCATGTC TAAAAAAGTA TTA'NTTATCG TCGGATCACT ACGTCAAGGT TCTTTCAACC ACCAAATGGC GCTCGAAGCT GAGAAAGCAC TTGCTGGTA.A AGCGGAAGTT AGCTACCTTG ATTATTCAGC CCTTCCTCTC TTCAGCCAAG ArrTGGAAGT TCCAACACAT CCAGCTGTAG CTGCTGCTCG TGAAGCAGTT CTCGTTGCGG ATGCTATCTG
GATTTTCTCT
GCTATCTCGT
TGTCACAGTA
CCTCTTGCCA
CTCTGCCTGG
ACAAGCTCAA
CCAGTCTACA ACTTCTCTAT CCCTGGTACA GTGAAAAACT TGCTTGACTG GCCCTTGACT TGTCTGATAC ACGTGGCGTT TCTGCCCTTC AAGACAAGTT TCATCTGTAG CCAATGCAGG GCACGATCAA CTTTTCGCTA TCTACAAAGA TTTATCCGTA CACAAGGCGT TGGTGATTTC ACTGCTGCAC GTGTTAATGA GCAsACGGAA AATTGGTrCT TGAAGAAACA GTCCTAAACT CACTTGAAAA GACTTGGTCG AAGCTATCAA GTAACTAACA CTCAATAAAA ATCAAAAAGC ATAGAACTGA AAACTAkGAA GCTArCCGCA AGCTACTCaA gCACTGCTTT GAGGTTGTAG CGAGTGTnnA ACATATATAC GGTAAGGCGA CACTGACGTG GCTTGAAn INFORMATION FOR SEQ ID NO: 305: (i) SEQUENCE CHARACTERISTICS: LENGTH: 781 base pairs TYPE: nucleic acid (C) STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 305: CTTCTNTTCT TGGAAATAGG TGTATAATAC GTT'rATTAAA TTTTTGAGGA AAGAAAAGTT TTATCCATCA ACAAGAAGAA ATTTCCTTTG TCAAAAACAC TATTTGAAAG ATAAGCTAGA AGTTGTCGAA GT'rCAAGGTC CTATCTTGAG GACGGAATGC AGGACAACCT GTCTGGTGTG GAAAATCCAG TATCGGTCAA ATCCCTGATG CTACTTATGA AGTGGTGCAC TCACTTGCTA AATGGAAACG
GTTGTCTATG
TTTTACCCAG
TAAGGTCGGT
GGTTCTCCAA
CCACACCTTG
GCTCGTTTTG
GATGAGGATT
GTTATCCCAA
AAGGCTATTC
CCAAAACAAA
CCGAAAGAAC
GCTTTGGTGA AGGAGAGGGT CTCTTTGTCC ACATGAAAGC CCTTCGTCCA CCTTGGATGC AACCCACTCT GTTTATGTTG ACCAGTGGGA CTGGGAGAAG ATGGTA.AGCG TAACATCGTT TATCTAAAAG AAACAGTTGA GAAGATTTAT GCCTGACTGA GCTAGCTGTT GAAGCCCGCT ATGACATCGA GTCTATCT'rG TTACCTTTAT CCATACAGAA GAATTGGTAG AACGCTACCC AGACTTGACA GTGAAAATGC GATTTGTAAA GAATTTGGAG CCGTCTTTTT GATTGGTATC GGTGGCGAGT TGCCAGATGG TAAACCGCAC GATGGACGTG CACCAGACTA TGATGACTGG ACAAGCGAGT CTGAGAATGG CTACAAGGGT CTAAATGGTG ATATTCTTGT CTGGAATGAG
T
INFORMATION FOR SEQ ID, NO: 306: SEQUENCE CHARACTERISTICS: LENGTH: 846 base pairs 13 TYPE: nucleic acid STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 306: CCCGCATCTr GTAGGGTTTT AACGGGCACG ATTTrCATAT CCGTCTTGAT TGTTTTAGCC GCTTCTAGGG CTGTTTG.GTA GTTGTTTTTC GCGTCCGGAT GCGCCTTTTG TrCTTCTTCG 120 CTAACAGGGT TATCAGGAGC AAAGAAAATA GCAGCACCTG CCCTAGCCGA AGCTACAACC 180 TTCTTATCAA TACCTCCAAT GTCTCCCACA TTACCATCGC GGTCAATGGT ACCTGTACCG 240 GCAACAATAC GACCATTACG AAGATCTGGG TGAGCTATTT GAGTATAGAT AGCTAGACTA 300 AACATGAGAC CAGCACTrGG ACCGCCAATA CCAGCTGTTG AAAAGC'TAAT TGGGACATTG 360 CTGATTACCT CTGTACGGTC AATCAAGCCG ATTCCAATTC CATTrTTGCC ATTTTCCAAG 420 *GTGATGATTT TTCCTTCTGC AGACTTGGTT TGCCCATCCT CTTCATAGGT GACCTTGACG 480 GAATCCCCTA ATTTTTGAGA ACTGACGTAA TCAATCAAGT CTTTGGAACT ATCAAAGGTC 540 TGATCATTGA CTGCTGTGAC TGTATCAGAG ATATTGAGAA TCCCTTTAAA GGTTGAATTA 600 TCCGTCACAT TCAAAACATA AACTCCAAAG TACTTGAGTT CGATATCCTT ACCAGCTGTT 660 *TTTAGTCCTT GATACTTGGC CATATTTTGC GATGTTTGCA TGTAGAATTG ATTGATTCGC 720 ATAAATTCAA CATCGGAAGA ACCACCTGTA GTCTCCTGAG CACTACGAAT ATCTGTAAAA 780 *GGTGTCAACC AAGCATAAAT CATATGAGCT AAAGTGGCAT GTTGAACACC AACCGTAACG 840 *AATTGT 846 INFORMATION FOR SEQ ID NO: 307: i) SEQUENCE CHARACTERISTICS: LENGTH: 829 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 307: GCGATCTGCT TGGGCTTTI'C CTATTACCTT ATCTAATAAA TAGGTACGCA GACTCATAAC CATATAAAGT CCACCCCCCA TGGCACCGAC AAGAGCTACA TAAAAGAAGC TCCACAAACG 120 TCCACTTGGT TGGAAGAAAA ATCCTAACAG CCACTGGATG GTTCCTATTA ACAGAAACAT 180 GACTAGGGTC AGCAAACTGA TTAAAATGGT TCGCTTCAAA ATCACCTTGC GCTTGACACC 240 1336 AGTTACTTTA CAAATATCCC GATACATCAA GACGTTAGGA TGAAATCAAA GGACCATAAC TGTGGAAGAG GGCGATGGTA GGCAATAGAA CCATAGATAA AATAGAGAAC GGCCTrGCGG CATTGGAGAC AAGACCATGT ACAAGCCrAA AATAATAGAC TAAGCCCAGA GCCAAACTAT CTGGCTTACC ATAGAAGACC CATAACCACT CCAACCGTTG CTGGTAGCA.A GAACATAAAG AACGAGACGA GAAGCTGCTT TCAAGTCCCC CT'rGACATAG ACCAACACTC CCAATCGAAA CCCCTACAGA 
AATCAAAATC GGCTGAGAAA TAAGAAAACA TGACAACCAA GTCCTCATTG ATGATGAGAG CAATGGTTGT GGTAGTTGCA AGACTAGCTT TTGCGGAACA TGGCCTGAA.G TGCAAAACTG CAAAGACAAA GTATAAAGAG GTrCTCCPAC AGTAGGGTGA GACTGTCCTG TTTTCCGTCA AA.AGTGGCA.A ATCGTGATTT TATTAGGA'r? CTG'rAGTrGG TAAACCAGCT 66666* 6 6 .6 6 6 *6 6 6 66 66 0
CATACTATTG ATAAAGGTCA GCTGAGTCCA AATCTGGAAG AGCTGGATG INFORMATION FOR SEQ ID NO: 308: SEQUENCE CHARACTERISTICS: LENGTH: 464 base pairs TYPE: nucleic acid STRANDEDNESS: double (D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 308:
CGAACATCTT GCTGGCTIGAT TCGTCTGCCG CCATCGCAGC CCCG TGGCAAGCGG GCTCAATCCG CACATGGGAT CCGTGCCAAA GCCCI GCTCATCTAG TAACGTATGA GGTTTGCCTT CGCTGTCGAT AAAC CACTGCTCGT TCTCCGCGGA GGGGAAACCG ACTGCGGTAG CATG GATCACGACC TACCAGGTGC GGCTCGTTGA AGCTGTTGCC GCTT CCACGCAT'rC CCAGAACTCA ACGGGGGTTT GATCGGCGTT CGGT' GGTGCACGGG ATGCGAAGTG GCCACTTCTG GCACACCGTT CTTG TTGGGAGGGT GGCCAGCGTT TCGGCGATGA GGCGCACGCA GGCC INFORMATION FOR SEQ ID NO: 309: SEQUENCE CHARACTERISTICS: LENGTH: 982 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 309:
A.ACACA
ZGCGTG
CGATAT
k.ACTCC 1
GCAGC
rGCTGA
TCTTCG
TTGCGACCCA
TGCATCATTT
TCAATCGCAC
AGAGAAGAGA.
AGGCTCGCCA
CTAATAACTC
TAGAGAGCAA
1337 CCGTCTATAA TGGTAATAGA TTTTATTTGG AGGTTTTTAT GTCATrTCTA TCAAAAAATG GAGCAGGTAT CTTGGCCTGC CTTCTCAT CCATCCTATC TTGGTACTTA GGAGGATTCT 120 TCCCTGTGGT TGGCGCGCCC GTTTTTGCCA TTTTCATAGG CATGCTCCTA CATCCCTTTC 180 TCTCGTCCTA TAAACAACTG GATGCTGGTI' TGACCTrTAG TTCCAAGAAG TTGCTCCAAT 240 ATGCCGTrGT CTTCCTTGGT TT'rGGTCTCA ATATCTCGCA GGTCTrCGCA GTTGGCCAAT 300 CTrCACTCCC TGTCATCCTG TCCACTATCT CAATAGCTCT GATTATTGCC TACCTCTTCC 360 AGCGTTTCTT TGCCCTGGAT ACAAAACTGG CTACCTTGGT TGGAGTAGGT TCTrC'rATCT 420 GTGGGCGrTC TGCCATTGCA GCGACAGgCC CGTTAT'rGAT GCTAAGGAAA AGGAAGTACC 480 CCAAGCCATT TCCGTrATCT TT~TTCTTCAA TGTCTTGGCT GCGCTCATCT TTCCAACCCT 540 CGGCACCTGG CTTCATCTAT CCAATGAAGG CTTCGCCCTC TTTGCAGCGA CTGCGGTCAA 600 CGACACTTCC TCTGTAACGG CTGCCGCCAG CGCTTGGGAC AGTCTTTACC AAAGCAATAC 660 o oCCTCGAGTCT GCAACCATTG TTAAACTCAC ACGTACTTTG GCCATTATCC CTATCACGCT 720 *-CTT'rCTATCC TACTGGCAAA GTCGCCAACA AGAAAACAAG CAAAGCCTGC AACTGAAAAA 780 .AGTCTrCCCA CTTTTTATCC TTTACTTTAT CCTTGCCTCT CTCCTCACTA CACTACTCAC 840 .CTCTCTAGGT GTGTCCAGTA CTTTCTTTAC TCCTCTCA.AA GAACTCTCTA AATTCCTTAT 900 o TGTCATGGAC ATGAGTGCTA TCGGTCTCAA AACCAATCTG GTCGCTATGG TCAAATCCAG .960 TGGAAAATCC ATTCATCATG CA 982 INFORMATION FOR SEQ ID NO: 310: SEQUENCE CHARACTERISTICS: o~oo LENGTH: 1939 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear o o o o (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 310: CTAGCTGCCA ATATGATTGG GGTGCAGAAG CGCGTGATTA TCTTTAATCT TGGCTTGGTT 0 CCTGTGGTCA TGTTTAACCC AGTGCTTCTG TCCTTTGAAG GATCCTATGA GGCAGAAGAA 120 GGCTGTTTGT CCTTGGTAGG TGTGAGATCA ACTAAGCGTT ATGAAACCAT AAGGCTTGCC 180 TATCGTGACA GCAAGTGGCA GGAACAGACC ATTACCTTGA CAGGCTTCCC AGCTCAGATT 240 TGCCAGCATG AGCTGGATCA CTTGGAAGGA CGAATCATTT AGGAGGAAAG CAAATGAAAC 300 GAATAGTCTIT TGAACTTATT TTTATCGCAA CGACCTGGTA TATCTrTTTA CCGCCCCTTA 360 ACCrGACCAG CTGGGAATrT CTCTTCTTCC TATr'rGGCI-r TGGCAAGGGG ATAAACCTTG CGGAAGCTGC CTTAAATCTr GAGGGTTrCA CTTCGATTGG AGGAATTCTT CTCTTGGCAG TTCAGGCTAA AAATTATGCC AATGTAGTCA CTAAGAGTGA CACCAGTAAG GTTCCTATCC ACCGCTACTr 
GGGTrCCCTA ACCGATAAGG CCCAATTGAC AATTGATGGG AAACCTTATC TCAAATGGTT TAACAATCAA GCCAAGGGAA CTGGAAATGC GGATTTGGTG GACTTGAAGA TTAACCGTGA TGTCAAACGT CACCTGCGCT 1338 TCTGTGGGCA TTTG?1'AGTT GTGGCAATAT TCAAAACGGT TCATGTGCGC CACGGTAAGG AAATCAATCG GTTAGGGAAA ATTC'rGTrAG CTrTGGTTI'c CT'rGGTAACT TCCAGCATGT CGCTTACGGA AAAAGACTTr' ACTGAATT'rC TAGATAGAAG TACTGCTGAA A.AAATTrGGAG TGTCGCAATA CGTAGCGGCA GATACCTATA GGGTCACACC ACTAGAATAT GCAGACCCTA TCGGTGAGTA TATTAAGGTG GACATGGTAA CACCAATCAA GTA'rrCAGAC TCGGAGTATT TGAAGTACCC GACCAAAATC 'N'TAAAACTC CATCTTTTGA GGTGGACGAT AATTTGGACT TGCTGTTCCr AAACCAAGGA ATACAGC'rTA AGGAAACCAT TGAGCAAATC TGAT'rTCCAA GAAAAACGTG ATGACATCTA TCTCTACACA TCATCCTTGA AAATATGCGA AAGAATCAGC CCGTGAATCA TCCCAATCCT CATCAACCTC CTGGCTTGGT CAAAGAGTAC CTACTACAGT GGAAGAGATG ATGCAACGAC AGAAAGCATC GAGACACTGT CTACTTCTTT CCGATGACCT TCCTTACCTT ATTATCTCAA GACCTTTAAG
GAGGGCAATC
CGTCCTGCT'r
TCAGATGTTC
AACTACAACG
ACCCAGAC'rA
GGI'GTGACGT
CTTTCTATGT AGCAACGGT CAGTCATTAT CTTGGATGCT CAGAATGGGT GGACAGGATC GCAAGTACAA GGACGGT'rTC CCAATGGCTA TAATTACTTG
CGGCTAATGC-GGATGAGAGT
TACCAAAAGC
ACAAATGGAG
TATCCAGCAG
TTGAATGCCA
TCTATCGGTA
AACTTGGTT'
420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1939 ACAGGAGAAA TCACTAAGTA TAGCTTGGCT TCTGCGACAG GCAGAAGGTG CTGTTCAGGA GAAATCCTAC AAAGCAACCT AATGACAAGC CTCTCTACA'r CATGGGCTTG AAGGACAATG GCCCTGGTAG ACGCAGTCGA GTACCAAAAT GTTATCGTTG CTCAGCAAGT ATGCCAATAA AAACCACCTT GAAATrGACA AATGGAGTAG TAGCAGACCT CAAATCAGCT GTTATCAA(.,G AAAGTTGATG GCAACATCTA CAAGGTCAAG GCTTCAGTAT GAAAATGGTA AAACCTTCGA AGGTCAACTA GGAAAAGACA CTACGGTAAA AATAGGTTTT TTTCAGAAAG TATATGTTAT AATAAGGTAA AT'rAAGCCG INFORMATION FOR SEQ ID NO: 311: SEQUENCE CHARACTERISTICS: LENGTH: 907 base pairs TYPE: nucleic acid 1339 STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 311: CCTGCTAATA. GAGAGAAAGA CTAGGAGTAG AAGTAAGCCA CATACCCCGT CCTTTCATGT AGATTTGGTA TCGAAAGATA ATTATTTTTC TAATCTGTCA ATAAAATTTC TGACAATTTA
AT'PAAATAAT
TCTGCGGATA
ATAAATACAA
GCAACAAGAC TTrCTCCTTT GTTATCCTAT TCTAAAATGT TTTTACC'rTA CAATCAAGAT ATTGTTTAGC ATAATATCI' CGAGGGAGTA TCTGCTAGGA CACTGGCTGG TAATTTAGTA ATTCT'rCAGC CCTAAGTTAA AGATTTGAGA TGAGCCTATG CAAGGTCCAA TCGTAGTCAT CTrCCAAATAT TTTGGAATGA TGTGAGTTGG CCAGCAACAT TAAAGTAACG TAAATCATTC GTTCGCCCAT GTATCTTCAG TCACCGGCTT AACATGATTT TTTGAATGCC
GCTAGCCGTC
GTCACTAGCA
AGTT'N'AAAG
AGAACTGrCT TCTTGAAATA GACATAAATG TAATCTCGAA TTTTAAGCTA TCAT'rTTGTC ATTTrTCACA CGCAGACCGT GAAAATAACA TATTTCCAGT
CGTCGAGCAA
ATTTCTTTGA
CAATCTCGTG
TAGTATAGCC
GGTAGTTCAT
TACATGAACC
CCAATGCGGT
TTGAAGCATC
GAGAAAGTTT
TAAATGTAAC
CAAGGAGAGA
ATCTGATAAA
T'N'TGAAGCA
TGGGATTTTT
TTTTTTAGTT
TCCTTT1AACA
GTCACGTGTA
CTTGT-rGATA
CATTTCAGCC
CGTAGCGATT GGCCATCCAG
CAGTTTTGTC
6TCAATACAG
AACTTCAGAT
TCTGCATAAG GGTTGACAGG GTCGAGCAGG TTATTTCCAT AGAGAGAAGC AGTCGAAGAG AAGACTTTGA GAACTTGGTT CATACCAGCA
ACGTTGG
INFORMATION FOR SEQ ID NO: 312: SEQUENCE CHARACTERISTICS: LENGTH: 2170 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 312: CCACATAAAG GTAAATATCT TTTGTACTAT CTTGGGCATC CAAGAAAAGC AATTGGGCAA
TAACAGAGTT AGCCATATTG TCTTCAACCG GACCTGTCAG CATAATGATG CGGTCTTTGA GAAGACGTGA GTAAATATCG TAAGAACGTT CTCCACGGCT TGTTTGTTCA ATAACTACAG GAATCATTCA TTTCTCCTTT TGAGTTTTAA TTTTGTTGGT CAAATGACTG AAGATAAGAC TATTATAATA TCTTGGTCAA AAAAACCCAA CCTCCTI'TCG TTAIrTTGTA CCGAACAAC rTCGTTCAAA CGTTCATCCA AAGGGCTTTT ACACCCTCTG ACGTTTTTTIA AGAGAATCAA TACTACAAAA ATTI'GACGTT TTGAAGTGTT TCTTCATCAC GTTCAAGAGA CCATCAACCA CTTACCTGCC AA'rTGTTT'TT TAGTGGAAGA TCACGAAGTA ACGAAAAGCT TTTGTAGAAG TGGGTGATTA ATAACTTCAA CTTATTATAC CAAAAAACGG ACATTAGATC AGCCTCTTTA CTGTCCCTTT TTCTGATAA AAACCACGCG CT'rGCAGCCG GGGTTACATA GGCTGTAAAG TGACCTCAGC GTGAAGGGTG TACAATGCTC AGTCCCTrGAC TTACCAGAAT CGCGCCCACT 1340 AAAAGGTCAA AT'T'TGCTC TGACTGGAAA TACTTTTCCA GGTCTCCAGC ATCTCCAAGA AGGCTGCTGT AAAGATTT-CT GACCAGATAC AAGGCAGACA CAGCCAAGAT TGCTGAGCCA GGTCAATGTC CTCAGGCAAT GGTACATACC GATGTGGCCA
TGCI'TCATT
AGTCATTCT'r
CCTGGAACGA
ACATCTGGAT
AATTTGATAT
CCTGTTGCCA
TTCACCAAGT
ACTTTAGCAG
AGACAGAAAC
CTTTTCGATC
TATAACCGTG
GAGCTTCr'rG
TTGATGCGCC
ACATTGGGTC
ATTCAACTGG
CTGGAACCAA
TCCCGATACC TGCACGCAAG ATTGGGACGA TGGCCAAT'TT GAACTGTTTT TGTAATTGGT GTTTCGATTT CCACATCTTC 0 0~ 0 0., 00 0 0 00 00 0 0 C. 0 0 0 CTTCATACCC CATCAACATfl TATCTGTACG ACGCAAGA'T- TTTTTCCCAT TTTT~GGAAT'r TTTAAAAATC TTTC'rAAACC AGAGCTGTCT GTACTGTCTC AGGTATTGGG CGTAGTCGTC ACCTGAAGCA ATTG'TTT'TGT CCTTTGGGAA CACCACGCTC CGAACGCAGT GGCCTTCAAT ACCGAACCAT TGTAACCAGT TTAGCACGT'r TACAAGTGGA GCAATCTCAT CTACTAGCTC GACAATT'rGT GTTGAATCAG CCTTCTTTCA ATTTATTCTT ATTTATTTTT GATAATTTTT AAGTGGTAAA TGGGTCAATT CATETCGGTAC TGGTTGATAT ACAG'rrGAGA CAAGGAAAAT AGCACCTTGA AGGATAGCAT GACCAAACAT TCGTGATCAA GGAAATAACC TTATTATCTT ACGATTCGCA AT'rAGTAGAG AGTCATCTCT TTTCTCCTTT CTTTTCAGC'r GGTACCTTCA CTTGGGCTGC AAAATACTCA TCCCAGGCCA GTCTTTTTTC TTCTCTATTT TTTAAAAAAT GGTAAACCTA AATCTGCAAT 0e0~ 0.00 00 *0 0
C
TGCCATCCTT GATCCATTTT GATAAGAGCT GACACCTTTT GGACAAAGAT TTTTT CCAGA AGAAGGACAG AGACGATGGC GATTTCCCAT GGTATTTTTC TGGTAATCCA AGGCCAATTG
TGAGCTCCAG
TAGAAATTCC
AATTACTCTA
AAGGAATGAA
TGCATGGCTT
GCTTCTTTC
TCGTAATGAT
ATTTCCTCTT
TGGCCTCCCG GAAAAGGTGA ACCCAAACCA AATAAAGGTC TGTCTTTAAA GCAGCTGTTC CATAATATTG TGGACAGTTC GTTTAAAGAC GCTCTCTAAA TGGAGTCATA ATTGCGATAA AAGGCCGCAC GCGAAACACC AAATACTAAT CTTGGTCAGT TCCTTTTTTT CCAAGAGTTG TGCACGTTTG ACCAATTCAG CAA~nnCT GTTTCAATGG 1341 CT'rCTCTGGT TAATAAATTG GATTCTTGGT TTGATTr'TCT GAGATTTTCA AGAGACTTTT CAGAGATTCT ACGT'rCAGAC ATAACATTTT CTrCTACTr GTCACAACAG ACGGATGATG
CTTTTGTTTC
INFORMATION FOR SEQ ID NO: 313: SEQUENCE CHARACTERISTICS: LENGTH: 539 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 2100 2160 2170 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 313: ATCTGCACGA ATCAGGGCTT TCTAAGTGAC TATTTCCACC GAAATATTAT AGGACATTCA TATGTCACGT TATACAGGAC CATCTTGGAA ACAAGCTCGT TTTCACTTAC AGGTACAGGT AAAGAATTGG CACGTCGTAA CTACGTACCA GACCAAACAA CCGTTCTAAA. TTGTCAGAAT ACGGTTrGCA ATTGGCTGAA TTCGTTTCAC TTACGGTGTA GGTGAAAAAC AATTCCGTAA CTTGTTCGTA AAATCAAAGG CGGAATCCTA GGTTCAACT TTATGCTTCT TT'rGGAACGT ACGTTGTTTA CCGTCTTGGT CTCGCGACTA CTCGTCGTCA AGCTCGTCAA ACGGTCACAT CCTTGTTGAC GGGAAACGCG TTGATATCCC ATCATrCCGC GTCAAGTGAT CTCAGTTCGT GAAArATCAT TGAAAGTTCC AGCAATCCTT INFORMATION FOR SEQ ID NO: 314: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 667 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear T'rATATCAGG
CGTCTTGGCC
GGACAACACG
AAACAAAAAC
CAAGCTACAA
CGTTTGGATA
TTCGTAA.ACC
GTAACTCCAG
GAAGCAGTA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 314: CCGGTTTTGC TCCTTCTCTA CGGCTACGAC GTGATGTATC TCTGATGATA TCCACTGTrT CTGTAGCAGG CGTAGGTGTT TCTGGACCTG CTTGTTCTGC TTTTTTCTCT GCCGTCGTAT AGGAAACAGC TACCCTTGTT GGGGTTTCAT TGTATTCTCT TTCAAGTTTC TTAGGTCTAA CAGGACCTGG ACCTGGTCTT GATCCACTTT C'PTCCGCTGG AGAAGAAGGT ACATCTTGAC TTGGATGACT TGGAACACCA GGAGTTTCTC TTTGAATCTC ATCTGCTGGA GAAGCTGGTA CACCTTGACT TGGGTGAGTA GGCACGGTAG GAGC M rTCT ACAAGGAATC AGCCATGAGT TCTI'CAGTTG AAGGTTCAT'r CCTCATCTTC TTTCAGAACT TCATCATAGC CTTTTACTTT GCTCTTTAAA GCGTAATC TCTTCTGCTC 'ETGACTTTTC TGTTGAGAAT CCATAATATT AGAGCTGAGA AGTCCAAAAA CCTAACGGAT TTTrGTCATTT CCCAGACCAT ATCATACCAT
CTGGGAA
CATAATCTCC TCTACCGTTG TGCAGGAGTG CGAACTACTG TrCTAAATCT CTCAGAATCT ACTCAAAAGT TTTTCCTCCT AAGCAATCTA TGATACTrTr GTTTCCCCTG CAAAGGTTGA INFORMATION FOR SEQ ID NO: 315: (i) SEQUENCE CHARACTERISTICS: LENGTH: 1483 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 315: GGGAAGCCAA GGTAT1'TTAT AGAGTCTTCT AATGCAATTA TTCTGCTGAA ATGATTGCCT TAATAGTATC AGCTCT1AATG TGAATCTACA GTAACATCTA CAGAATTATC AAAAAAGAAT AGGTGATGGG ATTCATGATG AGGGCTAGGT GGAGGAAATG TTTTTTAAAA AGTCATACAC AAATATTAAG AATCACCCTT CGGATGAAGT TGTTACTAGT
CTAATGATTT
CTAATTCAAC
GTACTATTCG
CTAATGAAAA
TTGAAGATAC
ATCGACAAC
TATATTTTCC
ACTTAGAATT
CCATTGTTTT
AGATAATTCA
CACTAATGGT
TTCCAATTCA
TAAGAGTTAT
TGCTTTAAGT
AATTCAAGAT
TGAAGGAACT
GAATGAGAAA
TATGACAGGT
TCTTCACCGA TGGCTACAAA CCAACTGTTA ATCAGAATCG 'PTAGATAATT CGTTAAGTGT
CAATTAGACA'ACAGAACAGT
AAGGAAGATG TTATAAGTGA GTAAAAGATT ATGGTGCGGT GCAATAGATG CTGCAGCTCA TATTTAGTAA AAGAAATTGT GCTACAATTC 'rAAATGGTAT TTATTTACGG ATGATGGTGC GCAAGTAGAA TGGGGCCCAA CAGAAGATAT TAGTTATTCT GGTGGTACGA TTGATATGAA CGGTGCTTTG AATGAAGAAG GAACTAAAGC AAAAAATC'rA CCACTTATAA ATTCTTCAGG TGCATTTGCT ATTGGGAATT TTATCAAGGG CATGCTATTC TTTTCTTGGG CAAGCCTTAC
CAAATAACGT
AAATTGCAGG
CCAAAACGAT
AACTATAAAA AATGTAACAT TCAAGGATAG TTCGAAAA.AT GTATTAGTTG ATAA'N'CTCG GAAGGATGGG CA.AATCATAA GT'AAGGAGAG TTTTCCTTAT GCCTTGAATG ATGATGGGAA CTATTTTGGC AAAAGTGATA AATCTGGGGA CATTCAGATTr GAACCATTAA CTAGAAAAGG AAAATCTGAA AATGTGACTA 'PTCAAAATTC 1343 A'1'3AGTAACA GCAATTGGCA CACACTATCA AACATTGTCG ACACAGAACC CCTCTAATAT 1080 TAAAATTCAA AATAATCATT TTGATAACAT GATGTATGCA GGTGTACGT'r TTACAGGATT 1140 CACTGATGTA TTAATCAXAG GAAATCGCTT TGATAAGAAA GTTAAAGGAG AGAGTGTACA 1200 TTATCGAGAA AGCGGAGCAG CTTTAGTAAA TGCTTATAGC TATAAAAACA CTAAAGACCT 1260 ATTAGATTTA. AATAAACAGG TGGTTATCGC CGAAAATATA TTTAATATTG CCGATCCTAA 13 AACAAAAGCG ATACGAGTTG CAAAAGATAG TGCAGAaTw1r TTAGGAAAAG TATCAGATAT 1380 TACTGTAACA AAAAATGTAA TTAATAATAA T'rCTAAGGAA ACAGAACAAC CAAATATTGA 1440 ATTATTACGA GTTAGTGATA ATTTAGTAGT CTCAGAGAAT AGT 1483 INFORMATION FOR SEQ ID NO: 316: SEQUENCE CHARACTERISTICS: LENGTH: 2453 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 316: *CCTGAACGCT TTTTTATAAA TATCATAAAG CCAATCTGAT TTATCAAGTG TGTCTAAGCG *ACGCGAATTA. AAATTCATTG CATACTCCAT CGCTTCTAA.A AAACTCATTT TTGAAAAGAC 120 99000.
o GTTAAAATCA TCTAAATTCT GACTCCAATA TAATAACAAA ACCAATCCCA TAATATCCTC 180 TGGTTGATTA TTCAATAAAT TTAAGTTGGT TTCATAAAAC CCTGGAGTTC CAAATAGAGG 240 CAACTTTTTT TCTTCAATTT GAGTTTCTTT CCTTAGGGCA TGCTCAAAGT CTATAATATA 300 *AATATTATTT CTATTATCAA TAAGTATATT ATTAAATGAT AAATCTCTAT AGGAAAGATT 360 ATATTTGGAG TTTATTATCT CCATATAATC A.ATTAATGTT AAAAACCAAT CATACGAGCC 420 ACTAACCATA TTATACTCGC TTAATTTATC TGCAATAATA AACTCAAATT CCACAAAATA 480 0o06CGAATTCTTT ATGTAAAAAT CGTTAAAAAC TTTTGGAGTA AATTCCTCCT TTTCCAATTC 540 0 TACTAATATT TCTCTTTCAT TTATTIAAACG ATTCACAGAA TCTCTATTTG TAAAATCAAC 600 CAACGATAAA TCACTAGCTT CTTTTAATAA AGAATAAACT CGCTTTTGAG TATTAAATAC 660 TT1TATAXACT CCACCTTTGG CATTTTTAGA AATCACTTCC AAAATAATAT ATTGATCAGG 720 AATAGTG'N'A TATCTTGGAA TATAGTAATC CCTTATTGGA ACATTCACAT TTGAAGGGAT 780 TTTCTTATCT CTrITTATCCT TGAAAGTGCT ATCTTTTACG AACTCCCCAT ATCTGTAATA 840 TACAACCTCG CTAAGTTGAA ATCTGAAATC TGATGGTATG TTTACACCCT TTACACCTTT 900
ATACAA'PATT
TGTAATGAAT
TTTAATGCTA
GAATTTrI-rA CTTAGAT'rGG
ACTAATTTTA
CAAATAGTTT
CCATCATTTA
TC'rAATrTGT GTAACAAACG TTCCCGACTr GTGAATAACC ACCAAAATTT TGAAATTTAT GCAACCAAAT TAGCAT'rTAA GTGATAT'rAG ACGGCAAAT'r TATTCTAATA ATAAATTATG AAATACTTTT CGTAATTCAT AAGCCAAACA ATTTAAAGcG 1344 TrGAAACTCT TTATTATCTT T'rGGATAAAT ATTAAGCCCT GTATTTTGCA AAGAAAGT'rC CTrTCTCCT CTAGAAAATA TAAAATCAAA TATTGAAGCG CTCAGGTGTA TTTTAAATCC ATATAACCAXA TGTTCATCAC TAAAATTATC GTATGCGTCT TCTATTTCAG TTTCATAGTC ATTAAGAAAT CTTCTCCATA AATTTTTAGA TGATAATAAA TGTTGATAAr CAATGTAACT AATAATTT1TA TGCTATATCT ATTTTCTCGA TAAAAGAGAT AAGTATTATA ATCTGACAAT TCTAGTGATT CTGATAATTC ATCCGGAATA TCGGCCACTC TTCCTTAAAA AGCTrCACAAT GAGTAGTCTC ACAATTTGAA CATTTCACAT AGAAACCTCT GACTAAGATT TCCTAATTAA ATTATCCCTA ATTGAAAATT GAAATT'rTAT TTCAGTCCTC TATTrTGTAA TTCCTTCACC GGCAATTTAT AGGACTT1CAA GATAAAACCA eb
CCAGTTTCAG
ATTCTTTTAA
AAAATT'rTAA
CACTCI'AAT
TTCACTTTCT
AATAAT'rTTT TAGAAAAATA CATCGTATTT ATI'rrCATA A'ITTCTATAC AACAATCCGA ATATAAAAAA TGAATTAATC ATATCATAGT AAGGAATTCT 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2453 GTTTTA'rATA TTAACAATTA TGCGGATTGT AAATCTTGTC TAACAAAATG GCAAGTGCTA CTATGTGCCC CAGAAGGCGA TGCAACGCTA TTTTGAATTG AAAGAGCATA ATCATCCATA TCATTTAAGT CACGGATTAG CAATGCTTCC TTCTCTCTTC CGACAATTCC AAATTTTCTA A'rTACCTTTT CAGGATTATC AAAAAATTCT CCAACAACTT CCATATT'rCC TTGAAGTTCA 'rTCAAGAAAG CTTTCAT'rrc ACTACTCATT ATATAGCTCC TTTTCTATTA CTTTATTTGG AATCAAAACT TACTTGTACA TTGGAAACAC CTCTATTCTA CGCTTTCATA TTGCTGCATG ACACTTTCAA AATCAAATTG CTAAAAATAA TTTTTTAAAG CTTAATTTAG ATTTAATTAC ATATATCTCA AAAAATTGTT T'rGAAATTAG TAAATTAAAA TAGGTTTCTG TACTTATAGG AACTAGTTAT AAAAACTTCG CCCATCATAA AATATCTATT TAAGTAAAAC AAAAATTA TAATTTTTTG ATTTTTAAGT GACTATAATC 'rCCTATCTAT AAATACCATT CGCAGGACCT GGATCAATCC CTCTAGCCAT CN'ATGAACT TGAGTTCC1'C CAGACAGTCC CGG INFORMATIONJ FOR SEQ ID NO: 317: SEQUENCE CHARACTERISTICS: LENGTH: 1049 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear 1345 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 317: CCAATTTGAA GGCTCTAAAA CAATGGAAAA GTGCTACACA TCCAGCAgrA CTCAAAAACr TTACT'rATCA CCAGTTTrAG A'1-GC'TTTTA ATCTTTCTTG TrCGCCTAAT TTAGAATAAG GCATTCAAAG AGAAGCACTA TGAGAATACG ATTCTCCATA CAACACGATT CTTATCATCG GTTCC'rAGAG AGTAAGGGAA AAGGGCAACA GCCCAGACAA CGGCATGATG GAATCTTTCT ATGTTrATG GTTATGAGAA GA.ACTTTAGA TCTTTAGAAA GACTACATTG ATTATTACAA CAACAAGAGA ATTAAGGTAA GTGCAATACA GAACTAAATC CTTCGGATAA ATTAATTGTC GATGTGACAG AATTTGCCAT ATGGCTTTAA CAGCGAAATT TACAAACAAT GTrGGAACAG GTGACCAAGG CTGGCAATAC TrCAAGCATC CATGTCACGC TTGGCATTTT GAAATCGGAG ACCTTGAACA AGCTATTGTG AGCTAAAAGG ACTTAGCCCT TA.ACTTTTGG GGTGCAGTAC 120 3.80 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1049 4* .4* 4* 44 4 4* 4 4 4* *t
ATTrGGTA
GGATGTAACT
GTTCTGTTTC
TATATAAAAT TTGTAGGAGC TACTATATTC ACAATGTTAT TCGAATAAAT TCTTCAAAGT
TAT)
CCAC
TTA2 AATAATCATC CACGATATAA AATTCATCAG TTAJ ATTTTTTI-AG CATGTGAGCT TCATrTTTTrA TATC CATCATAATT CACAAAAGGT CTTGACTGCT TGA1 TAATTGCCCG ATAAACATTT CCTTTATTTG ATC1 TAT rTATTGC AGAGTCCTTA CTTGAAACT'r CAC) CTTCTTCTGA AAATAAATCC ATTTTCCGG INFORMATION FOR SEQ ID NO: 318: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 776 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear kTCTACA ATTTTATATT CCCAGTTTAT TGTTT1T TTCTCTAATA TTTAAGGAGT ICCCGTC AACTTGTTCC TGAACAAGAA IT'AGT ACTATAACTT TTATCGGCTA :ATCAAG AGCTGTCCAT TCTCCTTCAG ~GATTAC TTTTTGCCCG TCCGATTTTC ~CT'rAAT AATTTTTTCC ATTTTGTATT TGTGGT TTGAAAATAA ATCCTTTTTT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 318: TTAGTTGGTT AGAATCAGAA AATCGCCGAA GTGGTTATTT ATTTTTGAAT AAATTTAACG AACCAATTAC AGCAAGAGGA GTTGCTCAAC AGTTAAAAAA TTATGCTGAT AAATACAAAA TGAATCCTAA AGTAATTTAC CCTCATTCTT TTAGGCATTT ATTTGCTAAG AATTTTTTAG 1346 CGAAGTATAA TGATA'N'GCC TTGCTTGCAG A'N'TGATGGG CTCGAATTTA TCTAAGGAAA ACAGCTACTG AACAACAAAA ATTGGTAAAA AATAACAGGT GGTCAAACTG ACTACCTGCT TATTATGGGA ATATACCTAT GAATTGGGTT GTTATAAAAA AATACAGGTC T TTCTrACAA. GAAGGGCGAT TrrAAGCATTA ATACGTGGTG GTAATAT'rAA GCCTTTAGAA TTTTCTCTGT GATACACAAT TCATCTCCTC TGAGCA.AGTT TATTTAAAAC GTATCAACCT CrAGAACA TATTGGAAAG TTTGCA-AGAA GTTGTGGCTG GTGGATGTAT T'rTCCAATTA ACACCATTCG AAATGTCTAT TATGTAACTT GTCCTCTCCG TTATTTTATA INFORMATION FOR SEQ ID NO: 319: ACACGAAAGT ATAGAAACTA TATTGTAGAT AAAA'PTGTTA ATTrTTGTGA TTATGGCTCT TAAAAGATAT TTTT'rCAATA ATAATAAAGG TGTTAGAA'TT TGGATAATGA TTACTACATT ATAATCAGCT AATAACACCT TCGAGAAAGA CTATGATGGT AAAGTGCAGA GATGATGTCA A.ACAATTGAA AGCAAT SEQUENCE CHARACTERISTICS: LENGTH: 658 base pairs BS) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 319: TGCAATGCGG CGGCTGCATA ACATCGCATC GAGCGAGCAC GGACGAAGAG CATCAGGGGC CCCGACGGCG AGGATCTCGT GAAAATGGCC GCTTTTCTGG CAGGACATAG CGTTGGCTAC CGCT'rCCTCG TGCTTTACGG CTTCTTGACG AGTTCTrCTG AGCGGGC=r TT?1'TCCTGA CGTCGGCATC CAGGAAACCA GTGGGGAGGC ACGATGGCCG
CGCTTGATCC
GTACTCGGAT
TCGCGCCACC
GGCTACCTGC
GGAAGCCGGT
GAACTGTTCG
CCATTCGACC ACCAAGCGAA CTTGTCGATC AGGATGATCT CCAGGCTCAA GGCGCGCATG CGTGACCCAT GGCGATGCCT GCTTGCCGAA ATTCATCGAC TGTGGCCGGC TGGGTGTGGC CCGTGATATT GCTGAAGAGC TTGGCGGCGA TATCGCCGCT CCCGATTCGC AGCGCATCGC AGCGGGACTC TGGGGTTCGA. TGTCGACAGC GGCTGGACGA CCTCGCGGAG TTCTACCGGC GCAGCGGCTA TCCGCGCATC CATGCCCCCG
TATCATGGTG
GGACCGCTAT
ATGGGCTGAC
CTTCTATCGC
CCGCCTAATG
AGTGCAAATC
AACTGCAGGA
CTPTTGGTCCC GGATCAATTC GCGCGACCGG ATCGATCC INFORMATION FOR SEQ ID NO: 320: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 1475 base pairs TYPE: nucleic acid 1347 STRANDEDNESS: double TOPOLOGY: linear (Xi) SEQUENCE DESCRIPTION: SEQ ID NO: CCGGCTTAAT TTTTAGAAAA CGTGGGCAGG GAACCTTTGT AAAGAAAATT AATCGTTCCA GAAAGAGATA TCCGGGGACT
CTCATTCTAC
TAGCAGAAAA
TTATTGACGG
GTATTACTGA
TGCATATTGC
ATTACTTGCA
ATAACGGAAC
ATTCTr 1TGC
CAGACTCTAG
AATTGACTCG AGGATTATTC ACTI'CAAATT ACTACAGGTC GCTTTGCAGA GTCCAGTTTA TAAACCTrAT GTTCTGGAAC AAACTTATAT AGATATTTTA CAAAAATCGA TTTACAATTA CAGTGCTACA AAAATCT'rAC GAGCT'rCTTC GCTCCTTCCA ACGGAACCGG TATTTGAAGT TCCGTTTGAG TACTCGATTA GTCGTCATCG ATTACGACAT TCCTCCTAGG AGAAAATGTG TTTAAGAAAA ATTTAAAACA GGGCAAGAAG 320: TCTCTCTCGT GGCAGCTCAA GACAAAAATA TCTGAAGATG AGAATTrGCA AATGAATTTT TAATATTTAC CGCCTGCGTA GAGTACCGAT GTTATTCCAG CATTGAAGGA AAGTTAGGAT TACTTCAGAA AATGAGCAAC AGAACAAGTG GCTTATTrGG CTATGATTrA TTTGAATTTA AAAATGAAGC CAATCTrTTA GTCCCATCTA TGCTTAAATG GTTGTATCCA TCATGTTGAA TTCACCTGAT CTACTTCTTA CTGACGAGAA GTTTTTGAGT TAGACTGTGA CTTCTAACAA ATATCTTCTT T'rAATATATT TTTTATTTTT CATTACGAAT TCTTTTGCTA. TTAGTTCAAA GATCATCGTA AATTGTTTGA ATTACTGGTA AAAATGGTAC GTTTCTCTTT TCTAAATAAG ATGGCTTTAA AAGAGTGATC AAATATCTTC GTATAGCTTA TAGAGTAGGT TAGTTATTTA GTTTTAAATA GTGTTTCAAA CTTTTCTTGT AACACATATA GTATACTGTG ATTGCTAGAA ATGAATTTCA ATCTCCCAAT AAATAAATTC TAAATCATAA TCATTTAAAA AATATAGATG AAGGGGAAAG AGTATGAAAA AGGAGAAAAA ATGAAAGTAG -AAAATATTTC TAATATTTCT TTTGATACTT CGAGTTCAGA AGGAAAGTCA ACTTTACTAT AGTAGATTGA TATTGTTAGA AATCGAT'ITG ACTATCCTGA TATATCTCAA ATTGAGTATG ACGAAGTGCG TATT'TTTCAT ATTCTTGAAT CCATCGATAA CTAATCATTT TTACAGGATG AGATTTACAG
ACTGAAATTG
CAT'rCTTACA
GTTAGAATAG
TTATTTGTTC
AAATTTTATT
CAGAACTIGTT
GTATAGGGTG
CGTGACAT'rA 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1475 AACTAGAATA GTACACATCT ACT1'CTAAAA TCTATTTGTC CTGTTCTTAT TTCATTTCAC CTCCCATGTC CTGGGAACGC ACTT1TCTTCA.
AGACTATTGG GATGAATTTT TAAAGTTGAA
CAGAG
INFORMATION FOR SEQ ID NO: 321: 1348 Ci) SEQUENCE CHARACTERISTICS: LENGTH: 560 base pairs TYPE: nucleic acid C) STRAI4DEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 321: GAAATATATA TACTTCATCT TAATAGTGAG CAAGCTAAAC TTAGCATTTC ATGCCCTCAT ATGGGATGTT CTTTGACTAA ATAATATGAT TATCGAGATA TATCTGGATA AATGAACTAA 120 TAACTCTGAC GCGTAGACTT ATCAAAGTCA TTGGCATACA CCACTATGAA CTCGTrGGTC 180 TGTTCAAATC CCAACACATT ACCTGAGAAG AAAGTTGCAA TGTTGTTTTT GGTGCGGGTT 240 TGAATTTAAA AAATTTGTTA TGTAGrACCT AATCTAAGGA ATTAGAACAA TGCCTCTAAT 300 ?I-rTCTrTAA TACACTGAAA CATTGATGAT TCTGGCTGTA TTTTTGAAAC AGCTCTTCTT 360 TGCTCCTGGA AAATATCTTC AGAAGT'rATA TTCTCTAT'rC CTAACGCTAC T'rGAGTTTTT 420 TTTCTA.AAAT ATTCTrTCC GTTGCCATCT TTAGAAAAAT CATAACCTTC CCTATCTACG 480 CTGTTACACA AATTAGCTAA AAAArACTCT GGGGTTGGGA AAGGAAGATA AGAAaCGTAT 540 TT4 IAGCCCATA ATCTATAAAG 560 9 C2) INFORMATION FOR SEQ ID NO: 322: Ci) SEQUENCE CHARACTERISTICS: CA; LENGTH: 643 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID HO: 322: CCGCCCGGCC ACCGCTGCCT ATCCTCGGGA GAGGGTCACC TGGAGTGAAC CTAGAACGAT **AGACACGGTG CGGTACGACC TCGTACTACT TTCGCCGACG GCCTCGTCCG TTGTCATCCA 120 CGAACTGATC GGACATGGGT GCGAACACTT CAGAGAAAAA ATCGTTGGAC TGCGTGTCGG 180 GCCTGAGGAA CTACGGGTGG TGGCTTTTCC GAAGAACGGC TCCGGGTTTG ATGACGAGGG 240 TACACCCTCC GAAGAGATTG TACTTGTGGA GAACGGCATT GTGAGGCACG CTGTCAGGGA 300 TCGGGCGACT GGAGGAATGG CGCCTTTTTC CGGTTTGACC AA.AGTGGCAT CACATGGTGT 360 CAAACCTGGC TCAAGATGTA CGCATCTCAA GGCGGAAGGG GAATCGTCAC AGGAAGGAGT 420 TACCGGAGTA CCCGCCGAAC GCACCGTTTG GATAGAGCAT TTTTCTGCAG CGAACTACCA 480 TTCAGGTCGA GCCTTTTTCA GGTCTGGCCT TGCCTGGGTA GGCAGCCGAG AAGAACTCTT 540 1349 ATATCCCTT1A ATGCCTTTCA CCATGTCAAT TGATATCTAC GAACTGGCCA GCTTATTGTG GCATI'AGAC GGTCAAACGG AACGAGCACG TAGGGTACTG TGC INFORMATION FOR SEQ ID NO: 323: SEQUENCE CHARACTERISTICS: LENGTH: 780 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 323: GGTACCCACT CATrCTTGAT 
AGTCAAGAAG AGGAAAAAAA AACTTC'rTGA GGCTGGTGTA CTAAGTACAT CTTTACTGAA AATACGCTGA CCAAGCATAC TGTTCGTTGG TACTAAGAAA GTCAATACTTl CATCAACCAC AAAAACGTAT CGCTCGTTTG TTCTTCCTAA GAAAGAAGTT TGGGCGGTAT CGAAGATATG GAATTGTGA-A CAGTTGCCCT CAAAAAGGAG AAATACTCAT CACTTTGGTC ACCAAACTCG CGTAACGGAA TCCACGTTAT GACTrCATGC GTGATGCAGC CAAGCAGCTG ATGCAGTTGC
TGGGTCGTTT
GGCAGTAATT
TCGCTGGAAT
CGACTTGCAA
TGCGAGTTGA
TCAATGAAAC
CCTAAGATGG
CAAACTGTAA
AGCTAACGAT GCAGTTGTAT TGAAGAAGCA GTACGTTCAG CGTTGGTTGG GTGGAACTCT TACAAACTGG GGAACAATCC AAAGAAATTA AACGTATGGA AGAAGATGGA. ACTTTCGAAG GCACTTCTTA ACAAACAACG TGCGCGTCTT GAAAAATTCT CCTCGTATCC CAGATGTGAT GTACGTAtTG ACCCACATAA AGAGCAAATC GCTGTTAAAG AAGCTAAAAA ATTGGGAATC CCAGTTGTAG CGATGGTTGA CACCAATACT GATCCAGATG ATATCGATGT AATCATCCCA GCTAACGATG ACGCTATCCG TGCTGTTAAA TTGATCACAG CTAAATTGGC TGACGCTATT INFORMATION FOR SEQ ID NO: 324: ATCGAAGGAC GTCAAGGTGT SEQUENCE CHARACTERISTICS: LENGTH: 624 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 324: CGGGAAAA.AT CAGATTGTGG GTTCAGATAT CGAATTAGCC AAGGCTATCG CA-ACAAAACT AGGTGTCGAA TTGGAACTAT CTCCCATGAG TTTTGATAAT GTACTGGCTA GTGTTCAATC 1350 AGGAAAAGCC GACCTTGCCA TATCAGGTGT TTCTAAGACA GATGAACGGA GCAAGGTGTT 180 TGACTTTTCC ATTCCCTACT ATACTGCAAA AAATAAACTC ATrGTCAAAA AATCTGACTT 240 GACTACTTAT CAGTCTGTAA ACGACTrGGC GCAGAAAAAG G"I-GGAGCGC AGAAAGGTTC 300 GATTCAAGAG ACGATGGCGA AAGATTrGCT ACAAAATTCT TCCCTCGTAT CTCTGCCTAA 360 AAATGGGAAT TTAATCACAG ATTTAAAATC AGGACAAGTG GATGCCGTTA TCTTTGAAGA 420 ACCTGTTrCC AAGGGATT'rG TGGAAAATAA TCCTGATTTA GCAATCGCAG ACCTCAATTT 480 TGAAAAAGAG CAAGATGAT'r CCTACGCGGT AGCCATgAAA AAAGATAGCA AGAAATTGAA 540 AGAGGCAGTT CGATAAAACC ATTCAAAAGT TGAAGGAGTC TGGGGAAT'rA GACAAACTCA 600 TTGAGGAAGC CTTATAAGCA TCCA 624 INFORMATION FOR SEQ ID NO: 325: SEQUENCE CHARACTERISTICS: LENGTH: 1237 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 325: *TCTTATGAAG CCGAAGCGTG ATTTATGGCG GATAGGTTTG GTCTGCAGAA AGTGACAAAT CTAGTGCCAT CAGCGTATAT GGAATCTnTG GCTGAGAAAC AGTCCCGGGG TGAACTGACT 120 TATGAGCAGG TTTATGAGGA TGCAACGGCT TATCATCATA CCATTGATGC GAGTACAGAG 180 *GAGGCAGACT TGGTTTCTCT ACGTATTGTA GAACTATTGT CTCGAAGAGG CTT'rAGCTTC 240 AGTCCTGCGA TCTTACTTGC TATTCATAAG GAGTTGI'TC AAGATATATT TGAACCCTCG 300 .0.0 ATCCGGTAG GTCAATTTCG TCAGACTAAT ATCACAAAGA ATGAACCTGT TTTGAATGGT 360 GAAAGTGTTG TGTACTCTGA TTACTCCATG ATTCAAATGA CCTTGGATTA 
TGATTTTAAT 420 0 CAGGAAAAAC AAGTTGCATA TGCGACACTA ACCCAGGCGG ATATGGTTAA AAAAATCCAG 480 CATT'rTATTT CAGGAATCTG GCAGATTCAT CCATTTCGCG AAGGAAACAC TCGGAC!GGTA 540 ACGGTATTTT TGATTCAGTA TCTTCGTGAG TTTGGTTTTG ATATTGATAA TACACCATTT 600 C AGCAACATT CCAAGTAT'rT TCGTGATGCC TTAGTGTTAG ATAATGCAAA GATTTACAG 660 *CGACGTCCTG AGTTTTTAAC AGCTTTTTTT GAAAATCTCT TGCTCGGTGG TCAAAATGAT 720 TTGTCTTCAG AAAAAATGTA TCTAGATTTA GACCTCGATC TTTCATAATC CTAATAC 'GA 780 GTAAACA'N'G AATTTTAGGA AAAAATGAAG TAAATATTCT CACAAGAAAA CGTATATCAT 840 CAAAGTTTGG CTCTTTGTCA. ATTGTAGTGG GTTGAAGAAA AGCTAAGTTC GAGAAAGGGC 900 1351 AAATTTCGGC CTTrCCT'IT TGATGTTCAG, AGCGATAAA.A ATCCGGTTTT TI'GAAGTTTT 960 CAAAGTTTCG AAAACCAAAG GCATTGCGCT TGATAAGTT'r GATGAGATTA TTGGGCGCTT 1020 CCAGTTTGGC ATTAGAATAG TGTAGT1'GAA GGGCGT'rGAT AACCTTTTCT TTATCTrTGA 1080- GGAAGGGT'N' AAAGACAGTC TGAAAAATAG GATGAACCTG CTTAAGATTG TCCTCGATAA. 1140 GTTCGAAAAA TTTCTCCGGG TCCTTATTCT GAAAGTGAAA CAGCA-AGAGT TTGAAGAGCC 1200 GATAGTGATG TATCAAGTCT TGTGAATAGC TCAAAAG 1237 INFORMATION FOR SEQ ID NO: 326: SEQUENCE CHARACTERISTICS: LENGTH: 461 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 326: TTTGATTTTT CTGAATTAGA AGAGATTGAA TTGCCTGCAT C'rCTAGAATA TATrGGAACA AGTGCATTTT CTTrTAGTCA AAAATTGAAA AAGCTAACCT TTTCCTCAAG TTCAAAATTA 120 *GAATTAATAT CACATGAGGC TTTr'GCTAAT 'rTATCAAATT TAGAGAAACT AACATTACCA 180 *AAATCGGTTA AAACATTAGG AAGTAATCTA T'rTAGACTCA CTACTAGCTT AAAACATGTT 240 .GATGTTGAAG AAGGAAATGA ATCGTTTGCC TCAGTTGATG GTG'r-r'TGTT TTCAAAAGAT 300 AAAACCCAAT TAATTTATTA TCCAAGTCAA AAAAATGACG AAAGTTATAA AACGCCTAAG 360 GAGACAAAAG AACTTGCATC ATATTCGTTT AATAAAAATT CTTACTTGAA AAAACTCGAA 420 *TTGAATGAAG GTTTAGAAAA AATCGGTACT TTTGCATTTG C 461 INFORMATION FOR SEQ ID NO: 327: i) SEQUENCE CHARACTERISTICS- LENGTH: 1436 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTrION: SEQ ID NO: 327: TAACATTTAG GTACCTCTTC TTAACAAAGT TCAATAGTAA CAATTAATAT T'rTAAACAAT ATATCAAACA 
TCAATGACTA GAATACTTGC ATCATCCTTC TTTCCATAGA TTGGATCAAT 120 AGCAGAAGAA TTAAATCTCA TCTTA.ATTAA CTCTTCAAAA GTTTTATTr'r GATTATTTTG 180 1352 ATAGAATTCA TA.AAAGCCAT CGCTCATTAA AACAATTTGT TCACTAGTAA CATCTATTTG 240 ATTAATAATA GCATGGTCTA AAAATCTCTC ATCCAACGAA CCTATCCAGT ACCCACTCGG 300 TTGATTAGAT AATTTTCTGA TTITTrGTAA AATAATPrTTT TTATTTAAAA CACTATTTGT 360 ACCAATTGAA TCTTTTATCT CATTTTT'CCC ?r-r-1-CAA.AT AAGTTATCTA CTCTATGATC 420 AGTrATTTCC ATT'rCGTrTA CTAACATGAC GCAGTCACCT AGCATCATAT ACTCCAACTT 480 TTTTTCTGAA AGTTrAGCAA ATATTGGTAA GCGATAATAT AGTATATTGA AACTAGAATA 540 GTACACCTCT ACTrCTAAAA CATTGTTAGA AATCGA'TrTG ACTGTCC'rGA TrGATTTGTC 600 CTATTATTAT TTCATTTTAC TATACTCTGT TAATTTATAT GAGTTTAAAC CGATTTCATC 660 TTTAACCTCG AGTAAAGCAG TTTCAAATAT TTGTTTAAGA GTTTTTGATT CTTTACAATT 720 AACCGACAAA CTTTCTGATA AAATATGTAC AACTTCTGAG ACTGAATAAC CTATCTCCTC 780 TT'rAGAATTA TATAAATCTG TAGCTCCACC AATAATCCAA AAATACTGAT TTTGTGAACC 840 TACAATATCC TCATTTTCTA CGGAACTTCC TTGTATCGAA CAAATTTTAT TTATCTTTAC 900 CATAATACT CAACCCTTTT AGTGTCAAAA GTAAACCAAT TCCTGTCACT GTTAAGAATA 960 *..GTTCCATA.AT CTTATTCGAA CCAGTCTTTG GTAATTTTTG TT'rkACATCT ACTATyTCrT 1020 *..TAGATTTATT AATATGATTT 'rCAGTTTCTC TGCCATCTCC AACTATTTTA TAGTTTACTT 1080 ***CTTCTGTCT'r ATTATCTTGT TTATTGTCGA rCTTGTCATT CATTTGTCTA TTATCTTTAC 1140 T'TGAGTTAAA CTCTCCGTTC TTCTGGTTAC TATCAATTAC ATTATTTGAA TTAGATTGTT 1200 TTTCCTCTTT GTTTTTTTCT TTrTCGTTTT TATCACTTAA ATTATTTGTr ACAATTTGT 1260 *AAAGCCCATT CTCCGTTACA ATATTGAAAT TACCATCGCT ATCACGTATA ACAGGTTCTT 1320 TCCCATTTGC ATTAGATTTG ATGAATGATA TATACTTACC GGATAAATTA TAAAATTGGT 1380 TATTTAAAAC GGTTATTTrA CCCTrTGA.AT CCTCAATAAC AATTCCTTCT TTACCC 1436 INFORMATION FOR SEQ ID NO: 328: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 646 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 328: CCGGCAGACA GGAGAAGGTG TTAAATATCA ATCTCAAATG GTTCGTCAAT GGTTTCTGAT ACGTATTTTC CGTCTTTCTT CCGTTGCTTG ACACACTCTG TGAGGAGATA TTCGAT'PTGC 120 CCATTGACTG AACGAAAGTC GTCTTCTGCC CATGATGCGA 
GTGCAGCGTA TAACTTTGTT 180 1353 GAGAGTCGAA GGGGGATCTG CTTr'TTrTA GCTI'CAGCCA TCTrTAGTAA AGGCTTCCTG TGTTGACAAT TGGTTGTGCA TCATGATTGC CACAAAGAAC GACAAGGAGA TNTGAAACCA TGGCAGCTTT TCGTTCTTCG TCAAG'rTCTA CCAATrCCCC 'rTCATTGAGC CGTrCTAGTG CCAI'TCAAC CATTCCTACA GCACCATCTA CAATCATCTT CCGTGCATCA ATAATGGCAG ATGCtTG;TTG GCGTTGAAGC ATAACGGCAG CAATTTCTGG AGCATAAGCT AGGTAAGTGA TACGTGCTTC AAGGATTTCC AAGCCAGCAT CCTCAACACG ACTTTGGATN TCT'rCACGAA TACGGGTAGC AACAATTTCG CTAGAGCCAC GGAGACTACC TTCATCTGCG TGCCCATCAC CCGGAGTATC CACATTAGGA GACACATCGT AAGGATAGAT GCGGAC INFORMATION FOR SEQ ID NO: 329: SEQUENCE CHARACTERISTICS: LENGTH: 1653 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 0 .00.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 329: GTTGCAGGTG CAGTAGGTGT TACTTCAGAT GCAGGAGCGG ATGCGATTGT TATTGATACT AAAATTGCCG AGATTCGTGC TCATTTCCCA ACTGCTGAAG GTGCACGTGC CCTTTATGAA GGACCAGGTT CTATCTGTAC TACTCGTGTG GCTATCTACG ATGCTGCAGC TGTTGCGCGC GGGATCAAGT ATTCTGGAGA TATTGTAAAA CTTGGATCTA TGTTTGCTGG AACTGATGAA CGTAAATTCA AGACTTACCG TGGTATGGGA GACCGTTATr TCCAAGGTTC TGTCAATGAA ACATTTGAAC GTGCAGAGGC GCACATGGTC ATTCTGCAGG GATCGGACTT TGATTGCTGG GCGGGTGTAG ACGTTGTTAA ATTGCTGGTG TTGGTGTTCC GAATATGGTA AXACGATTAT GCACTTGCTG CAGGTGGAAA GCTCCAGGCG AAACTGAAAT TCAATTGCTG CTATGAAGAA
TCTTTTTGAG
TGTCTTGCGT
AAATATTGCT
GGTTGGTATT
GCAAGTAACA
TGCTGACGGT
TGCTGT'rATG
CTTCCAAGGA
AGGT'PCAAGC
GCAAACAAGC TTGTTCCAGA AGGAAT1'GAA GGTCGTGTTG CTTATAAAGG AGCGGCAGCT GATATTGTTT CGCTCTGGTA TGGGTTACTG TGGTGCAGCT AACC -rAAAG TTTATTGAAA TGTCTGGTGC TGGT'rTGAAA GAAAGCCATC AATGAGGCAC CAAAT'rATTC TATGTAAAAA ACAATGAAAA GTTCTTTTAC AATGTTGTCA ATTTCCATTT ACAGCAGCTT TCCAAATGAT TGGTGGTATT AACTACACGA TAATGCTCAA CTCATGATGT GCAAATTACT GAACTCCAGT GAAAACAGGA TACCATCCTG AATAGTGAAG ATACTTAGAT TTTCTGGCAG ATTTTGAAGA TGGATTGATT GAGAAATCGT TTCTAATAAT 1354
TGGTCTAAGC
TTTAACTGTC
TCTGTGGTA.A
CGATGTTGAC
TCTCGATGAG
ATCACATCGT CAAGCAGTAA TCTGCTAATT TTATCGAGAG TCCATCCCAT TTATATAAAA TTAAATAAAT CTCTGGATCT TGTTTGTCAG TTATATTGAC GAGAGTTCAA AATGTTT~CT'r TGATTCATTA CACGACATCC TCATCTATTT TTTGAGCTGA TTGGTTAAGT CAGATAAATA CGTCGAATCG AAGGTGCTCC
TATAGGAGAT
GACCAGACTA
AGAAATGTCA
ACTTTTTTCT
AGAAGATTGA
ACGCCCAAAT
ATAATCAACT
TTTTAGGTAA
TTGTTGTTGT GATAAAGGTT TAGTGTTGTC AAGTTCACTC TGCTTTCCAT TAAT'rCGATT CTTGGCTTCC GAAACTAGCA GACCGACACC AGTATTCTTT AAAGCAATTT TGAAAGATTC GGATAAGTTT TAGGATATTG ACAACTCTTC GATCTGATTA GATTCTAGTT TTTTTATGAA ATCTAAGCGG
AGCTGATCAT
GTGTTTCTTT
CTAACACAGA AAGGAATGTT GCTTTAGGAT GTGGTTATAA.
GATTGGCTTA ATNTGCCCAA GTTCCATATC AATGAATT TTTAATTAGT TGTAAATCT'r CAGGAGCAAA TAAGACAACA TTCATGTGTC CTACATAATC TGAAAGGCGT GCC INFORMATION FOR SEQ ID NO: 330: SEQUENCE CHARACTERISTICS: LENGTH: 1340 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1653 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 330: GAAACACTGT ATTTCAAAGC ATTTTTTGTT AGTTTAAAAT TACTCCCATT AAACGTACAA TATATCCAAA ACCATTCAAA ATACTAGATT CTAT'rrTTTA AAATCCACCT AATTATAGGA CGTTTTCAGA TTTTTAGTCC CAGTCCCAGT TAT'rGTTTTA ATATAATATC TCTTTTTGTC TTCTA.AGCTC TTAAAAGCAA
CTTCTTTTCC
TAATATCACT
ACCGGAGAAA
AAGAACAA43T AAAGAGTCAA GACAAGGATA AAAAGTCCAT ATTAGGGCAA ATAAAAAGCT TTAAGACAGA TGACAAATCT AAGTCAAATA AGAAAGACCA TAGCAAAGGT GCAGAGAGAT AAATATTGGC GGTCTTCGGA CTGCCTTTAT TATGCACATA TACACTTAAA CAATAATCGC ATTTAAAATT TTTCCC'rTGT ATCTTTTATA TCTGCTATCT CTI-rTATAGG
TTTTATCC
T'rCATATAAA
GTGATGTTTG
ATAGATAGAA
TAGTTCAGTG
ATTTTTCAAA
AACATGGCTT
CAAGCTAAAT
TCAAATTTAT
GTAAAAAATT
TACGGACTTC
TCAGACTATA.
ACTTTAATCA
ACTTGGAAGT
AAT'ITGCTGG CAGATGAATA TCCAACAGAT TTTAAAAGAA GAGTITCAGC TACATTCATT CTTTrTCT'rT GAGTGTACTC AATTTAGTAC CACTCATTT'r AAATGATCAT CTAAGAATCT TTATATTTTT TTCTATTTAA GCTTTAAAGA AAAAATCAGC ACTCTTTCTA AGGCCTTTGT GATTCTGGAT TAATGTTAAT ATGGAAACAG CAAGATAACA TTATCAAAAT CAAAAGTACA TCTCCGTTTG CACTGACAAT GTGCTATTTT GAACTACTTC CCTCCTTCAT AAAACCGGTA 1355 TGTAATGCTT TGACAATATT 'TTTCCTTAAA TAAATTTT AGATATTTT TCAAGCGTGC CTTGATTTAC ATTCGTTGC.A TGCTACATCT TCAAGTGCTT TATCATCATC AATTTCAATT GTATGTGTCA ATTACTATAC TTATCCATTC ATTTGCCT GGC.AGGAGCG TCCATCTI'AC AATTrAATAT TTCCATTGCC AAGTATTATT TGATTCGGTT GAAGCAAGGT TGAATAAAAA AGATGCTAAA TGTTTTTCTA TTAGCTCTTT TTTAAAACCm ACAATTCTCG TGTAATAAAA AAACAAAAMT ATCTTTTATA TAGAGAGTTT GCGGTAATAG TTTGATACGG ATTAAACTTr GTAACTTGAA TAAATTGAAA CATAGTCTGA CATACTATAA CTCTTTGATA TAAAAATCAT GTATATCGAT AATGAAGATG 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1340 INFORMATION FOR SEQ ID NO: 331: SEQUENCE CHARACTERISTICS: CA) LENGTH: 607 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 331: TATGTTCGTG ATGAGTTTTT AAGTAGGAAA GTAAAAGAAA CTCTTTTTTC ACCCGTAGTA GAAATTGAGA AAAAACAATT GCTAGCAAGT TTTGCACATA AAGAATTGGA TAAATTGTTT AGTGATTTAC GAAATCGTAT TTTAGCTGAA GAATTTTTAG CCAATGATCG AATAGATTTC ATTCAAAATG TATTAGAATC ATTTGGCTTT TATTGTCAAC CTTATTCTAA TATCCTTCAG TCCATTTTGG AATTAGGTTA TCATTACTGT ATGGATTGAA TGAATGGTTT ACTTGGTGGA
CGGGAAA
AACGTGCTAA CCTCTCAGAT TTTGGAACTT GTTGATAATG GGTTTGATCC GGCCTATTT TTAGCAGCTG ATATGGATGA TTCTTTTTAT TTrTCATGATG AACGTCTTCA ATTGGAATAT ACTCCACAAA GTTCTTATTC TTGTTTCCA.A TTTTTCCTAG GTGATTTTAA TGAGGTTGA6A AAAGGTCGAA AAGGAGATGT GAAGGTTCAG GAAGGTATGG TTCGGAAAAA TGTGGGACAA TCTAAATATG GTGATGAGCA ACATTTACCC TTTGCTCACT CTAAGCTCTT TACAAATGTC 1356 INFORMATION FOR SEQ ID NO: 332: SEQUENCE CHARACTERISTICS: LENGTH: 900 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 332: TTAAAATACC GAATTTTGTT AACCGATACT AAAGATATAA TGGTrrAAATC
GTTTTGTAAT
TCTCATCAAT
CATCATACTG
TAAT'rTTGGT
AATCCAAAGC
ATTTTATCCT
AAAATCCAGT
GAGGTTACCC
ACCTGAAATG
CTCACGAGAA
TTCAACATAT
CTTCTCGCCA
TAATTTI'AAA
T'rGTCCTCTA TTTCAACATT CCAAAATAGT TGTCATTTGC TCGTCAATrG CGCCATCGAT GTACCGCCTG GGATAATCCC ACTTCATTGA CAGTTCCATC GTGAATCGCC TCAGGCAGAG 'N'TACCGATA TCAATCTTAT GTCTTGATT~G ATTTCCAAAA TAACTTAGGA ATGTAGTCTC TCCACCAAAC ACAACCACTG
GCTTCTTCAG
TCAAAGTATT
CAAAATGTGT TGCATCCAGC GCTTTTTCGG CTTTTGCTTT ATTCTCCAGC TTTTCTTTGT
CCAGAAGTAG GGTTGATAAT CAGAAATGTT TACATTTCGT CCTATTATAC AATGAAAATA CAGAAAAGAG AAATCTGACG TTATI-CTATT TTCCCATCGC CTAACTACAT CCTTTAAGGG CCTTATCCTT GATCCAATCA GGAATACCGT AAGCTGCCTC CTGCGAGAGT ATCACTGTCG CCACCAAGTG AGATGGCATT TACCATTGCT TTTTTCATTG CGTATGCAAG TAAATGTAAT TACTGGAGAT TAATACGCTT TTrCATCCAAG TAAGAATAGG TGCTAWGCTA CAAGTGATTG TCTTATCGCA TCTTCGA.AGT S
CTCTACTTTC AAGAAAGGCG ATAATGGCTT GAGGGACAGT TTCCTGACAT GTTTCGTTAA AACGATAGTT AGGACGGATT TCATCTAAAG TTTGAGATAG ATTGTAATCG TATTCTTTrT INFORMATION FOR SEQ ID NO: 333: SEQUENCE CHARACTERISTICS:.
LENGTH: 533 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 333: CCTTTCTGGC ACACTGGTCT TGGAATACGG CA.AAACCTCT GAAAATATCT ATGCTGGAAT GGACGAGGAA TACCGTCGTT ATCAGCCTGC CATCATCACT TGGTACGAAA CAGCCAAACA TGCTTTTGAT CGCGGACAGA TTGGCAAAAT ATGGGTGGAA TCGAAAACGA CCTCAAGGGC 1357 GGTCTCTACA GCTNTAAATC CAAGTCAAT CCGACCATTG AGGAATTCGC TGGTGAGTTC 240 AACCTGCCAA CTAATCCTCT TTACCACCTC TCCAATCTGG CCTACACTCT CAGAAAGAAA 300 CTGCGCAGaA GCATTAACAG AAAGGAAGCC TATGACCTTr AAACTTCTCA GCCAAGAAGA 360 ATrCATCCAG CATACCTCAG CTAGATCCCA ACGCTCTTTr ATGCAGACCG TAGAAATGGC 420 AGAGCTGCTG AGCAAGCGTG GCTrCAGTAC CCAGTATGTC GGCTACACTG ACCCACAAGG 480 GAAGGTAGTG GTGTCAGCTG TCCTCTACAG CATGCCTATG ACrGGTGGCC T'rC 533 INFORMATION FOR SEQ ID NO: 334: SEQUENCE CHARACTERISTICS: LENGTH: 544 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 334; CCAGCAAACT AGGAAGCTAG CCGTAGTTGC TCAAkAGCACA GCTTTGAGGT TGTAGATAAG g of go ACTGACGAAG TCATGTACAA AACACTGTTT TGAGGTTGCA GATAGAACTG ACGAAGTCAC 120 .TCAAAACACT GTTTTGAGGT TGCAGATAGA ACTGACGAAG TCACTCAAAA CACTGTTTTG 180 *SSAGGTTGCAGA TAGAACTGAC GAAGTCAnnA ACCACACCTA CGGCAAAGTG AATCTGAAGT 24 GGTTTGAAGA GAGTACAACT TGTCTTTTAG AAAAGGAGCC TATAATGAAA GTCTTTCAGC 300 ATGTAAATAT CGTGACTTGT GATCAAGATT TCCATGTTTA TCTTGATGGA. 
ATCTflAGCAG 360 TCAAGGATTC TCAAATCGTC TATGTCGGTC AAGATAAGCC AGCGTTTTTA GAGCAAGCTG 420 CStAGCAGATTAT AGACTATCAG GGAGCTTGGA TTATGCCTdG TTTGGTCAAT TGTCACACCC 480 ATTCTGCAAT GACAGGTCTG AGAGGGATCC GAGATGACAG CAATCTCCAT GAATGGCTCA 540 :.~*ATGA 544 INFORMATION FOR SEQ ID NO: 335: *pot SEQUENCE CHARACTERISTICS: LENGTH: 349 base pairs TYPE: nucleic acid S STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 335: CCAGGAACTC AAATGTAAGT AGGGGTTCCT TTTTTGTATA TTrTTTCAAAT AACGCCTCTA 1358 CACTATrTGT AGCAAATrCA CCAACTACAG TTGTA'rCTrA GTTAAAATAA GTTAGAATAT GTAAGTGAGT ACCAGATATA CCAAGACATC GTCACCATCT AAGGTATATT CAAAATACAA AAGTTGACCA ACTAGATN'TC TGAATATCCT TATATATCCA TTC?1'AAAAT TGGTTTAAAT AGCGTAGTCT TTTAAACTAG TTTTGAGAAT CCAAAAAATC TTCCTACATA TGTAAGAAGA TTTTTTAGTT CAGAATGATT AGaTTTAGCT AATGGATACC TATCCTACC INFORMATION FOR SEQ ID NO: 336: SEQUENCE CHARACTERISTICS: LENGTH: 1206 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 336: CTCCGATAAC CACACCAGCA ATGGAAATAA TTCCATCGTT AGCATCAAGA ACACCCGCAC GCAGGATATT TAAACGACCT GCAAAATTTG AATCAATTTC GTGATT'rGTT TCTGACGCTA 0 0.
00 AATTrCAAGT TT'rTTAGGAT
TTGGTTCCCT
ATCCTCTAGA
GATGACTGGG
CGCCAGCTGG
CCTGAATGGC
ACACCGCATA
CCGACACCCG
TTACAGACAA
ACCGAAACGC
TCAAGTTAGC
AGTTGTTAAT
TAGGTAACCA
GTCGACCTGC
GAAAACCCTG
CGTAATAGCG
GAATGGGGCC
TGGTGCACTC
CCAACACCCG
CATCAAGAAG TCTTCTCTGG GTGACTTGTA GTCCAAGCAT CCACTTTTCG ATGAATGCGA TCTACGAATG AGCCTGTTGT AGGCATGCAA GCTTGGCACT GCGTTACCCA ACTTAATCGC AAGAGGCCCG CACCGATCGC TGATGCGGTA TTTTCTCCTT TCAGTACAAT CTGCTCTGAT CTGACGCGCC CTGACGGGCT CTTCTTTGGG AGTCAT'PTTC GATTCTCATT AGTTCCCGGG GGCCGTCGTT TTACAACGTC CTTGCAGCAC ATCCCCCTTT CCTTCCCAAC AGTTGCGCAG ACGCATCTGT GCGGTATTTC GCCGCATAGT TAAGCCAGCC TGTCTGCTCC CGGCATCCGC 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 GCTGTGACC!G TCTC!CGGGAG CTGCATGTGT CAGAAGTTTT
GCGAAACGAA
GATAAGGATG GTTTCTTAGA TATT'rGTTTA TTTGTCTAAA ATAAATGCGT CAATAATATT CCTTATACCC TTTTTTGCGG TGAAAGTTTA AGATGCTGAA ATCTCCAnCA GCAG?1'AAGA AGGGCCTCGT GATACGCCTA T'N'TTATAGG CGTCAAGTGG CACTTATCGG GGAA.ATGTGC TACATTCAAA TATGTATCCG CTCGTGAGAA GAAAAATGAA GAGTATGAGT ATTCTACATT CATGTTGCCT TCCTGTTTTT GCTCACCCAG AAATCATTTG GGTGCACAAC TGGGGTTACA TCCTCTGACA GTTGTACACG CCGCAAGAAC
CACCGTCA'LC
TTAATGTCAT
GCCGAGACCC
AATAAACCTG
TCCGTGTCGC
AAAACGCTGG
TCCAACTGGA
TATTCCCGAT
1359 GAATGAGCAA CT'rTrAAAAG TCCTGCGAAT GTrGGGGCGG TAATAATCCC CGTGTTGTAG
GCCCGG
INFORMATION FOR SEQ ID NO: 337:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 813 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 337:
o* CTGCTCAACT CAGACAGTCA AATTTCTGAC GAAAAAGGCC ATCAGGTTAT TATTACGACA TACCGTGAAC TGGGCTTAGA CACTCCTATG CCAGACCAAG TTTGCGATTT TGAAAAGTGT ATGGTTCAAC GTTCAGAGGA CATTCAAGCC TTCTACATTA CAAATCCCAA TGAAGAAATT TTCCAGCCTG AAGATCAATT CCAGCCTGAA TTGCAGACTA GAGCCAGTGA CAAATATTCC.
CATCAACTTT CTATCAATAC CTGGGGAGGT GGTGTCAACA AGGCCTTTGC TTTGGACTAC TTTACCAAAA GAACCATCAA AAALAGTTGCT GGTCGCCCTT ACCGTATGTC AAAAGATTTT ATTAACTTCA ACGGATCCCT TACTCATTTA TTGACTGTAG ACAAAAAATA TCTGCTAGAT GATTTTATCG CTGGAGAATA TCGTAAAAAA GCCAATCCCA AACTATTTGG TGTAGAAGCT TTGGTGACCA AGGACCCTAA CTGTATCCTC TTGGCAAAAG AAATGAACGC CTTCTACCAG CCGCTCAATA. TCCTTGAATG TACCCCAAAA TTGCTCAAGA TAATGAATCG TGACAAAAAA GATACCGAAA TGCTCGCTT'r TGCTGGGAAG CTACTCCCTT ATGCAGATGA GCAAATTTCC ACCCTACAAG ACTTATTCTT ATAACCTATA
TTA
8: irs GATTPTGATTG CCTTTGGAGA TGAACACAAT GGTTATGCCA TGAAAAATGC CAATCCAGAG CT'rACCAACG ACCAAGATGG GGTTGCCAAA CTGATACTCA ATGAGGGGCA.AAGAGCGAAC INFORMATION FOR SEQ ID NO: 33 SEQUENCE CHARACTERISTICS LENGTH: 683 base pa TYPE: nucleic acid STRANDEDNESS: doubl TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 338: CCTAGATAAA TGATATAATT CTATTATTGT TCGTAAAAAT TAAAAGGAGA TTGATGATGG ACAAATTATT TAAACTAAAA GAGAACGGTA TAACAACTTT CTTTGCAATG AGCTATA'rTC CAGGAATGCC TGCTCAGGGC GTCTTCCTAG TGATGATGGC TTrATGCT AACTTACCTT CCTrCTTTAC CTTTACAGTT GTATTCGGGC TG4GTCTTCAT CTGTGGGATT ATT TCATTGA TCATTGAATC GATTCCCAAT GCTCTTCGCT TTGCCTATGT1 AGGGATTAAG AATGCTGGAC 1360 CAGACGTTCG TACAGAGGT CTCGCTGGTT TCTTTGTAAA CCCACAAATA CTrTCACAAA CGACGATTAT TGGTGCAGTA GCGGGTACCT ATGCCCAAGC GCCAGGTATG GGACTCAATG TTGGTTATTC TTGGCAAGAA GCCCTAGCTA TTATTACCTr GACAAATGTT CGTAAAATGA CAGCTATTTC AGCTGGTATC GGTGTCrTCC TTTTGAAATT CACGATTGAT CCAGGCAACT ATACTGTTGT AGGAGAAGGG GCTGACAAAG CTCAAGCAAC GAT'rGCAGCA AACTCTTCAG CAGTTCCAGG ATTGGTCAGC TTTAATAATC CAGCTGTTr'r AGTGGCTCTT GCAGGACTTG CCATTACTAT CTTCTTTGTC ATC INFORMATION FOR SEQ ID NO: 339: SEQUENCE CHARACTERISTICS: A) LENGTH: 852 base pairs CB) TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 339: CTACTTTACA TGGAAGTAGT CACTGAATT-C CAGTTAGAAA TTACTTTGTA ACTACGTTTT GAGGAGGAGT AAAATGCTTT CCTACGTTCG ATATTACCCA CTAGCGATAG CTAAATTAAT GTGTCTGTGC TCTCCTAAAA TCTGCTGATT TATTACTGAC TAATACAGGA GGTTTTTTTT ATGgACAGAC AATCATATCT GCTATTGGTG TTTATATTTC CACCAGTATC GATTATTTAA TTATTTTA.AT TATTTTATTT GCACAGCTAT CACAGAATAA ACAGAAATGG CATATTTATG CGGGGCAATA TCTAGGCACA GGCTTACTTG TAGGGGCGAG TTTAGTTGCT GCTTATGTCG TTAATTTCGT GCCTGAAGAA TGGATGGTTG GATTGCTTGG TTTAATCCCT ATCTATT1TAG GGATTCGCTT TGCA.ATrGTT GGAGAAGATG CGGAAGAAGA AGAGGAAGAA ATTATTGAAA GATTAGAACA AAGCAAGGCA AATCAACTGT TTTGGACAGT TACATTGCTG ACAAT1'GCGT CTGGCGGAGA TAATTTAGGT ATCTATATAC CTTATTTTGC TTCGT'rAGAT TGGTCACAGA CCCTCGTGGC CTTGCTTGTG TTTGTAATCG GCATAATTAT CTTTTGCGAG ATTAGTCGGG 
TGTTATCCTC TATTCCGTTA ATATTCGAGA CAATTGAAAA ATACGAGCGA ATCATTGTGC CCTTAGTATT CATTCTACTT GGACTATACA TCATGTATGA AAATGGCACG ATAGAGACTT 120 180 240 300 360 420 480 540 600 660 720 780 1361 TrCTGATCGT GTAGA-Ir'1- TTGTTTCACT AGGGATTTAG CCCGAGCTCA AATCAGCTCT 840 CTGATrTrCA GA 852 INFORMATION FOR SEQ ID NO: 340: SEQUENCE CHARACTERISTICS: LENGTH: 754 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 340: CCGCACAAAA GCGCATAGTA TCAAGATTCT ATAAAGCCTT GATACTATGC CTTTTTAATG GATAAATAGT TAGTCTTrTTT TAAAGACCGG ATCN'TCAAA CTCTGCATAC TGGCATTGAT 120 CACCGCGCCT AGGATAACAA TTTTAGCAAT CAAGATAAAC CAAAACATCA TAACAACAAG 180 AAGAACGGAA CCTAAAATTC GGACATCCAC CAAATGATGG ACATAGTAAT TGAGATAACT 240 AGAGAACAGA GTTAGTAAAC CTAAAATCAC TAAGAGAACA AAGGCACTGC CTGGTAGGGT 300 .ATAGCTAATT TTCCTGTTAG ATAGATTGGG AAGAAAATAA TAAAGCATGA CCAAGATAGC 360 AAAGAGGAGG GCGTAAATCA GAGGACCTGC CAACCCTTGT AAAGCCTGAT AGATAATGCC 420 *ATCTTTTGTC CAATAATGAG CAAGTAAAGC CAAAATCATC TGACCAAAT A AGATCAAAAA 480 CAAGGCAAAC GCAAAGAGGA GCTGCAACC.A AAACTGACTA GGAGACT'rAG CATCTGATGG 540 *GAAATAAGTC CACGACTCTT TTICGACGCCA TAAGCCTTGT TAAAAGCTTT TTGCAAGAAA 600 TTCATAGATT TTGAAAAACT CCATAACGCC GATAAAACAG AAAAACTCAA TAAACCTGTT 660 .GAAGGTTGCG TCAAGACTTC TCTGGCTATT TTTCCACAC CTTCATAGAG GCTTGGGGGG 720 CAGACGTCT1' TCATAAAGCC CAAAAATTCT CCCA 754 INFORMATION F'OR SEQ ID NO: 341: SEQUENCE CHARACTERISTICS: LENGTH: 707 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 341: GGGGATAACT CTAGGAGTAC CGCTATTACT CGACTTAATG AGTGCACAAG AAGTCAGGAT TTTTATGCAG GTTGGGCGCT TCATCAGACA GGGAAGATTT ACAGCGACTA TTATGGAAGT 120 CAAGGTTTGC TTTATTATTT GCTGACTTAC TTTIGAGTGGT TAGCCTTGGT AGCAGGAGGA ACAGAGCAAG GAGACCAAGC TGGACATCTG CTrGCT"''T GTGGAGGCTA TGCGACTCTT AGTrAGT'rG CGGCTTACCT AAGCAATCCA CTAGCTTTGG CAGGCGGATT TTTCTTGCT GTGAGTTTAG GCTTGTTGGT CTTTAACCTT CAGTTTCTTG CAGTGGCTTT AGGTTr'rTCA GCTGCAACAG GAAGTTTrGG GGATGCGWTT CGCTT1TGATT TTACTTCTAA AATTTTAGAG 1362
GTGAGTCAGG
T'T'TCCT'TT
GTGACTATTT
TTAGCGCTTC
AGCCATGATA
CCCTTATCAT
GGGCATAGAC
CTTGTCTTTT
AGTGGTATTC
AATATGTTTT
GCGGATTTTT CTTTGCCATC TTAGATCAGC GGACACCTTG TTTACATGCT AGTTACAGGT CTrTCTTATT CGCAGCCTTT AGGGAmTGT ACGGATTGGG CGCTCCTGTT TATTGCTGTA GCTTTGCGCA TGGGTTTTAT ATCCAACTGC CTACTATAGT GTTATCCTAT TGACAGTATT
TTTAAGG
a. INFORMATION FOR SEQ ID NO: 342: SEQUENCE CHARACTERISTICS: LENGTH: 762 base pairs TYPE: nucleic acid STRANDEDNESS: double D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 342: GGATT'rTGAA AAACCATACC GATTTGACGA CGTTGGCCAT CAATTACAAT CTCTCCGGAT ACCGTCGTTG AT'rTACCACT ACCATTATGC ACGTGAAAgT AATATCCTTC ACATCGTAGT GATTT'TTTAC ATCAATTATT GATTTCATTT ACTACCCT'rG AAATAGTCAT AGCCAGAGTA AACTTGACCA AGCAAAGTCC AATGTAATAG AGTTTTAATT TTTCCAGGCA TTGCTGCTGC AAGCCTTAAA CCTGTCAC!AG CTAACTCACG AGCCATACCT AACTCAATCA ACATAATAAA AGGATCTGCA AATTTACCAA AATTACTGAC TAAATAGTCG GTAATACTGG CAACAGCAAA CGAATTTCCT ATCGTTAAAA TAAAGATAAA
CGTATATTCC
TCTGCTTCCA
CCTACAATCG
AGTTCTGATT
CGAACCAAAT
GATAGTGAAA
CAAGAAAATA
AAACATTTTC CTCAGTCAAA GTAAGCCATC AATTAATCGA AAAGCCATTC TCCACGTTTC TTCTTTATAG CGAAAAGAAA GTCCCTTTAA ATACATAGGC AATAAGGCTA CATAAAGTAG ATGGCAAACA TCTGACTAA TAAAATTGTT CCACCAGTTT CAACCAATAA ACAGATAATC ACTGCAACAA TCCAAGCCGG
AGCCGACATA
CACATTCCAT
GATAATAGCT
ACTAGTAACT TATCCGCCAT TTACGAGCTA AATATCCATC GCAACTATAT GACTCTCTAT AATAGGTATA AA INFORMATION FOR SEQ ID NO: 343: Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 482 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear Cxi) SEQUENCE-DESCRIPTION: SEQ ID NO: 343: CTTTTGATAC ACTTAAACTA TGAATACAAA TCTCAAGCCC AAACTTCAGC GTrTTGCTTC TGCGACTGCC TTGCCTGTC CTATCTGTCA AGAAAATCTG ACTCTGTTAG AGACTAATTT 120 CAAGTGCTGC AACCGTCATT CTTTTGAC -r GGCGAAATTT GGCTATGTCA ATCTAGTCCC 180 TCAAATCAAG CAATCTGCTA ACTACGACAA GGAAAATTTT CAAAACCGTC AACAAATCCT 240 AGAAGCCGGC TTTTACCAAG CTATCTTAGA TGCTGTATCT GACT'rGCTrG CAAGCTCAAA 300 AACTACCACA ACAATTTTGG ATATCGGTTG TGGTGAAGGA TTCTATTCTC GCAAACTACA 360 AGAAAGTCAC TCTGAAAAAA CTTTCTATGC CTTTGACATC TCCAAAGATT CAGTCCAAAT 420 CGCGGCTAAA AGTGAACCCA ACTGGGCAGT CAATTGGTTC GTTGGCGACT TGGCACGACT 480 .TC 482 INFORMATION FOR SEQ ID NO: 344: i) SEQUENCE CHARACTERISTICS: CA) LENGTH: 520 base pairs TYPE nucleic acid C STRANDEDNESS: double CD) TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 344: TTTATTTTTA TAAAGTCAAT ACCTGTCTTT ACTTTTTCT AAAAAAAGTT TATTATGTTC TTTAAGGAGG TGTAAAACAT GAAAATAAAT AATAAACTCG TTGGAGAACG TATTCAAAAT 120 *.*ATCCGTTTAA GCCATGGCGA CTCTATGGAA AAA'ITTGGAG AAAAATTTAA TACTAGCAAA 180 *GGTACAGTTA ACAACTGGGA AAAAGGTCGC AATTTACCAA ATAAAGAAAA CCTACTAAAA 240 ATTGCATCTA TTGGAAAAAT GAGTGTTGAA GAGTrACTCT ACGGCGATTA CAATACTTAT 300 *CTACACTTAA AGATTATOGA TTTAGCTCCT GAATGTATAA AAAATTATGA TGAGTATAAC 360 *.*TCTTTACACG ATGATATAAC AAATAAAGCG TTACAGATCG CTCAAAATAC CATTCTAAG 420 ATTGATTATC AAATTTCAGA CGAAACGATC AAAAAATTTA TTGATTTAGC TATCGAACAA 480 TCGAGAGATT TGCAAGGAAA TTTGTTGAAA AATAACGGGT -520 1364 INFORMATION FOR SEQ ID NO: 345: SEQUENCE CHARACTERISTICS: LENGTH: 1003 base pairs TYPE: nucleic acid STR.ANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 345:
GCATCAAATC CGCCATCAAA GAAGTTCTCT AAGTGCTTAA TGACAAGTAC AATGTTCACT GA.ACCATTGG TGAGCGCTAT GGTGCCGTTG TCAAACAGTT GGAAACCAAT CCTTGGAACC AAGCTTTCGA AGAAACAGAT GGGCTGCTCC GGCGTGTTrGA TGGGGAAATC TATCTGGATG TGGTGGCCCA CCACATCAAC GCTATGCAGT ATTTTGGCTG GAAGGTTGGG AAGTTCTTCT ATCAATTTGA ACAAGCTCAG GAATTGCTCC TGGTTTTAAA TGTTCCTGAT GGGACTAATT TGGTGGATTA TGACCCTGTr AAGCCACAGT GAAAAAAGAA GTTGAGAATA ATCCCAACTT GAGCTGCTTT TTTACGGT'rT TCTTCGATGA CTTTCTTTTT AAATGCGTAT ACTGCACCTG TTACAAGACC TTTAGCGAAT CCTTTAGCCA CCAGCCTCCT CAAGAGGTCA CATTTTTCTG GGATTTACCA AGACCAGTCA AATAGCTTAG ACTGGAATGA CTGGGAAGTT GGAGACACGG TTAAGAAACA CGACATTATC AATAAGCTTC GCCGCAATAT TATTTCGCTC TGGGATTACC CGTGCGCCTT TCAGACCATG TTTGATGTTC CGACCTTGAC CCAGCGCTCC AATGAkTATGC ATGTGGCTTT GCAGATGATG ATTGCCAAAC ACTTCATCAA CAACCTCCAT ATCTATGATA GTCGGGAgCC GTCAAACTGC CAACCACGCT TCTTTGATAT CAAAGCAGAA GATTTTGAGT TGAAGT?'rGA CCTAGCTATT TAikAAGAATA' CTTTTGTTTC TTAACGTGAT ACGCGGCGAC AAGCTGCTTT TTGCTCTTCT GGTTCGATTA CAACGGCAGC GACAGTTCCT GCGACACCTG TGAGTCTTCC TCCTTTATAT TCTCAATCAG ACTGACCTTT TTGTGTTATA ATAATAGTAA CGAAAAAATG GGAATTTTTC AAGGAAAAAA GATGAGAACA AAA INFORMATION FOR SEQ ID NO. 
346:- Ci) SEQUENCE CHARACTERISTICS: CA) LENGTH: 750 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double CD) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 346: CCGCACGTAC TATTCCAGAT GCCGAGGAAG TGGACCTCAT CCTCGTTGGC GCAACTGGTC 1365 TCAACGCCTT TGAACGCCTC TTGGTCGGCT CTTCATCTGA AGGTCGATTr GCTGGTTGTG AGAGAACAAG AAAAAACCTT CCCCTAGCTC CTTTTTGTTT ACGATI'ATT TCTCTCTTTA CTGGCGCTGC AGTTCCTTTT TAATAGCAGG TTCTGGAGCA TGGTTPAAG ATT'rTATGGG TC-ACTGGATC AAAATGAGCC CCCCATATTG GCCTGATGGA CAATATCAAA AATACGTTCT AAAACTGCCG TAGGTGAAGT AAAGCGTGTC AATCAAGGCA TTGCTGAGCA GGTGTCTTCT TGGCTACTTT ATCTGCTGCC TTGCGATACA GCTTIGACCAA AATCTTCTTC AGAAGGACTG CAAT'rCTTCT ATTTTAAAAC CAGCCCTATG GGTTGCACCC TTCTTCTTGG GTTCGTTCAT CCATCATGTG GTGGAAAGTC GTCACGGCTG ACAAAGACTT TTTCTGAAGA INFORMATION FOR SEQ ID NO: 347: SEQUENCE CHARACTERISTICS: LENGTH: 596 base pairs TYPE: nucleic acid STRANDEDNESS: double D) TOPOLOGY: lineer ATACATACTC CGCCATGCI'A ATAATCACAA AGAAAAGGAG TGGCGTTCGT AAGCCTTGAG TATT'T'TCI-r CCCAATTATC TGCCATCTG GAAAAA'TTTT GGGTCCACCC CCATCAAGAC TCCACTTGCC CTATCAAATC TTATCAAGGG CC1'GATGAAG
GCTGCTCGAA
TCTAAATCCC
TTGACCTT'AT
CAAACTCCAC
AAGCTCGAGG
TGAAATGATA
a a a *5*a a Gaa* a. sa a a (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 347: CGCAACATAC GGATAACCTC CAAAGAATAT TTTTATATTA TAGCAA.AGCT TTAAATTGAA TGTTAGAGTC T'rGTTCAAA.A CAATCATCAA AACCACGTGG ATGATGGTAT TCTACTAAGT GTTGATCTTG AGGATAAGTG TACTTACCGC CAACTTCCCA GATAAATGGA TGGAAATCGT ATTGCAAGCG ATCTTTTCGC ATT'rTCCAAA GT'rCTAGAAT CTCATTAGTA GAAGCCATGA AGTTAGACCA GATATCATAG TGAACTGGGA TAATGACTTT GGTACGCAGA T'r'TCTGCCA TACGAAGAAG GTCGATAGAT GTCAkT'rTGT CTTGGATACC TACCGGATTT TCACCATAGT TATTCAAAGC AACATCAATT TTA.AAGTCTT TACCATGTTT TGCAAAATAG TTTGAGAAGT GAGAATCTGC ACCATGATAG ATGGTTCCAC CTGGTGTTTC AAAGATATAG TTAACAGCCT TTTGAGCCAT TTCTTCATCT GTAACAGCCA AGCCAGCAgT TCACCGCCTG TCTCATCAGC ACCGTTCACT GGGAGAGTTA CCAAGCAAGT ACGGTCAAAT GATTCTACTG CATGAA INFORMATION FOR SEQ ID NO: 348: 1366 SEQUENCE CHARACTERISTICS: CA) LENGTH: 673 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 348: CAGAGTCAAC AGCCTGAGTT GAAGGCAACT TTAGACACAG CAGTTACGAC AGCTGAATGA GCTCCTCCAT CAGTTTrC TTTAATGAGT CCAGCTACAT CTTCAACTTC GAGGCCGTTA 120 ATCACAATGT CAGCGCCTAC TTCTTTTGCA AGGGCAAGT'r TGTCATTGTT GATATCGACT 180 GCGATAACAT GAGCATTGAA TACTTTTTTA GCGTATTGAA CAGCGAGGTT ACCAAGTCCA 240 CCAGCACCGT AAAGAACAAC CCATrGGCCT GGTTCAACTT TTGCTTCTTT GATAGCTTTA 300 TAGGTTGTTA CTCCAGCACA TGTGATAGAA GAAGCTrGGG CTGGATCAAG TCCGTCAGGA 360 ACTTTGACAG CATAGTCAGC AGTTACGATA CATTGTTCAG CCATACCACC GTCmTACTGAG 420 TAGCCAGCAT TTTTCACTGT ACGGCAAAGG GTTTCGCGAC CAGTTGTACA GTATTCGCAA 480 GTGCCACATC CTTCAAAGAA CCAAGCAACG CTGACGCGGT CACCGACTrT AAGGCTTTTC 540 :ACATCTGGAG CAATCTCTTT AACGATACCG ATACCTTCGT GCCCAAGAAC ACGTCCTGGG 600 *SaACTTGACCAA AGTCACCATG AGCAACGTGG AGGTCGGTGT GGCAAACGCC CACAGTATTC 660 ACTTCTACAA GTG 673 INFORMATION FOR SEQ ID NO: 349: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 198 base pairs TYPE: nucleic acid a CC) STRANOEDNESS: double D) TOPOLOGY: linear SEQUENCE DESCRIPTION: SEQ ID NO: 349: 4GTACCCTkCA AATGCTTTAC AGTATGGGTT GAGGGTGGTC AATGGAACTA TGGAGTAGGT TGGACAGGAA CTTTTGGATA 
'N'CTGATTAC TTACATTCTA CTCGATATCA TACAGCAACT 120 GTTAGACATG GGGGTAGAAC CTCTAAGGAT TATGCAAAAC CTGAGGCATG GGCTAGAGCT 180 **4.TCCCTCACCA AGATTCCG 198 a INFORMATION FOR SEQ ID NO: 350: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 891 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 350: GCTTCTTCTA TAGACAAAAA TATCATGGGT AAAATAATCA AGGCTATAGC TAGAAGGAGG GACCAATCCA CTACTAATCC TAAGAACAAA ACACTCAAGA GAGCAGAAGA GAGAGGTTCA CTGGCACTGA TAACGGCAAC CACCAAAGGA GA.AACCAAGG ACACAGCCTT CATGGAAATG AAAAAAGCAA AAGCCGTTCC AAAGAAAGCG ATAATGAGGC AAATCAAGAT ACTCCAAATA TCAAGAGTAA AGGAAAGCTG ATAAACCGGC GAGAGGACAT TGCTAAACAA ACCTGCCAAA ATCATCCCCC ACCCAACCGT AGGAACAAAA CCATAACGCT ATAACATTAA ACATAACACC CATGGCACTC AGCAAACCTG ATGGATAACT GAGAGAGGTC AAAACATAGA AAACAGCGCT AGGATAAAGA CAGGGCrAA'r ACACAGAGAT AGAAAAAATA GGCAGGTAAT TTT'rCTTGTC GACCAAATGA GTACAAGACT GACACCTGAT AATGAGTAAA CCGGATAGGA GCGAATAAAT TCCCTTTGTC GCCATCAAGC TTTTGACGCT CGTTTTTGAT AAACTGTAAA ATAGTTGCTG CTGAACTGAA AAAATCCCCA TAGCAAAAGG TTGGGGCAAG TTATAAGAGC TAGCGGCGTC AAACACCCAG CATGGCAACC AAACCAAGCG ATTGTAAAAG TCGTAGCATT TGAGTATTCT AAATAGCATA GGCTAAAAAG GCGATTTTAA TTGTATTGCA TAGAGGTAAT CCAGCCCGAA CACAGATTCC CCATATTAAG
TCGCCAAATA
CCCTGCCAGT
GAAGTACTCT
TCTAGCACTT
GTCAAACGCA
CCTAAAATTC
TTTTCCGTTA. ACAATCTTTT TCTGATACTG A INFORMATION FOR SEQ ID NO: 351: SEQUENCE CHARACTERISTICS: LENGTH: 325 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 351: GAAAGCGTrrC AATAGAACAT TGCTTTTTTA TTTT'rAGAGT AAGCTAAGCG CTTCAGCATC TGCGATGATG GTTACATCAG GGTGATTTTG GAGGCTACTT GCAGGTAGGT TCTCAGTCAC TGGGCCAGAT ACTGTTCCGG CAATGGCTTC TGCTTTCGAC TCACCGTAAG CAAAAAGAAT AATAGACTTG GCATCCAAAA TGTTTTTAAT CCCCATTGAA ATAGCTTGGG TTGGGACGTC TI'CAATCTTG GCAAAGAAGC GTGCATTGGC TTCGATAGTA GACTGGTCAA GTTCTACTAG 1368 ATGCGTTTGA CTGTCAAATG GAGTG INFORMATION FOR SEQ ID NO: 352: SEQUENCE CHARACTERISTICS: LENGTH: 344 base pairs TYPE: nucleic acid CC) STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 352: CAAGAGCAGT TTGATGATTT TTGATAAGCA TGCGAATTTA AAATACAAAT ATGGCAATCG CAAGTTTTGG TGTAGAGGCT ATTATGTAGA TACGGTAGGC CGTAATCAGA AAGTGATAGC TGAATATATT CAGAATCAAT TACAAGAAGA CAGAGTAGCA GACCTAGCTC ACGTTATTCG AGTCAGTAGA TCCGTTTACT GGCGAAATAA ATAAGAGGAA GTAACGThAA GTGCTTTAGC ACCTGCTCGG GAAAGTGGTG CGCGAGGAAG CTATTTCAGG ATGCTTTGGC CCTGGCCGGT AGAAGCGTTA TAGCCGCAGA CTACGACACT TCACACTGGT GGN' INFORMATION FOR SEQ ID NO: 353: SEQUENCE CHARACTERISTICS: CA) LENGTH: 692 base pairs TYPE: nucleic acid STRANDEDNESS: double D TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 353: CCCTATCCCT GCTATTGGGG CTGCTCTCAT TGCTGCTTTG GCACAAATCA GTCTTCCAAT TGGACCTGTT CCCTTCACTC TGCAAAACTT CCGAGAGAGG CTGTACTTTC TGCTGGACTC GTCTTTGCAG GAGGTGGAC TGGTTTTATC TcG.TTTACTc
GTTAAGATTT
AGCTTGCATT
TTTATCATTC
CAACGCCTTA
TCAATATCCT
TTCTTGCAAA
TCCTAGCTGG
CAGACCTTGG
AAAATCAGGC
TTTCTTTTAT
TGGTT CAG
TGGACTTACT
CCTCTTGGGT
AATGGCATTT
CAAACTTCTA
?I'ACTTTACT
TTTGAAAACT
CAAAACACTG
TGCAATCGGC
TATCTTCTTC
GCTTTAGTTG
TCCTCTCTAA
GATGCCCTTG
GAAAAAGCTC
TTGAT1'CTAC TGTCTTTAGA TAGGTGCTAT CGGTCTTCCT GCCCTACTGC AGGCTATCTT CCAACAGCAA GAGTGGTGTT TCTTTGTCGG CGGGATTCTC TTGCTGTGGG GGTTCTTCCC GCTATTAGTT TTATTAGCCG AACTAAAAAA GGATATCGAG TATACTCAAT GAAAATCAAA TTTTGAGGTT GTGGATGAAA
TCCCCTACTT
TTATCATGAC
GAGCAAACTA
CTGACGAGTA
GGAAGCTAGC CGCAGGCTnG AnATCTCATA CATACGGCAA GGCAAAGCTG AC 692 INFORMATION FOR SEQ ID NO: 354: SEQUENCE CHARACTERISTICS: LENGTH: 1005 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 354: GTGATGGACT ACTGGTTCAA AACGCATCCA GAAGAT'rTTT TCGATAATGT CGGACCTCTT GTAGCCAGTA ACTTTTCA TACTTACACC GAAGATTTCC ACTTGATGAA GGAAATTGGA 120 GTTAATTCTT TCCGCACTrC CATCCAATGG AGTCGACTCA TCAAGAATTT AGAGACAGGT 180 GAGCCTGATC CAAAAGGTAT TGCTTTCTAC AATGCCATCA ?1'GAAGAAGC TAAAAAGAAC 240 CAGATGGATC T'rGTGATGAA TTTACATCAT TTTGATTTAC CAGTGGAACT TCTTCAAAAA 300 *TACGGTGG'rT GGGAAAGCAA ACATGTAGTG GAGTTATTCG 'rGAAGTTTGC CAAGACTGCT 360 **.TTCACATGCT TTGGAGATAA GGTTCAT'rAC TGGACAACTT TCAATGAGCC AATGGTCATT 420 ***CCAGAAGCAG GGTACTTATA TGCTTTCCAT TATCCAAATC TAAA.AGGAALA GGGAAAAGAG 480 ***GCCGTACAAG TCATCTATAA TCTAAACCTT GCTAGTGCAA AAGTGATTCA ACTATATCGC 540 TCATTAGAAC TTGATGGAAA GATTGGGATT ATTTAAACT TGACACCTGC TTATCCAAGA 600 *AGTAATTCTC CAGAAGACTT AGAAGCAAGT CGATTTACAG ATGACTTCTT TAACAAAGTC 660 TTCTTGAATC CAGCTGTTAA, AGGAACTTrC CCAGAAAGAT TGGTAAAACA OCTAGAGAGA 720 GATGGCGTGT TATGGAGTCA TACCGAAAAA. GAGCTTCAAC TGATGAAATC AAATACGGTT 780 GATT-TTCTTG GAGTAAACTA CTACCATCCA AAACGTGTTC AAGCACAAGC AAATCCTGAG 840 *GAATATCAGA CGCCCTGGAT GCCAGACCAA TACTTCAAAG AGTATGAATG GCTGGAGCGT 900 CGCATGAATC CATATCGTGG TTGGGAAATT TTTCCGAAAG CCATTTATGA TATTGCTATG 960 ATTrGTGAAGG AAGAATATGG TAATATCCCA TGGTrTATCA GTGAA 1005 Q2) INFORMATION FOR SEQ ID NO: 355: SEQUENCE CHARACTERISTICS- LENGTH: 973 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1370 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 355: CCGACAAGCA. ATATTAAAAA GAGTAAACTA TATAGTGAAT CAAA'rATACT TAAGAAAAGA AGCAGGTTCA G'PGGCAGTCC TTGCCC'rAAG
AGCTGGTCAG
TCAAAAGGCA
ACAAATTGTT
GATAAGAAAG AGTCTAATCG GAAAACTTGA CACCAGATGA ATCAAGATTA CGGATCAAGG TrAACTAGTT AATTAACCGG GGAAAGAATG AAAATTAATA TGTTTGTTCC TATGAGCTTG AGTTGCTTAT -Ai'A6ATGGTG AGTCAGTAAG AGGGAGGGGA TTATGTGACC TCTCATGGAG CATCATCAGT GAAGAGCTCC TGTCAATGAA ATCAAGGGTG
TTTATTACTT
AAAAATATCT
GACGTTACCA
ATCAGGCTGG
TCAACGCCGA
ACCATTATCA
TCATGAAAGA
GTTATGTCAT
TTACTATAAT GGCAAGGTTC TCCGAATTAT CAGTTGAAGG TAAGGTAAAC GGTAAATACT GACAAAAGAA GAGATTAAAC AGATAATGCT GTTGCTGCAG CTTCAATGCA TCrGATATCA CCATTACCAT TACATrCCTA CTATTGGAAT GGGAAGCAGG AGCTCAACCA AGATTGTCAG TCAAGCGGGA AACATTTCAA CATGTGGGAT CTG ATGTTTACCT TAAGGATGCA GTCAGAAGCA GGAACGCAGT CCAGAGCCCA AGGACGTTAT TTGAGGACAC GGGTGATGCT AGAATGAGTT ATCAGCTAGC GATCTCGTCC TTCTTCAAGT AGAACCACAA TCTGACrGTC GCCTTTTACG TGAATTGTAT GCTCATGCGG ATAATATTCG CATAATCATA ACTCAAGAGC ACAACGGATG ATGGGTATAT TATATCCTTC CTCACGGCGA GAGTTAGCTG CTGCAGAAGC TCTAGT'rATA ATGCAAATCC ACTCCAACTT ATCATCAAAA GCTAACCCTT ATCAGAACC
CTTATGATGC
ATTCAGACAT
INFORMATION FOR SEQ ID NO: 356: SEQUENCE CHARACTERISTICS: LENGTH: 843 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 356: GGTCGCATCT GCAATATCTG TCGCCTCCAC ATAAGCGACA CCAGCCTTGT CTGCTGCCCG TTTGACACGT TCTGCAGATT GACCCAGGAT GACCATCTTC TTGAGTCCAG TAATGTCTGG CACCAATTCG TCAAACTCAT TGCCACGGTC CAAACCACCT GCAATCAAGA CGACCTTGCT GTTGTCAAAT CCTGACAAGC TTTTTGAGTA GCCAAGATAT TAGTTGATTT ACTGTCGTTA TAGAAT'PTAA CACsCTTGAT GTCATCCACA AACTGGAGAC GGTGTTTGAC ACCACCGAAG GCTGAAAGAG TTTCCTTGAT GGTTTGATTG TCCACATCAC GAAGCTTGGC TACAGCAATA GTCGCAAGGG CA'rr'CCAC AT'rGTGGCTA CCTGGAACAC CGAT-TCATT ACTACTrCAC CACGGAAGTA GAGTrGACCA TCTrCCAGAT AAGCTCCATC AGTGTTGAAA ATGGTACAAC AGTGGC?1'CT GTCTTGGAAG TCAAGTCTT TGATTAAAGT TCAAGACAAG GAAATCAGCT GCTGTCATCT TGI-rCTGGAT GCTGCTACAT ATTCCGAAAA TGACCCATGG TAGTCGATAT GAGT'rGGCAT ATAACCGCAA TCTCTGGATG GAATTCTTGA ACACCCATGA GTTGGAAAGA ATAACAAGCG TGTCCTTATC TGATGCTATT TGAGCAACCI' GACTAGCTGG TrCCCTGATA AAAGACCATG TrGGCCAGCA GCAGrCAAAA CTTCCCGGGrI
TCG
INFORMATION FOR SEQ ID NO: 357: SEQUENCE CHARACTERISTICS: LENGTH: 807 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 357: ACCT-r-rCA rGCCAAGTCT AT'rCCACI'G
GAGGTTGGTA
AGAAAGTTCC
ATAGCCGATA
TCCTCTAGAG
rTTTTTTTAT ATTTTTTPA CATTGCTTCT GCATTAAATT ATTAACTAGC AAGTGCAACT ACTGACCCAA TCGCAGACTT GTACTTGAAG TACCTGCATC GGTTTTGTAA AAAACG'ITGA CTTAAATACG GACCAAATG GGACTTCGTG TCTACAAAAA GCCATCCTTT CAACTTCTGA GGTGGTGAGG TTATCGCTTA GCAAAAT'rAG GAAGTTGGAG
TTTATTATTT
GTCTATITT
TGCAAACTAC
CCTAACTCGT
AAACATCAAA
AATCATTGAA
TGAGAAAGTT
ACGTGAAGAC
AGG?1'TGCTT
CGTTTGGTAA
AAGTTTGTTT
TCAGCTCTTG
GGGTAACAGG
AAAAGTT
TTTGGCAAAA AAGACCAATT TGCTTTGGAG GCTCGTGCTG TTACGCTCTT TGTATCATGT TAGTAAGAGG AGAAAAACAA AATGGTTATG ATTCGTAATG CTAACCAAGC TAAACACGAA AAAGGGATTG CTGAAArCCT TAAACGCGAA GATGACAAAC AAGGCGTCAT CCGTGTATTT ATCACTAACT TGAAACGTGT TTCTAAACCA CTTCCAAAAG TrCTrAACGG ACTTGGAATT ACTGATAAAG AAGCACGCCA AAAGAATGTT AATCAAGATA CAAAGCTCGT AAAGAACAAA ACAAACAGGC CAACT1ATCT ATTTTGCACA AGCTAAGTAA GTATCTGAAC CCCGTGAAAA AGAnAATAAA CATGTCACGT ATTGGTAATA GTTCTTAGAG CGTGTTCAGT CTGGCCGTGC TGGCATGTTC AGTTCAGCTA AGGCCTTCGT INFORMATION FOR SEQ ID NO: 358: SEQUENCE CHARACTERISTICS: LENGTH: 653 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 358:
CCCAGTATTT TTGTCCAAGC ACGACCAGAA AAGGATGATA TTAACCATCT tTGAACAAAA TCCTCAGGCT CAGGTCACTA CGTATTGACC ATATGTrGGC CAATGTCTTT CTGCCTAGCA ATGCATCAAA TAGAAATTGA GGATGGGCAA AACTTGATTA AGTCAGCTAG AACCTCGTTC AGACTACGAC TATCTAGCCT CAAGTATGAG TTGACAGAGG AAAATTTTT CTTTAAAAAA TATAGATAGG GAAGTGTCGG TAACTTGCCC AGATGGTTAT GGACAGGAGG TAGGATGGAA AGTTTACTTA TTCTATTATr TCTTTCTGAT TTGGCAAAGG CAGGATAGGC AGGAGAAACA ATCAGGCAGA TCATTTGTCA GACCAGCTGG ATTACCGCTT GCCAGTTAGA CCAAAAAGAT TTGGAAGTGG TTGTCAGCGA INFORMATION FOR SEQ ID NO: 359: SEQUENCE CHARACTERISTICS: LENGTH: 641 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
TATGCCAGT
GTGTACGCTT
GTGGTCGTAC
AATTGCCAAT
CTTAAGTAAG
TGACCAAGCC
CCGT'rrGCAA
TCGGGATAGC
CTAACGAATA
TGCATAGCAA
CTAGCTGGTC
AGCTTGGAGG
AGACAAGCCA
GAA
CAGATCTGGA ATTGGCTCTC TTTTCGGTGC CTTGGGTGGC ATCCTAAGTT GGCACCCTAT CTTATTGTCC AGAAGGAATC
(xi) SEQUENCE DESCRIPTION: S CACCATGTGA TGTGACGCTG GCCACAGCTG GACTCTTCCC GATGTAATCT TGjTTCATAGT GtGCGCTTCC TGAACACTTA TCAACTGTTA EQ ID NO: 359: TCAGAAATCT GGCGAGCCAT CGTGTGCAAT CCTTTGATGA ATATGTTCAA GCTGTAGAAG CAGGCGAGTT GACCAGTCAG GAAACAGATG CCCGCATTTA CCTAAAACAA GCCTTCCAAG AACCCTTGAC TATTATCACT GGTGGACACA AAACACTTTT GCAGAATGCG CCACATGATA ACCGCGAGAT GGAAACGCGT TTTGCCAAGG GCTGGTACAC ACTTGCCAAC AAAATAGCAA CCTCCTAGAG ACCACAAGGA CCAGTTGACC GTATCTGTGG CTGTAGCGTG
ACTTCTTCAT
CAAGTGGTAG
TATGCTTGGA
GACGAAGTTC
TCAACCAAG'r AGGAAACTTT GTTAAAAGTA ACTTGCTCAA CGAGTGGAAG CGTAAAATTG CTACGGATAA GCCTCAAAGr GACTATCTCT TTACTGTCAT TAACACAGGC T'rGCATGATA AGGTCGATAC TGTCAGCACA GTGATTGATG TGGCGACTTG TGAT?1'CAAG GAATTGCACC CAACAGAAGG CTACAAAAAG ATGGCTGCTC TTATCTTGCC G INFORMATION FOR SEQ ID NO: 360: SEQUENCE CHARACTERISTICS: LENGTH: l958 base pairs TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ'ID NO: 360: CCTCAAGGCC AATTTGAAGG CTCTAAAACA TTTGCCATTC CAGCAAGTAC TCAA.AAGCTT AGCGAAATTA TTGCTTTTAA TCTTTCGACT TTAGAACAGG CATTCAAAGA GAAGCACTAC TGGCAATATC AACACGATTC TTATCATCGG ATGTCACGCA AGGGCAACAG CCAAGACAAC AAATCCGAAA TGTTTTATGG CTATGAGAAA GCCATTATAG ACTATATTGA TTACTACAAC CTTAGTCCTG TGCAGTACAG AACTAAATCC GGTCAGTACA AAACTCTTGC TACTATGCGT CTCAAATCGA GTNTrTTACTC AATTTTCTTA CTCTGAGTAG AGTGTCTTGATATTGGCTTC CAAGAAAAAT TCTTGAATGG TTTCGATTTC GATGAGGATT TCATAGTGAA GCGGAGCTrG ATAAACCTCA ACAAGGGTTG CATCGGTACT TGAGTTTGTC GTATTGATAA GCTTCATAAT CACTTTTTGA ATAAAGTCGC TTGATTTATA GCTAGCCGCA GGCTATACTT GAGTACGGTA TTCGAAGAGT ATTAGCCAAT CTTATGCTGT ATGGAAAAGT GCTACACAGA TGTGACAGAA TACTTATCAC CAGTTTTAGA TGGCTTTAAT TCACCCAACT TAGAACAAGT ACAAACAATG GAGAATACGA TTCTCCATAG TGACCAAGGC TTCCTAGAGA GTAAGGGAAT TCAAGCATCT GGTATGATGG AATCTTTCTT TGGCATTTTA ACATTTAAAT CACTTAACCA ATTGGALACAA AACAAACGAA T-rAAGGTAAA ACTAAAAGGA TTTGGATPAA TTAATTGTCT AACTTTTTGG TTTATTATTG AAAGACTTAT TGGACTTTCT CT"TGATTGGG ATTGAAATTC CAATTAATTT ATCAACAGAG GCCTTATCAA TTTTACGTTT AGGCTCACGA ATAGCACGGT GTTTGTTTGA GGTAAAAATA ACATCTGTAT TCCCTGCAGA TTCTAGCTGA CTTTTTACAA GTTGCGAGTG ATTTCCTCCG ATTTTCTAAT TCTATTATAG CTCAATGAAA ATCAAAGAGC AAACTAGGAA AGGCGACGCT GACGTGGTTT GAATTTTATT TTTTTCCAAG ATTCAATGGC CCATTTATGG 120 180 240 300 360 420 480 540 600 660 ~720 780 840 900 960 1020 1080 1140 1374 CTACCACGTT TAAGGTTTr GATAGCCTCG TCAATAGGGA ACCAGGCAAT rTTCTAGTG GCTTTTGTAC TTCTT'rGAAA GGAGTTGCTT CATAGAGGTA 'rAGTAGTAGG TA'rCACGATG ACGAGAATAG AAATATTCGT CAGCTrGTCC CCAATTTCTG CTGTGAAACC AAGCTCTTCA ATCAACTCAT GCN'TAGGGC 
TTTTCACCTG CTTCAATTTC TCCACATGGT AGGAACCAAG CACCATT'rGG AGAACAATTT GTTTTTGTrC AGGA'N'AGGG ATAACTGCAT ATACGCCATA TAGTCTGTA'r TCACTTTTT TCTCCGAAAG TTGGGTTTGC CATTGCATTr CrAGTATCGT TATTATTATA GTGAAATGAA CCAAAAATAG TACACAATGT TCT'rATGGCA TATTCAATAG ATTTTCGTAA AAAAGTTCTC TCTTATTGTG TAGTATAACA GAAGCATCAC ACGTTTr'CCA AATCTCACGT AATACCATTT
ATGATTAAAG
GGCAGGATTG
GTAATAGGTA
TTCCTGATGA
TTCTTGAACA
GCGAGCAATA
TCCTCATTAT
GGTATAATCT
AGCGAACAGG
ATGGCTGGTT
1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1958 AAAGCrAAAA GAGAAAACAG GAGAGCTAAA CCACCAAGTA AAAGGAATAA AACCAAGAAA GGTTGATAGA GATAGACTTA AAAACTATCT TACTGACAAT CCAGACGCTT AT'r'GACTGA AATAGCTTCT GAATTTGGCT GTCATCCAAC TACCATCCAC TATGCGCTCA AAGCTATGGG tACACTCGAA AAAAAAAAGA ACTACACCTA CTATGAAC INFORMATION FOR SEQ ID NO: 361: SEQUENCE CHARACTERISTICS: LENGTH: 851 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear
TATGA
CTTAG
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 361: AATTA AGTTATGATG ATAAAGTTCA GATCTATGA.A CTTAGAAAAC AAGGATATAG AGAAG CTTTCAAATA AATTTGGGAT AITACAATTCT AATCTTAGGT ATATGATTAA ATTGATTGAT CGTTACGGAA TAGAGTTCGT CAAAAANGGA AAAAATCGTT TGATTTAAAA CAAGAAATGA TTAATAAAGT CTGACATGAA GGCTGGACTA TTCTCTTGAA TACGGTCTCC CAAGTCGTAC GATACTTCTT AACTGGCTAG GAA.AAACGGG TATACTATTG TTGAGAAACC AAGAGGGAGA GTACCTGAGA CCATCCTAAA AAAGTTAAGA GAACTCCGAT TGAAGGAGGA AAAAGAGAAA AGAAATTGTT TAAGAATTAA TGACTGAGTT TTCGTTAGAT CTTCTTTTAA ACTAGCTCGT TCGACCTACT ACTATCACTT GAAACAGCTA GATAAACCAG AGAGCTTAAA GCTGAAATTC AATCCATTTT TATCGAACAC AAAGGAAATT
ACTATTCTCC
AAGATAGAGT
CACAATACAG
GCGGAGAATG
GAAGAAAGAC
AAGTCATTAA
ATAAGGACCA
ATGCTTATCG
1375 TCGGATTTAT TTAGAAC1TAA GAAATCGTGG TrATCTGGTA AATCATAAAA GAGTTCAAGG 660 CTTGATGAAA GTACTCAATt TACAAGCTAA AACGCGACAG AAACGAAAAT ATTCTTCTCA 720 TAAAGGAGAC GTTGGCAAGA. AGGCAGAGAA TCTCATTCAA GGCCAATTTG AAGGCTCTAA 780 AACAATGGAA CAGTGCTACA CAGATGTGAC AGAATTTGCC ATTCCAGTAA GTAC'N'AAAA 840 GCTTTACTrA T 851 INFORMATION FOR SEQ ID NO: 362: SEQUENCE CHARACTERISTICS: LENGTH: 1168 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 362: GGGTAGAATC GATATCTCCA ATGAGTTGGT tTAGCTGGTG AAACTGTAAA AAGATT'rCGw ***CCAATTCAAG G'rPGAGGCAT CGCAAACTAT GGACTGTTTC CTCGTCAGTT CTGGAAAGAA 120 ***AACGGGATAA GGTTGGCTGT GAAGCAAGCT GCCCTCCTTC CAACAATTTT GGAAAGTAGG 180 9..CATCAGCTGA CA.ATrTTITA CAAGCATAGT CCGTTCCATA ACCTGTTAAC AGTTGAAAGA 240 GGAACTGGAC AAGGATATCT GAATCCGAAT AACGACAGTA'GCGGCGTTGG TCATTCGTTA 300 CTAAATACTT AGAAATCCGC TCT'rTTAGTT TCAACTGGGA AAAAAGT'rCC 'rGAAAAAAGA. 360 TAAGACCACC ATACTGGGT'r AAATGACCTC CATCGAAAGA TAGTTGGTAA AAAGACTTGT 420 TTGGAAGTG ATGATTrGGT AAACTGTTCA TGTGAGTTTC CT'rrCTTTTT GTGTTT'TTTT 480 CTACACTTAT ACCATAAAGG GGAAACTCTT TTTTGTCTAG TAAAAAACAC CCATTGGrGTG 540 AAAAAAGAAA CCATCCAGGA TCTAAGCTAA GGCAAGGATT CTGGATGGTT T'ITAGATTTG 600 .GGGTGAATAA TTGGGGTTTT AGCTGCTrGC GGCCAATCAG GTTCAGATAC AAAAACTrAC 660 *TCATCAACCT TT-AGTGGAAA TCCAACTACA TTr'AACTATC TATTAGACTA TTACGCTGAT 720 AATATAGTCA A'N'GAAACAA GAACAAGACA AAAGAGCCTC ATAAAAGGTA TTGCAACTTG 780 *GTAATACCT1 TTTGAGGTGC TTTTTGATAT GAGCCCATGT.TTTCTCA.ATA GGATTGTACT 840 ***CAGGTGAGTA GGGAGGAAGA GGTAAAAGTr TATACCCAAA CTCTTCACAC AAGAGTTCTA 900 ACTTACCCAT TCTATGGAAT CTTGCATTAT CCATAATAAT AACCGATGGT GTGTTTAATG 960 TTGGTAAGAG AAA'rTTCTGA AACCAAGCTT CAAAAAAGTC GCTCGTCATC. 
GTCTCTTCGT 1020 AAGN'ATTGG AGCGATTAAC TCACCATTTG TTAGACCTGC AACCAAAGAA ATCCTCTGAT 1080 1376 ATCTTCTTCC AGATACTTTG CCTCTTCTTA ACTGACCTrT TAATGAGCGA CCATATTCTC GATAAAAATA AGTATCGAAT CCrGTTTC INFORMATION FOR SEQ ID NO: 363: SEQUENCE CHARACTERISTICS: LENGTH: 4483 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 363: GTCAGCTTCA GCAAGCCCAT CAGCTTCTGA ATCTGCATCA ACCAGTGCGT CCGCTTCAGC
GTCAACCAGT GCGTCGGCTT CAGCGTCGAC AAGTGCTTCG GTCGGCCTCA GCAAGCGCAA GTACCTCAGC GTCAGCTTCC TTCAGCAAGC ACAAGTGCGT CAGCCTCAGC AAGTATCTCA GAGTGCGTCT GAGTCAGCAT CAACGAGTAC GTCAGCCTCA ATCTGCATCA ACCAGTGCGT CAGCCTCAGC ATCGACAAGC CAGTGCTTCA GCCTCAGCGT CGACAAGTGC GTCGGCCTCA ATCAACCAGT GCGTCAGCCT CAGCAAGTAC TAGTGCATCA ATCGGCTTCA GCATCAACCA GTGCCTCGGC TTCAGCGTCA GCTTCAGCAT CAACGAGTGC GCCTCAACCA GTGCGTCGGC GCGTCTGAAT CGGCATCAAC GCAAGCACAT CAGCTTCTGA GCCTCAGCTT CAGCAACTAC ACCAGTGCAT CTGAATCGGC GCTTCAGCAT CAACGAGTGC ACCAGTGCGT CAGCTTCAGC AAGTACCAGT GCTTCAGTCT CTCGGCTTCA GCAAGCACAT AAGTACCAGT GCGTCAGCCT ATCAGCTTCA GCATCAACGA AAGTACCAGT GCGTCAGCTT GTCGGCTTCA GCAAGTACTA AAGTATCTCA GCGTCTGAAT CTCAGCCTCA GCGTCAACAA ATCAACGAGT GCGTCCGCTT ATCGGCTTCA GCATCAACGA GTCAACAAGT GCATCGGCTT GTCAGCCTCA GCAAGCACAT ATCGACAAGC GCCTCAGCTT CAGCATCAAC AAGTGCTTCA GCCTCAGCAT CAGCATCTGA ATCAGCGTCG ACAAGCGCCT CAGCGTCGAC AAGTGCGTCA GCCTCAGCAA GTGCATCGGC TTCGGCGTCA ACCAGTGCAT CCGCATCAAC AAGTGCCTCG GCTTCAGCAA GCGCCTCAGC CTCAGCCTCA ACCAGTGCGT CGGCATCAAC GAGTGCGTCC GCTTCAGCAA GTGC.ATCGGC TTCAGCGTCA ACGAGTGCGT CAGCAAGTAC TAGCGCCTCA GCCTCAGCGT GTGCGTCCGC TTCAGCAAGT ACTAGCGCCT CAGCGTCAAC GAGTGCGTCT GAGTCAGCAT CAGCTTCTGA ATCTGCATCA ACCAGTGCGT CAGCAAGTAC CAGTGCGTCA GcTCAGCGTC
CGACAAGTGC
CAGCTTCAGC
GTACTAGTGC
CAGAGTCAGC
GCACCAGTGC
CAGCCTCAGC
GTACTAGCGC
CTGAATCGGC
CAACAAGTGC
CAGCCTCAGC
CAACGAGTGC
CA9CCTCAGC GACAAGTGCs 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 TCrGCTTCAG CAAGTACCAG TGCGTCAGCC TCGACAAGTG CGTCGGCCTC AACCAGTGCA TCAGCAAGTA CTAGCGCCTC AGCCTCAGCA AGTGCATCAG CTTCAGCAAG TACTAGCGCC TCAGCAAGTA CCAGTGCGTC AGCCTCAGCG TCAGCGTCTG AATCAGCATC AACAAGTGCG TCAGCATCAA CAAGTGCTTC AGCTTCAGCA AGTGCTTCAG TCTCAGCGTC AACCAGTCC TCAGCAAGCA CCAGTGCTTC GGCTTCAGCG AGTGCGTCAC CTCAGCAAGC ACATCAGCT CGCATCAACA AGCGCCTCGG CCTCAGCAAG TGCATCAGCT TCAGCCTCAA CAAGTGCTTC 1377
TCAGCAAGTA
TCTGAATCGG
TCAACGAGTG
TCAGCCTCAG
TCGACAAGTG
TCGGCTTCAG
AGTACCAGTG
TCTGAATCCG
TCAACGAGTG
CTGAATCTGC
CCAG'rGCkTC AGCCTCAGCG CATCAACCAG TCCGTCAGCC
CGTCCCTC
CGTCGACAAG
CGTCGGC'TC
AGCAAGTACT
CGCCTCAGCT
AGCAAGTACC
CATCAACCAG TGCATCAGCT CGTCGGCTTC AGCATCAACG
CATCAACAAG
CGTCTGAGTC
ATCAACCAGT
TGCCTCGGCT
AGCATCAACG
GCGTCACTTC
CATCAACCAG
CCTCGGCTTC
TACAAGTGCT TCAGCCTCAG AGCCTCAGCG TCAACCAGTG
55*y 4 St 55 S S AGCAAGTACC AGTGCGTCAG cTTCAGCAAG CACAAGTGCG TGCTrrCGGCT TCGGCATCAA CAAGTGCCTC AGCATCAGCA GCAAGTACTA GTGCATCAGC ATCAGCATCA ACCAGTGCAT GCGTCTGAAT CGGCATCAAC GAGTGCATCA GCATCAGCAT GCGTCAACCA GTGCATCAGT CTCAGCAAGC ACCAGTGCGT GCCTCAGCCT CAGCAAGTAT CTCAGCGTCT GAATCGGCAT GCAAGTACTA GTGCATCAGC ATCAGCATCA ACGAGTGCAT GCCTCAGCTT CAGCAAGCAC CAGTGCGTCA GCCTCAGCAA GCAAGCACCA GTGCCTCAGC TTCAGCAAGT ACCAGTGCGT GCGTCGGCTT CAGCAAGTAC CTCAGCGTCT GAATCAGCAT GCATCAACAA GTGCTTCAGC TTCAGCAAGT ACCAGTGCGT TCAGCTTCAG CATCAACCAG TCAACGAGTG CGTCAsCTCA CAGCCTCAGC AAGTATCTCA CAACGAGTGC ATCGGCT'rCA CGGCTTCAGC ATCAACCAGT CAACGAGTGC GTCAGcCTCA CGGCTTCAGC AAGTACCAGC GTACCAGCGC CTCAGCCTCA CAGCCTCAGC G'rCGACAAGT CAACGAGTGC ATCAGCTTCA CGGCT'rCAGC ATCA-ACGAGT CAACAAGTGC CTCGGCT'rCA CGGCTrCAGC ATCGACAAGT CAACGAGTGC GTCAGCCTCA CCGCTrCAGC GTCAACCAGT CAACGAG'rGC GTCGGCCTCA GTGCGTCCGC TTCAGCAAGC 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 GC'rTCAGTCT
GCAAGCACCA
GCGTCTGAA'r
GCAAGCACAT
GCGTCGGCTT
CAGCGTCAAC CAGTGCCTCT GAATCAGCAT GTGCGTCGGC TTCAGCAAGT ACTACTGCAT CGGCATCAAC GAGTGCTTCG GCTTCACCAT CAGCTTCTGA ATCTGCATCA ACCAGTGCGT CAGCGTCGAC AAGTGCTTCG GCTTCAGCA'r GCAAGCGCAA GTACCTCAGC GTCAGCTTCC GCCTCAACCA ACAAGTGCGT CAGCCTCAGC GCCTCAGCAA GCGCA.AGTAC GCAAGCACAA GTGCGTCAGC GCGTCTGAGT CAGCATCAAC GCATCAACCA GTGCGTCAGC GCT'rCAGCCT CAGCGTCGAC
AAGTATCTCA
CTCAGCGTCA
CTCAGCAAGT
GAGTACGTCA
CTCAGCATCG
AAGTGCGTCG
1378 GCGTCTGA.AT CGGCATCAAC GAGTGCGTCG GCTTCCGCCT CAACCAGTGC GTCGGCTTCA
ATCTCAGCGT
GCCTCAGCAA
ACAAGCGCCT
GCCTCAACCA
ACCAGTGCGT
GCTTCAGCAT
ACCAGTGCTT
GCTTCAGCA.A
ACCAGTGCGT
GCTTCAGCAT
ACCAGTGCGT
CAGCCTCAGC AAGTACTAGT GCATCAGCTT CAACCAGTGC CTCGGCTTCA GCGTCAACCA CAGTCTCAGC ATCAACAAGT GCTTCAGCCT GCACATCAGC ATCTGAATCA GCGTCGACAA CAGCCTCAGC GTCGACAAGT GCGTCAGCCT CAACGAGTGC ATCGGCTTCG GCGTCAACCA CAGCT'rCCGC ATCAACAAGT GCCTCGGCTT CTGA.ATCGGC ATCAACGAGT GCACATCAGC TTCTGAATCG CAGCTTCAGC AAGTACCAGT GTGCATCTGA ATCGGCATC.A CAGCATCAAC GAGTGCATCG GTGCGTCAGC TTCAGCAAGT CAGCATCGAC AAGTGCCTCG GCGCCTCAGC TTCAGCAAGT CAGCA.AGTAC TAGTGCATCA GTGCATCAGA GTCAGCAAGT CAGCAAGCAC CAGTGCGTCG 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4483 GCT'rCACCAA GTAC'rAGCGC CTCAGCCTCA GCCTCAACCA GTGCGTCAGC CTCAGCAAGT ATCTCAGCGT CTGAATCGGC ATCAACGAGT GCGTCCGCTT CAGCAAGTAC TAGCGCCTCA GCCTCAGCGT CAACAAGTGC ATCGGC'rTCA GCGTCA-ACGA GTGCGTCTGA ATCGGCATCA ACGAGTGCGT CCGCTTCAGC AAGTACTAGC GCCTCAGCCT CAGCGTCAAC AAGTGCATCG GCTTCAGCAT CAACGAGTGC GTCCGCTTCA GCAAGTACTA GCGCCTCAGC CTCAGCGTCA ACAAGTGCAT CGGGTTCAGC GTCAACGAGT GCGTCTGAGT CAGCATCAAC GAGTGCGTCA CCTCAkcAAG CACATCAGCT TCTGAATCTG CATCAACCAG TGCGTCACTT CCGCATCAAC AAGCGCCTCG GCCTCAGCAA GTACAAGTGC TTCAGCCTCA GCATCAACCA GTGCATCAGC TTCAGCCTCA ACAAGTGCTT CAGCCTCAGC GTCAGACCAG TGCCTCGGCT TCAGCAAGTA CCAGTGCGTC ACTTCAGCAA GCACAAGTGC GTCAGCTTCA GCATCAACCA GTGCTTCGC;-: TTCGGCATCA ACAAGTGCCT CAGCATCAGC ATCAACGAGT GCG INFORMATION FOR SEQ ID NO: 364: i) SEQUENCE CHARACTERISTICS: LENGTH: 2550 base pairs B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEO ID NO: 364: GTACCTCAGC GTCCTTCCGC CTCAACCAGT CCTCAGCAAG TATCTCAGCG TCTGAATCGG CAAGTACCTC AGCGTCACTT CCGCCTCAAC GTCASCTCAG CAAGTA'rCTC ACTCTGAA TCAACGAGTA CGTCAGCCTC AGCAACCACA
TCAGCCTCAG
TCGACAAGTG
TCAGCAAGTA
AGTGCCTCGG
TCAGCATCAA
CATCGACAAG CGCCTCAGCT CGTCGGCCTC AACCAGTGCA CTAGTGCATC AGCTTCAGCA C'rTCAGCGTC AACCAGTGCG CAAGTGCTTC AGCCTCAGCA 1379 GCGTCCGCTTr CAGCAAGCAC AAGTGCGTCA CATCAACGAG TGCGTCGCC TCAGCAAGCG CAGTGCGTCG GCTTCAGCAA GCACAAGTGC TCGGCATCAA CGACTGCGTC TGAGTCAGCA TCAGCTTCTG AATCGGCCATC AACCAGTGCG TCAGCAAGTA CCAGTGCTTC AGCCTCAGCG TCTGAATCGG CATCAACCAG TGCGTCAGCC TCAACGAGTG CATCGGCI'TC AGCATCAACC TCAGCTTCAG CAAGTACCAG TGCTTCAGTC TCGACAAGTG CCTCGGCTTC AGCAAGCACA TCGGCCTCAA CCAGTGCATC TGAATCGGCA AGTGCATCAG CTTCAGCATC AACGAGTGCA TCAGCATCTG AATCAGCGTC GACAAGTGCG TCAACCAGTG CGTCAGCCTC AGCAAGTACT 0 0 e 0. e
TCGGCTTCGG
TCAACAAGTG
TCGGCTTCAG
AGCACCTCAG
TCTGAATCGG
AGCACAAGCG
TCAGCCTCAG
TCAACGAGTG
TCAGCCTCAG
AGTACTAGCG
TC'DGAGTCAG
TCAACCAGTG
TCAGCCTCAG
CGTCAACCAG TGCATCAGAG TCAGCAAGTA CCAGTGCGTC AGC'rTCCGCA CCTCGGCTTC AGCAAGCACA TCAGCATCTO AATCAGCGTC AACCAGTGCT 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 CAAGTACCAG TGCTT~CAGCT TCAGCATCAA CCAGCGCCTC CTTCTGAATC GGCCTCAACC AGCGCCTCGG CCTCAGCAAG CCTCAACCAG CGCCTCAGCC TCAGCATCAA CGAGTGCTTC CCTCGGGTTC AGCATCAACG AGTACGTCAG CTTCAGCGTC CATCAACAAG TGCGTCAGCC TCAGCAAGTA 'rCTCAGCGTC CGTCTGAGTC AGCATCAACG AGTACGTCfAG CCTCAGCAAG CAAGTATCTC AGCGTCTGAA TCGGCATCAA CGAGTGCG'rC CCTCAGCATC AGCGTCAACA AGTGCTTCGG CTTCAGCGTC CATCAACGAG TACGTCAGCC TCAGCAAGCA CATCAGCTTC CGTCAGCCTC AGCATCGACA AGCGCCTCAG C'rTCAGCAAG CAAGTACCAG 'rGCT'rCAGCC TCAGCGTCGA CAAGTGCGTC
GGCCTCAGCA
CACCTCAGCT
GGCTTCAGCA
ALACCAGTGCT
TGAATCGGCA
CACAAGTGCT
CGCTTCAGCA
PLACGAGTGCG
TGPLATCTGCA
TACCAGTGCG
GGCCI'CAACC
CCATCAGCTT
GCAAGTACCA
AGTrGCATCTG AATCGGCATC AACCAGTGCG TCAGCTCAGC CAGCATCAAC GAGTGCATCG GCTTCGGCGT CAACCAGTGC
AAGTACTAGT
ATCAGACTCA
GTGCGTCACt TCCGCATCAA CAAGTGCCTC GGCTTCAGCA AGCACATCAG CATCTGAATC AGCGTCAACC AGTGCTTCGG CTTCAGCAAG TACCAGTGCT TCAGCT'rCAG CATCAACCAG
CGCCTCGGCC
AGCAAGCACC
TGCTTCGGC'r
AGCGTCAACC
AGCGTCTGAA
AGCAAGCACC
CGCCTCAGCT
AACCAGTGCA
GGCTTCAGCA
TACCAGTGCT
GGCTTCAGCA
TCAGCAAGCA
TCAGCTTCTG
'rCAGCAAGCA
AGTGCTTCAG
TCGGCATCAA
TCAGCTTCTG
TCAGCAAGTA
TCTGAATCGG
TCAACCAGTG
TCAGTCTCAG
AGCACATCAG
1380 CCTCAGCTTC TGAATCGGCC AATCGGCCTC AACCAGCGCC CAAGCGCCTC GGGTTCAGCA.
CCTCAGCATC AACAAGTGCG
CGAGTGCGTC
AATCGGCCTC
CCAGTGCTTC
TGAGTCAGCA
AACCAGTGCG
AGCCTCAGCG
TCAACCAGCG CC'TCGGCCTC TCAGCCTCAG CATCAACGAG TCAACGAGTA CGTCAGCTrC TCAGCCTCAG CAAGTATCTC TCAACGAGTA CGTCAGCCTC TCAGCCTCAG CATCGACAAG TCGACAAGTG CGTCGGCCTC TCAGCAAGTA CTAGTGCATC AGTGCGTCAG CTTCAGCAAG TCAGCATCGA CAAGTGCCTC AGCGCCTCAG CTTCAGCAAG ACAGCAAGTA CTAGTGCATC AGTGCATCAG AGTCAGCAAG 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2550 CATCAACCAG TGCGTCAGCC CCTCGGCTTC AGCGTCAACC TACCAGTGCG TCAGCCTCAG AGCTTCAGCA TCAACGAGTG TACCAGTGCG TCAGTTCACG
CATCAACAAG
CATCTGAATC
CGTCGACAAG
CATCGGCTTC
CATCAACAAG
TGCTTCAGCC
AGCGTCGACA
TGCGTCAGCT
GGCGTCAACC
INFORMATION FOR SEQ ID NO: 365: SEQUENCE CHARACTERISTICS: LENGTH: 1436 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 365: ACCCAGCAAG TACTAGTGCA CCAGTGCCTC AGCCTCAGCA CTCAGCAAGT ACTAGTGCAT CAGCGCCTCA GCTTCAGCAA TCGGCTTCAG CAAGCACCAG TGCGTCGGCT TCAGCATCAA AGTATCTCAG CGTCTGAATC GGCATCAACG AGTGCGTCAC
TCAGCAAGCA
AGTGCGTCGG
TCAGCATCAA
AGTGCGTCCG
TCAGCGTCAA
CCAGTGCCTC
CTTCAGCAAG
CAAGTGCTTC
CTTCAGCAAG
CGAGTGCGTC
CAGCATCAGC
GCACCAGTGC
AGCTTCAGCA
TACCTCAGCG
AGCTTCAGCA
TACTAGCGCC
TGAGTCAGCA
ATCAACGAGT
GTCAsCTCAG
AGTACCAGTG
TCTGAATCAG
AGTATCTCAG
TCAGCATCAG
TCAACGAGTA
GCATCGGCT'r CAGCAAGTAC CAAGTACCAG CGCCTCAGCC CGTCAGCCTC AGCGTCGACA CATCAACGAG TGCATCAGCT CGTCTGAATC GGCATCAACG CGTCAACAAG TGCTTCGGCT CGTCAGCCTC AGCAAGCACA CATCGACAAG CGCCTCAGCT TCAGCTTCTG AATCTGCATC AACCAGTGCG TCAGCCTCAG 1381 TCAGCAAGTA CCAGTGCGTC AgCCTCAGCA AGTACCAG'rG C'TTCAGCCTC AGCGTCGACA AGTGCGTCGG CCTCAACCAG TGCATCTGAA TCGGCATCAA CCAGTGCGTC AGCCTCAGCA AGTACI'AGCG CCTCAGCCTC AGCATCAACG AGTGCGTCCG CTTCAGCAAG TACTAGTGCA TCAGCTTCAG CAAGTACTAG CGCCTCAGCC TCAGCGTCGA CAAGCGCCTC A~CTrCAGCA AGTACCAGTG CGTCAGCCTC AGCGTCGACA AGTGCGTCGG CTrCAGCAAG 'rACCTCAGCG TCTGAATCAG CATCAACAAG TCAACAAGTG CTTCAGCTrC TCAGTCTCAG CGTCAACCAG AGCACCAGTG CTTCGGCTTC TGCGTCGGCT TCAGCATCAA CGAGTGCATC AGCAAGTACC AGTGCGTCGG CTTCAGCATC TGCCTCTGAA TCCGCATCAA CAAGTGCCTC AGCGTCAACG AGTGCGTCTG AGTCAGCATC
AGCTTCAGCA
AACGAGTGCT
GGCTTCAGCA
AACGAGTGCG
AGCTTCCGCA
AACCAGTGCA
GGCTTCAGCA
AACCAGTGCT
AGCCGG
TCAGCCTCAG
TCAACAAGCG
TCAGCTTCAG
AGTACCAGTG
TCGGCTTCGG
CAAGCACATC AGCTTCTGAA TCTGCATCAA CCAGTGCGTC CCTCGGCCTC AGCAAGTACA AGTGCTTCAG CCTCAGCATC CCTCAACAAG TGCTTCAGCC TCAGCGTCAA CCAGTGCCTC CGTCAGCTTC AGCAAGCACA AGTGCGTCAG CTTCAGCATC CATCAACAAG TGCCTCAGCA TCAGCATCAA CGAGTGCGTC INFORMATION FOR SEQ ID NO: 366: SEQUENCE CHARACTERISTICS: LENGTH: 735 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 366:
GCAGTTGCCA
CGTWTGGCAA
AGGTCTTCCC
AATAAAGAGG
CACCGTGCTG
GCCAAACTTG
GTITTGAGATA
AGTTTCACGT
AACTTGGGTC TGCCTGACTT ATGTTTCTGG GACAAAACGT GATATGCCAT GCTTTACCCT TCTGTCTCCA CTCCGCTCGA ACTACATTAT CTTCATTAAG ACCAGCACCC GTTCCTGCGA TCCTAAGGCA TTGTTAATCT A.ATCTTGCTC CGCCAATATO CCTACGTACT GGCGCAAAAG TCACGGTAGG CCTTCTCCAA CCGCCGAATT TTCCGTAAAA CTCTATAAAT CTTCTAATCT TACATCTACT GCATAGGGAG GCCACCTGCG ATAAAGAAGG TAATTT'rCTT TTTACCCATG TGTGGGCTCC TGTATGGTTA CTGGGTCAAG T'rTTTTGCGT CTGGTTTAAT TCCTCTTGGA
CTCCAAAACT
TCCATCTTTA
TTTCATGATC
GCTGTCATCA
TTTGGTTCCT
T'TTTTGTCCA
120 180 240 300 360 420 480 540 TAAAGTGTTG AATTGCTTTT GCTGTGCTAG TCCAGTCGTA 1382 TCCAGTTGAC CCCAATCAAA GGGCTGGCCA CT'rCCTGCCA CAGGGGCATC AAAGAGTAGA TAATCTGCCT GAGAA'rTGGG GACATGCCCA TTTCCATCTA CCTGCACAGC CTGAATAC'rG GCACAAGGCA AATTCTCAAA TAAATCATCT GCCACCTGAC CGTGAACTTG AACCAAGTCC AAGCCGGGGA TCCTC INFORMATION FOR SEQ ID NO: 367: SEQUENCE CHARACTERISTICS: LENGTH: 1702 base pairs TYPE: nucleic acid STR.ANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 367: TACTAGCGCC TCAGCCTCAG CGTCAACAAG TGCATCGGCT TCAGCATCAA CGCTTCAGCA AGTACTAGCG CCTCAGCCTC AGCGTCAACA AGTGCATCGG AACGAGTGCG TCTGAGTCAG CATCAACGAG TGCGTCAGCC TCAGCAAGCA TGAATCTGCA TCAACCAGTG CGTCAGCCTC AGCATCGACA AGCGCCTCAG TACCAGTGCG TCAGCCTCAG CGTCGACAAG TGCGTCGGCT TCAGCAAGTA AGCCTCAGCA AGTACCAGTG CGTCAGCCTC AGCGTCGACA AGTGCGTCGG
TGCATCTGAA TCGGCATCAA CCAGTGCGTC AGCCTCAGCA AGCATCAACG AGTGCATCGG CTTCAGCATC AACCtAGTGCA
AGTACTAGTG
TCAGAGTCAG
CGAGTGCGTC
CTT1CAGCGTC CATCAGC-rC
CTTCAGCAAG
CCAGTGCGTC
CCTCAACCAG
CATCAGCTTC
CAAGTACCAG
CCTCAGCCTC
CAAGTATCTC
CATCAGTCTC
CATCAACCAG
CATCGGCT2'C
CATCAACGAG
CTTCGGCTTC
CAAGCACATC
TGCGTCAgCT TCCGCATCAA AGCGTCAACA AGTGCTTCAG AGCGTCTGAA TCGGCATCAA AGCAAGCACC AGTGCGTCGG TGCCTCAGCT TCAGCAAGTA AGCAAGCACA AGTGC1'TCAG TGCGTCCGCT TCAGCAAGTA AGCGTCAACG AGTGCGTCTG AGCTTCTGAA TCTGCATCAA AGCAAGTACC AGTGCGTCAG TGCGTCGGCC TCAACCAGTG TACTAGCGCC TCAGCCTCAG
CAAGTGCCTC
CTTCCGCGTC
CA.AGTGCCTC
CCTCAGCAAG
CCTCAGCATC
CCTCAGCAAG
CTAGCGCCTC
AGTCAGCATC
CCAGTGCGTC
CCTCAGCAAG
GGCTTCAGCA AGTACTAGCG AACCAGCGCC TCGGCCTCAG GGCTTCAGCA TCAACGAGTG CACCAGCGCG TCTGAATCCG TGAATCAGCA TCAACAAGTG TATCTCAGCG TCTGAATCGG AGCATCAGCG TCAACAAGTG AACGAGTACG TCAGCCTCAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 AGCCTCAGCA TCGACAAGCG CCTCAGCTTC TACCAGTGCT TCAGCCTCAG CGTCGACAAG CATCTGAATC GGCATCAACC AGTGCGTCAG CCTCAGCAAG CATCAACGAG TGCGTCCGCT TCAGCAAGTA CTAGTGCATC
AGCATCAGCA
CACCAGTGCG
AGCTTCAGCA
TACCTCAGCG
AGCTTCAGCA
TCAACGAGTG CATCGGCTTC AGCAAGTACC AGCGCCTCAG TCAGCCTCAG CACTACCAG CGCCTCAGCC TCAGCAAGCA AGTACCAGTG CGTCAGCCTC AGCGTCGACA AGTGCGTCGG TCTGAATCAG CATCAACGAG TGCATCAGCT TCAGCATCAA AGTACCAGTG CGTCGGCTTC AGCATCAACG AGTGCTTCAG CTrCAGCAAG 1260 CCAGTGCCTC 1320 CTTCAGCAAG 1380 CAAGTGCTTC 1440 TCTCAGCGTC 1500 CCAGTGCGTC 1560 AATCGGCATC 1620 CATCAGCTTC 1680 1702 AACCAGTGCC TCTGAATCAG CATCAACAAG TGCCTCGGCT CGCTTCAGCA AGTACTAG'rG CATCGGCTTC AGCATCGACA AACGAGTGCT TCGGCTTCAG CATCAACGAG TGCGTCAGCC TGAATCTGCA TCAACCAGTG CG INFORMATION FOR SEQ ID NO: 368: SEQUENCE CHARACTERISTICS: LENGTH: 941 base pairs (B) TYPE: nucleic acid STRANDEDNESS: double (D) TOPOLOGY: linear
TCAGCAAGCA
AGTGCGTCTG
TCAGCAAGCA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 368: ACCAGTGCAT CAGCTTCAGC CTCAACAAGT GCTITCAGCCT CAGCGTCAAC CAGTGCCTCG GCTTCAGCAA GTACCAGTGC OTCACTTCAG CAAGCACAAG TGCGTCACT' CAGCATCAAC CAGTGCTTCG GCTTCGGCAT CA.ACAAGTGC CTCAGCATCA GCATCAACGA GTGCGTCACC TCAGCAAGTA CTAGTGCATC AGCATCAGCA TCAACCAGTG CATCAGCCTC AGCAAGTATC TCAGCGTCTG AATCGGCATC AACGAGTGCA TCAGCATCAG CATCA.ACGAG TGCATCGGCT TCAGCGTCAA CCAGTGCATC AGTCTCAGCA AGCACCAGTG CGTCGGCTTC AGCATCAACG AGTGCCTCAG CCTCAGCAAG TATCTCAGCG TCTGAATCGG CATCAACGAG TGCGTCAGCC TCAGCAAGTA CTAGTGCATC GGCTTCAGCA AGCACCAGTG CGTCGGCTTC AGCATCAACC AGTGCCTCAG CCTCAGCA.AG TATCTCAGCG TCTGAATCGG CATCAACGAG TGCGTCAGCC TCAGCAAGTA CTAGTGCATC AGCATCAGCA TCAACGAGTG CATCGGCTTC AGCAAGTACC AGCGCCTCAG CTTCAGCA.AG CACCAGTGCG TCAGCCTCAG CAAGTACCAG CGCCTCAGCC TCAGCAAGCA CCAGTGCCTC AGCTTCAGCA AGTACCAGTG CGTCAGCCTC AGCGTCGACA AGTCGTGGCTCAGCAAf TACCTCACG TCTGAApCAf CA 'CAACGAC TGCATCAGCT a. a a a *aa.
a. *a a TCAGCATCAA CAAGTGCTTC AGCTTCAGCA AGTACCAGTG cGTCGGCTTC AGCATCAACG 84 1384 AGTGCTTCAG TCTCAGCGTC AACCAGTGCC TCTGAATCAG CATCAACAAG TGCCTCGGCT TCAGCAAGCA CCAGTGCGTC GGCTTCAGCA AGTACTAGTG C INFORMATION FOR SEQ ID NO: 369: SEQUENCE CHARACTERISTICS: LENGTH: 869 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 369: CAGCAAGTAC TAGTGCATCA GCTTCAGCAT GTGCATCAGA GTCAGCAAGT ACCAGTGCGT CAGCAAGCAC CAGTGCGTCG GCTTCAGCAA GTGCGTCAGC CTCAGCAAGT ATCTCAGCGT CAGCAAGTAC TAGCGCCTCA GCCTCAGCGT GTGCGTCTGA ATCGGCATCA ACGAGTGCGT
CAACGAGTGC
CAGCTTCCGC
GTACTAGCGC
CTGAATCGGC
ATCGGCrTCT GCGTCAACCA ATCAACAAGT GCCTCGGCTT CTCAGCCTCA GCCTCAACCA ATCAACGAGT GCGTCCGCTT CAACAAGTGC ATCGGCTTCA GCGTCAACGA CCGC'N'CAGC AAGTACTAGC GCCTCAGCCT CAGCGTCAAC AAGTGCATCG GCTTCAGCAT CAACGAGTGC GTCCGCTTCA GCAAGTACTA GCGCCTCAGC CTCAGCGTCA ACAAGTGCAT CGGCTTCAGC GTCAACGAGT GCGTCTGAGT CAGCATCAAC GAGTGCGTCA GCCTCAGCAA GCACATCAGC TrCTGAATCT GCATCAACCA GTGCGTCAGC CTCAGCATCG ACAAGCGCCT CAGCT'rCAGC AAGTACCAGT GCGTCAGCCT CAGCGTCGAC AAGTGCGTCG GCTTCAGCAA GTACCAGTGC GTCAGCCTCA GCAAGTACCA GTGCGTCAGC CTCAGCGTCG ACAAGTGCGT CGGCCTCAAC CAGTGCATCT GAATCGGCAT CAACCAGTGC GTCAGCCTCA GCAAGTACTA GTGCATCAGC CGGCTTCAGC ATCAACCAGT GCATCAGAGT CAGCAAGTAC GCAACAAGTG CCTCGGCTTC AGCAAGTAC INFORMATION FOR SEQ ID NO: 370: SEQUENCE CHARACTERISTICS: LENGTH: 750 base pairs TYPE: nucleic acid STRANDEDNESS: double (0D) TOPOLOGY: linear TTCAGCATCA ACGAGTGCAT CAGTGCGTCA GnTTCCGCAT 869 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 370: TCAACAAGTG CCTCAGCATC AGCATCAACG AGTGCGTCAG CCTCAGCAAG TACTAGTGCA 1385 TCAGCATCAG CATCAACCAG TGCATCAGCC TCAGCAAGTA TCTCAGCGTC TGAATCGGCA TCAACGAGTG CATCAGCATC AGCATCAACG AGTGCATCGG CTTCAGCGTC AACCAGTGCA TCAGTCTCAG CAAGCACCAG TGCGTCGGCT TCAGCATCAA CGAGrG.CCTC AGCCTCAGCA AGTATCTCAG CGTCTGAAT C GGCATCAACG AGTGCGTCAG CCTC.AGCAAG TACTAGTGCA TCGGCTTCAG CAAGCACCAG TGCGTCGGCT TCAGCATCAA CCAGTGCCTC AGCCTC.AGCA AGTATCTCAG CGTCTGAATC GGCATCAACG AGTGCGTCAG CCTCAGCAAG TACTAGTGCA TCAGCATCAG CATCAACGAG TGCATCGGCT TCAGCAAGTA CCAGCGCCTC AGCP1TCAGCA AGCACCAGTG CGTCAGCCTC AGCAAGTACC AGCGCCTCAG CCTCAGCAAG CACCAGTGCC TCAGCTTCAG CAAGTACCAG TGCGTCAGCC TCAGCGTCGA CAAGTGCGTC GGCTTCAGCA AGTACCTCAG CGTCTGA.ATC AGCATCAACG AGTGCATCAG CTTCAGCATC AACAAGTGCT TCAGCT'rCAG CAAGTATCTC AGCGTCTGA.A TCGGCATCAA CGAGTGCGTC CGCTT.CAGCA AGTACTAGCG CCTCAGCATC AGCGTCAACG INFORMATION FOR SEQ ID NO: 371: SEQUENCE CHARACTERISTICS: LENGTH: 957 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 371: CCGGAAAACA GCTCTGGCGC TTGGTCTTGC CCAGCGTATT GCTAGTGGTG ACGTGCCTGC 
GGAAATGGCT AAGATGCGCG TGTTAGAACT TGATTTGATG AATGTCGTTG CAGGGACACG CTTCCGTGGT GACTTTGAAG AACGCATGAA TAATATCATC AAGGATATTG AAGAAGATGG CCAAGTCATC CTCTTTATCG ATGAACTCCA CACCATCATG GGTTCTGGTA GCGGGATTGA TTCGACTC TG GATGCGGCCA ATATCTTGAA ACCAGCCTTG GCGCGTGGAA CTTTGAGAAC GGTTGGTGCC ACTACTCAGG AAGAATATCA AAAACATATC GAAAAAGATG CGGCACTTTC TICGTCGTfTTC GCTAAAGTGA CGATTGAAGA ACCAAGTGTG GCAGATAGTA TGACTATrT ACAAGGTTTG AAGGCGACTT ATGAGAAACA TCACCGTGTA CAXATCACAG ATGAAGCGGT TGAAACAGCG GTTAAGATGG CTCATCGTTA TT'PAACCAGT CGTCACTTGC CAGACTCTGC TATCGATCTC TTGGATGAGG CGGCAGCAAC AGTGCAAAAT AAGGCAAAGC ATGTAA.AAGC AGACGATTCA GATTTGAGTC CAGCTGACAA GGCCCTGATG GATGGCAAGT GGAAACAGGC 1386 AGCCCAGCTA ATCGCAAAAG AAGAGGAAGT ACCTGTCTAC AAAGACTrGG TGACAGAGTC 720 TGATATTTTG ACCACCTTGA GTCGCN'GTC AGGAATCCCA GTTCAAAAAC TGACTCAAAC 780 GGATGCTAAG AAGTATTTAA ATCTTGAAGC AGAACTCCAT AAACGGGTTA TCGGTCAAGA 840 TCAAGCTGTT TCAAGCATTA GCCGTGCCAT TCGCCGCAAC CAGTCAGGGA TTCGCAGTCA 900 TAAGCGTCCG ATTGGTTCCT TTATGT'rCCT AGGGCCTACA GGTGTCGGGG TATCCGA 957 INFORMATION FOR SEQ ID NO: 372: SEQUENCE CHARACTERISTICS: LENGTH: 807 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 372: CAAAGCGCCT CAGCTTCAGC ATCAACAAGT GCGTCGGCTT CAGCATCAAC CAGTGCCTCG GCTTCAGCGT CAACCAGTGC GTCACATTCA GCAAGTACCA GTGCTTCAGT CTCAGCATCA 120 *ACAAGTGCTT CAGCCTCAGC ATCGACAAGT GCCTCGGCTT CAGCAAGCAC ATCAGCATCT 180 *GAATCAGCGT CAACCAGTGC TTCGGCTTCA GCAAGTACCA GTGCTTCAGC TTCAGCATCA 240 *ACCAGCGCCT CGGCCTCAGC AAGCACCTCA GCTTCTGAAT CGGCCTCAAC CAGCGCCTCG 300 .GCCTCAGCAA GCACCTCAGC TTCTGAATCG GCCTCAACCA GCGCCTCAGC CTCAGCATCA 360 ***ACGAGTGCTT CGGCTrCAGC AAGCACAAGC GCCTCGGGTT CAGCATCAAC GAGTACGTCA 420 GCTTCAGCGT CAACCAGTGC TTCAGCCTCA GCATCAACAA GTGCGTCAGC CTCAGCAAGT 480 ATCTCAGCGT CTGAATCGGC ATCAACGAGT GCGTCTGAGT CAGCATCAAC GAGTACGTCA 540 GCCTCAGCAA GCACCTCAGC TTCTGAATCG GCCTCAACCA GTGCGTCAGC CTCAGCATCG 600 ACAAGCGCCT CAGCTTCAGC AAGTACCAGT GCTTCAGCCT CAGCGTCGAC AAGTGCGTCG 660 GCCTCAACCA GTGCATCTGA 
ATCGGCATCA ACCAGTGCGT CAGCCTCAGC AAGTACTAGT 720 *GCATCGGCTT CAGCATCAAC CAGTGCCTCG GCTTCAGCGT CAACCAGTGC GTCAGCTTCA 780 GCAAGTACCA TGTGCTI'CAT GTCTCAG 807 INFORM4ATION FOR SEQ ID NO: 373: SEQUENCE CHARACTERISTICS: LENGTH: 1068 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1387 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 373: CATCGGCTTC AGCATCAACG CGTCAAGAAG TGCATCGGCT CGTCACCTCA GCAAGCACAT TCGACA.AGCG CCTCAGCTTC CGGCTTCAGC AAGTACCAGT ACA.AGTGCGT CGGCCTCAAC CAAGTACTAG TGCATCAGCT CATCAGAGTC AGCAAGTACC CAAGTACTAG CGCCTCAGCC CCTCGGCCTC AGCAAGTATC CATCAACGAG TGCATCAGTC AGTGCGTCCG CPCAGCAAG TACTACCGCC TCAGCCTCAG TCAGCGTCAA CGAGTGCGTC TGAGTCAGCA TCAACGAGTG CAGCTrCTGA ATCTGCATCA. ACCAGTGCGT CACCTCAGCA AGCAAGTACC AGTGCGTCAC GCGTCAsCTC AGCAAGTACC CAGTGCATCT GAATCGGCAT TCAGCATCA.A CGAGTGCATC AGTGCGTCAG cTTCCGCATC TCAGCGTCAA CAAGTGCTTC TCAGCGTCTG AATCGGCATC TCAGCAAGCA CCAGTGCGTC CTCAGCGTCG ACAAGTGCGT AGTGCGTCAC CTCAGCGTCG CAACCAGTGC GTCACCTCAG GGCTTCAGCA TCAACCAGTG AACAAGTGCC TCGGCTI'CAG AGCTTCCGCG TCA.ACCAGCG AACAAGTGCC TCGGCTTCAG GGCCTCAGCA AGCACCAGCG TACCTCAGCA TCTGAATCAG AGCCTCAGCA AGTATCTCAG TACTAGCGCC TCAGCATCAG TGAGTCAGCA TCAACGAGTA AACC.AGTGCG TCAGCCTCAG AGCCTCAGCA AGTACCAGTG
TGCATCTG
CGTCTGAATC CGCATCAACC AGTGCCTCAG CTTCAGCAAG CATCAACAAG TGCATCGGCT TCAGCAAGCA CAAGTGCTTC CGTCTGAATC GGCATCAACG AGTGCGTCCG CTTCAGCAAG CGTCAACAAG TGCTTCGGCT TCAGCGTCAA CGAGTGCGTC CGTCAGCCTC AGCAAGCACA TCAGCTTCTG AATCTGCATC CATCGACAAG CGCCTCAGCT TCAGCAAGTA CCAGTGCGTC CTTCAGCCTC AGCGTCGACA AGTGCGTCGG GCTCAACCAG INFORMATION FOR SEQ ID NO: 374: SEQUENCE CHARACTERISTICS: LENGTH: 620 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1068 120 180 240 ;6 S 5.55 5* 55
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 374: CAGCATCAAC GAGTGCTTCA GTTTCAGCGT CAACCAGTGC CTCTGAATCA GTGCCTCGGC TTCAGCAAGC CCC.AGTGCGT CGGCTTCAGC AAGTACTAGT CAGCATCGAC AAGTGCGTCT GAATCGGCAT CAACGAGTGC TTCGGCT'rCA.
GTGCGTCAGC CTCAGCAAGC ACATCAGC'TT CTGAATCTGC ATCAACCAGT
GCTTCAACPA
GCATCGGCTT
GCATCAACGA
GCGTCCGyTT CAGCGTCAAC CAGTGCGTCG GCTTCAGCGT GTGCGTCGGC CTCAGCAAGC GCAAGTACCT CGGCTTCAGC AAGCACAAGT GCGTCAGCCT CAACGAGTGC GTCTGAGTCA GCATCAACGA CTGAATCTGC ATCAACCAGT GCGTCAGCCT GTACCAGTGC TTCAGCCTCA GCGTCGACAA CGGCATCAAC CAGTGCG'rCA INFORMATION FOR SEQ ID NO: 37 SEQUENCE CHARACTERISTICS LENGTH: 720 base pa TYPE: nucleic acid STRANDEDNESS: doubi TOPOLOGY: linear 1388 CGACAAGTGC TTCGGCTTCA GCATCAACGA CAGCGTCAGC TTCCGCCTCA ACCAGTGCGT CAGCAAGTAT CTCAGCGTCT GAATCGGCAT GTACGTCAGC CTCAGCAAGC ACATCAGCTT CAGCATCGAC AAGCGCCTCA GC?1'CAGCAA GTGCGTCGGC CTCAACCAGT GCATCTGAAT (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 375: GTATTGGGGC GCCCCAACCT CTATGTGACT ACGGATTATT TCCTAGATTA CATGgGGATA AACCATTTAG AAGAATTACC AGTGAT'IGAT GAGCTTGAGA TTCAAGCCCA AGAAAGCCAA TTATTTGGTG AAAGGATAGA GGCCAGTAGG AGAAAAGCAG AGTGGTGCGT GAACTAGCAA ACCTATCTAC AACGAAGAAA CAGTGTGACA GATGATAAGG GCGTATTTAC CCTGTGGGTC TGATGGGGAC TTTACAGACG CGCGCGTGTT AAAGGTGTGG AGAAGATGAG AATCAATAAG AAGAGCTGAT TAAGCAAGGC CCACTATCAA GTCAGGCGAC AGGTCTACTA TCTGCTTAAC GTCGCAAGAC GGTTGTCGAC GTTTGGACTG GGATACATCA AGATGATTCA CCCTCGTAAT CCAATAAGGA CAATCTCCGC TATAT'rGCCC ACGCAGGTGT TTGGTGACGG TTAACGGCCA AAGGTCGAAG TTGAAGGTCA AAACCACGCG GTGTGATTTC CTCTTGCCCA ATGTCAAAGA GGTGTCTTGA. TTTTGACCAA GAGATTGACA AGGTTTATGT CCCTTGACCC GTGGTCTTGA.
GTAGCCTCTA CACCATAAAT GCGAAATTAT TCAAGTTCTT GATTGATGGT AAGAAAACCA AGCCATAATA TATAGGTTTTr ATTTGCTAAT AAAAATACTG TAT'rATTACC CTCTTAAGGT INFORMATION FOR SEQ ID NO: 376: SEQUENCE CHARACTERISTICS: LENGTH: 648 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear 1389 (xi) SEQUENCE DESCRIPTION. SEQ ID NO: 376: -CGCCATTTCC CATCGTACCG CCGAAAATCC CAGCGCCTCA
CGTTCTCA.AA
CGATGTCAAG
TACCTACACC
CTTTGGAAAT
AGTCGCCAAT
CAATGATAAA
TATTATCCTC
AGATAAGAAA
TTCATTTATA
AAATCGATT
AAAAGTGACC
GGGGGCGCAC
GGTCCCATGA
CCAATCGATA
CGTATTCAAA
AAACCAAATC
GCTATCCTAA
AGAGAAGAAC
TAGTAGATG
GACTGTCCTG
GCTCTCTCAT
ACTskATTGC QI rLGAAGGG;
TCTCAGATAT
CAGAATTCCA
CACTCTGGTG
CCATCATCTT
TTGCATAGAA
CATGTTTCCA
CAAAATGGCC
CTTGATTAGC
CAAGAAAATG
ACGTCTGGAC
GTTTATCCGC
TAGCTTTATC
GAAATGAACC
GCCA'ICAAAT ATCCTATCAA AGTGGTAGCC GCCACTCAA.A AAGGTCCGTA TCATGCCGGT CGTGAACGTG TCGATATGAA- AATGATGAAG GCATTGAAAC GAAGAAACGA AACAATGGCA ATCCCTGCCC TCATCCTTGC GCAAGCTTCA TCTGGAACCC TTGGCCAAAC AGCTAAGGTT GwACTAGAAT AGTACACCTC TACTTCTAAA ACATTTTTAG ATCGATTTGT CCTAATCTTA T'rTCAATT INFORMATION FOR SEQ ID NO: 377: SEQUENCE CHARACTERISTICS: LENGTH: 690 base pairs TYPE: nucleic acid STRANDEDNESS: double (D TOPOLOGY: linear Cxi) SEQUENCE DESCRIPTION: SEQ ID NO: 377: GTGCATCGCT TTCAGCATCG ACAAGTGCGT CTGAATCGGC ATCAACGAGT GCTTCGGCTT CAGCATCAAC GAGTGCGTCA GCTTCAGCA.A GCACATCAGC TTCTGAATCT GCATCAACCA GTGCGTCCGC TTCAGCGTCA CAGCATCAAC GAGTGCGTCG CAACCAGTGC GTCCGCTTCA CTGAATCGGC ATCAACGAGT CCGCCTCAAC CAGTGCGTCG CAGCGTCTGA ATCGGCATCA CAGCAAGCAC ATCAGCTTCT ACCAGTGCGT CGGCTTCAGC GTCGACAAGT GCTTCGGCTT GCCTCAGCAA GCGCA.AGTAC CTCAGCGTCA GCTTCCGCCT GCAAGCACAA GTGCGTCAGC CTCAGCAAGT ATCTCAGCGT GCGTCGGCCT CAGCAAGCGC A.AGTACCTCA GCGTCAGCTT GCTTCAGCAA GCACAAGTGC., GTCAGCCC-A GCAAGTATCT ACGAGTGCGT CTGAGTCAGC ATCAACGAGT ACGTCAGCCT GAATCGGCAT CAACCAGTGC GTCAGCCTCA GCATCGACAA GCGCCTCAGC TTCAGCAAGT ACCAGTGCTT CAGCCTCAGC GTCGACAAGT GCGTCGGCCT CAACCAGTGC ATCTGAATCG GCATCAACCA GTGCGTCAGC CTCAGCAAGT ACTAGTGCAT CAGCTTCAGC ATCAACGAGT GCATCGGCTT INFORMATION FOR SEQ ID NO: 378: SEQUENCE CHARACTERISTICS: LENGTH: 1003 base pairs TYPE:*nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 378: CGAGATTCTC TGGAGTTATG GATGTCGTTC CAATATGTGC
ATATGGGGGG
CTTTTTGTCA
TTAGT'TTATT
CTGAGAGAAG
GAGCTTGTAG
AACAGAATCC TCTCTTGATT GAAGACAAGC ACTGTAGTGG GTTGATATAA TAGTATTAGT CAGTACAAAT TTAACGGGTC AAGATr'rATA TATCTTGATT TTATGTGTGG TTTTTATACT AGAAGAATTA GCTCATAAAG GAGATTGATT ACGTTrGGAAT GTTAGTGCTT TAGTCATTAG GCTGGTTTGT GAGTGGGATA AAAGTTTCAT TACTAGTGGT GTTTTTGGGG TACAGTTGTT CTGCTCCAAA ATTTTGATAT CAAAAAAATG CTACTTCCTr CTGTCGAATT AACCATCCAG AATCCTTGCC GTGTTTTTTA CTAGACAAAA AAAA.AGAAAG GAAACTCACA TTACCAACTA TCTTTCGATG CACAGGATAA CCTGATGCAT TAGACAATTT TAAACCCCAA TTAGCTTAGA TCCTGGATGG AAGAGTTTCC CCTTTATGGT TGAACAGTTT ACCAAATCAT GAGGTCATTT AACCCAGTAT AACTAAAAGA GCGGATTTCT ATTCGGATTC AGATATCCTT ACTATGCTTG TAAAGAATTG AGCTTGCTTC ACAGCCAACC ATAGTTTGCG ATGCCTCAAC
TTTTTTAGCG
TTATTCACCC
TTTCTTTTTT
ATAAGTGTAG
CACTTCCAAA
GGTGGTCTTA
AAGTATTTAG
ACAATGCTTG
CAAATCTAAA
CACCCAATGG
AAAAAAACAC
ACAAGTCTTT
TCTTTTTTCA GGAACTT'TTT TAACGAATGA CCAACGCCGC
TCCCAGTTGA
TACTGTCGTT
GTCCAGTTCC TCTTTCAACT GTTAACAGGT TATGGAACGG TCAGCTGATG CCTACTTTCC AAAATTATTG GAAGGAGGGC TTATCCCGTT TTCTTTCCAG AACTGACGAG GAAACAGTCC CTTGAATTGG TCGAATTCTT TTT INFORMATION FOR SEQ ID NO: 379: SEQUENCE CHARACTERISTICS: LENGTH: 738 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 379: 1391 TTGCTGGGAA TTTGAGGTA GATCTATGAT CCGATGATTC
TGAAATACTA
ACAACCCCGT
CTACTGGCAG
TATTCTACTA
CCTTTTGTGG
TAGGAAGCTA
GACTCAAAAC
TGAGGTI'GTG
TGATTGGTTT GCTCTTTACT ATTGTTTTAG CTATTATCCT CAAAATCAAC TATTTTCCAT AGCAACACCT TGGTCAAGGT TTAACCTrTA TGGTGATTAC ATTGGTCTGA AGTTrTCTTT GCCGCAckGC TCAAAACACC ACC GT'rGA GGTTGTAGAT GATAGAACTrG ACGAGcGACT ATCTCTTGCT TTGATTGTAT GGATGCCACT AGTA.ATATTG GCTCAC?1'TA N'GGTGAGTr TTATAAA'rAA AAGAAAACTT TTTATACTCA ATr.AAAATCA GTTr'TGAGGT TGTAGATATA ATAACTGACG AGcGACTCAA CAAAACACCG TTTTGAGGTT
TGGTAACTAT
GTAAACCAAG
TGGCTr-rATT
CAGATATTCA
AAGAGCAAAC
ACTGACGAGc
AACACCGTTT
GTGGATAGAA
CTGACGAAGT CGcTCAAAAC ACCGTTI'TGA GGTTGTGGAT AGAACTGACG AAtgctCAAA ACACCGTTTr GAGGTTGTGG ATAGAACTGA CGAAGCgaaC ATATATACAG CAACGGCGACG C'rGACGTGGT T'rGAAGAGTA TTACTGTCTA TAT'TTrGGT ATGAAGGTTT TTTTTTTT INFORMATION FOR SEQ I0 NO: 380: SEQUENCE CHARACTERISTICS: LENGTH: 695 base pairs TYPE: nucleic acid STRANDEDNESS: double (0)-TOPOLOGY: linear AAAAATCAAC TTTTACTTGG (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 380: CCGTCTTATC AAAGAGGTTA ACAAAGGCAC CAAATTTCTC GATACGAACG ACTTTAGCAC GGTAAACTTC ATCCACTTTG GCTTCACGAA CCAAACCAGC AATA.ATTTCT TTGGCACGGT TAATAGCATC TTGGTCACTA GAGTAGATAG ACACAVPTCC TTCT'FCGTCT ATATCAATCT TAACACCTGT TTCAGCGATA ATCTTGTCGA TGGTTTCTCC ACCCTTACCG ATGACAATCT TAATCTTGTC CACATCAATC TTGATCGTAT CAATTNTCGG AGCAGTTGGA GCCAATTCTG GACGAACTEC TGGAATGGTT GCTTCAATGA CATCAAGGAT TTCAAAACGC GCTTTCTTGG CTTGAGCAAG AGCCTCCGTC AAGATTTCTG, CAGTAATCCC TT-GAATCTTG ATATCCATTT GAAGGGCTGT AATCCCATCA CGAGTACCTG CAACCTTGAA GTCCATATCT CCAAAGTGAT CTTCCAAACC TTGGATATCT GTCAATACTG TGTAGTTATT TCCATCTGAG ATAAGCCCCA TAGCAATACC AGCTAC!TGGC GCCTTGATTG GCACACCACC AGCCATAAGG GCAAGAGTTC 1392 CCGCACAGAT AGAAGCTTGA GATGAAGAAC CGTTTGATTC CAAAACTTCT GCTACTAGAC 660 GGATAGCGTA GGGGAATITCT TCCAAGCTTG GCAGG 695 INFORMATION FOR SEQ ID NO: 381: SEQUENCE CHARACTERISTICS: LENGTH: 691 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 381: GACATCTTAT CTAAATACAT GCTAATATAT TTAGATACAA ACATTCCAAC TTGATAATTT TCACTCATCT TTCATCATTC CTTATACAAC TATGCAGTAT AAATAGAATA GTTTTCTCAT 120 CAGAATGAGA CTATTTTAAT ATTAGATCCC CAATTATTCA CCCCAAATCT A.AAAACCATC 180 CAGAATCCTT GCCTTAGCTT AGATCCTGGA TGGTTrCTTT T'rFCACCCAA TGGGTGTTTT 240 *-T'rACTAGACA AAAAAGAGPT TCCCCTTTAT GGTATAAGTG TAGAAAAAAA CACAAAAAGA 300 *AAGGAAACTC ACATGAACAG TTTACCAAAT .CATCACTTCC AAAACAAGTC TTTTTACCAA 360 'CTATCTTTCG ATGGAGGTCA TTTAACCCAG TATGGTGGTC TTATCTTTTT TCAGGAACTT 420 *TTTTCCCAGT TGAAACTAAA AGAGCGGATT TCTAAGTATT TAGTAACGAA TGACCAACGC 480 *CGCTACTGTC GTTATTCGGA T'rCAGATATC CTTGTCCAGT TCCTCTTTCA 
ACTGTTAACA 540 GGTTATGGAA CGGACTATGC TTGTAAAGAA TTGTCAGCTG ATGCCTACTT TCCAAAATTG 600 TTGGAAGGAG GGCAGCTTGc TTCACAGCCA ACCTTATCCC Gw'rTTCTTTC CAGAACTGAC 660 .GAGGAAACAG TCCATAGTTT GCGATGCCTC A69 INFORMATION FOR SEQ ID NO: 382: SEQUENCE CHARACTERISTICS: LENGTH: 750 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 382: ATCTCTCTGC GTAATGGTCC TCAGATAACT CTGATGATGT GTGGCGATAT AGAACTGAGC CAAGTTATGC CTAAAGGGCC TTAGGAATAG GAGCTTTCAC AAGCTTATCC AGATGATTAT 120 CTTTTACTCG TTATGGACAA TGCTATATGG CATAAATCAA GTACCTTAAA GATTCCGACT 180 AATATTGGCT TTGCATTTAT TCCTCCATAC ACACCAGAGA TGAACCCCAT TGAACAAGTG 240 1393 TGGAAAGAGA TTCGTAAACG TGGATrTAAG AATAAAGCCT ATACAAGGAC TGGAGAAGGA GGTGATAAAG TCCATCGTTA CTTTTTGAAA ACAGATGAGT ATAAAAAGAA AGTCCTCATT CTGATGAAT'r TATAGTAAAA TGAAATAAGA ACAGGATAGT TTTTAGAAGC AGAGGTGTAC TATTCTAGTT TAAATCCACT TTCGAACTTT GGAAGATGTC ATCGGAGACG GACTAGAATG TCAATAGAAA TCACGACTTT CAAATCGATT TCTAACAATG ATATTTGGGG AGTGATAGAA AAGCCCTTCA TCAGCCAATC TACTTGTTCA GGTGCGAGAG CTTTGACATC CTT'rTCTGTA CTGGACCAAG TCAGTrTTCC GT'rCTCAAAG CGTTTATATA ATATCCAAAA TCCTTGACCA TCCCAGTAAA GAACT'N'AAA GCGGTCTTTA CGTCCACCAC AAAAGAGAAA GACTTGATCG GAGAAAGGAT CCAA'PTCAAA GTGGGTTTGG INFORMATION FOR SEQ ID NO: 383: SEQUENCE CHARACTERISTICS: LENGTH: 738 base pairs TYPE: nucleic acid STRANDEONESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 383: TCAAATTCTT CGTGGTCCGC ATATCTnTCT TCGTACACGG CAGTCACTTG GTCTTTCACT ACTCGAGTCG CAGCTTCACG GGCCAATTTC TCTTCTACTT GAACTGCC'T TTGGAGGTCA CTGTTGTAGG CTGCAATGAT TTCAGCTTGC AATTCAGCAT CCACGTGAAG CAATTCCACT TCTGCTTN'T CTTTACCGAC AGCAGCAACG ATTTCTTCTT GGAAGGCAAT CAATTCTTTG ACAGCTTCGT GCCCTTTA.AG GAGCGCTTCC AACATGATT'r CTTCTGACAA TTCTTTGGCA CCAGACTCTA CCATGTTGAT AGCGTGCTTG GTTCCAGCTA CTGTCAATTC AAGAAGAGAT TGCTCTGCTT GTTCTTGACT TGGGTTGATG ATGATTTGGC CATCTACATA TCCCACTTGT ACCCCAGCAA TTGGTCCGTC AAATGGAATA TCTGAAATAG ACAGTGCCAA AGATGAACCA
AACATAGCAG
TGGACTTCAT
GCTGTCAAGG
TTCCCAGCCG
GCCATTTTCT
CCATTGGTGC
TACGGAAACC
TCGCATCTGT
CATACATTTT
GGGGATCC
AGATGCATTT TCATCATAAG AA.AGCACTGT AT'rGATGACT TTCCGCAAAC ATAGGACGAA. TCGGACGGTC AATCAAACGC TGAAGGACGT CCTTCACGTT TCATAAAGCC ACCAGGAAAC TTCTTCGTAG TTGACTTGGA GTGGGAAGAA ATCCTCAGTT 738 INFORMATION FOR SEQ ID NO: 384: SEQUENCE CHARACTERISTICS: LENGTH: 657 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184- CCCCCTATTT ACCGTGGACT AAAGTTGTAC AAGAAAAGTG CAAATAAGAA ATCTCCAGAT TAGGAACTAT ATATGAGTTC TCTAGTCTGG AGATTTrCA ATAGACTTCG TTATTGGGCG 120 GTTACTTTCG AAACTTTGAA AACTTCAAAA AACGGATTTT TATCGCTN'C AAATTCTTTT 180 GGGGTCAAAC TCAGTAACTT ATTCGCCTTG TAGACTrCAT GACGCTCAGG GTATACTTTC 240 AAGGTCCCAA ATAGCCAAGA ATCGTCAGCG ATATTATCTG AATCATCTCC T'rCTTGTTCT 300 CCTTTAGTTC GCCTGAGGAC AGCCTTGACA CGCGCCAGAA TTCTCTAGGG CTAAAAGGCT 360 TGGTCAGGTA GTCATCAGCC CCTAATTCCA AGGCCAAAAC CTTATCAAAT TCATCACTrT 420 TCGCAGAAAC CATCATAATT GGAGTTTTGA CGCCTTTGGC TCTCAGCCGC TTACAAACTT 480 *..CCATGCCATC TAATTGTGGT AACATGATAT CAAGCAAGAT AAAATCAAAG GGTTCTGTTT 540 *..CTGCCAAAGC TAAGGCCTTC CGTCCATTTG TCACCAATTG AGTAGAAAAG CCTTCCTTAC 600 ***TTAAATGGTA GTCAAGCAAT TTCAGAATGT GTTCTTCATC ATCCACTAAT AAGACTT 657 INFORMATION FOR SEQ ID NO: 385: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 586 base pairs TYPE: nucleic acid STRANDEDNESS: double (D D) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 385: *CCGCATCAGC ATCAACGAGT GCATCGGCI' CACGTCAACC AGTGCATCAG TCTCAGCAkbG CACCAGTGCG TCGGCTTCAG CATCAACGAG TGCCTCAGCC TCAGCAAGTA TCTCAGCGTC 120 TGAATCGGCA TCAACGAGTG CGTCAGCTCA GCAAGTACTA GTGCATCGGC TTCAGCAAGC 180 ACCAGTGCGT CGGCTTCAGC ATCAACCAGT GCCTCAGCCT CAGCAAGTAT CTCAGCGTCT 240 *GAATCGGCAT CAACGAGTGC GTCACCTCAG CAAGTACTAG TGCATCAGCA TCAGCATCAA 300 CGAGTGCATC GGCTTCAGCA AGTACCAGCG CCTCAGCTTC AGCAAGCACC AGTGCGTCAC 360 CTCAGCAAGT ACCAGCGCCT CAGCCTCAGC AAGCACCAGT GCCTCAGCTT CAGCAAGTAC 420 CAGTGCGTCA CCTCAGCATC GACAAGTGCG TCGGCTTCAG CAAGTACCTC AGCGTCTGAA 480 a a. 
a a a a a 1395 TCAGCATCAA CGAGTGCGTC AGCTflCAGCA TCAACCAGTG CCTCAGCCTC AGCAAGTATC AGTGCGTCAG CTTCAGCATC AACGAGTGCG TCAGCTGCAG CAAGTA INFORMATION FOR SEQ ID NO: 386: SEQUENCE CHAR~ACTERISTICS: LENGTH: 451 base pairs TYPE: nucleic acid STRANflEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 386: CGTCGGCTTC AGCATCAACG AGTGCATCAG CTrCAGCATC AACAAGTGCT TCAGCTTCAG CAAGTACCAG TGCGTCGGCT TCAGCATCAA CGAGTGCTTC AGTCTCAGCG TCAACCAGTG CCTCTGAATC CGCATCAACA AGTGCCTCGG CTTCAGCAAG CACCAGTGCT TCGGCTTCAG CGTCAACGAG TGCGTCTGAG TCAGCATCAA CGAGTGCGTC ACCTCAGCAA GCACATCAGC TTCTGAATCT GCATCAACCA GTGCGTCAGC TTCCGCATCA ACAAGCCCCT CGGCCTCAGC AAGTACAAGT GCTTCAGCC1' CAGCATCAAC CAGTGCATCA GCTTCAGCCT CAACAAGTGC TTCAGCCTCA GCGTCAACCA GTGCCTCGGC TTCAGCAAGT ACCAGTGCGT CAGTTcAGCA AGCACAAGTG CGTCAATTTA GCATCAACCA G INFORMATION FOR SEQ ID NO: 387: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 425 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 387: TCTCAGCAAG CACCATTGCG TCGGCTrCAT CAAGCACCAG CGCGT'TTGAA TCCGCATCAA CCAGTGCTTC AGCTrCAGCC AAGTTACCTC AGCATCTGAA TCAGCATCAA CAAGTGCATC GGCTTCAGCA AGCACAAGTG CTTCAGCtCA GCAAGTATCT CAGCGTCTGA ATCGGCATCA ACGAGTGCGT CCGCTTCAGC AAGTACTAGC GCCTCAGCAT CAGCGTCAAC AAGTGCTTCG GCTrCAGCGT CAACGAGTGC GTCTGAGTCA GCATCAACGA GTACGTCAGC CTCAGCAAGC ACATCAGCTT CTGAATCTGC ATCAACCAGT GCGTCAGCCT CAGCATCGAC AAGCGCCTCA rCrrrCAr-CAA GTACCAGTGC GTCAGCCTCA GCAAGTACCA GTGCTTCAGC CTCAGCGTCG 540 586 120 240 300 360 420 451 1396
ACAAG
INFORMATION FOR SEQ ID NO: 388: SEQUENCE CHARACTERISTICS: LENGTH: 572 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 388: AGAGGATCCC CGGATCCTCA AGACAAACTC TTCATACTCC T'rTTTTTGCA ATTTAGCTGA AGCA.AGTTTT TGACCTCAGT CTCATAGAAC TATTATATCA TTTCATACTC TTCAAAAATC ACTGACTTCG TCAGTTCTAT TCTATCTGCA ACCTCAAAGC TTTGATTTWC ATTGAGTATC GTCGCTGAGA TAAC'TCCTTT AACACTTGCC CATTTTATGC TTTTCTT TTACCATTTA CCGACTTCCC ACCGCACAGG TATCAAAAGG AGGCTAGTAC TCTTCAAACC GCGTCAACGT CTGCAACCTC AAAACAGTGT GGGCTTGTTC ATCATGTAGT GAATCTCATC TATT?1'TTCT CAGTCACGCG CCCAGCCTTG CAACTAAAAA TTTATCTAAT AATGACCAAC CTCCTTTTCG CGCCTTGCCG TATATATGTT TTTGAGCTGA CTTCGTCAGT AGTGCTTTGA GCATCCTGCG GCTAGTTTCC kAGTkTGCTC AGATTTAGGA AATTAACTTC CTCGkCTCCA AAAAAkAGCT AAAACAATCA AGGCTCCTAA AATCGCTGGG AT INFORMATION FOR SEQ ID NO: 389: Ci) SEQUENCE CHARACTERISTICS: LENGTH: 505 base pairs TYPE: nucleic acid STRANOEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 389: CAACAAGTGC CTCGGCTTCA GCATGCACAA GTGCTTCAGC 'rTCAGCATGT ACCTGAGCGT CTGAATCAGC ATCAACGTGT GCGTCCGCTT CAGCATGTAC TGCTGCCTCA GCATCAGCGT -CAAcAwGTGC TTCGGCTTCA. 
GCGTCAACGA GTGCGTCTGA GTCAGCATCA ACGAGTACGT CAGCCTCAGC AAGCACATCA GCTTCTGAAT CTGCATCAAC CAGTGCGTCA GCCTCAGCAT CGACAAGCGC CTCAGCTTCA GCAAGTACCA GTGCGTCAGC CTCAGCAAGT ACCAGTGCTT CAGCCTCAGC GTCGACAAGT GCGTCGGCCT CAACCAGTGC ATCTGAATCG GCATCAACCA GTGCGTCAGC CTCAGCAAGT ACTAGCGCCT CAGCCTCAGC ATCAACGAGT GCGTCCGCTT 1397 CAGCAAX3TAC TAGTGCATCA GCATCAGCAT CAACGAGTGC ATCGGCTTCA GCAAGTACCA GCGCCTCAGC ?I'CAGCAAGC ACCGG INFORMATION FOR SEQ ID NO: 390: Wi SEQUENCE CHARACTERISTICS: LENGTH: 447 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 390: GCTAAGACTA CCTCATTAGG GGCATAGGCT GCTAAAATAA CTG AATACTGTAC TTTTTTCAT TTTAATTCCT TACATATTTA TAT AAACTrTAAC TTTGCTAGCC TTTGTTATAA AAAGTTTTAC TAA GAGTAGTACA TTTATATATA ATTGTTATCT CTCTATAAAA ACA ATTTAAGTCA AAAAAATTAA CATTAGTTAA 'TTATTTTT'r AGC ATTAGTACTC AATGAAAATC AAAGAGCAAA CTAGGAAACT AGC' AGTGTTT'rGA GGTTGTAGAT GGAATGACGT AGTCAGCTCA AAA CAGCTGT GGT'rAATGAC AACTTCC AATAGATAAT GTA'1rAT CTAGGAAATA GTATATC ATTrAAAAAA ACACATT AAAAAATAAG 1* S .0
CGCAGAT
CACTG1'
TGCTCAAAAC
TTGAAGTTGT
GGATAGAACT GACGAAGTCG GTACCGA INFORMATION FOR SEQ ID NO: 391: SEQUENCE CHARACTERISTICS: LENGTH: 572 base pairs B) TYPE: nucleic acid STRANDEDNESS: double (D0) TOPOLOGY: linear (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 391: AGCACTTGTC GTTGAATTCT ACAACAAAAT GTTIGTAATAT TTTATTGAAT AAGATAGGCC TTGATATTAA GCACTTTGGG ACGTTCTCCC TTAGTGCTTT TTTGATTTCT CTTAGTATCC AGCTATAATC GTTGAGACAT AACTAGACCG ATATAGTCCA AAGTGATATA GTAAAATGAA CCAAAAATAG TACACAATGT GGTATAATCC TTTTATGGCA TATTCAATAG AT'PTTCGTAA AAAAGTTCTC TCTTATTGTG AGCGAACAGG TAGTATAACA GAAGCATCAC ACGT'rTTCCA AATCTCACGT AATACCATTT ATGGCTGGI' AAAGCTAAAA GAGAAAACAG GAGAGCTAAA CCACCAAGTA TAGTGTATTG AATCTATAAC AGTACACC?1' GGCTGCTAAA ATATTTC'rAT 1398 AAAT'IAATTT GACTCCTG ATAGAGATGT TCACATCTTA r'rTCAAACTA CTA'rArA.AGT 480 TCTATAA'rCT CTTTATAAGA TTTGCCCATC AGACAAAATA GAACGATTTG AAGGCGTflTA 540 TGATATTTAG CTGTACGAGA GTCTTrAAA AG 572 00.
Page 1399: MISSING UPON TIME OF PUBLICATION
DENMARK
The applicant hereby requests that, until the application has been laid open to public inspection (by the Danish Patent Office), or has been finally decided upon by the Danish Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art. The request to this effect shall be filed by the applicant with the Danish Patent Office not later than at the time when the application is made available to the public under Sections 22 and 33(3) of the Danish Patents Act. If such a request has been filed by the applicant, any request made by a third party for the furnishing of a sample shall indicate the expert to be used. That expert may be any person entered on a list of recognized experts drawn up by the Danish Patent Office or any person approved by the applicant in the individual case.
SWEDEN
The applicant hereby requests that, until the application has been laid open to public inspection (by the Swedish Patent Office), or has been finally decided upon by the Swedish Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art. The request to this effect shall be filed by the applicant with the International Bureau before the expiration of 16 months from the priority date (preferably on the Form PCT/RO/134 reproduced in annex Z of Volume I of the PCT Applicant's Guide). If such a request has been filed by the applicant, any request made by a third party for the furnishing of a sample shall indicate the expert to be used. That expert may be any person entered on a list of recognized experts drawn up by the Swedish Patent Office or any person approved by the applicant in the individual case.
UNITED KINGDOM
The applicant hereby requests that the furnishing of a sample of a microorganism shall only be made available to an expert. The request to this effect must be filed by the applicant with the International Bureau before the completion of the technical preparations for the international publication of the application.
NETHERLANDS
The applicant hereby requests that, until the date of a grant of a Netherlands patent or until the date on which the application is refused or withdrawn or lapses, the microorganism shall be made available as provided in Rule 31F(1) of the Patent Rules only by the issue of a sample to an expert. The request to this effect must be furnished by the applicant with the Netherlands Industrial Property Office before the date on which the application is made available to the public under Section 22C or Section 25 of the Patents Act of the Kingdom of the Netherlands, whichever of the two dates occurs earlier.
SINGAPORE
The applicant hereby requests that the furnishing of a sample of a microorganism shall only be made available to an expert. The request to this effect must be filed by the applicant with the International Bureau before the completion of the technical preparations for international publication of the application.
NORWAY
The applicant hereby requests that, until the application has been laid open to public inspection (by the Norwegian Patent Office), or has been finally decided upon by the Norwegian Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art. The request to this effect shall be filed by the applicant with the Norwegian Patent Office not later than at the time when the application is made available to the public under Sections 22 and 33(3) of the Norwegian Patents Act. If such a request has been filed by the applicant, any request made by a third party for the furnishing of a sample shall indicate the expert to be used. That expert may be any person entered on a list of recognized experts drawn up by the Norwegian Patent Office or any person approved by the applicant in the individual case.
AUSTRALIA
The applicant hereby gives notice that the furnishing of a sample of a microorganism shall only be effected prior to the grant of a patent, or prior to the lapsing, refusal or withdrawal of the application, to a person who is a skilled addressee without an interest in the invention (Regulation 3.25(3) of the Australian Patents Regulations).
FINLAND
The applicant hereby requests that, until the application has been laid open to public inspection (by the National Board of Patents and Registration), or has been finally decided upon by the National Board of Patents and Registration without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art.
ICELAND
The applicant hereby requests that, until the application has been laid open to public inspection (by the Icelandic Patent Office), or has been finally decided upon by the Icelandic Patent Office without having been laid open to public inspection, the furnishing of a sample shall only be effected to an expert in the art.

Claims (19)

1. Computer readable medium having recorded thereon the nucleotide sequence depicted in SEQ ID NOS:1-391, a representative fragment thereof or a nucleotide sequence at least 95% identical to a nucleotide sequence depicted in SEQ ID NOS: 1-391.
2. Computer readable medium having recorded thereon any one of the fragments of SEQ ID NOS: 1-391 depicted in Tables 2 and 3 or a degenerate variant thereof.
3. The computer readable medium of claim 1, wherein said medium is selected from the group consisting of a floppy disc, a hard disc, random access memory (RAM), read only memory (ROM), and CD-ROM.
4. The computer readable medium of claim 3, wherein said medium is selected from the group consisting of a floppy disc, a hard disc, random access memory (RAM), read only memory (ROM), and CD-ROM.
5. A computer-based system for identifying fragments of the Streptococcus pneumoniae genome of commercial importance comprising the following elements: a) a data storage means comprising the nucleotide sequence of SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95% identical to a nucleotide sequence of SEQ ID NOS:1-391; b) search means for comparing a target sequence to the nucleotide sequence of the data storage means of step (a) to identify homologous sequence(s); and c) retrieval means for obtaining said homologous sequence(s) of step (b).
6. A method for identifying commercially important nucleic acid fragments of the Streptococcus pneumoniae genome comprising the step of comparing a database comprising the nucleotide sequences depicted in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95% identical to a nucleotide sequence of SEQ ID NOS:1-391 with a target sequence to obtain a nucleic acid molecule comprised of a complementary nucleotide sequence to said target sequence, wherein said target sequence is not randomly selected.
7. A method for identifying an expression modulating fragment of Streptococcus pneumoniae genome comprising the step of comparing a database comprising the nucleotide sequences depicted in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95% identical to the nucleotide sequence of SEQ ID NOS:1-391 with a target sequence to obtain a nucleic acid molecule comprised of a complementary nucleotide sequence to said target sequence, wherein said target sequence comprises sequences known to regulate gene expression.
8. An isolated protein-encoding nucleic acid fragment of the Streptococcus pneumoniae genome, wherein said fragment consists of the nucleotide sequence of any one of the fragments of SEQ ID NOS: 1-391 depicted in Tables 2 and 3, or a degenerate variant thereof.
9. A vector comprising any one of the fragments of the Streptococcus pneumoniae genome SEQ ID NOS:1-391 depicted in Tables 2 and 3 or a degenerate variant thereof.
10. An isolated fragment of the Streptococcus pneumoniae genome, wherein said fragment modulates the expression of an operably linked open reading frame, wherein said fragment consists of the nucleotide sequence from about 10 to 200 bases in length which is 5' to any one of the open reading frames depicted in Tables 2 and 3 or a degenerate variant thereof.
11. A vector comprising any one of the fragments of the Streptococcus pneumoniae genome of claim 8.
12. An organism which has been altered to contain any one of the fragments of the Streptococcus pneumoniae genome of claim 8.
13. An organism which has been altered to contain any one of the fragments of the Streptococcus pneumoniae genome of claim 1404
14. A method for regulating the expression of a nucleic acid molecule comprising the step of covalently attaching to said nucleic acid molecule a nucleic acid molecule consisting of the nucleotide sequence from about 10 to 100 bases to any one of the fragments of the Streptococcus pneumoniae genome depicted in SEQ ID NOS:1-391 and Tables 2 and 3 or a degenerate variant thereof.
15. An isolated nucleic acid molecule encoding a homolog of any of the fragments of the Streptococcus pneumoniae genome of SEQ ID NOS: 1-391 and Tables 2 and 3, wherein said nucleic acid molecule is produced by a process comprising steps of: a) screening a genomic DNA library using as a probe a target sequence defined by any of SEQ ID NOS:1-391 and Tables 2 and 3, including fragments thereof; b) identifying members of said library which contain sequences that hybridize to said target sequence; and c) isolating the nucleic acid molecules from said members identified in step (b).
16. An isolated DNA molecule encoding a homolog of any one of the fragments of the Streptococcus pneumoniae genome of SEQ ID NOS:1-391 and Tables 2 and 3, wherein said nucleic acid molecule is produced by a process comprising steps of: a) isolating mRNA, DNA, or cDNA produced from an organism; b) amplifying nucleic acid molecules whose nucleotide sequence is homologous to amplification primers derived from said fragment of said Streptococcus pneumoniae genome to prime said amplification; c) isolating said amplified sequences produced in step (b).
17. An isolated polypeptide encoded by any of the fragments of the Streptococcus pneumoniae genome of SEQ ID NOS:1-391 and depicted in Table 2 and 3 or by a degenerate variant of said fragments.
18. An isolated polynucleotide molecule encoding any one of the polypeptides of claim 17.
19. An antibody which selectively binds to any one of the polypeptides of claim 17.
20. A method for producing a polypeptide in a host cell comprising the steps of: a) incubating a host containing a heterologous nucleic acid molecule whose nucleotide sequence consists of any one of the fragments of the Streptococcus pneumoniae genome of SEQ ID NOS:1-391 and depicted in Tables 2 and 3, under conditions where said heterologous nucleic acid molecule is expressed to produce said protein, and b) isolating said protein.
Dated this Thirtieth day of March 2001. Human Genome Sciences, Inc. Wray Associates Perth, Western Australia Patent Attorneys for the Applicant
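By way of illustration only, the three elements recited in claim 5 (data storage means, search means, retrieval means) together with the 95% identity threshold of claim 1 can be sketched as follows. This is a minimal sketch and not the system of the specification: it assumes an in-memory dictionary as the data store, uses a simple ungapped sliding-window comparison in place of a proper alignment tool, and every name in it (`Hit`, `window_identity`, `search`, `retrieve`, the toy sequences) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    seq_id: str      # key of the stored sequence (hypothetical identifier)
    offset: int      # 0-based start of the best-matching window
    identity: float  # fraction of identical positions in that window


def window_identity(target: str, window: str) -> float:
    """Fraction of positions at which two equal-length strings agree."""
    matches = sum(1 for a, b in zip(target, window) if a == b)
    return matches / len(target)


def search(database: dict[str, str], target: str,
           min_identity: float = 0.95) -> list[Hit]:
    """Search means (assumed form): slide the target along every stored
    sequence without gaps and keep the best window per sequence when it
    meets the identity threshold."""
    if not target:
        raise ValueError("target sequence must not be empty")
    target = target.upper()
    hits = []
    for seq_id, seq in database.items():
        seq = seq.upper()
        best = None
        for offset in range(len(seq) - len(target) + 1):
            ident = window_identity(target, seq[offset:offset + len(target)])
            if best is None or ident > best.identity:
                best = Hit(seq_id, offset, ident)
        if best is not None and best.identity >= min_identity:
            hits.append(best)
    return sorted(hits, key=lambda h: h.identity, reverse=True)


def retrieve(database: dict[str, str], hit: Hit, length: int) -> str:
    """Retrieval means (assumed form): return the matched region."""
    return database[hit.seq_id][hit.offset:hit.offset + length]


if __name__ == "__main__":
    # Toy data store; these strings are invented and are NOT SEQ ID NOS:1-391.
    store = {"toy_1": "ACGTACGTACGTACGTACGTAAAA", "toy_2": "T" * 24}
    query = "ACGTACGTACCTACGTACGT"  # differs from toy_1 at one of 20 positions
    for hit in search(store, query):
        print(hit, retrieve(store, hit, len(query)))
```

An ungapped comparison is used here only to keep the sketch short; it cannot detect insertions or deletions, so a practical search means would rely on an established alignment program.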
AU33351/01A 1996-10-31 2001-03-30 Streptococcus pneumoniae polynucleotides and sequences Ceased AU777190B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60/029960 1996-10-31
AU69090/98A AU6909098A (en) 1996-10-31 1997-10-30 Streptococcus pneumoniae polynucleotides and sequences

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU69090/98A Division AU6909098A (en) 1996-10-31 1997-10-30 Streptococcus pneumoniae polynucleotides and sequences

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2004231248A Division AU2004231248B2 (en) 1996-10-31 2004-11-23 Streptococcus pneumoniae Polynucleotides and Sequences

Publications (2)

Publication Number Publication Date
AU3335101A (en) 2001-12-13
AU777190B2 AU777190B2 (en) 2004-10-07

Family

ID=33163451

Family Applications (1)

Application Number Title Priority Date Filing Date
AU33351/01A Ceased AU777190B2 (en) 1996-10-31 2001-03-30 Streptococcus pneumoniae polynucleotides and sequences

Country Status (1)

Country Link
AU (1) AU777190B2 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113290076A (en) * 2021-04-29 2021-08-24 邯郸钢铁集团有限责任公司 Control method for improving threading efficiency of hot galvanizing outlet double-coiler
CN113290076B (en) * 2021-04-29 2022-11-15 邯郸钢铁集团有限责任公司 Control method for improving threading efficiency of hot galvanizing outlet double-coiler
CN113136444A (en) * 2021-05-10 2021-07-20 临沂大学 Microdroplet digital PCR detection method for enterococcus hirae in medical food
CN113136444B (en) * 2021-05-10 2024-04-19 临沂大学 Microdroplet digital PCR detection method for enterococcus faecalis in medical food
CN116574634A (en) * 2023-03-13 2023-08-11 广东悦创生物科技有限公司 Streptococcus salivarius thermophilus subspecies JF2 and application thereof in preparation of anti-inflammatory and lipid-relieving food and drug
CN116574634B (en) * 2023-03-13 2023-11-03 广东悦创生物科技有限公司 Streptococcus salivarius thermophilus subspecies JF2 and application thereof in preparation of anti-inflammatory and lipid-relieving food and drug

Also Published As

Publication number Publication date
AU777190B2 (en) 2004-10-07

Similar Documents

Publication Publication Date Title
AU745787B2 (en) Enterococcus faecalis polynucleotides and polypeptides
KR101914245B1 (en) Composition Containing Bacterial Strain
AU762606B2 (en) Chlamydia pneumoniae genomic sequence and polypeptides, fragments thereof and uses thereof, in particular for the diagnosis, prevention and treatment of infection
AU754264B2 (en) Chlamydia trachomatis genomic sequence and polypeptides, fragments thereof and uses thereof, in particular for the diagnosis, prevention and treatment of infection
AU2021290210A1 (en) Compositions comprising bacterial strains
EP0941335A2 (en) Streptococcus pneumoniae polynucleotides and sequences
AU2016357553A1 (en) Compositions comprising bacterial strains
KR100923598B1 (en) Surface Proteins of Streptococcus pyogenes
TW202223083A (en) Use of compositions comprising bacterial strains
JPH09322781A (en) Staphylococcus aureus polynucleotide and sequence
KR101986442B1 (en) Biomarkers for rheumatoid arthritis and usage thereof
AU2015205512B2 (en) Phage therapy of E coli infections
KR102191537B1 (en) Selection and use of lactic acid bacteria preventing bone loss in mammals
AU2022256122A1 (en) Novel Proteins From Anaerobic Fungi And Uses Thereof
RU2673715C2 (en) Haemophilus parasuis vaccine serovar type 4
CN112243377A (en) Bacteriophage for treating and preventing bacterially-associated cancer
KR20200019882A (en) Compositions Containing Bacterial Strains
AU2016295176A1 (en) Genetic testing for predicting resistance of gram-negative proteus against antimicrobial agents
KR20200038970A (en) Composition comprising a bacterial strain
AU777190B2 (en) Streptococcus pneumoniae polynucleotides and sequences
KR20190059562A (en) Novel Bacillus subtilis having proteolytic activity and uses thereof
KR20240021274A (en) Bacteriophages against vancomycin-resistant enterococci
AU710880B2 (en) Nucleic acid and amino acid sequences relating to helicobacter pylori for diagnostics and therapeutics
AU713692B2 (en) Nucleic acid and amino acid sequences relating to helicobacter pylori for therapeutics
AU1546202A (en) Enterococcus faecalis polynucleotides and polypeptides